The fifth and final Udacity course for my Data Analyst Nanodegree was Data Visualization and D3.js. The project submission instructions were: Create a data visualization from a data set that tells a story or highlights trends or patterns in the data. Use either dimple.js or d3.js to create the visualization. Your work should be a […]
Intro to Machine Learning
The fourth Udacity course for my Data Analyst Nanodegree was Intro to Machine Learning. The project submission instructions were: Play detective and put your machine learning skills to use by building an algorithm to identify Enron Employees who may have committed fraud based on the public Enron financial and email dataset. You can find my […]
Hypergeometric Distribution Calculations in Python
I’m going to discuss my code to calculate hypergeometric confidence limits and sample size in Python. In another post I discussed sample sizes for various statistical methods. Let's first discuss the approach for confidence intervals. Assume we want to calculate a 95% two-tailed confidence interval (CI) for a sample size of 284 and sample […]
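The excerpt stops before the method itself, so here is a minimal sketch of one common approach. It is not necessarily the code from the full post. It finds exact two-tailed limits for the number of "successes" K in a finite population by inverting the hypergeometric tail probabilities. The inputs are assumptions: population size `N`, sample size `n` (for example 284), and observed count `x`. The excerpt does not give `N` or `x`. The function names are hypothetical, and only the standard library (`math.comb`, Python 3.8+) is used.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items without replacement from N items, K of which are successes."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0  # outside the support of the distribution
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k, N, K, n):
    """P(X <= k) for the same hypergeometric distribution."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))

def hypergeom_ci(x, n, N, confidence=0.95):
    """Exact two-tailed limits for K, given x successes in a sample of n from N (hypothetical helper)."""
    tail = (1 - confidence) / 2
    # Lower limit: smallest K for which P(X >= x | K) exceeds the tail probability.
    lower = 0
    if x > 0:
        for K in range(x, N + 1):
            if 1 - hypergeom_cdf(x - 1, N, K, n) > tail:
                lower = K
                break
    # Upper limit: largest K for which P(X <= x | K) exceeds the tail probability.
    # K cannot exceed N - (n - x), because n - x failures were observed.
    upper = N
    for K in range(N - (n - x), -1, -1):
        if hypergeom_cdf(x, N, K, n) > tail:
            upper = K
            break
    return lower, upper
```

For example, `hypergeom_ci(0, 5, 10)` returns `(0, 3)`. With zero successes in a sample of 5 from a population of 10, the data are consistent with at most 3 successes in the population at the 95% level. The brute-force search is simple to check by hand but slow for large `N`. A library routine such as SciPy's `hypergeom` distribution would be the usual choice for production use.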
Data Wrangling with MongoDB
The second Udacity course for my Data Analyst Nanodegree was Data Wrangling with MongoDB. The project submission instructions were: Choose any area of the world in https://www.openstreetmap.org and use data munging techniques, such as assessing the quality of the data for validity, accuracy, completeness, consistency and uniformity, to clean the OpenStreetMap data for a part […]
Intro to Data Science
The first Udacity course for my Data Analyst Nanodegree was Intro to Data Science. The project submission instructions were: Look at NYC Subway data and figure out if more people ride the subway when it is raining versus when it is not raining. Wrangle the NYC subway data, use statistical methods and data visualization to […]