The fifth and final Udacity course for my Data Analyst Nanodegree was Data Visualization and D3.js. The project submission instructions were: Create a data visualization from a data set that tells a story or highlights trends or patterns in the data. Use either dimple.js or d3.js to create the visualization. Your work should be a […]
Data Visualization Project Feedback Request
Hello, I would like to solicit feedback on the first iteration of my final project for the Data Visualization course at Udacity. I wanted to have a little fun with this project by creating and exploring a new dataset. If you are interested in the source files, they can be found here. I play Daily […]
Intro to Machine Learning
The fourth Udacity course for my Data Analyst Nanodegree was Intro to Machine Learning. The project submission instructions were: Play detective and put your machine learning skills to use by building an algorithm to identify Enron Employees who may have committed fraud based on the public Enron financial and email dataset. You can find my […]
Hypergeometric Distribution Calculations in Python
I’m going to discuss my code to calculate hypergeometric confidence limits and sample size in Python. In this post I discussed sample sizes for various statistical methods. Let’s first discuss the approach for confidence intervals. Assume we want to calculate a 95% two-tailed confidence interval (CI) for a sample size of 284 and sample […]
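As a rough illustration of the idea in this excerpt, here is a minimal sketch of exact two-tailed hypergeometric confidence limits obtained by test inversion. It is not the code from the post: the function names (`hypergeom_pmf`, `prob_at_least`, `prob_at_most`, `hypergeom_limits`) and the numbers in the usage line are hypothetical, and it uses only the standard library rather than whatever the post itself relies on.

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    """P(X = x) when drawing n items without replacement from N items, K of them marked."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def prob_at_least(k, N, K, n):
    """P(X >= k), summed over the feasible support only."""
    hi = min(n, K)
    return sum(hypergeom_pmf(x, N, K, n) for x in range(k, hi + 1))


def prob_at_most(k, N, K, n):
    """P(X <= k), summed over the feasible support only."""
    lo = max(0, n - (N - K))
    return sum(hypergeom_pmf(x, N, K, n) for x in range(lo, k + 1))


def hypergeom_limits(N, n, k, confidence=0.95):
    """Two-tailed limits on K (marked items in the population) given k marked in a sample of n.

    Assumption: alpha is split equally between the two tails.
    """
    alpha = 1 - confidence
    # K must cover the k marked items seen and leave room for the n - k unmarked ones.
    feasible = range(k, N - (n - k) + 1)
    lower = next(K for K in feasible if prob_at_least(k, N, K, n) > alpha / 2)
    upper = next(K for K in reversed(feasible) if prob_at_most(k, N, K, n) > alpha / 2)
    return lower, upper


# Hypothetical example: population of 10, sample of 5, no marked items observed.
print(hypergeom_limits(N=10, n=5, k=0))
```

The linear scan over every feasible `K` is fine for small populations; for large ones a bisection search over `K` would be a natural refinement.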
Data Analysis with R
The third Udacity course for my Data Analyst Nanodegree was Data Analysis with R. The project submission instructions were: Use R and apply exploratory data analysis techniques to explore relationships in one variable to multiple variables and to explore a selected data set for distributions, outliers, and anomalies. You can find my submission here and […]
Data Wrangling with MongoDB
The second Udacity course for my Data Analyst Nanodegree was Data Wrangling with MongoDB. The project submission instructions were: Choose any area of the world in https://www.openstreetmap.org and use data munging techniques, such as assessing the quality of the data for validity, accuracy, completeness, consistency and uniformity, to clean the OpenStreetMap data for a part […]
Intro to Data Science
The first Udacity course for my Data Analyst Nanodegree was Intro to Data Science. The project submission instructions were: Look at NYC Subway data and figure out if more people ride the subway when it is raining versus when it is not raining. Wrangle the NYC subway data, use statistical methods and data visualization to […]
Dynamic code not hard code
One of my favorite ways to streamline SQL code is to use cursors, loops and dynamic SQL. Sometimes I start with this type of solution right out of the gate and sometimes it comes as my code evolves. Today I ran into a data integrity issue in a table and I […]
Sample Sizes
In this post I will discuss a common question I’m asked regarding statistical sampling and the use of the normal approximation method to calculate sample sizes. The most conventional and easiest way to calculate sample sizes is to use the normal distribution. However, I’m going to look more closely using a specific example to see […]
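For readers unfamiliar with the normal approximation mentioned in this excerpt, the sketch below shows the usual textbook sample size formula for estimating a proportion, n0 = z² p (1 − p) / e², with an optional finite population correction. The function name `normal_sample_size` and its parameters are hypothetical and the example values are illustrative only; they are not taken from the post.

```python
import math
from statistics import NormalDist


def normal_sample_size(confidence, margin, p=0.5, population=None):
    """Sample size for estimating a proportion via the normal approximation.

    confidence: two-tailed confidence level, e.g. 0.95
    margin:     desired half-width of the interval, e.g. 0.05
    p:          expected proportion (0.5 is the most conservative choice)
    population: if given, apply the finite population correction
    """
    # Two-tailed critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction shrinks n0 when the population is small.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Illustrative values only: 95% confidence, 5% margin of error.
print(normal_sample_size(0.95, 0.05))
print(normal_sample_size(0.95, 0.05, population=1000))
```

Rounding up with `math.ceil` keeps the achieved margin at or below the requested one.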
Disclaimer on posts
I take confidentiality and proprietary information very seriously. It really goes to the core of my ethical responsibility. With that said, I do feel that being more social and writing about things that I do on a daily basis is a good way to demonstrate my professional skills. As such, I will use generic references to […]