Python: Getting Started with Data Science

This post was written by Dallin Akagi and Mark Steadman.

This short tutorial walks you through some basic data analysis methods and shows you how to implement some of the more sophisticated techniques available today. We will look at traffic accident data from the National Highway Traffic Safety Administration (NHTSA) and try to predict fatal accidents using state-of-the-art statistical learning techniques. If you are interested, download the code at the bottom and follow along as we work through a real-world data set. This post uses Python; a companion post covers the same techniques in R.


Table of Contents:

1.  First things first
2.  Get some data
3.  Load the data into Python
4.  Clean up the data
5.  Now we model!
6.  Now what should I do?
        - Data Prep
        - Which is the best model?
        - Which is the best story?
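
As a rough preview of steps 3 to 5, the load, clean, and model sequence might look something like the sketch below. This is not the authors' code: the rows are made up, the column names (speed_limit, alcohol_involved, fatal) are hypothetical stand-ins for whatever the real NHTSA file contains, and plain logistic regression fit by gradient descent is assumed here only as a simple example of a statistical learning model. It uses just the standard library so it runs without any extra installs.

```python
import csv
import io
import math

# Hypothetical, made-up rows standing in for the real NHTSA data file.
RAW_CSV = """speed_limit,alcohol_involved,fatal
65,1,1
55,1,1
70,0,1
25,0,0
30,0,0
,0,0
45,1,0
35,0,0
60,1,1
40,0,0
"""

FEATURES = ["speed_limit", "alcohol_involved"]  # assumed column names
TARGET = "fatal"


def load_rows(text):
    """Step 3: parse CSV text into a list of dicts (one per accident)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Step 4: drop rows with missing values and convert fields to floats."""
    cleaned = []
    for row in rows:
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, features, target, learning_rate=0.5, epochs=2000):
    """Step 5: fit logistic regression with batch gradient descent."""
    # Scale each feature by its largest absolute value so the step size is stable.
    scales = [max(abs(r[f]) for r in rows) or 1.0 for f in features]
    X = [[r[f] / s for f, s in zip(features, scales)] for r in rows]
    y = [r[target] for r in rows]
    weights = [0.0] * len(features)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(features)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            grad_b += error
            for j, v in enumerate(xi):
                grad_w[j] += error * v
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias, scales


def predict_proba(model, row, features):
    """Return the modeled probability that an accident is fatal."""
    weights, bias, scales = model
    z = bias + sum(w * row[f] / s for w, f, s in zip(weights, features, scales))
    return sigmoid(z)


cleaned = clean_rows(load_rows(RAW_CSV))
model = fit_logistic(cleaned, FEATURES, TARGET)
example = {"speed_limit": 70.0, "alcohol_involved": 1.0}
print(f"Rows kept after cleaning: {len(cleaned)}")
print(f"Modeled P(fatal) for the example row: {predict_proba(model, example, FEATURES):.2f}")
```

On a real data set you would also hold out some rows for testing rather than scoring the model on the data it was fit to, which is part of what step 6 is about.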

This post was inspired by the StatLearning MOOC from Stanford.

Check out all this information here
