The Data Science Toolkit - My Boot Camp Curriculum

This compilation has everything you need to jump-start your skills in the core tasks of data transformation, modeling, and visualization.

tl;dr: Coursera and Johns Hopkins have a new course called The Data Scientist's Toolbox. https://www.coursera.org/course/datascitoolbox


Below is a list of popular analyses from Rexer's 2013 survey. The table is biased towards customer transaction, text, and social media data. The average Rexer Survey respondent reports using 12 algorithms. This is the industry's algorithmic bread and butter.

Analysis | Typical Use
Regression | Simple trend lines
Decision trees | Simple decision analysis
Cluster analysis | Classification
Time series | Stock tickers
Text mining | Sentiment analysis
Ensemble models | Bootstrap aggregating
Factor analysis | Latent variable exploration
Neural nets | Function approximation
Random forests | Classification
Association rules | Customer behaviour
Bayesian | Pattern recognition
Support vector machines (SVM) | Classification
Social network analysis | Marketing and advertising
Uplift modeling | CRM up-selling
Survival analysis | Reliability and risk management
Link analysis | Knowledge discovery
Genetic algorithms | Phylogenetics
Splines (MARS) | Regression
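
To make the first row of the table concrete, here is a minimal sketch of a "simple trend line": an ordinary least-squares fit written in plain Python with no third-party packages. The function name fit_trend_line and the sample numbers are my own illustration, not something taken from the Rexer survey or from any particular library.

```python
from typing import Sequence, Tuple


def fit_trend_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x; zero means the slope is undefined.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Sum of cross deviations between x and y.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up monthly sales figures, purely for illustration.
    months = [1, 2, 3, 4, 5]
    sales = [12.0, 15.0, 19.0, 22.0, 27.0]
    slope, intercept = fit_trend_line(months, sales)
    print(f"trend: sales = {slope:.2f} * month + {intercept:.2f}")
```

In practice you would probably reach for one of the statistics packages instead, but the arithmetic underneath a basic trend line is no more than this.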

CRAN has Task View pages dedicated to each typical task of statistical computing.

Python has several packages tailored for statistical analysis, including Pandas, Orange, PyBrain and Scikit-learn.


OpenRefine is designed to help journalists and other non-technical people organize incomplete data from different sources. http://openrefine.org/

A decent Scrapy presentation by David McLean. He shares his opinion about the evils of scraping.

David Beazley's Learn Python Through Public Data Hacking. Dabeaz gold.

Wes McKinney, creator of Pandas, gave a 3-hour tutorial on data analysis.

Feel more confident with your Python skills with Raymond Hettinger's Transforming Code into Beautiful Idiomatic Python.

VISUALIZATION (a.k.a. Web publishing)

The best way to dive into visualization is by perusing the canonical examples in the d3.js gallery. Spend some time playing with the demos and ask yourself "How can this become more useful?" Indeed, most of the gallery is frivolous, but the potential is undeniable.

D3 tutorials have come a long way in a short time thanks to people like Scott Murray. He did a great O'Reilly webcast: http://oreillynet.com/pub/e/2952
There are also YouTube videos with Mike Dewar, and Malcolm Maclean has published D3 Tips and Tricks with Leanpub. http://www.d3noob.org/

Other noteworthy visualization platforms are DataWrapper, Flot, Highcharts, TheJit, Arbor.js, and Kartograph. These libraries depend on JavaScript, SVG and CSS, which is why I consider data visualization synonymous with web publishing.

John Lindquist's Egghead.io covers the details of AngularJS: https://www.youtube.com/watch?v=WuiHuZq_cg4

Paul Irish is a star evangelist for HTML5 workflows. He uses a lot of tools in front-end dev.

Getting Started with Django by Kenneth Love is most excellent, incompleteness notwithstanding.

Codecademy has interactive tutorials for Python, JavaScript and web fundamentals: http://www.codecademy.com/

Professional trainers are slower paced and cover material that isn't in MOOCs or YouTube videos:

Lynda.com - R stats - http://www.lynda.com/R-tutorials/R-Statistics-Essential-Training/14...
Tutsplus - Say Yo to Yeoman - http://net.tutsplus.com/tutorials/tools-and-tips/say-yo-to-yeoman/
Pluralsight - Twitter Bootstrap 3 - http://pluralsight.com/training/Courses/TableOfContents/bootstrap-3
Codeschool - NodeJS - https://www.codeschool.com/courses/real-time-web-with-nodejs
CBT Nuggets - AWS - http://www.cbtnuggets.com/it-training-videos/course/aws-certificati...

You now have everything you need to make the rubber hit the road. Go find some cool data and get to work!



