Machine Learning and Data Science Cheat Sheet

You can download the new machine learning cheat sheet here (PDF format, 14 pages).

Originally published in 2014 and viewed more than 200,000 times, this is the oldest data science cheat sheet - the mother of all the numerous cheat sheets that are so popular nowadays. I decided to update it in June 2019. While the first half, dealing with installing components on your laptop and learning UNIX, regular expressions, and file management, hasn't changed much, the second half, dealing with machine learning, was rewritten entirely from scratch. It is amazing how much things have changed in just five years!

Source for picture: see here (original) or here (PDF)

Written for people who have never seen a computer in their life, it starts at the very beginning: buying a laptop! You can skip the first half and jump to sections 5 and 6 if you are already familiar with UNIX. This new cheat sheet will be included in my upcoming book Machine Learning: Foundations, Toolbox, and Recipes, to be published in September 2019 and available (for free) exclusively to Data Science Central members.


1. Hardware

2. Linux environment on Windows laptop

3. Basic UNIX commands

4. Scripting languages

5. Python, R, Hadoop, SQL, DataViz

6. Machine Learning

  • Algorithms
  • Getting started
  • Applications
  • Data sets and sample projects

To not miss this type of content in the future, subscribe to our newsletter. For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn, or visit my old web page here.




Comment by Vincent Granville on February 26, 2016 at 7:22am

Thanks Venkatesh, I fixed the link.

Comment by Venkatesh Balakumar on February 25, 2016 at 11:19pm

Vincent, section 8 Machine Learning, the reference link is not working.

Comment by AllenJ on February 23, 2016 at 2:50am

Thanks for sharing. Very informative and useful

Comment by Phillip Burger on February 17, 2016 at 9:37am

Here's further perspective a year and a half after Vincent's article. His article is still awesome, very helpful.

If you're new to the Data Science space and trying to figure out what platform to adopt, go Linux/Unix. The Data Science, Big Data, Hadoop, etc., space is Linux stack. There is no indication of any OS that will replace it.

Make items 1 and 2 irrelevant in your career. Spend the time building skills in the areas identified in items 3 and 4. Yes, the layer on Windows does work. You can do the analysis you need to, as described. It does work. But if you have a choice and think about the future, why use it?

If you're in an organization that is Microsoft stack and uses Azure, or aspire to be in such an organization, then go with Windows with the layer. Otherwise, go with Linux/Unix. Thrive and good luck!

Comment by Richard D Appiah on January 9, 2016 at 11:06pm

Extremely informative list! Thanks

Comment by Suresh Chandra Ganapuram on December 12, 2015 at 5:42am

Very interesting and exhaustive information. Already started practicing R thru datacamp. Looks like way to go... wish you all the best everyone who's aspiring to become data scientist.

Comment by pavan on December 11, 2015 at 12:08am

Super Vincent

Comment by Savita Kirpalani on November 24, 2015 at 5:48am

Great list!!


Comment by Bernie L Malonson on November 13, 2015 at 6:41pm

A journey of a thousand miles begins with a simple step, or mouse click, or page view,..., you get the idea. Thanks for a comprehensive overview and roadmap as I begin my journey.

Comment by Besim Ismaili on November 10, 2015 at 2:34am

I love this post! :)
