20 short tutorials all data scientists should read (and practice)

The new, completed version of this Data Science Cheat Sheet can be found here.

We are now at 20 tutorials, up from 17. I hope to find the time to write a one-page survival guide for UNIX, Python and Perl; here's one for R. The links to core data science concepts are below. I still need to add links on web crawling, attribution modeling and API design; relevancy engines are already discussed in some of the tutorials listed below. Adding those will complete my 10-page cheat sheet for data science.

Here's the list:

  1. Tutorial: How to detect spurious correlations, and how to find the ...
  2. Practical illustration of Map-Reduce (Hadoop-style), on real data
  3. Jackknife logistic and linear regression for clustering and predict...
  4. From the trenches: 360-degrees data science
  5. A synthetic variance designed for Hadoop and big data
  6. Fast Combinatorial Feature Selection with New Definition of Predict...
  7. A little known component that should be part of most data science a...
  8. 11 Features any database, SQL or NoSQL, should have
  9. Clustering idea for very large datasets
  10. Hidden decision trees revisited
  11. Correlation and R-Squared for Big Data
  12. Marrying computer science, statistics and domain expertise
  13. New pattern to predict stock prices, multiplies return by factor 5
  14. What Map Reduce can't do
  15. Excel for Big Data
  16. Fast clustering algorithms for massive datasets
  17. Source code for our Big Data keyword correlation API
  18. The curse of big data
  19. How to detect a pattern? Problem and solution
  20. Interesting Data Science Application: Steganography

Other Cheat Sheets

Vincent's Cheat Sheets for Perl, R, Excel (includes LINEST, VLOOKUP), Linux, cron jobs, gzip, FTP, PuTTY, regular expressions, Cygwin, pipe operators, file management, dashboard design, etc. (coming soon)

Cheat Sheets for Python 

Cheat Sheets for R 

Cross Reference between R, Python (and Matlab) 

Cheat Sheets for SQL 

Related link: The Data Science Toolkit

Other interesting links

Comment by Vijay Singh on September 30, 2019 at 11:15pm

Hi Vincent Granville,

You can add our Java cheat sheet to your list of tutorials. It is helping many developers nowadays: with this cheat sheet, people can learn Java or revise it as a quick reference.

Comment by Greg Slawek on March 15, 2016 at 6:03pm

Broken link - clicked on "Here's one for R."

Our apologies – this page was not found

Can we get an updated link there?

Comment by Mary James on February 16, 2016 at 1:11am

Thank you for the great resources that help researchers in their projects.

I recommend this website to learn how to program in Java:
It was really helpful for me.

Comment by Brad Kolarov on January 26, 2016 at 10:59am

Vince, this is great for aspiring data scientists! I think it is a great resource! Something else I have found beneficial, which lets me focus on my data rather than spinning up the analytic stacks, is a tool called Stackspace. It's in beta, but has saved me a ton of time!

Comment by Jianhua/Jason Li on November 19, 2015 at 10:15am

Cool. Thanks.

Comment by Vineet Berlia on June 13, 2015 at 1:17pm

Awesome collection. Thanks, Vincent.

Comment by Parmod Kumar on May 24, 2015 at 7:19pm

Thanks for sharing this!

Comment by Godwin Iferi on February 19, 2015 at 12:11pm

Absolutely great resource. I find this helpful!

Comment by Egidio Ndabagoye on November 4, 2014 at 8:44pm

Thanks, Vincent. This is very informative.

© 2021 TechTarget, Inc.