Data Science Central

*The new, completed version of this Data Science Cheat Sheet can be found here.*

The list is now at 20 entries, up from 17. I hope to find the time to write a one-page survival guide for UNIX, Python and Perl. Here's one for R. The links to core data science concepts are below; I still need to add links to web crawling, attribution modeling and API design. Relevancy engines are discussed in some of the tutorials listed below. That will complete my 10-page cheat sheet for data science.

Here's the list:

- Tutorial: How to detect spurious correlations, and how to find the real ones
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- Jackknife logistic and linear regression for clustering and predictions
- From the trenches: 360-degrees data science
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predict...
- A little known component that should be part of most data science a...
- 11 Features any database, SQL or NoSQL, should have
- Clustering idea for very large datasets
- Hidden decision trees revisited
- Correlation and R-Squared for Big Data
- Marrying computer science, statistics and domain expertize
- New pattern to predict stock prices, multiplies return by factor 5
- What Map Reduce can't do
- Excel for Big Data
- Fast clustering algorithms for massive datasets
- Source code for our Big Data keyword correlation API
- The curse of big data
- How to detect a pattern? Problem and solution
- Interesting Data Science Application: Steganography

**Other Cheat Sheets**

Vincent's Cheat Sheets for Perl, R, Excel (includes Linest, Vlookup), Linux, cron jobs, gzip, ftp, putty, regular expressions, Cygwin, pipe operators, file management, dashboard design, etc. are coming soon.

Cheat Sheets for Python

- Python: www.astro.up.pt/~sousasag/Python_For_Astronomers/Python_qr.pdf
- NumPy, SciPy and Pandas: s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+S...

Cheat Sheets for R

- Short Reference Card: cran.r-project.org/doc/contrib/Short-refcard.pdf
- R Functions for Regression Analysis: cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
- Time Series: cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
- Data Mining: cran.r-project.org/doc/contrib/YanchangZhao-refcard-data-mining.pdf
- Quandl: s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+R+Cheat+S...

Cross Reference between R, Python (and Matlab)

Cheat Sheets for SQL

- SQL Joins: www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
- SQL and Hive: hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.Ch...

Additional

- Cheat Sheets for Java: introcs.cs.princeton.edu/java/11cheatsheet/
- Linux Cheat Sheet: www.linuxstall.com/linux-command-line-tips-that-every-linux-user-sh...

**Related link**: The Data Science Toolkit

## Vineet Berlia

Awesome collection. Thanks, Vincent.

Jun 13, 2015

## Jianhua/Jason Li

cool. Thanks.

Nov 19, 2015

## Brad Kolarov

Vince, this is great for aspiring data scientists! I think it is a great resource! Something else I have found beneficial, which lets me focus on my data rather than on spinning up analytic stacks, is a tool called Stackspace. It's in beta, but it has saved me a ton of time!

Jan 26