This list is a shorter version of the post that I published on my blog a few weeks ago: 75+ free online resources to boost your data science and analysis s...

This list is obviously biased toward my preferences and experience. Moreover, I realised that some interesting topics, such as data visualization and experiment design, are not properly covered. That’s why any suggestion in the comments of this post is more than welcome.

I selected those resources that are more suitable for beginners together with the parts of machine learning that I like the most.

- You can start with this introduction to data mining by Saed Sayad (University of Toronto). I found the first diagram particularly interesting.
- This glossary of machine learning terms is the best that I’ve found so far.
- An introduction to machine learning in 10 pictures is a short yet great article to start with.
- Xavier Amatriain, one of the minds behind Netflix’s famous recommendation system, explains the advantages of different classification algorithms.
- Don’t miss this list of machine learning podcasts.
- Introduction to Recommender Systems is a 4-hour lecture from the 2014 Machine Learning Summer School at CMU. You can find other interesting machine learning lectures from the same summer school and other programs on Alex Smola’s YouTube channel.
- The Elements of Statistical Learning is a classic book, ideal for understanding the foundations of many machine learning methods.
- Text Mining with WEKA cookbook, for those who prefer Java.
- “Machine Learning Gremlins” is a presentation on common machine learning mistakes by Ben Hamner (Kaggle).
- Because we don’t always need exact answers, this introduction to stream mining by Mikio Braun can be very useful to you.
- If you want a wider vision of artificial intelligence, watch these lectures from the AI course taught at MIT by Patrick Winston.
- The lectures of the course “CS273a: Introduction to Machine Learning” by Prof. Alex Ihler (UCI) are available on YouTube.
- Choosing a machine learning model can be a cumbersome task. That’s why we have automatic machine learning to assist model selection. These slides are a good entry point to it.
- For a good picture of the state of the art in neural networks and deep learning, you can find tutorials and workshops from the NIPS 2014 conference on this YouTube channel. You can also read this summary of the conference by John Platt (Microsoft Research).

- A pretty handy resource to explain statistical significance: How to Assess Statistical Significance.
- Top 10 big ideas covered in the Probability course at Harvard by Joe Blitzstein. You can also watch the lectures of this course on YouTube.
- Learn more about errors in hypothesis testing (statistical significance and power) from this lecture on Data Collection and Statistical Inference by Aaron Gullickson.
- What to do when data is missing? Learn what statisticians working in the clinical trials field do.
- Introduction to Time Series Analysis from the book Engineering Statistics.
- This article talks about how to optimize decisions beyond A/B testing, including an introduction to the multi-armed bandit problem and the epsilon-greedy strategy.
- Jeff Rajeck has a series of posts titled “using data science with A/B tests”. I particularly enjoyed the one covering Bayesian analysis.
- Brian Caffo is one of the lecturers of the Data Science specialization on Coursera and his YouTube channel is full of resources to learn statistics.
- Some statistical concepts that data scientists usually overlook, presented by Chris Fonnesbeck at SciPy 2015.
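
The epsilon-greedy strategy mentioned in the list above can be sketched in a few lines of Python. This is only an illustrative toy, not the code from the linked article: the function name `epsilon_greedy`, the simulated conversion rates and the default parameters are all assumptions made for the example. With probability epsilon the strategy explores a random variant; otherwise it exploits the variant with the best observed success rate so far.

```python
import random


def epsilon_greedy(true_rates, epsilon=0.1, rounds=10_000, seed=0):
    """Simulate an epsilon-greedy bandit over variants with Bernoulli rewards.

    true_rates: hypothetical success probability of each variant (unknown to
    the strategy; used here only to simulate user responses).
    Returns (pulls, means): how often each variant was shown and the
    estimated success rate of each variant.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms    # times each variant was shown
    means = [0.0] * n_arms  # running estimate of each variant's success rate

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick a variant uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the variant with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: means[a])

        # Simulated outcome: 1.0 on success, 0.0 otherwise.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0

        # Incremental update of the running mean for the chosen variant.
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]

    return pulls, means


if __name__ == "__main__":
    pulls, means = epsilon_greedy([0.05, 0.9])
    print("pulls per variant:", pulls)
    print("estimated rates:", [round(m, 3) for m in means])
```

Unlike a classic A/B test, which splits traffic evenly until the experiment ends, this approach shifts most of the traffic to the better-performing variant while the experiment is still running.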

Once you are familiar with Python, the following resources for machine learning and data analysis can take your skills to the next level:

- Video tutorials by Kevin Markham on how to use Python’s scikit-learn library to perform machine learning.
- A 3h+ in-depth introduction to machine learning with scikit-learn by Kyle Kastner (Université de Montréal) and Andreas Mueller (NYU Center for Data Science).
- Machine learning cheat sheet for scikit-learn by Andreas Mueller.
- If you are interested in using neural networks in Python, Daniel Nouri explains how to solve the Facial Keypoint Detection Kaggle challenge using L....
- If you don’t have a technical background, you’ll find the scripts in Practical Business Python very useful.
- Notebook Gallery: links to the best IPython and Jupyter notebooks submitted by users.
- Recipes of the IPython Cookbook include excellent examples of how to use NumPy, scikit-learn and many other packages.
- Code snippets of some of the most common operations with Pandas.
- Make your first machine learning predictions using Python with this Kaggle tutorial.
- NLTK is the most popular library for natural language processing in Python. This presentation gives you a good overview of what you can do with it, and this 1-hour tutorial shows you how to use it.
- PyDataTV is the YouTube channel of the PyData conferences. You can find keynotes, talks and workshops on how to use the PyData stack.

I’ve been trying hard to like R. In fact, I’ve been trying for more than 5 years, and I simply prefer Python. In any case, I still frequently launch an R prompt to use some of the fantastic packages that R has.

- Intro to R is a playlist by Google Developers that explains all the basics of the language.
- Kaggle top ranker Xavier Conort listed 10 R Packages to win Kaggle competitions. That’s a good way to discover some very prominent R packages.
- An Introduction to Statistical Learning with Applications in R is a terrific free book full of examples.
- “R: the good parts” is an article by Jose Quesada (Data Science Retreat) that lists the main advantages of R with links to other good resources.
- Archetypal analysis is not usually taught in introductory machine learning courses. This post explains how to apply it and shows that it outperforms k-means in a number of cases. Plus, archetypal analysis is easier to interpret.
- AnomalyDetection and BreakoutDetection: open source R packages for time-series analysis by Twitter.
- qdap is not only one of the best packages for natural language processing in R, but also one of the best documented. Use the vignette to get started with it and move on to the manual later.
- I know from my own experience that R’s memory limitations can give you a headache. These tricks are sometimes an effective painkiller. The slides “Taking R to the Limit: Large Datasets” might also help.
- statsTeachR is a repository of lessons for teaching statistics using R.
- Make your first machine learning predictions using R with any of these four tutorials.

To end with, some examples of how data science and machine learning can be used to add value to your organization:

- Jeff Leek (Johns Hopkins University) shared some interesting lessons in his post 10 things statistics taught us about big data analysis.
- What can data science do for entrepreneurs? Growth, retention, product customization and marketing optimization.
- How to start data science initiatives in a lean and cost-effective way.
- This paper explains how Booking.com uses crowdsourced data and machine learning to suggest the destination of the next trip.
- Someone asked how Quora uses machine learning, and the answers are very representative of how a website can benefit from using it.
- Airbnb guest requests became 4% more likely to be accepted after the company used collaborative filtering to predict hosts’ behavior. They also used data to understand what their users want and show more relevant results to ....
- This paper describes how to do customer segmentation for customer retention using decision trees.
- How Spotify uses deep learning to recommend music is well documented in this post.
- How Google transcribed house numbers from Street View using neural networks.
- Predicting consumer credit-risk performance at the beginning of the... (Paper).

© 2020 Data Science Central ®
