A summary of the book “Introduction to Statistical Learning” in Jupyter notebooks
Whenever someone asks me “How do I get started in data science?”, I usually recommend the book “Introduction to Statistical Learning by Daniela Witten, Trevor Hast…” to learn the basics of statistics and ML models.
Understandably, completing a technical book while practicing with relevant data and code is a challenge for a lot of us.
So I have created this course on statistical machine learning in Python as a concise summary of the book and hosted it in the GitHub repository Introduction_to_statistical_learning_summary_python.
In the repository, each chapter of the book has been translated into a Jupyter notebook with a summary of the key concepts, plus data & Python code to practice with.
If you want to quickly understand the book, or learn statistical machine learning and/or Python for data science, then just click here & start learning!
Notebook: Chapter 2: Statistical Learning explains
 What Is Statistical Learning?
 Assessing Model Accuracy
 Introduction to the Python programming language
Notebook: Chapter 3: Linear Regression explains
 Linear Regression (LR): simple and multiple
 Qualitative Predictors in LR
 Nonlinear Transformations of the Predictors
 Potential Problems with least squares linear regression
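As a small taste of what the Chapter 3 notebook covers, here is a minimal sketch of simple linear regression fitted by least squares in plain Python. The toy numbers and the function name `fit_simple_linear_regression` are made up for illustration; they do not come from the book or the notebooks.

```python
# Toy data, invented for illustration (NOT the book's Advertising data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimise the residual sum of squares."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_linear_regression(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

In practice you would likely reach for a library such as statsmodels or scikit-learn, but writing the formula out once makes it clearer what those libraries compute.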
Notebook: Chapter 4: Classification explains
 Classification Overview
 Logistic Regression
 Linear Discriminant Analysis (LDA)
 Quadratic Discriminant Analysis (QDA)
 K-nearest neighbours
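To make the last topic concrete, here is a minimal sketch of a K-nearest neighbours classifier in plain Python. The data points, labels and the function name `knn_predict` are hypothetical and only meant to show the idea: find the k closest training points and take a majority vote.

```python
import math
from collections import Counter

# Toy training data, invented for illustration: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

An odd k is used here so that a two-class vote cannot end in a tie.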
Notebook: Chapter 5: Resampling Methods explains
 Cross-Validation
 The Validation Set Approach
 Leave-One-Out Cross-Validation
 k-Fold Cross-Validation
 The Bootstrap
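The resampling ideas above are easier to follow with a small example. Below is a minimal sketch of how k-fold cross-validation splits the data, written in plain Python. The function name `k_fold_indices` and its `seed` parameter are assumptions made for this illustration, not something taken from the notebooks.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Shuffle once so each fold is a random subset of the observations.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


for training, validation in k_fold_indices(n_samples=10, k=5):
    print("train:", sorted(training), "validate:", sorted(validation))
```

Each observation lands in the validation set exactly once. Setting `k` equal to the number of samples gives leave-one-out cross-validation.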
Notebook: Chapter 6: Linear Model Selection and Regularization explains
 Subset Selection Models
 Best Subset Selection
 Forward Stepwise Selection
 Backward Stepwise Selection
 Shrinkage Methods
 Ridge Regression
 The Lasso
 Dimension Reduction Methods: PCR and PLS Regression
 Principal Components Regression
 Partial Least Squares
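To give a feel for what "shrinkage" means in this chapter, here is a minimal sketch of ridge regression with a single predictor in plain Python. It assumes the intercept is not penalised and skips the standardisation step normally applied before ridge regression; the toy data and the name `ridge_slope` are made up for illustration.

```python
# Toy data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]


def ridge_slope(x, y, lam):
    """Slope minimising sum((y - b0 - b1*x)^2) + lam * b1^2 for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    # The penalty adds lam to the denominator, pulling the slope towards zero.
    return sxy / (sxx + lam)


for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:>5}: slope={ridge_slope(x, y, lam):.3f}")
```

With `lam = 0` this is the ordinary least squares slope; as `lam` grows, the slope shrinks towards zero but never reaches it exactly.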
Note: Chapters 7, 8, 9 and 10 will be added soon.
About the book:
“This book is intended for anyone who is interested in using modern statistical methods for modeling and prediction from data. This group includes scientists, engineers, data analysts, or quants, but also less technical individuals with degrees in non-quantitative fields such as the social sciences or business. We expect that the reader will have had at least one elementary course in statistics.”
I recommend this book because:

This book (and the derived notebooks in this repo) marries statistical machine learning concepts with real-life data science problem statements. Each chapter/concept begins with a real scenario, like “You are a consultant who needs to advise on the best advertising medium & budgets to increase the sales of a product, using the advertising data”, and explains techniques and methods step by step as we work through it.
 It gives a modest introduction to the statistics and mathematics behind the most used methods, like
 regression,
 classification,
 decision trees,
 SVM,
 clustering,
 unsupervised learning,
 resampling,
 cross-validation methods,
 dimension reduction methods.
 It also provides a lab section at the end of each chapter, with R code snippets & various libraries that come in handy to analyze data, build models, and test them. This repo gives the same code in Python, so you are covered either way! This will help you get started and equip you to test out the given methods & models on your own data.
A few important concepts it doesn’t touch at all:
 Time series data models
 Neural networks
 Deep learning
 Bayesian methods
This is an independent part of my blog series, Data science for analytical minds, which serves as a resource for people, especially from non-technical backgrounds like economics, statistics, mathematics, physics etc., to learn the different components of data science through real-life problem statements.
Check out its introduction blog & the data quality & cleaning blog. This is the 3rd part of the series, focusing on statistics & machine learning basics.
This is meant to give you a quick head start with the most used statistical concepts, with data and code to play with. For a deeper understanding of any concept, I recommend referring back to the book.
If you find any problems or have questions, feel free to open an issue in the repository.
If you have any generic feedback, ideas to collaborate or anything interesting to say, you can reach me at shilpaarora992[at]gmail[dot]com.