Free Book: Classification and Regression In a Weekend

By Ajit Jaokar and Dan Howarth. With contributions from Ayse Mutlu.

Available exclusively to Data Science Central members, free of charge. You can download the book (PDF) here.

This tutorial began as a series of weekend workshops created by Ajit Jaokar and Dan Howarth. The idea was to work with a specific (longish) program and explore as much of it as possible in one weekend. This book is an attempt to take that idea online. The best way to use it is to work with the Python code as much as you can. The code is commented, and you can extend those comments with the concepts explained here.

Content

1. Introduction and approach 4

2. Background, tools and philosophy 6

  • What will you learn from this book? 6
  • Components for book 7
  • Big Picture Diagram 7

3. Code outline 7

  • Regression code outline 7
  • Classification Code Outline 8

4. Exploratory data analysis and graphics 8

  • Numeric descriptive statistics 8
  • Interpreting descriptive statistics 9
  • Understanding the distribution 10
  • Histograms 10
  • Boxplots and IQR 10
  • Correlation 11
  • Heatmaps for correlation 12
  • Analysing the target variable 13

5. Pre-processing data 13

  • Dealing with missing values 13
  • Treatment of categorical values 13
  • Normalise the data 14
  • Split the data 15

6. Choose a Baseline algorithm 15

  • Defining / instantiating the baseline model 15
  • Fitting the model we have developed to our training set 16
  • Define the evaluation metric 16
  • Predict scores against our test set and assess how good it is 18

7. Evaluation metrics for classification 18

  • Improving a model – from baseline models to final models 21
  • Understanding cross validation 21
  • Feature engineering 24
  • Regularization to prevent overfitting 24
  • Ensembles – typically for classification 26
  • Test alternative models 27
  • Hyperparameter tuning 28

8. Conclusion 28

A1. Regression Code 29

A2. Classification Code 36

To access the book if you are not yet a DSC member, you can register as a member by following this link.

Comment

Comment by Thomas Goronflot on July 16, 2019 at 11:25pm

Hi

I read this free book quickly out of curiosity and spotted an error in the formula for accuracy (p34).

The correct formula is: Accuracy = (TP + TN) / n

Hope it helps.
Regards
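
As a minimal sketch of the accuracy formula quoted in the comment above, assuming n is the total number of predictions (TP + TN + FP + FN). The function name `accuracy` and the example counts are hypothetical and are not taken from the book:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct: (TP + TN) / n."""
    # Assumption: n is the total number of predictions in the confusion matrix.
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / n


# Hypothetical counts, chosen only for illustration.
example = accuracy(tp=40, tn=45, fp=5, fn=10)
print(example)  # 0.85
```

With these counts, 85 of the 100 predictions are correct, so the accuracy is 0.85.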

© 2019 Data Science Central ®