Book: Mastering Machine Learning with Python in Six Steps

A Practical Implementation Guide to Predictive Data Analytics Using Python

Covers basic to advanced topics in an easy step-oriented manner
Concise on theory, with a strong focus on a practical, hands-on approach
Explores advanced topics such as hyperparameter tuning, deep natural language processing, neural networks, and deep learning
Describes state-of-the-art best practices for tuning models to improve accuracy

About The Book:

This book is your practical guide to going from novice to master in machine learning with Python in six steps. The six-step path is based on the “six degrees of separation” theory, which states that everyone and everything is at most six steps away. Note that the theory deals with the quality of connections rather than their existence. Accordingly, great effort has gone into designing six simple yet well-chosen steps that move gradually from fundamentals to advanced topics, helping a beginner progress from little or no knowledge of machine learning in Python all the way to becoming a master practitioner. This book also helps current machine learning practitioners learn advanced topics such as hyperparameter tuning, various ensemble techniques, natural language processing (NLP), deep learning, and the basics of reinforcement learning.

Each topic has two parts: the first covers the theoretical concepts, and the second covers practical implementation with different Python packages. The traditional math-first approach to machine learning, i.e., learning all the mathematics and then understanding how to apply it to solve problems, takes a great deal of time and effort, which has proven inefficient for working professionals looking to switch careers. Hence this book focuses on simplification: the theory and math behind the algorithms are covered only to the extent required to get you started.

I recommend that you work through the book rather than just read it. Real learning happens only through active participation. Hence, all the code presented in the book is available as IPython notebooks, so you can try out the examples yourself and extend them to suit your needs or interests.

What You’ll Learn:

Examine the fundamentals of the Python programming language
Review the history and evolution of machine learning
Learn various machine learning system development frameworks
Learn fundamentals to advanced text mining techniques
Learn and implement deep learning frameworks

Who This Book Is For:

This book will serve as a great resource for learning machine learning concepts and implementation techniques for:

Python developers or data engineers looking to expand their knowledge or career into the machine learning area.
Current non-Python (R, SAS, SPSS, Matlab, or any other language) machine learning practitioners looking to expand their implementation skills in Python.
Novice machine learning practitioners looking to learn advanced topics such as hyperparameter tuning, various ensemble techniques, Natural Language Processing (NLP), deep learning, and basics of reinforcement learning.

Contents at a Glance

Introduction
Chapter 1: Step 1 – Getting Started in Python
Chapter 2: Step 2 – Introduction to Machine Learning
Chapter 3: Step 3 – Fundamentals of Machine Learning
Chapter 4: Step 4 – Model Diagnosis and Tuning
Chapter 5: Step 5 – Text Mining and Recommender Systems
Chapter 6: Step 6 – Deep and Reinforcement Learning
Chapter 7: Conclusion

Table of Contents

INTRODUCTION

CHAPTER 1: STEP 1 – GETTING STARTED IN PYTHON

The Best Things in Life Are Free
The Rising Star
Python 2.7.x or Python 3.4.x?
- Windows Installation
- OSX Installation
- Linux Installation
- Python from Official Website
- Running Python
Key Concepts
- Python Identifiers
- Keywords
- My First Python Program
- Code Blocks (Indentation & Suites)
- Basic Object Types
- When to Use List vs. Tuples vs. Set vs. Dictionary
- Comments in Python
- Multiline Statement
- Basic Operators
- Control Structure
- Lists
- Tuple
- Sets
- Dictionary
- User-Defined Functions
- Module
- File Input/Output
- Exception Handling
Endnotes

CHAPTER 2: STEP 2 – INTRODUCTION TO MACHINE LEARNING

History and Evolution
Artificial Intelligence Evolution
Different Forms
- Statistics
- Data Mining
- Data Analytics
- Data Science
- Statistics vs. Data Mining vs. Data Analytics vs. Data Science
Machine Learning Categories
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Frameworks for Building Machine Learning Systems
- Knowledge Discovery Databases (KDD)
- Cross-Industry Standard Process for Data Mining
- SEMMA (Sample, Explore, Modify, Model, Assess)
- KDD vs. CRISP-DM vs. SEMMA
Machine Learning Python Packages
- Data Analysis Packages
  - NumPy
  - Pandas
  - Matplotlib
- Machine Learning Core Libraries
Endnotes

CHAPTER 3: STEP 3 – FUNDAMENTALS OF MACHINE LEARNING

Machine Learning Perspective of Data
Scales of Measurement
- Nominal Scale of Measurement
- Ordinal Scale of Measurement
- Interval Scale of Measurement
- Ratio Scale of Measurement
Feature Engineering
- Dealing with Missing Data
- Handling Categorical Data
- Normalizing Data
- Feature Construction or Generation
Exploratory Data Analysis (EDA)
- Univariate Analysis
- Multivariate Analysis
Supervised Learning – Regression
- Correlation and Causation
- Fitting a Slope
- How Good Is Your Model?
- Polynomial Regression
- Multivariate Regression
- Multicollinearity and Variance Inflation Factor (VIF)
- Interpreting the OLS Regression Results
- Regression Diagnosis
- Regularization
- Nonlinear Regression
Supervised Learning – Classification
- Logistic Regression
- Evaluating a Classification Model Performance
- ROC Curve
- Fitting Line
- Stochastic Gradient Descent
- Regularization
- Multiclass Logistic Regression
- Generalized Linear Models
Supervised Learning – Process Flow
- Decision Trees
- Support Vector Machine (SVM)
- k Nearest Neighbors (kNN)
- Time-Series Forecasting
Unsupervised Learning Process Flow
- Clustering
- K-means
- Finding Value of k
- Hierarchical Clustering
- Principal Component Analysis (PCA)
Endnotes

CHAPTER 4: STEP 4 – MODEL DIAGNOSIS AND TUNING

Optimal Probability Cutoff Point
- Which Error Is Costly?
Rare Event or Imbalanced Dataset
- Known Disadvantages
Which Resampling Technique Is the Best?
Bias and Variance
- Bias
- Variance
K-Fold Cross-Validation
Stratified K-Fold Cross-Validation
Ensemble Methods
Bagging
- Feature Importance
- RandomForest
- Extremely Randomized Trees (ExtraTree)
- How Does the Decision Boundary Look?
- Bagging – Essential Tuning Parameters
Boosting
- Example Illustration for AdaBoost
- Gradient Boosting
- Boosting – Essential Tuning Parameters
- Xgboost (eXtreme Gradient Boosting)
Ensemble Voting – Machine Learning’s Biggest Heroes United
- Hard Voting vs. Soft Voting
Stacking
Hyperparameter Tuning
- GridSearch
- RandomSearch
Endnotes

CHAPTER 5: STEP 5 – TEXT MINING AND RECOMMENDER SYSTEMS

Text Mining Process Overview
Data Assemble (Text)
- Social Media
- Step 1 – Get Access Key (One-Time Activity)
- Step 2 – Fetching Tweets
Data Preprocessing (Text)
- Convert to Lower Case and Tokenize
- Removing Noise
- Part of Speech (PoS) Tagging
- Stemming
- Lemmatization
- N-grams
- Bag of Words (BoW)
- Term Frequency-Inverse Document Frequency (TF-IDF)
Data Exploration (Text)
- Frequency Chart
- Word Cloud
- Lexical Dispersion Plot
- Co-occurrence Matrix
Model Building
Text Similarity
Text Clustering
- Latent Semantic Analysis (LSA)
Topic Modeling
- Latent Dirichlet Allocation (LDA)
- Non-negative Matrix Factorization
Text Classification
Sentiment Analysis
Deep Natural Language Processing (DNLP)
Recommender Systems
- Content-Based Filtering
- Collaborative Filtering (CF)
Endnotes

CHAPTER 6: STEP 6 – DEEP AND REINFORCEMENT LEARNING

Artificial Neural Network (ANN)
What Goes Behind, When Computers Look at an Image?
Why Not a Simple Classification Model for Images?
Perceptron – Single Artificial Neuron
Multilayer Perceptrons (Feedforward Neural Network)
- Load MNIST Data
- Key Parameters for scikit-learn MLP
Restricted Boltzmann Machines (RBM)
MLP Using Keras
Autoencoders
- Dimension Reduction Using Autoencoder
- De-noise Image Using Autoencoder
Convolutional Neural Network (CNN)
- CNN on CIFAR10 Dataset
- CNN on MNIST Dataset
Recurrent Neural Network (RNN)
- Long Short-Term Memory (LSTM)
Transfer Learning
Reinforcement Learning
Endnotes

CHAPTER 7: CONCLUSION

Summary
Tips
- Start with Questions/Hypothesis Then Move to Data!
- Don’t Reinvent the Wheels from Scratch
- Start with Simple Models
- Focus on Feature Engineering
- Beware of Common ML Imposters
Happy Machine Learning