# Lecture notes for the Statistical Machine Learning course taught at the Department of Information Technology, Uppsala University, Sweden. Updated in March 2019. Authors: Andreas Lindholm, Niklas Wahlström, Fredrik Lindsten, and Thomas B. Schön.

Available as a PDF, here (original) or here (mirror).

Contents

1 Introduction

1.1 What is machine learning all about?
1.2 Regression and classification
1.3 Overview of these lecture notes

2 The regression problem and linear regression

2.1 The regression problem
2.2 The linear regression model

• Describe relationships — classical statistics
• Predicting future outputs — machine learning

2.3 Learning the model from training data

• Maximum likelihood
• Least squares and the normal equations

2.4 Nonlinear transformations of the inputs – creating more features
2.5 Qualitative input variables
2.6 Regularization

• Ridge regression
• LASSO
• General cost function regularization

2.A Derivation of the normal equations

• A calculus approach
• A linear algebra approach
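
As a small taste of Chapter 2, the sketch below (assuming NumPy; the data is synthetic and purely illustrative, not taken from the notes) solves the normal equations XᵀXθ = Xᵀy for a least-squares fit:

```python
import numpy as np

# Synthetic data for illustration: y = 2 + 3x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=100)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: theta_hat solves (X^T X) theta = X^T y.
# A linear solver is preferred over forming the inverse explicitly.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)  # roughly [2.0, 3.0]
```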

3 The classification problem and three parametric classifiers

3.1 The classification problem
3.2 Logistic regression

• Learning the logistic regression model from training data
• Decision boundaries for logistic regression
• Logistic regression for more than two classes

3.3 Linear and quadratic discriminant analysis (LDA & QDA)

• Using Gaussian approximations in Bayes’ theorem
• Using LDA and QDA in practice

3.4 Bayes’ classifier — a theoretical justification for turning p(y | x) into ŷ

• Bayes’ classifier
• Optimality of Bayes’ classifier
• Bayes’ classifier in practice: useless, but a source of inspiration
• Is it always good to predict according to Bayes’ classifier?

3.5 More on classification and classifiers

• Regularization
• Evaluating binary classifiers
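
To hint at what Chapter 3 covers, here is a minimal logistic regression sketch (an editorial illustration, not code from the notes): maximum likelihood via gradient descent on a toy two-class problem with labels in {0, 1}.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linearly separable data, purely illustrative.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Gradient descent on the average negative log-likelihood.
theta = np.zeros(2)
for _ in range(1000):
    p = sigmoid(X @ theta)                 # model's P(y = 1 | x)
    theta -= 0.1 * X.T @ (p - y) / len(y)  # gradient step

print(theta)  # decision boundary: the line theta^T x = 0
```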

4 Non-parametric methods for regression and classification: k-NN and trees

4.1 k-NN

• Decision boundaries for k-NN
• Choosing k
• Normalization

4.2 Trees

• Basics
• Training a classification tree
• Other splitting criteria
• Regression trees
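
Chapter 4's k-NN classifier is simple enough to sketch in a few lines (a hedged illustration, assuming NumPy and Euclidean distance):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    return np.bincount(y_train[nearest]).argmax()    # most common label wins

# Tiny illustrative data set.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9])))  # -> 1
```

As the "Normalization" item above suggests, inputs are usually rescaled before computing distances; this sketch omits that step.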

5 How well does a method perform?

5.1 Expected new data error Enew: performance in production
5.2 Estimating Enew

• Etrain ≉ Enew: We cannot estimate Enew from training data
• Etest ≈ Enew: We can estimate Enew from test data
• Cross-validation: Eval ≈ Enew without setting aside test data

5.3 Understanding Enew

• Enew = Etrain + generalization error
• Enew = bias² + variance + irreducible error
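
Section 5.2's cross-validation idea can be sketched generically. In the sketch below, `fit`, `predict`, and `error` are hypothetical user-supplied functions (not names from the notes); the average validation error over the folds serves as an estimate of Enew.

```python
import numpy as np

def cross_validate(X, y, fit, predict, error, k=5, seed=0):
    """Estimate E_new by averaging validation error over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    errors = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)  # everything but this fold
        model = fit(X[train], y[train])               # train on the other k-1 folds
        errors.append(error(y[val], predict(model, X[val])))  # held-out error
    return np.mean(errors)
```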

6 Ensemble methods

6.1 Bagging

• Variance reduction by averaging
• The bootstrap

6.2 Random forests
6.3 Boosting

• The conceptual idea
• Binary classification, margins, and exponential loss
• Boosting vs. bagging: base models and ensemble size
• Robust loss functions and gradient boosting

6.A Classification loss functions
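
Section 6.1's bagging amounts to averaging base models trained on bootstrap resamples. A minimal sketch (again with hypothetical `fit` and `predict` hooks, not names from the notes):

```python
import numpy as np

def bagged_predict(X_train, y_train, X_new, fit, predict, B=100, seed=0):
    """Average B base-model predictions, each trained on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = []
    for _ in range(B):
        boot = rng.integers(0, n, size=n)         # draw n indices with replacement
        model = fit(X_train[boot], y_train[boot]) # train on the bootstrap sample
        preds.append(predict(model, X_new))
    return np.mean(preds, axis=0)                 # variance reduction by averaging
```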

7 Neural networks and deep learning

7.1 Neural networks for regression

• Generalized linear regression
• Two-layer neural network
• Matrix notation
• Deep neural network
• Learning the network from data

7.2 Neural networks for classification

• Learning classification networks from data

7.3 Convolutional neural networks

• Data representation of an image
• The convolutional layer
• Condensing information with strides
• Multiple channels
• Full CNN architecture

7.4 Training a neural network

• Initialization
• Learning rate
• Dropout
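
As a taste of Section 7.1, a two-layer regression network is generalized linear regression with a learned feature layer. A minimal forward-pass sketch (assuming ReLU hidden units and random illustrative weights; the notes' own choice of activation may differ):

```python
import numpy as np

def two_layer_forward(x, W1, b1, W2, b2):
    """Forward pass: hidden layer ReLU(W1 x + b1), then a linear output layer."""
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2

# Illustrative dimensions: 3 inputs, 5 hidden units, 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)
print(two_layer_forward(rng.normal(size=3), W1, b1, W2, b2))
```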

A Probability theory

A.1 Random variables

• Marginalization
• Conditioning

A.2 Approximating an integral with a sum

B Unconstrained numerical optimization

B.1 A general iterative solution
B.2 Commonly used search directions

• Steepest descent direction
• Newton direction
• Quasi-Newton
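
Appendix B's general iterative scheme updates θ ← θ + γd along a search direction d; with the steepest-descent direction d = −∇f(θ) it looks like the minimal sketch below (the fixed step length γ is a simplifying assumption of this illustration):

```python
import numpy as np

def steepest_descent(grad, theta0, gamma=0.1, iters=100):
    """Iterate theta <- theta - gamma * grad(theta): the steepest-descent direction."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        theta -= gamma * grad(theta)
    return theta

# Minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
print(steepest_descent(lambda t: 2.0 * t, [3.0, -2.0]))  # -> close to [0, 0]
```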

Bibliography
