Lecture notes for the Statistical Machine Learning course taught at the Department of Information Technology, Uppsala University, Sweden. Updated in March 2019. Authors: Andreas Lindholm, Niklas Wahlström, Fredrik Lindsten, and Thomas B. Schön.

*Source: page 61 in these lecture notes*

Available as a PDF here (original) or here (mirror).

**Contents**

**1 Introduction** 7

1.1 What is machine learning all about?

1.2 Regression and classification

1.3 Overview of these lecture notes

1.4 Further reading

**2 The regression problem and linear regression** 11

2.1 The regression problem

2.2 The linear regression model

- Describe relationships — classical statistics
- Predicting future outputs — machine learning

2.3 Learning the model from training data

- Maximum likelihood
- Least squares and the normal equations

2.4 Nonlinear transformations of the inputs – creating more features

2.5 Qualitative input variables

2.6 Regularization

- Ridge regression
- LASSO
- General cost function regularization

2.7 Further reading

2.A Derivation of the normal equations

- A calculus approach
- A linear algebra approach
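As a quick illustration of the normal equations listed under Section 2.3 and derived in Appendix 2.A, here is a minimal sketch in NumPy with made-up toy data (this is not code from the notes):

```python
import numpy as np

# Toy data generated from y = 1 + 2x with no noise, so least squares
# should recover the coefficients exactly.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])          # first column of ones models the intercept
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Normal equations: (X^T X) theta = X^T y.
# Solving the linear system is preferred over forming the inverse explicitly.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)   # recovers intercept 1 and slope 2
```

In practice `np.linalg.lstsq` (or a QR factorization) is numerically safer than forming `X.T @ X`, which squares the condition number of the problem.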

**3 The classification problem and three parametric classifiers** 25

3.1 The classification problem

3.2 Logistic regression

- Learning the logistic regression model from training data
- Decision boundaries for logistic regression
- Logistic regression for more than two classes

3.3 Linear and quadratic discriminant analysis (LDA & QDA)

- Using Gaussian approximations in Bayes’ theorem
- Using LDA and QDA in practice

3.4 Bayes’ classifier — a theoretical justification for turning p(y | x) into ŷ

- Bayes’ classifier
- Optimality of Bayes’ classifier
- Bayes’ classifier in practice: useless, but a source of inspiration
- Is it always good to predict according to Bayes’ classifier?

3.5 More on classification and classifiers

- Regularization
- Evaluating binary classifiers

**4 Non-parametric methods for regression and classification: k-NN and trees** 43

4.1 k-NN

- Decision boundaries for k-NN
- Choosing k
- Normalization

4.2 Trees

- Basics
- Training a classification tree
- Other splitting criteria
- Regression trees
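The k-NN classifier outlined in Section 4.1 can be sketched in a few lines; the following is a minimal illustration with made-up toy data (assumed example, not code from the notes):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Two well-separated classes on the real line (one feature per point)
X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.15])))  # class 0
print(knn_classify(X_train, y_train, np.array([1.05])))  # class 1
```

The sketch also hints at why the normalization subsection matters: with unscaled features, the Euclidean distance is dominated by whichever input has the largest range.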

**5 How well does a method perform?** 53

5.1 Expected new data error Enew: performance in production

5.2 Estimating Enew

- Etrain ≉ Enew: We cannot estimate Enew from training data
- Etest ≈ Enew: We can estimate Enew from test data
- Cross-validation: Eval ≈ Enew without setting aside test data

5.3 Understanding Enew

- Enew = Etrain + generalization error
- Enew = bias² + variance + irreducible error

**6 Ensemble methods** 67

6.1 Bagging

- Variance reduction by averaging
- The bootstrap

6.2 Random forests

6.3 Boosting

- The conceptual idea
- Binary classification, margins, and exponential loss
- AdaBoost
- Boosting vs. bagging: base models and ensemble size
- Robust loss functions and gradient boosting

6.A Classification loss functions

**7 Neural networks and deep learning** 83

7.1 Neural networks for regression

- Generalized linear regression
- Two-layer neural network
- Matrix notation
- Deep neural network
- Learning the network from data

7.2 Neural networks for classification

- Learning classification networks from data

7.3 Convolutional neural networks

- Data representation of an image
- The convolutional layer
- Condensing information with strides
- Multiple channels
- Full CNN architecture

7.4 Training a neural network

- Initialization
- Stochastic gradient descent
- Learning rate
- Dropout
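The stochastic gradient descent procedure listed under Section 7.4 can be sketched on a one-parameter toy problem; this is a minimal illustration with assumed synthetic data, not code from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from the model y = 3x; SGD minimizes squared error
# one sample at a time (mini-batch size 1).
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x

theta = 0.0            # initialization
lr = 0.1               # learning rate (step size)
for epoch in range(20):
    for i in rng.permutation(len(x)):            # reshuffle each epoch
        grad = 2 * (theta * x[i] - y[i]) * x[i]  # d/dtheta of (theta*x - y)^2
        theta -= lr * grad                       # gradient step
print(theta)   # converges to (approximately) 3
```

Each step follows the gradient of a single sample's loss rather than the full training loss, which is what makes the method cheap per iteration and scalable to large datasets and networks.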

7.5 Perspective and further reading

**A Probability theory** 101

A.1 Random variables

- Marginalization
- Conditioning

A.2 Approximating an integral with a sum

**B Unconstrained numerical optimization** 105

B.1 A general iterative solution

B.2 Commonly used search directions

- Steepest descent direction
- Newton direction
- Quasi-Newton

B.3 Further reading

Bibliography

© 2020 Data Science Central
