
Probabilistic Machine Learning book – a great free reference for the maths of machine learning


At the #universityofoxford, I focus a lot on the mathematical aspects of AI.

I recommend eight books for the mathematics of AI:

  1. The Nature of Statistical Learning Theory by Vladimir Vapnik
  2. Pattern Classification by Richard O. Duda
  3. Machine Learning: An Algorithmic Perspective, Second Edition by Stephen Marsland
  4. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani, Jerome Friedman
  5. Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher M. Bishop
  6. Machine Learning: The Art and Science of Algorithms that Make Sense of Data by Peter Flach
  7. Deep Learning by Goodfellow, Bengio and Courville
  8. Machine Learning: A Probabilistic Perspective by Kevin Murphy

Now there is a new version of Machine Learning: A Probabilistic Perspective by Kevin Murphy, titled Probabilistic Machine Learning: An Introduction.

The original is an amazing book, and it was last published in 2012.

The structure of the book and the link to access it are below.

If you can, I recommend buying this book, as I will, because making it freely available is very generous of Kevin Murphy and MIT Press.

The structure is very detailed, and the book takes a Bayesian perspective.


               Probabilistic inference  


               Bayes’ rule         

               Bayesian concept learning           

               Bayesian machine learning          

               Probabilistic models      

               Bernoulli and binomial distributions        

               Categorical and multinomial distributions             

               Univariate Gaussian (normal) distribution             

               Some other common univariate distributions      

               The multivariate Gaussian (normal) distribution  

               Linear Gaussian systems              

               Mixture models

               Probabilistic graphical models    

               Parameter estimation   


               Maximum likelihood estimation (MLE)   

               Empirical risk minimization (ERM)            


               The method of moments             

               Online (recursive) estimation     

               Parameter uncertainty   

               Optimization algorithms             

               First-order methods       

               Second-order methods  

               Stochastic gradient descent        

               Constrained optimization            

               Proximal gradient method           

               Bound optimization        

               Blackbox and derivative free optimization            

               Information theory        


               Relative entropy (KL divergence)              

               Mutual information        

               Bayesian statistics          


               Conjugate priors              

               Noninformative priors   

               Hierarchical priors          

               Empirical priors

               Bayesian model comparison       

               Approximate inference algorithms           

               Bayesian decision theory            

               Bayesian decision theory             

               A/B testing         

               Bandit problems              

II             Linear models   

               Linear discriminant analysis       


               Gaussian discriminant analysis   

               Naive Bayes classifiers   

               Generative vs discriminative classifiers   

               Logistic regression          


               Binary logistic regression             

               Multinomial logistic regression  

               Preprocessing discrete input data            

               Robust logistic regression            

               Bayesian logistic regression        

               Linear regression            


               Standard linear regression           

               Ridge regression              

               Robust linear regression              

               Lasso regression              

               Bayesian linear regression           

               Generalized linear models          


               The exponential family  

               Generalized linear models (GLMs)            

               Probit regression             

III           Deep neural networks  

               Neural networks for unstructured data


               Multilayer perceptrons (MLPs)   


               Training neural networks             


               Other kinds of feedforward networks     

               Neural networks for images      



               Image classification using CNNs 

               Solving other discriminative vision tasks with CNNs         

               Generating images by inverting CNNs     

               Adversarial Examples     

               Neural networks for sequences


               Recurrent neural networks (RNNs)          

               1d CNNs             



               Efficient transformers    

IV           Nonparametric models

               Exemplar-based methods           

               K nearest neighbor (KNN) classification  

               Learning distance metrics            

               Kernel density estimation (KDE) 

               Kernel methods              

               Inferring functions from data     

               Mercer kernels 

               Gaussian processes        

               Scaling GPs to large datasets      

               Support vector machines (SVMs)              

               Sparse vector machines 

               Trees, forests, bagging and boosting      

               Classification and regression trees (CART)            

               Ensemble learning          


               Random forests


               Interpreting tree ensembles        

V            Beyond supervised learning       

               Learning with fewer labeled examples  

               Data augmentation         

               Transfer learning             


               Few-shot learning           

               Word embeddings           

               Semi-supervised learning             

               Active learning  

               Dimensionality reduction           

               Principal components analysis (PCA)       

               Factor analysis  


               Manifold learning           


               Recommender systems

               Graph embeddings        

Book link and citation details:

 author = "Kevin P. Murphy",
 title = "Probabilistic Machine Learning: An introduction",
 publisher = "MIT Press",
 year = 2021,
 url = "probml.ai"


Image source: Wikipedia