The debate between the Bayesian and frequentist approaches in statistics is long-running. I am interested in how these approaches affect machine learning. Books on machine learning often combine the two approaches, or in some cases take only one. This does not help from a learning standpoint.

So, in this two-part blog series, we first discuss the differences between the frequentist and Bayesian approaches. Then we discuss how they apply to machine learning algorithms.

Traditionally, we understand statistics as follows. You start with a collection of items to be studied (e.g. the heights of people), which we call the **population**. From it you can acquire a **sample**. You could calculate some useful properties of the sample (such as the mean), and these give you the **descriptive statistics** for the sample. But if you want to generalise about the population based on the sample, you need **inferential statistics**. The goal of inferential statistics is to infer some quantity about the population from the sample. There are two general philosophies of inferential statistics: frequentist and Bayesian.
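The sketch below makes the population/sample distinction concrete. It assumes a synthetic "population" of heights drawn from a normal distribution with a mean of 170 cm and a standard deviation of 10 cm. These numbers are hypothetical and chosen only for illustration. The sample mean and standard deviation are descriptive statistics. Treating the sample mean as a guess for the population mean is the inferential step.

```python
import random
import statistics

def make_population(size, mean_cm=170.0, sd_cm=10.0, seed=0):
    """Hypothetical population of heights in cm (synthetic, for illustration only)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_cm, sd_cm) for _ in range(size)]

def describe(sample):
    """Descriptive statistics: they summarise only the sample we actually have."""
    return {
        "n": len(sample),
        "mean": statistics.fmean(sample),
        "stdev": statistics.stdev(sample),
    }

population = make_population(100_000)
sample = random.Random(1).sample(population, 200)  # simple random sample, no replacement

summary = describe(sample)
print("descriptive statistics of the sample:", summary)

# Inferential step: use the sample mean as an estimate of the population mean.
# In a real study the population mean is unknown; we can only compare here
# because the population is synthetic.
print("estimate of population mean:", summary["mean"])
print("actual population mean:     ", statistics.fmean(population))
```

In practice only the sample is available. The rest of this post is about how the two philosophies quantify how far such an estimate might be from the truth.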

The frequentist and Bayesian approaches differ in their interpretation of probability. In the frequentist world, you can only assign probabilities to repeated random phenomena (such as rolling a die). From observations of these long-run phenomena, you can infer the probability of a specific event in question (for instance, how often a fair die lands on 6). Thus, in the frequentist world, to apply probability **we need a repeated event** that is observed over a long run. In contrast, in the Bayesian view, we assign probabilities to specific events, and the probability represents a measure of **belief/confidence** in that event. The belief can be updated in the light of new evidence. In a purist frequentist sense, probabilities can be assigned only to repeated events. You could not assign a probability to the outcome of an election, because it is not a repeated event.
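The frequentist reading of "the probability of rolling a 6" can be illustrated with a small simulation. Probability is the value the relative frequency settles towards as the die is rolled again and again. This is a minimal sketch that assumes Python's pseudo-random generator is an adequate stand-in for a physical fair die; the function name is hypothetical.

```python
import random

def relative_frequency_of_six(n_rolls, seed=42):
    """Simulate n_rolls of a fair six-sided die and return the fraction that show 6."""
    rng = random.Random(seed)  # pseudo-random stand-in for a physical die
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls

# As the number of repetitions grows, the relative frequency settles near 1/6.
for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} rolls: relative frequency of 6 = {relative_frequency_of_six(n):.4f}")
print(f"theoretical probability          = {1 / 6:.4f}")
```

Note that this interpretation needs the repetition. There is no analogous loop you could run for a single election.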

There are three key points to remember when discussing the frequentist vs. the Bayesian philosophies.

- First, as already mentioned, Bayesians assign probabilities to specific outcomes, as degrees of belief.
- Second, Bayesian inference yields probability distributions, while frequentist inference focuses on point estimates.
- Finally, in Bayesian statistics parameters are assigned probabilities, whereas in the frequentist approach parameters are fixed. In frequentist statistics, we take random samples from the population and aim to find a set of fixed parameters that correspond to the underlying distribution that generated the data. In Bayesian statistics, we start from a prior belief about the parameters of the distribution that generated the data. We then combine that belief with the observed data to obtain a probability distribution over the parameters (the posterior). The parameters are treated as random variables rather than fixed values.
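The difference between a fixed parameter with a point estimate and a parameter described by a distribution can be sketched with a coin of unknown bias. The data (7 heads in 10 flips, then 63 heads in 90 more) and the uniform Beta(1, 1) prior are assumptions made for illustration. The frequentist side uses the maximum-likelihood estimate. The Bayesian side uses the standard Beta-Binomial conjugate update, where a Beta prior plus binomial data gives a Beta posterior.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta): a probability distribution over a coin's chance of heads."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def sd(self) -> float:
        a, b = self.alpha, self.beta
        return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5

    def update(self, heads: int, tails: int) -> "Beta":
        """Conjugate update: a Beta prior plus coin-flip data gives a Beta posterior."""
        return Beta(self.alpha + heads, self.beta + tails)

def frequentist_estimate(heads: int, n: int) -> float:
    """Maximum-likelihood point estimate; the true parameter is a fixed unknown."""
    return heads / n

heads, n = 7, 10            # hypothetical data: 7 heads in 10 flips
prior = Beta(1, 1)          # assumed uniform prior over [0, 1]
posterior = prior.update(heads, n - heads)

print("frequentist point estimate:", frequentist_estimate(heads, n))
print(f"Bayesian posterior: Beta({posterior.alpha}, {posterior.beta}), "
      f"mean {posterior.mean():.3f}, sd {posterior.sd():.3f}")

# New evidence: the current posterior acts as the prior for the next batch of flips.
posterior_more = posterior.update(63, 27)  # hypothetical: 63 heads in 90 more flips
print(f"after more data: mean {posterior_more.mean():.3f}, sd {posterior_more.sd():.3f}")
```

The frequentist answer is a single number. The Bayesian answer is a whole distribution whose spread shrinks as evidence accumulates, which is the "belief updated in the light of new evidence" described earlier.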

So the question arises: *We have seen how Bayesians incorporate uncertainty into their modelling, but how do frequentists treat uncertainty if they work with point estimates?*

The general frequentist approach is to make an estimate and also to specify the conditions under which that estimate is valid.

Frequentists use three related ideas to understand uncertainty: the null hypothesis, p-values and confidence intervals. All three come broadly under frequentist statistical hypothesis testing.

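- **Use of p-values to indicate statistical significance**: We start from a null hypothesis, the default assumption that there is no real effect (for example, that a coin is fair). The p-value is the probability, calculated assuming the null hypothesis is true, of obtaining results at least as extreme as those actually observed. A high p-value means the results are consistent with chance variation under the null hypothesis; it does not prove that the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis, and the more statistically significant the result. Note that p-values are statements about the data (given an assumed hypothesis), not the probability that the hypothesis itself is true.
- **Use of confidence intervals to provide an estimated range of values which is likely to include the population parameter**: A **confidence interval** gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. (*Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1.*) Common choices for the confidence level *C* are 0.90, 0.95 and 0.99. For example, a 95% confidence interval comes from a procedure that, over many repeated samples, would produce intervals containing the true parameter about 95% of the time. The parameter is fixed; it is the interval that varies from sample to sample.
- **Use of the Central Limit Theorem**: The standard confidence-interval formulas for a mean assume a normally distributed population. If your population distribution is not normal but your sample is large, we use the *Central Limit Theorem*. The sampling distribution of the sample mean is then approximately normal, so the same formulas remain approximately valid.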
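The sketch below assumes a hypothetical experiment with 60 heads in 100 coin flips. It computes a one-sided exact binomial p-value under the null hypothesis that the coin is fair. It also computes a 95% confidence interval for the probability of heads using the normal (Wald) approximation, which the Central Limit Theorem justifies for large samples even though individual flips are far from normally distributed. The function names are illustrative. The Wald interval is only one of several interval constructions; Wilson and Clopper-Pearson intervals are common alternatives. The `coverage` helper repeats the experiment many times with a fixed true parameter to show the frequentist meaning of "95%": it describes the procedure, not any single interval.

```python
import math
import random
from statistics import NormalDist

def binomial_p_value_upper(heads, n, p0=0.5):
    """One-sided exact p-value: P(X >= heads) for X ~ Binomial(n, p0).

    Computed assuming the null hypothesis (true probability of heads = p0) holds.
    """
    return sum(math.comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(heads, n + 1))

def wald_interval(heads, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion.

    The normal approximation relies on the Central Limit Theorem, so n should be large.
    """
    p_hat = heads / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(true_p, n, confidence=0.95, trials=2000, seed=0):
    """Fraction of repeated experiments whose interval contains the fixed true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < true_p for _ in range(n))  # simulate n flips
        low, high = wald_interval(heads, n, confidence)
        hits += low <= true_p <= high
    return hits / trials

heads, n = 60, 100  # hypothetical data: 60 heads in 100 flips
low, high = wald_interval(heads, n)
print(f"one-sided p-value under H0 (fair coin): {binomial_p_value_upper(heads, n):.4f}")
print(f"95% CI for P(heads): ({low:.3f}, {high:.3f})")
print(f"coverage over repeated experiments: {coverage(0.6, n):.3f}")
```

A two-sided test is an equally common choice. It would ask about results at least as extreme in either direction, and for this symmetric null it gives twice the one-sided value.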

In this post, we summarised some complex ideas about frequentist and Bayesian probability. In part two, we will see how these ideas apply to machine learning and deep learning algorithms.

https://www.probabilisticworld.com/frequentist-bayesian-approaches-...

https://www.simplypsychology.org/confidence-interval.html

http://www.stat.yale.edu/Courses/1997-98/101/confint.htm

https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_probability/BS...

Image source: https://www.nps.gov/features/yell/slidefile/mammals/bison/Images/00...

Posted 1 March 2021
