The debate between the Bayesian and frequentist approaches to statistics is long-running. I am interested in how these approaches affect machine learning. Books on machine learning often mix the two approaches or, in some cases, present only one. This does not help from a learning standpoint.
So, in this two-part blog, we first discuss the differences between the frequentist and Bayesian approaches. Then, we discuss how they apply to machine learning algorithms.
Traditionally, we understand statistics as follows. Given a collection of items to be studied (for example, the heights of people), which we call the population, you can acquire a sample of the population. You can calculate some useful properties of the sample (such as the mean). These give you the descriptive statistics for the sample. But if you want to generalise about the population based on the sample, you need inferential statistics. The goal of inferential statistics is to infer some quantity about the population from the sample. There are two general philosophies of inferential statistics: frequentist and Bayesian.
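To make the population/sample distinction concrete, here is a minimal sketch in Python. The population is simulated, and the figures used (a mean height of 170 cm, a standard deviation of 10 cm, a sample of 50 people) are assumptions chosen purely for illustration, not data from any study.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated heights in cm (assumed values).
population = [random.gauss(170, 10) for _ in range(10_000)]

# In practice we rarely see the whole population; we only observe a sample.
sample = random.sample(population, 50)

# Descriptive statistics: properties of the sample itself.
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

# Normally unknown; available here only because the population is simulated.
population_mean = statistics.fmean(population)

print(f"sample mean      = {sample_mean:.1f} cm")
print(f"sample std dev   = {sample_sd:.1f} cm")
print(f"population mean  = {population_mean:.1f} cm")
```

The sample mean will be close to, but not exactly equal to, the population mean. Quantifying that gap is the job of inferential statistics.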
The frequentist and Bayesian approaches differ in their interpretation of probability. In the frequentist world, you can only assign probabilities to repeatable random phenomena (such as the rolling of a die). From observations of these phenomena over the long run, you can infer the probability of a specific event (for instance, the proportion of rolls on which a fair die shows a 6). Thus, in the frequentist world, to apply probability we need a repeatable event observed over many trials. In a purist frequentist sense, you could not assign a probability to the outcome of an election (because it is not a repeated event). In contrast, in the Bayesian view, we can assign probabilities to specific events, and the probability represents a degree of belief or confidence in that event. That belief can be updated in the light of new evidence.
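The two interpretations can be sketched side by side in Python. The first part simulates a fair die and shows the relative frequency of sixes settling down as the number of rolls grows. The second part shows one common way to express and update a belief, a Beta prior updated with observed counts. The choice of a Beta prior, the uniform starting point and the counts (3 sixes in 10 rolls) are assumptions made for this illustration, not something prescribed by the Bayesian view itself.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible


def long_run_frequency(n_rolls):
    """Fraction of n_rolls of a simulated fair die that come up 6."""
    sixes = sum(1 for _ in range(n_rolls) if random.randint(1, 6) == 6)
    return sixes / n_rolls


# Frequentist reading: probability is the long-run relative frequency.
for n in (10, 1_000, 100_000):
    print(f"{n:>7} rolls: frequency of a six = {long_run_frequency(n):.4f}")


def update_beta(alpha, beta, successes, failures):
    """Update a Beta(alpha, beta) belief with new success/failure counts."""
    return alpha + successes, beta + failures


# Bayesian reading: probability is a degree of belief, revised by evidence.
alpha, beta = 1, 1  # assumed uniform prior: no initial preference
alpha, beta = update_beta(alpha, beta, successes=3, failures=7)
posterior_mean = alpha / (alpha + beta)
print(f"posterior belief about P(six): mean = {posterior_mean:.3f}")
```

With few rolls the observed frequency can be far from 1/6, while the Bayesian update still yields a usable belief after only ten observations.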
There are three key points to remember when discussing the frequentist vs. Bayesian philosophies.
So, the question arises: we have seen how Bayesians incorporate uncertainty in their modelling, but how do frequentists treat uncertainty if they work with point estimates?
The general approach for frequentists is to make an estimate but also to specify the conditions under which the estimate is valid.
Frequentists use three ideas to understand uncertainty: the null hypothesis, p-values and confidence intervals. All three come broadly under statistical hypothesis testing.
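As a rough illustration of how these three ideas fit together, the sketch below tests whether a die is fair for sixes. The data (30 sixes in 120 rolls) are made up, and the large-sample normal approximation (a one-proportion z-test with a Wald interval) is one simple choice among several possible methods.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data: 30 sixes observed in 120 rolls.
n_rolls, n_sixes = 120, 30
p_hat = n_sixes / n_rolls  # point estimate of P(six)

# Null hypothesis: the die is fair, so P(six) = 1/6.
p_null = 1 / 6

# Test statistic: how many standard errors p_hat is from the null value.
se_null = sqrt(p_null * (1 - p_null) / n_rolls)
z = (p_hat - p_null) / se_null

# Two-sided p-value: chance of a result at least this extreme if the null is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval for P(six), using the normal approximation.
z_crit = NormalDist().inv_cdf(0.975)
se_hat = sqrt(p_hat * (1 - p_hat) / n_rolls)
ci_low, ci_high = p_hat - z_crit * se_hat, p_hat + z_crit * se_hat

print(f"estimate = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI   = ({ci_low:.3f}, {ci_high:.3f})")
```

A small p-value and an interval that excludes 1/6 both point the same way: data like these would be unusual if the die were fair. Neither number is a probability that the null hypothesis is true, which is a statement a purist frequentist would not make.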
In this post, we summarised some complex ideas about frequentist and Bayesian probability. In part two, we will see how these ideas apply to machine learning and deep learning algorithms.