This post continues our discussion of the Bayesian vs frequentist approaches. Here, we consider the implications for parametric and non-parametric models.
In the previous post, the Bayesian vs frequentist approaches: implications for machine le…, we said that:
In Bayesian statistics, parameters are assigned a probability distribution, whereas in the frequentist approach the parameters are fixed. Thus, in frequentist statistics, we take random samples from the population and aim to find a set of fixed parameters that correspond to the underlying distribution that generated the data. In contrast, in Bayesian statistics, we take the entire data and aim to find the parameters of the distribution that generated the data, but we treat these parameters as random quantities with their own probability distributions, i.e. not fixed.
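To make the contrast concrete, here is a minimal sketch for a coin-flip model. The data, the variable names and the uniform Beta(1, 1) prior are all assumptions made for illustration; they do not come from the earlier post. The frequentist estimate is a single number, while the Bayesian result is a whole distribution over the parameter.

```python
# Hypothetical data: ten coin flips, 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
heads = sum(flips)
tails = len(flips) - heads

# Frequentist view: theta (the probability of heads) is a fixed unknown number.
# The maximum likelihood estimate is a single point.
theta_mle = heads / len(flips)

# Bayesian view: theta is uncertain, so it gets a distribution.
# Assumption: a uniform Beta(1, 1) prior. For coin flips the posterior
# is then Beta(prior_a + heads, prior_b + tails).
prior_a, prior_b = 1.0, 1.0
post_a = prior_a + heads
post_b = prior_b + tails

# Mean and variance of a Beta(post_a, post_b) distribution.
total = post_a + post_b
post_mean = post_a / total
post_var = (post_a * post_b) / (total ** 2 * (total + 1))

print(f"MLE point estimate: {theta_mle:.3f}")
print(f"Posterior: Beta({post_a:.0f}, {post_b:.0f}), "
      f"mean {post_mean:.3f}, std dev {post_var ** 0.5:.3f}")
```

With seven heads in ten flips, the point estimate is 0.7, while the posterior is a Beta(8, 4) distribution whose spread expresses how uncertain we still are about the parameter.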
The question then is:
How does this discussion (Bayesian vs frequentist) extend to parametric and non-parametric models?
To recap: in a parametric model, we have a fixed, finite number of parameters. In contrast, for non-parametric models, the number of parameters is (potentially) infinite; more specifically, the number of parameters and the complexity of the model grow with increasing data.
The terms parametric and non-parametric also apply to the underlying distribution. Intuitively, you could say that parametric models follow a specified distribution, which is defined by its parameters. Non-parametric models do not assume a particular form for the underlying distribution.
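As a rough illustration of both points, the sketch below fits the same samples in two ways: a Gaussian summarised by two numbers, and a kernel density estimate that keeps every sample. The function names, the bandwidth and the simulated data are assumptions chosen for this example.

```python
import math
import random
import statistics


def fit_gaussian(samples):
    """Parametric: summarise the data with two numbers (mean, std dev)."""
    return statistics.fmean(samples), statistics.stdev(samples)


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std dev."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def fit_kde(samples, bandwidth):
    """Non-parametric: keep every sample, so the model grows with the data."""
    return list(samples), bandwidth


def kde_pdf(x, model):
    """Average of small Gaussian bumps centred on each stored sample."""
    points, bandwidth = model
    return sum(gaussian_pdf(x, p, bandwidth) for p in points) / len(points)


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 100, 1000):
    data = [rng.gauss(5.0, 2.0) for _ in range(n)]
    gaussian_model = fit_gaussian(data)
    kde_model = fit_kde(data, bandwidth=0.5)
    print(f"n={n}: Gaussian stores {len(gaussian_model)} numbers, "
          f"KDE stores {len(kde_model[0])} points")
```

The Gaussian always stores two numbers regardless of how much data arrives, whereas the kernel density estimate stores as many points as there are samples.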
Another way to approach the problem is to think of algorithms learning a function.
You can think of machine learning as learning an unknown function that maps the inputs X to the outputs Y. The general form of this function is Y = f(X). The algorithm learns the form of the function from the training data. Different algorithms make different assumptions, or have different biases, about the form of the function and how it can be learned. By this approach, parametric machine learning algorithms are algorithms that simplify the function to a known form. This has benefits, because parametric machine learning algorithms are simpler, faster and need less data. However, not all unknown functions can be expressed neatly in the form of relatively simple functions such as linear functions. Hence, parametric models are better suited to simpler problems.
In contrast, non-parametric models do not make strong assumptions about the underlying function. Non-parametric models have the advantage of flexibility and power, and can perform better when the true function is complex. However, they also need more data, are slower and are prone to overfitting.
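The trade-off can be sketched with a small example. Below, a straight line (a parametric model with a fixed form) and a 1-nearest-neighbour predictor (a simple non-parametric model) are both fitted to points from a curved function. The data and the helper names are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Parametric: assume y = slope * x + intercept, solve by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def predict_nearest(train_pairs, x):
    """Non-parametric: return the target of the closest training input."""
    _, nearest_y = min(train_pairs, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical noise-free data from a curved function, y = x^2.
train_x = [-3, -2, -1, 0, 1, 2, 3]
train_y = [x ** 2 for x in train_x]
test_x = [-2.4, -0.4, 1.6]
test_y = [x ** 2 for x in test_x]

line = fit_line(train_x, train_y)
train_pairs = list(zip(train_x, train_y))

line_mse = mse([predict_line(line, x) for x in test_x], test_y)
nn_mse = mse([predict_nearest(train_pairs, x) for x in test_x], test_y)

print(f"Line (2 parameters) test MSE: {line_mse:.3f}")
print(f"1-nearest-neighbour test MSE: {nn_mse:.3f}")
```

Here the line cannot bend to follow the curve, so the nearest-neighbour predictor does better on the held-out points. The flip side is that the nearest-neighbour model has to keep and search all the training data, and with noisy targets it would reproduce the noise as well, which is the overfitting risk mentioned above.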
To recap what we said before: because the terms parametric and non-parametric also apply to the underlying distribution, parametric models assume a specified distribution that is defined by its parameters, whereas non-parametric models do not assume a particular form for the underlying distribution.
Hence, we can conclude that while the two distinctions (Bayesian vs frequentist, and parametric vs non-parametric) are tangentially related, neither one implies the other.
- Parametric models assume that the data come from a known family of distributions, and the model infers the parameters of that distribution.
- However, this does not make parametric models Bayesian. Assuming a distribution for the data is not the same as assuming a distribution for the parameters, which is what Bayesian models do.
- To emphasize, what makes a model Bayesian is that it encodes prior information about the parameters and updates it with the data.
- As we increasingly see, the Bayesian vs frequentist issue is more of a concern for statisticians. In machine learning, we often work at a higher level of abstraction, and other considerations apply.
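To illustrate the points in this list, the sketch below takes one parametric model (normally distributed data with a known noise level) and estimates its mean in both ways. The data, the noise level and both priors are assumptions made for this example. The model is parametric in every case; only the treatment of the parameter, and the use of prior information, changes.

```python
def mle_mean(data):
    """Frequentist estimate: the sample mean, a single fixed number."""
    return sum(data) / len(data)


def posterior_mean_and_std(data, sigma, prior_mean, prior_std):
    """Bayesian estimate of the mean under a Normal(prior_mean, prior_std^2)
    prior, assuming the noise std dev sigma is known (conjugate update)."""
    prior_precision = 1.0 / prior_std ** 2
    data_precision = len(data) / sigma ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + sum(data) / sigma ** 2) / post_precision
    return post_mean, post_precision ** -0.5


# Hypothetical measurements, with an assumed known noise level.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
sigma = 1.0

point = mle_mean(data)
# A vague prior centred on 0 barely moves the estimate away from the data.
weak = posterior_mean_and_std(data, sigma, prior_mean=0.0, prior_std=10.0)
# A confident prior centred on 0 pulls the estimate strongly towards 0.
strong = posterior_mean_and_std(data, sigma, prior_mean=0.0, prior_std=0.5)

print(f"Frequentist point estimate: {point:.3f}")
print(f"Bayesian, weak prior:   mean {weak[0]:.3f}, std dev {weak[1]:.3f}")
print(f"Bayesian, strong prior: mean {strong[0]:.3f}, std dev {strong[1]:.3f}")
```

The same two-parameter Gaussian model is used throughout, so being parametric says nothing about being Bayesian. What changes the answer is how much prior information goes in.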
The final, concluding part of this blog will bring all these ideas together.