The mode is one of the basic statistics, defined as the most common value in an array. When the values are categorical, the mode is easy to find: it is the value with the most occurrences. Identifying modes in a numerical array is harder because the values can be continuous, so counting occurrences by value is not enough; instead, the distribution of the values must be examined to identify the most probable ones. Moreover, numerical arrays can be multimodal, which turns the problem into finding local maxima of the distribution rather than the single global maximum that suffices when only one mode is present.
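As a small illustration of the contrast described above, the sketch below counts occurrences to find the mode of a categorical array and then applies the same counting to a handful of continuous readings. The function name `categorical_modes` and both sample arrays are made up for this example.

```python
from collections import Counter

def categorical_modes(values):
    """Return every value tied for the highest number of occurrences."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Categorical data: counting works directly.
colors = ["red", "blue", "red", "green", "blue", "red"]
print(categorical_modes(colors))

# Continuous data (hypothetical readings): every value is unique, so every
# value ties for "most common" and counting says nothing about the modes.
readings = [0.12, 0.13, 0.11, 5.02]
print(categorical_modes(readings))
```

For the continuous readings the function returns all four values, which is why the distribution has to be examined instead.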

Building a histogram is one of the easiest ways to inspect the distribution of a numerical array. Looking at the distribution may make it clear where the modes are.

To illustrate, an array is generated by drawing simulated values from several normal distributions.
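The simulation could look something like the following minimal sketch. The component means, standard deviations, sample sizes, and the seed are all assumptions chosen for illustration; they are not the values used to produce the results in this post.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# Hypothetical components: (mean, standard deviation, number of draws).
components = [(-2.0, 1.0, 300), (5.0, 1.5, 400), (14.0, 2.0, 300)]

# Draw from each normal distribution and pool everything into one array.
values = np.concatenate([rng.normal(mean, sd, n) for mean, sd, n in components])
rng.shuffle(values)  # remove the ordering by component

print(values.shape)
```

Pooling the draws this way produces a sample from a Gaussian mixture whose weights are proportional to the sample sizes.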

Here are four histograms of the simulated values, each with a different number of bins. The local maxima are colored blue, and the number of local maxima changes with the number of bins in the histogram.

Ten bins might seem reasonable, since 3 may well be the correct number of modes. With 500 bins, on the other hand, the histogram shows the shape of the distribution but does not allow the modes to be identified, because there are too many local maxima. It would be convenient to obtain a good approximation of the number of modes systematically, without relying on visual inspection.
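The dependence on the number of bins can be checked numerically. The sketch below counts bins that are strictly higher than both neighbors for several bin counts. The simulated data, the seed, and the intermediate bin counts (50 and 100) are assumptions; only 10 and 500 are mentioned in the text.

```python
import numpy as np

def count_local_maxima(counts):
    """Count bins strictly higher than both neighbors.

    The histogram is padded with zeros so edge bins can qualify.
    Flat plateaus are not counted because the comparison is strict.
    """
    padded = np.concatenate(([0], counts, [0]))
    inner = padded[1:-1]
    return int(np.sum((inner > padded[:-2]) & (inner > padded[2:])))

# Hypothetical three-component sample, standing in for the simulated values.
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(-2.0, 1.0, 300),
    rng.normal(5.0, 1.5, 400),
    rng.normal(14.0, 2.0, 300),
])

for bins in (10, 50, 100, 500):
    counts, _ = np.histogram(values, bins=bins)
    print(bins, count_local_maxima(counts))
```

With many narrow bins most of the detected maxima are sampling noise, which is the problem the mixture model approach is meant to avoid.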

Mixture models are built by aggregating sub-models, each weighted by its own parameter. In this example the weights come from a Dirichlet process and sum to one; they can be simulated with a stick-breaking process. Each sub-model is Gaussian with its own parameters.
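A stick-breaking draw can be sketched in a few lines. This is a truncated version for a finite number of components; the concentration value `alpha = 1.0`, the seed, and the choice to give the last component whatever remains of the stick (so the weights sum to one exactly) are assumptions of this example, not details taken from the original model.

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Draw K mixture weights from a truncated stick-breaking process."""
    # Fraction of the remaining stick broken off at each step.
    betas = rng.beta(1.0, alpha, size=K)
    betas[-1] = 1.0  # truncation: the last piece takes all that is left
    # Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=1.0, K=50, rng=rng)
print(weights.sum())
```

Smaller values of `alpha` concentrate the mass on the first few components, so most of the K weights end up close to zero.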

The number of sub-models is denoted by K and set to 50. The model is highly complex, so probabilistic programming methods are needed to make its estimation feasible.
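For reference, a truncated stick-breaking Gaussian mixture of this kind is commonly written as follows. The symbols $\alpha$, $\beta_k$, $w_k$, $\mu_k$, and $\sigma_k$ are notation introduced here, and the priors on $\alpha$, $\mu_k$, and $\sigma_k$ are left unspecified because the post does not state them.

$$
\begin{aligned}
\beta_k &\sim \mathrm{Beta}(1, \alpha), \qquad k = 1, \dots, K \\
w_k &= \beta_k \prod_{j=1}^{k-1} (1 - \beta_j) \\
f(x) &= \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right), \qquad K = 50
\end{aligned}
$$

With K = 50 there are 50 weights, 50 means, and 50 standard deviations to infer, which is where the complexity comes from.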

Checking the local maxima of the expected density shows that the maxima occur at -1.57, 4.59, and 14.07. The estimation was done without knowing the exact number of modes, by setting K large enough to accommodate all the Gaussian distributions that together form the probability density function (PDF) of the values.
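The final step, locating local maxima of a mixture density, can be sketched as below. The weights, means, and standard deviations are hypothetical stand-ins for fitted values (in practice the expected density would be an average over posterior samples), and the grid range and resolution are arbitrary choices.

```python
import numpy as np

def mixture_pdf(x, weights, means, sds):
    """Evaluate a Gaussian mixture density at each point of x."""
    x = np.asarray(x, dtype=float)[:, None]  # shape (n_points, 1)
    comp = np.exp(-0.5 * ((x - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))
    return comp @ weights  # weighted sum over components

def local_maxima(x, density):
    """Return grid points where the density exceeds both neighbors."""
    inner = density[1:-1]
    is_peak = (inner > density[:-2]) & (inner > density[2:])
    return x[1:-1][is_peak]

# Hypothetical parameters, not the estimates reported in this post.
weights = np.array([0.3, 0.4, 0.3])
means = np.array([-2.0, 5.0, 14.0])
sds = np.array([1.0, 1.5, 2.0])

grid = np.linspace(-10.0, 25.0, 3501)
modes = local_maxima(grid, mixture_pdf(grid, weights, means, sds))
print(np.round(modes, 2))
```

Because the density is smooth, the number of local maxima found this way does not depend on a bin count, only on the grid being fine enough to separate neighboring peaks.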

(This blog post originally appeared here)
