The ‘bell curve’, or ‘Gaussian bell curve’, is one of the fundamental concepts on which most statistical analysis is based. From social sciences to astronomy to financial services, most real-world applications of statistics rely on the assumption that the data being analysed is distributed in the shape of the bell curve.

In the last article we discussed the usefulness of the Bell curve. It helps us simplify things and use rules to understand distributions. The curve’s symmetry and consistency make it ideal for making predictions.

In this article, we will discuss how these same qualities of the bell curve that make it so tempting and useful can also be a curse.

**Does all information follow the Bell Curve?**

There are many examples of normal (or approximately normal) distribution around us. The statistical concepts have been empirically tested and verified countless times.

Certain quantities in physics are normally distributed, such as the velocity components of the molecules in an ideal gas. In biology, the *logarithm* of various variables, such as the thickness of tree bark or the size of a mammal's claws, tends to have a normal distribution. In finance, changes in the logarithm of certain quantities, such as exchange rates and price indices, are assumed to be normal, though this assumption is hotly contested by some. Bell curve grading assigns relative grades based on a normal distribution of scores.

As Dr. Taleb says in his book, The Black Swan, we can make good use of the Gaussian approach (i.e. the bell curve) for variables for which there is a rational reason for the largest value not to be too far from the average. If there is gravity pulling numbers down, or if there are physical limitations preventing very large observations (say, the length of a cat's tail), we end up in Mediocristan.

Mediocristan is a term coined by Dr. Taleb to denote situations where the Gaussian approach (normal, binomial, Poisson etc.) will work.
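One rough way to see whether a variable lives in Mediocristan is to ask how much of the total the single largest observation accounts for. The sketch below is only an illustration under stated assumptions: the "heights" are simulated from a normal distribution and the "wealth" from a Pareto distribution with shape 1.1. Both the data and the parameters are hypothetical and chosen for demonstration; they do not come from Dr. Taleb's book or from real measurements.

```python
import random


def largest_share(sample):
    """Fraction of the sample total contributed by the single largest value."""
    return max(sample) / sum(sample)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical bounded quantity: heights in cm, simulated as normal.
heights = [rng.gauss(170, 10) for _ in range(n)]

# Hypothetical unbounded quantity: wealth, simulated as Pareto (shape 1.1).
wealth = [rng.paretovariate(1.1) for _ in range(n)]

print(f"largest height as share of total: {largest_share(heights):.5f}")
print(f"largest wealth as share of total: {largest_share(wealth):.5f}")
```

In the bell-shaped sample no single observation matters much to the total. In the heavy-tailed sample one observation can carry a visible share of it, which is exactly the situation where the Gaussian approach stops being a safe default.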

**The Curse of the Bell Curve**

The curse of the bell curve, however, comes from the fact that we often use it in situations that bear no resemblance to a normal distribution. Many real-life phenomena do not follow the bell curve, and yet we assume a normal distribution just because its simplicity is so tempting. Let us examine some glaring examples here.

**Stock Prices**: If stock prices were normally distributed, we would see events like the 2009 crash once in 100 years or even less often. Yet we see such events almost every decade. Despite the blatant and repeated empirical evidence that stock prices are not normally distributed, we continue to rely on models that assume the opposite.

**Distribution of wealth**: Economists and social scientists often assume that the distribution of wealth is normal. Strangely, even a simple test reveals that this is not true. If wealth were normally distributed, people like Mark Zuckerberg or Bill Gates would simply not exist.

**Balance in Checking/Saving account**: This is an interesting article written by a friend during his consulting days. He illustrates how we tend to blindly apply the bell curve to non-normal data, sometimes with disastrous results.
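To put a number on the stock price argument above, the following sketch computes how often a large one-day drop should occur if daily returns really were normal and independent. The sigma thresholds and the 252-trading-day year are illustrative assumptions, not figures from this article, and the function name is made up for the example.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # common convention, assumed here


def years_between_drops(k_sigma):
    """Expected years between one-day drops of at least k_sigma standard
    deviations, assuming daily returns are normal and independent."""
    # One-sided lower tail of the standard normal: P(Z <= -k_sigma).
    p = 0.5 * math.erfc(k_sigma / math.sqrt(2))
    return 1 / (p * TRADING_DAYS_PER_YEAR)


for k in (3, 4, 5, 6):
    years = years_between_drops(k)
    print(f"{k}-sigma one-day drop: about once every {years:,.0f} years")
```

Under the normal assumption the waiting time grows explosively with each extra sigma. When moves of that size show up far more often than such a table suggests, the data is telling us the assumption is wrong, not that we have been unlucky.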

Most real-life data does not exhibit a normal distribution. A normal distribution is more of an exception than a rule. Real-world data shows extreme variations (high and low) far more frequently than the bell curve predicts. Even data that seems to be normally distributed may seem so only because our observation period is not long enough.

This is an important lesson for any analyst dealing with real-world data. Always check the data for normality. And always look for a rational explanation of why the data should be normal. Only if you are satisfied on both counts should you assume a normal distribution. Even then, proceed with caution.
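There are many ways to check data for normality, and this article does not prescribe one. As a minimal sketch, the code below computes skewness and excess kurtosis (both are zero for a perfect bell curve) and combines them into the Jarque-Bera statistic, which is one common choice among several. The samples are simulated and purely hypothetical. The last sample also echoes the earlier point about logarithms: taking the log of lognormal data gives back normal data.

```python
import math
import random


def skew_and_excess_kurtosis(data):
    """Sample skewness and excess kurtosis from central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def jarque_bera(data):
    """Jarque-Bera statistic: large values are evidence against normality."""
    skew, excess_kurt = skew_and_excess_kurtosis(data)
    return len(data) / 6 * (skew ** 2 + excess_kurt ** 2 / 4)


# 5% critical value of the chi-squared distribution with 2 degrees of
# freedom, which the statistic follows approximately for large normal samples.
JB_CRITICAL_5PCT = 5.99

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_like = [rng.gauss(0, 1) for _ in range(5000)]
heavy_tailed = [rng.lognormvariate(0, 1) for _ in range(5000)]
logged = [math.log(x) for x in heavy_tailed]

samples = [("normal", normal_like), ("lognormal", heavy_tailed),
           ("log of lognormal", logged)]
for name, sample in samples:
    jb = jarque_bera(sample)
    verdict = "reject normality" if jb > JB_CRITICAL_5PCT else "no evidence against"
    print(f"{name:>17}: JB = {jb:,.1f} ({verdict})")
```

Passing a test like this is only the first of the two conditions above. A short observation window can look normal simply because the extreme events have not happened yet, so the rational explanation still matters.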

The concept of the bell curve is a highly seductive one. Once it gets into your mind, it is hard to get past it, so be careful about how you use it.

The bell curve has a lot of uses and it should not be discarded completely. But it should be used judiciously or the consequences can be disastrous.

**About the author:**

Gaurav Vohra is an alumnus of IIM Bangalore with over 10 years of experience in the field of analytics. Gaurav has been in the analytics industry from its initial days and his career has spanned companies like Capital One and Information Resources Inc., recognized as thought-leaders in the analytics space.

Gaurav is now the co-founder of **Jigsaw Academy (www.jigsawacademy.com)**, a training institute that aims to meet the growing demand for talent in the field of analytics by providing industry-relevant training to develop business-ready professionals. You can visit Gaurav’s blog at **www.analyticstraining.com**

