
Why You Need to Know Those Probability Distributions

If you're in the beginning stages of your data science credential journey, you're about to take (or have already taken) a probability class. As part of that class, you're introduced to several different probability distributions, like the binomial distribution, geometric distribution and uniform distribution. You might be tempted to skip over some elementary topics and just scrape by with a bare pass. Because, let's face it, the way probability is taught (with dice rolls and cards) is far removed from the glamor of data science. You may be wondering:

When am I ever going to calculate the probability of five die rolls in a row in real life?

The answer may surprise you: probably never. Nor will you ever need to count cards (unless you land a job at a casino), find the probability of choosing balls from urns (except for lottery jobs), or calculate the probability of a sixteen-sided die landing on a 1 three times in a row (but that'll make you a great competitor in many games).

So why do you need to learn all of these esoteric probability rules in the first place? The answer is that they form a foundation for learning. Remember back in grade school, when you had to learn those mind-numbingly boring grammar rules? Like:

  • where to put periods (and where not to),
  • when to capitalize,
  • how to construct a sentence.

Ten or twenty years later you're using those rules every time you write an email. They are so entrenched in your mind, though, that you don't actually realize you're using them; you just write without thinking. In the same way, the rules of probability are introduced in an abstract way, over and over again in different guises, so that eventually they'll become second nature when you create a report, build a model or present your findings to your boss.

Real World Examples of Probability Distributions in Data Science

Many machine learning models work best with some assumptions about the underlying distribution. For example, if you want to model a person's height, the normal distribution (a bell-shaped curve over continuous values) will work much better than a binomial distribution (which counts the successes in a fixed number of "true/false" type trials). Here are five of the most common probability distributions you'll use, along with a few examples of real-life situations where you'll use them.
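To make the height example a little more concrete, here's a minimal Python sketch using only the standard library. The numbers are assumptions chosen purely for illustration (a mean height of 170 cm with a standard deviation of 10 cm, and ten coin-flip style trials); they don't come from any real dataset, and `binomial_draw` is a hypothetical helper written just for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed, illustrative parameters (not real survey data): heights in cm.
MEAN_CM = 170.0
SD_CM = 10.0

# Normal distribution: continuous values clustered around a mean.
heights = [random.gauss(MEAN_CM, SD_CM) for _ in range(10_000)]

# Binomial distribution: the number of "true" outcomes in n true/false trials.
def binomial_draw(n, p):
    """Hypothetical helper: count successes in n trials with success chance p."""
    return sum(random.random() < p for _ in range(n))

successes = [binomial_draw(10, 0.5) for _ in range(10_000)]

# Fitting a normal distribution to the simulated heights should roughly
# recover the parameters we started with.
fitted = statistics.NormalDist.from_samples(heights)
print("fitted mean and sd:", round(fitted.mean, 1), round(fitted.stdev, 1))

# The binomial draws can only ever be whole numbers from 0 to 10,
# which is why they make a poor stand-in for something like height.
print("distinct binomial outcomes:", sorted(set(successes)))
```

The simulated heights can land anywhere on a continuous scale, while the binomial draws are stuck on a handful of whole-number counts. That mismatch is the kind of thing a distributional assumption is meant to capture.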


