If you're in the early stages of your data science credential journey, you're either about to take a probability class or have already taken one. As part of that class, you're introduced to several probability distributions, such as the binomial, geometric and uniform distributions. You might be tempted to skip over some elementary topics and just scrape by with a bare pass. Let's face it: the way probability is taught (with dice rolls and cards) is far removed from the glamor of data science. You may be wondering:
When am I ever going to calculate the probability of five die rolls in a row in real life?
The answer may surprise you: probably never. Nor will you ever need to count cards (unless you land a job at a casino), find the probability of choosing balls from urns (unless you work for a lottery), or calculate the probability of a sixteen-sided die landing on a 1 three times in a row (although that skill will make you a great competitor in many games).
So why do you need to learn all of these esoteric probability rules in the first place? The answer is that they form a foundation for learning. Remember back in grade school, when you had to learn those mind-numbingly boring grammar rules?
Ten or twenty years later, you're using those rules every time you write an email. They are so entrenched in your mind that you don't realize you're using them; you just write without thinking. In the same way, the rules of probability are introduced in an abstract way, over and over again in different guises, so that they eventually become second nature when you create a report, build a model or present your findings to your boss.
Many machine learning models work best when certain assumptions about the underlying distribution hold. For example, if you want to model a person's height, the normal distribution (a bell-shaped curve for continuous data) will work much better than a binomial distribution (a distribution that counts successes across "true/false" type trials). Here are five of the most common probability distributions you'll use, along with a few examples of where you'll use them in real life.
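To make the height example concrete, here is a minimal Python sketch using only the standard library. The mean of 170 cm and standard deviation of 10 cm are hypothetical values chosen for illustration, not figures from this article, and the helper names `simulate_heights` and `binomial_pmf` are made up for the example. It fits a normal distribution to simulated continuous heights, then shows the kind of yes/no counting question a binomial distribution is suited to instead.

```python
import math
import random
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration (not from the article).
ASSUMED_MEAN_CM = 170.0
ASSUMED_SD_CM = 10.0


def simulate_heights(n, seed=0):
    """Draw n illustrative heights (in cm) from a normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(ASSUMED_MEAN_CM, ASSUMED_SD_CM) for _ in range(n)]


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent yes/no trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


heights = simulate_heights(1_000)

# Continuous case: estimate a normal distribution from the sample,
# then ask what share of people fall between 160 cm and 180 cm.
fitted = NormalDist.from_samples(heights)
share_160_180 = fitted.cdf(180) - fitted.cdf(160)
print(f"fitted mean = {fitted.mean:.1f} cm, sd = {fitted.stdev:.1f} cm")
print(f"P(160 <= height <= 180) is roughly {share_160_180:.2f}")

# Discrete case: a count of "true" outcomes, such as 3 heads in 10 fair coin flips.
print(f"P(3 heads in 10 flips) = {binomial_pmf(3, 10, 0.5):.4f}")
```

The contrast is the point of the sketch: a height can take any value in a range, so a continuous distribution fits it naturally, while the binomial only answers questions of the form "how many of n trials came out true?".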
© 2021 TechTarget, Inc.