- Knowing how probability distributions are connected can be helpful, especially when it comes to choosing models.
- Many charts showing this interconnectivity exist; most are cluttered with more information than you're probably looking for.
- A simplified chart to help you choose between models.

Studying the relationships between univariate distributions is a fascinating and practical pursuit. If you choose a model that isn't quite right, **there may be another model with a similar shape that fits better.** For example, if your data doesn't quite fit a logistic model, it can be useful to know that a simple transformation leads to several other distributions: the uniform, log-logistic or standard logistic. Similarly, if your data doesn't quite fit a normal distribution, you could try a heavier-tailed t-distribution or the chi-squared. Chi-squared distributions are always right-skewed, but as the degrees of freedom increase they increasingly resemble the normal distribution.
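To make the chi-squared claim concrete, here is a minimal sketch that measures the largest gap between the chi-squared CDF and the CDF of a normal distribution with the same mean and variance. It assumes even degrees of freedom only, because that case has a simple closed-form CDF that needs nothing beyond the standard library; the helper names (`chi2_cdf_even`, `max_cdf_gap`) are made up for this illustration.

```python
import math

def chi2_cdf_even(x, k):
    """CDF of a chi-squared variable with EVEN degrees of freedom k = 2m.

    Uses the closed form 1 - exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    """
    if x <= 0:
        return 0.0
    half = x / 2
    term, total = 1.0, 1.0  # j = 0 term of the sum
    for j in range(1, k // 2):
        term *= half / j    # builds (x/2)^j / j! incrementally
        total += term
    return 1 - math.exp(-half) * total

def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def max_cdf_gap(k, grid_points=2000):
    """Largest CDF difference on a grid, matching mean k and variance 2k."""
    mean, sd = k, math.sqrt(2 * k)
    upper = mean + 6 * sd
    xs = [upper * i / grid_points for i in range(grid_points + 1)]
    return max(abs(chi2_cdf_even(x, k) - normal_cdf(x, mean, sd)) for x in xs)

gaps = {k: max_cdf_gap(k) for k in (2, 10, 50, 200)}
skewness = {k: math.sqrt(8 / k) for k in gaps}  # theoretical skewness of chi-squared(k)

for k in gaps:
    print(f"df={k:4d}  skewness={skewness[k]:.3f}  max CDF gap={gaps[k]:.4f}")
```

The skewness stays positive for every choice of degrees of freedom, but both it and the CDF gap shrink as the degrees of freedom grow.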

**A wealth of information exists on these types of relationships**. For example, Leemis et al. [1] created a rich, complex chart with 76 probability distributions and various properties such as convolution (sums of independent random variables from a family stay in that family) and scaling (a random variable multiplied by any positive real constant stays in the same family). The interplay of information in the chart is great for students of probability, but it may be a tad too much for someone simply looking for alternative, similar models. Here's a snapshot of the chart:

One of the best (and simplest) charts was created by Wheyming Tina Song [2]. It contains 35 univariate distributions along with their parameters and range. While this is useful information, some of the distributions are rarely seen in data science, and the extra information makes it hard to see relationships quickly. To solve this problem, **I created a simplified version** of Song's chart with 24 of the more common distributions.

Key highlights of the chart:

- **Discrete probability distributions** are shown in blue.
- **Continuous probability distributions** are green.
- **Sampling distributions** are in orange.
- Dashed arrows show **asymptotic relationships**. These usually hold in the limit as one or more of the parameters approach the boundary of the parameter space.
- Solid arrows show **transformations** or **special cases**. Transformations can include the distribution of an order statistic, taking a mixture of random variables, or truncating random variables. Some of these transformations can be inverted; these are shown by double-headed arrows.
- Orange arrows indicate that **more than one random variable is involved in forming the transformation**.
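As an illustration of what an asymptotic relationship looks like numerically, the sketch below uses a textbook limit: Binomial(n, λ/n) approaches Poisson(λ) as n grows. This particular pair is chosen here as an assumption for illustration (the text does not list which distributions the dashed arrows connect), and the function names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_pmf_gap(n, lam, k_max=20):
    """Largest pointwise difference between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )

lam = 3.0
gaps = {n: max_pmf_gap(n, lam) for n in (20, 100, 1000, 10000)}

for n, gap in gaps.items():
    print(f"n={n:6d}  max |binomial - poisson| = {gap:.2e}")
```

Here the parameter n heads toward the edge of its range (infinity) while p heads toward 0, which is the "parameter space boundary" behaviour the dashed arrows describe.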

Some of the transformation relationships can be combined. For example, the path Standard Normal → Chi-Squared → Gamma → Exponential indicates that a function of two random variables (here, the sum of their squares) has an exponential distribution if the two variables are independent standard normal variables.
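The Standard Normal → Chi-Squared → Gamma → Exponential path can be checked with a quick simulation. The sum of two squared independent standard normals is chi-squared with 2 degrees of freedom, which is a gamma distribution with shape 1 and scale 2, which is an exponential distribution with mean 2. The sketch below is a rough sanity check under those facts, not a formal goodness-of-fit test; the seed, sample size, and checkpoints are arbitrary choices.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 100_000

# Z1^2 + Z2^2 for independent standard normals Z1, Z2
samples = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(n)]

sample_mean = sum(samples) / n

def empirical_cdf(data, x):
    return sum(1 for v in data if v <= x) / len(data)

def exponential_cdf(x, mean):
    return 1 - math.exp(-x / mean)

checkpoints = (0.5, 1.0, 2.0, 4.0, 8.0)
max_gap = max(
    abs(empirical_cdf(samples, x) - exponential_cdf(x, 2.0)) for x in checkpoints
)

print(f"sample mean = {sample_mean:.3f} (exponential with mean 2 expected)")
print(f"largest CDF gap at checkpoints = {max_gap:.4f}")
```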

Many more relationships exist with univariate distributions. So many, in fact, that it would be impossible to fit them on a single graph. For example [3]:

- A geometric random variable is the floor of an exponential random variable,
- A rectangular random variable is the floor of a uniform random variable,
- Certain transformations of a random variable with an F distribution can have a beta distribution.
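The first relationship in the list can be verified exactly rather than by simulation. If X is exponential with rate λ, then P(floor(X) = k) = e^{-λk} - e^{-λ(k+1)}, which matches a geometric distribution on 0, 1, 2, ... with success probability p = 1 - e^{-λ}. The sketch below assumes that "failures before the first success" convention for the geometric distribution; the rate value is arbitrary.

```python
import math

def floor_exponential_pmf(k, rate):
    """P(floor(X) = k) for X ~ Exponential(rate), i.e. F(k + 1) - F(k)."""
    return math.exp(-rate * k) - math.exp(-rate * (k + 1))

def geometric_pmf(k, p):
    """P(G = k) where G counts failures before the first success (k = 0, 1, ...)."""
    return (1 - p) ** k * p

rate = 0.7
p = 1 - math.exp(-rate)  # success probability implied by the exponential rate

max_diff = max(
    abs(floor_exponential_pmf(k, rate) - geometric_pmf(k, p)) for k in range(50)
)
print(f"p = {p:.4f}, largest pmf difference over k < 50: {max_diff:.2e}")
```

The two probability mass functions agree up to floating-point rounding, which is what the "floor of an exponential" statement predicts.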

**References**

Chi-square distributions image: Geek3|Wikimedia Commons. GNU Free License.

[1] Univariate Distributions Relationship Chart

Posted 1 March 2021
