Interconnectivity of Univariate Probability Distributions

  • Knowing how probability distributions are connected can be helpful, especially when it comes to choosing models.
  • Many charts showing this interconnectivity exist; most are cluttered with more information than you're probably looking for.
  • This article presents a simplified chart to help you choose between models.

Studying the relationships between univariate distributions is a fascinating and practical pursuit. If you choose a model that isn't quite right, there may be another model with a similar shape that fits better. For example, if your data doesn't quite fit a logistic model, it can be useful to know that a simple transformation leads to a choice of several other distributions: the uniform, log-logistic or standard logistic. Similarly, if your data doesn't quite fit a normal distribution, you could try a t-distribution, which has heavier tails, or the chi-squared. Chi-squared distributions are always right-skewed, but as the degrees of freedom increase they look more and more like the normal distribution.
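The chi-squared claim can be checked numerically. The sketch below is only an illustration and is not part of the charts discussed in this article. It uses Python's standard library and relies on two standard facts: a chi-squared variable with k degrees of freedom is a gamma variable with shape k/2 and scale 2, and its skewness is sqrt(8/k). The function names, sample size and seed are arbitrary choices made for the example.

```python
import math
import random


def chi_squared_sample(df, n, rng):
    """Draw n chi-squared(df) values as Gamma(shape=df/2, scale=2)."""
    return [rng.gammavariate(df / 2.0, 2.0) for _ in range(n)]


def sample_skewness(values):
    """Third standardized moment of the sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_by_df(dfs=(2, 10, 50, 200), n=50_000, seed=0):
    """Map each degrees-of-freedom value to the skewness of a simulated sample."""
    rng = random.Random(seed)
    return {df: sample_skewness(chi_squared_sample(df, n, rng)) for df in dfs}


if __name__ == "__main__":
    for df, skew in skewness_by_df().items():
        theory = math.sqrt(8 / df)
        print(f"df={df:>3}  sample skewness={skew:.3f}  theory={theory:.3f}")
```

The skewness stays positive for every choice of degrees of freedom but shrinks toward zero, the skewness of the normal distribution, as the degrees of freedom grow.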

A wealth of information exists on these types of relationships. For example, Leemis et al. [1] created a rich, complex chart with 76 probability distributions and various properties like convolution (indicating that sums of independent random variables come from the same distribution family) and scaling (indicating that a random variable multiplied by any positive real constant comes from the same family). The interplay of information in the chart is great for students of probability, but it may be a tad too much for someone simply looking for alternative, similar models. Here's a snapshot of the chart:

One of the best (and simplest) charts was created by Wheyming Tina Song [2]. It contains 35 univariate distributions along with their parameters and ranges. While this is useful information, some of the distributions are rarely seen in data science, and the extra information makes it hard to quickly see relationships. To solve this problem, I created a simplified version of Song's chart with 24 of the more common distributions.

Key highlights of the chart:

  • Discrete probability distributions are shown in blue.
  • Continuous probability distributions are green.
  • Sampling distributions are in orange.
  • Dashed arrows show asymptotic relationships. These are usually in the limit as one or more of the parameters approach the parameter space boundary.
  • Solid arrows show transformations or special cases. Transformations can include taking the distribution of an order statistic, taking a mixture of random variables, or truncating random variables. Some of these transformations can be inverted; these are shown by double-headed arrows.
  • Orange arrows indicate more than one random variable is involved in forming transformations.

Some of the transformation relationships can be combined. For example, the path Standard Normal → Chi-Squared → Gamma → Exponential indicates that the sum of the squares of two independent standard normal variables has an exponential distribution.
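The Standard Normal → Chi-Squared → Gamma → Exponential path can be checked by simulation. The sum of two squared independent standard normal variables is chi-squared with two degrees of freedom, which is the same as an exponential distribution with mean 2. The following is a minimal sketch using only Python's standard library; the function names, sample size and seed are assumptions made for illustration. It compares the empirical distribution of simulated values against the exponential CDF.

```python
import math
import random


def sum_of_two_squared_normals(n, seed=0):
    """Simulate Z1^2 + Z2^2 for independent standard normal Z1, Z2."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]


def exponential_cdf(x, mean):
    """CDF of an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean)


def max_cdf_gap(samples, mean):
    """Largest gap between the empirical CDF and the exponential CDF."""
    ordered = sorted(samples)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        cdf = exponential_cdf(x, mean)
        # Compare against the empirical CDF just after and just before x.
        gap = max(gap, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return gap


if __name__ == "__main__":
    samples = sum_of_two_squared_normals(20_000)
    print("sample mean:", sum(samples) / len(samples))
    print("max CDF gap vs exponential(mean=2):", max_cdf_gap(samples, 2.0))
```

A sample mean close to 2 and a small maximum gap are what you would expect if the simulated values follow an exponential distribution with mean 2.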

Many more relationships exist among univariate distributions; so many, in fact, that it would be impossible to fit them on a single graph. For example [3]:

  • A geometric random variable is the floor of an exponential random variable.
  • A rectangular random variable is the floor of a uniform random variable.
  • Certain transformations of a random variable with an F distribution can have a beta distribution.
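The first relationship in the list is easy to verify by simulation. If X is exponential with rate λ, then floor(X) equals k with probability (1 − p)^k · p, where p = 1 − e^(−λ); this is the geometric distribution counted from zero. The sketch below assumes that parameterization and uses only Python's standard library. The function names, the rate, the sample size and the seed are illustrative choices.

```python
import math
import random
from collections import Counter


def floored_exponential_pmf(rate, n=100_000, seed=0):
    """Empirical pmf of floor(X), where X is exponential with the given rate."""
    rng = random.Random(seed)
    counts = Counter(math.floor(rng.expovariate(rate)) for _ in range(n))
    return {k: c / n for k, c in sorted(counts.items())}


def geometric_pmf(k, p):
    """P(K = k) for a geometric variable on {0, 1, 2, ...}."""
    return (1.0 - p) ** k * p


if __name__ == "__main__":
    rate = 0.5
    p = 1.0 - math.exp(-rate)
    empirical = floored_exponential_pmf(rate)
    for k in range(6):
        print(f"k={k}  simulated={empirical.get(k, 0.0):.4f}  geometric={geometric_pmf(k, p):.4f}")
```

The two columns should agree to within simulation error.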

References

Chi-square distributions image: Geek3|Wikimedia Commons. GNU Free License.

[1] Univariate Distributions Relationship Chart

[2] Relationships among some univariate distributions

[3] Univariate Distribution Relationships
