We all know that correlations range from -1 to +1. What about correlations between random variables that take only positive values, for instance with Poisson, exponential or Gamma marginals? You would think that such random variables have a hard time being very negatively correlated. Here we focus on a specific example that has practical applications.

Let's assume that we are dealing with a bivariate distribution (X, Y), where the two marginals X and Y are both exponential. What is the most negative correlation that X and Y can have? The answer is not -1: it is about -0.645, and the exact value is 1 - π^2/6. Read this article for a proof, and for more general results. In particular, if you want to generate an even more negative correlation, try Gamma distributions.
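A quick way to check this value numerically: a standard construction for making two variables as negatively dependent as possible is to feed one uniform variable U into the first inverse CDF and 1 - U into the second. For Exp(1) marginals this gives X = -ln(U) and Y = -ln(1 - U). Both have mean 1 and variance 1, and E[XY] is the integral of ln(u) ln(1 - u) over (0, 1), which equals 2 - π^2/6, so the correlation is 1 - π^2/6. The sketch below is a minimal Monte Carlo check in plain Python; the function names, sample size and seed are illustrative choices, not taken from the referenced article.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def antithetic_exponentials(n, seed=0):
    """Draw n pairs (X, Y), each Exp(1), coupled through a single uniform U.

    X = -ln(U) and Y = -ln(1 - U), so X is large exactly when Y is small.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.random()
        # random() returns values in [0, 1); redraw on 0.0 so log(u) is defined
        while u == 0.0:
            u = rng.random()
        xs.append(-math.log(u))
        ys.append(-math.log1p(-u))  # log1p(-u) == ln(1 - u)
    return xs, ys


theoretical = 1 - math.pi ** 2 / 6
xs, ys = antithetic_exponentials(200_000)
estimate = pearson(xs, ys)

print(f"theoretical bound:  {theoretical:.4f}")
print(f"simulated estimate: {estimate:.4f}")
```

With a few hundred thousand draws, the simulated estimate should land close to the theoretical bound, up to sampling noise.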

Application

This model has been used for weather prediction: X and Y are the duration and intensity of storm cells, respectively. They are typically modeled as independent variables, whereas in reality, the more intense the precipitation, the shorter its duration (hence a negative correlation). Accounting for this dependence helps develop a more accurate weather prediction system. You can read the detailed paper here.

Replies to This Discussion

Hi Vincent,

Can you please clarify your very first statement? I do not really know that "correlations range from -1 to +1". I know that it is true for correlations measured by Pearson's correlation coefficient. I also know that Pearson's correlation coefficient only measures linear relations between variables, and this very strong practical limitation has led to endless attempts to introduce other correlation measures, but none is nearly as popular, I believe mostly because their interpretation is much less straightforward.

Many thanks,

Michael

Hi Michael,

Yes, I was referring to the classic coefficient of correlation that you study in high school. I agree, it has many drawbacks, and I am myself an advocate of alternative measures of correlation, see for instance this article.

Best,

Vincent

Thank you Vincent, it is a useful clarification. Unfortunately, quite a few people tend to use the tool they know best rather than the one appropriate for a particular situation. Pearson's r will return roughly 0 for a perfectly functional relation such as y=sin(x) sampled over many periods, but it does not mean that r is not a good tool :)

Kind regards,

Michael
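The point about functional but nonlinear relations is easy to check numerically. The sketch below (plain Python; the helper names, grids and ranges are illustrative assumptions) computes Pearson's r for y = cos(x) on a range symmetric about zero, for y = sin(x) on a single period, and for y = sin(x) over many periods. How close r gets to 0 depends on the sampling range, not only on the function.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def grid(lo, hi, n):
    """n evenly spaced points from lo to hi, endpoints included."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# One period, symmetric about zero
x_one = grid(-math.pi, math.pi, 2001)
r_cos = pearson(x_one, [math.cos(x) for x in x_one])
r_sin_one = pearson(x_one, [math.sin(x) for x in x_one])

# Fifty periods, symmetric about zero
x_wide = grid(-50 * math.pi, 50 * math.pi, 100_001)
r_sin_wide = pearson(x_wide, [math.sin(x) for x in x_wide])

print(f"cos(x), one period:    r = {r_cos:.4f}")
print(f"sin(x), one period:    r = {r_sin_one:.4f}")
print(f"sin(x), fifty periods: r = {r_sin_wide:.4f}")
```

On a single period, sin(x) still has a sizeable linear trend through the origin, so r is far from 0 there, while cos(x) on the same range and sin(x) over many periods give values at or near 0 even though y is fully determined by x in every case.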

© 2017 Data Science Central