A popular phrase tossed around when we talk about statistical data is “there is correlation between variables”. However, many people wrongly consider this to be the equivalent of “there is causation between variables”. It’s important to explain the distinction. Correlation means that once we know how one variable changes, we can make reasonable deductions about how other variables change. There are several variants of correlation:
Positive correlation means that when one variable increases/decreases, the other variable tends to rise/fall as well. A good analogy would be the number of stories in an apartment building and the number of apartments. Logically, a taller building will probably contain more apartments. Full positive correlation equals 1 and means that we can use data to explicitly model changes in the values of one variable when we know how the other changes. In practice this rarely happens, or it describes situations which are obvious and not useful. For example, the volume of oil is perfectly correlated with its mass.
Correlation that equals 0 means there is no way to deduce behavior of one variable based on another variable’s behavior.
Negative correlation means that with the rise/fall of one variable, we can expect a decrease/increase of the other variable. Negative correlation between -1 and 0 means only a partial possibility of deduction. For example, if a person’s weight and running speed are negatively correlated, a heavier person usually can’t run as fast as a lighter person – but it’s not always the case. Full negative correlation equals -1 and means that we can perfectly deduce the fall/rise of one variable knowing the rise/fall of the other. In practice it rarely happens or is obvious and, therefore, not useful.
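To make the three variants concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. It assumes that "correlation" in this post means the Pearson (linear) coefficient, and all the numbers (`volume_l`, `mass_kg`, `weight_kg`, `speed_kmh`) are invented purely for illustration. The last pair is worth a look: `ys` is fully determined by `xs`, yet the coefficient is 0, which suggests that a zero coefficient rules out only a linear relationship, not every kind of dependence.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, made up for this example only.
volume_l = [1.0, 2.0, 3.0, 4.0, 5.0]
mass_kg = [0.9 * v for v in volume_l]        # mass proportional to volume
weight_kg = [55, 62, 70, 78, 85, 93]
speed_kmh = [16, 15, 15.5, 13, 12, 12.5]     # heavier is mostly, not always, slower

r_full = pearson(volume_l, mass_kg)          # full positive correlation
r_negative = pearson(weight_kg, speed_kmh)   # partial negative correlation

# Dependent but uncorrelated: y is exactly x squared on a symmetric range.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r_zero = pearson(xs, ys)

print(r_full, r_negative, r_zero)
```

With real data you would more likely reach for a library routine than a hand-written function; the manual version is used here only so that every step of the calculation is visible.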
Many people consider correlation as sure proof of causality between variables. That’s not true – correlation can be explained by 5 possibilities:
It may seem that the last point is somehow lazy, but because of the sheer amount of data available and the rules of probability, we can find large numbers of correlated variables that in practice are not connected at all.
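A small simulation can illustrate why chance alone is enough. The sketch below generates a few hundred short series of independent random numbers and then searches every pair for the strongest correlation. The sizes (`n_series`, `n_points`) and the seed are arbitrary assumptions chosen for the example, not values taken from any real data set.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)                    # arbitrary seed, only to make the run repeatable
n_series, n_points = 200, 10      # assumed sizes: many variables, few observations

# Every series is independent noise, so no pair is connected in reality.
series = [[random.gauss(0, 1) for _ in range(n_points)]
          for _ in range(n_series)]

# Scan all pairs and keep the one with the largest absolute correlation.
best_r, best_pair = 0.0, None
for i in range(n_series):
    for j in range(i + 1, n_series):
        r = pearson(series[i], series[j])
        if abs(r) > abs(best_r):
            best_r, best_pair = r, (i, j)

print(f"strongest correlation among unrelated series: {best_r:.2f} for pair {best_pair}")
```

Because there are nearly 20,000 pairs to choose from and only ten observations per series, the strongest pair typically shows a correlation that would look impressive in isolation, even though nothing connects the two series.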
The webpage http://www.tylervigen.com/spurious-correlations offers a great selection of these types of combinations. Enjoy. :-)
Comment
>Correlation that equals 0 means there is no way to deduce behavior of one
>variable based on another variable’s behavior.
Really?! Is correlation the only measure of dependence?
How about a canonical (counter)example of dependent variables with zero correlation?
© 2020 Data Science Central