Spurious correlations: 15 examples

Sometimes a correlation means absolutely nothing and is purely accidental (especially when you compute millions of correlations among thousands of variables), or it can be explained by confounding factors. For instance, the fact that the cost of electricity is correlated with how much people spend on education is explained by a confounding factor: inflation, which makes both electricity and education costs grow over time. This confounding factor has a bigger influence than true causal factors, such as more administrators or government-funded student loans boosting college tuition.
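To make both effects concrete, here is a minimal simulation sketch. All numbers are made up for illustration: the 200 noise variables, the 3% yearly inflation rate, the base costs of 100 and 500, and the 5% noise level are assumptions, not real data. The first part searches many unrelated variables for the strongest accidental correlation; the second builds two cost series that share nothing but an inflation trend, then correlates them before and after adjusting for inflation.

```python
import math
import random


def pearson(x, y):
    """Standard (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is reproducible

# 1) Accidental correlations: 200 unrelated noise variables, 10 observations each.
variables = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]
max_abs_r = max(
    abs(pearson(variables[i], variables[j]))
    for i in range(len(variables))
    for j in range(i + 1, len(variables))
)

# 2) Confounding: two hypothetical cost series that share only an inflation trend.
years = 40
inflation = [1.03 ** t for t in range(years)]  # assumed 3% per year
electricity = [100 * f * (1 + rng.gauss(0, 0.05)) for f in inflation]
education = [500 * f * (1 + rng.gauss(0, 0.05)) for f in inflation]

r_nominal = pearson(electricity, education)
# Deflate both series (divide out the confounder) and correlate again.
r_real = pearson(
    [e / f for e, f in zip(electricity, inflation)],
    [e / f for e, f in zip(education, inflation)],
)

print(f"largest |r| among 19,900 pairs of pure noise: {max_abs_r:.2f}")
print(f"nominal costs: r = {r_nominal:.2f}")
print(f"inflation-adjusted costs: r = {r_real:.2f}")
```

Under these assumptions, the strongest correlation found among the noise variables should be large even though no variable is related to any other, and the strong correlation between the two nominal cost series should mostly vanish once inflation is divided out.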

Even when there is a correlation that can be leveraged to solve a problem, for example a drug that was found to work better than a placebo for a medical condition, it may work well for some people and not for others: the correlation is not universally strong. Also, causation only matters in specific contexts such as root cause analysis, where you need to fix the cause. Sometimes it does not matter as long as it works; for instance, a drug that works against a medical condition even if nobody knows why.

Besides, the standard correlation (an L^2 metric) is sensitive to outliers and is, indeed, not a great metric. An L^1 metric (to measure correlation) is more robust.
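The sensitivity of the standard correlation to outliers is easy to see on synthetic data. The sketch below does not implement the L^1 metric mentioned above, since its definition is not given here; as a stand-in, it uses Spearman's rank correlation, a different but well-known robust alternative. The data (20 points on a noisy line, with one value replaced by -1000) are invented for illustration.

```python
import math
import random


def pearson(x, y):
    """Standard (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank of each value, 1 = smallest (this sketch assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(1)  # fixed seed so the run is reproducible
x = list(range(1, 21))
y = [2 * v + rng.gauss(0, 0.5) for v in x]  # nearly perfect linear relation

# Corrupt a single observation with an extreme value.
y_outlier = y[:-1] + [-1000.0]

r_clean = pearson(x, y)
r_outlier = pearson(x, y_outlier)
rho_clean = spearman(x, y)
rho_outlier = spearman(x, y_outlier)

print(f"Pearson:  clean = {r_clean:.2f}, with one outlier = {r_outlier:.2f}")
print(f"Spearman: clean = {rho_clean:.2f}, with one outlier = {rho_outlier:.2f}")
```

A single corrupted point is enough to wreck the standard correlation here, while the rank-based measure degrades far less, because an outlier can shift a rank by at most n - 1 positions no matter how extreme its value is.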

Below are a few examples of spurious correlations. Click here to check out the 15 examples. 

  • Richard Ordowich

    Great article.

    Organizations are drowning in correlations! But there is another perspective on this. These correlations provide some value. They provide "Infotainment" and they keep a lot of Data Pushers employed.

    What we lack is critical thinking. Few people ask the necessary pointed questions when presented with nice graphics and story lines along with the data. When Data Pushers present their correlations, they seldom provide much insight as to what was included in the data, what was excluded, and the assumptions they made when developing the correlations.

    Many "force fit" the data to make their point. The premise when seeing data should be "buyer beware". Contrary to what many people believe, data are not "facts". Correlations should be treated as opinions. Should you trust the correlation? What is the story behind the story? Simple questions like "why?" and "show me other models and assumptions" should be the norm.