Swimming Pools and Nicolas Cage: A Look at How We Misunderstand Correlation

Can the recent four-point Super Bowl victory by the New England Patriots clue us in as to how many people will die in helicopter accidents?

The link between helicopter accident fatalities and Super Bowl point spreads is the latest Spurious Correlation from Tyler Vigen, who compiles a highly amusing list of things that appear to be related, but most assuredly are not. (The number of people who drowned by falling into a swimming pool and the number of films Nicolas Cage appeared in is a personal favorite.)

As I mentioned in my earlier post, it’s easy to fall for supposed causal connections between events. And we have a long history of doing so, separate from the Super Bowl and other sports events. In the 1950s, for example, the incidence of polio was on the rise. Disease rates were found to be correlated with the consumption of ice cream — the more ice cream consumed, the more polio cases. Some medical authorities advised parents not to feed ice cream to their children.

The misunderstanding of correlation when it comes to public health clearly persists today — we see it in the current and heated controversy over whether vaccines are safe for children. That debate shows us how mastering the basics of statistics is becoming increasingly important. The decisions we face on a regular basis often depend on the level of our understanding of risk, and other statistical measures. Our choices directly affect our quality of life — and, as the measles outbreak illustrates, the lives of those around us.

If you have any doubt that we all need to become more statistically literate, consider a different vaccine debate, the one over the flu shot. In the mid-1970s, for reasons that remain unclear, there was an increase in the incidence of Guillain-Barre syndrome, a debilitating disorder, among people who got the swine flu vaccine. But you can also get GBS from contracting the flu itself. So which is riskier, getting the flu shot, or getting the flu? As the Washington Post pointed out last month, scientists at the Ottawa Hospital Research Center studied the numbers and concluded it’s almost always better to get the flu shot. But decide for yourself — they also developed an online tool that allows you to calculate your personal risk, based on your age, gender, and other factors.

As you encounter statistical dilemmas like these going forward, there are some basic theories you can learn to help you think them through. When you are presented with correlations that seem, well, spurious, for example, it’s important to remember this: Correlation is not causation. Correlation — even statistically significant correlation — does not necessarily imply anything about causation.

Consider a study that found that infants who slept with the lights on were more likely to develop near-sightedness later in life. The study caused many a parent to switch off the lights in the nursery at night, but that missed the point. The real cause was not the light, but a genetic link to myopic parents. A follow-up study discovered that myopic parents were more likely than parents with normal sight to leave the lights on in their infants’ rooms. The lights correlated with the development of myopia, but they were not the cause. The real cause was the parents’ myopia.

So when you are sizing up correlations, question whether there is a reasonable theory to explain the correlation, and whether some third factor might cause it.
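To see how a third factor can manufacture a correlation, here is a minimal simulation sketch. Everything in it is made up for illustration: the names `third_factor`, `a`, and `b` are hypothetical, and the data are random numbers, not real measurements. Two quantities are each driven by the same hidden factor plus independent noise, and neither one influences the other. They still come out clearly correlated (in theory about 0.5 with these settings), and the correlation all but disappears once the hidden factor is subtracted out of both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical hidden factor that drives both measurements.
third_factor = [rng.gauss(0, 1) for _ in range(5000)]

# a and b each depend on the hidden factor, but not on each other.
a = [z + rng.gauss(0, 1) for z in third_factor]
b = [z + rng.gauss(0, 1) for z in third_factor]

r_raw = pearson(a, b)
r_adjusted = pearson(residuals(a, third_factor), residuals(b, third_factor))

print(f"correlation of a and b:                  {r_raw:.2f}")
print(f"after removing the third factor's share: {r_adjusted:.2f}")
```

The catch in real life is that you can only adjust for a third factor you have thought of and measured, which is why asking what it might be comes first.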

Think about the polio-ice cream link I described. Can you think of an external factor that might be correlated with both ice cream consumption and polio incidence? Let me know your theories, and I’ll write about them in an upcoming post.

In the meantime, regarding the supposed link between helicopter deaths and Super Bowl point spread… forget it. But when it comes to real-life correlations we often encounter, there are ways you can determine for yourself whether trends in data are real or due to chance, and I’ll explain that in upcoming posts.
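As a preview, one common way to check whether a correlation could be due to chance is a permutation test: shuffle one of the two series many times and count how often the shuffled data look at least as correlated as the real data. The sketch below is only an illustration under my own assumptions. The function name `permutation_p_value`, the number of shuffles, and the sample numbers are all hypothetical, and the data are invented.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Estimate how often shuffled data are at least as correlated as the real data."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)  # copy, so the caller's list is left untouched
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Invented numbers, for illustration only.
years = list(range(12))
steady_trend = [2 * x + 1 for x in years]                # rises in lockstep with years
jumbled = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0, 11, 10]         # no obvious pattern

p_trend = permutation_p_value(years, steady_trend)
p_jumbled = permutation_p_value(years, jumbled)
print(f"p-value, steady trend: {p_trend:.4f}")
print(f"p-value, jumbled data: {p_jumbled:.4f}")
```

A small p-value says only that chance alone is an unlikely explanation for the correlation. It says nothing about whether one thing causes the other.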

(Peter Bruce is founder of The Institute for Statistics Education at Statistics.com, the leading online provider of analytics and statistics courses since 2002. He is also the author of the newly released Introductory Statistics and Analytics: A Resampling Perspective (Wiley).)

Follow Peter:
Twitter: @petercbruce, @statisticscom
Websites: www.statistics.com, www.introductorystatistics.com

Tags: bowl, correlation, polio, statistics, super, vaccines


Comment by Ru Coulter on March 10, 2015 at 1:25am

I believe polio can be caught from non-sterile water: so folk swimming in hot weather in lakes & streams would be more exposed to contagion than if they didn't swim - and only the hardiest go swimming in winter.  Ice-cream sales tend to peak in hot weather too.

Comment by Effi Psychogiou on March 9, 2015 at 10:53pm

On the polio-ice cream link, my guess would be that it has to do with hot weather, during which people eat more ice cream and which may also be a better season for polio to develop and spread.

Comment by John Miglautsch on March 9, 2015 at 8:41am

It would be great to know where ice cream consumption was higher and lower.  I'm guessing, since our little town had a soda fountain, that there was more consumption with higher population density.  My mom talked about walking to the store to buy ice cream.  Polio was contracted by close interaction.  My theory is that it was proximity/population density that gave rise to both.  @JRMigs
