In the film Silver Linings Playbook, Jennifer Lawrence’s character, Tiffany, attempts to win over the Philadelphia Eagles-obsessed family of her friend, Pat, by claiming that her time with him is bringing the team good “juju.”
“The first night Pat and I met at my sister’s, the Eagles beat the 49ers handily, 40–26,” she says, pacing around a living room filled with Pat’s rapt relatives. “The second time we got together we went for a run and the Phillies beat the Dodgers 7–5 in the NLCS. The next time we went for a run, the Eagles beat the Falcons 27–14…” And so on.
With the Super Bowl approaching, it’s easy to find other fans who fervently cling to supposed causal connections between events and victories. And should their teams prevail, it’s also sometimes hard to disprove the superstition. Consider the Redskins rule, as explained by John Elder, founder of the data mining firm Elder Research: for over 70 years, if the Washington Redskins won their last home game before the election, the incumbent party would win the presidential election. One didn’t actually cause the other, but for generations, the two just happened to line up.
The rich realm of sports superstitions and rituals this time of year highlights the increasing need for all of us, not just those glued to their Super Bowl screens, to grasp the basics of statistics well beyond the boundaries of any playing field.
Ill-prepared consumers have been forced in recent years to master the financial literacy skills needed to manage their own retirements, investments, and other complex financial instruments as companies shifted these responsibilities to them. These days, statistical literacy is emerging as an equally important skill. An avalanche of big data and a steady stream of media reports on statistics and research force us to glean the truth from the numbers on everything from whether certain vitamin supplements affect heart attack rates to the risks versus benefits of regular mammograms.
But like consumers struggling to figure out how much they’re really paying in fees for their IRAs, folks who don’t know the basics of statistics find themselves at a serious disadvantage.
Statistics and statistical theories serve as the basis for everything from passenger profiling in an era of terrorist threats to measuring the effectiveness of new programs to reduce hospital errors. They can shed light on whether a hedge fund’s success is genuine or due to chance. They predict whether a given subscriber will leave this year, or whether an insurance claim is likely to be fraudulent.
They can mislead as well as enlighten, and we need to know the difference. Even in completely randomly generated data, interesting patterns appear. If the data are big enough and the search exhaustive enough, the patterns can be very compelling. But they may be nothing more than a mirage that disappears with time and further investigation.
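To see how easily that happens, here is a small Python simulation. It is a sketch under made-up assumptions, not an analysis of real election or football data: it treats 18 election outcomes as fair coin flips, generates 10,000 equally random "predictors," and reports how many outcomes the best of them gets right. Any single random predictor matches about 9 of 18 on average, but the best of 10,000 will almost always match 16 or more, which looks like a Redskins-style rule even though nothing real is being predicted. The function name and parameters are illustrative choices.

```python
import random

def best_spurious_match(n_events=18, n_candidates=10_000, seed=1):
    """Search many purely random 'predictors' for the one that best matches
    a purely random outcome series, and return its number of hits."""
    rng = random.Random(seed)
    # Hypothetical outcomes (e.g. did the incumbent party win?): fair coin
    # flips, so there is nothing real to predict.
    outcomes = [rng.random() < 0.5 for _ in range(n_events)]
    best = 0
    for _ in range(n_candidates):
        # Each candidate "predictor" is also just coin flips.
        candidate = [rng.random() < 0.5 for _ in range(n_events)]
        hits = sum(c == o for c, o in zip(candidate, outcomes))
        best = max(best, hits)
    return best

best = best_spurious_match()
print(f"Best random predictor matched {best} of 18 outcomes")
```

Raising `n_candidates` makes the best spurious match stronger still; that is the "exhaustive search" effect in action.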
Well-known missteps from the statistics world include studies showing that children with bigger feet are better spellers and that states with higher divorce rates have lower death rates. But older children, who tend to have bigger feet, naturally spell better than younger ones, and states with higher divorce rates have larger shares of younger adults, who die at lower rates than older people. Don’t rush to your lawyer’s office; getting divorced won’t help you live longer.
I’ve been teaching statistics for over two decades. What I’ve seen is that for many people, learning statistics is as baffling as reading the fine print in their financial documents. Usually that’s because much of the teaching is forced, artificial, and divorced from what most students will end up doing in the workforce or in their daily lives. But the rapid growth of data science and analytics is opening up new ways to teach and learn statistics. People can learn to draw statistical meaning not just from mathematics, but also from context and purpose. It’s not just an academic exercise. If you don’t understand the statistical world around you, you don’t really know how things work: whether watching television actually causes violent behavior, how your gender might affect your earnings, even the best time of day to exercise to lose weight.
And in a world where misinformation spreads quickly through the media, our failure to comprehend statistics regularly leads to controversy, and consequences.
Parents in a Maryland suburb recently allowed their children, ages 10 and 6, to walk home alone from a neighborhood park, sparking a debate on our understanding of risk. We might feel like the world is more dangerous for children than it was a few decades ago, but the numbers don’t bear that out. Still, that didn’t stop the police from showing up at the family’s door.
Then there’s the vaccine debate. A study last year created a stir on the Internet by purporting to show that African American boys faced a greater risk of autism depending on when they were vaccinated, and by alleging that the Centers for Disease Control and Prevention had covered up the findings. The journal that published it eventually retracted it. Among other problems, the study was poorly designed, and it analyzed its own data incorrectly, according to STATS.org, a nonprofit group that promotes statistical literacy. But that doesn’t mean people will stop believing it.
“These flaws will be obvious to statisticians and to scientists who understand statistical analysis,” STATS.org director Rebecca Goldin wrote in a recent post. “The problem is how to undo the damage among a public that is skeptical of scientific authority, and is suspicious or even hostile toward vaccination.”
I’m launching a series of posts to offer some examples of the practical application of statistics in the world around us, in an effort to help us all understand them better. I’ll explain methods I employ, like resampling and bootstrapping, that make statistics simpler and more transparent.
The Silver Linings Playbook example shows how coincidences happen all the time in life, often more than we think possible. I will explain how resampling, the computer equivalent of drawing numbers from a hat, can decode some of the mystery of statistical analysis and help us tell, with much greater certainty, what is real and what is chance. We do that by taking repeated samples from the observed data, or shuffling that data, to see what effect random variation might have on our statistical estimates, our models, and our conclusions.
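Here is what taking repeated samples from observed data can look like in practice: a minimal percentile bootstrap for a sample mean, in plain Python. The sample values are hypothetical, and the percentile interval is only one of several bootstrap variants, so treat this as a sketch rather than a recipe for every situation. Each resample draws the same number of values, with replacement, from the original data; the spread of the resampled means shows how much the estimate could wobble from random variation alone.

```python
import random
import statistics

def bootstrap_ci(data, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Draw n values *with replacement* from the observed data, many times,
    # and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(data, k=n))
                   for _ in range(n_resamples))
    # Trim the same number of resampled means from each tail.
    lo_idx = int((1 - level) / 2 * n_resamples)
    return means[lo_idx], means[n_resamples - 1 - lo_idx]

# Hypothetical sample, e.g. minutes of exercise per day for 12 people.
sample = [22, 35, 18, 40, 27, 31, 15, 45, 29, 33, 24, 38]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI approx. ({low:.1f}, {high:.1f})")
```

No formula for the standard error is needed here; the computer's repeated draws from the "hat" stand in for the theory.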
For example, I use a black magician’s hat that I usually don at Halloween to explain the permutation test: combine two or more samples in the hat, shuffle the hat, and then pick out resamples at random. I used this approach to test whether the apparent effect of a program to reduce medical errors was real or due to chance, but it can apply to many other questions as well.
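The hat procedure translates directly into a few lines of code. This sketch runs a one-sided permutation test on a difference in means; the ward-level error counts are invented for illustration and are not the data from the medical-errors program mentioned above. The pooled list plays the role of the hat: shuffle it, deal out two groups of the original sizes, and count how often chance alone produces a difference at least as large as the one observed.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """One-sided permutation test: how often does a random relabeling
    produce a difference in means (a minus b) at least as large as observed?"""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)   # put everything in the "hat"
    n_a = len(group_a)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)                  # shuffle the hat
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if diff >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_shuffles + 1)

# Hypothetical monthly error counts on wards without and with the program.
without_program = [10, 12, 11, 13, 12, 14]
with_program = [5, 6, 4, 7, 5, 6]
diff, p = permutation_test(without_program, with_program)
print(f"observed reduction = {diff:.2f} errors/month, p approx. {p:.4f}")
```

A small p-value says that shuffling rarely reproduces a gap this large, so chance is an unlikely explanation; a p-value near 0.5 or above would mean the observed gap is unremarkable.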
In future posts, I’ll talk about how you can judge for yourself the validity of political surveys and polls as campaign season heats up. I’ll use the recent explosion in subprime car loans to revisit how data were misused and misinterpreted during the subprime mortgage crisis. I’ll talk about the ways predictive analytics and modeling techniques can go right, and wrong. There are ways to take the intimidation out of statistics. You really can cut through the fog and understand exactly what significance tests, p-values, confidence intervals, and other difficult concepts are all about. And overcoming the fear gives you a genuine statistical advantage.
(Peter Bruce is founder of The Institute for Statistics Education at Statistics.com, the leading online provider of analytics and statistics courses since 2002. He is also the author of the newly released Introductory Statistics and Analytics: A Resampling Perspective (Wiley).)