More speed, more data, real-time analytics, in-stream analytics: everything seems to be trending toward size and speed. Take the now almost universally accepted aphorism that 'more data beats better algorithms'. I beg to differ. There are plenty of examples where accepting this rule of thumb has led to shortcuts on the analytic side and lots of bad results.
It's always good to read about projects gone wrong as a reminder that we need to pay attention to the basics of good analytics. Larry Greenemeier gives us a great example: the widely touted Google Flu Trends analysis, which was supposed to be an accurate worldwide predictor of annual influenza trends derived from related Google searches, missed the mark by a wide margin.
"GFT has overestimated peak flu cases in the U.S. over the past two years. GFT overestimated the prevalence of flu in the 2012-2013 season, as well as the actual levels of flu in 2011-2012, by more than 50 percent, according to the researchers, who hail from the University of Houston, Northeastern University and Harvard University. Additionally, from August 2011 to September 2013, GFT over-predicted the prevalence of flu in 100 out of 108 weeks."
"Big data hubris is the “often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.” The mistake of many big data projects, the researchers note, is that they are not based on technology designed to produce valid and reliable data amenable for scientific analysis. The data comes from sources such as smartphones, search results and social networks rather than carefully vetted participants and scientific instruments."
Great article. Read the full article here.