This is a recent, very popular article published in the FT Magazine. A similar one, *Eight (No, Nine!) Problems With Big Data*, was published in the New York Times. Both are attacks on big data. The article highlights the failure of Google's flu prediction for 2013 and claims that algorithms focused on correlations only, without trying to identify causes, are doomed to fail.

**Here's my reaction**:

The fact that Google got its flu prediction wrong last time does not mean that old statistical science is superior, or that big data is wrong. I believe such a prediction can be made with relatively small data: 10,000 tweets talking about the flu are enough.

Likewise, the fact that flu vaccines cause side effects and fail 30% of the time does not mean that they don't work. It just means that you need to collect the right data and get the right people or automated tools (certainly not traditional statisticians, but data-savvy domain experts) to extract meaningful insights. Many times, as in data mining, a causal explanation can't be found, would take too much time to find, or is of no interest. If you can't make sense of data, if you are data illiterate, or if you blindly believe in statistical models, data hackers will take advantage of you, and/or your ROI will quickly turn negative.

Nobody *always* wins on the stock market. But if you win significantly more frequently than the average trader, you really get a positive ROI. Big data is at least as good as small data: it evidently provides the same amount of information or more, and yet it is cheap, maybe even cheaper than small data. Saying big data is worse than small data is like saying you can't put more people in a 50-story skyscraper than you can in a 3-story house. The analogy is quite good indeed, because if you build your skyscraper the same way you build a 3-story house, it will collapse by the time you reach floor 10. You need the right tools and methodology, whether you are building a skyscraper (vs. a small house) or building and leveraging big data (vs. small data). Otherwise, in both cases, it's bound to fail.
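The stock market point can be illustrated with a little arithmetic. The sketch below is not from the article: the function name, the 50% and 55% win rates, and the equal 10% gain and loss sizes are all hypothetical values chosen for illustration. It assumes every winning trade gains the same fraction and every losing trade loses the same fraction, which is a simplification.

```python
def expected_roi_per_trade(win_rate: float, gain: float, loss: float) -> float:
    """Expected return per unit staked on one trade.

    win_rate: probability that a trade wins (between 0 and 1)
    gain:     fraction gained on a winning trade (hypothetical)
    loss:     fraction lost on a losing trade (hypothetical)
    """
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    return win_rate * gain - (1.0 - win_rate) * loss


# Hypothetical numbers: an average trader wins half the time,
# a better trader wins 55% of the time, with equal-sized wins and losses.
average_trader = expected_roi_per_trade(0.50, gain=0.10, loss=0.10)
better_trader = expected_roi_per_trade(0.55, gain=0.10, loss=0.10)

print(f"average trader: {average_trader:+.3f} per unit staked")
print(f"better trader:  {better_trader:+.3f} per unit staked")
```

Under these assumptions the average trader breaks even and the better trader comes out ahead on every trade in expectation, without ever needing to win every time. If losses were larger than gains, the win rate required for a positive result would be higher.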

