This is a recent, very popular article published in the FT Magazine. A similar one - Eight (No, Nine!) Problems With Big Data - was published in the New York Times. Both are attacks on big data. The FT article highlights the failure of Google's flu prediction for 2013, and claims that algorithms that focus on correlations only, without trying to identify causes, are doomed to fail.
Here's my reaction:
The fact that Google got its flu prediction wrong last time does not mean that old statistical science is superior, or that big data is wrong. I believe such a prediction can be made with relatively small data - 10,000 tweets talking about the flu are enough.
Likewise, the fact that flu vaccines cause side effects and fail 30% of the time does not mean that they don't work. It just means that you need to collect the right data, and get the right people or automated tools - certainly not traditional statisticians, but instead data-savvy domain experts - to extract meaningful insights. Many times, as in data mining, a causal explanation can't be found, or would take too much time to find, or is of no interest. If you can't make sense of data, if you are data illiterate, or if you blindly believe in statistical models, data hackers will take advantage of you, and/or your ROI will quickly turn negative.
Nobody always wins on the stock market. But if you win significantly more frequently than the average trader, you do get a positive ROI. Big data is at least as good as small data: it evidently provides the same amount of information or more, and yet it is cheap, maybe even cheaper than small data. Saying big data is worse than small data is like saying you can't put more people in a 50-story skyscraper than you can in a 3-story house. The analogy is quite good indeed, because if you build your skyscraper the same way you build a 3-story house, it will collapse by the time you reach floor 10. You need the right tools and methodology, whether to build a skyscraper (vs. a small house) or to build and leverage big data (vs. small data); otherwise, in both cases, it's bound to fail.