My favorite quote on Big Data comes from Dan Ariely, who says: "Big Data is like teenage sex, everyone talks about it, no one really knows how to do it, everyone thinks everyone else is doing it, so everyone claims that they are doing it..."
Dan Ariely's views aside, Gartner defines Big Data as "high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."
Regardless of which 'definition' you prefer, some stark differences are emerging in how Big Data is used compared to more traditional Business Intelligence. Inferential statistical techniques are applied to Big Data to reveal relationships and dependencies or to predict outcomes and behavior. Business Intelligence tends to be oriented more toward descriptive statistics and reporting. It is this shift to predictive analytics using Big Data that makes it important for all of us to be more aware of what Big Data can and, more importantly, cannot do.
In their article "Critical Questions for Big Data", the authors define Big Data as a cultural, technological, and scholarly phenomenon that rests on the interplay of:
1. Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link and compare large data sets
2. Analysis: drawing on large data sets to identify patterns to make economic, social, technical and legal claims
3. Mythology: the belief that big data sets offer a higher form of intelligence and knowledge... with the aura of truth, objectivity and accuracy.
Big Data is not better data: Big Data does not by itself solve the problem of bias. Care needs to be taken that big data sources (such as Twitter) are representative of the overall population; if they are not, results drawn from them can lead to wrong conclusions.
Claims to objectivity and accuracy can be misleading: Big Data also introduces new problems, such as the multiple comparisons problem, in which testing a large number of hypotheses is likely to produce some false positives purely by chance.
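To make the multiple comparisons problem concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method from the article: the tests are assumed to be independent, the 0.05 significance level is a conventional choice, and the function names are made up for this example. It relies on the fact that when no real effect exists, a p-value is uniformly distributed between 0 and 1, so some tests will fall below the threshold by chance alone.

```python
import random


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when none of the tested effects is real."""
    return 1 - (1 - alpha) ** num_tests


def count_false_positives(num_tests, alpha=0.05, seed=0):
    """Simulate tests where no effect exists and count how many still look
    'significant'. Under the null hypothesis a p-value is uniform on [0, 1],
    so each p-value is drawn as a uniform random number (an assumption of
    this sketch, not real data)."""
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(num_tests)]
    return sum(p < alpha for p in p_values)


def bonferroni_threshold(num_tests, alpha=0.05):
    """Stricter per-test threshold that keeps the overall false positive
    rate at or below alpha (Bonferroni correction)."""
    return alpha / num_tests


if __name__ == "__main__":
    for m in (1, 20, 1000):
        print(
            f"tests={m:5d}  "
            f"P(at least one false positive)={familywise_error_rate(m):.3f}  "
            f"simulated false positives={count_false_positives(m)}  "
            f"Bonferroni threshold={bonferroni_threshold(m):.5f}"
        )
```

With a single test the chance of a false positive is the familiar 5%, but with 20 independent tests the chance of at least one rises to roughly 64%. The Bonferroni correction shown here is one simple, conservative remedy among several; it is included only to show that the threshold has to tighten as the number of hypotheses grows.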
Big Data analytical techniques have lagged: Much of the effort in Big Data projects goes into the extract, transform, load (ETL) stage, and critics argue that the analytical techniques applied to Big Data have not kept pace with the evolving needs of large data sets.
In conclusion, while the allure and promise of Big Data are undeniable, it behooves all of us to understand that Big Data initiatives are not magic pills that will solve our problems. They do not remove the need for judgment and for understanding the underlying issues and theory before attempting to solve them.
Image Credit: Alan Tun