My favorite quote on Big Data is by Dan Ariely, who says: "Big Data is like teenage sex: everyone talks about it, no one really knows how to do it, everyone thinks everyone else is doing it, so everyone claims that they are doing it..."

Dan Ariely's views aside, Gartner defines Big Data as "high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."

Regardless of which definition you prefer, stark differences are emerging between how Big Data is used and how more traditional Business Intelligence is used. Inferential statistical techniques are applied to Big Data to reveal relationships and dependencies or to predict outcomes and behavior, whereas Business Intelligence tends to be oriented toward descriptive statistics and reporting. It is this shift to predictive analytics that makes it important for all of us to be more aware of what Big Data can and, more importantly, cannot do.

In their article "Critical Questions for Big Data", danah boyd and Kate Crawford define Big Data as a cultural, technological, and scholarly phenomenon that rests on the interplay of:

1. Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link and compare large data sets

2. Analysis: drawing on large data sets to identify patterns to make economic, social, technical and legal claims

3. Mythology: the belief that big data sets offer a higher form of intelligence and knowledge... with the aura of truth, objectivity and accuracy.

Big Data is not better data: Big Data does not by itself solve the problem of bias. Care is needed to ensure that big data sources (such as Twitter) are representative of the overall population; if they are not, conclusions drawn from them can be wrong.
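To make the point about representativeness concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the population, the 30% share of young people, the support rates, and the chance that each group shows up on a hypothetical platform. It only suggests how a very large but skewed sample can miss the true figure by more than a small random one.

```python
import random


def simulate_population(size, seed=0):
    """Build a hypothetical population of (is_young, supports) pairs.

    Illustrative assumptions: 30% of people are young; 70% of young people
    and 40% of older people support some proposal (true rate is about 49%).
    """
    rng = random.Random(seed)
    people = []
    for _ in range(size):
        is_young = rng.random() < 0.30
        supports = rng.random() < (0.70 if is_young else 0.40)
        people.append((is_young, supports))
    return people


def support_rate(people):
    """Share of people in the given list who support the proposal."""
    return sum(1 for _, supports in people if supports) / len(people)


def platform_sample(people, seed=1):
    """Mimic a skewed data source: young people are assumed to appear with
    probability 0.80 and older people with probability 0.10."""
    rng = random.Random(seed)
    return [p for p in people if rng.random() < (0.80 if p[0] else 0.10)]


def random_sample(people, size, seed=2):
    """Draw a simple random sample without replacement."""
    return random.Random(seed).sample(people, size)


if __name__ == "__main__":
    population = simulate_population(200_000)
    big_biased = platform_sample(population)
    small_random = random_sample(population, 1_000)
    print("true rate:          ", round(support_rate(population), 3))
    print("big biased sample:  ", len(big_biased), round(support_rate(big_biased), 3))
    print("small random sample:", len(small_random), round(support_rate(small_random), 3))
```

In this sketch the skewed sample is tens of times larger than the random one, yet its estimate sits further from the true rate because the error comes from who is in the data, not from how much data there is.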

Claims to objectivity and accuracy can be misleading: Big Data also introduces new problems, such as the multiple comparisons problem, in which testing a large number of hypotheses can produce false positives by chance alone.
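A small sketch can show why testing many hypotheses produces false positives. It assumes independent tests, a significance level of 0.05, and that no real effect exists anywhere, so each p-value is modeled as a uniform random draw. The function names are hypothetical, and the Bonferroni correction is included only as one common remedy, not something the critics cited here prescribe.

```python
import random


def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests
    when every null hypothesis is true: 1 - (1 - alpha) ** num_tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_positives(num_tests, alpha=0.05, seed=0):
    """Count 'significant' results when there is nothing to find.

    Under a true null hypothesis a p-value is uniform on [0, 1], so each
    test is modeled here as a single uniform draw compared with alpha.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(num_tests) if rng.random() < alpha)


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    for num_tests in (1, 20, 1000):
        print(
            num_tests,
            round(familywise_error_rate(num_tests), 3),
            simulate_false_positives(num_tests),
            bonferroni_threshold(num_tests),
        )
```

Under these assumptions, 20 tests already carry roughly a 64% chance of at least one spurious "finding", and about 5% of all tests come out significant no matter how many are run, which is why a large data set with many candidate relationships needs a stricter threshold.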

Big Data analytical techniques have lagged: Much of the effort in Big Data projects goes into the extract, transform, load (ETL) stage, and critics argue that the analytical techniques used on Big Data have not kept pace with the evolving needs of large data sets.

In conclusion, while the allure and promise of Big Data are undeniable, it behooves all of us to understand that Big Data initiatives are not magic pills that will solve our problems. They do not remove the need for judgment and for understanding the underlying issues and theory before attempting to solve them.

Image Credit: Alan Tun

Comment by Mathieu Landry on November 4, 2016 at 4:37am

Also love the quote. haha.

Glad to see the data quality importance shine through. Garbage in, garbage out. Big is just a question of scale.

I remember in University when I was the 'only' one loving the statistics course. The problems with 'big data' are the same. Applying the same 'scientific' rigour is perhaps the challenge. Hence IT solutions to handle the load. And AI to go faster...

Comment by Boris Shmagin on September 25, 2016 at 11:10am

The illustration for the text is good; it is emotional.
