
Big data is not expensive. You can process 10 terabytes of data per year on colocated servers using open source tools (Python, for example; I do it in Perl), with your own home-made Hadoop system if needed, and score 100 billion transactions, all for less than $1,000 per year. It takes some optimization in how you manage your files (you don't even need a database, just a robust file architecture), but it is entirely doable.
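To put these figures in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes that the 10 terabytes and the 100 billion transactions describe the same yearly workload, that a terabyte is 10^12 bytes, and that the full $1,000 budget is spent. The variable names are illustrative.

```python
# Figures quoted in the paragraph above. Treating them as one yearly
# workload is an assumption made for this illustration.
TB = 10**12  # decimal terabyte (assumption: the post does not specify the unit)

data_per_year_bytes = 10 * TB
transactions_per_year = 100 * 10**9
budget_per_year_usd = 1_000  # the stated upper bound

# Average storage footprint of one transaction.
bytes_per_transaction = data_per_year_bytes / transactions_per_year

# Unit costs implied by the budget.
usd_per_terabyte = budget_per_year_usd / (data_per_year_bytes / TB)
usd_per_million_transactions = budget_per_year_usd / transactions_per_year * 10**6

print(f"{bytes_per_transaction:.0f} bytes per transaction")
print(f"${usd_per_terabyte:.0f} per terabyte per year")
print(f"${usd_per_million_transactions:.2f} per million transactions")
```

Under these assumptions, each transaction averages about 100 bytes, and scoring a million of them costs about one cent.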

You can do it without expensive vendor tools, and indeed without any tools other than programming languages and code libraries. The required expertise can be gained for free in various data science programs or books. And if you still need a scientist to analyze your big data, it won't cost more than the salary of a statistician working on small data.

I would go as far as to say that big data is easier to process than small data, once you are familiar with the right techniques.

One example where big data is simple: when you have 50 observations (from multiple users or clients) in each of millions of data buckets, inference becomes much easier, with less reliance on imputation and sophisticated experimental design (the complexity lies in identifying the right buckets, making sure they are robust, and doing cross-validation). Another example: compare my model-free confidence intervals with the concept of p-value in traditional statistical science. Nobody but statisticians understands p-values, while my confidence intervals are easy to understand even for people with no college education. Much of statistical science has been rewritten for big data, to be easier for computer scientists and data engineers to digest, and to be more robust in the context of big data. This was done at very little cost, which means it costs next to nothing to learn, and there is no excuse not to gain this knowledge and these skills.
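As an illustration of bucket-level inference without a parametric model, the sketch below groups observations by bucket and computes a bootstrap percentile interval for each bucket's mean. The bootstrap is a standard resampling technique substituted here for illustration; it is not necessarily the construction behind the confidence intervals mentioned above. The function names, the 90% level, the number of resamples and the 50-observation threshold are all assumptions.

```python
import random
from collections import defaultdict

def bootstrap_mean_interval(values, level=0.90, n_resamples=1000, rng=None):
    """Bootstrap percentile interval for the mean of `values`.

    Resamples with replacement, then reads the interval off the sorted
    resampled means. No distributional model is assumed.
    """
    rng = rng or random.Random(0)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * (n_resamples - 1))]
    hi = means[int((1.0 - tail) * (n_resamples - 1))]
    return lo, hi

def bucket_intervals(observations, min_count=50, level=0.90):
    """Group (bucket_key, value) pairs and return one interval per bucket.

    Buckets with fewer than `min_count` observations are skipped
    (hypothetical threshold, echoing the 50 observations in the text).
    """
    buckets = defaultdict(list)
    for key, value in observations:
        buckets[key].append(value)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    return {
        key: bootstrap_mean_interval(vals, level=level, rng=rng)
        for key, vals in buckets.items()
        if len(vals) >= min_count
    }

# Synthetic data: one bucket with enough observations, one without.
demo_rng = random.Random(42)
data = [(("US", "mobile"), demo_rng.gauss(10.0, 2.0)) for _ in range(50)]
data += [(("FR", "desktop"), demo_rng.gauss(5.0, 1.0)) for _ in range(10)]

for key, (lo, hi) in bucket_intervals(data).items():
    print(key, round(lo, 2), round(hi, 2))
```

Only the first bucket is reported, since the second has too few observations. In a real setting the hard part remains what the text describes: choosing buckets that are meaningful and robust.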

Feel free to share your costs of storing, processing and analyzing big data in the comments section. Mine are definitely below $1,000 per year, and I can scale to 10 terabytes per year at no extra cost, or even more if I'm careful in selecting the right metrics to track and in designing the right look-up tables and hierarchical summary tables (organized as smart text files, a bit like Hadoop). However, in many projects I actually use intuition, good judgment and sheer brain power, including selecting and reading data reports and analyses from selected external sources, to come up with a solution, without collecting or analyzing any data at all. So it is still possible to do business with no data. But blending big data with intuition makes for an explosive cocktail, especially when you use big data where it helps most, collect the right data (including external data sources), have skilled people help you identify, extract and analyze it, and/or use outsourced data services (such as Google Analytics if your data is digital, for example web log data, or FICO/Experian scores if your data consists of financial transactions).
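The idea of hierarchical summary tables stored as text files can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of my actual setup: the tab-separated layout (date, client, amount), the function names and the sample rows are all hypothetical. Raw rows are streamed once into a fine-grained summary, and coarser summaries are then built from that summary rather than from the raw data.

```python
import csv
import io
from collections import defaultdict

def summarize(lines, key_fn):
    """Stream tab-separated (date, client, amount) rows and aggregate
    the row count and total amount per key. Only the summary table is
    held in memory, never the raw rows."""
    table = defaultdict(lambda: [0, 0.0])
    for date, client, amount in csv.reader(lines, delimiter="\t"):
        entry = table[key_fn(date, client)]
        entry[0] += 1
        entry[1] += float(amount)
    return table

def roll_up(table, parent_fn):
    """Build a coarser summary from a finer one without rereading raw data."""
    parent = defaultdict(lambda: [0, 0.0])
    for key, (count, total) in table.items():
        entry = parent[parent_fn(key)]
        entry[0] += count
        entry[1] += total
    return parent

def write_table(table, out):
    """Write a summary as sorted tab-separated text, one key per line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for key in sorted(table):
        count, total = table[key]
        writer.writerow([*key, count, f"{total:.2f}"])

# Hypothetical raw data; in practice this would be a large file read line by line.
raw = io.StringIO(
    "2014-02-16\tclient_a\t12.50\n"
    "2014-02-16\tclient_a\t7.50\n"
    "2014-02-17\tclient_a\t5.00\n"
    "2014-02-17\tclient_b\t3.00\n"
)

daily = summarize(raw, lambda date, client: (date, client))      # level 1: day x client
monthly = roll_up(daily, lambda key: (key[0][:7], key[1]))       # level 2: month x client

out = io.StringIO()
write_table(monthly, out)
print(out.getvalue(), end="")
```

Each level is much smaller than the one below it, so most questions can be answered from a summary file instead of the raw data, which is what keeps storage and processing costs low.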

The idea that big data is expensive and complicated is a myth propagated by people who resist progress, or who are concerned about their competitors successfully leveraging big data to outsmart them.


Comment by Vincent Granville on February 17, 2014 at 8:51am

I once asked the question: could churches benefit from big data? You'd think churches don't need big data. However, if they want to optimize the recruitment of massive numbers of new members (and reduce attrition), then big data is useful: to identify large numbers of prospects and target them with the right message, even doing multivariate testing to identify which messages work best to attract many sticky members. Big data helps you see what you can't see. Of course, if you are blind, it won't help. And many companies are still data-blind. Some just need powerful glasses to see through big data.


© 2014   Data Science Central
