Looking Forward: Big Data in 2015 by Gurjeet Singh

2014 has been a landmark year for Big Data. The most spectacular example of this was the Hortonworks IPO – a success by any measure.

As we look forward to 2015, it is clear that while big data technologies still have a long way to go in terms of enterprise adoption, their ultimate adoption is no longer in question.

Enterprises have bought into the promise of big data – leaving the innovators in storage, analytics and visualization to deliver the results.

The following represent some thoughts on what to expect in the coming twelve months – and what lies just beyond.

#1 – This is the year that enterprises begin to adopt Machine Learning

Innovative companies have been using Machine Learning in a variety of use cases for several years now – but for the most part they have been technology companies developing systems for themselves and for specific cases.

What is different about 2015 is data complexity. While the explosion in the volume of data today is well documented, in truth what passes for big data today is just a bigger, cheaper and, in some cases, faster version of what we have already done with small data.

The big problem with big data involves complexity. Data is growing exponentially with time and the number of possible hypotheses (insights) in a dataset is exponential in the size of the dataset.

While we have built technologies that make it faster and cheaper to query datasets, this line of attack still doesn’t scale.
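To make the scaling problem concrete, here is a minimal sketch. It assumes, purely for illustration, that each candidate hypothesis corresponds to one non-empty subset of a table's columns; the column names are hypothetical. Under that assumption the number of hypotheses doubles with every column added, so faster queries alone cannot keep up.

```python
from itertools import combinations

def count_column_subsets(columns):
    """Enumerate every non-empty subset of columns (one candidate hypothesis each)."""
    return sum(
        1
        for size in range(1, len(columns) + 1)
        for _ in combinations(columns, size)
    )

# Hypothetical column names, small enough to enumerate by brute force.
small = ["age", "region", "spend", "tenure"]
enumerated = count_column_subsets(small)
closed_form = 2 ** len(small) - 1  # the same count without enumerating

# For wider tables only the closed form is practical to evaluate.
for n in (10, 20, 40, 80):
    print(f"{n:>3} columns -> {2 ** n - 1:,} candidate subsets")
```

Even at one query per microsecond, the 80-column case would take far longer than any enterprise is willing to wait, which is the sense in which this line of attack does not scale.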

This is why Machine Learning is going to make its mark in 2015.

Machine Learning represents the critical set of technologies for actually delivering on the vision that the C-Suite had when it started its Hadoop initiative. Organizations know they are collecting data faster than they can analyze it; they just don’t know how to solve that problem. The reason is that while we now have great infrastructure for accumulating and querying data, the attendant infrastructure for mature Machine Learning workflows is missing.

The standard Machine Learning workflow is:

1. Get the data

2. Transform the data to create meaningful entities

3. Transform data for Machine Learning algorithms

4. Build supervised/unsupervised models/representations

5. Deploy the model in production
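The five steps above can be sketched end to end in a few lines. This is a toy illustration, not an enterprise-grade system: the transaction records, the churn label, the two features and the nearest-centroid model are all assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict
from statistics import mean

# 1. Get the data: hypothetical raw records (customer, amount, churned).
RAW = [
    ("a", 20.0, 0), ("a", 30.0, 0), ("a", 25.0, 0),
    ("b", 5.0, 1),
    ("c", 40.0, 0), ("c", 35.0, 0),
    ("d", 8.0, 1), ("d", 4.0, 1),
]

def build_entities(rows):
    """2. Group raw rows into one meaningful entity per customer."""
    grouped = defaultdict(lambda: {"amounts": [], "label": 0})
    for customer, amount, label in rows:
        grouped[customer]["amounts"].append(amount)
        grouped[customer]["label"] = label
    return dict(grouped)

def to_features(entity):
    """3. Turn an entity into a numeric vector: [purchase count, mean amount]."""
    amounts = entity["amounts"]
    return [float(len(amounts)), mean(amounts)]

def train_centroids(vectors, labels):
    """4. Build a supervised model: one mean vector (centroid) per class."""
    by_class = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_class[label].append(vec)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_class.items()}

def predict(model, vector):
    """5. 'Deploy': the function a production service would call per request."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

entities = build_entities(RAW)
X = [to_features(e) for e in entities.values()]
y = [e["label"] for e in entities.values()]
model = train_centroids(X, y)

# Score a previously unseen (hypothetical) customer.
new_customer = to_features({"amounts": [6.0, 7.0]})
prediction = predict(model, new_customer)
print(prediction)
```

Each function is trivial here; the hard part in practice is making every one of these stages reliable, repeatable and monitored at enterprise scale.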

The reason that 2015 is still an emerging year for Machine Learning is that there is no existing enterprise-grade system available today to support this workflow. Developing this infrastructure represents a giant opportunity in 2015 with a huge payoff for companies that get it right and for the companies that adopt it.

Still, we expect to see Machine Learning applied to a host of business problems in 2015, although hitting scale won’t happen until 2016.

#2 – Better Models Matter

The physicist Eugene Wigner wrote a famous article called “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”. The basic argument was that mathematical models developed in a given context ended up being much more generally useful. A few years ago, the famous computer scientist Peter Norvig co-authored an article called “The Unreasonable Effectiveness of Data”. His argument was that with vast quantities of data, even extremely simple models proved to be effective in practice. Being simple, these models are quick to train and efficient to use in production.

Unfortunately, this has led to the ideology of ‘data trumps math’.

Peter’s argument is obviously true but it is not generally applicable. As an example, consider discovering insights from genetic sequences. In humans, each sequence is about three billion base pairs and there are about seven billion people. This means that we do not have and will probably never have enough data for simple models to be effective.
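One way to read the genetics example is as a back-of-the-envelope comparison. The sketch below assumes, as a deliberate simplification, that each position can independently hold any of four bases; real genomes are far more constrained, so treat the first figure as an upper bound on the search space rather than a biological claim.

```python
import math

BASE_PAIRS = 3_000_000_000   # approximate length of one human genome (from the text)
PEOPLE = 7_000_000_000       # approximate world population (from the text)
ALPHABET = 4                 # A, C, G, T

# Work in log10 because 4 ** 3e9 is far too large to print.
log10_possible_sequences = BASE_PAIRS * math.log10(ALPHABET)
log10_people = math.log10(PEOPLE)

print(f"possible sequences ~ 10^{log10_possible_sequences:,.0f}")
print(f"available samples  ~ 10^{log10_people:.1f}")
print(f"samples per base-pair position: {PEOPLE / BASE_PAIRS:.2f}")
```

Even if every living person were sequenced, there would be only a couple of samples per position, which is nowhere near the regime in which simple models are rescued by sheer data volume.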

Even when you have enough data and simple models are producing good-enough results, it still makes sense to try more complex models if the cost of trying a model is low.

For most enterprises starting their data journey, the concept of simple models on large datasets may be the right solution.  This could define the landscape for much of 2015.  Still, towards the end of the year, we expect to see those further down the road start to adopt more complex models – because the cost and the implementation impact will have come down.  That is both the challenge and the opportunity.

#3 – Better Models Do Not Mean True AI

The last five years, and 2014 in particular, have seen us make incredible advances in a particular family of Machine Learning methods called Deep Learning. The original inspiration for deep learning systems came out of a rough idea of the connectivity of the brain.

The whole ‘brain-like’ thing ends there.

If you pay attention to the world’s best researchers in the field, they are very careful to make this distinction as well. We are very far from a true understanding of how the brain works.

To that end, both the United States and Europe are funding huge initiatives. How and when the fruits of this research will translate into computer systems is unknown. There are lots of open research challenges in this area.

As a result, a future defined by our robot overlords is out of the question over the next 12 months and highly unlikely over the next five years.   

#4 – Soft AI Will Make A Difference in 2015

While HAL/Skynet/The Matrix is not an existential threat anytime soon, there are lots of problems that we traditionally assumed would require exceptionally strong AI solutions – but don’t. It turns out that “softer,” narrower AI solutions will work well and we will see those proliferate in the coming year.

A prime example is the recent work in image labeling.  This was assumed to be an impossibly tough problem, but it turns out that we can get pretty good results with enough training data.

Many ‘human-scale’ challenges (such as vision, machine translation etc.) will see significant progress since the web has made it extremely easy to collect training datasets.

I fully expect to see mainstream press reporting on some critical breakthroughs in this area in 2015.

More complex, broad problems, such as a better understanding of genomics, will remain slow and challenging for several years.

#5 – Automation Makes a Splash

Automatically discovering statistically significant insights from data and then building actionable systems remains a formidable challenge.  

The way forward is automation.

Naive automation is not scalable since it requires an exponential number of queries. Using Machine Learning systems is the systematic solution to this problem. However, machine learning expertise is increasingly scarce, and hiring it is not possible for all enterprises.

To make Machine Learning accessible to business applications requires building automated systems that try lots of algorithms and combine them together in meaningful ways. There are so many standard approaches which good Machine Learning practitioners utilize (such as z-scoring features before feeding them into Machine Learning algorithms) – the opportunity ahead of us is to build these approaches into software systems.
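A minimal sketch of what building these approaches into software could look like: z-scoring is applied automatically, several candidate configurations are tried, and the best one is selected by leave-one-out accuracy. The data, the k-nearest-neighbour candidates and the scoring scheme are all assumptions for illustration. For brevity the z-scoring is fitted on the full dataset before the leave-one-out loop, which a careful practitioner would avoid.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]  # guard constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

def leave_one_out_accuracy(xs, ys, k):
    """Score a candidate by holding out each point in turn."""
    hits = 0
    for i in range(len(xs)):
        rest_x, rest_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)

# Hypothetical data: [annual income, store visits]; the label tracks visits only.
X = [[30000, 1], [52000, 2], [41000, 1], [63000, 2],
     [31000, 9], [50000, 8], [42000, 9], [61000, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Try every combination of scaling and k, recording its accuracy.
candidates = {}
for scaling, data in (("raw", X), ("zscore", zscore_columns(X))):
    for k in (1, 3):
        candidates[f"{scaling}, k={k}"] = leave_one_out_accuracy(data, y, k)

best = max(candidates, key=candidates.get)
print(candidates)
print("selected:", best)
```

On the raw features the income column dominates every distance and the informative visits column is drowned out; after z-scoring both columns contribute on the same scale. Encoding that kind of practitioner habit in the system, rather than in a scarce expert's head, is the point of the paragraph above.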

Automation won’t become prevalent in 2015, but those who are out in front (having completed their Hadoop implementations and adopted real analytics platforms) are going to start to scale automation with extraordinary results.

We call it “operationalization” and it has broad implications for competitive balance within industries.  If your competitor “operationalizes” advanced analytics and Machine Learning before you – look out, it will be very hard to catch up.  


In 2015 we are going to see some significant progress in the automation of insight discovery through Machine Learning and soft AI technologies.  There will not be broad operationalization of either of these approaches, but there will be material progress by some key leaders.  This in turn will set the stage for a very interesting 2016 as those early leaders leverage their data infrastructure and start to do real competitive damage to the rest of the field.



Comment by Sione Palu on January 13, 2015 at 1:48pm

Deep learning is hype, according to Prof. Michael Jordan of Berkeley, a leading researcher in machine learning whose published work is highly cited. Deep learning is just a rebranding of neural networks, according to Prof. Jordan. I agree with Prof. Jordan's views.
