Summary: Every year the Strata+Hadoop conference offers a unique opportunity to look into the future and see what trends will dominate our profession in the near term. Here’s my take.
I’m just back from last week’s Strata+Hadoop Conference in San Jose and I’m charged up. This traditionally sold-out crowd of 5,000 Big Data and data science practitioners meets in the heart of Silicon Valley once a year to share three days of insight into what’s new and what’s happening in our world. As a profession we may not be known for our enthusiasm but, believe me, there was plenty.
During the rest of the year I keep up by talking to colleagues and clients and, of course, by reading as much as I can here at DataScienceCentral.com. But once a year I make the pilgrimage up to the headwaters of innovation in ‘the valley’ to listen and learn, and mostly to try to figure out where we’re all headed in the coming year or two.
If you haven’t been to Strata+Hadoop you should definitely try to make it. There are now five of these annually: San Jose, New York, and three international venues. San Jose is always first in the year and certainly my choice. The two days of the formal conference are preceded by a day of tutorials and panels that’s just as valuable as the main conference. On each of the two main conference days, the program starts with all 5,000 attendees gathered for 9 or 10 short keynotes before breaking up into 16 different tracks, each with an additional 6 presentations per day. If there’s anything to complain about, it’s having to choose among the 96 daily presentations, since being only one person limits you to 6 per day.
So in all I personally heard 32 presentations in two days, and talked with the many, many vendor exhibitors, trying to distill the major trends we should look out for in the next few years.
The short keynotes in the morning are the most carefully thought out, since they’re designed to tell us what those major trends are, and they’re delivered by the top industry business and academic players. Think TED talks on tech. Engaging in a little meta-analysis, what I saw were two major trends for the year ahead.
AI / Real Time / Deep Learning:
I’ve bunched these together because the thrust of this trend is assisting humans in making real-time decisions or performing real-time tasks (think scheduling) using streaming real-time data. The integrating thought is that IoT, stream processing, and deep learning are simply feeder or augmenting technologies leading to the real end product: AI.
There are many predictive analytic techniques that can be applied here, but deep learning as it applies to image, text, and speech recognition got the lion’s share of the coverage.
There was some conversation around whether we were on the verge of SkyNet or Rosie (from the Jetsons); the consensus was Rosie, and that we shouldn’t worry. There was also debate over whether we would actually achieve artificial general intelligence, a true brain-mimicking OS, or whether we should really be pursuing Augmented Intelligence: more narrowly defined AI capabilities destined to enhance human tasks. Augmented Intelligence is where the VC money is going now, since these applications are close in and cover a whole host of physical robots as well as narrowly defined speech, text, and image AIs that could help us schedule, make reservations, communicate in natural speech, or guide those self-driving cars.
While Microsoft, Google, and others have made great strides in commercializing deep learning, there was broad agreement that we are still very early in this movement. As one measure of how far we have to go, one speaker praised the AI built into the ubiquitous Roomba but went on to point out that “Roomba doesn’t know sh*t” (apparently Roomba can’t tell it’s running over dog poop).
Still, if you were going to pick a career path, AI is where it’s at (or will be near term).
Cloud Computing:
We all know what’s already happened in Big Data infrastructure. Hadoop won (Doug Cutting was a speaker). Spark clearly prevailed over MapReduce. Relational databases (NewSQL) learned how to scale horizontally and are now MPP. SQL is so widely available that it’s just not accurate to say NoSQL anymore.
The forward thrust is enterprise computing, and that is completely dominated by the cost and technical advantages of cloud computing.
What is odd, our morning speakers observed, is that we’ve been talking about the dominance of cloud computing for almost five years, and yet to this point the cloud world has been dominated by on-prem private clouds. The consensus, however, is that this is about to change, as the cost advantage of the public cloud has become overwhelming.
There will be a transition period, and many vendors have positioned themselves for a hybrid transitional cloud strategy. This hybrid strategy generally holds that companies will be most willing to migrate non-strategic applications, such as dev, test, and disaster-recovery backup, to the public cloud until they become comfortable with moving more. The consensus is that Google, Amazon, and Microsoft are positioned to dominate and that this migration to the public cloud will now happen very rapidly.
Some Closing Observations
There was a great article in the April 5th Wall Street Journal by Steve Case of AOL fame. In this article, “The Next Wave in the Internet’s Evolution,” he makes the case that the First Wave was about building out the Internet’s infrastructure. According to Case, this peaked in about 2000 and set the stage for the Second Wave, which has been about building apps and services on top of the Internet.
Case goes on to say, “The Third Wave has begun. Over the next decade and beyond the Internet will rapidly become ubiquitous, often in invisible ways.”
Steve Case and our keynote speakers are completely in sync. We’ve largely built out ecommerce, and (I dearly hope) have seen the peak of the next social network / photo sharing / instant messaging app. Even Twitter is 10 years old. To me the case is clear: the new ‘app’ will be an AI capability, and it should be as invisible as possible. It will be built from components of IoT, stream processing, deep learning, cloud computing, and the rest, but it will be so well integrated into our daily lives as to nearly disappear.
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at: