6 Predictions about Data Science, Machine Learning, and AI for 2018

Summary:  Here are our 6 predictions for data science, machine learning, and AI for 2018.  Some are on the fast track and potentially disruptive; others take the hype off overblown claims and set realistic expectations for the coming year.

 

It’s that time of year again when we look back in order to look forward: which trends will speed up, which things will actually happen, and which won’t in the coming year for data science, machine learning, and AI.

We’ve been watching and reporting on these trends all year, and we scoured the web and some of our professional contacts to find out what others are thinking.  There are only a handful of trends and technologies that look set to disrupt or speed ahead.  These are probably the most interesting in any forecast.  But it is also valuable to discuss trends we think are a tad overblown and won’t accelerate as fast as some others believe.  So with a little of both, here’s what we concluded.

 

Prediction 1:  Both model production and data prep will become increasingly automated.  Larger data science operations will converge on a single platform (of many available).  Both of these trends are in response to the groundswell movement for efficiency and effectiveness.  In a nutshell, they allow fewer data scientists to do the work of many.

The core challenge is that there remains a structural shortage of data scientists.  Whenever a pain point like this emerges we expect the market to respond and these two elements are its response.  Both come at this from slightly different angles.

The first is that although the great majority of new data scientists have learned their trade in either R or Python, having a large team freelancing directly in code is extremely difficult to manage for consistency and accuracy, much less to debug.

All the way back in its 2016 Magic Quadrant for Advanced Analytics Platforms, Gartner called this out, treating a Visual Composition Framework (drag-and-drop elements in place of code) as a critical requirement and declining to rate companies that failed to provide one.  Gartner is very explicit that working directly in code is incompatible with the large organization’s need for quality, consistency, collaboration, speed, and ease of use.

Langley Eide, Chief Strategy Officer at Alteryx offered this same prediction, that “data science will break free from code dependence.  In 2018, we’ll see increased adoption of common frameworks for encoding, managing and deploying Machine Learning and analytic processes. The value of data science will become less about the code itself and more about the application of techniques.  We’ll see the need for a common, code-agnostic platform where LOB analysts and data scientists alike can preserve existing work and build new analytics going forward.”

The second element of this prediction, which I do believe is disruptive in its implications, is the very rapid evolution of Automated Machine Learning (AML).  The first of these appeared just over a year ago, and I’ve written several times about the now 7 or 8 competitors in this field, such as DataRobot, Xpanse Analytics, and PurePredictive.  These AML platforms have achieved one-click, data-in, model-out convenience with very good accuracy.  Several of these vendors have also done a creditable job of automating data prep, including feature creation and selection.
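To make concrete what “automated” means here, the sketch below shows the core idea in plain scikit-learn: try several candidate pipelines, score each by cross-validation, and keep the winner.  This is only an illustrative toy on an assumed public dataset, not how DataRobot or any other vendor actually works; the commercial platforms layer automated feature engineering, ensembling, and deployment on top of this basic loop.

```python
# Illustrative sketch of the core AutoML idea: search candidate pipelines,
# score each with cross-validation, and keep the best.  Toy example only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # assumed stand-in dataset

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# The "one click" part: evaluate every candidate and pick the winner automatically.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("best model:", best)
```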

Gartner says that by 2020, more than 40% of data science tasks will be automated.  Hardly a month goes by without a new platform contacting me wanting to be recognized on this list.  And if you look into the clients many have already acquired you will find a very impressive list of high volume data science shops in insurance, lending, telecoms, and the like.

Even large traditional platforms like SAS offer increasingly automated modules for high volume model creation and maintenance, and many of the smaller platforms like BigML have followed suit with greatly simplified if not fully automated user interfaces.

 

Prediction 2:  Data Science continues to develop specialties that mean the mythical ‘full stack’ data scientist will disappear.

This prediction may already have come true.  There may be some smaller companies that haven’t yet gotten the message, but trying to find a single data scientist, regardless of degree or years of experience, who can do it all just isn’t in the cards.

First there is the split between specialists in deep learning and predictive analytics.  It’s possible now to devote your career to just CNNs or RNNs, work in TensorFlow, and never touch or understand a classical consumer preference model.

Similarly, the needs of different industries have so diverged in their special applications of predictive analytics that industry experience is just as important as data science skill.  In telecoms and insurance it’s about customer preference, retention, and rates.  In ecommerce it’s about recommenders, web logs, and click streams.  In banking and credit you can make a career in anomaly detection for fraud and abuse.  Whoever hires you is looking for these specific skills and experiences.
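As an aside for readers new to that fraud-and-abuse specialty, here is a minimal, hedged sketch of the kind of anomaly detection involved, using scikit-learn’s IsolationForest; the synthetic “transactions” are an assumption purely for illustration.

```python
# Minimal anomaly-detection sketch of the kind used in fraud screening.
# The synthetic "transactions" (amount, transactions per day) are illustrative only.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal_txns = rng.normal(loc=[50, 2], scale=[20, 1], size=(1000, 2))
fraud_txns = rng.normal(loc=[900, 30], scale=[100, 5], size=(10, 2))  # rare, extreme behavior
X = np.vstack([normal_txns, fraud_txns])

# Fit an isolation forest; contamination reflects the expected fraud rate.
model = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = model.predict(X)          # -1 = anomaly, 1 = normal
print("flagged as anomalous:", int((flags == -1).sum()))
```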

Separately, there is the long overdue spinoff of the Data Engineer from the Data Scientist.  This is the identification of a separate skills path that only began to be recognized a little over a year ago.  The skills the data engineer needs to set up an instance in AWS, or implement Spark Streaming, or simply to create a data lake are different from the analytical skills of the data scientist.  Maybe 10 years ago there were data scientists who had these skills, but that’s akin to the early days of personal computers when some early computer geeks could actually assemble their own boxes.  Not anymore.

 

Prediction 3:  Non-Data Scientists will perform a greater volume of fairly sophisticated analytics than data scientists.

As recently as a few years ago the idea of the Citizen Data Scientist was regarded as either humorous or dangerous.  How could someone, no matter how motivated, without several years of training and experience be trusted to create predictive analytics on which the financial success of the company relies?

There is still a note of risk here.  You certainly wouldn’t want to assign a sensitive analytic project to someone just starting out with no training.  But the reality is that advanced analytic platforms, blending platforms, and data viz platforms have simply become easier to use, specifically in response to the demands of this group of users.  And why have platform developers paid so much attention?  Because Gartner says this group will grow 5X as fast as the trained data scientist group; that’s where the money is.

There will always be a knowledge and experience gap between the two groups, but if you’re managing the advanced analytics group for your company you know about the drive toward ‘data democratization’ which is a synonym for ‘self-service’.  There will always be some risk here to be managed but a motivated LOB manager or experienced data analyst who has come up the learning curve can do some pretty sophisticated things on these new platforms.

Langley Eide, Chief Strategy Officer at Alteryx, suggests that we think of these users along a continuum from no-code to low-code to code-friendly.  They are going to want a seat at our common analytic platforms.  They will need supervision, but they will also produce a volume of good analytic work and at the very least can leverage the time and skills of your data scientists.

 

Prediction 4:  Deep learning is complicated and hard.  Not many data scientists are skilled in this area and that will hold back the application of AI until the deep learning platforms are significantly simplified and productized.

There’s lots of talk about moving AI into the enterprise and certainly a lot of VC money backing AI startups.  But almost exclusively these are companies looking to apply some capability of deep learning to a real world vertical or problem set, not looking to improve the tool.

Gartner says that by 2018, deep neural networks will be a standard component of 80% of data scientists’ tool boxes.  I say, I’ll take that bet, that’s way too optimistic.

The folks trying to simplify deep learning are the major cloud and DL providers: Amazon, Microsoft, Google, Intel, NVIDIA, and their friends.  But as it stands today, first, good luck finding well-qualified data scientists with the skills to do this work (have you seen the salaries they have to pay to attract these folks?).

Second, the platforms remain exceedingly complex and expensive to use.  Training time for a model is measured in weeks unless you rent a large number of expensive GPU nodes, and still many of these models fail to train at all.  The optimization of hyperparameters is poorly understood and I expect some are not even correctly recognized as yet.
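To give a feel for that complexity, here is a hedged sketch, using the Keras API and an arbitrary toy architecture, of just some of the knobs that must be chosen before a deep network even begins to train; the specific values are assumptions, not recommendations.

```python
# Sketch of the hyperparameter surface of even a small deep network (Keras API).
# Every value below is a choice the practitioner must make, and most of them interact.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu",
                  input_shape=(64, 64, 3)),              # filter count, kernel size, activation
    layers.MaxPooling2D(pool_size=2),                    # pooling size
    layers.Conv2D(64, kernel_size=3, activation="relu"), # network depth
    layers.Flatten(),
    layers.Dropout(0.5),                                 # dropout rate
    layers.Dense(128, activation="relu"),                # dense-layer width
    layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3), # optimizer and learning rate
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# Still to be chosen: batch size, epochs, learning-rate schedule, weight initialization,
# regularization, data augmentation, early stopping, and the architecture itself.
```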

We’ll all look forward to using these DL tools when they become as reasonable to use as the other algorithms in our tool kit.  The first provider to deliver that level of simplicity will be richly rewarded.  It won’t be in 2018.

 

Prediction 5:  Despite the hype, penetration of AI and deep learning into the broader market will be relatively narrow and slower than you think.

AI and deep learning seem to be headed everywhere at once, and there is no shortage of articles on how or where to apply AI in every business.  My sense is that these applications will come, but much slower than most might expect.

First, what we understand as commercially ready, deep learning driven AI is actually limited to two primary areas: text and speech processing, and image and video processing.  Both of these areas are sufficiently reliable to be commercially viable and are actively being adopted.

The primary appearance of AI outside of tech will continue to be NLP chatbots, both as input and output to a variety of query systems ranging from customer service replacements to interfaces on our software and personal devices.  As we wrote in our recent series on chatbots, in 2015 only 25% of companies had even heard of chatbots.  By 2017, 75% had plans to build one.  Voice and text are rapidly becoming the user interface of choice in all our systems, and 2018 will see rapid implementation of that trend.
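For readers wondering what the text side of a chatbot reduces to, the input half is essentially intent classification; below is a minimal, hedged sketch with scikit-learn, with the intents and training phrases invented purely for illustration (real chatbot platforms add entity extraction, dialog management, and far more training data).

```python
# Minimal sketch of the "understanding" half of a text chatbot: intent classification.
# The intents and training phrases are invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_phrases = [
    "what is my account balance", "how much money do I have",
    "I want to reset my password", "forgot my password",
    "talk to a human agent", "connect me with support",
]
intents = ["balance", "balance", "password_reset", "password_reset", "agent", "agent"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(training_phrases, intents)

# A new user utterance is mapped to the closest known intent.
print(clf.predict(["I cannot remember my password"]))
```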

However, commercial adoption of other aspects of deep learning AI, like image and video recognition beyond facial recognition, remains pretty limited.  There will be some adoption of facial and gesture recognition, but those aren’t capabilities likely to delight customers at Macy’s, Starbucks, or the grocery store.

There are some interesting emerging developments in using CNNs and RNNs to optimize software integration and other relatively obscure applications not likely to get much attention soon.  And of course there are our self-driving cars based on reinforcement learning but I wouldn’t camp out at your dealership in 2018.

 

Prediction 6:  The public (and the government) will start to take a hard look at social and privacy implications of AI, both intended and unintended.

This hasn’t been so much a tsunami as a steadily rising tide that started back with predictive analytics tracking our clicks, our locations, and even more.  The EU has acted on the right to privacy and the right to be forgotten, now documented in its new GDPR regulations just taking effect.

In the US, the good news is that the government hasn’t yet stepped in to create regulations this draconian.  Yes, there have been restrictions placed on the algorithms and data we can use for some lending and health models in the name of transparency.  This also makes these models less efficient and therefore more prone to error.

Also, the public is rapidly realizing that AI is not currently able to identify rare events with sufficient accuracy to protect them.  After touting their AI’s ability to spot fake news, or to spot and delete hate speech or criminals trolling for underage children, Facebook, YouTube, Twitter, Instagram, and all the others have been rapidly fessing up that the only way to control this is with legions of human reviewers.  This does need to be solved.
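The rare-event problem is essentially a base-rate problem, and a quick back-of-the-envelope calculation shows why even a seemingly accurate classifier buries reviewers in false positives; the 1% prevalence and 95% accuracy figures below are illustrative assumptions, not any platform’s actual numbers.

```python
# Why rare events are hard: a base-rate back-of-the-envelope calculation.
# Prevalence and accuracy figures are illustrative assumptions only.
posts = 1_000_000
prevalence = 0.01      # assume 1% of posts are genuinely abusive
sensitivity = 0.95     # classifier catches 95% of abusive posts
specificity = 0.95     # and correctly passes 95% of benign posts

true_positives = prevalence * posts * sensitivity                 # 9,500
false_positives = (1 - prevalence) * posts * (1 - specificity)    # 49,500

precision = true_positives / (true_positives + false_positives)
print(f"share of flagged posts that are actually abusive: {precision:.1%}")
# Roughly 16% under these assumptions -- the other 84% still need human review.
```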

Still, IMHO, online tracking and even location tracking through our personal devices are worth the intrusion in terms of the efficiency and lower cost they create.  After all, the materials those algorithms present to you online are more tailored to your tastes, and since tracking reduces advertising cost, it should also reduce the cost of what you buy.  You can always opt out or turn off the device.  However, this is small beer compared to what’s coming.

Thanks largely to advances in deep learning applied to image recognition, researchers have recently published peer-reviewed and well-designed data science studies showing that they can distinguish criminals from non-criminals, and gays from straights, with remarkable levels of accuracy based only on facial images.

The principal issue is that while you can turn off your phone or opt out of online tracking, the proliferation of video cameras tracking and recording our faces makes it impossible to opt out of being placed in facial recognition databases.  There have not yet been any widely publicized adverse impacts of these systems.  But this is an unintended consequence waiting to happen.  It could well happen in 2018.

 

 

About the author:  Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.  He can be reached at:

[email protected]


Comment by Larry Ye on December 20, 2017 at 12:01pm

Bill, I cannot agree with predictions four and five on deep learning, especially on image/video analytics.

Although deep learning requires profound knowledge and experience, it is not too complicated and too hard to be implemented in business environments. Using transfer learning, proven deep neural net models can be applied to many basic business scenarios. I've just done a job training a nineteen-layer neural net model (a proven existing model) to recognize new objects for a client within two days. In the business world, it always starts with a proven model and uses transfer learning to tailor it to fit the business needs. So it does not necessarily take ages to make a deep learning model work for business scenarios. Many use cases are quite mature, i.e., footfall tracking, crowd counting, traffic monitoring, object recognition, suspicious behavior, left object detection, indexing, cross-camera tracking, incident detection, etc.

Deep learning is one of the most important engines of AI. If deep learning is not likely to happen in business in 2018, you are basically predicting the same for AI. I would be much more optimistic based on my understanding and experience. Happy to discuss.
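As a hedged illustration of the transfer-learning workflow Larry describes, here is a minimal sketch with the Keras API, using the pretrained 19-layer VGG19 network as a frozen feature extractor; the number of new object classes and the training data are placeholders, not details from his project.

```python
# Minimal transfer-learning sketch (Keras API): reuse a pretrained 19-layer VGG19
# as a frozen feature extractor and train only a new classification head.
# The number of new classes and the training data are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.VGG19(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False                        # freeze the proven, pretrained layers

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(5, activation="softmax"),    # 5 new object classes (placeholder)
])

model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(new_images, new_labels, epochs=5)  # only the new head is trained
```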

Comment by Tich Mangono on December 18, 2017 at 9:23am

Thanks for this analysis, Bill. It is always interesting how the market moves to fill the gap between humans and tools/computers/algorithms etc. We can bet on that happening for deep learning as well, but I agree it may not be in 2018. Being in the healthcare industry, I am seeing different trends than you stated when I look at image and video processing. It will be huge in 2018; in fact, it picked up a lot of steam in 2017. Most healthcare-AI VC-backed startups in the last 2 years have been in the image diagnostics area, so I think this will have a lot of applications.

Comment by Paul Bremner on December 15, 2017 at 2:10pm

Bill,

Great observations with very interesting implications for data science/scientists.  I recently started SAS’s Enterprise Miner course and was blown away by the speed Data Science Platforms (DSPs) provide in cleaning data, constructing models, comparing results, etc.  I already knew this, but seeing it in action still takes your breath away after spending a good amount of time learning database and statistical programming.  (I’d taken a Big Data course from Microsoft using R/RevoScaleR, and then SAS’s course on Predictive Modeling with Logistic Regression – much cleaner, compact code, and faster.  But you’re still doing lots of stuff manually that in Enterprise Miner you just fly through.)

I find it fascinating that for years we’ve heard about how open source programming is the wave of the future in data science, and now it’s increasingly obvious to people that statistical programming of any kind (whether R, Python or SAS) is not a very effective way to get things done in large-scale data science efforts with lots of steps and models. SAS currently offers around 250 courses, but they don’t really attempt to teach statistical programming much beyond regression and clustering: all the advanced data mining techniques of decision trees, random forests, neural networks, SVMs, etc., use drag and drop.  It’s just so much faster.  Of course, database programming is fantastically useful in the extracting and validating of data that precedes use of EM.  And learning statistical programming, while not necessary, helps you understand what’s going on with the data science platforms as well as allowing use of even more advanced techniques/options that might not be included in the drop-downs.

 

A couple of comments about Prediction #3 and Gartner’s “Citizen Data Scientists.”  Whoever came up with this name should get a marketing communications award – very descriptive and catchy, but slightly gimmicky (and I’m from marketing.)  It implies that data science capabilities can easily be propagated throughout an organization to people with no real statistics background or study.  I think the way you and the Alteryx chief strategy officer describe things is much better.  These folks will want a seat at the table, can help take some of the workload off Data Scientists, can probably do a lot of useful work and, if nothing else, will provide an additional set of eyes on the data and modeling efforts.

 

I suspect the growth of Citizen Data Scientists will be more limited than Gartner and some of the DSP firms seem to be implying.  And the first adopters will most likely be folks with titles like Marketing Analytics, Business Analytics, Customer Analytics, Market Insight, etc.  Based on LinkedIn profiles for people here in the SF Bay Area, these people seem to have backgrounds similar to Data Scientists (often engineering or computer science) with about 50% having SQL coding skills, the same as Data Scientists.  But about 30% also have MBAs, a number similar to general Marketing people, in contrast to Data Scientists where less than 5% have MBAs.  These individuals are presumably in the category that Burtch Works describes as “Predictive Analytics Professionals” (the subset of PAPs that focus on structured data), which means they are already developing models and projections, and have close ties with Marketing/Finance/Operations/Strategy which is where decision-making authority resides.  So they could probably work quite well with the Data Science groups or, alternatively, give Data Scientists a run for their money. 

 

At any rate, thanks for all the good work and analysis on automated machine learning and Data Science Platforms.  I’m always interested to see what’s out there.  One thing I’ve noticed is that even the “self-service” BI applications like Tableau have started incorporating advanced statistical functions (time series analysis, regression, clustering) and, of course, you can run R scripts to implement other statistical models.  It will be interesting to see whether Tableau (and Power BI) continue to add ML capabilities allowing users to employ advanced techniques without having to learn R/Python.  This would make sense and put them in a better position vis a vis the smaller DSP vendors.  The new “Citizen Data Scientists” would probably want to look at both the DSP vendors and Tableau/Power BI in this scenario.

Comment by Engin Heriscakar on December 15, 2017 at 11:52am

Thanks, Bill, this is very nice. I am in the middle of a career change and I had an eye on data science because I am a "numbers person" and there is a lot of hype. It is nice to see your down-to-earth predictions. Such a career change requires so much effort and time to learn new skills. Programming for data science is certainly at the top of the list as it is pushed by almost everyone in the field. It was great to read your insightful counter argument. Perhaps the field will not be as promising for career opportunities as people think it will be. And that is certainly a big risk for someone like me who is just getting his feet wet entering the field. Have a good 2018!

Comment by Terry Kaufman on December 14, 2017 at 4:49pm

Pretty sobering, and needed. On the other hand, and though I dismiss the studies on image recognition of gays (and would likely dismiss the one on criminals) based on photos as poor science, I believe AI will make astounding breakthroughs based on n-dimensional pattern recognition with increasing frequency. Not sure that machine thinking and machine learning are congruent, but the overlaps will happen and hopefully semantics will not stifle advances. Labels have a way of limiting realization of potential.

Comment by Mitchell A. Sanders on December 14, 2017 at 11:46am

Well thought-out and intelligent set of predictions. I'm welcoming the relief from pressure of the unicorn having to be the master-of-the-data-universe that does his/her own engineering plus is fully functioning up-to-date all things AI plus every new code package that comes out in Python or R plus ... Whew! A need for fracturing these expectations has been building for years.

We'll see if 2018 is the year the myth of the unicorn finally dies and those bearing that title of "Data Scientist" get the relief to work on the things they can provide the most value on. It's been a long time coming.
