Summary:  The world of data science is splitting into two distinct camps, the start-up app world and the commercial world.  The good news is that almost all the opportunity lies in commercial predictive analytics where you can broadly specialize and still play with all the latest innovations.

In case you’re the only person who hasn’t heard this phrase, data scientists have increasingly been referred to as ‘unicorns’, as in ‘as rare as a unicorn’.  As a data scientist I have taken exception to this, since it seems to set an unrealistically high bar and simply isn’t true of the many data scientists I know personally.  (See my earlier article “How to Become a Data Scientist”.)

In February I spent several days at the Strata Conference catching up on all things analytic and big data, and yes, there was still a fairly strong theme around the difficulty in finding these unicorns.  One very persuasive speaker actually spoke about how to effect the capture, which in his version was to take fresh Ph.D.s in math, statistics, OR, or computer science and train them up himself.  Well, I thought, 1.) if the future is limited to data scientists with fresh Ph.D.s then the supply is indeed vanishingly small, and 2.) what are the rest of us supposed to do for talent?  Three things became apparent.

1. The days of finding a super-polymath data scientist are over.  The field of data science has just become too big and too diverse for any one individual to master all its disciplines.  This is like trying to find a single medical doctor who can cure cancer, deliver babies, diagnose tropical diseases, and perform open heart surgery.  The era of specialization is upon us.  I don’t care how hard you look: a single data scientist who is equally competent at image processing, natural language processing, graph databases, recommenders, big data, and all the variants of predictive modeling, who can conceptualize and solve analytics-based problems across all processes in all industries, and who is at home writing R, Python, Julia, and Scala while also running a modeling factory with SAS is in fact rarer than a unicorn.  That person simply does not exist.  The good news is that specialization will be good for us.  It will allow practitioners to focus on what they like and what they are best at.

2.  Is there really anyone who actually needs a unicorn?  Probably not.  But the thing about the Strata Conference, which is always held in San Jose, is that you are in the app-icenter of big data and of new and innovative ways to use analytics.  At Strata you are likely to get a very biased view of what’s going on with analytics and big data, since by location and by its focus on what is new and novel, presentations necessarily skew to the cutting edge.  As conferences go, that’s the way it should be for practitioner data scientists.  But that’s not true for our customers, the business people who are paying our bills.  Silicon Valley start-ups may indeed benefit from polymath near-unicorns, but I propose that they account for a very tiny percentage of practicing data scientists, both by job count and by qualified candidates.

3.  Our world of Data Science is dividing in two.  This is perhaps best illustrated by the debate over working in Python, R, Julia, or Scala versus working with advanced drag-and-drop modeling packages like SAS or SPSS.  Supporters of coding in open source R or Python exult in the perfection of their code and the fact that it can do anything that packages can do.  Supporters of packages like SAS say ‘why waste time coding when I can produce 10 times as many models in the same time as you coders’.  And if you weren’t aware, packages like SAS now do Natural Language Processing, graph analytics, optimization, and recommenders just like open source.  In fact some packages allow you to import R models or work directly in R alongside the internally generated models.

The great divide that is becoming evident seems to be this.  If you are part of the start-up, high-innovation, cutting edge app world you are much more likely to be called on to utilize your R or Python skills.  Basically this is custom code that becomes a significant element of the product.  You might even be a unicorn in training.

However, if you are out there in the business world in insurance, banking, ecommerce, manufacturing, brick-and-mortar retail, or any of the other well established B2C or B2B models, including by the way government, then you are much more likely to be using a package like SAS or SPSS. 

Is manual coding in R or Python likely to take over this space?  Not very likely.  Take just SAS: it is a $3 billion company with over 33% market share, and its customers include 93 of the top 100 companies on the Fortune Global 500.  By sales or by head-count, this part of our Data Science world accounts for probably 90% of the folks identifying as data scientists.  These analytics powerhouses are running modeling factories where turning out one or two new models a day just won’t cut it.  They’re making use of cutting-edge in-memory analytics that makes model creation a volume event and can take in Big Data quantities of structured and unstructured data to enhance modeling.  In short, they’re doing about 95% of the things that the most innovative start-ups are doing, and they’re doing it at volume and with a sharp focus on the business top and bottom line.

So how should we respond to these trends:  the declining need for unicorns, the reality of broad specialization, and the split of our world into writing code central to start-up products versus the much larger production world of predictive analytics and data viz?  It’s important for folks who are thinking of becoming data scientists to understand that the great majority of opportunities do not require unicorns.  You can specialize (broadly), work on packages that make analytics faster than coding, and still play with all the really cool innovations that are constantly arising.


March 16, 2015

Bill Vorhies, President & Chief Data Scientist – Data-Magnum - © 2015, all rights reserved.

About the author:  Bill Vorhies is President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001.

Tags: predictive modeling


Comment by Ravi Krishnappa on November 21, 2018 at 8:59am

Everything was going well till you stated that "Data scientists using ready to use SAS products churn out models in a factory fashion". Most companies could afford SAS if it really helped them churn out thousands of models, but companies are not buying SAS products. Companies are refusing the SAS approach to data science. SAS is also not learner friendly. It gives free trials for a limited time, which is not sufficient for 99.99% of newcomers, so they prefer the truly open-source Python and R. When these developers apply their knowledge, they promote Python and R at their workplaces. SAS will go down like other companies that refused to give an unlimited trial license to developers. TIBCO and IBM were like SAS.

Comment by Tich Mangono on January 23, 2016 at 8:10pm

Great observations here! I also agree with Asad Ali's comment. These are two forks in a road that will likely merge further down because they are trying to get to the same place. As the tools become more defined, it is likely that data science will move toward higher-level, black box kinds of tools to allow specialization as you point out. This trend could also be emerging around big data tools and cloud services - there are so many of them there is a clear business case for streamlining and consolidation. I am very new to this space, so thanks for sharing!

Comment by Robert Klein on June 3, 2015 at 7:39am

Your observation that the data science world is splitting in two is all about the fact that all of the nontechnical holdouts who didn't see an immediate business case finally saw the business case for data science, and they want to use it by whatever means they can. So the forking here for easy-to-use versus perfectible is natural, and tools developed by coders along that perfectibility path will be rolled into the next generation of drag-and-drop-style tools. We're doing our part by releasing an API so the tools the coders want can be built into applications that business people can use. It correlates themes in streams of unstructured information over time. It's inspired by chaos theory. Love to get feedback on it from anyone here. Hit us up on github. 

Comment by Asad Ali on March 19, 2015 at 1:10pm

If a data person (analyst, data scientist) needs to even think about syntax, then we can say that our current tools are not up to the mark and need significant improvement. I believe that two trends will become evident in future - the emergence of products that move business people further away from coding and bring them closer to information and process flows that efficiently translate business decisions into an analytics framework and then into data wrangling (be it through code/machines or humans).
