
Will We Soon No Longer Need Data Scientists?

The job of data scientist — the quintessential big data job, and the job that was just voted the best job in America for 2016 — is at risk.

Data scientists have been called “unicorns” because finding the right person with the right set of skills — including coding, statistics, machine learning, database management, visualization techniques, and industry-specific knowledge — could be practically impossible.  But machine learning and big data itself may be making those unicorns as obsolete as they are mythical.

New machine learning algorithms can autonomously analyze data and identify patterns, even interpret the data and produce reports and data visualizations.

You (and your computer) can be your own data scientist


While most people can see how certain information would be useful and what sort of insights might be derived from it, most lack the technical skills to perform the analytics. They might not have the computers that are able to carry out the large volume of calculations quickly enough to take action, but more often they lack the analytical skills to tell that computer what to do.

Natural Language Processing (NLP) technologies can help to break down the barriers to widespread use of data analytics by making complex analytics accessible to just about anyone, regardless of their technical ability. In essence, NLP teaches computers to accept input in the natural, spoken language of humans, eliminating the communication barrier between man and machine.

IBM, for example, believes that it can offer a solution to the skills shortage in big data by cutting out the data scientists entirely and replacing (or supplementing) them with its Watson natural language analytics platform.

IBM’s Vice President for Watson Analytics and business intelligence, Marc Altshuller, explains: “With a cognitive system like Watson you just bring your question – or if you don’t have a question you just upload your data and Watson can look at it and infer what you might want to know.

“A traditional data scientist might receive training in R or SAS or whatever tool their school uses, but we found in the ‘citizen analyst’ area, they were often being given the wrong tools where they were required to guess the right answer, and then test their guess.”

I believe that Watson, and other NLP or cognitive technologies, will play an important role in the future of analytics and the education around it. As the value of data analytics becomes apparent in all fields of activity, a growing number of people will want to be able to extract insights from their data. They might not want to take three or four years out to learn advanced computer science and statistics, and with the advances in cognitive computing that won’t be necessary. All that is required might be a brief introduction to NLP technologies.

Gartner forecasts that the need for so-called ‘citizen data scientists’ -- people who are in job roles that are not primarily about analytics but who could benefit from using data-driven insights -- is going to grow five times faster than the need for highly skilled data science specialists. And it is these ‘citizen analysts’ that IBM is hoping to attract to working with Watson.

Visualizations at the click of a button

In addition, new technologies are emerging that will allow lay people in any field to create detailed infographics and other storytelling devices to help interpret the data NLP technologies will return.

Visualizations are usually used as a layer on the top of data, designed to make the data more digestible.

In big data analytics, reporting the insights we’ve gleaned from analyzing large, messy data sets is the crucial “last step” of the process, and it’s often a step which causes us to stumble. We may have crunched terabytes of data in real time to come up with our world-changing revelations. But unless we can communicate them convincingly to those who need to take action, they are useless, and worse than that, a waste of valuable time and money.

This is why data analysts have come to rely increasingly on graphics and visualizations combined with text – such as the now ubiquitous “Infographics” – to get a message across. But infographics rarely tell the whole story, and are still generally issued alongside written reports or summaries, particularly if they have a corporate purpose and detail is required. Again, this takes time and effort.

Programs that can visualize data start with the graphing functions available in Excel and get progressively more complex. But one program, called Quill, takes the trend a step further, producing text-based reports that explain the data clearly and concisely.  Think of it as an executive summary created by a computer to explain a set of data.  At the click of a button.

Combined, these types of technologies mean that the human interface — the data scientist — may soon be as mythical as that unicorn, and simply unneeded in the big data landscape where lay persons can conduct their own analytics at will.


Views: 22618

Comment


Comment by Dalila on March 23, 2017 at 11:07am

Nassim Taleb, the author of "The Black Swan", stated elegantly that we have many fake statisticians who think they can do statistics because they can use statistical packages to perform the tests for them when, in fact, they have no understanding of what they are doing.  For instance, many people will perform a linear regression without even checking the assumptions, and then go on using it.

That many non-statisticians can use statistical packages in their work doesn't mean statisticians are obsolete, or that these non-statisticians understand distributions, the bias-variance trade-off, sampling, etc.  The same goes for non-data scientists who can use tools to build a predictive tool, or draw nice plots to make their points, but have no scientific, mathematical, statistical, or computational training to build anything but simple models or to understand how to deal with the pitfalls in building models.

As a unicorn myself, and someone who has first-hand experience building predictive models (machine learning, deep learning), supervised and unsupervised models, text mining, and graph network analysis in the fields of finance, customs, food, business, real estate, and literature, I can state that every problem has its own challenges, and solving it, or getting to the "So what?", takes experience and a lot of experimentation.  I found that most of the problems I worked on required more than just using tools, and they were far from being straightforward.

By the way, IBM Watson can help speed up getting to insight from documents and hence allow humans to spend more time doing what we do best: getting value out of the extracted information.  The IBM Watson team is composed of data scientists, software engineers, domain experts, and others.  Hence, to state that data science is a dying field is to state that IBM Watson improves itself and requires no supervision from humans (far from it).

My take is that many cannot see a unicorn even if it hits them.  The belief that a unicorn is mythical is too ingrained in their psyche.  Your writing attests to this idea, and gives the impression that data science is an obsolete field and that we should stop looking for unicorns, as we don't exist.  Hence, you are discarding the facts that some problems are too complex to be solved by simple machine learning, and that the business and education worlds can work to increase the pool of unicorns.

By the way, many people confuse Big Data with a lot of data.  You can have a lot of data and still not have a Big Data problem; Big Data refers to data that is messy and complex as well as large. When you have "well-behaved" data, like the height of every man in the US, you don't need all the data to estimate the mean height of the population; a small sample from this "population" should be enough to provide a reasonable estimate.  I just worked on a problem where the client thought he had a Big Data problem when in fact he had a simple problem that required no Hadoop or Spark.
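
The sampling point in the comment above can be illustrated with a rough sketch. It is only an illustration: the population is synthetic, and the mean of 175 cm, the spread of 7 cm, and the helper name `estimate_mean` are assumptions made for the example, not real US height data or anything from the original comment.

```python
import random
import statistics


def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns the sample mean and its standard error.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_error


# Synthetic "well-behaved" population (assumed values, not real data).
rng = random.Random(42)
population = [rng.gauss(175.0, 7.0) for _ in range(100_000)]

true_mean = statistics.fmean(population)
est, se = estimate_mean(population, sample_size=1_000)

print(f"mean of all {len(population)} values: {true_mean:.2f} cm")
print(f"estimate from 1,000 values:    {est:.2f} cm (standard error {se:.2f})")
```

With data like this, a sample of 1% of the records already pins the mean down to within a fraction of a centimetre, which is why such a problem does not need Hadoop or Spark.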

Comment by Sione Palu on May 8, 2016 at 2:11pm

Data scientists will be here for the next 20 years and beyond.  Everyone thought that electronic engineers would be out of a job by now because designing ICs (integrated circuits) would be done by CAD and automation. That hasn't happened, even with the widespread use of CAD software, like the various types of PSpice available on the market today, and other sophisticated simulation systems, like MathWorks' Matlab Simulink, National Instruments' LabVIEW, Maplesoft's MapleSim and others. The demand for these skills hasn't yet diminished.  Data scientists will be around for a long time.

