Summary: Data Scientist may be a prestigious title, but it doesn’t reflect our area of specialization or the depth of our experience. As legions of newly minted Data Scientists are granted degrees over the next few years, the problem for both employee and employer will only grow worse.
With the explosion in undergraduate and graduate level offerings in data science we are on the fast track to relieving a critical human resource shortage and providing a fast-growth career track for tens of thousands of freshly minted data scientists. Who wouldn’t want to be a “Data Scientist”? Mom, I’m a Data Scientist. What a clean, well respected, and obviously high-class profession that must be. After all, we’re scientists, right?
I’ve identified as a Data Scientist for some years now though that’s not the title that was common when I first entered the field 15 or 20 years back. Back then I was a “predictive modeler” and proud of it.
Two decades ago the folks who prepared our reports, graphs, and visualizations were ‘data analysts’ who knew how to extract data from relational data warehouses and run it through reporting and visualization tools like Crystal Reports.
Ten years ago, predictive models were built by ‘predictive modelers’ who understood both the extraction and preparation of the data as well as the specialized predictive analytic tools like SAS and SPSS that allowed them to prepare predictive models. A few years later the term was “predictive analytics” presumably to wrap predictive modeling and data visualization in the same domain. Sometimes I think these name changes are driven by the Gartners and Forresters of the world just to proliferate new categories of reports.
The term ‘data science’ dates back to at least the ’60s or ’70s, when it was sometimes used interchangeably with ‘computer science’. It wasn’t until about 1996 that it started to be used in its current context, and that took several years to catch on. (If you’re interested in the origins of the term, try Wikipedia here: https://en.wikipedia.org/wiki/Data_science).
So these days there is just one title that we all want, ‘Data Scientist’, and it describes all of us who practice the many sub-disciplines of the art, which are now so numerous that no one individual can hope to master them all.
As our newly educated professional associates begin to come on line this leaves us with a real organizational problem. Everyone is a Data Scientist regardless of how junior or senior they are or what their specific expertise in the field may be. It’s time to start thinking about developing some new titles.
There was a really valuable breakthrough study published by O’Reilly in 2013 titled “Analyzing the Analyzers” by Harris, Murphy, and Vaisman. I first wrote about it in “How to Become a Data Scientist” and it helped me immeasurably to place myself among the many different types of data scientists. The research lays out four types of Data Scientists differentiated not so much by their breadth of knowledge, which is similar, but by their depth in specific areas and how each type prefers to interact with data science problems (what is now also called the “T-shaped profile”).
Insightful as it is, it didn’t clear up either horizontal titling (what category of data scientist are you) or the vertical one (how senior are you in your field).
A few months back I wrote about this seniority problem trying to separate ‘doing data science’ from being a ‘data scientist’. (see So You Want to be a Data Scientist). If you are an insider in the field it’s not uncommon to describe members of a predictive analytics team from junior to senior as ‘data wranglers’, ‘model jockeys’, and then ‘data scientists’.
Is this a real problem or am I making a mountainous molehill? It is a real problem and will be increasingly so for new graduates. The problem is not with those of us who practice DS, the problem is with our employers who don’t know what to call us.
If you study job listings it is equally true that the title is used so loosely and with so little understanding that an ad for a data scientist may actually describe an entry-level analyst, while some ads for analysts are looking for polymath data scientists. Especially if you are a newly minted data scientist, this can lead to real job dissatisfaction if you end up doing traditional BI instead of what you were trained to do.
I don’t think there is any immediately evident agreement on how to solve this but I’d like to offer some thought starters in hopes that this conversation can start to spread.
Who among us are actually scientists?
I have the utmost respect for true scientists and yes, I’m happy to collect some of the glow from the title, but I’m not sure it fits. In the O’Reilly study above, to me it’s the type 4 data researchers, the ones conducting true research in our field, who clearly deserve the title. These are the folks who are actually discovering and inventing (do you discover or invent in math?) new machine learning algorithms and ways of tweaking meaning from unstructured and semi-structured data. So far those uber-educated research Ph.D.s in data science have held their tongues about so many of us using their hard-earned title. Maybe their silence is because there aren’t all that many of them. Maybe they don’t want to start a semantic food fight that would leave us all soiled.
Are we actually engineers?
I can hear the catcalls and boos rising already. Who wants to be an engineer when you can be a scientist? Frankly I think some version of the engineering title better fits what most of us do, but I’m sensitive to the perceived demotion that implies. So better that we reconsider the value of engineering rather than thinking of it as a lower level of intellectual effort. Remember the words of Theodore von Kármán, founder of the Jet Propulsion Laboratory: “The scientist discovers that which exists. An engineer creates that which never was.”
August 11, 2015
Bill Vorhies, President & Chief Data Scientist – Data-Magnum - © 2015, all rights reserved.
About the author: Bill Vorhies is President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001. Bill is also Editorial Director for Data Science Central. He can be reached at: