Summary: Is everyone a ‘data scientist’? What about ‘data engineers’, and junior versus senior or other skill-level distinctions? We do seem to need some agreement about titling. Data Scientist is still the prestige title but there are some folks lobbying to take that title away.
Over just the last 12 to 18 months there has been more and more written about new job titles and roles in our data science profession. It was perhaps only three or four years ago that, if you worked in data science, you were called a “Data Scientist”. Regardless of degree, regardless of the specific tasks you performed, or the level of experience you possessed, this was the title.
If an employer wanted customer predictive analytics, a recommendation engine for ecommerce, an expert in (then brand new) deep learning, or even someone to set up a Big Data data lake or streaming system architecture, the ad still read “Data Scientist”. Increasingly though we see mention of new titles like “Data Engineer” or “Predictive Analytics Professional” that seek to carve off portions of the broader Data Scientist title.
There has always been a little under-the-covers dissent about exactly who should be allowed to call themselves a Data Scientist. For the most part a lot of that came from folks with Ph.Ds. who no doubt thought, perhaps correctly, that only those who had invested as heavily as they had earned that title.
The whole issue of titling in our profession has never been agreed upon or expressed with any clarity, not even between junior and senior folks performing the same tasks. So perhaps a little clarity would be welcome.
You Can Still be a Data Scientist
Make no bones about it. Everyone in our field still wants to be called a Data Scientist. There’s money and prestige in the title and we all worked hard to get here. The titles and job descriptions we’re going to discuss here are not widely agreed but they do come with some compelling logic from some credible sources.
If there’s a problem it’s that these titles want to restrict the title ‘Data Scientist’ to just a few of us and that’s going to make a lot of practitioners unhappy.
More importantly, employers will probably continue to be ignorant of these distinctions for quite a long time. It’s still as common now as four years ago to see an ad for ‘Data Scientist’ applied to a job that is pretty much SQL on static data, and the term ‘Analyst’ applied to a fully skilled predictive analytics job.
In the end, regardless of what is written here, it’s what your employer agrees to call you.
First Carve Out: Data Engineer
The first subset and the one that makes the most sense to me is the fairly new term “Data Engineer”. As it’s used, this is intended to describe a person predominately skilled in CS who plays an important and supporting role to Data Scientists.
Where the Data Engineer used to be called an Analyst or some even less descriptive title within IT, the core of the task is the ETL and blending of the data from various sources that feed up to the Data Scientist.
Typically one Data Engineer could support several Data Scientists, and some vendors like Qubole have made a business trying to make the relationship between Data Engineer and Data Scientist as efficient as possible. You could also argue that blending platforms like Alteryx serve this same relationship.
The term has also been expanded to those who do the planning and execution of Big Data architectures. This ranges from the setup of Hadoop clusters and NoSQL DBs, to establishing streaming and IoT architectures with tools like Spark Streaming, and includes projects of intermediate difficulty like setting up data lakes.
It seems clear that the path to entry-level positions involves good SQL skills but that the path to seniority passes through mastery of increasingly sophisticated data provisioning platforms.
Does this require R or Python mastery? Not in the sense that the Data Engineer must produce statistical or machine learning models. But it is probably the Data Engineer who takes the models, makes them production ready, and implements them in production systems. In a few environments that have sufficiently comprehensive data science platforms this might be semi-automated.
In organizations that use hundreds of customer behavior models there is a crossover role for the senior Data Engineer or junior Data Scientist responsible for monitoring and updating implemented models as they degrade over time.
Who’s In and Who’s Out
The balance of this discussion is based largely on the work of BurtchWorks, a recruiting organization run by Linda Burtch who annually gives us very high quality salary studies and in-depth analyses of our profession. There are a great many recruiters in our field but none who give us this depth of reporting about trends as they’ve experienced them from their clients.
If there’s sampling error in what BurtchWorks reports it comes from two sources. First, they can only report on what their clients have asked for (and they do have a lot of clients, about 450 companies and about 1,200 respondents in their surveys). Second, they self-limit to the spectrum of requirements that runs from Market Research through Data Scientists (they do not deal in Data Engineers).
So the highest level cut that BurtchWorks makes defines who does not, in their opinion, belong in the data science sphere at all. Those would include:
Marketing Research Professionals: (Consumer insights, shopper insights, category management, media or audience research, competitive and market research.)
Business Intelligence Professionals: (Maintain and analyze Enterprise Data Warehouse, static and structured historical data.)
Marketing or Brand Managers: (Markets, customers, channels, and products focused managers who are the principal consumers of data insights from both BI and predictive analytics. Many aspiring citizen data scientists can be found here along with the ranks of Business Analysts employed either in IT with the Data Engineers or in the lines of business.)
Web Analytics Professionals: (Focused on optimizing search, content consumption, and ecommerce success using tools like Google Analytics.)
This seems largely correct to me. You might argue that the Web Analytics Professional role might sometimes include the development and maintenance of Recommendation Engines, which is clearly within the DS domain. However, if that’s your goal as an employer you’re probably looking for that specific experience and not broadly web analytics.
Also, there’s always confusion about where to put the citizen data scientists. As an employer you’re not advertising for this but you’d be happy if your LOB managers or analysts had some of these skills.
Data Scientists versus Predictive Analytic Professionals
The most controversial distinction that BurtchWorks makes is between Data Scientists and Predictive Analytic Professionals (PAPs). Although their report tries to make this distinction clear, there’s plenty here to split hairs over. Not the least of which is that BurtchWorks wants you to stop calling the majority of us Data Scientists and adopt this seemingly lesser title of PAP.
There hasn’t been a truly comprehensive report out in years (if ever) about the use or employment of data science in businesses of different sizes and types. But I think we could intuitively agree that 80% or 90% of data science utilization and hiring lies in consumer behavior modeling. Add to that another couple of percentage points for regression based forecasting and optimization about future prices or values of trends (like sales forecasts) or dynamic pricing and supply chain forecasting.
If there is such a thing as a ‘production environment’ in data science it is among these major employers that include banks, mortgage lenders, financial services, insurance, telecoms, ecommerce, brick-and-mortar retailers, manufacturers, plus government, higher education, and health care. If you’re going to get a job in data science chances are 80% to 90% it’s going to be with these employers.
What does this leave for the individuals BurtchWorks calls “Data Scientists”? At the top end are the very rare AI, deep learning, and quantum application data scientists. But this also includes all those smartest of the smart who are innovating statistical and deep learning techniques in academia as well as at the likes of Google, Facebook, IBM, and Microsoft. It also separates out those writing data science code that is destined to be the product, including hordes of DS startups across the whole spectrum of innovation.
To vastly simplify this, in BurtchWorks’ bifurcation, the Data Scientists are getting the press and the Predictive Analytic Professionals are doing the work of guiding company strategy and tactics.
Comparing Data Scientists and Predictive Analytic Professionals
On one level it does make some sense to try to separate out ‘production’ data science from ‘innovative’ data science. If you’re an employer you’re probably plenty happy with ‘production’ level skills that frankly can still look like magic in predicting future outcomes.
If you want ‘innovative’ data science skills you probably have a special project in mind, or are hiring at the top of the DS organization ladder for a leader that is extremely well grounded in ‘production’ but has skills in ‘innovation’. Here are some of the comparisons that BurtchWorks makes in its 2016 report:
So in general, BurtchWorks says Data Scientists are Predictive Analytic Professionals who also have general purpose coding/programming skills, experience with a variety of data infrastructures, business acumen, and industry knowledge.
It’s also particularly informative to look at where these two categories work. These also are from the 2016 BurtchWorks reports.
[Charts: employer distributions for Predictive Analytic Professionals and Data Scientists]
As you can see, these employer distributions are exactly as we expected for ‘production’ versus ‘innovation’.
My hat’s off to BurtchWorks for its annual good study. Still there are some aspects of this that don’t ring true for me.
Education: This implies that it’s really not possible to get a job in data science with just a Bachelor’s degree. No argument that a Master’s makes a better foundation but this doesn’t seem to reflect the reality that I see in the marketplace. It also doesn’t seem to reflect the supply of Bachelor’s graduates just coming on line whose degrees specialized in data science.
I also simply question whether there are as many available data science Ph.Ds. as this study suggests. Perhaps this is what employers would like to see but not what they are getting or perhaps it is a respondent bias.
Note that the charts do not support proportional comparisons between the two groups, because each percentage is a share of a population of a very different size. For example, although 30% of PAPs are in financial services and 10% of Data Scientists are in financial services this does not allow for the conclusion that there are 3X as many PAPs as DSs in financial services. Chances are it’s more like 10:1 or even greater.
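To make that arithmetic concrete, here is a minimal sketch. The 30% and 10% shares are the ones quoted above; the total headcounts are hypothetical numbers chosen only for illustration, not figures from the BurtchWorks reports.

```python
def sector_ratio(pap_total, ds_total, pap_share_pct=30, ds_share_pct=10):
    """Ratio of PAPs to Data Scientists inside one sector.

    pap_total and ds_total are hypothetical total headcounts;
    the share arguments are percentages of each group in the sector.
    """
    paps_in_sector = pap_total * pap_share_pct / 100
    ds_in_sector = ds_total * ds_share_pct / 100
    return paps_in_sector / ds_in_sector


# The 3:1 reading only holds if both groups are the same size overall.
equal_sized = sector_ratio(pap_total=1000, ds_total=1000)

# If PAPs outnumber Data Scientists overall (hypothetical 10,000 vs 3,000),
# the same percentages imply a much larger ratio within the sector.
pap_heavy = sector_ratio(pap_total=10000, ds_total=3000)

print(f"Equal-sized groups: {equal_sized:.0f}:1")
print(f"PAP-heavy profession: {pap_heavy:.0f}:1")
```

The point of the sketch is only that the within-sector ratio is the ratio of shares multiplied by the ratio of the two group sizes, so the shares alone cannot tell you how many of each there are.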
Tools: Given the predominance of R and Python in school training for data science, it seems odd that the PAPs aren’t given more credit for broader programming skills. SAS and SPSS both have large installed bases which are platforms for cooperation and standardization of large DS organizations, but today’s graduates are bringing both R and Python which can be utilized even on those platforms.
In general, I like the differentiation between Data Engineers and Data Scientists. I think that helps the Data Engineers to better differentiate what their skills are and how they should be deployed in an organization.
I do acknowledge that it would be helpful to have some more formal title differences for junior and senior data scientists but I’m not ready to buy in to downgrading the title to PAP for what I expect must be in the range of 60% to 70% of folks entering our profession.
For now, fellow Data Scientists, take comfort and pride in your title. It’s still whatever you agree with your employer that you should be called.
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at: