Pasha Roberts, Chief Scientist
Greta Roberts, CEO
Digging into a cross-industry study of analytics professionals, we identify four distinct patterns of how these workers spend their week: (1) Data Preparation, (2) Manager, (3) Programmer, and (4) Generalist. These functional clusters are defined by unique time-usage patterns, and further, exhibit important differences across dimensions of education, demographics, and mindset. This brief quantifies these characteristics, and their deep implications for sourcing, hiring, managing, and retaining analytics professionals in each category.
The groundbreaking 2012 Analytics Professionals Study by Talent Analytics, Corp. and the International Institute for Analytics utilized many measures to understand the characteristics of modern analytics professionals and data scientists [1]. The study examined 302 active analytics professionals in a diverse sample of companies, industries, sizes and circumstances. For the purpose of this paper we will use the terms data scientist and analytics professional interchangeably.
Our premise? Data scientists have been discussed as if they performed a single role. We suspected this role was wider, containing a broader workflow of tasks. Clarifying the analytics workflow and the tasks performed by data scientists could also provide insight for those looking to hire and retain professionals in this important role.
To this end, we asked participants in our study how many hours per week they spent in various analytics-related activities. Every attempt was made to capture and reflect tasks in a modern analytics workflow.
The study also measured 11 factors pertaining to an individual's “raw talent”, also known as aptitude or mindset. Aptitude is distinct from achievement. This study will show how aptitude and other factors differ across job types.
In aggregate, the study sample spent roughly the same percentage of time performing tasks in the following 11 categories:
However, upon deeper analysis it became clear that there were several “types” of workweeks at hand in this sample.
When we reviewed the 11 different analytics functions, 4 categories emerged, each containing multiple functions.
Several algorithms were examined to perform cluster analysis upon this sample, and “Fuzzy Clustering” delivered the best results. This method implies that each item belongs to each cluster to some degree, which makes sense given the fluidity of most analysts' work. The best results were found with four clusters, which were named based on their dominant activities.
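To illustrate what “belonging to each cluster to some degree” means in practice, here is a minimal fuzzy c-means sketch in Python. It is an illustration only: the brief does not state which fuzzy clustering implementation or settings were used, so the function name fuzzy_c_means, the fuzzifier m = 2, and the made-up time-share data below are all assumptions and not the study's data or code.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Hypothetical helper: return (centers, memberships) for X of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Start from random memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers are membership-weighted averages of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Closer centers receive a larger share of each sample's membership.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Synthetic example (NOT study data): share of the week spent on four
# hypothetical activities, drawn around four invented time-use profiles.
rng = np.random.default_rng(42)
profiles = np.array([
    [0.50, 0.15, 0.15, 0.20],  # week dominated by the first activity
    [0.15, 0.50, 0.15, 0.20],  # week dominated by the second activity
    [0.15, 0.15, 0.50, 0.20],  # week dominated by the third activity
    [0.25, 0.25, 0.25, 0.25],  # week spread evenly
])
shares = np.vstack([rng.dirichlet(p * 200, size=30) for p in profiles])

centers, U = fuzzy_c_means(shares, n_clusters=4)
print(np.round(U[0], 2))  # one respondent's degree of membership in each cluster
```

Each row of U sums to 1, so a respondent can be, for example, mostly in one cluster while still partly in another. Naming a cluster after its dominant activity then amounts to reading off the largest entries of the corresponding row of centers.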
This brief uses a chart type known as a “Density Plot”. A “bell curve” is a density plot. It displays the estimated population percentage for each possible value of a variable, that is the “density” of that variable. The horizontal X axis measures a single raw talent metric (like Curiosity) on a scale of 1 – 100. The farther to the right, the closer the score is to 100, and the more Curious the individual is. The vertical Y axis shows the percentage of the population estimated at each raw talent score. The higher up, the more of the population will have this score at X.
In a random sample of people, every score would be equally likely: as many people would have a Curiosity score of 29 as would have a Curiosity score of 78. With 100 possible scores, each score would hold about 1% of the population, so a random sample of people would display as a flat line at 1%. We drew a dotted line at 1% to show where a random sample of people would score.
When viewing our density plots, the most interesting information is found where the curve departs from the dotted line, rising above or falling below the level expected of a random sample.
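The comparison against the 1% line can be sketched in a few lines of Python. This is a hedged illustration: the brief does not say how its densities were estimated, so the Gaussian kernel estimate, the bandwidth of 5 score points, the helper name density_percent, and the made-up scores are all assumptions.

```python
import numpy as np

def density_percent(scores, grid, bandwidth=5.0):
    """Hypothetical helper: Gaussian kernel density estimate,
    expressed as percent of the population per score point."""
    z = (grid[:, None] - scores[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return 100.0 * kernel.mean(axis=1)

grid = np.arange(1, 101)        # possible scores, 1 to 100
baseline = 100.0 / grid.size    # flat line for equally likely scores: 1%

# Made-up scores (NOT study data) for a group that skews high on one metric.
rng = np.random.default_rng(0)
scores = np.clip(rng.normal(75, 12, size=300), 1, 100)

density = density_percent(scores, grid)
above = grid[density > baseline]  # scores where the group is over-represented
print(f"baseline = {baseline:.1f}%")
print(f"curve is above the baseline for scores {above.min()} to {above.max()}")
```

Wherever density exceeds baseline, the group holds more people at that score than a random sample would; wherever it falls below, fewer.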
For the purpose of this paper we focus primarily on teasing apart differences in the 4 functions inside of the data scientist role. These differences have implications for hiring, promoting and retaining. Before we begin, it is interesting to note the similarities among the functional clusters.
Analysts in all 4 functional clusters have two things in common: 1) very strong intellectual curiosity (Theoretical Drive, see Figure 1) and 2) strong drive to create out-of-the-box solutions (Creative Drive, see Figure 2).
Figure 1: Level of Intellectual Curiosity.
(the further right, the more curious)
Figure 2: Level of Creativity
(the further right, the more creative)
Analytics professionals in the first group, the Data Preparation cluster, spend a significant amount (46%) of their time gathering and preparing data for analysis used later on in the analytics workflow.
Figure 3: Time by Analytics Function for Data Preparation Cluster
How are Data Preparation Analysts Different from other Analysts?
Figure 4: Density of Drive to Compete and Win
(the further right, the more motivated to compete)
Figure 5: Density of Drive to be Exact, Accurate, Mistake-free
(the further right, the more detailed, exact, precise in their work)
Sourcing, Hiring, Managing and Retaining Data Preparation Analysts
The second functional cluster identified consists of analytics professionals whose workweek is weighted more heavily toward programming, that is, writing computer code to manipulate and process data. They spend more than three times as much time programming as any of the other clusters. That being said, analytics Programmers still spend only about one third of their time, on average, programming. They spend the rest of their time on other analytics-related activities, as other analysts do.
Figure 6: Time by Analytics Function for Programmer Cluster
Sourcing, Hiring, Managing and Retaining Analytics Programmers
Figure 7: Time by Analytics Function for Managers Cluster
How Analytics Managers Differ from other Analysts
Figure 8: Density of Drive to be Compassionate and Empathetic
(the further right, the more compassionate)
Figure 9: Density of Drive to Achieve Bottom Line Results, or to See ROI
(the further right, the more focused on results, including personal financial results)
Figure 10: Density of Drive to use an Assertive Management Style
(the further right, the more bold and confident the approach)
Sourcing, Hiring, Managing and Retaining Analytics Managers
One cluster of analytics professionals, Analytics Generalists, did not report spending significant time in any focused area. Generalists in our study were found in a wide variety of companies and industries. Contrary to the study’s original hypothesis, Generalists are found in very large organizations, as well as small companies.
We suspect Generalists work in companies of all sizes because:
Figure 11: Time spent by Analytics Function for Generalists Cluster
The field of data analytics is going through rapid change as new data sources and new business opportunities emerge. By nature, very few people are well suited to do everything on the spectrum of analytics – to clean data AND program AND analyze AND present AND manage. This is unrealistic and does not scale.
This study reveals an ongoing trend to divide the work among Preparation, Programming, and Management. Ironically, the analysis and visualization stage is rather small by comparison.
It appears that some Generalists are in this role for organizational reasons rather than aptitude or personal preferences. Meaning, it could be that today’s Generalists have been placed in this role not because they are great at this, but because their analytics role is less well defined and they were hired to “do everything”. Generalists are found in small organizations, where it may be necessary to do everything, and in large organizations, which could easily specialize, but do not seem to deploy a division of analytics labor.
If this proves to be the case, over time today’s analytics discipline will mature and analytics teams will begin to divide workers into more specialized tasks – like the clusters we’ve identified. When this happens, it could be that a group of “True Generalists” will remain, or perhaps these will emerge as the true “Analysts”.
Pasha Roberts is Co-founder & Chief Scientist of Talent Analytics, Corp.
As Co-Founder and Chief Scientist, Pasha is responsible for architecture, development, and algorithms for Talent Analytics. He wrote the first implementation of the software over a decade ago, and today he continues to drive new features and platforms for the company.
As is often found in data science, Pasha has decades of experience and education that span computing, quantitative, artistic, and business categories.
Pasha holds a bachelor's degree in Economics and Russian Studies from The College of William and Mary, and a Master of Science degree in Financial Engineering from the MIT Sloan School of Management. His thesis at MIT prototyped the application of advanced 3D graphics to massive financial “tick” datasets.
He has founded two companies: WebLine Communications Corporation, a web call center enterprise software company, and Lineplot Productions, a financial visualization/animation service company.
Pasha’s passion with Talent Analytics is to develop new analytics to focus business performance, and to extend the Talent Analytics model to a useful set of software platforms. He hopes to discover new information about people and the work they do, with every new project. Follow Pasha on twitter @PashaRoberts.
Greta Roberts is Co-founder & CEO of Talent Analytics, Corp.
As Co-founder and CEO, Greta is responsible for charting a predictive analytics approach and software platform for solving employee challenges. In addition to her role as CEO, she was elected Program Chair for Predictive Analytics World for Workforce and continues as Faculty at the International Institute for Analytics.
Greta brings a unique perspective to solving complex, long-term challenges. This is never more evident than in the firm’s early direction to use analytics to solve “line of business” challenges instead of “HR” challenges, modeling business outcomes instead of HR outcomes. This approach has led to Talent Analytics becoming a recognized leader in predicting employee performance and attrition. Talent Analytics focuses its work on high value, high turnover positions like Sales positions, Bank Tellers, Insurance Agents, Customer Service Reps and Data Scientists; all areas where reduced attrition or increased performance can yield millions of dollars in bottom line savings or income.
Greta is a sought-out international thought leader, presenter, and author. She has been a multi-year presenter at Predictive Analytics World (PAW), keynoting in 2014 at PAW Toronto, the ADMA Global Forum in Sydney, Australia, the INFORMS Analytics Conference & SAP Sapphire Now. In addition to speaking, she is often quoted in the press in a variety of influential business publications.
Follow Greta on twitter @gretaroberts.
Note: To license Talent Analytics Data Scientist benchmark to help build your analytics bench, please contact Talent Analytics directly for more information: 617-864-7474 x.101 or [email protected]
[1] See IIA Research Brief Quantifying Analytical Talent for additional results of the study.
©2013 Talent Analytics, Corp.
All rights reserved