400 Categorized Job Titles for Data Scientists

This article lists job titles for data scientists and describes the simple but powerful classifier used to categorize them. The analysis provides a breakdown per job category and granular reports that you can download for free (job titles broken down by company, category and level), as well as NLP (natural language processing) source code. It is based on analyzing connections from multiple LinkedIn profiles, totaling more than 10,000 professionals. The first study was published in June 2013.

1. Summary

Figure 1-a below shows the top job titles in the business analytics category. The full list has 700+ job titles shared by at least two practitioners, across the following 11 categories:

  • Recruiter
  • Engineering
  • Developer
  • Data Plumbing
  • Data Science
  • Statistician
  • Research
  • Business Analytics
  • Consultant
  • Trainer
  • Student

The full table can be downloaded here (Excel spreadsheet). If you include job titles held by only one person, we have 7,000+ job titles: this is another example of a system governed by a Zipf distribution, with a very long tail. A spreadsheet with full details (including job title, job category, level, and company name) is available exclusively to DSC members. If you are not yet a member, you can sign up here to access the spreadsheet.
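To make the long-tail remark concrete, here is a minimal Perl sketch that counts how many distinct titles are shared by two or more people versus held by a single person, and prints a rank-frequency table. The sample titles and the helper name tail_summary are made up for illustration; they are not taken from the LinkedIn data.

```perl
use strict;
use warnings;

# Hypothetical toy sample of cleaned job titles (not the real data).
my @titles = (
  "data scientist", "data scientist", "data scientist", "data scientist",
  "business analyst", "business analyst",
  "statistician", "chief data wrangler", "analytics ninja",
);

# Count each title, then split the distinct titles into
# "shared by 2+ people" and "held by one person" (the tail).
sub tail_summary {
  my @list = @_;
  my %freq;
  $freq{$_}++ for @list;
  my $shared = grep { $freq{$_} > 1 } keys %freq;
  my $unique = grep { $freq{$_} == 1 } keys %freq;
  return (\%freq, $shared, $unique);
}

my ($freq, $shared, $unique) = tail_summary(@titles);

# Rank-frequency table: under a Zipf-like law, frequency falls off
# roughly as 1/rank, so most distinct titles sit in the frequency-1 tail.
my $rank = 0;
for my $t (sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq) {
  $rank++;
  print "$rank\t$freq->{$t}\t$t\n";
}
print "shared by 2+: $shared, held by one person: $unique\n";
```

On real data, the two counts printed on the last line would correspond to the 700+ shared titles and the remainder of the 7,000+ titles mentioned above.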

Figure 1-a: Top job titles in the business analytics category

Figure 1-b: Top job titles in the data science category

2. Methodology

We analyzed the LinkedIn data (connections with job title and company, from well connected data scientists), cleaned the job title field, and created three extra fields:

  • Cleaned job title
  • Level (Executive / Manager / Consultant / Analyst / Professor / Student)
  • Category (see section 1)

In order to identify job categories and levels, we first created a data dictionary of all one-token and two-token keywords found in job titles, ranked by frequency, after filtering out tokens that make no sense (such as vice, because it is always associated with president, in job titles containing vice president).
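The dictionary step can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the filter list contains just the one example given above (vice), the sample titles are invented, and the helper name ranked is hypothetical. The hash names follow the script in section 3.

```perl
use strict;
use warnings;

# Assumption: only the example from the text is filtered; the real list is not given.
my %skip = map { $_ => 1 } ("vice");

# Hypothetical sample of already-cleaned, lowercase job titles.
my @jobs = (
  "vice president analytics",
  "senior data scientist",
  "data scientist",
  "vice president marketing",
);

my (%ltoken1, %ltoken2);   # one-token and two-token counts
for my $job (@jobs) {
  my @tok = split ' ', $job;
  for my $k (0 .. $#tok) {
    $ltoken1{$tok[$k]}++ unless $skip{$tok[$k]};
    $ltoken2{"$tok[$k-1] $tok[$k]"}++ if $k > 0;   # adjacent pair
  }
}

# Return the keys of a count hash, most frequent first (ties alphabetical).
sub ranked {
  my ($h) = @_;
  return sort { $h->{$b} <=> $h->{$a} || $a cmp $b } keys %$h;
}

print "One Term\t$_\t$ltoken1{$_}\n"  for ranked(\%ltoken1);
print "Two Terms\t$_\t$ltoken2{$_}\n" for ranked(\%ltoken2);
```

Here "vice" is dropped as a one-token entry but still appears inside the two-token entry "vice president", which is the form that carries meaning.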

The top 2-token words are displayed in Figure 2:

Figure 2: top 2-token words found in job titles

The full list, including both one- and two-token words (totaling 15,000 words), can be downloaded here (Excel spreadsheet).

The job categories, levels and cleaned job titles were computed with the Perl script in section 3. While this is a clustering problem (creating a taxonomy of job titles for data scientists), our simple and scalable approach makes it look, from a computational point of view, more like an indexing problem than pure clustering.
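One way to see the indexing flavor: each title is matched against a fixed, ordered keyword table in a single pass, with no distances and no iterations. The sketch below uses a small subset of the category keywords from section 3, in the same order. The table layout and the function name categorize are assumptions for illustration, not the original code.

```perl
use strict;
use warnings;

# Ordered keyword => category pairs (a subset of the rules in section 3).
my @rules = (
  [ "engineer"   => "Engineering" ],
  [ "scientist"  => "Data Science" ],
  [ "stat"       => "Statistician" ],
  [ "analytics"  => "Business Analytics" ],
  [ "consultant" => "Consultant" ],
  [ "student"    => "Student" ],
);

# One pass over the table per title; the last matching rule wins,
# mirroring the sequence of "if" statements in the script.
sub categorize {
  my ($job) = @_;
  my $category = "";
  for my $rule (@rules) {
    my ($keyword, $label) = @$rule;
    $category = $label if index($job, $keyword) >= 0;
  }
  return $category;
}

for my $job ("senior data scientist", "analytics consultant",
             "statistics student", "pilot") {
  printf "%-25s => %s\n", $job, categorize($job) || "(none)";
}
```

Because the work per title depends only on the size of the keyword table, the total cost grows linearly with the number of titles.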

3. Source Code

The idea was to write a script quickly and produce the results in less than two hours of work, from start to finish. The input file jobs.txt contains the raw job title and company, as entered by LinkedIn connections. The first step uses regular expressions to clean the job titles. If you are unfamiliar with this type of code, read our data science cheat sheet first. Note that the many "if" statements in the code are in hierarchical order: each later match overwrites an earlier one, so you cannot reorder them without changing the results.
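The following sketch illustrates both points: the cleaning step wrapped in a function, and the effect of rule order on the result. The function names clean_title and level_of, the rule-table layout, and the sample title are hypothetical. The substitutions mirror the ones in the script, except that runs of spaces are collapsed with / +/.

```perl
use strict;
use warnings;

# Lowercase the title, drop " of " and " and ", turn punctuation
# into spaces, then collapse repeated spaces.
sub clean_title {
  my ($job) = @_;
  $job = lc($job);
  $job =~ s/ of / /g;
  $job =~ s/ and / /g;
  $job =~ s/[\/\\&,\-.]/ /g;
  $job =~ s/ +/ /g;
  return $job;
}

# Apply (keyword, level) rules in the given order; the last match wins.
sub level_of {
  my ($job, @rules) = @_;
  my $level = "";
  for my $rule (@rules) {
    my ($keyword, $label) = @$rule;
    $level = $label if $job =~ /\Q$keyword\E/;
  }
  return $level;
}

# A few level rules, in the order used by the script.
my @rules = (["president", "Executive"], ["director", "Manager"],
             ["lead", "Manager"], ["analyst", "Analyst"]);

my $job = clean_title("Lead Analyst, Marketing & Sales");
print "$job\n";
print "script order:   ", level_of($job, @rules), "\n";
print "reversed order: ", level_of($job, reverse @rules), "\n";
```

With the script's order, the title matches "lead" and then "analyst", so it ends up as Analyst; with the rules reversed, "lead" is tested last and the same title becomes Manager.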

The main job categories and levels were created by looking at top entries (with highest frequencies) in the data dictionary: see Step 3 in the code below.


while ($i=<IN>) {

  $job=$aux[1]; # job title
  $job=lc($job); # put in lowercase
  $job=~s/ of / /g; # clean job title
  $job=~s/ and / /g; # more cleaning
  $job=~s/[\/\\&,\-.]/ /g; # replace / \ & , - . with a space
  $job=~s/ +/ /g; # collapse runs of spaces


  #---- Step 1: creating job level


  if ($job =~ "vice president") { $level="Executive"; }
  if ($job =~ "vp ") { $level="Executive"; }
  if ($job =~ "ceo") { $level="Executive"; }
  if ($job =~ "executive") { $level="Executive"; }
  if ($job =~ "officer") { $level="Executive"; }
  if ($job =~ "chief") { $level="Executive"; }
  if ($job =~ "partner") { $level="Executive"; }
  if ($job =~ "president") { $level="Executive"; }
  if ($job =~ "director") { $level="Manager"; }
  if ($job =~ "manager") { $level="Manager"; }
  if ($job =~ "lead") { $level="Manager"; }
  if ($job =~ "consultant") { $level="Consultant"; }
  if ($job =~ "principal") { $level="Consultant"; }
  if ($job =~ "professor") { $level="Professor"; }
  if ($job =~ "analyst") { $level="Analyst"; }
  if ($job =~ "student") { $level="Student"; }
  if ($job =~ "analyst") { $category="Analyst"; }


  #---- Step 2: creating category


  if ($job =~ "recruit") { $category="Recruiter"; }
  if ($job =~ "talent") { $category="Recruiter"; }
  if ($job =~ "engineer") { $category="Engineering"; }
  if ($job =~ "software") { $category="Developer"; }
  if ($job =~ "develop") { $category="Developer"; }
  if ($job =~ "architect") { $category="Data Plumbing"; }
  if ($job =~ "scientist") { $category="Data Science"; }
  if ($job =~ "science") { $category="Data Science"; }
  if ($job =~ "stat") { $category="Statistician"; }
  if ($job =~ "research") { $category="Research"; }
  if ($job =~ "marketing") { $category="Business Analytics"; }
  if ($job =~ "analytics") { $category="Business Analytics"; }
  if ($job =~ "business") { $category="Business Analytics"; }
  if ($job =~ "operations") { $category="Business Analytics"; }
  if ($job =~ "consultant") { $category="Consultant"; }
  if ($job =~ "training") { $category="Trainer"; }
  if ($job =~ "lecturer") { $category="Trainer"; }
  if ($job =~ "professor") { $category="Trainer"; }
  if ($job =~ "student") { $category="Student"; }


  print OUT "$company\t$job_raw\t$category\t$level\t$job\n";

  #---- Step 3: create data dictionary

  # ltoken1 is list (hash table) of one-term words found
  # ltoken2 is list (hash table) of two-term words found

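  @aux=split(' ',$job);
  $ntokens=scalar(@aux);
  foreach $token (@aux) { $ltoken1{$token}++; } # one-term counts
  for ($k=1; $k< $ntokens; $k++) {
    $token_A=$aux[$k-1];
    $token_B=$aux[$k];
    $ltoken2{"$token_A $token_B"}++; # two-term counts (adjacent tokens)
  }

  # assumption: per-title tallies read by the output section below
  $ljob{$job}++;
  $ljob_level{$job}=$level;
  $ljob_category{$job}=$category;
}
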

#---- more output

foreach $token1 (keys(%ltoken1)) {
  print OUT "One Term\t$token1\t$ltoken1{$token1}\n";
}
foreach $token2 (keys(%ltoken2)) {
  print OUT "Two Terms\t$token2\t$ltoken2{$token2}\n";
}

foreach $job (keys(%ljob)) {
  if ($ljob{$job} > 1) {   # keep only jobs with 2+ entries
    print OUT "$job\t$ljob{$job}\t$ljob_level{$job}\t$ljob_category{$job}\n";
  }
}


Comment by Richard Ordowich on February 9, 2015 at 7:44am

This is amusing but is useful only if you want to randomly choose a title for some fictional role within your organization. Without details of the actual work to be performed, the skills and experience necessary to perform this work and the contribution and performance metrics for the work, these titles are "empty data". There are no semantics for these titles, which is the problem with a lot of data within organizations.

A lot of an organization's data is "dark data". Its meaning is hidden in the recesses of people's minds.

How about an algorithm to randomly choose a title? It would probably be as effective as the way most people choose job titles but with an added element of surprise and amusement.
