
SAS Dominates Analytics Job Market?

Bob Muenchen's very useful work on this topic, "SAS Dominates Analytics Job Market; R up 42%," sent me back to some 2012 work we did at Statistics.com on what employers are looking for in the way of analytics skills.  First, our main results:

1.  Our numbers showed a much less SAS-dominant world: 1.92 SAS jobs for every R job.  Bob had found a ratio of 11.06 SAS jobs for every R job.

2.  Like Bob, we found that R was gaining relative to SAS: in our May 2012 scrape the SAS/R ratio was 2.44, while by December 2012 it had declined to 1.92.

About our methodology:  Rather than using Indeed.com, a consolidator, we did web scrapes customized to Dice, Amazon, CareerBuilder, and Monster.  Indeed.com includes most of those sources, but web scrapes are difficult to configure for it, because job postings on a consolidator can take different formats depending on the original source.  The Amazon scrape is for jobs at Amazon itself (Amazon does not run a job market); it was included not to be comprehensive but to provide a representative entry from a large tech-oriented firm that does its own hiring.  We searched for jobs posted in the previous 7 days using the search terms:  analytics or forecasting or statistician or "data mining" or PMML.  PMML was an experimental inclusion and turned out not to be important.

The web scraper then returned all job listings it could find matching those criteria.  The search was not perfect, simply because some of the job listing websites were not in a format that fit the scraper.  We then searched each job listing for a lengthy set of keywords, including SAS, R, SPSS, and others.  Through trial and error, several processing rules were developed to identify "R" correctly.
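To make the keyword step concrete, here is a minimal Python sketch of the counting logic.  The sample listings, the keyword list, and the word-boundary rule are illustrative assumptions, not our production scraper:

import re
from collections import Counter

# Hypothetical scraped job descriptions, already reduced to plain text.
listings = [
    "Seeking a statistician with SAS and SQL experience ...",
    "Data mining role; R, Python, or SPSS preferred ...",
]

# Keywords of interest; "R" needed extra rules in practice.
keywords = ["SAS", "R", "SPSS", "Python", "SQL", "Excel", "Java"]

counts = Counter()
for text in listings:
    for kw in keywords:
        # Count each keyword at most once per listing, matching on word
        # boundaries so that, e.g., "SPSS" is not found inside another word.
        if re.search(r"\b" + re.escape(kw) + r"\b", text):
            counts[kw] += 1

# Express each count relative to SAS, as in the table below.
sas = counts["SAS"] or 1
for kw in keywords:
    print(f"{kw}: {100 * counts[kw] / sas:.0f}% of SAS")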

Here are the results from December 2012 - each percentage is relative to SAS.  In other words, the first line indicates that the term R appeared in 52% as many postings as SAS did.

R: 52%
C++: 97%
Excel: 231%
Java: 268%
JMP: 2%
Matlab: 11%
Minitab: 3%
MySQL: 62%
Perl: 70%
Python: 70%
SPSS: 30%
SQL: 421%
Stata: 3%

We plan to run another scrape shortly - comments welcome!


Tags: Muenchen, R, SAS, SPSS, Statistics.com, jobs


Comment by Peter Bruce on June 10, 2013 at 6:51am

In our scrape we selected only job listings that contained keywords like analytics, data mining, etc., then processed them for the presence or absence of a much longer list of keywords.  We then reviewed a few of the listings identified as containing "R" to verify that this was correct (all of them were), but we did this early in the process as part of a more general "does it make sense" human review of the whole pipeline.  We had to refine the scraping as we went along to ensure that the jobs we were getting were the ones relevant to data science.  Part of the goal was to determine, within the universe of analytics listings, what specific skills were sought and what terminology was used.  With the focus now on R, we can pay more attention to this aspect of the review.  Our bigger challenge earlier was operations research, which, of course, abbreviates to "OR".
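As a purely illustrative sketch (not the actual processing rule), one way to count "R" without also counting "OR" or the English word "or" is to match exact, case-sensitive tokens:

import re

def mentions_r(text: str) -> bool:
    # Split on anything that is not a letter, digit, or "+", so tokens
    # like "R," or "(R)" reduce to a bare "R".
    tokens = re.split(r"[^A-Za-z0-9+]+", text)
    # Exact, case-sensitive match: "R" counts; "OR", "or", and "r" do not.
    return "R" in tokens

print(mentions_r("Experience with R or SAS required"))            # True
print(mentions_r("Operations Research (OR) background desired"))  # False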

Comment by Ralph Winters on June 8, 2013 at 8:02am

Also note that, because of the unfortunate choice of the name "R" for the language, there is a natural bias toward any kind of document (even job listings) that contains that term.

Google hits for " R ": 7,810,000,000

Google hits for " SAS ": 275,000,000

Ratio: roughly 28:1

So no matter how perfect your Boolean filtering is, you will find it very difficult to overcome the natural 28:1 ratio.

I would suggest running some of these queries by some professional headhunters, have them categorize which ones make sense or not and then feed the results back into your query.

This needs people with domain knowledge; it cannot be done in a completely automated way.

 

Comment by Bob Muenchen on June 8, 2013 at 6:40am

Hi Peter,

Nice job! My approach ignored job title and focused on specific tool criteria. As Ralph points out, it's good to know those criteria so others can try variations and check the work. Most packages have names that are unique enough to search on and get valid results. With SAS I ran into trouble since it can also stand for "Serial Attached SCSI," a storage interface. So I excluded those job descriptions with this string:

SAS -SATA -storage -firmware

Searching for R was a nightmare since R shows up in many extraneous ways, such as Toys-R-Us. Here's the search string I used for R:

sas+or+r or sas+r or r+or+sas or r+sas or matlab+or+r or r+or+matlab or r+matlab or matlab+r or ruby+r or r+ruby or python+or+r or r+or+python r+python or python+r or sql+r or r+sql or r+or+sql or sql+or+r or java+or+r or r+or+java or java+r or r+java or perl+or+r or r+or+perl or perl+r or r+perl or spss+or+r or r+or+spss or spss+r or r+or+spss 
You can cut and paste them into Indeed.com's search window and get a quick comparison between SAS and R jobs at any given point. I'm doing that via a program once a week, so I hope to have an interesting graph this time next year.
I'd love to see other variations people can come up with to study this issue.
Cheers,
Bob
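A rough sketch of how such a weekly check might be scripted follows; the Indeed URL layout, the q parameter, and the result-count parsing are assumptions that may need adjusting to the live site, and the queries are the strings quoted above, not Bob's actual program:

import re
import urllib.parse
import urllib.request

SAS_QUERY = 'SAS -SATA -storage -firmware'
R_QUERY = 'sas+or+r or sas+r or r+or+sas'  # abbreviated; paste the full string above

def count_postings(query: str) -> int:
    # Assumed URL layout: indeed.com/jobs?q=<query>; subject to change.
    url = "https://www.indeed.com/jobs?q=" + urllib.parse.quote_plus(query)
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    # Assumed: the results page reports something like "of 1,234 jobs".
    match = re.search(r"of\s+([\d,]+)\s+jobs", html)
    return int(match.group(1).replace(",", "")) if match else 0

sas_jobs = count_postings(SAS_QUERY)
r_jobs = count_postings(R_QUERY)
print(f"SAS: {sas_jobs}   R: {r_jobs}   SAS/R ratio: {sas_jobs / max(r_jobs, 1):.2f}")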
Comment by Ralph Winters on June 7, 2013 at 10:55am

Hi Peter,

We don't really know whether this was a representative or random sample.  You used 4 sources.  I would expect Amazon to use "R" more, since it uses open source tools more.  If you had used a big bank as a substitute (one that also recruits for its own positions but uses more commercial tools), the study might have been slanted more toward SAS.

How do you account for duplication in jobs?  I believe employers sometimes vary the keywords in job descriptions and post them across various sites.

When you select for "R" or "SAS", what are your search criteria?  How do we know they are capturing all of the relevant postings?

I would imagine this is a daunting task.  In some disciplines, such as text analytics, researchers start with a common, agreed-upon standard collection, such as all Netflix reviews in a certain time period, or all Reuters news articles for 2012.  But if we can't go back and "vet" any of the source data, it is hard to determine exactly what you are measuring.

Hope this helps.

-Ralph

Comment by Peter Bruce on June 7, 2013 at 8:04am

Hi Ralph - Do you have any ideas on how to make our methodology, described in the note, more scientific?  -Peter

Comment by Ralph Winters on June 7, 2013 at 7:43am

Peter,

Yes, the large discrepancy between the two results does not add credibility to either of the numbers.  I wish these kinds of studies were a little more scientific and subject to peer review.

But thanks anyway for quoting an alternate number.

 

Ralph Winters

 

 

 
