
Python Overtakes R for Data Science and Machine Learning

This article summarizes a trend in programming language usage, based on a number of proxy metrics. The shift became more pronounced in early 2017: Python overtook R as the language of choice for data science and machine learning applications.

Statistics from Google

Google offers a tool called Google Trends that shows how interest in specific subjects evolves, lets you compare interest across several search topics, and breaks the results down by region or time period.


Search index for Python Data Science (blue) versus R Data Science (red) over the last 5 years, in the US

We used Google Trends to compare search interest for "R Data Science" versus "Python Data Science" (see the chart above). Until December 2016, R dominated, but it fell below Python by early 2017. The chart displays an interest index, where 100 is the maximum and 0 the minimum. The interactive version of this chart on Google Trends also lets you check the results for countries other than the US, or for specific regions such as California or New York.
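To make the 0 to 100 scale concrete, here is a minimal sketch of how such an index can be produced. Google describes Trends data as normalized to search volume and then scaled so that the peak of the selected range maps to 100. The sketch below assumes that the normalization step has already been done and only shows the final rescaling; when several terms are compared, all series share one peak so they remain comparable. The function name `to_interest_index` and the input numbers are hypothetical and do not come from Google.

```python
# Simplified sketch (assumption): rescale relative search volumes so that the
# highest value across ALL compared terms becomes 100. This is not Google's code.

def to_interest_index(series_by_term):
    """Map each term's values onto a shared 0-100 scale (rounded to integers)."""
    # Use one common peak so that terms compared together stay comparable.
    peak = max(max(values) for values in series_by_term.values())
    if peak == 0:
        # No interest at all: every point is 0 on the index.
        return {term: [0] * len(values) for term, values in series_by_term.items()}
    return {
        term: [round(100 * v / peak) for v in values]
        for term, values in series_by_term.items()
    }


# Hypothetical relative search volumes for four periods.
index = to_interest_index({
    "python data science": [80, 100, 140, 200],
    "r data science": [120, 130, 120, 110],
})
print(index)
```

With these made-up inputs, the Python series ends at 100 (its peak is the overall peak) while the R series stays between 55 and 65, which mirrors the kind of crossover visible in the chart.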

Note that Python has always dominated R by a wide margin in overall search interest, because it is a general-purpose language while R is a specialized one. Here, however, we compare R and Python in the niche context of data science. The map below shows interest for Python in general (not limited to data science) per region, using the same Google index.


Interest for Python, by region (last 12 months; source: Google)

Indeed statistics

Indeed is a job aggregator. Some of the jobs listed there may have expired, be duplicates, or be irrelevant; still, the numbers are worth a quick look:

A search for "Python Data Science" returns 15,741 full-time jobs. The top US cities are:

  • New York, NY (1401)
  • Seattle, WA (1141)
  • San Francisco, CA (1052)
  • Chicago, IL (469)
  • Boston, MA (410)

A search for "R Data Science" returns 7,533 full-time jobs. The top US cities are:

  • New York, NY (734)
  • San Francisco, CA (402)
  • Seattle, WA (375)
  • Boston, MA (269)
  • Chicago, IL (260)
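A quick way to read the two lists side by side is to compute the ratio of Python to R postings, overall and per city. The sketch below uses only the Indeed counts quoted above; the helper name `python_to_r_ratio` is hypothetical. Keep in mind the caveats already mentioned: a posting that mentions both languages is likely counted in both searches, so these ratios are rough indicators rather than exact market shares.

```python
# Indeed full-time job counts quoted above ("total" = overall search result count).
python_jobs = {
    "total": 15741, "New York, NY": 1401, "Seattle, WA": 1141,
    "San Francisco, CA": 1052, "Chicago, IL": 469, "Boston, MA": 410,
}
r_jobs = {
    "total": 7533, "New York, NY": 734, "San Francisco, CA": 402,
    "Seattle, WA": 375, "Boston, MA": 269, "Chicago, IL": 260,
}


def python_to_r_ratio(python_counts, r_counts):
    """Return the Python/R posting ratio for every key present in both dicts."""
    return {
        key: round(python_counts[key] / r_counts[key], 2)
        for key in python_counts
        if key in r_counts
    }


ratios = python_to_r_ratio(python_jobs, r_jobs)
# Print from the strongest Python lead to the weakest.
for place, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{place}: {ratio:.2f}x")
```

On these figures, Python has roughly twice as many postings as R overall (about 2.09x), with the widest gap in Seattle (about 3.04x) and the narrowest in Boston (about 1.52x); Python leads in all five cities.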

Our internal statistics

We have 83 fresh, active job ads for Python that are relevant to data science, mostly in the US and London. For R, we have 66. It would be interesting to compare these numbers with job statistics from LinkedIn.

Another metric of interest is the number of articles written about each language in the context of data science. On Data Science Central, R is mentioned in 19,500 documents (since 2008) versus 11,500 for Python. However, among the top 10 search results for each language, 9 are from 2017 for Python versus 7 for R. In short, R is starting to show its age. A Google search for R or Python restricted to Data Science Central yields similar conclusions.

It would be interesting to check what is happening with Java and C++, as they have been the workhorses of software development for a long time. 
