Top Languages for analytics, data mining, data science

KDnuggets published a very interesting study. Here are the highlights:

The most popular languages continue to be R (used by 61% of KDnuggets readers), Python (39%), and SQL (37%). SAS is stable at around 20%. The highest growth was for Pig/Hive/Hadoop-based languages, R, and SQL, while Perl, C/C++, and Unix tools declined. We also find a small affinity between R and Python users.

Previous KDnuggets polls looked at high-level Analytics and Data mining software, but sometimes a full-power programming language is needed. That was the focus of the latest KDnuggets Poll, which asked:

What programming/statistics languages you used for an analytics / d...

Based on a very high response of over 700 voters, the most popular languages continue to be R (now used by 61% of responders), Python (39%), and SQL (37%). On average, there were 2.3 languages used.

For trends, we compared the 2013 results with similar

The language with the highest relative growth (2013 vs 2012) was Julia, which doubled in popularity, but still was used only by 0.7% in 2013.

Among more common languages, the largest relative increases in share of usage from 2012 to 2013 were for

  • Pig Latin/Hive/other Hadoop-based languages, 19% growth, from 6.7% in 2012 to 8.0% in 2013
  • R, 16% growth
  • SQL, 14% growth (perhaps the result of increasing number of SQL interfaces to Hadoop and other Big Data systems?)
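The growth figures above are relative changes in share, not percentage-point differences. As a minimal sketch of that arithmetic (the function name `relative_growth` is my own, not something from the poll), here is the Hadoop-based languages figure recomputed from the two shares quoted above:

```python
def relative_growth(share_before: float, share_after: float) -> float:
    """Relative change in share, expressed as a fraction of the earlier share."""
    if share_before <= 0:
        raise ValueError("share_before must be positive")
    return (share_after - share_before) / share_before


# Pig Latin/Hive/other Hadoop-based languages: 6.7% in 2012, 8.0% in 2013
hadoop_growth = relative_growth(6.7, 8.0)
print(f"{hadoop_growth:.0%}")  # prints 19%
```

The share rose by only 1.3 percentage points, but relative to the 2012 base of 6.7% that is a gain of about 19%.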

The languages with the largest declines in share of usage were

  • Lisp/Clojure, 77% down
  • Perl, 50% down
  • Ruby, 41% down
  • C/C++, 35% down
  • Unix shell/awk/sed, 25% down
  • Java, 22% down

Is there an affinity between R and Python? Yes, people who use R are about 13% more likely to use Python than the overall population.
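One common way to express this kind of affinity is lift: the share of Python users among R users, divided by the share of Python users among all voters. The sketch below assumes that is how the 13% was derived, which the summary does not confirm. The counts are hypothetical, chosen only to be consistent with the quoted 61% and 39% shares on a round total of 700 voters; they are not the poll's actual data.

```python
def lift(n_both: int, n_a: int, n_b: int, n_total: int) -> float:
    """Lift of B given A: P(B | A) / P(B). A value above 1 means positive affinity."""
    p_b_given_a = n_both / n_a
    p_b = n_b / n_total
    return p_b_given_a / p_b


# HYPOTHETICAL counts for illustration only (not from the poll):
n_total = 700    # round number near "over 700 voters"
n_r = 427        # 61% of 700
n_python = 273   # 39% of 700
n_both = 188     # assumed overlap, picked to give a lift near 1.13

r_python_lift = lift(n_both, n_r, n_python, n_total)
print(f"{r_python_lift:.2f}")  # prints 1.13
```

A lift of 1.13 reads as "R users are about 13% more likely to use Python than a randomly chosen voter."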

Read the complete analysis.


Comment by Phillip Julian on September 8, 2013 at 3:35pm
I was surprised at how much interest and publicity was generated by the KDNuggets poll. I agree that we should do our own poll, and I will be glad to help in designing the poll and analyzing the data.

We need a broader and more diverse audience like DSC. We may also want to run this poll in other LinkedIn groups, which may add to the diversity. Then we could analyze results overall and by LinkedIn group.

We should identify diversity by various factors of employment, industry, location, and optional personal demographics. We should gather more information about the software, such as usage frequency, internal ranking of each software package, size of data (big, wide, frequent, or normal), and percent of time devoted to each software.

We should supply a good definition of data science, and classify the type of software (shell, integrated system, interactive analysis, visual analysis, etc.). We need to decide if we should group data science with data mining and analytics. Some data mining software does not do analytics. And high-level analytics would include almost all software.

We need to define a small set of questions, and try to keep the same questions year after year. This requires time, thought, and joint (or crowd) design.

These are my initial thoughts on the poll project. Please let me know what you think.
Comment by Vincent Granville on September 5, 2013 at 6:22pm

We should run our own poll. Phillip, you are right: it represents KDNuggets users, who might differ from DSC visitors.

Comment by Phillip Julian on August 30, 2013 at 8:20am
The KDNuggets language poll asks some very interesting questions. But what are we really measuring by these results? I'm guessing that these votes come from KDNuggets readers or subscribers. The results show what software is used by that population, but results may be different for a more diverse population.

Can anyone suggest more diverse and objective language polls for big data analysis and data mining?
Comment by Buyanjargal Shirnen on August 28, 2013 at 10:53pm
  • Unix shell/awk/sed, 25% down

Despite the decline above, the following one could be useful in some cases.

https://en.usp-lab.com/

© 2014   Data Science Central