People can argue at length about which programming language is best for data analysis, but one objective metric can guide your choice: speed of calculation. Therefore, the word "best" in the title means the languages that lead to the most performant applications. If the most performant program can also be written in an easy-to-use, easy-to-learn, dynamically typed scripting language, that language is a strong candidate for the best choice. Luckily, there is an objective way to check this: implement the same algorithm in different dynamically typed languages, run a benchmark, and compare the execution times.
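As a rough illustration of what "run a benchmark and compare the execution time" can look like in practice, here is a minimal timing sketch in Python. The helper name `best_time` and the placeholder `workload` function are hypothetical and are not taken from the benchmark discussed in this post; the sketch only assumes that taking the fastest of several runs is a reasonable way to reduce noise from other processes.

```python
import time


def best_time(func, *args, repeats=5):
    """Return the fastest wall-clock time (in seconds) over several runs of func(*args)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


def workload(n):
    """Hypothetical placeholder loop; substitute the algorithm under test."""
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total


if __name__ == "__main__":
    print(f"best of 5 runs: {best_time(workload, 200_000):.4f} s")
```

The same harness idea carries over to other languages: time the identical algorithm with the identical input size, repeat a few times, and compare the best results.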

An interesting discussion in a Stack Overflow thread compares the execution time of a simple Monte Carlo algorithm that calculates the value of pi. The thread lists implementations of this algorithm in different languages (Java, Python, Groovy, JRuby, Jython, etc.).
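For readers who have not seen this kind of algorithm, the sketch below shows one common Monte Carlo way to estimate pi in plain Python. It is an assumption that the thread uses this variant (random points in the unit square, counting those that fall inside the quarter circle); the function name `estimate_pi` and the `seed` parameter are introduced here for illustration only and are not the code from the thread.

```python
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi by sampling random points in the unit square.

    The fraction of points with x^2 + y^2 <= 1 approximates the area of a
    quarter of the unit circle, which is pi / 4.
    """
    rng = random.Random(seed)  # private generator, so a seed makes runs repeatable
    inside = 0
    for _ in range(n_samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    print(estimate_pi(1_000_000, seed=42))
```

The inner loop consists only of random number generation, floating-point arithmetic, and a comparison, which is why this kind of test mostly measures how fast each language runtime executes a tight numeric loop.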

The bottom line of this benchmark is:

(1) Java gives the most performant implementation (I suspect C/C++ would show similar performance).

(2) The most performant dynamically typed implementation is Groovy. A Groovy script with strictly defined types runs on JDK9 at the same speed as the Java code on JDK9. This is a staggering observation: you can use a dynamically typed language and still get code that executes as fast as a full-blown Java application. Groovy code with loose types, however, is about a factor of 4 slower than the Java code.

(3) Python is about 10 times slower than Java and Groovy (with strictly defined types), and about a factor of 3 slower than Groovy with loose types. However, Python code can be as fast as Groovy code when run with PyPy, which is another cool observation. Standard Python can also be optimized further, but this requires external libraries (numpy). Even in this case, Groovy is far more performant than Python with numpy.

(4) JRuby running on JDK9 and standard Python (CPython) have very similar performance.

(5) Jython and BeanShell are the slowest in code execution. If you want to get the most out of Jython or BeanShell, use these languages to call external Java libraries, which can give you the same execution speed as native Java.

This analysis shows that the most performant easy-to-use scripting language is Groovy. You can use Groovy to develop very fast calculations that use simple types. At the same time, Groovy can be used as a glue language for calling sophisticated Java libraries, thus providing a very rich multiplatform computing environment for data analysis.

The PyPy implementation of the Python language comes second. It is fast, but it still does not support all extension modules of standard Python.

Third place is shared by standard Python (CPython) and JRuby. These scripting languages show very similar performance.

K.Jonasmon (M.S. in computer science)

© 2020 Data Science Central
