
Best dynamically-typed programming languages for data analysis

One can argue at length about which programming language is best for data analysis, but one objective metric can guide the choice: speed of calculation. In this article, “best” therefore means the languages that lead to the most performant applications. If the most performant program can also be written in an easy-to-use, easy-to-learn, dynamically typed scripting language, that language is a strong candidate for the best choice. Luckily, there is an objective way to check this: implement the same algorithm in different dynamically typed languages, run a benchmark, and compare the execution times.
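As a rough illustration of how such a timing comparison can be set up, here is a minimal Python sketch. The names `best_time` and `workload` are hypothetical and chosen only for this example, and the workload is a placeholder loop rather than any algorithm from the benchmark discussed here. Taking the best of several runs is one common way to reduce noise from the operating system; it is an assumption of this sketch, not a description of how the published numbers were measured.

```python
import time


def best_time(func, repeats=5):
    """Return the shortest wall-clock time in seconds over several runs of func."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def workload():
    # Placeholder workload (hypothetical): sum of squares in a plain loop.
    total = 0
    for i in range(100_000):
        total += i * i
    return total


if __name__ == "__main__":
    print(f"best of 5 runs: {best_time(workload):.4f} s")
```

The same harness idea carries over to the other languages: run the identical algorithm several times in each and compare the best timings.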

There is an interesting discussion of the execution time of a simple Monte Carlo algorithm that calculates the value of PI, posted in a Stack Overflow thread. That thread lists implementations of this algorithm in different languages (Java, Python, Groovy, JRuby, Jython, etc.).
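For readers who have not seen this kind of algorithm, the following is a minimal Python sketch of a Monte Carlo estimate of PI. It is not the code from the Stack Overflow thread. It assumes the usual textbook approach: draw random points in the unit square and count the fraction that falls inside the quarter circle of radius 1, which approaches PI/4. The function name `estimate_pi` and the `seed` parameter are hypothetical and exist only for this example.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate PI from the fraction of random points inside the unit quarter circle."""
    rng = random.Random(seed)  # private generator, so a seed gives repeatable runs
    inside = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        # The point lies inside the quarter circle when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of the quarter circle is PI/4, so scale the fraction by 4.
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(estimate_pi(1_000_000, seed=42))
```

The inner loop is dominated by simple arithmetic on numbers, which is why this kind of test mostly measures the raw speed of the language runtime.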

The bottom line of this benchmark is:

(1) Java gives the most performant implementation (I suspect C/C++ would show similar performance).

(2) The most performant dynamically typed implementation is Groovy. A Groovy script with strictly defined types runs on JDK9 at the same speed as Java on JDK9 itself. This is a staggering observation: you can use a dynamically typed language and still get code that executes as fast as a full-blown Java application. Groovy code with loose types, however, is about a factor of 4 slower than the Java code.

(3) Python is about 10 times slower than Java and Groovy (with strictly defined types), and about a factor of 3 slower than Groovy with loose types. However, Python code can be as fast as Groovy code when run with PyPy, which is another cool observation. Standard Python can also be optimized further, but this requires external libraries (numpy), and even then Groovy is far more performant than Python with numpy.

(4) JRuby running on JDK9 and standard Python (CPython) have very similar performance.

(5) Jython and BeanShell are the slowest in code execution. To get the most out of Jython or BeanShell, use these languages to call external Java libraries, which can give you the same execution speed as native Java.

This analysis shows that the most performant easy-to-use scripting language is Groovy. You can use Groovy to develop very fast calculations that use simple types. At the same time, Groovy can be used as a glue language for calling sophisticated Java libraries, thus providing a very rich multiplatform computing environment for data analysis.

The PyPy implementation of Python comes second. It is fast, but it still does not support all extension modules of standard Python.

Third place is shared by Python and JRuby. These scripting languages show very similar performance.

K.Jonasmon (M.S. in computer science)
