Data analysis using Python on the Java platform

According to the TIOBE Index for January 2016, Java's popularity rating has reached 21%, leaving C++ (6%) far behind, while Python's rating is only 3.8%. These numbers may of course be different for data analyst positions, where Python is likely to be more popular than Java.

But how about merging Python with Java? This is exactly what the DMelt data analysis environment is all about. You write a data mining program in Python, but the program calls numeric libraries implemented in Java. This way your program is fully multi-platform: the same program can be executed on multiple operating systems (Windows, Linux, Mac) without recompiling.
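To make this concrete, here is a minimal sketch of what such a script might look like. It assumes the script is run by Jython, the Python implementation for the Java Virtual Machine that DMelt uses for Python scripting, where a Java class such as java.lang.Math can be imported like a Python module. The names BACKEND, distance and distances are illustrative only, not part of DMelt. Under standard CPython there is no java package, so the import fails and the script falls back to Python's own math module; the same file therefore runs in both interpreters.

```python
# Minimal sketch: one script that uses a Java numeric routine under Jython
# and falls back to the Python standard library under CPython.
try:
    # Under Jython, java.lang.Math is imported directly from the JVM.
    from java.lang import Math as _jmath
    BACKEND = "java"

    def distance(x, y):
        # Delegate the numeric work to the Java implementation.
        return _jmath.hypot(x, y)
except ImportError:
    # Under CPython there is no 'java' package, so use the stdlib instead.
    import math
    BACKEND = "python"

    def distance(x, y):
        return math.hypot(x, y)


def distances(points):
    """Return the distance of each (x, y) point from the origin."""
    return [distance(float(x), float(y)) for x, y in points]


if __name__ == "__main__":
    print(BACKEND)
    print(distances([(3, 4), (5, 12)]))
```

The data handling stays in Python while the arithmetic is delegated to Java whenever a JVM is available.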

You may ask: who cares? Python is already multi-platform and available on all major operating systems. Not quite. To speed up calculations, Python relies on platform-dependent C/C++ libraries, which have to be compiled separately for each platform. Remember: if your code is 100% Python, it will be rather slow for CPU-intensive tasks or for processing large data volumes.

In the case of DMelt, you simply package your code as Java jar libraries and call these libraries using Python (or Java) as the interface language. This way your program is 100% compatible with any operating system where Java is installed.
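The jar workflow might look roughly like the sketch below. Everything named here is hypothetical: mystats.jar, the Java class org.example.Stats and its static mean(double[]) method stand in for your own library. The sketch relies on two Jython features, assuming a Jython interpreter: jar files can be placed on sys.path, and jarray.array converts a Python list into a Java double[] (typecode "d"). Depending on the Jython version, classes in a jar added at run time may not be found by package scanning, so treat this as a starting point rather than a recipe. Under CPython the imports fail and a pure-Python mean is used instead.

```python
import sys

# Hypothetical jar built from your own Java sources, for example with:
#   javac -d classes Stats.java
#   jar cf mystats.jar -C classes .
JAR_PATH = "mystats.jar"

# Under Jython, a jar on sys.path is searched for Java classes.
# Under CPython, this entry simply matches nothing.
if JAR_PATH not in sys.path:
    sys.path.append(JAR_PATH)

try:
    # Hypothetical class org.example.Stats with a method
    # 'public static double mean(double[] values)'.
    from org.example import Stats
    from jarray import array  # Jython-only helper for Java arrays

    def mean(values):
        # Convert the Python list to a Java double[] and call into Java.
        return Stats.mean(array([float(v) for v in values], "d"))
except ImportError:
    # Pure-Python fallback when not running under Jython with the jar.
    def mean(values):
        values = [float(v) for v in values]
        return sum(values) / len(values)


if __name__ == "__main__":
    print(mean([1, 2, 3, 4]))
```

Keeping the Java side behind a small Python function means the rest of the script does not care which implementation answered.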

But how many Java libraries are available for data mining? Look at the Java API for DMelt: it claims 30,000 available Java classes. Most of them are not related to data mining, but even if only a small fraction of them is designed for data mining, that is still A LOT.


© 2021 TechTarget, Inc.