This is not the silly question of which language is best: of course, that depends on the kind of applications you work on, your client or company, historical reasons, and your expertise. Most of us use a combination of languages anyway. That said, many statistical and machine learning algorithms are now being implemented in Python (see the Python and R articles), and Python seems better suited for production code and big data flowing in real time, while R is often used for EDA (exploratory data analysis) in manual mode.

Source for picture: R versus Python infographics

My question is: if you make a true apples-to-apples comparison, what kinds of computations does Python perform much faster than R (or the other way around), depending on data size and memory size? What about I/O? Is Python better suited for Hadoop? Are there any benchmarks? Here I have in mind algorithms such as classifying millions of keywords: something requiring trillions of operations, not easy to do with Hadoop, and requiring very efficient algorithms designed for sparse data (sometimes called sparse computing).
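As a rough illustration of what "sparse computing" can mean for keyword classification, here is a minimal Python sketch (a swapped-in illustration, not a method from the source): each keyword and each category is a sparse token-to-weight vector, and an inverted index lets the classifier touch only the categories that share at least one token with the keyword. The category profiles and all function names are hypothetical, and positive weights are assumed.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical category profiles: each is a sparse vector (token -> weight).
# Only nonzero entries are stored, which is the point of sparse computing.
CATEGORIES = {
    "sports": {"ball": 1.0, "team": 1.0, "score": 1.0},
    "finance": {"stock": 1.0, "market": 1.0, "score": 0.5},
}


def build_inverted_index(profiles):
    """Map each token to the (category, weight) pairs that contain it."""
    index = defaultdict(list)
    for category, vec in profiles.items():
        for token, weight in vec.items():
            index[token].append((category, weight))
    return dict(index)


def norms(profiles):
    """Euclidean norm of each sparse category vector, computed once up front."""
    return {c: sqrt(sum(w * w for w in vec.values())) for c, vec in profiles.items()}


def classify(keyword_vec, index, cat_norms):
    """Return the category with the highest cosine similarity to keyword_vec,
    or None if the keyword shares no token with any category.

    Work is proportional to the number of (token, category) overlaps,
    not to vocabulary size times number of categories.
    """
    dots = defaultdict(float)
    for token, weight in keyword_vec.items():
        for category, cat_weight in index.get(token, ()):
            dots[category] += weight * cat_weight
    if not dots:
        return None
    q_norm = sqrt(sum(w * w for w in keyword_vec.values()))
    return max(dots, key=lambda c: dots[c] / (q_norm * cat_norms[c]))


index = build_inverted_index(CATEGORIES)
cat_norms = norms(CATEGORIES)
print(classify({"credit": 1.0, "score": 1.0, "market": 1.0}, index, cat_norms))  # finance
print(classify({"team": 1.0, "ball": 1.0}, index, cat_norms))  # sports
```

The design choice is that cost scales with the number of nonzero overlaps rather than with a dense keyword-by-category matrix, which is what makes millions of keywords tractable in either language.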

Anything that can easily be implemented with MapReduce is not a challenge: in that case, most of the time is spent on data transfers, so your Internet bandwidth matters more than the computational power of R or Python for scalability and efficiency (see communication versus computational costs, page 193).
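The communication-versus-computation argument can be made concrete with a back-of-the-envelope model. This is a minimal sketch under loudly simplifying assumptions: transfer and computation do not overlap, throughput is constant, and the bandwidth and operation-rate figures are made up for illustration. `cost_breakdown` is a hypothetical helper name.

```python
def cost_breakdown(data_bytes, bandwidth_bytes_per_s, n_ops, ops_per_s):
    """Back-of-the-envelope split of a job's wall time into transfer and compute.

    Assumes transfer and computation do not overlap and that throughput is
    constant; both are simplifications.
    """
    t_comm = data_bytes / bandwidth_bytes_per_s
    t_comp = n_ops / ops_per_s
    dominant = "communication" if t_comm > t_comp else "computation"
    return dominant, t_comm, t_comp


# Assumed figures, for illustration only:
# 1 TB moved over a 1 Gbit/s link (125e6 bytes/s), 1e12 simple ops at 1e9 ops/s.
print(cost_breakdown(1e12, 125e6, 1e12, 1e9))  # ('communication', 8000.0, 1000.0)

# Same data, but an algorithm needing 1e14 ops: now compute speed (and language) matters.
print(cost_breakdown(1e12, 125e6, 1e14, 1e9))  # ('computation', 8000.0, 100000.0)
```

Under this model, a language that is 10 times slower multiplies only `t_comp`: in the first case the total would go from 9,000 s to 18,000 s, while in the second case the slowdown would dominate everything.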

Also, by apples-to-apples I mean a fair comparison. For instance, the article on this topic (see data science book, pp. 118-122) shows a Perl script running 10 times faster than its R equivalent to produce R videos. But this is not because of a language or compiler issue: the Perl version pre-computes all video frames very quickly and loads them into memory, and then the video is displayed (using R, ironically), while the R version produces and displays one frame at a time and does the whole job in R.
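One way to keep a benchmark apples-to-apples is to confirm that the implementations being compared produce identical output before timing them. Below is a minimal sketch of that idea using Python's standard `timeit` module with best-of-N timing. `render_frame`, the two frame strategies and `fair_benchmark` are hypothetical names for illustration; they only loosely mirror the precompute versus one-frame-at-a-time contrast described above, and nothing is actually displayed.

```python
import timeit


def render_frame(t, n_points=200):
    """Hypothetical stand-in for computing one video frame (n_points values)."""
    return [((i * 7 + t * 13) % 101) / 100.0 for i in range(n_points)]


def frames_one_at_a_time(n_frames):
    """Compute each frame just before 'showing' it (appending stands in for display)."""
    shown = []
    for t in range(n_frames):
        shown.append(render_frame(t))
    return shown


def frames_precomputed(n_frames):
    """Compute all frames up front, keep them in memory, then 'show' them."""
    cache = [render_frame(t) for t in range(n_frames)]
    shown = []
    for frame in cache:
        shown.append(frame)
    return shown


def fair_benchmark(impls, arg, repeat=5, number=3):
    """Time several implementations of the *same* task.

    Refuses to report timings unless every implementation returns identical
    output, and reports best-of-`repeat` seconds per call to discard noise.
    """
    outputs = [f(arg) for f in impls.values()]
    if any(out != outputs[0] for out in outputs[1:]):
        raise ValueError("implementations disagree: not an apples-to-apples comparison")
    return {
        name: min(timeit.repeat(lambda f=f: f(arg), repeat=repeat, number=number)) / number
        for name, f in impls.items()
    }


timings = fair_benchmark(
    {"one_at_a_time": frames_one_at_a_time, "precomputed": frames_precomputed}, 100
)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per run")
```

In this toy, both strategies do the same arithmetic, so any gap between them would be expected to be small; a large gap in a real cross-language comparison is a hint that the two versions are not timing the same work.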

What about accelerating tools, such as the CUDA accelerator for R? Are there such tools for Python?

Thanks for your comments.


© 2021 TechTarget, Inc.