R vs Python? No! R and Python (and something else)

WOLFRAM MATHEMATICA

Before assessing R and Python, I will start with Wolfram Mathematica. It is powerful software, similar to MATLAB. You can handle lists and matrices easily, and you have all the best mathematical functions, the backing of Wolfram Alpha, and extremely sophisticated graphics. These let you, for instance, build and visualize an animated gradient descent, animate different weights for a given neural network, choose a specific Machine Learning algorithm and automatically classify your dataset, plot stunning 3D visualizations, and manipulate variable values dynamically while you watch the output of your calculation change. The installation takes 4.65 GB and comes with all libraries integrated. It's a great program when you know the formulae behind Machine Learning algorithms, so you can build them from scratch in a completely customized way. You can also do face recognition, geolocate objects on 3D plots of a map surface, handle cellular automata like no other software, and develop fully customized social network models with artificial intelligence. You can even develop a self-driving car project; see the work on the YouTube video. Below you can see a Support Vector Machine in 3D.

R

However, the market demands a skill set in which R and/or Python are essential. R is a free program that is extremely easy to learn, with lots of libraries (loaded on demand, which keeps the software fast) covering practically all Machine Learning algorithms, including Neural Networks and Deep Learning. You can also build models from scratch (like face recognition), faster than with software like MATLAB or Mathematica. Almost everything is automatic. The Machine Learning packages offer hyperparameter tuning, which makes it extremely easy to build a model. But as you load more libraries the software becomes slower, and it is also slowed down by for/while loops. Statistics is an extremely strong feature, better than in any other software. You can draw maps, do geolocation easily, and make animations. Besides, Stack Overflow, CRAN and R-bloggers also help a lot. You can also handle missing values and outliers very easily, rather than just replacing them by the mean. R may be connected to Microsoft Azure Machine Learning or RapidMiner.
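To make the remark about missing values and outliers concrete, here is a minimal sketch of one common alternative to mean replacement: fill gaps with the median and flag outliers with the interquartile range rule. It is written in Python with only the standard library, not in R. The function names, the 1.5 multiplier and the toy numbers are assumptions chosen for illustration, not something taken from the post.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Toy data, invented for illustration only.
raw = [10.0, 12.0, None, 11.0, 13.0, 95.0, None, 12.0]

filled = impute_median(raw)      # gaps take the median, not the mean
outliers = iqr_outliers(filled)  # values far outside the middle 50%

print(filled)
print(outliers)
```

On this toy series the single extreme value would pull the mean far above the typical observation, while the median stays close to it, which is the reason for preferring it here.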

In my view, R is perfect when you have a fairly large dataset (you can also run calculations in the cloud), and for any business that prefers quality of analysis over quantity. If you want details, R is the right choice. You can even detect faces and objects with R.

But then comes Python, where you can customize everything.

PYTHON

Python is excellent when you want to create APIs, handle large datasets 2x (or more) faster than R and Mathematica, or build models from scratch, provided you know the Machine Learning algorithms and their formulae. It's easy to learn by yourself and has good online support (Stack Overflow, GitHub, etc.). Neural network modeling is a piece of cake once you learn to get the best from online sources. Python scales easily and is lean software, but it offers less detailed output for statistical analysis. You can do full face recognition using OpenCV and Convolutional Neural Networks, pattern recognition, and geolocation in Jupyter, and you can use all the Machine Learning libraries. Graphics in Python are not advanced, but in my view it's the perfect software for handling big data and automating Data Science tasks. Python can run with TensorFlow and Microsoft Azure Machine Learning, and can use a cloud service like Amazon. Definitely a software to fall in love with.
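As an illustration of what "building a model from scratch from its formulae" can look like, here is a minimal sketch of logistic regression with one feature, trained by plain gradient descent on the log-loss. It uses only the standard library. The learning rate, the number of epochs and the toy data are assumptions picked for the example, not values from the post.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Toy, linearly separable data invented for illustration only.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)
```

In practice a library such as scikit-learn would normally be used instead. The sketch only shows that the update rule follows directly from the formula.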

Comment

Comment by Enzo on October 21, 2016 at 1:49pm

When I read about people complaining that R is slow, I wonder how well they know R and the tricks (and packages) needed to work really fast. For example, it is possible to speed up R 3x or 4x using multi-core linear algebra libraries like OpenBLAS. Usually it is as easy as symlinking the OpenBLAS libraries to get the speed improvement for all packages and linear algebra operations. The same type of improvement is possible using OpenBLAS with NumPy in Python, but it requires compilation and is available only when using NumPy directly or indirectly. Another great package is data.table.

Comment by Dalila Benachenhou on October 11, 2016 at 12:11am

Rubens, I was just surprised by the finding that Python or R would be faster than compiled Mathematica. In CS, it is a fact that code in C runs faster than code in Java, and code in Java runs faster than code in R or Python. Code speed depends on its construction (its time complexity, O()), the language it is written in, and the platform it runs on, so it makes sense to assume that Python code will be slower than C code. Hence, if code is written in Mathematica, Mathematica is written in C, and the Mathematica code is compiled, logic implies that the compiled Mathematica code will be faster than Python. Of course, if you don't compile, then the assumption that the Mathematica code will be faster than Python code doesn't apply.
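The point that speed depends on a program's construction (its time complexity) as well as on the language can be sketched by counting steps instead of measuring wall-clock time. The example below is an illustration in plain Python, with function names and input size chosen for the example. It compares an O(n) linear scan with an O(log n) binary search on the same sorted list.

```python
def linear_search_steps(sorted_items, target):
    """Count the elements inspected by a left-to-right scan: O(n)."""
    steps = 0
    for item in sorted_items:
        steps += 1
        if item == target:
            break
    return steps

def binary_search_steps(sorted_items, target):
    """Count the probes made by binary search on sorted data: O(log n)."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

data = list(range(1_000_000))
target = 999_999  # worst case for the linear scan

linear_steps = linear_search_steps(data, target)
binary_steps = binary_search_steps(data, target)
print(linear_steps, binary_steps)
```

The gap in step counts is several orders of magnitude, which is why a well-constructed program in a slower language can beat a poorly constructed one in a faster language.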

In my experience, at the end of the day it is the customer who decides which language you need to use. In finance, most of my code is in MATLAB. As an Adjunct Prof. I have to teach SAS. Many large organizations and the government rely on SAS, and they require our students to know SAS.

By the way, I do agree with you on R and machine learning.  It has over 100 predictive models, many I've never heard of.  I also like the packages for Linear Regression.  They have extensive tools to diagnose your model.

Here is a link to a blog where I build predictive models to predict Geico Callers.  I used mostly R except for the RandomForest where I used Mathematica.

http://femvestor.blogspot.com/2016/02/performance-from-various-pred...

Comment by Rubens Zimbres on October 6, 2016 at 12:58pm

Dalila, I agree with you: graphs in Mathematica have much higher resolution, far better than R or Python. Regarding speed, I solved the MNIST task with Python in half the time it took with Mathematica. Likewise, drawing a social network with 2,000 nodes took Python one tenth of the time it took Mathematica. R is the laggard on speed, but, like Python, it has much simpler ways to implement Machine Learning algorithms.

Mathematica's ML package comes with Logistic Regression, Naive Bayes, Support Vector Machines, KNN and Random Forest. Only. We know there are several other ML algorithms. OK, you can develop them from scratch in Mathematica, but that's not necessary if you have Python or R. I didn't explore Python's features for graph analysis, but I honestly believe it can do the same job as Mathematica.

Mathematica has a lot of tricks to improve performance, such as setting up the JVM, writing efficient code, avoiding N[list] code, parallel kernels, the Hadoop link, the CUDA link, etc. I notice a lack of community guidance around Mathematica. The code is kept complicated, which makes the software hard to adopt. The Demonstrations area is full of examples, but it's really hard for beginners. You can find R and Python resources absolutely anywhere, with different models and packages, which makes it easier to learn, develop and adapt models and hyperparameters. I wish we had the option to load Mathematica libraries on demand, according to the code selected. This would make it much faster. But lately I've been coding in R and Python, with excellent results.

Comment by Dalila Benachenhou on October 6, 2016 at 12:27pm

I understand that many data scientists deal with traditional datasets, and R has enough packages to meet the needs of these data scientists.  

I use both R and Mathematica. I like that R has a large number of implementations of predictive models, more than any other language, and that it is easy to build and test your models. Therefore, I use R to develop predictive models. However, when you go beyond traditional datasets, I have found it very slow and not as reliable as Mathematica. I have used both for text mining, and R is very far behind Mathematica: its text mining packages do not match Mathematica's maturity or even accuracy. I also created community graphs in Mathematica in a fraction of the time taken by packages in R, with higher resolution and accuracy.

By the way, Python is written over JVM, as is R.  Both are slower than Mathematica, especially if Mathematica is compiled.  Mathematica is written in C (faster than Java.)  In addition, Mathematica has more distributions  implemented than R.  

One more thing that few are aware of: you can implement TensorFlow in Mathematica.

Here is a link comparing both R and Mathematica http://www.stats.uwo.ca/faculty/aim/epubs/MatrixInverseTiming/defau...

One more point, I have been using Mathematica for Graph Network building and analysis.  It is much much faster than R.

I'm just a data scientist, I don't work for Mathematica, or represent it.

Comment by Derek Rucker on October 6, 2016 at 11:01am

Excellent summary of the strengths and weaknesses of three of the most widely used analytics platforms. I personally am most comfortable in R, and have dabbled in Python. Your assessment of Mathematica is of great help, as I haven't been able to say much one way or the other about it before. Thanks!

Comment by Olga on October 5, 2016 at 4:23pm

It's true that many data science positions require both R AND Python. I've done a quick study to see which languages are most popular on the job market. I've gathered 3321 job descriptions of positions that require working with data in Canada and the US, over the past year. I looked on LinkedIn.com, Workopolis.com and Indeed.com. Here is what I found: R vs Python
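For readers who want to try a similar count themselves, here is a minimal sketch of how language mentions could be tallied across job descriptions. It is not the method used for the study above, which is not described. The patterns and the three postings are invented for illustration. Matching the single letter "R" is the fiddly part, so the pattern below is an assumption that deliberately skips cases such as "R&D".

```python
import re
from collections import Counter

# Hypothetical patterns: "R" must stand alone and must not touch "&" (as in "R&D").
PATTERNS = {
    "R": re.compile(r"(?<![\w&])R(?![\w&])"),
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
}

def count_mentions(descriptions):
    """Count how many descriptions mention each language at least once."""
    counts = Counter()
    for text in descriptions:
        for language, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[language] += 1
    return counts

# Toy postings, invented for illustration only.
postings = [
    "Data analyst: SQL and R required, Python a plus.",
    "Machine learning engineer with strong Python skills.",
    "R&D coordinator, no programming needed.",
]

counts = count_mentions(postings)
print(counts)
```

A real study would need more care, for example with "R" appearing in abbreviations or initials, so counts from a pattern like this should be spot-checked by hand.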

Comment by Jason Williams on October 5, 2016 at 6:47am

I have to agree with what you have posted up here. I personally think that every data scientist should have experience in both (I don't think I could ever give up the ease / quality of shiny or markdown in R OR the ease of fit, transform, score, repeat of Python ML). I teach a basic analytics class for C-level executives who want to get visuals of these analysis algorithms and the programming behind them. Learners with no prior programming experience really love how automatic/intuitive R is, and it makes more sense to them (at least in my experience). Strong analysis skills are so important in our time, and if a program easily allows someone to learn and perform their analysis with little programming experience - it's an incredible tool!


© 2019 Data Science Central ®