R, Python or SAS: Which one should you learn first?

Python, R and SAS are the three most popular languages in data science. If you are new to data science and have no experience with any of these languages, it is natural to be unsure whether to learn R, SAS or Python first.

[Figure: KDnuggets 2014 poll on programming languages]

Don’t fret: by the time you finish reading this article, you will know without a doubt which language is the right one for you.

Overview

R – R is the lingua franca of statistics. It is a free and open source programming language used to perform advanced data analysis tasks.

Python – Python is a multi-purpose, free and open source programming language which has become very popular in data science due to its active community and data mining libraries.

SAS – SAS has been the undisputed market leader in the enterprise analytics space. It offers a huge array of statistical functions, has a good GUI for people to learn quickly and provides brilliant technical support.

If you are looking to start a career in data science, or to gain the skills to transition into this field in the future, then you are probably researching which of these three programming languages you should learn first to maximize your chances of landing your dream job. Should you focus on mastering R? Or would it be better to make SAS a priority? Or should you learn Python?

Take a look at these 5 factors as a starting point to help you decide:

Industries where the tool is used

Burtch Works, an HR firm, asked over 1,000 quantitative professionals which language they preferred: SAS, R or Python. Here are the survey results:

[Figure: SAS vs. R vs. Python survey results]

SAS is largely preferred by big corporations because of its highly reputed customer service. That is also why SAS has an advantage in financial services and marketing companies, where cost is not the primary concern when selecting a tool.

[Figure: Tools used in the data science industry]

R and Python, on the other hand, are used by startups and mid-sized firms. Tech and telecom companies need to analyze huge volumes of unstructured data, so their data scientists use machine learning techniques, for which R and Python are better suited.

[Figure: Data scientist vs. predictive analytics]

SAS is expensive commercial software and is mostly used by large corporations with huge budgets.

Python and R are free software that can be downloaded by anyone.

You don’t need prior programming knowledge to learn SAS, and its easy-to-use GUI makes it the easiest of the three to learn. Its ability to parse SQL code, combined with macros and other native packages, makes learning SAS child’s play for professionals with basic SQL knowledge.

To analyze data in Python, you will use data mining libraries like pandas, NumPy and SciPy. In other words, you won’t write plain Python when analyzing data. The code you write with these libraries looks somewhat similar to the code you write in R. Hence, it is easier to learn R when you are already familiar with the Python data mining libraries. Conversely, if you already know R, you should learn the basics of the Python programming language before you start on the Python data mining ecosystem.

So, don’t think that R is difficult, and Python is easy to learn!
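To make the difference between plain Python and the data mining libraries concrete, here is a minimal sketch. It assumes pandas is installed, and the department and salary records are made up purely for illustration. It computes the same group-wise average twice: first with hand-written loops, then with a single pandas expression.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical toy data: (department, salary) records invented for this example.
records = [
    ("sales", 50000),
    ("sales", 60000),
    ("engineering", 80000),
    ("engineering", 90000),
    ("engineering", 100000),
]

# Plain Python: accumulate totals and counts by hand, then divide.
totals = defaultdict(float)
counts = defaultdict(int)
for dept, salary in records:
    totals[dept] += salary
    counts[dept] += 1
native_means = {dept: totals[dept] / counts[dept] for dept in totals}

# pandas: the same group-wise mean in one expression, similar in spirit
# to aggregate(salary ~ dept, data = df, FUN = mean) in R.
df = pd.DataFrame(records, columns=["dept", "salary"])
pandas_means = df.groupby("dept")["salary"].mean().to_dict()

print(native_means)
print(pandas_means)
```

Both approaches give the same averages. The pandas version is closer to how you would phrase the task in R, which is the similarity the paragraph above describes.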

Data Science capabilities

SAS is extremely efficient at sequential data access, and database access through SQL is well integrated. The drag-and-drop interface makes it easy to build statistical models quickly. It has decent functional graphical capabilities, but it is difficult to create complex graphical plots in SAS.

R is known for in-memory analytics and is mainly used when the data analysis tasks require a standalone server. R is an excellent tool for exploring data. At the time of writing, CRAN hosts more than 5,000 community-contributed packages. The wide range of packages and modules available for statistics and data analysis makes R the most popular and powerful language in data science. Statistical models can be written in a few lines of code.

You can draw complicated graphs beautifully in R using packages like ggplot2, lattice, rCharts, etc.

Python libraries like pandas, NumPy, SciPy and scikit-learn make it the second most popular programming language in data science after R. You can also create beautiful charts and graphs using libraries like Matplotlib and Seaborn. Python is actively used by the machine learning community to scrape and analyze unstructured data from the web.

The IPython Notebook, a web-based interactive environment, makes it easier to share your code with others.
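The "few lines of code" point is made above about R, but the Python libraries listed in this section allow similar brevity. Here is a minimal sketch, assuming NumPy is installed and using made-up study-hours data, that fits a straight line by ordinary least squares:

```python
import numpy as np

# Hypothetical data: hours studied vs. exam score, invented for illustration.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 54.0, 56.0, 58.0, 60.0])

# Fit scores = slope * hours + intercept by ordinary least squares.
slope, intercept = np.polyfit(hours, scores, deg=1)

# Use the fitted line to predict the score for six hours of study.
predicted = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

For anything beyond a simple line fit you would more likely reach for SciPy or scikit-learn, but the workflow of loading arrays, fitting and predicting stays much the same.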

Community Support

SAS has an active online community moderated by community managers. These communities have evolved from peer to peer forums to become publishing platforms for essential content. You can ask queries related to SAS, and the community will answer them. The official blog of SAS is also an essential resource to refer to when you need help with a particular problem.

R has 125 active user groups worldwide, and the number of user group meetings has increased significantly in the last year. Python has 1,657 user groups, but far fewer of them focus strictly on data than is the case for R.

R and Python have huge online community support through Stack Overflow, mailing lists, user-contributed code and documentation.

SAS doesn’t have an active open source community at all.

Job Scenario

SAS has more than 80,000 customers around the globe, most of them corporations with huge budgets. Analysts in these organizations use SAS to quickly and efficiently execute a wide range of statistical models on data sets. That is why the title “analyst” is often mentioned in SAS job descriptions.

On the other hand, R and Python are used by startups and technology companies. R leans towards statistics and data analysis tasks, which is why R-related jobs carry titles like “Data miner”, “Statistician”, “Data analytics manager”, etc.

Meanwhile, given the boom in big data, you can expect increasing numbers of business analysts and other non-programmers to arm themselves with the R language as well.

Python, by contrast, is used by programmers who want to delve into data analysis or apply statistical techniques, and by developers who turn to data science. Python-related jobs carry titles like “Machine learning engineer”, “Data engineer”, “Big data architect”, etc.

Comment

Comment by Brandon Ruggles on November 4, 2016 at 4:06pm

I think that all have their own place in this world. One huge advantage of SAS is that in the industries where it is used extensively it is often on a grid (distributed computing), so it can deal with very large datasets pretty quickly. It also has lots of great features for data cleaning and data management that R and Python just don't have. SAS as a company offers lots of services that don't exist for the others, like model management... The big problem with SAS is that even if you are familiar with SAS and have access to SAS, it doesn't mean that you have the SAS that you need. There are so many modules and product offerings that if you want to do anything outside of base SAS you need a separate module. Base SAS and SAS Stats are pretty common, but beyond that it is up in the air.

Python and R are great. If I had my choice I would use SAS for data management, Python or R for the statistical analysis, and maybe Tableau for visual analysis (super quick to get good, dynamic visuals up and running). I only have some experience with Python and R, and like both of them quite a bit. I love coding in R or Python in Jupyter Notebook because of the interactiveness. Python and R will always have an edge when it comes to newer algorithms because their open source communities are great.

SQL is a must in many data environments but doesn't take the place of any of these languages.

When I am hiring new employees for analytics I look for depth of analytical knowledge and skills, with strengths in one of these languages and SQL. If you know one you can learn the others pretty easily; SAS is the most different. The hardest part to learn is just the logic for the data cleaning and the theory needed to do analytics appropriately.

Comment by Vincent Granville on November 4, 2016 at 11:44am

Also, many data scientists use multiple languages: I have used R, Perl, SAS, Shell scripts, even C, and SQL, for many years, usually simultaneously depending on the task. So Burtch Works' first chart (with 39% / 42% / 20%) should actually add up to well above 100%, even though it did not include "other" which encompasses all languages except R, Python and SAS. Very interesting article though, thank you for your post!

Comment by Ruth Ogal on November 4, 2016 at 10:49am

I guess SQL is better for manipulating super large data sets, which may cause memory issues in either R or Python. Afterwards, for analysis purposes, I prefer to use R or Python.

Comment by Life Skipper on November 3, 2016 at 3:10pm

Hi all!!:)

I started my data science adventure with R, and with no previous experience beyond "hello world" level knowledge of Java and C.

I took the courses by Johns Hopkins, starting with the R Programming course, in parallel with Statistics from Duke University (both courses were offered on Coursera at that time, and are still going strong, I think).

I found R to be very straightforward, with a very helpful community as well as documentation for about anything you might want to do with data (and more).

I took Python courses after finishing several R courses, and I still found it really tiring.

I don't like the Python syntax at all (with all the dots, and the specificity on white space). :):)

Of course I am still learning Python, and maybe this is why I'm so negative about it.. :):)

My 2 cents

Regards and thanks for the hospitality

GS 2016

Comment by Vikas Matrupally on November 3, 2016 at 10:31am

This article is meant exactly for people like me, who want to transition into the world of analytics.

I've already kick-started R programming by reading a few articles, overviews and curricula. And this article has now instilled confidence in me.

Thanks Aatash for sharing valuable insights :)

Comment by Vincent Granville on November 3, 2016 at 10:23am

What about SQL?
