
R, Python or SAS: Which one should you learn first?

Python, R and SAS are the three most popular languages in data science. If you are new to the world of data science and aren’t experienced in any of these languages, it is natural to be unsure whether to learn R, SAS or Python first.

[Figure: KDnuggets 2014 poll on programming languages]

Don’t fret. By the time you’re done reading this article, you should have a clear idea of which language is the right one for you.

Overview

R – R is the lingua franca of statistics. It is a free and open source programming language used to perform advanced data analysis tasks.

Python – Python is a free, open source, multi-purpose programming language that has become very popular in data science thanks to its active community and data mining libraries.

SAS – SAS has been the undisputed market leader in the enterprise analytics space. It offers a huge array of statistical functions, has a good GUI that helps people learn quickly, and provides brilliant technical support.

If you are looking to start a career in data science, or to gain the skills to transition to this field in the future, then you are probably researching which of these three programming languages you should learn first to maximize your chances of landing your dream job. Should you focus on mastering R? Or would it be better to make SAS a priority? Or should you learn Python?

Take a look at these 5 factors as a starting point to help you decide:

Industries where the tool is used

Burtch Works, an HR firm, asked over 1,000 quantitative professionals which language they preferred: SAS, R or Python. Here are the survey results:

[Figure: SAS vs. R vs. Python survey results]

SAS is largely preferred by big corporations because of its highly reputed customer service. This is also why SAS has an advantage in the financial services sector and among marketing companies, where cost is not the primary concern when selecting a tool.

[Figure: Tools used in the data science industry]

R and Python, on the other hand, are used by startups and mid-sized firms. Tech and telecom companies need to analyze huge volumes of unstructured data, so their data scientists use machine learning techniques, for which R and Python are more suitable.

[Figure: Data scientist vs. predictive analytics]

SAS is expensive commercial software and is mostly used by large corporations with huge budgets.

Python and R are free software that can be downloaded by anyone.

Ease of learning

You don’t need prior programming knowledge to learn SAS, and its easy-to-use GUI makes it the easiest of the three to learn. Its ability to parse SQL code, combined with macros and other native packages, makes learning SAS child’s play for professionals with basic SQL knowledge.

To analyze data in Python, you will use data mining libraries like pandas, NumPy, and SciPy. In other words, you won’t write plain, native Python when analyzing data. The code you write with these libraries looks somewhat similar to the code you write in R. Hence, it is easier to learn R when you are already familiar with the Python data mining libraries. If you already know R, then you should learn the basics of the Python programming language before you start to learn the Python data mining ecosystem.

So, don’t assume that R is difficult and Python is easy to learn!
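To make the point about libraries concrete, here is a minimal sketch that computes the same per-group average twice: once in plain Python and once with pandas. The sample records and the function names are invented for illustration and are not from any real dataset or from the survey above. The pandas version works on a data frame, which is the part that tends to feel familiar to R users.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
records = [
    {"team": "a", "score": 10},
    {"team": "a", "score": 20},
    {"team": "b", "score": 30},
    {"team": "b", "score": 50},
]


def mean_by_team_plain(rows):
    """Average score per team using only built-in Python."""
    totals, counts = {}, {}
    for row in rows:
        totals[row["team"]] = totals.get(row["team"], 0) + row["score"]
        counts[row["team"]] = counts.get(row["team"], 0) + 1
    return {team: totals[team] / counts[team] for team in totals}


def mean_by_team_pandas(rows):
    """The same summary expressed as a pandas data-frame operation."""
    df = pd.DataFrame(rows)
    return df.groupby("team")["score"].mean().to_dict()


plain = mean_by_team_plain(records)
with_pandas = mean_by_team_pandas(records)
print(plain)
print(with_pandas)
```

Both functions return the same averages. The difference is that the pandas version describes the operation (group, select a column, take the mean) rather than spelling out the loop.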

Data Science capabilities

SAS is extremely efficient at sequential data access, and database access through SQL is well integrated. The drag-and-drop interface makes it easy to build statistical models quickly. Its graphical capabilities are decent and functional, but it’s difficult to create complex plots in SAS.

R is known for in-memory analytics and is mainly used when the data analysis tasks require a standalone server. R is an excellent tool for exploring data. Currently, CRAN hosts more than 5,000 community-contributed packages. This wide range of packages and modules for statistics and data analysis makes R the most popular and powerful language in data science. Statistical models can be written in a few lines of code.

You can draw complicated graphs beautifully in R using packages like ggplot2, lattice, rCharts, etc.

Python libraries like pandas, NumPy, SciPy and scikit-learn make it the second most popular programming language in data science after R. You can also create beautiful charts and graphs using libraries like Matplotlib and Seaborn. Python is actively used by the machine learning community to scrape and analyze unstructured data from the web.
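As a rough illustration of how compact a simple model can be with these libraries, the sketch below fits a straight line by least squares using NumPy's `polyfit`. The data (hours studied against exam score) is made up for the example, and a degree-1 polynomial fit stands in for the more elaborate models that scikit-learn provides.

```python
import numpy as np

# Hypothetical data: hours studied vs. exam score, invented for illustration.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 54.0, 56.0, 58.0, 60.0])  # exactly 50 + 2 * hours

# Fit a straight line (degree-1 polynomial) by least squares.
# polyfit returns coefficients from the highest degree down: slope, then intercept.
slope, intercept = np.polyfit(hours, scores, 1)

# Use the fitted line to predict the score for 6 hours of study.
predicted = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

Because the made-up scores lie exactly on a line, the fit recovers a slope of about 2 and an intercept of about 50. Real data would be noisier, and the fitted line would only approximate it.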

The IPython Notebook, a web-based interactive environment, makes it easier to share your code with others.

Community Support

SAS has an active online community moderated by community managers. These communities have evolved from peer-to-peer forums into publishing platforms for essential content. You can ask questions related to SAS, and the community will answer them. The official SAS blog is also an essential resource when you need help with a particular problem.

R has 125 active user groups worldwide, and the number of user group meetings has increased significantly in the last year. Python has 1,657 user groups, but far fewer of its communities focus strictly on data than is the case for R.

R and Python have huge online community support through Stack Overflow, mailing lists, user-contributed code and documentation.

SAS doesn’t have an active open source community at all.

Job Scenario

SAS has more than 80,000 customers around the globe, and most of them are corporations with huge budgets. Analysts in these organizations use SAS to quickly and efficiently execute a wide range of statistical models on data sets. That is why the title “analyst” is often mentioned in SAS job descriptions.

On the other hand, R and Python are used by startups and technology companies. R leans towards tasks related to statistics and data analysis, which is why R-related jobs mention titles like “Data miner”, “Statistician”, “Data analytics manager”, etc.

Meanwhile, given the boom in big data, you can expect increasing numbers of business analysts and other non-programmers to arm themselves with the R language as well.

Python, by contrast, is used by programmers who want to delve into data analysis or apply statistical techniques, and by developers who turn to data science. Python-related jobs mention titles like “Machine learning engineer”, “Data engineer”, “Big data architect”, etc.
