How to Choose Between Learning Python or R First

If you're starting out in Data Science, this is a good question to ask yourself. After all, you want to be immediately employable and also make efficient use of your own time.

Cheng Hang Lee took on this question earlier this year in an article of the same name, and offers a fairly comprehensive discussion of the pros and cons. Some highlights:

The Case for R

R has a long and trusted history and a robust supporting community in the data industry. Together, those facts mean that you can rely on online support from others in the field if you need assistance or have questions about using the language. Plus, there are plenty of publicly released packages, more than 5,000 in fact, that you can download to use in tandem with R to extend its capabilities to new heights. That makes R great for conducting complex exploratory data analysis. R also integrates well with other computer languages like C++, Java, and C.

When you need to do heavy statistical analysis or graphing, R’s your go-to. Common mathematical operations like matrix multiplication work straight out of the box, and the language’s array-oriented syntax makes it easier to translate from math to code, especially for someone with no or minimal programming background.
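To make the "out of the box" point concrete: R ships with the `%*%` operator for matrix products, so the math maps almost directly to code. For comparison, here is a minimal sketch of the same operation in plain Python with no third-party libraries. The function name `matmul` and the sample matrices are illustrative assumptions, not taken from Lee's article.

```python
# Illustrative sketch: a matrix product in plain Python, without NumPy.
# The name `matmul` and the sample values are assumptions for this example.
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions must match")
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)
print(C)  # [[19, 22], [43, 50]]
```

In practice most Python users would reach for NumPy and write `A @ B`; the hand-written version only shows what a language without built-in matrix support leaves to the programmer.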

The Case for Python

Python is a general-purpose programming language that can pretty much do anything you need it to: data munging, data engineering, data wrangling, website scraping, web app building, and more. It’s simpler to master than R if you have previously learned an object-oriented programming language like Java or C++.

In addition, because Python is an object-oriented programming language, it’s easier to write large-scale, maintainable, and robust code with it than with R. Using Python, the prototype code that you write on your own computer can be used as production code if needed.

Although Python doesn’t have as comprehensive a set of packages and libraries available to data professionals as R, the combination of Python with tools like pandas, NumPy, SciPy, scikit-learn, and Seaborn will get you pretty darn close. The language is also slowly becoming more useful for tasks like machine learning and basic to intermediate statistical work (formerly just R’s domain).
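As a small illustration of "basic statistical work" in Python, the sketch below uses only the standard-library `statistics` module, so it runs without installing any of the packages named above. The sample values are made up for the example.

```python
# Illustrative sketch: basic descriptive statistics with the standard library.
# The sample data is invented for this example.
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(sample)      # arithmetic mean
median = statistics.median(sample)  # middle value of the sorted data
pstdev = statistics.pstdev(sample)  # population standard deviation

print(mean, median, pstdev)  # 5.0 4.5 2.0
```

For anything beyond summaries like these (regression, hypothesis tests, data frames), the third-party libraries listed above are the usual next step.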

In his analysis, Cheng Hang Lee goes on to discuss criteria including:

Personal Preference

Project Selection

Collaboration

Job Market

Personally, when I see an article making this type of recommendation, I always look for the 'job market' criterion. Often the answer depends on where you want to live. The suggestion I've always made to new job seekers looking to focus their training is to start with where you want to live, then look at the job boards for those locations to see what employers are looking for. R will take you directly into Data Science. Python will handle Data Science too but, as a general-purpose language, gives you some alternative uses for your investment. Unless you're living in one of the real hotbeds of Data Science like the Bay Area, Los Angeles, New York, and maybe Washington DC, you might want to hedge your bet and have a bias toward Python, or be willing to relocate.

Read Cheng Hang Lee's original article here.


Comment by Daniel Johnson on December 2, 2015 at 9:14am

I agree with William. It's about the type of jobs you want. A data scientist won't really need the computer science foundation you can learn through Python.

Comment by Phillip Burger on September 19, 2015 at 10:25am

Good follow-up post.

I've thought more about this question lately. All else being equal, if someone is just starting to learn to program, I'd recommend Python. The gist is to learn the foundations of computing with Python that can be applied to learning all other languages for the rest of one's career.

Python is the more computer science-y language of the two. The idioms and programming principles learned in Python translate well to learning other languages, including R. R is anachronistic, and the way you code in R does not transfer well when you need to learn other languages.
