If you're starting out in Data Science, whether to focus on R or Python is a good question to ask yourself. After all, you want to be immediately employable and also make efficient use of your own time.
Cheng Hang Lee took on this question in an article of the same name earlier this year and offers a fairly comprehensive discussion of the pros and cons. Some highlights:
R has a long and trusted history and a robust supporting community in the data industry. Together, those facts mean that you can rely on online support from others in the field if you need assistance or have questions about using the language. Plus, there are plenty of publicly released packages, more than 5,000 in fact, that you can download to use in tandem with R to extend its capabilities to new heights. That makes R great for conducting complex exploratory data analysis. R also integrates well with other computer languages like C++, Java, and C.
When you need to do heavy statistical analysis or graphing, R’s your go-to. Common mathematical operations like matrix multiplication work straight out of the box, and the language’s array-oriented syntax makes it easier to translate from math to code, especially for someone with no or minimal programming background.
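For comparison, core Python has no built-in matrix type, so the same array-oriented math is usually done with the NumPy library. The sketch below assumes NumPy is installed; the matrices and vector are arbitrary illustrative values.

```python
import numpy as np

# Two small matrices with arbitrary illustrative values
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Matrix multiplication: the @ operator dispatches to np.matmul for NumPy arrays
C = A @ B
print(C)

# Arithmetic is applied element-wise across the whole array, no explicit loops
scaled = 2 * A + 1
print(scaled)

# Solve the linear system A x = b, mirroring the mathematical notation
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print(x)
```

In R the matrix product is written with the built-in `%*%` operator and the system is solved with `solve(A, b)`, with no extra package needed, which is the out-of-the-box convenience described above.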
Python is a general-purpose programming language that can pretty much do anything you need it to: data munging, data engineering, data wrangling, website scraping, web app building, and more. It’s simpler to master than R if you have previously learned an object-oriented programming language like Java or C++.
In addition, because Python is an object-oriented programming language, it’s easier to write large-scale, maintainable, and robust code with it than with R. Using Python, the prototype code that you write on your own computer can be used as production code if needed.
Although Python doesn’t have as comprehensive a set of packages and libraries available to data professionals as R, the combination of Python with tools like pandas, NumPy, SciPy, scikit-learn, and Seaborn will get you pretty darn close. The language is also slowly becoming more useful for tasks like machine learning and basic to intermediate statistical work (formerly just R’s domain).
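As a rough illustration of the basic statistical work mentioned above, here is a minimal sketch combining NumPy, pandas, and SciPy. It assumes those three packages are installed and uses synthetic data from a seeded random generator; the column names and group labels are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data from a seeded generator: illustrative only, not real measurements
rng = np.random.default_rng(seed=42)
n = 100
x = rng.uniform(0.0, 10.0, size=n)
group = np.repeat(["control", "treatment"], n // 2)
df = pd.DataFrame({
    "x": x,
    # Linear relationship: true intercept 1.0, true slope 2.0, plus Gaussian noise
    "y": 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n),
    # Scores where the "treatment" group is shifted up by 5 points on average
    "score": rng.normal(50.0, 10.0, size=n) + np.where(group == "treatment", 5.0, 0.0),
    "group": group,
})

# Descriptive statistics per group with pandas
summary = df.groupby("group")["score"].agg(["mean", "std", "count"])
print(summary)

# Welch's two-sample t-test on score (equal_var=False drops the equal-variance assumption)
control = df.loc[df["group"] == "control", "score"]
treatment = df.loc[df["group"] == "treatment", "score"]
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"Welch t = {t_stat:.2f}, p = {p_value:.3g}")

# Simple least-squares regression of y on x
fit = stats.linregress(df["x"], df["y"])
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, r^2 = {fit.rvalue ** 2:.3f}")
```

Seaborn (plotting) and scikit-learn (machine learning) slot into the same workflow on the same DataFrame; they are left out here to keep the sketch short.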
In his analysis, Cheng Hang Lee goes on to compare the two languages against several further criteria.
Personally, when I see an article making this type of recommendation, I always look for the 'job market' criterion. Often the answer depends on where you want to live. The suggestion I've always made to new job seekers looking to focus their training is to start with where you want to live, then look at the job boards for those locations to see what employers are asking for. R will take you directly into Data Science. Python can also do Data Science, but as a general-purpose language it gives you some alternative uses for your investment. Unless you live in one of the real hotbeds of Data Science, like the Bay Area, Los Angeles, New York, and maybe Washington DC, you might want to hedge your bet and lean toward Python, or be willing to relocate.
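The job-board check described above can be partly automated once you have collected posting texts for your target location. The sketch below assumes those texts are already in a list of strings; the `postings` shown are hypothetical placeholders, not real listings, and `count_language_mentions` is an illustrative helper name. It counts how many postings mention each language at least once, using word boundaries so that the single letter R is not matched inside other words.

```python
import re
from collections import Counter

# Word boundaries stop "R" matching inside words like "Rust" or "Research".
# "R" is case-sensitive on purpose, and the lookahead skips the common "R&D".
LANGUAGE_PATTERNS = {
    "R": re.compile(r"\bR\b(?!&D)"),
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
}


def count_language_mentions(postings):
    """Count how many postings mention each language at least once."""
    counts = Counter()
    for text in postings:
        for language, pattern in LANGUAGE_PATTERNS.items():
            if pattern.search(text):
                counts[language] += 1
    return counts


# Hypothetical placeholder postings: replace with texts gathered from real job boards
postings = [
    "Data Scientist: strong R and SQL skills required.",
    "Data Engineer: Python, Spark and Airflow experience.",
    "Analyst: Python or R for statistical modeling.",
    "Research & Development (R&D) coordinator, Excel a plus.",
]

counts = count_language_mentions(postings)
for language in LANGUAGE_PATTERNS:
    share = counts[language] / len(postings)
    print(f"{language}: {counts[language]} of {len(postings)} postings ({share:.0%})")
```

Keyword counts are a crude signal: they cannot tell "R required" from "R a plus", so reading a sample of the postings yourself still matters.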
Read Cheng Hang Lee's original article here.