A recently released poll showed Python increasing its lead over R as the language of choice for analytics professionals. Setting aside the question of whether a sample drawn from online polling is representative of the population of analytics practitioners, the findings have nonetheless sparked spirited discussion about the future of software for the trade.

My own unscientific sample of opinion shows Python slightly ahead of R, with users of each quite passionate about their favorite. My take is that, given the mature ecosystems of both, Python and R will continue to develop, grow, and compete for the foreseeable future.

What I find particularly heartening are the significant developments in interoperability between the two platforms: the ability to invoke R within Python programs and, conversely, Python within R. Indeed, I've written about both Python within R and R within Python for Data Science Central in recent months. Kudos to Python commercial vendor Anaconda and R commercial vendor RStudio for actively promoting these "polyglot" features.

Now complicate this analytics software divide even further by introducing Julia, a language designed from the ground up for performant analytics. With MIT bona fides, Julia has progressed significantly since work on it began in 2009. As one report put it: "'Julia has been revolutionizing scientific and technical computing since 2009,' says Edelman, the year the creators started working on a new language that combined the best features of Ruby, MatLab, C, Python, R, and others." I'm now on my third go-round with Julia and am finally beginning to feel it's legit. The essential DataFrames package is the real deal.

A new competitor such as Julia starts out considerably behind and remains so until it can both attract a noticeable programmer base and establish an open source ecosystem. Julia is approaching that point now, helped in no small part by star recognition and a polyglot commitment that allows it to coexist in Python/R worlds. I just love the prospect of using R's uber-productive ggplot in Python and Julia. And I must admit I'm quite impressed by the R-to-Julia package XRJulia, developed by venerable S architect/developer John Chambers, and by the Julia-to-R library Rif, from R luminary Laurent Gautier, even though getting them to work is not for the faint of heart.

This Jupyter notebook, running a Julia kernel, sets out to demonstrate interoperability from Julia to R and from Julia to Python, showcasing the RCall and Pandas packages. I first read a personal, daily-updated dataset of stock index levels into a Julia DataFrame. I then summarize the data for a subset of portfolios, "feeding" the resulting DataFrame to a series of R ggplot scripts. Finally, I invoke Python Pandas from within Julia to read the data into a Python DataFrame, which is summarized and converted to a Julia DataFrame for similar R ggplot visualizations. A subsequent post will examine R-to-Julia and Python-to-Julia functionality.

The software used here is Julia 1.0.0, Python 3.6.5, Microsoft R Open 3.4.4, and JupyterLab 0.32.1.

Find the remainder of the post here.

Comment by steve miller on December 10, 2018 at 6:10am

Richard, thanks for your comment.

I worked extensively with SAS from 1980-2000. Indeed from 1985-1995, SAS was my go-to tool for data access, data munging, and data analysis. I was a pretty proficient SAS programmer, and I actually still occasionally do some SAS/WPS architecture work for legacy customers committed to the platform.

The SAS data step, proc, and macro language that I loved in 1990 seem clunky and dated today. Given the choice, I much prefer working with functional, array-oriented tools such as R, Python, Julia, RDBMSs, and Spark.

The customers I work with now are mostly smaller start-up data companies that are budget-sensitive and consequently committed to open source. For them, SAS is a prohibitively expensive non-starter (see https://www.sas.com/store/products-solutions/cSoftware-p1.html). Also, as someone who recruits quants from top universities, I find SAS is not an easy sell these days. Most of the kids I've hired in the last five years come with Python, R, and perhaps Stata backgrounds.

In the end though, you're correct: "Choose your tool and go at the data."

Best

Comment by Richard Boire on December 7, 2018 at 2:06pm

I am always amazed that you never consider SAS, which has been the bedrock data science/data mining language for those most experienced in data science. There is absolutely nothing I cannot do in working the data and ultimately developing a solution (all in Base SAS and SAS/STAT). I have built data science solutions in virtually every industry sector using SAS for over 30 years. Yeah, it's not open source, and I do have scripts in R and Python, but at the end of the day, who cares? Choose your tool and go at the data. Coding in SAS with Base SAS and SAS/STAT is a pretty minimal cost, and one that I expect might go down given the open source nature of our discipline.

© 2018 Data Science Central ®