A poll released recently showed Python increasing its lead over R as the language of choice for analytics professionals. Setting aside questions about how representative an online-poll sample is of the analytics practitioner population, the findings have nonetheless sparked spirited discussion on the future of software for the trade.

My unscientific sample of opinion shows Python slightly ahead of R, with users of each quite passionate about their favorite. And my take is that, given the mature ecosystems of both, Python and R will continue to develop, grow, and compete for the foreseeable future.

What I find particularly heartening are the significant developments surrounding interoperability of the two platforms -- the ability to invoke R within Python programs as well as, conversely, Python within R. Indeed, I've written on both Python within R and R within Python for Data Science Central in recent months. Kudos to Python commercial vendor Anaconda and R commercial vendor RStudio for actively promoting these "polyglot" features.

Now complicate this analytics software divide even further by introducing Julia, a language designed from the ground up for performant analytics. With MIT bona fides, Julia has progressed significantly since work on it began in 2009. "Julia has been revolutionizing scientific and technical computing since 2009," says Edelman, referring to the year its creators started work on a new language that combined the best features of Ruby, MATLAB, C, Python, R, and others. I'm now on my third go-round with Julia and am finally beginning to feel it's legit. The essential DataFrames package is the real deal.
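To give a flavor of what makes DataFrames compelling, here's a minimal sketch of its split-apply-combine idiom. The columns and values are illustrative only, not the post's actual data:

```julia
# Minimal DataFrames.jl sketch: build a frame and summarize by group.
# The portfolio/level columns are hypothetical.
using DataFrames, Statistics

df = DataFrame(portfolio = ["spx", "spx", "rut", "rut"],
               level     = [2900.1, 2912.4, 1712.8, 1698.3])

# Mean level per portfolio -- the split-apply-combine pattern.
by_port = combine(groupby(df, :portfolio), :level => mean => :mean_level)
```

The `groupby`/`combine` pairing will feel familiar to anyone coming from R's dplyr or Python's pandas `groupby`.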

A new competitor such as Julia is considerably behind from the get-go, remaining so until it can both attain a noticeable programmer presence and establish an open source ecosystem. Julia is approaching that point now, helped in no small part by star recognition and a polyglot commitment that allows it to co-exist in Python/R worlds. I just love the prospect of using R's uber-productive ggplot in Python and Julia. And I must admit I'm quite impressed by R-to-Julia package XRJulia developed by venerable S architect/developer John Chambers, and the Julia-to-R library, Rif from R luminary Laurent Gautier -- even though getting them to work is not for the faint of heart.

This Julia-kernel Jupyter notebook purports to demonstrate interoperability from Julia to R and from Julia to Python, showcasing the RCall and Pandas packages. I first read a personal, daily-updated dataset of stock index levels into a Julia DataFrame. I then summarize the data for a subset of portfolios, "feeding" the resultant DataFrame to a series of R ggplot scripts. Finally, I invoke Python Pandas within Julia to read the data into a Python DataFrame, which is summarized and converted back to a Julia DataFrame for similar R ggplot visualizations. A subsequent blog will examine R-to-Julia and Python-to-Julia functionality.
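The round trip described above can be sketched roughly as follows. The file name, column names, and conversion details are my assumptions for illustration, not the notebook's actual code:

```julia
# Sketch of Julia -> R (RCall) and Julia -> Python (Pandas.jl) interop.
using CSV, DataFrames, RCall
import Pandas                       # wraps Python's pandas via PyCall

# 1. Read the stock index levels into a Julia DataFrame.
#    "indexlevels.csv" and its columns are hypothetical.
jdf = CSV.read("indexlevels.csv", DataFrames.DataFrame)

# 2. Hand the frame to R and plot it with ggplot2.
@rput jdf
R"""
library(ggplot2)
ggplot(jdf, aes(date, level, color = portfolio)) + geom_line()
"""

# 3. Read the same file with Python pandas, then convert back to a
#    Julia DataFrame for further ggplot work.
pdf  = Pandas.read_csv("indexlevels.csv")
jdf2 = DataFrames.DataFrame(pdf)
```

Note the module-qualified `DataFrames.DataFrame`: Pandas.jl exports its own `DataFrame` type, so qualifying the constructor avoids a name clash.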

The software used here is Julia 1.0.0, Python 3.6.5, Microsoft Open R 3.4.4, and JupyterLab 0.32.1.

Find the remainder of the post here.

Comment by Ankit DS on May 13, 2020 at 8:21am

@steve -- Steve, I am re-running your post today; I have recently started using Julia.

Your comment: "A new competitor such as Julia is considerably behind from the get-go, remaining so until it can both attain a noticeable programmer presence and establish an open source ecosystem."

After a year and a half, would you like to add to that? Do you think Julia has its ecosystem now, or is it heading toward one?

Comment by Richard Boire on February 12, 2019 at 11:45am


In the end, as I said before, who cares? If you can manipulate data from a variety of different files to create one analytical file that provides the right info to solve the business problem, you are done. And if you can get it done using an abacus (humor, humor), and in acceptable time, then great.

Comment by Paul Bremner on February 12, 2019 at 10:23am

Hi Steve,

Just noticed your comment to Richard (must have been preoccupied in mid-December). I'm curious about your statement that the SAS data step, proc, macro, etc. are clunky. As SAS Studio takes off, most "programming" you need to do can be accomplished by clicking on menus/drop-downs in a pane on the left, with the code created on-the-fly in a pane on the right. I think even a hardcore SAS programmer these days would start by using the GUI in the left screen, create whatever code they need, review it in the right screen, and then submit it. That's much quicker than creating everything from scratch. I haven't gotten far enough into it to say whether there are things you'd actually need to code (i.e. DO loops, arrays, macros, etc.). My impression is that apps like Enterprise Guide are headed in the same direction. I would argue that it certainly pays to know the programming so you can check what's happening, and you might want to tweak things for various reasons. But my guess is that for the bulk of tasks, it's far quicker and simpler to use the menus/drop-downs.

It seems to me that SAS has essentially erased the distinction between menus/drop-downs and programming. You do whatever is fastest and, of course, can do it all in programming if you want. I'd be curious to know to what extent this is possible in Python and R. I suspect something similar will appear (if it hasn't already) to make these languages more accessible to users. Of course, as you say, smaller orgs are going to be limited by finances. And there are certain Data Science/ML things you can't really do without licensing something like Enterprise Miner, which is really out of reach in terms of cost for smaller firms. That's where RapidMiner and other apps like Alteryx will no doubt be trying to make inroads.

Comment by steve miller on December 10, 2018 at 6:10am

Richard -- Thanks for your comment.

I worked extensively with SAS from 1980-2000. Indeed from 1985-1995, SAS was my go-to tool for data access, data munging, and data analysis. I was a pretty proficient SAS programmer, and I actually still occasionally do some SAS/WPS architecture work for legacy customers committed to the platform.

The SAS data step, proc, and macro language that I loved in 1990 seems clunky and dated today. Given the choice, I much prefer working with functional array-oriented tools such as R, Python, Julia, RDBMS's, and Spark. 

The customers I work with now are mostly smaller, start-up data companies that are budget-sensitive and consequently committed to open source. For them, SAS is a prohibitively expensive non-starter: https://www.sas.com/store/products-solutions/cSoftware-p1.html. Also, as one who recruits quants from top universities, SAS is no longer an easy sell. Most of the kids I've hired in the last 5 years come with Python, R, and perhaps Stata backgrounds.

In the end though, you're correct: "Choose your tool and go at the data."


Comment by Richard Boire on December 7, 2018 at 2:06pm

I am always amazed that you never consider SAS, which has been the bedrock data science/data mining language for those most experienced in data science. There is absolutely nothing that I cannot do in working the data and ultimately developing a solution (all in Base SAS and SAS/STAT). I have built data science solutions in virtually all industry sectors using SAS for over 30 years. Yeah, it's not open source, and I do have scripts in R and Python, but at the end of the day, who cares? Choose your tool and go at the data. Coding in SAS with Base SAS and SAS/STAT comes at a pretty minimal cost, one I might expect to go down given the open source nature of our discipline.
