A few months ago, I wrote a quite positive blog on the Julia analytics language, reveling in its MIT pedigree, its comprehensible structure, and its interoperability with data science stalwarts R and Python. I demonstrated some of Julia's capabilities with a skinny data set of daily stock index values, and I showed it collaborating with R's powerful ggplot graphics subsystem.

This time around, I decided to test Julia against a much meatier data set -- one that I've already examined extensively with both R and Python. I find time and again that it's critical to push analytics platforms with size to uncover their strengths and weaknesses.

The data set I use here consists of Chicago crime records from 2001-2018 in a csv file posted for download each morning. At present, it consists of over 6.7M records and 20+ attributes on the what, where, and when of all crimes logged by the Chicago Police Department. My tests revolve around loading the "master" data, then enhancing it with lookup tables describing Chicago communities and the classifications of crimes. From there, I tally multi-attribute frequencies on type, location, and time -- ultimately graphing the results.
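To make the shape of that workflow concrete, here is a minimal sketch in plain Python (standard library only). It is not the notebook code from this post: the column names, the four sample rows, and the helper names `load_and_enrich` and `tally` are all invented for illustration, and the real file is far larger and wider.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the crime file. Column names and rows are invented
# for illustration; they are not taken from the real download.
CRIMES_CSV = """id,date,primary_type,community_area
1,2018-01-05,THEFT,8
2,2018-01-05,BATTERY,8
3,2018-02-11,THEFT,32
4,2018-02-12,THEFT,
"""

# Hypothetical lookup table: community area number -> community name.
COMMUNITIES = {"8": "Near North Side", "32": "Loop"}


def load_and_enrich(text, lookup):
    """Read CSV rows and left-join a community name onto each one."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Left join: rows whose key is absent from the lookup are kept,
        # with the joined attribute set to None.
        row["community"] = lookup.get(row["community_area"])
        # Derive a coarser time attribute (YYYY-MM) for tallying.
        row["year_month"] = row["date"][:7]
        rows.append(row)
    return rows


def tally(rows, *keys):
    """Count rows for each combination of the given attributes."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


rows = load_and_enrich(CRIMES_CSV, COMMUNITIES)
freq = tally(rows, "primary_type", "community")
for combo, n in freq.most_common():
    print(combo, n)
```

The same `tally` call works for any mix of attributes, for example `tally(rows, "year_month", "primary_type")` for a time-by-type frequency.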

R with its data.table/tidyverse data management/analysis ecosystem and Python with Pandas have met the challenges with aplomb. Both their notebooks have been easy to construct and plenty fast. How would Julia, a much less mature competitor, stack up?

As an adolescent, the Julia language is a bit of a moving target, making the development process somewhat slower than with R and Python. I was able to do what I needed to, though, adopting a similar development strategy of driving from dataframe/datatable packages. Stack Overflow is my best development friend, if, not surprisingly, a bit more helpful with R and Python than with Julia.

So what did I find? With the exception of several annoyances such as the absence of a vectorized "in" operator, I was pretty much able to mimic in Julia the programming style I used with Python/Pandas. In fact, Julia was somewhat more facile than Pandas with "by group" processing, as its functions acknowledge missing values, unlike Pandas, which ignores them.
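For readers unfamiliar with the two ideas mentioned here, the following plain-Python sketch shows what an element-wise "in" test and a missing-aware group count look like. It uses neither Julia's nor Pandas' API; the sample records and the `group_counts` helper are hypothetical and only illustrate the behaviors being contrasted.

```python
from collections import defaultdict

# Invented sample: (crime type, district) pairs; None marks a missing district.
records = [
    ("THEFT", "A"),
    ("BATTERY", None),
    ("THEFT", None),
    ("ARSON", "B"),
    ("THEFT", "A"),
]

# Element-wise membership test: one boolean per record, which is what a
# vectorized "in" operator would return for a whole column at once.
wanted = {"THEFT", "ARSON"}
mask = [ctype in wanted for ctype, _ in records]


def group_counts(pairs, keep_missing=True):
    """Count records per district, optionally keeping the missing key as a group."""
    counts = defaultdict(int)
    for _, key in pairs:
        if key is None and not keep_missing:
            continue  # drop records whose grouping key is missing
        counts[key] += 1
    return dict(counts)


with_missing = group_counts(records)                         # missing key forms its own group
without_missing = group_counts(records, keep_missing=False)  # missing key is ignored

print(mask)
print(with_missing)
print(without_missing)
```

When the missing key is dropped, the group totals no longer add up to the number of records, which is why it matters for frequency work whether a by-group function acknowledges missing values.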

What disappointed me, though, was the relative performance of Julia vs R/Python. I think I'm being charitable in noting that the tests I ran in both R and Python run at least twice as fast as their counterparts in Julia. And, of course, the expectation is that Julia should be faster. So I guess I'm a bit despondent after my second date with Julia -- but not, like some, ready to give up just yet.

The code that follows first downloads the csv file from the Chicago Data Portal. It then reads the data into a Julia dataframe, joining that in turn with several lookup tables -- much like one would do with an RDBMS. I then run a number of frequency queries involving type, time, and place. Finally, and perhaps most gratifyingly, I use R's ggplot, exposed by the RCall package, to graph frequency results. The code cells follow.

The software used is Julia 1.0.0, Python 3.6.5, Microsoft R Open 3.4.4, and JupyterLab 0.32.1.

Read the entire blog here.

© 2019 Data Science Central
