A few months ago, I wrote a quite positive blog post on the Julia analytics language, reveling in its MIT pedigree, its comprehensible structure, and its interoperability with the data science stalwarts R and Python. I demonstrated some of Julia's capabilities with a skinny data set of daily stock index values, and I showed it collaborating with R's powerful ggplot graphics subsystem.

This time around, I decided to test Julia against a much meatier data set, one that I've already examined extensively with both R and Python. I find time and again that it's critical to push analytics platforms with sizable data to uncover their strengths and weaknesses.

The data set I use here consists of Chicago crime records from 2001 through 2018, in a CSV file posted for download each morning. At present, it holds over 6.7M records with 20+ attributes on the what, where, and when of all crimes logged by the Chicago Police Department. My tests revolve around loading the "master" data, then enhancing it with lookup tables describing Chicago communities and crime classifications. From there, I tally multi-attribute frequencies on type, location, and time, ultimately graphing the results.
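
The pipeline just described (load the master records, enrich them from lookup tables, tally frequencies across several attributes) can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the author's Julia code: the column names (`iucr`, `community_area`, `date`), the lookup dictionaries, and the five sample rows are invented stand-ins for the Chicago Data Portal extract.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the rough shape of the crime extract (columns are illustrative).
MASTER_CSV = """id,iucr,community_area,date
1,0486,8,01/15/2018 10:30:00 PM
2,0820,8,02/03/2018 09:00:00 AM
3,0486,32,02/20/2018 11:45:00 PM
4,0820,32,03/01/2017 01:15:00 PM
5,0486,8,07/04/2018 08:00:00 PM
"""

# Hypothetical lookup tables: crime code -> crime type, area number -> community name.
CRIME_TYPES = {"0486": "BATTERY", "0820": "THEFT"}
COMMUNITIES = {"8": "NEAR NORTH SIDE", "32": "LOOP"}


def load_and_enrich(text):
    """Read the master records and attach lookup attributes (a dict-based join)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        # dict.get returns None for unmatched codes, i.e. left-join semantics.
        rec["crime_type"] = CRIME_TYPES.get(rec["iucr"])
        rec["community"] = COMMUNITIES.get(rec["community_area"])
        # Assumes an "MM/DD/YYYY hh:mm:ss AM" timestamp; characters 6..9 are the year.
        rec["year"] = int(rec["date"][6:10])
        rows.append(rec)
    return rows


def tally(rows, *attrs):
    """Multi-attribute frequency: count rows per combination of the given attributes."""
    return Counter(tuple(r[a] for a in attrs) for r in rows)


rows = load_and_enrich(MASTER_CSV)
freqs = tally(rows, "crime_type", "community", "year")
for key, n in freqs.most_common():
    print(key, n)
```

Using `dict.get` makes the enrichment behave like a left join: a record whose code has no lookup entry keeps a `None` attribute rather than disappearing from the counts.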

R, with its data.table/tidyverse data management and analysis ecosystem, and Python, with Pandas, have met these challenges with aplomb. Notebooks in both have been easy to construct and plenty fast. How would Julia, a much less mature competitor, stack up?

Still an adolescent, the Julia language is a bit of a moving target, which makes development somewhat slower than with R and Python. I was able to do what I needed to, though, adopting a similar development strategy of driving from dataframe/datatable packages. Stack Overflow remains my best development friend, even if, not surprisingly, it's a bit more helpful with R and Python than with Julia.

So what did I find? Apart from several annoyances, such as the absence of a vectorized "in" operator, I was able to mimic in Julia pretty much the programming style I used with Python/Pandas. In fact, Julia was somewhat more facile than Pandas with "by group" processing, since its functions acknowledge missing values, whereas Pandas ignores them.
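
The two behaviors at issue can be illustrated with a small, hypothetical Python sketch (plain standard library, not Julia or Pandas code). `isin` mimics an element-wise membership test in the spirit of R's `%in%` or Pandas' `Series.isin`, and the two counting functions contrast grouping that reports missing values as a group of their own with grouping that silently drops them. The sample column is invented.

```python
from collections import Counter

# Hypothetical column of community names; None marks a missing value.
community = ["LOOP", None, "AUSTIN", "LOOP", None, "UPTOWN"]


def isin(values, allowed):
    """Element-wise membership test, analogous to R's %in% or pandas' Series.isin."""
    allowed = set(allowed)  # set gives constant-time lookups per element
    return [v in allowed for v in values]


def count_keep_missing(values):
    """Group counts that report missing values as a group of their own."""
    return Counter(values)


def count_drop_missing(values):
    """Group counts that silently exclude missing keys, similar to pandas groupby's default."""
    return Counter(v for v in values if v is not None)


mask = isin(community, ["LOOP", "UPTOWN"])
print(mask)  # [True, False, False, True, False, True]
print(count_keep_missing(community))  # None appears as its own group
print(count_drop_missing(community))  # the two missing rows vanish from the tally
```

The difference matters for frequency work: when missing keys are dropped, the group counts no longer sum to the number of records, which is easy to miss.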

What disappointed me, though, was Julia's performance relative to R and Python. I think I'm being charitable in noting that the tests I ran in both R and Python ran at least twice as fast as their Julia counterparts. And, of course, the expectation is that Julia should be faster. So I'm a bit despondent after my second date with Julia, but not, like some, ready to give up just yet.

The code that follows first downloads the CSV file from the Chicago Data Portal. It then reads the data into a Julia dataframe, joining that in turn with several lookup tables, much as one would do with an RDBMS. I then run a number of frequency queries involving type, time, and place. Finally, and perhaps most gratifyingly, I use R's ggplot, exposed through the RCall package, to graph the frequency results. The code cells follow.
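
The RDBMS analogy can be made literal with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the table names, columns, and six sample rows are hypothetical, and the download and ggplot graphing steps are left out. Left joins enrich the master table from the two lookup tables, and `GROUP BY` produces the multi-attribute frequencies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical "master" table plus two lookup tables.
cur.execute("CREATE TABLE crimes (id INTEGER, iucr TEXT, community_area INTEGER, year INTEGER)")
cur.execute("CREATE TABLE crime_types (iucr TEXT PRIMARY KEY, primary_type TEXT)")
cur.execute("CREATE TABLE communities (community_area INTEGER PRIMARY KEY, name TEXT)")

cur.executemany("INSERT INTO crimes VALUES (?, ?, ?, ?)", [
    (1, "0486", 8, 2018), (2, "0820", 8, 2018), (3, "0486", 32, 2018),
    (4, "0820", 32, 2017), (5, "0486", 8, 2018), (6, "0486", None, 2017),
])
cur.executemany("INSERT INTO crime_types VALUES (?, ?)",
                [("0486", "BATTERY"), ("0820", "THEFT")])
cur.executemany("INSERT INTO communities VALUES (?, ?)",
                [(8, "NEAR NORTH SIDE"), (32, "LOOP")])

# LEFT JOINs keep crimes with no matching lookup row (e.g. a missing community area);
# GROUP BY then tallies frequencies by type, place, and time.
query = """
SELECT t.primary_type, c.name AS community, k.year, COUNT(*) AS n
FROM crimes AS k
LEFT JOIN crime_types AS t ON k.iucr = t.iucr
LEFT JOIN communities AS c ON k.community_area = c.community_area
GROUP BY t.primary_type, c.name, k.year
ORDER BY n DESC, t.primary_type, community, k.year
"""
results = cur.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```

SQLite places the crime with a missing community area in its own `NULL` group, so it is counted rather than dropped.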

The software used is Julia 1.0.0, Python 3.6.5, Microsoft R Open 3.4.4, and JupyterLab 0.32.1.

Read the entire blog here.

© 2019 Data Science Central ®
