
Nowcasting Chicago Crime with Python-Pandas and R


In my many years as a data scientist, I’ve spent more time doing forecast work than any other type of predictive modeling. As often as not, the challenge has been forecasting demand for an organization’s many products and lines of business a year or more out, based on five or more years of actual data, generally of daily granularity. A difficult task indeed, and one for which the business’s accuracy expectations are seldom met.

One thing I’ve learned about forecasting is not to be a slave to any modeling technique, choosing predictive integrity over model fidelity. And I’ve become adept at scrambling: adapting to early forecasting results with appropriate model changes to better predict an ever-evolving future. It turns out that both economists and meteorologists work in that mode too, and have a name for it, “nowcasting”, which describes how they revise early predictions based on experience during the forecast period. Meteorologists are constantly changing their weather forecasts, and economists frequently update their annual GDP projections as inputs evolve.

Formally, nowcasting is the “prediction of the present, the very near future and the very recent past. Crucial in this process is to use timely monthly information in order to nowcast key economic variables … the nowcasting process goes beyond the simple production of an early estimate as it essentially requires the assessment of the impact of new data on the subsequent forecast revisions for the target variable.”

I’ve written several blogs over the years on crime in my home city of Chicago, especially after the disturbing uptrend in 2016. I continue to download Chicago crime data daily to look at the frequencies of homicides and violent crimes. The trends are in the right direction, though the pace is not nearly fast enough.

After the disastrous 2016, I’ve been in forecast mode for 2017, 2018, and now 2019. My approach is one of nowcasting — starting with predictions for 2019 based on the available data from 2001-2018, then changing these forecasts based on the daily experience as 2019 progresses. It turns out, not surprisingly, that using year-to-date experience is quite helpful in forecasting final annual counts. Knowing the number of violent crimes between 1/1/2018 and 2/28/2018 was a big help in predicting the final 2018 violent crime frequencies. And knowing the counts through 6/30/2018 was even more valuable.

The remainder of this blog examines how the first four months of homicide and violent crime frequencies can assist in forecasting final annual 2019 numbers. I explore the relationships between year-to-date and final counts for homicides and violent crimes in Chicago from 2001-2018, then attempt to forecast 2019’s final frequencies. I’ll continue to do the analytics as 2019 progresses, hopefully nowcasting crime counts more accurately (and seeing them decline) over time.
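To make the year-to-date idea concrete, here is a minimal sketch of two simple ways to turn a count through April 30 into a full-year estimate: scaling by the average historical final-to-YTD ratio, and fitting a straight line of final counts on YTD counts. This is an illustration only, not the method used in the full analysis. The function names and the `history` rows are hypothetical, and the numbers are made up rather than actual Chicago figures.

```python
from statistics import mean

# Made-up (year, count through April 30, full-year count) rows.
# Illustration only: these are NOT actual Chicago crime figures.
history = [
    (2015, 100, 400),
    (2016, 150, 600),
    (2017, 120, 480),
    (2018, 110, 440),
]


def ratio_nowcast(history, ytd_current):
    """Scale the current YTD count by the average historical final/YTD ratio."""
    ratios = [final / ytd for _, ytd, final in history]
    return ytd_current * mean(ratios)


def least_squares_nowcast(history, ytd_current):
    """Fit final = intercept + slope * ytd by ordinary least squares, then predict."""
    xs = [ytd for _, ytd, _ in history]
    ys = [final for _, _, final in history]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * ytd_current


# Hypothetical YTD count for the year being forecast.
ytd_this_year = 90
print(ratio_nowcast(history, ytd_this_year))
print(least_squares_nowcast(history, ytd_this_year))
```

Rerunning either function with a later cutoff (and the matching historical YTD counts) is what turns a one-off forecast into a nowcast: the estimate is revised each time more of the year has been observed.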

The technology used is JupyterLab 0.32.1, Anaconda Python 3.6.5, NumPy 1.14.3, Pandas 0.23.0, and Microsoft R 3.4.4 with ggplot and rmagic. The cumulative daily Chicago crime file, covering 2001 through 2019 to date (a week in arrears), drives the analysis. Data munging is done with Python/Pandas. The crime frequency dataframes are then fed to R for visualization using ggplot.
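As a rough sketch of the kind of Pandas munging involved, the snippet below builds a small dataframe of year-to-date and full-year homicide counts by year. It is an assumption-laden illustration rather than the actual notebook code: the column names `date` and `primary_type` are hypothetical stand-ins for the real file's schema, and the handful of inline rows are made up so the example runs without the crime file.

```python
import io

import pandas as pd

# Tiny made-up extract standing in for the crime file. The column names
# "date" and "primary_type" are assumptions, not the real file's schema.
raw = io.StringIO(
    "date,primary_type\n"
    "2017-01-15,HOMICIDE\n"
    "2017-03-02,ROBBERY\n"
    "2017-09-20,HOMICIDE\n"
    "2018-02-10,HOMICIDE\n"
    "2018-04-30,HOMICIDE\n"
    "2018-11-05,HOMICIDE\n"
)

crimes = pd.read_csv(raw, parse_dates=["date"])

# Keep homicides only and tag each record with its calendar year.
homicides = crimes[crimes["primary_type"] == "HOMICIDE"].copy()
homicides["year"] = homicides["date"].dt.year

# "Year to date" here means on or before April 30 of each year.
cutoff_month, cutoff_day = 4, 30
month = homicides["date"].dt.month
day = homicides["date"].dt.day
in_ytd = (month < cutoff_month) | ((month == cutoff_month) & (day <= cutoff_day))

# One row per year: count through the cutoff and count for the whole year.
counts = pd.DataFrame({
    "ytd": homicides[in_ytd].groupby("year").size(),
    "final": homicides.groupby("year").size(),
}).fillna(0).astype(int)

print(counts)
```

A frame shaped like `counts` is the sort of per-year summary that could then be handed to R for plotting.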

Find the entire blog here.