Analyzing crime in NYC: data, visuals and source code

  • SupStat 

Contributed by Chuan Sun. He attended the NYC Data Science Academy 12-week full-time Data Science Bootcamp from July 5th to September 23rd, 2016. This post is based on his first class project, the Exploratory Data Analysis Visualization Project, due in the 2nd week of the program. You can find the original article here.

When Vito Corleone, the head of the Corleone crime family in the movie “The Godfather”, was shot by hitmen on a New York street, I was shocked.

I was shocked not just because I was so immersed in the movie, but also because one sentence kept echoing in my mind: “no one is an island”.

Uncertainty is everywhere, even for a mafia boss, let alone for millions of ordinary New Yorkers.

Why should we care?

Safety is one of the most fundamental human needs. As one of the most populous urban agglomerations in the world, New York City is heaven for many but perhaps hell for a few, especially those unfortunate enough to be affected by the seven “sins”: murder, rape, robbery, felony assault, burglary, grand larceny, and grand larceny of motor vehicles.

Each week, the NYPD publishes City Wide Crime Statistics, with detailed weekly counts of crime complaints for 7 felonies. For example, the report for 7/4 to 7/10 of 2016 lists 1888 total crime complaints in NYC: 6 murders, 35 rapes, 304 robberies, 444 felony assaults, 202 burglaries, 765 grand larcenies, and 132 grand larcenies of vehicles.

1888 is not a small number, even though total complaints were down 5.51% compared with 2015. Simple arithmetic gives an average of 11.24 felony complaints per hour, or roughly one every 5.3 minutes, in NYC.
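The arithmetic behind these figures can be checked in a few lines. This is a small sketch in Python (not the R used for the post's graphs); the variable names are illustrative, and the counts are the ones quoted above for the week of 7/4 to 7/10, 2016.

```python
# Weekly complaint counts quoted above (NYPD report for 7/4 to 7/10, 2016).
weekly_complaints = {
    "murder": 6,
    "rape": 35,
    "robbery": 304,
    "felony assault": 444,
    "burglary": 202,
    "grand larceny": 765,
    "grand larceny of vehicles": 132,
}

total = sum(weekly_complaints.values())
hours_in_week = 7 * 24                 # 168 hours
per_hour = total / hours_in_week       # average complaints per hour
minutes_between = 60 / per_hour        # average gap between complaints

print(total)                      # 1888
print(round(per_hour, 2))         # 11.24
print(round(minutes_between, 1))  # 5.3
```

The last figure shows that the average gap is a little over five minutes, so "one every five to six minutes" is a fair summary.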

Questions

This project investigates the 7 sins, a.k.a. felonies, that occurred in NYC over the past 10 years (2006-2015). It focuses on answering the following simple yet important questions:

  • (1) Has NYC become safer over the last 10 years?
  • (2) Which months of the year are the least safe?
  • (3) Which days of the week are the least safe?
  • (4) Which hours of the day are the least safe?
  • (5) Which boroughs are less safe than others?

Source Code

See here for R source code to generate the graphs in this post.

Dataset

The NYPD 7 Major Felony Incidents dataset:

  • Covers the seven major felonies at the incident level and is updated quarterly.
  • Was made public on Dec 29, 2015, and is available here.
  • Contains around 1.1 million incidents and 22 variables, and is 194MB in size.
  • Contains approximate locations (longitude and latitude) across the 5 boroughs.
  • Contains timestamps of offense incidents (year, month, hour) spanning 1919 to 2015.

According to the NYPD Incident Level Data Footnotes:

  • Crime complaints which involve multiple offenses are classified according to the most serious offense.
  • For privacy reasons, incidents have been moved to the midpoint of the street segment on which they occur.
  • Attempted crimes are recorded as if the crime actually occurred.
  • Data presented here is based on the year the incident was reported, not necessarily when it occurred.

The first point means that the number of actual offenses is larger than the number of records in the dataset, since a complaint involving several offenses is recorded only under the most serious one. Because we know nothing about which types of offenses typically occur together in such complaints, we can make no assumptions about the size of this undercount. The second point affects the accuracy of incident locations. Nevertheless, at the borough or city level, the inaccuracy in longitude and latitude will not have a major impact on the overall distribution of incidents.

Preprocess

A quick exploration in R revealed that, although the years in the dataset span 1919 to 2015, over 95% of all incidents occurred after 2005. I therefore focus on the years 2006 to 2015. This 10-year period accounts for nearly all of the roughly 1.1 million incidents.
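The year check and filter described above could look like the following minimal sketch. It is written in Python rather than the author's R, and the record layout (a dict with an "occurrence_year" key) is an assumption for illustration, not the dataset's actual column name.

```python
def share_after(records, year, key="occurrence_year"):
    """Return the fraction of records whose year is strictly after `year`."""
    if not records:
        return 0.0
    later = sum(1 for r in records if r[key] > year)
    return later / len(records)


def keep_years(records, first, last, key="occurrence_year"):
    """Keep only records whose year falls in the inclusive range [first, last]."""
    return [r for r in records if first <= r[key] <= last]


# Tiny synthetic sample, only to show the calls; this is not NYPD data.
sample = [{"occurrence_year": y} for y in (1919, 2004, 2006, 2010, 2015)]
print(share_after(sample, 2005))             # 0.6
print(len(keep_years(sample, 2006, 2015)))   # 3
```

On the real dataset, the first call is the one that would be expected to return a value above 0.95.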

Visualization and analysis

Trend in the last 10 years (2006 – 2015)

First, let us take a look at the overall trend of the 7 felonies in NYC over the last 10 years.

Grand larceny is the most frequent of the 7 felonies, with almost twice as many incidents as the second most frequent one.

Three felonies are declining: robbery, burglary, and auto theft. I cannot help but link this to the spread of camera surveillance, although the data here cannot confirm that. Would-be offenders know their faces may show up on NYPD screens the moment they take the risk.

Murder and rape have stayed at roughly the same level across the 10 years.

The number of felony assaults shows a slightly increasing trend.

To sum up, with three felonies declining and only felony assault edging up, NYC appears to be getting safer.
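One way to put a number on "declining", "flat", or "increasing" is to fit a straight line to the yearly counts of each felony. The sketch below is in Python and is not taken from the post's R code; the function name is hypothetical and the sample counts are synthetic, chosen only to show the call.

```python
def linear_trend(yearly_counts):
    """Least-squares slope of count against year, in incidents per year.

    yearly_counts maps year -> number of incidents in that year.
    A negative slope means the felony is declining over the period.
    """
    years = sorted(yearly_counts)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years to estimate a trend")
    mean_x = sum(years) / n
    mean_y = sum(yearly_counts[y] for y in years) / n
    sxx = sum((y - mean_x) ** 2 for y in years)
    sxy = sum((y - mean_x) * (yearly_counts[y] - mean_y) for y in years)
    return sxy / sxx


# Synthetic counts (not NYPD data): a steady drop of 10 incidents per year.
print(linear_trend({2006: 100, 2007: 90, 2008: 80}))  # -10.0
```

A slope close to zero relative to the yearly level would correspond to the "same level" description used for murder and rape.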

Incidents by month

For this analysis, NYC’s seasons are defined as follows:

  • Spring season: March, April, May
  • Summer season: June, July, August
  • Fall season: September, October, November
  • Winter season: December, January, February
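The season definition above translates directly into a lookup table. This is a minimal Python sketch (the post's own code is in R); the names SEASONS and season_of are illustrative.

```python
# Season definition as listed above: season name -> month numbers.
SEASONS = {
    "Spring": (3, 4, 5),
    "Summer": (6, 7, 8),
    "Fall": (9, 10, 11),
    "Winter": (12, 1, 2),
}

# Invert the table so each month number maps to its season.
MONTH_TO_SEASON = {
    month: season for season, months in SEASONS.items() for month in months
}


def season_of(month):
    """Return the season name for a month number (1 = January)."""
    if month not in MONTH_TO_SEASON:
        raise ValueError(f"month must be in 1..12, got {month!r}")
    return MONTH_TO_SEASON[month]


print(season_of(2))   # Winter
print(season_of(7))   # Summer
```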

Late winter and early spring tend to have the fewest incidents for almost all 7 felonies, with February particularly low. These can be considered the safest months. This is understandable: during those months it can be very chilly, windy, and snowy. Who would want to go out in such weather?

Summer and early fall tend to have the most incidents for almost all 7 felonies. Summer months in NYC are usually hot and humid, and temperatures may remain high at night, which can make some people ornery.
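One caveat when comparing raw monthly totals is that months differ in length, and February is the shortest. A per-day rate removes that effect. The sketch below is Python, not the post's R code; the function name is hypothetical and the counts are synthetic, chosen so that the raw totals differ while the daily rates are identical.

```python
import calendar


def incidents_per_day(monthly_counts, year):
    """Convert {month: count} for one year into {month: count per day}."""
    return {
        month: count / calendar.monthrange(year, month)[1]  # [1] = days in month
        for month, count in monthly_counts.items()
    }


# Synthetic counts (not NYPD data): February looks lowest in raw totals...
counts_2015 = {1: 3100, 2: 2800, 3: 3100}
# ...but the per-day rate is the same in all three months.
print(incidents_per_day(counts_2015, 2015))  # {1: 100.0, 2: 100.0, 3: 100.0}
```

Whether February remains the lowest month after this adjustment is something only the real data can answer.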

Incidents by day of week

Friday is the least safe day of the week, as the histograms make clear. On Fridays, burglary, grand larceny, larceny of motor vehicle, and robbery occur more frequently than on other days. Perhaps people relax on Friday after a week of work and are less vigilant than they otherwise would be. This could give wrongdoers more opportunities to break into houses, steal property such as cars, or commit robberies on the street.

Over the weekend, the number of burglaries, grand larcenies, auto thefts, and robberies declines. If people are at home playing with their kids, enjoying family time, watching their favorite TV shows, or preparing for the coming week’s work, there may be less opportunity for wrongdoers to sneak into their homes.

On the other hand, the weekend is less safe in terms of felony assault, rape, and murder. Domestic violence, strained family relationships, and unkind words may all contribute to an unhappy or disastrous weekend. So maybe family time is not equally great for everyone!
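The day-of-week comparison boils down to counting incidents per weekday. Below is a minimal Python sketch of that tally (the post's graphs were produced in R); the function name is illustrative and the sample dates are synthetic.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def count_by_weekday(dates):
    """Return {weekday name: number of dates falling on it}, Monday first."""
    counts = Counter(DAY_NAMES[d.weekday()] for d in dates)
    return {name: counts.get(name, 0) for name in DAY_NAMES}


# Synthetic sample: two incidents on Friday 3 July 2015, one on Saturday 4 July.
sample_dates = [date(2015, 7, 3), date(2015, 7, 3), date(2015, 7, 4)]
by_day = count_by_weekday(sample_dates)
print(by_day["Friday"], by_day["Saturday"])  # 2 1
```

Running the same tally separately for each offense would give the per-felony histograms discussed above.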

Incidents by hour

Knowing which hours are safe or unsafe for certain offenses is vital for New Yorkers, since the hour is a “tangible” and controllable unit. One can choose whether or not to be at a given place at a given hour.

It strikes me that, even with just simple density and histogram graphs, without any complex machine learning models, we can still distill many insights from history.

  • Burglary happens most often in the morning and late afternoon. This roughly corresponds to one hour after New Yorkers leave for work and one hour before they return home. It makes sense that burglaries happen when people are not at home.
  • Felony assaults, on the other hand, occur most often during the evening and midnight hours. Does the nighttime bring out the worst in us?
  • Grand larceny occurs most often from noon through the afternoon.
  • Larceny of motor vehicle occurs most often during the midnight hours. The dark night is a silent but perfect accomplice.
  • Rape also occurs most often during the midnight hours and least often in the morning. People are vulnerable at night, especially when asleep.
  • Robbery occurs most often in the afternoon, peaking at 3pm, and least often in the early morning.
  • Murder occurs most often during the midnight hours and least often at 8am. Again, wrongdoers take advantage of victims’ lack of vigilance at night.
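Statements such as "peaks at 3pm" or "least often at 8am" come from an hourly histogram. The following Python sketch shows one way to build it and read off the extremes; it is not the post's R code, the function names are hypothetical, and the sample hours are synthetic.

```python
from collections import Counter


def hourly_profile(hours):
    """Return a 24-element list of counts; index = hour of day (0 to 23)."""
    counts = Counter(hours)
    return [counts.get(h, 0) for h in range(24)]


def peak_and_trough(profile):
    """Return (busiest hour, quietest hour); ties go to the earliest hour."""
    peak = max(range(24), key=lambda h: profile[h])
    trough = min(range(24), key=lambda h: profile[h])
    return peak, trough


# Synthetic sample (not NYPD data): most incidents at 15:00, one at 03:00.
sample_hours = [15, 15, 15, 14, 16, 3]
profile = hourly_profile(sample_hours)
print(peak_and_trough(profile))  # (15, 0)
```

With so few synthetic records many hours have zero incidents, so the trough is simply the first empty hour; on the full dataset every hour would have a count.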

Clock view

On a clock face it is easy to see when each of the deadly sins peaks in frequency. You can almost map the life of a felon, and only a few hours in a day are really safe, e.g., 5am is a safe time to be alive.
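The post does not show how the clock view was constructed. As a sketch under the assumption that it is a 24-hour dial with midnight at the top and hours running clockwise, each hour can be mapped to a point on a unit circle as follows (Python, with a hypothetical function name).

```python
import math


def clock_position(hour, hours_on_dial=24):
    """Return (x, y) on a unit circle for an hour: 0 at the top, clockwise."""
    angle = 2 * math.pi * (hour % hours_on_dial) / hours_on_dial
    # sin gives the horizontal offset and cos the vertical one, so that
    # hour 0 lands at the top (0, 1) and hours advance clockwise.
    return math.sin(angle), math.cos(angle)


for hour in (0, 6, 12, 18):
    x, y = clock_position(hour)
    print(hour, f"{x:+.2f}", f"{y:+.2f}")
```

Scaling each point by the number of incidents in that hour would turn the dial into a radial bar or rose chart.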

Incidents by borough in 2015

We should also keep an eye on where felonies occur. 

The histogram of 2015 incidents by borough shows that Manhattan has the largest number of grand larcenies. This is not entirely surprising, perhaps because Wall Street and many financial companies are located there, giving wrongdoers ample opportunity. Brooklyn is the runner-up, and Staten Island has the fewest. Although Manhattan leads in grand larceny, Brooklyn in fact appears to be the most dangerous borough: it ranks first in the number of incidents for 6 out of 7 felonies. In contrast, Staten Island ranks last.
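A claim like "ranks first for 6 out of 7 felonies" can be derived mechanically from a table of counts per offense and borough. Here is a minimal Python sketch (not the post's R code); the function names are hypothetical and the numbers are synthetic placeholders, not the 2015 figures.

```python
from collections import Counter


def top_borough_per_offense(counts):
    """counts: {offense: {borough: n}} -> {offense: borough with the largest n}."""
    return {
        offense: max(by_borough, key=by_borough.get)
        for offense, by_borough in counts.items()
    }


def first_place_tally(counts):
    """Count how many offenses each borough leads."""
    return Counter(top_borough_per_offense(counts).values())


# Synthetic counts, only to show the shape of the input.
sample_counts = {
    "grand larceny": {"Manhattan": 3, "Brooklyn": 2, "Staten Island": 1},
    "robbery": {"Manhattan": 1, "Brooklyn": 4, "Staten Island": 0},
    "burglary": {"Manhattan": 2, "Brooklyn": 5, "Staten Island": 1},
}
print(top_borough_per_offense(sample_counts)["grand larceny"])  # Manhattan
print(first_place_tally(sample_counts)["Brooklyn"])             # 2
```

Note that raw counts favor populous boroughs; dividing by population would give a per-capita view, which the post does not attempt.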

How are the 7 sins distributed across the 5 boroughs in 2015?

The density maps for each of the 5 boroughs show that every borough has its own distinct pattern of hotspots.
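The post does not say how its density maps were computed. As a simple stand-in for a smoothed density estimate, incidents can be counted per grid cell, which is enough to locate hotspots. The sketch below is Python, the function name and cell size are assumptions, and the coordinates are synthetic.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.01):
    """Bin (longitude, latitude) points into square cells of cell_deg degrees.

    Returns a Counter keyed by integer (column, row) cell indices.
    """
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


# Synthetic points: two fall in the same cell, one falls elsewhere.
sample_points = [(-73.985, 40.758), (-73.986, 40.7585), (-73.955, 40.705)]
cells = grid_counts(sample_points)
hotspot_cell, hotspot_count = cells.most_common(1)[0]
print(hotspot_cell, hotspot_count)  # (-7399, 4075) 2
```

A smaller cell_deg gives finer hotspots at the cost of noisier counts. Because the dataset moves each incident to the midpoint of its street segment, cells much smaller than a block would add little.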

Offenses in Manhattan in 2015

Offenses in Queens in 2015

Offenses in Bronx in 2015

Offenses in Brooklyn in 2015

Offenses in Staten Island in 2015

Conclusion

New Yorkers may rely solely on the NYPD to solve these problems. But if each New Yorker is aware of the time and space patterns identified in this report, he or she can take proper precautions, and things may be different.

Future work

NYC is getting safer and safer. But we should not be satisfied with this. Eradicating felonies is a long-term mission.  I believe more work can be done, including but not limited to:

  • Investigate how the density map has evolved over the past 10 years. Hotspots might dilute, merge, shrink, or grow. If such patterns can be extracted, more valuable insights might emerge.
  • Investigate incidents at a finer-grained level, such as block or street level, and model how other factors such as the economy, average income, and employment rate affect felonies.