Can crime be predicted? Does crime follow a natural rate?

Preface :

Violence is a social phenomenon and has a central role to play in assessing societal developments. Does the trend in subcategories of crime indicate a shift in societal structure? If yes, then how accurately can we predict it, and what can we do about it? I have tried analysing crime in India over 15+ years in the attached article.

I am also looking to connect with people interested in working on the data scraped from data.gov.in.

Introduction :

To revenge crime is important, but to prevent it is more so.

-ARTHUR CONAN DOYLE, “The Adventure of the Illustrious Client”

With the advent of advanced predictive models and the explosion in the quantity of data available for analysis, can we predict crime in upcoming years and take preventive measures? This essay presents a study of 18 years of crime data and offers insights into the socio-economic complexities of Indian society. The percentage distribution of all crimes across different crime categories between 1995 and 2013 is studied, followed by forecasting using different predictive models.

With what accuracy can different crimes be predicted? How much further can we go with this prediction? Can we use predictive analytics to combat crime at the city/zonal level?

To understand and analyse India's crime statistics, it is first important to understand how they are structured: know the rules of the game. In India, crimes are divided into two categories, Cognizable and Non-Cognizable:


  • Cognizable Crimes : Police have direct responsibility to take action and can effect an arrest without a warrant. These are broadly categorised under the IPC (Indian Penal Code) and SLL (Special and Local Laws).
  • Non-Cognizable Crimes : Those which cannot be investigated by the police without the order of a magistrate.

So far the structure looks like this, and this is all that is required to understand this article:


The crimes under IPC are further subdivided into:

  • Crimes against body

  • Crimes against property

  • Crimes against public order

  • Economic crimes

  • Crimes against women and children

  • Other IPC crimes.

(Find details on each subcategory in the appendix)

Split of IPC crimes into the above mentioned categories during 1995-2013 is as follows:


The chart shows that crimes against women and children make up the largest share of the crimes committed, closely followed by crimes against property. (Other IPC crimes consist of various IPC crimes such as accidents, hit and run, attempted suicide, etc.)

Can the rate of crimes against women and children be predicted for the upcoming year, so that laws can be put in place to deter the criminals?
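The percentage split described above is simple arithmetic: each category's count divided by the total of all categories. A minimal sketch follows. The function name `percentage_shares` is hypothetical, and the counts are placeholder numbers chosen only to make the arithmetic easy to follow; they are not figures from the data.gov.in data set.

```python
def percentage_shares(counts):
    """Return each category's share of the total count, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total count is zero, shares are undefined")
    return {name: 100.0 * n / total for name, n in counts.items()}


# PLACEHOLDER counts for illustration only, NOT real crime figures.
example_counts = {
    "Crimes against body": 200,
    "Crimes against property": 250,
    "Crimes against public order": 50,
    "Economic crimes": 100,
    "Crimes against women and children": 300,
    "Other IPC crimes": 100,
}

shares = percentage_shares(example_counts)

# Print categories from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Computing this once per year gives one time series of shares per sub-category, which is the kind of series the forecasting step works on.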

Analysis :

Advanced machine learning algorithms almost always require a large data set to make acceptable predictions. But what do we do in the present case, when we are left with nothing but 18 data points, which are not sufficient to build a strong predictive model? (Psst: the new state-of-the-art model on the ImageNet dataset for image classification uses 300M images, more than 15 million times as many data points as our data set.) In such a case, we go old school. Are the old-school analysis techniques still helpful?

The data points for 2013 are held out as the test set and the rest are used as the training set. Forecasting is done using the following methods:

     1. Linear Regression.

     2. Time series analysis using ARIMA.

     3. Moving Average.
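The hold-out setup and the three methods listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the code behind the article: the series is synthetic, the moving-average window of 3 is a guess, and the ARIMA order (1,1,0) without a constant is chosen only because it is small enough to fit by hand with least squares. A real analysis would more likely use a library such as statsmodels for ARIMA. All function names are hypothetical.

```python
def linear_trend_forecast(years, values, target_year):
    """Fit value = a + b * year by ordinary least squares and evaluate at target_year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_year


def moving_average_forecast(values, window=3):
    """Forecast the next point as the mean of the last `window` observations."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return sum(values[-window:]) / window


def arima_110_forecast(values):
    """One-step forecast from ARIMA(1,1,0) without a constant.

    The AR coefficient on the first differences is estimated by
    conditional least squares (an assumption made for this sketch).
    """
    if len(values) < 3:
        raise ValueError("need at least 3 observations")
    diffs = [b - a for a, b in zip(values, values[1:])]
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    phi = num / den if den else 0.0
    return values[-1] + phi * diffs[-1]


# SYNTHETIC yearly percentage shares, for illustration only.
years = list(range(1995, 2014))
values = [20.1, 20.4, 20.2, 20.9, 21.3, 21.1, 21.8, 22.0, 22.4, 22.1,
          22.9, 23.2, 23.0, 23.7, 24.1, 24.0, 24.6, 24.9, 25.3]

# Hold out 2013 as the test point; train on everything before it.
train_years, train_values = years[:-1], values[:-1]
actual = values[-1]

lr_pred = linear_trend_forecast(train_years, train_values, years[-1])
ma_pred = moving_average_forecast(train_values, window=3)
ar_pred = arima_110_forecast(train_values)

print(f"actual 2013 value:  {actual:.2f}")
print(f"linear regression:  {lr_pred:.2f}")
print(f"moving average (3): {ma_pred:.2f}")
print(f"ARIMA(1,1,0):       {ar_pred:.2f}")
```

With so few points, a single held-out year gives only a rough indication of which method works better; the ranking can easily change with a different window or model order.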


Fig 1. Plots of year-wise trend of each sub-category of crimes.

Comparison of predicted values:



Table 1 shows the predicted value of each sub-category for the year 2013 alongside the actual value for 2013. It can be clearly inferred from Table 2 that predicting values using a rolling average gives the best results, closely followed by time series analysis (ARIMA).

If simply averaging recent years predicts the next year well, the crime shares change only slowly from year to year. This hints that a society can have a natural rate of crime, one that is intrinsically ingrained or results from the structural characteristics of the society, and that can persist even after stricter laws are implemented.
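Comparing forecasting methods on the held-out year requires an error measure. The article does not state which one was used, so the sketch below assumes absolute percentage error. The function names, the generic method labels, and all numbers are hypothetical placeholders; they are not values from Table 1 or Table 2.

```python
def absolute_percentage_error(actual, predicted):
    """Absolute error as a percentage of the actual value."""
    if actual == 0:
        raise ValueError("actual value is zero, percentage error is undefined")
    return abs(predicted - actual) / abs(actual) * 100.0


def rank_methods(actual, predictions):
    """Return (method, error) pairs sorted from smallest to largest error."""
    errors = {
        method: absolute_percentage_error(actual, predicted)
        for method, predicted in predictions.items()
    }
    return sorted(errors.items(), key=lambda kv: kv[1])


# PLACEHOLDER numbers for illustration only, NOT results from the article.
actual_2013 = 25.0
predictions_2013 = {
    "method A": 27.5,
    "method B": 24.5,
    "method C": 25.25,
}

ranking = rank_methods(actual_2013, predictions_2013)
for method, err in ranking:
    print(f"{method}: {err:.1f}% error")
```

Repeating this for each sub-category and averaging the errors per method would give one summary number per method, which is one way a comparison table could be built.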


Sociologists usually deduce that crime is deeply influenced by, and derived from, the biology and psychology of the offender. While this might be true for a few criminals, drawing conclusions from this premise could not be more incorrect. It would be silly to say that a state or district with a higher crime rate has a higher concentration of people with biological or psychological problems. This also serves as a good example of how, in data analysis, correlation does not imply causation. Sociological explanations can help us understand the crime rate prevalent in any of the sub-categories of crime and its possible causes. There are many social structure theories that explain the behaviour of crime in society and find the roots of crime in problems in society rather than in biological or psychological complications inside an individual.

Strain theory in sociology says that an increase in income inequality can push a person to resort to crime for profit. Such people have accepted the goal of creating wealth or success and try to achieve it by illegitimate means, as can be seen from the increasing trend of economic crimes in the country. While strain theory explains crimes committed for economic profit, it does not explain other crimes such as assault and crimes against women and children; an increase in crimes against women and children (as a percentage contribution to total crimes) can be seen over the years considered in the analysis. One possible explanation for these crimes from a sociological point of view comes from developmental theory, which states that developmental processes in the offender's life experiences mould the individual, and that what is learned from these experiences can later manifest itself in the form of crime.

It is also worth noting that data science is not just the quantitative analysis of data using complex algorithms and models; it also requires a qualitative approach to understand the data before applying complex models. This becomes increasingly important when dealing with smaller datasets.

Originally posted here. I am looking to connect with people working on, or interested in working on, similar data.
