Blog: Descriptive Versus Predictive Analytics

Generally speaking, descriptive analytics is the process of making sense of, or adding structure to, a data set, which at times can be very large. For this reason, most discussions of “analytics” in business are actually about descriptive analytics (Bertolucci, 2013). The most obvious example is when we run descriptive statistics at the beginning of a study and look at things like the range, mean, median, quartiles, skew and kurtosis. We’re gaining a picture of how the data breaks down. In some cases that may be all we’re looking for, but in most cases we’ll want to drill further into the data. For example, a clustering algorithm such as k-means lets us organize the data into groups whose members appear to have something in common with each other. Sometimes these groups reflect divisions that are clear in real life, and sometimes they are more like something we impose on the data because it’s helpful for our analysis.
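
To make the idea of “running descriptive statistics” concrete, here is a minimal Python sketch. The describe function and the order_values numbers are made up for illustration, and the skew and kurtosis use the simple population formulas (average cubed and fourth-power z-scores), which is only one of several conventions.

```python
import statistics


def describe(values):
    """Return a few descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    # Population skewness: the average cubed z-score.
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    # Excess kurtosis: the average fourth-power z-score, minus 3.
    kurtosis = sum(((v - mean) / sd) ** 4 for v in values) / n - 3
    return {
        "range": max(values) - min(values),
        "mean": mean,
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "skew": skew,
        "kurtosis": kurtosis,
    }


# Hypothetical order values with one unusually large order.
order_values = [12, 15, 15, 18, 20, 22, 25, 30, 95]
summary = describe(order_values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the mean (28) sits well above the median (20) and the skew comes out positive, which is the kind of “picture of how the data breaks down” that a single large value can distort.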

Predictive analytics, on the other hand, may also provide a look at the shape of the data, but it goes further: it allows us to identify a trend and make a mathematical prediction about a future event. In short, you’re analyzing the past (perhaps the very recent past, as in “real-time” data, but nevertheless the past) to forecast the future. Perhaps the easiest way to understand this is to think about regression techniques, where you’re identifying a trend line in the data. Its underlying mathematical formula allows you to make a prediction about what will happen in the future under similar conditions. Anyone who has taken algebra can understand the principle: once you’ve determined the formula of the model along with its coefficients, you simply plug in values for your independent variables and get a predicted value for the outcome (the dependent variable).
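
The “plug in a value” step can be sketched in a few lines of Python. This is only an illustration of simple least-squares regression with one independent variable. The fit_line and predict functions and the ad_spend and sales numbers are hypothetical, and the data was chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Plug a value of the independent variable into the fitted formula."""
    return intercept + slope * x


# Made-up history: advertising spend (independent) and sales (dependent).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)
forecast = predict(slope, intercept, 6)  # a spend level not yet observed
print(f"sales = {intercept:.1f} + {slope:.1f} * ad_spend")
print(f"forecast at ad_spend = 6: {forecast:.1f}")
```

Here the fitted formula is sales = 1 + 2 × ad_spend, so the forecast for a spend of 6 is 13. Real data would not sit perfectly on the line, and the prediction only holds “under similar conditions” to those in the past data.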

Where it gets a little bit confusing is when we bring “inference” into the mix, which as defined by Merriam-Webster involves “passing from statistical sample data to generalizations (as of the value of population parameters) usually with calculated degrees of certainty”. In other words, we’re using the data we have to make an educated guess, with some stated level of confidence, about something we haven’t observed directly, whether that is a wider population or what may happen in the future. This kind of inference applies broadly to analytics that are predictive, and also to those considered descriptive. For example, if Netflix uses some form of clustering to group users according to common taste, such as people who like foreign films, it is very likely going to use that descriptive data to inform the movies that it “recommends” to you. It is, in fact, making a prediction based on descriptive data. The prediction may not have a mathematical equation attached to it (or maybe it does, as Netflix may have more advanced tricks up its sleeve), but it is nonetheless a prediction. On the other hand, if Netflix were using a regression model, it might find a relationship between one group of variables and another, and make a quantified prediction about something.
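
The clustering-then-recommending idea can be sketched as follows. This is a toy illustration, not a description of how Netflix actually works: the viewers, their viewing hours, the starting centroids and the function names (nearest, kmeans, recommend) are all invented, and the k-means loop is a bare-bones version with fixed starting points.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, centroids, iterations=10):
    """Bare-bones k-means: assign points to centroids, then move centroids."""
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep a centroid that attracted no points
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


# Made-up weekly viewing hours per viewer: (foreign films, action films).
viewers = {"ana": (9, 1), "ben": (8, 2), "cho": (1, 9), "dev": (2, 8), "eli": (7, 1)}
points = list(viewers.values())

# Descriptive step: group viewers by taste, starting from two chosen viewers.
labels, centroids = kmeans(points, [points[0], points[2]])


def recommend(hours, centroids):
    """Predictive step: suggest the genre that dominates the nearest cluster."""
    foreign, action = centroids[nearest(hours, centroids)]
    return "foreign" if foreign > action else "action"


recommendation = recommend((6, 2), centroids)  # a new, unseen viewer
print(labels, recommendation)
```

The clustering itself only describes the existing viewers. The prediction happens when a new viewer is matched to a cluster and offered what that cluster tends to watch, with no regression equation involved.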

In summary, there seems to be some overlap between descriptive and predictive techniques, but the distinction may be similar to the one between supervised and unsupervised learning: one involves making a prediction based on past scenarios where we can identify a known outcome, and the other involves mapping out what has happened in the past.

Bertolucci, J. (2013, January 12). Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive. InformationWeek. Retrieved from: https://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279

Definition of Inference. (n.d.). Merriam-Webster. Retrieved on Aug. 14, 2018 from the Merriam-Webster website: https://www.merriam-webster.com/dictionary/inference