Interpolation is a mathematical technique to estimate values of an unknown function f(x) for specific arguments x in some interval [a, b], when a number of observed values f(x) are available within that interval. If you want to estimate values of f(x) when x is outside [a, b], the problem is called extrapolation. One methodology that could be used is OLS (ordinary least squares), which can also provide error bounds on the estimated values, for instance using model-free confidence intervals or other techniques.
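To make the distinction concrete, here is a minimal sketch of a straight-line OLS fit used both inside and outside the observed interval. The data are synthetic, the function names (fit_ols, predict_with_se) are made up for this illustration, and the error bound is the classical textbook standard error of prediction rather than the model-free intervals mentioned above.

```python
import math
import random

def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard deviation, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return intercept, slope, s, x_bar, sxx, n

def predict_with_se(model, x0):
    """Return the fitted value at x0 and the standard error of prediction."""
    intercept, slope, s, x_bar, sxx, n = model
    y_hat = intercept + slope * x0
    # The (x0 - x_bar)^2 term makes the error grow away from the data.
    se = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, se

# Synthetic observations on [a, b] = [0, 10] (assumed for illustration only).
rng = random.Random(0)
xs = [0.5 * i for i in range(21)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

model = fit_ols(xs, ys)
y_in, se_in = predict_with_se(model, 5.0)     # interpolation: inside [0, 10]
y_out, se_out = predict_with_se(model, 20.0)  # extrapolation: outside [0, 10]

print(f"x = 5  (interpolation): {y_in:.2f} +/- {1.96 * se_in:.2f}")
print(f"x = 20 (extrapolation): {y_out:.2f} +/- {1.96 * se_out:.2f}")
```

The 1.96 multiplier is a normal approximation to a 95% band. The formula and the code are identical in both cases; only the location of x relative to [a, b] differs, and the band is wider outside the interval. Note that the wider band still assumes the linear relationship holds outside the observed range.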
Prediction - more specifically predictive modeling - is a technique based on statistical modeling to essentially compute the estimates that you can get via extrapolation. Likewise, it provides confidence bands for the estimated values.
So what is the difference between extrapolation and prediction? Do they provide different results? When is it best to use extrapolation versus prediction? In my opinion, there is no difference. It's just a question of context: whether the people doing the analysis call themselves mathematicians or statisticians.
What do you think?
Thanks for the article, it raises some interesting questions.
I would guess that prediction is just a special case of extrapolation concerning a special variable (time) and direction (future). Admittedly, it's the most commonly used case (especially in finance), but there are other types of extrapolation too (e.g. in engineering).
Another very important issue you mentioned is the confidence bands. Especially in finance, predictions are never exact but always lie within confidence bands. Here, the "bad" cases are normally more interesting than the "good" ones, since they determine the possible survival (or bankruptcy) of financial institutions. The whole field of (quantitative) risk management basically deals with the "edges" of these confidence bands.
I worked for several years in the field of risk management and think a lot of the concepts developed there can be useful in data science, and vice versa. I've summarized some of them in an article some time ago:
There's an additional problem as well, one that is common to most social science models.
The observed are aware that they are being observed.
So if you publish any result, they may either start striving towards it or "game the system" in order not to reach it.
Predictive modeling includes both interpolation and extrapolation.
I have always used the term "extrapolation" as a technical term meaning "Beware!" Not sure where I picked that habit up. But it's intended to telegraph the fact that by operating outside the range of known data, we are adding a huge, untestable assumption into the mix ... namely that the data relationships found in the known ranges for the inputs are also valid outside those ranges.
For time series predictions extending into the future, the caveats about extrapolation are implied.
In practice, extrapolation/interpolation has been used when looking at single variables with fewer data points, whereas prediction or predictive modeling usually involves a larger number of variables and data points.
Prediction in the AI space is very different. In the AI space, forecasting (through extrapolation) is different from prediction, since prediction requires the same event to have occurred in the past so that a model can learn from it and then, based on new data points, predict the likelihood of that event. For example:
Forecasting question: What will the temperature be tomorrow? Forecasting answer: 65 degrees, based on extrapolation of historical data.
Prediction question: What is the likelihood that it will be over 60 degrees tomorrow? Prediction answer: (once a model has been trained with historical data) there is a 75% probability that it will be over 60 degrees tomorrow.
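As a rough sketch of the two kinds of answer, the code below produces both from the same made-up week of temperatures: a point forecast by extrapolating a linear trend one day ahead, and a probability of exceeding 60 degrees. The readings are hypothetical, and the probability comes from an assumed normal error around the trend line, which is only one of many ways such a model could be built.

```python
import math
from statistics import NormalDist

# Hypothetical daily temperatures for the last 7 days (illustration only).
temps = [58, 60, 59, 62, 61, 63, 64]
days = list(range(len(temps)))

# Least-squares linear trend through the observed days.
n = len(days)
x_bar = sum(days) / n
y_bar = sum(temps) / n
sxx = sum((x - x_bar) ** 2 for x in days)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, temps))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Forecasting answer: a single number, extrapolated one day ahead.
tomorrow = n
forecast = intercept + slope * tomorrow

# Prediction answer: a probability, assuming normally distributed errors
# with the classical standard error of prediction for a linear fit.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(days, temps))
s = math.sqrt(sse / (n - 2))
se = s * math.sqrt(1 + 1 / n + (tomorrow - x_bar) ** 2 / sxx)
prob_over_60 = 1 - NormalDist(mu=forecast, sigma=se).cdf(60)

print(f"Forecast for tomorrow: {forecast:.1f} degrees")
print(f"Estimated probability of exceeding 60 degrees: {prob_over_60:.3f}")
```

In this sketch both answers come from the same fitted line, so the difference is in what is reported (a value versus a likelihood) rather than in the underlying data.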
If you include simulation models in the predictive camp, then extrapolation and predictive modeling are very different. With simulation modeling, you create predictions from first principles. Most of the supercomputing cycles in the world are occupied with predictive modeling of biochemical, fusion, and fission reactions, and none of these models extrapolate from known data.
Reply by Jacob Zahavi
There’s quite a difference between interpolation/extrapolation and prediction. The former belongs to the realm of explanatory models, the latter to the realm of predictive analytics.
What is the difference?
Explanatory models, often involving linear regression, are concerned with explaining a given phenomenon and finding causal relationships between an output (dependent) variable and a set, often very small, of input (independent) variables. The objective is to find a good regression model that fits the data very well and meets the underlying assumptions of linear regression. The emphasis here is on hypothesis testing, p-values, confidence intervals, and so on. Once a good model is found, one can use it to estimate the value of the output variable for given values of the input variables. It is OK to estimate an output value based on interpolation, but one must use extreme caution in estimating output values based on extrapolation, because the regression model is an explanatory model, not a predictive one.
Predictive models, on the other hand, are concerned with predicting the output values of new observations. While regression models may also be harnessed for prediction, they are then quite different from explanatory models. The objective is not to find a regression model with a perfect fit, but to find a model with high predictive power and no overfitting, even if that requires relaxing the significance level for including or excluding input variables. Prediction models use observed and measurable variables, usually many of them, which requires advanced feature selection techniques to produce a model with high predictive power that generalizes well to new observations. The prediction accuracy is usually assessed by means of a holdout (validation) sample rather than by checking the validity of the model assumptions and testing hypotheses.
Indeed, prediction models and explanatory models are not the same: prediction models are less strict about model assumptions and focus instead on prediction accuracy and avoiding overfitting. More on the differences between explanatory models and prediction models can be found in the paper by Shmueli, G., and O. Koppius, "Predictive Analytics in Information Systems Research", MIS Quarterly, vol. 35, issue 3, pp. 553-572, 2011.
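The holdout idea can be sketched in a few lines. The example below is an illustration under assumed synthetic data, not a recipe from the paper cited in this reply: it compares a straight-line fit with a one-nearest-neighbor "memorizer" that reproduces the training data perfectly, and scores both on a holdout sample.

```python
import random

def fit_line(points):
    """Least-squares line through (x, y) points; returns (intercept, slope)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def nearest_neighbor(train, x0):
    """Return the y of the training point whose x is closest to x0."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def mse(predict, points):
    """Mean squared error of a predictor on a set of (x, y) points."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)

# Synthetic data: a linear signal plus noise (assumed for illustration).
rng = random.Random(42)
data = []
for _ in range(1000):
    x = rng.uniform(0, 10)
    data.append((x, 1.0 + 2.0 * x + rng.gauss(0, 1)))

# Hold out 30% of the observations; the models never see them during fitting.
train, holdout = data[:700], data[700:]

intercept, slope = fit_line(train)
line = lambda x: intercept + slope * x
memorizer = lambda x: nearest_neighbor(train, x)

line_train, line_hold = mse(line, train), mse(line, holdout)
nn_train, nn_hold = mse(memorizer, train), mse(memorizer, holdout)

print(f"line:      train MSE {line_train:.2f}, holdout MSE {line_hold:.2f}")
print(f"memorizer: train MSE {nn_train:.2f}, holdout MSE {nn_hold:.2f}")
```

The memorizer has a perfect in-sample fit (zero training error) because it has also fitted the noise, which is exactly the overfitting that a holdout sample is meant to expose.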