In my previous posts, I compared model evaluation techniques using Statistical Tools & Tests and commonly used Classification and Clustering evaluation techniques.

In this post, I'll take a look at how you can compare regression models. Comparing regression models is perhaps one of the trickiest tasks in the "comparing models" arena, because there are literally *dozens* of statistics you can calculate to compare regression models, including:

1. **Error measures in the estimation period** (in-sample testing) or **validation period** (out-of-sample testing):

- Mean Absolute Error (MAE),
- Mean Absolute Percentage Error (MAPE),
- Mean Error,
- Root Mean Squared Error (RMSE),

2. **Tests on Residuals and Goodness-of-Fit:**

- **Plots**: actual vs. predicted value; cross correlation; residual autocorrelation; residuals vs. time/predicted values,
- **Changes** in mean or variance,
- **Tests**: normally distributed errors; excessive runs (e.g. of positives or negatives); outliers/extreme values/influential observations.
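To make the error measures from the first group concrete, here is a minimal sketch in plain Python. The function name `error_measures` and the sample numbers are made up for illustration; the formulas are the standard textbook definitions, with the error taken as actual minus predicted.

```python
import math

def error_measures(actual, predicted):
    """Return ME, MAE, RMSE and MAPE for two equal-length sequences.

    Hypothetical helper for illustration only.
    MAPE is undefined when any actual value is zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,                               # signed bias
        "MAE": sum(abs(e) for e in errors) / n,              # average miss
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),   # penalizes big misses
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

# Invented data: compute the measures on a hold-out (validation) sample.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 380]
measures = error_measures(actual, predicted)
print(measures)
```

Running the same function on the estimation sample and on a validation sample gives you the in-sample and out-of-sample versions of each measure; a model whose validation errors are much worse than its estimation errors is probably overfitting.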

This list isn't exhaustive--there are many other tools, tests and plots at your disposal. Rather than discuss the statistics in detail, I chose to focus this post on **comparing a few of the most popular regression model evaluation techniques** and discuss when you might want to use them (or when you might *not* want to). The techniques listed below tend to be on the "easier to use and understand" end of the spectrum, so if you're new to model comparison it's a good place to start.

The first question you should be asking is: *How well do I know my data?* In order to evaluate regression models, you need to know what results would be reasonable for your particular situation. For example, if you compare changes in mean or variance, one model might give you impossible results, while another might be overly complicated for the task at hand. The ideal model isn't just "correct"; it also needs to be relatively simple and useful for the decision making process--something that won't be immediately obvious unless you know your data really well.

Which technique you choose is largely dependent on the software you have at hand (e.g. R, SPSS, or Excel). If you're using Excel, a word of advice: **stop**. It was never designed for serious statistical work and has significant statistical problems. Duke University's Robert Nau puts it best: *"It's a toy (a clumsy one at that), not a tool for serious work."* The number of models you're testing also comes into play. Arguably, which statistic you use (r-squared, p-values etc.) is mostly personal preference (although the test you use might force that choice upon you). Each of the statistics has its advantages and disadvantages. I won't be bloating this article out with all the comparisons between the statistics, but if you're interested I've linked where possible to articles that explain those in detail.

Nested models are models that are subsets of one another; if you can get one model by constraining the parameters of another (for example, by setting some of its coefficients to zero), then that model is nested within the other. Nested and non-nested models call for different evaluation techniques, and there isn't a single, agreed-upon way to test for the "best" model.

Possibly the **easiest** way to compare nested models (it requires only a very basic understanding of statistics) is to simply measure how well each model performs reclassification. The "better" model will have higher rates of correct reclassification. A chi-square analysis can be used, although if you run a test for sphericity you must use a different chi-square value.

If you're comparing nested models (perhaps you want to know if the simplest model is adequate), you can compare them with a **t-statistic**. You can only run a test for significance against a single extra coefficient. In other words, you can't run it if you have more than one additional coefficient from one model to the next. This article has instructions in R, as well as a fairly detailed overview on running the *general regression test* or the *extra sum of squares test.*

According to Calvin Garbin of the University of Nebraska Lincoln, with **SPSS** you can compare nested models in two different ways using **r-squared:**

- Get the multiple regression results for each model, then compare the models using the FZT Computator's *R²-change F-test*.
- Change from one model to another in SPSS, calculating the *R²-change F-test*. Although convenient, this doesn't always calculate the statistic correctly.

Garbin's article has a couple of excellent examples of how to perform the above tasks as well as SPSS procedures for comparing non-nested models using correlations.
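If you want to check an R²-change F-test by hand, the usual textbook form can be sketched as below. This is an illustration under stated assumptions, not a reproduction of the FZT Computator or of SPSS output: the function name `r2_change_f` and the sample values are hypothetical, `k` counts predictors excluding the intercept, and both models are assumed to be fit to the same `n` cases.

```python
def r2_change_f(r2_small, r2_large, k_small, k_large, n):
    """F statistic for the increase in R-squared between two nested models.

    Hypothetical helper for illustration. k_small and k_large are the numbers
    of predictors (intercept excluded); n is the sample size shared by both
    models. Returns (F, numerator df, denominator df).
    """
    df_num = k_large - k_small
    df_den = n - k_large - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("larger model needs more predictors and n > k_large + 1")
    f_stat = ((r2_large - r2_small) / df_num) / ((1.0 - r2_large) / df_den)
    return f_stat, df_num, df_den

# Invented values: adding two predictors raises R-squared from 0.30 to 0.40.
f_stat, df_num, df_den = r2_change_f(0.30, 0.40, k_small=2, k_large=4, n=105)
print(f"F({df_num}, {df_den}) = {f_stat:.3f}")
```

The resulting F value is then compared against an F distribution with the two degrees of freedom shown, using a table or your statistics package.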

An ANOVA F-test can compare two nested models, where one is a subset of the other. It can test a single additional predictor variable or several predictors at a time.
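As a rough sketch of what the nested-model F-test computes, the example below compares an intercept-only model with a one-predictor line using residual sums of squares. Everything here is illustrative: the helper names and the data are invented, and only the simplest nested pair is fitted so the code can stay in plain Python. In practice your statistics package would fit the models and report the p-value.

```python
def rss_intercept_only(y):
    """Residual sum of squares when every prediction is the mean of y."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def rss_simple_line(x, y):
    """Residual sum of squares for the least-squares line y = a + b*x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def extra_ss_f(rss_small, rss_large, p_small, p_large, n):
    """Extra-sum-of-squares F statistic; p counts all fitted coefficients,
    intercept included. Returns (F, numerator df, denominator df)."""
    df_num = p_large - p_small
    df_den = n - p_large
    f_stat = ((rss_small - rss_large) / df_num) / (rss_large / df_den)
    return f_stat, df_num, df_den

# Invented data with a clear linear trend.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
rss_small = rss_intercept_only(y)   # smaller (constrained) model
rss_large = rss_simple_line(x, y)   # larger model with one extra coefficient
f_stat, df_num, df_den = extra_ss_f(rss_small, rss_large, 1, 2, len(y))
print(f"F({df_num}, {df_den}) = {f_stat:.1f}")
```

A large F means the extra coefficients reduce the residual sum of squares by more than chance alone would explain. The same `extra_ss_f` function works unchanged when the larger model adds several predictors at once; only `p_large` and the two RSS values change.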

**Multiple models** can be compared using forward selection, backward elimination, or stepwise selection. Basically, these are all variants of each other: they add or remove predictors one at a time, typically dropping the predictor with the smallest F-value / t-value (or the largest associated p-value). These techniques can only be used on nested models, and they can all miss optimal models; if you run all three on the same models, they may not agree with each other.

Non-nested models have fewer options for comparison between models. Because the models aren't nested, neither are your results (e.g. a chi-square statistic), so they can't be compared directly. In layman's terms, if your models are nested then you're comparing apples to apples, which is much easier than comparing apples to oranges.

One of the simplest comparison methods is the **Bayesian information criterion.** Despite the daunting math behind the calculations, most statistical software will calculate the BIC for each model. This leaves you to simply interpret the results: the model with the lowest BIC is considered the best. It's often preferred over other Bayesian methods like Bayes Factors, because BIC doesn't require you to have knowledge about priors.

**Akaike’s Information Criterion** (AIC) is similar to BIC, except that the BIC tends to favor models with fewer parameters. AIC ranks each model from best to worst. A major downside is that it doesn't say anything about absolute *quality*; it will choose the "best" even if you input a series of poor quality models.
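Your software will normally report both criteria, but a small sketch shows why they can disagree. The version below assumes ordinary least squares with normally distributed errors, in which case both criteria can be written in terms of the residual sum of squares, up to a constant that is the same for every model fit to the same data. The function name and the numbers are invented for illustration.

```python
import math

def gaussian_aic_bic(rss, n, k):
    """AIC and BIC for a least-squares model, up to an additive constant.

    Assumes Gaussian errors. rss is the residual sum of squares, n the number
    of observations, k the number of estimated coefficients. Only differences
    between models fit to the same data are meaningful.
    """
    fit_term = n * math.log(rss / n)
    aic = fit_term + 2 * k            # fixed penalty of 2 per parameter
    bic = fit_term + k * math.log(n)  # penalty grows with sample size
    return aic, bic

# Invented example: model B fits slightly better but uses two more parameters.
n = 50
aic_a, bic_a = gaussian_aic_bic(rss=120.0, n=n, k=3)
aic_b, bic_b = gaussian_aic_bic(rss=105.0, n=n, k=5)
print("AIC prefers", "A" if aic_a < aic_b else "B")
print("BIC prefers", "A" if bic_a < bic_b else "B")
```

With these made-up numbers the AIC picks the larger model B while the BIC picks the smaller model A, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is 8 or more.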

The benefit of the **Cox test** is that it's relatively simple (in comparison to the BIC or AIC) to understand what the test is doing behind the scenes. Let's say you were comparing models A and B. If model A contains the correct regressors, then fitting the regressors from model B to model A's fitted values should yield zero further explanatory value. If there is further explanatory value, then model A doesn't contain the correct regressor set. You run the test twice--the second time with the roles of A and B reversed--and compare your findings. See: Performing the Cox Test in R.

Non nested model selection criteria

Cox test for comparing non nested models

Quiz #3 Research Hypotheses that Involve Comparing Non-Nested Models

R. Davidson & J. MacKinnon (1981). Several Tests for Model Specification in the Presence of Alternative Hypotheses. *Econometrica*, **49**, 781-793.
