Multicollinearity (collinearity) is a familiar term to anyone who works with multiple regression models. It arises when predictor variables are strongly related to one another, and it is not limited to regression: models such as classification and regression trees and neural networks can be affected as well. Collinearity is notorious for inflating the variance of at least one estimated regression coefficient, which can make the model's estimates and predictions unreliable; in a business setting the consequences can be hard to undo.

So the next logical question is: how do we identify collinearity?

While various techniques are available to detect this problem, I rely on two that are readily available in most statistical applications:

- Variance Inflation Factor Identification Technique
- Best Subset Analysis – I wrote a specific article around this technique which can be acces....

In this article we will discuss only the Variance Inflation Factor (VIF) identification technique, which is very useful for identifying high multicollinearity among the predictor variables in multiple linear regression (MLR) models.

It is also important to understand that VIF ranges from 1 upwards. Read as a decimal, the amount by which a VIF exceeds 1 is the percentage by which the variance (that is, the squared standard error) of that coefficient is inflated.

Example:

A VIF of 1.9 indicates that the variance of a particular coefficient is 90% larger than it would be if that predictor were uncorrelated with the other predictors.
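The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the last helper assumes the standard textbook definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others (the article itself does not state this formula).

```python
import math

def variance_inflation_pct(vif: float) -> float:
    """Percentage by which the coefficient's variance is inflated."""
    return (vif - 1.0) * 100.0

def se_inflation_factor(vif: float) -> float:
    """Multiplier on the standard error (square root of the variance multiplier)."""
    return math.sqrt(vif)

def implied_r_squared(vif: float) -> float:
    """R^2 of the predictor regressed on the other predictors,
    assuming the standard definition VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif

vif = 1.9  # the value used in the example above
print(f"Variance inflated by about {variance_inflation_pct(vif):.0f}%")
print(f"Standard error multiplied by about {se_inflation_factor(vif):.2f}")
print(f"Implied R^2 against the other predictors: {implied_r_squared(vif):.2f}")
```

Note that the standard error grows more slowly than the variance, because it is the square root of it.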

Rules for identifying collinearity using the VIF technique:

- VIF values all near 1 indicate no collinearity between the predictor variables
- A VIF above 1 and up to 5 indicates moderate collinearity
- A VIF above 5 indicates serious collinearity

VIF values greater than 10 may indicate that multicollinearity is unduly influencing your regression results. In that case, consider removing unimportant predictors from your model.
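These rules of thumb are easy to apply in code. The sketch below is a hypothetical helper, not part of any statistical package. The article does not say how close to 1 counts as "near 1", so the `near_one_tol` default of 0.1 is purely an assumption, and the predictor names and VIF values in the usage example are made up for illustration.

```python
def classify_vif(vif: float, near_one_tol: float = 0.1) -> str:
    """Label a VIF value using the rules of thumb listed above.

    near_one_tol is an assumed cutoff for "near 1"; adjust to taste.
    """
    if vif <= 1.0 + near_one_tol:
        return "none"
    if vif <= 5.0:
        return "moderate"
    if vif <= 10.0:
        return "serious"
    return "serious, may be unduly influencing results"

# Made-up VIF values for three hypothetical predictors.
example_vifs = {"x1": 1.05, "x2": 3.2, "x3": 12.4}
for name, value in example_vifs.items():
    print(f"{name}: VIF = {value} -> {classify_vif(value)}")
```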

In Minitab, variance inflation factors can be obtained by running a multiple regression analysis via Stat > Regression > Regression > Fit Regression Model.

The snapshot below shows the VIF results for a model in which the analyst predicts student test scores from study hours and alcohol consumption.
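For readers without Minitab, the VIF for a two-predictor model like this one can be computed by hand. The sketch below assumes the standard definition VIF = 1 / (1 - R²); with only two predictors, R² is simply the squared Pearson correlation between them, so both predictors share the same VIF. The function names are hypothetical, and the numbers are synthetic values invented for illustration, not the data behind the snapshot.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model.

    Uses VIF = 1 / (1 - r^2); returns infinity for perfect collinearity.
    """
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - r_squared)

# Synthetic, made-up values (NOT the data from the article's snapshot).
study_hours = [2, 4, 5, 7, 8, 10, 11, 13]
alcohol_units = [9, 7, 8, 5, 6, 3, 4, 1]

print(f"VIF for both predictors: {vif_two_predictors(study_hours, alcohol_units):.2f}")
```

With three or more predictors this shortcut no longer applies: each predictor has to be regressed on all the others to obtain its own R², which is what statistical packages do internally.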

Conclusion:

Multicollinearity is a problem, but it is also an opportunity to improve the predictive power of a model. VIF identification is an effective and widely used procedure for doing so in multiple linear regression models, helping analysts spot large variance inflation factors “without a sweat”.


© 2019 Data Science Central ® Powered by
