The completeness of a linear regression model is an often-discussed issue. Leaving out relevant variables __might__ make the coefficient estimates inconsistent.

But why on earth?!

Assume a complete linear model of the form:

*z = a + bx + cy + ε.*

Here *z* is the dependent variable, *x* and *y* are the independent variables, and *ε* is the error term.

Now we drop *y* to see which terms are affected. Removing one dimension turns the regression plane into a regression line. In the initial three-dimensional space this line (the incomplete model) sits at the center of *y*, more precisely at *ȳ*, the mean of *y*. Leaving out *y* therefore changes both the intercept *a* and the error term *ε*.

Setting *x = 0* and *y = 0* in the initial estimated model (without *ε*) gives *a*. To obtain the new intercept (*α*), *a* must be shifted from *y = 0* to *y = ȳ*:

*α = a + cȳ.*

The explanatory contribution of *y* no longer appears as a regressor, so it moves into the error term. This gives a larger error term (*u*):

*u = ε + c(y - ȳ).*

So, the incomplete model

*z = α + bx + u*

consists of

*z = a + cȳ + bx + ε + c(y - ȳ).*

Expanding the parentheses cancels *cȳ* and recovers the initial model *z = a + bx + cy + ε*.
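The decomposition can be checked numerically. The following is a minimal sketch, assuming NumPy and made-up values for the coefficients (*a* = 1, *b* = 2, *c* = 3) and for the distributions of *x*, *y* and *ε*; none of these numbers come from the derivation itself.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 1_000

# Assumed, purely illustrative coefficients of z = a + b*x + c*y + eps
a, b, c = 1.0, 2.0, 3.0

x = rng.normal(size=n)
y = rng.normal(loc=5.0, size=n)
eps = rng.normal(scale=0.5, size=n)

# Complete model
z = a + b * x + c * y + eps

# Incomplete model: y is absorbed into the intercept and the error term
y_bar = y.mean()
alpha = a + c * y_bar        # new intercept
u = eps + c * (y - y_bar)    # new error term
z_incomplete = alpha + b * x + u

# Both expressions describe the same z, up to floating-point rounding
identity_holds = np.allclose(z, z_incomplete)
print("identity holds:", identity_holds)

# The error term of the incomplete model has the larger variance
print("var(eps):", eps.var())
print("var(u):  ", u.var())
```

Because *y - ȳ* averages to zero, *u* keeps the same mean as *ε*; only its spread grows.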

__Assume there is a correlation between x and y__

For the initial (complete) model, this does not threaten consistency. *However, multicollinearity can cause variance inflation.* For the incomplete model, in contrast, the independent variable *x* is correlated with the error term *u*, because *u* contains *c(y - ȳ)*, and this can make the estimate of *b* inconsistent.
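A short worked derivation makes the size of the problem explicit. It is a sketch under two added assumptions: *x* is uncorrelated with *ε*, and *ȳ* is treated as a constant. The symbol b̂ for the OLS slope of the incomplete model is notation introduced here, not part of the model above.

```latex
\begin{aligned}
\hat{b} \;\xrightarrow{\;p\;}\; \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
  &= \frac{\operatorname{Cov}(x,\; \alpha + bx + u)}{\operatorname{Var}(x)} \\
  &= b + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)} \\
  &= b + \frac{\operatorname{Cov}\bigl(x,\; \varepsilon + c\,(y - \bar{y})\bigr)}{\operatorname{Var}(x)} \\
  &= b + c\,\frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
\end{aligned}
```

The extra term vanishes only if *c = 0* (then *y* is not a relevant variable) or if *x* and *y* are uncorrelated.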

Thus, if the omitted variable(s) and the variable(s) kept in the model are uncorrelated, omitting them does not harm consistency. *Unless the endogeneity comes from measurement error or reverse causality. But that is another story…*
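A small simulation can illustrate both cases. This is a sketch, not a proof: it assumes NumPy, standard normal *x*, *y* and *ε*, and made-up coefficients (*a* = 1, *b* = 2, *c* = 3). The helper `short_regression_slope` and its parameter `rho` (the correlation between *x* and *y*) are hypothetical names introduced for this example.

```python
import numpy as np

def short_regression_slope(rho, n=200_000, a=1.0, b=2.0, c=3.0, seed=0):
    """Fit z on x alone (y omitted) and return the estimated slope.

    rho is the assumed correlation between x and y; all defaults are
    illustrative values, not taken from the article.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # y has unit variance and correlation rho with x
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    eps = rng.normal(size=n)
    z = a + b * x + c * y + eps  # complete model generates the data

    # OLS of z on a constant and x only; y is deliberately left out
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    return coef[1]

slope_uncorrelated = short_regression_slope(rho=0.0)
slope_correlated = short_regression_slope(rho=0.5)

print("true b:                     2.0")
print("slope, x and y uncorrelated:", round(slope_uncorrelated, 3))
print("slope, corr(x, y) = 0.5:    ", round(slope_correlated, 3))
```

With uncorrelated *x* and *y* the estimated slope should land close to the true *b*. With a correlation of 0.5 it should drift toward *b + c · Cov(x, y) / Var(x)*, which is 3.5 for these assumed values, and a larger sample does not remove that gap.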


© 2019 Data Science Central ®
