Econometrics is fundamental to many of the problems that data scientists care about, and it requires many skills. There's philosophical skill, for thinking about whether fixed effects or random effects models are more appropriate, for example, or what the direction of causality in a particular problem is. There's some coding, including knowing the right commands to interact with statistical programs like Stata or R, and how to interpret their output. There's the intuition to know which policy issues are worth researching, the political skill to obtain data or grant money, even the writing skill to communicate ideas. And "beneath" it all there is linear algebra: matrix formulas for the estimators that are reported, interpreted, and acted on. A person can succeed as an economist or data scientist without having all of the skills listed above. However, it's always helpful to know more: to understand what Stata is doing when you have it run a particular type of regression, or to inform your decisions about which models are most appropriate, or just to understand why an estimate came out differently from what you expected.
The purpose of this post is to outline the linear algebra of some popular regression strategies. It is essentially an extremely short summary of parts of Jeffrey Wooldridge's authoritative econometrics textbook, the text that I use most often.
The estimator for beta is
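Assuming this refers to ordinary least squares, the usual textbook form is beta_hat = (X'X)^(-1) X'y, where X is the n x k matrix of regressors and y is the n-vector of outcomes. Here is a minimal NumPy sketch of that formula. The simulated data and the names `ols`, `beta_true`, and `beta_hat` are illustrative assumptions, not anything taken from Wooldridge's text.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)


def ols(X, y):
    """Return beta_hat = (X'X)^(-1) X'y by solving the normal equations."""
    # Solving (X'X) b = X'y avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


beta_hat = ols(X, y)
print(beta_hat)
```

Solving the normal equations with `np.linalg.solve` gives the same answer as inverting X'X and multiplying, and it is usually the more numerically stable choice.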
The bias in the equation is given by:
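One common way to write this, assuming the model y = X beta + u, is beta_hat - beta = (X'X)^(-1) X'u, so that the conditional bias is E[beta_hat | X] - beta = (X'X)^(-1) X' E[u | X], which is zero when E[u | X] = 0. The sketch below checks the first identity numerically on simulated data. That this is the intended meaning of "bias" is my assumption, and all of the names are illustrative.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
u = rng.normal(size=n)          # error term, independent of X here
y = X @ beta_true + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The gap between the estimate and the truth is exactly (X'X)^(-1) X'u.
sampling_error = np.linalg.solve(X.T @ X, X.T @ u)

print(beta_hat - beta_true)
print(sampling_error)
```

The two printed vectors agree up to floating-point error, because the identity is algebraic rather than statistical. Bias arises only when the expectation of X'u is not zero, for example when a regressor is correlated with the error.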
Here is the equation for homoskedastic standard errors, given by a variance matrix V:
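In the usual textbook form, the homoskedastic variance matrix is V = sigma2_hat * (X'X)^(-1), where sigma2_hat = u_hat'u_hat / (n - k) and u_hat is the vector of OLS residuals. The standard errors are the square roots of the diagonal of V. Here is a minimal sketch under those assumptions, with simulated data and illustrative names.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat                      # OLS residuals

# Degrees-of-freedom-corrected estimate of the error variance.
sigma2_hat = (u_hat @ u_hat) / (n - k)

# V = sigma2_hat * (X'X)^(-1); standard errors are sqrt of its diagonal.
V = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(V))
print(se)
```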
Here is the variance matrix V for heteroskedasticity-robust standard errors:
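A common version of this is the White "sandwich" form, V = (X'X)^(-1) [X' diag(u_hat^2) X] (X'X)^(-1), with no finite-sample correction. Statistical packages often apply small-sample adjustments on top of this, so treat the sketch below as one variant rather than what any particular program reports. The data and names are illustrative assumptions.

```python
import numpy as np

# Simulated data with error variance that depends on a regressor
# (an assumption for illustration only).
rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
u = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))
y = X @ np.array([1.0, 2.0, -0.5]) + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

XtX_inv = np.linalg.inv(X.T @ X)              # the "bread"

# The "meat": X' diag(u_hat^2) X, computed without building the n x n diagonal.
meat = (X * (u_hat ** 2)[:, None]).T @ X

V_robust = XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(V_robust))
print(se_robust)
```

Scaling the rows of X by the squared residuals gives the same middle matrix as the explicit diagonal form while avoiding an n x n array.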
The fixed effects estimator is:
where F and g are time-demeaned matrices:
which (according to Wooldridge, and comically in my opinion) is "easily seen to be a T × T symmetric, idempotent matrix with rank T − 1."
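To make that claim less comical, here is a sketch that builds the time-demeaning matrix Q = I_T - j (j'j)^(-1) j', where j is a T-vector of ones, and then computes the fixed effects estimator as (F'F)^(-1) F'g with F and g the time-demeaned regressors and outcomes stacked over individuals. The panel layout (arrays of shape N x T x K and N x T), the simulated data, and all of the names are my assumptions for illustration.

```python
import numpy as np

# Simulated balanced panel (an assumption for illustration only).
rng = np.random.default_rng(4)
N, T, K = 500, 5, 2
c = rng.normal(size=N)                                  # individual effects
X = c[:, None, None] + rng.normal(size=(N, T, K))       # regressors correlated with c
beta_true = np.array([1.5, -0.7])
y = X @ beta_true + c[:, None] + rng.normal(size=(N, T))

# Time-demeaning matrix: Q = I_T - j (j'j)^(-1) j', with j a column of ones.
j = np.ones((T, 1))
Q = np.eye(T) - j @ np.linalg.inv(j.T @ j) @ j.T

# Demean each individual's T x K block and T-vector.
F = Q @ X                  # matmul broadcasts Q over the N individuals
g = y @ Q                  # rows are (Q y_i)' because Q is symmetric

# Stack over individuals and apply (F'F)^(-1) F'g.
F_stacked = F.reshape(N * T, K)
g_stacked = g.reshape(N * T)
beta_fe = np.linalg.solve(F_stacked.T @ F_stacked, F_stacked.T @ g_stacked)

print(np.linalg.matrix_rank(Q))    # rank of Q
print(beta_fe)
```

Multiplying by Q is the same as subtracting each individual's time average, which is why the individual effect c drops out of the demeaned equation.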
I hope these equations are helpful. Knowing these can give you power to do econometrics better, to solve more problems, to make better decisions about models, and to speak with more confidence about what your models are estimating. Econometrics cannot be learned all at once. You have to be patient, and learn bit by bit, line upon line, until you eventually reach the level of competence you want or need. Good luck to you as you continue to learn!
This post was originally written for bradfordtuckfield.com.