Econometrics is fundamental to many of the problems that data scientists care about, and it requires many skills. There's philosophical skill, for thinking about whether fixed effects or random effects models are more appropriate, for example, or what the direction of causality in a particular problem is. There's some coding, including knowing the right commands to interact with statistical programs like Stata or R, and how to interpret their output. There's the intuition to know which policy issues are worth researching, the political skill to obtain data or grant money, even the writing skill to communicate ideas. And "beneath" it all there is linear algebra: matrix formulas for the estimators that are reported, interpreted, and acted on.

A person can succeed as an economist or data scientist without having all of the skills listed above. However, it's always helpful to know more: to understand what Stata is doing when you have it run a particular type of regression, or to inform your decisions about which models are most appropriate, or just to understand why an estimator came out differently from how you thought it should be.

The purpose of this post is to outline the linear algebra of some popular regression strategies. It is essentially an extremely short summary of parts of Jeffrey Wooldridge's authoritative econometrics textbook, the text that I use most often.

Equations for the single equation model (Wooldridge, Chapter 4):
The theoretical value of beta (the marginal effect coefficient) in a linear model is
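Under the population model y = xβ + u with E(x'u) = 0, where x is a 1×K row vector of regressors and E(x'x) has rank K, the standard expression is the following. The symbols are assumed here to match the conventions of Wooldridge's Chapter 4.

```latex
\beta = \left[ \mathrm{E}(x'x) \right]^{-1} \mathrm{E}(x'y)
```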

The estimator for beta is
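In the usual notation (assumed here), x_i is the 1×K regressor row for observation i, X is the N×K matrix that stacks those rows, and Y is the N×1 vector of outcomes. The ordinary least squares estimator replaces the population moments with sample sums:

```latex
\hat{\beta}
  = \left( \sum_{i=1}^{N} x_i' x_i \right)^{-1} \left( \sum_{i=1}^{N} x_i' y_i \right)
  = (X'X)^{-1} X'Y
```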

The bias in the estimator is given by:
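Substituting y_i = x_i β + u_i into the OLS formula gives the standard decomposition below. It is a restatement under assumed notation, and may not be the exact form the post originally displayed.

```latex
\hat{\beta}
  = \beta + \left( N^{-1} \sum_{i=1}^{N} x_i' x_i \right)^{-1}
            \left( N^{-1} \sum_{i=1}^{N} x_i' u_i \right)
```

The second term is the estimation error. It converges in probability to zero when E(x'u) = 0, and it does not vanish when the regressors are correlated with the error.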

Here is the equation for homoskedastic standard errors, given by a variance matrix V:
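When the error variance σ² = E(u² | x) is constant, the usual estimator is the following, where the û_i are the OLS residuals and the notation is assumed:

```latex
\hat{V} = \hat{\sigma}^2 (X'X)^{-1},
\qquad
\hat{\sigma}^2 = \frac{1}{N-K} \sum_{i=1}^{N} \hat{u}_i^2
```

Standard errors are the square roots of the diagonal elements of this matrix.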
Here is the heteroskedasticity-robust variance matrix V:
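The usual White form sandwiches the sum of û_i² x_i'x_i between two copies of (X'X)^{-1}, where û_i is the OLS residual for observation i. The sketch below computes the OLS coefficients together with the homoskedastic and the robust variance matrices on simulated data. The function name `ols_with_variances` and the data-generating process are illustrative assumptions, and the robust matrix is the plain HC0 version with no small-sample correction.

```python
import numpy as np

def ols_with_variances(X, y):
    """Return OLS coefficients, the homoskedastic variance matrix,
    and the White (HC0) heteroskedasticity-robust variance matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y              # (X'X)^{-1} X'y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k)      # SSR / (N - K)
    V_homo = sigma2_hat * XtX_inv
    meat = (X * resid[:, None] ** 2).T @ X    # sum_i u_i^2 x_i' x_i
    V_robust = XtX_inv @ meat @ XtX_inv       # sandwich form
    return beta_hat, V_homo, V_robust

# Simulated data (hypothetical): error variance grows with |x|.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
u = rng.normal(size=n) * (1.0 + np.abs(x))
y = X @ np.array([1.0, 2.0]) + u

beta_hat, V_homo, V_robust = ols_with_variances(X, y)
se_homo = np.sqrt(np.diag(V_homo))
se_robust = np.sqrt(np.diag(V_robust))
print("beta_hat:", beta_hat)
print("homoskedastic SEs:", se_homo)
print("robust SEs:", se_robust)
```

Both variance matrices share the same coefficient vector; only the estimate of its sampling variability changes.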

Here is the formula for the random effects estimator (from Chapter 10 in Wooldridge):
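For a balanced panel with T periods per unit, let X_i be the T×K regressor matrix and y_i the T×1 outcome vector for unit i. Let σ_u² be the variance of the idiosyncratic error and σ_c² the variance of the unobserved effect, with Ω̂ denoting Ω evaluated at estimates of both. In this notation, which is assumed to follow Wooldridge's Chapter 10, the feasible GLS form is usually written as:

```latex
\hat{\beta}_{RE}
  = \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} X_i \right)^{-1}
    \left( \sum_{i=1}^{N} X_i' \hat{\Omega}^{-1} y_i \right),
\qquad
\Omega = \sigma_u^2 I_T + \sigma_c^2 \, j_T j_T'
```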



and the j's are vectors of 1's.
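A minimal sketch of the random effects calculation for a balanced panel follows. It treats the two variance components as known, which is a simplifying assumption: a feasible estimator would first estimate them from residuals. The function name `random_effects_gls`, the array layout, and the simulated data are all illustrative.

```python
import numpy as np

def random_effects_gls(X, y, sigma2_u, sigma2_c):
    """GLS random effects estimate for a balanced panel.

    X has shape (N, T, K) and y has shape (N, T). The variance
    components are passed in as if known (an assumption of this sketch).
    """
    N, T, K = X.shape
    j = np.ones((T, 1))                                   # T x 1 vector of ones
    omega = sigma2_u * np.eye(T) + sigma2_c * (j @ j.T)   # T x T error covariance
    omega_inv = np.linalg.inv(omega)
    A = np.zeros((K, K))
    b = np.zeros(K)
    for i in range(N):
        A += X[i].T @ omega_inv @ X[i]                    # sum_i X_i' Omega^{-1} X_i
        b += X[i].T @ omega_inv @ y[i]                    # sum_i X_i' Omega^{-1} y_i
    return np.linalg.solve(A, b)

# Simulated balanced panel (hypothetical data-generating process).
rng = np.random.default_rng(1)
N, T, K = 200, 5, 2
X = rng.normal(size=(N, T, K))
c = rng.normal(size=(N, 1))          # unobserved effect, uncorrelated with X
u = rng.normal(size=(N, T))          # idiosyncratic error
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + c + u

beta_re = random_effects_gls(X, y, sigma2_u=1.0, sigma2_c=1.0)
print("random effects estimate:", beta_re)
```

Setting `sigma2_c` to zero makes Ω proportional to the identity matrix, so the same function then reproduces pooled OLS.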

The fixed effects estimator is:
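In the post's F and g notation, pooled OLS on the demeaned data gives the expression below. F_i and g_i are taken here to be the time-demeaned regressors and outcomes for unit i, which Wooldridge writes with double dots; that reading is an assumption about the missing display.

```latex
\hat{\beta}_{FE}
  = \left( \sum_{i=1}^{N} F_i' F_i \right)^{-1}
    \left( \sum_{i=1}^{N} F_i' g_i \right)
```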

where F and g are time-demeaned matrices:
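Assuming a balanced panel and letting j_T be a T×1 vector of ones, the demeaning can be written as premultiplication by a single matrix Q_T:

```latex
F_i = Q_T X_i,
\qquad
g_i = Q_T y_i,
\qquad
Q_T = I_T - j_T (j_T' j_T)^{-1} j_T'
```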
which (according to Wooldridge, and comically in my opinion) is "easily seen to be a TxT symmetric, idempotent matrix with rank T-1."
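The three properties in that quotation can be checked numerically, and the same matrix yields the fixed effects estimate. The sketch below assumes a balanced panel stored as arrays of shape (N, T, K) and (N, T). The helper names `demeaning_matrix` and `fixed_effects` and the simulated data are illustrative, not taken from the textbook.

```python
import numpy as np

def demeaning_matrix(T):
    """Return Q_T = I_T - j (j'j)^{-1} j', which subtracts time averages."""
    j = np.ones((T, 1))
    return np.eye(T) - j @ np.linalg.inv(j.T @ j) @ j.T

def fixed_effects(X, y):
    """Fixed effects (within) estimate for a balanced panel.

    X has shape (N, T, K) and y has shape (N, T).
    """
    N, T, K = X.shape
    Q = demeaning_matrix(T)
    F = Q @ X                                  # broadcasts to Q X_i for every unit
    g = y @ Q                                  # row i is (Q y_i)' because Q is symmetric
    A = np.einsum("itk,itl->kl", F, F)         # sum_i F_i' F_i
    b = np.einsum("itk,it->k", F, g)           # sum_i F_i' g_i
    return np.linalg.solve(A, b)

# Check the properties of Q_T for T = 4.
Q = demeaning_matrix(4)
print("symmetric: ", np.allclose(Q, Q.T))
print("idempotent:", np.allclose(Q @ Q, Q))
print("rank:", np.linalg.matrix_rank(Q))

# Simulated panel (hypothetical) in which regressors are correlated
# with the unobserved effect c.
rng = np.random.default_rng(2)
N, T, K = 200, 4, 2
c = rng.normal(size=(N, 1))
X = rng.normal(size=(N, T, K)) + c[:, :, None]
u = rng.normal(size=(N, T))
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + c + u

beta_fe = fixed_effects(X, y)
print("fixed effects estimate:", beta_fe)
```

Because Q_T removes anything that is constant within a unit, adding a unit-specific constant to y leaves the estimate unchanged.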

The asymptotic covariance matrix for a fixed effects estimation is
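The usual expression assumes idiosyncratic errors that are homoskedastic and serially uncorrelated with variance σ_u², which is the standard assumption in Wooldridge's Chapter 10. With F_i the time-demeaned regressors for unit i and symbols assumed as before, the expression and its estimator are:

```latex
\mathrm{Avar}(\hat{\beta}_{FE})
  = \sigma_u^2 \left[ \mathrm{E}(F_i' F_i) \right]^{-1} / N,
\qquad
\widehat{\mathrm{Avar}}(\hat{\beta}_{FE})
  = \hat{\sigma}_u^2 \left( \sum_{i=1}^{N} F_i' F_i \right)^{-1},
\qquad
\hat{\sigma}_u^2 = \frac{\mathrm{SSR}}{N(T-1) - K}
```

Here SSR is the sum of squared residuals from the regression on demeaned data, and the divisor N(T − 1) − K reflects the degrees of freedom lost to time-demeaning.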

I hope these equations are helpful. Knowing these can give you power to do econometrics better, to solve more problems, to make better decisions about models, and to speak with more confidence about what your models are estimating. Econometrics cannot be learned all at once. You have to be patient, and learn bit by bit, line upon line, until you eventually reach the level of competence you want or need. Good luck to you as you continue to learn!

This post was originally written for


Comment by Sione Palu on October 2, 2015 at 2:27pm

The following is relevant here. The fundamental theorems of econometrics, like those of neoclassical economics, were falsified by Osborne in the late 1970s, and yet economics faculties around the world still teach them.

Response to "Worrying Trends in Econophysics"

© 2017 Data Science Central