Missing variables

Summary

Missing variables

  • Given a random sample and a linear regression model.
  • If we leave out variables that ought to be included in our regression model, then those left-out variables are called missing variables.

Example

  • A random sample \(\left( y_i,x_{i,2} \right)\) and a linear regression model with \(x_2\) as the only explanatory variable. We may or may not have data on \(x_3\); in either case, it is not included in our regression.
  • Suppose that:

\[E\left( y \mid x_2,x_3 \right)=\beta_1+\beta_2x_2+\beta_3x_3 \]

  • but we believe that

\[E\left( y \mid x_2,x_3 \right)=\beta_1+\beta_2x_2 \]

  • then \(x_3\) is a missing variable.
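
A minimal simulation sketch of this example (the parameter values, sample size, and the positive correlation between \(x_2\) and \(x_3\) are hypothetical choices, not taken from these notes) shows what omitting \(x_3\) does to the OLS estimate of \(\beta_2\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical true parameters, for illustration only.
beta1, beta2, beta3 = 1.0, 2.0, 1.5

# x2 and x3 are positively correlated, so leaving out x3 matters.
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
y = beta1 + beta2 * x2 + beta3 * x3 + rng.normal(size=n)

# Short regression: y on a constant and x2 only (x3 is the missing variable).
X_short = np.column_stack([np.ones(n), x2])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# Long regression: y on a constant, x2 and x3 (correctly specified).
X_long = np.column_stack([np.ones(n), x2, x3])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

print("estimate of beta2 with x3 omitted :", b_short[1])  # about 3.2, biased
print("estimate of beta2 with x3 included:", b_long[1])   # about 2.0
```

With these choices the short regression settles near 3.2 rather than the true 2.0, and adding more observations does not help, which is the inconsistency described in the results below.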

Results

  • Setup: a linear regression model (LRM) with missing variables, where the Gauss-Markov (GM) assumptions hold given all variables (included and missing)
  • The OLS estimators are generally biased and inconsistent
  • Inference based on the OLS estimators will be incorrect
  • The OLS standard errors are inconsistent as well.
  • If all the included variables are independent of all the missing variables, then GM still holds for the regression on the included variables and OLS raises no issues.
  • Generally, the higher the correlation between the included and missing variables, the worse the bias and inconsistency.
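
For the two-variable example above, the standard omitted-variable-bias formula (a textbook result, not derived in these notes) makes the last point precise: if \(y\) is regressed on a constant and \(x_2\) alone while the true conditional mean also involves \(x_3\), then

\[\operatorname{plim}\hat{\beta}_2=\beta_2+\beta_3\frac{\operatorname{Cov}\left( x_2,x_3 \right)}{\operatorname{Var}\left( x_2 \right)} \]

so the estimator misses \(\beta_2\) by an amount proportional to \(\beta_3\) and to the covariance between the included and missing variables, and the bias vanishes only when that covariance is zero.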