Missing variables

Summary

Missing variables

  • Given a random sample and a linear regression model.
  • If we leave out variables that ought to be included in our regression model, then those left-out variables are called missing variables.

Example

  • A random sample \(\left( y_i,x_{i,2} \right)\) and a linear regression model with \(x_2\) as the only explanatory variable. We may or may not have data on \(x_3\); in either case, it is not included in our regression.
  • Suppose that:

\[E\left( y \mid x_2,x_3 \right)=\beta_1+\beta_2x_2+\beta_3x_3 \]

  • but we believe that

\[E\left( y \mid x_2,x_3 \right)=\beta_1+\beta_2x_2 \]

  • then \(x_3\) is a missing variable.
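
A minimal simulation sketch of this example (the parameter values, sample size, and the positive correlation between \(x_2\) and \(x_3\) are hypothetical choices, not taken from these notes) shows what omitting \(x_3\) does to the OLS estimate of \(\beta_2\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical true parameters, for illustration only.
beta1, beta2, beta3 = 1.0, 2.0, 1.5

# x2 and x3 are positively correlated, so leaving out x3 matters.
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
y = beta1 + beta2 * x2 + beta3 * x3 + rng.normal(size=n)

# Short regression: y on a constant and x2 only (x3 is the missing variable).
X_short = np.column_stack([np.ones(n), x2])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# Long regression: y on a constant, x2 and x3 (correctly specified).
X_long = np.column_stack([np.ones(n), x2, x3])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

print("estimate of beta2 with x3 omitted :", b_short[1])  # about 3.2, biased
print("estimate of beta2 with x3 included:", b_long[1])   # about 2.0
```

With these choices the short regression settles near 3.2 rather than the true 2.0, and adding more observations does not help, which is the inconsistency described in the results below.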

Results

  • Setup: a linear regression model (LRM) with missing variables, where the Gauss-Markov (GM) assumptions hold given all variables (included and missing)
  • The OLS estimators are generally biased and inconsistent
  • Inference based on the OLS estimators will be incorrect
  • The OLS standard errors are inconsistent as well.
  • If all the included variables are independent of all the missing variables, then GM still holds for the regression on the included variables and OLS raises no issues.
  • Generally, the higher the correlation between the included and missing variables, the worse the bias and inconsistency.
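
For the two-variable example above, the standard omitted-variable-bias formula (a textbook result, not derived in these notes) makes the last point precise: if \(y\) is regressed on a constant and \(x_2\) alone while the true conditional mean also involves \(x_3\), then

\[\operatorname{plim}\hat{\beta}_2=\beta_2+\beta_3\frac{\operatorname{Cov}\left( x_2,x_3 \right)}{\operatorname{Var}\left( x_2 \right)} \]

so the estimator misses \(\beta_2\) by an amount proportional to \(\beta_3\) and to the covariance between the included and missing variables, and the bias vanishes only when that covariance is zero.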