Missing variables
Summary
Missing variables
- Setting: a random sample and a linear regression model.
- If our regression model leaves out variables that ought to be included, those left-out variables are called missing (or omitted) variables.
Example
- A random sample \(\left( y_i,x_{i,2} \right)\) and a linear regression model with \(x_2\) as the only explanatory variable. We may or may not have data on \(x_3\); in either case, it is not included in our regression.
- Suppose that the true conditional mean is:
\[E\left( y \mid x_2,x_3 \right)=β_1+β_2x_2+β_3x_3 \]
- but we believe that
\[E\left( y \mid x_2,x_3 \right)=β_1+β_2x_2 \]
- then, provided \(β_3 \neq 0\), \(x_3\) is a missing variable.
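To see why leaving out \(x_3\) matters, here is a short derivation for the example, assuming the true model above holds with an error \(u\) satisfying \(E\left( u \mid x_2,x_3 \right)=0\) (so \(\mathrm{Cov}(x_2,u)=0\)) and \(\mathrm{Var}(x_2)>0\). Write \(\tilde β_2\) for the OLS slope from regressing \(y\) on \(x_2\) alone (the notation \(\tilde β_2\) is introduced here for illustration). Its probability limit is:

\[
\tilde β_2=\frac{\widehat{\mathrm{Cov}}(x_2,y)}{\widehat{\mathrm{Var}}(x_2)}
\xrightarrow{p}\frac{\mathrm{Cov}\left( x_2,\;β_1+β_2x_2+β_3x_3+u \right)}{\mathrm{Var}(x_2)}
=β_2+β_3\,\frac{\mathrm{Cov}(x_2,x_3)}{\mathrm{Var}(x_2)}
\]

The second term is the asymptotic bias: it vanishes when \(β_3=0\) or when \(x_2\) and \(x_3\) are uncorrelated, and for a fixed \(\mathrm{Var}(x_2)\) its size grows with \(\left|\mathrm{Cov}(x_2,x_3)\right|\).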
Results
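- Setup: a linear regression model (LRM) with missing variables, where the Gauss-Markov (GM) assumptions hold when all variables (included and missing) are in the model.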
- The OLS estimators are generally biased and inconsistent.
- Inference (t-tests, F-tests, confidence intervals) based on the OLS estimators will generally be invalid.
- The OLS standard errors are inconsistent as well.
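- If all the included variables are independent of all the missing variables, then GM holds for the model with only the included variables (the intercept absorbs the mean effect of the missing variables, e.g. \(β_3E(x_3)\) in the example, and the error variance is larger), so OLS estimation of the included slope coefficients has no issues.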
- Generally, the higher the correlation between included and missing variables, the worse the bias and inconsistency.
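The results above can be checked with a minimal Monte Carlo sketch in Python with NumPy. Everything about the data-generating process is an assumption chosen for illustration: the coefficients \((β_1,β_2,β_3)=(1,2,3)\), standard normal regressors and errors, and the hypothetical helper name `short_regression_slope`. With \(\mathrm{Var}(x_2)=1\) and \(\mathrm{Corr}(x_2,x_3)=\rho\), the omitted-variable formula predicts that the slope from regressing \(y\) on \(x_2\) alone tends to \(2+3\rho\).

```python
import numpy as np


def short_regression_slope(rho, n=200_000, beta=(1.0, 2.0, 3.0), seed=0):
    """OLS slope on x2 when y is regressed on x2 only, but the true model contains x3.

    Hypothetical data-generating process (assumption for illustration):
      x2 ~ N(0, 1), x3 = rho*x2 + sqrt(1 - rho**2)*z with z ~ N(0, 1),
      so Var(x2) = Var(x3) = 1 and Corr(x2, x3) = rho; u ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    b1, b2, b3 = beta
    x2 = rng.standard_normal(n)
    x3 = rho * x2 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    u = rng.standard_normal(n)
    y = b1 + b2 * x2 + b3 * x3 + u  # true model includes x3

    # Short regression: intercept and x2 only (x3 is the missing variable)
    X_short = np.column_stack([np.ones(n), x2])
    coef, *_ = np.linalg.lstsq(X_short, y, rcond=None)
    return coef[1]


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        slope = short_regression_slope(rho)
        # Probability limit from the omitted-variable formula: b2 + b3 * rho
        print(f"rho={rho:.1f}  OLS slope={slope:.3f}  plim={2.0 + 3.0 * rho:.3f}")
```

With \(\rho=0\), \(x_2\) and \(x_3\) are independent here, so the short-regression slope should sit close to the true \(β_2=2\), in line with the independence case above; as \(\rho\) rises toward 0.9 the estimate drifts toward \(2+3\rho\), illustrating that stronger correlation gives worse bias and inconsistency.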