The linear regression model with a redundant variable
Summary
- Setup: random sample \(\left( y_i,x_i,z_i \right)\) for \(i=1, \ldots ,n\) where \(x_i,z_i\) are scalars.
- True relation:
\[E\left( y_i \mid x_i,z_i \right)=βx_i\]
- We believe that
\[E\left( y_i \mid x_i,z_i \right)=βx_i+θz_i\]
- not realizing that \(θ=0\). The model we estimate therefore includes a redundant regressor, \(z_i\).
- In the model
\[y_i=βx_i+ε_i\]
- the exogeneity assumption is satisfied,
\[E\left( ε_i \mid x_i,z_i \right)=0\]
- Our OLS estimator of \(β\), taken from the regression of \(y_i\) on both \(x_i\) and \(z_i\), is unbiased and consistent but not efficient.
- You cannot, in general, say which is worse: omitting a relevant variable (resulting in a biased and inconsistent estimator) or including a redundant variable (resulting in an unbiased and consistent but inefficient estimator).
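
The unbiased-but-inefficient claim can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption made for illustration and does not come from the summary: the sample size, the value of \(β\), normally distributed regressors with correlation `rho`, and homoskedastic normal errors. The code compares the OLS estimate of \(β\) from the regression on \(x_i\) alone with the estimate from the regression that also includes the redundant \(z_i\).

```python
import numpy as np


def simulate(n=50, reps=5000, beta=2.0, rho=0.8, seed=0):
    """Return OLS estimates of beta from the short and the long regression.

    All defaults are illustrative assumptions, not values from the notes.
    """
    rng = np.random.default_rng(seed)
    b_short = np.empty(reps)
    b_long = np.empty(reps)
    for r in range(reps):
        # Draw z, then x correlated with z (correlation rho, unit variances).
        z = rng.standard_normal(n)
        x = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        # True relation: y depends on x only, i.e. theta = 0.
        y = beta * x + rng.standard_normal(n)
        # Short regression: y on x alone (no intercept, as in the model above).
        b_short[r] = (x @ y) / (x @ x)
        # Long regression: y on x and the redundant z.
        X = np.column_stack([x, z])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        b_long[r] = coef[0]
    return b_short, b_long


b_short, b_long = simulate()
print("mean of short-regression estimates:", b_short.mean())
print("mean of long-regression estimates: ", b_long.mean())
print("variance, short regression:", b_short.var())
print("variance, long regression: ", b_long.var())
```

Under these assumptions both averages should sit close to the true \(β\), while the estimates from the long regression should be noticeably more dispersed. The gap widens as `rho` approaches 1, because less independent variation in \(x_i\) remains once \(z_i\) is controlled for, and it shrinks toward zero as `rho` approaches 0.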