The linear regression model with a redundant variable

Summary

  • Setup: random sample \(\left( y_i,x_i,z_i \right)\) for \(i=1, \ldots ,n\) where \(x_i,z_i\) are scalars.
  • True relation:

\[E\left( y_i \mid x_i,z_i \right)=βx_i\]

  • We believe that

\[E\left( y_i \mid x_i,z_i \right)=βx_i+θz_i\]

  • not realizing that \(θ=0\). The model is wasteful: it includes the redundant regressor \(z_i\).
  • In the correctly specified model

\[y_i=βx_i+ε_i\]

  • the exogeneity assumption is satisfied,

\[E\left( ε_i \mid x_i,z_i \right)=0\]

  • The OLS estimator of \(β\) from the regression of \(y_i\) on both \(x_i\) and \(z_i\) is unbiased and consistent, but not efficient.
  • You cannot, in general, say which is worse: omitting a relevant variable (resulting in a biased and inconsistent estimator) or including a redundant variable (resulting in an unbiased and consistent but inefficient estimator).
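
The efficiency loss from a redundant regressor can be illustrated with a small Monte Carlo experiment. The following is a minimal sketch, not part of the original notes: the function name `simulate`, the sample size, the value \(β=1\), the correlation `rho` between \(x_i\) and \(z_i\), and the standard normal, homoskedastic errors are all assumptions chosen for illustration. It compares the OLS estimate of \(β\) from the regression on \(x_i\) alone with the estimate from the regression on both \(x_i\) and \(z_i\) when \(θ=0\).

```python
import numpy as np


def simulate(n=200, reps=2000, beta=1.0, rho=0.8, seed=0):
    """Draw `reps` samples with theta = 0 and estimate beta two ways.

    All parameter values are illustrative assumptions.
    Returns two arrays of estimates: (short regression, long regression).
    """
    rng = np.random.default_rng(seed)
    short = np.empty(reps)  # y regressed on x only
    long_ = np.empty(reps)  # y regressed on x and the redundant z
    for r in range(reps):
        x = rng.normal(size=n)
        # z is correlated with x but has no effect on y (theta = 0)
        z = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        y = beta * x + rng.normal(size=n)
        # OLS without intercept on x alone: sum(x*y) / sum(x*x)
        short[r] = (x @ y) / (x @ x)
        # OLS without intercept on [x, z]; keep the coefficient on x
        X = np.column_stack([x, z])
        long_[r] = np.linalg.lstsq(X, y, rcond=None)[0][0]
    return short, long_


short, long_ = simulate()
print("mean (x only):  ", short.mean())
print("mean (x and z): ", long_.mean())
print("var  (x only):  ", short.var())
print("var  (x and z): ", long_.var())
```

Under these assumptions both averages should be close to the true \(β\), while the estimates from the regression that includes \(z_i\) should be visibly more dispersed. The gap widens as `rho` approaches 1 and vanishes when \(x_i\) and \(z_i\) are uncorrelated.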