The linear regression model with “overspecification”

Summary

  • Setup: a random sample \(\left( y_i,x_i \right)\), \(i=1, \ldots ,n\), where \(x_i\) is a scalar
  • True relation

\[E\left( y_i \mid x_i \right)=β_2x_i\]

  • We believe that

\[E\left( y_i \mid x_i \right)=β_2x_i+β_3x_i^2\]

  • Our model is

\[y_i=β_2x_i+β_3x_i^2+ε_i\]

  • Here we fail to realize that \(β_3\) is in fact 0, so \(x_i^2\) is an irrelevant regressor.
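  • If \(b=\left( b_2,b_3 \right)'\) is the OLS estimator, then \(E\left( b_2 \right)=β_2\) and \(E\left( b_3 \right)=0\), because the model still contains the true regression function (with \(β_3=0\)).
  • \(b_2\) is therefore still an unbiased and consistent estimator of \(β_2\).
  • However, it is not efficient: under homoskedastic errors, dropping \(x_i^2\) and regressing \(y_i\) on \(x_i\) alone gives an estimator of \(β_2\) whose variance is no larger, and strictly smaller unless \(\sum_i x_i^3=0\) (i.e. unless \(x_i\) and \(x_i^2\) are orthogonal in the sample).
  • Our model is not incorrect or misspecified; it is simply wasteful, since estimating the unnecessary \(β_3\) costs precision in \(b_2\).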
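A small Monte Carlo experiment illustrates these points. The sketch below is written in Python with NumPy and is not part of the original notes. Every data-generating choice is an assumption made for illustration: \(x_i \sim \text{Uniform}(0,2)\) held fixed across replications (so results are conditional on \(X\)), normal homoskedastic errors with \(σ=1\), \(β_2=1.5\), and no intercept, matching the model above. The helper names `simulate` and `theoretical_variances` are hypothetical.

```python
import numpy as np


def simulate(n=200, beta2=1.5, sigma=1.0, reps=5000, seed=0):
    """Hypothetical Monte Carlo: true model y = beta2 * x + eps (beta3 = 0).

    Returns the fixed regressor x and, for every replication, the OLS b2 from
    the correct regression (x only) and b2, b3 from the overspecified
    regression (x and x^2). No intercept in either, as in the notes.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0, size=n)            # assumed design, fixed across reps
    # Each column of Y is one simulated sample drawn from the true model.
    Y = beta2 * x[:, None] + rng.normal(0.0, sigma, size=(n, reps))

    X_short = x[:, None]                         # correct model: x only
    X_long = np.column_stack([x, x**2])          # overspecified model: x and x^2

    # lstsq solves all replications at once (one right-hand side per column).
    b_short = np.linalg.lstsq(X_short, Y, rcond=None)[0]   # shape (1, reps)
    b_long = np.linalg.lstsq(X_long, Y, rcond=None)[0]     # shape (2, reps)
    return x, b_short[0], b_long[0], b_long[1]


def theoretical_variances(x, sigma=1.0):
    """Homoskedastic conditional variances of b2 in the two regressions."""
    sxx = np.sum(x**2)
    var_short = sigma**2 / sxx
    # [(X'X)^{-1}]_{11} for X = [x, x^2] equals 1 / (sum x^2 - (sum x^3)^2 / sum x^4)
    var_long = sigma**2 / (sxx - np.sum(x**3) ** 2 / np.sum(x**4))
    return var_short, var_long


if __name__ == "__main__":
    x, b2_short, b2_long, b3_long = simulate()
    v_short, v_long = theoretical_variances(x)
    print(f"mean b2, correct model:       {b2_short.mean():.4f}")
    print(f"mean b2, overspecified model: {b2_long.mean():.4f}")
    print(f"mean b3, overspecified model: {b3_long.mean():.4f}")
    print(f"var b2, correct:       simulated {b2_short.var():.5f}, theory {v_short:.5f}")
    print(f"var b2, overspecified: simulated {b2_long.var():.5f}, theory {v_long:.5f}")
```

Under these assumptions both regressions should give an average \(b_2\) close to \(β_2\), and the overspecified regression an average \(b_3\) close to 0, while the variance of \(b_2\) is clearly larger in the overspecified regression. The theoretical values are \(σ^2/\sum_i x_i^2\) and \(σ^2/\left(\sum_i x_i^2-\left(\sum_i x_i^3\right)^2/\sum_i x_i^4\right)\), which coincide only when \(\sum_i x_i^3=0\). Because \(x_i\) is positive in this design, \(x_i\) and \(x_i^2\) are highly collinear and the efficiency loss is large; with \(x_i\) symmetric around 0 the loss would be much smaller.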