The linear regression model with “overspecification”

Summary

Setup: a random sample \(\left( y_i,x_i \right)\), \(i=1, \ldots ,n\), where \(x_i\) is a scalar.
True relation:

\[E\left( y_i \mid x_i \right)=β_2x_i\]

Estimated (overspecified) model:

\[E\left( y_i \mid x_i \right)=β_2x_i+β_3x_i^2\]

or, equivalently,

\[y_i=β_2x_i+β_3x_i^2+ε_i, \qquad E\left( ε_i \mid x_i \right)=0,\]

where we fail to realize that \(β_3\) is in fact 0.
If \(b=\left( b_2,b_3 \right)'\) is the OLS estimator, then \(E\left( b_2 \right)=β_2\) and \(E\left( b_3 \right)=0\).
\(b_2\) is therefore still an unbiased and consistent estimator of \(β_2\).
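A short derivation, conditional on the regressors and assuming \(X'X\) is invertible, shows where these expectations come from. Stack the observations as \(y=Xβ+ε\), where \(X\) is the \(n\times 2\) matrix with rows \(\left( x_i,\, x_i^2 \right)\) and \(β=\left( β_2,β_3 \right)'=\left( β_2,0 \right)'\). Because the true relation is \(E\left( y_i \mid x_i \right)=β_2x_i\), the error \(ε_i=y_i-β_2x_i-β_3x_i^2\) (with \(β_3=0\)) satisfies \(E\left( ε_i \mid x_i \right)=0\), and random sampling then gives \(E\left( ε \mid X \right)=0\). Hence

\[
\begin{aligned}
b &= \left( X'X \right)^{-1}X'y = β + \left( X'X \right)^{-1}X'ε, \\
E\left( b \mid X \right) &= β + \left( X'X \right)^{-1}X'E\left( ε \mid X \right) = \left( β_2,\, 0 \right)',
\end{aligned}
\]

and the law of iterated expectations gives \(E\left( b_2 \right)=β_2\) and \(E\left( b_3 \right)=0\).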
However, it is not efficient: we can reduce the variance of the estimator of \(β_2\) by removing \(x_i^2\) from the regression.
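To quantify the efficiency loss, a sketch under an extra assumption not stated above: homoskedastic errors, \(\operatorname{Var}\left( ε_i \mid x_i \right)=σ^2\). Let \(\tilde b_2=\sum_i x_iy_i / \sum_i x_i^2\) denote the OLS estimator from regressing \(y_i\) on \(x_i\) alone, and let \(\hat r_i\) be the residuals from regressing \(x_i\) on \(x_i^2\) (no intercept). By the Frisch-Waugh-Lovell theorem, \(b_2=\sum_i \hat r_iy_i / \sum_i \hat r_i^2\), so

\[
\begin{aligned}
\operatorname{Var}\left( \tilde b_2 \mid X \right) &= \frac{σ^2}{\sum_i x_i^2}, \\
\operatorname{Var}\left( b_2 \mid X \right) &= \frac{σ^2}{\sum_i \hat r_i^2} = \frac{σ^2}{\sum_i x_i^2-\left( \sum_i x_i^3 \right)^2 / \sum_i x_i^4}.
\end{aligned}
\]

The denominator of the second line is never larger than \(\sum_i x_i^2\), so \(\operatorname{Var}\left( b_2 \mid X \right)\geq\operatorname{Var}\left( \tilde b_2 \mid X \right)\), with equality only when \(\sum_i x_i^3=0\), that is, when \(x_i\) and \(x_i^2\) happen to be orthogonal in the sample. Since \(β_3=0\), both estimators are unbiased, so \(\tilde b_2\) is preferred.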
Our model is not incorrect or misspecified; it is simply wasteful.
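A small Monte Carlo sketch can make both claims concrete: the overspecified regression of \(y_i\) on \(x_i\) and \(x_i^2\) gives average \(b_2\) near \(β_2\) and average \(b_3\) near 0, while \(b_2\) varies more across samples than the estimator from the regression on \(x_i\) alone. The design is hypothetical and chosen only for illustration: \(x_i\sim\) Uniform(0, 2), standard normal homoskedastic errors, no intercept, \(β_2=1.5\), and the function name `simulate` is made up here.

```python
import numpy as np


def simulate(beta2=1.5, n=100, reps=5000, seed=0):
    """Compare the overspecified (long) and correctly specified (short) OLS fits.

    Hypothetical design, not from the notes: x_i ~ Uniform(0, 2),
    eps_i ~ N(0, 1) (homoskedastic), no intercept, true beta3 = 0.
    """
    rng = np.random.default_rng(seed)
    b2_long = np.empty(reps)
    b3_long = np.empty(reps)
    b2_short = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0.0, 2.0, size=n)
        # True relation: E(y | x) = beta2 * x, so beta3 is zero.
        y = beta2 * x + rng.normal(0.0, 1.0, size=n)
        # Long (overspecified) regression: y on x and x^2.
        X = np.column_stack([x, x**2])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        b2_long[r], b3_long[r] = coef
        # Short regression: y on x only, closed-form OLS without intercept.
        b2_short[r] = (x @ y) / (x @ x)
    return {
        "mean_b2_long": b2_long.mean(),
        "mean_b3_long": b3_long.mean(),
        "mean_b2_short": b2_short.mean(),
        "var_b2_long": b2_long.var(),
        "var_b2_short": b2_short.var(),
    }


if __name__ == "__main__":
    print(simulate())
```

Because \(x_i\) is drawn from a positive range here, \(\sum_i x_i^3\) is far from zero and \(x_i\) is highly collinear with \(x_i^2\), so the variance gap should be large; a design with \(x_i\) symmetric around 0 would shrink it.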