The linear regression model with “overspecification”
Summary
- Setup: random sample \(\left( y_i,x_i \right)\) for \(i=1, \ldots ,n\) where \(x_i\) is a scalar
- True relation
\[E\left( y_i \mid x_i \right)=β_2x_i\]
- We believe that
\[E\left( y_i \mid x_i \right)=β_2x_i+β_3x_i^2\]
- Our model is
\[y_i=β_2x_i+β_3x_i^2+ε_i\]
- Here we fail to realize that \(β_3\) is in fact 0, so \(x_i^2\) is an irrelevant regressor.
- If \(b=\left( b_2,b_3 \right)\) is the OLS estimator, then \(E\left( b_2 \right)=β_2\) and \(E\left( b_3 \right)=0\).
- \(b_2\) is still an unbiased and consistent estimator of \(β_2\).
- However, it is not efficient. We can reduce the variance of the estimator of \(β_2\) by dropping \(x_i^2\) from the regression.
- Our model is not incorrect or misspecified; it is simply wasteful.
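
The claims in this summary can be checked with a small Monte Carlo experiment. The sketch below is illustrative only and rests on assumptions that are not part of the notes: \(β_2=1.5\), \(x_i\) uniform on \([0,2]\), standard normal errors, \(n=50\) and 2000 replications. The function name `simulate` and the result keys are hypothetical. It fits the overspecified regression of \(y_i\) on \(x_i\) and \(x_i^2\) (no intercept, as in the model above) by solving the 2×2 normal equations, and compares it with the regression on \(x_i\) alone.

```python
import random
import statistics


def simulate(beta2=1.5, n=50, reps=2000, seed=0):
    """Compare OLS on (x, x^2) with OLS on x alone when the true beta3 is 0.

    All parameter values are illustrative assumptions, not from the notes.
    """
    rng = random.Random(seed)
    long_b2, long_b3, short_b2 = [], [], []
    for _ in range(reps):
        # Draw a fresh random sample from the true relation y = beta2 * x + eps.
        x = [rng.uniform(0.0, 2.0) for _ in range(n)]
        y = [beta2 * xi + rng.gauss(0.0, 1.0) for xi in x]

        # Sums entering the normal equations (z denotes x^2).
        sxx = sum(xi ** 2 for xi in x)
        sxz = sum(xi ** 3 for xi in x)
        szz = sum(xi ** 4 for xi in x)
        sxy = sum(xi * yi for xi, yi in zip(x, y))
        szy = sum(xi ** 2 * yi for xi, yi in zip(x, y))

        # Overspecified model: solve the 2x2 system for (b2, b3).
        det = sxx * szz - sxz ** 2
        long_b2.append((szz * sxy - sxz * szy) / det)
        long_b3.append((sxx * szy - sxz * sxy) / det)

        # Correctly specified model: regress y on x only.
        short_b2.append(sxy / sxx)

    return {
        "mean_b2_long": statistics.mean(long_b2),
        "mean_b3_long": statistics.mean(long_b3),
        "var_b2_long": statistics.variance(long_b2),
        "mean_b2_short": statistics.mean(short_b2),
        "var_b2_short": statistics.variance(short_b2),
    }


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, the average of \(b_2\) should be close to \(β_2\) and the average of \(b_3\) close to 0 in the overspecified model, while the sampling variance of \(b_2\) should be noticeably larger than in the regression on \(x_i\) alone. The size of the gap depends on how strongly \(x_i\) and \(x_i^2\) are correlated in the chosen design.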