The linear regression model with “underspecification”

Summary

Setup: a random sample \(\left( y_i,x_i \right)\), \(i=1, \ldots ,n\), where \(x_i\) is a scalar.
True relation (with \(β_3≠0\)):

\[E\left( y_i \mid x_i \right)=β_2x_i+β_3x_i^2\]

The marginal effect of \(x_i\) therefore depends on \(x_i\):

\[ \frac{dE\left( y_i \mid x_i \right)}{dx_i}=β_2+2β_3x_i\]
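The sketch below checks the marginal-effect formula numerically. It compares the analytic derivative with a central finite difference at a few points. The coefficient values \(β_2=1.5\) and \(β_3=-0.4\) are hypothetical, as are the function names. The output shows that the slope of \(E\left( y_i \mid x_i \right)\) changes with \(x_i\).

```python
# Hypothetical coefficients for illustration; the notes only require beta3 != 0.
BETA2 = 1.5
BETA3 = -0.4


def cond_mean(x: float) -> float:
    """True conditional mean E(y | x) = beta2*x + beta3*x**2."""
    return BETA2 * x + BETA3 * x ** 2


def marginal_effect(x: float) -> float:
    """Analytic derivative dE(y | x)/dx = beta2 + 2*beta3*x."""
    return BETA2 + 2 * BETA3 * x


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative of f at x by a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


# The slope of E(y | x) differs across x, so it is not one fixed number.
for x in (-2.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  analytic={marginal_effect(x):+.4f}  "
          f"numeric={central_difference(cond_mean, x):+.4f}")
```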

Suppose, however, that we (wrongly) assume

\[E\left( y_i \mid x_i \right)=β_2x_i\]

and estimate the model

\[y_i=β_2x_i+ε_i\]

under the assumption

\[E\left( ε_i \mid x_i \right)=0\]

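Under the true relation, however, the error of this model does not have conditional mean zero:

\[E\left( ε_i \mid x_i \right)=E\left( y_i-β_2x_i \mid x_i \right)=β_3x_i^2\]

This is nonzero whenever \(x_i≠0\), so the assumption \(E\left( ε_i \mid x_i \right)=0\) fails.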
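The following is a minimal simulation sketch of this step. None of its inputs come from the notes. It uses the hypothetical values \(β_2=1\) and \(β_3=0.5\). It also adds a normal disturbance \(u\) with \(σ=1\) to the true relation. The sketch holds \(x_i\) fixed and averages \(ε_i=y_i-β_2x_i\) over many draws. The average recovers \(β_3x_i^2\), which shows that the omitted quadratic term ends up in the error.

```python
import random
import statistics

# Hypothetical true relation (not from the notes):
# y = BETA2*x + BETA3*x**2 + u, with u ~ N(0, SIGMA**2) independent of x.
BETA2, BETA3, SIGMA = 1.0, 0.5, 1.0


def mean_error_at(x, reps=50_000, seed=0):
    """Average eps = y - BETA2*x over many draws of y at a fixed x."""
    rng = random.Random(seed)
    eps = []
    for _ in range(reps):
        y = BETA2 * x + BETA3 * x ** 2 + rng.gauss(0.0, SIGMA)
        eps.append(y - BETA2 * x)  # error of the misspecified model y = beta2*x + eps
    return statistics.fmean(eps)


for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x={x}: mean eps = {mean_error_at(x):.3f}, beta3*x^2 = {BETA3 * x ** 2:.3f}")
```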

The OLS estimator of \(β_2\) in this model is

\[b_2= \frac{\sum x_iy_i}{\sum x_i^2}\]
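The estimator could be written in Python as follows. The function name `ols_through_origin` is hypothetical, and the input checks are added for safety. The model has no intercept, so the formula uses raw (uncentred) sums rather than deviations from the means.

```python
from typing import Sequence


def ols_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """b2 = sum(x_i * y_i) / sum(x_i**2): OLS slope for y_i = beta2*x_i + e_i (no intercept)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("sum of x_i^2 is zero; b2 is undefined")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


# Data lying exactly on y = x**2 (beta2 = 0, beta3 = 1).
print(ols_through_origin([1.0, 2.0, 3.0], [1.0, 4.0, 9.0]))
```

The data in the usage line lie exactly on \(y_i=x_i^2\), so \(β_2=0\) and \(β_3=1\). Fitting them gives \(36/14≈2.571\), which equals \(\sum x_i^3/\sum x_i^2\) for \(x=(1,2,3)\). The slope estimate is therefore far from \(β_2=0\).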

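Conditional on \(x=\left( x_1, \ldots ,x_n \right)\), substituting the true conditional mean gives

\[E\left( b_2 \mid x \right)= \frac{\sum x_iE\left( y_i \mid x_i \right)}{\sum x_i^2}= \frac{\sum x_i\left( β_2x_i+β_3x_i^2 \right)}{\sum x_i^2}=β_2+β_3 \frac{\sum x_i^3}{\sum x_i^2}\]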
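The expression for \(E\left( b_2 \mid x \right)\) can be checked with a small Monte Carlo sketch. The sketch holds the \(x_i\) fixed, matching the conditioning on \(x\), and redraws \(y_i\) from the true relation. The following are all hypothetical choices for illustration:

- \(β_2=1\) and \(β_3=0.5\)
- normal errors with \(σ=1\)
- the five regressor values

For these \(x\) values, \(\sum x_i^3=39.5\) and \(\sum x_i^2=16.5\). The formula therefore gives \(E\left( b_2 \mid x \right)≈2.197\), far from \(β_2=1\).

```python
import random
import statistics

# Hypothetical data-generating process (not from the notes):
# y_i = BETA2*x_i + BETA3*x_i**2 + u_i with u_i ~ N(0, SIGMA**2), independent of x.
BETA2, BETA3, SIGMA = 1.0, 0.5, 1.0


def b2(x, y):
    """OLS slope without intercept: sum(x*y) / sum(x**2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def simulate_mean_b2(x, reps=20_000, seed=0):
    """Average b2 over many samples that share the same regressor values x."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y = [BETA2 * xi + BETA3 * xi ** 2 + rng.gauss(0.0, SIGMA) for xi in x]
        draws.append(b2(x, y))
    return statistics.fmean(draws)


x = [0.5, 1.0, 1.5, 2.0, 3.0]  # held fixed across replications (conditioning on x)
theory = BETA2 + BETA3 * sum(xi ** 3 for xi in x) / sum(xi ** 2 for xi in x)
print(f"E(b2 | x) by formula: {theory:.4f}")
print(f"mean of simulated b2: {simulate_mean_b2(x):.4f}")
print(f"true beta2:           {BETA2:.4f}")
```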

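In general, therefore, \(b_2\) is neither an unbiased nor a consistent estimator of \(β_2\) or of the marginal effect \(β_2+2β_3x_i\). The bias term \(β_3 \sum x_i^3/\sum x_i^2\) is nonzero unless \(\sum x_i^3=0\). As \(n\) grows it converges in probability to \(β_3E\left( x_i^3 \right)/E\left( x_i^2 \right)\), which is nonzero unless \(E\left( x_i^3 \right)=0\). Moreover, no single number can equal a marginal effect that varies with \(x_i\). In this sense \(b_2\) does not estimate anything of interest.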
We say that our model \(y_i=β_2x_i+ε_i\) is misspecified: it omits the \(x_i^2\) variable.
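The following large-sample simulation sketch illustrates the consistency claim. It assumes hypothetical values \(β_2=1\) and \(β_3=0.5\), normal errors with \(σ=1\), and \(x_i\) drawn from Uniform(0, 2); none of these come from the notes. For that distribution, \(E\left( x_i^2 \right)=4/3\) and \(E\left( x_i^3 \right)=2\). As \(n\) grows, \(b_2\) should therefore settle near \(β_2+1.5β_3=1.75\). That value differs both from \(β_2=1\) and from the average marginal effect \(β_2+2β_3E\left( x_i \right)=2\).

```python
import random

# Hypothetical data-generating process (not from the notes):
# x_i ~ Uniform(0, 2), y_i = BETA2*x_i + BETA3*x_i**2 + u_i, u_i ~ N(0, SIGMA**2).
BETA2, BETA3, SIGMA = 1.0, 0.5, 1.0

# For Uniform(0, 2): E(x) = 1, E(x^2) = 4/3, E(x^3) = 2, so E(x^3)/E(x^2) = 1.5.
MOMENT_RATIO = 1.5
PLIM_B2 = BETA2 + BETA3 * MOMENT_RATIO   # probability limit of b2
AVG_MARGINAL = BETA2 + 2 * BETA3 * 1.0   # E(beta2 + 2*beta3*x) with E(x) = 1


def b2_for_sample(n, rng):
    """Draw a fresh sample of size n and return the no-intercept OLS slope."""
    sxy = sxx = 0.0
    for _ in range(n):
        xi = rng.uniform(0.0, 2.0)
        yi = BETA2 * xi + BETA3 * xi ** 2 + rng.gauss(0.0, SIGMA)
        sxy += xi * yi
        sxx += xi * xi
    return sxy / sxx


rng = random.Random(42)
for n in (100, 10_000, 100_000):
    print(f"n={n:>7}: b2 = {b2_for_sample(n, rng):.4f}")
print(f"plim b2 = {PLIM_B2}, beta2 = {BETA2}, average marginal effect = {AVG_MARGINAL}")
```

Increasing \(n\) shrinks the sampling noise, but the estimates converge to the wrong target. This is the sense in which \(b_2\) is inconsistent here.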