The linear regression model with “underspecification”

Summary

  • Setup: random sample \(\left( y_i,x_i \right)\) for \(i=1, \ldots ,n\) where \(x_i\) is a scalar
  • True relation (with \(β_3≠0\)):

\[E\left( y_i \mid x_i \right)=β_2x_i+β_3x_i^2\]

  • Marginal effect:

\[ \frac{dE\left( y_i \mid x_i \right)}{dx_i}=β_2+2β_3x_i\]

  • We incorrectly believe that \(β_3=0\), so that

\[E\left( y_i \mid x_i \right)=β_2x_i\]

  • We believe that in the model

\[y_i=β_2x_i+ε_i\]

  • the exogeneity assumption is satisfied,

\[E\left( ε_i \mid x_i \right)=0\]

  • but this is wrong. In fact:

\[E\left( ε_i \mid x_i \right)=E\left( y_i-β_2x_i \mid x_i \right)=β_3x_i^2\]

  • The error term depends on \(x_i\), so the \(x\)-variable is endogenous.
  • For the OLS estimator of \(β_2\) :

\[b_2= \frac{\sum x_iy_i}{\sum x_i^2}\]

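  • substituting the true conditional mean \(E\left( y_i \mid x_i \right)=β_2x_i+β_3x_i^2\), we have

\[E\left( b_2 \mid x \right)= \frac{\sum x_i\left( β_2x_i+β_3x_i^2 \right)}{\sum x_i^2}=β_2+β_3 \frac{\sum x_i^3}{\sum x_i^2}\]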

  • In general, \(b_2\) is neither an unbiased nor a consistent estimator of \(β_2\) or of the marginal effect \(β_2+2β_3x_i\); it does not estimate anything of interest.
  • We say that our model \(y_i=β_2x_i+ε_i\) is misspecified: it is missing the \(x_i^2\) variable.
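
The bias formula above can be checked with a small simulation. The following is only a sketch under assumed values that do not come from the notes: \(β_2=1\), \(β_3=0.5\), \(x_i\) uniform on \([0,2]\), and standard normal noise around the true conditional mean. The function name `simulate_b2` is hypothetical. It computes \(b_2\) from the misspecified regression and compares it with \(β_2+β_3\sum x_i^3/\sum x_i^2\).

```python
import numpy as np


def simulate_b2(n=100_000, beta2=1.0, beta3=0.5, seed=0):
    """Illustrative simulation; all parameter values are assumptions."""
    rng = np.random.default_rng(seed)

    # Assumed design: x is uniform on [0, 2], so sum(x^3) is far from zero.
    x = rng.uniform(0.0, 2.0, size=n)

    # True relation: E(y | x) = beta2 * x + beta3 * x^2, plus N(0, 1) noise.
    u = rng.normal(0.0, 1.0, size=n)
    y = beta2 * x + beta3 * x**2 + u

    # OLS slope in the misspecified model y = beta2 * x + eps (no intercept).
    b2 = np.sum(x * y) / np.sum(x**2)

    # Conditional expectation of b2 given x, from the formula in the notes.
    expected = beta2 + beta3 * np.sum(x**3) / np.sum(x**2)
    return b2, expected


b2, expected = simulate_b2()
print(f"b2 = {b2:.3f}, E(b2 | x) = {expected:.3f}, true beta2 = 1.000")
```

Under these assumptions \(E(x_i^3)/E(x_i^2)=2/(4/3)=1.5\), so \(b_2\) should settle near \(1+0.5 \cdot 1.5=1.75\) rather than near \(β_2=1\), and increasing `n` does not remove the gap.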