The linear regression model with “underspecification”
Summary
- Setup: random sample \(\left( y_i,x_i \right)\) for \(i=1, \ldots ,n\) where \(x_i\) is a scalar
- True relation (\(β_3≠0\)):
\[E\left( y_i \mid x_i \right)=β_2x_i+β_3x_i^2\]
- Marginal effect:
\[ \frac{dE\left( y_i \mid x_i \right)}{dx_i}=β_2+2β_3x_i\]
- We believe incorrectly that \(β_3=0\) and that
\[E\left( y_i \mid x_i \right)=β_2x_i\]
- We believe that in the model
\[y_i=β_2x_i+ε_i\]
- the exogeneity assumption is satisfied,
\[E\left( ε_i \mid x_i \right)=0\]
- but this is wrong. In fact:
\[E\left( ε_i \mid x_i \right)=E\left( y_i-β_2x_i \mid x_i \right)=β_3x_i^2\]
- The \(x\)-variable is correlated with the error term, so it is endogenous.
- For the OLS estimator of \(β_2\) :
\[b_2= \frac{\sum{ x_iy_i }}{\sum{ x_i^2 }}\]
- taking conditional expectations and substituting the true relation, we have
\[E\left( b_2 \mid x \right)=β_2+β_3 \frac{\sum{ x_i^3 }}{\sum{ x_i^2 }}\]
- \(b_2\) is neither an unbiased nor a consistent estimator of \(β_2\) or of the marginal effect; it does not estimate anything of interest.
- We say that our model \(y_i=β_2x_i+ε_i\) is misspecified: it is missing the \(x_i^2\) variable.
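The bias formula for \(E\left( b_2 \mid x \right)\) can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions that are not part of the summary: the parameter values \(β_2=1\) and \(β_3=0.5\), regressors drawn once from a uniform distribution on \([0,2]\) and then held fixed, and a standard normal disturbance `u` with \(E\left( u_i \mid x_i \right)=0\). The function names are hypothetical.

```python
import numpy as np


def ols_through_origin(x, y):
    """OLS slope b2 for the (misspecified) model y = b2 * x + error."""
    return np.sum(x * y) / np.sum(x ** 2)


def simulate_bias(beta2=1.0, beta3=0.5, n=200, reps=2000, seed=0):
    """Compare the Monte Carlo mean of b2 with beta2 + beta3 * sum(x^3) / sum(x^2).

    All parameter values and distributions are assumptions chosen for illustration.
    """
    rng = np.random.default_rng(seed)

    # Draw the regressors once and hold them fixed, so we condition on x.
    x = rng.uniform(0.0, 2.0, size=n)

    # Conditional mean of b2 implied by the bias formula.
    predicted = beta2 + beta3 * np.sum(x ** 3) / np.sum(x ** 2)

    estimates = np.empty(reps)
    for r in range(reps):
        # True relation: E(y | x) = beta2 * x + beta3 * x^2, plus noise u.
        u = rng.normal(0.0, 1.0, size=n)
        y = beta2 * x + beta3 * x ** 2 + u
        estimates[r] = ols_through_origin(x, y)

    return estimates.mean(), predicted


mean_b2, predicted = simulate_bias()
print(f"Monte Carlo mean of b2: {mean_b2:.4f}")
print(f"Predicted by formula:   {predicted:.4f}")
print("True beta2:             1.0000")
```

Under these assumptions the average of \(b_2\) should sit close to the value given by the formula and well away from \(β_2\). Choosing regressors that are symmetric around zero would make \(\sum{ x_i^3 }\) close to zero and shrink the bias in \(b_2\) as an estimator of \(β_2\), although \(b_2\) would still not recover the marginal effect \(β_2+2β_3x_i\).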