The linear regression model with a missing variable, part 2

Summary

  • Setup:
    • Random sample \(\left( y_i,x_i,z_i \right)\) for \(i=1, \ldots ,n\) where \(x_i,z_i\) are scalars.
    • \(E\left( y_i \mid x_i,z_i \right)=βx_i+θz_i\)
  • Suppose that \(E\left( y_i \mid x_i \right)\) is linear in \(x_i\):

\[E\left( y_i \mid x_i \right)=γx_i\]
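
  • To see what \(γ\) is (writing \(δ\) for a slope coefficient not named in the text): by the law of iterated expectations, \(E\left( y_i \mid x_i \right)=βx_i+θE\left( z_i \mid x_i \right)\), so (for \(θ\neq 0\)) the linearity assumption amounts to \(E\left( z_i \mid x_i \right)=δx_i\) for some \(δ\), and

\[γ=β+θδ\]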

  • Our OLS estimator

\[b= \frac{∑x_iy_i}{\sum{ x_i^2 }}\]

  • will be an unbiased and consistent estimator of \(γ\).
  • To summarize:
    • \(b\) is a biased estimator of \(dE\left( y_i \mid x_i,z_i \right)/dx_i\) (in general)
    • \(b\) is an unbiased estimator of \(dE\left( y_i \mid x_i \right)/dx_i\) (if \(E\left( y_i \mid x_i \right)\) is linear in \(x_i\))
  • In the literature, \(y_i=βx_i+ε_i\) is often described as the “incorrect model” while \(y_i=βx_i+θz_i+ν_i\) is the “correct model”. In fact, whether \(y_i=βx_i+ε_i\) is incorrect or not all depends on what you are trying to estimate . You only have a missing variable bias if you are estimating \(dE\left( y_i \mid x_i,z_i \right)/dx_i\) using \(b\) .