The linear regression model with a missing variable, part 2

Summary

  • Setup:
    • Random sample \(\left( y_i,x_i,z_i \right)\) for \(i=1, \ldots ,n\) where \(x_i,z_i\) are scalars.
    • \(E\left( y_i \mid x_i,z_i \right)=βx_i+θz_i\)
  • Suppose that \(E\left( y_i \mid x_i \right)\) is linear in \(x_i\):

\[E\left( y_i \mid x_i \right)=γx_i\]
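
  • To see what \(γ\) is (writing \(δ\) for a slope coefficient not named in the text): by the law of iterated expectations, \(E\left( y_i \mid x_i \right)=βx_i+θE\left( z_i \mid x_i \right)\), so (for \(θ\neq 0\)) the linearity assumption amounts to \(E\left( z_i \mid x_i \right)=δx_i\) for some \(δ\), and

\[γ=β+θδ\]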

  • Our OLS estimator

\[b= \frac{∑x_iy_i}{\sum{ x_i^2 }}\]

  • will be an unbiased and consistent estimator of \(γ\).
  • To summarize:
    • \(b\) is a biased estimator of \(dE\left( y_i \mid x_i,z_i \right)/dx_i\) (in general)
    • \(b\) is an unbiased estimator of \(dE\left( y_i \mid x_i \right)/dx_i\) (if \(E\left( y_i \mid x_i \right)\) is linear in \(x_i\))
  • In the literature, \(y_i=βx_i+ε_i\) is often described as the “incorrect model” while \(y_i=βx_i+θz_i+ν_i\) is the “correct model”. In fact, whether \(y_i=βx_i+ε_i\) is incorrect or not all depends on what you are trying to estimate . You only have a missing variable bias if you are estimating \(dE\left( y_i \mid x_i,z_i \right)/dx_i\) using \(b\) .