The linear regression model with a missing variable, part 2
Summary
- Setup:
- Random sample \(\left( y_i,x_i,z_i \right)\) for \(i=1, \ldots ,n\) where \(x_i,z_i\) are scalars.
- \(E\left( y_i \mid x_i,z_i \right)=βx_i+θz_i\)
- Suppose that \(E\left( y_i \mid x_i \right)\) is linear in \(x_i\):
\[E\left( y_i \mid x_i \right)=γx_i\]
- Our OLS estimator
\[b= \frac{\sum{ x_iy_i }}{\sum{ x_i^2 }}\]
- will be an unbiased and consistent estimator of \(γ\).
- To summarize:
- \(b\) is, in general, a biased estimator of \(dE\left( y_i \mid x_i,z_i \right)/dx_i\)
- \(b\) is an unbiased estimator of \(dE\left( y_i \mid x_i \right)/dx_i\) (when \(E\left( y_i \mid x_i \right)\) is linear in \(x_i\))
- In the literature, \(y_i=βx_i+ε_i\) is often described as the “incorrect model” while \(y_i=βx_i+θz_i+ν_i\) is the “correct model”. In fact, whether \(y_i=βx_i+ε_i\) is incorrect depends entirely on what you are trying to estimate. You only have a missing variable bias if you are estimating \(dE\left( y_i \mid x_i,z_i \right)/dx_i\) using \(b\).
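One way to see where \(γ\) comes from (a standard omitted-variable algebra step, under the additional assumption, not stated above, that \(E\left( z_i \mid x_i \right)=δx_i\) for some scalar \(δ\)):
\[
E\left( y_i \mid x_i \right)=βx_i+θ\,E\left( z_i \mid x_i \right)=\left( β+θδ \right)x_i,
\qquad\text{so}\qquad γ=β+θδ .
\]
In words: \(b\) converges to \(β\) plus \(θ\) times the coefficient from projecting the missing variable on \(x_i\), which is the usual missing variable bias formula.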
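A small simulation can make the point concrete. The sketch below (my own illustration, not from the notes) picks hypothetical values \(β=1\), \(θ=0.5\), and generates \(z_i\) with \(E\left( z_i \mid x_i \right)=2x_i\); the short regression coefficient \(b\) then estimates \(γ=β+θ\cdot 2=2\), not \(β\):

```python
import numpy as np

# Assumed data-generating process (illustrative values, not from the notes):
# y_i = beta*x_i + theta*z_i + nu_i,  z_i = delta*x_i + u_i
beta, theta, delta = 1.0, 0.5, 2.0

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
z = delta * x + rng.standard_normal(n)            # E(z_i | x_i) = delta * x_i
y = beta * x + theta * z + rng.standard_normal(n)  # E(y_i | x_i, z_i) = beta*x_i + theta*z_i

# The OLS estimator from the notes: b = sum(x_i * y_i) / sum(x_i^2)
b = np.sum(x * y) / np.sum(x * x)

print(b)  # close to gamma = beta + theta*delta = 2.0, far from beta = 1.0
```

So \(b\) is a fine estimator of \(dE\left( y_i \mid x_i \right)/dx_i=γ\), and a badly biased one of \(dE\left( y_i \mid x_i,z_i \right)/dx_i=β\), exactly as the summary states.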