Linear probability model

Summary

The linear probability model

  • Given: a random sample \(\left( y_i,x_i \right)\), where \(y_i\) can only take the values 0 or 1 and \(x_i\) is \(k×1\)
  • Statistical model:

\[E\left( y_i|x_i \right)=x'_iβ\]

  • where \(β\) is a \(k×1\) vector of unknown parameters.
  • Result:

\[E\left( y_i|x_i \right)=P\left( y_i=1 \mid x_i \right)\]

  • Since we assume that \(P\left( y_i=1 \mid x_i \right)=x'_iβ\), this model is called the linear probability model (LPM).
  • \(x'_iβ\) is a probability, so for the LPM to make sense, \(x'_iβ\) must belong to the interval \(\left[ 0,1 \right]\).
  • Define \(ε_i=y_i-E\left( y_i|x_i \right)\) such that

\[y_i=x'_iβ+ε_i\]

  • we can estimate \(β\) consistently using OLS (or the method of moments, MM).
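As a numerical sketch of this estimation step (simulated data and a hypothetical coefficient vector, chosen so that \(x'_iβ\) stays in \([0,1]\); NumPy only), OLS applied to Bernoulli outcomes recovers \(β\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical LPM: P(y = 1 | x) = 0.2 + 0.5 * x_2, with x_2 uniform on [0, 1],
# so the probability always lies in [0.2, 0.7] and the LPM is well specified.
n = 50_000
x = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # k = 2: intercept + one regressor
beta = np.array([0.2, 0.5])

# Generate Bernoulli outcomes with success probability x'beta
y = rng.binomial(1, x @ beta)

# OLS estimate b = (X'X)^{-1} X'y
b, *_ = np.linalg.lstsq(x, y, rcond=None)
print(b)  # close to [0.2, 0.5], illustrating consistency of OLS
```

With a large sample the estimate lands close to the true \((0.2, 0.5)\), even though the outcome is binary rather than continuous.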

Problems with the linear probability model

  • The model assumes constant marginal effects,

\[ \frac{∂E\left( y|x \right)}{∂x_j}= \frac{∂P\left( y=1 \mid x \right)}{∂x_j}=β_j\]

  • This is unreasonable in most cases. In general, \(∂P\left( y=1 \mid x \right)/∂x_j\) will decrease as \(x_j\) gets large, since the probability is bounded above by 1.
  • \(ε_i\) can only take two values, \(1-x'_iβ\) and \(-x'_iβ\), with conditional probabilities \(x'_iβ\) and \(1-x'_iβ\), respectively. Exogeneity holds, but

\[Var\left( ε_i|x_i \right)=x'_iβ\left( 1-x'_iβ \right)\]

  • which is not constant. We have a complicated form of heteroscedasticity.
  • Predicted values, \({\hat{y}}_i=x'_ib\), which are estimated probabilities, may fall outside the \(\left[ 0,1 \right]\) interval.
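Both problems can be seen numerically. The sketch below (simulated data, hypothetical parameters as before, NumPy only) computes heteroscedasticity-robust White/HC0 standard errors — one standard remedy for the non-constant \(Var\left( ε_i|x_i \right)\) — and then shows an out-of-range fitted value obtained by extrapolating the estimated line:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated LPM data (hypothetical parameters chosen so x'beta stays in [0, 1])
n = 100_000
x = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta = np.array([0.2, 0.5])
y = rng.binomial(1, x @ beta)

# OLS fit and residuals
b = np.linalg.lstsq(x, y, rcond=None)[0]
e = y - x @ b

# Classical variance estimate s^2 (X'X)^{-1} versus the White/HC0 estimate
# (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}, which remains valid under the
# LPM heteroscedasticity Var(eps_i | x_i) = x_i'beta (1 - x_i'beta)
xtx_inv = np.linalg.inv(x.T @ x)
v_classical = (e @ e) / (n - 2) * xtx_inv
v_robust = xtx_inv @ ((x * e[:, None] ** 2).T @ x) @ xtx_inv
print(np.sqrt(np.diag(v_classical)))
print(np.sqrt(np.diag(v_robust)))

# Out-of-range prediction: evaluating the fitted line at x_2 = 2
y_hat = np.array([1.0, 2.0]) @ b
print(y_hat)  # roughly 0.2 + 0.5 * 2 = 1.2, not a valid probability
```

The robust and classical standard errors differ only modestly in this simulation, but the robust ones are the safe default for the LPM; the fitted value at \(x_2=2\) exceeds 1, illustrating the last bullet.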