Linear probability model

Summary

The linear probability model

  • Given: a random sample \(\left( y_i,x_i \right)\), where \(y_i\) can only take the values 0 or 1 and \(x_i\) is \(k×1\)
  • Statistical model:

\[E\left( y_i|x_i \right)=x'_iβ\]

  • where \(β\) is a \(k×1\) vector of unknown parameters.
  • Result:

\[E\left( y_i|x_i \right)=P\left( y_i=1 \mid x_i \right)\]

  • Since we assume that \(P\left( y_i=1 \mid x_i \right)=x'_iβ\), this model is called the linear probability model (LPM).
  • \(x'_iβ\) is a probability, so for the LPM to make sense, \(x'_iβ\) must belong to the interval \(\left[ 0,1 \right]\).
  • Define \(ε_i=y_i-E\left( y_i|x_i \right)\) such that

\[y_i=x'_iβ+ε_i\]

  • we can estimate \(β\) consistently using OLS (or the method of moments, MM).
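As a numerical sketch of this estimation step (simulated data and a hypothetical coefficient vector, chosen so that \(x'_iβ\) stays in \([0,1]\); NumPy only), OLS applied to Bernoulli outcomes recovers \(β\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical LPM: P(y = 1 | x) = 0.2 + 0.5 * x_2, with x_2 uniform on [0, 1],
# so the probability always lies in [0.2, 0.7] and the LPM is well specified.
n = 50_000
x = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # k = 2: intercept + one regressor
beta = np.array([0.2, 0.5])

# Generate Bernoulli outcomes with success probability x'beta
y = rng.binomial(1, x @ beta)

# OLS estimate b = (X'X)^{-1} X'y
b, *_ = np.linalg.lstsq(x, y, rcond=None)
print(b)  # close to [0.2, 0.5], illustrating consistency of OLS
```

With a large sample the estimate lands close to the true \((0.2, 0.5)\), even though the outcome is binary rather than continuous.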

Problems with the linear probability model

  • The model assumes constant marginal effects,

\[ \frac{∂E\left( y|x \right)}{∂x_j}= \frac{∂P\left( y=1 \mid x \right)}{∂x_j}=β_j\]

  • This is unreasonable in most cases. In general, \(∂P\left( y=1 \mid x \right)/∂x_j\) will decrease as \(x_j\) gets large, since the probability is bounded above by 1.
  • \(ε_i\) can only take two values, \(1-x'_iβ\) and \(-x'_iβ\), with conditional probabilities \(x'_iβ\) and \(1-x'_iβ\), respectively. Exogeneity holds, but

\[Var\left( ε_i|x_i \right)=x'_iβ\left( 1-x'_iβ \right)\]

  • which is not constant. We have a complicated form of heteroscedasticity.
  • Predicted values, \({\hat{y}}_i=x'_ib\), which are estimated probabilities, may fall outside the \(\left[ 0,1 \right]\) interval.
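Both problems can be seen numerically. The sketch below (simulated data, hypothetical parameters as before, NumPy only) computes heteroscedasticity-robust White/HC0 standard errors — one standard remedy for the non-constant \(Var\left( ε_i|x_i \right)\) — and then shows an out-of-range fitted value obtained by extrapolating the estimated line:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated LPM data (hypothetical parameters chosen so x'beta stays in [0, 1])
n = 100_000
x = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta = np.array([0.2, 0.5])
y = rng.binomial(1, x @ beta)

# OLS fit and residuals
b = np.linalg.lstsq(x, y, rcond=None)[0]
e = y - x @ b

# Classical variance estimate s^2 (X'X)^{-1} versus the White/HC0 estimate
# (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}, which remains valid under the
# LPM heteroscedasticity Var(eps_i | x_i) = x_i'beta (1 - x_i'beta)
xtx_inv = np.linalg.inv(x.T @ x)
v_classical = (e @ e) / (n - 2) * xtx_inv
v_robust = xtx_inv @ ((x * e[:, None] ** 2).T @ x) @ xtx_inv
print(np.sqrt(np.diag(v_classical)))
print(np.sqrt(np.diag(v_robust)))

# Out-of-range prediction: evaluating the fitted line at x_2 = 2
y_hat = np.array([1.0, 2.0]) @ b
print(y_hat)  # roughly 0.2 + 0.5 * 2 = 1.2, not a valid probability
```

The robust and classical standard errors differ only modestly in this simulation, but the robust ones are the safe default for the LPM; the fitted value at \(x_2=2\) exceeds 1, illustrating the last bullet.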