Binary choice models from a latent variable representation

Summary

  • Given:
    • a random sample: \(\left( y_i,x_i \right)\) where \(y_i\) can only take values 0 or 1 and \(x_i\) is \(k×1\)
    • the assumption that \(E\left( y_i \mid x_i \right)=P\left( y_i=1 \mid x_i \right)=F\left( x'_iβ \right)\), where \(F\) is a function of a single variable mapping \(\mathbb{R}\) into \(\left[ 0,1 \right]\) (a binary choice model)
  • A latent variable representation is an alternative, but equivalent, formulation resulting in the same model.
  • We assume that

\[y_i^*=x'_iβ+ε_i\]

  • Here, \(y_i^*\) is a continuous but unobserved variable, called a latent variable (latent means something like “hidden” or “concealed”).
  • Conditional on \(x_i\), \(ε_i\) is a random variable with a known CDF \(F\). We assume that the pdf of \(ε_i\) (given \(x_i\)) is symmetric around zero, so that \(F\left( -z \right)=1-F\left( z \right)\) for all \(z\) and, provided the mean exists, exogeneity holds: \(E\left( ε_i \mid x_i \right)=0\).
  • We assume that \(y_i=1\) if \(y_i^*>0\) and \(y_i=0\) if \(y_i^*≤0\). If the latent variable \(y_i^*\), which depends on the explanatory variables as well as on the error term, is positive, then we observe \(y_i=1\). Otherwise we observe \(y_i=0\).
  • The latent variable representation results in the binary choice model, where the last equality uses the symmetry of the distribution of \(ε_i\) around zero:

\[P\left( y_i=1 \mid x_i \right)=P\left( y_i^*>0 \mid x_i \right)=P\left( x'_iβ+ε_i>0 \mid x_i \right)=P\left( ε_i>-x'_iβ \mid x_i \right)=1-F\left( -x'_iβ \right)=F\left( x'_iβ \right)\]
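
The equivalence can be checked numerically. The following is a minimal simulation sketch, not part of the original notes. It assumes that \(F\) is the standard logistic CDF (which gives the logit model) and uses hypothetical values for \(β\) and \(x_i\). It draws \(ε_i\) from \(F\), builds \(y_i^*=x'_iβ+ε_i\), sets \(y_i=1\) when \(y_i^*>0\), and compares the share of ones with \(F\left( x'_iβ \right)\).

```python
import math
import random


def logistic_cdf(z):
    """Standard logistic CDF, used here as an assumed choice of F."""
    return 1.0 / (1.0 + math.exp(-z))


def draw_logistic(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) in the extremely unlikely case u == 0
        u = rng.random()
    return math.log(u / (1.0 - u))


# Hypothetical parameter vector and regressor vector (first entry is a constant).
beta = [0.5, -1.0]
x = [1.0, 0.3]
index = sum(b * xj for b, xj in zip(beta, x))  # x'beta

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100_000

ones = 0
for _ in range(n):
    y_star = index + draw_logistic(rng)  # latent variable y* = x'beta + eps
    if y_star > 0:                       # observation rule: y = 1 if y* > 0
        ones += 1

empirical = ones / n               # simulated P(y = 1 | x)
theoretical = logistic_cdf(index)  # F(x'beta)

# Symmetry step used in the derivation: 1 - F(-z) = F(z)
symmetry_gap = abs((1.0 - logistic_cdf(-index)) - logistic_cdf(index))

print("simulated  P(y=1|x):", round(empirical, 4))
print("F(x'beta)          :", round(theoretical, 4))
```

With a large number of draws the two printed numbers should agree up to simulation noise. Replacing the logistic draws with standard normal draws, and `logistic_cdf` with the standard normal CDF, would give the corresponding check for the probit model.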