Binary choice models

Summary

  • Given: a random sample \(\left( y_i,x_i \right)\), \(i=1,…,n\), where \(y_i\) can only take the values 0 or 1 and \(x_i\) is \(k×1\)
  • Statistical model:

\[E\left( y_i|x_i \right)=P\left( y_i=1 \mid x_i \right)=G\left( x_i,β \right)\]

  • where \(G\) is some given function of \(x_i\) and \(β\) with range \(\left[ 0,1 \right]\), and \(β\) is a vector of unknown parameters.
  • A common choice for \(G\) is

\[G\left( x_i,β \right)=F\left( x'_iβ \right)\]

  • where \(F\) is a much simpler function of only one variable, mapping \(R\) to \(\left[ 0,1 \right]\) (whereas \(G\) is a function of \(2k\) variables)
  • If \(F\) is the CDF of the standard normal distribution,

\[F\left( w \right)=Φ\left( w \right)=\int_{-∞}^{w}{ \frac{1}{\sqrt{2π}}exp \left( - \frac{t^2}{2} \right)dt }\]

  • then our binary choice model is called a probit model.
  • If \(F\) is the CDF of a logistic distribution,

\[F\left( w \right)=L\left( w \right)= \frac{1}{1+e^{-w}}= \frac{e^w}{1+e^w}\]

  • then our binary choice model is called a logit model.
  • If \(F\left( w \right)=Φ\left( w \right)\) or \(F\left( w \right)=L\left( w \right)\) then \(F\) will have the following properties
    • \(F\left( 0 \right)=1/2\)
    • \(F\left( w \right)\) is strictly increasing
    • \(F\left( w \right)→1\) as \(w→∞\)
    • \(F\left( w \right)→0\) as \(w→-∞\)
  • The blue curve is the graph of \(Φ\left( w \right)\) while the red curve is the graph of \(L\left( w \right)\); a small numerical comparison of the two is sketched below.
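
A minimal sketch (assuming NumPy and SciPy are available) that evaluates the two \(F\)-functions on a grid of \(w\)-values: scipy.stats.norm.cdf gives \(Φ\left( w \right)\) and scipy.special.expit gives \(L\left( w \right)\).

```python
# Evaluate Phi(w) and L(w) = 1/(1+e^-w) on a grid and print them side by side.
import numpy as np
from scipy.stats import norm
from scipy.special import expit

w = np.linspace(-4, 4, 9)
for wi, p, l in zip(w, norm.cdf(w), expit(w)):
    print(f"w = {wi:5.1f}   Phi(w) = {p:.3f}   L(w) = {l:.3f}")
```

Both functions equal \(1/2\) at \(w=0\) and approach 0 and 1 in the tails, as listed above.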

  • Once we have made a choice for our \(F\)-function, \(y_i\) (conditionally on \(x_i\)) follows a Bernoulli distribution with parameter \(F\left( x'_iβ \right)\), and we can find \(L_i\left( β \right)\), \(l_i\left( β \right)\) and \(l\left( β \right)\): the individual likelihood contribution, the individual log-likelihood contribution and the log-likelihood function. We have

\[l\left( β \right)=\sum_{i=1}^{n}{ \left( y_ilog F\left( x'_iβ \right)+\left( 1-y_i \right)log \left( 1-F\left( x'_iβ \right) \right) \right) }\]
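
As an illustration, here is a minimal sketch of maximizing \(l\left( β \right)\) numerically for the logit case. The simulated data (the values of \(n\), \(k\) and the true \(β\)) are made up for the example; the optimizer is applied to the negative log-likelihood.

```python
# Maximum likelihood for a logit model by minimizing -l(beta) with SciPy.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # x_i = (1, x_i2)'
beta_true = np.array([0.5, 1.0])                         # illustrative values
y = (rng.uniform(size=n) < expit(X @ beta_true)).astype(float)

def neg_loglik(beta, y, X):
    p = expit(X @ beta)                                  # F(x_i' beta)
    # negative sum of the individual log-likelihood contributions l_i(beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(k), args=(y, X), method="BFGS")
print("beta_ML:", res.x)                                 # ML estimate of beta
```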

  • We can then estimate \(β\) and its standard errors using maximum likelihood, predict probabilities \(P\left( y=1 \mid x \right)\), use Wald, LM and LR tests to test hypotheses, and so on.
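
In practice a library handles the maximization and inference. The sketch below assumes the statsmodels package and simulated data: Probit and Logit fit the model by maximum likelihood and report coefficients, standard errors and Wald z-statistics, while predict() returns \(P\left( y=1 \mid x \right)\).

```python
# Fit probit and logit models with statsmodels on simulated data.
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)                        # rows x_i' = (1, x_i)
y = (rng.uniform(size=n) < norm.cdf(0.5 + x)).astype(int)

probit_res = sm.Probit(y, X).fit()
logit_res = sm.Logit(y, X).fit()
print(probit_res.summary())                   # estimates, std. errors, z-tests
p_hat = probit_res.predict(X)                 # predicted P(y = 1 | x_i)
```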

Goodness of fit

  • There are many possible goodness-of-fit measures for a binary choice model. One of the most common is the pseudo-\(R^2\) defined by

\[1- \frac{1}{1+2\left( l_1-l_0 \right)/n}\]

  • where \(l_1\) is the maximum value of the log-likelihood function, \(l_1=l\left( {\hat{β}}_{ML} \right)\), and \(l_0\) is the maximum value of the log-likelihood function with no explanatory variables (only an intercept).
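
A minimal sketch of this pseudo-\(R^2\), assuming a fitted statsmodels result such as probit_res from the sketch above: llf gives \(l_1\), llnull gives \(l_0\) (the intercept-only fit) and nobs gives \(n\). Note that statsmodels' own prsquared attribute reports McFadden's \(1-l_1/l_0\), which is a different measure.

```python
# Pseudo-R^2 of the form 1 - 1 / (1 + 2*(l1 - l0)/n) from a fitted result.
def pseudo_r2(res):
    l1, l0, n = res.llf, res.llnull, res.nobs
    return 1.0 - 1.0 / (1.0 + 2.0 * (l1 - l0) / n)

print("pseudo-R^2:", pseudo_r2(probit_res))
```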