Maximum likelihood for one variable
Summary
Setup and statistical model
- Given: a random sample \(y_1, \ldots ,y_n\), where each \(y_i\) is a scalar. \(y_i\) may be discrete or continuous.
- We postulate a statistical model by assuming that the density/distribution of \(y_i\) is known up to an unknown \(p×1\) vector of parameters \(θ\) .
- The density/distribution function is denoted by
\[f\left( y_i;θ \right)\]
- If \(y_i\) is continuous, then \(f\left( y_i;θ \right)\) is the probability density function while if \(y_i\) is discrete, \(f\left( y_i;θ \right)\) is the probability distribution function (or probability mass function).
- The density/distribution function viewed as a function of \(θ\) is called the likelihood contribution or the individual likelihood function and it is denoted by \(L_i\left( θ \right)\) ,
\[L_i\left( θ \right)=f\left( y_i;θ \right)\]
Joint density and likelihood function
- Because the sample is random (the \(y_i\) are independent), the joint density function is the product of the marginal densities
\[f_J\left( y;θ \right)=\prod_{i=1}^{n}{ f\left( y_i;θ \right) }\]
- The joint density/distribution function viewed as a function of \(θ\) is called the likelihood function and it is denoted by \(L\left( θ \right)\) ,
\[L\left( θ \right)=\prod_{i=1}^{n}{ L_i\left( θ \right) }\]
Log-likelihood
- The log-likelihood contribution or the individual log-likelihood function is defined as the (natural) logarithm of the likelihood contribution. The log-likelihood contribution is denoted by \(l_i\left( θ \right)\) :
\[l_i\left( θ \right)=\log L_i\left( θ \right)\]
- The log-likelihood function is defined as the (natural) logarithm of the likelihood function. The log-likelihood function is denoted by \(l\left( θ \right)\) :
\[l\left( θ \right)=\log L\left( θ \right)\]
- Result
\[l\left( θ \right)=\sum_{i=1}^{n}{ l_i\left( θ \right) }\]
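As a concrete illustration (the model and data are hypothetical, not from the notes), take a Bernoulli model \(f\left( y_i;θ \right)=θ^{y_i}{\left( 1-θ \right)}^{1-y_i}\) with a small 0/1 sample; the sketch below checks numerically that \(l\left( θ \right)=\log L\left( θ \right)=\sum_{i} l_i\left( θ \right)\):

```python
import math

# Hypothetical 0/1 sample; Bernoulli model f(y_i; theta) = theta^y_i * (1-theta)^(1-y_i)
y = [1, 0, 1, 1, 0, 1, 0, 1]

def L_i(yi, theta):
    """Likelihood contribution L_i(theta) = f(y_i; theta)."""
    return theta**yi * (1 - theta)**(1 - yi)

def l_i(yi, theta):
    """Log-likelihood contribution l_i(theta) = log L_i(theta)."""
    return yi * math.log(theta) + (1 - yi) * math.log(1 - theta)

def L(theta):
    """Likelihood L(theta) = product of the L_i(theta)."""
    prod = 1.0
    for yi in y:
        prod *= L_i(yi, theta)
    return prod

def l(theta):
    """Log-likelihood l(theta) = sum of the l_i(theta)."""
    return sum(l_i(yi, theta) for yi in y)

# The result l(theta) = log L(theta) = sum_i l_i(theta), checked at a few points
for theta in (0.2, 0.5, 0.8):
    assert abs(l(theta) - math.log(L(theta))) < 1e-12
```

Note that \(L\left( θ \right)\) is a product of numbers below one and underflows quickly as \(n\) grows, which is one practical reason to work with \(l\left( θ \right)\) instead.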
Maximum likelihood
- Definition: The maximum likelihood estimator of \(θ\) , denoted by \({\hat{θ}}_{ML}\) , is defined as
\[{\hat{θ}}_{ML}=\arg\max_{θ} L\left( θ \right)\]
- “arg max” means the argument that maximizes the function
- Result: since \(\log x\) is strictly increasing,
\[{\hat{θ}}_{ML}=\arg\max_{θ} l\left( θ \right)\]
- It is generally simpler to maximize \(l\left( θ \right)\) than \(L\left( θ \right)\) .
- Result: If the model is correctly specified, then under weak regularity conditions \({\hat{θ}}_{ML}\) is a consistent estimator of \(θ\) ,
\[\operatorname{plim} {\hat{θ}}_{ML}=θ\]
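A minimal numerical sketch of these definitions, using a hypothetical exponential model \(f\left( y_i;θ \right)=θe^{-θy_i}\), for which \(l\left( θ \right)=n\log θ-θ\sum_{i}y_i\) and the MLE has the closed form \({\hat{θ}}_{ML}=1/\bar{y}\). A crude grid search over \(l\left( θ \right)\) recovers the closed-form maximizer, which is close to the true \(θ\) for large \(n\), illustrating consistency:

```python
import math
import random

random.seed(0)

# Hypothetical exponential model f(y_i; theta) = theta * exp(-theta * y_i), true theta = 2
theta_true = 2.0
n = 20000
y = [random.expovariate(theta_true) for _ in range(n)]
S = sum(y)

def l(theta):
    """Log-likelihood for this model: l(theta) = n log(theta) - theta * sum_i y_i."""
    return n * math.log(theta) - theta * S

# Crude grid search for arg max_theta l(theta) over (0, 4)
grid = [0.001 * k for k in range(1, 4000)]
theta_ml = max(grid, key=l)

theta_closed = n / S  # closed-form MLE for this model: 1 / ybar

assert abs(theta_ml - theta_closed) < 0.0015  # grid maximizer matches closed form
assert abs(theta_ml - theta_true) < 0.1       # consistency: close to true theta for large n
```

In practice one would use a numerical optimizer rather than a grid, but the grid keeps the sketch dependency-free.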
Score vector
- The individual score vector \(s_i\left( θ \right)\) is defined as
\[s_i\left( θ \right)= \frac{∂l_i\left( θ \right)}{∂θ}\]
- \(s_i\left( θ \right)\) is \(p×1\) .
- The score vector \(s\left( θ \right)\) is defined as
\[s\left( θ \right)= \frac{∂l\left( θ \right)}{∂θ}\]
- \(s\left( θ \right)\) is \(p×1\) .
- Result:
\[s\left( θ \right)=\sum_{i=1}^{n}{ s_i\left( θ \right) }\]
- First order condition for \({\hat{θ}}_{ML}\) :
\[s\left( {\hat{θ}}_{ML} \right)=0\]
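Continuing the hypothetical exponential model \(f\left( y_i;θ \right)=θe^{-θy_i}\): differentiating \(l\left( θ \right)=n\log θ-θ\sum_{i}y_i\) gives the score \(s\left( θ \right)=n/θ-\sum_{i}y_i\), and the first order condition \(s\left( {\hat{θ}}_{ML} \right)=0\) is solved by \({\hat{θ}}_{ML}=1/\bar{y}\):

```python
import math
import random

random.seed(1)

# Hypothetical exponential model: l(theta) = n log(theta) - theta * S,
# so the score is s(theta) = dl/dtheta = n/theta - S
n = 1000
y = [random.expovariate(1.5) for _ in range(n)]
S = sum(y)

def l(theta):
    return n * math.log(theta) - theta * S

def s(theta):
    """Score s(theta) = n / theta - S for this model."""
    return n / theta - S

theta_ml = n / S  # solves the first-order condition s(theta_ml) = 0 exactly

assert abs(s(theta_ml)) < 1e-8

# Cross-check: a central-difference numerical derivative of l at theta_ml is ~0
h = 1e-6
numerical_score = (l(theta_ml + h) - l(theta_ml - h)) / (2 * h)
assert abs(numerical_score) < 1e-3
```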
Information matrix
- The information matrix \(I\left( θ \right)\) , the same for each \(i=1, \ldots ,n\) since the \(y_i\) are identically distributed, is defined as
\[I\left( θ \right)=-E\left( \frac{∂^2l_i\left( θ \right)}{∂θ∂θ'} \right)=-E\left( \frac{∂s_i\left( θ \right)}{∂θ'} \right)\]
- \(I\left( θ \right)\) is \(p×p\) .
- Result
\[I\left( θ \right)=E\left( \frac{∂l_i\left( θ \right)}{∂θ} \frac{∂l_i\left( θ \right)}{∂θ'} \right)=E\left( s_i\left( θ \right)s_i{\left( θ \right)}' \right)\]
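A hypothetical one-parameter check of this identity: for \(y_i \sim N\left( μ,1 \right)\) with known variance, \(l_i\left( μ \right)=-\tfrac{1}{2}\log 2π-\tfrac{1}{2}{\left( y_i-μ \right)}^2\), so \(s_i\left( μ \right)=y_i-μ\) and \(-∂s_i/∂μ=1\); both expectations equal \(I\left( μ \right)=1\), which a Monte Carlo average confirms:

```python
import random

random.seed(2)

# Hypothetical example: y_i ~ N(mu, 1) with known variance, so
# s_i(mu) = y_i - mu, -ds_i/dmu = 1, and I(mu) = 1 on both sides of the identity
mu = 0.7
n = 100_000
y = [random.gauss(mu, 1.0) for _ in range(n)]

I_hessian = sum(1 for _ in y) / n              # Monte Carlo -E[ds_i/dmu] (identically 1)
I_outer = sum((yi - mu) ** 2 for yi in y) / n  # Monte Carlo E[s_i(mu)^2]

assert I_hessian == 1.0
assert abs(I_outer - 1.0) < 0.05
```

The identity holds only at the true \(θ\) and only under correct specification; it is the basis for the two variance estimators introduced below.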
The asymptotic distribution of the MLE
- Result: if the model is correctly specified, then under weak regularity conditions
\[\sqrt{n}\left( {\hat{θ}}_{ML}-θ \right)\overset{d}{→}N\left( 0,V \right)\]
- where \(V=I{\left( θ \right)}^{-1}\) , the asymptotic variance matrix of \({\hat{θ}}_{ML}\) .
- Result: if the model is correctly specified, then under weak regularity conditions \({\hat{θ}}_{ML}\) is asymptotically efficient : it has the smallest asymptotic variance among all consistent estimators.
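A Monte Carlo sketch of the asymptotic normality result for a hypothetical Bernoulli model (all parameter choices illustrative): there \({\hat{θ}}_{ML}=\bar{y}\), \(I\left( θ \right)=1/\left( θ\left( 1-θ \right) \right)\), so \(V=θ\left( 1-θ \right)\), and across replications \(\sqrt{n}\left( {\hat{θ}}_{ML}-θ \right)\) is centered at 0 with variance close to \(V\):

```python
import math
import random

random.seed(3)

# Bernoulli model: theta_hat_ML = ybar, I(theta) = 1/(theta(1-theta)), V = theta(1-theta)
theta, n, reps = 0.3, 500, 2000
V = theta * (1 - theta)  # = 0.21

z = []
for _ in range(reps):
    ybar = sum(1 if random.random() < theta else 0 for _ in range(n)) / n
    z.append(math.sqrt(n) * (ybar - theta))  # sqrt(n) * (theta_hat - theta)

mean_z = sum(z) / reps
var_z = sum(v * v for v in z) / reps

# The scaled estimation error is centered at 0 with variance close to V
assert abs(mean_z) < 0.05
assert abs(var_z - V) < 0.05
```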
Estimating the asymptotic variance
- In most cases, \(I\left( θ \right)\) cannot be found analytically.
- However,
\[I\left( θ \right)=-E\left( \frac{∂^2l_i\left( θ \right)}{∂θ∂θ'} \right)\]
- can be consistently estimated by the sample analogue
\[I_H\left( θ \right)=- \frac{1}{n}\sum_{i=1}^{n}{ \frac{∂^2l_i\left( θ \right)}{∂θ∂θ'} }\]
- evaluated at the MLE, i.e. by \(I_H\left( {\hat{θ}}_{ML} \right)\) .
- Also,
\[I\left( θ \right)=E\left( \frac{∂l_i\left( θ \right)}{∂θ} \frac{∂l_i\left( θ \right)}{∂θ'} \right)=E\left( s_i\left( θ \right)s_i{\left( θ \right)}' \right)\]
- can be consistently estimated by the sample analogue
\[I_G\left( θ \right)= \frac{1}{n}\sum_{i=1}^{n}{ s_i\left( θ \right)s_i{\left( θ \right)}' }\]
- evaluated at the MLE, i.e. by \(I_G\left( {\hat{θ}}_{ML} \right)\) .
- We have two consistent estimators of \(I\left( θ \right)\) , \(I_H\left( {\hat{θ}}_{ML} \right)\) and \(I_G\left( {\hat{θ}}_{ML} \right)\) , where \(H\) stands for “Hessian” and \(G\) for “Gradient”. You can find many more.
- Result
\[{\hat{V}}_H=I_H{\left( {\hat{θ}}_{ML} \right)}^{-1}\]
- as well as
\[{\hat{V}}_G=I_G{\left( {\hat{θ}}_{ML} \right)}^{-1}\]
- are consistent estimators of the asymptotic variance \(V\) , the variance of \(\sqrt{n}\left( {\hat{θ}}_{ML}-θ \right)\) as \(n→∞\) .
- Result: approximately for \(n\) large,
\[{\hat{θ}}_{ML} \sim N\left( θ,n^{-1}{\hat{V}}_H \right)\]
- and
\[{\hat{θ}}_{ML} \sim N\left( θ,n^{-1}{\hat{V}}_G \right)\]
- \({\hat{V}}_H\) and \({\hat{V}}_G\) can be calculated numerically.
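An end-to-end sketch for a hypothetical Poisson model (all model and parameter choices are illustrative): there \(l_i\left( θ \right)=y_i\log θ-θ-\log y_i!\), so \(s_i\left( θ \right)=y_i/θ-1\), \(-∂^2l_i/∂θ^2=y_i/θ^2\), \({\hat{θ}}_{ML}=\bar{y}\), and both \({\hat{V}}_H\) and \({\hat{V}}_G\) approximate \(V=I{\left( θ \right)}^{-1}=θ\):

```python
import math
import random

random.seed(4)

# Hypothetical Poisson model: f(y; theta) = e^{-theta} theta^y / y!, so
# s_i(theta) = y_i/theta - 1, -d^2 l_i/dtheta^2 = y_i/theta^2, theta_hat_ML = ybar
theta_true, n = 4.0, 5000

def poisson_draw(lam):
    """Knuth's method for a Poisson draw (the stdlib random module has no Poisson sampler)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

y = [poisson_draw(theta_true) for _ in range(n)]
theta_hat = sum(y) / n  # MLE for the Poisson model

# Hessian-based and gradient-based estimates of I(theta), both evaluated at the MLE
I_H = sum(yi / theta_hat**2 for yi in y) / n          # reduces to 1/theta_hat here
I_G = sum((yi / theta_hat - 1) ** 2 for yi in y) / n

V_H, V_G = 1 / I_H, 1 / I_G  # estimates of V = I(theta)^{-1} = theta for this model

assert abs(V_H - theta_true) < 0.3
assert abs(V_G - theta_true) < 0.5

se = math.sqrt(V_H / n)  # approximate standard error of theta_hat, per the result above
```

The two estimates agree closely here because the model is correctly specified; a large gap between \({\hat{V}}_H\) and \({\hat{V}}_G\) in practice is a symptom of misspecification.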