Maximum likelihood, one variable with explanatory variables

Summary

Setup and statistical model

Given: random sample \(\left( y_i,x_{i,1}, \ldots ,x_{i,k} \right)\) for \(i=1, \ldots ,n\).
Result: The density/distribution of \(\left( y_i,x_i \right)\), where \(x_i=\left( x_{i,1}, \ldots ,x_{i,k} \right)\), can be written as the product of the conditional density/distribution \(f_C\left( y_i \mid x_i \right)\) of \(y_i\) given \(x_i\) and the marginal density/distribution of \(x_i\), denoted by \(f_x\left( x_i \right)\):

\[f\left( y_i,x_i \right)=f_C\left( y_i \mid x_i \right)f_x\left( x_i \right)\]

We postulate a statistical model by assuming that

the conditional density/distribution of \(y_i\) given \(x_i\), \(f_C\left( y_i \mid x_i \right)\), is known up to an unknown \(p \times 1\) vector of parameters \(θ\). We denote it by \(f_C\left( y_i \mid x_i;θ \right)\).
the density/distribution of \(x_i\), \(f_x\left( x_i \right)\), does not depend on \(θ\).

Likelihood

The likelihood contribution:

\[L_i\left( θ \right)=f_C\left( y_i \mid x_i;θ \right)f_x\left( x_i \right)\]

The likelihood function

\[L\left( θ \right)=\prod_{i=1}^{n}{ f_C\left( y_i \mid x_i;θ \right)f_x\left( x_i \right) }=\prod_{i=1}^{n}{ f_C\left( y_i \mid x_i;θ \right) }\prod_{i=1}^{n}{ f_x\left( x_i \right) }\]

Result: the value of \(θ\) that maximizes the complete likelihood function is the same as the value that maximizes

\[\prod_{i=1}^{n}{ f_C\left( y_i \mid x_i;θ \right) }\]

The second factor of the likelihood, the product of the \(f_x\left( x_i \right)\), does not depend on \(θ\), so it is simply a multiplicative constant in the optimization.
Therefore, we drop this factor and redefine

\[L_i\left( θ \right)=f_C\left( y_i \mid x_i;θ \right)\]

\[L\left( θ \right)=\prod_{i=1}^{n}{ f_C\left( y_i \mid x_i;θ \right) }\]
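The argument for dropping the marginal factor can be checked numerically. The following is a minimal sketch under illustrative assumptions that are not part of the notes: a scalar \(θ\), the conditional model \(y_i \mid x_i \sim N(θ x_i, 1)\), and \(x_i \sim N(0,1)\). It works with log-likelihoods, where the product becomes a sum and the marginal factor becomes an additive constant. All function names are hypothetical.

```python
import math
import random


def normal_logpdf(z, mean, sd):
    """Log density of N(mean, sd^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (z - mean) ** 2 / (2 * sd ** 2)


def simulate(n, theta, seed=0):
    """Draw x_i ~ N(0, 1) and y_i | x_i ~ N(theta * x_i, 1) (assumed model)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [theta * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def conditional_loglik(theta, xs, ys):
    """Sum over i of log f_C(y_i | x_i; theta)."""
    return sum(normal_logpdf(y, theta * x, 1.0) for x, y in zip(xs, ys))


def marginal_loglik(xs):
    """Sum over i of log f_x(x_i); note that theta does not appear."""
    return sum(normal_logpdf(x, 0.0, 1.0) for x in xs)


def full_loglik(theta, xs, ys):
    """Log of the complete likelihood: conditional part plus marginal part."""
    return conditional_loglik(theta, xs, ys) + marginal_loglik(xs)


xs, ys = simulate(200, theta=1.5)
grid = [i / 100 for i in range(0, 301)]  # candidate values of theta in [0, 3]

best_full = max(grid, key=lambda t: full_loglik(t, xs, ys))
best_cond = max(grid, key=lambda t: conditional_loglik(t, xs, ys))

print("argmax of complete likelihood:   ", best_full)
print("argmax of conditional likelihood:", best_cond)
```

The two log-likelihoods differ by the same constant at every value of \(θ\) on the grid, so the grid maximizer is the same for both.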

The distribution of the x-data is irrelevant. All we care about are the conditional densities \(f_C\left( y_i \mid x_i;θ \right)\).
From this point on, everything is the same as ML estimation for one variable. Instead of specifying a family for the density of \(y_i\), we specify a family for the conditional density of \(y_i\) given \(x_i\).
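To illustrate what specifying a conditional family looks like in practice, here is a minimal sketch that assumes a binary \(y_i\) with \(P\left( y_i=1 \mid x_i;θ \right)=1/\left( 1+e^{-θ x_i} \right)\), a scalar \(θ\), and a single explanatory variable. This logistic family and the Newton-Raphson routine are illustrative choices, not something the notes prescribe, and the function names are hypothetical. The distribution used to draw \(x_i\) only matters for simulating data; it never enters the estimation.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used here as P(y = 1 | x; theta) with z = theta * x."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, theta, seed=1):
    """Draw x_i ~ N(0, 1), then y_i | x_i ~ Bernoulli(sigmoid(theta * x_i))."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(theta * x) else 0 for x in xs]
    return xs, ys


def score(theta, xs, ys):
    """First derivative of the conditional log-likelihood with respect to theta."""
    return sum((y - sigmoid(theta * x)) * x for x, y in zip(xs, ys))


def hessian(theta, xs):
    """Second derivative of the conditional log-likelihood (always negative here)."""
    total = 0.0
    for x in xs:
        p = sigmoid(theta * x)
        total -= p * (1.0 - p) * x * x
    return total


def fit(xs, ys, start=0.0, tol=1e-10, max_iter=50):
    """Maximize the conditional log-likelihood by Newton-Raphson iterations."""
    theta = start
    for _ in range(max_iter):
        step = score(theta, xs, ys) / hessian(theta, xs)
        theta -= step
        if abs(step) < tol:
            break
    return theta


xs, ys = simulate(2000, theta=1.0)
theta_hat = fit(xs, ys)
print("conditional ML estimate of theta:", theta_hat)
```

Only \(f_C\) appears in `score` and `hessian`; replacing the normal draw for \(x_i\) in `simulate` with any other distribution that does not depend on \(θ\) would leave the estimation code unchanged.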