Error terms and the regression model
Summary
- Definition of (additive) error terms for \(i=1,\ldots,n\):
\[ε_i=y_i-E\left( y_i|x_i \right)\]
- Under the statistical model, \(E\left( y_i|x_i \right)=g(x_i,β)\) and
\[ε_i=y_i-g(x_i,β)\]
- or
\[y_i=g(x_i,β)+ε_i\]
- This is called a regression model (RM).
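A minimal simulation sketch of this decomposition, assuming a made-up nonlinear mean function \(g(x_i,β)=β_1+β_2 x_i^2\) (not taken from the notes), could look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical nonlinear conditional mean: g(x, beta) = beta1 + beta2 * x**2
beta = np.array([1.0, 0.5])
def g(x, beta):
    return beta[0] + beta[1] * x**2

x = rng.normal(size=n)                  # explanatory variable
eps = rng.normal(scale=0.3, size=n)     # error terms with E(eps | x) = 0
y = g(x, beta) + eps                    # regression model: y_i = g(x_i, beta) + eps_i

# Recovering the error terms as the deviation from the conditional mean
eps_recovered = y - g(x, beta)
print(np.allclose(eps, eps_recovered))  # True: eps_i = y_i - g(x_i, beta)
```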
- Under the linear statistical model, \(E\left( y_i|x_i \right)=x'_iβ\), for \(i=1,\ldots,n\):
\[ε_i=y_i-x'_iβ\]
- or
\[y_i=x'_iβ+ε_i\]
- This is called a linear regression model (LRM).
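As a small worked example with made-up numbers: if \(x'_i=(1,\;2)\), \(β=(1,\;0.5)'\) and \(y_i=2.3\), then
\[ε_i=y_i-x'_iβ=2.3-(1×1+2×0.5)=0.3\]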
- Result for \(i=1,\ldots,n\): under the linear statistical model,
\[E\left( ε_i|x_i \right)=0\]
- since \(E\left( ε_i|x_i \right)=E\left( y_i|x_i \right)-x'_iβ=x'_iβ-x'_iβ=0\).
- We say that the explanatory variables are exogenous with respect to the error terms if this holds.
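One way to see the exogeneity condition in action is to simulate a linear regression model and check that the sample mean of the errors is close to zero within bins of \(x\); a rough sketch with made-up parameter values (not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x = rng.uniform(-2, 2, size=n)           # stochastic explanatory variable
beta0, beta1 = 2.0, -1.5                  # hypothetical coefficients
eps = rng.normal(scale=1.0, size=n)       # errors drawn independently of x, mean 0
y = beta0 + beta1 * x + eps               # linear regression model

eps_check = y - (beta0 + beta1 * x)       # eps_i = y_i - x_i' beta

# The average error within each bin of x approximates E(eps | x) and should be ~0
bins = np.linspace(-2, 2, 11)
which = np.digitize(x, bins)
binned_means = [eps_check[which == b].mean() for b in np.unique(which)]
print(np.round(binned_means, 2))          # all entries are close to 0
```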
- Definition of the vector of error terms:
\[ε=\begin{bmatrix}ε_1 \\ \vdots \\ ε_n\end{bmatrix}\]
- \(ε\) is \(n×1\) and we have
\[ε=y-E\left( y|X \right)\]
- Result: under the linear statistical model, \(E\left( y|X \right)=Xβ\), we have
\[y=Xβ+ε\]
- This is called the linear regression model in vector form.
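A minimal numpy sketch of the vector form, with made-up dimensions and coefficient values, showing that \(ε=y-Xβ\) stacks the \(n\) individual equations:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 3                               # n observations, k regressors (incl. intercept)

X = np.column_stack([np.ones(n),          # intercept column
                     rng.normal(size=(n, k - 1))])  # X is n x k
beta = np.array([1.0, 0.5, -2.0])         # beta is k x 1 (hypothetical values)
eps = rng.normal(size=n)                  # eps is n x 1

y = X @ beta + eps                        # linear regression model in vector form
print(np.allclose(eps, y - X @ beta))     # True: eps = y - X beta
```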
- Under the linear statistical model,
\[E\left( ε|X \right)=0\]
- This is the exogeneity condition in matrix form.
- Some notes:
- If \(x_i\) is non-stochastic, then \(E\left( ε_i|x_i \right)=E\left( ε_i \right)\) for \(i=1,\ldots,n\)
- If \(x_i\) is independent of \(ε_i\), then \(E\left( ε_i|x_i \right)=E\left( ε_i \right)\) for \(i=1,\ldots,n\)
- In these cases, the explanatory variables are exogenous if \(E\left( ε_i \right)=0\) for \(i=1,\ldots,n\) or, equivalently, \(E\left( ε \right)=0\).