Two-stage least squares

Problem

The IV estimator and the generalized IV estimator can be computed using OLS twice.

  • First, we regress \(X\) on \(Z\) using OLS, with \(X\) as the dependent variable and \(Z\) as the explanatory variables. Even though \(X\) is an \(n \times k\) matrix rather than a vector, the OLS formula still applies (column by column) and gives the coefficient matrix \({\left( Z'Z \right)}^{-1}Z'X\).
  • Next, we get the fitted values from this regression. In general notation, fitted values are “\(Xb\)”, where now \(Z\) plays the role of the explanatory variables and \(b\) is \({\left( Z'Z \right)}^{-1}Z'X\). The fitted values are therefore \(Z{\left( Z'Z \right)}^{-1}Z'X=P_ZX\).
  • In the second regression, we explain \(y\) by the fitted values from the first regression, again using OLS. That is, our “\(X\)”-matrix in the second regression is \(P_ZX\). Show that the OLS estimator from this second regression is the same as \(b_{GIV}\),

\[b_{GIV}={\left( X'P_ZX \right)}^{-1}X'P_Zy\]

where \(P_Z=Z{\left( Z'Z \right)}^{-1}Z'\).

Note: For this reason, the generalized IV estimator is also called the two-stage least squares estimator, or the 2SLS estimator.
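
Before proving the claim, it can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original exercise: the function names (`ols`, `two_stage_ls`, `giv_closed_form`), the simulated data, and the dimensions (`n`, `k`, and `m` instruments) are all assumptions made for illustration. It runs the two OLS regressions described above and compares the result with the closed-form expression for \(b_{GIV}\).

```python
import numpy as np


def ols(A, B):
    """OLS coefficients (A'A)^{-1} A'B, computed via least squares."""
    coef, *_ = np.linalg.lstsq(A, B, rcond=None)
    return coef


def two_stage_ls(y, X, Z):
    """Run OLS twice: X on Z, then y on the fitted values P_Z X."""
    first_stage = ols(Z, X)        # (Z'Z)^{-1} Z'X
    X_hat = Z @ first_stage        # fitted values P_Z X
    return ols(X_hat, y)           # second-stage OLS of y on P_Z X


def giv_closed_form(y, X, Z):
    """b_GIV = (X' P_Z X)^{-1} X' P_Z y, without forming P_Z explicitly."""
    PZ_X = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    PZ_y = Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
    return np.linalg.solve(X.T @ PZ_X, X.T @ PZ_y)


# Simulated data (illustrative only): n observations, k regressors, m instruments.
rng = np.random.default_rng(0)
n, k, m = 200, 2, 3
Z = rng.normal(size=(n, m))
X = Z @ rng.normal(size=(m, k)) + rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

b_2sls = two_stage_ls(y, X, Z)
b_giv = giv_closed_form(y, X, Z)
print("2SLS and closed-form GIV agree:", np.allclose(b_2sls, b_giv))
```

The two estimates should agree up to floating-point error. Note that the simulation only illustrates the algebraic identity; it says nothing about whether the instruments are valid.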

Solution

In standard notation, the OLS formula is \({\left( X'X \right)}^{-1}X'y\). In the second regression, however, our “\(X\)” is \(P_ZX\). Our “\(X'\)” is therefore \({\left( P_ZX \right)}'=X'P'_Z\) and our “\(X'X\)” is \(X'P'_ZP_ZX\). Now, \(P_Z\) is symmetric, \(P'_Z=P_Z\), and idempotent:

\[P_ZP_Z=Z{\left( Z'Z \right)}^{-1}Z'Z{\left( Z'Z \right)}^{-1}Z'=P_Z\]

since \(Z'Z{\left( Z'Z \right)}^{-1}=I\). Therefore, our “\(X'X\)” is \(X'P_ZX\). Our “\(X'y\)” becomes \(X'P'_Zy=X'P_Zy\). Putting this together, the second-stage OLS estimator is \({\left( X'P_ZX \right)}^{-1}X'P_Zy=b_{GIV}\).
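
The two properties of \(P_Z\) used in the argument, symmetry and idempotence, can also be illustrated numerically. This is a small sketch assuming NumPy and an arbitrary randomly generated full-column-rank \(Z\); the variable names and dimensions are hypothetical choices, not taken from the exercise.

```python
import numpy as np

# Arbitrary instrument matrix with full column rank (illustrative only).
rng = np.random.default_rng(1)
n, m, k = 50, 3, 2
Z = rng.normal(size=(n, m))
X = rng.normal(size=(n, k))

# P_Z = Z (Z'Z)^{-1} Z', formed explicitly here because n is small.
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)

is_symmetric = np.allclose(P_Z, P_Z.T)         # P_Z' = P_Z
is_idempotent = np.allclose(P_Z @ P_Z, P_Z)    # P_Z P_Z = P_Z

# Consequence used in the solution: X' P_Z' P_Z X = X' P_Z X.
lhs = X.T @ P_Z.T @ P_Z @ X
rhs = X.T @ P_Z @ X
cross_products_match = np.allclose(lhs, rhs)

print(is_symmetric, is_idempotent, cross_products_match)
```

For large \(n\), forming the \(n \times n\) matrix \(P_Z\) explicitly is wasteful; computing \(P_ZX\) as \(Z\) times the first-stage coefficients avoids it.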