Instrumental variables in Stata

Summary

Data

Return to schooling, Verbeek chapter 5.4

Data: schooling.dta:

The file schooling.dta contains data taken from the National Longitudinal Survey of Young Men (NLSYM) concerning the United States. The analysis focusses on 1976 but uses some variables that date back to earlier years. Unused variables have been removed.

smsa76 1 if lived in smsa in 1976 (lived in metropolitan area)

nearc4 grew up near 4-yr college

ed76 education in 1976

age76 age in 1976

south76 1 if lived in south in 1976 (geographically in USA)

lwage76 log wage in 1976 (outliers trimmed)

black 1 if black

exp76 experience in 1976

exp762 exp76 squared
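Before running any regressions it can help to load the file and check that the variables listed above are present. A minimal sketch, assuming schooling.dta sits in Stata's current working directory (the file location is an assumption, not something stated in the source):

```stata
* Load the data set; assumes schooling.dta is in the current working directory
use schooling, clear

* List variable names, storage types and labels
describe

* Summary statistics for the variables used in this section
summarize lwage76 ed76 exp76 exp762 age76 black smsa76 south76 nearc4
```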

Wage equation, OLS, Table 5.1

Stata command: regr lwage76 ed76 exp76 exp762 black smsa76 south76

Schooling equation using OLS, Table 5.2

g age762 = age76^2

regr ed76 age76 age762 black smsa76 south76 nearc4
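A natural follow-up to the schooling equation is to check whether nearc4 has explanatory power for education, since it will serve as the instrument for ed76. A minimal sketch, assuming the data are loaded and age762 has already been generated as above:

```stata
* Schooling equation, as above (regress is the unabbreviated form of regr)
regress ed76 age76 age762 black smsa76 south76 nearc4

* Test the null hypothesis that the coefficient on nearc4 is zero
test nearc4
```

With a single restriction, the F statistic reported by test equals the square of the t statistic on nearc4 in the regression output.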

Wage equation estimated by IV, Table 5.3

As in Table 5.1, but using proximity to a college (nearc4) as an instrument for education, and age and age^2 as instruments for experience and experience^2.

The up-to-date Stata command is:

ivregress 2sls lwage76 black smsa76 south76 (ed76 exp76 exp762 = nearc4 age76 age762)

  • “ivregress” is the command name for instrumental variables regression.
  • “2sls” is the name of the estimator. You have three choices here: 2sls, gmm and liml. If you have exactly as many instruments as endogenous variables (as here, three of each), all three return the same result, \(b_{IV}={\left( Z'X \right)}^{-1}Z'y\) described above.
  • Next comes the dependent variable.
  • Then you list all the exogenous regressors.
  • The endogenous variables go inside parentheses. After listing them, type “=” followed by the instruments. You need at least as many instruments as endogenous variables.
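To see what the 2sls estimator does with this variable list, the two stages can be run by hand with ordinary regressions. The sketch below is an illustration, not a replacement for the IV command: the names ed76_hat, exp76_hat and exp762_hat are made up for this example, and it assumes the data are loaded and age762 has been generated as above.

```stata
* First stage: regress each endogenous variable on ALL exogenous
* variables (included regressors plus instruments) and save fitted values
regress ed76 black smsa76 south76 nearc4 age76 age762
predict ed76_hat, xb

regress exp76 black smsa76 south76 nearc4 age76 age762
predict exp76_hat, xb

regress exp762 black smsa76 south76 nearc4 age76 age762
predict exp762_hat, xb

* Second stage: replace the endogenous variables by their fitted values
regress lwage76 ed76_hat exp76_hat exp762_hat black smsa76 south76
```

The second-stage coefficients are the 2SLS point estimates, but the standard errors printed by this last regression are not valid, because they are computed from residuals based on the fitted values rather than on the original endogenous variables. The sketch therefore only reproduces the point estimates of the ivregress command shown above.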

Unfortunately, Stata returns an error for this ivregress command (the reason is unclear at this point). The older, now out-of-date command ivreg does work: