Instrumental variables in Stata
Summary
Data
Return to schooling, Verbeek chapter 5.4
Data: schooling.dta:
The file schooling.dta contains data taken from the National Longitudinal
Survey of Young Men (NLSYM) concerning the United States (unused variables are removed). The analysis
focuses on 1976 but uses some variables that date back to earlier years.
smsa76 1 if lived in smsa in 1976 (lived in metropolitan area)
nearc4 grew up near 4-yr college
ed76 education in 1976
age76 age in 1976
south76 1 if lived in south in 1976 (geographically in USA)
lwage76 log wage in 1976 (outliers trimmed)
black 1 if black
exp76 experience in 1976
exp762 exp76 squared
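Before running the regressions, it can help to load the file and check that the variables listed above are present. A minimal sketch, assuming schooling.dta sits in Stata's current working directory (the location is an assumption, not stated in the text):

```stata
* Load the data set, assumed to be in the current working directory
use schooling.dta, clear

* List the variables with their labels and storage types
describe

* Summary statistics for the variables used in the wage equation
summarize lwage76 ed76 exp76 exp762 black smsa76 south76 nearc4 age76
```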
Wage equation, OLS, Table 5.1
Stata command: regr lwage76 ed76 exp76 exp762 black smsa76 south76
Schooling equation using OLS, Table 5.2
g age762 = age76^2
regr ed76 age76 age762 black smsa76 south76 nearc4
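A natural follow-up to the schooling equation is to check whether nearc4 has explanatory power for education, since a weak instrument makes the IV estimates unreliable. A minimal sketch, assuming schooling.dta is in the current working directory and age762 has not been created yet in this session:

```stata
* Load the data (file location is an assumption)
use schooling.dta, clear

* Age squared, needed as a regressor in the schooling equation
generate age762 = age76^2

* Schooling equation by OLS, as in Table 5.2
regress ed76 age76 age762 black smsa76 south76 nearc4

* Wald test of H0: the coefficient on nearc4 is zero
test nearc4
```

A small p-value from the test suggests that proximity to a college is relevant for schooling. It says nothing about whether the instrument is exogenous.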
Wage equation estimated by IV, Table 5.3
As Table 5.1, but using proximity to a college (nearc4) as the instrument for education, and age and age^2 as instruments for experience and experience^2.
The correct Stata command is:
ivregress 2sls lwage76 black smsa76 south76 (ed76 exp76 exp762 = nearc4 age76 age762)
- “ivregress” is the command name for instrumental variable regression
- “2sls” is the name of the estimator. You have three choices here: 2sls, gmm and liml. If you have exactly as many instruments as endogenous variables (the just-identified case), all three return the same point estimates, \(b_{IV}={\left( Z'X \right)}^{-1}Z'y\) described above.
- Then you give the dependent variable
- You then list all the exogenous variables
- The endogenous variables go inside parentheses. After listing them, put an “=” followed by the instruments. You need at least as many instruments as endogenous variables.
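The claim that 2sls reduces to \(b_{IV}\) in the just-identified case can be checked with a short derivation. This sketch is not taken from the source: it assumes that \(X\) collects all regressors (constant, exogenous and endogenous variables), that \(Z\) collects all instruments (constant, exogenous variables and excluded instruments), and it introduces \(P_Z\) for the projection onto the columns of \(Z\).

\[
b_{2SLS} = {\left( X'P_Z X \right)}^{-1} X'P_Z y, \qquad P_Z = Z{\left( Z'Z \right)}^{-1}Z'
\]

When \(Z\) has as many columns as \(X\), the matrix \(Z'X\) is square, and if it is invertible the inverse of the product can be taken factor by factor:

\[
b_{2SLS} = {\left( Z'X \right)}^{-1}\left( Z'Z \right){\left( X'Z \right)}^{-1} X'Z{\left( Z'Z \right)}^{-1}Z'y = {\left( Z'X \right)}^{-1}Z'y = b_{IV}
\]

In the command above, counting the constant, there are seven regressors (constant, black, smsa76, south76, ed76, exp76, exp762) and seven instruments (constant, black, smsa76, south76, nearc4, age76, age762), so the model is just-identified.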
Unfortunately, Stata returns an error for the ivregress command above (the exact cause is unknown at this point). The out-of-date command ivreg does work: