Naïve approach to several groups

Problem

Pelle is interested in the marginal effect of income on the amount spent on life insurance. Pelle considers the regression of the amount spent, \(y_i\), on income, \(x_i\):

\[y_i=β_0+ β_1x_i+ε_i\]

Pelle has access to data where each individual is either single, married or divorced. Pelle believes that the marginal effect of income is the same for all individuals, but that the intercept varies between groups.

Pelle’s friend Pelle-Jöns suggests creating a dummy variable \(d_i\) that equals 1 for single, 2 for married and 3 for divorced individuals. He suggests the regression

\[y_i=β_0+ β_1x_i+β_2d_i+ε_i\]

Is the setup suggested by Pelle-Jöns reasonable? Explain why or why not. If not, suggest a better strategy for estimating \(β_1\).

Solution

This is a bad idea. With this coding the intercept is \(β_0+β_2\) for singles, \(β_0+2β_2\) for married and \(β_0+3β_2\) for divorced individuals. The intercept for married individuals is forced to lie exactly halfway between the intercepts for singles and divorced individuals. Nothing in the problem motivates this restriction, and if it is false the estimate of \(β_1\) can be biased whenever income differs between the groups.
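To make the restriction explicit, compare the gaps between consecutive intercepts under the 1/2/3 coding:

\[(β_0+2β_2)-(β_0+β_2)=β_2=(β_0+3β_2)-(β_0+2β_2)\]

Both gaps are tied to the single parameter \(β_2\), so the data cannot tell us that, say, married individuals differ a lot from singles while divorced individuals do not. The restriction also depends on the arbitrary numbering: swapping the labels of two groups would impose a different one.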

Instead, use two dummies, \(d_{1i}\) and \(d_{2i}\), each taking only the values 0 and 1. Run

\[y_i=β_0+ β_1x_i+β_2d_{1i}+β_3d_{2i}+ε_i\]

You can code, for example:

  • Singles: \(d_{1i}=0\) and \(d_{2i}=0\). Intercept: \(β_0\)
  • Married: \(d_{1i}=1\) and \(d_{2i}=0\). Intercept: \(β_0+β_2\)
  • Divorced: \(d_{1i}=0\) and \(d_{2i}=1\). Intercept: \(β_0+β_3\)

The three groups now have separate intercepts that are free to differ, while \(β_1\) remains the common marginal effect of income.
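A small simulation can illustrate the difference between the two codings. The sketch below uses made-up data: the group intercepts, the income distribution and the slope of 0.5 are all assumptions chosen for illustration, not values from the problem. It fits both regressions by ordinary least squares using NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000

# Hypothetical groups: 0 = single, 1 = married, 2 = divorced.
group = rng.integers(0, 3, size=n)

# Assumption for illustration: married individuals have higher income on average.
income = 10.0 + 5.0 * (group == 1) + rng.normal(0.0, 1.0, size=n)

# Assumed true intercepts (deliberately not equally spaced) and common slope.
true_intercepts = np.array([1.0, 5.0, 2.0])
true_slope = 0.5
y = true_intercepts[group] + true_slope * income + rng.normal(0.0, 1.0, size=n)


def ols(X, y):
    """Return the least-squares coefficients for the regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(n)

# Pelle-Jöns's coding: a single variable taking the values 1, 2, 3.
d = group + 1
coef_naive = ols(np.column_stack([ones, income, d]), y)

# Two 0/1 dummies with singles as the reference group.
d1 = (group == 1).astype(float)  # married
d2 = (group == 2).astype(float)  # divorced
coef_dummies = ols(np.column_stack([ones, income, d1, d2]), y)

# The second coefficient in each fit is the estimate of beta_1.
slope_naive = coef_naive[1]
slope_dummies = coef_dummies[1]

print("slope with 1/2/3 coding:", slope_naive)
print("slope with two dummies: ", slope_dummies)
```

In this simulated setting the two-dummy specification should give a slope estimate close to the assumed 0.5, while the 1/2/3 coding should not, because the assumed intercepts are not equally spaced and income differs between the groups.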