LRM with more than two categories

Summary

Three categories:

  • If observations can be grouped into three categories, then you need two dummy variables \(d_{i,1}\) and \(d_{i,2}\).
  • Observations in group A are coded as \(d_{i,1}=0, d_{i,2}=0\). This is called the base group.
  • Observations in group B are coded as \(d_{i,1}=1, d_{i,2}=0\).
  • Observations in group C are coded as \(d_{i,1}=0, d_{i,2}=1\).
  • Add both dummy variables to your regression:

\[y_i=β_1+β_2x_{i,2}+β_3x_{i,3}+…+β_kx_{i,k}+γ_1d_{i,1}+γ_2d_{i,2}+ε_i, \qquad i=1,…,n\]

  • The intercept of group A (base group) will be \(β_1\).
  • The intercept of group B will be \(β_1+γ_1\). \(γ_1\) is the difference between the intercept of group B and the intercept of the base group.
  • The intercept of group C will be \(β_1+γ_2\). \(γ_2\) is the difference between the intercept of group C and the intercept of the base group.
  • The same principle can be extended to cases with \(l\) categories. Create \(l-1\) dummy variables and select one group as the base group (all dummy variables are 0 for this group).
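
The coding scheme above can be illustrated with a minimal sketch in Python using NumPy. Everything in it is an assumption made for illustration: the data are simulated, there is a single regressor `x2`, and the "true" coefficient values (`beta1`, `beta2`, `gamma1`, `gamma2`) are made up. The sketch builds the two dummy variables with group A as the base group, estimates the regression by least squares, and recovers the three group intercepts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations, one regressor x2, three groups A/B/C.
n = 300
group = rng.choice(["A", "B", "C"], size=n)
x2 = rng.normal(size=n)

# Dummy variables; group A is the base group (both dummies are 0).
d1 = (group == "B").astype(float)  # 1 for group B, 0 otherwise
d2 = (group == "C").astype(float)  # 1 for group C, 0 otherwise

# Assumed "true" coefficients, used only to simulate y.
beta1, beta2, gamma1, gamma2 = 1.0, 0.5, 2.0, -1.0
eps = rng.normal(scale=0.1, size=n)
y = beta1 + beta2 * x2 + gamma1 * d1 + gamma2 * d2 + eps

# Design matrix: constant, x2, d1, d2.
X = np.column_stack([np.ones(n), x2, d1, d2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b1, b2, g1, g2 = coef

# Group intercepts implied by the estimates.
intercepts = {"A": b1, "B": b1 + g1, "C": b1 + g2}
print(intercepts)
```

With these simulated data, the estimated intercepts should lie close to the assumed values 1, 3, and 0 for groups A, B, and C. Choosing a different base group changes the interpretation of \(γ_1\) and \(γ_2\) but not the implied group intercepts.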