LRM with more than two categories
Summary
Three categories:
- If observations can be grouped into three categories, you need two dummy variables, \(d_{i,1}\) and \(d_{i,2}\).
- Observations in group A are coded as \(d_{i,1}=0, d_{i,2}=0\). This group is called the base group.
- Observations in group B are coded as \(d_{i,1}=1, d_{i,2}=0\).
- Observations in group C are coded as \(d_{i,1}=0, d_{i,2}=1\).
- Add both dummy variables to your regression:
\[y_i=β_1+β_2x_{i,2}+β_3x_{i,3}+…+β_kx_{i,k}+γ_1d_{i,1}+γ_2d_{i,2}+ε_i,\qquad i=1,…,n\]
- The intercept of group A (the base group) is \(β_1\).
- The intercept of group B is \(β_1+γ_1\). \(γ_1\) is the difference between the intercept of group B and that of the base group, holding \(x_{i,2},…,x_{i,k}\) fixed.
- The intercept of group C is \(β_1+γ_2\). \(γ_2\) is the difference between the intercept of group C and that of the base group, holding \(x_{i,2},…,x_{i,k}\) fixed.
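- The same principle extends to \(l\) categories: create \(l-1\) dummy variables and choose one group as the base group, for which all dummy variables are 0. Including all \(l\) dummies together with the intercept would make the dummies sum to the constant column, which causes perfect multicollinearity (the dummy variable trap).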
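The coding scheme and the group intercepts above can be checked numerically. The following is a minimal pure-Python sketch, not a reference implementation: the helpers `make_dummies`, `solve` and `ols` are hypothetical names written for this illustration, and the data are synthetic and noise-free, generated from assumed values \(β_1=2\), \(β_2=0.5\), \(γ_1=1.5\), \(γ_2=-1\) with a single regressor \(x_{i,2}\) and group A as the base group.

```python
from typing import List, Sequence, Tuple


def make_dummies(groups: Sequence[str], base: str) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper: code l categories as l-1 dummy columns.

    The base group gets all zeros; every other level gets its own 0/1 column.
    """
    levels = sorted(set(groups))
    if base not in levels:
        raise ValueError(f"base group {base!r} not found")
    non_base = [g for g in levels if g != base]  # l-1 dummy variables
    rows = [[1 if g == level else 0 for level in non_base] for g in groups]
    return non_base, rows


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Hypothetical helper: solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def ols(x_rows: List[List[float]], y: List[float]) -> List[float]:
    """Hypothetical helper: OLS estimates from the normal equations X'X b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data: groups A, B, C with A as the base group.
groups = ["A", "B", "C", "A", "B", "C", "A", "B", "C"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
names, dummies = make_dummies(groups, base="A")  # names == ["B", "C"]

# Design matrix columns: constant, x, d1 (group B), d2 (group C).
X = [[1.0, xi] + [float(d) for d in drow] for xi, drow in zip(x, dummies)]

# Assumed true parameters; y is generated without noise.
beta1, beta2, gamma1, gamma2 = 2.0, 0.5, 1.5, -1.0
y = [beta1 + beta2 * xi + gamma1 * drow[0] + gamma2 * drow[1] for xi, drow in zip(x, dummies)]

b1, b2, g1, g2 = ols(X, y)
intercepts = {"A": b1, "B": b1 + g1, "C": b1 + g2}
print({k: round(v, 6) for k, v in intercepts.items()})  # {'A': 2.0, 'B': 3.5, 'C': 1.0}
```

Because the data contain no noise, OLS recovers the assumed parameters, so the fitted intercepts are \(β_1=2\) for A, \(β_1+γ_1=3.5\) for B and \(β_1+γ_2=1\) for C. `make_dummies` works for any number of categories \(l\) and always returns \(l-1\) columns.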