Fitted values and residuals in the LRM

Problem

Let \(y_i\) be the height of plant \(i\) and \(x_i\) the amount of water added to that plant. Suppose that

\[y_i=β_1+β_2x_i+ε_i\]

and that the \(x\)-variable is exogenous. Then

\[E\left( y \mid x \right)=β_1+β_2x\]
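The step from exogeneity to the conditional mean can be written out. Interpreting exogeneity as \(E\left( ε \mid x \right)=0\), take expectations of the model conditional on \(x\):

\[E\left( y \mid x \right)=E\left( β_1+β_2x+ε \mid x \right)=β_1+β_2x+E\left( ε \mid x \right)=β_1+β_2x\]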

  1. Describe in words the difference between \(E\left( y \mid x \right)\) and \(\hat{y}\)
  2. Describe in words the difference between the residual \(e\) and the error term \(ε\)

Solution

  1. \(E\left( y \mid x \right)\) is the expected height of a plant that receives \(x\) units of water. We assume that it is a linear function of \(x\), \(E\left( y \mid x \right)=β_1+β_2x\), with parameters \(β_1\) and \(β_2\). Because \(β_1\) and \(β_2\) are unknown, \(E\left( y \mid x \right)\) is also unknown. The fitted value \(\hat{y}=b_1+b_2x\) is an estimate of the expected height of a plant that receives \(x\) units of water, obtained by replacing the parameters \(β_1,β_2\) with their estimates \(b_1,b_2\). So \(E\left( y \mid x \right)\) is a fixed population quantity, while \(\hat{y}\) depends on the sample through \(b_1\) and \(b_2\).
  2. The error term \(ε\) is the difference between the actual height and the expected height of a plant that receives \(x\) units of water, \(ε=y-E\left( y \mid x \right)\). It is unobservable for the same reason that \(E\left( y \mid x \right)\) is unknown: \(β_1\) and \(β_2\) are unknown. The residual is the difference between the actual height and the estimated expected height, \(e=y-\hat{y}\), and it can be computed from the data. Subtracting the two definitions gives \(e-ε=E\left( y \mid x \right)-\hat{y}=(β_1-b_1)+(β_2-b_2)x\), so the residual is close to the error term whenever the estimates \(b_1,b_2\) are close to \(β_1,β_2\).
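A small simulation can make both distinctions concrete, because in a simulation we choose the true parameters ourselves and can therefore compute \(E\left( y \mid x \right)\) and \(ε\), which are unavailable with real data. The sketch below is illustrative only: the values \(β_1=10\), \(β_2=2\), the uniform water amounts on \([0,10]\), the normal errors with standard deviation 3, and the helper names `simulate` and `ols` are all hypothetical choices, not part of the exercise. It checks that \(e-ε=(β_1-b_1)+(β_2-b_2)x\) for every plant.

```python
import random

# Hypothetical "true" parameters: known here only because we simulate the data.
BETA1, BETA2 = 10.0, 2.0


def simulate(n, sigma=3.0, seed=0):
    """Draw water amounts x_i and heights y_i = BETA1 + BETA2*x_i + eps_i."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # E(eps | x) = 0 by construction
    y = [BETA1 + BETA2 * xi + ei for xi, ei in zip(x, eps)]
    return x, y, eps


def ols(x, y):
    """Least-squares estimates b1 (intercept) and b2 (slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return b1, b2


x, y, eps = simulate(200)
b1, b2 = ols(x, y)

cond_mean = [BETA1 + BETA2 * xi for xi in x]      # E(y | x_i): needs the true betas
fitted = [b1 + b2 * xi for xi in x]               # y-hat_i: uses only the data
resid = [yi - fi for yi, fi in zip(y, fitted)]    # e_i = y_i - y-hat_i (computable)
errors = [yi - mi for yi, mi in zip(y, cond_mean)]  # eps_i = y_i - E(y | x_i)

# e_i - eps_i should equal (BETA1 - b1) + (BETA2 - b2) * x_i
gaps = [ri - ei for ri, ei in zip(resid, errors)]

print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"largest |e_i - eps_i| = {max(abs(g) for g in gaps):.3f}")
```

Because \(b_1\) and \(b_2\) come from a finite sample, the gap between residuals and errors is not zero; increasing the sample size in `simulate` typically brings the estimates, and hence the residuals, closer to their population counterparts.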