Proving that the OLS slope estimator is unbiased

Problem

This problem is more advanced than most. You may want to skip it for now (or try to understand as much as you can) and return to it later in the course.

In this problem, we will prove that the OLS estimator \(b_2\) is unbiased if the x-variable is exogenous.

Start from the statistical formula for \(b_2\):

\[b_2=β_2+ \frac{\sum_{i=1}^{n}{ \left( x_i-\bar{x} \right)ε_i }}{\sum_{i=1}^{n}{ {\left( x_i-\bar{x} \right)}^2 }}\]
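(For reference: this starting formula follows from substituting the regression model into the usual OLS formula. Assuming the simple model \(y_i=β_1+β_2x_i+ε_i\), so that \(y_i-\bar{y}=β_2\left( x_i-\bar{x} \right)+\left( ε_i-\bar{ε} \right)\), we get

\[b_2= \frac{\sum_{i=1}^{n}{ \left( x_i-\bar{x} \right)\left( y_i-\bar{y} \right) }}{\sum_{i=1}^{n}{ {\left( x_i-\bar{x} \right)}^2 }}=β_2+ \frac{\sum_{i=1}^{n}{ \left( x_i-\bar{x} \right)ε_i }}{\sum_{i=1}^{n}{ {\left( x_i-\bar{x} \right)}^2 }}\]

where the \(\bar{ε}\)-term vanishes because \(\sum_{i=1}^{n}{ \left( x_i-\bar{x} \right) }=0\).)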

a) Show that we can write this as

\[b_2=β_2+\sum_{i=1}^{n}{ a_iε_i }\]

where we define

\[a_i= \frac{\left( x_i-\bar{x} \right)}{\sum_{i=1}^{n}{ {\left( x_i-\bar{x} \right)}^2 }}\]

Hint: \(\sum_{i=1}^{n}{ {\left( x_i-\bar{x} \right)}^2 }\) is just a constant. Let \(c=\sum_{i=1}^{n}{ {\left( x_i-\bar{x} \right)}^2 }\). Then

\[b_2=β_2+ \frac{1}{c}\sum_{i=1}^{n}{ \left( x_i-\bar{x} \right)ε_i }\]

We can take constants out of sums, but we can also put them back in…

b) Finding the unconditional expectation \(E\left( b_2 \right)\) directly turns out to be difficult. It is simpler to first find the conditional expectation \(E\left( b_2 \mid x \right)\), since conditioning on \(x\) lets us treat the \(x\)-variables as non-random constants. Taking conditional expectations on both sides of part a), we have

\[E\left( b_2 \mid x \right)=E\left( β_2+\sum_{i=1}^{n}{ a_iε_i } \mid x \right)\]

Show that

\[E\left( β_2+\sum_{i=1}^{n}{ a_iε_i } \mid x \right)=β_2+E\left( \sum_{i=1}^{n}{ a_iε_i } \mid x \right)\]

Hint: For arbitrary random variables \(X,Y\) and an arbitrary constant \(a\) we have \(E\left( a+Y|X \right)=a+E\left( Y|X \right)\).

c) For arbitrary random variables, \(E\left( Y_1+ \ldots +Y_n \mid X \right)=E\left( Y_1 \mid X \right)+ \ldots +E\left( Y_n \mid X \right)\). Expectation “goes inside sums”. Show that

\[E\left( \sum_{i=1}^{n}{ a_iε_i } \mid x \right)=\sum_{i=1}^{n}{ a_iE\left( ε_i|x \right) }\]

Hint: Remember that \(a_i\) only contains \(x\) -data and we are conditioning on all \(x\) -data.

d) We have \(E\left( ε_i \mid x \right)=E\left( ε_i \mid x_i \right)\), since \(ε_i\) is independent of the \(x\)-variables with an index other than \(i\), and \(E\left( ε_i \mid x_i \right)=0\) if the \(x\)-variable is exogenous. Use this to show that \(E\left( b_2 \mid x \right)=β_2\).

e) One final step. We have shown that \(E\left( b_2 \mid x \right)=β_2\). Use the law of iterated expectations to show that \(E\left( b_2 \right)=β_2\).

Solution

a)

\[b_2=β_2+ \frac{1}{c}\sum_{i=1}^{n}{ \left( x_i-\bar{x} \right)ε_i }=β_2+\sum_{i=1}^{n}{ \frac{\left( x_i-\bar{x} \right)}{c} ε_i }=β_2+\sum_{i=1}^{n}{ a_iε_i }\]
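As a quick numerical sanity check of this identity (a sketch, not part of the proof: the simulated model and all parameter values below are illustrative assumptions), we can verify that the decomposition holds exactly in any generated sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model: y_i = beta_1 + beta_2 * x_i + eps_i (assumed values)
beta_1, beta_2 = 1.0, 2.0
n = 50
x = rng.normal(size=n)
eps = rng.normal(size=n)
y = beta_1 + beta_2 * x + eps

# OLS slope estimate b_2
xd = x - x.mean()
b2 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)

# Part a): b_2 = beta_2 + sum_i a_i * eps_i, with a_i = (x_i - xbar) / c
a = xd / np.sum(xd ** 2)
b2_decomposed = beta_2 + np.sum(a * eps)

print(b2, b2_decomposed)  # equal up to floating-point rounding
```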

b) Since there is a plus sign between \(β_2\) and the sum in \(β_2+\sum_{i=1}^{n}{ a_iε_i }\), we can split the expectation into two parts:

\[E\left( β_2+\sum_{i=1}^{n}{ a_iε_i } \mid x \right)=E\left( β_2 \mid x \right)+E\left( \sum_{i=1}^{n}{ a_iε_i } \mid x \right)\]

In the first expectation, \(E\left( β_2 \mid x \right)\), we are taking the expected value of a constant, and that is just the constant itself; it makes no difference whether the expectation is conditional or unconditional. Thus

\[β_2+E\left( \sum_{i=1}^{n}{ a_iε_i } \mid x \right)\]

c) \(\sum_{i=1}^{n}{ a_iε_i }\) means \(a_1ε_1+a_2ε_2+ \ldots +a_nε_n\). Since this is a sum, we can split the expectation:

\[E\left( a_1ε_1+a_2ε_2+ \ldots +a_nε_n|x \right)=E\left( a_1ε_1|x \right)+E\left( a_2ε_2|x \right)+ \ldots +E\left( a_nε_n|x \right)\]

Conditional on \(x\) (we know \(x_1, \ldots ,x_n\)), all of \(a_1, \ldots ,a_n\) are known, since they are determined by the \(x\)’s. We can treat the \(a\)’s as constants and take them outside:

\[a_1E\left( ε_1|x \right)+a_2E\left( ε_2|x \right)+ \ldots +a_nE\left( ε_n|x \right)=\sum_{i=1}^{n}{ a_iE\left( ε_i|x \right) }\]

d) By the two facts given in the problem, \(E\left( ε_i|x \right)=E\left( ε_i|x_i \right)=0\) for all \(i\), so

\[\sum_{i=1}^{n}{ a_iE\left( ε_i|x \right) }=\sum_{i=1}^{n}{ a_i×0 }=0\]

Now go back to part b):

\[E\left( b_2 \mid x \right)=β_2+E\left( \sum_{i=1}^{n}{ a_iε_i } \mid x \right)\]

Parts c) and d) showed that \(E\left( \sum_{i=1}^{n}{ a_iε_i } \mid x \right)=0\), so \(E\left( b_2 \mid x \right)=β_2\).

e) The law of iterated expectations tells us that to get the unconditional expectation \(E\left( b_2 \right)\), we simply take the expected value of the conditional expectation:

\[E\left( b_2 \right)=E\left( E\left( b_2 \mid x \right) \right)\]

Since \(E\left( b_2 \mid x \right)=β_2\) is a constant, \(E\left( b_2 \right)=E\left( β_2 \right)=β_2\). Thus \(b_2\) is unbiased!
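As a final illustration (again a sketch with assumed parameter values, not part of the proof), a small Monte Carlo experiment makes the unbiasedness visible: averaging \(b_2\) over many simulated samples with an exogenous \(x\)-variable gives a value very close to \(β_2\).

```python
import numpy as np

rng = np.random.default_rng(0)

beta_1, beta_2 = 1.0, 2.0   # assumed true parameters
n, reps = 50, 100_000       # sample size and number of simulated samples

estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)    # x drawn independently of eps, so x is exogenous
    eps = rng.normal(size=n)
    y = beta_1 + beta_2 * x + eps
    xd = x - x.mean()
    estimates[r] = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)

print(estimates.mean())  # close to beta_2 = 2.0, illustrating E(b_2) = beta_2
```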