Proving that the OLS intercept estimator is unbiased

Problem

This problem is at a higher level. You may want to skip it for now (or try to understand as much as you can) and return to it later in the course.

In this problem, we will prove that the OLS intercept estimator \(b_1\) is unbiased if the \(x\)-variable is exogenous.

Start from

\[b_1=\bar{y}-b_2\bar{x}\]

Again, we will first find the conditional expectation \(E\left( b_1 \mid x \right)\):

\[E\left( b_1 \mid x \right)=E\left( \bar{y}-b_2\bar{x}|x \right)\]

a) Show that

\[E\left( \bar{y}-b_2\bar{x}|x \right)=E\left( \bar{y}|x \right)-\bar{x}E\left( b_2|x \right)\]

b) Use \(\bar{y}=β_1+β_2\bar{x}+\bar{ε}\) to show that

\[E\left( \bar{y}|x \right)=β_1+β_2\bar{x}\]

c) Show that \(E\left( b_1 \mid x \right)=β_1\).

d) Show that \(E\left( b_1 \right)=β_1\) .

Solution

a) We begin by splitting \(E\left( \bar{y}-b_2\bar{x}|x \right)\) into two parts:

\[E\left( \bar{y}-b_2\bar{x}|x \right)=E\left( \bar{y}|x \right)-E\left( b_2\bar{x}|x \right)\]

Do not take \(b_2\) outside: \(b_2\) is not a constant (it is random), and it is not known even if we condition on all the \(x\)-data, since it also depends on the \(y\)-data. However, \(\bar{x}\) is known once we condition on \(x\), so it can be moved outside the expectation.
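Formally, this is the "taking out what is known" property of conditional expectation: anything that is a function of the conditioning variable alone can be treated as a constant. For a function \(g\) of \(x\) and a random variable \(Z\),

\[E\left( g\left( x \right)Z|x \right)=g\left( x \right)E\left( Z|x \right)\]

Applying this with \(g\left( x \right)=\bar{x}\) and \(Z=b_2\):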

\[E\left( \bar{y}|x \right)-E\left( b_2\bar{x}|x \right)=E\left( \bar{y}|x \right)-\bar{x}E\left( b_2|x \right)\]

b) The formula for \(\bar{y}\) given in the problem follows from averaging the model equation \(y_i=β_1+β_2x_i+ε_i\) over all \(n\) observations:
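\[\bar{y}=\frac{1}{n}\sum_{i=1}^{n}{ y_i }=β_1+β_2\frac{1}{n}\sum_{i=1}^{n}{ x_i }+\frac{1}{n}\sum_{i=1}^{n}{ ε_i }=β_1+β_2\bar{x}+\bar{ε}\]

Taking the conditional expectation given \(x\),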

\[E\left( \bar{y}|x \right)=E\left( β_1+β_2\bar{x}+\bar{ε}|x \right)\]

This can be split into three:

\[E\left( β_1+β_2\bar{x}+\bar{ε}|x \right)=E\left( β_1|x \right)+E\left( β_2\bar{x}|x \right)+E\left( \bar{ε}|x \right)\]

\(β_1\) is a constant, so \(E\left( β_1|x \right)=β_1\). \(β_2\) is a constant and, conditional on \(x\), so is \(\bar{x}\). Thus, \(E\left( β_2\bar{x}|x \right)=β_2\bar{x}\). For the final term we use

\[\bar{ε}= \frac{1}{n}\sum_{i=1}^{n}{ ε_i }\]

We can then find

\[E\left( \bar{ε}|x \right)=E\left( \frac{1}{n}\sum_{i=1}^{n}{ ε_i }|x \right)\]

\(1/n\) is a constant that we can take outside the expectation. We are then left with the expected value of a sum, which we can split into a sum of \(n\) expected values:

\[ \frac{1}{n}\sum_{i=1}^{n}{ E\left( ε_i|x \right) }\]

By exogeneity, \(E\left( ε_i|x \right)=0\) for every \(i\), so all terms in the sum are zero and \(E\left( \bar{ε}|x \right)=0\).
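Putting the three terms together establishes the claim:

\[E\left( \bar{y}|x \right)=β_1+β_2\bar{x}+0=β_1+β_2\bar{x}\]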

c) From a) we have

\[E\left( b_1 \mid x \right)=E\left( \bar{y}|x \right)-\bar{x}E\left( b_2|x \right)\]

Part b) gave us \(E\left( \bar{y}|x \right)=β_1+β_2\bar{x}\). Since \(b_2\) is unbiased given \(x\), \(E\left( b_2|x \right)=β_2\). Combining these,

\[E\left( b_1 \mid x \right)=β_1+β_2\bar{x}-\bar{x}β_2=β_1\]

d) Take one more expectation and use the law of iterated expectations:

\[E\left( b_1 \right)=E\left( E\left( b_1 \mid x \right) \right)=E\left( β_1 \right)=β_1\]

Unbiased!
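As an optional sanity check (not part of the original problem), here is a minimal simulation sketch in Python. The parameter values \(β_1=2\), \(β_2=0.5\), and the choice of uniform \(x\) and standard normal errors are all illustrative assumptions, not anything dictated by the proof. Averaging the intercept estimates over many simulated samples should come out close to \(β_1\), just as \(E\left( b_1 \right)=β_1\) predicts.

```python
import numpy as np

rng = np.random.default_rng(42)

beta1, beta2 = 2.0, 0.5      # illustrative "true" parameters
n, reps = 50, 20_000         # sample size and number of replications

x = rng.uniform(0.0, 10.0, size=n)   # x held fixed across replications
x_bar = x.mean()

b1_draws = np.empty(reps)
for r in range(reps):
    eps = rng.standard_normal(n)      # errors with E(eps | x) = 0
    y = beta1 + beta2 * x + eps
    # OLS slope: sum of cross-deviations over sum of squared deviations of x
    b2 = np.sum((x - x_bar) * (y - y.mean())) / np.sum((x - x_bar) ** 2)
    b1_draws[r] = y.mean() - b2 * x_bar   # intercept: b1 = ybar - b2 * xbar

print(b1_draws.mean())   # close to beta1 = 2.0, as E(b1) = beta1 predicts
```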