LRM, simulation

Problem

Data: http://media.nek.lu.se/data/OLSSimulation.xlsx

The Excel file contains a simulated random sample together with some calculations. Red cells hold values that the econometrician does not observe, while green cells hold values that are observed. The x-data are simulated from \(N(μ,σ_x^2)\), where you can set mu and sigma_x in cells B2 and B3.

Adjust the values in B2 and B3 and observe the x-data in column D. Make sure that you understand the connection. (Check the content of cell D1 if you are interested in how to draw random numbers from the normal distribution in Excel.)

The error terms are simulated from \(N(0,σ^2)\), where you can set sigma in cell B4. Since they are simulated independently of the x-data, the exogeneity assumption is satisfied.

The y-data are simulated from the LRM, \(y_i=β_1+β_2x_i+ε_i\), where you can set beta_1 and beta_2 in B5 and B6. Check the formula in cell F2 to convince yourself that \(y_1\) is calculated according to the LRM. The x- and y-data are also displayed in a scatter plot to the right.
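For readers who prefer code to spreadsheets, the data-generating process described above can be sketched in a few lines of Python. This is only an illustration, not a copy of the Excel file: the sample size `n=100`, the seed, and the function names `simulate_sample` and `ols` are assumptions made for the example.

```python
import random
from statistics import fmean


def simulate_sample(n, mu, sigma_x, sigma, beta_1, beta_2, rng):
    """Draw one sample (x, y) from the LRM y_i = beta_1 + beta_2*x_i + eps_i."""
    x = [rng.gauss(mu, sigma_x) for _ in range(n)]   # x_i ~ N(mu, sigma_x^2)
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # eps_i ~ N(0, sigma^2), drawn independently of x
    y = [beta_1 + beta_2 * xi + ei for xi, ei in zip(x, eps)]
    return x, y


def ols(x, y):
    """Return the OLS estimates (b_1, b_2) from regressing y on a constant and x."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_2 = s_xy / s_xx
    b_1 = y_bar - b_2 * x_bar
    return b_1, b_2


rng = random.Random(12345)  # fixed seed (assumed) so the run is repeatable
# n=100 is an assumed sample size; the other values are one possible parameter choice.
x, y = simulate_sample(n=100, mu=0, sigma_x=4, sigma=1, beta_1=0, beta_2=1, rng=rng)
b_1, b_2 = ols(x, y)
print(f"b_1 = {b_1:.3f}, b_2 = {b_2:.3f}")
```

Changing the seed plays the role of clicking "Recalculate": the parameters stay fixed, but a new sample and therefore new estimates are drawn.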

a. Play around with the cells B2 to B6 until you understand the connection between these values and the scatter plot. A good starting point is 0, 4, 1, 0, 1.
b. Set mu = 0, sigma_x = 2, sigma = 0.5, beta_1 = -4 and beta_2 = 2. Click “Recalculate” 10 times and count how many times the OLS estimate \(b_2\) is above beta_2. What do you think this proportion would be if you did this an infinite number of times?
c. Check the formula for E(y|x) in G2 and y-hat in H2. Make sure that they make sense to you and that you understand why they tend to be fairly close.
d. Will you get better or worse estimates of beta_2 if you increase sigma_x?
e. Will you get better or worse estimates of beta_2 if you increase sigma?

Solution

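b. 50%. \(b_2\) is an unbiased estimator of \(β_2\) and, because the errors are normal, its sampling distribution is symmetric around \(β_2\), so it falls above \(β_2\) half of the time.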
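The answer to b can be checked with a small Monte Carlo experiment that stands in for clicking “Recalculate” many times. The sketch below uses the parameter values from b; the sample size `n = 50`, the number of replications, the seed, and the helper name `slope_estimate` are assumptions for the example.

```python
import random
from statistics import fmean


def slope_estimate(n, mu, sigma_x, sigma, beta_1, beta_2, rng):
    """Simulate one sample from the LRM and return the OLS slope estimate b_2."""
    x = [rng.gauss(mu, sigma_x) for _ in range(n)]
    y = [beta_1 + beta_2 * xi + rng.gauss(0.0, sigma) for xi in x]
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


rng = random.Random(2024)  # fixed seed (assumed) for repeatability
reps = 5000                # number of simulated "Recalculate" clicks (assumed)
n = 50                     # sample size (assumed; not taken from the Excel file)
beta_2 = 2.0

estimates = [slope_estimate(n, 0.0, 2.0, 0.5, -4.0, beta_2, rng) for _ in range(reps)]
share_above = sum(b > beta_2 for b in estimates) / reps

print(f"share of b_2 above beta_2: {share_above:.3f}")
print(f"average b_2: {fmean(estimates):.4f}")
```

With only 10 clicks the observed share can easily be 3 or 7 out of 10; it is the long-run share that should settle near one half.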

d. Better. More variance in the explanatory variable spreads the x-values out, which pins down the slope more precisely.

e. Worse. More variance in the error terms makes the data noisier around the regression line, so \(b_2\) varies more from sample to sample.
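Both d and e can be read off the conditional variance of the slope estimator, which under the LRM with homoskedastic, uncorrelated errors is

\[\operatorname{Var}(b_2\mid x)=\frac{σ^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\]

A larger sigma_x makes the denominator larger, and a larger sigma makes the numerator larger. The following sketch compares the spread of simulated slope estimates across three settings; the sample size, replication count, seed, and the helper names `slope_estimate` and `slope_spread` are assumptions for the example.

```python
import random
from statistics import fmean, pstdev


def slope_estimate(n, mu, sigma_x, sigma, beta_1, beta_2, rng):
    """Simulate one sample from the LRM and return the OLS slope estimate b_2."""
    x = [rng.gauss(mu, sigma_x) for _ in range(n)]
    y = [beta_1 + beta_2 * xi + rng.gauss(0.0, sigma) for xi in x]
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def slope_spread(sigma_x, sigma, reps=2000, n=50, seed=7):
    """Standard deviation of b_2 across `reps` simulated samples (all defaults are assumed values)."""
    rng = random.Random(seed)
    draws = [slope_estimate(n, 0.0, sigma_x, sigma, -4.0, 2.0, rng) for _ in range(reps)]
    return pstdev(draws)


baseline = slope_spread(sigma_x=2.0, sigma=0.5)
more_x_variance = slope_spread(sigma_x=4.0, sigma=0.5)  # question d: increase sigma_x
more_noise = slope_spread(sigma_x=2.0, sigma=1.0)       # question e: increase sigma

print(f"baseline spread of b_2:   {baseline:.4f}")
print(f"with sigma_x doubled:     {more_x_variance:.4f}")
print(f"with sigma doubled:       {more_noise:.4f}")
```

The spread of \(b_2\) should shrink when sigma_x is increased and grow when sigma is increased, in line with the formula.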