Chapter 4: The regression model with several explanatory variables
By Lund University
This chapter extends chapter 3 by allowing for several explanatory variables. First, the linear regression model with several explanatory variables, the focus of this chapter, is thoroughly introduced and an extension of the OLS formula is discussed. Since we are not using matrix algebra in this course, we will not be able to present the general formulas, such as the OLS formula. Instead, we rely on the fact that they have been correctly programmed into software such as Excel, EViews, Stata and others. We need to make small changes to the inference in this model and we will also introduce some new tests. A new problem that appears in this model is multicollinearity. Next, we look at some nonlinear regression models, followed by dummy variables. The chapter concludes with an analysis of the data problem of heteroscedasticity.
The linear regression model with several explanatory variables
We begin this chapter by extending the linear regression model to allow for several explanatory variables. For example, if we have three explanatory variables then, including the intercept, we will have four unknown beta parameters. We will use the symbol k to denote the number of unknown beta parameters. The OLS principle for estimating the beta parameters will still work, but the mathematics becomes more complicated and is best done using matrices. However, we can always feed data into software and get the OLS estimates from the software. The fundamental assumption introduced in chapter 3, exogeneity, will be discussed and we will conclude that the OLS estimator is unbiased and consistent under this assumption. Further, the OLS estimator will be best (efficient) if the error terms are homoscedastic.
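For readers who want to see how software produces these estimates, here is a minimal sketch in Python using statsmodels (not one of the packages mentioned above). The data and variable names are simulated purely for illustration; with three explanatory variables plus an intercept we have k = 4 beta parameters.

```python
# A minimal sketch with simulated data: OLS with three explanatory variables,
# so k = 4 unknown beta parameters including the intercept.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=1)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 - 0.3 * x2 + 0.8 * x3 + rng.normal(size=n)  # simulated data

X = sm.add_constant(np.column_stack([x1, x2, x3]))  # adds the intercept column
results = sm.OLS(y, X).fit()                        # the software does the matrix algebra
print(results.params)                               # the four estimated betas
```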
Inference in the linear regression model with several explanatory variables
We begin by looking at the t-test, which we use to test a single restriction. In a linear regression model with several explanatory variables it is common to consider hypotheses involving several restrictions. Such hypotheses can be tested using an F-test. Finally, we look at confidence intervals when we have many explanatory variables.
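As an illustration, the sketch below (again in Python with statsmodels and simulated data, so variable names are made up) shows a t-test on each coefficient, an F-test of two restrictions jointly, and confidence intervals for the betas.

```python
# A sketch with simulated data: t-statistics, an F-test of several
# restrictions at once, and confidence intervals for the beta parameters.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(seed=2)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 + 0.8 * x3 + rng.normal(size=n)   # x2 truly has no effect here

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
results = sm.OLS(y, X).fit()

print(results.tvalues)                    # t-statistics for H0: beta_j = 0
print(results.f_test("x2 = 0, x3 = 0"))   # F-test of two restrictions jointly
print(results.conf_int(alpha=0.05))       # 95% confidence intervals
```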
This section contains two unrelated topics. We begin by looking at multicollinearity, a problem where the explanatory variables are highly correlated. The presence of multicollinearity makes it difficult to estimate the individual beta parameters precisely. The second topic is forecasting, which allows us to predict the value of the dependent variable for given values of the explanatory variables, even when the observation is not part of our sample.
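The following sketch (Python, statsmodels, simulated data) illustrates both topics: variance inflation factors (VIFs), a common diagnostic for multicollinearity, and a forecast with a prediction interval for an out-of-sample observation. The variable names and the chosen forecast point are hypothetical.

```python
# A sketch with simulated data: VIFs to gauge multicollinearity, and a
# forecast for an observation that is not part of the sample.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(seed=3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)         # x2 is highly correlated with x1
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))
results = sm.OLS(y, X).fit()

# VIF for each explanatory variable (large values signal multicollinearity)
for j in range(1, X.shape[1]):             # skip the intercept column
    print(X.columns[j], variance_inflation_factor(X.values, j))

# Forecast y for a new observation with x1 = 1 and x2 = 1
new_obs = pd.DataFrame({"const": [1.0], "x1": [1.0], "x2": [1.0]})
print(results.get_prediction(new_obs).summary_frame(alpha=0.05))
```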
So far, the dependent variable has been modeled as a linear function of the explanatory variables plus an additive error term. In this section, we will look at nonlinear models. It turns out that we have two types of linearity in the linear regression model. First, the dependent variable is linear in the explanatory variables. Second, the dependent variable is linear in the beta parameters. Therefore, we can consider two types of nonlinearity. In this section, we will focus mainly on nonlinearity in the explanatory variables, retaining linearity in the parameters. We will then look at the most common nonlinear models: the log-log model, the log-linear model, and a model where we only log (some of) the x variables. Once we have a nonlinear model, we need to reinterpret the beta parameters. For example, in the log-log model, the beta parameters will be elasticities. Choosing between a linear regression model and a model nonlinear in the explanatory variables can be difficult. To help us in this choice, we introduce Ramsey's RESET test.
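The sketch below (Python, statsmodels, simulated data with a true elasticity of 1.2 built in) fits a log-log model, whose slope coefficient is an elasticity, and then applies Ramsey's RESET test to a linear-in-levels specification of the same data.

```python
# A sketch with simulated data: a log-log model (the slope is an elasticity)
# and Ramsey's RESET test applied to a linear-in-levels specification.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

rng = np.random.default_rng(seed=4)
n = 200
x = np.exp(rng.normal(size=n))                                 # a positive regressor
y = np.exp(0.5 + 1.2 * np.log(x) + 0.3 * rng.normal(size=n))   # true elasticity 1.2

# Log-log model: log(y) = beta1 + beta2 * log(x) + e, where beta2 is the elasticity
X_log = sm.add_constant(pd.DataFrame({"log_x": np.log(x)}))
print(sm.OLS(np.log(y), X_log).fit().params)

# RESET test on the linear-in-levels model; a small p-value suggests misspecification
X_lin = sm.add_constant(pd.DataFrame({"x": x}))
res_lin = sm.OLS(y, X_lin).fit()
print(linear_reset(res_lin, power=2, use_f=True))
```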
If all observations belong to one out of two groups, then a dummy variable can be used to encode this information. A dummy variable takes the value zero for all observations belonging to one group and the value one for all observations belonging to the other group. We can use a dummy variable as an explanatory variable in a linear regression model in the same way that we use an ordinary explanatory variable. Dummy variables can be used even if you have more than two groups.
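A short sketch of this idea, again in Python with simulated data and made-up variable names (a wage regression with a female dummy is only an illustrative choice): the coefficient on the dummy shifts the intercept between the two groups.

```python
# A sketch with simulated data: a dummy variable used as an explanatory
# variable alongside an ordinary explanatory variable.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(seed=5)
n = 200
female = rng.integers(0, 2, size=n)            # dummy: 0 for one group, 1 for the other
exper = rng.uniform(0, 30, size=n)             # an ordinary explanatory variable
wage = 10 + 0.4 * exper - 2.0 * female + rng.normal(size=n)

X = sm.add_constant(pd.DataFrame({"exper": exper, "female": female}))
print(sm.OLS(wage, X).fit().params)            # the dummy's coefficient is the group difference
```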
Heteroscedasticity means that the variance of the error term differs between observations, which is very common in economics. We begin by looking at tests that help us figure out whether our errors are homoscedastic or heteroscedastic. If we find that we have heteroscedasticity, then the standard errors derived under the assumption of homoscedasticity are no longer valid. Instead, we can use robust standard errors. Also, with heteroscedasticity OLS is no longer efficient. In this case, the efficient estimator is the weighted least squares (WLS) estimator.
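To make these steps concrete, here is a sketch in Python with statsmodels, using simulated data whose error variance grows with the regressor. It runs a Breusch-Pagan test (one of several available tests), re-estimates with heteroscedasticity-robust standard errors, and finally fits weighted least squares under an assumed variance structure.

```python
# A sketch with simulated heteroscedastic data: a Breusch-Pagan test,
# robust (heteroscedasticity-consistent) standard errors, and WLS.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(seed=6)
n = 200
x = rng.uniform(1, 10, size=n)
y = 2 + 0.5 * x + x * rng.normal(size=n)       # error variance grows with x

X = sm.add_constant(pd.DataFrame({"x": x}))
ols = sm.OLS(y, X).fit()

# Breusch-Pagan test: a small p-value indicates heteroscedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
print(lm_pvalue)

# Same OLS estimates, but with heteroscedasticity-robust standard errors
print(sm.OLS(y, X).fit(cov_type="HC1").bse)

# WLS, assuming the error variance is proportional to x squared
print(sm.WLS(y, X, weights=1.0 / x**2).fit().params)
```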