Introduction to Econometrics

Chapter 4: The regression model with several explanatory variables

By Lund University

This chapter is an extension of chapter 3, allowing for several explanatory variables. First, the linear regression model with several explanatory variables, the focus of this chapter, is thoroughly introduced and an extension of the OLS formula is discussed. Since we are not using matrix algebra in this course, we will not be able to present the general formulas, such as the OLS formula. Instead, we rely on the fact that they have been correctly programmed into software such as Excel, EViews, Stata and more. We need to make small changes to the inference in this model, and we will also introduce some new tests. A new problem that appears in this model is multicollinearity. Next, we look at some nonlinear regression models, followed by dummy variables. The chapter concludes with an analysis of the data problem of heteroscedasticity.

The linear regression model with several explanatory variables

We begin this chapter by extending the linear regression model to allow for several explanatory variables. For example, if we have three explanatory variables then, including the intercept, we will have four unknown beta parameters. We use the symbol k to denote the number of unknown beta parameters. The OLS principle for estimating the beta parameters still works, but the mathematics becomes more complicated and is best done using matrices. However, we can always feed data into software and get the OLS estimates from the software. The fundamental assumption introduced in chapter 3, exogeneity, will be discussed, and we will conclude that the OLS estimator is unbiased and consistent under this assumption. Further, the OLS estimator is best (efficient) if the error terms are homoscedastic.
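
To illustrate how the estimation is left to the software, the sketch below fits a model with three explanatory variables (so k = 4, including the intercept) on simulated data. It uses Python with the statsmodels package, which is not one of the packages listed in this course; the variable names and the data are invented for illustration only.

import numpy as np
import statsmodels.api as sm

# Simulated sample with three explanatory variables (illustrative data only)
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 - 0.8 * x3 + 0.3 * x4 + rng.normal(size=n)

# Stack the regressors and add the intercept column: k = 4 beta parameters
X = sm.add_constant(np.column_stack([x2, x3, x4]))

# The software solves the OLS problem for us; no matrix algebra by hand
results = sm.OLS(y, X).fit()
print(results.params)      # OLS estimates of the four beta parameters
print(results.summary())   # standard errors, t-values, R-squared, and more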

Linear regression with several explanatory variables

OLS

The properties of the OLS estimator

Inference in the linear regression model with several explanatory variables

We begin by looking at the t-test, which we use to test a single restriction. In a linear regression model with several explanatory variables it is common to consider hypotheses involving several restrictions. Such hypotheses can be tested using an F-test. Finally, we look at confidence intervals when we have many explanatory variables.
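
A minimal sketch of these three tools, again in Python with statsmodels and with invented data: the t-values and p-values cover single restrictions, f_test handles a joint hypothesis with several restrictions, and conf_int gives a confidence interval for each beta parameter.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data: y depends on x2 and x3, while x4 is irrelevant
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
    "x4": rng.normal(size=n),
})
df["y"] = 1.0 + 0.5 * df["x2"] - 0.8 * df["x3"] + rng.normal(size=n)

results = smf.ols("y ~ x2 + x3 + x4", data=df).fit()

# t-tests of single restrictions (H0: beta_j = 0) are reported automatically
print(results.tvalues)
print(results.pvalues)

# F-test of several restrictions: H0 says the coefficients on x3 and x4 are both zero
print(results.f_test("x3 = 0, x4 = 0"))

# 95 percent confidence intervals for each beta parameter
print(results.conf_int(alpha=0.05))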

Hypothesis testing, one restriction – the t-test

Hypothesis testing, several restrictions – the F-test

Confidence intervals in the LRM

Multicollinearity and forecasting

This section contains two unrelated topics. We begin by looking at multicollinearity, a problem that arises when the explanatory variables are highly correlated. The presence of multicollinearity makes it difficult to estimate the individual beta parameters. Forecasting allows us to predict the value of the dependent variable for given values of the explanatory variables, even when the observation is not part of our sample.
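
The sketch below, in Python with statsmodels and with invented data, illustrates both topics: variance inflation factors as one common way of detecting multicollinearity, and a point forecast with an interval for an observation outside the sample. The variable names are hypothetical.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Invented data in which x2 and x3 are almost perfectly correlated
rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.1, size=n)   # close to a copy of x2
y = 1.0 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x2, x3]))
results = sm.OLS(y, X).fit()

# Variance inflation factors: large values (a common rule of thumb is > 10)
# signal that a regressor is highly correlated with the other regressors
for j in range(1, X.shape[1]):
    print("VIF for regressor", j, variance_inflation_factor(X, j))

# Forecast: predicted y with a 95 percent interval for x2 = 1, x3 = 1,
# an observation that is not part of the sample
x_new = np.array([[1.0, 1.0, 1.0]])       # intercept, x2, x3
print(results.get_prediction(x_new).summary_frame(alpha=0.05))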

Multicollinearity

Forecast in the LRM

Nonlinear regression model

So far, the dependent variable has been modeled as a linear function of the explanatory variables plus an additive error term. In this section, we will look at nonlinear models. It turns out that there are two types of linearity in the linear regression model. First, the dependent variable is linear in the explanatory variables. Second, the dependent variable is linear in the beta parameters. Therefore, we can consider two types of nonlinearity. In this section, we focus mainly on nonlinearity in the explanatory variables, retaining linearity in the parameters. We then look at the most common nonlinear models: the log-log model, the log-linear model and a model where we only log (some of) the x variables. Once we have a nonlinear model, we need to reinterpret the beta parameters. For example, in the log-log model, the beta parameters are elasticities. Choosing between a linear regression model and a model nonlinear in the explanatory variables can be difficult. To help us make this choice, we introduce Ramsey's RESET test.
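
A sketch of the log-log model and of Ramsey's RESET test, again using Python with statsmodels on invented data; the function linear_reset is assumed to be available in the installed statsmodels version.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import linear_reset

# Invented data generated from a log-log (constant elasticity) relationship
rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({"x": rng.uniform(1.0, 10.0, size=n)})
df["y"] = np.exp(0.5 + 1.2 * np.log(df["x"]) + rng.normal(scale=0.2, size=n))

# Log-log model: the slope is interpreted as an elasticity (here close to 1.2)
loglog = smf.ols("np.log(y) ~ np.log(x)", data=df).fit()
print(loglog.params)

# A model that is linear in the original variables, for comparison
linear = smf.ols("y ~ x", data=df).fit()

# Ramsey's RESET test: a small p-value suggests that the functional form is wrong
print(linear_reset(linear, power=3, use_f=True))
print(linear_reset(loglog, power=3, use_f=True))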

Linear in parameters and/or linear in data

Linear regression models which are nonlinear in data

The log-log model

The log-linear model

Logging an x-variable

Ramsey’s RESET test

Dummy variables

If every observation belongs to one of two groups, then a dummy variable can be used to encode this information. A dummy variable takes the value zero for all observations belonging to one group and the value one for all remaining observations belonging to the other group. We can use a dummy variable as an explanatory variable in a linear regression model in the same way that we use an ordinary explanatory variable. Dummy variables can also be used if you have more than two groups.
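
The following sketch, in Python with statsmodels and an invented wage example, shows a dummy variable entering the regression like any other regressor, dummies for more than two categories, and an interactive dummy. All variable names are hypothetical.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented wage data with a two-group dummy (female) and a three-category variable (region)
rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "educ": rng.uniform(9, 18, size=n),
    "female": rng.integers(0, 2, size=n),
    "region": rng.choice(["north", "middle", "south"], size=n),
})
df["wage"] = 10 + 1.5 * df["educ"] - 2.0 * df["female"] + rng.normal(scale=2, size=n)

# The dummy enters like any other regressor; its coefficient shifts the intercept
# for the group coded 1 relative to the group coded 0
print(smf.ols("wage ~ educ + female", data=df).fit().params)

# More than two categories: one dummy per category except a reference category
# (C(region) creates the dummies and drops one of them automatically)
print(smf.ols("wage ~ educ + female + C(region)", data=df).fit().params)

# Interactive dummy: allows the slope on educ to differ between the two groups
print(smf.ols("wage ~ educ + female + educ:female", data=df).fit().params)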

The LRM with a dummy variable

LRM with more than two categories

Interactive dummy variables

The Chow test

Heteroscedasticity

Heteroscedasticity means that the variance of the error term differs between observations, which is very common in economics. We begin by looking at tests that help us figure out whether our data are homoscedastic or heteroscedastic. If we find that we have heteroscedasticity, then the standard errors derived under the assumption of homoscedasticity are no longer valid. Instead, we can use robust standard errors. Also, with heteroscedasticity, OLS is no longer efficient. In this case, the efficient estimator is called weighted least squares.
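
The sketch below, in Python with statsmodels and simulated data, runs a Breusch-Pagan test (one test based on the squared residuals), recomputes the standard errors with a robust, heteroscedasticity-consistent estimator, and finally estimates the model by weighted least squares using the form of the error variance, which is known here by construction but would have to be estimated in practice.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulated data where the error variance grows with x (heteroscedasticity)
rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5 * x)   # error standard deviation proportional to x

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()

# Breusch-Pagan test on the squared residuals: a small p-value points to heteroscedasticity
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(ols_res.resid, X)
print("Breusch-Pagan p-value:", lm_pval)

# Robust (heteroscedasticity-consistent) standard errors for the same OLS estimates
print(sm.OLS(y, X).fit(cov_type="HC1").bse)

# Weighted least squares: weight each observation by the inverse of its error variance
print(sm.WLS(y, X, weights=1.0 / (0.5 * x) ** 2).fit().params)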

Heteroscedasticity

Test for heteroscedasticity using squared residuals

Robust standard errors with heteroscedasticity

Weighted least squares