Introduction to Econometrics
by Lund University
This is NEKG31, Introduction to Econometrics for students at Lund University. It covers material comparable to a typical first course in econometrics.
Before we begin this course, we will look at sample moments (numbers you can calculate from a sample) and introduce some econometric software.
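As a small illustration, the basic sample moments can be computed by hand or in any software. Here is a minimal sketch in Python with made-up numbers (the course itself uses econometric software such as Excel, EViews and Stata):

```python
import numpy as np

# A small hypothetical sample (not from the course)
x = np.array([2.0, 4.0, 6.0, 8.0])

n = len(x)
mean = x.sum() / n                       # sample mean
var = ((x - mean) ** 2).sum() / (n - 1)  # sample variance (n - 1 in denominator)
sd = var ** 0.5                          # sample standard deviation

print(mean, var, sd)
```

Note the n - 1 denominator in the sample variance, the usual convention in econometrics.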
This chapter introduces the least squares principle. The basic problem is how to fit a straight line through a scatter plot. We will cover the ordinary least squares (OLS) formula, which will provide us with an intercept and a slope. We will then derive the OLS formula from the least squares principle. This chapter focuses on the algebra of least squares. There is no probability theory or statistics in this chapter. Important concepts introduced in this chapter: trendline, residuals, fitted values and R-squared. In addition to Excel, we will also demonstrate how to find trendlines using EViews and Stata.
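As an illustration of the least squares principle, the trendline, fitted values, residuals and R-squared can all be computed directly from a small sample. A minimal sketch in Python with made-up data (the course itself uses Excel, EViews and Stata):

```python
import numpy as np

# Hypothetical scatter-plot data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
b = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()  # OLS slope
a = ybar - b * xbar                                            # OLS intercept

fitted = a + b * x   # fitted values on the trendline
resid = y - fitted   # residuals
r2 = 1 - (resid ** 2).sum() / ((y - ybar) ** 2).sum()  # R-squared

print(a, b, r2)
```

The slope divides the sum of cross-products of deviations from the means by the sum of squared deviations of x, and R-squared measures the share of the variation in y explained by the trendline.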
We now know how to fit a straight line through a scatter plot. The next step is to introduce appropriate assumptions on how our data was generated. We will model our data as a random sample. More specifically, we will model our data as drawings from random variables. This idea turns out to be very fruitful. Random variables are concepts from probability theory, which is the subject of this chapter. This chapter covers the absolute minimum from probability theory that we need to progress: random variables, distribution functions, expected value, variance, covariance and conditional expectations.
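For a discrete random variable, the expected value and variance follow directly from the definitions E[X] = sum of value times probability and Var(X) = E[(X - E[X])^2]. A small sketch with a hypothetical distribution:

```python
# A hypothetical discrete random variable X: values with probabilities
values = [1, 2, 3]
probs = [0.2, 0.5, 0.3]

ex = sum(v * p for v, p in zip(values, probs))               # E[X]
var = sum((v - ex) ** 2 * p for v, p in zip(values, probs))  # Var(X)

print(ex, var)
```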
This chapter formalizes the most important model in econometrics, the linear regression model. The entire chapter is restricted to a special case, namely when you have only one explanatory variable. The key assumption of the linear regression model, exogeneity, is introduced. Then, the OLS formula from chapter 1 is reinterpreted as an estimator of unknown parameters in the linear regression model. This chapter also introduces the variance of the OLS estimator under an important set of assumptions, the Gauss-Markov assumptions.
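The claim that the OLS slope estimator has a known variance under the Gauss-Markov assumptions can be illustrated by simulation. This sketch uses a hypothetical data-generating process (not from the course) and compares the empirical variance of the slope across many simulated samples with the formula Var(b) = sigma^2 / sum((x - xbar)^2):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical DGP: y = beta0 + beta1*x + u, with u ~ N(0, sigma^2)
beta0, beta1, sigma = 1.0, 2.0, 1.0
x = np.linspace(0, 10, 50)  # regressor held fixed across replications

slopes = []
for _ in range(5000):
    u = rng.normal(0, sigma, size=x.size)
    y = beta0 + beta1 * x + u
    b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    slopes.append(b)

emp_var = np.var(slopes)
theory_var = sigma ** 2 / ((x - x.mean()) ** 2).sum()  # Gauss-Markov formula
print(np.mean(slopes), emp_var, theory_var)
```

The average of the simulated slopes is close to the true beta1 (unbiasedness), and the empirical variance matches the theoretical formula.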
Inference means something like "a conclusion reached on the basis of evidence and reasoning". We now know how to estimate the parameters of the linear regression model (with one explanatory variable). However, these estimates are uncertain. In this section, we see what conclusions we can draw from all of this. But first, we must investigate a few more distributions (in addition to the normal distribution).
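As a sketch of what inference looks like in practice, the t-statistic for testing a zero slope in the simple regression model is the estimate divided by its standard error, which in turn uses an estimate of the error variance. Made-up data for illustration:

```python
import numpy as np

# Hypothetical sample; we test H0: slope = 0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.0])

n = x.size
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
resid = y - (a + b * x)

s2 = (resid ** 2).sum() / (n - 2)  # estimate of the error variance
se_b = (s2 / ((x - x.mean()) ** 2).sum()) ** 0.5
t_stat = b / se_b                  # compare with a t distribution, n - 2 df

print(b, se_b, t_stat)
```

With n - 2 = 4 degrees of freedom, the 5% two-sided critical value is about 2.776; a t-statistic beyond that rejects the null of a zero slope.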
In this chapter, we allow for several explanatory variables. We begin by setting up the linear regression with several explanatory variables, including the assumptions that we need to make. As in the simpler model with one explanatory variable, the main focus is on estimating the beta-parameters. However, we will no longer be able to present general formulas, such as the OLS formula, for our beta-estimates; doing so requires matrix algebra, which is outside the scope of this course. Instead, we rely on the fact that they have been correctly programmed into software such as Excel, EViews, Stata and more. Once we have fully understood the general linear regression model, we move on to inference.
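Even without matrix algebra, we can let software solve the least-squares problem with several explanatory variables numerically, just as Excel, EViews or Stata do internally. A sketch using NumPy's least-squares routine on simulated data (the data-generating process is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data with two explanatory variables
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Stack a constant and the regressors; software minimizes the sum of
# squared residuals numerically
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```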
So far, the dependent variable has been modeled as a linear function of the explanatory variables plus an additive error term. In this section, we will look at nonlinear models. First, we look at general non-linear models. Then, we focus on the most important class of non-linear models, logarithmic models.
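A common example is the log-log model, where the slope coefficient is interpreted as an elasticity: a one percent change in x is associated with a b1 percent change in y. A sketch with simulated data (hypothetical parameters):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical log-log model: log(y) = b0 + b1*log(x) + u, b1 an elasticity
x = rng.uniform(1, 10, size=500)
y = np.exp(0.5 + 1.3 * np.log(x) + rng.normal(scale=0.05, size=500))

# OLS on the logged variables recovers the elasticity
lx, ly = np.log(x), np.log(y)
b1 = ((lx - lx.mean()) * (ly - ly.mean())).sum() / ((lx - lx.mean()) ** 2).sum()
print(b1)  # estimated elasticity, close to 1.3
```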
If all observations belong to one out of two groups, then a dummy variable can be used to encode this information. A dummy variable takes the value zero for all observations belonging to one group and one for all the remaining observations belonging to the other group. We can use a dummy variable as an explanatory variable in a linear regression model in the same way that we use an ordinary explanatory variable. Dummy variables can be used even if you have more than two groups.
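A sketch of how a dummy variable works as a regressor: with only a constant and a dummy, the intercept estimates the mean of the zero group and the slope estimates the difference between the group means (made-up data):

```python
import numpy as np

# Hypothetical two-group sample: d = 0 for one group, d = 1 for the other
d = np.array([0, 0, 0, 1, 1, 1], dtype=float)
y = np.array([2.0, 3.0, 4.0, 7.0, 8.0, 9.0])

# OLS of y on a constant and the dummy
b = ((d - d.mean()) * (y - y.mean())).sum() / ((d - d.mean()) ** 2).sum()
a = y.mean() - b * d.mean()
print(a, b)  # 3.0 and 5.0: group means are 3 and 8
```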
Heteroscedasticity means that the variance of the error term differs between observations, and this is very common in economics. We begin by looking at tests that help us figure out whether our data is homoscedastic or heteroscedastic. If we find that we have heteroscedasticity, then the standard errors derived by assuming homoscedasticity are no longer valid. Instead, we can use robust standard errors. Also, with heteroscedasticity OLS is no longer efficient. In this case, the efficient estimator is called the weighted least squares estimator.
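The difference between conventional and robust standard errors can be sketched for the simple regression model. Under heteroscedasticity the conventional formula is invalid, while a White-type robust formula (the HC0 variant is used here) remains usable. Simulated data with an error variance that grows with x (a hypothetical setup):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical heteroscedastic data: error standard deviation proportional to x
n = 500
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)

dx = x - x.mean()
b = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
a = y.mean() - b * x.mean()
u = y - a - b * x  # OLS residuals

# Conventional SE (assumes homoscedasticity) vs. White (HC0) robust SE
s2 = (u ** 2).sum() / (n - 2)
se_conv = (s2 / (dx ** 2).sum()) ** 0.5
se_robust = ((dx ** 2 * u ** 2).sum()) ** 0.5 / (dx ** 2).sum()
print(se_conv, se_robust)
```

OLS is still unbiased here; only the standard errors (and efficiency) are affected by the heteroscedasticity.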
Throughout the course so far, we have assumed that the explanatory variables are exogenous. This is the most critical assumption in econometrics. In this chapter we will look at cases when explanatory variables cannot be expected to be exogenous (we then say that they are endogenous). We will also look at the consequences of econometric analysis with endogenous variables. Specifically, we will look at misspecification of our model, errors in variables and the simultaneity problem. When we have endogenous variables, we can sometimes find instruments for them: variables which are correlated with our endogenous variable but not with the error term. This opens up the possibility of consistently estimating the parameters in our model using the instrumental variable estimator and the generalized instrumental variable estimator.
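For the simple regression model with one instrument, the instrumental variable estimator replaces cov(x, y)/var(x) by cov(z, y)/cov(z, x). A simulation sketch with a hypothetical data-generating process where x is endogenous, showing OLS biased while IV recovers the true slope:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical setup: x is correlated with the error u (endogeneity),
# z is an instrument (correlated with x, uncorrelated with u)
n = 10000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # endogenous: depends on u
y = 1.0 + 2.0 * x + u                       # true slope is 2

dz, dx, dy = z - z.mean(), x - x.mean(), y - y.mean()
b_ols = (dx * dy).sum() / (dx ** 2).sum()  # inconsistent under endogeneity
b_iv = (dz * dy).sum() / (dz * dx).sum()   # instrumental variable estimator
print(b_ols, b_iv)
```

In this setup OLS converges to a value above 2 because cov(x, u) > 0, while the IV estimate is close to the true slope.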
Time series models
Panel data is data over a cross-section as well as time. This chapter is only an introduction to models using panel data. The focus of this chapter is on the error component model, where we look at the fixed effects estimator as well as the random effects model. The chapter concludes with a discussion of how to choose between the fixed effects estimator and the random effects estimator (including the Hausman test).
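The fixed effects ("within") estimator can be sketched as OLS after demeaning each individual's data over time, which removes the individual effect. A simulation with made-up parameters where the individual effect is correlated with the regressor:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical panel: N individuals observed over T periods
N, T = 100, 5
alpha = rng.normal(scale=2.0, size=N)         # individual (fixed) effects
x = rng.normal(size=(N, T)) + alpha[:, None]  # x correlated with alpha
y = 1.5 * x + alpha[:, None] + rng.normal(scale=0.1, size=(N, T))

# Within transformation: demean each individual over time, then run OLS
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd ** 2).sum()
print(b_fe)  # close to the true slope 1.5
```

Because x is correlated with alpha in this setup, pooled OLS would be inconsistent, while the within transformation eliminates alpha and recovers the slope.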