
Lecture 1 : A review of linear regression analysis

Rigissa Megalokonomou
University of Queensland

Semester 2 2018
Course Information
- This is a course in applied micro-econometrics. These
  methodologies are used widely in fields such as labour economics,
  development economics, health economics, the economics of crime,
  and the economics of education, among others.
- One lecture (2 hr) and one practical (2 hr) per week.
- Consultation times:
  Rigissa Megalokonomou
  Tuesday 18:00-19:00
  Room 532A Colin Clark Bldg
  email: R.Megalokonomou@uq.edu.au
Course Information
- Assessment for ECON3360:
  1. Mid-semester exam (40%), written format; Week 9 in-class (18 Sept, 14:00).
  2. Final exam (45%), written format; during the central exam period.
     Note: all material covered in lectures and tutorials (including Stata
     programming issues) is examinable.
  3. Homework (15%): two problem sets consisting of questions that require
     short answers and some calculations, plus one article review. You will
     need to demonstrate that you understand how to model various processes
     correctly and analyze relationships of interest. Due dates for the
     three assignments are 7 Sep, 5 Oct, and 26 Oct (16:00).
Course Information
- Assessment for ECON7360:
  1. Mid-semester exam (25%), written format; Week 9 in-class (18 Sept, 14:00).
  2. Final exam (35%), written format; during the central exam period.
     Note: all material covered in lectures and tutorials (including Stata
     programming issues) is examinable.
  3. Homework (10%): two problem sets consisting of questions that require
     short answers and some calculations. You will need to demonstrate that
     you understand how to model various processes correctly and analyze
     relationships of interest. Due dates for the two assignments are 7 Sep
     and 22 Oct (16:00).
  4. Article review (15%): a critical review of a journal article that uses
     methodologies covered in the course. Due date: 5 Oct (16:00).
  5. Research proposal (15%): students should submit a complete project
     plan that includes a clearly defined research question, a literature
     review, data, and the methodology that addresses the question. Due
     date: 28 Oct (16:00).
Course Information
- Required resources: Wooldridge, Jeffrey M. 2013. Introductory
  Econometrics: A Modern Approach, 5th (or 4th or 6th) edition.
- Recommended resources (the following are standard textbooks in
  micro-econometrics):
  - Angrist, Joshua and Jorn-Steffen Pischke. 2009. Mostly Harmless
    Econometrics. Princeton: Princeton University Press.
  - Cameron, Colin and Pravin Trivedi. 2009. Microeconometrics Using
    Stata. College Station, TX: Stata Press.
  - Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section
    and Panel Data. MIT Press.
  - Cameron, Colin and Pravin Trivedi. 2005. Microeconometrics: Methods
    and Applications. Cambridge University Press.
Course Outline
- This course covers the basic micro-econometric concepts and methods
  used in modern empirical economic research.
- The goal is to help you understand research papers (i.e., journal
  articles) in applied microeconomics and to give you the skills required
  to execute your own empirical project.
- One of the more important skills you will need in the workforce is the
  ability to convert a large and complex set of information into a neat
  package. At many points in your working life you will be asked "what is
  the bottom line?" on a particular subject.
- In this class, you will get some practice at this by reading academic
  articles and summarizing their content in a non-technical manner.
  Students will be asked to write a short paper that summarizes the key
  concepts of an article (the article review). The summary should be 3
  pages, double spaced, 11-point type.
Course Outline
- Topics include linear regression analysis, randomized controlled
  trials, instrumental variables estimation, linear panel data models,
  the differences-in-differences method, the
  differences-in-differences-in-differences method, simultaneous
  equations models, propensity score matching, and regression
  discontinuity design, among many others.
- We will look at many examples and do a fair amount of programming and
  number-crunching ourselves.
Course Outline
- We start with three workhorse methods that are in every applied
  econometrician's toolkit:
  - Linear regression models designed to control for variables that may
    mask the causal effects of interest;
  - Instrumental variables methods for the analysis of real and natural
    experiments;
  - Differences-in-differences-type strategies that use repeated
    observations to control for unobserved omitted factors.
- The productive use of these techniques requires a solid conceptual
  foundation and a good understanding of the machinery of statistical
  inference.
Reading for Lecture 1: A review of linear regression analysis in
introductory econometrics (courses like ECON2300, ECON7310)
- Please review chapters 1-4 and appendices D and E in the textbook.
- Matrix algebra is unavoidable for understanding, in depth, the linear
  regression models that are used in practice.
- Econometrics is the measurement of economic relationships.
- We need to know:
  - What is an economic relationship?
  - How do we measure such a relationship?
- Examples:
  - Production function: the relationship between a firm's output and its
    inputs of labor, capital, and materials.
  - Earnings function: the relationship between earnings and education,
    work experience, job tenure, and worker's ability.
  - Education production function: the relationship between student
    academic performance and inputs such as class size and student,
    teacher, and peer characteristics.
- All these relationships can be expressed as mathematical functions:

  y = f(x₁, x₂, ..., xₖ)

  which can be approximated by a linear regression model:

  y = β₁x₁ + β₂x₂ + ... + βₖxₖ + u
  y = xβ + u
Objective: causal relationships between economic variables
- Most empirical studies in economics are interested in causal
  relationships between economic variables. The goal of most empirical
  studies is to determine whether a change in one variable, say x₁,
  causes a change in another variable, say y, while we keep all other
  variables fixed (ceteris paribus).
- Examples:
  - Does having another year of education cause an increase in monthly
    salary?
  - Does reducing class size cause an improvement in student performance?
  - Does a twenty-week job training program cause an increase in worker
    productivity?
Simple association is not a proper measurement of a relationship between
economic variables
- Simply finding an association between two or more variables might be
  suggestive, but it is rarely useful for policy analysis. In other
  words, association does not imply causation. Some examples:
  - Does the presence of more police officers on the streets deter crime?
    How easy is it to implement the ceteris paribus assumption here? The
    decision on the size of the police force might be correlated with
    other city factors that affect crime (i.e., how dodgy the area is).
  - Suppose that you want to examine the effect of hiring more teachers.
    In year t+1 additional teachers are hired. If students' GPA goes up,
    is it the effect of increasing the number of teachers?
Regression fundamentals
We start with the linear regression framework because:
- it is a very robust technique that allows us to incorporate fairly
  general functional-form relationships;
- it also provides a basis for more advanced empirical methods;
- it is a transparent and relatively easy-to-understand technique.
- Before we get to the important question of when a regression is likely
  to have a causal interpretation, let's review a number of regression
  facts and properties.
- The multiple linear regression model and its estimation by ordinary
  least squares (OLS) is the most widely used tool in econometrics:

  y = xβ + u
  β̂_OLS = (x′x)⁻¹x′y
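
To make the estimator concrete, here is a minimal numpy sketch, assuming
simulated data with made-up coefficients (the course itself uses Stata, so
this Python snippet is purely illustrative). It computes β̂ by solving the
normal equations (X′X)β̂ = X′y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])                      # made-up true coefficients
y = X @ beta_true + rng.normal(size=n)                      # u ~ N(0, 1)

# OLS via the normal equations: solve (X'X) beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```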
Regression fundamentals
- Setting aside the relatively abstract causality problem for the moment,
  we start with the mechanical properties of the regression estimates.
- We start with the multiple linear regression model:

  y = xβ + u

  where x = (1, x₁, x₂, ..., xₖ) and β = (β₀, β₁, β₂, ..., βₖ)′.
- y is called the dependent variable, outcome variable, response
  variable, explained variable, or predicted variable; x is called the
  independent variable, explanatory variable, control variable, predictor
  variable, regressor, or covariate; β₀ is the intercept parameter; the
  βⱼ for j = 1, 2, ..., k are slope parameters (our primary interest in
  most cases); u is called the error term or disturbance.
OLS estimator
- For each observation i = 1, 2, ..., n,

  yᵢ = xᵢβ + uᵢ

  The OLS estimator chooses the β that minimizes the sum of squared
  errors:

  min_β Σᵢ₌₁ⁿ uᵢ²

- In matrix form, where u = [u₁, u₂, ..., uₙ]′:

  min_β u′u
  min_β (y − xβ)′(y − xβ)

  The first-order condition is

  −2(x′(y − xβ̂)) = 0
  β̂ = (x′x)⁻¹x′y

- If x = x₁ is scalar, β̂₁ = cov(x₁, y)/cov(x₁, x₁).
- What is the condition for E(β̂|x) = β?
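
The first-order condition says the OLS residuals are exactly orthogonal to
every regressor. A quick numerical check of this, again on simulated
(hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=200)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

# First-order condition -2 x'(y - x beta_hat) = 0  =>  X'u_hat = 0
print(X.T @ u_hat)  # numerically ~ [0, 0, 0]
```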
The Multiple Linear Regression Model
- A structure in which one or more explanatory variables are taken to
  generate an outcome variable. It is more amenable to ceteris paribus
  analysis because it allows us to explicitly control for many other
  factors that affect the outcome variable.
- A very robust technique that allows us to incorporate fairly general
  functional-form relationships.
- It also provides a basis for more advanced empirical methods.
- A transparent and relatively easy-to-understand technique.
Statistical Properties of the OLS estimators

  y = β₀ + xβ + u

- (y, x, u) are random variables.
- (y, x) are observed variables (we can sample observations on them).
- u is unobservable (no statistical tests can involve u).
- (β₀, β) are unobserved but can be estimated under certain conditions.
- The model implies that u captures everything that determines y except
  for x; β captures the economic relationship between y and x.
- In matrix form:

  y = xβ + u

- OLS estimator:

  β̂ = (Σᵢ₌₁ⁿ xᵢ′xᵢ)⁻¹(Σᵢ₌₁ⁿ xᵢ′yᵢ) = (x′x)⁻¹x′y
Statistical properties of the OLS estimators
In the case of a single covariate:

  y = β₀ + βx + u

Estimators:

  slope = β̂ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²
  intercept = β̂₀ = ȳ − β̂x̄

where we have a sample of individuals i = 1, 2, 3, ..., n.

Population analogues:

  slope = Cov(x, y)/Var(x);  intercept = E(y) − βE(x)
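
As a sanity check on these formulas, a short sketch on simulated data (all
numbers invented for illustration): the covariance-over-variance slope and
the ȳ − β̂x̄ intercept match what np.polyfit returns.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 2.0 + 1.5 * x + rng.normal(size=300)   # true intercept 2.0, slope 1.5

slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)          # ~1.5, ~2.0
print(np.polyfit(x, y, 1))       # same values (slope first, then intercept)
```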
Unbiasedness of OLS
When is OLS unbiased (i.e., E(β̂) = β)?
- A1. The model in the population is linear in parameters:

  y = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ + u

  In matrix form: y = xβ + u, with x = (1, x₁, x₂, ..., xₖ).
- A2. We have a random sample on (yᵢ, xᵢ): draws are iid.
- A3. None of the independent variables is constant, and there are no
  exact linear relationships among the independent variables.
- A4. Zero conditional mean of errors: the error has an expected value of
  zero given any values of the independent variables:

  E(u|X) = 0 ⇒ cov(X, u) = 0

- OLS is unbiased under A1-A4.
Unbiasedness of OLS
When is OLS unbiased (i.e., E(β̂) = β)?

  β̂ = (X′X)⁻¹(X′y)

where X = (x₁′, x₂′, ..., xₙ′)′, X′X = Σᵢ₌₁ⁿ xᵢ′xᵢ, and y = Xβ + u. Then

  β̂ = (X′X)⁻¹(X′(Xβ + u))
  β̂ = β + (X′X)⁻¹(X′u)

  E(β̂|X) = β + (X′X)⁻¹(X′E(u|X)) = β

(because of A4: E(u|X) = 0).
- Unbiasedness is a feature of the sampling distribution of β̂_OLS: its
  central tendency is the true parameter value.
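
Unbiasedness is a statement about the sampling distribution, which a small
Monte Carlo sketch can illustrate (the design matrix, coefficients, and
error law are all assumptions made for the demo): averaging β̂ over many
redrawn error vectors recovers the true β.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 5000
beta_true = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design held fixed across draws

estimates = np.empty((reps, 2))
for r in range(reps):
    u = rng.normal(size=n)           # E(u|X) = 0 holds by construction (A4)
    y = X @ beta_true + u
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))        # ~ [1.0, 2.0]: centered on the truth
```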
Omitted Variable Problem
What if we exclude a relevant variable from the regression?
- This is an example of misspecification.
- For example, ability is not observed and not included in the wage
  equation (the term β₂·ability is missing): wage = β₀ + β₁educ + u.
- β₂ is probably positive:
  - the higher a student's ability, the higher her wage will be.
- But also: the higher a student's ability, the higher her education
  level, so cov(X₁, X₂) > 0, where X₁ = educ and X₂ = ability.
- This means that β̂₁ has an upward bias.
Omitted Variable Problem
Suppose that the researcher mistakenly uses:

  y = a* + b₁*X₁ + e

where X₂ is mistakenly omitted from the model. The model should have
been:

  y = a + b₁X₁ + b₂X₂ + u

- How does b₁ (the regression estimate from the correctly specified
  model) compare to b₁* (the regression estimate from the mis-specified
  model)? See the simulation sketch after this derivation.

  b₁* = Cov(X₁, y)/Var(X₁) = Cov(X₁, a + b₁X₁ + b₂X₂ + u)/Var(X₁)
      = [Cov(X₁, a) + b₁Cov(X₁, X₁) + b₂Cov(X₁, X₂) + Cov(X₁, u)]/Var(X₁)
      = [0 + b₁Var(X₁) + b₂Cov(X₁, X₂) + 0]/Var(X₁)

  b₁* = b₁ + b₂·Cov(X₁, X₂)/Var(X₁)
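
The bias formula can be checked by simulation. In the sketch below, the
education/ability labels and all parameter values are invented for
illustration; the short-regression slope coincides with
b₁ + b₂·Cov(X₁, X₂)/Var(X₁).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x2 = rng.normal(size=n)                    # e.g. ability (unobserved)
x1 = 0.8 * x2 + rng.normal(size=n)         # e.g. education, correlated with ability
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)   # true b1 = 2, b2 = 3

# Short (mis-specified) regression of y on x1 alone
b1_star = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
# Bias predicted by the formula: b1 + b2 * Cov(x1, x2) / Var(x1)
predicted = 2.0 + 3.0 * np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)
print(b1_star, predicted)                  # both ~ 3.46: upward bias
```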
Sampling variance of the OLS slope estimator
- A5. (Homoskedasticity) The error u has the same variance given any
  values of the explanatory variables:

  Var(u|x) = σ²Iₙ

- The variance of the error term u, conditional on the explanatory
  variables, is the same for all combinations of the explanatory
  variables.
- Under assumptions A1 through A5, conditional on the sample values of
  the independent variables,

  Var(β̂ⱼ) = σ² / (SSTⱼ(1 − Rⱼ²))

  for j = 1, 2, ..., k, where SSTⱼ = Σᵢ₌₁ⁿ (xᵢⱼ − x̄ⱼ)² is the total
  sample variation in xⱼ, and Rⱼ² is the R-squared from regressing xⱼ on
  all the other independent variables (including an intercept).
- The size of Var(β̂ⱼ) is important: a larger variance means a less
  precise estimator.
Aside: The components of the OLS variance
- The variance of β̂ⱼ depends on three factors: σ², SSTⱼ, and Rⱼ².
- (1) The error variance, σ²: a larger σ² means a larger variance →
  more noise in the equation → harder to estimate β precisely. For a
  given dependent variable y, there is only one way to reduce the error
  variance: add more explanatory variables to the equation. It is not
  always possible to find additional legitimate factors that affect y.
- (2) The total sample variation in xⱼ, SSTⱼ: the larger the variation
  in xⱼ, the smaller the variance. There is a way to increase the sample
  variation in each of the independent variables: increase the sample
  size.
- (3) The linear relationships among the independent variables, Rⱼ²:
  high multicollinearity between xⱼ and the other independent variables
  leads to an imprecise estimate for xⱼ (e.g., perfect multicollinearity
  means Rⱼ² = 1 and the variance is infinite). Note that Rⱼ² = 1 is
  ruled out by assumption A3. A numerical illustration follows.
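
Here is a rough numerical sketch of factor (3), with arbitrary correlation
values chosen for the demo: as corr(x₁, x₂) approaches 1, R₁² approaches 1
and Var(β̂₁) explodes.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000

def var_slope(rho):
    """Var(beta_hat_1) = sigma^2 / (SST_1 * (1 - R_1^2)) in a 2-regressor design."""
    x2 = rng.normal(size=n)
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x1, x2) ~ rho
    sst1 = np.sum((x1 - x1.mean()) ** 2)
    # With one other regressor, R_1^2 from regressing x1 on x2 is the squared correlation
    r1sq = np.corrcoef(x1, x2)[0, 1] ** 2
    sigma2 = 1.0                                              # assumed error variance
    return sigma2 / (sst1 * (1 - r1sq))

for rho in [0.0, 0.5, 0.9, 0.99]:
    print(rho, var_slope(rho))   # variance blows up as rho -> 1
```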
2. Gauss-Markov Theorem: OLS is BLUE
Under A1-A5, the OLS estimator β̂ is the best linear unbiased estimator
(BLUE) of the true parameter β.
"Best" means the most efficient (i.e., smallest-variance) estimator.
The assumptions again...
- A1: The model is linear.
- A2: We have a random sample of (yᵢ, xᵢ).
- A3: None of the x's is constant and there is no perfect
  multicollinearity.
- A4: Zero conditional mean of errors (exogeneity).
- A5: The error has the same variance for all x's (homoskedasticity).
- A6: ...
3. Inference with the OLS estimator
- Under the Gauss-Markov (GM) assumptions alone, the distribution of β̂ⱼ
  can have virtually any shape.
- To make the sampling distribution tractable, we add an assumption on
  the distribution of the errors:
- A6. Normality: the population error u is normally distributed with
  zero mean and constant variance: u ∼ N(0, σ²).
- The assumption of normality, as we have stated it, subsumes both the
  assumption that the error process is independent of the explanatory
  variables (A4) and homoskedasticity (A5). For cross-sectional
  regression analysis, these assumptions (A1-A6) define the classical
  linear model (CLM).
What does A6 add?
- The CLM assumptions A1-A6 allow us to obtain the exact sampling
  distributions of t-statistics and F-statistics, so that we can carry
  out exact hypothesis tests.
- The rationale for A6: we can appeal to the Central Limit Theorem to
  suggest that the sum of a large number of random factors will be
  approximately normally distributed.
- The assumption of a normally distributed error is therefore probably
  not a bad one.
Testing hypotheses about a single parameter: the t test
Under the CLM assumptions, a test statistic formed from the OLS estimates
may be expressed as:

  (β̂ⱼ − βⱼ) / se(β̂ⱼ) ∼ t_{n−k−1}

This test statistic allows us to test the null hypothesis:

  H₀: βⱼ = 0

- Here se(β̂ⱼ) = σ̂_β̂ⱼ, and the estimated Var(β̂) = σ̂ᵤ²(x′x)⁻¹.
- We have (n − k − 1) degrees of freedom. Where n is not that large
  relative to k, the resulting t distribution will have considerably
  fatter tails than the standard normal.
- Where (n − k − 1) is a large number (greater than 100, for instance),
  the t distribution is essentially the standard normal → that is why a
  big n helps.
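
The t statistic can be computed by hand, as in this sketch on simulated
data (coefficients and sample size are made up): se(β̂ⱼ) comes from the
diagonal of σ̂ᵤ²(X′X)⁻¹, and p-values come from the t distribution with
n − k − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 120, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)  # third coefficient truly zero

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
df = n - k - 1                                   # k slopes plus one intercept
sigma2_hat = u_hat @ u_hat / df
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # homoskedastic variance estimate
se = np.sqrt(np.diag(var_beta))

t_stats = beta_hat / se                          # test H0: beta_j = 0
p_vals = 2 * stats.t.sf(np.abs(t_stats), df)     # two-sided p-values
print(np.column_stack([beta_hat, se, t_stats, p_vals]))
```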
Summary
- A1-A4: OLS is unbiased.
- A1-A5: we can derive Var(β̂ⱼ).
- A1-A5: the Gauss-Markov theorem holds and the OLS estimator is BLUE.
- A1-A6: we obtain the exact distributions of t-statistics and
  F-statistics.
The idea is:
- x₁ → y, where → means "affects".
- u → y.
- We want to isolate the part of x₁ (excluding the part of x₁ that varies
  jointly with x₂) that is uncorrelated with u;
- then we can have E(β̂₁|x) = β₁.
- What can we do in the case of wage = β₀ + β₁educ + u?
The idea is:
- Broadly speaking, empirical micro-econometric methodologies can be
  viewed as tools to solve this problem (i.e., cov(x₁*, u*) ≠ 0, also
  called the endogeneity problem, or a violation of the exogeneity
  assumption), typically using two approaches:
- (i) Try to include as much information as possible in x₂ so that u is
  as small as possible. Examples of this approach include the DID method
  and the fixed effects estimator in panel data models, among many
  others.
- (ii) Try to design the setting and the dataset such that x₁ is random,
  or isolate the part of the variation in x₁ that is uncorrelated with
  the unobserved factors u. Examples of this type of approach include
  randomized experimental designs and the instrumental variables
  approach, among others.
- Of course, we can combine these approaches.
Summary of the Linear Regression Model
- Advantages:
  - A very robust technique that allows us to incorporate fairly general
    functional-form relationships.
  - It also provides a basis for more advanced empirical methods.
  - A transparent and relatively easy-to-understand technique.
- Economists use econometric methods to effectively hold other factors
  fixed.
- (Causality and ceteris paribus) In economic theory, economic
  relationships hold ceteris paribus (i.e., holding all other relevant
  variables fixed); but since the econometrician does not observe all of
  the factors that might be important, we cannot always make sensible
  inferences about potentially causal factors.
- Our best hope is that we might control for many of the factors and be
  able to use our empirical findings to examine whether
  systematic/important factors have been omitted.
Supplementary slides
The main issue here is: when is a regression likely to have a causal
interpretation?
- Consider the following model and suppose that we are interested in
  estimating β₁:

  y = f(x₁, x₂, ..., xₖ)
  y = β₁x₁ + x₂β₂ + u
  E(y|x) = β₁x₁ + x₂β₂
  E(u|x) = 0

- Suppose we instead estimate y = β₁x₁ + u₁ and implement OLS to get β̂₁.
- Note that β̂₁ = cov(x₁, y)/cov(x₁, x₁). Then

  β̂₁ = cov(x₁, β₁x₁)/cov(x₁, x₁) + cov(x₁, x₂β₂)/cov(x₁, x₁)
       + cov(x₁, u)/cov(x₁, x₁)

  β̂₁ = β₁ + β₂·cov(x₁, x₂)/cov(x₁, x₁) + cov(x₁, u)/cov(x₁, x₁)

- What is the condition for E(β̂₁|x) = β₁?
The main issue here is:
- Even if we assume that E(cov(x₁, u)/cov(x₁, x₁) | x) = 0, we still have
  E(cov(x₁, x₂)/cov(x₁, x₁) | x) ≠ 0, where x = (x₁, x₂):

  E(β̂₁|x) = β₁ + E(cov(x₁, x₂)/cov(x₁, x₁) | x)·β₂

- In the previous examples, cov(x₁, x₂) ≠ 0, so E(β̂₁|x) ≠ β₁.

  wage = β₁educ + β₂IQ + β₃exper + β₄tenure + u

- versus the short regression: wage = β₁educ + u₁
Supplementary slides: What is the importance of assuming normality for
the error process?
Under the assumptions of the classical linear model, normally distributed
errors give rise to normally distributed OLS estimators:

  β̂ⱼ ∼ N(βⱼ, Var(β̂ⱼ))

where Var(β̂ⱼ) is as given earlier, which in turn implies that:

  (β̂ⱼ − βⱼ) / sd(β̂ⱼ) ∼ N(0, 1)

where sd(β̂ⱼ) = σ_β̂ⱼ.
- This follows since each of the β̂ⱼ can be written as a linear
  combination of the errors in the sample.
- Since we assume that the errors are independent, identically
  distributed normal random variates, any linear combination of those
  errors is also normally distributed.
- Any linear combination of the β̂ⱼ is also normally distributed, and any
  subset of these estimators has a joint normal distribution.
Supplementary slides: Heteroskedasticity-robust variance
Under assumptions A1-A4, the heteroskedasticity-robust variance for β̂ⱼ is
given by

  Avar̂(β̂ⱼ|x) = Σᵢ₌₁ⁿ r̂ᵢⱼ²ûᵢ² / SSRⱼ²

where r̂ᵢⱼ denotes the ith residual from regressing xⱼ on all the other
independent variables, and SSRⱼ is the sum of squared residuals from this
regression.
In matrix form:

  Avar̂(β̂|x) = (Σᵢ₌₁ⁿ x̂ᵢ′x̂ᵢ)⁻¹ · (Σᵢ₌₁ⁿ x̂ᵢ′x̂ᵢûᵢ²) · (Σᵢ₌₁ⁿ x̂ᵢ′x̂ᵢ)⁻¹

  Avar(β̂|x) = E((x′x)⁻¹(x′uu′x)(x′x)⁻¹ | x)

where the square roots of the diagonal elements of this matrix are the
heteroskedasticity-robust standard errors. Under homoskedasticity, this
reduces to σ²·(x′x)⁻¹.
- Most statistical packages now support the calculation of these robust
  standard errors when a regression is estimated.
- The heteroskedasticity-robust standard errors may be used to compute
  heteroskedasticity-robust t-statistics and, likewise, F-statistics.
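
To make the sandwich formula concrete, here is a hand-rolled sketch of the
robust variance (the HC0 flavor) on simulated heteroskedastic data; the
error-variance function is invented for illustration. In Stata, the same
idea is behind the robust option of the regress command.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
u = rng.normal(size=n) * np.exp(0.5 * x)        # heteroskedastic: Var(u|x) grows with x
y = X @ np.array([1.0, 2.0]) + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

XtX_inv = np.linalg.inv(X.T @ X)
# HC0 sandwich: (X'X)^-1 (sum_i x_i' x_i u_hat_i^2) (X'X)^-1
meat = (X * u_hat[:, None] ** 2).T @ X
avar_robust = XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(avar_robust))

# Classical (homoskedastic) SEs for comparison
sigma2_hat = u_hat @ u_hat / (n - 2)
se_classic = np.sqrt(np.diag(sigma2_hat * XtX_inv))
print(se_robust, se_classic)   # robust SEs differ noticeably under heteroskedasticity
```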
