
Chapter 10

Regression with Panel Data

Copyright 2015 Pearson, Inc. All rights reserved.

Outline
1. Panel Data: What and Why
2. Panel Data with Two Time Periods
3. The FIXED Effects Model:
   1. What is it
   2. Potential issues
   3. TIME Fixed Effects
4. The RANDOM Effects Model:
   1. What is it
   2. Potential issues
5. Choosing between the FIXED effects and RANDOM effects models

Panel Data: What and Why (SW Section 10.1)

A panel dataset contains observations on multiple entities (individuals, states, companies), where each entity is observed at two or more points in time.
Hypothetical examples:
  Data on 420 California school districts in 1999 and again in 2000, for 840 observations total.
  Data on 50 U.S. states, each observed in 3 years, for a total of 150 observations.
  Data on 1000 individuals, in four different months, for 4000 observations total.

What Are Panel Data? (cont.)

There are four different kinds of variables that we encounter when we use panel data:
1. Variables that can differ between individuals but don't change over time:
   e.g., gender, ethnicity, and race
2. Variables that change over time but are the same for all individuals in a given time period:
   e.g., the retail price index and the national unemployment rate
3. Variables that vary both over time and between individuals:
   e.g., income and marital status
4. Trend variables that vary in predictable ways:
   e.g., an individual's age

More in class practice and examples

Notation for panel data

A double subscript distinguishes entities (states) and time periods (years):
  i = entity (state), n = number of entities, so $i = 1, \ldots, n$
  t = time period (year), T = number of time periods, so $t = 1, \ldots, T$
Data: Suppose we have 1 regressor. The data are:
  $(X_{it}, Y_{it}),\ i = 1, \ldots, n,\ t = 1, \ldots, T$

Panel data notation, ctd.

Panel data with k regressors:
  $(X_{1it}, X_{2it}, \ldots, X_{kit}, Y_{it}),\ i = 1, \ldots, n,\ t = 1, \ldots, T$
  n = number of entities (states)
  T = number of time periods (years)
Some jargon:
  Another term for panel data is longitudinal data.
  Balanced panel: no missing observations, that is, all variables are observed for all entities (states) and all time periods (years).
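The balanced-panel definition can be checked mechanically. The sketch below is only an illustration in Python (the slides themselves use Stata); the function name `panel_shape` and the toy rows are hypothetical and not part of the course data. It counts entities and periods in long-format data and checks that every entity appears in every period.

```python
from collections import defaultdict


def panel_shape(rows):
    """Return (n, T, is_balanced) for rows of (entity, period, value) tuples."""
    periods_by_entity = defaultdict(set)
    all_periods = set()
    for entity, period, _value in rows:
        periods_by_entity[entity].add(period)
        all_periods.add(period)
    n = len(periods_by_entity)
    T = len(all_periods)
    # Balanced: every entity is observed in every period.
    is_balanced = all(p == all_periods for p in periods_by_entity.values())
    return n, T, is_balanced


# Hypothetical toy data: 3 states observed in 1982-1984 (values are placeholders).
rows = [(s, y, 0.0) for s in ("CA", "TX", "MA") for y in (1982, 1983, 1984)]
n, T, balanced = panel_shape(rows)
print(n, T, balanced, len(rows) == n * T)

# Dropping one state-year makes the panel unbalanced.
print(panel_shape(rows[:-1]))
```

In a balanced panel the number of observations equals n × T, which is the count used in the examples on these slides.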


Why are panel data useful?

With panel data we can control for factors that:
  Vary across entities but do not vary over time
  Could cause omitted variable bias if they are omitted
  Are unobserved or unmeasured, and therefore cannot be included in a multiple regression
Here's the key idea:
  If an omitted variable does not change over time, then any changes in Y over time cannot be caused by the omitted variable.

Example of a panel data set: Traffic deaths and alcohol taxes

Observational unit: a year in a U.S. state
  48 U.S. states, so n = # of entities = 48
  7 years (1982, ..., 1988), so T = # of time periods = 7
  Balanced panel, so total # observations = 7 × 48 = 336
Variables:
  Traffic fatality rate (# traffic deaths in that state in that year, per 10,000 state residents)
  Tax on a case of beer
  Other (legal driving age, drunk driving laws, etc.)

U.S. traffic death data for 1982:

Higher alcohol taxes, more traffic deaths?



Why might there be more traffic deaths in states that have higher alcohol taxes?

Other factors that determine the traffic fatality rate:
  Quality (age) of automobiles
  Quality of roads
  Culture around drinking and driving
  Density of cars on the road

These omitted factors could cause omitted variable bias.

Example #1: traffic density. Suppose:
  I. High traffic density means more traffic deaths
  II. (Western) states with lower traffic density have lower alcohol taxes
Then the two conditions for omitted variable bias are satisfied. Specifically, high taxes could reflect high traffic density, so the OLS coefficient would be biased positively (high taxes, more deaths).
Panel data let us eliminate omitted variable bias when the omitted variables are constant over time within a given state.

Example #2: Cultural attitudes towards drinking and driving:
  (i) arguably are a determinant of traffic deaths; and
  (ii) potentially are correlated with the beer tax.
Then the two conditions for omitted variable bias are satisfied. Specifically, high taxes could pick up the effect of cultural attitudes towards drinking, so the OLS coefficient would be biased.
Panel data let us eliminate omitted variable bias when the omitted variables are constant over time within a given state.


Panel Data with Two Time Periods (SW Section 10.2)

Consider the panel data model,
  $FatalityRate_{it} = \beta_0 + \beta_1 BeerTax_{it} + \beta_2 Z_i + u_{it}$
$Z_i$ is a factor that does not change over time (density), at least during the years on which we have data.
  Suppose $Z_i$ is not observed, so its omission could result in omitted variable bias.
  The effect of $Z_i$ can be eliminated using T = 2 years.

The key idea:
  Any change in the fatality rate from 1982 to 1988 cannot be caused by $Z_i$, because $Z_i$ (by assumption) does not change between 1982 and 1988.

The math: consider fatality rates in 1988 and 1982:
  $FatalityRate_{i1988} = \beta_0 + \beta_1 BeerTax_{i1988} + \beta_2 Z_i + u_{i1988}$
  $FatalityRate_{i1982} = \beta_0 + \beta_1 BeerTax_{i1982} + \beta_2 Z_i + u_{i1982}$
Suppose $E(u_{it} \mid BeerTax_{it}, Z_i) = 0$.
Subtracting the 1982 equation from the 1988 equation (that is, calculating the change) eliminates the effect of $Z_i$.

  $FatalityRate_{i1988} = \beta_0 + \beta_1 BeerTax_{i1988} + \beta_2 Z_i + u_{i1988}$
  $FatalityRate_{i1982} = \beta_0 + \beta_1 BeerTax_{i1982} + \beta_2 Z_i + u_{i1982}$
so
  $FatalityRate_{i1988} - FatalityRate_{i1982} = \beta_1 (BeerTax_{i1988} - BeerTax_{i1982}) + (u_{i1988} - u_{i1982})$

The new error term, $(u_{i1988} - u_{i1982})$, is uncorrelated with either $BeerTax_{i1988}$ or $BeerTax_{i1982}$.
This difference equation can be estimated by OLS, even though $Z_i$ isn't observed.
The omitted variable $Z_i$ doesn't change, so it cannot be a determinant of the change in Y.
This differenced population equation doesn't have an intercept: it was eliminated by the subtraction step.
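The subtraction step can be checked numerically. The following Python sketch (an illustration only; the slides use Stata) builds a noise-free toy panel with four hypothetical entities in which the unobserved $Z_i$ is perfectly correlated with X in 1982. All numbers, and the helper `ols_slope_intercept`, are assumptions made for this example and are not the traffic data.

```python
def ols_slope_intercept(x, y):
    """Simple OLS of y on x with an intercept; returns (slope, intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Assumed true model: Y = 2 - 1*X + 3*Z, with Z unobserved and time-invariant.
beta0, beta1, beta2 = 2.0, -1.0, 3.0
z = [1.0, 2.0, 3.0, 4.0]
x_1982 = [1.0, 2.0, 3.0, 4.0]  # equals z, so omitting z biases the cross-section
x_1988 = [1.5, 1.5, 4.0, 4.0]
y_1982 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x_1982, z)]
y_1988 = [beta0 + beta1 * x + beta2 * zi for x, zi in zip(x_1988, z)]

# Cross-sectional regression in 1982 with Z omitted: the slope has the wrong sign.
slope_cs, _ = ols_slope_intercept(x_1982, y_1982)

# Regression of the change in Y on the change in X: Z drops out.
dx = [b - a for a, b in zip(x_1982, x_1988)]
dy = [b - a for a, b in zip(y_1982, y_1988)]
slope_fd, intercept_fd = ols_slope_intercept(dx, dy)

print("cross-section slope:", slope_cs)
print("difference slope:", slope_fd, "intercept:", intercept_fd)
```

In this toy setup the cross-sectional slope is positive even though the assumed effect of X is negative, while the differences regression recovers the assumed $\beta_1$ because $\beta_2 Z_i$ cancels.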


Example: Traffic deaths and beer taxes

1982 data (n = 48):
  $\widehat{FatalityRate} = 2.01 + 0.15\,BeerTax$
  standard errors: (.15) for the intercept, (.13) for the slope
1988 data (n = 48):
  $\widehat{FatalityRate} = 1.86 + 0.44\,BeerTax$
  standard errors: (.11) for the intercept, (.13) for the slope
Difference regression (n = 48):
  $\widehat{FR_{1988} - FR_{1982}} = -.072 - 1.04\,(BeerTax_{1988} - BeerTax_{1982})$
  standard errors: (.065) for the intercept, (.36) for the slope
An intercept is included in this estimated differences regression; it allows for the mean change in FR to be nonzero (more on this later).

FatalityRate v. BeerTax:

Note that the intercept is nearly zero




The Fixed Effects Model

Fixed Effects

Fixed-effects (FE) models explore the relationship between the independent variables and the dependent variable within an entity (country, state, institution, etc.).
Each entity (state) has its own individual characteristics that may or may not influence the dependent variable.
Why use FE? Because we believe that something within the entity (state) may bias the estimated relationship, and we need to control for this to get unbiased estimates. FE therefore removes the effect of those time-invariant characteristics, so we can assess the net effect of the independent variables.

Fixed Effects Regression (SW Section 10.3)

What if you have more than 2 time periods (T > 2)?
  $Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it},\ i = 1, \ldots, n,\ t = 1, \ldots, T$
We can rewrite this in two useful ways:
  1. n-1 binary regressor regression model
  2. Fixed Effects regression model
We first rewrite this in fixed effects form. Suppose we have n = 3 states: California, Texas, and Massachusetts.

$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it},\ i = 1, \ldots, n,\ t = 1, \ldots, T$

Population regression for California (that is, i = CA):
  $Y_{CA,t} = \beta_0 + \beta_1 X_{CA,t} + \beta_2 Z_{CA} + u_{CA,t}$
  $= (\beta_0 + \beta_2 Z_{CA}) + \beta_1 X_{CA,t} + u_{CA,t}$
or
  $Y_{CA,t} = \alpha_{CA} + \beta_1 X_{CA,t} + u_{CA,t}$
$\alpha_{CA} = \beta_0 + \beta_2 Z_{CA}$ doesn't change over time.
$\alpha_{CA}$ is the intercept for CA, and $\beta_1$ is the slope.
The intercept is unique to CA, but the slope is the same in all the states: parallel lines.

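For TX:
  $Y_{TX,t} = \beta_0 + \beta_1 X_{TX,t} + \beta_2 Z_{TX} + u_{TX,t}$
  $= (\beta_0 + \beta_2 Z_{TX}) + \beta_1 X_{TX,t} + u_{TX,t}$
or
  $Y_{TX,t} = \alpha_{TX} + \beta_1 X_{TX,t} + u_{TX,t}$, where $\alpha_{TX} = \beta_0 + \beta_2 Z_{TX}$

Collecting the lines for all three states:
  $Y_{CA,t} = \alpha_{CA} + \beta_1 X_{CA,t} + u_{CA,t}$
  $Y_{TX,t} = \alpha_{TX} + \beta_1 X_{TX,t} + u_{TX,t}$
  $Y_{MA,t} = \alpha_{MA} + \beta_1 X_{MA,t} + u_{MA,t}$
or
  $Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it},\ i = CA, TX, MA,\ t = 1, \ldots, T$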

The regression lines for each state in a picture

Recall that shifts in the intercept can be represented using binary regressors.

We now put this in binary regressor form:
  $Y_{it} = \beta_0 + \gamma_{CA} DCA_i + \gamma_{TX} DTX_i + \beta_1 X_{it} + u_{it}$
  $DCA_i$ = 1 if state is CA, = 0 otherwise
  $DTX_i$ = 1 if state is TX, = 0 otherwise
  Leave out $DMA_i$ (why?)

Why no dummy for Alabama?


Summary: Two ways to write the fixed effects model

1. n-1 binary regressor form: good for initial intuition
   $Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$
   where $D2_i$ = 1 for i = 2 (state #2), = 0 otherwise, etc.
2. Fixed effects form: what we really use in real life
   $Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
   $\alpha_i$ is called a state fixed effect or state effect: it is the constant (fixed) effect of being in state i.
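That the two forms give the same slope can be verified on a small example. The Python sketch below (an illustration only, not the Stata workflow of these slides) fits the binary regressor form by OLS via the normal equations and the fixed effects form by subtracting state means. The three-state toy panel and the helper names `solve` and `ols` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy panel: state -> (x values, y values) over T = 3 periods.
panel = {
    "CA": ([1.0, 2.0, 3.0], [5.1, 4.4, 3.9]),
    "TX": ([2.0, 3.0, 5.0], [7.0, 6.6, 5.1]),
    "MA": ([0.5, 1.0, 1.5], [2.9, 2.7, 2.2]),
}

# 1. Binary regressor form: constant, D_CA, D_TX, X (MA is the omitted state).
rows, y_all = [], []
for state, (x, y) in panel.items():
    for xi, yi in zip(x, y):
        rows.append([1.0, float(state == "CA"), float(state == "TX"), xi])
        y_all.append(yi)
b0, g_ca, g_tx, beta_dummy = ols(rows, y_all)

# 2. Fixed effects form: subtract each state's mean, then regress through the origin.
sxy = sxx = 0.0
for x, y in panel.values():
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx += sum((xi - mx) ** 2 for xi in x)
beta_within = sxy / sxx

# CA intercept from each form: beta_0 + gamma_CA versus ybar_CA - beta * xbar_CA.
x_ca, y_ca = panel["CA"]
alpha_ca_dummy = b0 + g_ca
alpha_ca_within = sum(y_ca) / 3 - beta_within * sum(x_ca) / 3

print(beta_dummy, beta_within)
print(alpha_ca_dummy, alpha_ca_within)
```

The two slopes agree up to floating-point error, and the CA intercept implied by the dummies matches the one recovered from the state means. This is why software can estimate the fixed effects form without creating n-1 dummy variables.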

Stata FE model: Method 1

xi: reg mrall beertax i.state
(i.state is how we create a dummy variable for each state)

i.state           _Istate_1-56        (naturally coded; _Istate_1 omitted)

      Source |       SS       df       MS              Number of obs =     336
-------------+------------------------------           F(48, 287)    =   56.97
       Model |  9.8570e-07    48  2.0535e-08           Prob > F      =  0.0000
    Residual |  1.0345e-07   287  3.6047e-10           R-squared     =  0.9050
-------------+------------------------------           Adj R-squared =  0.8891
       Total |  1.0892e-06   335  3.2512e-09           Root MSE      = 1.9e-05

------------------------------------------------------------------------------
       mrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.0000656   .0000188    -3.49   0.001    -.0001026   -.0000286
   _Istate_2 |  -.0000568   .0000267    -2.13   0.034    -.0001093   -4.29e-06
   _Istate_3 |  -.0000655   .0000219    -2.99   0.003    -.0001086   -.0000224
   _Istate_4 |  -.0001509   .0000304    -4.96   0.000    -.0002109    -.000091
           . |
  _Istate_49 |  -.0001759   .0000294    -5.98   0.000    -.0002338   -.0001181
  _Istate_50 |  -.0000229   .0000313    -0.73   0.466    -.0000844    .0000387
       _cons |   .0003478   .0000313    11.10   0.000     .0002861    .0004094
------------------------------------------------------------------------------

Stata Model FE: Method 2

First let Stata know you are working with panel data by defining the entity variable (state) and time variable (year):

. xtset state year;
       panel variable:  state (strongly balanced)
        time variable:  year, 1982 to 1988
                delta:  1 unit

Stata Model FE: Method 2

xtreg mrall beertax, fe

Fixed-effects (within) regression               Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0407                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0934                                        max =         7

                                                F(1,287)           =     12.19
corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0006

------------------------------------------------------------------------------
       mrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.0000656   .0000188    -3.49   0.001    -.0001026   -.0000286
       _cons |   .0002377   9.70e-06    24.51   0.000     .0002186    .0002568
-------------+----------------------------------------------------------------
     sigma_u |  .00007147
     sigma_e |  .00001899
         rho |  .93408484   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(47, 287) =    52.18             Prob > F = 0.0000

The panel data command xtreg with the option fe performs fixed effects regression.
The reported intercept is arbitrary, and the estimated individual effects are not reported in the default output.

Do they give different results? Which one should I pick?

Let's compare the results side by side in a table we create from scratch:
  xi: reg mrall beertax i.state
  estimates store xifeno
  xtreg mrall beertax, fe
  estimates store xtfeno
  estimates table xifeno xtfeno, b(%7.4f) se(%7.4f) t(%7.4f) stats(N r2_a)

estimates table xifeno xtfeno, b(%7.4f) se(%7.4f) t(%7.4f) stats(N r2_a)

----------------------------------
    Variable |  xifeno    xtfeno
-------------+--------------------
     beertax |  -0.0001   -0.0001
             |   0.0000    0.0000
             |  -3.4915   -3.4915
   _Istate_2 |  -0.0001
             |   0.0000
             |  -2.1290
         ... |
  _Istate_48 |  -0.0001
             |   0.0000
             |  -3.6363
  _Istate_49 |  -0.0002
             |   0.0000
         ... |

Which AdjR2 is correct then?
The one from the xi: reg ... i.state command.
Remember: we always get the correct adjR2 by running the xi command, NOT xtreg!


Potential Issues:
1. Heteroskedasticity across entities i
   What is it?
   Why is it a problem?
   How do we detect it?
   How do we fix it?
2. Serial correlation across years within an entity
   What is it?
   Why is it a problem?
   How do we detect it?
   How do we fix it?

Issue 1: Heteroskedasticity

1. How do we identify if we have it? Use the xttest3 command:
     xtreg (your regression), fe
     xttest3
   Modified Wald test for groupwise heteroskedasticity in fixed effect regression model
     H0: sigma(i)^2 = sigma^2 for all i
     chi2 (48)  =  4826.21
     Prob>chi2  =    0.0000
   What does this tell us about our null (reject / cannot reject)? Do we have heteroskedasticity?
2. How to correct for it?
     xtreg (your regression), fe robust

Issue 2: Serial Correlation

What if, for an entity i, the errors are correlated across time?

How do we identify it?
  Run xtserial (your regression), output
  Ho: no serial correlation in the idiosyncratic errors
  If we reject the null, we have serial correlation in the idiosyncratic errors.

How to correct for it?
  xtregar (your regression), fe
  This fits the FE model with AR(1) disturbances.

What we know so far:

If we have HETEROSKEDASTICITY ONLY
  We find it using the xttest3 command
  We correct it via xtreg ..., fe robust
If we have SERIAL CORRELATION ONLY
  We find it via the xtserial test
  We correct it via the xtregar ..., fe command
But what if we have both?

Valid Standard Errors

HAC (heteroskedasticity- and autocorrelation-consistent) standard errors:
  SEs that are valid even if the error term is heteroskedastic and serially correlated within an entity.
One type of HAC standard errors:
  Clustered standard errors: allow for heteroskedasticity and for arbitrary autocorrelation within an entity, but assume errors are uncorrelated across entities.

Clustered Standard Errors

Clustered standard errors estimate the variance of $\hat{\beta}_1$ when the errors are:
  i.i.d. across entities
  but potentially autocorrelated within an entity.
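To see why clustering matters, the Python sketch below (an illustration only; the slides use Stata) computes a heteroskedasticity-robust and a clustered standard error for the entity-demeaned slope with one regressor. It uses the basic sandwich formulas without the finite-sample adjustments that Stata applies, so its numbers would not match vce(cluster) output exactly. The toy panel, with errors that keep the same sign for two periods in a row, and all function names are assumptions made for this example.

```python
import math


def within_fit(panel):
    """Entity-demeaned OLS with one regressor.

    Returns the slope, the sum of squared demeaned x, and for each entity the
    list of products (demeaned x) * (residual), one per period.
    """
    demeaned = {}
    for entity, (x, y) in panel.items():
        mx, my = sum(x) / len(x), sum(y) / len(y)
        demeaned[entity] = [(a - mx, b - my) for a, b in zip(x, y)]
    sxx = sum(xd * xd for obs in demeaned.values() for xd, _ in obs)
    sxy = sum(xd * yd for obs in demeaned.values() for xd, yd in obs)
    beta = sxy / sxx
    scores = {e: [xd * (yd - beta * xd) for xd, yd in obs]
              for e, obs in demeaned.items()}
    return beta, sxx, scores


def robust_se(sxx, scores):
    """Heteroskedasticity-robust SE: treats every observation as independent."""
    return math.sqrt(sum(s * s for obs in scores.values() for s in obs)) / sxx


def clustered_se(sxx, scores):
    """Clustered SE: sums the products within each entity before squaring."""
    return math.sqrt(sum(sum(obs) ** 2 for obs in scores.values())) / sxx


# Hypothetical panel: 4 states, T = 4, assumed slope -0.5, persistent errors.
beta_true = -0.5
x = [1.0, 2.0, 3.0, 4.0]
persistent = [-1.0, -1.0, 1.0, 1.0]
panel = {}
for k, (alpha, sign) in enumerate([(2.0, 1), (3.0, -1), (1.0, 1), (4.0, -1)]):
    u = [sign * v for v in persistent]
    y = [alpha + beta_true * xi + ui for xi, ui in zip(x, u)]
    panel[f"state{k}"] = (x, y)

beta_hat, sxx, scores = within_fit(panel)
se_robust = robust_se(sxx, scores)
se_cluster = clustered_se(sxx, scores)
print(beta_hat, se_robust, se_cluster)
```

With serially correlated errors the products within a state tend to share a sign, so summing them before squaring gives a larger variance estimate. In this toy panel the clustered SE is noticeably larger than the robust SE, which ignores the within-state correlation.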


Clustered SEs: Implementation in Stata

. xtreg vfrall beertax, fe vce(cluster state)

Fixed-effects (within) regression               Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0407                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0934                                        max =         7

                                                F(1,47)            =      5.05
corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0294

                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6558736   .2918556    -2.25   0.029    -1.243011   -.0687358
       _cons |   2.377075   .1497966    15.87   0.000     2.075723    2.678427
------------------------------------------------------------------------------

vce(cluster state) says to use clustered standard errors, where the clustering is at the state level (observations that have the same value of the variable state are allowed to be correlated, but are assumed to be uncorrelated if the value of state differs).

Clustered SEs: Implementation in Stata (cont.)

. xtreg vfrall beertax, fe vce(cluster state)
(same output as on the previous slide)

Is the R2 correct?
NO. Remember that the correct adj R2 can only be obtained by running the xi command:
  xi: reg vfrall beertax i.state

What if instead the residuals are correlated across groups, NOT within entity?

Clustered standard errors won't work, since they allow for heteroskedasticity and for arbitrary autocorrelation within an entity, but assume errors are uncorrelated across entities.

We use our regular approach: identify whether the problem exists and then correct it.
  1. How do we identify if our residuals are indeed correlated across groups?
  2. How do we fix it?

Issue 3: What if the residuals are correlated across groups?

1. How do we identify if our residuals are indeed correlated across groups?
   Pesaran CD (cross-sectional dependence) test
   Ho: residuals are not correlated across entities.
     xtreg (your regression), fe
     xtcsd, pesaran abs
2. How to correct for it?
   If we reject the null (Prob < 0.05), use Driscoll and Kraay standard errors:
     xtscc (your regression), fe

Summary of issues and solutions

If we have HETEROSKEDASTICITY ONLY
  We find it using the xttest3 test
  We correct it via the robust option
If we have SERIAL CORRELATION ONLY
  We find it via the xtserial test
  We correct it via the xtregar ..., fe command
If we have both HETEROSKEDASTICITY & SERIAL CORRELATION
  We use xtreg ..., fe vce(cluster id)
____________________________________
If we have ERRORS correlated ACROSS entities
  We find it using the Pesaran test
  We correct it via the xtscc command


Regression with Time Fixed Effects (SW Section 10.4)

An omitted variable might vary over time but not across states:
  Safer cars (air bags, etc.); changes in national laws
  These produce intercepts that change over time
Let $S_t$ denote the combined effect of variables that change over time but not across states (safer cars).
The resulting population regression model is:
  $Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$

Estimation with both entity and time fixed effects

  $Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it}$
Which are the entity fixed effects?
Which are the time fixed effects?
Please explain the subscripts.
Is it an error that there is no $\beta_0$? Why don't we have one?
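One way to see what the entity and time effects absorb is to remove them by hand. The Python sketch below (an illustration only; the slides estimate this model in Stata with year dummies) applies two-way demeaning to a balanced, noise-free toy panel: subtract the entity mean and the period mean, then add back the grand mean. The numbers and function names are hypothetical, and this simple formula applies only to balanced panels.

```python
def demean_entity(m):
    """Subtract each entity's (row's) time average."""
    return [[v - sum(r) / len(r) for v in r] for r in m]


def demean_two_way(m):
    """Subtract entity and period means and add back the grand mean (balanced panel)."""
    n, T = len(m), len(m[0])
    row = [sum(r) / T for r in m]
    col = [sum(m[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(row) / n
    return [[m[i][t] - row[i] - col[t] + grand for t in range(T)] for i in range(n)]


def slope(xd, yd):
    """OLS slope through the origin on already-demeaned data."""
    sxy = sum(a * b for rx, ry in zip(xd, yd) for a, b in zip(rx, ry))
    sxx = sum(a * a for rx in xd for a in rx)
    return sxy / sxx


# Assumed model: Y_it = alpha_i + lambda_t + 0.5 * X_it, with no error term.
beta = 0.5
alpha = [1.0, 5.0, -2.0]   # entity effects
lam = [0.0, -1.0, -3.0]    # time effects, falling while X tends to rise
x = [[1.0, 2.0, 4.0], [2.0, 2.0, 5.0], [0.0, 3.0, 3.0]]
y = [[alpha[i] + lam[t] + beta * x[i][t] for t in range(3)] for i in range(3)]

beta_entity_only = slope(demean_entity(x), demean_entity(y))
beta_two_way = slope(demean_two_way(x), demean_two_way(y))
print(beta_entity_only, beta_two_way)
```

In this toy panel, entity demeaning alone leaves the time effects in the data and gives a slope far from the assumed 0.5, while two-way demeaning recovers it. The overall level is absorbed by the effects, so no separate constant is identified.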


First generate all the time binary variables:

. gen y83=(year==1983);
. gen y84=(year==1984);
. gen y85=(year==1985);
. gen y86=(year==1986);
. gen y87=(year==1987);
. gen y88=(year==1988);
. global yeardum "y83 y84 y85 y86 y87 y88";
. xtreg vfrall beertax $yeardum, fe vce(cluster state);

Fixed-effects (within) regression               Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0803                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0876                                        max =         7

corr(u_i, Xb)  = -0.6781                        Prob > F           =    0.0009

                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6399799   .3570783    -1.79   0.080    -1.358329    .0783691
         y83 |  -.0799029   .0350861    -2.28   0.027    -.1504869   -.0093188
         y84 |  -.0724206   .0438809    -1.65   0.106    -.1606975    .0158564
         y85 |  -.1239763   .0460559    -2.69   0.010    -.2166288   -.0313238
         y86 |  -.0378645   .0570604    -0.66   0.510    -.1526552    .0769262
         y87 |  -.0509021   .0636084    -0.80   0.428    -.1788656    .0770615
         y88 |  -.0518038   .0644023    -0.80   0.425    -.1813645    .0777568
       _cons |    2.42847   .2016885    12.04   0.000     2.022725    2.834215
-------------+----------------------------------------------------------------

Are the time effects jointly statistically significant?

First method:

. test $yeardum;
 ( 1)  y83 = 0
 ( 2)  y84 = 0
 ( 3)  y85 = 0
 ( 4)  y86 = 0
 ( 5)  y87 = 0
 ( 6)  y88 = 0

       F(  6,    47) =    4.22
            Prob > F =    0.0018

Yes: the p-value is below 0.05, so we reject the null that all six year coefficients are zero.

Do we need Time Fixed Effects?

Second method: a joint test of whether the dummies for all years are equal to 0. If they are, then no time fixed effects are needed.
  testparm _Iyear*
(This assumes the year dummies were created with the xi prefix and i.year, which names them _Iyear_*.)

Reject the null that all time coefficients are equal to zero, so we do need the time fixed effects.

More in class practice and examples

1. The data set originally comes with one column listing the drinking age for each state. How do you create the variables Drinking Age 18, Drinking Age 19, and Drinking Age 20 for models (4) and (5)?
2. Why would you create them in the first place?

More in class practice and examples

3. What is income in real values? How do we obtain those relative to nominal values?
4. Why is real income per capita in log terms?

Under the LS assumptions for panel data:

The OLS fixed effects estimator $\hat{\beta}_1$ is unbiased, consistent, and asymptotically normally distributed.
However, the usual OLS standard errors (both homoskedasticity-only and heteroskedasticity-robust) will in general be wrong because they assume that $u_{it}$ is serially uncorrelated.
  In practice, the OLS standard errors often understate the true sampling uncertainty: if $u_{it}$ is correlated over time, you don't have as much information (as much random variation) as you would if $u_{it}$ were uncorrelated.
  This problem is solved by using clustered standard errors.

More in class practice and examples

1. What is the difference between each of the models?
2. Please interpret in words each of the coefficients in column (4).
3. How would you test if Time Effects are needed?
4. How can you tell this is a fixed effects model?


The Random Effects Model

The Random Effects Model

Recall that the fixed effects model is based on the assumption that each cross-sectional unit has its own intercept.
The random effects model instead is based on the assumption that the intercept for each cross-sectional unit is drawn from a distribution (that is centered around a mean intercept).
Thus each intercept is a random draw from an intercept distribution and therefore is independent of the error term for any particular observation.
Hence the term random effects model.

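This was fixed effects, from a few slides ago.

Random effects does not mean different slopes across states: the slope is still common to all states, and it is the state intercepts that are treated as random draws from a distribution. (The estimated common slope can differ from the FE estimate.)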

The Random Effects Model (cont.)

Advantages of the random effects model:
1. More degrees of freedom than a fixed effects model.
   This is because rather than estimating an intercept for virtually every cross-sectional unit, all we need to do is estimate the parameters that describe the distribution of the intercepts.
2. We can now also estimate the effects of time-invariant explanatory variables (like race or gender).
Disadvantages of the random effects model:
1. Most importantly, the random effects estimator requires us to assume that $a_i$ (the entity effect, written $\alpha_i$ in the fixed effects slides) is uncorrelated with the independent variables, the Xs, if we're going to avoid omitted variable bias.
   This may be an overly strong assumption in many cases.

Random Effects

Random effects assume that the group error term is not correlated with the independent variables, which allows time-invariant variables to play a role as explanatory variables: you can include time-invariant variables (e.g., gender). In the fixed effects model these variables are absorbed by the intercept.
However, you need to specify those individual characteristics that may or may not influence the independent variables. Problem: some variables may not be available (e.g., culture), leading to omitted variable bias.

Remember our FE model results

xtreg vfrall beertax, fe vce(cluster state)

Fixed-effects (within) regression               Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0407                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0934                                        max =         7

                                                F(1,47)            =      5.05
corr(u_i, Xb)  = -0.6885                        Prob > F           =    0.0294

                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6558736   .2918556    -2.25   0.029    -1.243011   -.0687358
       _cons |   2.377075   .1497966    15.87   0.000     2.075723    2.678427
------------------------------------------------------------------------------

Running our RE model results

xtreg vfrall beertax, re vce(cluster state)

Random-effects GLS regression                   Number of obs      =       336
Group variable: state                           Number of groups   =        48

R-sq:  within  = 0.0407                         Obs per group: min =         7
       between = 0.1101                                        avg =       7.0
       overall = 0.0934                                        max =         7

                                                Wald chi2(1)       =      0.22
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.6373

                                 (Std. Err. adjusted for 48 clusters in state)
------------------------------------------------------------------------------
             |               Robust
      vfrall |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.0520158   .1103327    -0.47   0.637    -.2682638    .1642323
       _cons |   2.067141   .1212281    17.05   0.000     1.829539    2.304744
-------------+----------------------------------------------------------------

Are the coefficients different from the FE model? Why?
Can you still add in time fixed effects? SURE!


Random Effects: Potential Issues

Everything we learned under fixed effects still applies (quick review within your group):
1. Heteroskedasticity:
   a) what is it,
   b) how do we check for it,
   c) how do we correct for it
2. Serial Correlation:
   a) what is it,
   b) how do we check for it,
   c) how do we correct for it
3. Both Heteroskedasticity & Serial Correlation:
   a) how do we correct for it
4. Cross-Sectional Dependence:
   a) what is it,
   b) how do we check for it,
   c) how do we correct for it

Random Effects: TIME Fixed Effects

Everything we learned under fixed effects still applies:
1. Can we still use Time Fixed Effects?
2. How would you introduce them into the regression? Same as under Fixed Effects.
   First generate all the time binary variables:
   . gen y83=(year==1983);
   . gen y84=(year==1984);
   . gen y85=(year==1985);
   . gen y86=(year==1986);
   . gen y87=(year==1987);
   . gen y88=(year==1988);
   . global yeardum "y83 y84 y85 y86 y87 y88";
   . xtreg vfrall beertax $yeardum, re vce(cluster state);
3. How would you check if Time FE are needed? Same as under Fixed Effects.
   test $yeardum


Choosing Between Fixed and Random Effects

One key is the nature of the relationship between $a_i$ and the Xs:
  If they're likely to be correlated, then it makes sense to use the fixed effects model.
  If not, then it makes sense to use the random effects model.
Can also use the Hausman test to examine whether there is correlation between $a_i$ and X.
  Essentially, this procedure tests whether the regression coefficients under the fixed effects and random effects models are statistically different from each other.
  If they are different, then the fixed effects model is preferred.
  If they are not different, then the random effects model is preferred (or estimates of both the fixed effects and random effects models are provided).

Choosing FE vs RE: Method 1: xtoverid

FE: indep vars are uncorrelated with the idiosyncratic error (orthogonality conditions), but could be correlated with the group error.
RE: additionally, indep vars are uncorrelated with the group-specific error (orthogonality conditions).
These additional orthogonality conditions are overidentifying restrictions.
xtoverid
  Ho: indep vars are uncorrelated with the group-specific error (the extra RE orthogonality conditions)
  P-value < 0.05: reject the null -> indep vars are correlated with the group-specific error

Choosing FE vs RE: Method 1: xtoverid (cont.)

FE:
  xtreg (your regression), fe vce(cluster i)    (runs the FE model)
RE:
  xtreg (your regression), re vce(cluster i)    (runs the RE model)
  xtoverid
Ho: indep vars are uncorrelated with the group-specific error (the extra RE orthogonality conditions)
P-value < 0.05: reject the null -> indep vars are correlated with the group-specific error

Choosing FE vs RE: Method 2: hausman

FE:
  xtreg (your regression), fe    (runs the FE model; no vce(cluster) option here, because hausman cannot be used with clustered or robust estimates)
  estimates store fe
RE:
  xtreg (your regression), re    (runs the RE model)
  estimates store re
hausman fe re, sigmaless

Ho: same as before (RE is preferred to FE)
P-value < 0.05: reject the null -> indep vars are correlated with the group-specific error

More in class practice and examples

What is wrong with this Hausman test?
  xtreg vfrall beertax, fe vce(cluster state)
  estimates store FE
  xtreg vfrall beertax, re vce(cluster state)
  estimates store RE
  hausman FE RE, sigmaless
STATA RESULT:
  ERROR: hausman cannot be used with vce(robust), vce(cluster cvar), or p-weighted data
Correct (drop the vce(cluster state) option):
  xtreg vfrall beertax, fe
  estimates store FE
  xtreg vfrall beertax, re
  estimates store RE
  hausman FE RE, sigmaless

More in class practice and examples

We run the correct Hausman test and get the following result. Should we choose FE or RE?

hausman FE RE, sigmaless

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       FE           RE         Difference          S.E.
-------------+----------------------------------------------------------------
     beertax |   -.6558736    -.0520158       -.6038579        .1435348
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       17.70
                Prob>chi2 =      0.0000
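With a single coefficient, the Hausman statistic in the output above reduces to the squared ratio of the coefficient difference to its reported standard error. The short Python check below (an illustration only; Stata computes this itself) redoes that arithmetic from the three numbers printed in the table and obtains the chi2(1) tail probability from the normal distribution. The variable names are hypothetical.

```python
import math

# Values copied from the hausman output above.
b_fe = -0.6558736      # (b)  FE coefficient on beertax
b_re = -0.0520158      # (B)  RE coefficient on beertax
se_diff = 0.1435348    # sqrt(diag(V_b - V_B))

# With one coefficient, (b-B)'[(V_b-V_B)^(-1)](b-B) is a squared ratio.
hausman = ((b_fe - b_re) / se_diff) ** 2

# Upper tail of chi2(1): P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
p_value = math.erfc(math.sqrt(hausman / 2))

print(f"chi2(1) = {hausman:.2f}, p = {p_value:.4f}")
# prints: chi2(1) = 17.70, p = 0.0000
```

The result reproduces the reported chi2(1) and Prob>chi2, which is a quick way to confirm how the four columns of the table relate to the test statistic.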
