
DEEQA, Ecole Doctorale MPSE

Academic year 2003-2004

Advanced Econometrics
Panel data econometrics
and GMM estimation

Alban Thomas
MF 102, thomas@toulouse.inra.fr

Purpose of the course

- Present recent developments in econometrics that allow for a consistent treatment of the impact of unobserved heterogeneity on model predictions: Panel data analysis.
- Present a convenient econometric framework for dealing with restrictions imposed by theory: Method of Moments estimation.
- Deal with discrete-choice models with unobserved heterogeneity.

Two keywords: unobserved heterogeneity and endogeneity.

Methods:
- Fixed Effects Least Squares
- Generalized Least Squares
- Instrumental Variables
- Maximum Likelihood estimation for Panel Data models
- Generalized Method of Moments for Time Series
- Generalized Method of Moments for Panel Data
- Heteroskedasticity-consistent estimation
- Dynamic Panel Data models
- Logit and Probit models for Panel Data
- Simulation-based inference
- Nonparametric and Semiparametric estimation

Statistical software: SAS, GAUSS, STATA (?)

Contents

Part I. Panel Data Models

1 Introduction
  1.1 Gains in pooling cross section and time series
    1.1.1 Discrimination between alternative models
    1.1.2 Examples
    1.1.3 Less collinearity between explanatory variables
    1.1.4 May reduce bias due to missing or unobserved variables
  1.2 Analysis of variance
  1.3 Some definitions
2 The linear model
  2.1 Notation
    2.1.1 Model notation
    2.1.2 Standard matrices and operators
    2.1.3 Important properties of operators
  2.2 The One-Way Fixed Effects model
    2.2.1 The estimator in terms of the Frisch-Waugh-Lovell theorem
    2.2.2 Interpretation as a covariance estimator
    2.2.3 Comments
    2.2.4 Testing for poolability and individual effects
  2.3 The Random Effects model
    2.3.1 Notation and assumptions
    2.3.2 GLS estimation of the Random-effect model
    2.3.3 Comparison between GLS, OLS and Within
    2.3.4 Fixed individual effects or error components?
    2.3.5 Example: Wage equation, Hausman (1978)
    2.3.6 Best Quadratic Unbiased Estimators (BQU) of variances
3 Extensions
  3.1 The Two-way panel data model
    3.1.1 The Two-way fixed-effect model
    3.1.2 Example: Production function (Hoch 1962)
  3.2 More on non-spherical disturbances
    3.2.1 Heteroskedasticity in individual effect
    3.2.2 "Typical" heteroskedasticity
  3.3 Unbalanced panel data models
    3.3.1 Introduction
    3.3.2 Fixed effect models for unbalanced panels
4 Augmented panel data models
  4.1 Introduction
  4.2 Choice between Within and GLS
  4.3 An important test for endogeneity
  4.4 Instrumental Variable estimation: Hausman-Taylor GLS estimator
    4.4.1 Instrumental Variable estimation
    4.4.2 IV in a panel-data context
    4.4.3 Exogeneity assumptions and a first instrument matrix
    4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt
  4.5 Computation of variance-covariance matrix for IV estimators
    4.5.1 Full IV-GLS estimation procedure
  4.6 Example: Wage equation
    4.6.1 Model specification
  4.7 Application: returns to education
    4.7.1 Variables related to job status
    4.7.2 Variables related to characteristics of households heads
5 Dynamic panel data models
  5.1 Motivation
    5.1.1 Dynamic formulations from dynamic programming problems
    5.1.2 Euler equations and consumption
    5.1.3 Long-run relationships in economics
  5.2 The dynamic fixed-effect model
    5.2.1 Bias in the Fixed-Effects estimator
    5.2.2 Instrumental-variable estimation
  5.3 The Random-effects model
    5.3.1 Bias in the ML estimator
    5.3.2 An equivalent representation
    5.3.3 The role of initial conditions
    5.3.4 Possible inconsistency of GLS
    5.3.5 Example: The Balestra-Nerlove study

Part II. Generalized Method of Moments estimation

6 The GMM estimator
  6.1 Moment conditions and the method of moments
    6.1.1 Moment conditions
    6.1.2 Example: Linear regression model
    6.1.3 Example: Gamma distribution
    6.1.4 Method of moments estimation
    6.1.5 Example: Poisson counting model
    6.1.6 Comments
  6.2 The Generalized Method of Moments (GMM)
    6.2.1 Introduction
    6.2.2 Example: Just-identified IV model
    6.2.3 A definition
    6.2.4 Example: The IV estimator again
  6.3 Asymptotic properties of the GMM estimator
    6.3.1 Consistency
    6.3.2 Asymptotic normality
  6.4 Optimal and two-step GMM
  6.5 Inference with GMM
  6.6 Extension: optimal instruments for GMM
    6.6.1 Conditional moment restrictions
    6.6.2 A first feasible estimator
    6.6.3 Nearest-neighbor estimation of optimal instruments
    6.6.4 Generalizing the approach: other nonparametric estimators
7 GMM estimators for time series models
  7.1 GMM and Euler equation models
    7.1.1 Hansen and Singleton framework
    7.1.2 GMM estimation
  7.2 GMM Estimation of MA models
    7.2.1 A simple estimator
    7.2.2 A more efficient estimator
    7.2.3 Example: The Durbin estimator
  7.3 GMM Estimation of ARMA models
    7.3.1 The ARMA(1,1) model
    7.3.2 IV estimation
  7.4 Covariance matrix estimation
    7.4.1 Example 1: Conditional homoskedasticity
    7.4.2 Example 2: Conditional heteroskedasticity
    7.4.3 Example 3: Covariance stationary process
    7.4.4 The Newey-West estimator
    7.4.5 Weighted autocovariance estimators
    7.4.6 Weighted periodogram estimators
8 GMM estimators for dynamic panel data
  8.1 Introduction
  8.2 The Arellano-Bond estimator
    8.2.1 Model assumptions
    8.2.2 Implementation of the GMM estimator
  8.3 More efficient procedures (Ahn-Schmidt)
    8.3.1 Additional assumptions
  8.4 The Blundell-Bond estimator
  8.5 Dynamic models with Multiplicative effects
    8.5.1 Multiplicative individual effects
    8.5.2 Mixed structure
  8.6 Example: Wage equation

Part III. Discrete choice models

9 Nonlinear panel data models
  9.1 Brief review of binary discrete-choice models
    9.1.1 Linear Probability model
    9.1.2 Logit model
    9.1.3 Probit model
  9.2 Logit models for panel data
    9.2.1 Sufficient statistics
    9.2.2 Conditional probabilities
    9.2.3 Example: T = 2
  9.3 Probit models
  9.4 Semiparametric estimation of discrete-choice models
    9.4.1 The binary choice model
    9.4.2 The IV estimator
  9.5 SML estimation of selection models
    9.5.1 The GHK simulator
    9.5.2 Example

Appendix 1. Maximum-Likelihood estimation of the Random-effect model
Appendix 2. The two-way random effects model
Appendix 3. The one-way unbalanced random effects model
Appendix 4. ML estimation of dynamic panel models
Appendix 5. GMM estimation of static panel models
Appendix 6. A framework for simulation-based inference
Appendix 7. Example: the SAS software
Appendix 8. A crash course in Gauss
Appendix 9. Example: The Gauss software
Appendix 10. IV and GMM estimation with Gauss
Appendix 11. DPD estimation with Gauss
References

Part I
Panel Data Models


Chapter 1
Introduction
Panel data: sequential observations on a number of units (individuals, firms). Also called cross-sections over time, longitudinal data, or pooled cross-section time-series data.

1.1 Gains in pooling cross section and time series


1.1.1 Discrimination between alternative models

Many economic models take the form

$$F(Y, X, Z; \theta) = 0,$$

where
- $Y$: individual control variables (workers, firms);
- $X$: (public policy or principal's) variables;
- $Z$: (fixed) individual attributes;
- $\theta$: parameters.

Linear model:

$$Y = \beta_0 + \beta_x X + \beta_z Z + u.$$

Alternative views concerning this model:
- Policy variables have a significant impact whatever the individual characteristics, or
- Differences across individuals are due to idiosyncratic individual features not included in $Z$.

In practice, observed differences across individuals may be due both to inter-individual differences and to the impact of policy variables.

1.1.2 Examples

a) $\text{WAGE} = \beta_0 + \beta_1\,\text{EDUCATION} + \beta_2 Z$.
- People with a higher education level have higher wages because firms value those people more;
- People have more education because they have higher ability (expected productivity) anyway, and firms value worker ability more.

b) $\text{SALES} = \beta_0 + \beta_1\,\text{ADVERTISEMENT} + \beta_2 Z$.
- Advertisement expenditures boost sales;
- More efficient firms enjoy more sales, and thus have more money for advertisement expenditures.

c) $\text{OUTPUT} = \beta_0 + \beta_1\,\text{REGULATION} + \beta_2 Z$.
- Regulatory control affects firm output;
- Firms with higher output are more regulated on average.

d) $\text{WAGE} = \beta_0 + \beta_1\,\mathbb{1}(\text{UNION}) + \beta_2 Z$.
- Belonging to a union significantly raises wages;
- Firms react to higher wages imposed by unions by hiring higher-quality workers, and $\mathbb{1}(\text{UNION})$ is a proxy for worker quality.

1.1.3 Less collinearity between explanatory variables

In consumer or production economics, input, output or consumer prices are difficult to use, because:
- Time series: aggregated macro price indexes are highly correlated;
- Cross-sections: there is not enough price variation across individuals or firms.

With panel data, variations both across individuals and across time periods are accounted for. By contrast:
- Time series: no information on the impact of individual characteristics (socioeconomic variables, ...);
- Cross-sections: no information on adjustment dynamics. Estimates may reflect inter-individual differences inherent in comparisons of different people or firms.

1.1.4 May reduce bias due to missing or unobserved variables

With panel data, it is easy to control for unobserved heterogeneity across individuals. This is critical in practice, and explains why panel data models are now so popular in micro- and macro-econometrics. This point is related to endogeneity and omitted-variable issues.


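Example: Output supply function under perfect competition

$$\max_Q \; \pi = pQ - C(\theta, Q), \quad \text{where } C(\theta, Q) = \theta\, c(Q).$$

The first-order condition gives

$$p = \theta\,\frac{\partial c(Q)}{\partial Q} = \begin{cases} \theta A Q^{\beta - 1} & \text{(Cobb-Douglas)},\\ \theta(\alpha_0 + \alpha_1 Q) & \text{(Quadratic)}. \end{cases}$$

Cobb-Douglas case: $\log Q = \frac{1}{\beta - 1}(\log p - \log\theta - \log A)$. From the equilibrium condition to an estimable equation: observations $(Q_{it}, p_{it})$, unobserved heterogeneity $\theta_i$, firm $i$, period $t$:

$$\log Q_{it} = \frac{1}{\beta - 1}(\log p_{it} - \log\theta_i - \log A).$$

Identification issue: the estimable equation is

$$\tilde Q_{it} = a_0 + a_1\tilde p_{it} + u_{it}, \quad i = 1, 2, \dots, N, \; t = 1, 2, \dots, T,$$

where $\tilde Q_{it} = \log Q_{it}$, $\tilde p_{it} = \log p_{it}$, $a_1 = 1/(\beta - 1)$, $a_0 = -(\log A + E\log\theta_i)/(\beta - 1)$, $u_{it} = -(\log\theta_i - E\log\theta_i)/(\beta - 1)$, and $E u_{it} = 0$.

The model (in particular $A$) is identified if $E\log\theta_i = 0$, a normalization of the efficiency term (note that, by Jensen's inequality, this is not the same as $E\theta_i = 1$). Otherwise, if $\theta_i$ is overlooked and $E\log\theta_i \neq 0$, the estimate of $A$ is biased.

Empirical issue: possible correlation between the output price $p_{it}$ and the efficiency term $\theta_i$.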
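The empirical issue can be illustrated with a small simulation. This is a hypothetical sketch, not part of the original notes. It assumes the Cobb-Douglas supply equation above with $\beta = 3$ (so $a_1 = 0.5$) and $\log\theta_i \sim N(0, 0.25)$. It also assumes that log prices are correlated with $\log\theta_i$ through an arbitrary coefficient `rho`; the names `simulate_supply_panel`, `pooled_slope` and `within_slope` are illustrative. Pooled OLS is biased because the price is correlated with the omitted efficiency term. Subtracting firm means removes $\theta_i$, and this is the fixed-effect idea developed in Chapter 2.

```python
import numpy as np


def simulate_supply_panel(N=500, T=5, beta=3.0, A=1.0, rho=0.8, seed=0):
    """Simulate log Q_it = (log p_it - log theta_i - log A) / (beta - 1).

    Hypothetical design: log p_it = rho * log theta_i + noise, so the price
    is correlated with the unobserved efficiency term theta_i.
    """
    rng = np.random.default_rng(seed)
    log_theta = rng.normal(0.0, 0.5, size=N)                  # E log theta_i = 0
    log_p = rho * log_theta[:, None] + rng.normal(0.0, 0.3, size=(N, T))
    log_q = (log_p - log_theta[:, None] - np.log(A)) / (beta - 1.0)
    log_q = log_q + rng.normal(0.0, 0.05, size=(N, T))        # idiosyncratic noise
    return log_p, log_q


def pooled_slope(x, y):
    """OLS slope with a single common intercept (pooled regression)."""
    xd, yd = x - x.mean(), y - y.mean()
    return (xd * yd).sum() / (xd ** 2).sum()


def within_slope(x, y):
    """OLS slope after subtracting firm-specific means (Within)."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd ** 2).sum()


log_p, log_q = simulate_supply_panel()
print("true a1    :", 1.0 / (3.0 - 1.0))
print("pooled OLS :", round(pooled_slope(log_p, log_q), 3))
print("within     :", round(within_slope(log_p, log_q), 3))
```

Under this assumed design, $cov(\log p, \log\theta) = 0.2$ and $var(\log p) = 0.25$. The pooled slope therefore converges to $0.5 \times (1 - 0.2/0.25) = 0.1$ instead of $a_1 = 0.5$, whereas the Within slope converges to $0.5$.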

1.2 Analysis of variance


Consider the model

$$y_{it} = \alpha_i + x_{it}\beta_i + \varepsilon_{it}, \quad i = 1, 2, \dots, N, \; t = 1, 2, \dots, T_i,$$

where $x_{it}$ is scalar, $\alpha_i$ and $\beta_i$ are parameters, and $T_i$ is the number of time periods available for individual $i$.


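Useful empirical moments are

$$\bar y_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}, \qquad \bar x_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it},$$

$$S_{xxi} = \sum_{t=1}^{T_i}(x_{it} - \bar x_i)^2, \qquad S_{yyi} = \sum_{t=1}^{T_i}(y_{it} - \bar y_i)^2, \qquad S_{xyi} = \sum_{t=1}^{T_i}(x_{it} - \bar x_i)(y_{it} - \bar y_i),$$

for $i = 1, 2, \dots, N$.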

Least-squares parameter estimates are computed as

$$\hat\beta_i = S_{xyi}/S_{xxi} \quad\text{and}\quad \hat\alpha_i = \bar y_i - \bar x_i\hat\beta_i,$$

and the Residual Sum of Squares (RSS) for individual $i$ is

$$RSS_i = S_{yyi} - S_{xyi}^2/S_{xxi},$$

with $(T_i - 2)$ degrees of freedom.

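Consider now a restricted model with constant slopes and constant intercepts:

$$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it},$$

which obtains by imposing the following restrictions:

$$\alpha_1 = \alpha_2 = \cdots = \alpha_N \; (= \alpha), \qquad \beta_1 = \beta_2 = \cdots = \beta_N \; (= \beta).$$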

Under these restrictions, least-squares parameter estimates would be

$$\hat\beta = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it} - \bar x)(y_{it} - \bar y)}{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it} - \bar x)^2}$$


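and $\hat\alpha = \bar y - \bar x\hat\beta$, where

$$\bar y = \frac{1}{\sum_i T_i}\sum_{i=1}^{N}\sum_{t=1}^{T_i} y_{it}, \qquad \bar x = \frac{1}{\sum_i T_i}\sum_{i=1}^{N}\sum_{t=1}^{T_i} x_{it}.$$

The Residual Sum of Squares is

$$RSS = \sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar y)^2 - \frac{\left[\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it} - \bar y)(x_{it} - \bar x)\right]^2}{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it} - \bar x)^2},$$

with $\sum_{i=1}^{N} T_i - 2$ degrees of freedom.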

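For most applications, the first model is too general: estimating a separate slope for every individual would require a large number of time observations per individual. If unobserved heterogeneity is additive in the model, we might consider the following specification with a constant slope and different intercepts:

$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}.$$

Minimizing $\sum_i\sum_t(y_{it} - \alpha_i - x_{it}\beta)^2$ with respect to $\alpha_i$ and $\beta$, we have

$$\sum_t(y_{it} - \alpha_i - x_{it}\beta) = 0, \; i = 1, \dots, N, \qquad \sum_i\sum_t x_{it}(y_{it} - \alpha_i - x_{it}\beta) = 0,$$

so that

$$\hat\alpha_i = \bar y_i - \bar x_i\hat\beta \quad\text{and}\quad \hat\beta = \frac{\sum_i\sum_t x_{it}(y_{it} - \bar y_i)}{\sum_i\sum_t x_{it}(x_{it} - \bar x_i)}.$$

The Residual Sum of Squares now has $\sum_i T_i - (N + 1)$ degrees of freedom ($N + 1$ parameters are estimated).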

This is the most popular model encountered in empirical applications.
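The three specifications of this section can be compared numerically. The following is a minimal sketch on simulated data, with the helper name `anova_panel` and all parameter values hypothetical. For each model it applies the closed-form formulas above: separate $(\alpha_i, \beta_i)$, fully pooled $(\alpha, \beta)$, and individual intercepts with a common slope. It reports the RSS and the degrees of freedom. Because the models are nested, the RSS can only increase as restrictions are added.

```python
import numpy as np


def anova_panel(xs, ys):
    """Compare the three specifications of Section 1.2.

    xs, ys: lists of 1-D arrays, one pair per individual (T_i may differ).
    Returns slope estimates, residual sums of squares and degrees of freedom.
    """
    N = len(xs)
    n_obs = sum(len(x) for x in xs)

    # (a) separate alpha_i, beta_i: RSS_i = S_yyi - S_xyi^2 / S_xxi
    rss_sep = 0.0
    for x, y in zip(xs, ys):
        sxx = ((x - x.mean()) ** 2).sum()
        syy = ((y - y.mean()) ** 2).sum()
        sxy = ((x - x.mean()) * (y - y.mean())).sum()
        rss_sep += syy - sxy ** 2 / sxx

    # (b) common alpha and beta (fully pooled regression)
    x_all, y_all = np.concatenate(xs), np.concatenate(ys)
    xd, yd = x_all - x_all.mean(), y_all - y_all.mean()
    beta_pool = (xd * yd).sum() / (xd ** 2).sum()
    rss_pool = (yd ** 2).sum() - (xd * yd).sum() ** 2 / (xd ** 2).sum()

    # (c) individual intercepts alpha_i = ybar_i - xbar_i * beta, common slope beta
    beta_fe = (sum((x * (y - y.mean())).sum() for x, y in zip(xs, ys))
               / sum((x * (x - x.mean())).sum() for x in xs))
    rss_fe = sum(((y - (y.mean() - x.mean() * beta_fe) - x * beta_fe) ** 2).sum()
                 for x, y in zip(xs, ys))

    return {
        "beta_pool": beta_pool,
        "beta_fe": beta_fe,
        "rss": {"separate": rss_sep, "common_slope": rss_fe, "pooled": rss_pool},
        "df": {"separate": n_obs - 2 * N, "common_slope": n_obs - (N + 1),
               "pooled": n_obs - 2},
    }


rng = np.random.default_rng(1)
T_i = [6, 8, 10]                       # unbalanced: T_i differs across individuals
alpha_true = [1.0, -2.0, 4.0]          # hypothetical individual intercepts
xs = [rng.normal(size=T) for T in T_i]
ys = [a + 2.0 * x + 0.1 * rng.normal(size=x.size) for a, x in zip(alpha_true, xs)]
res = anova_panel(xs, ys)
print(res["beta_pool"], res["beta_fe"], res["df"])
```

With very different intercepts, the pooled slope is contaminated by the intercept differences. The common-slope estimator recovers the slope of 2 used in the simulation.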


1.3 Some definitions

- Typical panel: the number of units (individuals) $N$ is large and the number of time periods $T$ is small.
- Short (long) panel: the number of periods $T$ is small (large).
- Balanced panel: the same number of periods for every unit (individual).
- Rotating panel: a subset of individuals is replaced every period. Rotating panels can be balanced or unbalanced.
- Pseudo panel: cross-sections made of different individuals in every period are pooled.
- Attrition: with long panels, the probability that an individual remains in the sample decreases as the number of periods increases (non-response, moving, death, etc.).


Chapter 2
The linear model
2.1 Notation
$$y_{it} = x_{it}\beta + u_{it}, \quad i = 1, 2, \dots, N, \; t = 1, 2, \dots, T,$$

where $x_{it}$ is a $(1 \times K)$ vector, $\beta$ is a $(K \times 1)$ vector of parameters, and $u_{it}$ is the residual term. $y_{it}$ and the components of $x_{it}$ vary both over time and across individuals.

The component of the dependent variable that is unexplained by $x_{it}$ is

$$u_{it} = \mu_i + \lambda_t + \varepsilon_{it},$$

where $\mu_i$ is the time-invariant individual effect, $\lambda_t$ is the time effect, and $\varepsilon_{it}$ is the i.i.d. component.

One-way error-component model: $u_{it} = \mu_i + \varepsilon_{it}$.
Two-way error-component model: $u_{it} = \mu_i + \lambda_t + \varepsilon_{it}$.


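This allows several predictions of $y_{it}$ given $x_{it}$:
- $E(y_{it}|x_{it}) = x_{it}\beta$ across $i$ and $t$,
- $E(y_{it}|x_{it}, \mu_i) = x_{it}\beta + \mu_i$ for individual $i$, across periods,
- $E(y_{it}|x_{it}, \lambda_t) = x_{it}\beta + \lambda_t$ for period $t$, across individuals,
- $E(y_{it}|x_{it}, \mu_i, \lambda_t) = x_{it}\beta + \mu_i + \lambda_t$ for individual $i$ and period $t$.

2.1.1 Model notation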

2.1.1.1 Model in matrix form

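$$Y = X\beta + \mu + \lambda + \varepsilon,$$

where $Y$, $\mu$, $\lambda$ and $\varepsilon$ are $(NT \times 1)$ and $X$ is $(NT \times K)$.

Convention: index $t$ runs faster, index $i$ runs slower:

$$\begin{pmatrix} y_{11}\\ \vdots\\ y_{1T}\\ y_{21}\\ \vdots\\ y_{2T}\\ \vdots\\ y_{it}\\ \vdots\\ y_{N1}\\ \vdots\\ y_{NT} \end{pmatrix} = \begin{bmatrix} X^{(1)}_{11} & \cdots & X^{(K)}_{11}\\ \vdots & & \vdots\\ X^{(1)}_{1T} & \cdots & X^{(K)}_{1T}\\ X^{(1)}_{21} & \cdots & X^{(K)}_{21}\\ \vdots & & \vdots\\ X^{(1)}_{2T} & \cdots & X^{(K)}_{2T}\\ \vdots & & \vdots\\ X^{(1)}_{it} & \cdots & X^{(K)}_{it}\\ \vdots & & \vdots\\ X^{(1)}_{N1} & \cdots & X^{(K)}_{N1}\\ \vdots & & \vdots\\ X^{(1)}_{NT} & \cdots & X^{(K)}_{NT} \end{bmatrix} \begin{pmatrix} \beta_1\\ \beta_2\\ \vdots\\ \beta_K \end{pmatrix} + \mu + \lambda + \varepsilon.$$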


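2.1.1.2 Model in vector form

$$y_i = X_i\beta + \mu_i + \lambda + \varepsilon_i, \quad i = 1, 2, \dots, N,$$

where $y_i$ is $T \times 1$ and $X_i$ is $T \times K$. Note: $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_T)'$ and $\mu_i = (\mu_i, \mu_i, \dots, \mu_i)'$ are $(T \times 1)$.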
2.1.2 Standard matrices and operators

- $I_{NT}$: identity matrix with $NT$ rows and $NT$ columns;
- $e_T$: $T$-vector of ones;
- $B = I_N \otimes (1/T)e_Te_T'$ (Between-individual operator);
- $\bar B = (1/N)e_Ne_N' \otimes I_T$ (Between-period operator);
- $Q = I_{NT} - I_N \otimes (1/T)e_Te_T' = I_{NT} - B$ (Within-individual operator);
- $\bar Q = I_{NT} - (1/N)e_Ne_N' \otimes I_T = I_{NT} - \bar B$ (Within-period operator);
- $B\bar B = (1/NT)e_{NT}e_{NT}'$ (computes the full population mean).

Important assumption: there is no intercept term in the model (otherwise, use $B\bar B$ to demean all variables, i.e., take them in deviation from their overall mean).

The $B$ operators are used to compute, from $NT$ vectors and matrices, individual- or time-specific means of variables, stored in matrices of row dimension $NT$. The $Q$ operators are used to compute deviations from these means.

2.1.3 Important properties of operators

Symmetry, idempotency and orthogonality:

$$Q' = Q, \quad B' = B, \quad Q^2 = Q, \quad B^2 = B, \quad BQ = QB = 0.$$

The rank of an idempotent matrix equals its trace:

$$\text{rank}(Q) = N(T - 1) \quad\text{and}\quad \text{rank}(B) = N.$$


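Decomposition of the $Q$ operator with $N = T = 2$:

$$Qy = \left(\begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix} - I_2 \otimes \frac{1}{2}\begin{bmatrix} 1&1\\ 1&1 \end{bmatrix}\right) y = \begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{pmatrix} - \frac{1}{2}\begin{bmatrix} 1&1&0&0\\ 1&1&0&0\\ 0&0&1&1\\ 0&0&1&1 \end{bmatrix}\begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{pmatrix} = \begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{pmatrix} - \frac{1}{2}\begin{pmatrix} y_{11} + y_{12}\\ y_{11} + y_{12}\\ y_{21} + y_{22}\\ y_{21} + y_{22} \end{pmatrix}.$$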
We will also use
- $B_T = (1/T)e_Te_T'$: Between operator for a single individual;
- $Q_T = I_T - (1/T)e_Te_T' = I_T - B_T$: Within operator for a single individual.
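The operators and their properties can be checked numerically with Kronecker products. This is a minimal NumPy sketch; the helper name `panel_operators` is hypothetical. It builds $B$, $\bar B$, $Q$ and $\bar Q$ with $t$ running fastest, as in the stacking convention above. It then verifies symmetry, idempotency, orthogonality and the rank/trace result, and reproduces the $N = T = 2$ decomposition on an arbitrary vector.

```python
import numpy as np


def panel_operators(N, T):
    """Between (B, Bbar) and Within (Q, Qbar) operators, t running fastest within i."""
    J_T = np.ones((T, T)) / T              # (1/T) e_T e_T'
    J_N = np.ones((N, N)) / N              # (1/N) e_N e_N'
    B = np.kron(np.eye(N), J_T)            # individual means, repeated T times
    Bbar = np.kron(J_N, np.eye(T))         # period means, repeated N times
    Q = np.eye(N * T) - B
    Qbar = np.eye(N * T) - Bbar
    return B, Bbar, Q, Qbar


N, T = 3, 4
B, Bbar, Q, Qbar = panel_operators(N, T)
assert np.allclose(Q, Q.T) and np.allclose(Q @ Q, Q)      # symmetric, idempotent
assert np.allclose(B @ Q, 0) and np.allclose(Q @ B, 0)    # orthogonal
print(np.linalg.matrix_rank(Q), round(np.trace(Q)))       # 9 9, i.e. N(T-1)
print(np.linalg.matrix_rank(B), round(np.trace(B)))       # 3 3, i.e. N
assert np.allclose(B @ Bbar, np.ones((N * T, N * T)) / (N * T))  # overall mean

# N = T = 2 example: Qy gives deviations from individual means
_, _, Q2, _ = panel_operators(2, 2)
y = np.array([1.0, 3.0, 2.0, 6.0])        # (y11, y12, y21, y22)
print(Q2 @ y)                              # [-1.  1. -2.  2.]
```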


2.2 The One-Way Fixed Effects model

Terminology: the fixed-effects model does not mean that individual effects are not random in the true model! Rather, estimation is conditional on unobserved heterogeneity: the $\mu_i$'s are treated as parameters to be estimated.

2.2.1 The estimator in terms of the Frisch-Waugh-Lovell theorem

Inference is conditional on individual effects: estimates obtain by regressing $Y$ on $X$ and on individual dummies.

Let $E = I_N \otimes e_T$ be the $NT \times N$ matrix of individual dummy variables, with one column per individual $(i = 1), (i = 2), \dots, (i = N)$:

$$E = \begin{bmatrix} e_T & 0_T & \cdots & 0_T\\ 0_T & e_T & \cdots & 0_T\\ \vdots & \vdots & \ddots & \vdots\\ 0_T & 0_T & \cdots & e_T \end{bmatrix},$$

and consider the model

$$Y = X\beta + E\mu + \varepsilon = W\delta + \varepsilon,$$

where $W = [X, E]$ and $\delta = (\beta', \mu')'$.


Frisch-Waugh-Lovell theorem: the parameter estimates $\hat\beta$ are numerically identical in the 2 following procedures:
- $\hat\beta$ from $\hat\delta_{OLS} = (\hat\beta', \hat\mu')' = (W'W)^{-1}W'Y$;
- $\hat\beta = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^*$, where $X^* = [I - E(E'E)^{-1}E']X = P_E X$ and $Y^* = [I - E(E'E)^{-1}E']Y = P_E Y$ (residuals from least-squares regressions of $X$ and $Y$ on $E$).

But $E = I_N \otimes e_T$ and $E'E = I_N \otimes e_T'e_T = T \cdot I_N$, so that

$$P_E = I - E(E'E)^{-1}E' = I - \frac{1}{T}E I_N E' = I - \frac{1}{T}(I_N \otimes e_T)(I_N \otimes e_T)' = I - I_N \otimes \frac{1}{T}e_Te_T' = Q.$$

Hence

$$\hat\beta = (X^{*\prime}X^*)^{-1}(X^{*\prime}Y^*) = (X'P_E'P_EX)^{-1}(X'P_E'P_EY) = (X'QX)^{-1}(X'QY).$$
Hence

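Idea behind the fixed-effect estimation procedure:

Eliminate individual effects $\Leftrightarrow$ take variables in deviation from their individual-specific means.

Transformation of the linear model as follows:

$$y_{it} - \frac{1}{T}\sum_t y_{it} = \Big(x_{it} - \frac{1}{T}\sum_t x_{it}\Big)\beta + u_{it} - \frac{1}{T}\sum_t u_{it}$$

$$\Leftrightarrow \; Y - BY = (X - BX)\beta + u - Bu \; \Leftrightarrow \; QY = QX\beta + Qu.$$

Least-squares parameter estimate:

$$\hat\beta = [(QX)'(QX)]^{-1}(QX)'QY = [X'Q'QX]^{-1}(X'Q'QY) = (X'QX)^{-1}X'QY$$

and $Var(\hat\beta) = \sigma_\varepsilon^2(X'QX)^{-1}$.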
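The Frisch-Waugh-Lovell equivalence can be verified on simulated data. This is a minimal sketch with hypothetical values. Procedure 1 runs OLS of $Y$ on $[X, E]$ (LSDV). Procedure 2 applies $Q$ and runs OLS on the transformed data. The sketch also checks that the dummy coefficients equal $\bar y_i - \bar x_i\hat\beta$. In the simulation, $X$ is deliberately correlated with $\mu_i$; the equivalence is an algebraic identity and holds regardless.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, K = 50, 4, 2
mu = rng.normal(size=N)                                       # individual effects
X = rng.normal(size=(N * T, K)) + np.repeat(mu, T)[:, None]   # X correlated with mu
beta_true = np.array([1.0, -0.5])
Y = X @ beta_true + np.repeat(mu, T) + rng.normal(scale=0.5, size=N * T)

# Procedure 1 (LSDV): OLS of Y on W = [X, E], with E = I_N kron e_T
E = np.kron(np.eye(N), np.ones((T, 1)))
W = np.hstack([X, E])
delta_hat, *_ = np.linalg.lstsq(W, Y, rcond=None)
beta_lsdv, mu_hat = delta_hat[:K], delta_hat[K:]

# Procedure 2 (Within): OLS on QY, QX with Q = I_NT - I_N kron (1/T) e_T e_T'
Q = np.eye(N * T) - np.kron(np.eye(N), np.ones((T, T)) / T)
beta_within = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ Y)

print(beta_lsdv, beta_within)
```

In practice, one would demean by individual rather than form the $NT \times NT$ matrix $Q$. The explicit matrix is used here only to mirror the formula.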


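2.2.2 Interpretation as a covariance estimator

The model is, in vector form:

$$\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_N \end{bmatrix} = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_N \end{bmatrix}\beta + \begin{bmatrix} e_T\\ 0_T\\ \vdots\\ 0_T \end{bmatrix}\mu_1 + \begin{bmatrix} 0_T\\ e_T\\ \vdots\\ 0_T \end{bmatrix}\mu_2 + \cdots + \begin{bmatrix} 0_T\\ 0_T\\ \vdots\\ e_T \end{bmatrix}\mu_N + \begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_N \end{bmatrix},$$

with assumptions

$$E(\varepsilon_i) = 0, \quad E(\varepsilon_i\varepsilon_i') = \sigma_\varepsilon^2 I_T, \quad E(\varepsilon_i\varepsilon_j') = 0, \; i \neq j.$$

OLS estimates of $\beta$ and $\mu_i$ obtain by

$$\min \sum_{i=1}^{N}\varepsilon_i'\varepsilon_i = \sum_{i=1}^{N}(y_i - e_T\mu_i - x_i\beta)'(y_i - e_T\mu_i - x_i\beta) \; \Rightarrow \; \hat\mu_i = \bar y_i - \bar x_i\beta, \quad i = 1, 2, \dots, N,$$

and, substituting in the partial derivative with respect to $\beta$, we have

$$\hat\beta = \left[\sum_{i,t}^{N,T}(x_{it} - \bar x_i)'(x_{it} - \bar x_i)\right]^{-1}\left[\sum_{i,t}^{N,T}(x_{it} - \bar x_i)'(y_{it} - \bar y_i)\right].$$

This is called the covariance estimator, or the LSDV (Least-Squares Dummy-Variable) estimator. $\hat\beta$ is unbiased, and consistent when $N$ or $T$ tends to infinity. Its covariance matrix is estimated by

$$\widehat{Var}(\hat\beta) = \hat\sigma_\varepsilon^2\left[\sum_{i=1}^{N}x_i'Q_Tx_i\right]^{-1},$$

where $Q_T = I_T - (1/T)e_Te_T'$ and $x_i$ is the $(T \times K)$ matrix of regressors for individual $i$. $\hat\mu_i$ is unbiased but consistent only when $T \to \infty$.

2.2.3 Comments

- Model transformation by filtering out individual components $\Rightarrow$ coefficients associated with time-invariant regressors are not identified.
- The fixed-effect procedure uses the variation over time within each unit, hence the name (Within).
- Another possibility is the Between procedure, using variation between individuals:

$$BY = BX\beta + B\mu + B\varepsilon,$$

$$\hat\beta = [(BX)'(BX)]^{-1}(BX)'BY = [X'BX]^{-1}X'BY.$$

This alternative estimator uses the variation between the individual means of the model variables.

If $X_1$ varies over time only ($x_{1it} = x_{1t}$ for all $i$), then $BX_1 = \{\frac{1}{T}\sum_t x_{1it}\}_{i,t} = \bar x_1$ for all $i$. $BX_1$ is then collinear with the intercept, and the intercept term is not identified separately from the coefficient of $X_1$.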

A word of caution in computing variance estimates. In the model $QY = QX\beta + Qu$, statistical software would divide the RSS by $NT - K$ (individual effects not included). But in the model $Y = X\beta + E\mu + \varepsilon$, the RSS would be divided by $N(T - 1) - K$. Parameter variance estimates in the Within regression model must therefore be multiplied by $(NT - K)/[N(T - 1) - K]$.
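The degrees-of-freedom correction can be illustrated as follows. In this minimal sketch, `within_standard_errors` is a hypothetical helper and the data are simulated. It computes the Within estimator by demeaning, and then forms two variance estimates from the same RSS. The naive one divides by $NT - K$, as OLS on the transformed data would. The corrected one divides by $N(T - 1) - K$, which is what the LSDV regression with $N$ dummies uses.

```python
import numpy as np


def within_standard_errors(Y, X, N, T):
    """Within estimates with naive and corrected standard errors.

    Y: (N*T,), X: (N*T, K), stacked with t running fastest; no intercept.
    """
    K = X.shape[1]
    Yr, Xr = Y.reshape(N, T), X.reshape(N, T, K)
    yd = (Yr - Yr.mean(axis=1, keepdims=True)).ravel()            # QY
    xd = (Xr - Xr.mean(axis=1, keepdims=True)).reshape(N * T, K)  # QX
    xtx_inv = np.linalg.inv(xd.T @ xd)
    beta = xtx_inv @ xd.T @ yd
    rss = np.sum((yd - xd @ beta) ** 2)
    var_naive = rss / (N * T - K) * xtx_inv           # OLS on QY, QX ignores the N effects
    var_correct = rss / (N * (T - 1) - K) * xtx_inv   # same as LSDV with N dummies
    return beta, np.sqrt(np.diag(var_naive)), np.sqrt(np.diag(var_correct))


rng = np.random.default_rng(0)
N, T = 100, 3
X = rng.normal(size=(N * T, 1))
Y = 1.5 * X[:, 0] + np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)
beta, se_naive, se_correct = within_standard_errors(Y, X, N, T)
factor = (N * T - 1) / (N * (T - 1) - 1)        # (NT - K) / [N(T-1) - K] with K = 1
print(beta, se_naive, se_correct, factor)        # factor = 299/199, about 1.5025
```

With short panels the correction is far from negligible. Here $T = 3$ and variances are inflated by about 50%, so standard errors grow by about 23%.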


[Figure: Y plotted against X, with the Within and Between regression lines.]
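2.2.4 Testing for poolability and individual effects

Poolability

As before:

$$y_{it} = \alpha_i + x_{it}\beta_i + \varepsilon_{it} \quad\text{versus}\quad y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it},$$

but now $x_{it}$ is a $K$ vector.

$H_0: \beta_1 = \beta_2 = \cdots = \beta_N \; (= \beta)$ ($K(N - 1)$ constraints).

The Fisher test statistic is

$$\frac{(RRSS - URSS)/K(N - 1)}{URSS/N(T - K - 1)} \sim F\big(K(N - 1), \, N(T - K - 1)\big),$$

where RRSS comes from the Within regression and $URSS = \sum_{i=1}^{N}RSS_i$, with $RSS_i$ the RSS of the separate regression for individual $i$ (for scalar $x_{it}$, $RSS_i = S_{yyi} - S_{xyi}^2/S_{xxi}$, see 1.2).

Testing for individual effects

$H_0: \alpha_1 = \cdots = \alpha_N \; (= \alpha)$.

$$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it} \; \text{(OLS)} \quad\text{versus}\quad y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it} \; \text{(Within)}.$$

The Fisher test statistic is

$$\frac{(RRSS - URSS)/(N - 1)}{URSS/(NT - N - K)} \sim F(N - 1, \, NT - N - K),$$

where RRSS comes from the OLS regression on pooled data and URSS from the Within (LSDV) regression.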
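Both tests only need three residual sums of squares. This is a minimal sketch for a balanced panel: `rss_ols` and `panel_f_tests` are hypothetical helpers, and the data are simulated with individual intercepts and a common slope. The helpers compute the two F statistics and their degrees of freedom as given above. P-values are not computed here; they could be obtained from the survival function of the F distribution, e.g. `scipy.stats.f.sf`.

```python
import numpy as np


def rss_ols(y, Z):
    """Residual sum of squares from OLS of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)


def panel_f_tests(y, X, N, T):
    """F statistics of Section 2.2.4 for a balanced panel.

    y: (N*T,), X: (N*T, K) without intercept, t running fastest within i.
    Returns {name: (F statistic, (df1, df2))}.
    """
    K = X.shape[1]
    ids = np.repeat(np.arange(N), T)
    E = (ids[:, None] == np.arange(N)).astype(float)       # individual dummies
    ones = np.ones((N * T, 1))

    rss_within = rss_ols(y, np.hstack([E, X]))              # alpha_i + x_it beta
    rss_pooled = rss_ols(y, np.hstack([ones, X]))           # alpha + x_it beta
    rss_separate = sum(                                     # alpha_i + x_it beta_i
        rss_ols(y[ids == i], np.hstack([np.ones((T, 1)), X[ids == i]]))
        for i in range(N)
    )

    df_eff = (N - 1, N * T - N - K)
    f_eff = ((rss_pooled - rss_within) / df_eff[0]) / (rss_within / df_eff[1])

    df_pool = (K * (N - 1), N * (T - K - 1))
    f_pool = ((rss_within - rss_separate) / df_pool[0]) / (rss_separate / df_pool[1])
    return {"individual_effects": (f_eff, df_eff), "poolability": (f_pool, df_pool)}


rng = np.random.default_rng(7)
N, T = 30, 10
X = rng.normal(size=(N * T, 1))
y = np.repeat(rng.normal(scale=2.0, size=N), T) + 1.5 * X[:, 0] + rng.normal(size=N * T)
results = panel_f_tests(y, X, N, T)
print(results)
```

In this simulated design, the individual-effects statistic should be large, since intercepts truly differ. The poolability statistic should be of order 1, since slopes are truly equal.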

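2.3 The Random Effects model

2.3.1 Notation and assumptions

Problem with the Fixed-effect model: degrees of freedom are lost when $N \to \infty$, since one parameter $\mu_i$ must be estimated for every individual. A different approach is to assume that individual effects are random. Model inference is then drawn marginally (unconditionally upon the $\mu_i$'s) with respect to the population of all effects.

Assumptions:

$$\mu_i \sim IID(0, \sigma_\mu^2), \quad \varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2), \quad E(\mu_i\varepsilon_{it}) = E(\mu_i x_{it}) = 0,$$

with

$$E(\mu_i\mu_j) = \begin{cases} \sigma_\mu^2 & \text{if } i = j,\\ 0 & \text{otherwise}, \end{cases} \qquad E(\varepsilon_{it}\varepsilon_{js}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } i = j \text{ and } t = s,\\ 0 & \text{otherwise}. \end{cases}$$

Hence $cov(u_{it}, u_{js}) = \sigma_\mu^2 + \sigma_\varepsilon^2$ if $i = j$ and $t = s$, $\sigma_\mu^2$ if $i = j$ and $t \neq s$, and $0$ if $i \neq j$.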


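Let

$$\Omega_T = E(u_iu_i') = \begin{bmatrix} \sigma_\mu^2 + \sigma_\varepsilon^2 & \sigma_\mu^2 & \cdots & \sigma_\mu^2\\ \sigma_\mu^2 & \sigma_\mu^2 + \sigma_\varepsilon^2 & \cdots & \sigma_\mu^2\\ \vdots & & \ddots & \vdots\\ \sigma_\mu^2 & \sigma_\mu^2 & \cdots & \sigma_\mu^2 + \sigma_\varepsilon^2 \end{bmatrix},$$

a $(T \times T)$ matrix, for every individual $i = 1, 2, \dots, N$. We have

$$E(uu') = \Omega = I_N \otimes \Omega_T = I_N \otimes \left[\sigma_\mu^2(e_Te_T') + \sigma_\varepsilon^2 I_T\right] = I_N \otimes \left[\sigma_\mu^2(T B_T) + \sigma_\varepsilon^2(Q_T + B_T)\right],$$

since $Q_T = I_T - B_T$ and $B_T = (1/T)e_Te_T'$. Therefore

$$\Omega = T\sigma_\mu^2 B + \sigma_\varepsilon^2 I_{NT},$$

or equivalently

$$\Omega = \sigma_\varepsilon^2 Q + (T\sigma_\mu^2 + \sigma_\varepsilon^2)B.$$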

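2.3.2 GLS estimation of the Random-effect model

General model form:

$$Y = X\beta + U, \quad\text{with } E(UU') = \Omega.$$

Generalized Least Squares (GLS) produces efficient estimates of $\beta$ for given $\sigma_\mu^2$ and $\sigma_\varepsilon^2$, based on the known structure of the variance-covariance matrix $\Omega$:

$$\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y \quad\text{and}\quad Var(\hat\beta_{GLS}) = (X'\Omega^{-1}X)^{-1}.$$

Computation of $\Omega^{-1}$: use the formula

$$\Omega^r = (\sigma_\varepsilon^2)^r Q + (T\sigma_\mu^2 + \sigma_\varepsilon^2)^r B$$

for an arbitrary scalar $r$. The formula follows from the properties of $Q$ and $B$ (idempotency and orthogonality).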
Q and B (idem-


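Hence useful matrices are

$$\Omega^{-1} = \frac{1}{\sigma_\varepsilon^2}Q + \frac{1}{T\sigma_\mu^2 + \sigma_\varepsilon^2}B \quad\text{and}\quad \Omega^{-1/2} = \frac{1}{\sigma_\varepsilon}Q + \frac{1}{(T\sigma_\mu^2 + \sigma_\varepsilon^2)^{1/2}}B.$$

We have

$$\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y = \left[X'\Big(\frac{\Omega}{\sigma_\varepsilon^2}\Big)^{-1}X\right]^{-1}\left[X'\Big(\frac{\Omega}{\sigma_\varepsilon^2}\Big)^{-1}Y\right] = \left[X'\Big(Q + \frac{1}{\psi}B\Big)X\right]^{-1}\left[X'\Big(Q + \frac{1}{\psi}B\Big)Y\right],$$

where $\psi = (T\sigma_\mu^2 + \sigma_\varepsilon^2)/\sigma_\varepsilon^2 = 1 + T\sigma_\mu^2/\sigma_\varepsilon^2$.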

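GLS as Weighted Least Squares. Premultiply the model by $\sigma_\varepsilon\Omega^{-1/2}$ and use OLS: $Y^* = X^*\beta + u^*$, where

$$Y^* = \sigma_\varepsilon\Omega^{-1/2}Y = \left[Q + \frac{\sigma_\varepsilon}{(\sigma_\varepsilon^2 + T\sigma_\mu^2)^{1/2}}B\right]Y, \qquad X^* = \sigma_\varepsilon\Omega^{-1/2}X = \left[Q + \frac{\sigma_\varepsilon}{(\sigma_\varepsilon^2 + T\sigma_\mu^2)^{1/2}}B\right]X,$$

so that

$$Y^* = (Q + \psi^{-1/2}B)Y, \qquad X^* = (Q + \psi^{-1/2}B)X,$$

and in scalar form:

$$y^*_{it} = (y_{it} - \bar y_i) + \psi^{-1/2}\bar y_i = y_{it} - \Big(1 - \frac{1}{\sqrt{\psi}}\Big)\bar y_i,$$

$$x^*_{it} = (x_{it} - \bar x_i) + \psi^{-1/2}\bar x_i = x_{it} - \Big(1 - \frac{1}{\sqrt{\psi}}\Big)\bar x_i.$$

See Appendix 1 for Maximum Likelihood estimation of the random-effects model.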
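The equivalence between GLS and OLS on quasi-demeaned data can be sketched as follows. The sketch treats the variance components as known and uses hypothetical simulated values, with `quasi_demean` an illustrative helper. It computes $\hat\beta_{GLS}$ directly from $\Omega^{-1} = Q/\sigma_\varepsilon^2 + B/(T\sigma_\mu^2 + \sigma_\varepsilon^2)$. It then computes OLS after subtracting the fraction $1 - 1/\sqrt{\psi}$ of individual means, and compares the two.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 40, 5, 2
s2_mu, s2_eps = 2.0, 1.0                         # variance components, assumed known
X = rng.normal(size=(N * T, K))
u = (np.repeat(rng.normal(scale=np.sqrt(s2_mu), size=N), T)
     + rng.normal(scale=np.sqrt(s2_eps), size=N * T))
Y = X @ np.array([0.5, 2.0]) + u

# Direct GLS with Omega^{-1} = Q / s2_eps + B / (T s2_mu + s2_eps)
B = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - B
omega_inv = Q / s2_eps + B / (T * s2_mu + s2_eps)
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ Y)

# Weighted least squares: subtract a fraction (1 - 1/sqrt(psi)) of individual means
psi = 1.0 + T * s2_mu / s2_eps
weight = 1.0 - 1.0 / np.sqrt(psi)


def quasi_demean(Z):
    """z*_it = z_it - weight * zbar_i for an (N*T, k) array."""
    Zr = Z.reshape(N, T, -1)
    return (Zr - weight * Zr.mean(axis=1, keepdims=True)).reshape(N * T, -1)


X_star = quasi_demean(X)
Y_star = quasi_demean(Y[:, None]).ravel()
beta_wls, *_ = np.linalg.lstsq(X_star, Y_star, rcond=None)
print(beta_gls, beta_wls, weight)
```

A weight of 0 gives pooled OLS and a weight of 1 gives the Within estimator. The random-effects estimator sits between the two.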


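2.3.3 Comparison between GLS, OLS and Within

$$\hat\beta_{GLS} = \Big[X'QX + \frac{1}{\psi}X'BX\Big]^{-1}\Big[X'QY + \frac{1}{\psi}X'BY\Big],$$

$$\hat\beta_{Within} = (X'QX)^{-1}X'QY, \qquad \hat\beta_{Between} = (X'BX)^{-1}X'BY,$$

so that

$$\hat\beta_{GLS} = S_1\hat\beta_{Within} + S_2\hat\beta_{Between},$$

where $S_1 = [X'QX + \frac{1}{\psi}X'BX]^{-1}X'QX$ and $S_2 = [X'QX + \frac{1}{\psi}X'BX]^{-1}\frac{1}{\psi}X'BX$ (note that $S_1 + S_2 = I_K$).

- (i) If $\sigma_\mu^2 = 0$, then $1/\psi = 1$ and $\hat\beta_{GLS} = \hat\beta_{OLS}$.
- (ii) If $T \to \infty$, then $1/\psi \to 0$ and $\hat\beta_{GLS} \to \hat\beta_{Within}$.
- (iii) If $1/\psi \to \infty$, then $\hat\beta_{GLS} \to \hat\beta_{Between}$ (a formal limit, since $1/\psi \le 1$ in this model).
- (iv) $Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS})$ is a positive semi-definite matrix.
- (v) If $1/\psi \to 0$, then $Var(\hat\beta_{Within}) \to Var(\hat\beta_{GLS})$.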
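The matrix-weighted-average representation can be checked numerically. The following sketch uses simulated data and assumes a value of $1/\psi$ corresponding to $\sigma_\mu^2 = \sigma_\varepsilon^2 = 1$; `gls_decomposition` is a hypothetical helper. As in the notes, there is no intercept. The sketch verifies that $S_1 + S_2 = I_K$ and that $\hat\beta_{GLS} = S_1\hat\beta_{Within} + S_2\hat\beta_{Between}$. It also checks the limiting cases $1/\psi = 1$ (OLS) and $1/\psi = 0$ (Within).

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, K = 60, 4, 2
X = rng.normal(size=(N * T, K)) + np.repeat(rng.normal(size=(N, K)), T, axis=0)
Y = X @ np.array([1.0, -1.0]) + np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)

B = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - B
XQX, XBX = X.T @ Q @ X, X.T @ B @ X
beta_within = np.linalg.solve(XQX, X.T @ Q @ Y)
beta_between = np.linalg.solve(XBX, X.T @ B @ Y)


def gls_decomposition(inv_psi):
    """Return (beta_GLS, S1, S2) for a given value of 1/psi."""
    M = np.linalg.inv(XQX + inv_psi * XBX)
    beta_gls = M @ (X.T @ Q @ Y + inv_psi * (X.T @ B @ Y))
    return beta_gls, M @ XQX, M @ (inv_psi * XBX)


inv_psi = 1.0 / (1.0 + T * 1.0 / 1.0)    # assumed sigma_mu^2 = sigma_eps^2 = 1
beta_gls, S1, S2 = gls_decomposition(inv_psi)
assert np.allclose(S1 + S2, np.eye(K))
assert np.allclose(beta_gls, S1 @ beta_within + S2 @ beta_between)

# Case (i): 1/psi = 1 gives pooled OLS, because Q + B = I
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(gls_decomposition(1.0)[0], beta_ols)
```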
2.3.4 Fixed individual effects or error components?

Crucial issue in panel data econometrics: how should we treat the effects $\mu_i$? As parameters or as random variables?

$\Rightarrow$ If inference is restricted to the specific units (individuals) in the sample: conditional inference, use Fixed effects. Example: individuals are not selected at random, or all firms in a given industry are selected.

$\Rightarrow$ If inference is on the whole population: marginal (unconditional) inference, use Random effects. Example: individuals are selected randomly from a huge population (consumers).


2.3.4.1 Some practical choice criteria

- Interpretation of effects in the (economic) model;
- Sampling process: purely random or not;
- Number of units (countries, regions, households, ...);
- Interchangeability of units;
- Endogeneity of $X_{it}$ (see later).

2.3.4.2 Terminology

When fixed individual effects are considered: Fixed-Effects or Within estimation procedure. When random individual effects are considered: GLS (Generalized Least Squares) estimation procedure.
2.3.5 Example: Wage equation, Hausman (1978)

629 high-school graduates, Michigan income dynamics study: 3774 observations ($N = 629$, $T = 6$). Dependent variable: log wage.

Table 2.1: Within and GLS estimation results

Variable                   Within      GLS
Constant                   n/a         0.8499
Age in [20,35]             0.0557      0.0393
Age in [35,45]             0.0351      0.0092
Age in [45,55]             0.0209     -0.0007
Age in [55,65]             0.0209     -0.0097
Age 65 and over           -0.0171     -0.0423
Unemployed prev. year     -0.0042     -0.0277
Poor health prev. year    -0.0204     -0.0250
Self-employed             -0.2190     -0.2670
South                     -0.1569     -0.0324
Rural                     -0.0101     -0.1215

The GLS estimator is a weighted average of the Within and Between estimators, where the weights are the inverses of the corresponding variances. The Within estimator neglects the variation between individuals, the Between estimator neglects the variation within individuals, and OLS gives equal weight to both Within and Between variations.

Note. If the model contains an intercept:

$$y_{it} = \alpha + x_{it}\beta + \mu_i + \varepsilon_{it},$$

we use $B - B\bar B$ instead of $B$ (to eliminate $\alpha$) in the formulae.

2.3.6 Best Quadratic Unbiased Estimators (BQU) of variances

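If errors are normal, BQU estimates of $\sigma_\mu^2$ and $\sigma_\varepsilon^2$ are found from

$$\hat\sigma_\varepsilon^2 = \frac{u'Qu}{tr(Q)} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(u_{it} - \bar u_i)^2}{N(T - 1)} \quad\text{and}\quad \widehat{\sigma_\varepsilon^2 + T\sigma_\mu^2} = \frac{u'Bu}{tr(B)} = \frac{T\sum_{i=1}^{N}\bar u_i^2}{N},$$

because $tr(Q) = N(T - 1)$ and $tr(B) = N$. But in practice the $u_{it}$'s are unknown, and we must estimate the variances from the $\hat u_{it}$'s instead.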


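1/ Wallace and Hussain (1969): use OLS residuals in place of the true $u$'s.

2/ Amemiya (1971): use LSDV residual estimates. We have

$$\begin{pmatrix} \sqrt{NT}(\hat\sigma_\varepsilon^2 - \sigma_\varepsilon^2)\\ \sqrt{N}(\hat\sigma_\mu^2 - \sigma_\mu^2) \end{pmatrix} \xrightarrow{d} N\left(0, \begin{bmatrix} 2\sigma_\varepsilon^4 & 0\\ 0 & 2\sigma_\mu^4 \end{bmatrix}\right),$$

where $\hat\sigma_\mu^2 = (\hat\sigma_1^2 - \hat\sigma_\varepsilon^2)/T$, $\hat\sigma_1^2$ being the estimate of $\sigma_1^2 = \sigma_\varepsilon^2 + T\sigma_\mu^2$.

3/ Swamy and Arora (1972): use the mean square errors of the Within and the Between regressions.

Mean square error from the Within regression:

$$\hat\sigma_\varepsilon^2 = \frac{Y'QY - Y'QX(X'QX)^{-1}X'QY}{N(T - 1) - K},$$

and from the Between regression:

$$\widehat{\sigma_\varepsilon^2 + T\sigma_\mu^2} = \frac{Y'BY - Y'BX(X'BX)^{-1}X'BY}{N - K - 1}.$$

Note: the Between regressors ($X$) include an intercept term (hence $N - K - 1$ degrees of freedom), while the Within regression does not.

4/ Nerlove (1971): compute $\hat\sigma_\mu^2 = \frac{1}{N - 1}\sum_{i=1}^{N}(\hat\mu_i - \bar{\hat\mu})^2$, where the $\hat\mu_i$ are the parameter estimates associated with the individual dummies in the LSDV regression and $\bar{\hat\mu}$ is their mean; $\sigma_\varepsilon^2$ is estimated from the Within regression.

The estimation methods above, with the variance components replaced by consistent estimates, yield Feasible GLS.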
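The Swamy-Arora route to Feasible GLS can be sketched as follows. This is an illustrative implementation under stated assumptions, not the notes' own code: a balanced panel, an intercept in the model, and regressors uncorrelated with $\mu_i$. The helper name `swamy_arora_fgls` and the simulated values are hypothetical. The steps are as follows:
- $\hat\sigma_\varepsilon^2$ is the Within MSE with $N(T-1) - K$ degrees of freedom.
- $\hat\sigma_1^2 = \hat\sigma_\varepsilon^2 + T\hat\sigma_\mu^2$ is the Between MSE with $N - K - 1$ degrees of freedom. In stacked form, each individual mean appears $T$ times.
- A negative $\hat\sigma_\mu^2$ is truncated at zero.
- The final step is OLS on quasi-demeaned data, intercept column included.

```python
import numpy as np


def swamy_arora_fgls(Y, X, N, T):
    """Feasible GLS for y_it = a + x_it b + mu_i + eps_it (balanced panel).

    Returns (coefficients [a, b...], sigma_eps^2, sigma_mu^2).
    """
    K = X.shape[1]
    Yr, Xr = Y.reshape(N, T), X.reshape(N, T, K)
    ybar, xbar = Yr.mean(axis=1), Xr.mean(axis=1)

    # Within regression, no intercept: df = N(T-1) - K
    yw = (Yr - ybar[:, None]).ravel()
    xw = (Xr - xbar[:, None, :]).reshape(N * T, K)
    bw, *_ = np.linalg.lstsq(xw, yw, rcond=None)
    s2_eps = np.sum((yw - xw @ bw) ** 2) / (N * (T - 1) - K)

    # Between regression on individual means with intercept: df = N - K - 1.
    # In NT-stacked form each mean appears T times, hence the factor T.
    zb = np.column_stack([np.ones(N), xbar])
    bb, *_ = np.linalg.lstsq(zb, ybar, rcond=None)
    s2_1 = T * np.sum((ybar - zb @ bb) ** 2) / (N - K - 1)

    s2_mu = max((s2_1 - s2_eps) / T, 0.0)                     # truncate at zero
    weight = 1.0 - np.sqrt(s2_eps / (s2_eps + T * s2_mu))     # 1 - 1/sqrt(psi)

    ys = (Yr - weight * ybar[:, None]).ravel()
    zs = np.column_stack([
        np.full(N * T, 1.0 - weight),                         # transformed intercept
        (Xr - weight * xbar[:, None, :]).reshape(N * T, K),
    ])
    coef, *_ = np.linalg.lstsq(zs, ys, rcond=None)
    return coef, s2_eps, s2_mu


rng = np.random.default_rng(2024)
N, T = 200, 5
X = rng.normal(size=(N * T, 2))
Y = (0.5 + X @ np.array([1.0, 2.0]) + np.repeat(rng.normal(size=N), T)
     + rng.normal(size=N * T))
coef, s2_eps, s2_mu = swamy_arora_fgls(Y, X, N, T)
print(coef, s2_eps, s2_mu)
```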

Chapter 3
Extensions
3.1 The Two-way panel data model
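Error component structure of the form:

$$u_{it} = \mu_i + \lambda_t + \varepsilon_{it}, \quad i = 1, 2, \dots, N, \; t = 1, 2, \dots, T,$$

or in matrix form

$$U = (I_N \otimes e_T)\mu + (e_N \otimes I_T)\lambda + \varepsilon,$$

where $\mu = (\mu_1, \dots, \mu_N)'$ and $\lambda = (\lambda_1, \dots, \lambda_T)'$.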

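3.1.1 The Two-way fixed-effect model

$\mu_i$ and $\lambda_t$ are treated as fixed parameters: inference is conditional on the $N$ individuals over the period $1 \to T$.

3.1.1.1 Notation

Fixed-effect estimates of $\beta$ obtain by using the new operator

$$Q = I_N \otimes I_T - I_N \otimes (e_Te_T'/T) - (e_Ne_N'/N) \otimes I_T,$$

so that $Qu = \{u_{it} - \bar u_i - \bar u_t\}_{it}$.

Averaging over individuals, we have

$$\bar y_t = \bar x_t\beta + \lambda_t + \bar\varepsilon_t \quad\text{with restriction } \sum_{i=1}^{N}\mu_i = 0,$$

and averaging over time periods:

$$\bar y_i = \bar x_i\beta + \mu_i + \bar\varepsilon_i \quad\text{with restriction } \sum_{t=1}^{T}\lambda_t = 0.$$

OLS on the model in deviations yields

$$\hat\beta = (X'QX)^{-1}X'QY, \qquad \hat\mu_i = \bar y_i - \bar x_i\hat\beta, \qquad \hat\lambda_t = \bar y_t - \bar x_t\hat\beta.$$

If the model contains an intercept, the operator $Q$ becomes

$$Q = I_N \otimes I_T - I_N \otimes (e_Te_T'/T) - (e_Ne_N'/N) \otimes I_T + (e_Ne_N'/N) \otimes (e_Te_T'/T),$$

so that $Qu = \{u_{it} - \bar u_i - \bar u_t + \bar u\}_{it}$, and the Within estimates are

$$\hat\beta = (X'QX)^{-1}X'QY, \qquad \hat\mu_i = (\bar y_i - \bar y) - (\bar x_i - \bar x)\hat\beta, \qquad \hat\lambda_t = (\bar y_t - \bar y) - (\bar x_t - \bar x)\hat\beta.$$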
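For a balanced panel, the two-way Within transformation (the version with an intercept) gives the same slope estimates as an LSDV regression. That regression includes an intercept, $N - 1$ individual dummies and $T - 1$ time dummies. The following minimal sketch on hypothetical simulated data checks this, with `two_way_demean` an illustrative helper. Effects are stacked with $t$ running fastest, so individual effects are repeated with `np.repeat` and time effects with `np.tile`.

```python
import numpy as np

rng = np.random.default_rng(11)
N, T, K = 25, 6, 2
mu = rng.normal(size=N)                        # individual effects
lam = rng.normal(size=T)                       # time effects
X = rng.normal(size=(N * T, K))
Y = (X @ np.array([0.8, -1.2]) + np.repeat(mu, T) + np.tile(lam, N)
     + rng.normal(scale=0.3, size=N * T))


def two_way_demean(Z):
    """z_it - zbar_i - zbar_t + zbar (operator Q with an intercept)."""
    Zr = Z.reshape(N, T, -1)
    out = (Zr - Zr.mean(axis=1, keepdims=True) - Zr.mean(axis=0, keepdims=True)
           + Zr.mean(axis=(0, 1), keepdims=True))
    return out.reshape(N * T, -1)


beta_within, *_ = np.linalg.lstsq(two_way_demean(X), two_way_demean(Y[:, None]).ravel(),
                                  rcond=None)

# LSDV: intercept + (N-1) individual dummies + (T-1) time dummies + X
D_i = np.kron(np.eye(N), np.ones((T, 1)))[:, 1:]
D_t = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
W = np.hstack([np.ones((N * T, 1)), D_i, D_t, X])
coef, *_ = np.linalg.lstsq(W, Y, rcond=None)
print(beta_within, coef[-K:])
```

The identity relies on the panel being balanced. With unbalanced panels, the simple two-way demeaning no longer removes both sets of effects exactly.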

3.1.1.2 Testing for effects

1/ $H_0: \mu_1 = \cdots = \mu_{N-1} = \lambda_1 = \cdots = \lambda_{T-1} = 0$.

Fisher test statistic:

$$\frac{(RRSS - URSS)/(N + T - 2)}{URSS/[(N - 1)(T - 1) - K]} \sim F(k_1, k_2),$$

where $k_1 = N + T - 2$, $k_2 = (N - 1)(T - 1) - K$, and
- URSS (Unrestricted RSS): from the Within model,
- RRSS (Restricted RSS): from pooled OLS.

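2/ $H_0: \mu_1 = \cdots = \mu_{N-1} = 0$ given $\lambda_t \neq 0$, $t \le T - 1$.

Fisher test statistic:

$$\frac{(RRSS - URSS)/(N - 1)}{URSS/[(N - 1)(T - 1) - K]} \sim F(k_1, k_2),$$

where $k_1 = N - 1$, $k_2 = (N - 1)(T - 1) - K$, and
- URSS: from the Within model,
- RRSS: from the regression with time effects only:

$$(y_{it} - \bar y_t) = (x_{it} - \bar x_t)\beta + (u_{it} - \bar u_t).$$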

3/ H0 : 1 =    = T 1 = 0 given i 6= 0; i  N

1.

Fisher test statistic:

(RRSS URSS )=(T 1)


v F (k1; k2);
URSS=[(N 1)(T 1) K ]
where

k1 = T

1; k2 = (N

1)(T

1)

K );

and

42

CHAPTER 3.

EXTENSIONS

URSS: from Within model,


RRSS: from Within regression as in one-way model:

(yit

yi) = (xit

xi) + (uit

ui):

See Appendix 2 for the two-way random eects model.
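Once the Within regression and the three restricted regressions have been run, the statistics are simple arithmetic. A plain-Python sketch with hypothetical function names; the residual sums of squares in the usage line are made-up numbers chosen only to illustrate the computation, and p-values would still require an $F$ distribution routine (for example from a statistics library), which is not shown.

```python
def f_test(rrss, urss, q, N, T, K):
    """F statistic [(RRSS - URSS)/q] / [URSS/((N-1)(T-1) - K)] and its degrees of freedom."""
    k2 = (N - 1) * (T - 1) - K
    if k2 <= 0:
        raise ValueError("not enough observations for the two-way Within model")
    return ((rrss - urss) / q) / (urss / k2), (q, k2)


def two_way_effect_tests(urss, rrss_pooled, rrss_time_only, rrss_unit_only, N, T, K):
    """The three tests of section 3.1.1.2; every RSS is supplied by the caller.

    rrss_pooled: pooled OLS (test 1); rrss_time_only: time dummies only (test 2);
    rrss_unit_only: one-way Within on individual effects (test 3).
    """
    return {
        "all effects": f_test(rrss_pooled, urss, N + T - 2, N, T, K),
        "individual effects": f_test(rrss_time_only, urss, N - 1, N, T, K),
        "time effects": f_test(rrss_unit_only, urss, T - 1, N, T, K),
    }


tests = two_way_effect_tests(urss=100.0, rrss_pooled=150.0, rrss_time_only=130.0,
                             rrss_unit_only=110.0, N=11, T=6, K=2)
```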

3.1.2 Example: Production function (Hoch 1962)

Sample: 63 Minnesota farms over the period 1946-1951.

Estimation of a Cobb-Douglas production function:
$$\log Output_{it} = \beta_0 + \beta_1\log Labor_{it} + \beta_2\log Real\ estate_{it} + \beta_3\log Machinery_{it} + \beta_4\log Fertilizer_{it} + u_{it}.$$
Motivation for adding specific effects (into $u_{it}$):
- Climatic conditions, identical across farms ($\lambda_t$);
- Farm-specific factors (soil, managerial quality) ($\alpha_i$).

Table 3.1: Least squares estimates of Cobb-Douglas production function

Estimate              (I)      (II)     (III)
β1 (Labor)            0.256    0.166    0.043
β2 (Real estate)      0.135    0.230    0.199
β3 (Machinery)        0.163    0.261    0.194
β4 (Fertilizer)       0.349    0.311    0.289
Sum of β's            0.904    0.967    0.726
R²                    0.721    0.813    0.884

Assumption: (I) $\alpha_i = \lambda_t = 0$; (II) $\alpha_i = 0$; (III) $\lambda_t = 0$.

3.2 More on non-spherical disturbances

Panel data: in the random-effect context, heteroskedasticity arises from the panel data structure. But the variances $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ are assumed constant.

Heteroskedasticity and serial correlation:
- $Var(\alpha_i) = \sigma^2_{\alpha i}$: individual-specific heteroskedasticity;
- $Var(\varepsilon_i) = \sigma^2_i$: typical heteroskedasticity;
- $E(\varepsilon_{it}\varepsilon_{is}) \neq 0$, $t \neq s$: serial correlation.

We present here the first two cases only.

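3.2.1 Heteroskedasticity in individual effect

Mazodier and Trognon (1978):
$$Var(\alpha_i) = \sigma^2_{\alpha i}, \quad i = 1, 2, \ldots, N, \qquad \varepsilon_{it} \sim IID(0, \sigma^2_\varepsilon),$$
or $E(\alpha\alpha') = diag[\sigma^2_{\alpha i}] = \Sigma_\alpha$ and $\varepsilon \sim IID(0, \sigma^2_\varepsilon)$. Then
$$\Omega = E(UU') = diag[\sigma^2_{\alpha i}] \otimes (e_T e_T') + diag[\sigma^2_\varepsilon] \otimes I_T,$$
where $diag[\sigma^2_\varepsilon]$ is $N \times N$. We have
$$\Omega = diag[T\sigma^2_{\alpha i} + \sigma^2_\varepsilon] \otimes \frac{e_T e_T'}{T} + diag[\sigma^2_\varepsilon] \otimes \left(I_T - \frac{e_T e_T'}{T}\right),$$
$$\Omega^r = diag[(T\sigma^2_{\alpha i} + \sigma^2_\varepsilon)^r] \otimes \frac{e_T e_T'}{T} + diag[(\sigma^2_\varepsilon)^r] \otimes \left(I_T - \frac{e_T e_T'}{T}\right).$$
Transformation of the heteroskedastic model: multiply both sides by
$$\sigma_\varepsilon\Omega^{-1/2} = diag\left[\frac{\sigma_\varepsilon}{(T\sigma^2_{\alpha i} + \sigma^2_\varepsilon)^{1/2}}\right] \otimes \frac{e_T e_T'}{T} + I_N \otimes \left(I_T - \frac{e_T e_T'}{T}\right).$$
Transformed variables in scalar form:
$$y^*_{it} = y_{it} - \left[1 - \frac{\sigma_\varepsilon}{\sqrt{T\sigma^2_{\alpha i} + \sigma^2_\varepsilon}}\right]\bar y_i.$$
Same form as in the homoskedastic case, only here $\theta$ is individual-specific:
$$\theta_i = (T\sigma^2_{\alpha i} + \sigma^2_\varepsilon)/\sigma^2_\varepsilon \quad\text{and}\quad y^*_{it} = y_{it} - \left(1 - \frac{1}{\sqrt{\theta_i}}\right)\bar y_i.$$
Feasible GLS:
- Step 1. Estimate $\sigma^2_\varepsilon$ consistently from the usual Within regression;
- Step 2. Noting that $Var(u_{it}) = w^2_i = \sigma^2_{\alpha i} + \sigma^2_\varepsilon$, estimate $w^2_i$ by $\hat w^2_i = \frac{1}{T-1}\sum_{t=1}^{T}(\hat u_{it} - \bar{\hat u}_i)^2$, where $\hat u_{it}$ is the OLS residual;
- Step 3. Compute $\hat\sigma^2_{\alpha i} = \hat w^2_i - \hat\sigma^2_\varepsilon$;
- Step 4. Form $T\hat\sigma^2_{\alpha i} + \hat\sigma^2_\varepsilon$ and $\hat\theta_i$, and compute $\hat y^*_{it}$, $\hat x^*_{it}$;
- Step 5. Regress $\hat y^*_{it}$ on $\hat x^*_{it}$ to get $\hat\beta$.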
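Step 4 only requires one quasi-demeaning per unit. A minimal plain-Python sketch, assuming a balanced panel and variance components already estimated (Steps 1 to 3); `mt_transform` is a hypothetical helper name. Applying the same function to each column of $X$ gives $\hat x^*_{it}$, and Step 5 is then an ordinary OLS regression.

```python
from math import sqrt
from statistics import mean


def mt_transform(y, sigma2_eps, sigma2_alpha):
    """Quasi-demeaning with an individual-specific theta_i (Mazodier-Trognon case).

    y: list of N lists (T observations each, balanced);
    sigma2_eps: estimate of sigma^2_epsilon (Step 1);
    sigma2_alpha: list of N estimates of sigma^2_{alpha i} (Step 3).
    """
    out = []
    for yi, s2a in zip(y, sigma2_alpha):
        T = len(yi)
        theta = (T * s2a + sigma2_eps) / sigma2_eps
        shrink = 1.0 - 1.0 / sqrt(theta)   # weight on the individual mean
        ybar = mean(yi)
        out.append([v - shrink * ybar for v in yi])
    return out


# Unit 1: no individual effect, so theta_1 = 1 and the data are left unchanged.
# Unit 2: T * sigma2_alpha = 3 * sigma2_eps, so theta_2 = 4 and half of its mean is removed.
y_star = mt_transform([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], 1.0, [0.0, 1.0])
```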

Important: consistency of the variance component estimates $\hat w^2_i$, $i = 1, 2, \ldots, N$, requires $T >> N$.

3.2.2 'Typical' heteroskedasticity

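Assumptions: $\alpha_i \sim IID(0, \sigma^2_\alpha)$ and $Var(\varepsilon_{it}) = \sigma^2_i$.
$$\Omega = E(UU') = diag[\sigma^2_\alpha] \otimes (e_T e_T') + diag[\sigma^2_i] \otimes I_T = diag[T\sigma^2_\alpha + \sigma^2_i] \otimes (e_T e_T'/T) + diag[\sigma^2_i] \otimes (I_T - e_T e_T'/T).$$
The transformed model uses
$$\Omega^{-1/2} = diag\left[\frac{1}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}}\right] \otimes (e_T e_T'/T) + diag[1/\sigma_i] \otimes (I_T - e_T e_T'/T),$$
so that $Y^* = \Omega^{-1/2}Y$ has typical element
$$y^*_{it} = \frac{y_{it} - \bar y_i}{\sigma_i} + \frac{\bar y_i}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}} = \frac{y_{it} - \theta_i\bar y_i}{\sigma_i}, \quad\text{where}\quad \theta_i = 1 - \frac{\sigma_i}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}}.$$
$E(u^2_{it}) = w^2_i = \sigma^2_\alpha + \sigma^2_i$ $\forall i$, hence the OLS residuals $\hat u_{it}$ can be used to estimate $w^2_i$: $\hat w^2_i = \frac{1}{T-1}\sum_t(\hat u_{it} - \bar{\hat u}_i)^2$. The Within residuals $\tilde u_{it}$ are then used to compute $\hat\sigma^2_i = \frac{1}{T-1}\sum_t(\tilde u_{it} - \bar{\tilde u}_i)^2$.

A consistent estimate of $\sigma^2_\alpha$ is $\hat\sigma^2_\alpha = (1/N)\sum_i(\hat w^2_i - \hat\sigma^2_i)$.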
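Both expressions for $y^*_{it}$ can be checked against each other numerically. A plain-Python sketch, assuming a balanced panel and variance components already estimated; the function names and the numbers in the usage line are illustrative.

```python
from math import sqrt
from statistics import mean


def typical_het_transform(y, sigma2_alpha, sigma2_i):
    """y*_it = (y_it - theta_i ybar_i) / sigma_i, theta_i = 1 - sigma_i / sqrt(T s2_alpha + s2_i)."""
    out = []
    for yi, s2 in zip(y, sigma2_i):
        T = len(yi)
        si = sqrt(s2)
        theta = 1.0 - si / sqrt(T * sigma2_alpha + s2)
        ybar = mean(yi)
        out.append([(v - theta * ybar) / si for v in yi])
    return out


def sigma2_alpha_hat(w2_hat, sigma2_hat):
    """(1/N) sum_i (w2_i - sigma2_i): estimate of sigma^2_alpha from the two sets of residuals."""
    return mean(w - s for w, s in zip(w2_hat, sigma2_hat))


y_star = typical_het_transform([[1.0, 2.0, 3.0]], sigma2_alpha=1.0, sigma2_i=[4.0])
```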

3.3 Unbalanced panel data models

3.3.1 Introduction

Definition: the number of time periods differs from one unit (individual) to another. For individual $i$, we have $T_i$ periods, and the total number of observations is now $\sum_{i=1}^{N} T_i$ (instead of $NT$ previously).

Examples
- Firms: may close down, or new entrants may join an industry;
- Consumers: may move, die or refuse to answer anymore;
- Workers: may become unemployed, ...

Problem of attrition: the probability of a unit staying in the sample decreases as the number of periods increases.

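3.3.2 Fixed effect models for unbalanced panels

3.3.2.1 The one-way unbalanced fixed-effect model

Consider the unbalanced model with $T_1 = 3$ and $T_2 = 2$:
$$\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\end{pmatrix} = \begin{pmatrix} x_{11}\\ x_{12}\\ x_{13}\\ x_{21}\\ x_{22}\end{pmatrix}\beta + \begin{pmatrix}\alpha_1\\ \alpha_1\\ \alpha_1\\ \alpha_2\\ \alpha_2\end{pmatrix} + \begin{pmatrix}\varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\end{pmatrix}.$$
To eliminate $\alpha$, we need a new Within operator
$$Q = \begin{pmatrix} I_3 - e_3e_3'/3 & 0\\ 0 & I_2 - e_2e_2'/2\end{pmatrix} = \begin{pmatrix} 2/3 & -1/3 & -1/3 & 0 & 0\\ -1/3 & 2/3 & -1/3 & 0 & 0\\ -1/3 & -1/3 & 2/3 & 0 & 0\\ 0 & 0 & 0 & 1/2 & -1/2\\ 0 & 0 & 0 & -1/2 & 1/2\end{pmatrix},$$
and the same procedure as in the balanced case is applied:
$$\hat\beta_{Within} = (X'QX)^{-1}X'QY, \quad\text{where}\quad Q = diag\left(I_{T_i} - e_{T_i}e_{T_i}'/T_i\right)_{i=1,2,\ldots,N}.$$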
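In practice $Q$ is never formed: applying it amounts to subtracting from each observation its own unit's mean over the $T_i$ available periods. A plain-Python sketch (hypothetical helper names) that builds the dense $Q$ for the $T_1 = 3$, $T_2 = 2$ example only to check it against the matrix above, and computes the Within slope by group demeaning.

```python
def within_operator(T_list):
    """Dense Q = diag(I_{T_i} - e e'/T_i), units stacked in order (for checking only)."""
    n = sum(T_list)
    Q = [[0.0] * n for _ in range(n)]
    start = 0
    for Ti in T_list:
        for r in range(Ti):
            for c in range(Ti):
                Q[start + r][start + c] = (1.0 if r == c else 0.0) - 1.0 / Ti
        start += Ti
    return Q


def demean_by_unit(v, T_list):
    """Q v without forming Q: subtract from each observation its unit's mean."""
    out, start = [], 0
    for Ti in T_list:
        block = v[start:start + Ti]
        m = sum(block) / Ti
        out.extend(val - m for val in block)
        start += Ti
    return out


def within_beta(y, x, T_list):
    """Within slope for one regressor in an unbalanced panel."""
    qy, qx = demean_by_unit(y, T_list), demean_by_unit(x, T_list)
    return sum(a * b for a, b in zip(qx, qy)) / sum(a * a for a in qx)


T_list = [3, 2]
Q = within_operator(T_list)
x = [1.0, 2.0, 4.0, 3.0, 5.0]
y = [2 * v + a for v, a in zip(x, [10.0, 10.0, 10.0, -1.0, -1.0])]  # alpha_1 = 10, alpha_2 = -1
beta_hat = within_beta(y, x, T_list)
```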

3.3.2.2 The two-way unbalanced fixed-effect model

The model is
$$y_{it} = x_{it}\beta + \alpha_i + \lambda_t + \varepsilon_{it}, \quad i = 1, 2, \ldots, N_t;\ t = 1, 2, \ldots, T,$$
where $N_t$ is the number of units observed in period $t$, and $n = \sum_{t=1}^{T} N_t$ is the total number of observations.

Extending the Within approach is a bit more complex here.

Important: we now assume that observations are ordered differently: $i$ runs fast and $t$ runs slowly.

For each period $t$, consider the $N_t \times N$ matrix $D_t$ obtained from $I_N$ by deleting the rows corresponding to the individuals missing at $t$.

Example: $N = 3$, $N_1 = 3$, $N_2 = 2$, $N_3 = 2$, and the observations are $(y_{11}, y_{21}, y_{31})$, $(y_{12}, y_{32})$, $(y_{13}, y_{23})$. Then
$$D_1 = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix},\qquad D_2 = \begin{pmatrix}1&0&0\\0&0&1\end{pmatrix},\qquad D_3 = \begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}.$$
We have three $(N_t \times N)$ matrices $D_t$, $t = 1, 2, 3$, constructed from $I_3$ above.

Now define a new matrix $\Delta$ as $(\Delta_1, \Delta_2)$, where $\Delta_1 = (D_1', \ldots, D_T')'$ is an $(n \times N)$ matrix and $\Delta_2 = diag(D_t e_N)$ is an $(n \times T)$ matrix:
$$\Delta = \begin{pmatrix} D_1 & D_1e_N & 0 & \cdots & 0\\ D_2 & 0 & D_2e_N & \cdots & 0\\ \vdots & \vdots & & \ddots & \vdots\\ D_T & 0 & 0 & \cdots & D_Te_N\end{pmatrix}.$$
Each $D_t e_N$ is a column of ones whose length is the number of units present in period $t$ (the $N_t$'s).

Matrix $\Delta$ is $n \times (N+T)$ and corresponds to the matrix of all dummies (units and periods) present in the sample. Part $\Delta_1$ in $\Delta$ is the equivalent of matrix $E$ (containing the individual dummies) before.

Note that $\Delta_1'\Delta_1 = diag(T_i)$ (number of periods in the sample for unit $i$), and $\Delta_2'\Delta_2 = diag(N_t)$ (number of individuals for period $t$). Also, $\Delta_2'\Delta_1$ is a $T \times N$ matrix of dummy variables for the presence in the sample of unit $i$ at time $t$.

The fixed-effect estimator could be implemented by considering the model
$$y_{it} = x_{it}\beta + D_{it}\delta + \varepsilon_{it}, \quad i = 1, 2, \ldots, N_t;\ t = 1, 2, \ldots, T,$$
where $D_{it}$ is a particular row of matrix $\Delta$, and $\delta$ contains all the $\alpha_i$'s and $\lambda_t$'s.

In the balanced panel case, we would have $\Delta_1 = (e_T \otimes I_N)$ and $\Delta_2 = (I_T \otimes e_N)$, and $\Delta$ would be $NT \times (N+T)$.

In the example above, $n = 3 + 2 + 2 = 7$ and $N = 3$:
$$\Delta = \begin{pmatrix} 1&0&0&1&0&0\\ 0&1&0&1&0&0\\ 0&0&1&1&0&0\\ 1&0&0&0&1&0\\ 0&0&1&0&1&0\\ 1&0&0&0&0&1\\ 0&1&0&0&0&1\end{pmatrix},$$
the parameter vector would be $\delta = (\alpha_1, \alpha_2, \alpha_3, \lambda_1, \lambda_2, \lambda_3)'$, and
$$\Delta'Y = \Delta'\begin{pmatrix} y_{11}\\ y_{21}\\ y_{31}\\ y_{12}\\ y_{32}\\ y_{13}\\ y_{23}\end{pmatrix} = \begin{pmatrix} y_{11}+y_{12}+y_{13}\\ y_{21}+y_{23}\\ y_{31}+y_{32}\\ y_{11}+y_{21}+y_{31}\\ y_{12}+y_{32}\\ y_{13}+y_{23}\end{pmatrix}$$
computes the sums of the variables over periods and individuals.

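An easier method if $N$ and $T$ are large: use deviations from individual and time means, as in the balanced two-way Within case.

Let
$$\Delta_N = \Delta_1'\Delta_1\ (N \times N),\qquad \Delta_T = \Delta_2'\Delta_2\ (T \times T),\qquad \Delta_{TN} = \Delta_2'\Delta_1\ (T \times N),$$
$$\Delta^* = \Delta_2 - \Delta_1\Delta_N^{-1}\Delta_{TN}'\ (n \times T),\qquad P = \Delta_T - \Delta_{TN}\Delta_N^{-1}\Delta_{TN}' = \Delta_2'\Delta^*\ (T \times T).$$
Wansbeek and Kapteyn (1989): the required Within operator for such an unbalanced two-way panel is
$$Q = I_n - \Delta_1\Delta_N^{-1}\Delta_1' - \Delta^*P^-\Delta^{*\prime},$$
where $P^-$ is a generalized inverse of $P$.

The transformed variable $QY$, say, is also written as
$$QY = Y - \Delta_1\Delta_N^{-1}\Delta_1'Y - \Delta^*P^-\Delta^{*\prime}Y = Y - \Delta_1\Delta_N^{-1}\tilde Y_1 - \Delta^*\phi,$$
where $\tilde Y_1 = \Delta_1'Y$ and $\phi = P^-\Delta^{*\prime}Y$. $\tilde Y_1$ computes the individual sums $\sum_{t=1}^{T_i} y_{ti}$.

Typical transformed element:
$$(QY)_{ti} = y_{ti} - \frac{\tilde Y_{1i}}{T_i} + \frac{a_i'\phi}{T_i} - \phi_t,$$
where $a_i$ is the $i$-th column of $\Delta_{TN}$.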

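Example

Let $Y = (y_{11}, y_{21}, y_{31}, y_{12}, y_{32}, y_{13}, y_{23})' = (1, 2, 3, 2, 6, 3, 4)'$, $n = 7$, $N = 3$, $T = 3$. We have
$$\Delta_N = \Delta_T = \begin{pmatrix}3&0&0\\0&2&0\\0&0&2\end{pmatrix},\qquad \Delta_{TN} = \begin{pmatrix}1&1&1\\1&0&1\\1&1&0\end{pmatrix},$$
$$P = \begin{pmatrix}1.6667&-0.8333&-0.8333\\-0.8333&1.1667&-0.3333\\-0.8333&-0.3333&1.1667\end{pmatrix}.$$
$P$ is singular (its rows sum to zero), but $\Delta^{*\prime}Y = (-3.5, 1.5, 2)'$ lies in its column space, so $QY$ does not depend on the choice of generalized inverse. With the Moore-Penrose inverse $P^+$,
$$\tilde Y_1 = \begin{pmatrix}6\\6\\9\end{pmatrix},\qquad \phi = P^+\Delta^{*\prime}Y = \begin{pmatrix}-1.4000\\0.5333\\0.8667\end{pmatrix},\qquad QY = \begin{pmatrix}0.4000\\0.1333\\-0.5333\\-0.5333\\0.5333\\0.1333\\-0.1333\end{pmatrix}.$$
For example,
$$Qy_{11} = 1 - \frac{6}{3} + \frac{1}{3}(1\ 1\ 1)\phi - \phi_1 = 1 - 2 + 0 + 1.4 = 0.4,$$
$$Qy_{31} = 3 - \frac{9}{2} + \frac{1}{2}(1\ 1\ 0)\phi - \phi_1 = 3 - 4.5 - 0.4333 + 1.4 = -0.5333.$$
As a check, the elements of $QY$ sum to zero within each individual and within each period.

See Appendix 3 for the unbalanced random-effects model.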
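The worked example can be cross-checked without any generalized inverse. The sketch below swaps in a different, standard technique: alternately subtracting individual means and period means until nothing changes. This converges to the same projection $QY$, i.e. the residual from a regression on all unit and period dummies. Plain Python; the function name, tolerance and iteration cap are assumptions.

```python
def two_way_within_unbalanced(values, units, periods, tol=1e-12, max_iter=10_000):
    """Two-way Within transform for an unbalanced panel by alternating demeaning.

    values, units, periods: parallel lists, one entry per observation.
    """
    v = list(values)
    for _ in range(max_iter):
        change = 0.0
        for groups in (units, periods):
            sums, counts = {}, {}
            for g, val in zip(groups, v):
                sums[g] = sums.get(g, 0.0) + val
                counts[g] = counts.get(g, 0) + 1
            new = [val - sums[g] / counts[g] for g, val in zip(groups, v)]
            change = max(change, max(abs(a - b) for a, b in zip(new, v)))
            v = new
        if change < tol:  # both demeaning steps leave v (numerically) unchanged
            break
    return v


# Example of the text: observations ordered (y11, y21, y31, y12, y32, y13, y23)
units = [1, 2, 3, 1, 3, 1, 2]
periods = [1, 1, 1, 2, 2, 3, 3]
qy = two_way_within_unbalanced([1.0, 2.0, 3.0, 2.0, 6.0, 3.0, 4.0], units, periods)
```

Alternating demeaning is convenient when $N$ is very large, since it never builds $\Delta$ or $P$; the Wansbeek-Kapteyn formula gives the same result in closed form.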


Chapter 4
Augmented panel data models
What are augmented panel models? What are their implications for estimation? Special estimation techniques when GLS is not feasible.

4.1 Introduction

Consider the model
$$y_{it} = x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T,$$
with $x_{it}$ a $1 \times K$ vector of time- and individual-varying regressors, and $z_i$ a $1 \times G$ vector of individual-specific (time-invariant) regressors.

Example:
$$\log WAGE_{it} = \beta_1 HOURS_{it} + \gamma_1 EDUC_i + \gamma_2 SEX_i + \alpha_i + \varepsilon_{it}.$$

Estimation method:
- Within: $\gamma$ is not identifiable because
$$QY = QX\beta + (I - B)Z\gamma + Q\alpha + Q\varepsilon = QX\beta + Q\varepsilon,$$
since $BZ = Z$ (and $Q\alpha = 0$). Only $\beta$ is identifiable. But a two-step procedure is feasible:
  1/ Run the Within regression $\Rightarrow \hat\beta$;
  2/ Run the Between regression
$$\bar y_i - \bar x_i\hat\beta = z_i\gamma + \alpha_i + \bar\varepsilon_i, \quad i = 1, 2, \ldots, N,$$
  to estimate the $\gamma$'s.
- GLS: both $\beta$ and $\gamma$ are identifiable.
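The two-step procedure can be sketched in plain Python for one time-varying regressor and one time-invariant regressor, assuming a balanced panel; as in the text, the Between regression has no intercept. Names and the noise-free toy data (with $\alpha_i = 0$, so both coefficients are recovered exactly) are illustrative only.

```python
from statistics import mean


def within_slope(y, x):
    """Step 1: Within slope (one regressor); y and x are N lists of T observations."""
    num = den = 0.0
    for yi, xi in zip(y, x):
        ybar, xbar = mean(yi), mean(xi)
        num += sum((a - xbar) * (b - ybar) for a, b in zip(xi, yi))
        den += sum((a - xbar) ** 2 for a in xi)
    return num / den


def two_step_augmented(y, x, z):
    """Step 2: regress ybar_i - xbar_i * beta_hat on z_i across the N units (no intercept)."""
    beta = within_slope(y, x)
    r = [mean(yi) - beta * mean(xi) for yi, xi in zip(y, x)]
    gamma = sum(zi * ri for zi, ri in zip(z, r)) / sum(zi * zi for zi in z)
    return beta, gamma


x = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [0.0, 1.0, 5.0]]
z = [12.0, 16.0, 9.0]  # e.g. years of education, constant over time
y = [[2.0 * v + 0.5 * zi for v in row] for row, zi in zip(x, z)]
beta_hat, gamma_hat = two_step_augmented(y, x, z)
```

With real data the second step is consistent only if $z_i$ is uncorrelated with $\alpha_i$, which is the issue discussed in section 4.2.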


4.2 Choice between Within and GLS

One of the choice criteria between Within and GLS is the presence of $z_i$'s in the model.

Recall: GLS is a consistent and efficient estimator provided the regressors are exogenous:
$$E(\alpha_i x_{it}) = 0 \quad\text{and}\quad E(\alpha_i z_i) = 0 \quad \forall i, t.$$
Consider the non-augmented model $y_{it} = x_{it}\beta + \alpha_i + \varepsilon_{it}$. If $x_{it}$ is endogenous in the sense $E(\alpha_i x_{it}) \neq 0$, then GLS is not consistent:
$$\hat\beta_{GLS} = \beta + \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}U = \beta + \left[X'(Q + \theta^{-1}B)X\right]^{-1}X'(Q + \theta^{-1}B)U,$$
where $\theta = 1 + T\sigma^2_\alpha/\sigma^2_\varepsilon$, so that
$$E\left[X'(Q + \theta^{-1}B)U\right] = E\left[X'Q\varepsilon + X'(B\alpha + B\varepsilon)/\theta\right] = 0 + E(X'B\alpha)/\theta + 0 = E(X'\alpha)/\theta \neq 0,$$
because $E(X'\varepsilon) = 0$ and $B\alpha = \alpha$.

Same problem with the augmented model if $E(X'\alpha) \neq 0$ and/or $E(Z'\alpha) \neq 0$.

Important consequence in practice: if (some of the) regressors are endogenous, GLS estimates are not consistent, but Within estimates are consistent because $\alpha$ is filtered out.

Another criterion of choice between Within and GLS:
- If there are endogenous regressors $\Rightarrow$ choose Within estimation (but $\gamma$ is not identifiable);
- If all regressors are exogenous, use GLS (the most efficient).

Three problems remain:
- $\gamma$ is still not identified, because in the Between regression $\bar y_i - \bar x_i\hat\beta = z_i\gamma + \alpha_i + \bar\varepsilon_i$, $z_i$ is still correlated with $\alpha_i$;
- If one uses Within, all regressors are treated as endogenous (no distinction between exogenous and endogenous $x_{it}$'s);
- Within estimates are not efficient.

4.3 An important test for endogeneity

Null hypothesis: $H_0: E(X'\alpha) = E(Z'\alpha) = 0$ (exogeneity).

Comparison between two estimators:

                GLS                        Within
H0              Consistent, efficient      Consistent, not efficient
Alternative     Not consistent             Consistent

Hausman (1978): even if the $x_{it}$'s are exogenous, GLS estimates of $\beta$ are not consistent in the augmented model (when $E(Z'\alpha) \neq 0$). Therefore, one can test for exogeneity using the parameter estimates for $\beta$ only.

Hausman test statistic: under $H_0$,
$$HT = \left(\hat\beta_{Within} - \hat\beta_{GLS}\right)'\left[Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS})\right]^{-1}\left(\hat\beta_{Within} - \hat\beta_{GLS}\right) \sim \chi^2(K).$$
Notes
- $\hat\beta_{GLS}$ and $\hat\beta_{Within}$ must have the same dimension.
- The weighting matrix $Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS})$ is positive: GLS is more efficient than Within under the null. Recall that $Var(\hat\beta_{GLS}) = \sigma^2_\varepsilon(X'QX + \theta^{-1}X'BX)^{-1}$ and $Var(\hat\beta_{W}) = \sigma^2_\varepsilon(X'QX)^{-1}$.

Interpretation of the number of degrees of freedom of the test: the Within estimator is based on the condition $E(X'QU) = 0$, whereas GLS is based on $E(X'\Omega^{-1}U) = 0 \Rightarrow E(X'QU) = 0$ and $E(X'BU) = 0$. For GLS, we add $K$ additional conditions (in terms of $B$), $K$ being the rank of $X$. The Hausman test uses these additional restrictions (see GMM later).
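With a single time-varying regressor ($K = 1$) the statistic reduces to a scalar, and the $\chi^2(1)$ upper tail can be written with the complementary error function, $P(\chi^2(1) > h) = \mathrm{erfc}(\sqrt{h/2})$. A plain-Python sketch; the estimates in the usage line are made-up numbers, not results from these notes. For $K > 1$ a matrix inverse (or linear solve) replaces the division.

```python
from math import erfc, sqrt


def hausman_scalar(b_within, b_gls, var_within, var_gls):
    """Hausman statistic and chi2(1) p-value for one coefficient (K = 1)."""
    diff_var = var_within - var_gls
    if diff_var <= 0:
        # Positive under H0 in theory, but estimated variances can violate this in finite samples
        raise ValueError("Var(Within) - Var(GLS) must be positive")
    stat = (b_within - b_gls) ** 2 / diff_var
    p_value = erfc(sqrt(stat / 2.0))  # P(chi2(1) > stat)
    return stat, p_value


stat, p = hausman_scalar(b_within=0.50, b_gls=0.40, var_within=0.004, var_gls=0.0015)
```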

4.4 Instrumental Variable estimation: Hausman-Taylor GLS estimator

4.4.1 Instrumental Variable estimation

Alternative method: instrumental-variable estimation. In the cross-section context with $N$ observations:
$$Y = X\beta + \varepsilon,\qquad E(X'\varepsilon) \neq 0,\qquad E(W'\varepsilon) = 0,$$
where $W$ is an $N \times L$ matrix of instruments.
- If $K = L$, $E[W'(Y - X\beta)] = 0 \Rightarrow \hat\beta = (W'X)^{-1}W'Y$ (IV estimator).
- If $L > K$, $E[W'(Y - X\beta)] = 0 \Rightarrow (W'Y) = (W'X)\beta$ ($L$ conditions on $K$ parameters), and we construct the quadratic form
$$(Y - X\beta)'W(W'W)^{-1}W'(Y - X\beta) = (Y - X\beta)'P_W(Y - X\beta), \quad\text{where } P_W = W(W'W)^{-1}W',$$
$$\Rightarrow \hat\beta = (X'P_WX)^{-1}(X'P_WY).$$
Note: in general, instruments originate from inside or outside the equation.
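The over-identified formula $\hat\beta = (X'P_WX)^{-1}X'P_WY$ can be computed without forming the $N \times N$ matrix $P_W$: regress each column of $X$ on $W$, then use the fitted values $\hat X = P_WX$, since $\hat X'X = X'P_WX$ and $\hat X'Y = X'P_WY$. A plain-Python sketch with a small Gauss-Jordan solver; all names are illustrative, and in practice a numerical library would be used.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def cross(A, B):
    """A'B for matrices stored as lists of rows with the same number of rows."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def iv_estimator(y, X, W):
    """(X' P_W X)^{-1} X' P_W y; reduces to (W'X)^{-1} W'y when L = K."""
    WtW = cross(W, W)
    WtX = cross(W, X)  # L x K
    K = len(X[0])
    # First stage: coefficients of each column of X on W
    pi = [solve(WtW, [WtX[l][k] for l in range(len(WtX))]) for k in range(K)]
    Xhat = [[sum(w[l] * pi[k][l] for l in range(len(w))) for k in range(K)] for w in W]
    A = cross(Xhat, X)                                                   # X' P_W X
    b = [sum(xh[k] * yy for xh, yy in zip(Xhat, y)) for k in range(K)]  # X' P_W y
    return solve(A, b)


# Noise-free check: y = 1 + 2x exactly, instruments W = (1, w1, w2), so L = 3 > K = 2
w1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w2 = [1.0, 0.0, 1.0, 0.0, 2.0, 1.0]
xv = [0.5, 1.7, 2.2, 3.9, 5.1, 6.0]
W = [[1.0, a, b] for a, b in zip(w1, w2)]
X = [[1.0, v] for v in xv]
y = [1.0 + 2.0 * v for v in xv]
beta_hat = iv_estimator(y, X, W)
```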

4.4.2 IV in a panel-data context
- Account for the variance-covariance structure ($\Omega$);
- Find relevant instruments, not correlated with $\alpha$.

Consider the general, augmented model:
$$Y = X_1\beta_1 + X_2\beta_2 + Z_1\gamma_1 + Z_2\gamma_2 + \alpha + \varepsilon,$$
where
- $X_1$: $NT \times K_1$, exogenous, varying across $i$ and $t$;
- $X_2$: $NT \times K_2$, endogenous, varying across $i$ and $t$;
- $Z_1$: $NT \times G_1$, exogenous, varying across $i$;
- $Z_2$: $NT \times G_2$, endogenous, varying across $i$;

and let $\Gamma = (X_1, X_2, Z_1, Z_2)$ and $\delta = (\beta_1', \beta_2', \gamma_1', \gamma_2')'$.

General form of the instrumental-variable estimator for panel data: let $Y^* = \Omega^{-1/2}Y$, $X^* = \Omega^{-1/2}X$, $\Gamma^* = \Omega^{-1/2}\Gamma$. We have
$$\hat\delta_{IV} = \left[\Gamma^{*\prime}P_W\Gamma^*\right]^{-1}\Gamma^{*\prime}P_WY^* = \left[\Gamma'\Omega^{-1/2}P_W\Omega^{-1/2}\Gamma\right]^{-1}\left[\Gamma'\Omega^{-1/2}P_W\Omega^{-1/2}Y\right].$$
Computation of $\Omega$ and $\Omega^{-1/2}$: as in the usual GLS case.

4.4.3 Exogeneity assumptions and a first instrument matrix

Exogeneity assumptions: $E(X_1'\alpha) = E(Z_1'\alpha) = 0$
$\Rightarrow$ Obvious instruments are $X_1$ and $Z_1$, not sufficient because $K_1 + G_1 < K_1 + K_2 + G_1 + G_2$.

Additional instruments must not be correlated with $\alpha$. Because $\alpha$ is the source of endogeneity, every variable not correlated with $\alpha$ is a valid instrument. The best valid instruments are highly correlated with $X_2$ and $Z_2$.

$QX_1$ and $QX_2$ are valid instruments: $E[(QX_1)'\alpha] = E[X_1'Q\alpha] = 0$ and $E[(QX_2)'\alpha] = E[X_2'Q\alpha] = 0$.

As for $X_1$, it is equivalent to use $BX_1$: since $X_1 = QX_1 + BX_1$ and $QX_1$ is already in the instrument list, adding $X_1$ or $BX_1$ spans the same space ($BQ = 0$ and $BB = B$).

Hausman-Taylor (1981) matrix of instruments:
$$W_{HT} = [QX_1, QX_2, BX_1, Z_1] \equiv [QX_1, QX_2, X_1, Z_1].$$
Identification condition: we have $K_1 + K_2 + G_1 + G_2$ parameters to estimate, using $K_1 + K_1 + K_2 + G_1$ instruments ($K_1 + K_2$ instruments in $QX$). Therefore, the identification condition is $K_1 \ge G_2$.

4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt

4.4.4.1 Amemiya and MaCurdy (1986)

Use the fact that if $x_{it}$ is exogenous, we can use the conditions $E(x_{it}\alpha_i) = 0$ $\forall i, \forall t$ instead of $E(\bar x_i'\alpha_i) = 0$.

Amemiya and MaCurdy (1986) suggest using the matrix $X_1^*$ in the list of instruments:
$$X_1^* = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1T}\\ x_{11} & x_{12} & \cdots & x_{1T}\\ \vdots & \vdots & & \vdots\\ x_{21} & x_{22} & \cdots & x_{2T}\\ x_{21} & x_{22} & \cdots & x_{2T}\\ \vdots & \vdots & & \vdots\\ x_{N1} & x_{N2} & \cdots & x_{NT}\\ x_{N1} & x_{N2} & \cdots & x_{NT}\\ \vdots & \vdots & & \vdots\\ x_{N1} & x_{N2} & \cdots & x_{NT}\end{pmatrix}\quad \begin{matrix}(i=1, t=1)\\ (i=1, t=2)\\ \vdots\\ (i=2, t=1)\\ (i=2, t=2)\\ \vdots\\ (i=N, t=1)\\ (i=N, t=2)\\ \vdots\\ (i=N, t=T)\end{matrix}$$
such that $QX_1^* = 0$ and $BX_1^* = X_1^*$. The AM instrument matrix is $W_{AM} = [QX, X_1^*, Z_1]$, and an equivalent estimator obtains by using
$$W_{AM} = [QX, (QX_1)^*, BX_1, Z_1],$$
where $(QX_1)^*$ is constructed as $X_1^*$ above.

Amemiya and MaCurdy: their instrument matrix yields an estimator at least as efficient as with the Hausman-Taylor matrix, if $\alpha_i$ is not correlated with the regressors $\forall t$.

Identification condition: we add $(QX_1)^*$ to the Hausman-Taylor list of instruments, but as $(QX_1)^*$ is only of rank $(T-1)K_1$, we only add $(T-1)K_1$ instruments. The identification condition is $TK_1 \ge G_2$.

4.4.4.2 Breusch, Mizon and Schmidt (1989)

An even more efficient estimator is based on the conditions $E[(QX_2)_{it}'\alpha_i] = 0$ $\forall i, \forall t$, instead of the condition $E[(Q_TX_{2i})'\alpha_i] = 0$.

For BMS, the estimator is more efficient if the endogeneity in $X_2$ originates from a time-invariant component. BMS instrument matrix:
$$W_{BMS} = [QX, (QX_1)^*, (QX_2)^*, BX_1, Z_1],$$
where $(QX_1)^*$ and $(QX_2)^*$ are constructed the same way as $X_1^*$ for AM.

Identification condition: for BMS, we add $(QX_2)^*$ to the Amemiya-MaCurdy instruments. As before, we only add $(T-1)K_2$ instruments, as $(QX_2)^*$ is not full rank but of rank $(T-1)K_2$. The condition is then $TK_1 + (T-1)K_2 \ge G_2$.
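The three order conditions can be tabulated with a few lines of arithmetic. A sketch (hypothetical function name) that returns, for each instrument set, the number of instruments in excess of the number of parameters, using the counts given in the text; it checks the order condition only, not the rank condition. The example uses the variable counts of the wage application in section 4.7 ($K_1 = 4$, $K_2 = 5$, $G_1 = 2$, $G_2 = 1$, $T = 7$).

```python
def order_conditions(K1, K2, G1, G2, T):
    """Instruments minus parameters for the HT, AM and BMS instrument sets."""
    n_params = K1 + K2 + G1 + G2
    n_ht = (K1 + K2) + K1 + G1      # QX (K1 + K2 columns), BX1, Z1
    n_am = n_ht + (T - 1) * K1      # adds (QX1)*, of rank (T - 1) K1
    n_bms = n_am + (T - 1) * K2     # adds (QX2)*, of rank (T - 1) K2
    return {"HT": n_ht - n_params, "AM": n_am - n_params, "BMS": n_bms - n_params}


excess = order_conditions(K1=4, K2=5, G1=2, G2=1, T=7)
identified = {name: e >= 0 for name, e in excess.items()}
```

The excess equals $K_1 - G_2$, $TK_1 - G_2$ and $TK_1 + (T-1)K_2 - G_2$ respectively, which reproduces the three identification conditions.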

4.5 Computation of variance-covariance matrix for IV estimators

Problem here: endogenous regressors may yield inconsistent estimates of the variance components in $\Omega$, in particular of parameter $\theta$.

Hausman and Taylor (1981) suggest a method that yields consistent estimates.

Let $M_1$ denote the individual-mean vector of the Within residual:
$$M_1 = BY - BX\hat\beta_W = \left[B - BX(X'QX)^{-1}X'Q\right]Y = Z\gamma + \alpha + B\varepsilon - BX(X'QX)^{-1}X'Q\varepsilon,$$
where $X = (X_1|X_2)$, $Z = (Z_1|Z_2)$, and $\gamma = (\gamma_1', \gamma_2')'$. The last three terms above can be treated as centered residuals, and it suffices to find instruments for $Z_2$ in order to estimate $\gamma$. The IV estimator of $\gamma$ is
$$\hat\gamma_B = (Z'P_CZ)^{-1}(Z'P_CM_1),$$
where $P_C$ is the projection matrix associated with the instruments $C = (X_1, Z_1)$. Using the parameter estimates $\hat\beta_W$ and $\hat\gamma_B$, we form the residuals
$$\hat u_W = QY - QX\hat\beta_W \quad\text{and}\quad \hat u_B = BY - BX\hat\beta_W - Z\hat\gamma_B.$$
These two vectors of residuals are used to compute the variance components as in standard Feasible GLS.

4.5.1 Full IV-GLS estimation procedure
- Step 1. Compute individual means and deviations, $BX$, $BY$, $QX$ and $QY$.
- Step 2. Estimate the parameters associated with $X$ using Within.
- Step 3. Estimate $\gamma_B$ by the IV procedure above.
- Step 4. Compute $\hat\sigma^2_\alpha$ and $\hat\sigma^2_\varepsilon$ from $\hat u_W$ and $\hat u_B$, and compute $\hat\theta = 1 + T\hat\sigma^2_\alpha/\hat\sigma^2_\varepsilon$.
- Step 5. Transform the variables by the GLS scalar procedure, e.g., $(Q + \hat\theta^{-1/2}B)Y = y_{it} - (1 - 1/\sqrt{\hat\theta})\bar y_i$.
- Step 6. Compute the projection $P_W$ from the instrument matrix $W$.
- Step 7. Estimate the parameters $\delta$.


4.6 Example: Wage equation

4.6.1 Model specification

Theory (human capital or signal theory):
$$\log w = F[X_1, \eta, ED],$$
where $w$: wage rate, $\eta$: worker's ability (unobserved), $X_1$: additional variables (industry, occupation status, etc.), and $ED$: educational level. Proxies for ability that can be used: number of hours worked, experience, union, etc.

Main objective: estimate the marginal gain associated with $ED$: $\partial w/\partial ED$.

But there is a problem: what if the worker's ability is constant through time and conditions $ED$? The true model would be
$$\log w = F[X_1, \eta, ED], \qquad ED = G[\eta, X_2],$$
where $X_2$ are additional, individual-specific variables.

If ability $\eta$ is replaced by proxies $Z$, we have
$$\log w = F[X_1, Z, ED] + U, \qquad ED = G[X_2, Z] + V,$$
where $U = F[X_1, \eta, ED] - F[X_1, Z, ED]$ and $V = G[X_2, \eta] - G[X_2, Z]$.

Two problems arise when estimating the first equation while overlooking the second one:
- If $X_1$ and $X_2$ have some variables in common, endogeneity bias (because of $ED$);
- If $Z$ is correlated with omitted variables (explaining ability), measurement-error bias.

4.7 Application: returns to education

Sample used: Panel Study of Income Dynamics (PSID), University of Michigan. See Baltagi and Khanti-Akom 1990, Cornwell and Rupert 1988.

595 individuals, for the years 1976 to 1982 (7 time periods): heads of households (males and females) aged between 18 and 65 in 1976, with a positive wage in private, nonfarm employment for the years 1976 to 1982.

4.7.1 Variables related to job status
- LWAGE: logarithm of wage earnings;
- WKS: number of weeks worked in the year;
- EXP: working experience in years at the date of the sample;
- OCC: dummy, 1 if blue-collar occupation;
- IND: dummy, 1 if working in industry;
- UNION: dummy, 1 if wage is covered by a union contract.

4.7.2 Variables related to characteristics of household heads
- SMSA: dummy, 1 if household resides in an SMSA (Standard Metropolitan Statistical Area);
- SOUTH: dummy, 1 if individual resides in the South;
- MS: marital status dummy, 1 if head is married;
- FEM: dummy, 1 if female;
- BLK: dummy, 1 if head is black;
- ED: number of years of education attained.

Individual-specific variables: ED, BLK and FEM.

Estimation of non-augmented models (without the $Z_i$'s):
- Variables a priori endogenous (because correlated with ability, i.e. with the individual effects): $X_2$: (EXPE, EXPE2, UNION, WKS, MS);
- Variables a priori exogenous: $X_1$: (OCC, SOUTH, SMSA, IND).

Augmented model:
$$Y_{it} = X_{1it}\beta_1 + X_{2it}\beta_2 + Z_{1i}\gamma_1 + Z_{2i}\gamma_2 + \alpha_i + \varepsilon_{it}$$
- Variables a priori endogenous: $Z_2$: ED;
- Variables a priori exogenous: $Z_1$: (BLK, FEM).

Table 4.1: Sample 1976-1982. Descriptive statistics

Variable   Mean      Std. Dev.   Minimum   Maximum
LWAGE      6.6763    0.4615      4.6052    8.5370
EXP        19.8538   10.9664     1.0000    51.0000
WKS        46.8115   5.1291      5.0000    52.0000
OCC        0.5112    0.4999      0.0000    1.0000
IND        0.3954    0.4890      0.0000    1.0000
UNION      0.3640    0.4812      0.0000    1.0000
SOUTH      0.2903    0.4539      0.0000    1.0000
SMSA       0.6538    0.4758      0.0000    1.0000
MS         0.8144    0.3888      0.0000    1.0000
ED         12.8454   2.7880      4.0000    17.0000
FEM        0.1126    0.3161      0.0000    1.0000
BLK        0.0723    0.2590      0.0000    1.0000

Table 4.2: Dependent variable: log(wage). Exogenous regressors only.

Variable   Within              GLS
Constant                       0.0976 (0.0040)
OCC        -0.0696 (0.02323)   -0.0701 (0.02322)
SOUTH      -0.0052 (0.05833)   -0.0072 (0.05807)
SMSA       -0.1287 (0.03295)   -0.1275 (0.03290)
IND        0.0317 (0.02626)    0.0317 (0.02624)

Hausman test: $\chi^2(4) = 0.551$

Notes. Standard errors are in parentheses.

Table 4.3: Dependent variable: log(wage). Endogenous regressors only.

Variable   Within               GLS
Constant                        0.0561 (0.0024)
EXPE       0.1136 (0.002467)    0.1133 (0.002466)
EXPE2      -0.0004 (0.000054)   -0.0004 (0.000054)
WKS        0.0008 (0.0005994)   0.0008 (0.0005994)
MS         -0.0322 (0.01893)    -0.0325 (0.01892)
UNION      0.0301 (0.01480)     0.0300 (0.01479)

Hausman test: $\chi^2(5) = 24.94$

Notes. Standard errors are in parentheses.

Table 4.4: Dependent variable: log(wage). Augmented model.

Variable   Within              GLS
Constant                       0.1866 (0.01189)
OCC        -0.0214 (0.01378)   -0.0243 (0.01367)
SOUTH      -0.0018 (0.03429)   0.0048 (0.03188)
SMSA       -0.0424 (0.01942)   -0.0468 (0.01891)
IND        0.0192 (0.01544)    0.0148 (0.01521)
EXPE       0.1132 (0.00247)    0.1084 (0.00243)
EXPE2      -0.0004 (0.00005)   -0.0004 (0.00005)
WKS        0.0008 (0.00059)    0.0008 (0.00059)
MS         -0.0297 (0.01898)   -0.0391 (0.01884)
UNION      0.0327 (0.01492)    0.0375 (0.01472)
FEM                            -0.1666 (0.12646)
BLK                            -0.2639 (0.15413)
ED                             0.1373 (0.01415)

Hausman test: $\chi^2(9) = 495.3$

Notes. Standard errors are in parentheses.

Table 4.5: Dependent variable: log(wage). IV estimation

Variable   HT                AM                BMS
Constant   0.1772 (0.017)    0.1781 (0.016)    0.1748 (0.016)
OCC        -0.0207 (0.013)   -0.0208 (0.013)   -0.0204 (0.013)
SOUTH      0.0074 (0.031)    0.0072 (0.031)    0.0077 (0.031)
SMSA       -0.0418 (0.018)   -0.0419 (0.018)   -0.0423 (0.018)
IND        0.0135 (0.015)    0.0136 (0.015)    0.0138 (0.015)
EXPE       0.1131 (0.002)    0.1129 (0.002)    0.1127 (0.002)
EXPE2      -0.0004 (0.005)   -0.0004 (0.000)   -0.0004 (0.000)
WKS        0.0008 (0.000)    0.0008 (0.000)    0.0008 (0.000)
MS         -0.0298 (0.018)   -0.0300 (0.018)   -0.0303 (0.018)
UNION      0.0327 (0.014)    0.0324 (0.014)    0.0326 (0.014)
FEM        -0.1309 (0.126)   -0.1320 (0.126)   -0.1337 (0.126)
BLK        -0.2857 (0.155)   -0.2859 (0.155)   -0.2793 (0.155)
ED         0.1379 (0.021)    0.1372 (0.020)    0.1417 (0.020)
Test       χ²(3) = 5.23      χ²(13) = 19.29    χ²(13) = 12.23

Notes. Standard errors are in parentheses.

Chapter 5
Dynamic panel data models
5.1 Motivation
Usefulness of dynamic panel data models:
- Investigate adjustment dynamics in micro- and macro-economic variables of interest;
- Estimate equations from intertemporal-framework models (life-cycle models, finance, ...).

In practice: estimate long-run elasticities and structural parameters from Euler equations.

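5.1.1 Dynamic formulations from dynamic programming problems

Consider the general problem
$$\max_{q(0),\ldots,q(T)} E\int e^{-rt}\pi(t)\,dt,\qquad \pi(t) = p(t)q(t) - c[q(t), b(t)],\qquad \dot b = G[b(t), q(t)],$$
where $b(t)$ is the state variable (stock, capital, ...), $q(t)$ is the control variable, and $r$ is the discount rate. $G(\cdot)$ describes the evolution path of the state variable.

Dynamic programming solves the problem in a series of steps. Switch to a discrete-time framework:
$$\max_{q_0,\ldots,q_T} E\left\{\sum_{t=0}^{T}(1+r)^{-t}\pi_t\right\},\qquad b_{t+1} = f(b_t, q_t),$$
and use the Bellman equation:
$$V_t(b_t) = \max E_t\left[\pi_t + (1+r)^{-1}V_{t+1}(b_{t+1})\right] = \max E_t\left\{p_tq_t - c[q_t, b_t] + (1+r)^{-1}V_{t+1}(f[b_t, q_t])\right\},$$
where $V_t(b_t)$ is the value function of the problem at time $t$, and $E_t$ is the conditional expectation operator at time $t$.

We use a) the envelope theorem (the evolution path at the optimum depends only on the state variable, as the control variable is already optimized); b) the first-order condition with respect to the control variable:
$$\frac{\partial V_t(b_t)}{\partial b_t} = \frac{\partial\pi_t(b_t, q_t)}{\partial b_t} + \frac{1}{1+r}\frac{\partial V_{t+1}}{\partial f}\frac{\partial f(b_t, q_t)}{\partial b_t}\qquad\text{(Envelope theorem)}$$
$$\frac{\partial V_t(b_t)}{\partial q_t} = \frac{\partial\pi_t(b_t, q_t)}{\partial q_t} + \frac{1}{1+r}\frac{\partial V_{t+1}}{\partial f}\frac{\partial f(b_t, q_t)}{\partial q_t} = 0\qquad\text{(FOC)}$$
From (FOC):
$$\frac{\partial V_{t+1}}{\partial f} = -\frac{\partial\pi_t}{\partial q_t}\left(\frac{\partial f(b_t, q_t)}{\partial q_t}\right)^{-1}(1+r),$$
that we replace in the first equation above:
$$\frac{\partial V_t}{\partial b_t} = \frac{\partial\pi_t}{\partial b_t} - \frac{\partial\pi_t}{\partial q_t}\left(\frac{\partial f(b_t, q_t)}{\partial q_t}\right)^{-1}\frac{\partial f(b_t, q_t)}{\partial b_t}.$$
Now we lag (FOC) once and replace:
$$\frac{\partial\pi_{t-1}}{\partial q_{t-1}} + \frac{1}{1+r}\left[\frac{\partial\pi_t}{\partial b_t} - \frac{\partial\pi_t}{\partial q_t}\left(\frac{\partial f}{\partial q_t}\right)^{-1}\frac{\partial f}{\partial b_t}\right]\frac{\partial f(b_{t-1}, q_{t-1})}{\partial q_{t-1}} = 0.$$
Assume $\partial f/\partial q = a_1$ and $\partial f/\partial b = a_2$. We have
$$\frac{\partial\pi_t}{\partial q_t} = \frac{1+r}{a_2}\frac{\partial\pi_{t-1}}{\partial q_{t-1}} + \frac{a_1}{a_2}\frac{\partial\pi_t}{\partial b_t}.$$
This is the Euler equation relating current and past marginal profits.

If, for instance, profit is linear-quadratic in $q_t$ and $b_t$, we have
$$b_0 + b_1q_t + b_2b_t = \frac{1+r}{a_2}(b_0 + b_1q_{t-1} + b_2b_{t-1}) + \frac{a_1}{a_2}(c_0 + c_1q_t + c_2b_t),$$
so that
$$q_{it} = \kappa_0 + \kappa_1q_{i,t-1} + \kappa_2b_{i,t-1} + \kappa_3b_{it} + \alpha_i + \varepsilon_{it},$$
where
$$\kappa_0 = (a_2b_1 - a_1c_1)^{-1}\left[b_0((1+r) - a_2) + a_1c_0\right],\qquad \kappa_1 = (a_2b_1 - a_1c_1)^{-1}\left[(1+r)b_1\right],$$
$$\kappa_2 = (a_2b_1 - a_1c_1)^{-1}\left[(1+r)b_2\right],\qquad \kappa_3 = (a_2b_1 - a_1c_1)^{-1}\left[a_1c_2 - a_2b_2\right].$$

5.1.2 Euler equations and consumption

Consider a two-period model with the following period-to-period budget constraint
$$c_t + A_t = y_t + A_{t-1}(1 + r_t),\quad t = 1, 2,$$
where $c_t$ is consumption at time $t$, $A_t$ is total assets, $y_t$ is wage income, and $r_t$ is the interest rate.

Assume further intertemporally additive preferences:
$$U = u(c_1) + \frac{1}{1+\rho}u(c_2),$$
where $u' > 0$, $u'' < 0$ and $\rho \ge 0$ is the subjective discount rate.

Often-used specification: CES (Constant Elasticity of Substitution)
$$U = -c_1^{-\nu} - \frac{1}{1+\rho}c_2^{-\nu},$$
where $\sigma = 1/(1+\nu)$ is the intertemporal elasticity of substitution.

At the optimum (by replacing the budget constraints in the utility function and optimizing with respect to $A_1$):
$$\frac{\partial U}{\partial A_1} = \frac{\partial u}{\partial c_1}\frac{\partial c_1}{\partial A_1} + \frac{1}{1+\rho}\frac{\partial u}{\partial c_2}\frac{\partial c_2}{\partial A_1} = 0 \iff \frac{\partial u}{\partial c_1} = \frac{1+r}{1+\rho}\frac{\partial u}{\partial c_2}.$$
This is the intertemporal efficiency condition (Hall 1978), and in the CES case we have
$$c_1^{-1/\sigma} = \frac{1+r}{1+\rho}c_2^{-1/\sigma}.$$
Stochastic framework with $u(X) = -\frac{1}{2}(\bar c - X)^2$:
$$\bar c - c_1 = \frac{1+r}{1+\rho}(\bar c - Ec_2),\qquad\text{so that } c_1 = Ec_2 \text{ if } r = \rho.$$
Hall's Euler equation with more than 2 periods reduces to
$$c_{t+1} = c_t + \varepsilon_{t+1},\quad\text{where } \varepsilon_{t+1} \text{ is i.i.d.},$$
which is tested from the equation
$$\Delta c_t = \beta_0 + \beta_1\Delta y_t + \beta_2(y_{t-1} - c_{t-1}) + \varepsilon_t.$$
This is an error-correction model that can be written
$$c_t = \beta_0 + \beta_1y_t + (c_{t-1} - \beta_1y_{t-1}) + \beta_2(y_{t-1} - c_{t-1}) + \varepsilon_t.$$

5.1.3 Long-run relationships in economics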

Long-run relationships are represented by the stationary path of the variable of interest (consumption, capital stock, ...): $y_{t+1}/y_t = \gamma$; if we add a variable $x_t$, $y_{t+1} = \gamma y_t + \beta x_{t+1}$, and the stationary equilibrium path is $y^* = \frac{\beta}{1-\gamma}x^*$.

5.1.3.1 Long-run elasticities

Dynamic models are helpful in computing long-run elasticities. Consider for example the dynamic consumption model
$$\tilde C_{i,t+j} = \gamma\tilde C_{i,t+j-1} + \beta\tilde P_{i,t+j} + u_{i,t+j},$$
where $\tilde C_{i,t+j}$ and $\tilde P_{i,t+j}$ respectively denote the logs of consumption and price. Lagged consumption here accounts for habits. We have
$$\tilde C_{i,t+j} = \gamma^{j+1}\tilde C_{i,t-1} + \gamma^j\beta\tilde P_{it} + \gamma^{j-1}\beta\tilde P_{i,t+1} + \cdots + \gamma\beta\tilde P_{i,t+j-1} + \beta\tilde P_{i,t+j} + u^*_{i,t+j},$$
where $u^*_{i,t+j} = \gamma^ju_{it} + \gamma^{j-1}u_{i,t+1} + \cdots + \gamma u_{i,t+j-1} + u_{i,t+j}$.

Assume we want to compute the change in consumption at time $t+j$ following a permanent change of 1% in price between $t$ and $t+j$:
$$\frac{\partial\tilde C_{i,t+j}}{\partial\tilde P_{it}} + \frac{\partial\tilde C_{i,t+j}}{\partial\tilde P_{i,t+1}} + \cdots + \frac{\partial\tilde C_{i,t+j}}{\partial\tilde P_{i,t+j}} = \beta(\gamma^j + \gamma^{j-1} + \cdots + \gamma + 1).$$
When consumption is stationary (in logs), $|\gamma| < 1$, and the long-run effect of price obtains by taking the limit
$$\lim_{j\to\infty}\sum_{s=0}^{j}\frac{\partial\tilde C_{i,t+j}}{\partial\tilde P_{i,t+s}} = \lim_{j\to\infty}\beta(\gamma^j + \gamma^{j-1} + \cdots + \gamma + 1) = \frac{\beta}{1-\gamma}.$$
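The convergence of the cumulative effect to $\beta/(1-\gamma)$ is easy to check numerically. A plain-Python sketch; the parameter values ($\beta = -0.5$ for the short-run price elasticity, $\gamma = 0.8$ for habit persistence) are hypothetical.

```python
def cumulative_price_effect(beta, gamma, j):
    """beta * (gamma**j + ... + gamma + 1): effect at t + j of a permanent 1% price change from t."""
    return beta * sum(gamma ** s for s in range(j + 1))


def long_run_effect(beta, gamma):
    """Limit beta / (1 - gamma), valid only when |gamma| < 1 (stationary consumption)."""
    if abs(gamma) >= 1:
        raise ValueError("no finite long-run effect unless |gamma| < 1")
    return beta / (1.0 - gamma)


beta, gamma = -0.5, 0.8
path = [cumulative_price_effect(beta, gamma, j) for j in (0, 1, 5, 50)]
lr = long_run_effect(beta, gamma)
```

With these values the short-run elasticity of $-0.5$ grows to a long-run elasticity of $-2.5$: strong habit persistence multiplies the price response by $1/(1-\gamma) = 5$.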

5.1.3.2 Dynamic representations from AR(1) errors

Consider the following Cobb-Douglas production model
$$\log Q_{it} = \beta_1\log N_{it} + \beta_2\log K_{it} + u_{it},$$
where $Q_{it}$ is the output of firm $i$ at time $t$, $N_{it}$ is labor input, $K_{it}$ is the capital stock, and $u_{it}$ is the residual. Assume the latter decomposes into
$$u_{it} = \lambda_t + \alpha_i + v_{it} + \varepsilon_{it},$$
where $\lambda_t$ is a year-specific intercept (industry-wide technological change), $\alpha_i$ is the unobserved firm-specific effect, $\varepsilon_{it}$ is an i.i.d. error component (measurement error), and $v_{it}$ is a productivity shock having an AR(1) representation:
$$v_{it} = \rho v_{i,t-1} + e_{it}.$$
This model has the following dynamic representation:
$$\log Q_{it} = \beta_1\log N_{it} - \rho\beta_1\log N_{i,t-1} + \beta_2\log K_{it} - \rho\beta_2\log K_{i,t-1} + \rho\log Q_{i,t-1} + (\lambda_t - \rho\lambda_{t-1}) + \left[\alpha_i(1-\rho) + e_{it} + \varepsilon_{it} - \rho\varepsilon_{i,t-1}\right],$$
or
$$\log Q_{it} = \pi_1\log N_{it} + \pi_2\log N_{i,t-1} + \pi_3\log K_{it} + \pi_4\log K_{i,t-1} + \pi_5\log Q_{i,t-1} + \lambda^*_t + (\alpha^*_i + \omega_{it}),$$
subject to the restrictions $\pi_2 = -\pi_1\pi_5$ and $\pi_4 = -\pi_3\pi_5$.

Hence, there is an equivalence between a static (short-run) model with serially correlated productivity shocks and a dynamic representation of production output.

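5.2 The dynamic fixed-effect model

Simple dynamic panel-data model:
$$y_{it} = \rho y_{i,t-1} + \alpha_i + \varepsilon_{it},\quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T,$$
where the initial conditions $y_{i0}$, $i = 1, 2, \ldots, N$, are assumed known. We assume $E(\varepsilon_{it}) = 0$ $\forall i, t$; $E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_\varepsilon$ if $i = j$, $t = s$, and 0 otherwise; and $E(\alpha_i\varepsilon_{it}) = 0$ $\forall i, t$.

By continuous substitution:
$$y_{it} = \varepsilon_{it} + \rho\varepsilon_{i,t-1} + \rho^2\varepsilon_{i,t-2} + \cdots + \rho^{t-1}\varepsilon_{i1} + \frac{1-\rho^t}{1-\rho}\alpha_i + \rho^t y_{i0}.$$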

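5.2.1 Bias in the Fixed-Effects estimator

The Within estimator is:
$$\hat\rho = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{it} - \bar y_i)(y_{i,t-1} - \bar y_{i,-1})}{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar y_{i,-1})^2},\qquad \hat\alpha_i = \bar y_i - \hat\rho\,\bar y_{i,-1},$$
where
$$\bar y_i = \frac{1}{T}\sum_{t=1}^{T}y_{it},\qquad \bar y_{i,-1} = \frac{1}{T}\sum_{t=1}^{T}y_{i,t-1},\qquad \bar\varepsilon_i = \frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}.$$
Also,
$$\hat\rho = \rho + \frac{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\varepsilon_{it} - \bar\varepsilon_i)(y_{i,t-1} - \bar y_{i,-1})}{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar y_{i,-1})^2}.$$
This estimator exists if the denominator is $\neq 0$, and it is consistent if the numerator converges to 0.

Numerator:
$$\mathrm{plim}_{N\to\infty}\frac{1}{NT}\sum_{i,t}(y_{i,t-1} - \bar y_{i,-1})(\varepsilon_{it} - \bar\varepsilon_i) = -\,\mathrm{plim}\frac{1}{N}\sum_{i=1}^{N}\bar y_{i,-1}\bar\varepsilon_i,$$
because $\varepsilon_{it}$ is serially uncorrelated and not correlated with $\alpha_i$. We use
$$\bar y_{i,-1} = \frac{1}{T}\sum_{t=1}^{T}y_{i,t-1} = \frac{1}{T}\frac{1-\rho^T}{1-\rho}y_{i0} + \frac{(T-1) - T\rho + \rho^T}{T(1-\rho)^2}\alpha_i + \frac{1}{T}\left[\frac{1-\rho^{T-1}}{1-\rho}\varepsilon_{i1} + \frac{1-\rho^{T-2}}{1-\rho}\varepsilon_{i2} + \cdots + \varepsilon_{i,T-1}\right].$$
We have
$$\mathrm{plim}\frac{1}{N}\sum_{i=1}^{N}\bar y_{i,-1}\bar\varepsilon_i = \mathrm{plim}\frac{1}{N}\sum_{i=1}^{N}\left\{\frac{1}{T}\sum_{t=1}^{T-1}\frac{1-\rho^{T-t}}{1-\rho}\varepsilon_{it}\right\}\left\{\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}\right\} = \frac{\sigma^2_\varepsilon}{T^2}\cdot\frac{(T-1) - T\rho + \rho^T}{(1-\rho)^2}.$$
In a similar manner, we show that
$$\mathrm{plim}\frac{1}{NT}\sum_{i,t}(y_{i,t-1} - \bar y_{i,-1})^2 = \frac{\sigma^2_\varepsilon}{1-\rho^2}\left\{1 - \frac{1}{T} - \frac{2\rho}{(1-\rho)^2}\cdot\frac{(T-1) - T\rho + \rho^T}{T^2}\right\}.$$
Forming the ratio of these two terms, the asymptotic bias is
$$\mathrm{plim}_{N\to\infty}(\hat\rho - \rho) = -\frac{1+\rho}{T-1}\left(1 - \frac{1}{T}\frac{1-\rho^T}{1-\rho}\right)\left\{1 - \frac{2\rho}{(1-\rho)(T-1)}\left[1 - \frac{1-\rho^T}{T(1-\rho)}\right]\right\}^{-1} = O(1/T).$$
In the transformed model
$$(y_{it} - \bar y_i) = \rho(y_{i,t-1} - \bar y_{i,-1}) + (\varepsilon_{it} - \bar\varepsilon_i),$$
the explanatory variable is correlated with the residual, and the correlation is of order $1/T$. Hence, the Fixed-Effects estimator is biased in the usual case where $N$ is large and $T$ is small.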

Table 5.1: Asymptotic bias in Fixed-Effects DPD estimator

ρ      T     Bias      Percent
0.2    6     -0.2063   -103.1693
0.2    8     -0.1539   -76.9597
0.2    10    -0.1226   -61.3139
0.2    20    -0.0607   -30.3541
0.2    40    -0.0302   -15.0913
0.5    6     -0.2756   -55.1282
0.5    8     -0.2049   -40.9769
0.5    10    -0.1622   -32.4421
0.5    20    -0.0785   -15.6977
0.5    40    -0.0384   -7.6819
0.7    6     -0.3307   -47.2392
0.7    8     -0.2479   -35.4084
0.7    10    -0.1966   -28.0912
0.7    20    -0.0938   -13.3955
0.7    40    -0.0449   -6.4114
0.9    6     -0.3939   -43.7633
0.9    8     -0.3017   -33.5179
0.9    10    -0.2432   -27.0248
0.9    20    -0.1196   -13.2934
0.9    40    -0.0563   -6.2561

Percent: bias as a percentage of ρ (100 × Bias/ρ).

5.2.

79

THE DYNAMIC FIXED-EFFECT MODEL
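The bias formula above can be evaluated directly. The short sketch below implements that expression and reproduces the entries of Table 5.1. It is an illustration, not code from the course. The function name `nickell_bias` is our own, and the formula assumes $|\gamma| < 1$.

```python
def nickell_bias(gamma: float, T: int) -> float:
    """plim_{N->inf} (gamma_hat - gamma) for the Within estimator of
    y_it = gamma * y_{i,t-1} + alpha_i + eps_it, with T fixed and |gamma| < 1."""
    # a = 1 - (1/T) * (1 - gamma^T) / (1 - gamma)
    a = 1.0 - (1.0 - gamma ** T) / (T * (1.0 - gamma))
    numerator = -(1.0 + gamma) / (T - 1) * a
    denominator = 1.0 - 2.0 * gamma / ((1.0 - gamma) * (T - 1)) * a
    return numerator / denominator


# Rebuild the rows of Table 5.1: bias and bias as a percentage of gamma
for gamma in (0.2, 0.5, 0.7, 0.9):
    for T in (6, 8, 10, 20, 40):
        b = nickell_bias(gamma, T)
        print(f"gamma={gamma:.1f}  T={T:2d}  bias={b:8.4f}  percent={100 * b / gamma:9.4f}")
```

The bias is always negative, and it shrinks roughly like $1/T$ as the table shows.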

5.2.2 Instrumental-variable estimation

Only way to obtain a consistent estimator of $\gamma$ when $T$ is fixed (small).

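A different procedure to eliminate individual effects: use first differencing instead of the Within transformation:

$$(y_{it}-y_{i,t-1}) = \gamma(y_{i,t-1}-y_{i,t-2}) + (\varepsilon_{it}-\varepsilon_{i,t-1}) \iff \Delta y_{it} = \gamma\,\Delta y_{i,t-1} + \Delta\varepsilon_{it},$$

and in vector form:

$$\Delta y_i = \gamma\,\Delta y_{i,-1} + \Delta\varepsilon_i, \quad i = 1,2,\dots,N.$$

In the model above, $y_{i,t-1}$ is correlated by construction with $\varepsilon_{i,t-1}$! We need instruments that are uncorrelated with $(\varepsilon_{it}-\varepsilon_{i,t-1})$ but correlated with $(y_{i,t-1}-y_{i,t-2})$. The only possibility in a single-equation framework with no other explanatory variables is to use values of the dependent variable.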

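Because of the autoregressive nature of the model, instruments built from future values of $y_{it}$ are not feasible, because $y_{it}$ is a recursive function of $\varepsilon_{it}, \varepsilon_{i,t-1}, \dots, \varepsilon_{i1}, \alpha_i, y_{i0}$. As for lagged dependent variables, we can use either $y_{i,t-2}$ or $(y_{i,t-2}-y_{i,t-3})$. Both are orthogonal to the differenced error:

$$E[y_{i,t-2}(\varepsilon_{it}-\varepsilon_{i,t-1})] = E(y_{i,t-2}\varepsilon_{it}) - E(y_{i,t-2}\varepsilon_{i,t-1}) = 0,$$
$$E[(y_{i,t-2}-y_{i,t-3})(\varepsilon_{it}-\varepsilon_{i,t-1})] = E[y_{i,t-2}(\varepsilon_{it}-\varepsilon_{i,t-1})] - E[y_{i,t-3}(\varepsilon_{it}-\varepsilon_{i,t-1})] = 0.$$

Both are also correlated with the regressor, since $\varepsilon_{i,t-2}$ enters both factors (the innovation terms alone contribute $0 - E(\varepsilon_{i,t-2}^2) = -\sigma_\varepsilon^2$):

$$E[y_{i,t-2}(y_{i,t-1}-y_{i,t-2})] \neq 0, \qquad E[(y_{i,t-2}-y_{i,t-3})(y_{i,t-1}-y_{i,t-2})] \neq 0.$$

Instrumental-variable estimators that are consistent when $N$ and/or $T\to\infty$:

$$\hat\gamma = \frac{\sum_{i=1}^N\sum_{t=3}^T (y_{it}-y_{i,t-1})(y_{i,t-2}-y_{i,t-3})}{\sum_{i=1}^N\sum_{t=3}^T (y_{i,t-1}-y_{i,t-2})(y_{i,t-2}-y_{i,t-3})}$$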

or

$$\hat\gamma = \frac{\sum_{i=1}^N\sum_{t=3}^T (y_{it}-y_{i,t-1})\,y_{i,t-2}}{\sum_{i=1}^N\sum_{t=3}^T (y_{i,t-1}-y_{i,t-2})\,y_{i,t-2}}.$$

Conclusion: with the Within transformation on a dynamic model, an endogeneity bias occurs for fixed $T$ even though $\alpha_i$ is eliminated. This is because the $Q$ operator used introduces the error mean $\bar\varepsilon_i$, which is correlated by construction with the current explanatory variable.
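A small simulation can illustrate the contrast between the two approaches. The sketch below uses made-up settings: $\alpha_i, \varepsilon_{it} \sim N(0,1)$, $\gamma = 0.5$, $N = 5000$, $T = 6$, and a burn-in so that $y_{i0}$ is roughly stationary. It compares the Within estimator with the second IV estimator above, which uses $y_{i,t-2}$ as instrument for $t = 3,\dots,T$. In the literature this IV estimator is usually attributed to Anderson and Hsiao. All function names are illustrative.

```python
import numpy as np


def simulate_panel(N, T, gamma, rng, burn=50):
    """Draw y_it = gamma*y_{i,t-1} + alpha_i + eps_it with alpha_i, eps_it ~ N(0, 1).
    Returns an (N, T+1) array whose columns are t = 0, 1, ..., T."""
    alpha = rng.normal(size=N)
    y = np.zeros(N)
    out = np.empty((N, T + 1))
    for t in range(-burn, T + 1):          # burn-in makes y_i0 close to stationary
        y = gamma * y + alpha + rng.normal(size=N)
        if t >= 0:
            out[:, t] = y
    return out


def within_estimator(y):
    """Within (fixed-effects) estimator of gamma, using t = 1..T."""
    y_t, y_lag = y[:, 1:], y[:, :-1]
    y_t = y_t - y_t.mean(axis=1, keepdims=True)
    y_lag = y_lag - y_lag.mean(axis=1, keepdims=True)
    return np.sum(y_t * y_lag) / np.sum(y_lag ** 2)


def anderson_hsiao(y):
    """First-difference IV estimator with instrument y_{i,t-2}, t = 3..T."""
    dy = np.diff(y, axis=1)                # dy[:, k] = y_{k+1} - y_k
    dy_t = dy[:, 2:]                       # Delta y_it,      t = 3..T
    dy_lag = dy[:, 1:-1]                   # Delta y_{i,t-1}, t = 3..T
    z = y[:, 1:-2]                         # y_{i,t-2},       t = 3..T
    return np.sum(dy_t * z) / np.sum(dy_lag * z)


rng = np.random.default_rng(0)
gamma, N, T = 0.5, 5000, 6
y = simulate_panel(N, T, gamma, rng)
g_within, g_iv = within_estimator(y), anderson_hsiao(y)
print(f"true gamma = {gamma}, Within = {g_within:.3f}, IV = {g_iv:.3f}")
```

With these settings the Within estimate should sit near $0.5 - 0.28$, in line with Table 5.1 for $T = 6$. The IV estimate should fluctuate around 0.5.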

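Consider now a more general model:

$$y_{it} = \gamma y_{i,t-1} + x_{it}\beta + z_i\rho + \alpha_i + \varepsilon_{it}.$$

IV estimation proceeds as follows.

Step 1. First-difference the model, to get

$$(y_{it}-y_{i,t-1}) = \gamma(y_{i,t-1}-y_{i,t-2}) + (x_{it}-x_{i,t-1})\beta + \varepsilon_{it}-\varepsilon_{i,t-1}.$$

Use $y_{i,t-2}$ or $(y_{i,t-2}-y_{i,t-3})$ as instrument for $(y_{i,t-1}-y_{i,t-2})$ and estimate $\gamma, \beta$ with the IV procedure.

Step 2. Substitute $\hat\gamma$ and $\hat\beta$ in the Between equation

$$\bar y_i - \hat\gamma\,\bar y_{i,-1} - \bar x_i\hat\beta = z_i\rho + \alpha_i + \bar\varepsilon_i, \quad i = 1,2,\dots,N,$$

and estimate $\rho$ by OLS.

Step 3. Estimate the variance components:

$$\hat\sigma_\varepsilon^2 = \frac{1}{2N(T-1)}\sum_{i=1}^N\sum_{t=2}^T\left[(y_{it}-y_{i,t-1}) - \hat\gamma(y_{i,t-1}-y_{i,t-2}) - (x_{it}-x_{i,t-1})\hat\beta\right]^2,$$

$$\hat\sigma_\alpha^2 = \frac{1}{N}\sum_{i=1}^N\left[\bar y_i - \hat\gamma\,\bar y_{i,-1} - z_i\hat\rho - \bar x_i\hat\beta\right]^2 - \frac{1}{T}\,\hat\sigma_\varepsilon^2.$$

Consistency of the estimator:
- the IV estimators of $\gamma$, $\beta$ and $\sigma_\varepsilon^2$ are consistent when $N$ or $T\to\infty$;
- the IV estimators of $\rho$ and $\sigma_\alpha^2$ are consistent only when $N\to\infty$, and inconsistent when $N$ is fixed and $T\to\infty$ (the Between regression in Step 2 uses only $N$ observations).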

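5.3 The Random-effects model

We now treat $\alpha_i$ as a random variable, in addition to $\varepsilon_{it}$. As for static models, $\alpha_i$ is not eliminated, but it is correlated by construction with the lagged dependent variable $y_{i,t-1}$.

5.3.1 Bias in the ML estimator

In the simple model $y_{it} = \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}$, the MLE is equivalent to the OLS estimator:

$$\hat\gamma = \frac{\sum_{i=1}^N\sum_{t=1}^T y_{it}\,y_{i,t-1}}{\sum_{i=1}^N\sum_{t=1}^T y_{i,t-1}^2} = \gamma + \frac{\sum_{i=1}^N\sum_{t=1}^T(\alpha_i+\varepsilon_{it})\,y_{i,t-1}}{\sum_{i=1}^N\sum_{t=1}^T y_{i,t-1}^2}.$$

We show that

$$\operatorname{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T(\alpha_i+\varepsilon_{it})\,y_{i,t-1} = \frac{1}{T}\,\frac{1-\gamma^T}{1-\gamma}\operatorname{Cov}(y_{i0},\alpha_i) + \frac{\sigma_\alpha^2}{(1-\gamma)^2}\,\frac{(T-1)-T\gamma+\gamma^T}{T}$$

and

$$\operatorname{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T y_{i,t-1}^2 = \frac{1-\gamma^{2T}}{T(1-\gamma^2)}\operatorname{plim}_{N\to\infty}\frac{\sum_{i=1}^N y_{i0}^2}{N} + \frac{\sigma_\alpha^2}{(1-\gamma)^2}\,\frac{1}{T}\left(T - 2\,\frac{1-\gamma^T}{1-\gamma} + \frac{1-\gamma^{2T}}{1-\gamma^2}\right)$$
$$\qquad + \frac{2}{T(1-\gamma)}\left(\frac{1-\gamma^T}{1-\gamma} - \frac{1-\gamma^{2T}}{1-\gamma^2}\right)\operatorname{Cov}(y_{i0},\alpha_i) + \frac{\sigma_\varepsilon^2}{T(1-\gamma^2)^2}\left[(T-1) - T\gamma^2 + \gamma^{2T}\right].$$

The bias depends on the behavior of the initial conditions $y_{i0}$ (constant, or generated as $y_{it}$).

5.3.2 An equivalent representation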

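We consider a more general model

$$y_{it} = \gamma y_{i,t-1} + x_{it}\beta + z_i\rho + u_{it}, \qquad u_{it} = \alpha_i + \varepsilon_{it},$$

with the following assumptions:

$$|\gamma| < 1, \quad E(\alpha_i) = E(\varepsilon_{it}) = 0,$$
$$E(\alpha_i x_{it}) = 0, \quad E(\alpha_i z_i) = 0, \quad E(\alpha_i\varepsilon_{it}) = 0,$$
$$E(\alpha_i\alpha_j) = \sigma_\alpha^2 \text{ if } i = j, \ 0 \text{ otherwise}; \qquad E(\varepsilon_{it}\varepsilon_{js}) = \sigma_\varepsilon^2 \text{ if } i = j,\ t = s, \ 0 \text{ otherwise}.$$

We can also write

$$w_{it} = \gamma w_{i,t-1} + x_{it}\beta + z_i\rho + \varepsilon_{it}, \qquad y_{it} = w_{it} + \eta_i,$$

where $\eta_i = \alpha_i/(1-\gamma)$, $E\eta_i = 0$, $\operatorname{Var}(\eta_i) = \sigma_\eta^2 = \sigma_\alpha^2/(1-\gamma)^2$, and the dynamic process $\{w_{it}\}$ is independent from the individual effect $\eta_i$.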
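A quick numerical check of this equivalence can be useful. The sketch below uses arbitrary, made-up values for $\gamma$, $\beta$, $\rho$ and the data. It builds $y_{it}$ once from the first equation, and once as $w_{it} + \eta_i$ with the same shocks and with $w_{i0} = y_{i0} - \eta_i$, then checks that the two paths coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 5
gamma, beta, rho = 0.6, 1.5, -0.8        # illustrative parameter values
x = rng.normal(size=(N, T + 1))          # x_it, t = 0..T (t = 0 unused)
z = rng.normal(size=N)                   # time-invariant z_i
alpha = rng.normal(size=N)               # individual effects alpha_i
eps = rng.normal(size=(N, T + 1))        # shocks eps_it
eta = alpha / (1 - gamma)                # eta_i = alpha_i / (1 - gamma)

# y_it = gamma*y_{i,t-1} + x_it*beta + z_i*rho + alpha_i + eps_it
y_direct = np.empty((N, T + 1))
y_direct[:, 0] = rng.normal(size=N)      # arbitrary initial values y_i0
for t in range(1, T + 1):
    y_direct[:, t] = gamma * y_direct[:, t - 1] + x[:, t] * beta + z * rho + alpha + eps[:, t]

# w_it = gamma*w_{i,t-1} + x_it*beta + z_i*rho + eps_it, then y_it = w_it + eta_i
w = np.empty((N, T + 1))
w[:, 0] = y_direct[:, 0] - eta
for t in range(1, T + 1):
    w[:, t] = gamma * w[:, t - 1] + x[:, t] * beta + z * rho + eps[:, t]
y_latent = w + eta[:, None]

print(np.max(np.abs(y_direct - y_latent)))   # zero up to rounding error
```

The identity $w_{it} + \eta_i = \gamma(w_{i,t-1} + \eta_i) + x_{it}\beta + z_i\rho + (1-\gamma)\eta_i + \varepsilon_{it}$ is what makes the two paths agree.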

5.3.3 The role of initial conditions

The two equivalent specifications of the model are:

(A) $y_{it} = \gamma y_{i,t-1} + x_{it}\beta + z_i\rho + \alpha_i + \varepsilon_{it}$;

(B) $w_{it} = \gamma w_{i,t-1} + x_{it}\beta + z_i\rho + \varepsilon_{it}$, $\quad y_{it} = w_{it} + \eta_i$.

In model (A), $y_{it}$ is driven by unobserved characteristics $\alpha_i$, different across units, in addition to $x_{it}$ and $z_i$.

In model (B), the dynamic process $w_{it}$ is independent from the individual effects $\eta_i$. Conditional on exogenous $x_{it}$ and $z_i$, the $w_{it}$ are driven by identical processes with i.i.d. shocks $\varepsilon_{it}$. But the observed value $y_{it}$ is shifted by the individual-specific effect $\eta_i$. Possible interpretation: $w_{it}$ is a latent variable, and $\eta_i$ is a time-invariant measurement error.

The two processes are observationally equivalent because $y_{it}$ is observed while $w_{it}$ is not. But assumptions on (or knowledge of) the initial conditions may help to distinguish between both processes.

Different cases:
- 1/ $y_{i0}$ fixed;
- 2/ $y_{i0}$ random;
  - 2.a/ $y_{i0}$ independent of $\alpha_i$, with $E(y_{i0}) = \mu_{y_0}$ and $\operatorname{Var}(y_{i0}) = \sigma_{y_0}^2$;
  - 2.b/ $y_{i0}$ correlated with $\alpha_i$, with $\operatorname{Cov}(y_{i0},\alpha_i) = \phi\,\sigma_{y_0}^2$;
- 3/ $w_{i0}$ fixed;
- 4/ $w_{i0}$ random;
  - 4.a/ $w_{i0}$ random with common mean $\mu_w$ and variance $\sigma_\varepsilon^2/(1-\gamma^2)$ (stationarity assumption);
  - 4.b/ $w_{i0}$ random with common mean $\mu_w$ and arbitrary variance $\sigma_{w_0}^2$;
  - 4.c/ $w_{i0}$ random with mean $\theta_{i0}$ and variance $\sigma_\varepsilon^2/(1-\gamma^2)$ (stationarity assumption);
  - 4.d/ $w_{i0}$ random with mean $\theta_{i0}$ and arbitrary variance $\sigma_{w_0}^2$.

See Appendix 4 for a derivation of Maximum Likelihood estimators in each case.

5.3.4 Possible inconsistency of GLS

In cases 1/ and 2.a/ ($y_{i0}$ fixed, or random but independent of $\alpha_i$): when $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are known, maximizing the log-likelihood wrt. $\gamma$, $\beta$ and $\rho$ yields the GLS estimator. When $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are unknown, feasible GLS applies by using consistent estimates of these variances in $V_T$.

Other cases: the estimators for $\gamma$ and $\beta$ are consistent when $T\to\infty$, because GLS converges to Within. When $N\to\infty$ and $T$ is fixed, GLS is inconsistent in cases where initial values are correlated with individual effects.

5.3.5 Example: The Balestra-Nerlove study

Seminal paper on Dynamic Panel Data models (1966). It studies household demand for natural gas in the US, including a/ the demand due to replacement of gas appliances, and b/ the demand due to increases in the stock of appliances.

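Table 5.2: Properties of the MLE for dynamic panel data models

| Case | Parameters | $N$ fixed, $T\to\infty$ | $T$ fixed, $N\to\infty$ |
|---|---|---|---|
| Case 1: $y_{i0}$ fixed | $\gamma, \beta, \sigma_\varepsilon^2$ | Consistent | Consistent |
| | $\rho, \sigma_\alpha^2$ | Inconsistent | Consistent |
| Case 2.a: $y_{i0}$ random, ind. of $\alpha_i$ | $\gamma, \beta, \sigma_\varepsilon^2$ | Consistent | Consistent |
| | $\mu_{y_0}, \rho, \sigma_\alpha^2, \sigma_{y_0}^2$ | Inconsistent | Consistent |
| Case 2.b: $y_{i0}$ random, correlated with $\alpha_i$ | $\gamma, \beta, \sigma_\varepsilon^2$ | Consistent | Consistent |
| | $\mu_{y_0}, \rho, \sigma_\alpha^2, \sigma_{y_0}^2, \phi$ | Inconsistent | Consistent |
| Case 3: $w_{i0}$ fixed | $\gamma, \beta, \sigma_\varepsilon^2$ | Consistent | Inconsistent |
| | $w_{i0}, \rho, \sigma_\eta^2$ | Inconsistent | Inconsistent |
| Case 4.a: $w_{i0}$ random, mean $\mu_w$, variance $\sigma_\varepsilon^2/(1-\gamma^2)$ | $\gamma, \beta, \sigma_\varepsilon^2$ | Consistent | Consistent |
| | $\mu_w, \rho, \sigma_\eta^2$ | Inconsistent | Consistent |
| Case 4.b: $w_{i0}$ random, mean $\mu_w$, variance $\sigma_{w_0}^2$ | $\gamma, \beta, \sigma_\varepsilon^2$ | Consistent | Consistent |
| | $\mu_w, \rho, \sigma_\eta^2, \sigma_{w_0}^2$ | Inconsistent | Consistent |
| Case 4.c: $w_{i0}$ random, mean $\theta_{i0}$, variance $\sigma_\varepsilon^2/(1-\gamma^2)$ | $\gamma, \beta, \sigma_\varepsilon^2$ | Consistent | Inconsistent |
| | $\theta_{i0}, \rho, \sigma_\eta^2$ | Inconsistent | Inconsistent |
| Case 4.d: $w_{i0}$ random, mean $\theta_{i0}$, variance $\sigma_{w_0}^2$ | $\gamma, \beta, \sigma_\varepsilon^2$ | Consistent | Inconsistent |
| | $\theta_{i0}, \sigma_\eta^2, \sigma_{w_0}^2$ | Inconsistent | Inconsistent |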

Demand system:

$$G_{it}^* = G_{it} - (1-r)G_{i,t-1},$$
$$F_{it}^* = F_{it} - (1-r)F_{i,t-1},$$
$$F_{it} = a_0 + a_1N_{it} + a_2I_{it},$$
$$G_{it}^* = b_0 + b_1P_{it} + b_2F_{it}^*,$$

where $G_{it}^*$ and $G_{it}$ are respectively the new demand and the actual demand for gas at time $t$ from unit $i$, and $r$ is the appliances depreciation rate. $F_{it}^*$ and $F_{it}$ are respectively the new and actual demand for all types of fuel, $N_{it}$ is total population, $I_{it}$ is per-head income, and $P_{it}$ is the relative price of gas.

Solving the system, we have the equation to be estimated:

$$G_{it} = \beta_0 + \beta_1P_{it} + \beta_2\Delta N_{it} + \beta_3N_{i,t-1} + \beta_4\Delta I_{it} + \beta_5I_{i,t-1} + \beta_6G_{i,t-1},$$

where $\Delta N_{it} = N_{it} - N_{i,t-1}$, $\Delta I_{it} = I_{it} - I_{i,t-1}$, and $\beta_6 = 1 - r$.

Estimation procedures: OLS, Within (LSDV) and GLS (with the assumption that initial conditions $G_{i0}$ are fixed, case 1/).

In accordance with the theory, $\gamma$ (here, $\beta_6$) is biased upward for OLS and downward for Within.
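The opposite directions of the two biases can be reproduced on simulated data. This is not the Balestra-Nerlove data: the sketch uses a made-up design with $y_{it} = 0.5\,y_{i,t-1} + \alpha_i + \varepsilon_{it}$, $\alpha_i, \varepsilon_{it} \sim N(0,1)$, $N = 3000$, $T = 10$, and a burn-in for the initial conditions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, gamma, burn = 3000, 10, 0.5, 50

# Simulate y_it = gamma*y_{i,t-1} + alpha_i + eps_it with random alpha_i ~ N(0, 1)
alpha = rng.normal(size=N)
y = np.zeros(N)
panel = np.empty((N, T + 1))                 # columns t = 0..T
for t in range(-burn, T + 1):
    y = gamma * y + alpha + rng.normal(size=N)
    if t >= 0:
        panel[:, t] = y
y_t, y_lag = panel[:, 1:], panel[:, :-1]

# Pooled OLS with an intercept: alpha_i stays in the error and is positively
# correlated with y_{i,t-1}, so gamma is over-estimated
X = np.column_stack([np.ones(y_lag.size), y_lag.ravel()])
g_ols = np.linalg.lstsq(X, y_t.ravel(), rcond=None)[0][1]

# Within: demeaning removes alpha_i but correlates the regressor with the error mean
yt_d = y_t - y_t.mean(axis=1, keepdims=True)
ylag_d = y_lag - y_lag.mean(axis=1, keepdims=True)
g_within = np.sum(yt_d * ylag_d) / np.sum(ylag_d ** 2)

print(f"true {gamma}: pooled OLS {g_ols:.3f} (upward), Within {g_within:.3f} (downward)")
```

In this design the true value lies between the two estimates, as in the Balestra-Nerlove results.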

Table 5.3: Parameter estimates, Balestra-Nerlove model

| Parameter | OLS | Within | GLS |
|---|---|---|---|
| $\beta_0$ (Intercept) | -3.650 (3.316) | | -4.091 (11.544) |
| $\beta_1$ ($P_{it}$) | -0.0451(*) (0.027) | -0.2026 (0.0532) | -0.0879(*) (0.0468) |
| $\beta_2$ ($\Delta N_{it}$) | 0.0174(*) (0.0093) | -0.0135 (0.0215) | -0.00122 (0.0190) |
| $\beta_3$ ($N_{i,t-1}$) | 0.00111(**) (0.00041) | 0.0327(**) (0.0046) | 0.00360(**) (0.00129) |
| $\beta_4$ ($\Delta I_{it}$) | 0.0183(**) (0.0080) | 0.0131 (0.0084) | 0.0170(**) (0.0080) |
| $\beta_5$ ($I_{i,t-1}$) | 0.00326 (0.00197) | 0.0044 (0.0101) | 0.00354 (0.00622) |
| $\beta_6$ ($G_{i,t-1}$) | 1.010(**) (0.014) | 0.6799(**) (0.0633) | 0.9546(**) (0.0372) |

Notes. N = 36, T = 11. Standard errors are in parentheses. (*) and (**): parameter significant at the 10% and 5% level respectively.

Part II
Generalized Method of Moments estimation

Chapter 6
The GMM estimator

Generalized Method of Moments: an efficient way to obtain consistent parameter estimates under mild conditions on the model. It is very popular in estimating structural economic models, as it requires far fewer conditions on model disturbances than Maximum Likelihood. Another important advantage: it is easy to obtain parameter estimates that are robust to heteroskedasticity of unknown form.

6.1 Moment conditions and the method of moments

6.1.1 Moment conditions

Consider a sample of size $N$, $\{x_i, i = 1,2,\dots,N\}$, from which one wishes to estimate a $p\times1$ vector $\theta$ whose true value is $\theta_0$. Note: the notation above is very general; $x_i$ will typically include dependent (endogenous) and explanatory (exogenous, endogenous) variables.

Let $f(x_i,\theta)$ denote a $q\times1$ function whose expectation $E[f(x_i,\theta)]$ exists and is finite. Moment conditions are then defined as

$$E[f(x_i,\theta_0)] = 0.$$
6.1.2 Example: Linear regression model

Consider the linear model

$$y_i = x_i'\beta_0 + u_i, \quad i = 1,2,\dots,N,$$

where $\beta_0$ is the true value of the parameter vector and $u_i$ is the error term. A common assumption is $E(u_i|x_i) = 0 \Leftrightarrow E(y_i|x_i) = x_i'\beta_0$, and from the Law of Iterated Expectations:

$$E(x_iu_i) = E[E(x_iu_i|x_i)] = E[x_iE(u_i|x_i)] = 0.$$

In terms of the definition above, $\theta = \beta$ and $f((x_i,y_i),\theta) = x_i(y_i - x_i'\beta)$. Moment conditions are then

$$E(x_iu_i) = E[x_i(y_i - x_i'\beta_0)] = 0.$$

Note that here $p = q$: there are as many moment conditions as parameters to estimate.

Suppose now we do not assume $E(u_i|x_i) = 0$, but instead that $E(z_iu_i) = 0$. Vector $z_i$ is $q\times1$ and would consist of instruments such that

$$E(z_iu_i) = E[z_i(y_i - x_i'\beta_0)] = 0, \quad\text{or}\quad f[(x_i,y_i,z_i),\theta] = z_i(y_i - x_i'\beta).$$

There are $q$ moment equations (as many as there are instruments) and $p$ parameters to estimate. Hence, the identification condition is $q \ge p$.

6.1.3 Example: Gamma distribution

A sample $\{x_i, i = 1,2,\dots,N\}$ is drawn from a Gamma distribution $\Gamma(a,b)$ with true values $a_0$ and $b_0$. Relationship between the parameters and the first two moments of the distribution:

$$E(x_i) = \frac{a_0}{b_0}, \qquad E[x_i - E(x_i)]^2 = \frac{a_0}{b_0^2}.$$

In our notation in the definition above, $\theta = (a,b)'$ and

$$f(x_i,\theta) = \left[x_i - \frac{a}{b},\ \left(x_i - \frac{a}{b}\right)^2 - \frac{a}{b^2}\right]',$$

so that $E[f(x_i,\theta_0)] = 0$.

6.1.4 Method of moments estimation

How do we estimate $\theta$ using the moment conditions given above? In the case where $p = q$ (as many conditions as parameters), we could solve $E[f(x_i,\theta_0)] = 0$ for $\theta_0$. But $E[f(\cdot)]$ is unknown, whereas the function values $f(x_i,\theta)$ can be computed $\forall\theta, \forall i$. Also, sample moments of the function $f(\cdot)$ can be computed:

$$f_N(\theta) = \frac{1}{N}\sum_{i=1}^N f(x_i,\theta).$$

Basic idea of method of moments estimation: if $E(f)$ is close to $f_N$ (population moments close to empirical moments), then $\hat\theta_N$ is a convenient estimate for $\theta_0$, where $f_N(\hat\theta_N) = 0$:

$$0 = E[f(\theta_0)] \approx f_N(\hat\theta_N) \;\Rightarrow\; \theta_0 \approx \hat\theta_N.$$

Two important conditions need to hold for method of moments estimation to be valid: a) $E(f)$ is adequately approximated by $f_N$; b) the moment conditions can be solved for $\hat\theta_N$.
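For the Gamma example of 6.1.3, solving the two sample moment conditions gives a closed form: $\hat b = \bar x / s^2$ and $\hat a = \bar x^2 / s^2$, where $\bar x$ is the sample mean and $s^2$ the sample variance. The sketch below checks this on simulated data with made-up values $a_0 = 3$ and $b_0 = 2$. Note that NumPy's `Generator.gamma(shape, scale)` uses a scale parameter, so we pass `scale = 1/b`.

```python
import numpy as np


def gamma_mom(x):
    """Method-of-moments estimates (a, b) for Gamma(a, b) with mean a/b and variance a/b**2."""
    m = x.mean()
    v = np.mean((x - m) ** 2)      # sample analogue of E[x - E(x)]^2
    b_hat = m / v
    a_hat = m * b_hat              # = m**2 / v
    return a_hat, b_hat


rng = np.random.default_rng(3)
a0, b0 = 3.0, 2.0
x = rng.gamma(shape=a0, scale=1.0 / b0, size=200_000)   # NumPy uses scale = 1/b
a_hat, b_hat = gamma_mom(x)

# Just-identified case: both sample moment conditions hold exactly at the estimates
f1 = np.mean(x - a_hat / b_hat)
f2 = np.mean((x - a_hat / b_hat) ** 2 - a_hat / b_hat ** 2)
print(a_hat, b_hat, f1, f2)
```

Because $p = q = 2$, the sample moments $f_N(\hat\theta_N)$ are zero up to rounding error. Condition b) above holds here in closed form.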


Example: linear regression. Sample moment conditions are

$$\frac{1}{N}\sum_{i=1}^N x_i\hat u_i = \frac{1}{N}\sum_{i=1}^N x_i(y_i - x_i'\hat\beta_N) = 0,$$

and solving for $\hat\beta_N$ yields

$$\hat\beta_N = \left(\sum_{i=1}^N x_ix_i'\right)^{-1}\sum_{i=1}^N x_iy_i.$$

6.1.5 Example: Poisson counting model

Poisson process: the dependent variable is discrete (number of events, etc.). Restriction: the mean of the distribution is equal to the variance.

Assumption: the dependent variables $y_1, y_2, \dots, y_N$ are distributed according to independent Poisson distributions, with parameters $\lambda_1, \lambda_2, \dots, \lambda_N$ respectively:

$$\operatorname{Prob}[y_i = r] = \exp(-\lambda_i)\,\frac{\lambda_i^r}{r!}.$$

We assume the $\lambda_i$'s depend on explanatory variables by a log-linear relationship:

$$\log\lambda_i = \beta_0 + \sum_{j=1}^p\beta_jx_{ij}.$$

The likelihood of the Poisson model is

$$L = \prod_{i=1}^N\frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!} = \exp\left[-\sum_{i=1}^N\lambda_i + \beta_0\sum_{i=1}^N y_i + \sum_{j=1}^p\beta_j\sum_{i=1}^N x_{ij}y_i\right]\left(\prod_{i=1}^N y_i!\right)^{-1}.$$

Let us consider the following sample moments:

$$T_0 = \sum_{i=1}^N y_i, \qquad T_j = \sum_{i=1}^N x_{ij}y_i, \quad j = 1,\dots,p,$$

and we use the fact that

$$\frac{\partial\lambda_i}{\partial\beta_0} = \lambda_i \quad\text{and}\quad \frac{\partial\lambda_i}{\partial\beta_j} = x_{ij}\lambda_i.$$

If we set the derivatives of $\log L$ wrt. $\beta_0$ and the $\beta_j$'s to 0, we get

$$T_0 = \sum_{i=1}^N\hat\lambda_i, \qquad T_j = \sum_{i=1}^N x_{ij}\hat\lambda_i, \quad j = 1,\dots,p,$$

where $\hat\lambda_i = \exp(\hat\beta_0 + \sum_{j=1}^p\hat\beta_jx_{ij})$. Hence, we match the sample moments $T_0$ and $T_j$ to the theoretical moments $\sum_{i=1}^N\exp(\hat\beta_0 + \sum_{j=1}^p\hat\beta_jx_{ij})$ and $\sum_{i=1}^N x_{ij}\exp(\hat\beta_0 + \sum_{j=1}^p\hat\beta_jx_{ij})$ respectively. We have $p+1$ such matching conditions for $p+1$ parameters.
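These $p+1$ matching conditions have no closed-form solution, but they can be solved numerically. The sketch below uses Newton's method, which is our choice here since the text does not prescribe an algorithm. The data are simulated with made-up coefficients, and `poisson_moments` is a hypothetical helper name. Stacking a column of ones with the $x_{ij}$ in a matrix `X`, the conditions read $\sum_i x_i(y_i - \exp(x_i'\beta)) = 0$.

```python
import numpy as np


def poisson_moments(X, y, tol=1e-10, max_iter=100):
    """Solve sum_i x_i * (y_i - exp(x_i'beta)) = 0 by Newton's method.
    X must contain a leading column of ones (its condition is T0 = sum of lambda_i)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        lam = np.exp(X @ beta)                 # lambda_i = exp(beta_0 + sum_j beta_j x_ij)
        g = X.T @ (y - lam)                    # (T_0, T_1, ..., T_p) minus fitted moments
        H = X.T @ (X * lam[:, None])           # minus the Jacobian of g
        step = np.linalg.solve(H, g)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(4)
N = 10_000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = poisson_moments(X, y)
print(beta_hat, X.T @ y, X.T @ np.exp(X @ beta_hat))
```

At the solution, the printed sample moments $T_0, T_1, T_2$ and their fitted counterparts coincide. These are exactly the maximum likelihood first-order conditions.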

6.1.6 Comments

Note the difference between the Method of Moments philosophy and the usual estimation criteria. For Maximum Likelihood and Least Squares, we maximize (minimize) a criterion:

$$\hat\theta = \arg\max_\theta\log L(\theta) \ \text{(MLE)}, \qquad \hat\theta = \arg\min_\theta\frac{1}{N}\sum_{i=1}^N[y_i - f(x_i,\theta)]^2 \ \text{(LS)},$$

whereas here, we start from the first-order conditions and solve the system for $\theta$.

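Example: Instrumental Variable estimation

We could consider minimizing the IV criterion wrt. $\beta$:

$$\hat\beta = \arg\min_\beta\,(Y - X\beta)'Z(Z'Z)^{-1}Z'(Y - X\beta),$$

where $Z$ is an $N\times q$ matrix of instruments. Alternatively, in the just-identified case $q = p$, we could start from the FOC:

$$\frac{1}{N}\sum_{i=1}^N z_i\hat u_i = \frac{1}{N}\sum_{i=1}^N z_i(y_i - x_i'\hat\beta) = 0 \;\Rightarrow\; \hat\beta = \left(\sum_{i=1}^N z_ix_i'\right)^{-1}\sum_{i=1}^N z_iy_i = (Z'X)^{-1}Z'Y.$$

Equivalently, we could maximize the log-likelihood wrt. $\theta$, or start from the FOC

$$\frac{1}{N}\sum_{i=1}^N\left.\frac{\partial\log L_i(\theta)}{\partial\theta}\right|_{\theta=\hat\theta} = 0,$$

where $L_i(\theta)$ is the likelihood contribution of observation $i$. This FOC can be regarded here as a set of sample moment conditions.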

Problems that remain to be solved:
- Ensure that we can replace population moments by sample moments, for the Method of Moments to work.
- What if the system of moment conditions is overidentified (more conditions than parameters)?
- How can we be sure our moment conditions are valid (e.g., the choice of valid instruments)?

6.2 The Generalized Method of Moments (GMM)

6.2.1 Introduction

As the name indicates, GMM is an extension of the Method of Moments to the case where the parameters $\theta$ are overidentified by the moment conditions. The equations $E[f(x_i,\theta_0)] = 0$ represent $q$ conditions for $p < q$ unknown parameters. Therefore, in general, we cannot find a vector $\hat\theta_N$ satisfying $f_N(\hat\theta_N) = 0$.

But we can look for the $\hat\theta$ that makes $f_N(\theta)$ as close to 0 as possible, by defining

$$\hat\theta_N = \arg\min_\theta Q_N(\theta) = f_N(\theta)'A_Nf_N(\theta),$$

where $A_N$ is a positive definite weighting matrix of order $O(1)$.

Important note: in the just-identified case, $Q_N(\hat\theta_N) = 0$ because $f_N(\hat\theta_N) = 0$, but in the over-identified case, in general $Q_N(\hat\theta_N) > 0$. This fact is important for model checking (we will come to this point later in the course).

6.2.2 Example: Just-identified IV model

Consider $Y = X\beta + u$ with condition $E(W'u) = 0$ ($W$ are instruments), and $\operatorname{rank}(W'X) = p$. Solving for $\beta$ we have $\hat\beta = (W'X)^{-1}(W'Y)$, which we substitute into the IV criterion (with $P_W = W(W'W)^{-1}W'$):

$$u(\hat\beta)'P_Wu(\hat\beta) = \left[Y - X(W'X)^{-1}(W'Y)\right]'W(W'W)^{-1}W'\left[Y - X(W'X)^{-1}(W'Y)\right]$$
$$= Y'P_WY + (W'Y)'[(W'X)^{-1}]'X'P_WX(W'X)^{-1}(W'Y) - (W'Y)'[(W'X)^{-1}]'X'P_WY - Y'P_WX(W'X)^{-1}(W'Y).$$

Using $[(W'X)^{-1}]' = (X'W)^{-1}$, this equals

$$Y'P_WY + (Y'W)(X'W)^{-1}(X'W)(W'W)^{-1}(W'X)(W'X)^{-1}(W'Y) - (Y'W)(X'W)^{-1}(X'W)(W'W)^{-1}(W'Y) - (Y'W)(W'W)^{-1}(W'X)(W'X)^{-1}(W'Y),$$

where each of the last three terms reduces to $Y'P_WY$, so that

$$u(\hat\beta)'P_Wu(\hat\beta) = 2Y'P_WY - 2Y'P_WY = 0.$$
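The same result can be checked numerically on simulated data, with three instruments for three regressors and all values made up. The IV criterion evaluated at $\hat\beta = (W'X)^{-1}W'Y$ vanishes up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200
W = rng.normal(size=(N, 3))                        # instruments: q = p = 3
X = W @ rng.normal(size=(3, 3)) + rng.normal(size=(N, 3))
Y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(W.T @ X, W.T @ Y)       # (W'X)^{-1} W'Y
u = Y - X @ beta_hat
P_W = W @ np.linalg.solve(W.T @ W, W.T)            # projection W (W'W)^{-1} W'
criterion = u @ P_W @ u                            # u(beta_hat)' P_W u(beta_hat)
print(criterion)
```

The reason is that $W'u(\hat\beta) = 0$ exactly in the just-identified case. The over-identified case behaves differently, as the next examples show.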
6.2.3 A definition

Definition 1. Consider the observed sample $\{x_i, i = 1,2,\dots,N\}$ from which we wish to estimate a $p\times1$ vector of parameters $\theta$ whose true value is $\theta_0$. Let $E[f(x_i,\theta_0)] = 0$ be a set of $q$ moment conditions, and $f_N(\theta)$ the corresponding set of sample moments. Define the criterion

$$Q_N(\theta) = f_N(\theta)'A_Nf_N(\theta),$$

where $A_N$ is a stochastic, positive definite $O(1)$ matrix. The GMM estimator of $\theta$ is

$$\hat\theta_N = \arg\min_\theta Q_N(\theta).$$

6.2.4 Example: The IV estimator again

Consider again the linear regression model with $q > p$ instruments (this is an over-identified model). Moment conditions are

$$E(z_iu_i) = E[z_i(y_i - x_i'\beta_0)] = 0,$$

and sample moments are

$$f_N(\beta) = \frac{1}{N}\sum_{i=1}^N z_i(y_i - x_i'\beta) = \frac{1}{N}(Z'Y - Z'X\beta).$$

Let us choose the weighting matrix as

$$A_N = \left(\frac{1}{N}\sum_{i=1}^N z_iz_i'\right)^{-1} = N(Z'Z)^{-1}.$$

Assume that $\frac{1}{N}Z'Z$ converges in probability (as $N\to\infty$) to a constant matrix $A$. The GMM criterion is then

$$Q_N(\beta) = \frac{1}{N}(Z'Y - Z'X\beta)'(Z'Z)^{-1}(Z'Y - Z'X\beta).$$

Differentiating wrt. $\beta$ gives the first-order conditions:

$$\left.\frac{\partial Q_N(\beta)}{\partial\beta}\right|_{\beta=\hat\beta_N} = -\frac{2}{N}X'Z(Z'Z)^{-1}(Z'Y - Z'X\hat\beta_N) = 0.$$

Solving for $\hat\beta_N$, we have

$$\hat\beta_N = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}Z'Y.$$

This expression is the IV formulation for the case where there are more instruments than parameters.
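The sketch below checks three things on simulated data, all made up: $q = 4$ instruments including a constant, $p = 2$ parameters, and one endogenous regressor. First, the closed-form GMM estimator above with $A_N = N(Z'Z)^{-1}$ matches the familiar two-stage least squares computation, namely OLS of $Y$ on the first-stage fitted values. Second, the gradient of $Q_N$ is zero at the solution.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 500
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])    # q = 4 instruments
v = rng.normal(size=N)
x = Z[:, 1:] @ np.array([1.0, 0.5, -0.5]) + v                 # endogenous regressor
u = 0.8 * v + rng.normal(size=N)                               # correlated with x via v
X = np.column_stack([np.ones(N), x])                           # p = 2 parameters
Y = X @ np.array([1.0, 2.0]) + u

# GMM with A_N = N (Z'Z)^{-1}: closed form from the first-order conditions
ZZ_inv = np.linalg.inv(Z.T @ Z)
XZ = X.T @ Z
beta_gmm = np.linalg.solve(XZ @ ZZ_inv @ XZ.T, XZ @ ZZ_inv @ (Z.T @ Y))

# Same estimator computed as two-stage least squares
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]               # first-stage fitted values
beta_2sls = np.linalg.lstsq(X_hat, Y, rcond=None)[0]

# Gradient of Q_N at the solution: -(2/N) X'Z (Z'Z)^{-1} (Z'Y - Z'X beta)
grad = -(2.0 / N) * XZ @ ZZ_inv @ (Z.T @ (Y - X @ beta_gmm))
print(beta_gmm, beta_2sls, grad)
```

Third, unlike the just-identified case, the minimized criterion $Q_N(\hat\beta_N)$ is generally not zero here.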

6.3 Asymptotic properties of the GMM estimator


We examine here key properties that any useful estimator should
verify: consistency (convergence to the true parameter value as
the sample size gets large) and asymptotic normality (to be able
to use the asymptotic distribution for statistical inference).

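6.3.1 Consistency

Assumption set 1

(i) $E[f(x_i,\theta)]$ exists and is finite $\forall\theta\in\Theta$ and $\forall i$.

(ii) Let $g_i(\theta) = E[f(x_i,\theta)]$. There exists $\theta_0\in\Theta$ such that $g_i(\theta) = 0\ \forall i \Leftrightarrow \theta = \theta_0$.

(iii) Let $f_{Nj}$ and $g_{Nj}$ respectively denote the elements of the $q$ vectors $f_N(\theta)$ and $g_N(\theta) = \frac{1}{N}\sum_{i=1}^N g_i(\theta)$. Then $f_{Nj}(\theta) - g_{Nj}(\theta)\xrightarrow{p}0$ uniformly $\forall\theta\in\Theta$ and $\forall j = 1,2,\dots,q$.

(iv) There exists a non-random sequence of positive definite matrices $\bar A_N$ such that $A_N - \bar A_N\xrightarrow{p}0$.

Theorem 1. Under assumptions (i) to (iv), the GMM estimator $\hat\theta_N$ is weakly consistent.

Note: uniform convergence in (iii) is a stronger requirement than pointwise convergence in probability on $\Theta$. It means that

$$\sup_{\theta\in\Theta}\left|f_{Nj}(\theta) - g_{Nj}(\theta)\right|\xrightarrow{p}0 \quad\text{for } j = 1,2,\dots,q.$$

With pointwise convergence in probability only, it is not always true that $f_{Nj}(\theta_N) - g_{Nj}(\theta_N)\xrightarrow{p}0$, where $\theta_N$ is a sequence of values of $\theta$ as $N$ increases.

Elements of the proof: from (iii) and (iv), we can form a non-random sequence

$$\bar Q_N(\theta) = g_N(\theta)'\bar A_Ng_N(\theta)$$

such that $Q_N(\theta) - \bar Q_N(\theta)\xrightarrow{p}0$ uniformly for $\theta\in\Theta$. From (i) and (ii), we have that $\bar Q_N(\theta) = 0 \Leftrightarrow \theta = \theta_0$, and $\bar Q_N(\theta) > 0$ otherwise. Therefore,

$$\theta_0 = \arg\min_{\theta\in\Theta}\bar Q_N(\theta).$$

But this implies that $\hat\theta_N\xrightarrow{p}\theta_0$, because:
- $\hat\theta_N$ minimizes $Q_N(\theta)$;
- $\theta_0$ minimizes $\bar Q_N(\theta)$;
- $Q_N(\theta) - \bar Q_N(\theta)\xrightarrow{p}0$ uniformly.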

For asymptotic normality, we need additional assumptions.

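6.3.2 Asymptotic normality

Assumption set 2

(v) The function $f(x_i,\theta)$ is continuously differentiable wrt. $\theta$ on $\Theta$.

(vi) Let $F_N(\theta) = \partial f_N(\theta)/\partial\theta' = \frac{1}{N}\sum_{i=1}^N\partial f(x_i,\theta)/\partial\theta'$. For any sequence $\theta_N^*$ such that $\theta_N^* - \theta_0\xrightarrow{p}0$, we assume that

$$F_N(\theta_N^*) - \bar F_N\xrightarrow{p}0,$$

where $\bar F_N$ is a sequence of $q\times p$ matrices that do not depend on $\theta$.

(vii) The function $f(x_i,\theta_0)$ satisfies a central limit theorem:

$$V_N^{-1/2}\sqrt N f_N(\theta_0)\xrightarrow{d}N(0,I_q),$$

where $V_N = N\operatorname{Var}[f_N(\theta_0)]$ is a sequence of $q\times q$ non-random, positive definite matrices.

Theorem 2. Under Assumptions (i) to (vii), the GMM estimator $\hat\theta_N$ has the following asymptotic distribution: $\sqrt N(\hat\theta_N - \theta_0)$ is asymptotically $N(0,\Omega)$, where $\Omega$ is the $p\times p$ matrix

$$\Omega = \left[F_N(\hat\theta_N)'A_NF_N(\hat\theta_N)\right]^{-1}F_N(\hat\theta_N)'A_NV_NA_NF_N(\hat\theta_N)\left[F_N(\hat\theta_N)'A_NF_N(\hat\theta_N)\right]^{-1}.$$

Using White (1984, Asymptotic theory for econometricians, Academic Press: Orlando, Definition 4.20):

$$\left[F_N(\hat\theta_N)'A_NV_NA_NF_N(\hat\theta_N)\right]^{-1/2}\left[F_N(\hat\theta_N)'A_NF_N(\hat\theta_N)\right]\sqrt N(\hat\theta_N - \theta_0)\xrightarrow{d}N(0,I_p).$$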

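Proof: we know that $F_N(\hat\theta_N)'A_Nf_N(\hat\theta_N) = 0$, because $\hat\theta_N$ minimizes the GMM criterion (this is the first-order condition). Consider a first-order (mean-value) expansion of $f_N$ around the true value $\theta_0$:

$$f_N(\hat\theta_N) = f_N(\theta_0) + F_N(\bar\theta_N)(\hat\theta_N - \theta_0),$$

where $\bar\theta_N\in[\hat\theta_N,\theta_0]$. Since $\hat\theta_N$ is a consistent estimator (proved above), we know that $\bar\theta_N\xrightarrow{p}\theta_0$. Let us premultiply the expansion above by $F_N(\hat\theta_N)'A_N$:

$$F_N(\hat\theta_N)'A_Nf_N(\hat\theta_N) = F_N(\hat\theta_N)'A_Nf_N(\theta_0) + F_N(\hat\theta_N)'A_NF_N(\bar\theta_N)(\hat\theta_N - \theta_0) = 0,$$

so that

$$\hat\theta_N - \theta_0 = -\left[F_N(\hat\theta_N)'A_NF_N(\bar\theta_N)\right]^{-1}F_N(\hat\theta_N)'A_Nf_N(\theta_0),$$
$$\sqrt N(\hat\theta_N - \theta_0) = -\left[F_N(\hat\theta_N)'A_NF_N(\bar\theta_N)\right]^{-1}F_N(\hat\theta_N)'A_NV_N^{1/2}\,V_N^{-1/2}\sqrt Nf_N(\theta_0),$$

where $V_N^{-1/2}\sqrt Nf_N(\theta_0)$ is asymptotically $N(0,I_q)$. Therefore, asymptotically, $E[\sqrt N(\hat\theta_N - \theta_0)] = 0$ and $\operatorname{Var}[\sqrt N(\hat\theta_N - \theta_0)] = \Omega$, where

$$\Omega = \left[F_N(\hat\theta_N)'A_NF_N(\bar\theta_N)\right]^{-1}F_N(\hat\theta_N)'A_NV_NA_NF_N(\hat\theta_N)\left[F_N(\bar\theta_N)'A_NF_N(\hat\theta_N)\right]^{-1}.$$

Finally, using Assumption (vi), we can replace $F_N(\bar\theta_N)$ by $F_N(\hat\theta_N)$ everywhere. Note that $F_N$ is $q\times p$; therefore, the variance-covariance matrix of the GMM estimator is $p\times p$.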

6.4 Optimal and two-step GMM

Optimality of GMM: what is the best weighting matrix $A_N$, i.e., the one giving us the smallest asymptotic variance-covariance matrix?

$$A_N^{opt} = \arg\min_{A_N}\,(F_N'A_NF_N)^{-1}F_N'A_NV_NA_NF_N(F_N'A_NF_N)^{-1}.$$

We now use the following lemma.

Lemma 3. The matrix

$$(F_N'A_NF_N)^{-1}F_N'A_NV_NA_NF_N(F_N'A_NF_N)^{-1} - (F_N'V_N^{-1}F_N)^{-1}$$

is positive semi-definite $\forall A_N$.

If we select $A_N = V_N^{-1}$, we get

$$(F_N'A_NF_N)^{-1}F_N'A_NF_N(F_N'A_NF_N)^{-1} - (F_N'A_NF_N)^{-1} = 0.$$

Hence, the best weighting matrix for GMM is the inverse of the variance-covariance matrix of the moment conditions. For this choice, the asymptotic variance of $\sqrt N(\hat\theta_N - \theta_0)$ is simply

$$\Omega = \left[F_N(\hat\theta_N)'V_N^{-1}F_N(\hat\theta_N)\right]^{-1},$$

which in practice gives

$$\widehat{\operatorname{Var}}(\hat\theta_N) = \frac{1}{N}\left[\left(\frac{1}{N}\sum_{i=1}^N\frac{\partial f(x_i,\hat\theta_N)}{\partial\theta'}\right)'\left[\widehat{\operatorname{Var}}\,f(x_i,\hat\theta_N)\right]^{-1}\left(\frac{1}{N}\sum_{i=1}^N\frac{\partial f(x_i,\hat\theta_N)}{\partial\theta'}\right)\right]^{-1},$$

and this defines the optimal GMM. But, in general, no condition is imposed on the distribution of $u$ (this is an interesting feature of GMM, compared to IV).

Empirical issue: find an estimate of $V_N$ that produces a heteroskedasticity-robust GMM estimator for $\theta$.


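Solution: use a two-step estimation procedure.
- Step 1. Compute an initial consistent estimator $\hat\theta_{1N}$ using an arbitrary matrix for $A_N$ (denoted $A_{1N}$):
$$\hat\theta_{1N} = \arg\min_\theta\,u(\theta)'ZA_{1N}Z'u(\theta).$$
- Step 2. Compute $\hat V_N$ from $u(\hat\theta_{1N})$ and find $\hat\theta_{2N}$ such that
$$\hat\theta_{2N} = \arg\min_\theta\,u(\theta)'Z\hat V_N^{-1}Z'u(\theta).$$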
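For a linear model $u(\theta) = Y - X\theta$, both steps have closed forms. The sketch below makes three assumptions of its own. First, it takes $A_{1N} = (Z'Z)^{-1}$ in Step 1, i.e. 2SLS, which is a common but arbitrary choice. Second, it uses the White-type estimate $\hat V_N = \frac{1}{N}\sum_i \hat u_i^2 z_iz_i'$ for cross-section data. Third, the simulated design with heteroskedastic errors is made up.

```python
import numpy as np


def linear_gmm(Y, X, Z, A):
    """Minimiser of (Z'(Y - Xb))' A (Z'(Y - Xb)) for a symmetric weighting matrix A."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ (Z.T @ Y))


def two_step_gmm(Y, X, Z):
    N = len(Y)
    # Step 1: arbitrary weighting matrix A_1N = (Z'Z)^{-1}, i.e. 2SLS
    b1 = linear_gmm(Y, X, Z, np.linalg.inv(Z.T @ Z))
    u1 = Y - X @ b1
    # Step 2: V_hat = (1/N) sum_i u_i^2 z_i z_i', robust to heteroskedasticity
    V_hat = (Z * u1[:, None] ** 2).T @ Z / N
    b2 = linear_gmm(Y, X, Z, np.linalg.inv(V_hat))
    # Estimated Var(b2) = (F' V^{-1} F)^{-1} / N with F = -Z'X / N
    F = -(Z.T @ X) / N
    cov = np.linalg.inv(F.T @ np.linalg.solve(V_hat, F)) / N
    return b1, b2, np.sqrt(np.diag(cov))


rng = np.random.default_rng(7)
N = 2000
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
v = rng.normal(size=N)
x = Z[:, 1:] @ np.array([1.0, 0.5, -0.5]) + v
u = (0.5 + np.abs(Z[:, 1])) * rng.normal(size=N) + 0.5 * v   # heteroskedastic, endogenous
X = np.column_stack([np.ones(N), x])
Y = X @ np.array([1.0, 2.0]) + u
b1, b2, se = two_step_gmm(Y, X, Z)
print("step 1:", b1, "step 2:", b2, "robust s.e.:", se)
```

The standard errors implement $\widehat{\operatorname{Var}}(\hat\theta_N) = \frac{1}{N}(F_N'\hat V_N^{-1}F_N)^{-1}$ from the optimal-GMM formula above.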

Disadvantage: two-step GMM estimators are independent from the initial matrix $A_{1N}$ only asymptotically. In small samples, GMM estimators may not be unique, depending on that choice. Several solutions:
- Method 1. Use an iterative algorithm for estimation, successively replacing $\hat\theta_N$ and $A_N$ until full convergence.
- Method 2. Acknowledge the fact that the optimal weighting matrix depends on $\theta$, and solve
$$\hat\theta_N = \arg\min_\theta Q_N(\theta) \equiv f_N(\theta)'A_N(\theta)f_N(\theta).$$

In practice, the construction of the variance-covariance matrix depends on the nature of the data: cross-sections, time series, or panel data (see dedicated section below).
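Method 1 can be sketched for the linear model as follows. This is an illustration under our own choices: a White-type $\hat V_N$, a convergence tolerance on the coefficients, and a simulated design. The two runs start from different initial weighting matrices, $(Z'Z)^{-1}$ and the identity. Once iterated to convergence, they should reach the same fixed point, which removes the small-sample dependence on $A_{1N}$ noted above.

```python
import numpy as np


def linear_gmm(Y, X, Z, A):
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ (Z.T @ Y))


def iterated_gmm(Y, X, Z, A_init, tol=1e-10, max_iter=500):
    """Method 1: alternate between b and A = V_hat(b)^{-1} until b stops changing."""
    N = len(Y)
    b = linear_gmm(Y, X, Z, A_init)
    for _ in range(max_iter):
        u = Y - X @ b
        A = np.linalg.inv((Z * u[:, None] ** 2).T @ Z / N)
        b_new = linear_gmm(Y, X, Z, A)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    raise RuntimeError("iterated GMM did not converge")


rng = np.random.default_rng(8)
N = 300
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
v = rng.normal(size=N)
x = Z[:, 1:] @ np.array([1.0, 0.5, -0.5]) + v
u = (0.5 + np.abs(Z[:, 1])) * rng.normal(size=N) + 0.5 * v
X = np.column_stack([np.ones(N), x])
Y = X @ np.array([1.0, 2.0]) + u

# Two different initial weighting matrices lead to the same fixed point
b_a = iterated_gmm(Y, X, Z, np.linalg.inv(Z.T @ Z))
b_b = iterated_gmm(Y, X, Z, np.eye(Z.shape[1]))
print(b_a, b_b)
```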

6.5 Inference with GMM


Advantage of GMM over many alternative estimation procedures:
easy to provide statistical inference on model validity. In general,
we will test for the validity of moment conditions, also denoted orthogonality conditions.

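Recall the GMM procedure: find

$$\hat\theta_N = \arg\min_\theta Q_N(\theta) = f_N(\theta)'V_N^{-1}f_N(\theta),$$

where $f_N(\theta) = \frac{1}{N}\sum_{i=1}^N f(x_i,\theta)$ and $V_N$ is a consistent estimator of $V = \lim_{N\to\infty}\operatorname{Var}[\sqrt Nf_N(\theta_0)]$.

First-order condition associated with the minimization of $Q_N(\theta)$:

$$\frac{\partial Q_N(\hat\theta_N)}{\partial\theta} = 2F_N(\hat\theta_N)'V_N^{-1}f_N(\hat\theta_N) = 0,$$

where $F_N(\hat\theta_N) = \partial f_N(\hat\theta_N)/\partial\theta'$. If $\hat\theta_N$ satisfies the FOC above, it must also satisfy

$$\hat PV_N^{-1/2}f_N(\hat\theta_N) = 0,$$

where

$$\hat P = \hat M(\hat M'\hat M)^{-1}\hat M' \quad\text{and}\quad \hat M = V_N^{-1/2}F_N(\hat\theta_N),$$

so that

$$\hat PV_N^{-1/2}f_N(\hat\theta_N) = V_N^{-1/2}F_N(\hat\theta_N)\left[F_N(\hat\theta_N)'V_N^{-1}F_N(\hat\theta_N)\right]^{-1}F_N(\hat\theta_N)'V_N^{-1}f_N(\hat\theta_N).$$

Population analog to the condition above:

$$PV^{-1/2}E[f(x_i,\theta_0)] = 0,$$

where

$$P = M(M'M)^{-1}M' \quad\text{and}\quad M = V^{-1/2}E[F_i(\theta_0)],$$

and $F_i(\theta) = \partial f(x_i,\theta)/\partial\theta'$. The projection matrix $P$ is of rank $p$, so that the restrictions above set only $p$ linear combinations of the $q\times1$ vector $E[f(x_i,\theta_0)]$ to 0. If $\theta_0$ is identified, then these are the identifying conditions, and the remaining conditions are unused in estimation.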


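The identifying restrictions determine the asymptotic distribution of $\hat\theta_N$:

$$\sqrt N(\hat\theta_N - \theta_0) = -(M'M)^{-1}M'V^{-1/2}\sqrt Nf_N(\theta_0) + o_p(1),$$

where $M\sqrt N(\hat\theta_N - \theta_0)$ is asymptotically equivalent to $-PV^{-1/2}\sqrt Nf_N(\theta_0)$. This implies

$$\sqrt N(\hat\theta_N - \theta_0)\xrightarrow{d}N\left(0,(M'M)^{-1}\right).$$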

The basic way of testing for model validity is to use the over-identifying restrictions

$$(I_q - P)V^{-1/2}E[f(x_i,\theta_0)] = 0,$$

which are of rank $q - p$. We have the sample analog:

$$(I_q - \hat P)V_N^{-1/2}f_N(\hat\theta_N) = V_N^{-1/2}f_N(\hat\theta_N).$$

Interpretation: $Q_N(\hat\theta_N)$ measures the extent to which the data satisfy the over-identifying restrictions. The asymptotic distribution of the sample moments is determined by the function of the data in the over-identifying restrictions:

$$V_N^{-1/2}\sqrt Nf_N(\hat\theta_N) = (I_q - P)V^{-1/2}\sqrt Nf_N(\theta_0) + o_p(1),$$

because $\hat P$ converges in probability to $P$. We finally have

$$V_N^{-1/2}\sqrt Nf_N(\hat\theta_N)\xrightarrow{d}N(0,I_q - P).$$

Both statistics (from the identifying and over-identifying restrictions) are asymptotically orthogonal:

$$\operatorname{Cov}\left[\sqrt N(\hat\theta_N - \theta_0),\ V_N^{-1/2}\sqrt Nf_N(\hat\theta_N)\right] = -(M'M)^{-1}M'(I_q - P) = 0.$$

It is equivalent to test model validity by testing either

$$H_0: E[f(x_i,\theta_0)] = 0 \quad\text{or}\quad H_0: V^{-1/2}E[f(x_i,\theta_0)] = 0,$$

because $V^{-1/2}$ is non-singular. $H_0$ is the combination of the identifying restrictions ($H_0^I$) and the over-identifying restrictions ($H_0^O$):

$$H_0^I: PV^{-1/2}E[f(x_i,\theta_0)] = 0, \qquad H_0^O: (I_q - P)V^{-1/2}E[f(x_i,\theta_0)] = 0.$$

It is not possible to test $H_0^I$, because this is a set of $p$ conditions automatically satisfied by the estimated sample moments. But $H_0^O$ can be tested, because these conditions are not necessary for identification.


The test statistic proposed by Hansen (1982) is

$$J_N = NQ_N(\hat\theta_N)\xrightarrow{d}\chi^2(q-p) \quad\text{under } H_0.$$

It can be shown that $J_N$ is asymptotically equivalent to

$$z_q'(I_q - P)'(I_q - P)z_q = z_q'(I_q - P)z_q,$$

where $z_q\sim N(0,I_q)$. Since $I_q - P$ is idempotent of rank $q - p$, this quadratic form is $\chi^2(q-p)$.
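A small Monte Carlo experiment can show the test at work. The design is made up: $q = 4$ instruments and $p = 2$ parameters, so $J_N$ has $q - p = 2$ degrees of freedom. For two degrees of freedom the $\chi^2$ tail has the closed form $P(\chi^2(2) > c) = e^{-c/2}$, so the 5% critical value is $2\ln 20$. With valid instruments the rejection rate should be close to 5%. When one instrument violates $E(z_iu_i) = 0$, the rejection rate should be much higher. The exact figures depend on the random draws.

```python
import numpy as np


def hansen_j(Y, X, Z):
    """Two-step GMM with heteroskedasticity-robust weighting; returns J_N = N Q_N."""
    N = len(Y)

    def gmm(A):
        XZ = X.T @ Z
        return np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ (Z.T @ Y))

    b1 = gmm(np.linalg.inv(Z.T @ Z))
    u1 = Y - X @ b1
    V = (Z * u1[:, None] ** 2).T @ Z / N
    b2 = gmm(np.linalg.inv(V))
    f_N = Z.T @ (Y - X @ b2) / N          # sample moments at the estimate
    return N * f_N @ np.linalg.solve(V, f_N)


def draw(N, rng, violation=0.0):
    """q = 4 instruments, p = 2 parameters; violation != 0 makes E(z_3 u) != 0."""
    Z = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
    v = rng.normal(size=N)
    x = Z[:, 1:] @ np.array([1.0, 1.0, 1.0]) + v
    u = 0.5 * v + rng.normal(size=N) + violation * Z[:, 3]
    X = np.column_stack([np.ones(N), x])
    return X @ np.array([1.0, 2.0]) + u, X, Z


rng = np.random.default_rng(9)
R, N = 500, 400
crit = 2.0 * np.log(20.0)      # 5% critical value of chi2(2)
size = np.mean([hansen_j(*draw(N, rng)) > crit for _ in range(R)])
power = np.mean([hansen_j(*draw(N, rng, violation=0.3)) > crit for _ in range(R)])
print(f"rejection rate: valid instruments {size:.3f}, invalid instrument {power:.3f}")
```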

6.6 Extension: optimal instruments for GMM

We have seen above how to obtain the optimal GMM estimator, by selecting as weighting matrix the inverse of the covariance matrix of the moment conditions. We now show how to obtain an even more efficient GMM estimator, based on the best choice of instruments. We are looking for the optimal choice of instruments, i.e., the one that minimizes the asymptotic variance.

Based on Newey (1993), "Efficient estimation of models with conditional moment restrictions".

6.6.1 Conditional moment restrictions

Let $\rho(z,\theta)$ denote an $s\times1$ vector of residuals, where $z$ is a $p\times1$ vector of observations (on all variables), and $\theta$ is the $q\times1$ vector of parameters. We have the following moment restrictions:

$$E[\rho(z,\theta_0)|x] = 0 \;\Rightarrow\; E[A(x)\rho(z,\theta_0)] = 0,$$

where $x$ is a vector of conditioning variables, $A(x)$ is an $r\times s$ matrix of functions of $x$, and $\theta_0$ is the true value of the parameters. Focus of the analysis here: choose $A(x)$ to minimize the asymptotic variance of the GMM estimator. Let

$$D(x) = E\left[\left.\frac{\partial\rho(z,\theta_0)}{\partial\theta'}\right|x\right], \qquad \Omega(x) = E[\rho(z,\theta_0)\rho(z,\theta_0)'|x].$$

The optimal instruments are given by

$$B(x) = C\,D(x)'\Omega(x)^{-1},$$

where $C$ is an arbitrary, nonsingular matrix, and the asymptotic covariance matrix for these instruments is

$$\Lambda = \left\{E\left[D(x)'\Omega(x)^{-1}D(x)\right]\right\}^{-1}.$$
Example: Linear model with heteroskedasticity. In the model $y = x'\theta_0 + \varepsilon$, $E(\varepsilon|x) = 0$, we have

$$D(x) = -x', \quad \Omega(x) = E(\varepsilon^2|x) = \sigma^2(x), \quad C = -I, \quad B(x) = x/\sigma^2(x)$$

(the sign of $D(x)$ is absorbed in $C$). The corresponding IV estimator is in this case the weighted least squares estimator with weight $1/\sigma^2(x)$. Analogy with the linear model: $\Omega(x)^{-1}$ corrects for heteroskedasticity, the derivatives $\partial\rho(z,\theta_0)/\partial\theta$ correspond to regressors, and the matrix $D(x)$ is a function of $x$ closely correlated with those derivatives.
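The equivalence with weighted least squares can be checked on simulated data. The sketch below assumes the skedastic function $\sigma^2(x)$ is known, here $\sigma(x) = x$ with made-up coefficients. It solves $\sum_i B(x_i)(y_i - x_i'\theta) = 0$ with $B(x) = x/\sigma^2(x)$, and compares the result with OLS on observations rescaled by $1/\sigma(x_i)$.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 1000
x = np.column_stack([np.ones(n), rng.uniform(1.0, 3.0, size=n)])
sigma2 = x[:, 1] ** 2                         # sigma^2(x), assumed known here
y = x @ np.array([1.0, 0.5]) + np.sqrt(sigma2) * rng.normal(size=n)

# Optimal instruments B(x) = x / sigma^2(x): solve sum_i B(x_i)(y_i - x_i'theta) = 0
B = x / sigma2[:, None]
theta_iv = np.linalg.solve(B.T @ x, B.T @ y)

# Weighted least squares: OLS after dividing each observation by sigma(x_i)
w = 1.0 / np.sqrt(sigma2)
theta_wls = np.linalg.lstsq(x * w[:, None], y * w, rcond=None)[0]
print(theta_iv, theta_wls)
```

Both computations solve the same normal equations $\sum_i x_ix_i'\theta/\sigma^2(x_i) = \sum_i x_iy_i/\sigma^2(x_i)$.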

Since $\Lambda$ does not depend on $C$, we can set $C = I$ and define

$$m_A = F_N'A_NA(x)\rho(z,\theta_0), \qquad m_B = B(x)\rho(z,\theta_0),$$

so that

$$E(m_Am_B') = F_N'A_NE[A(x)\Omega(x)B(x)'] = F_N'A_NE[A(x)D(x)] = F_N'A_NF_N,$$
$$E(m_Am_A') = F_N'A_NV_NA_NF_N, \qquad [E(m_Bm_B')]^{-1} = \Lambda.$$

Therefore,

$$(F_N'A_NF_N)^{-1}F_N'A_NV_NA_NF_N(F_N'A_NF_N)^{-1} - \Lambda = (E[m_Am_B'])^{-1}E[m_Am_A'](E[m_Bm_A'])^{-1} - (E[m_Bm_B'])^{-1}$$
$$= (E[m_Am_B'])^{-1}\left\{E[m_Am_A'] - E[m_Am_B'](E[m_Bm_B'])^{-1}E[m_Bm_A']\right\}(E[m_Bm_A'])^{-1} = E[RR'],$$

where

$$R = (E[m_Am_B'])^{-1}\left\{m_A - E[m_Am_B'](E[m_Bm_B'])^{-1}m_B\right\}.$$

Since $E[RR']$ is positive semi-definite, $\Lambda$ is a lower bound for the asymptotic variance.

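6.6.2 A first feasible estimator

The optimal instruments $B(x)$ cannot be used directly, because they depend on unknown parameters and/or functions. Assume $D(x) = D(x,\pi_0)$ and $\Omega(x) = \Omega(x,\pi_0)$, where the functions $D(\cdot)$ and $\Omega(\cdot)$ are known, and $\pi$ is a real vector. Because $D(x)$ and $\Omega(x)$ are conditional expectations given $x$, we could estimate $\pi_0$ by running a linear regression of $\partial\rho(z,\tilde\theta)/\partial\theta'$ and $\rho(z,\tilde\theta)\rho(z,\tilde\theta)'$ on $x$, where $\tilde\theta$ is a preliminary consistent estimate. This gives $\hat B(x) = D(x,\hat\pi)'\Omega(x,\hat\pi)^{-1}$, and the resulting GMM estimator would be

$$\hat\theta = \arg\min_{\theta\in\Theta}\left\{\sum_{i=1}^n\rho(z_i,\theta)'\hat B(x_i)'\left[\sum_{i=1}^n\hat B(x_i)\hat B(x_i)'\right]^{-1}\sum_{i=1}^n\hat B(x_i)\rho(z_i,\theta)\right\}.$$

This estimator is always consistent, but it is not efficient if the functions $D(x,\pi)$ and $\Omega(x,\pi)$ are misspecified.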

Example: Nonlinear model with heteroskedasticity

Consider

$$y = f(x,\beta_0) + \varepsilon, \quad E[\varepsilon|x] = 0, \quad E[\varepsilon^2|x] = h(x,\beta_0,\alpha_0),$$

where $h(\cdot)$ is known. This is a model that restricts only the first two conditional moments. Exploiting the additional information on the second moment yields an IV estimator at least as efficient as weighted least squares. Drawback: the estimator may not be consistent if the form of heteroskedasticity is misspecified.

Define the moment restrictions (with $\theta = (\beta',\alpha')'$) as

$$\rho(z,\theta) = \left(y - f(x,\beta),\ [y - f(x,\beta)]^2 - h(x,\beta,\alpha)\right)'.$$

Optimal instruments take the form

$$D(x) = D(x,\theta_0), \qquad D(x,\theta) = -\begin{pmatrix}\partial f(x,\beta)/\partial\beta' & 0\\ \partial h(x,\beta,\alpha)/\partial\beta' & \partial h(x,\beta,\alpha)/\partial\alpha'\end{pmatrix},$$
$$\Omega(x) = \operatorname{Var}[(\varepsilon,\varepsilon^2)'|x], \qquad B(x) = D(x)'\Omega(x)^{-1}.$$

Empirical issue: when does incorporating the additional moment condition yield a more efficient estimator? The asymptotic variance of the heteroskedasticity-corrected least squares estimator is

$$\left\{E\left[E[\varepsilon^2|x]^{-1}\,\frac{\partial f(x,\beta_0)}{\partial\beta}\,\frac{\partial f(x,\beta_0)}{\partial\beta'}\right]\right\}^{-1},$$

to be compared with the block corresponding to $\beta$ in $\{E[D(x)'\Omega(x)^{-1}D(x)]\}^{-1}$. The two are equal if both of the following hold:
- $E[\varepsilon^3|x] = 0$, and
- $h(x,\beta_0,\alpha_0) = h(x,\alpha_0)$.

Otherwise, the asymptotic variance of the heteroskedasticity-corrected least squares estimator will be larger than the conditional moment bound.

Corollary: a gain in efficiency exists even if $h(x,\beta,\alpha)$ and $\Omega(x)$ do not depend on $x$ or $\beta$!

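Computation of an efficient estimator

It needs a specification of $\Omega(x)$, and in particular of the conditional third and fourth moments. Assume

$$E[\varepsilon^3|x] = 0, \qquad \operatorname{Var}(\varepsilon^2|x) = \kappa_0\,h(x,\beta_0,\alpha_0)^2,$$

where the kurtosis parameter $\kappa_0$ can be estimated by the sample variance of $[y_i - f(x_i,\hat\beta)]^2/h(x_i,\hat\beta,\hat\alpha)$, where $\hat\beta$ and $\hat\alpha$ are initial estimators. The estimated optimal instruments are then

$$\hat D(x) = D(x,\hat\theta), \qquad \hat\Omega(x) = \begin{pmatrix}h(x,\hat\beta,\hat\alpha) & 0\\ 0 & \hat\kappa\,h(x,\hat\beta,\hat\alpha)^2\end{pmatrix}, \qquad \hat B(x) = \hat D(x)'\hat\Omega(x)^{-1}.$$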
6.6.3 Nearest-neighbor estimation of optimal instruments

Advantage: avoid misspecification of $D(x,\pi_0)$ and $\Omega(x,\pi_0)$ in computing the optimal instruments.

Principle: estimate the expectations that enter the optimal instruments nonparametrically (these expectations are conditional upon $x$).

6.6.3.1 The nearest neighbor estimator

Simplest nonparametric estimator: the nearest neighbor, or NN, estimator.

The nearest neighbor estimator of a conditional expectation is constructed by averaging over the values of the dependent variable for observations where the conditioning variable ($x$) is closest to its evaluation value.

6.6.

EXTENSION: OPTIMAL INSTRUMENTS FOR GMM

113

Let $\hat\sigma_l$ denote a measure of scale of the $l$th component of $x$ (e.g., its standard deviation). With $x$ of dimension $r$, define

$$\|x_i - x_j\|_n = \left(\sum_{l=1}^{r}\frac{(x_{il} - x_{jl})^2}{\hat\sigma_l^2}\right)^{1/2}.$$

This measures the distance between observations $i$ and $j$, accounting for the multivariate nature of $x$. Consider now a given integer $K$, $K \le n$, and weights $\omega_{kK}$ such that

$$\omega_{kK} \ge 0 \ \text{for } 1 \le k \le K, \qquad \omega_{kK} = 0 \ \text{for } k > K, \qquad \sum_{k=1}^{K}\omega_{kK} = 1.$$

The integer $K \le n$ is the number of nearest neighbors for any observation $i$. Let $W_{ii} = 0$ and rank the observations $j \ne i$ according to the distance above. Then assign the weight $W_{ij} = \omega_{kK}$ to the observation $j$ with the $k$th smallest distance $\|x_i - x_j\|_n$.

Example: uniform weights, $\omega_{kK} = 1/K$, $k \le K$. To compute the conditional expectation of $y$ given $x$:
• Select the set of the $K$ (out of $n$) $x_i$'s closest to the point $x$;
• Compute the mean of the $y_i$ values corresponding to the $x_i$'s chosen above:

$$E(y|x) = \sum_{k=1}^{K}\omega_{kK}\,y_k(x) = \frac{1}{K}\sum_{k=1}^{K}y_k(x),$$

where the $y_k(x)$ are the original $y_i$'s ordered according to the distance measure defined above ($y_1(x)$ is the $y_i$ whose $x_i$ is closest to $x$, $y_2(x)$ the second closest, and so on).
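A minimal Python sketch of the uniform-weight nearest neighbor estimator, assuming the population standard deviation of each component as the scale $\hat\sigma_l$ (one possible choice). The helpers `scaled_distance`, `knn_uniform` and `nn_weight_matrix` are illustrative names, not a library API; `nn_weight_matrix` builds the leave-one-out weights $W_{ij}$ with $W_{ii} = 0$. Ties in distance are broken by observation order, an arbitrary convention.

```python
import math
import statistics


def component_scales(X):
    """Scale of each component of x: its (population) standard deviation."""
    return [statistics.pstdev(col) for col in zip(*X)]


def scaled_distance(xi, xj, scales):
    """||xi - xj||_n: Euclidean distance after dividing component l by its scale."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(xi, xj, scales)))


def knn_uniform(x0, X, y, K, scales):
    """Uniform-weight K-NN estimate of E(y | x = x0): mean of the y_i
    attached to the K points x_i closest to x0."""
    order = sorted(range(len(X)), key=lambda i: scaled_distance(X[i], x0, scales))
    return sum(y[i] for i in order[:K]) / K


def nn_weight_matrix(X, K, scales):
    """Leave-one-out weights: W[i][j] = 1/K for the K nearest j != i, else 0."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: scaled_distance(X[i], X[j], scales))
        for j in others[:K]:
            W[i][j] = 1.0 / K
    return W


X = [[0.0], [1.0], [2.0], [3.0], [10.0]]
y = [1.0, 2.0, 3.0, 4.0, 50.0]
print(knn_uniform([1.2], X, y, K=3, scales=[1.0]))  # neighbors x = 1, 2, 0 -> 2.0
```

Each row of the weight matrix sums to one and has a zero diagonal, which is what makes the leave-one-out construction of the next subsection possible.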


Other possibility: the $k$-NN estimator with non-uniform weights. Stone (1977) suggests the estimator

$$E(y|x) = \sum_{j=1}^{n}\omega_j\,y_j(x),$$

using either triangular weights:

$$\omega_j^T = \begin{cases}2(K - j + 1)/[K(K+1)] & \text{for } j \le K,\\ 0 & \text{for } j > K,\end{cases}$$

or quadratic weights:

$$\omega_j^Q = \begin{cases}6[K^2 - (j-1)^2]/[K(K+1)(4K-1)] & \text{for } j \le K,\\ 0 & \text{for } j > K.\end{cases}$$
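Both weight schemes are easy to check numerically: each sums to one over the $K$ nearest neighbors and puts more weight on closer observations. A short Python sketch (the function names are illustrative):

```python
def stone_weights(K, n, kind="triangular"):
    """Stone (1977) k-NN weights omega_1..omega_n for the K nearest neighbors.

    triangular: 2(K - j + 1) / [K(K + 1)]               for j <= K
    quadratic:  6[K^2 - (j - 1)^2] / [K(K + 1)(4K - 1)]  for j <= K
    Both are 0 for j > K.
    """
    weights = []
    for j in range(1, n + 1):
        if j > K:
            weights.append(0.0)
        elif kind == "triangular":
            weights.append(2.0 * (K - j + 1) / (K * (K + 1)))
        elif kind == "quadratic":
            weights.append(6.0 * (K ** 2 - (j - 1) ** 2) / (K * (K + 1) * (4 * K - 1)))
        else:
            raise ValueError("kind must be 'triangular' or 'quadratic'")
    return weights


def stone_estimate(y_sorted, weights):
    """E(y|x) = sum_j omega_j y_j(x), with y_sorted ordered by distance to x."""
    return sum(w * yj for w, yj in zip(weights, y_sorted))


w_t = stone_weights(5, 10)
w_q = stone_weights(5, 10, "quadratic")
```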

6.6.3.2 Application to optimal instruments estimation


The nearest neighbor estimator of the conditional covariance $\Omega(x_i)$ at $x_i$ is

$$\hat\Omega(x_i) = \sum_{j=1}^{n}W_{ij}\,\rho(z_j;\hat\theta)\rho(z_j;\hat\theta)',$$

where observation $i$ is excluded because $W_{ii} = 0$ (leave-one-out procedure). The nonparametric estimator of $D(x)$ is accordingly

$$\hat D(x_i) = \sum_{j=1}^{n}W_{ij}\,\frac{\partial\rho(z_j;\hat\theta)}{\partial\theta}.$$

Assume now that some components of $D(x)$ have a known functional form and depend only on $x$. The estimator will then be the sum of a parametric and a nearest neighbor component. Let $\bar D(x;\eta)$ denote a pre-specified function of $x$ and nuisance parameters $\eta$. $\bar D(x;\eta)$ has the same dimension as $D(x)$, and its components are equal to those of $D(x)$ that are known, and 0 otherwise. The estimator will be

$$\hat D(x_i) = \bar D(x_i;\hat\eta) + \sum_{j=1}^{n}W_{ij}\left[\frac{\partial\rho(z_j;\hat\theta)}{\partial\theta} - \bar D(x_j;\hat\eta)\right].$$

Finally, we can compute

$$\hat B(x_i) = \hat D(x_i)'\hat\Omega(x_i)^{-1}, \qquad \hat V = \left(\frac{1}{n}\sum_{i=1}^{n}\hat D(x_i)'\hat\Omega(x_i)^{-1}\hat D(x_i)\right)^{-1}.$$

6.6.4 Generalizing the approach: other nonparametric estimators

6.6.4.1 Conditional moment estimation

We wish to estimate the conditional expectation at the point $x$, $E(Y|X=x) = m(x)$, with

$$m(x) = \int_{-\infty}^{\infty}y\,\frac{f(y,x)}{f_1(x)}\,dy,$$

where $f(\cdot,\cdot)$ and $f_1(\cdot)$ are respectively the joint density of $(x,y)$ and the marginal density of $x$. A nonparametric alternative to $k$-NN consists in estimating the densities above nonparametrically, to construct $\hat m(x) = \int_{-\infty}^{\infty}y\,[\hat f(y,x)/\hat f_1(x)]\,dy$. A popular approach in practice is the Nadaraya-Watson kernel-based estimator.


6.6.4.2 Nonparametric density estimation

Let $F(x)$ denote the cumulative distribution function of $X$. The density function is

$$f(x) = \frac{d}{dx}F(x) = \lim_{h\to0}\frac{F(x+h/2) - F(x-h/2)}{h} = \lim_{h\to0}\frac{\Pr(x-h/2 < X < x+h/2)}{h}.$$

For estimating $f(x)$ based on observations $x_1,\ldots,x_n$, we take $h$ as a function of $n$ such that $h\to0$ when $n\to\infty$. The probability above is then estimated by the proportion of observations falling in the interval $(x-h/2, x+h/2)$:

$$\hat f(x) = \frac{1}{nh}\,\#\{x_1,\ldots,x_n \in (x-h/2,\,x+h/2)\} = \frac{1}{nh}\,\#\left\{\tfrac{x_1-x}{h},\ldots,\tfrac{x_n-x}{h} \in (-\tfrac12,\tfrac12)\right\} = \frac{1}{nh}\sum_{i=1}^n\mathbb{1}\left(-\frac12 < \frac{x_i-x}{h} < \frac12\right) = \frac{1}{nh}\sum_{i=1}^n\mathbb{1}\left(|\psi_i| < \frac12\right),$$

where $\psi_i = (x_i - x)/h$. $\hat f(x)$ is the per-unit relative frequency in the interval $(x-h/2, x+h/2)$, with midpoint $x$. The bandwidth $h$ measures the degree to which the data are smoothed (averaged) in computing $\hat f(x)$. This first, naive nonparametric density estimator was proposed by Fix and Hodges (1951); it is obtained by averaging the $x_i$'s in an interval around $x$, e.g., $x \pm h/2$, where $h$ is the interval width.

Density estimators using indicator functions are stepwise by nature and have jumps at $x_i \pm h/2$. If one prefers smoother sets of weights, one can replace the indicator function by a positive kernel function denoted $K(\cdot)$. The Parzen-Rosenblatt kernel density estimator is

$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^nK\left(\frac{x_i - x}{h}\right) = \frac{1}{nh}\sum_{i=1}^nK(\psi_i), \qquad \psi_i = \frac{x_i - x}{h},$$

where the kernel function has the following properties:

$$\int_{-\infty}^{\infty}K(\psi)\,d\psi = 1, \qquad K(-\infty) = K(\infty) = 0,$$

and may or may not be symmetric.
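The naive and kernel estimators differ only in the weighting function, which a short Python sketch makes explicit. The Gaussian kernel below is one common choice, assumed here for illustration; the uniform kernel on $(-1/2, 1/2)$ reproduces the Fix-Hodges estimator exactly.

```python
import math


def uniform_kernel(u):
    """Indicator of (-1/2, 1/2): gives the naive Fix-Hodges estimator."""
    return 1.0 if -0.5 < u < 0.5 else 0.0


def gaussian_kernel(u):
    """Standard normal density: a smooth kernel that integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h, kernel=gaussian_kernel):
    """f_hat(x) = (1/(n h)) * sum_i K((x_i - x)/h)."""
    return sum(kernel((xi - x) / h) for xi in data) / (len(data) * h)


data = [0.0, 0.1, 0.2, 1.0]
# 3 of the 4 points lie in (0.1 - 0.25, 0.1 + 0.25): 3 / (4 * 0.5) = 1.5
print(kde(0.1, data, 0.5, uniform_kernel))
```

Replacing `uniform_kernel` by `gaussian_kernel` removes the jumps at $x_i \pm h/2$ while leaving the $1/(nh)$ normalization unchanged.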

Note: this is easy to generalize to multivariate density estimation, with a multivariate kernel $K_1(\cdot,\cdot)$ such that $K(x) = \int K_1(x,y)\,dy$ and $\int K(x)\,dx = 1$:

$$\hat f(y,x) = \hat f(z) = \frac{1}{nh^{q+1}}\sum_{i=1}^nK_1\left(\frac{z_i - z}{h}\right),$$

where $x_i$ has dimension $q$, $z_i$ is the $i$th observation $(y_i,x_i)$ and $z = (y,x)$ is a fixed point.

6.6.4.3 Selection of bandwidth parameter

An important issue is the selection of the optimal bandwidth parameter $h$. For this, we need the following set of assumptions:

(A1) Observations $x_1,\ldots,x_n$ are i.i.d.
(A2) The kernel $K(\cdot)$ is symmetric around 0 and satisfies
  (i) $\int K(\psi)\,d\psi = 1$,
  (ii) $\int\psi^2K(\psi)\,d\psi = \mu_2 \ne 0$,
  (iii) $\int K^2(\psi)\,d\psi < \infty$.
(A3) The second-order derivatives of $f$ are continuous and bounded in some neighborhood of $x$.
(A4) $h = h_n \to 0$ as $n\to\infty$.
(A5) $nh_n \to \infty$ as $n\to\infty$.

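With these assumptions, we can approximate the bias and variance of $\hat f$:

$$\mathrm{Bias}[\hat f(x)] \approx \frac{h^2}{2}\,\mu_2\,f''(x), \qquad \mathrm{Var}[\hat f(x)] \approx \frac{1}{nh}\,f(x)\int K^2(\psi)\,d\psi.$$

Strategy for choosing $h$: minimize the Mean Integrated Squared Error (MISE):

$$E\int\left[\hat f(x) - f(x)\right]^2dx = \int\left[(\mathrm{Bias}\,\hat f(x))^2 + \mathrm{Var}(\hat f(x))\right]dx,$$

or preferably, its approximation (AMISE):

$$\mathrm{AMISE} = \frac14\lambda_1h^4 + \lambda_2(nh)^{-1}, \qquad \text{where } \lambda_1 = \mu_2^2\int[f''(x)]^2dx, \quad \lambda_2 = \int K^2(\psi)\,d\psi.$$

Since the squared bias is $O(h^4)$ and the variance is $O((nh)^{-1})$, the AMISE is of order $\max\{O(h^4), O((nh)^{-1})\}$. Hence, the only value of $h$ for which the two are of the same order of magnitude is

$$h \propto n^{-1/5},$$

for which

$$\mathrm{AMISE} = O(n^{-4/5}).$$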
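Setting the derivative of the AMISE to zero gives $\lambda_1h^3 = \lambda_2/(nh^2)$, i.e. $h^* = [\lambda_2/(\lambda_1 n)]^{1/5}$. The Python sketch below evaluates this, and, as an illustrative plug-in assumption not made in the text, uses a Gaussian kernel ($\mu_2 = 1$, $\lambda_2 = 1/(2\sqrt\pi)$) together with a $N(0,\sigma^2)$ reference density, for which $\int f''^2 = 3/(8\sqrt\pi\,\sigma^5)$.

```python
import math


def amise(h, n, lam1, lam2):
    """AMISE(h) = (1/4) lam1 h^4 + lam2 / (n h)."""
    return 0.25 * lam1 * h ** 4 + lam2 / (n * h)


def amise_optimal_h(n, lam1, lam2):
    """Minimizer of AMISE: lam1 h^3 = lam2 / (n h^2), i.e. h = (lam2 / (lam1 n))^(1/5)."""
    return (lam2 / (lam1 * n)) ** 0.2


def normal_reference_h(n, sigma):
    """Gaussian kernel + N(0, sigma^2) reference density (illustrative assumption)."""
    lam1 = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)  # mu_2^2 * int f''^2, mu_2 = 1
    lam2 = 1.0 / (2.0 * math.sqrt(math.pi))               # int K^2 for the Gaussian kernel
    return amise_optimal_h(n, lam1, lam2)


h = normal_reference_h(1000, 2.0)
```

Under these assumptions $h^* = (4/3)^{1/5}\sigma n^{-1/5} \approx 1.06\,\sigma n^{-1/5}$, and multiplying $n$ by 32 halves the bandwidth, as the $n^{-1/5}$ rate implies.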


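6.6.4.4 The Nadaraya-Watson kernel-based estimator

The estimator is

$$\hat m(x) = \int_{-\infty}^{\infty}y\,\frac{(nh^p)^{-1}\sum_{i=1}^nK_1\left(\frac{y_i-y}{h},\frac{x_i-x}{h}\right)}{(nh^q)^{-1}\sum_{i=1}^nK\left(\frac{x_i-x}{h}\right)}\,dy,$$

where $K(\cdot)$ and $K_1(\cdot,\cdot)$ are $q$-variate and $p$-variate kernels respectively, and $p = q+1$ (recall that $x$ has dimension $q$). Define $\xi_i = h^{-1}(y_i - y)$, so that $y = y_i - h\xi_i$. The numerator above becomes

$$(nh^p)^{-1}\sum_{i=1}^n\int_{-\infty}^{\infty}y\,K_1\left(\frac{y_i-y}{h},\frac{x_i-x}{h}\right)dy = \frac1n\sum_{i=1}^n\int_{-\infty}^{\infty}(y_i - h\xi)\,h^{-q}K_1\left(\xi,\frac{x_i-x}{h}\right)d\xi$$
$$= \frac1n\sum_{i=1}^ny_i\,h^{-q}\int_{-\infty}^{\infty}K_1\left(\xi,\frac{x_i-x}{h}\right)d\xi - \frac1n\sum_{i=1}^nh^{-p+2}\int_{-\infty}^{\infty}\xi\,K_1\left(\xi,\frac{x_i-x}{h}\right)d\xi,$$

and since the last term is zero for symmetric kernels, we finally have

$$\frac1n\sum_{i=1}^ny_i\,h^{-q}\int_{-\infty}^{\infty}K_1\left(\xi,\frac{x_i-x}{h}\right)d\xi = \frac1n\sum_{i=1}^ny_i\,h^{-q}K\left(\frac{x_i-x}{h}\right).$$

Hence the final nonparametric kernel estimator is

$$\hat m(x) = \left[\sum_{i=1}^nK\left(\frac{x_i-x}{h}\right)\right]^{-1}\sum_{i=1}^nK\left(\frac{x_i-x}{h}\right)y_i.$$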
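The final formula is a locally weighted average, sketched below in Python for a scalar regressor ($q = 1$). The Gaussian kernel is an illustrative assumption; its normalizing constant cancels between numerator and denominator, so it is omitted.

```python
import math


def nadaraya_watson(x, X, Y, h):
    """m_hat(x) = sum_i K((x_i - x)/h) y_i / sum_i K((x_i - x)/h), scalar x, Gaussian K."""
    weights = [math.exp(-0.5 * ((xi - x) / h) ** 2) for xi in X]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * yi for w, yi in zip(weights, Y)) / total


X = [-2.0, -1.0, 0.0, 1.0, 2.0]
Y = [3.0 * x + 1.0 for x in X]
# With a symmetric design around 0, the linear trend cancels: m_hat(0) = 1
print(nadaraya_watson(0.0, X, Y, 0.8))
```

A constant response is reproduced exactly at any evaluation point, which is a quick sanity check on the normalization.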

Special case: the General Nearest Neighbor estimator ($k$-NN).

We consider here weights similar to kernel functions with unbounded support:

$$m(x) = E(Y|X=x) = \sum_{i=1}^n\omega_{is}(x)\,y_i, \qquad \text{where } \omega_{is}(x) = \frac{K\left(\frac{x_i-x}{d}\right)}{\sum_{i=1}^nK\left(\frac{x_i-x}{d}\right)},$$

and $d$ is the distance between $x$ and its $K$th nearest neighbor.

There are numerous papers on the optimal choice of the window width. One can show (Mack, 1981) that $K$ and $h$ are linked in kernel estimation, with $K \propto nh^q$, so that

$$K_{opt} \propto n^{4/(4+q)}, \qquad h_{opt} \propto n^{-1/(4+q)},$$

that is, $K_{opt} \propto n^{4/5}$ and $h_{opt} \propto n^{-1/5}$ when $q = 1$.

Chapter 7
GMM estimators for time series
models
7.1 GMM and Euler equation models
Lucas critique (1976): policy evaluations based on traditional dynamic simultaneous-equation models are flawed because parameters are assumed invariant across different policy regimes. Since rational agents take policy changes into account in their decision making, marginal responses to a change in policy instruments estimated under one regime should not be expected to hold under another.
Standard estimation procedures (MLE) are computationally burdensome when one introduces taste and technology parameters.

Hansen and Singleton (1982): GMM can be applied easily to structural models, to draw inference on these parameters.

7.1.1 Hansen and Singleton framework

Consumption-based asset pricing model: a representative agent chooses consumption and investment in a single asset to maximize discounted utility

$$\max\ E_0\left[\sum_{t=0}^{\infty}\beta^tU(C_t)\right],$$

where
$E_t(\cdot)$ is the expectation operator at time $t$, conditional on the information set $\Omega_t$,
$C_t$ is consumption,
$\beta$ is a constant discount factor,
$U(\cdot)$ is a strictly concave utility function.
Budget constraint:

$$C_t + P_tQ_t \le R_tQ_{t-1} + W_t,$$

where
$R_t$: payoff of the asset (bought in period $t-1$),
$P_t$ and $Q_t$: price and quantity of the asset bought,
$W_t$: labor income.
The asset price is deflated by the price of the consumption good.

First-order condition:

$$P_t\,U'(C_t) = \beta\,E_t[R_{t+1}U'(C_{t+1})],$$

where $U'(\cdot) = \partial U/\partial C$. Equivalently,

$$E_t\left[\beta\,\frac{R_{t+1}}{P_t}\,\frac{U'(C_{t+1})}{U'(C_t)}\right] - 1 = 0.$$

This is the Euler equation for the system.


Specification of the utility function:

$$U(C_t) = \frac{C_t^{\gamma}}{\gamma}, \qquad \text{with } \gamma < 1,$$

so that

$$E_t\left[\beta\,\frac{R_{t+1}}{P_t}\left(\frac{C_{t+1}}{C_t}\right)^{\alpha}\right] - 1 = 0, \qquad (7.1)$$

where $\alpha = \gamma - 1$.

7.1.2 GMM estimation

Maximum-likelihood estimation: specify the conditional distributions of

$$LW_{1,t+1} = \log\frac{R_{t+1}}{P_t} \qquad \text{and} \qquad LW_{2,t+1} = \log\frac{C_{t+1}}{C_t}$$

given $\Omega_t$, and maximize the likelihood function based on these, subject to restriction (7.1).

Disadvantage: this is a computer-intensive method, and inference may be biased if the conditional distributions are misspecified.

Consider GMM estimation of Equation (7.1). To identify the parameters $\beta$ and $\alpha$ we need at least two moment restrictions. The first one is obtained from the Law of Iterated Expectations:

$$E\left[\beta\,\frac{R_{t+1}}{P_t}\left(\frac{C_{t+1}}{C_t}\right)^{\alpha} - 1\right] = E\left\{E_t\left[\beta\,\frac{R_{t+1}}{P_t}\left(\frac{C_{t+1}}{C_t}\right)^{\alpha} - 1\right]\right\} = 0.$$

Additional restrictions are obtained by incorporating the rational expectations hypothesis: agents use all information available at time $t$, $\Omega_t$, so that if $y_{t+1} \notin \Omega_t$ but $z_t \in \Omega_t$, then $E_t(y_{t+1}z_t) = [E_t(y_{t+1})]z_t$. If $E_t(y_{t+1}) = 0$, by the Law of Iterated Expectations we have $E(y_{t+1}z_t) = 0$, and the Euler equation implies

$$E[\varepsilon_{t+1}(\beta,\alpha)\,z_t] = 0, \qquad \text{where } \varepsilon_{t+1} = \beta\,\frac{R_{t+1}}{P_t}\left(\frac{C_{t+1}}{C_t}\right)^{\alpha} - 1,$$

and $z_t$ is a vector of variables contained in the information set $\Omega_t$. Valid candidates are $C_{t-i}, R_{t-i}, P_{t-i}$, $i \ge 0$.
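The sample counterparts of these conditions can be coded directly. The Python sketch below assumes illustrative instruments $z_t = (1, C_t/C_{t-1}, R_t/P_{t-1})'$, which are functions of variables in $\Omega_t$; a GMM routine would then minimize a quadratic form in these averages over $(\beta,\alpha)$. Series are indexed from 0, and `euler_moments` and `quadratic_criterion` are hypothetical helper names.

```python
def euler_error(beta, alpha, R, P, C, t):
    """eps_{t+1}(beta, alpha) = beta * (R_{t+1}/P_t) * (C_{t+1}/C_t)**alpha - 1."""
    return beta * (R[t + 1] / P[t]) * (C[t + 1] / C[t]) ** alpha - 1.0


def euler_moments(beta, alpha, R, P, C):
    """Average of eps_{t+1} z_t with z_t = (1, C_t/C_{t-1}, R_t/P_{t-1}), t = 1..T-2."""
    moments, count = [0.0, 0.0, 0.0], 0
    for t in range(1, len(C) - 1):
        eps = euler_error(beta, alpha, R, P, C, t)
        z = (1.0, C[t] / C[t - 1], R[t] / P[t - 1])
        for k in range(3):
            moments[k] += eps * z[k]
        count += 1
    return [m / count for m in moments]


def quadratic_criterion(g, A):
    """GMM objective g' A g for a moment vector g and weighting matrix A."""
    return sum(g[a] * A[a][b] * g[b] for a in range(len(g)) for b in range(len(g)))


# Artificial data built so that the Euler equation holds exactly at (0.97, -2)
beta0, alpha0 = 0.97, -2.0
C = [1.0, 1.02, 1.01, 1.05, 1.04, 1.08]
P = [1.0] * 6
R = [1.0] + [P[t] / (beta0 * (C[t + 1] / C[t]) ** alpha0) for t in range(5)]
```

At the true parameters every sample moment is zero; at a wrong discount factor the first moment equals the average Euler error.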

Notes.

• This example shows that model errors need not be linear in endogenous variables for GMM.

• If $\beta$ is replaced by $1/(1+r)$, where $r$ is an observed constant interest rate, the model is just identified (for $\alpha$) through Euler equation (7.1).

7.2 GMM Estimation of MA models

Consider estimation of a pure moving average MA(1) model

$$y_t = \varepsilon_t + \theta_0\varepsilon_{t-1}, \qquad (7.2)$$

where $\varepsilon_t$ is an i.i.d. process with mean 0, variance $\sigma_0^2$, and $|\theta_0| < 1$.

7.2.1 A simple estimator

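The implied first-order autocorrelation is

$$\rho_0 = \frac{E(y_ty_{t-1})}{E(y_t^2)} = \frac{\theta_0}{1+\theta_0^2}.$$

Replacing the unknown parameter $\rho_0$ by the sample estimator

$$\hat\rho_T = \frac{\sum_{t=2}^Ty_ty_{t-1}}{\sum_{t=2}^Ty_t^2},$$

we obtain the estimator $\hat\theta_T$ by solving

$$\hat\theta_T^2 - \hat\rho_T^{-1}\hat\theta_T + 1 = 0.$$

Problem: the solution is real-valued only if $|\hat\rho_T| \le 0.5$, but this may not hold in finite samples, especially if $|\theta_0|$ is close to 1. We may define

$$\tilde\rho_T = \begin{cases}-0.5 & \text{if } \hat\rho_T < -0.5,\\ \hat\rho_T & \text{if } |\hat\rho_T| \le 0.5,\\ 0.5 & \text{if } \hat\rho_T > 0.5,\end{cases}$$

and the (invertible) solution for $\tilde\theta_T$ is

$$\tilde\theta_T = \frac{1 - \sqrt{1 - 4\tilde\rho_T^2}}{2\tilde\rho_T}.$$

Second structural parameter: $\sigma_0^2$, whose expression can be derived from

$$E(y_t^2) = \sigma_0^2(1+\theta_0^2),$$

with sample analog

$$\tilde\sigma_T^2 = \frac{(1/T)\sum_{t=1}^Ty_t^2}{1+\tilde\theta_T^2}.$$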
1+
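Consider now estimation in a GMM framework. Define the parameter vector $\delta = (\theta,\sigma^2)'$ and let

$$f(y_t;\delta) = \begin{pmatrix}y_ty_{t-1} - \theta\sigma^2\\ y_t^2 - \sigma^2(1+\theta^2)\end{pmatrix},$$

such that $Ef(y_t;\delta_0) = 0$ (theoretical moment condition). Sample moments are

$$f_T(\delta) = \frac1T\sum_{t=1}^Tf(y_t;\delta) = \begin{pmatrix}(1/T)\sum_{t=1}^Ty_ty_{t-1} - \theta\sigma^2\\ (1/T)\sum_{t=1}^Ty_t^2 - \sigma^2(1+\theta^2)\end{pmatrix}.$$

This system is just-identified, and solving $f_T(\hat\delta_T) = 0$ yields the same estimators as above (up to end effects in the sums): $\hat\delta_T = \tilde\delta_T = (\tilde\theta_T, \tilde\sigma_T^2)$.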
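A Python sketch of this just-identified moment estimator, including the truncation of $\hat\rho_T$ to $[-0.5, 0.5]$ and the choice of the invertible root. The simulation settings (Gaussian shocks, seed, sample size) are illustrative assumptions.

```python
import math
import random


def theta_from_rho(rho):
    """Invertible root of rho*theta^2 - theta + rho = 0 after truncating rho to [-0.5, 0.5]."""
    rho = max(-0.5, min(0.5, rho))
    if rho == 0.0:
        return 0.0
    return (1.0 - math.sqrt(1.0 - 4.0 * rho * rho)) / (2.0 * rho)


def ma1_moment_estimator(y):
    """Return (theta_tilde, sigma2_tilde) from the first two sample moments."""
    T = len(y)
    rho_hat = (sum(y[t] * y[t - 1] for t in range(1, T))
               / sum(y[t] ** 2 for t in range(1, T)))
    theta = theta_from_rho(rho_hat)
    sigma2 = (sum(v * v for v in y) / T) / (1.0 + theta ** 2)
    return theta, sigma2


def simulate_ma1(theta, sigma, T, seed=0):
    """y_t = eps_t + theta * eps_{t-1} with i.i.d. N(0, sigma^2) shocks."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(T + 1)]
    return [eps[t + 1] + theta * eps[t] for t in range(T)]


theta_hat, sigma2_hat = ma1_moment_estimator(simulate_ma1(0.5, 1.0, 20000, seed=1))
```

When $\hat\rho_T$ is truncated to $\pm 0.5$ the estimate sits on the invertibility boundary $\tilde\theta_T = \pm 1$, which illustrates the finite-sample problem mentioned above.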

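Theorem 4: Estimators $\hat\theta_T$ and $\tilde\sigma_T^2$ are consistent and asymptotically normal, with distribution

$$\sqrt T\begin{pmatrix}\hat\theta_T - \theta_0\\ \tilde\sigma_T^2 - \sigma_0^2\end{pmatrix} \sim N(0,\Sigma),$$

where

$$\Sigma = \frac{1}{(1-\theta_0^2)^2}\begin{pmatrix}1+\theta_0^2+4\theta_0^4+\theta_0^6+\theta_0^8 & -2\sigma_0^2\theta_0^3(2+\theta_0^2+\theta_0^4)\\ -2\sigma_0^2\theta_0^3(2+\theta_0^2+\theta_0^4) & 2\sigma_0^4(1-2\theta_0^2+3\theta_0^4+2\theta_0^6)\end{pmatrix} + \begin{pmatrix}0 & 0\\ 0 & \kappa_4\end{pmatrix},$$

with $\kappa_4$ the fourth-order cumulant of $\varepsilon_t$.

Under the normality assumption, the asymptotic variance of the MLE of $\theta$ is $1-\theta_0^2$. Hence this GMM estimator is asymptotically as efficient as the MLE only if $\theta_0 = 0$, and is rather inefficient in general.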

7.2.2 A more efficient estimator

This estimator is based on an autoregressive approximation (Durbin 1959). The MA(1) defined by (7.2) is invertible, therefore it admits an AR representation:

$$y_t = \sum_{j=1}^{\infty}\pi_j(\theta_0)\,y_{t-j} + \varepsilon_t, \qquad \text{where } \pi_j(\theta) = -(-\theta)^j, \quad j = 1,2,\ldots$$

which is approximated in practice by

$$y_t = \sum_{j=1}^{K}\pi_j(\theta_0)\,y_{t-j} + \varepsilon_{Kt}. \qquad (7.3)$$

This approximation produces an extra error because

$$\varepsilon_{Kt} = \varepsilon_t + \sum_{j=K+1}^{\infty}\pi_j(\theta_0)\,y_{t-j} = \varepsilon_t - (-\theta_0)^{K+1}\varepsilon_{t-K-1}.$$

We can look at (7.2) as the structural model, and Equation (7.3) as the reduced-form model. The AR model captures the second-order properties of $y_t$ through the AR coefficients instead of the autocovariance function. We then need to define estimators for $\theta_0$ based on estimators for $\pi_j(\theta_0)$.

Define the $K$-vector

$$A_K(\theta) = \begin{pmatrix}\pi_1(\theta)\\ \vdots\\ \pi_K(\theta)\end{pmatrix}, \qquad \text{with } \pi_j(\theta) = -(-\theta)^j \ \forall\theta,$$

and let $\hat A_K$ denote the $K$-vector of OLS estimators $(\hat\pi_1,\ldots,\hat\pi_K)$ in (7.3).

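For a given $K$, we define

$$\hat\theta_{TK} = \arg\min_{\theta\in\Theta}\ \left[\hat A_K - A_K(\theta)\right]'V_{TK}\left[\hat A_K - A_K(\theta)\right],$$

where $\Theta = (-1,+1)$ and $V_{TK}$ is a $K\times K$ weighting matrix.

7.2.3 Example: The Durbin estimator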

We can write

$$\pi_j(\theta) = -\theta\,\pi_{j-1}(\theta), \qquad j = 1,2,\ldots, \qquad (7.4)$$

with $\pi_0(\theta) = -1$. This is an exact autoregressive relationship for $\pi_j(\theta)$, and we can estimate $\theta_0$ by regressing the OLS estimates $(\hat\pi_1,\ldots,\hat\pi_K)$ on lagged values of themselves, i.e., on $(-1, \hat\pi_1,\ldots,\hat\pi_{K-1})$. The estimator is

$$\hat\theta_D = -\frac{\sum_{j=1}^K\hat\pi_j\hat\pi_{j-1}}{\sum_{j=1}^K\hat\pi_{j-1}^2}, \qquad \text{with } \hat\pi_0 = -1.$$

In terms of (7.4), this corresponds to

$$V_{TK} = B_K(\theta)'B_K(\theta), \qquad \text{where } B_K(\theta) = I_K + \theta L_K,$$

and $L_K$ is the $K\times K$ matrix with 1's on the first lower off-diagonal and 0's elsewhere.
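The two steps of the Durbin estimator (an OLS autoregression of order $K$, then a regression of the coefficients on their own lags) can be sketched in plain Python. The linear solver, the choice $K = 10$ and the simulated Gaussian MA(1) are illustrative assumptions; with exact coefficients $\pi_j = -(-\theta)^j$ the formula recovers $\theta$ exactly.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ar_ols(y, K):
    """OLS estimates (pi_1, ..., pi_K) of y_t on y_{t-1}, ..., y_{t-K} (no intercept)."""
    rows = [[y[t - j] for j in range(1, K + 1)] for t in range(K, len(y))]
    target = [y[t] for t in range(K, len(y))]
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(K)] for a in range(K)]
    Xty = [sum(r[a] * v for r, v in zip(rows, target)) for a in range(K)]
    return solve(XtX, Xty)


def durbin_from_pi(pi):
    """theta_D = -sum_j pi_j pi_{j-1} / sum_j pi_{j-1}^2, j = 1..K, with pi_0 = -1."""
    lagged = [-1.0] + list(pi[:-1])
    num = sum(p * q for p, q in zip(pi, lagged))
    den = sum(q * q for q in lagged)
    return -num / den


def simulate_ma1(theta, T, seed=0):
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
    return [eps[t + 1] + theta * eps[t] for t in range(T)]


pi_hat = ar_ols(simulate_ma1(0.5, 5000, seed=2), K=10)
theta_D = durbin_from_pi(pi_hat)
```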

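7.3 GMM Estimation of ARMA models

To simplify exposition, we concentrate on the ARMA(1,1) case.

7.3.1 The ARMA(1,1) model

The model is

$$y_t = \phi_0y_{t-1} + \varepsilon_t + \theta_0\varepsilon_{t-1},$$

where we assume

$$\theta_0 \ne 0, \qquad |\phi_0| < 1, \qquad |\theta_0| < 1,$$

and we view the model as a regression

$$y_t = \phi_0y_{t-1} + u_t, \qquad \text{where } u_t = \varepsilon_t + \theta_0\varepsilon_{t-1}, \qquad (7.5)$$

and $\phi_0$ is the parameter of interest.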


OLS estimation (ignoring the moving average structure in $u_t$) is inconsistent because $E(u_ty_{t-1}) \ne 0$, since by back substitution

$$y_t = \sum_{j=0}^{\infty}\phi_0^j\,u_{t-j}, \qquad E(u_ty_{t-1}) = E(u_tu_{t-1}) = \sigma_\varepsilon^2\theta_0. \qquad (7.6)$$

7.3.2 IV estimation

The MA(1) structure of $u_t$ implies $E(u_tu_{t-j}) = 0 \ \forall j \ge 2$, and (7.6) implies that $E(u_ty_{t-j}) = 0 \ \forall j \ge 2$. We can use these moment conditions to estimate $\phi_0$ consistently with an IV procedure.

Moment conditions are

$$Ef(y_t;\phi_0) = 0, \qquad \text{where } f(y_t;\phi) = (y_t - \phi y_{t-1})\,y_{t-2},$$

with sample analog

$$f_T(\phi) = \frac1T\sum_{t=3}^T(y_t - \phi y_{t-1})\,y_{t-2},$$

and solving $f_T(\hat\phi_T) = 0$ for $\hat\phi_T$ gives

$$\hat\phi_T = \left(\sum_{t=3}^Ty_{t-2}y_{t-1}\right)^{-1}\sum_{t=3}^Ty_{t-2}y_t.$$
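A small simulation makes the contrast between OLS and IV concrete. The Python sketch below assumes Gaussian shocks, a burn-in of 500 draws and the parameter values $\phi_0 = 0.5$, $\theta_0 = 0.4$ (all illustrative); `iv_ar1` with a general lag `j` also covers the estimators $\hat\phi_{Tj}$ that use $y_{t-j}$, $j \ge 2$, as instrument.

```python
import random


def simulate_arma11(phi, theta, T, seed=0, burn=500):
    """y_t = phi*y_{t-1} + eps_t + theta*eps_{t-1}, N(0,1) shocks, burn-in discarded."""
    rng = random.Random(seed)
    y_prev, e_prev, out = 0.0, 0.0, []
    for t in range(T + burn):
        e = rng.gauss(0.0, 1.0)
        y = phi * y_prev + e + theta * e_prev
        if t >= burn:
            out.append(y)
        y_prev, e_prev = y, e
    return out


def ols_ar1(y):
    """OLS slope of y_t on y_{t-1}: inconsistent here because E(u_t y_{t-1}) != 0."""
    return (sum(y[t] * y[t - 1] for t in range(1, len(y)))
            / sum(y[t - 1] ** 2 for t in range(1, len(y))))


def iv_ar1(y, j=2):
    """IV estimator with instrument y_{t-j}: sum y_t y_{t-j} / sum y_{t-1} y_{t-j}."""
    num = sum(y[t] * y[t - j] for t in range(j, len(y)))
    den = sum(y[t - 1] * y[t - j] for t in range(j, len(y)))
    return num / den


y = simulate_arma11(0.5, 0.4, 20000, seed=3)
phi_ols, phi_iv = ols_ar1(y), iv_ar1(y)
```

For these parameters the OLS probability limit is $\gamma_1/\gamma_0 = (1+\phi_0\theta_0)(\phi_0+\theta_0)/(1+2\phi_0\theta_0+\theta_0^2) \approx 0.69$, well above $\phi_0 = 0.5$, while the IV estimate stays close to 0.5.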

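Theorem 5: $\hat\phi_T$ is consistent and asymptotically normal, with $\sqrt T(\hat\phi_T - \phi_0) \sim N(0,\Sigma)$, where

$$\Sigma = \frac{(1-\phi_0^2)(1 + 4\theta_0^2 + 4\phi_0\theta_0 + 4\phi_0\theta_0^3 + 2\phi_0^2\theta_0^2 + \theta_0^4)}{(1+\phi_0\theta_0)^2(\phi_0+\theta_0)^2}.$$

In contrast, the asymptotic distribution of the MLE for the ARMA(1,1) model is

$$\sqrt T(\hat\phi_{MLE} - \phi_0) \sim N\left(0,\ \frac{(1+\phi_0\theta_0)^2(1-\phi_0^2)}{(\phi_0+\theta_0)^2}\right).$$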

Notes. Both of these estimators have a large variance when $\phi_0$ is close to $-\theta_0$. The MLE is more efficient than GMM, especially for large values of $\phi_0$ and $\theta_0$.

We can also consider augmenting the set of instruments (the model becomes over-identified) by including $y_{t-j}$, $j = 2,3,\ldots$, yielding

$$\hat\phi_{Tj} = \left(\sum_{t=j+1}^Ty_{t-1}y_{t-j}\right)^{-1}\sum_{t=j+1}^Ty_ty_{t-j}, \qquad \text{for } j \ge 2.$$

Dolado (1990) shows that $\hat\phi_{Tj}$ has asymptotic variance proportional to $\phi_0^{-2(j-2)}$. Because $|\phi_0| < 1$, it follows that $\hat\phi_T$ is the most efficient of these estimators. Intuition: the efficiency of an IV estimator is related to the correlation between the stochastic regressor ($y_{t-1}$) and its instrumental variable ($y_{t-j}$). Since $y_t$ is stationary, this correlation decreases (rapidly) with $j$, and it is best to choose the smallest admissible $j$.

Finally, the last possibility is to use more than one instrument for $y_{t-1}$, as implied by the moment conditions $E(u_ty_{t-j}) = 0 \ \forall j \ge 2$. This gives the $q$-vector of conditions

$$Ef(y_t;\phi_0) = 0, \qquad f(y_t;\phi) = Y_{q,t-2}(y_t - \phi y_{t-1}),$$

where $Y_{q,t} = (y_t,\ldots,y_{t-q+1})'$ is a $q$-vector of instruments. The GMM estimator is

$$\hat\phi_{Tq} = \left[\sum_{t=q+2}^Ty_{t-1}Y_{q,t-2}'\,A_{Tq}\sum_{t=q+2}^TY_{q,t-2}y_{t-1}\right]^{-1}\sum_{t=q+2}^Ty_{t-1}Y_{q,t-2}'\,A_{Tq}\sum_{t=q+2}^TY_{q,t-2}y_t,$$

where $A_{Tq}$ is a positive definite $q\times q$ weighting matrix. The asymptotic distribution of $\hat\phi_{Tq}$ is

$$\sqrt T(\hat\phi_{Tq} - \phi_0) \xrightarrow{d} N\left(0,\ (R_q'A_qR_q)^{-1}R_q'A_qV_qA_qR_q(R_q'A_qR_q)^{-1}\right),$$

where $A_q = \mathrm{plim}\,A_{Tq}$,

$$V_q = \lim_{T\to\infty}T\,\mathrm{var}\,f_T(\phi_0), \qquad R_q = E(Y_{q,t-2}\,y_{t-1}),$$

with $j$th element

$$\sigma_\varepsilon^2\,\phi_0^{j-1}\,\frac{(1+\phi_0\theta_0)(\phi_0+\theta_0)}{1-\phi_0^2}.$$

The optimal choice for the weighting matrix being $A_{Tq} = V_q^{-1}$, we have

$$\sqrt T(\hat\phi_{Tq} - \phi_0) \xrightarrow{d} N\left(0,\ (R_q'V_q^{-1}R_q)^{-1}\right).$$

7.4 Covariance matrix estimation


In the time-series framework, moment conditions are defined as $E[f(x_t;\theta_0)] = 0$, and the variance-covariance matrix to be estimated is

$$V_T = T\,\mathrm{var}[f_T(\theta_0)] = \frac1T\sum_{t=1}^T\sum_{s=1}^TE[f(x_t;\theta_0)f(x_s;\theta_0)'].$$

This is the average of autocovariances for the process $f(x_t;\theta_0)$. Let $f_t = f(x_t;\theta_0)$ and rewrite $V_T$ as a general autocovariance function:

$$V_T = \sum_{j=-(T-1)}^{T-1}\Gamma_T(j), \qquad \text{where } \Gamma_T(j) = \begin{cases}(1/T)\sum_{t=j+1}^TE(f_tf_{t-j}'), & j \ge 0,\\ (1/T)\sum_{t=-j+1}^TE(f_{t+j}f_t'), & j < 0.\end{cases}$$

7.4.1 Example 1: Conditional homoskedasticity

Linear regression model

$$y_t = x_t'\beta_0 + u_t, \qquad \text{where } E(f_t) = E(x_tu_t) = 0.$$

Assume $E(u_t|u_{t-1},x_t,u_{t-2},x_{t-1},\ldots) = 0$ and $E(u_t^2|u_{t-1},x_t,u_{t-2},x_{t-1},\ldots) = \sigma_u^2$. The residual $u_t$ is neither heteroskedastic nor serially correlated. We have

$$V_T = \Gamma_T(0) = \frac1T\sum_{t=1}^TE(x_tu_tu_tx_t') = \sigma_u^2E(x_tx_t'),$$

the standard OLS variance-covariance matrix. The estimator of $V_T$ constructed from sample moments is the MLE:

$$\hat V_T = \frac{\hat\sigma_u^2}{T}\sum_{t=1}^Tx_tx_t', \qquad \text{where } \hat\sigma_u^2 = \frac1T\sum_{t=1}^T\hat u_t^2, \quad \hat u_t = y_t - x_t'\hat\beta.$$

7.4.2 Example 2: Conditional heteroskedasticity

Assume now that

$$E(u_t^2|u_{t-1},x_t,u_{t-2},x_{t-1},\ldots) = \sigma_t^2.$$

The covariance matrix is then

$$V_T = \Gamma_T(0) = \frac1T\sum_{t=1}^TE(x_tu_tu_tx_t'),$$

which is consistently estimated by

$$\hat V_T = \frac1T\sum_{t=1}^Tx_t\hat u_t\hat u_tx_t'.$$

This is White's heteroskedasticity-consistent estimator.
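The two estimators of $V_T$ differ only in whether the squared residual is averaged before or inside the outer product. A small Python sketch, where `X` holds the rows $x_t'$ and `u` the residuals $\hat u_t$ (assumed to come from a first-stage fit; the numbers below are arbitrary and the function names are illustrative):

```python
def homoskedastic_V(X, u):
    """V_hat = sigma_hat^2 * (1/T) sum_t x_t x_t'."""
    T, p = len(X), len(X[0])
    s2 = sum(ut * ut for ut in u) / T
    return [[s2 * sum(x[a] * x[b] for x in X) / T for b in range(p)] for a in range(p)]


def white_V(X, u):
    """V_hat = (1/T) sum_t x_t u_t^2 x_t' (White's heteroskedasticity-consistent estimator)."""
    T, p = len(X), len(X[0])
    return [[sum(x[a] * x[b] * ut * ut for x, ut in zip(X, u)) / T for b in range(p)]
            for a in range(p)]


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
u = [1.0, -2.0, 1.0]
Vw, Vh = white_V(X, u), homoskedastic_V(X, u)
```

Here the large residual sits at the middle observation, so White's estimator gives a smaller $(2,2)$ element ($8/3$) than the homoskedastic formula ($10/3$).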
In a typical IV setup, where $f_t = w_t(y_t - x_t'\beta)$ and $w_t$ are instruments,

$$V_T = \frac1T\sum_{t=1}^TE(u_t^2w_tw_t'),$$

and the asymptotic covariance matrix would be

$$\left(\frac1TX'P_WX\right)^{-1}\left(\frac1TX'P_W\hat\Omega P_WX\right)\left(\frac1TX'P_WX\right)^{-1},$$

where $\hat\Omega$ is a $T\times T$ diagonal matrix with typical element $\hat u_t^2$, and $P_W = W(W'W)^{-1}W'$.

7.4.3 Example 3: Covariance stationary process

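Assume $\Gamma_T(j) = 0$ for $|j| > m$, so that

$$V_T = \sum_{j=-m}^{m}\Gamma_T(j).$$

The covariance matrix estimator is based on the sample analogue

$$\hat V_T = \sum_{j=-m}^{m}\hat\Gamma_T(j), \qquad \text{where } \hat\Gamma_T(j) = \begin{cases}(1/T)\sum_{t=j+1}^Tx_t\hat u_tx_{t-j}'\hat u_{t-j}, & j \ge 0,\\ (1/T)\sum_{t=-j+1}^Tx_{t+j}\hat u_{t+j}x_t'\hat u_t, & j < 0.\end{cases}$$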

In most cases, restrictions as in Examples 1-3 above are too strong, and an obvious idea would be to construct an estimator $\hat V_{MM}$ based on sample analogues of the population autocovariances:

$$\hat V_{MM} = \sum_{j=-(T-1)}^{T-1}\hat\Gamma_T(j), \qquad \text{where } \hat\Gamma_T(j) = \begin{cases}(1/T)\sum_{t=j+1}^T\hat f_t\hat f_{t-j}', & j \ge 0,\\ (1/T)\sum_{t=-j+1}^T\hat f_{t+j}\hat f_t', & j < 0,\end{cases}$$

and $\hat f_t = f(x_t;\hat\theta_T)$.

But:

• The number of estimated autocovariances grows with the sample size;

• Although $\hat V_{MM}$ may be asymptotically unbiased, it is not consistent in the mean squared error sense;

• Finite-sample properties: in the exactly identified case, $\hat V_{MM} = 0$ for all $T$, because $\hat V_{MM} = T\,f_T(\hat\theta_T)f_T(\hat\theta_T)'$ and $f_T(\hat\theta_T) = 0$.

Why is the sample autocovariance matrix $\hat\Gamma_T(j)$ not consistent for arbitrary $j$, $-T+1 \le j \le T-1$? Suppose $j = T-2$; then $\hat\Gamma_T(j) = (1/T)(\hat f_{T-1}\hat f_1' + \hat f_T\hat f_2')$ is based on only two products and tends to 0 as $T\to\infty$!

7.4.4 The Newey-West estimator

Given the problem above, a first idea is to consider models for which the autocovariance genuinely tends to 0 as $|j|\to\infty$. This is the case for asymptotically independent processes characterized by the mixing property.

Definition 2: Consider two bounded mappings $Y:\mathbb{R}^{k+1}\to\mathbb{R}$ and $Z:\mathbb{R}^{l+1}\to\mathbb{R}$. The sequence $\{y_t\}$ is mixing if there exists a sequence of positive numbers $\{\alpha_n\}$, converging to 0, such that

$$\left|E[Y(y_t,y_{t+1},\ldots,y_{t+k})\,Z(y_{t+n},y_{t+n+1},\ldots,y_{t+n+l})] - E[Y(\cdot)]E[Z(\cdot)]\right| < \alpha_n.$$
We can replace the sum in the definition of $V_T$ by a truncated sum, such that terms for which $|j|$ is greater than some threshold $p$ are eliminated. Using the fact that $\hat\Gamma_T(-j) = \hat\Gamma_T(j)'$, we consider the following estimator for $V_T$:

$$\hat V_T = \hat\Gamma_T(0) + \sum_{j=1}^{p}\left[\hat\Gamma_T(j) + \hat\Gamma_T(j)'\right]. \qquad (7.7)$$

This is the Hansen (1982) covariance estimator (see also Hansen and Singleton 1982). $p$ is defined as the lag truncation parameter, and should go to infinity at some rate, typically $p = o(T^{1/4})$, so that all non-zero $\Gamma_T(j)$'s are consistently estimated.

Problem with the estimator in (7.7): in finite samples, $\hat V_T$ may not be positive semidefinite. Suggestion by Newey and West (1987): multiply $\hat\Gamma_T(j)$ by a sequence of weights that decrease with $|j|$. The Newey-West estimator is

$$\hat V_T = \hat\Gamma_T(0) + \sum_{j=1}^{p}\left(1 - \frac{j}{p+1}\right)\left[\hat\Gamma_T(j) + \hat\Gamma_T(j)'\right],$$

where the linear weights decrease from a value of 1 for $\hat\Gamma_T(0)$ down to $1/(p+1)$ for $|j| = p$, with step size $1/(p+1)$.
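A direct Python transcription of the Newey-West formula for a list of moment vectors $\hat f_t$ (function names are illustrative). With $p = 0$ it reduces to $\hat\Gamma_T(0)$, the White-type estimator of Example 2.

```python
def autocov(f, j):
    """Gamma_hat_T(j) = (1/T) sum_{t=j+1}^T f_t f_{t-j}' for j >= 0; f is a list of vectors."""
    T, q = len(f), len(f[0])
    return [[sum(f[t][a] * f[t - j][b] for t in range(j, T)) / T for b in range(q)]
            for a in range(q)]


def newey_west(f, p):
    """V_hat = Gamma(0) + sum_{j=1}^p (1 - j/(p+1)) [Gamma(j) + Gamma(j)']."""
    q = len(f[0])
    V = autocov(f, 0)
    for j in range(1, p + 1):
        w = 1.0 - j / (p + 1.0)
        G = autocov(f, j)
        for a in range(q):
            for b in range(q):
                V[a][b] += w * (G[a][b] + G[b][a])
    return V


f = [[1.0], [-1.0], [2.0], [0.5]]
# Gamma(0) = 6.25/4 = 1.5625, Gamma(1) = -2/4 = -0.5, weight 1/2: V = 1.5625 - 0.5 = 1.0625
print(newey_west(f, 1))
```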

7.4.5 Weighted autocovariance estimators

Extension of the Newey-West suggestion: looking for more efficient covariance matrix estimators. General form: a weighted average of sample autocovariance matrices:

$$\hat V_T = \sum_{s=-(T-1)}^{T-1}\omega_s\,\hat\Gamma_T(s),$$

where the sequence of weights $\{\omega_s\}$ is called the lag window.

Strategy: choose a lag window such that $\{\omega_s\}$ approaches 1
• rapidly enough to obtain asymptotic unbiasedness;
• slowly enough to ensure that the variance converges to 0.

In practice, we concentrate on scale parameter windows, where

$$\omega_s = k\left(\frac{s}{m_T}\right),$$

$m_T$ is the scale parameter or bandwidth parameter, and the function $k(\cdot)$ is the lag window generator. These estimators belong to the class of kernel spectral density estimators at frequency 0.

We assume that the function $k(\cdot):\mathbb{R}\to[-1,1]$ satisfies

$$k(0) = 1, \qquad k(z) = k(-z)\ \forall z\in\mathbb{R}, \qquad \int_{-\infty}^{\infty}|k(z)|\,dz < \infty,$$

and that $k(\cdot)$ is continuous at 0 and at all but a finite number of other points.
Note: when $k(z) = 0$ for $|z| > 1$, $m_T$ reduces to $p$, the lag truncation parameter.

Let $r$ be the largest integer such that

$$k_r = \lim_{z\to0}\frac{1 - k(z)}{|z|^r}$$

is finite and not 0. The integer $r$ is the characteristic exponent of the function $k(\cdot)$, and $k_r$ measures the smoothness of the lag window at 0. If $k_r$ is finite for some $r_0$, then $k_r = 0$ for $r < r_0$.

Consider finally the following measure of smoothness of the spectral density function in the neighborhood of 0:

$$S^{(r)} = (2\pi)^{-1}\sum_{j=-\infty}^{\infty}|j|^r\,\Gamma(j),$$

also called the generalized $r$th derivative of the spectral density function

$$S_f(\lambda) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\Gamma(j)\,e^{-ij\lambda}.$$

When $T\to\infty$, the limit of $V_T$ is equal to $2\pi$ times the spectral density matrix of $f_t$ evaluated at the zero frequency (Hansen 1982).
Define the asymptotic truncated Mean Squared Error:

$$\mathrm{MSE}_h = E\min\left\{\left|\frac{T}{m_T}\,\mathrm{vec}(\hat V_T - V_T)'B_T\,\mathrm{vec}(\hat V_T - V_T)\right|,\ h\right\},$$

where $B_T$ is a square, possibly random, weighting matrix.

We have the following result.

Theorem 6 (Andrews 1991): Assume $m_T\to\infty$ and $B_T\xrightarrow{p}B$. Then:
(i) If $m_T^2/T\to0$, then $\hat V_T - V_T\xrightarrow{p}0$.
(ii) If $m_T^{2r+1}/T\to\gamma\in(0,\infty)$ for some $r\in[0,\infty)$ for which $k_r$ and $\|S^{(r)}\|$ are finite, then $\sqrt{T/m_T}\,(\hat V_T - V_T) = O_p(1)$.
(iii) Under the conditions of (ii),

$$\lim_{h\to\infty}\lim_{T\to\infty}\mathrm{MSE}_h = 4\pi^2\left[\frac{k_r^2}{\gamma}(\mathrm{vec}\,S^{(r)})'B\,\mathrm{vec}\,S^{(r)} + \int_{-\infty}^{\infty}k^2(z)\,dz\ \mathrm{tr}\{B(I + B_{qq})[S_f(0)\otimes S_f(0)]\}\right],$$

where $B_{qq} = \sum_i\sum_je_ie_j'\otimes e_je_i'$ and $e_i$ is a zero vector with 1 as the $i$th element.

(i) establishes consistency of scale parameter covariance estimators for bandwidth sequences that grow at rate $o(\sqrt T)$.

(ii) gives the rate of convergence.

(iii) gives the asymptotic truncated Mean Squared Error. For the $j$th diagonal element of $\hat V_T$, the asymptotic bias is

$$-m_T^{-r}\,k_r\,2\pi\,S^{(r)}_{j,j},$$

and the asymptotic variance is

$$\frac{m_T}{T}\,8\pi^2\,S_{j,j}^2\int_{-\infty}^{\infty}k^2(z)\,dz, \qquad \text{with } S = S_f(0).$$

Criteria of choice for the scale parameter window: according to the theorem above, preferred estimators have a large $r$. The variance of these kernel estimators is $O(m_T/T)$ and the bias is $O(m_T^{-r})$. Also, no kernel estimator with $r > 2$ can be positive semidefinite. Hence, we should restrict attention to estimators with $r = 2$ (which rules out the truncated and Bartlett kernels).

Optimal choice of the scale parameter $m_T$: according to the asymptotic truncated MSE, the scale parameter should be of order $T^{1/(2r+1)}$, i.e., $T^{1/5}$ when $r = 2$.

Table 7.1: Some kernel estimators for weighted autocovariance

Kernel      k(z)                                              r     k_r
Truncated   1 for |z| ≤ 1; 0 otherwise                        ∞     0
Bartlett    1 − |z| for |z| ≤ 1; 0 otherwise                  1     1
Parzen      1 − 6z² + 6|z|³ for 0 ≤ |z| ≤ 1/2;                2     6
            2(1 − |z|)³ for 1/2 ≤ |z| ≤ 1; 0 otherwise
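The lag window generators of Table 7.1, plus the Quadratic Spectral kernel listed in Table 7.2, can be coded in a few lines, and the constants $k_r$ checked numerically from their definition $(1 - k(z))/|z|^r$ at a small $z$. This is a Python sketch with illustrative function names; note that Bartlett weights with $m_T = p+1$ reproduce the Newey-West weights $1 - j/(p+1)$.

```python
import math


def bartlett(z):
    return 1.0 - abs(z) if abs(z) <= 1.0 else 0.0


def parzen(z):
    a = abs(z)
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def quadratic_spectral(z):
    """QS kernel (defined by continuity as 1 at z = 0)."""
    if z == 0.0:
        return 1.0
    x = 6.0 * math.pi * z / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * z ** 2) * (math.sin(x) / x - math.cos(x))


def lag_window(kernel, m_T, max_lag):
    """Weights omega_s = k(s / m_T) for s = 0, ..., max_lag."""
    return [kernel(s / m_T) for s in range(max_lag + 1)]


def k_r_numeric(kernel, r, z=1e-3):
    """Approximate k_r = lim (1 - k(z)) / |z|^r by evaluating at a small z."""
    return (1.0 - kernel(z)) / abs(z) ** r


print(lag_window(bartlett, 4, 5))  # Newey-West weights for p = 3, then zeros
```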
7.4.6 Weighted periodogram estimators

Consider the Fourier transform of the lag window:

$$W(\lambda;m_T) = \frac{1}{2\pi}\sum_{s=-(T-1)}^{T-1}\omega_s\,e^{-is\lambda}.$$

This is also called the spectral window. Kernel estimators can be computed as weighted integrals:

$$\hat V_T = 2\pi\int_{-\pi}^{\pi}W(\lambda;m_T)\,\hat I_T(\lambda)\,d\lambda,$$

where

$$\hat I_T(\lambda) = (2\pi)^{-1}\sum_{s=-(T-1)}^{T-1}\hat\Gamma_T(s)\,e^{is\lambda}$$

is the sample spectral density or periodogram, and $W(\cdot;\cdot)$ is the averaging kernel. Spectral estimators were computationally burdensome before FFT (Fast Fourier Transform) algorithms became popular.
Define the Fourier transform of $\hat f_t$ as

$$\xi(\lambda) = \frac{1}{\sqrt{2\pi T}}\sum_{t=1}^T\hat f_t\,e^{i\lambda t}.$$

Table 7.2: Some kernel estimators for weighted periodograms

Kernel               k(z)                                                          r     k_r
Quadratic spectral   25/(12π²z²) · [sin(6πz/5)/(6πz/5) − cos(6πz/5)]               2     18π²/125
Daniell              sin(πz)/(πz)                                                  2     π²/6
Tent                 2(1 − cos z)/z²                                               2     1/12

The periodogram matrix can be computed at the Fourier frequencies

$$\lambda_p = \frac{2\pi p}{T}, \qquad p = 1,2,\ldots,\frac T2,$$

as

$$\hat I_T(\lambda_p) = \xi(\lambda_p)\,\xi(\lambda_p)^*,$$

where $^*$ denotes the conjugate transpose, and we have the final expression for the covariance matrix:

$$\hat V_T = \frac{4\pi^2}{2T-1}\sum_{p=-(T-1)}^{T-1}\hat I_T(\lambda'_p)\,W(\lambda'_p;m_T),$$

where $\lambda'_p = 2\pi p/(2T-1)$. Within the class of scale parameter windows with $r = 2$, the Quadratic Spectral window (see Table 7.2) minimizes the asymptotic truncated MSE at the zero frequency (Andrews 1992).
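The frequency-domain formula can be checked numerically against the time-domain weighted sum $\sum_s\omega_s\hat\Gamma_T(s)$: because $s - s'$ ranges over fewer than $2T-1$ values, the discrete sum over the frequencies $\lambda'_p$ is exact. The Python sketch below handles a scalar $\hat f_t$ and uses a direct (slow) discrete Fourier transform rather than an FFT; the Bartlett weights with $m_T = 3$ and the data are arbitrary illustrative choices.

```python
import cmath
import math


def bartlett_weight(s, m_T=3.0):
    return max(0.0, 1.0 - abs(s) / m_T)


def time_domain_V(f, weight):
    """sum_{s=-(T-1)}^{T-1} omega_s Gamma_hat_T(s) for a scalar series f."""
    T = len(f)
    total = 0.0
    for s in range(-(T - 1), T):
        gamma = sum(f[t] * f[t - abs(s)] for t in range(abs(s), T)) / T
        total += weight(s) * gamma
    return total


def frequency_domain_V(f, weight):
    """(4 pi^2 / (2T - 1)) sum_p I_T(lambda'_p) W(lambda'_p), lambda'_p = 2 pi p / (2T - 1)."""
    T = len(f)
    total = 0.0
    for p in range(-(T - 1), T):
        lam = 2.0 * math.pi * p / (2 * T - 1)
        xi = sum(f[t] * cmath.exp(1j * lam * (t + 1)) for t in range(T)) / math.sqrt(2 * math.pi * T)
        periodogram = (xi * xi.conjugate()).real
        window = sum(weight(s) * cmath.exp(-1j * s * lam)
                     for s in range(-(T - 1), T)).real / (2 * math.pi)
        total += periodogram * window
    return 4.0 * math.pi ** 2 / (2 * T - 1) * total


f = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1]
```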

Chapter 8
GMM estimators for dynamic
panel data
8.1 Introduction
GMM estimation was introduced as an interesting alternative to fixed-effects, maximum-likelihood or GLS estimation procedures, but its advantages are most obvious for estimating dynamic panel-data models.

Consider the simple model without exogenous regressors:

$$y_{it} = \delta y_{i,t-1} + u_{it}, \qquad u_{it} = \alpha_i + \varepsilon_{it}.$$

The Anderson-Hsiao instrumental-variable procedure yields consistent estimates when $T$ is fixed (and $N\to\infty$), based on the first-difference transformation of the model.
It has two drawbacks:
a) In the IV procedure, the variance-covariance matrix is restricted;
b) Only one instrument is used (either $y_{i,t-2}$ or $y_{i,t-2} - y_{i,t-3}$).

8.2 The Arellano-Bond estimator


Important paper: Arellano and Bond (Review of Economic Studies, 1991): a more robust procedure can be used (point a) and more orthogonality conditions can be exploited (point b).

8.2.1 Model assumptions

(i) For all $i$, $\varepsilon_{it}$ is uncorrelated with $y_{i0}$ for all $t$;

(ii) For all $i$, $\varepsilon_{it}$ is uncorrelated with $\alpha_i$ for all $t$;

(iii) For all $i$, the $\varepsilon_{it}$'s are mutually uncorrelated.

Under these assumptions, we have the set of moment conditions:

$$E(y_{is}\Delta u_{it}) = 0, \qquad t = 2,3,\ldots,T, \quad s = 0,1,\ldots,t-2,$$

where $\Delta u_{it} = \Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$.

This is a set of $T(T-1)/2$ conditions (compare with Anderson-Hsiao, where only 1 condition was available).

Important assumption: the conditions above hold if the error terms $\varepsilon$ are not serially correlated, i.e., we must have $E(\varepsilon_{it}\varepsilon_{i,t+s}) = 0$ for $s = -1, 1$.

If serial correlation is present, we have the set of conditions:

$$E(y_{is}\Delta u_{it}) = 0, \qquad t = 3,\ldots,T, \quad s = 0,1,\ldots,t-3,$$

which gives $(T-1)(T-2)/2$ conditions (we lost $T-1$ conditions).
By continuous substitution, as seen before:

$$y_{it} = \varepsilon_{it} + \delta\varepsilon_{i,t-1} + \delta^2\varepsilon_{i,t-2} + \cdots + \delta^{t-1}\varepsilon_{i1} + \frac{1-\delta^t}{1-\delta}\,\alpha_i + \delta^ty_{i0},$$

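so that $y_{it} = f(\varepsilon_{it},\varepsilon_{i,t-1},\ldots,\varepsilon_{i1},\alpha_i,y_{i0})$, and

$$E(y_{i,t-2}\Delta u_{it}) = E[y_{i,t-2}(\varepsilon_{it} - \varepsilon_{i,t-1})] = 0,$$

because $y_{i,t-2}$ only involves $\varepsilon_{i,t-2},\ldots,\varepsilon_{i1}$, $\alpha_i$ and $y_{i0}$, and by assumption $E(\alpha_i\varepsilon_{it}) = E(\varepsilon_{it}y_{i0}) = 0$ and the $\varepsilon_{it}$'s are mutually uncorrelated.

8.2.2 Implementation of the GMM estimator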

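We need a) the instrument matrix $W$; b) an initial weighting matrix.
The instrument submatrix for unit $i$ is of the form:

$$W_i = \begin{pmatrix}
y_{i0} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0\\
0 & y_{i0} & y_{i1} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0\\
0 & 0 & 0 & y_{i0} & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots\\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & y_{i0} & \cdots & y_{i,T-2}
\end{pmatrix},$$

with one row per period $t = 2,\ldots,T$ and $T(T-1)/2$ columns, so that

$$W_i'\Delta u_i = \begin{pmatrix}\Delta u_{i2}\,y_{i0}\\ \Delta u_{i3}\,y_{i0}\\ \Delta u_{i3}\,y_{i1}\\ \Delta u_{i4}\,y_{i0}\\ \Delta u_{i4}\,y_{i1}\\ \Delta u_{i4}\,y_{i2}\\ \vdots\\ \Delta u_{iT}\,y_{i0}\\ \vdots\\ \Delta u_{iT}\,y_{i,T-2}\end{pmatrix} = \begin{pmatrix}(\Delta y_{i2} - \delta\Delta y_{i1})\,y_{i0}\\ (\Delta y_{i3} - \delta\Delta y_{i2})\,y_{i0}\\ (\Delta y_{i3} - \delta\Delta y_{i2})\,y_{i1}\\ (\Delta y_{i4} - \delta\Delta y_{i3})\,y_{i0}\\ (\Delta y_{i4} - \delta\Delta y_{i3})\,y_{i1}\\ (\Delta y_{i4} - \delta\Delta y_{i3})\,y_{i2}\\ \vdots\\ (\Delta y_{iT} - \delta\Delta y_{i,T-1})\,y_{i0}\\ \vdots\\ (\Delta y_{iT} - \delta\Delta y_{i,T-1})\,y_{i,T-2}\end{pmatrix}$$

and $E(W_i'\Delta u_i) = 0$.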

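Initial weighting matrix for $(W'\Omega W)^{-1}$: $\Omega$ is the variance-covariance matrix of $\Delta\varepsilon$ (in the transformed model). If $\varepsilon_{it}$ is homoskedastic, we have

$$E(\Delta\varepsilon_{it}\Delta\varepsilon_{i,t-1}) = E[(\varepsilon_{it} - \varepsilon_{i,t-1})(\varepsilon_{i,t-1} - \varepsilon_{i,t-2})] = -\sigma_\varepsilon^2,$$
$$E(\Delta\varepsilon_{it}^2) = E[(\varepsilon_{it} - \varepsilon_{i,t-1})(\varepsilon_{it} - \varepsilon_{i,t-1})] = 2\sigma_\varepsilon^2,$$
$$E(\Delta\varepsilon_{it}\Delta\varepsilon_{i,t+1}) = E[(\varepsilon_{it} - \varepsilon_{i,t-1})(\varepsilon_{i,t+1} - \varepsilon_{it})] = -\sigma_\varepsilon^2,$$

so that for unit $i$, $E(\Delta u_i\Delta u_i') = \sigma_\varepsilon^2H$, where $H$ is the $(T-1)\times(T-1)$ matrix

$$H = \begin{pmatrix}2 & -1 & 0 & \cdots & 0\\ -1 & 2 & -1 & \cdots & 0\\ 0 & -1 & 2 & \ddots & \vdots\\ \vdots & & \ddots & \ddots & -1\\ 0 & \cdots & 0 & -1 & 2\end{pmatrix}.$$

We can use $H$ to compute the initial weighting matrix as

$$A_1 = \sum_{i=1}^NW_i'HW_i.$$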

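After finding the first-stage GMM estimator

$$\hat\delta_{GMM} = \arg\min_\delta\ \Delta u'WA_1^{-1}W'\Delta u = \left(\Delta y_{-1}'WA_1^{-1}W'\Delta y_{-1}\right)^{-1}\Delta y_{-1}'WA_1^{-1}W'\Delta y,$$

we can compute the second-stage weighting matrix as:

$$A_2 = \sum_{i=1}^NW_i'\Delta\hat u_i\Delta\hat u_i'W_i, \qquad \text{where } \Delta\hat u_i = \Delta y_i - \hat\delta\,\Delta y_{i,-1}.$$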
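The first-stage Arellano-Bond computation (instrument blocks $W_i$, the matrix $H$, $A_1$, and the closed-form $\hat\delta$) fits in a short pure-Python sketch for the scalar model above. The data-generating process (standard normal $\alpha_i$ and $\varepsilon_{it}$, $y_{i0}$ drawn from the stationary distribution) and all function names are illustrative assumptions. Writing $a = \sum_iW_i'\Delta y_{i,-1}$ and $b = \sum_iW_i'\Delta y_i$, the estimator is $\hat\delta = a'A_1^{-1}b / a'A_1^{-1}a$.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ab_instruments(y):
    """W_i for y = [y_i0, ..., y_iT]: row t = 2..T holds y_i0..y_{i,t-2} in its own columns."""
    T = len(y) - 1
    m = T * (T - 1) // 2
    W, col = [], 0
    for t in range(2, T + 1):
        row = [0.0] * m
        for s in range(t - 1):
            row[col + s] = y[s]
        col += t - 1
        W.append(row)
    return W


def h_matrix(n):
    """n x n matrix with 2 on the diagonal and -1 on the first off-diagonals."""
    return [[2.0 if a == b else (-1.0 if abs(a - b) == 1 else 0.0) for b in range(n)]
            for a in range(n)]


def arellano_bond_one_step(panel):
    """First-stage GMM estimate of delta with A_1 = sum_i W_i' H W_i."""
    T = len(panel[0]) - 1
    m, n = T * (T - 1) // 2, T - 1
    H = h_matrix(n)
    A1 = [[0.0] * m for _ in range(m)]
    a, b = [0.0] * m, [0.0] * m  # sum_i W_i' dy_{i,-1} and sum_i W_i' dy_i
    for y in panel:
        W = ab_instruments(y)
        dy = [y[t] - y[t - 1] for t in range(2, T + 1)]
        dy_lag = [y[t - 1] - y[t - 2] for t in range(2, T + 1)]
        HW = [[sum(H[r][k] * W[k][c] for k in range(n)) for c in range(m)] for r in range(n)]
        for c1 in range(m):
            for c2 in range(m):
                A1[c1][c2] += sum(W[r][c1] * HW[r][c2] for r in range(n))
            a[c1] += sum(W[r][c1] * dy_lag[r] for r in range(n))
            b[c1] += sum(W[r][c1] * dy[r] for r in range(n))
    x = solve(A1, a)  # A_1^{-1} a, using the symmetry of A_1
    return sum(xi * bi for xi, bi in zip(x, b)) / sum(xi * ai for xi, ai in zip(x, a))


def simulate_panel(N, T, delta, seed=0):
    """y_it = delta*y_{i,t-1} + alpha_i + eps_it with a stationary initial observation."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, 1.0)
        y = [alpha / (1.0 - delta) + rng.gauss(0.0, 1.0) / math.sqrt(1.0 - delta ** 2)]
        for _t in range(T):
            y.append(delta * y[-1] + alpha + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


delta_hat = arellano_bond_one_step(simulate_panel(1000, 5, 0.5, seed=4))
```

The second stage would recompute the weighting matrix as $A_2$ from the first-stage residuals $\Delta\hat u_i$ and repeat the same closed-form step.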


8.3 More efficient procedures (Ahn-Schmidt)

Ahn and Schmidt (1995) propose $T-2$ additional nonlinear conditions:

$$E(u_{iT}\Delta u_{it}) = 0, \qquad t = 2,3,\ldots,T-1.$$

With Ahn-Schmidt and Arellano-Bond, we have $T(T-1)/2 + (T-2)$ orthogonality conditions. Ahn and Schmidt show that these represent all the moment conditions implied by our assumptions.

8.3.1 Additional assumptions

8.3.1.1 Homoskedasticity

Under the assumption that $\mathrm{Var}(\varepsilon_{it})$ is the same $\forall t$, $E(u_{it}^2)$ is the same for $t = 1,2,\ldots,T$ (for all $i$). This adds $T-1$ conditions, and the final set of conditions under homoskedasticity is

$$E(y_{is}\Delta u_{it}) = 0, \qquad t = 2,\ldots,T, \quad s = 0,\ldots,t-2;$$
$$E(y_{it}\Delta u_{i,t+1} - y_{i,t+1}\Delta u_{i,t+2}) = 0, \qquad t = 1,\ldots,T-2;$$
$$E(\bar u_i\Delta u_{i,t+1}) = 0, \qquad t = 1,\ldots,T-1;$$

where $\bar u_i = \frac1T\sum_{t=1}^Tu_{it}$.

8.3.1.2 Stationarity

When the stationarity assumption is added ($\mathrm{Cov}(\alpha_i, y_{it})$ is the same $\forall t$), this adds 1 condition. The entire set of the $T(T-1)/2 + (2T-2)$ conditions is now

$$E(y_{is}\Delta u_{it}) = 0, \qquad t = 2,\ldots,T, \quad s = 0,\ldots,t-2;$$
$$E(u_{iT}\Delta y_{it}) = 0, \qquad t = 1,\ldots,T-1;$$
$$E(u_{it}y_{it} - u_{i,t-1}y_{i,t-1}) = 0, \qquad t = 2,\ldots,T.$$

Advantage: this set consists of linear conditions only.

The Ahn and Schmidt estimator is obtained by adding to the Arellano-Bond instrument matrix the following block for unit $i$:

$$W_i = \begin{pmatrix}
y_{i2} & 0 & \cdots & \cdots & u_i & 0 & \cdots & 0\\
y_{i3} & y_{i3} & 0 & \cdots & 0 & u_i & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots\\
\cdots & \cdots & \cdots & y_{i,T-1} & 0 & \cdots & \cdots & u_i
\end{pmatrix}.$$

How to test alternative assumptions: let $W^1$ denote the instrument matrix associated with the set of conditions to be tested, and $W^0$ an instrument matrix associated with a smaller set of valid conditions. Let $\hat\delta$ and $\hat\delta^0$ denote the GMM estimates with instruments $(W^0, W^1)$ and $W^0$ respectively, and $J(\hat\delta)$ and $J(\hat\delta^0)$ the corresponding GMM criterion values.

Then under $H_0$: the conditions associated with $W^1$ are valid, we have

$$J(\hat\delta) - J(\hat\delta^0) \sim \chi^2(\mathrm{rank}(W^1)).$$

8.4 The Blundell-Bond estimator

Blundell and Bond (1998) suggest using linear moment restrictions based on assumptions on initial conditions. They propose

$$E(u_{it}\Delta y_{i,t-1}) = 0, \qquad t = 4,5,\ldots,T,$$

with the addition of

$$E(u_{i3}\Delta y_{i2}) = 0.$$

This last condition combined with the ones above implies the Ahn-Schmidt (1995) nonlinear restrictions $E(u_{it}\Delta u_{i,t-1}) = 0$, $t = 3,\ldots,T$.

It means that we have the following stationarity condition on the model:

$$y_{i0} = \frac{\alpha_i}{1-\delta} + \varepsilon_{i0}.$$

In other terms, initial deviations from $\alpha_i/(1-\delta)$ must not be correlated with the level of $\alpha_i/(1-\delta)$ itself.

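The GMM estimator of Blundell and Bond combines the Ahn-Schmidt conditions $W_i$ with their new instruments defined above:

$$W_i^+ = \begin{pmatrix}W_i & 0 & 0 & \cdots & 0\\ 0 & \Delta y_{i2} & 0 & \cdots & 0\\ 0 & 0 & \Delta y_{i3} & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & \Delta y_{i,T-1}\end{pmatrix},$$

for estimating the parameters in a two-equation system:

$$\Delta y_i = \delta\,\Delta y_{i,-1} + \Delta\varepsilon_i,$$
$$y_i = \delta\,y_{i,-1} + \alpha_i + \varepsilon_i.$$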

8.5 Dynamic models with multiplicative effects

We consider here two generalizations involving multiplicative individual effects.

8.5.1 Multiplicative individual effects

Holtz-Eakin et al. (1988; see also Ahn, Lee and Schmidt 2001) suggest the model

$$y_{it} = \alpha\,y_{i,t-1} + x_{it}\beta + u_{it}, \qquad u_{it} = \theta_t\,\eta_i + \varepsilon_{it},$$

where $\theta_t$ is a time parameter: unobserved heterogeneity in $\eta_i$ is affected by a time shock common to all units.

Let us lag the equation above one period:

$$y_{i,t-1} = \alpha\,y_{i,t-2} + x_{i,t-1}\beta + \theta_{t-1}\eta_i + \varepsilon_{i,t-1},$$

and define a new variable $r_t = \theta_t/\theta_{t-1}$. Subtracting from the first equation the second one premultiplied by $r_t$ eliminates $\eta_i$ (since $\theta_t\eta_i - r_t\theta_{t-1}\eta_i = 0$), and we have

$$y_{it} - r_t y_{i,t-1} = \alpha(y_{i,t-1} - r_t y_{i,t-2}) + (x_{it} - r_t x_{i,t-1})\beta + \varepsilon_{it} - r_t\varepsilon_{i,t-1}.$$

This is a new, nonlinear equation with parameters to be estimated: $\alpha$, $\beta$, $r_t$, $t = 2, 3, \dots, T$. The transformation used is called quasi-differencing.
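To see numerically why quasi-differencing removes the multiplicative effect, the following sketch applies $s_t - r_t s_{t-1}$ with $r_t = \theta_t/\theta_{t-1}$ to the component $\theta_t\eta_i$ of $u_{it}$, and to a purely additive effect for comparison. The values of $\theta_t$ and $\eta_i$ are made up for illustration, and `quasi_difference` is a hypothetical helper, not part of any package.

```python
def quasi_difference(series, r):
    """Return [series[t] - r[t] * series[t-1] for t = 1..T-1] (0-based indices).

    r[t] plays the role of r_t = theta_t / theta_{t-1}; r[0] is unused.
    """
    return [series[t] - r[t] * series[t - 1] for t in range(1, len(series))]


# Illustrative (made-up) time parameters theta_t and individual effect eta_i
theta = [1.0, 1.5, 0.8, 2.0]
eta_i = 0.7
r = [None] + [theta[t] / theta[t - 1] for t in range(1, len(theta))]

effect = [th * eta_i for th in theta]      # theta_t * eta_i part of u_it
print(quasi_difference(effect, r))         # entries are zero up to rounding error

# An additive (time-invariant) effect is NOT removed: (1 - r_t) * eta_i remains
print(quasi_difference([eta_i] * len(theta), r))
```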
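GMM estimation is applicable as before (Arellano-Bond, Ahn-Schmidt or Blundell-Bond), but the usual initial weighting matrix cannot be used anymore. Let $\varepsilon^*_{it} = \varepsilon_{it} - r_t\varepsilon_{i,t-1}$. Under homoskedasticity and no-serial-correlation assumptions, we have

$$E(\varepsilon^*_{it}\varepsilon^*_{i,t-1}) = E[(\varepsilon_{it} - r_t\varepsilon_{i,t-1})(\varepsilon_{i,t-1} - r_{t-1}\varepsilon_{i,t-2})] = -r_t\sigma^2_\varepsilon,$$
$$E(\varepsilon^{*2}_{it}) = E[(\varepsilon_{it} - r_t\varepsilon_{i,t-1})^2] = \sigma^2_\varepsilon(1 + r_t^2),$$
$$E(\varepsilon^*_{it}\varepsilon^*_{i,t+1}) = E[(\varepsilon_{it} - r_t\varepsilon_{i,t-1})(\varepsilon_{i,t+1} - r_{t+1}\varepsilon_{it})] = -r_{t+1}\sigma^2_\varepsilon.$$

Thus, the optimal initial weighting matrix would be based on the following matrix (the covariance matrix of the transformed errors, up to the factor $\sigma^2_\varepsilon$):

$$\begin{pmatrix}
1 + r_1^2 & -r_2 & 0 & \cdots & 0 \\
-r_2 & 1 + r_2^2 & -r_3 & \cdots & 0 \\
0 & -r_3 & 1 + r_3^2 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & -r_T \\
0 & \cdots & \cdots & -r_T & 1 + r_T^2
\end{pmatrix}.$$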
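The band structure of this matrix can be checked by simulation. The sketch below builds the matrix of second moments of $\varepsilon^*_{it} = \varepsilon_{it} - r_t\varepsilon_{i,t-1}$ implied by the three moments above, and compares it with a Monte Carlo estimate under i.i.d. standard normal $\varepsilon_{it}$. The normality, the values of $r_t$ and the function names are assumptions made only for this check.

```python
import random


def qd_error_covariance(r, sigma2=1.0):
    """Covariance matrix of eps*_t = eps_t - r_t eps_{t-1}, t = 1..T.

    r = [r_1, ..., r_T]. Diagonal: sigma2 (1 + r_t^2); entries (t, t+1) and
    (t+1, t): -sigma2 r_{t+1}; all other entries are zero.
    """
    T = len(r)
    H = [[0.0] * T for _ in range(T)]
    for t in range(T):
        H[t][t] = sigma2 * (1.0 + r[t] ** 2)
        if t + 1 < T:
            H[t][t + 1] = H[t + 1][t] = -sigma2 * r[t + 1]
    return H


def simulated_covariance(r, n=100_000, seed=0):
    """Monte Carlo second moments of eps*_t with eps_0, ..., eps_T i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    T = len(r)
    acc = [[0.0] * T for _ in range(T)]
    for _ in range(n):
        eps = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
        star = [eps[t + 1] - r[t] * eps[t] for t in range(T)]
        for a in range(T):
            for b in range(T):
                acc[a][b] += star[a] * star[b]
    return [[v / n for v in row] for row in acc]


r = [0.5, -0.3, 1.2]          # illustrative values of r_1, r_2, r_3
for row in qd_error_covariance(r):
    print(row)
```

Entries two or more periods apart are zero, which is why the matrix is tridiagonal.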

When the $r_t$'s are unknown, we must start with arbitrary values, but these would produce two-step estimates that are conditional on our choice (see above). Also, as the model is nonlinear, the GMM criterion must be minimized numerically (there is no closed-form solution).

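8.5.2 Mixed structure

Consider

$$y_{it} = \alpha\,y_{i,t-1} + u_{it}, \quad i = 1, 2, \dots, N,\ t = 1, 2, \dots, T,$$

where $|\alpha| < 1$, the initial conditions $y_{i0}$, $i = 1, \dots, N$, are known, and

$$u_{it} = \eta_i + \theta_t v_i + \varepsilon_{it}.$$

$\eta_i$ is the purely stationary individual effect, and $\theta_t v_i$ captures an additional, time-varying individual effect. We assume

$$E(\eta_i^2) = \sigma^2_\eta, \quad E(v_i^2) = \sigma^2_v, \quad E(\varepsilon^2_{it}) = \sigma^2_\varepsilon \quad \forall i, \forall t,$$
$$E(\varepsilon_{it}\eta_i) = E(\varepsilon_{it}v_i) = 0, \quad E(y_{i0}\varepsilon_{it}) = 0 \ \forall t, \quad E(\eta_i v_i) = \sigma_{\eta v}.$$

Consider the case where one of the following conditions holds:

$$\theta_t = \theta_s \quad \forall t, s = 1, 2, \dots, T, \qquad (8.1)$$
$$\sigma_{\eta v} = \sigma^2_v = 0. \qquad (8.2)$$

Under condition (8.1), let $\theta_t = \theta$ $\forall t$; then $u_{it} = \eta^*_i + \varepsilon_{it}$, where $\eta^*_i = \eta_i + \theta v_i$ is constant over time, which corresponds to the usual model. Under condition (8.2), the models $u_{it} = \eta_i + \theta_t v_i + \varepsilon_{it}$ and $u_{it} = \eta_i + \varepsilon_{it}$ cannot be distinguished in terms of second-order moments, because $E(u^2_{it}) = \sigma^2_\eta + \sigma^2_\varepsilon$ and $E(u_{it}u_{is}) = \sigma^2_\eta$ if $t \neq s$.

Consider now the case where $\eta_i$ and $v_i$ are perfectly correlated ($\rho_{\eta v} = 1$ or $-1$). We have

$$u_{it} = (1 + \theta_t)\eta_i + \varepsilon_{it} \quad \text{if } \rho_{\eta v} = 1,$$
$$u_{it} = (1 - \theta_t)\eta_i + \varepsilon_{it} \quad \text{if } \rho_{\eta v} = -1.$$

Then $\eta_i$ disappears from the error term, which becomes

$$u_{it} = \theta^*_t v^*_i + \varepsilon_{it}, \quad \text{with } \theta^*_t = 1 + \theta_t,\ v^*_i = \eta_i.$$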

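8.5.2.1 Inconsistency of GMM with first-difference and quasi-difference

When the $\theta_t$'s differ across time, the first-differencing transformation yields

$$u_{it} - u_{i,t-1} = (\theta_t - \theta_{t-1})v_i + \varepsilon_{it} - \varepsilon_{i,t-1},$$

and instruments from lagged dependent variables are correlated with $v_i$:

$$E[(u_{it} - u_{i,t-1})Y_{is}] = (\theta_t - \theta_{t-1})\sigma_{\eta v} + \theta_s(\theta_t - \theta_{t-1})\sigma^2_v, \quad s \le t-2.$$

If the quasi-difference transformation is applied to the model,

$$u_{it} - r_t u_{i,t-1} = (1 - r_t)\eta_i + \varepsilon_{it} - r_t\varepsilon_{i,t-1},$$

and the transformed residual depends on $\eta_i$. We have

$$E[(u_{it} - r_t u_{i,t-1})Y_{is}] = (1 - r_t)\sigma^2_\eta + (1 - r_t)\theta_s\sigma_{\eta v}, \quad s \le t-2.$$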

8.5.2.2 A consistent transformation

To eliminate both effects $\eta_i$ and $\theta_t v_i$, it is necessary to use a double transformation: first-difference, and then quasi-difference:

$$\Delta y_{it} - \tilde r_t\Delta y_{i,t-1} = \alpha(\Delta y_{i,t-1} - \tilde r_t\Delta y_{i,t-2}) + \Delta\varepsilon_{it} - \tilde r_t\Delta\varepsilon_{i,t-1},$$

$i = 1, 2, \dots, N$, $t = 3, 4, \dots, T$, where

$$\tilde r_t = \frac{\Delta\theta_t}{\Delta\theta_{t-1}} = \frac{\theta_t - \theta_{t-1}}{\theta_{t-1} - \theta_{t-2}}.$$

GMM estimators of the double-difference model based on quasi-differencing first and then first-differencing residuals are not consistent when instruments include lagged dependent variables. We would have in that case

$$\Delta[(\varepsilon_{it} - r_t\varepsilon_{i,t-1}) + \eta_i(1 - r_t)] = \Delta\varepsilon_{it} - \Delta(r_t\varepsilon_{i,t-1}) - \eta_i\,\Delta r_t,$$

which depends on $\eta_i$.

GMM procedures using instrument matrices built from lagged dependent variables yield consistent estimates only when the correct model transformation is performed.
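A small numerical check of the argument above, using made-up values of $\theta_t$, $\eta_i$ and $v_i$ (and setting $\varepsilon_{it} = 0$ to isolate the effects): first-differencing followed by quasi-differencing with $\tilde r_t = \Delta\theta_t/\Delta\theta_{t-1}$ wipes out $\eta_i + \theta_t v_i$, while quasi-differencing alone with $r_t = \theta_t/\theta_{t-1}$ leaves $(1 - r_t)\eta_i$. The helpers are hypothetical and assume $\Delta\theta_{t-1} \neq 0$.

```python
def first_difference(s):
    return [s[t] - s[t - 1] for t in range(1, len(s))]


def quasi_difference(s, theta):
    """s_t - r_t s_{t-1} with r_t = theta_t / theta_{t-1}."""
    return [s[t] - theta[t] / theta[t - 1] * s[t - 1] for t in range(1, len(s))]


def double_difference(s, theta):
    """First-difference, then quasi-difference with r~_t = dtheta_t / dtheta_{t-1}."""
    ds = first_difference(s)
    dtheta = first_difference(theta)
    return [ds[k] - dtheta[k] / dtheta[k - 1] * ds[k - 1] for k in range(1, len(ds))]


# Illustrative values (not from the text)
theta = [1.0, 1.5, 0.8, 2.0, 1.1]
eta_i, v_i = 0.7, -1.3
u = [eta_i + th * v_i for th in theta]   # eta_i + theta_t v_i (epsilon set to 0)

print(double_difference(u, theta))       # zero up to rounding: both effects removed
print(quasi_difference(u, theta))        # equals (1 - r_t) * eta_i: eta_i survives
```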

8.6 Example: Wage equation

Consider the wage equation seen before, in a simpler, dynamic form:

$$\log w_{it} = \rho\log w_{i,t-1} + \beta_1 WKS_{it} + \beta_2 OCC_{it} + u_{it},$$

where
$w_{it}$: wage rate,
$WKS_{it}$: number of weeks worked in the year,
$OCC_{it}$: dummy for blue-collar job.

We estimate the above model under three specifications:
1. Usual case $u_{it} = \eta_i + \varepsilon_{it}$;
2. Multiplicative case $u_{it} = \theta_t v_i + \varepsilon_{it}$;
3. Mixed case $u_{it} = \eta_i + \theta_t v_i + \varepsilon_{it}$.

In case 1, we use a linear GMM procedure with the first-difference transformation. In case 2, nonlinear GMM in the parameters $\rho$, $\beta$ and $r_t = \theta_t/\theta_{t-1}$, $t = 3, 4, \dots, T$, with the quasi-difference transformation. In case 3, nonlinear GMM in the parameters $\rho$, $\beta$ and $\tilde r_t = \Delta\theta_t/\Delta\theta_{t-1}$, $t = 4, 5, \dots, T$, with the double-difference transformation.

Instruments in all cases: weakly exogenous variables in levels ($WKS$, $OCC$).

Table 8.1: First-difference GMM

Parameter    Estimate    Std. error    t-stat.
ρ             0.9465      0.0126        74.83
β1            0.0022      0.0022         0.98
β2           -0.0848      0.0423        -2.00

Hansen test: 69.68 (p-value 7.4E-07)

Table 8.2: Quasi-difference GMM

Parameter    Estimate    Std. error    t-stat.
ρ             0.9121      0.0218        41.72
β1            0.0150      0.0038         3.87
β2           -0.1014      0.1007        -1.00
r1           -0.5838      0.3856        -1.51
r2           -0.0871      0.0974        -0.89
r3            0.3294      0.0621         5.29
r4           -0.1842      0.1074        -1.71
r5            1.0401      0.5947         1.75

Hansen test: 2.32 (p-value 0.99)

Table 8.3: Double-difference GMM

Parameter    Estimate    Std. error    t-stat.
ρ             0.9211      0.0460        19.98
β1            0.0082      0.0014         5.79
β2           -0.0394      0.0322        -1.22
r̃1           -0.5272      0.2250        -2.34
r̃2           -0.1188      0.1029        -1.15
r̃3            0.2931      0.1009         2.90
r̃4           -0.0863      0.0399        -2.16

Hansen test: 19.20 (p-value 0.05)

Part III
Discrete choice models

Chapter 9
Nonlinear panel data models

9.1 Brief review of binary discrete-choice models

Models with qualitative variables: binary choice and multinomial models. We briefly survey these models for cross-section data and the binary case:

$$y^*_i = x_i\beta + u_i, \quad i = 1, 2, \dots, N;$$
$$y_i = 1 \ \text{if } y^*_i > 0; \qquad y_i = 0 \ \text{if } y^*_i \le 0;$$

$y^*_i$ and $y_i$: respectively latent (unobserved) and observed variables; $x_i$: $1 \times K$ vector of regressors. The threshold 0 is arbitrary here, as $E(y^*_i)$ is unknown.
9.1.1 Linear probability model

$$y_i = x_i\beta + u_i, \qquad E(y_i) = \mathrm{Prob}(y_i = 1) = x_i\beta.$$

Unreasonable, as the predicted probability may not lie in $[0, 1]$. There are two possible values for the residual: $u_i = 1 - x_i\beta$ (when $y_i = 1$) or $u_i = -x_i\beta$ (when $y_i = 0$). Heteroskedasticity, since

$$\begin{aligned}
\mathrm{Var}(u_i) &= \mathrm{Prob}(y_i = 0)(-x_i\beta)^2 + \mathrm{Prob}(y_i = 1)(1 - x_i\beta)^2 \\
&= (1 - x_i\beta)(x_i\beta)^2 + x_i\beta(1 - x_i\beta)^2 \\
&= (1 - x_i\beta)[(x_i\beta)^2 + x_i\beta(1 - x_i\beta)] \\
&= x_i\beta(1 - x_i\beta).
\end{aligned}$$

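9.1.2 Logit model

Based on the logistic distribution:

$$\mathrm{Prob}(y_i = 1) = \Lambda(x_i\beta) = \frac{\exp(x_i\beta)}{1 + \exp(x_i\beta)}, \qquad \mathrm{Prob}(y_i = 0) = 1 - \Lambda(x_i\beta) = \frac{1}{1 + \exp(x_i\beta)};$$

Density: $\lambda(x_i\beta) = \dfrac{\exp(x_i\beta)}{[1 + \exp(x_i\beta)]^2}$.

In this case, $\mathrm{Var}(u_i) = \pi^2/3$.

9.1.3 Probit model

Based on the normal distribution: $u_i$ is $N(0, \sigma^2)$ and

$$\mathrm{Prob}(y_i = 1) = \Phi\!\left(\frac{x_i\beta}{\sigma}\right) = \int_{-\infty}^{x_i\beta/\sigma}\frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right)du,$$
$$\mathrm{Prob}(y_i = 0) = 1 - \Phi\!\left(\frac{x_i\beta}{\sigma}\right) = \int_{x_i\beta/\sigma}^{+\infty}\frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right)du;$$

Density: $\phi\!\left(\dfrac{x_i\beta}{\sigma}\right) = \dfrac{1}{\sqrt{2\pi}}\exp\!\left(-\dfrac{(x_i\beta)^2}{2\sigma^2}\right)$.

The parameter $\sigma$ is unidentified (it appears only in the ratio $\beta/\sigma$): $\sigma$ is normalized to 1.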
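Estimation method: Maximum Likelihood:

$$\hat\beta = \arg\max_\beta\prod_{i=1}^N[\mathrm{Prob}(y_i = 1)]^{y_i}[1 - \mathrm{Prob}(y_i = 1)]^{1 - y_i} = \arg\max_\beta\prod_{i=1}^N F(q_i x_i\beta),$$

where $F(\cdot)$ is the probability function ($\Lambda$ or $\Phi$), and $q_i = 2y_i - 1$. The second form uses the symmetry $F(-z) = 1 - F(z)$ of both distributions.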
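The sketch below evaluates the log of this likelihood in the $F(q_i x_i\beta)$ form for the Logit and Probit cases; it gives the same value as the Bernoulli form because both distributions are symmetric. The data and helper names are illustrative only.

```python
import math


def logit_cdf(z: float) -> float:
    # Numerically stable logistic CDF Lambda(z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def probit_cdf(z: float) -> float:
    # Standard normal CDF Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binary_loglik(beta, X, y, cdf):
    """log L = sum_i log F(q_i x_i beta), with q_i = 2 y_i - 1."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        index = sum(b * x for b, x in zip(beta, x_i))
        q_i = 2 * y_i - 1
        total += math.log(cdf(q_i * index))
    return total


# Illustrative data: constant plus one regressor
X = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0), (1.0, 0.1)]
y = [1, 0, 1, 0]
beta = (0.2, 0.8)
print(binary_loglik(beta, X, y, logit_cdf), binary_loglik(beta, X, y, probit_cdf))
```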

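In these models, inference is best drawn on a) the sign of the estimates; b) marginal effects ($\partial\,\mathrm{Prob}(y_i = 1)/\partial x_i$).

When moving to panel data, we consider $u_{it} = \alpha_i + \varepsilon_{it}$, so that

$$\mathrm{Prob}(y_{it} = 1) = \mathrm{Prob}(y^*_{it} > 0) = \mathrm{Prob}(\varepsilon_{it} > -x_{it}\beta - \alpha_i) = \mathrm{Prob}(\varepsilon_{it} < x_{it}\beta + \alpha_i) = F(x_{it}\beta + \alpha_i).$$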

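9.2 Logit models for panel data

9.2.1 Sufficient statistics

Consider first a model with fixed effects.

Maximum Likelihood estimator: we have to estimate both $\beta$ and $\alpha_i$, $i = 1, \dots, N$, but $\alpha_i$ and $\beta$ are not independent for qualitative-choice models. When $T$ is fixed, the ML estimates of the $\alpha_i$ are not consistent and, consequently, the MLE of $\beta$ is not consistent either. The individual effects $\alpha_i$ are called incidental parameters (their number increases with $N$).

Solution: the Neyman-Scott (1948) principle of estimation in the presence of incidental parameters. Suppose there exists a sufficient statistic $\tau_i$ for $\alpha_i$, $i = 1, 2, \dots, N$, that does not depend on $\beta$; then the conditional density

$$f^*(y_i \mid x_i, \tau_i, \beta) = \frac{f(y_i \mid x_i, \alpha_i, \beta)}{g(\tau_i \mid x_i, \alpha_i, \beta)}, \quad \text{for } g(\tau_i \mid x_i, \alpha_i, \beta) > 0,$$

does not depend on $\alpha_i$. A consistent estimator of $\beta$ then obtains by maximizing the conditional density of $(y_1, \dots, y_N)$ given $(\tau_1, \dots, \tau_N)$:

$$\hat\beta = \arg\max_\beta\prod_{i=1}^N f^*(y_i \mid x_i, \tau_i, \beta).$$

Joint probability of $y_i$:

$$\mathrm{Prob}(y_i) = \frac{\exp\left[\alpha_i\sum_{t=1}^T y_{it} + \beta\sum_{t=1}^T y_{it}x_{it}\right]}{\prod_{t=1}^T[1 + \exp(x_{it}\beta + \alpha_i)]}.$$

If we solve the first-order conditions associated with maximizing the log-likelihood with respect to $\beta$:

$$\frac{\partial\log L}{\partial\beta} = \sum_{i=1}^N\sum_{t=1}^T\left[-\frac{\exp(x_{it}\beta + \alpha_i)}{1 + \exp(x_{it}\beta + \alpha_i)} + y_{it}\right]x_{it} = 0,$$

and with respect to $\alpha_i$:

$$\frac{\partial\log L}{\partial\alpha_i} = \sum_{t=1}^T\left[-\frac{\exp(x_{it}\beta + \alpha_i)}{1 + \exp(x_{it}\beta + \alpha_i)} + y_{it}\right] = 0 \iff \sum_{t=1}^T y_{it} = \sum_{t=1}^T\frac{\exp(x_{it}\beta + \alpha_i)}{1 + \exp(x_{it}\beta + \alpha_i)}, \quad i = 1, 2, \dots, N.$$

Hence, a sufficient statistic for $\alpha_i$ is $\tau_i = \sum_{t=1}^T y_{it}$. Summing the joint probability over all sequences with the same total, the probability that $\sum_t y_{it} = s$ is

$$\frac{\exp(\alpha_i s)}{\prod_{t=1}^T[1 + \exp(x_{it}\beta + \alpha_i)]}\sum_{d\in B_i}\exp\left(\beta\sum_{t=1}^T d_{it}x_{it}\right),$$

where the set $B_i$ is defined below.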

9.2.2 Conditional probabilities

The conditional probability of $y_i$ given $\tau_i$ is

$$\mathrm{Prob}(y_i \mid \tau_i) = \frac{\exp\left[\beta\sum_{t=1}^T y_{it}x_{it}\right]}{\sum_{d\in B_i}\exp\left[\beta\sum_{t=1}^T d_{it}x_{it}\right]},$$

where $B_i$ is a set of indices for individual $i$:

$$B_i = \left\{(d_{i1}, d_{i2}, \dots, d_{iT}) \;\middle|\; d_{it} = 0, 1 \ \text{and} \ \sum_{t=1}^T d_{it} = \sum_{t=1}^T y_{it}\right\}.$$

The set $B_i$ represents all possible combinations of $y_{it}$ for individual $i$ with the same number of 1's as in $\sum_t y_{it}$. Groups for which $\sum_t y_{it} = 0$ or $\sum_t y_{it} = T$ have probability 1 and contribute nothing to the likelihood. The only sets of interest arise when $\sum_{t=1}^T y_{it} = s \in\,]0, T[$; there are $\binom{T}{s} = T!/[s!(T-s)!]$ such elements, corresponding to the distinct sequences of length $T$ with sum $s$.

Notes:
- The incidental parameter $\alpha_i$ cancels between the numerator and the denominator, so the conditional probability does not depend on it; any factor that does not depend on $\beta$ can be dropped;
- To compute the above probability, we have to consider, for each $s$, all possible sequences of 0's and 1's. Example: if $T = 4$ and $s = 2$, we would have 6 cases, collected as the rows $D_r$ of

$$D = \begin{pmatrix} 1&1&0&0\\ 1&0&1&0\\ 1&0&0&1\\ 0&1&1&0\\ 0&1&0&1\\ 0&0&1&1 \end{pmatrix}, \qquad \sum_{d\in B_i}\exp\left(\beta\sum_{t=1}^4 d_{it}x_{it}\right) = \sum_{r=1}^{6}\exp\left(\beta\sum_{t=1}^4 D_{rt}x_{it}\right).$$
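The enumeration of $B_i$ and the resulting conditional probability can be sketched as follows for a single regressor. This is an illustrative implementation of the formula above, not a library routine: it lists all 0/1 sequences with the same sum as $y_i$ and normalizes $\exp(\beta\sum_t y_{it}x_{it})$ over them, so $\alpha_i$ never appears.

```python
import itertools
import math


def admissible_sequences(T: int, s: int):
    """All (d_1, ..., d_T) in {0,1}^T with sum s: the set B_i."""
    return [d for d in itertools.product((0, 1), repeat=T) if sum(d) == s]


def conditional_logit_prob(y, x, beta: float) -> float:
    """Prob(y_i | sum_t y_it) for a scalar regressor x_it (lists of length T)."""
    def score(d):
        return beta * sum(d_t * x_t for d_t, x_t in zip(d, x))

    B_i = admissible_sequences(len(y), sum(y))
    return math.exp(score(y)) / sum(math.exp(score(d)) for d in B_i)


x_i = [0.3, -0.5, 1.1, 0.4]                   # illustrative regressor values
print(len(admissible_sequences(4, 2)))       # 6 sequences when T = 4, s = 2
print(conditional_logit_prob([0, 1, 1, 0], x_i, beta=0.9))
```

The conditional probabilities over all members of $B_i$ sum to one, and for $T = 2$ they reduce to the logistic form of the next example.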

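9.2.3 Example: T = 2

The only case of interest is $y_{i1} + y_{i2} = 1$. Let

$$\omega_i = 1 \ \text{if } (y_{i1}, y_{i2}) = (0, 1), \qquad \omega_i = 0 \ \text{if } (y_{i1}, y_{i2}) = (1, 0).$$

We have the conditional probability

$$\begin{aligned}
\mathrm{Prob}(\omega_i = 1 \mid y_{i1} + y_{i2} = 1) &= \frac{\mathrm{Prob}(\omega_i = 1)}{\mathrm{Prob}(\omega_i = 0) + \mathrm{Prob}(\omega_i = 1)} \\
&= \frac{\exp(\alpha_i + x_{i2}\beta)}{[1 + \exp(\alpha_i + x_{i1}\beta)][1 + \exp(\alpha_i + x_{i2}\beta)]} \times \frac{[1 + \exp(\alpha_i + x_{i1}\beta)][1 + \exp(\alpha_i + x_{i2}\beta)]}{\exp(\alpha_i + x_{i1}\beta) + \exp(\alpha_i + x_{i2}\beta)} \\
&= \frac{\exp(\alpha_i + x_{i2}\beta)}{\exp(\alpha_i + x_{i1}\beta) + \exp(\alpha_i + x_{i2}\beta)} \\
&= \frac{\exp[(x_{i2} - x_{i1})\beta]}{1 + \exp[(x_{i2} - x_{i1})\beta]} = \Lambda[(x_{i2} - x_{i1})\beta].
\end{aligned}$$

In that case, $B = \{i \mid y_{i1} + y_{i2} = 1\}$ and the conditional log-likelihood is

$$\log L = \sum_{i\in B}\left\{\omega_i\log\Lambda[(x_{i2} - x_{i1})\beta] + (1 - \omega_i)\log\{1 - \Lambda[(x_{i2} - x_{i1})\beta]\}\right\}.$$

In practice, when $T > 2$, we have to consider alternative sets of observations for which $\sum_{t=1}^T y_{it}$ is the same. Note that this formulation is a conditional Logit specification: the regressors $x$ depend on the alternative.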
9.3 Probit models

One typically uses the Probit model in the random-effect case (easier to work with). Consider a model where $u_{it} = \alpha_i + \varepsilon_{it}$, where $\alpha_i$ is drawn from a distribution $G(\cdot)$ and is independent of the $x_i$'s. Assume

$$\mathrm{Var}(\alpha_i) = \sigma^2_\alpha, \quad \mathrm{Var}(\varepsilon_{it}) = 1, \quad \mathrm{Corr}(u_{it}, u_{is}) = \rho = \frac{\sigma^2_\alpha}{1 + \sigma^2_\alpha}.$$

The contribution to the likelihood of unit $i$ is

$$L_i = \mathrm{Prob}(y_i) = \int_{-\infty}^{q_{i1}x_{i1}\beta}\cdots\int_{-\infty}^{q_{iT}x_{iT}\beta} f(u_{i1}, u_{i2}, \dots, u_{iT})\,du_{i1}\cdots du_{iT},$$

where $q_{it} = 2y_{it} - 1$ and $f(\cdot)$ is the joint density function of the elements in $u_i$.

Integration of this density is impractical when $T$ is large, but one can work with the conditional density because, conditional on $\alpha_i$, the $u_{it}$'s are independent:

$$f(u_{i1}, \dots, u_{iT}) = \int_{-\infty}^{+\infty} f(u_{i1}, \dots, u_{iT} \mid \alpha_i)f(\alpha_i)\,d\alpha_i = \int_{-\infty}^{+\infty}\prod_{t=1}^T f(u_{it} \mid \alpha_i)f(\alpha_i)\,d\alpha_i,$$

where the density of $\alpha_i$ is $N[0, \rho/(1-\rho)]$ (remember $\rho = \sigma^2_\alpha/(1 + \sigma^2_\alpha)$).

Butler and Moffitt (1982) show that we can write $L_i$ as

$$L_i(y_i) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{+\infty} e^{-t_i^2}\prod_{t=1}^T\Phi\left(q_{it}x_{it}\beta + q_{it}t_i\sqrt{\frac{2\rho}{1-\rho}}\right)dt_i,$$

which is now a one-dimensional integral that can be evaluated numerically (Gauss-Hermite integration procedure). Disadvantage of the method: it assumes a constant correlation ($\rho$) across periods.
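A minimal sketch of the Butler-Moffitt formula evaluated by Gauss-Hermite quadrature, assuming a single regressor and standard-normal $\varepsilon_{it}$ as above. It uses NumPy's `numpy.polynomial.hermite.hermgauss`, which returns nodes and weights for the weight function $e^{-t^2}$; the function names and the choice of 20 nodes are illustrative.

```python
import math

import numpy as np


def norm_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def re_probit_contribution(y, x, beta: float, rho: float, n_nodes: int = 20) -> float:
    """Butler-Moffitt L_i for a random-effects probit with a scalar regressor.

    L_i = pi^{-1/2} * integral of exp(-t^2) prod_t Phi(q_it (x_it beta + t sqrt(2 rho/(1-rho)))) dt,
    approximated with Gauss-Hermite nodes and weights for the weight exp(-t^2).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    scale = math.sqrt(2.0 * rho / (1.0 - rho))
    total = 0.0
    for t_node, w in zip(nodes, weights):
        prod = 1.0
        for y_t, x_t in zip(y, x):
            q_t = 2 * y_t - 1
            prod *= norm_cdf(q_t * (x_t * beta + t_node * scale))
        total += w * prod
    return total / math.sqrt(math.pi)


print(re_probit_contribution([1, 0, 1], [0.4, -0.2, 1.0], beta=0.7, rho=0.3))
```

Two sanity checks follow from the formula: with $\rho = 0$ the contribution is the product of the $\Phi(q_{it}x_{it}\beta)$, and with $T = 1$ it equals $\Phi(x\beta\sqrt{1-\rho})$, since $\mathrm{Var}(u_{it}) = 1/(1-\rho)$.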

9.4 Semiparametric estimation of discrete-choice models

We consider here the estimation of binary-choice panel data models with fixed effects and possibly endogenous regressors. In the model

$$y_{it} = x'_{it}\beta + \alpha_i + \varepsilon_{it}, \quad i = 1, \dots, N,\ t = 1, \dots, T,$$

we know that the Within estimator is consistent if

$$E[(x_{it} - \bar x_i)(\varepsilon_{it} - \bar\varepsilon_i)] = 0,$$

which holds if $x$ is strictly exogenous:

$$E[\varepsilon_{it} \mid x_{i1}, \dots, x_{iT}] = 0.$$

It is not sufficient to assume that $x$ is only predetermined:

$$E[\varepsilon_{it} \mid x_{i1}, \dots, x_{it}] = 0,$$

and in this case we have to use an IV estimation strategy, e.g., fitting

$$\Delta y_{it} = \Delta x_{it}\beta + \Delta\varepsilon_{it}$$

using past values of $x$ as instruments. Such an approach would not work in nonlinear models, unless some linearization of the model is performed.

Semiparametric approach of Honoré and Lewbel (2000): they provide a $\sqrt{N}$-consistent semiparametric estimator for binary-choice models, where the distribution of the error is unspecified.

9.4.1 The binary choice model

Consider the binary-choice model

$$y_{it} = 1\mathrm{I}(\nu_{it} + x'_{it}\beta + \alpha_i + \varepsilon_{it} > 0).$$

Negative result of Chamberlain (1993): even if the distribution of $\varepsilon_{it}$ is known, the Logit model is the only version of the model above that can provide a $\sqrt{N}$-consistent estimator for $\beta$. But this negative result can be overturned under some additional assumptions (e.g., $\nu_{it}$ is independent of $\alpha_i$ and $\varepsilon_{it}$, conditional on $x$ and a set of instruments denoted $z_i$).

Assumptions

A.1. The conditional distribution of $\nu_{it}$ given $(x_{it}, z_i)$ is absolutely continuous with respect to a Lebesgue measure with non-degenerate Radon-Nikodym conditional density function $f_t(\nu_{it} \mid x_{it}, z_i)$.

A.2. Let $e_{it} = \alpha_i + \varepsilon_{it}$, $\forall t$. $e_{it}$ is conditionally independent of $\nu_{it}$ (conditioning on $x_{it}$ and $z_i$). The conditional distribution of $e_{it}$ has support $\Omega_{et}(x_{it}, z_i)$ and is denoted $F_{et}(e_{it} \mid x_{it}, z_i)$.

A.3. For 2 periods $t = r$ and $t = s$, the conditional distribution of $\nu_{it}$ given $x_{it}$ and $z_i$ has support $[L_t, K_t]$ with $-\infty \le L_t < 0 < K_t \le \infty$, and the support of $-x'_{it}\beta - e_{it}$ is a subset of $[L_t, K_t]$.

A.4. Let $\Sigma_{xtz} = E(x_{it}z'_i)$ and $\Sigma_{zz} = E(z_iz'_i)$. Then
(i) $E(\varepsilon_{ir}z_i) = E(\varepsilon_{is}z_i) = 0$;
(ii) $E(\alpha_i z_i)$, $\Sigma_{zz}$, $\Sigma_{xrz}$ and $\Sigma_{xsz}$ exist;
(iii) $\Sigma_{zz}$ and $(\Sigma_{xrz} - \Sigma_{xsz})\Sigma_{zz}^{-1}(\Sigma_{xrz} - \Sigma_{xsz})'$ are nonsingular.

- The case $\alpha_{ir} = \alpha_{is} = \alpha_i$ is allowed;
- $(x_{it}, z_i)$ can be correlated with $\nu_{it}$, but (A.1) rules out $(x_{it}, z_i)$ as deterministic functions of $\nu_{it}$;
- $\alpha_i$ can be correlated with $x_{it}$ or $z_i$, but $(\nu_{it}, \alpha_i)$ must be independent given $(x_{it}, z_i)$;
- $\varepsilon_{it}$ is uncorrelated with the instruments $z_i$;
- (A.2) means that the conditional distribution of $\varepsilon_{it}$ given $(\nu_{it}, x_{it}, z_i)$ does not depend on $\nu_{it}$;
- According to (A.3), $\nu_{it}$ can take on any value that $-(x'_{it}\beta + e_{it})$ (the rest of the latent variable) can take on.

These assumptions imply that there exist sequences $\{k_1\}$ and $\{k_2\}$ such that

$$\mathrm{Prob}(y_{it} = 1 \mid x_{it}, z_i, \nu_{it} = k_1) \to 0 \quad \text{and} \quad \mathrm{Prob}(y_{it} = 1 \mid x_{it}, z_i, \nu_{it} = k_2) \to 1.$$

Practical implication: the resulting estimator will perform well when the variance of $\nu_{it}$ is large relative to the rest of the latent variable.

Theorem 7. Let

$$y^*_{it} = \frac{y_{it} - 1\mathrm{I}(\nu_{it} > 0)}{f_t(\nu_{it} \mid x_{it}, z_i)}.$$

If Assumptions (A.1) to (A.3) hold, then for $t = r, s$,

$$E(y^*_{it} \mid x_{it}, z_i) = x'_{it}\beta + E(\alpha_i + \varepsilon_{it} \mid x_{it}, z_i).$$

Proof. Let (dropping subscripts for clarity) $s = s(x, e) = -x'\beta - e$. We have

$$\begin{aligned}
E(y^* \mid x, z) &= E\left[\frac{E[y - 1\mathrm{I}(\nu > 0) \mid \nu, x, z]}{f(\nu \mid x, z)}\;\middle|\; x, z\right] = \int_L^K\frac{E[y - 1\mathrm{I}(\nu > 0) \mid \nu, x, z]}{f(\nu \mid x, z)}\,f(\nu \mid x, z)\,d\nu \\
&= \int_L^K\int_{\Omega_e}[1\mathrm{I}(\nu + x'\beta + e > 0) - 1\mathrm{I}(\nu > 0)]\,dF_e(e \mid \nu, x, z)\,d\nu \\
&= \int_{\Omega_e}\int_L^K[1\mathrm{I}(\nu > s) - 1\mathrm{I}(\nu > 0)]\,d\nu\,dF_e(e \mid x, z),
\end{aligned}$$

where the last line uses (A.2), the conditional independence of $e$ with respect to $\nu$. Note that

$$1\mathrm{I}(\nu > s) - 1\mathrm{I}(\nu > 0) = 1\mathrm{I}(s \le 0)\,1\mathrm{I}(s < \nu \le 0) - 1\mathrm{I}(s > 0)\,1\mathrm{I}(0 < \nu \le s).$$

Hence, since $L \le s \le K$ by (A.3),

$$E(y^* \mid x, z) = \int_{\Omega_e}\left[1\mathrm{I}(s \le 0)\int_s^0 d\nu - 1\mathrm{I}(s > 0)\int_0^s d\nu\right]dF_e(e \mid x, z) = \int_{\Omega_e}(-s)\,dF_e(e \mid x, z) = x'\beta + E(e \mid x, z). \qquad \text{QED}$$
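The identity in Theorem 7 can be checked by simulation in a deliberately simple design: one period, a scalar index $x'\beta$, $e$ normal and independent of $\nu$, and $\nu$ uniform on $[-K, K]$ so that $f(\nu \mid x, z) = 1/(2K)$ is known and its support covers $-x'\beta - e$ (Assumption A.3). These distributional choices and the function name are assumptions for the illustration only; in practice $f_t$ must be estimated (the kernel approach is described in Section 9.4.2).

```python
import random


def mean_transformed_outcome(xb=0.6, K=5.0, n=200_000, seed=1):
    """Monte Carlo mean of y* = (y - 1(nu > 0)) / f(nu) in a toy design.

    nu ~ U[-K, K] (so f(nu) = 1/(2K)), e ~ N(0.2, 0.5^2) independent of nu,
    y = 1(nu + xb + e > 0). Theorem 7 predicts E(y*) = xb + E(e).
    """
    rng = random.Random(seed)
    dens = 1.0 / (2.0 * K)
    total = 0.0
    for _ in range(n):
        nu = rng.uniform(-K, K)
        e = rng.gauss(0.2, 0.5)
        y = 1.0 if nu + xb + e > 0 else 0.0
        total += (y - (1.0 if nu > 0 else 0.0)) / dens
    return total / n


print(mean_transformed_outcome())   # should be close to xb + E(e) = 0.8
```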

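9.4.2 The IV estimator

A corollary to the theorem above is that, under assumptions (A.1) to (A.4),

$$E(z_iy^*_{it}) = E(z_ix'_{it})\beta + E(z_i\alpha_i), \quad \text{for } t = r, s.$$

Let

$$\Delta = [(\Sigma_{xrz} - \Sigma_{xsz})\Sigma_{zz}^{-1}(\Sigma_{xrz} - \Sigma_{xsz})']^{-1}(\Sigma_{xrz} - \Sigma_{xsz})\Sigma_{zz}^{-1} \quad \text{and} \quad \Psi_t = E(z_iy^*_{it}).$$

Then $\beta$ is consistently estimated by the sample analogue of $\Delta(\Psi_r - \Psi_s)$.

Estimation procedure: run a 2SLS regression of $y^*_{ir} - y^*_{is}$ on $x_{ir} - x_{is}$, using $z_i$ as instruments. We use the fact that

$$E(y^* \mid x, z) = x'\beta + E(\alpha + \varepsilon \mid x, z) \;\Rightarrow\; E(y^*_{ir} - y^*_{is} \mid x, z) = (x'_{ir} - x'_{is})\beta + E(\varepsilon_{ir} - \varepsilon_{is} \mid x, z).$$

Let $\Delta x = x_r - x_s$ and $\Delta y^* = y^*_r - y^*_s$, stacked over individuals into the $N \times K$ matrix $\Delta X$ and the $N \times 1$ vector $\Delta Y^*$, with $Z$ the $N \times L$ matrix of instruments. The 2SLS estimator is

$$\hat\beta = [\Delta X'Z(Z'Z)^{-1}Z'\Delta X]^{-1}\Delta X'Z(Z'Z)^{-1}Z'\Delta Y^*.$$

Lewbel and Honoré show that

$$\sqrt{N}(\hat\beta - \beta) \sim N\left(0,\ \Delta\,\mathrm{Var}(Q_i)\,\Delta'\right),$$

where $\Delta$ can be replaced by $\hat\Delta$ and $\mathrm{Var}(Q_i)$ estimated from

$$\hat Q_i = (z_iy^*_{ir} - z_iy^*_{is}) - z_i(x_{ir} - x_{is})'\hat\beta.$$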
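A compact NumPy sketch of the 2SLS step and of the variance estimate, taking the transformed outcomes $y^*_{ir}$, $y^*_{is}$ as already computed. Array shapes and names are assumptions: `dx` is $N \times K$ with rows $(x_{ir} - x_{is})'$, `z` is $N \times L$, `dystar` has length $N$. The variance uses $\hat\Delta$ and the sample variance of $\hat Q_i$ as in the formula above. The usage lines feed it synthetic linear data, not an actual binary-choice sample.

```python
import numpy as np


def hl_iv(dx: np.ndarray, z: np.ndarray, dystar: np.ndarray):
    """2SLS of dy* on dx with instruments z; returns (beta_hat, std_errors)."""
    n = dx.shape[0]
    s_xz = dx.T @ z / n                      # sample (Sigma_xrz - Sigma_xsz), K x L
    s_zz_inv = np.linalg.inv(z.T @ z / n)    # sample Sigma_zz^{-1}, L x L
    a = s_xz @ s_zz_inv
    delta = np.linalg.solve(a @ s_xz.T, a)   # Delta_hat, K x L
    psi_diff = z.T @ dystar / n              # sample Psi_r - Psi_s
    beta_hat = delta @ psi_diff
    q = z * (dystar - dx @ beta_hat)[:, None]   # rows are Q_i
    q_c = q - q.mean(axis=0)
    var_q = q_c.T @ q_c / n
    cov_beta = delta @ var_q @ delta.T / n
    return beta_hat, np.sqrt(np.diag(cov_beta))


# Synthetic illustration: 3 instruments, 2 regressors
rng = np.random.default_rng(0)
n, true_beta = 500, np.array([1.5, -0.5])
z = rng.normal(size=(n, 3))
dx = z @ rng.normal(size=(3, 2)) + 0.3 * rng.normal(size=(n, 2))
dystar = dx @ true_beta + rng.normal(size=n)
beta_hat, se = hl_iv(dx, z, dystar)
print(beta_hat, se)
```

Writing $\hat\beta$ as $\hat\Delta(\hat\Psi_r - \hat\Psi_s)$ gives the same number as the 2SLS formula, because the $1/N$ factors cancel.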

For computing $y^*_{it}$, we need a final component: an estimate of $f_t$. Feasible two-step estimation: use a kernel estimator of the joint density of $(\nu_{it}, x_{it}, z_i)$ divided by a kernel estimator of the joint density of $(x_{it}, z_i)$.

Let $w_{it} = (x_{it}, z_i)$ denote the $(K + L)$ vector of explanatory and instrument variables, and $u_{it} = (\nu_{it}, w_{it})$ (a $(K + L + 1)$ vector). Let $\hat f(\nu_{it}, w_{it})$ and $\hat f(w_{it})$ respectively denote the estimated joint density of $\nu_{it}$ and the components of $w_{it}$, and the joint density of the components of $w_{it}$. These densities are

$$\hat f(\nu_{it}, w_{it}) = \frac{1}{NTh^{K+L+1}}\sum_{j=1}^{NT}K_m\left(\frac{u_{it} - u_j}{h}\right),$$
$$\hat f(w_{it}) = \int\hat f(\nu_{it}, w_{it})\,d\nu_{it} = \frac{1}{NTh^{K+L}}\sum_{j=1}^{NT}\bar K_m\left(\frac{w_{it} - w_j}{h}\right),$$

where $h$ is the window (bandwidth), and $K_m(\cdot)$ and $\bar K_m(\cdot)$ are two multivariate kernels such that $\bar K_m(x) = \int K_m(x, y)\,dy$ and $\int\bar K_m(x)\,dx = 1$.

The conditional density is then estimated by

$$\hat f_t(\nu_{it} \mid x_{it}, z_i) = \frac{\hat f(\nu_{it}, w_{it})}{\hat f(w_{it})}.$$
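A minimal sketch of this ratio estimator with a Gaussian product kernel and a single bandwidth $h$ for all coordinates. With a product kernel, integrating out $\nu$ leaves the kernel in $w$, so the condition $\bar K_m(x) = \int K_m(x, y)\,dy$ holds by construction. The kernel choice, the common bandwidth, the simulated data and the function names are assumptions; bandwidth selection is not addressed.

```python
import math
import random


def gauss_kernel(u):
    """Product of standard normal densities over the coordinates of u."""
    return math.exp(-0.5 * sum(c * c for c in u)) / (2.0 * math.pi) ** (len(u) / 2.0)


def conditional_density(nu0, w0, nu_data, w_data, h):
    """f_hat(nu0 | w0) = f_hat(nu0, w0) / f_hat(w0)."""
    n, d = len(nu_data), len(w0)
    joint = marginal = 0.0
    for nu_j, w_j in zip(nu_data, w_data):
        k_w = gauss_kernel([(a - b) / h for a, b in zip(w0, w_j)])
        marginal += k_w
        joint += k_w * gauss_kernel([(nu0 - nu_j) / h])
    f_joint = joint / (n * h ** (d + 1))
    f_marginal = marginal / (n * h ** d)
    return f_joint / f_marginal


# Illustrative data: nu independent of a scalar w, both N(0, 1)
rng = random.Random(0)
w_data = [[rng.gauss(0, 1)] for _ in range(4000)]
nu_data = [rng.gauss(0, 1) for _ in range(4000)]
print(conditional_density(0.0, [0.0], nu_data, w_data, h=0.3))
```

Because $\nu$ and $w$ are independent here, the estimate should be close to the $N(0, 1)$ density at 0 (about 0.399), up to smoothing bias and sampling noise.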

9.5 SML estimation of selection models

This is a general-purpose estimation technique for models with selection: models with endogenous regime switching, Generalized Tobit models, etc. It uses a particular, efficient simulator for multivariate normal distributions: the GHK simulator (Geweke-Hajivassiliou-Keane; Geweke 1991, Börsch-Supan and Hajivassiliou 1993, Keane 1994).

9.5.1 The GHK simulator

Consider the following likelihood function

$$L = \int_{\{\eta:\ \eta \ge r\}} g(\varepsilon \mid \eta)f(\eta)\,d\eta, \qquad (9.1)$$

where $\eta = (\eta_1, \eta_2, \dots, \eta_K)'$ and $\varepsilon$ are a $K$-vector and an $M$-vector of normal variates respectively. This corresponds to a very general structural model defined by $g(\cdot \mid \cdot)$ and the set of constraints $\eta \ge r$.

Notes:
- In this model formulation, $\varepsilon$ is an implicit function of parameters and observed variables.
- $\eta$ is typically an unobserved heterogeneity term.
- Function $g(\cdot \mid \cdot)$ may depend in particular on the conditional distribution of $\varepsilon$ given $\eta$.

Problem in practice: ML estimation would require numerical integration involving multiple probability distributions. Idea of the GHK technique: construct a recursive algorithm to approximate multiple integrals.

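Let $\Omega = \mathrm{var}(\eta)$, a positive-definite matrix such that there exists a lower triangular matrix $D$ satisfying $DD' = \Omega$ (Cholesky decomposition):

$$D = \begin{pmatrix} d_{11} & 0 & \cdots & 0 \\ d_{21} & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ d_{K1} & d_{K2} & \cdots & d_{KK} \end{pmatrix}.$$

Define $\xi = D^{-1}\eta$, such that $\xi$ is a multivariate standard normal variate. We have

$$L = \int_{\{\xi:\ D\xi \ge r\}}\prod_{i=1}^K\phi(\xi_i)\,g(\varepsilon \mid D\xi)\,d\xi, \qquad (9.2)$$

where $\phi(\cdot)$ is the standard normal density of $\xi_i$ and $\xi = (\xi_1, \xi_2, \dots, \xi_K)$. The domain (set of constraints) $\{\xi: D\xi \ge r\}$ can be written recursively as

$$\xi_1 \ge \frac{r_1}{d_{11}}, \quad \xi_2 \ge \frac{1}{d_{22}}(r_2 - d_{21}\xi_1), \quad \xi_3 \ge \frac{1}{d_{33}}(r_3 - d_{31}\xi_1 - d_{32}\xi_2), \ \dots, \ \xi_K \ge \frac{1}{d_{KK}}(r_K - d_{K1}\xi_1 - \cdots - d_{K,K-1}\xi_{K-1}).$$

Equation (9.2) becomes

$$L = \int_{r_1/d_{11}}^{\infty}\int_{(r_2 - d_{21}\xi_1)/d_{22}}^{\infty}\cdots\int\prod_{i=1}^K\phi(\xi_i)\,g(\varepsilon \mid D\xi)\,d\xi_1\cdots d\xi_K.$$

Define now the truncated normal density function for $\xi_i$:

$$\phi_{A_i}(\xi_i) = \phi(\xi_i)\left[1 - \Phi\left(\frac{1}{d_{ii}}(r_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1})\right)\right]^{-1}, \qquad (9.3)$$

where $A_i = \left[\frac{1}{d_{ii}}(r_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1}), \infty\right)$ and $\Phi(\cdot)$ is the normal cumulative distribution function (CDF). The likelihood function above is now

$$L = \int_{A_1}\cdots\int_{A_K}\left[\prod_{i=1}^K\left(1 - \Phi\left(\frac{1}{d_{ii}}(r_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1})\right)\right)\right]\left[\prod_{i=1}^K\phi_{A_i}(\xi_i)\right]g(\varepsilon \mid D\xi)\,d\xi_1\cdots d\xi_K.$$

We now move from truncated normal variables to uniform random variables. The probability associated with any $\xi_i$, given that its distribution is truncated normal, lies between 0 and 1. Let $u_i$ denote a random variable on $[0, 1]$. We can then write

$$u_i = \frac{\Phi(\xi_i) - \Phi\left(\frac{1}{d_{ii}}(r_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1})\right)}{1 - \Phi\left(\frac{1}{d_{ii}}(r_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1})\right)}, \quad i = 1, 2, \dots, K.$$

For example:

$$u_1 = \frac{\Phi(\xi_1) - \Phi(r_1/d_{11})}{1 - \Phi(r_1/d_{11})} \iff \xi_1 = \Phi^{-1}\left[u_1(1 - \Phi(r_1/d_{11})) + \Phi(r_1/d_{11})\right],$$
$$u_2 = \frac{\Phi(\xi_2) - \Phi\left(\frac{1}{d_{22}}(r_2 - d_{21}\xi_1)\right)}{1 - \Phi\left(\frac{1}{d_{22}}(r_2 - d_{21}\xi_1)\right)} \iff \xi_2 = \Phi^{-1}\left[u_2\left(1 - \Phi\left(\tfrac{1}{d_{22}}(r_2 - d_{21}\xi_1)\right)\right) + \Phi\left(\tfrac{1}{d_{22}}(r_2 - d_{21}\xi_1)\right)\right],$$

where $\xi_1$ is defined above. For any $i$, we have the recursive formula

$$\xi_i = \Phi^{-1}\left[u_i\left(1 - \Phi\left(\frac{1}{d_{ii}}(r_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1})\right)\right) + \Phi\left(\frac{1}{d_{ii}}(r_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1})\right)\right],$$

which depends on the sequence of uniform random variables $(u_1, \dots, u_K)$. The likelihood function now involves the random variables $u_i$, $i = 1, \dots, K$, and $K$ integrals with constant bounds:

$$L = \int_0^1\cdots\int_0^1\left[\prod_{i=1}^K\left(1 - \Phi\left(\frac{1}{d_{ii}}(r_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1})\right)\right)\right]g(\varepsilon \mid D\xi)\,du_1du_2\cdots du_K,$$

where the $\xi_i$'s are implicit recursive functions of the $u_i$'s.

Note: the product of truncated normal densities $\prod_i\phi_{A_i}(\xi_i)$ disappears from the likelihood function, because $\phi_{A_i}(\xi_i) = du_i/d\xi_i$.

Since the $u_i$'s are i.i.d., we can approximate the likelihood above by
- Drawing $S$ values for the vector $u$: $\{u^s_1, u^s_2, \dots, u^s_K\}_{s=1}^S$;
- Computing recursively $(\xi^s_1, \dots, \xi^s_K)$ from $u^s$ above;
- Averaging over the $S$ draws to form the Simulated Likelihood:

$$L_S = \frac{1}{S}\sum_{s=1}^S\left[\prod_{i=1}^K\left(1 - \Phi\left(\frac{1}{d_{ii}}(r_i - d_{i1}\xi^s_1 - \cdots - d_{i,i-1}\xi^s_{i-1})\right)\right)\right]g(\varepsilon \mid D\xi^s).$$

Note: this is easy to generalize to a restriction set of the form $a < \eta < b$. We would construct recursively

$$\xi_i = q\left[u_i, \frac{a_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1}}{d_{ii}}, \frac{b_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1}}{d_{ii}}\right], \quad i = 1, \dots, K,$$

where

$$q(u, a, b) = \Phi^{-1}[\Phi(a)(1 - u) + \Phi(b)u].$$

If we want to compute, for example, the probability of an event $a < \eta < b$ in the multivariate case, we just need to evaluate

$$Q(\eta) = Q_1 Q_2\cdots Q_K, \quad \text{where} \quad Q_i = \Phi\left[\frac{b_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1}}{d_{ii}}\right] - \Phi\left[\frac{a_i - d_{i1}\xi_1 - \cdots - d_{i,i-1}\xi_{i-1}}{d_{ii}}\right],$$

and average out over simulations.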
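The recursion above can be sketched in a few lines for the rectangle case $a < \eta < b$: draw uniforms, map them through $q(u, a, b)$, and average the products $Q_1\cdots Q_K$. This is an illustrative pure-Python version using only the standard library (`statistics.NormalDist` supplies $\Phi$ and $\Phi^{-1}$), with $g(\varepsilon \mid \eta)$ set to 1 so that it returns a probability. Clipping the argument of $\Phi^{-1}$ away from 0 and 1 is a numerical safeguard added here, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cholesky(a):
    """Lower-triangular D with D D' = a (a symmetric positive definite)."""
    k = len(a)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(d[i][m] * d[j][m] for m in range(j))
            if i == j:
                d[i][i] = math.sqrt(a[i][i] - s)
            else:
                d[i][j] = (a[i][j] - s) / d[j][j]
    return d


def ghk_rectangle_prob(lower, upper, omega, n_draws=10_000, seed=0):
    """GHK estimate of Prob(lower < eta < upper) for eta ~ N(0, omega)."""
    d = cholesky(omega)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        xi = []
        weight = 1.0                       # running product Q_1 ... Q_i
        for i in range(len(lower)):
            shift = sum(d[i][j] * xi[j] for j in range(i))
            pa = STD_NORMAL.cdf((lower[i] - shift) / d[i][i])
            pb = STD_NORMAL.cdf((upper[i] - shift) / d[i][i])
            weight *= pb - pa              # Q_i
            u = rng.random()
            p = min(max(pa * (1.0 - u) + pb * u, 1e-12), 1.0 - 1e-12)
            xi.append(STD_NORMAL.inv_cdf(p))   # xi_i = q(u_i, a_i*, b_i*)
        total += weight
    return total / n_draws


inf = math.inf
omega = [[1.0, 0.5], [0.5, 1.0]]
print(ghk_rectangle_prob([-inf, -inf], [0.0, 0.0], omega))  # close to 1/3
```

For a diagonal $\Omega$ the weights do not depend on the draws, so the estimate is exact; for the correlated orthant shown, the known value is $1/4 + \arcsin(0.5)/(2\pi) = 1/3$.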

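9.5.2 Example

Based on the paper: V.A. Hajivassiliou and Y.M. Ioannides 2001, "Unemployment and liquidity constraints".

Liquidity and employment constraints, $S_t$ and $E_t$ respectively:

$$S_t = \begin{cases} 0 & \text{if } y^*_{1t} > 0, \\ 1 & \text{if } y^*_{1t} \le 0; \end{cases} \qquad E_t = \begin{cases} -1 & \text{if } y^*_{2t} < \mu^-, \\ 0 & \text{if } \mu^- \le y^*_{2t} < \mu^+, \\ 1 & \text{if } \mu^+ \le y^*_{2t}. \end{cases}$$

System of latent variables:

$$y^*_1 = 1\mathrm{I}(y^*_2 < \mu^-)\,\delta_{11} + 1\mathrm{I}(y^*_2 \ge \mu^+)\,\delta_{12} + x_1\gamma_1 + v_1,$$
$$y^*_2 = 1\mathrm{I}(y^*_1 > 0)\,\delta_2 + x_2\gamma_2 + v_2.$$

Six possible regimes, as $(S, E) \in \{0, 1\} \times \{-1, 0, 1\}$:

$$\begin{array}{cc|c|c}
S & E & y^*_1 & y^*_2 \\ \hline
1 & -1 & \delta_{11} + x_1\gamma_1 + v_1 < 0 & x_2\gamma_2 + v_2 < \mu^- \\
1 & 0 & x_1\gamma_1 + v_1 < 0 & \mu^- < x_2\gamma_2 + v_2 < \mu^+ \\
1 & 1 & \delta_{12} + x_1\gamma_1 + v_1 < 0 & \mu^+ < x_2\gamma_2 + v_2 \\
0 & -1 & \delta_{11} + x_1\gamma_1 + v_1 > 0 & \delta_2 + x_2\gamma_2 + v_2 < \mu^- \\
0 & 0 & x_1\gamma_1 + v_1 > 0 & \mu^- < \delta_2 + x_2\gamma_2 + v_2 < \mu^+ \\
0 & 1 & \delta_{12} + x_1\gamma_1 + v_1 > 0 & \mu^+ < \delta_2 + x_2\gamma_2 + v_2
\end{array}$$

We can then define bounds corresponding to $a_1 < v_1 < b_1$, $a_2 < v_2 < b_2$ as follows:

$$\begin{array}{cc|cc|cc}
S & E & a_1 & b_1 & a_2 & b_2 \\ \hline
1 & -1 & -\infty & -(\delta_{11} + x_1\gamma_1) & -\infty & \mu^- - x_2\gamma_2 \\
1 & 0 & -\infty & -x_1\gamma_1 & \mu^- - x_2\gamma_2 & \mu^+ - x_2\gamma_2 \\
1 & 1 & -\infty & -(\delta_{12} + x_1\gamma_1) & \mu^+ - x_2\gamma_2 & +\infty \\
0 & -1 & -(\delta_{11} + x_1\gamma_1) & +\infty & -\infty & \mu^- - \delta_2 - x_2\gamma_2 \\
0 & 0 & -x_1\gamma_1 & +\infty & \mu^- - \delta_2 - x_2\gamma_2 & \mu^+ - \delta_2 - x_2\gamma_2 \\
0 & 1 & -(\delta_{12} + x_1\gamma_1) & +\infty & \mu^+ - \delta_2 - x_2\gamma_2 & +\infty
\end{array}$$

Advantage of the method: we only need to specify the variance-covariance matrix $\Omega$ for $\eta$ (and possibly the one for $\varepsilon$) in $g(\varepsilon \mid \eta)$.

In the panel data case, we would have $v_{it} = \alpha_i + \nu_{it}$ in the example above, where $v$ corresponds to $\varepsilon$ and $\alpha$ corresponds to $\eta$ in our general notation above.
- We can then construct the distribution of the full error term conditional on $\alpha$ (recall the discussion of the panel Probit model).
- This allows for multivariate distributions of individual effects, possibly correlated across equations.
- This allows for various serial and contemporaneous correlation patterns across the $\nu_{it}$'s (pure one-way random effects, serial correlation, etc.).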

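Appendix 1. Maximum-Likelihood estimation of the Random-effect model

Under the normality assumption for $\alpha$ and $\varepsilon$, the log-likelihood is

$$\log L = -\frac{NT}{2}\log(2\pi) - \frac{NT}{2}\log(\sigma^2_\varepsilon) - \frac{N}{2}\log(\psi) - \frac{1}{2\sigma^2_\varepsilon}U'\Sigma^{-1}U,$$

where $\Sigma = \Omega/\sigma^2_\varepsilon = Q + \psi B$ (so that $\Sigma^{-1} = Q + \frac{1}{\psi}B$), $\psi = (\sigma^2_\varepsilon + T\sigma^2_\alpha)/\sigma^2_\varepsilon$, and

$$|\Omega| = (\sigma^2_\varepsilon)^{N(T-1)}(\sigma^2_\varepsilon + T\sigma^2_\alpha)^N = (\sigma^2_\varepsilon)^{NT}\psi^N.$$

Concentrated log-likelihood with respect to $\sigma^2_\varepsilon$:

$$\log L = \text{constant} - \frac{NT}{2}\log\left[d'\left(Q + \tfrac{1}{\psi}B\right)d\right] - \frac{N}{2}\log(\psi),$$

where $d = Y - X\hat\beta$.

Estimate of $1/\psi$ conditional on $\beta$:

$$\widehat{\left(\frac{1}{\psi}\right)} = \frac{d'Qd}{(T-1)\,d'Bd} = \frac{\sum_i\sum_t(d_{it} - \bar d_i)^2}{T(T-1)\sum_i(\bar d_i - \bar d)^2}.$$

Estimate of $\beta$ conditional on $1/\psi$:

$$\hat\beta = \left[X'\left(Q + \tfrac{1}{\psi}B\right)X\right]^{-1}X'\left(Q + \tfrac{1}{\psi}B\right)Y.$$

Maddala (1971): there are at most 2 maxima for the log-likelihood (problem of local maximum).

Breusch (1987) procedure: iterate between $\hat\beta$, $\hat\sigma^2_\varepsilon$ and $\widehat{1/\psi}$ until convergence.
- Starting with $\hat\beta_{Within}$ and $1/\psi = 0$, the next $\widehat{1/\psi}$ is positive and starts an increasing sequence;
- Starting with $\hat\beta_{Between}$ and $1/\psi \to \infty$, the next $\widehat{1/\psi}$ is positive and starts a decreasing sequence.

Since there are at most 2 maxima, use both as starting values. If the maximum of $\log L$ is the same, this is the true maximum.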

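Appendix 2. The two-way random effects model

A2.1 Assumptions and notation

Assumptions:

$$\alpha_i \sim IID(0, \sigma^2_\alpha), \quad \lambda_t \sim IID(0, \sigma^2_\lambda), \quad \varepsilon_{it} \sim IID(0, \sigma^2_\varepsilon),$$
$$E(\alpha_i\lambda_t) = E(\alpha_i\varepsilon_{it}) = E(\lambda_t\varepsilon_{it}) = 0,$$

and $X$ is independent of $\alpha_i$, $\lambda_t$ and $\varepsilon_{it}$. We have

$$E(u_{it}u_{js}) = \begin{cases} \sigma^2_\alpha + \sigma^2_\lambda + \sigma^2_\varepsilon & \text{if } i = j,\ t = s, \\ \sigma^2_\alpha & \text{if } i = j,\ t \neq s, \\ \sigma^2_\lambda & \text{if } i \neq j,\ t = s, \\ 0 & \text{otherwise.} \end{cases}$$

The variance-covariance matrix of the error term is

$$\Omega = \sigma^2_\alpha(I_N \otimes e_Te'_T) + \sigma^2_\lambda(e_Ne'_N \otimes I_T) + \sigma^2_\varepsilon(I_N \otimes I_T) = T\sigma^2_\alpha B + N\sigma^2_\lambda\bar B + \sigma^2_\varepsilon I_{NT},$$

with $B = I_N \otimes e_Te'_T/T$ and $\bar B = e_Ne'_N/N \otimes I_T$.

A2.2 Feasible GLS estimation

We can write $\Omega = \sum_{j=1}^4\kappa_jM_j$, with

$$\kappa_1 = \sigma^2_\varepsilon, \quad \kappa_2 = T\sigma^2_\alpha + \sigma^2_\varepsilon, \quad \kappa_3 = N\sigma^2_\lambda + \sigma^2_\varepsilon, \quad \kappa_4 = T\sigma^2_\alpha + N\sigma^2_\lambda + \sigma^2_\varepsilon,$$

and

$$M_1 = \left(I_N - \frac{e_Ne'_N}{N}\right) \otimes \left(I_T - \frac{e_Te'_T}{T}\right), \quad M_2 = \left(I_N - \frac{e_Ne'_N}{N}\right) \otimes \frac{e_Te'_T}{T},$$
$$M_3 = \frac{e_Ne'_N}{N} \otimes \left(I_T - \frac{e_Te'_T}{T}\right), \quad M_4 = \frac{e_Ne'_N}{N} \otimes \frac{e_Te'_T}{T}.$$

We have $\Omega^r = \sum_{j=1}^4\kappa_j^rM_j$, so that

$$\sigma_\varepsilon\Omega^{-1/2} = \sum_{j=1}^4\frac{\sigma_\varepsilon}{\sqrt{\kappa_j}}M_j,$$

and the typical element of $Y^* = \sigma_\varepsilon\Omega^{-1/2}Y$ is

$$y^*_{it} = y_{it} - \theta_1\bar y_{i.} - \theta_2\bar y_{.t} + \theta_3\bar y_{..},$$

with

$$\theta_1 = 1 - \frac{\sigma_\varepsilon}{\sqrt{\kappa_2}}, \quad \theta_2 = 1 - \frac{\sigma_\varepsilon}{\sqrt{\kappa_3}}, \quad \theta_3 = \theta_1 + \theta_2 + \frac{\sigma_\varepsilon}{\sqrt{\kappa_4}} - 1.$$

GLS estimates obtain by OLS regression of $Y^*$ on $X^*$.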

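Because $\mathrm{Var}(M_jU) = \kappa_jM_j$, $j = 1, 2, 3$, the Best Quadratic Unbiased estimator of $\kappa_j$ is $U'M_jU/\mathrm{tr}(M_j)$, $j = 1, 2, 3$.

Amemiya (1971): replace OLS residuals by Within (two-effect) residuals.

Asymptotic distribution of the variance component estimates:

$$\begin{pmatrix} \sqrt{NT}(\hat\sigma^2_\varepsilon - \sigma^2_\varepsilon) \\ \sqrt{N}(\hat\sigma^2_\alpha - \sigma^2_\alpha) \\ \sqrt{T}(\hat\sigma^2_\lambda - \sigma^2_\lambda) \end{pmatrix} \sim N\left(0, \begin{pmatrix} 2\sigma^4_\varepsilon & 0 & 0 \\ 0 & 2\sigma^4_\alpha & 0 \\ 0 & 0 & 2\sigma^4_\lambda \end{pmatrix}\right).$$

Method of Swamy and Arora (1972): estimate the variance components from the mean square errors of three regressions: Within, Between-individual and Between-period.

First regression: Within, model transformed by $M_1 = (I_N - e_Ne'_N/N) \otimes (I_T - e_Te'_T/T)$. Estimate of $\kappa_1$:

$$\hat\kappa_1 = \hat\sigma^2_\varepsilon = \frac{Y'M_1Y - Y'M_1X(X'M_1X)^{-1}X'M_1Y}{(N-1)(T-1) - K}.$$

Second regression: Between-individual, model transformed by $M_2 = (I_N - e_Ne'_N/N) \otimes (e_Te'_T/T)$. Estimate of $\kappa_2$:

$$\hat\kappa_2 = \frac{Y'M_2Y - Y'M_2X(X'M_2X)^{-1}X'M_2Y}{(N-1) - K},$$

and we compute $\hat\sigma^2_\alpha = (1/T)(\hat\kappa_2 - \hat\sigma^2_\varepsilon)$.

Third regression: Between-period, model transformed by $M_3 = (e_Ne'_N/N) \otimes (I_T - e_Te'_T/T)$. Estimate of $\kappa_3$:

$$\hat\kappa_3 = \frac{Y'M_3Y - Y'M_3X(X'M_3X)^{-1}X'M_3Y}{(T-1) - K},$$

and we compute $\hat\sigma^2_\lambda = (1/N)(\hat\kappa_3 - \hat\sigma^2_\varepsilon)$.

General formulation of the GLS estimate:

$$\hat\beta_{GLS} = \left[\frac{X'M_1X}{\sigma^2_\varepsilon} + \frac{X'M_2X}{\kappa_2} + \frac{X'M_3X}{\kappa_3}\right]^{-1}\left[\frac{X'M_1Y}{\sigma^2_\varepsilon} + \frac{X'M_2Y}{\kappa_2} + \frac{X'M_3Y}{\kappa_3}\right]$$

and

$$\mathrm{Var}(\hat\beta_{GLS}) = \sigma^2_\varepsilon\left[X'M_1X + \frac{\sigma^2_\varepsilon}{\kappa_2}X'M_2X + \frac{\sigma^2_\varepsilon}{\kappa_3}X'M_3X\right]^{-1}.$$

The Within estimator is $\hat\beta_{Within} = [X'M_1X]^{-1}[X'M_1Y]$, the Between-individual estimator is $\hat\beta_{BI} = [X'M_2X]^{-1}[X'M_2Y]$, and the Between-period estimator is $\hat\beta_{BP} = [X'M_3X]^{-1}[X'M_3Y]$, so that

$$\hat\beta_{GLS} = W_1\hat\beta_{Within} + W_2\hat\beta_{BI} + W_3\hat\beta_{BP},$$

with

$$W_1 = \left[X'M_1X + \frac{\sigma^2_\varepsilon}{\kappa_2}X'M_2X + \frac{\sigma^2_\varepsilon}{\kappa_3}X'M_3X\right]^{-1}(X'M_1X),$$
$$W_2 = \left[X'M_1X + \frac{\sigma^2_\varepsilon}{\kappa_2}X'M_2X + \frac{\sigma^2_\varepsilon}{\kappa_3}X'M_3X\right]^{-1}\frac{\sigma^2_\varepsilon}{\kappa_2}(X'M_2X),$$
$$W_3 = \left[X'M_1X + \frac{\sigma^2_\varepsilon}{\kappa_2}X'M_2X + \frac{\sigma^2_\varepsilon}{\kappa_3}X'M_3X\right]^{-1}\frac{\sigma^2_\varepsilon}{\kappa_3}(X'M_3X).$$

- If $\sigma^2_\alpha = \sigma^2_\lambda = 0$, $\hat\beta_{GLS}$ is $\hat\beta_{OLS}$;
- When $T$ and $N \to \infty$, $\hat\beta_{GLS} \to \hat\beta_{Within}$;
- If $\sigma^2_\varepsilon/\kappa_2 \to \infty$, then $\hat\beta_{GLS} \to \hat\beta_{BI}$;
- If $\sigma^2_\varepsilon/\kappa_3 \to \infty$, then $\hat\beta_{GLS} \to \hat\beta_{BP}$.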

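A2.3 Testing for effects

Breusch-Pagan (1980): Lagrange Multiplier test statistic for $H_0: \sigma^2_\alpha = \sigma^2_\lambda = 0$.

The Lagrange Multiplier (LM) test uses only the restricted estimates $\tilde\theta$:

$$LM = \left(\frac{\partial\log L(\tilde\theta)}{\partial\theta}\right)'\left[-E\left(\frac{\partial^2\log L(\tilde\theta)}{\partial\theta\,\partial\theta'}\right)\right]^{-1}\left(\frac{\partial\log L(\tilde\theta)}{\partial\theta}\right),$$

where

$$\log L(\theta) = -\frac{NT}{2}\log(2\pi) - \frac{1}{2}\log|\Omega| - \frac{1}{2}U'\Omega^{-1}U \quad \text{and} \quad \theta = (\sigma^2_\alpha, \sigma^2_\lambda, \sigma^2_\varepsilon).$$

Gradient of the log-likelihood:

$$\frac{\partial\log L(\theta)}{\partial\theta_i} = -\frac{1}{2}\mathrm{tr}\left[\Omega^{-1}\frac{\partial\Omega}{\partial\theta_i}\right] + \frac{1}{2}\left[U'\Omega^{-1}\frac{\partial\Omega}{\partial\theta_i}\Omega^{-1}U\right], \quad i = 1, 2, 3.$$

Because $\Omega = \sigma^2_\alpha(I_N \otimes e_Te'_T) + \sigma^2_\lambda(e_Ne'_N \otimes I_T) + \sigma^2_\varepsilon(I_N \otimes I_T)$, we have

$$\frac{\partial\Omega}{\partial\theta_i} = \begin{cases} I_N \otimes e_Te'_T & i = 1\ (\sigma^2_\alpha), \\ e_Ne'_N \otimes I_T & i = 2\ (\sigma^2_\lambda), \\ I_{NT} & i = 3\ (\sigma^2_\varepsilon). \end{cases}$$

Hence, evaluated at the restricted estimates (where $\Omega = \tilde\sigma^2_\varepsilon I_{NT}$ and $\tilde\sigma^2_\varepsilon = U'U/NT$),

$$\frac{\partial\log L(\tilde\theta)}{\partial\theta} = \frac{NT}{2\tilde\sigma^2_\varepsilon}\begin{pmatrix} U'(I_N \otimes e_Te'_T)U/U'U - 1 \\ U'(e_Ne'_N \otimes I_T)U/U'U - 1 \\ 0 \end{pmatrix}$$

and

$$\left[-E\left(\frac{\partial^2\log L(\tilde\theta)}{\partial\theta\,\partial\theta'}\right)\right]^{-1} = \frac{2\tilde\sigma^4_\varepsilon}{NT(N-1)(T-1)}\begin{pmatrix} N-1 & 0 & 1-N \\ 0 & T-1 & 1-T \\ 1-N & 1-T & NT-1 \end{pmatrix}.$$

The LM test statistic is finally

$$LM = \frac{NT}{2(T-1)}\left[1 - \frac{U'(I_N \otimes e_Te'_T)U}{U'U}\right]^2 + \frac{NT}{2(N-1)}\left[1 - \frac{U'(e_Ne'_N \otimes I_T)U}{U'U}\right]^2$$

and is distributed as $\chi^2(2)$ under $H_0$.

Important note: the LM test statistic does not depend on the variances; it only uses the OLS residuals $U$.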
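As a sketch, the statistic can be computed directly from an $N \times T$ array of OLS residuals, using $U'(I_N \otimes e_Te'_T)U = \sum_i(\sum_t u_{it})^2$ and $U'(e_Ne'_N \otimes I_T)U = \sum_t(\sum_i u_{it})^2$. The function name and the toy residuals are hypothetical; the p-value uses the $\chi^2(2)$ survival function $e^{-x/2}$.

```python
import math


def breusch_pagan_two_way(resid):
    """Two-way Breusch-Pagan LM statistic from a balanced N x T residual array.

    Returns (LM, LM_individual, LM_time, p_value), with LM ~ chi2(2) under H0.
    """
    n_units, n_periods = len(resid), len(resid[0])
    nt = n_units * n_periods
    uu = sum(u * u for row in resid for u in row)
    ind = sum(sum(row) ** 2 for row in resid)                   # U'(I_N x e_T e_T')U
    tim = sum(sum(resid[i][t] for i in range(n_units)) ** 2     # U'(e_N e_N' x I_T)U
              for t in range(n_periods))
    lm_ind = nt / (2.0 * (n_periods - 1)) * (1.0 - ind / uu) ** 2
    lm_tim = nt / (2.0 * (n_units - 1)) * (1.0 - tim / uu) ** 2
    lm = lm_ind + lm_tim
    return lm, lm_ind, lm_tim, math.exp(-lm / 2.0)


# Toy residual array (N = 2 units, T = 2 periods), for illustration only
print(breusch_pagan_two_way([[1.0, 1.0], [-1.0, -1.0]]))
```

Only residual sums by unit and by period are needed, which reflects the note above that the statistic does not involve the variance components.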

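Appendix 3. The one-way unbalanced random effects model

A3.1 Notation

Consider 2 cross-sections, of dimension $D_1$ and $D_1 + D_2$ respectively:

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}\beta + \begin{pmatrix} U_1 \\ U_2 \end{pmatrix},$$

where $Y_1$, $U_1$ are $D_1 \times 1$, $Y_2$, $U_2$ are $(D_1 + D_2) \times 1$, and $X_1$ and $X_2$ are respectively $D_1 \times K$ and $(D_1 + D_2) \times K$.

Now, let $T_j = \sum_{i=1}^j D_i$, so that $T_1 = D_1$ and $T_2 = D_1 + D_2$. The variance-covariance matrix of $(U'_1, U'_2)'$ is $(2D_1 + D_2) \times (2D_1 + D_2)$:

$$\Omega = \begin{pmatrix} \Omega_1 & 0 \\ 0 & \Omega_2 \end{pmatrix}, \quad \Omega_1 = \sigma^2_\varepsilon I_{D_1} + \sigma^2_\alpha e_{D_1}e'_{D_1}, \quad \Omega_2 = \begin{pmatrix} \sigma^2_\varepsilon I_{D_1} + \sigma^2_\alpha e_{D_1}e'_{D_1} & \sigma^2_\alpha e_{D_1}e'_{D_2} \\ \sigma^2_\alpha e_{D_2}e'_{D_1} & \sigma^2_\varepsilon I_{D_2} + \sigma^2_\alpha e_{D_2}e'_{D_2} \end{pmatrix}.$$

We have

$$\Omega_j = (T_j\sigma^2_\alpha + \sigma^2_\varepsilon)\frac{e_{T_j}e'_{T_j}}{T_j} + \sigma^2_\varepsilon\left(I_{T_j} - \frac{e_{T_j}e'_{T_j}}{T_j}\right).$$

Using the formula for the power of the matrix,

$$\Omega_j^r = (T_j\sigma^2_\alpha + \sigma^2_\varepsilon)^r\frac{e_{T_j}e'_{T_j}}{T_j} + (\sigma^2_\varepsilon)^r\left(I_{T_j} - \frac{e_{T_j}e'_{T_j}}{T_j}\right).$$

If we denote $w_j = T_j\sigma^2_\alpha + \sigma^2_\varepsilon$, the transformation for the unbalanced panel is

$$\sigma_\varepsilon\Omega_j^{-1/2} = \frac{\sigma_\varepsilon}{\sqrt{w_j}}\frac{e_{T_j}e'_{T_j}}{T_j} + I_{T_j} - \frac{e_{T_j}e'_{T_j}}{T_j} = I_{T_j} - \theta_j\frac{e_{T_j}e'_{T_j}}{T_j},$$

where $\theta_j = 1 - \sigma_\varepsilon/\sqrt{w_j}$. The typical element of $\sigma_\varepsilon\Omega_j^{-1/2}Y_j$ is $y^*_{jt} = y_{jt} - \theta_j\bar y_j$, with $\bar y_j = \frac{1}{T_j}\sum_{t=1}^{T_j}y_{jt}$.

Direct generalization to the case $N > 2$, because $\Omega$ is block-diagonal and the off-diagonal terms (in the $\Omega_j$'s) are always equal to $\sigma^2_\alpha$:

$$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^*, \quad \text{where } X^* = \sigma_\varepsilon\Omega^{-1/2}X, \ Y^* = \sigma_\varepsilon\Omega^{-1/2}Y,$$

and

$$\sigma_\varepsilon\Omega^{-1/2} = \mathrm{diag}\left(I_{T_i} - \frac{e_{T_i}e'_{T_i}}{T_i}\right) + \mathrm{diag}\left(\frac{\sigma_\varepsilon}{\sqrt{w_i}}\frac{e_{T_i}e'_{T_i}}{T_i}\right).$$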
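A small sketch of this transformation for a list of units with different $T_j$, taking $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ as given (in practice they would come from estimates such as those of A3.2). The function name and data are hypothetical.

```python
import math


def unbalanced_re_transform(y_units, sigma2_alpha, sigma2_eps):
    """Return y*_jt = y_jt - theta_j * ybar_j for each unit j (list of lists).

    theta_j = 1 - sigma_eps / sqrt(w_j), with w_j = T_j sigma2_alpha + sigma2_eps.
    """
    out = []
    for y in y_units:
        t_j = len(y)
        w_j = t_j * sigma2_alpha + sigma2_eps
        theta_j = 1.0 - math.sqrt(sigma2_eps / w_j)
        ybar = sum(y) / t_j
        out.append([v - theta_j * ybar for v in y])
    return out


# Two units observed 2 and 4 times (illustrative values)
print(unbalanced_re_transform([[1.0, 3.0], [2.0, 2.5, 4.0, 1.5]], 1.5, 1.0))
```

Since $\theta_j$ increases with $T_j$, units observed more often are demeaned more heavily; with $\sigma^2_\alpha = 0$ the data are left unchanged (OLS).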



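A3.2 Estimation of variance components

Amemiya (1971) suggests the following estimates for $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$:

$$\hat\sigma^2_\varepsilon = \frac{\hat U'Q\hat U}{\sum_iT_i - N - K},$$

$$\hat\sigma^2_\alpha = \frac{\hat U'B\hat U - \left[N + \mathrm{tr}\left((X'QX)^{-1}X'BX\right) - \mathrm{tr}\left((X'QX)^{-1}X'\left(J_n/\sum_iT_i\right)X\right)\right]\hat\sigma^2_\varepsilon}{\sum_iT_i - \sum_iT_i^2/\sum_iT_i},$$

where $J_n$ is a matrix of ones of dimension $(\sum_iT_i) \times (\sum_iT_i)$,

$$B = \mathrm{diag}\left(\frac{e_{T_i}e'_{T_i}}{T_i}\right)_{i=1,\dots,N}, \qquad Q = \mathrm{diag}\left(I_{T_i} - \frac{e_{T_i}e'_{T_i}}{T_i}\right)_{i=1,\dots,N}.$$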

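Appendix 4. ML estimation of dynamic panel models

A4.1 Likelihood functions

Different likelihoods correspond to cases 1 to 4 above. Assumption: $\alpha_i$ and $\varepsilon_{it}$ are jointly normally distributed.

For Case 1 ($y_{i0}$ fixed):

$$L_1 = (2\pi)^{-NT/2}(\det V_T)^{-N/2}\exp\left[-\frac{1}{2}\sum_{i=1}^N u'_iV_T^{-1}u_i\right],$$

where $u_i = (y_i - \gamma y_{i,-1} - x_i\beta - z_i\rho)$ and $V_T = \sigma^2_\varepsilon I_T + \sigma^2_\alpha e_Te'_T$, the $(T \times T)$ variance-covariance matrix for unit $i$.

For Case 2.a ($y_{i0}$ random and independent of $\alpha_i$):

$$L_{2a} = L_1 \times (2\pi)^{-N/2}(\sigma^2_{y_0})^{-N/2}\exp\left[-\frac{1}{2\sigma^2_{y_0}}\sum_{i=1}^N(y_{i0} - \mu_{y_0})^2\right].$$

For Case 2.b ($y_{i0}$ random and correlated with $\alpha_i$):

$$\begin{aligned}
L_{2b} = {} & (2\pi)^{-NT/2}(\sigma^2_\varepsilon)^{-N(T-1)/2}(\sigma^2_\varepsilon + Ta)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2_\varepsilon}\sum_{i=1}^N\sum_{t=1}^T u^2_{it} + \frac{a}{2\sigma^2_\varepsilon(\sigma^2_\varepsilon + Ta)}\sum_{i=1}^N\left(\sum_{t=1}^T u_{it}\right)^2\right\} \\
& \times (2\pi)^{-N/2}(\sigma^2_{y_0})^{-N/2}\exp\left[-\frac{1}{2\sigma^2_{y_0}}\sum_{i=1}^N(y_{i0} - \mu_{y_0})^2\right],
\end{aligned}$$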

188

APPENDIX 4. ML ESTIMATION OF DYNAMIC PANEL MODELS

where

y ).

a =  2 2 y2

and

uit = yit yi;t 1 xit zi (yi0

For Case 3/ ( i0 xed):

L3 = (2)

NT
2

( 2 )

NT

"

(2)

N X
T
1 X
[(y
2"2 i=1 t=1 it

exp

(yi;t 1
N

yi0 + wi0) xit


"

(2)

N
2

z ]2

2)):

"

N (T +1)

L4a = (2)

"

j
T +1j

N
2

exp

N
1X

wi0)2 :

For Case 4.a/ ( i0 random with common mean

2=(1

N
1 X
(y
22 i=1 i0

exp

yi0 + wi0)

w and variance
#

1 v0 ;
vi
T +1
i

2 i=1
where vi is a (T + 1) vector vi = (yi0
w ; yi1 yi0 xit
zi ; : : : ; yiT yi;T 1 xiT zi ) and
T +1 is a (T +1)  (T +1)
matrix

T +1 = "2 1 
0T
Useful expressions:

00T
IT

 1 
+  2 1 
eT
1

"2T
j
T +1j = 1 2 "2 + T  2 + 11 +   2
and

1 = 1

T +1
"2

"

1 2 00T
0T
IT

 2
"
 2

; e0T :

1+
+T +
1 

 1

189

1+
(1 + ; e0T ) :
eT
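The two closed forms for $|\Omega_{T+1}|$ and $\Omega_{T+1}^{-1}$ are easy to get wrong when coding the Case 4.a likelihood, so a numerical check is worthwhile. The Python sketch below (helper names and parameter values are ours, purely illustrative) builds $\Omega_{T+1}$, compares the closed-form determinant with Gaussian elimination, and checks that $\Omega_{T+1}\Omega_{T+1}^{-1} = I$.

```python
def omega_4a(T, s2_eps, s2_alpha, gamma):
    """Omega_{T+1} = s2_eps * diag(1/(1-gamma^2), I_T) + s2_alpha * c c',
    where c = (1/(1-gamma), 1, ..., 1)'."""
    n = T + 1
    c = [1.0 / (1.0 - gamma)] + [1.0] * T
    om = [[s2_alpha * c[i] * c[j] for j in range(n)] for i in range(n)]
    om[0][0] += s2_eps / (1.0 - gamma ** 2)
    for i in range(1, n):
        om[i][i] += s2_eps
    return om


def omega_4a_det(T, s2_eps, s2_alpha, gamma):
    """Closed-form determinant |Omega_{T+1}|."""
    return (s2_eps ** T / (1.0 - gamma ** 2)) * (
        s2_eps + T * s2_alpha + (1.0 + gamma) / (1.0 - gamma) * s2_alpha)


def omega_4a_inv(T, s2_eps, s2_alpha, gamma):
    """Closed-form inverse: (1/s2_eps) [diag(1-gamma^2, I_T) - k d d'], with
    d = (1+gamma, 1, ..., 1)', k = r / (1 + r (T + (1+gamma)/(1-gamma))), r = s2_alpha/s2_eps."""
    n = T + 1
    r = s2_alpha / s2_eps
    k = r / (1.0 + r * (T + (1.0 + gamma) / (1.0 - gamma)))
    d = [1.0 + gamma] + [1.0] * T
    inv = [[-k * d[i] * d[j] for j in range(n)] for i in range(n)]
    inv[0][0] += 1.0 - gamma ** 2
    for i in range(1, n):
        inv[i][i] += 1.0
    return [[v / s2_eps for v in row] for row in inv]


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    result = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n):
                m[r][j] -= f * m[col][j]
    return result


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


T, s2_eps, s2_alpha, gamma = 4, 0.5, 0.8, 0.6   # illustrative values
om = omega_4a(T, s2_eps, s2_alpha, gamma)
print(det(om), omega_4a_det(T, s2_eps, s2_alpha, gamma))
```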

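For Case 4.b/ ($w_{i0}$ random with common mean $\mu_w$ and arbitrary variance $\sigma_{w0}^2$): same as 4.a/, but with $\Omega_{T+1}$ replaced by

$$V_{T+1} = \sigma_\varepsilon^2\begin{pmatrix} \sigma_{w0}^2/\sigma_\varepsilon^2 & 0_T' \\ 0_T & I_T \end{pmatrix} + \sigma_\alpha^2\begin{pmatrix} \frac{1}{1-\gamma} \\ e_T \end{pmatrix}\left(\frac{1}{1-\gamma},\; e_T'\right).$$

For Case 4.c/ ($w_{i0}$ random with mean $\theta_{i0}$ and variance $\sigma_\varepsilon^2/(1-\gamma^2)$): same as 4.a/, but with $\mu_w$ replaced by $\theta_{i0}$.

For Case 4.d/ ($w_{i0}$ random with mean $\theta_{i0}$ and arbitrary variance $\sigma_{w0}^2$): same as $L_{2b}$ but with $\mu_{y0}$, $\phi$ and $\sigma_{y0}^2$ replaced by $\theta_{i0}$, $(1-\gamma)\sigma_\eta^2/(\sigma_\eta^2 + \sigma_{w0}^2)$ and $\sigma_\eta^2 + \sigma_{w0}^2$ respectively.

Consistency of the MLE (Maximum Likelihood Estimator) depends on the way $N$ and $T$ tend to infinity, in each case.

Crucial problem with MLE: the estimator is not consistent when, for large $N$ and fixed $T$, the choice of initial conditions is mistaken, because the likelihood function is different.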

A4.2 Specification tests

Useful for checking maintained assumptions on initial conditions. Based on Likelihood Ratio (LR) statistics.

Case 1 ($y_{i0}$ fixed). Test for the random-effects specification, i.e. for the structure of the variance-covariance matrix $V_T$. Let $L_1^0$ denote the estimated log-likelihood $L_1$ under assumption $H_0$: $V_T = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 e_Te_T'$, and $L_1$ the estimated log-likelihood with unrestricted $V_T$ ($T(T+1)/2$ components). Under $H_0$, $2(L_1 - L_1^0)$ is distributed as a $\chi^2(T(T+1)/2 - 2)$.

Case 4.a ($w_{i0}$ random with common mean $\mu_w$ and variance $\sigma_\varepsilon^2/(1-\gamma^2)$). $H_0$: matrix $\Omega_{T+1}$ as defined in the likelihood for Case 4.a, vs. the alternative: unrestricted variance-covariance matrix with $(T+1)(T+2)/2$ components, with log-likelihoods $L_{4a}^0$ and $L_{4a}$ respectively. Under $H_0$, $2(L_{4a} - L_{4a}^0)$ is distributed as a $\chi^2((T+1)(T+2)/2 - 2)$ (note: only two free parameters in the restricted $\Omega_{T+1}$, as $\gamma$ is already estimated).

Case 4.b ($w_{i0}$ random with common mean $\mu_w$ and arbitrary variance $\sigma_{w0}^2$). Let $L_{4b}^0$ denote the log-likelihood under the restriction on $V_{T+1}$ for Case 4.b, and $L_{4a}$ the unrestricted log-likelihood for Case 4.a (as above). Under $H_0$: the true model is 4.b, $2(L_{4a} - L_{4b}^0)$ admits a $\chi^2((T+1)(T+2)/2 - 3)$ distribution (3 free parameters in Case 4.b: $\sigma_\varepsilon^2$, $\sigma_\alpha^2$, $\sigma_{w0}^2$).

Test for stationarity: Case 4.a vs. Case 4.b. Under $H_0$: stationarity (Case 4.a), $2(L_{4b}^0 - L_{4a}^0)$ is distributed as a $\chi^2(1)$.
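The degrees of freedom above are counts of free covariance terms minus restricted parameters; a small Python sketch (hypothetical helper names) makes the bookkeeping explicit for a given $T$. Computing the p-value itself needs a $\chi^2$ survival function (e.g. GAUSS `cdfchic`), which is not reproduced here.

```python
def lr_statistic(loglik_unrestricted, loglik_restricted):
    """LR = 2 (L - L0), to be compared with a chi-square quantile."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def lr_degrees_of_freedom(T):
    """Degrees of freedom of the chi-square limits in A4.2 for T periods."""
    return {
        "case1_vs_unrestricted_VT": T * (T + 1) // 2 - 2,
        "case4a_vs_unrestricted_VT1": (T + 1) * (T + 2) // 2 - 2,
        "case4b_vs_unrestricted_VT1": (T + 1) * (T + 2) // 2 - 3,
        "stationarity_4a_vs_4b": 1,
    }


# With T = 6 periods, the time dimension of the data used in Appendix 7
print(lr_degrees_of_freedom(6))
```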

Appendix 5. GMM estimation of static panel models

In the Instrumental-Variable context (Hausman-Taylor, Amemiya-MaCurdy, Breusch-Mizon-Schmidt), we assumed:

• Error-component model with $E(uu') = \Omega = \sigma_\varepsilon^2 I_{NT} + \sigma_\alpha^2(I_N \otimes e_Te_T')$.

• Endogeneity was caused by $E(X'\alpha) \neq 0$ or $E(Z'\alpha) \neq 0$, but it was assumed that $E(X'\varepsilon) = E(Z'\varepsilon) = 0$.

With GMM, we can consider different exogeneity assumptions related to $\alpha$ or $\varepsilon$, producing different orthogonality conditions.

Several cases:
1. Random or fixed effects (whether instruments are correlated with $\alpha$);
2. Strictly or weakly exogenous instruments (correlation with $\varepsilon$).

A.5.1 Computation of the variance-covariance matrix

For the panel data case, we can use the fact that several time observations are available for each unit.

If heteroskedasticity is of the form $E(u_{it}^2) = \sigma_i^2$, such that $E(u_{it}u_{is}) = 0$ for $t \neq s$, we have

$$V_N = N\,\mathrm{Var}\,\bar f(x, \theta) = N\,\mathrm{Var}\left(\frac{1}{N}Z'u\right) = \frac{N}{N^2}E[Z'uu'Z] = \frac{1}{N}Z'\left[\mathrm{diag}\{\sigma_i^2\} \otimes I_T\right]Z,$$

where $\sigma_i^2$ can be estimated by $\hat\sigma_i^2 = \frac{1}{T}\sum_{t=1}^T \hat u_{it}^2$. Hence, an optimal second-step estimate for $V_N$ would be

$$\hat V_N = \frac{1}{N}\sum_{i=1}^N Z_i'\hat H_iZ_i, \quad \text{where} \quad \hat H_i = \hat\sigma_i^2 I_T.$$
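A minimal Python sketch of this second-step estimate $\hat V_N = \frac{1}{N}\sum_i \hat\sigma_i^2 Z_i'Z_i$, assuming first-step residuals $\hat u_{it}$ are available; the function name and the list-of-rows layout are hypothetical.

```python
def vn_hat(Z, u):
    """V_N_hat = (1/N) sum_i sigma2_i Z_i'Z_i, with sigma2_i = (1/T) sum_t u_it^2.

    Z[i] is unit i's T x q instrument matrix (a list of T rows of length q),
    u[i] its T first-step residuals.
    """
    N = len(Z)
    q = len(Z[0][0])
    V = [[0.0] * q for _ in range(q)]
    for Zi, ui in zip(Z, u):
        T = len(ui)
        sigma2_i = sum(e * e for e in ui) / T
        for a in range(q):
            for b in range(q):
                V[a][b] += sigma2_i * sum(Zi[t][a] * Zi[t][b] for t in range(T))
    return [[v / N for v in row] for row in V]


# N = 2 units, T = 2 periods, q = 1 instrument (made-up numbers)
print(vn_hat([[[1.0], [1.0]], [[2.0], [0.0]]], [[1.0, -1.0], [2.0, 0.0]]))
```

The inverse of this matrix is then the weighting matrix of the second GMM step.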

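Important aspect for panel data: if we transform the model to remove individual effects $\alpha_i$, the optimal weighting matrix $A_N = V_N^{-1}$ depends on this transformation.

Consider a linear fixed-effect model with $q$ orthogonality conditions:

$$E[W_i'u_i^*] = 0, \quad \text{where} \quad u_i^* = Q_Tu_i,$$

and $W_i$ is a $T \times q$ matrix of instruments. Because $Q_T$ is a $T \times T$ symmetric matrix, the conditions above can be rewritten $E[(Q_TW_i)'u_i] = 0$, and the optimal weighting matrix $A_N$ is $V_N^{-1}$ with

$$V_N = N\,E\left[\left(\frac{W'u^*}{N}\right)\left(\frac{u^{*\prime}W}{N}\right)\right] = N\,E\left[(QW/N)'uu'(QW/N)\right] = \frac{1}{N}\left[(QW)'\Omega(QW)\right].$$

Hence, for GMM, it is equivalent to transform the model (by $Q$) or the instrument matrix.

Assume now the error-component assumption holds; we have

$$V_N = \frac{1}{N}\left[(QW)'\left[\sigma_\varepsilon^2 I_{NT} + T\sigma_\alpha^2 B\right](QW)\right] = \frac{1}{N}\left[(QW)'\left[\sigma_\varepsilon^2 I_{NT}\right](QW)\right],$$

because $Q$ and $B$ are orthogonal; therefore

$$V_N = \frac{\sigma_\varepsilon^2}{N}(W'Q)(QW) \propto \frac{1}{N}W'QW.$$

Replacing in the GMM criterion:

$$\hat\theta_N = \arg\min_\theta \left(\frac{u^*(\theta)'W}{N}\right)\left(\frac{W'QW}{N}\right)^{-1}\left(\frac{W'u^*(\theta)}{N}\right),$$

and the optimal GMM estimator is

$$\hat\theta_N = \left[X^{*\prime}W(W'QW)^{-1}W'X^*\right]^{-1}X^{*\prime}W(W'QW)^{-1}W'Y^*, \quad \text{where} \quad X^* = QX, \; Y^* = QY.$$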


A.5.2 Random effects and strictly exogenous instruments

By definition, random effects means $E(X'\alpha) = 0$, and we assume strict exogeneity (instruments uncorrelated with $\varepsilon$ at every time period). For a $1 \times q$ instrument vector $w_{it}$, we have

$$E(w_{is}'u_{it}) = 0 \quad \text{for} \quad s, t = 1, 2, \dots, T,$$

which gives $qT^2$ moment conditions. Let $w_{it}^0 \equiv (w_{i1}, w_{i2}, \dots, w_{it})$ $\forall t = 1, 2, \dots, T$, and set $W_{SE,i} = I_T \otimes w_{iT}^0$. Moment conditions then read $E(W_{SE,i}'u_i) = 0$.

We can show, using the Theorem above, that GMM estimators using the 2SLS form or the 3SLS form are equivalent. With $\Sigma = E(u_iu_i')$ the $T \times T$ covariance matrix of $u_i$, we have

$$\Sigma^{-1/2}W_{SE,i} = \Sigma^{-1/2}(I_T \otimes w_{iT}^0) = \Sigma^{-1/2} \otimes w_{iT}^0 = (I_T \otimes w_{iT}^0)(\Sigma^{-1/2} \otimes I_{qT}) = W_{SE,i}B,$$

where $B = \Sigma^{-1/2} \otimes I_{qT}$.

Hence Theorem 4 applies.

A.5.3 Fixed effects and strictly exogenous instruments

We assume now instruments are correlated with $\alpha$, but still strictly exogenous. To remove $\alpha_i$, we can use the first-difference operator $L_T$ of dimension $T \times (T-1)$:

$$L_T'y_i = L_T'x_i\beta + L_T'u_i, \quad \text{where} \quad L_T'u_i = L_T'\varepsilon_i,$$

where

$$L_T = \begin{pmatrix} -1 & 0 & \cdots & 0 \\ 1 & -1 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & -1 \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

Note that the Within operator is related to $L_T$: $Q_T = L_T(L_T'L_T)^{-1}L_T'$.

If instruments $w_{it}$ are strictly exogenous, we have

$$E(Z_{SE,i}'L_T'u_i) = E(Z_{SE,i}'L_T'\varepsilon_i) = 0, \quad \text{where} \quad Z_{SE,i} = I_{T-1} \otimes w_{iT}^0.$$

The model in first-difference form can be estimated by GMM using $Z_{SE,i}$ as instruments.
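The identity $Q_T = L_T(L_T'L_T)^{-1}L_T'$ can be checked exactly with rational arithmetic. This Python sketch (helper names are ours) builds $L_T$ for $T = 4$, verifies the identity against $I_T - e_Te_T'/T$, and confirms that $L_T'$ takes first differences and annihilates $e_T$, which is why $\alpha_i$ drops out.

```python
from fractions import Fraction


def first_difference_operator(T):
    """T x (T-1) matrix L_T: column j has -1 in row j and +1 in row j+1,
    so that L_T' y = (y_2 - y_1, ..., y_T - y_{T-1})'."""
    L = [[Fraction(0)] * (T - 1) for _ in range(T)]
    for j in range(T - 1):
        L[j][j] = Fraction(-1)
        L[j + 1][j] = Fraction(1)
    return L


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(a)
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


T = 4
L = first_difference_operator(T)
Lt = transpose(L)
Q_from_L = matmul(matmul(L, inverse(matmul(Lt, L))), Lt)     # L (L'L)^{-1} L'
Q_within = [[Fraction(int(i == j)) - Fraction(1, T) for j in range(T)] for i in range(T)]
print(Q_from_L == Q_within)
```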

A.5.4 Weakly exogenous instruments

In this case, we consider a $1 \times q$ vector of instruments $w_{it}$ such that

$$E(w_{it}'u_{is}) = 0 \quad \text{for} \quad t = 1, 2, \dots, T, \; t \le s.$$

There are $qT(T+1)/2$ such conditions: instruments are not correlated with current or future values of $\varepsilon_{it}$ (and are not correlated with $\alpha_i$). On the other hand, if instruments are weakly exogenous but are correlated with $\alpha_i$, we have a smaller number of conditions, which can be written

$$E(w_{it}'\Delta u_{is}) = E(w_{it}'\Delta\varepsilon_{is}) = 0 \quad \text{for} \quad t = 1, 2, \dots, T-1, \; t < s,$$

where $\Delta u_{is} = u_{is} - u_{i,s-1}$.

A convenient way of transforming the above model is to use the Forward-Filter (Keane and Runkle 1992). Let $F$ be a $T \times T$ upper-triangular matrix that satisfies $F\Sigma F' = I_T$, so that $\mathrm{Cov}(Fu_i) = I_T$. We have $F = \{F_{ij}\}$, $F_{ij} = 0$ for $i > j$. Using instruments $W_i = (w_{i1}', w_{i2}', \dots, w_{iT}')'$, we have the following Forward-Filter estimator:

$$\hat\beta_{FF} = \left[X'F^{*\prime}H(H'H)^{-1}H'F^*X\right]^{-1}X'F^{*\prime}H(H'H)^{-1}H'F^*Y,$$

where $F^* = I_N \otimes F$ and $H = (H_1', \dots, H_N')'$ stacks the instrument matrices, $H_i = W_i$.

Difference between standard IV and FF: IV with instruments $W_i$ and filter $\Sigma^{-1/2}$ is not consistent unless the $H_i$ are strictly exogenous. But the FF transformation preserves the weak exogeneity of instruments $w_{it}$ (since $F$ is upper triangular, the $t$-th element of $Fu_i$ only involves $u_{is}$ for $s \ge t$).

When $N$ is large and $T$ is small, the FF is not necessarily more efficient than GMM or 3SLS with the same instruments $W_i$.

If we don't have conditional homoskedasticity:

$$\mathrm{plim}\,\frac{1}{N}\sum_{i=1}^N H_i'Fu_iu_i'F'H_i \neq \mathrm{plim}\,\frac{1}{N}\sum_{i=1}^N H_i'F\Sigma F'H_i = \mathrm{plim}\,\frac{1}{N}\sum_{i=1}^N H_i'H_i.$$
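The notes do not say how $F$ is computed. One convenient construction (our choice, shown as a Python sketch with hypothetical names) is a Cholesky factorization in reversed time order: writing $\Sigma = RR'$ with $R$ upper triangular, $F = R^{-1}$ is upper triangular and satisfies $F\Sigma F' = I_T$. The example uses an error-component $\Sigma$ with illustrative variances.

```python
import math


def cholesky_lower(a):
    """Cholesky factor L (lower triangular) with a = L L'."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_filter(sigma):
    """Upper-triangular F such that F sigma F' = I.

    Reverse the time ordering, take the Cholesky factor of the reversed matrix,
    and reverse back: sigma = R R' with R upper triangular. Then F = R^{-1} is
    upper triangular, and row t of F u only involves u_s for s >= t.
    """
    n = len(sigma)
    rev = [[sigma[n - 1 - i][n - 1 - j] for j in range(n)] for i in range(n)]
    L_rev = cholesky_lower(rev)
    R = [[L_rev[n - 1 - i][n - 1 - j] for j in range(n)] for i in range(n)]
    F = [[0.0] * n for _ in range(n)]
    for col in range(n):                      # solve R F = I by back substitution
        for i in range(n - 1, -1, -1):
            rhs = (1.0 if i == col else 0.0) - sum(R[i][k] * F[k][col] for k in range(i + 1, n))
            F[i][col] = rhs / R[i][i]
    return F


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Error-component Sigma = s2_eps I_T + s2_alpha e_T e_T' (illustrative values)
T, s2_eps, s2_alpha = 4, 1.0, 0.5
sigma = [[s2_alpha + (s2_eps if i == j else 0.0) for j in range(T)] for i in range(T)]
F = forward_filter(sigma)
FT = [list(r) for r in zip(*F)]
check = matmul(matmul(F, sigma), FT)
print(check)
```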

A.5.5 Efficient GMM estimation

We now present alternative GMM estimators that may be more efficient than IV-HT, IV-AM or IV-BMS. Why: under the strict exogeneity assumption, we have many more moment conditions than HT, AM or BMS.

We first consider the case where we restrict $\Sigma = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 e_Te_T'$ (as in HT, AM and BMS). We will then examine the case of an unrestricted $\Sigma$ matrix.

Consider the model with strictly exogenous regressors

$$y_i = R_i\beta + (e_T \otimes z_i)\gamma + u_i \equiv X_i\delta + u_i,$$

where $u_i = (e_T \otimes \alpha_i) + \varepsilon_i$, $R_i = (r_{i1}', r_{i2}', \dots, r_{iT}')'$ (a $T \times k$ matrix of time-varying regressors), and $e_T \otimes z_i = [z_i', z_i', \dots, z_i']'$ (a $T \times g$ matrix of time-invariant regressors).

Assume regressors $r_{it}$ and $z_i$ are strictly exogenous wrt. $\varepsilon_{it}$:

$$E(d_i \otimes \varepsilon_i) = 0, \quad \text{where} \quad d_i = (r_{i1}, r_{i2}, \dots, r_{iT}, z_i),$$

but some may be correlated with $\alpha_i$:

$$r_{it} = (r_{1it}, r_{2it}), \quad z_i = (z_{1i}, z_{2i}), \quad E(r_{1it}'\alpha_i) = E(z_{1i}'\alpha_i) = 0.$$

HT, AM and BMS instruments are of the form $W_{A,i} = (Q_TR_i, e_T \otimes s_i)$, with

$$s_{HT,i} = (\bar r_{1i}, z_{1i}), \quad s_{AM,i} = (r_{1i1}, r_{1i2}, \dots, r_{1iT}, z_{1i}), \quad s_{BMS,i} = (s_{AM,i}, \tilde r_{2i}),$$

where $\tilde r_{2i} = (r_{2i1} - \bar r_{2i}, \dots, r_{2i,T-1} - \bar r_{2i})$.

If the no conditional heteroskedasticity condition holds, $E(W_i'u_iu_i'W_i) = E(W_i'\Sigma W_i)$, then GMM is as efficient as IV using the same instruments $W_i$. But GMM is more efficient if this condition is violated and an unrestricted weighting matrix is used.

The strict exogeneity assumption implies (since $L_T'e_T = 0$)

$$E[(L_T \otimes d_i)'u_i] = E(L_T'u_i \otimes d_i') = E[L_T'(e_T\alpha_i + \varepsilon_i) \otimes d_i'] = E(L_T'\varepsilon_i \otimes d_i') = 0,$$

where $L_T \otimes d_i$ is a $T \times [(T-1)(kT+g)]$ matrix. Arellano and Bover (1995) propose a GMM estimator using instruments

$$W_{B,i} = (L_T \otimes d_i, e_T \otimes s_i) \quad \text{instead of} \quad W_{A,i} = (Q_TR_i, e_T \otimes s_i).$$

Number of additional instruments wrt. IV-HT, IV-AM or IV-BMS: $\mathrm{rank}(W_{B,i}) - \mathrm{rank}(W_{A,i}) = (T-1)(kT+g) - k$. Other advantage: the variance-covariance matrix $\Sigma$ is unrestricted.
A.5.6 GMM with unrestricted variance-covariance matrix

We assume instruments $W_{B,i}$ satisfy the no conditional heteroskedasticity assumption, but the variance-covariance matrix of $u$ is unrestricted.

Result of Im, Ahn, Schmidt and Wooldridge (1996): the 3SLS form of the GMM estimator with unrestricted $\Sigma$ using instruments $\Sigma^{-1}W_{A,i}$ is numerically equivalent to the 3SLS estimator when all instruments $W_{B,i}$ are used, in the BMS case. This is not true for HT or AM instrument matrices. We have

$$E(R_i'Q_T\Sigma^{-1}u_i) = E(R_i'Q_T\Sigma^{-1}e_T\alpha_i) + E(R_i'Q_T\Sigma^{-1}\varepsilon_i) = E(R_i'Q_T\Sigma^{-1}e_T\alpha_i).$$

But when the BMS assumption is not true and with an unrestricted $\Sigma$, $E(R_i'Q_T\Sigma^{-1}e_T\alpha_i) \neq 0$. These authors propose another transformation matrix instead of $Q_T$ for removing $\alpha_i$:

$$Q_\Sigma = \Sigma^{-1} - \Sigma^{-1}e_T(e_T'\Sigma^{-1}e_T)^{-1}e_T'\Sigma^{-1},$$

and we can show that $Q_\Sigma e_T = 0$. Therefore

$$R_i'Q_\Sigma e_T\alpha_i = 0 \quad \text{and} \quad E(R_i'Q_\Sigma u_i) = E(R_i'Q_\Sigma\varepsilon_i) = 0,$$

because the $R_i$ are assumed strictly exogenous wrt. $\varepsilon_i$.

The optimal choice of instruments would be

$$W_{C,i} = (Q_\Sigma R_i,\; \Sigma^{-1}e_T \otimes s_i),$$

for $s_i$: HT, AM or BMS. This modified 3SLS estimator is an efficient GMM estimator when $\Sigma$ is unrestricted but the no conditional heteroskedasticity condition is valid.
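The property $Q_\Sigma e_T = 0$ can be verified exactly with rational arithmetic. In this Python sketch (helper names are ours; the matrices are illustrative), the first $\Sigma$ is an arbitrary positive definite matrix; the second is an error-component matrix with $\sigma_\varepsilon^2 = 1$, $\sigma_\alpha^2 = 2$, for which one can check by hand that $Q_\Sigma$ reduces to the Within operator $Q_T/\sigma_\varepsilon^2$.

```python
from fractions import Fraction


def inverse(a):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(a)
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def q_sigma(sigma):
    """Q_Sigma = Sigma^{-1} - Sigma^{-1} e (e' Sigma^{-1} e)^{-1} e' Sigma^{-1}
    (uses the symmetry of Sigma, so that e' Sigma^{-1} = (Sigma^{-1} e)')."""
    n = len(sigma)
    S = inverse([[Fraction(v) for v in row] for row in sigma])
    s_e = [sum(S[i]) for i in range(n)]        # Sigma^{-1} e_T
    e_s_e = sum(s_e)                            # e_T' Sigma^{-1} e_T (a scalar)
    return [[S[i][j] - s_e[i] * s_e[j] / e_s_e for j in range(n)] for i in range(n)]


Q = q_sigma([[4, 1, 0], [1, 3, 1], [0, 1, 2]])      # unrestricted Sigma
print([sum(row) for row in Q])                      # Q_Sigma e_T
Q_ec = q_sigma([[3, 2, 2], [2, 3, 2], [2, 2, 3]])   # s2_eps = 1, s2_alpha = 2
```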

A.5.7 GMM vs. IV estimators

Main difference between GMM and Instrumental-Variable estimators: the variance-covariance matrix of error terms need not be specified a priori for GMM, whereas it must be for IV.

In the GMM case, we find parameters by solving the system of moment conditions or by finding

$$\hat\theta_{GMM} = \arg\min_\theta u'(\theta)ZV_N^{-1}Z'u(\theta),$$

where $Z$ are instruments and $V_N$ is an estimate of the variance of the moment conditions: $V = E(Z'uu'Z)$. In a linear model, $u(\theta) = Y - X\beta$ where $\theta \equiv \beta$, and we can solve directly for $\hat\beta_N$:

$$\hat\beta_{GMM} = \left(X'ZV_N^{-1}Z'X\right)^{-1}X'ZV_N^{-1}Z'Y.$$
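The closed-form linear GMM estimator can be sketched in plain Python (helper names are ours). On the toy data below, $y = 1 + 2x$ holds exactly, so any weighting matrix recovers $(1, 2)$; the example uses $V = Z'Z$, which gives 2SLS. Replacing $V$ by $\frac{1}{N}\sum_i Z_i'\hat u_i\hat u_i'Z_i$ from first-step residuals would give the two-step efficient estimator (the factor $1/N$ cancels in the formula).

```python
def transpose(a):
    return [list(r) for r in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (floating point)."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def gmm_linear(X, Z, y, V):
    """beta_GMM = (X'Z V^{-1} Z'X)^{-1} X'Z V^{-1} Z'y for u(beta) = y - X beta."""
    XZ = matmul(transpose(X), Z)                          # X'Z
    XZ_Vinv = matmul(XZ, inverse(V))                      # X'Z V^{-1}
    A = matmul(XZ_Vinv, transpose(XZ))                    # X'Z V^{-1} Z'X
    b = matmul(XZ_Vinv, matmul(transpose(Z), [[v] for v in y]))
    return [row[0] for row in matmul(inverse(A), b)]


# Toy data: y = 1 + 2 x exactly; instruments (1, x, x^2); V = Z'Z gives 2SLS
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x] for x in xs]
Z = [[1.0, x, x * x] for x in xs]
y = [1.0 + 2.0 * x for x in xs]
beta = gmm_linear(X, Z, y, matmul(transpose(Z), Z))
print(beta)
```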

In the IV case, we restrict $u$ to be a) homoskedastic ($V$ is diagonal), or b) heteroskedastic of known form. Example: panel data

$$\Omega = E(uu') = I_N \otimes \Sigma, \quad \text{where} \quad \Omega = \sigma_\varepsilon^2 I_{NT} + \sigma_\alpha^2(I_N \otimes e_Te_T') \quad \text{and} \quad \Sigma = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 e_Te_T'.$$

Consider two IV estimators for panel data: 2SLS or 3SLS.

In the 2SLS case (HT, AM, BMS), we premultiply the model in vector form $y_i = X_i\beta + u_i$ by $\Sigma^{-1/2}$ and then apply instruments $Z_i$:

$$\hat\beta_{2SLS} = \left[X'\Omega^{-1/2}Z(Z'Z)^{-1}Z'\Omega^{-1/2}X\right]^{-1}X'\Omega^{-1/2}Z(Z'Z)^{-1}Z'\Omega^{-1/2}Y.$$

An equivalent 2SLS estimator obtains by using $\Sigma^{-1/2}Z_i$ as instruments:

$$\hat\beta_{2SLS}^{(2)} = \left[X'\Omega^{-1}Z(Z'\Omega^{-1}Z)^{-1}Z'\Omega^{-1}X\right]^{-1}X'\Omega^{-1}Z(Z'\Omega^{-1}Z)^{-1}Z'\Omega^{-1}Y.$$

In the 3SLS case, we have

$$\hat\beta_{3SLS} = \left[X'Z(Z'\Omega Z)^{-1}Z'X\right]^{-1}X'Z(Z'\Omega Z)^{-1}Z'Y.$$

GMM and 3SLS are equivalent if the following condition holds:

$$E(Z_i'u_iu_i'Z_i) = E(Z_i'\Sigma Z_i) \quad \forall i = 1, 2, \dots, N,$$

because, as $N \to \infty$,

$$\mathrm{plim}\,\frac{1}{N}Z'\Omega Z = \mathrm{plim}\,\frac{1}{N}\sum_{i=1}^N Z_i'\hat u_i\hat u_i'Z_i = E(Z_i'u_iu_i'Z_i) = V.$$

This condition is denoted No conditional heteroskedasticity. When the condition $E(Z_i'u_iu_i'Z_i) = E(Z_i'\Sigma Z_i)$ does not hold, GMM is strictly more efficient than 3SLS.

It is impossible to prove that 3SLS is more or less efficient than 2SLS, but there exists a condition for numerical equivalence between 2SLS and 3SLS:

Theorem 8. The 2SLS and 3SLS estimators are equivalent if there exists a non-singular, non-stochastic matrix $B$ such that $\Omega^{-1/2}Z = ZB$.

This Theorem can be applied for IV or GMM procedures: $\Omega$ is estimated from the first-stage $\hat\theta_N$ for GMM. It states that under this condition, filtering (premultiplying instruments by $\Omega^{-1/2}$) does not change the efficiency of GMM or IV estimators.


Appendix 6. A framework for simulation-based inference

A.6.1 Heterogeneity and the linear property

In linear panel-data models, the residual consists of a heterogeneity factor $\alpha_i$ and an i.i.d. error term $\varepsilon_{it}$:

$$u_{it} = \alpha_i + \varepsilon_{it}.$$

OLS (or, equivalently, ML) yields consistent but not efficient estimates if unobserved heterogeneity is omitted (provided $\alpha_i$ is uncorrelated with the regressors).

In nonlinear models, omitting heterogeneity often leads to significant biases. Another problem: it is difficult to compute the likelihood of nonlinear models because observations for a given individual are dependent ($y_{it}$ is not i.i.d.).

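A.6.1.1 Example: Dynamic model with heterogeneous AR(1) root and no individual effect

$$y_{it} = \rho_iy_{i,t-1} + \varepsilon_{it} = (\rho + \eta_i)y_{i,t-1} + \varepsilon_{it},$$

where $|\rho_i| < 1$, $\rho_i$ is independent from $\varepsilon_{it}$, and $\varepsilon_{it}$ is $N(0, \sigma_\varepsilon^2)$. The nonlinear feature of the model comes from

$$y_{it} = \varepsilon_{it} + \rho_i\varepsilon_{i,t-1} + \rho_i^2\varepsilon_{i,t-2} + \cdots + \rho_i^h\varepsilon_{i,t-h} + \cdots.$$

If the restricted model $y_{it} = \rho y_{i,t-1} + \varepsilon_{it}$ is estimated while the data are generated by the heterogeneous process above, the OLS estimate of $\rho$ is

$$\hat\rho \approx \frac{1}{N}\sum_{i=1}^N\rho_i + \frac{\mathrm{Cov}_i\left(\rho_i, \mathrm{Var}(y_{i,t-1})\right)}{\frac{1}{N}\sum_i\mathrm{Var}(y_{i,t-1})} = \frac{1}{N}\sum_{i=1}^N\rho_i + \frac{\mathrm{Cov}_i\left(\rho_i, \sigma_\varepsilon^2/(1-\rho_i^2)\right)}{\frac{1}{N}\sum_i\sigma_\varepsilon^2/(1-\rho_i^2)}.$$

If all $\rho_i > 0$, $\mathrm{Cov}(\rho_i, \sigma_\varepsilon^2/(1-\rho_i^2)) > 0$ and $\hat\rho$ overestimates the average of the true $\rho_i$'s (the bias is positive).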
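The bias formula can be illustrated numerically. As $T \to \infty$, pooled OLS converges to a variance-weighted average of the $\rho_i$, with weights $\sigma_\varepsilon^2/(1-\rho_i^2)$; the Python sketch below (hypothetical function names, made-up roots) computes it both as a weighted average and in the covariance form of the text, and shows that it exceeds the simple mean.

```python
def pooled_ols_limit(rhos, sigma2_eps=1.0):
    """Probability limit (T -> infinity) of pooled OLS of y_it on y_i,t-1 when
    the true AR(1) roots rho_i differ across units: a weighted average of the
    rho_i with weights Var(y_i,t-1) = sigma2_eps / (1 - rho_i^2)."""
    weights = [sigma2_eps / (1.0 - r * r) for r in rhos]
    return sum(r * w for r, w in zip(rhos, weights)) / sum(weights)


def covariance_form(rhos, sigma2_eps=1.0):
    """Same limit written as in the text: mean(rho) + Cov(rho, w) / mean(w)."""
    n = len(rhos)
    w = [sigma2_eps / (1.0 - r * r) for r in rhos]
    mean_r, mean_w = sum(rhos) / n, sum(w) / n
    cov = sum((r - mean_r) * (wi - mean_w) for r, wi in zip(rhos, w)) / n
    return mean_r + cov / mean_w


rhos = [0.2, 0.5, 0.8]          # hypothetical unit-specific roots, mean 0.5
print(pooled_ols_limit(rhos), covariance_form(rhos))
```

Note that $\sigma_\varepsilon^2$ cancels: the bias only depends on how the $\rho_i$ are spread out.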

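A.6.1.2 Example: Duration model with heterogeneity

$$y_{it} \sim \lambda_i\exp(-\lambda_iy_{it}) = (\lambda + \eta_i)\exp[-(\lambda + \eta_i)y_{it}],$$

where $\eta_i$ are i.i.d. heterogeneity factors with $E(\eta_i) = 0$. This is the exponential duration model. If the model is misspecified as

$$y_{it} \sim \lambda\exp(-\lambda y_{it}),$$

the Maximum Likelihood estimate of $\lambda$ is

$$\hat\lambda = \left[\frac{\sum_{i=1}^N\sum_{t=1}^T y_{it}}{NT}\right]^{-1}.$$

We have

$$\hat\lambda \xrightarrow[T \to \infty]{} \left[\frac{1}{N}\sum_{i=1}^N\frac{1}{\lambda_i}\right]^{-1} < \frac{1}{N}\sum_{i=1}^N\lambda_i,$$

since a harmonic mean is smaller than the arithmetic mean unless all $\lambda_i$ are equal. Hence, the MLE of the misspecified model underestimates the average of individual parameters $\lambda_i$.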
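A short Python simulation (hypothetical function names, made-up rates) makes the point concrete: the pooled exponential MLE $1/\bar y$ settles near the harmonic mean of the $\lambda_i$, below their arithmetic mean.

```python
import random


def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)


def pooled_exponential_mle(lambdas, T, rng):
    """MLE of a common rate lambda when in fact y_it ~ Exp(lambda_i): 1 / mean(y)."""
    total, count = 0.0, 0
    for lam in lambdas:
        for _ in range(T):
            total += rng.expovariate(lam)     # mean 1 / lam
            count += 1
    return count / total


lambdas = [0.5, 1.0, 1.5]                     # hypothetical rates, arithmetic mean 1.0
est = pooled_exponential_mle(lambdas, 20000, random.Random(2004))
h = harmonic_mean(lambdas)
print(est, h)
```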

In many cases, it is not possible to filter out the individual effect without very restrictive assumptions (e.g., Fixed-effect Logit, Butler-Moffitt Probit, ...). Another possibility is to integrate out the heterogeneity factor.

Basic idea: specify a density distribution for $\alpha_i$ and compute the conditional likelihood.

Assume the density function of $y_{it}$ conditional on $x_{it}$ and $\alpha_i$ is

$$\tilde f(y_{it}; x_{it}\beta + \alpha_i) \quad \text{with} \quad \alpha_i \sim \pi(\alpha; \theta),$$

where $\theta$ is a distributional parameter, and $\beta$ the vector of parameters of interest.

The distribution of $y_{it}$ conditional on observed variables is

$$f(y_{it}|x_{it}; \beta, \theta) = \int\tilde f(y_{it}; x_{it}\beta + \alpha)\,\pi(\alpha; \theta)\,d\alpha.$$

In many cases, this cannot be solved analytically. Additional parameters to estimate: $\theta$.

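A.6.1.3 Example: Poisson model

Assume

$$\tilde f(y_{it}; x_{it}\beta + \alpha_i) = \frac{\exp(x_{it}\beta + \alpha_i)^{y_{it}}}{y_{it}!}\exp[-\exp(x_{it}\beta + \alpha_i)].$$

Change of variable: $\nu_i = \exp(\alpha_i)$, with probability distribution

$$\pi(\nu; \theta) = \frac{\nu^{1/\theta - 1}\exp(-\nu/\theta)}{\theta^{1/\theta}\,\Gamma(1/\theta)},$$

where $\Gamma(\cdot)$ is the Gamma function, and $\theta > 0$ (a Gamma distribution with mean 1 and variance $\theta$). Then it can be shown that

$$f(y_{it}|x_{it}; \beta, \theta) = \frac{\Gamma(1/\theta + y_{it})\,[\theta\exp(x_{it}\beta)]^{y_{it}}}{\Gamma(1/\theta)\,\Gamma(y_{it} + 1)\,[1 + \theta\exp(x_{it}\beta)]^{y_{it} + 1/\theta}}.$$

This is the negative binomial distribution.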
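The "it can be shown" step can be checked numerically: integrating the Poisson probability against the Gamma density $\pi(\nu; \theta)$ should reproduce the negative binomial formula. A Python sketch, with $\mu = \exp(x_{it}\beta)$ and illustrative parameter values (function names are ours); the midpoint rule is used for the integral.

```python
import math


def negbin_pmf(y, mu, theta):
    """Closed form above, with mu = exp(x_it beta):
    Gamma(1/theta + y) (theta mu)^y / [Gamma(1/theta) Gamma(y+1) (1 + theta mu)^(y + 1/theta)]."""
    a = 1.0 / theta
    return math.exp(math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
                    + y * math.log(theta * mu) - (y + a) * math.log1p(theta * mu))


def poisson_gamma_by_quadrature(y, mu, theta, n_grid=50000, upper=60.0):
    """Integrate Poisson(y; nu * mu) * pi(nu; theta) over nu in (0, upper), midpoint rule.
    pi is the Gamma density with shape 1/theta and scale theta."""
    a = 1.0 / theta
    h = upper / n_grid
    # constant part of log Poisson + log Gamma density, moved out of the loop
    const = y * math.log(mu) - math.lgamma(y + 1) - a * math.log(theta) - math.lgamma(a)
    total = 0.0
    for k in range(n_grid):
        nu = (k + 0.5) * h
        total += math.exp(const + (y + a - 1.0) * math.log(nu) - nu * (mu + 1.0 / theta))
    return total * h


mu, theta = 1.5, 0.5            # hypothetical values of exp(x_it beta) and theta
for y in range(4):
    print(y, negbin_pmf(y, mu, theta), poisson_gamma_by_quadrature(y, mu, theta))
```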

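A.6.1.4 Example: the Probit model again

Probit with heterogeneity:

$$\mathrm{Prob}[y_{it} = 1|x_{it}, \alpha_i] = \Phi[x_{it}\beta + \alpha_i].$$

Assume $\alpha_i \sim N(0, \sigma_\alpha^2)$:

$$\mathrm{Prob}[y_{it} = 1|x_{it}] = \int\Phi(x_{it}\beta + \alpha)\frac{1}{\sigma_\alpha}\phi\left(\frac{\alpha}{\sigma_\alpha}\right)d\alpha,$$

where $\phi(\cdot)$ is the density function of $N(0, 1)$.

Since observations are dependent:

$$\mathrm{Prob}[y_{i1} = 1, \dots, y_{iT} = 1] = \int\prod_{t=1}^T\Phi(x_{it}\beta + \alpha)\frac{1}{\sigma_\alpha}\phi\left(\frac{\alpha}{\sigma_\alpha}\right)d\alpha \neq \prod_{t=1}^T\mathrm{Prob}[y_{it} = 1].$$

In this case, numerical integration is feasible. In more complex cases, one can use simulation techniques to approximate integrals of the form

$$M(y_{it}|x_{it}; \beta, \theta) = \int m(y_{it}; x_{it}\beta + \alpha)\,\pi(\alpha; \theta)\,d\alpha.$$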
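A Python sketch of the numerical integration (function names and the values of $x_{it}\beta$ are hypothetical). After the change of variable $z = \alpha/\sigma_\alpha$, the integral is computed with the midpoint rule. Each marginal probability can be checked against the known closed form $\Phi(x_{it}\beta/\sqrt{1+\sigma_\alpha^2})$, and the joint probability is compared with the product of the marginals.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate_alpha(g, sigma_alpha, n_grid=4000, width=8.0):
    """Midpoint rule for int g(alpha) (1/sigma_alpha) phi(alpha/sigma_alpha) d alpha,
    after the change of variable z = alpha / sigma_alpha, on z in [-width, width]."""
    h = 2.0 * width / n_grid
    total = 0.0
    for k in range(n_grid):
        z = -width + (k + 0.5) * h
        total += g(sigma_alpha * z) * norm_pdf(z)
    return total * h


xb = [0.3, -0.2, 0.5]           # hypothetical values of x_it beta, t = 1, 2, 3
sigma_alpha = 1.2
marginals = [integrate_alpha(lambda a, v=v: norm_cdf(v + a), sigma_alpha) for v in xb]
joint = integrate_alpha(lambda a: math.prod(norm_cdf(v + a) for v in xb), sigma_alpha)
print(joint, math.prod(marginals))   # the joint probability exceeds the product of marginals
```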

A.6.2 Integration by simulation

Purpose: approximate multiple integrals using Monte Carlo (simulation) techniques.

We can write

$$M(y_{it}|x_{it}; \beta, \theta) = \int m(y_{it}; x_{it}\beta + \alpha)\frac{\pi(\alpha; \theta)}{\pi^0(\alpha; \theta^0)}\pi^0(\alpha; \theta^0)\,d\alpha,$$

where $\pi^0(\cdot; \theta^0)$ is a known distribution density with fixed parameters $\theta^0$. We have for individual $i$ at time $t$:

$$M(y_{it}|x_{it}; \beta, \theta) = E_{\pi^0}\left[m(y_{it}; x_{it}\beta + \alpha)\frac{\pi(\alpha; \theta)}{\pi^0(\alpha; \theta^0)}\right],$$

which is the expectation of $m(\cdot)\pi(\cdot)/\pi^0(\cdot)$ under the distribution $\pi^0$. The density function $\pi^0$ is the importance sampling function.

If we can find $S$ random variables for individual $i$, $\alpha_i^s$, $s = 1, \dots, S$, drawn from distribution $\pi^0$, we can approximate the above expectation by

$$\frac{1}{S}\sum_{s=1}^S m(y_{it}; x_{it}\beta + \alpha_i^s)\frac{\pi(\alpha_i^s; \theta)}{\pi^0(\alpha_i^s; \theta^0)}.$$

Under (mild) regularity assumptions, the simulated expression converges to the above expectation, using a weak Law of Large Numbers. Two issues in practice:

• Choice of the density function $\pi^0(\alpha; \theta^0)$;
• Number of draws needed to obtain consistency?

For the choice of the importance sampling function, make sure the domain of $\pi^0$ contains the domain of $\pi$ (to capture rare events in the tails of the distribution). Regarding the number of draws, consistency of the estimator depends on the estimation procedure.
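An importance-sampling sketch in Python for the Probit integral of A.6.1.4, with target $\pi = N(0, \sigma_\alpha^2)$ and a wider importance function $\pi^0 = N(0, 2^2)$, following the advice about covering the tails. The function names, the proposal standard deviation, the seed and the number of draws are all illustrative choices; the result is compared with the known closed form.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_density(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def importance_sampling(m, target_sd, proposal_sd, n_draws, rng):
    """(1/S) sum_s m(alpha^s) pi(alpha^s) / pi0(alpha^s), with alpha^s drawn from
    the importance function pi0 = N(0, proposal_sd^2); pi = N(0, target_sd^2)."""
    total = 0.0
    for _ in range(n_draws):
        a = rng.gauss(0.0, proposal_sd)
        total += m(a) * normal_density(a, target_sd) / normal_density(a, proposal_sd)
    return total / n_draws


xb, sigma_alpha = 0.3, 1.2       # hypothetical x_it beta and heterogeneity s.d.
estimate = importance_sampling(lambda a: norm_cdf(xb + a), sigma_alpha, 2.0,
                               200000, random.Random(123))
exact = norm_cdf(xb / math.sqrt(1.0 + sigma_alpha ** 2))
print(estimate, exact)
```

A proposal narrower than the target would make the weights $\pi/\pi^0$ explode in the tails, which is the practical reason for the domain condition above.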

A.6.3 Simulated GMM and Maximum Likelihood estimators

Gouriéroux and Monfort (J. of Econometrics, 1993): Simulated GMM (SGMM) and Simulated Maximum Likelihood (SML).

For SGMM, when population moments are impossible to compute, we replace

$$E[f(y_{it}, x_{it}, \alpha_i; \beta)] = 0 \quad \text{by} \quad \frac{1}{S}\sum_{s=1}^S f(y_{it}, x_{it}, \alpha_i^s; \beta) \approx 0,$$

or by

$$\frac{1}{S}\sum_{s=1}^S f(y_{it}, x_{it}, \alpha_i^s; \beta)\frac{\pi(\alpha_i^s; \theta)}{\pi^0(\alpha_i^s; \theta^0)} \approx 0.$$

The SGMM criterion is then

$$M_{GMM}^s = \left\{\sum_{i=1}^N\left[\frac{1}{S}\sum_{s=1}^S f(y_i, x_i, \alpha_i^s; \beta)\right]'Z_i\right\}V_N^{-1}\left\{\sum_{i=1}^N Z_i'\left[\frac{1}{S}\sum_{s=1}^S f(y_i, x_i, \alpha_i^s; \beta)\right]\right\},$$

where $Z_i$ is a $T \times L$ matrix of instruments and $V_N^{-1}$ the weighting matrix. The SGMM is consistent and asymptotically normal when $N$ tends to infinity and $S$ is fixed. This is because we can use the weak Law of Large Numbers for consistency of the simulator $\frac{1}{S}\sum_s f^s$ towards $Ef$, and a Central Limit Theorem for asymptotic normality, see below.

For the SML estimator, we want to compute

$$\log L(\beta) = \sum_{i=1}^N\log f(y_i|x_i; \beta),$$

where heterogeneity is already integrated out in the density function $f(y_i|x_i; \beta)$. But what if this integration is not possible analytically?

Suppose we find a simulator $\tilde f(y_i, x_i, \alpha; \beta)$, where $\alpha$ is drawn from a known distribution, such that

$$E_\alpha\tilde f(y_i, x_i, \alpha; \beta) = f(y_i|x_i; \beta).$$

Then $f(y_i|x_i; \beta)$ can be approximated by

$$\frac{1}{S}\sum_{s=1}^S\tilde f(y_i, x_i, \alpha_i^s; \beta),$$

where $S$ simulations are used for each $i$: $\alpha_i^s$, $s = 1, 2, \dots, S$, and the Simulated Log-likelihood is

$$L_S(\beta) = \frac{1}{N}\sum_{i=1}^N\log\left[\frac{1}{S}\sum_{s=1}^S\tilde f(y_i, x_i, \alpha_i^s; \beta)\right].$$

The Simulated Maximum Likelihood estimator is consistent when $N/S \to 0$. In practice, a very large number of simulated draws may be necessary.
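A Python sketch of the simulated log-likelihood $L_S$ for the Poisson model of A.6.1.3, using as simulator the Poisson likelihood evaluated at Gamma draws $\nu_i^s$ (fresh draws for every unit). Function names, the seed and the number of draws are hypothetical, and $\mu_{it} = \exp(x_{it}\beta)$ is taken as given rather than maximized over. With a single period, the exact log-likelihood is the negative binomial one, which serves as a benchmark.

```python
import math
import random


def poisson_pmf(y, lam):
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def simulated_loglik(y_units, mu_units, theta, n_sim, rng):
    """L_S = (1/N) sum_i log[(1/S) sum_s prod_t Poisson(y_it; nu_i^s mu_it)],
    with nu_i^s ~ Gamma(shape 1/theta, scale theta) drawn separately for each unit."""
    a = 1.0 / theta
    total = 0.0
    for y_i, mu_i in zip(y_units, mu_units):
        acc = 0.0
        for _ in range(n_sim):
            nu = rng.gammavariate(a, theta)            # mean 1, variance theta
            acc += math.prod(poisson_pmf(y, nu * m) for y, m in zip(y_i, mu_i))
        total += math.log(acc / n_sim)
    return total / len(y_units)


def negbin_loglik_one_period(y, mu, theta):
    """Exact log f for T = 1 (negative binomial), used as a benchmark."""
    a = 1.0 / theta
    return (math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
            + y * math.log(theta * mu) - (y + a) * math.log1p(theta * mu))


sim_ll = simulated_loglik([[3]], [[1.5]], 0.5, 100000, random.Random(7))
exact_ll = negbin_loglik_one_period(3, 1.5, 0.5)
print(sim_ll, exact_ll)
```

Because the logarithm is applied to a simulated average, $L_S$ is biased for fixed $S$, which is why SML requires the number of draws to grow with $N$.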

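A.6.4 Choice of simulation number and mode

We use the Gouriéroux and Monfort (1993) result. The SGMM and SML criteria are of the form

$$G_N(\beta) = \frac{1}{N}\sum_{i=1}^N\Psi\left(y_i, x_i, E[\psi(y_i, x_i, \alpha; \beta)]\right),$$

and we assume that, when $N \to \infty$, $G_N(\beta)$ converges uniformly in $\beta$ to

$$G(\beta) = E\left[\Psi\left(y_i, x_i, E[\psi(y_i, x_i, \alpha; \beta)]\right)\right].$$

Two different simulated criteria can be used for $G_N(\beta)$, depending on whether identical ($G_N^I(\beta)$) or different ($G_N^D(\beta)$) sets of simulation draws are used for each individual:

$$G_N^I(\beta) = \frac{1}{N}\sum_{i=1}^N\Psi\left(y_i, x_i, \frac{1}{S}\sum_{s=1}^S\psi(y_i, x_i, \alpha^s; \beta)\right),$$

$$G_N^D(\beta) = \frac{1}{N}\sum_{i=1}^N\Psi\left(y_i, x_i, \frac{1}{S}\sum_{s=1}^S\psi(y_i, x_i, \alpha_i^s; \beta)\right).$$

• Case 1. $S$ is fixed and $N \to \infty$.

$G_N^I(\beta)$ converges to the random variable (it is a function of $(\alpha^1, \dots, \alpha^S)$)

$$E\left[\Psi\left(y_i, x_i, \frac{1}{S}\sum_{s=1}^S\psi(y_i, x_i, \alpha^s; \beta)\right)\right],$$

which is different from $G(\beta)$. Therefore $\hat\beta^I$ that maximizes (SML) or minimizes (SGMM) $G_N^I(\beta)$ is inconsistent.

$G_N^D(\beta)$ converges to the non-random scalar

$$E\,E\left[\Psi\left(y_i, x_i, \frac{1}{S}\sum_{s=1}^S\psi(y_i, x_i, \alpha^s; \beta)\right)\right],$$

which is in general different from $G(\beta)$. But if the function $\Psi$ is linear wrt. $E(\cdot)$, $G_N^D(\beta)$ converges to $G(\beta)$ and $\hat\beta^D$ is consistent.

• Case 2. $S$ and $N \to \infty$. Both $\hat\beta^I$ and $\hat\beta^D$ are consistent.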
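Why a nonlinear $\Psi$ creates a problem for fixed $S$ can be seen in a toy Python experiment (our construction, not from the notes): take $\psi(\alpha) = \exp(\alpha)$ with $\alpha \sim N(0,1)$, so that $\log E\psi = 0.5$, and $\Psi = \log$ as in SML. The average of $\log\left[\frac{1}{S}\sum_s \psi(\alpha^s)\right]$ over many replications stays away from 0.5 for small $S$ and approaches it as $S$ grows.

```python
import math
import random


def mean_log_simulated_average(n_sim, n_rep, rng):
    """Average over n_rep replications of log[(1/S) sum_s exp(alpha^s)], alpha^s ~ N(0,1).

    Here psi(alpha) = exp(alpha), so E[psi] = exp(1/2), and Psi = log is nonlinear:
    the target log E[psi] = 0.5 is reached only as S grows."""
    acc = 0.0
    for _ in range(n_rep):
        avg = sum(math.exp(rng.gauss(0.0, 1.0)) for _ in range(n_sim)) / n_sim
        acc += math.log(avg)
    return acc / n_rep


rng = random.Random(1993)
results = {S: mean_log_simulated_average(S, 2000, rng) for S in (1, 10, 100)}
print(results)
```

With a linear $\Psi$ (for instance the simulated moments entering SGMM), the simulated average would instead be unbiased for every $S$.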

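A.6.5 Examples: Probit and Tobit models

$$y_{it}^* = x_{it}\beta + \sigma_\alpha\alpha_i + \sigma_\varepsilon\varepsilon_{it},$$

$$y_{it} = 1 \text{ if } y_{it}^* > 0, \quad y_{it} = 0 \text{ if } y_{it}^* \le 0 \quad \text{(Probit)};$$

and

$$y_{it}^* = x_{it}\beta + \sigma_\alpha\alpha_i + \sigma_\varepsilon\varepsilon_{it},$$

$$y_{it} = y_{it}^* \text{ if } y_{it}^* > 0, \quad y_{it} = 0 \text{ if } y_{it}^* \le 0 \quad \text{(Tobit)},$$

where $\alpha_i \sim N(0, 1)$, $\varepsilon_{it} \sim N(0, 1)$.

Because $\alpha_i$ is present for each component in $y_i = (y_{i1}, \dots, y_{iT})'$, the $y_{it}$ are serially correlated and the likelihood would contain $T$-fold integrals. But we can consider the conditional likelihood functions of $y_i$ given $x_i$ and $\alpha_i$:

$$f(y_i|x_i, \alpha_i; \beta) = \prod_{y_{it}=1}\Phi\left(\frac{x_{it}\beta + \sigma_\alpha\alpha_i}{\sigma_\varepsilon}\right)\prod_{y_{it}=0}\Phi\left(-\frac{x_{it}\beta + \sigma_\alpha\alpha_i}{\sigma_\varepsilon}\right)$$

for the Probit, and

$$f(y_i|x_i, \alpha_i; \beta) = \prod_{y_{it}>0}\frac{1}{\sigma_\varepsilon}\phi\left(\frac{y_{it} - x_{it}\beta - \sigma_\alpha\alpha_i}{\sigma_\varepsilon}\right)\prod_{y_{it}=0}\Phi\left(-\frac{x_{it}\beta + \sigma_\alpha\alpha_i}{\sigma_\varepsilon}\right)$$

for the Tobit. These conditional likelihoods can be directly used as simulators.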
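A Python sketch of these two conditional likelihoods used as simulators for one individual, averaging over draws $\alpha^s \sim N(0,1)$; since the integral is one-dimensional here, a midpoint-rule value is computed as a benchmark. All function names, parameter values and data are hypothetical.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def probit_cond_lik(y_i, xb_i, alpha, s_alpha, s_eps):
    """Probit f(y_i | x_i, alpha): product of Phi(+/-(x_it b + s_alpha alpha) / s_eps)."""
    out = 1.0
    for y, xb in zip(y_i, xb_i):
        index = (xb + s_alpha * alpha) / s_eps
        out *= norm_cdf(index) if y == 1 else norm_cdf(-index)
    return out


def tobit_cond_lik(y_i, xb_i, alpha, s_alpha, s_eps):
    """Tobit f(y_i | x_i, alpha): density for y_it > 0, probability mass at zero otherwise."""
    out = 1.0
    for y, xb in zip(y_i, xb_i):
        mean = xb + s_alpha * alpha
        out *= norm_pdf((y - mean) / s_eps) / s_eps if y > 0 else norm_cdf(-mean / s_eps)
    return out


def simulated_lik(cond_lik, y_i, xb_i, s_alpha, s_eps, n_sim, rng):
    """Simulator (1/S) sum_s f(y_i | x_i, alpha^s) with alpha^s ~ N(0, 1)."""
    return sum(cond_lik(y_i, xb_i, rng.gauss(0.0, 1.0), s_alpha, s_eps)
               for _ in range(n_sim)) / n_sim


def quadrature_lik(cond_lik, y_i, xb_i, s_alpha, s_eps, n_grid=4000, width=8.0):
    """Midpoint-rule benchmark for the same one-dimensional integral over alpha."""
    h = 2.0 * width / n_grid
    grid = [-width + (k + 0.5) * h for k in range(n_grid)]
    return h * sum(cond_lik(y_i, xb_i, a, s_alpha, s_eps) * norm_pdf(a) for a in grid)


xb_i = [0.3, -0.2, 0.5]                     # hypothetical x_it beta for one individual
rng = random.Random(42)
probit_sim = simulated_lik(probit_cond_lik, [1, 0, 1], xb_i, 0.8, 1.0, 50000, rng)
probit_quad = quadrature_lik(probit_cond_lik, [1, 0, 1], xb_i, 0.8, 1.0)
tobit_sim = simulated_lik(tobit_cond_lik, [1.2, 0.0, 0.4], xb_i, 0.8, 1.0, 50000, rng)
tobit_quad = quadrature_lik(tobit_cond_lik, [1.2, 0.0, 0.4], xb_i, 0.8, 1.0)
print(probit_sim, probit_quad, tobit_sim, tobit_quad)
```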

Appendix 7. Example: the SAS© Software

* ;
* DYNTAB.SAS ;
* ;
* Uses datafile DYNTAB3.DAT;
* ;
* Create library and file names ;
* Change directory information below ;

libname water 'd:/dea/panel';
filename watfile 'd:/dea/panel/dyntab3.dat';
* Create SAS table and read data from Ascii file ;
data wat;
infile watfile;
input id year conso price revenue precip ;
* Compute logs ;
lconso=log(conso); lprice=log(price);
lrevenue=log(revenue);
run;
* Descriptive statistics ;
proc means data=wat;run;
* OLS regression ;
proc reg data=wat;
model lconso = lprice lrevenue;
run;
* Model 1: One-way Fixed effects ;
* cs=116: Set the number of cross-sections ;
* option /fixone: Set one-way Fixed-effect ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /fixone ;
run;
* Model 2: Two-way Fixed effects ;
* option /fixtwo: Set two-way Fixed-effect ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /fixtwo ;
run;
* Model 3: One-way Random effects ;
* option /ranone: Set one-way Random-effect ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /ranone;
run;
* Model 4: Two-way Random effects ;
* option /rantwo: Set Two-way Random-effect ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /rantwo;
run;
* Model 5: One-way Random effects with AR(1) ;
* option /ranone parks rho: Set One-way Random-effect ;
* and compute RHO: AR(1) parameter ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /ranone parks rho;
run;
* Compute parameter estimates on each cross section ;
proc sort data=wat;
by year;
proc reg data=wat;
model lconso= lprice lrevenue ;
by year;
run;
* Compute Within and Between estimates ;
* using the MEANS procedure ;
proc sort data=wat;
by id;
proc means data=wat noprint;
var lconso lprice lrevenue ;
by id;
output out=out1 mean=mconso mprice mrevenue ;
data out1;set out1;
keep id mconso mprice mrevenue ;
data wat;
merge wat out1;
by id;
data wat;set wat;
qconso=lconso-mconso; qprice=lprice-mprice;
qrevenue=lrevenue-mrevenue;
* Within regression ;
proc reg data=wat;
model qconso = qprice qrevenue ;
run;
* Between regression ;
proc reg data=wat;
model mconso = mprice mrevenue;
run;
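The Within and Between steps of the SAS program (unit means, deviations from means, then OLS) can be mirrored in a few lines of Python for a single regressor. This is only a sketch on invented toy data, with a hypothetical function name; like the SAS code, the Between regression is run on one row per observation, with unit means repeated $T_i$ times.

```python
def within_between(ids, y, x):
    """Single-regressor Within and Between slopes, following the SAS steps:
    unit means (PROC MEANS ... BY id), deviations q = x - mean, then OLS."""
    sums = {}
    for i, yi, xi in zip(ids, y, x):
        sy, sx, n = sums.get(i, (0.0, 0.0, 0))
        sums[i] = (sy + yi, sx + xi, n + 1)
    means = {i: (sy / n, sx / n) for i, (sy, sx, n) in sums.items()}

    # Within: OLS on deviations (the intercept is zero since deviations average to zero)
    qy = [yi - means[i][0] for i, yi in zip(ids, y)]
    qx = [xi - means[i][1] for i, xi in zip(ids, x)]
    beta_within = sum(a * b for a, b in zip(qx, qy)) / sum(a * a for a in qx)

    # Between: OLS with intercept of the unit means of y on the unit means of x
    my = [means[i][0] for i in ids]
    mx = [means[i][1] for i in ids]
    mx_bar, my_bar = sum(mx) / len(mx), sum(my) / len(my)
    beta_between = (sum((a - mx_bar) * (b - my_bar) for a, b in zip(mx, my))
                    / sum((a - mx_bar) ** 2 for a in mx))
    return beta_within, beta_between


# Toy panel: y_it = alpha_i + 2 x_it, with unit effects correlated with mean x
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
alpha = {1: 0.0, 2: 5.0, 3: -1.0}
y = [alpha[i] + 2.0 * xi for i, xi in zip(ids, x)]
print(within_between(ids, y, x))
```

The Within slope recovers the coefficient 2, while the Between slope is contaminated by the correlation between the unit effects and the unit means of $x$.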

ESTIMATES USING TSCSREG PROCEDURE

MODEL 1. ONE-WAY FIXED EFFECTS

The SAS System 16:15 Monday, January 22, 2001 3

TSCSREG Procedure
Dependent Variable: LCONSO

Model Description
Estimation Method            FIXONE
Number of Cross Sections     116
Time Series Length           6

Model Variance
SSE    2.578099    DFE        578
MSE    0.00446     Root MSE   0.066786
RSQ    0.9344

F Test for No Fixed Effects
Numerator DF:    115    F value:  58.3964
Denominator DF:  578    Prob.>F:  0.0000

Parameter Estimates
                 Parameter    Standard    T for H0:                  Variable
Variable   DF    Estimate     Error       Parameter=0    Prob > |T|  Label
CS 1       1     -0.455773    0.039463    -11.549433     0.0001      Cross Sec
CS 2       1     -0.222476    0.039923     -5.572620     0.0001      Cross Sec
CS 3       1      0.153338    0.038900      3.941882     0.0001      Cross Sec
CS 4       1     -0.131488    0.039174     -3.356518     0.0008      Cross Sec
CS 5       1      0.027422    0.038890      0.705132     0.4810      Cross Sec
...        ...    ...          ...           ...           ...
CS 112     1      0.420843    0.040309     10.440506     0.0001      Cross Sec
CS 113     1     -0.322888    0.039376     -8.200102     0.0001      Cross Sec
CS 114     1     -0.259767    0.038678     -6.716134     0.0001      Cross Sec
CS 115     1     -0.240823    0.039379     -6.115479     0.0001      Cross Sec
INTERCEP   1      5.099257    0.366957     13.896065     0.0001      Intercept
LPRICE     1     -0.134245    0.018447     -7.277506     0.0001
LREVENUE   1      0.024386    0.033223      0.734009     0.4632

MODEL 2. TWO-WAY FIXED EFFECTS

The SAS System 16:15 Monday, January 22, 2001 7

TSCSREG Procedure
Dependent Variable: LCONSO

Model Description
Estimation Method            FIXTWO
Number of Cross Sections     116
Time Series Length           6

Model Variance
SSE    2.205671    DFE        573
MSE    0.003849    Root MSE   0.062043
RSQ    0.9439

F Test for No Fixed Effects
Numerator DF:    120    F value:  65.6530
Denominator DF:  573    Prob.>F:  0.0000

Parameter Estimates
                 Parameter    Standard    T for H0:                  Variable
Variable   DF    Estimate     Error       Parameter=0    Prob > |T|  Label
CS 1       1     -0.535192    0.040793    -13.119702     0.0001      Cross Sec
CS 2       1     -0.302435    0.041809     -7.233670     0.0001      Cross Sec
CS 3       1      0.120803    0.037066      3.259125     0.0012      Cross Sec
...        ...    ...          ...           ...           ...
CS 114     1     -0.288486    0.036463     -7.911820     0.0001      Cross Sec
CS 115     1     -0.256215    0.036669     -6.987209     0.0001      Cross Sec
TS 1       1     -0.102087    0.017883     -5.708681     0.0001      Time Seri
TS 2       1     -0.047565    0.016463     -2.889216     0.0040      Time Seri
TS 3       1     -0.030524    0.014486     -2.107135     0.0355      Time Seri
TS 4       1     -0.007359    0.012507     -0.588378     0.5565      Time Seri
TS 5       1     -0.025528    0.009992     -2.554900     0.0109      Time Seri
INTERCEP   1      6.316873    0.396540     15.929983     0.0001      Intercept
LPRICE     1     -0.251061    0.034210     -7.338896     0.0001
LREVENUE   1     -0.053316    0.033244     -1.603773     0.1093

MODEL 3. ONE-WAY RANDOM EFFECTS

The SAS System 16:15 Monday, January 22, 2001 11

TSCSREG Procedure
Dependent Variable: LCONSO

Model Description
Estimation Method            RANONE
Number of Cross Sections     116
Time Series Length           6

Variance Component Estimates
SSE    3.12498     DFE        693
MSE    0.004509    Root MSE   0.067152
RSQ    0.1087
Variance Component for Cross Sections    0.043243
Variance Component for Error             0.004460

Hausman Test for Random Effects
Degrees of Freedom: 2
m value: 14.4912    Prob. > m: 0.0007

Parameter Estimates
                 Parameter    Standard    T for H0:                  Variable
Variable   DF    Estimate     Error       Parameter=0    Prob > |T|  Label
INTERCEP   1      4.692305    0.354917     13.220844     0.0001      Intercept
LPRICE     1     -0.149074    0.017611     -8.465039     0.0001
LREVENUE   1      0.053077    0.032306      1.642977     0.1008

MODEL 4. TWO-WAY RANDOM EFFECTS

The SAS System 16:15 Monday, January 22, 2001 12

TSCSREG Procedure
Dependent Variable: LCONSO

Model Description
Estimation Method            RANTWO
Number of Cross Sections     116
Time Series Length           6

Variance Component Estimates
SSE    2.707154    DFE        693
MSE    0.003906    Root MSE   0.062501
RSQ    0.0907
Variance Component for Cross Sections    0.043638
Variance Component for Time Series       0.000746
Variance Component for Error             0.003849

Hausman Test for Random Effects
Degrees of Freedom: 2
m value: 22.2377    Prob. > m: 0.0000

Parameter Estimates
                 Parameter    Standard    T for H0:                  Variable
Variable   DF    Estimate     Error       Parameter=0    Prob > |T|  Label
INTERCEP   1      5.674742    0.371984     15.255323     0.0001      Intercept
LPRICE     1     -0.225151    0.027604     -8.156464     0.0001
LREVENUE   1     -0.018251    0.032401     -0.563297     0.5734

WITHIN REGRESSION USING PROC REG

Analysis of Variance
                     Sum of        Mean
Source      DF       Squares       Square      F Value    Prob>F
Model         2      0.31252       0.15626     42.003     0.0001
Error       693      2.57810       0.00372
C Total     695      2.89062

Root MSE     0.06099         R-square    0.1081
Dep Mean    -0.00000         Adj R-sq    0.1055
C.V.        -1.291786E17

Parameter Estimates
                 Parameter       Standard      T for H0:
Variable   DF    Estimate        Error         Parameter=0    Prob > |T|
INTERCEP   1     -5.28092E-17    0.00231195    -0.000         1.0000
QPRICE     1     -0.134245       0.01684666    -7.969         0.0001
QREVENUE   1      0.024386       0.03034107     0.804         0.4218
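Comparing this output with Model 1, the PROC REG Within coefficients are identical to the FIXONE ones (-0.134245 and 0.024386), but the standard errors are smaller: PROC REG uses DFE = 693 (696 observations minus 3 parameters), whereas the one-way fixed-effects model has DFE = 578 (696 - 116 - 2). The Python sketch below (hypothetical function name) rescales the PROC REG standard errors by $\sqrt{693/578}$, which reproduces the FIXONE values.

```python
import math


def within_se_from_demeaned_ols(se_reg, n_obs, n_units, n_regressors):
    """PROC REG on demeaned data uses DFE = NT - (K + 1) (slopes plus an intercept);
    the Within (one-way fixed effects) estimator has DFE = NT - N - K.
    Rescale the PROC REG standard error to the Within degrees of freedom."""
    dfe_reg = n_obs - (n_regressors + 1)
    dfe_within = n_obs - n_units - n_regressors
    return se_reg * math.sqrt(dfe_reg / dfe_within)


# Values read off the outputs above: NT = 696, N = 116, K = 2
for name, se in (("QPRICE", 0.01684666), ("QREVENUE", 0.03034107)):
    print(name, round(within_se_from_demeaned_ols(se, 696, 116, 2), 6))
# prints QPRICE 0.018447, then QREVENUE 0.033223 (the FIXONE standard errors)
```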

BETWEEN REGRESSION USING PROC REG

Analysis of Variance
                     Sum of        Mean
Source      DF       Squares       Square      F Value    Prob>F
Model         2      7.13103       3.56551     84.369     0.0001
Error       693     29.28684       0.04226
C Total     695     36.41786

Root MSE     0.20557         R-square    0.1958
Dep Mean     4.99481         Adj R-sq    0.1935
C.V.         4.11576

Parameter Estimates
                 Parameter       Standard      T for H0:
Variable   DF    Estimate        Error         Parameter=0    Prob > |T|
INTERCEP   1     -0.176444       0.68091356    -0.259         0.7956
MPRICE     1     -0.259461       0.02278084    -11.389        0.0001
MREVENUE   1      0.494483       0.05958703     8.298         0.0001

Appendix 8. A crash course in Gauss©

Introduction

Gauss is an interpreted computer language that is most conveniently run in interactive mode (global variables are kept in memory until one quits Gauss). It has a small built-in editor useful for long jobs, or it can be used in command mode.

Editing and running jobs

When Gauss is first executed, you are in command mode, with the following prompt: [Gauss]. You can toggle between the command mode and the edit mode using either tool bar (Windows bar at the bottom, Gauss bar on top). In command mode, you can edit any file (for example myprog.prg) by typing edit myprog.prg, or run this file by typing run myprog.prg. You may edit the preselected file by pressing the F4 function key. In edit mode, simply use the Run option on top, or press the F3 function key. You may save the program by pressing the F2 function key.

Saving results and output management

To declare a text file for output, use the syntax

output file=c:/mydir/toto.out reset;

The reset option clears the file if it exists! In a program, you can choose to have output written to the file or not (useful for inspecting results on the screen only): output on; (opens the output file and appends to it) or output off; (stops writing to the output file).

Loading data and creating Gauss datasets

You can either work with data files in text format (Ascii), or with preexisting Gauss datasets. To load a text-format data file:

load x[1000,5]=mydata.dat;

or

n=100;t=10;nvar=5;load x[n*t,nvar]=mydata.dat;

This will load a 1000 × 5 matrix denoted x in memory.¹

Some built-in procedures in Gauss require specific Gauss datasets. To create one, you must specify a) a data matrix (x); b) a vector of variable names (varnames); c) the Gauss dataset name ("mydata"). Then, use the command

call saved(x,"mydata",varnames);

Basic operators

In Gauss, most operators return a value that may be stored in a variable, or printed to screen. If no assignment command is given, the program will simply output the result to the screen. Example: 2*x; vs. y=2*x;

You don't have to specify the dimension of vectors or matrices if they are assigned a computed value. But you need to assign a prior value in two cases: a vector/matrix of parameters (that can be modified afterwards) or when using loops (see below). To create a vector with predetermined values: x={1 2 3}; (a 1 × 3 vector) or x={1,2,3}; (a 3 × 1 vector). And you can do the same for a vector of strings: vnames={"a","b","c"};

¹ Note: commands are always separated by semicolons (;).
Here is a list of useful operators:

cols(x)   Returns the number of columns of x;
rows(x)   Returns the number of rows of x;
meanc(x)   Returns the mean of each column of x;
stdc(x)   Returns the standard deviation of each column of x;
sqrt(x)   Computes the square root of the elements of x;
sumc(x)   Returns the sum of the elements in each column of x;
cumsumc(x)   Returns the cumulative sum of the elements in each column of x;
cdfn(x)   Returns the standard normal cumulative distribution function, $\Phi(x)$;
cdfchic(x,y)   Returns the complement to 1 of the $\chi^2$ cumulative distribution function with y degrees of freedom, evaluated at x, i.e. $1 - F_{\chi^2(y)}(x)$. Useful for computing p-values of $\chi^2$ tests.
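For readers who also work in Python, the column-wise behaviour of these operators can be mimicked in NumPy. The sketch below is only an illustrative translation, not GAUSS: `axis=0` plays the role of "by column", `ddof=1` assumes that stdc uses the N − 1 denominator, and the matrix values are arbitrary. Note that NumPy returns 1-D arrays where GAUSS returns column vectors.

```python
import math

import numpy as np

# Small example matrix (3 rows, 2 columns); values are arbitrary.
x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 9.0]])

n_rows, n_cols = x.shape            # rows(x), cols(x)
col_means = x.mean(axis=0)          # meanc(x): one mean per column
col_sds = x.std(axis=0, ddof=1)     # stdc(x): N-1 denominator (assumption)
col_sums = x.sum(axis=0)            # sumc(x)
col_cumsums = x.cumsum(axis=0)      # cumsumc(x): running sums down each column
roots = np.sqrt(x)                  # sqrt(x): element by element


def cdfn(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(col_means, col_sds, col_sums)
print(round(cdfn(1.96), 4))
```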

Working with matrices

x'   Transposes matrix or vector x;
y=x1~x2, y=x1|x2   Concatenates two vectors or matrices horizontally or vertically;
y=x[.,1]   Selects column 1 and all rows of matrix x;
y=x[1:10,.]   Selects rows 1 to 10 and all columns;
y=x[1:10,1:20]   Selects rows 1 to 10 and columns 1 to 20;
vec(x)   Creates a vector from a matrix by stacking all columns one after the other; vec(x) is NT × 1 if x is N × T;
diag(x)   Returns the main diagonal of matrix x (must be square);
reshape(x,n,t)   Reshapes matrix x into an N × T matrix (elements are read and filled row by row);
a*b*c   Performs matrix multiplication (check numbers of rows and columns!);
a.*b, a./b   Performs element-wise matrix multiplication or division (a and b must have the same dimension);
inv(x)   Computes the inverse of x (for a symmetric positive definite matrix, use invpd(x));
zeros(n,m)   Returns an n × m matrix of zeros;
ones(n,m)   Returns an n × m matrix of ones;
eye(n)   Returns an n × n identity matrix;
a.*.b   Computes the Kronecker product $a \otimes b$;
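The stacking convention matters for panel data: an NT × 1 vector is stored individual by individual, reshape fills row by row, and vec stacks columns. The NumPy sketch below (an illustrative translation with hypothetical variable names) checks these orderings and the Kronecker trick that repeats each individual mean T times.

```python
import numpy as np

# Panel stacked individual by individual: x_11, x_12, x_21, x_22, x_31, x_32.
n, t = 3, 2
x = np.arange(1.0, n * t + 1)

# reshape(x,n,t): GAUSS fills row by row, like NumPy's default (C) order,
# so row i holds individual i over time.
xm = x.reshape(n, t)

# vec stacks columns (Fortran order); vec(xm') gives back the
# individual-major stacking.
back = xm.T.flatten(order="F")

# vec(xm) without the transpose gives a time-major stacking instead.
time_major = xm.flatten(order="F")

# meanc(xm').*.ones(t,1): Kronecker product repeats each mean t times.
means = xm.mean(axis=1).reshape(n, 1)
repeated = np.kron(means, np.ones((t, 1)))
```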
Conditional operators and loops
Useful for testing and creating dummy variables. Operators: .eq, .ne, .lt, .le, .gt, .ge for equal to, not equal to, strictly less than, less than or equal to, strictly greater than, greater than or equal to.
Example: suppose you want to create an indicator variable equal to 1 when $x_i \le 50$, $i = 1, 2, \dots, N$. The syntax would be

y = x .le 50;

which creates an N × 1 vector y, with $y_i = 1$ if $x_i \le 50$ and 0 otherwise. That is, when a variable is assigned the result of a condition, Gauss automatically creates an indicator variable.

Example: you want to replace z by a new variable equal to x where z < 0 and equal to y where z > 0 (elements with z = 0 are set to 0). The syntax would be

z = x.*(z .lt 0) + y.*(z .gt 0);
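The same two examples can be sketched in NumPy for comparison (illustrative values; `ind` stands for the indicator called y above). Boolean comparisons play the role of the dot-relational operators and are converted to 0/1 when multiplied.

```python
import numpy as np

x = np.array([10.0, 50.0, 70.0, 30.0])
y = np.array([1.0, 2.0, 3.0, 4.0])
z = np.array([-1.0, 2.0, 0.0, -5.0])

# GAUSS: ind = x .le 50;  -> 1 where x_i <= 50, 0 otherwise
ind = (x <= 50).astype(float)

# GAUSS: z = x.*(z .lt 0) + y.*(z .gt 0);
# x where z < 0, y where z > 0, and 0 where z == 0
z_new = x * (z < 0) + y * (z > 0)
print(ind, z_new)
```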

Loops are not recommended because they are slow; vector operators should be preferred whenever possible. But in some cases, loops are necessary. Examples of loops are:

i=1; do until i>n;
y[i]=x[i]+a;
i=i+1;
endo;

or

i=1; do while i<=n;
y[i]=x[i]+a;
i=i+1;
endo;

Note: in the above examples, vector y must be defined beforehand, for instance with y=zeros(n,1);.

Working with data matrices

It is very easy with Gauss to sort data vectors or matrices, or to select a subset of observations.

y=sorthc(x,1)   Sorts matrix x using the variable in column 1 as key;
y=selif(x, x .eq 1)   Creates matrix y from the values of x equal to 1;
y=delif(x, x .lt 0)   Creates matrix y by deleting negative values from x;
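A NumPy sketch of the same three operations, assuming the selection is made on rows through a condition on one column (the data and the choice of key columns are hypothetical).

```python
import numpy as np

# Hypothetical data: 3 observations, 2 variables.
x = np.array([[3.0, 1.0],
              [1.0, -2.0],
              [2.0, 0.5]])

# sorthc(x,1): sort the rows using column 1 as the key.
sorted_x = x[np.argsort(x[:, 0], kind="stable")]

# selif: keep the rows whose second column equals 1.
kept = x[x[:, 1] == 1.0]

# delif: delete the rows whose second column is negative.
dropped = x[~(x[:, 1] < 0)]
```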

Creating procedures
Procedures are very useful to speed up repetitive tasks. The general syntax is

proc func(a);
local toto;
...
retp(toto);
endp;

This will use a as input (scalar, vector or matrix), create toto as a local variable (not accessible outside procedure func) and return a single argument toto. In some cases, it is necessary to have more than 1 input and 1 output; we can then use:

proc (3)=func(a1,a2,a3);
local toto1,toto2,toto3;
...
retp(toto1,toto2,toto3);
endp;

This code declares 3 inputs a1, a2, a3; proc (3)=func states that there will be 3 outputs. In that case, we must use the following syntax when calling this procedure:

{b1,b2,b3}=func(a1,a2,a3);

Beware of the use of local variables: any variable used in the procedure must be declared either as local (its value is lost when one quits the procedure) or elsewhere in the program (it is then a global variable). One way to avoid problems is to declare all variables as global at the start of the program, with the syntax:

clearg a,b,...,toto;

Example: procedure for returning deviations from individual means (Within operator), for an NT × 1 vector x stacked individual by individual (the procedure names within, within2 and betwith are only illustrative).

proc within(x);
local toto;
toto=reshape(x,n,t);
toto=toto-meanc(toto');
toto=reshape(toto,n*t,1);
retp(toto);
endp;

Note that in this case, variables n and t are global variables. If not, one could pass them as arguments in an equivalent, more compact procedure:

proc within2(x,n,t);
local toto;
toto=reshape(x,n,t);
retp(reshape(toto-meanc(toto'),n*t,1));
endp;

And if we wished to return both Between and Within:

proc (2)=betwith(x,n,t);
local toto;
/* repeat each individual mean t times (reshape alone would cycle the n means) */
toto=meanc(reshape(x,n,t)').*.ones(t,1);
retp(toto,x-toto);
endp;
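The Between/Within pair returned by the last procedure can be sketched in NumPy as follows (an illustrative translation; `between_within` is a hypothetical name). `np.repeat` does the job of `.*.ones(t,1)`.

```python
import numpy as np


def between_within(x, n, t):
    """Return (Bx, Qx) for an NT-vector stacked individual by individual.

    Bx repeats each individual mean t times; Qx = x - Bx holds the
    deviations from individual means (the Within transformation).
    """
    x = np.asarray(x, dtype=float)
    xm = x.reshape(n, t)                               # like reshape(x,n,t)
    means = xm.mean(axis=1, keepdims=True)             # like meanc(xm'), n x 1
    bx = np.repeat(means, t, axis=1).reshape(n * t)    # like means.*.ones(t,1)
    return bx, x - bx


bx, qx = between_within([1.0, 3.0, 10.0, 20.0], n=2, t=2)
print(bx, qx)
```

Returning a tuple mirrors `proc (2)`; the caller unpacks it just as `{b,w}=betwith(x,n,t);` would in GAUSS.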
Some useful built-in procedures

Some of these procedures require a Gauss dataset to be created. If not, a 0 is put in place of the Gauss dataset name.

call dstat(0,x);   Prints descriptive statistics for the columns of x;
call dstat("mydata",1|3);   Prints descriptive statistics for variables 1 and 3 in Gauss dataset "mydata";
call ols(0,y,x);   Runs an OLS regression of y on x;

To minimize a function, a useful procedure is optmum, which works as follows:

library optmum; optset;   Loads the library and resets default arguments;
x0={0.1 , 0.1 , 0.5};   Declares initial values of parameters;
{x,f,g,ret} = optmum(&func,x0);   Main command;

x returns the final value of parameters after convergence, f is the final function value, g is the gradient vector, and ret is a return code (equal to 0 if convergence is OK).

The optmum procedure calls a user-defined procedure (here, func) that returns the value of the function to be minimized, depending on parameters (here, z):

proc func(z);
...;
retp(crit);
endp;

Example: to estimate a nonlinear model by minimizing the residual sum of squares, where the model is

$$y_i = \beta_0 + \beta_1 \beta_2 x_i + \log(\beta_1)\, w_i + u_i,$$

use:

library optmum; optset;
x0={0.1 , 0.1 , 0.5};
{x,f,g,ret} = optmum(&func,x0);
proc func(z);
local err,crit;
err=y-z[1]-z[2]*z[3]*x-ln(z[2])*w;
crit=err'*err/rows(err);   /* computes (1/N) sum_i u_i^2 */
retp(crit);
endp;

Note: $\beta_0$ is z[1], $\beta_1$ is z[2], $\beta_2$ is z[3], and variables y, x, w must be global variables, while err (the residual) is local.

To compute gradients and Hessians numerically:

gg=gradp(&func,x); and hh=hessp(&func,x);
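To see what the criterion and a numerical gradient compute, here is a NumPy sketch of the same NLS objective together with a plain central-difference gradient. This is not gradp's actual algorithm, only a stand-in under that assumption; the data are generated without noise at hypothetical parameter values, so both the criterion and the gradient vanish there.

```python
import numpy as np


def crit(b, y, x, w):
    """(1/N) * sum of squared residuals for y = b0 + b1*b2*x + log(b1)*w + u."""
    u = y - b[0] - b[1] * b[2] * x - np.log(b[1]) * w
    return u @ u / len(y)


def num_grad(f, b, h=1e-6):
    """Central-difference gradient of the scalar function f at b."""
    b = np.asarray(b, dtype=float)
    g = np.zeros_like(b)
    for k in range(b.size):
        step = np.zeros_like(b)
        step[k] = h
        g[k] = (f(b + step) - f(b - step)) / (2.0 * h)
    return g


# Noise-free data generated at hypothetical parameter values (b1 > 0 for the log).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
w = rng.normal(size=50)
b_true = np.array([1.0, 2.0, 0.5])
y = b_true[0] + b_true[1] * b_true[2] * x + np.log(b_true[1]) * w


def f(b):
    return crit(b, y, x, w)


print(f(b_true), num_grad(f, b_true))   # both (numerically) zero at the minimum
```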

Appendix 9. Example: The Gauss© software


/* DYNTAB.PRG 16 01 2001 Residential water use */
new; clear all;
library tscs,pgraph;
tscsset;graphset;

output file=d:/dea/panel/dyntab.out reset;
output on;

n=116; t=6;
load x[n*t,6]=d:/dea/panel/dyntab3.dat;
id=x[.,1];
year=x[.,2];
conso=ln(x[.,3]);
price=ln(x[.,4]);
revenue=ln(x[.,5]);
precip=ln(x[.,6]);

vnames={"year","conso","price","revenue","precip","id"};
call saved(year~conso~price~revenue~precip~id,"watfile",vnames);

y={ conso };
x={ price,revenue };
grp={ id };
__title("Water demand equation");

call tscs("watfile",y,x,grp);

=====================================================================
TSCS Version 3.1.2                                   1/17/01 3:51 pm
=====================================================================
Data Set: watfile

OLS DUMMY VARIABLE RESULTS
Dependent variable: conso

Observations         : 696
Number of Groups     : 116
Degrees of freedom   : 578
Residual SS          : 2.578
Std error of est     : 0.067
Total SS (corrected) : 2.891
F = 35.033 with 2,578 degrees of freedom
P-value = 0.000

Var            Coef.   Std. Coef.   Std. Error       t-Stat   P-Value
price      -0.134245    -0.347461     0.018447    -7.277506     0.000
revenue     0.024386     0.035045     0.033223     0.734009     0.463

Group Number   Dummy Variable   Standard Error
1              4.643484         0.365639
2              4.876781         0.370063
3              5.252595         0.369474
...            ...              ...
114            4.839490         0.365496
115            4.858434         0.359065
116            5.099257         0.366957

F-statistic for equality of dummy variables:
F(115, 578) = 58.3964   P-value: 0.0000
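The dummy-variable (LSDV) slopes on price and revenue above are Within estimates: by the Frisch-Waugh-Lovell theorem, regressing y on X and N individual dummies gives the same slopes as OLS on deviations from individual means, with NT − N − K residual degrees of freedom (here 696 − 116 − 2 = 578). The NumPy sketch below checks this on simulated data; all values and names are hypothetical, and it is not the TSCS code.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, k = 5, 4, 2
alpha = rng.normal(size=n)                       # individual effects
X = rng.normal(size=(n * t, k))                  # rows stacked individual by individual
beta = np.array([-0.3, 0.1])                     # hypothetical slopes
y = np.repeat(alpha, t) + X @ beta + 0.1 * rng.normal(size=n * t)

# LSDV: OLS of y on X plus one dummy per individual (no common constant).
D = np.kron(np.eye(n), np.ones((t, 1)))          # NT x N matrix of dummies
coef_lsdv = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]


def within(a):
    """Deviations from individual means for an (NT,) or (NT, K) array."""
    a = a.reshape(n, t, -1)
    return (a - a.mean(axis=1, keepdims=True)).reshape(n * t, -1)


beta_w = np.linalg.lstsq(within(X), within(y).ravel(), rcond=None)[0]
alpha_hat = (y - X @ beta_w).reshape(n, t).mean(axis=1)   # recovered dummies
dof = n * t - n - k                                        # NT - N - K
```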

OLS ESTIMATE OF CONSTRAINED MODEL
Dependent variable: conso

Observations         : 696
Number of Groups     : 116
Degrees of freedom   : 693
R-squared            : 0.172
Rbar-squared         : 0.170
Residual SS          : 32.532
Std error of est     : 0.217
Total SS (corrected) : 39.308
F = 72.175 with 3,693 degrees of freedom
P-value = 0.000

Var            Coef.   Std. Coef.   Std. Error       t-Stat   P-Value
CONSTANT    1.164761                  0.598014     1.947715     0.052
price      -0.249873    -0.406149     0.022153   -11.279345     0.000
revenue     0.376643     0.257121     0.052746     7.140637     0.000

FULL, RESTRICTED, AND PARTIAL R-SQUARED TERMS -- DUMMY VARIABLES ARE CONSTRAINED

TABLE OF R-SQUARED TERMS
R-squared -- full model:        0.934
R-squared -- constrained model: 0.172
Partial R-squared:              0.921

FULL, RESTRICTED, AND PARTIAL R-SQUARED TERMS -- X VARIABLES ARE CONSTRAINED

TABLE OF R-SQUARED TERMS
R-squared -- full model:        0.934
R-squared -- constrained model: 0.926
Partial R-squared:              0.108

GLS ERROR COMPONENTS RESULTS
Dependent variable: conso

Observations         : 696
Number of Groups     : 116
Degrees of freedom   : 693
Residual SS          : 3.135
Std error of est     : 0.067
Total SS (corrected) : 3.517
F = 22047.870 with 3,693 degrees of freedom
P-value = 0.000
Std. errors of error terms:
Individual constant terms : 0.206
White noise error         : 0.067

Var            Coef.   Std. Coef.   Std. Error       t-Stat   P-Value
CONSTANT    4.687235                  0.355285    13.192903     0.000
price      -0.149316    -0.363264     0.017623    -8.472974     0.000
revenue     0.053560     0.071009     0.032338     1.656247     0.098
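GLS in the error components model can be computed as OLS on quasi-demeaned data, $x_{it} - (1-\theta)\bar{x}_i$ with $\theta = \sqrt{\sigma_\varepsilon^2/(\sigma_\varepsilon^2 + T\sigma_\alpha^2)}$; this is also what the gls procedure of Appendix 10 does. The NumPy sketch below (simulated data and hypothetical variance components, not the TSCS code) checks that this matches direct GLS with $\Omega = I_N \otimes (\sigma_\varepsilon^2 I_T + \sigma_\alpha^2 J_T)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 6, 3
var_eps, var_alpha = 0.5, 0.8          # hypothetical variance components
X = np.column_stack([np.ones(n * t), rng.normal(size=n * t)])
y = (X @ np.array([1.0, -0.2])
     + np.repeat(rng.normal(size=n), t)
     + rng.normal(size=n * t))

# Direct GLS with Omega = I_N kron (var_eps * I_T + var_alpha * J_T).
omega_i = var_eps * np.eye(t) + var_alpha * np.ones((t, t))
omega_inv = np.kron(np.eye(n), np.linalg.inv(omega_i))
b_direct = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# Quasi-demeaning: a - (1 - theta) * individual mean, then plain OLS.
theta = np.sqrt(var_eps / (var_eps + t * var_alpha))


def quasi_demean(a):
    a = a.reshape(n, t, -1)
    return (a - (1.0 - theta) * a.mean(axis=1, keepdims=True)).reshape(n * t, -1)


b_quasi = np.linalg.lstsq(quasi_demean(X), quasi_demean(y).ravel(), rcond=None)[0]
```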

Group Number   Random Components
1              -0.346522
2              -0.121608
3               0.250638
4              -0.020350
5               0.128761
...             ...
112             0.512636
113            -0.216224
114            -0.151243
115            -0.125587
116             0.104064

Lagrange Multiplier Test for Error Components Model
Null hypothesis: Individual error components do not exist.
Chi-squared statistic (1): 1367.1014
P-value: 0.0000
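One classic form of this statistic is the Breusch-Pagan LM test, computed from pooled OLS residuals $e_{it}$ as $LM = \frac{NT}{2(T-1)}\left[\frac{\sum_i (\sum_t e_{it})^2}{\sum_i \sum_t e_{it}^2} - 1\right]^2$, asymptotically $\chi^2(1)$ under the null. We have not checked that TSCS uses exactly this formula; the NumPy sketch below is an illustration (hypothetical function name), with the $\chi^2(1)$ p-value obtained as erfc(√(LM/2)).

```python
import math

import numpy as np


def bp_lm(resid, n, t):
    """Breusch-Pagan LM statistic and chi2(1) p-value (a sketch).

    resid: pooled OLS residuals, an NT-vector stacked individual by individual.
    """
    e = np.asarray(resid, dtype=float).reshape(n, t)
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    lm = n * t / (2.0 * (t - 1)) * (ratio - 1.0) ** 2
    p_value = math.erfc(math.sqrt(lm / 2.0))   # P(chi2(1) > lm)
    return lm, p_value


lm, p = bp_lm([1.0, 1.0, 1.0, -3.0], n=2, t=2)
print(lm, p)
```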

Appendix 10. IV and GMM estimation with Gauss©
/* IV2.PRG Instrumental variable estimation and GMM estimation
Model y(it) = X(it)beta + Z(i) gamma
We use Hausman-Taylor, Amemiya-MaCurdy, Breusch-Mizon-Schmidt instruments,
both for IV and GMM */
new;clear all;
/* You only need to change this block */
/* Define dimensions
N: number of units, T: number of time periods
nvar: Nb. of variables to be read
k1: Nb. of X1it, k2: Nb. of X2it, g1: Nb. of Z1i, g2: Nb. of Z2i
kq= k1+k2, kb= k1+k2+g1+g2 */
n=595;
t=7;
nvar=13;
k1=4;
k2=5;
g1=2;
g2=1;
kq=k1+k2;
kb=k1+k2+g1+g2;
et=ones(t,1);
un=ones(n*t,1);
unb=ones(n,1);
/* Read data */
load x[n*t,nvar]=psid.dat;
output file=iv1.out reset;
expe=x[.,1];
expe2=x[.,2];
wks=x[.,3];
occ=x[.,4];
ind=x[.,5];
south=x[.,6];
smsa=x[.,7];
ms=x[.,8];
fem=x[.,9];
unioni=x[.,10];
edu=x[.,11];
blk=x[.,12];
lwage=x[.,13];
/* Define matrices X, Z and vector Y */
x1=occ~south~smsa~ind;
x2=expe~expe2~wks~ms~unioni;
z1=fem~blk;
z2=edu;
y=lwage;
x=x1~x2;
z=z1~z2;
/* You don't need to change anything after this */
/* Compute Between and Within transformations:
Caution: keep that order for BXZ: X,Z,Y */
qx=with(x~y);
bxz=bet(x~z~y);
by=bxz[.,cols(bxz)];
bxz=bxz[.,1:cols(bxz)-1];
qy=qx[.,cols(qx)];
qx=qx[.,1:cols(qx)-1];
/* Within regression and error term (uw) */
betaw=inv(qx'qx)*qx'qy;
uw=qy-qx*betaw;
/* Compute variance with instruments BX and QX */
exob=un~bxz;
gamb=inv(exob'exob)*(exob'by);

ub=by-exob*gamb;
sigep=uw'uw/(n*(t-1)-kq);
sigq=sqrt(sigep*diag(inv(qx'qx)));
a=x1~z1;
di=by-bxz[.,1:kq]*betaw;
zz=un~z1~z2;
gamhatw=inv(zz'*a*inv(a'*a)*a'*zz)*zz'*a*inv(a'*a)*a'*di;
s2=(1/(n*t))*(by-bxz[.,1:kq]*betaw
-zz*gamhatw)'*(by-bxz[.,1:kq]*betaw-zz*gamhatw);
sigal=s2-(1/t)*sigep;
theta=sqrt(sigep/(sigep+t*sigal));
/* GLS transformation and estimate
Caution: keep the order 1,X1,X2,Z1,Z2 in matrix EXOG */
exog=gls(un~x1~x2~z1~z2~y);
yg=exog[.,cols(exog)];
exog=exog[.,1:cols(exog)-1];
betagls=inv(exog'exog)*(exog'yg);
siggls=sqrt(sigep*diag(inv(exog'exog)));
/* HT */
aht=un~qx~bet(x1)~z1;
betaht=inv(exog'*aht*inv(aht'*aht)*aht'*exog)*exog'*aht*inv(aht'*aht)
*aht'*yg;
sight=sqrt(sigep*diag(inv(exog'*aht*inv(aht'*aht)*aht'*exog)));
/* AM */
x1s=tam(x1);
aam=un~qx~x1s~z1;
betaam=inv(exog'*aam*inv(aam'*aam)*aam'*exog);
betaam=betaam*exog'*aam*inv(aam'*aam)*aam'*yg;
sigam=sqrt(sigep*diag(inv(exog'*aam*inv(aam'*aam)*aam'*exog)));
/* BMS */
abms1=aam~tbms(with(x2));
/* This is the general form of the BMS instrument matrix; it should work in most
cases. With the PSID data, however, some variables must be dropped (see below):
for your own application, delete the ABMS1 line just below.
*/
/* Remove abms1 just below: */
abms1=un~qx~bet(x1)~tbms(with(occ~south~smsa~ind~ms~wks~unioni))~z1;
betabms1=inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog)
*exog'*abms1*inv(abms1'*abms1)*abms1'*yg;
sigbms1=sqrt(sigep*diag(inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog)));
/* Compute variance-covariance matrices */
varq=sigep*inv(qx'qx); varg=sigep*inv(exog'*exog);
varht=sigep*inv(exog'*aht*inv(aht'*aht)*aht'*exog);
varam=sigep*inv(exog'*aam*inv(aam'*aam)*aam'*exog);
varbms1=sigep*inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog);
test1=(betagls[2:kq+1]-betaw)'*inv(varq-varg[2:kq+1,2:kq+1]);
test1=test1*(betagls[2:kq+1]-betaw);
test2=(betaht[2:kq+1]-betaw)'*inv(varq-varht[2:kq+1,2:kq+1])
*(betaht[2:kq+1]-betaw);
test3=(betaht-betaam)'*inv(varht-varam)*(betaht-betaam);
test4=(betaam-betabms1)'*inv(varam-varbms1)*(betaam-betabms1);
output file=iv1.out reset;
output on;
"Within estimates ";
" Estimate standard error t-stat ";
betaw sigq betaw./sigq;
"GLS estimates";
"sigma(alpha),sigma(epsilon),theta(=(sig(ep)/(sig(ep+t*sig(al)))**(1/2))";
sigal sigep theta;
" Estimate standard error t-stat ";
betagls siggls betagls./siggls;


"HT estimates ";


" Estimate standard error t-stat ";
betaht sight betaht./sight;
"AM estimates ";
" Estimate standard error t-stat ";
betaam sigam betaam./sigam; "BMS estimates ";
" Estimate standard error t-stat ";
betabms1 sigbms1 betabms1./sigbms1;
"Hausman test statistics and p-value ";
"Within vs. GLS ";
test1 cdfchic(test1,kq);
"Within vs. HT ";
test2 cdfchic(test2,k1-g2);
"AM vs. HT ";
test3 cdfchic(test3,cols(aam)-cols(aht));
"BMS vs. AM ";
test4 cdfchic(test4,cols(abms1)-cols(aam));
/* GMM estimation */
{b1,se1,b2,se2,sar} = gmm(y,un~x1~x2~z1~z2,aht,1);
"GMM-HT estimates ";
" Estimate standard error t-stat ";
b2 se2 b2./se2;
"Hansen test and p-value ";
sar cdfchic(sar,cols(aht)-rows(b2));
{b1,se1,b2,se2,sar} = gmm(y,un~x1~x2~z1~z2,aam,1);
"GMM-AM estimates ";
" Estimate standard error t-stat ";
b2 se2 b2./se2;
"Hansen test and p-value ";
sar cdfchic(sar,cols(aam)-rows(b2));
{b1,se1,b2,se2,sar} = gmm(y,un~x1~x2~z1~z2,abms1,1);
"GMM-BMS estimates ";

" Estimate standard error t-stat ";
b2 se2 b2./se2;
"Hansen test and p-value ";
sar cdfchic(sar,cols(abms1)-rows(b2));
output off;
proc bet(w);
/* Compute BX from matrix w */
local i,term,betx;
term=reshape(w[.,1],n,t);
term=meanc(term').*.et;
term=reshape(term,n*t,1);
betx=term;
i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t);
term=reshape(meanc(term').*.et,n*t,1);
betx=betx~term;
i=i+1;
endo;
retp(betx);
endp;
proc with(w);
/* Compute Within transformation for matrix W */
retp(w-bet(w));
endp;
proc gls(w);
/* GLS transformation */
local term; term=w-(1-theta)*bet(w);
retp(term);
endp;
proc tam(w);
/* AM transformation, stacking time observations */
local i,term,xstar;
term=reshape(w[.,1],n,t).*.et;
xstar=term;


i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t).*.et;
xstar=xstar~term;
i=i+1;
endo;
retp(xstar);
endp;
proc tbms(w);
/* BMS transformation, stacking time observations but deleting last column
*/
local i,term,xstar;
term=reshape(w[.,1],n,t).*.et;
xstar=term[.,1:cols(term)-1];
i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t).*.et;
xstar=xstar~term[.,1:cols(term)-1];
i=i+1;
endo;
retp(xstar);
endp;
proc (5)=gmm(y,x,z,d);
local zx,w,w2,b,e,e2,b2,se,se2,sar2;
zx = z'x;
if d==1;
w = invpd(inw(z));
else;
w = invpd(z'z);
endif;
b = invpd(zx'w*zx)*zx'w*z'y;
e = y-x*b;
w2 = ezw(e,z);
se = invpd(zx'w*zx)*zx'w*w2*w*zx*invpd(zx'w*zx);

w = invpd(w2);
se2 = invpd(zx'w*zx);
b2 = se2*zx'w*z'y;
e2 = y-x*b2;
sar2 = e2'z*w*z'e2;
retp(b,sqrt(diag(se)),b2,sqrt(diag(se2)),sar2);
endp;
proc ezw(e,z);
local k,ez,T;
T = rows(e)/N;
k = cols(z);
ez = reshape(e.*z,N,K*T)*(ones(T,1).*.eye(K));
retp(ez'ez);
endp;
proc inw(z);
local a,i,zi,zaz,T;
t = rows(z)/N;
a = eye(T);
zaz = 0;
i = 1;
do until i>N;
zi = z[(i-1)*T+1:i*T,.];
zaz = zaz + zi'a*zi;
i = i+1;
endo;
retp(zaz);
endp;
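The gmm procedure above first estimates with the weight $(\sum_i Z_i'Z_i)^{-1}$ (i.e. 2SLS, since this version of inw uses an identity matrix), then re-estimates with the individual-clustered weight built by ezw and returns Hansen's statistic. A NumPy sketch of the same two steps follows; `gmm_two_step` is a hypothetical name, and only second-step standard errors are returned.

```python
import numpy as np


def gmm_two_step(y, X, Z, n):
    """Two-step linear GMM with weights clustered by individual (a sketch).

    Rows of y, X, Z are stacked individual by individual, T rows each.
    Returns the one-step (2SLS) estimate b1, the two-step estimate b2,
    its standard errors se2, and Hansen's J statistic.
    """
    t = len(y) // n
    zx, zy = Z.T @ X, Z.T @ y
    w1 = np.linalg.inv(Z.T @ Z)                        # one-step weight
    b1 = np.linalg.solve(zx.T @ w1 @ zx, zx.T @ w1 @ zy)
    e1 = y - X @ b1
    # S = sum_i (Z_i' e_i)(Z_i' e_i)', the analogue of the GAUSS ezw(e,z)
    g = (Z * e1[:, None]).reshape(n, t, -1).sum(axis=1)
    w2 = np.linalg.inv(g.T @ g)
    v2 = np.linalg.inv(zx.T @ w2 @ zx)
    b2 = v2 @ zx.T @ w2 @ zy
    m = Z.T @ (y - X @ b2)                             # sample moments at b2
    return b1, b2, np.sqrt(np.diag(v2)), m @ w2 @ m
```

With as many instruments as regressors, both steps coincide and J is zero; the degrees of freedom for J are cols(Z) − rows(b), as in the program.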

Appendix 11. DPD estimation with Gauss©

/* DPD1.PRG Program for DPD (Dynamic Panel Data model)


Method: Arellano-Bond */
/* Defines variables below as global */
clearg N,T,y,x,z,alpha,sco,hes,zgy,fake,mom,w;
/*Read data*/
n=595; t=7; nvar=13;
load x[n*t,nvar]=d:/dea/panel/psid.dat;
lwage=x[.,13];
wks=x[.,3];
occ=x[.,4];
clear x;
/* Create a (NxT) matrix for dependent var. */
y=reshape(lwage,n,t);
/* Stack exogenous vars. */
x=wks~occ;

/* Set top=0 for instruments from lagged Y's only;
top=1 to add instruments from X that are weakly exogenous and in level;
top=2 to add instruments from X that are strongly exogenous and
in first-difference form */
top=2;
/* Set AR1 to 0 for the general case, and AR1 to 1
for epsilon's serially correlated of order 1 (E(eps_it*eps_i,t+1) <> 0) */
ar1=1;
/* You don't need to change anything after this line */
/* Define identity matrices I(T-2) for AB and BB */
ddif = eye(T-2);
/* Construct AB instrument matrix Z.
First component matrix: lagged Y's
Recall: if AR1=1, restriction when epsilon's are serially correlated
of order 1 */
z = (y[.,1]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z = z~((y[.,1:j]).*.ddif[.,j]);
j = j+1;
endo;
if ar1==1;
z = (y[.,1]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z = z~((y[.,1:j-1]).*.ddif[.,j]);
j = j+1;
endo;
z=z[.,2:cols(z)];
endif;
/* Second component matrix: Instruments from X */
/* Delete this block if you want only instruments from y's */
if top==1;
/* Weakly exogenous X's, in level */
toto=shapent(x[.,1]);
z2 = (toto[.,1]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z2 = z2~((toto[.,1:j]).*.ddif[.,j]);
j = j+1;
endo;
i=2;
do until i>cols(x);
toto=shapent(x[.,i]);
z2 = z2~((toto[.,1]).*.ddif[.,1]);
j = 2;
do until j>cols(ddif);
z2 = z2~((toto[.,1:j]).*.ddif[.,j]);

j = j+1;
endo;
i=i+1;
endo;
z=z~z2;
endif;
if top==2;
/* Strongly exogenous X's, in first-difference form */
toto=shapent(x[.,1]);
z2 = (toto[.,3]-toto[.,2]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z2 = z2~((toto[.,j]-toto[.,j-1]).*.ddif[.,j]);
j = j+1;
endo;
i=2;do until i>cols(x);
toto=shapent(x[.,i]);
z2 = z2~((toto[.,3]-toto[.,2]).*.ddif[.,1]);
j = 2;
do until j>cols(ddif);
z2 = z2~((toto[.,j]-toto[.,j-1]).*.ddif[.,j]);
j = j+1;
endo;
i=i+1;
endo;
z=z~z2;
endif;
{b1,se1,b2,se2,sar} = gmm(vec((y[.,3:T]-y[.,2:T-1])'),
vec((y[.,2:T-1]-y[.,1:T-2])')~trans(x),z,1);
output file = dpd1.out on;
"Arellano-Bond GMM estimates";
if top ==0;
"Instruments from lagged Y's only (TOP=0)";
endif;
if top==1;

"Instruments from X are weakly exogenous and in level (TOP=1)";
endif;
if top==2;
"Instruments from X are strongly exogenous and first-differenced (TOP=2)";
endif;
if ar1==1;
"Restricted estimates: epsilon are serially correlated of order 1 (AR1=1)";
endif;
" Estimate standard error t-stat";
b2 se2 b2./se2;
"Nb. of conditions (instruments) " cols(z);
"Nb. of parameters " rows(b2);
"Hansen specification test and p-value ";
sar cdfchic(sar,cols(z)-rows(b2));
output off;
proc shapent(w);
/* Reshapes vector in NxT form */
retp(reshape(w,n,t));
endp;
proc trans(w);
/* Transforms matrix X in First Difference */
local toto,i,xfd;
toto=reshape(w[.,1],n,t);
toto=vec((toto[.,3:T]-toto[.,2:T-1])');
xfd=toto;
i=2;
do until i>cols(w);
toto=reshape(w[.,i],n,t);
toto=vec((toto[.,3:T]-toto[.,2:T-1])');
xfd=xfd~toto;
i=i+1;
endo;
retp(xfd);
endp;


proc (2)=ls(y,x);
/* Computes OLS, returns White var-covar matrix */
local ixx,b,e,v;
ixx = invpd(x'x);
b = ixx*x'y;
e = y-x*b;
v = ixx*(ezw(e,x))*ixx;
retp(b,v);
endp;
proc ezw(e,z);
local k,ez,T;
T = rows(e)/N;
k = cols(z);
ez = reshape(e.*z,N,K*T)*(ones(T,1).*.eye(K));
retp(ez'ez);
endp;
proc inw(z);
local d,a,i,zi,zaz,T;
T = rows(z)/N;
d = zeros(T,1)~(eye(T-1)|zeros(1,T-1));
a = 2*eye(T) - (d + d');
zaz = 0;
i = 1;
do until i>N;
zi = z[(i-1)*T+1:i*T,.];
zaz = zaz + zi'a*zi;
i = i+1;
endo;
retp(zaz);
endp;
proc (5)=gmm(y,x,z,d);
local zx,w,w2,b,e,e2,b2,se,se2,sar2;
zx = z'x;

if d==1;
w = invpd(inw(z));
else;
w = invpd(z'z);
endif;
b = invpd(zx'w*zx)*zx'w*z'y;
e = y-x*b;
w2 = ezw(e,z);
se = invpd(zx'w*zx)*zx'w*w2*w*zx*invpd(zx'w*zx);
w = invpd(w2);
se2 = invpd(zx'w*zx);
b2 = se2*zx'w*z'y;
e2 = y-x*b2;
sar2 = e2'z*w*z'e2;
retp(b,sqrt(diag(se)),b2,sqrt(diag(se2)),sar2);
endp;
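The instrument matrix built with the Kronecker products above is block-diagonal for each individual. A NumPy sketch for a single individual (hypothetical helper names) makes the layout explicit, including the AR1 variant and the matrix $H = 2I - (D + D')$ used by inw for the one-step weight. Stacking these blocks individual by individual should reproduce the ordering of the GAUSS matrix z.

```python
import numpy as np


def ab_instruments_i(y_i, ar1=False):
    """Arellano-Bond instruments from lagged levels of y, for one individual.

    y_i holds y_i1, ..., y_iT. Row j (0-based) is the differenced equation for
    period j + 3 (1-based); its instruments are y_i1, ..., y_i,j+1, or
    y_i1, ..., y_i,j when ar1=True (order-1 serial correlation).
    """
    y_i = np.asarray(y_i, dtype=float)
    T = y_i.size
    blocks = [y_i[: (j if ar1 else j + 1)] for j in range(T - 2)]
    Z = np.zeros((T - 2, sum(b.size for b in blocks)))
    col = 0
    for j, b in enumerate(blocks):
        Z[j, col:col + b.size] = b       # block-diagonal layout
        col += b.size
    return Z


def h_matrix(m):
    """m x m matrix 2I - (D + D'), as built in the GAUSS inw procedure."""
    d = np.eye(m, k=1)
    return 2.0 * np.eye(m) - (d + d.T)


Z_i = ab_instruments_i(np.arange(1.0, 8.0))   # T = 7
print(Z_i.shape)                                # (T-2) x (T-2)(T-1)/2
```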

References
S.C. Ahn and P. Schmidt, Efficient Estimation of Models for Dynamic Panel Data, Journal of Econometrics, 68, 5-27, 1995.
S.C. Ahn and P. Schmidt, A Separability Result for GMM Estimation, with Applications to GLS Prediction and Conditional Moment Tests, Econometric Reviews, 14(1), 19-34, 1995.
S.C. Ahn and P. Schmidt, Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation, Journal of Econometrics, 76, 309-321, 1997.
S.C. Ahn, Y.H. Lee and P. Schmidt, GMM Estimation of Linear Panel Data Models with Time-varying Individual Effects, Journal of Econometrics, 101, 219-255, 2001.
T. Amemiya, The estimation of the variances in a variance-components model, International Economic Review, 12, 1-13, 1971.
T. Amemiya and T.E. MaCurdy, Instrumental-Variable Estimation of an Error-Components Model, Econometrica, 54(4), 869-880, 1986.
E.B. Andersen, Conditional inference and models for measuring (Mentalhygiejnisk Forlag, Copenhagen), 1973.
T.W. Anderson and C. Hsiao, Formulation and Estimation of Dynamic Models Using Panel Data, Journal of Econometrics, 18, 47-82, 1982.
D.W.K. Andrews, Heteroskedasticity and autocorrelation consistent covariance matrix estimation, Econometrica, 59, 817-858, 1991.
D.W.K. Andrews and J.C. Monahan, An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator, Econometrica, 60, 953-966, 1992.
W. Antweiler, Nested Random Effects Estimation in Unbalanced Panel Data, Journal of Econometrics, 101, 295-313, 2001.
M. Arellano, Discrete choices with panel data, working paper 0101, CEMFI, 2001.
M. Arellano and S. Bond, Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations, Review of Economic Studies, 58, 277-297, 1991.
M. Arellano and O. Bover, Another Look at the Instrumental Variable Estimation of Error-Components Models, Journal of Econometrics, 68, 29-51, 1995.
J. Alvarez and M. Arellano, The Time Series and Cross Section Asymptotics of Dynamic Panel Data Estimators, CEMFI Working Paper No. 9808, 1998.
P. Balestra and M. Nerlove, Pooling cross-section and time-series data in the estimation of a dynamic model: the demand for natural gas, Econometrica, 34, 585-612, 1966.
B.H. Baltagi, Econometric Analysis of Panel Data, J. Wiley, 1995.
B.H. Baltagi and S. Khanti-Akom, On efficient estimation with panel data: an empirical comparison of instrumental variables estimators, Journal of Applied Econometrics, 5, 401-406, 1990.
B.H. Baltagi, Simultaneous equations with error components, Journal of Econometrics, 17, 189-200, 1981.
B.H. Baltagi, Specification issues, in The econometrics of panel data: Handbook of theory and applications, chap. 9, L. Matyas and P. Sevestre eds., Kluwer Academic Publishers, Dordrecht, 196-205, 1992.
B.H. Baltagi, Panel data, Journal of Econometrics, 68, 1-268, 1995.
B.H. Baltagi, S.H. Song and B.C. Jung, The Unbalanced Nested Error Component Regression Model, Journal of Econometrics, 101, 357-381, 2001.
R. Blundell and S. Bond, GMM estimation with persistent panel data: An application to production functions, IFS working paper W99/4, 1999.
R. Blundell and S. Bond, Initial Conditions and Moment Restrictions in Dynamic Panel Data Models, Journal of Econometrics, 87, 115-143, 1998.
A. Börsch-Supan and V. Hajivassiliou, Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variables models, Cowles Foundation paper 960, Yale University, 1990.
T.S. Breusch, G.E. Mizon and P. Schmidt, Efficient Estimation Using Panel Data, Econometrica, 57(3), 695-700, 1989.
G. Chamberlain, Asymptotic Efficiency in Estimation with Conditional Moment Restrictions, Journal of Econometrics, 34, 305-334, 1987.
G. Chamberlain, Panel data, in Handbook of Econometrics, pp. 1247-1318, Z. Griliches and M. Intriligator eds., North-Holland, Amsterdam, 1984.
G. Chamberlain, Comment: Sequential Moment Restrictions in Panel Data, Journal of Business and Economic Statistics, 10, 20-26, 1992.
G. Chamberlain, Multivariate regression models for panel data, Journal of Econometrics, 18, 5-46, 1982.
E. Charlier, B. Melenberg and A. van Soest, Estimation of a censored regression panel data model using conditional moment restrictions efficiently, Journal of Econometrics, 95, 25-56, 2000.
C. Cornwell and P. Rupert, Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators, Journal of Applied Econometrics, 3, 149-155, 1988.
B. Crépon, F. Kramarz and A. Trognon, Parameters of Interest, Nuisance Parameters and Orthogonality Conditions. An Application to Autoregressive Error Component Models, Journal of Econometrics, 82, 135-156, 1997.
C. Cornwell, P. Schmidt and D. Wyhowski, Simultaneous equations and panel data, Journal of Econometrics, 51, 151-181, 1992.
G. Dionne, R. Gagné and C. Vanasse, Inferring technological parameters from incomplete panel data, Journal of Econometrics, 87, 303-327, 1998.
J. Dolado, Optimal instrumental variable estimator of the AR parameter of an ARMA(1,1) process, Econometric Theory, 6, 117-119.
B. Dormont, Introduction à l'Économétrie des Données de Panel, Editions du Centre National de la Recherche Scientifique, Paris, 1989.
E. Fix and J.L. Hodges, Discriminatory analysis, nonparametric estimation: consistent properties, Report No 4, USAF School of Aviation Medicine, Randolph Field, Texas, 1951.
J. Geweke, Bayesian inference in econometric models using Monte Carlo integration, Econometrica, 57, 1317-1339, 1989.
S. Girma, A quasi-differencing approach to dynamic modelling from a time series of independent cross-sections, Journal of Econometrics, 365-383, 2000.
R. Hall, Stochastic implications of the life cycle-permanent income hypothesis, Journal of Political Economy, 86, 971-987, 1978.
B.E. Hansen, Threshold Effects in Non-Dynamic Panels: Estimation, Testing, and Inference, Journal of Econometrics, 93, 345-368, 1999.
L.P. Hansen, Large sample properties of generalized method of moments estimators, Econometrica, 50, 1029-1054, 1982.
L.P. Hansen, A method of calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators, Journal of Econometrics, 30, 203-238, 1985.
L.P. Hansen and T.J. Sargent, Instrumental variables procedures for estimating linear rational expectations models, Journal of Monetary Economics, 9, 263-296, 1982.
L.P. Hansen and K.J. Singleton, Generalized instrumental variable estimation of nonlinear rational expectations models, Econometrica, 50, 1269-1286, 1982.
L.P. Hansen, J.C. Heaton and A. Yaron, Finite-sample properties of some alternative GMM estimators, Journal of Business and Economic Statistics, 14, 262-280, 1996.
W. Härdle and J.S. Marron, Optimal bandwidth selection in nonparametric regression function estimation, Annals of Statistics, 13, 1465-1481, 1985.
R.D.F. Harris and E. Tzavalis, Inference for unit roots in dynamic panels where the time dimension is fixed, Journal of Econometrics, 91, 201-226, 1999.


J.A. Hausman, Specification Tests in Econometrics, Econometrica, 46(6), 1251-1271, 1978.
J.A. Hausman and W.E. Taylor, Panel Data and Unobservable Individual Effects, Econometrica, 49(6), 1377-1398, 1981.
J.J. Heckman and T.E. MaCurdy, A life-cycle model of female labor supply, Review of Economic Studies, 47, 47-74, 1980.
I. Hoch, Estimation of production function parameters combining time-series and cross-section data, Econometrica, 30, 34-53, 1962.
D. Holtz-Eakin, W. Newey and H. Rosen, Estimating Vector Autoregressions with Panel Data, Econometrica, 56, 1371-1395, 1988.
B.E. Honoré and A. Lewbel, Semiparametric binary choice panel data models without strictly exogenous regressors, working paper, Boston College, 2000.
C. Hsiao, Analysis of Panel Data, Cambridge University Press, 1986.
K.S. Im, S.C. Ahn, P. Schmidt and J.M. Wooldridge, Efficient estimation of panel data models with strictly exogenous explanatory variables, Journal of Econometrics, 93, 177-201, 1999.
G.W. Imbens, One-step estimators for over-identified generalized method of moments models, Review of Economic Studies, 64, 359-383.
J. Inkmann, Misspecified heteroskedasticity in the panel Probit model: A small sample comparison of GMM and SML estimators, Journal of Econometrics, 97, 227-259, 2000.
R.A. Judson and A.L. Owen, Estimating dynamic panel data models: A guide for macroeconomists, Economics Letters, 65, 9-15, 1999.
M.P. Keane and D.E. Runkle, On the estimation of panel-data models with serial correlation when instruments are not strictly exogenous, Journal of Business and Economic Statistics, 10, 1-9, 1992.
N.M. Kiefer, A Time Series-Cross Section Model with Fixed Effects with an Intertemporal Factor Structure, unpublished manuscript, Cornell University, 1980.
E. Kyriazidou, Estimation of a panel data sample selection model, Econometrica, 65, 1335-1364, 1997.
Y.H. Lee and P. Schmidt, A Production Frontier Model with Flexible Temporal Variation in Technical Inefficiency, in The Measurement of Productive Efficiency: Techniques and Applications, Oxford University Press, 1993.
L.A. Lillard and Y. Weiss, Components of Variation in Panel Earnings Data: American Scientists 1960-1970, Econometrica, 47, 437-454, 1979.
R. Lucas, Econometric policy evaluation: A critique, in The Phillips curve and labor markets, K. Brunner (Ed.), Vol. 1, North-Holland, 1976.
Y.P. Mack, Local properties of k-NN regression estimates, SIAM Journal on Algebraic and Discrete Methods, 2, 311-323, 1981.
L. Matyas and P. Sevestre, The Econometrics of Panel Data. Handbook of Theory and Applications, Kluwer Academic Publishers, 1992.
P. Mazodier and A. Trognon, Heteroskedasticity and stratification in error components models, Annales de l'INSEE, 30-31, 451-482, 1978.
C. Meghir and F. Windmeijer, Moment Conditions for Dynamic Panel Data Models with Multiplicative Individual Effects in the Conditional Variance, IFS Working Paper Series No. W97/21, 1997.
R. Moffitt, Identification and estimation of dynamic models with a time series of repeated cross-sections, Journal of Econometrics, 59, 99-123, 1993.
M. Nerlove, A note on error components models, Econometrica, 39, 383-396, 1971.
W.K. Newey, Efficient estimation of models with conditional moment restrictions, in Handbook of Statistics, C.R. Rao and H.D. Vinod (Eds.), Vol. 11, Elsevier Science Publishers, 1993.
W.K. Newey, Efficient instrumental variables estimation of nonlinear models, Econometrica, 58, 809-837, 1990.
W.K. Newey and K.D. West, Automatic lag selection in covariance estimation, Review of Economic Studies, 61, 631-653, 1994.
W.K. Newey and K.D. West, Hypothesis testing with efficient method of moments estimation, International Economic Review, 28, 777-787, 1987.
W.K. Newey and K.D. West, A simple, positive definite, heteroscedasticity and autocorrelation consistent covariance matrix, Econometrica, 55, 703-708, 1987.
P. Schmidt, S.C. Ahn and D. Wyhowski, Comment: Sequential Moment Restrictions in Panel Data, Journal of Business and Economic Statistics, 10, 10-14, 1992.
C.J. Stone, Consistent nonparametric regression, Annals of Statistics, 5, 595-645, 1977.
P.A.V.B. Swamy and S.S. Arora, The exact finite sample properties of the estimators of coefficients in the error components regression models, Econometrica, 40, 261-275, 1972.
M. Verbeek and T.E. Nijman, Testing for selectivity bias in panel data models, International Economic Review, 33, 681-703, 1992.
M. Verbeek and T.E. Nijman, Minimum MSE estimation of a regression model with fixed effects and a series of cross-sections, Journal of Econometrics, 59, 125-136, 1993.
T.D. Wallace and A. Hussain, The use of error components models in combining cross-section and time-series data, Econometrica, 37, 55-72, 1969.
T.J. Wansbeek and A. Kapteyn, Estimation of the error components model with incomplete panels, Journal of Econometrics, 41, 341-361, 1989.
H. White, A heteroscedasticity consistent covariance matrix estimator and a direct test for heteroscedasticity, Econometrica, 48, 817-838, 1980.
H. White, Asymptotic theory for econometricians, Academic Press, Orlando, 1984.
J.M. Wooldridge, A framework for estimating dynamic, unobserved effects panel data models with possible feedback to future explanatory variables, Economics Letters, 68, 245-250, 2000.