
DEEQA, Ecole Doctorale MPSE

Panel data econometrics
and GMM estimation

Alban Thomas
MF 102, thomas@toulouse.inra.fr

- Present recent developments in econometrics that allow for a consistent treatment of the impact of unobserved heterogeneity on model predictions;

- Present a convenient econometric framework for dealing with restrictions imposed by theory and with endogeneity.

Two keywords: unobserved heterogeneity and endogeneity.

Methods:
- Fixed Effects Least Squares
- Generalized Least Squares
- Instrumental Variables
- Maximum Likelihood estimation for Panel Data models

- Generalized Method of Moments for Time Series

- Generalized Method of Moments for Panel Data
- Heteroskedasticity-consistent estimation
- Dynamic Panel Data models

- Logit and Probit models for Panel Data

- Simulation-based inference
- Nonparametric and Semiparametric estimation

Contents

Part I. Panel Data Models

1 Introduction
   1.1 Gains in pooling cross sections and time series
       1.1.1 Discrimination between alternative models
       1.1.2 Examples
       1.1.4 May reduce bias due to missing or unobserved variables
   1.2 Analysis of variance
   1.3 Some definitions

2 The linear model
   2.1 Notation
       2.1.1 Model notation
       2.1.3 Symmetry, idempotency and orthogonality
   2.2 The One-Way Fixed Effects model
       2.2.2 Interpretation as a covariance estimator
       2.2.3 Comments
       2.2.4 Poolability
   2.3 The Random Effects model
       2.3.1 Notation and assumptions
       2.3.3 Comparison between GLS, OLS and Within
       2.3.6 Estimation of variances

3 Extensions
   3.1 The Two-way panel data model
       3.1.2 Example: Production function (Hoch 1962)
   3.2 More on non-spherical disturbances
       3.2.1 Heteroskedasticity in the individual effect
       3.2.2 'Typical' heteroskedasticity
   3.3 Unbalanced panel data models
       3.3.1 Introduction

4 Augmented panel data models
   4.1 Introduction
   4.2 Choice between Within and GLS
   4.3 An important test for endogeneity
   4.4 Instrumental Variable estimation: the Hausman-Taylor GLS estimator
       4.4.1 Instrumental Variable estimation
       4.4.2 IV in a panel-data context
       4.4.3 Exogeneity assumptions and a first instrument matrix
       4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt
   4.5 Computation of the variance-covariance matrix for IV estimators
   4.6 Example: Wage equation
       4.6.1 Model specification
   4.7 Application: returns to education

5 Dynamic panel data models
   5.1 Motivation
   5.2 The dynamic fixed-effect model
       5.2.1 Bias in the Fixed-Effects estimator
       5.2.2 Instrumental-variable estimation
   5.3 The Random-effects model
       5.3.1 Bias in the ML estimator
       5.3.2 An equivalent representation

Part II. Generalized Method of Moments estimation

6 The GMM estimator
   6.1 Moment conditions and the method of moments
       6.1.1 Moment conditions
       6.1.5 Example: Poisson counting model
       6.1.6 Comments
   6.2 The Generalized Method of Moments (GMM)
       6.2.1 Introduction
       6.2.3 A definition
       6.2.4 Example: The IV estimator again
   6.3 Asymptotic properties of the GMM estimator
       6.3.1 Consistency
       6.3.2 Asymptotic normality
   6.4 Optimal and two-step GMM
   6.5 Inference with GMM
   6.6 Optimal instruments
       6.6.2 A first feasible estimator
       6.6.3 Nearest-neighbor estimation of optimal instruments
       6.6.4 Generalizing the approach: other nonparametric estimators

7 GMM estimation for time series
   7.1.1 Hansen and Singleton framework
   7.1.2 GMM estimation
   7.2 GMM estimation of MA models
       7.2.1 A simple estimator
       7.2.3 Example: The Durbin estimator
   7.3 IV estimation

8 GMM estimators for dynamic panel data
   8.1 Introduction
   8.2 The Arellano-Bond estimator
       8.2.1 Model assumptions
       8.2.2 Implementation of the GMM estimator
   8.3 More efficient procedures (Ahn-Schmidt)
       8.3.1 Additional assumptions
   8.5.1 Multiplicative individual effects
   8.5.2 Mixed structure
   8.6 Example: Wage equation

Part III. Discrete-choice models and simulation-based inference

9 Discrete-choice models for panel data
   9.1 Brief review of binary discrete-choice models
       9.1.1 Linear Probability model
       9.1.2 Logit model
       9.1.3 Probit model
   9.2 Logit models for panel data
       9.2.1 Sufficient statistics
       9.2.2 Conditional probabilities
       9.2.3 Example: T = 2
   9.3 Probit models
       9.4.1 The binary choice model
       9.4.2 The IV estimator
       9.5.1 The GHK simulator
       9.5.2 Example

Appendix 1. Maximum-Likelihood estimation of the Random-effect model
Appendix 2. The two-way random effects model
Appendix 3. The unbalanced random-effects model
Appendix 4. ML estimation of dynamic panel models
Appendix 5. GMM estimation of static panel models
Appendix 6. A framework for simulation-based inference
Appendix 8. A crash course in Gauss
Appendix 9. Example: The Gauss software
Appendix 11. DPD estimation with Gauss

References

Part I
Panel Data Models


Chapter 1
Introduction
Panel data: sequential observations on a number of units (individuals, firms). Also called cross-sections over time, longitudinal data, or pooled cross-section time-series data.

1.1 Gains in pooling cross sections and time series

1.1.1 Discrimination between alternative models

Many economic models take the form

F(Y, X, Z; θ) = 0,

where Y: endogenous variables; X: policy (public) variables; Z: (fixed) individual attributes; θ: parameters.

Linear model:

Y = β₀ + βₓX + β_zZ + u.

- Policy variables have a significant impact whatever the individual characteristics, or

- Differences across individuals are due to idiosyncratic individual features, not included in Z.

In practice, observed differences across individuals may be due to both inter-individual differences and policy variables.

1.1.2 Examples

a) Wages and education:
- People with a higher education level have higher wages because firms value those people more;
- People have higher education because they have higher ability (expected productivity) anyway, and firms value worker ability more.

b) SALES = β₀ + β₁ ADVERTISEMENT + β₂ Z:
- More efficient firms enjoy more sales, and thus have more money to spend on advertising.

c) Regulation and firm output:
- Regulatory control affects firm output;
- Firms with higher output are more regulated on average.

d) WAGE = β₀ + β₁ 1I(UNION) + β₂ Z:
- Firms react to higher wages imposed by unions by hiring higher-quality workers.

1.1.3

In consumer or production economics, input, output or consumer

prices are dicult to use, because:

 Time-series:

related;

 Cross-sections: Not enough price variation across individuals

or rms.

With panel data, variations across individuals and across time periods are accounted for.

 Time-series: no information on the impact of individual characteristics (socioeconomic variables,...);

 Cross-sections: no information on adjustment dynamics. Estimates may reect inter-individual dierences inherent in comparisons of

1.1.4 May reduce bias due to missing or unobserved variables

With panel data, it is easy to control for unobserved heterogeneity across individuals. This is critical in practice, and explains why panel data models are now so popular in micro- and macro-econometrics. The point is related to endogeneity and omitted-variable issues.


Consider a firm solving max_Q π = pQ − C(α, Q), where C(α, Q) = α c(Q). The first-order condition is

p = α ∂c(Q)/∂Q, with ∂c(Q)/∂Q = A Q^(γ−1) (Cobb-Douglas) or ∂c(Q)/∂Q = γ₀ + γ₁Q (quadratic).

Cobb-Douglas case: log Q = (1/(γ−1)) (log p − log α − log A). From the equilibrium condition to an estimable equation: observations (Q_it, p_it), unobserved heterogeneity α_i, firm i, period t:

log Q_it = (1/(γ−1)) (log p_it − log α_i − log A).

Identification issue: the estimable equation is

Q̃_it = a₀ + a₁ p̃_it + u_it, i = 1,...,N, t = 1,...,T,

where Q̃_it = log Q_it, p̃_it = log p_it, a₁ = 1/(γ−1), a₀ = −(log A + E log α_i)/(γ−1), and E u_it = 0. The model is identified if E log α_i = 0, i.e., E α_i = 1; otherwise A is biased if α_i is overlooked and E log α_i ≠ 0.

Empirical issue: possible correlation between the output price p_it and the efficiency term α_i.

1.2 Analysis of variance

Consider the model

y_it = α_i + x_it β_i + ε_it, i = 1,...,N, t = 1,...,T_i,

where x_it is scalar, and T_i is the number of time periods available for individual i.


Useful first-order empirical moments are

ȳ_i = (1/T_i) Σ_{t=1}^{T_i} y_it,  x̄_i = (1/T_i) Σ_{t=1}^{T_i} x_it,

S_xx^i = Σ_{t=1}^{T_i} (x_it − x̄_i)²,  S_yy^i = Σ_{t=1}^{T_i} (y_it − ȳ_i)²,  S_xy^i = Σ_{t=1}^{T_i} (x_it − x̄_i)(y_it − ȳ_i),  i = 1,...,N.

Individual least-squares estimates are

β̂_i = S_xy^i / S_xx^i and α̂_i = ȳ_i − x̄_i β̂_i,

and the Residual Sum of Squares (RSS) for individual i is

S_yy^i − (S_xy^i)² / S_xx^i, with (T_i − 2) degrees of freedom.

Consider now a restricted model with constant slopes and constant intercepts,

y_it = α + x_it β + ε_it,

which obtains by imposing the restrictions

β₁ = β₂ = ... = β_N (= β),  α₁ = α₂ = ... = α_N (= α).

Under these restrictions, least-squares parameter estimates would be

β̂ = [Σ_i Σ_t (x_it − x̄)(y_it − ȳ)] / [Σ_i Σ_t (x_it − x̄)²]

and α̂ = ȳ − x̄ β̂, where

ȳ = (1/Σ_i T_i) Σ_i Σ_t y_it,  x̄ = (1/Σ_i T_i) Σ_i Σ_t x_it.

The corresponding RSS is

Σ_i Σ_t (y_it − ȳ)² − [Σ_i Σ_t (y_it − ȳ)(x_it − x̄)]² / Σ_i Σ_t (x_it − x̄)²,

with Σ_i T_i − 2 as number of degrees of freedom.

For a majority of applications, the first model is too general and its estimation would require a great number of time observations. If unobserved heterogeneity is additive in the model, we might consider the following specification with constant slope and different intercepts:

y_it = α_i + x_it β + ε_it.

Minimizing Σ_i Σ_t (y_it − α_i − x_it β)² with respect to α_i and β, we have

Σ_i Σ_t (y_it − α_i − x_it β) = 0,  Σ_i Σ_t x_it (y_it − α_i − x_it β) = 0,

so that

α̂_i = ȳ_i − x̄_i β̂ and β̂ = [Σ_i Σ_t x_it (y_it − ȳ_i)] / [Σ_i Σ_t x_it (x_it − x̄_i)].

The Residual Sum of Squares now has Σ_i T_i − (N + 1) degrees of freedom.


1.3 Some definitions

- Typical panel: when the number of units (individuals) N is large and the number of time periods T is small.

- Short (long) panel: when the number of periods T is small (large).

- Balanced panel: same number of periods for every unit (individual).

- Rotating panel: a subset of individuals is replaced every period. Rotating panels can be balanced or unbalanced.

- Pseudo panel: repeated cross-sections grouped into cohorts of comparable individuals.

- Attrition: with long panels, the probability that an individual remains in the sample decreases as the number of periods increases (non-response, moving, death, etc.).


Chapter 2
The linear model

2.1 Notation

y_it = x_it β + u_it, i = 1,...,N, t = 1,...,T,

where x_it is a 1 × K vector and u_it is the residual term. y_it and the components of x_it are both time-varying and varying across individuals.

Component of the dependent variable that is unexplained by x_it:

u_it = α_i + λ_t + ε_it,

where α_i is the time-invariant individual effect, λ_t is the time effect, and ε_it is the i.i.d. component.

One-way error-component model: u_it = α_i + ε_it. Two-way error-component model: u_it = α_i + λ_t + ε_it.

E(y_it | x_it) = x_it β across i and t,
E(y_it | x_it, α_i) = x_it β + α_i for individual i, across periods,
E(y_it | x_it, λ_t) = x_it β + λ_t for period t, across individuals,
E(y_it | x_it, α_i, λ_t) = x_it β + α_i + λ_t for individual i and period t.
2.1.1 Model notation

In stacked form,

Y = Xβ + α + λ + ε,

where Y, α, λ and ε are (NT × 1) and X is (NT × K). Convention: index t runs faster, index i runs slower:

Y = (y_11, ..., y_1T, y_21, ..., y_2T, ..., y_N1, ..., y_NT)',

and X = [X^(1), ..., X^(K)], each column X^(k) = (X^(k)_11, ..., X^(k)_1T, ..., X^(k)_N1, ..., X^(k)_NT)' stacked the same way, so that

Y = [X^(1) ... X^(K)] (β_1, ..., β_K)' + α + λ + ε.

2.1.1.2 Model in vector form

y_i = X_i β + α_i + λ + ε_i, i = 1,...,N,

where y_i is T × 1 and X_i is T × K. Note: λ = (λ_1, λ_2, ..., λ_T)' and α_i = (α_i, α_i, ..., α_i)' are (T × 1).
2.1.2

- I_NT: identity matrix with NT rows and NT columns;
- e_T: T-vector of ones;
- B = I_N ⊗ (1/T) e_T e_T' (between-individual operator);
- B* = (1/N) e_N e_N' ⊗ I_T (between-period operator);
- Q = I_NT − I_N ⊗ (1/T) e_T e_T' = I_NT − B (within-individual operator);
- Q* = I_NT − (1/N) e_N e_N' ⊗ I_T = I_NT − B* (within-period operator);
- B B* = (1/NT) e_NT e_NT' (computes the full population mean).

Important assumption: no intercept term in the model (otherwise, use B B* to demean all variables).
All these operators are NT × NT. The B operator replaces each observation by the corresponding individual mean; the Q operator computes deviations from individual means.

2.1.3 Symmetry, idempotency and orthogonality

Q' = Q, B' = B, Q² = Q, B² = B, BQ = QB = 0,

rank(Q) = N(T − 1) and rank(B) = N.

Decomposition of the Q operator with N = T = 2:

Qy = [I₄ − I₂ ⊗ (1/2) e₂e₂'] y
   = (y_11, y_12, y_21, y_22)' − (1/2)(y_11 + y_12, y_11 + y_12, y_21 + y_22, y_21 + y_22)'.

We will also use

- B_T = (1/T) e_T e_T': between operator for a single individual;
- Q_T = I_T − (1/T) e_T e_T' = I_T − B_T: within operator for a single individual.
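A minimal numpy sketch (illustrative dimensions and variable names) that builds B and Q with Kronecker products and verifies the properties above:

import numpy as np

N, T = 3, 4
e_T = np.ones((T, 1))

# Between-individual operator: averages observations within each individual.
B = np.kron(np.eye(N), e_T @ e_T.T / T)
# Within-individual operator: deviations from individual means.
Q = np.eye(N * T) - B

# Symmetry, idempotency and orthogonality (Section 2.1.3).
assert np.allclose(Q, Q.T) and np.allclose(B, B.T)
assert np.allclose(Q @ Q, Q) and np.allclose(B @ B, B)
assert np.allclose(B @ Q, np.zeros((N * T, N * T)))
assert np.linalg.matrix_rank(Q) == N * (T - 1)
assert np.linalg.matrix_rank(B) == N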

2.2 The One-Way Fixed Effects model

Terminology: the fixed-effects model does not mean that the individual effects are non-stochastic. Rather, estimation is conditional on the α_i's.

2.2.1 The Frisch-Waugh-Lovell theorem

Estimating β by OLS is equivalent to regressing Y on X and on individual dummies.

Let E be the NT × N matrix of individual dummy variables,

E = I_N ⊗ e_T,

whose i-th column contains ones for the T observations of individual i and zeros elsewhere, and consider the model

Y = Xβ + Eα + ε = Wδ + ε,

where W = [X, E] and δ = (β', α')'.

Frisch-Waugh-Lovell theorem: parameter estimates β̂ are numerically identical whether obtained as

- β̂ from δ̂_OLS = (β̂', α̂')' = (W'W)⁻¹ W'Y, or
- β̂ = (X*'X*)⁻¹ X*'Y*, where

X* = [I − E(E'E)⁻¹E'] X = P_E X,  Y* = [I − E(E'E)⁻¹E'] Y = P_E Y

(residuals from least-squares regressions of X and Y on E).

But E = I_N ⊗ e_T, E'E = I_N ⊗ e_T'e_T = T · I_N, so

P_E = I − E(E'E)⁻¹E' = I − (1/T) E E' = I − I_N ⊗ (1/T) e_T e_T' = Q.

Hence β̂ = (X*'X*)⁻¹(X*'Y*) = (X'P_E'P_E X)⁻¹(X'P_E'P_E Y) = (X'QX)⁻¹(X'QY).

Idea behind the fixed-effect estimation procedure: eliminate the individual effects α_i from the variables:

y_it − (1/T)Σ_t y_it = (x_it − (1/T)Σ_t x_it) β + u_it − (1/T)Σ_t u_it,

i.e., Y − BY = (X − BX)β + u − Bu, or

QY = QXβ + Qu.

β̂ = [(QX)'(QX)]⁻¹ (QX)'QY = (X'Q'QX)⁻¹ (X'Q'QY)
  = (X'QX)⁻¹ X'QY and Var(β̂) = σ_ε² (X'QX)⁻¹.
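A minimal numpy sketch of the Within computation on simulated data (illustrative values and variable names; the demeaning implements the Q transformation):

import numpy as np

rng = np.random.default_rng(0)
N, T, K = 100, 5, 2
alpha = rng.normal(size=N)                              # individual effects
X = rng.normal(size=(N, T, K)) + alpha[:, None, None]   # regressors correlated with alpha
beta = np.array([1.0, -0.5])
Y = X @ beta + alpha[:, None] + rng.normal(size=(N, T))

# Q transformation: deviations from individual means.
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(-1, K)
Yw = (Y - Y.mean(axis=1, keepdims=True)).reshape(-1)

beta_w = np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)          # (X'QX)^{-1} X'QY
resid = Yw - Xw @ beta_w
sigma2_eps = resid @ resid / (N * (T - 1) - K)
var_beta_w = sigma2_eps * np.linalg.inv(Xw.T @ Xw)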

2.2.2 Interpretation as a covariance estimator

The model is, in vector form:

y_i = x_i β + α_i e_T + ε_i, i = 1,...,N,

with assumptions

E(ε_i) = 0, E(ε_i ε_i') = σ_ε² I_T, E(ε_i ε_j') = 0 for i ≠ j.

OLS estimates of β and α_i obtain from

min_{α_i, β} Σ_{i=1}^N ε_i'ε_i = Σ_i (y_i − α_i e_T − x_i β)'(y_i − α_i e_T − x_i β)

⇒ α̂_i = ȳ_i − x̄_i β̂, i = 1,...,N,

and substituting in the partial derivative wrt β, we have

β̂ = [Σ_{i,t} (x_it − x̄_i)(x_it − x̄_i)']⁻¹ [Σ_{i,t} (x_it − x̄_i)(y_it − ȳ_i)].

This is the covariance estimator, or the LSDV (Least-Squares Dummy-Variable) estimator. β̂ is unbiased and is consistent when N or T → ∞, with

Var(β̂) = σ̂_ε² [Σ_{i=1}^N x_i' Q_T x_i]⁻¹,

where Q_T = I_T − (1/T) e_T e_T'.

2.2.3 Comments

- Model transformation by filtering out the individual components ⇒ coefficients associated with time-invariant regressors are not identified.

- The fixed-effect procedure uses only the variation over time within each unit, hence the name Within estimator.

- Another possibility is the Between procedure, using variation between individuals:

BY = BXβ + Bα + Bε,

β̂_B = [(BX)'(BX)]⁻¹ (BX)'BY = (X'BX)⁻¹ X'BY.

This alternative estimator uses variation between individual means of the model variables.

Degrees of freedom: if one estimates QY = QXβ + Qu directly, statistical software would divide the RSS by NT − K (individual effects not counted). But in the model Y = Xβ + Eα + ε, the RSS should be divided by N(T−1) − K; reported variances from the transformed model must therefore be multiplied by (NT − K)/[N(T−1) − K].

[Figure: scatter of observations in the (X, Y) plane, illustrating Within (within-individual) versus Between (between-individual means) variation.]
2.2.4 Poolability

Testing for poolability: as before,

y_it = α_i + x_it β_i + ε_it,

but now x_it is a K-vector. Test

H0: β₁ = β₂ = ... = β_N (= β) (K(N−1) constraints),

i.e., individual regressions versus a common slope. Fisher test statistic:

F = [(RRSS − URSS)/K(N−1)] / [URSS/N(T − K − 1)] ∼ F(K(N−1), N(T−K−1)),

where URSS = Σ_{i=1}^N [S_yy^i − (S_xy^i)²/S_xx^i] is the sum of individual residual sums of squares.

Testing for individual effects: H0: α₁ = ... = α_N (= α), pooled OLS versus Within. Fisher test statistic:

F = [(RRSS − URSS)/(N−1)] / [URSS/(NT − N − K)] ∼ F(N−1, NT−N−K),

where RRSS: from the OLS regression on pooled data, and URSS: from the Within (LSDV) regression.
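A sketch of the second test in numpy (helper name mine; balanced panel, pooled model with intercept assumed):

import numpy as np

def f_test_individual_effects(Y, X):
    """F-test of H0: alpha_1 = ... = alpha_N, comparing pooled OLS
    (restricted) with the Within/LSDV regression (unrestricted).
    Y is (N, T), X is (N, T, K)."""
    N, T, K = X.shape
    Xp, Yp = X.reshape(-1, K), Y.reshape(-1)
    Xp1 = np.column_stack([np.ones(N * T), Xp])   # pooled regressors + intercept
    rrss = Yp @ Yp - Yp @ Xp1 @ np.linalg.solve(Xp1.T @ Xp1, Xp1.T @ Yp)
    Xw = (X - X.mean(axis=1, keepdims=True)).reshape(-1, K)
    Yw = (Y - Y.mean(axis=1, keepdims=True)).reshape(-1)
    urss = Yw @ Yw - Yw @ Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)
    f = ((rrss - urss) / (N - 1)) / (urss / (N * T - N - K))
    return f   # compare with F(N-1, NT-N-K) critical values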

2.3 The Random Effects model

2.3.1 Notation and assumptions

The individual effects α_i are now random draws, so that inference is no longer conditional upon the α_i's. Assumptions:

α_i ∼ IID(0, σ_α²), ε_it ∼ IID(0, σ_ε²), E(α_i ε_it) = E(α_i x_it) = 0,

with

E(α_i α_j) = σ_α² if i = j, 0 otherwise;
E(ε_it ε_js) = σ_ε² if i = j and t = s, 0 otherwise.

Hence cov(u_it, u_js) = σ_α² + σ_ε² if i = j and t = s, and σ_α² if i = j and t ≠ s.

Let

Σ_T = E(u_i u_i') = σ_α² e_T e_T' + σ_ε² I_T

(a T × T matrix with σ_α² + σ_ε² on the diagonal and σ_α² off the diagonal), for every individual i = 1,...,N. We have

Ω = E(uu') = I_N ⊗ Σ_T = I_N ⊗ [σ_α² (e_T e_T') + σ_ε² I_T]
  = I_N ⊗ [σ_α² (T B_T) + σ_ε² (Q_T + B_T)],

since Q_T = I_T − B_T and B_T = (1/T) e_T e_T'. Therefore

Ω = T σ_α² B + σ_ε² I_NT, or equivalently: Ω = σ_ε² Q + (T σ_α² + σ_ε²) B.

2.3.2

The model is Y = Xβ + U, with E(UU') = Ω. GLS estimates β based on the known structure of the variance-covariance matrix Ω (with σ_α² and σ_ε² given for now):

β̂_GLS = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹Y and Var(β̂_GLS) = (X'Ω⁻¹X)⁻¹.

Computation of Ω⁻¹: use the formula

Ω^r = (σ_ε²)^r Q + (T σ_α² + σ_ε²)^r B

for an arbitrary scalar r, based on the properties of Q and B (idempotency and orthogonality).

Hence useful matrices are

Ω⁻¹ = (1/σ_ε²) Q + (1/(T σ_α² + σ_ε²)) B and Ω^(−1/2) = (1/σ_ε) Q + (1/(T σ_α² + σ_ε²)^(1/2)) B.

We have

β̂_GLS = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹Y = [X'(Q + B/θ)X]⁻¹ [X'(Q + B/θ)Y],

where θ = (T σ_α² + σ_ε²)/σ_ε² = 1 + T σ_α²/σ_ε².

GLS as Weighted Least Squares: premultiply the model by σ_ε Ω^(−1/2) and use OLS: Y* = X*β + u*, where

Y* = σ_ε Ω^(−1/2) Y = [Q + σ_ε/(σ_ε² + T σ_α²)^(1/2) B] Y,
X* = σ_ε Ω^(−1/2) X = [Q + σ_ε/(σ_ε² + T σ_α²)^(1/2) B] X,

so that

Y* = (Q + θ^(−1/2) B) Y,  X* = (Q + θ^(−1/2) B) X,

and in scalar form:

y*_it = y_it − (1 − 1/√θ) ȳ_i,  x*_it = x_it − (1 − 1/√θ) x̄_i.


2.3.3 Comparison between GLS, OLS and Within

β̂_GLS = [X'QX + (1/θ) X'BX]⁻¹ [X'QY + (1/θ) X'BY],

β̂_Within = (X'QX)⁻¹ X'QY,  β̂_Between = (X'BX)⁻¹ X'BY,

so that

β̂_GLS = S₁ β̂_Within + S₂ β̂_Between,

where S₁ = [X'QX + (1/θ) X'BX]⁻¹ X'QX and S₂ = [X'QX + (1/θ) X'BX]⁻¹ (1/θ) X'BX.

- (i) If σ_α² = 0, then 1/θ = 1 and β̂_GLS = β̂_OLS.
- (ii) If T → ∞, then 1/θ → 0 and β̂_GLS → β̂_Within.
- (iii) If 1/θ → 1, then β̂_GLS → β̂_Between.
- (iv) Var(β̂_Within) − Var(β̂_GLS) is a positive semi-definite matrix.
- (v) If 1/θ → 0 then Var(β̂_Within) → Var(β̂_GLS).
2.3.4 Fixed or random α_i's?

⇒ If inference is restricted to the specific units (individuals) in the sample: conditional inference, use fixed effects. Example: individuals are not selected at random, or all firms in a given industry are selected.

⇒ If inference is on the whole population: marginal (unconditional) inference, use random effects. Example: individuals are selected randomly from a huge population (consumers).

Other relevant criteria:

- Interpretation of the effects in the (economic) model;
- Sampling process: purely random or not;
- Number of units (countries, regions, households, ...);
- Interchangeability of units;
- Endogeneity of x_it (see later).

2.3.4.2 Terminology

When fixed individual effects are considered: Fixed-Effects or Within estimation procedure. When random individual effects: GLS (Generalized Least Squares) estimation procedure.
2.3.5 Example

Sample of individual observations (N = 629, T = 6). Dependent variable: log wage.

The GLS estimator is a weighted average of the Within and Between estimators, where the weight is the inverse of the corresponding variance. The Within estimator neglects the variation between individuals, the Between estimator neglects the variation within individuals, and OLS gives equal weight to both Within and Between variations.

Note: if the model contains an intercept, we use demeaned variables (see the remark in Section 2.1.2).

Table 2.1: Within and GLS estimates. Dependent variable: log wage.

Variable          Within     GLS
Constant                     0.8499
Age in [20,35]    0.0557     0.0393
Age in [35,45]    0.0351     0.0092
Age in [45,55]    0.0209    -0.0007
Age in [55,65]    0.0209    -0.0097
Age 65 over      -0.0171    -0.0423
...              -0.0042    -0.0277
...              -0.0204    -0.0250
Self-employed    -0.2190    -0.2670
South            -0.1569    -0.0324
Rural            -0.0101    -0.1215

2.3.6 Estimation of variances

If errors are normal, BQU (best quadratic unbiased) estimates of σ_α² and σ_ε² are found from

σ̂_ε² = u'Qu / tr(Q) = Σ_{i=1}^N Σ_{t=1}^T (u_it − ū_i)² / [N(T−1)],

σ̂_ε² + T σ̂_α² = u'Bu / tr(B) = T Σ_{i=1}^N ū_i² / N,

because tr(Q) = N(T−1) and tr(B) = N. But in practice the u_it's are unobserved, and the variances must be computed from regression residuals:

1/ Wallace and Hussain (1969): use OLS residuals in place of the true u's.

2/ Amemiya (1971): use LSDV residuals. We have

[√(NT)(σ̂_ε² − σ_ε²), √N(σ̂₁² − σ₁²)]' → N(0, diag(2σ_ε⁴, 2σ₁⁴)),

where σ₁² = σ_ε² + T σ_α² and σ̂_α² = (σ̂₁² − σ̂_ε²)/T.

3/ Use the mean square errors from the Within and the Between regressions. Mean square error from the Within regression:

σ̂_ε² = [Y'QY − Y'QX(X'QX)⁻¹X'QY] / [N(T−1) − K],

and from the Between regression:

σ̂_ε² + T σ̂_α² = [Y'BY − Y'BX(X'BX)⁻¹X'BY] / [N − K − 1].

Note: there is an intercept term in the Between regressors (X̄), not in the Within regression.

4/ Nerlove (1971): compute σ̂_α² = (1/(N−1)) Σ_{i=1}^N (α̂_i − α̂̄)², where the α̂_i are the parameter estimates associated to individual dummies from the LSDV regression, and σ̂_ε² obtains from the LSDV residuals.

The estimation methods above with covariance components replaced by consistent estimates: feasible GLS.
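A sketch of feasible GLS based on the Within and Between mean squares of item 3/ (helper name mine; balanced panel assumed):

import numpy as np

def variance_components(Y, X):
    """Estimate sig2_eps and sig2_alpha from the Within and Between
    mean squares; Y is (N, T), X is (N, T, K)."""
    N, T, K = X.shape
    Xw = (X - X.mean(axis=1, keepdims=True)).reshape(-1, K)
    Yw = (Y - Y.mean(axis=1, keepdims=True)).reshape(-1)
    bw = np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)
    sig2_eps = (Yw - Xw @ bw) @ (Yw - Xw @ bw) / (N * (T - 1) - K)
    # Between regression on individual means, with intercept.
    Xb = np.column_stack([np.ones(N), X.mean(axis=1)])
    Yb = Y.mean(axis=1)
    bb = np.linalg.solve(Xb.T @ Xb, Xb.T @ Yb)
    s2_between = (Yb - Xb @ bb) @ (Yb - Xb @ bb) / (N - K - 1)
    # Var(ubar_i) = sig2_alpha + sig2_eps/T in the means regression.
    sig2_alpha = max(s2_between - sig2_eps / T, 0.0)
    return sig2_eps, sig2_alpha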

Chapter 3
Extensions

3.1 The Two-way panel data model

Error-component structure of the form:

u_it = α_i + λ_t + ε_it, i = 1,...,N, t = 1,...,T,

or in matrix form

U = (I_N ⊗ e_T)α + (e_N ⊗ I_T)λ + ε.

3.1.1 Fixed effects and inference

α_i and λ_t are treated as fixed parameters; inference is conditional on the N individuals over the period 1 → T. The effects are normalized by the restrictions

Σ_{i=1}^N α_i = 0 and Σ_{t=1}^T λ_t = 0.

3.1.1.1 Notation

Fixed-effect estimates of β obtain by using the new operator

Q = I_N ⊗ I_T − I_N ⊗ (e_T e_T'/T) − (e_N e_N'/N) ⊗ I_T,

so that Qu = {u_it − ū_i − ū_t}_it. OLS on the model in deviations yields

β̂ = (X'QX)⁻¹ X'QY,
α̂_i = ȳ_i − x̄_i β̂,
λ̂_t = ȳ_t − x̄_t β̂.

If the model contains an intercept, the operator Q becomes

Q = I_N ⊗ I_T − I_N ⊗ (e_T e_T'/T) − (e_N e_N'/N) ⊗ I_T + (e_N e_N'/N) ⊗ (e_T e_T'/T),

so that Qu = {u_it − ū_i − ū_t + ū}_it, and Within estimates are

β̂ = (X'QX)⁻¹ X'QY,
α̂_i = (ȳ_i − ȳ) − (x̄_i − x̄) β̂,
λ̂_t = (ȳ_t − ȳ) − (x̄_t − x̄) β̂.

3.1.1.2 Testing for effects

1/ H0: α₁ = ... = α_N = λ₁ = ... = λ_T = 0. Fisher test statistic:

F = [(RRSS − URSS)/k₁] / [URSS/k₂] ∼ F(k₁, k₂),

where k₁ = N + T − 2, k₂ = (N−1)(T−1) − K; URSS: from the two-way Within model, RRSS: from pooled OLS.

2/ H0: α₁ = ... = α_N = 0 given λ_t ≠ 0, t ≤ T − 1. Fisher test statistic:

F = [(RRSS − URSS)/k₁] / [URSS/k₂] ∼ F(k₁, k₂),

where k₁ = N − 1, k₂ = (N−1)(T−1) − K; URSS: from the two-way Within model, RRSS: from the regression with time dummies only:

(y_it − ȳ_t) = (x_it − x̄_t)β + (u_it − ū_t).

3/ H0: λ₁ = ... = λ_{T−1} = 0 given α_i ≠ 0, i ≤ N − 1. Fisher test statistic:

F = [(RRSS − URSS)/k₁] / [URSS/k₂] ∼ F(k₁, k₂),

where k₁ = T − 1, k₂ = (N−1)(T−1) − K; URSS: from the two-way Within model, RRSS: from the Within regression as in the one-way model:

(y_it − ȳ_i) = (x_it − x̄_i)β + (u_it − ū_i).

3.1.2 Example: Production function (Hoch 1962)

Sample: 63 Minnesota farms over the period 1946-1951. Estimation of a Cobb-Douglas production function:

log Output_it = β₀ + β₁ log Labor_it + β₂ log Real estate_it + β₃ log Machinery_it + β₄ log Fertilizer_it.

Motivation for adding specific effects (into u_it):
- Climatic conditions, identical across farms (λ_t).

Table 3.1: Least-squares estimates of the Cobb-Douglas production function.

Estimate            (I)     (II)    (III)
β₁ (Labor)          0.256   0.166   0.043
β₂ (Real estate)    0.135   0.230   0.199
β₃ (Machinery)      0.163   0.261   0.194
β₄ (Fertilizer)     0.349   0.311   0.289
Sum of β's          0.904   0.967   0.726
R²                  0.721   0.813   0.884
Assumption       αᵢ = λₜ = 0   αᵢ = 0   λₜ = 0

3.2 More on non-spherical disturbances

Panel data: in the random-effects context, heteroskedasticity arises naturally from the error-component structure. But so far the variances σ_α² and σ_ε² are assumed constant. Possible extensions:

Var(α_i) = σ_{α,i}²   (individual-specific heteroskedasticity);
Var(ε_it) = σ_i²     ('typical' heteroskedasticity);
E(ε_it ε_is) ≠ 0, t ≠ s   (serial correlation).

3.2.1 Heteroskedasticity in the individual effect

Assumptions:

α_i ∼ (0, σ_{α,i}²), ε_it ∼ IID(0, σ_ε²), i = 1,...,N,

or E(αα') = diag[σ_{α,i}²] = Σ_α and ε ∼ IID(0, σ_ε²). Then

Ω = E(UU') = diag[σ_{α,i}²] ⊗ (e_T e_T') + diag[σ_ε²] ⊗ I_T,

where diag[σ_ε²] is N × N. We have

Ω = diag[T σ_{α,i}² + σ_ε²] ⊗ (e_T e_T'/T) + diag[σ_ε²] ⊗ (I_T − e_T e_T'/T),

Ω^r = diag[(T σ_{α,i}² + σ_ε²)^r] ⊗ (e_T e_T'/T) + diag[(σ_ε²)^r] ⊗ (I_T − e_T e_T'/T).

Transformation of the heteroskedastic model: premultiply both sides by

σ_ε Ω^(−1/2) = diag[σ_ε/(T σ_{α,i}² + σ_ε²)^(1/2)] ⊗ (e_T e_T'/T) + I_N ⊗ (I_T − e_T e_T'/T),

so that

y*_it = y_it − [1 − σ_ε/(T σ_{α,i}² + σ_ε²)^(1/2)] ȳ_i.

This has the same form as in the homoskedastic case, only here θ is individual-specific:

θ_i = (T σ_{α,i}² + σ_ε²)/σ_ε² and y*_it = y_it − (1 − 1/√θ_i) ȳ_i.

Feasible GLS:

- Step 1. Estimate σ_ε² consistently from the usual Within regression;
- Step 2. Noting that Var(u_it) = w_i² = σ_{α,i}² + σ_ε², estimate w_i² by ŵ_i² = (1/(T−1)) Σ_{t=1}^T û_it², where û_it is the pooled-OLS residual;
- Step 3. Compute σ̂_{α,i}² = ŵ_i² − σ̂_ε²;
- Step 4. Form T σ̂_{α,i}² + σ̂_ε² and θ̂_i, and compute ŷ*_it, x̂*_it;
- Step 5. Regress ŷ*_it on x̂*_it to get β̂.

Important: consistency of the variance-component estimates ŵ_i², i = 1,...,N requires T >> N.
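A sketch of Steps 1-5 (helper name mine; balanced panel assumed, and, as just noted, reliable ŵ_i² estimates need large T):

import numpy as np

def fgls_hetero_alpha(Y, X):
    """Feasible GLS with individual-specific Var(alpha_i).
    Y is (N, T), X is (N, T, K)."""
    N, T, K = X.shape
    # Step 1: sigma_eps^2 from the Within regression.
    Xw = (X - X.mean(axis=1, keepdims=True)).reshape(-1, K)
    Yw = (Y - Y.mean(axis=1, keepdims=True)).reshape(-1)
    bw = np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)
    s2_eps = (Yw - Xw @ bw) @ (Yw - Xw @ bw) / (N * (T - 1) - K)
    # Step 2: w_i^2 from pooled-OLS residuals, individual by individual.
    Xp, Yp = X.reshape(-1, K), Y.reshape(-1)
    bp = np.linalg.solve(Xp.T @ Xp, Xp.T @ Yp)
    u = (Yp - Xp @ bp).reshape(N, T)
    w2 = (u ** 2).sum(axis=1) / (T - 1)
    # Steps 3-4: sigma_alpha_i^2, theta_i, transformed variables.
    s2_alpha_i = np.maximum(w2 - s2_eps, 0.0)
    theta_i = (T * s2_alpha_i + s2_eps) / s2_eps
    lam = (1.0 - 1.0 / np.sqrt(theta_i))[:, None]
    Ys = (Y - lam * Y.mean(axis=1, keepdims=True)).reshape(-1)
    Xs = (X - lam[:, :, None] * X.mean(axis=1, keepdims=True)).reshape(-1, K)
    # Step 5: OLS on the transformed data.
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ Ys)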

3.2.2 'Typical' heteroskedasticity

Assumptions: α_i ∼ IID(0, σ_α²) and Var(ε_it) = σ_i². Then

Ω = E(UU') = diag[σ_α²] ⊗ (e_T e_T') + diag[σ_i²] ⊗ I_T
  = diag[T σ_α² + σ_i²] ⊗ (e_T e_T'/T) + diag[σ_i²] ⊗ (I_T − e_T e_T'/T).

The transformed model uses

Ω^(−1/2) = diag[1/(T σ_α² + σ_i²)^(1/2)] ⊗ (e_T e_T'/T) + diag[1/σ_i] ⊗ (I_T − e_T e_T'/T),

so that Y* = Ω^(−1/2) Y has typical element

y*_it = (y_it − ȳ_i)/σ_i + ȳ_i/(T σ_α² + σ_i²)^(1/2) = (y_it − θ_i ȳ_i)/σ_i,

where θ_i = 1 − σ_i/(T σ_α² + σ_i²)^(1/2).

E(u_it²) = w_i² = σ_α² + σ_i² ∀i, hence OLS residuals û_it can be used to estimate w_i²: ŵ_i² = (1/(T−1)) Σ_t û_it². Within residuals ũ_it are then used to compute

σ̂_i² = (1/(T−1)) Σ_t (ũ_it − ū̃_i)².

A consistent estimate of σ_α² is σ̂_α² = (1/N) Σ_{i=1}^N (ŵ_i² − σ̂_i²).

3.3 Unbalanced panel data models

3.3.1 Introduction

Definition: the number of time periods differs from one unit (individual) to another. For individual i, we have T_i periods, and the total number of observations is now Σ_{i=1}^N T_i (instead of NT previously).

Examples:

- Firms: may close down, or new entrants appear in an industry;
- Consumers: may move, die or refuse to answer anymore;
- Workers: may become unemployed, ...

Problem of attrition: the probability of a unit staying in the sample decreases as the number of periods increases.

3.3.2

3.3.2.1 The one-way unbalanced fixed-effect model

Consider the unbalanced model with T₁ = 3 and T₂ = 2:

(y_11, y_12, y_13, y_21, y_22)' = (x_11, x_12, x_13, x_21, x_22)'β + (α₁, α₁, α₁, α₂, α₂)' + (ε_11, ε_12, ε_13, ε_21, ε_22)'.

To eliminate the α_i, use

Q = diag(I_{T_i} − e_{T_i} e_{T_i}'/T_i) = [ I₃ − e₃e₃'/3    0
                                           0              I₂ − e₂e₂'/2 ]

  = [  2/3  −1/3  −1/3    0     0
      −1/3   2/3  −1/3    0     0
      −1/3  −1/3   2/3    0     0
        0     0     0    1/2  −1/2
        0     0     0   −1/2   1/2 ].

Consider now the unbalanced two-way model, with observations grouped by period. Let N_t be the number of individuals observed at time t, and n = Σ_{t=1}^T N_t the total number of observations. Consider the N_t × N matrix D_t of dummy variables indicating the individuals observed at time t.

Example: N = 3, N₁ = 3, N₂ = 2, N₃ = 2, and the observations are (y_11, y_21, y_31), (y_12, y_32), (y_13, y_23). Then

D₁ = [ 1 0 0          D₂ = [ 1 0 0        D₃ = [ 1 0 0
       0 1 0                 0 0 1 ]             0 1 0 ]
       0 0 1 ]

are 3 matrices (N_t × N), t = 1, 2, 3, constructed from I₃ by deleting the rows of the missing individuals.

Define Δ = (Δ₁, Δ₂), where Δ₁ = (D₁', ..., D_T')' is an (n × N) matrix and Δ₂ = diag(D_t e_N) is an (n × T) matrix:

Δ = [ D₁   D₁e_N   0    ...   0
      D₂   0     D₂e_N  ...   0
      ...  ...    ...   ...  ...
      D_T  0      0     ...  D_T e_N ].
The matrix Δ is n × (N + T). Note that Δ₁'Δ₁ = diag(T_i) (number of periods in the sample for unit i), and Δ₂'Δ₂ = diag(N_t) (number of individuals present in period t). Also, Δ₂'Δ₁ is a T × N matrix of dummy variables for the presence of unit i in the sample at time t. The term Δ₁α + Δ₂λ in the model contains all the α_i's and λ_t's.

In the balanced panel case, we would have Δ₁ = (e_T ⊗ I_N) and Δ₂ = (I_T ⊗ e_N), and Δ would be NT × (N + T).

In the example above, n = 3 + 2 + 2 = 7 and N = 3:

Δ = [ 1 0 0  1 0 0
      0 1 0  1 0 0
      0 0 1  1 0 0
      1 0 0  0 1 0
      0 0 1  0 1 0
      1 0 0  0 0 1
      0 1 0  0 0 1 ],

and the vector Δ'Y would be

Δ'Y = (y_11 + y_12 + y_13, y_21 + y_23, y_31 + y_32, y_11 + y_21 + y_31, y_12 + y_32, y_13 + y_23)',

which computes the sums of the variables over periods and individuals. An easier method than working with Δ directly transforms the variables by subtracting individual and time means, as in the balanced two-way Within case.

Let

Δ̄_N = Δ₁'Δ₁   (N × N);
Δ̄_T = Δ₂'Δ₂   (T × T);
Δ̄_NT = Δ₂'Δ₁  (T × N);
Δ̃ = Δ₂ − Δ₁ Δ̄_N⁻¹ Δ̄_NT'   (n × T);
P̄ = Δ̄_T − Δ̄_NT Δ̄_N⁻¹ Δ̄_NT' = Δ̃'Δ̃   (T × T).

Wansbeek and Kapteyn (1989): the required Within operator for such an unbalanced two-way panel is

Q̄ = I_n − Δ₁ Δ̄_N⁻¹ Δ₁' − Δ̃ P̄⁻ Δ̃',

where P̄⁻ is a generalized inverse of P̄. Q̄Y, say, is also written as

Q̄Y = Y − Δ₁ Δ̄_N⁻¹ Δ₁'Y − Δ̃ P̄⁻ Δ̃'Y = Y − Δ₁ Δ̄_N⁻¹ ι₁ − Δ̃ λ̃,

where ι₁ = Δ₁'Y and λ̃ = P̄⁻ Δ̃'Y. The transformed variable ι₁ = Δ₁'Y contains the individual sums Σ_{t=1}^{T_i} y_it. Typical transformed element:

(Q̄Y)_it = y_it − ι_{1i}/T_i + a_i'λ̃/T_i − λ̃_t,

where a_i is the dummy vector of the periods in which unit i is present.

Example

Y = (y_11, y_21, y_31, y_12, y_32, y_13, y_23) = (1, 2, 3, 2, 6, 3, 4), n = 7, N = 3, T = 3. We have

Δ̄_N = Δ̄_T = [ 3 0 0        Δ̄_NT = [ 1 1 1
               0 2 0                  1 0 1
               0 0 2 ],               1 1 0 ],

P̄ = [  1.6666  −0.8333  −0.8333
      −0.8333   1.1666  −0.3333
      −0.8333  −0.3333   1.1666 ],

ι₁ = (6, 6, 9)' and λ̃ = (−0.3383, 1.6618, 2.0368)'. For example,

Q̄y_11 = 1 − 6/3 + (1/3)(1 1 1)λ̃ + 0.3383 = 0.4582,

Q̄y_31 = 3 − 9/2 + (1/2)(1 1 0)λ̃ + 0.3383 = −0.5.

See Appendix 3 for the unbalanced random-eects model.


Chapter 4
Augmented panel data models

What are augmented panel models? What are the implications for estimation? Special estimation techniques are required when GLS is not feasible.

4.1 Introduction

Consider the model

y_it = x_it β + z_i γ + α_i + ε_it, i = 1,...,N, t = 1,...,T,

with x_it a 1 × K vector of time- and individual-varying regressors, and z_i a 1 × G vector of individual-specific (time-invariant) regressors.

Example:

log WAGE_it = β₁ HOURS_it + γ₁ EDUC_i + γ₂ SEX_i + α_i + ε_it.

Estimation methods:

- Within: γ is not identifiable, because

QY = QXβ + (I − B)Zγ + Qα + Qε = QXβ + Qε,

since BZ = Z. Only β is identifiable.

A two-step procedure is feasible:

1/ Run the Within regression ⇒ β̂;
2/ Run the Between regression on

ȳ_i − x̄_i β̂ = z_i γ + α_i + ε̄_i, i = 1,...,N,

to estimate the γ's.

- GLS: both β and γ are identifiable.

4.2 Choice between Within and GLS

One choice criterion between Within and GLS: the presence of z_i's in the model.

Recall: GLS is a consistent and efficient estimator provided the regressors are exogenous:

E(α_i x_it) = 0 and E(α_i z_i) = 0 ∀i, t.

Consider the non-augmented model y_it = x_it β + α_i + ε_it. If x_it is endogenous in the sense E(α_i x_it) ≠ 0, then GLS is not consistent:

β̂_GLS = β + (X'Ω⁻¹X)⁻¹ X'Ω⁻¹U = β + [X'(Q + θ⁻¹B)X]⁻¹ [X'(Q + θ⁻¹B)U],

where θ = 1 + T σ_α²/σ_ε², so that

X'(Q + θ⁻¹B)U = X'Qε + X'(Bα + Bε)/θ = 0 + X'Bα/θ + 0 = X'α/θ ≠ 0,

because E(X'ε) = 0 and Bα = α.

Important consequence in practice: if E(X'α) ≠ 0 and/or E(Z'α) ≠ 0, i.e., regressors are endogenous, GLS estimates are not consistent, but Within estimates of β are consistent because α is filtered out.

- If some regressors are endogenous ⇒ choose Within estimation (but γ is not identifiable);
- If all regressors are exogenous, use GLS (the most efficient).

Problems remain:

- γ is still not identified, because in the Between regression ȳ_i − x̄_i β̂ = z_i γ + α_i + ε̄_i, z_i may still be correlated with α_i.
- If one uses Within, all regressors are treated as endogenous (no distinction between exogenous and endogenous x_it's).

4.3 An important test for endogeneity

Null hypothesis: H0: E(X'α) = E(Z'α) = 0 (exogeneity).

                  β̂_GLS                    β̂_Within
Under H0:         Consistent, efficient    Consistent, not efficient
Alternative:      Not consistent           Consistent

The test compares estimates of the coefficients on the x_it's only. Therefore,

HT = (β̂_Within − β̂_GLS)' [Var(β̂_Within) − Var(β̂_GLS)]⁻¹ (β̂_Within − β̂_GLS) ∼ χ²(K).

Notes:

- β̂_GLS and β̂_Within must have the same dimension.
- The weighting matrix Var(β̂_Within) − Var(β̂_GLS) is positive semi-definite: GLS is efficient under H0. Recall that Var(β̂_Within) = σ_ε²(X'QX)⁻¹.

Interpretation of the number of degrees of freedom of the test: the Within estimator is based on the condition E(X'QU) = 0, whereas GLS is based on E(X'Ω⁻¹U) = 0, i.e., E(X'QU) = 0 and E(X'BU) = 0. For GLS we thus add the K conditions E(X'BU) = 0 (the rank of B-moments in X), hence the K degrees of freedom.
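A sketch of the statistic (helper name mine; inputs are the estimates and covariance matrices restricted to the K time-varying coefficients):

import numpy as np

def hausman_test(b_within, V_within, b_gls, V_gls):
    """HT = d' (V_within - V_gls)^{-1} d with d = b_within - b_gls;
    compare with chi2(K) critical values under H0."""
    d = b_within - b_gls
    return d @ np.linalg.solve(V_within - V_gls, d)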

4.4.1 Instrumental Variable estimation

Alternative method: instrumental-variable estimation. In the classical framework with N observations:

Y = Xβ + ε, E(X'ε) ≠ 0, E(W'ε) = 0,

where W is an N × L matrix of instruments.

- If K = L, solving the sample moment conditions

W'(Y − Xβ) = 0  ⇔  W'Y = (W'X)β

gives β̂ = (W'X)⁻¹ W'Y (the IV estimator).

- If L > K, W'(Y − Xβ) = 0 imposes L conditions on K parameters; construct the quadratic form

(Y − Xβ)' W(W'W)⁻¹W' (Y − Xβ), where P_W = W(W'W)⁻¹W',

and minimize wrt β:

β̂ = (X'P_W X)⁻¹ (X'P_W Y).

Note: in general, instruments may originate from inside or outside the equation.
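A sketch of the generalized IV (2SLS) computation (helper name mine):

import numpy as np

def iv_estimator(Y, X, W):
    """beta = (X' P_W X)^{-1} X' P_W Y with P_W = W (W'W)^{-1} W';
    reduces to (W'X)^{-1} W'Y in the just-identified case L = K."""
    PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)   # projection of X on W
    return np.linalg.solve(PW_X.T @ X, PW_X.T @ Y)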

4.4.2 IV in a panel-data context

Two requirements:

- Account for the variance-covariance structure (Ω);
- Find relevant instruments, not correlated with α.

Consider the general augmented model:

Y = X₁β₁ + X₂β₂ + Z₁γ₁ + Z₂γ₂ + α + ε,

where

X₁: NT × K₁, exogenous, varying across i and t;
X₂: NT × K₂, endogenous, varying across i and t;
Z₁: NT × G₁, exogenous, varying across i;
Z₂: NT × G₂, endogenous, varying across i,

and let W denote the instrument matrix. Apply IV to GLS-transformed data: let Y* = Ω^(−1/2)Y, X* = Ω^(−1/2)X, ε* = Ω^(−1/2)ε. We have

β̂_IV = [X*'P_W X*]⁻¹ X*'P_W Y* = [X'Ω^(−1/2) P_W Ω^(−1/2) X]⁻¹ [X'Ω^(−1/2) P_W Ω^(−1/2) Y].

The computation of θ̂ and Ω̂^(−1/2) is discussed in Section 4.5.

4.4.3

Exogeneity assumptions and a first instrument matrix

Exogeneity assumptions: E(X₁'α) = E(Z₁'α) = 0.

⇒ Obvious instruments are X₁ and Z₁, not sufficient because K₁ + G₁ < K₁ + K₂ + G₁ + G₂.

Additional instruments must not be correlated with α. Because α is the source of endogeneity, every variable not correlated with α is a valid instrument; the best valid instruments are also highly correlated with X₂ and Z₂. QX₁ and QX₂ are valid instruments: E[(QX₁)'α] = E[X₁'Qα] = 0 since Qα = 0.

As for X₁, it is equivalent to use BX₁, because we need

E[X₁'Ω⁻¹U] = E[X₁'(Q + θ⁻¹B)U] = E[X₁'QU] + θ⁻¹E[X₁'BU],

since BQ = 0 and BB = B. The Hausman-Taylor instrument matrix is

W_HT = [QX₁, QX₂, BX₁, Z₁] = [QX₁, QX₂, X₁, Z₁].

Identification condition: we have K₁ + K₂ + G₁ + G₂ parameters to estimate, using K₁ + K₁ + K₂ + G₁ instruments (K₁ + K₂ instruments in QX), hence the order condition K₁ ≥ G₂.

4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt

4.4.4.1 Amemiya and MaCurdy (1986)

If x_it is exogenous in every period, we can use the conditions E(x_it α_i) = 0 ∀i, ∀t, instead of E(x̄_i α_i) = 0. Define X₁* as the NT × TK₁ matrix that repeats, for every observation (i, t), the full time path of individual i:

X₁* = [ x_11 x_12 ... x_1T   (i = 1, t = 1)
        x_11 x_12 ... x_1T   (i = 1, t = 2)
        ...
        x_N1 x_N2 ... x_NT   (i = N, t = T) ],

such that QX₁* = 0 and BX₁* = X₁*. The AM instrument matrix is W_AM = [QX, X₁*, Z₁], and an equivalent estimator obtains by using

W_AM = [QX, (QX₁)*, BX₁, Z₁],

where (QX₁)* is constructed as X₁* above. Amemiya and MaCurdy show that their instrument matrix yields an estimator at least as efficient as with the Hausman-Taylor matrix. We add (QX₁)* to the Hausman-Taylor list of instruments, but as [(QX₁)*, X₁] is of rank TK₁, we only add (T−1)K₁ instruments. Identification condition: TK₁ ≥ G₂.

4.4.4.2 Breusch, Mizon and Schmidt (1989)

An even more efficient estimator, based on the conditions E[(QX₂)_it'α_i] = 0 ∀i, ∀t, instead of E[(Q_T X₂_i)'α_i] = 0. For BMS, the estimator is more efficient if the endogeneity in X₂ originates in the individual effects only; (QX₂)* is constructed from QX₂ in the same way as (QX₁)* and X₁* above.

Identification condition: we add (QX₂)* to the Amemiya-MaCurdy instruments; the condition is then TK₁ + (T−1)K₂ ≥ G₂. As before, we only add (T−1)K₂ instruments, as (QX₂)* is not of full rank but of rank (T−1)K₂.

Identication condition:

4.5 Computation of variance-covariance matrix

for IV estimators
Problem here: endogenous regressors may yield unconsistent estimates of variance components in

, in particular parameter .

Let

M1 denote the individual-mean vector of the Within residual:

M1 = BY

where

BX ^ W = B


BX (X 0 QX ) 1X 0Q Y



= Z + + B BX (X 0 QX ) 1X 0Q ";
X = (X1jX2), Z = (Z1jZ2), and = ( 1; 2).

The last

three terms above can be treated as centered residuals, and it

suces to nd instruments for
The IV estimator of

is

Z2 in order to estimate .

62

CHAPTER 4.

PC is the projection matrix associated to instruments C =

(X1; Z1). Using parameter estimates ^ W and ^B , we form residwhere

uals

QX ^ W and u^B = BY

u^W = QY

BX ^ W

Z ^B :

These two vectors of residuals are used to compute variance composants as in standard Feasible GLS.

4.5.1 Summary of the procedure

- Step 1. Compute QX and QY.
- Step 2. Estimate the parameters β associated to X using Within.
- Step 3. Estimate γ by the IV procedure above (γ̂_B).
- Step 4. Compute σ_α² and σ_ε² from û_W and û_B, and compute θ̂ = 1 + T σ̂_α²/σ̂_ε².
- Step 5. Transform the variables: (Q + θ̂^(−1/2)B)Y = {y_it − (1 − θ̂^(−1/2)) ȳ_i}, and similarly for X and Z.
- Step 6. Construct the instrument matrix W.
- Step 7. Estimate the parameters (β, γ) by IV on the transformed model.

4.6 Example: Wage equation

4.6.1 Model specification

log w = F[X₁, μ, ED],

where w: wage rate; μ: worker's ability (unobserved), proxied by observables (education, union status, etc.); X₁: additional variables (industry, occupation status, etc.); and ED: educational level.

Parameter of interest: the returns to education, ∂w/∂ED. Under which conditions is it identified if ability μ also drives ED? Suppose

ED = G[μ, X₂].

Replacing the unobserved ability μ by observable proxies Z, the system becomes

log w = F[X₁, Z, ED] + U,  ED = G[X₂, Z₂] + V,

where U = F[X₁, μ, ED] − F[X₁, Z, ED] and V = G[X₂, μ] − G[X₂, Z].

Two problems arise when estimating the first equation while overlooking the second one: an endogeneity bias (U is correlated with ED), and a measurement-error bias.

4.7 Application: returns to education

Sample used: Panel Study of Income Dynamics (PSID), University of Michigan. See Baltagi and Khanti-Akom (1990), Cornwell and Rupert (1988).

595 individuals, for years 1976 to 1982 (7 time periods): heads of households (males and females) aged between 18 and 65 in 1976, with a positive wage in private, non-farm employment for the years 1976 to 1982.

4.7.1

- LWAGE: logarithm of wage earnings;
- WKS: number of weeks worked in the year;
- EXP: working experience in years at the date of the sample;
- OCC: dummy, 1 if blue-collar occupation;
- IND: dummy, 1 if working in industry;
- UNION: dummy, 1 if wage is covered by a union contract.

4.7.2

- SMSA: dummy, 1 if household resides in an SMSA (Standard Metropolitan Statistical Area);
- SOUTH: dummy, 1 if individual resides in the South;
- MS: marital status dummy, 1 if head is married;
- FEM: dummy, 1 if female;
- BLK: dummy, 1 if head is black;
- ED: number of years of education attained.

Individual-specific variables: ED, BLK and FEM.

Estimation of non-augmented models (without the Z_i's, but with individual effects):

Variables a priori endogenous: (EXP, WKS, UNION, MS);
Variables a priori exogenous: (OCC, SOUTH, SMSA, IND).

Augmented model:

Y_it = X₁_it β₁ + X₂_it β₂ + Z₁_i γ₁ + Z₂_i γ₂ + α_i + ε_it.

Variables a priori endogenous: Z₂: ED; variables a priori exogenous: Z₁: (BLK, FEM).

Table 4.1: Descriptive statistics.

Variable   Mean      Std. Dev.   Minimum   Maximum
LWAGE       6.6763    0.4615     4.6052     8.5370
EXP        19.8538   10.9664     1.0000    51.0000
WKS        46.8115    5.1291     5.0000    52.0000
OCC         0.5112    0.4999     0.0000     1.0000
IND         0.3954    0.4890     0.0000     1.0000
UNION       0.3640    0.4812     0.0000     1.0000
SOUTH       0.2903    0.4539     0.0000     1.0000
SMSA        0.6538    0.4758     0.0000     1.0000
MS          0.8144    0.3888     0.0000     1.0000
ED         12.8454    2.7880     4.0000    17.0000
FEM         0.1126    0.3161     0.0000     1.0000
BLK         0.0723    0.2590     0.0000     1.0000

Table 4.2: Dependent variable: log(wage). Exogenous regressors only.

Variable   Within              GLS
Constant                       0.0976 (0.0040)
OCC        -0.0696 (0.02323)   -0.0701 (0.02322)
SOUTH      -0.0052 (0.05833)   -0.0072 (0.05807)
SMSA       -0.1287 (0.03295)   -0.1275 (0.03290)
IND         0.0317 (0.02626)    0.0317 (0.02624)

Hausman test: χ²(4) = 0.551

Table 4.3: Dependent variable: log(wage). Endogenous regressors only.

Variable   Within                GLS
Constant                         0.0561 (0.0024)
EXPE        0.1136 (0.002467)    0.1133 (0.002466)
EXPE2      -0.0004 (0.000054)   -0.0004 (0.000054)
WKS         0.0008 (0.0005994)   0.0008 (0.0005994)
MS         -0.0322 (0.01893)    -0.0325 (0.01892)
UNION       0.0301 (0.01480)     0.0300 (0.01479)

Hausman test: χ²(5) = 24.94

Table 4.4: Dependent variable: log(wage). Augmented model.

Variable   Within               GLS
Constant                        0.1866 (0.01189)
OCC        -0.0214 (0.01378)   -0.0243 (0.01367)
SOUTH      -0.0018 (0.03429)    0.0048 (0.03188)
SMSA       -0.0424 (0.01942)   -0.0468 (0.01891)
IND         0.0192 (0.01544)    0.0148 (0.01521)
EXPE        0.1132 (0.00247)    0.1084 (0.00243)
EXPE2      -0.0004 (0.00005)   -0.0004 (0.00005)
WKS         0.0008 (0.00059)    0.0008 (0.00059)
MS         -0.0297 (0.01898)   -0.0391 (0.01884)
UNION       0.0327 (0.01492)    0.0375 (0.01472)
FEM                            -0.1666 (0.12646)
BLK                            -0.2639 (0.15413)
ED                              0.1373 (0.01415)

Hausman test: χ²(9) = 495.3

Table 4.5: Dependent variable: log(wage). IV estimates.

Variable   HT                 AM                 BMS
Constant    0.1772 (0.017)     0.1781 (0.016)     0.1748 (0.016)
OCC        -0.0207 (0.013)    -0.0208 (0.013)    -0.0204 (0.013)
SOUTH       0.0074 (0.031)     0.0072 (0.031)     0.0077 (0.031)
SMSA       -0.0418 (0.018)    -0.0419 (0.018)    -0.0423 (0.018)
IND         0.0135 (0.015)     0.0136 (0.015)     0.0138 (0.015)
EXPE        0.1131 (0.002)     0.1129 (0.002)     0.1127 (0.002)
EXPE2      -0.0004 (0.005)    -0.0004 (0.000)    -0.0004 (0.000)
WKS         0.0008 (0.000)     0.0008 (0.000)     0.0008 (0.000)
MS         -0.0298 (0.018)    -0.0300 (0.018)    -0.0303 (0.018)
UNION       0.0327 (0.014)     0.0324 (0.014)     0.0326 (0.014)
FEM        -0.1309 (0.126)    -0.1320 (0.126)    -0.1337 (0.126)
BLK        -0.2857 (0.155)    -0.2859 (0.155)    -0.2793 (0.155)
ED          0.1379 (0.021)     0.1372 (0.020)     0.1417 (0.020)

Notes. Standard errors are in parentheses.

Chapter 5
Dynamic panel data models

5.1 Motivation

Usefulness of dynamic panel data models:

- Investigate adjustment dynamics in micro- and macro-economic variables of interest;
- Estimate equations from intertemporal-framework models (life-cycle models, finance, ...).

In practice: estimate long-run elasticities and structural parameters from Euler equations.

5.1.1 Intertemporal optimization problems

Consider the general problem

max_{q(0),...,q(T)} E ∫ e^(−rt) π(t) dt,
π(t) = p(t)q(t) − c[q(t), b(t)],
ḃ = G[b(t), q(t)],

where b(t) is the state variable (stock, capital, ...), q(t) is the control variable and r is the discount rate; G(·) describes the evolution path of the state. In discrete time,

max_{q₀,...,q_T} E {Σ_{t=0}^T (1+r)^(−t) π_t},  b_{t+1} = f(b_t, q_t),

and we use the Bellman equation:

V_t(b_t) = max E_t {π_t + (1+r)⁻¹ V_{t+1}(b_{t+1})}
         = max E_t {p_t q_t − c[q_t, b_t] + (1+r)⁻¹ V_{t+1}(f[b_t, q_t])},

where V_t(b_t) is the value function of the problem at time t, and E_t is the conditional expectation operator at time t.

We use a) the envelope theorem (the evolution path at the optimum depends only on the state variable, as the control variable is already optimized); b) the first-order condition wrt the control variable:

∂V_t(b_t)/∂b_t = ∂π_t(b_t, q_t)/∂b_t + (1/(1+r)) (∂V_{t+1}/∂f)(∂f(b_t, q_t)/∂b_t)   (envelope theorem),

∂V_t(b_t)/∂q_t = ∂π_t(b_t, q_t)/∂q_t + (1/(1+r)) (∂V_{t+1}/∂f)(∂f(b_t, q_t)/∂q_t) = 0.   (FOC)

From (FOC):

∂V_{t+1}/∂f = −(∂π_t/∂q_t) (∂f(b_t, q_t)/∂q_t)⁻¹ (1+r).

Now lag the first-order condition one period and substitute it into the envelope condition:

∂V_t/∂b_t = ∂π_t/∂b_t − (∂π_t/∂q_t)(∂f/∂q_t)⁻¹ (∂f(b_t, q_t)/∂b_t).

Assume ∂f/∂q = a₁ and ∂f/∂b = a₂ (constants). Combining the two conditions, we have

∂π_t/∂q_t = ((1+r)/a₂) ∂π_{t−1}/∂q_{t−1} + (a₁/a₂) ∂π_t/∂b_t.

This is the Euler equation relating current and past marginal profits. If, for instance, profit is linear-quadratic in (q_t, b_t), with ∂π_t/∂q_t = b₀ + b₁q_t + b₂b_t and ∂π_t/∂b_t = c₀ + c₁q_t + c₂b_t, then

b₀ + b₁q_t + b₂b_t = ((1+r)/a₂)(b₀ + b₁q_{t−1} + b₂b_{t−1}) + (a₁/a₂)(c₀ + c₁q_t + c₂b_t),

and solving for q_t yields the estimable dynamic equation

q_it = μ₀ + μ₁ q_{i,t−1} + μ₂ b_{i,t−1} + μ₃ b_it + α_i + ε_it,

where

μ₀ = (a₂b₁ − a₁c₁)⁻¹ [b₀((1+r) − a₂) + a₁c₀],
μ₁ = (a₂b₁ − a₁c₁)⁻¹ (1+r) b₁,
μ₂ = (a₂b₁ − a₁c₁)⁻¹ (1+r) b₂,
μ₃ = (a₂b₁ − a₁c₁)⁻¹ (a₁c₂ − a₂b₂).

5.1.2 Example: the consumption Euler equation

Consider a two-period model with the following period-to-period budget constraint:

c_t + A_t = y_t + A_{t−1}(1 + r_t), t = 1, 2,

where c_t is consumption at time t, A_t is total assets, y_t is wage income, and r_t is the interest rate. Preferences are

U = u(c₁) + (1/(1+ρ)) u(c₂),

where ρ is the rate of time preference; with isoelastic utility,

U = c₁^(1−1/σ) + (1/(1+ρ)) c₂^(1−1/σ),

where σ is the intertemporal elasticity of substitution. At the optimum (replacing the budget constraints in the utility function and optimizing wrt A₁):

∂U/∂A₁ = (∂u/∂c₁)(∂c₁/∂A₁) + (1/(1+ρ))(∂u/∂c₂)(∂c₂/∂A₁) = 0
⇔ ∂u/∂c₁ = ((1+r)/(1+ρ)) ∂u/∂c₂.

This is the consumption Euler equation; with isoelastic utility,

c₁^(−1/σ) = ((1+r)/(1+ρ)) c₂^(−1/σ).
Under uncertainty, with quadratic utility u(X) = −(1/2)(X̄ − X)², the Euler equation becomes

c₁ = ((1+r)/(1+ρ)) E c₂, so that c₁ = E c₂ if r = ρ.

The Hall Euler equation with more than 2 periods reduces to

c_{t+1} = c_t + ε_{t+1},

where ε_{t+1} is i.i.d.: consumption follows a random walk. This is an error-correction model that can be written

c_t = β₀ + β₁ y_t + (c_{t−1} − β₁ y_{t−1}) + β₂(y_{t−1} − c_{t−1}) + ε_t.

5.1.3

Long-run relationships are represented by the stationary path of the variable of interest (consumption, capital stock, ...): if y_{t+1} = δy_t + βx_{t+1}, the stationary equilibrium path is ȳ = (β/(1−δ)) x̄.
5.1.3.1 Long-run elasticities

Dynamic models are helpful in computing long-run elasticities. Consider for example the dynamic consumption model

C̃_{i,t+j} = δ C̃_{i,t+j−1} + β P̃_{i,t+j} + u_{i,t+j},

where C̃_{i,t+j} and P̃_{i,t+j} respectively denote the logs of consumption and price. By continuous substitution we have

C̃_{i,t+j} = δ^(j+1) C̃_{i,t−1} + β δ^j P̃_it + ... + β δ P̃_{i,t+j−1} + β P̃_{i,t+j} + u*_{i,t+j},

where u*_{i,t+j} = δ^j u_it + δ^(j−1) u_{i,t+1} + ... + δ u_{i,t+j−1} + u_{i,t+j}.

Assume we want to compute the change in consumption at time t + j following a permanent change of 1% in price between t and t + j:

∂C̃_{i,t+j}/∂P̃_it + ∂C̃_{i,t+j}/∂P̃_{i,t+1} + ... + ∂C̃_{i,t+j}/∂P̃_{i,t+j} = β(δ^j + δ^(j−1) + ... + δ + 1).

The long-run effect of price obtains by taking the limit

lim_{j→∞} Σ_{s=0}^j ∂C̃_{i,t+j}/∂P̃_{i,t+s} = lim_{j→∞} β(δ^j + δ^(j−1) + ... + δ + 1) = β/(1−δ).
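A quick numerical check of this limit (illustrative values of δ and β):

# Long-run price effect: each period adds beta * delta^s,
# and the sum converges to beta / (1 - delta).
delta, beta = 0.8, -0.3
effects = [beta * delta ** s for s in range(50)]
print(sum(effects))          # ~ -1.49997
print(beta / (1 - delta))    # long-run elasticity: -1.5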

5.1.3.2 Dynamic representations from AR(1) errors

Consider the following Cobb-Douglas production model:

log Q_it = β₁ log N_it + β₂ log K_it + λ_t + α_i + v_it + ε_it,

where Q_it is output of firm i at time t, N_it is labor input, K_it is capital, λ_t is a time effect (technical change), ε_it is an i.i.d. error component (measurement error), and v_it is a productivity shock having an AR(1) representation:

v_it = ρ v_{i,t−1} + e_it.

Quasi-differencing (multiply the lagged equation by ρ and subtract) yields

log Q_it = β₁ log N_it − ρβ₁ log N_{i,t−1} + β₂ log K_it − ρβ₂ log K_{i,t−1}
           + ρ log Q_{i,t−1} + (λ_t − ρλ_{t−1}) + [α_i(1−ρ) + e_it + ε_it − ρε_{i,t−1}],

or

log Q_it = π₁ log N_it + π₂ log N_{i,t−1} + π₃ log K_it + π₄ log K_{i,t−1} + π₅ log Q_{i,t−1} + λ*_t + (α*_i + ω_it),

subject to the restrictions π₂ = −π₁π₅ and π₄ = −π₃π₅.

Hence, equivalence between a static (short-run) model with serially correlated productivity shocks and a dynamic representation of production output.

5.2 The dynamic fixed-effect model

Simple dynamic panel-data model:

y_it = δ y_{i,t−1} + α_i + ε_it, i = 1,...,N; t = 1,...,T,

where the initial conditions y_i0, i = 1,...,N are assumed known. We assume E(ε_it) = 0 ∀i, t; E(ε_it ε_js) = σ_ε² if i = j, t = s and 0 otherwise; and E(α_i ε_it) = 0 ∀i, t.

By continuous substitution:

y_it = ε_it + δε_{i,t−1} + δ²ε_{i,t−2} + ... + δ^(t−1)ε_i1 + ((1−δ^t)/(1−δ)) α_i + δ^t y_i0.

5.2.1 Bias in the Fixed-Effects estimator

The Within estimator is

δ̂ = [Σ_{i=1}^N Σ_{t=1}^T (y_it − ȳ_i)(y_{i,t−1} − ȳ_{i,−1})] / [Σ_{i=1}^N Σ_{t=1}^T (y_{i,t−1} − ȳ_{i,−1})²],  α̂_i = ȳ_i − δ̂ ȳ_{i,−1},

where

ȳ_i = (1/T) Σ_{t=1}^T y_it,  ȳ_{i,−1} = (1/T) Σ_{t=1}^T y_{i,t−1},  ε̄_i = (1/T) Σ_{t=1}^T ε_it.

Also,

δ̂ = δ + [(1/NT) Σ_i Σ_t (ε_it − ε̄_i)(y_{i,t−1} − ȳ_{i,−1})] / [(1/NT) Σ_i Σ_t (y_{i,t−1} − ȳ_{i,−1})²].

This estimator exists if the denominator ≠ 0 and is consistent if the numerator converges to 0. Numerator:

plim_{N→∞} (1/NT) Σ_{i,t} (y_{i,t−1} − ȳ_{i,−1})(ε_it − ε̄_i) = −plim (1/N) Σ_i ȳ_{i,−1} ε̄_i,

because ε_it is serially uncorrelated and not correlated with α_i. We use

ȳ_{i,−1} = (1/T) Σ_t y_{i,t−1} = (1/T) [ ((1−δ^T)/(1−δ)) y_i0 + (((T−1) − Tδ + δ^T)/(1−δ)²) α_i
          + ((1−δ^(T−1))/(1−δ)) ε_i1 + ((1−δ^(T−2))/(1−δ)) ε_i2 + ... + ε_{i,T−1} ].

We then have

plim (1/N) Σ_i ȳ_{i,−1} ε̄_i = (σ_ε²/T²) · ((T−1) − Tδ + δ^T)/(1−δ)².

In a similar manner, we show that

plim (1/NT) Σ_{i,t} (y_{i,t−1} − ȳ_{i,−1})² = (σ_ε²/(1−δ²)) [1 − 1/T − (2δ/T²) · ((T−1) − Tδ + δ^T)/(1−δ)²],

so that

plim_{N→∞} (δ̂ − δ) = −((1+δ)/(T−1)) [1 − (1/T)(1−δ^T)/(1−δ)] · {1 − (2δ/((1−δ)(T−1)))[1 − (1−δ^T)/(T(1−δ))]}⁻¹ = O(1/T).

The bias does not vanish as N → ∞ with T fixed: the Within transformation (y_it − ȳ_i) = δ(y_{i,t−1} − ȳ_{i,−1}) + (ε_it − ε̄_i) introduces a correlation of order 1/T between the transformed regressor and error. The bias is negligible only when T is large and δ is small.
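A small Monte Carlo sketch of this bias (illustrative values; stationary initial conditions, which is what the asymptotic formula above assumes):

import numpy as np

rng = np.random.default_rng(1)
N, T, delta = 500, 6, 0.5
alpha = rng.normal(size=N)
y = np.zeros((N, T + 1))
# stationary start: y_i0 = alpha_i/(1-delta) + MA(inf) in eps
y[:, 0] = alpha / (1 - delta) + rng.normal(size=N) / np.sqrt(1 - delta ** 2)
for t in range(1, T + 1):
    y[:, t] = delta * y[:, t - 1] + alpha + rng.normal(size=N)

ylag, ycur = y[:, :-1], y[:, 1:]
yl = ylag - ylag.mean(axis=1, keepdims=True)   # within transformation
yc = ycur - ycur.mean(axis=1, keepdims=True)
delta_hat = (yl * yc).sum() / (yl * yl).sum()
print(delta_hat - delta)   # roughly -0.28, cf. Table 5.1 (delta = 0.5, T = 6)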

Table 5.1: Asymptotic bias of the Within estimator, plim(δ̂ − δ), as N → ∞.

δ     T     Bias      Percent
0.2    6   -0.2063   -103.1693
       8   -0.1539    -76.9597
      10   -0.1226    -61.3139
      20   -0.0607    -30.3541
      40   -0.0302    -15.0913
0.5    6   -0.2756    -55.1282
       8   -0.2049    -40.9769
      10   -0.1622    -32.4421
      20   -0.0785    -15.6977
      40   -0.0384     -7.6819
0.7    6   -0.3307    -47.2392
       8   -0.2479    -35.4084
      10   -0.1966    -28.0912
      20   -0.0938    -13.3955
      40   -0.0449     -6.4114
0.9    6   -0.3939    -43.7633
       8   -0.3017    -33.5179
      10   -0.2432    -27.0248
      20   -0.1196    -13.2934
      40   -0.0563     -6.2561

5.2.2 Instrumental-variable estimation

The Within estimator is biased when T is fixed (small). Write the model in levels, in individual means, and in first differences:

y_it = δ y_{i,t−1} + α_i + ε_it,  ȳ_i = δ ȳ_{i,−1} + α_i + ε̄_i, i = 1,...,N,

(y_it − y_{i,t−1}) = δ(y_{i,t−1} − y_{i,t−2}) + (ε_it − ε_{i,t−1}).

In the first-differenced model, y_{i,t−1} is correlated by construction with ε_{i,t−1}! We need instruments that are uncorrelated with (ε_it − ε_{i,t−1}) but correlated with (y_{i,t−1} − y_{i,t−2}). The only possibility in a single-equation framework with no other explanatory variables is to use values of the dependent variable itself. Future values of y_it are not feasible, because y_it is a recursive function of ε_it, ε_{i,t−1}, ..., ε_i1, α_i, y_i0. As for lagged dependent variables, we can use either y_{i,t−2} or (y_{i,t−2} − y_{i,t−3}):

E[y_{i,t−2}(ε_it − ε_{i,t−1})] = E(ε_{i,t−2}ε_it) − E(ε_{i,t−2}ε_{i,t−1}) = 0,
E[(y_{i,t−2} − y_{i,t−3})(ε_it − ε_{i,t−1})] = E[ε_{i,t−2}(ε_it − ε_{i,t−1})] − E[ε_{i,t−3}(ε_it − ε_{i,t−1})] = 0,
E[y_{i,t−2}(y_{i,t−1} − y_{i,t−2})] = 0 − E(ε²_{i,t−2}) = −σ_ε² ≠ 0,
E[(y_{i,t−2} − y_{i,t−3})(y_{i,t−1} − y_{i,t−2})] = 0 − E(ε²_{i,t−2}) = −σ_ε² ≠ 0.

Instrumental-variable estimators that are consistent when N and/or T → ∞:

δ̂ = [Σ_{i=1}^N Σ_{t=3}^T (y_it − y_{i,t−1})(y_{i,t−2} − y_{i,t−3})] / [Σ_{i=1}^N Σ_{t=3}^T (y_{i,t−1} − y_{i,t−2})(y_{i,t−2} − y_{i,t−3})],

or

δ̂ = [Σ_{i=1}^N Σ_{t=3}^T (y_it − y_{i,t−1}) y_{i,t−2}] / [Σ_{i=1}^N Σ_{t=3}^T (y_{i,t−1} − y_{i,t−2}) y_{i,t−2}].

Conclusion: even though α_i is eliminated, an endogeneity bias occurs for fixed T in the Within procedure, because the Q operator introduces means ε̄_i correlated by construction with y_{i,t−1}; first-differencing combined with IV avoids this.

Consider now the general dynamic model:

y_it = δ y_{i,t−1} + x_it β + z_i γ + α_i + ε_it.

IV estimation proceeds as follows.

Step 1. First-difference the model, to get

(y_it − y_{i,t−1}) = δ(y_{i,t−1} − y_{i,t−2}) + (x_it − x_{i,t−1})β + ε_it − ε_{i,t−1}.

Use y_{i,t−2} or (y_{i,t−2} − y_{i,t−3}) as instrument for (y_{i,t−1} − y_{i,t−2}) and estimate δ, β with the IV procedure.

Step 2. Substitute δ̂ and β̂ in the Between equation

ȳ_i − δ̂ ȳ_{i,−1} − x̄_i β̂ = z_i γ + α_i + ε̄_i, i = 1,...,N,

and estimate γ by OLS.

Step 3. Compute

σ̂_ε² = (1/(2N(T−1))) Σ_{i=1}^N Σ_{t=1}^T [(y_it − y_{i,t−1}) − δ̂(y_{i,t−1} − y_{i,t−2}) − (x_it − x_{i,t−1})β̂]²,

σ̂_α² = (1/N) Σ_{i=1}^N [ȳ_i − δ̂ ȳ_{i,−1} − z_i γ̂ − x̄_i β̂]² − (1/T) σ̂_ε².

IV estimators of δ, β and σ_ε² are consistent when N or T → ∞; the estimators of γ and σ_α² are consistent only when T → ∞, but inconsistent when T is fixed and N → ∞.
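A sketch of the second simple IV estimator above (helper name mine; y holds the levels y_i0, ..., y_iT):

import numpy as np

def anderson_hsiao(y):
    """IV estimate of delta in y_it = delta*y_{i,t-1} + alpha_i + eps_it:
    first differences instrumented by the lagged level y_{i,t-2}.
    y is (N, T+1) with y[:, 0] = y_{i0}; requires T >= 3."""
    dy = np.diff(y, axis=1)                  # Delta y_{i,t}, t = 1..T
    num = (dy[:, 2:] * y[:, 1:-2]).sum()     # sum (y_it - y_{i,t-1}) y_{i,t-2}
    den = (dy[:, 1:-1] * y[:, 1:-2]).sum()   # sum (y_{i,t-1} - y_{i,t-2}) y_{i,t-2}
    return num / den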

5.3 The Random-effects model

5.3.1 Bias in the ML estimator

We now treat α_i as a random effect. Consider first the bias of the OLS estimator applied to y_it = δ y_{i,t−1} + α_i + ε_it:

δ̂ = [Σ_{i=1}^N Σ_{t=1}^T y_it y_{i,t−1}] / [Σ_{i=1}^N Σ_{t=1}^T y²_{i,t−1}] = δ + [Σ_i Σ_t (α_i + ε_it) y_{i,t−1}] / [Σ_i Σ_t y²_{i,t−1}].

We show that

plim_{N→∞} (1/NT) Σ_{i=1}^N Σ_{t=1}^T (α_i + ε_it) y_{i,t−1}
   = (1/T) ((1−δ^T)/(1−δ)) Cov(y_i0, α_i) + (σ_α²/(T(1−δ)²)) [(T−1) − Tδ + δ^T],

and

plim_{N→∞} (1/NT) Σ_{i=1}^N Σ_{t=1}^T y²_{i,t−1}
   = ((1−δ^(2T))/(T(1−δ²))) · lim (1/N) Σ_i E(y²_i0)
   + (σ_α²/(1−δ)²) [1 − (2/T)(1−δ^T)/(1−δ) + (1/T)(1−δ^(2T))/(1−δ²)]
   + (2/(T(1−δ))) [(1−δ^T)/(1−δ) − (1−δ^(2T))/(1−δ²)] Cov(y_i0, α_i)
   + (σ_ε²/(1−δ²)) [1 − (1−δ^(2T))/(T(1−δ²))].

The bias therefore depends on the behavior of the initial conditions y_i0 (constant, or generated as the stationary process of y_it).

5.3.2 An equivalent representation

yit = yi;t 1 + xit + zi + uit;

with the following assumptions:

jj < 1; E ( i) = E ("it) = 0;

E ( ixit) = 0; E ( izi) = 0; E ( i"it) = 0;
E ( i j ) =  2 if i = j;
0 otherwise;
E ("it"js) = "2 if i = j; t = s;
0 otherwise:
We can also write

wit = wi;t 1 + xit + zi + "it;

yit = wit + i;
where i = i =(1
); Ei = 0; V ar(i) = 2 =  2 =(1 )2;
and the dynamic process
fect

i.

5.3.

83

5.3.3

(A)

(B)
In model (A),

yit

yit = yi;t 1 + xit + zi + i + "it;

wit = wi;t 1 + xit + zi + "it;
yit = wit + i:
is driven by unobserved characteristics

xit and zi .

i , dif-

wit is independent from individual

eects i . Conditional on exogenous xit and zi , wit are driven by
identical processes with i.i.d. shocks "it . But observed value yit is
shifted by individual-specic eect i .

In model (B), dynamic process

Possible interpretation:
and

wit

is a latent variable,

The two processes are equivalent because

yit

is observed,

wit is unobserved.

But

assumptions (or knowledge) on initial conditions may help to distinguish between both processes.

Dierent cases:

 1/ yi0 xed;
 2/ yi0 random;
 2.a/ yi0 independent of i, with E (yi0) = y
2 ;

and

V ar(yi0) =

y0

 2.b/ yi0 correlated with i, with Cov(yi0; i) = y2 ;

 3/ wi0 xed;
 4/ wi0 random;
 4.a/ wi0 random with common mean w and variance "2=(1
0

2)

84

CHAPTER 5.

DYNAMIC PANEL DATA MODELS

(stationarity assumption);

w2 0;

 4.c/ wi0 random with mean i0 and variance "2=(1

2) (sta-

tionarity assumption);

 4.d/ wi0 random with mean i0 and arbitrary variance w2 0.
See Appendix 4 for a derivation of Maximum Likelihood estimators in each case.

5.3.4 Properties of the ML estimator

When σ_α² and σ_ε² are known, maximizing the log-likelihood wrt δ, β and γ yields the GLS estimator. When σ_α² and σ_ε² are unknown, they are replaced by consistent estimates in the covariance matrix V_T.

Estimators for δ, β and γ are consistent when T → ∞, because GLS converges to Within. When N → ∞ and T is fixed, GLS is inconsistent, as in the fixed-effects case. The other cases are summarized in Table 5.2.

5.3.5 Example: Balestra and Nerlove (1966)

Seminal paper on dynamic panel data models (1966): household demand for natural gas in the US, including a/ the demand due to replacement of gas appliances, and b/ the demand due to increases in the stock of appliances.

Table 5.2: Properties of the MLE for dynamic panel data models.

Case / Parameters                                   T → ∞          N → ∞, T fixed

Case 1: y_i0 fixed
  δ, β, σ_ε²                                        Consistent      Inconsistent
  γ, σ_α²                                           Consistent      Inconsistent

Case 2.a: y_i0 random, independent of α_i
  δ, β, σ_ε²                                        Consistent      Consistent
  μ_y, γ, σ_α², σ²_{y0}                             Inconsistent    Consistent

Case 2.b: y_i0 random, correlated with α_i
  δ, β, σ_ε²                                        Consistent      Consistent
  μ_y, γ, σ_α², σ²_{y0}, σ_{y0α}                    Inconsistent    Consistent

Case 3: w_i0 fixed
  δ, β, σ_ε²                                        Consistent      Inconsistent
  w_i0, γ, σ_η²                                     Inconsistent    Inconsistent

Case 4.a: w_i0 random, mean μ_w, variance σ_ε²/(1−δ²)
  δ, β, σ_ε²                                        Consistent      Consistent
  μ_w, γ, σ_η²                                      Inconsistent    Consistent

Case 4.b: w_i0 random, mean μ_w, variance σ²_{w0}
  δ, β, σ_ε²                                        Consistent      Consistent
  μ_w, γ, σ_η², σ²_{w0}                             Inconsistent    Consistent

Case 4.c: w_i0 random, mean μ_i0, variance σ_ε²/(1−δ²)
  δ, β, σ_ε²                                        Consistent      Inconsistent
  μ_i0, γ, σ_η²                                     Inconsistent    Inconsistent

Case 4.d: w_i0 random, mean μ_i0, variance σ²_{w0}
  δ, β, σ_ε²                                        Consistent      Inconsistent
  μ_i0, σ_η², σ²_{w0}                               Inconsistent    Inconsistent

86

CHAPTER 5.

Demand system:

Git = Git (1 r)Gi;t 1;

Fit = Fit (1 r)Fi;t 1;
Fit = a0 + a1Nit + a2Iit;
Git = b0 + b1Pit + b2Fit;
Git and Git are respectively the new demand and the actual
demand for gas at time t from unit i, r is the appliances deprecia
tion rate, Fit and Fit are respectively the new and actual demand
for all types of fuel, Nit is total population, Iit is per-head income,
and Pit is relative price of gas.
where

Git = 0 + 1Pit + 2Nit + 3Ni;t 1

+ 4Iit + 5Ii;t 1 + 6Gi;t 1;
where

Nit = Nit

Ni;t 1, Iit = Iit

Ii;t 1, and 6 = 1 r.

Gi0 are xed, case 1/).

In accordance with the theory,  (here, 6 ) is biased upward for
sumption that initial conditions

5.3.

87

Table 5.3: Parameter estimates, Balestra-Nerlove model

Parameter

0 (Intercept)
1 (Pit)
2 (Nit)
3 (Ni;t 1)
4 (Iit)
5 (Ii;t 1)
6 (Gi;t 1)

OLS

Within

GLS

-3.650

-4.091

(3.316)

(11.544)

-0.0451(*)

-0.2026

-0.0879(*)

(0.027)

(0.0532)

(0.0468)

0.0174(*)

-0.0135

-0.00122

(0.0093)

(0.0215)

(0.0190)

0.00111(**)

0.0327(**)

0.00360(**)

(0.00041)

(0.0046)

(0.00129)

0.0183(**)

0.0131

0.0170(**)

(0.0080)

(0.0084)

(0.0080)

0.00326

0.0044

0.00354

(0.00197)

(0.0101)

(0.00622)

1.010(**)

0.6799(**)

0.9546(**)

(0.014)

(0.0633)

(0.0372)

Notes. N = 36, T = 11. Standard errors are in parentheses. (*) and (**):
parameter signicant at 10% and 5% level respectively.

88

CHAPTER 5.

DYNAMIC PANEL DATA MODELS

Part II
Generalized Method of Moments
estimation

89

Chapter 6
The GMM estimator
Generalized Method of Moments: ecient way to obtain consistent parameter estimates under mild conditions on the model.
Very popular in estimating structural economic models, as it requires much less conditions on model disturbances than Maximum
Likelihood. Another important advantage: easy to obtain parameter estimates that are robust to heteroskedasticity of unknown
form.

6.1 Moment conditions and the method of moments

6.1.1

Moment conditions

N , fxi; i = 1; 2; : : : ; N g from which one

wishes to estimate a p  1 vector  whose true value is 0 .
Note: notation above is very general, xi will typically include de-

variables.

Let

91

92

exists and is nite. Moment conditions are then dened as

E [f (xi; 0)] = 0:
6.1.2

Consider the linear model

yi = xi 0 + ui; i = 1; 2; : : : ; N;
where

0 :

true value of parameter vector

term.
A common assumption is

and

ui

is the error

E (xiui) = E [E (xiuijxi)] = E [xiE (uijxi)] = 0:

In terms of the denition above,

xi ).

Moment conditions are then

E (xiui) = E [xi(yi
Note that here,

p = q,

xi 0] = 0:

as many moment conditions as we have

parameters to estimate.

E (ziui) = 0.
such that

There are

Vector

E (ziui) = E [zi(yi xi 0)] = 0;

f [(xi; yi; zi); ] = zi(yi xi ):

or

q moment equations (as many as there are instruments)

p parameters to estimate.
q  p.
and

E (uijxi) = 0

6.1.

6.1.3

A sample
bution

93

fxi; i = 1; 2; : : : ; N g is drawn from a Gamma distri-

(a; b) with

true values

a0

and

b0.

Relationship between

parameters and two rst moments of the distribution:

a
E (xi) = 0 ;
b0

a
E (xi)]2 = 20 :
b0
In our notation in the denition above:  = (a; b) and
h
a
a 2 ai
; (x
)
;
f (xi; ) = xi
b i b
b2
so that E [f (xi; 0] = 0.
6.1.4

E [xi

 using moment conditions given above ? In the

case where p = q (as many conditions as parameters), we could
solve E [f (xi; 0 )] = 0 for 0 . But E [f (:)] is unknown, whereas
function values f (xi;  ) can be computed 8; 8i. Also, sample
moments of function f (:) can be computed:

How to estimate

N
1X
fN () =
f (x ; ):
N i=1 i

E (f ) close to
fN (population moments close to empirical moments), then ^N is
a convenient estimate for 0 , where f (^
 N ) = 0.
0 = E [f (0)]  fN (^N ) ) 0  ^N :

Two important conditions need to hold for the method of moment

estimation to be valid: a)

E (f )

is adequately approximated by

94

fN ; b) moment conditions can be solved for ^N .

Example: linear regression.
Sample moment conditions are

N
N
1X
1X
x u^ =
x (y
N i=1 i i N i=1 i i
and solving for

^ N

yields

^ N =
6.1.5

xi ^ N ) = 0;

N
X
i=1

xix0i

! 1 N
X
i=1

xiyi:

Poisson process: dependent variable is discrete (number of events,

etc.). Restriction: Mean of distribution is equal to the variance.
Assumption:

dependent variables

y1; y2; : : : ; yN

are distributed

according to independent Poisson distributions, with parameters

1; 2; : : : ; N

respectively.

We assume the

i's

ri
r!

depend on explanatory variables by a log-

linear relationship:

log i = 0 +

p
X
j =1

j xij :

L=

Ni=1

exp(


yi i
i)
yi!

"

= exp

N
X
i=1

i + 0

N
X
i=1

yi

6.1.

p
X
j =1

N
X
i=1

xij yi

 1

Ni=1yi!

95

Let us consider the following sample moments :

T0 =

N
X
i=1

yi

Tj =

N
X
i=1

xij yi

j = 1; : : : ; p;

and we use the fact that

@i
= i
@ 0
If we set derivatives of

T0 =

N
X
i=1

^ i

and

@i
= xij i:
@ j

log L wrt. 0 and the j 's to 0, we get

Tj =

N
X
i=1

xij ^i

j = 1; : : : ; p

P
^i = exp( ^ 0 + pj=1 ^ j xij ): Hence, we match sample moPN
Pp
^
ments T0 and Tj to theoretical moments
exp(

+
^ j xij )
0
i
=1
j
=1
PN
^ Pp ^
and Tj =
i=1 xij exp( 0 + j =1 j xij ) respectively.
We have p + 1 such matching conditions for p + 1 parameters.

where

6.1.6

Note the dierence between the Method of Moments philosophy

and the usual estimation criteria. For Maximum Likelihood and
Least Squares, we maximize (minimize) a criterion

^ = arg max  log L() (MLE);

^ = arg min N1 PNi [yi f (xi; )]2

(LS)

96

system for

.

Example: Instrumental Variable estimation

We could consider minimizing the IV criterion wrt.

^ = arg min (Y


where

X)0Z (Z 0Z ) 1Z 0(Y

:

X);

Z is a N  q matrix of instruments, or start from the FOC:

N
N
1X
1X
z u^ =
z (y
N i=1 i i N i=1 i i

^ =

N
X
i=1

zi0 xi

! 1 N
X
i=1

xi^) = 0

zi0 yi = (Z 0X ) 1Z 0Y:

from the FOC

 or start

N
1X
@ log L()
j=^ = 0;
N i=1
@

Problems that remain to be solved:

 Ensure that we can replace population moments by sample moments, for the Method of Moments to work.

 What if the system of moment conditions is overidentied (more

conditions than parameters) ?

 How to be sure our moment conditions are valid (e.g.,

choice of instruments) ?

valid

6.2.

97

6.2.1

Introduction

 are overidentied by moment conditions. Equations E [f (xi; 0 ] = 0 represent q conditions for p

unknown parameters, therefore we cannot nd a vector ^
N satisfying fN ( ) = 0.

by dening

^N = arg min QN () = fN ()0AN fN ();



AN

0(1).
Important note: for the just-identied case, QN ( ) = 0
fN () = 0, but in the over-identied case, QN () > 0.

where

is a positive weighting matrix of order

because

This fact is important for model checking (we will come to this
point later in the course).

6.2.2

Consider

Y = X + u with condition E (W 0u) = 0 (W

ments), and

rank(W 0X ) = p.

Solving for

we have

are instru-

^ = (W 0X ) 1(W 0Y )

that we replace in the IV criterion:

u( ^ )0PW0 u( ^ ) = Y


X (W 0X ) 1(W 0Y ) 0 W
(W 0W ) 1W 0

Y X (W 0X ) 1(W 0Y )





= Y 0PW Y + (W 0Y )0(W 0X ) 1X 0 PW X (W 0X ) 1(W 0Y )

98

(W 0Y )0(W 0X ) 1X 0 PW Y Y 0PW X (W 0X ) 1(W 0Y )

= Y 0PW Y + (Y 0W )(W 0X ) 1(X 0W )(W 0W ) 1(W 0X )(W 0X ) 1
(W 0Y ) (Y 0W )(W 0X ) 1(X 0W )(W 0W ) 1(W 0Y )
(Y 0W )(W 0W ) 1(W 0X )(W 0X ) 1(W 0Y )
0
1 = (X 0 W ) 1 :
and because (W X )
u( ^ )0PW0 u( ^ ) = 2Y 0PW Y 2Y 0PW Y = 0:
6.2.3

A denition

Let the observed sample fxi; i = 1; 2; : : : ; N g from

which we wish to estimate a p  1 vector of parameters  whose
true value is 0. Let E [f (xi; 0)] = 0 be a set of q moment conditions, and fN () the corresponding set of sample moments. Dene
the criterion
QN = fN ()0AN fN ();
where AN is a stochastic, positive O(1) matrix. The GMM estimator of  is
^N = arg min QN ():
Denition 1

6.2.4

Consider again the linear regression model with

q > p instruments

(this is an over-identied model). Moment conditions are

E (ziui) = E (zi(yi

xi 0)) = 0

N
1X
fN ( ) =
z (y
N i=1 i i

xi ) =

1 0
(Z Y
N

Z 0X ):

6.3.

99

AN =
Assume that

1 0Z

NZ

N
X

1
zi0 zi
N i=1

! 1

= N (Z 0Z ) 1:

! 1), to a

A. The GMM criterion is then

1
QN ( ) = (Z 0Y Z 0X )0(Z 0Z ) 1(Z 0Y Z 0X ):
N
Dierentiating wrt. give rst-order conditions:
1
@QN ( )
j
^ N = 2X 0 Z (Z 0Z ) 1(Z 0Y Z 0 X ^ N ) = 0:

=

@
N
^ N , we have
Solving for
constant matrix


^ N = X 0Z (Z 0Z ) 1Z 0X 1 X 0Z (Z 0Z ) 1Z 0Y:

This expression is the IV formulation for the case where there are
more instruments than parameters.

6.3 Asymptotic properties of the GMM estimator

We examine here key properties that any useful estimator should
verify: consistency (convergence to the true parameter value as
the sample size gets large) and asymptotic normality (to be able
to use the asymptotic distribution for statistical inference).

100

6.3.1

Consistency

Assumption set 1
(i)

E [f (xi; )] exists and is nite 8 2  and 8i.

gi() = E [f (xi; )].
gi() = 0 8i ,  = 0.

(ii) Let

There exists

0

such that

fNj and gNj respectively denote elements of the q vectors

p
fN () and gN (). Then fNj gNj !
0 uniformly 8 2 
and 8j = 1; 2; : : : ; q .

(iii) Let

AN

such that

AN

AN

Under assumptions (i)

^N is weakly consistent.
Theorem 1

p
!
0.

(iii) is a stronger requirement than

pointwise convergence in probability on . It means that
Note: Uniform convergence in

 p

sup fNj () gNj ()

2

! 0 for j = 1; 2; : : : ; q:

With pointwise convergence in probability only, it is not always

true that
when

fNj (N )

gNj (N )

p
!
0, where N is a sequence of 

increases.

Elements of the proof:

From (iii) and (iv ), we can form a non-random sequence

6.3.

ASYMPTOTIC PROPERTIES OF THE GMM ESTIMATOR

101

such that

p
!
0 uniformly for  2 :
 N () = 0 ,  =
we have that Q

QN () Q N ()
(i) and (ii),
Q N () > 0 otherwise.

From

0,

and

Therefore,

0 = arg min Q N ():

2

^N
 ^N minimizes QN ();
 0 minimizes Q N (p );
 QN () Q N () ! 0:
But this implies that

p
!
0, because

For asymptotic normality, we need additional assumptions.

6.3.2

Asymptotic normality

Assumption set 2
(v) Function

P

FN () = @fN ()=@ = N1 Ni=1 @f (xi; )=@.

p

 !
sequence N such that N
0, we assume that

(vi) Let

FN (N ) FN
where
on

.

FN

is a sequence of

For any

p
!
0;

102

(vii) Function

VN 1=2 NfN (0)

d
!
N (0; Iq );
N = NV ar[fN (0)], a sequence of q  q non-random,
where V
positive denite matrices.

Under Assumptions (i) (vii), thepGMM estimator

^N has the following asymptotic distribution: N (^N 0) v
N (0;
), where
is a p  p matrix:
Theorem 2

i 1

FN (^N )0AN VN AN FN (^N )

h
i 1
0
^
^
 FN (N ) AN FN (N ) :

demic Press: Orlando, Denition 4.20):

 1=2 

0
0
^
^
^
^

FN (N ) AN VN AN FN (N )
FN (N ) AN FT (N )
p
d
N (0; Ip)
 N (^N 0) !

Proof:
We know that

0:

fN

0 = fN (^N ) = fN (0) + FN (N )(^N 0);


where N 2 [^
N ; 0]. Since ^N is a consistent estimator (proved
p
 !
above), we know that N
0.
Let us premultiply expansion above by FN (^
N )0AN :
FN (^N )0AN fN (^N ) = FN (^N )0AN fN (0)
+FN (^N )0AN FN (N )(^N

0) = 0

6.4.

103

(^N

0 ) =

N (^N

i 1

h

FN (^N )0AN fN (0)

i

1
0 ) =
FN (^N )0AN FN (N ) 
p
FN (^N )0AN VN1=2VN 1=2 NfN (0)

p
VN 1=2 hNfN (0) is Ni(0; Iq ).
hp
i
p ^
^
Therefore, E
N (N 0) = 0 and V ar N (N 0) =
,
where
=
h
i 1
0

^
FN (N ) AN FN (N )
FN (^N )0AN VN AN FN (^N )
where

i 1

(vi), we can replace FN () by FN (^N )

everywhere. Note that FN is q p, therefore the variance-covariance
matrix of the GMM estimator is p  p.

6.4 Optimal and two-step GMM

Optimality of GMM: what is the best weighting matrix

AN , the

one giving us the smallest asymptotic variance-covariance matrix.

1 0
1
0
0
Aopt
N = arg min (FN AN FN )) FN AN VN AN FN (FN AN FN ) :
AN

Lemma 3

The matrix

(FN0 VN 1FN ) 1

104

CHAPTER 6. THE GMM ESTIMATOR

If we select

AN = VN 1, we get

(FN0 AN FN )) 1 FN0 AN FN (FN0 AN FN ) 1

(FN0 AN FN ) 1

= 0:
Hence, best weighting matrix for GMM: inverse of the variancecovariance of moment conditions.
For this choice, variance of GMM is simply

 1

V ar(^N ) = FN0 (^N )VN 1FN (^N )

"

!0
!# 1
h
i
^
^
1
1 @f (x; N ) 1
1 @f (x; N )
V arf (x; ^N )
N @
N
N @

and this denes the optimal GMM. But: in general, no condition imposed on distribution of

VN

that produces a

heteroskedasticity-robust GMM estimator for .

Solution: use a two-step estimation procedure

 Step 1.

AN (A1N ):

^1N

using an

^1N = arg min u0()ZA1N Z 0u():



 Step 2. Compute V^N from u(^1N ) and nd ^2N such that
^2N = arg min u0()Z (V^N ) 1Z 0u():


initial matrix

A1N

6.5.

105

solutions:

 Method 1.

sively replacing

^N

and

AN

 Method 2. Acknowledge the fact that optimal weighting matrix

depends on

, and solve
^N = arg min QN ()  fN ()0AN ()fN ():


In practice, construction of variance-covariance matrix depends on the nature of data: cross-sections, times series, or panel
data (see dedicated section below).

6.5 Inference with GMM

Advantage of GMM over many alternative estimation procedures:
easy to provide statistical inference on model validity. In general,
we will test for the validity of moment conditions, also denoted orthogonality conditions.

^N = arg min QN () = fN () VN fN (), where fN ()

P
1 N f (x ; ) and V is a consistent estimator of
i
N
i
N
V = limN !1 var[ NfN (0)].
First-order condition associated with minimization of QN ( ):
Find

@QN (^N )
^N )0VN 1fN (^N ) = 0;
=
F
(
N
@ ^N
where FN (^
N ) = @fN (N )=@.
If ^
N satises FOC above, it must also satisfy
P^ VN 1=2fN (^N ) = 0;

106

where

P^ = M^ (M^ 0M^ ) 1M^ 0

and

so that

M^ = VN 1=2FN (^N );


P^ VN 1=2fN (^N ) = VN 1=2FN (^N ) FN (^N )0VN 1FN (^N )

FN (^N )0VN 1fN (^N ):

 1

Population analog to condition above:

P V 1=2E [f (0)] = 0;
where

P = M (M 0 M ) 1M 0
and Fi ( ) = @f (xi;  )=@ .
Projection matrix
only
If

and

is of rank

p,

linear combinations of the

M = V 1=2E [Fi(0)];

so that restrictions above set

q1

vector

E [f (xi; 0)] to

0.

the remaining conditions are unused in estimation.

The identifying restrictions determine the asymptotic distribution
of

^N :

N (^N

p
0
1
0
1
=
2
0) = (M M ) M V
NfN (0) + op(1);

M pN (^N 0) is asymptotically equivalent to

P V 1=2 NfN (0). This implies
p ^

d
N (N 0) !
N 0; (M 0M ) 1 :

where

The basic way of testing for model validity is to use the over-

identifying restrictions

(Iq

6.5.

107

p.

P^ )VN 1=2fN (^N ) = VN 1=2fN (^N ):

QN (^N ) measures the extend to which

(IQ
Interpretation:

the data

satises the over-identifying restrictions. The asymptotic distribution of sample moments is determined by the function of the
data in the over-identifying restrictions:

VN 1=2 NfN (^N ) = (I q P )V 1=2 NfN (0) + op(1);

^ converges in probability to P . We nally have
because P

d
!
N (0; Iq

P):

Both statistics (from identifying and over-identifying restrictions)

are orthogonal:

p
p
Cov[ N (^N 0); NfN (^N )]
= (M 0M ) 1M 0 (Iq P ) = 0:

It is equivalent to test model validity by testing either

H0 : E [f (xi; 0)] = 0

or

V 1=2 is non-singular. H0 is the combination of identifyI

O
ing restrictions (H0 ) and over-identifying restrictions (H0 ):
H0I : P V 1=2E [f (xi; 0)] = 0;
H0O : (Iq P )V 1=2E [f (xi; 0)] = 0:

because

H0I because this is a set of p conditions,

O
by estimated sample moments. But H0

It is not possible to test for

automatically satised

can be tested because they are not necessary for identication.

The test statistic proposed by Hansen (1982) is

JN = NQN (^N )

d
!
2(q

p)

under

H0:

108

It can be shown that

JN

A 0
JN v
zq (Iq
where zq v N (0; Iq ).

is asymptotically equivalent to

P )0 (Iq

P )zq = zq0 (I

P )zq ;

6.6 Extension: optimal instruments for GMM

We have seen above how to obtain the optimal GMM estimator,
by selecting for the weighting matrix the inverse of the covariance
matrix for the moment conditions. We now show how to obtain
an even more ecient GMM estimator, based on the best choice
for the instruments. We are looking for the optimal, asymptotic
variance minimizing choice of instruments.

Based on Newey 1993, Ecient estimation of models with conditional moment restrictions.

6.6.1

(z; ) denote a s  1 vector of residuals, where z is a p  1

vector of observations (on all variables), and  is the q  1 vector
Let

of parameters. We have the following moment restrictions

E [(z; 0)jx] = 0

E [A(x)(z; 0)] = 0;
where x is a vector of conditioning variables, A(x) is an r  s
matrix of functions of x, and 0 the true value of parameters.
Focus of the analysis here: choose A(x) to minimize the asymptotic variance of the GMM estimator.
Let

@(z; 0)
D(x) = E
jx ;
@

6.6.

109

B (x) = C:D(x)0
(x) 1;
where

covariance matrix for these instruments is


 = E [D(x)0
(x) 1D(x)] 1 :
Example: Linear model with heteroskedasticity
We have in the model

D(x) = x0;

y = x0 0 + "; E ("jx) = 0,

The corresponding IV estimator is in this case the weighted least

1=2(x): 
1 corrects for heteroskedasticity,
Analogy with linear model:
(x)
and derivatives @(z; 0)=@ correspond to regressors, and matrix
D(x) is a function of x closely correlated with those derivatives.

Since

and dene

so that

E (mAm0B ) = FN0 AN E [A(x)

(x)B (x)0] = FN0 AN E [A(x)D(x)]
and
= FN0 AN FN ;
E (mAm0A ) = FN0 AN VN AN FN ; [E (mB m0B )] 1 = :

Therefore,

(E [mB m0B ]) 1

110

CHAPTER 6. THE GMM ESTIMATOR

= (E [mAm0B ]) 1 E [mAm0A ]
where

E [mAm0B ] (E [mB m0B ]) 1 E [mB m0A ]

E [mB m0A] = E [RR0];
n

R = (E [mAm0B ]) 1 mA
Since

E [RR0] is positive semi-denite,  is a lower bound for the

asymptotic variance.

6.6.2

A rst feasible estimator

Optimal instruments

D(x) = D(x; 0)

and

(x) =
(x; 0);

D(:) and
(:) are known, and  is a real vector. Because D (x) and
(x), we could estimate 0 by running
a linear regression of @(z; ^
)=@ and (z; ^)(z; ^)0 on x. This
^ (x) = D(x; ^)0
(x; ^) 1 and the resulting GMM estimator
gives B
where functions

would be

^ = arg min

8
n
<X

2 :

i=1

(zi; )0B^ (xi)0

"

n
X
i=1

B^ (xi)B^ (xi)0

# 1
n
X
i=1

9
=

B^ (xi)(zi; ) :

This estimator is always consistent, but not ecient if functions

D(x; ) and
(x; ) are misspecied.

Consider

6.6.

where

111

h(:) is known.

Ex-

ploiting additional information on second moment yields an IV

estimator at least as ecient as weighted least squares.
Drawback: estimator may not be consistent if the form of heteroskedasticity is misspecied.
Dene moment restrictions as

(z; ) = y

f (x; ) ; [y

f (x; )]2 h(x; ; ) 0 :

Optimal instruments take the form

@f (x; )=@ 0
0
D(x) = D(x; 0); D(x; ) = @h(x; ; )=@ 0 @h(x; ; )=@0 ;

(x) = V ar[("; "2)0jx];

B (x) = D(x)0
(x) 1:

Empirical issue: when is incorporating additional moment condition yielding a more ecient estimator ?
Asymptotic variance of the heteroskedasticity-corrected least squares
estimator:



E ["2jx]

@f (x; 0)
@



 
@f (x; 0) 0 1
;
@



in E [D(x)0
(x) 1D(x)] 1.

The two are equal if

 E ["3jx] = 0, or
 h(x; 0; 0) = h(x; 0).
Otherwise, the asymptotic variance of the heteroskedasticity-corrected
least squares estimator will be larger than the conditional moment
bound.

Corollary:

not depend on

x or !

h(x; ; ) and
(x) do

112

Computation of an ecient estimator

Needs specication of

(x),

Assume

E ["3jx] = 0;

0 can be estimated by the sample vari2

^
^ ^ ), where ^ and ^ are initial estif (xi; )] =h(xi; ;

where kurtosis parameter

ance of

[yi

mators.
Estimated optimal instruments are then

"

^
0
^ x) = h(x; ; ^ )
D^ (x) = D(x; ^);
(
^ ^ )2 ;
0
^:h(x; ;
^ x) 1:
B^ (x) = D^ (x)0
(
6.6.3

Nearest-neighbor estimation of optimal instruments

avoid misspecication in

D(x; 0)

and

(x; 0)

in

computing optimal instruments.

Principle: estimate expectations that enter optimal instruments
nonparametrically (these expectations are conditional upon

x).

6.6.3.1 The nearest neighbor estimator

Simplest nonparametric estimator: nearest neighbor, or

NN

estimator.

The nearest neighbor estimator of conditional expectation is

constructed by averaging over the values of the dependent variable for observations where the conditional variable (x) is closest
to its evaluation value.

6.6.

EXTENSION: OPTIMAL INSTRUMENTS FOR GMM

113

xl denote a measure of scale of lth component of x (standard deviation). x being of rank r , dene
Let

jjxi

xj jjn =

r
X
(xil
l=1

^ l

xjl )2

)1=2

This measures the distance between observations

ing for the multivariate nature of

K; K  n, and

8
<
:

Integer
vation

i.

!kK  0
!kK = 0

x.

i and j , account-

1  k  K;
for
k > K;
PK
k=1 !kK = 1:

for

and

 n is the number of nearest neighbors for any obser-

j 6= i according to distance
th
above. Then assign the weight Wij = !jK to observation with j
smallest distance jjxi
xj jjn.
Let

Wii = 0

and rank observations

!kK = 1=K; k  K .
To compute conditional expectation of y given x:
 Select the set of the K (out of n) xi's closest to point x;
 Compute the mean of the yi values corresponding to the xi's
Example: uniform weights

chosen above:

K
1X

E (yjx) =
!kK yk (x) =
yk (x);
K k=1
k=1
K
X

where

yk(x)

yi's ordered according to distance


is the yi whose xi is closest to x, y2 is

are the original


measure dened above (y1

114

CHAPTER 6. THE GMM ESTIMATOR

Other possibility:

E (yjx) =

n
X
j =1

!j yj(x);

using either triangular weights:

!jT =

2(K
0

j + 1)=[K (K + 1)]

j < K;
for j  K;
for

!jQ

6(K 2
0

(j

1)2]=[K (K + 1)(4K

1)]

j < K;
for j  K:
for

6.6.3.2 Application to optimal instruments estimation

The nearest neighbor estimator of the conditional covariance
at

xi is

^ xi) =

where observation

n
X
j =1

(xi)

i is excluded because Wii = 0 (leave-one-out

procedure).

D(x) is accordingly
n
X
@(zj ; ^)
^
D(xi) =
Wij
:
@
j =1

Assume now some components of

form, and depend only on

x.

D(x)

D(x; )

6.6.

115

x and nuisance parameters .

D(x; ) has the same dimension as D(x), and its components are
equal to those of D (x) that are known, and 0 otherwise.

n
X
j =1

Wij

@(zj ; ^)
@

D(xj ; ^) :

Finally, we can compute

n
X
1
0
1
^ xi) ; ^ =
^ xi)D^ (xi)
B^ (xi) = D^ (xi)
(
D^ (xi)0
(
n i=1
6.6.4

! 1

estimators

6.6.4.1 Conditional moment estimation

We wish to estimate the conditional expectation at the point

x, E (Y jX = x) = m(x), with
m(x) =

Z 1

X=

f (y; x)
y
dy;
f1(x)
1

f (:; :) and f1(:) are respectively the joint density of (x; y)

and the marginal density of x.
A nonparametric alternative to k
NN will consist in estimating densities above nonparametrically, to construct m
^ (x) =
i
R1 h
^
^
1 y f (y; x)=f1(x) dy . Popular approach in practice: the Nadaraya-

where

116

Let

F (x) denote the cumulative density function of X .

The den-

sity function is

f (x) =

d
F (x + h=2) F (x h=2)
F (x) = lim
h!0
dx
h

P rob (x h=2 < X < x + h=2)

:
h!0
h
For estimating f (x) based on observations x1 ; : : : ; xn , we consider h a function of n such that h
! 0 when n ! 1. The
= lim

probability above is then estimated by the proportion of observations falling in the interval

1
f^(x) =
nh
1
=
nh

(x

Number of


Number of

x1

n
1 X
=
1I
nh i=1

n
1 X
=
1I
nh i=1

1

2

h=2; x + h=2):


h
h
;x+
2
2

x1; : : : ; xn in x
x

;:::;

xn

in

( 1=2; 1=2)

1 xi x 1
 h 2
2


1
;
i
2



xi

f^(x) is the per unit relative frequency in the interval (x h=2; x +

h=2), with midpoint x. Bandwidth h measures the degree to which
the data are smoothed (averaged) in computing f^(x).
This rst, naive nonparametric density estimator as been proposed by Fix and Hodges, 1951, and obtains by averaging the

xi's

6.6.

EXTENSION: OPTIMAL INSTRUMENTS FOR GMM

in an interval around

117

xi  h=2.

If one prefers smoother sets

of weights, one can replace the indicator function by a positive kernel function denoted

K (:).

estimator is

The Parzen-Rosenblatt kernel density

n
n
X
X
x
x
1
1
i
K
=
K ( i) ;
f^(x) =
nh i=1
h
nh i=1
where the kernel function has the following properties:

Z 1

K ( )d = 1; K (

1) = K (1) = 0;

Note: Easy to generalize to multivariate density estimation,

with a multivariate kernel
and

K (x)dx = 1:

K1(:; :) such that K (x) = K1(x; y)dy



n
1 X
z z
^
^
f (y; x) = f (z ) = q+1
K1 i
;
nh i=1
h
where

is a xed point.

6.6.4.3 Selection of bandwidth parameter

Important issue: selection of the optimal bandwidth parameter,

h.

For this, we need the following set of assumptions:

(A1) Observations
(A2) Kernel

118

R
R K (2 )d
(ii)
(
R K
2
(i)

(iii)

= 1,
)d = 2 6= 0,
K ( )d < 1.

(A3) Second-order derivatives of

in some neighborhood of

x.

are continuous and bounded

h = hn ! 0 as n ! 1.
(A5) nhn ! 1 as n ! 1.
(A4)

With these assumptions, we can approximate the bias and variance of

f^:

h2 00
^
Bias [f (x)] =
 f (x);
2 2
Strategy for choosing

h:

1
var [f^(x)] =
f (x)
nh

K 2 ( )d :

Error (MISE):

Z h

i2

f^(x) f (x) dx =

Z h

or preferably, its approximation (AMISE):

1
AMISE = 1h4 + 2(nh) 1;
4
Z

where

1 = 22 [f 00(x)]2dx; 2 =

K 2( )d :

2 if O(h4 ) and variance is O(nh) 1, AMISE if of order

maxfO(h4); O(nh) 1g. Hence, the only value of h for which the

Since Bias

h / n 1=5;

for which

6.6.

119

6.6.4.4 The Nadaraya-Watson kernel-based estimator

The estimator is

m
^ (x) =

"

1 Pn K1 yi y ; xi x 
iP
=1
h
h
y
dy;
n
x
x
q
1
i
K
1 (nh )
i=1
h

Z 1

(nhp)

K (:) and K1(:; :) are q-multivariate and p-multivariate kernels respectively, and p = q + 1 (recall x has rank q ). Dene
i = h 1(yi y) , y = yi hi. The numerator above becomes

where

Z 1

(nhp) 1

n
X

i=1
Z
n
1X
y
=
n i=1 i
n Z 1
1X

n i=1

(yi
1
1

h )K1 ;


K1 ;

xi

h p+2K1 ;

xi

xi





hd

h q d


d;

and since the last term is zero for symmetric kernels, we nally
have

n
1X
=
yh
n i=1 i

Z 1

K1 ;

xi



d

n
1X
x x
=
:
yih q K i
n i=1
h

m
^ (x) =

" n
X
i=1

xi

# 1 "X
n
i=1

xi

yi :
G

NN ).

120

We consider here weights similar to kernel functions with unbounded support:

m(x) = E (Y jX = x) =
where

and

n
X
i=1

!is(x)yi;

xi x 
d
!is(x) = Pn
xi x  ;
K
i=1
d
th nearest
distance between x and its K

d is the

neighbor.

Numerous papers on optimal choice of window width

One can show (Mack, 1981) that

and

K.

estimation:

K = nh4=(4+q)

and

K opt / n4=5; hopt / n 1=5:

Chapter 7
GMM estimators for time series
models
7.1 GMM and Euler equation models
Lucas critique (1976): evaluations based traditional dynamic simultaneousequation models are awed because parameters are assumed invariant across dierent policy regimes.
Hence, marginal response to a change in policy instruments is not
to be expected from rational agents taking into account policy
changes in their decision making.
Standard estimation procedures (MLE) are computationally burdensome when one introduces taste and technology parameters.

Hansen and Singleton (1982): GMM can be applied easily to

structural models, to draw inference on these parameters.

7.1.1

Consumption-based asset pricing model: representative agent chooses

consumption and investment in a single asset to maximize dis-

121

122

counted utility

max E0

"
1
X
t=0

t U (Ct) ;

where

Et(:) is expectation operator at time t, conditional on information

set t ,
Ct is consumption,
t is a constant discount factor,
U (:) is a strictly concave utility function.
Budget constraint:

Ct + Pt Qt  RtQt 1 + Wt;
where

Rt: pay-o for asset (bought in period t 1),

Pt and Qt: price and quantity of asset bought,
Wt: labor income.
Asset price is deated by the price of consumption good.

First-order condition:

Equivalently,

where

Rt+1 U 0(Ct+1)
Et
 U 0(C )
Pt
t

U 0(:) = @U=@C:

1 = 0:

This is the Euler equation for the system.

Specication of the utility function:

Ct
U (Ct) = ;

with

< 1;

7.1.

123

GMM AND EULER EQUATION MODELS

so that

where

7.1.2

R
C
Et t+1  t+1 1 = 0;
Pt
Ct
1.

(7.1)

GMM estimation

Maximum-Likelihood Estimation: specify conditional distributions

of

R
LW1;t+1 = log t+1
Pt

and

C
LW2;t+1 = log t+1
Ct


given

t ;

The

R
C
E t+1 t+1
Pt Ct

1 =E



R
C
t+1 t+1
Pt Ct



= 0:

Additional restriction obtain from incorporating the rational

expectations hypothesis: agents use all available information at

t, t, so that
If yt+1 2
= t but zt 2 t then Et(yt+1zt) = [Et(yt+1)] zt:
If Et (yt+1 ) = 0, by the Law of Iterated Expectations, we have
E (yt+1zt) = 0, and the Euler equation implies



Rt+1 Ct+1
E ["t+1( ; )zt] = 0 where "t+1 =
1;
Pt
Ct

time

124

and

zt

Valid candidates are

Ct i; Rt i; Pt i; i  0.

t.

Notes.

 This example shows that model errors need not be linear in

endogenous variables for GMM.

 If replaced by 1=(1 + r) where r:

rate, model is just identied (for

7.2 GMM Estimation of MA models

Consider estimation of a pure moving average MA(1) model

yt = "t + 0"t 1;
where

7.2.1

"t is an i.i.d.

(7.2)

02, and j0j < 1.

A simple estimator

0 =

E (ytyt 1)
0
=
:
E (yt2)
1 + 02

Replacing unknown parameter

^T =
we obtain estimator

^T

0 by sample estimator

PT
t=2 yt yt 1 ;
PT
2
t=2 yt

by solving

^2T

^T 1^T

1 = 0:

7.2.

125

j^T j < 0:5, but this may

not be veried in nite samples, especially if j0 j close to 1. We

may dene

~T =
and solution for

~T

8
<

0:5

if j
^T j < 0:5;
if 
^T > 0:5;
if

^
: T

0:5

is

~T =

Second structural parameter:

rived from

1 4^2T
:
2^T
02, whose expression can be de-

(1=T ) Tt=1 yt2

2
~ T =
:
2
~
1+
Consider now estimation in a GMM framework.

 = (; 2)0 and let



2
y
t yt 1  
f (yt; ) =
;
yt2 2(1 + 2)
such that Ef (yt ; 0 ) = 0 (theoretical moment condition).

T
1X
f T ( ) =
f (y ; ) =
T t=1 t

(1=T ) PTt=1 ytyt 1 2

(1=T ) Tt=1 yt2 2(1 + 2)

fT (^T = 0
~
^T = ~T = (T ; ~ 2T ).

This system is just-identied, and solving

same estimators as above:

yields the

126

CHAPTER 7. GMM ESTIMATORS FOR TIME SERIES MODELS

Estimators ^T and ~T are consistent and asymptotically normal with distribution
Theorem 4

p
pT (^T

0)
0)

T (^T

where

1
=
(1 02)2

v N (0; );

1 + 02 + 404 + 06 + 08

20203(2 + 02 + 04)
20203(2 + 02 + 04) 204(1 202 + 304 + 206)


0 0
+
;
0 4
with 4 the fourth-order cumulant of "t.
Under the normality assumption, asymptotic variance of the
MLE of

^T

(1

is

02).

in general.

7.2.2

Autoregressive Approximation (Durbin

1959).
The MA(1) dened by (7.2) is invertible, therefore it admits
an AR representation:

yt =
where

1
X
j =1

j (0)yt j + "t;

j = 1; 2; : : :

7.2.

127

which is approximated in practice by

yt =

K
X
j =1

j (0)yt j + "Kt:

(7.3)

"Kt = "t +

1
X
j =K +1

1:

We can look at (7.2) as the structural model, and Equation (7.3)

as the reduced-form model. The AR model captures second-order
properties of

yt

0 based on estimators for j (0).

K -vector
0
1
1()
A 8 ; with j () = ( )j ;
AK () = @ ...
K ( )
^K denote the K -vector of OLS estimators (^ 1; : : : ; ^ K )
and let A
Dene the

in (7.3).

For an given

K , we dene


^T K = arg min A^K

2

where

7.2.3

0

AK () VT K

 = ( 1; +1) and VT K

is a

A^K

K K

AK () ;

weighting matrix.

Example: The Durbin estimator

We can write

j () =  j 1; j = 1; 2; : : : ;

(7.4)

with

0() = 1:

128

exact autoregressive relationship for j (), and we can

estimate 0 by regressing OLS estimates (^
1; : : : ; ^ K ) on lagged
values of themselves, i.e., on 1; (^
1; : : : ; ^ K 1). The estimator is
This is an

P

^D =

K
^ j ^ j 1
j =1
P

K
2
^j
j =1

with

^ 0 = 1:

And in terms of (7.4):

VT K = BK ()0BK ();
and

LK : K  K

where

BK () = IK + LK ;

7.3 GMM Estimation of ARMA models

To simplify exposition, we concentrate on the ARMA(1,1) case.

7.3.1

The model is

where we assume

0 6= 0;

and we view the model as a regression

yt = 0yt 1 + ut;
where

ut = "t + 0"t 1;

(7.5)

7.3.

129

OLS estimation (ignoring the moving average structure in

is inconsistent because

yt =
7.3.2

1
X
j =0

0j ut

ut)

(7.6)

IV estimation

ut implies E (utut j ) = 0 8j  2, and

(7.6) implies that E (ut yt j ) = 0 8j  2. We can use these moment conditions to estimate consistently 0 with an IV procedure.

Moment conditions are

Ef (yt; 0) = 0

f (yt; ) = (yt

where

yt 1)yt 2;

T
1X
fT ( ) =
(y yt 1)yt 2;
T t=3 t
^ T ) = 0 for ^T gives
and solving fT (

^ T =

T
X
t=3

yt 2 yt 1

! 1 T
X
t=3

yt 2yt:

^ T is consistent and asymptotically normal, with

T ( ^ T 0) v N (0; ),
(1 02)(1 + 402 + 4 00 + 4 003 + 2 0202 + 02)
=
:
(1 + 00)2( 0 + 0)2

Theorem 5

In contrast, the asymptotic distribution is the MLE from the

ARMA(1,1) model is

T ( ^ MLE

(1 + 00)2(1 02)
0) v N 0;
:
( 0 + 0)2

130

Notes. Both these estimators have a large variance when

is close to

0.

The MLE is more ecient than GMM, especially for large values
of

0 and 0.

We can also consider augmenting the set of instruments (the

model becomes over-identied) by including

yt j ; j = 2; 3; : : :,

yielding

T
X

^ Tj =

t=j +1

yt 1yt

! 1
j

T
X
t=j +1

yt yt j ;

for

j  2:

^ Tj has asymptotic variance  0 2(j 2).

^ T is the most ecient of these
Because j 0 j < 1, it follows that
Dolado 1990 shows that

yt 1 )

variable (

yt

j ).

(rapidly) with

Since

yt

yt 1

This gives the

q vector of conditions

E (utyt j ) = 0;

where

8j

yt 1))

estimator is

^ Tq =

T
X
t=q+2

yt 1Yq;t0 2ATq

T
X
t=q+2

2.

Yq;t 2yt 1

! 1

GMM

7.4.

131

COVARIANCE MATRIX ESTIMATION

T
X
t=q+2

yt 1Yq;t0 2ATq

X
t=q+2

Yq;t 2yt;

q  q weighting matrix.
^ Tq is
The asymptotic distribution of

where

ATq

is a positive denite


T ( ^ Tq 0) !d N 0; "2(Rq0 Aq Rq ) 1Rq0 Aq Vq Aq Rq (Rq0 Aq Rq ) 1 ;

where

Vq = lim T varfT ( 0);

T !1

(1 + 00)( 0 + 0)
"2 0j 1
:
(1 02)
1
The optimal choice for the weighting matrix being ATq = Vq , we

Rq = E (Yq;t 2yt 1);

with

have

T ( ^ Tq

j th element


0) !d N 0; "2(Rq0 Aq Rq ) 1 :

7.4 Covariance matrix estimation

In the time-series framework, moment conditions are dened as

E [f (xt; 0] = 0,

and the variance-covariance matrix to be esti-

mated is

T X
T
X
1
VT = T var[fT (0)] =
E [f (xt; 0)f (xs; 0)]:
T t=1 s=1
This is the average of autocovariances for the process
Let

ft = f (xt; 0)

and rewrite

function:

VT =

T 1
X
j = (T

1)

VT

f (xt; 0).

as a general autocovariance

T (j )

where

132

T (j ) =
7.4.1

(1=T ) Tt=j +1 E (ftft0 j ); j  0;

P
(1=T ) Tt= (j 1) E (ft+j ft0); j < 0:

Linear regression model

yt = xt 0 + ut;
Assume

where

E (ft) = E (xtut) = 0:

E (utjut 1; xt; ut 2; xt 1; : : :) = 0

and

Residual

ut is neither heteroskedastic nor serially correlated.

We

have

T
1X

VT =
T (0) =
E (xtututx0t) = u2 E (xtx0t);
T t=1
the standard OLS variance-covariance matrix. The estimator of

VT

T
2X

^
V^T = u
xtx0t;
T t=1
7.4.2

where

T
1X
2
^ u =
u^2t ; u^t = yt
T t=1

^
xt :

E (utu0tjut 1; xt; ut 2; xt 1; : : :) = t2:

The covariance matrix is then

T
1X

VT =
T (0) =
E (xtututx0t);
T t=1

7.4.

133

which is consistently estimated by

T
1X
^
VT =
xtu^tu^tx0t:
T t=1
This is White's heteroskedasticity consistent estimator.
In a typical IV setup, where

ft = wt(yt

xt ); wt are instruments;

T
1X

E (u2t )wt0 wt;
VT =
T t=1
and the asymptotic covariance matrix would be

1
P X
T W

 1

1 ^
P P
T W W



1 0
X PW
T

 1

^ is a T  T diagonal matrix with typical element u^2t , and

PW = W (W 0W ) 1W 0.

where

7.4.3

Assume

VT =

m
X
j= m

T (j ):

The covariance matrix estimator is based on the sample analogue

V^T =

^ T (j ) =

m
X

^ T (j );

where

j= m
(
P
(1=T ) Tt=j +1 xtu^tx0t j u^t j ;
P
(1=T ) Tt= (j 1) xt+j u^t+j x0tu^t;

j  0;
j < 0:

134

In most cases, restrictions as in examples 1-3 above are too

strong, and an obvious idea would be to construct an estimator

V^MM

based on sample analogues to population autocovariances:

V^MM =

^ T (j ) =
where

T 1
X
j = (T

1)

^ T (j );

where

P
(1=T ) Tt=j +1 f^tf^t0 j ; j  0;
P
(1=T ) Tt= (j 1) f^t+j f^t0; j < 0;

But:

 The number of estimated autocovariances grows with the sample size;

 Although V^MM may be asymptotically unbiased, it is not consistent in the mean squared error sense;

 Finite sample properties: In the exact-identication case, V^MM

is 0 8T .
Why sample autocovariance matrix

^ T (j )

not consistent for

j, T + 1  j  T 1 ?
Suppose j = T
2; then
^ T (j ) tends to 0 as T
arbitrary

7.4.4

!1!

Given problem above, a rst idea is to consider models for which

autocovariance genuinely tends to 0 as
case for

! 1.

This is the

mixing property.

7.4.

135

Consider two bounded mappings Y : Rk+1 ! R

and Z : Rl+1 ! R. The sequence fytg is mixing if there exists
a sequence of positive numbers f ng, converging to 0, such that

Denition 2

jE [Y (yt; yt+1; : : : ; yt+k )Z (yt+n; yt+n+1; : : : ; yt+n+l)]

E [Y (:)]E [Z (:)]j < n:
We can replace the sum in the denition of

sum,

such that terms for which

p are eliminated.

VT :

V^T =
^ T (0) +

p 
X
j =1

by a

truncated

the following estimator for

^ T

( j ) =
(j )0, we consider


^ T (j ) +
^ T (j )0 :

(7.7)

p is dened as the lag truncation parameter,

1=4), so
and should go to innity at some rate, typically p = o(T
that all non-zero
T (j )'s are consistently estimated.

Problem with estimator in (7.7): in nite samples,

not be positive semidenite.
(1987):
with

multiply

^ T (j )

^ T (j ) may

V^T =
^ T (0) +

p 
X
j =1

j
p+1



^ T (j ) +
^ T (j )0 ;

^ T (0) down to

136

7.4.5

Extension of Newey-West suggestion: looking for more ecient

covariance matrix estimators.
General form: weighted average of sample autocovariance matrices:

V^T =

T 1
X
s= (T

1)

!s
^ T (s);

f!sg is denoted the lag window.

Strategy: choose a lag window such that f!s g approaches 1
 rapidly enough to obtain asymptotic unbiasedness;
 slowly enough to ensure that the variance converges to 0.
where the sequence of weights

In practice, we concentrate on



s
;
!s = k
mT

mT is the scale parameter or the bandwith parameter, and

function k (:) is the lag window generator. These estimators bewhere

0.

We assume

The function k(:) : R ! [ 1; 1] satises

k(0) = 1; k(z ) = k( z ) 8z 2 R;

Z 1

jk(z)jdz < 1;

and k(:) is continuous at 0 and "everywhere else" except at a nite number of points.
Note: When k (:) = 0 for z > 1, mT reduces to p, the lag truncation parameter.

7.4.

Let

kr = lim

z !0

tion

kr

137

k(:), and kr

k(z )
jzjr

r0, then kr = 0 for r < r0.

Consider nally the following measure of smoothness of the spectral density function in the neighborhood of 0:

1
X
(
r
)
1
S = (2)
jj jr
(j );
j= 1
also denoted the
function:

When

1
1 X
Sf () =

(j )e
2 j = 1

ij :

is equal to

2

ft evaluated at the zero frequency (Hansen

1982).
Dene the asymptotic truncated Mean Squared Error:

T
MSEh = E min j vec(V^T
mT
where

BT

VT )j; h ;

Theorem 6

We have
(i) If m2T =T

! 0 then V^T

VT

p
!
0.

p
! 1 and BT !
B.

138

(ii) If m2Tr+1=T ! 2)0; 1( for some r 2 [0; 1) for which kr

and jjS (r) jj < 1, then
p
T=mT (V^T VT ) = Op(1):
(iii)

lim lim MSEh = 42 kr2(vecS (r) )0B vecS (r) =

T !1 h!1

Z 1
+
k2(z )dz tr(B )(I + Bqq )Sf (0)
Sf (0) ;
1 P
P
where Bqq = i j eie0j
ej ei and ei is a zero vector with 1 as
the ith element.

(i): establishes consistency of scale parameter covariance estimators for bandwidth sequences that grow at rate

o( T ).

(ii): Gives rate of convergence.

(iii): Gives asymptotic truncated Mean Squared Error.

For

j th diagonal element of V^T , asymptotic bias is

(r )
mT r kr 2Sj;j
and asymptotic variance
Z 1
m 
T
(2)
2
8 Sj;j
k2(z )dz:
T
1

Criteria of choice for scale parameter window

According to the-

orem above, preferred estimators have large

Variance of these

r.

O(mT =T ) and the bias is O(mT r ).

Also, no kernel estimators with r > 2 can be positive semidenite. Hence, we should restrict attention to estimators with r = 2
kernel estimators are

(which rules out truncated and Bartlett kernels).

Optimal choice of scale parameter

mT :

according to asymptotic

T (2r+1)

7.4.

139

Table 7.1: Some Kernel estimators for weighted autocovariance

k(z )
r kr
Truncated
1 for jz j  1,
1 0
0 otherwise
Bartlett
1
jzj for jzj  1
1 1
0 otherwise
Parzen
1
6z 2 + 6jz j3 for 0  jz j  1=2,
2(1
jzj)3 for 1=2  jzj,
2 6
0 otherwise
7.4.6

Consider the Fourier transform of the lag window:

T 1
1 X
W (; mT ) =
!e
2 s= (T 1) s
This is also denoted the

spectral window.

is:

V^T =

Z 

where

is the

sample spectral density

or

T 1
X
s= (T

1)

^ seis

periodogram,

and

W (:; :) is

the

averaging kernel.
Spectral estimators once computationally burdensome, before FFT
(Fast Fourier Transforms) became popular.
Dene the Fourier transform of

f^t as

T
1 X
(p) = p
f^teip t:
2T t=1

140

Table 7.2: Some Kernel estimators for weighted periodograms

k(z )
kr
h
i r
sin(6z=5)
25
2
cos(6z=5)
Quadratic 122 z2 6z=5
2  =10
Daniell
(sin(z )=z
2  2 =6
2(1 cos(sz ))
Tent
2 1/12
z2
The periodogram matrix can be computed at the Fourier frequencies

p =
as

2p
T
; p = 1; 2; : : : ;
T
2

and we have the nal expression for the covariance matrix:

(T 1)

X
2
V^T =
I^T (0p)W (0p; mT );
2T 1 p= (T 1)

0p = 2p=(2T 1). Within the class of scale parameter

windows with r = 2, the Quadratic Spectral window (see table)
where

minimizes the truncated MSE across at the 0 frequency (Andrews

1992).

Chapter 8
GMM estimators for dynamic
panel data
8.1 Introduction
GMM estimation was introduced as an interesting alternative to
Fixed-eects, Maximum-Likelihood or GLS estimation procedures.
But its advantages are the most obvious for estimating dynamic
panel-data models.

yit = yi;t 1 + uit; uit = i + "it:

The Anderson-Hsiao Instrumental-variable procedure: consistent
estimates when

is xed, based on First-Dierence model trans-

formation.
Two drawbacks:
a) In IV procedure, variance-covariance matrix is restricted;
b) Only one instrument is used (either

141

yi;t 2 or yi;t 2

yi;t 3).

142

8.2 The Arellano-Bond estimator

Important paper: Arellano and Bond (Review Econ. Stat. 1991):
more robust procedure can be used (point a)) and more orthogonality conditions can be used (point b)).

8.2.1

Model assumptions

Under these assumptions, we have the set of moment conditions:

E (yisuit) = 0; t = 2; 3; : : : ; T; s = 0; 1; : : : ; t 2;
uit = "it = "it

where

"i;t 1.

This is a set of

T (T

1)=2

conditions (compare with Anderson-Hsiao, where only 1 condition

was available).

" are
correlated, i.e., we must have E ("it "i;t+s ) = 0, for

not serially

s = 1; 1.

If serial correlation is present, we have the set of conditions:

E (yisuit) = 0; t = 3; : : : ; T; s = 0; 1; : : : ; t 3;
which gives

(T

1)(T

2)=2 conditions (we lost (T

1) condi-

tions).
By continuous substitution seen before:

1 t
i + tyi0;
yit = "it + "i;t 1 + 2"i;t 2 +    + t 1"i1 +
1 

8.2.

143

so that

E (yi;t 2uit) = E (yi;t 2("it "i;t 1))

= E ("i;t 2("it "i;t 1)) = 0
because by assumption E ( i "it ) = E ("it yi0 ) = 0.
8.2.2

We need a) the instrument matrix W; b) An initial weighting

matrix.
The instrument submatrix for unit

i is of the form:

yi0 0 0            
6 0 yi0 yi1 0
0 0 
6
60 0 0 y
i0 yi1 yi2 0
Wi = 6





0
0
0



yi;T 2

6
4

..
.

..
.
..
.

0
0
so that Wi ui =
0
ui2 yi0
B ui3 yi0
B
B ui3 yi1
B
B ui4 yi0
B
B u y
i4 i1
B
B u y
i4 i2
B
B
B
B
B
B
@

..
.

uiT yi0
..
.

uiT yi;T 2
0
and E (Wi ui ) = 0.

..
.

..
.

..
.

..
.

..
.

yi0

C
C
C
C
C
C
C
C
C=
C
C
C
C
C
C
A

B
B
B
B
B
B
B
B
B
B
B
B
B
B
B
@

(yi2
(yi3
(yi3
(yi4
(yi4
(yi4
(yiT
(yiT

..
.

..
.

yi1) yi0
yi2) yi0
yi2) yi1
yi3) yi0
yi3) yi1
yi3) yi2
..
.

yi;T 1) yi0
..
.

yi;T 1) yi;T 2

7
7
7
7
7
5

1
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
A

144

CHAPTER 8. GMM ESTIMATORS FOR DYNAMIC PANEL DATA

(W 0
W ) 1:
is the variance-covariance
of " (in the transformed model). If "it is homoskedastic, we have
Initial weighting matrix for

E ("it"i;t 1) = E [("it "i;t

E ("2it) = E [("it "i;t 1)("it
E ("it"i;t+1) = E [("it "i;t
so that for unit

2

H=

(T

2)(T

1)("i;t 1 "i;t 2)] = "2

"i;t 1)] = 2"2
1)("i;t+1 "it )] = "2

6
6
6
6
6
4

2
1 0 
1 2
1 0
0
1 2
1
..
.

2) matrix.

..
.
..
.

..
.
..
.

..
.




..
.

We can use

0
07
7
07
7;
..
.

1 2

7
5

to compute the initial

weighting matrix as

A1 =

N
X
i=1

Wi0HWi:

^GMM = arg min u0W A1 1W 0u




 

= y0 1W A1 1W 0y 1 1 y0 1W A1 1W 0y ;
we can compute the second-stage weighting matrix as:

A2 =
where

^ui = yi

N
X
i=1

^yi; 1.

Wi0^ui^u0iWi;

8.3.

145

8.3 More ecient procedures (Ahn-Schmidt)

Ahn and Schmidt (1995) propose
ditions:

2 additional nonlinear con-

E (uiT uit) = 0; t = 2; 3; : : : ; T

With Ahn-Schmidt and Arellano-Bond, we

(T

2)

orthogonality conditions.

1:
have T (T

1)=2 +

represent all moment conditions implied by our assumptions.

8.3.1

8.3.1.1 Homoskedasticity

V ar("2it) is the same 8t, we have:

t = 1; 2; : : : ; T . This adds T 1 condi-

E (u2it)

8i,

tions, and the nal set of conditions under homoskedasticity is

E (yisuit) = 0 t = 2; : : : ; T; s = 0; : : : ; t 2;
E (yitui;t+1 yi;t+1ui;t+2) = 0 t = 1; : : : ; T 2;
E (uiui;t+1) = 0 t = 1; : : : ; T 1;
where

ui = T1

PT
t=1 uit.

8.3.1.2 Stationarity

Cov( i; yit) is the same

The entire set of the T (T
1)=2+(2T

8t, this adds 1 condition.

2) conditions is now
E (yisuit) = 0 t = 2; : : : ; T; s = 0; : : : ; t
E (uiT yit) = 0 t = 1; : : : ; T 1;
E (uityit ui;t 1yi;t 1) = 0 t = 2; : : : ; T:

2;

146

Advantage: this set consists of linear conditions only.

The Ahn and Schmidt estimator obtains by adding to ArellanoBond instrument matrix the following block for unit

Wi =

B
B
B
@

yi3 yi3 0 :::
..
.

..
.

..
.

..
.

0
0
..
.

i:

ui 0 ::: 0
0 ui ::: 0
..
.

..
.

..
.

..
.

1
C
C
C:
A

W0

conditions.
ments

Let

(W 0; W 1)

^

^0 denote GMM estimates with instru0

W 0 respectively, and J (^) and J (^ ) the

and

and

Then under

H0 : conditions associated with W 1 are valid, we have

0
J (^) J (^ ) v 2(rank(W 1)):

8.4 The Blundell-Bond estimator

Blundell and Bond (1998) suggest to use linear moment restrictions based on assumptions for initial conditions. They propose

E (uityi;t 1) = 0 t = 3; 4; : : : ; T;
with the addition of

E (ui3yi2) = 0:
This last condition combined with the one above implies the AhnSchmidt (1995) nonlinear restrictions

E (uitui;t 1) = 0; t = 3; : : : ; T .

8.5.

147

model:

yi0 =

+ "i0:

In other terms, initial deviations from

correlated with the level of

i=(1

)

i=(1 ) itself.

must not be

The GMM estimator of Blundell and Bond combines the AhnSchmidt conditions

Wi 0
0
6 0 y
0
i2
6
6
Wi+ = 6 0 0 yi3
6
4

..
.

..
.

..
.




..
.



0
0
0
0

yi;T 1

7
7
7
7;
7
5

yi = yi; 1 + "i

yi = yi; 1 + i + "i:

8.5 Dynamic models with Multiplicative eects

We consider here two generalizations to multiplicative individual
eects models.

8.5.1

Multiplicative individual eects

Holtz-Eakin et al.

yit = yi;t 1 + xit + uit;

uit = t i + "it;

148

where

t

is

aected by a time shock (common to all units).

Let us lag equation above one period:

yi;t 1 = yi;t 2 + xi;t 1 + t 1 i + "i;t 1;

and dene a new variable rt = t =t 1 . Substracting from the rst
equation the second one premultiplied by rt , we have
yit rtyi;t 1 = (yi;t 1 rtyi;t 2) + (xit rtxi;t 1)
+"it rt"i;t 1:
This is new, nonlinear equation with parameters to be estimated:

; ; rt; t = 2; 3; : : : ; T .

The transformation used is denoted Quasi-

dierencing.
GMM estimation is applicable as before (Arellano-Bond, AhnSchmidt or Blundell-Bond), but the initial weighting matrix cannot be used anymore. Let

"it = "it

rt"i;t 1.

We have, under

E ("it"i;t 1) = E [("it rt"i;t 1)("i;t 1 rt 1"i;t 2)]

= rt"2
E ("2
= E [("it rt"i;t 1)("it rt"i;t 1)]
it )
= "2(1 + rt2)
E ("it"i;t+1) = E [("it rt"i;t 1)("i;t+1 rt+1"it)]
= rt+1"2:
Thus, the optimal initial weighting matrix would be

1 + r12 r2
6 r2
1 + r22
6
60
r3
6
4 :::
:::
0
:::

0
r3 0
1 + r32 0
:::
:::
:::
rT 1

:::
:::
:::
:::
1 + rT2

3
7
7
7:
7
5

8.5.

149

When the

rt's

ues, but they would produce two-step estimates conditional on

our choice, see above. Also, as the model is nonlinear, we must
minimize the GMM numerically (no closed-form solution).

8.5.2

Mixed structure

Consider

yit = yi;t 1 + uit

where

i = 1; 2; :::; N t = 1; 2; :::; T;

jj < 1, initial conditions yi0; i = 1; : : : ; N are known, and

uit = i + tvi + "it:

tvi

captures

We assume

E ( i2) =  2 ; E (vi2) = v2; E ("2it) = "2 8i; 8t;

E ("it i) = E ("itvi) = 0; E (yi0"it) = 0 8t; E ( ivi) =  v :
Consider the case where one of the following conditions holds:

t = s
Under condition (8.1),

 = i + vi.
i

Under condition (8.2),

8t; s = 1; 2; : : : ; T;

 v = v2 = 0:
let t = 
 8t; then uit = i + "it,
vi

(8.1)
(8.2)
where

is constant, which corresponds to the

E (u2it) =
 2 + "2 and E (uituis) =  2 if t 6= s. Models uit = i + tvi + "it

150

and

 v = 1; 1).

and

We have

uit = (1 + t) i + "it

uit = (1 t) i + "it
Then

vi

 v = 1;
if  v =
1:
if

i disappears from the error term, which becomes:

uit = tvi + "it

with

t = (1 + t); vi = i:

When

tion yields:

uit

ui;t 1 = (t

t 1)vi + "it

"i;t 1;

with

vi:

s  t 2:

uit

E [(uit

i .

rt"i;t 1;
We have

s  t 2:

8.6.

151

8.5.2.2 A consistent transformation

To eliminate both eects

and

tvi,

it is necessary to use a

double-transformation: First-Dierence, and then Quasi-Dierence:

4yit

r~t4yi;t 1 = (4yi;t 1

r~t4yi;t 2) + 4"it

r~t4"i;t 1;

i = 1; 2; : : : ; N; t = 3; 4; : : : ; T , where
r~t = 4t=4t 1 = (t

t 1)=(t 1

t 2):

GMM estimators of the double-dierence model based on Quasidierencing rst and then First-dierencing residuals are not consistent when instruments include lagged dependent variables.
We would have in that case:

4 [("it

rt"i;t 1) + i(1 rt)] = 4"it

which depends on

4(rt"i;t

1)

i4rt;

i.

GMM procedures using instrument matrices from lagged dependent variables would yield consistent estimates only when the correct model transformation is performed.

8.6 Example: Wage equation

Consider the wage equation seen before, in a simpler, dynamic
form:

where

wit:

OCCit:

wage rate,

W KSit:

152

 1. Usual case uit = i + "it;

 2. Multiplicative case uit = tvi + "it;
 3. Mixed case uit = i + tvi + "it.

In case 1, we use a linear GMM procedure with First-dierence

transformation. In case 2, nonlinear GMM in parameters

 and

rt = t=t 1; t = 3; 4; : : : ; T , and Quasi-dierence. In case 3, nonlinear GMM in parameters  and r

~t = t=t 1; t = 4; 5; : : : ; T ,
and Double-dierence.

W KS; OCC ).

Instruments in all cases: weakly exogenous in level (

Table 8.1:
Parameter


1
2

First-dierence GMM

Estimate

Std. error

t-stat.

0.9465

0.0126

74.83

0.0022

0.0022

0.98

-0.0848

0.0423

-2.00

8.6.

Table 8.2:
Parameter


1
2
r1
r2
r3
r4
r5

153

EXAMPLE: WAGE EQUATION

Quasi-dierence GMM

Estimate

Std. error

t-stat.

0.9121

0.0218

41.72

0.0150

0.0038

3.87

-0.1014

0.1007

-1.00

-0.5838

0.3856

-1.51

-0.0871

0.0974

-0.89

0.3294

0.0621

5.29

-0.1842

0.1074

-1.71

1.0401

0.5947

1.75

154

CHAPTER 8. GMM ESTIMATORS FOR DYNAMIC PANEL DATA

Table 8.3:
Parameter


1
2
r~1
r~2
r~3
r~4

Double-dierence GMM

Estimate

Std. error

t-stat.

0.9211

0.0460

19.98

0.0082

0.0014

5.79

-0.0394

0.0322

-1.22

-0.5272

0.2250

-2.34

-0.1188

0.1029

-1.15

0.2931

0.1009

2.90

-0.0863

0.0399

-2.16

Hansen test 19.20 (0.05)

Part III
Discrete choice models

155

Chapter 9
Nonlinear panel data models
9.1 Brief review of binary discrete-choice models
Models with qualitative variables: binary choice and multinomial
models. Brief survey of these models, for cross-section data and
the binary case :

yi = xi + ui; i = 1; 2; : : : ; N;

yi = 1
if yi > 0;

yi = 0
if yi  0;

yi and yi: respectively latent (unobserved) and observed variables;

xi: 1  K vector of regressors. Threshold 0 is arbitrary here, as
E (yi) is unknown.
9.1.1

E (yi) = P rob(yi = 1) = xi + ui:

[0; 1]. Two
possible values for residual ui : 1 xi (when yi = 1) or ui =
xi

157

158

yi = 0). Heteroskedasticity, since V ar(ui) = P rob(yi =

0)  ( xi )2 + P rob(yi = 1)  (1 xi )2
(when

= (1 xi )  ( xi )2 + xi  (1 xi )2
= (1 xi )[( xi )2 + xi (1 xi )]
= xi  (1 xi ):

9.1.2

Logit model

Based on Logistic distribution:

exp(xi ) ;
P rob(yi = 1) = (xi ) = 1+exp(
xi )
1
P rob(yi = 0) = 1 (xi ) = 1+exp(
xi ) ;
exp(xi )
Density: (xi ) =
[1+exp(xi )] :
2

In this case,

9.1.3

V ar(ui) = 2=3.

Probit model

Based on Normal distribution:


xi = R xi = p1 exp( u2i );
   1  2
2 2
R
1 exp( u2i2 );
p
 xi = x+1
2
i =  2
2
ui
p1
 2 exp( 22 ):

P rob(yi = 1) = 
P rob(yi = 0) = 1
 
xi
Density:
 =
Parameter

ui is N (0; 2)

is unidentied (appears in ratio

=): 

is normal-

ized to 1.
Estimation method: Maximum Likelihood:

^ = arg max

N
Y
i=1

[P rob(yi = 1)]yi [1

P rob(yi = 0)]1

yi

9.2.

159

= arg min

where

N
Y
i=1

F (ixi );

1.

In these models, inference is best drawn on a) sign of estimates;

@P rob(yi = 1)=@xi).

b) marginal eects (

P rob(yit = 1) = P rob(yit > 0) = P rob("it > xit

= P rob("it < xit + i) = F (xit + i):

i )

9.2 Logit models for panel data

9.2.1

Sucient statistics

Consider rst a model with xed-eects.

Maximum Likelihood estimator: we have to estimate both

and

i; i = 1; : : : ; N , but i and are not independent for qualitativechoice models. When T is xed, MLE estimates of i are not consistent and consequently, the MLE of is not consistent either.
Individual eects i are denoted incidental parameters (their number increases with N ).
Solution: Neyman-Scott (1948) principle of estimation in the presence of incidental parameters.

statistic i for , i = 1; 2; : : : ; N

sucient

, then

the conditional density

f (yijxi; i; ) =

f (yijxi; i; )
;
g(ijxi; i; )

for

g(ijxi; i; ) > 0;

160

i.

then obtains by maximizing the conditional density of (y1 ; : : : ; yN ) given (1 ; : : : ; N ):

A consistent estimator of

^ = arg max

Joint probability of

yi:
h

P rob(yi) =

exp i

N
Y
i=1

P

f (yijxi; i; ):

T
t=1 yit

P

QT
t=1 [1 + exp(xit

T
t=1 yit xit

 i

+ i)]

If we solve the FOC associated with maximizing the log-likelihood

wrt.

N X
T
@ log L X
=
@
i=1 t=1
and wrt.

exp(xit + i)
+ y x = 0;
1 + exp(xit + i ) it it

i:

T
@ log L X
=
@ i
t=1

T
X
t=1

yit =

exp(xit + i)
+ y = 0; i = 1; 2; : : : ; N;
1 + exp(xit + i) it
T 
X
t=1

exp(xit + i)
1 + exp(xit + i )

i is: i =
PT
The probability that
t yit = s is
Hence, a sucient statistic for

exp( is)
T!
Q


s!(T s)!
[1
+
exp(
x

+

)]
it
i
t

i = 1; 2; : : : ; N:

PT
t=1 yit .

X
d2Bi

exp

T
X
t=1

! )

ditxit

9.2.

161

LOGIT MODELS FOR PANEL DATA

9.2.2

Conditional probabilities

The conditional probability of

yi given i is:

 i
T
exp
t=1 yit xit
P

P rob (yi i) = P
T
d2Bi exp
t=1 dit xit
P
P
( t yit)!(T
t yit)! ;

hP

where

T!
Bi is a set of indices for individual i:
(

Bi = (di1; di2; : : : ; diT )jdit = 0; 1

Set

Bi

T
X
and

t=1

dit =

T
X
t=1

yit :

yP
it for individual
T
in
t yit . Groups

for which

PT
t yit

= 0

or

PT
t yit

= T

contribute nothing to the likelihood. Only sets of interest: when

T
y
=
s
2
]0
;
T
[
;
there
are
(
it
1
s ) =T !=[s!(T s)!] such elements,
that correspond to distinct T sequences with value s.
PT

Notes:

 The second expression does not depend on and can be dropped;

 To compute the above probability, we have to consider for each
s all possible sequences of 0's and 1's. Example: if T = 4 and
s = 2, we would have 6 cases and
2
3
1 1 0 0 0
1
61 0 1 07
exp(
x

)
i
1
!
6
7
T
X
X
6 1 0 0 1 7 B exp(xi2 ) C
7B
C
exp
ditxit = vec 6
6 0 1 1 0 7 @ exp(xi3 ) A
6
7
t=1
d2Bi
40 1 0 15
exp(xi4 )
0 0 1 1

162

9.2.3

Example:

T =2

yi1 + yi2 = 1.

!i = 1
!i = 0

if
if

Let

(yi1; yi2) = (0; 1);

(yi1; yi2) = (1; 0):

P rob(!i = 1jyi1 + yi2 = 1) =

P rob(!i = 1)
P rob(!i = 0) + P rob(!i = 1)

exp( i + yi2xi2 )
=
[1 + exp( i + xi1 )][1 + exp( i + xi2 )]

exp( i + xi1 )][1 + exp( i + xi2 )]

 [1 +exp
( i + xi1 ) + exp( i + xi2 )
exp( i + xi2 )
=
exp( i + xi1) + exp( i + xi2 )
exp[(xi2 xi1) ])
=
= [(xi2 xi1) ]:
1 + exp[(xi2 xi1) ]
In that case, Bi = fijyi1 + yi2 = 1g and the conditional
likelihood is log L =
X

i2Bi

log-

xi1) ] + (1 !i) log f1 [(x2i xi1) ]gg :

T >P
2, we have to consider alternative sets of
T
observations for which
t yit is the same. Note that this formulation is a conditional Logit specication: regressors x depend on
In practice, when

the alternative.

9.3.

163

PROBIT MODELS

9.3 Probit models

One typically uses the Probit model in the random-eect case
(easier to work with).

uit = i + "it,

distribution

where

G(:) and is independent of the xi's.

is drawn from

Assume

 2
:
1 +  2
The contribution to the likelihood of unit i is Li = P rob(yi )
V ar( ) =  2 ; V ar("it) = 1; Corr(uit; uis) =  =
=

Z i1 xi1



Z iT xiT

it = 2yit
elements in ui .

where

1
and

f ( :)

the

Z +1 Y
T

1 t=1

 2 )).

Z +1

i,

f (ui1; ui2; : : : ; uiT j i)f ( i)d i

f (uitj i )f ( i)d i;

Li as
p2 #
) dti;
itxit + it ti p
1 

Butler and Mott (1982) show that we can write

1
Li(yi) = p


Z +1

"
T
Y
t2i
(
t=1

which is now a one-dimensional integral that can be evaluated numerically (Gauss-Hermite integration procedure).
of the method: assume a constant correlation

) across periods.

164

9.4 Semiparametric estimation of discrete-choice

models
We consider here estimation of binary-choice panel data models
with xed eects and possibly endogenous regressors.
In the model

yit = x0it + i + "it;

i = 1; : : : ; N; t = 1; : : : ; T;

E [(xit xi)("it "i)] = 0;

true if

x is strictly exogenous:
E ["itjxi1; : : : ; xiT ] = 0:

It is not sucient to assume that

x is predetermined only:

E ["itjxi1; : : : ; xit] = 0;
and in this case we have to use IV estimation strategy, e.g., tting

4yit = 4xit + 4"it;

using as instruments past values of

x.

Such an approach would not work in nonlinear models, unless

some linearization of the model is performed.

9.4.1

yit = 1I(it + x0it + i + "it > 0):

Negative result of Chamberlain (1993): Even if the distribution of

"it

tive result can be overthrown under some additional assumptions

(e.g.,

it

is independent from

set of instruments denoted

zi).

and

"it,

conditional on

x and

Assumptions
A.1. The conditional distribution of

lutely continuous wrt. a Lebesgue measure with non-degenerate

Radon-Nikodym conditional density function

ft(itjxit; zi).

eit = i + "it; 8 t. eit is conditionally independent of

it (conditioning on xit and zi). The conditional distribution of
eit has support
et(xit; zi) and is denoted Fet(eitjxit; zi).
A.2. Let

t = r and t = s, the conditional distribution of it given xit and zi has support [Lt ; Kt ] with
1  Lt <
0 < Kt  1, and the support of xit eit is a subset of [Lt; Kt].
A.3. For 2 periods

A.4. Let

(i)

Then

E ("ir zi) = E ("iszi) = 0;

(ii)
E ( izi); zz ; xrz and xsz exist:
(iii) zz and (xrz xsz )  0zz (xrz xsz )0

166

are nonsingular.

 The case ir = is = i is allowed;

 (xit; zi) can be correlated with it, but (A.1) rules out (xit; zi)

it;
 i can be correlated with xit or zi, but (nuit; i) must be independent given (xit ; zi );
 "it is uncorrelated with instruments zi;
 (A.2) means that the conditional distribution of "it given (it; xit; zi)
does not depend on it ;
 According to (A.3), it can take on any value that x0it + eit
as deterministic functions of

fk2g such that

P rob(yitjxit; zi; it = k1) ! 0

and

fk1g and

! 1:

Practical implication: the resulting estimator will perform well

when the variance of

variable.

Theorem 7

Let

yit =

:
ft(itjxit; zi)

If Assumptions (A.1) to (A.3) hold, then for t = r; s,

E (yit jxit; zi) = x0it + E ( i + "itjxit; zi):
Proof. Let (dropping subscripts for clarity) s = s(x; e) =

x0

e.

We have

E (yjx; z ) = E

E [y

jx; z
f ( jx; z )

=
=

Z KZ

Z K

E [y

f ( jx; z )d
f ( jx; z )

Z K

e L

[1I( > s)

e wrt.  :)

Note that

1I( > s)

1I( > 0) = 1I( > s > 0)1I(s > 0) + [1I(0    s)

+1I( > 0  s)] 1I(s  0) [1I(s >  > 0) + 1I( > s > 0)] 1I(s > 0)
1I( > 0  s)1I(s  0)

= 1I(s > 0) [1I( > s > 0) 1I(s >  > 0) 1I( > s > 0)]
+1I(s  0) [1I(0    s) + 1I( > 0  s)
= 1I(s  0)1I(0 >   s)

,
=

E (yjx; z ) =


1I(s  0)

Z K

e L
Z 0
s

[1I(s  0)1I(0 >   s) 1I(s > 0)1I(0 <  < s)]

ddFe(ejx; z)

1  d

1I(s > 0)

Z s

1  d dFe(ejx; z )

[1I(s  0)  ( s) 1I(s > 0)  s] =

= x0 + E (ejx; z )

QED:

sdFe(ejx; z )

168

9.4.2

The IV estimator

to (A.4),

E (ziyit ) = E (zix0it) + E (zi i);

for

t = r; s:

Let


xsz )0 1 (xrz

xsz )zz1(xrz

= (xrz

xsz )zz1

t = E (ziyit ):
Then is consistently estimated by
(r
s).
and

xir

yir

yis

on

We use the fact that

E (yjx; z ) = x0 + E ( + "jx; z )

,
Let

4x = xr

yis = (x0ir x0is) + E ("ir "isjx; z )

= (x0ir x0is) + E ("ir "isjx):
xs, 4y = yr ys,4 = zyr zys. The 2SLS

yir

will be



^ = (4xz 0)(z 0z ) 1(z 4x0) 1 (4xz 0)(z 0z ) 1z 4y:
Lewbel and Honor show that

N ( ^

where

) v N 0;
Var(Q^ i)
0 ;

can be replaced by
^ and

Q^ i = (ziyir

ziyis ) zi(xir

^
xis)0 :

For computing

estimate of

ft .

(xit; zi).

sity of

wit = (xit; zi) denote the K + L vector of explanatory and

instrument variables, and uit = (it ; wit ) (a K + L + 1 vector).
f^(it; wit) and f^(wit) respectively denote the estimated joint density function of it and components of wit , and the joint density
associated to components of wit . These densities are
Let

NT
X
K +L+1 1

f^(it; wit) = NT h

= NT h

NT Z
X

1
K +L+1

uit

uj

f^(it; wit)dit


Km

it

NT

1X
j =1

Km

j wit
;

wit

wj

h


wj

dit

Km R(:) and Km (:) are Rtwo multivariate

kernels such that Km (x) =
Km (x; y)dy and Km (x)dx = 1.

where

j =1


NT hK +L

Km

j =1

f^(wit) =

is the window,
2

The conditional density is then estimated by

f^(it; wit)
^
ft(itjxit; zi) = ^
:
f (wit)

170

9.5 SML estimation of selection models

General-purpose estimation technique for models with selection:
models with endogenous regime switching, Generalized Tobit models, etc.
Use of a particular, ecient simulator for multivariate normal
distributions: the GHK simulator (Geweke-Hajivassiliou-Keane,
Geweke 1991, Brsch-Supan and Hajivassiliou 1993, Keane 1994).

9.5.1

Consider the following likelihood function

L=

where

g("j)f ()d;

f:rg
 = (1; 2; : : : ; K )0 and "

are a

K -vector

(9.1)

and a

M-

vector of normal variates respectively. This corresponds to a very

general structural model dened by
straints

  r.

g(:j:),

and the set of con-

Notes:

 In this model formulation, " is an implicit function of parameters and observed variables.

  is typically an unobserved heterogeneity term.

 Function g(:j:) may depend in particular on the conditional distribution of

" given .

Problem in practice: ML estimation would require numerical

integration involving multiple probability distributions.
Idea of the GHK technique: construct a recursive algorithm to
approximate multiple integrals.

9.5.

171

Let

= var(),

exists a lower diagonal matrix

decomposition):

D
Dene

B
=B
B
@

satisfying

DD0 =

(Choleski

d11 0 : : : 0
d21 d22 : : : 0 C
C
..
.

..
.

..

dK 1 dK 2 : : :

..
.
..
.

C:
A

variate. We have

L=
where

i(:):

K
Y

i(i) g("jD )d;

(9.2)

 = (1; 2; : : : ; K ).
f :   rg can be written

The domain (set of constraints)

recursively as

1 

1
1
1
r1; 2  (r2 d211); 3  (r3 d311 d322);
d11
d22
d33
1
: : : ; K 
(r
d  : : : dK;K 1K 1):
dKK K K 1 1

L=

Z 1

"

Z 1

1
r1 =d11 d22
(r2 d21 1 )

:::

K
Y
i=1

Dene now the truncated normal density function for

(i)Ai = (i) 

1
1 
(r
dii i

di11

i

: : : di;i 1i 1)

(9.3)

 1

172

1
where Ai =
dii (ri

di11

: : : di;i 1i 1); 1

i
, and

(:) is the

normal cumulative density function (CDF). The likelihood function above is now

L=

Z
A1

:::

"

Z
AK

K 
Y
i=1

K
Y
i=1

1
(r

dii i
!

!

di11

: : : di;i 1i 1)

i given that its

distribution is truncated normal is between 0 and 1. Let ui denote
a random variable on [0; 1]. We can then write
dom variables. The probability associated to any

ui =

(i)
1

 d1ii (ri


 d1ii (ri

di11
di11

: : : di;i 1i 1)
: : : di;i 1i 1)

; i = 1; 2; : : : ; K:

For example:

(1) (r1=d11)
,
1 =  1 [u1  (1 (r1=d11)) + (r1=d11)]
1 (r1=d11)
(2) (1=d22(r2 d21r1))
u2 =
1 (1=d22(r2 d21r1))






1
1
, 2 =  1 u2  1  d (r2 d211) +  d (r2 d211) ;
22
22
where 1 is dened above.
For any i, we have the recursive formula:




1
i =  1 ui  1 
(r d  : : : di;i 1i 1)
dii i i1 1

u1 =

9.5.

173

SML ESTIMATION OF SELECTION MODELS

1
+
(r
dii i

di11



: : : di;i 1i 1)

(u1; : : : ; uK ).
The likelihood function now involves random variables ui ; i =
1; : : : ; K and K integrals with constant bounds:
which depends on the sequence of uniform random variables

L=

Z 1

where

:::

Z 1" Y
K 

i=1

1

(r
dii i

di11

: : : di;i 1i 1)

!

g("jD )

du1du2 : : : duK ;

QK
i Ai (i)

dui=di.

Since the

ui's

above by

 Drawing S values for the vector u: fus1; us2; : : : ; usK gSs=1;

 Compute recursively (1s; : : : ; Ks ) from us above;
 Average out over the S draws to form the Simulated Likelihood:

LS =

"K 
S Y
X

1
1
1 
(r
S s=1 i=1
dii i

di11s



: : : di;i 1is 1)

Note
Easy to generalize to a restriction set of the form

a <  < b.

i = q [ui; (ai

di11

g("jD s) :

We

174

(bi

di11

where

q(u; a; b) =  1 [(a)  (1 u) + (b)  u] :

If we want to compute for example the probability of an event

a <  < b in the multivariate case, we just need to evaluate

Q( ) = Q1:Q2: : : : QK ;

where

Qi =  [(bi di11 : : : di;i 1i 1)=dii]

 [(ai di11 : : : di;i 1i 1)=dii] ;
and average out over simulations.

9.5.2

Example

Based on the paper: V.A. Hajivassiliou and Y.M. Ioannides 2001,

"Unemployment and liquidity constraints".

St and Et respectively.
y1t > 0;
y1t  0:

St =
Et =

8
<
:

0
1

1
0

if
if

y2t <  ;

+
if   y2t <  ;
+

if   y2t :
if

y1 = 1I(y2 <  ) 11 + 1I( < y2 < +) 12 + x1 1 + v1;

y2 = 1I(y1 > 0)2 + x2 2 + v2:
Six possible regimes, as (S; E ) in f0; 1g  f 1; 0; 1g.

9.5.

SML ESTIMATION OF SELECTION MODELS

S E
0

-1

-1

175

y1
y2
11 + x1 1 + v1 < 0
x2 2 + v2 < 
x1 1 + v1 <0
 < x2 2 + v2 < +
12 + x1 1 + v1 < 0
+ < x2 2 + v2
11 + x1 1 + v1 > 0
2 + x2 2 + v2 < 
x1 1 + v1 > 0  < 2 + x2 2 + v2 < +
12 + x1 1 + v1 > 0
+ < 2 + x2 2 + v2

a1
a2

<

v1
v2

<

b1
b2

as follows:

S E
0

-1

-1

a1

1
1
1

( 11 + x1 1)
x1 1 
( 12 + x1 1) +

a2


+

x2 2
x2 2

2
2

x2 2
x2 2

b1
( 11 + x1 1)

x1 1
+
( 12 + x1 1)
+1  2
+1 + 2
+1

vit = i + it in the

example above, where v corresponds to " and corresponds to 
In the panel data case: We would have

tional on

b2
x2 2
x2 2
+1
x2 2
x2 2
+1

176

CHAPTER 9. NONLINEAR PANEL DATA MODELS

 Allows for multivariate distributions for individual eects, possibly correlated across equations.

the

177

Appendix 1. Maximum-Likelihood estimation

of the Random-eect model
and ", the log-likelihood is
NT
N
1 0 1
log("2)
log()
U  U;
2
2
2"2

Under normality assumption for

log L =
where

NT
log(2)
2

 =
="2 = Q + B , and

j
j = ("2)N (T

1)( 2 + T  2 )N = ( 2)NT N :
"

"

Concentrated log-likelihood wrt.

":

 

 

NT
1
N
NT
log(2)
log d0 Q + B d
log();
log L =
2
2

2
where d = Y
X ^ .
Estimate of 1= conditional on :
P P
0 Qd
d
dit di)2
i t (P
1d
= =
=
:
(T 1)d0Bd T (T 1) i(di d)2
Estimate of

conditional on 1=:


1
X0 Q + B X

 1

1
X 0 Q + B Y:

Maddala (1971): there are at most 2 maxima for the log-likelihood

(problem of local maximum).
Breusch (1987) procedure: iterate between
vergence.

^ "2 and 1d
= until con-

 Starting with ^ W ithin and 1= = 0, the next 1d

= is positive and
starts an increasing sequence;

178

 Starting with ^ Between and 1= ! 1, the next 1d

= is positive
and starts a decreasing sequence.

Since at most 2 maxima, use both as starting values. If

maximum of log L is the same, this is the true maximum.

179

Appendix 2. The two-way random eects model

A2.1 Assumptions and notation
Assumptions:

i v IID(0;  2 ); t v IID(0; 2 ); "it v IID(0; "2);

E ( it) = E ( i"it) = E ("it) = 0;
and

is independent of

i; t and "it.

We have

E (uitujs) =

8 2
2
2
<  +  + "
2
: 2


i = j; t = s;
if i = j; t 6= s;
if i 6= j; t = s:
if

Variance-covariance matrix of error term is

=  2 (IN
eT e0T ) + 2 (eN e0N
IT ) + "2(IN
IT )
= T  2 B + N2 B + "2INT :
A2.2 Feasible GLS estimation
We can write

P4
j =1 j Mj ,

1 = "2
2 = T  2 + "2
3 = N2 + "2
4 = T  2 + N2 + "2

with

M1 = (IN eNNeN )
(IT
0
0
M2 = (IN eNNeN )
eETeT
0
0
M3 = eNNeN
(IT eTTeT )
0
0
M4 = ( eNNeN )
( eTTeT ):

eT e0T
T )

180

APPENDIX 2. THE TWO-WAY RANDOM EFFECTS MODEL

We have

r =

P4
r
j =1 j Mj ,

so that

4 
X


"
1=2 =
p" j Mj
j =1
and the typical element of

yit = yit
with

1 = 1

p" 2 ; 2 = 1

Y  = "
1=2Y
1yi


is

2yt + 3y;


p" 3 ; 3 = 1 + 2 +

p" 4

1:

Y  on X .

V ar(Mj U ) = j Mj ; j = 1; 2; 3, the Best Quadratic Un0

biased estimator of j is U Mj U=tr (Mj ); j = 1; 2; 3.
Because

residuals.

Asymptotic distribution of variance component esti-

mates:

p
2
NT
(^

p 2"
@ N (^
p 2
0

T (^

"2)
2"4 0 0
 2 ) A v N @0; @ 0 2 4 0
0 0 24
2 )

11
AA :

ponents from mean square errors of three regressions:

Between-individual and Between-periods.

First regression: Within, model transformed by

M1 = (IN eN e0N =N )
(IT eT e0T =T ).

Within,

181
Estimate of

1:

^ 1 = ^ 2" =

[Y 0M1Y

:
(N 1)(T 1) K

Second regression: Between individual,

M2 = (IN eN e0N =N )
(eT e0T =T ).
Estimate of

2:

^ 2 =

[Y 0M2Y

and we compute

model transformed by

Y 0M2X (X 0M2X ) 1X 0M2Y ]

;
(N 1) K

^ 2 = (1=T )(^2

^ 2" ).

Third regression: Between period, model transformed by

M3 = (IN eN eN =N )
(eT e0T =T ).
Estimate of

3:

^ 3 =

[Y 0M3Y

and we compute

Y 0M3X (X 0M3X ) 1X 0M3Y ]

;
(T 1) K

^ 2 = (1=N )(^3

^ 2" ).

General formulation of the GLS estimate:



^ GLS = (X 0 M1X )="2 + (X 0M2X )=2 + (X 0M3X )=3 1

and

(X 0M1Y )="2 + (X 0M2Y )=2 + (X 0M3Y )=3


V ar( ^ GLS ) = "2 (X 0M1X ) + "2(X 0 M2X )=2


+"2(X 0M3X )=3 1 :

^ W ithin = [X 0M1X ] 1[X 0M1Y ],
Within estimator
^ BI = [X 0M2X ] 1[X 0M2Y ],
Between-individual estimator is

182

APPENDIX 2. THE TWO-WAY RANDOM EFFECTS MODEL

Between-period estimator is

^ BP = [X 0M3X ] 1[X 0 M3Y ], so that

^ GLS = W1 ^ W ithin + W2 ^ BI + W3 ^ BP ;
with

i
0M X
0M X 1
X
X
0
2
2
W1 = X M1X + "  + " 
(X 0M1X );
h
i
0M X
0 M X 1 "
X
X
2
0
0
2
W2 = X M1X + "  + " 
 (X M2 X );
h
i
0M X
0 M X 1 "
X
X
0
2
2
W3 = X M1X + 
+
(X 0M3X ):
2

"

2

"

3

2
2
2

3

 If  2 = 2 = 0, ^ GLS is ^ OLS ;

 When T and N ! 1, ^ GLS ! ^ W ithin;
 If " ! 1, then ^ GLS ! ^ BI ;
 If " ! 1, then ^ GLS ! ^ BP .
2

2
2
3

A2.3 Testing for eects

Breusch-Pagan (1980): Lagrange Multiplier test statistic for

 =  = 0.

Lagrange Multiplier (LM) test: uses restricted estimates

H0 :

 only

  
 1 

@ log L() 0
@ 2 log L()
@ log L()
LM =
E
;
@
@@0
@
where

log L() =
and

 = ( 2 ; 2 ; "2).

NT
log(2)
2

1
log j
j
2

1 0 1
U
U;
2

183
Gradient of log likelihood:

@ log L()
1
@

= tr
1
@i
2
@i
i = 1; 2; 3.



1
@

+ U 0
1

1U ;
2
@i

Because

=  2 (IN
eT e0T ) + 2 (eN e0N
IT ) + "2(IN
IT );
we have

=
@i

8
0
< IN eT eT
eN e0N IT
:
INT

Hence

( 2 )
2
i=2 ( )
2
i=3 (" ):
i=1

0
0
0
@ log L()
NT 4 1 U0 (IN
0 eT eT )U=U 0U
=
1 U (eN eN
IT )U=U U
@
2"2
0
and

3
5;

 1
@ 2 log L()
=
E
@@0
2
3
(
N
1)
0
(1
N
)
4
2"
4
0
(T 1) (1 T ) 5 :
NT (N 1)(T 1) (1 N ) (1 T ) (NT 1)


NT
LM =
1
2(T 1)


NT
+
1
2(N 1)


U 0(IN
eT e0T )U 2
U 0U

U 0(eN e0N
IT )U 2
U 0U

184

APPENDIX 2. THE TWO-WAY RANDOM EFFECTS MODEL

and is distributed as a

Important note. LM

U.

185

Appendix 3. The one-way unbalanced random

eects model
A3.1 Notation

D1 and D1 + D2 resp.

 



D1  1 )
Y1
X1
U1
=
+
;
(D1 + D2)  1 ) Y2
X2
U2
where X1 and X2 are resp. D1  K and (D1 + D2 )  K .

Consider 2 cross-sections, of dimension

Variance-covariance matrix of

Now, let

We have

0
0

Tj =


Pj
i=1 Di ,


1 0

=
;
0
2

is

=
0

"2 ID1 +  2 eD1 e0D1

 2 eD1 e0D2
2
0
2
 eD2 eD1
" ID2 +  2 eD2 e0D2

(Tj  2 + "2)eTj e0Tj =Tj

with
j =
+"2(ITj eTj e0Tj =Tj ):

rj = (Tj  2 + "2)r

eT e0
j

Tj

Tj
2
2
2
If we denote wj = Tj  + " ,

matrix:

+ ("2)r ITj

!
0
eTj eTj

Tj

"
wj

the transformation for the unbal-

anced panel is

"
j 1=2 =

T1 = D1 and T2 = D1 + D2.

so that

eTj e0Tj
+ ITj
Tj

eTj e0Tj
Tj

186APPENDIX 3.

THE ONE-WAY UNBALANCED RANDOM EFFECTS MODEL

eTj e0Tj
j
Tj

= ITj

where

1=2Yj : yjt
Typical element of "

Direct generalization to the case

diagonal and o-terms (in the

0
^ GLS = X  X 

 1

Y  = "
1=2Y , and

"
1=2 = diag ITi

j = 1

"
:
wj


PTj
1
j Tj t=1 yjt .

N > 2,

because

is block-

j 's) are always equal to  2 .

0
X  Y  where X  = "
1=2X;

eTi e0Ti
+ diag
Ti



"
wi



eTi e0Ti
Ti



A3.2 Estimation of variance components

Amemiya (1971) suggests the following estimates for

U^ 0QU^
;
T
N
K
i
i

 2 and "2:

^ 2" = P


N + tr (X 0QX ) 1X 0 B X ^ 2"
2
P
P 2 P
^ =
T
i
i
i Ti = i Ti
  0 

tr (X Q X ) 1X 0 (Jn=N ) X ^ 2"
P
P 2 P
+
;
T
T
=
T
i
i
i
i i
i
P
P
where Jn is a matrix of ones, of dimension (
i Ti)  ( i Ti),
U^ 0B U^

eTi e0Ti

B = diag
Ti

ji=1!N ; Q = diag

ITi

eTi e0Ti
Ti

ji=1!N :

187

Appendix 4. ML estimation of dynamic panel

models
A4.1 Likelihood functions
Dierent likelihoods corresponding to cases 1 to 4 above. Assumption:

y

L1 = (2)

NT
2

"
N

(det V )

exp

N
1X

2 i=1

u0iVT 1ui ;

ui = (yi yi; 1 xi zi ) and VT = "2IT +  2 eT e0T , the

(T  T ) variance-covariance matrix for unit i.

where

i):

N
2

"

N
2

N
X

1
(y
2y2 i=1 i0

exp

y )2 :
0

For Case 2.b/ ( i0 random and correlated with

NT

L2b = (2)
(

exp

("2)

N (T

1)

("2 + T a)

N
2

(y2 )
0

N
2

" T
N X
X

N X
T
X

1
a
2+
u
it
2"2 i=1 t=1
2"2("2 + T a) i=1

(2)

i):

"

N
2

exp

N
X

1
(y
2y2 i=1 i0
0

t=1

u2it

y )2 ;
0

#)

188

where

y ).

a =  2 2 y2

and

L3 = (2)

NT
2

( 2 )

NT

"

(2)

N X
T
1 X
[(y
2"2 i=1 t=1 it

exp

(yi;t 1
N

"

(2)

N
2

z ]2

2)):

"

N (T +1)

L4a = (2)

"

j
T +1j

N
2

exp

N
1X

wi0)2 :

For Case 4.a/ ( i0 random with common mean

2=(1

N
1 X
(y
22 i=1 i0

exp

yi0 + wi0)

w and variance
#

1 v0 ;
vi
T +1
i

2 i=1
where vi is a (T + 1) vector vi = (yi0
w ; yi1 yi0 xit
zi ; : : : ; yiT yi;T 1 xiT zi ) and
T +1 is a (T +1)  (T +1)
matrix

T +1 = "2 1 
0T
Useful expressions:

00T
IT

 1 
+  2 1 
eT
1

"2T
j
T +1j = 1 2 "2 + T  2 + 11 +   2
and

1 = 1

T +1
"2

"

1 2 00T
0T
IT

 2
"
 2

; e0T :

1+
+T +
1 

 1

189

1+
(1 + ; e0T ) :
eT

For Case 4.b/ ( i0 random with common mean w and arbitrary

2
variance w0 ): same as 4.a/, but with
T +1 replaced by

 1
2 = 2 00 

VT +1 = "2 w " T +  2 1 
0T IT
eT



; e0T :

For Case 4.c/ ( i0 random with mean i0 and variance

2 ): same as 4.a/, but with y replaced by i0.
0

)

ance

(1

2

Same as L2b but with y ,

2
) (2 + w2 ) respectively.

w0 ):

i0

and

"2=(1

y

replaced by

i0,

and

large

and xed

A4.2 Specication tests

Useful for checking maintained assumptions on initial conditions.
Based on Likelihood Ratio (LR) statistics.

Case 1

yi0 xed.

VT +1. Let L01 denote es2

timated log-likelihood L1 under assumption H0 : VT = " IT +
 2 eT e0T , and L1 the estimated log-likelihood with unrestricted VT
 L0) is distributed
(T (T + 1)=2 components). Under H0 , 2(L1
1

190

as a

Case 4.a

wi0

wi0

random with common mean

w

and variance

"2=(1 2)
H0: matrix
T +1 as dened in likelihood for Case 4.a, vs. alternative: unrestricted variance-covariance with (T + 1)(T + 2)=2
0

components, with log-likelihoods L4a and L4a respectively. Under
H0, 2(L4a L04a) is distributed as a 2((T + 1)(T + 2)=2 2)
(note only two free parameters in restricted VT +1, as  already
estimated).

Case 4.b
variance

2

w0 .

Let

L04b

w

and arbitrary

denote log-likelihood under restriction on

VT +1 for Case 4.b, and L4a the unrestricted log-likelihood for Case
 L0 ) admits
4.a (as above). Under H0 : True model is 4.b, 2(L4a
4b
2
a  ((T +1)(T +2)=2
3) distribution (3 free parameters in Case
2
2
2
4.b: " ;  ; w ).
0

H0:
2
as a  (1).
Under

191

Appendix 5. GMM estimation of static panel

models
In the Instrumental-Variable context (Hausman-Taylor, AmemiyaMaCurdy, Breusch-Mizon,Schmidt), we assumed:

 Error-component model with E (uu0) =

= "2INT +  2 (IN

eT e0T ).

 Endogeneity was caused by E (X 0 ) 6= 0 or E (Z 0 ) 6= 0, but it

E (X 0") = E (Z 0") = 0.

was assumed

or ", producing dierent orthogonality conditions.

Several cases:
1. Random or xed eects (instruments correlated with

);

").

A.5.1 Computation of the variance-covariance matrix

For the panel data case, we can use the fact that several time
observations are available for each unit.

If heteroskedasticity of

E (uituis) = 0; t 6= s, we have


N
1 0
VN = NV arf (x; ) = Nvar Z u = 2 E [Z 0uu0Z ]
N
N
1
= Z 0[IT
diagfi2g]Z
N
P
2
where i can be estimated by 
^ 2i = T1 Tt=1 u^2it. Hence, a optimal
second-step estimate for VN would be
N
X
1
^ i; where H^ = diagf^ 2i g:
V^N =
Zi0HZ
N i=1
2
2
the form E (u ) = 
it

such that

192

Important aspect for panel data: If we transform the model to

remove individual eects

model with

Consider a linear xed-eect

q orthogonality conditions:

E [Wi0ui ] = 0 where ui = QT ui

and Wi is a T  q matrix of instruments.
Because QT is a T  T symmetric matrix, conditions above can
0
be rewritten E [(QT Wi ) ui ] = 0 and the optimal weighting matrix
AN is VN 1 with


W 0 u
VN = NE
N

  0 
u W

= NE [(QW=N )0uu0(QW=N )]

= [(QW )0 (QW )]:

N
Hence, for GMM, it is equivalent to transform the model (by
or the instrument matrix.

Assume now the error-component assumption holds; we have

VN =

because

1
[(QW )0["2INT + T  2 B ](QW )]
N
1
= [(QW )0["2INT ](QW )]
N

"2 0
1 0
VN = (W Q)(QW )
W QW:
N
N

Replacing in the GMM criterion:

 0
0
0 
^N = arg min u () W ( W QW ) 1 W u () ;

N
N
N

Q)

193
and the optimal GMM estimator is

^N = X 0W (W 0QW ) 1W 0X  1 X 0W (W 0QW ) 1W 0Y  :

A.5.2 Random eects and strictly exogenous instruments
By denition, random eects

exogeneity (

E (X 0 ) = 0 and we assume strict

uncorrelated with

" at

E (wis0 uit) = 0

for

every time period). For a

s; t = 1; 2; : : : ; T;

qT 2 moment conditions. Let wit0  (wi1; wi2; : : : ; wit)

8t = 1; 2; : : : ; T , and set WSE;i = IT
wiT0 . Moment conditions
0
then read E (WSE;iui ) = 0.

which gives

We can show, using Theorem above, that GMM estimators using

the form for 2SLS or the 3SLS form are equivalent. We have

0 ) =  1=2
w0
 1=2WSE;i =  1=2(IN
wiT
iT
0 )( 1=2
I ) = W B;
= (IN
wiT
qT
SE;i

where

B =  1=2
IqT .

A.5.3 Fixed eects and strictly exogenous instruments

We assume now instruments are correlated with

i, we can use the rst-dierence operator

LT of dimension T  (T 1):

exogenous. To remove

where

194

APPENDIX 5. GMM ESTIMATION OF STATIC PANEL MODELS

where

6
6
LT = 6
6
6
4

Note that (

 T)

LT (L0T LT ) 1L0T .

If instruments

1 0 0
1 1 0




0
0

0 0 0
0 0 0




1 1
0
1

..
.

..
.

..
.

..
.

..
.

0
0
..
.

3
7
7
7
7:
7
5

LT : QT =

wit are strictly exogenous, we have

0 L0 u ) = E (Z 0 L0 " ) = 0;
E (ZSE;i
T i
SE;i t i

where

0:
ZSE;i = IT 1
wiT

Model in First-dierence form can be estimated by GMM using

ZSE;i as instruments.

A.5.4 Weakly exogenous instruments

In this case, we consider a

1q

vector of instruments

wit

such

that

E (wit0 uis) = 0; for t = 1; 2; : : : ; T; t  s:

There are T (T + 1)=2 such conditions: instruments are not correlated with future values of "it (and are not correlated with i ).
On the other hand, if instruments are weakly exogenous but are
correlated with

can be written

E (wit0 uis) = E (wit0 "is) = 0

where

uis = uis

for

t = 1; 2; : : : ; T

1; t  s;

ui;s 1.

Forward-Filter (Keane and Runkle 1992). Let F be a T T upper0

triangular matrix that satises F F = IT , so that Cov (F ui ) =

195

IT . We have F = fFij g, Fij = 0 for i > j . Using instruments

0 ), we have the following Forward-Filter
Wi = (wi01; wi02; : : : ; wiT
estimator:


 1
^ FF = X 0F 0 H (H 0H ) 1H 0F X
 X 0F 0 H (H 0H ) 1H 0F Y ;

where

F  = IN
F .

Wi

1=2 is not consistent unless Hi are strictly exogenous.

and lter 
But FF transformation preserves the weak exogeneity of instruments

wit.

When

is large and

Wi .

If we don't have conditional homoskedasticity:

plim N1

PN
0 0
i=1 HiF ui uiF Hi

1 PN H 0 F F 0 H
6= plimP
i
i=1 i
N

plim N1

N
0
i=1 Hi Hi :

A.5.5 Ecient GMM estimation

We now present alternative GMM estimators that may be more
ecient than IV-HT, IV-AM or IV-BMS. Why: under strict exogeneity assumption, we have much more moment conditions than
HT, AM or BMS.
We rst consider the case where we restrict

 = "2IT +  eT e0T
2

unrestricted

 matrix.

Consider the model with strictly exogenous regressors

yi = Ri + (eT
zi) + ui  Xi + ui;
0 )0 (a T  k matrix of
ui = (eT
i )+"i, Ri = (ri01; r20 i; : : : ; riT
0 0
00
time-varying regressors), eT
zi = [zi ; zi ; : : : ; zi ] (a T  g matrix

where

196

APPENDIX 5. GMM ESTIMATION OF STATIC PANEL MODELS

of time-invariant regressors).
Assume regressors

E (di
"i) = 0;

where

i:

rit = (r1it; r2it); zi = (z1i; z2i); E (r10 it i ) = E (w10 i i) = 0:

HT, AM and BMS instruments are of the form

WA;i = (QT Ri; eT

si); sHT;i = (r1i; z1i)
sAM;i = (r1i1; r1i2; : : : ; r1iT ; z1i)
sBMS;i = (sAM;i; r2i);
where

r2i = (r2i1

If the

r2i; : : : ; r2i;T 1 r2i).

E (Wi0uiu0iWi) = E (Wi0Wi),
ing the same instruments

Wi.

condition is violated and a unrestricted weighting matrix is used.

The strict exogeneity assumption implies

E [(LT
di)0ui] = E (L0T ui
di)
= E [L0T (eT i + "i)
di] = E (L0T "i
di) = 0;
where

LT

di is a T  [(T

1)(kT + g]

Bover (1995) propose a GMM estimator using instruments:

WB;i = (LT
di; eT
si) instead of WA;i = (QT Ri; eT
si):
Number of additional instruments wrt.
BMS:

rank(ZB;i)

rank(ZA;i) = (T

IV-HT, IV-AM or IV-

1)(kT + g)

k.

 is unrestricted.

Other

197
A.5.6 GMM with unrestricted variance-covariance matrix

ZB;i satisfy the no conditional heteroskedasticity assumption, but the variance-covariance of u is unrestricted.

We assume instruments

Result of Im, Ahn, Schmidt and Wooldridge (1996): The 3SLS

form of the GMM estimator with unrestricted
ments

 1ZA;i

using instru-

This is not

E (Ri0 QT  1ui) = E (Ri0 QT  1eT i) + E (Ri0 QT  1"i)

= E (Ri0 QT  1eT i):
But when BMS assumption is not true and with an unrestricted

, E (Ri0 QT  1eT i) 6= 0.

formation matrix instead of

Q =  1
and we can show that

QT

for removing

i :

QeT = 0.

Therefore:

because

si);
for

si:

198

specied

In the GMM case, we nd parameters by solving the system of

moment conditions or by nding



Z are instruments and VN is an estimate of the variance of

0 0
moment conditions: V = E (Z uu Z ). In a linear model, u( ) =
Y X where   , we can solve directly for ^ N :


^ GMM = X 0ZVN 1Z 0X 1 X 0ZVN 1Z 0Y:

where

In the IV case, we restrict

u to be a) homoskedastic (V

is diago-

nal), or b) heteroskedastic of known form. Example: panel data

= E (uu0) = IN
, where

= "2INT +  2 (IN
eT e0T ) and  = "2IT +  2 eT e0T :

Consider two IV estimators for panel data: 2SLS or 3SLS.

In the 2SLS case (HT, AM, BMS), we premultiply the model in
vector form

Zi:

yi = Xi + ui

by

 1=2 and then apply instruments



^ 2SLS = X0
1=2Z (Z 0Z ) 1Z 0
1=2X 1
 X 0
1=2Z (Z 0Z ) 1Z 0
1=2Y :
1=2Zi as instruAn equivalent 2SLS estimator obtains by using 
1

ments:



^ 2SLS = X0
1Z (Z 0
1Z ) 1Z 0
1X 1
 X 0
1Z (Z 0
1Z ) 1Z 0
1Y :
2

In the 3SLS case, we have





^ 3SLS = X 0Z (Z 0
Z ) 1Z 0X 1  X 0Z (Z 0
Z ) 1Z 0Y :

199
GMM and 3SLS are equivalent if the following condition holds:

E (Zi0uiu0iZi) = E (Zi0Zi) 8i = 1; 2 : : : ; N;
because, as

! 1,

N
1 0
1X
plim Z
Z = plim
Zi0u^iu^0iZi = E (Zi0uiu0iZi) = V:
N
N i=1
This condition is denoted
When condition

No conditional heteroskedasticity.

E (Zi0uiu0iZi) = E (Zi0Zi)

is strictly more ecient than 3SLS.

Impossible to prove 3SLS is more or less ecient than 2SLS, but
there exists a condition for numerical equivalence between 2SLS
and 3SLS:

The 2SLS and 3SLS estimators are equivalent if there

exists a non-singular, non-stochastic matrix B such that
1=2Z =
ZB .

Theorem 8

is
^
estimated from rst-stage  N for GMM. It states that under this
1=2) does
condition, ltering (premultiplying instruments by

200APPENDIX 6.

Appendix 6. A framework for simulation-based

inference
A.6.1 Heterogeneity and the linear property
In linear panel-data models, the residual consists of an heterogeneity factor

i and an i.i.d.

error term

"it:

uit = i + "it:
OLS (or, equivalently, ML) yield consistent but not ecient estimates if unobserved heterogeneity is omitted.

In nonlinear models, this often leads to signicant biases. Other

problem: dicult to compute the likelihood of nonlinear models
because of dependent observations for a given individual (

yit

is

not i.i.d.).

A.6.1.1 Example: Dynamic model with heterogenous AR(1)

root and no individual eect

where

yit = iyi;t 1 + "it = ( + i)yi;t 1 + "it;

jij < 1, i independent from "it, "it is N (0; "2).

The

yit = "it + i"i;t 1 + 2i "i;t 1 +    + hi"i;t h + : : : :

If the restricted model is estimated, under the following data generating process:

yit = yi;t 1 + "it;

201
the OLS estimate of

 is

N
1X
Cov(i; V ar(yi;t
P
^ 
i +
N i=1
1=n i V ar(yi;t

1))
1)

N
Covi(P
i; "2=(1 2i ))
1X
 +
:
=
N i=1 i 1=n i "2=(1 2i )

i > 0, Cov(i; "2=(1 2i )) > 0 and ^

average of the true i 's (the bias is positive).

If all

overestimates the

where

i are i.i.d.

E ( i) = 0.

This is

yit v  exp( yit);

the Maximum Likelihood estimate of

^ =
We have

^ T !1
!

"

"P

 is

N PT
i=1 t=1 yit

NT

N
X

1
1
N i=1 i

# 1

# 1

N
1X
<
:
N i=1 i

Hence, the MLE of the misspecied model underestimates the average of individual parameters

i .

202APPENDIX 6.

A FRAMEWORK FOR SIMULATION-BASED INFERENCE

In many cases, it is not possible to lter out the individual effect without very restrictive assumptions (e.g., Fixed-eect Logit,
Another possibility is to integrate

out the heterogeneity factor.

Basic idea: specify a density distribution for

i and compute the

conditional likelihood.

yit conditional on xit and i is

f~(yit; xit + i) with i v ( ; );
where is a distributional parameter, and the vector of param-

Assume the density function of

eters of interest.
The distribution of

yit conditional on observed variables is

f (yitjxit; ; ) =

A.6.1.3 Example: Poisson model

Assume

exp(xit + i)yit
f~(yit; xit + i) =
exp[ exp(xit + i)]:
yit!
Change of variable: i = exp( i ), with probability distribution:
1= 1 exp( =)
(; ) =
;
( )1= (1= )
where
(:): Gamma distribution, and > 0. Then it can be
shown that

(1= + yit)[ exp(xit )]yit

f (yitjxit; ; ) =
:
(1= ) (yit + 1)[1 + ; exp(xit )]yit+1=

203
This is the

A.5.1.4 Example: the Probit model again

Probit with heterogeneity:

P rob[yit = 1jxit; i] = [xit + i]:

Assume

i v N (0;  2 ):
P rob[yit = 1jxit] =

where

(:):

1

(xit + )
d ;
 

density function of

N (0; 1).

Since observations are dependent:

P rob[yi1 = 1; : : : ; yiT = 1] =

6=

T
Y
t=1

Z Y
T

1

(xit + )
d



t=1

P rob[yit = 1]:

In more complex

cases, one can use simulation techniques to approximate integrals

of the form

M (yitjxit; ; ) =

A.6.2 Integration by simulation

Purpose: approximate multiple integrals using Monte Carlo (simulation) techniques.

204APPENDIX 6.

A FRAMEWORK FOR SIMULATION-BASED INFERENCE

We can write

M (yitjxit; ; ) =

( ; ) 0
m(yit; xit + ) 0
 ( ; 0)d ;
0
 ( ; )

(:; 0) is a known distribution density with xed parame0

ters . We have for individual i at time t:


( ; )
M (yit jxit; ; ) = E m(yit; xit + ) 0
;
 ( ; 0)
0
0
which is the expectation using distribution of m( ) ( )= ( ).
0
Density function  is the importance sampling function.
where

S random variables for individual i: is; s = 1; : : : ; S

from distribution  0 , we can approximate the above

If we can nd

drawn

expectation by

S
1X
( is ; )
s
m(yit; xit + i ) 0 s 0 :
S s=1
 ( i ; )
Under (mild) regularity assumptions, the simulated expression
converges to the above expectation, using a weak Law of Large
Numbers. Two issues in practice:

 Choice of density function 0( ; );

 Number of draws to obtain consistency ?
For the choice of the importance sampling function, make sure the
domain of

tails of distribution). Regarding the number of draws, consistency

of estimator depends on estimation procedure.

A.6.3 Simulated GMM and Maximum Likelihood estimators

205
Gouriroux and Monfort (J. of Econometrics, 1993): Simulated
GMM (SGMM) and Simulated Maximum Likelihood (SML).
For SGMM, when population moments are impossible to compute,
we replace

S
1X
E [f (yit; xit; i; ] = 0 by
[f (yit; xit; is; ]  0;
S s=1
or by

S
1X
( s ; )
[f (yit; xit; is; ] 0 is  0:
S s=1
 ( i ; )

The SGMM criterion is then

s
MGMM
=

( N
X

S
1X
[f (yi; xi; is; ]0 Zi
S s=1
i=1

!)

T 1

N
X

S
1X
0
Zi
[f (yi; xi; is; )]
S s=1
i=1

Zi is a T  L matrix of instruments. The SGMM is consistent and asymptotically normal when N tends to innity and S

where

is xed. This is because we can use the weak Law of Large Numbers for consistency of the simulator

1P f
s
S

towards

E f

and a

For the SML estimator, we want to compute

log L() =
where heterogeneity

f (yijxi; ).

N
X
i=1

206APPENDIX 6.

Then

f (yijxi; ) can be approximated by

S
1X
f~(yi; xi; is; );
S s=1
where

the Simulated Log-likelihood is

Ls() =

"

N
X

S
X

i: is ; s = 1; 2; : : : ; S

and

1
1
log
f~(y ; x ; s; ) :
N i=1
S s=1 i i i

N=S

! 0. In practice, a very large number of simulated draws

may be necessary.

A.6.4 Choice of simulation number and mode

We use the Gouriroux and Monfort (1993) result. The SGMM
and SML criteria are of the form

"

GN () =

1
N

N
X

in

 to

G() = [E (yi; xi; E (yi; xi; ; ))] :

Two dierent simulated criteria can be used for GN ( ): whether
I
D
identical (GN ( )) or dierent sets (GN ( )) of simulation draws

207
are used for each individual:

"

N
S
1X
1X
I
GN ( ) =
yi; xi;
(yi; xi; s; )
N i
S s

"

GDN () =

1
N

N
X
i

yi; xi;

1
S

S
X
s

!#

;
!#

GIN () converges to the random variable (it is a function of ( 1; : : : ; S )):

"

E yi; xi;

1
S

S
X
s

!#

(yi; xi; s; )

I
G(). Therefore ^ that maximizes (SML)
I
or minimizes (SGMM) GN ( ) is inconsistent.
GDN () converges to the non random scalar:

which is dierent from

"

!#

S
1X
E E yi; xi;
(yi; xi; s; ) ;
S s
which is in general dierent from G( ). But if function is linear
D
D
wrt. E (:), GN ( ) converges to G( ) and ^
 is consistent.

 Case 2. S and N ! 1.
Both

^I

and

^D

are consistent.

yit = xit +  i + ""it;

yit = 1 if yit > 0;
yit = 0 if yit  0; (Probit);

208APPENDIX 6.

and

yit = xit +  i + "";

yit = yit if yit > 0;
yit = 0 if yit  0; (Tobit);
where i v N (0; 1), "it v N (0; 1).
Because
the

yit

T -fold

are serially correlated and the likelihood would contain

integrals. But we can consider the conditional likelihood

functions of

yi given xi and i:

f (yijxi; i; ) =

yit =1

(xit +  ) 

for the Probit and

1
y
f (yijxi; i; ) =
it

yit >0 "
Y

Y
yit =0

Y
yit =0

( xit

xit
"

xit 
"

 )

as simulators.

Appendix 7. Example: the SAS c Software

*
*
*
*
*
*

;
DYNTAB.SAS ;
;
Uses datafile DYNTAB3.DAT;
;
Create library and file names ;
* Change directory information below ;

libname water 'd:/dea/panel';

filename watfile 'd:/dea/panel/dyntab3.dat';
* Create SAS table and read data from Ascii file ;
data wat;
infile watfile;
input id year conso price revenue precip ;
* Compute logs ;
lconso=log(conso); lprice=log(price);
lrevenue=log(revenue);
run;
* Descriptive statistics ;
proc means data=wat;run;
* OLS regression ;
proc reg data=wat;
model lconso = lprice lrevenue;
run;
* Model 1: One-way Fixed effects ;
* cs=116:

209

210

* option /fixone: Set one-way Fixed-effect ;

proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /fixone ;
run;
* Model 2: Two-way Fixed effects ;
* option /fixtwo: Set two-way Fixed-effect ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /fixtwo ;
run;
* Model 3: One-way Random effects ;
* option /ranone: Set one-way Random-effect ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /ranone;
run;
* Model 4: Two-way Random effects ;
* option /rantwo Set Two-way Random-effect ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /rantwo;
run;
* Model 5: One-way Random effects with AR(1) ;
* option /ranone parks rho Set One-way Random-effect ;
* and compute RHO: Ar(1) parameter ;
proc tscsreg data=wat cs=116;
model lconso= lprice lrevenue /ranone parks rho;
run;
* Compute parameter estimates on each cross section ;
proc sort data=wat;
by year;
proc reg data=wat;

SOFTWARE

211
model lconso= lprice lrevenue ;
by year;
run;
* Compute Within and Between estimates ;
* using the MEANS procedure ;
proc sort data=wat;
by id;
proc means data=wat noprint;
var lconso lprice lrevenue ;
by id;
output out=out1 mean=mconso mprice mrevenue ;
data out1;set out1;
keep id mconso mprice mrevenue ;
data wat;
merge wat out1;
by id;
data wat;set wat;
qconso=lconso-mconso; qprice=lprice-mprice;
qrevenue=lrevenue-mrevenue;
* Within regression ;
proc reg data=wat;
model qconso = qprice qrevenue ;
run;
* Between regression ;
proc reg data=wat;
model mconso = mprice mrevenue;
run;

212

SOFTWARE

ESTIMATES USING TSCSREG PROCEDURE

MODEL 1. ONE-WAY FIXED EFFECTS

The SAS System 16:15 Monday, January 22, 2001 3

TSCSREG Procedure
Dependent Variable: LCONSO

Model Description
Estimation Method
FIXONE
Number of Cross Sections 116
Time Series Length
6
SSE
MSE
RSQ

Model Variance
2.578099 DFE
578
0.00446
Root MSE 0.066786
0.9344

F Test for No Fixed Effects

Numerator DF:
115 F value: 58.3964
Denominator DF: 578 Prob.>F: 0.0000
Parameter Estimates
Variable
CS 1
CS 2
CS 3
CS 4
CS 5
... ...
CS 112
CS 113
CS 114
CS 115
INTERCEP
LPRICE
LREVENUE

DF
1
1
1
1
1
...
1
1
1
1
1
1
1

Parameter
Estimate
-0.455773
-0.222476
0.153338
-0.131488
0.027422
...
0.420843
-0.322888
-0.259767
-0.240823
5.099257
-0.134245
0.024386

Standard
Error
0.039463
0.039923
0.038900
0.039174
0.038890
...
0.040309
0.039376
0.038678
0.039379
0.366957
0.018447
0.033223

T for H0:
Parameter=0
-11.549433
-5.572620
3.941882
-3.356518
0.705132
... ...
10.440506
-8.200102
-6.716134
-6.115479
13.896065
-7.277506
0.734009

Prob > |T|

0.0001
0.0001
0.0001
0.0008
0.4810
...
0.0001
0.0001
0.0001
0.0001
0.0001
0.0001
0.4632

Variable
Label
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Intercept

213
MODEL 2. TWO-WAY FIXED EFFECTS

The SAS System 16:15 Monday, January 22, 2001 7

TSCSREG Procedure
Dependent Variable:

LCONSO

Model Description
Estimation Method
FIXTWO
Number of Cross Sections 116
Time Series Length
6
SSE
MSE
RSQ

Model Variance
2.205671 DFE
573
0.003849 Root MSE 0.062043
0.9439

F Test for No Fixed Effects

Numerator DF:
120 F value: 65.6530
Denominator DF: 573 Prob.>F: 0.0000

Variable
CS 1
CS 2
CS 3
...
CS 114
CS 115
TS 1
TS 2
TS 3
TS 4
TS 5
INTERCEP
LPRICE
LREVENUE

DF
1
1
1
...
1
1
1
1
1
1
1
1
1
1

Parameter Estimates
Parameter Standard T for H0:
Estimate
Error
Parameter=0
-0.535192 0.040793 -13.119702
-0.302435 0.041809 -7.233670
0.120803
0.037066 3.259125
... ...
...
... ...
-0.288486 0.036463 -7.911820
-0.256215 0.036669 -6.987209
-0.102087 0.017883 -5.708681
-0.047565 0.016463 -2.889216
-0.030524 0.014486 -2.107135
-0.007359 0.012507 -0.588378
-0.025528 0.009992 -2.554900
6.316873
0.396540 15.929983
-0.251061 0.034210 -7.338896
-0.053316 0.033244 -1.603773

0.0001
0.0001
0.0012
...
0.0001
0.0001
0.0001
0.0040
0.0355
0.5565
0.0109
0.0001
0.0001
0.1093

Variable
Label
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Cross Sec
Time Seri
Time Seri
Time Seri
Time Seri
Time Seri
Intercept

214

SOFTWARE

The SAS System 16:15 Monday, January 22, 2001 11

TSCSREG Procedure
Dependent Variable: LCONSO

Model Description
Estimation Method
RANONE
Number of Cross Sections 116
Time Series Length
6
Variance Component Estimates
SSE 3.12498
DFE
693
MSE 0.004509 Root MSE 0.067152
RSQ 0.1087
Variance Component for Cross Sections
Variance Component for Error

0.043243
0.004460

Hausman Test for Random Effects

Degrees of Freedom: 2
m value: 14.4912 Prob. > m: 0.0007

Variable
INTERCEP
LPRICE
LREVENUE

DF
1
1
1

Parameter
Estimate
4.692305
-0.149074
0.053077

Parameter Estimates
Standard T for H0:
Error
Parameter=0
0.354917 13.220844
0.017611 -8.465039
0.032306 1.642977

Prob > |T|

0.0001
0.0001
0.1008

Variable
Label
Intercept

215
MODEL 4. TWO-WAY FIXED EFFECTS

The SAS System 16:15 Monday, January 22, 2001 12

TSCSREG Procedure
Dependent Variable:

LCONSO

Model Description
Estimation Method
RANTWO
Number of Cross Sections 116
Time Series Length
6
Variance Component Estimates
SSE 2.707154 DFE
693
MSE 0.003906 Root MSE 0.062501
RSQ 0.0907
Variance Component for Cross Sections
Variance Component for Time Series
Variance Component for Error

0.043638
0.000746
0.003849

Hausman Test for Random Effects

Degrees of Freedom: 2
m value: 22.2377 Prob. > m: 0.0000

Variable
INTERCEP
LPRICE
LREVENUE

DF
1
1
1

Parameter
Estimate
5.674742
-0.225151
-0.018251

Parameter Estimates
Standard T for H0:
Error
Parameter=0
0.371984 15.255323
0.027604 -8.156464
0.032401 -0.563297

0.0001
0.0001
0.5734

Variable
Label
Intercept

216

SOFTWARE

WITHIN REGRESSION USING PROC REG

Source
Model
Error
c Total

Analysis
Sum of
DF
Squares
2
0.31252
693 2.57810
695 2.89062

Root MSE
Dep Mean
C.V.

Variable
INTERCEP
QPRICE
QREVENUE

DF
1
1
1

of Variance
Mean
Square
F Value
0.15626 42.003
0.00372

0.06099
-0.00000
-1.291786E17

R-square

Prob>F
0.0001

0.1081
0.1055

Parameter Estimates
Parameter
Standard
T for H0:
Estimate
Error
Parameter=0
-5.28092E-17 0.00231195 -0.000
-0.134245
0.01684666 -7.969
0.024386
0.03034107 0.804

1.0000
0.0001
0.4218

Variable
Label

BETWEEN REGRESSION USING PROC REG

Source
Model
Error
C Total

DF
2
693
695

Analysis of Variance
Sum of
Mean
Squares
Square
F Value
7.13103
3.56551 84.369
29.28684 0.04226
36.41786

Root MSE
Dep Mean
C.V.

Variable
INTERCEP
MPRICE
MREVENUE

DF
1
1
1

Parameter
Estimate
-0.176444
-0.259461
0.494483

0.20557
4.99481
4.11576

R-square

Prob>F
0.0001

0.1958
0.1935

Parameter Estimates
Standard
T for H0:
Error
Parameter=0
0.68091356 -0.259
0.02278084 -11.389
0.05958703 8.298

0.7956
0.0001
0.0001

Variable
Label

217

Appendix 8. A crash course in Gauss c

Introduction

Gauss is an interpreter computer language, that is most conveniently run in interactive mode (global variables are kept in memory until one quits Gauss). It has a small built-in editor useful for
long jobs, or it can be used in command mode.

Editing and running jobs

When Gauss is executed rst, you are inside the command mode,
with the following prompt:

[Gauss].

You can toggle between the

command mode and the edit mode using either tool bar (Windows bar at the bottom, Gauss bar on top). In command mode,
you can edit any le (for example

myprog.prg)

by typing

edit

You

may edit the preselected le by entering the F4 function key. In

edit mode, simply use the Run option on top, or enter key function F3. You may save the program by entering the F2 function
key.

Saving results and output management

To declare a text le for output, use the syntax

output file=c:/mydir/toto.out reset;

The reset option clears the le if it exists!
In a program, you can choose to have output written to the le
or not (useful for inspecting results on the screen only):

output le).

218

APPENDIX 8. A CRASH COURSE IN GAUSS

You can either work with data les in text format (Ascii), or with
preexisting Gauss datasets. To load a text-format data le:

or

1

To create one, you must specify a) a data matrix ( ); b) a vector

of variable names (

mydata").

varnames);

("

Then, use the command

call saved(x,"mydata",varnames).

Basic operators
In Gauss, most operators return a value that may be stored in a
variable, or printed to screen. If no assigment command is given,
the program will simply output the result to the screen. Example:

2*x; vs. y=2*x.

You don't have to specify the dimension of ectors or matrices if
they are assigned a computed value.

prior value in two cases: a vector/matrix of parameters (that can

be modied afterwards) or when using loops (see below). To create a vector with predetermined values:

x={1 2 3};

(a

1 Note:

commands are always separated by semicolons

;.

vnames={"a","b","c"}.

219
Here is a list of useful operators:

cols(x)
rows(x)
meanc(x)
stdc(x)
sqrt(x)
sumc(x)
cumsumc(x)
columns of

x;

Returns the number of columns of

Returns the number of rows of

x;

x;

x;

Returns the standard deviation of columns in

Computes square root of elements in

x;

x;

x;

Returns the cumulative sum of elements in

cdfn(x)
Returns the cumulative normal distribution (x);
2
cdfchic(x,y)
Returns the complement to 1 of the  (x) cumulative distribution with

2
puting p-values of

tests.

Working with matrices

x'
Transposes matrix or vector x;
y=x1 x2, y=x1|x2
Concatenates

two vectors or matrices

horizontally or vertically;

y=x[.,1]
Selects column 1 and all rows of matrix x;
y=x[1:10,.]
Selects rows 1 to 10 and all columns;
y=x[1:10,1:20]
Selects columns 1 to 20 and rows 1 to 10;
vec(x)
Creates a vector from a matrix, by stacking all
columns one after the other. vec(x) is NT  1 if x is N  T ;
diag(x)
Returns the rst diagonal of matrix x (must be
square);

reshape(x,n,t)
Reshapes matrix x into a N  T matrix;
a*b*c
Performs matrix multiplication (check number of rows
and columns!);

a.*b, a./b

220

a and b must have the same dimension);

inv(x)
Compute inverse of x (for generalized
division (

inverse, use

invpd(x));
zeros(n,m)
Returns a n  m matrix of zeros;
ones(n,m)
Returns a n  m matrix of ones;
eye(n)
Returns a n  n identity matrix;
a.*.b
Computes the Kronecker product a
b;
Conditional operators and loops
Useful for testing and creating dummy variables. Operators:

.eq,

for equal to, not equal to, strictly

less than, less than or equal to, strictly greater than, greater than
or equal to.
Example: suppose you want to create an indicator variable equal

xi  50; i = 1; 2; : : : ; N . The syntax would be

y= x .le 50, which creates a N  1 vector y , with yi = 1 if
xi  50 and 0 otherwise. That is, when a variable is assigned
to 1 when

the result of a condition, Gauss automatically creates an indicator variable.

Example: You want to create a new variable
and equal to

y if z > 0.

+ y.*(z .gt 0).

z , equal to x if z < 0

The syntax would be

z = x.*(z .lt 0)

Loops are not recommended because they produce lengthy processes, and vector operators should always be preferred. But in
some cases, they are necessary. Examples of loops are:

y[i]=x[i]+a;
i=i+1;
endo;

221
or

i=1; do while i=<n;

y[i]=x[i]+a;
i=i+1;
endo;
Note: in the above examples, vector
instance

y=zeros(n,1).

It is very easy with Gauss to sort data vectors or matrices, or to

select a subset of observations.

y=sorthc(x,1)

Sorts matrix

x using

variable in column 1

as key;

y=selif(x, x .eq 1)

Creates matrix

Creates matrix

equal to 1;

y=delif(x, x .lt 0)
tive values from

x;

from values of

by deleting nega-

Creating procedures
Very useful to speed up repetitive tasks. The general syntax is

proc func(a);
local toto;

:::

retp(toto);
endp;.

a as input (scalar, vector or matrix), create toto as

local variable (not accessible outside procedure func) and return
a single argument toto. In some cases, it is necessary to have more

222

proc (3)=func(a1,a2,: : : ,aK);

local toto1,toto2,toto3;

:::

retp(toto1,toto2,toto3);
endp;.
This code declares 3 inputs

there will be 3 outputs.

In that case, we must use the following syntax when calling this
procedure:

{b1,b2,b3}=func(a1,a2,a3);

Beware of the use of local variables; any variable used in the procedure must either be declared as local (its value is lost when one
quits the procedure) or else where in the program (this will be a
global variable). A possibility to avoid problems is to declare all
variables as global at the start of the program, with the syntax:

Example: procedure for returning deviations from individual means

(Within operator).

proc(x);
local toto;
toto=reshape(x,n,t);
toto=toto-meanc(toto');
toto=reshape(toto,n*t,1);
retp(toto);
endp;
Note in this case, variables

and

one could use them as arguments in an equivalent, more compact

procedure:

proc(x,n,t);
local toto;

223

toto=reshape(x,n,t);
retp(reshape(toto-meanc(toto'),n*t,1));
retp(toto);
endp;
And if we wished to return both Between and Within:

proc (2)=(x,n,t);
local toto;
toto=reshape(meanc(reshape(x,n,t)'),n*t,1);
retp(toto,x-toto));
endp;
Some useful built-in procedures

Some of these procedures require a Gauss dataset to be created.

If not, a 0 is put in place of the Gauss dataset name.

call dstat(0,x)
in

x;

Prints descriptive statistics for elements

call dstat("mydata",1|3)

call ols(0,y,x);

y on x;

optmum, which works

as follows:

library optmum;optmum;

guments;

ters;

Main command;

is the

224

ret is a return

code (equal to 0 if convergence is OK).

The optmum procedure calls a user-dened procedure (here,

func)

that returns the value of the function to be minimized, depending

on parameters (here,

proc(z);
:::;
retp(crit);
endp;

z).

Example: To estimate a nonlinear model by minimizing the residual sum of squares, where the model is

log( 1)wi:

yi = 0 + 1 2xi +

library optmum;optmum;
x0={0.1 , 0.1 , 0.5};
{x, f, g, ret} = optmum(&func,x0);
proc(z);
local err;
err=y-z-z*z*x-ln(z)*w;

z , 2 is z , and variables y; x; w

be global variables, while err (the residual) is local
1 PN u2
err=meanc(err'*err);
Computes
i i
N
Note:

is

z , 1

is

retp(crit);
endp;

must

225

Appendix 9. Example: The Gauss c software

/* DYNTAB.PRG 16 01 2001 Residential water use */
new; clear all;
library tscs,pgraph;
tscsset;graphset;

output le=d:/dea/panel/dyntab.out reset;

output on;

n=116; t=6;
id=x[.,1];
year=x[.,2];
conso=ln(x[.,3]);
price=ln(x[.,4]);
revenue=ln(x[.,5]);
precip=ln(x[.,6]);

vnames="year","conso","price","revenue","precip","id" ;
call saved(year conso price revenue precip id,"watle",vnames);

y= conso ;
x= price,revenue ;
grp= id ;
__title("Water demand equation");

call tscs("watle",y,x,grp);

226

APPENDIX 9. EXAMPLE: THE GAUSS

SOFTWARE

=====================================================================
TSCS Version 3.1.2 1/17/01 3:51 pm
=====================================================================
Data Set: watfile
 OLS DUMMY VARIABLE RESULTS 
Dependent variable: conso


Observations :
Number of Groups :
Degrees of freedom :
Residual SS :
Std error of est :
Total SS (corrected) :
F = 35.033
P-value =
Var
price
revenue

Coef.
-0.134245
0.024386

Std.

Group Number
1
2
3
...
114
115
116

696
116
578
2.578
0.067
2.891
with 2,578 degrees of freedom
0.000

Coef.

-0.347461
0.035045

Std.

Error

0.018447
0.033223

Dummy Variable
4.643484
4.876781
5.252595
... ... ...
4.839490
4.858434
5.099257

t-Stat

-7.277506
0.734009

Standard Error
0.365639
0.370063
0.369474
... ... ...
0.365496
0.359065
0.366957

F-statistic for equality of dummy variables :

F(115, 578) = 58.3964 P-value: 0.0000

P-Value
0.000
0.463

227

OLS ESTIMATE OF CONSTRAINED MODEL


Dependent variable: conso

Observations :
696
Number of Groups :
116
Degrees of freedom :
693
R-squared :
0.172
Rbar-squared :
0.170
Residual SS :
32.532
Std error of est :
0.217
Total SS (corrected) : 39.308
F = 72.175
with 3,693 degrees of freedom
P-value =
0.000
Var
CONSTANT
price
revenue

Coef.
1.164761
-0.249873
0.376643

Std.

Coef.

-0.406149
0.257121

Std.

Error

0.598014
0.022153
0.052746

t-Stat

1.947715
-11.279345
7.140637

P-Value

0.052
0.000
0.000

FULL, RESTRICTED, AND PARTIAL R-SQUARED TERMSDUMMY VARIABLES ARE CONSTRAINED

TABLE OF R-SQUARED TERMS
R-squaredfull model:
0.934
R-squaredconstrained model: 0.172
Partial R-squared:
0.921

FULL, RESTRICTED, AND PARTIAL R-SQUARED TERMSX VARIABLES ARE CONSTRAINED

228

SOFTWARE

TABLE OF R-SQUARED TERMS

R-squaredfull model:
0.934
R-squaredconstrained model: 0.926 Partial R-squared:
0.108

GLS ERROR COMPONENTS RESULTS


Dependent variable: conso

Observations :
696
Number of Groups :
116
Degrees of freedom :
693
Residual SS :
3.135
Std error of est :
0.067
Total SS (corrected) : 3.517
F = 22047.870
with 3,693 degrees of freedom
P-value =
0.000
Std. errors of error terms:
Individual constant terms: 0.206
White noise error : 0.067

Var
CONSTANT
price
revenue

Coef.
4.687235
-0.149316
0.053560

Std.

Coef.

-0.363264
0.071009

Std.

Error

0.355285
0.017623
0.032338

t-Stat

13.192903
-8.472974
1.656247

P-Value

0.000
0.000
0.098

229
Group Number
1
2
3
4
5
...
112
113
114
115
116

Random Components
-0.346522
-0.121608
0.250638
-0.020350
0.128761
... ... ...
0.512636
-0.216224
-0.151243
-0.125587
0.104064

Lagrange Multiplier Test for Error Components Model

Null hypothesis: Individual error components do not exist.
Chi-squared statistic (1): 1367.1014
P-value:
0.0000

230

Appendix 10. IV and GMM estimation with

Gauss c
/* IV2.PRG Instrumental variable estimation and GMM estimation
Model y(it) = X(it)beta + Z(i) gamma
We use Hausman-Taylor, Amemiya-MaCurdy, Breusch-Mizon-Schmidt instruments,
both for IV and GMM */
new;clear all;
/* You only need to change this block */
/* Define dimensions
N: number of units, T=number of time periods
nvar= Nb. of variables to be read
k1: Nb. of X1it, k2: Nb. of X2it, g1= Nb.
kq= k1+k2, kb= k1+k2+g1+g2*/
n=595;
t=7;
nvar=13;
k1=4;
k2=5;
g1=2;
g2=1;
kq=k1+k2;
kb=k1+k2+g1+g2;
et=ones(t,1);
un=ones(n*t,1);
unb=ones(n,1);
/* Read data */
output file=iv1.out reset;
expe=x[.,1];
expe2=x[.,2];
wks=x[.,3];

of Z1i, g2: Nb.

of Z2i

231
occ=x[.,4];
ind=x[.,5];
south=x[.,6];
smsa=x[.,7];
ms=x[.,8];
fem=x[.,9];
unioni=x[.,10];
edu=x[.,11];
blk=x[.,12];
lwage=x[.,13];
/* Define matrices X, Z and vector Y */
x1=occ south smsa ind;
x2=expe expe2 wks ms unioni;
z1=fem blk;
z2=edu;
y=lwage;
x=x1 x2;
z=z1 z2;
/* You don't need to change anything after this */
/* Compute Between and Within transformations:
Caution: keep that order for BXZ: X,Z,Y */
qx=with(x y);
bxz=bet(x z y);
by=bxz[.,cols(bxz)];
bxz=bxz[.,1:cols(bxz)-1];
qy=qx[.,cols(qx)];
qx=qx[.,1:cols(qx)-1];
/* Within regression and error term (uw) */
betaw=inv(qx'qx)*qx'qy;
uw=qy-qx*betaw;
/* Compute variance with instruments */
exob=un bxz;
gamb=inv(exob'exob)*(exob'by);

BX and QX

232

APPENDIX 10. IV AND GMM ESTIMATION WITH GAUSS

ub=by-exob*gamb;
sigep=uw'uw/(n*(t-1)-kq);
sigq=sqrt(sigep*diag(inv(qx'qx)));
a=x1 z1;
di=by-bxz[.,1:kq]*betaw;
zz=un z1 z2;
gamhatw=inv(zz'*a*inv(a'*a)*a'*zz)*zz'*a*inv(a'*a)*a'*di;
s2=(1/(n*t))*(by-bxz[.,1:kq]*betaw
-zz*gamhatw)'*(by-bxz[.,1:kq]*betaw-zz*gamhatw);
sigal=s2-(1/t)*sigep;
theta=sqrt(sigep/(sigep+t*sigal));
/* GLS transformation and estimate
Caution: keep the order 1,X1,X2,Z1,Z2 in matrix EXOG */
exog=gls(un x1 x2 z1 z2 y);
yg=exog[.,cols(exog)];
exog=exog[.,1:cols(exog)-1];
betagls=inv(exog'exog)*(exog'yg);
siggls=sqrt(sigep*diag(inv(exog'exog)));
/* HT */
aht=un qx bet(x1) z1;
betaht=inv(exog'*aht*inv(aht'*aht)*aht'*exog)*exog'*aht*inv(aht'*aht)
*aht'*yg;
sight=sqrt(sigep*diag(inv(exog'*aht*inv(aht'*aht)*aht'*exog)));
/* AM */
x1s=tam(x1);
aam=un qx x1s z1;
betaam=inv(exog'*aam*inv(aam'*aam)*aam'*exog);
betaam=betaam*exog'*aam*inv(aam'*aam)*aam'*yg;
sigam=sqrt(sigep*diag(inv(exog'*aam*inv(aam'*aam)*aam'*exog)));
/* BMS */

233
abms1=aam tbms(with(x2));
/* This is the general form for BMS instrument, it should work in most
cases. But with the application to PSID data, we must drop some variables,
see below. This means you have to delete ABMS1 below for your application
*/
/* Remove abms1 just below: */
abms1=un~qx~bet(x1)~tbms(with(occ~south~smsa~ind~ms~wks~unioni))~z1;
betabms1=inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog)
*exog'*abms1*inv(abms1'*abms1)*abms1'*yg;
sigbms1=sqrt(sigep*diag(inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog)));
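/* (added comment) Breusch-Mizon-Schmidt further adds the Within-transformed
   X2, stacked over periods by tbms(); (X2 - X2bar) is uncorrelated with
   alpha(i) by construction */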
/* Compute variance-covariance matrices */
varq=sigep*inv(qx'qx); varg=sigep*inv(exog'*exog);
varht=sigep*inv(exog'*aht*inv(aht'*aht)*aht'*exog);
varam=sigep*inv(exog'*aam*inv(aam'*aam)*aam'*exog);
varbms1=sigep*inv(exog'*abms1*inv(abms1'*abms1)*abms1'*exog);
test1=(betagls[2:kq+1]-betaw)'*inv(varq-varg[2:kq+1,2:kq+1]);
test1=test1*(betagls[2:kq+1]-betaw);
test2=(betaht[2:kq+1]-betaw)'*inv(varq-varht[2:kq+1,2:kq+1])
*(betaht[2:kq+1]-betaw);
test3=(betaht-betaam)'*inv(varht-varam)*(betaht-betaam);
test4=(betaam-betabms1)'*inv(varam-varbms1)*(betaam-betabms1);
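/* (added comment) Each statistic below is a Hausman test of the form
   h = (b1-b2)'[V(b1)-V(b2)]^(-1)(b1-b2), asymptotically chi-squared under
   the null that the extra orthogonality conditions of the more efficient
   estimator are valid */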
output file=iv1.out reset;
output on;
"Within estimates ";
" Estimate standard error t-stat ";
betaw sigq betaw./sigq;
"GLS estimates";
"sigma(alpha),sigma(epsilon),theta(=(sig(ep)/(sig(ep+t*sig(al)))**(1/2))";
sigal sigep theta;
" Estimate standard error t-stat ";
betagls siggls betagls./siggls;


"HT estimates ";

" Estimate standard error t-stat ";
betaht sight betaht./sight;
"AM estimates ";
" Estimate standard error t-stat ";
betaam sigam betaam./sigam; "BMS estimates ";
" Estimate standard error t-stat ";
betabms1 sigbms1 betabms1./sigbms1;
"Hausman test statistics and p-value ";
"Within vs. GLS ";
test1 cdfchic(test1,kq);
"Within vs. HT ";
test2 cdfchic(test2,k1-g2);
"AM vs. HT ";
test3 cdfchic(test3,cols(aam)-cols(aht));
"BMS vs. AM ";
test4 cdfchic(test4,cols(abms1)-cols(aam));
/* GMM estimation */
{ b1,se1,b2,se2,sar } = gmm(y,un~x1~x2~z1~z2,aht,1);
"GMM-HT estimates ";
"   Estimate   standard error   t-stat ";
b2~se2~b2./se2;
"Hansen test and p-value ";
sar~cdfchic(sar,cols(aht)-rows(b2));
{ b1,se1,b2,se2,sar } = gmm(y,un~x1~x2~z1~z2,aam,1);
"GMM-AM estimates ";
"   Estimate   standard error   t-stat ";
b2~se2~b2./se2;
"Hansen test and p-value ";
sar~cdfchic(sar,cols(aam)-rows(b2));
{ b1,se1,b2,se2,sar } = gmm(y,un~x1~x2~z1~z2,abms1,1);
"GMM-BMS estimates ";
"   Estimate   standard error   t-stat ";
b2~se2~b2./se2;
"Hansen test and p-value ";
sar~cdfchic(sar,cols(abms1)-rows(b2));
output off;
proc bet(w);
/* Compute BX from matrix w */
local i,term,betx;
term=reshape(w[.,1],n,t);
term=meanc(term').*.et;
term=reshape(term,n*t,1);
betx=term;
i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t);
term=reshape(meanc(term').*.et,n*t,1);
betx=betx~term;
i=i+1;
endo;
retp(betx);
endp;
proc with(w);
/* Compute Within transformation for matrix W */
retp(w-bet(w));
endp;
proc gls(w);
/* GLS transformation */
local term; term=w-(1-theta)*bet(w);
retp(term);
endp;
proc tam(w);
/* AM transformation, stacking time observations */
local i,term,xstar;
term=reshape(w[.,1],n,t).*.et;
xstar=term;


i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t).*.et;
xstar=xstar~term;
i=i+1;
endo;
retp(xstar);
endp;
proc tbms(w);
/* BMS transformation, stacking time observations but deleting last column
*/
local i,term,xstar;
term=reshape(w[.,1],n,t).*.et;
xstar=term[.,1:cols(term)-1];
i=2;
do until i>cols(w);
term=reshape(w[.,i],n,t).*.et;
xstar=xstar~term[.,1:cols(term)-1];
i=i+1;
endo;
retp(xstar);
endp;
proc (5)=gmm(y,x,z,d);
local zx,w,w2,b,e,e2,b2,se,se2,sar2;
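/* (added comments) Two-step GMM. First step: if d==1 the weight is
   [sum_i Zi'A Zi]^(-1) from inw() (A = I(T) in this listing, so it
   coincides with (Z'Z)^(-1)), else (Z'Z)^(-1); b and its robust sandwich
   std. errors are returned. Second step uses the optimal weight
   W = S^(-1), with S = sum_i Zi'ei ei'Zi from ezw(); sar2 is the Hansen
   J statistic e2'Z W Z'e2 */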
zx = z'x;
if d==1;
w = invpd(inw(z));
else;
w = invpd(z'z);
endif;
b = invpd(zx'w*zx)*zx'w*z'y;
e = y-x*b;
w2 = ezw(e,z);
se = invpd(zx'w*zx)*zx'w*w2*w*zx*invpd(zx'w*zx);

w = invpd(w2);
se2 = invpd(zx'w*zx);
b2 = se2*zx'w*z'y;
e2 = y-x*b2;
sar2 = e2'z*w*z'e2;
retp(b,sqrt(diag(se)),b2,sqrt(diag(se2)),sar2);
endp;
proc ezw(e,z);
local k,ez,T;
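/* (added comment) Returns S = sum_i (Zi'ei)(ei'Zi), the cluster-robust
   covariance of the moment conditions */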
T = rows(e)/N;
k = cols(z);
ez = reshape(e.*z,N,K*T)*(ones(T,1).*.eye(K));
retp(ez'ez);
endp;
proc inw(z);
local a,i,zi,zaz,T;
t = rows(z)/N;
a = eye(T);
zaz = 0;
i = 1;
do until i>N;
zi = z[(i-1)*T+1:i*T,.];
zaz = zaz + zi'a*zi;
i = i+1;
endo;
retp(zaz);
endp;

Appendix 11. DPD estimation with Gauss

/* DPD1.PRG Program for DPD (Dynamic Panel Data model)
   Method: Arellano-Bond */
/* Define variables below as global */
clearg N,T,y,x,z,alpha,sco,hes,zgy,fake,mom,w;
n=595; t=7; nvar=13;
/* Note: the load statement is missing from the source listing; a plausible
   form, assuming the same ASCII data file as in Appendix 10 (file name
   hypothetical), is: */
load x[n*t,nvar] = psid.dat;
lwage=x[.,13];
wks=x[.,3];
occ=x[.,4];
clear x;
/* Create a (NxT) matrix for dependent var. */
y=reshape(lwage,n,t);
/* Stack exogenous vars. */
x=wks~occ;

/* Set top=0 for instruments from lagged Y's only;
   top=1 to add instruments from X that are weakly exogenous and in level;
   top=2 to add instruments from X that are strongly exogenous and in
   first-difference form */
top=2;
/* Set AR1 to 0 for the general case, and AR1 to 1 for serially correlated
   epsilon's of order 1 (E[eps(i,t)*eps(i,t+1)] <> 0) */
ar1=1;
/* You don't need to change anything after this line */
/* Define identity matrices I(T-2) for AB and BB */
ddif = eye(T-2);
/* Construct AB instrument matrix Z.
   First component matrix: lagged Y's.
   Recall: if AR1=1, restriction when epsilon's are serially correlated
   of order 1 */
z = (y[.,1]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z = z~((y[.,1:j]).*.ddif[.,j]);
j = j+1;
endo;
if ar1==1;
z = (y[.,1]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z = z~((y[.,1:j-1]).*.ddif[.,j]);
j = j+1;
endo;
z=z[.,2:cols(z)];
endif;
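/* (added comment) The block above implements the Arellano-Bond moment
   conditions E[ y(i,s) * (eps(i,t)-eps(i,t-1)) ] = 0 for s <= t-2, one
   block of lagged y's per first-differenced equation t = 3,...,T. When
   AR1=1, eps(i,t) and eps(i,t-1) may be correlated, so only lags
   s <= t-3 remain valid instruments */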
/* Second component matrix: Instruments from X */
/* Delete this block if you want only instruments from y's */
if top==1;
/* Weakly exogenous X's, in level */
toto=shapent(x[.,1]);
z2 = (toto[.,1]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z2 = z2~((toto[.,1:j]).*.ddif[.,j]);
j = j+1;
endo;
i=2;
do until i>cols(x);
toto=shapent(x[.,i]);
z2 = z2~((toto[.,1]).*.ddif[.,1]);
j = 2;
do until j>cols(ddif);
z2 = z2~((toto[.,1:j]).*.ddif[.,j]);

j = j+1;
endo;
i=i+1;
endo;
z=z~z2;
endif;
if top==2;
/* Strongly exogenous X's, in first-difference form */
toto=shapent(x[.,1]);
z2 = (toto[.,3]-toto[.,2]).*.ddif[.,1];
j = 2;
do until j>cols(ddif);
z2 = z2~((toto[.,j]-toto[.,j-1]).*.ddif[.,j]);
j = j+1;
endo;
i=2;
do until i>cols(x);
toto=shapent(x[.,i]);
z2 = z2~((toto[.,3]-toto[.,2]).*.ddif[.,1]);
j = 2;
do until j>cols(ddif);
z2 = z2~((toto[.,j]-toto[.,j-1]).*.ddif[.,j]);
j = j+1;
endo;
i=i+1;
endo;
z=z~z2;
endif;
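/* (added comment) The estimating equation is in first differences:
   y(i,t)-y(i,t-1) = alpha*(y(i,t-1)-y(i,t-2)) + (x(i,t)-x(i,t-1))'beta
   + eps(i,t)-eps(i,t-1), for t = 3,...,T */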
{ b1,se1,b2,se2,sar } = gmm(vec((y[.,3:T]-y[.,2:T-1])'),
    vec((y[.,2:T-1]-y[.,1:T-2])')~trans(x),z,1);
output file = dpd1.out on;
"Arellano-Bond GMM estimates";
if top ==0;
"Instruments from lagged Y's only (TOP=0)";
endif;
if top==1;

"Instruments from X are weakly exogenous and in level (TOP=1)";
endif;
if top==2;
"Instruments from X are strongly exogenous and first-differenced (TOP=2)";
endif;
if ar1==1;
"Restricted estimates: epsilon are serially correlated of order 1 (AR1=1)";
endif;
" Estimate standard error t-stat";
b2~se2~b2./se2;
"Nb. of conditions (instruments) " cols(z);
"Nb. of parameters " rows(b2);
"Hansen specification test and p-value ";
sar~cdfchic(sar,cols(z)-rows(b2));
output off;
proc shapent(w);
/* Reshapes vector in NxT form */
retp(reshape(w,n,t));
endp;
proc trans(w);
/* Transforms matrix X in First Difference */
local toto,i,xfd;
toto=reshape(w[.,1],n,t);
toto=vec((toto[.,3:T]-toto[.,2:T-1])');
xfd=toto;
i=2;
do until i>cols(w);
toto=reshape(w[.,i],n,t);
toto=vec((toto[.,3:T]-toto[.,2:T-1])');
xfd=xfd~toto;
i=i+1;
endo;
retp(xfd);
endp;


proc (2)=ls(y,x);
/* Computes OLS, returns White var-covar matrix */
local ixx,b,e,v;
ixx = invpd(x'x);
b = ixx*x'y;
e = y-x*b;
v = ixx*(ezw(e,x))*ixx;
retp(b,v);
endp;
proc ezw(e,z);
local k,ez,T;
T = rows(e)/N;
k = cols(z);
ez = reshape(e.*z,N,K*T)*(ones(T,1).*.eye(K));
retp(ez'ez);
endp;
proc inw(z);
local d,a,i,zi,zaz,T;
T = rows(z)/N;
d = zeros(T,1)~(eye(T-1)|zeros(1,T-1));
a = 2*eye(T) - (d + d');
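/* (added comment) a = 2*I - (D+D') is the tridiagonal matrix proportional
   to E[ D.eps_i D.eps_i' ] for first-differenced i.i.d. errors; inw()
   therefore returns sum_i Zi'a Zi, the one-step AB weighting matrix */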
zaz = 0;
i = 1;
do until i>N;
zi = z[(i-1)*T+1:i*T,.];
zaz = zaz + zi'a*zi;
i = i+1;
endo;
retp(zaz);
endp;
proc (5)=gmm(y,x,z,d);
local zx,w,w2,b,e,e2,b2,se,se2,sar2;
zx = z'x;

if d==1;
w = invpd(inw(z));
else;
w = invpd(z'z);
endif;
b = invpd(zx'w*zx)*zx'w*z'y;
e = y-x*b;
w2 = ezw(e,z);
se = invpd(zx'w*zx)*zx'w*w2*w*zx*invpd(zx'w*zx);
w = invpd(w2);
se2 = invpd(zx'w*zx);
b2 = se2*zx'w*z'y;
e2 = y-x*b2;
sar2 = e2'z*w*z'e2;
retp(b,sqrt(diag(se)),b2,sqrt(diag(se2)),sar2);
endp;


References
S.C. Ahn and P. Schmidt, Efficient Estimation of Models for Dynamic Panel Data, Journal of Econometrics, 68, 5-27, 1995.
S.C. Ahn and P. Schmidt, A Separability Result for GMM Estimation, with Applications to GLS Prediction and Conditional Moment Tests, Econometric Reviews, 14(1), 19-34, 1995.
S.C. Ahn and P. Schmidt, Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation, Journal of Econometrics, 76, 309-321, 1997.
S.C. Ahn, Y.H. Lee and P. Schmidt, GMM Estimation of Linear Panel Data Models with Time-varying Individual Effects, Journal of Econometrics, 101, 219-255, 2001.
T. Amemiya, The estimation of the variances in a variance-components model, International Economic Review, 12, 1-13, 1971.
T. Amemiya and T.E. MaCurdy, Instrumental-Variable Estimation of an Error-Components Model, Econometrica, 54(4), 869-880, 1986.
E.B. Andersen, Conditional inference and models for measuring (Mentalhygiejnisk Forlag, Copenhagen), 1973.
T.W. Anderson and C. Hsiao, Formulation and Estimation of Dynamic Models Using Panel Data, Journal of Econometrics, 18, 47-82, 1982.
D.W.K. Andrews, Heteroskedasticity and autocorrelation consistent covariance matrix estimation, Econometrica, 59, 817-858, 1991.
D.W.K. Andrews and J.C. Monahan, An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator, Econometrica, 60, 953-966, 1992.
W. Antweiler, Nested Random Effects Estimation in Unbalanced Panel Data, Journal of Econometrics, 101, 295-313, 2001.
M. Arellano, Discrete choices with panel data, working paper 0101, CEMFI, 2001.
M. Arellano and S. Bond, Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations, Review of Economic Studies, 58, 277-297, 1991.
M. Arellano and O. Bover, Another Look at the Instrumental Variable Estimation of Error-Components Models, Journal of Econometrics, 68, 29-51, 1995.
J. Alvarez and M. Arellano, The Time Series and Cross Section Asymptotics
of Dynamic Panel Data Estimators, CEMFI Working Paper No. 9808, 1998.
P. Balestra and M. Nerlove, Pooling cross-section and time-series data in the estimation of a dynamic model: the demand for natural gas, Econometrica, 34, 585-612, 1966.
B.H. Baltagi, Econometric Analysis of Panel Data, J. Wiley, 1995.
B.H. Baltagi and S. Khanti-Akom, On efficient estimation with panel data: an empirical comparison of instrumental variables estimators, Journal of Applied Econometrics, 5, 401-406, 1990.
B.H. Baltagi, Simultaneous equations with error components, Journal of Econometrics, 17, 189-200, 1981.
B.H. Baltagi, Specification issues, in The econometrics of panel data: Handbook of theory and applications, chap. 9, L. Matyas and P. Sevestre eds., Kluwer Academic Publishers, Dordrecht, 196-205, 1992.
B.H. Baltagi, Panel data, Journal of Econometrics, 68, 1-268, 1995.
B.H. Baltagi, S.H. Song and B.C. Jung, The Unbalanced Nested Error Component Regression Model, Journal of Econometrics, 101, 357-381, 2001.
R. Blundell and S. Bond, GMM estimation with persistent panel data: An application to production functions, IFS working paper W99/4, 1999.
R. Blundell and S. Bond, Initial Conditions and Moment Restrictions in Dynamic Panel Data Models, Journal of Econometrics, 87, 115-143, 1998.


A. Börsch-Supan and V. Hajivassiliou, Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models, Cowles Foundation paper 960, Yale University, 1990.
T.S. Breusch, G.E. Mizon and P. Schmidt, Efficient Estimation Using Panel Data, Econometrica, 57(3), 695-700, 1989.
G. Chamberlain, Asymptotic Efficiency in Estimation with Conditional Moment Restrictions, Journal of Econometrics, 34, 305-334, 1987.
G. Chamberlain, Panel data, in Handbook of Econometrics, pp. 1247-1318, Z. Griliches and M. Intriligator eds., North-Holland, Amsterdam, 1984.
G. Chamberlain, Comment: Sequential Moment Restrictions in Panel Data, Journal of Business and Economic Statistics, 10, 20-26, 1992.
G. Chamberlain, Multivariate regression models for panel data, Journal of Econometrics, 18, 5-46, 1982.
E. Charlier, B. Melenberg and A. van Soest, Estimation of a censored regression panel data model using conditional moment restrictions efficiently, Journal of Econometrics, 95, 25-56, 2000.
C. Cornwell and P. Rupert, Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators, Journal of Applied Econometrics, 3, 149-155, 1988.
B. Crépon, F. Kramarz and A. Trognon, Parameters of Interest, Nuisance Parameters and Orthogonality Conditions. An Application to Autoregressive Error Component Models, Journal of Econometrics, 82, 135-156, 1997.
C. Cornwell, P. Schmidt and D. Wyhowski, Simultaneous equations and panel data, Journal of Econometrics, 51, 151-181, 1992.
G. Dionne, R. Gagné and C. Vanasse, Inferring technological parameters from incomplete panel data, Journal of Econometrics, 87, 303-327, 1998.
J. Dolado, Optimal instrumental variable estimator of the AR parameter of an ARMA(1,1) process, Econometric Theory, 6, 117-119.

B. Dormont, Introduction à l'Économétrie des Données de Panel, Éditions du Centre National de la Recherche Scientifique, Paris, 1989.
E. Fix and J.L. Hodges, Discriminatory analysis, nonparametric estimation: consistent properties, Report No 4, USAF School of Aviation Medicine, Randolph Field, Texas, 1951.
J. Geweke, Bayesian inference in econometric models using Monte Carlo integration, Econometrica, 57, 1317-1339, 1989.
S. Girma, A quasi-differencing approach to dynamic modelling from a time series of independent cross-sections, Journal of Econometrics, 365-383, 2000.
R. Hall, Stochastic implications of the life cycle-permanent income hypothesis, Journal of Political Economy, 86, 971-987, 1978.
B.E. Hansen, Threshold Effects in Non-Dynamic Panels: Estimation, Testing, and Inference, Journal of Econometrics, 93, 345-368, 1999.
L.P. Hansen, Large sample properties of generalized method of moments estimators, Econometrica, 50, 1029-1054, 1982.
L.P. Hansen, A method of calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators, Journal of Econometrics, 30, 203-238, 1985.
L.P. Hansen and T.J. Sargent, Instrumental variables procedures for estimating linear rational expectations models, Journal of Monetary Economics, 9, 263-296, 1982.
L.P. Hansen and K.J. Singleton, Generalized instrumental variable estimation of nonlinear rational expectations models, Econometrica, 50, 1269-1286, 1982.
L.P. Hansen, J.C. Heaton and A. Yaron, Finite-sample properties of some alternative GMM estimators, Journal of Business and Economic Statistics, 14, 262-280, 1993.
W. Härdle and J.S. Marron, Optimal bandwidth selection in nonparametric regression function estimation, Annals of Statistics, 13, 1465-1481, 1983.
R.D.F. Harris and E. Tzavalis, Inference for unit roots in dynamic panels where the time dimension is fixed, Journal of Econometrics, 91, 201-226, 1999.
J.A. Hausman, Specification Tests in Econometrics, Econometrica, 46(6), 1251-1271, 1978.
J.A. Hausman and W.E. Taylor, Panel Data and Unobservable Individual Effects, Econometrica, 49(6), 1377-1398, 1981.
J.J. Heckman and T.E. MaCurdy, A life-cycle model of female labor supply,
Review of Economic Studies, 47, 47-74, 1980.
I. Hoch, Estimation of production function parameters combining time-series
and cross-section data, Econometrica, 30, 34-53, 1962.
D. Holtz-Eakin, W. Newey and H. Rosen, Estimating Vector Autoregressions with Panel Data, Econometrica, 56, 1371-1395, 1988.
B.E. Honoré and A. Lewbel, Semiparametric binary choice panel data models without strictly exogenous regressors, working paper, Boston College, 2000.
C. Hsiao, Analysis of Panel Data, Cambridge University Press, 1986.
K.S. Im, S.C. Ahn, P. Schmidt and J.M. Wooldridge, Efficient estimation of panel data models with strictly exogenous explanatory variables, Journal of Econometrics, 93, 177-201, 1999.
G.W. Imbens, One-step estimators for over-identified generalized method of moments models, Review of Economic Studies, 64, 359-383.
J. Inkmann, Misspecified heteroskedasticity in the panel Probit model: A small sample comparison of GMM and SML estimators, Journal of Econometrics, 97, 227-259, 2000.
R.A. Judson and A.L. Owen, Estimating dynamic panel data models: A guide
for macroeconomists, Economics Letters, 65, 9-15, 1999.
M.P. Keane and D.E. Runkle, On the estimation of panel-data models with
serial correlation when instruments are not strictly exogenous, Journal of Business
and Economic Statistics, 10, 1-9, 1992.
N.M. Kiefer, A Time Series-Cross Section Model with Fixed Effects with an Intertemporal Factor Structure, unpublished manuscript, Cornell University, 1980.
E. Kyriazidou, Estimation of a panel data sample selection model, Econometrica, 65, 1335-1364, 1997.
Y.H. Lee and P. Schmidt, A Production Frontier Model with Flexible Temporal Variation in Technical Inefficiency, in The Measurement of Productive Efficiency: Techniques and Applications, Oxford University Press, 1993.
L.A. Lillard and Y. Weiss, Components of Variation in Panel Earnings Data: American Scientists 1960-1970, Econometrica, 47, 437-454, 1979.
R. Lucas, Econometric policy evaluation: A critique, in The Phillips curve and
labor markets, K. Brunner (Ed.), Vol. 1, North-Holland, 1976.
Y.P. Mack, Local properties of k-NN regression estimates, SIAM Journal on Algebraic and Discrete Methods, 2, 311-323, 1981.
L. Matyas and P. Sevestre, The Econometrics of Panel Data. Handbook of Theory and Applications, Kluwer Academic Publishers, 1992.
P. Mazodier and A. Trognon, Heteroskedasticity and stratification in error components models, Annales de l'INSEE, 30-31, 451-482, 1978.
C. Meghir and F. Windmeijer, Moment Conditions for Dynamic Panel Data Models with Multiplicative Individual Effects in the Conditional Variance, IFS Working Paper Series No. W97/21, 1997.
R. Moffitt, Identification and estimation of dynamic models with a time series of repeated cross-sections, Journal of Econometrics, 59, 99-123, 1993.
M. Nerlove, A note on error components models, Econometrica, 39, 383-396, 1971.
W.K. Newey, Efficient estimation of models with conditional moment restrictions, in Handbook of Statistics, C.R. Rao and H.D. Vinod (Eds.), Vol. 11, Elsevier Science Publishers, 1993.
W.K. Newey, Efficient instrumental variables estimation of nonlinear models, Econometrica, 58, 809-837, 1990.


W.K. Newey and K.D. West, Automatic lag selection in covariance estimation, Review of Economic Studies, 61, 631-653, 1994.
W.K. Newey and K.D. West, Hypothesis testing with efficient method of moments estimation, International Economic Review, 28, 777-787, 1987.
W.K. Newey and K.D. West, A simple, positive definite, heteroscedasticity and autocorrelation consistent covariance matrix, Econometrica, 55, 703-708, 1987.
P. Schmidt, S.C. Ahn and D. Wyhowski, Comment: Sequential Moment Restrictions in Panel Data, Journal of Business and Economic Statistics, 10, 10-14, 1992.
C.J. Stone, Consistent nonparametric regression, Annals of Statistics, 5, 595-645, 1977.
P.A.V.B. Swamy and S.S. Arora, The exact finite sample properties of the estimators of coefficients in the error components regression models, Econometrica, 40, 261-275, 1972.
M. Verbeek and T.E. Nijman, Testing for selectivity bias in panel data models, International Economic Review, 33, 681-703, 1992.
M. Verbeek and T.E. Nijman, Minimum MSE estimation of a regression model with fixed effects and a series of cross-sections, Journal of Econometrics, 59, 125-136, 1993.
T.D. Wallace and A. Hussain, The use of error components models in combining cross-section and time-series data, Econometrica, 37, 55-72, 1969.
T.J. Wansbeek and A. Kapteyn, Estimation of the error components model with incomplete panels, Journal of Econometrics, 41, 341-361, 1989.
H. White, A heteroscedasticity consistent covariance matrix estimator and a direct test for heteroscedasticity, Econometrica, 48, 817-838, 1980.
H. White, Asymptotic theory for econometricians, Academic Press, Orlando, 1984.

J.M. Wooldridge, A framework for estimating dynamic, unobserved effects panel data models with possible feedback to future explanatory variables, Economics Letters, 68, 245-250, 2000.