Journal of Econometrics 18 (1982) 5-46. North-Holland Publishing Company

MULTIVARIATE REGRESSION MODELS FOR PANEL DATA

Gary CHAMBERLAIN*

University of Wisconsin, Madison, WI 53706, USA
National Bureau of Economic Research, Cambridge, MA 02138, USA
1. Introduction
Suppose that we have a sample of individuals (or firms) followed over time: (x_it, y_it), where there are t = 1, ..., T periods and i = 1, ..., N individuals. Consider the following distributed lag specification:

E(y_it | x_i1, ..., x_iT, b_i0, ..., b_iJ, c_i) = Σ_{j=0}^{J} b_ij x_{i,t-j} + c_i,   t = J+1, ..., T.
Suppose that

E*(y_t | ..., x_{t-1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t-1}, ...),

so that y does not cause x according to the Sims (1972) definition; is it then true that there is no heterogeneity bias? The answer is no, because if d is an invariant random variable, then

E*(d | ..., x_{t-1}, x_t, x_{t+1}, ...) = E*(d | x_t, x_{t-1}, ...).
least squares regressions of y_t on x_1, ..., x_T. There are significant leads and lags; if they are generated just by a random intercept (c), then Π should have a distinctive form. There is some evidence in favor of this, and hence some justification for analysis of covariance estimation. In this example, the leads and lags could be interpreted as due just to c, with E(y_t | x_1, ..., x_T, c) = βx_t + c.
2. Identification
Suppose that a farmer is producing a product with a Cobb-Douglas technology:

y_t = βx_t + c + u_t,   t = 1, ..., T,   0 < β < 1,

where the optimal input choice has the form x_t = {… + c}/(1 − β), so that the input is correlated with the individual effect c. Consider the minimum mean square error linear predictor (the wide-sense regression)

E*(y_1 | x_1) = π_0 + π_1 x_1,

π_1 = cov(y_1, x_1)/V(x_1),   π_0 = E(y_1) − π_1 E(x_1).

Cov(c, x_1) ≠ 0 if V(c) ≠ 0; then π_1 ≠ β and the least squares estimator of β does not converge to β as N → ∞. Furthermore, with a single cross-section, there would be no internal evidence of this heterogeneity bias.

This example is discussed in Mundlak (1961, 1963) and in Zellner, Kmenta, and Drèze (1966).
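The bias can be illustrated with a short simulation (the parameter values below are illustrative, not from the paper): least squares on a single cross-section converges to π_1 = cov(y_1, x_1)/V(x_1) rather than β.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
beta = 0.5

# Individual effect c is positively correlated with the input x,
# as when the farmer chooses x knowing his own c.
c = rng.normal(0.0, 1.0, N)
x = 0.8 * c + rng.normal(0.0, 1.0, N)      # cov(c, x) = 0.8, V(x) = 1.64
y = beta * x + c + rng.normal(0.0, 1.0, N)

# Cross-section least squares converges to pi_1 = cov(y, x)/V(x), not beta.
pi1_hat = np.cov(y, x)[0, 1] / np.var(x)

# Population value implied by this design: cov(y, x) = beta*V(x) + cov(c, x).
pi1 = (beta * 1.64 + 0.8) / 1.64
print(pi1_hat, pi1, beta)
```

With cov(c, x) > 0 the slope estimate is biased upward, and nothing in the single cross-section reveals the bias.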
Then

Π = cov(y, x) V(x)⁻¹ = β I_T + 1λ',

and

E*(u_t | x_2 − x_1, ..., x_T − x_{T-1}) = 0,   t = 2, ..., T,

when we transform the model to first differences. The restriction is testable, since it implies that

E*(y_t − y_{t-1} | x_2 − x_1, ..., x_T − x_{T-1}) = E*(y_t − y_{t-1} | x_t − x_{t-1});

hence there are exclusion restrictions on the linear predictors. A stronger condition is that

E*(u_t | x_1, ..., x_T) = 0,   t = 1, ..., T.
Proposition 1. Suppose that

E(y_t | x, b, c) = b x_t + c,   t = 1, ..., T.

If T ≥ 3, then condition (C) implies that

E(c | x) = E(y_1 | x) − E(b | x) x_1 = E(y_1 | x_1) − βx_1;

hence E(c | x) = E(c | x_1) and so E(c | x) = E(c).   Q.E.D.
but π_2 ≠ E(b) unless prob(x_1 = ... = x_T) = 0. So there is an important distinction here between continuous and discrete distributions for x. If x_t only takes on a finite set of values, then there will generally be positive probability that x_1 = ... = x_T, although this probability may become negligible for large T. The following proposition provides some additional insight into this distinction; it is based on the following condition:

Condition (R). Prob(x_1 = x_2 = ... = x_T) = 0.
Proposition 2. Suppose that

E(y_t | x, b, c) = b x_t + c,   t = 1, ..., T,

and that condition (R) holds. Let

δ̃ = Σ_{t=1}^{T} (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^{T} (x_t − x̄)².

Then E(δ̃) = E(b) if E(|δ̃|) < ∞.⁴

Proof. The following equality uses condition (R):

E(δ̃ | x, b, c) = Σ_{t=1}^{T} b(x_t − x̄)(x_t − x̄) / Σ_{t=1}^{T} (x_t − x̄)² = b;

so if E(|δ̃|) < ∞, then E(δ̃) = E[E(δ̃ | x, b, c)] = E(b).   Q.E.D.

Suppose that (y_i1, ..., y_iT, x_i1, ..., x_iT), i = 1, ..., N, is a random sample from the distribution of (y, x). Define

δ̃_i = Σ_{t=1}^{T} (y_it − ȳ_i)(x_it − x̄_i) / Σ_{t=1}^{T} (x_it − x̄_i)².
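Proposition 2 can be checked numerically: the average of the within-individual slopes δ̃_i recovers E(b) even though b_i is correlated with the level of x_i, while pooled least squares does not. A sketch with made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 100_000, 5

b = rng.normal(1.0, 0.5, N)                    # random slope, E(b) = 1
c = rng.normal(0.0, 1.0, N)                    # random intercept
x = b[:, None] + rng.normal(0.0, 1.0, (N, T))  # level of x depends on b
y = b[:, None] * x + c[:, None] + rng.normal(0.0, 1.0, (N, T))

# Within-individual slope delta_i; x is continuous, so condition (R) holds
# and the denominator is non-zero with probability one.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
delta = (yd * xd).sum(axis=1) / (xd**2).sum(axis=1)

# Pooled least squares is biased because b is correlated with the level of x.
pooled = np.cov(y.ravel(), x.ravel())[0, 1] / np.var(x.ravel())
print(delta.mean(), pooled)
```

The mean of δ̃_i is close to E(b) = 1, while the pooled slope is not.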
The following extension of Proposition 1 covers the distributed lag model. Suppose that

E(y_t | x, b_0, ..., b_J, c) = Σ_{j=0}^{J} b_j x_{t-j} + c,   t = J+1, ..., T,

and that the analogue of condition (C) holds, with the conditioning sets built from x_{t-J-1}, x_{t-J}, ..., x_T.
⁵A solution could be based on Mundlak's (1978a) proposal that E(b | x) = ψ_0 + ψ_1 Σ_{t=1}^{T} x_t. However, even if we assume that the regression function is linear in x_1, ..., x_T, it may be difficult to justify the restriction that only Σ_t x_t matters, unless T is large and we have stationarity: cov(b, x_t) = cov(b, x_1) and V(x) band diagonal. (See Proposition 4 and the discussion preceding it.) Furthermore, if cov(b, x_t) = cov(b, x_1), then E(b | x_2 − x_1, ..., x_T − x_{T-1}) = E(b) (if the regression function is linear), and so there is no heterogeneity bias once we transform to first differences.

⁶We shall not discuss the problems that arise from truncating the lag distribution when T < J + 1. These problems are discussed in Griliches and Pakes (1980). By working with linear transformations of the process, it is fairly straightforward to extend our analysis to general rational distributed lag schemes.
This implies that

E(y_t | x) = Σ_{j=0}^{J} β_j x_{t-j} + γ,

where β_j = E(b_j) = E(b_j | x), j = 0, ..., J. Suppose that

E(y_t | σ(x), c) = Σ_{j=0}^{J} β_j x_{t-j} + c.

By Proposition 3,

E(d | σ(x)) = E(d | x_t, x_{t-1}, ...),

where t is any integer. It follows that

E(y_t | σ(x)) = E(c | x_t, x_{t-1}, ...) + Σ_{j=0}^{J} β_j x_{t-j} = E(y_t | x_t, x_{t-1}, ...).

So we cannot detect the heterogeneity from the leads: once a large number of lags have been included, then a small number of leads provide little additional information on c.
We can gain some insight into this result by considering the linear predictor of an invariant random variable. Let

E*(c | x_1, ..., x_T) = ψ_T + λ_T' x_T,

where λ_T' = (λ_1, ..., λ_T) and x_T' = (x_1, ..., x_T). Stationarity implies that λ_T = γ V(x_T)⁻¹ 1, where γ = cov(x_1, c) and 1 is a T × 1 vector of ones. Since V(x_T) is a band-diagonal matrix, 1 is approximately an eigenvector of V(x_T) for large T; hence λ_T' x_T is approximately proportional to 1'x_T = Σ_t x_t. For example, if x_t = ρx_{t-1} + v_t, where v_t is serially uncorrelated, then this holds for large T. This case is covered by the following proposition:

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | ..., x_{-1}, x_0, x_1, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=0}^{J} x_{t-j}/J as J → ∞, t is any integer,

λ = cov(d, x̄)/V(x̄)   if V(x̄) ≠ 0,
λ = 0                  if V(x̄) = 0,

and ψ = E(d) − λE(x̄). (See appendix A for proof.)
The existence of the x̄ limit, both in mean square and almost surely, is the main result of ergodic theory and will be discussed further below. It is clear that x̄ is an invariant random variable. If V(x̄) ≠ 0, then the x process has a (non-degenerate) invariant component, and conditioning on the x's gives a non-trivial linear predictor if x̄ is correlated with c. However, if V(x̄) = 0, then cov(c, x_t) = 0 for all t, and the linear prediction of c is not improved by conditioning on the x's.
It follows from Proposition 4 that

E*(y_t | ..., x_{t-1}, x_t, x_{t+1}, ...) = E*(y_t | x_t, x_{t-1}, ...) = ψ + Σ_{j=0}^{J} (β_j + λ/J) x_{t-j} + r(J),

where r(J) is a remainder term. Now suppose that

E(y_t | x, b, c) = b x_t + c,

and let

δ̃ = Σ_{t=1}^{T} (y_t − ȳ)(x_t − x̄) / Σ_{t=1}^{T} (x_t − x̄)².

Recall condition (R):

prob(x_1 = ... = x_T) = 0.

We want to examine the significance of condition (R) as T → ∞ in the stationary case. So a limiting version of condition (R) is

prob[V(x_1 | x̄) = 0] = 0.

If this condition holds, then

lim_{T→∞} δ̃ = {E(x_1 y_1 | x̄) − E(x_1 | x̄)E(y_1 | x̄)} / {E(x_1² | x̄) − [E(x_1 | x̄)]²}   a.s.,

and b is observable as T → ∞. But if there is positive probability that V(x_1 | x̄) = 0, then the identification problem is more difficult. There is no information on b for the stayers; so in order to obtain E(b), even as T → ∞, we have to make untestable assumptions about the unobservable part of the b distribution.
3. Estimation
Consider a sample r_i' = (x_i', y_i'), i = 1, ..., N, where x_i' = (x_i1, ..., x_iK) and y_i' = (y_i1, ..., y_iM). We shall assume that r_i is independent and identically distributed (i.i.d.) according to some multivariate distribution with finite fourth moments and E(x_i x_i') non-singular. Consider the minimum mean square error linear predictors,

E*(y_im | x_i) = π_m' x_i,   m = 1, ..., M,

and let Π be the M × K matrix whose mth row is π_m'; this notation agrees with that of section 2 if x_i includes a constant.

3.1. Estimation of linear predictors
Let w_i be the vector formed from the distinct elements of r_i r_i' that have non-zero variance. Since r_i' = (x_i', y_i') is i.i.d., it follows that w_i is i.i.d. This simple observation is the key to our results. Since Π is a function of E(w_i), our problem is to make inferences about a function of a population mean, under random sampling.

Let μ = E(w_i) and let π be the vector formed from the columns of Π' [π = vec(Π')]. Then π is a function of μ: π = h(μ). Let w̄ = Σ_{i=1}^{N} w_i/N; then π̂ = h(w̄) is the least squares estimator:

π̂ = vec[(Σ_i x_i x_i')⁻¹ Σ_i x_i y_i'],

and

√N(w̄ − μ⁰) →d N(0, V(w_i)).

Since h(μ) is differentiable, the δ-method gives

√N(π̂ − π⁰) →d N(0, Ω).

Cramér (1946) discusses limiting distributions for sample correlation and regression coefficients (p. 367); he presents an explicit formula for the variance of the limiting distribution of a sample correlation coefficient (p. 359). Kendall and Stuart (1961, p. 293) and Goldberger (1974) present the formula for the variance of the limiting distribution of a simple regression coefficient.
Evaluating the partial derivatives in the formula for Ω is tedious. That calculation can be simplified since π̂ has a ratio form. In the case of simple regression with a zero intercept, we have π = E(y_i x_i)/E(x_i²) and

√N(π̂ − π⁰) = [Σ_i (y_i − π⁰x_i)x_i/√N] / [Σ_i x_i²/N].

The definition of π⁰ implies that E[(y_i − π⁰x_i)x_i] = 0; we obtain the limiting distribution by working with

[Σ_i (y_i − π⁰x_i)x_i] / [√N E(x_i²)]

and applying the central limit theorem. This approach was used by White (1980) to obtain the limiting distribution for univariate regression coefficients.¹⁰ In appendix B (Proposition 5) we follow White's approach to obtain

Ω = E[(y_i − Π⁰x_i)(y_i − Π⁰x_i)' ⊗ Φ_x⁻¹ x_i x_i' Φ_x⁻¹],   (1)

where Φ_x = E(x_i x_i'). A consistent estimator of Ω is readily available from the corresponding sample moments,

Ω̂ = (1/N) Σ_i (y_i − Π̂x_i)(y_i − Π̂x_i)' ⊗ S_x⁻¹ x_i x_i' S_x⁻¹,   (2)

where S_x = Σ_{i=1}^{N} x_i x_i'/N.
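In matrix form, Π̂ and the estimator Ω̂ of eq. (2) can be computed directly. The sketch below uses simulated data with homoskedastic errors, so that Ω̂ should be close to the known value Σ ⊗ [E(x x')]⁻¹; the design, sample size, and parameter values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, M = 20_000, 2, 2

Pi0 = np.array([[1.0, 0.5],       # row m holds the coefficients of equation m
                [0.0, 2.0]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

X = rng.normal(size=(N, K))                        # E(x x') = I_K
Y = X @ Pi0.T + rng.multivariate_normal(np.zeros(M), Sigma, N)

Sx = X.T @ X / N
Pi_hat = np.linalg.solve(Sx, X.T @ Y / N).T        # least squares, M x K

# Robust estimate of Omega, eq. (2):
# (1/N) sum_i  u_i u_i'  (kron)  Sx^{-1} x_i x_i' Sx^{-1}.
U = Y - X @ Pi_hat.T
XS = X @ np.linalg.inv(Sx)                         # rows are Sx^{-1} x_i
Omega_hat = sum(np.kron(np.outer(U[i], U[i]), np.outer(XS[i], XS[i]))
                for i in range(N)) / N

# With homoskedastic errors and E(x x') = I, Omega = Sigma kron I_K.
print(np.round(Omega_hat, 2))
```

No linearity or homoskedasticity assumption is used in forming Ω̂; the homoskedastic design only makes the target value easy to verify.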
If the regression function is linear and the conditional variance is homoskedastic, so that V(y_i | x_i) does not depend on x_i, then Ω reduces to E(u_i u_i') ⊗ [E(x_i x_i')]⁻¹, where u_i = y_i − Π⁰x_i.

3.2. Imposing restrictions: minimum distance estimation

The properties of θ̂ are developed, for example, in Malinvaud (1970, ch. 9). Since g does not depend on any exogenous variables, the derivation of these properties can be simplified considerably, as in Chiang (1956) and Ferguson (1958). For completeness, we shall state a set of regularity conditions and the properties that they imply:
Assumption 1. a_N →a.s. g(θ⁰), where θ⁰ ∈ Θ, and √N[a_N − g(θ⁰)] →d N(0, Δ).

Assumption 2. Θ contains a neighborhood of θ⁰ in which g has continuous second partial derivatives; rank(G) = p, where G = ∂g(θ⁰)/∂θ'.

Choose θ̂ to

min_{θ∈Θ} [a_N − g(θ)]'A_N[a_N − g(θ)].

Proposition 6. If Assumptions 1 and 2 are satisfied and A_N →a.s. A, where A is a q × q positive definite matrix, then √N(θ̂ − θ⁰) →d N(0, Λ), where Λ = (G'AG)⁻¹G'AΔAG(G'AG)⁻¹; hence an optimal choice for A is Δ⁻¹, giving Λ = (G'Δ⁻¹G)⁻¹.

Proposition 7. If Assumptions 1 and 2 are satisfied and A_N →a.s. Δ⁻¹, then

N[a_N − g(θ̂)]'A_N[a_N − g(θ̂)] →d χ²(q − p).

(This is extended in Proposition 8, appendix B.)¹²
Suppose that the restrictions involve only Π. We specify the restrictions by the condition that π = f(δ), where δ is s × 1 and the domain of δ is Υ_δ, a subset of R^s that includes the true value δ⁰. Consider the following estimator of δ: choose δ̂ to

min [π̂ − f(δ)]'Ω̂⁻¹[π̂ − f(δ)],

where Ω̂ is given in (2). If Υ_δ and f satisfy Assumptions 1 and 2, then

√N(δ̂ − δ⁰) →d N(0, (F'Ω⁻¹F)⁻¹),

where F = ∂f(δ⁰)/∂δ'.
Consider

E*(y_i | x_i1, x_i2) = π_0 + π_1 x_i1 + π_2 x_i2,

and suppose that the restriction is π_2 = 0. Then the minimum distance estimator is δ̂ = π̂_1 − (ω_12/ω_22)π̂_2, where ω_jk is the j,k element of the (estimated) covariance matrix of the limiting distribution of π̂_1 and π̂_2. If E(y_i | x_i1, x_i2) is linear and V(y_i | x_i1, x_i2) = σ², then

ω_12/ω_22 = −cov(x_i1, x_i2)/V(x_i2),

and δ̂ = b_{yx_1}, the slope of the simple regression of y on x_1. But in general δ̂ ≠ b_{yx_1} and δ̂ is more efficient than b_{yx_1}. The source of the efficiency gain is that the limiting distribution of π̂_2 has a zero mean (if π_2 = 0), and so we can reduce variance without introducing any bias if π̂_2 is correlated with b_{yx_1}. Under the assumptions of linear regression and homoskedasticity, b_{yx_1} and π̂_2 are uncorrelated; but this need not be true in the more general framework that we are using.
3.3. Simultaneous equations: A generalization of two- and three-stage least squares

Given the discussion on imposing restrictions, it is not surprising that two-stage least squares is not, in general, an efficient procedure for combining instrumental variables. I shall demonstrate this with a simple example. Assume that (y_i, z_i, x_i1, x_i2) is i.i.d. according to some distribution with finite fourth moments, and that

y_i = δz_i + u_i,

where E(u_i x_i1) = E(u_i x_i2) = 0. Assume also that E(z_i x_i1) ≠ 0, E(z_i x_i2) ≠ 0. Then there are two instrumental variable estimators that both converge a.s. to δ:

δ̂_j = Σ_i y_i x_ij / Σ_i z_i x_ij,   j = 1, 2,

and

√N[(δ̂_1, δ̂_2)' − (δ, δ)'] →d N(0, Λ),

where the j,k element of Λ is

λ_jk = E[(y_i − δz_i)² x_ij x_ik] / [E(z_i x_ij)E(z_i x_ik)],   j,k = 1, 2.
The two-stage least squares estimator combines δ̂_1 and δ̂_2 by using ẑ_i = π̂_1 x_i1 + π̂_2 x_i2, based on the least squares regression of z on x_1 and x_2 (we assume that E[(x_i1, x_i2)'(x_i1, x_i2)] is non-singular):

δ̂_2SLS = Σ_i y_i ẑ_i / Σ_i z_i ẑ_i.

Since π̂_j →a.s. π_j, √N(δ̂_2SLS − δ) has the same limiting distribution as

√N[αδ̂_1 + (1 − α)δ̂_2 − δ],   α = π_1 E(z_i x_i1) / [π_1 E(z_i x_i1) + π_2 E(z_i x_i2)].

Minimizing the asymptotic variance over such combinations gives

θ̂ = τδ̂_1 + (1 − τ)δ̂_2,

where

τ = (λ¹¹ + λ¹²)/(λ¹¹ + 2λ¹² + λ²²),

and λ^{jk} is the j,k element of Λ⁻¹. The estimator obtained by using a consistent estimator of Λ has the same limiting distribution.

In general τ ≠ α since τ is a function of fourth moments and α is not. Suppose, for example, that z_i = x_i2. Then α = 0 but τ ≠ 0 unless

E[(y_i − δz_i)² x_i2 (x_i1 − [E(x_i1 x_i2)/E(x_i2²)] x_i2)] = 0.
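The optimal combination can be computed from sample moments. In the sketch below (simulated data, made-up design), the residual variance depends on x_2, so the optimal weight τ is driven by fourth moments rather than by the second-moment weight implicit in two-stage least squares:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
delta0 = 1.0

x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
z = x1 + x2 + rng.normal(size=N)          # E(z x_j) != 0: both are instruments
u = (1.0 + x2**2) * rng.normal(size=N)    # heteroskedastic, but E(u x_j) = 0
y = delta0 * z + u

# The two simple instrumental-variable estimators.
d1 = (y @ x1) / (z @ x1)
d2 = (y @ x2) / (z @ x2)

# Estimated Lambda: lambda_jk = E[(y - d z)^2 x_j x_k] / (E[z x_j] E[z x_k]).
v = y - d1 * z
X = np.column_stack([x1, x2])
Ezx = np.array([z @ x1, z @ x2]) / N
Lam = (X.T * v**2) @ X / N / np.outer(Ezx, Ezx)

# Minimum-variance weight for combining d1 and d2.
tau = (Lam[1, 1] - Lam[0, 1]) / (Lam[0, 0] - 2 * Lam[0, 1] + Lam[1, 1])
d_opt = tau * d1 + (1 - tau) * d2
print(d1, d2, tau, d_opt)
```

All three estimates converge to δ; the gain from τ is in asymptotic variance, not in consistency.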
Consider the standard simultaneous equations model:

Γ'y_i + B'x_i = u_i,   E(u_i x_i') = 0,

with reduced form y_i = Πx_i + v_i. The restrictions on π can be imposed with a minimum distance estimator: choose δ̂ to

min [π̂ − f(δ)]'Ω̂⁻¹[π̂ − f(δ)],

where Ω̂ is given in (2); we assume that Ω is positive definite. Let F = ∂f(δ⁰)/∂δ'. Then we have √N(δ̂ − δ⁰) →d N(0, Λ), where Λ = (F'Ω⁻¹F)⁻¹. This generalizes Malinvaud's minimum distance estimator (p. 676); it reduces to his estimator if u_i⁰u_i⁰' is uncorrelated with x_i x_i', so that Ω = E(u_i⁰u_i⁰') ⊗ [E(x_i x_i')]⁻¹ (u_i⁰ = y_i − Π⁰x_i).

Now suppose that the only restrictions on Γ and B are that certain coefficients are zero, together with the normalization restrictions that the coefficient of y_im in the mth structural equation is one. Then we can give an explicit formula for Λ. Write the mth structural equation as

y_im = δ_m' z_im + u_im,

where z_im contains the variables of the mth equation that do not have zero or unit coefficients. The formula for ∂π/∂δ' is given in Rothenberg (1973, p. 69), and we obtain

Λ = {Φ_zx [E(u_i⁰ u_i⁰' ⊗ x_i x_i')]⁻¹ Φ_zx'}⁻¹,

where Φ_zx is block-diagonal with mth diagonal block E(z_im x_i'). The minimum distance criterion can be written as a distance function in the structural coefficients:

N[(Γ' ⊗ I_K)π̂ + β]' Θ̃⁻¹ [(Γ' ⊗ I_K)π̂ + β],

where β = vec(B) and Θ̃ = (I ⊗ Φ_x⁻¹) E(u_i⁰u_i⁰' ⊗ x_i x_i') (I ⊗ Φ_x⁻¹). Let û_i = Γ̂'y_i + B̂'x_i, where Γ̂ →a.s. Γ⁰ and B̂ →a.s. B⁰.
Now replace Θ̃ by

Θ̂ = (I ⊗ S_x⁻¹) Ψ̂ (I ⊗ S_x⁻¹),   Ψ̂ = Σ_i û_i û_i' ⊗ x_i x_i'/N.

This corresponds to Basmann's (1965) interpretation of three-stage least squares.¹³ Minimizing with respect to δ gives

δ̂_G3S = (S_zx Ψ̂⁻¹ S_zx')⁻¹ (S_zx Ψ̂⁻¹ s_xy).

The limiting distribution of this estimator is derived in appendix B (Proposition 5). We record it as:

Proposition 10. √N(δ̂_G3S − δ⁰) →d N(0, Λ), where Λ = (Φ_zx Ψ⁻¹ Φ_zx')⁻¹. This generalized three-stage least squares estimator is asymptotically efficient within the class of minimum distance estimators.
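In the single-equation case (M = 1) the generalized three-stage least squares formula reduces to a two-step estimator with weight matrix Ψ̂⁻¹ = [Σ_i û_i² x_i x_i'/N]⁻¹. A sketch on simulated data (all parameter values are made up; the first step is an ordinary 2SLS-type fit used only to form residuals):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000

x = rng.normal(size=(N, 3))               # three excluded instruments
h = rng.normal(size=N)                    # source of endogeneity
z1 = x @ np.array([1.0, 1.0, 0.5]) + h + rng.normal(size=N)
Z = np.column_stack([z1, np.ones(N)])     # s = 2 right-hand-side variables
X = np.column_stack([x, np.ones(N)])      # K = 4 with the constant
delta0 = np.array([1.0, 2.0])
u = (1 + 0.5 * x[:, 0]**2) * rng.normal(size=N) + h   # E(u x) = 0
y = Z @ delta0 + u

Szx = Z.T @ X / N                          # s x K
sxy = X.T @ y / N

# First step: 2SLS-type estimate, used only to form residuals.
W0 = np.linalg.inv(X.T @ X / N)
d1 = np.linalg.solve(Szx @ W0 @ Szx.T, Szx @ W0 @ sxy)

# Second step: weight by Psi_hat^{-1}, Psi_hat = sum u_hat_i^2 x_i x_i' / N.
uh = y - Z @ d1
Psi = (X.T * uh**2) @ X / N
Wi = np.linalg.inv(Psi)
d_g3s = np.linalg.solve(Szx @ Wi @ Szx.T, Szx @ Wi @ sxy)
print(d_g3s)
```

Both steps are consistent despite the endogeneity of z1 (the instruments x satisfy E(u x) = 0); the second step changes only the weighting.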
Finally, we shall consider the generalization of two-stage least squares. Suppose that

y_i1 = δ_1' z_i1 + u_i1,

where E(x_i u_i1) = 0, z_i1 is s_1 × 1, and rank[E(x_i z_i1')] = s_1. We complete the system by setting

y_im = π_m' x_i + u_im,

where E(x_i u_im) = 0 (m = 2, ..., M). So z_im = x_i (m = 2, ..., M). We have √N(δ̂_1 − δ_1⁰) →d N(0, Λ_11), and evaluating the partitioned inverse gives

Λ_11 = {E(z_i1 x_i') [E(u_i1⁰² x_i x_i')]⁻¹ E(x_i z_i1')}⁻¹,   (4)

where u_i1⁰ = y_i1 − δ_1⁰' z_i1. We can obtain the same limiting distribution by using the following generalization of two-stage least squares: Let

Ψ̂_11 = Σ_i û_i1² x_i x_i'/N,   û_i1 = y_i1 − δ̃_1' z_i1,

where δ̃_1 →a.s. δ_1⁰ (for example, δ̃_1 could be an instrumental variable estimator); then

δ̂_G2S = (Z_1' X Ψ̂_11⁻¹ X' Z_1)⁻¹ (Z_1' X Ψ̂_11⁻¹ X' y_1),

where X and Z_1 are the matrices with rows x_i' and z_i1'.
3.4. Asymptotic efficiency: A comparison with quasi-maximum likelihood

Let

S = (1/N) Σ_{i=1}^{N} (r_i − r̄)(r_i − r̄)'.

If the distribution of r_i is multivariate normal, then the log-likelihood function depends on the data only through S, and the (quasi-) maximum likelihood estimator¹⁴ of θ solves

W(s, θ̂)[s − g(θ̂)] = 0.

Ferguson (1958) derives the limiting distribution of √N(θ̂ − θ) under regularity conditions on the functions W and g. These regularity conditions are particularly simple in our problem since W does not depend on s. We can state them as follows:

Assumption 3. s →a.s. g(θ⁰), where θ⁰ is an interior point of Θ; g is a continuous, one-to-one function with a continuous inverse; g has continuous second partial derivatives; rank[∂g(θ)/∂θ'] = p for θ ∈ Θ₀; Σ(θ) is non-singular for θ ∈ Θ₀.

Then Ferguson's theorem implies that the likelihood equations almost surely have a unique solution within Θ₀ for sufficiently large N.

¹⁴The quasi-maximum likelihood terminology was used by the Cowles Commission; see Malinvaud (1970, p. 678).
The quasi-maximum likelihood estimator θ̂_ML has the same limiting distribution as the following minimum distance estimator: choose θ̂_MD to

min [s* − g*(θ)]' A_N [s* − g*(θ)].

Proposition 12. If Assumption 3 is satisfied, then √N(θ̂_ML − θ⁰) has the same limiting distribution as √N(θ̂_MD − θ⁰).

If Δ* is non-singular, an optimal minimum distance estimator has A_N →a.s. ζΔ*⁻¹, where ζ is an arbitrary positive real number. If the distribution of r_i is normal, then Δ*⁻¹ is proportional to Ψ*; but in general Δ*⁻¹ is not proportional to Ψ*, since Δ* depends on fourth moments and Ψ* is a function of second moments. So in general θ̂_ML is less efficient than the optimal minimum distance estimator that uses

A_N = [(1/N) Σ_{i=1}^{N} (s_i* − s̄*)(s_i* − s̄*)']⁻¹,   (5)

where s_i* is the vector formed from the lower triangle of (r_i − r̄)(r_i − r̄)'. More generally, we can consider the class of consistent estimators that are continuously differentiable functions of s*: θ̂ = φ(s*). Chiang (1956) shows that the minimum distance estimator based on Δ*⁻¹ has the minimal asymptotic covariance matrix within this class. The minimum distance estimator based on A_N in (5) attains this lower bound.
4. An empirical example
We shall present an empirical example that illustrates some of the preceding results. The data come from the panel of Young Men in the National Longitudinal Survey (Parnes). The sample consists of 1454 young men who were not enrolled in school in 1969, 1970, or 1971, and who had complete data on the variables listed in table 1. Table 2a presents an unrestricted least squares regression of the logarithm of wage in 1969 on the union, SMSA, and region variables for all three years. The regression also includes a constant, schooling, experience, experience squared, and race. This regression is repeated using the 1970 wage and the 1971 wage.
Table 1
Characteristics of National Longitudinal Survey Young Men, not enrolled in school in 1969, 1970, 1971; N = 1454.

Variable   Mean    Standard deviation
LW1        5.64    0.423
LW2        5.74    0.426
LW3        5.82    0.437
U1         0.336
U2         0.362
U3         0.364
U1U2       0.270
U1U3       0.262
U2U3       0.303
U1U2U3     0.243
SMSA1      0.697
SMSA2      0.627
SMSA3      0.622
RNS1       0.409
RNS2       0.404
RNS3       0.410
S          11.7    2.64
EXP69      5.11    3.71
EXP692     39.8    46.6
RACE       0.264
[Tables 2a and 2b. Unrestricted least squares regressions: coefficients (and standard errors) of the dependent variables LW1, LW2, LW3 on U1, U2, U3, U1U2, U1U3, U2U3, U1U2U3, SMSA1, SMSA2, SMSA3, RNS1, RNS2, RNS3. All regressions include (1, S, EXP69, EXP692, RACE); standard errors are calculated using Ω̂ in eq. (2) of section 3.1. The individual entries are not legible in this copy.]
Using May–May CPS matches for 1977–1978, Mellow (1981) reports coefficients (standard errors) of 0.087 (0.018) and −0.069 (0.020) for entering and leaving union membership in a wage change regression. The sample consists of 6,602 males employed as non-agricultural wage and salary workers in both years. He also reports results for 2,177 males and females whose age was ≤ 25. Here the coefficients on entering and leaving union membership are quite different: 0.198 (0.031) and −0.035 (0.041); it would be useful to reconcile these numbers with our results for young men. Also see Stafford and Duncan (1980).
does not hold. For example, the union coefficients provide some evidence that E(b | x_1, x_2, x_3) is constant for the individuals who experience a change in union coverage [i.e., E(b | x_1, x_2, x_3) = β if x_1 + x_2 + x_3 ≠ 0 or 3]; but there is no direct evidence on E(b | x_1, x_2, x_3) for the people who are always covered or never covered. Furthermore, our alternative hypothesis has no structure. It might be fruitful, for example, to examine the changes in union coverage jointly with changes in employer.
Table 3a exhibits the estimates that result from imposing the restrictions using the optimal minimum distance estimator. We also give the conventional generalized least squares estimates. They are minimum distance estimates in which the weighting matrix (A_N) is the inverse of Ω̃ in eq. (6).
Table 3a
Restricted estimates: coefficients (and standard errors) of β̂_MD and β̂_GLS (for U, SMSA, RNS) and of λ̂ (for U1, U2, U3, U1U2, U1U3, U2U3, U1U2U3, SMSA1, SMSA2, SMSA3, RNS1, RNS2, RNS3). χ²(23) = 19.36. [The alignment of the individual entries is not legible in this copy.]

E*(y | x) = Πx = Π₁x₁ + Π₂x₂; x₁' = (U1, U2, U3, U1U2, U1U3, U2U3, U1U2U3, SMSA1, SMSA2, SMSA3, RNS1, RNS2, RNS3); x₂' = (1, S, EXP69, EXP692, RACE). Π₁ = (β_U I₃, 0, β_SMSA I₃, β_RNS I₃) + 1λ'; Π₂ is unrestricted. The restrictions are expressed as π = Fδ, where δ is unrestricted. β̂ and λ̂ are minimum distance estimates with A_N = Ω̂⁻¹ in eq. (2), section 3.1; β̂_GLS and λ̂_GLS are minimum distance estimates with A_N = Ω̃⁻¹ in eq. (6), section 4 (λ̂_GLS is not shown in the table). The first standard error for β̂_GLS is the conventional one based on (F'Ω̃⁻¹F)⁻¹; the second standard error for β̂_GLS is based on (F'Ω̃⁻¹F)⁻¹F'Ω̃⁻¹Ω̂Ω̃⁻¹F(F'Ω̃⁻¹F)⁻¹. The χ² statistics are computed from N[π̂ − Fδ̂]'Â_N[π̂ − Fδ̂].
Table 3b
Restricted estimates under the constraint that λ = 0.
Coefficients (and standard errors) of δ̂:

U: 0.157 (0.012)   SMSA: 0.120 (0.013)   RNS: −0.150 (0.016)

χ²(36) = 89.08
See footnote to table 3a.

The least squares regressions for the separate cross-sections, with no leads or lags, give union coefficients of 0.195, 0.189, and 0.191 in 1969, 1970 and 1971.¹⁷ So the decline in the union coefficient, when we allow for heterogeneity bias, is 32% or 44%, depending on which biased estimate (0.16 or 0.19) one uses. The SMSA and region coefficients also decline in absolute value. The least squares estimates for the separate cross-sections give an average SMSA coefficient of 0.147 and an average region coefficient of −0.131. So the decline in the SMSA coefficient is either 53% or 62%, and the decline in absolute value of the region coefficient is either 45% or 37%.
5. Conclusion
We have examined the relationship between heterogeneity bias and strict exogeneity in distributed lag regressions of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite.

The individual specific random variables introduce nonlinearity and heteroskedasticity. So we have provided an appropriate framework for the estimation of multivariate linear predictors. We showed that the optimal minimum distance estimator is more efficient, in general, than the conventional estimators such as quasi-maximum likelihood. We provided computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain.
¹⁷Using the NLS Young Men in 1969 (N = 1362), Griliches (1976) reports a union membership coefficient of 0.203. Using the NLS Young Men in a pooled regression for 1966–1971 and 1973 (N = 470), Brown (1980) reports a coefficient of 0.130 on a variable measuring the probability of union coverage. (The union coverage question was asked only in 1969, 1970, and 1971; so this variable is imputed for the other four years.) The coefficient declines to 0.081 when individual intercepts are included in the regression. His regressions also include a large number of occupation and industry specific job characteristics.
Some of these ideas were illustrated using the sample of Young Men in the National Longitudinal Survey. We examined regressions of wages on the leads and lags in union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.
Appendix A

Let Ω be a set of points, where ω ∈ Ω is a doubly infinite sequence of vectors of real numbers:

ω = {..., ω_{-1}, ω_0, ω_1, ...} = {ω_t, t ∈ I},

where ω_t ∈ R^q and I is the set of all integers. Let z_t(ω) = ω_t be the tth coordinate function. Let ℱ be the σ-field generated by sets of the form

A = {ω: z_t(ω) ∈ B_1, ..., z_{t+k}(ω) ∈ B_k},

where t, k ∈ I and the B's are q-dimensional Borel sets. Let P be a probability measure defined on ℱ such that {z_t, t ∈ I} is a (strictly) stationary stochastic process on the probability space (Ω, ℱ, P).

The shift transformation S is defined by z_t(Sω) = z_{t+1}(ω). It is an invertible, measure-preserving transformation. A random variable d defined on (Ω, ℱ, P) is invariant if d(Sω) = d(ω) except on a set with probability measure zero (almost surely or a.s.). A set A ∈ ℱ is invariant if its indicator function is an invariant random variable.

We shall use E(d | 𝒢)_ω to denote the conditional expectation of the random variable d with respect to the σ-field 𝒢, evaluated at ω. Let x_t be a component of z_t, let σ(x) denote the σ-field generated by {..., x_{-1}, x_0, x_1, ...}, and let E(d | x_t, x_{t-1}, ...) denote the expectation of d conditional on the σ-field generated by x_t, x_{t-1}, ....
Proposition 3. If d is an invariant random variable and E(|d|) < ∞, then

E(d | σ(x)) = E(d | x_t, x_{t-1}, ...)   a.s.,   t ∈ I.

[See Billingsley (1965, example 10.3, p. 109).] Since d is an invariant random variable, we have d(Sω) = d(ω) a.s.; the result follows. Q.E.D.

Proposition 4. If d is an invariant random variable and E(d²) < ∞, E(x_t²) < ∞, then

E*(d | ..., x_{-1}, x_0, x_1, ...) = ψ + λx̄,

where x̄ is the limit in mean square of Σ_{j=0}^{J} x_{t-j}/J as J → ∞, t is any integer,

λ = cov(d, x̄)/V(x̄)   if V(x̄) ≠ 0,
λ = 0                  if V(x̄) = 0,

and ψ = E(d) − λE(x̄). Q.E.D.
Appendix B

Let r_i' = (x_i', y_i'), i = 1, ..., N, where x_i' = (x_i1, ..., x_iK) and y_i' = (y_i1, ..., y_iM). Write the mth structural equation as

y_im = δ_m' z_im + u_im,   m = 1, ..., M.

Let u_i⁰' = (u_i1⁰, ..., u_iM⁰), where u_im⁰ = y_im − δ_m⁰' z_im and δ_m⁰ is the true value of δ_m; let Φ_zx be the block-diagonal matrix with mth diagonal block E(z_im x_i'). Let δ = (δ_1', ..., δ_M')' be s × 1, and set

δ̂ = (S_zx D⁻¹ S_zx')⁻¹ (S_zx D⁻¹ s_xy).

Proposition 5. Assume that (1) r_i is i.i.d. according to some distribution with finite fourth moments; (2) E[x_i(y_im − δ_m⁰' z_im)] = 0 (m = 1, ..., M); (3) rank(Φ_zx) = s; and (4) D →a.s. Ψ as N → ∞, where Ψ is a positive definite matrix. Then √N(δ̂ − δ⁰) →d N(0, Λ), where

Λ = (Φ_zx Ψ⁻¹ Φ_zx')⁻¹ Φ_zx Ψ⁻¹ E(u_i⁰u_i⁰' ⊗ x_i x_i') Ψ⁻¹ Φ_zx' (Φ_zx Ψ⁻¹ Φ_zx')⁻¹,

which reduces to (Φ_zx Ψ⁻¹ Φ_zx')⁻¹ if Ψ = E(u_i⁰u_i⁰' ⊗ x_i x_i').

Proof: √N(δ̂ − δ⁰) = (S_zx D⁻¹ S_zx')⁻¹ S_zx D⁻¹ Σ_i (u_i⁰ ⊗ x_i)/√N. By the strong law of large numbers, S_zx →a.s. Φ_zx; Φ_zx Ψ⁻¹ Φ_zx' is an s × s positive definite matrix since rank(Φ_zx) = s. So we obtain the same limiting distribution by considering

(Φ_zx Ψ⁻¹ Φ_zx')⁻¹ Φ_zx Ψ⁻¹ Σ_{i=1}^{N} (u_i⁰ ⊗ x_i)/√N.

Then the central limit theorem gives √N(δ̂ − δ⁰) →d N(0, Λ), since E[(u_i⁰ ⊗ x_i)(u_i⁰ ⊗ x_i)'] = E(u_i⁰u_i⁰' ⊗ x_i x_i'). Q.E.D.
Then Λ is the covariance matrix for two-stage least squares if

Ψ = diag{E(u_i1⁰² x_i x_i'), ..., E(u_iM⁰² x_i x_i')}.

Proof of Proposition 6: Let D*(θ) = [g(θ⁰) − g(θ)]'A[g(θ⁰) − g(θ)]. D_N converges a.s. uniformly to D* on Υ. Let B be a neighborhood of θ⁰ and set Γ = Υ − B. Then

min_{θ∈Γ} D_N(θ) → min_{θ∈Γ} D*(θ) > 0   a.s.,

whereas D_N(θ̂) → 0 a.s.; since B is an arbitrary neighborhood of θ⁰, we have shown that θ̂ ∈ B a.s. for sufficiently large N; hence θ̂ →a.s. θ⁰.

Let

s_N(θ) = ∂D_N(θ)/∂θ = −2(∂g'(θ)/∂θ)A_N[a_N − g(θ)].

Since θ̂ →a.s. θ⁰, for N sufficiently large we a.s. have s_N(θ̂) = 0. The mean value theorem implies that

∂s_N(θ*)/∂θ' →a.s. 2G'AG,

which is non-singular. Hence

√N(θ̂ − θ⁰) = −[∂s_N(θ*)/∂θ']⁻¹ √N s_N(θ⁰)   a.s.

for sufficiently large N. We obtain the same limiting distribution by considering (G'AG)⁻¹G'A √N[a_N − g(θ⁰)], which gives Λ = (G'AG)⁻¹G'AΔAG(G'AG)⁻¹. If A = Δ⁻¹, then Λ = (G'Δ⁻¹G)⁻¹; for general positive definite A, Λ − (G'Δ⁻¹G)⁻¹ = CC' for some matrix C, which is positive semi-definite; hence an optimal choice for A is Δ⁻¹. Q.E.D.

Proposition 8. If Assumptions 1 and 2 are satisfied, if A is positive definite, and if A_N →a.s. Δ⁻¹, then

N[a_N − g(θ̂)]'A_N[a_N − g(θ̂)] →d χ²(q − p).
Proof: For sufficiently large N we have

√N[g(θ̂) − g(θ⁰)] = G_N √N(θ̂ − θ⁰)   a.s.,

where G_N →a.s. G. From the mean value expansion above, we have

√N(θ̂ − θ⁰) = R_N √N[a_N − g(θ⁰)]   a.s.,

where R_N →a.s. R = (G'Δ⁻¹G)⁻¹G'Δ⁻¹. Hence

√N[a_N − g(θ̂)] = Q_N √N[a_N − g(θ⁰)]   a.s.,

where Q_N →a.s. Q = I_q − GR. Write √N[a_N − g(θ⁰)] →d Cu, where u ~ N(0, I_q) and C is a non-singular matrix such that CC' = Δ. Then

N[a_N − g(θ̂)]'A_N[a_N − g(θ̂)] →d u'C'Q'Δ⁻¹QCu.

Let G̃ = C⁻¹G and M_G = I_q − G̃(G̃'G̃)⁻¹G̃'; then M_G is a symmetric idempotent matrix with rank q − p and

C'Q'Δ⁻¹QC = M_G'M_G = M_G.

Hence u'M_G u ~ χ²(q − p). Q.E.D.
Let h(δ) = g[f(δ)]. Choose δ̂ to

min_{δ∈Υ_δ} [a_N − h(δ)]'A_N[a_N − h(δ)].

Proposition 9. If Assumptions 1, 2, and 2' are satisfied, and if A is positive definite, then d_1 − d_2 →d χ²(p − s), where

d_1 = N[a_N − h(δ̂)]'A_N[a_N − h(δ̂)],
d_2 = N[a_N − g(θ̂)]'A_N[a_N − g(θ̂)].

Furthermore, d_1 − d_2 is independent of d_2.

Proof: The assumptions on f and Υ_δ imply that h and Υ_δ satisfy Assumptions 1 and 2. By following the proof of Proposition 8, we can show that the vector (d_1, d_2) converges in distribution to (d_1*, d_2*), where

d_1* = u'M_H u,   d_2* = u'M_G u,

u ~ N(0, I_q), C is a non-singular matrix such that CC' = Δ, G̃ = C⁻¹G, H̃ = C⁻¹H,

M_G = I_q − G̃(G̃'G̃)⁻¹G̃',   M_H = I_q − H̃(H̃'H̃)⁻¹H̃';

so

d_1 − d_2 →d u'(M_H − M_G)u ~ χ²(p − s).

Since

cov[(M_H − M_G)u, M_G u] = (M_H − M_G)M_G = 0,

we see that d_1* − d_2* is independent of d_2*. Q.E.D.
Proposition 9a. Assume that (1) Assumptions 1 and 2 are satisfied for g and Υ; (2) l is one-to-one and continuous on the range of g(θ) for θ ∈ Υ; l has continuous second partial derivatives in a neighborhood of g(θ⁰); L = ∂l(g(θ⁰))/∂a' is non-singular; (3) Δ is positive definite and A_N →a.s. (LΔL')⁻¹. Then √N(θ̂ − θ⁰) →d N(0, Λ), where Λ = (G'Δ⁻¹G)⁻¹.

Proof: By the δ-method,

√N[l(a_N) − l(g(θ⁰))] →d N(0, LΔL').

Hence √N(θ̂ − θ⁰) →d N(0, Λ), where Λ = [H'(LΔL')⁻¹H]⁻¹ and H = ∂l(g(θ⁰))/∂θ'. Since H = LG and L is non-singular, we have Λ = (G'Δ⁻¹G)⁻¹. Q.E.D.
Finally, consider augmenting a_N to a k × 1 vector c_N: c_N' = (a_N', b_N'), k ≥ q. (For example, we can augment s* by adding the first moments.) Assume that c_N →a.s. ξ⁰, where ξ⁰' = (g(θ⁰)', ξ_b⁰'), and assume that √N(c_N − ξ⁰) →d N(0, Φ). We shall let ξ_b be unrestricted. Let ψ' = (ψ_1', ψ_2') = (θ', ξ_b') be a 1 × n vector, where n = p + k − q; set m(ψ) = (g(θ)', ξ_b')'. Choose ψ̂ to

min [c_N − m(ψ)]'A_N^c[c_N − m(ψ)],

where A_N^c →a.s. A^c. Then ψ̂_1 provides an estimator of θ; we want to compare this estimator with the following one: choose θ̂ to

min_{θ∈Υ} [a_N − g(θ)]'A_N[a_N − g(θ)],

where A_N →a.s. A, Υ is a compact subset of R^p, and g is continuous on Υ. We shall set Υ_c equal to the Cartesian product of Υ and R^{k−q}. Suppose that A_N and A_N^c are positive definite, and that the submatrix of (A^c)⁻¹ consisting of the first q rows and columns equals A⁻¹. Then we have the following result:

Proposition 9b. ψ̂_1 = θ̂.

Proof: Partition A_N^c conformably with (a_N', b_N'). Minimizing with respect to ψ_2 gives

ψ̂_2 = b_N − (A_{22}^c)⁻¹A_{21}^c[a_N − g(θ)],

and the concentrated distance function is

[a_N − g(θ)]'Ā_N[a_N − g(θ)],

where Ā_N = A_{11}^c − A_{12}^c(A_{22}^c)⁻¹A_{21}^c equals A_N by our assumption on the submatrix of (A^c)⁻¹. So the addition of unrestricted moments does not affect the minimum distance estimator. Q.E.D.
References

Anderson, T.W., 1969, Statistical inference for covariance matrices with linear structure, in: P.R. Krishnaiah, ed., Proceedings of the second international symposium on multivariate analysis (Academic Press, New York).
Anderson, T.W., 1970, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, in: Essays in probability and statistics (University of North Carolina Press, Chapel Hill, NC).
Amemiya, T., 1971, The estimation of variances in a variance-components model, International Economic Review 12, 1-13.
Balestra, P. and M. Nerlove, 1966, Pooling cross section and time series data in the estimation of a dynamic model: The demand for natural gas, Econometrica 34, 585-612.
Basmann, R.L., 1965, On the application of the identifiability test statistic and its exact finite sample distribution function in predictive testing of explanatory economic models, Unpublished manuscript.
Billingsley, P., 1965, Ergodic theory and information (Wiley, New York).
Billingsley, P., 1979, Probability and measure (Wiley, New York).
Brown, C., 1980, Equalizing differences in the labor market, Quarterly Journal of Economics 94, 113-134.
Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic Studies 47, 225-238.
Chiang, C.L., 1956, On regular best asymptotically normal estimates, Annals of Mathematical Statistics 27, 336-351.
Cramér, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton, NJ).
Ferguson, T.S., 1958, A method of generating best asymptotically normal estimates with application to the estimation of bacterial densities, Annals of Mathematical Statistics 29, 1046-1062.
Goldberger, A.S., 1974, Asymptotics of the sample regression slope, Unpublished lecture note no. 12.
Griliches, Z., 1976, Wages of very young men, Journal of Political Economy 84, S69-S85.
Griliches, Z. and A. Pakes, 1980, The estimation of distributed lags in short panels, National Bureau of Economic Research technical paper no. 4.
Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, forthcoming.
Hsiao, C., 1975, Some estimation methods for a random coefficient model, Econometrica 43, 305-325.
Jennrich, R.I., 1969, Asymptotic properties of non-linear least squares estimators, The Annals of Mathematical Statistics 40, 633-643.
Kendall, M.G. and A. Stuart, 1961, The advanced theory of statistics, Vol. 2 (Griffin, London).
MaCurdy, T.E., 1979, Multiple time series models applied to panel data: Specification of a dynamic model of labor supply, Unpublished manuscript.