Anda di halaman 1dari 42

Journal

of Econometrics

18 (1982) 546.

North-Holland

Publishing

MULTIVARIATE
REGRESSION
FOR PANEL DATA

Company

MODELS

Gary CHAMBERLAIN*
University
Nutionul

(!/ K+sconsin Madison,

Bureau of EconomicResearch,

WI 53706, USA

Cambridge,

MA 02138. USA

The paper examines


the relationship
between heterogeneity
bias and strict exogeneity
in a
distributed
lag regression
of y on X. The relationship
is very strong when x is continuous,
weaker when x is discrete, and non-existent
as the order of the distributed
lag becomes
random
variables
introduce
nonlinearity
and
heteroinfinite.
The individual
specific
skedasticity;
so the paper provides an appropriate
framework
for the estimation
of multivariate
linear predictors.
Restrictions
are imposed using a minimum distance estimator.
It is generally
more efficient than the conventional
estimators
such as quasi-maximum
likelihood.
There are
computationally
simple generalizations
of two- and three-stage
least squares
that achieve
this efficiency gain. Some of these ideas are illustrated
using the sample of Young Men in the
National
Longitudinal
Survey. The paper reports regressions on the leads and lags of variables
measuring union coverage, SMSA, and region. The results indicate that the leads and lags could
have been generated
just by a random
intercept.
This gives some support
for analysis of
covariance
type estimates; these estimates indicate a substantial
heterogeneity
bias in the union,
SMSA, and region coefficients.

1. Introduction
Suppose that we have a sample of individuals
(or firms) followed over time:
and i= 1,. ., N individuals.
(xif,yiJ, where there are t= 1,. . ., T periods
Consider the following distributed
lag specification:
E(YitIXil,...,XiT,biO,...,biJ,Ci)=

i bijXi,t-j+Ci,
j=O

t=J+l,...,T

The coefficients b,, and ci are allowed to vary across individuals


but are
constant
over time. The population
parameters
of interest are fij= E(bij),
J. If the bii or ci are correlated with x, then a least squares regression
j=O,...,
*I am grateful to Arthur Goldberger, Zvi Griliches, Donald Hester, George Jakubson, Ariel
Pakes, and Burton Singer for comments and helpful discussions. Financial support was provided
by the National
Science Foundation
(Grants No. SOC-7925959
and No. SES-8016383) and by
funds granted to the Institute for Research on Poverty at the University of Wisconsin, Madison,
by the Department
of Health,
Education,
and Welfare pursuant
to the provisions
of the
Economic Opportunity
Act of 1964.
01657410/82/000Cr0000/$02.75

1982 North-Holland

G. Chamberlain, Multitlariate reqression models for panel data

of y, on x,, ., .xtmJ will not provide a consistent


estimator
of the B,i (as
N-+co). We shall refer to this inconsistency
as a heterogeneity
bias.
In section 2, on identification,
we consider first the case J =0 and bij=Pj
We argue that the presence of heterogeneity
bias will be signalled by a full
set of lags and leads in the least squares regression
of y, on x1,. . .,xT
Furthermore,
if we let y=(yi,..
.,yr), x=(xr,. . .,x,) and consider
the
multivariate
linear predictor: E*(y lx) = no + lI,x,
then the T x T matrix ZZ,
should have a distinctive
pattern the off-diagonal
elements within the
same column are all equal. In that case,

so there is just a contemporaneous


relationship
when we transform
to first
differences.
I think that a test for such restrictions
should accompany
analysis of covariance type estimation.
There is an analogous
question when J is finite and the bj are random as
well as c. Does E(y, 1x1,. . ., xT) = E(y, 1x,, . . ., xtmJ) imply that there is no
heterogeneity
bias? We find that the answer is yes if x has a continuous
distribution
but not if x is discrete.
New issues arise as the order (J) of the distributed
lag becomes infinite.
We consider this problem in the context of a stationary
stochastic process; c
and the bj are (shift) invariant
random variables. There are invariant
random
variables with non-zero variance if and only if the process is not ergodic. We
pose the following question: if
E*(Y, I . . ..Xf-1.&,Xt+1,..

. I=

E*(Y,

1x,, x, -

1,.

.I,

so that y does not cause x according to the Sims (1972) definition, is it then
true that there is no heterogeneity
bias? The answer is no, because if d is an
invariant random variable, then
E*(dI . . .. x,_~,x,,x,+~

,... )=E*(dIxt,xtpl

,... ).

Section 3 of the paper considers


the estimation
of multivariate
linear
predictors. lhere is a sample ri = (x;,y$ i = 1,. . ., N, where x; = (xi,, ., xiK) and
yi=(y,r,. . ., yiM). We assume that ri is independent
and identically
distributed
(i.i.d.) according to some distribution
with finite fourth moments. We do not
assume that the regression function E(ji 1xi) is linear; for although E(ji 1xi, ci)
may be linear, there is generally no reason to insist that E(c,j xi) is linear.
Furthermore,
we allow the conditional
variance
V(_V,1xi) to be an arbitrary
function of xi; the heteroskedasticity
could, for example, be due to random
coefficients. Let wi be the vector formed from the squares and cross-products
of the elements of vi; let Zl be the matrix of linear predictor
coefficients:

G. Chamberlain,

Mulrirw-iate

regression

models fir

panel data

,5*Cyi ( xi) =ZIx,


where fl= ECy,x;)[E(xix~)] -I. Then wi is i.i.d.
function of E(wi). So the problem is to make inferences about
functions of a population
mean, under random sampling.
This is straightforward
and the results have a variety of novel
Let ii be the least squares estimator; let it and 71 be the vectors

and f7 is a
differentiable
implications.

formed from

the columns of ii and II. Then fi(7i-~)~N(O,Q)


as N-t co. The formula
for C2 is not the standard
one, since we are not assuming
homoskedastic,
linear regression.
We impose restrictions
by using a minimum
distance estimator:
find the
matrix satisfying the restrictions
that is closest to fi in the norm provided by
fi -I, where fi is a consistent (as N+ 00) estimator of a. This leads to some
consider
a univariate
linear
predictor:
surprising
results.
For example,
E*(yi 1xii, xiz)= x0 + zlxil + n2xi2. We can impose the restriction
that n2 =0
by using a least squares regression of y on x1 to estimate rcr; however, this is
asymptotically
less efficient, in general, than our minimum distance estimator.
The conventional
estimator is a minimum
distance estimator, but it is using
a different norm.
A related result is that two-stage
least squares is not, in general, an
efticient procedure
for combining
instrumental
variables;
three-stage
least
squares is also using the wrong norm. We provide more efficient estimators
for the linear simultaneous
equations
model by applying
our minimum
distance procedure
to the reduced form, thereby generalizing
Malinvauds
(1970) minimum
distance estimator.
Suppose that the only restrictions
are
that certain structural
coefficients are zero (and the normalization
rule). We
provide
a generalization
of three-stage
least squares that has the same
limiting
distribution
as our minimum
distance
estimator.
There is a
corresponding
generalization
of two-stage least squares.
We also consider the maximum
likelihood
estimator
based on assuming
that ri has a multivariate
normal distribution
with mean z and covariance
matrix Z. Then the slope coefficients in IZ are functions
of C and, more
generally,
we can consider estimating
arbitrary
functions
of C subject to
restrictions.
When the normality
assumptions
do not hold, we refer to the
estimator
as a quasi-maximum
likelihood
estimator.
The quasi-maximum
likelihood estimator has the same limiting distribution
as a certain minimum
distance estimator;
but in general that minimum
distance estimator
is not
using the optimal norm. Hence our estimator is generally more efficient than
the quasi-maximum
likelihood estimator.
Section 4 of the paper presents an empirical example that illustrates some
of the results. It is based on the panel of Young Men in the National
Longitudinal
Survey (Parnes); y, is the logarithm
of the individuals
hourly
wage, and x, includes variables to indicate whether or not the individuals
wage is set by collective bargaining;
whether or not he lives in an SMSA;
and whether or not he lives in the South. We present unrestricted
least

G. Chamberlain,

Multivariate

regression models for panel data

squares regressions of y, on xi,. ., xT. There are significant leads and lags; if
they are generated just by a random
intercept
(c), then ZZ should have a
distinctive
form. There is some evidence in favor of this, and hence some
justification
for analysis of covariance
estimation.
In this example, the leads
and lags could be interpreted
as due just to c, with E(y, 1x1,. . ., xT, c) =j?x, + c.

2. Identification
Suppose
technology,

that

a farmer

is producing

a product

o<p<1,

Y,=Px,+c+~,,

with

a Cobb-Douglas

t=l,...,7;

where y, is the logarithm


of output, x, is the logarithm
of a variable input
(labor), c represents
an input
that is fixed over time (soil quality),
U,
represents
a stochastic
input (rainfall), which is not under the farmers
control, and t indexes the seasons. We shall assume that the farmer knows
the product price (P) and the input price (W), which do not depend on his
decisions, and that he knows c. The factor input decision, however, is made
before knowing
u,, and we shall assume that xt is chosen to maximize
expected profits. Then the factor demand equation is
x, = {ln /I + ln CE(e 1%)I + ln(P,/&)

+ c}/(l -

p),

where L?$*is the information


set available to the farmer when he chooses xt.i
Although c is known to the farmer and affects his factor demand decisions,
we assume that it is not known to the econometrician.
He observes only
yr)
and
x
=(x1,.
.
.,
xT)
for
each
member
of
a
sample
of N farms.
y=(y1,...,
Consider the least squares regression of y, on x1 using just a single crosssection of the data. The population
counterpart
is

where E* is the minimum


regression function),

mean

711=cov(Yl,4Vx,),

square

error

linear

predictor

(the wide-sense

%=E(Y~)--~E(xJ

Cov(c, x,)#O if V(c)#O; then n, #p and the least squares estimator of /I does
not converge to fi as N-co.
Furthermore,
with a single cross-section,
there
would be no internal evidence of this heterogeneity
bias.
This example

is discussed

in Mundlak

(1961,1963)

and in Zellner,

Kmenta,

and Dr6ze (1966).

G. Chamberlain,

:Illrltirariate

regression

models for panel data

With more than one observation


per farm, however, we can consider the
counterpart
least squares regression of y, on x = (xi,. ., xT). The population
is
E*(y, I x) = pxt + E*(c ( x) + E*(u, I x).
Assume

that V(X) is non-singular.


E*(c 1x) = $ +

xx,

Then
Iz= I/

(x) cov(x, c).

Even if E*(u, / x) =O, there will generally


be a full set of lags and leads
if V(c) # 0. For example, if cov (xt, c) =cov (x,, c), t = 1,. . ., ?; then Iz is proportional
to the row sums of V-(x), and all of the elements
of I will
typically be non-zero. I think that it is generally true that E*(c lx) depends
on all of the x,s if it depends
on any of them. So the presence
of
heterogeneity
bias will be signalled by a full set of lags and leads. Also, if
E*(u) x)=0,
then
the wide-sense
multivariate
regression
will have a
distinctive pattern:

co+, x) v

(x) = p I, + 1A,

where 1 is a TX 1 vector of ones. The off-diagonal


elements within the same
column of ll, are all equal.
A common solution to the bias problem is some form of analysis of covariance. For example, we can form the farm specific means (j?=CT= 1 y,/T,
X =cT= 1 x,/T) and the deviations around them (jt = y, - j, 3, = x,-X), and then
run a pooled least squares regression of ~7 on 2. This is equivalent
to first
running the least squares regression of g* on & for each of the T cross-section
samples, and then forming a weighted average of the T slope coefficients. The
population
counterpart
of the tth least squares regression is

So the least squares regression of Y; on ?r provides a consistent


(as N-co)
estimator of fl only if E*(u, - Ul X,-Z?) =O. I would not expect this condition
to hold unless
E*(u,-uu,.

j-~~-x~,...,x~-x~~~)=O,

t = 2,

,) 7:

This analysis of covariance


estimator
was used by Mundlak (1961). Related estimators
have
been discussed by Balestra and Nerlove (1966), Wallace and Hussein (1969), Amemiya (1971),
Maddala (1971), and Mundlak (1978). Analysis of covariance in nonlinear models is discussed in
Chamberlain
(1980).

10

G. Chamberlain,

Mu&variate

so that x is strictly exogenous


differences.3 The strict exogeneity

regression models for panel data

when we transform
the model to first
restriction is testable since it implies that

E*(!,,-Yr~1IXz-X1,..., xT--x.-l)=~*(Yt-Y,-l
hence there are exclusion restrictions
A stronger condition is that
E*(u,lx

on the linear

I-+-x,-d
predictors.

t=l,...,T.

,,..., xT)=o,

This implies that Zl, has the form fiZ,+l1.


These restrictions
on n,are
testable; we can summarize
them by saying that x is strictly exogenous
conditional
on c. The restrictions
would fail to hold in the production
function
example
if u, is partly
predictable
from its past, so that
E[exp(u,) 1LAY,]depends on u, _ r, u, _ 2, . . .
Now suppose that the technology varies across the farms, so that
y,=bx,+c+u,,
where b is a random variable that is constant over time. We shall refer to b
and c as invariant
random
variables.
Our discussion
of E*(c lx) indicated
that it depends on all of the x,)s if it depends on any of them. I would expect
this to be true of E(c 1x) as well. This general characteristic
of invariant
random variables is formulated in the following condition:
Condition (C). Let x* =(xt,, . . ., xlK), where {tr,. . ., tK} is some proper subset of
random
variable. Then E(d 1x)=E(d I x*)
{l,..., T). Let d be an invariant
implies that E(d 1x) = E(d).
Suppose that the parameter
of interest is /l=E(b). If b or c is correlated
with x, then a least squares regression
of y, on x, will not provide
a
consistent estimator of /I. We have argued that such a heterogeneity
bias will
be signalled by a full set of lags and leads when we regress y, on (x1,. . ., xT).
Under what conditons can we infer that there is no bias if we observe only a
contemporaneous
relationship?
Proposition
1 provides some guidance; it can
be extended easily to the case of a finite distributed
lag.
Condition (R).

Prob (x, =x, _ 1) = 0 for some integer

Proposition

Suppose that

I.

E(y, I x, b, 4 = b x, + c,
3The strict exogeneity

terminology

n with i 5 n 5 T.

t=l,...,T.

is based on Sims (1972, 1974)

G. Chamberlain, Multivariate regression models

[f conditions

fir

panel data

11

(C) and (R) hold and if T 23. then


E(Y,

I4 = E(Y,I 4,

t=l,...,7:

implies that

where /I = E(b) = E(b ) x) and y = E(c) = E(c )x).


ProoJ:

The following

equalities

hold with probability

E(bI 4 = CE(Y, I 4 - E(Y,


So E(bIx)=E(bI
and

x,, x, _ I), and

I xn -

one:

d/(x, - xn- 11,

if T2 3, then

(C) implies

that

E(b 1x)= E(b),

~(clx)=E(y,lx)--E(blx)x,=E(y,lx,)--xx,;
hence E(c 1x) = E(c 1x1) and so E(c 1x) = E(c).

Q.E.D.

This analysis can be applied to linear transformations


of the process. If
we find that E(y, 1x) has a full set of lags and leads, then we can ask if
that is just due to E(c/x)#E(c).
Let dy,=y,-y,_,,
Ax~=x~--x~-~,
and
Ax = (Ax,, . . ., Ax,). Under the assumptions
of the proposition,
if
E(AY, 1A4 = E(AY, ( Ax,),
then
E(AY, 1A4 = B(A-4.
is possible
to find E(Ay, 1Ax)=E(Ay,
(Ax,) even though
or
example,
consider
the
stationary
case
in which cov(x,, b)
-W+)#W).
F
= cov (x,, b); then E*(b 1Ax) = E(b) and so E(b 1Ax)= E(b) if the regression
function of b on Ax is linear. Then we might find that E(Ay,) x) has a full set
of lags and leads even though E(Ay, 1Ax) does not.
The condition
that prob(x,=x,_
,)=O is necessary.
For consider
the
following
counter-example:
E(b ( x) = /II1 if x1 =. . . = xT, E(b 1x) = p2 if not
Note

that

it

(PI f PA. Then

G. Chamberlain,

12

but p2 #E(b)
distinction
here
only takes on
probability
that
for large 7:
The following
distinction;
it is

Multivariate

regression

unless
prob(x, = ... =
between continuous
a finite set of values,
x1 =. . . = xT, although

models
for panel data

xT) = 0. So there
is an important
and discrete distributions
for x. If x,
then there will generally
be positive
this probability
may become negligible

proposition
provides
some additional
insight
mto
based on a condition that is slightly weaker than (R):

Condition (R).

Prob(x,

Proposition 2.

Suppose

this

= x2 =. . . = xT) = 0.
that

E(Y,I x, b,4 = bxt+ c,

t=l,...,7;

where T 2 2. Assume that condition (R) holds and define

6=til
(Yt-m-+l(x,--v.
Then E(6j = E(b) if E((6j) < a.4
ProoJ

The following

equalities

E(l+,b,c)= i

hold with probability


i

b(x,-X)

one:

(x,-%)2=b;

I t=1

t=1

so if E(I6/)< co,
E(6j = E[E(6[ X, b, c)] = E(b).
Suppose
that (yil,. . ., yi,, xii,. . ., xiT),
from the distribution
of b,x). Define

6zt$l

(Yit-Pi)(xit-xi)

Q.E.D.
i= 1,. . ., N,

is

random

sample

til (xit-xi)2.

Then if the assumptions


of Proposition
2 are satisfied, cr= I &i/N converges
almost surely (as.) to E(b) as N-co.
It is important
that gi is an unbiased
estimator
of E(b), since we are actually taking the unweighted
mean of a
*The assumption
that E(161)< co is not innocuous.
For example, suppose that V(c)= V(b)=0
and (x,, y,) is independent
and identically distributed
(t = 1,. ., T) according to a bivariate normal
with
distribution.
Then h^=b+{ P(y, Ix~)/[(T-~)V(.X,)]}~
w, where w has Students t-distribution
T- 1 degrees

of freedom.

Hence Q/61) < cc only if T 2 3.

G. Chamberlain, Multiuariate regression models for panel data

13

large number of these estimators.


The lack of bias requires that x be strictly
exogenous
conditional
on b,c. It would not be sufficient to assume that
E(y, ( xt, b, c) = bx, + c. For example, if x, = y,_ 1, then our estimator would not
converge to E(b), due to the small T bias in least squares estimates of an
autoregressive
process.
Let Di =0 if xi1 = .. .. = xiT, Di= 1 if not. We can compute gi only for the
group with Di= 1. The sample mean of bi for that group converges as. to
E(b 1D = l), but we have no information
on E(b 1D = 0). So unless prob(D = 0)
= 0, any value for E(b) is consistent with a given value for E(b 1D = 1).5
If x, has a continuous
distribution,
then the assumption
that the regression
function is linear (E(y, 1xt, b, c) = bx, + c) is very restrictive; the implication
of
this assumption
(combined
with strict exogeneity) is that we can obtain an
unbiased
estimator
for b, and hence a consistent
(as N+co)
estimator
for
E(b). If x, is a binary variable, then the assumption
of linear regression is not
restrictive
at all; but there are fewer implications
since there is positive
probability
that 6is not defined for finite ?:
The following extension
lag is straightforward?
Proposition

1.

of Proposition

1 to the case of a finite distributed

Suppose that

E(y,IX,b,,...,b,,c)=

t=J+l,...,T

bjx,-j+c,

j=O

If condition (C) holds, ij

ii

1 X,-J-l

:1

X,-J

. . . . x&,

,fbr some integer n with 25 + 2 5 n 5 7; and if T 2 25 + 3, then


E(Y,

I4

= E(Y, I x,, . . .>X,-J),

t=J+1,...,7;

5A solution
could be based on Mundlaks
(1978a) proposal
that E(bIx)=$,,+$,
CT=, x1.
However, even if we assume that the regression function is linear in x1,. .,xT, it may be difficult
to justify the restriction
that only cx,
matters, unless T is large and we have stationarity:
cov (b, I,) = cov (b, x1) and V(x) band diagonal. (See Proposition
4 and the discussion preceding
it). Furthermore,
if cov(h, x,) = cov(b, x1), then E(b 1x2-x,,
.,xr -xT- 1)= E(b) (if the regression
function is linear), and so there is no heterogeneity
bias once we transform to first differences.
6We shall not discuss the problems
that arise from truncating
the lag distribution
when
T < J + 1. These problems are discussed in Griliches and Pakes (1980). By working with linear
transformations
of the process, it is fairly straightforward
to extend our analysis to general
rational distributed
lag schemes.

G. Chamberlain, Multivariate regression models for panel data

14
implies

that
E(y,Ix)=

Bjxt-j+Y,

j=O

where
pj = E(bj) = E(b, 1x)

and

y = E(c) = E(c Ix),

j=O,...,

The extension of Proposition


2 is also straightforward.
There
however, in the infinite lag case, which we shall take up next.
Large number of lags.

Suppose

E(.Yfldx),c)= f

J.

are new issues,

that

Bjxt-j+c2

i=O

where O(X) is the information


set (a-field) generated
by {. . .,x_ I, x0, x1,. . .},
and Cj=o /Ij x,_ j converges in mean square as J-+ co. Consider a regression
version of the Sims (1972) condition
for x to be strictly exogenous (y does
not cause x),
E(Yt

I 4) = E(Yt I x,2 xt -

19.. 4

Does this condition


imply
that E(c 1a(x))=E(c),
so that there is no
heterogeneity
bias?
We shall consider this question
in the context of a (strictly) stationary
stochastic
process. Since c does not change over time, it is an invariant
random variable. The following proposition
is proved in appendix A:
Proposition

3.

Ifd is an invariant random variable with E(ldl)< co, then

E(dIo(x))=E(dlx,,x,-,,...),
where t is any integer.
It follows that
n

E(Y,Ia(x))=E(cIx,,x,-,,...)+

C Pjx*-j
j=O

=E(y,Ix,,x,-I,...).
So we cannot

rule out heterogeneity

bias just because

y does not cause x. If

G. Chamberlain, Multivariate regression models for panel data

1s

a large number
of lags have been included, then a small number of leads
provide little additional
information
on c.
We can gain some insight into this result by considering
the linear
predictor of an invariant
random variable. Let
E*(c 1xl,. . ., x.)=IC/T+&XT,
where
2;. =(& i, . . .) A,,)

and

x;=(xl,...,xT).

Stationarity
implies that I,=rV(xT)l, where r =cov(xl, c) and 1 is a TX 1
vector of ones. Since V(x,) is a band-diagonal
matrix, I is approximately
an
eigenvector
of I+,)
for large T; hence &.x,EzIc~T=
1x,. For example, if
X, = px, _ i + u,, where v, is serially uncorrelated,
then

&-x,=~

(1-PI i xt+P(x, +x,) /cu+P) vxln


1

i=l

Now in this example, L&K, does not approach


a limit as T--+Lx unless
z = cov (x,, c) =O. In fact cov (xi, c) is zero here, since there is a non-trivial
linear predictor only if cj=O x,_ j/J converges to a non-degenerate
random
variable as J-rco.
The general
Proposition
then

case is covered

4.

by the following

proposition:

If d is an invariant random variable

E*(d I . ..) X_1,X&Xl)...

)=$

and E(d) < co, E(xf) < 00,

+/IT?,

where 2 is the limit in mean square of cJ= Ox, _ j/J as J+ co, t is any integer,
A=cov(d,i)/V(i)

=o

if

V(a)#O,

if

V(a)=O,

and
$ = E(d) - AE(f).

(See appendix

A for proof.)

The existence of the f limit, both in mean square and almost surely, is the
main result of ergodic theory and will be discussed further below. It is clear
that 2 is an invariant
random variable. If V(a)#O, then the x process has a
(non-degenerate)
invariant
component,
and conditioning
on the xs gives a

G. Chamberlain,

16

Multioariate

regression

modelsfor

panel data

non-trivial
linear predictor if 2 is correlated with c. However, if V(i)=O, then
cov(c, x,)=0 for all t, and the linear prediction
of c is not improved
by
conditioning
on the xs
It follows from Proposition
E*(Y,

4 that

I . . .. x,-1,x*,x,+

,,..

=E*(ytIxt,x,-,,...I

=i+jio
( 1
Bj++

xt - j + r(J),

where r(J) converges in mean square to zero as J-co.


So y does not cause x
according to Sims definition; but this does not imply that c is uncorrelated
with the xs. If we include a large number of lags, then the bias in any one
coefficient is a negligible A/J, but the bias in the sum of the lag coefficients
tends to 2 as J-co.
If we include K leads, then the sum of their coefficients
is approximately
K3,/J, which is close to zero when J is much larger than K.
If the pi are zero for j> J*, then the lag coefficients beyond that point will
be close to zero but their sum will be close to II.
there are non-degenerate
invariant
Under the stationarity
assumption,
random variables if and only if the process is not ergodic. The basic result
here is the (pointwise)
ergodic theorem:
Let g be a random
variable
on
and
let
g,(o)=g(Sw),
where
S
is
the
shift
(Q,F,P)
with E(lgl)< co,
transformation
(see appendix A); then the following limit exists as.:

The limit kj is an invariant


random
variable;
it is the expectation
conditional
on &, where f is the information
set (a-field) generated by
the invariant
random variables. If 1/(i) # 0 for some g, then the process
ergodic. In the ergodic case, all of the invariant
random
variables
degenerate distributions.
Suppose

of 8,
all of
is not
have

that

E(Y,I 44, A= b x, + c,
and let

Gil
Recall

condition

(Y,-Ylbt--x)

il h-3.

(R):

=...=x~)=O.

prob(s,

want

to

examine

the

G. Chamberlain, Multinariate

significance

of condition

So a limiting

version

(R) as T+n;

of condition

regression models for panel data

in the stationary

17

case. Note that

(R) is

prob[ I/(x, ) f) = 0] = 0.
If this condition

holds, then
~~(xlY1l~)-~(xlI8)~(YlI,a)~,

limb

E(4 I &)-cm,

T-rX

as.

I &)I2

and b is observable
as T-tco.
But if there is positive probability
that
T/(x, 1f) =O, then the identification
problem is more difficult. There is no
information
on b for the stayers; so-in order to obtain E(b), even as T-co,
we have to make untestable
assumptions
about the unobservable
part of the
b distribution.

3. Estimation
Consider
a sample
Y;=(x:,yi),
i = 1,. . .,X, where
xi. = (xi,, . ., xiK), yi
=(yil,. . ., yiM). We shall assume
that vi is independent
and identically
distributed
(i.i.d.) according
to some multivariate
distribution
with finite
fourth moments
and E(x,x:) non-singular.
Consider
the minimum
mean
square error linear predictors,

E*(yi, I xi)

m=l,...,M,

=dlxi>

which we can write as


E*bi 1xi) = LZxi

with

tZ = Ebi xi) [E(xi xi)]

We want to estimate ll subject to restrictions


and to test those restrictions.
For example, we may want to test whether a submatrix
of Ll has the form
/?Z+lA. I think that analysis of covariance estimation
should be accompanied
by such a test.
We shall not assume that the regression
function E(y, 1xi) is linear. For
although E@, 1xi, ci) may be linear (indeed, we hope that it is), there is generally
This agrees with the definition

in section

2 if xi includes

a constant.

18

G. Chamberlain,

Multivariate

regression models for panel data

no reason to insist that E(ci Ixi) is linear. So we shall present a theory of


inference for linear predictors. Furthermore,
even if the regression function is
linear, there may be heteroskedasticity
due to random
coefficients, for
example.8 So we shall allow V(j, 1xi) to be an arbitrary function of xi.

3.1. The estimation

of linear predictors

Let wi be the vector formed from the distinct elements of riri that have
non-zero
variance.
Since v;=(xi,yi)
is i.i.d., it follows that wi is i.i.d. This
simple observation
is the key to our results. Since IZ is a function of E(wi),
our problem is to make inferences about a function of a population
mean,
under random sampling.
Let p= E(w,) and let IL be the vector formed from the columns of ll [Z
= vet (IZ)]. Then YI is a function
of P: x=/z(p).
7i = h(w) is the least squares estimator:

Let

W= cy2 1 w,/N;

then

.=VeC[(~~XixI)-~~XiYI].

By the strong law of large numbers,


W converges almost surely to p as
(WL $), where p is the true value of p. Let n=h(~o). Since h(p) is
7~. The central limit theorem implies
continuous
at p =p, we have 2%
that
N-tee

J5$i-pO)%v(O,V(w,)).
Since h(p) is differentiable

at p = PO, the &method

gives

JN(iZ-d)%v(O,R),
where

We have derived the limiting distribution


of the least squares estimator.
This approach
was used by Cramer
(1946) to obtain
limiting
normal
*Anderson
(1969,1970),
Swamy (1970,1974),
Hsiao (1975), and Mundlak
estimators
that incorporate
the particular
form of heteroskedasticity
that
random coefficients.
See Billingsley (1979, example 29.1, p. 340) or Rao (1973, p. 388).

(1978a) discuss
is generated
by

G. Chamberlain,

Multivariate

regression models for panel data

19

distributions
for sample correlation
and regression
coefficients (p. 367); he
presents an explicit formula for the variance of the limiting distribution
of a
sample correlation
coefficient (p. 359). Kendall and Stuart (1961, p. 293) and
Goldberger
(1974) present
the formula
for the variance
of the limiting
distribution
of a simple regression coefficient.
Evaluating
the partial derivatives
in the formula for 52 is tedious. That
calculation
can be simplified since i has a ratio form. In the case of simple
regression with a zero intercept, we have rc= E(y,x,)/E(xj!) and

fi(kTO)=

i=l

Since I?= r x/N*E(x?),


with
fl

The definition
implies that

y.u.I
I

we obtain

Ql

xi)[fi(

distribution

by working

of rc gives E[(y, - rcxi)xi] = 0, and so the central

limit theorem

C(Yi- noxi)xillCfi

the same limiting

,$ m)].

E(xZ)l,

This approach
was used by White (1980) to obtain the limiting distribution
B (Proposition
5) we
for univariate
regression
coefficients. lo In appendix
follow Whites approach to obtain
s2 = E[iJJi-noxi)(yi

-nOx,) @@i;l (Xi xi) @, 1,

(1)

where
@, = E(qx;).
A consistent
estimator
sample moments,
o=&$

L 1

of 52 is readily

[~i-Bxi)(JJi-fiXi)@

n here

S,=

5 x,x:/N.
i=l

Also see White (1980a,b).

available

from

S;(Xixi)S;q

the

corresponding

AL?,

(2)

20

G. Chamberlain,

Multkariate

regression

If E(j, 1xi) =ZZx, so that the regression

If Vcvi 1Xi) is uncorrelated

If the conditional
variance
depend on xi, then

3.2. Imposing

restrictions:

modelsfiv

function

panel data

is linear,

then

with xix;, then

is homoskedastic,

so that

V(j, 1xi)= C does not

The minimum distance estimator

Since IZ is a function of E(w,), restrictions


on ZZ imply restrictions
on E(wi).
by the
Let the dimension
of r=E(wi)
be q. We shall specify the restrictions
condition
that ~1 depends only on a p x 1 vector 8 of unknown
parameters:
p
=g(8), where g is a known function and psq. The domain of 8 is X a subset
of p-dimensional
Euclidean space (RP) that contains the true value 8. So the
restrictions imply that ~=g(6)
is confined to a certain subset of Rq.
We can impose the restrictions
by using a minimum
distance estimator:
choose &to

where A, ff-i !P and !P is positive definite.


equivalent
to the following one: choose 6 to

This

minimization

problem

is

The properties
of 6 are developed, for example, in Malinvaud
(1970, ch. 9).
Since g does not depend on any exogenous variables, the derivation
of these
properties can be simplified considerably,
as in Chiang (1956) and Ferguson
(1958).
For completeness,
we shall state a set of regularity
conditions
and the
properties that they imply:

If there is one element

in ripi with zero variance,

then q = [(K + M)(K + M + 1)/2] - 1.

G. Chamberlain, Multivariate regression models for panel data

21

Yis a compact subset of RP that contains 6; g


Assumption 1. uN aAg(Bo);
is continuous
on yT and g(6)=g(O) for 0~ Y implies that 8=8;
A, s Y,
where Y is positive definite.

Assumption 2. $?[a,-g(O)]
O in which g has continuous
G = ag(eOym.
Choose 8 to

E. of
%(O, A); r contains
a neighborhood
second partial derivatives; rank (G) =p, where

minCa,-g(e)lA.Ca,-s(e)l.
0Er

Proposition

6.

If Assumption

Proposition
where

7.

Zf Assumptions

I is satisfied, then ea%Oo.


I and 2 are satisfied, then ,,/%(&O)%V(O,

If A is positive definite, then A -(CT A - 1 c)- 1 is positive semi-definite;


optimal choice for Y is A .
Proposition

8. If Assumptions I and 2 are satisfied,


definite matrix, and if A,%AI, then

Wwd831
(This is extended
B.)12

A),

hence an

if A is a q x q positive

4vC~,-g(B)1%2kp).

to the case of nested restrictions

in Proposition

8, appendix

Suppose that the restrictions involve only Zl. We specify the restrictions
by
the condition
that z=f (4, where 6 is s x 1 and the domain of 6 is Y,, a
subset of R that includes the true value 6. Consider the following estimator
of 6: choose s^ to

~:CA-f(6)]8-[li-f(S)],
1
Since
appendix
ch. 9).

the proofs are simple, we shall keep the paper self-contained


and include them in
B. The proofs are based on Chiang (1956), Ferguson
(1958), and Malinvaud
(1970,

G. Chamberlain, Multioariate regression models for panel data

22

where
definite.

fi is given
If Y, and

fi(&

in eq. (2) and

we assume

f satisfy Assumptions

so)qo,

[F

that

in eq. (1) is positive

1 and 2, then 6^3S,

n ~l Fj

- ),

and

where
F=

i3f(d)/W.

We can also estimate So by applying the minimum distance procedure to w


instead of to Iz. Suppose that the components
of wi are arranged
so that
w:=(w;,, wQ, where wil contains the components
of x&. Partition
p=E(wi)
conformably:
p = (PC;,&). Set 8 = (8r, VZ)= (8, pi). Assume
that
V(w,) is
positive definite. Now choose 6 to

and g,(n, ~1~)= pr. Then &r gives an estimator of 6;


distribution
as the estimator
8 that we obtained by
distance procedure to 12.(See Proposition
9, appendix
This framework
leads to some surprising
results
For a simple example, we shall use a univariate linear

E*(yi 1Xil,Xiz)=710 +
Consider

?Tl Xi1

it has the same limiting


applying the minimum
B.)
on efficient estimation.
predictor model,

+7Cz Xi2.

imposing the restriction


rc2 = 0. Then the conventional
estimator of
the
slope
coefficient
in
the
least
squares
regression
of
y
on x1. We
n1 is byx,,
shall show that this estimator
is generally less efficient than the minimum
distance
estimator
if the regression
function
is nonlinear
or if there is
heteroskedasticity.
Let fi,,it, be the slope coefficients in the least squares multiple regression
of y on x1,x2. The minrmum distance estimator
of a, under the restriction
rrZ =0 can be obtained
as 6=72r +r&
where r is chosen to minimize
the

G. Chamberlain,

(estimated)

variance

Multivariate

of the limiting

regression

models for panel data

distribution

where Qj, is the estimated


covariance
between
distribution.
Since 72, = bYx, - 722bx2x1,we have

23

of & this gives

tij and

I& in their

limiting

and
if V(y, 1xii, xi2)=a2,
then
w12/022 =
If E(Y, 1Xil,XiJ
is linear
-COv(Xi,,Xi2)/~(Xi~)
and s^= byxl. But in general 8# byxl and s^ is more
efficient than by_. The source of the efficiency gain is that the limiting
distribution
for ti, has a zero mean (if rc2=O), and so we can reduce variance
without
introducing
any bias if 72, is correlated
with b,,l. Under
the
assumptions
of linear regression
and homoskedasticity,
b,_ and 72, are
uncorrelated;
but this need not be true in the more general framework that
we are using.

3.3. Simultaneous
squares

equations:

A generalization

of two-

and three-stage

least

Given the discussion on imposing restrictions, it is not surprising that twostage least squares is not, in general, an efficient procedure
for combining
instrumental
variables.
I shall demonstrate
this with a simple example.
Assume that (yi,zirxil,xi2)
is i.i.d. according to some distribution
with finite
fourth moments, and that
yi = 6 Zi +

Vi,

where E(ui xii) = E(ui xi2) = 0. Assume also that E(zi xii) # 0, E(z, xi2) # 0. Then
there are two instrumental
variable estimators that both converge a.s. to 6:

$jcifI YixijlifI zixij,

j= 1,2,

fi{(;;)-(;)}-N(OJ)>
where the j, k element

of n is

2, = EC(Yi-dzi)2XijXi!J
Jk
E(zixii)E(zi.xik)

j,k=1,2.

24

G. Chamberlain, Multivariate regression models for panel data

The two-stage
least squares estimator
combines
8,

^
zi=7c1xil +ti2xi2, based on the least squares regression
sume that E[(xir, Xia)(Xil, xi2)] is non-singular),

and & by forming


of z on x1,x2 (as-

where
N

oiti,

ZiXil

i=l

Since i %a,

JN(&s,,

-6)

I(

rili~lzixil+722

C
i=l

has the same limiting

This suggests finding the r that minimizes


distribution
of fi[r($i
- 6) + (1 -r)(& -S)].
minimum distance estimator: choose e^to

zixi2
)

distribution

as

the variance of the limiting


The answer
leads to the

gives
e^=z&+(l-z)&,
where
~=(~+1,2)/(3.1+2~12+~22),
obtained
by using a
and Ijk is the j, k element
of A - . The estimator
consistent estimator of A has the same limiting distribution.
In general z #a since r is a function
of fourth moments
and a is not.
Suppose, for example, that zi = Xi2. Then IX= 0 but z # 0 unless
xil
E(xil

xi2
xi2)

>I=o.

If we add another equation, then we can consider the conventional


threestage least squares estimator. Its limiting distribution
is derived in appendix
B (Proposition
5); however, viewed as a minimum
distance estimator,
it is
using the wrong norm in general.

G. Chamberlain,

Consider

the standard

yi =

Multivariate

regression

simultaneous

nxi + ui,

models for panel data

equations

25

model:

E(Ui xi) = 0,

ryi + BXi = ui,


where rll+
B= 0 and Tui = vi. We are continuing
to assume that yi is
M x 1, xi is K x 1, r; = (xi yi) is i.i.d. according to a distribution
with finite fourth
moments (1 = 1,. .,N), and that E(x,,xi) is non-singular.
There are restrictions on
r and B: m(T, B)=O, where m is a known function. Assume that the implied
restrictions
on ll can be specified by the condition
that n=vec(lT)=f(Q
where the domain of 6 is r,, a subset of R that includes the true value So
(s 5 MK). Assume that Y, and f satisfy Assumptions
1 and 2; these properties
could be derived from regularity
conditions
on m, as in Malinvaud
(1970,
prop. 2, p. 670).
Choose

8 to
y:
E

where

[7i -

f(d)]&

1[72-f(s)],

is given

by eq. (2) and

we assume

that

in eq. (1) is positive

definite. Let
F= af(s)/S.
Then we have J%(~-~~)%NN(O,
A), where n
= (F Q - 1 F) . This generalizes Malinvauds
minimum distance estimator (p.
676); it reduces to his estimator if UPuy is uncorrelated
with xi xi, so that Q
= E(up up) @ [E(.qx;)] - (up = yi - Zlx,).
Now suppose that the only restrictions
on r and B are that certain
coefficients
are zero, together with the normalization
restrictions
that the
coefticient of yim in the mth structural
equation is one. Then we can give an
explicit formula for A. Write the mth structural equation as

where the components


of zi, are the variables in yi and xi that appear in the
mth equation with unknown
coefficients. Let there be M structural equations
and assume that the true value r is non-singular.
Let 6 =(S;, . . ., &) be s x 1,
and let r(6) and B(6) be parametric
representations
of r and B that satisfy
the zero restrrctions
and the normalization
rule. We can choose a compact
set Y, c R containing
a neighborhood
of the true value a, such that I(6) is
non-singular
for b E Y,. Then s = f(s), where f(s) = vet [ - r (6) B(S)].
Assume that f(s) =IL implies that 6=6, so that the structural parameters
are identified.

Then

Y, and

f satisfy Assumptions

1 and 2, and J%(8-6)

26

G. Chamberlain, Multivariate regression models for panel data

A!+
N(O, A). The formula for &r/&Y is given in Rothenberg (1973, p. 69),

an/as =

-(r

1 cgzK)p,,(zM

B d5;

)I,

where @,, is block-diagonal: @,, = diag {E(zilx:), . . ., E(Zi,Xi)}, and @,=E(X&).


So we have

n = {~,,[E(Op Up~

Xi

X:)] -

l UP:,>- ,

where I$ = royi + So xi. If up up is uncorrelated


to

with xi xj, then this reduces

n = {@J-E -(Up up) @ @,I] a;,> - l,


which is the conventional asymptotic covariance matrix for three-stage least
squares [Zellner and Thiel (1962)].
I shall present a generalization of three-stage least squares that has the
same limiting distribution as the generalized minimum distance estimator.
Let /I=vec(B) and note that R= -(f ~ @ I)/?. Then we have
[ji+(r-

z)/?]s)-[a+(r-

0 4Bl

=[(ro1)72+P]O-[(ro1)12+81,
where
o=(Z~~;l)E(f

UpU:r~XtX;)(Z~Qi;).

Let S,, be the following block-diagonal

matrix:

and let

where
iji = ~yi +

~Xi,

p+rO
7

B% BO.

G. Chamberlain,

Now replace

Multivariate

regression models ,fir palIe data

21

0 by

6 = (Z@s,- ) 9yzgs,- ),
and note that
(I 0 S,)[(r

0 472 + j?] = sxy - s:,s.

Then we have the following

distance

function:

This corresponds
to Basmanns
(1965)
squares. 3
Minimizing
with respect to 6 gives

a,,=(S,,!F

interpretation

of three-stage

least

s:,)-(s,,!Ps,,).

The limiting
distribution
(Proposition
5). We record

of this
it as:

estimator

is

derived

in

appendix

Proposition
10. fi(6^,,-6)%iV(0,A),
where A =(@,, P- @P:,)-l. This
generalized three-stage least squares estimator is asymptotically efficient within
the class of minimum distance estimators.

Finally, we shall
Suppose that
Yil =S;

where E(xiUil)=O,
system by setting

consider

zil

the generalization

of two-stage

least

squares.

Oil,

Zil is sl x 1, and

rank

[E(XiZ:l)] =sl.

We complete

the

yi, = nk xi + Uim,
where E(XiUi,)=O

(m=2,.

. ., M). SO z~,,,=x~

(m=2,.

. ., M), and

Let 6 =(6;, II;, . ., nJ and apply the minimum


distance procedure to obtain
8; since we are ignoring any restrictions
on R, (m = 2,. . ., M), 8 is a limited
information
minimum distance estimator.
13See Rothenberg (1973 p. 82). A more general derivation of this distance function can be
obtained by following Hanken (1982). Also see White (1982).

28

We have
gives

G. Chamberlain,

a($,

-@)1?N(O,

Multivariate

regression

n 11), and

models for panel data

evaluating

the partitioned

n 11= {E(Zi, $I [E((Diq)Z Xix:)] - E(Xi Zil)} _ ,

inverse

(4)

where
$1 =yi, -s;ozir.
We can obtain
the same limiting
distribution
generalization
of two-stage least squares: Let

by

using

the

following

and

where $I %Sy
then

(for example,

8r could

&;G2
= (Z; x!PE;,x2,)-

be an instrumental

(z;

variable

estimator);

x!P ,, Xy,).

This is the estimator of S, that we obtain by applying generalized three-stage


least squares
to the completed
system, with no restrictions
on A, (m
of this estimator is derived in appendix
= 2,. . .) M). The limiting distribution
B (Proposition
5):
Proposition 11. ,,/%(8,,,
-Sy)%N(O, A,,), where A, I is given in eq. (4). This
generalized two-stage least squares estimator is asymptotically
efficient in the
class of limited information minimum distance estimators.
3.4. Asymptotic
estimator

efjciency:

A comparison

with the quasi-maximum

likelihood

with E(r,) =z,


V(rJ
Assume that ri is i.i.d. (i= 1,. . ., N) from a distribution
where Z is a J x J positive definite matrix; the fourth moments
are
finite. Suppose that we wish to estimate functions of Z subject to restrictions.
Let C= vet(Z) and express the restrictions
by the condition
that a=g(O),
where g is a function from Yinto Rq with a domain YC RP that contains the
true value O(q = J*; p 5 J(J + 1)/2). Let
=Z,

S=kiil

(ri-FJ(ri-yi),

If the distribution
function is

of vi is multivariate

normal,

then

the log-likelihood

If there are no restrictions


on r, then the maximum likelihood
is a solution to the following problem: Choose 6 to solve

estimator

of 8

We shall derive the properties of this estimator when the distribution


of Yi is
not necessarily normal; in that case we shall refer to the estimator as a quasimaximum likelihood estimator (e^,,,).14
MaCurdy
(1979) considered
a version of this problem and showed that,
under suitable regularity
conditions,
,/%(gQML -0) has a limiting normal
distribution;
the covariance
matrix, however, is not given by the standard
information
matrix formula. We would like to compare this distribution
with
the distribution
of the minimum distance estimator.
This comparison
can be readily made by using Theorem
1 in Ferguson
(1958). In our notation,
Ferguson considers the following problem: Choose 8
to solve

w (s, e) [s-g(e)] = 0.
He derives
the limiting
distribution
of fi(&-- fI) under
regularity
conditions
on the functions
W and g. These regularity
conditions
are
particularly
simple in our problem since W does not depend on S. We can
state them as follows:

Assumption 3. E. c RP is an open set containing


to-one mapping
of E. into Rq with a continuous
second partial derivatives
in Eo; rank [ag(fI)/S]
singular for edo.

8; g is a continuous,
oneinverse; g has continuous
=p for OE 8,; Z(O) is non-

In addition, we shall need SaAg(Oo) and the central limit theorem


+%(S-g(e))%N(O,d),
where A = V[(U~-~~)@(U~-~~)].

result that

Then Fergusons
theorem
implies that the likelihood
equations
surely have a unique
solution
within So for sufficiently
large
14The quasi-maximum
Malinvaud
(1970, p. 678).
JE--B

likelihood

terminology

was

used

by the

Cowles

almost
N, and

Commission;

see

30

G. Chamberlain, Multioariate regression models for panel data

vmL4L- eO)%,N(O, A), where


A=(GYG,-GYAYG(GYG,)-,
and G=&(fl)/%,
Y=(Z@Zo)-.
It will be convenient
to rewrite this,
imposing the symmetry restrictions
on Z. Let G* be the J( J+ 1)/2 x 1 vector
formed by stacking the columns of the lower triangle of Z. We can define a
J* x [ J( J + 1)/2] matrix T such that CT= Ta*. The elements in each row of T
are all 0 except for a single element which is one; T has full column rank. Let
s= J-s* g(6)= Tg*(B), G* = ~g*(~)/S,
Y* = TYT;
then fi[S*
-s*(0)]
%N(O,A*), where A* is the covariance matrix of the vector formed from the
columns of the lower triangle of (ri-rO)(ri -TO). NOW we can set
/I =(e*
Consider

y*G*)-

the following
T$[s*

-g*(B)]

(G* y* A* y* G*)(e*
minimum

12.

1.

choose @MDto

of E. that contains a neighborhood


result is implied by Proposition
7.

If Assumption

same limiting distribution

estimator:

G*)-

A,{!?* -g*(O)],

where ris a compact subset


A,=%Y*. Then the following
Proposition

distance

y*

3 is satisfied,

as fi(gMD

then J%(&~~~

-0)

of 8 and

has the

- 0).

If A* is non-singular,
an optimal
minimum
distance
estimator
has
A,a%[A*-,
where [ is an arbitrary positive real number. If the distribution
of ri is normal, then A*- =iY*; but in general A*- is not proportional
to
Y*, since A* depends on fourth moments
and Y* is a function of second
moments.
So in general flPML is less efficient than the optimal minimum
distance estimator that uses
-1

,
;i;l(s~-s*)(s:-s-i)
1
where SF is the vector formed from the lower triangle of (ri-r](ri-f).
More generally, we can consider the class of consistent estimators that are
continuously
differentiable
functions of s-*: &=@*). Chiang (1956) shows that
the minimum distance estimator based on A*- has the minimal asymptotic
covariance
matrix within this class. The minimum
distance estimator based
on A, in (5) attains this lower bound.

G. Chamberlain,

Multivariate

regression

models for panel data

31

4. An empirical example
We shall present
an empirical
example
that illustrates
some of the
preceding
results. The data come from the panel of Young Men in the
National
Longitudinal
Survey (Parnes). The sample consists of 1454 young
men who were not enrolled in school in 1969, 1970, or 1971, and who had
complete
data on the variables
listed in table 1. Table 2a presents
an
unrestricted
least squares regression of the logarithm of wage in 1969 on the
union, SMSA, and region variables for all three years. The regression also
includes a constant, schooling, experience, experience squared, and race. This
regression is repeated using the 1970 wage and the 1971 wage.

Table
Characteristics
Young Men,

of National Longitudinal
Survey
not enrolled in school in 1969,
1970, 1971; N= 1454.

Variable

Mean

LWI
LWZ
LW3
Ul
u2
lJ3
lJlU2
lJIcJ3
U2U3
UI CJ2U3
SMSAI
SMSAZ
SMSA3
RNSI
RNS2
RNS3
s
EXP69
EXP692
RACE

5.64
5.74
5.82
0.336
0.362
0.364
0.270
0.262
0.303
0.243
0.697
0.627
0.622
0.409
0.404
0.410
11.7
5.11
39.8
0.264

Standard
deviation
0.423
0.426
0.437

2.64
3.71
46.6

LWI, L W2, LW3


~
logarithm
of hourly
earnings (in cents) on the current or last job in
1969,1970,1971;
UI, U2, U3 1 if wages on
current or last job set by collective bargaining,
0 if not, in 1969,1970,1971;
SMSAI,SMSAZ,
SMSA3 - 1 if respondent
in SMSA, 0 if not,
in 1969,1970,1971;
RNSI, RNSZ, RNS3 1, if
respondent in South, 0 if not, in 1969,1970,1971;
S ~ years of schooling completed;
EXP69 (S-age
in 1969 -6); RACE - 1 if respondent
black, 0 if not.

Xl.

1.f uoyxx

aJv sIoJ,a

p~vpuels

(6LO.O)
011'0

kmY0)
S80'0-

zzo'o-

(zsuo)

(SPO'O)
PIO'O

ZLO'O-

(ESOO)

(OPO'O)
610'0-

(LEOO)
OSO'O-

ZM7

lM7

aPnF'"!SUO!='J%~J flt'.

(2) .ba u! y Bu!sn pav+cym

(180'0)
P9Z'O
(PLO'O)
181'0

'CSNZI 'ZSNti 'ISNXt-VSSWS'ZVSSWSIVS'SWS~

(6Lo'O)
9PZ'O
(260'0)
811'0

'(83VX p59dX~'69dX5Sf

9SZ'O(990'0)
LZZ'O

LZI'O

bM)'O)

(tr 10)
(911'0)
tzxo-

LPO'O-

hO~0)

f/M7

ZLOO-

(IPO'O)

uog3as (z)

juapuadaa

alqe!JeA

.___

(ZLO'O)
821'0

1~

rn
In
zn
-.- ___~0 (sJolJa p.mpue)s pm) siua!ogao3
lseaI pap!llsamn

9z aw1

salenbs

znrn

(SLO'O)
260'0

.suo!ssaSaJ

i-n/n

(OLO'O)
9SI'O

fnzn

(POI'O)
281'0fl?Zl?nl/l

_.
e

(IEO'O) (OEO'O)

SEI'O

(szuo)

(SZO'O)
600'0-

tn

(EZO'O)
9Po'O

ZM7

.CM7

IPOO

(EZO'O)
8Po'O

(OEO'O)

OSI'O

(8200)

In

ajqe!mh
luapuadaa

l/Ml

(szo'o)

zn

pun) slua!xjja03

ILI'O

';Zg

aJe slolJa pmpueis ayL $wu~ 26ydxy fj9dxg s I) apnpu! suo!ssaJ%aJ I[V,
(950'0) (SSO'O)
880'0
COO'0

(LZO'O) (9zo.o)
980'0
010'0

pa~yno~m

16LO'O)
PLO'0

(I90'0) (S90'0)

.ba u! Q Bu!sn

(E60'0)
OSO'O

(660'0)
s90'0

(PSO'O) (SSO'O)
IOO'OZEO'O

IVSWS

Iseal pal3ysaJun

:Jo (sJoJla pmpuels

~__

Zf I0

(8LO'O)
ZCZ'O-

(601'0)
6EO'O-

';pg'

cvsws ZVSWS
__.

sanmbs

f800

SSI'O-

(Z6D'O)

'g:;'

ESO'O
(OLO'O)
801'o_

ISNkI

ozo'o

Ci'Ntl ZSNU

Rwo!ssaSaJ

ez alw

G. Chamberlain,

Multivariate

regression

models for panel data

33

In section 2 we discussed the implications


of a random intercept (c) and a
random slope (b). If the leads and lags are due just to c, then the submatrices
of LI corresponding
to the union, SMSA, or region coefficients should have
the form /3l+U.
Consider,
for example,
the 3 x 3 submatrix
of union
coefficients ~ the off-diagonal
elements in each column should be equal to
each other. So we compare 0.048 to 0.046, 0.042 to 0.041, and -0.009 to 0.010;
not bad.
In table 2b we add a complete set of union interactions,
so that, for the
union variables
at least, we have a general regression
function.
Now the
submatrix
of union coefficients is 3 x 7. If it equals (pZ3,0)+Zl, then in the
first three columns,
the off-diagonal
elements within a column should be
equal; in the last four columns, all elements within a column should be equal.
I first imposed the restrictions
on the SMSA and region coefficients, using
the minimum
distance estimator. fl is estimated using the formula in eq. (2),
section 3.1, and A,=&.
The minimum
distance statistic (Proposition
8) is
6.82, which is not a surprising value from a ~(10) distribution.
If we impose
the restrictions
on the union coefficients as well, then the 21 coefficients in
table 2b are replaced by 8: one fl and seven 2s. This gives an increase in the
minimum
distance
statistic
(Proposition
8, appendix
B) of 19.36-6.82
= 12.54, which is not a surprising value from a ~(13) distribution.
So there is
no evidence here against the hypothesis
that all the lags and leads are
generated by c.
Consider a transformation
of the model in which the dependent variables are
LWl, LW2-LWl,
and LW3-LW2.
Start with a multivariate
regression on
all of the lags and leads (and union interactions);
then impose the restriction that
U, SMSA, and RNS appear in the LW2- L WI and LW3 - LW2 equations
only as contemporaneous
changes (E(y, - y, 1 1x1, x2, x3) = p(x, - x,_ J). This
is equivalent
to the restriction
that c generates all of the lags and leads, and
we have seen that it is supported by the data. I also considered imposing all
of the restrictions
with the single exception of allowing separate coefficients
for entering and leaving union coverage in the wage change equations.
The
estimates (standard errors) are 0.097 (0.019) and -0.119 (0.022). The standard
error on the sum of the coefficients is 0.024, so again there is no evidence
against the simple model with E(y, 1x1, x2, x3, c) = /IX, + c.15
However, since the x,s are binary variables, condition (R) in Proposition
1

Using
May-May
CPS matches for 197771978, Mellow (1981) reports coefftcients (standard
errors) of 0.087 (0.018) and -0.069 (0.020) for entering and leaving union membership
in a wage
change regression.
The sample consists of 6,602 males employed as non-agricultural
wage and
salary workers in both years. He also reports results for 2,177 males and females whose age was
525. Here the coefficients on entering and leaving union membership
are quite different: 0.198
(0.031) and -0.035
(0.041); it would be useful to reconcile these numbers with our results for
young men. Also see Stafford and Duncan (1980).

34

G. Chamberlain,

Multivariate

regression

models for panel data

does not hold. For example, the union coefticients provide some evidence
that E(b 1x1, x2,x,) is constant for the individuals
who experience a change in
if x,+x,+x,#O
or 33; but there is
union coverage [i.e., E(b 1x,,x,,x,)=if
no direct evidence on E(b 1x1, x2, x3) for the people who are always covered
or never covered. Furthermore,
our alternative
hypothesis has no structure.
It might be fruitful, for example, to examine the changes in union coverage
jointly with changes in employer.
Table 3a exhibits the estimates that result from imposing the restrictions
using
the optimal
minimum
distance
estimator.j
We also give the
conventional
generalized least squares estimates. They are minimum
distance
estimates in which the weighting matrix (AN) is the inverse of

We give the conventional


standard
errors based on (pfi;F)-
and the
standard errors calculated according to Proposition
7, which do not require
an assumption
of homoskedastic
linear regression. These standard errors are
larger than the conventional
ones, by about 30%. The estimated
gain in
efficiency from using the appropriate
metric is not very large; the standard
errors calculated according to Proposition
7 are about 10% larger when we
use conventional
GLS instead of the optimum minimum distance estimator.
Table 3a also presents
the estimated
Ils. Consider,
for example,
an
individual
who was covered by collective
bargaining
in 1969. The linear
predictor of c increases by 0.089 if he is also covered in 1970, and it increases
by an additional
0.036 if he is covered in all three years. The predicted c for
someone who is always covered is higher by 0.102 than for someone who is
never covered.
Table 3b presents estimates under the constraint
that I=U. The increment
in the distance statistic is 89.08 - 19.36= 69.72, which is a surprisingly
large
value to come from a x2 (13) distribution.
If we constrain
only the union As
to be zero, then the increment
is 57.06- 19.36= 37.7, which is surprisingly
large coming from a x2(7) distribution.
So there is strong evidence for
heterogeneity
bias.
The union coefficient declines from 0.157 to 0.107 when we relax the A=0
restriction.
The least squares estimates for the separate cross-sections,
with
16We did not find much evidence for nonstationarity
in the slope coefficients. If we allow the
union fi to vary over the three years, we get 0.105, 0.103, 0.114. The distance statistic declines
IO 18.51, giving 19.36- 18.51 =0X5; this is not a surprising value from a x*(2) distribution. If we
also free up /I for SMSA and RNS, then the decline in the distance statistic is 18.51- 13.44
= 5.07, which is not a surprising value from a x(4) distribution.

0.086
(0.025)
~ 0.008
(0.046)

SMSAZ

- 0.067
(0.040)

- 0.023
(0.030)
SMSA

u2

UI

0.032
(0.046)

SMSA3

-0.082
(0.037)
0.156
(0.057)

lJllJ2

0.100
(0.072)

RNSl

0.152
(0.062)

UlU3

- 0.02 I
(0.077)

RNS2

0.195
(0.059)

r/2 U3

-0.085
(0.040)
(0.052)

- 0.082
(0.045)

RNS

-0.128
(0.068)

RNS3

-0.229
(0.085)

lJIUZU3

E*Cyjx)=nx=n,x,+n,x,;
x;=(Ul,
U2, U3, UIU2, UIU3, U2U3, UIU2U3, SMSAl,
SMSA2, SMSAS,
RNSI, RNS2, RNS3); x; =( 1, S, EXP69, EXP69, RACE). ZZ, = (/J,Z,, 0, BSMSAZJ,fiRNSZ3)+ 12; ZZ2 is unrestricted.
The restrictions are expressed as n = F6, where 6 is unrestricted.
B and 1 are minimum distance estimates with
A, =d in eq. (2), section 3.1; to,., and lo,, are minimum distance estimates with Ai = 6, in eq. (6), section
one based on
error for /?o,, is the conventional
4 ([or., is not shown in the table). The first standard
The x2
(FR,
4-l;
the second standard error for &rs is based on (FSZ;F)~Fn;1d6,F(F~;F)~.
statistics are computed from N[k-FG]&[?i-Fs].

x2(23) = 19.36

.I

0.050
(0.017)
(0.021)

0.121
(0.013)
(0.018)

/%,s

(i-3

0.0.56
(0.020)

0.107
(0.016)

errors) ok
SMS.4

(and standard

Coefficients

estimates.

Table 3a
Restricted

36

G. Chamberlain,

Multivariate

regression

models for panel data

Table 3b
Restricted

estimates

under

Coefficients

s^

the constraint

(and standard

that I = 0.
errors) of:

SMSA

RNS

0.157
(0.012)

0.120
(0.013)

-0.150
(0.016)

x2(36) = 89.08
See footnote

to table 3a.

The least squares estimates for the separate cross-sections, with no leads or lags, give union coefficients of 0.195, 0.189, and 0.191 in 1969, 1970, and 1971.¹⁷ So the decline in the union coefficient, when we allow for heterogeneity bias, is 32% or 44%, depending on which biased estimate (0.16 or 0.19) one uses. The SMSA and region coefficients also decline in absolute value. The least squares estimates for the separate cross-sections give an average SMSA coefficient of 0.147 and an average region coefficient of -0.131. So the decline in the SMSA coefficient is either 53% or 62%, and the decline in absolute value of the region coefficient is either 45% or 37%.

5. Conclusion

We have examined the relationship between heterogeneity bias and strict exogeneity in distributed lag regressions of y on x. The relationship is very strong when x is continuous, weaker when x is discrete, and non-existent as the order of the distributed lag becomes infinite.

The individual specific random variables introduce nonlinearity and heteroskedasticity. So we have provided an appropriate framework for the estimation of multivariate linear predictors. We showed that the optimal minimum distance estimator is more efficient, in general, than the conventional estimators such as quasi-maximum likelihood. We provided computationally simple generalizations of two- and three-stage least squares that achieve this efficiency gain.
¹⁷ Using the NLS Young Men in 1969 (N = 1362), Griliches (1976) reports a union membership coefficient of 0.203. Using the NLS Young Men in a pooled regression for 1966-1971 and 1973 (N = 470), Brown (1980) reports a coefficient of 0.130 on a variable measuring the probability of union coverage. (The union coverage question was asked only in 1969, 1970, and 1971; so this variable is imputed for the other four years.) The coefficient declines to 0.081 when individual intercepts are included in the regression. His regressions also include a large number of occupation and industry specific job characteristics.


Some of these ideas were illustrated using the sample of Young Men in the National Longitudinal Survey. We examined regressions of wages on the leads and lags in union coverage, SMSA, and region. The results indicate that the leads and lags could have been generated just by a random intercept. This gives some support for analysis of covariance type estimates; these estimates indicate a substantial heterogeneity bias in the union, SMSA, and region coefficients.
Appendix A

Let $\Omega$ be a set of points, where $\omega \in \Omega$ is a doubly infinite sequence of vectors of real numbers: $\omega = \{\ldots, \omega_{-1}, \omega_0, \omega_1, \ldots\} = \{\omega_t,\ t \in I\}$, where $\omega_t \in R^q$ and $I$ is the set of all integers. Let $z_t(\omega) = \omega_t$ be the $t$th coordinate function. Let $\mathcal{F}$ be the $\sigma$-field generated by sets of the form

$$A = \{\omega : z_t(\omega) \in B_1, \ldots, z_{t+k}(\omega) \in B_k\},$$

where $t, k \in I$ and the $B$'s are $q$-dimensional Borel sets. Let $P$ be a probability measure defined on $\mathcal{F}$ such that $\{z_t,\ t \in I\}$ is a (strictly) stationary stochastic process on the probability space $(\Omega, \mathcal{F}, P)$.

The shift transformation $S$ is defined by $z_t(S\omega) = z_{t+1}(\omega)$. It is an invertible, measure-preserving transformation. A random variable $d$ defined on $(\Omega, \mathcal{F}, P)$ is invariant if $d(S\omega) = d(\omega)$ except on a set with probability measure zero (almost surely, or a.s.). A set $A \in \mathcal{F}$ is invariant if its indicator function is an invariant random variable.

We shall use $E(d \mid \mathcal{G})_\omega$ to denote the conditional expectation of the random variable $d$ with respect to the $\sigma$-field $\mathcal{G}$, evaluated at $\omega$. Let $x_t$ be a component of $z_t$, let $\sigma(x)$ denote the $\sigma$-field generated by $\{\ldots, x_{-1}, x_0, x_1, \ldots\}$, and let $E(d \mid x_t, x_{t-1}, \ldots)$ denote the expectation of $d$ conditional on the $\sigma$-field generated by $x_t, x_{t-1}, \ldots$.
Proposition 3.  If $d$ is an invariant random variable with $E(|d|) < \infty$, then

$$E(d \mid \sigma(x)) = E(d \mid x_t, x_{t-1}, \ldots) \quad \text{a.s.},$$

where $t$ is any integer.

Proof.  First we shall show that $E(d \mid \sigma(x))$ is an invariant random variable. Let $f(\omega) = d(S\omega)$. A change of variable argument shows that

$$E(d \mid \sigma(x))_{S\omega} = E(f \mid S^{-1}\sigma(x))_\omega \quad \text{a.s.}$$

[See Billingsley (1965, example 10.3, p. 109).] Since $d$ is an invariant random variable, we have $d(S\omega) = d(\omega)$ a.s.; also $S^{-1}\sigma(x) = \sigma(x)$. Hence

$$E(d \mid \sigma(x))_{S\omega} = E(d \mid \sigma(x))_\omega \quad \text{a.s.}$$

Let $\sigma(x_t, x_{t-1}, \ldots)$ denote the $\sigma$-field generated by $(x_t, x_{t-1}, \ldots)$, and let $\mathcal{T} = \bigcap_{t \in I} \sigma(x_t, x_{t-1}, \ldots)$ be the left tail $\sigma$-field generated by the $x$ process. Since $E(d \mid \sigma(x))$ is an invariant random variable, there is a version of $E(d \mid \sigma(x))$ that is measurable $\mathcal{T}$. [See Rozanov (1967, lemma 6.1, p. 162).] Hence $E(d \mid \sigma(x)) = E(d \mid \mathcal{T})$ a.s., and so $E(d \mid \sigma(x)) = E(d \mid \sigma(x_t, x_{t-1}, \ldots))$.  Q.E.D.
Let $d$ be an invariant random variable and assume that $E(d^2) < \infty$, $E(x_t^2) < \infty$. Consider the Hilbert space of random variables generated by the linear manifold spanned by the variables $\{d, \ldots, x_{-1}, x_0, x_1, \ldots\}$, closed with respect to convergence in mean square. We also include a constant (1) in the space. The inner product is $(a, b) = E(ab)$. Then the linear predictor $E^*(d \mid \ldots, x_{-1}, x_0, x_1, \ldots)$ is defined as the projection of $d$ on the closed linear subspace generated by $\{1, \ldots, x_{-1}, x_0, x_1, \ldots\}$.
Proposition 4.  If $d$ is an invariant random variable and $E(d^2) < \infty$, $E(x_t^2) < \infty$, then

$$E^*(d \mid \ldots, x_{-1}, x_0, x_1, \ldots) = \psi + \lambda \bar{x},$$

where $\bar{x}$ is the limit in mean square of $\sum_{j=1}^{J} x_{t-j}/J$ as $J \to \infty$, $t$ is any integer, and

$$\lambda = \operatorname{cov}(d, \bar{x})/V(\bar{x}) \quad \text{if} \quad V(\bar{x}) \neq 0, \qquad \lambda = 0 \quad \text{if} \quad V(\bar{x}) = 0, \qquad \psi = E(d) - \lambda E(\bar{x}).$$

Proof.  The existence of the limit is implied by the mean ergodic theorem [Billingsley (1965, theorem 2.1, p. 21)]. Since $d$ is an invariant random variable, we have $\operatorname{cov}(d, x_t) = \operatorname{cov}(d, x_1)$ for all $t$. Let $\bar{x}_J = \sum_{j=1}^{J} x_{t-j}/J$. Then $\operatorname{cov}(d, \bar{x}_J) = \operatorname{cov}(d, x_1)$, and so $\operatorname{cov}(d, \bar{x}) = \lim_{J \to \infty} \operatorname{cov}(d, \bar{x}_J) = \operatorname{cov}(d, x_1)$. Since $\bar{x}$ is an invariant random variable, we have $\operatorname{cov}(\bar{x}, x_t) = \operatorname{cov}(\bar{x}, x_1)$, and so $V(\bar{x}) = \lim_{J \to \infty} \operatorname{cov}(\bar{x}, \bar{x}_J) = \operatorname{cov}(\bar{x}, x_1)$. Hence

$$\operatorname{cov}(d - \psi - \lambda\bar{x}, x_t) = \operatorname{cov}(d, x_1) - \lambda \operatorname{cov}(\bar{x}, x_1) = \operatorname{cov}(d, \bar{x}) - \lambda V(\bar{x}) = 0, \qquad t \in I.$$

Since we also have $E(d - \psi - \lambda\bar{x}) = 0$, the proof is complete.  Q.E.D.
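
Proposition 4 is the population version of the pattern exploited in the empirical section: an invariant effect has the same linear-predictor coefficient on every lead and lag of x. A small simulation (our own illustrative data-generating process, not the paper's data) displays the pattern in a finite panel:

    import numpy as np

    rng = np.random.default_rng(0)
    N, T = 100_000, 4
    c = rng.normal(size=(N, 1))                   # invariant individual effect
    x = 0.8 * c + rng.normal(size=(N, T))         # x_t correlated with c
    y3 = x[:, 2] + c[:, 0] + rng.normal(size=N)   # y_3 = x_3 + c + noise

    # Least squares of y_3 on a constant and all leads and lags of x:
    X = np.column_stack([np.ones(N), x])
    coef = np.linalg.lstsq(X, y3, rcond=None)[0][1:]
    print(coef.round(3))   # approx [0.225, 0.225, 1.225, 0.225]: equal loadings
                           # on the off-period x's, the Pi = beta*I + l*lambda'
                           # pattern tested in table 3a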


Appendix B

Let $r_i' = (x_i', y_i')$, $i = 1, \ldots, N$, where $x_i' = (x_{i1}, \ldots, x_{iK})$ and $y_i' = (y_{i1}, \ldots, y_{iM})$. Write the $m$th structural equation as

$$y_{im} = \delta_m' z_{im} + u_{im}, \qquad m = 1, \ldots, M,$$

where the components of $z_{im}$ are the variables in $y_i$ and $x_i$ that appear in the $m$th equation with unknown coefficients. Let $S_{zx}$ be the following block-diagonal matrix:

$$S_{zx} = \operatorname{diag}\Bigl\{N^{-1}\sum_{i=1}^{N} z_{i1} x_i', \ldots, N^{-1}\sum_{i=1}^{N} z_{iM} x_i'\Bigr\},$$

and

$$S_{xz} = S_{zx}', \qquad s_{xy} = N^{-1}\Bigl(\bigl(\textstyle\sum_{i=1}^{N} x_i y_{i1}\bigr)', \ldots, \bigl(\textstyle\sum_{i=1}^{N} x_i y_{iM}\bigr)'\Bigr)'.$$

Let $v_i^{0\prime} = (v_{i1}^0, \ldots, v_{iM}^0)$, where $v_{im}^0 = y_{im} - \delta_m^{0\prime} z_{im}$ and $\delta_m^0$ is the true value of $\delta_m$; let $\Phi_{zx} = E(S_{zx})$ and $\Phi_{xz} = \Phi_{zx}'$. Let $\delta = (\delta_1', \ldots, \delta_M')'$ be $s \times 1$, and set

$$\hat\delta = (S_{zx} D^{-1} S_{xz})^{-1} (S_{zx} D^{-1} s_{xy}).$$

Proposition 5.  Assume that (1) $r_i$ is i.i.d. according to some distribution with finite fourth moments; (2) $E[x_i(y_{im} - \delta_m^{0\prime} z_{im})] = 0$ $(m = 1, \ldots, M)$; (3) $\operatorname{rank}(\Phi_{zx}) = s$; and (4) $D \xrightarrow{a.s.} \Psi$ as $N \to \infty$, where $\Psi$ is a positive definite matrix. Then $\sqrt{N}(\hat\delta - \delta^0) \xrightarrow{d} N(0, \Lambda)$, where

$$\Lambda = (\Phi_{zx}\Psi^{-1}\Phi_{xz})^{-1}\Phi_{zx}\Psi^{-1} E(v_i^0 v_i^{0\prime} \otimes x_i x_i') \Psi^{-1}\Phi_{xz}(\Phi_{zx}\Psi^{-1}\Phi_{xz})^{-1}.$$

Proof.  We have

$$\sqrt{N}(\hat\delta - \delta^0) = (S_{zx} D^{-1} S_{xz})^{-1} S_{zx} D^{-1} \sum_{i=1}^{N} (v_i^0 \otimes x_i)/\sqrt{N}.$$

By the strong law of large numbers, $S_{zx} \xrightarrow{a.s.} \Phi_{zx}$; $\Phi_{zx}\Psi^{-1}\Phi_{xz}$ is an $s \times s$ positive definite matrix since $\operatorname{rank}(\Phi_{zx}) = s$. So we obtain the same limiting distribution by considering

$$(\Phi_{zx}\Psi^{-1}\Phi_{xz})^{-1}\Phi_{zx}\Psi^{-1} \sum_{i=1}^{N} (v_i^0 \otimes x_i)/\sqrt{N}.$$

Note that $v_i^0 \otimes x_i$ is i.i.d. with $E(v_i^0 \otimes x_i) = 0$, $V(v_i^0 \otimes x_i) = E(v_i^0 v_i^{0\prime} \otimes x_i x_i')$. Then applying the central limit theorem gives $\sqrt{N}(\hat\delta - \delta^0) \xrightarrow{d} N(0, \Lambda)$.  Q.E.D.


This result includes as special cases a number of the commonly used estimators. If $z_{im} = x_i$ $(m = 1, \ldots, M)$ and $D = I$, then $\hat\delta$ is the least squares estimator and $\Lambda$ reduces to the formula for $\Omega$ given in eq. (1) of section 3.1. If $\Psi = E(v_i v_i') \otimes E(x_i x_i')$, then $\Lambda$ is the asymptotic covariance matrix for the three-stage least squares estimator. If $\Psi = E(v_i v_i' \otimes x_i x_i')$, then $\Lambda$ is the asymptotic covariance matrix for the generalized three-stage least squares estimator [eq. (3), section 3.3]. If

$$\Psi = \operatorname{diag}\{E(v_{i1}^2)E(x_i x_i'), \ldots, E(v_{iM}^2)E(x_i x_i')\},$$

then we have the asymptotic covariance matrix for two-stage least squares. If

$$\Psi = \operatorname{diag}\{E(v_{i1}^2 x_i x_i'), \ldots, E(v_{iM}^2 x_i x_i')\},$$

we have the asymptotic covariance matrix for generalized two-stage least squares. [$\Lambda$ is given in eq. (4), section 3.3.]
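
As a concrete illustration, here is a sketch of $\hat\delta$ and the robust covariance implied by Proposition 5 (our own function and variable names; scipy's block_diag builds $S_{zx}$):

    import numpy as np
    from scipy.linalg import block_diag

    def prop5_estimator(Z_list, X, Y, D):
        # Z_list: M arrays (N x k_m), right-hand-side variables per equation;
        # X: N x K instruments; Y: N x M outcomes; D: MK x MK weight matrix.
        N, M = Y.shape
        K = X.shape[1]
        Szx = block_diag(*[Z.T @ X / N for Z in Z_list])
        sxy = np.concatenate([X.T @ Y[:, m] / N for m in range(M)])
        B = Szx @ np.linalg.inv(D)
        H = np.linalg.inv(B @ Szx.T)             # (Szx D^{-1} Sxz)^{-1}
        delta = H @ B @ sxy
        # Residuals v_im = y_im - z_im' delta_m, splitting delta by equation.
        splits = np.cumsum([Z.shape[1] for Z in Z_list])[:-1]
        V = np.column_stack([Y[:, m] - Z @ d for m, (Z, d)
                             in enumerate(zip(Z_list, np.split(delta, splits)))])
        W = np.einsum('im,ik->imk', V, X).reshape(N, M * K)   # rows v_i (x) x_i
        S = W.T @ W / N                          # estimates E(vv' (x) xx')
        Lam = H @ B @ S @ B.T @ H / N            # sandwich variance of delta
        return delta, Lam

Plugging in the special-case choices of $D$ listed above reproduces two- and three-stage least squares and their generalized versions.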
Next we shall derive the properties of the minimum distance estimator. Let

$$D_N(\theta) = [a_N - g(\theta)]' A_N [a_N - g(\theta)],$$

and choose $\hat\theta$ to

$$\min_{\theta \in \Gamma} D_N(\theta).$$

Assumptions 1 and 2 are stated in section 3.2.

Proposition 6.  If Assumption 1 is satisfied, then $\hat\theta \xrightarrow{a.s.} \theta^0$.

Proof.  Let $D^*(\theta) = [g(\theta^0) - g(\theta)]'\Psi[g(\theta^0) - g(\theta)]$. $D_N$ a.s. converges uniformly to $D^*$ on $\Gamma$. Let $B$ be a neighborhood of $\theta^0$ and set $\bar\Gamma = \Gamma - B$. Then

$$\min_{\theta \in \bar\Gamma} D_N(\theta) \xrightarrow{a.s.} \min_{\theta \in \bar\Gamma} D^*(\theta) = \varepsilon.$$

Since $\varepsilon > 0$ and $D_N(\hat\theta) \xrightarrow{a.s.} 0$, it must be that $\hat\theta \in B$ a.s. for $N$ sufficiently large. Since $B$ is an arbitrary neighborhood of $\theta^0$, we have shown that $\hat\theta \xrightarrow{a.s.} \theta^0$.  Q.E.D.

Proposition 7.  If Assumptions 1 and 2 are satisfied, then $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where

$$\Lambda = (G'\Psi G)^{-1} G'\Psi\Delta\Psi G (G'\Psi G)^{-1}.$$

If $\Delta$ is positive definite, then $\Lambda - (G'\Delta^{-1}G)^{-1}$ is positive semi-definite; hence an optimal choice for $\Psi$ is $\Delta^{-1}$.

Proof.  Let

$$s_N(\theta) = \partial D_N(\theta)/\partial\theta = -2\,(\partial g'(\theta)/\partial\theta)\,A_N[a_N - g(\theta)].$$

Since $\hat\theta \xrightarrow{a.s.} \theta^0$, for $N$ sufficiently large we a.s. have $\hat\theta \in \operatorname{int}\Gamma$ and $s_N(\hat\theta) = 0$. The mean value theorem implies that

$$s_N(\hat\theta) = s_N(\theta^0) + (\partial s_N(\theta^*)/\partial\theta')(\hat\theta - \theta^0) \quad \text{a.s.},$$

for sufficiently large $N$, where $\theta^*$ is on the line segment connecting $\hat\theta$ and $\theta^0$. [There is a different $\theta^*$ for each row of $\partial s_N(\theta^*)/\partial\theta'$; the measurability of $\theta^*$ follows from lemmas 2 and 3 of Jennrich (1969).] Since $\theta^* \xrightarrow{a.s.} \theta^0$, direct evaluation shows that

$$\partial s_N(\theta^*)/\partial\theta' \xrightarrow{a.s.} 2G'\Psi G,$$

which is non-singular. Hence

$$\sqrt{N}(\hat\theta - \theta^0) = -[\partial s_N(\theta^*)/\partial\theta']^{-1}\sqrt{N}\,s_N(\theta^0) \quad \text{a.s.}$$

for sufficiently large $N$. We obtain the same limiting distribution by considering $(G'\Psi G)^{-1}G'\Psi\sqrt{N}[a_N - g(\theta^0)]$. Hence $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$.

To find an optimal $\Psi$, note that there is a non-singular matrix $C$ such that $\Delta = CC'$. Let $\tilde G = C^{-1}G$ and $B = (G'\Psi G)^{-1}G'\Psi C$. Then we have

$$\Lambda - (G'\Delta^{-1}G)^{-1} = BB' - (\tilde G'\tilde G)^{-1} = B[I_q - \tilde G(\tilde G'\tilde G)^{-1}\tilde G']B',$$

which is positive semi-definite.  Q.E.D.

Proposition 8.  If Assumptions 1 and 2 are satisfied, if $\Delta$ is positive definite, and if $A_N \xrightarrow{a.s.} \Delta^{-1}$, then

$$N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)] \xrightarrow{d} \chi^2(q - p).$$


Proof.  For sufficiently large $N$ we have

$$\sqrt{N}[g(\hat\theta) - g(\theta^0)] = G_N\sqrt{N}(\hat\theta - \theta^0) \quad \text{a.s.},$$

where $G_N \xrightarrow{a.s.} G$. From the proof of Proposition 7, we have

$$\sqrt{N}(\hat\theta - \theta^0) = R_N\sqrt{N}[a_N - g(\theta^0)] \quad \text{a.s.},$$

where $R_N \xrightarrow{a.s.} R = (G'\Delta^{-1}G)^{-1}G'\Delta^{-1}$. Hence

$$\sqrt{N}[a_N - g(\hat\theta)] = \sqrt{N}[a_N - g(\theta^0)] - \sqrt{N}[g(\hat\theta) - g(\theta^0)] \xrightarrow{d} QCu,$$

where $Q = I_q - GR$, $u \sim N(0, I_q)$, and $C$ is a non-singular matrix such that $CC' = \Delta$. So

$$d_N = N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)] \xrightarrow{d} u'C'Q'\Delta^{-1}QCu.$$

Let $\tilde G = C^{-1}G$ and $M_G = I_q - \tilde G(\tilde G'\tilde G)^{-1}\tilde G'$; then $M_G$ is a symmetric idempotent matrix with rank $q - p$ and

$$C'Q'\Delta^{-1}QC = M_G'M_G = M_G.$$

Hence $d_N \xrightarrow{d} u'M_G u \sim \chi^2(q - p)$.  Q.E.D.
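
Propositions 6, 7, and 8 translate directly into a computable recipe: minimize the quadratic form with $A_N = \hat\Delta^{-1}$, take standard errors from $(G'\hat\Delta^{-1}G)^{-1}/N$, and use $N D_N(\hat\theta)$ as a $\chi^2(q - p)$ test of the restrictions. A sketch under our own naming, with a numerical optimizer and a forward-difference Jacobian standing in for analytic derivatives:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import chi2

    def jacobian(g, theta, eps=1e-6):
        # Forward-difference Jacobian of g at theta (q x p).
        g0 = g(theta)
        cols = []
        for j in range(len(theta)):
            tp = theta.copy()
            tp[j] += eps
            cols.append((g(tp) - g0) / eps)
        return np.column_stack(cols)

    def min_dist(a_N, g, Delta_hat, N, theta0):
        # Optimal minimum distance (A_N = inv(Delta_hat)), Proposition 7
        # covariance, and the Proposition 8 chi-square statistic.
        A = np.linalg.inv(Delta_hat)

        def obj(th):
            r = a_N - g(th)
            return r @ A @ r

        theta_hat = minimize(obj, theta0, method='BFGS').x
        G = jacobian(g, theta_hat)
        V = np.linalg.inv(G.T @ A @ G) / N     # (G' Delta^{-1} G)^{-1} / N
        d = N * obj(theta_hat)                 # ~ chi2(q - p) if restrictions hold
        pval = chi2.sf(d, df=len(a_N) - len(theta_hat))
        return theta_hat, V, d, pval

Proposition 9 below extends the same computation to nested restrictions: fitting the model twice and differencing the two $d$ statistics gives a $\chi^2(p - s)$ test.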

Now consider imposing additional restrictions, which are expressed by the condition that $\theta = f(\alpha)$, where $\alpha$ is $s \times 1$ $(s \leq p)$. The domain of $\alpha$ is $\Gamma_1$, a subset of $R^s$ that contains the true value $\alpha^0$. So $\theta = f(\alpha)$ is confined to a certain subset of $R^p$.

Assumption 2'.  $\Gamma_1$ is a compact subset of $R^s$ that contains $\alpha^0$; $f$ is a continuous mapping from $\Gamma_1$ into $\Gamma$; $f(\alpha) = \theta^0$ for $\alpha \in \Gamma_1$ implies $\alpha = \alpha^0$; $\Gamma_1$ contains a neighborhood of $\alpha^0$ in which $f$ has continuous second partial derivatives; $\operatorname{rank}(F) = s$, where $F = \partial f(\alpha^0)/\partial\alpha'$.

Let $h(\alpha) = g[f(\alpha)]$. Choose $\hat\alpha$ to

$$\min_{\alpha \in \Gamma_1} [a_N - h(\alpha)]' A_N [a_N - h(\alpha)].$$

Proposition 9.  If Assumptions 1, 2, and 2' are satisfied, if $\Delta$ is positive definite, and if $A_N \xrightarrow{a.s.} \Delta^{-1}$, then $d_1 - d_2 \xrightarrow{d} \chi^2(p - s)$, where

$$d_1 = N[a_N - h(\hat\alpha)]' A_N [a_N - h(\hat\alpha)], \qquad d_2 = N[a_N - g(\hat\theta)]' A_N [a_N - g(\hat\theta)].$$

Furthermore, $d_1 - d_2$ is independent of $d_2$ in their limiting joint distribution.

Proof.  The assumptions on $f$ and $\Gamma_1$ imply that $h$ and $\Gamma_1$ satisfy Assumptions 1 and 2. By following the proof of Proposition 8, we can show that the vector $(d_1, d_2)$ converges in distribution to $(d_1^*, d_2^*)$, where

$$d_1^* = u'M_H u, \qquad d_2^* = u'M_G u,$$

$u \sim N(0, I_q)$, $C$ is a non-singular matrix such that $CC' = \Delta$, $\tilde G = C^{-1}G$, $\tilde H = C^{-1}H$ (with $H = GF$, by the chain rule), and

$$M_G = I_q - \tilde G(\tilde G'\tilde G)^{-1}\tilde G', \qquad M_H = I_q - \tilde H(\tilde H'\tilde H)^{-1}\tilde H'.$$

Since $\tilde H$ is in the column space of $\tilde G$, we have $M_H M_G = M_G M_H = M_G$; so $M_H - M_G$ is a symmetric idempotent matrix with rank $p - s$. Hence

$$d_1 - d_2 \xrightarrow{d} u'(M_H - M_G)u \sim \chi^2(p - s).$$

Since

$$\operatorname{cov}[(M_H - M_G)u, M_G u] = (M_H - M_G)M_G = 0,$$

we see that $d_1^* - d_2^*$ is independent of $d_2^*$.  Q.E.D.

In section 3.2 we considered applying the minimum distance procedure both to $\hat\pi$ and to $\hat w$. We want to show that if the restrictions involve only $\pi$, then the two procedures give estimators of $\pi$ with the same limiting distribution. First consider the effect of a one-to-one transformation of $w$ into $(\pi', w_2')$: let $l(\mu)$ be a function from $R^q$ into $R^q$ and let $L = \partial l(\mu^0)/\partial\mu'$, where $\mu^0 = g(\theta^0)$. Let $h(\theta) = l[g(\theta)]$. Choose $\hat\theta$ to

$$\min_{\theta \in \Gamma} [l(a_N) - h(\theta)]' A_N [l(a_N) - h(\theta)].$$


Proposition 9a.  Assume that (1) Assumptions 1 and 2 are satisfied for $g$ and $\Gamma$; (2) $l$ is one-to-one and continuous on the range of $g(\theta)$ for $\theta \in \Gamma$; $l$ has continuous second partial derivatives in a neighborhood of $g(\theta^0)$; $L$ is non-singular; (3) $\Delta$ is positive definite and $A_N \xrightarrow{a.s.} (L\Delta L')^{-1}$. Then $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = (G'\Delta^{-1}G)^{-1}$.

Proof.  By the $\delta$-method, $\sqrt{N}[l(a_N) - h(\theta^0)] \xrightarrow{d} N(0, L\Delta L')$. Hence $\sqrt{N}(\hat\theta - \theta^0) \xrightarrow{d} N(0, \Lambda)$, where $\Lambda = [H'(L\Delta L')^{-1}H]^{-1}$ and $H = \partial h(\theta^0)/\partial\theta'$. Since $H = LG$ and $L$ is non-singular, we have $\Lambda = (G'\Delta^{-1}G)^{-1}$.  Q.E.D.
Finally, consider augmenting $a_N$ to a $k \times 1$ vector $c_N$: $c_N' = (a_N', b_N')$, $k \geq q$. (For example, we can augment $\hat\pi$ by adding $\hat w_2$.) Assume that $c_N \xrightarrow{a.s.} \xi^0$, where $\xi^{0\prime} = (g(\theta^0)', \xi_2^{0\prime})$, and assume that $\sqrt{N}(c_N - \xi^0) \xrightarrow{d} N(0, \Phi)$. We shall let $\xi_2$ be unrestricted. Let $\psi' = (\psi_1', \psi_2') = (\theta', \xi_2')$ be a $1 \times n$ vector, where $n = p + k - q$; set $m'(\psi) = (g'(\theta), \xi_2')$. Choose $\hat\psi$ to

$$\min_{\psi \in \Gamma_c} [c_N - m(\psi)]' A_N^c [c_N - m(\psi)],$$

where $A_N^c \xrightarrow{a.s.} \Phi^{-1}$. Then $\hat\psi_1$ provides an estimator of $\theta^0$; we want to compare this estimator with the following one: choose $\hat\theta$ to

$$\min_{\theta \in \Gamma} [a_N - g(\theta)]' A_N [a_N - g(\theta)],$$

where $A_N \xrightarrow{a.s.} \Delta^{-1}$, $\Gamma$ is a compact subset of $R^p$, and $g$ is continuous on $\Gamma$. We shall set $\Gamma_c$ equal to the Cartesian product of $\Gamma$ and $R^{k-q}$. Suppose that $A_N$ and $A_N^c$ are positive definite, and that the submatrix of $(A_N^c)^{-1}$ consisting of the first $q$ rows and columns equals $A_N^{-1}$. Then we have the following result:

Proposition 9b.  $\hat\psi_1 = \hat\theta$.

Proof.  Minimizing first with respect to $\psi_2$ gives

$$b_N - \hat\psi_2 = -(A_{22}^c)^{-1}A_{21}^c[a_N - g(\theta)],$$

where $A_{st}^c$ is the $s,t$ submatrix of $A_N^c$ $(s, t = 1, 2)$. Then the concentrated distance function is

$$[a_N - g(\theta)]'[A_{11}^c - A_{12}^c(A_{22}^c)^{-1}A_{21}^c][a_N - g(\theta)] = [a_N - g(\theta)]' A_N [a_N - g(\theta)].$$

So the addition of unrestricted moments does not affect the minimum distance estimator.  Q.E.D.

A similar result is in Ferguson (1958).
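
The proof turns on a partitioned-inverse identity: if the leading $q \times q$ block of $(A_N^c)^{-1}$ is $A_N^{-1}$, then $A_{11}^c - A_{12}^c(A_{22}^c)^{-1}A_{21}^c = A_N$. A short numerical check (random positive definite matrix of our own construction):

    import numpy as np

    rng = np.random.default_rng(1)
    q, k = 3, 5
    M = rng.normal(size=(k, k))
    Phi = M @ M.T                        # positive definite, plays (A_N^c)^{-1}
    Ac = np.linalg.inv(Phi)              # A_N^c
    AN = np.linalg.inv(Phi[:q, :q])      # A_N: inverse of the leading block
    schur = Ac[:q, :q] - Ac[:q, q:] @ np.linalg.inv(Ac[q:, q:]) @ Ac[q:, :q]
    print(np.allclose(schur, AN))        # True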

References

Anderson, T.W., 1969, Statistical inference for covariance matrices with linear structure, in: P.R. Krishnaiah, ed., Proceedings of the second international symposium on multivariate analysis (Academic Press, New York).
Anderson, T.W., 1970, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, in: Essays in probability and statistics (University of North Carolina Press, Chapel Hill, NC).
Amemiya, T., 1971, The estimation of variances in a variance-components model, International Economic Review 12, 1-13.
Balestra, P. and M. Nerlove, 1966, Pooling cross section and time series data in the estimation of a dynamic model: The demand for natural gas, Econometrica 34, 585-612.
Basmann, R.L., 1965, On the application of the identifiability test statistic and its exact finite sample distribution function in predictive testing of explanatory economic models, Unpublished manuscript.
Billingsley, P., 1965, Ergodic theory and information (Wiley, New York).
Billingsley, P., 1979, Probability and measure (Wiley, New York).
Brown, C., 1980, Equalizing differences in the labor market, Quarterly Journal of Economics 94, 113-134.
Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic Studies 47, 225-238.
Chiang, C.L., 1956, On regular best asymptotically normal estimates, Annals of Mathematical Statistics 27, 336-351.
Cramer, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton, NJ).
Ferguson, T.S., 1958, A method of generating best asymptotically normal estimates with application to the estimation of bacterial densities, Annals of Mathematical Statistics 29, 1046-1062.
Goldberger, A.S., 1974, Asymptotics of the sample regression slope, Unpublished lecture note no. 12.
Griliches, Z., 1976, Wages of very young men, Journal of Political Economy 84, S69-S85.
Griliches, Z. and A. Pakes, 1980, The estimation of distributed lags in short panels, National Bureau of Economic Research technical paper no. 4.
Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, forthcoming.
Hsiao, C., 1975, Some estimation methods for a random coefficient model, Econometrica 43, 305-325.
Jennrich, R.I., 1969, Asymptotic properties of non-linear least squares estimators, The Annals of Mathematical Statistics 40, 633-643.
Kendall, M.G. and A. Stuart, 1961, The advanced theory of statistics, Vol. 2 (Griffin, London).
MaCurdy, T.E., 1979, Multiple time series models applied to panel data: Specification of a dynamic model of labor supply, Unpublished manuscript.
Maddala, G.S., 1971, The use of variance components models in pooling cross section and time series data, Econometrica 39, 341-358.
Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).
Mellow, W., 1981, Unionism and wages: A longitudinal analysis, Review of Economics and Statistics 63, 43-52.
Mundlak, Y., 1961, Empirical production function free of management bias, Journal of Farm Economics 43, 44-56.
Mundlak, Y., 1963, Estimation of production and behavioral functions from a combination of time series and cross section data, in: C. Christ et al., eds., Measurement in economics (Stanford University Press, Stanford, CA).
Mundlak, Y., 1978, On the pooling of time series and cross section data, Econometrica 46, 69-85.
Mundlak, Y., 1978a, Models with variable coefficients: Integration and extension, Annales de l'INSEE 30-31, 483-509.
Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York).
Rothenberg, T.J., 1973, Efficient estimation with a priori information (Yale University Press, New Haven, CT).
Rozanov, Y.A., 1967, Stationary random processes (Holden-Day, San Francisco, CA).
Sims, C.A., 1972, Money, income, and causality, American Economic Review 62, 540-552.
Sims, C.A., 1974, Distributed lags, in: M.D. Intriligator and D.A. Kendrick, eds., Frontiers of quantitative economics, Vol. II (North-Holland, Amsterdam).
Swamy, P.A.V.B., 1970, Efficient inference in a random coefficient regression model, Econometrica 38, 311-323.
Swamy, P.A.V.B., 1974, Linear models with random coefficients, in: P. Zarembka, ed., Frontiers in econometrics (Academic Press, New York).
Stafford, F.P. and G.J. Duncan, 1980, Do union members receive compensating wage differentials?, American Economic Review 70, 355-371.
Wallace, T.D. and A. Hussain, 1969, The use of error components models in combining time series with cross section data, Econometrica 37, 55-72.
White, H., 1980, Using least squares to approximate unknown regression functions, International Economic Review 21, 149-170.
White, H., 1980a, Nonlinear regression on cross section data, Econometrica 48, 721-746.
White, H., 1980b, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817-838.
White, H., 1982, Instrumental variables regression with independent observations, Econometrica 50, forthcoming.
Zellner, A. and H. Theil, 1962, Three-stage least squares: Simultaneous estimation of simultaneous equations, Econometrica 30, 54-78.
Zellner, A., J. Kmenta and J. Drèze, 1966, Specification and estimation of Cobb-Douglas production function models, Econometrica 34, 784-795.
