Generalized Least Squares
(Handout Version)∗
Walter Belluzzo
Econ 507 – Spring 2013
1 Introduction
Efficiency of the OLS Estimator
• Remember that the OLS estimator is efficient (best linear unbiased estimator) if the DGP
belongs to the regression model.
• For efficiency of least squares, the error terms must be uncorrelated and have equal
variance, Var(u) = σ²I.
• The usual estimators of the covariance matrices of the OLS and NLS estimators are not
valid when these assumptions do not hold.
• Alternative “sandwich” covariance matrix estimators that are asymptotically valid can be
obtained, but the inefficiency of the estimator β̂ remains.
• Non-spherical disturbances affect linear and nonlinear regression models in the same
way, so we can focus our attention on the simpler, linear case:
y = Xβ + u,   E(uu′) = Ω.
• The idea for obtaining an efficient estimator of the vector β in this model is to find a
transformation of the model that makes the Gauss-Markov conditions hold.
• The resulting efficient estimator (why?) is called the generalized least squares, or
GLS, estimator.
∗ This lecture is based on D&M Chapter 6.
• The covariance matrix of the transformed error vector Ψ′u is
E(Ψ′uu′Ψ) = Ψ′E(uu′)Ψ = Ψ′ΩΨ.
• To make the expression on the far right-hand side reduce to the identity matrix, we must define Ψ
such that
Ω⁻¹ = ΨΨ′.
• Premultiplying the original model by Ψ′ then yields the transformed regression
Ψ′y = Ψ′Xβ + Ψ′u.
• Because the covariance matrix Ω is nonsingular, the matrix Ψ must be as well, and so
the transformed regression model is perfectly equivalent to the original model.
• Applying OLS to the transformed model yields
β̂gls = (X′ΨΨ′X)⁻¹X′ΨΨ′y = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.
• Since β̂gls is just the OLS estimator for the transformed model, its covariance matrix can
be found directly from the usual OLS covariance matrix, σ²(X′X)⁻¹, evaluated at the transformed
regressors Ψ′X (with σ² = 1, since the transformed errors have covariance matrix I):
Var(β̂gls) = (X′ΨΨ′X)⁻¹ = (X′Ω⁻¹X)⁻¹.
A minimal numerical check is sketched below.
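A minimal numerical sketch (not part of the handout; the data, names, and the diagonal Ω are illustrative assumptions) showing that β̂gls computed from the formula above coincides with OLS applied to the Ψ′-transformed data:

import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta0 = np.array([1.0, 0.5, -2.0])

# Example Omega: heteroskedastic but uncorrelated errors (diagonal covariance).
omega2 = np.exp(rng.normal(size=n))              # error variances
u = np.sqrt(omega2) * rng.normal(size=n)
y = X @ beta0 + u

Omega_inv = np.diag(1.0 / omega2)
# GLS from the explicit formula (X'Omega^{-1}X)^{-1} X'Omega^{-1}y.
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# GLS as OLS on the transformed model Psi'y = Psi'X beta + Psi'u,
# where Psi is a Cholesky factor of Omega^{-1}, so that Omega^{-1} = Psi Psi'.
Psi = np.linalg.cholesky(Omega_inv)
beta_ols_transformed, *_ = np.linalg.lstsq(Psi.T @ X, Psi.T @ y, rcond=None)

print(np.allclose(beta_gls, beta_ols_transformed))   # True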
• The generalized least squares estimator β̂gls can also be obtained by minimizing the GLS
criterion function
(y − Xβ)′Ω⁻¹(y − Xβ),
which is just the sum of squared residuals from the transformed regression.
• This can be viewed as the SSR function from the original model, weighted by the inverse
of the matrix Ω.
• The effect of such a weighting scheme is clearest when Ω is a diagonal matrix. In that
case, the weight given to the tth observation is proportional to the inverse of Var(uₜ).
• The GLS estimator β̂gls defined above is also the solution of the set of moment conditions
X′Ω⁻¹(y − Xβ̂gls) = 0,
which are the same moment conditions as before, W′(y − Xβ̂) = 0, now with W = Ω⁻¹X.
• It is easy to verify that these moment conditions are equivalent to the first-order conditions
for the minimization of the GLS criterion function (do it as an exercise).
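As a hint for the exercise, differentiating the GLS criterion function with respect to β gives
∂/∂β [(y − Xβ)′Ω⁻¹(y − Xβ)] = −2X′Ω⁻¹(y − Xβ),
and setting this derivative equal to zero at β = β̂gls reproduces exactly the moment conditions above.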
• Suppose that the DGP is a special case of that model, with parameter vector β0 and
known covariance matrix Ω.
• To show efficiency of β̂gls, we proceed as in previous cases and show that the difference of
the precision matrices of β̂gls and of the estimator β̂w based on the moment conditions
W′(y − Xβ) = 0, for an arbitrary n × k matrix W,
X′Ω⁻¹X − X′W(W′ΩW)⁻¹W′X,                      (1)
is positive semidefinite.
• This difference being positive semidefinite means that any other choice of the variables W
yields a larger variance than W = Ω⁻¹X.
• In fact, β̂gls is typically more efficient for all elements of β, because it is only in very
special cases that the matrix (1) will have any zero diagonal elements.
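One standard way to verify that the matrix (1) is positive semidefinite (a sketch, using the factorization Ω⁻¹ = ΨΨ′ introduced above): define Z ≡ Ψ′X and V ≡ Ψ⁻¹W, so that
X′Ω⁻¹X = Z′Z,   W′ΩW = V′V,   W′X = V′Z.
Then
X′Ω⁻¹X − X′W(W′ΩW)⁻¹W′X = Z′Z − Z′V(V′V)⁻¹V′Z = Z′M_V Z = (M_V Z)′(M_V Z),
where M_V ≡ I − V(V′V)⁻¹V′ is an orthogonal projection matrix, and any matrix of the form A′A is positive semidefinite.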
• Note that β̂w reduces to the OLS estimator when W = X. Thus these conclusions apply, in
particular, to the OLS estimator β̂.
• In general, computation of the GLS estimator will be easy only if the matrix Ψ has a
form that allows us to calculate Ψ′x without having to store Ψ itself in memory.
• Suppose that Ω = σ²∆, where the n × n matrix ∆ is known to the investigator, but the
positive scalar σ² is unknown.
• The OLS estimates from the transformed regression based on the modified Ψ, defined so that
∆⁻¹ = ΨΨ′, are numerically identical to β̂gls:
(X′∆⁻¹X)⁻¹X′∆⁻¹y = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y = β̂gls,
since the factors of σ⁻² cancel.
• Thus the GLS estimates will be the same whether we use Ω or ∆, that is, whether or not
we know σ².
• Its covariance matrix, however, does depend on σ²:
Var(β̂gls) = σ²(X′∆⁻¹X)⁻¹,
which can be estimated by replacing σ² with the usual OLS estimator of the error variance,
s², from the transformed regression.
• Suppose now that Ω is diagonal with tth diagonal element ωₜ². Then Ω⁻¹ is a diagonal
matrix with tth diagonal element ωₜ⁻², and thus Ψ will be a diagonal matrix with tth
diagonal element ωₜ⁻¹.
• This special case of GLS estimation is often called weighted least squares, or WLS.
• The weight given to observation t is ωₜ⁻¹, and thus observations for which the variance
of the error term is large (small) are given low (high) weight.
• Note that all the variables in the regression, including the constant term, must be multi-
plied by the same weights.
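A minimal WLS sketch (not from the handout; the data-generating process and names are illustrative assumptions), emphasizing that the constant column is weighted along with everything else:

import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
omega = np.exp(0.5 * x)                      # known error standard deviations
y = 1.0 + 2.0 * x + omega * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])         # regressors, constant included
Xw, yw = X / omega[:, None], y / omega       # divide ALL columns and y by omega_t
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(beta_wls)                              # close to [1.0, 2.0]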
• Note that the R² only makes sense in terms of the transformed regressand, since undoing
the weighting does not preserve the orthogonality of the residuals and fitted values. That is,
û ⊥ ŷ  ⇏  Ψ⁻¹û ⊥ Ψ⁻¹ŷ.
• For a nonlinear regression model y = x(β) + u with E(uu′) = Ω, the GLS estimator similarly
solves the moment conditions
X′(β)Ω⁻¹(y − x(β)) = 0,
where X(β) denotes the matrix of derivatives of x(β) with respect to β.
• Life is much easier if there is heteroskedasticity and no serial correlation. In this case, we
can simply use weighted least squares.
• But even in this case some information on ωₜ is still necessary, such as the sampling design
or a direct relationship between E(uₜ²) and some variable zₜ that can be used as a weight.
• In practice, the covariance matrix Ω is often not known even up to a scalar factor. This
makes it impossible to compute GLS estimates.
• Instead, we can model Ω as a function of a vector of parameters γ that can be estimated
consistently, say by γ̂, and compute GLS using the estimate
Ω̂ = Ψ(γ̂)Ψ′(γ̂).
• The resulting estimator is called the feasible generalized least squares, or feasible GLS, estimator.
• In the same way that a regression function determines the conditional mean of a random
variable, a skedastic function determines its conditional variance; a common example, used below, is
E(uₜ² | Zₜ) = exp(Zₜγ),
which is positive for any γ.
• We can then obtain OLS estimates γ̂ by running the auxiliary linear regression
log ûₜ² = Zₜγ + vₜ,
for all t, where the ûₜ are the OLS residuals from the original regression.
• Finally, feasible GLS estimates of β are obtained by using ordinary least squares to esti-
mate the weighted regression, with the estimates ω̂ₜ replacing the unknown ωₜ (for the
exponential skedastic function above, ω̂ₜ² = exp(Zₜγ̂)), as sketched in code below:
(1/ω̂ₜ) yₜ = (1/ω̂ₜ) Xₜβ + (1/ω̂ₜ) uₜ.
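A self-contained sketch of this two-step procedure (illustrative only; the exponential skedastic function and all names are assumptions, not part of the handout):

import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x])         # variables driving the variance
omega = np.exp(Z @ np.array([-1.0, 1.5]) / 2.0)   # true standard deviations
y = X @ np.array([1.0, 2.0]) + omega * rng.normal(size=n)

# Step 1: OLS and residuals.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_ols

# Step 2: auxiliary regression log(u_hat^2) = Z gamma + v.
gamma_hat, *_ = np.linalg.lstsq(Z, np.log(u_hat**2), rcond=None)
omega_hat = np.exp(Z @ gamma_hat / 2.0)      # estimated standard deviations

# Step 3: feasible GLS = OLS on the data weighted by 1 / omega_hat.
beta_fgls, *_ = np.linalg.lstsq(X / omega_hat[:, None], y / omega_hat, rcond=None)
print(beta_ols, beta_fgls)                   # both consistent; FGLS more precise

(The intercept of the auxiliary regression is biased, but this only rescales every ω̂ₜ by the same constant factor, which does not affect the resulting estimates of β.)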
• Under suitable regularity conditions, it can be shown that this type of procedure yields
a feasible GLS estimator β̂f that is consistent and asymptotically equivalent to the GLS
estimator β̂gls .
• If we substitute Xβ0 + u for y into the formula for the GLS estimator, we find that
√n (β̂gls − β0) = ((1/n) X′Ω⁻¹X)⁻¹ (1/√n) X′Ω⁻¹u.
• As usual, we assume sufficient conditions for the first factor in the right-hand side to tend
to a non-stochastic k × k matrix.
• Then, we apply a CLT to the second factor to conclude that it is an asymptotically normal
random vector, and thus obtain root-n consistency and asymptotic normality.
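Putting the two factors together, the limiting distribution (a standard result, stated here for completeness) is
√n (β̂gls − β0) → N( 0, (plim (1/n) X′Ω⁻¹X)⁻¹ )   in distribution,
which is consistent with the finite-sample result Var(β̂gls) = (X′Ω⁻¹X)⁻¹ obtained earlier.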
• Following the same argument for the feasible GLS estimator, we find that
√n (β̂f − β0) ≈ ( plim (1/n) X′Ω⁻¹(γ̂)X )⁻¹ plim ( (1/√n) X′Ω⁻¹(γ̂)u ),
where ≈ denotes asymptotic equality.
• If Ω(γ̂) is a very good estimate, then feasible GLS will have essentially the same properties
as GLS itself.
• As a result, inferences should be reasonably reliable, even though they will not be exact
in finite samples.
• On the other hand, if Ω(γ̂) is a poor estimate, feasible GLS estimates may have quite
different properties from real (infeasible) GLS estimates, and inferences may be quite
misleading.
• The feasible GLS procedure can be iterated, using the new residuals to re-estimate γ and
then β; the iteration can either be stopped after a predetermined number of rounds or continued
until convergence is achieved (although convergence is not guaranteed).
• Iteration does not change the asymptotic distribution of the feasible GLS estimator, but
it does change its finite-sample distribution.
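A compact sketch of such an iteration (same illustrative assumptions and exponential skedastic function as the feasible GLS sketch above; all names are hypothetical):

import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
X = Z = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + np.exp(Z @ np.array([-1.0, 1.5]) / 2.0) * rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # start from OLS
for rounds in range(1, 21):                        # predetermined maximum of 20 rounds
    u_hat = y - X @ beta                           # residuals at the current beta
    gamma, *_ = np.linalg.lstsq(Z, np.log(u_hat**2), rcond=None)
    w = np.exp(Z @ gamma / 2.0)                    # updated standard deviations
    beta_new, *_ = np.linalg.lstsq(X / w[:, None], y / w, rcond=None)
    converged = np.max(np.abs(beta_new - beta)) < 1e-8
    beta = beta_new
    if converged:                                  # stop at convergence
        break
print(rounds, beta)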
• Another way to estimate models in which the covariance matrix of the error terms depends
on one or more unknown parameters is to use the method of maximum likelihood.
• As we will see later on, in this case, β and γ are estimated jointly and consistency will
follow if the maximum likelihood regularity conditions are satisfied.
• In many cases, an iterated feasible GLS estimator will be the same as a maximum likeli-
hood estimator based on the assumption of normally distributed errors.
• If the true DGP is heteroskedastic but we estimate a model that assumes homoskedasticity,
the DGP will not be included in the estimated model, and therefore there is a specification error.
• The specification error does not bias the OLS estimator, but renders it inefficient, as the
sandwich form of its covariance matrix suggests.
• As we have seen, we can compute asymptotically valid covariance matrix estimates for
the (inefficient) OLS and NLS parameter estimates.
• So, what if we choose to assume heteroskedasticity and settle for an inefficient estimator,
but the true DGP is homoskedastic?
• Simulation experiments suggest that this specification error frequently has little cost.
• This evidence can be taken as an indication that it may be prudent to employ an HCCME
anyway, especially if the sample size is large.
• However, in finite samples, tests and confidence intervals based on HCCMEs will always
be somewhat less reliable than ones based on the usual OLS covariance matrix under
homoskedasticity.
• If we have information on the form of the skedastic function, we might well wish to use
feasible generalized least squares, which is asymptotically equivalent to the efficient
(infeasible) GLS estimator.
• However, the small-sample properties of feasible generalized least squares depend critically
on the estimate Ω̂.
• So, if the true DGP is homoskedastic and we assume heteroskedasticity, we can expect
that the specification error may be costly in small samples.
• Tests for heteroskedasticity are typically based on a skedastic model of the form
E(uₜ² | Ωₜ) = h(δ + Zₜγ),
where the skedastic function h(·) is a nonlinear function that can take on only posi-
tive values, Zₜ is a 1 × r vector of observations on exogenous or predetermined variables
that belong to the information set Ωₜ, δ is a scalar parameter, and γ is an r-vector of
parameters.
• Under the null hypothesis that γ = 0, the function h(δ + Zₜγ) collapses to h(δ), a constant.
• A test of this null can be based on the artificial regression
uₜ² = h(δ + Zₜγ) + vₜ.
• Alternatively, you can define vₜ as the difference between uₜ² and its conditional expecta-
tion, and rewrite the skedastic function as in the last expression.
• So, let us take a first-order Taylor expansion of this regression around γ = 0 and
δ = δ̃ ≡ h⁻¹(σ̃²), where σ̃² is the sample variance of the uₜ:
uₜ² − σ̃² = h′(δ̃)bδ + h′(δ̃)Zₜbγ + residual.
• For the purpose of testing the null hypothesis that γ = 0, this regression is equivalent to
uₜ² = bδ + Zₜbγ + residual,
with a suitable redefinition of the artificial parameters bδ and bγ, which does not depend
on the functional form of h(·).
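Concretely (a small completing step, not spelled out in the handout): since σ̃² and h′(δ̃) are constants, the two artificial regressions differ only through the reparametrization
bδ* = σ̃² + h′(δ̃)bδ,   bγ* = h′(δ̃)bγ,
so that, provided h′(δ̃) ≠ 0, the hypothesis bγ = 0 holds if and only if bγ* = 0.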
• It can be shown that replacing uₜ² by ûₜ² does not change the asymptotic distribution of
the F and nR² statistics for testing the hypothesis bγ = 0.
• The last issue is to choose the variables to be included in Z. White suggests including
all squares and cross-products of the variables in X (why?), which results in the White
Test.
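A sketch of the resulting nR² test (illustrative only; the data-generating process and names are assumptions, not from the handout):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + np.exp(0.5 * x1) * rng.normal(size=n)

# OLS residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ beta) ** 2

# White-style Z: levels, squares and cross-products of the regressors.
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
b, *_ = np.linalg.lstsq(Z, u2, rcond=None)
R2 = 1.0 - np.sum((u2 - Z @ b) ** 2) / np.sum((u2 - u2.mean()) ** 2)

stat = n * R2                                    # nR² statistic
df = Z.shape[1] - 1                              # number of slope coefficients in Z
print(stat, stats.chi2.sf(stat, df))             # statistic and asymptotic p-value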
• The general form of the test is basically the Breusch-Pagan Test. We will derive the
limiting distribution of this test later, in a more convenient framework.
• Since the asymptotic approximations for these test statistics may be inaccurate in finite
samples, bootstrapping them when the sample size is small or moderate may be a good
idea, as in the sketch below.
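A sketch of one possible bootstrap scheme, imposing the null of homoskedasticity by resampling the OLS residuals (continuing the previous sketch and reusing the objects defined there; the scheme and names are illustrative assumptions, not from the handout):

def n_r2_stat(y, X, Z):
    # nR² statistic from the auxiliary regression of squared residuals on Z.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2
    b, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    r2 = 1.0 - np.sum((u2 - Z @ b) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(y) * r2

stat = n_r2_stat(y, X, Z)                        # observed statistic
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta
B, count = 999, 0
for _ in range(B):
    u_star = rng.choice(u_hat, size=n, replace=True)   # i.i.d. resampled residuals
    y_star = X @ beta + u_star                   # bootstrap DGP is homoskedastic
    if n_r2_stat(y_star, X, Z) >= stat:
        count += 1
print((count + 1) / (B + 1))                     # bootstrap p-value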