
Semiparametric ARCH Models: An Estimating Function Approach 1

by
David X. Li
and
H. J. Turtle

Correspondence to either of the following addresses:

David X. Li
Riskmetrics Group
44 Wall Street, 22nd Floor
New York, NY 10005
tel: (212) 981-7453
fax: (212) 981-7402
email: david.li@riskmetrics.com
web: http://www.riskmetrics.com

H. J. Turtle
PO Box 644746
Department of Finance, Insurance, and Real Estate
College of Business and Economics
Washington State University
Pullman, Washington, 99164-4746
tel: (509) 335-3797
fax: (509) 335-3857
email: hturtle@wsu.edu
web: http://www.cbe.wsu.edu/~hturtle

Original draft: April 1995


Current draft: April 1999

1 We thank John Kling, Tom McCurdy, Ieuan Morgan, seminar participants at the 1996 Northern
Finance Association Meetings, and three anonymous referees for helpful comments. Financial
assistance from the Social Sciences and Humanities Research Council (SSHRC) is gratefully
acknowledged (Turtle). The usual disclaimer applies.

We introduce the method of estimating functions to study the class of
autoregressive conditional heteroskedasticity (ARCH) models. We derive the
optimal estimating functions by combining linear and quadratic estimating
functions. The resultant estimators are more efficient than the quasi-maximum
likelihood estimator. If the assumption of conditional normality is imposed, the
estimator obtained by using the theory of estimating functions is identical to that
obtained by using the maximum likelihood method in finite samples. The relative
efficiencies of the estimating function approach in comparison with the quasi-
maximum likelihood estimator are developed. We illustrate the estimating function
approach using a univariate GARCH(1,1) model with conditional Normal,
Student-t, and Gamma distributions. The efficiency benefits of the estimating
function (EF) approach relative to the quasi-maximum likelihood approach are
substantial for the Gamma distribution with large skewness. Simulation analysis
shows that the finite sample properties of the estimators from the estimating
function approach are attractive. EF estimators tend to display less bias and root
mean squared error than the quasi-maximum likelihood estimator. The efficiency
gains are substantial for highly nonnormal distributions. An example demonstrates
that implementation of the method is straightforward.

KEY WORDS: Quasi-maximum likelihood estimation; relative efficiency; GARCH.

1. INTRODUCTION

Recent financial studies show substantial interest in the sampling properties of estimators
that result from models of conditional volatility. Many volatility models follow from the seminal
work of Engle (1982) on autoregressive conditional heteroskedasticity (ARCH). Engle models
conditional variances as evolving according to a linear function of predetermined variables, most
notably squared prior disturbances. The generalizations and refinements to Engle’s pioneering
work are extensive (c.f., the GARCH model of Bollerslev (1986), the IGARCH model of Engle
and Bollerslev (1986), the ARCH-M model of Engle, Lilien and Robins (1987), the Quadratic
GARCH of Sentana (1991), the Student-t GARCH model of Engle and Bollerslev (1986) and
Bollerslev (1987), the log GARCH of Geweke (1986), the Exponential ARCH of Nelson (1991),
the nonlinear GARCH of Higgins and Bera (1992), or the threshold ARCH model of Glosten,
Jagannathan, and Runkle (1993)).
Volatility models have been successfully employed in pricing derivative securities, in
stochastic modeling of the term structure of interest rates, in applications related to fixed-income
portfolio management, and in asset pricing studies. The interested reader is referred to Bollerslev,
Chou and Kroner (1992) or Engle (1995) for an extensive survey of the ARCH methodology in
finance. For a review of the literature using stochastic volatility, Taylor (1994) provides an
excellent summary.

ARCH model estimation can be accomplished using a variety of techniques including
maximum likelihood (ML) estimation assuming conditional normality, quasi-maximum likelihood
(QML) estimation (c.f., Weiss 1986, and Bollerslev and Wooldridge 1988), generalized method of
moments (GMM) estimation (e.g., Bodurtha and Mark 1991), or semiparametric estimation (c.f.,
Engle and Gonzalez-Rivera 1991, or Drost and Klaassen 1997). It is well known that GMM and
QML estimation procedures produce inefficient and possibly biased estimates relative to ML
estimates when the true distribution is known. We develop an estimation approach for ARCH
models that reduces bias and improves efficiency without any necessary assumptions regarding the
underlying variate distribution.

The purpose of this paper is twofold. First, we seek to introduce the theory of estimating
functions (EFs) into the finance literature. We show that the EF approach is well suited to
financial data. A related paper by Vinod (1996) considers the benefits of using the estimating
function approach in conjunction with bootstrapping to meaningfully shrink confidence intervals
in many econometric contexts. Second, we show how the EF approach can be applied to the
estimation of ARCH models. Many unsolved problems in the estimation of ARCH models may
be addressed using the EF approach. In particular, the optimality of QML and GMM estimation
is often based on asymptotic theory. Unfortunately, asymptotic findings do not apply to the small
sample sizes often used in practice. This problem is exacerbated because estimation of higher
moments often requires very large samples to obtain convergence to asymptotic results. Because
the EF approach is based on finite samples from the outset, these criticisms do not apply.
Nonetheless, under strong distributional assumptions, many standard results in the estimation of
ARCH models based on conditional normality are recoverable under the EF approach. Thus, in
addition to important finite sample properties, the EF approach provides a strong link to the
existing literature.

The remainder of the paper is organized as follows. Section 2 provides an introduction to
the theory of EFs. Sections 3 and 4 show how the EF approach can be used to estimate ARCH
and ARCH regression models, respectively. Section 5 discusses properties of optimal estimating
functions. Section 6 derives measures of the relative efficiency of EF estimators. Section 7
performs Monte Carlo analysis to examine the behavior of EF estimators in terms of variance,
bias, and root mean squared errors in both a moderate and large sample setting (500 and 1000
observations). Section 8 demonstrates the use of the EF approach in a simple example using daily
observations on the S&P500 index. Finally, in section 9 we offer concluding comments.

2. THE THEORY OF ESTIMATING FUNCTIONS

In this section we introduce preliminary concepts and results from the theory of EFs
required for our development. We draw extensively on the work of Godambe (1960, 1976, 1985,
1991), Godambe and Thompson (1984, 1989), and Heyde (1989) to present the important theory
and application of the EF approach to ARCH models.

Traditional estimation theory, such as the method of ML or the method of least squares
(LS), focuses on properties of estimators that are functions of observations. The optimality of a
subclass of estimators is typically established according to an optimality criterion, such as
minimum mean squared error, or uniform minimum variance unbiasedness. An alternative
approach is to focus on functions of both the observation x and the unknown parameter θ, and to
study estimators as the solution of some equation,

$$g(x, \theta) = 0.$$

The function g is denoted an estimating function, while the underlying equation is called an
estimating equation. For a given optimal estimating function, an estimate of θ may be obtained.
Many estimation approaches can be viewed as special cases of the estimating function approach.
For example, the ML estimator is typically obtained by setting the score function equal to zero.
In the estimating function approach, it is the estimating function itself, rather than the estimator,
which is the subject of study. This change of emphasis from the estimator to the estimating
function has the following advantages:

1. Optimality criteria are defined for the estimating function itself, not the estimator. Thus,
for example, optimality can be based on finite sample properties.
2. Parametric models and semiparametric models can be studied with equal ease by the
approach of estimating functions.
3. Information from multiple sources can be readily combined using the concept of
orthogonal estimating functions.

Given these advantages, the estimating function approach has been successfully applied to
research in areas such as biostatistics, statistical inference in stochastic processes, and survey
sampling. The focus on the EFs directly implies that the resultant estimators need not be
unbiased. We address these issues theoretically in section 5, and empirically in section 7.
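
As a minimal sketch of what it means to obtain an estimator as the root of an estimating equation, the code below solves g(x, θ) = 0 by bisection for the score of an exponential model with rate θ. The exponential model, the function names, the data, and the bracketing interval are all assumptions chosen for illustration; none of them come from the paper.

```python
def score_exponential(xs, theta):
    """Score of an i.i.d. exponential sample with rate theta: sum(1/theta - x_i)."""
    return sum(1.0 / theta - x for x in xs)


def solve_estimating_equation(g, lo, hi, tol=1e-12, max_iter=200):
    """Find theta in [lo, hi] with g(theta) = 0 by bisection.

    Assumes g is continuous and changes sign on [lo, hi].
    """
    g_lo, g_hi = g(lo), g(hi)
    if g_lo * g_hi > 0:
        raise ValueError("g must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0 or (hi - lo) < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


xs = [0.5, 1.2, 0.3, 2.0, 1.0]  # hypothetical observations
theta_hat = solve_estimating_equation(
    lambda th: score_exponential(xs, th), 0.01, 100.0
)
closed_form = len(xs) / sum(xs)  # known ML estimate of the exponential rate
```

Here the root of the estimating equation coincides with the closed-form ML estimate, which is the sense in which ML is a special case of the estimating function approach.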

We present without proof a number of important definitions and theorems related to the
theory of EFs (the interested reader is referred to Godambe (1960, 1976, 1985, 1991), Godambe
and Thompson (1984, 1989), and Heyde (1989)). Suppose that $X = (x_1, x_2, \ldots, x_T)$ is a vector
random variable on a probability space. The distribution family of this vector random variable is
parameterized by $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$. If there is a one-to-one mapping from the distributional
family H to the parameter space θ, the model under study is called a parametric model;
otherwise, it is called a nonparametric model. We first present results for the scalar parameter
case and then we extend our results to the multiparameter case (beginning with Definition 3).

Definition 1 (Godambe 1960). An estimating function is a function $g(X, \theta)$ of both the
observation X and the parameter θ. The estimating function is called unbiased if

$$E[g(X, \theta)] = 0$$

for all $F \in H$ such that θ(F) = θ.

Godambe also imposed some regularity conditions on unbiased estimating functions to form a
class ς of regular unbiased EFs. An estimate of θ based on the EF is obtained by solving the
estimating equation

$$g(X, \theta) = 0.$$

In many applications, the number of EFs is set to equal the number of parameters so that a unique
θ can be obtained. If f(x, θ) is the probability density function for observation x for given θ, the
score function is defined as

$$S(x, \theta) = \frac{\partial}{\partial\theta} \log f(x, \theta). \tag{1}$$

Because $E[S(x, \theta)] = 0$ under standard regularity conditions (c.f., Lehmann 1983, p. 118), the
score function is an unbiased EF.

Definition 2 (Godambe 1960). Within the class ς of all regular unbiased EFs, a function g
belonging to ς is an optimal EF for θ if, for any $F \in H$ with θ = θ(F), it minimizes the quotient

$$\frac{E[g^2]}{\{E[\partial g/\partial\theta]\}^2}. \tag{2}$$

The intuition of this optimality criterion is twofold. First, we desire the smallest numerator,
$E[g^2]$, possible for a given denominator. The numerator can be interpreted as the variance of an
unbiased EF, g. We also seek a denominator, $\{E[\partial g/\partial\theta]\}^2$, that demonstrates as much sensitivity as
possible to changes in the parameter θ. This optimality criterion for estimation of a single
parameter is due to Godambe (1960). Multiparameter versions are discussed in Durbin (1960),
Godambe and Heyde (1987), and Godambe and Thompson (1989).
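
A short worked case may help fix ideas. Assume, purely for illustration, that $x_1, \ldots, x_T$ are independent with common mean θ and known variance $\sigma^2$, and take the linear EF $g = \sum_{t=1}^{T} (x_t - \theta)$. Under these assumptions the quotient in (2) evaluates to

```latex
E[g^2] = T\sigma^2, \qquad
E\!\left[\frac{\partial g}{\partial\theta}\right] = -T, \qquad
\frac{E[g^2]}{\{E[\partial g/\partial\theta]\}^2}
  = \frac{T\sigma^2}{T^2} = \frac{\sigma^2}{T},
```

which is the variance of the sample mean, the root of g = 0. In this simple case, minimizing the quotient is the same as minimizing the variance of the resulting estimator.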

Definition 3 (Kale 1962). For a given set of unbiased EFs, $g = (g_1, g_2, \ldots, g_m)$ from the class ς,
the EF $g^* \in \varsigma$ is said to be optimal if, for any distribution $F \in H$, the following is satisfied

$$J - H (H^*)^{-1} J^* (H^{*\prime})^{-1} H' \geq 0$$

for all $g \in \varsigma$, where $J = \mathrm{Cov}_F(g)$, $J^* = \mathrm{Cov}_F(g^*)$, $H = -E_F(\partial g/\partial\theta)$, and $H^* = -E_F(\partial g^*/\partial\theta)$.

When the form of density function f ( x , θ ) is specified, the score function given by equation
(1) is the optimal EF.

Theorem 1 (Godambe 1960). In the parametric model, the score function is the optimal EF in ς.

This theorem justifies the use of ML in parametric models from the vantage point of the theory of
EFs. When the form of the density function is not specified, the optimal EF which minimizes the
quotient (2) still provides the highest correlation with any possible score functions S ( x ,θ ) (c.f.,
Godambe 1985).

Suppose there exist two matrix functions, D(θ) and V(θ) > 0, such that for any $F \in H$
satisfying θ = θ(F),

$$E\left[\frac{\partial g}{\partial\theta}\right] = D(\theta) \quad \text{and} \quad \mathrm{Var}(g) = V(\theta).$$

Then, the EF

$$D' V^{-1} g$$

is an optimal EF of θ in the class ς. This can be readily verified by the multiparameter definition
of optimality. A sufficient condition for g* to be optimal in ς is given by the following lemma.

Lemma 1 (Godambe 1985). An EF g* in ς is optimal if $E[g^*(g_s - g_s^*)] = 0$ for any $g \in \varsigma$, where $g_s$
is the standardized form of the EF g, defined as

$$g_s = \frac{E(\partial g/\partial\theta)}{E(g^2)}\, g.$$

Let $X = \{x\}$ be an abstract sample space and $\Omega = \{\theta\}$ be the parameter vector space. Let $p_j$,
j = 1, 2, ..., k, be any real functions defined on the product space $X \times \Omega = \{(x, \theta) : x \in X, \theta \in \Omega\}$ of
sample X and parameter Ω, such that

$$E_F[p_j(X, \theta(F)) \mid \Im_j] = 0 \quad \text{for } F \in H, \tag{3}$$

where $E_F[\,\cdot \mid \Im_j]$ is the expectation under F, conditional on $\Im_j$, a σ-algebra generated by a partition
on the sample space. In most applications, a set of unbiased EFs $p_j(X, \theta(F))$,
j = 1, 2, ..., k, is chosen according to the underlying application. In many situations, simple EFs can
be formed from relationships involving only the first few moments. A better EF can then be
formed using optimal orthogonal combinations based on the following definition.

Definition 4 (Godambe and Thompson 1989). The EFs $p_j$, j = 1, 2, ..., k, satisfying equation (3) are
mutually orthogonal if

$$E(p_j p_i \mid \Im_i) = 0 \quad \text{and} \quad E(p_j p_i \mid \Im_j) = 0$$

for $F \in H$ and i ≠ j, i, j = 1, 2, ..., k.
We can now form a class of linear combinations of unbiased EFs as follows,

$$l = \sum_{i=1}^{k} a_i p_i, \tag{4}$$

where the $p_i$'s satisfy equation (3) and each $a_i$ is a function of the observation X and parameter θ
that is measurable with respect to the σ-algebra $\Im_i$. Theorem 2 shows how to construct an
optimal EF of the form (4).

Theorem 2 (Godambe and Thompson 1989). In the class of EFs l, the optimal EF is given by

$$l^* = \sum_{i=1}^{k} \frac{E(\partial p_i/\partial\theta \mid \Im_i)}{E(p_i^2 \mid \Im_i)}\, p_i$$

if the functions $p_i$ are mutually orthogonal, and assuming the existence of the involved derivatives
and their expectations.
This result is very general and can apply to many problems that have been studied by
different estimation methods including least squares, and maximum likelihood. In the following
section, we apply the EF approach to the estimation of ARCH models.
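
To illustrate how Theorem 2 recovers a familiar least squares estimator, the sketch below takes the elementary EFs p_i = y_i − θ x_i with known error variances. In that case the weights E(∂p_i/∂θ)/E(p_i²) equal −x_i/σ_i², and the root of the optimal EF is the weighted least squares estimate. The regression-through-the-origin setup, the data, and all function names are assumptions introduced only for this example.

```python
def optimal_weights(xs, variances):
    """Theorem 2 weights a_i = E[dp_i/dtheta] / E[p_i^2] for p_i = y_i - theta * x_i."""
    return [-x / v for x, v in zip(xs, variances)]


def optimal_ef(theta, xs, ys, variances):
    """Optimal combination l*(theta) = sum_i a_i * p_i(theta)."""
    weights = optimal_weights(xs, variances)
    return sum(a * (y - theta * x) for a, x, y in zip(weights, xs, ys))


def solve_linear_ef(xs, ys, variances):
    """l* is linear in theta, so its root has a closed form (the WLS estimate)."""
    num = sum(x * y / v for x, y, v in zip(xs, ys, variances))
    den = sum(x * x / v for x, v in zip(xs, variances))
    return num / den


xs = [1.0, 2.0, 3.0, 4.0]          # hypothetical regressor values
ys = [1.1, 1.9, 3.2, 3.9]          # hypothetical responses
variances = [1.0, 1.0, 4.0, 4.0]   # assumed known error variances
theta_wls = solve_linear_ef(xs, ys, variances)
residual = optimal_ef(theta_wls, xs, ys, variances)  # zero up to rounding
```

With equal variances the same root reduces to the ordinary least squares slope.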

3. ARCH MODEL

Suppose $y_t$ is generated by the ARCH process described as

$$y_t \mid \Im_{t-1} \sim (0, h_t), \tag{5}$$

and

$$h_t = h(y_{t-1}, y_{t-2}, \ldots, y_{t-q}, \alpha), \tag{6}$$

where $\Im_{t-1}$ represents the information set available at time t-1, q is the order of the ARCH
process, and α is a vector of unknown parameters.

An obvious choice for an estimating function is

$$g_t = y_t^2 - h_t, \quad t = 1, 2, \ldots, T. \tag{7}$$

In general, the choice of an estimating function can be viewed in a manner analogous to the
selection of moment conditions in the Generalized Method of Moments (GMM) approach of
Hansen (1982). Specific motivations for estimating functions may arise from economic or
statistical theory. For example, moment conditions may naturally arise from the expectations of
economic agents in a given economic problem.

We can easily verify that conditional on the information set $\{\Im_{t-1}, t = 1, 2, \ldots, T\}$, the $g_t$'s are
unbiased and mutually orthogonal. Consider the linear combinations of the basic EFs

$$l = \sum_{t=1}^{T} a_t g_t, \tag{8}$$

where the weights, $a_t$, are any functions of $y_t$ and α that are measurable with respect to the
information set $\{\Im_{t-1}, t = 1, 2, \ldots, T\}$.

Following Theorem 2.1 of Godambe (1985), the optimal EF is given by

$$l^* = \sum_{t=1}^{T} a_t^* g_t, \tag{9}$$

where $a_t^* = E(\partial g_t/\partial\alpha \mid \Im_{t-1}) / E(g_t^2 \mid \Im_{t-1})$.

Now based on (7), $E(\partial g_t/\partial\alpha \mid \Im_{t-1}) = -\partial h_t/\partial\alpha$ and $E(g_t^2 \mid \Im_{t-1}) = E(y_t^4 \mid \Im_{t-1}) - h_t^2$. Thus, the optimal
EF can be written as

$$l^* = -\sum_{t=1}^{T} \frac{\partial h_t}{\partial\alpha}\, \frac{y_t^2 - h_t}{E(y_t^4 \mid \Im_{t-1}) - h_t^2}. \tag{10}$$

For emphasis, we stress that (10) is based upon the finite sample and it does not depend on
any distributional assumptions for $y_t$ conditional on $\Im_{t-1}$.

Assuming conditional normality as in Engle (1982), we also have $E(y_t^4 \mid \Im_{t-1}) = 3h_t^2$, and
equation (10) simplifies to

$$l^* = -\sum_{t=1}^{T} \frac{1}{2h_t} \frac{\partial h_t}{\partial\alpha} \left(\frac{y_t^2}{h_t} - 1\right). \tag{11}$$

Comparing this with the first order condition of equation (7) in Engle (1982), we note that they
are equivalent up to a sign change. Therefore, we conclude that, under the additional assumption
of normality, the theory of EFs and the maximum likelihood method give the same estimate for
parameters in the ARCH model.
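
The relationship between equations (10) and (11) can be checked numerically. The sketch below evaluates both for an ARCH(1) specification, $h_t = \alpha_0 + \alpha_1 y_{t-1}^2$, which is an assumed special case of (6). The function names, the parameter values, and the short return series are hypothetical; `fourth_moment` stands for whatever conditional fourth moment the user is willing to specify.

```python
def arch1_variance(y_prev, alpha0, alpha1):
    """Conditional variance h_t = alpha0 + alpha1 * y_{t-1}^2 (assumed ARCH(1) form)."""
    return alpha0 + alpha1 * y_prev ** 2


def optimal_ef_general(ys, alpha0, alpha1, fourth_moment):
    """Equation (10): -sum dh_t/dalpha * (y_t^2 - h_t) / (E[y_t^4 | past] - h_t^2).

    fourth_moment(h) returns the assumed conditional fourth moment given h_t.
    Returns the components for (alpha0, alpha1).
    """
    l0 = l1 = 0.0
    for y_prev, y in zip(ys[:-1], ys[1:]):
        h = arch1_variance(y_prev, alpha0, alpha1)
        scale = (y ** 2 - h) / (fourth_moment(h) - h ** 2)
        l0 -= 1.0 * scale            # dh_t/dalpha0 = 1
        l1 -= y_prev ** 2 * scale    # dh_t/dalpha1 = y_{t-1}^2
    return l0, l1


def optimal_ef_normal(ys, alpha0, alpha1):
    """Equation (11): -sum (1 / (2 h_t)) * dh_t/dalpha * (y_t^2 / h_t - 1)."""
    l0 = l1 = 0.0
    for y_prev, y in zip(ys[:-1], ys[1:]):
        h = arch1_variance(y_prev, alpha0, alpha1)
        scale = (y ** 2 / h - 1.0) / (2.0 * h)
        l0 -= scale
        l1 -= y_prev ** 2 * scale
    return l0, l1


ys = [0.3, -1.1, 0.8, 2.0, -0.4, 0.1, -1.6]  # hypothetical return series
general = optimal_ef_general(ys, 0.5, 0.3, fourth_moment=lambda h: 3.0 * h ** 2)
normal = optimal_ef_normal(ys, 0.5, 0.3)
```

Supplying a different `fourth_moment` (for example, one implied by a heavier-tailed distribution) changes only the weights in (10), not the elementary EF $y_t^2 - h_t$.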

The EF method we develop is based on a semiparametric model that is not fully specified by
the parameters of interest, whereas the ML method is based on a parametric model, in which the
model is fully described under the assumption of conditional normality. Equation (10) is valid for
any conditional distributions satisfying the mean-variance structure assumed in equations (5) and
(6). From an EF viewpoint, equation (11) is valid assuming conditional normality, or any other
conditional distribution in which $E(y_t^4 \mid \Im_{t-1}) = 3h_t^2$. If the exact distribution is unknown, equation
(10) should be used instead of (11). Godambe and Thompson (1989) show that equation (10) can
be interpreted as a quasi-score function because it possesses properties similar to an ordinary
score function. In this context, we can define the information matrix as the expectation of the
Hessian averaged over all observations. (The information matrix can be estimated by

$$\frac{1}{T} \sum_{t=1}^{T} \frac{1}{(\gamma_{2t} + 2) h_t^2} \frac{\partial h_t}{\partial\alpha} \frac{\partial h_t}{\partial\alpha'},$$

which is identical to the estimate provided in Engle (1982) equation (14), when
standardized kurtosis, $\gamma_{2t}$, is zero.)

4. ARCH REGRESSION MODEL

An ARCH regression model can be expressed as

$$y_t \mid \Im_{t-1} \sim (x_t\beta, h_t)$$

$$h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2$$

for unknown coefficient vectors α and β, and where $\varepsilon_t = y_t - x_t\beta$.

We can form two basic EFs

$$g_{1t} = y_t - x_t\beta, \quad \text{and}$$

$$g_{2t}' = (y_t - x_t\beta)^2 - h_t.$$

Unfortunately, $g_{2t}'$ is not orthogonal to the linear EF, $g_{1t}$. We adopt the orthogonalization
procedure in Doob (1953) to produce an orthogonal EF,

$$g_{2t} = (y_t - x_t\beta)^2 - h_t - \gamma_{1t} h_t^{1/2} (y_t - x_t\beta), \tag{12}$$

where $\gamma_{1t} = E[(y_t - x_t\beta)^3 \mid \Im_{t-1}] / h_t^{3/2}$ is the skewness of $y_t$ conditional on $\Im_{t-1}$.
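
The effect of the orthogonalization in (12) can be verified exactly on a small discrete distribution. The sketch below uses a hypothetical skewed three-point distribution for the disturbance $y_t - x_t\beta$, computes its variance and skewness, and evaluates the cross moment of the linear EF with the quadratic EF before and after the correction. The distribution and all names are assumptions made for illustration.

```python
def expectation(values, probs, f):
    """Exact expectation of f(eps) under a discrete distribution."""
    return sum(p * f(v) for v, p in zip(values, probs))


# Hypothetical skewed three-point distribution for eps_t = y_t - x_t * beta.
raw_values = [-1.0, 0.0, 3.0]
probs = [0.5, 0.3, 0.2]

mean = expectation(raw_values, probs, lambda v: v)
values = [v - mean for v in raw_values]            # centre so that E[eps] = 0
h = expectation(values, probs, lambda e: e ** 2)   # conditional variance h_t
gamma1 = expectation(values, probs, lambda e: e ** 3) / h ** 1.5  # skewness


def g1(e):
    return e  # linear EF


def g2_raw(e):
    return e ** 2 - h  # quadratic EF before orthogonalization


def g2(e):
    return e ** 2 - h - gamma1 * h ** 0.5 * e  # equation (12)


# Before the correction the cross moment equals E[eps^3], which is nonzero here.
cross_raw = expectation(values, probs, lambda e: g1(e) * g2_raw(e))
# After the correction the cross moment vanishes up to rounding.
cross_orth = expectation(values, probs, lambda e: g1(e) * g2(e))
```

The correction term removes exactly the third-moment contribution, so the two EFs are orthogonal for any distribution with the stated skewness.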

We now form the linear combination of these basic EFs to estimate the coefficient vectors α
and β,

$$l_1 = \sum_{t=1}^{T} a_{1t} g_{1t} + \sum_{t=1}^{T} a_{2t} g_{2t}$$

$$l_2 = \sum_{t=1}^{T} b_{1t} g_{1t} + \sum_{t=1}^{T} b_{2t} g_{2t}. \tag{13}$$

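Consider the class of all EFs $(l_1, l_2)$ given by (13). Following Godambe and Thompson
(1989) Theorem 2.1, the jointly optimal EFs $(l_1^*, l_2^*)$ are given by (13) with
$a_{it} = a_{it}^*$, $b_{it} = b_{it}^*$, i = 1, 2, and t = 1, 2, ..., T, where

$$a_{1t}^* = \frac{E(\partial g_{1t}/\partial\alpha \mid \Im_{t-1})}{E(g_{1t}^2 \mid \Im_{t-1})} = 0, \qquad
a_{2t}^* = \frac{E(\partial g_{2t}/\partial\alpha \mid \Im_{t-1})}{E(g_{2t}^2 \mid \Im_{t-1})} = \frac{-\partial h_t/\partial\alpha}{h_t^2 (\gamma_{2t} + 2 - \gamma_{1t}^2)},$$

$$b_{1t}^* = \frac{E(\partial g_{1t}/\partial\beta \mid \Im_{t-1})}{E(g_{1t}^2 \mid \Im_{t-1})} = -\frac{\partial x_t\beta/\partial\beta}{h_t}, \quad \text{and} \quad
b_{2t}^* = \frac{E(\partial g_{2t}/\partial\beta \mid \Im_{t-1})}{E(g_{2t}^2 \mid \Im_{t-1})} = \frac{h_t^{1/2}\gamma_{1t}\, \partial x_t\beta/\partial\beta - \partial h_t/\partial\beta}{h_t^2 (\gamma_{2t} + 2 - \gamma_{1t}^2)}.$$

In general, $a_{1t}^*$, $a_{2t}^*$, $b_{1t}^*$, and $b_{2t}^*$ will be vector quantities with dimensions determined by the
dimensions of α and β. The coefficient $\gamma_{2t} = E[(y_t - x_t\beta)^4 \mid \Im_{t-1}] / h_t^2 - 3$ represents the
standardized kurtosis.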
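
The common denominator of the weights on $g_{2t}$ follows from expanding the square of (12). Writing $\varepsilon_t = y_t - x_t\beta$ and using only the definitions of $h_t$, $\gamma_{1t}$, and $\gamma_{2t}$ given above (all expectations conditional on $\Im_{t-1}$), a sketch of the step is:

```latex
\begin{aligned}
E(g_{2t}^2)
 &= E(\varepsilon_t^4) + h_t^2 + \gamma_{1t}^2 h_t\, E(\varepsilon_t^2)
    - 2 h_t\, E(\varepsilon_t^2)
    - 2 \gamma_{1t} h_t^{1/2} E(\varepsilon_t^3)
    + 2 \gamma_{1t} h_t^{3/2} E(\varepsilon_t) \\
 &= (\gamma_{2t} + 3) h_t^2 + h_t^2 + \gamma_{1t}^2 h_t^2
    - 2 h_t^2 - 2 \gamma_{1t}^2 h_t^2 + 0 \\
 &= h_t^2 \left(\gamma_{2t} + 2 - \gamma_{1t}^2\right).
\end{aligned}
```

The second line substitutes $E(\varepsilon_t) = 0$, $E(\varepsilon_t^2) = h_t$, $E(\varepsilon_t^3) = \gamma_{1t} h_t^{3/2}$, and $E(\varepsilon_t^4) = (\gamma_{2t} + 3) h_t^2$.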

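Now, the optimal EFs can be written as

$$l_1^* = -\sum_{t=1}^{T} \frac{\partial h_t/\partial\alpha}{h_t^2 (\gamma_{2t} + 2 - \gamma_{1t}^2)}\, g_{2t}$$

$$l_2^* = -\sum_{t=1}^{T} \frac{\partial x_t\beta/\partial\beta}{h_t}\, g_{1t} + \sum_{t=1}^{T} \frac{h_t^{1/2}\gamma_{1t}\, \partial x_t\beta/\partial\beta - \partial h_t/\partial\beta}{h_t^2 (\gamma_{2t} + 2 - \gamma_{1t}^2)}\, g_{2t}. \tag{14}$$

We again stress that this result is very general in the sense that no distributional assumptions on
$y_t \mid \Im_{t-1}$ are made. The usual result obtained under conditional normality is recoverable by
imposing $\gamma_{1t} = 0$ and $\gamma_{2t} = 0$. In this case the optimal EFs become

$$l_1^* = -\sum_{t=1}^{T} \frac{1}{2h_t} \frac{\partial h_t}{\partial\alpha} \left(\frac{\varepsilon_t^2}{h_t} - 1\right)$$

$$l_2^* = -\sum_{t=1}^{T} \frac{\varepsilon_t x_t'}{h_t} - \sum_{t=1}^{T} \frac{1}{2h_t} \frac{\partial h_t}{\partial\beta} \left(\frac{\varepsilon_t^2}{h_t} - 1\right).$$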

The resulting optimal estimating equations $l_1^* = 0$ and $l_2^* = 0$ are equivalent to the first order
conditions of equations (7) and (20) in Engle (1982) under the assumption of conditional
normality (up to a sign change). The contribution of $\gamma_{1t}$ and $\gamma_{2t}$ to the estimation of α and β will
be important when the underlying conditional distribution displays third and fourth moments that
deviate from normality. The simplified quasi-likelihood equations obtained under the normality
assumption will be inefficient for distributions with nonzero values of $\gamma_{1t}$ and $\gamma_{2t}$ (c.f., the
discussion in Engle and Gonzalez-Rivera (1991)). In contrast, the results obtained from (14) will
be more efficient even if only an approximate specification of $\gamma_{1t}$ and $\gamma_{2t}$ is available. The
orthogonality of the functions $g_{1t}$ and $g_{2t}$ holds for any value of $\gamma_{2t}$. This suggests that even an
approximate value for $\gamma_{2t}$ can be used to give near optimal estimating functions $l_1^*$ and $l_2^*$. We
document these benefits in sections 6 and 7 after discussing some properties of optimal estimating
functions and their resultant estimators in section 5.

5. PROPERTIES OF OPTIMAL ESTIMATING FUNCTIONS

The optimal EFs $l_1^*$ and $l_2^*$ obtained in section 4 are martingales; thus, they are sometimes
called the optimal martingale estimating functions. According to the martingale central limit
theorems given in Hall and Heyde (1980), and under some mild conditions, the estimator $\hat\theta_{EF}$ derived from the optimal EFs obtained
after orthogonalization, standardization and optimal combination has the property

$$T^{1/2} (\hat\theta_{EF} - \theta) \to MVN(0, V_{EF}^{-1}),$$

where $V_{EF} = E(\partial l_i^*/\partial\theta_j)$, i, j = 1, 2. (Interested readers are referred to Godambe and Heyde (1987),
Anh (1988), Heyde and Lin (1992), or a more recent book by Heyde (1997).)

Crowder (1986) discusses explicit conditions under which the estimators from the EFs
converge in probability to the true parameters. Using Theorem 3.3 from Crowder (1986), weak
convergence of our estimator can be established using the optimal estimating functions presented
in section 4. Recently, Chen (1993) has provided proper conditions and a rigorous proof of
strong consistency for both linear and quadratic EFs. Optimal estimating functions provide an
approximation to the underlying true score functions and have similar properties to a score
function. Hence the results of Hutton and Nelson (1986) on the asymptotic consistency and
normality of maximum quasi-likelihood estimates for semi-martingales may be applied, assuming a
martingale difference structure for both the mean and variance of $y_t \mid \Im_{t-1}$.

In the theory of EFs, the emphasis of study is the EFs themselves, rather than the resultant
estimates. Efficiency is measured with respect to the EFs. Bhapkar (1972) proposed an
efficiency measure that is essentially the inverse of Godambe’s criterion in the case of one
dimension. The Bhapkar efficiency measure is the variance of the estimator derived from the
corresponding EF. In this paper, we follow the tradition of comparing competing estimators by
their variance.

Definition 5. The relative efficiency of the estimate for the parameter θ, derived from the optimal
EF, is the ratio of the variance obtained by the ML method when the true density function is
assumed, to the variance derived from the optimal EF when only the first few moments are
assumed, i.e.,

$$RE_\theta = \frac{\mathrm{Var}(\hat\theta_{ML})}{\mathrm{Var}(\hat\theta_{EF})}.$$

Given our interest in the finite sample properties of our estimators, we also report the bias
and root mean squared error for alternative estimators in our empirical analysis.

6. COMPUTING THE RELATIVE EFFICIENCY OF THE ESTIMATING FUNCTION APPROACH IN ARCH MODELS

Weiss (1986), and Bollerslev and Wooldridge (1988) show that under a correct specification
of the first and second moments, consistent estimates of the parameters of the ARCH model can
be obtained by maximizing a likelihood function constructed under the assumption of conditional
normality, even when the true density deviates from normality. This approach is now called the
quasi-maximum likelihood (QML) method. Engle and Gonzalez-Rivera (1991) quantify the loss
of efficiency that results when the QML estimator is employed, using Monte Carlo simulations for
two densities -- one leptokurtic and the other positively skewed. They find that the efficiency of the
QML method for a Gamma distribution is particularly low and conclude (p. 347), “It is
worthwhile searching for estimators that can improve on QMLE.”

We adopt the theory of EFs to estimate parameters in ARCH type models. In this section
we state the relative efficiency measures in the special case of a Student t or Gamma distribution
to demonstrate the potential efficiency gains from the EF approach. In the next section, we report
the behavior of EF estimators in a Monte Carlo experiment to allow us to compare the finite
sample performance of the EF approach to ML, QML, and other semiparametric approaches such
as Engle and Gonzalez-Rivera (1991), or Drost and Klaassen (1997).

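Consider a GARCH(1,1) process to describe the dynamics of asset returns for scalar valued
parameters α and β,

$$y_t \mid \Im_{t-1} \sim (0, h_t)$$

$$h_t = (1 - \alpha - \beta) + \alpha y_{t-1}^2 + \beta h_{t-1}.$$

The optimal EFs for the GARCH(1,1) model can now be written as

$$l_1^* = -\sum_{t=1}^{T} \frac{\partial h_t/\partial\alpha}{h_t^2 (\gamma_{2t} + 2 - \gamma_{1t}^2)}\, g_{2t}, \quad \text{and}$$

$$l_2^* = -\sum_{t=1}^{T} \frac{\partial h_t/\partial\beta}{h_t^2 (\gamma_{2t} + 2 - \gamma_{1t}^2)}\, g_{2t}. \tag{15}$$

The asymptotic variance-covariance matrix of the coefficient vector (α, β)′ may be written as the
2 by 2 matrix $V^{-1}$, where $V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$ has elements given by

$$V_{11} = E\left(\frac{\partial l_1^*}{\partial\alpha} \,\Big|\, \Im_{t-1}\right), \quad
V_{12} = V_{21} = E\left(\frac{\partial l_1^*}{\partial\beta} \,\Big|\, \Im_{t-1}\right), \quad \text{and} \quad
V_{22} = E\left(\frac{\partial l_2^*}{\partial\beta} \,\Big|\, \Im_{t-1}\right).$$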
In sections 6.1 and 6.2 we consider the relative efficiency of the EF approach for the
Student t and Gamma distributions.
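
Evaluating the GARCH(1,1) estimating functions in (15) requires $h_t$ and its derivatives, which can be obtained by differentiating the variance recursion term by term. The sketch below does this under several assumptions that are not stated in the text: the recursion is started from a fixed value `h0 = 1` (the unconditional variance), the skewness and kurtosis parameters are held constant over time, and $g_{2t}$ is taken from (12) with a zero conditional mean. All function names and the data are hypothetical.

```python
def garch11_path(ys, alpha, beta, h0=1.0):
    """Return (h_t, dh_t/dalpha, dh_t/dbeta) for each t with a lagged observation.

    h_t = (1 - alpha - beta) + alpha * y_{t-1}^2 + beta * h_{t-1}.
    Assumption: the recursion starts from the fixed value h0, so the
    derivatives of the starting value are zero.
    """
    h, dh_da, dh_db = h0, 0.0, 0.0
    path = []
    for y_prev in ys[:-1]:
        # Derivative recursions use the previous h, so update them first.
        dh_da_new = -1.0 + y_prev ** 2 + beta * dh_da
        dh_db_new = -1.0 + h + beta * dh_db
        h = (1.0 - alpha - beta) + alpha * y_prev ** 2 + beta * h
        dh_da, dh_db = dh_da_new, dh_db_new
        path.append((h, dh_da, dh_db))
    return path


def optimal_efs(ys, alpha, beta, gamma1=0.0, gamma2=0.0):
    """Equation (15) with constant gamma1 and gamma2 (an assumption)."""
    l1 = l2 = 0.0
    for (h, dh_da, dh_db), y in zip(garch11_path(ys, alpha, beta), ys[1:]):
        g2 = y ** 2 - h - gamma1 * h ** 0.5 * y  # equation (12), zero mean
        denom = h ** 2 * (gamma2 + 2.0 - gamma1 ** 2)
        l1 -= dh_da / denom * g2
        l2 -= dh_db / denom * g2
    return l1, l2


ys = [0.2, -1.0, 0.5, 1.5, -0.7, 0.3]  # hypothetical returns
l1, l2 = optimal_efs(ys, alpha=0.1, beta=0.8, gamma1=0.5, gamma2=1.0)
```

An estimate of (α, β) would then be a root of both functions, found numerically from some starting value.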

6.1 Student’s t Distribution


Assume that the conditional density of $y_t$ follows a Student's t distribution with v
(v ≥ 5) degrees of freedom,

$$f(y_t \mid \Im_{t-1}) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)} \frac{1}{\sqrt{\pi (v-2) h_t}} \left[1 + \frac{y_t^2}{(v-2) h_t}\right]^{-(v+1)/2}. \tag{16}$$

The moment structure for this distribution up to the fourth order is

$$E(y_t \mid \Im_{t-1}) = 0, \quad E(y_t^2 \mid \Im_{t-1}) = h_t, \quad \gamma_{1t} = 0, \quad \gamma_{2t} = \frac{6}{v-4}.$$

This distribution is symmetric about 0 and exhibits leptokurtosis. As v tends to infinity, this
distribution tends to the Normal distribution, $N(0, h_t)$.
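
The stated moments can be checked by integrating the density numerically. The sketch below assumes the standard variance-scaled form of the Student's t density (normalizing constant Γ((v+1)/2) / [Γ(v/2) √(π(v−2)h_t)]) and approximates the second and fourth moments with a trapezoidal rule on a truncated range. The truncation limit, the step count, and the choice v = 8 are assumptions made for illustration.

```python
import math


def student_t_density(y, h, v):
    """Student's t density with v degrees of freedom, scaled to have variance h."""
    log_const = (
        math.lgamma((v + 1) / 2)
        - math.lgamma(v / 2)
        - 0.5 * math.log(math.pi * (v - 2) * h)
    )
    return math.exp(log_const - (v + 1) / 2 * math.log1p(y * y / ((v - 2) * h)))


def moment(k, h, v, limit=200.0, steps=100_000):
    """Trapezoidal approximation of E[y^k] on [-limit, limit] (tails are truncated)."""
    width = 2.0 * limit / steps
    total = 0.0
    for i in range(steps + 1):
        y = -limit + i * width
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * y ** k * student_t_density(y, h, v)
    return total * width


h, v = 1.0, 8
variance = moment(2, h, v)                       # should be close to h
excess_kurtosis = moment(4, h, v) / variance ** 2 - 3.0  # compare with 6 / (v - 4)
```

For small v the fourth-moment integrand decays slowly, so a wider range or a change of variable would be needed to reach the same accuracy.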

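It is straightforward to demonstrate that the relative efficiency of the EF estimators for α
and β may be stated as

$$RE_\alpha = \frac{v-1}{2(v-4)} \left[ \frac{\sum_{t=1}^{T} \frac{1}{h_t^2} \left(\frac{\partial h_t}{\partial\beta}\right)^2 \left(1 - \frac{(v+1)\, y_t^2}{y_t^2 + h_t (v-2)}\right)^2 \Big/ |I|}{\sum_{t=1}^{T} \frac{1}{h_t^2} \left(\frac{\partial h_t}{\partial\beta}\right)^2 \Big/ |V|} \right], \quad \text{and}$$

$$RE_\beta = \frac{v-1}{2(v-4)} \left[ \frac{\sum_{t=1}^{T} \frac{1}{h_t^2} \left(\frac{\partial h_t}{\partial\alpha}\right)^2 \left(1 - \frac{(v+1)\, y_t^2}{y_t^2 + h_t (v-2)}\right)^2 \Big/ |I|}{\sum_{t=1}^{T} \frac{1}{h_t^2} \left(\frac{\partial h_t}{\partial\alpha}\right)^2 \Big/ |V|} \right], \quad \text{respectively,} \tag{17}$$

where $V^{-1}$ is the asymptotic variance-covariance matrix of the EF estimators for α and β, $|V|$ is
the determinant of V, $I^{-1}$ is the asymptotic variance-covariance matrix of the ML estimators of α
and β, and $|I|$ is the determinant of I.
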
6.2 Gamma Distribution
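Suppose that the conditional density of $y_t$ follows a Gamma distribution with shape
parameter c. In this case, the conditional density and first four moments are

$$f(y_t \mid \Im_{t-1}) = \frac{\sqrt{c}}{\sqrt{h_t}\, \Gamma(c)} \left(\frac{\sqrt{c}\, y_t}{\sqrt{h_t}} + c\right)^{c-1} \exp\left[-\left(\frac{\sqrt{c}\, y_t}{\sqrt{h_t}} + c\right)\right] \tag{18}$$

$$E(y_t \mid \Im_{t-1}) = 0, \quad \mathrm{Var}(y_t \mid \Im_{t-1}) = h_t,$$

where $\gamma_1 = 2/\sqrt{c}$ and $\gamma_2 = 6/c$.

Following the development in the appendix, the relative efficiency measures for α and β
under the Gamma distribution may be computed as

$$RE_\alpha = \frac{2(1+c)}{c} \left[ \frac{\sum_{t=1}^{T} \frac{1}{4} \frac{1}{h_t^2} \left(\frac{\partial h_t}{\partial\beta}\right)^2 \left(\frac{c - u_t^2}{c + u_t}\right)^2 \Big/ |I|}{\sum_{t=1}^{T} \frac{1}{h_t^2} \left(\frac{\partial h_t}{\partial\beta}\right)^2 \Big/ |V|} \right], \quad \text{and}$$

$$RE_\beta = \frac{2(1+c)}{c} \left[ \frac{\sum_{t=1}^{T} \frac{1}{4} \frac{1}{h_t^2} \left(\frac{\partial h_t}{\partial\alpha}\right)^2 \left(\frac{c - u_t^2}{c + u_t}\right)^2 \Big/ |I|}{\sum_{t=1}^{T} \frac{1}{h_t^2} \left(\frac{\partial h_t}{\partial\alpha}\right)^2 \Big/ |V|} \right]. \tag{19}$$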

Comparing our results with the QML measures of relative efficiency in Engle and Gonzalez-
Rivera (1991), we find that the two approaches produce identical results when the conditional
distribution of $y_t$ follows a Student's t distribution; however, assuming a conditional Gamma
distribution leads to a substantial contrast in the approaches. The equivalence in the results for
the Student's t distribution occurs because of the symmetry in the distribution ($\gamma_{1t} = 0$), and the
special relationship between the second and fourth moments. The Gamma distribution differs
from the Normal distribution primarily with respect to skewness. For quadratic EFs to perform
well, information regarding both third and fourth moments is required. The quasi-maximum
likelihood (QML) estimator maximizes the Normal log likelihood function based on the mean and
variance. This approach will be inappropriate when the data display serious departures from
normality in the third and fourth moments. An interesting alternative procedure used in
generalized statistical models is the method of maximum quasi-likelihood (MQL) estimation (c.f.,
Godambe and Heyde 1987, or Heyde and Lin 1992). This approach uses a quasi-likelihood
function based only on the first few moments of the distribution.

Comparing our results with the QML results of Engle and Gonzalez-Rivera (1991) under
the assumption of a conditional Gamma distribution, we find that the optimal EF differs from the
score function and hence produces different estimators. The relative efficiency of the EF
approach is equal to that of the QML estimator (c.f., Engle and Gonzalez-Rivera 1991) multiplied
by the constant factor of (3+c)/(1+c). Therefore, the optimal EF estimator is always more
efficient than the QML estimator. For small c values the efficiency gain is substantial; however, as
c increases this factor approaches 1 and the variance of the QML estimator approaches its lower
bound. Thus, the QML estimator becomes more efficient as c tends to infinity and the Gamma
distribution converges to the Normal distribution. The EF approach uses the additional
information inherent in the density’s skewness and kurtosis to form the optimal EF. The
prevalence of deviations from normality in third and fourth moments in empirical studies of stock
returns, short term interest rates, and exchange rates suggests incorporating this information into
the estimation approach is important (c.f., Rogalski and Vinso 1978, Hsieh 1989, Schwert 1989,
Engle, Ng, and Rothschild 1990, and Mills 1995, among others). The theory of EFs provides a
direct method to capitalize on this information.
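
The size of the gain can be tabulated directly from the moment parameters. The sketch below assumes that the ratio of the QML variance to the EF variance takes the form (γ2 + 2) / (γ2 + 2 − γ1²). This expression is not stated in the text; it is used here because it reproduces both results reported above, namely the factor (3+c)/(1+c) for the Gamma moments γ1 = 2/√c, γ2 = 6/c, and a factor of one for the symmetric Student's t case. Function names are hypothetical.

```python
def ef_gain_over_qml(skewness, excess_kurtosis):
    """Assumed ratio Var(QML) / Var(EF) = (g2 + 2) / (g2 + 2 - g1^2)."""
    return (excess_kurtosis + 2.0) / (excess_kurtosis + 2.0 - skewness ** 2)


def gamma_moments(c):
    """Skewness and excess kurtosis of the standardized Gamma density with shape c."""
    return 2.0 / c ** 0.5, 6.0 / c


def student_t_moments(v):
    """Skewness and excess kurtosis of the Student's t density with v > 4."""
    return 0.0, 6.0 / (v - 4.0)


for c in (2, 12, 30):
    skew, kurt = gamma_moments(c)
    print(f"Gamma, c={c}: gain factor {ef_gain_over_qml(skew, kurt):.4f}")
for v in (5, 8, 12):
    skew, kurt = student_t_moments(v)
    print(f"Student t, v={v}: gain factor {ef_gain_over_qml(skew, kurt):.4f}")
```

The factor shrinks toward one as c grows, in line with the convergence of the Gamma distribution to the Normal distribution.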

7. MONTE CARLO SIMULATION

In this section we report Monte Carlo results demonstrating the benefits of using the EF
approach in the context of nonnormal data. We present finite sample properties for EF estimators
relative to both ML and QML estimators for both a moderate and large sample of observations.
In each simulation we consider a moderately and highly persistent GARCH process.

We adopt a simple specification for skewness and kurtosis to examine the potential of the
EF approach. To admit meaningful comparisons with prior simulation results of Engle and
Gonzalez-Rivera (1991), and Drost and Klaassen (1997), we generate a GARCH(1,1) series of length
T=500 or T=1,000 for values of (α, β) given by (.1, .8) or (.05, .9). The GARCH
process with parameters α and β is described in detail in section 6. For each (α, β) pair, we
consider six error distributions: a Normal distribution; a Student t distribution with 5, 8, or 12
degrees of freedom; and a Gamma distribution with shape parameter of 12 or 30. In all cases, the
unconditional mean and variance are zero and one, respectively. For each generated series, we
then estimate (α, β) using the ML, QML, or EF approach.
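
A data-generating step of this kind might be sketched as follows. The code draws standardized innovations (mean zero, variance one) from a Normal, Student's t, or Gamma distribution and feeds them through the GARCH(1,1) recursion of section 6. The burn-in length, the starting value for the conditional variance, the seeding scheme, and all names are assumptions; the text does not specify these details.

```python
import math
import random


def standardized_draw(rng, dist, param=None):
    """Draw one innovation with mean 0 and variance 1."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "student_t":  # param = degrees of freedom v > 2
        chi2 = rng.gammavariate(param / 2.0, 2.0)  # chi-square with v d.o.f.
        t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / param)
        return t * math.sqrt((param - 2.0) / param)  # rescale to unit variance
    if dist == "gamma":  # param = shape c
        return (rng.gammavariate(param, 1.0) - param) / math.sqrt(param)
    raise ValueError(f"unknown distribution: {dist}")


def simulate_garch11(T, alpha, beta, dist="normal", param=None, seed=0, burn_in=200):
    """Simulate y_t = sqrt(h_t) * z_t with
    h_t = (1 - alpha - beta) + alpha * y_{t-1}^2 + beta * h_{t-1}.

    The burn-in, the starting value h = 1, and the seeding are assumptions.
    """
    rng = random.Random(seed)
    h, y = 1.0, 0.0
    ys, hs = [], []
    for t in range(T + burn_in):
        if t > 0:
            h = (1.0 - alpha - beta) + alpha * y ** 2 + beta * h
        y = math.sqrt(h) * standardized_draw(rng, dist, param)
        if t >= burn_in:
            ys.append(y)
            hs.append(h)
    return ys, hs


ys, hs = simulate_garch11(500, 0.1, 0.8, dist="gamma", param=12, seed=1)
```

Repeating the call with different seeds gives the independent replications over which the estimators are compared.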

Given that the true error distribution is unknown, ML estimation represents an unattainable
outcome in practice. Nonetheless, ML results provide a meaningful bound for comparison
purposes. To maintain comparability with the work of Drost and Klaassen, initial values for the
EF approach are given by the QML estimates. Our empirical application of the EF approach
numerically minimizes the sum of the squared optimal estimating functions from equation (15),
$l_1^{*2} + l_2^{*2}$. Defining the optimal EFs requires a specification for the skewness and kurtosis
parameters, $\gamma_{1t}$ and $\gamma_{2t}$. As a first approximation, we propose the following simple approach.
For each conditional series analyzed, we standardize the series to have a sample mean of zero and
a sample variance of one. The skewness and kurtosis parameters used in estimation are the
sample means of the third power of the standardized series, and the fourth power of the
standardized series, less 3. Future research is warranted to examine the possible benefits that are
attainable through more complex estimation strategies for these nuisance parameters. Possible
alternatives worth consideration include allowing these parameters to follow temporal processes,
or to allow them to iterate within the estimation process.
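
The simple moment-based specification described above can be sketched as follows. The function name is hypothetical, and the use of the population standard deviation (divisor T rather than T − 1) is an assumption made here so that the standardized series has a sample second moment of exactly one.

```python
import numpy as np

def nuisance_moments(series):
    """Return (skewness, excess kurtosis) of the standardized series."""
    x = np.asarray(series, dtype=float)
    # Standardize to sample mean zero and sample variance one (divisor T, an assumption).
    z = (x - x.mean()) / x.std()
    gamma1 = np.mean(z ** 3)        # sample mean of the third power
    gamma2 = np.mean(z ** 4) - 3.0  # sample mean of the fourth power, less 3
    return gamma1, gamma2

# A symmetric toy series has zero skewness and, here, negative excess kurtosis.
g1, g2 = nuisance_moments([-2.0, -1.0, 0.0, 1.0, 2.0])
```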

Table 1 reports summary measures for each of the estimated parameters based on 2500
replications of the above experiment. The first column of the table describes the error distribution
used to generate the data. For each (α , β ) pair considered, we estimate the GARCH parameters
$(\hat{\alpha}, \hat{\beta})$ under ML, QML or the EF approach. The sample means $(\hat{\alpha}, \hat{\beta})$, standard deviations
$(\hat{\sigma}_{\hat{\alpha}}, \hat{\sigma}_{\hat{\beta}})$, biases $(\mathrm{bias}_{\hat{\alpha}}, \mathrm{bias}_{\hat{\beta}})$, and root mean squared errors
$(\mathrm{rmse}_{\hat{\alpha}}, \mathrm{rmse}_{\hat{\beta}})$ for each estimation
approach are presented in the remaining columns of the table.
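
As an illustration of how such summary measures relate to one another, the sketch below computes them from a vector of replicated estimates. The function name is hypothetical, and the divisor-T standard deviation is an assumption chosen so that rmse² = sd² + bias² holds exactly.

```python
import numpy as np

def summarize(estimates, true_value):
    """Sample mean, standard deviation, bias and rmse across replications."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    sd = est.std()  # divisor T (assumption), so rmse**2 == sd**2 + bias**2
    bias = mean - true_value
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return {"mean": mean, "sd": sd, "bias": bias, "rmse": rmse}

# Toy usage with made-up replications (not the paper's data).
rng = np.random.default_rng(0)
stats = summarize(0.1 + 0.05 * rng.standard_normal(2500), true_value=0.1)
```

The same identity can be used to sanity-check a reported row: a standard deviation of .066 and a bias of .012 imply an rmse of about .067.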

*** insert Table 1 about here ***

Panels A and B of Table 1 report the simulation results for our moderate sample size
experiment with T=500 observations. Moderate and high persistence series are given by
(α , β )=(.1, .8) or (.05, .9), respectively. In the first row of each panel we report the simulation
results for the Normal distribution (in which case, ML and QML estimation are equivalent). The
next three rows of each panel present summary results for Student t innovations with 5, 8, or 12
degrees of freedom. The final two rows of the panel detail the results for the Gamma distribution
with shape parameter given by 12 or 30.

The reported standard errors of the estimated parameters show the familiar result that the
QML estimators are inefficient relative to the unattainable ML estimators. As an example,
consider the case of moderate persistence for the heavily tailed Student-t distribution with five
degrees of freedom in panel A. The standard deviation of the 2500 estimated α parameters is .05
for the ML approach and .066 for the QML estimates. Thus, in this example the QML estimator
suffers a loss in efficiency relative to the ML estimator. The EF approach partially recovers this
loss in efficiency as suggested by the reported standard deviation of .062 for α .

The EF approach is based on unbiased estimating functions in the finite sample, not unbiased
estimators of the underlying parameters. For this reason, we report the finite sample bias and root
mean squared errors for each of the estimators. Continuing with our previous example, we
observe that the EF approach shows a smaller finite sample bias and root mean squared error
(rmse) for both α and β relative to the QML estimates.

The remaining results in panel A show that the EF approach partially recovers the QML
efficiency loss in virtually every instance. In addition, the finite sample bias of the EF approach is
often less than the QML bias, and the EF rmse is less than the QML rmse in every case but one.
The exception is the case of normality, where the rmse for α is improved and the
rmse for β is slightly worsened.

The high persistence results in panel B display a similar pattern. The EF standard deviations
are always smaller than the comparable QML estimator standard deviations in all cases of
nonnormal data. Similarly, in all cases of nonnormal data, the EF rmse always improves upon the
QML estimator rmse. In the case of normality the rmse results for the ML, QML and EF
approaches are virtually identical.

Panels C and D of Table 1 report the estimation results assuming an underlying GARCH
process given by (α , β )=(.1, .8) or (.05, .9), respectively, for our large sample experiment with
T=1,000 observations. The presentation format follows that in panels A and B.

In every case considered in panels C and D, we observe that the ML estimator displays less
absolute bias than the comparable QML estimator. Surprisingly, we also find that the EF
estimator always displays less bias than the QML estimator (with the exception of the case when
ML and QML are equivalent under normality). This finding suggests that the EF approach can be
used to improve the location of QML estimates even with our proposed simple specification for
skewness and kurtosis.

The standard deviations of the EF estimator relative to the ML and QML estimators in panels
C and D also behave well for the larger sample results. The information available from empirical
third and fourth moments can be used to improve the efficiency of the QML estimator. Comparison
of the reported EF standard deviations to the QML standard deviations suggests a marked
improvement in virtually every case. The sole exception to this result for nonnormal data occurs
for the Student t distribution with 12 degrees of freedom in panel C. In this case, we observe
$(\hat{\sigma}_{\hat{\alpha}}, \hat{\sigma}_{\hat{\beta}})$ equal to (.032, .097) and (.033, .085) for the QML and EF estimators, respectively. In
general we conclude that the EF focus on the finite sample from the outset leads to a substantial
increase in efficiency. This finding is especially important given the lack of bias found in the EF
estimator.

Drost and Klaassen (DK, 1997) propose an alternative semiparametric estimator to that of
Engle and Gonzalez-Rivera (1991). Based on a simulation experiment similar to ours, they show
substantial efficiency gains for their 1-step estimator relative to the QML estimator. The
simulation framework of DK is also based on 2,500 replications of draws from a GARCH(1,1)
series of length T=1,000. In contrast to our research design, DK employ a variance specification
with an unconditional variance of $1/(1 - \alpha^* - \beta^*)$ for given GARCH parameters $\alpha^*$ and $\beta^*$. In spite
of the differences between our studies, both proposed estimators behave comparably. The
primary difference between the EF and DK estimators relates to bias. We find that, relative to the
QML estimator, the EF estimator leads to a reduction in bias for both $\hat{\alpha}$ and $\hat{\beta}$; DK find a
substantial increase in the bias for $\hat{\alpha}^*$, and a commensurate reduction in the bias for $\hat{\beta}^*$ relative
to their QML estimator.

8. AN EMPIRICAL EXAMPLE: THE S&P500 DAILY INDEX

In this section, we consider an empirical application of the estimating function approach
using the Standard and Poor’s 500 daily composite stock index (SP500) series for the sample
period from Thursday, January 23, 1941 through Monday, January 15, 1996. The SP500 is a
value-weighted index of common stock prices. Prior to March of 1957, the index was composed
of only 90 stocks. Subsequent to March of 1957, the index was expanded to 500 stocks. Finally,
in July of 1976, the index included a group of financial stocks, some of which now trade over the
counter (cf. French, Schwert and Stambaugh 1987, or Gallant, Rossi, and Tauchen 1992 for
further discussion). This series does not include dividend distributions; however, for ease of
discussion we use the terms return and percentage price change interchangeably.

Our primary focus is to present an appropriate and accurate representation of the GARCH
process governing second moment evolution of this series. To cleanse the raw series of any
conditional mean effects and any deterministic variance effects, we adopt a procedure similar to
Gallant, Rossi, and Tauchen (1992). The general issue of whitening the data for our example is
not trivial. We seek to model the variance process for a realistic series displaying zero conditional
mean and unit variance. Deviations from normality in the third and fourth moments will not cause
estimation difficulties; however, we wish to cleanse any known deterministic effects from the first
and second moments. Removal of effects related to events like Black Monday, Oct. 19, 1987,
will depend on the researcher's ideology in treating outliers, as well as the goal of the research
undertaken. For completeness, we consider three levels of filtering.

We begin by regressing percentage changes in the SP500 daily index on dummy variables
related to calendar effects, wartime years, changes in the composition of the index, and
autoregressive mean effects in the following mean adjustment equation,

$$y_t = x_t'\beta + u_t, \qquad (20)$$

where $y_t$ is the original percentage change in the SP500 series, $x_t$ is a vector of regressors, and
$u_t$ is the disturbance term. The least squares residuals from equation (20) are then fit using an
AR(10) process to remove any possible remaining temporal persistence in the conditional mean
that might influence later variance estimates. These AR(10) residuals, $e_t$, are then transformed for use
as a dependent variable in the variance specification,

$$\log(e_t^2) = x_t'\gamma + \varepsilon_t. \qquad (21)$$

The final series used in our examples are the standardized residuals, constructed as,

$$z_t = \frac{e_t}{\exp(x_t'\gamma / 2)} \qquad (22)$$

and standardized to have zero mean and unit variance.
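
The three filtering steps in equations (20) through (22) can be sketched as below. This is an illustrative outline under stated assumptions, not the procedure as actually coded by the authors: the regressor matrix `X` is assumed to contain a constant column, the AR fit is assumed to include an intercept, the same regressors are reused in the variance equation, and all function names are hypothetical. The log transform also assumes no AR residual is exactly zero.

```python
import numpy as np

def ols_residuals(y, X):
    """Least squares residuals and coefficients for y = X b + error."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef, coef

def ar_residuals(u, p=10):
    """Residuals from regressing u_t on a constant and its first p lags."""
    T = len(u)
    # Column k holds u_{t-k} for t = p, ..., T-1.
    lags = np.column_stack([u[p - k:T - k] for k in range(1, p + 1)])
    X = np.column_stack([np.ones(T - p), lags])
    e, _ = ols_residuals(u[p:], X)
    return e

def whiten(y, X, p=10):
    """Mean adjustment (20), AR(p) filter, log-variance fit (21), scaling (22)."""
    u, _ = ols_residuals(y, X)              # equation (20)
    e = ar_residuals(u, p)                  # AR(p) residuals e_t
    Xv = X[p:]                              # align regressors with e_t
    _, gamma = ols_residuals(np.log(e ** 2), Xv)  # equation (21)
    z = e / np.exp(Xv @ gamma / 2.0)        # equation (22)
    return (z - z.mean()) / z.std()         # zero mean, unit variance

# Toy usage: a constant plus one dummy that shifts both mean and variance.
rng = np.random.default_rng(1)
T = 600
dummy = (np.arange(T) % 5 == 0).astype(float)
X = np.column_stack([np.ones(T), dummy])
y = 0.1 * dummy + rng.standard_normal(T) * (1.0 + dummy)
z = whiten(y, X, p=10)
```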

Table 2 reports the estimation results for three alternative sets of whitening regressors. In a
similar context, Gallant, Rossi, and Tauchen (1992) also consider additional dummy variables for
i) the months of February, March, April, May, June, July, August, September, October, and
November, ii) trading gaps of 1, 2, 3, and 4 days, and iii) time trends and quadratic time trends for
the variance specification. In contrast to Gallant, Rossi, and Tauchen (1992), we also filter out
effects related to i) each of the 10 trading days following Black Monday, 1987, for the conditional
mean and variance, ii) a dummy variable for changes in the composition of the index in March of
1957 and July of 1976, and iii) temporal components from the conditional mean using an AR(10)
process. A good discussion of the financial literature surrounding these filters can be found in
Lakonishok and Smidt (1988), or Gallant, Rossi, and Tauchen (1992).

The first column of Table 2 contains the estimated autoregressive terms for the raw series
without any additional adjustments. We observe significant autocorrelation coefficients over the
first six lags, possibly related to weekly effects. The next two columns of the table report day of
the week effects for both the mean and variance specifications. With the exception of
Wednesdays and Thursdays, all days of the week display significantly negative returns as shown in
the mean column. The extreme negative coefficient for Mondays is the familiar weekend effect.
The autoregressive coefficients in the conditional mean column are qualitatively similar to the
purely autoregressive analysis reported in the first column. The estimated day-of-the-week
variance effects suggest that the Thursday and Friday returns display significantly lower variance
than the earlier portion of the week. In total, the reported day-of-the-week effects suggest a
reduction in both expected returns and risks when additional conditioning information is
considered.

*** insert Table 2 about here ***

The final two columns of the table present preliminary estimation results, when all of the
dummy variables and autoregressive components are considered. The conditional mean
coefficients for days of the week and the autoregressive component are comparable, although the
Monday effect is not as extreme given the effect of Monday, Oct. 19, 1987 has been mitigated.
The January effect can be observed in the positive returns for the last week of December and first
week of January. Interestingly, we find a large positive effect in the last weeks of both December
and January. The conditional variance effects reported during the ten-day crash period must be
interpreted with caution. During these days the conditional mean effect is clearly negative and
substantial (conditional upon knowledge of the crash). The dummy variable in the conditional
mean removes the primary effect on a day by day basis, resulting in a very good fit for the day
considered, and leaving a relatively small amount of variability to explain in the variance equation.
The remaining variables included have relatively little effect on the conditional mean of the series.

Table 3 reports summary statistics describing the raw SP500 percentage price changes as
well as the three whitened and standardized series. The first column of the table reports the
summary statistics for the raw SP500 percentage price change series. The conditional daily
effective mean is .00032, with a standard deviation of .00799. Thus, the unconditional sample
reward-to-variability (or Sharpe) ratio is .04. The data are somewhat left skewed and highly
leptokurtic. The departures from normality are severe as indicated by the reported Jarque-Bera
test statistics. The Ljung-Box portmanteau statistics demonstrate that the raw series displays
serious temporal persistence in the conditional mean specification and the conditional variance
specification when considering lags of 15, 20 or 25 trading days. The reported robust Q statistics
(cf. Lo and MacKinlay 1989, and Lobato, Nankervis, and Savin 1998) support these findings for
lags of 15, 20 or 25 trading days.
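
For readers who wish to reproduce diagnostics of this kind, the sketch below gives the textbook Jarque-Bera and Ljung-Box formulas. It assumes that the reported kurtosis measure is excess kurtosis and that the variance-persistence statistic is the Ljung-Box statistic applied to the squared series; the function names are hypothetical, and the robust Q statistics are not implemented here.

```python
import numpy as np

def jarque_bera(T, skew, excess_kurt):
    """JB = T/6 * (S^2 + K^2/4), with K the excess kurtosis."""
    return T / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)

def ljung_box(x, m):
    """Ljung-Box Q = T (T + 2) * sum_{k=1}^{m} rho_k^2 / (T - k)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = np.sum(d[k:] * d[:-k]) / denom  # lag-k sample autocorrelation
        q += rho_k ** 2 / (T - k)
    return T * (T + 2) * q

# Dependence in the mean uses the series itself; dependence in the variance
# is assumed here to use the squared series.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
q_mean = ljung_box(x, 15)
q_var = ljung_box(x ** 2, 15)
```

With T = 14,342, skewness of −1.3062 and excess kurtosis of 36.9623, `jarque_bera` returns roughly 820,500.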

The final three columns of Table 3 report the same results for the three series generated
from the whitening procedure detailed in Table 2. All series have been standardized to have zero
mean and unit variances for our estimating function example. Skewness is significantly negative
in only two of the series, while all transformed series remain highly leptokurtic. Thus, even after
substantial filtering, the data display serious departures from normality, requiring an estimation
methodology that is robust to such departures. The conditional means of all remaining series
appear to be relatively free of temporal dependence in the mean; however, the conditional
variances remain highly persistent. It is this remaining persistence that we hope to fit with our
GARCH specification.

*** insert Table 3 about here ***

Table 4 reports the estimated parameters and diagnostic statistics for the standardized
residuals for the GARCH(1,1) process for each of the three filtered and whitened series described
in Table 3. Our application of the EF approach uses a simple sample mean of the third power of
the standardized series for the skewness parameter, and a sample average of the fourth power of
the standardized series, less 3, for the central kurtosis parameter.

The reported results show that both GARCH parameters are highly significant and suggest a
lengthy conditional variance decay process for all series. In this large sample empirical application
we find little change in the estimated parameters for two of the three cases. In the final column of
Table 4, where all filters are applied, we observe a substantial reduction in the estimate for α .
The earlier simulation results suggest that the efficiency of the reported estimates should be
improved over QML estimates given the extreme nonnormality in all of the series. The means of
the standardized residuals from the estimation are consistently negative, suggesting that smaller
conditional variance terms are more often associated with negative mean errors. The skewness
and kurtosis of the standardized residuals retain their nonnormal characteristics of significantly
negative skewness and leptokurtism. The final three rows of the table demonstrate that the
temporal dependence in the conditional variance up to lag 25 has been adequately captured by the
GARCH(1,1) process.

Our example demonstrates that even after extensive filtering of the raw data, serious
nonnormalities remain in the data analyzed. The EF methodology explicitly uses this information
to improve the efficiency of the estimator and to lessen the bias in estimated GARCH parameters.
Further research is warranted to expand the estimation procedure to allow for temporal changes
in skewness and kurtosis, and to allow the nuisance parameters to change iteratively.

*** insert Table 4 about here ***

9. CONCLUDING COMMENTS

We have demonstrated the benefits of using the estimating function (EF) approach for
modeling data drawn from nonnormal conditional distributions. The approach naturally takes
advantage of departures from normality to improve the efficiency of estimated parameters given a
finite sample of data. In comparison with asymptotically based procedures, the focus on the finite
sample in the EF approach is important. We find efficiency gains from the EF approach are
substantial. Thus, the estimating function approach will be most useful in cases with serious
departures from normality where efficiency is important. The estimating function approach
should be a natural forecasting technology, when accurate and small confidence bounds are
sought.

Our simulation results suggest that the finite sample bias and variance of EF estimators are
desirable relative to other alternatives such as QML or other semiparametric techniques. In
particular, we find that both the finite sample bias and variance of the EF approach are virtually
always less than the QML estimator bias and variance.

Our empirical example demonstrates that the estimating function approach is readily
implemented in a simple example with a highly nonnormal data series. The approach is able to
successfully eliminate second order effects; however, nonnormal higher moments persist in
standardized residuals. The finite sample approach with unrestricted parameters for skewness and
kurtosis departures appears well suited to the data. Future research may seek to extend the
specifications for third and fourth moments to improve the efficiency of second moment estimates.

APPENDIX: COMPUTING THE RELATIVE EFFICIENCY OF THE
ESTIMATING FUNCTION APPROACH (GAMMA DISTRIBUTION)

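Assuming the conditional density of $y_t$ follows a Gamma distribution with shape parameter
c, the optimal EFs are,

$$l_1^* = -\frac{c}{2(1+c)} \sum_{t=1}^{T} \frac{1}{h_t^2} \frac{\partial h_t}{\partial \alpha} \left( y_t^2 - h_t - \frac{2}{\sqrt{c}}\, h_t^{1/2} y_t \right) \quad \text{and}$$

$$l_2^* = -\frac{c}{2(1+c)} \sum_{t=1}^{T} \frac{1}{h_t^2} \frac{\partial h_t}{\partial \beta} \left( y_t^2 - h_t - \frac{2}{\sqrt{c}}\, h_t^{1/2} y_t \right).$$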

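The variance of $\hat{\alpha}_{EF}$ and $\hat{\beta}_{EF}$ can then be calculated as,

$$\mathrm{Var}(\hat{\alpha}_{EF}) = \frac{V_{22}}{|V|} = \frac{1}{|V|}\, \frac{c}{2(1+c)} \sum_{t=1}^{T} \frac{1}{h_t^2} \left( \frac{\partial h_t}{\partial \beta} \right)^2, \quad \text{and}$$

$$\mathrm{Var}(\hat{\beta}_{EF}) = \frac{V_{11}}{|V|} = \frac{1}{|V|}\, \frac{c}{2(1+c)} \sum_{t=1}^{T} \frac{1}{h_t^2} \left( \frac{\partial h_t}{\partial \alpha} \right)^2,$$

where $V^{-1}$ is again the asymptotic variance-covariance matrix of the EF estimators for α and β,
and $|V|$ is the determinant of $V$.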
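
As a rough numerical illustration of the Gamma-distribution estimating functions given at the start of this appendix, the sketch below evaluates $l_1^*$, $l_2^*$ and the objective $l_1^{*2} + l_2^{*2}$ for supplied arrays. It assumes the form of the EFs with weight c/(2(1+c)) and skewness term 2/√c, and it takes the conditional variances and their derivatives as given inputs; the function and argument names are hypothetical.

```python
import numpy as np

def gamma_optimal_efs(y, h, dh_dalpha, dh_dbeta, c):
    """Evaluate the two Gamma-case estimating functions (assumed form)."""
    y = np.asarray(y, dtype=float)
    h = np.asarray(h, dtype=float)
    # Quadratic term corrected by the skewness-weighted linear term.
    core = y ** 2 - h - (2.0 / np.sqrt(c)) * np.sqrt(h) * y
    weight = -c / (2.0 * (1.0 + c)) / h ** 2
    l1 = np.sum(weight * np.asarray(dh_dalpha, dtype=float) * core)
    l2 = np.sum(weight * np.asarray(dh_dbeta, dtype=float) * core)
    return l1, l2

def ef_objective(y, h, dh_dalpha, dh_dbeta, c):
    """Sum of squared estimating functions, to be minimized numerically."""
    l1, l2 = gamma_optimal_efs(y, h, dh_dalpha, dh_dbeta, c)
    return l1 ** 2 + l2 ** 2

# Tiny hand-checkable example with made-up inputs.
l1, l2 = gamma_optimal_efs([1.0, -1.0], [1.0, 1.0], [1.0, 2.0], [1.0, 1.0], c=4.0)
```

Because E(y_t) = 0 and E(y_t²) = h_t conditionally, the bracketed term has conditional mean zero, which is what makes each function unbiased regardless of the weights.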

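If the ML method is used, we have score functions of,

$$l_1^* = -\sum_{t=1}^{T} \frac{1}{2} \frac{1}{h_t} \frac{\partial h_t}{\partial \alpha} \left( \frac{c - u_t^2}{c + u_t} \right), \quad \text{and}$$

$$l_2^* = -\sum_{t=1}^{T} \frac{1}{2} \frac{1}{h_t} \frac{\partial h_t}{\partial \beta} \left( \frac{c - u_t^2}{c + u_t} \right),$$

where $u_t = \sqrt{c / h_t}\; y_t$.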
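
The form of the ML score can be checked numerically. The sketch below assumes the standardized Gamma parameterization y_t = √h_t (G − c)/√c with G distributed Gamma(c, 1), which is consistent with u_t = √(c/h_t) y_t but is an assumption rather than a statement from the text. It compares the analytic derivative of the log density with respect to h_t, namely −(1/(2h_t))(c − u_t²)/(c + u_t), against a central finite difference. The function names are hypothetical.

```python
import math

def gamma_loglik(y, h, c):
    """Log density of y = sqrt(h) * (G - c) / sqrt(c), with G ~ Gamma(c, 1)."""
    u = math.sqrt(c / h) * y
    if c + u <= 0.0:
        raise ValueError("observation outside the support")
    return (0.5 * math.log(c) - 0.5 * math.log(h)
            + (c - 1.0) * math.log(c + u) - (c + u) - math.lgamma(c))

def dloglik_dh(y, h, c):
    """Analytic derivative of the log density with respect to h."""
    u = math.sqrt(c / h) * y
    return -0.5 / h * (c - u ** 2) / (c + u)

# Central finite difference at an arbitrary point inside the support.
y0, h0, c0, step = 0.3, 1.2, 12.0, 1e-6
numeric = (gamma_loglik(y0, h0 + step, c0) - gamma_loglik(y0, h0 - step, c0)) / (2 * step)
analytic = dloglik_dh(y0, h0, c0)
```

Multiplying the per-observation derivative by ∂h_t/∂α or ∂h_t/∂β and summing over t gives the two score functions.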

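The estimated information matrix multiplied by the sample size T,
$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{12} & I_{22} \end{pmatrix}$, has elements,

$$I_{11} = \sum_{t=1}^{T} \frac{1}{4} \frac{1}{h_t^2} \left( \frac{\partial h_t}{\partial \alpha} \right)^2 \left( \frac{c - u_t^2}{c + u_t} \right)^2,$$

$$I_{12} = I_{21} = \sum_{t=1}^{T} \frac{1}{4} \frac{1}{h_t^2} \left( \frac{\partial h_t}{\partial \beta} \right) \left( \frac{\partial h_t}{\partial \alpha} \right) \left( \frac{c - u_t^2}{c + u_t} \right)^2, \quad \text{and}$$

$$I_{22} = \sum_{t=1}^{T} \frac{1}{4} \frac{1}{h_t^2} \left( \frac{\partial h_t}{\partial \beta} \right)^2 \left( \frac{c - u_t^2}{c + u_t} \right)^2.$$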

Using the inverse of the information matrix as the variance-covariance matrix for α and β
yields the Gamma distribution relative efficiency measures as reported in equation (19).

28
REFERENCES

Anh, V.V. (1988), "Nonlinear Least Squares and Maximum Likelihood Estimation of Heteroscedastic Regression
Model," Stochastic Processes and their Applications, 29, 317-333.

Bhapkar, V. P. (1972), “On a Measure of Efficiency of an Estimating Function,” Sankhya, 34, 467-472.

Bodurtha, J. N. and Mark, N. C. (1991), “Testing the CAPM with Time-Varying Risks and Returns,” Journal of
Finance, 46, 1485-1505.

Bollerslev, T. (1986), “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307-
327.

Bollerslev, T. (1987), “A Conditional Heteroskedastic Time Series Model for Speculative Prices and Rates of Return,”
Review of Economics and Statistics, 69, 542-547.

Bollerslev, T., Chou, R. Y. and Kroner, K. F. (1992), “ARCH Modelling in Finance: A Review of the Theory and
Empirical Evidence,” Journal of Econometrics, 52, 5-59.

Bollerslev, T. and Wooldridge, J. M. (1988), “Quasi-Maximum Likelihood Estimation of Dynamic Models with Time
Varying Covariance,” Econometric Reviews, 11, 143-172.

Crowder, M. (1986), “On Consistency and Inconsistency of Estimating Equations,” Econometric Theory, 2, 305-330.

Chen, Y. (1993), “Asymptotic Theory of Optimal Estimating Functions,” Technical Report Series, STAT-93-01,
University of Waterloo.

Doob, J. L. (1953), Stochastic Processes, New York: John Wiley and Sons.

Drost, F. C., and Klaassen, C. A. J. (1997), "Efficient Estimation in Semiparametric GARCH Models," Journal of
Econometrics, 81, 193-221.

Durbin, J. (1960), "Estimation of Parameters in Time Series Regression Models," Journal of the Royal Statistical
Society, Series B, 22, 139-153.

Engle, R. F. (1982), “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U. K. Inflation,”
Econometrica, 50, 987-1008.

Engle, R. F. (1995), ARCH, Oxford: Oxford University Press.

Engle, R. F. and Bollerslev, T. (1986), “Modeling the Persistence of Conditional Variances,” Econometric Reviews, 5,
1-50, pp. 81-87.

Engle, R. F., Lilien, D. M. and Robins, R. P. (1987), “Estimating Time Varying Risk Premia in the Term Structure:
The ARCH-M Model,” Econometrica, 55, 391-407.

Engle, Robert F., Ng, Victor K. and Rothschild, Michael (1990), “Asset Pricing with a Factor-ARCH Covariance
Structure: Empirical Estimates for Treasury Bills,” Journal of Econometrics, 45, 213-237.

Engle, R. F. and Gonzalez-Rivera, G. (1991), “Semiparametric ARCH Models,” Journal of Business & Economic
Statistics, 9, No. 4, pp. 345-359.

French, K., Schwert, G. W. and Stambaugh, R. (1987), “Expected Stock Returns and Volatility,” Journal of Financial
Economics, 19, 3-30.

Gallant, A. R., Rossi, P. E. and Tauchen, G. (1992), “Stock Prices and Volume,” Review of Financial Studies, 5, 199-
242.

Geweke, J. (1986), "Modelling the Persistence of Conditional Variances: A Comment," Econometric Reviews, 5, 1, 57-
61.

Glosten, L. R., Jagannathan, R. and Runkle, D. (1993), “On the Relation between the Expected Value and the Volatility
of the Nominal Excess Return on Stocks,” Journal of Finance, 48, 1779-1802.

Godambe, V. P. (1960), “An Optimum Property of Regular Maximum Likelihood Estimation,” The Annals of
Mathematical Statistics, 31, 1208-12.

Godambe, V. P. (1976), “Conditional Likelihood and Unconditional Optimum Estimating Equations,” Biometrika. 63,
277-84.

Godambe, V. P. (1985), “The Foundation of Finite Sample Estimation in Stochastic Processes,” Biometrika. 72, 419-
28.

Godambe, V. P., Ed. (1991), Estimating Functions, Oxford: Oxford University Press.

Godambe, V. P. and Heyde, C. C. (1987), “Quasi-likelihood and Optimal Estimation,” International Statistical Review,
55, 231-44.

Godambe, V. P. and Thompson, M. E. (1984), “Robust Estimation Through Estimating Equation,” Biometrika 71, 115-
25.

Godambe, V. P. and Thompson, M. E. (1989), “An Extension of Quasi-Likelihood Estimation (with discussion),”
Journal of Statistical Planning and Inference, 22, 137-72.

Hall, P. and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, New York: Academic Press.

Heyde, C. C. (1989), “Quasi-likelihood and Optimality of Estimating Function: Some Current Unifying Themes,”
Bulletin of the International Statistical Institute, Book 1, 19-29.

Heyde, C. C. (1997), Quasi-likelihood and Its Applications, New York: Springer-Verlag.

Heyde, C. C., and Lin, Y. X. (1992), "On Quasi-likelihood Methods and Estimation for Branching Processes and
Heteroscedastic Regression Models," Australian Journal of Statistics, 34, 2, 199-206.

Higgins, M. L., and Bera, A. K. (1992), “A Class of Nonlinear ARCH Models,” International Economic Review, 33,
137-158.

Hsieh, D. A., (1989), “Modelling Heteroscedasticity in Daily Foreign Exchange Rates,” Journal of Business and
Economic Statistics, 7, 307-317.

Hutton, J. E. and Nelson, P. I. (1986), “Quasi-likelihood Estimation for Semimartingales,” Stochastic Processes and
their Applications, 22, 245-257.

Kale, B.K. (1962), "An Extension of the Cramer-Rao Inequality for Statistical Estimating Functions," Scandinavian
Actuarial Journal, 45, 60-89.

Lehmann, E. L., (1983), Theory of Point Estimation, New York: Wiley.

Liang, K. Y., and Zeger, S. L. (1986), “Longitudinal Data Analysis Using Generalized Linear Models,” Biometrika, 73,
13-22.

Lakonishok, J. and Smidt, S. (1988), “Are Seasonal Anomalies Real? A Ninety-Year Perspective,” Review of Financial
Studies, 1, 403-425.

Lo, A. W., and MacKinlay, A. C. (1989), “The Size and Power of the Variance Ratio Test in Finite Samples: A Monte
Carlo Investigation,” Journal of Econometrics, 40, 203-238.

Lobato, I., Nankervis, J. C., and Savin, N. E. (1998), “Testing that Stock Returns are Uncorrelated Using a Modified
Box-Pierce Q-Test,” working paper, University of Iowa.

Mills, T. (1995), “Modelling Skewness and Kurtosis in the London Stock Exchange FT-SE Index Return Distributions,”
The Statistician, 44, 323-332.

Nelson, D., (1991), “Conditional Heteroscedasticity in Asset Returns: A New Approach,” Econometrica, 59, 347-370.

Rogalski, R. J., and Vinso, J. D. (1978), “Empirical Properties of Foreign Exchange Rates,” Journal of International
Business Studies, 9, 69-79.

Schwert, G. William (1989), “Why Does Stock Market Volatility Change Over Time,” Journal of Finance, 44, 1115-
1153.

Sentana, E., (1991), “Quadratic ARCH models: A potential re-interpretation of ARCH models,” Unpublished working
paper, CEMFI, Madrid.

Taylor, S. J. (1994), “Modelling Stochastic Volatility,” Mathematical Finance, 4, 183-204.

Vinod, H. D. (1996), "Using Godambe-Durbin Estimating Function in Econometrics," Proceedings, Institute of
Mathematical Statistics, Symposium on Estimating Functions.

Weiss, A. A. (1986), “Asymptotic Theory for ARCH Models: Estimation and Testing,” Econometric Theory, 2, 107-
131.

Table 1
Finite Sample Properties of ML, QML and EF estimates
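Distribution Estimator $\hat{\alpha}$ $\hat{\beta}$ $\hat{\sigma}_{\hat{\alpha}}$ $\hat{\sigma}_{\hat{\beta}}$ $\mathrm{bias}_{\hat{\alpha}}$ $\mathrm{bias}_{\hat{\beta}}$ $\mathrm{rmse}_{\hat{\alpha}}$ $\mathrm{rmse}_{\hat{\beta}}$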
Panel A. α = 0.1, β = 0.8, T=500
Normal ML=QML 0.104 0.774 0.045 0.131 0.004 -0.026 0.046 0.133
EF 0.103 0.771 0.044 0.130 0.003 -0.029 0.044 0.134
Student t (5) ML 0.105 0.775 0.050 0.134 0.005 -0.025 0.050 0.137
QML 0.112 0.759 0.066 0.170 0.012 -0.041 0.067 0.174
EF 0.109 0.769 0.062 0.155 0.009 -0.031 0.063 0.158
Student t (8) ML 0.105 0.772 0.047 0.137 0.005 -0.028 0.048 0.140
QML 0.105 0.773 0.053 0.145 0.005 -0.027 0.053 0.148
EF 0.107 0.767 0.051 0.143 0.007 -0.033 0.052 0.147
Student t (12) ML 0.103 0.775 0.046 0.135 0.003 -0.025 0.046 0.138
QML 0.104 0.771 0.049 0.146 0.004 -0.029 0.049 0.149
EF 0.104 0.771 0.047 0.140 0.004 -0.029 0.048 0.143
Gamma (12) ML 0.102 0.777 0.039 0.118 0.002 -0.023 0.039 0.120
QML 0.105 0.769 0.048 0.143 0.005 -0.031 0.048 0.146
EF 0.102 0.777 0.043 0.126 0.002 -0.023 0.043 0.128
Gamma (30) ML 0.102 0.778 0.042 0.128 0.002 -0.022 0.042 0.130
QML 0.104 0.769 0.045 0.141 0.004 -0.031 0.035 0.144
EF 0.103 0.775 0.043 0.130 0.003 -0.025 0.043 0.133

Panel B. α = 0.05, β = 0.9, T=500


Normal ML=QML 0.054 0.863 0.032 0.139 0.004 -0.037 0.033 0.144
EF 0.055 0.858 0.033 0.138 0.005 -0.042 0.034 0.144
Student t (5) ML 0.056 0.865 0.036 0.138 0.006 -0.035 0.036 0.142
QML 0.061 0.853 0.051 0.164 0.011 -0.047 0.052 0.171
EF 0.063 0.846 0.051 0.158 0.013 -0.054 0.052 0.167
Student t (8) ML 0.056 0.860 0.035 0.142 0.006 -0.040 0.035 0.147
QML 0.056 0.860 0.040 0.152 0.006 -0.040 0.041 0.157
EF 0.057 0.858 0.039 0.140 0.007 -0.042 0.039 0.146
Student t (12) ML 0.055 0.861 0.035 0.144 0.005 -0.039 0.035 0.149
QML 0.055 0.864 0.035 0.141 0.005 -0.036 0.035 0.145
EF 0.055 0.864 0.034 0.131 0.005 -0.036 0.034 0.136
Gamma (12) ML 0.053 0.873 0.030 0.123 0.003 -0.027 0.030 0.126
QML 0.055 0.864 0.035 0.139 0.005 -0.036 0.036 0.143
EF 0.054 0.864 0.032 0.126 0.004 -0.036 0.033 0.131
Gamma (30) ML 0.053 0.868 0.030 0.122 0.003 -0.032 0.030 0.126
QML 0.054 0.864 0.035 0.139 0.004 -0.036 0.035 0.144
EF 0.054 0.866 0.032 0.128 0.004 -0.034 0.033 0.132

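Distribution Estimator $\hat{\alpha}$ $\hat{\beta}$ $\hat{\sigma}_{\hat{\alpha}}$ $\hat{\sigma}_{\hat{\beta}}$ $\mathrm{bias}_{\hat{\alpha}}$ $\mathrm{bias}_{\hat{\beta}}$ $\mathrm{rmse}_{\hat{\alpha}}$ $\mathrm{rmse}_{\hat{\beta}}$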
Panel C. α = 0.1, β = 0.8, T=1,000
Normal ML=QML 0.102 0.788 0.028 0.076 0.002 -0.012 0.028 0.077
EF 0.101 0.786 0.027 0.078 0.001 -0.014 0.027 0.079
Student t (5) ML 0.102 0.787 0.032 0.083 0.002 -0.013 0.032 0.083
QML 0.108 0.771 0.044 0.123 0.008 -0.029 0.045 0.126
EF 0.107 0.775 0.042 0.109 0.007 -0.025 0.043 0.112
Student t (8) ML 0.101 0.788 0.030 0.082 0.001 -0.012 0.030 0.083
QML 0.104 0.777 0.035 0.106 0.004 -0.023 0.036 0.108
EF 0.103 0.785 0.034 0.088 0.003 -0.015 0.034 0.089
Student t (12) ML 0.102 0.785 0.029 0.079 0.002 -0.015 0.029 0.081
QML 0.104 0.779 0.032 0.097 0.004 -0.021 0.032 0.100
EF 0.102 0.786 0.033 0.085 0.002 -0.014 0.033 0.087
Gamma (12) ML 0.101 0.791 0.025 0.063 0.001 -0.009 0.025 0.063
QML 0.103 0.782 0.032 0.100 0.003 -0.018 0.032 0.102
EF 0.102 0.790 0.029 0.074 0.002 -0.010 0.029 0.074
Gamma (30) ML 0.102 0.787 0.028 0.075 0.002 -0.013 0.028 0.076
QML 0.103 0.781 0.031 0.094 0.003 -0.019 0.031 0.095
EF 0.101 0.787 0.028 0.079 0.001 -0.013 0.028 0.080

Panel D. α = 0.05, β = 0.9, T=1,000


Normal ML=QML 0.052 0.883 0.022 0.087 0.002 -0.017 0.022 0.088
EF 0.054 0.874 0.022 0.099 0.004 -0.026 0.022 0.103
Student t (5) ML 0.052 0.888 0.023 0.070 0.002 -0.012 0.023 0.071
QML 0.059 0.842 0.036 0.185 0.009 -0.058 0.038 0.194
EF 0.057 0.869 0.031 0.112 0.007 -0.031 0.032 0.116
Student t (8) ML 0.051 0.887 0.021 0.068 0.001 -0.013 0.021 0.070
QML 0.057 0.846 0.028 0.172 0.007 -0.054 0.029 0.180
EF 0.055 0.873 0.025 0.096 0.005 -0.027 0.026 0.099
Student t (12) ML 0.052 0.883 0.022 0.082 0.002 -0.017 0.022 0.084
QML 0.055 0.848 0.025 0.177 0.005 -0.052 0.026 0.184
EF 0.054 0.876 0.023 0.090 0.004 -0.024 0.024 0.093
Gamma (12) ML 0.052 0.884 0.019 0.072 0.002 -0.016 0.019 0.074
QML 0.057 0.844 0.025 0.175 0.007 -0.056 0.026 0.184
EF 0.053 0.879 0.021 0.088 0.003 -0.021 0.021 0.091
Gamma (30) ML 0.053 0.880 0.020 0.081 0.003 -0.020 0.020 0.084
QML 0.055 0.851 0.025 0.168 0.005 -0.049 0.025 0.175
EF 0.053 0.877 0.021 0.089 0.003 -0.023 0.022 0.092

Table 2. Data Adjustments for S&P500 Daily Index Returns
Whitened and Standardized Data

                                      AR Terms and Day of the
                      Only AR Terms   Week Effects                AR Terms and All Dummies
                      Mean            Mean          Variance      Mean          Variance
Constant                              .07880***     -1.8879***    .07605***     -1.5448***
Day of the Week
Monday                                -.20029***    -.04107       -.18390***    -.06841
Tuesday                               -.08540***    -.06017       -.08482***    -.04610
Thursday                              -.06434*      -.16872**     -.06087**     -.17843***
Friday                                -.04391***    -.22623***    -.04128*      -.22504***
January Effect
Dec. 1 - 7                                                        .10247*       .30894**
Dec. 8 - 14                                                       -.02406       -.01504
Dec. 15 - 21                                                      .05941        .11450
Dec. 22 - 31                                                      .10040***     -.87696***
Jan. 1 - 7                                                        .07911        -.71494***
Jan. 8 - 14                                                       -.04995       .47067***
Jan. 15 - 21                                                      -.01437       .12693
Jan. 22 - 31                                                      .11882**      .13879
Black Monday
Oct. 19                                                           -25.536***    .81099***
Oct. 20                                                           6.5203***     -1.2401***
Oct. 21                                                           11.273***     -3.8493***
Oct. 22                                                           -4.9666***    -2.0639***
Oct. 23                                                           -.08961***    -2.0673***
Oct. 26                                                           -10.294***    -2.8112***
Oct. 27                                                           3.0037***     -9.2526***
Oct. 28                                                           -.06746***    -2.8453***
Oct. 29                                                           6.1099***     -3.8248***
Oct. 30                                                           3.5152***     -5.2992***
War Years                                                         -.00043       .02935
March, 1957                                                       .00231        -.29899***
July, 1976                                                        -.03025       -.48008***

Constant              .00029          .00029                      .00029
AR terms
1                     .12064***       .12136***                   .12614***
2                     -.05160***      -.05024***                  -.03823***
3                     .00046          .00178                      .00123
4                     -.01641*        -.01579*                    -.00402
5                     .02253***       .01809**                    -.00077
6                     -.02908***      -.02773***                  -.03146***
7                     .00628          .00737                      -.00020
8                     .00594          .00729                      .01916**
9                     -.01164         -.01105                     -.01125
10                    .00127          -.00304                     .00432
NOTE: The sample period contains the 14,342 return observations from Thursday, January 23, 1941 through Monday, January
15, 1996. Significance at the 1, 5, and 10 percent levels are denoted by ***, **, and *, respectively.

Table 3. Summary Statistics for the Raw and Whitened S&P500 Daily Index Return Data
Whitened and Standardized Data
Raw Series Only AR terms AR Terms and Day AR T
of the Week Effects
Mean .00032 .00000 .00000
Minimum -.20457 -25.254 -24.514
Maximum .09099 10.378 10.255
Standard Deviation .00799 1.0000 1.0000
γ1 -1.3062*** -1.2123*** -1.1347***
γ2 36.9623*** 35.3327*** 32.0499***
Jarque-Bera 820500*** 749540*** 616910***
Ljung-Box portmanteau
QX (15) 229.14*** 4.71 4.53
Q X ( 20) 241.58*** 14.98 12.44
Q X ( 25) 244.21*** 17.37 14.00
Q XX (15) 1442.68*** 1650.97*** 1807.81***
Q XX ( 20) 1474.68*** 1686.01*** 1849.12***
Q XX ( 25) 1497.32*** 1713.87*** 1881.78***
Robust Q statistics
Q *X (15 ) 39.39*** 2.39 2.23
Q *X ( 20 ) 46.44*** 8.29 6.71
Q *X ( 25 ) 47.97*** 9.64 7.57
NOTE: Significance at the 1, 5, and 10 percent levels are denoted by ***, **, and *, respectively.

Table 4. Estimated Parameters and Diagnostic Analysis
AR Terms and Day of the AR Term
Only AR Terms Week Effects Dum
$\hat{\alpha}$ .06670*** .06703*** .05
$\hat{\beta}$ .92330*** .92297*** .92
Mean -.00915 -.00864 -.00
Minimum -15.776 -15.733 -15.
Maximum 10.784 10.207 9.0
Standard Deviation 1.2775 1.2777 1.1
γ1 -.60732*** -.62492*** -.57
γ2 6.53001*** 6.72728*** 6.7
Jarque-Bera 26363. *** 27978. *** 282
Ljung-Box portmanteau
Q XX (15) 19.57 16.45 15
Q XX ( 20) 20.86 17.83 17
Q XX ( 25) 23.71 20.80 18
NOTE: Significance at the 1, 5, and 10 percent levels are denoted by ***, **, and *, respectively.
