
Chapter 2 Stationary Time Series Models

This chapter develops the Box-Jenkins methodology for estimating time series models of the form
$$y_t = a_0 + a_1 y_{t-1} + \ldots + a_p y_{t-p} + \epsilon_t + \beta_1 \epsilon_{t-1} + \ldots + \beta_q \epsilon_{t-q},$$
which are called autoregressive integrated moving average (ARIMA) models.
The chapter has three aims:

1. Present the theory of stochastic linear difference equations and consider the time series properties of stationary ARIMA models; a stationary ARIMA model is called an autoregressive moving average (ARMA) model.

2. Develop the tools used in estimating ARMA models. Especially useful are the autocorrelation function (ACF) and the partial autocorrelation function (PACF).

3. Consider various test statistics to check for model adequacy and show how a properly estimated model can be used for forecasting.
1. Stochastic Difference Equation Models

Stochastic difference equations are a convenient way of modeling dynamic economic processes. To take a simple example, suppose the Federal Reserve's money supply target grows 3% each period. Hence,
$$m^*_t = 1.03\, m^*_{t-1} \qquad (1)$$
so that, given the initial condition $m^*_0$, the particular solution is
$$m^*_t = (1.03)^t m^*_0$$
where $m^*_t$ = the logarithm of the money supply target in period $t$ and $m^*_0$ = the logarithm of the money supply in period 0.

Of course, the actual money supply, $m_t$, and the target money supply, $m^*_t$, need not be equal.

- Suppose that at the beginning of period $t$ there are $m_{t-1}$ dollars, so that the gap between the target and the actual money supply is $m^*_t - m_{t-1}$.
- Suppose that the Fed cannot perfectly control the money supply but attempts to change the money supply by $\rho$ percent of any gap between the desired and actual money supply.
- We can model this behavior as
$$\Delta m_t = \rho\,[m^*_t - m_{t-1}] + \epsilon_t$$

Using (1), we obtain
$$m_t = \rho\,(1.03)^t m^*_0 + (1-\rho)\,m_{t-1} + \epsilon_t \qquad (2)$$
where $\epsilon_t$ is the uncontrollable portion of the money supply, and we assume its mean is zero in all time periods.

Although the model is overly simple, it does illustrate the key points:

1. Equation (2) is a discrete difference equation. Since $\{\epsilon_t\}$ is stochastic, the money supply is stochastic; we call (2) a linear stochastic difference equation.
2. If we knew the distribution of $\{\epsilon_t\}$, we could calculate the distribution for each element in the $\{m_t\}$ sequence. Since (2) shows how the realizations of the $\{m_t\}$ sequence are linked across time, we would be able to calculate the various joint probabilities. We note that the distribution of the money supply sequence is completely determined by the parameters of the difference equation (2) and the distribution of the $\{\epsilon_t\}$ sequence.

3. Having observed the first $t$ observations in the $\{m_t\}$ sequence, we can make forecasts of $m_{t+1}, m_{t+2}, \ldots$. For example, updating (2) by one period and taking the conditional expectation, the forecast of $m_{t+1}$ is $E_t m_{t+1} = \rho\,(1.03)^{t+1} m^*_0 + (1-\rho)\,m_t$.
In this as well as all other chapters the sequence $\{\epsilon_t\}$ will always refer to a white noise process, that is, for each $t$, $\epsilon_t$ has the following three properties:

- Zero mean: $E(\epsilon_t) = E(\epsilon_{t-1}) = \ldots = 0$
- Constant variance: $\mathrm{var}(\epsilon_t) = \mathrm{var}(\epsilon_{t-1}) = \ldots = \sigma^2$, or $E(\epsilon_t^2) = E(\epsilon_{t-1}^2) = \ldots = \sigma^2$
- Uncorrelated with all other realizations: $\mathrm{cov}(\epsilon_t, \epsilon_{t-s}) = \mathrm{cov}(\epsilon_{t-j}, \epsilon_{t-j-s}) = 0$, or $E(\epsilon_t \epsilon_{t-s}) = E(\epsilon_{t-j}\epsilon_{t-j-s}) = 0$ for all $j$ and all $s \neq 0$
A white noise process can be used to construct more interesting time series processes. For example, the time series
$$x_t = \sum_{i=0}^{q} \beta_i \epsilon_{t-i} \qquad (3)$$
is constructed by taking the values $\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-q}$ and multiplying each by the associated value of $\beta_i$.

- A series formed in this manner is called a moving average of order q.
- It is denoted by MA(q).
- Although the sequence $\{\epsilon_t\}$ is a white noise process, the sequence $\{x_t\}$ will not be a white noise process if two or more of the $\beta_i$ are different from zero.

To illustrate using an MA(1) process, set $\beta_0 = 1$, $\beta_1 = 0.5$, and all other $\beta_i = 0$. Then
$$E(x_t) = E(\epsilon_t + 0.5\epsilon_{t-1}) = 0$$
$$\mathrm{var}(x_t) = \mathrm{var}(\epsilon_t + 0.5\epsilon_{t-1}) = 1.25\sigma^2$$
$$E(x_t) = E(x_{t-s}) \quad\text{and}\quad \mathrm{var}(x_t) = \mathrm{var}(x_{t-s}) \text{ for all } s$$
Hence, the first two conditions for $\{x_t\}$ to be a white noise process are satisfied. However,
$$E(x_t x_{t-1}) = E[(\epsilon_t + 0.5\epsilon_{t-1})(\epsilon_{t-1} + 0.5\epsilon_{t-2})]
= E[\epsilon_t\epsilon_{t-1} + 0.5(\epsilon_{t-1})^2 + 0.5\epsilon_t\epsilon_{t-2} + 0.25\epsilon_{t-1}\epsilon_{t-2}]
= 0.5\sigma^2$$
Given that there exists a value of $s \neq 0$ such that $E(x_t x_{t-s}) \neq 0$, the sequence $\{x_t\}$ is not a white noise process.
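The calculation can be verified numerically. The sketch below is not part of the original text; the seed, sample size, and use of NumPy are my own choices. It simulates the MA(1) example with $\sigma^2 = 1$ and checks that the sample variance is close to $1.25\sigma^2$ while the first autocovariance is close to $0.5\sigma^2$, so the series fails the "uncorrelated" requirement for white noise.

```python
import numpy as np

# Simulate x_t = eps_t + 0.5*eps_{t-1} with sigma^2 = 1 and check the moments
# derived above: mean 0, variance 1.25*sigma^2, first autocovariance 0.5*sigma^2.
rng = np.random.default_rng(0)
T = 200_000
eps = rng.standard_normal(T + 1)        # white noise with sigma^2 = 1
x = eps[1:] + 0.5 * eps[:-1]            # MA(1) construction

print(np.mean(x))                       # ~ 0
print(np.var(x))                        # ~ 1.25
print(np.mean(x[1:] * x[:-1]))          # ~ 0.5, so {x_t} is not white noise
```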
2. ARMA Models

It is possible to combine a moving average process with a linear difference equation to obtain an autoregressive moving average model. Consider the p-th order difference equation:
$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + x_t. \qquad (4)$$
Now let $\{x_t\}$ be the MA(q) process given by (3) so that we can write
$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=0}^{q} \beta_i \epsilon_{t-i} \qquad (5)$$
where by convention we normalize $\beta_0$ to unity.

- If the characteristic roots of (5) are all in the unit circle, then $y_t$ is said to follow an autoregressive moving average (ARMA) model.
- The autoregressive part of the model is the difference equation given by the homogeneous portion of (4) and the moving average part is the $x_t$ sequence.
- If the homogeneous part of the difference equation contains p lags and the model for $x_t$ contains q lags, the model is called an ARMA(p,q) model.
- If q = 0, the model is a pure autoregressive model denoted by AR(p).
- If p = 0, the model is a pure moving average model denoted by MA(q).
- In an ARMA model, it is permissible to allow p and/or q to be infinite.
- If one or more characteristic roots of (5) are greater than or equal to unity, the $\{y_t\}$ sequence is called an integrated process and (5) is called an autoregressive integrated moving average (ARIMA) model.
- This chapter considers only models in which all of the characteristic roots of (5) are within the unit circle.

Treating (5) as a difference equation suggests that $y_t$ can be solved in terms of the $\{\epsilon_t\}$ sequence. The solution of an ARMA(p,q) model expressing $y_t$ in terms of the $\{\epsilon_t\}$ sequence is the moving average representation of $y_t$.
- For the AR(1) model $y_t = a_0 + a_1 y_{t-1} + \epsilon_t$, the moving average representation can be shown to be
$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \epsilon_{t-i}$$
- For the general ARMA(p,q) model, using the lag operator L, (5) can be rewritten as
$$\Bigl(1 - \sum_{i=1}^{p} a_i L^i\Bigr) y_t = a_0 + \sum_{i=0}^{q} \beta_i \epsilon_{t-i}$$
so the particular solution for $y_t$ is
$$y_t = \Bigl(a_0 + \sum_{i=0}^{q} \beta_i \epsilon_{t-i}\Bigr) \Big/ \Bigl(1 - \sum_{i=1}^{p} a_i L^i\Bigr) \qquad (6)$$
- The expansion of (6) yields an MA($\infty$) process.
- Issue: whether the expansion is convergent so that the stochastic difference equation given by (6) is stable.
- We will see in the next section that the stability condition is that the roots of the polynomial $\bigl(1 - \sum_{i=1}^{p} a_i L^i\bigr)$ must lie outside the unit circle.
- We will also see that, if $y_t$ is a linear stochastic difference equation, the stability condition is a necessary condition for the time series $\{y_t\}$ to be stationary.
3. Stationarity

Suppose the quality control division of a manufacturing firm samples four machines each hour. Every hour, quality control finds the mean of the machines' output levels. The plot of each machine's hourly output is shown in Figure 2.1. If $y_{it}$ represents machine i's output at hour t, the means ($\bar{y}_t$) are readily calculated as
$$\bar{y}_t = \sum_{i=1}^{4} y_{it}/4.$$
For hours 5, 10, and 15, these mean values are 4.61, 5.14, and 5.03, respectively. The sample variance for each hour can similarly be constructed.

- Unfortunately, we do not usually have the luxury of being able to obtain an ensemble, that is, multiple observations of the same process over the same time period.
- Typically, we observe only one set of realizations, that is, one observation of the process, over a given time period.
- Fortunately, if $\{y_t\}$ is a stationary series, the mean, variance, and autocorrelations can be well approximated by sufficiently long time averages based on the single set of realizations.

Suppose you observed the output of machine 1 for 20 periods. If you knew that the output was stationary, you could approximate the mean level of output by
$$\bar{y} \approx \sum_{t=1}^{20} y_{1t}/20.$$
In using this approximation you would be assuming that the mean was the same for each period. Formally, a stochastic process having a finite mean and variance is covariance stationary if for all $t$ and $t-s$,
$$E(y_t) = E(y_{t-s}) = \mu \qquad (7)$$
$$E[(y_t - \mu)^2] = E[(y_{t-s} - \mu)^2] = \sigma_y^2 \qquad (8)$$
$$E[(y_t - \mu)(y_{t-s} - \mu)] = E[(y_{t-j} - \mu)(y_{t-j-s} - \mu)] = \gamma_s \qquad (9)$$
where $\mu$, $\sigma_y^2$, and $\gamma_s$ are all constants. (For $s = 0$, (8) and (9) are identical, so $\gamma_0$ equals the variance of $y_t$.)

- To reiterate, a time series is covariance stationary if its mean and all autocovariances are unaffected by a change in time origin.
- A covariance stationary process is also referred to as a weakly stationary, second-order stationary, or wide-sense stationary process.
- A strongly stationary process need not have a finite mean and/or variance.
- In our course, we consider only covariance stationary series. So there is no ambiguity in using the terms stationary and covariance stationary interchangeably.
- In multivariate models, the term autocovariance is reserved for the covariance between $y_t$ and its own lags.
- In univariate time series models, there is no ambiguity and the terms autocovariance and covariance are used interchangeably.

For a covariance stationary series, we can define the autocorrelation between $y_t$ and $y_{t-s}$ as
$$\rho_s \equiv \gamma_s/\gamma_0$$
where $\gamma_s$ and $\gamma_0$ are defined by (9).
- Since $\gamma_s$ and $\gamma_0$ are time-independent, the autocorrelation coefficients $\rho_s$ are also time-independent.
- Although the autocorrelation between $y_t$ and $y_{t-1}$ can differ from the autocorrelation between $y_t$ and $y_{t-2}$, the autocorrelation between $y_t$ and $y_{t-1}$ must be identical to that between $y_{t-s}$ and $y_{t-s-1}$.
- Obviously, $\rho_0 = 1$.

Stationarity Restrictions for an AR(1) Model

Let
$$y_t = a_0 + a_1 y_{t-1} + \epsilon_t$$
where $\epsilon_t$ is white noise.
Case: $y_0$ known

Suppose the process started in period zero, so that $y_0$ is a deterministic initial condition. The solution to this equation is
$$y_t = a_0 \sum_{i=0}^{t-1} a_1^i + a_1^t y_0 + \sum_{i=0}^{t-1} a_1^i \epsilon_{t-i}. \qquad (10)$$
Taking the expected value of (10), we obtain
$$E y_t = a_0 \sum_{i=0}^{t-1} a_1^i + a_1^t y_0. \qquad (11)$$
Updating by s periods yields
$$E y_{t+s} = a_0 \sum_{i=0}^{t+s-1} a_1^i + a_1^{t+s} y_0. \qquad (12)$$

- Comparing (11) and (12), it is clear that both means are time-dependent.
- Since $E y_t \neq E y_{t+s}$, the sequence cannot be stationary.
- However, if t is large, we can consider the limiting value of $y_t$ in (10).
- If $|a_1| < 1$, then $a_1^t y_0$ converges to zero as t becomes infinitely large and the sum $a_0[1 + a_1 + (a_1)^2 + (a_1)^3 + \ldots]$ converges to $a_0/(1 - a_1)$.
- Thus, if $|a_1| < 1$, as $t \to \infty$, we have
$$\lim y_t = \frac{a_0}{1 - a_1} + \sum_{i=0}^{\infty} a_1^i \epsilon_{t-i}. \qquad (13)$$
- Now take expectations of (13).
- Then we have, for sufficiently large values of t, $E y_t = a_0/(1 - a_1)$, since $E(\epsilon_{t-i}) = 0$ for all i.
- Thus, the mean value of $y_t$ is finite and time-independent:
$$E y_t = E y_{t-s} = a_0/(1 - a_1) \text{ for all } t.$$
- Turning to the variance, we find
$$E(y_t - \mu)^2 = E[(\epsilon_t + a_1\epsilon_{t-1} + (a_1)^2\epsilon_{t-2} + \ldots)^2]
= \sigma^2[1 + (a_1)^2 + (a_1)^4 + \ldots]
= \sigma^2/[1 - (a_1)^2]$$
which is also finite and time-independent.
- Finally, the limiting values of all autocovariances, $\gamma_s$, $s = 0, 1, 2, \ldots$, are also finite and time-independent:
$$\gamma_s = E[(y_t - \mu)(y_{t-s} - \mu)]
= E\{[\epsilon_t + a_1\epsilon_{t-1} + (a_1)^2\epsilon_{t-2} + \ldots][\epsilon_{t-s} + a_1\epsilon_{t-s-1} + (a_1)^2\epsilon_{t-s-2} + \ldots]\}
= \sigma^2(a_1)^s[1 + (a_1)^2 + (a_1)^4 + \ldots]
= \sigma^2(a_1)^s/[1 - (a_1)^2] \qquad (14)$$
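As a quick check of these limiting expressions, the sketch below simulates a long AR(1) realization and compares the sample mean, variance, and an autocovariance with $a_0/(1-a_1)$, $\sigma^2/(1-a_1^2)$, and equation (14). The parameter values and seed are illustrative assumptions, not taken from the text.

```python
import numpy as np

a0, a1, sigma = 1.0, 0.7, 1.0          # assumed illustrative values
rng = np.random.default_rng(1)
T = 500_000
eps = sigma * rng.standard_normal(T)
y = np.empty(T)
y[0] = a0 / (1 - a1)                   # start at the long-run mean
for t in range(1, T):
    y[t] = a0 + a1 * y[t - 1] + eps[t]

print(y.mean(), a0 / (1 - a1))                    # mean ~ a0/(1 - a1)
print(y.var(), sigma**2 / (1 - a1**2))            # variance ~ sigma^2/(1 - a1^2)
s = 3
gamma_s = np.mean((y[s:] - y.mean()) * (y[:-s] - y.mean()))
print(gamma_s, sigma**2 * a1**s / (1 - a1**2))    # gamma_s from equation (14)
```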
Case: $y_0$ unknown

Little would change were we not given the initial condition. Without the initial value $y_0$, the sum of the particular solution and the homogeneous solution for $y_t$ is
$$y_t = \Bigl[a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \epsilon_{t-i}\Bigr] + A(a_1)^t \qquad (15)$$
where the bracketed term is the particular solution, $A(a_1)^t$ is the homogeneous solution, and A = an arbitrary constant = the deviation from long-run equilibrium.

- If we take the expectation of (15), it is clear that the $\{y_t\}$ sequence cannot be stationary unless the homogeneous solution $A(a_1)^t$ is equal to zero.
- Either the sequence must have started infinitely long ago (so that $a_1^t = 0$) or the arbitrary constant A must be zero.

Thus, we have the stability conditions:

- The homogeneous solution must be zero. Either the sequence must have started infinitely far in the past or the process must always be in equilibrium (so that the arbitrary constant is zero).
- The characteristic root $a_1$ must be less than unity in absolute value.

These two conditions readily generalize to all ARMA(p,q) processes. The homogeneous solution to (5) has the form
$$\sum_{i=1}^{p} A_i \alpha_i^t$$
or, if there are m repeated roots,
$$\sum_{i=1}^{m} A_i t^{i-1}\alpha^t + \sum_{i=m+1}^{p} A_i \alpha_i^t$$
where the $A_i$ are arbitrary constants, $\alpha$ is the repeated root, and the $\alpha_i$ are the distinct roots.

- If any portion of the homogeneous equation is present, the mean, variance, and all covariances will be time-dependent.
- Hence, for any ARMA(p,q) model, stationarity necessitates that the homogeneous solution be zero.

The next section addresses stationarity restrictions for the particular solution.
4. Stationarity Restrictions for an ARMA(p,q) Model

As a prelude to the stationarity conditions for the general ARMA(p,q) model, first consider the stationarity conditions for an ARMA(2,1) model. Since the magnitude of the intercept term does not affect the stability (or stationarity) condition, set $a_0 = 0$ and write
$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + \epsilon_t + \beta_1 \epsilon_{t-1}. \qquad (16)$$
From the previous section, we know that the homogeneous solution must be zero. So it is only necessary to find the particular solution. Using the method of undetermined coefficients, we can write the challenge solution as
$$y_t = \sum_{i=0}^{\infty} \alpha_i \epsilon_{t-i}. \qquad (17)$$
For (17) to be a solution of (16), the various $\alpha_i$ must satisfy
$$\alpha_0\epsilon_t + \alpha_1\epsilon_{t-1} + \alpha_2\epsilon_{t-2} + \alpha_3\epsilon_{t-3} + \ldots
= a_1(\alpha_0\epsilon_{t-1} + \alpha_1\epsilon_{t-2} + \alpha_2\epsilon_{t-3} + \alpha_3\epsilon_{t-4} + \ldots)
+ a_2(\alpha_0\epsilon_{t-2} + \alpha_1\epsilon_{t-3} + \alpha_2\epsilon_{t-4} + \alpha_3\epsilon_{t-5} + \ldots)
+ \epsilon_t + \beta_1\epsilon_{t-1}.$$
Matching the coefficients of $\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \ldots$ yields

1. $\alpha_0 = 1$
2. $\alpha_1 = a_1\alpha_0 + \beta_1 = a_1 + \beta_1$
3. $\alpha_i = a_1\alpha_{i-1} + a_2\alpha_{i-2}$ for all $i \geq 2$.

- The key point is that, for $i \geq 2$, the coefficients must satisfy the difference equation $\alpha_i = a_1\alpha_{i-1} + a_2\alpha_{i-2}$.
- If the characteristic roots of (16) are within the unit circle, the $\{\alpha_i\}$ must constitute a convergent sequence.
- To verify that the $\{y_t\}$ sequence generated by (17) is stationary, take the expectation of (17) and note that $E y_t = E y_{t-i} = 0$ for all t and i.
- Hence, the mean is finite and time-invariant.

Since the $\{\epsilon_t\}$ sequence is assumed to be a white noise process, the variance of $y_t$ is constant and time-independent:
$$\mathrm{Var}(y_t) = E[(\alpha_0\epsilon_t + \alpha_1\epsilon_{t-1} + \alpha_2\epsilon_{t-2} + \alpha_3\epsilon_{t-3} + \ldots)^2] = \sigma^2\sum_{i=0}^{\infty}\alpha_i^2$$
$$\mathrm{Var}(y_{t-s}) = E[(\alpha_0\epsilon_{t-s} + \alpha_1\epsilon_{t-s-1} + \alpha_2\epsilon_{t-s-2} + \alpha_3\epsilon_{t-s-3} + \ldots)^2] = \sigma^2\sum_{i=0}^{\infty}\alpha_i^2$$
Hence, $\mathrm{Var}(y_t) = \mathrm{Var}(y_{t-s})$ for all t and s.
Finally, note that
$$\mathrm{Cov}(y_t, y_{t-1}) = E[(\epsilon_t + \alpha_1\epsilon_{t-1} + \alpha_2\epsilon_{t-2} + \alpha_3\epsilon_{t-3} + \ldots)(\epsilon_{t-1} + \alpha_1\epsilon_{t-2} + \alpha_2\epsilon_{t-3} + \alpha_3\epsilon_{t-4} + \ldots)]
= \sigma^2(\alpha_1 + \alpha_2\alpha_1 + \alpha_3\alpha_2 + \ldots)$$
$$\mathrm{Cov}(y_t, y_{t-2}) = E[(\epsilon_t + \alpha_1\epsilon_{t-1} + \alpha_2\epsilon_{t-2} + \alpha_3\epsilon_{t-3} + \ldots)(\epsilon_{t-2} + \alpha_1\epsilon_{t-3} + \alpha_2\epsilon_{t-4} + \alpha_3\epsilon_{t-5} + \ldots)]
= \sigma^2(\alpha_2 + \alpha_3\alpha_1 + \alpha_4\alpha_2 + \ldots)$$
From the above pattern, it is clear that the s-th autocovariance, $\gamma_s$, is given by
$$\gamma_s = \mathrm{Cov}(y_t, y_{t-s}) = \sigma^2(\alpha_s + \alpha_{s+1}\alpha_1 + \alpha_{s+2}\alpha_2 + \ldots) \qquad (18)$$

- Thus, the s-th autocovariance, $\gamma_s$, is constant and independent of t.
- Conversely, if the characteristic roots of (16) do not lie within the unit circle, the $\{\alpha_i\}$ sequence will not be convergent, and hence, the $\{y_t\}$ sequence cannot be convergent.
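The recursion for the $\alpha_i$ and the autocovariance formula (18) are easy to implement. The sketch below builds the MA($\infty$) weights of the challenge solution for a stationary ARMA(2,1) model and truncates the infinite sum in (18); the parameter values and truncation point are illustrative assumptions.

```python
import numpy as np

# alpha_0 = 1, alpha_1 = a1 + b1, alpha_i = a1*alpha_{i-1} + a2*alpha_{i-2}
a1, a2, b1, sigma = 0.6, 0.2, 0.4, 1.0   # assumed stationary ARMA(2,1) parameters
N = 200                                  # truncation point for the MA(infinity) weights
alpha = np.empty(N)
alpha[0] = 1.0
alpha[1] = a1 + b1
for i in range(2, N):
    alpha[i] = a1 * alpha[i - 1] + a2 * alpha[i - 2]

def gamma(s):
    # gamma_s = sigma^2 * (alpha_s + alpha_{s+1}*alpha_1 + ...), equation (18)
    return sigma**2 * np.sum(alpha[s:] * alpha[:N - s])

print([round(gamma(s), 4) for s in range(4)])   # gamma_0, gamma_1, gamma_2, gamma_3
```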
Stationarity Restrictions for the Moving Average Coefficients

Next, we look at the conditions ensuring the stationarity of a pure MA($\infty$) process:
$$x_t = \sum_{i=0}^{\infty} \beta_i \epsilon_{t-i}$$
where $\epsilon_t \sim WN(0, \sigma^2)$. We have already determined that $\{x_t\}$ is not a white noise process; now the issue is whether $\{x_t\}$ is covariance stationary. Given conditions (7), (8), and (9), we ask the following:

1. Is the mean finite and time-independent?
$$E(x_t) = E(\epsilon_t + \beta_1\epsilon_{t-1} + \beta_2\epsilon_{t-2} + \ldots) = E\epsilon_t + \beta_1 E\epsilon_{t-1} + \beta_2 E\epsilon_{t-2} + \ldots = 0$$
Repeating the calculation with $x_{t-s}$, we obtain
$$E(x_{t-s}) = E(\epsilon_{t-s} + \beta_1\epsilon_{t-s-1} + \beta_2\epsilon_{t-s-2} + \ldots) = 0$$
Hence, all elements in the $\{x_t\}$ sequence have the same finite mean ($\mu = 0$).

2. Is the variance finite and time-independent?
$$\mathrm{Var}(x_t) = E[(\epsilon_t + \beta_1\epsilon_{t-1} + \beta_2\epsilon_{t-2} + \ldots)^2]
= E(\epsilon_t)^2 + (\beta_1)^2 E(\epsilon_{t-1})^2 + (\beta_2)^2 E(\epsilon_{t-2})^2 + \ldots \quad [\text{since } E\epsilon_t\epsilon_{t-s} = 0 \text{ for } s \neq 0]
= \sigma^2[1 + (\beta_1)^2 + (\beta_2)^2 + \ldots]$$
Therefore, a necessary condition for $\mathrm{Var}(x_t)$ to be finite is that $\sum_{i=0}^{\infty}(\beta_i)^2$ be finite. Repeating the calculation with $x_{t-s}$ yields
$$\mathrm{Var}(x_{t-s}) = E[(\epsilon_{t-s} + \beta_1\epsilon_{t-s-1} + \beta_2\epsilon_{t-s-2} + \ldots)^2]
= \sigma^2[1 + (\beta_1)^2 + (\beta_2)^2 + \ldots]$$
Thus, if $\sum_{i=0}^{\infty}(\beta_i)^2$ is finite, then $\mathrm{Var}(x_t) = \mathrm{Var}(x_{t-s})$ for all t and t−s, and hence, all elements in the $\{x_t\}$ sequence have the same finite variance.

3. Are all autocovariances finite and time-independent?

The s-th autocovariance, $\gamma_s$, is given by
$$\gamma_s = \mathrm{Cov}(x_t, x_{t-s}) = E(x_t x_{t-s})
= E[(\epsilon_t + \beta_1\epsilon_{t-1} + \beta_2\epsilon_{t-2} + \ldots)(\epsilon_{t-s} + \beta_1\epsilon_{t-s-1} + \beta_2\epsilon_{t-s-2} + \ldots)]
= \sigma^2(\beta_s + \beta_{s+1}\beta_1 + \beta_{s+2}\beta_2 + \ldots)$$
Therefore, for $\gamma_s$ to be finite, the sum $\beta_s + \beta_{s+1}\beta_1 + \beta_{s+2}\beta_2 + \ldots$ must be finite.

In summary, the necessary and sufficient conditions for an MA($\infty$) process to be stationary are that the sums
(i) $\beta_0^2 + \beta_1^2 + \beta_2^2 + \ldots$, and
(ii) $\beta_s + \beta_{s+1}\beta_1 + \beta_{s+2}\beta_2 + \ldots$
be finite. However, since (ii) must hold for all values of $s \geq 0$, and $\beta_0 = 1$, condition (i) is redundant.
Stationarity Restrictions for the Autoregressive Coefficients

Now consider the pure autoregressive model of order p:
$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \epsilon_t. \qquad (19)$$
If the characteristic roots of the homogeneous equation of (19) all lie inside the unit circle, we can write the particular solution as
$$y_t = \frac{a_0}{1 - \sum_{i=1}^{p} a_i} + \sum_{i=0}^{\infty} \alpha_i \epsilon_{t-i} \qquad (20)$$
where $\alpha_0 = 1$ and the $\{\alpha_i, i \geq 1\}$ are undetermined coefficients. We know that (20) is a convergent sequence so long as the characteristic roots of (19) are inside the unit circle. We also know that the sequence $\{\alpha_i\}$ will solve the difference equation
$$\alpha_i - a_1\alpha_{i-1} - a_2\alpha_{i-2} - \ldots - a_p\alpha_{i-p} = 0. \qquad (21)$$
If the characteristic roots of (21) are all inside the unit circle, the $\{\alpha_i\}$ sequence will be convergent.
Although (20) is an infinite-order moving average process, the convergence of the MA coefficients implies that $\sum_{i=0}^{\infty}\alpha_i^2$ is finite. Thus, we can use (20) to check the three conditions of stationarity.
$$E y_t = E y_{t-s} = \frac{a_0}{1 - \sum_{i=1}^{p} a_i}$$
A necessary condition for all characteristic roots to lie inside the unit circle is $1 - \sum_{i=1}^{p} a_i > 0$. Hence, the mean of the sequence is finite and time-invariant.
$$\mathrm{Var}(y_t) = E[(\epsilon_t + \alpha_1\epsilon_{t-1} + \alpha_2\epsilon_{t-2} + \ldots)^2]
= \sigma^2[1 + (\alpha_1)^2 + (\alpha_2)^2 + \ldots] = \sigma^2\sum_{i=0}^{\infty}\alpha_i^2$$
Similarly,
$$\mathrm{Var}(y_{t-s}) = E[(\epsilon_{t-s} + \alpha_1\epsilon_{t-s-1} + \alpha_2\epsilon_{t-s-2} + \ldots)^2] = \sigma^2\sum_{i=0}^{\infty}\alpha_i^2$$
Thus, if $\sum_{i=0}^{\infty}\alpha_i^2$ is finite, then $\mathrm{Var}(y_t) = \mathrm{Var}(y_{t-s})$ for all t and t−s, and hence, all elements in the $\{y_t\}$ sequence have the same finite variance.

Finally, let us look at the s-th autocovariance, $\gamma_s$, which is given by
$$\gamma_s = \mathrm{Cov}(y_t, y_{t-s})
= E[(\epsilon_t + \alpha_1\epsilon_{t-1} + \alpha_2\epsilon_{t-2} + \ldots)(\epsilon_{t-s} + \alpha_1\epsilon_{t-s-1} + \alpha_2\epsilon_{t-s-2} + \ldots)]
= \sigma^2(\alpha_s + \alpha_{s+1}\alpha_1 + \alpha_{s+2}\alpha_2 + \ldots)$$
Therefore, for $\gamma_s$ to be finite, the sum $\alpha_s + \alpha_{s+1}\alpha_1 + \alpha_{s+2}\alpha_2 + \ldots$ must be finite.

Nothing of substance is changed by combining the AR(p) and MA(q) models into the general ARMA(p,q) model:
$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + x_t, \qquad x_t = \sum_{i=0}^{q} \beta_i \epsilon_{t-i}. \qquad (22)$$
If the roots of the inverse characteristic equation lie outside the unit circle [that is, if the roots of the homogeneous form of (22) lie inside the unit circle] and if the $\{x_t\}$ sequence is stationary, the $\{y_t\}$ sequence will be stationary. Consider
$$y_t = \frac{a_0}{1 - \sum_{i=1}^{p} a_i} + \frac{\epsilon_t}{1 - \sum_{i=1}^{p} a_i L^i} + \frac{\beta_1\epsilon_{t-1}}{1 - \sum_{i=1}^{p} a_i L^i} + \frac{\beta_2\epsilon_{t-2}}{1 - \sum_{i=1}^{p} a_i L^i} + \ldots \qquad (23)$$

- Each of the expressions on the right-hand side of (23) is stationary as long as the roots of $1 - \sum_{i=1}^{p} a_i L^i$ are outside the unit circle.
- Given that $\{x_t\}$ is stationary, only the roots of the autoregressive portion of (22) determine whether the $\{y_t\}$ sequence is stationary.

5. The Autocorrelation Function

The autocovariances and autocorrelations of the type found in (18) serve as useful tools in the Box-Jenkins approach to identifying and estimating time series models. Illustrated below are four important examples: the AR(1), AR(2), MA(1), and ARMA(1,1) models.

The Autocorrelation Function of an AR(1) Process

For an AR(1) model, $y_t = a_0 + a_1 y_{t-1} + \epsilon_t$, (14) shows
$$\gamma_0 = \frac{\sigma^2}{1 - (a_1)^2}, \qquad \gamma_s = \frac{\sigma^2(a_1)^s}{1 - (a_1)^2}.$$
Now dividing $\gamma_s$ by $\gamma_0$ gives the autocorrelation function (ACF) at lag s: $\rho_s = \gamma_s/\gamma_0$. Thus, we find that
$$\rho_0 = 1, \quad \rho_1 = a_1, \quad \rho_2 = (a_1)^2, \;\ldots,\; \rho_s = (a_1)^s.$$

- A necessary condition for an AR(1) process to be stationary is that $|a_1| < 1$.
- Thus, the plot of $\rho_s$ against s, called the correlogram, should converge to zero geometrically if the series is stationary.
- If $a_1$ is positive, convergence will be direct, and if $a_1$ is negative, the correlogram will follow a damped oscillatory path around zero.
- The first two graphs on the left-hand side of Figure 2.2 show the theoretical autocorrelation function for $a_1 = 0.7$ and $a_1 = -0.7$, respectively.
- In these diagrams $\rho_0$ is not shown since its value is necessarily equal to one.

The Autocorrelation Function of an AR(2) Process

We now consider the AR(2) process $y_t = a_1 y_{t-1} + a_2 y_{t-2} + \epsilon_t$ (with $a_0$ omitted since this intercept term has no effect on the ACF). For the AR(2) to be stationary, we know that it is necessary to restrict the roots of the second-order lag polynomial $(1 - a_1 L - a_2 L^2)$ to be outside the unit circle. In Section 4, we derived the autocovariances of an ARMA(2,1) process by use of the method of undetermined coefficients.
of the method of undetermined coecients.
Now we use an alternative technique known as
Yule-Walker equations. Multiply the second-
order dierence equation by y
t
, y
t1
, y
t2
, . . . , y
ts
and take expectations. This yields
Ey
t
y
t
= a
1
Ey
t1
y
t
+a
2
Ey
t2
y
t
+E
t
y
t
Ey
t
y
t1
= a
1
Ey
t1
y
t1
+a
2
Ey
t2
y
t1
+E
t
y
t1
Ey
t
y
t2
= a
1
Ey
t1
y
t2
+a
2
Ey
t2
y
t2
+E
t
y
t2
.
.
.
Ey
t
y
ts
= a
1
Ey
t1
y
ts
+a
2
Ey
t2
y
ts
+E
t
y
ts
(24)
By denition, the autocovariances of a station-
ary series are such that Ey
t
y
ts
= Ey
ts
y
t
=
Ey
tk
y
tks
=
s
. We also know that E
t
y
t
=

2
and E
t
y
ts
= 0. Hence, we can use equa-
tions (24) to form

o
= a
1

1
+a
2

2
+
2
(25)

1
= a
1

o
+a
2

1
(26)

s
= a
1

s1
+a
2

s2
(27)
40
Dividing (26) and (27) by $\gamma_0$ yields
$$\rho_1 = a_1\rho_0 + a_2\rho_1 \qquad (28)$$
$$\rho_s = a_1\rho_{s-1} + a_2\rho_{s-2} \qquad (29)$$
We know that $\rho_0 = 1$. So, from (28), we have $\rho_1 = a_1/(1 - a_2)$. Hence, we can find all $\rho_s$ for $s \geq 2$ by solving the difference equation (29). For example, for $s = 2$ and $s = 3$,
$$\rho_2 = (a_1)^2/(1 - a_2) + a_2$$
$$\rho_3 = a_1[(a_1)^2/(1 - a_2) + a_2] + a_2 a_1/(1 - a_2)$$

- Given the solutions for $\rho_0$ and $\rho_1$, the key point to note is that the $\rho_s$ all satisfy the difference equation (29).
- The solution may be oscillatory or direct.
- Note that the stationarity condition for $y_t$ necessitates that the characteristic roots of (29) lie inside the unit circle.
- Hence, the $\{\rho_s\}$ sequence must be convergent.
- The correlogram for an AR(2) process must be such that $\rho_0 = 1$ and $\rho_1$ is determined by (28).
- These two values can be viewed as the initial values for the second-order difference equation (29).
- The fourth panel on the left-hand side of Figure 2.2 shows the ACF for the process $y_t = 0.7 y_{t-1} - 0.49 y_{t-2} + \epsilon_t$.
- The properties of the various $\rho_s$ follow directly from the homogeneous equation $y_t - 0.7 y_{t-1} + 0.49 y_{t-2} = 0$.
- The roots are obtained from
$$\alpha = \{0.7 \pm [(0.7)^2 - 4(0.49)]^{1/2}\}/2$$
- Since the discriminant $d = (0.7)^2 - 4(0.49)$ is negative, the characteristic roots are imaginary, so the solution oscillates.
- However, since $a_2 = -0.49$, the solution is convergent and $\{y_t\}$ is stationary.
- Finally, we may wish to find the autocovariances, $\gamma_s$. Since we know all the autocorrelations, if we can find the variance of $y_t$, that is, $\gamma_0$, we can find all of the other $\gamma_s$.
- Since $\rho_i = \gamma_i/\gamma_0$, from (25) we have
$$\gamma_0 = a_1(\rho_1\gamma_0) + a_2(\rho_2\gamma_0) + \sigma^2
\;\Longrightarrow\; \gamma_0(1 - a_1\rho_1 - a_2\rho_2) = \sigma^2
\;\Longrightarrow\; \gamma_0 = \frac{\sigma^2}{1 - a_1\rho_1 - a_2\rho_2}$$
Substituting for $\rho_1$ and $\rho_2$ yields
$$\gamma_0 = \mathrm{Var}(y_t) = \left(\frac{1 - a_2}{1 + a_2}\right)\left(\frac{\sigma^2}{(a_1 + a_2 - 1)(a_2 - a_1 - 1)}\right).$$
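For the AR(2) example discussed above, the ACF and $\gamma_0$ can be generated directly from (28), (29), and the closed-form variance. A minimal sketch follows; setting $\sigma^2 = 1$ is an assumption made only for the example.

```python
import numpy as np

# AR(2) example from the text: y_t = 0.7*y_{t-1} - 0.49*y_{t-2} + e_t
a1, a2, sigma = 0.7, -0.49, 1.0
rho = np.empty(13)
rho[0] = 1.0
rho[1] = a1 / (1 - a2)                            # from equation (28)
for s in range(2, 13):
    rho[s] = a1 * rho[s - 1] + a2 * rho[s - 2]    # recursion (29)
print(np.round(rho, 3))                           # damped oscillation, as argued above

gamma0 = ((1 - a2) / (1 + a2)) * sigma**2 / ((a1 + a2 - 1) * (a2 - a1 - 1))
print(gamma0)                                     # Var(y_t); other gamma_s = rho_s * gamma0
```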
The Autocorrelation Function of an MA(1) Process

Next consider the MA(1) process $y_t = \epsilon_t + \beta\epsilon_{t-1}$. Again, we can obtain the Yule-Walker equations by multiplying $y_t$ by each $y_{t-s}$, $s = 0, 1, 2, \ldots$, and taking expectations. This yields
$$\gamma_0 = \mathrm{Var}(y_t) = E y_t y_t = E[(\epsilon_t + \beta\epsilon_{t-1})(\epsilon_t + \beta\epsilon_{t-1})] = (1 + \beta^2)\sigma^2$$
$$\gamma_1 = E y_t y_{t-1} = E[(\epsilon_t + \beta\epsilon_{t-1})(\epsilon_{t-1} + \beta\epsilon_{t-2})] = \beta\sigma^2$$
$$\vdots$$
$$\gamma_s = E y_t y_{t-s} = E[(\epsilon_t + \beta\epsilon_{t-1})(\epsilon_{t-s} + \beta\epsilon_{t-s-1})] = 0 \quad \text{for all } s > 1$$
Dividing each $\gamma_s$ by $\gamma_0$, it can be seen that the ACF is simply
$$\rho_0 = 1, \qquad \rho_1 = \beta/(1 + \beta^2), \qquad \rho_s = 0 \text{ for all } s > 1.$$

- The third graph on the left-hand side of Figure 2.2 shows the ACF for the MA(1) process $y_t = \epsilon_t - 0.7\epsilon_{t-1}$.
- You saw above that for an MA(1) process, $\rho_s = 0$ for all $s > 1$.
- As an easy exercise, convince yourself that, for an MA(2) process, $\rho_s = 0$ for all $s > 2$; for an MA(3) process, $\rho_s = 0$ for all $s > 3$; and so on.
The Autocorrelation Function of an ARMA(1,1) Process

Finally, consider the ARMA(1,1) process $y_t = a_1 y_{t-1} + \epsilon_t + \beta_1\epsilon_{t-1}$. Using the now-familiar procedure, the Yule-Walker equations are:
$$E y_t y_t = a_1 E y_{t-1} y_t + E\epsilon_t y_t + \beta_1 E\epsilon_{t-1} y_t
\;\Rightarrow\; \gamma_0 = a_1\gamma_1 + \sigma^2 + \beta_1(a_1 + \beta_1)\sigma^2 \qquad (30)$$
$$E y_t y_{t-1} = a_1 E y_{t-1} y_{t-1} + E\epsilon_t y_{t-1} + \beta_1 E\epsilon_{t-1} y_{t-1}
\;\Rightarrow\; \gamma_1 = a_1\gamma_0 + \beta_1\sigma^2 \qquad (31)$$
$$E y_t y_{t-2} = a_1 E y_{t-1} y_{t-2} + E\epsilon_t y_{t-2} + \beta_1 E\epsilon_{t-1} y_{t-2}
\;\Rightarrow\; \gamma_2 = a_1\gamma_1 \qquad (32)$$
$$\vdots$$
$$E y_t y_{t-s} = a_1 E y_{t-1} y_{t-s} + E\epsilon_t y_{t-s} + \beta_1 E\epsilon_{t-1} y_{t-s}
\;\Rightarrow\; \gamma_s = a_1\gamma_{s-1}. \qquad (33)$$
Solving (30) and (31) simultaneously for $\gamma_0$ and $\gamma_1$ yields
$$\gamma_0 = \frac{1 + \beta_1^2 + 2a_1\beta_1}{1 - a_1^2}\,\sigma^2, \qquad
\gamma_1 = \frac{(1 + a_1\beta_1)(a_1 + \beta_1)}{1 - a_1^2}\,\sigma^2.$$
Hence,
$$\rho_1 = \frac{(1 + a_1\beta_1)(a_1 + \beta_1)}{1 + \beta_1^2 + 2a_1\beta_1} \qquad (34)$$
and $\rho_s = a_1\rho_{s-1}$ for all $s \geq 2$.

Thus, the ACF for an ARMA(1,1) process is such that the magnitude of $\rho_1$ depends on both $a_1$ and $\beta_1$. Beginning with this value of $\rho_1$, the ACF of an ARMA(1,1) process looks like that of the AR(1) process. If $0 < a_1 < 1$, convergence will be direct, and if $-1 < a_1 < 0$, the autocorrelations will oscillate. The ACF for the process $y_t = -0.7 y_{t-1} + \epsilon_t - 0.7\epsilon_{t-1}$ is shown as the last graph on the left-hand side of Figure 2.2.
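A short sketch of the ARMA(1,1) ACF: equation (34) gives $\rho_1$ and the recursion $\rho_s = a_1\rho_{s-1}$ gives the rest. The parameter values below are those of the oscillating ARMA(1,1) example in Figure 2.2; treat them as illustrative if your edition of the figure uses different numbers.

```python
import numpy as np

a1, b1 = -0.7, -0.7                      # ARMA(1,1) example: y_t = -0.7*y_{t-1} + e_t - 0.7*e_{t-1}
rho = np.empty(9)
rho[0] = 1.0
rho[1] = (1 + a1 * b1) * (a1 + b1) / (1 + b1**2 + 2 * a1 * b1)   # equation (34)
for s in range(2, 9):
    rho[s] = a1 * rho[s - 1]             # AR(1)-like decay after lag 1
print(np.round(rho, 4))                  # oscillating, geometrically decaying ACF
```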
- From the above you should be able to recognize that the correlogram can reveal the pattern of the autoregressive coefficients.
- For an ARMA(p,q) model, beginning after lag q, the values of $\rho_i$ will satisfy
$$\rho_i = a_1\rho_{i-1} + a_2\rho_{i-2} + \ldots + a_p\rho_{i-p}.$$

6. The Partial Autocorrelation Function

- In an AR(1) process, $y_t$ and $y_{t-2}$ are correlated even though $y_{t-2}$ does not directly appear in the model.
- The correlation between $y_t$ and $y_{t-2}$ (i.e., $\rho_2$) is equal to the correlation between $y_t$ and $y_{t-1}$ (i.e., $\rho_1$) multiplied by the correlation between $y_{t-1}$ and $y_{t-2}$ (i.e., $\rho_1$ again), so that $\rho_2 = (\rho_1)^2$.
- It is important to note that all such indirect correlations are present in the ACF of any autoregressive process.
- In contrast, the partial autocorrelation between $y_t$ and $y_{t-s}$ eliminates the effects of the intervening values $y_{t-1}$ through $y_{t-s+1}$.
- As such, in an AR(1) process the partial autocorrelation between $y_t$ and $y_{t-2}$ is equal to zero.
- The most direct way to find the partial autocorrelation function is to first form the series $y^*_t$ by subtracting the mean of the series (i.e., $\mu$) from each observation to obtain $y^*_t \equiv y_t - \mu$.
- Next, form the first-order autoregression
$$y^*_t = \phi_{11} y^*_{t-1} + e_t$$
where $e_t$ is the regression error term, which need not be a white noise process.
- Since there are no intervening values, $\phi_{11}$ is both the autocorrelation and the partial autocorrelation between $y_t$ and $y_{t-1}$.
- Now form the second-order autoregression
$$y^*_t = \phi_{21} y^*_{t-1} + \phi_{22} y^*_{t-2} + e_t$$
- Here $\phi_{22}$ is the partial autocorrelation coefficient between $y_t$ and $y_{t-2}$.
- In other words, $\phi_{22}$ is the correlation between $y_t$ and $y_{t-2}$ controlling for (i.e., netting out) the effect of $y_{t-1}$.
- Repeating the process for all additional lags s yields the partial autocorrelation function (PACF).
- Using the Yule-Walker equations, one can form the partial autocorrelations from the autocorrelations as
$$\phi_{11} = \rho_1 \qquad (35)$$
$$\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} \qquad (36)$$
and, for additional lags,
$$\phi_{ss} = \frac{\rho_s - \sum_{j=1}^{s-1}\phi_{s-1,j}\,\rho_{s-j}}{1 - \sum_{j=1}^{s-1}\phi_{s-1,j}\,\rho_j} \qquad (37)$$
where $\phi_{sj} = \phi_{s-1,j} - \phi_{ss}\,\phi_{s-1,s-j}$ for $j = 1, 2, 3, \ldots, s-1$.
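Equations (35)-(37) define a recursion that maps any autocorrelation sequence into the PACF. A minimal sketch follows; the AR(1) check at the end (with $a_1 = 0.7$) is an illustrative assumption.

```python
import numpy as np

def pacf_from_acf(rho):
    """Partial autocorrelations phi_{ss} from autocorrelations via (35)-(37).
    rho[0] must equal 1; rho[1:] are rho_1, rho_2, ..."""
    S = len(rho) - 1
    phi = np.zeros((S + 1, S + 1))
    pacf = np.zeros(S + 1)
    phi[1, 1] = pacf[1] = rho[1]                       # equation (35)
    for s in range(2, S + 1):
        num = rho[s] - sum(phi[s - 1, j] * rho[s - j] for j in range(1, s))
        den = 1.0 - sum(phi[s - 1, j] * rho[j] for j in range(1, s))
        phi[s, s] = pacf[s] = num / den                # equation (37)
        for j in range(1, s):                          # update phi_{sj}
            phi[s, j] = phi[s - 1, j] - phi[s, s] * phi[s - 1, s - j]
    return pacf[1:]

# For an AR(1) with a1 = 0.7, rho_s = 0.7**s, so the PACF should be 0.7 at lag 1
# and (numerically) zero at all higher lags.
rho = 0.7 ** np.arange(9)
print(np.round(pacf_from_acf(rho), 4))
```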
- For an AR(p) process, there is no direct correlation between $y_t$ and $y_{t-s}$ for $s > p$.
- Hence, for $s > p$, all values of $\phi_{ss}$ will be zero, and the PACF for a pure AR(p) process should cut off to zero for all lags greater than p.
- In contrast, consider the PACF of an MA(1) process: $y_t = \epsilon_t + \beta\epsilon_{t-1}$.
- As long as $|\beta| < 1$, we can write $y_t/(1 + \beta L) = \epsilon_t$, which we know has the AR($\infty$) representation
$$y_t - \beta y_{t-1} + \beta^2 y_{t-2} - \beta^3 y_{t-3} + \ldots = \epsilon_t.$$
- Therefore, the PACF will not jump to zero, since $y_t$ will be correlated with all of its own lags.
- Instead, the PACF coefficients exhibit a geometrically decaying pattern.
- If $\beta < 0$, the decay is direct, and if $\beta > 0$, the PACF coefficients oscillate.
- The right-hand side of the fifth panel in Figure 2.2 shows the PACF for the ARMA(1,1) model $y_t = -0.7 y_{t-1} + \epsilon_t - 0.7\epsilon_{t-1}$.
- More generally, the PACF of a stationary ARMA(p,q) process must ultimately decay toward zero beginning at lag p.
- The decay pattern depends on the coefficients of the lag polynomial $(1 + \beta_1 L + \beta_2 L^2 + \ldots + \beta_q L^q)$.
Table 2.1 summarizes some of the properties of the ACF and PACF for various ARMA processes. For stationary processes, the key points to note are the following:

- The ACF of an ARMA(p,q) process will begin to decay after lag q. After lag q, the coefficients of the ACF (i.e., the $\rho_i$) will satisfy the difference equation $\rho_i = a_1\rho_{i-1} + a_2\rho_{i-2} + \ldots + a_p\rho_{i-p}$. Since the characteristic roots are inside the unit circle, the autocorrelations will decay after lag q. Moreover, the pattern of the autocorrelation coefficients will mimic that suggested by the characteristic roots.
- The PACF of an ARMA(p,q) process will begin to decay after lag p. After lag p, the coefficients of the PACF (i.e., the $\phi_{ss}$) will mimic the ACF coefficients from the model $y_t/(1 + \beta_1 L + \beta_2 L^2 + \ldots + \beta_q L^q)$.
We can illustrate the usefulness of the ACF and PACF using the model $y_t = a_0 + 0.7 y_{t-1} + \epsilon_t$. If we compare the top two graphs in Figure 2.2, the ACF shows the monotonic decay of the autocorrelations while the PACF exhibits a single spike at lag 1. Suppose a researcher collected sample data and plotted the ACF and PACF. If the actual patterns compared favorably to the theoretical patterns, the researcher might try to fit an AR(1) model. Conversely, if the ACF exhibited a single spike and the PACF exhibited monotonic decay, the researcher might try an MA(1) model.

7. Sample Autocorrelations of Stationary Time Series

Let there be T observations $y_1, y_2, \ldots, y_T$. If the data series is stationary, we can use the sample mean $\bar{y}$, sample variance $\hat{\sigma}^2$, and
sample autocorrelations $r_s$ as estimates of the population mean $\mu$, population variance $\sigma^2$, and population autocorrelations $\rho_s$, respectively, where
$$\bar{y} = (1/T)\sum_{t=1}^{T} y_t \qquad (38)$$
$$\hat{\sigma}^2 = (1/T)\sum_{t=1}^{T} (y_t - \bar{y})^2 \qquad (39)$$
and, for $s = 1, 2, \ldots$,
$$r_s = \frac{\sum_{t=s+1}^{T}(y_t - \bar{y})(y_{t-s} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}. \qquad (40)$$
The sample ACF and PACF can be compared to the theoretical ACF and PACF to identify the actual data generating process. If the true value of $\rho_s = 0$, that is, the true data-generating process is MA(s−1), the sampling variance of
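Equations (38)-(40) translate directly into code. The sketch below computes the sample ACF of a simulated AR(1) series; the simulation settings (seed, sample size, $a_1 = 0.7$) are assumptions made only for the example.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag from (38)-(40)."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()                                    # (38)
    denom = np.sum((y - ybar) ** 2)                    # T * sigma_hat^2, see (39)
    r = np.empty(max_lag)
    for s in range(1, max_lag + 1):
        r[s - 1] = np.sum((y[s:] - ybar) * (y[:-s] - ybar)) / denom   # (40)
    return r

# Example: sample ACF of 100 observations from an AR(1) with a1 = 0.7.
rng = np.random.default_rng(2)
eps = rng.standard_normal(100)
y = np.empty(100)
y[0] = eps[0]
for t in range(1, 100):
    y[t] = 0.7 * y[t - 1] + eps[t]
print(np.round(sample_acf(y, 5), 3))
```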
$r_s$ is given by
$$\mathrm{Var}(r_s) = T^{-1} \;\text{ for } s = 1; \qquad
\mathrm{Var}(r_s) = T^{-1}\Bigl(1 + 2\sum_{j=1}^{s-1} r_j^2\Bigr) \;\text{ for } s > 1. \qquad (41)$$
If T is large, $r_s$ is distributed normally with mean zero. For the PACF coefficients, under the null hypothesis of an AR(p) model, that is, under the null that all $\phi_{p+i,p+i}$ are zero, the variance of $\hat{\phi}_{p+i,p+i}$ is approximately $T^{-1}$.

We can test for the significance of the sample ACF and sample PACF using (41). For example, if we use a 95% confidence interval (i.e., 2 standard deviations) and the calculated value of $r_1$ exceeds $2T^{-1/2}$, it is possible to reject the null hypothesis that the first-order autocorrelation is zero. Rejecting this hypothesis means rejecting an MA(s−1) = MA(0) process and accepting the alternative q > 0. Next, try s = 2.
Then $\mathrm{Var}(r_2) = (1 + 2r_1^2)/T$. If $r_1 = 0.5$ and $T = 100$, then $\mathrm{Var}(r_2) = 0.015$ and the standard deviation of $r_2$ is 0.123. Thus, if the calculated value of $r_2$ exceeds 2(0.123), it is possible to reject the null hypothesis $H_0: \rho_2 = 0$. Again, rejecting the null means accepting the alternative that q > 1. Proceeding in this way, it is possible to identify the order of the process.

- Box and Pierce (1970) developed the Q-statistic to test whether a group of autocorrelations is significantly different from zero. Under the null hypothesis $H_0: \rho_1 = \rho_2 = \ldots = \rho_s = 0$, the statistic
$$Q = T\sum_{k=1}^{s} r_k^2$$
is asymptotically distributed as a $\chi^2$ with s degrees of freedom.
- The intuition behind the use of this statistic is that large sample autocorrelations lead to large values of Q, while a white noise process (in which the autocorrelations at all lags should be zero) would have a Q value of zero.
- Thus, if the calculated value of Q exceeds the appropriate value in a $\chi^2$ table, we can reject the null of no significant autocorrelations.
- Rejecting the null means accepting the alternative that at least one autocorrelation is non-zero.
- A problem with the Box-Pierce Q-statistic is that it works poorly even in moderately large samples.
- Remedy: the modified Q-statistic of Ljung and Box (1978):
$$Q = T(T + 2)\sum_{k=1}^{s} r_k^2/(T - k) \qquad (42)$$
- If the sample value of Q from (42) exceeds the critical value of $\chi^2$ with s degrees of freedom, then at least one value of $r_k$ is statistically significantly different from zero at the specified significance level.
- The Box-Pierce and Ljung-Box Q-statistics also serve as a check to see if the residuals from an estimated ARMA(p,q) model behave as a white noise process.
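Both Q-statistics are simple functions of the sample autocorrelations. In the sketch below, the first three $r_k$ values match those reported in the AR(1) estimation example later in the chapter, while the remaining values and the sample size are illustrative assumptions.

```python
import numpy as np

def box_pierce(r, T):
    """Box-Pierce Q over the sample autocorrelations r_1, ..., r_s."""
    return T * np.sum(np.asarray(r) ** 2)

def ljung_box(r, T):
    """Ljung-Box Q of equation (42)."""
    r = np.asarray(r)
    k = np.arange(1, len(r) + 1)
    return T * (T + 2) * np.sum(r ** 2 / (T - k))

# Compare against the chi-square critical value with s = 8 degrees of freedom
# (about 15.51 at the 5% level).
r = [0.74, 0.58, 0.47, 0.35, 0.29, 0.20, 0.18, 0.12]   # partly assumed values
T = 100
print(box_pierce(r, T), ljung_box(r, T))
```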
- However, when the s autocorrelations from an estimated ARMA(p,q) model are formed, the degrees of freedom are reduced by the number of estimated coefficients.
- Hence, using the residuals of an ARMA(p,q) model, Q has a $\chi^2$ distribution with $s - p - q$ degrees of freedom (if a constant is included in the estimation, the degrees of freedom are $s - p - q - 1$).

Model Selection Criteria

- A natural question to ask of any estimated model is: how well does it fit the data?
- The larger the lag orders p and/or q, the smaller is the sum of squares of the estimated residuals of the fitted model.
- However, adding such lags entails the estimation of additional coefficients and an associated loss of degrees of freedom.
- Moreover, the inclusion of extraneous coefficients will reduce the forecasting performance of the fitted model.
- Thus, increasing the lag lengths p and/or q involves both benefits and costs.
- If we choose a lag order that is lower than necessary, we will omit valuable information contained in the more distant lags and thus underfit the model.
- If we choose a lag order that is higher than necessary, we will overfit the model, estimate extraneous coefficients, and inject additional estimation error into our forecasts.
- Model selection criteria attempt to choose the most parsimonious model by selecting the lag orders p and/or q so as to balance the benefit of a reduced sum of squares of estimated residuals due to additional lags against the cost of additional estimation error.
- The two most commonly used model selection criteria are the Akaike Information Criterion (AIC) and the Schwartz Bayesian Criterion (SBC):
$$AIC = T\,\ln(SSR) + 2n$$
$$SBC = T\,\ln(SSR) + n\,\ln(T)$$
where n = number of parameters estimated (p + q + possible constant term), T = number of observations, and SSR = sum of squared residuals.
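The two criteria are easy to compute once the residuals are in hand. The sketch below uses made-up SSR values (assumptions, not from Table 2.2) to show how a small reduction in SSR need not compensate for an extra estimated parameter.

```python
import numpy as np

def aic_sbc(ssr, T, n):
    """AIC and SBC as defined above; n = p + q (+1 if a constant is estimated)."""
    aic = T * np.log(ssr) + 2 * n
    sbc = T * np.log(ssr) + n * np.log(T)
    return aic, sbc

print(aic_sbc(ssr=85.10, T=100, n=1))   # smaller model
print(aic_sbc(ssr=85.00, T=100, n=2))   # slightly better fit, one more parameter
```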
Estimation of an AR(1) Model

Beginning with $t = 1$, 100 values of $\{y_t\}$ are generated using the AR(1) process $y_t = 0.7 y_{t-1} + \epsilon_t$, with the initial condition $y_0 = 0$. The upper left graph of Figure 2.3 shows the sample ACF and the upper right graph shows the sample PACF of this AR(1) process. It is important that you compare these to the ACF and PACF of the theoretical processes shown in Figure 2.2.

In practice, we never know the true data generating process. However, suppose we were presented with those 100 sample values and were asked to uncover the true process. The first step might be to compare the sample ACF and PACF to those of the various theoretical models. The decaying pattern of the ACF and the single large spike at lag 1 in the sample PACF suggest an AR(1) model. The first three sample autocorrelations are $r_1 = 0.74$, $r_2 = 0.58$, and $r_3 = 0.47$ (which are somewhat greater than the corresponding theoretical autocorrelations of 0.7, 0.49, and 0.343). In the PACF, there is a sizeable spike of 0.74 at lag 1, and all other autocorrelations (except for lag 12) are very small.

- Under the null hypothesis of an MA(0) process, the standard deviation of $r_1$ is $T^{-1/2} = 0.1$. Since the sample value of $r_1 = 0.74$ is more than seven standard deviations from zero, we can reject the null hypothesis $H_0: \rho_1 = 0$.
- The standard deviation of $r_2$ is obtained from (41) by taking $s = 2$:
$$\mathrm{Var}(r_2) = (1 + 2(0.74)^2)/100 = 0.021.$$
- Since $(0.021)^{1/2} = 0.1449$, the sample value of $r_2$ is more than 3 standard deviations from zero; at conventional significance levels, we can reject the null hypothesis $H_0: \rho_2 = 0$.
- Similarly, we can test for the significance of all other values of the sample autocorrelations.

As can be seen in the second panel of Figure 2.3, other than $\phi_{11}$, all partial autocorrelations (except for lag 12) are less than $2T^{-1/2} = 0.2$. The decay of the ACF and the single spike of the PACF give a strong indication of an AR(1) model. Nevertheless, if we did not know the true underlying process and happened to be using monthly data, we might be concerned with the significant partial autocorrelation at lag 12. After all, with monthly data we might expect some direct relationship between $y_t$ and $y_{t-12}$.

Although we know that the data were actually generated from an AR(1) process, it is illuminating to compare the estimates of two different models. Suppose we estimate an AR(1) model and also try to capture the spike at lag 12 with an MA coefficient. Thus, we can consider the two tentative models:
$$\text{Model 1:}\quad y_t = a_1 y_{t-1} + \epsilon_t$$
$$\text{Model 2:}\quad y_t = a_1 y_{t-1} + \epsilon_t + \beta_{12}\epsilon_{t-12}$$
Table 2.2 reports the results of the two estimations. The coefficient of Model 1 satisfies the stability condition $|a_1| < 1$ and has a low standard error (the associated t-statistic for a null of zero is more than 12). As a useful diagnostic check, we plot the correlogram of the residuals of the fitted model in Figure 2.4. The Ljung-Box Q-statistics for these residuals indicate that each one of the autocorrelations is less than 2 standard deviations from zero. The Q-statistics indicate that, as a group, lags 1 through 8, 1 through 16, and 1 through 24 are not significantly different from zero.
This is strong evidence that the AR(1) model fits the data well. If the residual autocorrelations were significant, the AR(1) model would not be utilizing all available information concerning movements in the $\{y_t\}$ sequence. For example, suppose we wanted to forecast $y_{t+1}$ conditional on all available information up to and including period t. With Model 1, the value of $y_{t+1}$ is $y_{t+1} = a_1 y_t + \epsilon_{t+1}$. Hence, the forecast from Model 1 is
$$E_t y_{t+1} = E_t(a_1 y_t + \epsilon_{t+1}) = E_t(a_1 y_t) + E_t(\epsilon_{t+1}) = a_1 y_t.$$
If the residual autocorrelations had been significant, this forecast would not capture all of the information in the available information set.

Examining the results for Model 2, note that both models yield similar estimates for the first-order autoregressive coefficient and the associated standard error. However, the estimate for $\beta_{12}$ is of poor quality; the insignificant t-value suggests that it should be dropped from the model. Moreover, comparing the AIC and the SBC values of the two models suggests that any benefit of a reduced sum of squared residuals is overwhelmed by the detrimental effects of estimating an additional parameter. All of these indicators point to the choice of Model 1.
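A sketch of this estimation exercise using the statsmodels package (assumed to be installed; the simulated draw and seed are my own, so the numbers will not match Table 2.2) is given below. The BIC reported by statsmodels corresponds to the SBC above.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# Simulate 100 observations from the true DGP y_t = 0.7*y_{t-1} + e_t.
rng = np.random.default_rng(3)
eps = rng.standard_normal(100)
y = np.empty(100)
y[0] = eps[0]
for t in range(1, 100):
    y[t] = 0.7 * y[t - 1] + eps[t]

# Model 1: pure AR(1) without a constant.
model1 = ARIMA(y, order=(1, 0, 0), trend="n").fit()
print(model1.params, model1.aic, model1.bic)

# Diagnostic check: Ljung-Box Q on the residuals at lags 8, 16, and 24.
print(acorr_ljungbox(model1.resid, lags=[8, 16, 24]))
```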
Estimation of an ARMA(1,1) Model

See ARMA(1,1) and Table 2.3 under Figures & Tables in Chapter 2.

Estimation of an AR(2) Model

See AR(2) under Figures & Tables in Chapter 2.

8. Box-Jenkins Model Selection

The estimates of the AR(1), ARMA(1,1), and AR(2) models in the previous section illustrate the Box-Jenkins (1976) strategy for appropriate model selection. Box and Jenkins popularized a three-stage method aimed at selecting an appropriate model for the purpose of estimating and forecasting a univariate time series.

In the identification stage, the researcher visually examines the time plot of the series, the autocorrelation function, and the partial autocorrelation function. Plotting the time path of the $\{y_t\}$ sequence provides useful information concerning outliers, missing values, and structural breaks in the data. Nonstationary variables may have a pronounced trend or appear to meander without a constant long-run mean or variance. Missing values and outliers can be corrected at this point. Earlier, a standard practice was to first-difference any series deemed to be nonstationary. Currently, a large literature is evolving that develops formal procedures to check for nonstationarity. We defer this discussion until Chapter 4 and assume that we are working with stationary data. A comparison of the sample ACF and sample PACF to those of various theoretical ARMA processes may suggest several plausible models. In the estimation stage, each of the tentative models is fit and the various $a_i$ and $\beta_i$ coefficients are examined. In this second stage, the estimated models are compared using the following criteria.
Parsimony

A fundamental idea in the Box-Jenkins approach is the principle of parsimony. Incorporating additional coefficients will necessarily increase fit (e.g., the value of $R^2$ will increase) at a cost of reducing degrees of freedom. Box and Jenkins argue that parsimonious models produce better forecasts than overparameterized models. A parsimonious model fits the data well without incorporating any needless coefficients. The aim is to approximate the true data generating process but not to pin down the exact process. The goal of parsimony suggested eliminating the MA(12) coefficient in the simulated AR(1) model shown earlier.

In selecting an appropriate model, the econometrician needs to be aware that several different models may have similar properties. As an extreme example, note that the AR(1) model $y_t = 0.5 y_{t-1} + \epsilon_t$ has the equivalent infinite-order moving-average representation
$$y_t = \epsilon_t + 0.5\epsilon_{t-1} + 0.25\epsilon_{t-2} + 0.125\epsilon_{t-3} + 0.0625\epsilon_{t-4} + \ldots$$
In most samples, approximating this MA($\infty$) process with an MA(2) or MA(3) model will give a very good fit. However, the AR(1) model is the more parsimonious model and is preferred.
One also needs to be aware of the common factor problem. Suppose we wanted to fit the ARMA(2,3) model
$$(1 - a_1 L - a_2 L^2) y_t = (1 + \beta_1 L + \beta_2 L^2 + \beta_3 L^3)\epsilon_t. \qquad (43)$$
Suppose that $(1 - a_1 L - a_2 L^2)$ and $(1 + \beta_1 L + \beta_2 L^2 + \beta_3 L^3)$ can be factored as $(1 + cL)(1 + aL)$ and $(1 + cL)(1 + b_1 L + b_2 L^2)$, respectively. Since $(1 + cL)$ is a common factor to each, (43) has the equivalent, but more parsimonious, form
$$(1 + aL) y_t = (1 + b_1 L + b_2 L^2)\epsilon_t. \qquad (44)$$
In order to ensure that the model is parsimonious, the various $a_i$ and $\beta_i$ should all have t-statistics of 2.0 or greater (so that each coefficient is significantly different from zero at the 5% level). Moreover, the coefficients should not be strongly correlated with each other. Highly collinear coefficients are unstable; usually one or more can be eliminated from the model without reducing forecasting performance.
Stationarity and Invertibility

The distribution theory underlying the use of the sample ACF and PACF as approximations to those of the true data generating process is based on the assumption that the $\{y_t\}$ sequence is stationary. Moreover, t-statistics and Q-statistics also presume that the data are stationary. The estimated autoregressive coefficients should be consistent with this underlying assumption. Hence, we should be suspicious of an AR(1) model if the estimated value of $a_1$ is close to unity. For an ARMA(2,q) model, the characteristic roots of the estimated polynomial $(1 - a_1 L - a_2 L^2)$ should be outside the unit circle.

The Box-Jenkins methodology also necessitates that the model be invertible. Formally, $y_t$ is invertible if it can be represented by a finite-order or convergent autoregressive process. Invertibility is important because the use of the ACF and PACF implicitly assumes that the $\{y_t\}$ sequence can be represented by an autoregressive model. As a demonstration, consider the simple MA(1) model
$$y_t = \epsilon_t - \beta_1\epsilon_{t-1} \qquad (45)$$
so that, if $|\beta_1| < 1$,
$$y_t/(1 - \beta_1 L) = \epsilon_t$$
or
$$y_t + \beta_1 y_{t-1} + \beta_1^2 y_{t-2} + \beta_1^3 y_{t-3} + \ldots = \epsilon_t. \qquad (46)$$
If $|\beta_1| < 1$, (46) can be estimated using the Box-Jenkins method. However, if $|\beta_1| \geq 1$, the $\{y_t\}$ sequence cannot be represented by a finite-order AR process, and thus it is not invertible. More generally, for an ARMA model to have a convergent AR representation, the roots of the polynomial $(1 + \beta_1 L + \beta_2 L^2 + \ldots + \beta_q L^q)$ must lie outside the unit circle.
We note that there is nothing improper about a noninvertible model. The $\{y_t\}$ sequence implied by $y_t = \epsilon_t - \beta_1\epsilon_{t-1}$ is stationary in that it has a constant time-invariant mean [$E y_t = E y_{t-s} = 0$], a constant time-invariant variance [$\mathrm{Var}(y_t) = \mathrm{Var}(y_{t-s}) = \sigma^2(1 + \beta_1^2)$], and time-invariant autocovariances $\gamma_1 = -\beta_1\sigma^2$ and $\gamma_s = 0$ for all other s. The problem is that the technique does not allow for the estimation of such models. If $\beta_1 = 1$, (46) becomes
$$y_t + y_{t-1} + y_{t-2} + y_{t-3} + y_{t-4} + \ldots = \epsilon_t.$$
Clearly, the autocorrelations and partial autocorrelations between $y_t$ and $y_{t-s}$ will never decay.

Goodness of Fit

$R^2$ and the average of the residual sum of squares are common measures of goodness of fit in ordinary least squares. The AIC and SBC are more appropriate measures of fit in time series models.
Caution must be exercised if estimates fail to converge rapidly. Failure of rapid convergence might be indicative of unstable estimates. In such circumstances, adding an additional observation or two can greatly alter the estimates.

The third stage of the Box-Jenkins methodology involves diagnostic checking. The standard practice is to plot the residuals to look for outliers and for evidence of periods in which the model does not fit the data well. If all plausible ARMA models show evidence of a poor fit during a reasonably long portion of the sample, it is wise to consider using intervention analysis, transfer function analysis, or any other of the multivariate estimation methods discussed in later chapters. If the variance of the residuals is increasing, a logarithmic transformation may be appropriate. Alternatively, we may wish to actually model any tendency of the variance to change using the ARCH techniques discussed in Chapter 3.

It is particularly important that the residuals from an estimated model be serially uncorrelated. Any evidence of serial correlation implies a systematic movement in the $\{y_t\}$ sequence that is not accounted for by the ARMA coefficients included in the model. Hence, any of the tentative models yielding nonrandom residuals should be eliminated from consideration. To check for correlation in the residuals, construct the ACF and the PACF of the residuals of the estimated model. Then use (41) and (42) to determine whether any or all of the residual autocorrelations or partial autocorrelations are statistically significant. Although there is no significance level that is deemed most appropriate, be wary of any model yielding
(1) several residual correlations that are marginally significant, and
(2) a Q-statistic that is barely significant at the 10% level.
In such circumstances, it is usually possible to
formulate a better performing model.

If there are sufficient observations, fitting the same ARMA model to each of two subsamples can provide useful information concerning the validity of the assumption that the data generating process is unchanging. In the AR(2) model that was estimated in the last section, the sample was split in half. In general, suppose you estimated an ARMA(p,q) model using a sample of T observations, and denote the sum of squared residuals as SSR. Now divide the T observations into a first subsample with $t_m$ observations and a second with $t_n = T - t_m$ observations. Use each subsample to estimate the two models
$$y_t = a_0(1) + a_1(1) y_{t-1} + \ldots + a_p(1) y_{t-p} + \epsilon_t + \beta_1(1)\epsilon_{t-1} + \ldots + \beta_q(1)\epsilon_{t-q} \quad [\text{using } t_1, \ldots, t_m]$$
$$y_t = a_0(2) + a_1(2) y_{t-1} + \ldots + a_p(2) y_{t-p} + \epsilon_t + \beta_1(2)\epsilon_{t-1} + \ldots + \beta_q(2)\epsilon_{t-q} \quad [\text{using } t_{m+1}, \ldots, t_T]$$
Let the sums of squared residuals from the two models be, respectively, $SSR_1$ and $SSR_2$. To test the restriction that all coefficients are equal [i.e., $a_0(1) = a_0(2)$, $a_1(1) = a_1(2)$, ..., $a_p(1) = a_p(2)$, $\beta_1(1) = \beta_1(2)$, ..., $\beta_q(1) = \beta_q(2)$], conduct an F-test using
$$F = \frac{(SSR - SSR_1 - SSR_2)/n}{(SSR_1 + SSR_2)/(T - 2n)} \qquad (47)$$
where n = number of parameters estimated (n = p + q + 1 if an intercept is included and n = p + q if not), and the numbers of degrees of freedom are (n, T − 2n).

Intuitively, if the coefficients are equal, that is, if the restriction is not binding, then the sum of squared residuals SSR from the restricted model and the sum of squared residuals $SSR_1 + SSR_2$ from the unrestricted models should be equal, so F should be close to zero. Conversely, if the restriction is binding, SSR should exceed $SSR_1 + SSR_2$. The larger the difference between SSR and $SSR_1 + SSR_2$, and thus the larger the calculated value of F, the greater is the evidence against the hypothesis that the coefficients are equal.
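A minimal implementation of the F-test in (47) is sketched below; the SSR values, sample size, and the use of SciPy for the p-value are illustrative assumptions.

```python
from scipy import stats   # assumed available, used only for the F distribution

def coefficient_stability_F(ssr, ssr1, ssr2, T, n):
    """F-test of equation (47): equal coefficients across the two subsamples.
    n = number of parameters estimated in each regression."""
    F = ((ssr - ssr1 - ssr2) / n) / ((ssr1 + ssr2) / (T - 2 * n))
    p_value = 1 - stats.f.cdf(F, n, T - 2 * n)
    return F, p_value

# Illustrative numbers (not taken from the text's AR(2) example).
print(coefficient_stability_F(ssr=110.0, ssr1=52.0, ssr2=54.0, T=100, n=2))
```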
Similarly, a model can be estimated over only a portion of the data set. The estimated model can then be used to forecast the known values of the series. The sum of the squared forecast errors is a useful way to compare the adequacy of alternative models. Those models with poor out-of-sample forecasts should be eliminated.

9. Properties of Forecasts

One of the most important uses of ARMA models is to forecast future values of the $\{y_t\}$ sequence. To simplify the following discussion, it is assumed that the actual data generating process and the current and past realizations of the $\{y_t\}$ and $\{\epsilon_t\}$ sequences are known to the
researcher. First consider the forecasts of an AR(1) model: $y_t = a_0 + a_1 y_{t-1} + \epsilon_t$. Updating one period, we obtain $y_{t+1} = a_0 + a_1 y_t + \epsilon_{t+1}$. If we know the coefficients $a_0$ and $a_1$, we can forecast $y_{t+1}$ conditioned on the information available at period t as
$$E_t y_{t+1} = a_0 + a_1 y_t \qquad (48)$$
where the notation $E_t y_{t+j}$ stands for the conditional expectation of $y_{t+j}$ given the information available at period t. Formally,
$$E_t y_{t+j} = E(y_{t+j} \mid y_t, y_{t-1}, y_{t-2}, \ldots, \epsilon_t, \epsilon_{t-1}, \ldots).$$
In the same way, since $y_{t+2} = a_0 + a_1 y_{t+1} + \epsilon_{t+2}$, the forecast of $y_{t+2}$ conditioned on the information available at period t is
$$E_t y_{t+2} = a_0 + a_1 E_t y_{t+1}$$
and, using (48),
$$E_t y_{t+2} = a_0 + a_1(a_0 + a_1 y_t).$$
Thus the forecast of $y_{t+1}$ can be used to forecast $y_{t+2}$. In other words, forecasts can be constructed using forward iteration; the forecast of $y_{t+j}$ can be used to forecast $y_{t+j+1}$. Since $y_{t+j+1} = a_0 + a_1 y_{t+j} + \epsilon_{t+j+1}$, it follows that
$$E_t y_{t+j+1} = a_0 + a_1 E_t y_{t+j}. \qquad (49)$$
From (48) and (49) it should be clear that it is possible to obtain the entire sequence of j-step-ahead forecasts by forward iteration. Consider
$$E_t y_{t+j} = a_0(1 + a_1 + a_1^2 + \ldots + a_1^{j-1}) + a_1^j y_t.$$
This equation, called the forecast function, expresses all of the j-step-ahead forecasts as functions of the information set in period t. Unfortunately, the quality of the forecasts declines as we forecast further out into the future. Think of (49) as a first-order difference equation in the $\{E_t y_{t+j}\}$ sequence. Since $|a_1| < 1$, the difference equation is stable, and it is straightforward to find the particular solution to the difference equation. If we take the limit of $E_t y_{t+j}$ as $j \to \infty$, we find that $E_t y_{t+j} \to a_0/(1 - a_1)$. This result is quite general: for any stationary ARMA model, the conditional forecast of $y_{t+j}$ converges to the unconditional mean as $j \to \infty$.

Because the forecasts from an ARMA model will not be perfectly accurate, it is important to consider the properties of the forecast errors. Forecasting from time period t, we can define the j-step-ahead forecast error $e_t(j)$ as the difference between the realized value of $y_{t+j}$ and
the forecast value $E_t y_{t+j}$. Thus
$$e_t(j) \equiv y_{t+j} - E_t y_{t+j}.$$
Hence, the 1-step-ahead forecast error is $e_t(1) = y_{t+1} - E_t y_{t+1} = \epsilon_{t+1}$ (i.e., the unforecastable portion of $y_{t+1}$ given the information available in period t).

To find the two-step-ahead forecast error, we need to form $e_t(2) = y_{t+2} - E_t y_{t+2}$. Since $y_{t+2} = a_0 + a_1 y_{t+1} + \epsilon_{t+2}$ and $E_t y_{t+2} = a_0 + a_1 E_t y_{t+1}$, it follows that
$$e_t(2) = a_1(y_{t+1} - E_t y_{t+1}) + \epsilon_{t+2} = \epsilon_{t+2} + a_1\epsilon_{t+1}.$$
Proceeding in a like manner, you can demonstrate that for the AR(1) model the j-step-ahead forecast error $e_t(j)$ is given by
$$e_t(j) = \epsilon_{t+j} + a_1\epsilon_{t+j-1} + a_1^2\epsilon_{t+j-2} + a_1^3\epsilon_{t+j-3} + \ldots + a_1^{j-1}\epsilon_{t+1}. \qquad (50)$$
Since the mean of (50) is zero, the forecasts are unbiased estimates of each value $y_{t+j}$. This can be seen as follows. Since $E_t\epsilon_{t+j} = E_t\epsilon_{t+j-1} = \ldots = E_t\epsilon_{t+1} = 0$, the conditional expectation of (50) is $E_t e_t(j) = 0$. Since the expected value of the forecast error is zero, the forecasts are unbiased.

Next we look at the variance of the forecast error. To compute it, continue to assume that the elements of the $\{\epsilon_t\}$ sequence are independent with a variance equal to $\sigma^2$. Then, using (50), the variance of the forecast error is
$$\mathrm{Var}[e_t(j)] = \sigma^2[1 + a_1^2 + a_1^4 + a_1^6 + \ldots + a_1^{2(j-1)}] \qquad (51)$$
for $j = 1, 2, \ldots$. Thus, the one-step-ahead forecast error variance is $\sigma^2$, the two-step-ahead forecast error variance is $\sigma^2(1 + a_1^2)$, and so forth. The essential point to note is that the
forecast error variance is an increasing function of j. Consequently, we can have more confidence in short-term forecasts than in long-term forecasts. In the limit, as $j \to \infty$, the forecast error variance converges to $\sigma^2/(1 - a_1^2)$; hence, the forecast error variance converges to the unconditional variance of the $\{y_t\}$ sequence.

Moreover, assuming the $\{\epsilon_t\}$ sequence is normally distributed, you can place confidence intervals around the forecasts. The one-step-ahead forecast of $y_{t+1}$ is $a_0 + a_1 y_t$ and the one-step-ahead forecast error variance is $\sigma^2$. Therefore, the 95% confidence interval for the one-step-ahead forecast can be constructed as
$$a_0 + a_1 y_t \pm 1.96\,\sigma.$$
We can construct a confidence interval for the two-step-ahead forecast in a similar way. Using (49), the two-step-ahead forecast is $E_t y_{t+2} = a_0(1 + a_1) + a_1^2 y_t$. Again using (51), we know that $\mathrm{Var}[e_t(2)] = \sigma^2(1 + a_1^2)$. Thus, the 95% confidence interval for the two-step-ahead forecast is
$$a_0(1 + a_1) + a_1^2 y_t \pm 1.96\,\sigma(1 + a_1^2)^{1/2}.$$
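The forecast function and the error variance (51) give the whole fan of interval forecasts for an AR(1) model. The sketch below uses illustrative values of $a_0$, $a_1$, $\sigma$, and $y_t$ (assumptions, not from the text).

```python
import numpy as np

a0, a1, sigma, y_t = 1.0, 0.7, 1.0, 4.0          # assumed values
for j in range(1, 6):
    point = a0 * (1 - a1**j) / (1 - a1) + a1**j * y_t      # forecast function
    var_e = sigma**2 * (1 - a1**(2 * j)) / (1 - a1**2)     # equation (51), summed
    half = 1.96 * np.sqrt(var_e)                           # 95% half-width
    print(j, round(point, 3), (round(point - half, 3), round(point + half, 3)))
```

Note that the interval widens with the horizon j, exactly as the discussion of (51) predicts, and that the point forecast approaches $a_0/(1-a_1)$.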
Higher-Order Models

Now we generalize the above discussion to derive forecasts for any ARMA(p,q) model. To keep the algebra simple, consider the ARMA(2,1) model:
$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \epsilon_t + \beta_1\epsilon_{t-1}. \qquad (52)$$
Updating one period yields
$$y_{t+1} = a_0 + a_1 y_t + a_2 y_{t-1} + \epsilon_{t+1} + \beta_1\epsilon_t.$$
If we continue to assume that (1) all the coefficients are known; (2) all variables subscripted $t, t-1, t-2, \ldots$ are known at period t; and (3) $E_t\epsilon_{t+j} = 0$ for $j > 0$, the conditional expectation of $y_{t+1}$ is
$$E_t y_{t+1} = a_0 + a_1 y_t + a_2 y_{t-1} + \beta_1\epsilon_t. \qquad (53)$$
Equation (53) is the one-step-ahead forecast of $y_{t+1}$. The one-step-ahead forecast error is $e_t(1) = y_{t+1} - E_t y_{t+1} = \epsilon_{t+1}$.

To find the two-step-ahead forecast, update (52) by two periods:
$$y_{t+2} = a_0 + a_1 y_{t+1} + a_2 y_t + \epsilon_{t+2} + \beta_1\epsilon_{t+1}.$$
The conditional expectation of $y_{t+2}$ is
$$E_t y_{t+2} = a_0 + a_1 E_t y_{t+1} + a_2 y_t. \qquad (54)$$
Equation (54) expresses the two-step-ahead forecast in terms of the one-step-ahead forecast and the current value of $y_t$. Combining (53) and (54) yields
$$E_t y_{t+2} = a_0 + a_1[a_0 + a_1 y_t + a_2 y_{t-1} + \beta_1\epsilon_t] + a_2 y_t
= a_0(1 + a_1) + [a_1^2 + a_2] y_t + a_1 a_2 y_{t-1} + a_1\beta_1\epsilon_t.$$
To find the two-step-ahead forecast error, subtract (54) from $y_{t+2}$. Thus,
$$e_t(2) = y_{t+2} - E_t y_{t+2}
= [a_0 + a_1 y_{t+1} + a_2 y_t + \epsilon_{t+2} + \beta_1\epsilon_{t+1}] - [a_0 + a_1 E_t y_{t+1} + a_2 y_t]
= a_1(y_{t+1} - E_t y_{t+1}) + \epsilon_{t+2} + \beta_1\epsilon_{t+1}. \qquad (55)$$
Since $y_{t+1} - E_t y_{t+1}$ is equal to the one-step-ahead forecast error $\epsilon_{t+1}$, we can write the forecast error as $e_t(2) = (a_1 + \beta_1)\epsilon_{t+1} + \epsilon_{t+2}$. Alternatively,
$$e_t(2) = y_{t+2} - E_t y_{t+2}
= [a_0 + a_1 y_{t+1} + a_2 y_t + \epsilon_{t+2} + \beta_1\epsilon_{t+1}] - [a_0(1 + a_1) + [a_1^2 + a_2] y_t + a_1 a_2 y_{t-1} + a_1\beta_1\epsilon_t]
= (a_1 + \beta_1)\epsilon_{t+1} + \epsilon_{t+2}. \qquad (56)$$
Finally, all j-step-ahead forecasts can be obtained from
$$E_t y_{t+j} = a_0 + a_1 E_t y_{t+j-1} + a_2 E_t y_{t+j-2}, \quad j \geq 2. \qquad (57)$$
Equation (57) suggests that the forecasts will
satisfy a second-order dierence equation. As
long as the characteristic roots of (57) lie in-
side the unit circle, the forecasts will converge
to the unconditional mean: a
0
/(1 a
1
a
2
).
We can use (57) to find the j-step-ahead forecast errors. Since y_{t+j} = a_0 + a_1 y_{t+j-1} + a_2 y_{t+j-2} + ε_{t+j} + β_1 ε_{t+j-1}, the j-step-ahead forecast error is

e_t(j) = y_{t+j} − E_t y_{t+j}
       = [a_0 + a_1 y_{t+j-1} + a_2 y_{t+j-2} + ε_{t+j} + β_1 ε_{t+j-1}]
         − E_t[a_0 + a_1 y_{t+j-1} + a_2 y_{t+j-2} + ε_{t+j} + β_1 ε_{t+j-1}]
       = a_1(y_{t+j-1} − E_t y_{t+j-1}) + a_2(y_{t+j-2} − E_t y_{t+j-2}) + ε_{t+j} + β_1 ε_{t+j-1}
       = a_1 e_t(j−1) + a_2 e_t(j−2) + ε_{t+j} + β_1 ε_{t+j-1}.   (58)
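As an illustration of the recursion in (57), the short sketch below iterates the forecast function of an ARMA(2,1) model forward from period t. The coefficient values and the conditioning information (y_t, y_{t-1}, and ε_t) are hypothetical placeholders, not estimates from the text.

import numpy as np

# Hypothetical ARMA(2,1) coefficients and conditioning information at period t
a0, a1, a2, b1 = 0.5, 0.6, 0.2, -0.3       # b1 plays the role of beta_1
y_t, y_tm1, eps_t = 4.0, 3.5, 0.4

forecasts = []
# One-step-ahead forecast, equation (53): uses the known value of eps_t
f1 = a0 + a1 * y_t + a2 * y_tm1 + b1 * eps_t
forecasts.append(f1)
# Two-step-ahead forecast, equation (54)
forecasts.append(a0 + a1 * f1 + a2 * y_t)
# j-step-ahead forecasts for j >= 3 follow the difference equation (57)
for j in range(3, 13):
    forecasts.append(a0 + a1 * forecasts[-1] + a2 * forecasts[-2])

print(np.round(forecasts, 3))
print("unconditional mean:", a0 / (1 - a1 - a2))   # the limit of the forecasts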
In practice, we will not know the actual order of the ARMA process or the actual values of the coefficients of that process. Instead, to create out-of-sample forecasts, it is necessary to use the estimated coefficients from what we believe to be the appropriate form of the ARMA model. Suppose we have T observations of the {y_t} sequence and choose to fit an ARMA(2,1) model to the data. Let a hat or caret (i.e., ^) over a parameter denote the estimated value of the parameter and let {ε̂_t} denote the residuals of the estimated model. Hence the estimated ARMA(2,1) model can be written as

y_t = â_0 + â_1 y_{t-1} + â_2 y_{t-2} + ε̂_t + β̂_1 ε̂_{t-1}.
Given that the sample contains T observations, the out-of-sample forecasts can be easily constructed. For example, we can use (53) to forecast the value of y_{T+1} conditional on the T observations as

E_T y_{T+1} = â_0 + â_1 y_T + â_2 y_{T-1} + β̂_1 ε̂_T.   (59)
Once we know the values of â_0, â_1, â_2, and β̂_1, (59) can easily be constructed using the actual values of y_T, y_{T-1}, and ε̂_T. Similarly, the forecast of y_{T+2} can be constructed as

E_T y_{T+2} = â_0 + â_1 E_T y_{T+1} + â_2 y_T,

where E_T y_{T+1} is the forecast from (59).
Given these two forecasts, all subsequent forecasts can be obtained from the difference equation

E_T y_{T+j} = â_0 + â_1 E_T y_{T+j-1} + â_2 E_T y_{T+j-2},   j ≥ 2.
Note that it is much more difficult to construct confidence intervals for these forecast errors. Not only is it necessary to include the effects of the stochastic variation in the future values of {y_{T+1}}, it is also necessary to incorporate the fact that the coefficients are estimated with error.
Now that we have estimated a series and have forecasted its future values, the obvious question is: How good are our forecasts? Typically, there will be several plausible models that we can select to use for our forecasts. Do not be fooled into thinking that the one with the best fit is the one that will forecast the best. To make a simple point, suppose you wanted to forecast the future values of the ARMA(2,1) process given by (52). If you could forecast the value of y_{T+1} using (53), you would obtain the one-step-ahead forecast error

e_T(1) = y_{T+1} − a_0 − a_1 y_T − a_2 y_{T-1} − β_1 ε_T = ε_{T+1}.
Since the forecast error is the pure unforecastable portion of y_{T+1}, no other ARMA model can provide you with superior forecasting performance. However, we need to estimate the parameters of the process, so our forecasts must be made using (59). Therefore, our estimated forecast error will be

ê_T(1) = y_{T+1} − (â_0 + â_1 y_T + â_2 y_{T-1} + β̂_1 ε̂_T).
Clearly, the two forecast errors are not identical. When we forecast using (59), the coefficients (and residuals) are estimated imprecisely. The forecasts made using the estimated model extrapolate this coefficient uncertainty into the future. Since coefficient uncertainty increases as the model becomes more complex, it could be that an estimated AR(1) model forecasts the process given by (52) better than an estimated ARMA(2,1) model.

How do we know which one of several reasonable models has the best forecasting performance? One way to determine that is to put the alternative models to a head-to-head test. Since the future values of the series are unknown, you can hold back a portion of the observations from the estimation process, estimate the alternative models over the shortened span of data, and use these estimates to forecast the observations of the holdback period. You can then compare the properties of the forecast errors from the alternative models. To take a simple example, suppose that {y_t} contains a total of 150 observations and that you are unsure as to whether an AR(1) or an MA(1) model best captures the behavior of the series. One way to proceed is to use the first 100 observations to estimate both models and use each to forecast the value of y_101. Since you know the actual value of y_101, you can construct the forecast errors obtained from the AR(1) and from the MA(1). These two forecast errors are precisely those that someone would have made if they had been making a one-step-ahead forecast in period 100. Now, re-estimate the AR(1) and the MA(1) model using the first 101 observations. Although the estimated coefficients will change somewhat, they are those that someone would have obtained in period 101. Use the two models to forecast the value of y_102. Given that you know the actual value of y_102, you can construct two more forecast errors. Since you know all the values of the {y_t} sequence through period 150, you can continue this process so as to obtain two series of one-step-ahead forecast errors, each containing 50 errors. To keep the notation simple, let {f_1t} and {f_2t} denote the sequences of forecasts from the AR(1) and the MA(1), respectively. Similarly, let {e_1t} and {e_2t} denote the sequences of forecast errors from the AR(1) and the MA(1), respectively. Then it should be clear that f_11 = E_100 y_101 is the first forecast using the AR(1), e_11 = y_101 − f_11 is the first forecast error (where the first holdback observation is y_101), and e_2,50 is the last forecast error from the MA(1).
It is desirable that the forecast errors have a mean of zero and a small variance. A regression-based method to assess the forecasts is to use the 50 forecasts from the AR(1) to estimate an equation of the form

y_{100+t} = a_0 + a_1 f_{1t} + v_{1t},   t = 1, 2, ..., 50.
If the forecasts are unbiased, an F-test should allow us to impose the restriction a_0 = 0 and a_1 = 1. Similarly, the residual series {v_1t} should act as a white noise process. It is a good idea to plot v_1t against y_{100+t} to determine whether there are periods in which our forecasts are especially poor. Now repeat the process with the forecasts from the MA(1). In particular, use the 50 forecasts from the MA(1) to estimate

y_{100+t} = b_0 + b_1 f_{2t} + v_{2t},   t = 1, 2, ..., 50.

Again, if we use an F-test, we should not be able to reject the joint hypothesis b_0 = 0 and b_1 = 1. If the significance levels from the two F-tests are similar, we might select the model with the smallest residual variance; that is, select the AR(1) if Var(v_1t) < Var(v_2t).
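A minimal sketch of this rolling estimation and regression-based check follows, using simulated AR(1) data in place of an actual series and evaluating only the AR(1) forecasts (the MA(1) comparison proceeds identically). The helper function name is ours, and the exact point estimates depend on the simulation seed and on estimation details such as whether a constant is included.

import numpy as np
from scipy import stats
from statsmodels.tsa.arima.model import ARIMA

def rolling_one_step_forecasts(y, order, start=100, end=150):
    """Re-estimate the model each period and forecast one step ahead."""
    fc = []
    for n in range(start, end):
        res = ARIMA(y[:n], order=order).fit()
        fc.append(float(res.forecast(steps=1)[0]))
    return np.array(fc)

# Simulated AR(1) data stand in for the 150-observation series in the text
rng = np.random.default_rng(0)
y = np.zeros(150)
for t in range(1, 150):
    y[t] = 0.5 * y[t - 1] + rng.normal()

f1 = rolling_one_step_forecasts(y, order=(1, 0, 0))   # AR(1) forecasts of y[100:150]
actual = y[100:150]
e1 = actual - f1

# Regression-based check: actual = a0 + a1*forecast + v, with H0: a0 = 0 and a1 = 1
X = np.column_stack([np.ones_like(f1), f1])
beta = np.linalg.lstsq(X, actual, rcond=None)[0]
v = actual - X @ beta
ssr_u = v @ v                       # unrestricted SSR
ssr_r = e1 @ e1                     # SSR with a0 = 0 and a1 = 1 imposed
H = len(actual)
F = ((ssr_r - ssr_u) / 2) / (ssr_u / (H - 2))
print("a0, a1 =", np.round(beta, 3), " F =", round(F, 3),
      " p =", round(float(stats.f.sf(F, 2, H - 2)), 3))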
More generally, we might want to have a holdback period that differs from 50 observations. With a very small sample, it may not be possible to hold back 50 observations. Small samples are a problem since Ashley (1997) shows that very large samples are often necessary to reveal a significant difference between the out-of-sample forecasting performances of similar models. Hence, we need to have enough observations to obtain well-estimated coefficients for the in-sample period and enough out-of-sample forecasts so that the test has good power. If we have a large sample, it is typical to hold back as much as 50% of the data set. Also, we might want to use j-step-ahead forecasts instead of one-step-ahead forecasts. For example, if we have quarterly data and want to forecast one year into the future, we can perform the analysis using four-step-ahead forecasts. Nevertheless, once we have the two sequences of forecast errors, we can compare their properties.
Instead of using a regression-based approach, a researcher could select the model with the smallest mean square prediction error (MSPE). If there are H observations in the holdback period, the MSPE for the AR(1) can be calculated as

MSPE = (1/H) Σ_{i=1}^{H} e_{1i}^2.

Several methods have been proposed to determine whether one MSPE is statistically different from the other. If we put the larger of the two MSPEs in the numerator, a standard recommendation is to use the F-statistic

F = Σ_{i=1}^{H} e_{1i}^2 / Σ_{i=1}^{H} e_{2i}^2.   (60)

The intuition is that the value of F will equal unity if the forecast errors from the two models are identical. A very large value of F implies that the forecast errors from the first model are substantially larger than those from the second. Under the null hypothesis of equal forecasting performance, (60) has a standard F-distribution with (H, H) degrees of freedom if the following three assumptions hold. The forecast errors are

1. normally distributed with zero mean,
2. serially uncorrelated, and
3. contemporaneously uncorrelated.
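The ratio in (60) is simple to compute; the sketch below does so for two hypothetical forecast-error series (the function name and simulated errors are ours). Because sums and means differ only by the factor 1/H, the ratio of sums equals the ratio of MSPEs.

import numpy as np
from scipy import stats

def mspe_f_ratio(e1, e2):
    """F-ratio of equation (60): larger MSPE in the numerator, (H, H) degrees of freedom."""
    H = len(e1)
    mspe1, mspe2 = np.mean(e1**2), np.mean(e2**2)
    F = max(mspe1, mspe2) / min(mspe1, mspe2)
    p_value = stats.f.sf(F, H, H)          # upper-tail probability
    return mspe1, mspe2, F, p_value

# Hypothetical forecast-error series from two competing models
rng = np.random.default_rng(1)
e1 = rng.normal(0, 1.0, 50)
e2 = rng.normal(0, 1.2, 50)
print(mspe_f_ratio(e1, e2))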
Although it is common practice to assume that the {e_t} sequence is normally distributed, it is not necessarily the case that the forecast errors are normally distributed with zero mean. Similarly, the forecast errors may be serially correlated; this is particularly true if we use multi-step-ahead forecasts. For example, equation (56) indicated that the two-step-ahead forecast error for y_{t+2} is

e_t(2) = (a_1 + β_1)ε_{t+1} + ε_{t+2},

and updating e_t(2) by one period yields the two-step-ahead forecast error for y_{t+3} as

e_{t+1}(2) = (a_1 + β_1)ε_{t+2} + ε_{t+3}.
Thus, predicting y_{t+2} from the perspective of period t and predicting y_{t+3} from the perspective of period t+1 both involve an error due to the presence of ε_{t+2}. This induces serial correlation between the two forecast errors. Formally,

E[e_t(2) e_{t+1}(2)] = (a_1 + β_1)σ^2 ≠ 0.

However, for i > 1, E[e_t(2) e_{t+i}(2)] = 0 since there are no overlapping forecasts. Hence, the autocorrelations of the two-step-ahead forecast errors cut off to zero after lag 1. As an exercise, you can demonstrate the general result that j-step-ahead forecast errors act as an MA(j−1) process.

Finally, the forecast errors from the two alternative models will usually be highly correlated with each other. For example, a negative realization of ε_{t+1} will tend to cause the forecasts from both models to be too high. Also note that the violation of any of the three assumptions means that the ratio of the MSPEs in (60) does not have an F-distribution.
The Granger-Newbold Test
Granger and Newbold (1976) show how to overcome the problem of contemporaneously correlated forecast errors. Use the two sequences of forecast errors to form

x_t = e_{1t} + e_{2t}   and   z_t = e_{1t} − e_{2t}.

If assumptions 1 and 2 are valid, then under the null hypothesis of equal forecast accuracy, x_t and z_t should be uncorrelated. That is,

ρ_{xz} = E(x_t z_t) = E(e_{1t}^2 − e_{2t}^2)

should be zero. Model 1 has a larger MSPE if ρ_{xz} is positive, and Model 2 has a larger MSPE if ρ_{xz} is negative. Let r_{xz} denote the sample correlation coefficient between {x_t} and {z_t}. Granger and Newbold (1976) show that

r_{xz} / sqrt[(1 − r_{xz}^2)/(H − 1)]   (61)

has a t-distribution with H − 1 degrees of freedom. Thus, if r_{xz} is statistically significantly different from zero, Model 1 has a larger MSPE if r_{xz} is positive and Model 2 has a larger MSPE if r_{xz} is negative.
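A minimal sketch of the Granger-Newbold calculation, assuming e1 and e2 are arrays of holdback forecast errors from the two models (the function name and simulated data are ours):

import numpy as np
from scipy import stats

def granger_newbold(e1, e2):
    """Granger-Newbold test of equal MSPE for two forecast-error series."""
    x, z = e1 + e2, e1 - e2
    H = len(e1)
    r_xz = np.corrcoef(x, z)[0, 1]
    gn = r_xz / np.sqrt((1 - r_xz**2) / (H - 1))      # equation (61)
    p_value = 2 * stats.t.sf(abs(gn), df=H - 1)        # two-sided p-value
    return r_xz, gn, p_value

# Hypothetical forecast errors; a common shock makes them contemporaneously correlated
rng = np.random.default_rng(2)
common = rng.normal(0, 1, 50)
e1 = common + rng.normal(0, 0.3, 50)
e2 = common + rng.normal(0, 0.3, 50)
print(granger_newbold(e1, e2))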
The Diebold-Mariano Test
Diebold and Mariano (1995) relax assumptions 1 through 3 and allow for an objective function that is not quadratic. This is important because if, for example, an investor's loss depends on the size of the forecast error, the forecaster should be concerned with the absolute values of the forecast errors. As another example, an options trader receives a pay-off of zero if the value of the underlying asset lies below the strike price but receives a one-dollar pay-off for each dollar the asset price rises above the strike price.
If we consider only one-step-ahead forecasts, we can eliminate the subscript j and let the loss from a forecast error in period i be denoted by g(e_i). In the typical case of mean squared errors, the loss is e_i^2. To allow the loss function to be general, we can write the differential loss in period i from using model 1 versus model 2 as d_i = g(e_{1i}) − g(e_{2i}). The mean loss can be obtained as

d̄ = (1/H) Σ_{i=1}^{H} [g(e_{1i}) − g(e_{2i})].   (62)

Under the null hypothesis of equal forecast accuracy, the value of d̄ is zero. Since d̄ is the mean of the individual losses, under fairly weak conditions the Central Limit Theorem implies that d̄ should have a normal distribution. Hence, it is not necessary to assume that the individual forecast errors are normally distributed. Thus, if we knew Var(d̄), we could construct the ratio d̄/sqrt(Var(d̄)) and test the null hypothesis of equal forecast accuracy using a standard normal distribution. In practice, to implement the test we first need to estimate Var(d̄).

If the {d_i} series is serially uncorrelated with a sample variance of γ_0, the estimate of Var(d̄) is simply γ_0/(H − 1). Since we use the estimated value of the variance, the expression d̄/sqrt(γ_0/(H − 1)) has a t-distribution with H − 1 degrees of freedom.
If the {d_t} series is serially correlated, let γ_i denote the i-th autocovariance of the {d_t} sequence. Suppose that the first q values of γ_i are different from zero. The variance of d̄ can then be approximated by Var(d̄) = [γ_0 + 2γ_1 + ... + 2γ_q]/(H − 1). Harvey, Leybourne, and Newbold (1998) recommend constructing the Diebold-Mariano (DM) statistic as

DM = d̄ / sqrt[(γ_0 + 2γ_1 + ... + 2γ_q)/(H − 1)].   (63)

To conduct a test, compare the sample value of the statistic in (63) to the appropriate critical value of a t-distribution with H − 1 degrees of freedom.
It is also possible to use the method for j-step-ahead forecasts. If {e_1t} and {e_2t} denote two sequences of j-step-ahead forecast errors, the DM statistic is

DM = d̄ / sqrt[(γ_0 + 2γ_1 + ... + 2γ_q)/(H + 1 − 2j + H^{-1} j(j − 1))].
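The one-step-ahead DM statistic in (63) is easy to compute once the autocovariances of the loss differential are in hand. The sketch below does so for a general loss function; the function name and simulated errors are ours, and the quartic loss mirrors the example used later in the text.

import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, loss=lambda e: e**2, q=0):
    """DM statistic of equation (63) using autocovariances of d_t up to lag q."""
    d = loss(e1) - loss(e2)                 # loss differential
    H = len(d)
    d_bar = d.mean()
    dc = d - d_bar
    # gamma_i: i-th sample autocovariance of {d_t}
    gammas = [np.sum(dc[i:] * dc[:H - i]) / H for i in range(q + 1)]
    var_dbar = (gammas[0] + 2 * sum(gammas[1:])) / (H - 1)
    dm = d_bar / np.sqrt(var_dbar)
    p_value = 2 * stats.t.sf(abs(dm), df=H - 1)
    return dm, p_value

# Hypothetical one-step-ahead forecast errors from two models
rng = np.random.default_rng(3)
e1 = rng.normal(0, 1.0, 50)
e2 = rng.normal(0, 1.1, 50)
print(diebold_mariano(e1, e2))                       # squared-error loss, no lags
print(diebold_mariano(e1, e2, loss=lambda e: e**4))  # quartic loss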
An example showing the appropriate use of the
Granger-Newbold and Diebold-Mariano tests is
provided in the next section.
10. A Model of the Producer Price Index
This section is intended to illustrate some of the ambiguities frequently encountered in the Box-Jenkins technique. These ambiguities may lead two equally skilled econometricians to estimate and forecast the same series using very different ARMA processes. Nonetheless, if you make reasonable choices, you will select models that come very close to mimicking the actual data-generating process.

We now illustrate the Box-Jenkins modeling procedure by estimating a quarterly model of the U.S. Producer Price Index (PPI). The data used in this section are the series labeled PPI in the file QUARTERLY.XLS. Panel (a) of Figure 2.5 clearly reveals that there is little point in modeling the series as being stationary; there is a decidedly positive trend, or drift, throughout the period 1960Q1 to 2002Q1. The first difference of the series seems to have a constant mean, although inspection of Panel (b) suggests that the variance is an increasing function of time. As shown in Panel (c), the first difference of the logarithm (denoted by Δlppi_t) is the most likely candidate to be covariance stationary. Moreover, there is a strong economic reason to be interested in the logarithmic change, since Δlppi_t is a measure of inflation. However, the large volatility of the PPI accompanying the oil price shocks in the 1970s should make us somewhat wary of the assumption that the process is covariance stationary. At this point, some researchers would make additional transformations intended to reduce the volatility exhibited in the 1970s. However, it seems reasonable to estimate a model of the {Δlppi_t} sequence without any further transformations. As always, you should maintain a healthy skepticism of the accuracy of your model.
The autocorrelation and partial autocorrelation functions of the {Δlppi_t} sequence can be seen in Figure 2.6. Let us try to identify the tentative models that we would want to estimate. In making our decision, we note the following:

1. The ACF and PACF converge to zero reasonably quickly. We do not want to overdifference the data and try to model the {Δ^2 lppi_t} sequence.

2. The theoretical ACF of a pure MA(q) process cuts off to zero at lag q, and the theoretical ACF of an AR(1) process decays geometrically. Examination of Figure 2.6 suggests that neither of these specifications seems appropriate for the sample data.

3. The ACF does not decay geometrically. The value of ρ_1 is 0.603, and the values of ρ_2, ρ_3, and ρ_4 are 0.494, 0.451, and 0.446, respectively. Thus the ACF is suggestive of an AR(2) process or a process with both autoregressive and moving average components. The PACF is such that φ_11 = 0.604 and cuts off to 0.203 abruptly (i.e., φ_22 = 0.203). Overall, the PACF suggests that we should consider models with p = 1 and p = 2.

4. Note the jump in the ACF after lag 4 and the small jump in the PACF at lag 4 (φ_44 = 0.148 while φ_55 = −0.114). Since we are using quarterly data, we might want to incorporate a seasonal factor at lag 4.

Points 1 to 4 suggest an ARMA(1,1) or an AR(2) model. In addition, we might want to consider models with a seasonal term at lag 4. However, to compare with a variety of models, Table 2.4 reports estimates of six tentative models. To ensure comparability, all were estimated over the same sample period. We make the following observations:
1. The estimated AR(1) model confirms our analysis conducted in the identification stage. Even though the estimated value of a_1 (0.603) is less than unity in absolute value and almost four standard deviations from zero, the AR(1) specification is inadequate. Forming the Ljung-Box Q-statistic for 4 lags of the residuals yields a value of 13.9, so we can reject the null that Q(4) = 0 at the 1% significance level. Hence, the residuals of this model exhibit substantial serial correlation, and we must eliminate this model from consideration.

2. The AR(2) model is an improvement over the AR(1) specification. The estimated coefficients (a_1 = 0.480 and a_2 = 0.209) are each significantly different from zero at conventional levels and imply characteristic roots inside the unit circle. However, there is some ambiguity about the information content of the residuals. The Q-statistics indicate that the autocorrelations of the residuals are not statistically significant at the 5% level but are significant at the 10% level. As measured by the AIC and SBC, the fit of the AR(2) model is superior to that of the AR(1). Overall, the AR(2) model dominates the AR(1) specification.
3. The ARMA(1,1) specification is superior to the AR(2) model. The estimated coefficients are highly significant (with t-values of 14.9 and −4.41). The estimated value of a_1 is positive and less than unity, and the Q-statistics indicate that the autocorrelations of the residuals are not significant at conventional levels. Moreover, all goodness-of-fit measures select the ARMA(1,1) specification over the AR(2) model. Thus, there is little reason to maintain the AR(2) specification.

4. In order to account for the possibility of seasonality, we estimated the ARMA(1,1) model with an additional moving average coefficient at lag 4. That is, we estimated a model of the form

y_t = a_0 + a_1 y_{t-1} + ε_t + β_1 ε_{t-1} + β_4 ε_{t-4}.

Other seasonal patterns are considered in the next section. For now, note that the additive expression β_4 ε_{t-4} is often preferable to an additive autoregressive term a_4 y_{t-4}: for truly seasonal shocks, the expression β_4 ε_{t-4} captures spikes, not decay, at the quarterly lags. The slope coefficients of the estimated ARMA(1, (1,4)) model are all highly significant, with t-statistics of 9.46, −3.41, and 3.63. The Q-statistics of the residuals are all very low, implying that the autocorrelations are not statistically significantly different from zero. Moreover, the AIC and SBC select this model over the ARMA(1,1) model.

5. In contrast, the ARMA(2,1) model contains a superfluous coefficient. The t-statistic for a_2 is sufficiently low that we should eliminate this model.
6. Since a seasonal term seems to fit the data well, we estimated an equation of the form

y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + ε_t + β_4 ε_{t-4}.

Notice that all coefficients have t-values in excess of 2.0 and that the Q-statistics are not significant. However, as measured by the AIC and the SBC, this model does not fit the data as well as the ARMA(1, (1,4)).
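A comparison in the spirit of Table 2.4 can be sketched with any ARMA estimation routine. The code below assumes the PPI level has already been loaded into a pandas Series called ppi (e.g., from QUARTERLY.XLS); it fits several tentative specifications to Δlog(PPI) and reports the AIC, BIC, and Ljung-Box p-values. Note that the last specification uses a multiplicative seasonal MA term at lag 4, a close cousin of, but not identical to, the additive ARMA(1, (1,4)) model in the text, and the exact point estimates will depend on the sample and estimation method.

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox

# ppi is assumed to be a pandas Series holding the quarterly PPI level
dlppi = np.log(ppi).diff().dropna()      # the (approximately stationary) inflation series

candidates = {
    "AR(1)":            dict(order=(1, 0, 0)),
    "AR(2)":            dict(order=(2, 0, 0)),
    "ARMA(1,1)":        dict(order=(1, 0, 1)),
    # Multiplicative seasonal MA at lag 4 -- stands in for the additive ARMA(1,(1,4))
    "ARMA(1,1)x(0,1)4": dict(order=(1, 0, 1), seasonal_order=(0, 0, 1, 4)),
}

for name, spec in candidates.items():
    res = SARIMAX(dlppi, trend="c", **spec).fit(disp=False)
    lb = acorr_ljungbox(res.resid, lags=[4, 8], return_df=True)
    print(f"{name:18s} AIC={res.aic:9.2f}  BIC={res.bic:9.2f}  "
          f"Q(4) p={lb['lb_pvalue'].iloc[0]:.3f}  Q(8) p={lb['lb_pvalue'].iloc[1]:.3f}")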
Having identified and estimated a plausible model, we want to perform additional diagnostic checks of model adequacy. Due to the high volatility in the 1970s, the sample was split into two subperiods: 1960Q3-1971Q4 and 1972Q1-2002Q1. The model estimates for each subperiod were

Δlppi_t = 0.002 + 0.621 Δlppi_{t-1} + ε_t − 0.329 ε_{t-1} + 0.263 ε_{t-4}   (1960Q3-1971Q4)

and

Δlppi_t = 0.002 + 0.763 Δlppi_{t-1} + ε_t − 0.350 ε_{t-1} + 0.301 ε_{t-4}   (1972Q1-2002Q1).

The coefficients of the two models appear to be quite similar; we can formally test for the equality of coefficients using (47). The sums of squared residuals for the two subperiod models are SSR_1 = 0.001267 and SSR_2 = 0.017551, respectively. Estimating the model over the full sample period yields SSR = 0.018870. Since T = 167 and n = 4, (47) becomes

F = [(0.018870 − 0.001267 − 0.017551)/4] / [(0.001267 + 0.017551)/(167 − 8)] = 0.10984.

With 4 degrees of freedom in the numerator and 159 in the denominator, we cannot reject the null of no structural change in the coefficients (i.e., we accept the hypothesis that there is no change in the structural coefficients).
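The structural-change F-statistic from (47) is easy to reproduce once the three sums of squared residuals are available; the numbers below are those reported in the text.

from scipy import stats

# Sums of squared residuals: full sample and the two subperiods (values from the text)
ssr_full, ssr_1, ssr_2 = 0.018870, 0.001267, 0.017551
T, n = 167, 4                              # observations and estimated coefficients

F = ((ssr_full - ssr_1 - ssr_2) / n) / ((ssr_1 + ssr_2) / (T - 2 * n))
p_value = stats.f.sf(F, n, T - 2 * n)
print(f"F = {F:.5f}, p-value = {p_value:.3f}")   # F = 0.10984 -> no structural change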
Out-of-Sample Forecasts
We can assess the forecasting performance of the ARMA(1,1) and ARMA(1, (1,4)) models by using the Granger-Newbold and Diebold-Mariano tests discussed in the previous section. Given that the data set contains a total of 167 usable observations, it is possible to use a holdback period of 50 observations. This way, there are at least 117 observations in each of the estimated models and an adequate number of out-of-sample forecasts. First, the two models were estimated using all available observations through 1989Q3, and two one-step-ahead forecasts were obtained. The actual value of Δlppi_{1989Q4} is 0.00385; the ARMA(1,1) predicted a value of 0.00715 and the ARMA(1, (1,4)) model predicted a value of 0.00415. Thus, the forecast of the ARMA(1, (1,4)) is superior to that of the ARMA(1,1) for this first period. An additional 49 forecasts were obtained for the periods 1990Q1 to 2002Q1. Let {e_1t} denote the forecast errors from the ARMA(1,1) model and {e_2t} denote the forecast errors from the ARMA(1, (1,4)) model. The mean of {e_1t} is −0.00210, the mean of {e_2t} is −0.002250, and the estimated variances are such that Var(e_1t) = 0.000127 and Var(e_2t) = 0.000133. Thus, the forecasting performance of the ARMA(1,1) model is better. To ascertain whether this difference is statistically significant, we first use the Granger-Newbold test. Form the x_t and z_t series as x_t = e_1t + e_2t and z_t = e_1t − e_2t. The correlation coefficient between x_t and z_t is r_xz = −0.0767. Given that there are 50 observations in the holdback period, form the Granger-Newbold statistic

r_xz / sqrt[(1 − r_xz^2)/(H − 1)] = −0.0767 / sqrt[(1 − (−0.0767)^2)/49] = −0.5387.

With 49 degrees of freedom, a value of t = −0.5387 is not statistically significant. We can conclude that the forecasting performance of the ARMA(1,1) is not statistically significantly different from that of the ARMA(1, (1,4)).
We obtain virtually the same answer using the DM statistic. To illustrate the use of the DM test, suppose that the cost of a forecast error rises extremely quickly in the size of the error. In such circumstances, the loss function might be best represented by the forecast error raised to the fourth power. Hence,

d_t = e_{1t}^4 − e_{2t}^4.

The mean value d̄ of the {d_t} sequence is 2.995 × 10^{-9} and the estimated variance is 4.6487 × 10^{-16}. Since H = 50, we can form the DM statistic

DM = 2.995 × 10^{-9} / (4.6487 × 10^{-16}/49)^{1/2} = 0.972.

If there were serial correlation in the {d_t} series, we would need to use the specification in (63). Toward this end, we would select the statistically significant values of γ_q. However, in this example, we do not need to worry about serial correlation in the {d_t} sequence because the Ljung-Box statistics do not indicate that the autocorrelations are significant (the individual autocorrelations are shown on page 92). Instead, suppose we used the squared forecast errors as the measure of loss, so that d_t = e_{1t}^2 − e_{2t}^2. Now d̄ = 5.863 × 10^{-6} and the estimated variance is 1.8703 × 10^{-9} (the estimated values of the autocorrelations can be found on page 92).
The Ljung-Box Q(4) statistic is not significant at the 5% level, but Q(8) has a prob-value of 0.004. If we construct the DM statistic using all 12 autocovariances (recall γ_i = ρ_i γ_0), we find that γ_0 + 2γ_1 + ... + 2γ_12 = 5.957 × 10^{-9}. Thus, the value of the DM statistic is

DM = 5.863 × 10^{-6} / (5.957 × 10^{-9}/49)^{1/2} = 0.532.

If we now compare this value to a t-statistic with 49 degrees of freedom, we conclude that the difference between the forecasting abilities of the two models is not statistically significant. This is not too surprising, since Ashley (1997) shows that very large samples are often necessary to reveal a significant difference between the out-of-sample forecasting performances of similar models.
Although the ARMA(1,1) and ARMA(1, (1,4)) models appear to be adequate, other researchers might have selected a decidedly different model. Consider some of the alternatives:

1. Trends: Although the logarithmic change of the PPI appears to be stationary, the ACF converges to zero rather slowly. Moreover, both the ARMA(1,1) and ARMA(1, (1,4)) models yield estimated values of a_1 (0.871 and 0.768, respectively), which are close to unity. Some researchers might have chosen to model the second difference of the series. Others might have detrended the data using a deterministic time trend.
2. Seasonality: The seasonality of the data was modeled by using a moving average term at lag 4. However, there are many other plausible ways to model the seasonality in the data, as will be discussed in the next section. For example, consider the multiplicative seasonal model

(1 − a_1 L)y_t = (1 + β_1 L)(1 + β_4 L^4)ε_t.

Here the seasonal expression β_4 ε_{t-4} enters the model in a multiplicative, rather than a linear, fashion. Experimenting with various multiplicative seasonal coefficients might be a way to improve forecasting performance.
3. Volatility: Given the volatility of the Δlppi_t sequence during the 1970s, the assumption of a constant variance might not be appropriate. Transforming the data using a square root, rather than the logarithm, might be more appropriate. A general class of transformations was proposed by Box and Cox (1964). Suppose that all values of {y_t} are positive so that it is possible to construct the transformed {y_t^{(λ)}} sequence as

y_t^{(λ)} = (y_t^λ − 1)/λ   for λ ≠ 0
          = ln(y_t)         for λ = 0.

The common practice is to transform the data using a preselected value of λ. Selecting a value of λ that is close to zero acts to smooth the sequence. As in the PPI example, an ARMA model can be fitted to the transformed data. It is also possible to actually model the variance using the methods discussed in the next chapter.
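A minimal sketch of the Box-Cox transformation follows; the data are simulated, and the scipy routine that estimates λ by maximum likelihood is shown only for comparison with the preselected-λ practice described above.

import numpy as np
from scipy import stats

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, log(y) for lam = 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

# Simulated positive, trending series
y = np.exp(np.linspace(0, 3, 100)) * (1 + 0.05 * np.random.default_rng(4).normal(size=100))
print(box_cox(y, 0.5)[:3])          # square-root-type transformation
print(box_cox(y, 0.0)[:3])          # logarithmic transformation
y_bc, lam_hat = stats.boxcox(y)     # scipy chooses lambda by maximum likelihood
print("estimated lambda:", round(lam_hat, 3))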
11. Seasonality
The Box-Jenkins technique for modeling seasonal data is only a bit different from that of nonseasonal data. The twist introduced by seasonal data of period s is that the seasonal coefficients of the ACF and PACF appear at lags s, 2s, 3s, ..., rather than at lags 1, 2, 3, .... For example, two purely seasonal models for quarterly data might be

y_t = a_4 y_{t-4} + ε_t   (64)

and

y_t = ε_t + β_4 ε_{t-4}.   (65)

The theoretical correlogram for (64) is ρ_i = (a_4)^{i/4} if i/4 is an integer and ρ_i = 0 otherwise; thus, the ACF exhibits decay at lags 4, 8, 12, .... For model (65), the ACF exhibits a single spike at lag 4, and all other correlations are zero.
In practice, identification will be complicated by the fact that the seasonal pattern will interact with the nonseasonal pattern in the data. The ACF and PACF for a combined seasonal/nonseasonal process will reflect both elements. Note that the final model of the PPI estimated in the last section had the form

y_t = a_1 y_{t-1} + ε_t + β_1 ε_{t-1} + β_4 ε_{t-4}.   (66)
Alternatively, an autoregressive coefficient at lag 4 might have been used to capture the seasonality as follows:

y_t = a_1 y_{t-1} + a_4 y_{t-4} + ε_t + β_1 ε_{t-1}.
Both of these models treat the seasonal coefficients additively: an AR or an MA coefficient is added at the seasonal period. Multiplicative seasonality allows for the interaction of the ARMA and the seasonal effects. Consider the multiplicative specifications

(1 − a_1 L)y_t = (1 + β_1 L)(1 + β_4 L^4)ε_t,   (67)

(1 − a_1 L)(1 − a_4 L^4)y_t = (1 + β_1 L)ε_t.   (68)
Equation (67) differs from (66) in that it allows the moving average term at lag 1 to interact with the seasonal moving average effect at lag 4. In the same way, (68) allows the autoregressive term at lag 1 to interact with the seasonal autoregressive effect at lag 4. Many researchers prefer the multiplicative form since a rich interaction pattern can be captured with a small number of coefficients. Rewrite (67) as

y_t = a_1 y_{t-1} + ε_t + β_1 ε_{t-1} + β_4 ε_{t-4} + β_1 β_4 ε_{t-5}.

Estimating only three coefficients (i.e., a_1, β_1, and β_4) allows us to capture the effects of an autoregressive term and the effects of moving average terms at lags 1, 4, and 5. Of course, you do not get something for nothing. The estimates of the three moving average coefficients are interrelated. A researcher estimating the unconstrained model y_t = a_1 y_{t-1} + ε_t + β_1 ε_{t-1} + β_4 ε_{t-4} + β_5 ε_{t-5} would necessarily obtain a smaller residual sum of squares. However, (67) is clearly the more parsimonious model. If the unconstrained value of β_5 approximates the product β_1 β_4, the multiplicative model will be preferable.
Seasonal Differencing
The dashed line in Figure 2.7 shows the U.S. money supply, as measured by M1. It has a decidedly upward trend. The series, called M1NSA, is in the file QUARTERLY.XLS; we use the data to follow along with the discussion below. The logarithmic change, shown by the solid line, appears to be stationary. Nevertheless, there is a clear seasonal pattern in that the value for the fourth quarter of every year is substantially higher than the values for the adjacent quarters.
This combination of strong seasonality and nonstationarity is often found in economic data. The ACF for a process with strong seasonality is similar to that for a nonseasonal process; the main difference is that the spikes at lags s, 2s, 3s, ... do not exhibit rapid decay. We know that it is necessary to difference (or take the logarithmic change of) a nonstationary process. Similarly, if the autocorrelations at the seasonal lags do not decay, it is necessary to take the seasonal difference so that the other autocorrelations are not dwarfed by the seasonal effects. The ACF and PACF for the growth rate of M1 are shown in Panel (a) of Figure 2.8. For now, just focus on the autocorrelations at the seasonal lags. All seasonal autocorrelations are large and show no tendency to decay. In particular, ρ_4 = 0.66, ρ_8 = 0.52, ρ_12 = 0.43, ρ_16 = 0.42, ρ_20 = 0.47, and ρ_24 = 0.49. These autocorrelations reflect the fact that the change in M1 from one Christmas season to the next is not as pronounced as the change between the fourth quarter and the other quarters.
The first step in the Box-Jenkins method is to transform the data so as to make it stationary. A logarithmic transformation is helpful because it can straighten the nonlinear trend in M1. Let y_t denote the log of M1. As mentioned above, the first difference of the {y_t} sequence, illustrated by the solid line in Figure 2.7, appears to be stationary. However, to remove the strong seasonal persistence in the data, we also need to take the seasonal difference. For quarterly data, the seasonal difference is y_t − y_{t-4}. Since the order of differencing is irrelevant, we can form the transformed sequence

m_t = (1 − L)(1 − L^4)y_t.

Thus, we use the seasonal difference of the first difference. The ACF and PACF for the {m_t} sequence are shown in Panel (b) of Figure 2.8; the properties of this series are much more amenable to the Box-Jenkins methodology. The autocorrelations and partial autocorrelations for the first few lags are strongly suggestive of an AR(1) process (ρ_1 = φ_11 = 0.38, ρ_2 = 0.16, and φ_22 = 0.02). Recall that the ACF for an AR(1) process will decay and the PACF will cut off to zero after lag 1. Given that ρ_4 = −0.28, ρ_5 = 0.01, φ_44 = −0.34, and φ_55 = −0.28, there is evidence of remaining seasonality in the {m_t} sequence. The seasonal term is most likely to be in the form of an MA coefficient, since the autocorrelation cuts off to zero whereas the PACF does not. Nevertheless, it is best to estimate several similar models and then select the best. Estimates of the following three models are reported in Table 2.5:
Model 1: AR(1) with Seasonal MA

m_t = a_0 + a_1 m_{t-1} + ε_t + β_4 ε_{t-4}

Model 2: Multiplicative Autoregressive

m_t = a_0 + (1 + a_1 L)(1 + a_4 L^4)m_{t-1} + ε_t

Model 3: Multiplicative Moving Average

m_t = a_0 + (1 + β_1 L)(1 + β_4 L^4)ε_t
The point estimates of the coefficients all imply stationarity and invertibility. Moreover, all are at least six standard deviations from zero. However, the diagnostic statistics all suggest that Model 1 is preferred. Model 1 has the best fit in that it has the lowest sum of squared residuals (SSR). Moreover, the Q-statistics for lags 4, 8, and 12 indicate that the residual autocorrelations are insignificant. In contrast, the residual correlations for Model 2 are significant at long lags (i.e., Q(8) and Q(12) are significant at the 0.007 and 0.002 levels). This is because the multiplicative seasonal autoregressive (SAR) term does not adequately capture the seasonal pattern. An SAR term implies autoregressive decay from period s into period s + 1. In Panel (b) of Figure 2.8, the value of ρ_4 is −0.28 and ρ_5 is almost zero. Thus, a multiplicative seasonal moving average (SMA) term is more appropriate. Model 3 properly captures the seasonal pattern, but the MA(1) term does not capture the autoregressive decay present at the short lags. Other diagnostic methods, including splitting the sample, suggest that Model 1 is appropriate.
The out-of-sample forecasts are shown in Figure 2.9. To create the one- through twelve-step-ahead forecasts, Model 1 was estimated over the full sample period 1961Q3-2002Q1. The estimated model is

m_t = 0.529 m_{t-1} + ε_t − 0.758 ε_{t-4}.   (69)

Given that m_{2002Q1} = 0.00795 and the residual for 2001Q2 was 0.0119 (i.e., ε̂_{2001Q2} = 0.0119), the forecast of m_{2002Q2} is −0.00490. Now use this forecast and the value of ε̂_{2001Q3} to forecast m_{2002Q3}. You can continue in this fashion so as to obtain the out-of-sample forecasts for the {m_t} sequence. Although you do not have the residuals for periods beyond 2002Q1, you can simply use their forecasted values of zero. The trick to forecasting future values of M1 from the {m_t} sequence is to sum the changes and the seasonal changes so as to obtain the logarithm of the forecasted values of M1. Since m_t = (1 − L)(1 − L^4)ln(M1_t), it follows that the values of ln(M1_t) can be obtained from m_t + ln(M1_{t-1}) + ln(M1_{t-4}) − ln(M1_{t-5}). The first 12 of the forecasted values are plotted in Figure 2.9.
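The two steps, iterating (69) forward and then undoing the regular and seasonal differences, can be sketched as follows. The values of m_{2002Q1} and ε̂_{2001Q2} are taken from the text; the remaining starting values are hypothetical placeholders.

import numpy as np

a1, b4 = 0.529, -0.758                  # coefficients of equation (69)

# Conditioning information at the forecast origin (period T = 2002Q1):
m_hist = [0.012, -0.004, 0.006, 0.00795]        # last observed values of m_t (last one from text)
eps_hist = [0.0119, -0.003, 0.002, 0.001]       # residuals eps_{T-3},...,eps_T (first one from text)
logM1_hist = [4.10, 4.11, 4.13, 4.16, 4.17]     # ln(M1) for periods T-4,...,T (placeholders)

# Step 1: iterate (69) forward; future shocks are replaced by their expectation of zero
m_fc = []
for j in range(1, 13):
    ar_part = a1 * (m_fc[-1] if m_fc else m_hist[-1])
    ma_part = b4 * eps_hist[-4 + (j - 1)] if j <= 4 else 0.0   # eps_{T+j-4} is known only for j <= 4
    m_fc.append(ar_part + ma_part)
# The first element is approximately -0.0049, matching the forecast of m_{2002Q2} in the text.

# Step 2: rebuild ln(M1_{T+j}) = m_{T+j} + ln(M1_{T+j-1}) + ln(M1_{T+j-4}) - ln(M1_{T+j-5})
logM1 = list(logM1_hist)
for m in m_fc:
    logM1.append(m + logM1[-1] + logM1[-4] - logM1[-5])

print(np.round(m_fc, 5))
print(np.round(np.exp(logM1[5:]), 3))            # forecasted levels of M1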
The procedures illustrated in this example of fitting a model to highly seasonal data are typical of many other series. With highly seasonal data:

1. In the identification stage, it is usually necessary to seasonally difference the data and to check the ACF of the resultant series. Often, the seasonally differenced data will not be stationary. In such instances, the data may also need to be first differenced.
2. Use the ACF and PACF to identify potential models. Try to estimate models with low-order nonseasonal ARMA coefficients. Consider both additive and multiplicative seasonality. Allow the appropriate form of seasonality to be determined by the various diagnostic statistics.
A compact notation that allows for efficient representation of intricate models is as follows. The d-th difference of a series is denoted by Δ^d. Hence,

Δ^2 y_t = Δ(Δy_t) = Δ(y_t − y_{t-1}) = Δy_t − Δy_{t-1}
        = [y_t − y_{t-1}] − [y_{t-1} − y_{t-2}]
        = y_t − 2y_{t-1} + y_{t-2}.
A seasonal difference is denoted by Δ_s, where s stands for the seasonal period of the data. Thus, the D-th seasonal difference is denoted by Δ^D_s. For example, the second seasonal difference of a monthly series is

Δ^2_12 y_t = Δ_12(Δ_12 y_t) = Δ_12(y_t − y_{t-12}) = Δ_12 y_t − Δ_12 y_{t-12}
           = [y_t − y_{t-12}] − [y_{t-12} − y_{t-24}]
           = y_t − 2y_{t-12} + y_{t-24}.
Combining the two types of differencing yields Δ^d Δ^D_s. Multiplicative models are written in the form ARIMA(p, d, q)(P, D, Q)_s, where

p and q = the nonseasonal ARMA coefficients
d = number of nonseasonal differences
P = number of multiplicative autoregressive coefficients
D = number of seasonal differences
Q = number of multiplicative moving average coefficients
s = seasonal period.

Using this notation, we can say that the fitted model of the PPI is an ARIMA(1,1,0)(0,1,1)_4 model.
Moreover, the value of m_t can be written as m_t = ΔΔ_4 ln(M1_t).

In applied work, the ARIMA(1,1,0)(0,1,1) and the ARIMA(0,1,1)(0,1,1) models occur routinely; the latter is called the airline model ever since Box and Jenkins (1976) used this model to analyze airline travel data.
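In most software the ARIMA(p, d, q)(P, D, Q)_s notation maps directly onto the estimation call. The sketch below assumes a pandas Series m1 holding the level of the (not seasonally adjusted) quarterly M1 series; it fits the airline specification and an ARIMA(1,1,0)(0,1,1)_4 model, which, apart from the intercept, corresponds to Model 1 of Table 2.5. The results will depend on the data and estimation options used.

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# m1 is assumed to be a pandas Series with a quarterly index
log_m1 = np.log(m1)

# ARIMA(0,1,1)(0,1,1)_4: the "airline" specification applied to quarterly data
airline = SARIMAX(log_m1, order=(0, 1, 1), seasonal_order=(0, 1, 1, 4)).fit(disp=False)

# ARIMA(1,1,0)(0,1,1)_4: an AR(1) in the differences plus a seasonal MA term
model1 = SARIMAX(log_m1, order=(1, 1, 0), seasonal_order=(0, 1, 1, 4)).fit(disp=False)

print("airline AIC:", airline.aic, "  AR(1) + seasonal MA AIC:", model1.aic)
print(np.exp(model1.forecast(steps=12)))   # twelve-step-ahead forecasts of the M1 level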
12. Summary and Conclusions
This chapter has focused on the Box-Jenkins (1976) approach to identification, estimation, diagnostic checking, and forecasting of a univariate time series. ARMA models can be viewed as a special class of linear stochastic difference equations. By definition, an ARMA model is covariance stationary in that it has finite and time-invariant mean and covariances. For an ARMA model to be stationary, the characteristic roots of the difference equation must lie inside the unit circle. Moreover, the process must have started in the infinite past or the process must always be in equilibrium.
In the identification stage, the series is plotted and the sample autocorrelations and partial autocorrelations are examined. As illustrated using the U.S. PPI, a slowly decaying autocorrelation function suggests nonstationary behavior of the data. In such circumstances, Box and Jenkins recommend differencing the data. Formal tests of nonstationarity are presented in Chapter 4. A common practice is to use a logarithmic or Box-Cox transformation if the variance does not appear to be constant. Chapter 3 presents a number of modern techniques that can be used to model the variance.
The sample autocorrelations and partial autocorrelations of the suitably transformed data are compared to those of various theoretical ARMA processes. All plausible models are estimated and compared using a battery of diagnostic tests. A well-estimated model: (i) is parsimonious; (ii) has coefficients that imply stationarity and invertibility; (iii) fits the data well; (iv) has residuals that approximate a white noise process; (v) has coefficients that do not change over the sample period; and (vi) has good out-of-sample forecasts.