(c) Copyright 1990, 1994, 1996, Edward J. Wegman, All Rights Reserved
Chapter 1
Introduction
Time series analysis deals with probabilistic and structural inference about a sequence of data evolving through time. Normally, we shall write $X_1, \ldots, X_n$ as the time series observations. Frequently, this will be a sample from a continuous stochastic process $X(t)$, so that $X_j$ really means $X(j\,\Delta t)$, where $\Delta t$ is the sampling interval. Our basic model conceptually is a stochastic process $X(t, \omega)$, where $t$ is the time index and $\omega \in \Omega$, where $(\Omega, \mathcal{F}, P)$ is a probability space and $P$ is a probability measure. In simple terms, we are interested in discovering $P$ based on a time series $X_1, \ldots, X_n$. Fortunately, or unfortunately, this is a ridiculously hard problem in general. Hence, we typically make simplifying assumptions. For the purposes of our discussion, we will usually consider only finite-dimensional distributions, i.e.

$F_{X_{t_1}, \ldots, X_{t_k}}(u_1, \ldots, u_k) = P(X_{t_1} \le u_1, \ldots, X_{t_k} \le u_k), \quad \forall k,\ \forall t_1, \ldots, t_k,\ \forall u_1, \ldots, u_k,$

which in the simple one-dimensional case is

$F_{X_t}(u) = P(X_t \le u).$

The well-known Kolmogorov Extension Theorem basically says that knowledge of all finite-dimensional distributions is sufficient to reconstruct the probability measure $P$. This theorem is well beyond the scope of our course, however, and hence will not be discussed any further.
We actually go even further in making assumptions and assume that the finite-dimensional distributions are all multivariate normal (or Gaussian). In this case, it is sufficient to know only the first and second moments, because the first and second moments of a Gaussian distribution are necessary and sufficient to characterize that distribution. We let $\mu_t = EX_t$ and $\gamma_{s,t} = \operatorname{cov}(X_s, X_t)$. If we know $\mu_t$ and $\gamma_{s,t}$, then we have enough information to reconstruct all of the finite-dimensional distributions and hence the probability measure $P$.
We actually begin our discussion with one more simplification. We basically want our process to be invariant under a time shift, i.e. time homogeneous, so that we will assume, at least initially, that the probability structure does not shift with time. We write this as follows:

(1.1) $F_{X_{t_1+\tau}, \ldots, X_{t_k+\tau}}(u_1, \ldots, u_k) = F_{X_{t_1}, \ldots, X_{t_k}}(u_1, \ldots, u_k)$

for every $t_1, \ldots, t_k$, for every $u_1, \ldots, u_k$, for every $k$, and for every $\tau$. The parameter $\tau$ is the constant time shift. If this equation holds, we say the process $X_t$ is strictly stationary.

A corollary under our Gaussian assumption is that strict stationarity implies the means and covariances are invariant under time shifts. In particular,

(1.2) $\mu_{t+\tau} = \mu_t$

and

(1.3) $\gamma_{t+\tau,\, s+\tau} = \gamma_{t,s}.$

Thus if we let $\tau = -t$ in (1.2), then $\mu_t = \mu_0 = \mu$, a constant, and if we let $\tau = -s$ in (1.3), then $\gamma_{t,s} = \gamma_{t-s,\,0} = \gamma_{t-s}$. Thus, the mean is constant; we will usually assume $\mu = 0$ for convenience. Moreover, the covariance depends only on the difference between the indices, called the lag. $\gamma_{t-s}$ is called the autocovariance or lag covariance. A process which has a constant first moment and a covariance function which depends only on the lag is said to be weakly stationary or wide-sense stationary. A process may be strictly stationary and not weakly stationary if it has no second-order moments. Conversely, a process may be weakly stationary but not strictly stationary if, say, its third moments change with time. Usually there is no confusion since, under Gaussian distribution assumptions, these two notions are equivalent. In this case we simply say the process is stationary.
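For a concrete illustration of the stationarity quantities above, the sketch below (ours, not from the text; the function name is hypothetical) estimates the constant mean and the lag covariances $\gamma_k$ from a single realization:

```python
# Minimal sketch (ours): estimating mu and the autocovariance gamma_k of a
# weakly stationary series from one realization.
import numpy as np

def sample_autocovariance(x, max_lag):
    """gamma_hat[k] = (1/n) * sum_t (x_t - xbar)(x_{t+k} - xbar)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    return np.array([np.dot(d[:n - k], d[k:]) / n for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
# Gaussian white noise: gamma_0 = 1, gamma_k = 0 for k > 0
x = rng.standard_normal(10_000)
gamma = sample_autocovariance(x, 5)
```

For white noise the estimates should be near 1 at lag 0 and near 0 at positive lags, reflecting a covariance structure that depends only on the lag.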
The problem of estimating $P$ has thus been reduced to the much easier problem of estimating $\mu$ and $\gamma_\tau$. This is the probabilistic or statistical inference problem. We are also interested in structural inference, i.e. finding the functional structure of the $X_t$ process. We can do this in the frequency domain or the time domain.

In the frequency domain, we consider the function

(1.4) $f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega\tau}\,\gamma(\tau)\,d\tau.$

This is called the spectral density or power spectral density. The spectral density is the Fourier transform of the covariance function and hence carries the same information as the covariance function.
The time domain models, usually assumed to be linear and stationary, are of the form

(1.5) $X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t + \theta_1\epsilon_{t-1} + \cdots + \theta_q\epsilon_{t-q},$

where the $\phi_i$ and $\theta_i$ are parameters and $\epsilon_t$ is white noise. This is called the autoregressive-moving average model (ARMA) and is the most general linear model of a stationary stochastic process. As we shall later see, the parameters $\phi = (\phi_1, \ldots, \phi_p)$ and $\theta = (\theta_1, \ldots, \theta_q)$ determine and are determined by the covariance function. Moreover, we will see that we can write the spectral density in terms of $\phi$ and $\theta$, so that we have a fundamental triad given in the figure below.
[Figure: the fundamental triad. Probabilistic: $\gamma_\tau$ (and hence $P$); Frequency Domain: $f(\omega)$; Time Domain: $(\phi, \theta)$.]
Basically given one corner, we can derive the other two. Hence, under the stationary, Gaussian
assumptions, analyses in the probability domain, the frequency domain, or the time domain are all basically
equivalent and fundamentally dependent upon the mean and covariance structure of the process. The nature of
the problem and what we are trying to infer will generally lead us to prefer one mode over another.
Generally speaking, engineering applications tend to succeed somewhat better with a frequency domain
model while business and economic applications tend to be somewhat better represented by time domain
models. This, of course, is not a hard and fast rule by any means. We will consider both frequency domain and
time domain models.
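As a concrete companion to the ARMA form (1.5), here is a small simulation sketch (ours, not from the text; the parameter values are arbitrary):

```python
# Sketch (ours) of simulating the ARMA model (1.5):
# X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} + eps_t + theta_1 eps_{t-1} + ...
import numpy as np

def simulate_arma(phi, theta, n, rng):
    p, q = len(phi), len(theta)
    eps = rng.standard_normal(n + q)     # white noise with variance 1
    x = np.zeros(n + p)                  # zero start-up values
    for t in range(n):
        ar = sum(phi[i] * x[p + t - 1 - i] for i in range(p))
        ma = eps[q + t] + sum(theta[i] * eps[q + t - 1 - i] for i in range(q))
        x[p + t] = ar + ma
    return x[p:]

rng = np.random.default_rng(1)
x = simulate_arma([0.5], [0.4], 5000, rng)   # an ARMA(1,1) example
```

For this ARMA(1,1), the stationary variance is $(1 + 2\phi\theta + \theta^2)/(1 - \phi^2) = 2.08$, which the sample variance should approximate.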
Chapter 2
Frequency Models
The major focus of the first part of our discussion is structural inference for time series in the frequency domain. Historically, one of the earliest attempts to model this time series structure is known as the method of hidden periodicities. The premise of such a technique is that the time series may be modeled by a linearly superimposed sum of sinusoids together with some Gaussian errors. We will develop such a model in several stages.
2.1. A Single Sinusoid in Noise. Consider the model $X_t = R\cos(\omega t + \phi)$, where the real-valued $\omega$ is called the (angular) frequency, $\phi \in [-\pi, \pi]$ is called the phase, and $R \ge 0$ is called the amplitude. The stochastic model is

(2.1.1) $X_t = R\cos(\omega t + \phi) + \epsilon_t.$

Using the identity

(2.1.2) $\cos(\alpha + \beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta),$

we may expand $R\cos(\omega t + \phi) = R\cos(\omega t)\cos(\phi) - R\sin(\omega t)\sin(\phi)$, so that the model becomes

$X_t = A\cos(\omega t) + B\sin(\omega t) + \epsilon_t,$

where $A = R\cos(\phi)$ and $B = -R\sin(\phi)$.
The least squares criterion is

$SS = \sum_{t=0}^{n-1}\big(X_t - A\cos(\omega t) - B\sin(\omega t)\big)^2.$

Setting the partial derivatives to zero,

$\frac{\partial SS}{\partial A} = -2\sum_{t=0}^{n-1}\cos(\omega t)\big(X_t - A\cos(\omega t) - B\sin(\omega t)\big) = 0$

and

$\frac{\partial SS}{\partial B} = -2\sum_{t=0}^{n-1}\sin(\omega t)\big(X_t - A\cos(\omega t) - B\sin(\omega t)\big) = 0,$

or, rewriting,

$A\sum_{t=0}^{n-1}\cos^2(\omega t) + B\sum_{t=0}^{n-1}\cos(\omega t)\sin(\omega t) - \sum_{t=0}^{n-1}X_t\cos(\omega t) = 0$

and

$A\sum_{t=0}^{n-1}\cos(\omega t)\sin(\omega t) + B\sum_{t=0}^{n-1}\sin^2(\omega t) - \sum_{t=0}^{n-1}X_t\sin(\omega t) = 0.$

Let $a = \sum_{t=0}^{n-1}\cos^2(\omega t)$, $b = \sum_{t=0}^{n-1}\cos(\omega t)\sin(\omega t)$ and $c = \sum_{t=0}^{n-1}\sin^2(\omega t)$. The normal equations are then

(2.1.6) $\hat A a + \hat B b = \sum_{t=0}^{n-1}X_t\cos(\omega t),\qquad \hat A b + \hat B c = \sum_{t=0}^{n-1}X_t\sin(\omega t).$

Multiplying the first equation by $c$ and the second by $b$ and subtracting eliminates $\hat B$; multiplying the first by $b$ and the second by $a$ and subtracting eliminates $\hat A$. This yields

(2.1.7) $\hat A = \frac{\sum\sin^2(\omega t)\sum X_t\cos(\omega t) - \sum\sin(\omega t)\cos(\omega t)\sum X_t\sin(\omega t)}{D} = \frac{c\sum X_t\cos(\omega t) - b\sum X_t\sin(\omega t)}{ac - b^2}$

and

(2.1.8) $\hat B = \frac{\sum\cos^2(\omega t)\sum X_t\sin(\omega t) - \sum\sin(\omega t)\cos(\omega t)\sum X_t\cos(\omega t)}{D} = \frac{a\sum X_t\sin(\omega t) - b\sum X_t\cos(\omega t)}{ac - b^2},$

where

$D = ac - b^2 = \sum_{t=0}^{n-1}\sin^2(\omega t)\sum_{t=0}^{n-1}\cos^2(\omega t) - \Big(\sum_{t=0}^{n-1}\sin(\omega t)\cos(\omega t)\Big)^2.$
To evaluate $a$, $b$ and $c$, recall the identity $\cos(\alpha)\cos(\beta) = \frac12\{\cos(\alpha+\beta) + \cos(\alpha-\beta)\}$. Taking $\alpha = \beta = \omega t$,

$\sum_{t=0}^{n-1}\cos^2(\omega t) = \frac12\sum_{t=0}^{n-1}\{\cos(2\omega t) + \cos(0)\} = \frac n2 + \frac12\sum_{t=0}^{n-1}\cos(2\omega t).$

To evaluate $\sum\cos(2\omega t)$, we recall the fundamental identity

$\cos(\theta t) = \frac{e^{i\theta t} + e^{-i\theta t}}{2},$

so that

$\sum_{t=0}^{n-1}\cos(\theta t) = \frac12\Big\{\sum_{t=0}^{n-1}e^{i\theta t} + \sum_{t=0}^{n-1}e^{-i\theta t}\Big\}.$

These are geometric series:

$\sum_{t=0}^{n-1}e^{i\theta t} = \frac{1 - e^{in\theta}}{1 - e^{i\theta}}\qquad\text{and}\qquad \sum_{t=0}^{n-1}e^{-i\theta t} = \frac{1 - e^{-in\theta}}{1 - e^{-i\theta}}.$

Factoring $e^{in\theta/2}$ out of the numerator and $e^{i\theta/2}$ out of the denominator of the first term, and $e^{-in\theta/2}$ and $e^{-i\theta/2}$ respectively out of the second, we have

$\sum_{t=0}^{n-1}\cos(\theta t) = \frac12\Big\{e^{i(n-1)\theta/2}\,\frac{\sin(n\theta/2)}{\sin(\theta/2)} + e^{-i(n-1)\theta/2}\,\frac{\sin(n\theta/2)}{\sin(\theta/2)}\Big\},$

or

$\sum_{t=0}^{n-1}\cos(\theta t) = \cos\Big(\frac{(n-1)\theta}{2}\Big)\,\frac{\sin(n\theta/2)}{\sin(\theta/2)}.$

Thus, letting $\theta = 2\omega$,

$\sum_{t=0}^{n-1}\cos(2\omega t) = \cos\big((n-1)\omega\big)\,\frac{\sin(n\omega)}{\sin(\omega)}.$

Finally we have

(2.1.9) $\sum_{t=0}^{n-1}\cos^2(\omega t) = \frac n2\Big\{1 + \frac{\sin(n\omega)}{n\sin(\omega)}\cos\big((n-1)\omega\big)\Big\} = \frac n2\big\{1 + D_n(\omega)\cos((n-1)\omega)\big\}.$

The function $D_n(\omega) = \dfrac{\sin(n\omega)}{n\sin(\omega)}$ is called the Dirichlet kernel. Similarly,

(2.1.10) $\sum_{t=0}^{n-1}\cos(\omega t)\sin(\omega t) = \frac n2\,D_n(\omega)\sin\big((n-1)\omega\big)$

and

(2.1.11) $\sum_{t=0}^{n-1}\sin^2(\omega t) = \frac n2\big\{1 - D_n(\omega)\cos((n-1)\omega)\big\}.$

Notice as verification that

$\sum_{t=0}^{n-1}\sin^2(\omega t) + \sum_{t=0}^{n-1}\cos^2(\omega t) = \frac n2\big\{1 - D_n(\omega)\cos((n-1)\omega)\big\} + \frac n2\big\{1 + D_n(\omega)\cos((n-1)\omega)\big\} = n.$
Now we can write $\hat A$ and $\hat B$ in terms of the Dirichlet kernel in closed form by substituting (2.1.9) through (2.1.11) into (2.1.7) and (2.1.8). We have now expressed $a$, $b$ and $c$ as follows:

(2.1.12) $a = \frac n2\big\{1 + D_n(\omega)\cos((n-1)\omega)\big\},\qquad b = \frac n2\,D_n(\omega)\sin\big((n-1)\omega\big),\qquad c = \frac n2\big\{1 - D_n(\omega)\cos((n-1)\omega)\big\}.$
Since $\hat A = \hat R\cos(\hat\phi)$ and $\hat B = -\hat R\sin(\hat\phi)$, we have

$\hat A^2 + \hat B^2 = \hat R^2\cos^2(\hat\phi) + \hat R^2\sin^2(\hat\phi) = \hat R^2,$

so that $\hat R = \sqrt{\hat A^2 + \hat B^2}$, and

$-\frac{\hat B}{\hat A} = \frac{\hat R\sin(\hat\phi)}{\hat R\cos(\hat\phi)} = \tan(\hat\phi).$

Recovering $\hat\phi$ requires attention to the quadrant:

Case 1. If $\hat A > 0$, then $\hat\phi = \arctan(-\hat B/\hat A) \in (-\frac\pi2, \frac\pi2)$.
Case 2. If $\hat A < 0$ and $\hat B < 0$, then $\hat\phi = \arctan(-\hat B/\hat A) + \pi$.
Case 3. If $\hat A < 0$ and $\hat B > 0$, then $\hat\phi = \arctan(-\hat B/\hat A) - \pi$.
Case 4. If $\hat A = 0$ and $\hat B < 0$, then $\hat\phi = \frac\pi2$.
Case 5. If $\hat A = 0$ and $\hat B > 0$, then $\hat\phi = -\frac\pi2$.
Case 6. If $\hat A = 0$ and $\hat B = 0$, then $\hat R = 0$ and $\hat\phi$ is undefined.

Thus we can determine $\hat R$ and $\hat\phi$ from $\hat A$ and $\hat B$, which can in turn be estimated from the data.
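The estimation scheme above can be sketched in code (ours, not from the text; assumes the frequency $\omega$ is known):

```python
# Sketch (ours) of the normal-equations solution (2.1.7)-(2.1.8) and the
# recovery of R and phi from A and B.
import numpy as np

def fit_sinusoid(x, w):
    t = np.arange(len(x))
    c, s = np.cos(w * t), np.sin(w * t)
    a, b, cc = c @ c, c @ s, s @ s        # the a, b, c of the text
    det = a * cc - b * b                  # D = ac - b^2
    A = (cc * (x @ c) - b * (x @ s)) / det
    B = (a * (x @ s) - b * (x @ c)) / det
    # R and phi; arctan2 handles the quadrant cases in one call
    return A, B, np.hypot(A, B), np.arctan2(-B, A)

t = np.arange(200)
x = 2.0 * np.cos(0.7 * t + 0.5)          # R = 2, phi = 0.5, omega = 0.7
A, B, R, phi = fit_sinusoid(x, 0.7)
```

Note that `np.arctan2(-B, A)` implements the six-case quadrant analysis above in a single call.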
Least Squares Solution with Non-Zero Mean

If $\mu$, the mean, is not equal to 0, then we may construct normal equations as follows. Consider again the sum of squares

$SS = \sum_{t=0}^{n-1}\big(X_t - \mu - A\cos(\omega t) - B\sin(\omega t)\big)^2.$

Setting partials to zero,

(2.1.13) $\frac{\partial SS}{\partial A} = -2\sum_{t=0}^{n-1}\big\{(X_t - \mu) - A\cos(\omega t) - B\sin(\omega t)\big\}\cos(\omega t) = 0,$

(2.1.14) $\frac{\partial SS}{\partial B} = -2\sum_{t=0}^{n-1}\big\{(X_t - \mu) - A\cos(\omega t) - B\sin(\omega t)\big\}\sin(\omega t) = 0,$

and

(2.1.15) $\frac{\partial SS}{\partial\mu} = -2\sum_{t=0}^{n-1}\big\{(X_t - \mu) - A\cos(\omega t) - B\sin(\omega t)\big\} = 0.$

Equations (2.1.13) and (2.1.14) are the same as our previous normal equations with $X_t - \mu$ substituted for $X_t$. Thus

(2.1.16) $\hat A = \frac{\sum\sin^2(\omega t)\sum(X_t - \hat\mu)\cos(\omega t) - \sum\sin(\omega t)\cos(\omega t)\sum(X_t - \hat\mu)\sin(\omega t)}{D}$

and

$\hat B = \frac{\sum\cos^2(\omega t)\sum(X_t - \hat\mu)\sin(\omega t) - \sum\sin(\omega t)\cos(\omega t)\sum(X_t - \hat\mu)\cos(\omega t)}{D},$

where again

$D = \sum_{t=0}^{n-1}\sin^2(\omega t)\sum_{t=0}^{n-1}\cos^2(\omega t) - \Big(\sum_{t=0}^{n-1}\sin(\omega t)\cos(\omega t)\Big)^2.$

Equation (2.1.15) yields

(2.1.17) $\sum_{t=0}^{n-1}X_t - n\hat\mu - \hat A\sum_{t=0}^{n-1}\cos(\omega t) - \hat B\sum_{t=0}^{n-1}\sin(\omega t) = 0.$

Notice two dilemmas: 1. As formulated, the solutions for $\hat A$ and $\hat B$ depend on $\hat\mu$, and similarly the solution for $\hat\mu$ depends on $\hat A$ and $\hat B$. 2. Notice also that the estimator for $\mu$ is not $\bar X$. Thus as things stand now we have a recursive, but not direct, solution for the case $\mu \ne 0$. Remember that our first inclination would be to centralize the data by subtracting $\bar X$, but (2.1.17) tells us that we may not do this and still have a least squares solution.
The Matrix Formulation of the General Least Squares Problem

Letting $a$, $b$ and $c$ have their previous meanings, and now defining $d = \sum_{t=0}^{n-1}\cos(\omega t)$ and $e = \sum_{t=0}^{n-1}\sin(\omega t)$, we can write the normal equations as

(2.1.18) $\hat Aa + \hat Bb + \hat\mu d = \sum_{t=0}^{n-1}X_t\cos(\omega t),\qquad \hat Ab + \hat Bc + \hat\mu e = \sum_{t=0}^{n-1}X_t\sin(\omega t),\qquad \hat Ad + \hat Be + \hat\mu n = \sum_{t=0}^{n-1}X_t.$

In matrix form,

(2.1.19) $\begin{pmatrix}a & b & d\\ b & c & e\\ d & e & n\end{pmatrix}\begin{pmatrix}\hat A\\ \hat B\\ \hat\mu\end{pmatrix} = \begin{pmatrix}\sum_{t=0}^{n-1}X_t\cos(\omega t)\\ \sum_{t=0}^{n-1}X_t\sin(\omega t)\\ \sum_{t=0}^{n-1}X_t\end{pmatrix}.$

Taking the inverse matrix and solving for the unknown parameters, we have

(2.1.20) $\begin{pmatrix}\hat A\\ \hat B\\ \hat\mu\end{pmatrix} = \begin{pmatrix}a & b & d\\ b & c & e\\ d & e & n\end{pmatrix}^{-1}\begin{pmatrix}\sum_{t=0}^{n-1}X_t\cos(\omega t)\\ \sum_{t=0}^{n-1}X_t\sin(\omega t)\\ \sum_{t=0}^{n-1}X_t\end{pmatrix}.$
Now

(2.1.21) $nD_n(\omega) = \frac{\sin(n\omega)}{\sin(\omega)},\qquad\text{so that}\qquad |nD_n(\omega)| \le \frac{1}{|\sin(\omega)|},$

which is bounded for $\omega$ bounded away from $0$ and $\pi$. Thus, for such $\omega$,

(2.1.22) $\sum_{t=0}^{n-1}\cos^2(\omega t) \approx \frac n2,\qquad \sum_{t=0}^{n-1}\sin(\omega t)\cos(\omega t) \approx 0,\qquad \sum_{t=0}^{n-1}\sin^2(\omega t) \approx \frac n2,$

and similarly $d \approx 0$ and $e \approx 0$. The matrix in (2.1.20) is then approximately diagonal, so that

(2.1.23) $\hat A \approx \frac2n\sum_{t=0}^{n-1}(X_t - \hat\mu)\cos(\omega t),\qquad \hat B \approx \frac2n\sum_{t=0}^{n-1}(X_t - \hat\mu)\sin(\omega t),$

with $\hat\mu \approx \bar X$. These are the so-called quick and dirty estimates.
Least Squares Estimation of the Optimal Frequency, $\omega$, in Model (2.1.1)

Using this formulation of the quick and dirty estimates of $\hat A$, $\hat B$ and $\hat\mu$, we may perform a calculation for the residual sum of squares:

$SS(\text{residuals}) = \sum_{t=0}^{n-1}\big(X_t - \hat\mu - \hat A\cos(\omega t) - \hat B\sin(\omega t)\big)^2$

$= \sum_{t=0}^{n-1}(X_t - \hat\mu)^2 - 2\sum_{t=0}^{n-1}(X_t - \hat\mu)\big(\hat A\cos(\omega t) + \hat B\sin(\omega t)\big) + \sum_{t=0}^{n-1}\big(\hat A\cos(\omega t) + \hat B\sin(\omega t)\big)^2.$

By the quick and dirty estimates, $\sum(X_t - \hat\mu)\cos(\omega t) = \frac n2\hat A$ and $\sum(X_t - \hat\mu)\sin(\omega t) = \frac n2\hat B$, so, using (2.1.22) for the squared term (and $2\hat A\hat B\sum\sin(\omega t)\cos(\omega t) \approx 0$),

$SS(\text{residuals}) \approx \sum_{t=0}^{n-1}(X_t - \hat\mu)^2 - n\hat A^2 - n\hat B^2 + \hat A^2\Big(\frac n2\Big) + \hat B^2\Big(\frac n2\Big) + 0.$

Therefore,

$SS(\text{residuals}) \approx \sum_{t=0}^{n-1}(X_t - \hat\mu)^2 - \frac n2\big(\hat A^2 + \hat B^2\big) = SS(\text{corrected}) - SS(\text{due to model at }\omega),$

where $SS(\text{corrected}) = \sum_{t=0}^{n-1}(X_t - \hat\mu)^2$ and we define

$SS(\text{due to model at }\omega) = \frac n2\big(\hat A^2 + \hat B^2\big) = \frac n2\hat R^2.$

We recall that $R^2$ is the amplitude squared of the cosine model (2.1.1), and note that we can regard $\hat R^2$ as a function of the frequency, $\hat R^2(\omega)$. This suggests that we can also estimate $\omega$: since maximizing $\frac n2\hat R^2(\omega)$ minimizes the residual sum of squares, the least squares estimate of $\omega$ is

$\hat\omega = \arg\max_\omega\,\hat R^2(\omega).$

The function $\frac{n}{8\pi}\hat R^2(\omega)$ is called the periodogram.
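The frequency scan just described can be sketched as follows (ours, not from the text; the grid and parameters are arbitrary):

```python
# Sketch (ours): scan the criterion R_hat^2(w) over a frequency grid and take
# the maximizing frequency as the least squares estimate of omega.
import numpy as np

def r2_hat(x, w):
    n = len(x)
    t = np.arange(n)
    xc = x - x.mean()
    A = (2 / n) * np.sum(xc * np.cos(w * t))   # quick and dirty estimates
    B = (2 / n) * np.sum(xc * np.sin(w * t))
    return A * A + B * B

rng = np.random.default_rng(2)
t = np.arange(512)
x = 3.0 * np.cos(0.9 * t + 0.3) + rng.standard_normal(512)   # omega = 0.9
grid = np.linspace(0.05, 3.0, 2000)
w_hat = grid[np.argmax([r2_hat(x, w) for w in grid])]
```

With a strong sinusoid the maximizer lands very close to the true frequency 0.9, which is the intuition behind using the periodogram as a diagnostic.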
2.2. Multiple Sinusoids in Noise. Consider now the more general model

(2.2.1) $X_t = \mu + \sum_{j=1}^{m}\big\{A_j\cos(\omega_j t) + B_j\sin(\omega_j t)\big\} + \epsilon_t,\quad t = 0, \ldots, n-1.$

In this case,

(2.2.2) $SS = \sum_{t=0}^{n-1}\Big(X_t - \mu - \sum_{j=1}^{m}\big\{A_j\cos(\omega_j t) + B_j\sin(\omega_j t)\big\}\Big)^2.$

Setting partials to zero,

$\frac{\partial SS}{\partial A_k} = -2\sum_{t=0}^{n-1}\Big(X_t - \mu - \sum_{j=1}^{m}\big\{A_j\cos(\omega_j t) + B_j\sin(\omega_j t)\big\}\Big)\cos(\omega_k t) = 0,\quad k = 1, \ldots, m,$

$\frac{\partial SS}{\partial B_k} = -2\sum_{t=0}^{n-1}\Big(X_t - \mu - \sum_{j=1}^{m}\big\{A_j\cos(\omega_j t) + B_j\sin(\omega_j t)\big\}\Big)\sin(\omega_k t) = 0,\quad k = 1, \ldots, m,$

and

$\frac{\partial SS}{\partial\mu} = -2\sum_{t=0}^{n-1}\Big(X_t - \mu - \sum_{j=1}^{m}\big\{A_j\cos(\omega_j t) + B_j\sin(\omega_j t)\big\}\Big) = 0.$

Define

(2.2.3) $a_{jk} = \sum_{t=0}^{n-1}\cos(\omega_j t)\cos(\omega_k t),\quad b_{jk} = \sum_{t=0}^{n-1}\sin(\omega_j t)\cos(\omega_k t),\quad c_{jk} = \sum_{t=0}^{n-1}\sin(\omega_j t)\sin(\omega_k t),$

$d_k = \sum_{t=0}^{n-1}\cos(\omega_k t)\qquad\text{and}\qquad e_k = \sum_{t=0}^{n-1}\sin(\omega_k t).$

Then the normal equations become

(2.2.4) $\sum_{j=1}^{m}A_j a_{jk} + \sum_{j=1}^{m}B_j b_{jk} + \mu d_k - \sum_{t=0}^{n-1}X_t\cos(\omega_k t) = 0,\quad k = 1, \ldots, m,$

$\sum_{j=1}^{m}A_j b_{kj} + \sum_{j=1}^{m}B_j c_{jk} + \mu e_k - \sum_{t=0}^{n-1}X_t\sin(\omega_k t) = 0,\quad k = 1, \ldots, m,$

and

$\sum_{j=1}^{m}A_j d_j + \sum_{j=1}^{m}B_j e_j + \mu n - \sum_{t=0}^{n-1}X_t = 0.$
In matrix notation, let $\mathbf a$, $\mathbf b$ and $\mathbf c$ be the $m\times m$ matrices with entries $a_{jk}$, $b_{jk}$ and $c_{jk}$, and let

(2.2.5) $\mathbf d = (d_1, \ldots, d_m)^T,\quad \mathbf e = (e_1, \ldots, e_m)^T,\quad \mathbf A = (A_1, \ldots, A_m)^T,\quad \mathbf B = (B_1, \ldots, B_m)^T,$

and finally

(2.2.6) $\mathbf S = \Big(\sum_{t=0}^{n-1}X_t\cos(\omega_1 t), \ldots, \sum_{t=0}^{n-1}X_t\cos(\omega_m t),\ \sum_{t=0}^{n-1}X_t\sin(\omega_1 t), \ldots, \sum_{t=0}^{n-1}X_t\sin(\omega_m t),\ \sum_{t=0}^{n-1}X_t\Big)^T.$

Then the normal equations may be written

(2.2.7) $\begin{pmatrix}\mathbf a & \mathbf b & \mathbf d\\ \mathbf b^T & \mathbf c & \mathbf e\\ \mathbf d^T & \mathbf e^T & n\end{pmatrix}\begin{pmatrix}\hat{\mathbf A}\\ \hat{\mathbf B}\\ \hat\mu\end{pmatrix} = \mathbf S,$

so that

(2.2.8) $\begin{pmatrix}\hat{\mathbf A}\\ \hat{\mathbf B}\\ \hat\mu\end{pmatrix} = \begin{pmatrix}\mathbf a & \mathbf b & \mathbf d\\ \mathbf b^T & \mathbf c & \mathbf e\\ \mathbf d^T & \mathbf e^T & n\end{pmatrix}^{-1}\mathbf S.$
Expressions of the form $\sum_{t=0}^{n-1}\cos(\omega_j t)\cos(\omega_k t)$ will play a critical role in evaluating equations (2.2.8). Let us temporarily divert our attention to evaluating expressions of this sort. Let us restrict our attention to $0 \le \omega \le 2\pi$, and we will eventually want to consider $\omega_j = 2\pi(j/n)$, $j = 0, 1, \ldots, n-1$. Thus, just as we consider sampled times $t = 0, \ldots, n-1$, we consider sampled frequencies $\omega_j$, $j = 0, \ldots, n-1$.

Recall the identity we used earlier,

$\cos(a)\cos(b) = \frac12\big\{\cos(a+b) + \cos(a-b)\big\}.$

We may thus write

$\sum_{t=0}^{n-1}\cos(\omega_j t)\cos(\omega_k t) = \frac12\sum_{t=0}^{n-1}\cos\big((\omega_j+\omega_k)t\big) + \frac12\sum_{t=0}^{n-1}\cos\big((\omega_j-\omega_k)t\big).$

Using the earlier closed form solution and the Dirichlet formula, we may write

$\sum_{t=0}^{n-1}\cos(\omega_j t)\cos(\omega_k t) = \frac n2\cos\Big(\frac{(n-1)(\omega_j+\omega_k)}{2}\Big)D_n\Big(\frac{\omega_j+\omega_k}{2}\Big) + \frac n2\cos\Big(\frac{(n-1)(\omega_j-\omega_k)}{2}\Big)D_n\Big(\frac{\omega_j-\omega_k}{2}\Big),$

where $D_n(x) = \dfrac{\sin(nx)}{n\sin(x)}$.

Now $(\omega_j+\omega_k)/2 = \pi(j+k)/n$, so the first Dirichlet function $D_n(\pi(j+k)/n)$ has a numerator equal to $\sin(\pi(j+k)) = 0$, since $j+k$ is always an integer; the corresponding term vanishes unless $j + k \equiv 0 \bmod(n)$. In the same way, $D_n(\pi(j-k)/n)$ has a numerator equal to $\sin(\pi(j-k)) = 0$, and the corresponding term vanishes unless $j \equiv k \bmod(n)$. Thus for $\omega_j = 2\pi j/n$ and $j = k$ (taking $0 < j < n/2$),

(2.2.9) $\sum_{t=0}^{n-1}\cos(\omega_j t)\cos(\omega_k t) = \sum_{t=0}^{n-1}\cos^2(\omega_j t) = \frac n2\big\{1 + D_n(2\pi j/n)\cos((n-1)\omega_j)\big\} = \frac n2,$

since $D_n(2\pi j/n) = \sin(2\pi j)/(n\sin(2\pi j/n)) = 0$. Hence, for the Fourier frequencies $\omega_j = 2\pi j/n$ with $0 < j, k < n/2$,

(2.2.10) $\sum_{t=0}^{n-1}\cos(\omega_j t)\cos(\omega_k t) = \begin{cases}n/2, & j = k\\ 0, & j \ne k.\end{cases}$

It follows that

(2.2.11) $a_{jk} = \begin{cases}n/2, & j = k\\ 0, & j \ne k.\end{cases}$

Similarly,

(2.2.12) $b_{jk} = 0\ \text{for all}\ j, k,$

and

(2.2.13) $c_{jk} = \begin{cases}n/2, & j = k\\ 0, & j \ne k.\end{cases}$
Substituting (2.2.11)-(2.2.13) into (2.2.8), the system decouples, and

$\hat A_j = \frac2n\sum_{t=0}^{n-1}(X_t - \hat\mu)\cos(\omega_j t),\quad 0 < j \le n-1,$

and

$\hat B_j = \frac2n\sum_{t=0}^{n-1}(X_t - \hat\mu)\sin(\omega_j t),\quad 0 < j \le n-1,$

with

$\hat\mu = \frac1n\sum_{t=0}^{n-1}X_t = \bar X,$

which are, as before, the quick and dirty estimators; at the Fourier frequencies they are exact least squares solutions.
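The orthogonality relations (2.2.11)-(2.2.13) that produce this decoupling are easy to verify numerically (a quick check of ours):

```python
# Numerical check (ours): at the Fourier frequencies w_j = 2*pi*j/n the cross
# sums vanish and the diagonal sums equal n/2, which decouples the normal
# equations.
import numpy as np

n = 64
t = np.arange(n)
wj, wk = 2 * np.pi * 5 / n, 2 * np.pi * 9 / n

a_jk = np.sum(np.cos(wj * t) * np.cos(wk * t))   # off-diagonal a: should be 0
a_jj = np.sum(np.cos(wj * t) ** 2)               # diagonal a: should be n/2
b_jj = np.sum(np.sin(wj * t) * np.cos(wj * t))   # any b entry: should be 0
```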
Chapter 3
Fourier Transforms and Harmonic Analysis
With the basic theory of frequency domain models now in hand, it is important to develop diagnostic tools to structure these models. One of the most powerful of these diagnostic tools is harmonic analysis, which may be intuitively thought of as the decomposition of a time series into its sinusoidal components. Roughly speaking, our general frequency domain model (2.2.1) suggests that we conceive of a time series as the superposition of sinusoidal components. The determination of which of these frequency components, $\omega_j$, are the critical ones is the point of harmonic analysis, and the basic mathematical tool for doing this is the Fourier transform. We begin by considering complex sinusoids. Again we will be assuming throughout the next several sections that $\omega_j = 2\pi j/n$, as we did in the last section.
Complex Sinusoids
Recall that $e^{i\theta} = \cos(\theta) + i\sin(\theta)$, the so-called complex sinusoid. From this it is easy to see that $e^{-i\theta} = \cos(-\theta) + i\sin(-\theta) = \cos(\theta) - i\sin(\theta)$. From these it is easy to derive the fundamental identities

(3.1.1) $\cos(\theta) = \frac{e^{i\theta} + e^{-i\theta}}{2}$

and

(3.1.2) $\sin(\theta) = \frac{e^{i\theta} - e^{-i\theta}}{2i}.$
Consider now sums of products of complex sinusoids at the sampled frequencies:

$\sum_{t=0}^{n-1}e^{i\omega_j t}\,e^{-i\omega_k t} = \sum_{t=0}^{n-1}e^{i(\omega_j-\omega_k)t} = \sum_{t=0}^{n-1}\cos\big((\omega_j-\omega_k)t\big) + i\sum_{t=0}^{n-1}\sin\big((\omega_j-\omega_k)t\big).$

By the earlier geometric series computation,

$\sum_{t=0}^{n-1}e^{i(\omega_j-\omega_k)t} = \Big\{\cos\Big(\frac{(n-1)(\omega_j-\omega_k)}{2}\Big) + i\sin\Big(\frac{(n-1)(\omega_j-\omega_k)}{2}\Big)\Big\}\,nD_n\Big(\frac{\omega_j-\omega_k}{2}\Big).$

But $\omega_j - \omega_k = 2\pi(j-k)/n$, so that

$\sin\Big(\frac{n(\omega_j-\omega_k)}{2}\Big) = \sin\big(\pi(j-k)\big) = 0$

unless $\omega_j - \omega_k \equiv 0 \bmod(2\pi)$, in which case the ratio defining $D_n$ is of the form $0/0$ and must be evaluated directly. Since the numerator of $nD_n\big((\omega_j-\omega_k)/2\big)$ is $0$ while the denominator is not, the whole sum vanishes when $j \ne k \bmod(n)$. Now if $\omega_j - \omega_k \equiv 0 \bmod(2\pi)$, then

$\sum_{t=0}^{n-1}e^{i(\omega_j-\omega_k)t} = \sum_{t=0}^{n-1}1 = n.$

Thus we have

(3.1.3) $\sum_{t=0}^{n-1}e^{i(\omega_j-\omega_k)t} = \begin{cases}n, & j = k \bmod(n)\\ 0, & \text{otherwise.}\end{cases}$
We now define

(3.1.4) $J_j = \frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega_j t},\quad j = 0, 1, \ldots, n-1.$
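The definition (3.1.4) can be sketched directly and compared with a library FFT (our check; note that numpy's forward `fft` omits the $1/n$ factor used here):

```python
# Sketch (ours) of the DFT J_j = (1/n) * sum_t X_t e^{-i w_j t}, checked
# against numpy.fft.fft, whose forward transform has no 1/n normalization.
import numpy as np

def dft(x):
    n = len(x)
    t = np.arange(n)
    jj = np.arange(n).reshape(-1, 1)
    W = np.exp(-2j * np.pi * jj * t / n)   # W[j, t] = e^{-i w_j t}
    return (W @ x) / n

rng = np.random.default_rng(3)
x = rng.standard_normal(16)
J = dft(x)
```

For real data the coefficients also satisfy the symmetry $J_{n-j} = J_j^*$.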
The set of possibly complex numbers $J_0, \ldots, J_{n-1}$ is the discrete Fourier transform (DFT) of $X_0, \ldots, X_{n-1}$. To give a bit of intuition to our interpretation of the DFT, suppose $X_t$ and $Y_t$ are two time series, each with 0 mean. Then the expression

$\widehat{\operatorname{cov}}(X, Y) = \frac1n\sum_{t=0}^{n-1}X_t\,Y_t^*$

is the sample covariance of the two time series. Now if we take $Y_t^* = e^{-i\omega_j t} = \cos(\omega_j t) - i\sin(\omega_j t)$, the fundamental complex sinusoid of frequency $\omega_j$, then $J_j$ measures the degree of "correlation" of $X_t$ with this fundamental sinusoid of frequency $\omega_j$. Of course, this is not a true correlation in the strict statistical sense, since $J_j$ may be complex valued. However, if we look at the magnitude of $J_j$, i.e. $|J_j|$, we do indeed get something very close to a correlation. We shall examine this in more detail later. Let us notice now, however, that $\omega_{n-j} = 2\pi(n-j)/n = 2\pi - 2\pi j/n$. Thus $\exp(-i\omega_{n-j}t) = \exp(-2\pi it)\exp(2\pi ijt/n)$. Because $2\pi t$ is an integer multiple of $2\pi$, we have $\exp(-2\pi it) = 1$ and $\exp(-i\omega_{n-j}t) = \exp(i\omega_j t)$. Thus

(3.1.5) $J_{n-j} = \frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega_{n-j}t} = \frac1n\sum_{t=0}^{n-1}X_t\,e^{i\omega_j t}.$

If $X_t$ is real-valued, then

(3.1.6) $J_{n-j} = \Big(\frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega_j t}\Big)^* = J_j^*.$

This is called the Hermitian or symmetry property. Now let $X_t = Y_t + Z_t$. Then

$J_{X,j} = \frac1n\sum_{t=0}^{n-1}(Y_t + Z_t)\,e^{-i\omega_j t} = \frac1n\sum_{t=0}^{n-1}Y_t\,e^{-i\omega_j t} + \frac1n\sum_{t=0}^{n-1}Z_t\,e^{-i\omega_j t} = J_{Y,j} + J_{Z,j},$

so that the DFT is linear.

The Inversion Formula. We claim

(3.1.7) $X_t = \sum_{j=0}^{n-1}J_j\,e^{i\omega_j t},\quad\text{where}\quad J_j = \frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega_j t}.$

We consider then

(3.1.8) $\sum_{j=0}^{n-1}J_j\,e^{i\omega_j t} = \sum_{j=0}^{n-1}\Big(\frac1n\sum_{u=0}^{n-1}X_u\,e^{-i\omega_j u}\Big)e^{i\omega_j t} = \frac1n\sum_{u=0}^{n-1}X_u\sum_{j=0}^{n-1}e^{i\omega_j(t-u)}.$

Now, by the same argument as (3.1.3) with the roles of $t$ and $j$ interchanged,

(3.1.9) $\sum_{j=0}^{n-1}e^{i\omega_j(t-u)} = \begin{cases}n, & t = u \bmod(n)\\ 0, & t \ne u \bmod(n),\end{cases}$

so that $\sum_{j=0}^{n-1}J_j\,e^{i\omega_j t} = \frac1n\,nX_t = X_t$. Thus $X_0, \ldots, X_{n-1}$ and $J_0, \ldots, J_{n-1}$ are Fourier transform pairs.
An interesting connection may be made with model (2.2.1). Choose this model with $\omega_j = 2\pi j/n$ and the quick and dirty solutions for the $A_j$ and the $B_j$. In particular,

(3.1.10) $\hat A_j = \frac2n\sum_{t=0}^{n-1}X_t\cos(\omega_j t),\quad 0 < j \le n-1,$

and

(3.1.11) $\hat B_j = \frac2n\sum_{t=0}^{n-1}X_t\sin(\omega_j t),\quad 0 < j \le n-1.$

Thus we have

(3.1.12) $J_j = \frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega_j t} = \frac1n\sum_{t=0}^{n-1}X_t\cos(\omega_j t) - \frac in\sum_{t=0}^{n-1}X_t\sin(\omega_j t) = \frac12\hat A_j - \frac i2\hat B_j.$

Consider

(3.1.13) $|J_j|^2 = J_jJ_j^* = \frac14\big(\hat A_j - i\hat B_j\big)\big(\hat A_j + i\hat B_j\big) = \frac14\big(\hat A_j^2 + \hat B_j^2\big) = \frac14\hat R_j^2.$

Thus the squared modulus of the Fourier transform is related to the fundamental model. Recall from before that we had defined $R$ by the expression

(3.1.14) $R\cos(\omega t + \phi) = A\cos(\omega t) + B\sin(\omega t).$

Thus the quick and dirty solution to (2.2.1) can be computed in terms of the real and imaginary components of the Fourier transform of the time series, i.e. $\hat A_j = 2\,\operatorname{real}(J_j)$ and $\hat B_j = -2\,\operatorname{imaginary}(J_j)$.

The use of $J_j$ as a diagnostic tool is to be explored a bit further. Basically we gave a "correlation"-type interpretation to $J_j$, and $4J_jJ_j^* = \hat R_j^2$ is the amplitude squared of the frequency component at $\omega_j$. By plotting $\hat R_j^2$ against $\omega_j$, we can get an idea of the power of the frequency component at $\omega_j$.
Fourier Transforms for a Continuous Range of Frequencies
A natural question concerns the extension of the concept of Fourier transforms to a continuous range of frequencies. Consider

(3.1.15) $J(\omega) = \frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega t},\quad -\pi \le \omega \le \pi.$

First observe that $\int_{-\pi}^{\pi}e^{i\omega n}\,d\omega = 2\pi$ when $n = 0$. However, if $n \ne 0$ is an integer,

$\int_{-\pi}^{\pi}\cos(\omega n)\,d\omega = \frac{\sin(\pi n) - \sin(-\pi n)}{n} = \frac{0 - 0}{n} = 0.$

Also

$\int_{-\pi}^{\pi}\sin(\omega n)\,d\omega = -\frac{\cos(\pi n) - \cos(-\pi n)}{n} = 0.$

Thus

(3.1.16) $\int_{-\pi}^{\pi}e^{i\omega n}\,d\omega = \begin{cases}2\pi, & n = 0\\ 0, & n \ne 0 \text{ and an integer.}\end{cases}$
This result is analogous to the orthogonality result in the discrete case. We may derive

(3.1.17) $X_t = \frac{n}{2\pi}\int_{-\pi}^{\pi}J(\omega)\,e^{i\omega t}\,d\omega,\quad 0 \le t < n,$

so that $J(\omega)$ and $X_t$ are Fourier transform pairs as before. To see this, consider

$\frac{n}{2\pi}\int_{-\pi}^{\pi}J(\omega)\,e^{i\omega t}\,d\omega = \frac{n}{2\pi}\int_{-\pi}^{\pi}\Big(\frac1n\sum_{u=0}^{n-1}X_u\,e^{-i\omega u}\Big)e^{i\omega t}\,d\omega$

by definition. Re-ordering,

$\frac{n}{2\pi}\int_{-\pi}^{\pi}J(\omega)\,e^{i\omega t}\,d\omega = \frac{1}{2\pi}\sum_{u=0}^{n-1}X_u\int_{-\pi}^{\pi}e^{i\omega(t-u)}\,d\omega.$

But by (3.1.16),

$\int_{-\pi}^{\pi}e^{i\omega(t-u)}\,d\omega = \begin{cases}2\pi, & t = u\\ 0, & t \ne u,\end{cases}$

so that $\frac{n}{2\pi}\int_{-\pi}^{\pi}J(\omega)\,e^{i\omega t}\,d\omega = \frac{1}{2\pi}\,X_t\,2\pi = X_t$. Notice also that

(3.1.18) $J(-\omega) = \frac1n\sum_{t=0}^{n-1}X_t\,e^{i\omega t} = \Big(\frac1n\sum_{t=0}^{n-1}X_t^*\,e^{-i\omega t}\Big)^*,$

so that for a real-valued series $J(-\omega) = J(\omega)^*$.
Parseval's Relation
Now let us consider

$\sum_{t=0}^{n-1}|X_t|^2 = \sum_{t=0}^{n-1}X_tX_t^* = \sum_{t=0}^{n-1}\Big(\sum_{j=0}^{n-1}J_j\,e^{i\omega_j t}\Big)\Big(\sum_{k=0}^{n-1}J_k^*\,e^{-i\omega_k t}\Big) = \sum_{j=0}^{n-1}\sum_{k=0}^{n-1}J_jJ_k^*\sum_{t=0}^{n-1}e^{i(\omega_j-\omega_k)t}.$

By (3.1.3),

$\sum_{t=0}^{n-1}e^{i(\omega_j-\omega_k)t} = \begin{cases}n, & j = k \bmod(n)\\ 0, & j \ne k \bmod(n),\end{cases}$

so that

(3.1.19) $\sum_{t=0}^{n-1}|X_t|^2 = n\sum_{j=0}^{n-1}J_jJ_j^* = n\sum_{j=0}^{n-1}|J_j|^2.$

This equation is Parseval's relation. There is also a continuous-frequency analogue. Consider

$\int_{-\pi}^{\pi}|J(\omega)|^2\,d\omega = \int_{-\pi}^{\pi}\Big(\frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega t}\Big)\Big(\frac1n\sum_{u=0}^{n-1}X_u^*\,e^{i\omega u}\Big)d\omega = \frac{1}{n^2}\sum_{t=0}^{n-1}\sum_{u=0}^{n-1}X_tX_u^*\int_{-\pi}^{\pi}e^{i\omega(u-t)}\,d\omega.$

Using (3.1.16), the inner integral is $2\pi$ when $t = u$ and $0$ otherwise, so

(3.1.20) $\int_{-\pi}^{\pi}|J(\omega)|^2\,d\omega = \frac{2\pi}{n^2}\sum_{t=0}^{n-1}X_tX_t^* = \frac{2\pi}{n^2}\sum_{t=0}^{n-1}|X_t|^2.$

This is also called Parseval's relation. We shall see later that there is yet another form of this relation in continuous time. Basically this relation says that the sum of squares in the time domain is equal (up to a constant multiple) to the sum of squares in the frequency domain. One important implication is that the least squares solution for a fitting problem may be done in either the frequency or the time domain, depending on which is more convenient.
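The discrete form of Parseval's relation is easy to confirm numerically (our check, adjusting for numpy's unnormalized forward transform):

```python
# Numerical check (ours) of discrete Parseval (3.1.19):
# sum_t |X_t|^2 = n * sum_j |J_j|^2, with J_j = fft(x)[j] / n.
import numpy as np

n = 128
rng = np.random.default_rng(4)
x = rng.standard_normal(n)
J = np.fft.fft(x) / n          # J_j with the 1/n normalization of the text
lhs = np.sum(x ** 2)           # sum of squares in the time domain
rhs = n * np.sum(np.abs(J) ** 2)   # sum of squares in the frequency domain
```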
Some Examples of Fourier Transforms
In the following discussion we calculate some specific Fourier transforms. First recall that we had
earlier shown
$\sum_{t=0}^{n-1}\cos(\omega t) = \cos\Big(\frac{(n-1)\omega}{2}\Big)\,\frac{\sin(n\omega/2)}{\sin(\omega/2)}.$

In a similar way,

(3.1.21) $\sum_{t=0}^{n-1}e^{i\omega t} = e^{i(n-1)\omega/2}\,\frac{\sin(n\omega/2)}{\sin(\omega/2)}.$

We may use (3.1.21) to calculate the Fourier transform in a number of interesting cases.

Cosine Wave: First, let

(3.1.22) $X_t = R\cos(\omega_0 t + \phi) = R\Big\{\frac{e^{i(\omega_0 t+\phi)} + e^{-i(\omega_0 t+\phi)}}{2}\Big\}.$
Then

$J(\omega) = \frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega t} = \frac Rn\sum_{t=0}^{n-1}\cos(\omega_0 t + \phi)\,e^{-i\omega t}$

$= \frac{R}{2n}\,e^{i\phi}\sum_{t=0}^{n-1}e^{-i(\omega-\omega_0)t} + \frac{R}{2n}\,e^{-i\phi}\sum_{t=0}^{n-1}e^{-i(\omega+\omega_0)t}.$

These are each geometric series and may, as before, be written in terms of the Dirichlet kernel. Thus we have

$J(\omega) = \frac{R}{2n}\,e^{i\phi}\,e^{-i(n-1)(\omega-\omega_0)/2}\,\frac{\sin\big(n(\omega-\omega_0)/2\big)}{\sin\big((\omega-\omega_0)/2\big)} + \frac{R}{2n}\,e^{-i\phi}\,e^{-i(n-1)(\omega+\omega_0)/2}\,\frac{\sin\big(n(\omega+\omega_0)/2\big)}{\sin\big((\omega+\omega_0)/2\big)},$

or finally,

(3.1.23) $J(\omega) = \frac R2\Big\{e^{i\phi}\,e^{-i(n-1)(\omega-\omega_0)/2}\,D_n\Big(\frac{\omega-\omega_0}{2}\Big) + e^{-i\phi}\,e^{-i(n-1)(\omega+\omega_0)/2}\,D_n\Big(\frac{\omega+\omega_0}{2}\Big)\Big\}.$

This illustrates the leakage phenomenon. The leakage phenomenon refers to the fact that when one takes a Fourier transform of a single sinusoid, i.e. all energy concentrated at a single frequency, the Fourier transform suggests that energy has leaked out to neighboring frequencies.
A Step Function: Let

(3.1.24) $X_t = \begin{cases}1, & 0 \le t < m\\ 0, & m \le t < n.\end{cases}$

In this case

$J(\omega) = \frac1n\sum_{t=0}^{m-1}e^{-i\omega t} = \frac1n\,e^{-i(m-1)\omega/2}\,\frac{\sin(m\omega/2)}{\sin(\omega/2)},$

so that

$R^2(\omega) = J(\omega)J(\omega)^* = \frac{1}{n^2}\,\frac{\sin^2(m\omega/2)}{\sin^2(\omega/2)} = \Big(\frac mn\Big)^2\big|D_m(\omega/2)\big|^2.$
A Linear Trend: Let

(3.1.25) $X_t = t - \frac{n-1}{2},\quad t = 0, \ldots, n-1,$

a linear trend centered so that $\sum_t X_t = 0$. To compute its transform we need sums of the form $\sum_t t\,a^t$. Recall the geometric series $\sum_{t=0}^{n}a^t = \frac{1-a^{n+1}}{1-a}$. Differentiating with respect to $a$,

$\sum_{t=0}^{n}t\,a^{t-1} = \frac{d}{da}\Big(\frac{1-a^{n+1}}{1-a}\Big) = \frac{-(n+1)(1-a)a^{n} + (1-a^{n+1})}{(1-a)^2}.$

Finally, replacing $n$ by $n-1$,

(3.1.29) $\sum_{t=0}^{n-1}t\,a^{t-1} = \frac{-n(1-a)a^{n-1} + (1-a^{n})}{(1-a)^2}.$

Thus, with $a = e^{-i\omega_j}$ for $j \ne 0$, so that $a^n = e^{-2\pi ij} = 1$,

$\sum_{t=0}^{n-1}t\,e^{-i\omega_j t} = a\sum_{t=0}^{n-1}t\,a^{t-1} = \frac{-n(1-a)a^{n}}{(1-a)^2} = \frac{-n}{1-e^{-i\omega_j}}.$

Since $\sum_{t=0}^{n-1}e^{-i\omega_j t} = 0$ for $j \ne 0$, the centering term contributes nothing, and

$J_j = \frac1n\sum_{t=0}^{n-1}\Big(t - \frac{n-1}{2}\Big)e^{-i\omega_j t} = \frac{-1}{1-e^{-i\omega_j}} = \frac{-e^{i\omega_j/2}}{2i\sin(\omega_j/2)},\quad j \ne 0,$

while $J_0 = 0$. In particular, $|J_j|^2 = 1/\big(4\sin^2(\omega_j/2)\big)$: a linear trend places most of its energy at the lowest frequencies.
Symmetry: Let

(3.1.33) $X_t = X_{n-t},\quad t = 0, \ldots, n,$

a symmetric series of $n+1$ points, and take $\omega_j = 2\pi j/(n+1)$, so that $e^{i\omega_j(n+1)} = e^{2\pi ij} = 1$. Consider

$J_j = \frac{1}{n+1}\sum_{t=0}^{n}X_t\,e^{-i\omega_j t}.$

Recall that $X_t = X_{n-t}$ and substitute $u = n - t$, so that $t = n - u$. We thus obtain

$J_j = \frac{1}{n+1}\sum_{u=0}^{n}X_u\,e^{-i\omega_j(n-u)} = e^{-i\omega_j n}\cdot\frac{1}{n+1}\sum_{u=0}^{n}X_u\,e^{i\omega_j u}.$

But $e^{-i\omega_j n} = e^{-i\omega_j(n+1)}e^{i\omega_j} = e^{i\omega_j}$, so

(3.1.34) $J_j = \frac{1}{n+1}\sum_{u=0}^{n}X_u\,e^{i\omega_j(u+1)}.$

If $X_t$ is a real-valued time series, we may take the complex conjugate of (3.1.34) to obtain

(3.1.35) $J_j^* = \frac{1}{n+1}\sum_{u=0}^{n}X_u\big(e^{i\omega_j(u+1)}\big)^* = \frac{1}{n+1}\sum_{u=0}^{n}X_u\,e^{-i\omega_j(u+1)} = e^{-i\omega_j}J_j.$

Thus $e^{-i\omega_j/2}J_j$ equals its own conjugate, i.e. is real: a symmetric series has a real transform apart from a known phase factor.
Roughness Measures:

Let us finally consider

(3.1.36) $\sum_{t=0}^{n-1}\big(X_t - X_{t-1}\big)^2$

(with the index $t-1$ interpreted modulo $n$) as a measure of roughness of the process. This is essentially the variation of the process. Now, by the inversion formula,

$X_t - X_{t-1} = \sum_{j=0}^{n-1}J_j\big(e^{i\omega_j t} - e^{i\omega_j(t-1)}\big) = \sum_{j=0}^{n-1}J_j\,e^{i\omega_j t}\big(1 - e^{-i\omega_j}\big),$

so that

(3.1.37) $\sum_{t=0}^{n-1}\big(X_t - X_{t-1}\big)^2 = \sum_{j=0}^{n-1}\sum_{k=0}^{n-1}J_jJ_k^*\big(1 - e^{-i\omega_j}\big)\big(1 - e^{i\omega_k}\big)\sum_{t=0}^{n-1}e^{i(\omega_j-\omega_k)t}.$

By the orthogonality relation (3.1.3),

(3.1.38) $\sum_{t=0}^{n-1}\big(X_t - X_{t-1}\big)^2 = n\sum_{j=0}^{n-1}|J_j|^2\,\big|1 - e^{-i\omega_j}\big|^2.$

But

$\big|1 - e^{-i\omega_j}\big|^2 = \big(1 - e^{-i\omega_j}\big)\big(1 - e^{i\omega_j}\big) = 2 - \big(e^{i\omega_j} + e^{-i\omega_j}\big) = 2\big(1 - \cos(\omega_j)\big).$

Thus

(3.1.39) $\sum_{t=0}^{n-1}\big(X_t - X_{t-1}\big)^2 = 2n\sum_{j=0}^{n-1}|J_j|^2\big(1 - \cos(\omega_j)\big).$

Since $2(1 - \cos(\omega_j)) = 4\sin^2(\omega_j/2)$,

(3.1.40) $\sum_{t=0}^{n-1}\big(X_t - X_{t-1}\big)^2 = 4n\sum_{j=0}^{n-1}|J_j|^2\sin^2(\omega_j/2).$

Normalizing by Parseval's relation, $\sum_t X_t^2 = n\sum_j|J_j|^2$,

$\frac{\sum_{t=0}^{n-1}\big(X_t - X_{t-1}\big)^2}{\sum_{t=0}^{n-1}X_t^2} = \frac{4\sum_{j=0}^{n-1}|J_j|^2\sin^2(\omega_j/2)}{\sum_{j=0}^{n-1}|J_j|^2}.$

This equation may be used as a measure of frequency content. If low frequencies predominate, $X_t$ is close to $X_{t-1}$ and the ratio is small. If high frequencies are plentiful, the ratio will be large.
Chapter 4
Fast Fourier Transforms
The discrete Fourier transform may be written

$nJ_j = \sum_{t=0}^{n-1}X_t\,e^{-i\omega_j t},\quad j = 0, 1, \ldots, n-1,$

where, as before, $\omega_j = 2\pi(j/n)$. In this computation there are $n-1$ (complex) multiplies for each $j$ (one for each $t = 1, \ldots, n-1$; $t = 0$ implies $e^0 = 1$, so that no multiply is involved). There are also $n-1$ additions (adding $n$ terms means $n-1$ additions). Since there are a total of $n$ equations, to compute the full Fourier transform there are $n(n-1)$ complex additions and $n(n-1)$ complex multiplies.
Write

(4.1.1) $t = t_1n_2 + t_2,\quad 0 \le t_1 < n_1,\ 0 \le t_2 < n_2,$

where $n = n_1n_2$, to generate a rectangular array of the data: row $t_1$ contains $X_{t_1n_2}, X_{t_1n_2+1}, \ldots, X_{t_1n_2+n_2-1}$, e.g. $n = 55 = 5\times 11$ gives 5 rows of 11 observations. Similarly write

(4.1.2) $j = j_2n_1 + j_1,\quad 0 \le j_1 < n_1,\ 0 \le j_2 < n_2.$

Letting $Y(t_1, t_2) = X_{t_1n_2+t_2}$ and $w = e^{-2\pi i/n}$, we have

(4.1.3) $nJ_j = \sum_{t_2=0}^{n_2-1}\sum_{t_1=0}^{n_1-1}Y(t_1, t_2)\,w^{j(t_1n_2+t_2)} = \sum_{t_2=0}^{n_2-1}\sum_{t_1=0}^{n_1-1}Y(t_1, t_2)\,w^{jt_1n_2}\,w^{jt_2}.$

But $w^{jt_1n_2} = e^{-2\pi i\,jt_1n_2/n} = e^{-2\pi i\,jt_1/n_1}$, and, writing $j = j_2n_1 + j_1$,

(4.1.4) $w^{jt_1n_2} = e^{-2\pi i\,j_2t_1}\,e^{-2\pi i\,j_1t_1/n_1} = e^{-2\pi i\,j_1t_1/n_1},$

since $e^{-2\pi i\,j_2t_1} = 1$. Thus the inner sum depends on $j$ only through $j_1$. Reversing the order of summation in (4.1.3),

(4.1.5) $nJ_j = \sum_{t_2=0}^{n_2-1}w^{jt_2}\sum_{t_1=0}^{n_1-1}Y(t_1, t_2)\,e^{-2\pi i\,j_1t_1/n_1}.$

Define

(4.1.6) $Z(j_1, t_2) = \sum_{t_1=0}^{n_1-1}Y(t_1, t_2)\,e^{-2\pi i\,j_1t_1/n_1}.$

This is a column-by-column transform of the original data; in this transform there are $n_1-1$ multiplies and $n_1-1$ adds. Now

(4.1.7) $nJ_j = \sum_{t_2=0}^{n_2-1}\big\{w^{jt_2}\,Z(j_1, t_2)\big\},$

where $w^{jt_2}$ is the so-called twiddle factor.

For a fixed $j$ there are $n_1-1$ adds in equation (4.1.6) and $n_2-1$ adds in equation (4.1.7). Thus there are $n_1+n_2-2$ adds for each $j$, and a total of $n(n_1+n_2-2) = n_1n_2(n_1+n_2-2)$ adds. For multiplies, we have $n_1-1$ multiplies in (4.1.6). In addition, in (4.1.7) we have $n_2-1$ multiplies for a fixed $j$, so that we have $n_1+n_2-2$ for each fixed $j$. Since we have $n$ values of $j$, we have $n(n_1+n_2-2)$ multiplies. The multiplication by the twiddle factor requires at most $n$ additional multiplies. Thus the total number of multiplies for the factored form is

$n(n_1+n_2-2) + n = n_1n_2(n_1+n_2-1).$
Example: Let $n = 50 = 5\times 10$ and consider the full transform DFT. Here $50(50-1) = 50\times 49 = 2450$, so that we have 2450 adds and 2450 multiplies. For the FFT, we have $50(15-2) = 50\times 13 = 650$ adds and $50(15-1) = 50\times 14 = 700$ multiplies. This gives a FFT/DFT ratio of $650/2450 = .2653$ for adds and $700/2450 = .2857$ for multiplies.
Example: Consider this time $n = 49 = 7\times 7$ in the full DFT. Here $49(49-1) = 49\times 48 = 2352$. Thus we have 2352 adds and 2352 multiplies. Correspondingly for the FFT, we have $49(7+7-2) = 49\times 12 = 588$ adds and $49(14-1) = 49\times 13 = 637$ multiplies. Thus the FFT/DFT ratio is $588/2352 = .25$ for adds and $637/2352 = .2708$ for multiplies. We minimize these ratios when $n$, the number of observations, is fully factored.
In any case, for $n = 2^k$ fully factored, the operation count is approximately $2nk$ adds and multiplies, so the FFT/DFT efficiency ratio is roughly $2k/n$. Thus we may fully factor the transform for maximal savings.

Example: Consider now $n = 64$, so that $k = 6$ and the FFT/DFT efficiency ratio is $12/64 = .1875$. Thus an FFT for 64 terms would take $2\times 64\times 6 = 768$ adds and multiplies. Recall for $n = 50 = 10\times 5$, we had 700 multiplies and 650 adds. One can conclude from this example that not much is lost in absolute operation count by using an FFT with excess terms (padding 50 observations out to 64), since we have essentially the same operation counts for an FFT of 64 terms and a factored transform of only 50 terms.
Example: Consider an even more extreme example with $n = 1024$, so that $k = 10$, making the FFT/DFT efficiency ratio $20/1024 \approx .019$, i.e. $20\times 1024 = 20{,}480$ operations compared to $1024(1023) = 1{,}047{,}552$. Suppose now we have $X_0, \ldots, X_{n-1}$ with $n < 2^k = n'$. Let $X'_j = X_j$, $j = 0, 1, \ldots, n-1$, and let $X'_j = 0$, $j = n, \ldots, n'-1$, so that we fill the data vector out with zeros. Here we will let $\omega'_j = 2\pi j/n'$. Then

$J'_j = \frac{1}{n'}\sum_{t=0}^{n'-1}X'_t\,e^{-i\omega'_j t} = \frac{1}{n'}\sum_{t=0}^{n-1}X_t\,e^{-i\omega'_j t} = \frac{n}{n'}\cdot\frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega'_j t} = \frac{n}{n'}\,J(\omega'_j).$

Thus $J(\omega'_j) = \frac{n'}{n}J'_j$. The FFT of the extended series is the Fourier transform of the original series evaluated at $\omega'_j$ instead of $\omega_j$. Of course, $J_j = J(\omega_j) \ne J(\omega'_j)$ in general, but since $n' \ge n$, $\omega'_j = 2\pi j/n' \le 2\pi j/n = \omega_j$. Thus we evaluate $J(\omega)$ on a denser grid when we fill out the FFT with zeros.

Example: (Worst case). Let $n = 1025$, so that $n' = 1024$ is too small. Extend the sample size to $n' = 2048$ with 1023 zeros filled in. Then the number of FFT multiplies for $n' = 2048 = 2^{11}$ is $2\times 11\times 2048 = 45{,}056$. The number of FFT multiplies for $n = 1024 = 2^{10}$ is $2\times 10\times 1024 = 20{,}480$. However, the number of DFT multiplies for $n = 1025$ is $1025\times 1024 = 1{,}049{,}600$, so that even in the worst-case scenario the FFT is dramatically better than the DFT. Consider for a moment the corresponding evaluation grids. For the $n' = 2048$ FFT case, $\omega'_j = 2\pi j/2048$, $j = 0, \ldots, 2047$, so that the evaluation points are $0, .00307, .00614, .00921, .01227, \ldots$. The evaluation grid for the $n = 1025$ DFT case is computed based on $\omega_j = 2\pi j/1025$, $j = 0, \ldots, 1024$. This gives a grid of evaluation points $0, .00613, .01226, \ldots$, which is roughly twice the spacing of the more desirable FFT grid.
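The zero-padding argument can be sketched numerically (our check): the padded FFT equals $n/n'$ times the continuous-frequency transform $J(\omega)$ sampled on the denser grid.

```python
# Sketch (ours): padding n = 50 points out to n' = 64 evaluates the same
# J(w) = (1/n) * sum_t X_t e^{-iwt} on the denser grid w'_j = 2*pi*j/64.
import numpy as np

def J_cont(x, w):
    t = np.arange(len(x))
    return np.sum(x * np.exp(-1j * w * t)) / len(x)

rng = np.random.default_rng(5)
x = rng.standard_normal(50)
npad = 64
xp = np.concatenate([x, np.zeros(npad - len(x))])
Jp = np.fft.fft(xp) / npad               # J'_j of the padded series
w1 = 2 * np.pi * 1 / npad                # first frequency of the denser grid
```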
Chapter 5
Leakage and Aliasing
As we have already seen, harmonic analysis can lead to substantial insight into the frequency structure
of a time series. Harmonic methods based on Fourier transforms are not, however, universal panaceas for
understanding the frequency content of a time series. Problems exist with so-called leakage and aliasing and
with lack of controllability of the variance of raw Fourier transforms. In this chapter we shall look into these
problems in somewhat more detail.
5.1 Leakage
Leakage refers to the fact that if we take the Fourier transform of a pure sinusoidal function, we will
get non-zero amplitude for nearby frequencies. To see this, let
$X_t = \exp(i\omega_0 t),$

the complex sinusoid with unit amplitude and zero phase. Then

(5.1.1) $J(\omega) = \frac1n\sum_{t=0}^{n-1}X_t\,e^{-i\omega t},$

so that

(5.1.2) $J(\omega) = \frac1n\sum_{t=0}^{n-1}e^{i(\omega_0-\omega)t}.$

Summing the geometric series as before,

(5.1.3) $J(\omega) = e^{i(n-1)(\omega_0-\omega)/2}\,\frac{\sin\big(n(\omega_0-\omega)/2\big)}{n\sin\big((\omega_0-\omega)/2\big)},$

so that

(5.1.4) $R^2(\omega) = J(\omega)J(\omega)^* = \Big\{\frac{\sin\big(n(\omega_0-\omega)/2\big)}{n\sin\big((\omega_0-\omega)/2\big)}\Big\}^2,$

which is not zero for $\omega \ne \omega_0$: energy appears to have leaked from $\omega_0$ into neighboring frequencies.
To study the leakage, remove the phase factor by defining $K(\omega)$ through

(5.1.5) $J(\omega) = e^{i(n-1)(\omega_0-\omega)/2}\,K(\omega),$

or equivalently

(5.1.6) $K(\omega) = \frac{\sin\big(n(\omega_0-\omega)/2\big)}{n\sin\big((\omega_0-\omega)/2\big)},$

the real-valued Dirichlet kernel centered at $\omega_0$. Consider shifting the argument by one grid step $2\pi/n$:

(5.1.7) $K\Big(\omega - \frac{2\pi}{n}\Big) = \frac{\sin\Big(n\Big[\frac{\omega_0-\omega}{2} + \frac{\pi}{n}\Big]\Big)}{n\sin\Big(\frac{\omega_0-\omega}{2} + \frac{\pi}{n}\Big)}.$

Since $\sin(x+\pi) = -\sin(x)$, the numerator is $-\sin\big(n(\omega_0-\omega)/2\big)$, while for $n$ large the denominator is approximately $n\sin\big((\omega_0-\omega)/2\big)$, so that approximately

(5.1.9) $K\Big(\omega - \frac{2\pi}{n}\Big) \approx -K(\omega)$

in the side-lobe region, and similarly

(5.1.10) $K\Big(\omega + \frac{2\pi}{n}\Big) \approx -K(\omega).$

This suggests the weighted sum

(5.1.11) $K^\dagger(\omega) = \frac14K\Big(\omega - \frac{2\pi}{n}\Big) + \frac12K(\omega) + \frac14K\Big(\omega + \frac{2\pi}{n}\Big),$

in which the side lobes of the shifted kernels nearly cancel those of the central kernel. Notice that for $\omega \ne \omega_0$, $K^\dagger(\omega) \to 0$ for $n$ sufficiently large for the Dirichlet approximation to hold. In reality we are interested in $J(\omega)$, not $K(\omega)$, so we transform back by taking $J^\dagger$ to be defined by

(5.1.12) $K^\dagger(\omega) = e^{-i(n-1)(\omega_0-\omega)/2}\,J^\dagger(\omega),$

or equivalently

(5.1.13) $J^\dagger(\omega) = e^{i(n-1)(\omega_0-\omega)/2}\,K^\dagger(\omega).$

This weighted sum is called the Hanning window, and by it we introduce the idea that we can reduce leakage by taking weighted sums. A little algebraic manipulation will lend some further insight. Substituting the definition of $K$ in terms of $J$ into (5.1.11) and collecting the phase factors, the three shifted transforms combine, term by term, into a single weighted transform of the data:

(5.1.15) $J^\dagger(\omega) = \frac1n\sum_{t=0}^{n-1}X_t\Big\{\frac{1 - \cos\big(2\pi(t+\frac12)/n\big)}{2}\Big\}e^{-i\omega t},$

i.e. the Fourier transform of the weighted data

(5.1.16) $Y_t = X_t\Big\{\frac{1 - \cos\big(2\pi(t+\frac12)/n\big)}{2}\Big\}.$

Thus we are led to the idea of weighting the data to reduce this bias introduced by leakage. We have seen that we can do this in the frequency domain by taking the weighted sum of Fourier transforms

$K^\dagger(\omega) = \frac14K\Big(\omega - \frac{2\pi}{n}\Big) + \frac12K(\omega) + \frac14K\Big(\omega + \frac{2\pi}{n}\Big),$

or in the time domain by multiplying the data by $\big\{1 - \cos\big(2\pi(t+\frac12)/n\big)\big\}/2$. The latter is called the data window; the former is usually called the spectral window. We shall investigate windowing in much more detail later on.
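A quick numerical sketch (ours) of the effect of the Hann-type data window on leakage, using a sinusoid that falls between Fourier frequencies (the worst case for leakage):

```python
# Sketch (ours): compare |FFT| of a between-bin sinusoid with and without the
# data window {1 - cos(2*pi*(t + 1/2)/n)} / 2. Far from the true frequency the
# windowed transform should show much less leaked energy.
import numpy as np

n = 128
t = np.arange(n)
x = np.cos(2 * np.pi * 10.5 * t / n)       # frequency midway between bins 10 and 11
hann = (1 - np.cos(2 * np.pi * (t + 0.5) / n)) / 2
raw = np.abs(np.fft.fft(x)) / n
win = np.abs(np.fft.fft(x * hann)) / n
far_raw = raw[30:60].max()                 # leakage well away from bin 10.5
far_win = win[30:60].max()
```

The Dirichlet side lobes decay slowly, while the windowed side lobes decay much faster, so `far_win` is far smaller than `far_raw`.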
5.2 Convolutions
One interesting little sidelight to note concerns convolutions. Let Z be defined by
O X" Y!^" ,
^
(5.2.1)
Z
t O 0 ,1, ,n ^ 1.
"y
Then,
(5.2.2)
J' (
) O
Z e
^
!y
^
! .
J' (
) O
^ ^
J' (
) O
^
!y "y
X Y
"y
^
"
!y
!^"
e^
! .
J' (
)
(5.2.5)
X Y
^
^
"
"y
#y
e ^
# e ^
" .
(5.2.6)
X
^
"y
"
Y
^
e^
"
#y
e ^
#
J% (
) _ J& (
).
Thus the Fourier transform of a convolution of two time series is the product of the Fourier transforms of the
two time series and, of course, vice versa. The leakage phenomena can be understood more easily in terms of
convolutions. Leakage really arises because an infinite set of data, namely X^ , X^ , X , X , X , , is
multiplied by a finite function, to be precise the function
(5.2.7)
I (j)
1
O J0
j [0, n^1]
j [0 n^1].
if
if
The calculation we have just performed shows us is that we can reduce leakage bias by replacing I[,] (j) by a
more tapered window, say
(5.2.8)
H (j)
O J 1^ cos(22 (j] /n) K.
Of course, this is not the only window we could use, but it is a computationally simple one. We shall use this
smoothing window idea again in later discussions.
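The convolution property can be checked numerically (ours; with numpy's unnormalized `fft` the relation is a plain product, with no extra factor of $n$):

```python
# Check (ours) of the convolution theorem: the unnormalized DFT of a circular
# convolution is the product of the unnormalized DFTs.
import numpy as np

n = 32
rng = np.random.default_rng(6)
x = rng.standard_normal(n)
y = rng.standard_normal(n)
# circular convolution z_t = sum_u x_u * y_{(t-u) mod n}
z = np.array([np.sum(x * y[(t - np.arange(n)) % n]) for t in range(n)])
lhs = np.fft.fft(z)
rhs = np.fft.fft(x) * np.fft.fft(y)
```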
5.3 Aliasing
We have just seen that one problem in dealing with DFTs involves frequency smearing. That is, the DFT of a single-frequency signal yields a frequency spectrum which spreads some of the energy of the single-frequency signal over frequencies adjacent to that of the signal. Another interesting phenomenon involves aliasing. An alias is another name for something. In the frequency analysis of a continuous-time signal we shall see that potentially many frequencies of a signal will appear in the same place, i.e. many frequencies are labeled with the same frequency designation. This is the origin of the name alias.
Let us begin by considering a single frequency process

(5.3.1)  X_t = cos(ωt),

observed at the integers t = 0, ±1, ±2, ….

(5.3.2)  For ω′ = 2πj + ω, with j an integer, let X′_t = cos(ω′t).

Substituting for ω′,

(5.3.3)  X′_t = cos((2πj + ω)t) = cos(2πjt) cos(ωt) − sin(2πjt) sin(ωt).

Since j and t are both integers, cos(2πjt) = 1 and sin(2πjt) = 0, so that

(5.3.4)  X′_t = cos(ωt) = X_t.

By a similar computation, if ω″ = 2πj − ω, then

X″_t = cos(ω″t) = cos(2πjt − ωt) = cos(ωt) = X_t.

Thus, at the sample points, the frequencies ω, 2πj + ω and 2πj − ω, j = ±1, ±2, …, are all indistinguishable.
Graphically, frequencies are folded back upon each other. See Figures **** Because of this folding pattern
(and, of course, because Nyquist first articulated this idea), the frequency π/Δt is called the Nyquist folding
frequency. For a sampled time series, frequencies higher than the Nyquist frequency are indistinguishable from
(alias as) lower frequencies. This process is called aliasing.
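The folding can be seen numerically. In this sketch (frequencies chosen purely for illustration) a frequency above the Nyquist frequency π produces exactly the same samples as its lower-frequency alias:

```python
import numpy as np

# Two sinusoids whose frequencies are omega and 2*pi - omega produce
# identical samples at integer times: they are aliases of one another.
omega = 0.3
t = np.arange(20)                          # integer sampling times
x = np.cos(omega * t)                      # frequency below Nyquist (pi)
x_alias = np.cos((2 * np.pi - omega) * t)  # frequency above Nyquist
print(np.allclose(x, x_alias))             # True
```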
Chapter 6
Random Process Models
6.1 Introduction to Random Processes
To this point we have said nothing of the statistical structure of the process X_t, especially with respect
to the DFT of X_t. The DFT is, of course, itself a random process if X_t is a random process. The implication is
that a different realization of X_t yields a different realization of J(ω). In order to make the DFT a statistically
meaningful diagnostic tool, we must investigate the stochastic structure of X_t a bit more thoroughly. Basically,
the use of the DFT straight away on the data implies we believe X_t is essentially deterministic.
We now want to regard X_t as a stochastic process. That is to say, we let Ω be a sample space and P a
probability on that sample space satisfying the axioms given below.

Axiom 1. For A ⊂ Ω, 0 ≤ P(A) ≤ 1.

Axiom 2. For A_1, A_2, … ⊂ Ω, with the A_i disjoint, Σ_i P(A_i) = P(∪_i A_i).

Axiom 3. P(Ω) = 1.
In this section it is often convenient to regard a random process, X, as complex valued rather than real valued.
Of course, since all real-valued processes are trivially complex-valued, this is no real loss of generality. We
shall define X_t(ω) = X(t, ω) to be a random process if

X(t, ·) : Ω → ℝ, the real numbers (or ℂ, the complex numbers).

That is to say, for t fixed, X(t, ·) is a mapping of the sample space into the real line (the complex plane). For
each fixed t, X(t, ω) is an ordinary random variable. Usually we write X_t(·) for X(t, ·). It is common to
suppress the explicit dependence on ω, although, of course, it is always there implicitly.
The variable, t, is usually thought of as belonging to an index set, T, and usually is thought of as time.
As we had earlier discussed in Chapter 1,

(6.1.1)  F_{X_{t_1},…,X_{t_k}}(u_1, …, u_k) = P(X_{t_1} ≤ u_1, …, X_{t_k} ≤ u_k).

As an example, consider the process X_t = Σ_n {A_n cos(nω_0 t) + B_n sin(nω_0 t)}. Here the A_n and B_n are
independent normal random variables, written A_n, B_n ∼ N(0, σ_n²). Then

E(X_t) = Σ_n {E(A_n) cos(nω_0 t) + E(B_n) sin(nω_0 t)} = 0.

Next consider

E{X_{t_1} X_{t_2}} = E[ Σ_n {A_n cos(nω_0 t_1) + B_n sin(nω_0 t_1)} · Σ_m {A_m cos(mω_0 t_2) + B_m sin(mω_0 t_2)} ].

Expanding and using the independence and zero means of the A_n and B_n, we obtain

E{X_{t_1} X_{t_2}} = Σ_n {E(A_n²) cos(nω_0 t_1) cos(nω_0 t_2) + E(B_n²) sin(nω_0 t_1) sin(nω_0 t_2)},

so that

E{X_{t_1} X_{t_2}} = Σ_n σ_n² [cos(nω_0 t_1) cos(nω_0 t_2) + sin(nω_0 t_1) sin(nω_0 t_2)] = Σ_n σ_n² cos(nω_0 (t_1 − t_2)).
This process is thus weakly stationary. Also since a linear combination of normal random variables is normal,
the process is also normal and hence it is also strictly stationary.
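A small Monte Carlo sketch (with illustrative choices of ω_0 and the σ_n) supports the computation above: the empirical covariance depends only on the lag t_1 − t_2 and matches Σ_n σ_n² cos(nω_0 (t_1 − t_2)):

```python
import numpy as np

# Harmonic process X_t = sum_n [A_n cos(n*w0*t) + B_n sin(n*w0*t)]
# with A_n, B_n independent N(0, sigma_n^2); w0 and sigma_n are
# illustrative choices, not values from the text.
rng = np.random.default_rng(1)
w0 = 0.7
n = np.array([1, 2, 3])
sigmas = np.array([1.0, 0.5, 0.25])
reps = 100_000

A = rng.normal(0.0, sigmas, size=(reps, 3))
B = rng.normal(0.0, sigmas, size=(reps, 3))

def X(t):
    return (A * np.cos(n * w0 * t) + B * np.sin(n * w0 * t)).sum(axis=1)

lag = 3.0
emp1 = np.mean(X(2.0) * X(5.0))      # time pair (2, 5), lag 3
emp2 = np.mean(X(10.0) * X(13.0))    # time pair (10, 13), same lag
theory = np.sum(sigmas**2 * np.cos(n * w0 * lag))
print(round(emp1, 2), round(emp2, 2), round(theory, 2))
```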
In the discussion which follows, we shall assume that the reader is familiar with Riemann-Stieltjes
integration. There is a discussion of Riemann-Stieltjes integration in Appendix A.
A bivariate function γ(s, t) is non-negative definite if for any set of time points t_1, …, t_n and
complex numbers z_1, …, z_n,

(6.1.3)  Σ_{j=1}^{n} Σ_{k=1}^{n} γ(t_j, t_k) z_j z_k* ≥ 0.
If the inequality is strict, the function is said to be positive definite. In general we will use the * to indicate the
complex conjugate. That is, if z is a complex number given by a + bi, then z* = a − bi. Notice that if γ(s, t) =
E(X_s X_t*) is the covariance function of a zero mean process, then
(6.1.4)  Σ_j Σ_k γ(t_j, t_k) z_j z_k* = E{ Σ_j Σ_k X_{t_j} z_j X*_{t_k} z_k* } = E{ ( Σ_j X_{t_j} z_j )( Σ_k X_{t_k} z_k )* },

so that

(6.1.5)  Σ_j Σ_k γ(t_j, t_k) z_j z_k* = E | Σ_j X_{t_j} z_j |² ≥ 0.
Thus the covariance function of a zero-mean stochastic process is a non-negative definite function. A trivial
modification shows, in fact, that every covariance function is non-negative definite. The converse is also true,
i.e. if γ is non-negative definite, then there is a stochastic process for which it is a covariance function.
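This property is easy to check numerically. The sketch below uses the AR(1)-type covariance γ(s, t) = 0.8^{|s−t|} (an illustrative choice) and confirms that the resulting matrix has non-negative eigenvalues and non-negative quadratic forms:

```python
import numpy as np

# Covariance function gamma(s, t) = 0.8**|s - t| (an AR(1)-type
# covariance, an illustrative choice) evaluated on a grid of times.
t = np.arange(10)
gamma = 0.8 ** np.abs(t[:, None] - t[None, :])

# Non-negative definiteness: all eigenvalues >= 0 ...
eigvals = np.linalg.eigvalsh(gamma)
print(eigvals.min() >= -1e-12)

# ... and every quadratic form sum_jk gamma(t_j, t_k) z_j z_k*
# is real and non-negative, as in (6.1.5).
rng = np.random.default_rng(2)
z = rng.normal(size=10) + 1j * rng.normal(size=10)
q = z.conj() @ gamma @ z
print(q.real >= 0, abs(q.imag) < 1e-10)
```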
If we assume X_t is complex valued and stationary, then without any loss of generality, we may assume
E(X_t) = 0. Notice in this case, as we did in Chapter 1, we have

(6.1.6)  γ(s, t) = E(X_s X_t*) = E(X_{s+h} X*_{t+h}).

Now let h = −t, so that γ(s, t) = E(X_{s−t} X_0*) = γ(s − t, 0). It is customary to write γ_t = γ(t, 0). Thus we
may define a one-dimensional function γ_t to be non-negative definite if and only if γ(t, t′) = γ_{t−t′} is
non-negative definite. Notice that if γ_τ = E(X_{t+τ} X_t*), then

(6.1.7)  γ_{−τ} = E(X_{t−τ} X_t*) = E(X_t X*_{t+τ}) = γ_τ*.

Consider now the covariance matrix of X_1, …, X_n,

    Γ = [ γ_0      γ_1      γ_2    …  γ_{n−1} ]
        [ γ_1      γ_0      γ_1    …  γ_{n−2} ]
        [  ⋮                            ⋮     ]
        [ γ_{n−1}  γ_{n−2}  …          γ_0    ].

Note that the matrix is symmetric with constant elements on any diagonal. The corresponding correlation
matrix is R = Γ/γ_0.
One of the consequences of positive definiteness is that the determinant of Γ and all of its principal minors are
greater than 0. Hence

det[ γ_0  γ_1 ; γ_1  γ_0 ] = γ_0² − γ_1² > 0,  or  |γ_1| < γ_0,  or  |ρ_1| < 1.

Similarly, from the 3 × 3 covariance matrix of X_1, X_2, X_3,

det[ γ_0  γ_1  γ_2 ; γ_1  γ_0  γ_1 ; γ_2  γ_1  γ_0 ] > 0,

from which

−1 < (ρ_2 − ρ_1²)/(1 − ρ_1²) < 1,  or equivalently  |ρ_2| < 1.
Bochner's Theorem: A continuous function γ(t) is non-negative definite if and only if it can be represented by
an integral of the form

(6.2.1)  γ(t) = ∫_{−∞}^{∞} e^{iωt} dF(ω),

where F is a real-valued, non-decreasing and bounded function. The function, F, will be the analog of a
probability distribution function as we shall see shortly. Notice that (6.2.1) strongly resembles the relationship for
the inverse Fourier transform we had discussed earlier. Recall

(6.2.2)  X_t = (n/2π) ∫_{−π}^{π} e^{iωt} J(ω) dω.

The proof of Bochner's Theorem is beyond the scope of these lecture notes and will be left as an unproven
assertion. Clearly, to employ Bochner's Theorem as it might apply to a covariance function, γ, we need only
show that γ is continuous.
Lemma: If γ is continuous at τ = 0, then it is continuous at all τ.

Proof: We first consider γ evaluated at τ and τ′. Now

|γ_τ − γ_{τ′}| = |E(X_{t+τ} X_t*) − E(X_{t+τ′} X_t*)| = |E{(X_{t+τ} − X_{t+τ′}) X_t*}|.

By the Cauchy-Schwarz inequality,

|E{(X_{t+τ} − X_{t+τ′}) X_t*}| ≤ {E(|X_t|²)}^{1/2} {E(|X_{t+τ} − X_{t+τ′}|²)}^{1/2}.

But

E(|X_{t+τ} − X_{t+τ′}|²) = 2γ_0 − γ_{τ−τ′} − γ_{τ′−τ}.

Hence, as τ′ → τ, E(|X_{t+τ} − X_{t+τ′}|²) → 2γ_0 − 2γ_0 = 0 by the continuity of γ at 0, so that γ_{τ′} → γ_τ.

We are now in a position to combine Bochner's Theorem with the basic Lemma we have just shown to
obtain the basic spectral representation theorem.
Spectral Representation Theorem: If γ is the covariance function of X_t and γ is continuous at 0, then there
is a real-valued, non-decreasing bounded function F such that
(6.2.3)  γ_τ = ∫_{−∞}^{∞} e^{iωτ} dF(ω).
The function, F, is called the spectral distribution function. The proof follows from Bochner's Theorem and the
fact that γ is non-negative definite and continuous.
Notice that

γ_0 = ∫_{−∞}^{∞} dF(ω),

so that F is bounded by γ_0. If F is differentiable, then with

f(ω) = F′(ω) = dF(ω)/dω,

we may write

γ_τ = ∫_{−∞}^{∞} e^{iωτ} f(ω) dω.

The function f(ω) is called the spectral density of X_t. The set of all discontinuities of F is called the point
spectrum of X_t.
The spectral representation theorem we have just seen expresses the covariance function in terms of the
spectral distribution function or the spectral density function. Of major interest is the situation when the
spectral distribution or the spectral density can be represented in terms of the covariance function.
If ∫_{−∞}^{∞} |γ_t| dt < ∞, then

(6.2.5)  f(ω) = F′(ω) exists and f(ω) = (1/2π) ∫_{−∞}^{∞} e^{−iωt} γ_t dt.

Recall, for comparison, the discrete Fourier transform

(6.2.6)  J(ω) = (1/n) Σ_{t=0}^{n−1} e^{−iωt} X_t.
This is a rather striking analogy. Of course, we are dealing with continuous-time processes in the present
discussion. If X_t is not a continuous-time process but rather a discrete-time process, the analogies are even
more striking.
To see this, let γ_τ be the covariance function for a wide-sense stationary discrete-time process, X_t,
t = 0, ±1, ±2, …. The discrete analog of the Spectral Representation Theorem is known as the Herglotz
Lemma and is given below.

Herglotz Lemma: If γ_τ is the covariance function indicated above, then there is a real-valued, non-decreasing,
bounded function, F, defined on the interval (−π, π) such that

(6.2.7)  γ_τ = ∫_{−π}^{π} e^{iωτ} dF(ω).
Recall in the case of the discrete Fourier transform we studied earlier that

(6.2.8)  X_t = (n/2π) ∫_{−π}^{π} e^{iωt} J(ω) dω,

so that we have an even more striking analogy. As in the continuous case, F is defined up to an additive
constant. Choose F(−π) = 0, so that F(π) = γ_0, and as before we take F(ω) = F(ω+), i.e. choose F to be
continuous from the right. The discrete version of the inverse Fourier transform can then be given.
If Σ_{τ=−∞}^{∞} |γ_τ| < ∞, then

(6.2.9)  f(ω) = F′(ω) exists and f(ω) = (1/2π) Σ_{τ=−∞}^{∞} e^{−iωτ} γ_τ,  |ω| ≤ π.

Again, for comparison, recall the discrete Fourier transform

(6.2.10)  J(ω) = (1/n) Σ_{t=0}^{n−1} e^{−iωt} X_t.
To this point, we have been assuming that X_t is a complex-valued process. If, in fact, X_t is a real-valued
process, then γ_τ = γ_{−τ}. In this case

(6.2.11)  γ_τ = ∫_{−∞}^{∞} e^{iωτ} dF(ω) = ∫_{−∞}^{∞} e^{−iωτ} dF(ω) = ∫_{−∞}^{∞} [(e^{iωτ} + e^{−iωτ})/2] dF(ω).

Thus

(6.2.12)  γ_τ = ∫_{−∞}^{∞} cos(ωτ) dF(ω)

and

(6.2.13)  ∫_{−∞}^{∞} [(e^{iωτ} − e^{−iωτ})/2] dF(ω) = i ∫_{−∞}^{∞} sin(ωτ) dF(ω) = 0.

Similarly,

(6.2.14)  f(ω) = (1/2π) ∫_{−∞}^{∞} e^{−iωt} γ_t dt = (1/2π) ∫_{−∞}^{∞} cos(ωt) γ_t dt,

and, since cos(−ωt) = cos(ωt), f(ω) = f(−ω).

Thus for a real-valued stochastic process, the spectral density is symmetric about the 0 frequency. A practical
implication of this fact is that we really need not plot the spectral density function over the whole range from
−∞ to ∞; the range from 0 to ∞ is sufficient. If X_t is real, so that F is symmetric about 0, then we may define
G(ω) = 2F(ω) − γ_0. In this case

(6.2.15)  γ_τ = ∫_{−∞}^{∞} cos(ωτ) dF(ω) = 2 ∫_{0}^{∞} cos(ωτ) dF(ω),

or

(6.2.18)  γ_τ = ∫_{0}^{∞} cos(ωτ) dG(ω).
That is to say, f(ω) is the intensity of the fundamental sinusoid at frequency ω in the covariance, γ_τ, much the
same way that |J(ω)| = R(ω) was the intensity of the fundamental sinusoid at frequency ω in the data X_t. Of
course, J(ω) is a random process while f(ω) is a deterministic function; as we indicated earlier, this is
the reason for exploiting Bochner's Theorem.
We now give a fundamental theorem which will illustrate the relationship between the Fourier
transform of X_t and the Fourier transform of γ_τ. Clearly we want to have the Fourier transform of γ_τ
represent, in some sense, the structure of the Fourier transform of X_t. We will only give the statement of the
following theorem.
Let X(t) = X_t be a zero mean (continuous-time) process with spectral distribution function, F(ω). A
process K(ω) is said to have orthogonal increments if for every set ω_1 < ω_2 ≤ ω_3 < ω_4, we have

(6.3.1)  E{ [K(ω_2) − K(ω_1)] [K(ω_4) − K(ω_3)]* } = 0.
Spectral Representation Theorem: For every stationary process, X(t), there is a stochastic process, K(ω), with
orthogonal increments, such that for each fixed t

(6.3.2)  X(t) = ∫_{−∞}^{∞} e^{iωt} dK(ω),

where K(ω) is defined up to an additive random variable. We may fix this additive random variable by taking
K(−∞) = 0. In this case,

E{K(ω)} = 0,  E{|K(ω)|²} = F(ω),  and  E{|dK(ω)|²} = dF(ω).

The process, K(ω), is called the spectral representation process. (The DFT process, J(ω), is an approximation
to K(ω) based on a time series.) This result may be interpreted to say that the expected sinusoidal oscillations in
X_t are the same sinusoidal oscillations in γ_τ, i.e. the structure we expect to find in X_t is found in γ_τ. We say
this in the sense that the respective spectral densities are the same. This gives us confidence that using γ_τ as a
deterministic representative of X_t is very satisfactory, since (at least as far as frequency content is concerned) the
structure of γ_τ is reflective of the structure we expect to find for X_t.
Chapter 7
Estimating Spectral Densities
7.1 Estimating the Mean of a Stationary Time Series. I want to make a slight change of notation
here. Instead of writing the time series as X_0, …, X_{n−1}, I will write the series as X_1, …, X_n. Previously, in our
discussion of complex sinusoids, it has been convenient to start with an index of zero so as to make the first
term in the DFT series one. For reasons of symmetry, in our present discussion it will be more convenient to
make the first index one. The major effect of this shift is to create a phase shift in the computation of the DFT,
i.e. multiplication by a complex constant whose modulus is one. This will be irrelevant to the use of the DFT as
a diagnostic tool.
As before, let X_t be a real-valued stationary process with constant mean, μ, and covariance function,
γ_τ. We assume that we have the real-valued time series X_1, …, X_n. We consider the sample mean,

X̄ = (1/n) Σ_{t=1}^{n} X_t.

Recall that X̄ is the maximum likelihood estimate for the mean, μ, in the hidden periodicities model. Here, however, we are
only prepared to assume a stationary model for the process. Thus we are not yet able to compute all of the
properties of X̄. We are interested in computing the variance of X̄. Unlike the usual independent and
identically distributed (i.i.d.) case, the computation is not straightforward. We start with a lemma.
Lemma: If a_j, j = 0, ±1, ±2, …, is a sequence, then

(7.1.1)  Σ_{t=1}^{m} Σ_{s=1}^{m} a_{t−s} = Σ_{t=−m+1}^{m−1} (m − |t|) a_t.

Similarly, for an integrable function h,

(7.1.2)  ∫_0^T ∫_0^T h(t − s) ds dt = ∫_{−T}^{T} (T − |t|) h(t) dt.
Proof: We prove the discrete case. The other follows in a similar way. Consider

Σ_{t=1}^{m} Σ_{s=1}^{m} a_{t−s} = Σ_{t=1}^{m} Σ_{j=t−m}^{t−1} a_j,

where we are letting t − s = j. Interchanging the order of summation (see **** figure), we obtain

Σ_{t=1}^{m} Σ_{s=1}^{m} a_{t−s} = Σ_{j=−m+1}^{0} Σ_{t=1}^{m+j} a_j + Σ_{j=1}^{m−1} Σ_{t=j+1}^{m} a_j.

Rewriting yields

Σ_{t=1}^{m} Σ_{s=1}^{m} a_{t−s} = Σ_{j=−m+1}^{0} (m + j) a_j + Σ_{j=1}^{m−1} (m − j) a_j = Σ_{j=−m+1}^{m−1} (m − |j|) a_j.

This gives us the result in the discrete case. The continuous case follows in a similar way, although the
computation is actually slightly easier.
Then

var(X̄) = E{(X̄ − μ)²} = E{ [(1/n) Σ_{t=1}^{n} (X_t − μ)]² } = (1/n²) Σ_{t=1}^{n} Σ_{s=1}^{n} E{(X_t − μ)(X_s − μ)} = (1/n²) Σ_{t=1}^{n} Σ_{s=1}^{n} γ_{t−s}.

Applying the lemma,

var(X̄) = (1/n²) Σ_{t=−n+1}^{n−1} (n − |t|) γ_t.

Canceling an n yields,

(7.1.3)  var(X̄) = (1/n) Σ_{t=−n+1}^{n−1} (1 − |t|/n) γ_t.
Theorem: Let X̄ = (1/n) Σ_{t=1}^{n} X_t. If Σ_{t=−∞}^{∞} |γ_t| < ∞, then

(7.1.4)  n var(X̄) → Σ_{t=−∞}^{∞} γ_t as n → ∞. In particular, var(X̄) → 0 as n → ∞.

Proof: From (7.1.3),

(7.1.5)  n var(X̄) = Σ_{t=−n+1}^{n−1} (1 − |t|/n) γ_t = Σ_{t=−n+1}^{n−1} γ_t − (1/n) Σ_{t=−n+1}^{n−1} |t| γ_t.

Let

I_n(t) = { 1 if −n+1 ≤ t ≤ n−1;  0 otherwise }.

Notice that

|I_n(t) (|t|/n) γ_t| ≤ |γ_t|  and  lim_{n→∞} I_n(t) (|t|/n) γ_t = 0 for each fixed t.

Hence, by dominated convergence,

n var(X̄) → Σ_{t=−∞}^{∞} γ_t − 0 = Σ_{t=−∞}^{∞} γ_t.

Since f(ω) = (1/2π) Σ_{t=−∞}^{∞} e^{−iωt} γ_t, we have

(7.1.6)  f(0) = (1/2π) Σ_{t=−∞}^{∞} γ_t,

so that n var(X̄) → 2π f(0). Similarly, in the continuous case,

(7.1.7)  var(X̄) = (1/T) ∫_{−T}^{T} (1 − |t|/T) γ_t dt,

with

(7.1.8)  T var(X̄) → ∫_{−∞}^{∞} γ_t dt = 2π f(0).
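The limit n·var(X̄) → Σ_t γ_t = 2π f(0) can be illustrated by simulation. The sketch below uses an AR(1) process (an illustrative choice; its autocovariances are γ_t = γ_0 φ^{|t|}):

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n, reps, burn = 0.6, 500, 4000, 200

# Simulate `reps` independent AR(1) paths, X_t = phi*X_{t-1} + eps_t.
eps = rng.normal(size=(reps, n + burn))
x = np.zeros_like(eps)
for t in range(1, n + burn):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]
x = x[:, burn:]                       # drop burn-in so the series is stationary

n_var_mean = n * x.mean(axis=1).var()

# Limiting value: sum_t gamma_t with gamma_t = gamma_0 * phi**|t|
# and gamma_0 = 1/(1 - phi**2).
gamma0 = 1.0 / (1.0 - phi**2)
limit = gamma0 * (1.0 + phi) / (1.0 - phi)
print(round(n_var_mean, 1), limit)    # both near 6.25
```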
7.2 Estimation of Variances and Covariances. Let X_1, …, X_n be a time series from X_t, a
stationary process with mean μ and covariance function γ_s. We want now to estimate γ_s. A first
approximation might be the variance estimator

(7.2.1)  γ̂_0 = (1/n) Σ_{t=1}^{n} (X_t − X̄)²,

the usual estimator of variance, with the general stationary covariance estimator

(7.2.2)  γ̂_s = (1/(n−s)) Σ_{t=1}^{n−s} (X_t − X̄)(X_{t+s} − X̄).
This, like other ad hoc estimators, has some potential drawbacks. One might object that the first factor in the
summation given in (7.2.2) involves only the observations X_1, …, X_{n−s}, and yet we subtract X̄, which is computed
from the whole set of observations, X_1, …, X_n. Similarly, the second factor involves X_{s+1}, …, X_n and again we
subtract X̄. In spite of the symmetry, this clearly will cause the cross products we expect to cancel in quadratic
form computations not to cancel. Hence, a second proposed estimator might be
(7.2.3)  γ̃_s = (1/(n−s)) Σ_{t=1}^{n−s} (X_t − X̄_{0,s})(X_{t+s} − X̄_{s,s}),

where X̄_{0,s} = (1/(n−s)) Σ_{t=1}^{n−s} X_t and X̄_{s,s} = (1/(n−s)) Σ_{t=s+1}^{n} X_t, so that

(7.2.4)  E(γ̃_s) = (1/(n−s)) E{ Σ_{t=1}^{n−s} [X_t X_{t+s} − X_t X̄_{s,s} − X_{t+s} X̄_{0,s} + X̄_{0,s} X̄_{s,s}] },

(7.2.5)  E(γ̃_s) = (1/(n−s)) E{ Σ_{t=1}^{n−s} X_t X_{t+s} − 2(n−s) X̄_{0,s} X̄_{s,s} + (n−s) X̄_{0,s} X̄_{s,s} },

(7.2.6)  E(γ̃_s) = (1/(n−s)) E{ Σ_{t=1}^{n−s} X_t X_{t+s} − (n−s) X̄_{0,s} X̄_{s,s} }.

Centering each factor at μ and expanding again,

E(γ̃_s) = (1/(n−s)) E{ Σ_{t=1}^{n−s} (X_t − μ)(X_{t+s} − μ) − (n−s)(X̄_{0,s} − μ)(X̄_{s,s} − μ) },

so that

(7.2.9)  E(γ̃_s) = γ_s − cov(X̄_{0,s}, X̄_{s,s}).
But

(7.2.10)  cov(X̄_{0,s}, X̄_{s,s}) = (1/(n−s)²) Σ_{t=1}^{n−s} Σ_{u=1}^{n−s} E{(X_t − μ)(X_{u+s} − μ)} = (1/(n−s)²) Σ_{t=1}^{n−s} Σ_{u=1}^{n−s} γ_{u+s−t},

so that, by the lemma of Section 7.1,

(7.2.11)  cov(X̄_{0,s}, X̄_{s,s}) = (1/(n−s)²) Σ_{j=−(n−s)+1}^{n−s−1} (n − s − |j|) γ_{j+s}.

As before, if Σ_{j=−∞}^{∞} |γ_j| < ∞, then cov(X̄_{0,s}, X̄_{s,s}) → 0 as n → ∞. Moreover,

(7.2.12)  n[E(γ̃_s) − γ_s] = −n cov(X̄_{0,s}, X̄_{s,s}) = −(n/(n−s)) Σ_{j=−(n−s)+1}^{n−s−1} (1 − |j|/(n−s)) γ_{j+s}.

Letting n → ∞, we have

(7.2.13)  n[E(γ̃_s) − γ_s] → −Σ_{j=−∞}^{∞} γ_{j+s} = −Σ_{j=−∞}^{∞} γ_j = −2π f(0).
This shows that γ̃_s is asymptotically unbiased. Computing the asymptotic mean of γ̂_s is not a trivial task;
computing the exact mean is a real mess! We will not show the algebra of the computation but, for the record,
a parallel computation shows that γ̂_s is also asymptotically unbiased. A third estimator, appropriate when the
mean is known to be 0, is

(7.2.17)  γ*_s = (1/n) Σ_{t=1}^{n−s} X_t X_{t+s}.
It is straightforward to see that

(7.2.18)  E(γ*_s) = (1/n) Σ_{t=1}^{n−s} γ_s = ((n−s)/n) γ_s → γ_s as n → ∞.

Whereas the computation of the variance of γ̂_s and γ̃_s is all but impossible, some results are known about the
variance of γ*_s. We may get some idea by considering a special case. The following results were developed in
****. Let X_t = Σ_{j=0}^{∞} b_j ε_{t−j}. We shall later see this as an infinite moving average, where the ε_t are
independent with E(ε_t) = 0, var(ε_t) = 1, E(ε_t⁴) < ∞, and |b_j| ≤ K c^j, 0 ≤ c < 1. Let ρ̂_s = γ*_s / γ*_0. Then

(7.2.19)  lim_{n→∞} n E(ρ̂_s − ρ_s) = 2 Σ_{j=−∞}^{∞} (ρ_j ρ_{j+s} − ρ_s ρ_j²)

and

(7.2.20)  lim_{n→∞} n var(ρ̂_s) = Σ_{j=−∞}^{∞} (ρ_j² + ρ_{j+s} ρ_{j−s} − 4 ρ_s ρ_j ρ_{j−s} + 2 ρ_j² ρ_s²) = V_s.
Similarly,

(7.2.21)  lim_{n→∞} n cov(ρ̂_s, ρ̂_{s+v}) = Σ_{j=−∞}^{∞} (ρ_j ρ_{j+v} + ρ_{j+s+v} ρ_{j−s} − 2 ρ_s ρ_j ρ_{j−s−v} − 2 ρ_{s+v} ρ_j ρ_{j−s} + 2 ρ_s ρ_{s+v} ρ_j²) = U_{s,s+v},

with

(7.2.22)  U_{s,s} = V_s.

Finally,

(7.2.23)  √n (ρ̂_s − ρ_s) → N(0, V_s) in distribution.
Thus ρ̂_s is asymptotically unbiased, consistent and asymptotically normal. Under appropriate assumptions,
ρ̂_s (and, by implication, γ*_s) seem to be reasonable estimators of the correlation, ρ_s, and the covariance,
γ_s, respectively. Notice that if we consider lim_{s→∞} V_s, then we have

(7.2.24)  lim_{s→∞} V_s = lim_{s→∞} lim_{n→∞} n var(ρ̂_s) = Σ_{j=−∞}^{∞} ρ_j² ≠ 0.

Hence, even if n is large, when we try to estimate ρ_s for large lags, the variance of the estimate does not
converge to 0.
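A minimal sketch of the estimator γ*_s (assuming a zero-mean series) and a sanity check on white noise:

```python
import numpy as np

def acov(x, s):
    """Autocovariance estimator gamma*_s = (1/n) * sum_t x_t x_{t+s},
    for a zero-mean series (note the 1/n divisor, not 1/(n-s))."""
    n = len(x)
    return np.dot(x[: n - s], x[s:]) / n

# On white noise, gamma*_0 estimates the variance and gamma*_s, s > 0,
# is near 0.
rng = np.random.default_rng(4)
x = rng.normal(size=50_000)
print(round(acov(x, 0), 2), abs(acov(x, 5)) < 0.05)
```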
7.3 The Periodogram and Other Smoothers. We restrict our attention to real-valued, wide-sense
stationary time series with a spectral density function. Let X_1, …, X_n be the time series. We want to estimate
the spectral density function, f(ω), |ω| ≤ π, assuming Σ_{j=−∞}^{∞} |γ_j| < ∞. Recall by definition

(7.3.1)  f(ω) = (1/2π) Σ_{j=−∞}^{∞} γ_j e^{−iωj}.

By analogy to (7.3.1), we can consider an empirical density estimator called the periodogram and given as

(7.3.2)  I_n(ω) = (1/2π) Σ_{j=−n+1}^{n−1} γ*_j e^{−iωj},

where

(7.3.3)  γ*_j = (1/n) Σ_{t=1}^{n−|j|} X_t X_{t+|j|}.

The periodogram is, up to normalization, the squared modulus J(ω)J*(ω) of the DFT. We shall show that

(7.3.4)  I_n(ω) = (1/2πn) |Σ_{t=1}^{n} X_t e^{−iωt}|² = (1/2π) Σ_{j=−n+1}^{n−1} γ*_j e^{−iωj},

and

γ*_j = ∫_{−π}^{π} e^{iωj} I_n(ω) dω.
To verify the integral relationship, write

∫_{−π}^{π} e^{iωj} I_n(ω) dω = ∫_{−π}^{π} e^{iωj} (1/2π) Σ_{k=−n+1}^{n−1} γ*_k e^{−iωk} dω = Σ_{k=−n+1}^{n−1} γ*_k (1/2π) ∫_{−π}^{π} e^{iω(j−k)} dω = Σ_{k=−n+1}^{n−1} γ*_k δ_{jk} = γ*_j.

Here δ_{jk} is the Kronecker delta. That this integral is either 0 or 1 follows from the orthogonality result
(3.1.16). Hence the second part follows immediately from (7.3.2). Now consider,
I_n(ω) = (1/2πn) |Σ_{t=1}^{n} X_t e^{−iωt}|² = (1/2πn) [Σ_{t=1}^{n} X_t e^{−iωt}][Σ_{s=1}^{n} X_s* e^{iωs}].

Simplifying, we have

I_n(ω) = (1/2πn) Σ_{t=1}^{n} Σ_{s=1}^{n} e^{−iω(t−s)} X_t X_s*.

Now letting j = t − s, so that s = t − j, we have

I_n(ω) = (1/2π) Σ_{j=−n+1}^{n−1} e^{−iωj} [ (1/n) Σ_t X_t X*_{t−j} ].

Here, because the range of summation on t is dependent on j, we have deliberately left the summation
ambiguous. For j ≤ 0, we consider

(1/n) Σ_{t=1}^{n+j} X_t X*_{t−j}.

Since j is negative, −j is positive, and in this case the sum has precisely the definition of γ*_{−j}. For j ≥ 0, we
consider

(1/n) Σ_{t=j+1}^{n} X_t X*_{t−j}.

Let s = t − j, so that t = s + j. Then

(1/n) Σ_{t=j+1}^{n} X_t X*_{t−j} = (1/n) Σ_{s=1}^{n−j} X_{s+j} X_s*,

which is precisely the definition of γ*_j. Since X_t is real valued, γ*_{−j} = γ*_j. In either case,

I_n(ω) = (1/2π) Σ_{j=−n+1}^{n−1} γ*_j e^{−iωj}.
From the generalized model of hidden periodicities, X_t = Σ_p [A_p cos(ω_p t) + B_p sin(ω_p t)], we know from the
earlier discussion that the coefficient estimates are

(7.3.5)  A_p = (2/n) Σ_{t=1}^{n} X_t cos(ω_p t)  and  B_p = (2/n) Σ_{t=1}^{n} X_t sin(ω_p t).

Consider then

(7.3.6)  A_p² + B_p² = (4/n²) Σ_{t=1}^{n} Σ_{s=1}^{n} X_t X_s [cos(ω_p s) cos(ω_p t) + sin(ω_p s) sin(ω_p t)] = (4/n²) Σ_{t=1}^{n} Σ_{s=1}^{n} X_t X_s cos(ω_p (t − s)).

We have assumed that X_t is real valued. We may take the real part of A_p² + B_p² to have

A_p² + B_p² = Real{ (4/n²) Σ_{t=1}^{n} Σ_{s=1}^{n} X_t X_s e^{iω_p(t−s)} }.

This is simplified to

A_p² + B_p² = Real{ (4/n²) [Σ_{t=1}^{n} X_t e^{iω_p t}][Σ_{s=1}^{n} X_s e^{−iω_p s}] } = (4/n²) |Σ_{t=1}^{n} X_t e^{iω_p t}|².

Since

(7.3.7)  I_n(ω_p) = (1/2πn) |Σ_{t=1}^{n} X_t e^{−iω_p t}|²,

we have

(7.3.8)  A_p² + B_p² = (4/n²) · 2πn · I_n(ω_p) = (8π/n) I_n(ω_p),

or

(7.3.9)  I_n(ω_p) = (n/8π) (A_p² + B_p²),

(7.3.10)  I_n(ω) = (1/2π) Σ_{j=−n+1}^{n−1} γ*_j e^{−iωj},

with A_p = (2/n) Σ_{t=1}^{n} X_t cos(ω_p t), B_p = (2/n) Σ_{t=1}^{n} X_t sin(ω_p t), and γ*_j = (1/n) Σ_{t=1}^{n−j} X_t X_{t+j}, where X_t is real valued.
Thus
we have a fundamental linkage between the generalized model of hidden periodicities, the DFT, the spectral
density and the covariance function. This gives additional intuitive support to the use of R(ω_p) as a diagnostic
tool for fitting the model of hidden periodicities.
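The two forms of the periodogram in (7.3.4) and (7.3.10) can be checked against each other numerically; this sketch computes I_n(ω) both from the DFT and from the covariance estimates:

```python
import numpy as np

# Two equivalent computations of the periodogram I_n(omega):
#   (a) (1/(2*pi*n)) |sum_t x_t e^{-i omega t}|^2
#   (b) (1/(2*pi)) sum_j gamma*_j e^{-i omega j}
rng = np.random.default_rng(5)
n = 64
x = rng.normal(size=n)
omega = 2 * np.pi * 7 / n            # a Fourier frequency (any omega works)

t = np.arange(1, n + 1)
I_dft = abs(np.sum(x * np.exp(-1j * omega * t))) ** 2 / (2 * np.pi * n)

j = np.arange(-n + 1, n)
gamma = np.array([np.dot(x[: n - abs(k)], x[abs(k):]) / n for k in j])
I_cov = np.real(np.sum(gamma * np.exp(-1j * omega * j))) / (2 * np.pi)

print(np.isclose(I_dft, I_cov))      # True
```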
Next let us consider some of the statistical properties of the periodogram. It is not hard to see that
(7.3.11)  E[I_n(ω)] = (1/2π) Σ_{j=−n+1}^{n−1} e^{−iωj} E[γ*_j] = (1/2π) Σ_{j=−n+1}^{n−1} e^{−iωj} ((n−|j|)/n) γ_j.

The latter equality follows from a slight adaptation of (7.2.18). A little algebra yields

(7.3.12)  E[I_n(ω)] = (1/2π) Σ_{j=−n+1}^{n−1} (1 − |j|/n) γ_j e^{−iωj} → (1/2π) Σ_{j=−∞}^{∞} γ_j e^{−iωj} = f(ω), as n → ∞.

Thus the periodogram is asymptotically unbiased. For the variance,

(7.3.13)  var[I_n(ω)] = E{ [ (1/2π) Σ_{j=−n+1}^{n−1} e^{−iωj} ( γ*_j − (1 − |j|/n) γ_j ) ]² }.

Realistically there is not much hope for a simple computation based on (7.3.13). The formulation (7.3.14),
however, yields a more tractable computation, so consider

(7.3.14)  E[I_n²(ω)] = E{ (1/4π²n²) |Σ_{t=1}^{n} X_t e^{−iωt}|⁴ },

that is,

(7.3.15)  E[I_n²(ω)] = (1/4π²n²) Σ_{s=1}^{n} Σ_{t=1}^{n} Σ_{u=1}^{n} Σ_{v=1}^{n} E[X_s X_t X_u X_v] e^{−iωs} e^{iωt} e^{−iωu} e^{iωv}.

Now let the argument of the quadruple sum be E_{stuv}, and suppose the X_t are independent and identically
distributed with mean 0 and variance σ². Depending on the values of s, t, u and v, we will get
different sums as follows.

If s = t ≠ u = v, E_{stuv} = σ⁴ and there are n(n−1) terms of this form.
If s = u ≠ t = v, E_{stuv} = σ⁴ e^{−2iω(s−t)} and there are n(n−1) terms of this form.
If s = v ≠ t = u, E_{stuv} = σ⁴ and there are n(n−1) terms of this form.
If s = t = u = v, E_{stuv} = E(X⁴) = ησ⁴ and there are n terms of this form.
Otherwise, E_{stuv} = 0.
We may use these 5 cases to form the equation

(7.3.16)  E[I_n²(ω)] = (1/4π²n²) { 2n(n−1)σ⁴ + nησ⁴ + σ⁴ [ |Σ_{t=1}^{n} e^{−2iωt}|² − n ] }.

The −n term in the expression in braces is meant to account for the case s = t, which is double counted in the
square modulus. Using our earlier results related to the Dirichlet kernel, |Σ_{t=1}^{n} e^{−2iωt}|² = sin²(nω)/sin²(ω), so

(7.3.17)  E[I_n²(ω)] = (σ⁴/4π²) { 2(n−1)/n + η/n + (1/n²) sin²(nω)/sin²(ω) − 1/n },

(7.3.18)  E[I_n²(ω)] = (σ⁴/4π²) { 2 − 3/n + η/n + [ sin(nω)/(n sin ω) ]² }.

But E[I_n(ω)] = f(ω) = σ²/2π for a time series which is an independent and identically distributed sequence
(called white noise). We may combine this with (7.3.18) to obtain an approximation for the variance,

(7.3.19)  var[I_n(ω)] = (σ⁴/4π²) { 2 − 3/n + η/n + [ sin(nω)/(n sin ω) ]² } − σ⁴/4π².

Simplifying (7.3.19) and recognizing that the (η − 3)/n term approximates 0 for large n, we obtain

(7.3.20)  var[I_n(ω)] ≈ (σ⁴/4π²) { 1 + [ sin(nω)/(n sin ω) ]² }.
If ω ≠ 0 and ω ≠ ±π, then sin(nω)/(n sin ω) → 0 as n → ∞, and

var[I_n(ω)] ≈ σ⁴/4π²

for n large. If ω = 0 or ω = ±π, then |sin(nω)/(n sin ω)| = 1, and

var[I_n(ω)] ≈ 2σ⁴/4π²

for n large. Thus

(7.3.21)  var[I_n(ω)] → { σ⁴/4π², ω ≠ 0 and ω ≠ ±π;  2σ⁴/4π², ω = 0 or ω = ±π }.

Thus, even in this simple case of white noise, the variance of the periodogram is asymptotically not equal to
zero. In general,

(7.3.22)  var[I_n(ω)] → { f²(ω), ω ≠ 0 and ω ≠ ±π;  2 f²(ω), ω = 0 or ω = ±π }.
Moreover, for distinct frequencies ω and ν,

(7.3.23)  cov[I_n(ω), I_n(ν)] ≈ { [ sin(n(ω−ν)/2) / (n sin((ω−ν)/2)) ]² + [ sin(n(ω+ν)/2) / (n sin((ω+ν)/2)) ]² } f(ω) f(ν) → 0 as n → ∞.
In general, the periodogram is too rough, i.e. its variance is too large. We want to smooth the periodogram. We
do this with a kernel smoother.
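The roughness is visible in simulation: for white noise the periodogram's standard deviation stays comparable to its mean no matter how large n gets, in line with (7.3.21)-(7.3.22). A sketch:

```python
import numpy as np

rng = np.random.default_rng(6)

def periodogram_at(x, omega):
    t = np.arange(1, len(x) + 1)
    return abs(np.sum(x * np.exp(-1j * omega * t))) ** 2 / (2 * np.pi * len(x))

omega = 1.0
for n in (128, 1024, 8192):
    vals = [periodogram_at(rng.normal(size=n), omega) for _ in range(400)]
    # Relative spread does not shrink with n: the raw periodogram is
    # not a consistent estimator of f(omega).
    print(n, round(np.std(vals) / np.mean(vals), 2))
```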
7.4 Kernel Smoothers. We construct a smoothed periodogram by a convolution of the raw periodogram
with a smoother function which we will label K(ω). Let K(ω) be a real-valued, non-negative function such that

(7.4.1)  K(ω) = K(−ω)  and  ∫_{−∞}^{∞} K(ω) dω = 1.

Consider

(7.4.2)  ∫_{−∞}^{∞} K(ν − ω) I_n(ν) dν = ∫_{−∞}^{∞} K(ν − ω) (1/2π) Σ_{j=−n+1}^{n−1} γ*_j e^{−iνj} dν.

Let s = ν − ω, so that ds = dν and ν = ω + s. Then

(7.4.3)  ∫_{−∞}^{∞} K(ν − ω) I_n(ν) dν = (1/2π) Σ_{j=−n+1}^{n−1} [ ∫_{−∞}^{∞} K(s) e^{−isj} ds ] γ*_j e^{−iωj} = (1/2π) Σ_{j=−n+1}^{n−1} a_j γ*_j e^{−iωj},

if we define a_j = ∫_{−∞}^{∞} K(s) e^{−isj} ds. Thus convolving the periodogram with a kernel is equivalent to
weighting the covariance estimates γ*_j. Choosing a lag window k and a truncation point m with a_j = k(j/m)
gives the smoothed estimator

(7.4.4)  f̂(ω) = (1/2π) Σ_{j=−m}^{m} k(j/m) γ*_j e^{−iωj},

with corresponding spectral window

(7.4.5)  K_m(ω) = (1/2π) Σ_{j=−m}^{m} k(j/m) e^{−iωj},

and K_m(ω) ≈ m K(mω) if K is concentrated on (−π, π). We may begin with k(t) and calculate K or K_m as
desired. In general, however, we will want to weight the covariance and then use the formula

(7.4.6)  f̂(ω) = (1/2π) Σ_{j=−m}^{m} k(j/m) γ*_j cos(ωj)
as the computational form. We are assuming that the time series X_t is real valued, so that the imaginary part of
(7.4.4) is zero, yielding the very convenient form (7.4.6).
Examples of lag windows: There are several classical windows for smoothing the periodogram. We may
think of the smoothed periodogram as a nonparametric spectral density estimator. It is nonparametric because
we are not assuming any parametric form for the spectral density. We will later study time-domain models
which are indeed parametric. Thus we will also be able to estimate spectral densities parametrically. The
smoothed periodogram is in a sense an ad hoc estimator. It is not chosen against any optimality criterion. The
practical consideration involves reducing the variance so as to make f (
) a statistically consistent estimator of
the density f(
). The trade-off is that the bias of f (
) will increase. For this reason, a fairly large number of
different functional forms for the kernel, k, have been developed.
Bartlett window.

(7.4.7)  k(x) = { 1 − |x|, for |x| ≤ 1;  0, otherwise }.

The Bartlett window is desirable because of its particularly simple computational form.

Daniell window.

(7.4.8)  k(x) = sin(πx)/(πx),  −∞ < x < ∞.

Tukey window.

(7.4.9)  k(x) = { 1 − 2a + 2a cos(πx), |x| ≤ 1;  0, otherwise }.

For a = 1/4 and |x| ≤ 1, k(x) = (1/2)[1 + cos(πx)]. The value a = 1/4 gives the so-called Tukey-Hanning window.

Parzen window.

(7.4.10)  k(x) = { 1 − 6x² + 6|x|³, 0 ≤ |x| ≤ .5;  2(1 − |x|)³, .5 ≤ |x| ≤ 1;  0, otherwise }.
Generally speaking, the choice of a window will have a comparatively minor effect on the spectral density
estimator. The choice of the truncation point, m, is more critical. An optimal window can actually be computed
by minimizing

(7.4.11)  (1/n) Σ_{j=−n+1}^{n−1} [ I_n(2πj/n) − f(2πj/n) ]² + λ ∫_{−π}^{π} [L f(ω)]² dω,

where f is chosen from a class of smooth functions which we shall not attempt to define here. The parameter, λ,
is a bandwidth parameter and L is a differential operator. If L is chosen as d²/dω², then the solution to (7.4.11) is a
cubic spline. The solution is, needless to say, a rather complex expression and will not be treated here. A
fuller discussion may be found in Wegman (1980, 1981).
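A minimal implementation of the smoothed estimator (7.4.6) with the Tukey-Hanning and Parzen lag windows (the series, frequency, and truncation point m = 40 here are illustrative choices):

```python
import numpy as np

def tukey_hanning(x):
    # Tukey window with a = 1/4: k(x) = (1 + cos(pi*x))/2 on |x| <= 1.
    return np.where(np.abs(x) <= 1, 0.5 * (1 + np.cos(np.pi * x)), 0.0)

def parzen(x):
    ax = np.abs(x)
    return np.where(ax <= 0.5, 1 - 6 * ax**2 + 6 * ax**3,
                    np.where(ax <= 1, 2 * (1 - ax)**3, 0.0))

def f_hat(x, omega, m, window=tukey_hanning):
    """Lag-window spectral estimate (7.4.6) for a zero-mean series."""
    n = len(x)
    j = np.arange(-m, m + 1)
    gamma = np.array([np.dot(x[: n - abs(k)], x[abs(k):]) / n for k in j])
    return np.sum(window(j / m) * gamma * np.cos(omega * j)) / (2 * np.pi)

# White noise has the flat spectrum sigma^2/(2*pi) ~ 0.159 for sigma = 1.
rng = np.random.default_rng(7)
x = rng.normal(size=4000)
print(round(f_hat(x, 1.0, 40), 2))
```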
7.5 Statistical Properties of Smoothed Periodograms. Choice of kernels and truncation points are, of
course, factors which will affect the statistical properties of our proposed kernel smoothed periodograms. We
state some results here which can be used to establish choices of both.
Bias. If both the spectral density, f(ω), and the lag window, k(x), are twice differentiable, then

(7.5.1)  lim_{n,m→∞} m² { E[f̂(ω)] − f(ω) } = −k_1 f″(ω),

where k_1 = −k″(0)/2. For the Tukey window k_1 = π²/4 and for the Parzen window
k_1 = 6, so that the Tukey window has better bias properties. Equation (7.5.1) may be rewritten as

E[f̂(ω)] ≈ f(ω) − (k_1/m²) f″(ω).
Variance. Similarly,

(7.5.2)  lim_{n,m→∞} (n/m) var[f̂(ω)] = { f²(ω) ∫_{−∞}^{∞} k²(u) du, 0 < |ω| < π;  2 f²(ω) ∫_{−∞}^{∞} k²(u) du, ω = 0 or ±π }.

Thus

(7.5.3)  var[f̂(ω)] ≈ (m/n) f²(ω) ∫_{−∞}^{∞} k²(u) du,  0 < |ω| < π.
Combining the squared bias with the variance, the mean squared error has the form

(7.5.4)  MSE ≈ c_1/m⁴ + c_2 m/n,

where the c_1-term represents the (squared) bias and the c_2-term represents the variance. We want these terms to
converge to 0 at the same rate (be of the same order), so that we may equate them to obtain

(7.5.5)  c_1/m⁴ = c_2 m/n,  so that  m = c n^{1/5}

is the optimal choice of m in the mean squared error sense.
7.6 Practical Spectral Estimation. There are 3 choices associated with the estimation of the spectral
density by a smoothed periodogram, i.e. by
(7.6.1)  f̂(ω) = (1/2π) Σ_{j=−m}^{m} k(j/m) γ*_j cos(jω),
namely the choice of a lag window, k, the choice of a truncation point, m, and the choice of a set of frequencies,
ω, at which to evaluate (7.6.1). We comment on each in turn. For the choice of a lag window, the Tukey window
or the Parzen window is usually quite satisfactory; window choice is not so critical. The choice of the
bandwidth, m, is much more difficult, as m determines the degree of smoothing. The choice
of m = c n^{1/5} is an optimal asymptotic choice, but we usually have a fixed sample size, which means that c is
the critical choice. I recommend that you try m = 10% of n to start, but be prepared for some trial and error.
Advanced techniques of cross-validation and other types of error criteria such as the Akaike Information Criterion
(AIC) can help, but are really beyond our discussion here.

The choice of frequencies, ω, at which to evaluate is another fairly critical choice. Here we want a
choice that is reasonably dense along the frequency axis. The Fourier frequencies ω_j = 2πj/n are not a bad starting
point, but are probably too dense for large data sets. Some refinement of that choice with a less fine grid may be
in order! In general, there is no optimal theory. This is a nonparametric technique and so some
experimentation is in order.
Chapter 8
Filtering
8.1 Linear Filters. We now turn our attention to filters. A filter maps one stochastic process into
another. It can conveniently be, and usually is, thought of as a black box which has an input and an output,
respectively X_t as input and Y_t as output. We can use the conceptual black box to model a wide
variety of systems such as an amplifier, a communication channel, the ocean or atmosphere, or any number of
models which may transform one signal into another. Indeed, estimating the filter structure is known in the
electrical engineering literature as the system identification problem. From a more mathematical perspective,
let Φ be a set of complex-valued functions and let L : Φ → Φ. L is a functional or an operator in an abstract
mathematical setting. For us, L(X_t) = Y_t is a filter.

The filter, L, is linear if for every pair of complex numbers a_1, a_2, with X_{1t}, X_{2t} ∈ Φ such that
a_1 X_{1t} + a_2 X_{2t} ∈ Φ, we have L(a_1 X_{1t} + a_2 X_{2t}) = a_1 L(X_{1t}) + a_2 L(X_{2t}). The filter, L, is called time invariant if for
any X_t ∈ Φ, whenever X_h(t) = X(t+h) and Y_h(t) = Y(t+h) with L(X) = Y, we have L(X_h) = Y_h.
Examples of Filters. Let X_t be a stochastic process and T the index set, T = {0, ±1, ±2, …} for example.

Backward Shift Filter: Y_t = X_{t−1}, t ∈ T.

Difference Filter: Y_t = X_t − X_{t−1} = ΔX_t, t ∈ T.

Moving Average Filter: Y_t = Σ_j a_j X_{t−j}, t ∈ T.

Exponential Filter: Y_t = ∫_{−∞}^{t} e^{−a(t−s)} X_s ds, t ∈ T.

Differential Filter: L(X_t) = X′_t = dX_t/dt, t ∈ T.

General Linear Filter: Y_t = ∫_{−∞}^{∞} w(t − s) X_s ds, t ∈ T. If w(t) = 0 for t < 0, then t − s < 0 implies that
w(t − s) = 0. Thus w(t − s) ≠ 0 only when t − s ≥ 0, that is, when t ≥ s. Then

Y_t = ∫_{−∞}^{∞} w(t − s) X_s ds = ∫_{−∞}^{t} w(t − s) X_s ds.

Since we deal with values of the process in the past with respect to t, we say that the filter is physically
realizable.

General Physically Realizable Filter: Y_t = ∫_{−∞}^{t} w(t − s) X_s ds, t ∈ T.

The backward shift, the difference and the moving average are examples of
discrete-time filters. The exponential, differential, general linear filters and the general physically realizable
filters are examples of continuous-time filters.
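The discrete-time examples are one-liners in code; this sketch applies the backward shift, the difference, and a two-point moving average (illustrative weights a_0 = a_1 = 1/2) to a short series:

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

shift = x[:-1]                               # Y_t = X_{t-1}
diff = x[1:] - x[:-1]                        # Y_t = X_t - X_{t-1}
ma = np.convolve(x, [0.5, 0.5])[1:len(x)]    # Y_t = (X_t + X_{t-1})/2

print(diff)   # [ 2. -1.  3. -1.]
print(ma)     # [2.  2.5 3.5 4.5]
```

All three are physically realizable: each output value depends only on present and past inputs.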
We may define physically realizable filters in the discrete-time case as well. For the filter
Y_t = Σ_{j=−∞}^{∞} c_j X_{t−j}, if c_j = 0 for j < 0, then Y_t = Σ_{j=0}^{∞} c_j X_{t−j}, which is physically realizable.
The frequency response (or transfer) function of a filter is

g(ω) = ∫_{−∞}^{∞} e^{−iωt} w(t) dt

in the continuous-time case, or

g(ω) = Σ_{j=−∞}^{∞} c_j e^{−iωj},  for |ω| ≤ π,

in the discrete-time case, provided Σ_{j=−∞}^{∞} |c_j| < ∞. Consider now Y_t = Σ_{j=−∞}^{∞} c_j X_{t−j} with E[X_t] = 0.
Then we have

(8.2.1)  E[Y_t] = E[ Σ_{j=−∞}^{∞} c_j X_{t−j} ] = Σ_{j=−∞}^{∞} c_j E[X_{t−j}] = 0.
Thus the mean of a filtered process is 0 if the original process has mean 0. Consider the expression for the
covariance,

(8.2.2)  E[Y_{t+τ} Y_t*] = E{ [ Σ_{j=−∞}^{∞} c_j X_{t+τ−j} ][ Σ_{k=−∞}^{∞} c_k* X*_{t−k} ] }.

We may move the expectation operator through the summation signs to obtain

E[Y_{t+τ} Y_t*] = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} c_j c_k* E[X_{t+τ−j} X*_{t−k}].

The expectation, however, is just the covariance of the X_t process, so that we have

(8.2.3)  E[Y_{t+τ} Y_t*] = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} c_j c_k* γ_{τ−j+k},

which doesn't depend on t. Hence Y_t is a stationary stochastic process. This is true in general: a time-invariant
linear filtered version of a stationary process is still stationary. We may continue the development of equation
(8.2.3) by replacing the covariance function by the Fourier transform of the spectral density to obtain
(8.2.4)  E[Y_{t+τ} Y_t*] = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} c_j c_k* ∫_{−π}^{π} e^{iω(τ−j+k)} f_X(ω) dω = ∫_{−π}^{π} e^{iωτ} [ Σ_{j=−∞}^{∞} c_j e^{−iωj} ][ Σ_{k=−∞}^{∞} c_k* e^{iωk} ] f_X(ω) dω,

so that

(8.2.5)  E[Y_{t+τ} Y_t*] = ∫_{−π}^{π} e^{iωτ} | Σ_{j=−∞}^{∞} c_j e^{−iωj} |² f_X(ω) dω.

But by (6.2.5), we know that E[Y_{t+τ} Y_t*] = ∫_{−π}^{π} e^{iωτ} f_Y(ω) dω. Thus

(8.2.6)  ∫_{−π}^{π} e^{iωτ} | Σ_{j=−∞}^{∞} c_j e^{−iωj} |² f_X(ω) dω = ∫_{−π}^{π} e^{iωτ} f_Y(ω) dω.

Since this holds for every τ,

(8.2.7)  f_Y(ω) = | Σ_{j=−∞}^{∞} c_j e^{−iωj} |² f_X(ω),

or equivalently,

(8.2.8)  f_Y(ω) = f_X(ω) |g(ω)|².
Thus the spectral density of the Y process may be computed in terms of the spectral density of the X process
and the transfer function of the filter. Frequently we may measure both the X process and the Y process.
Thus, by computing the ratio of the spectral densities, we may estimate the frequency response function of the
filter.
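This sketch estimates |g(ω)|² as a ratio of averaged periodograms for the two-point moving average Y_t = (X_t + X_{t−1})/2 applied to white noise (an illustrative filter, for which g(ω) = (1 + e^{−iω})/2 and |g(ω)|² = cos²(ω/2)):

```python
import numpy as np

rng = np.random.default_rng(8)
omega, reps, n = 1.2, 2000, 256

def pgram(x, w):
    t = np.arange(1, len(x) + 1)
    return abs(np.sum(x * np.exp(-1j * w * t))) ** 2 / (2 * np.pi * len(x))

ix = iy = 0.0
for _ in range(reps):
    x = rng.normal(size=n + 1)          # white noise input
    y = 0.5 * (x[1:] + x[:-1])          # Y_t = (X_t + X_{t-1})/2
    ix += pgram(x[1:], omega)
    iy += pgram(y, omega)

gain = iy / ix                          # estimates |g(omega)|^2
print(round(gain, 2), round(np.cos(omega / 2) ** 2, 2))
```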
Similarly, in the continuous-time case, for Y_t = ∫_{−∞}^{∞} w(t − s) X_s ds with ∫_{−∞}^{∞} |w(t)| dt < ∞, we have

(8.2.9)  f_Y(ω) = f_X(ω) | ∫_{−∞}^{∞} w(x) e^{−iωx} dx |²,

so that again

(8.2.10)  f_Y(ω) = f_X(ω) |g(ω)|².
The nature of the frequency response function largely determines the properties of the filter. A filter such that

(8.2.11)  |g(ω)|² = { 1, |ω| ≤ ω_0;  0, elsewhere }

is called a low-pass filter; a filter such that

(8.2.12)  |g(ω)|² = { 1, |ω| ≥ ω_0;  0, elsewhere }

is called a high-pass filter; and a filter such that

(8.2.13)  |g(ω)|² = { 1, ω_1 ≤ |ω| ≤ ω_2;  0, elsewhere }

is called a band-pass filter. Recall that a white noise process ε_t with common variance σ² has the flat spectral
density

(8.2.14)  f_ε(ω) = (1/2π) Σ_{j=−∞}^{∞} γ_j e^{−iωj} = σ²/2π.

It is not too hard to think of filtering the white noise with a time-invariant linear filter to obtain

(8.2.15)  X_t = Σ_{j=−∞}^{∞} c_j ε_{t−j}.

By (8.2.8), the filtered process has spectral density

(8.2.16)  f_X(ω) = (σ²/2π) | Σ_{j=−∞}^{∞} c_j e^{−iωj} |².
This is a general linear model for a stochastic process known as an infinite moving average. If
(8.2.17)  X_t = Σ_{j=0}^{q} c_j ε_{t−j},

then

(8.2.18)  f_X(ω) = (σ²/2π) | Σ_{j=0}^{q} c_j e^{−iωj} |².
The finite moving average representation of a stationary stochastic process, X_t, yields a powerful parametric representation in the time domain for a time series. Moreover, we can see that the spectral density given in (8.2.18) has a parametric representation with q + 1 parameters. This motivates a general discussion of linear, time-domain models of time series with a finite number of parameters. These are known in the statistical literature as either Box-Jenkins models or autoregressive-moving average (ARMA) models, and in the electrical engineering literature as linear recursive models. With this motivation, we shall begin our discussion of time domain modeling.
Chapter 9
Linear Processes
9.1 Structural Models. A structural model describes the relationship among two or more random variables, in contrast to a statistical model, which describes the probability structure of one or more random variables. To be sure, there are very intimate connections, which, as we indicated earlier, are captured in the time series case by the covariance structure of a stationary time series. Nonetheless, in the next chapters we are interested in describing structural models appropriate to time series. These time domain models are, in essence, analogs to linear regression models and, in fact, share some rather extensive theoretical connections. They do, however, also have features which are peculiar to time series analysis.
Let us first consider a discrete time process of the form
X
+ !^ + !^ +
X
B
!^ .
y
The \epsilon's are uncorrelated random variables with zero means and constant variance, \sigma^2. As defined earlier, such a series \{\epsilon_t\} is called white noise, and the process \{X_t\} is called a general linear process. Notice that the covariance function for \epsilon_t is given by
(9.1.2)
\gamma_\epsilon(k) = \sigma^2 for k = 0, and 0 for k \ne 0,

and the autocorrelation by

(9.1.3)
\rho_\epsilon(k) = 1 for k = 0, and 0 for k \ne 0.
Equation (9.1.1) may be re-written under certain invertibility conditions which we describe later in the alternate
form
(9.1.4)
X_t = \pi_1 X_{t-1} + \pi_2 X_{t-2} + \cdots + \epsilon_t = \sum_{j=1}^{\infty} \pi_j X_{t-j} + \epsilon_t.
These infinite summation forms, (9.1.1) and (9.1.4), while elegant do not yield a convenient calculus for
manipulating time series. It will be convenient to introduce difference equation notation for this will give us the
tools for manipulating the time domain models more easily. Let B denote the backward shift operator so that
B X_t = X_{t-1}, and, in general, B^j X_t = X_{t-j}. With this notation, (9.1.1) may be written as

(9.1.5)
X_t = \psi(B)\epsilon_t, \quad where \quad \psi(B) = I + \psi_1 B + \psi_2 B^2 + \cdots,

and (9.1.4) as

(9.1.6)
\epsilon_t = \pi(B) X_t.

Here the operator \pi(B) = (I - \pi_1 B - \pi_2 B^2 - \cdots). These operator representations of a time series allow us to develop a calculus. Consider applying \psi(B) to both sides of (9.1.6) to obtain

\psi(B)\pi(B) X_t = \psi(B)\epsilon_t = X_t,

from which we have the identity

(9.1.7)
\psi(B)\pi(B) = I.

Here I is the identity operator, so that I X_t = X_t. Let F denote the forward shift operator, F X_t = X_{t+1}. Notice that FB X_t = BF X_t = X_t, so that FB = BF = I. It follows from this that \psi(F)\pi(F) = I. Combining this result with (9.1.7), we may conclude that \pi(B) = \psi^{-1}(B), where the inverse means the inverse operator.
9.2 The Autocovariance Generating Function and the Spectral Density Function. With the
introduction of the notion of back shift and forward shift operators, we are now in a position to develop some
simple difference equation theory. Let's first use (9.1.1) to notice that

E[X_t X_{t+k}] = E\Big[ \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j} \sum_{h=0}^{\infty} \psi_h \epsilon_{t+k-h} \Big].

We may interchange the expectation operator with the summation operators to obtain

E[X_t X_{t+k}] = \sum_{j=0}^{\infty} \sum_{h=0}^{\infty} \psi_j \psi_h E[\epsilon_{t-j} \epsilon_{t+k-h}].

Because the process is white noise, only products with identical subscripts will contribute to the summation. Thus

(9.2.1)
E[X_t X_{t+k}] = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k}.

We first observe that the right-hand side of (9.2.1) does not depend on t. Because \epsilon_t is a 0-mean process, X_t is also a 0-mean process. Thus (9.2.1) describes the covariance function of a stationary process. We may thus write

(9.2.2)
\gamma_k = E[X_t X_{t+k}] = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k}.

Our only concern is with the convergence of the infinite series represented in (9.2.2).
We assume that \sum_{j=0}^{\infty} \psi_j s^j converges for |s| \le 1. We now substitute expression (9.2.2) for \gamma_k in the definition of the autocovariance generating function to obtain

(9.2.3)
\gamma(s) = \sum_{k=-\infty}^{\infty} \gamma_k s^k = \sigma^2 \sum_{k=-\infty}^{\infty} \sum_{j=0}^{\infty} \psi_j \psi_{j+k} s^k,

where it is understood that \psi_j = 0 for j < 0. With this, substituting h = j + k so that k = h - j, we have

\gamma(s) = \sigma^2 \sum_{h=0}^{\infty} \sum_{j=0}^{\infty} \psi_j \psi_h s^{h-j} = \sigma^2 \Big( \sum_{h=0}^{\infty} \psi_h s^h \Big) \Big( \sum_{j=0}^{\infty} \psi_j s^{-j} \Big),

so that

\gamma(s) = \sigma^2 \psi(s) \psi(s^{-1}).
The autocovariance generating function may be used in another clever way. To see this application, let s = e^{-i\omega}. Recall that, by definition, \gamma(s) = \sum_{k=-\infty}^{\infty} \gamma_k s^k. Then

\gamma(e^{-i\omega}) = \sum_{k=-\infty}^{\infty} \gamma_k e^{-i\omega k} = 2\pi f(\omega), \quad -\pi \le \omega \le \pi.

That is, the covariance generating function is related to the spectral density function by the equation

(9.2.6)
f(\omega) = \frac{1}{2\pi} \gamma(e^{-i\omega}), \quad -\pi \le \omega \le \pi.
In our previous example, X_t = \epsilon_t + \epsilon_{t-1}, we have \gamma(s) = \sigma^2 (2 + s + s^{-1}). Thus

f(\omega) = \frac{\sigma^2}{2\pi} \big( 2 + e^{-i\omega} + e^{+i\omega} \big), \quad -\pi \le \omega \le \pi.

We may simplify this to

f(\omega) = \frac{\sigma^2}{2\pi} \big( 2 + 2\cos(\omega) \big) = \frac{\sigma^2 (1 + \cos(\omega))}{\pi}, \quad -\pi \le \omega \le \pi,

which is a particularly simple form of spectral density. Another informative example is the case of white noise, X_t = \epsilon_t. In this case \gamma(s) = \sigma^2, so that f(\omega) = \sigma^2/2\pi, a constant for -\pi \le \omega \le \pi. This gives insight into the use of the phrase, white noise. As with white light, white noise contains all frequencies with equal amplitude. The fact that the spectral density function is constant verifies that all frequencies are of equal amplitude.
The general spectral density is computed by first observing that the autocovariance generating function is given by

\gamma(s) = \sigma^2 \psi(s) \psi(s^{-1}).

Substituting e^{-i\omega} for s, we have

(9.2.7)
f(\omega) = \frac{\sigma^2}{2\pi} \psi(e^{-i\omega}) \psi(e^{i\omega}) = \frac{\sigma^2}{2\pi} |\psi(e^{-i\omega})|^2.

Thus, if we know the infinite moving average form of the general linear model, then we may calculate the spectral density function directly by (9.2.7). This is a very powerful equation, as it allows us to move from a time domain model to a frequency domain model easily. It also has the enormous advantage that if we have a finite parameter time domain model, we may easily calculate the spectral density parametrically instead of having to use the nonparametric smoothing window techniques we studied in Chapter 7.
9.3 Stationarity and Invertibility. We have already determined that for the infinite moving average representation (9.1.1) to define a stationary process, the series in (9.2.2) must converge. For this it suffices that \psi_j \to 0 fast enough that \sum_{j=0}^{\infty} |\psi_j| < \infty, so that \sigma^2 \sum_j \psi_j \psi_{j+k} converges as well. This holds whenever the generating function \psi(s) = \sum_{j=0}^{\infty} \psi_j s^j converges for |s| \le 1.

If a time series process with \psi-weights can be changed to a process with converging \pi-weights, it is said to be invertible. A series may not be invertible because the \pi weights increase instead of decrease. Consider, for example, X_t = (I - \theta B)\epsilon_t. The inverse operator is (I - \theta B)^{-1} = (I + \theta B + \theta^2 B^2 + \cdots), so that

X_t = \epsilon_t - \theta X_{t-1} - \theta^2 X_{t-2} - \cdots.

Thus \pi_j = -\theta^j, so if |\theta| > 1, the \pi weights increase. We say X_t is invertible if the generating function of the \pi weights,

\pi(s) = \sum_{j=0}^{\infty} \pi_j s^j,

converges for |s| \le 1.
In the special case of the \pi-weight representation when we have only a finite number of \pi weights, we call the process autoregressive and write

(9.3.1)
X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \epsilon_t.

We denote such a process by AR(p), and p is called the order of the process. We can write (I - \phi_1 B - \cdots - \phi_p B^p) X_t = \epsilon_t, or in operator form,

(9.3.2)
\phi(B) X_t = \epsilon_t,

where, of course, \phi(B) = (I - \phi_1 B - \cdots - \phi_p B^p).
If only a finite number of the \psi weights are non-zero, we call the process a moving average process and write

(9.3.3)
X_t = \epsilon_t - \theta_1 \epsilon_{t-1} - \theta_2 \epsilon_{t-2} - \cdots - \theta_q \epsilon_{t-q}.

We denote such a process by MA(q), where now q is called the order of the process. We can write X_t = (I - \theta_1 B - \cdots - \theta_q B^q)\epsilon_t, or in operator form,

(9.3.4)
X_t = \theta(B)\epsilon_t.
If we have more complicated data, it is sometimes necessary to use a mixed model, which is given in operator form by

(9.3.5)
\phi(B) X_t = \theta(B) \epsilon_t.

This is called a mixed autoregressive moving average process of order (p, q) and abbreviated ARMA (p, q). These three classes of models represent the most general form of finite parameter linear models for the structure of a time series process and are commonly known as Box-Jenkins models. We will study these time domain models in more detail. We note in closing this section that autoregressive processes are always invertible by definition and that moving average processes are always stationary. Autoregressive and ARMA processes may be stationary, and moving average and ARMA processes may be invertible. Conditions under which they are form the subject of our continued study.
Chapter 10
Autoregressive Models
10.1 Autoregressive Models. Let us now consider a general p-th order autoregressive model, X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t, or (I - \phi_1 B - \cdots - \phi_p B^p) X_t = \phi(B) X_t = \epsilon_t. Clearly this is an invertible process, so the question in this case is whether it is stationary. For stationarity, \psi(s) = \phi^{-1}(s) must converge for |s| \le 1. We know by definition that \phi(s) is a p-th order polynomial. Suppose we write \phi in terms of its factors

(10.1.1)
\phi(s) = (1 - u_1 s)(1 - u_2 s) \cdots (1 - u_p s).

We may expand \phi^{-1}(s) in terms of partial fractions as follows:

(10.1.2)
\frac{1}{\phi(s)} = \frac{1}{(1 - u_1 s) \cdots (1 - u_p s)} = \frac{k_1}{1 - u_1 s} + \cdots + \frac{k_p}{1 - u_p s},

so that

X_t = \phi^{-1}(B)\epsilon_t = \sum_{i=1}^{p} \frac{k_i}{1 - u_i B} \epsilon_t.

Now expand \frac{1}{1 - u_i B}\epsilon_t as (1 + u_i B + u_i^2 B^2 + \cdots)\epsilon_t, which is stationary if |u_i| < 1. Hence the p-th order autoregressive process is stationary if |u_i| < 1 for i = 1, 2, \ldots, p. Setting \phi(s) = 0, we find (1 - u_1 s) \cdots (1 - u_p s) = 0. Hence the roots of the polynomial \phi(s) are s = u_1^{-1}, \ldots, u_p^{-1}. Thus |u_i^{-1}| > 1 if and only if |u_i| < 1, so that an autoregressive process is stationary if and only if the roots of the equation, \phi(s) = 0, lie outside the unit circle. The equation, \phi(s) = 0, is called the auxiliary equation and has much practical importance in time series analysis.
Let us again consider the autoregressive model, X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t. We wish to develop an expression for the autocovariance function. Multiplying through this expression by X_{t-k}, we obtain

(10.1.3)
X_t X_{t-k} = \phi_1 X_{t-1} X_{t-k} + \cdots + \phi_p X_{t-p} X_{t-k} + \epsilon_t X_{t-k}.

Note that X_{t-k} depends only on \epsilon_{t-k}, \epsilon_{t-k-1}, \ldots and so is uncorrelated with \epsilon_t; thus E[\epsilon_t X_{t-k}] = 0 for k > 0. Taking expectations, we see that

(10.1.4)
\gamma_k = \phi_1 \gamma_{k-1} + \cdots + \phi_p \gamma_{k-p}, \quad k > 0.

Dividing (10.1.4) by \gamma_0, we obtain an expression for the autocorrelation function,

\rho_k = \phi_1 \rho_{k-1} + \cdots + \phi_p \rho_{k-p}, \quad k > 0,

or equivalently

(10.1.5)
\phi(B)\rho_k = 0.

This latter expression is a homogeneous linear difference equation whose solution is given as a linear combination of functions of the roots of the auxiliary equation,

(10.1.6)
\rho_k = A_1 u_1^{|k|} + A_2 u_2^{|k|} + \cdots + A_p u_p^{|k|}.

Using \rho_0 = 1 and \rho_{-k} = \rho_k, and letting k = 1, 2, \ldots, p in (10.1.5), we obtain

(10.1.7)
\rho_1 = \phi_1 + \phi_2 \rho_1 + \cdots + \phi_p \rho_{p-1}
\rho_2 = \phi_1 \rho_1 + \phi_2 + \cdots + \phi_p \rho_{p-2}
\vdots
\rho_p = \phi_1 \rho_{p-1} + \phi_2 \rho_{p-2} + \cdots + \phi_p.

This set of equations is known as the Yule-Walker equations, and they relate the autoregressive coefficients to the autocorrelations (or equivalently the autocovariances). In matrix form, write

\boldsymbol{\rho} = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{pmatrix}, \quad \boldsymbol{\phi} = \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix}, \quad P = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{p-1} \\ \rho_1 & 1 & \cdots & \rho_{p-2} \\ \vdots & & \ddots & \vdots \\ \rho_{p-1} & \rho_{p-2} & \cdots & 1 \end{pmatrix}.

Then the Yule-Walker equations read \boldsymbol{\rho} = P\boldsymbol{\phi}, so that \boldsymbol{\phi} = P^{-1}\boldsymbol{\rho}.
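The matrix form lends itself directly to computation. A minimal sketch, assuming numpy and illustrative AR(2) coefficients, which recovers the coefficients from the autocorrelations via \boldsymbol{\phi} = P^{-1}\boldsymbol{\rho}:

```python
import numpy as np

# For an AR(2), first compute the true autocorrelations from the
# Yule-Walker relations rho1 = phi1/(1 - phi2), rho2 = phi1*rho1 + phi2,
# then solve the system back for the coefficients.
phi1, phi2 = 0.5, 0.3
rho1 = phi1 / (1 - phi2)
rho2 = phi1 * rho1 + phi2

P = np.array([[1.0, rho1],
              [rho1, 1.0]])
rho = np.array([rho1, rho2])
phi_hat = np.linalg.solve(P, rho)   # phi = P^{-1} rho

assert np.allclose(phi_hat, [phi1, phi2])
```

In practice \rho_1, \rho_2, \ldots would be replaced by sample autocorrelations, giving the method-of-moments estimates of the \phi_i.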
We have used the basic autoregressive difference equation to derive the coefficients \phi_i in terms of the autocorrelations \rho_i. Next we wish to derive the variance, \sigma_X^2, in terms of the autocorrelations and the autoregressive coefficients. Let's consider again X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t. Multiplying by X_t, we obtain

X_t^2 = \phi_1 X_{t-1} X_t + \cdots + \phi_p X_{t-p} X_t + \epsilon_t X_t.

Noting that E[\epsilon_t X_t] = E[\epsilon_t \psi(B)\epsilon_t] = E[\epsilon_t^2] = \sigma^2, we obtain, on taking expectations,

(10.1.9)
\gamma_0 = \phi_1 \gamma_1 + \cdots + \phi_p \gamma_p + \sigma^2.

Now dividing by \gamma_0 = \sigma_X^2,

1 = \phi_1 \rho_1 + \cdots + \phi_p \rho_p + \frac{\sigma^2}{\sigma_X^2},

or

\sigma_X^2 = \frac{\sigma^2}{1 - \phi_1 \rho_1 - \cdots - \phi_p \rho_p}.

Also, noting \gamma(s) = \sigma^2 \psi(s)\psi(s^{-1}), the spectral density is

f(\omega) = \frac{\sigma^2}{2\pi} |\psi(e^{-i\omega})|^2 = \frac{\sigma^2}{2\pi} \cdot \frac{1}{|1 - \phi_1 e^{-i\omega} - \cdots - \phi_p e^{-ip\omega}|^2}.
10.2. First Order Autoregressive Process (Gauss-Markov Process). The first order autoregressive process is X_t = \phi X_{t-1} + \epsilon_t = \epsilon_t + \phi \epsilon_{t-1} + \phi^2 \epsilon_{t-2} + \cdots, where -1 < \phi < 1 for stationarity. The auxiliary equation is \phi(s) = (1 - \phi s) = 0, so that the difference equation \rho_k = \phi \rho_{k-1} has solution \rho_k = \phi^k, k \ge 0. In particular \rho_1 = \phi, so that the process is often written X_t = \rho X_{t-1} + \epsilon_t. The autocorrelation function either decays exponentially (\phi \ge 0) or oscillates, with the oscillations decaying exponentially (\phi < 0). The variance \sigma_X^2 is given by

(10.2.1)
\sigma_X^2 = \frac{\sigma^2}{1 - \phi\rho_1} = \frac{\sigma^2}{1 - \phi^2}.

The spectral density is

f(\omega) = \frac{\sigma^2}{2\pi} \cdot \frac{1}{|1 - \phi e^{-i\omega}|^2} = \frac{\sigma^2}{2\pi} \cdot \frac{1}{1 + \phi^2 - \phi(e^{-i\omega} + e^{+i\omega})}, \quad -\pi \le \omega \le \pi,

so that

f(\omega) = \frac{\sigma^2}{2\pi} \cdot \frac{1}{1 + \phi^2 - 2\phi\cos(\omega)}.
10.3 Second Order Autoregressive Process. The second order autoregressive process is

(10.3.1)
X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \epsilon_t.

For stationarity, the roots of the auxiliary equation, \phi(s) = 1 - \phi_1 s - \phi_2 s^2 = 0, must be outside the unit circle. A necessary and sufficient condition for this is that i. \phi_1 + \phi_2 < 1, ii. \phi_2 - \phi_1 < 1, and iii. -1 < \phi_2 < 1. This is a triangular region of admissibility for the coefficients \phi_1 and \phi_2. Assuming distinct roots, the solution to the difference equation, \rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}, is

(10.3.2)
\rho_k = A_1 u_1^k + A_2 u_2^k.

We need some initial conditions to solve for the unknown coefficients, A_1 and A_2. Recall that \rho_0 = 1 and, from the Yule-Walker equations, we have \rho_1 = \phi_1 + \phi_2 \rho_1. Hence \rho_1 = \phi_1/(1 - \phi_2). Now since

1 - \phi_1 s - \phi_2 s^2 = (1 - u_1 s)(1 - u_2 s) = 1 - (u_1 + u_2)s + u_1 u_2 s^2,

we can write \phi_1 = u_1 + u_2 and \phi_2 = -u_1 u_2. Thus

\rho_1 = \frac{u_1 + u_2}{1 + u_1 u_2}.

Using this together with the condition that \rho_0 = 1, we can write

(10.3.3)
\rho_0 = A_1 + A_2 = 1 \quad and \quad \rho_1 = A_1 u_1 + A_2 u_2 = \frac{u_1 + u_2}{1 + u_1 u_2}.

Note the symmetry in u_1 and u_2, A_1 and A_2. Now

A_1 u_1 + (1 - A_1)u_2 = \frac{u_1 + u_2}{1 + u_1 u_2},

which implies

A_1(u_1 - u_2) = \frac{u_1 + u_2 - u_2 - u_1 u_2^2}{1 + u_1 u_2} = \frac{u_1(1 - u_2^2)}{1 + u_1 u_2},

so that

(10.3.4)
A_1 = \frac{u_1(1 - u_2^2)}{(u_1 - u_2)(1 + u_1 u_2)}.

By symmetry,

(10.3.5)
A_2 = \frac{-u_2(1 - u_1^2)}{(u_1 - u_2)(1 + u_1 u_2)},

hence

(10.3.6)
\rho_k = \frac{(1 - u_2^2)u_1^{k+1} - (1 - u_1^2)u_2^{k+1}}{(u_1 - u_2)(1 + u_1 u_2)}.

If the roots are equal, u_1 = u_2 = u, the solution to the difference equation has the form \rho_k = (A_1 + A_2 k)u^k, so that

(10.3.7)
\rho_0 = A_1 = 1 \quad and \quad \rho_1 = (A_1 + A_2)u = \frac{2u}{1 + u^2},

which gives

A_2 = \frac{2}{1 + u^2} - 1 = \frac{1 - u^2}{1 + u^2}.

Thus

(10.3.8)
\rho_k = \Big( 1 + \frac{1 - u^2}{1 + u^2} k \Big) u^k.

Thus \rho_k either decays exponentially or is dominated by a decaying exponential.
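The closed form (10.3.6) can be checked against the defining difference equation. A minimal sketch, assuming numpy and illustrative coefficients with distinct real roots:

```python
import numpy as np

# Verify the closed form (10.3.6) for the AR(2) autocorrelation against the
# difference equation rho_k = phi1 rho_{k-1} + phi2 rho_{k-2}.
phi1, phi2 = 0.5, 0.3

# u1, u2 satisfy u1 + u2 = phi1 and u1*u2 = -phi2, i.e. they are the roots
# of u^2 - phi1 u - phi2 = 0 (the reciprocals of the auxiliary-equation roots).
u1, u2 = np.roots([1.0, -phi1, -phi2])

def rho_closed(k):
    num = (1 - u2**2) * u1**(k + 1) - (1 - u1**2) * u2**(k + 1)
    den = (u1 - u2) * (1 + u1 * u2)
    return num / den

# Recursion with rho_0 = 1 and rho_1 = phi1 / (1 - phi2).
rho = [1.0, phi1 / (1 - phi2)]
for k in range(2, 10):
    rho.append(phi1 * rho[-1] + phi2 * rho[-2])

for k in range(10):
    assert abs(rho_closed(k) - rho[k]) < 1e-10
```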
Finally, if the roots are complex conjugate pairs, we can write u_1 = |u|e^{i\theta_0} and u_2 = |u|e^{-i\theta_0}. Since |u_i| < 1, we also have |u| < 1. Now

\rho_k = A_1 |u|^k e^{ik\theta_0} + A_2 |u|^k e^{-ik\theta_0}.

Taking complex conjugates,

\bar{\rho}_k = \rho_k = \bar{A}_1 |u|^k e^{-ik\theta_0} + \bar{A}_2 |u|^k e^{ik\theta_0}.

Since \rho_k is real, we can identify \bar{A}_1 = A_2 and \bar{A}_2 = A_1. Let A_1 = \frac{1}{2}(a - bi) and A_2 = \frac{1}{2}(a + bi). Then

\rho_k = \tfrac{1}{2}(a - bi)|u|^k e^{ik\theta_0} + \tfrac{1}{2}(a + bi)|u|^k e^{-ik\theta_0}.

Collecting coefficients, we have

(10.3.9)
\rho_k = |u|^k \Big\{ a \, \frac{e^{ik\theta_0} + e^{-ik\theta_0}}{2} + b \, \frac{e^{ik\theta_0} - e^{-ik\theta_0}}{2i} \Big\},

so that

(10.3.10)
\rho_k = |u|^k \big( a\cos(k\theta_0) + b\sin(k\theta_0) \big),

where a and b can be written in terms of A_1. Once again we find \rho_k is dominated by a decaying exponential.
Using the Yule-Walker equations,

(10.3.11)
\rho_1 = \phi_1 + \phi_2 \rho_1 \quad and \quad \rho_2 = \phi_1 \rho_1 + \phi_2.

From this,

(10.3.12)
\phi_1 = \frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}

and

(10.3.13)
\phi_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}.

The variance \sigma_X^2 is given by (10.1.9) as

(10.3.14)
\sigma_X^2 = \frac{\sigma^2}{1 - \phi_1\rho_1 - \phi_2\rho_2},

which, in terms of the coefficients alone, may be written

(10.3.15)
\sigma_X^2 = \Big( \frac{1 - \phi_2}{1 + \phi_2} \Big) \frac{\sigma^2}{(1 - \phi_2)^2 - \phi_1^2}.
The spectral density is

(10.3.16)
f(\omega) = \frac{\sigma^2}{2\pi} \cdot \frac{1}{|1 - \phi_1 e^{-i\omega} - \phi_2 e^{-2i\omega}|^2}

(10.3.17)
= \frac{\sigma^2}{2\pi} \cdot \frac{1}{1 + \phi_1^2 + \phi_2^2 + 2\phi_1(\phi_2 - 1)\frac{e^{i\omega} + e^{-i\omega}}{2} - 2\phi_2 \frac{e^{2i\omega} + e^{-2i\omega}}{2}}.

Using trigonometric identities once again, we obtain the closed form for the second order autoregressive spectral density

(10.3.18)
f(\omega) = \frac{\sigma^2}{2\pi} \cdot \frac{1}{1 + \phi_1^2 + \phi_2^2 - 2\phi_1(1 - \phi_2)\cos(\omega) - 2\phi_2\cos(2\omega)}.
Thus the theory of autoregressive processes is quite complete with respect to expressing the autoregressive parameters \phi_1, \ldots, \phi_p, \sigma^2, and, by extension, f(\omega), in terms of the autocovariances, \gamma_k. Since we have the ability to estimate \gamma_k directly from the data, we have a reasonably satisfactory approach to estimating the autoregressive structure in general. Two facts are evident in this discussion. First, we are estimating functions of the moments, that is, functions of the covariances. This is essentially applying the method of moments, which, although it has a long history, is certainly not guaranteed to give us any statistical optimality. It would be better to consider least squares or maximum likelihood techniques if available. We shall do this later. Also, left unsaid was how to determine the order, p, of the process. We address this issue by the use of partial autocorrelations.
10.4 Partial Autocorrelations. The autocorrelation, \rho_k, is a valuable diagnostic tool in fitting time series models. As we have just seen, \rho_k is a function which either is a decreasing exponential function or at least is dominated by a decreasing exponential function whenever X_t is an autoregressive process. A second function which will serve as a diagnostic aid is the partial autocorrelation function of lag k, \phi_{kk}. To define this, let \phi_{kj} be the j-th coefficient of an autoregressive process of order k. Thus, our standard k-th order autoregressive model becomes

\rho_j = \phi_{k1}\rho_{j-1} + \phi_{k2}\rho_{j-2} + \cdots + \phi_{kk}\rho_{j-k}, \quad j = 1, 2, \ldots, k.

From the Yule-Walker equations, we have

(10.4.1)
\boldsymbol{\phi}_k = P_k^{-1}\boldsymbol{\rho}_k,

so that \phi_{kk} can be easily picked out. Note \phi_{11} = \rho_1. Also worth noting is that if X_t is truly an autoregressive process of order p, then \phi_{kk} will be non-zero only for k \le p.

The partial autocorrelations may be estimated by fitting autoregressive processes of orders 1, 2, 3, \ldots and picking out the estimates \hat{\phi}_{11}, \hat{\phi}_{22}, \hat{\phi}_{33}, \ldots. If X_t is an autoregressive process of order p, then for k \ge p + 1, \hat{\phi}_{kk} should approximate 0, and its variance is given by

var[\hat{\phi}_{kk}] \approx \frac{1}{n},

so that the standard error of \hat{\phi}_{kk} is approximated by

(10.4.2)
SE[\hat{\phi}_{kk}] \approx \frac{1}{\sqrt{n}}, \quad k \ge p + 1.

As a diagnostic tool, we can thus plot \hat{\phi}_{kk} as a function of k. By drawing in the 2 SE limits, we can compare the values of \hat{\phi}_{kk} to the standard error estimates. If \hat{\phi}_{kk} lies well within the two standard error limits, it is probably estimating \phi_{kk} = 0, and hence the process is of order less than k.
Chapter 11
Moving Average Processes
11.1 Moving Average Processes. We now turn our attention to a moving average process of order q given by the expression, X_t = \epsilon_t - \theta_1 \epsilon_{t-1} - \cdots - \theta_q \epsilon_{t-q}. Expressing this in terms of a backwards shift operator, we obtain

(11.1.1)
X_t = (I - \theta_1 B - \cdots - \theta_q B^q)\epsilon_t = \theta(B)\epsilon_t.

Recall that a moving average process is always stationary, but may not be invertible. However, in its inverted form we would have \epsilon_t = \theta^{-1}(B) X_t. If we let

\theta(s) = \prod_{i=1}^{q} (1 - v_i s)

and expand as we did in the autoregressive case using partial fractions, we obtain

\pi(s) = \theta^{-1}(s) = \sum_{i=1}^{q} \frac{m_i}{1 - v_i s}.

Each term expands as m_i(1 + v_i s + v_i^2 s^2 + \cdots), which converges for |s| \le 1 if |v_i| < 1, i = 1, 2, \ldots, q. Or equivalently, X_t is invertible if the roots of the equation, \theta(s) = 0, lie outside the unit circle.
Just as the autoregressive process was always invertible because its \pi-weight generating function is the finite polynomial \phi(s), so too is the moving average process always stationary because its \psi-weight generating function is the finite polynomial \theta(s). It is worthwhile to note the following covariance computation; it recurs frequently. To compute the covariance of the MA(q) process, write

E[X_t X_{t-k}] = E[(\epsilon_t - \theta_1\epsilon_{t-1} - \cdots - \theta_q\epsilon_{t-q})(\epsilon_{t-k} - \theta_1\epsilon_{t-k-1} - \cdots - \theta_q\epsilon_{t-k-q})].

Hence, since only terms with identical subscripts on the \epsilon will contribute to the covariance, we have

(11.1.2)
\gamma_0 = (1 + \theta_1^2 + \cdots + \theta_q^2)\sigma^2

and

(11.1.3)
\gamma_k = (-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q)\sigma^2 for k = 1, 2, \ldots, q, and 0 for k \ge q + 1.

The autocorrelation function is thus

(11.1.4)
\rho_k = \frac{-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2} for k = 1, 2, \ldots, q, and 0 for k \ge q + 1.

Notice for the MA(q) process, the autocorrelations behave like the partial autocorrelations do for the AR(p) in the sense that they are non-zero up to some cut-off point q and then zero from then on.
Similarly, for the moving average process the partial autocorrelations will be dominated by an exponential function. Again the duality between autoregressive and moving average processes shows up. Comparison of these two functions, the autocorrelation and the partial autocorrelation, frequently allows us to distinguish between data which follow autoregressive and moving average processes. Since \psi(s) = \theta(s) = 1 - \theta_1 s - \cdots - \theta_q s^q, the spectral density function is

(11.1.5)
f(\omega) = \frac{\sigma^2}{2\pi}|\theta(e^{-i\omega})|^2 = \frac{\sigma^2}{2\pi}|1 - \theta_1 e^{-i\omega} - \cdots - \theta_q e^{-iq\omega}|^2.
11.2 The First Order Moving Average Process. It is instructive to consider several simple cases of the moving average process. We begin with the first order moving average, X_t = \epsilon_t - \theta\epsilon_{t-1}. In the operator form we have

(11.2.1)
X_t = (I - \theta B)\epsilon_t = \theta(B)\epsilon_t.

For invertibility, -1 < \theta < 1, since the root of \theta(s) = 1 - \theta s = 0 is 1/\theta. The covariances are given by

(11.2.2)
\gamma_0 = (1 + \theta^2)\sigma^2

and

(11.2.3)
\rho_1 = \frac{-\theta}{1 + \theta^2}, \quad \rho_k = 0, \ k \ge 2.

Of course, as always, \rho_0 = 1. We note from this equation, \rho_1 = -\theta/(1 + \theta^2), that \rho_1 and \theta are related in a non-linear fashion. In fact, if \theta is a solution, then so is 1/\theta, since

\frac{-\frac{1}{\theta}}{1 + \frac{1}{\theta^2}} = \frac{-\theta}{\theta^2 + 1}.
The spectral density function for the first order moving average process is

(11.2.4)
f(\omega) = \frac{\sigma^2}{2\pi}|1 - \theta e^{-i\omega}|^2 = \frac{\sigma^2}{2\pi}\big( 1 + \theta^2 - \theta(e^{-i\omega} + e^{+i\omega}) \big),

which reduces to

(11.2.5)
f(\omega) = \frac{\sigma^2}{2\pi}\big( 1 + \theta^2 - 2\theta\cos(\omega) \big).
We can calculate the partial autocorrelations from the Yule-Walker equations and our expressions for \rho_k. In particular,

P_k = \begin{pmatrix} 1 & \frac{-\theta}{1+\theta^2} & 0 & \cdots & 0 \\ \frac{-\theta}{1+\theta^2} & 1 & \frac{-\theta}{1+\theta^2} & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & \frac{-\theta}{1+\theta^2} & 1 \end{pmatrix} \quad and \quad \boldsymbol{\rho}_k = \begin{pmatrix} \frac{-\theta}{1+\theta^2} \\ 0 \\ \vdots \\ 0 \end{pmatrix},

so that solving \boldsymbol{\phi}_k = P_k^{-1}\boldsymbol{\rho}_k yields

\phi_{kk} = \frac{-\theta^k(1 - \theta^2)}{1 - \theta^{2(k+1)}},

which is dominated by a decreasing exponential. Notice that the partial autocorrelation for the moving average process has the damped exponential property that the autocorrelation of the autoregressive process has, and simultaneously the autocorrelation of the moving average process has the cutoff property that the partial autocorrelation of the autoregressive process has.
11.3 The Second Order Moving Average Process. The second order moving average process is X_t = \epsilon_t - \theta_1\epsilon_{t-1} - \theta_2\epsilon_{t-2}. Of course it is automatically stationary. In analogy with the AR(2) process, it will be invertible if and only if the roots of the equation, 1 - \theta_1 s - \theta_2 s^2 = 0, lie outside the unit circle. Equivalently, again in analogy to the stationarity conditions for the AR(2) case, it is invertible if and only if i. \theta_1 + \theta_2 < 1, ii. \theta_2 - \theta_1 < 1, and iii. -1 < \theta_2 < 1. The variance of the second order moving average process is

(11.3.1)
\gamma_0 = (1 + \theta_1^2 + \theta_2^2)\sigma^2.

The autocorrelations are

(11.3.2)
\rho_1 = \frac{-\theta_1(1 - \theta_2)}{1 + \theta_1^2 + \theta_2^2}, \quad \rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}, \quad and \quad \rho_k = 0, \ k \ge 3.

The spectral density is

(11.3.3)
f(\omega) = \frac{\sigma^2}{2\pi}|1 - \theta_1 e^{-i\omega} - \theta_2 e^{-2i\omega}|^2 = \frac{\sigma^2}{2\pi}\big( 1 + \theta_1^2 + \theta_2^2 - 2\theta_1(1 - \theta_2)\cos(\omega) - 2\theta_2\cos(2\omega) \big), \quad -\pi \le \omega \le \pi.
Again it is clear that the basic theory for moving average time series is in hand. The major difficulty is that, in general, the relationship between the moving average coefficients, \theta_i, and the covariance function, \gamma_k, is nonlinear, whereas the relationship between the autoregressive coefficients, \phi_i, and the covariance function, \gamma_k, is given in a linear fashion by the Yule-Walker equations. For this reason, as well as their analogy to differential equations modeling physical systems, the autoregressive models seem to be somewhat preferred. Nonetheless, the autoregressive models may not be a very economical representation. The most general finite parameter representation of a linear process is given by the mixed autoregressive-moving average models, which we study in the next section.
11.4 Mixed Processes - Autoregressive-Moving Average Processes. In order to achieve a representation of a time series data set with a finite number of parameters, we can consider a mixed model

(11.4.1)
\phi(B) X_t = \theta(B)\epsilon_t.

Writing X_t = \phi^{-1}(B)\theta(B)\epsilon_t, we infer that the process will be stationary if \phi^{-1}(B) exists, that is, if the roots of the auxiliary equation, \phi(s) = 0, lie outside the unit circle. By the same token,

(11.4.2)
\theta^{-1}(B)\phi(B) X_t = \epsilon_t,

so that the process is invertible if \theta^{-1}(B) exists, that is, if the roots of the auxiliary equation, \theta(s) = 0, lie outside the unit circle.
Next, we multiply (11.4.1) by X_{t-k} to obtain

(11.4.3)
X_t X_{t-k} = \phi_1 X_{t-1}X_{t-k} + \cdots + \phi_p X_{t-p}X_{t-k} + \epsilon_t X_{t-k} - \theta_1\epsilon_{t-1}X_{t-k} - \cdots - \theta_q\epsilon_{t-q}X_{t-k}.

Notice that E[\epsilon_{t-j} X_{t-k}] = 0 provided t - j > t - k, that is, provided j < k. Write E[\epsilon_{t-j} X_{t-k}] = \gamma_{\epsilon X}(k - j), so that \gamma_{\epsilon X}(j) = 0 for j > 0 and \gamma_{\epsilon X}(j) \ne 0 for j \le 0. Taking expectations in (11.4.3),

(11.4.4)
\gamma_k = \phi_1\gamma_{k-1} + \cdots + \phi_p\gamma_{k-p} + \gamma_{\epsilon X}(k) - \theta_1\gamma_{\epsilon X}(k-1) - \cdots - \theta_q\gamma_{\epsilon X}(k-q).

Hence if k - q > 0, that is, k > q,

(11.4.5)
\gamma_k = \phi_1\gamma_{k-1} + \cdots + \phi_p\gamma_{k-p},

and so

(11.4.6)
\rho_k = \phi_1\rho_{k-1} + \cdots + \phi_p\rho_{k-p}, \quad k > q,

or

(11.4.7)
\phi(B)\rho_k = 0, \quad k > q.

Thus there are q autocorrelations \rho_1, \ldots, \rho_q which depend on \theta_1, \ldots, \theta_q as well as \phi_1, \ldots, \phi_p. The remaining autocorrelations satisfy the usual difference equation, \phi(B)\rho_k = 0, and, hence, will be exponentially decaying functions, or at least dominated by an exponentially decaying function. When k = 0,

\gamma_0 = \phi_1\gamma_1 + \cdots + \phi_p\gamma_p + \sigma^2 - \theta_1\gamma_{\epsilon X}(-1) - \cdots - \theta_q\gamma_{\epsilon X}(-q).

Hence to solve for \gamma_0 we must solve this simultaneously with the p equations (11.4.4) for k = 1, 2, \ldots, p.
The spectral density of the ARMA process is

(11.4.8)
f(\omega) = \frac{\sigma^2}{2\pi} \frac{|\theta(e^{-i\omega})|^2}{|\phi(e^{-i\omega})|^2}, \quad -\pi \le \omega \le \pi.

Because \theta(s) and \phi(s) are polynomials, this is sometimes referred to as a rational spectral density (rational in polynomials). It is easy to see that rational spectral densities can approximate a very large class of functions, so that the ARMA model is, indeed, a very powerful one. We may expand these polynomials to

(11.4.9)
f(\omega) = \frac{\sigma^2}{2\pi} \frac{|1 - \theta_1 e^{-i\omega} - \cdots - \theta_q e^{-iq\omega}|^2}{|1 - \phi_1 e^{-i\omega} - \cdots - \phi_p e^{-ip\omega}|^2}.

Finally, since \epsilon_t = \theta^{-1}(B)\phi(B)X_t and \theta^{-1}(s) is an infinite series, the partial autocorrelation is infinite in extent and eventually behaves like the partial autocorrelation of a pure moving average process. It is perhaps worth examining the ARMA (1, 1) in a bit more detail.
11.5 The ARMA (1, 1) Process. To understand the generality of the ARMA (p, q) process, it is worthwhile examining the ARMA (1, 1) process. This is particularly useful since the parameter estimation problem for the ARMA (p, q) is in general quite messy, and the ARMA (1, 1) illustrates the principles a bit more clearly. The ARMA (1, 1) is given by

(11.5.1)
X_t - \phi X_{t-1} = \epsilon_t - \theta\epsilon_{t-1}.

The process is stationary if -1 < \phi < 1 and invertible if -1 < \theta < 1. Multiplying by X_{t-k} and taking expectations as before yields

(11.5.2)
\gamma_0 = \phi\gamma_1 + \sigma^2 - \theta\gamma_{\epsilon X}(-1), \quad \gamma_1 = \phi\gamma_0 - \theta\sigma^2, \quad \gamma_k = \phi\gamma_{k-1}, \ k \ge 2.

Also note

(11.5.3)
\gamma_{\epsilon X}(-1) = E[\epsilon_{t-1}X_t] = E[\epsilon_{t-1}(\phi X_{t-1} + \epsilon_t - \theta\epsilon_{t-1})] = (\phi - \theta)\sigma^2.

Thus, we may substitute the expression in (11.5.3) for \gamma_{\epsilon X}(-1) into (11.5.2) to obtain

(11.5.4)
\gamma_0 = \phi\gamma_1 + (1 - \theta\phi + \theta^2)\sigma^2, \quad \gamma_1 = \phi\gamma_0 - \theta\sigma^2.

By a simple substitution,

(11.5.8)
\gamma_0 = \frac{(1 + \theta^2 - 2\phi\theta)\sigma^2}{1 - \phi^2}.

It follows that

\gamma_1 = \frac{(1 - \phi\theta)(\phi - \theta)\sigma^2}{1 - \phi^2},

and hence

\rho_1 = \frac{(1 - \phi\theta)(\phi - \theta)}{1 + \theta^2 - 2\phi\theta}, \quad \rho_k = \phi\rho_{k-1}, \ k \ge 2.
The above discussions allow us to estimate the parameters of an autoregressive process, a moving average process or a mixed process once we have settled on a model. The model building process is not a trivial one, and fundamental questions as to model order and type must be addressed. We have suggested the use of the autocorrelation function and the partial autocorrelation function as tools to this end. The following table summarizes the behavior of the different linear models.
Model in terms of X's:
    AR: \phi(B)X_t = \epsilon_t;  MA: \theta^{-1}(B)X_t = \epsilon_t;  ARMA: \theta^{-1}(B)\phi(B)X_t = \epsilon_t

Model in terms of \epsilon's:
    AR: X_t = \phi^{-1}(B)\epsilon_t;  MA: X_t = \theta(B)\epsilon_t;  ARMA: X_t = \phi^{-1}(B)\theta(B)\epsilon_t

\pi weights:
    AR: finite series;  MA: infinite series;  ARMA: infinite series

\psi weights:
    AR: infinite series;  MA: finite series;  ARMA: infinite series

Stationarity conditions:
    AR: roots of \phi(s) = 0 outside the unit circle;  MA: always stationary;  ARMA: roots of \phi(s) = 0 outside the unit circle

Invertibility conditions:
    AR: always invertible;  MA: roots of \theta(s) = 0 outside the unit circle;  ARMA: roots of \theta(s) = 0 outside the unit circle

Autocorrelation function:
    AR: infinite, tails off as damped exponentials and/or sinusoids;  MA: finite, cuts off after lag q;  ARMA: infinite, tails off after the first q lags

Partial autocorrelation function:
    AR: finite, cuts off after lag p;  MA: infinite, tails off, dominated by damped exponentials;  ARMA: infinite, tails off, eventually like a pure moving average
Chapter 12
Nonstationary Processes and Time Series
12.1 Nonstationary Time Series Models. In the previous chapter, we introduced the ARMA model

(12.1.1)
\phi(B)X_t = \theta(B)\epsilon_t,

with \phi(s) and \theta(s) polynomials in s of degree p and q respectively. We assumed that \phi(s) = 0 had roots outside the unit circle so that the resulting process would be stationary. Suppose now we allow the possibility of roots on or inside the unit circle. For simplicity, consider the AR(1) model

(1 - \phi B)X_t = \epsilon_t.

If \phi is taken as 2, the root of 1 - 2s = 0 is 1/2, inside the unit circle, and the series increases or decreases exponentially, essentially with no effect from the random input, \epsilon_t. We will often refer to the \epsilon_t as random shocks or just shocks. Similarly, in any model for which one or more of the roots of \phi(s) = 0 lie strictly within the unit circle, the process will behave in this exponential manner essentially independently of the \epsilon_t's. In this circumstance the process is essentially deterministic and there is little need for a statistical analysis. We consider then only those cases where the roots lie on the unit circle.
In particular, we now entertain the model

(12.1.2)
\phi(B)(I - B)^d X_t = \theta(B)\epsilon_t.

Here \phi(B) is an ordinary stationary autoregressive operator. We write I - B = \nabla, the difference operator. Thus our model can be written \phi(B)\nabla^d X_t = \theta(B)\epsilon_t, or \phi(B)Y_t = \theta(B)\epsilon_t with Y_t = \nabla^d X_t. After proper differencing of the process X_t, we have the usual ARMA (p, q) model. Now X_t = S^d Y_t, where S is the infinite summation operator defined by

(12.1.3)
S X_t = \sum_{h=-\infty}^{t} X_h = (1 + B + B^2 + B^3 + \cdots)X_t,

which may be written as

S X_t = (1 - B)^{-1} X_t = \nabla^{-1} X_t.

Similarly,

S^2 X_t = \sum_{h=-\infty}^{t} \sum_{g=-\infty}^{h} X_g,

and so on for higher powers of S. Differencing is very much analogous to differentiation, while summing is analogous to integration. For this reason the model (12.1.2) is called the autoregressive integrated moving average process of order (p, d, q), abbreviated ARIMA (p, d, q).
One important feature of the nonstationary processes of the type we seek to analyze with the ARIMA model is that the local behavior must be independent of the level of the process. That is, we want

(12.1.5)
\varphi(B)(X_t + c) = \varphi(B)X_t

for the generalized autoregressive operator \varphi(B). If \varphi(B) = \phi(B)\nabla, then

\phi(B)\nabla(X_t + c) = \phi(B)[X_t + c - X_{t-1} - c] = \phi(B)\nabla X_t,

which is independent of level. Thus, for a process which is homogeneous except in level, the model

(12.1.6)
\phi(B)\nabla X_t = \theta(B)\epsilon_t

is appropriate. If, in contrast, a series has neither fixed level nor fixed slope, we require \varphi(B)(X_t + mt + c) = \varphi(B)X_t, which is satisfied by \varphi(B) = \phi(B)\nabla^2, and the model becomes

(12.1.7)
\phi(B)\nabla^2 X_t = \theta_0 + \theta(B)\epsilon_t.

The operator \phi(B) is the autoregressive operator, and it is assumed that its roots lie outside the unit circle. The operator \varphi(B) = \phi(B)\nabla^d is the generalized autoregressive operator, and finally \theta(B) is the moving average operator, and it is assumed that its roots also lie outside the unit circle. If we wish to include a deterministic function of time as a trend, this can be done by choosing \theta_0 non-zero. For example, if d = 1, we can estimate a deterministic linear trend in the presence of nonstationary noise.
12.2 Some Important Special Cases. We consider the following special cases in some detail: i. ARIMA (0, 1, 1), \nabla X_t = \epsilon_t - \theta\epsilon_{t-1} = (I - \theta B)\epsilon_t; ii. ARIMA (0, 2, 2), \nabla^2 X_t = \epsilon_t - \theta_1\epsilon_{t-1} - \theta_2\epsilon_{t-2} = (I - \theta_1 B - \theta_2 B^2)\epsilon_t; and iii. ARIMA (1, 1, 1), \nabla X_t - \phi\nabla X_{t-1} = \epsilon_t - \theta\epsilon_{t-1}, or (I - \phi B)\nabla X_t = (I - \theta B)\epsilon_t. Let us now write, for the generalized autoregressive operator,

(12.2.1)
\varphi(B) = \phi(B)\nabla^d.

For the ARIMA (1, 1, 1), for example, \varphi(B) = (I - \phi B)(I - B), so that the model may be written

\{I - (1 + \phi)B + \phi B^2\}X_t = (I - \theta B)\epsilon_t.
It is sometimes convenient to represent the ARIMA models as an infinite moving average, that is, with \psi weights. For nonstationary time series models, the \psi-weight representation does not converge. Thus, strictly speaking, the following discussion is only a formal computation. It may be used to suggest relationships among the \psi-weights, the \pi-weights, the \phi-weights and the \theta-weights, but because we are manipulating nonconvergent series, these relationships must be verified by more rigorous calculations. We may formally write X_t in the infinite moving average form

X_t = \psi(B)\epsilon_t,

so that

\varphi(B)X_t = \varphi(B)\psi(B)\epsilon_t.

But since \varphi(B)X_t = \theta(B)\epsilon_t, we have \theta(B) = \varphi(B)\psi(B). Equating coefficients of B^j on both sides, and noting that \theta(B) has degree q, we conclude that for j sufficiently large,

\varphi(B)\psi_j = 0.

Thus for j sufficiently large, the \psi_j satisfy the same homogeneous difference equation as the covariance function, \gamma_k, and, hence, can be written as a mixture of polynomials, exponentials and sinusoids in the argument j.
Consider, for example, the ARIMA (1, 1, 1). Here

\varphi(B) = I - (1 + \phi)B + \phi B^2 \quad and \quad \theta(B) = I - \theta B,

so that, equating coefficients of B in \theta(B) = \varphi(B)\psi(B),

\psi_1 - (1 + \phi) = -\theta,

and, for j \ge 2,

\psi_j - (1 + \phi)\psi_{j-1} + \phi\psi_{j-2} = 0.

This is a second order difference equation with roots 1 and \phi, so that \psi_j = A_0 + A_1\phi^j, j \ge 1. Solving this simultaneously with \psi_1 = 1 + \phi - \theta yields

A_0 = \frac{1 - \theta}{1 - \phi} \quad and \quad A_1 = \frac{\theta - \phi}{1 - \phi}.

Formally, then,

X_t = \sum_{j=0}^{\infty} (A_0 + A_1\phi^j)\epsilon_{t-j},

with the above values for A_0 and A_1 (and \psi_0 = 1). Notice that for A_0 \ne 0, this is not a convergent series, and so, in some sense, this infinite series representation is nonsense. Even so, it will have some utility in establishing forecasting procedures for the nonstationary time series models.
Finally, we can write the ARIMA time series model in its inverted form, \pi(B)X_t = \epsilon_t, or

X_t = \pi_1 X_{t-1} + \pi_2 X_{t-2} + \cdots + \epsilon_t.

As before, operating on both sides with \theta(B),

\theta(B)\pi(B)X_t = \theta(B)\epsilon_t = \varphi(B)X_t,

so that \varphi(B) = \theta(B)\pi(B). Now \varphi(s) = \phi(s)(1 - s)^d, so that \varphi(1) = 0 whenever d > 0. Since \theta(1) \ne 0, we must have \pi(1) = 0, that is, 1 - \sum_{j=1}^{\infty}\pi_j = 0, or equivalently

(12.2.4)
\sum_{j=1}^{\infty} \pi_j = 1.

Thus if d > 0, the \pi-weights must sum to 1. They are not necessarily positive, so that they may not form a probability distribution.
12.3 Integrated Moving Average Process Models. In this section we wish to consider integrated moving average processes (IMA) of the form \nabla^d X_t = \theta(B)\epsilon_t. Two models of particular interest are the ARIMA (0, 1, 1) = IMA (1, 1),

X_t = X_{t-1} + \epsilon_t - \theta\epsilon_{t-1},

and the ARIMA (0, 2, 2) = IMA (2, 2),

X_t = 2X_{t-1} - X_{t-2} + \epsilon_t - \theta_1\epsilon_{t-1} - \theta_2\epsilon_{t-2}.

First the IMA (1, 1). Let us note that we may write

I - \theta B = (1 - \theta)B + (I - B) = (1 - \theta)B + \nabla = \lambda B + \nabla,

with \lambda = 1 - \theta. Thus the model \nabla X_t = (I - \theta B)\epsilon_t becomes \nabla X_t = \lambda\epsilon_{t-1} + \nabla\epsilon_t. Applying the summation operator S to both sides,

X_t = \lambda \sum_{j=1}^{\infty} \epsilon_{t-j} + \epsilon_t.

The \psi-weights are therefore \psi_0 = 1 and \psi_j = \lambda = 1 - \theta, j \ge 1.
We notice that the \pi-weight generating function is

\pi(s) = \frac{1 - s}{1 - \theta s}.

Since 1 - s = (1 - \theta s) - (1 - \theta)s, we have

\pi(s) = 1 - \frac{(1 - \theta)s}{1 - \theta s} = 1 - s(1 - \theta)(1 + \theta s + \theta^2 s^2 + \cdots).

Thus the \pi-weights are

\pi_j = (1 - \theta)\theta^{j-1}, \quad j \ge 1,

so that

X_t = (1 - \theta)\sum_{j=1}^{\infty} \theta^{j-1} X_{t-j} + \epsilon_t.

Thus X_t is an exponentially weighted moving average (EWMA) of previous values.
This is sometimes also called an exponential smoother or an exponential filter. For this case, we can define the level of the process at time t by

\bar{X}_t(\lambda) = \lambda \sum_{j=0}^{\infty} (1 - \lambda)^j X_{t-j}.

Splitting off the first term,

\bar{X}_t(\lambda) = \lambda X_t + (1 - \lambda)\lambda \sum_{j=0}^{\infty} (1 - \lambda)^j X_{t-1-j},

so that

\bar{X}_t(\lambda) = \lambda X_t + (1 - \lambda)\bar{X}_{t-1}(\lambda).

The function, \bar{X}_t(\lambda), may be thought of as the level of the process at time t. Since -1 < \theta < 1, we have that 0 < \lambda < 2. In most cases 0 < \lambda < 1. If \lambda is close to 0, the level is almost constant. If \lambda is close to 1, the level is determined almost exclusively by the most recent observation, X_t. Since

X_t = \bar{X}_{t-1}(\lambda) + \epsilon_t

and

\bar{X}_t(\lambda) = \lambda X_t + (1 - \lambda)\bar{X}_{t-1}(\lambda),

we have

\bar{X}_t(\lambda) = \lambda(\bar{X}_{t-1}(\lambda) + \epsilon_t) + (1 - \lambda)\bar{X}_{t-1}(\lambda),

or

\bar{X}_t(\lambda) = \bar{X}_{t-1}(\lambda) + \lambda\epsilon_t.

Thus, while the process itself has the random shock, \epsilon_t, the level of the process has random shock, \lambda\epsilon_t, which may be more or less than \epsilon_t depending on \lambda.
A similar development holds for the IMA (2, 2) process, \nabla^2 X_t = (I - \theta_1 B - \theta_2 B^2)\epsilon_t, where now it is convenient to reparametrize by

\theta_1 = 2 - \lambda_0 - \lambda_1 \quad and \quad \theta_2 = \lambda_0 - 1,

or in inverse form,

\lambda_0 = 1 + \theta_2 \quad and \quad \lambda_1 = 1 - \theta_1 - \theta_2.

Applying the summation operator S twice to \nabla^2 X_t = (I - \theta_1 B - \theta_2 B^2)\epsilon_t, we obtain

X_t = \lambda_0 S\epsilon_{t-1} + \lambda_1 S^2\epsilon_{t-1} + \epsilon_t.

But

S\epsilon_{t-1} = \epsilon_{t-1} + \epsilon_{t-2} + \epsilon_{t-3} + \cdots,

so that

S^2\epsilon_{t-1} = \epsilon_{t-1} + 2\epsilon_{t-2} + 3\epsilon_{t-3} + \cdots.

We may expand to get the \psi-weights

\psi_0 = 1 \quad and \quad \psi_j = \lambda_0 + j\lambda_1, \quad j \ge 1.

The \pi-weights may be obtained, as in the ARIMA (1, 1, 1) example, from \varphi(B) = \theta(B)\pi(B) with \varphi(B) = (I - B)^2 and \theta(B) = I - \theta_1 B - \theta_2 B^2; in particular, for large lags they satisfy the homogeneous difference equation

(1 - \theta_1 B - \theta_2 B^2)\pi_j = 0, \quad j \ge 3.
Chapter 13
Least Squares Forecasts
13.1 Linear Forecasting Functions. We now wish to consider forecasting based on the general ARIMA process model

\varphi(B)X_t = \phi(B)\nabla^d X_t = \theta(B)\epsilon_t.

We suppose we are currently at time t and we wish to forecast m steps ahead, X_{t+m}. We shall write the forecast of X_{t+m} as \hat{X}_t(m). We wish to write \hat{X}_t(m) as a linear function of the past values of the time series data we know, that is, X_t, X_{t-1}, X_{t-2}, \ldots. Equivalently, we may write \hat{X}_t(m) as a linear function of \epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \ldots. We will assume an infinite moving average representation exists. For a nonstationary model this representation doesn't converge. Hence, for nonstationary models, the following discussion is a formal representation, but not a rigorous proof.
We wish to make a linear representation of the process depending on the past values \epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \ldots, which we do by setting

(13.1.1)
\hat{X}_t(m) = \psi_m^* \epsilon_t + \psi_{m+1}^* \epsilon_{t-1} + \psi_{m+2}^* \epsilon_{t-2} + \cdots.

Here, the \psi_j^* are weights on the \epsilon_t which may or may not be related to the \psi weights. We do know, however, that

(13.1.2)
X_{t+m} = \epsilon_{t+m} + \psi_1\epsilon_{t+m-1} + \cdots + \psi_{m-1}\epsilon_{t+1} + \psi_m\epsilon_t + \psi_{m+1}\epsilon_{t-1} + \cdots.
Subtracting \hat{X}_t(m) from X_{t+m}, squaring, and then taking expectations yields

(13.1.3)
E[X_{t+m} - \hat{X}_t(m)]^2 = (1 + \psi_1^2 + \cdots + \psi_{m-1}^2)\sigma^2 + \sum_{j=0}^{\infty} (\psi_{m+j} - \psi_{m+j}^*)^2 \sigma^2.

This is clearly minimized by choosing \psi_{m+j}^* = \psi_{m+j}, j = 0, 1, 2, \ldots, so that the forecast error is

(13.1.4)
e_t(m) = X_{t+m} - \hat{X}_t(m) = \epsilon_{t+m} + \psi_1\epsilon_{t+m-1} + \cdots + \psi_{m-1}\epsilon_{t+1}.
We call e_t(m) the forecast error of lead time m. Let us write E[X_{t+m} | X_t, X_{t−1}, …] as E[X_{t+m} | t]. We first notice that
    E[X_{t+m} | t] = E[Σ_{j=0}^∞ ψ_j a_{t+m−j} | t] = Σ_{j=0}^∞ ψ_j E[a_{t+m−j} | t] = Σ_{j=m}^∞ ψ_j a_{t+m−j} = X̂_t(m).
Hence X̂_t(m) is the best linear estimator of X_{t+m} given the data up to time, t. We may now examine
(13.1.5)    E[e_t(m) | t] = 0.
Hence the expected forecast error is 0 and X̂_t(m) is the minimum mean square error estimator and is unbiased. Hence X̂_t(m) is the best estimator in the Rao-Blackwell sense. Let us also notice that
    var[e_t(m) | t] = (1 + ψ_1² + ⋯ + ψ_{m−1}²)σ_a².
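The forecast error variance is easy to compute once the ψ-weights are known. A small sketch (the function name and list convention are ours, not from the text):

```python
def forecast_error_variance(psi, m, sigma2=1.0):
    """var[e_t(m)] = (1 + psi_1^2 + ... + psi_{m-1}^2) * sigma_a^2.

    psi is the list [psi_1, psi_2, ...]; psi_0 = 1 is implicit.
    sigma2 is the shock variance sigma_a^2.
    """
    return sigma2 * (1.0 + sum(p * p for p in psi[:m - 1]))
```

Note that the variance is nondecreasing in the lead time m, since each longer lead adds one more squared ψ term.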
Suppose next that we wish to forecast a weighted sum of the future values X_{t+1}, …, X_{t+m}, each defined as follows
    X_{t+j} = a_{t+j} + ψ_1 a_{t+j−1} + ψ_2 a_{t+j−2} + ⋯.
We may define Q by
    Q = Σ_{j=1}^m w_j X_{t+j}
      = [w_m a_{t+m} + (w_m ψ_1 + w_{m−1}) a_{t+m−1} + ⋯ + (w_m ψ_{m−1} + w_{m−1} ψ_{m−2} + ⋯ + w_1) a_{t+1}]
        + (w_1 ψ_1 + ⋯ + w_m ψ_m) a_t + (w_1 ψ_2 + ⋯ + w_m ψ_{m+1}) a_{t−1} + ⋯.
The bracketed term involves only shocks occurring after time t, so the minimum mean square error forecast of Q is
    Q̂ = (w_1 ψ_1 + ⋯ + w_m ψ_m) a_t + (w_1 ψ_2 + ⋯ + w_m ψ_{m+1}) a_{t−1} + ⋯.
But from our earlier discussion, we know
    X̂_t(1) = ψ_1 a_t + ψ_2 a_{t−1} + ⋯
    X̂_t(2) = ψ_2 a_t + ψ_3 a_{t−1} + ⋯
    ⋮
    X̂_t(m) = ψ_m a_t + ψ_{m+1} a_{t−1} + ⋯.
From this we may conclude that
(13.1.6)    Q̂ = w_1 X̂_t(1) + ⋯ + w_m X̂_t(m).
That is, the minimum mean square error forecast of Σ_{j=1}^m w_j X_{t+j} is w_1 X̂_t(1) + ⋯ + w_m X̂_t(m).
Correlations of Forecast Errors. Now let us consider the correlations of forecast errors. First we note that
    e_t(1) = X_{t+1} − X̂_t(1) = a_{t+1}.
More generally,
    e_t(m) = X_{t+m} − X̂_t(m) = Σ_{j=0}^{m−1} ψ_j a_{t+m−j},
with ψ_0 = 1. Thus
    E[e_t(m) e_t(m+j)] = σ_a² Σ_{i=0}^{m−1} ψ_i ψ_{i+j}
so that
    corr[e_t(m), e_t(m+j)] = Σ_{i=0}^{m−1} ψ_i ψ_{i+j} / {(Σ_{i=0}^{m−1} ψ_i²)(Σ_{i=0}^{m+j−1} ψ_i²)}^{1/2}.
Thus forecast errors made from the same origin t at different lead times are, in general, correlated.
13.2 Other Forms of Forecast Functions. We have so far considered forecasts from the ψ-weight formulation of the general linear model. Let us now consider some alternatives. We first observe that
    E[X_{t−j} | t] = X_{t−j}, j = 0, 1, 2, …
    E[X_{t+j} | t] = X̂_t(j), j = 1, 2, 3, …
    E[a_{t−j} | t] = a_{t−j} = X_{t−j} − X̂_{t−j−1}(1), j = 0, 1, 2, …
and finally
    E[a_{t+j} | t] = 0, j = 1, 2, 3, ….
Taking conditional expectations in the difference equation form of the model yields
(13.2.1)    X̂_t(m) = φ_1 E[X_{t+m−1} | t] + ⋯ + φ_{p+d} E[X_{t+m−p−d} | t] − θ_1 E[a_{t+m−1} | t] − ⋯ − θ_q E[a_{t+m−q} | t] + E[a_{t+m} | t].
Using the above forms of the conditional expectations, we can use (13.2.1) as a forecast function. To make this clearer, it is worth considering some examples. Consider for example (I − .8B)(I − B)X_{t+m} = a_{t+m}. Then
    X_{t+m} = 1.8 X_{t+m−1} − .8 X_{t+m−2} + a_{t+m}.
Then
    X̂_t(1) = 1.8 X_t − .8 X_{t−1}
    X̂_t(2) = 1.8 X̂_t(1) − .8 X_t
and
    X̂_t(m) = 1.8 X̂_t(m−1) − .8 X̂_t(m−2), m = 3, 4, 5, ….
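The recursion in this example lends itself directly to computation. Below is a sketch for this particular model (coefficients 1.8 and −.8 as above; the function name is ours): X̂_t(m) is generated by iterating the difference equation, seeded with X̂_t(0) = X_t and X̂_t(−1) = X_{t−1}.

```python
def forecast_example(x_t, x_tm1, m):
    """m-step forecast for (I - .8B)(I - B)X = a.

    Iterates Xhat_t(k) = 1.8*Xhat_t(k-1) - 0.8*Xhat_t(k-2),
    seeded with Xhat_t(0) = X_t and Xhat_t(-1) = X_{t-1}.
    """
    prev2, prev1 = x_tm1, x_t
    for _ in range(m):
        prev2, prev1 = prev1, 1.8 * prev1 - 0.8 * prev2
    return prev1
```

A series that has settled at a constant, say X_t = X_{t−1} = 10, forecasts to 10 at every lead, as one expects for this nonstationary model.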
" X!]
Then
X!]
y 2 X!]^ ^
.9 B
X!]^
!]
.5 B ) !] .
.9 !]^
.5 !]^ .
O X!]^ ]
B
X]
y
!]
so that
(13.2.2)
^
X (m)
^
X (m ^ j) ] X
^
y
y
!]^
13.3 Practical Forecasting. We have now developed three forms of the forecasting function, specifically
(13.3.1)    X̂_t(m) = ψ_m E[a_t | t] + ψ_{m+1} E[a_{t−1} | t] + ⋯,
(13.3.2)    X̂_t(m) = φ_1 E[X_{t+m−1} | t] + ⋯ + φ_{p+d} E[X_{t+m−p−d} | t] − θ_1 E[a_{t+m−1} | t] − ⋯ − θ_q E[a_{t+m−q} | t] + E[a_{t+m} | t],
and
(13.3.3)    X̂_t(m) = Σ_{j=1}^{m−1} π_j X̂_t(m−j) + Σ_{j=m}^∞ π_j X_{t+m−j}.
Equation (13.3.2) is a useful computational algorithm if the process is purely autoregressive in form. However, if there is a nontrivial moving average component, the computation of E[a_{t−j} | t] involves X_{t−j} − X̂_{t−j−1}(1). This leads to an infinite regression and, so, creates a problem with initializing the algorithm. The same is basically true of (13.3.1). Equation (13.3.3) can be used, but of course we would generally only have data X_1, …, X_t so that
    X̂_t(m) = Σ_{j=1}^{m−1} π_j X̂_t(m−j) + Σ_{j=m}^{t+m−1} π_j X_{t+m−j}
is the practical formula. Because of the truncation of the second series, this would only be practical in settings where the π-weights converge to 0 sufficiently rapidly that the truncated part of the series is negligible. Clearly the model for the time series must be invertible for this to be true, but even more the time series must be sufficiently long so that we can reach sufficiently far back into the tail of the second series.
To calculate the π-weights, we use the standard device we have used earlier. Since φ(B)X_t = θ(B)a_t and also π(B)X_t = a_t, we can substitute the second in the first to obtain
    φ(B)X_t = θ(B)a_t = θ(B)π(B)X_t.
By equating operators, we get φ(B) = θ(B)π(B). We can then equate the coefficients of the powers of B to solve for the π_j. In general
    π_1 = φ_1 − θ_1
    π_2 = φ_2 − θ_2 + θ_1(φ_1 − θ_1)
    ⋮
    θ(B)π_j = 0, j > max{p+d, q}.
In the last difference equation, the operator θ(B) operates on the j subscript. Notice that if the roots of the auxiliary equation, θ(s) = 0, lie outside the unit circle, then the π-weights will be exponentially damped (or dominated by something which is exponentially damped) and, hence, we will be able to use the π-weight representation to initiate our forecasting.
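Equating coefficients in φ(B) = θ(B)π(B) gives a simple recursion for the π-weights, sketched below. The function name and list conventions are ours; missing coefficients are treated as zero.

```python
def pi_weights(phi, theta, n):
    """First n pi-weights from phi(B) = theta(B) * pi(B).

    phi = [phi_1, ..., phi_{p+d}], theta = [theta_1, ..., theta_q].
    Uses pi_j = phi_j - theta_j + sum_{i=1}^{j-1} theta_i * pi_{j-i},
    which follows from matching the coefficient of B^j on both sides.
    """
    pi = []
    for j in range(1, n + 1):
        v = (phi[j - 1] if j <= len(phi) else 0.0) \
            - (theta[j - 1] if j <= len(theta) else 0.0)
        for i in range(1, min(j, len(theta) + 1)):
            v += theta[i - 1] * pi[j - i - 1]
        pi.append(v)
    return pi
```

For the purely autoregressive example (I − .8B)(I − B)X_t = a_t the π-weights are just 1.8, −.8, 0, 0, …, while for an MA(1) they are −θ_1, −θ_1², −θ_1³, …, which damp out only when the model is invertible (|θ_1| < 1).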
13.4 Updating Forecasts. One issue of substantial interest is an updating procedure. That is, if we know the time series up to time, t, e.g. X_1, …, X_t, and we add a new observation, X_{t+1}, it is desirable not to have to recompute the whole forecast from scratch. To see how to update, recall that
    X̂_{t+1}(m) = ψ_m a_{t+1} + ψ_{m+1} a_t + ψ_{m+2} a_{t−1} + ⋯
and
    X̂_t(m+1) = ψ_{m+1} a_t + ψ_{m+2} a_{t−1} + ψ_{m+3} a_{t−2} + ⋯.
Subtracting, we have
    X̂_{t+1}(m) − X̂_t(m+1) = ψ_m a_{t+1}
or
    X̂_{t+1}(m) = X̂_t(m+1) + ψ_m a_{t+1}.
Notice that X̂_{t+1}(m) is an updated forecast of X_{t+m+1} with the shift in origin from t to t+1. X̂_{t+1}(m) and X̂_t(m+1) both forecast X_{t+m+1}. Thus to update a forecast, we only need to add a single term, ψ_m a_{t+1}, to the old forecast. Recall also that a_{t+1} = X_{t+1} − X̂_t(1). Thus the update term is
    ψ_m [X_{t+1} − X̂_t(1)].
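The updating rule can be wrapped in a few lines. In this sketch (names ours), old holds the forecasts X̂_t(1), …, X̂_t(M) made at origin t, and the new observation X_{t+1} shifts the origin:

```python
def update_forecasts(old, psi, x_new):
    """Shift the forecast origin from t to t+1.

    old = [Xhat_t(1), ..., Xhat_t(M)], psi = [psi_1, psi_2, ...].
    Returns [Xhat_{t+1}(1), ..., Xhat_{t+1}(M-1)] via
    Xhat_{t+1}(m) = Xhat_t(m+1) + psi_m * a_{t+1},
    where a_{t+1} = x_new - Xhat_t(1) is the one-step forecast error.
    """
    a = x_new - old[0]
    return [old[m] + psi[m - 1] * a for m in range(1, len(old))]
```

Only one new number, the one-step error a_{t+1}, is needed; no forecast is recomputed from scratch.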
To calculate the ψ-weights, we use the same standard device as in the previous section,
    φ(B)ψ(B) = θ(B),
so that
    (I − φ_1 B − ⋯ − φ_{p+d} B^{p+d})(I + ψ_1 B + ψ_2 B² + ⋯) = I − θ_1 B − ⋯ − θ_q B^q.
Equating coefficients, as before,
    ψ_1 = φ_1 − θ_1
    ψ_2 = φ_1 ψ_1 + φ_2 − θ_2
    ⋮
    ψ_j = φ_1 ψ_{j−1} + ⋯ + φ_{p+d} ψ_{j−p−d} − θ_j
where
    ψ_0 = 1, ψ_j = 0, j < 0 and θ_j = 0, j > q.
For example, in our example in the last chapter, we were considering (I − .8B)(I − B)X_t = a_t. The computation thus is
    ψ_0 = 1
    ψ_1 = 1.8
    ψ_j = 1.8 ψ_{j−1} − .8 ψ_{j−2}, j = 2, 3, ….
From this it is simple to compute ψ_0 = 1, ψ_1 = 1.8, ψ_2 = 2.44, ψ_3 = 2.95, ψ_4 = 3.36, …. From this it is a simple step to update our forecasts. Notice that the ψ-weights are increasing. This is as it should be since this is a nonstationary model and the ψ-weights will not converge.
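The ψ recursion is equally mechanical. A sketch (function name and list conventions ours) that reproduces the numbers above:

```python
def psi_weights(phi, theta, n):
    """psi_0, ..., psi_n from phi(B) * psi(B) = theta(B).

    Recursion: psi_j = phi_1*psi_{j-1} + ... + phi_{p+d}*psi_{j-p-d} - theta_j,
    with psi_0 = 1, psi_j = 0 for j < 0, theta_j = 0 for j > q.
    """
    psi = [1.0]
    for j in range(1, n + 1):
        v = -(theta[j - 1] if j <= len(theta) else 0.0)
        for i, ph in enumerate(phi, start=1):
            if j - i >= 0:
                v += ph * psi[j - i]
        psi.append(v)
    return psi
```

For the example model, psi_weights([1.8, -0.8], [], 4) gives 1, 1.8, 2.44, 2.952, 3.3616, matching the values above to two places.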
13.5 Confidence Limits for Forecasts. Finally, we would like to address the question of confidence limits for the forecasts. Let us consider X_{t+m}. We first observe that E[X_{t+m} | t] = X̂_t(m) so that
    var[X_{t+m} | t] = E[{X_{t+m} − X̂_t(m)}² | t] = E[{e_t(m)}² | t] = (1 + ψ_1² + ⋯ + ψ_{m−1}²)σ_a².
Assuming the a_t are normally distributed, then so is X_{t+m} − X̂_t(m). Thus upper and lower probability limits for X_{t+m} are
    X_{t+m}(+) = X̂_t(m) + p_{α/2} {1 + Σ_{j=1}^{m−1} ψ_j²}^{1/2} σ_a
and
    X_{t+m}(−) = X̂_t(m) − p_{α/2} {1 + Σ_{j=1}^{m−1} ψ_j²}^{1/2} σ_a,
where p_{α/2} is the (1 − α/2) × 100 percentile of a standard normal distribution. If we know the model exactly, then, of course, σ_a² is known.
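The probability limits are straightforward to compute from the ψ-weights and σ_a. A sketch (names ours; the percentile p must be supplied, e.g. p = 1.96 for 95% limits):

```python
import math

def forecast_limits(x_hat, psi, m, sigma_a, p):
    """Upper and lower probability limits for X_{t+m}.

    x_hat is Xhat_t(m); psi = [psi_1, psi_2, ...]; p is the standard
    normal percentile (e.g. 1.96 for 95% limits).  Returns (lower, upper).
    """
    half = p * sigma_a * math.sqrt(1.0 + sum(s * s for s in psi[:m - 1]))
    return x_hat - half, x_hat + half
```

At lead m = 1 the half-width is just p·σ_a; it widens with m as the squared ψ terms accumulate, reflecting the growing forecast error variance.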
In closing this section, it is worth noting that all of this chapter presumes that we know the model exactly. Thus, none of the coefficients φ_j, θ_j, or σ_a² are estimated. None of this theory really works if, when we take the conditional expectations, E[ · | t], we must account for coefficients φ_j, θ_j, and σ_a² which are random variables depending on the observations, X_1, …, X_t. In practice, of course, we use estimates of φ_j, θ_j, and σ_a² based on the data and then forecast into the future based on these estimated coefficients. This is, strictly speaking, a questionable practice from a theoretical point of view and is truly a weakness of the theory just developed. The situation is improved somewhat if the data used to estimate the coefficients are different from the data used for forecasting X_{t+m}. In this case the conditional expectations are at least legitimate and the minimum mean square error forecast holds at least conditionally. Unfortunately, this is often not possible, for example, with economic forecasting. In any case, the procedures described above seem to have value even though the proofs given may not be entirely legitimate in the setting in which these procedures are often used.
Appendix A
Riemann-Stieltjes Integrals
In our discussion of random process models in Chapter 6, it is convenient to use the concept of Riemann-Stieltjes integration. Consider an interval (α, β) and let x_0 < x_1 < ⋯ < x_n be a partition of (α, β) so that
(A.1)    α = x_0 < x_1 < ⋯ < x_n = β.
We form the approximating sum
(A.2)    Σ_{j=1}^n g(x_j)(F(x_j) − F(x_{j−1})).
If this sum converges as max_j |x_j − x_{j−1}| → 0, the limit is called the Riemann-Stieltjes integral of g with respect to F and is written
(A.3)    ∫_α^β g(x) dF(x) = lim Σ_{j=1}^n g(x_j)(F(x_j) − F(x_{j−1})).
If F is the identity function, the Riemann-Stieltjes integral is the ordinary integral as defined in elementary calculus. The function F in general must be of bounded variation, but for our purposes, we will make the more restrictive assumption that F is non-decreasing. The usual properties of integrals hold.
Property 1.  ∫_α^β (g(x) + h(x)) dF(x) = ∫_α^β g(x) dF(x) + ∫_α^β h(x) dF(x).
Property 2.  ∫_α^β dF(x) = F(β) − F(α).
Property 3.  If f(x) = dF(x)/dx, then ∫_α^β g(x) dF(x) = ∫_α^β g(x) f(x) dx.
Property 4.  If g(x) ≡ 1, then ∫_α^β f(x) dx = F(β) − F(α) and, hence, F(x) = ∫_{−∞}^x f(u) du.
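The approximating sum (A.2) can be evaluated directly, which also illustrates how a step function F concentrates the integral at its jumps. A minimal sketch (function name ours; a uniform partition is assumed):

```python
def stieltjes_sum(g, F, a, b, n):
    """Approximate the Riemann-Stieltjes integral of g dF over (a, b)
    by the sum over j of g(x_j) * (F(x_j) - F(x_{j-1})) on a uniform
    partition a = x_0 < x_1 < ... < x_n = b."""
    xs = [a + (b - a) * j / n for j in range(n + 1)]
    return sum(g(xs[j]) * (F(xs[j]) - F(xs[j - 1]))
               for j in range(1, n + 1))
```

With F the identity this reduces to an ordinary Riemann sum; with F a unit step at c, the sum picks off (approximately) g(c), which is how Stieltjes integrals express expectations against discrete distributions.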
The Riemann-Stieltjes integral may also be defined in two dimensions. Let
    α = x_0 < x_1 < ⋯ < x_n = β
and
    γ = y_0 < y_1 < ⋯ < y_m = δ
be partitions of (α, β) and (γ, δ), and form the sum
    Σ_{j=1}^n Σ_{k=1}^m g(x_j, y_k)(F(x_j, y_k) − F(x_{j−1}, y_k) − F(x_j, y_{k−1}) + F(x_{j−1}, y_{k−1})).
The limit of this sum as max_j |x_j − x_{j−1}| → 0 and max_k |y_k − y_{k−1}| → 0 is written
    ∫∫ g(x, y) F(dx, dy).
21 16 6 4 7 14 34 45 43 48 42 28 10 8 2 0 1 5 12 14 35 46 41 30 24

Year  Value     Year  Value
1820    16      1845    40
1821     7      1846    62
1822     4      1847    98
1823     2      1848   124
1824     8      1849    96
1825    17      1850    66
1826    36      1851    64
1827    50      1852    54
1828    62      1853    39
1829    67      1854    21
1830    71      1855     7
1831    48      1856     4
1832    28      1857    23
1833     8      1858    55
1834    13      1859    94
1835    57      1860    96
1836   122      1861    77
1837   138      1862    59
1838   103      1863    44
1839    86      1864    47
1840    63      1865    30
1841    37      1866    16
1842    24      1867     7
1843    11      1868    37
1844    15      1869    74