
CHAPTER 3

STATIONARY TIME SERIES


In this chapter we discuss methods for analyzing one-dimensional time series. The generalization to multiple dimensions will be developed in future chapters.
The main distinction between time series analysis and mere statistics is that the order of a random sample $X_1, X_2, \ldots, X_N$ is important. Consistent with this, we will refer to the indices $1, 2, \ldots, N$ as time. An ordered collection of random variables will be denoted by $X_t$, indexed by $t$, and called a stochastic process, random process, or simply process. If the order of a stochastic process is important, then it cannot be described completely by the probability distribution at a single time step. Thus, the statement $X_t \sim N(\mu, \sigma^2)$ is not adequate. Rather, the probability distribution of a stochastic process is described by a joint distribution at several different times. For instance, we would need to specify the joint probability density $p(x_1, x_2)$ of the random variable on the first and second time steps.
In the context of weather and climate, the question immediately arises as to what is the population. The answer is that the population is a hypothetical, infinite collection of earths. This collection is often called an ensemble. Typical examples of ensembles include:
- a collection of weather systems with slightly different initial states (often used in weather predictability studies);
- a collection of earths each subjected to the same forcing but having different detailed weather patterns (often used in climate change studies).
Although there exists only one earth in reality, we can construct theoretical models of the climate system. Such models allow us to define the distribution of the climate population.
Without making any further assumptions, a complete description of a stochastic process requires specifying the joint distribution of the stochastic process at all possible times. Clearly, this is a difficult task. To make progress, we constrain the probability model in some way. In this chapter, we consider stochastic processes that are stationary.
3.1 STATIONARY PROCESSES
Definition 3.1 (Stationary). A stationary stochastic process is a stochastic process in which the probability of

$$ X_{t_1}, X_{t_2}, \ldots, X_{t_K} \qquad (3.1) $$

is identical to the probability of the shifted set

$$ X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_K+h} \qquad (3.2) $$

for any collection of time steps $t_1, t_2, \ldots, t_K$, and for any shift $h$.
Thus, the joint distribution of the stochastic process is invariant with respect to a shift
in time. It follows from this that the distribution is independent of the time origin. If a
process is not stationary, then we call it nonstationary.
A stationary Gaussian process is a stationary process whose joint distribution $p(x_1, x_2, \ldots)$ is Gaussian.
EXAMPLE 3.1
The expectation of a stationary stochastic process is independent of time:

$$ E[X_{t_1}] = E[X_{t_2}] = \cdots = E[X_{t_K}] = \mu. \qquad (3.3) $$
3.2 TIME-LAGGED COVARIANCE FUNCTION
If a stochastic process is stationary, then the covariance between any two time steps of the process depends only on the difference in times. This fact can be seen by considering various covariances shifted in time:

$$ \mathrm{cov}[X_t, X_t] = \mathrm{cov}[X_{t+1}, X_{t+1}] = \cdots = \text{constant}_1 $$
$$ \mathrm{cov}[X_{t+1}, X_t] = \mathrm{cov}[X_{t+2}, X_{t+1}] = \cdots = \text{constant}_2 $$
$$ \mathrm{cov}[X_{t+2}, X_t] = \mathrm{cov}[X_{t+3}, X_{t+1}] = \cdots = \text{constant}_3 $$

Thus, for stationary processes we denote the covariance between any two time steps by

$$ c_\tau = \mathrm{cov}[X_{t+\tau}, X_t], \qquad (3.4) $$

where the parameter $\tau$ is called the time lag. Note that in this notation $c_0 = \sigma^2$.
EXAMPLE 3.2
The time-lagged covariance function of a stationary process is an even function of the time lag. This can be seen by invoking the fact that the time-lagged covariance of a stationary process is invariant to shifts in time, and does not depend on the order of its arguments:

$$ c_{-\tau} = \mathrm{cov}[X_{t-\tau}, X_t] = \mathrm{cov}[X_t, X_{t+\tau}] = \mathrm{cov}[X_{t+\tau}, X_t] = c_\tau. \qquad (3.5) $$
3.3 AUTOCORRELATION FUNCTION
One of the most important statistical quantities in time series analysis is the correlation between the values of a stochastic process at two times:

$$ \rho(\tau, t) = \frac{\mathrm{cov}[X_{t+\tau}, X_t]}{\sqrt{\mathrm{var}[X_{t+\tau}]\,\mathrm{var}[X_t]}}. \qquad (3.6) $$

If the stochastic process is stationary, two simplifications can be made. First, the variance is independent of time, so $\mathrm{var}[X_{t+\tau}] = \mathrm{var}[X_t] = \sigma^2$. Second, the covariance depends only on the time lag. Accordingly, the autocorrelation function of a stationary process can be represented as

$$ \rho_\tau = \frac{c_\tau}{c_0}. \qquad (3.7) $$

An attractive property of the autocorrelation function is that it is invariant to linear transformations of the data of the form $Z_t = aX_t + b$. This implies that the autocorrelation function does not depend on the units used to express the process: $\rho_\tau$ is nondimensional.
Notation. The autocorrelation function is often abbreviated as ACF.
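The invariance under $Z_t = aX_t + b$ is easy to confirm numerically. The following sketch is an illustration I am adding (not code from the text); it uses the ordinary sample correlation between a series and its lag-1 copy, and the test series and constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# A mildly correlated stationary series: 4-point running mean of white noise.
x = np.convolve(rng.standard_normal(503), np.ones(4) / 4, mode="valid")

def lag1_corr(series):
    """Ordinary sample correlation between the series and its lag-1 copy."""
    return np.corrcoef(series[:-1], series[1:])[0, 1]

a, b = 2.5, -7.0            # arbitrary affine transformation Z_t = a*X_t + b
z = a * x + b

print(lag1_corr(x))         # roughly 0.7-0.8 for this smoothed series
print(lag1_corr(z))         # identical to the line above (up to round-off)
```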
3.4 WHITE NOISE
Definition 3.2 (White Noise). A stochastic process $X_t$ is called white noise if its value at one time is statistically independent of its value at all other times. Thus,

$$ \mathrm{cov}[X_t, X_s] = 0 \quad \text{if } t \ne s \qquad (3.8) $$

for a white noise process.

A nonstationary white noise process has a variance that changes in time. We can generate a nonstationary white noise process by selecting random numbers independently from a distribution whose parameters change with time. However, in most practical cases, the white noise process is assumed to be stationary and hence completely characterized by two parameters, $\mu$ and $\sigma^2$, defined by

$$ \mathrm{cov}[X_t, X_s] = \delta_{t,s}\,\sigma^2, \qquad E[X_t] = \mu, \qquad (3.9) $$

where $\delta_{t,s}$ is the Kronecker delta function, which is one when $t = s$ and vanishes otherwise.
A white noise process is said to have no memory, in the sense that the value at time t
gives no information about the value at any other time. Stationary white noise is equivalent
to the independent and identically distributed processes studied in previous chapters.
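As a concrete illustration, stationary Gaussian white noise with parameters $\mu$ and $\sigma^2$ can be generated by drawing independent samples from a single fixed normal distribution. This is a minimal sketch I am adding (not from the text); the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, N = 2.0, 1.5, 10_000   # arbitrary mean, standard deviation, length

# Stationary white noise: independent draws from one fixed distribution.
w = rng.normal(loc=mu, scale=sigma, size=N)

# Nonstationary white noise: still independent draws, but the variance drifts in time.
sigma_t = sigma * (1.0 + 0.5 * np.linspace(0.0, 1.0, N))
w_nonstat = rng.normal(loc=mu, scale=sigma_t, size=N)

# The lag-1 sample covariance of the stationary series should be near zero,
# while the lag-0 covariance (the variance) should be near sigma**2 = 2.25.
print(np.cov(w[:-1], w[1:])[0, 1])   # approximately 0
print(np.var(w, ddof=1))             # approximately 2.25
```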
3.5 STATISTICS OF THE TIME MEAN
In practice, only one realization of a stochastic process is available. Thus, technically, we cannot compute a sample variance of $X_t$ because we do not have access to more than one realization of $X_t$ at a fixed value of $t$. Can we use the time mean to estimate the population mean? The time mean of a process is defined as

$$ \overline{X} = \frac{1}{N}\sum_{t=1}^{N} X_t. \qquad (3.10) $$

If the process is stationary, then the expectation of the time mean is

$$ E\left[\overline{X}\right] = \frac{1}{N}\sum_{t=1}^{N} E[X_t] = \mu, \qquad (3.11) $$

which implies that the time mean is an unbiased estimate of the population mean of a stationary process. The variance of the time mean is

$$ \mathrm{var}\left[\overline{X}\right] = E\left[\left(\overline{X} - \mu\right)^2\right] = E\left[\left(\frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)\right)\left(\frac{1}{N}\sum_{j=1}^{N}(X_j - \mu)\right)\right] $$
$$ = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} E[(X_i - \mu)(X_j - \mu)] = \frac{\sigma^2}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\rho_{i-j}, \qquad (3.12) $$

where we have used the definition of correlation, and the fact that the covariance of a stationary process depends only on the difference in times.

Since the term within the double sum depends only on the difference $i - j$, it is useful to change variables. Let the difference variable be $\tau = i - j$. Then, changing from $(i, j)$ coordinates to $(i, \tau)$ coordinates maps the original rectangular domain into a parallelogram, as indicated in Fig. 3.1 and Table 3.1.
Figure 3.1 Schematic of how the summation domain changes from the old coordinate system (left) to the new coordinate system (right).

Table 3.1 Corner points of the summation domain in the old (i, j) and new (i, τ) coordinate systems, where τ = i − j.

  Corner    i    j    τ = i − j
  A         1    1    0
  B         N    1    N − 1
  C         N    N    0
  D         1    N    −(N − 1)
The final summation then reduces to

$$ \sum_{i=1}^{N}\sum_{j=1}^{N}\rho_{i-j} = \sum_{\tau=-(N-1)}^{0}\sum_{i=1}^{\tau+N}\rho_\tau + \sum_{\tau=1}^{N-1}\sum_{i=\tau+1}^{N}\rho_\tau = \sum_{\tau=-(N-1)}^{0}\rho_\tau\,(\tau+N) + \sum_{\tau=1}^{N-1}\rho_\tau\,(N-\tau) = \sum_{\tau=-(N-1)}^{N-1}\rho_\tau\,(N-|\tau|). \qquad (3.13) $$
The change in variables gives

$$ \mathrm{var}\left[\overline{X}\right] = \frac{\sigma^2}{N^2}\sum_{\tau=-(N-1)}^{N-1}\rho_\tau\,(N-|\tau|) = \frac{\sigma^2 N_1}{N}, \qquad (3.14) $$

where

$$ N_1 = \sum_{\tau=-(N-1)}^{N-1}\rho_\tau\left(1 - \frac{|\tau|}{N}\right). \qquad (3.15) $$

Since variance is non-negative, the above derivation proves that $N_1$ is non-negative for any stationary process.

If the process is white noise, then the variance of the sample mean (3.14) reduces to

$$ \mathrm{var}\left[\overline{X}\right] = \frac{\sigma^2}{N}, \qquad (3.16) $$
which we had derived earlier assuming that each sample was independent and identically distributed. Loosely speaking, the ratio $N/N_1$ can be interpreted as the effective sample size for the purpose of estimating the variance of the time mean.
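The relation $\mathrm{var}[\overline{X}] = \sigma^2 N_1/N$ is easy to check by Monte Carlo. The sketch below is an illustration I am adding (not from the text): it assumes an exponentially decaying autocorrelation $\rho_\tau = 0.8^{|\tau|}$, draws many realizations of a stationary Gaussian process with that autocorrelation, and compares the empirical variance of the time mean with the formula.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma2, n_real = 100, 1.0, 20_000       # series length, variance, number of realizations
rho = 0.8 ** np.arange(N)                  # assumed autocorrelation rho_tau = 0.8**|tau|

# Build the Toeplitz covariance matrix c_{ij} = sigma^2 * rho_{|i-j|}.
lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
cov = sigma2 * rho[lags]

# Draw many realizations and compute the variance of the time mean across them.
x = rng.multivariate_normal(mean=np.zeros(N), cov=cov, size=n_real)
var_mc = x.mean(axis=1).var()

# Theoretical value from (3.14)-(3.15).
tau = np.arange(-(N - 1), N)
N1 = np.sum(0.8 ** np.abs(tau) * (1.0 - np.abs(tau) / N))
print(var_mc, sigma2 * N1 / N)             # the two values should agree closely
print("effective sample size:", N / N1)    # about 12 here, much less than N = 100
```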
3.6 THE ERGODIC PROPERTY
In the limit of large $N$, we have

$$ \lim_{N\to\infty} N_1 = \sum_{\tau=-\infty}^{\infty}\rho_\tau = 1 + 2\sum_{\tau=1}^{\infty}\rho_\tau. \qquad (3.17) $$
If this sum converges, then (1) the autocorrelation function vanishes in the limit of large lag $\tau$, and (2) the variance of the time mean vanishes in the limit of large $N$. This means that the time average converges to the ensemble average in the limit $N \to \infty$. Processes for which the time average of a single realization equals the ensemble average are called ergodic. By this definition, an ergodic process is a process in which the infinite sum of correlations over time lag converges.
Ergodicity implies that the autocorrelation function vanishes in the limit of large lag. Thus, after a sufficiently long time, the value of a process is uncorrelated with itself. Strictly speaking, lack of correlation does not necessarily imply independence. However, if two samples separated in time are in fact independent, then classical statistics with independent samples can be applied. What is interesting is that, for the purpose of estimating the mean, the process need not be strictly independent after a sufficiently long lag in order to provide a useful estimate of the population mean. The parameter $N_1$ can be interpreted as the time we need to wait before we have an effectively independent sample (for the purpose of measuring the mean).
3.7 SAMPLE TIME-LAGGED COVARIANCE
In dealing with observations, we generally do not know the time-lagged covariance function of a process, so we must estimate it from data. The estimator for the time-lagged covariance differs slightly from the usual sample covariance.

Definition 3.3 (Sample Autocovariance Function). The sample autocovariance function is defined as

$$ \hat{c}_\tau = \frac{1}{N}\sum_{n=1}^{N-|\tau|}\left(X_{n+|\tau|} - \overline{X}\right)\left(X_n - \overline{X}\right), \qquad (3.18) $$

for $\tau = 0, \pm 1, \pm 2, \ldots, \pm(N-1)$.

Note that the sample autocovariance function normalizes the sum by $1/N$, rather than by the number of terms in the sum, $1/(N - |\tau|)$. Also, this definition involves the time mean $\overline{X}$ over the entire sample of $N$ values, rather than the mean of just the $N - |\tau|$ terms appearing in the sum. The sample autocorrelation is defined as

$$ \hat{\rho}_\tau = \frac{\hat{c}_\tau}{\hat{c}_0}. \qquad (3.19) $$
This definition differs from the ordinary correlation coefficient between $X_n$ and $X_{n+\tau}$ in two ways: (1) the variance in the denominator is estimated from the entire record, not from the separate variances of $X_n$ and $X_{n+\tau}$ estimated over $N - |\tau|$ terms, and (2) the time mean of the full record of length $N$ is used to compute the anomalies of $X_n$ and $X_{n+\tau}$.
The justification for the above definitions cannot be fully explained here, but it is related to ensuring that the resulting set of autocovariances satisfies certain realizability conditions (such as positive definiteness of the correlation matrix and positive power spectra).
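The estimator (3.18)-(3.19) is straightforward to implement. Below is a minimal sketch I am adding (not from the text); the function name and the test series are illustrative only.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation (3.18)-(3.19): 1/N normalization and full-record mean."""
    x = np.asarray(x, dtype=float)
    N = x.size
    anomalies = x - x.mean()                     # anomalies about the full-record time mean
    c = np.array([np.sum(anomalies[lag:] * anomalies[:N - lag]) / N
                  for lag in range(max_lag + 1)])
    return c / c[0]                              # rho_hat_tau = c_hat_tau / c_hat_0

# Quick check on white noise: all nonzero lags should be near zero,
# with typical magnitude of about 1/sqrt(N) (see Bartlett's theorem below).
rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
print(sample_acf(w, max_lag=5))
```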
3.8 BARTLETT'S THEOREM (1946)
The sample autocorrelation $\hat{\rho}_\tau$ is a random variable because it is a function of random variables. The sampling properties of the sample autocorrelation are given by the following theorem.

Theorem 3.1 (Bartlett's Theorem). If $X_t$ is a stationary Gaussian process with autocorrelation function $\rho_\tau$, then for $\tau \ne 0$ and large $N$, $\hat{\rho}_\tau$ is approximately normally distributed with mean $E[\hat{\rho}_\tau] = \rho_\tau$ and variance

$$ \mathrm{var}[\hat{\rho}_\tau] \approx \frac{1}{N}\sum_{m=-\infty}^{\infty}\left(\rho_m^2 + \rho_{m+\tau}\,\rho_{m-\tau} + 2\rho_\tau^2\rho_m^2 - 4\rho_\tau\,\rho_m\,\rho_{m+\tau}\right). \qquad (3.20) $$
EXAMPLE 3.3 Variance of the Autocorrelation for White Noise
Question: What is the variance of the sample autocorrelation for a white noise process?
Answer: For white noise, $\rho_0 = 1$ and $\rho_\tau = 0$ for $\tau \ne 0$. Substituting into (3.20) gives

$$ \mathrm{var}[\hat{\rho}_\tau] \approx \frac{1}{N}. \qquad (3.21) $$
This result is extremely useful in testing the hypothesis of independence.
EXAMPLE 3.4 Variance of Sample Autocorrelation for Large Lags
Question: Consider a stationary process whose autocorrelation function vanishes for all time lags greater than or equal to $K$; i.e., $\rho_\tau = 0$ for $|\tau| \ge K$. What is the variance of the sample autocorrelation for $|\tau| \ge K$?
Answer: In this case, only the first term in (3.20) survives, leaving

$$ \mathrm{var}[\hat{\rho}_\tau] \approx \frac{1}{N}\sum_{m=-\infty}^{\infty}\rho_m^2 = \frac{1}{N}\left(1 + 2\sum_{m=1}^{\infty}\rho_m^2\right) = \frac{N_2}{N}, \qquad (3.22) $$

where

$$ N_2 = 1 + 2\sum_{m=1}^{\infty}\rho_m^2. \qquad (3.23) $$
This result reduces to (3.21) if the process is white noise (i.e., K = 1).
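These results give a practical test for serial independence: under the white-noise hypothesis, roughly 95% of the sample autocorrelations at nonzero lags should fall within $\pm 2/\sqrt{N}$. The sketch below is an illustration I am adding (not from the text) that counts how many lags fall outside that band for a simulated white-noise series.

```python
import numpy as np

rng = np.random.default_rng(3)
N, max_lag = 500, 40
w = rng.standard_normal(N)                         # white noise: rho_tau = 0 for tau != 0
a = w - w.mean()

# Sample autocorrelation as in (3.18)-(3.19).
c = np.array([np.sum(a[k:] * a[:N - k]) / N for k in range(max_lag + 1)])
rho_hat = c / c[0]

bound = 2.0 / np.sqrt(N)                           # approximate 95% band from var ~ 1/N
outside = np.sum(np.abs(rho_hat[1:]) > bound)
print(f"{outside} of {max_lag} lags exceed +/- {bound:.3f}")   # expect about 2 (5% of 40)
```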
Figure 3.2 Illustration of two types of autocorrelation functions (ACF versus time lag, 0 to 100). Left panel: the oscillatory ACF $\rho_\tau = \exp(-0.04\tau)\cos(0.8\tau)$, for which $N_1 = 0.1$ and $N_2 = 12.5$. Right panel: the monotonically decaying ACF $\rho_\tau = \exp(-0.04\tau)$, for which $N_1 = 50$ and $N_2 = 25$.
3.9 TIME SCALES
We have now encountered two time scales in sampling distributions, namely

$$ \lim_{N\to\infty} N_1 = 1 + 2\sum_{\tau=1}^{\infty}\rho_\tau \qquad (3.24) $$

$$ N_2 = 1 + 2\sum_{m=1}^{\infty}\rho_m^2. \qquad (3.25) $$

The two expressions differ in whether the correlation function is raised to the first or the second power. The time scale $N_1$ arose when we estimated the variance of the time mean, while the time scale $N_2$ arose when we estimated the variance of the sample correlation function. These results reveal that the concept of the effective sample size, or effective number of degrees of freedom, depends on the statistic.

A major difference between $N_1$ and $N_2$ can be appreciated by considering the autocorrelation functions illustrated in Fig. 3.2. The left panel shows an autocorrelation that oscillates, while the right shows an ACF that decays monotonically with lead time. Both ACFs decay at the same rate, so the time scale for approaching zero correlation is the same. However, the oscillatory ACF has time scales $N_1 = 0.1$ and $N_2 = 12.5$, revealing that $N_1$ grossly underestimates the time scale for effectively independent samples. $N_1$ is small because the oscillations occur many times before appreciable amplitude decay, and the positive and negative correlations therefore cancel each other in the sum. In contrast, the values of $N_2$ for the two cases are within a factor of two of each other, because the squared correlations do not cancel. Note also that $N_1$ and $N_2$ are within a factor of two of each other for the pure exponential decay. For this reason, $2N_2$ is often used as a measure of the time scale for decay.
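The quoted values of $N_1$ and $N_2$ can be reproduced by summing the two autocorrelation functions numerically. This is a sketch I am adding (not from the text), which truncates the infinite sums at a large lag.

```python
import numpy as np

tau = np.arange(1, 5001)                      # truncate the infinite sums at lag 5000

def time_scales(rho):
    """Return (N1, N2) from (3.24)-(3.25) for a one-sided ACF rho_1, rho_2, ..."""
    return 1.0 + 2.0 * rho.sum(), 1.0 + 2.0 * (rho ** 2).sum()

rho_osc = np.exp(-0.04 * tau) * np.cos(0.8 * tau)   # oscillatory ACF (Fig. 3.2, left)
rho_exp = np.exp(-0.04 * tau)                       # monotonic ACF (Fig. 3.2, right)

print(time_scales(rho_osc))   # about (0.13, 12.5) -- close to the 0.1 and 12.5 quoted above
print(time_scales(rho_exp))   # about (50, 25), matching the values quoted above
```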
3.10 AUTOREGRESSIVE MODEL
Now we consider models that actually generate realizations of a stochastic process. An autoregressive model represents the current value of a process as a function of previous values, plus noise:

$$ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + W_t + k, \qquad (3.26) $$

where $W_t$ is white noise with zero mean and variance $\sigma^2$, and $k$ is a constant. The above model is called an autoregressive model of order $p$, and is denoted AR($p$).
The above model is a regression model, but the dependent variable is a function of previous values of itself, hence it is autoregressive. There are $p + 2$ parameters in the above model, namely $k, \sigma^2, \phi_1, \ldots, \phi_p$.
3.11 THE AR(1) MODEL
The AR(1) model is one of the most important stochastic models in climate science, so it is worth reviewing the basic characteristics of this particular model all by itself. An AR(1) model is of the form

$$ X_t = \phi_1 X_{t-1} + W_t + k. \qquad (3.27) $$

The solution to this equation can be inferred by examining the first few iterations and looking for a pattern:

$$ X_1 = \phi_1 X_0 + W_1 + k $$
$$ X_2 = \phi_1 X_1 + W_2 + k = \phi_1(\phi_1 X_0 + W_1 + k) + W_2 + k = \phi_1^2 X_0 + (1 + \phi_1)k + (W_2 + \phi_1 W_1) $$
$$ \vdots $$
$$ X_t = \sum_{m=0}^{t-1}\phi_1^m k + \phi_1^t X_0 + \sum_{m=0}^{t-1}\phi_1^m W_{t-m}. \qquad (3.28) $$

This result shows that the solution $(X_1, X_2, \ldots, X_t)$ is uniquely determined by the initial condition $X_0$ and the sequence of forcings $W_1, W_2, \ldots, W_t$. If $X_0$ is fixed, then the above solution is not stationary: it depends on the preselected value of $X_0$, and thus on the time relative to $t = 0$.

The first term in (3.28) can be evaluated by recognizing that it is a geometric series. Using (A.1), we can rewrite the solution as

$$ X_t = k\,\frac{1 - \phi_1^t}{1 - \phi_1} + \phi_1^t X_0 + \sum_{m=0}^{t-1}\phi_1^m W_{t-m}. \qquad (3.29) $$

In general, we are interested in bounded solutions; that is, solutions that are finite as $t \to \infty$. Clearly, a necessary condition for the solution to remain bounded is that $|\phi_1| < 1$; otherwise, the terms that depend on $k$ and $X_0$ diverge to infinity. It turns out that this also is a sufficient condition for bounded solutions.
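A realization of an AR(1) process is easy to generate by iterating (3.27) directly. The following sketch is an illustration I am adding (parameter values are arbitrary); it discards an initial spin-up segment so that the retained portion is approximately stationary, anticipating the discussion of asymptotic stationarity below.

```python
import numpy as np

def simulate_ar1(phi1, k, sigma_w, n, n_spinup=500, seed=0):
    """Iterate X_t = phi1 * X_{t-1} + W_t + k and discard a spin-up segment."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma_w, size=n + n_spinup)   # white-noise forcing W_t
    x = np.empty(n + n_spinup)
    x[0] = 0.0                                        # arbitrary initial condition X_0
    for t in range(1, n + n_spinup):
        x[t] = phi1 * x[t - 1] + w[t] + k
    return x[n_spinup:]                               # approximately stationary segment

x = simulate_ar1(phi1=0.7, k=0.0, sigma_w=1.0, n=256)
print(x.mean(), x.var())   # mean near k/(1 - phi1) = 0; variance near sigma_w^2/(1 - phi1^2)
```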
3.11.1 Asymptotic Stationarity
If we assume that an AR(1) process started in the infinite past, then all memory of the initial condition will have been lost and the process can be considered stationary. To show this, note that the solution (3.28) assumed that the initial time step was $t = 0$. We can re-derive the solution assuming the initial time step is $t = j$, which turns out to be

$$ X_t = \phi_1^{t-j} X_j + \sum_{m=0}^{t-1-j}\phi_1^m k + \sum_{m=0}^{t-1-j}\phi_1^m W_{t-m}. \qquad (3.30) $$

We recover (3.28) for $j = 0$. Taking the limit $j \to -\infty$, corresponding to initializing the process in the infinite past, gives

$$ X_t = \frac{k}{1 - \phi_1} + \sum_{m=0}^{\infty}\phi_1^m W_{t-m}. \qquad (3.31) $$

In other words, an AR(1) process can be written as an infinite sum of white noise terms. Thus, even though the past values of $X_t$ do not appear explicitly, the above process is autoregressive. To understand this, note that the above equation is a weighted running average of the noise: the average at time $t$ shares most of its terms with the average at time $t - 1$, so the two are correlated. The above solution is stationary because all terms on the right-hand side are stationary. Thus, we have proven that AR(1) processes are asymptotically stationary. For short times, an AR(1) process is not stationary because the mean depends on the initial condition and the variance depends on time.
3.11.2 The Autocorrelation Function of an AR(1) Process
A key quantity is the autocorrelation function, which we now derive. The derivation is considerably simplified by assuming the process is stationary, in which case the mean is independent of time, so let $E[X_t] = \mu$. Subtracting the mean from both sides yields

$$ X_t - \mu = \phi_1(X_{t-1} - \mu) + W_t + (\phi_1 - 1)\mu + k. \qquad (3.32) $$

The constant term $(\phi_1 - 1)\mu + k$ vanishes because we know that $E[X_t] = \mu = k/(1 - \phi_1)$ from (3.31). Multiplying both sides by $X_{t-\tau} - \mu$ and taking expectations gives

$$ \mathrm{cov}[X_t, X_{t-\tau}] = \phi_1\,\mathrm{cov}[X_{t-1}, X_{t-\tau}] + \mathrm{cov}[W_t, X_{t-\tau}]. \qquad (3.33) $$

An important observation is the following: the variable $X_{t-\tau}$ is independent of $W_t$ for all $\tau \ge 1$. This fact follows from causality: the variable $X_{t-\tau}$ precedes $W_t$. You also can see this rigorously from the solution (3.28): $X_{t-\tau}$ depends only on present and past values of the noise. Thus, $\mathrm{cov}[W_t, X_{t-\tau}] = 0$ for $\tau \ge 1$, giving

$$ c_\tau = \phi_1 c_{\tau-1}, \qquad (3.34) $$

where we have used the fact that the covariances depend only on time lag. This recursive equation has the general solution $c_\tau = \phi_1^{|\tau|} c_0$, which implies that the autocorrelation is

$$ \rho_\tau = \frac{c_\tau}{c_0} = \phi_1^{|\tau|}. \qquad (3.35) $$
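The result $\rho_\tau = \phi_1^{|\tau|}$ can be checked against a simulated series. The sketch below is mine (not from the text); it repeats the same iteration as before and compares the sample autocorrelation at a few lags with $\phi_1^{\tau}$. Agreement is only approximate because the sample ACF is itself a random variable (Bartlett's theorem).

```python
import numpy as np

rng = np.random.default_rng(7)
phi1, N = 0.7, 20_000                      # long series so that sampling noise is small

# Simulate X_t = phi1 * X_{t-1} + W_t (k = 0), discarding a spin-up segment.
w = rng.standard_normal(N + 500)
x = np.empty(N + 500)
x[0] = 0.0
for t in range(1, N + 500):
    x[t] = phi1 * x[t - 1] + w[t]
x = x[500:]

# Sample autocorrelation as in (3.18)-(3.19).
a = x - x.mean()
c = np.array([np.sum(a[k:] * a[:x.size - k]) / x.size for k in range(6)])
rho_hat = c / c[0]

for tau in range(6):
    print(tau, round(rho_hat[tau], 3), round(phi1 ** tau, 3))   # sample vs. phi1**tau
```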
EXAMPLE 3.5 Effective Sample Size for AR(1) Process
Compute $N_1$ for an AR(1) process with parameter $\phi_1$.

$$ N_1 = 1 + 2\sum_{\tau=1}^{\infty}\rho_\tau = 1 + 2\sum_{\tau=1}^{\infty}\phi_1^\tau = 1 + 2\left(\frac{\phi_1}{1 - \phi_1}\right) \qquad (3.36) $$
$$ = \frac{1 + \phi_1}{1 - \phi_1}. \qquad (3.37) $$

For white noise processes, $\phi_1 = 0$ and $N_1 = 1$, consistent with the fact that each sample is independent of the others. As $\phi_1 \to 1$, $N_1 \to \infty$, consistent with the fact that large values of $\phi_1$ correspond to very smooth, slowly varying time series.
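A quick numerical check of (3.37), which I am adding as an illustration: the closed form $(1+\phi_1)/(1-\phi_1)$ should match the truncated sum in (3.36).

```python
import numpy as np

for phi1 in (0.0, 0.5, 0.7, 0.9, 0.98):
    tau = np.arange(1, 10_000)                    # truncate the infinite sum
    n1_sum = 1.0 + 2.0 * np.sum(phi1 ** tau)      # direct evaluation of (3.36)
    n1_closed = (1.0 + phi1) / (1.0 - phi1)       # closed form (3.37)
    print(phi1, round(n1_sum, 3), round(n1_closed, 3))
    # phi1 = 0.7 gives N1 close to 6 and phi1 = 0.98 gives N1 = 99,
    # the values quoted for Fig. 3.3 below.
```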
EXAMPLE 3.6 Illustrations of AR(1) Processes
Some basic properties of stochastic processes generated by AR(1) models are shown in Fig. 3.3. The left panels show actual realizations, while the right panels show the corresponding ACFs. Comparison between the first and second rows shows the result of increasing the value of $\phi_1$. We see that the fluctuations occur on much longer time scales for larger $\phi_1$. This long time scale is reflected in the longer time it takes the correlation function to decay. The third row shows a time series from the same process as in the first row, but for a shorter record length. We see that its sample autocorrelation function exhibits stronger fluctuations at large lags.
The above example illustrates several basic aspects of AR(1) processes. In general, as $\phi_1$ approaches 1, the time series becomes smoother and the autocorrelation function takes longer to decay. However, after the true ACF has essentially vanished, the sample ACF still fluctuates, mostly within the confidence interval. Moreover, these fluctuations are not independent from one lag to the next, but tend to be correlated across neighboring lags.
Figure 3.3 Realizations of stochastic processes generated by AR(1) models (left panels) and the associated autocorrelation functions (right panels). Top row: $x_t = 0.7\,x_{t-1} + w_t$ with $N = 256$ (time scales $N_1 = 6$ and $2N_2 = 6$). Middle row: $x_t = 0.98\,x_{t-1} + w_t$ with $N = 256$ ($N_1 = 99$ and $2N_2 = 100$). Bottom row: $x_t = 0.7\,x_{t-1} + w_t$ with $N = 64$ ($N_1 = 6$ and $2N_2 = 6$). In the right panels, the vertical bars are the sample autocorrelation (derived from the time series shown in the panel immediately to the left), the thick dashed curve is the exact autocorrelation function, and the thin horizontal dashed lines indicate the confidence interval for zero correlation, $\pm 2/\sqrt{N}$.