
TIME SERIES

Contents

Syllabus
Books
Keywords

1 Models for time series
  1.1 Time series data
  1.2 Trend, seasonality, cycles and residuals
  1.3 Stationary processes
  1.4 Autoregressive processes
  1.5 Moving average processes
  1.6 White noise
  1.7 The turning point test

2 Models of stationary processes
  2.1 Purely indeterministic processes
  2.2 ARMA processes
  2.3 ARIMA processes
  2.4 Estimation of the autocovariance function
  2.5 Identifying a MA(q) process
  2.6 Identifying an AR(p) process
  2.7 Distributions of the ACF and PACF

3 Spectral methods
  3.1 The discrete Fourier transform
  3.2 The spectral density
  3.3 Analysing the effects of smoothing

4 Estimation of the spectrum
  4.1 The periodogram
  4.2 Distribution of spectral estimates
  4.3 The fast Fourier transform

5 Linear filters
  5.1 The Filter Theorem
  5.2 Application to autoregressive processes
  5.3 Application to moving average processes
  5.4 The general linear process
  5.5 Filters and ARMA processes
  5.6 Calculating autocovariances in ARMA models

6 Estimation of trend and seasonality
  6.1 Moving averages
  6.2 Centred moving averages
  6.3 The Slutzky-Yule effect
  6.4 Exponential smoothing
  6.5 Calculation of seasonal indices

7 Fitting ARIMA models
  7.1 The Box-Jenkins procedure
  7.2 Identification
  7.3 Estimation
  7.4 Verification
  7.5 Tests for white noise
  7.6 Forecasting with ARMA models

8 State space models
  8.1 Models with unobserved states
  8.2 The Kalman filter
  8.3 Prediction
  8.4 Parameter estimation revisited

Syllabus

Time series analysis refers to problems in which observations are collected at regular time intervals and there are correlations among successive observations. Applications cover virtually all areas of Statistics, but some of the most important include economic and financial time series, and many areas of environmental or ecological data. In this course, I shall cover some of the most important methods for dealing with these problems. In the case of time series, these include the basic definitions of autocorrelations etc., then time-domain model fitting including autoregressive and moving average processes, spectral methods, and some discussion of the effect of time series correlations on other kinds of statistical inference, such as the estimation of means and regression coefficients.

Books

1. P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, Springer Series in Statistics (1986).
2. C. Chatfield, The Analysis of Time Series: Theory and Practice, Chapman and Hall (1975). Good general introduction, especially for those completely new to time series.
3. P.J. Diggle, Time Series: A Biostatistical Introduction, Oxford University Press (1990).
4. M. Kendall, Time Series, Charles Griffin (1976).

Keywords

ACF, Akaike's AIC, AR(p), ARIMA(p, d, q), ARMA(p, q), autocorrelation function, autocovariance function, autoregressive integrated moving average process, autoregressive moving average process, autoregressive process, backshift operator, Box-Jenkins, Box-Pierce, centred average of fours, centred-moving average, classical decomposition, correlogram, estimation, fast Fourier transform, filter generating function, Fourier frequencies, Gaussian process, general linear process, identifiability, identification, invertible process, Kalman filter updating equations, Levinson-Durbin recursion, linear filter, MA(q), moving average process, nonnegative definite sequence, PACF, periodogram, purely-indeterministic, sample partial autocorrelation coefficient, second order stationary, simple exponential smoothing, Slutzky-Yule effect, spectral density function, spectral distribution function, state space model, strictly stationary, strongly stationary, symmetric moving average, transfer function, turning point test, variate difference method, verification, weakly stationary, white noise, Yule-Walker equations

1 Models for time series

1.1 Time series data

A time series is a set of statistics, usually collected at regular intervals. Time series data occur naturally in many application areas.
• economics: e.g., monthly data for unemployment, hospital admissions, etc.
• finance: e.g., daily exchange rate, a share price, etc.
• environmental: e.g., daily rainfall, air quality readings.
• medicine: e.g., EEG brain wave activity every $2^{-8}$ secs.

The methods of time series analysis pre-date those for general stochastic processes and Markov chains. The aims of time series analysis are to describe and summarise time series data, fit low-dimensional models, and make forecasts.

We write our real-valued series of observations as $\dots, X_{-2}, X_{-1}, X_0, X_1, X_2, \dots$, a doubly infinite sequence of real-valued random variables indexed by $\mathbb{Z}$.

1.2 Trend, seasonality, cycles and residuals

One simple method of describing a series is that of classical decomposition. The notion is that the series can be decomposed into four elements:
• Trend ($T_t$): long term movements in the mean;
• Seasonal effects ($I_t$): cyclical fluctuations related to the calendar;
• Cycles ($C_t$): other cyclical fluctuations (such as business cycles);
• Residuals ($E_t$): other random or systematic fluctuations.

The idea is to create separate models for these four elements and then combine them, either additively
$$X_t = T_t + I_t + C_t + E_t$$
or multiplicatively
$$X_t = T_t \cdot I_t \cdot C_t \cdot E_t .$$

1.3 Stationary processes

1. A sequence $\{X_t, t \in \mathbb{Z}\}$ is strongly stationary or strictly stationary if
$$(X_{t_1}, \dots, X_{t_k}) \stackrel{D}{=} (X_{t_1+h}, \dots, X_{t_k+h})$$
for all sets of time points $t_1, \dots, t_k$ and integer $h$.
2. A sequence is weakly stationary, or second order stationary, if
(a) $E(X_t) = \mu$, and
(b) $\mathrm{cov}(X_t, X_{t+k}) = \gamma_k$,
where $\mu$ is constant and $\gamma_k$ is independent of $t$.
3. The sequence $\{\gamma_k, k \in \mathbb{Z}\}$ is called the autocovariance function.
4. We also define $\rho_k = \gamma_k/\gamma_0 = \mathrm{corr}(X_t, X_{t+k})$ and call $\{\rho_k, k \in \mathbb{Z}\}$ the autocorrelation function (ACF).

Remarks.
1. A strictly stationary process is weakly stationary (provided its second moments are finite).
2. If the process is Gaussian, that is $(X_{t_1}, \dots, X_{t_k})$ is multivariate normal for all $t_1, \dots, t_k$, then weak stationarity implies strong stationarity.
3. $\gamma_0 = \mathrm{var}(X_t) > 0$, assuming $X_t$ is genuinely random.
4. By symmetry, $\gamma_k = \gamma_{-k}$, for all $k$.

1.4 Autoregressive processes
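The autoregressive process of order $p$ is denoted AR($p$), and defined by
$$X_t = \sum_{r=1}^{p} \phi_r X_{t-r} + \epsilon_t \qquad (1.1)$$
where $\phi_1, \dots, \phi_p$ are fixed constants and $\{\epsilon_t\}$ is a sequence of independent (or uncorrelated) random variables with mean 0 and variance $\sigma^2$.

The AR(1) process is defined by
$$X_t = \phi_1 X_{t-1} + \epsilon_t . \qquad (1.2)$$
To find its autocovariance function we make successive substitutions, to get
$$X_t = \epsilon_t + \phi_1(\epsilon_{t-1} + \phi_1(\epsilon_{t-2} + \cdots)) = \epsilon_t + \phi_1 \epsilon_{t-1} + \phi_1^2 \epsilon_{t-2} + \cdots$$
(the series converges when $|\phi_1| < 1$). The fact that $\{X_t\}$ is second order stationary follows from the observation that $E(X_t) = 0$ and that the autocovariance function can be calculated as follows:
$$\gamma_0 = E\left(\epsilon_t + \phi_1 \epsilon_{t-1} + \phi_1^2 \epsilon_{t-2} + \cdots\right)^2 = \left(1 + \phi_1^2 + \phi_1^4 + \cdots\right)\sigma^2 = \frac{\sigma^2}{1 - \phi_1^2}$$
$$\gamma_k = E\left(\sum_{r=0}^{\infty} \phi_1^r \epsilon_{t-r} \sum_{s=0}^{\infty} \phi_1^s \epsilon_{t+k-s}\right) = \frac{\sigma^2 \phi_1^k}{1 - \phi_1^2} .$$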
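The closed form for $\gamma_k$ can be checked numerically against the truncated infinite sum. The following is a minimal sketch, not part of the original notes; the function names and the truncation length `n_terms` are illustrative assumptions.

```python
def ar1_autocov_truncated(phi, sigma2, k, n_terms=200):
    """Approximate gamma_k from X_t = sum_r phi^r eps_{t-r}, truncated at n_terms.

    Only the cross terms with s = r + k survive, because the eps_t are
    uncorrelated with common variance sigma2.
    """
    k = abs(k)
    return sigma2 * sum(phi**r * phi**(r + k) for r in range(n_terms))


def ar1_autocov_closed(phi, sigma2, k):
    """Closed form gamma_k = sigma2 * phi^|k| / (1 - phi^2), valid for |phi| < 1."""
    return sigma2 * phi**abs(k) / (1.0 - phi**2)


if __name__ == "__main__":
    for k in range(4):
        print(k, ar1_autocov_truncated(0.5, 1.0, k), ar1_autocov_closed(0.5, 1.0, k))
```

For $|\phi_1|$ well inside the unit interval the two values agree to rounding error; as $|\phi_1|$ approaches 1 more terms are needed.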

There is an easier way to obtain these results. Multiply equation (1.2) by $X_{t-k}$ and take the expected value, to give
$$E(X_t X_{t-k}) = E(\phi_1 X_{t-1} X_{t-k}) + E(\epsilon_t X_{t-k}) .$$
Thus $\gamma_k = \phi_1 \gamma_{k-1}$, $k = 1, 2, \dots$

Similarly, squaring (1.2) and taking the expected value gives
$$E(X_t^2) = \phi_1^2 E(X_{t-1}^2) + 2\phi_1 E(X_{t-1}\epsilon_t) + E(\epsilon_t^2) = \phi_1^2 E(X_{t-1}^2) + 0 + \sigma^2$$
and so $\gamma_0 = \sigma^2/(1 - \phi_1^2)$.

More generally, the AR($p$) process is defined as
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \epsilon_t . \qquad (1.3)$$
Again, the autocorrelation function can be found by multiplying (1.3) by $X_{t-k}$, taking the expected value and dividing by $\gamma_0$, thus producing the Yule-Walker equations
$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p}, \qquad k = 1, 2, \dots$$
These are linear recurrence relations, with general solution (when the roots are distinct) of the form
$$\rho_k = C_1 \omega_1^{|k|} + \cdots + C_p \omega_p^{|k|} ,$$
where $\omega_1, \dots, \omega_p$ are the roots of
$$\omega^p - \phi_1 \omega^{p-1} - \phi_2 \omega^{p-2} - \cdots - \phi_p = 0$$
and $C_1, \dots, C_p$ are determined by $\rho_0 = 1$ and the equations for $k = 1, \dots, p-1$. It is natural to require $\rho_k \to 0$ as $k \to \infty$, in which case the roots must lie inside the unit circle, that is, $|\omega_i| < 1$. Thus there is a restriction on the values of $\phi_1, \dots, \phi_p$ that can be chosen.

1.5 Moving average processes

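The moving average process of order $q$ is denoted MA($q$) and defined by
$$X_t = \sum_{s=0}^{q} \theta_s \epsilon_{t-s} \qquad (1.4)$$
where $\theta_1, \dots, \theta_q$ are fixed constants, $\theta_0 = 1$, and $\{\epsilon_t\}$ is a sequence of independent (or uncorrelated) random variables with mean 0 and variance $\sigma^2$.

It is clear from the definition that this is second order stationary and that
$$\gamma_k = \begin{cases} 0, & |k| > q \\ \sigma^2 \sum_{s=0}^{q-|k|} \theta_s \theta_{s+|k|}, & |k| \le q . \end{cases}$$
We remark that two moving average processes can have the same autocorrelation function. For example,
$$X_t = \epsilon_t + \theta \epsilon_{t-1} \quad \text{and} \quad X_t = \epsilon_t + (1/\theta)\epsilon_{t-1}$$
both have $\rho_1 = \theta/(1 + \theta^2)$, $\rho_k = 0$, $|k| > 1$. However, the first gives
$$\epsilon_t = X_t - \theta \epsilon_{t-1} = X_t - \theta(X_{t-1} - \theta \epsilon_{t-2}) = X_t - \theta X_{t-1} + \theta^2 X_{t-2} - \cdots$$
This is only valid for $|\theta| < 1$, a so-called invertible process. No two invertible processes have the same autocorrelation function.

1.6 White noise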

The sequence $\{\epsilon_t\}$, consisting of independent (or uncorrelated) random variables with mean 0 and variance $\sigma^2$, is called white noise (for reasons that will become clear later). It is a second order stationary series with $\gamma_0 = \sigma^2$ and $\gamma_k = 0$, $k \neq 0$.

1.7 The turning point test

We may wish to test whether a series can be considered to be white noise, or whether a more complicated model is required. In later chapters we shall consider various ways to do this; for example, we might estimate the autocovariance function, say $\{\hat\gamma_k\}$, and observe whether or not $\hat\gamma_k$ is near zero for all $k > 0$.

However, a very simple diagnostic is the turning point test, which examines a series $\{X_t\}$ to test whether it is purely random. The idea is that if $\{X_t\}$ is purely random then three successive values are equally likely to occur in any of the six possible orders.

In four of the six cases there is a turning point in the middle. Thus in a series of $n$ points we might expect $(2/3)(n-2)$ turning points. In fact, it can be shown that for large $n$, the number of turning points is distributed approximately as $N(2n/3, 8n/45)$. We reject (at the 5% level) the hypothesis that the series is unsystematic if the number of turning points lies outside the range $2n/3 \pm 1.96\sqrt{8n/45}$.
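The test can be sketched in a few lines. This is an illustrative implementation under the large-$n$ approximation $N(2n/3, 8n/45)$ quoted above; the function names and the handling of ties (a tie is not counted as a turning point) are assumptions made here, not part of the original notes.

```python
import math


def count_turning_points(x):
    """Count interior points that are strict local maxima or strict local minima."""
    return sum(
        1
        for a, b, c in zip(x, x[1:], x[2:])
        if (b > a and b > c) or (b < a and b < c)
    )


def turning_point_test(x, z_crit=1.96):
    """Return (number of turning points, z-score, reject flag).

    Uses the approximation N(2n/3, 8n/45) for the number of turning
    points of a purely random series of length n.
    """
    n = len(x)
    tp = count_turning_points(x)
    z = (tp - 2.0 * n / 3.0) / math.sqrt(8.0 * n / 45.0)
    return tp, z, abs(z) > z_crit


if __name__ == "__main__":
    # A monotone series has no turning points, so it is rejected as "purely random".
    print(turning_point_test(list(range(100))))
```

A strongly trending series has too few turning points and a rapidly alternating series has too many; both lead to rejection.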

2 Models of stationary processes

2.1 Purely indeterministic processes

Suppose $\{X_t\}$ is a second order stationary process, with mean 0. Its autocovariance function is
$$\gamma_k = E(X_t X_{t+k}) = \mathrm{cov}(X_t, X_{t+k}), \qquad k \in \mathbb{Z} .$$
1. As $\{X_t\}$ is stationary, $\gamma_k$ does not depend on $t$.
2. A process is said to be purely-indeterministic if the regression of $X_t$ on $X_{t-q}, X_{t-q-1}, \dots$ has explanatory power tending to 0 as $q \to \infty$. That is, the residual variance tends to $\mathrm{var}(X_t)$. An important theorem due to Wold (1938) states that every purely-indeterministic second order stationary process $\{X_t\}$ can be written in the form
$$X_t = \mu + \theta_0 Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2} + \cdots$$
where $\{Z_t\}$ is a sequence of uncorrelated random variables.
3. A Gaussian process is one for which $X_{t_1}, \dots, X_{t_n}$ has a joint normal distribution for all $t_1, \dots, t_n$. No two distinct Gaussian processes have the same autocovariance function.

2.2 ARMA processes

The autoregressive moving average process, ARMA($p, q$), is defined by
$$X_t - \sum_{r=1}^{p} \phi_r X_{t-r} = \sum_{s=0}^{q} \theta_s \epsilon_{t-s}$$
where again $\{\epsilon_t\}$ is white noise. This process is stationary for appropriate $\phi$, $\theta$.

Example 2.1 Consider the state space model
$$X_t = \phi X_{t-1} + \epsilon_t ,$$
$$Y_t = X_t + \eta_t .$$
Suppose $\{X_t\}$ is unobserved, $\{Y_t\}$ is observed and $\{\epsilon_t\}$ and $\{\eta_t\}$ are independent white noise sequences. Note that $\{X_t\}$ is AR(1). We can write
$$\xi_t = Y_t - \phi Y_{t-1} = (X_t + \eta_t) - \phi(X_{t-1} + \eta_{t-1}) = (X_t - \phi X_{t-1}) + (\eta_t - \phi \eta_{t-1}) = \epsilon_t + \eta_t - \phi \eta_{t-1} .$$
Now $\xi_t$ is stationary and $\mathrm{cov}(\xi_t, \xi_{t+k}) = 0$, $k \ge 2$. As such, $\xi_t$ can be modelled as a MA(1) process and $\{Y_t\}$ as ARMA(1, 1).

2.3 ARIMA processes

If the original process $\{Y_t\}$ is not stationary, we can look at the first order difference process
$$X_t = \nabla Y_t = Y_t - Y_{t-1}$$
or the second order differences
$$X_t = \nabla^2 Y_t = \nabla(\nabla Y)_t = Y_t - 2Y_{t-1} + Y_{t-2}$$
and so on. If we ever find that the differenced process is a stationary process we can look for an ARMA model of that.

The process $\{Y_t\}$ is said to be an autoregressive integrated moving average process, ARIMA($p, d, q$), if $X_t = \nabla^d Y_t$ is an ARMA($p, q$) process.

AR, MA, ARMA and ARIMA processes can be used to model many time series. A key tool in identifying a model is an estimate of the autocovariance function.

2.4 Estimation of the autocovariance function

Suppose we have data $(X_1, \dots, X_T)$ from a stationary time series. We can estimate
• the mean by $\bar X = (1/T)\sum_{t=1}^{T} X_t$,
• the autocovariance by $c_k = \hat\gamma_k = (1/T)\sum_{t=k+1}^{T} (X_t - \bar X)(X_{t-k} - \bar X)$, and
• the autocorrelation by $r_k = \hat\rho_k = \hat\gamma_k/\hat\gamma_0$.

The plot of $r_k$ against $k$ is known as the correlogram. If it is known that $\mu$ is 0 there is no need to correct for the mean and $\gamma_k$ can be estimated by
$$\hat\gamma_k = (1/T)\sum_{t=k+1}^{T} X_t X_{t-k} .$$
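The estimators $\hat\gamma_k$ and $r_k$ translate directly into code. The sketch below is illustrative only (the function names are not from the notes); it uses the divisor $T$, as in the definition above, and 0-based list indexing, so the sum over $t = k+1, \dots, T$ becomes a sum over indices $k, \dots, T-1$.

```python
def sample_autocov(x, k):
    """Sample autocovariance c_k with divisor T (not T - k)."""
    T = len(x)
    mean = sum(x) / T
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, T)) / T


def sample_autocorr(x, k):
    """Sample autocorrelation r_k = c_k / c_0."""
    return sample_autocov(x, k) / sample_autocov(x, 0)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Values for lags 0, 1, 2; plotting r_k against k gives the correlogram.
    print([sample_autocorr(data, k) for k in range(3)])
```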

Notice that in defining $\hat\gamma_k$ we divide by $T$ rather than by $(T - k)$. When $T$ is large relative to $k$ it does not much matter which divisor we use. However, for mathematical simplicity and other reasons there are advantages in dividing by $T$.

Suppose the stationary process $\{X_t\}$ has autocovariance function $\{\gamma_k\}$. Then
$$\mathrm{var}\left(\sum_{t=1}^{T} a_t X_t\right) = \sum_{t=1}^{T}\sum_{s=1}^{T} a_t a_s \,\mathrm{cov}(X_t, X_s) = \sum_{t=1}^{T}\sum_{s=1}^{T} a_t a_s \gamma_{|t-s|} \ge 0 .$$
A sequence $\{\gamma_k\}$ for which this holds for every $T \ge 1$ and set of constants $(a_1, \dots, a_T)$ is called a nonnegative definite sequence. The following theorem states that $\{\gamma_k\}$ is a valid autocovariance function if and only if it is nonnegative definite.

Theorem 2.2 (Bochner) The following are equivalent.
1. There exists a stationary sequence with autocovariance function $\{\gamma_k\}$.
2. $\{\gamma_k\}$ is nonnegative definite.
3. The spectral density function,
$$f(\omega) = \frac{1}{\pi}\sum_{k=-\infty}^{\infty} \gamma_k e^{-ik\omega} = \frac{1}{\pi}\gamma_0 + \frac{2}{\pi}\sum_{k=1}^{\infty} \gamma_k \cos(\omega k) ,$$
is nonnegative if it exists.

Dividing by $T$ rather than by $(T - k)$ in the definition of $\hat\gamma_k$
• ensures that $\{\hat\gamma_k\}$ is nonnegative definite (and thus that it could be the autocovariance function of a stationary process), and
• can reduce the $L^2$-error of $r_k$.

2.5 Identifying a MA(q) process

In a later lecture we consider the problem of identifying an ARMA or ARIMA model for a given time series. A key tool in doing this is the correlogram.

The MA($q$) process $X_t$ has $\rho_k = 0$ for all $k$, $|k| > q$. So a diagnostic for MA($q$) is that $|r_k|$ drops to near zero beyond some threshold.

2.6 Identifying an AR(p) process

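The AR($p$) process has $\rho_k$ decaying exponentially. This can be difficult to recognise in the correlogram. Suppose we have a process $X_t$ which we believe is AR($k$) with
$$X_t = \sum_{j=1}^{k} \phi_{j,k} X_{t-j} + \epsilon_t$$
with $\epsilon_t$ independent of $X_1, \dots, X_{t-1}$. Given the data $X_1, \dots, X_T$, the least squares estimates of $(\phi_{1,k}, \dots, \phi_{k,k})$ are obtained by minimizing
$$\frac{1}{T}\sum_{t=k+1}^{T}\left(X_t - \sum_{j=1}^{k} \phi_{j,k} X_{t-j}\right)^2 .$$
This is approximately equivalent to solving equations similar to the Yule-Walker equations,
$$\hat\gamma_j = \sum_{\ell=1}^{k} \hat\phi_{\ell,k}\, \hat\gamma_{|j-\ell|}, \qquad j = 1, \dots, k .$$
These can be solved by the Levinson-Durbin recursion:

Step 0. $\sigma_0^2 := \hat\gamma_0$, $\hat\phi_{1,1} = \hat\gamma_1/\hat\gamma_0$, $k := 0$.

Step 1. Repeat until $\hat\phi_{k,k}$ near 0:
$$k := k + 1$$
$$\hat\phi_{k,k} := \left(\hat\gamma_k - \sum_{j=1}^{k-1} \hat\phi_{j,k-1}\, \hat\gamma_{k-j}\right) \Big/ \sigma_{k-1}^2$$
$$\hat\phi_{j,k} := \hat\phi_{j,k-1} - \hat\phi_{k,k}\, \hat\phi_{k-j,k-1}, \qquad j = 1, \dots, k-1$$
$$\sigma_k^2 := \sigma_{k-1}^2 \left(1 - \hat\phi_{k,k}^2\right)$$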
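The recursion above can be sketched as follows. This is an illustrative implementation, not code from the notes: it runs for a fixed number of orders `max_order` (an assumption made here) rather than stopping when $\hat\phi_{k,k}$ is near 0, and the function name and return layout are hypothetical.

```python
def levinson_durbin(gamma, max_order):
    """Levinson-Durbin recursion on autocovariances gamma[0..max_order].

    Returns (pacf, coeffs, sigma2) where
      pacf[k-1]   = phi_{k,k} for k = 1..max_order,
      coeffs      = [phi_{1,K}, ..., phi_{K,K}] for K = max_order,
      sigma2[k]   = sigma_k^2 for k = 0..max_order.
    """
    sigma2 = [gamma[0]]          # sigma_0^2 := gamma_0
    phi_prev = []                # coefficients of the order k-1 fit
    pacf = []
    for k in range(1, max_order + 1):
        acc = gamma[k] - sum(phi_prev[j - 1] * gamma[k - j] for j in range(1, k))
        phi_kk = acc / sigma2[-1]
        # phi_{j,k} := phi_{j,k-1} - phi_{k,k} * phi_{k-j,k-1}
        phi_new = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)]
        phi_new.append(phi_kk)
        sigma2.append(sigma2[-1] * (1.0 - phi_kk**2))
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf, phi_prev, sigma2


if __name__ == "__main__":
    # Exact autocovariances of an AR(1) with phi = 0.5, sigma^2 = 1.
    gamma = [(4.0 / 3.0) * 0.5**k for k in range(4)]
    print(levinson_durbin(gamma, 3))
```

Fed the exact autocovariances of an AR(1) process, the recursion returns $\phi_{1,1} = \phi_1$ and $\phi_{k,k} = 0$ for $k > 1$, up to rounding.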

We test whether the order $k$ fit is an improvement over the order $k-1$ fit by looking to see if $\hat\phi_{k,k}$ is far from zero. The statistic $\hat\phi_{k,k}$ is called the $k$th sample partial autocorrelation coefficient (PACF). If the process $X_t$ is genuinely AR($p$) then the population PACF, $\phi_{k,k}$, is exactly zero for all $k > p$. Thus a diagnostic for AR($p$) is that the sample PACFs are close to zero for $k > p$.

2.7 Distributions of the ACF and PACF

Both the sample ACF and PACF are approximately normally distributed about their population values, and have standard deviation of about $1/\sqrt{T}$, where $T$ is the length of the series. A rule of thumb for negligibility of $\rho_k$ (and similarly for $\phi_{k,k}$) is that $r_k$ (similarly $\hat\phi_{k,k}$) should lie between $\pm 2/\sqrt{T}$. (2 is an approximation to 1.96. Recall that if $Z_1, \dots, Z_n \sim N(\mu, 1)$, a test of size 0.05 of the hypothesis $H_0: \mu = 0$ against $H_1: \mu \neq 0$ rejects $H_0$ if and only if $\bar Z$ lies outside $\pm 1.96/\sqrt{n}$.)

Care is needed in applying this rule of thumb. It is important to realize that the sample autocorrelations, $r_1, r_2, \dots$ (and sample partial autocorrelations, $\hat\phi_{1,1}, \hat\phi_{2,2}, \dots$) are not independently distributed. The probability that any one $r_k$ should lie outside $\pm 2/\sqrt{T}$ depends on the values of the other $r_k$.

A portmanteau test of white noise (due to Box & Pierce and Ljung & Box) can be based on the fact that approximately
$$Q_m = T(T+2)\sum_{k=1}^{m} (T-k)^{-1} r_k^2 \sim \chi^2_m .$$
The sensitivity of the test to departure from white noise depends on the choice of $m$. If the true model is ARMA($p, q$) then greatest power is obtained (rejection of the white noise hypothesis is most probable) when $m$ is about $p + q$.
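Computing $Q_m$ from the sample autocorrelations is a one-line sum. The sketch below is illustrative (the function name and argument layout are assumptions); it returns only the statistic, which would then be compared with a $\chi^2_m$ critical value taken from tables, for instance about 5.99 at the 5% level when $m = 2$.

```python
def ljung_box_q(r, T):
    """Portmanteau statistic Q_m = T (T + 2) * sum_{k=1}^{m} r_k^2 / (T - k).

    r is the list [r_1, ..., r_m] of sample autocorrelations at lags 1..m,
    and T is the length of the series.
    """
    return T * (T + 2) * sum(rk**2 / (T - k) for k, rk in enumerate(r, start=1))


if __name__ == "__main__":
    # Hypothetical sample autocorrelations at lags 1 and 2 from a series of length 100.
    print(ljung_box_q([0.1, -0.2], 100))
```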

3 Spectral methods

3.1 The discrete Fourier transform

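If $h(t)$ is defined for integers $t$, the discrete Fourier transform of $h$ is
$$H(\omega) = \sum_{t=-\infty}^{\infty} h(t) e^{-i\omega t}, \qquad -\pi \le \omega \le \pi .$$
The inverse transform is
$$h(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i\omega t} H(\omega)\, d\omega .$$
If $h(t)$ is real-valued, and an even function such that $h(t) = h(-t)$, then
$$H(\omega) = h(0) + 2\sum_{t=1}^{\infty} h(t)\cos(\omega t)$$
and
$$h(t) = \frac{1}{\pi}\int_{0}^{\pi} \cos(\omega t) H(\omega)\, d\omega .$$

3.2 The spectral density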

The Wiener-Khintchine theorem states that for any real-valued stationary process there exists a spectral distribution function, $F(\cdot)$, which is nondecreasing and right continuous on $[0, \pi]$ such that $F(0) = 0$, $F(\pi) = \gamma_0$ and
$$\gamma_k = \int_0^{\pi} \cos(\omega k)\, dF(\omega) .$$
The integral is a Lebesgue-Stieltjes integral and is defined even if $F$ has discontinuities. Informally, $F(\omega_2) - F(\omega_1)$ is the contribution to the variance of the series made by frequencies in the range $(\omega_1, \omega_2)$.

$F(\cdot)$ can have jump discontinuities, but always can be decomposed as
$$F(\omega) = F_1(\omega) + F_2(\omega)$$
where $F_1(\cdot)$ is a nondecreasing continuous function and $F_2(\cdot)$ is a nondecreasing step function. This is a decomposition of the series into a purely indeterministic component and a deterministic component.

Suppose the process is purely indeterministic (which happens if and only if $\sum_k |\gamma_k| < \infty$). In this case $F(\cdot)$ is a nondecreasing continuous function, and differentiable at all points (except possibly on a set of measure zero). Its derivative $f(\omega) = F'(\omega)$ exists, and is called the spectral density function. Apart from a multiplication by $1/\pi$ it is simply the discrete Fourier transform of the autocovariance function and is given by
$$f(\omega) = \frac{1}{\pi}\sum_{k=-\infty}^{\infty} \gamma_k e^{-ik\omega} = \frac{1}{\pi}\gamma_0 + \frac{2}{\pi}\sum_{k=1}^{\infty} \gamma_k \cos(\omega k) ,$$
with inverse
$$\gamma_k = \int_0^{\pi} \cos(\omega k) f(\omega)\, d\omega .$$
Note. Some authors define the spectral distribution function on $[-\pi, \pi]$; the use of negative frequencies makes the interpretation of the spectral distribution less intuitive and leads to a difference of a factor of 2 in the definition of the spectral density. Notice, however, that if $f$ is defined as above and extended to negative frequencies, $f(-\omega) = f(\omega)$, then we can write
$$\gamma_k = \int_{-\pi}^{\pi} \tfrac{1}{2} e^{i\omega k} f(\omega)\, d\omega .$$

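Example 3.1
(a) Suppose $\{X_t\}$ is i.i.d., $\gamma_0 = \mathrm{var}(X_t) = \sigma^2 > 0$ and $\gamma_k = 0$, $k \ge 1$. Then $f(\omega) = \sigma^2/\pi$. The fact that the spectral density is flat, which means that all frequencies are equally present, accounts for our calling this sequence white noise.

(b) As an example of a process which is not purely indeterministic, consider $X_t = \cos(\omega_0 t + U)$ where $\omega_0$ is a value in $[0, \pi]$ and $U \sim U[-\pi, \pi]$. The process has zero mean, since
$$E(X_t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \cos(\omega_0 t + u)\, du = 0$$
and autocovariance
$$\gamma_k = E(X_t X_{t+k}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \cos(\omega_0 t + u)\cos(\omega_0 t + \omega_0 k + u)\, du$$
$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1}{2}\left[\cos(\omega_0 k) + \cos(2\omega_0 t + \omega_0 k + 2u)\right] du = \frac{1}{2\pi}\cdot\frac{1}{2}\left[2\pi\cos(\omega_0 k) + 0\right] = \frac{1}{2}\cos(\omega_0 k) .$$
Hence $X_t$ is second order stationary and we have
$$\gamma_k = \tfrac{1}{2}\cos(\omega_0 k), \qquad F(\omega) = \tfrac{1}{2} I_{[\omega \ge \omega_0]} \qquad \text{and} \qquad f(\omega) = \tfrac{1}{2}\delta_{\omega_0}(\omega) .$$
Note that $F$ is a nondecreasing step function. More generally, the spectral density
$$f(\omega) = \sum_{j=1}^{n} \tfrac{1}{2} a_j^2\, \delta_{\omega_j}(\omega)$$
corresponds to the process $X_t = \sum_{j=1}^{n} a_j \cos(\omega_j t + U_j)$ where $\omega_j \in [0, \pi]$ and $U_1, \dots, U_n$ are i.i.d. $U[-\pi, \pi]$.

(c) The MA(1) process, $X_t = \theta_1 \epsilon_{t-1} + \epsilon_t$, where $\{\epsilon_t\}$ is white noise. Recall $\gamma_0 = \sigma^2(1 + \theta_1^2)$, $\gamma_1 = \theta_1 \sigma^2$, and $\gamma_k = 0$, $k > 1$. Thus
$$f(\omega) = \frac{\sigma^2}{\pi}\left(1 + 2\theta_1\cos\omega + \theta_1^2\right) .$$

(d) The AR(1) process, $X_t = \phi_1 X_{t-1} + \epsilon_t$, where $\{\epsilon_t\}$ is white noise. Recall
$$\mathrm{var}(X_t) = \phi_1^2\, \mathrm{var}(X_{t-1}) + \sigma^2 \implies \gamma_0 = \phi_1^2 \gamma_0 + \sigma^2 \implies \gamma_0 = \sigma^2/(1 - \phi_1^2)$$
where we need $|\phi_1| < 1$ for $X_t$ stationary. Also,
$$\gamma_k = \mathrm{cov}(X_t, X_{t-k}) = \mathrm{cov}(\phi_1 X_{t-1} + \epsilon_t, X_{t-k}) = \phi_1 \gamma_{k-1} .$$
So $\gamma_k = \phi_1^{|k|}\gamma_0$, $k \in \mathbb{Z}$. Thus
$$f(\omega) = \frac{\gamma_0}{\pi} + \frac{2}{\pi}\sum_{k=1}^{\infty} \phi_1^k \gamma_0 \cos(k\omega) = \frac{\gamma_0}{\pi}\left\{1 + \sum_{k=1}^{\infty} \phi_1^k\left(e^{i\omega k} + e^{-i\omega k}\right)\right\}$$
$$= \frac{\gamma_0}{\pi}\left\{1 + \frac{\phi_1 e^{i\omega}}{1 - \phi_1 e^{i\omega}} + \frac{\phi_1 e^{-i\omega}}{1 - \phi_1 e^{-i\omega}}\right\} = \frac{\gamma_0}{\pi}\cdot\frac{1 - \phi_1^2}{1 - 2\phi_1\cos\omega + \phi_1^2} = \frac{\sigma^2}{\pi\left(1 - 2\phi_1\cos\omega + \phi_1^2\right)} .$$
Note that $\phi_1 > 0$ has power at low frequency, whereas $\phi_1 < 0$ has power at high frequency.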
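The closed form in (d) can be compared with a truncated version of the cosine series it was derived from. This is a minimal sketch under the definitions above; the function names and the truncation length `n_terms` are illustrative assumptions.

```python
import math


def ar1_spectral_density(omega, phi, sigma2=1.0):
    """Closed form f(omega) = sigma^2 / (pi (1 - 2 phi cos(omega) + phi^2))."""
    return sigma2 / (math.pi * (1.0 - 2.0 * phi * math.cos(omega) + phi**2))


def ar1_spectral_density_series(omega, phi, sigma2=1.0, n_terms=200):
    """Truncated series (gamma_0 / pi) (1 + 2 sum_k phi^k cos(k omega))."""
    gamma0 = sigma2 / (1.0 - phi**2)
    s = sum(phi**k * math.cos(k * omega) for k in range(1, n_terms + 1))
    return gamma0 / math.pi * (1.0 + 2.0 * s)


if __name__ == "__main__":
    for phi in (0.5, -0.5):
        # Compare power at the lowest and highest frequencies.
        print(phi, ar1_spectral_density(0.0, phi), ar1_spectral_density(math.pi, phi))
```

With $\phi_1 = 0.5$ the density is larger at $\omega = 0$ than at $\omega = \pi$, and the ordering reverses for $\phi_1 = -0.5$, in line with the remark above.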
[Figure: spectral density $f(\omega)$ plotted against $\omega$ on $[0, \pi]$, in two panels: $\phi_1 = 1/2$ and $\phi_1 = -1/2$.]

Plots above are the spectral densities for AR(1) processes in which $\{\epsilon_t\}$ is Gaussian white noise, with $\sigma^2/\pi = 1$. Samples for 200 data points are shown below.

[Figure: sample paths of 200 points, in two panels: AR(1) with $\phi_1 = 0.5$ and AR(1) with $\phi_1 = -0.5$.]

3.3 Analysing the effects of smoothing

Let $\{a_s\}$ be a sequence of real numbers. A linear filter of $\{X_t\}$ is
$$Y_t = \sum_{s=-\infty}^{\infty} a_s X_{t-s} .$$
In Chapter 5 we show that the spectral density of $\{Y_t\}$ is given by
$$f_Y(\omega) = |a(\omega)|^2 f_X(\omega) ,$$
where $a(\omega)$ is the transfer function
$$a(\omega) = \sum_{s=-\infty}^{\infty} a_s e^{i\omega s} .$$

This result can be used to explore the effect of smoothing a series.

Example 3.2 Suppose the AR(1) series above, with $\phi_1 = -0.5$, is smoothed by a moving average on three points, so that the smoothed series is
$$Y_t = \tfrac{1}{3}\left[X_{t+1} + X_t + X_{t-1}\right] .$$
Then $|a(\omega)|^2 = \left|\tfrac{1}{3}e^{-i\omega} + \tfrac{1}{3} + \tfrac{1}{3}e^{i\omega}\right|^2 = \tfrac{1}{9}(1 + 2\cos\omega)^2$.

Notice that $\gamma_X(0) = 4/3$, $\gamma_Y(0) = 2/9$, so $\{Y_t\}$ has 1/6 the variance of $\{X_t\}$. Moreover, all components of frequency $\omega = 2\pi/3$ (i.e., period 3) are eliminated in the smoothed series.
[Figure: two panels plotted against $\omega$: $|a(\omega)|^2$ and $f_Y(\omega) = |a(\omega)|^2 f_X(\omega)$.]
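The numbers in Example 3.2 can be reproduced directly from the AR(1) autocovariances. The sketch below is illustrative and takes $\phi_1 = -0.5$ with $\sigma^2 = 1$, the values for which the quoted variances 4/3 and 2/9 hold; the function names are assumptions, not part of the notes.

```python
import cmath
import math


def ar1_autocov(phi, k, sigma2=1.0):
    """gamma_k = sigma^2 phi^|k| / (1 - phi^2) for a stationary AR(1)."""
    return sigma2 * phi**abs(k) / (1.0 - phi**2)


def filtered_variance(weights, phi, sigma2=1.0):
    """var(sum_i w_i X_{t-i}) = sum_i sum_j w_i w_j gamma_{|i-j|}."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * ar1_autocov(phi, i - j, sigma2)
        for i in range(n)
        for j in range(n)
    )


def squared_gain(omega, weights, offsets):
    """|a(omega)|^2 with a(omega) = sum_s a_s exp(i omega s)."""
    a = sum(w * cmath.exp(1j * omega * s) for w, s in zip(weights, offsets))
    return abs(a) ** 2


if __name__ == "__main__":
    w = [1.0 / 3.0] * 3
    var_x = ar1_autocov(-0.5, 0)
    var_y = filtered_variance(w, -0.5)
    print(var_x, var_y, var_y / var_x)
    print(squared_gain(2.0 * math.pi / 3.0, w, (-1, 0, 1)))
```

The variance of a filtered series depends only on the differences between lags, so the offsets $(-1, 0, 1)$ matter for the transfer function but not for `filtered_variance`.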

4 Estimation of the spectrum

4.1 The periodogram

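Suppose we have $T = 2m + 1$ observations of a time series, $y_1, \dots, y_T$. Define the Fourier frequencies, $\omega_j = 2\pi j/T$, $j = 1, \dots, m$, and consider the regression model
$$y_t = \alpha_0 + \sum_{j=1}^{m} \alpha_j \cos(\omega_j t) + \sum_{j=1}^{m} \beta_j \sin(\omega_j t) ,$$
which can be written as a general linear model, $Y = X\theta + \epsilon$, where
$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_T \end{pmatrix}, \quad X = \begin{pmatrix} 1 & c_{11} & s_{11} & \cdots & c_{m1} & s_{m1} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & c_{1T} & s_{1T} & \cdots & c_{mT} & s_{mT} \end{pmatrix}, \quad \theta = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \beta_1 \\ \vdots \\ \alpha_m \\ \beta_m \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_T \end{pmatrix},$$
$$c_{jt} = \cos(\omega_j t), \qquad s_{jt} = \sin(\omega_j t) .$$
The least squares estimates in this model are given by
$$\hat\theta = (X^\top X)^{-1} X^\top Y .$$
Note that
$$\sum_{t=1}^{T} e^{i\omega_j t} = \frac{e^{i\omega_j}\left(1 - e^{i\omega_j T}\right)}{1 - e^{i\omega_j}} = 0 \implies \sum_{t=1}^{T} c_{jt} + i\sum_{t=1}^{T} s_{jt} = 0 \implies \sum_{t=1}^{T} c_{jt} = \sum_{t=1}^{T} s_{jt} = 0$$
and
$$\sum_{t=1}^{T} c_{jt} s_{jt} = \frac{1}{2}\sum_{t=1}^{T} \sin(2\omega_j t) = 0 ,$$
$$\sum_{t=1}^{T} c_{jt}^2 = \frac{1}{2}\sum_{t=1}^{T} \{1 + \cos(2\omega_j t)\} = T/2 ,$$
$$\sum_{t=1}^{T} s_{jt}^2 = \frac{1}{2}\sum_{t=1}^{T} \{1 - \cos(2\omega_j t)\} = T/2 ,$$
$$\sum_{t=1}^{T} c_{jt} s_{kt} = \sum_{t=1}^{T} c_{jt} c_{kt} = \sum_{t=1}^{T} s_{jt} s_{kt} = 0, \qquad j \neq k .$$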

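Using these, we have
$$\hat\theta = \begin{pmatrix} \hat\alpha_0 \\ \hat\alpha_1 \\ \vdots \\ \hat\beta_m \end{pmatrix} = \begin{pmatrix} T & 0 & \cdots & 0 \\ 0 & T/2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T/2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_t y_t \\ \sum_t c_{1t} y_t \\ \vdots \\ \sum_t s_{mt} y_t \end{pmatrix} = \begin{pmatrix} \bar y \\ (2/T)\sum_t c_{1t} y_t \\ \vdots \\ (2/T)\sum_t s_{mt} y_t \end{pmatrix}$$
and the regression sum of squares is
$$\hat Y^\top \hat Y = Y^\top X (X^\top X)^{-1} X^\top Y = T\bar y^2 + \sum_{j=1}^{m} \frac{2}{T}\left[\left(\sum_{t=1}^{T} c_{jt} y_t\right)^2 + \left(\sum_{t=1}^{T} s_{jt} y_t\right)^2\right] .$$
Since we are fitting $T$ unknown parameters to $T$ data points, the model fits with no residual error, i.e., $\hat Y = Y$. Hence
$$\sum_{t=1}^{T} (y_t - \bar y)^2 = \sum_{j=1}^{m} \frac{2}{T}\left[\left(\sum_{t=1}^{T} c_{jt} y_t\right)^2 + \left(\sum_{t=1}^{T} s_{jt} y_t\right)^2\right] .$$
This motivates definition of the periodogram as
$$I(\omega) = \frac{1}{\pi T}\left[\left(\sum_{t=1}^{T} y_t \cos(\omega t)\right)^2 + \left(\sum_{t=1}^{T} y_t \sin(\omega t)\right)^2\right] .$$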

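A factor of $(1/2\pi)$ has been introduced into this definition so that the sample variance, $\hat\gamma_0 = (1/T)\sum_{t=1}^{T} (y_t - \bar y)^2$, equates to the sum of the areas of $m$ rectangles, whose heights are $I(\omega_1), \dots, I(\omega_m)$, whose widths are $2\pi/T$, and whose bases are centred at $\omega_1, \dots, \omega_m$. I.e., $\hat\gamma_0 = (2\pi/T)\sum_{j=1}^{m} I(\omega_j)$. These rectangles approximate the area under the curve $I(\omega)$, $0 \le \omega \le \pi$.

[Figure: the curve $I(\omega)$ on $[0, \pi]$ with rectangles of width $2\pi/T$ centred at the Fourier frequencies; the rectangle of height $I(\omega_5)$ at $\omega_5$ is marked.]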
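The identity $\hat\gamma_0 = (2\pi/T)\sum_{j=1}^{m} I(\omega_j)$ for odd $T$ is easy to confirm numerically. The sketch below evaluates the periodogram directly from its definition; the function names and the data are illustrative assumptions, not from the notes.

```python
import math


def periodogram(y, omega):
    """I(omega) = [ (sum y_t cos(omega t))^2 + (sum y_t sin(omega t))^2 ] / (pi T).

    Time is indexed t = 1..T, so y[t - 1] holds y_t.
    """
    T = len(y)
    a = sum(y[t - 1] * math.cos(omega * t) for t in range(1, T + 1))
    b = sum(y[t - 1] * math.sin(omega * t) for t in range(1, T + 1))
    return (a * a + b * b) / (math.pi * T)


def fourier_frequencies(T):
    """omega_j = 2 pi j / T for j = 1..m, where T = 2m + 1."""
    m = (T - 1) // 2
    return [2.0 * math.pi * j / T for j in range(1, m + 1)]


if __name__ == "__main__":
    y = [2.0, -1.0, 3.5, 0.5, -2.0, 1.0, 4.0]   # arbitrary series with T = 7
    T = len(y)
    ybar = sum(y) / T
    gamma0_hat = sum((v - ybar) ** 2 for v in y) / T
    area = (2.0 * math.pi / T) * sum(periodogram(y, w) for w in fourier_frequencies(T))
    print(gamma0_hat, area)
```

The two printed values agree to rounding error because, with $T = 2m + 1$, the regression on the $m$ Fourier frequencies fits the data exactly.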

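Using the fact that $\sum_{t=1}^{T} c_{jt} = \sum_{t=1}^{T} s_{jt} = 0$, we can write
$$\pi T I(\omega_j) = \left(\sum_{t=1}^{T} y_t \cos(\omega_j t)\right)^2 + \left(\sum_{t=1}^{T} y_t \sin(\omega_j t)\right)^2$$
$$= \left(\sum_{t=1}^{T} (y_t - \bar y)\cos(\omega_j t)\right)^2 + \left(\sum_{t=1}^{T} (y_t - \bar y)\sin(\omega_j t)\right)^2$$
$$= \left|\sum_{t=1}^{T} (y_t - \bar y) e^{i\omega_j t}\right|^2$$
$$= \sum_{t=1}^{T} (y_t - \bar y) e^{i\omega_j t} \sum_{s=1}^{T} (y_s - \bar y) e^{-i\omega_j s}$$
$$= \sum_{t=1}^{T} (y_t - \bar y)^2 + 2\sum_{k=1}^{T-1}\sum_{t=k+1}^{T} (y_t - \bar y)(y_{t-k} - \bar y)\cos(\omega_j k) .$$
Hence
$$I(\omega_j) = \frac{1}{\pi}\hat\gamma_0 + \frac{2}{\pi}\sum_{k=1}^{T-1} \hat\gamma_k \cos(\omega_j k) .$$
$I(\omega)$ is therefore a sample version of the spectral density $f(\omega)$.

4.2 Distribution of spectral estimates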

If the process is stationary and the spectral density exists then $I(\omega)$ is an almost unbiased estimator of $f(\omega)$, but it is a rather poor estimator without some smoothing.

Suppose $\{y_t\}$ is Gaussian white noise, i.e., $y_1, \dots, y_T$ are i.i.d. $N(0, \sigma^2)$. Then for any Fourier frequency $\omega = 2\pi j/T$,
$$I(\omega) = \frac{1}{\pi T}\left[A(\omega)^2 + B(\omega)^2\right] , \qquad (4.1)$$
where
$$A(\omega) = \sum_{t=1}^{T} y_t \cos(\omega t) , \qquad B(\omega) = \sum_{t=1}^{T} y_t \sin(\omega t) . \qquad (4.2)$$
Clearly $A(\omega)$ and $B(\omega)$ have zero means, and
$$\mathrm{var}[A(\omega)] = \sigma^2\sum_{t=1}^{T} \cos^2(\omega t) = T\sigma^2/2 ,$$
$$\mathrm{var}[B(\omega)] = \sigma^2\sum_{t=1}^{T} \sin^2(\omega t) = T\sigma^2/2 ,$$
$$\mathrm{cov}[A(\omega), B(\omega)] = E\left[\sum_{t=1}^{T}\sum_{s=1}^{T} y_t y_s \cos(\omega t)\sin(\omega s)\right] = \sigma^2\sum_{t=1}^{T} \cos(\omega t)\sin(\omega t) = 0 .$$
Hence $A(\omega)\sqrt{2/(T\sigma^2)}$ and $B(\omega)\sqrt{2/(T\sigma^2)}$ are independently distributed as $N(0, 1)$, and $2\left[A(\omega)^2 + B(\omega)^2\right]/(T\sigma^2)$ is distributed as $\chi^2_2$. This gives $I(\omega) \sim (\sigma^2/\pi)\chi^2_2/2$. Thus we see that $I(\omega)$ is an unbiased estimator of the spectrum, $f(\omega) = \sigma^2/\pi$, but it is not consistent, since $\mathrm{var}[I(\omega)] = \sigma^4/\pi^2$ does not tend to 0 as $T \to \infty$. This is perhaps surprising, but is explained by the fact that as $T$ increases we are attempting to estimate $f(\omega)$ for an increasing number of Fourier frequencies, with the consequence that the precision of each estimate does not change.

By a similar argument, we can show that for any two Fourier frequencies, $\omega_j$ and $\omega_k$, the estimates $I(\omega_j)$ and $I(\omega_k)$ are statistically independent. These conclusions hold more generally.

Theorem 4.1 Let $\{Y_t\}$ be a stationary Gaussian process with spectrum $f(\omega)$. Let $I(\cdot)$ be the periodogram based on samples $Y_1, \dots, Y_T$, and let $\omega_j = 2\pi j/T$, $j < T/2$, be a Fourier frequency. Then in the limit as $T \to \infty$,
(a) $I(\omega_j) \sim f(\omega_j)\chi^2_2/2$.
(b) $I(\omega_j)$ and $I(\omega_k)$ are independent for $j \neq k$.

Assuming that the underlying spectrum is smooth, $f(\omega)$ is nearly constant over a small range of $\omega$. This motivates use of an estimator for the spectrum of
$$\hat f(\omega_j) = \frac{1}{2p+1}\sum_{\ell=-p}^{p} I(\omega_{j+\ell}) .$$
Then $\hat f(\omega_j) \sim f(\omega_j)\chi^2_{2(2p+1)}/[2(2p+1)]$, which has variance $f(\omega)^2/(2p+1)$. The idea is to let $p \to \infty$ as $T \to \infty$.

4.3 The fast Fourier transform

$I(\omega_j)$ can be calculated from (4.1)-(4.2), or from
$$I(\omega_j) = \frac{1}{\pi T}\left|\sum_{t=1}^{T} y_t e^{i\omega_j t}\right|^2 .$$
Either way, this requires of order $T$ multiplications. Hence to calculate the complete periodogram, i.e., $I(\omega_1), \dots, I(\omega_m)$, requires of order $T^2$ multiplications. Computational effort can be reduced significantly by use of the fast Fourier transform, which computes $I(\omega_1), \dots, I(\omega_m)$ using only order $T\log_2 T$ multiplications.

5 Linear filters

5.1 The Filter Theorem

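A linear filter of one random sequence $\{X_t\}$ into another sequence $\{Y_t\}$ is
$$Y_t = \sum_{s=-\infty}^{\infty} a_s X_{t-s} . \qquad (5.1)$$
Theorem 5.1 (the filter theorem) Suppose $X_t$ is a stationary time series with spectral density $f_X(\omega)$. Let $\{a_t\}$ be a sequence of real numbers such that $\sum_{t=-\infty}^{\infty} |a_t| < \infty$. Then the process $Y_t = \sum_{s=-\infty}^{\infty} a_s X_{t-s}$ is a stationary time series with spectral density function
$$f_Y(\omega) = \left|A(e^{i\omega})\right|^2 f_X(\omega) = |a(\omega)|^2 f_X(\omega) ,$$
where $A(z)$ is the filter generating function
$$A(z) = \sum_{s=-\infty}^{\infty} a_s z^s , \qquad |z| \le 1 ,$$
and $a(\omega) = A(e^{i\omega})$ is the transfer function of the linear filter.

Proof.
$$\mathrm{cov}(Y_t, Y_{t+k}) = \sum_{r \in \mathbb{Z}}\sum_{s \in \mathbb{Z}} a_r a_s\, \mathrm{cov}(X_{t-r}, X_{t+k-s})$$
$$= \sum_{r,s \in \mathbb{Z}} a_r a_s \gamma_{k+r-s}$$
$$= \sum_{r,s \in \mathbb{Z}} a_r a_s \int_{-\pi}^{\pi} \tfrac{1}{2} e^{i\omega(k+r-s)} f_X(\omega)\, d\omega$$
$$= \int_{-\pi}^{\pi} A(e^{i\omega}) A(e^{-i\omega})\, \tfrac{1}{2} e^{i\omega k} f_X(\omega)\, d\omega$$
$$= \int_{-\pi}^{\pi} \tfrac{1}{2} e^{i\omega k} \left|A(e^{i\omega})\right|^2 f_X(\omega)\, d\omega$$
$$= \int_{-\pi}^{\pi} \tfrac{1}{2} e^{i\omega k} f_Y(\omega)\, d\omega .$$
Thus $f_Y(\omega)$ is the spectral density for $Y$ and $Y$ is stationary.

5.2 Application to autoregressive processes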

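Let us use the notation $B$ for the backshift operator
$$B^0 = I, \quad (B^0 X)_t = X_t, \quad (BX)_t = X_{t-1}, \quad (B^2 X)_t = X_{t-2}, \quad \dots$$
Then the AR($p$) process can be written as
$$\left(I - \sum_{r=1}^{p} \phi_r B^r\right) X = \epsilon$$
or $\phi(B)X = \epsilon$, where $\phi$ is the function
$$\phi(z) = 1 - \sum_{r=1}^{p} \phi_r z^r .$$
By the filter theorem, $f_\epsilon(\omega) = |\phi(e^{i\omega})|^2 f_X(\omega)$, so since $f_\epsilon(\omega) = \sigma^2/\pi$,
$$f_X(\omega) = \frac{\sigma^2}{\pi\left|\phi(e^{i\omega})\right|^2} . \qquad (5.2)$$
As $f_X(\omega) = (1/\pi)\sum_{k=-\infty}^{\infty} \gamma_k e^{-ik\omega}$, we can calculate the autocovariances by expanding $f_X(\omega)$ as a power series in $e^{i\omega}$. For this to work, the zeros of $\phi(z)$ must lie outside the unit circle in $\mathbb{C}$. This is the stationarity condition for the AR($p$) process.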

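Example 5.2 For the AR(1) process, $X_t - \phi_1 X_{t-1} = \epsilon_t$, we have $\phi(z) = 1 - \phi_1 z$, with its zero at $z = 1/\phi_1$. The stationarity condition is $|\phi_1| < 1$. Using (5.2) we find
$$f_X(\omega) = \frac{\sigma^2}{\pi\left|1 - \phi_1 e^{i\omega}\right|^2} = \frac{\sigma^2}{\pi\left(1 - 2\phi_1\cos\omega + \phi_1^2\right)} ,$$
which is what we found by another method in Example 3.1(d). To find the autocovariances we can write, taking $|z| = 1$,
$$\frac{1}{|\phi(z)|^2} = \frac{1}{\phi(z)\phi(1/z)} = \frac{1}{(1 - \phi_1 z)(1 - \phi_1/z)} = \sum_{r=0}^{\infty} \phi_1^r z^r \sum_{s=0}^{\infty} \phi_1^s z^{-s}$$
$$= \sum_{k=-\infty}^{\infty} z^k \left(\phi_1^{|k|}\left(1 + \phi_1^2 + \phi_1^4 + \cdots\right)\right) = \sum_{k=-\infty}^{\infty} \frac{z^k \phi_1^{|k|}}{1 - \phi_1^2}$$
$$\implies f_X(\omega) = \frac{1}{\pi}\sum_{k=-\infty}^{\infty} \frac{\sigma^2 \phi_1^{|k|}}{1 - \phi_1^2} e^{ik\omega}$$
and so $\gamma_k = \sigma^2 \phi_1^{|k|}/(1 - \phi_1^2)$ as we saw before.

In general, it is often easier to calculate the spectral density function first, using filters, and then deduce the autocovariance function from it.

5.3 Application to moving average processes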
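The MA($q$) process $X_t = \epsilon_t + \sum_{s=1}^{q} \theta_s \epsilon_{t-s}$ can be written as
$$X = \theta(B)\epsilon$$
where $\theta(z) = \sum_{s=0}^{q} \theta_s z^s$. By the filter theorem, $f_X(\omega) = |\theta(e^{i\omega})|^2 (\sigma^2/\pi)$.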

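Example 5.3 For the MA(1), $X_t = \epsilon_t + \theta_1 \epsilon_{t-1}$, $\theta(z) = 1 + \theta_1 z$ and
$$f_X(\omega) = \frac{\sigma^2}{\pi}\left(1 + 2\theta_1\cos\omega + \theta_1^2\right) .$$
As above, we can obtain the autocovariance function by expressing $f_X(\omega)$ as a power series in $e^{i\omega}$. We have
$$f_X(\omega) = \frac{\sigma^2}{\pi}\left(\theta_1 e^{-i\omega} + (1 + \theta_1^2) + \theta_1 e^{i\omega}\right) .$$
So $\gamma_0 = \sigma^2(1 + \theta_1^2)$, $\gamma_1 = \theta_1 \sigma^2$, $\gamma_k = 0$, $|k| > 1$.

As we remarked in Section 1.5, the autocovariance function of a MA(1) process with parameters $(\sigma^2, \theta_1)$ is identical to one with parameters $(\theta_1^2\sigma^2, \theta_1^{-1})$. That is,
$$\gamma_0^* = \theta_1^2\sigma^2\left(1 + 1/\theta_1^2\right) = \sigma^2\left(1 + \theta_1^2\right) = \gamma_0$$
$$\rho_1^* = \theta_1^{-1}/\left(1 + \theta_1^{-2}\right) = \theta_1/\left(1 + \theta_1^2\right) = \rho_1 .$$
In general, the MA($q$) process can be written as $X = \theta(B)\epsilon$, where
$$\theta(z) = \sum_{k=0}^{q} \theta_k z^k = \prod_{k=1}^{q} (\omega_k - z) . \qquad (5.3)$$
So the autocovariance generating function is
$$g(z) = \sum_{k=-q}^{q} \gamma_k z^k = \sigma^2\theta(z)\theta(z^{-1}) = \sigma^2\prod_{k=1}^{q} (\omega_k - z)(\omega_k - z^{-1}) .$$
Note that $(\omega_i - z)(\omega_i - z^{-1}) = \omega_i^2(\omega_i^{-1} - z)(\omega_i^{-1} - z^{-1})$. So $g(z)$ is unchanged when in (5.3) we replace $(\omega_i - z)$ by $\omega_i(\omega_i^{-1} - z)$. Thus (in the case that all roots of $\theta(z) = 0$ are real) there can be as many as $2^q$ different MA($q$) processes with the same autocovariance function. For identifiability, we assume that all the roots of $\theta(z)$ lie outside the unit circle in $\mathbb{C}$. This is equivalent to the invertibility condition, that $\epsilon_t$ can be written as a convergent power series in $\{X_t, X_{t-1}, \dots\}$.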

5.4 The general linear process

A special case of (5.1) is the general linear process,
$$Y_t = \sum_{s=0}^{\infty} a_s X_{t-s} ,$$
where $\{X_t\}$ is white noise. This has
$$\mathrm{cov}(Y_t, Y_{t+k}) = \sigma^2\sum_{s=0}^{\infty} a_s a_{s+k} \le \sigma^2\sum_{s=0}^{\infty} a_s^2 ,$$
where the inequality is an equality when $k = 0$. Thus $\{Y_t\}$ is stationary if and only if $\sum_{s=0}^{\infty} a_s^2 < \infty$. In practice the general linear process is useful when the $a_s$ are expressible in terms of a finite number of parameters which can be estimated. A rich class of such models are the ARMA models.

5.5 Filters and ARMA processes

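The ARMA($p, q$) model can be written as $\phi(B)X = \theta(B)\epsilon$. Thus

$$|\phi(e^{i\omega})|^2 f_X(\omega) = |\theta(e^{i\omega})|^2\,\frac{\sigma^2}{\pi} \implies f_X(\omega) = \left|\frac{\theta(e^{i\omega})}{\phi(e^{i\omega})}\right|^2 \frac{\sigma^2}{\pi}.$$

This is subject to the conditions that

• the zeros of $\phi$ lie outside the unit circle in $\mathbb{C}$ for stationarity.
• the zeros of $\theta$ lie outside the unit circle in $\mathbb{C}$ for identifiability.
• $\phi(z)$ and $\theta(z)$ have no common roots.

If there were a common root, say $1/\alpha$, so that $(I - \alpha B)\phi_1(B)X = (I - \alpha B)\theta_1(B)\epsilon$, then we could multiply both sides by $\sum_{n=0}^\infty \alpha^n B^n$ and deduce $\phi_1(B)X = \theta_1(B)\epsilon$, and thus that a more economical ARMA($p-1, q-1$) model suffices.

5.6 Calculating autocovariances in ARMA models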

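As above, the filter theorem can assist in calculating the autocovariances of a model. These can be compared with autocovariances estimated from the data. For example, an ARMA(1, 2) has

$$\phi(z) = 1 - \phi z, \qquad \theta(z) = 1 + \theta_1 z + \theta_2 z^2, \qquad\text{where } |\phi| < 1.$$

Then $X = C(B)\epsilon$, where

$$C(z) = \theta(z)/\phi(z) = (1 + \theta_1 z + \theta_2 z^2)\sum_{n=0}^\infty \phi^n z^n = \sum_{n=0}^\infty c_n z^n,$$

with $c_0 = 1$, $c_1 = \phi + \theta_1$, and

$$c_n = \phi^n + \phi^{n-1}\theta_1 + \phi^{n-2}\theta_2 = \phi^{n-2}(\phi^2 + \phi\theta_1 + \theta_2), \qquad n \ge 2.$$

So $X_t = \sum_{n=0}^\infty c_n\epsilon_{t-n}$ and we can compute covariances as

$$\gamma_k = \operatorname{cov}(X_t, X_{t+k}) = \sum_{n,m=0}^\infty c_n c_m \operatorname{cov}(\epsilon_{t-n}, \epsilon_{t+k-m}) = \sum_{n=0}^\infty c_n c_{n+k}\sigma^2.$$

For example, $\gamma_k = \phi\gamma_{k-1}$, $k \ge 3$. As a test of whether the model is ARMA(1, 2) we might look to see if the sample autocovariances decay geometrically, for $k \ge 2$.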
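
The calculation above can be checked numerically. The following is a minimal sketch, assuming NumPy, illustrative parameter values, and a truncation of the infinite sum at 200 terms; the function names `arma12_psi` and `autocov` are hypothetical.

```python
import numpy as np

def arma12_psi(phi, theta1, theta2, n_terms=200):
    """Coefficients c_0..c_{n_terms-1} of C(z) = theta(z)/phi(z) for ARMA(1, 2)."""
    c = np.empty(n_terms)
    c[0] = 1.0
    c[1] = phi + theta1
    n = np.arange(2, n_terms)
    c[2:] = phi ** (n - 2) * (phi ** 2 + phi * theta1 + theta2)
    return c

def autocov(c, sigma2, max_lag):
    """gamma_k = sigma2 * sum_n c_n c_{n+k}, with the sum truncated at len(c) terms."""
    return np.array([sigma2 * np.dot(c[: len(c) - k], c[k:]) for k in range(max_lag + 1)])

phi, theta1, theta2, sigma2 = 0.6, 0.5, -0.3, 1.0    # illustrative values
gamma = autocov(arma12_psi(phi, theta1, theta2), sigma2, max_lag=6)

# gamma_k / gamma_{k-1} for k = 3, ..., 6; each ratio should be close to phi.
ratios = gamma[3:] / gamma[2:-1]
print(ratios)
```

With $|\phi| < 1$ the neglected tail of the sum is of order $\phi^{2 \times 200}$, so the truncation is harmless here; a value of $\phi$ close to 1 would need more terms.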

6 Estimation of trend and seasonality

6.1 Moving averages

Consider a decomposition into trend, seasonal, cyclic and residual components,

$$X_t = T_t + I_t + C_t + E_t.$$

Thus far we have been concerned with modelling $\{E_t\}$. We have also seen that the periodogram can be useful for recognising the presence of $\{C_t\}$. We can estimate trend using a symmetric moving average,

$$\hat T_t = \sum_{s=-k}^k a_s X_{t+s},$$

where $a_s = a_{-s}$. In this case the transfer function is real-valued.

The choice of moving averages requires care. For example, we might try to estimate the trend with

$$\hat T_t = \tfrac13(X_{t-1} + X_t + X_{t+1}).$$

But suppose $X_t = T_t + \epsilon_t$, where trend is the quadratic $T_t = a + bt + ct^2$. Then

$$\hat T_t = T_t + \tfrac23 c + \tfrac13(\epsilon_{t-1} + \epsilon_t + \epsilon_{t+1}),$$

so $E\hat T_t = E X_t + \tfrac23 c$ and thus $\hat T$ is a biased estimator of the trend.

This problem is avoided if we estimate trend by fitting a polynomial of sufficient degree, e.g., to find a cubic that best fits seven successive points we minimize

$$\sum_{t=-3}^3 \left(X_t - b_0 - b_1 t - b_2 t^2 - b_3 t^3\right)^2.$$

So

$$\sum X_t = 7\hat b_0 + 28\hat b_2, \qquad \sum tX_t = 28\hat b_1 + 196\hat b_3,$$
$$\sum t^2X_t = 28\hat b_0 + 196\hat b_2, \qquad \sum t^3X_t = 196\hat b_1 + 1588\hat b_3.$$

Then

$$\hat b_0 = \tfrac1{21}\left(7\sum X_t - \sum t^2X_t\right) = \tfrac1{21}(-2X_{-3} + 3X_{-2} + 6X_{-1} + 7X_0 + 6X_1 + 3X_2 - 2X_3).$$

We estimate the trend at time 0 by $\hat T_0 = \hat b_0$, and similarly,

$$\hat T_t = \tfrac1{21}(-2X_{t-3} + 3X_{t-2} + 6X_{t-1} + 7X_t + 6X_{t+1} + 3X_{t+2} - 2X_{t+3}).$$
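
The weights in this moving average can be recovered numerically rather than by solving the normal equations by hand. The sketch below assumes NumPy and uses a hypothetical helper `polynomial_trend_weights`; it takes the first row of the pseudo-inverse of the design matrix, which is the linear map from the $2m+1$ observations to $\hat b_0$.

```python
import numpy as np

def polynomial_trend_weights(m, degree):
    """Weights w_s, s = -m..m, such that sum_s w_s X_{t+s} is the value at the
    midpoint of the least-squares polynomial of the given degree."""
    s = np.arange(-m, m + 1, dtype=float)
    design = np.vander(s, degree + 1, increasing=True)   # columns 1, s, s^2, ...
    # b_hat = pinv(design) @ X, and the fitted value at s = 0 is b_hat[0],
    # so the first row of the pseudo-inverse holds the moving-average weights.
    return np.linalg.pinv(design)[0]

weights = polynomial_trend_weights(m=3, degree=3)
print(np.round(21 * weights, 6))    # compare with [-2, 3, 6, 7, 6, 3, -2]
```

The same helper gives the weights for other choices of $m$ and degree, for instance a quadratic fitted to five points.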


A notation for this moving average is $\tfrac1{21}[-2, 3, 6, 7, 6, 3, -2]$. Note that the weights sum to 1. In general, we can fit a polynomial of degree $q$ to $2m+1$ points by applying a symmetric moving average. (We fit to an odd number of points so that the midpoint of the fitted range coincides with a point in time at which data is measured.)

A value for $q$ can be identified using the variate difference method: if $\{X_t\}$ is indeed a polynomial of degree $q$, plus residual error $\{\epsilon_t\}$, then the trend in $\Delta^r X_t$ is a polynomial of degree $q - r$ and

$$\Delta^q X_t = \text{constant} + \Delta^q\epsilon_t = \text{constant} + \epsilon_t - \binom{q}{1}\epsilon_{t-1} + \binom{q}{2}\epsilon_{t-2} - \cdots + (-1)^q\epsilon_{t-q}.$$

The variance of $\Delta^q X_t$ is therefore

$$\operatorname{var}(\Delta^q\epsilon_t) = \left[1 + \binom{q}{1}^2 + \binom{q}{2}^2 + \cdots + 1\right]\sigma^2 = \binom{2q}{q}\sigma^2,$$

where the simplification in the final line comes from looking at the coefficient of $z^q$ in expansions of both sides of $(1 + z)^q(1 + z)^q = (1 + z)^{2q}$.

Define $V_r = \operatorname{var}(\Delta^r X_t)/\binom{2r}{r}$. The fact that the plot of $V_r$ against $r$ should flatten out at $r \ge q$ can be used to identify $q$.

6.2 Centred moving averages

If there is a seasonal component then a centred moving average is useful. Suppose data is measured quarterly. Then averaging two adjacent values of the moving average $\tfrac14[1, 1, 1, 1]$, offset from one another by one quarter, is equivalent to applying once the moving average $\tfrac18[1, 2, 2, 2, 1]$. Notice that this so-called centred average of fours weights each quarter equally. Thus if $X_t = I_t + \epsilon_t$, where $I_t$ has period 4, and $I_1 + I_2 + I_3 + I_4 = 0$, then $\hat T_t$ has no seasonal component. Similarly, if data were monthly we use a centred average of 12s, that is, $\tfrac1{24}[1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1]$.

6.3 The Slutzky-Yule effect

To remove both trend and seasonal components we might successively apply a number of moving averages, one or more to remove trend and another to remove seasonal effects. This is the procedure followed by some standard forecasting packages. However, there is a danger that application of successive moving averages can introduce spurious effects. The Slutzky-Yule effect is concerned with the fact that a moving average repeatedly applied to a purely random series can introduce artificial cycles. Slutzky (1927) showed that some trade cycles of the nineteenth century were no more than artifacts of moving averages that had been used to smooth the data.

To illustrate this idea, suppose the moving average $\tfrac16[-1, 2, 4, 2, -1]$ is applied $k$ times to a white noise series. This moving average has transfer function $a(\omega) = \tfrac16(4 + 4\cos\omega - 2\cos 2\omega)$, which is maximal at $\omega = \pi/3$. The smoothed series has a spectral density, say $f_k(\omega)$, proportional to $a(\omega)^{2k}$, and hence for $\omega \ne \pi/3$, $f_k(\omega)/f_k(\pi/3) \to 0$ as $k \to \infty$. Thus in the limit the smoothed series is a periodic wave with period 6.

6.4 Exponential smoothing

Single exponential smoothing

Suppose the mean level of a series drifts slowly over time. A naive one-step-ahead forecast is $\hat X_t(1) = X_t$. However, we might let all past observations play a part in the forecast, but give greater weights to those that are more recent. Choose weights to decrease exponentially and let

$$\hat X_t(1) = \frac{1 - \omega}{1 - \omega^t}\left(X_t + \omega X_{t-1} + \omega^2 X_{t-2} + \cdots + \omega^{t-1}X_1\right),$$

where $0 < \omega < 1$. Define $S_t$ as the right hand side of the above as $t \to \infty$, i.e.,

$$S_t = (1 - \omega)\sum_{s=0}^\infty \omega^s X_{t-s}.$$

$S_t$ can serve as a one-step-ahead forecast, $\hat X_t(1)$. $S_t$ is known as simple exponential smoothing. Let $\alpha = 1 - \omega$. Simple algebra gives

$$S_t = \alpha X_t + (1 - \alpha)S_{t-1},$$
$$\hat X_t(1) = \hat X_{t-1}(1) + \alpha[X_t - \hat X_{t-1}(1)].$$

This shows that the one-step-ahead forecast at time $t$ is the one-step-ahead forecast at time $t - 1$, modified by $\alpha$ times the forecasting error incurred at time $t - 1$.

To get things started we might set $S_0$ equal to the average of the first few data points. We can play around with $\alpha$, choosing it to minimize the mean square forecasting error. In practice, $\alpha$ in the range 0.25–0.5 usually works well.
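
The recursion and the choice of $\alpha$ can be sketched in a few lines. The code below is an illustration only: the data are made up, the function names are hypothetical, $S_0$ is taken as the mean of the first three points (one possible reading of "the first few"), and $\alpha$ is chosen from a coarse grid rather than by a proper optimiser.

```python
def simple_exponential_smoothing(x, alpha, n_init=3):
    """Return S_1..S_T from S_t = alpha * x_t + (1 - alpha) * S_{t-1}.

    S_0 is set to the mean of the first n_init points (an assumption)."""
    s = sum(x[:n_init]) / n_init
    smoothed = []
    for value in x:
        s = alpha * value + (1.0 - alpha) * s
        smoothed.append(s)
    return smoothed

def one_step_mse(x, alpha, n_init=3):
    """Mean square error when S_t is used as the forecast of x_{t+1}."""
    s = simple_exponential_smoothing(x, alpha, n_init)
    errors = [x[t + 1] - s[t] for t in range(len(x) - 1)]
    return sum(e * e for e in errors) / len(errors)

series = [10.0, 10.4, 9.8, 10.9, 11.2, 10.7, 11.5, 11.9, 11.4, 12.1]   # made-up data
alphas = [0.1 * i for i in range(1, 10)]                                # coarse grid
best_alpha = min(alphas, key=lambda a: one_step_mse(series, a))
print(best_alpha, one_step_mse(series, best_alpha))
```

On a short series like this the minimising $\alpha$ is not very stable, which is one reason a rule-of-thumb range is often used instead.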
Double exponential smoothing

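Suppose the series is approximately linear, but with a slowly varying trend. If it were true that $X_t = b_0 + b_1 t + \epsilon_t$, then

$$S_t = (1 - \omega)\sum_{s=0}^\infty \omega^s\left(b_0 + b_1(t - s) + \epsilon_{t-s}\right) = b_0 + b_1 t - b_1(1 - \omega)\sum_{s=0}^\infty s\,\omega^s + (1 - \omega)\sum_{s=0}^\infty \omega^s\epsilon_{t-s},$$

and hence

$$E S_t = b_0 + b_1 t - b_1\omega/(1 - \omega) = E X_{t+1} - b_1/(1 - \omega).$$

Thus the forecast has a bias of $-b_1/(1 - \omega)$. To eliminate this bias let $S_t^1 = S_t$ be the first smoothing, and $S_t^2 = \alpha S_t^1 + (1 - \alpha)S_{t-1}^2$ be the simple exponential smoothing of $S_t^1$. Then, since $\omega/(1 - \omega) = (1 - \alpha)/\alpha$,

$$E S_t^2 = E S_t^1 - b_1(1 - \alpha)/\alpha = E X_t - 2b_1(1 - \alpha)/\alpha,$$
$$E(2S_t^1 - S_t^2) = b_0 + b_1 t, \qquad E(S_t^1 - S_t^2) = b_1(1 - \alpha)/\alpha.$$

This suggests the estimates $\hat b_0 + \hat b_1 t = 2S_t^1 - S_t^2$ and $\hat b_1 = \alpha(S_t^1 - S_t^2)/(1 - \alpha)$. The forecasting equation is then

$$\hat X_t(s) = \hat b_0 + \hat b_1(t + s) = (2S_t^1 - S_t^2) + s\,\alpha(S_t^1 - S_t^2)/(1 - \alpha).$$

As with single exponential smoothing we can experiment with choices of $\alpha$ and find $S_0^1$ and $S_0^2$ by fitting a regression line, $X_t = \hat\beta_0 + \hat\beta_1 t$, to the first few points of the series and solving

$$S_0^1 = \hat\beta_0 - (1 - \alpha)\hat\beta_1/\alpha, \qquad S_0^2 = \hat\beta_0 - 2(1 - \alpha)\hat\beta_1/\alpha.$$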
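
A minimal sketch of double exponential smoothing follows. It assumes the caller supplies the regression intercept and slope used to set $S_0^1$ and $S_0^2$ (here named `beta0` and `beta1`, hypothetical parameter names), and it checks the method on a noise-free straight line, where the forecast should reproduce the line.

```python
def double_exponential_forecast(x, alpha, beta0, beta1, horizon=1):
    """Forecast X_{T+horizon} from x = [X_1, ..., X_T] by double exponential smoothing.

    beta0 and beta1 are the intercept and slope of a line fitted to the first
    few points; they are used only to initialise the two smoothed series."""
    ratio = (1.0 - alpha) / alpha
    s1 = beta0 - ratio * beta1              # S_0^1
    s2 = beta0 - 2.0 * ratio * beta1        # S_0^2
    for value in x:
        s1 = alpha * value + (1.0 - alpha) * s1    # first smoothing
        s2 = alpha * s1 + (1.0 - alpha) * s2       # smoothing of the smoothed series
    level = 2.0 * s1 - s2                   # estimate of b_0 + b_1 T
    slope = (s1 - s2) / ratio               # estimate of b_1
    return level + horizon * slope

# Noise-free line X_t = 2 + 0.5 t for t = 1..20 (made-up example).
x_lin = [2.0 + 0.5 * t for t in range(1, 21)]
forecast = double_exponential_forecast(x_lin, alpha=0.3, beta0=2.0, beta1=0.5, horizon=3)
print(forecast)    # should be close to 2 + 0.5 * 23
```

With noisy data the forecast no longer lies exactly on a line, but the bias of the single smoothing is removed in expectation.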

6.5 Calculation of seasonal indices

Suppose data is quarterly and we want to fit an additive model. Let $\hat I_1$ be the average of $X_1, X_5, X_9, \ldots$, let $\hat I_2$ be the average of $X_2, X_6, X_{10}, \ldots$, and so on for $\hat I_3$ and $\hat I_4$. The cumulative seasonal effects over the course of a year should cancel, so that if $X_t = a + I_t$, then $X_t + X_{t+1} + X_{t+2} + X_{t+3} = 4a$. To ensure this we take our final estimates of the seasonal indices as $I_t^* = \hat I_t - \tfrac14(\hat I_1 + \cdots + \hat I_4)$.

If the model is multiplicative and $X_t = aI_t$, we again wish to see the cumulative effects over a year cancel, so that $X_t + X_{t+1} + X_{t+2} + X_{t+3} = 4a$. This means that we should take $I_t^* = \hat I_t - \tfrac14(\hat I_1 + \cdots + \hat I_4) + 1$, adjusting so the mean of $I_1^*, I_2^*, I_3^*, I_4^*$ is 1.

When both trend and seasonality are to be extracted a two-stage procedure is recommended:

(a) Make a first estimate of trend, say $\hat T_t^1$. Subtract this from $\{X_t\}$ and calculate first estimates of the seasonal indices, say $I_t^1$, from $X_t - \hat T_t^1$. The first estimate of the deseasonalised series is $Y_t^1 = X_t - I_t^1$.

(b) Make a second estimate of the trend by smoothing $Y_t^1$, say $\hat T_t^2$. Subtract this from $\{X_t\}$ and calculate second estimates of the seasonal indices, say $I_t^2$, from $X_t - \hat T_t^2$. The second estimate of the deseasonalised series is $Y_t^2 = X_t - I_t^2$.
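
The additive seasonal indices described in this section can be computed as follows. This is a sketch on made-up quarterly data; the function name is hypothetical, and it assumes the series starts in the first season of a cycle and has no trend left in it.

```python
def additive_seasonal_indices(x, period=4):
    """Seasonal indices I*_1..I*_period for an additive model.

    x[0] is assumed to fall in the first season of a cycle."""
    raw = []
    for season in range(period):
        values = x[season::period]                 # X_1, X_5, ... for season 0, etc.
        raw.append(sum(values) / len(values))      # I_hat for this season
    mean_raw = sum(raw) / period
    return [r - mean_raw for r in raw]             # adjusted so the indices sum to zero

quarterly = [12.0, 15.0, 9.0, 8.0, 13.0, 16.0, 10.0, 9.0]    # made-up data
indices = additive_seasonal_indices(quarterly)
deseasonalised = [v - indices[i % 4] for i, v in enumerate(quarterly)]
print(indices)
print(deseasonalised)
```

In the two-stage procedure the same function would be applied to the detrended series $X_t - \hat T_t^1$ and then to $X_t - \hat T_t^2$.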

7 Fitting ARIMA models

7.1 The Box-Jenkins procedure

A general ARIMA($p, d, q$) model is $\phi(B)\nabla(B)^d X = \theta(B)\epsilon$, where $\nabla(B) = I - B$. The Box-Jenkins procedure is concerned with fitting an ARIMA model to data. It has three parts: identification, estimation, and verification.

7.2 Identification

The data may require pre-processing to make it stationary. To achieve stationarity we may do any of the following.

• Look at it.
• Re-scale it (for instance, by a logarithmic or exponential transform).
• Remove deterministic components.
• Difference it. That is, take $\nabla(B)^d X$ until stationary. In practice $d = 1, 2$ should suffice.

We recognise stationarity by the observation that the autocorrelations decay to zero exponentially fast.

Once the series is stationary, we can try to fit an ARMA($p, q$) model. We consider the correlogram $r_k = \hat\gamma_k/\hat\gamma_0$ and the partial autocorrelations $\hat\phi_{k,k}$. We have already made the following observations.

• An MA($q$) process has negligible ACF after the $q$th term.
• An AR($p$) process has negligible PACF after the $p$th term.

As we have noted, very approximately, both the sample ACF and PACF have standard deviation of around $1/\sqrt T$, where $T$ is the length of the series. A rule of thumb is that ACF and PACF values are negligible when they lie between $\pm 2/\sqrt T$. An ARMA($p, q$) process has $k$th order sample ACF and PACF decaying geometrically for $k > \max(p, q)$.

7.3 Estimation

AR processes
To fit a pure AR($p$), i.e., $X_t = \sum_{r=1}^p \phi_r X_{t-r} + \epsilon_t$, we can use the Yule-Walker equations $\gamma_k = \sum_{r=1}^p \phi_r\gamma_{|k-r|}$. We fit $\phi$ by solving $\hat\gamma_k = \sum_{r=1}^p \hat\phi_r\hat\gamma_{|k-r|}$, $k = 1, \ldots, p$. These can be solved by a Levinson-Durbin recursion (similar to that used to solve for partial autocorrelations in Section 2.6). This recursion also gives the estimated residual variance $\hat\sigma_p^2$, and helps in choice of $p$ through the approximate log likelihood $-2\log L \simeq T\log(\hat\sigma_p^2)$.

Another popular way to choose $p$ is by minimizing Akaike's AIC (an information criterion), defined as $\text{AIC} = -2\log\hat L + 2k$, where $k$ is the number of parameters estimated (in the above case $p$). As motivation, suppose that in a general modelling context we attempt to fit a model with parameterised likelihood function $f(X \mid \theta)$, $\theta \in \Theta$, and this includes the true model for some $\theta_0 \in \Theta$. Let $X = (X_1, \ldots, X_n)$ be a vector of $n$ independent samples and let $\hat\theta(X)$ be the maximum likelihood estimator of $\theta$. Suppose $Y$ is a further independent sample. Then

$$-2n\,E_Y E_X \log f\big(Y \mid \hat\theta(X)\big) = -2E_X \log f\big(X \mid \hat\theta(X)\big) + 2k + O\big(1/\sqrt n\big),$$

where $k = |\Theta|$. The left hand side is $2n$ times the conditional entropy of $Y$ given $\hat\theta(X)$, i.e., the average number of bits required to specify $Y$ given $\hat\theta(X)$. The right hand side is approximately the AIC and this is to be minimized over a set of models, say $(f_1, \Theta_1), \ldots, (f_m, \Theta_m)$.

ARMA processes

Generally, we use the maximum likelihood estimators, or least squares numerical approximations to the MLEs. The essential idea is prediction error decomposition. We can factorise the joint density of $(X_1, \ldots, X_T)$ as
$$f(X_1, \ldots, X_T) = f(X_1)\prod_{t=2}^T f(X_t \mid X_1, \ldots, X_{t-1}).$$

Suppose the conditional distribution of $X_t$ given $(X_1, \ldots, X_{t-1})$ is normal with mean $\hat X_t$ and variance $P_{t-1}$, and suppose also that $X_1$ is normal $N(\hat X_1, P_0)$. Here $\hat X_t$ and $P_{t-1}$ are functions of the unknown parameters $\phi_1, \ldots, \phi_p$, $\theta_1, \ldots, \theta_q$ and the data. The log likelihood is

$$-2\log L = -2\log f = \sum_{t=1}^T\left[\log(2\pi) + \log P_{t-1} + \frac{(X_t - \hat X_t)^2}{P_{t-1}}\right].$$

We can minimize this with respect to $\phi_1, \ldots, \phi_p$, $\theta_1, \ldots, \theta_q$ to fit ARMA($p, q$). Additionally, the second derivative matrix of $-\log L$ (at the MLE) is the observed information matrix, whose inverse is an approximation to the variance-covariance matrix of the estimators.

In practice, when fitting ARMA($p, q$), the log likelihood ($-2\log L$) is modified to sum only over the range $\{m + 1, \ldots, T\}$, where $m$ is small.

Example 7.1 For AR($p$), take $m = p$, so $\hat X_t = \sum_{r=1}^p \phi_r X_{t-r}$, $t \ge m + 1$, and $P_{t-1} = \sigma^2$.
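
The Yule-Walker fit and the conditional likelihood of Example 7.1 can be sketched together. The code below assumes NumPy, simulates an AR(1) with $\phi_1 = 0.7$ purely for illustration, mean-centres the data, and solves the Yule-Walker equations with a direct linear solve rather than the Levinson-Durbin recursion; all function names are hypothetical.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """Sample autocovariances gamma_hat_0..gamma_hat_max_lag (divisor T)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    T = len(x)
    return np.array([np.dot(x[: T - k], x[k:]) / T for k in range(max_lag + 1)])

def yule_walker(x, p):
    """Solve gamma_hat_k = sum_r phi_r gamma_hat_|k-r|, k = 1..p, for phi."""
    g = sample_autocov(x, p)
    toeplitz = np.array([[g[abs(k - r)] for r in range(p)] for k in range(p)])
    phi = np.linalg.solve(toeplitz, g[1:])
    sigma2 = g[0] - np.dot(phi, g[1:])       # implied residual variance
    return phi, sigma2

def conditional_neg2_loglik(x, phi, sigma2):
    """-2 log L summed over t = p+1..T with P_{t-1} = sigma2, as in Example 7.1."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    p = len(phi)
    # x[t - p:t][::-1] is (X_{t-1}, ..., X_{t-p}), matching phi_1..phi_p.
    resid = np.array([x[t] - np.dot(phi, x[t - p:t][::-1]) for t in range(p, len(x))])
    return float(np.sum(np.log(2 * np.pi) + np.log(sigma2) + resid ** 2 / sigma2))

rng = np.random.default_rng(0)
eps = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.7 * x[t - 1] + eps[t]           # simulated AR(1), illustration only

phi_hat, sigma2_hat = yule_walker(x, p=1)
aic = conditional_neg2_loglik(x, phi_hat, sigma2_hat) + 2 * len(phi_hat)
print(phi_hat, sigma2_hat, aic)
```

To compare several orders $p$ by AIC with this approximation, the sum should start from the same $m$ for every candidate model.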

Note. When using this approximation to compare models with different numbers of parameters we should always use the same $m$.

Again we might choose $p$ and $q$ by minimizing the AIC of $-2\log\hat L + 2k$, where $k = p + q$ is the total number of parameters in the model.

7.4 Verification

The third stage in the Box-Jenkins algorithm is to check whether the model fits the data. There are several tools we may use.

• Overfitting. Add extra parameters to the model and use a likelihood ratio test or t-test to check that they are not significant.
• Residuals analysis. Calculate the residuals from the model and plot them, together with their ACF, PACF, spectral density estimates, etc., and confirm that they are consistent with white noise.

7.5 Tests for white noise

Tests for white noise include the following.

(a) The turning point test (explained in Lecture 1) compares the number of peaks and troughs to the number that would be expected for a white noise series.

(b) The Box-Pierce test is based on the statistic

$$Q_m = T\sum_{k=1}^m r_k^2,$$

where $r_k$ is the $k$th sample autocorrelation coefficient of the residual series, and $p + q < m \ll T$. It is called a portmanteau test, because it is based on the all-inclusive statistic. If the model is correct then $Q_m \sim \chi^2_{m-p-q}$ approximately. In fact, $r_k$ has variance $(T - k)/(T(T + 2))$, and a somewhat more powerful test uses the Ljung-Box statistic quoted in Section 2.7,

$$Q_m' = T(T + 2)\sum_{k=1}^m (T - k)^{-1} r_k^2,$$

where again, $Q_m' \sim \chi^2_{m-p-q}$ approximately.

(c) Another test for white noise can be constructed from the periodogram. Recall that $I(\omega_j) \sim (\sigma^2/\pi)\chi^2_2/2$ and that $I(\omega_1), \ldots, I(\omega_m)$ are mutually independent. Define $C_j = \sum_{k=1}^j I(\omega_k)$ and $U_j = C_j/C_m$. Recall that $\chi^2_2$ is the same as the exponential distribution and that if $Y_1, \ldots, Y_m$ are i.i.d. exponential random variables, then $(Y_1 + \cdots + Y_j)/(Y_1 + \cdots + Y_m)$, $j = 1, \ldots, m - 1$, have the distribution of an ordered sample of $m - 1$ uniform random variables drawn from $[0, 1]$. Hence under the hypothesis that $\{X_t\}$ is Gaussian white noise, $U_j$, $j = 1, \ldots, m - 1$, have the distribution of an ordered sample of $m - 1$ uniform random variables on $[0, 1]$. The standard test for this is the Kolmogorov-Smirnov test, which uses as a test statistic, $D$, defined as the maximum difference between the theoretical distribution function for $U[0, 1]$, $F(u) = u$, and the empirical distribution $\hat F(u) = \{\#(U_j \le u)\}/(m - 1)$. Percentage points for $D$ can be found in tables.

7.6 Forecasting with ARMA models

Recall that $\phi(B)X = \theta(B)\epsilon$, so the power series coefficients of $C(z) = \theta(z)/\phi(z) = \sum_{r=0}^\infty c_r z^r$ give an expression for $X_t$ as $X_t = \sum_{r=0}^\infty c_r\epsilon_{t-r}$.

But also, $\epsilon = D(B)X$, where $D(z) = \phi(z)/\theta(z) = \sum_{r=0}^\infty d_r z^r$, as long as the zeros of $\theta$ lie strictly outside the unit circle, and thus $\epsilon_t = \sum_{r=0}^\infty d_r X_{t-r}$. The advantage of the representation above is that given $(\ldots, X_{t-1}, X_t)$ we can calculate values for $(\ldots, \epsilon_{t-1}, \epsilon_t)$ and so can forecast $X_{t+1}$.

In general, if we want to forecast $X_{T+k}$ from $(\ldots, X_{T-1}, X_T)$ we use

$$\hat X_{T,k} = \sum_{r=k}^\infty c_r\epsilon_{T+k-r} = \sum_{r=0}^\infty c_{k+r}\epsilon_{T-r},$$

which has the least mean squared error over all linear combinations of $(\ldots, \epsilon_{T-1}, \epsilon_T)$. In fact,

$$E\big(\hat X_{T,k} - X_{T+k}\big)^2 = \sigma^2\sum_{r=0}^{k-1} c_r^2.$$

In practice, there is an alternative recursive approach. Define

$$\hat X_{T,k} = \begin{cases} X_{T+k}, & -(T - 1) \le k \le 0, \\ \text{optimal predictor of } X_{T+k} \text{ given } X_1, \ldots, X_T, & 1 \le k. \end{cases}$$

We have the recursive relation

$$\hat X_{T,k} = \sum_{r=1}^p \phi_r\hat X_{T,k-r} + \hat\epsilon_{T+k} + \sum_{s=1}^q \theta_s\hat\epsilon_{T+k-s}.$$

For $k = -(T - 1), -(T - 2), \ldots, 0$ this gives estimates of $\hat\epsilon_t$ for $t = 1, \ldots, T$. For $k > 0$, this gives a forecast $\hat X_{T,k}$ for $X_{T+k}$. We take $\hat\epsilon_t = 0$ for $t > T$.

But this needs to be started off. We need to know $(X_t, t \le 0)$ and $\epsilon_t$, $t \le 0$. There are two standard approaches.

1. Conditional approach: take $X_t = \epsilon_t = 0$, $t \le 0$.

2. Backcasting: we forecast the series in the reverse direction to determine estimators of $X_0, X_{-1}, \ldots$ and $\epsilon_0, \epsilon_{-1}, \ldots$.
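
The recursive approach with the conditional start can be sketched as follows. This is an illustration under stated assumptions: the series is zero-mean, the parameters $\phi$ and $\theta$ are taken as known, and the function name `arma_forecast` is hypothetical.

```python
def arma_forecast(x, phi, theta, horizon):
    """Forecasts of X_{T+1}, ..., X_{T+horizon} for a zero-mean ARMA(p, q).

    Uses the recursion above with the conditional start X_t = eps_t = 0 for
    t <= 0, and eps_hat_t = 0 for t > T. x holds X_1..X_T."""
    p, q = len(phi), len(theta)
    T = len(x)
    values = list(x)      # observed values, extended below with forecasts
    eps = []              # eps_hat_1..eps_hat_T, followed by zeros
    for t in range(T + horizon):
        # values[t] plays the role of X_{t+1}; terms with index < 0 are zero.
        ar = sum(phi[r] * values[t - 1 - r] for r in range(p) if t - 1 - r >= 0)
        ma = sum(theta[s] * eps[t - 1 - s] for s in range(q) if t - 1 - s >= 0)
        if t < T:
            eps.append(values[t] - ar - ma)    # residual implied by the observation
        else:
            eps.append(0.0)                    # future innovations set to zero
            values.append(ar + ma)             # forecast X_hat_{T, t - T + 1}
    return values[T:]

print(arma_forecast([1.0, 2.0, 4.0], phi=[0.5], theta=[], horizon=3))   # AR(1)
print(arma_forecast([1.0, 1.0], phi=[], theta=[0.5], horizon=2))        # MA(1)
```

For an MA($q$) model the forecasts are zero beyond $q$ steps ahead, while for an AR(1) they decay geometrically, as the two calls illustrate.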

8 State space models

8.1 Models with unobserved states

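State space models are an alternative formulation of time series with a number of advantages for forecasting.

1. All ARMA models can be written as state space models.
2. Nonstationary models (e.g., ARMA with time varying coefficients) are also state space models.
3. Multivariate time series can be handled more easily.
4. State space models are consistent with Bayesian methods.

In general, the model consists of

observed data:      $X_t = F_t S_t + v_t$
unobserved state:   $S_t = G_t S_{t-1} + w_t$
observation noise:  $v_t \sim N(0, V_t)$
state noise:        $w_t \sim N(0, W_t)$

where $v_t$, $w_t$ are independent and $F_t$, $G_t$ are known matrices, often time dependent (e.g., because of seasonality).

Example 8.1 $X_t = S_t + v_t$, $S_t = \phi S_{t-1} + w_t$. Define $Y_t = X_t - \phi X_{t-1} = (S_t + v_t) - \phi(S_{t-1} + v_{t-1}) = w_t + v_t - \phi v_{t-1}$. The autocorrelations of $\{Y_t\}$ are zero at all lags greater than 1. So $\{Y_t\}$ is MA(1) and thus $\{X_t\}$ is ARMA(1, 1).

Example 8.2 The general ARMA($p, q$) model $X_t = \sum_{r=1}^p \phi_r X_{t-r} + \sum_{s=0}^q \theta_s\epsilon_{t-s}$ is a state space model. We write $X_t = F_t S_t$, where

$$F_t = (\phi_1, \phi_2, \ldots, \phi_p, 1, \theta_1, \ldots, \theta_q), \qquad S_t = (X_{t-1}, \ldots, X_{t-p}, \epsilon_t, \ldots, \epsilon_{t-q})^\top \in \mathbb{R}^{p+q+1},$$

with $v_t = 0$, $V_t = 0$, and

$$
S_t = G_t S_{t-1} + w_t =
\begin{pmatrix}
\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p & 1 & \theta_1 & \cdots & \theta_{q-1} & \theta_q \\
1 & 0 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots & \vdots & & & & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & & & & \vdots & \vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} X_{t-2} \\ X_{t-3} \\ \vdots \\ X_{t-p-1} \\ \epsilon_{t-1} \\ \epsilon_{t-2} \\ \vdots \\ \epsilon_{t-q-1} \end{pmatrix}
+
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \epsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
$$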
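
As a check on this construction, the sketch below builds $F$ and $G$ for given coefficients and confirms that iterating the state equation reproduces the ARMA recursion. It assumes NumPy, time-invariant coefficients with $p \ge 1$ in the example, illustrative parameter values, and zero pre-sample values; the function name `arma_state_space` is hypothetical.

```python
import numpy as np

def arma_state_space(phi, theta):
    """Return (F, G) for the state S_t = (X_{t-1}, ..., X_{t-p}, eps_t, ..., eps_{t-q})."""
    p, q = len(phi), len(theta)
    n = p + q + 1
    F = np.concatenate([phi, [1.0], theta])
    G = np.zeros((n, n))
    if p > 0:
        G[0, :] = F                   # first component of S_t is X_{t-1} = F S_{t-1}
    for i in range(1, p):
        G[i, i - 1] = 1.0             # shift the lagged X values down by one
    for j in range(p + 1, n):
        G[j, j - 1] = 1.0             # shift the lagged innovations down by one
    return F, G                       # row p of G is zero: eps_t enters through w_t

rng = np.random.default_rng(1)
phi, theta = [0.5, -0.2], [0.4]       # illustrative ARMA(2, 1) coefficients
F, G = arma_state_space(phi, theta)
p = len(phi)
eps = rng.normal(size=50)

# Iterate the state equation, starting from zero pre-sample values.
state = np.zeros(len(F))
state[p] = eps[0]
x_state = [F @ state]
for t in range(1, 50):
    w = np.zeros(len(F))
    w[p] = eps[t]
    state = G @ state + w
    x_state.append(F @ state)

# Direct ARMA recursion for comparison.
x_direct = []
for t in range(50):
    ar = sum(phi[r] * x_direct[t - 1 - r] for r in range(p) if t - 1 - r >= 0)
    ma = sum(theta[s] * eps[t - 1 - s] for s in range(len(theta)) if t - 1 - s >= 0)
    x_direct.append(ar + eps[t] + ma)

print(np.allclose(x_state, x_direct))
```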

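8.2 The Kalman filter

Given observed data $X_1, \ldots, X_t$ we want to find the conditional distribution of $S_t$ and a forecast of $X_{t+1}$. Recall the following multivariate normal fact: If

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \sim N\left(\begin{pmatrix}\mu_1 \\ \mu_2\end{pmatrix}, \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22}\end{pmatrix}\right) \qquad (8.1)$$

then

$$(Y_1 \mid Y_2) \sim N\left(\mu_1 + A_{12}A_{22}^{-1}(Y_2 - \mu_2),\; A_{11} - A_{12}A_{22}^{-1}A_{21}\right). \qquad (8.2)$$

Conversely, if $(Y_1 \mid Y_2)$ satisfies (8.2), and $Y_2 \sim N(\mu_2, A_{22})$ then the joint distribution is as in (8.1).

Now let $\mathcal F_{t-1} = (X_1, \ldots, X_{t-1})$ and suppose we know that $(S_{t-1} \mid \mathcal F_{t-1}) \sim N(\hat S_{t-1}, P_{t-1})$. Then $S_t = G_t S_{t-1} + w_t$, so

$$(S_t \mid \mathcal F_{t-1}) \sim N\left(G_t\hat S_{t-1},\; G_t P_{t-1} G_t^\top + W_t\right),$$

and also $(X_t \mid S_t, \mathcal F_{t-1}) \sim N(F_t S_t, V_t)$.

Put $Y_1 = X_t$ and $Y_2 = S_t$. Let $R_t = G_t P_{t-1} G_t^\top + W_t$. Taking all variables conditional on $\mathcal F_{t-1}$ we can use the converse of the multivariate normal fact and identify

$$\mu_2 = G_t\hat S_{t-1} \quad\text{and}\quad A_{22} = R_t.$$

Since $S_t$ is a random variable,

$$\mu_1 + A_{12}A_{22}^{-1}(S_t - \mu_2) = F_t S_t \implies A_{12} = F_t R_t \text{ and } \mu_1 = F_t\mu_2.$$

Also

$$A_{11} - A_{12}A_{22}^{-1}A_{21} = V_t \implies A_{11} = V_t + F_t R_t R_t^{-1} R_t F_t^\top = V_t + F_t R_t F_t^\top.$$

What this says is that

$$\left.\begin{pmatrix} X_t \\ S_t\end{pmatrix}\,\right|\, \mathcal F_{t-1} \sim N\left(\begin{pmatrix} F_t G_t\hat S_{t-1} \\ G_t\hat S_{t-1}\end{pmatrix}, \begin{pmatrix} V_t + F_t R_t F_t^\top & F_t R_t \\ R_t F_t^\top & R_t\end{pmatrix}\right).$$

Now apply the multivariate normal fact directly to get $(S_t \mid X_t, \mathcal F_{t-1}) = (S_t \mid \mathcal F_t) \sim N(\hat S_t, P_t)$, where

$$\hat S_t = G_t\hat S_{t-1} + R_t F_t^\top\left(V_t + F_t R_t F_t^\top\right)^{-1}\left(X_t - F_t G_t\hat S_{t-1}\right),$$
$$P_t = R_t - R_t F_t^\top\left(V_t + F_t R_t F_t^\top\right)^{-1} F_t R_t.$$

These are the Kalman filter updating equations.

Note the form of the right hand side of the expression for $\hat S_t$. It contains the term $G_t\hat S_{t-1}$, which is simply what we would predict if it were known that $S_{t-1} = \hat S_{t-1}$, plus a term that depends on the observed error in forecasting $X_t$, i.e., $X_t - F_t G_t\hat S_{t-1}$. This is similar to the forecast updating expression for simple exponential smoothing in Section 6.4.

All we need to start updating the estimates are the initial values $\hat S_0$ and $P_0$. Three ways are commonly used.

1. Use a Bayesian prior distribution.
2. If $F, G, V, W$ do not depend on $t$ and the process is stationary, we could use the stationary distribution of $S$ to start.
3. Choosing $\hat S_0 = 0$, $P_0 = kI$ ($k$ large) reflects prior ignorance.

8.3 Prediction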

Suppose we want to predict $X_{T+k}$ given $(X_1, \ldots, X_T)$. We already have

$$(X_{T+1} \mid X_1, \ldots, X_T) \sim N\left(F_{T+1}G_{T+1}\hat S_T,\; V_{T+1} + F_{T+1}R_{T+1}F_{T+1}^\top\right),$$

which solves the problem for the case $k = 1$. By induction we can show that

$$(S_{T+k} \mid X_1, \ldots, X_T) \sim N\left(\hat S_{T,k}, P_{T,k}\right),$$

where

$$\hat S_{T,0} = \hat S_T, \qquad P_{T,0} = P_T,$$
$$\hat S_{T,k} = G_{T+k}\hat S_{T,k-1}, \qquad P_{T,k} = G_{T+k}P_{T,k-1}G_{T+k}^\top + W_{T+k},$$

and hence that $(X_{T+k} \mid X_1, \ldots, X_T) \sim N\left(F_{T+k}\hat S_{T,k},\; V_{T+k} + F_{T+k}P_{T,k}F_{T+k}^\top\right)$.

8.4 Parameter estimation revisited

In practice, of course, we may not know the matrices $F_t, G_t, V_t, W_t$. For example, in ARMA($p, q$) they will depend on the parameters $\phi_1, \ldots, \phi_p$, $\theta_1, \ldots, \theta_q$, $\sigma^2$, which we may not know. We saw that when performing prediction error decomposition we needed to calculate the distribution of $(X_t \mid X_1, \ldots, X_{t-1})$. This we have now done.

Example 8.3 Consider the state space model

observed data:      $X_t = S_t + v_t$,
unobserved state:   $S_t = S_{t-1} + w_t$,

where $v_t$, $w_t$ are independent errors, $v_t \sim N(0, V)$ and $w_t \sim N(0, W)$. Then we have $F_t = 1$, $G_t = 1$, $V_t = V$, $W_t = W$, and $R_t = P_{t-1} + W$. So if $(S_{t-1} \mid X_1, \ldots, X_{t-1}) \sim N(\hat S_{t-1}, P_{t-1})$ then $(S_t \mid X_1, \ldots, X_t) \sim N(\hat S_t, P_t)$, where

$$\hat S_t = \hat S_{t-1} + R_t(V + R_t)^{-1}(X_t - \hat S_{t-1}),$$
$$P_t = R_t - \frac{R_t^2}{V + R_t} = \frac{V R_t}{V + R_t} = \frac{V(P_{t-1} + W)}{V + P_{t-1} + W}.$$

Asymptotically, $P_t \to P$, where $P$ is the positive root of $P^2 + WP - WV = 0$, and $\hat S_t$ behaves like $\hat S_t = (1 - \omega)\sum_{r=0}^\infty \omega^r X_{t-r}$, where $\omega = V/(V + W + P)$. Note that this is simple exponential smoothing.

Equally, we can predict $S_{T+k}$ given $(X_1, \ldots, X_T)$ as $N(\hat S_{T,k}, P_{T,k})$, where

$$\hat S_{T,0} = \hat S_T, \qquad P_{T,0} = P_T, \qquad \hat S_{T,k} = \hat S_T, \qquad P_{T,k} = P_T + kW.$$

So $(X_{T+k} \mid X_1, \ldots, X_T) \sim N(\hat S_T, V + P_T + kW)$.
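
The updating equations of Example 8.3 are short enough to run directly. The sketch below uses made-up observations and illustrative variances, starts from a vague prior ($\hat S_0 = 0$, $P_0$ large), and compares the final $P_t$ with the positive root of $P^2 + WP - WV = 0$; the function name `local_level_filter` is hypothetical.

```python
import math

def local_level_filter(x, V, W, s0=0.0, p0=1e6):
    """Kalman filter for X_t = S_t + v_t, S_t = S_{t-1} + w_t.

    Returns the lists of S_hat_t and P_t; s0 and p0 encode a vague prior."""
    s_hat, p = s0, p0
    states, variances = [], []
    for value in x:
        r = p + W                              # R_t = P_{t-1} + W
        gain = r / (V + r)                     # R_t (V + R_t)^{-1}
        s_hat = s_hat + gain * (value - s_hat)
        p = V * r / (V + r)                    # P_t = V R_t / (V + R_t)
        states.append(s_hat)
        variances.append(p)
    return states, variances

V, W = 2.0, 0.5                                # illustrative variances
data = [1.2, 0.7, 1.9, 2.4, 1.6, 2.8, 3.1, 2.5, 3.3, 3.0] * 5    # made-up observations
states, variances = local_level_filter(data, V, W)

p_limit = (-W + math.sqrt(W * W + 4.0 * W * V)) / 2.0    # positive root of P^2 + WP - WV = 0
omega = V / (V + W + p_limit)                  # limiting weight of the exponential smoothing
print(variances[-1], p_limit, omega)
```

Once $P_t$ has settled, each update is $\hat S_t = (1 - \omega)X_t + \omega\hat S_{t-1}$, which is the simple exponential smoothing recursion of Section 6.4 with $\alpha = 1 - \omega$.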