
Avd. Matematisk statistik

Formulas and survey: Time series analysis


Jan Grandell

Contents

1 Some notation
2 General probabilistic formulas
  2.1 Some distributions
  2.2 Estimation
3 Stochastic processes
4 Stationarity
5 Spectral theory
6 Time series models
  6.1 ARMA processes
  6.2 ARIMA and FARIMA processes
  6.3 Financial time series
7 Prediction
  7.1 Prediction for stationary time series
  7.2 Prediction of an ARMA Process
8 Partial correlation
  8.1 Partial autocorrelation
9 Linear filters
10 Estimation in time series
  10.1 Estimation of $\mu$
  10.2 Estimation of $\gamma(\cdot)$ and $\rho(\cdot)$
  10.3 Estimation of the spectral density
    10.3.1 The periodogram
    10.3.2 Smoothing the periodogram
11 Estimation for ARMA models
  11.1 Yule-Walker estimation
  11.2 Burg's algorithm
  11.3 The innovations algorithm
  11.4 The Hannan-Rissanen algorithm
  11.5 Maximum Likelihood and Least Squares estimation
  11.6 Order selection
12 Multivariate time series
13 Kalman filtering
Index

1 Some notation

$\mathbb{R} = (-\infty, \infty)$
$\mathbb{Z} = \{0, \pm 1, \pm 2, \dots\}$
$\mathbb{C} =$ the complex numbers $= \{x + iy;\ x \in \mathbb{R},\ y \in \mathbb{R}\}$
$\stackrel{\mathrm{def}}{=}$ means equality by definition.

2 General probabilistic formulas

$(\Omega, \mathcal{F}, P)$ is a probability space, where:

$\Omega$ is the sample space, i.e. the set of all possible outcomes of an experiment.

$\mathcal{F}$ is a $\sigma$-field (or a $\sigma$-algebra), i.e.
(a) $\emptyset \in \mathcal{F}$;
(b) if $A_1, A_2, \dots \in \mathcal{F}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$;
(c) if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$.

$P$ is a probability measure, i.e. a function $\mathcal{F} \to [0, 1]$ satisfying
(a) $P(\Omega) = 1$;
(b) $P(A) = 1 - P(A^c)$;
(c) if $A_1, A_2, \dots \in \mathcal{F}$ are disjoint, then $P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)$.

Definition 2.1 A random variable $X$ defined on $(\Omega, \mathcal{F}, P)$ is a function $X:\Omega \to \mathbb{R}$ such that $\{\omega : X(\omega) \le x\} \in \mathcal{F}$ for all $x \in \mathbb{R}$.

Let $X$ be a random variable.
$F_X(x) = P\{X \le x\}$ is the distribution function (fördelningsfunktionen).
$f_X(\cdot)$, given by $F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy$, is the density function (täthetsfunktionen).
$p_X(k) = P\{X = k\}$ is the probability function (sannolikhetsfunktionen).
$\varphi_X(u) = E\,e^{iuX}$ is the characteristic function (karakteristiska funktionen).

Definition 2.2 Let $X_1, X_2, \dots$ be a sequence of random variables. We say that $X_n$ converges in probability to the real number $a$, written $X_n \overset{P}{\to} a$, if for every $\varepsilon > 0$, $\lim_{n\to\infty} P(|X_n - a| > \varepsilon) = 0$.

Definition 2.3 Let $X_1, X_2, \dots$ be a sequence of random variables with finite second moments. We say that $X_n$ converges in mean square to the random variable $X$, written $X_n \overset{\mathrm{m.s.}}{\to} X$, if $E[(X_n - X)^2] \to 0$ as $n \to \infty$.

An important property of mean-square convergence is that Cauchy sequences do converge. More precisely, this means that if $X_1, X_2, \dots$ have finite second moments and if $E[(X_n - X_k)^2] \to 0$ as $n, k \to \infty$, then there exists a random variable $X$ with finite second moment such that $X_n \overset{\mathrm{m.s.}}{\to} X$. The space of square-integrable random variables is complete under mean-square convergence.

2.1 Some distributions

The Binomial Distribution: $X \in \mathrm{Bin}(n, p)$ if
$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n \ \text{ and } \ 0 < p < 1.$$
$E(X) = np$, $\mathrm{Var}(X) = np(1-p)$.

The Poisson Distribution: $X \in \mathrm{Po}(\lambda)$ if
$$p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, \dots \ \text{ and } \ \lambda > 0.$$
$E(X) = \lambda$, $\mathrm{Var}(X) = \lambda$, $\varphi_X(u) = e^{-\lambda(1 - e^{iu})}$.

The Exponential Distribution: $X \in \mathrm{Exp}(\lambda)$ if
$$f_X(x) = \begin{cases} \frac{1}{\lambda} e^{-x/\lambda} & \text{if } x \ge 0,\\ 0 & \text{if } x < 0, \end{cases} \qquad \lambda > 0.$$
$E(X) = \lambda$, $\mathrm{Var}(X) = \lambda^2$.

The Standard Normal Distribution: $X \in N(0, 1)$ if
$$f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad x \in \mathbb{R}.$$
$E(X) = 0$, $\mathrm{Var}(X) = 1$, $\varphi_X(u) = e^{-u^2/2}$.
The density function is often denoted by $\phi(\cdot)$ and the distribution function by $\Phi(\cdot)$.

The Normal Distribution: $X \in N(\mu, \sigma^2)$ if $\frac{X - \mu}{\sigma} \in N(0, 1)$, $\mu \in \mathbb{R}$, $\sigma > 0$.
$E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$, $\varphi_X(u) = e^{i\mu u - \sigma^2 u^2/2}$.

The (multivariate) Normal Distribution: $Y = (Y_1, \dots, Y_m)' \in N(\mu, \Lambda)$ if there exist a vector $\mu = (\mu_1, \dots, \mu_m)'$, an $m \times n$ matrix $B = (b_{ij})$ with $\Lambda = BB'$, and a random vector $X = (X_1, \dots, X_n)'$ with independent and $N(0,1)$-distributed components, such that $Y = \mu + BX$.

If
$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \in N\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right),$$
then $Y_1$ conditional on $Y_2 = y_2$ is $N\!\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y_2 - \mu_2),\ \sigma_1^2(1 - \rho^2)\right)$.

More generally, if
$$\begin{pmatrix} \boldsymbol{Y}_1 \\ \boldsymbol{Y}_2 \end{pmatrix} \in N\!\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix} \right),$$
then $\boldsymbol{Y}_1$ conditional on $\boldsymbol{Y}_2 = \boldsymbol{y}_2$ is $N\!\left(\boldsymbol{\mu}_1 + \Lambda_{12}\Lambda_{22}^{-1}(\boldsymbol{y}_2 - \boldsymbol{\mu}_2),\ \Lambda_{11} - \Lambda_{12}\Lambda_{22}^{-1}\Lambda_{21}\right)$.

Asymptotic normality

Definition 2.4 Let $Y_1, Y_2, \dots$ be a sequence of random variables. $Y_n \in \mathrm{AN}(\mu_n, \sigma_n^2)$ means that
$$\lim_{n\to\infty} P\!\left( \frac{Y_n - \mu_n}{\sigma_n} \le x \right) = \Phi(x).$$

Definition 2.5 Let $\boldsymbol{Y}_1, \boldsymbol{Y}_2, \dots$ be a sequence of random $k$-vectors. $\boldsymbol{Y}_n \in \mathrm{AN}(\boldsymbol{\mu}_n, \Sigma_n)$ means that
(a) $\Sigma_1, \Sigma_2, \dots$ have no zero diagonal elements;
(b) $\boldsymbol{\lambda}'\boldsymbol{Y}_n \in \mathrm{AN}(\boldsymbol{\lambda}'\boldsymbol{\mu}_n, \boldsymbol{\lambda}'\Sigma_n\boldsymbol{\lambda})$ for every $\boldsymbol{\lambda} \in \mathbb{R}^k$ such that $\boldsymbol{\lambda}'\Sigma_n\boldsymbol{\lambda} > 0$ for all sufficiently large $n$.

2.2 Estimation

Let $x_1, \dots, x_n$ be observations of random variables $X_1, \dots, X_n$ with a (known) distribution depending on the unknown parameter $\theta$. A point estimate (punktskattning) of $\theta$ is then a value $\theta^* = \theta^*(x_1, \dots, x_n)$. In order to analyze the estimate we consider the estimator (stickprovsvariabeln) $\theta^*(X_1, \dots, X_n)$. Some nice properties of an estimate are the following:

An estimate $\theta^*$ of $\theta$ is unbiased (väntevärdesriktig) if $E(\theta^*(X_1, \dots, X_n)) = \theta$ for all $\theta$.

An estimate $\theta^*$ of $\theta$ is consistent if $P(|\theta^*(X_1, \dots, X_n) - \theta| > \varepsilon) \to 0$ for $n \to \infty$.

If $\theta^*$ and $\theta^{**}$ are unbiased estimates of $\theta$, we say that $\theta^*$ is more effective than $\theta^{**}$ if $\mathrm{Var}(\theta^*(X_1, \dots, X_n)) \le \mathrm{Var}(\theta^{**}(X_1, \dots, X_n))$ for all $\theta$.

3 Stochastic processes

Definition 3.1 (Stochastic process) A stochastic process is a family of random variables $\{X_t,\ t \in T\}$ defined on a probability space $(\Omega, \mathcal{F}, P)$. A stochastic process with $T \subset \mathbb{Z}$ is often called a time series.

Definition 3.2 (The distribution of a stochastic process) Put $\mathcal{T} = \{\boldsymbol{t} \in T^n : t_1 < t_2 < \dots < t_n,\ n = 1, 2, \dots\}$. The (finite-dimensional) distribution functions are the family $\{F_{\boldsymbol{t}}(\cdot),\ \boldsymbol{t} \in \mathcal{T}\}$ defined by
$$F_{\boldsymbol{t}}(\boldsymbol{x}) = P(X_{t_1} \le x_1, \dots, X_{t_n} \le x_n), \quad \boldsymbol{t} \in T^n,\ \boldsymbol{x} \in \mathbb{R}^n.$$
By the distribution of $\{X_t,\ t \in T \subset \mathbb{R}\}$ we mean the family $\{F_{\boldsymbol{t}}(\cdot),\ \boldsymbol{t} \in \mathcal{T}\}$.

Definition 3.3 Let $\{X_t,\ t \in T\}$ be a stochastic process with $\mathrm{Var}(X_t) < \infty$.
The mean function of $\{X_t\}$ is $\mu_X(t) = E(X_t)$, $t \in T$.
The covariance function of $\{X_t\}$ is $\gamma_X(r, s) = \mathrm{Cov}(X_r, X_s)$, $r, s \in T$.

Definition 3.4 (Standard Brownian motion) A standard Brownian motion, or a standard Wiener process, $\{B(t),\ t \ge 0\}$ is a stochastic process satisfying
(a) $B(0) = 0$;
(b) for every $\boldsymbol{t} = (t_0, t_1, \dots, t_n)$ with $0 = t_0 < t_1 < \dots < t_n$ the random variables $\Delta_1 = B(t_1) - B(t_0), \dots, \Delta_n = B(t_n) - B(t_{n-1})$ are independent;
(c) $B(t) - B(s) \in N(0, t - s)$ for $t \ge s$.

Definition 3.5 (Poisson process) A Poisson process $\{N(t),\ t \ge 0\}$ with mean rate (or intensity) $\lambda$ is a stochastic process satisfying
(a) $N(0) = 0$;
(b) for every $\boldsymbol{t} = (t_0, t_1, \dots, t_n)$ with $0 = t_0 < t_1 < \dots < t_n$ the random variables $\Delta_1 = N(t_1) - N(t_0), \dots, \Delta_n = N(t_n) - N(t_{n-1})$ are independent;
(c) $N(t) - N(s) \in \mathrm{Po}(\lambda(t - s))$ for $t \ge s$.

Definition 3.6 (Gaussian time series) The time series $\{X_t,\ t \in \mathbb{Z}\}$ is said to be a Gaussian time series if all its finite-dimensional distributions are normal.

4 Stationarity

Definition 4.1 The time series $\{X_t,\ t \in \mathbb{Z}\}$ is said to be strictly stationary if the distributions of $(X_{t_1}, \dots, X_{t_k})'$ and $(X_{t_1+h}, \dots, X_{t_k+h})'$ are the same for all $k$ and all $t_1, \dots, t_k, h \in \mathbb{Z}$.

Definition 4.2 The time series $\{X_t,\ t \in \mathbb{Z}\}$ is said to be (weakly) stationary if (see Definition 3.3 for notation)
(i) $\mathrm{Var}(X_t) < \infty$ for all $t \in \mathbb{Z}$,
(ii) $\mu_X(t) = \mu$ for all $t \in \mathbb{Z}$,
(iii) $\gamma_X(r, s) = \gamma_X(r + t, s + t)$ for all $r, s, t \in \mathbb{Z}$.

(iii) implies that $\gamma_X(r, s)$ is a function of $r - s$, and it is convenient to define $\gamma_X(h) \stackrel{\mathrm{def}}{=} \gamma_X(h, 0)$. The value $h$ is referred to as the lag.

Definition 4.3 Let $\{X_t,\ t \in \mathbb{Z}\}$ be a stationary time series. The autocovariance function (ACVF) of $\{X_t\}$ is $\gamma_X(h) \stackrel{\mathrm{def}}{=} \mathrm{Cov}(X_{t+h}, X_t)$. The autocorrelation function (ACF) is
$$\rho_X(h) \stackrel{\mathrm{def}}{=} \frac{\gamma_X(h)}{\gamma_X(0)}.$$

5 Spectral theory

Definition 5.1 The complex-valued time series $\{X_t,\ t \in \mathbb{Z}\}$ is said to be stationary if
(i) $E|X_t|^2 < \infty$ for all $t \in \mathbb{Z}$,
(ii) $EX_t$ is independent of $t$ for all $t \in \mathbb{Z}$,
(iii) $E[X_{t+h}\overline{X_t}]$ is independent of $t$ for all $t \in \mathbb{Z}$.

Definition 5.2 The autocovariance function $\gamma(\cdot)$ of a complex-valued stationary time series $\{X_t\}$ is
$$\gamma(h) = E[X_{t+h}\overline{X_t}] - EX_{t+h}\,E\overline{X_t}.$$

Suppose that $\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty$. The function
$$f(\lambda) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} e^{-ih\lambda}\gamma(h), \quad -\pi \le \lambda \le \pi, \tag{1}$$
is called the spectral density of the time series $\{X_t,\ t \in \mathbb{Z}\}$. We have the spectral representation of the ACVF
$$\gamma(h) = \int_{-\pi}^{\pi} e^{ih\lambda} f(\lambda)\,d\lambda.$$
For a real-valued time series $f$ is symmetric, i.e. $f(\lambda) = f(-\lambda)$.

For any stationary time series the ACVF has the representation
$$\gamma(h) = \int_{(-\pi,\pi]} e^{ih\lambda}\,dF(\lambda) \quad \text{for all } h \in \mathbb{Z},$$
where the spectral distribution function $F(\cdot)$ is a right-continuous, non-decreasing, bounded function on $[-\pi, \pi]$ with $F(-\pi) = 0$. The time series itself has the spectral representation
$$X_t = \int_{(-\pi,\pi]} e^{it\lambda}\,dZ(\lambda),$$
where $\{Z(\lambda),\ \lambda \in [-\pi, \pi]\}$ is an orthogonal-increment process.

Definition 5.3 (Orthogonal-increment process) An orthogonal-increment process on $[-\pi, \pi]$ is a complex-valued process $\{Z(\lambda)\}$ such that
$$\langle Z(\lambda), Z(\lambda) \rangle < \infty, \quad -\pi \le \lambda \le \pi,$$
$$\langle Z(\lambda), 1 \rangle = 0, \quad -\pi \le \lambda \le \pi,$$
and
$$\langle Z(\lambda_4) - Z(\lambda_3),\ Z(\lambda_2) - Z(\lambda_1) \rangle = 0 \quad \text{if } (\lambda_1, \lambda_2] \cap (\lambda_3, \lambda_4] = \emptyset,$$
where $\langle X, Y \rangle = EX\overline{Y}$.

6 Time series models

Definition 6.1 (White noise) A process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be white noise with mean $\mu$ and variance $\sigma^2$, written $\{X_t\} \sim \mathrm{WN}(\mu, \sigma^2)$, if $EX_t = \mu$ and
$$\gamma(h) = \begin{cases} \sigma^2 & \text{if } h = 0,\\ 0 & \text{if } h \ne 0. \end{cases}$$
A $\mathrm{WN}(\mu, \sigma^2)$ process has spectral density
$$f(\lambda) = \frac{\sigma^2}{2\pi}, \quad -\pi \le \lambda \le \pi.$$

Definition 6.2 (Linear processes) The process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be a linear process if it has the representation
$$X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}, \quad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2),$$
where $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$.

A linear process is stationary with mean 0, autocovariance function
$$\gamma(h) = \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h}\,\sigma^2,$$
and spectral density
$$f(\lambda) = \frac{\sigma^2}{2\pi}\,|\psi(e^{-i\lambda})|^2, \quad \text{where } \psi(z) = \sum_{j=-\infty}^{\infty} \psi_j z^j.$$

Definition 6.3 (IID noise) A process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be IID noise with mean 0 and variance $\sigma^2$, written $\{X_t\} \sim \mathrm{IID}(0, \sigma^2)$, if the random variables $X_t$ are independent and identically distributed with $EX_t = 0$ and $\mathrm{Var}(X_t) = \sigma^2$.

6.1 ARMA processes

Definition 6.4 (The ARMA(p, q) process) The process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be an ARMA(p, q) process if it is stationary and if
$$X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}, \tag{2}$$
where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$. We say that $\{X_t\}$ is an ARMA(p, q) process with mean $\mu$ if $\{X_t - \mu\}$ is an ARMA(p, q) process.

Equations (2) can be written as
$$\phi(B)X_t = \theta(B)Z_t, \quad t \in \mathbb{Z},$$
where
$$\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p, \qquad \theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q,$$
and $B$ is the backward shift operator, i.e. $(B^j X)_t = X_{t-j}$. The polynomials $\phi(\cdot)$ and $\theta(\cdot)$ are called generating polynomials.

Definition 6.5 An ARMA(p, q) process defined by the equations $\phi(B)X_t = \theta(B)Z_t$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, is said to be causal if there exist constants $\{\psi_j\}$ such that $\sum_{j=0}^{\infty} |\psi_j| < \infty$ and
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}, \quad t \in \mathbb{Z}. \tag{3}$$

Theorem 6.1 Let $\{X_t\}$ be an ARMA(p, q) process for which $\phi(\cdot)$ and $\theta(\cdot)$ have no common zeros. Then $\{X_t\}$ is causal if and only if $\phi(z) \ne 0$ for all $|z| \le 1$. The coefficients $\{\psi_j\}$ in (3) are determined by the relation
$$\psi(z) = \sum_{j=0}^{\infty} \psi_j z^j = \frac{\theta(z)}{\phi(z)}, \quad |z| \le 1.$$

Definition 6.6 An ARMA(p, q) process defined by the equations $\phi(B)X_t = \theta(B)Z_t$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, is said to be invertible if there exist constants $\{\pi_j\}$ such that $\sum_{j=0}^{\infty} |\pi_j| < \infty$ and
$$Z_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}, \quad t \in \mathbb{Z}. \tag{4}$$

Theorem 6.2 Let $\{X_t\}$ be an ARMA(p, q) process for which $\phi(\cdot)$ and $\theta(\cdot)$ have no common zeros. Then $\{X_t\}$ is invertible if and only if $\theta(z) \ne 0$ for all $|z| \le 1$. The coefficients $\{\pi_j\}$ in (4) are determined by the relation
$$\pi(z) = \sum_{j=0}^{\infty} \pi_j z^j = \frac{\phi(z)}{\theta(z)}, \quad |z| \le 1.$$

A causal and invertible ARMA(p, q) process has spectral density
$$f(\lambda) = \frac{\sigma^2}{2\pi}\,\frac{|\theta(e^{-i\lambda})|^2}{|\phi(e^{-i\lambda})|^2}, \quad -\pi \le \lambda \le \pi.$$

Definition 6.7 (The AR(p) process) The process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be an AR(p) process (an autoregressive process of order p) if it is stationary and if
$$X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = Z_t, \quad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2).$$
We say that $\{X_t\}$ is an AR(p) process with mean $\mu$ if $\{X_t - \mu\}$ is an AR(p) process.

A causal AR(p) process has spectral density
$$f(\lambda) = \frac{\sigma^2}{2\pi}\,\frac{1}{|\phi(e^{-i\lambda})|^2}.$$
Its ACVF is determined by the Yule-Walker equations:
$$\gamma(k) - \phi_1\gamma(k-1) - \dots - \phi_p\gamma(k-p) = \begin{cases} 0, & k = 1, \dots, p,\\ \sigma^2, & k = 0. \end{cases} \tag{5}$$

A causal AR(1) process defined by $X_t - \phi X_{t-1} = Z_t$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, has ACVF
$$\gamma(h) = \frac{\sigma^2\,\phi^{|h|}}{1 - \phi^2}$$
and spectral density
$$f(\lambda) = \frac{\sigma^2}{2\pi}\,\frac{1}{1 + \phi^2 - 2\phi\cos\lambda}.$$
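The following sketch (not part of the original text) simulates a causal AR(1) process with arbitrary parameter values and compares the sample autocovariances with $\gamma(h) = \sigma^2\phi^{|h|}/(1-\phi^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma, n = 0.7, 1.0, 100_000                     # arbitrary illustration values

x = np.empty(n)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - phi**2))   # start in the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)    # X_t - phi X_{t-1} = Z_t

xbar = x.mean()
for h in range(4):
    sample = np.sum((x[: n - h] - xbar) * (x[h:] - xbar)) / n
    theory = sigma**2 * phi**h / (1 - phi**2)
    print(h, round(sample, 3), round(theory, 3))
```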

Definition 6.8 (The MA(q) process) The process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be a moving average of order q if
$$X_t = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}, \quad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2),$$
where $\theta_1, \dots, \theta_q$ are constants.

An invertible MA(1) process defined by $X_t = Z_t + \theta Z_{t-1}$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, has ACVF
$$\gamma(h) = \begin{cases} (1 + \theta^2)\sigma^2 & \text{if } h = 0,\\ \theta\sigma^2 & \text{if } |h| = 1,\\ 0 & \text{if } |h| > 1, \end{cases}$$
and spectral density
$$f(\lambda) = \frac{\sigma^2}{2\pi}\,(1 + \theta^2 + 2\theta\cos\lambda), \quad -\pi \le \lambda \le \pi.$$

6.2 ARIMA and FARIMA processes

Definition 6.9 (The ARIMA(p, d, q) process) Let $d$ be a non-negative integer. The process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be an ARIMA(p, d, q) process if $(1 - B)^d X_t$ is a causal ARMA(p, q) process.

Definition 6.10 (The FARIMA(p, d, q) process) Let $0 < |d| < 0.5$. The process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be a fractionally integrated ARMA process, or a FARIMA(p, d, q) process, if $\{X_t\}$ is stationary and satisfies
$$\phi(B)(1 - B)^d X_t = \theta(B)Z_t, \quad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2).$$

6.3 Financial time series

Definition 6.11 (The ARCH(p) process) The process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be an ARCH(p) process if it is stationary, if
$$X_t = \sigma_t Z_t, \quad \{Z_t\} \sim \mathrm{IID}\ N(0, 1),$$
where
$$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2$$
with $\alpha_0 > 0$ and $\alpha_j \ge 0$ for $j = 1, \dots, p$, and if $Z_t$ and $X_{t-1}, X_{t-2}, \dots$ are independent for all $t$.

Definition 6.12 (The GARCH(p, q) process) The process $\{X_t,\ t \in \mathbb{Z}\}$ is said to be a GARCH(p, q) process if it is stationary, if
$$X_t = \sigma_t Z_t, \quad \{Z_t\} \sim \mathrm{IID}\ N(0, 1),$$
where
$$\sigma_t^2 = \alpha_0 + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2 + \beta_1\sigma_{t-1}^2 + \dots + \beta_q\sigma_{t-q}^2$$
with $\alpha_0 > 0$, $\alpha_j \ge 0$ for $j = 1, \dots, p$, $\beta_k \ge 0$ for $k = 1, \dots, q$, and if $Z_t$ and $X_{t-1}, X_{t-2}, \dots$ are independent for all $t$.
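As an illustration (not from the original notes), the recursion below simulates a GARCH(1, 1) process; the parameter values $\alpha_0, \alpha_1, \beta_1$ are arbitrary and satisfy $\alpha_1 + \beta_1 < 1$, so a stationary variance $\alpha_0/(1 - \alpha_1 - \beta_1)$ exists.

```python
import numpy as np

# Hypothetical GARCH(1,1) parameters: sigma_t^2 = a0 + a1*X_{t-1}^2 + b1*sigma_{t-1}^2.
a0, a1, b1, n = 0.1, 0.15, 0.8, 50_000
rng = np.random.default_rng(1)
z = rng.standard_normal(n)                      # {Z_t} ~ IID N(0, 1)

x = np.zeros(n)
sig2 = np.zeros(n)
sig2[0] = a0 / (1 - a1 - b1)                    # start at the stationary variance
x[0] = np.sqrt(sig2[0]) * z[0]
for t in range(1, n):
    sig2[t] = a0 + a1 * x[t - 1] ** 2 + b1 * sig2[t - 1]
    x[t] = np.sqrt(sig2[t]) * z[t]              # X_t = sigma_t * Z_t

print(round(x.var(), 3), round(a0 / (1 - a1 - b1), 3))   # sample vs. stationary variance
```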

7 Prediction

Let $X_1, X_2, \dots, X_n$ and $Y$ be any random variables with finite means and variances. Put $\mu_i = E(X_i)$, $\mu = E(Y)$,
$$\Gamma_n = \begin{pmatrix} \gamma_{1,1} & \dots & \gamma_{1,n}\\ \vdots & & \vdots\\ \gamma_{n,1} & \dots & \gamma_{n,n} \end{pmatrix} = \begin{pmatrix} \mathrm{Cov}(X_1, X_1) & \dots & \mathrm{Cov}(X_1, X_n)\\ \vdots & & \vdots\\ \mathrm{Cov}(X_n, X_1) & \dots & \mathrm{Cov}(X_n, X_n) \end{pmatrix}$$
and
$$\gamma_n = \begin{pmatrix} \gamma_1\\ \vdots\\ \gamma_n \end{pmatrix} = \begin{pmatrix} \mathrm{Cov}(X_1, Y)\\ \vdots\\ \mathrm{Cov}(X_n, Y) \end{pmatrix}.$$

Definition 7.1 The best linear predictor $\widehat{Y}$ of $Y$ in terms of $X_1, X_2, \dots, X_n$ is a random variable of the form $\widehat{Y} = a_0 + a_1X_1 + \dots + a_nX_n$ such that $E[(Y - \widehat{Y})^2]$ is minimized with respect to $a_0, \dots, a_n$. $E[(Y - \widehat{Y})^2]$ is called the mean-squared error. It is often convenient to use the notation $P_{\mathrm{sp}\{1, X_1, \dots, X_n\}}Y \stackrel{\mathrm{def}}{=} \widehat{Y}$.

The predictor is given by
$$\widehat{Y} = \mu + a_1(X_1 - \mu_1) + \dots + a_n(X_n - \mu_n),$$
where $a_n = (a_1, \dots, a_n)'$ satisfies $\Gamma_n a_n = \gamma_n$. If $\Gamma_n$ is non-singular we have $a_n = \Gamma_n^{-1}\gamma_n$.

There is no restriction in assuming all means to be 0. The predictor $\widehat{Y}$ of $Y$ is determined by
$$\mathrm{Cov}(Y - \widehat{Y}, X_i) = 0, \quad i = 1, \dots, n.$$
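A minimal numerical sketch of Definition 7.1 (not from the original text): with zero means, the coefficient vector solves $\Gamma_n a_n = \gamma_n$ and the mean-squared error is $\mathrm{Var}(Y) - \gamma_n'\Gamma_n^{-1}\gamma_n$. The covariance values below are made up for the illustration.

```python
import numpy as np

# Made-up covariance structure for (X1, X2, X3, Y), all means zero.
Gamma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 2.0, 0.8],
                  [0.3, 0.8, 2.0]])       # Gamma_n = (Cov(X_i, X_j))
gamma = np.array([1.0, 0.6, 0.2])         # gamma_n = (Cov(X_i, Y))
var_y = 1.5

a = np.linalg.solve(Gamma, gamma)         # a_n = Gamma_n^{-1} gamma_n
mse = var_y - gamma @ a                   # E[(Y - Yhat)^2] = Var(Y) - gamma_n' a_n
print(a, round(mse, 4))
```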

7.1 Prediction for stationary time series

Theorem 7.1 If $\{X_t\}$ is a zero-mean stationary time series such that $\gamma(0) > 0$ and $\gamma(h) \to 0$ as $h \to \infty$, the best linear predictor $\widehat{X}_{n+1}$ of $X_{n+1}$ in terms of $X_1, X_2, \dots, X_n$ is
$$\widehat{X}_{n+1} = \sum_{i=1}^{n} \phi_{n,i} X_{n+1-i}, \quad n = 1, 2, \dots,$$
where
$$\phi_n = \begin{pmatrix} \phi_{n,1}\\ \vdots\\ \phi_{n,n} \end{pmatrix} = \Gamma_n^{-1}\gamma_n, \qquad \gamma_n = \begin{pmatrix} \gamma(1)\\ \vdots\\ \gamma(n) \end{pmatrix} \quad \text{and} \quad \Gamma_n = \begin{pmatrix} \gamma(1-1) & \dots & \gamma(1-n)\\ \vdots & & \vdots\\ \gamma(n-1) & \dots & \gamma(n-n) \end{pmatrix}.$$
The mean-squared error is $v_n = \gamma(0) - \gamma_n'\Gamma_n^{-1}\gamma_n$.

Theorem 7.2 (The Durbin-Levinson Algorithm) If $\{X_t\}$ is a zero-mean stationary time series such that $\gamma(0) > 0$ and $\gamma(h) \to 0$ as $h \to \infty$, then
$$\phi_{1,1} = \gamma(1)/\gamma(0), \quad v_0 = \gamma(0),$$
$$\phi_{n,n} = \frac{1}{v_{n-1}}\left[\gamma(n) - \sum_{j=1}^{n-1} \phi_{n-1,j}\,\gamma(n-j)\right],$$
$$\begin{pmatrix} \phi_{n,1}\\ \vdots\\ \phi_{n,n-1} \end{pmatrix} = \begin{pmatrix} \phi_{n-1,1}\\ \vdots\\ \phi_{n-1,n-1} \end{pmatrix} - \phi_{n,n}\begin{pmatrix} \phi_{n-1,n-1}\\ \vdots\\ \phi_{n-1,1} \end{pmatrix}$$
and
$$v_n = v_{n-1}\,[1 - \phi_{n,n}^2].$$
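A direct transcription of the Durbin-Levinson recursion into code (a sketch, not from the original notes); it takes autocovariances $\gamma(0), \dots, \gamma(n)$ and returns $\phi_{n,1}, \dots, \phi_{n,n}$ and the mean-squared error $v_n$.

```python
import numpy as np

def durbin_levinson(gamma, n):
    """One-step predictor coefficients phi_{n,1..n} and MSE v_n from the ACVF gamma[0..n]."""
    phi = np.array([gamma[1] / gamma[0]])            # phi_{1,1}
    v = gamma[0] * (1 - phi[0] ** 2)                 # v_1
    for m in range(2, n + 1):
        phi_mm = (gamma[m] - np.dot(phi, gamma[m - 1:0:-1])) / v
        phi = np.concatenate([phi - phi_mm * phi[::-1], [phi_mm]])
        v = v * (1 - phi_mm ** 2)
    return phi, v

# Example: causal AR(1) with phi = 0.7 and sigma^2 = 1, so gamma(h) = 0.7**h / (1 - 0.49).
gam = np.array([0.7 ** h / (1 - 0.49) for h in range(6)])
phi, v = durbin_levinson(gam, 5)
print(np.round(phi, 3), round(v, 3))   # phi_{5,1} = 0.7, the rest 0, and v_5 = 1
```

Since $\phi_{h,h}$ equals the partial autocorrelation $\alpha(h)$ (Theorem 8.1 below), the same recursion also yields the PACF.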

Theorem 7.3 (The Innovations Algorithm) If $\{X_t\}$ has zero mean and $E(X_iX_j) = \kappa(i, j)$, where the matrix
$$\begin{pmatrix} \kappa(1,1) & \dots & \kappa(1,n)\\ \vdots & & \vdots\\ \kappa(n,1) & \dots & \kappa(n,n) \end{pmatrix}$$
is non-singular, we have
$$\widehat{X}_{n+1} = \begin{cases} 0 & \text{if } n = 0,\\ \displaystyle\sum_{j=1}^{n} \theta_{n,j}(X_{n+1-j} - \widehat{X}_{n+1-j}) & \text{if } n \ge 1, \end{cases} \tag{6}$$
where $v_0 = \kappa(1, 1)$,
$$\theta_{n,n-k} = \frac{1}{v_k}\left[\kappa(n+1, k+1) - \sum_{j=0}^{k-1} \theta_{k,k-j}\,\theta_{n,n-j}\,v_j\right], \quad k = 0, \dots, n-1,$$
and
$$v_n = \kappa(n+1, n+1) - \sum_{j=0}^{n-1} \theta_{n,n-j}^2\,v_j.$$
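A sketch of the innovations algorithm (not from the original notes) for a zero-mean stationary series, where $\kappa(i, j) = \gamma(i - j)$; it returns the coefficients $\theta_{n,j}$ and mean-squared errors $v_n$ used in (6).

```python
import numpy as np

def innovations(gamma, n):
    """theta[m][j-1] = theta_{m,j} and v[m] = v_m for m = 0,...,n, from the ACVF gamma."""
    kappa = lambda i, j: gamma[abs(i - j)]       # stationary case: kappa(i, j) = gamma(i - j)
    v = [kappa(1, 1)]
    theta = [[]]                                 # theta[0] is empty since Xhat_1 = 0
    for m in range(1, n + 1):
        th = [0.0] * m                           # will hold theta_{m,1}, ..., theta_{m,m}
        for k in range(m):                       # k = 0, ..., m-1 determines theta_{m,m-k}
            s = sum(theta[k][k - 1 - j] * th[m - 1 - j] * v[j] for j in range(k))
            th[m - 1 - k] = (kappa(m + 1, k + 1) - s) / v[k]
        v.append(kappa(m + 1, m + 1) - sum(th[m - 1 - j] ** 2 * v[j] for j in range(m)))
        theta.append(th)
    return theta, v

# Example: MA(1) with theta = 0.5 and sigma^2 = 1, i.e. gamma = (1.25, 0.5, 0, 0, ...).
gam = [1.25, 0.5] + [0.0] * 10
theta, v = innovations(gam, 8)
print(round(theta[8][0], 3), round(v[8], 3))     # theta_{n,1} -> 0.5 and v_n -> 1 as n grows
```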

7.2 Prediction of an ARMA Process

Let $\{X_t\}$ be a causal ARMA(p, q) process defined by $\phi(B)X_t = \theta(B)Z_t$. Then
$$\widehat{X}_{n+1} = \begin{cases} \displaystyle\sum_{j=1}^{n} \theta_{n,j}(X_{n+1-j} - \widehat{X}_{n+1-j}) & \text{if } 1 \le n < m,\\[2mm] \phi_1 X_n + \dots + \phi_p X_{n+1-p} + \displaystyle\sum_{j=1}^{q} \theta_{n,j}(X_{n+1-j} - \widehat{X}_{n+1-j}) & \text{if } n \ge m, \end{cases}$$
where $m = \max(p, q)$. The $\theta_{n,j}$'s are obtained by the innovations algorithm applied to
$$W_t = \sigma^{-1}X_t \ \text{ if } t = 1, \dots, m, \qquad W_t = \sigma^{-1}\phi(B)X_t \ \text{ if } t > m.$$

8 Partial correlation

Definition 8.1 Let $Y_1, Y_2$ and $W_1, \dots, W_k$ be random variables. The partial correlation coefficient of $Y_1$ and $Y_2$ with respect to $W_1, \dots, W_k$ is defined by
$$\alpha(Y_1, Y_2) \stackrel{\mathrm{def}}{=} \rho(Y_1 - \widehat{Y}_1,\ Y_2 - \widehat{Y}_2),$$
where $\widehat{Y}_1 = P_{\mathrm{sp}\{1, W_1, \dots, W_k\}}Y_1$ and $\widehat{Y}_2 = P_{\mathrm{sp}\{1, W_1, \dots, W_k\}}Y_2$.

8.1 Partial autocorrelation

Definition 8.2 Let $\{X_t,\ t \in \mathbb{Z}\}$ be a zero-mean stationary time series. The partial autocorrelation function (PACF) of $\{X_t\}$ is defined by
$$\alpha(0) = 1, \quad \alpha(1) = \rho(1),$$
$$\alpha(h) = \rho\big(X_{h+1} - P_{\mathrm{sp}\{X_2, \dots, X_h\}}X_{h+1},\ X_1 - P_{\mathrm{sp}\{X_2, \dots, X_h\}}X_1\big), \quad h \ge 2.$$

Theorem 8.1 Under the assumptions of Theorem 7.2, $\alpha(h) = \phi_{h,h}$ for $h \ge 1$.

9 Linear filters

A filter is an operation on a time series $\{X_t\}$ in order to obtain a new time series $\{Y_t\}$; $\{X_t\}$ is called the input and $\{Y_t\}$ the output. The operation
$$Y_t = \sum_{k=-\infty}^{\infty} c_{t,k} X_k$$
defines a linear filter. A filter is called time-invariant if $c_{t,k}$ depends only on $t - k$, i.e. if $c_{t,k} = h_{t-k}$.

A time-invariant linear filter (TLF) is said to be causal if $h_j = 0$ for $j < 0$. A TLF is called stable if $\sum |h_k| < \infty$.

Put $h(z) = \sum h_k z^k$. Then $Y = h(B)X$. The function $h(e^{-i\lambda})$ is called the transfer function (överföringsfunktion or frekvenssvarsfunktion). The function $|h(e^{-i\lambda})|^2$ is called the power transfer function.

Theorem 9.1 Let $\{X_t\}$ be a possibly complex-valued stationary input to a stable TLF $h(B)$ and let $\{Y_t\}$ be the output, i.e. $Y = h(B)X$. Then
(a) $EY_t = h(1)\,EX_t$;
(b) $Y_t$ is stationary;
(c) $F_Y(\lambda) = \int_{(-\pi,\lambda]} |h(e^{-i\nu})|^2\,dF_X(\nu)$ for $\lambda \in [-\pi, \pi]$.

10 Estimation in time series

Definition 10.1 (Strictly linear time series) A stationary time series $\{X_t\}$ is called strictly linear if it has the representation
$$X_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}, \quad \{Z_t\} \sim \mathrm{IID}(0, \sigma^2).$$

10.1 Estimation of $\mu$

Consider $\overline{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j$, which is a natural unbiased estimate of $\mu$.

Theorem 10.1 If $\{X_t\}$ is a stationary time series with mean $\mu$ and autocovariance function $\gamma(\cdot)$, then as $n \to \infty$,
$$\mathrm{Var}(\overline{X}_n) = E[(\overline{X}_n - \mu)^2] \to 0 \quad \text{if } \gamma(n) \to 0,$$
and
$$n\,\mathrm{Var}(\overline{X}_n) \to \sum_{h=-\infty}^{\infty} \gamma(h) = 2\pi f(0) \quad \text{if } \sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty.$$

Theorem 10.2 If $\{X_t\}$ is a strictly linear time series where $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ and $\sum_{j=-\infty}^{\infty} \psi_j \ne 0$, then
$$\overline{X}_n \in \mathrm{AN}\!\left(\mu, \frac{v}{n}\right), \quad \text{where } v = \sum_{h=-\infty}^{\infty} \gamma(h) = \sigma^2\Big(\sum_{j=-\infty}^{\infty} \psi_j\Big)^2.$$
The notion AN is found in Definitions 2.4 and 2.5.

10.2 Estimation of $\gamma(\cdot)$ and $\rho(\cdot)$

Consider
$$\widehat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-h} (X_t - \overline{X}_n)(X_{t+h} - \overline{X}_n), \quad 0 \le h \le n-1,$$
and
$$\widehat{\rho}(h) = \frac{\widehat{\gamma}(h)}{\widehat{\gamma}(0)},$$
respectively.

Theorem 10.3 If $\{X_t\}$ is a strictly linear time series where $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ and $EZ_t^4 = \eta\sigma^4 < \infty$, then
$$\begin{pmatrix} \widehat{\gamma}(0)\\ \vdots\\ \widehat{\gamma}(h) \end{pmatrix} \in \mathrm{AN}\!\left( \begin{pmatrix} \gamma(0)\\ \vdots\\ \gamma(h) \end{pmatrix}, \frac{1}{n}V \right),$$
where $V = (v_{ij})_{i,j=0,\dots,h}$ is the covariance matrix with
$$v_{ij} = (\eta - 3)\gamma(i)\gamma(j) + \sum_{k=-\infty}^{\infty} \{\gamma(k)\gamma(k - i + j) + \gamma(k + j)\gamma(k - i)\}.$$
Note: If $\{Z_t,\ t \in \mathbb{Z}\}$ is Gaussian, then $\eta = 3$.

Theorem 10.4 If $\{X_t\}$ is a strictly linear time series where $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ and $EZ_t^4 < \infty$, then
$$\begin{pmatrix} \widehat{\rho}(1)\\ \vdots\\ \widehat{\rho}(h) \end{pmatrix} \in \mathrm{AN}\!\left( \begin{pmatrix} \rho(1)\\ \vdots\\ \rho(h) \end{pmatrix}, \frac{1}{n}W \right),$$
where $W = (w_{ij})_{i,j=1,\dots,h}$ is the covariance matrix with
$$w_{ij} = \sum_{k=-\infty}^{\infty} \{\rho(k+i)\rho(k+j) + \rho(k-i)\rho(k+j) + 2\rho(i)\rho(j)\rho^2(k) - 2\rho(i)\rho(k)\rho(k+j) - 2\rho(j)\rho(k)\rho(k+i)\}. \tag{7}$$

In the following theorem, the assumption $EZ_t^4 < \infty$ is relaxed at the expense of a slightly stronger assumption on the sequence $\{\psi_j\}$.

Theorem 10.5 If $\{X_t\}$ is a strictly linear time series where $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ and $\sum_{j=-\infty}^{\infty} \psi_j^2|j| < \infty$, then
$$\begin{pmatrix} \widehat{\rho}(1)\\ \vdots\\ \widehat{\rho}(h) \end{pmatrix} \in \mathrm{AN}\!\left( \begin{pmatrix} \rho(1)\\ \vdots\\ \rho(h) \end{pmatrix}, \frac{1}{n}W \right),$$
where $W$ is given by the previous theorem.
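The estimates $\widehat{\gamma}(h)$ and $\widehat{\rho}(h)$ translate directly into code; a minimal sketch (not from the original notes), applied to simulated IID noise:

```python
import numpy as np

def sample_acvf(x, h):
    """gamma_hat(h) = (1/n) sum_{t=1}^{n-h} (x_t - xbar)(x_{t+h} - xbar)."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    return np.sum((x[: n - h] - xbar) * (x[h:] - xbar)) / n

def sample_acf(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_acvf(x, h) / sample_acvf(x, 0)

rng = np.random.default_rng(2)
x = rng.standard_normal(500)                     # IID noise: rho(h) = 0 for h >= 1
print([round(sample_acf(x, h), 3) for h in range(1, 5)])
# By Theorem 10.4 these should be roughly within +/- 2/sqrt(500) of zero.
```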

10.3 Estimation of the spectral density

The Fourier frequencies are given by $\omega_j = \frac{2\pi j}{n}$, $-\pi < \omega_j \le \pi$. Put
$$F_n \stackrel{\mathrm{def}}{=} \{j \in \mathbb{Z} : -\pi < \omega_j \le \pi\} = \left\{-\Big[\tfrac{n-1}{2}\Big], \dots, \Big[\tfrac{n}{2}\Big]\right\},$$
where $[x]$ denotes the integer part of $x$.

10.3.1 The periodogram

Definition 10.2 The periodogram $I_n(\cdot)$ of $\{X_1, \dots, X_n\}$ is defined by
$$I_n(\omega_j) = \frac{1}{n}\left|\sum_{t=1}^{n} X_t e^{-it\omega_j}\right|^2, \quad j \in F_n.$$

Definition 10.3 (Extension of the periodogram) For any $\omega \in [-\pi, \pi]$ we define
$$I_n(\omega) = I_n(\omega_k) \ \text{ if } \omega_k - \pi/n < \omega \le \omega_k + \pi/n \text{ and } 0 \le \omega \le \pi, \qquad I_n(\omega) = I_n(-\omega) \ \text{ if } \omega \in [-\pi, 0).$$
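A sketch of the periodogram at the Fourier frequencies (not from the original notes); for zero-mean white noise the ordinates should scatter around $2\pi f(\omega) = \sigma^2$ (compare Theorem 10.6 below).

```python
import numpy as np

def periodogram(x):
    """I_n(omega_j) = (1/n) |sum_{t=1}^n x_t exp(-i t omega_j)|^2 at the Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    js = np.arange(-((n - 1) // 2), n // 2 + 1)      # F_n = {-[(n-1)/2], ..., [n/2]}
    omegas = 2 * np.pi * js / n
    t = np.arange(1, n + 1)
    I = np.array([abs(np.sum(x * np.exp(-1j * t * w))) ** 2 / n for w in omegas])
    return omegas, I

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 200)                        # WN(0, 1), so 2*pi*f(omega) = 1
omegas, I = periodogram(x)
print(round(float(I[omegas > 0].mean()), 3))         # should be close to 1
```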

Theorem 10.6 We have
$$EI_n(0) - n\mu^2 \to 2\pi f(0) \ \text{ as } n \to \infty \quad \text{and} \quad EI_n(\omega) \to 2\pi f(\omega) \ \text{ as } n \to \infty \text{ if } \omega \ne 0.$$
(If $\mu = 0$ then $EI_n(\omega)$ converges uniformly to $2\pi f(\omega)$ on $[-\pi, \pi]$.)

Theorem 10.7 Let $\{X_t\}$ be a strictly linear time series with $\mu = 0$, $\sum_{j=-\infty}^{\infty} |\psi_j||j|^{1/2} < \infty$ and $EZ^4 < \infty$. Then
$$\mathrm{Cov}(I_n(\omega_j), I_n(\omega_k)) = \begin{cases} 2(2\pi)^2 f^2(\omega_j) + O(n^{-1/2}) & \text{if } \omega_j = \omega_k = 0 \text{ or } \pi,\\ (2\pi)^2 f^2(\omega_j) + O(n^{-1/2}) & \text{if } 0 < \omega_j = \omega_k < \pi,\\ O(n^{-1}) & \text{if } \omega_j \ne \omega_k. \end{cases}$$

10.3.2 Smoothing the periodogram

Definition 10.4 The estimator $\widehat{f}(\omega) = \widehat{f}(g(n, \omega))$, where $g(n, \omega)$ is the Fourier frequency closest to $\omega$, with
$$\widehat{f}(\omega_j) = \frac{1}{2\pi}\sum_{|k| \le m_n} W_n(k)\,I_n(\omega_{j+k}),$$
where $m_n \to \infty$ and $m_n/n \to 0$ as $n \to \infty$,
$$W_n(k) = W_n(-k) \text{ for all } k, \quad W_n(k) \ge 0, \quad \sum_{|k| \le m_n} W_n(k) = 1, \quad \sum_{|k| \le m_n} W_n^2(k) \to 0 \ \text{ as } n \to \infty,$$
is called a discrete spectral average estimator of $f(\omega)$. (If $\omega_{j+k} \notin [-\pi, \pi]$ the term $I_n(\omega_{j+k})$ is evaluated by defining $I_n$ to have period $2\pi$.)

Theorem 10.8 Let $\{X_t\}$ be a strictly linear time series with $\mu = 0$, $\sum_{j=-\infty}^{\infty} |\psi_j||j|^{1/2} < \infty$ and $EZ^4 < \infty$. Then
$$\lim_{n\to\infty} E\widehat{f}(\omega) = f(\omega)$$
and
$$\lim_{n\to\infty} \Big(\sum_{|k| \le m_n} W_n^2(k)\Big)^{-1} \mathrm{Cov}(\widehat{f}(\omega), \widehat{f}(\nu)) = \begin{cases} 2f^2(\omega) & \text{if } \omega = \nu = 0 \text{ or } \pi,\\ f^2(\omega) & \text{if } 0 < \omega = \nu < \pi,\\ 0 & \text{if } \omega \ne \nu. \end{cases}$$

Remark 10.1 If $\mu \ne 0$ we ignore $I_n(0)$. Thus we can use
$$\widehat{f}(0) = \frac{1}{2\pi}\left[W_n(0)I_n(\omega_1) + 2\sum_{k=1}^{m_n} W_n(k)I_n(\omega_{k+1})\right].$$
Moreover, whenever $I_n(0)$ appears in $\widehat{f}(\omega_j)$ we replace it with $\widehat{f}(0)$.

Example 10.1 (The simple moving average estimate) For this estimate we have
$$W_n(k) = \begin{cases} 1/(2m_n + 1) & \text{if } |k| \le m_n,\\ 0 & \text{if } |k| > m_n, \end{cases}$$
and
$$\mathrm{Var}(\widehat{f}(\omega)) \approx \begin{cases} \dfrac{1}{m_n} f^2(\omega) & \text{if } \omega = 0 \text{ or } \pi,\\[2mm] \dfrac{1}{2m_n} f^2(\omega) & \text{if } 0 < \omega < \pi. \end{cases}$$
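A self-contained sketch of the simple moving average estimate (not from the original notes), using the FFT to evaluate the periodogram and periodic extension for indices outside $F_n$; the bandwidth $m_n$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 400, 10                                   # m plays the role of m_n (arbitrary choice)
x = rng.normal(0.0, 1.0, n)                      # WN(0, 1): f(omega) = 1 / (2*pi)

# I_n(omega_j) for omega_j = 2*pi*j/n, j = 0, ..., n-1 (the time shift only changes the phase).
I = np.abs(np.fft.fft(x)) ** 2 / n
W = np.full(2 * m + 1, 1.0 / (2 * m + 1))        # simple moving average weights W_n(k)

def f_hat(j):
    """Discrete spectral average estimate at omega_j, with I_n extended with period 2*pi."""
    idx = (j + np.arange(-m, m + 1)) % n
    return float(np.sum(W * I[idx])) / (2 * np.pi)

j = n // 8                                       # omega_j = pi / 4
print(round(f_hat(j), 4), round(1 / (2 * np.pi), 4))   # estimate vs. true spectral density
```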

11 Estimation for ARMA models

11.1 Yule-Walker estimation

Consider a causal zero-mean AR(p) process $\{X_t\}$:
$$X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = Z_t, \quad \{Z_t\} \sim \mathrm{IID}(0, \sigma^2).$$

The Yule-Walker equations (5) can be written in the form
$$\Gamma_p\phi = \gamma_p \quad \text{and} \quad \sigma^2 = \gamma(0) - \phi'\gamma_p,$$
where
$$\Gamma_p = \begin{pmatrix} \gamma(0) & \dots & \gamma(p-1)\\ \vdots & & \vdots\\ \gamma(p-1) & \dots & \gamma(0) \end{pmatrix} \quad \text{and} \quad \gamma_p = \begin{pmatrix} \gamma(1)\\ \vdots\\ \gamma(p) \end{pmatrix}.$$
If we replace $\Gamma_p$ and $\gamma_p$ with the estimates $\widehat{\Gamma}_p$ and $\widehat{\gamma}_p$ we obtain the following equations for the Yule-Walker estimates:
$$\widehat{\Gamma}_p\widehat{\phi} = \widehat{\gamma}_p \quad \text{and} \quad \widehat{\sigma}^2 = \widehat{\gamma}(0) - \widehat{\phi}'\widehat{\gamma}_p.$$

Theorem 11.1 If $\{X_t\}$ is a causal AR(p) process with $\{Z_t\} \sim \mathrm{IID}(0, \sigma^2)$, and $\widehat{\phi}$ is the Yule-Walker estimate of $\phi$, then
$$\widehat{\phi} \in \mathrm{AN}\!\left(\phi, \frac{\sigma^2\Gamma_p^{-1}}{n}\right) \quad \text{for large values of } n.$$
Moreover, $\widehat{\sigma}^2 \overset{P}{\to} \sigma^2$.

A usual way to proceed is to act as if $\{X_t\}$ were an AR(m) process for $m = 1, 2, \dots$ until we believe that $m \ge p$. In that case we can use the Durbin-Levinson algorithm, see Theorem 7.2, with $\gamma(\cdot)$ replaced by $\widehat{\gamma}(\cdot)$.
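A sketch of Yule-Walker estimation for a simulated AR(2) (not from the original notes): the sample autocovariances feed $\widehat{\Gamma}_p\widehat{\phi} = \widehat{\gamma}_p$ and $\widehat{\sigma}^2 = \widehat{\gamma}(0) - \widehat{\phi}'\widehat{\gamma}_p$.

```python
import numpy as np

rng = np.random.default_rng(5)
phi_true, sigma, n = (0.5, 0.3), 1.0, 20_000
x = np.zeros(n)
for t in range(2, n):                               # simulate a causal AR(2)
    x[t] = phi_true[0] * x[t - 1] + phi_true[1] * x[t - 2] + rng.normal(0.0, sigma)

def gamma_hat(h):
    xb = x.mean()
    return np.sum((x[: n - h] - xb) * (x[h:] - xb)) / n

p = 2
g = np.array([gamma_hat(h) for h in range(p + 1)])
Gamma_p = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])   # Gamma_hat_p
gamma_p = g[1:]                                                             # gamma_hat_p
phi_hat = np.linalg.solve(Gamma_p, gamma_p)
sigma2_hat = g[0] - phi_hat @ gamma_p
print(np.round(phi_hat, 3), round(sigma2_hat, 3))   # close to (0.5, 0.3) and 1.0
```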

11.2 Burg's algorithm

Assume as usual that $x_1, \dots, x_n$ are the observations. The idea is to consider one observation after the other and to predict it both by forward and backward data. The forward and backward prediction errors $\{u_i(t)\}$ and $\{v_i(t)\}$ satisfy the recursions
$$u_0(t) = v_0(t) = x_{n+1-t},$$
$$u_i(t) = u_{i-1}(t-1) - \phi_{ii}v_{i-1}(t)$$
and
$$v_i(t) = v_{i-1}(t) - \phi_{ii}u_{i-1}(t-1).$$

Suppose now that we know $\phi_{i-1,k}$ for $k = 1, \dots, i-1$ and $\phi_{ii}$. Then $\phi_{i,k}$ for $k = 1, \dots, i-1$ may be obtained by the Durbin-Levinson algorithm. Thus the main problem is to obtain an algorithm for calculating $\phi_{ii}$ for $i = 1, 2, \dots$

Burg's algorithm:
$$d(1) = \tfrac{1}{2}x_1^2 + x_2^2 + \dots + x_{n-1}^2 + \tfrac{1}{2}x_n^2, \tag{8}$$
$$\phi_{ii}^{(B)} = \frac{1}{d(i)}\sum_{t=i+1}^{n} v_{i-1}(t)\,u_{i-1}(t-1), \tag{9}$$
$$\sigma_i^{(B)2} = \frac{d(i)}{n-i}\big[1 - (\phi_{ii}^{(B)})^2\big], \tag{10}$$
$$d(i+1) = \big[1 - (\phi_{ii}^{(B)})^2\big]d(i) - \tfrac{1}{2}v_i^2(i+1) - \tfrac{1}{2}u_i^2(n). \tag{11}$$

The Burg estimates for an AR(p) process have the same large-sample statistical properties as the Yule-Walker estimates, i.e. Theorem 11.1 holds.

11.3 The innovations algorithm

Since an MA(q) process
$$X_t = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}, \quad \{Z_t\} \sim \mathrm{IID}(0, \sigma^2),$$
has, by definition, an innovation representation, it is natural to use the innovations algorithm for prediction in a similar way as the Durbin-Levinson algorithm was used. Since $q$ is generally unknown, we can try to fit MA models
$$X_t = Z_t + \theta_{m1} Z_{t-1} + \dots + \theta_{mm} Z_{t-m}, \quad \{Z_t\} \sim \mathrm{IID}(0, v_m),$$
of orders $m = 1, 2, \dots$, by means of the innovations algorithm.

Definition 11.1 (Innovations estimates of MA parameters) If $\widehat{\gamma}(0) > 0$ we define the innovations estimates
$$\widehat{\theta}_m = \begin{pmatrix} \widehat{\theta}_{m1}\\ \vdots\\ \widehat{\theta}_{mm} \end{pmatrix} \quad \text{and} \quad \widehat{v}_m, \quad m = 1, 2, \dots, n-1,$$
by the recursion relations
$$\widehat{v}_0 = \widehat{\gamma}(0),$$
$$\widehat{\theta}_{m,m-k} = \frac{1}{\widehat{v}_k}\left[\widehat{\gamma}(m-k) - \sum_{j=0}^{k-1} \widehat{\theta}_{m,m-j}\widehat{\theta}_{k,k-j}\widehat{v}_j\right], \quad k = 0, \dots, m-1,$$
$$\widehat{v}_m = \widehat{\gamma}(0) - \sum_{j=0}^{m-1} \widehat{\theta}_{m,m-j}^2\,\widehat{v}_j.$$

This method works also for causal invertible ARMA processes. The following theorem gives asymptotic statistical properties of the innovations estimates.

Theorem 11.2 Let $\{X_t\}$ be the causal invertible ARMA process $\phi(B)X_t = \theta(B)Z_t$, $\{Z_t\} \sim \mathrm{IID}(0, \sigma^2)$, $EZ_t^4 < \infty$, and let
$$\psi(z) = \sum_{j=0}^{\infty} \psi_j z^j = \frac{\theta(z)}{\phi(z)}, \quad |z| \le 1$$
(with $\psi_0 = 1$ and $\psi_j = 0$ for $j < 0$). Then for any sequence of positive integers $\{m(n),\ n = 1, 2, \dots\}$ such that $m \to \infty$ and $m = o(n^{1/3})$ as $n \to \infty$, we have for each fixed $k$,
$$\begin{pmatrix} \widehat{\theta}_{m1}\\ \vdots\\ \widehat{\theta}_{mk} \end{pmatrix} \in \mathrm{AN}\!\left( \begin{pmatrix} \psi_1\\ \vdots\\ \psi_k \end{pmatrix}, \frac{1}{n}A \right),$$
where $A = (a_{ij})_{i,j=1,\dots,k}$ and
$$a_{ij} = \sum_{r=1}^{\min(i,j)} \psi_{i-r}\psi_{j-r}.$$
Moreover, $\widehat{v}_m \overset{P}{\to} \sigma^2$.

11.4 The Hannan-Rissanen algorithm

Let $\{X_t\}$ be an ARMA(p, q) process:
$$X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}, \quad \{Z_t\} \sim \mathrm{IID}(0, \sigma^2).$$
The Hannan-Rissanen algorithm consists of the following two steps:

Step 1 A high-order AR(m) model (with $m > \max(p, q)$) is fitted to the data by Yule-Walker estimation. If $\widehat{\phi}_{m1}, \dots, \widehat{\phi}_{mm}$ are the estimated coefficients, then $Z_t$ is estimated by
$$\widehat{Z}_t = X_t - \widehat{\phi}_{m1}X_{t-1} - \dots - \widehat{\phi}_{mm}X_{t-m}.$$

Step 2 The vector $\beta = (\phi, \theta)'$ is estimated by least squares regression of $X_t$ onto $X_{t-1}, \dots, X_{t-p}, \widehat{Z}_{t-1}, \dots, \widehat{Z}_{t-q}$, $t = m+1, \dots, n$, i.e. by minimizing
$$S(\beta) = \sum_{t=m+1}^{n} (X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} - \theta_1\widehat{Z}_{t-1} - \dots - \theta_q\widehat{Z}_{t-q})^2$$
with respect to $\beta$. This gives the Hannan-Rissanen estimator
$$\widehat{\beta} = (Z'Z)^{-1}Z'\boldsymbol{X}_n,$$
provided $Z'Z$ is non-singular, where
$$\boldsymbol{X}_n = \begin{pmatrix} X_{m+1}\\ \vdots\\ X_n \end{pmatrix} \quad \text{and} \quad Z = \begin{pmatrix} X_m & X_{m-1} & \dots & X_{m-p+1} & \widehat{Z}_m & \widehat{Z}_{m-1} & \dots & \widehat{Z}_{m-q+1}\\ \vdots & & & & & & & \vdots\\ X_{n-1} & X_{n-2} & \dots & X_{n-p} & \widehat{Z}_{n-1} & \widehat{Z}_{n-2} & \dots & \widehat{Z}_{n-q} \end{pmatrix}.$$
The Hannan-Rissanen estimate of the white noise variance $\sigma^2$ is
$$\widehat{\sigma}^2_{HR} = \frac{S(\widehat{\beta})}{n - m}.$$
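A sketch of the two Hannan-Rissanen steps for a simulated ARMA(1, 1) (not from the original notes): a long AR fit by Yule-Walker gives residuals $\widehat{Z}_t$, and $\widehat{\beta}$ is then obtained by least squares.

```python
import numpy as np

rng = np.random.default_rng(6)
phi, theta, n = 0.6, 0.4, 20_000
z = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):                                   # simulate a causal invertible ARMA(1,1)
    x[t] = phi * x[t - 1] + z[t] + theta * z[t - 1]

def yule_walker(x, m):
    """AR(m) coefficients from the sample Yule-Walker equations."""
    n = len(x)
    g = np.array([np.sum((x[: n - h] - x.mean()) * (x[h:] - x.mean())) / n for h in range(m + 1)])
    G = np.array([[g[abs(i - j)] for j in range(m)] for i in range(m)])
    return np.linalg.solve(G, g[1:])

# Step 1: high-order AR(m) fit and residuals Zhat_t (computed here for t >= m, 0-based index).
m = 15
a = yule_walker(x, m)
zhat = np.zeros(n)
for t in range(m, n):
    zhat[t] = x[t] - a @ x[t - m:t][::-1]               # lags x_{t-1}, ..., x_{t-m}

# Step 2: regress X_t on X_{t-1} and Zhat_{t-1} (p = q = 1 here).
rows = np.arange(m + 1, n)
Z = np.column_stack([x[rows - 1], zhat[rows - 1]])
beta_hat, *_ = np.linalg.lstsq(Z, x[rows], rcond=None)  # beta_hat = (Z'Z)^{-1} Z'X_n
print(np.round(beta_hat, 3))                            # close to (0.6, 0.4)
```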

11.5 Maximum Likelihood and Least Squares estimation

It is possible to obtain better estimates by the maximum likelihood method (under the assumption of Gaussian processes) or by the least squares method.

In the least squares method we minimize
$$S(\phi, \theta) = \sum_{j=1}^{n} \frac{(X_j - \widehat{X}_j)^2}{r_{j-1}},$$
where $r_{j-1} = v_{j-1}/\sigma^2$, with respect to $\phi$ and $\theta$. The estimates have to be obtained by recursive methods, and the estimates discussed above are natural starting values. The least squares estimate of $\sigma^2$ is
$$\widehat{\sigma}^2_{LS} = \frac{S(\widehat{\phi}_{LS}, \widehat{\theta}_{LS})}{n - p - q},$$
where $(\widehat{\phi}_{LS}, \widehat{\theta}_{LS})$ is the estimate obtained by minimizing $S(\phi, \theta)$.

Let us assume, or at least act as if, the process is Gaussian. Then, for any fixed values of $\phi$, $\theta$, and $\sigma^2$, the innovations $X_1 - \widehat{X}_1, \dots, X_n - \widehat{X}_n$ are independent and normally distributed with zero means and variances $v_0 = \sigma^2 r_0, \dots, v_{n-1} = \sigma^2 r_{n-1}$. The likelihood function is then
$$L(\phi, \theta, \sigma^2) = \prod_{j=1}^{n} f_{X_j - \widehat{X}_j}(X_j - \widehat{X}_j) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2 r_{j-1}}}\exp\!\left(-\frac{(X_j - \widehat{X}_j)^2}{2\sigma^2 r_{j-1}}\right).$$

Proceeding in the usual way we get
$$\ln L(\phi, \theta, \sigma^2) = -\frac{1}{2}\ln\big((2\pi\sigma^2)^n r_0 \cdots r_{n-1}\big) - \frac{S(\phi, \theta)}{2\sigma^2}.$$
Obviously $r_0, \dots, r_{n-1}$ depend on $\phi$ and $\theta$, but they do not depend on $\sigma^2$. To maximize $\ln L(\phi, \theta, \sigma^2)$ is the same as to minimize
$$\ell(\phi, \theta) = \ln\big(n^{-1}S(\phi, \theta)\big) + n^{-1}\sum_{j=1}^{n} \ln r_{j-1},$$
which has to be done numerically. In the causal and invertible case $r_n \to 1$, and therefore $n^{-1}\sum_{j=1}^{n} \ln r_{j-1}$ is asymptotically negligible compared with $\ln S(\phi, \theta)$. Thus both methods, least squares and maximum likelihood, give asymptotically the same result in that case.

11.6 Order selection

Assume now that we want to fit an ARMA(p, q) process to real data, i.e. we want to estimate $p$, $q$, $(\phi, \theta)$, and $\sigma^2$. We restrict ourselves to maximum likelihood estimation. Then we maximize $L(\phi, \theta, \sigma^2)$, or, which is the same, minimize $-2\ln L(\phi, \theta, \sigma^2)$, where $L$ is regarded as a function also of $p$ and $q$. Most probably we will get very high values of $p$ and $q$. Such a model will probably fit the given data very well, but it is more or less useless as a mathematical model, since it will probably neither lead to reasonable predictors nor describe a different data set well. It is therefore natural to introduce a "penalty factor" to discourage the fitting of models with too many parameters. Instead of maximum likelihood estimation we may apply the AICC criterion: choose $p$, $q$, $\phi_p$, and $\theta_q$ to minimize
$$\mathrm{AICC} = -2\ln L\big(\phi_p, \theta_q, S(\phi_p, \theta_q)/n\big) + \frac{2(p + q + 1)n}{n - p - q - 2}.$$
(The letters AIC stand for Akaike's Information Criterion, and the last C for bias-Corrected.) The AICC criterion has certain nice properties, but also its drawbacks. In general one may say that order selection is genuinely difficult.

12 Multivariate time series

Let
$$\boldsymbol{X}_t \stackrel{\mathrm{def}}{=} \begin{pmatrix} X_{t1}\\ \vdots\\ X_{tm} \end{pmatrix}, \quad t \in \mathbb{Z},$$
where each component is a time series. In that case we talk about multivariate time series.

The second-order properties of $\{\boldsymbol{X}_t\}$ are specified by the mean vectors
$$\boldsymbol{\mu}_t \stackrel{\mathrm{def}}{=} E\boldsymbol{X}_t = \begin{pmatrix} EX_{t1}\\ \vdots\\ EX_{tm} \end{pmatrix} = \begin{pmatrix} \mu_{t1}\\ \vdots\\ \mu_{tm} \end{pmatrix}, \quad t \in \mathbb{Z},$$
and the covariance matrices
$$\Gamma(t+h, t) \stackrel{\mathrm{def}}{=} E[(\boldsymbol{X}_{t+h} - \boldsymbol{\mu}_{t+h})(\boldsymbol{X}_t - \boldsymbol{\mu}_t)'] = \begin{pmatrix} \gamma_{11}(t+h, t) & \dots & \gamma_{1m}(t+h, t)\\ \vdots & & \vdots\\ \gamma_{m1}(t+h, t) & \dots & \gamma_{mm}(t+h, t) \end{pmatrix},$$
where $\gamma_{ij}(t+h, t) = \mathrm{Cov}(X_{t+h,i}, X_{t,j})$.

Definition 12.1 The m-variate time series $\{\boldsymbol{X}_t,\ t \in \mathbb{Z}\}$ is said to be (weakly) stationary if
(i) $\boldsymbol{\mu}_t = \boldsymbol{\mu}$ for all $t \in \mathbb{Z}$,
(ii) $\Gamma(r, s) = \Gamma(r+t, s+t)$ for all $r, s, t \in \mathbb{Z}$.
Item (ii) implies that $\Gamma(r, s)$ is a function of $r - s$, and it is convenient to define $\Gamma(h) \stackrel{\mathrm{def}}{=} \Gamma(h, 0)$.

Definition 12.2 (Multivariate white noise) An m-variate process $\{\boldsymbol{Z}_t,\ t \in \mathbb{Z}\}$ is said to be white noise with mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$, written $\{\boldsymbol{Z}_t\} \sim \mathrm{WN}(\boldsymbol{\mu}, \Sigma)$, if $E\boldsymbol{Z}_t = \boldsymbol{\mu}$ and
$$\Gamma(h) = \begin{cases} \Sigma & \text{if } h = 0,\\ 0 & \text{if } h \ne 0. \end{cases}$$

Definition 12.3 (The ARMA(p, q) process) The process $\{\boldsymbol{X}_t,\ t \in \mathbb{Z}\}$ is said to be an ARMA(p, q) process if it is stationary and if
$$\boldsymbol{X}_t - \Phi_1\boldsymbol{X}_{t-1} - \dots - \Phi_p\boldsymbol{X}_{t-p} = \boldsymbol{Z}_t + \Theta_1\boldsymbol{Z}_{t-1} + \dots + \Theta_q\boldsymbol{Z}_{t-q}, \tag{12}$$
where $\{\boldsymbol{Z}_t\} \sim \mathrm{WN}(0, \Sigma)$. We say that $\{\boldsymbol{X}_t\}$ is an ARMA(p, q) process with mean $\boldsymbol{\mu}$ if $\{\boldsymbol{X}_t - \boldsymbol{\mu}\}$ is an ARMA(p, q) process. Equations (12) can be written as
$$\Phi(B)\boldsymbol{X}_t = \Theta(B)\boldsymbol{Z}_t, \quad t \in \mathbb{Z},$$
where
$$\Phi(z) = I - \Phi_1 z - \dots - \Phi_p z^p$$

and
$$\Theta(z) = I + \Theta_1 z + \dots + \Theta_q z^q$$
are matrix-valued polynomials. Causality and invertibility are characterized in terms of the generating polynomials:
Causality: $\boldsymbol{X}_t$ is causal if $\det\Phi(z) \ne 0$ for all $|z| \le 1$;
Invertibility: $\boldsymbol{X}_t$ is invertible if $\det\Theta(z) \ne 0$ for all $|z| \le 1$.

Assume that
$$\sum_{h=-\infty}^{\infty} |\gamma_{ij}(h)| < \infty, \quad i, j = 1, \dots, m. \tag{13}$$

Definition 12.4 (The cross spectrum) Let $\{\boldsymbol{X}_t,\ t \in \mathbb{Z}\}$ be an m-variate stationary time series whose ACVF satisfies (13). The function
$$f_{jk}(\lambda) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} e^{-ih\lambda}\gamma_{jk}(h), \quad -\pi \le \lambda \le \pi,\ j \ne k,$$
is called the cross spectrum or cross spectral density of $\{X_{tj}\}$ and $\{X_{tk}\}$. The matrix
$$f(\lambda) = \begin{pmatrix} f_{11}(\lambda) & \dots & f_{1m}(\lambda)\\ \vdots & & \vdots\\ f_{m1}(\lambda) & \dots & f_{mm}(\lambda) \end{pmatrix}$$
is called the spectrum or spectral density matrix of $\{\boldsymbol{X}_t\}$. The spectral density matrix $f(\lambda)$ is non-negative definite for all $\lambda \in [-\pi, \pi]$.

13 Kalman filtering

We will use the notation
$$\{\boldsymbol{Z}_t\} \sim \mathrm{WN}(0, \{\Sigma_t\})$$
to indicate that the process $\{\boldsymbol{Z}_t\}$ has mean 0 and that
$$E[\boldsymbol{Z}_s\boldsymbol{Z}_t'] = \begin{cases} \Sigma_t & \text{if } s = t,\\ 0 & \text{otherwise.} \end{cases}$$
Notice that this definition is an extension of Definition 12.2 in order to allow for non-stationarity.

A state-space model is defined by the state equation
$$\boldsymbol{X}_{t+1} = F_t\boldsymbol{X}_t + \boldsymbol{V}_t, \quad t = 1, 2, \dots, \tag{14}$$

where $\{\boldsymbol{X}_t\}$ is a v-variate process describing the state of some system, $\{\boldsymbol{V}_t\} \sim \mathrm{WN}(0, \{Q_t\})$, and $\{F_t\}$ is a sequence of $v \times v$ matrices, and the observation equation
$$\boldsymbol{Y}_t = G_t\boldsymbol{X}_t + \boldsymbol{W}_t, \quad t = 1, 2, \dots, \tag{15}$$
where $\{\boldsymbol{Y}_t\}$ is a w-variate process describing the observed state of some system, $\{\boldsymbol{W}_t\} \sim \mathrm{WN}(0, \{R_t\})$, and $\{G_t\}$ is a sequence of $w \times v$ matrices. Further, $\{\boldsymbol{W}_t\}$ and $\{\boldsymbol{V}_t\}$ are uncorrelated. To complete the specification it is assumed that the initial state $\boldsymbol{X}_1$ is uncorrelated with $\{\boldsymbol{W}_t\}$ and $\{\boldsymbol{V}_t\}$.

Definition 13.1 (State-space representation) A time series $\{\boldsymbol{Y}_t\}$ has a state-space representation if there exists a state-space model for $\{\boldsymbol{Y}_t\}$ as specified by equations (14) and (15).

Put $P_t(\boldsymbol{X}) = P(\boldsymbol{X} \mid \boldsymbol{Y}_0, \dots, \boldsymbol{Y}_t)$, i.e. the vector of best linear predictors of $X_1, \dots, X_v$ in terms of all components of $\boldsymbol{Y}_0, \dots, \boldsymbol{Y}_t$. Linear estimation of $\boldsymbol{X}_t$ in terms of
$\boldsymbol{Y}_0, \dots, \boldsymbol{Y}_{t-1}$ defines the prediction problem;
$\boldsymbol{Y}_0, \dots, \boldsymbol{Y}_t$ defines the filtering problem;
$\boldsymbol{Y}_0, \dots, \boldsymbol{Y}_n$, $n > t$, defines the smoothing problem.

Theorem 13.1 (Kalman Prediction) The predictors $\widehat{\boldsymbol{X}}_t \stackrel{\mathrm{def}}{=} P_{t-1}(\boldsymbol{X}_t)$ and the error covariance matrices $\Omega_t \stackrel{\mathrm{def}}{=} E[(\boldsymbol{X}_t - \widehat{\boldsymbol{X}}_t)(\boldsymbol{X}_t - \widehat{\boldsymbol{X}}_t)']$ are uniquely determined by the initial conditions
$$\widehat{\boldsymbol{X}}_1 = P(\boldsymbol{X}_1 \mid \boldsymbol{Y}_0), \qquad \Omega_1 = E[(\boldsymbol{X}_1 - \widehat{\boldsymbol{X}}_1)(\boldsymbol{X}_1 - \widehat{\boldsymbol{X}}_1)'],$$
and the recursions, for $t = 1, \dots$,
$$\widehat{\boldsymbol{X}}_{t+1} = F_t\widehat{\boldsymbol{X}}_t + \Theta_t\Delta_t^{-1}(\boldsymbol{Y}_t - G_t\widehat{\boldsymbol{X}}_t), \tag{16}$$
$$\Omega_{t+1} = F_t\Omega_tF_t' + Q_t - \Theta_t\Delta_t^{-1}\Theta_t', \tag{17}$$
where
$$\Delta_t = G_t\Omega_tG_t' + R_t, \qquad \Theta_t = F_t\Omega_tG_t'.$$
The matrix $\Theta_t\Delta_t^{-1}$ is called the Kalman gain.

Theorem 13.2 (Kalman Filtering) The filtered estimates $\boldsymbol{X}_{t|t} \stackrel{\mathrm{def}}{=} P_t(\boldsymbol{X}_t)$ and the error covariance matrices $\Omega_{t|t} \stackrel{\mathrm{def}}{=} E[(\boldsymbol{X}_t - \boldsymbol{X}_{t|t})(\boldsymbol{X}_t - \boldsymbol{X}_{t|t})']$ are determined by the relations
$$\boldsymbol{X}_{t|t} = P_{t-1}(\boldsymbol{X}_t) + \Omega_tG_t'\Delta_t^{-1}(\boldsymbol{Y}_t - G_t\widehat{\boldsymbol{X}}_t)$$
and
$$\Omega_{t|t} = \Omega_t - \Omega_tG_t'\Delta_t^{-1}G_t\Omega_t.$$

Theorem 13.3 (Kalman Fixed Point Smoothing) The smoothed estimates $\boldsymbol{X}_{t|n} \stackrel{\mathrm{def}}{=} P_n(\boldsymbol{X}_t)$ and the error covariance matrices $\Omega_{t|n} \stackrel{\mathrm{def}}{=} E[(\boldsymbol{X}_t - \boldsymbol{X}_{t|n})(\boldsymbol{X}_t - \boldsymbol{X}_{t|n})']$ are determined for fixed $t$ by the following recursions, which can be solved successively for $n = t, t+1, \dots$:
$$P_n(\boldsymbol{X}_t) = P_{n-1}(\boldsymbol{X}_t) + \Omega_{t,n}G_n'\Delta_n^{-1}(\boldsymbol{Y}_n - G_n\widehat{\boldsymbol{X}}_n),$$
$$\Omega_{t,n+1} = \Omega_{t,n}[F_n - \Theta_n\Delta_n^{-1}G_n]',$$
$$\Omega_{t|n} = \Omega_{t|n-1} - \Omega_{t,n}G_n'\Delta_n^{-1}G_n\Omega_{t,n}',$$
with initial conditions $P_{t-1}(\boldsymbol{X}_t) = \widehat{\boldsymbol{X}}_t$ and $\Omega_{t,t} = \Omega_{t|t-1} = \Omega_t$ found from Kalman prediction.
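A sketch of the Kalman prediction recursions (16)-(17) (not from the original notes), for a scalar state-space model with constant $F$, $G$, $Q$, $R$; all numerical values are arbitrary, and the 1x1 matrices are kept as arrays so that the general formulas carry over unchanged.

```python
import numpy as np

# Scalar state-space model X_{t+1} = F X_t + V_t, Y_t = G X_t + W_t (arbitrary illustration values).
F = np.array([[0.9]]); G = np.array([[1.0]])
Q = np.array([[0.5]]); R = np.array([[1.0]])
rng = np.random.default_rng(7)

n = 200
x = np.zeros(n); y = np.zeros(n)
x[0] = rng.normal()
for t in range(n):
    y[t] = G[0, 0] * x[t] + rng.normal(0.0, np.sqrt(R[0, 0]))
    if t + 1 < n:
        x[t + 1] = F[0, 0] * x[t] + rng.normal(0.0, np.sqrt(Q[0, 0]))

# Kalman prediction: Xhat_{t+1} = F Xhat_t + Theta_t Delta_t^{-1} (Y_t - G Xhat_t),
#                    Omega_{t+1} = F Omega_t F' + Q - Theta_t Delta_t^{-1} Theta_t'.
xhat = np.zeros((1, 1))                    # crude initial conditions (assumptions, not from the notes)
Omega = np.array([[10.0]])
for t in range(n):
    Delta = G @ Omega @ G.T + R
    Theta = F @ Omega @ G.T
    gain = Theta @ np.linalg.inv(Delta)    # the Kalman gain
    xhat = F @ xhat + gain @ (np.array([[y[t]]]) - G @ xhat)
    Omega = F @ Omega @ F.T + Q - gain @ Theta.T

print(round(xhat[0, 0], 3), round(Omega[0, 0], 3))   # prediction of X_{n+1} and its error variance
```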

Index

ACF; ACVF; AICC; AR(p) process; ARCH(p) process; ARIMA(p, d, q) process; ARMA(p, q) process (causal, invertible, multivariate); autocorrelation function; autocovariance function; autoregressive process; best linear predictor; Brownian motion; Cauchy sequence; causality; characteristic function; convergence (mean-square); cross spectrum; density function; distribution function; Durbin-Levinson algorithm; estimation (least squares, maximum likelihood); FARIMA(p, d, q) process; Fourier frequencies; GARCH(p, q) process; Gaussian time series; generating polynomials; Hannan-Rissanen algorithm; IID noise; innovations algorithm; invertibility; Kalman filtering; Kalman prediction; Kalman smoothing; linear filter (causal, stable, time-invariant); linear process; MA(q) process; mean function; mean-square convergence; mean-squared error; moving average; observation equation; PACF; partial autocorrelation; partial correlation coefficient; periodogram; point estimate; Poisson process; power transfer function; probability function; probability measure; probability space; random variable; sample space; shift operator; sigma-field; spectral density (matrix); spectral distribution; spectral estimator (discrete average); state equation; state-space model; state-space representation; stochastic process; strict stationarity; strictly linear time series; time series (linear, multivariate, stationary, strictly linear, strictly stationary, weakly stationary); TLF; transfer function; weak stationarity; white noise (multivariate); Wiener process; WN; Yule-Walker equations.
