Mathematical Statistics
Contents

1 Some notation
2 General probabilistic formulas
   2.1 Some distributions
   2.2 Estimation
3 Stochastic processes
4 Stationarity
5 Spectral theory
6 Time series models
   6.1 ARMA processes
   6.2 ARIMA and FARIMA processes
   6.3 Financial time series
7 Prediction
   7.1 Prediction for stationary time series
   7.2 Prediction of an ARMA process
8 Partial correlation
   8.1 Partial autocorrelation
9 Linear filters
10 Estimation in time series
   10.1 Estimation of μ
   10.2 Estimation of γ(·) and ρ(·)
   10.3 Estimation of the spectral density
      10.3.1 The periodogram
      10.3.2 Smoothing the periodogram
11 Estimation for ARMA models
   11.1 Yule–Walker estimation
   11.2 Burg's algorithm
   11.3 The innovations algorithm
   11.4 The Hannan–Rissanen algorithm
   11.5 Maximum likelihood and least squares estimation
   11.6 Order selection
12 Multivariate time series
13 Kalman filtering
1 Some notation
(Ω, F, P) is a probability space, where:

Ω is the sample space, i.e. the set of all possible outcomes of an experiment.

F is a σ-field (or a σ-algebra), i.e.
(a) ∅ ∈ F;
(b) if A_1, A_2, ... ∈ F then ∪_{i=1}^{∞} A_i ∈ F;
(c) if A ∈ F then A^c ∈ F.

P is a probability measure, i.e. a function F → [0, 1] satisfying
(a) P(Ω) = 1;
(b) P(A) = 1 − P(A^c);
(c) if A_1, A_2, ... ∈ F are disjoint, then P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i).
2 General probabilistic formulas

Definition 2.1 A random variable X defined on (Ω, F, P) is a function X: Ω → R such that {ω : X(ω) ≤ x} ∈ F for all x ∈ R.

Let X be a random variable.

F_X(x) = P{X ≤ x} is the distribution function (fördelningsfunktionen).

f_X(·), given by F_X(x) = ∫_{−∞}^{x} f_X(t) dt, is the density function.

p_X(k) = P{X = k} is the probability function (sannolikhetsfunktionen).

φ_X(u) = E[e^{iuX}] is the characteristic function (karakteristiska funktionen).

Definition 2.2 Let X_1, X_2, ... be a sequence of random variables. We say that X_n converges in probability to the real number a, written X_n →P a, if for every ε > 0,

    lim_{n→∞} P(|X_n − a| > ε) = 0.
Definition 2.3 Let X_1, X_2, ... be a sequence of random variables with finite second moment. We say that X_n converges in mean-square to the random variable X, written X_n →m.s. X, if E[(X_n − X)²] → 0 as n → ∞.

An important property of mean-square convergence is that Cauchy sequences do converge. More precisely, this means that if X_1, X_2, ... have finite second moment and if E[(X_n − X_k)²] → 0 as n, k → ∞, then there exists a random variable X with finite second moment such that X_n →m.s. X. The space of square-integrable random variables is complete under mean-square convergence.
2.1 Some distributions
The Binomial Distribution
X ∈ Bin(n, p) if p_X(k) = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n, and 0 ≤ p ≤ 1.
E(X) = np, Var(X) = np(1 − p).

The Poisson Distribution
X ∈ Po(λ) if p_X(k) = (λ^k / k!) e^{−λ}, k = 0, 1, ..., and λ > 0.
E(X) = λ, Var(X) = λ, φ_X(u) = e^{λ(e^{iu} − 1)}.

The Exponential Distribution
X ∈ Exp(λ), λ > 0, if

    f_X(x) = (1/λ) e^{−x/λ} if x ≥ 0,
             0               if x < 0.

E(X) = λ, Var(X) = λ².

The Standard Normal Distribution
X ∈ N(0, 1) if

    f_X(x) = (1/√(2π)) e^{−x²/2}, x ∈ R.
E(X) = 0, Var(X) = 1, φ_X(u) = e^{−u²/2}. The density function is often denoted by φ(·) and the distribution function by Φ(·).
The (multivariate) Normal Distribution
Y = (Y_1, ..., Y_m)′ ∈ N(μ, Σ) if there exist a vector μ = (μ_1, ..., μ_m)′, an m × n matrix B = (b_{ij}) with Σ = BB′, and a random vector X = (X_1, ..., X_n)′ with independent and N(0, 1)-distributed components, such that Y = μ + BX.

If

    (Y_1, Y_2)′ ∈ N( (μ_1, μ_2)′, [σ_1², ρσ_1σ_2; ρσ_1σ_2, σ_2²] )

then Y_1 conditional on Y_2 = y_2 ∈ N( μ_1 + ρ(σ_1/σ_2)(y_2 − μ_2), σ_1²(1 − ρ²) ).

More generally, if

    (Y_1, Y_2)′ ∈ N( (μ_1, μ_2)′, [Σ_11, Σ_12; Σ_21, Σ_22] )

then Y_1 conditional on Y_2 = y_2 ∈ N( μ_1 + Σ_12 Σ_22^{−1}(y_2 − μ_2), Σ_11 − Σ_12 Σ_22^{−1} Σ_21 ).

Asymptotic normality

Definition 2.4 Let Y_1, Y_2, ... be a sequence of random variables. Y_n ∈ AN(μ_n, σ_n²) means that

    lim_{n→∞} P( (Y_n − μ_n)/σ_n ≤ x ) = Φ(x).
Definition 2.5 Let Y_1, Y_2, ... be a sequence of random k-vectors. Y_n ∈ AN(μ_n, Σ_n) means that
(a) Σ_1, Σ_2, ... have no zero diagonal elements;
(b) λ′Y_n ∈ AN(λ′μ_n, λ′Σ_nλ) for every λ ∈ R^k such that λ′Σ_nλ > 0 for all sufficiently large n.
2.2 Estimation
Let x_1, ..., x_n be observations of random variables X_1, ..., X_n with a (known) distribution depending on an unknown parameter θ. A point estimate (punktskattning) of θ is then a value θ̂ = θ̂(x_1, ..., x_n). In order to analyze the estimate we consider the estimator (stickprovsvariabeln) θ̂(X_1, ..., X_n). Some nice properties of an estimate are the following:

An estimate θ̂ of θ is unbiased (väntevärdesriktig) if E(θ̂(X_1, ..., X_n)) = θ for all θ.

An estimate θ̂ of θ is consistent if P(|θ̂(X_1, ..., X_n) − θ| > ε) → 0 for every ε > 0 as n → ∞.

If θ̂ and θ* are unbiased estimates of θ, we say that θ̂ is more effective than θ* if Var(θ̂(X_1, ..., X_n)) ≤ Var(θ*(X_1, ..., X_n)) for all θ.
3 Stochastic processes
Definition 3.1 (Stochastic process) A stochastic process is a family of random variables {X_t, t ∈ T} defined on a probability space (Ω, F, P).

A stochastic process with T ⊂ Z is often called a time series.

Definition 3.2 (The distribution of a stochastic process) Put 𝒯 = {t = (t_1, ..., t_n)′ ∈ T^n : t_1 < t_2 < ... < t_n, n = 1, 2, ...}. The (finite-dimensional) distribution functions are the family {F_t(·), t ∈ 𝒯} defined by

    F_t(x) = P(X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n), t ∈ 𝒯, x ∈ R^n.

By the distribution of {X_t, t ∈ T ⊂ R} we mean the family {F_t(·), t ∈ 𝒯}.

Definition 3.3 Let {X_t, t ∈ T} be a stochastic process with Var(X_t) < ∞.
The mean function of {X_t} is μ_X(t) = E(X_t), t ∈ T.
The covariance function of {X_t} is γ_X(r, s) = Cov(X_r, X_s), r, s ∈ T.
Definition 3.4 (Standard Brownian motion) A standard Brownian motion, or standard Wiener process, {B(t), t ≥ 0} is a stochastic process satisfying
(a) B(0) = 0;
(b) for every t = (t_0, t_1, ..., t_n) with 0 = t_0 < t_1 < ... < t_n, the random variables Δ_1 = B(t_1) − B(t_0), ..., Δ_n = B(t_n) − B(t_{n−1}) are independent;
(c) B(t) − B(s) ∈ N(0, t − s) for t ≥ s.

Definition 3.5 (Poisson process) A Poisson process {N(t), t ≥ 0} with mean rate (or intensity) λ is a stochastic process satisfying
(a) N(0) = 0;
(b) for every t = (t_0, t_1, ..., t_n) with 0 = t_0 < t_1 < ... < t_n, the random variables Δ_1 = N(t_1) − N(t_0), ..., Δ_n = N(t_n) − N(t_{n−1}) are independent;
(c) N(t) − N(s) ∈ Po(λ(t − s)) for t ≥ s.

Definition 3.6 (Gaussian time series) The time series {X_t, t ∈ Z} is said to be a Gaussian time series if all finite-dimensional distributions are normal.
4 Stationarity
Definition 4.1 The time series {X_t, t ∈ Z} is said to be strictly stationary if the distributions of (X_{t_1}, ..., X_{t_k})′ and (X_{t_1+h}, ..., X_{t_k+h})′ are the same for all k and all t_1, ..., t_k, h ∈ Z.

Definition 4.2 The time series {X_t, t ∈ Z} is said to be (weakly) stationary if (see Definition 3.3 for notation)
(i) Var(X_t) < ∞ for all t ∈ Z,
(ii) μ_X(t) = μ for all t ∈ Z,
(iii) γ_X(r, s) = γ_X(r + t, s + t) for all r, s, t ∈ Z.

(iii) implies that γ_X(r, s) is a function of r − s, and it is convenient to define γ_X(h) = γ_X(h, 0). The value h is referred to as the lag.

Definition 4.3 Let {X_t, t ∈ Z} be a stationary time series.
The autocovariance function (ACVF) of {X_t} is γ_X(h) = Cov(X_{t+h}, X_t).
The autocorrelation function (ACF) is

    ρ_X(h) = γ_X(h) / γ_X(0).
5 Spectral theory
Definition 5.1 The complex-valued time series {X_t, t ∈ Z} is said to be stationary if
(i) E|X_t|² < ∞ for all t ∈ Z,
(ii) EX_t is independent of t for all t ∈ Z,
(iii) E[X_{t+h} X̄_t] is independent of t for all t ∈ Z.

Definition 5.2 The autocovariance function γ(·) of a complex-valued stationary time series {X_t} is

    γ(h) = E[X_{t+h} X̄_t] − E[X_{t+h}] E[X̄_t].

Suppose that Σ_{h=−∞}^{∞} |γ(h)| < ∞. Then

    f(λ) = (1/2π) Σ_{h=−∞}^{∞} e^{−ihλ} γ(h), −π ≤ λ ≤ π,   (1)

is called the spectral density of the time series {X_t, t ∈ Z}. We have the spectral representation of the ACVF

    γ(h) = ∫_{−π}^{π} e^{ihλ} f(λ) dλ.

For a real-valued time series f is symmetric, i.e. f(λ) = f(−λ).

For any stationary time series the ACVF has the representation

    γ(h) = ∫_{(−π,π]} e^{ihλ} dF(λ),

where the spectral distribution function F(λ) is a right-continuous, non-decreasing, bounded function on [−π, π] with F(−π) = 0. The time series itself has a spectral representation

    X_t = ∫_{(−π,π]} e^{itλ} dZ(λ),

where {Z(λ), λ ∈ [−π, π]} is an orthogonal-increment process.

Definition 5.3 (Orthogonal-increment process) An orthogonal-increment process on [−π, π] is a complex-valued process {Z(λ)} such that

    ⟨Z(λ), Z(λ)⟩ < ∞, ⟨Z(λ), 1⟩ = 0,

and

    ⟨Z(λ_4) − Z(λ_3), Z(λ_2) − Z(λ_1)⟩ = 0 if (λ_1, λ_2] ∩ (λ_3, λ_4] = ∅,

where ⟨X, Y⟩ = E[X Ȳ].
6 Time series models

Definition 6.1 (White noise) A process {X_t, t ∈ Z} is said to be white noise with mean μ and variance σ², written {X_t} ∈ WN(μ, σ²), if EX_t = μ and

    γ(h) = σ² if h = 0,
           0   if h ≠ 0.
Definition 6.2 (Linear processes) The process {X_t, t ∈ Z} is said to be a linear process if it has the representation

    X_t = Σ_{j=−∞}^{∞} ψ_j Z_{t−j}, {Z_t} ∈ WN(0, σ²),

where Σ_{j=−∞}^{∞} |ψ_j| < ∞.

A linear process is stationary with ACVF

    γ(h) = σ² Σ_{j=−∞}^{∞} ψ_j ψ_{j+h}

and spectral density

    f(λ) = (σ²/2π) |ψ(e^{−iλ})|², where ψ(z) = Σ_{j=−∞}^{∞} ψ_j z^j.
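As a numerical sanity check of the two formulas above (a minimal sketch assuming NumPy is available; the function names are illustrative), one can compare the spectral density of an MA(1)-type linear process (ψ_0 = 1, ψ_1 = θ, all other ψ_j = 0) computed via |ψ(e^{−iλ})|² with the one computed directly from the ACVF sum in (1):

```python
import numpy as np

def ma1_spectral_density(lam, theta, sigma2=1.0):
    """f(lam) = sigma^2/(2*pi) * |psi(e^{-i*lam})|^2 with psi(z) = 1 + theta*z."""
    psi = 1.0 + theta * np.exp(-1j * lam)
    return sigma2 / (2 * np.pi) * np.abs(psi) ** 2

def ma1_spectral_density_from_acvf(lam, theta, sigma2=1.0):
    """Same density via (1/(2*pi)) * sum_h e^{-i*h*lam} gamma(h), where for this
    process gamma(0) = (1 + theta^2) sigma^2, gamma(+-1) = theta sigma^2."""
    g0 = (1 + theta ** 2) * sigma2
    g1 = theta * sigma2
    return (g0 + 2 * g1 * np.cos(lam)) / (2 * np.pi)
```

Both expressions reduce to (σ²/2π)(1 + θ² + 2θ cos λ), so they agree on any frequency grid.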
Definition 6.3 (IID noise) A process {X_t, t ∈ Z} is said to be IID noise with mean 0 and variance σ², written {X_t} ∈ IID(0, σ²), if the random variables X_t are independent and identically distributed with EX_t = 0 and Var(X_t) = σ².
6.1 ARMA processes
Definition 6.4 (The ARMA(p, q) process) The process {X_t, t ∈ Z} is said to be an ARMA(p, q) process if it is stationary and if

    X_t − φ_1 X_{t−1} − ... − φ_p X_{t−p} = Z_t + θ_1 Z_{t−1} + ... + θ_q Z_{t−q},   (2)

where {Z_t} ∈ WN(0, σ²). We say that {X_t} is an ARMA(p, q) process with mean μ if {X_t − μ} is an ARMA(p, q) process.

Equations (2) can be written as

    φ(B)X_t = θ(B)Z_t, t ∈ Z,

where φ(z) = 1 − φ_1 z − ... − φ_p z^p, θ(z) = 1 + θ_1 z + ... + θ_q z^q, and B is the backward shift operator, i.e. (B^j X)_t = X_{t−j}. The polynomials φ(·) and θ(·) are called generating polynomials.

Definition 6.5 An ARMA(p, q) process defined by the equations φ(B)X_t = θ(B)Z_t, {Z_t} ∈ WN(0, σ²), is said to be causal if there exists a sequence {ψ_j} with Σ_{j=0}^{∞} |ψ_j| < ∞ such that

    X_t = Σ_{j=0}^{∞} ψ_j Z_{t−j}, t ∈ Z.   (3)
Theorem 6.1 Let {X_t} be an ARMA(p, q) process for which φ(·) and θ(·) have no common zeros. Then {X_t} is causal if and only if φ(z) ≠ 0 for all |z| ≤ 1. The coefficients {ψ_j} in (3) are determined by the relation

    ψ(z) = Σ_{j=0}^{∞} ψ_j z^j = θ(z)/φ(z), |z| ≤ 1.
Definition 6.6 An ARMA(p, q) process defined by the equations φ(B)X_t = θ(B)Z_t, {Z_t} ∈ WN(0, σ²), is said to be invertible if there exists a sequence {π_j} with Σ_{j=0}^{∞} |π_j| < ∞ such that

    Z_t = Σ_{j=0}^{∞} π_j X_{t−j}, t ∈ Z.   (4)
Theorem 6.2 Let {X_t} be an ARMA(p, q) process for which φ(·) and θ(·) have no common zeros. Then {X_t} is invertible if and only if θ(z) ≠ 0 for all |z| ≤ 1. The coefficients {π_j} in (4) are determined by the relation

    π(z) = Σ_{j=0}^{∞} π_j z^j = φ(z)/θ(z), |z| ≤ 1.

A causal and invertible ARMA(p, q) process has spectral density

    f(λ) = (σ²/2π) · |θ(e^{−iλ})|² / |φ(e^{−iλ})|².
Definition 6.7 (The AR(p) process) The process {X_t, t ∈ Z} is said to be an AR(p) process (autoregressive process of order p) if it is stationary and if

    X_t − φ_1 X_{t−1} − ... − φ_p X_{t−p} = Z_t, {Z_t} ∈ WN(0, σ²).

We say that {X_t} is an AR(p) process with mean μ if {X_t − μ} is an AR(p) process.

A causal AR(p) process has spectral density

    f(λ) = (σ²/2π) · 1/|φ(e^{−iλ})|².

Its ACVF is determined by the Yule–Walker equations:

    γ(k) − φ_1 γ(k − 1) − ... − φ_p γ(k − p) = σ² if k = 0,
                                              0   if k = 1, ..., p.   (5)

A causal AR(1) process defined by

    X_t − φX_{t−1} = Z_t, {Z_t} ∈ WN(0, σ²),

has ACVF

    γ(h) = σ² φ^{|h|} / (1 − φ²)

and spectral density

    f(λ) = (σ²/2π) · 1 / (1 + φ² − 2φ cos λ).
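As a small numerical check (plain Python; the function name is illustrative), the AR(1) ACVF above can be verified to satisfy the Yule–Walker equations (5) term by term:

```python
def ar1_acvf(h, phi, sigma2=1.0):
    """ACVF of a causal AR(1): gamma(h) = sigma^2 * phi^|h| / (1 - phi^2)."""
    return sigma2 * phi ** abs(h) / (1 - phi ** 2)

# Yule-Walker for AR(1): gamma(k) - phi*gamma(k-1) = sigma^2 if k = 0, else 0.
phi, sigma2 = 0.6, 2.0
g = [ar1_acvf(h, phi, sigma2) for h in range(3)]
```

With φ = 0.6 and σ² = 2 one gets γ(0) = 3.125, γ(1) = 1.875, and indeed γ(0) − φγ(1) = σ² while γ(1) − φγ(0) = 0.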
Definition 6.8 (The MA(q) process) The process {X_t, t ∈ Z} is said to be a moving average of order q if

    X_t = Z_t + θ_1 Z_{t−1} + ... + θ_q Z_{t−q}, {Z_t} ∈ WN(0, σ²),

where θ_1, ..., θ_q are constants.

An invertible MA(1) process defined by

    X_t = Z_t + θZ_{t−1}, {Z_t} ∈ WN(0, σ²),

has ACVF

    γ(h) = (1 + θ²)σ² if h = 0,
           θσ²        if |h| = 1,
           0          if |h| > 1.
6.2 ARIMA and FARIMA processes

Definition 6.9 (The ARIMA(p, d, q) process) Let d be a non-negative integer. The process {X_t, t ∈ Z} is said to be an ARIMA(p, d, q) process if (1 − B)^d X_t is a causal ARMA(p, q) process.

Definition 6.10 (The FARIMA(p, d, q) process) Let 0 < |d| < 0.5. The process {X_t, t ∈ Z} is said to be a fractionally integrated ARMA process, or a FARIMA(p, d, q) process, if {X_t} is stationary and satisfies

    φ(B)(1 − B)^d X_t = θ(B)Z_t, {Z_t} ∈ WN(0, σ²).
6.3 Financial time series

Definition 6.11 (The ARCH(p) process) The process {X_t, t ∈ Z} is said to be an ARCH(p) process if it is stationary and if X_t = σ_t Z_t, where

    σ_t² = α_0 + α_1 X²_{t−1} + ... + α_p X²_{t−p}

and α_0 > 0, α_j ≥ 0 for j = 1, ..., p, and if Z_t and X_{t−1}, X_{t−2}, ... are independent for all t.

Definition 6.12 (The GARCH(p, q) process) The process {X_t, t ∈ Z} is said to be a GARCH(p, q) process if it is stationary and if X_t = σ_t Z_t, where

    σ_t² = α_0 + α_1 X²_{t−1} + ... + α_p X²_{t−p} + β_1 σ²_{t−1} + ... + β_q σ²_{t−q}

and α_0 > 0, α_j ≥ 0 for j = 1, ..., p, β_k ≥ 0 for k = 1, ..., q, and if Z_t and X_{t−1}, X_{t−2}, ... are independent for all t.
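The defining recursion of an ARCH(1) process is easy to simulate. The following is a minimal sketch (assuming NumPy and, as an extra assumption not stated in the definition above, Gaussian IID noise Z_t); when α_1 < 1 the process is stationary with Var(X_t) = α_0/(1 − α_1):

```python
import numpy as np

def simulate_arch1(n, alpha0, alpha1, rng=None):
    """Simulate X_t = sigma_t * Z_t with sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2.
    Assumes Gaussian IID Z_t; requires alpha0 > 0 and 0 <= alpha1 < 1."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n)
    z = rng.standard_normal(n)
    prev_x2 = alpha0 / (1 - alpha1)  # start at the stationary mean of X_t^2
    for t in range(n):
        sigma2 = alpha0 + alpha1 * prev_x2
        x[t] = np.sqrt(sigma2) * z[t]
        prev_x2 = x[t] ** 2
    return x
```

A long simulated path should have sample mean near 0 and sample variance near α_0/(1 − α_1), even though the squared values are strongly dependent.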
7 Prediction
Let X_1, X_2, ..., X_n and Y be any random variables with finite means and variances. Put μ_i = E(X_i), μ = E(Y),

    Γ_n = (γ_{i,j})_{i,j=1,...,n}, γ_{i,j} = Cov(X_i, X_j),

and

    γ_n = (γ_1, ..., γ_n)′ = (Cov(X_1, Y), ..., Cov(X_n, Y))′.
Definition 7.1 The best linear predictor Ŷ of Y in terms of X_1, X_2, ..., X_n is a random variable of the form Ŷ = a_0 + a_1 X_1 + ... + a_n X_n such that E[(Y − Ŷ)²] is minimized with respect to a_0, ..., a_n. E[(Y − Ŷ)²] is called the mean-squared error. It is often convenient to use the notation P_{sp{1, X_1,...,X_n}} Y = Ŷ.

The predictor is given by

    Ŷ = μ + a_1(X_1 − μ_1) + ... + a_n(X_n − μ_n),

where a_n = (a_1, ..., a_n)′ satisfies Γ_n a_n = γ_n. If Γ_n is non-singular we have a_n = Γ_n^{−1} γ_n.

There is no restriction to assume all means to be 0. The predictor Ŷ of Y is determined by

    Cov(Y − Ŷ, X_i) = 0, for i = 1, ..., n.
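In the zero-mean case the predictor coefficients are just the solution of the linear system Γ_n a_n = γ_n. A minimal sketch (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def blp_coefficients(Gamma, gamma):
    """Coefficients a_n of the best linear predictor (zero-mean case),
    obtained by solving Gamma_n a_n = gamma_n."""
    return np.linalg.solve(np.asarray(Gamma, float), np.asarray(gamma, float))

# Example: predict Y = X_3 of an AR(1)-type covariance structure from X_1, X_2,
# with Cov(X_i, X_j) = 0.5^|i-j|.  The solution a = (0.5, 0) reflects the
# Markov property: only the most recent variable matters.
Gamma = [[1.0, 0.5],
         [0.5, 1.0]]
gamma = [0.5, 0.25]
a = blp_coefficients(Gamma, gamma)
```

The orthogonality conditions Cov(Y − Ŷ, X_i) = 0 are exactly Γ_n a_n = γ_n, so a correct solution can always be checked by verifying Γ_n a_n − γ_n = 0.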
7.1 Prediction for stationary time series
Theorem 7.1 If {X_t} is a zero-mean stationary time series such that γ(0) > 0 and γ(h) → 0 as h → ∞, the best linear predictor X̂_{n+1} of X_{n+1} in terms of X_1, X_2, ..., X_n is

    X̂_{n+1} = Σ_{i=1}^{n} φ_{n,i} X_{n+1−i}, n = 1, 2, ...,

where φ_n = (φ_{n,1}, ..., φ_{n,n})′ satisfies Γ_n φ_n = γ_n with γ_n = (γ(1), ..., γ(n))′, and the mean-squared error is

    v_n = γ(0) − γ_n′ Γ_n^{−1} γ_n.

Theorem 7.2 (The Durbin–Levinson algorithm) If {X_t} is a zero-mean stationary time series such that γ(0) > 0 and γ(h) → 0 as h → ∞, then φ_{1,1} = γ(1)/γ(0), v_0 = γ(0),

    φ_{n,n} = [γ(n) − Σ_{j=1}^{n−1} φ_{n−1,j} γ(n − j)] / v_{n−1},

    (φ_{n,1}, ..., φ_{n,n−1})′ = (φ_{n−1,1}, ..., φ_{n−1,n−1})′ − φ_{n,n} (φ_{n−1,n−1}, ..., φ_{n−1,1})′,

and

    v_n = v_{n−1} (1 − φ_{n,n}²).

Theorem 7.3 (The innovations algorithm) If {X_t} has zero mean and E(X_i X_j) = κ(i, j), where the matrices (κ(i, j))_{i,j=1}^{n} are non-singular for each n, the one-step predictors are

    X̂_{n+1} = 0 if n = 0,
               Σ_{j=1}^{n} θ_{n,j}(X_{n+1−j} − X̂_{n+1−j}) if n ≥ 1,   (6)

where v_0 = κ(1, 1),

    θ_{n,n−k} = v_k^{−1} [κ(n + 1, k + 1) − Σ_{j=0}^{k−1} θ_{k,k−j} θ_{n,n−j} v_j], k = 0, ..., n − 1,

and

    v_n = κ(n + 1, n + 1) − Σ_{j=0}^{n−1} θ²_{n,n−j} v_j.
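The Durbin–Levinson recursion of Theorem 7.2 translates directly into code. A minimal sketch (assuming NumPy; the function name is illustrative), taking an ACVF γ(0), ..., γ(N) and returning all coefficients φ_{n,j} and mean-squared errors v_n:

```python
import numpy as np

def durbin_levinson(gamma):
    """Durbin-Levinson recursion (Theorem 7.2).
    gamma = [gamma(0), ..., gamma(N)]; returns phi with phi[n, j] = phi_{n,j}
    (1 <= j <= n) and the one-step MSEs v[0..N]."""
    g = np.asarray(gamma, float)
    N = len(g) - 1
    phi = np.zeros((N + 1, N + 1))
    v = np.zeros(N + 1)
    v[0] = g[0]
    for n in range(1, N + 1):
        # phi_{n,n} = [gamma(n) - sum_j phi_{n-1,j} gamma(n-j)] / v_{n-1}
        acc = g[n] - sum(phi[n - 1, j] * g[n - j] for j in range(1, n))
        phi[n, n] = acc / v[n - 1]
        # remaining coefficients from the reversed previous row
        for j in range(1, n):
            phi[n, j] = phi[n - 1, j] - phi[n, n] * phi[n - 1, n - j]
        v[n] = v[n - 1] * (1 - phi[n, n] ** 2)
    return phi, v
```

For the AR(1) ACVF γ(h) = φ^h/(1 − φ²) the recursion returns φ_{1,1} = φ, φ_{n,n} = 0 for n ≥ 2, and v_n = 1 for n ≥ 1, as expected; the quantities φ_{n,n} are also the partial autocorrelations of the series.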
7.2 Prediction of an ARMA process

Let {X_t} be a causal ARMA(p, q) process defined by φ(B)X_t = θ(B)Z_t. Then

    X̂_{n+1} = Σ_{j=1}^{n} θ_{n,j}(X_{n+1−j} − X̂_{n+1−j}) if 1 ≤ n < m,

    X̂_{n+1} = φ_1 X_n + ... + φ_p X_{n+1−p} + Σ_{j=1}^{q} θ_{n,j}(X_{n+1−j} − X̂_{n+1−j}) if n ≥ m,

where m = max(p, q). The θ_{n,j}'s are obtained by the innovations algorithm applied to

    W_t = σ^{−1} X_t, if t = 1, ..., m,
    W_t = σ^{−1} φ(B)X_t, if t > m.
8 Partial correlation
Definition 8.1 Let Y_1, Y_2 and W_1, ..., W_k be random variables. The partial correlation coefficient of Y_1 and Y_2 with respect to W_1, ..., W_k is defined by

    α(Y_1, Y_2) = ρ(Y_1 − Ŷ_1, Y_2 − Ŷ_2),

where Ŷ_1 = P_{sp{1,W_1,...,W_k}} Y_1 and Ŷ_2 = P_{sp{1,W_1,...,W_k}} Y_2.
8.1 Partial autocorrelation

Definition 8.2 Let {X_t, t ∈ Z} be a zero-mean stationary time series. The partial autocorrelation function (PACF) of {X_t} is defined by

    α(0) = 1, α(1) = ρ(1),
    α(h) = ρ(X_{h+1} − P_{sp{X_2,...,X_h}} X_{h+1}, X_1 − P_{sp{X_2,...,X_h}} X_1), h ≥ 2.
Theorem 8.1 Under the assumptions of Theorem 7.2, α(h) = φ_{h,h} for h ≥ 1.
9 Linear filters

A filter is an operation on a time series {X_t} producing a new time series {Y_t}; {X_t} is called the input and {Y_t} the output. The operation

    Y_t = Σ_{k=−∞}^{∞} c_{t,k} X_k

defines a linear filter. A filter is called time-invariant if c_{t,k} depends only on t − k, i.e. if c_{t,k} = h_{t−k}.
A time-invariant linear filter (TLF) is said to be causal if h_j = 0 for j < 0. A TLF is called stable if

    Σ_{k=−∞}^{∞} |h_k| < ∞.

Put h(z) = Σ_k h_k z^k. Then Y = h(B)X. The function h(e^{−iλ}) is called the transfer function (överföringsfunktion or frekvenssvarsfunktion). The function |h(e^{−iλ})|² is called the power transfer function.

Theorem 9.1 Let {X_t} be a possibly complex-valued stationary input to a stable TLF h(B), and let {Y_t} be the output, i.e. Y = h(B)X. Then
(a) EY_t = h(1)EX_t;
(b) Y_t is stationary;
(c) F_Y(λ) = ∫_{(−π,λ]} |h(e^{−iν})|² dF_X(ν) for λ ∈ [−π, π].
10 Estimation in time series

Definition 10.1 (Strictly linear time series) A stationary time series {X_t} is called strictly linear if it has the representation

    X_t = μ + Σ_{j=−∞}^{∞} ψ_j Z_{t−j}, {Z_t} ∈ IID(0, σ²).
10.1 Estimation of μ

Consider

    X̄_n = (1/n) Σ_{j=1}^{n} X_j.
Theorem 10.1 If {X_t} is a stationary time series with mean μ and autocovariance function γ(·), then as n → ∞,

    Var(X̄_n) = E[(X̄_n − μ)²] → 0 if γ(n) → 0,

and

    n Var(X̄_n) → Σ_{h=−∞}^{∞} γ(h) = 2π f(0) if Σ_{h=−∞}^{∞} |γ(h)| < ∞.

Theorem 10.2 If {X_t} is a strictly linear time series where Σ_{j=−∞}^{∞} |ψ_j| < ∞ and Σ_{j=−∞}^{∞} ψ_j ≠ 0, then

    X̄_n ∈ AN(μ, v/n), where v = Σ_{h=−∞}^{∞} γ(h) = σ² (Σ_{j=−∞}^{∞} ψ_j)².
10.2 Estimation of γ(·) and ρ(·)

The sample autocovariance and autocorrelation functions are

    γ̂(h) = (1/n) Σ_{t=1}^{n−h} (X_{t+h} − X̄_n)(X_t − X̄_n), 0 ≤ h ≤ n − 1,

and

    ρ̂(h) = γ̂(h) / γ̂(0).
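A direct implementation of these estimates is a few lines (a minimal sketch assuming NumPy; function names are illustrative). Note the 1/n convention, not 1/(n − h), which guarantees that the resulting covariance matrix estimate is non-negative definite:

```python
import numpy as np

def sample_acvf(x, max_lag):
    """gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar),
    for h = 0, ..., max_lag (the 1/n convention)."""
    x = np.asarray(x, float)
    n = len(x)
    d = x - x.mean()
    return np.array([np.dot(d[h:], d[:n - h]) / n for h in range(max_lag + 1)])

def sample_acf(x, max_lag):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    g = sample_acvf(x, max_lag)
    return g / g[0]
```

For example, for the series 1, 2, 3, 4 one gets γ̂(0) = 1.25, γ̂(1) = 0.3125 and hence ρ̂(1) = 0.25.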
Theorem 10.3 If {X_t} is a strictly linear time series where Σ_{j=−∞}^{∞} |ψ_j| < ∞ and EZ_t⁴ = ησ⁴ < ∞, then

    (γ̂(0), ..., γ̂(h))′ ∈ AN( (γ(0), ..., γ(h))′, n^{−1} V ),

where V = (v_{ij})_{i,j=0,...,h} is the covariance matrix with

    v_{ij} = (η − 3)γ(i)γ(j) + Σ_{k=−∞}^{∞} [γ(k)γ(k − i + j) + γ(k + j)γ(k − i)].

Note: If {Z_t, t ∈ Z} is Gaussian, then η = 3.

Theorem 10.4 If {X_t} is a strictly linear time series where Σ_{j=−∞}^{∞} |ψ_j| < ∞ and EZ_t⁴ < ∞, then

    (ρ̂(1), ..., ρ̂(h))′ ∈ AN( (ρ(1), ..., ρ(h))′, n^{−1} W ),

where W = (w_{ij})_{i,j=1,...,h} is the covariance matrix with

    w_{ij} = Σ_{k=−∞}^{∞} {ρ(k + i)ρ(k + j) + ρ(k − i)ρ(k + j)
             + 2ρ(i)ρ(j)ρ²(k) − 2ρ(i)ρ(k)ρ(k + j) − 2ρ(j)ρ(k)ρ(k + i)}.   (7)

In the following theorem the assumption EZ_t⁴ < ∞ is relaxed at the expense of a slightly stronger assumption on the sequence {ψ_j}.

Theorem 10.5 If {X_t} is a strictly linear time series where Σ_{j=−∞}^{∞} |ψ_j| < ∞ and Σ_{j=−∞}^{∞} ψ_j² |j| < ∞, then

    (ρ̂(1), ..., ρ̂(h))′ ∈ AN( (ρ(1), ..., ρ(h))′, n^{−1} W ),

where W is given by the previous theorem.
10.3 Estimation of the spectral density

The Fourier frequencies are given by ω_j = 2πj/n, and

    F_n = {ω_j : j ∈ Z, −π < ω_j ≤ π} = {ω_j : j = −[(n − 1)/2], ..., [n/2]},

where [x] denotes the integer part of x.

10.3.1 The periodogram

Definition 10.2 (The periodogram) The periodogram of X_1, ..., X_n is

    I_n(ω_j) = (1/n) |Σ_{t=1}^{n} X_t e^{−itω_j}|², ω_j ∈ F_n.
Definition 10.3 (Extension of the periodogram) For any ω ∈ [−π, π] we define

    I_n(ω) = I_n(ω_k) if ω_k − π/n < ω ≤ ω_k + π/n and ω ∈ [0, π],
    I_n(ω) = I_n(−ω) if ω ∈ [−π, 0).
Theorem 10.6 We have EI_n(0) − nμ² → 2πf(0) as n → ∞, and EI_n(ω) → 2πf(ω) as n → ∞ if ω ≠ 0. (If μ = 0, then EI_n(ω) converges uniformly to 2πf(ω) on [−π, π].)

Theorem 10.7 Let {X_t} be a strictly linear time series with μ = 0, Σ_{j=−∞}^{∞} |ψ_j| |j|^{1/2} < ∞, and EZ_t⁴ < ∞. Then

    Cov(I_n(ω_j), I_n(ω_k)) =
        2(2π)² f²(ω_j) + O(n^{−1/2}) if ω_j = ω_k = 0 or π,
        (2π)² f²(ω_j) + O(n^{−1/2})  if 0 < ω_j = ω_k < π,
        O(n^{−1})                     if ω_j ≠ ω_k.
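The periodogram at all Fourier frequencies can be computed in one FFT call (a minimal sketch assuming NumPy; the function name is illustrative). The FFT sums over t = 0, ..., n − 1 rather than t = 1, ..., n, but the shift only multiplies each term by a unit-modulus factor and so leaves |·|² unchanged:

```python
import numpy as np

def periodogram(x):
    """I_n(omega_j) = (1/n) |sum_t x_t e^{-i t omega_j}|^2 at the Fourier
    frequencies omega_j = 2*pi*j/n, j = 0, ..., n-1, via the FFT."""
    x = np.asarray(x, float)
    n = len(x)
    return np.abs(np.fft.fft(x)) ** 2 / n
```

Two quick consistency checks: I_n(0) = (Σ_t x_t)²/n, and by Parseval's identity Σ_j I_n(ω_j) = Σ_t x_t².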
10.3.2 Smoothing the periodogram

Definition 10.4 (Discrete spectral average estimator) Let {m_n} be a sequence with m_n → ∞ and m_n/n → 0 as n → ∞, and let the weights {W_n(k), |k| ≤ m_n} satisfy

    W_n(k) = W_n(−k) ≥ 0, Σ_{|k|≤m_n} W_n(k) = 1,

and

    Σ_{|k|≤m_n} W_n²(k) → 0 as n → ∞.

The estimator

    f̂(ω_j) = (1/2π) Σ_{|k|≤m_n} W_n(k) I_n(ω_{j+k})

is called a discrete spectral average estimator of f(ω). (If ω_{j+k} ∉ [−π, π] the term I_n(ω_{j+k}) is evaluated by defining I_n to have period 2π.)

Theorem 10.8 Let {X_t} be a strictly linear time series with μ = 0 and Σ_{j=−∞}^{∞} |ψ_j| |j|^{1/2} < ∞. Then

    lim_{n→∞} E f̂(ω) = f(ω).

Example 10.1 (The simple moving average estimate) For this estimate we have

    W_n(k) = 1/(2m_n + 1) if |k| ≤ m_n,
             0            if |k| > m_n,

and

    Var(f̂(ω)) ≈ (1/m_n) f²(ω)    if ω = 0 or π,
                 (1/(2m_n)) f²(ω) if 0 < ω < π.
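The simple moving average estimate above can be sketched as follows (assuming NumPy; the function name is illustrative). The periodic extension of I_n is handled by indexing modulo n:

```python
import numpy as np

def smoothed_periodogram(x, m):
    """Discrete spectral average estimator with the simple moving-average
    weights W_n(k) = 1/(2m+1), |k| <= m:
        f_hat(omega_j) = (1/(2*pi)) * sum_{|k|<=m} W_n(k) I_n(omega_{j+k}),
    with I_n extended periodically (indices taken modulo n)."""
    x = np.asarray(x, float)
    n = len(x)
    I = np.abs(np.fft.fft(x)) ** 2 / n          # periodogram at omega_j
    w = np.full(2 * m + 1, 1.0 / (2 * m + 1))   # weights sum to 1
    fhat = np.empty(n)
    for j in range(n):
        idx = [(j + k) % n for k in range(-m, m + 1)]
        fhat[j] = (w * I[idx]).sum() / (2 * np.pi)
    return fhat
```

Because the weights sum to 1 and the averaging is circular, the estimator preserves total power: Σ_j f̂(ω_j) = Σ_j I_n(ω_j)/(2π) = Σ_t x_t²/(2π).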
11 Estimation for ARMA models

11.1 Yule–Walker estimation

The Yule–Walker equations (5) can be written in the form

    Γ_p φ = γ_p and σ² = γ(0) − φ′γ_p,

where

    Γ_p = ( γ(i − j) )_{i,j=1,...,p}, φ = (φ_1, ..., φ_p)′, and γ_p = (γ(1), ..., γ(p))′.

If we replace Γ_p and γ_p with the estimates Γ̂_p and γ̂_p we obtain the following equations for the Yule–Walker estimates:

    Γ̂_p φ̂ = γ̂_p and σ̂² = γ̂(0) − φ̂′γ̂_p,

where

    Γ̂_p = ( γ̂(i − j) )_{i,j=1,...,p} and γ̂_p = (γ̂(1), ..., γ̂(p))′.
Theorem 11.1 If {X_t} is a causal AR(p) process with {Z_t} ∈ IID(0, σ²), and φ̂ is the Yule–Walker estimate of φ, then

    φ̂ ∈ AN( φ, (σ²/n) Γ_p^{−1} ).

Moreover, σ̂² →P σ².

A usual way to proceed is to fit AR(m) models for m = 1, 2, ... until we believe that m ≥ p. In that case we can use the Durbin–Levinson algorithm (Theorem 7.2) with γ(·) replaced by γ̂(·).
11.2 Burg's algorithm
Assume as usual that x_1, ..., x_n are the observations. The idea is to consider one observation after the other and to predict it both by forward and backward data. The forward and backward prediction errors {u_i(t)} and {v_i(t)} satisfy the recursions

    u_0(t) = v_0(t) = x_{n+1−t},
    u_i(t) = u_{i−1}(t − 1) − φ_{ii} v_{i−1}(t),
    v_i(t) = v_{i−1}(t) − φ_{ii} u_{i−1}(t − 1).

Suppose now that we know φ_{i−1,k} for k = 1, ..., i − 1 and φ_{ii}. Then φ_{i,k} for k = 1, ..., i − 1 may be obtained by the Durbin–Levinson algorithm. Thus the main problem is to obtain an algorithm for calculating φ_{ii} for i = 1, 2, ...

Burg's algorithm:

    d(1) = (1/2)x_1² + x_2² + ... + x²_{n−1} + (1/2)x_n²,   (8)

    φ_{ii}^{(B)} = d(i)^{−1} Σ_{t=i+1}^{n} v_{i−1}(t) u_{i−1}(t − 1),   (9)

    σ_i^{(B)2} = d(i) [1 − φ_{ii}^{(B)2}] / (n − i),   (10)

    d(i + 1) = d(i) [1 − φ_{ii}^{(B)2}] − (1/2)v_i²(i + 1) − (1/2)u_i²(n).   (11)

The Burg estimates for an AR(p) process have the same statistical properties for large values of n as the Yule–Walker estimates, i.e. Theorem 11.1 holds.
11.3 The innovations algorithm

Since a moving average has, by definition, an innovations representation, it is natural to use the innovations algorithm in a similar way as the Durbin–Levinson algorithm was used. Since, generally, q is unknown, we can try to fit MA models

    X_t = Z_t + θ_{m1} Z_{t−1} + ... + θ_{mm} Z_{t−m}, {Z_t} ∈ IID(0, v_m),

of orders m = 1, 2, ..., by means of the innovations algorithm.

Definition 11.1 (Innovations estimates of MA parameters) If γ̂(0) > 0 we define the innovations estimates

    θ̂_m = (θ̂_{m1}, ..., θ̂_{mm})′ and v̂_m, m = 1, 2, ..., n − 1,

by the recursion relations

    v̂_0 = γ̂(0),

    θ̂_{m,m−k} = v̂_k^{−1} [γ̂(m − k) − Σ_{j=0}^{k−1} θ̂_{m,m−j} θ̂_{k,k−j} v̂_j], k = 0, ..., m − 1,

    v̂_m = γ̂(0) − Σ_{j=0}^{m−1} θ̂²_{m,m−j} v̂_j.
This method works also for causal invertible ARMA processes. The following theorem gives asymptotic statistical properties of the innovations estimates.

Theorem 11.2 Let {X_t} be the causal invertible ARMA process φ(B)X_t = θ(B)Z_t, {Z_t} ∈ IID(0, σ²), EZ_t⁴ < ∞, and let

    ψ(z) = Σ_{j=0}^{∞} ψ_j z^j = θ(z)/φ(z), |z| ≤ 1

(with ψ_0 = 1 and ψ_j = 0 for j < 0). Then for any sequence of positive integers {m(n), n = 1, 2, ...} such that m → ∞ and m = o(n^{1/3}) as n → ∞, we have for each fixed k

    (θ̂_{m1}, ..., θ̂_{mk})′ ∈ AN( (ψ_1, ..., ψ_k)′, n^{−1} A ),

where A = (a_{ij})_{i,j=1,...,k} and

    a_{ij} = Σ_{r=1}^{min(i,j)} ψ_{i−r} ψ_{j−r}.

Moreover, v̂_m →P σ².
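The recursions of Definition 11.1 can be sketched as follows (assuming NumPy; the function name is illustrative). Feeding in the exact ACVF of an MA(1) instead of an estimated one shows the convergence asserted by Theorem 11.2: θ_{m,1} → θ and v_m → σ² as m grows:

```python
import numpy as np

def innovations_ma(gamma, M):
    """Innovations estimates theta_{m,j} and v_m, m = 1, ..., M, from an
    (estimated) ACVF gamma = [gamma(0), ..., gamma(M)] (Definition 11.1)."""
    g = np.asarray(gamma, float)
    theta = np.zeros((M + 1, M + 1))  # theta[m, j] holds theta_{m,j}
    v = np.zeros(M + 1)
    v[0] = g[0]
    for m in range(1, M + 1):
        for k in range(m):
            s = sum(theta[m, m - j] * theta[k, k - j] * v[j] for j in range(k))
            theta[m, m - k] = (g[m - k] - s) / v[k]
        v[m] = g[0] - sum(theta[m, m - j] ** 2 * v[j] for j in range(m))
    return theta, v
```

For the MA(1) with θ = 0.5 and σ² = 1 the true ACVF is γ(0) = 1.25, γ(1) = 0.5, γ(h) = 0 otherwise; the recursion then gives θ_{1,1} = 0.4 and iterates toward θ = 0.5.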
11.4
The HannanRissanen algorithm consists of the following two steps: Step 1 A high order AR(m) model (with m > max(p, q)) is tted to the data by YuleWalker estimation. If m1 , . . . , mm are the estimated coecients, then Zt is estimated by Zt = Xt m1 Xt1 . . . mm Xtm , Step 2 The vector = (, ) is estimated by least square regression of Xt onto Xt1 , . . . , Xtp , Zt1 , . . . , Ztq , t = m + 1, . . . , n.
22
S() =
with respect to . This gives the HannanRissanen estimator = (Z Z)1 Z X n provided Z Z is non-singular, where Xm+1 . Xn = . . Xn Xm Xm1 . . . Xmp+1 Zm Zm1 . . . Zmq+1 . . . Z= . . . . Xn1 Xn2 . . . Xnp Zn1 Zn2 . . . Znq
and
S( ) . nm
11.5 Maximum likelihood and least squares estimation

It is possible to obtain better estimates by the maximum likelihood method (under the assumption of a Gaussian process) or by the least squares method. In the least squares method we minimize

    S(φ, θ) = Σ_{j=1}^{n} (X_j − X̂_j)² / r_{j−1},

where r_{j−1} = v_{j−1}/σ², with respect to φ and θ. The estimates have to be obtained by recursive numerical methods, and the estimates discussed above are natural starting values. The least squares estimate of σ² is

    σ̂²_LS = S(φ̂_LS, θ̂_LS) / (n − p − q),

where (φ̂_LS, θ̂_LS) is the estimate obtained by minimizing S(φ, θ).

Let us assume, or at least act as if, the process is Gaussian. Then, for any fixed values of φ, θ, and σ², the innovations X_1 − X̂_1, ..., X_n − X̂_n are independent and normally distributed with zero means and variances v_0 = σ²r_0, ..., v_{n−1} = σ²r_{n−1}.
The likelihood function is then

    L(φ, θ, σ²) = Π_{j=1}^{n} f_{X_j − X̂_j}(X_j − X̂_j)
                = (2πσ²)^{−n/2} (r_0 ··· r_{n−1})^{−1/2} exp( −S(φ, θ)/(2σ²) ).

Proceeding in the usual way we get

    ln L(φ, θ, σ²) = −(1/2) ln( (2πσ²)^n r_0 ··· r_{n−1} ) − S(φ, θ)/(2σ²).

Obviously r_0, ..., r_{n−1} depend on φ and θ, but they do not depend on σ². Maximizing over σ² gives σ̂² = S(φ, θ)/n, so maximizing ln L(φ, θ, σ²) is the same as minimizing

    ℓ(φ, θ) = ln( n^{−1} S(φ, θ) ) + n^{−1} Σ_{j=1}^{n} ln r_{j−1},

which has to be done numerically.

In the causal and invertible case r_n → 1, and therefore n^{−1} Σ_{j=1}^{n} ln r_{j−1} is asymptotically negligible compared with ln S(φ, θ). Thus both methods, least squares and maximum likelihood, give asymptotically the same result in that case.
11.6 Order selection

Assume now that we want to fit an ARMA(p, q) process to real data, i.e. we want to estimate p, q, (φ, θ), and σ². We restrict ourselves to maximum likelihood estimation. Then we maximize L(φ, θ, σ²), or equivalently minimize −2 ln L(φ, θ, σ²), where L is regarded as a function also of p and q. Most probably we will then get very high values of p and q. Such a model will probably fit the given data very well, but it is more or less useless as a mathematical model, since it will probably neither lead to reasonable predictors nor describe a different data set well. It is therefore natural to introduce a "penalty factor" to discourage the fitting of models with too many parameters. Instead of pure maximum likelihood estimation we may apply the AICC criterion:

Choose p, q, and (φ_p, θ_q) to minimize

    AICC = −2 ln L(φ_p, θ_q, S(φ_p, θ_q)/n) + 2(p + q + 1)n / (n − p − q − 2).

(The letters AIC stand for "Akaike's Information Criterion" and the last C for "bias-Corrected".) The AICC criterion has certain nice properties, but also drawbacks. In general one may say that order selection is genuinely difficult.
12 Multivariate time series

Let

    X_t = (X_{t1}, ..., X_{tm})′, t ∈ Z,

where each component is a time series. In that case we talk about multivariate time series.
The second-order properties of {X_t} are specified by the mean vector

    μ_t = EX_t = (EX_{t1}, ..., EX_{tm})′ = (μ_{t1}, ..., μ_{tm})′, t ∈ Z,

and the covariance matrices

    Γ(t + h, t) = E[(X_{t+h} − μ_{t+h})(X_t − μ_t)′] = ( γ_{ij}(t + h, t) )_{i,j=1,...,m},

where γ_{ij}(t + h, t) = Cov(X_{t+h,i}, X_{t,j}).

Definition 12.1 The m-variate time series {X_t, t ∈ Z} is said to be (weakly) stationary if
(i) μ_t = μ for all t ∈ Z,
(ii) Γ(r, s) = Γ(r + t, s + t) for all r, s, t ∈ Z.

Item (ii) implies that Γ(r, s) is a function of r − s, and it is convenient to define Γ(h) = Γ(h, 0).

Definition 12.2 (Multivariate white noise) An m-variate process {Z_t, t ∈ Z} is said to be white noise with mean μ and covariance matrix Σ, written {Z_t} ∈ WN(μ, Σ), if EZ_t = μ and

    Γ(h) = Σ if h = 0,
           0 if h ≠ 0.
Definition 12.3 (The ARMA(p, q) process) The process {X_t, t ∈ Z} is said to be an ARMA(p, q) process if it is stationary and if

    X_t − Φ_1 X_{t−1} − ... − Φ_p X_{t−p} = Z_t + Θ_1 Z_{t−1} + ... + Θ_q Z_{t−q},   (12)

where {Z_t} ∈ WN(0, Σ). We say that {X_t} is an ARMA(p, q) process with mean μ if {X_t − μ} is an ARMA(p, q) process.

Equations (12) can be written as

    Φ(B)X_t = Θ(B)Z_t, t ∈ Z,

where

    Φ(z) = I − Φ_1 z − ... − Φ_p z^p and Θ(z) = I + Θ_1 z + ... + Θ_q z^q

are matrix-valued polynomials. Causality and invertibility are characterized in terms of the generating polynomials:

Causality: X_t is causal if det Φ(z) ≠ 0 for all |z| ≤ 1;
Invertibility: X_t is invertible if det Θ(z) ≠ 0 for all |z| ≤ 1.

Assume that

    Σ_{h=−∞}^{∞} |γ_{ij}(h)| < ∞, i, j = 1, ..., m.   (13)
Definition 12.4 (The cross spectrum) Let {X_t, t ∈ Z} be an m-variate stationary time series whose ACVF satisfies (13). The function

    f_{jk}(λ) = (1/2π) Σ_{h=−∞}^{∞} e^{−ihλ} γ_{jk}(h), −π ≤ λ ≤ π, j ≠ k,

is called the cross spectrum or cross spectral density of {X_{tj}} and {X_{tk}}. The matrix

    f(λ) = ( f_{jk}(λ) )_{j,k=1,...,m}

is called the spectrum or spectral density matrix of {X_t}. The spectral density matrix f(λ) is non-negative definite for all λ ∈ [−π, π].
13 Kalman filtering

Here {Z_t} ∈ WN(0, {Σ_t}) means that EZ_t = 0 and that the Z_t's are uncorrelated with covariance matrices E[Z_t Z_t′] = Σ_t. Notice that this definition is an extension of Definition 12.2 in order to allow for non-stationarity.

A state-space model is defined by the state equation

    X_{t+1} = F_t X_t + V_t, t = 1, 2, ...,   (14)

where {X_t} is a v-variate process describing the state of some system, {V_t} ∈ WN(0, {Q_t}), and {F_t} is a sequence of v × v matrices, and the observation equation

    Y_t = G_t X_t + W_t, t = 1, 2, ...,   (15)

where {Y_t} is a w-variate process describing the observed state of some system, {W_t} ∈ WN(0, {R_t}), and {G_t} is a sequence of w × v matrices. Further, {W_t} and {V_t} are uncorrelated. To complete the specification it is assumed that the initial state X_1 is uncorrelated with {W_t} and {V_t}.

Definition 13.1 (State-space representation) A time series {Y_t} has a state-space representation if there exists a state-space model for {Y_t} as specified by equations (14) and (15).

Put P_t(X) = P(X | Y_0, ..., Y_t), i.e. the vector of best linear predictors of X_1, ..., X_v in terms of all components of Y_0, ..., Y_t. Linear estimation of X_t in terms of

    Y_0, ..., Y_{t−1} defines the prediction problem;
    Y_0, ..., Y_t defines the filtering problem;
    Y_0, ..., Y_n, n > t, defines the smoothing problem.

Theorem 13.1 (Kalman prediction) The predictors X̂_t = P_{t−1}(X_t) and the error covariance matrices Ω_t = E[(X_t − X̂_t)(X_t − X̂_t)′] are uniquely determined by the initial conditions

    X̂_1 = P(X_1 | Y_0), Ω_1 = E[(X_1 − X̂_1)(X_1 − X̂_1)′]

and the recursions, for t = 1, 2, ...,

    X̂_{t+1} = F_t X̂_t + Θ_t Δ_t^{−1} (Y_t − G_t X̂_t),   (16)
    Ω_{t+1} = F_t Ω_t F_t′ + Q_t − Θ_t Δ_t^{−1} Θ_t′,   (17)

where

    Δ_t = G_t Ω_t G_t′ + R_t, Θ_t = F_t Ω_t G_t′.

The matrix Θ_t Δ_t^{−1} is called the Kalman gain.
Theorem 13.2 (Kalman filtering) The filtered estimates X_{t|t} = P_t(X_t) and the error covariance matrices Ω_{t|t} = E[(X_t − X_{t|t})(X_t − X_{t|t})′] are determined by the relations

    X_{t|t} = P_{t−1}(X_t) + Ω_t G_t′ Δ_t^{−1} (Y_t − G_t X̂_t)

and

    Ω_{t|t} = Ω_t − Ω_t G_t′ Δ_t^{−1} G_t Ω_t.

Theorem 13.3 (Kalman fixed-point smoothing) The smoothed estimates X_{t|n} = P_n(X_t) and the error covariance matrices Ω_{t|n} = E[(X_t − X_{t|n})(X_t − X_{t|n})′] are determined for fixed t by the recursions, which can be solved successively for n = t, t + 1, ...:

    P_n(X_t) = P_{n−1}(X_t) + Ω_{t,n} G_n′ Δ_n^{−1} (Y_n − G_n X̂_n),
    Ω_{t,n+1} = Ω_{t,n} [F_n − Θ_n Δ_n^{−1} G_n]′,
    Ω_{t|n} = Ω_{t|n−1} − Ω_{t,n} G_n′ Δ_n^{−1} G_n Ω_{t,n}′,

with initial conditions P_{t−1}(X_t) = X̂_t and Ω_{t,t} = Ω_{t|t−1} = Ω_t found from Kalman prediction.
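The prediction recursions (16)-(17) are short to implement. A minimal sketch for a time-invariant model (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def kalman_predict(y, F, G, Q, R, x1, Omega1):
    """One-step Kalman prediction (Theorem 13.1) for a time-invariant
    state-space model X_{t+1} = F X_t + V_t, Y_t = G X_t + W_t.
    Returns the predictors X_hat_t and error covariances Omega_t."""
    F, G, Q, R = (np.atleast_2d(np.asarray(a, float)) for a in (F, G, Q, R))
    x = np.atleast_1d(np.asarray(x1, float))
    Omega = np.atleast_2d(np.asarray(Omega1, float))
    xs, Os = [x], [Omega]
    for yt in y:
        Delta = G @ Omega @ G.T + R    # innovation covariance Delta_t
        Theta = F @ Omega @ G.T        # Kalman gain is Theta @ inv(Delta)
        innov = np.atleast_1d(yt) - G @ x
        x = F @ x + Theta @ np.linalg.solve(Delta, innov)
        Omega = F @ Omega @ F.T + Q - Theta @ np.linalg.solve(Delta, Theta.T)
        xs.append(x)
        Os.append(Omega)
    return xs, Os
```

For the scalar "local level" model F = G = Q = R = 1, the covariance recursion reduces to Ω → (2Ω + 1)/(Ω + 1), whose fixed point is the positive root of Ω² = Ω + 1, i.e. Ω = (1 + √5)/2; the recursion converges there regardless of the observations.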