
Box-Jenkins Analysis of ARMA(p,q) Models

Eric Zivot

April 7, 2011
Box-Jenkins Modeling Strategy for Fitting ARMA(p, q) Models

1. Transform the data, if necessary, so that the assumption of covariance stationarity is a reasonable one

2. Make an initial guess for the values of p and q

3. Estimate the parameters of the proposed ARMA(p, q) model

4. Perform diagnostic analysis to confirm that the proposed model adequately describes the data (e.g. examine residuals from the fitted model)
Identification of Stationary ARMA(p,q) Processes

Intuition: The mean, variance, and autocorrelations define the properties of an ARMA(p,q) model. A natural way to identify an ARMA model is to match the pattern of the observed (sample) autocorrelations with the patterns of the theoretical autocorrelations of a particular ARMA(p, q) model.

Sample autocovariances/autocorrelations
$$\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}(y_t - \bar{y})(y_{t-j} - \bar{y}), \qquad \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$$
$$\hat{\rho}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0}$$
Sample autocorrelation function (SACF)/correlogram: plot $\hat{\rho}_j$ vs. $j$
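For concreteness, here is a minimal NumPy sketch of these formulas; the function name `sample_acf` and the simulated series are illustrative and not part of the original notes.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocovariances gamma_hat_j and autocorrelations rho_hat_j."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()                                   # y_t - ybar
    # gamma_hat_j = (1/T) * sum_{t=j+1}^{T} (y_t - ybar)(y_{t-j} - ybar)
    gamma = np.array([np.sum(d[j:] * d[:T - j]) / T for j in range(max_lag + 1)])
    rho = gamma / gamma[0]                             # rho_hat_j = gamma_hat_j / gamma_hat_0
    return gamma, rho

# correlogram values for a simulated series
y = np.random.default_rng(0).standard_normal(200)
gamma_hat, rho_hat = sample_acf(y, max_lag=20)
print(np.round(rho_hat[:6], 3))
```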
Result: If $Y_t \sim WN(0, \sigma^2)$ then $\rho_j = 0$ for all $j$ and
$$\sqrt{T}\,\hat{\rho}_j \xrightarrow{d} N(0, 1)$$
so that
$$\mathrm{avar}(\hat{\rho}_j) = \frac{1}{T}$$
Therefore, a simple t-statistic for $H_0: \rho_j = 0$ is $\sqrt{T}\,\hat{\rho}_j$, and we reject $H_0: \rho_j = 0$ if
$$|\hat{\rho}_j| > 1.96/\sqrt{T}$$
Remark: $\hat{\rho}_1, \ldots, \hat{\rho}_k$ are asymptotically independent:
$$\sqrt{T}\,\hat{\boldsymbol{\rho}}_k \xrightarrow{d} N(0, I_k)$$
where $\hat{\boldsymbol{\rho}}_k = (\hat{\rho}_1, \ldots, \hat{\rho}_k)'$.
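A small self-contained check of this rejection rule on simulated white noise (NumPy assumed; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(500)          # white noise, so H0: rho_j = 0 should rarely be rejected
T = len(y)
d = y - y.mean()

k = 10
gamma0 = np.sum(d * d) / T
rho = np.array([np.sum(d[j:] * d[:T - j]) / T for j in range(1, k + 1)]) / gamma0

band = 1.96 / np.sqrt(T)              # two-sided 5% band under the white-noise null
print("rho_hat:", np.round(rho, 3))
print("reject H0 at 5%:", np.abs(rho) > band)
```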
Box-Pierce and Box-Ljung Q-statistics

The joint statistical significance of $\hat{\rho}_1, \ldots, \hat{\rho}_k$ may be tested using the Box-Pierce portmanteau statistic
$$Q(k) = T\sum_{j=1}^{k}\hat{\rho}_j^2$$
Assuming $Y_t \sim WN(0, \sigma^2)$, it is straightforward to show that
$$Q(k) \xrightarrow{d} \chi^2(k)$$
Reject $H_0: \rho_1 = \cdots = \rho_k = 0$ if
$$Q(k) > \chi^2_{0.95}(k)$$

Remark: Box and Ljung show that a simple degrees-of-freedom adjustment improves the finite sample performance:
$$Q(k) = T(T+2)\sum_{j=1}^{k}\frac{\hat{\rho}_j^2}{T-j} \xrightarrow{d} \chi^2(k)$$
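Both statistics can be coded directly from these formulas; a minimal sketch assuming NumPy and SciPy (statsmodels' `acorr_ljungbox` offers a ready-made alternative):

```python
import numpy as np
from scipy.stats import chi2

def q_statistics(y, k):
    """Box-Pierce Q(k) and degrees-of-freedom adjusted (Box-Ljung) Q(k)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    gamma0 = np.sum(d * d) / T
    rho = np.array([np.sum(d[j:] * d[:T - j]) / T for j in range(1, k + 1)]) / gamma0
    q_bp = T * np.sum(rho**2)                                        # Box-Pierce
    q_lb = T * (T + 2) * np.sum(rho**2 / (T - np.arange(1, k + 1)))  # Box-Ljung
    return q_bp, q_lb

y = np.random.default_rng(2).standard_normal(300)
k = 12
q_bp, q_lb = q_statistics(y, k)
crit = chi2.ppf(0.95, df=k)            # 95% chi-square critical value with k df
print(q_bp, q_lb, crit, q_lb > crit)
```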
Partial Autocorrelation Function (PACF)

The $k$th order partial autocorrelation of $X_t = Y_t - \mu$ is the partial regression coefficient (in the population) $\phi_{kk}$ in the $k$th order autoregression
$$X_t = \phi_{1k}X_{t-1} + \phi_{2k}X_{t-2} + \cdots + \phi_{kk}X_{t-k} + \mathrm{error}_t$$

PACF: plot $\phi_{kk}$ vs. $k$

Sample PACF (SPACF): plot $\hat{\phi}_{kk}$ vs. $k$, where $\hat{\phi}_{kk}$ is estimated from an AR(k) model for $Y_t$.
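A sketch of the SPACF computed exactly as described: for each $k$, fit an AR(k) to the demeaned data by least squares and keep the last coefficient $\hat{\phi}_{kk}$. The helper name `sample_pacf` is ours.

```python
import numpy as np

def sample_pacf(y, max_lag):
    """phi_hat_kk from least-squares AR(k) fits, k = 1, ..., max_lag."""
    x = np.asarray(y, dtype=float)
    x = x - x.mean()
    T = len(x)
    pacf = []
    for k in range(1, max_lag + 1):
        # regress x_t on x_{t-1}, ..., x_{t-k}
        Y = x[k:]
        X = np.column_stack([x[k - j:T - j] for j in range(1, k + 1)])
        coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
        pacf.append(coefs[-1])          # phi_hat_kk is the coefficient on x_{t-k}
    return np.array(pacf)

y = np.random.default_rng(3).standard_normal(400)
print(np.round(sample_pacf(y, max_lag=10), 3))
```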
Example: AR(2)
$$Y_t - \mu = \phi_1(Y_{t-1} - \mu) + \phi_2(Y_{t-2} - \mu) + \varepsilon_t$$
$$\phi_{11} \neq 0, \quad \phi_{22} = \phi_2, \quad \phi_{kk} = 0 \text{ for } k > 2$$
Example: MA(1)
$$Y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad |\theta| < 1 \text{ (invertible)}$$
Since $|\theta| < 1$, $Y_t$ has an AR($\infty$) representation
$$(Y_t - \mu) = \phi_1(Y_{t-1} - \mu) + \phi_2(Y_{t-2} - \mu) + \cdots$$
$$\phi_j = -(-\theta)^j$$
Therefore,
$$\phi_{kk} \neq 0 \text{ for all } k, \quad \text{and} \quad \phi_{kk} \to 0 \text{ as } k \to \infty$$
Results:

1. AR(p) processes: $\phi_{kk} \neq 0$ for $k \leq p$, $\phi_{kk} = 0$ for $k > p$

2. MA(q) processes: $\phi_{kk} \neq 0$ for all $k$, and $\phi_{kk} \to 0$ as $k \to \infty$


Inspection of the SACF and SPACF to identify ARMA models is somewhat of
an art rather than a science. A more rigorous procedure to identify an ARMA
model is to use formal model selection criteria. The two most widely used
criteria are the Akaike information criterion (AIC) and the Bayesian (Schwarz)
criterion (BIC or SIC)
$$AIC(p, q) = \ln(\hat{\sigma}^2) + \frac{2(p+q)}{T}$$
$$BIC(p, q) = \ln(\hat{\sigma}^2) + \frac{\ln(T)(p+q)}{T}$$
$$\hat{\sigma}^2 = \text{estimate of } \sigma^2 \text{ from the fitted ARMA}(p, q)$$

Intuition: Think of adjusted $R^2$ in regression: $\ln(\hat{\sigma}^2)$ measures overall fit, while $\frac{2(p+q)}{T}$ and $\frac{\ln(T)(p+q)}{T}$ are penalty terms for large models.

Note: BIC penalizes larger models more than AIC.
How to use AIC, BIC to identify an ARMA(p,q) model

1. Set upper bounds $P$ and $Q$ for the AR and MA orders, respectively

2. Fit all possible ARMA(p, q) models for $p \leq P$ and $q \leq Q$ using a common sample size $T$

3. The best models satisfy (see the grid-search sketch after the result below)
$$\min_{p \leq P,\; q \leq Q} AIC(p, q), \qquad \min_{p \leq P,\; q \leq Q} BIC(p, q)$$
Result: If the true values of $p$ and $q$ satisfy $p \leq P$ and $q \leq Q$, then

1. BIC picks the true model with probability 1 as $T \to \infty$

2. AIC picks larger values of $p$ and $q$ with positive probability as $T \to \infty$
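A grid-search sketch along these lines, assuming statsmodels is available. Note that statsmodels reports AIC and BIC in the $-2\ln L$ plus penalty form rather than the $\ln(\hat{\sigma}^2)$ form given above; the two versions typically rank models the same way. The simulated AR(1) series and the bounds $P = Q = 3$ are purely illustrative.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# simulate an AR(1) with phi = 0.6 for illustration
rng = np.random.default_rng(4)
T = 300
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()

P, Q = 3, 3                              # upper bounds for the AR and MA orders
results = []
for p in range(P + 1):
    for q in range(Q + 1):
        try:
            fit = ARIMA(y, order=(p, 0, q)).fit()
            results.append((p, q, fit.aic, fit.bic))
        except Exception:
            continue                     # skip (p, q) pairs that fail to estimate

best_aic = min(results, key=lambda r: r[2])
best_bic = min(results, key=lambda r: r[3])
print("AIC choice (p, q):", best_aic[:2])
print("BIC choice (p, q):", best_bic[:2])
```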


Maximum Likelihood Estimation of ARMA models

For iid data with marginal pdf $f(y_t; \theta)$, the joint pdf for a sample $\mathbf{y} = (y_1, \ldots, y_T)$ is
$$f(\mathbf{y}; \theta) = f(y_1, \ldots, y_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta)$$
The likelihood function is this joint density treated as a function of the parameters $\theta$ given the data $\mathbf{y}$:
$$L(\theta|\mathbf{y}) = L(\theta|y_1, \ldots, y_T) = \prod_{t=1}^{T} f(y_t; \theta)$$
The log-likelihood is
$$\ln L(\theta|\mathbf{y}) = \sum_{t=1}^{T} \ln f(y_t; \theta)$$
Problem: For a sample from a covariance stationary time series $\{y_t\}$, the construction of the log-likelihood given above doesn't work because the random variables in the sample $(y_1, \ldots, y_T)$ are not iid.

One Solution: Conditional factorization of log-likelihood

Intuition: Consider the joint density of two adjacent observations, $f(y_2, y_1; \theta)$. The joint density can always be factored as the product of the conditional density of $y_2$ given $y_1$ and the marginal density of $y_1$:
$$f(y_2, y_1; \theta) = f(y_2|y_1; \theta)f(y_1; \theta)$$

For three observations, the factorization becomes
$$f(y_3, y_2, y_1; \theta) = f(y_3|y_2, y_1; \theta)f(y_2|y_1; \theta)f(y_1; \theta)$$

In general, the conditional-marginal factorization has the form
$$f(y_T, \ldots, y_1; \theta) = \left(\prod_{t=p+1}^{T} f(y_t|I_{t-1}; \theta)\right) f(y_p, \ldots, y_1; \theta)$$
$$I_t = \{y_t, \ldots, y_1\} = \text{information available at time } t$$
$$y_p, \ldots, y_1 = \text{initial values}$$
The exact log-likelihood function may then be expressed as
$$\ln L(\theta|\mathbf{y}) = \sum_{t=p+1}^{T} \ln f(y_t|I_{t-1}; \theta) + \ln f(y_p, \ldots, y_1; \theta)$$
The conditional log-likelihood is
$$\ln L(\theta|\mathbf{y}) = \sum_{t=p+1}^{T} \ln f(y_t|I_{t-1}; \theta)$$
Two types of maximum likelihood estimates (mles) may be computed.

The first type is based on maximizing the conditional log-likelihood function. These estimates are called conditional mles and are defined by
$$\hat{\theta}_{cmle} = \arg\max_{\theta}\sum_{t=p+1}^{T} \ln f(y_t|I_{t-1}; \theta)$$
The second type is based on maximizing the exact log-likelihood function. These estimates are called exact mles and are defined by
$$\hat{\theta}_{mle} = \arg\max_{\theta}\left\{\sum_{t=p+1}^{T} \ln f(y_t|I_{t-1}; \theta) + \ln f(y_p, \ldots, y_1; \theta)\right\}$$
Result: For stationary models, $\hat{\theta}_{cmle}$ and $\hat{\theta}_{mle}$ are consistent and have the same limiting normal distribution:
$$\hat{\theta}_{cmle},\; \hat{\theta}_{mle} \sim N(\theta, T^{-1}\hat{V})$$
In finite samples, however, $\hat{\theta}_{cmle}$ and $\hat{\theta}_{mle}$ are generally not equal and may differ by a substantial amount if the data are close to being non-stationary or non-invertible.
Example: MLE for stationary AR(1)

$$y_t = c + \phi y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \text{iid } N(0, \sigma^2), \quad t = 1, \ldots, T$$
$$\theta = (c, \phi, \sigma^2)', \quad |\phi| < 1$$
Conditional on $I_{t-1}$,
$$y_t|I_{t-1} \sim N(c + \phi y_{t-1}, \sigma^2), \quad t = 2, \ldots, T$$
which only depends on $y_{t-1}$. The conditional density $f(y_t|I_{t-1}; \theta)$ is then
$$f(y_t|y_{t-1}; \theta) = (2\pi\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(y_t - c - \phi y_{t-1})^2\right), \quad t = 2, \ldots, T$$
To determine the marginal density for the initial value $y_1$, recall that for a stationary AR(1) process
$$E[y_1] = \mu = \frac{c}{1 - \phi}, \qquad \mathrm{var}(y_1) = \frac{\sigma^2}{1 - \phi^2}$$
It follows that
$$y_1 \sim N\left(\frac{c}{1 - \phi}, \frac{\sigma^2}{1 - \phi^2}\right)$$
$$f(y_1; \theta) = \left(\frac{2\pi\sigma^2}{1 - \phi^2}\right)^{-1/2}\exp\left(-\frac{1 - \phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1 - \phi}\right)^2\right)$$
The conditional log-likelihood function is
$$\sum_{t=2}^{T} \ln f(y_t|y_{t-1}; \theta) = -\frac{(T-1)}{2}\ln(2\pi) - \frac{(T-1)}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2$$
Notice that the conditional log-likelihood function has the form of the log-likelihood function for a linear regression model with normal errors
$$y_t = c + \phi y_{t-1} + \varepsilon_t, \quad t = 2, \ldots, T, \quad \varepsilon_t \sim \text{iid } N(0, \sigma^2)$$
It follows that
$$\hat{c}_{cmle} = \hat{c}_{ols}, \qquad \hat{\phi}_{cmle} = \hat{\phi}_{ols}$$
$$\hat{\sigma}^2_{cmle} = (T-1)^{-1}\sum_{t=2}^{T}(y_t - \hat{c}_{cmle} - \hat{\phi}_{cmle}\,y_{t-1})^2$$
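Because the conditional mles coincide with OLS, they can be computed with a single least-squares fit; a minimal NumPy sketch on simulated AR(1) data (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T, c_true, phi_true = 250, 0.5, 0.7
y = np.zeros(T)
y[0] = c_true / (1 - phi_true)
for t in range(1, T):
    y[t] = c_true + phi_true * y[t - 1] + rng.standard_normal()

# OLS of y_t on (1, y_{t-1}) for t = 2, ..., T  ==  conditional MLE
X = np.column_stack((np.ones(T - 1), y[:-1]))
b, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
c_cmle, phi_cmle = b
resid = y[1:] - X @ b
sigma2_cmle = np.sum(resid**2) / (T - 1)      # (T-1)^{-1} * sum of squared residuals
print(c_cmle, phi_cmle, sigma2_cmle)
```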
The marginal log-likelihood for the initial value $y_1$ is
$$\ln f(y_1; \theta) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\left(\frac{\sigma^2}{1 - \phi^2}\right) - \frac{1 - \phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1 - \phi}\right)^2$$
The exact log-likelihood function is then
$$\ln L(\theta|\mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\ln\left(\frac{\sigma^2}{1 - \phi^2}\right) - \frac{1 - \phi^2}{2\sigma^2}\left(y_1 - \frac{c}{1 - \phi}\right)^2 - \frac{(T-1)}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2$$
Remarks

1. The exact log-likelihood function is a non-linear function of the parameters $\theta$, and so there is no closed form solution for the exact mles.

2. The exact mles must be determined by numerically maximizing the exact log-likelihood function. Usually, a Newton-Raphson type algorithm is used for the maximization, which leads to the iterative scheme
$$\hat{\theta}_{mle,n} = \hat{\theta}_{mle,n-1} - \hat{H}(\hat{\theta}_{mle,n-1})^{-1}\hat{s}(\hat{\theta}_{mle,n-1})$$
where $\hat{H}(\cdot)$ is an estimate of the Hessian matrix (2nd derivative of the log-likelihood function), and $\hat{s}(\cdot)$ is an estimate of the score vector (1st derivative of the log-likelihood function).

3. The estimates of the Hessian and score may be computed numerically (using numerical derivative routines) or they may be computed analytically (if analytic derivatives are known).
Prediction Error Decomposition of Log-Likelihood

To illustrate this algorithm, consider the simple AR(1) model. Recall,
$$y_t|I_{t-1} \sim N(c + \phi y_{t-1}, \sigma^2), \quad t = 2, \ldots, T$$
from which it follows that
$$E[y_t|I_{t-1}] = c + \phi y_{t-1}, \qquad \mathrm{var}(y_t|I_{t-1}) = \sigma^2$$
The 1-step ahead prediction errors may then be defined as
$$v_t = y_t - E[y_t|I_{t-1}] = y_t - c - \phi y_{t-1}, \quad t = 2, \ldots, T$$
The variance of the prediction error at time $t$ is
$$f_t = \mathrm{var}(v_t) = \mathrm{var}(\varepsilon_t) = \sigma^2, \quad t = 2, \ldots, T$$
For the initial value, the first prediction error and its variance are
$$v_1 = y_1 - E[y_1] = y_1 - \frac{c}{1 - \phi}, \qquad f_1 = \mathrm{var}(v_1) = \frac{\sigma^2}{1 - \phi^2}$$
Using the prediction errors and the prediction error variances, the exact log-likelihood function may be re-expressed as
$$\ln L(\theta|\mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
which is the prediction error decomposition.
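A minimal NumPy sketch of this decomposition for the AR(1) model; the function name and the simulated series are ours, not part of the original notes.

```python
import numpy as np

def ar1_exact_loglik(theta, y):
    """Exact AR(1) log-likelihood via the prediction error decomposition."""
    c, phi, sigma2 = theta
    T = len(y)
    v = np.empty(T)
    f = np.empty(T)
    v[0] = y[0] - c / (1 - phi)              # v_1 = y_1 - E[y_1]
    f[0] = sigma2 / (1 - phi**2)             # f_1 = var(v_1)
    v[1:] = y[1:] - c - phi * y[:-1]         # v_t = y_t - c - phi*y_{t-1}
    f[1:] = sigma2                           # f_t = sigma2 for t > 1
    return -0.5 * T * np.log(2 * np.pi) - 0.5 * np.sum(np.log(f)) - 0.5 * np.sum(v**2 / f)

rng = np.random.default_rng(6)
T = 250
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 + 0.7 * y[t - 1] + rng.standard_normal()

print(ar1_exact_loglik((0.5, 0.7, 1.0), y))
```

Negating this function and handing it to a numerical optimizer yields the exact mles, in the spirit of the Newton-type scheme described in the remarks above.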
Remarks

1. A further simplification may be achieved by writing
$$\mathrm{var}(v_t) = \sigma^2 f_t = \begin{cases} \sigma^2 \cdot \dfrac{1}{1-\phi^2} & \text{for } t = 1 \\ \sigma^2 \cdot 1 & \text{for } t > 1 \end{cases}$$
That is, $f_t = 1/(1 - \phi^2)$ for $t = 1$ and $f_t = 1$ for $t > 1$. Then the log-likelihood becomes
$$\ln L(\theta|\mathbf{y}) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln(\sigma^2) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
2. With the above simplification, $\sigma^2$ may be concentrated out of the log-likelihood. That is,
$$\frac{\partial \ln L(\theta|\mathbf{y})}{\partial \sigma^2} = 0 \implies \hat{\sigma}^2_{mle}(c, \phi) = \frac{1}{T}\sum_{t=1}^{T}\frac{v_t^2}{f_t}$$
Substituting $\hat{\sigma}^2_{mle}(c, \phi)$ back into $\ln L(\theta|\mathbf{y})$ gives the concentrated log-likelihood
$$\ln L^c(c, \phi|\mathbf{y}) = -\frac{T}{2}\left(\ln(2\pi) + 1\right) - \frac{T}{2}\ln\hat{\sigma}^2_{mle}(c, \phi) - \frac{1}{2}\sum_{t=1}^{T}\ln f_t$$
Maximizing $\ln L^c(c, \phi|\mathbf{y})$ gives the mles for $c$ and $\phi$. Maximizing $\ln L^c(c, \phi|\mathbf{y})$ is faster than maximizing $\ln L(\theta|\mathbf{y})$ and is more numerically stable.
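A sketch of the concentrated approach for the AR(1) case, maximizing over $(c, \phi)$ only and recovering $\hat{\sigma}^2_{mle}$ afterwards. SciPy is assumed available; the derivative-free Nelder-Mead routine is used here for simplicity in place of the Newton-type scheme discussed earlier.

```python
import numpy as np
from scipy.optimize import minimize

def neg_concentrated_loglik(params, y):
    """Negative concentrated AR(1) log-likelihood ln L^c(c, phi | y)."""
    c, phi = params
    if abs(phi) >= 1:                          # enforce stationarity
        return np.inf
    T = len(y)
    v = np.empty(T)
    f = np.empty(T)
    v[0] = y[0] - c / (1 - phi)
    f[0] = 1.0 / (1 - phi**2)                  # f_1 = 1/(1-phi^2), f_t = 1 for t > 1
    v[1:] = y[1:] - c - phi * y[:-1]
    f[1:] = 1.0
    sigma2 = np.mean(v**2 / f)                 # sigma2 concentrated out
    llc = (-0.5 * T * (np.log(2 * np.pi) + 1)
           - 0.5 * T * np.log(sigma2)
           - 0.5 * np.sum(np.log(f)))
    return -llc

rng = np.random.default_rng(7)
T = 250
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 + 0.7 * y[t - 1] + rng.standard_normal()

res = minimize(neg_concentrated_loglik, x0=np.array([0.0, 0.1]), args=(y,), method="Nelder-Mead")
c_mle, phi_mle = res.x
v = np.concatenate(([y[0] - c_mle / (1 - phi_mle)], y[1:] - c_mle - phi_mle * y[:-1]))
f = np.concatenate(([1.0 / (1 - phi_mle**2)], np.ones(T - 1)))
sigma2_mle = np.mean(v**2 / f)                 # plug the maximizers back in
print(c_mle, phi_mle, sigma2_mle)
```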
3. For general time series models, the prediction error decomposition may be conveniently computed as a by-product of the Kalman filter algorithm if the time series model can be cast in state space form.

We will cover this algorithm later on in the course.


Diagnostics of Fitted ARMA Model

1. Compare theoretical ACF, PACF with SACF, SPACF

2. Examine autocorrelation properties of residuals:
- SACF, SPACF of residuals
- LM test for serial correlation in residuals

Remarks:

1. If the Box-Ljung Q-statistic is used on the residuals from a fitted ARMA(p, q) model, the degrees of freedom of the limiting chi-square distribution must be adjusted for the number of estimated parameters:
$$Q(k) \xrightarrow{d} \chi^2(k - (p+q))$$
Note that this test is only valid for $k > (p + q)$. In finite samples, it may perform badly if $k$ is close to $p + q$.
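A sketch of this residual diagnostic, assuming statsmodels is available for the ARMA fit: compute the Box-Ljung statistic on the fitted residuals and compare it with a $\chi^2(k - (p+q))$ critical value. The simulated series and the chosen orders are illustrative only.

```python
import numpy as np
from scipy.stats import chi2
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
T = 300
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()    # AR(1) data

p, q, k = 1, 0, 12
resid = ARIMA(y, order=(p, 0, q)).fit().resid        # residuals from the fitted ARMA(p, q)
n = len(resid)
d = resid - resid.mean()
gamma0 = np.sum(d * d) / n
rho = np.array([np.sum(d[j:] * d[:n - j]) / n for j in range(1, k + 1)]) / gamma0

q_lb = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, k + 1)))   # Box-Ljung Q(k)
crit = chi2.ppf(0.95, df=k - (p + q))                             # adjusted degrees of freedom
print(q_lb, crit, "reject H0" if q_lb > crit else "do not reject H0")
```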
