
WESS Time Series Lectures

Alexander Karalis Isaac

July, 2015

Alexander Karalis Isaac (Warwick) Time Series July, 2015 1 / 90


Econometrics so far
We have studied the regression equation

\[ y_i = \alpha + \beta x_i + \epsilon_i, \qquad i = 1, \dots, n \]

where i indicates an individual in a sample of n observations.

We have been interested in
The effect of x on y: $dy/dx = \beta$
The predicted value of y, given x: $E[y_i | x_i] = \alpha + \beta x_i$
The fit of the model, e.g. the $R^2$ statistic
Can we do the same with a pair of time series?

\[ y_t = \alpha + \beta x_t + \epsilon_t \]

NO!
(Or rather, only under special circumstances, and such a regression is only
ever part of the answer!)



Two problems in time series (1)

With data $y_t$, $x_t$ the critical assumption $E[\epsilon_t | x_t] = 0$ is difficult to
maintain. The y and x are often simultaneously determined in a system,
'the economy'.

\[ y_t = \alpha_y + \beta_y x_t + \epsilon_{yt} \]
\[ x_t = \alpha_x + \beta_x y_t + \epsilon_{xt} \]

Consider the regression

\[ y_t = a + b x_t + e_t \]
\[ = a + b(\alpha_x + \beta_x y_t + \epsilon_{xt}) + e_t \]

The regression error $e_t$ is an estimate of the $y_t$ error $\epsilon_{yt}$. It will be
correlated with the regressor $x_t$, as this regressor actually contains $y_t$, and
so contains $\epsilon_{yt}$

Two problems in time series (2)

Look at the estimates from a regression $y_t = \alpha + \beta x_t + \epsilon_t$

param   estimate   t-value
a       7.9        (139)
b       2.9        (15.6)
R^2     0.75       -

Does this look like a good regression?

$y_t$ is U.S. output. $x_t$ is mean land-sea temperature


Now what do you think about the regression?

This is not a regression about climate change!


It is called the spurious regression problem.
It is easy to do crap regressions with time series data!




Overview

In Part I our models will look like

\[ y_t = \alpha + \beta y_{t-1} + \epsilon_t \quad \text{or} \quad y_t = \alpha + \beta \epsilon_{t-1} + \epsilon_t \]

so the explanatory variable is replaced by the previous value of the
dependent variable, or by previous errors.
Our primary interest will be prediction, $E[y_t | y_{t-1}] = \alpha + \beta y_{t-1}$.
In Part II we will learn how to estimate dynamic relationships

\[ y_t = \alpha_0 + \alpha_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \epsilon_t \]

in a way that eliminates the two common problems with time series
regressions.



Time series data

Our sample data $\{y_t\}$, $\{x_t\}$ refers to observations on the same unit in
sequential time periods, t = 1, . . . , T
Periods may be years, quarters, months, weeks or days, depending on
interest and data availability

years quarters months weeks days


finance finance finance
macro macro macro
growth growth



Time series data

There are three main types of time series data

Mean reverting ('stationary') series
Series with a trend ('trend stationary' series)
Series with permanent shocks ('integrated' series)

Part I of these notes deals with stationary series

Part II looks at testing for permanent shocks and dealing with integrated
series



Mean reverting series



Trend stationary series



Integrated series



Know your data

The data file ‘macro vars.xls’ contains lots of U.S. data series
GDP is a good example

Plot U.S. GDP

What is the first thing to do to this data?



Take logs

The first thing we don't like is the exponential shape.

Our regressions are linear regressions, so let's transform the data to make it
more linear

Generate ln(GDP) and plot this



Logs

Look at your data. For data in levels, taking logs is often the first
step in applied work
Not if the data is already in % changes, or an interest rate!
Logarithms make percentage changes comparable by eye, which is
often more relevant.
Recall $g = (x_t - x_{t-1})/x_{t-1} \Rightarrow 1 + g = x_t/x_{t-1}$, so then
$\ln(x_t) - \ln(x_{t-1}) = \ln(1 + g) \approx g$ for small g.
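
A quick numerical check of the log-difference approximation may help here. This is a minimal sketch: the function name `log_growth_gap` and the growth rates tried are illustrative assumptions, not part of the course material. It shows that ln(1 + g) is close to g when g is small and drifts away as g grows.

```python
import math

def log_growth_gap(g):
    """Return ln(1 + g) - g, the error of approximating ln(1 + g) by g."""
    return math.log(1.0 + g) - g

# Illustrative growth rates: 1%, 5% and 20%.
for g in (0.01, 0.05, 0.20):
    print(f"g = {g:.2f}: ln(1+g) = {math.log(1 + g):.4f}, gap = {log_growth_gap(g):.4f}")
```

So for quarterly macro growth rates the log difference and the growth rate are practically interchangeable, while for large changes they are not.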



Difference

If the variable appears to be trending, as with $\ln GDP_t$, a safe thing to
do is take differences.

Generate the difference of ln(GDP) and plot this

What is the main difference compared to previous plots?

This now looks mean reverting. Later we will test formally whether a
series is mean reverting or integrated, but don't forget it's always
sensible to start by looking at your data.
$\Delta y_t = y_t - y_{t-1}$ defines the difference operator $\Delta$.



Know your data: prices

Consider the price data. Look at


levels
log levels
difference of logs - inflation
the difference of inflation
Which would you be happy to consider mean reverting?



Part I: Modelling stationary time series

So far I have talked about mean reverting series, but we can be more
precise
We will model variables which are covariance stationary (stationary
for short)
The mean exists and does not depend on time, $E[y_t] = \mu$ for all t
(quick notation $\forall t$).
The variance exists and is independent of time, $\mathrm{var}(y_t) = \sigma_y^2$.
The autocovariance $\mathrm{cov}(y_t, y_{t-k}) = \sigma_k^2$ is independent of time: it
depends only on k and not on t



Modelling stationary time series: assumptions on errors

Any model of a stationary series imposes two key assumptions on the
errors
A1: $E[\epsilon_t] = E[E[\epsilon_t | y_{t-1}]] = 0$
A2: $E[\epsilon_t \epsilon_{t-s}] = 0 \ \forall s > 0$
There are also some technical assumptions
A3: $E[\epsilon_t^2] = \mathrm{var}(\epsilon_t) = \sigma_\epsilon^2$
A4: $y_t$ and $y_{t-j}$ become independent as j gets large
A5: Very large outliers are unlikely
These assumptions apply to the true model, and we have to replicate them
in our statistical model.



Discussion of assumptions

A1 This tells us that $\epsilon_t$ is unpredictable given information about $y_{t-1}$,
available before period t begins. In the regression context, we require
that $\epsilon_t$ is unpredictable given all our r.h.s. variables. We use
predetermined data $y_{t-1}, y_{t-2}, \epsilon_{t-1}, \epsilon_{t-2}, \dots$
A2 In cross-sections, this is a second order assumption determining the
standard error of b.
In time series it is a first order assumption, determining the
consistency of b. See exercise.
A3 Allows us to make calculations about variances, including confidence
intervals around parameter estimates and forecasts. It is implied by
stationarity



Discussion of assumptions

A4 This is a technical requirement to derive the limiting behaviour of
estimators. It replaces the i.i.d. assumption used with cross-sectional data
A5 This says that our models are not suitable for certain types of very
wild randomness. You should worry about this if you do
high-frequency finance, but it's generally not a problem with
macroeconomic data.
In applied work, spurious regressions and models with wrong/insufficient
dynamics tend to violate A2, so checking it is key. Also, check A2 if you
are evaluating someone else's work!



Stationary time series: AR(1) model

Our first time series model for stationary data

\[ y_t = \alpha + \beta y_{t-1} + \epsilon_t \qquad (1) \]

This replaces independent explanatory data with the past value of the
dependent variable.
Models the correlation between $y_t$ and its own past
Stationarity requires $|\beta| < 1$
Then the influence of past shocks dies away smoothly
Estimate the model by OLS
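
To see the AR(1) and its OLS estimate in action, here is a minimal sketch in Python with NumPy. The course itself uses its own software and the data file, so everything here (the function names `simulate_ar1` and `fit_ar1_ols`, the parameter values, the seed) is an assumption made for illustration on simulated data.

```python
import numpy as np

def simulate_ar1(alpha, beta, T, rng, sigma=1.0):
    """Simulate y_t = alpha + beta*y_{t-1} + eps_t, starting at the unconditional mean."""
    y = np.empty(T)
    y[0] = alpha / (1.0 - beta)
    eps = rng.normal(0.0, sigma, size=T)
    for t in range(1, T):
        y[t] = alpha + beta * y[t - 1] + eps[t]
    return y

def fit_ar1_ols(y):
    """OLS of y_t on a constant and y_{t-1}; returns (a, b)."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(0)
y = simulate_ar1(alpha=1.0, beta=0.5, T=5000, rng=rng)
a, b = fit_ar1_ols(y)
print(f"a = {a:.3f}, b = {b:.3f}, implied mean a/(1-b) = {a / (1 - b):.3f}")
```

With a long simulated sample the estimate b should sit close to the true value 0.5, and a/(1 - b) close to the true mean of 2.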



The AR(1) estimator

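The AR(1) regression is like a standard OLS regression

\[ b = \frac{\sum_{t=2}^{T} (y_t - \bar{y})(y_{t-1} - \bar{y})}{\sum_{t=2}^{T} (y_{t-1} - \bar{y})^2} = \frac{\mathrm{cov}(y_t, y_{t-1})}{\mathrm{var}(y_t)} \]

\[ a = \bar{y} - b\bar{y} \quad \Rightarrow \quad \bar{y} = \frac{a}{1 - b} \]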



The AR(1) estimator

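Variance of b follows standard OLS theory

\[ \mathrm{var}(b) = \hat{\sigma}^2 \Big( \sum_{t=2}^{T} (y_{t-1} - \bar{y})^2 \Big)^{-1} = \frac{\hat{\sigma}^2}{\widehat{\mathrm{var}}(y_t)} \]

\[ \text{where} \quad \hat{\sigma}^2 = \frac{1}{T - 1 - k} \sum_{t=2}^{T} \hat{\epsilon}_t^2 \]

Note we lose an extra DoF for every lag we include in the autoregression
Confidence testing as usual, given $|b| < 1$:
$\tau = (b - b_{H_0})/SE(b) \sim t_{\alpha/2, DoF}$
Expect low $R^2$ compared to cross sectional data.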



Example

Look at the series for ∆ln(GDP) and do an AR(1) estimation

param estimate tvalue


a
b
R2 -

Plot the residuals of the regression. Do you think they meet A1 - A3?
We will look at formal tests for these assumptions below.



General AR(p) model
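One lag of $y_t$ may not be enough: an omitted variable bias
This shows up as $E[\epsilon_t \epsilon_{t-s}] \neq 0$
We find a model with enough lags to ensure $E[\epsilon_t \epsilon_{t-s}] = 0 \ \forall s$

\[ y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \dots + \beta_p y_{t-p} + \epsilon_t \]

Post-estimation approximate F-test, Breusch-Godfrey test

\[ \hat{\epsilon}_t = b_1 \hat{\epsilon}_{t-1} + \dots + b_q \hat{\epsilon}_{t-q} + \nu_t \]
\[ H_0: b_1 = b_2 = \dots = b_q = 0 \]
\[ H_A: b_i \neq 0 \ \text{for some } i \]
\[ \tau = \frac{(RSS_R - RSS_U)/q}{RSS_U/DoF} \sim \chi^2_q \]
\[ = nR^2 \sim \chi^2_q \]

Inference on $\hat{\beta}_i$ as in standard multivariate OLS models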
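
The nR^2 form of the residual test can be sketched as follows. This is a simplified illustration under stated assumptions: it regresses a residual series on a constant and its own q lags only, whereas the full Breusch-Godfrey auxiliary regression also includes the original regressors. The function name `lagged_residual_lm` and the simulated series are hypothetical.

```python
import numpy as np

def lagged_residual_lm(resid, q):
    """n * R^2 from regressing resid_t on a constant and q of its own lags."""
    e = np.asarray(resid, dtype=float)
    target = e[q:]
    # Column j holds resid_{t-j}, aligned with target resid_t.
    lags = [e[q - j:len(e) - j] for j in range(1, q + 1)]
    X = np.column_stack([np.ones(len(target))] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    ss_res = np.sum((target - X @ coef) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return len(target) * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(1)
shocks = rng.normal(size=500)            # serially uncorrelated series
correlated = np.empty(500)               # series with strong serial correlation
correlated[0] = shocks[0]
for t in range(1, 500):
    correlated[t] = 0.8 * correlated[t - 1] + shocks[t]

stat_white = lagged_residual_lm(shocks, q=4)
stat_correlated = lagged_residual_lm(correlated, q=4)
# Compare each statistic with the chi-square(4) 5% critical value, about 9.49.
print(round(stat_white, 2), round(stat_correlated, 2))
```

A statistic far above the critical value, as for the serially correlated series, says A2 fails and more dynamics are needed.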



Model selection strategy

Should we begin small and add lags until A2 holds?

NO! Don't base your model selection algorithm on starting from
models that don't make any statistical sense
Start big and eliminate insignificant regressors, to find the smallest
model for which A2 still holds
Often start with p = f + 1 where f = number of observations per year.
Quarterly example
Begin with p = 5
Re-estimate excluding the insignificant longer lags
Check $E[\epsilon_t \epsilon_{t-s}] = 0$, s = 1, ..., 4
Repeat until the model contains only significant terms
D. Hendry's 'PcGets' software automates this



Notes on examples
You do some examples: $\Delta GDP_t$, $\Delta Cons_t$, $\Delta Inv_t$, $\Delta Inf_t$



The MA(q) process

We noticed some series require very long AR models to capture all
the conditional correlation of the $y_t$ series.
This costs degrees of freedom, making estimates and forecasts less
accurate
Is there a smaller model which could capture the dependency that AR
models struggle with?
This is the moving average process



The MA(1) process

Equation:

\[ y_t = \alpha + \beta \epsilon_{t-1} + \epsilon_t \]

Simple to analyse
Stationary for any $\beta$ value
Harder to estimate - b determines $\{\hat{\epsilon}_t\}_{t=1}^{T}$, but $\{\hat{\epsilon}_t\}_{t=1}^{T}$ is the
regressor which determines b!
Solution: take an MLE approach (as in Probit)



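MLE in the MA(1)

\[ \epsilon_t | y_{t-1} \sim N(0, \sigma_\epsilon^2) \]
\[ f(y_t | y_{t-1}) = \frac{1}{\sqrt{2\pi\sigma_\epsilon^2}} \exp\big( -(y_t - \alpha - \beta \epsilon_{t-1})^2 / (2\sigma_\epsilon^2) \big) \]
\[ l(\alpha, \beta, \sigma_\epsilon^2) = \sum_t \ln f(y_t | y_{t-1}) \]
\[ \max \ l(\cdot) \ \text{w.r.t.} \ \alpha, \beta, \sigma_\epsilon^2 \]

Technically this is also conditional on $\epsilon_0$. A typical assumption is
$\epsilon_0 = E[\epsilon_t] = 0$, though there are other approaches.
Inference follows standard maximum likelihood procedure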
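
The conditional likelihood idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not the routine your software uses: the error is taken to be Gaussian with $\epsilon_0 = 0$, the intercept is fixed at the sample mean, the error variance is concentrated out, and the MA coefficient is found by a crude grid search. All function names and the simulated data are hypothetical.

```python
import numpy as np

def ma1_residuals(y, alpha, beta):
    """Recover eps_t = y_t - alpha - beta*eps_{t-1}, conditioning on eps_0 = 0."""
    eps = np.empty(len(y))
    prev = 0.0
    for t, value in enumerate(y):
        prev = value - alpha - beta * prev
        eps[t] = prev
    return eps

def ma1_loglik(y, alpha, beta):
    """Gaussian conditional log-likelihood with the error variance concentrated out."""
    eps = ma1_residuals(y, alpha, beta)
    sigma2 = np.mean(eps ** 2)
    return -0.5 * len(y) * (np.log(2.0 * np.pi * sigma2) + 1.0)

def fit_ma1_grid(y, grid=np.linspace(-0.95, 0.95, 191)):
    """Grid search over beta; alpha is fixed at the sample mean for simplicity."""
    alpha = float(np.mean(y))
    logliks = [ma1_loglik(y, alpha, b) for b in grid]
    best = int(np.argmax(logliks))
    return alpha, float(grid[best]), float(logliks[best])

rng = np.random.default_rng(2)
eps = rng.normal(size=2001)
y = 0.3 + 0.5 * eps[:-1] + eps[1:]       # simulated MA(1) with alpha = 0.3, beta = 0.5
alpha_hat, beta_hat, loglik = fit_ma1_grid(y)
print(round(alpha_hat, 3), round(beta_hat, 2), round(loglik, 1))
```

The recursion in `ma1_residuals` is the point of the slide: each candidate b implies its own residual series, so b and the residuals have to be found together.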



Information criteria

The MLE approach suggests another tool for tackling model selection
Minimise the expected information loss across potential models

AIC: $-(2l(\hat{\theta}) - 2k)$: choose model with lowest AIC

BIC: $-(2l(\hat{\theta}) - k\ln(T))$: choose model with lowest BIC

BIC generally chooses smaller models, unless you have small T

Combine insights from significance tests and information criteria to choose
a parsimonious model. Always check A2 holds!
Information criteria are also relevant for AR models, which can be
placed within MLE theory
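
As a small worked illustration of the two criteria, the sketch below computes them for two candidate models. The log-likelihood values, parameter counts and sample size are made up for the example, and the function names are hypothetical.

```python
import math

def aic(loglik, k):
    """AIC = -2*l + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, T):
    """BIC = -2*l + k*ln(T)."""
    return -2.0 * loglik + k * math.log(T)

# Hypothetical fitted log-likelihoods for two candidate models on T = 200 observations.
candidates = {"small (k=2)": (-310.0, 2), "large (k=5)": (-306.5, 5)}
for name, (loglik, k) in candidates.items():
    print(name, aic(loglik, k), round(bic(loglik, k, 200), 2))
```

In this made-up case AIC narrowly prefers the larger model while BIC prefers the smaller one, because ln(200) is larger than 2 and so BIC charges more per parameter.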



Notes on examples
You do some MA(q) examples: $\Delta GDP_t$, $\Delta Cons_t$, $\Delta Inv_t$, $\Delta Inf_t$



Model evaluation: forecast performance
If your job is forecasting, choose the model with the best forecasts!
- In-sample forecasts:
Estimation period is 1 . . . T and look at e.g. the 1-period ahead forecast
$E[y_{t+1} | y_t, \hat{\theta}_T]$
This is similar to in-sample fit where we compare $\hat{y}_t$ with $y_t$, but now
we are doing it 1-period ahead.
- Out-of-sample forecasts:
Estimation period is 1 . . . N, and look at 1-period ahead forecasts
$E[y_{t+1} | y_t, \hat{\theta}_N]$ for t = N + 1, N + 2, N + 3 etc. up to the final data point T.
This is a tougher test as none of the information in the forecast period
contributed to the parameter estimation.
A simple criterion: minimum Mean Square Error

\[ MSE = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \]

where $\hat{y}_i$ is the forecast, $y_i$ is the realisation.
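
The in-sample versus out-of-sample comparison can be sketched on simulated data. This is an illustrative assumption-laden example: an AR(1) with known parameters stands in for a real series, the split point N is arbitrary, and the helper names are hypothetical.

```python
import numpy as np

def fit_ar1_ols(y):
    """OLS of y_t on a constant and y_{t-1}; returns (a, b)."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[0], coef[1]

def one_step_mse(y, a, b, start):
    """MSE of the 1-period ahead forecasts a + b*y_{t-1} for y_t, t = start, ..., end."""
    forecasts = a + b * y[start - 1:-1]
    return float(np.mean((forecasts - y[start:]) ** 2))

rng = np.random.default_rng(3)
T, N = 3000, 2000
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + eps[t]

a, b = fit_ar1_ols(y[:N])                    # parameters estimated on the first N points only
mse_in = one_step_mse(y[:N], a, b, start=1)  # forecasts inside the estimation sample
mse_out = one_step_mse(y, a, b, start=N)     # forecasts for the held-back observations
print(round(mse_in, 3), round(mse_out, 3))
```

Because the simulated process is stable, both numbers sit near the error variance of 1; with real data and structural change the out-of-sample figure is usually the less flattering one.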


Empirical Example: In-sample forecast comparisons

Compare 1-step ahead forecasts from 4-lag and preferred AR, MA models

Variable   Model   MSE   Model   MSE
∆ GDP      AR(4)         MA(4)
           AR( )         MA( )
∆ Cons     AR(4)         MA(4)
           AR( )         MA( )
∆ Inv      AR(4)         MA(4)
           AR( )         MA( )
∆ Inf      AR(4)         MA(4)
           AR( )         MA( )



ARMA(p,q) models

We can combine the forecasting power of AR and MA components

\[ y_t = \alpha + \beta_1 y_{t-1} + \dots + \beta_p y_{t-p} + \gamma_1 \epsilon_{t-1} + \dots + \gamma_q \epsilon_{t-q} + \epsilon_t \]

A1, A2, A3 apply for a well specified model

Estimation is by maximum likelihood
Don't do large ARMAs; in practice ARMA(2,1) is often a good
approximation for macroeconomic time series.



Forecasts from an ARMA(2,1)

MSE
Variable k=1 k=4 k=8
∆ GDP
∆ Cons
∆ Inv
∆ Inf

Estimate the model to 2005. From 2003q1, produce static 1-period ahead
forecasts up to 2005, then dynamic 4 and 8 period ahead forecasts also
from 2003q1
What happens to the MSE as the forecast horizon increases?



Out of sample forecast example

Now estimate the model to 2007 and repeat the process using dynamic
out of sample forecasting up to 2011

MSE
Variable k=1 k=4 k=8
∆ GDP
∆ Cons
∆ Inv
∆ Inf

This is the problem the BoE had (with a more sophisticated model) during
the crisis
The Fed did less badly because its model updates the parameters, via the
Kalman filter, when it makes an error. Beyond the scope of this course!



More on forecast errors

We have used the MSFE to look at different models and the effect of
different time horizons
Out-of-sample forecast errors are larger than in-sample ones, because the
forecast error is really composed of two parts

\[ MSFE = E[(y_{T+1} - \hat{y}_{T+1|T})^2] \]
\[ = \sigma_\epsilon^2 + \mathrm{var}[(a - \alpha) + (b - \beta) y_T] \]

The out-of-sample forecasts involve re-estimating the model, so give
an estimate of the likely performance of the model in real time



Deeper into time series: preliminaries

What does the AR part actually measure?
What does the MA part actually measure?
Why is their combination sometimes more useful?

Think about the way the influence of past shocks, $\epsilon_{t-s}$, decays over time
To go deeper into time series we need to brush up our maths!
We will look at deriving the conditional and unconditional
expectations, variances and autocovariances for simple time-series
models.



Conditional Expectations

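The conditional expectation $E[y_{t+1} | y_t]$ follows from the conditional mean
equation we write down in the AR(1) or MA(1) model

AR(1)

\[ E[y_{t+1} | y_t] = E[\alpha + \beta y_t + \epsilon_{t+1} | y_t] \]
\[ = \alpha + \beta E[y_t | y_t] + E[\epsilon_{t+1} | y_t] \]
\[ = \alpha + \beta y_t \]

MA(1)

\[ E[y_{t+1} | y_t] = E[\alpha + \beta \epsilon_t + \epsilon_{t+1} | y_t] \]
\[ = \alpha + \beta E[\epsilon_t | y_t] \]
\[ = \alpha + \beta \epsilon_t \]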



Looking further ahead: iterative forecasts
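AR(1)

\[ E[y_{t+2} | y_t] = E[\alpha + \beta y_{t+1} + \epsilon_{t+2} | y_t] \]
\[ = \alpha + \beta E[y_{t+1} | y_t] + E[\epsilon_{t+2} | y_t] \]
\[ = \alpha + \beta(\alpha + \beta y_t) \]
\[ = \alpha + \beta\alpha + \beta^2 y_t \]
\[ E[y_{t+k} | y_t] = \sum_{i=0}^{k-1} \beta^i \alpha + \beta^k y_t \]
\[ \lim_{k \to \infty} E[y_{t+k} | y_t] = \frac{\alpha}{1 - \beta} \]

MA(1)

\[ E[y_{t+k} | y_t] = \alpha \quad \forall k \geq 2 \]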
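
The iterated AR(1) forecast and its closed form can be checked against each other numerically. A minimal sketch; the function names and parameter values are illustrative assumptions.

```python
def ar1_forecast_iterated(alpha, beta, y_t, k):
    """Apply E[y_{t+1} | y_t] = alpha + beta*y_t repeatedly, k times."""
    forecast = y_t
    for _ in range(k):
        forecast = alpha + beta * forecast
    return forecast

def ar1_forecast_closed_form(alpha, beta, y_t, k):
    """sum_{i=0}^{k-1} beta^i * alpha + beta^k * y_t."""
    return alpha * sum(beta ** i for i in range(k)) + beta ** k * y_t

alpha, beta, y_t = 1.0, 0.5, 5.0
for k in (1, 2, 3, 10, 50):
    print(k, ar1_forecast_iterated(alpha, beta, y_t, k),
          ar1_forecast_closed_form(alpha, beta, y_t, k))
# As k grows both approach the unconditional mean alpha / (1 - beta) = 2.
```

Starting above the mean at 5, the forecasts fall towards 2 as the horizon lengthens, which is the mean reversion the ACF will describe later.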



Unconditional Expectations

If we know the process, but have no observations, what is our best
guess at a value $y_t$? Our best guess is the unconditional mean implied
by the process
AR(1)

\[ E[y_t] = \alpha + \beta E[y_{t-1}] + E[\epsilon_t] \]
\[ = \alpha + \beta E[y_t] \]
\[ E[y_t] = \frac{\alpha}{1 - \beta} \]

MA(1)

\[ E[y_t] = \alpha + \beta E[\epsilon_{t-1}] + E[\epsilon_t] \]




Uncertainty and variance AR(1)

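Conditional variance

\[ \mathrm{var}(y_{t+1} | y_t) = \mathrm{var}(\alpha + \beta y_t + \epsilon_{t+1} | y_t) \]
\[ = \mathrm{var}(\epsilon_{t+1} | y_t) = \sigma_\epsilon^2 \]
\[ \mathrm{var}(y_{t+2} | y_t) = \mathrm{var}(\alpha + \beta y_{t+1} + \epsilon_{t+2} | y_t) \]
\[ = \beta^2 \mathrm{var}(y_{t+1} | y_t) + \mathrm{var}(\epsilon_{t+2} | y_t) \]
\[ = (1 + \beta^2) \sigma_\epsilon^2 \]
\[ \mathrm{var}(y_{t+k} | y_t) = \sum_{i=0}^{k-1} (\beta^2)^i \sigma_\epsilon^2 \]
\[ \Rightarrow \lim_{k \to \infty} \mathrm{var}(y_{t+k} | y_t) = \]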



Uncertainty and variance AR(1)

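Unconditional variance

\[ \sigma_y^2 = \mathrm{var}(y_t) = \mathrm{var}(\alpha + \beta y_{t-1} + \epsilon_t) \]
\[ = \beta^2 \mathrm{var}(y_t) + \sigma_\epsilon^2 \]
\[ \sigma_y^2 = \frac{\sigma_\epsilon^2}{1 - \beta^2} \]

Compare this to the limit of the conditional variance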
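
One way to make the comparison is numerically. The sketch below evaluates the k-step conditional variance of an AR(1) for growing k and sets it beside the unconditional variance; the function names and the values of beta and the error variance are illustrative assumptions.

```python
def ar1_conditional_variance(beta, sigma2, k):
    """var(y_{t+k} | y_t) = sum_{i=0}^{k-1} (beta^2)^i * sigma2."""
    return sigma2 * sum(beta ** (2 * i) for i in range(k))

def ar1_unconditional_variance(beta, sigma2):
    """var(y_t) = sigma2 / (1 - beta^2)."""
    return sigma2 / (1.0 - beta ** 2)

beta, sigma2 = 0.8, 1.0
for k in (1, 2, 5, 10, 50):
    print(k, round(ar1_conditional_variance(beta, sigma2, k), 4))
print("unconditional:", round(ar1_unconditional_variance(beta, sigma2), 4))
```

The conditional variance rises with the horizon and settles at the unconditional variance: far enough ahead, knowing $y_t$ today no longer helps.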



Uncertainty and Variance MA(1)

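Conditional variance

\[ \mathrm{var}(y_{t+1} | y_t) = \mathrm{var}(\alpha + \beta \epsilon_t + \epsilon_{t+1} | y_t) \]
\[ = \sigma_\epsilon^2 \]
\[ \mathrm{var}(y_{t+k} | y_t) = \mathrm{var}(\alpha + \beta \epsilon_{t+k-1} + \epsilon_{t+k} | y_t) \]
\[ = (1 + \beta^2) \sigma_\epsilon^2 \quad \forall k \geq 2 \]

Unconditional variance

\[ \sigma_y^2 = \mathrm{var}(\alpha + \beta \epsilon_{t-1} + \epsilon_t) \]
\[ = (1 + \beta^2) \sigma_\epsilon^2 \]

So the conditional variance of the MA(1) returns to the unconditional
variance after 2 periods!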



Forecast error variance

We should include confidence intervals in our forecasts

Assume $\epsilon_t \sim N(0, \sigma_\epsilon^2)$
Then the 95% confidence intervals for $E[y_{t+k} | y_t]$ are

\[ \text{AR(1)}: \quad y_{t+k|t} \pm 1.96 \sqrt{\sum_{i=0}^{k-1} (\beta^2)^i \sigma_\epsilon^2} \]
\[ \text{MA(1)}: \quad y_{t+k|t} \pm 1.96 \sqrt{(1 + \beta^2) \sigma_\epsilon^2} \quad \forall k \geq 2 \]

In practice it is common to apply these formulas to forecasts
generated with estimates $a, b, \hat{\sigma}_\epsilon^2$, ignoring the extra uncertainty
created by estimating parameters
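
A forecast band for the AR(1) can be sketched as the point forecast plus or minus 1.96 times the square root of the k-step forecast error variance. As on the slide, parameter uncertainty is ignored; the function name and the numbers plugged in are illustrative assumptions.

```python
import math

def ar1_forecast_interval(a, b, sigma2, y_t, k, z=1.96):
    """Return (lower, point, upper) for the k-step ahead AR(1) forecast."""
    point = a * sum(b ** i for i in range(k)) + b ** k * y_t
    variance = sigma2 * sum(b ** (2 * i) for i in range(k))
    half_width = z * math.sqrt(variance)
    return point - half_width, point, point + half_width

# Hypothetical estimates a = 1, b = 0.5, error variance 4, last observation y_t = 2.
for k in (1, 4, 8):
    lower, point, upper = ar1_forecast_interval(1.0, 0.5, 4.0, 2.0, k)
    print(k, round(lower, 3), round(point, 3), round(upper, 3))
```

The band widens with the horizon and then levels off, mirroring the conditional variance converging to the unconditional variance.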



Forecasts with confidence intervals

PIC



Deeper into time series: ACF

An important property is the correlation between $y_t$ and $y_{t-k}$

The Autocovariance function is the set of numbers
$\mathrm{cov}(y_t, y_{t-k}) := \sigma_k^2$
The sample estimator of the Autocovariance function is

\[ \hat{\sigma}_k^2 = \frac{1}{T - k - 1} \sum_{t=k+1}^{T} \tilde{y}_t \tilde{y}_{t-k} \quad \text{where} \quad \tilde{y}_t = y_t - \bar{y} \]

The Autocovariance function is normalised by the variance of y to
give the Autocorrelation function ACF(k):

\[ \rho_k = \frac{\mathrm{cov}(y_t, y_{t-k})}{\mathrm{var}(y_t)} \]
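
The sample ACF can be computed directly from the estimator above. A minimal sketch in Python with NumPy, tried on a simulated AR(1); the function names, the simulated series and the seed are assumptions for illustration.

```python
import numpy as np

def sample_autocov(y, k):
    """(1/(T-k-1)) * sum_{t=k+1}^{T} ytilde_t * ytilde_{t-k}, with ytilde = y - ybar."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    T = len(y)
    return float(np.sum(d[k:] * d[:T - k]) / (T - k - 1))

def sample_acf(y, max_lag):
    """Autocovariances at lags 0..max_lag divided by the lag-0 autocovariance."""
    gamma0 = sample_autocov(y, 0)
    return np.array([sample_autocov(y, k) / gamma0 for k in range(max_lag + 1)])

rng = np.random.default_rng(4)
T = 5000
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + eps[t]       # simulated AR(1) with beta = 0.7

acf = sample_acf(y, max_lag=5)
print(np.round(acf, 3))
```

For this simulated AR(1) the printed values should fall away roughly geometrically from 1 at lag 0.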



ACF for various stationary models

PIC



ACF: discussion

The ACF shows us how long it takes for the influence of past shocks
to die away, by measuring the correlation between yt and its own past
values.
For stationary processes the ACF becomes statistically insignificant
after a finite number of periods.
Stationary processes have finite memory - the influence of a shock is
finite
PIC:growth ACF



Deeper into time series: PACF

Clearly autoregressions can capture correlation between $y_t$ and its
past, but how many lags do we need?
If $y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \epsilon_t$, then we know from regression
analysis that $\beta_2$ is a measure of the conditional correlation between $y_t$
and $y_{t-2}$ after accounting for the correlation explained by $y_{t-1}$

\[ PACF(k) = \frac{\mathrm{cov}(y_t, y_{t-k} | y_{t-1}, y_{t-2}, \dots, y_{t-k+1})}{\big[ \mathrm{var}(y_t | y_{t-1}, \dots, y_{t-k+1}) \, \mathrm{var}(y_{t-k} | y_{t-1}, \dots, y_{t-k+1}) \big]^{1/2}} \]

e.g. PACF(3) =



PACFs for stationary processes

PIC



Memory in AR(1)
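Let $y_t = \beta y_{t-1} + \epsilon_t$, i.e. put $\alpha = 0 \Rightarrow \mu = 0$

\[ \mathrm{cov}(y_t, y_{t-1}) = E[(\beta y_{t-1} + \epsilon_t) y_{t-1}] \]
\[ = \beta E[y_{t-1}^2] = \beta \sigma_y^2 \]
\[ \Rightarrow \mathrm{corr}(y_t, y_{t-1}) = \beta \]

\[ \mathrm{cov}(y_t, y_{t-2}) = E[(\beta y_{t-1} + \epsilon_t) y_{t-2}] \]
\[ = E[(\beta(\beta y_{t-2} + \epsilon_{t-1}) + \epsilon_t) y_{t-2}] \]
\[ = E[\beta^2 y_{t-2}^2 + \beta \epsilon_{t-1} y_{t-2} + \epsilon_t y_{t-2}] \]
\[ = \beta^2 \sigma_y^2 \]
\[ \Rightarrow \mathrm{corr}(y_t, y_{t-2}) = \beta^2 \]

\[ \mathrm{corr}(y_t, y_{t-k}) = \beta^k \]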



ACF for different AR(1) models

PIC



PACF AR models

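\[ y_t = \beta_1 y_{t-1} + \underbrace{\beta_2 y_{t-2}}_{\text{cond corr}} + \epsilon_t \]

The coefficient in the bracketed term is

\[ \frac{\mathrm{cov}(y_t, y_{t-2} | y_{t-1})}{\sqrt{\mathrm{var}(y_t | y_{t-1}) \, \mathrm{var}(y_{t-2} | y_{t-1})}} \]

PACF(k) is the coefficient on $y_{t-k}$ in an AR(k) regression, so in an AR(p) model
$PACF(p) = \beta_p$ and $PACF(k) = 0$ for $k > p$

So the PACF drops sharply to 0 after the final lagged term in the
AR(p) model
This is an alternative way to think about how many lags to include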
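
The regression view of the PACF can be sketched as a sequence of OLS autoregressions, keeping the last coefficient from each. This is one simple way to compute it, assumed here for illustration; statistical packages may use other algorithms. The function name and the simulated AR(2) are hypothetical.

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """PACF(k) as the coefficient on y_{t-k} in an OLS AR(k) regression with a constant."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    out = []
    for k in range(1, max_lag + 1):
        # Column j holds y_{t-j}, aligned with the dependent variable y_t, t = k..T-1.
        lags = [y[k - j:T - j] for j in range(1, k + 1)]
        X = np.column_stack([np.ones(T - k)] + lags)
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(coef[-1])
    return np.array(out)

rng = np.random.default_rng(5)
T = 5000
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]   # simulated AR(2)

pacf = pacf_by_regression(y, max_lag=4)
print(np.round(pacf, 3))
```

For this AR(2) the second value should be near 0.3 and the third and fourth near zero, which is the sharp cut-off the slide describes.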



PACF various AR models

PIC
What do you notice about ACF vs. PACF in AR models?



ACF MA(1)

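Consider the mean zero MA(1) $y_t = \beta \epsilon_{t-1} + \epsilon_t$

\[ \mathrm{cov}(y_t, y_{t-1}) = E[(\beta \epsilon_{t-1} + \epsilon_t)(\beta \epsilon_{t-2} + \epsilon_{t-1})] \]
\[ = \beta \sigma_\epsilon^2 \]
\[ \Rightarrow \mathrm{corr}(y_t, y_{t-1}) = \frac{\beta \sigma_\epsilon^2}{(1 + \beta^2) \sigma_\epsilon^2} = \frac{\beta}{1 + \beta^2} \]

\[ \mathrm{cov}(y_t, y_{t-2}) = E[(\beta \epsilon_{t-1} + \epsilon_t)(\beta \epsilon_{t-3} + \epsilon_{t-2})] \]
\[ = 0 \]
\[ ACF(k) = 0 \quad \forall k \geq 2 \]

The ACF of an MA(q) process drops sharply to 0 after lag q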



PACF of MA(1)

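To calculate the PACF directly is hard. Here's a neat trick

Assume $|\beta| < 1$, notice $\epsilon_t = y_t - \beta \epsilon_{t-1}$

\[ y_t = \beta(y_{t-1} - \beta \epsilon_{t-2}) + \epsilon_t \]
\[ = \beta(y_{t-1} - \beta(y_{t-2} - \beta \epsilon_{t-3})) + \epsilon_t \]
\[ = \beta y_{t-1} - \beta^2 y_{t-2} + \beta^3 (y_{t-3} - \beta \epsilon_{t-4}) + \epsilon_t \]
\[ y_t = \sum_{i=1}^{\infty} (-1)^{i+1} \beta^i y_{t-i} + \epsilon_t \]

Which is an AR(∞), and is well defined given $|\beta| < 1$.

Using the earlier result, the PACF will decay geometrically as $\beta^i$ declines
to zero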



Box Jenkins model building method

Two famous statisticians suggested the ACF/PACF as a way of building
time series regressions

        AR(p)                 MA(q)                 ARMA(p,q)
ACF     Decays smoothly       Chops off at q lags   Decays smoothly
PACF    Chops off at p lags   Decays smoothly       Decays smoothly

Inspection of the empirical ACF and PACF can help suggest a sensible starting
ARMA(p,q) model.
Then test down to a small model using significance and information
criteria.
Always check A2 holds for your residuals



Empirical P/ACF

Genuine AR(1) process


PIC
∆ ln GDP
PIC



Empirical P/ACF

Genuine MA(1) process


PIC
∆Inf
PIC



Summary

We have dealt with finite memory processes where
- $ACF(k) \to 0$ as $k \to \infty$
- $PACF(k) \to 0$ as $k \to \infty$
- $E[y_t] = \mu \ \forall t$
- $\mathrm{var}(y_t) = \sigma_y^2 \ \forall t$
- $\mathrm{cov}(y_t, y_{t-k})$ depends only on k and not t
ARMA(p,q) models make decent forecasts for these series
But in economics, they are only approximate models

How do we deal with levels of series and model relationships between
dynamic economic variables?



PART II: Integrated processes

Processes with permanent shocks are called integrated processes
- A simple example shows our ideas of $\mu$ and $\sigma^2$ are not compatible with
permanent shocks
The first problem is to decide if a series is integrated
- Dickey Fuller tests
We then have a choice
- Difference the series to make it stationary
- Look for cointegration between two or more integrated series



Permanent shocks

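Consider the random walk $y_t = y_{t-1} + \epsilon_t$, $y_0 = 0$

\[ y_1 = y_0 + \epsilon_1 = \epsilon_1 \]
\[ y_2 = y_1 + \epsilon_2 = \epsilon_1 + \epsilon_2 \]
\[ \dots \quad y_t = \epsilon_1 + \epsilon_2 + \dots + \epsilon_t \]

\[ \mathrm{var}(y_t) = \mathrm{var}\Big( \sum_{i=0}^{t-1} \epsilon_{t-i} \Big) = t \sigma_\epsilon^2 \to \infty \ \text{as} \ t \to \infty \]

The variance of this process grows without bound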
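
The growing variance is easy to see by simulating many random walks and measuring the spread across paths at each date. A minimal sketch with NumPy; the number of paths, the horizon and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, T = 5000, 100
shocks = rng.normal(0.0, 1.0, size=(n_paths, T))
paths = np.cumsum(shocks, axis=1)        # y_t = eps_1 + ... + eps_t, with y_0 = 0
cross_section_var = paths.var(axis=0)    # variance across paths at each date t

for t in (1, 10, 50, 100):
    print(t, round(float(cross_section_var[t - 1]), 2))
```

With unit error variance the cross-path variance at date t should be close to t, so it keeps rising with the horizon instead of settling down.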



Permanent shocks

What about the mean?

Think about the random walk with drift $y_t = \alpha + y_{t-1} + \epsilon_t$
This is an AR(1) with $\beta = 1$
Thus $E[y_t] = \frac{\alpha}{1 - \beta}$ is undefined
The process has no unconditional mean
Conditional forecasts

\[ E[y_{t+k} | y_t] = k\alpha + y_t \]

With an error variance that grows without bound

Regression analysis struggles with such data



Regressions with random walks

Regress the two uncorrelated random walks $y_t$, $x_t$ in the dataset on
each other

param   value   t-stat
α
β
R^2             -

Breusch-Godfrey stat for serial corr up to order 4:

This is typical of a spurious regression
High $R^2$ combined with positive serial correlation is a warning sign of
spurious regression
Now regress ∆y on ∆x. Is there any relationship?
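
The exercise can also be run as a small Monte Carlo. The sketch below is an illustration on simulated data, not the course dataset: it regresses one independent random walk on another many times, counts how often the usual |t| > 1.96 rule finds a "significant" slope, and repeats after differencing. The function names, sample size and number of replications are assumptions.

```python
import numpy as np

def ols_slope_tstat(y, x):
    """t-statistic on the slope in an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1] / np.sqrt(cov[1, 1]))

def rejection_rate(difference, n_reps=200, T=200, seed=7):
    """Share of replications in which the slope looks significant at the 5% level."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        y = np.cumsum(rng.normal(size=T))    # two independent random walks
        x = np.cumsum(rng.normal(size=T))
        if difference:
            y, x = np.diff(y), np.diff(x)
        rejections += int(abs(ols_slope_tstat(y, x)) > 1.96)
    return rejections / n_reps

rate_levels = rejection_rate(difference=False)
rate_diffs = rejection_rate(difference=True)
print(rate_levels, rate_diffs)
```

In levels the unrelated series look related most of the time; in differences the rejection rate falls back to roughly the nominal level.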



Testing for unit roots: Dickey-Fuller test

The best way to avoid spurious regressions is to do regressions with
stationary series
To determine stationarity, we need to test $\beta = 1$ in the process
$y_t = \alpha + \beta y_{t-1} + \epsilon_t$

\[ \Delta y_t = \alpha + (\beta - 1) y_{t-1} + \epsilon_t \]
\[ = \alpha + \rho y_{t-1} + \epsilon_t \]
\[ H_0: \rho = 0 \Rightarrow \text{there is a unit root} \]
\[ H_A: \rho < 0 \Rightarrow \text{no unit root} \]
\[ t_{DF} = \frac{\hat{\rho}}{SE(\hat{\rho})} \]

The test stat $t_{DF}$ follows the Dickey Fuller distribution, which has much
more negative critical values than the standard normal
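
The Dickey-Fuller regression itself is ordinary OLS; only the critical values are special. A minimal sketch on simulated data follows, with a hypothetical function name. The statistic should be compared with Dickey-Fuller tables (for the intercept case the commonly tabulated large-sample 5% value is about -2.86), not with the normal distribution.

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic on rho in the OLS regression dy_t = alpha + rho*y_{t-1} + eps_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1] / np.sqrt(cov[1, 1]))

rng = np.random.default_rng(8)
eps = rng.normal(size=500)
random_walk = np.cumsum(eps)             # unit root: beta = 1
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + eps[t]   # stationary: beta = 0.5

t_rw = dickey_fuller_tstat(random_walk)
t_st = dickey_fuller_tstat(stationary)
print(round(t_rw, 2), round(t_st, 2))
```

The stationary series gives a strongly negative statistic; the random walk typically does not get anywhere near the Dickey-Fuller critical value.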



Dickey Fuller distribution

PIC
The DF distribution is sensitive to specification of the test
I Inclusion of an intercept
I Inclusion of a trend
I Number of lags
I Sample size



The Augmented Dickey Fuller test

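It is essential that there is no serial correlation in the DF regression
residuals
If necessary add lagged differences of the dependent variable

\[ y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \epsilon_t \]
\[ = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-1} - \beta_2 y_{t-1} + \beta_2 y_{t-2} + \epsilon_t \]
\[ = \alpha + (\beta_1 + \beta_2) y_{t-1} - \beta_2 \Delta y_{t-1} + \epsilon_t \]
\[ \Delta y_t = \alpha + (\beta_1 + \beta_2 - 1) y_{t-1} - \beta_2 \Delta y_{t-1} + \epsilon_t \]
\[ = \alpha + \rho y_{t-1} - \beta_2 \Delta y_{t-1} + \epsilon_t \]

Hypothesis, alternative and test statistic as on the previous slide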



Dealing with trends

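Include trends using the 'restricted trend' option if available, for
$g = \gamma/(1 - \beta)$

\[ y_t = \alpha + \gamma t + \beta y_{t-1} + \epsilon_t \]
\[ \Delta y_t = \alpha + (\beta - 1)(y_{t-1} - gt) + \epsilon_t \]
\[ = \alpha + \rho(y_{t-1} - gt) + \epsilon_t \]
\[ \Rightarrow \Delta y_t = \alpha + \epsilon_t \quad \text{if } \rho = 0 \qquad (2) \]
\[ \Rightarrow y_t = \alpha + \gamma t + \beta y_{t-1} + \epsilon_t \quad \text{if } \rho < 0 \qquad (3) \]

From (2) if the process has a unit root it is a RW with drift

From (3) if the process does not have a unit root, it is trend stationary with $|\beta| < 1$.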



Dickey Fuller Tables



Notes on exercise

The order of integration, d, written $y_t \sim I(d)$, is the number of times
a series must be differenced in order to make the series $y_t$ stationary
Determine the order of integration of Output, Consumption,
Investment and Prices.
Do any series exhibit trend-stationary behaviour?



Cointegration: Random walks which Tango!

So far we have dealt with Integrated series by differencing to make


them stationary and modelling their (univariate) stationary behaviour.
There is an important case when we can work with two (or more)
Integrated series directly. This is when the series are cointegrated
I Economic behaviour creates long run - equilibrium - relationships
between series. E.g. output and consumption, investment and output,
house prices and earnings (?), stock prices and profits (?)
I The ratio of such series is a stationary series, even though the two
series are I(1)!
I Variables which cointegrate in this way adjust to dynamic shocks in
order to move back towards their equilibrium relationship



Output and Consumption

Plots of series



Cointegration: formal definition

If a linear combination of two I(1) series is I(0) then the two series cointegrate

\[ x_t \sim I(1) \qquad y_t \sim I(1) \]
\[ y_t - \beta x_t \sim I(0) \]

The 'cointegrating vector' is the pair of values $(1, -\beta)$ which
(working in logs) give the stationary ratio between the series
Economic theory often suggests theoretical values for $\beta$, so it is
interesting to see if these are true in the data



Common stochastic trends
Cointegration occurs when two series share a common stochastic trend,
say $X_t$. Let $X_0 = 0$ and

\[ X_t = X_{t-1} + \epsilon_t \quad \Rightarrow \quad X_t = \sum_{s=1}^{t} \epsilon_s \]

Let $\tilde{y}_t$ and $\tilde{x}_t$ be independent I(0) processes and let

\[ y_t = \beta X_t + \tilde{y}_t \qquad x_t = X_t + \tilde{x}_t \]
\[ \Rightarrow y_t - \beta x_t = \beta X_t + \tilde{y}_t - \beta(X_t + \tilde{x}_t) \]
\[ = \tilde{y}_t - \beta \tilde{x}_t \sim I(0) \]

The common stochastic trend has been cancelled out. The pair $(1, -\beta)$ is
called the cointegrating vector as it gives the stationary linear combination
of y and x
Output and Consumption

Plots of ratio and residual in superconsistent regression



Cointegration: long and short-run relationships

If an economically meaningful equilibrium relationship exists:


There must be dynamic adjustment in the short run in order to return
the variables towards equilibrium levels when shocks push them apart
Thus the long-run relationship makes predictions about short-run
adjustment dynamics
The levels of the series this period help us predict changes in the
series next period
We can represent both the long-run and the short-run behaviour of
cointegrated series through the error correction model



Error correction model

We have seen an estimate of the cointegrating relationship between
$cons_t$ and $output_t$

\[ c_t = \beta y_t + \epsilon_t \]

Encouragingly the residuals $\hat{\epsilon}_t$ from this relationship were stationary

But look at the Breusch-Godfrey stat - XXX - the above model is not
dynamically well-specified; it does not meet A2.
A model with more general dynamics is

\[ c_t = \beta_1 y_t + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \epsilon_t \qquad (4) \]

This allows for the response of $c_t$ to its own past, and to current and lagged
values of $y_t$



Error correction model
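Although (4) is a more general dynamic specification, it consists of
I(1) variables, yet the $\epsilon_t$ series should be I(0).
With a bit of algebra we can rewrite the model entirely in terms of
I(0) variables

\[ c_t = \beta_1 y_t + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \epsilon_t \]
\[ = \beta_1 y_t - \beta_1 y_{t-1} + \beta_1 y_{t-1} + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \epsilon_t \]
\[ = \beta_1 \Delta y_t + (\beta_1 + \beta_2) y_{t-1} + \beta_3 c_{t-1} + \epsilon_t \]
\[ \Delta c_t = \beta_1 \Delta y_t + (\beta_1 + \beta_2) y_{t-1} + (\beta_3 - 1) c_{t-1} + \epsilon_t \]
\[ = \beta_1 \Delta y_t + (\beta_3 - 1) \underbrace{\Big( c_{t-1} - \frac{\beta_1 + \beta_2}{1 - \beta_3} y_{t-1} \Big)}_{\text{E. C. term}} + \epsilon_t \]

$\Delta y_t$, $\Delta c_t$ are I(0); provided there is cointegration, so are the error
term and the equilibrium relationship in the large brackets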



Error correction model

We can re-write the final line of the ECM as

\[ \Delta c_t = \alpha_1 \Delta y_t + \alpha_2 (c_{t-1} - \beta y_{t-1}) + \epsilon_t \qquad (5) \]

Cointegration imposes the restrictions $\alpha_2 = (\beta_3 - 1)$ and $\beta = \frac{\beta_1 + \beta_2}{1 - \beta_3}$
If there is a cointegrating relationship
- $c_{t-1} - \hat{\beta} y_{t-1} \sim I(0)$, and $\hat{\epsilon}_t \sim I(0)$
- $\hat{\alpha}_2 < 0$
The $\hat{\alpha}_2 < 0$ requirement ensures $c_t$ adjusts to being above its
long-run level in period t − 1 by reducing in period t
To estimate such a model, we need an estimate of $c_{t-1} - \hat{\beta} y_{t-1}$



Estimation of the ECM
Engle and Granger (1987) propose a two-step procedure for estimating (5)
First we need an estimate of the cointegrating vector. Regress:

ct = βyt + νt
⇒ ν̂t = ct − β̂yt

ν̂t is our estimate of deviations from the long-run equilibrium
relationship
Second, we estimate, by OLS

∆ct = α1 ∆yt + α2 ν̂t−1 + εt

We can recover estimates of the parameters of the original dynamic
model (4) from the estimated ECM parameters α̂1, α̂2 and the first-step
estimate β̂
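
The two steps can be sketched on simulated data. This is only an illustration under stated assumptions: the data are generated from the ECM (5) with hypothetical parameter values (beta_true, a1_true, a2_true), the regressions use plain NumPy least squares, and the mapping back to model (4) uses β1 = α1, β3 = 1 + α2 and β2 = −α2 β − α1, which follows from the restrictions on the previous slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical "true" values used only to simulate data, not lecture estimates.
beta_true, a1_true, a2_true = 0.9, 0.5, -0.3

# y is a random walk (I(1)); c is generated from the ECM (5).
y = np.cumsum(rng.normal(size=n))
c = np.empty(n)
c[0] = beta_true * y[0]
for t in range(1, n):
    ec_lag = c[t - 1] - beta_true * y[t - 1]  # last period's equilibrium error
    c[t] = c[t - 1] + a1_true * (y[t] - y[t - 1]) + a2_true * ec_lag + rng.normal(scale=0.5)

# Step 1: cointegrating regression c_t = beta * y_t + nu_t (no intercept, as on the slide).
beta_hat = (y @ c) / (y @ y)
nu_hat = c - beta_hat * y

# Step 2: OLS of dc_t on dy_t and the lagged residual nu_hat_{t-1}.
dc, dy = np.diff(c), np.diff(y)
X = np.column_stack([dy, nu_hat[:-1]])
a1_hat, a2_hat = np.linalg.lstsq(X, dc, rcond=None)[0]

# Recover the parameters of the dynamic model (4) from the ECM estimates.
b1_hat = a1_hat
b3_hat = 1.0 + a2_hat
b2_hat = -a2_hat * beta_hat - a1_hat

print("beta_hat:", beta_hat, "alpha1_hat:", a1_hat, "alpha2_hat:", a2_hat)
print("recovered (beta1, beta2, beta3):", b1_hat, b2_hat, b3_hat)
```

With real data, the simulated series would be replaced by the observed ct and yt; the rest of the sketch is unchanged.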



Testing for cointegration: EG procedure
The two-step estimation approach suggests a method for testing whether two
series are actually cointegrated
Estimate the cointegrating relationship

ct = βyt + νt

Save the ν̂t series and perform an ADF test with no intercept

\[
\Delta \hat{\nu}_t = \rho\, \hat{\nu}_{t-1} + \sum_{i=1}^{p-1} \gamma_i \Delta \hat{\nu}_{t-i} + u_t
\]

H0 : ρ = 0 ⇒ ν̂t is I(1) and there is no cointegration
HA : ρ < 0 ⇒ ν̂t is I(0) and there may be cointegration
Critical values are MacKinnon's, which are more negative than the standard DF critical values

If we find H0 is rejected ...



Testing for cointegration: EG procedure
...estimate the ECM

∆ct = α1 ∆yt + α2 ν̂t−1 + εt

Test that there is a significant, negative change in ct whenever
ct−1 > β̂yt−1, in order to restore equilibrium

H0 : α2 = 0 ⇒ no significant error correction
HA : α2 < 0 ⇒ error correction is significant
τ = α̂2 / SE(α̂2), compared with the one-sided critical value t0.05,DoF

If the estimates pass these two tests, there is evidence of cointegration
and the ECM can be used to estimate the dynamic model
If not, then work with differences, i.e. transform the two series to
make them stationary.
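
The two tests can be sketched as follows. This is a rough illustration, not a full implementation: the data are simulated (c = 0.9 y plus an AR(1) disturbance, all values hypothetical), the residual-based regression is the simplest DF case with no lagged differences (p = 1), and the helper `ols` is a name introduced here. The DF statistic must be compared with MacKinnon critical values from tables, which are deliberately not hard-coded; the −1.645 used for α2 is the large-sample one-sided 5% value of the t distribution.

```python
import numpy as np

def ols(X, y):
    """OLS without intercept: returns coefficients and their t-statistics."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / se

# Simulated cointegrated pair (hypothetical data-generating process).
rng = np.random.default_rng(1)
n = 500
y = np.cumsum(rng.normal(size=n))           # I(1) regressor
nu = np.zeros(n)
for t in range(1, n):
    nu[t] = 0.5 * nu[t - 1] + rng.normal()  # stationary deviation from equilibrium
c = 0.9 * y + nu

# Test 1: DF regression on the residuals, no intercept, no lagged differences.
beta_hat = (y @ c) / (y @ y)
nu_hat = c - beta_hat * y
rho_hat, rho_t = ols(nu_hat[:-1, None], np.diff(nu_hat))
print("DF statistic on residuals:", rho_t[0])  # compare with MacKinnon critical values

# Test 2: one-sided t-test on alpha2 in the ECM.
X = np.column_stack([np.diff(y), nu_hat[:-1]])
alpha_hat, alpha_t = ols(X, np.diff(c))
error_correction_significant = alpha_t[1] < -1.645  # large-sample one-sided 5% value
print("alpha2_hat:", alpha_hat[1], "t-stat:", alpha_t[1], error_correction_significant)
```

In practice one would add lagged differences to both regressions until A2 holds. Libraries such as statsmodels provide a residual-based cointegration test (`statsmodels.tsa.stattools.coint`) that reports MacKinnon-based critical values, which may be more convenient than looking them up.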



EG procedure: discussion

The Engle-Granger procedure works well with two variables, but there are
drawbacks
The initial regression is misspecified; ν̂t is usually serially correlated
The two-step approach introduces more variance than a
dynamically well-specified 1-step procedure
Results, especially with more than two variables, are sensitive to which
variable is taken as the left-hand-side variable
With more than two variables, there may be more than one
cointegrating relationship, and EG will estimate a linear combination
of these relationships, which has no real interpretation
These problems can be overcome by the Johansen procedure which is a
vector-based approach to estimating cointegrating equations
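
The sensitivity to the choice of left-hand-side variable can be seen even with two series. The sketch below uses simulated data (hypothetical values) and compares the slope from regressing c on y with the slope implied by regressing y on c. For no-intercept regressions the product of the two slopes equals the uncentred R², which is at most 1, so the two normalisations generally disagree in finite samples.

```python
import random

random.seed(0)
n = 200
y, c = [], []
level = 0.0
for _ in range(n):
    level += random.gauss(0, 1)                 # random-walk regressor
    y.append(level)
    c.append(0.9 * level + random.gauss(0, 1))  # hypothetical cointegrated series

def slope_no_intercept(lhs, rhs):
    """OLS slope from regressing lhs on rhs without an intercept."""
    return sum(l * r for l, r in zip(lhs, rhs)) / sum(r * r for r in rhs)

beta_c_on_y = slope_no_intercept(c, y)  # normalise on c
beta_y_on_c = slope_no_intercept(y, c)  # normalise on y
implied_beta = 1.0 / beta_y_on_c        # long-run coefficient implied by the reverse regression

print(beta_c_on_y, implied_beta)        # the two estimates of beta differ
```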



Empirical examples

Series       β̂    ν̂t ∼ I(0)    α̂2    t-stat
(ct, yt)
(hpt, wt)
(SPt, Dt)



Forecast comparisons

Estimate your error correction models on 1960-2000


Estimate your preferred ARIMA on 1960-2000
Produce 1-step and 4-step ahead out of sample forecasts with each
model for 2001-2006
Compare the MSPE from each model
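
The comparison step can be sketched as below. The function name `mspe` and all numbers are placeholders introduced for illustration; they are not forecasts or outcomes from any real model or data set.

```python
def mspe(actual, forecast):
    """Mean squared prediction error over the out-of-sample period."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

# Placeholder out-of-sample outcomes and forecasts (purely illustrative).
actual = [1.0, 2.0, 3.0, 4.0]
ecm_forecast = [1.1, 1.9, 3.2, 3.9]
arima_forecast = [0.8, 2.3, 2.6, 4.4]

mspe_ecm = mspe(actual, ecm_forecast)
mspe_arima = mspe(actual, arima_forecast)
preferred = "ECM" if mspe_ecm < mspe_arima else "ARIMA"
print(mspe_ecm, mspe_arima, preferred)
```

The same function would be applied separately to the 1-step and the 4-step ahead forecasts, since a model can do well at one horizon and poorly at another.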



Summary: Work stream for applied time series

Graph your data. Think about:
  - Is the series trending over time?
  - Is the trend exponential or linear?
  - Is the series mean reverting?
  - Would the series look mean reverting in most subsamples?
  - Are there several variables that seem to exhibit the same random trend?
Take logs of exponentially increasing variables
Begin Dickey-Fuller tests
  - Decide about appropriate inclusion of trends and constants based on
    visual inspection and inspection of DF regression results
  - Include f + 1 lags in initial DF specification and remove insignificant
    lags; check for serial correlation up to order f, ensure A2 is satisfied.
Using preferred specification of DF tests decide on order of
integration of the series



Summary: With the transformed stationary series

Build univariate ARMA models for forecasting
  - Inspect ACF, PACF, decide on candidate AR, MA, ARMA specification
  - Start with AR(f+1), MA(f+1) or ARMA (f/2,f/2)(?) specification and
    test down by eliminating insignificant lags, minimizing AIC/BIC; ensure
    A2 is satisfied in preferred model
Inspect forecast predictions vs. actual outcomes
  - Do the forecast error bounds include 95% of actual outcomes?
  - Are the forecast errors close to uncorrelated?
Test robustness by performing out of sample forecast exercise
  - You will need to reserve part of your sample so will lose some
    information from the estimation
  - But you might find a model that performs better in practice, or at
    least understand more about how your model is likely to perform as new
    data comes in
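
The two forecast checks above (interval coverage and correlation of forecast errors) can be sketched as follows. The helper names and every number are placeholders introduced for illustration, not results from any model.

```python
def coverage(actual, lower, upper):
    """Share of actual outcomes that fall inside the forecast bounds."""
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return inside / len(actual)

def lag1_autocorr(errors):
    """First-order sample autocorrelation of the forecast errors."""
    n = len(errors)
    mean = sum(errors) / n
    dev = [e - mean for e in errors]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)

# Placeholder outcomes, point forecasts and interval half-width (illustrative only).
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
point = [1.2, 1.7, 3.1, 4.6, 4.9]
half_width = 0.5
lower = [p - half_width for p in point]
upper = [p + half_width for p in point]

cov = coverage(actual, lower, upper)
errors = [a - p for a, p in zip(actual, point)]
r1 = lag1_autocorr(errors)
print(cov, r1)  # coverage should be near 0.95 for 95% bounds; r1 near 0 if errors are uncorrelated
```

With only a handful of out-of-sample points these statistics are noisy, so they are a rough diagnostic rather than a formal test.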



Summary: Modelling cointegrating series
Plot the ratio of interest
Engle-Granger Procedure Step I
  - Estimate the cointegrating relationship with appropriate constant/trend
    inclusion
  - Save the residuals
  - Perform Dickey-Fuller test on residuals
  - No constant! Use MacKinnon p-values
  - H0 : no cointegration. If H0 is rejected, go to...
Engle-Granger Procedure Step II
  - Estimate ECM with appropriate lagged differences so that A2 holds
  - Test α̂2 < 0 by standard t-test
  - H0 : no cointegration (α2 = 0). If H0 is rejected...
The ECM is the correct model. Recover parameters of restricted ARDL model
with appropriate transformations
Interpret cointegrating relationship
Make dynamic forecasts
The End!
