
Part 6: Dynamic models October 7, 2008

Part VI

Dynamic Econometric Models


Reading: G[19,20], DM[13].

1 Introduction to dynamic models

Many models in economics relate to dynamic behavior:

• disequilibrium and incomplete adjustment,

• capital and resource stock utilization and depreciation.

• Whether or not our initial theory suggests dynamics, time series data often hint at dynamic interactions across time periods.

This large section of the course deals with models in which adjacent observations

are related in one way or another. The most prominent application of this form of

model is with time-series data, although spatial data can have similar characteris-

tics. We will focus on data with temporal relationships.

1.1 General form of a linear dynamic model

A general representation of such a model is

yt = µ + Σ_{i=1}^{p} γi yt−i + Σ_{j=0}^{r} βj xt−j + εt ,

where εt might be related to past values of ε, such that

εt = ρ1 εt−1 + ρ2 εt−2 + · · · + ρg εt−g + ut (serial correlation), or
εt = ut + θ1 ut−1 + · · · + θq ut−q (moving average error).

Page 73 — WSU Econometrics II 2007, © Jonathan Yoder. All rights reserved

We begin by looking closely at univariate time series, where yt is modeled as a function of only past values of itself. Later we examine Autoregressive Distributed Lag Models with other regressors [xt · · · xt−r ].

1.2 Necessary conditions for consistency of OLS

1. E[εt |xt−s ] = 0 ∀ s ≥ 0. This implies that εt contains only new information at t: no AR or MA errors. [Note: here x denotes all RHS variables, including lagged dependent variables.]

2. A regularity condition: plim (1/(T − s)) Σ_{t=s+1}^{T} xt x′t−s = Q(s), a finite matrix.

3. (required for univariate series) yt is “stable”:

(a) Stationary: for a model yt = µ + Σ_{i=1}^{p} γi yt−i + εt , the roots z of the polynomial 1 − γ1 z − γ2 z² − · · · − γp z^p = 0 lie outside the unit circle.

(b) Ergodic: events are asymptotically independent — the impact of one on the other weakens with temporal distance.

4. (required for ARDL models) The regression relationship between y and its

regressors x is “stable” over time: the series are “cointegrated” (note that

each series y, x need not be stationary, just the relationship between them).

5. Asymptotic normality holds with some additional assumptions, the Mann and Wald conditions, so that b ∼a N(β, σ² Q(s)⁻¹).
To summarize: for OLS to be consistent,

• with a lag(s) of the dependent variable on the right-hand-side, errors must

contain only new information (no AR or MA processes).

• You need either stationary series, or a stationary relationship among nonsta-

tionary series.


When one or more of these conditions fails, a number of approaches can be used for consistent parameter estimation, including pre-estimation data differencing, instrumental variables methods, and/or conditional Maximum Likelihood or Nonlinear Least Squares.

2 Univariate time series models

Reading: G[20], DM[13], K[18]

2.1 Lag operators

The lag operator L is defined such that

Lxt = xt−1

L(Lxt ) = L2 xt = xt−2

Lp xt = xt−p

Lq (Lp xt ) = Lp+q xt = xt−p−q

In some cases we want to work in changes, or differences of data across observations:

(1 − L)xt = xt − xt−1 = ∆xt (first-difference)

(1 − L)(1 − L)xt = (1 − L)(xt − xt−1 )

= (xt − xt−1 ) − (xt−1 − xt−2 )

= ∆2 xt (second-difference)

Now consider the infinite series

A(L) = 1 + aL + (aL)² + (aL)³ + · · · = Σ_{i=0}^{∞} (aL)^i = 1/(1 − aL).


This implies, for example, that

xt /(1 − aL) = Σ_{i=0}^{∞} (aL)^i xt = Σ_{i=0}^{∞} a^i xt−i :

an infinite series of lags on xt , but only one associated parameter, a.
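A minimal Python sketch of this point (the values of a and x are illustrative, not from the text): the recursion zt = xt + a·zt−1 solves (1 − aL)zt = xt , and it matches the truncated geometric expansion Σ_{i=0}^{t} a^i xt−i term for term.

```python
a = 0.6
x = [1.0, 2.0, 0.5, -1.0, 3.0, 1.5]

# Recursive solution of (1 - aL) z_t = x_t, taking pre-sample z = 0
z = []
for t, xt in enumerate(x):
    z.append(xt + a * (z[t - 1] if t > 0 else 0.0))

# Direct truncated expansion: z_t = sum_{i=0}^{t} a^i x_{t-i}
z_expand = [sum(a**i * x[t - i] for i in range(t + 1)) for t in range(len(x))]

for zt, ze in zip(z, z_expand):
    assert abs(zt - ze) < 1e-12
print(z)
```

One parameter, a, generates the entire (truncated) infinite lag distribution.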

1. This section focuses on the univariate model

C(L)yt = µ + R(L)εt ,

which is called an Autoregressive Moving Average model, or ARMA(p,q), where p is the order of C(L) and q is the order of R(L).

2. An Autoregressive Integrated Moving Average, or ARIMA(p,d,q) model

corresponds to data that must be differenced d times to ensure stationarity,

which is a necessary condition for univariate estimation.

3. When a moving average disturbance process exists, nonlinear least squares or Maximum Likelihood conditional on the initial observations is a consistent estimator of ARIMA parameters.

4. We will define autocorrelation functions and partial autocorrelation functions. The empirical counterparts to these will help us identify the dynamic characteristics of a data series.

2.2 Stationarity and Invertibility

C(L)yt = µ + R(L)εt

where C(L) represents an AR process in y and R(L) represents a moving average process in ε, such as

R(L)εt = (1 − θ1 L − θ2 L²)εt = εt − θ1 εt−1 − θ2 εt−2


For estimation of ARIMA models, we require two things:

1. (Weak) stationarity of y for finite and stable asymptotic variance.

(a) E[yt ] is independent of t.

(b) Var[yt ] is finite and independent of t.

(c) Cov[yt , ys ] is a function of t − s, not t.

2. Weak stationarity is satisfied if the root(s) z of C(z) lie outside the unit circle. For example,

(a) AR(1) process in y: the characteristic equation is C(z) = 1 − γz = 0. For stationarity, the root z = 1/γ must be larger than one in absolute value (i.e. |γ| < 1).

(b) AR(2) in y: c(z) = 1−γ1 z −γ2 z 2 = 0. The series is covariance stationary

if |γ2 | < 1, γ1 + γ2 < 1, and γ2 − γ1 < 1.

3. Invertibility of R(L). Invertibility requires the root(s) of R(z) to lie outside

the unit circle (same requirement as above for C(z)).

4. We need invertibility so that we can define the disturbances as an autoregressive process of y:

εt = D(L)yt − µ/R(L), where D(L) = C(L)/R(L).

2.3 Nonstationarity and Integrated series

Consider the following time series specifications:


Random Walk: yt = µ + εt , εt = εt−1 + ut or equivalently yt = yt−1 + ut

Random Walk with a drift: yt = µ + yt−1 + ut

Trend stationary: yt = µ + βt + ut
Each of these can be characterized as (1 − L)yt = α + εt , where εt is white noise.


For example, consider the trend stationary case. Take first differences:

∆yt = (1 − L)yt = (µ + βt + ut ) − (µ + β(t − 1) + ut−1 )

= β + εt .

where α = β and εt = ∆ut , which is a non-invertible moving average error, MA(1), in the white-noise ut . In any of these cases, the root of the characteristic equation for (1 − L) equals one (a unit root), so y is a nonstationary series.

2.4 Consequences of nonstationarity

Non-constant variances: Consider the random walk, starting from period 1.

y1 = y0 + u1
y2 = (y0 + u1 ) + u2
...
yt = y0 + Σ_{s=1}^{t} us .

E[yt ] = y0 + Σ_{s=1}^{t} E[us ] = y0 ;   Var[yt ] = Σ_{s=1}^{t} σu² = tσu² .

Spurious inference: suppose yt = γyt−1 + εt , and γ does in fact equal 1. It has been shown that:

1. In finite samples, γ̂OLS is biased downward (away from 1),

2. γ̂OLS converges to its probability limit faster than in the standard case, so we reject the null hypothesis of γ = 1 too often with standard t and z tests.


2.4.1 Integrated processes and differencing

• Take a random walk with a drift, substitute lags back to infinity, and you get:

yt = µ + yt−1 + εt = Σ_{i=0}^{∞} (µ + εt−i ).

The expected value and variance grow to infinity as t grows.

• Now take the first difference:

yt − yt−1 = Σ_{i=0}^{∞} (µ + εt−i ) − Σ_{i=1}^{∞} (µ + εt−i )

∆yt = µ + εt ,

which is a white noise process (around the constant µ).

• Note that a unit root implies that past levels of y do not provide any information for explaining the change in y at t.

• The series yt is integrated of order one, or I(1), because the first difference of yt is stationary.

• A series that becomes stationary after d differences is I(d); integrated of order

d.

• For an individual (univariate) time series, the solution to the estimation prob-

lems associated with a unit root is to difference the data until it is I(0). Then

proceed with estimation.

• For two or more related series in a regression, differencing of the data is not

necessary for consistent parameter estimation if the series are cointegrated.

More on this later.

Now consider how to test for nonstationarity of a single series.


2.4.2 Dickey-Fuller stationarity tests

Consider a univariate autoregressive relationship that nests the possibility of a ran-

dom walk, a random walk with a drift, and a trend stationary series:

(1 − γL)(yt − α − βt) = εt

(1 − γL)yt − (1 − γ)α − (1 − γL)βt = εt

Note that (1 − γL)βt = β(t − γ(t − 1)) = β(1 − γ)t + βγ, so

yt = [α(1 − γ) + βγ] + β(1 − γ)t + γyt−1 + εt .

A more convenient form is generated by subtracting yt−1 from both sides:

∆yt = [α(1 − γ) + βγ] + β(1 − γ)t + (γ − 1)yt−1 + εt ,

and defining γ∗ = γ − 1,

∆yt = [−αγ∗ + β(γ∗ + 1)] − βγ∗ t + γ∗ yt−1 + εt .

Given a unit root, γ ∗ = 0, and the above equation collapses to

∆yt = β + εt ,

which is stationary as a first difference.

Constant and trend: ∆yt = µ + βt + γ∗ yt−1 + εt
Constant only: ∆yt = µ + γ∗ yt−1 + εt
No constant, no trend: ∆yt = γ∗ yt−1 + εt


Figure 1: DF and ADF critical values [figure not reproduced]

• The null hypothesis is that γ∗ = 0, i.e. that there IS a unit root.

• Under the alternative of stationarity, γ∗ < 0, so the test is one-sided (γ∗ ≤ 0).

• The test statistic is calculated as the usual t-statistic, γ̂∗ /se(γ̂∗ ), but it does not have a standard t distribution. The statistic is compared to specific distributions developed by Dickey and Fuller and later refined by MacKinnon.

• Table 20.5 (Greene) provides three sets of critical values. If you KNOW your model doesn't include a constant and/or a trend, then the appropriate critical values will provide a more powerful test (reduce the chances of a type 2 error, i.e. failure to reject H0 when H0 is false).

• If |γ̂∗ /se(γ̂∗ )| < |critical value|, then fail to reject the null, so difference or detrend the data. Else proceed with estimation on the original (nondifferenced) data.


2.4.3 DF tests for AR(p) processes

Same concept, but add lagged differences (out to p − 1) on the right hand side:

∆yt = µ + βt + γ∗ yt−1 + φ1 ∆yt−1 + φ2 ∆yt−2 + · · · + φp−1 ∆yt−(p−1) + εt ,

where ∆yt−i is the ith lag of the first difference of the dependent variable. The Augmented Dickey-Fuller test is carried out by testing β = γ∗ = 0. Compare to the ADF critical values.

Once we have a stationary series, we can begin to examine the data series more closely and (ultimately) consistently estimate the parameters of a model.

2.5 Autocovariances and Autocorrelations

Our goal is to characterize the relationship between yt and lags of yt . The estimated counterparts to various autocorrelation functions help us do that, which will in turn provide guidance for how to estimate the dynamic structure of a time series.

Notation:

• λk = Cov[yt , yt−k ] is the autocovariance coefficient between yt and yt−k .

λ0 ≡ Var[yt ].

• The Autocovariance function is the (possibly infinite) series of covariances

λ0 , λ1 , λ2 · · · .

• The autocorrelation coefficient is ρk = E[yt yt−k ] / (√Var(yt ) √Var(yt−k )) = λk /λ0 .

• The Autocorrelation function (ACF) is the autocovariance function di-

vided through by λ0 .


For estimation we will work mainly with the ACF and the Partial Autocorrelation function (PACF), to be defined later, but we need the autocovariances to calculate the ACFs.

2.5.1 Variance of y

In general, C(L)yt = µ + R(L)εt , so

yt = µ/C(1) + A(L)εt , where A(L) = R(L)/C(L)
   = µ/C(1) + Σ_{i=0}^{∞} αi εt−i . Then

λ0 = Var[yt ] = Σ_{i=0}^{∞} αi² σε² .

2.5.2 Example: ACF for AR(2)

For yt = γ1 yt−1 + γ2 yt−2 + εt , the ACF is a function of γ1 , γ2 , and σε² . Start by multiplying both sides by yt and taking expectations:

E[yt yt ] = E[yt (γ1 yt−1 + γ2 yt−2 + εt )] = λ0 = γ1 λ1 + γ2 λ2 + E[yt εt ].

E[yt εt ] = E[(γ1 yt−1 + γ2 yt−2 + εt )εt ] = E[εt εt ] = σε² , so

λ0 = γ1 λ1 + γ2 λ2 + σε² . Similarly,

λ1 = E[yt yt−1 ] = E[yt−1 (γ1 yt−1 + γ2 yt−2 + εt )] = γ1 λ0 + γ2 λ1
λ2 = E[yt yt−2 ] = E[yt−2 (γ1 yt−1 + γ2 yt−2 + εt )] = γ1 λ1 + γ2 λ0


Summary of three equations in three unknowns (λ0 , λ1 , λ2 ):

λ0 = γ1 λ1 + γ2 λ2 + σε²
λ1 = γ1 λ0 + γ2 λ1
λ2 = γ1 λ1 + γ2 λ0

Solve for λ0 to get λ0 = σy² = (1 − γ2 )σε² / [(1 + γ2 )((1 − γ2 )² − γ1² )], then plug this into the formulas for λ1 and λ2 to get the autocovariances. The autocorrelation coefficients for the first two lags are

ρ1 = λ1 /λ0 = γ1 + γ2 ρ1 =⇒ ρ1 = γ1 /(1 − γ2 )
ρ2 = λ2 /λ0 = γ1 ρ1 + γ2 =⇒ ρ2 = γ1² /(1 − γ2 ) + γ2
Generally,

λk = E[yt yt−k ] = E[yt−k (γ1 yt−1 + γ2 yt−2 + εt )] = γ1 λk−1 + γ2 λk−2 ,

and the ACF for an AR(2) is ρk = γ1 ρk−1 + γ2 ρk−2 .

Exercise: What's the ACF for an AR(1)?
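A quick numerical check of the recursion in Python (the γ values are illustrative): starting from ρ0 = 1 and ρ1 = γ1 /(1 − γ2 ), the recursion ρk = γ1 ρk−1 + γ2 ρk−2 reproduces the closed-form ρ2 above.

```python
g1, g2 = 0.5, 0.3

# ACF recursion for an AR(2), seeded with rho_0 = 1, rho_1 = g1/(1 - g2)
rho = [1.0, g1 / (1.0 - g2)]
for k in range(2, 10):
    rho.append(g1 * rho[k - 1] + g2 * rho[k - 2])

# Cross-check against the closed form rho_2 = g1^2/(1 - g2) + g2
assert abs(rho[2] - (g1**2 / (1.0 - g2) + g2)) < 1e-12
print([round(r, 4) for r in rho[:4]])
```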

2.5.3 ACF for MA(q)

yt = εt − θ1 εt−1 − · · · − θq εt−q

λ0 = E[yt² ] = σε² (1 + Σ_{i=1}^{q} θi² )

λk = Cov[yt , yt−k ] = σε² (−θk + Σ_{i=k+1}^{q} θi−k θi ) for 1 ≤ k ≤ q, and λk = 0 for k > q.


NOTE: the ACF for an MA(q) drops abruptly to zero after lag q.

Exercises: Derive the ACF for an MA(1), then for an ARMA(1,1): (1 − γL)yt = (1 − θL)εt .

2.6 Partial Autocorrelation coefficients

The autocorrelation ρk is the gross correlation between yt and yt−k . The partial autocorrelation, ρ∗k , is the simple linear correlation between yt and yt−k after accounting for the effects of intervening lags.

E.g., for an AR(1), yt = γyt−1 + εt , the correlation coefficient is ρ2 = corr(yt , yt−2 ) = γ² . If we remove the effect of yt−1 on yt , then yt−2 will have no effect on yt .

ρ∗k can be calculated as the k th parameter estimate in an AR(k) process:

yt = ρ∗1 yt−1

yt = β1 yt−1 + ρ∗2 yt−2

yt = β1 yt−1 + β2 yt−2 + ρ∗3 yt−3

etc.

• ρ1 = ρ∗1 for any process.

• For an ARMA(p,0) process, ρ∗k = 0 for k > p.


PACF for ARMA(0,q): yt = µ + R(L)εt , and with invertibility R(L)⁻¹ (yt − µ) = εt . This implies

yt = µ/R(1) + Σ_{i=1}^{∞} πi yt−i + εt ,

which is an AR(∞). Given invertibility, the πi will tend to dampen as i grows.

2.7 Sample counterpart to the ACF, PACF

The sample ACF is often called the autocorrelogram or correlogram, and is calculated just as any standard correlation coefficient. Given that y is measured in deviations from its mean ȳ,

rk = Σ_{t=k+1}^{T} yt yt−k / Σ_{t=1}^{T} yt² .

The sample PACF, or partial autocorrelogram, could be estimated based on the progressive regressions above, but usually it is estimated in the following way: regress yt and yt−k (separately) on [1 yt−1 . . . yt−(k−1) ], and calculate the residuals, yt∗ and yt−k∗ , from these regressions. Then

rk∗ = Σ_{t=k+1}^{T} yt∗ yt−k∗ / Σ_{t=k+1}^{T} (yt−k∗ )² .

Seasonality in ARMA models The basics are a straightforward extension of basic ARMA processes. We will not discuss this in class, but understand how to implement seasonal differences and seasonal lags in an ARIMA framework. Refer to Greene section 18.3.5 (very short).


2.7.1 Summary of the relationship between ACF and PACF

Process      ACF                         PACF
AR(p)        infinite (dampens)          finite (zero after lag p)
MA(q)        finite (zero after lag q)   infinite (dampens)
ARMA(p,q)    infinite (dampens)          infinite (dampens)

Example: AR(1), MA(1)

        AR(1): yt = γyt−1 + εt              MA(1): yt = εt − θεt−1
ACF     ρk = γ^k                            ρ1 = −θ/(1 + θ²); ρi>1 = 0
PACF    yt = ρ∗1 yt−1 + εt ; ρ∗i>1 = 0      ρ∗k dampens geometrically in θ

2.8 Modeling procedure for ARIMA models

The Wold Decomposition theorem states that every zero-mean covariance stationary series C(L)yt = µ + R(L)εt can be represented as

yt = Σ_{i=1}^{p} γi yt−i + Σ_{i=0}^{∞} πi εt−i .

We cannot estimate the infinite series of πi , so we compromise and choose an AR(p) and an MA process with finite q to best fit the data.

The Box-Jenkins approach to ARIMA(p,d,q) modeling is often broken down

into 3 steps:

• identification, which refers to identification of the best lag structure (p,d,q)

for the series.

1. First determine the need to difference or de-trend using Dickey-Fuller

tests or related tests; then difference the data for stationarity.

2. Given a stationary series, we use estimated autocorrelation functions and

tests for white noise (i.i.d.) errors to determine the appropriate lag struc-

ture.


• estimation, which refers to the process of estimating the parameters based

on either OLS, or if necessary conditional NLS or ML methods.

• Forecasting based on the estimated parameters and their estimated vari-

ances.

NOTE: this class will only scratch the surface of ARIMA modeling.

Take STAT 516 for a more complete treatment.

2.8.1 Identification and testing for white noise

The goal of the Box-Jenkins approach is to specify the most parsimonious model

that provides you with white noise errors.

Once you have a stationary series, you can test for non-zero ACFs and PACFs. ACFs and PACFs will be approximately N(0, 1/T) under the null hypothesis of white noise. A test statistic for the joint test of whether all elements of the ACF (PACF) are white noise up to lag p is the Ljung-Box Q statistic:

Q′ = T (T + 2) Σ_{k=1}^{p} rk² /(T − k).
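The Q statistic is easy to compute from the sample ACF. A Python sketch (the series are simulated; under white noise Q is approximately χ²(p), and χ²(10) has a 5% critical value of about 18.3):

```python
import random

def ljung_box_q(series, p):
    # Q = T(T+2) * sum_{k=1}^p r_k^2 / (T-k), with r_k the sample ACF
    T = len(series)
    m = sum(series) / T
    y = [v - m for v in series]
    denom = sum(v * v for v in y)
    q = 0.0
    for k in range(1, p + 1):
        rk = sum(y[t] * y[t - k] for t in range(k, T)) / denom
        q += rk * rk / (T - k)
    return T * (T + 2) * q

random.seed(3)
noise = [random.gauss(0, 1) for _ in range(500)]
ar = [0.0]
for e in noise:
    ar.append(0.6 * ar[-1] + e)     # clearly autocorrelated series

# Q should be small for white noise and very large for the AR(1) series.
print(ljung_box_q(noise, 10), ljung_box_q(ar[1:], 10))
```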

Process of model identification for a stationary series:

1. generate the autocorrelogram and partial autocorrelogram for the series. If,

for any lag, the Q statistic leads to rejection of the null hypothesis of white

noise in the ACF and/or PACF, then add AR or MA model components based

on the structure of the ACF and PACF.

2. Test for white noise in the residuals of the ARIMA model you have specified.

If tests indicate white noise for all lags in the correlograms, then stop. Else,

respecify the model and check the residuals again for white noise.

3. The Akaike and Schwarz information criteria can also be used to help select a model specification.


2.8.2 Estimation of ARIMA models

The Wold Decomposition theorem states that every zero-mean covariance stationary series C(L)yt = µ + R(L)εt can be represented as

yt = Σ_{i=1}^{p} γi yt−i + Σ_{i=0}^{∞} πi εt−i .

If we are able to specify our model such that πi = 0 for all i (that is, no MA disturbance process), then the model

yt = Σ_{i=1}^{p} γi yt−i + εt ,   εt ∼ white noise

could be estimated consistently with OLS. If MA errors persist, nonlinear methods are required.

Consider a simple model with MA(1) errors, yt = µ + εt − θεt−1 , and start at t = 1.

y1 = µ + ε1 − θε0 → ε1 = y1 − µ + θε0
y2 = µ + ε2 − θε1 = (1 + θ)µ − θy1 − θ² ε0 + ε2
...
yt = µ Σ_{s=0}^{t−1} θ^s − Σ_{s=1}^{t−1} θ^s yt−s − θ^t ε0 + εt .

• If ε0 = 0 this is just a nonlinear regression function of µ and θ that explicitly

depends on the sample size. Thus, nonlinear least squares is often used.

• In practice, we don't know the true value of ε0 . Often ε0 = 0 is used, so the model is actually a conditional nonlinear least squares model. More sophisticated means of estimating ε0 are available too.

With more complicated ARIMA models, the estimated model can become quite

complicated.


2.8.3 Forecasting with ARIMA models

You will see some forecasting methods based on Kalman filters in the next section.

We will not cover them here.


3 Autoregressive distributed lag models

Reading: G[19], DM[13], K[18]

3.1 Distributed Lags

Distributed lags deal with the current and lagged effects of an independent variable

on the dependent variable. That is:

yt = α + β0 xt + β1 xt−1 + β2 xt−2 + . . . + et = α + Σ_{i=0}^{∞} βi xt−i + et

The effect of x on y is distributed over time:

• The immediate effect is β0 (AKA impact multiplier);

P
• The long-run effect over all future periods is βi (AKA equilibrium mul-

tiplier);

• The mean lag is Σ_{i=1}^{∞} iβi / Σ_{j=0}^{∞} βj = Σ_{i=1}^{∞} iwi , where wi = βi / Σ_{j} βj .

The problem with the above model: an infinite number of coefficients. Two

feasible approaches are:

• Assume βi = 0 for i > some finite number.

• Assume that βi can be written as a function of a finite number of parameters

for all i = 1 to ∞.

3.2 Finite distributed lags

Consider the model yt = α + β0 xt + . . . + βp xt−p + εt . No restrictions are placed on

the coefficients of the current and lagged values of x, but we need to decide on p.


t-tests are usually not good for selecting lag length because lagged values of x are likely to be highly correlated with current values, i.e. the t-tests will have low power.

Two better approaches, both based on the assumption that you know some

upper bound P for the lag length:

• Choose the lag length p ≤ P that maximizes R̄² or minimizes the Akaike Information Criterion, AIC = ln(e′e/T) + 2p/T.

• Start with high P and do F-tests for joint significance of the βi . Successively drop lags, and stop dropping as soon as H0 : βi = 0 ∀ i is rejected.

Both methods tend to “overfit” (leave too many lags in), so small significance levels should be used for the F-test (e.g. α = .01).
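The AIC approach can be sketched in Python. Everything below is illustrative: the data-generating process (true lags 0 and 1), the cap P = 4, and the penalty written as 2(p+1)/T to count all p + 1 coefficients; the OLS fit is done by solving the normal equations directly.

```python
import math
import random

def ols_sse(X, y):
    # Solve the normal equations (X'X) b = X'y by Gaussian elimination
    # with partial pivoting, then return the sum of squared residuals.
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):
        piv = max(range(i, k), key=lambda r_: abs(A[r_][i]))
        A[i], A[piv] = A[piv], A[i]
        c[i], c[piv] = c[piv], c[i]
        for r_ in range(i + 1, k):
            f = A[r_][i] / A[i][i]
            for j in range(i, k):
                A[r_][j] -= f * A[i][j]
            c[r_] -= f * c[i]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2
               for row, yi in zip(X, y))

random.seed(7)
T_all, P = 400, 4
x = [random.gauss(0, 1) for _ in range(T_all)]
y = [0.0] * T_all
for t in range(1, T_all):
    y[t] = 1.0 * x[t] + 0.5 * x[t - 1] + random.gauss(0, 0.5)  # true p = 1

aic = {}
for p in range(P + 1):
    # Same estimation sample for every p so the AICs are comparable
    rows = [[x[t - i] for i in range(p + 1)] for t in range(P, T_all)]
    yy = [y[t] for t in range(P, T_all)]
    T = len(yy)
    aic[p] = math.log(ols_sse(rows, yy) / T) + 2 * (p + 1) / T

best = min(aic, key=aic.get)
print(best)
```

Dropping a truly relevant lag (p = 0 here) raises the AIC sharply, while extra irrelevant lags raise it only through the penalty term.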

3.3 Geometric Lag models

Two models, the Adaptive Expectations Model and the Partial Adjustment

Model have been used a great deal in the literature. They are two specific models

that imply a specific form of infinite distributed lag effects called Geometric lags.

3.3.1 Partial Adjustment Model

Suppose the current value of the independent variables determines the desired value

or goal for the dependent variable:

yt∗ = α + βxt + εt ,

but only a fixed fraction of desired adjustment is accomplished in one period (it

takes time to build factories, restock diminished inventories, change institutional


structure). The partial adjustment function is:

yt − yt−1 = (1 − λ)(yt∗ − yt−1 ); |λ| < 1.

Rearrange:

yt∗ = (1 − λ)⁻¹ (1 − λL)yt ,

then replace yt∗ with the r.h.s. above and rearrange:

(1 − λL)yt = (1 − λ)(α + βxt + εt )

yt = α(1 − λ) + β(1 − λ)xt + λyt−1 + (1 − λ)εt

= α̃ + β̃xt + λyt−1 + ε̃t

This model is intrinsically linear in parameters, and the disturbances are uncorrelated if ε is uncorrelated; OLS is consistent and efficient.

3.3.2 Adaptive Expectations model

An Adaptive expectations model is based on a maintained hypothesis about how

expectations change. Example: When input decisions (supply decisions) are based

on expected future prices.

yt = α + βx∗t+1 + δwt + εt

x∗t+1 = λx∗t + (1 − λ)xt

x∗t is the expected value for xt evaluated at time t − 1, and 0 < λ < 1. The second equation implies that the change in expectations from t − 1 to t is proportional to the difference between the actual value of x in period t and last period's expectation about xt .

1. Rearrange the second equation to get x∗t+1 = (1 − λ)xt /(1 − λL).


2. Substitute x∗t+1 out of the first equation to get

yt = α + β (1 − λ)/(1 − λL) xt + δwt + εt
   = α + β(1 − λ) Σ_{i=0}^{∞} λ^i xt−i + δwt + εt
   = α + γzt (λ) + δwt + εt ,   γ = β(1 − λ).

This is in distributed lag form. Estimation proceeds recursively (as discussed in Greene p. 568). Briefly,

1. Construct a variable zt (λ) that satisfies zt (λ) = xt + λzt−1 . Use z1 (λ) = x1 (1 − λ).

2. Pick a set of λs in (0,1), calculate the z(λ)s, and include each as one of the variables in separate OLS regressions.

3. Choose the λ̂ that minimizes SSE.

4. Use computer search and/or optimization routines to do this.

Note that the disturbances satisfy the CLRM assumptions, and if they are i.i.d. normal, this recursive process (which minimizes SSE) is also Maximum Likelihood.
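The grid-search version of these steps can be sketched in Python (wt is omitted for brevity; the true λ, sample size, and noise level are illustrative, and for simplicity the recursion is initialized at z1 = x1 rather than the z1 = x1 (1 − λ) suggested above):

```python
import random

random.seed(11)
T = 600
true_lam, alpha, beta = 0.6, 1.0, 2.0
x = [random.gauss(0, 1) for _ in range(T)]

# Simulate y_t = alpha + beta*(1 - lam) * sum_i lam^i x_{t-i} + eps_t
z_true = []
for t in range(T):
    z_true.append(x[t] + (true_lam * z_true[-1] if t else 0.0))
y = [alpha + beta * (1 - true_lam) * z_true[t] + random.gauss(0, 0.3)
     for t in range(T)]

def sse_for(lam):
    # Build z_t(lam) = x_t + lam*z_{t-1}, then OLS of y on [1, z]
    z = []
    for t in range(T):
        z.append(x[t] + (lam * z[-1] if t else 0.0))
    mz, my = sum(z) / T, sum(y) / T
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    g = szy / szz
    a = my - g * mz
    return sum((yi - a - g * zi) ** 2 for zi, yi in zip(z, y))

grid = [i / 100 for i in range(1, 100)]
lam_hat = min(grid, key=sse_for)   # SSE-minimizing lambda on the grid
print(lam_hat)
```

The SSE-minimizing grid point should land close to the true λ = 0.6.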

The autoregressive form is

yt = α + β (1 − λ)/(1 − λL) xt + δwt + εt

(1 − λL)yt = α(1 − λ) + β(1 − λ)xt + δ(1 − λL)wt + (1 − λL)εt

yt = α̃ + β̃xt + δwt − λδwt−1 + λyt−1 + ut

where ut = εt − λεt−1 is a moving average error. Rather than the recursive approach discussed above, you could also use an Instrumental Variables approach: replace yt−1


with an appropriate instrument to ensure consistency.

3.4 Autoregressive Distributed Lag Models (ARDL)

The previous models are restrictive:

• The geometric lag is very restrictive regarding the relative impact of different lagged values of x.

• Unrestricted lags truncate the lag structure and eat up degrees of freedom.

The ARDL is a more general form that can accommodate and approximate a huge array of functional forms. An ARDL(p, r) is defined as

yt = µ + Σ_{i=1}^{p} γi yt−i + Σ_{j=0}^{r} βj xt−j + εt ,   ε ∼ i.i.d. ∀ t

C(L)yt = µ + B(L)xt + εt , where

C(L) = 1 − γ1 L − γ2 L² − · · · − γp L^p and
B(L) = β0 + β1 L + β2 L² + · · · + βr L^r

3.4.1 Estimation

Consider the simplest model with a lagged dependent variable:

yt = αyt−1 + εt

yt = εt /(1 − αL) = Σ_{i=0}^{∞} α^i εt−i

yt−1 = Σ_{i=1}^{∞} α^{i−1} εt−i (note the index now starts at 1).

So yt−1 is a function of εt−1 and all previous disturbances. The key orthogonality condition E[yt−1 εt ] = 0 holds if ε is i.i.d. (in this case the regressor is lagged y), because yt−1 is not


correlated with εt . We can therefore consistently estimate α. However, if εt is a function of past disturbances, then Cov[yt−1 , εt ] ≠ 0, and OLS is biased and inconsistent.

Show for yourself that Cov[yt−1 , εt ] ≠ 0 for a model with one lagged dependent variable on the right and an AR(1) disturbance.

3.4.2 Summary stats for effect of x on y

The equilibrium multiplier (long-run effect of a change in x) in the ARDL model generally is

Long-run multiplier = Σ_{i=0}^{∞} αi = A(1) = B(1)/C(1) = Σ_{i=0}^{r} βi / (1 − Σ_{i=1}^{p} γi ),

where A(L) = B(L)/C(L). Assuming no shocks (disturbances) and assuming stationarity, the long-run relationship among the variables in a regression is

ȳ = µ/C(1) + (B1 (1)/C(1)) X̄1 + (B2 (1)/C(1)) X̄2 + · · · + (Bk (1)/C(1)) X̄k

where ȳ and X̄i are constant values of y and Xi .


The Mean Lag is A′(L)/A(L) evaluated at L = 1.

3.4.3 Calculating the lag coefficients

Consider an ARDL(2,1):

yt = µ̃ + (β0 + β1 L)/(1 − γ1 L − γ2 L²) xt + ε̃t
   = µ̃ + (B(L)/C(L)) xt + ε̃t
   = µ̃ + A(L)xt + ε̃t
   = µ̃ + Σ_{i=0}^{∞} αi xt−i + ε̃t


αi is the direct effect of xt−i on yt ; the coefficient on L^i in A(L). Suppose we want to calculate the αi .

A(L)C(L) = B(L)

(α0 + α1 L + α2 L2 + . . . )(1 − γ1 L − γ2 L2 ) = (β0 + β1 L)

Expanding this over a subset of A(L),

(α0 −α0 γ1 L−α0 γ2 L2 )+(α1 L−α1 γ1 L2 −α1 γ2 L3 )+(α2 L2 −α2 γ1 L3 −α2 γ2 L4 )+· · · = β0 +β1 L

Now, collect terms for each lag length, respectively:

L0 : α0 = β0

L1 : −α0 γ1 + α1 = β1

L2 : −α0 γ2 − α1 γ1 + α2 = 0

L3 : −α1 γ2 − α2 γ1 + α3 = 0

Rearranging each line respectively gives αi as a function of the estimable parameters

βi and γi .

α0 = β0

α1 = β1 + α0 γ1 = β1 + β0 γ1

α2 = α0 γ2 + α1 γ1 = β0 γ2 + (β1 + β0 γ1 )γ1

α3 = α1 γ2 + α2 γ1 = etc. · · ·

αj = γ2 αj−2 + γ1 αj−1 = etc. · · · for j > 3 with an ARDL(2,1).
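The recursion above can be computed directly. A Python sketch with illustrative β and γ values; as a cross-check, the αi should sum toward the long-run multiplier B(1)/C(1) from the previous subsection.

```python
b0, b1 = 1.0, 0.5     # B(L) coefficients (illustrative)
g1, g2 = 0.4, 0.2     # C(L) coefficients (stationary region)

# alpha_0 = b0, alpha_1 = b1 + b0*g1, then
# alpha_j = g1*alpha_{j-1} + g2*alpha_{j-2}
alpha = [b0, b1 + b0 * g1]
for j in range(2, 12):
    alpha.append(g1 * alpha[j - 1] + g2 * alpha[j - 2])

# Long-run multiplier: A(1) = B(1)/C(1) = (b0 + b1)/(1 - g1 - g2)
lrm = (b0 + b1) / (1.0 - g1 - g2)
print(round(sum(alpha), 3), round(lrm, 3))
```

The partial sum of the αi approaches the long-run multiplier from below as more lags are included.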


3.4.4 Forecasting with ARDL

Consider an ARDL(2,1):

yT+1 |yT = µ + γ1 yT + γ2 yT−1 + β0 xT+1 + β1 xT + εT+1 = γ′xT+1 + εT+1 ,

where xT+1 = [1 yT yT−1 xT+1 xT ]′ and γ stacks the corresponding coefficients.

Because E[εT +1 ] = 0, ŷT +1 |yT is a consistent estimator of yT +1 |yT .

Var[e1|T ] = E[(yT+1 − ŷT+1 )² ] = x′T+1 σ² (X′X)⁻¹ xT+1 + σ²   [be able to show this]

Var̂[e1|T ] = x′T+1 s² (X′X)⁻¹ xT+1 + s² .

A forecast interval for yT+1 is ŷT+1 ± tα/2 √Var̂[e1|T ].

Kalman Filter simplifies extended forecasting

Assume that εT+1 is the only source of uncertainty. Stack the forecast in companion form:

[ŷT+1 , yT , yT−1 , . . . , yT−p+2 ]′ = [µ̂T+1 , 0, . . . , 0]′ + C [yT , yT−1 , . . . , yT−p+1 ]′ + [ε̂T+1 , 0, . . . , 0]′ ,

where C is the p × p companion matrix with first row (γ1 , γ2 , . . . , γp ) and ones on the subdiagonal. Compactly,

ŷT+1 = µ̂T+1 + CyT + ε̂T+1


where µ̂T +1 = µ + β0 xT +1 + · · · + βr xT +1−r is known with certainty (so forecasts are

conditional on xT +1 ).

 
Cov[ε̂T+1 ] = E[(ŷT+1 − yT+1 )(ŷT+1 − yT+1 )′ ] = σ² jj′ ,

where j = [1 0 0 · · · ]′ (so jj′ is a p × p matrix) and Var[εT+1 ] = Cov11 [ε̂T+1 ] = σ² .

Note: The forecast errors ε̂T +i are included above for intuition about the forecast

variance. When calculating the point estimates ŷT +i , set ε̂T +i to its expected value

of zero.

For T+2:

ŷT +2 = µ̂T +2 + CyT +1 + ε̂T +2

= µ̂T +2 + C(µ̂T +1 + CyT + ε̂T +1 ) + ε̂T +2

= µ̂T +2 + Cµ̂T +1 + C2 yT + (Cε̂T +1 + ε̂T +2 )

Cov[Cε̂T +1 + ε̂T +2 )] = σ 2 (Cjj0 C0 + jj0 )

and Var[ŷT +2 ] is the upper left element of σ 2 (Cjj0 C0 + jj0 ).

For F periods out (normalize T to T = 0):

ŷF = C^F y0 + Σ_{f=1}^{F} C^{f−1} [µ̂F−(f−1) + ε̂F−(f−1) ]

Var[ŷF ] = σ² [ jj′ + Σ_{i=1}^{F−1} C^i jj′ (C^i )′ ].


Example: ARDL(2,1):

[ŷT+1 , yT ]′ = [µ̂T+1 , 0]′ + C [yT , yT−1 ]′ ,

where C is the 2 × 2 matrix with rows (γ̂1 , γ̂2 ) and (1, 0), and µ̂T+1 = µ̂ + β̂0 xT+1 + β̂1 xT . Remember, for calculating forecasts, ε̂T+1 = 0.
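A Python sketch of the forecast iteration for this 2 × 2 companion form (the coefficients, starting values, and the µ̂ path are illustrative, and ε̂ is set to zero as noted above):

```python
g1, g2 = 0.5, 0.2
C = [[g1, g2], [1.0, 0.0]]   # companion matrix: rows (g1, g2) and (1, 0)

def step(mu, state):
    # One forecast step: new state is [mu + g1*y_t + g2*y_{t-1}, y_t]
    return [mu + C[0][0] * state[0] + C[0][1] * state[1], state[0]]

state = [2.0, 1.0]        # [y_T, y_{T-1}]
mus = [0.3, 0.3, 0.3]     # mu_{T+h}, treated as known (conditional on x)
for mu in mus:
    state = step(mu, state)

# state[0] is the three-step-ahead forecast yhat_{T+3}
print(state)
```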

3.4.5 Common Factor restrictions

An AR(1) model

yt = βxt + vt ; vt = ρvt−1 + εt

can be written as

yt = ρyt−1 + βxt − ρβxt−1 + εt

which is an ARDL(1,1) with a restriction on the coefficient on xt−1 .

AR(p) as a restricted ARDL(p,p): Let εt be an i.i.d. disturbance.

yt = βxt + vt , where vt = ρ1 vt−1 + · · · + ρp vt−p + εt ⇒ R(L)vt = εt .

yt = βxt + εt /R(L)
R(L)yt = βR(L)xt + εt
C(L)yt = βB(L)xt + εt for C(L) = B(L).

Implications

1. Any AR(p) disturbance in a static model can be interpreted as a restricted

version of an ARDL(p,p).


2. Finding an AR(p) error process in your regression results can be an indication of an unaccounted-for ARDL process (i.e. a misspecified model).

E.g., an ARDL(2,2) model is

yt = γ1 yt−1 + γ2 yt−2 + β0 xt + β1 xt−1 + β2 xt−2 + εt .

Test for an AR(2) as a restricted ARDL(2,2) by testing the joint restriction

f(b) = [β1 + γ1 β0 , β2 + γ2 β0 ]′ = [0, 0]′ .

CFRs using characteristic roots. A more flexible and general method of test-

ing the specification of ARDL models is based on the roots of the Lag operator

polynomials.

C(L) = (1 − γ1 L − γ2 L²) = (1 − λ1 L)(1 − λ2 L)
B(L) = β0 (1 − β1 L − β2 L²) = β0 (1 − τ1 L)(1 − τ2 L)

where the λi and τi are characteristic roots (note: we just arbitrarily changed the signs of β1 , β2 ). Then the ARDL(2,2) can be written as

(1 − λ1 L)(1 − λ2 L)yt = β0 (1 − τ1 L)(1 − τ2 L)xt + εt , or

yt = (λ1 + λ2 )yt−1 − (λ1 λ2 )yt−2 + β0 xt − β0 (τ1 + τ2 )xt−1 + β0 (τ1 τ2 )xt−2 + εt .


Now restrict λ1 = τ1 = ρ (the lag operator polynomials have a “common factor”).

The model becomes an AR(1):

(1 − ρL)(1 − λ2 L)yt = (1 − ρL)(1 − τ2 L)β0 xt + εt

(1 − λ2 L)yt = (1 − τ2 L)β0 xt + ut

where ut = εt /(1 − ρL), i.e. ut = ρut−1 + εt , an AR(1) error process.
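The factorization of C(L) above can be checked numerically: the characteristic roots λ1 , λ2 solve λ² − γ1 λ − γ2 = 0 (they are the inverses of the roots of the lag polynomial). A sketch with hypothetical coefficients:

```python
import numpy as np

# Characteristic roots of C(L) = 1 - g1*L - g2*L^2: solve l^2 - g1*l - g2 = 0.
# The coefficients g1, g2 are hypothetical.
g1, g2 = 0.7, 0.1
lam = np.roots([1.0, -g1, -g2])   # the two characteristic roots

# The factorization (1 - l1*L)(1 - l2*L) implies g1 = l1 + l2, g2 = -l1*l2.
sum_check = lam.sum()             # should equal g1
prod_check = -lam.prod()          # should equal g2
```

Both roots having modulus below one is the stationarity condition from earlier in the notes.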

Implications for estimation

1. The ARDL(2,2) has a white noise error, can be estimated consistently with

OLS.

2. The restricted model has a lagged dep. var and an AR(1) error — OLS is inconsistent.

Two possible approaches:

1. Estimate the unrestricted version (ARDL(p,r)); inefficient because the restriction is not imposed, but consistent.

2. Use IV with an instrument for the lagged dep. vars on the right-hand side; e.g., ŷt−1 from a regression of yt on xt , . . . , xt−p .

Question: How would you test for autocorrelated errors in an ARDL(p,r) model?

3.4.6 Error Correction Models (ECM)

ARDL(1,1): yt = µ + γyt−1 + β0 xt + β1 xt−1 + εt

subtract yt−1 : ∆yt = µ + (γ − 1)yt−1 + β0 xt + β1 xt−1 + εt

add,subt. β0 xt−1 : ∆yt = µ + (γ − 1)yt−1 + β0 ∆xt + (β0 + β1 )xt−1 + εt


 
Then multiply (β0 + β1 )xt−1 by (γ − 1)/(γ − 1) to get

∆yt = µ + β0 ∆xt + (γ − 1)(yt−1 − θxt−1 ) + εt ,  where θ = (β0 + β1 )/(1 − γ) = B(1)/C(1),

and B(1)/C(1) is the long-run multiplier we saw a while back. This is called an Error

correction model, or more precisely, the error correction form of the ARDL(1,1)

model. One more step:

∆yt = β0 ∆xt + γ̃[yt−1 − (µ̃ + θxt−1 )] + εt

where µ̃ = µ/(1 − γ) = −µ/(γ − 1) and γ̃ = (γ − 1). ∆yt is comprised of two components (plus

disturbance): a short run shock from ∆xt and a reversion toward equilibrium, or

equilibrium-error correction. To see this, note that in equilibrium yt = yt−1 = ȳ,

and xt = xt−1 = x̄, so ∆yt = 0 and ∆xt = 0. Then the ECM is

0 = γ̃[yt−1 − (µ̃ + θxt−1 )], so

ȳ = µ̃ + θx̄

Therefore, yt−1 − (µ̃ + θxt−1 ) represents deviation from the equilibrium relationship

y = µ̃ + θx. γ̃ = (γ − 1) is the marginal impact of this deviation on ∆yt .

Estimation: Assuming stationarity of y, all parameters of the ECM can be calculated based on estimates from the original ARDL(1,1) model. Alternatively, all

parameters of the ARDL(1,1) model can be calculated with the parameters from

the alternative specification

∆yt = α0 + α1 ∆xt + α2 yt−1 + α3 xt−1 + εt


The results will be identical. Covariances can be calculated using the Delta method

if necessary. You could also estimate the ECM model parameters directly via nonlinear least squares.
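The parameter mappings above can be sketched directly. The numerical values used for the ARDL(1,1) estimates here are hypothetical:

```python
# Map hypothetical ARDL(1,1) estimates (mu, gamma, beta0, beta1) into the ECM
# parameters, and recover them back from the alternative parameterization
#   dy_t = a0 + a1*dx_t + a2*y_{t-1} + a3*x_{t-1} + e_t.
mu, gamma, beta0, beta1 = 0.5, 0.8, 1.2, -0.4

gamma_tilde = gamma - 1.0                # speed-of-adjustment coefficient
theta = (beta0 + beta1) / (1.0 - gamma)  # long-run multiplier B(1)/C(1)
mu_tilde = mu / (1.0 - gamma)            # long-run intercept

# Coefficients of the alternative (unrestricted) parameterization:
a0, a1, a2, a3 = mu, beta0, gamma - 1.0, beta0 + beta1

# Inverting it recovers the ARDL(1,1) parameters exactly:
gamma_back, beta0_back, beta1_back = 1.0 + a2, a1, a3 - a1
```

This is the sense in which the two regressions give identical results: each parameter set is an exact function of the other.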

3.5 Cointegration

• The problem: a linear combination of variables will generally be integrated to the highest order among the variables.

• If any of the variables are I(d) with d > 0, then many or all parameter

estimates and associated t statistics may be biased. Consider the regression

yt = x′t β + εt . If x and y are integrated of different orders, then εt = yt − x′t β will not be stationary.

• The exception: if two or more of the series are integrated of the same order — drifting or trending at the same rate — then we may be able to find a linear combination of the variables that is I(0).

• If so, we can consistently estimate parameters and use standard inference

statistics without having to difference or de-trend the variables.

3.5.1 Example: trending variables

Consider two trending random variables: y1t = 3t + ut and y2t = t + vt , where vt

and ut are uncorrelated white noise errors. Both y1 and y2 are I(1), because their

first difference is stationary. Now consider the error process from a relationship


between y1 and y2 :

y1t = α y2t + εt

εt = y1t − α y2t = [1  −α] [y1t  y2t ]′

= (3t + ut ) − α(t + vt )

= (3 − α)t + (ut − α vt )

• This is a linear combination of two I(1) variables, and so would in most cases be I(1), and the variance of εt would explode as t increases (i.e. not stationary).

• However, if α = 3, then εt is I(0) — stationary — implying that y1 and y2 are cointegrated: they are integrated of the same order, with a linear combination that is I(0).


   
• [1  −α] = [1  −3] (or any multiple of it) is called a cointegrating vector of y1t and y2t .
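A quick simulation of the example above (the sample size and seed are arbitrary): the combination with α = 3 stays centered at zero, while any other α inherits the trend.

```python
import numpy as np

# y1 = 3t + u, y2 = t + v with white-noise u, v. The combination y1 - alpha*y2
# has deterministic part (3 - alpha)*t, which vanishes only at alpha = 3.
rng = np.random.default_rng(1)
t = np.arange(1000)
y1 = 3.0 * t + rng.standard_normal(1000)
y2 = 1.0 * t + rng.standard_normal(1000)

e_coint = y1 - 3.0 * y2     # cointegrating combination: pure noise
e_bad   = y1 - 2.0 * y2     # (3 - 2)*t + noise: trends upward with t

# Compare the mean level at the end vs. the start of the sample.
spread_coint = e_coint[-100:].mean() - e_coint[:100].mean()  # near zero
spread_bad   = e_bad[-100:].mean()   - e_bad[:100].mean()    # near 900
```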

3.5.2 Error Correction form and cointegration

The ARDL(1,1) model

yt = α′wt + γ yt−1 + β0 xt + β1 xt−1 + εt

can be written as

∆yt = α′wt + β0 ∆xt + γ∗ (yt−1 − θxt−1 ) + εt

∆yt = α′wt + β0 ∆xt + γ∗ zt + εt

where zt = yt−1 − θxt−1 , θ = −(β0 + β1 )/γ∗ , and γ∗ = (γ − 1).


• If y and x are I(1), and wt are I(0), then ε is I(0) if zt = yt−1 − θxt−1 is

I(0).

• Because θ is a function of the ARDL(1,1) parameters, a cointegrating relationship among the unrestricted ARDL parameters must hold for ε to be stationary.

• If such a relationship DOES hold, then ε will be stationary and we can estimate both the ARDL(1,1) form and the ECM form in a standard fashion (OLS, NLS, with standard sampling distributions for the parameter estimates) WITHOUT having to difference all the data.

• If a cointegrating relationship does NOT hold, then the disturbance process is not covariance stationary, and the parameter estimates do not have standard asymptotic sampling distributions.

Note that when there is a cointegrating relationship, the regression above:

∆yt = α′wt + β0 ∆xt + γ∗ zt + εt

is a regression of the I(0) variable ∆yt on other I(0) variables.

Generally: If an ARDL(p,r) can be reparameterized as an ECM model with I(0)

variables, then the parameters on those I(0) variables can be estimated consistently

with OLS applied to the original ARDL(p,r) model, and the t-statistics on these parameter estimates are asymptotically standard normal. If in a reformulated (ECM)

regression only a subset of the parameters is associated with I(0) variables, only that subset of parameter estimates has standard sampling distributions. The others don’t.

The next question: how do we know if a cointegrating relationship exists between the two variables?


3.5.3 Testing for cointegration

There are single-equation approaches and a multiple-equation approach to testing for cointegrating vectors. We begin with the single-equation approach and discuss the multiple-equation approach in the context of VARs.

Single-equation cointegration test. If two (or more) series are cointegrated (or already I(0)), then a regression of one on the others will produce a disturbance series that is I(0). The Engle-Granger cointegration test proceeds as follows:

• Calculate a DF test statistic based on the errors from your hypothesized regression: that is, run the regression ∆ε̂t = γ∗ ε̂t−1 + vt , and calculate γ̂∗ /se(γ̂∗ ).

• This test statistic does not have the same distribution as the usual DF test

statistic. You need to compare it to a different set of critical values developed

by Davidson and MacKinnon (1993) (Not shown in Greene).

• If a unit root is not rejected, then there is no evidence of cointegration, and inference about the model parameters based on that regression is suspect.

• Differencing of one variable or another is likely called for.
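A sketch of the two-step procedure on simulated data. The data-generating process and seed are invented for illustration, and in practice the t-statistic below must be compared with the Engle-Granger critical values (Davidson and MacKinnon 1993), not the standard DF table:

```python
import numpy as np

# Simulate a cointegrated pair: y2 follows a common I(1) trend, and
# y1 = 2*y2 + noise, so (1, -2) is a cointegrating vector.
rng = np.random.default_rng(2)
T = 5000
w = np.cumsum(rng.standard_normal(T))        # common random-walk trend
y2 = w + rng.standard_normal(T)
y1 = 2.0 * y2 + rng.standard_normal(T)

# Step 1: cointegrating regression of y1 on y2; keep residuals.
A = np.column_stack([np.ones(T), y2])
alpha = np.linalg.lstsq(A, y1, rcond=None)[0]
e = y1 - A @ alpha

# Step 2: DF-type regression on the residuals: de_t = g*e_{t-1} + v_t.
de, elag = np.diff(e), e[:-1]
g = (elag @ de) / (elag @ elag)
resid = de - g * elag
se = np.sqrt(resid @ resid / (len(de) - 1) / (elag @ elag))
t_stat = g / se   # strongly negative here => reject "no cointegration"
```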

3.6 Vector Autoregression, VAR

A VAR can be thought of as a reduced form for a system of dynamic equations.

Usefulness of the VAR framework:

• Forecasting

• Testing Granger Causality

• characterizing the time path of effects of shocks (impulse response).


Figure 2: Engle-Granger critical values

For two endogenous variables and known lag-length of p = 2, the VAR is a two-equation model structured as:

y1t = µ1 + δ111 y1t−1 + δ112 y2t−1 + δ211 y1t−2 + δ212 y2t−2 + ε1t

y2t = µ2 + δ121 y1t−1 + δ122 y2t−1 + δ221 y1t−2 + δ222 y2t−2 + ε2t

or

[ y1t ]   [ µ1 ]   [ δ111  δ112 ] [ y1t−1 ]   [ δ211  δ212 ] [ y1t−2 ]   [ ε1t ]
[ y2t ] = [ µ2 ] + [ δ121  δ122 ] [ y2t−1 ] + [ δ221  δ222 ] [ y2t−2 ] + [ ε2t ]

or

yt = µ + Γ1 yt−1 + Γ2 yt−2 + εt


where δjml is the coefficient for the j-th lag of the l-th endogenous variable in the m-th equation.

Estimation: VARs are systems of regression equations with interrelated errors, so SUR seems appropriate. However, because every equation has the same regressors and there are no cross-equation restrictions, SUR is mathematically equivalent to OLS equation by equation.

3.6.1 Granger Causality

yt = γyt−1 + βxt−1 + εt

• If β 6= 0 then x Granger-causes y in the regression above.

• Generally, if xt−1 adds information to yt in addition to that added by yt−1 , x

“Granger causes” y.

• Granger causality of x on y is absent when f (yt |yt−1 , yt−2 , . . . , xt−1 , xt−2 , . . .) = f (yt |yt−1 , yt−2 , . . .); lagged values of x add no additional information.

• This is a statistical relationship — it does not imply causation in any sense

more general than this.

Example 19.8 (Greene), but extended to two lags here: increased oil prices have preceded all but one recession since WWII. Let yt = (GNPt , POILt )′ .

[ GNPt  ]   [ µ1 ]   [ α1  α2 ] [ GNPt−1  ]   [ α3  α4 ] [ GNPt−2  ]   [ ε1t ]
[ POILt ] = [ µ2 ] + [ β1  β2 ] [ POILt−1 ] + [ β3  β4 ] [ POILt−2 ] + [ ε2t ]

If α2 = α4 = 0 then changes in oil prices do not “Granger cause” changes in GNP; otherwise they do.

Testing for GC: H0 : α2 = α4 = 0. We can use a likelihood ratio test based on the restricted and unrestricted regressions of the first equation (GNP) alone; no

need to estimate the second equation for this test. Test stat distributed χ2 (J = 2).
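A sketch of this single-equation test on simulated data in which x does Granger-cause y (the coefficients and seed are invented):

```python
import numpy as np

# LR-style Granger-causality test in the "GNP" equation alone: regress y_t on
# its own two lags (restricted) and additionally on two lags of x
# (unrestricted); T*log(SSR_r/SSR_u) is asymptotically chi^2(2).
rng = np.random.default_rng(3)
T = 2000
y, x = np.zeros(T), np.zeros(T)
for t in range(2, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.4 * y[t - 1] + 0.3 * x[t - 1] + rng.standard_normal()  # x -> y

Y = y[2:]
Xu = np.column_stack([np.ones(T - 2), y[1:-1], y[:-2], x[1:-1], x[:-2]])
Xr = Xu[:, :3]                                  # drop the lags of x

def ssr(X):
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    return np.sum((Y - X @ b) ** 2)

lr_stat = (T - 2) * np.log(ssr(Xr) / ssr(Xu))   # compare to chi2(2): 5% cv 5.99
```

With a true effect of x on y built into the simulation, the statistic should be far beyond the 5% critical value.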

3.6.2 Impulse Response Functions

Impulse response functions track the time path of y, relative to its equilibrium values, following a one-time shock from one or more of the disturbance terms εi .

[ y1t ]   [ µ1 ]   [ δ111  δ112 ] [ y1t−1 ]   [ δ211  δ212 ] [ y1t−2 ]   [ ε1t ]
[ y2t ] = [ µ2 ] + [ δ121  δ122 ] [ y2t−1 ] + [ δ221  δ222 ] [ y2t−2 ] + [ ε2t ]

or

yt = µ + Γ1 yt−1 + · · · + Γp yt−p + vt ,

where yt , µ, and vt are m×1 and each Γi is m×m.

For forecasting we can use the same Kalman filter arrangement as with the ARDL model before: recast the general model yt = µ + Σᵢ Γi yt−i + vt as

[ yt      ]   [ µ ]   [ Γ1   Γ2   · · ·  Γp ] [ yt−1 ]   [ vt ]
[ yt−1    ] = [ 0 ] + [ I    0    · · ·  0  ] [ yt−2 ] + [ 0  ]
[   ⋮     ]   [ ⋮ ]   [ ⋮         ⋱     0  ] [  ⋮   ]   [ ⋮  ]
[ yt−p+1  ]   [ 0 ]   [ 0    · · ·  I    0  ] [ yt−p ]   [ 0  ]

or

ỹt = µ̃ + Γ̃ ỹt−1 + ṽt .

Let Γ (L) = Γ1 L + Γ2 L², so that yt = µ + Γ1 yt−1 + Γ2 yt−2 + vt can be written as

yt = µ + Γ (L)yt + vt .


Assuming a stable system (and leaving out the tildes),

[I − Γ (L)]yt = µ + vt

yt = [I − Γ (L)]⁻¹ (µ + vt )

= [I − Γ (L)]⁻¹ µ + Σᵢ Γ^i vt−i

= ȳ + Σᵢ Γ^i vt−i

= ȳ + [I − Γ (L)]⁻¹ vt

Note: for y to be stationary, we need [I − Γ (L)]⁻¹ to be nonsingular — for this, all eigenvalues must be less than one in modulus, whether the eigenvalue(s) are real or complex. The modulus of a complex number h + vi is R = √(h² + v²).
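The stability condition can be checked by computing the eigenvalue moduli of the companion matrix. The coefficient matrices below are hypothetical:

```python
import numpy as np

# Stability check for a VAR(2) via its companion matrix: the system is stable
# when every eigenvalue has modulus sqrt(h^2 + v^2) strictly below one.
G1 = np.array([[0.5, 0.1], [0.2, 0.3]])   # hypothetical lag-1 coefficients
G2 = np.array([[0.1, 0.0], [0.0, 0.1]])   # hypothetical lag-2 coefficients

companion = np.block([[G1, G2],
                      [np.eye(2), np.zeros((2, 2))]])
moduli = np.abs(np.linalg.eigvals(companion))   # |h + vi| = sqrt(h^2 + v^2)
stable = bool(np.all(moduli < 1.0))
```

`np.abs` on a complex array returns exactly the modulus, so this handles complex eigenvalue pairs automatically.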

What we are interested in is how a one-time shock flows through to the yi,t+j . In general, a set of impulse response functions and its covariance matrix is calculated as

ŷT+s = ȳ + Γ^s vT

Σ̂T+s = Σ_{i=0}^{s−1} Γ^i Ω (Γ^i )′

where Γ^i is Γ raised to the ith power.

Example: Suppose a first order VAR with µ = 0 for both equations:

[ y1t ]   [ 0.008  0.461 ] [ y1t−1 ]   [ v1t ]                  [ 1   .5 ]
[ y2t ] = [ 0.232  0.297 ] [ y2t−1 ] + [ v2t ] ;  Ω = Cov[vt ] = [ .5   2 ]


Now, suppose a one unit change in v2t at t = 0, such that v20 = 1. Then

[ y10 ]   [ 0 ]
[ y20 ] = [ 1 ]

[ y11 ]   [ 0.008  0.461 ] [ y10 ]   [ 0.461 ]
[ y21 ] = [ 0.232  0.297 ] [ y20 ] = [ 0.297 ]

[ y12 ]   [ 0.008  0.461 ] [ y11 ]   [ 0.141 ]
[ y22 ] = [ 0.232  0.297 ] [ y21 ] = [ 0.195 ]

The covariance estimate for the two-period ahead impulse response is

Σ̂2 = Ω + Γ Ω Γ ′ ,

with Γ and Ω as above, which you can use for estimating a confidence interval.
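The arithmetic in the example above can be reproduced in a few lines:

```python
import numpy as np

# Reproduce the impulse-response example: a unit shock v2 = 1 at t = 0
# propagates through y_{t+1} = Gamma @ y_t.
Gamma = np.array([[0.008, 0.461],
                  [0.232, 0.297]])
Omega = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

y0 = np.array([0.0, 1.0])   # the shock itself
y1 = Gamma @ y0             # one period ahead
y2 = Gamma @ y1             # two periods ahead

# Covariance of the two-period-ahead response: Omega + Gamma Omega Gamma'.
Sigma2 = Omega + Gamma @ Omega @ Gamma.T
```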


3.6.3 Estimation of nonstationary cointegrated variables with VAR

A VAR can be applied to a set of nonstationary variables that are cointegrated (see Davidson and MacKinnon section 14.5).

Consider the VAR with g endogenous variables Yt :

Yt = Xt B + Σ_{i=1}^{p+1} Yt−i Φi + Ut

where Yt are I(1), Xt are assumed I(0) deterministic variables, and B and Φi are

matrices of parameters to be estimated. This VAR can be reparameterized as

∆Yt = Xt B + Yt−1 Π + Σ_{i=1}^{p} ∆Yt−i Γi + Ut

where Γp = −Φp+1 , Γi = Γi+1 − Φi+1 for i = 1, . . . , p − 1, and Π = Σ_{i=1}^{p+1} Φi − Ig . This

is the multivariate analogue of an augmented Dickey-Fuller test regression.

If the original variables Y are I(1) and the deterministic variables X are I(0),

then the rank r ≤ g of Π is equal to the number of cointegrating vectors. If r = 0,

there are no cointegrating vectors. r = g implies that all Y are stationary.

Note that the estimated value of Π will always be full rank (unless there is

perfect collinearity in the data to begin with, but then you wouldn’t be able to run

a regression in the first place). The question is: can we test whether our estimates suggest one or more cointegrating vectors?

3.6.4 VARs and cointegration tests

Cointegration tests become potentially more complicated because for g variables (Greene uses M ) in a VAR with I(1) variables, there can be up to g − 1 cointegrating vectors.

The Johansen test seems to be the most popular method for testing for cointegrating vectors in VARs. We will not go into detail about estimation, but there is


a brief discussion in Greene p. 656-657 and in Davidson and MacKinnon.

1. We need to determine the number r of linearly independent cointegrating vectors embedded in Π . This is done with successive tests of H0 : there are r or fewer cointegrating vectors, versus Ha : there are more than r cointegrating vectors (up to g).

2. For each r starting with zero, a trace statistic or max-eigenvalue statistic is calculated, depending on the approach. These statistics have nonstandard asymptotic distributions with tabulated critical values. Big statistic ⇒ reject H0 , and move on to the next r. When the statistic falls below the critical value, fail to reject the null.

Note that r > 1 implies more than one possible long-run relationship, represented by a number of possible parameterizations. This is similar to the case of

an overidentified structural equation. Indeed, a VAR is in effect a reduced form of

a dynamic structural equation. To identify which cointegrating relationship holds

requires out-of-sample structural information (as in structural models themselves).

3.6.5 Structural VARs

A VAR yt = µ + Γ yt−1 + vt can be seen as a reduced form of the structural model

Θyt = α + Φyt−1 + εt

where Γ = Θ⁻¹Φ, µ = Θ⁻¹α, vt = Θ⁻¹εt , and Cov[vt ] = [Θ⁻¹ ]Σ [Θ⁻¹ ]′ with Σ = Cov[εt ].

Thus, we are simply back to simultaneous equation systems, but with the issues of

dynamics and simultaneity combined.

Example: Suppose that

Θ = [   1    −θ12 ]
    [ −θ21     1  ]


Then we have a dynamic simultaneous equations problem, with all the lagged

dep. vars. being predetermined and therefore, for our purposes, exogenous.

Hsiao (1997) shows that if you have nonstationarity but cointegrating relationships in your model, then 2SLS and 3SLS can proceed as usual to address endogeneity.
