
State Space Models, Kalman Filter and Smoothing

The idea that any dynamic system can be expressed in a particular representation, called the state space representation, was proposed by Kalman. He presented an algorithm, a set of rules, to sequentially forecast and update a set of projections of the unknown state vector.
State space representation of a dynamic system: the general case
State space models were originally developed by control engineers to represent a dynamic system or dynamic linear models. Interest normally centers on an $(m \times 1)$ vector of variables, called the state vector, that may be signals from a satellite or the actual position of a missile or a rocket. The state vector represents the dynamics of the process. More precisely, it retains all the memory in the process: all the dependence between past and future must funnel through the state vector. The elements of the state vector may not have any specific economic meaning, but the state space approach is popular in economic applications involving modelling unobserved or latent variables, like permanent income, NAIRU (Non-Accelerating Inflation Rate of Unemployment), expected inflation, the state of the economy in business cycle analysis, etc. In most cases such signals are not observable directly, but such a vector of variables is related to an $(n \times 1)$ vector $z_t$ of variables that are actually observed, through an equation called the measurement equation or the observation equation, given by
$$z_t = A_t x_t + Y_t \alpha_t + N_t \qquad (1)$$
where $Y_t$ and $A_t$ are parameter matrices of order $(n \times m)$ and $(n \times k)$ respectively, $x_t$ is a $(k \times 1)$ vector of exogenous or pre-determined variables, and $N_t$ is an $(n \times 1)$ vector of disturbances with zero mean and covariance matrix $H_t$.
Although the state vector $\alpha_t$ is not directly observable, its movements are assumed to be governed by a well defined process, called the transition equation or state equation, given by
$$\alpha_t = T_t \alpha_{t-1} + R_t \eta_t, \qquad t = 1, \ldots, T, \qquad (2)$$
where $T_t$ and $R_t$ are matrices of order $(m \times m)$ and $(m \times g)$ respectively, and $\eta_t$ is a $(g \times 1)$ vector of disturbances with mean zero and covariance matrix $Q_t$.
Remarks:
1. Note that in the measurement equation we have an added disturbance term $N_t$. We need it if we assume that what we observe is contaminated by additional noise; otherwise we simply have
$$z_t = A_t x_t + Y_t \alpha_t. \qquad (3)$$
2. In those cases where we allow additional noise to be part of the measurement equation, we assume that the disturbances in the measurement and transition equations are mutually and serially uncorrelated at all time periods. Additionally, they are uncorrelated with the initial state vector $\alpha_0$. We summarize these assumptions as:
$$\begin{pmatrix} N_t \\ \eta_t \end{pmatrix} \sim WN\left(0, \begin{pmatrix} H_t & 0 \\ 0 & Q_t \end{pmatrix}\right), \qquad t = 0, 1, \ldots, T \qquad (4)$$
and
$$E(\alpha_0 \eta_t') = 0, \quad E(\alpha_0 N_t') = 0, \qquad t = 1, \ldots, T. \qquad (5)$$
3. $x_t$ is a $(k \times 1)$ vector of predetermined or exogenous variables. This means they may contain lagged values of $z$ as well as variables that are uncorrelated with $\eta_t$ and $N_t$.
4. The equations and assumptions about the error vectors are generally used to describe a finite series of observations $\{z_1, z_2, \ldots, z_T\}$, for which we need some assumptions about the initial value of the state vector, $\alpha_1$. We therefore assume that $\alpha_1$ is uncorrelated with any realizations of $\eta_t$ and $N_t$. That is,
$$E(\eta_t \alpha_1') = 0 \quad \text{for } t = 1, \ldots, T, \qquad (6)$$
$$E(N_t \alpha_1') = 0 \quad \text{for } t = 1, \ldots, T. \qquad (7)$$
Additionally, we also assume that $\eta_t$ is uncorrelated with lagged values of $\alpha_t$. That is,
$$E(\eta_t \alpha_\tau') = 0 \quad \text{for } \tau = t-1, t-2, \ldots, 1. \qquad (8)$$
Similarly,
$$E(N_t \alpha_\tau') = 0 \quad \text{for } \tau = 1, 2, \ldots, T, \qquad (9)$$
$$E(N_t z_\tau') = E\left[N_t (A_\tau x_\tau + Y_\tau \alpha_\tau + N_\tau)'\right] = 0 \quad \text{for } \tau = t-1, t-2, \ldots, 1, \qquad (10)$$
$$E(\eta_t z_\tau') = 0 \quad \text{for } \tau = t-1, t-2, \ldots, 1. \qquad (11)$$
These assumptions have been made to make the system quite flexible. Any assumption can be relaxed and the results generalized.
5. Notice that we have not said anything about the size of the matrices or the dimension of the state vector. Suffice it to say at this point that the dimensions have to be large enough so that the dynamics of the system can be captured by the simple first-order Markov structure of the state equation. From a technical point of view, the aim of the state space form is to set up $\alpha_t$ with as small a number of elements as possible. Such a state space set-up is called a minimal realization and it is a basic criterion for a good state space form.
6. In many cases of interest only one observation is available in each time period, that is, $z_t$ is now a scalar in the observation equation. Also, the transition matrix is much simpler than given before, in the sense that the parameters, in most cases including the variances, are assumed to be time invariant. Thus the transition equation now becomes
$$\alpha_t = T \alpha_{t-1} + R \eta_t, \qquad t = 1, \ldots, T, \qquad (12)$$
with
$$\eta_t \sim WN(0, \sigma^2 Q). \qquad (13)$$
7. For many applications using the Kalman filter, the vector of exogenous variables is simply not necessary. One may also assume that the variance of the noise term is time invariant, so that the general system now boils down to:
$$z_t = y_t' \alpha_t + N_t, \qquad t = 1, \ldots, T \qquad (14)$$
$$\alpha_t = T \alpha_{t-1} + R \eta_t, \qquad t = 1, \ldots, T. \qquad (15)$$
$z_t$ is now a scalar, $N_t \sim (0, \sigma^2 h)$, and $y_t'$ is a $(1 \times m)$ vector. In some state space applications, especially those that use ARMA models, the measurement error in the observation equation, i.e. $N_t$, is assumed to be zero. This means that $N_t$ in such applications will be absent.
8. There are many ways to write a given system in state space form. But whichever way it is written, if our primary interest is forecasting, we would get identical forecasts no matter which form we use. Note also that we can write any state space form as an ARMA model, so there is an equivalence between the two forms.
Examples of state space representation:
Example 1: First let us consider the general ARMA(p, q) model and see how it can be cast in state space form. Defining $m = \max(p, q+1)$, an ARMA(p, q) model can be written in the form
$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_m z_{t-m} + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_{m-1} e_{t-m+1} + e_t$$
where we interpret $\phi_j = 0$ for $j > p$ and $\theta_j = 0$ for $j > q$.
Then we can write the state and observation equations as follows:
State equation:
$$\alpha_t = \left(\begin{array}{c|c} \begin{matrix}\phi_1\\ \vdots\\ \phi_{m-1}\end{matrix} & I_{m-1}\\ \hline \phi_m & 0' \end{array}\right)\alpha_{t-1} + \begin{pmatrix}1\\ \theta_1\\ \vdots\\ \theta_{m-1}\end{pmatrix}e_t$$
Observation equation:
$$z_t = \begin{pmatrix} 1 & 0 & \ldots & 0 \end{pmatrix}\alpha_t.$$
The original model can be easily recovered by repeated substitution, starting at the bottom row of the state equation. We can easily note that the first element of the state vector is identically equal to the given model for $z_t$.
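This construction is mechanical for any $(p, q)$, so it is convenient to code once. The sketch below is a minimal illustration assuming NumPy is available; the function name `arma_state_space` and its interface are ours, not from the text.

```python
import numpy as np

def arma_state_space(phi, theta):
    """Build the state space matrices of Example 1 for an ARMA(p, q) model.

    phi   : list of AR coefficients  [phi_1, ..., phi_p]
    theta : list of MA coefficients  [theta_1, ..., theta_q]
    Returns T (m x m), R (m x 1) and y (1 x m), with m = max(p, q + 1).
    """
    p, q = len(phi), len(theta)
    m = max(p, q + 1)
    phi_full = np.r_[phi, np.zeros(m - p)]            # phi_j = 0 for j > p
    theta_full = np.r_[theta, np.zeros(m - 1 - q)]    # theta_j = 0 for j > q

    T = np.zeros((m, m))
    T[:, 0] = phi_full                  # first column holds phi_1, ..., phi_m
    T[:m - 1, 1:] = np.eye(m - 1)       # shifted identity block I_{m-1}
    R = np.r_[1.0, theta_full].reshape(m, 1)   # (1, theta_1, ..., theta_{m-1})'
    y = np.zeros((1, m)); y[0, 0] = 1.0        # z_t = first element of alpha_t
    return T, R, y

# ARMA(1, 1) with illustrative coefficients: m = 2, matching Example 3 below.
T, R, y = arma_state_space([0.5], [0.3])
print(T)            # [[0.5 1. ], [0.  0. ]]
print(R.ravel())    # [1.  0.3]
```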
Example 2: Let us consider next a univariate AR(p) process:
$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + e_t$$
where $\phi(B) = (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)$ is the AR operator and $e_t$ is white noise. This can be put in state space form by writing the $(m \times 1)$ state vector $\alpha_t$, where $m = p$ in the present case, as follows:
State equation:
$$\alpha_t = \left(\begin{array}{c|c} \begin{matrix}\phi_1\\ \vdots\\ \phi_{m-1}\end{matrix} & I_{m-1}\\ \hline \phi_m & 0' \end{array}\right)\alpha_{t-1} + \begin{pmatrix}1\\ 0\\ \vdots\\ 0\end{pmatrix}e_t$$
Observation equation:
$$z_t = \begin{pmatrix} 1 & 0 & \ldots & 0 \end{pmatrix}\alpha_t.$$
Defining $\alpha_t = (\alpha_{1t}\ \alpha_{2t}\ \ldots\ \alpha_{mt})'$ and substituting from the bottom row, we get the original AR model.
Example 3: Let us consider the following ARMA(1, 1) model, for which $m = 2$:
$$z_t = \phi_1 z_{t-1} + \theta_1 e_{t-1} + e_t.$$
For this model the state and the measurement equations are given below:
State equation:
$$\alpha_t = \begin{pmatrix} \phi_1 & 1 \\ 0 & 0 \end{pmatrix}\alpha_{t-1} + \begin{pmatrix} 1 \\ \theta_1 \end{pmatrix} e_t$$
and
Observation equation:
$$z_t = \begin{pmatrix} 1 & 0 \end{pmatrix}\alpha_t.$$
If we define $\alpha_t = (\alpha_{1t}\ \alpha_{2t})'$, then
$$\alpha_{2t} = \theta_1 e_t,$$
$$\alpha_{1t} = \phi_1 \alpha_{1,t-1} + \alpha_{2,t-1} + e_t = \phi_1 z_{t-1} + \theta_1 e_{t-1} + e_t,$$
and this is precisely the original model.
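That this representation reproduces the ARMA(1,1) recursion path for path can also be checked by a quick simulation; the following sketch uses illustrative parameter values ($\phi_1 = 0.7$, $\theta_1 = 0.4$) chosen by us, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
phi1, theta1 = 0.7, 0.4
T = np.array([[phi1, 1.0], [0.0, 0.0]])
R = np.array([1.0, theta1])
n = 200

e = rng.standard_normal(n)
alpha = np.zeros(2)                       # state vector alpha_t, started at zero
z_state, z_direct = np.zeros(n), np.zeros(n)
z_prev, e_prev = 0.0, 0.0
for t in range(n):
    alpha = T @ alpha + R * e[t]          # state equation
    z_state[t] = alpha[0]                 # observation: z_t = [1 0] alpha_t
    z_direct[t] = phi1 * z_prev + theta1 * e_prev + e[t]   # ARMA(1,1) recursion
    z_prev, e_prev = z_direct[t], e[t]

print(np.max(np.abs(z_state - z_direct)))   # ~0: the two paths coincide
```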
Example 4: As a final example, we shall consider the first order moving average model, assuming that the model has zero mean:
$$z_t = e_t + \theta_1 e_{t-1}.$$
Here $m = 2$, so that the state and measurement equations are given as follows:
State equation:
$$\alpha_t = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\alpha_{t-1} + \begin{pmatrix} 1 \\ \theta_1 \end{pmatrix} e_t$$
and
Observation equation:
$$z_t = \begin{pmatrix} 1 & 0 \end{pmatrix}\alpha_t.$$
If we define $\alpha_t = (\alpha_{1t}\ \alpha_{2t})'$, then $\alpha_{2t} = \theta_1 e_t$ and $\alpha_{1t} = \alpha_{2,t-1} + e_t = e_t + \theta_1 e_{t-1}$, and this is precisely the original model.
We have seen before that there are many ways of writing a given system in state space form. We shall here give an example of writing the AR(p) process in a different way.
Example 5: As before, let $m = p$. The state equation is given as:
State equation:
$$\underbrace{\begin{pmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-p+1} \end{pmatrix}}_{\alpha_t} = \underbrace{\begin{pmatrix} \phi_1 & \phi_2 & \ldots & \phi_{p-1} & \phi_p \\ 1 & 0 & \ldots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \ldots & \ldots & 1 & 0 \end{pmatrix}}_{T} \underbrace{\begin{pmatrix} z_{t-1} \\ z_{t-2} \\ \vdots \\ z_{t-p} \end{pmatrix}}_{\alpha_{t-1}} + \underbrace{\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}}_{R} e_t$$
Observation equation:
$$(z_t) = \underbrace{\begin{pmatrix} 1 & 0 & \ldots & 0 \end{pmatrix}}_{y_t'} \underbrace{\begin{pmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-p+1} \end{pmatrix}}_{\alpha_t}$$
In this case, by carrying out the matrix multiplication on the RHS of the state equation, we can see that the first row gives the original AR model and the rest are trivial identities, as is the observation equation.
Example 6: Let us take the ARMA(p, q) model that we have seen before:
$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_m z_{t-m} + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_{m-1} e_{t-m+1} + e_t$$
where we interpret $\phi_j = 0$ for $j > p$ and $\theta_j = 0$ for $j > q$. We shall re-write it in a way different from what we saw in Example 1. Let $m = \max(p, q+1)$. Then we can write the state equation and observation equation as follows:
State equation:
$$\alpha_{t+1} = \begin{pmatrix} \phi_1 & \phi_2 & \ldots & \phi_{m-1} & \phi_m \\ 1 & 0 & \ldots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \ldots & 1 & 0 \end{pmatrix}\alpha_t + \begin{pmatrix} e_{t+1} \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Observation equation:
$$z_t = \mu + \begin{pmatrix} 1 & \theta_1 & \ldots & \theta_{m-1} \end{pmatrix}\alpha_t.$$
We shall take the ARMA(1, 1) model and see how to write the state space form as given in Example 6 and retrieve the original model. For ARMA(1, 1), $m = 2$. So the state and the observation equations are:
State equation:
$$\alpha_{t+1} = \begin{pmatrix} \phi_1 & 0 \\ 1 & 0 \end{pmatrix}\alpha_t + \begin{pmatrix} 1 \\ 0 \end{pmatrix} e_{t+1},$$
Observation equation:
$$z_t = \mu + \begin{pmatrix} 1 & \theta_1 \end{pmatrix}\alpha_t.$$
Starting from the second row of the state equation, we have
$$\alpha_{2,t+1} = \alpha_{1,t}.$$
The first row of the state equation implies that
$$\alpha_{1,t+1} = \phi_1 \alpha_{1,t} + e_{t+1} \quad \text{or} \quad \left(1 - \phi_1 B\right)\alpha_{1,t+1} = e_{t+1}. \qquad (1)$$
The observation equation states that
$$z_t = \mu + \alpha_{1,t} + \theta_1 \alpha_{2,t} = \mu + \alpha_{1,t} + \theta_1 \alpha_{1,t-1} = \mu + \left(1 + \theta_1 B\right)\alpha_{1,t}. \qquad (2)$$
Multiplying (2) by $\left(1 - \phi_1 B\right)$ gives:
$$\left(1 - \phi_1 B\right)\left(z_t - \mu\right) = \left(1 - \phi_1 B\right)\left(1 + \theta_1 B\right)\alpha_{1,t} = \left(1 + \theta_1 B\right)e_t \quad [\text{from (1)}],$$
which is the given model.
Example 7: Let us take an example of a state space formulation for an economic problem. Fama and Gibbons (Journal of Monetary Economics, 1982, 9, pp. 297-323) use the state space idea to study the behaviour of the ex-ante real interest rate (defined as the nominal interest rate, $i_t$, minus the expected inflation rate, $\pi^e_t$). This is unobservable because we do not have data on the anticipated rate of inflation. Thus, the state variable is
$$\alpha_t = i_t - \pi^e_t - \delta,$$
where $\delta$ is the average ex-ante real interest rate. Fama and Gibbons assume that the ex-ante real interest rate follows an AR(1) process:
$$\alpha_{t+1} = \phi\alpha_t + e_{t+1}.$$
But an econometrician has data on the ex-post real interest rate (that is, the nominal interest rate, $i_t$, minus the actual rate of inflation, $\pi_t$). That is,
$$i_t - \pi_t = \left(i_t - \pi^e_t\right) + \left(\pi^e_t - \pi_t\right) = \delta + \alpha_t + \eta_t,$$
where $\eta_t = \pi^e_t - \pi_t$ is the error agents made in forecasting inflation. If people forecast optimally, then $\eta_t$ should be free of autocorrelation and should be uncorrelated with the ex-ante real interest rate.
Kalman Filter: An Overview
Consider the system given by the following equations:
$$z_t = y_t'\alpha_t + N_t, \qquad t = 1, \ldots, T$$
$$\alpha_t = T\alpha_{t-1} + R\eta_t, \qquad t = 1, \ldots, T.$$
Given this, our objectives could be either to obtain the values of the unknown parameters or, given the parameter vectors, to obtain the linear least squares forecasts of the state vector on the basis of the observed data. The Kalman filter (KF hereafter) has many uses. We are utilising it as an algorithm to evaluate the components of the likelihood function. Kalman filtering follows a two-step procedure. In the first step, the optimal predictor for the next observation is formed, based on all the information currently available. This is done by the prediction equation. In the second step, the moment a new observation becomes available, it is incorporated into the estimator of the state vector using the updating equation. These two equations collectively form the Kalman filter equations. Applied recursively, the KF provides an optimal solution to the twin problems of prediction and updating. Assuming that the observations are normally distributed, and also assuming that the current estimator of the state vector is the best available, the prediction and the updating estimators are the best. By best, we mean the estimators have the minimum mean squared error (MMSE). It is evident that the process of predicting the next observation and updating it as soon as the actual value becomes available has an interesting by-product: the prediction error. We have seen in the chapter on estimation how a set of dependent observations can be decomposed in terms of the prediction errors. The KF gives us a natural mechanism to carry out this decomposition.
Kalman filter recursions: main equations
We shall use $a_t$ to denote the MMSE estimator of $\alpha_t$ based on all information up to and including the current observation $z_t$. Similarly, $a_{t|t-1}$ is the MMSE estimator of $\alpha_t$ at time $t-1$. That is, $a_{t|t-1} = E(\alpha_t \,|\, I_{t-1})$.
Prediction:
At time $t-1$, all available information, including $z_{t-1}$, is incorporated in $a_{t-1}$, which is the MMSE estimator of $\alpha_{t-1}$. The estimation error has covariance matrix $\sigma^2 P_{t-1}$. More precisely,
$$\sigma^2 P_{t-1} = E\left[\left(\alpha_{t-1} - a_{t-1}\right)\left(\alpha_{t-1} - a_{t-1}\right)'\right].$$
From
$$\alpha_t = T\alpha_{t-1} + R\eta_t,$$
we get that at time $t-1$ the MMSE estimator of $\alpha_t$ is given by
$$a_{t|t-1} = Ta_{t-1},$$
so that the estimation error, or sampling error, is given by
$$\left(\alpha_t - a_{t|t-1}\right) = T\left(\alpha_{t-1} - a_{t-1}\right) + R\eta_t.$$
The right-hand side of this estimation error has zero expectation. We note here that an estimator is unconditionally unbiased (u-unbiased) if its estimation error has zero expectation. And when an estimator is u-unbiased, its MSE matrix $E\left[\left(\alpha_t - a_{t|t-1}\right)\left(\alpha_t - a_{t|t-1}\right)'\right]$ is identical to the covariance matrix of the estimation error. Hence we can write the covariance of the estimation error as:
$$\begin{aligned}
E\left[\left(\alpha_t - a_{t|t-1}\right)\left(\alpha_t - a_{t|t-1}\right)'\right] &= E\left[\left\{T\left(\alpha_{t-1} - a_{t-1}\right) + R\eta_t\right\}\left\{T\left(\alpha_{t-1} - a_{t-1}\right) + R\eta_t\right\}'\right] \\
&= TE\left[\left(\alpha_{t-1} - a_{t-1}\right)\left(\alpha_{t-1} - a_{t-1}\right)'\right]T' + TE\left[\left(\alpha_{t-1} - a_{t-1}\right)\eta_t'\right]R' \\
&\quad + RE\left[\eta_t\left(\alpha_{t-1} - a_{t-1}\right)'\right]T' + RE\left[\eta_t\eta_t'\right]R' \\
&= \sigma^2\, TP_{t-1}T' + \sigma^2\, RQR'.
\end{aligned}$$
Thus,
$$\left(\alpha_t - a_{t|t-1}\right) \sim WS\left(0, \sigma^2 P_{t|t-1}\right)$$
where
$$P_{t|t-1} = TP_{t-1}T' + RQR'$$
and where WS stands for wide sense. (Weak stationarity is sometimes referred to as wide sense stationarity.)
Now, given that $a_{t|t-1}$ is the MMSE estimator of $\alpha_t$ at time $t-1$, the MMSE of $z_t$ at time $t-1$ is clearly
$$z_{t|t-1} = y_t' a_{t|t-1}.$$
The associated prediction error is
$$\left(z_t - z_{t|t-1}\right) = \nu_t = y_t'\left(\alpha_t - a_{t|t-1}\right) + N_t,$$
the expectation of which is zero. Hence,
$$\operatorname{var}\nu_t = E(\nu_t^2) = E\left[y_t'\left(\alpha_t - a_{t|t-1}\right)\left(\alpha_t - a_{t|t-1}\right)'y_t\right] + E(N_t^2)$$
[since the cross-product terms have zero expectations]
$$= \sigma^2 y_t' P_{t|t-1} y_t + \sigma^2 h = \sigma^2 f_t.$$
Deriving the state updating equations is involved and hence the important steps are relegated to the appendix; we state only the main equations below:
Updating equation:
$$a_t = a_{t|t-1} + P_{t|t-1} y_t\left(z_t - y_t' a_{t|t-1}\right)/f_t.$$
And the estimation error satisfies
$$\left(\alpha_t - a_t\right) \sim WS\left(0, \sigma^2 P_t\right)$$
where
$$P_t = P_{t|t-1} - P_{t|t-1} y_t y_t' P_{t|t-1}/f_t$$
and $f_t = y_t' P_{t|t-1} y_t + h$.
We have to highlight the following points.
1. Note the role played by the prediction error, $\nu_t = \left(z_t - y_t' a_{t|t-1}\right)$, and the variance associated with it, $\sigma^2 f_t$.
2. Note also the $(m \times 1)$ vector $\left(P_{t|t-1} y_t / f_t\right)$, which is called the Kalman gain.
3. In the discussion so far, we have assumed the presence of an additional noise in the measurement equation; that is, $h > 0$. But we also have to note that, in our examples of state space representation of ARMA models, we have assumed that the measurement equation has no additional error. That is, $N_t$ is assumed to be zero, implying that $h$, the variance of the measurement error term, will be zero. However, this should not matter, since through these adjustments we have isolated $h$ as an additive scalar, which, when it becomes zero, does not affect our calculations. (Note the expression for $f_t$.)
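The prediction and updating equations translate almost line by line into code. The following is a minimal sketch of the recursions for the time-invariant system (14)-(15), written with NumPy; the function name and interface are ours, and Q and h are the scaled covariances of Remark 7. The Kalman gain $P_{t|t-1} y_t/f_t$ of point 2 appears explicitly in the loop.

```python
import numpy as np

def kalman_filter(z, T, R, y, Q, h, a0, P0):
    """Kalman recursions for z_t = y' alpha_t + N_t, alpha_t = T alpha_{t-1} + R eta_t.

    z is the (T,) array of observations; a0, P0 are the starting values a_{1|0}, P_{1|0}.
    Returns the prediction errors nu_t, their scaled variances f_t, and the updated a_t.
    """
    m = len(a0)
    a_pred, P_pred = a0.copy(), P0.copy()
    nu, f, a_upd = np.zeros(len(z)), np.zeros(len(z)), np.zeros((len(z), m))
    for t, zt in enumerate(z):
        # prediction error and its (scaled) variance f_t = y' P_{t|t-1} y + h
        nu[t] = zt - y @ a_pred
        f[t] = y @ P_pred @ y + h
        # updating equations: the Kalman gain is P_{t|t-1} y / f_t
        gain = P_pred @ y / f[t]
        a = a_pred + gain * nu[t]
        P = P_pred - np.outer(P_pred @ y, y @ P_pred) / f[t]
        a_upd[t] = a
        # prediction equations for the next period
        a_pred = T @ a
        P_pred = T @ P @ T.T + R @ Q @ R.T
    return nu, f, a_upd
```

Here y is passed as a one-dimensional array of length m; a usage of this sketch for evaluating the likelihood is shown later, after the prediction error decomposition.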
ML Estimation of ARMA models
The literature has many algorithms aimed at simplifying the computation of the components of the likelihood. One approach is to use the Kalman filter recursions. Other useful algorithms are by Newbold (Biometrika, 1974, Vol. 61, 423-26) and the innovations algorithm, suggested by Ansley (Biometrika, 1979, Vol. 66, 59-65).
KF recursions are useful for a number of purposes. But our emphasis will be on understanding how these recursions (1) can be used to construct linear least squares forecasts of the state vector on the basis of data observed through time $t$, and (2) use the resulting prediction error and its variance to build the components of the likelihood function. In our derivation so far, we have motivated the discussion of the Kalman filter in terms of linear projections of the state vector, $\alpha_t$, and the observed time series, $z_t$. These are linear forecasts and are optimal among all functions of the data if we assume that the state vector and the disturbances are multivariate Gaussian. Our main aim is to see how the KF recursions calculate these forecasts recursively, generating $a_{1|0}, a_{2|1}, \ldots, a_{T|T-1}$ and $P_{1|0}, P_{2|1}, \ldots, P_{T|T-1}$ in succession.
How do we start the recursions?
To start the recursions, we need $a_{1|0}$. This means we should get the first-period forecast of $\alpha$ based on an information set. Since we don't have information on the zeroth period, we take the unconditional expectation
$$a_{1|0} = E\left(\alpha_1\right),$$
where the associated estimation error has zero mean and covariance matrix $\sigma^2 P_{1|0}$.
Let us explain this with the help of an example.
Example 8: Let us take the simplest MA(1) model:
$$z_t = e_t + \theta_1 e_{t-1}.$$
We have shown before that the state vector is simply
$$\alpha_t = \begin{pmatrix} z_t \\ \theta_1 e_t \end{pmatrix}$$
and hence
$$a_{1|0} = E\begin{pmatrix} z_1 \\ \theta_1 e_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
And the associated variance matrix of the estimation error, $\sigma^2 P_0$ or $\sigma^2 P_{1|0}$, is simply $E(\alpha_1\alpha_1')$, so that we have
$$P_{1|0} = \sigma^{-2} E\left(\alpha_1\alpha_1'\right) = \sigma^{-2} E\left[\begin{pmatrix} z_1 \\ \theta_1 e_1 \end{pmatrix}\begin{pmatrix} z_1 & \theta_1 e_1 \end{pmatrix}\right] = \begin{pmatrix} 1+\theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{pmatrix}.$$
While one can work out by hand the covariance matrix for the initial state vector for pure MA models, this turns out to be too tedious for higher order mixed models. So we need a closed-form solution to calculate this matrix. We get such a solution by generalising this. Generalisation is easy if we can make prior assumptions about the distribution of the state vector.
Two categories of state vector can be distinguished, depending on whether or not the state vector is covariance stationary. If it is, then the distribution of the state vector is readily available, and with that the problem of starting values can be easily resolved. With the assumption that the state vector is covariance stationary, one can easily check from the state equation that the unconditional mean of the state vector is zero. That is, from the state equation one can easily see that
$$E\left(\alpha_t\right) = 0,$$
and the unconditional variance of $\alpha_t$ is easily seen to be
$$E\left(\alpha_t\alpha_t'\right) = E\left[\left(T\alpha_{t-1} + R\eta_t\right)\left(T\alpha_{t-1} + R\eta_t\right)'\right].$$
Let us denote the (scaled) LHS of the above expression by $\Sigma = \sigma^{-2}E(\alpha_t\alpha_t')$. Noting that the state vector depends on shocks only up to $t-1$, we get
$$\Sigma = T\Sigma T' + RQR'.$$
Though this can be solved in many ways, a direct closed-form solution is given by the following matrix lemma. We use the vec operator and the following result.
Proposition: Let $A$, $B$ and $C$ be matrices such that the product $ABC$ exists. Then
$$\operatorname{vec}\left(ABC\right) = \left(C' \otimes A\right)\operatorname{vec}\left(B\right).$$
Thus, we vectorize both sides of the expression for $\Sigma$ and rearrange to get a closed-form solution:
$$\operatorname{vec}\left(\Sigma\right) = \left[I_{m^2} - \left(T \otimes T\right)\right]^{-1}\operatorname{vec}\left(RQR'\right).$$
What this implies is that, provided the process is covariance stationary, the Kalman filter recursions can be started with $a_{1|0} = 0$, and the $(m \times m)$ matrix $P_{1|0}$, whose elements can be expressed as a column vector, is obtained from:
$$\operatorname{vec}\left(P_{1|0}\right) = \left[I_{m^2} - \left(T \otimes T\right)\right]^{-1}\operatorname{vec}\left(RQR'\right).$$
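The closed form is easy to evaluate numerically. A short sketch, assuming NumPy, computes $\operatorname{vec}(P_{1|0})$ for the MA(1) representation of Example 4 with an illustrative $\theta_1$ of our choosing, and confirms that it reproduces the matrix worked out by hand in Example 8.

```python
import numpy as np

theta1 = 0.3                               # illustrative MA(1) coefficient (assumed)
T = np.array([[0.0, 1.0], [0.0, 0.0]])
R = np.array([[1.0], [theta1]])
Q = np.array([[1.0]])
m = T.shape[0]

# vec(P_{1|0}) = [I_{m^2} - (T kron T)]^{-1} vec(R Q R')
vecP = np.linalg.solve(np.eye(m**2) - np.kron(T, T), (R @ Q @ R.T).ravel())
P10 = vecP.reshape(m, m)

# Hand-derived result from Example 8: [[1 + theta^2, theta], [theta, theta^2]]
print(np.allclose(P10, [[1 + theta1**2, theta1], [theta1, theta1**2]]))   # True
```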
The best way to get a grasp of the Kalman recursions is to try them out on a simple model. Let us try them on the simple MA(1) model.
Example 9: Assume for convenience that the process has zero mean, so the MA(1) model can be written as
$$z_t = e_t + \theta_1 e_{t-1}.$$
Here $m = 2$. So from Example 4 we have the state and the measurement equations as follows:
State equation:
$$\alpha_t = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\alpha_{t-1} + \begin{pmatrix} 1 \\ \theta_1 \end{pmatrix} e_t$$
and
Observation equation:
$$z_t = \begin{pmatrix} 1 & 0 \end{pmatrix}\alpha_t.$$
Note that the observation equation has no error. How do we start the recursions? Recall from the prediction equation that we first have to get $a_{t|t-1}$. That is, for the first period we need $a_{1|0}$, the initial state vector. From our discussion of the covariance stationarity properties of the state vector, it is clear that
$$a_{1|0} = Ta_0 = 0.$$
Next we have to calculate the covariance matrix of the estimation error, i.e. $\sigma^2 P_{1|0}$ or $\sigma^2 P_0$. Though we have a formula to calculate such matrices, for the present problem one can find it directly:
$$P_{1|0} = P_0 = \sigma^{-2}E\left(\alpha_1\alpha_1'\right) = \sigma^{-2}E\left[\begin{pmatrix} z_1 \\ \theta_1 e_1 \end{pmatrix}\begin{pmatrix} z_1 & \theta_1 e_1 \end{pmatrix}\right] = \begin{pmatrix} 1+\theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2 \end{pmatrix}.$$
Let us calculate the prediction error for $z_1$. One can easily see that $z_{1|0} = 0$, hence the associated prediction error is $\nu_1 = z_1$ itself and the prediction error variance is given by:
$$\operatorname{var}\left(\nu_1\right) = \sigma^2\begin{pmatrix}1 & 0\end{pmatrix}P_{1|0}\begin{pmatrix}1 \\ 0\end{pmatrix} = \sigma^2\begin{pmatrix}1 & 0\end{pmatrix}\begin{pmatrix}1+\theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2\end{pmatrix}\begin{pmatrix}1 \\ 0\end{pmatrix} = \sigma^2\left(1+\theta_1^2\right), \quad \text{with } f_1 = 1+\theta_1^2.$$
Application of the updating formula gives:
$$a_1 = \begin{pmatrix}1+\theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}\frac{z_1}{1+\theta_1^2} = \begin{pmatrix}\left(1+\theta_1^2\right)z_1 \\ \theta_1 z_1\end{pmatrix}\frac{1}{1+\theta_1^2} = \begin{pmatrix}z_1 \\ \theta_1 z_1/\left(1+\theta_1^2\right)\end{pmatrix}.$$
Similarly,
$$P_1 = \begin{pmatrix}1+\theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2\end{pmatrix} - \begin{pmatrix}1+\theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}\begin{pmatrix}1 & 0\end{pmatrix}\begin{pmatrix}1+\theta_1^2 & \theta_1 \\ \theta_1 & \theta_1^2\end{pmatrix}\frac{1}{1+\theta_1^2} = \begin{pmatrix}0 & 0 \\ 0 & \theta_1^4/\left(1+\theta_1^2\right)\end{pmatrix}.$$
Prediction equations for $t = 2$:
$$a_{2|1} = Ta_1 = \begin{pmatrix}0 & 1\\0 & 0\end{pmatrix}\begin{pmatrix}z_1 \\ \theta_1 z_1/\left(1+\theta_1^2\right)\end{pmatrix} = \begin{pmatrix}\theta_1 z_1/\left(1+\theta_1^2\right)\\ 0\end{pmatrix}.$$
And,
$$P_{2|1} = \begin{pmatrix}0 & 1\\0 & 0\end{pmatrix}\begin{pmatrix}0 & 0\\0 & \theta_1^4/\left(1+\theta_1^2\right)\end{pmatrix}\begin{pmatrix}0 & 0\\1 & 0\end{pmatrix} + \begin{pmatrix}1 & \theta_1\\ \theta_1 & \theta_1^2\end{pmatrix} = \begin{pmatrix}\theta_1^4/\left(1+\theta_1^2\right) & 0\\0 & 0\end{pmatrix} + \begin{pmatrix}1 & \theta_1\\ \theta_1 & \theta_1^2\end{pmatrix} = \begin{pmatrix}\dfrac{1+\theta_1^2+\theta_1^4}{1+\theta_1^2} & \theta_1\\ \theta_1 & \theta_1^2\end{pmatrix}.$$
Predicting $z_2$:
$$z_{2|1} = \begin{pmatrix}1 & 0\end{pmatrix}\begin{pmatrix}\theta_1 z_1/\left(1+\theta_1^2\right)\\0\end{pmatrix} = \theta_1 z_1/\left(1+\theta_1^2\right).$$
Prediction error $\nu_2$:
$$\nu_2 = z_2 - \theta_1 z_1/\left(1+\theta_1^2\right),$$
and
$$f_2 = \begin{pmatrix}1 & 0\end{pmatrix}\begin{pmatrix}\dfrac{1+\theta_1^2+\theta_1^4}{1+\theta_1^2} & \theta_1\\ \theta_1 & \theta_1^2\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = \frac{1+\theta_1^2+\theta_1^4}{1+\theta_1^2}.$$
These steps show that, for the MA(1) model, one can calculate the prediction error and its variance using the following recursions:
$$\nu_t = z_t - \frac{\theta_1\nu_{t-1}}{f_{t-1}}, \qquad t = 1, 2, \ldots, T, \quad \text{where } \nu_0 = 0,$$
and
$$f_t = 1 + \frac{\theta_1^{2t}}{1+\theta_1^2+\cdots+\theta_1^{2(t-1)}}.$$
Note here that the expressions for the prediction error $\nu_t$ and the prediction error variance $f_t$ are exactly the same as those obtained using triangular factorization for the MA(1) model.
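The scalar recursion is easy to code directly as a check on the algebra. The sketch below, with an illustrative $\theta_1$ and simulated data (our choices, not the text's), implements exactly the $\nu_t$ and $f_t$ formulas above.

```python
import numpy as np

rng = np.random.default_rng(1)
theta1, T_obs = 0.4, 10
e = rng.standard_normal(T_obs + 1)
z = e[1:] + theta1 * e[:-1]          # simulated MA(1) data

nu, f = np.zeros(T_obs), np.zeros(T_obs)
nu_prev, f_prev = 0.0, 1.0           # nu_0 = 0; the value of f_0 is irrelevant since nu_0 = 0
denom = 0.0                          # running sum 1 + theta^2 + ... + theta^{2(t-1)}
for t in range(T_obs):
    denom += theta1 ** (2 * t)
    f[t] = 1.0 + theta1 ** (2 * (t + 1)) / denom
    nu[t] = z[t] - theta1 * nu_prev / f_prev
    nu_prev, f_prev = nu[t], f[t]

print(f[:3])    # f_1 = 1 + theta^2, f_2 = (1 + theta^2 + theta^4)/(1 + theta^2), ...
print(nu[:3])
```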
-
As a final step towards finalising the likelihood function, we note the following further simplification. Recall that we had decomposed the likelihood for a set of dependent observations into a likelihood for the independent errors, using the concept of prediction error decomposition, as:
$$\log L(z) = -\frac{T}{2}\log 2\pi - \frac{T}{2}\log\sigma^2 - \frac{1}{2}\sum_{t=1}^{T}\log f_t - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\nu_t^2/f_t.$$
From our derivation, we can see that $\nu_t$ and $f_t$ do not depend on $\sigma^2$ and hence we can concentrate $\sigma^2$ out. This means we differentiate the log-likelihood with respect to $\sigma^2$ and get an estimator for $\sigma^2$, say $\hat\sigma^2$. We get
$$\hat\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}\frac{\nu_t^2}{f_t}.$$
Evaluating the log-likelihood at $\sigma^2 = \hat\sigma^2$ and simplifying, we get
$$\log L_c(z) = -\frac{T}{2}\left(\log 2\pi + 1\right) - \frac{1}{2}\sum_{t=1}^{T}\log f_t - \frac{T}{2}\log\hat\sigma^2.$$
We either maximize this concentrated log-likelihood or, equivalently, minimize
$$\sum_{t=1}^{T}\log f_t + T\log\hat\sigma^2.$$
One can make an initial guess about the underlying parameters and either apply numerical estimation procedures to calculate the derivatives or analytically calculate the derivatives by differentiating the Kalman recursions. In either case one has to keep in mind the restrictions to be imposed on the parameters, especially on the MA parameters, to take care of the identification problem. Also, it has been shown in the literature that using the Kalman recursions to estimate pure AR models is really not necessary.
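Putting the prediction error decomposition and the concentration step together, a sketch of the criterion one would hand to a numerical optimizer, under the assumption that some routine supplies $\nu_t$ and $f_t$ (for instance the kalman_filter sketch given earlier), looks like this; the function names are ours.

```python
import numpy as np

def concentrated_criterion(nu, f):
    """Return sum(log f_t) + T log(sigma_hat^2), the quantity to be minimized,
    where sigma_hat^2 = (1/T) sum(nu_t^2 / f_t) concentrates sigma^2 out."""
    T = len(nu)
    sigma2_hat = np.sum(nu**2 / f) / T
    return np.sum(np.log(f)) + T * np.log(sigma2_hat)

def concentrated_loglik(nu, f):
    """The concentrated log-likelihood log L_c(z) itself, for reporting."""
    T = len(nu)
    sigma2_hat = np.sum(nu**2 / f) / T
    return (-0.5 * T * (np.log(2 * np.pi) + 1)
            - 0.5 * np.sum(np.log(f)) - 0.5 * T * np.log(sigma2_hat))
```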
-
Kalman Smoothing
We have motivated the discussion of the Kalman filter so far as an algorithm for predicting the state vector, obtaining exact finite sample forecasts as a linear function of past observations. We have also shown how the resulting prediction error and the prediction error variance can be used to evaluate the log-likelihood.
This is sub-optimal if we are interested in estimating the sequence of states. In many cases, the Kalman filter is used to obtain an estimate of the state vector itself. For example, in their model of the business cycle, Stock and Watson show how one may be interested in knowing the state of the economy, or the phase of the business cycle the economy is in, which is unobservable at any given historical point. Stock and Watson suggest that comovements in many macro aggregates have a common element, which may be called the state of the economy, and this is unobservable. They motivate the use of the Kalman filter to obtain an estimate of this unobserved state of the economy.
Sometimes elements of the state vector are even interpreted as estimates of missing observations, which could be higher frequency data points from an observable lower frequency series or simply an estimate of a missing data point. For example, if we have data on a macro aggregate from 1955 through 2014, we may be interested in obtaining an estimate for 1970 which may be missing. Or, we may be interested in extracting monthly data from quarterly data.
Such estimates of the unobserved state of the economy or missing observations can be obtained from smoothed estimates of the state vector, $\alpha_t$.
Each step of the Kalman recursions gives an estimate of the state vector, $\alpha_t$, given all current and past observations. But an econometrician should use all available information to estimate the sequence of states. The Kalman smoother provides these estimates. The only smoothed estimator which utilises all the sample observations is given by
$$a_{t|T} = E\left(\alpha_t \,|\, I_T\right)$$
and the MSE of this smoothed estimator is denoted
$$\sigma^2 P_{t|T} = E\left[\left(\alpha_t - a_{t|T}\right)\left(\alpha_t - a_{t|T}\right)'\right].$$
The smoothing equations start from $a_{T|T}$ and $P_{T|T}$ and work backwards. The expressions for $a_{t|T}$ and $P_{t|T}$, which may be called the smoothing algorithm, are given below without proof:
$$a_{t|T} = a_t + P_t^*\left(a_{t+1|T} - T_{t+1}a_t\right)$$
$$P_{t|T} = P_t + P_t^*\left(P_{t+1|T} - P_{t+1|t}\right)P_t^{*\prime}$$
where
$$P_t^* = P_t T_{t+1}' P_{t+1|t}^{-1}, \qquad t = T-1, \ldots, 1,$$
with $a_{T|T} = a_T$ and $P_{T|T} = P_T$.
A set of direct residuals can also be obtained from the smoothed estimators:
$$e_t = z_t - y_t'a_{t|T}, \qquad t = 1, \ldots, T.$$
These are not to be confused with the prediction residuals, $\nu_t$, defined earlier.
-
We shall explain the smoothing algorithm with an example. Consider the simple model
$$z_t = \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim WN\left(0, \sigma^2\right)$$
$$\alpha_t = \alpha_{t-1} + \eta_t, \qquad \eta_t \sim WN\left(0, \sigma^2 q\right)$$
where the state, $\alpha_t$, and the observation, $z_t$, are scalars. The state, which follows a random walk process, cannot be observed directly as it is contaminated by noise. This is the simple signal plus noise model. We assume that $q$ is known. Also note that in this example we have allowed the observation $z_t$ to be measured with error, $\varepsilon_t$. For this example, note that $T = 1$, $R = 1$ and $y_t' = 1$.
The prediction equations for this example are
$$a_{t|t-1} = a_{t-1}, \qquad P_{t|t-1} = P_{t-1} + q,$$
and the updating equations are
$$a_t = a_{t|t-1} + P_{t|t-1}\left(z_t - a_{t|t-1}\right)/\left(P_{t|t-1} + 1\right)$$
and
$$P_t = P_{t|t-1} - P_{t|t-1}^2/\left(P_{t|t-1} + 1\right).$$
We shall demonstrate how to predict, update and smooth with 4 observations: $z_1 = 4.4$, $z_2 = 4$, $z_3 = 3.5$ and $z_4 = 4.6$. The initial state vector has the property $\alpha_0 \sim N\left(a_0, \sigma^2 P_0\right)$, and we are given $a_0 = 4$, $P_0 = 12$ and $q = 4$, so that $RQR' = 4$ and $h = 1$.
From the prediction equations we have $a_{1|0} = 4$ and $P_{1|0} = 16$, so that from the updating equations we have
$$a_1 = 4 + (12 + 4)(4.4 - 4)/(12 + 4 + 1) = 4.376$$
and
$$P_1 = 16 - 16^2/17 = 0.941.$$
Since $y_t' = 1$ in the measurement equation for all $t$, the MMSLE of $z_t$ is always $a_{t|t-1}$. So $z_{2|1} = a_{2|1} = a_1 = 4.376$.
Repeating the calculations for $t = 2, 3$ and $4$, we get the following results:

Smoothed estimators and residuals

t          1       2       3       4
z_t        4.4     4.0     3.5     4.6
a_t        4.376   4.063   3.597   4.428
P_t        0.941   0.832   0.829   0.828
ν_t        0.400  -0.376  -0.563   1.003
a_{t|T}    4.306   4.007   3.739   4.428
P_{t|T}    0.785   0.710   0.711   0.828
e_t        0.094   0.007  -0.239   0.172
From the above table we also have: $a_{2|1} = 4.376$, $P_{2|1} = 4.941$, $a_{3|2} = 4.063$, $P_{3|2} = 4.832$, $a_{4|3} = 3.597$ and $P_{4|3} = 4.829$.
From the table, the final estimates are seen to be $a_4 = 4.428$ and $P_4 = 0.828$.
These values can now be used in the smoothing algorithm. The algorithm, for the current example, reduces to
$$a_{t|T} = a_t + \left(P_t/P_{t+1|t}\right)\left(a_{t+1|T} - a_t\right)$$
$$P_{t|T} = P_t + \left(P_t/P_{t+1|t}\right)^2\left(P_{t+1|T} - P_{t+1|t}\right), \qquad t = T-1, \ldots, 1.$$
Since $a_{4|4} = a_4$ and $P_{4|4} = P_4$, we can apply the smoothing algorithm to obtain the smoothed estimates $a_{3|4}$ and $P_{3|4}$ and work backwards. So we have
$$a_{3|4} = 3.597 + (0.829/4.829)(4.428 - 3.597) = 3.739$$
$$P_{3|4} = 0.829 + (0.829/4.829)^2(0.828 - 4.829) = 0.711.$$
The rest of the smoothed estimates are displayed in the table above. The smoothed estimates of the unobserved state vector are given by the row $a_{t|T}$ of the table. Both the direct and the prediction error residuals have been calculated using the formulae
$$e_t = z_t - a_{t|T} \quad \text{and} \quad \nu_t = z_t - a_{t|t-1},$$
respectively.
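The whole exercise is short enough to reproduce in a few lines. The sketch below, assuming NumPy, runs the filter and the smoother for the four observations of this example; it should return values very close to those in the table (the table was computed with rounded intermediate values, so the last digit can differ).

```python
import numpy as np

z = np.array([4.4, 4.0, 3.5, 4.6])
a0, P0, q = 4.0, 12.0, 4.0
n = len(z)

a, P = np.zeros(n), np.zeros(n)               # updated estimates a_t, P_t
a_pred, P_pred = np.zeros(n), np.zeros(n)     # a_{t|t-1}, P_{t|t-1}
a_prev, P_prev = a0, P0
for t in range(n):
    a_pred[t], P_pred[t] = a_prev, P_prev + q              # prediction
    a[t] = a_pred[t] + P_pred[t] * (z[t] - a_pred[t]) / (P_pred[t] + 1)   # updating
    P[t] = P_pred[t] - P_pred[t]**2 / (P_pred[t] + 1)
    a_prev, P_prev = a[t], P[t]

a_sm, P_sm = a.copy(), P.copy()               # smoothed values a_{t|T}, P_{t|T}
for t in range(n - 2, -1, -1):                # work backwards from t = T-1
    star = P[t] / P_pred[t + 1]
    a_sm[t] = a[t] + star * (a_sm[t + 1] - a[t])
    P_sm[t] = P[t] + star**2 * (P_sm[t + 1] - P_pred[t + 1])

print(np.round(a, 3))        # filtered a_t:   [4.376 4.063 3.597 4.428]
print(np.round(a_sm, 3))     # smoothed a_{t|T}, close to [4.306 4.007 3.739 4.428]
print(np.round(z - a_sm, 3)) # direct residuals e_t = z_t - a_{t|T}
```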
Appendix
Derivation of updating equations
In this Appendix we shall derive the important steps leading to the updating equation and the
associated variance matrix of the estimation error. Before discussing the steps involved, we shall
digress a bit to delve into the following important material.
1. Consider the model:
$$\underset{(T\times 1)}{Z} = \underset{(T\times m)}{Y}\,\underset{(m\times 1)}{\beta} + \underset{(T\times 1)}{N}, \qquad N \sim \left(0, \sigma^2\Omega\right).$$
We shall call this model the sample information.
(a) Case 1: If $\beta$ is fixed in the above model, we have the usual GLS estimator
$$\hat\beta = \left(Y'\Omega^{-1}Y\right)^{-1}Y'\Omega^{-1}Z,$$
and this would be BLUE.
(b) Case 2: Suppose the vector $\beta$ is either partially or fully random or stochastic. The question now is: is the GLS estimator still BLUE? The answer is that it still is, according to the extended Gauss-Markov theorem, enunciated by Duncan and Horn (JASA, 1972, pp. 815-21). They proved that the GLS estimator now satisfies a condition called best, linear, unconditionally unbiased (or u-unbiased) estimator. [An estimator is u-unbiased if its estimation error has expectation zero.]
(c) Case 3: Suppose that $\beta$ is still fully or partially random. Additionally, suppose that we have some prior information about it. How can we use it to update the estimator of $\beta$ already obtained? This becomes a special case of the mixed estimation procedure developed by Theil and Goldberger (see Theil, Principles of Econometrics, pp. 347-52), where we incorporate such prior information with the sample information. Suppose in our case the prior information is given in the form
$$\left(\beta_0 - \beta\right) \sim \left(0, \sigma^2 P_0\right),$$
where $\beta_0$ is a known vector and $P_0$ is a known positive definite matrix. Then, to get an updated estimator that combines this prior information with the sample information, we first construct the augmented model:
$$\begin{pmatrix}\beta_0 \\ Z\end{pmatrix} = \begin{pmatrix}I \\ Y\end{pmatrix}\beta + \begin{pmatrix}\beta_0 - \beta \\ N\end{pmatrix}.$$
More concisely,
$$\tilde Z = \tilde Y\beta + \tilde N, \qquad \text{where } E(\tilde N) = 0 \text{ and } E\left(\tilde N\tilde N'\right) = \sigma^2\tilde V = \sigma^2\begin{pmatrix}P_0 & 0\\0 & \Omega\end{pmatrix}.$$
Using the extended Gauss-Markov theorem, we have the estimator of $\beta$ given as:
$$\hat\beta = \left(\tilde Y'\tilde V^{-1}\tilde Y\right)^{-1}\tilde Y'\tilde V^{-1}\tilde Z.$$
Using the original notation, this can be re-written as:
$$\hat\beta = P\left(P_0^{-1}\beta_0 + Y'\Omega^{-1}Z\right), \qquad \text{where } P = \left(P_0^{-1} + Y'\Omega^{-1}Y\right)^{-1}.$$
$\hat\beta$ is now the updated MMSE estimator of $\beta$, with
$$\left(\hat\beta - \beta\right) \sim \left(0, \sigma^2 P\right).$$
We are going to use this principle of combining sample information and prior information in deriving
our updating equation of the KF recursion.
Updating the state vector
The role of the updating equation is to incorporate the new information in $z_t$, the moment we are at time $t$, with the information already available in the estimator $a_{t|t-1}$. This problem is directly analogous to the one that we discussed under the extended Gauss-Markov theorem and Theil's mixed estimation procedure, where prior information was combined with the sample information. For our case, the prior information is
$$\left(\alpha_t - a_{t|t-1}\right) \sim \left(0, \sigma^2 P_{t|t-1}\right),$$
while the sample information is derived from the measurement equation. Thus the augmented model is:
$$a_{t|t-1} = \alpha_t + \left(a_{t|t-1} - \alpha_t\right)$$
$$z_t = y_t'\alpha_t + N_t.$$
In matrix notation,
$$\begin{pmatrix}a_{t|t-1}\\ z_t\end{pmatrix} = \begin{pmatrix}I\\ y_t'\end{pmatrix}\alpha_t + \begin{pmatrix}a_{t|t-1} - \alpha_t\\ N_t\end{pmatrix}.$$
The disturbance term has zero expectation and covariance matrix
$$E\left[\begin{pmatrix}a_{t|t-1} - \alpha_t\\ N_t\end{pmatrix}\begin{pmatrix}\left(a_{t|t-1} - \alpha_t\right)' & N_t\end{pmatrix}\right] = \sigma^2\begin{pmatrix}P_{t|t-1} & 0\\ 0 & h\end{pmatrix}.$$
More precisely,
$$\tilde Z_t = \tilde Y_t\alpha_t + \tilde e_t,$$
where $E(\tilde e_t) = 0$ and $E\left(\tilde e_t\tilde e_t'\right) = \sigma^2\tilde V$, with
$$\tilde V = \begin{pmatrix}P_{t|t-1} & 0\\ 0 & h\end{pmatrix}.$$
Now, using the extended Gauss-Markov theorem, we can write
$$a_t = \left(\tilde Y_t'\tilde V^{-1}\tilde Y_t\right)^{-1}\tilde Y_t'\tilde V^{-1}\tilde Z_t.$$
Using the original notation, we can re-write the expression for $a_t$ as follows:
$$a_t = P_t\left(P_{t|t-1}^{-1}a_{t|t-1} + y_t z_t/h\right)$$
where
$$P_t = \left(P_{t|t-1}^{-1} + y_t y_t'/h\right)^{-1}.$$
Thus
$$\left(a_t - \alpha_t\right) \sim \left(0, \sigma^2 P_t\right).$$
The updating formula can be put in a different way using a matrix inversion lemma. The advantage of such an adjustment is that we don't have to invert any matrix in the updating equations.
Lemma: For any $(n\times n)$ matrix $D$ defined by
$$D = \left(A + BCB'\right)^{-1},$$
where $A$ and $C$ are non-singular matrices of order $n$ and $m$ respectively and $B$ is $(n\times m)$, we have:
$$D = A^{-1} - A^{-1}B\left(C^{-1} + B'A^{-1}B\right)^{-1}B'A^{-1}.$$
We can use this lemma on the expression for $P_t$ by noting that $P_t = D$, $P_{t|t-1}^{-1} = A$, $y_t = B$ and $C = h^{-1}$, and it follows that
$$P_t = P_{t|t-1} - P_{t|t-1}y_t y_t'P_{t|t-1}/f_t, \qquad \text{where } f_t = y_t'P_{t|t-1}y_t + h.$$
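A quick numerical check of this identity is easy to run. The sketch below, with arbitrary illustrative values of $P_{t|t-1}$, $y_t$ and $h$ (ours, not the text's), confirms that the lemma-based expression for $P_t$ coincides with the direct inverse $\left(P_{t|t-1}^{-1} + y_t y_t'/h\right)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, h = 3, 0.7
A = rng.standard_normal((m, m))
P_pred = A @ A.T + np.eye(m)      # an arbitrary positive definite P_{t|t-1}
y = rng.standard_normal(m)

# Direct form: P_t = (P_{t|t-1}^{-1} + y y'/h)^{-1}
P_direct = np.linalg.inv(np.linalg.inv(P_pred) + np.outer(y, y) / h)

# Lemma-based form: P_t = P_{t|t-1} - P_{t|t-1} y y' P_{t|t-1} / f_t,  f_t = y' P_{t|t-1} y + h
f = y @ P_pred @ y + h
P_lemma = P_pred - np.outer(P_pred @ y, y @ P_pred) / f

print(np.allclose(P_direct, P_lemma))   # True
```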
One can make this even more compact by writing
$$\begin{aligned}
a_t &= \left(P_{t|t-1} - P_{t|t-1}y_t y_t'P_{t|t-1}/f_t\right)\left(P_{t|t-1}^{-1}a_{t|t-1} + y_t z_t/h\right)\\
&= a_{t|t-1} + P_{t|t-1}y_t\left(z_t/h - y_t'a_{t|t-1}/f_t - y_t'P_{t|t-1}y_t z_t/(f_t h)\right)\\
&= a_{t|t-1} + f_t^{-1}P_{t|t-1}y_t\left(z_t f_t/h - y_t'a_{t|t-1} - y_t'P_{t|t-1}y_t z_t/h\right).
\end{aligned}$$
Substituting $f_t - h = y_t'P_{t|t-1}y_t$ in the above term and re-arranging, we get
$$a_t = a_{t|t-1} + P_{t|t-1}y_t\left(z_t - y_t'a_{t|t-1}\right)/f_t.$$
Note that the expressions for $a_t$ and $P_t$ in this appendix are exactly the ones we have used as the updating equation and the variance matrix of the estimation error, respectively, in the main text.
Note also that in the discussion so far we have assumed the presence of an additional noise in the measurement equation; that is, $h > 0$. If we don't, then $\tilde V$ would become singular. But we also have to note that, in our examples of state space representation of ARMA models, we have assumed that the measurement equation has no additional error. However, this should not matter, since through these adjustments we have isolated the variance component as an additive scalar, which, when it becomes zero, does not affect our calculations.