
Lecture 1.

Bivariate CLRM
1. (a) The use of vertical rather than horizontal distances relates to the idea
that the explanatory variable, x, is fixed in repeated samples, so what the
model tries to do is to fit the most appropriate value of y using the model for a
given value of x. Taking horizontal distances would have suggested that we
had fixed the value of y and tried to find the appropriate values of x.
(b) When we calculate the deviations of the points, yt, from the fitted
values, ŷt, some points will lie above the line (yt > ŷt) and some will lie below
the line (yt < ŷt). When we calculate the residuals (ût = yt - ŷt), those
corresponding to points above the line will be positive and those below the line
negative, so adding them would mean that they would largely cancel out. In
fact, we could fit an infinite number of lines with a zero average residual. By
squaring the residuals before summing them, we ensure that they all
contribute to the measure of loss and that they do not cancel. It is then
possible to define unique (ordinary least squares) estimates of the intercept
and slope.
(c) Taking the absolute values of the residuals and minimising their sum
would certainly also get around the problem of positive and negative residuals
cancelling. However, the absolute value function is much harder to work with
than a square. Squared terms are easy to differentiate, so it is simple to find
analytical formulae for the mean and the variance.
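The point about differentiability can be made concrete: minimising the sum of squared residuals yields simple closed-form formulae for the slope and intercept. A minimal sketch in Python (the data values are made up purely for illustration):

```python
import numpy as np

# Hypothetical sample data (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x  # points lying exactly on a line, so all residuals are zero

# Closed-form OLS estimates, obtained by differentiating the sum of
# squared residuals and setting the derivatives to zero:
#   beta_hat  = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   alpha_hat = ybar - beta_hat * xbar
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(alpha_hat, beta_hat)  # recovers the intercept 2 and slope 3
```

Because the loss function is a smooth quadratic, these estimates are unique, which is exactly why squaring is preferred to taking absolute values.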
2. The population regression function (PRF) is a description of the model that
is thought to be generating the actual data and it represents the true
relationship between the variables. The population regression function is also
known as the data generating process (DGP). The PRF embodies the true
values of α and β, and for the bivariate model could be expressed as

yt = α + βxt + ut

Note that there is a disturbance term in this equation. In some textbooks, a
distinction is drawn between the PRF (the underlying true relationship
between y and x) and the DGP (the process describing the way that the actual
observations on y come about).
The sample regression function, SRF, is the relationship that has been
estimated using the sample observations, and is often written as

ŷt = α̂ + β̂xt
Notice that there is no error or residual term in the equation for the SRF: all
this equation states is that, given a particular value of x, multiplying it by β̂

Introductory Econometrics for Finance Chris Brooks 2008

and adding α̂ will give the model fitted or expected value for y, denoted ŷt. It
is also possible to write

yt = α̂ + β̂xt + ût
This equation splits the observed value of y into two components: the fitted
value from the model, and a residual term. The SRF is used to infer likely
values of the PRF. That is, the estimates α̂ and β̂ are constructed from the
sample data.
3. An estimator is simply a formula that is used to calculate the estimates, i.e.
the values of the parameters that describe the relationship between the
variables. There are an infinite number of possible estimators;
OLS is one choice that many people would consider a good one. We can say
that the OLS estimator is 'best', i.e. that it has the lowest variance among the
class of linear unbiased estimators. So it is optimal in the sense that no other
linear, unbiased estimator would have a smaller sampling variance. We could
define an estimator with a lower sampling variance than the OLS estimator,
but it would either be non-linear or biased or both! So there is a trade-off
between bias and variance in the choice of the estimator.

4. A list of the assumptions made concerning the classical linear regression
model's disturbance terms is given in Box 2.3 on p. 44 of the book.
We need to make the first four assumptions in order to prove that the ordinary
least squares estimators of α and β are 'best', that is, to prove that they have
minimum variance among the class of linear unbiased estimators. The
theorem that proves that OLS estimators are BLUE (provided the assumptions
are fulfilled) is known as the Gauss-Markov theorem. If these assumptions are
violated (which is dealt with in Chapter 4), then it may be that OLS estimators
are no longer unbiased or efficient. That is, they may be inaccurate or
subject to fluctuations between samples.
We needed to make the fifth assumption, that the disturbances are normally
distributed, in order to make statistical inferences about the population
parameters from the sample data, i.e. to test hypotheses about the coefficients.
Making this assumption implies that test statistics will follow a t-distribution
(provided that the other assumptions also hold).
5. If the models are linear in the parameters, we can use OLS.
(2.57) Yes, can use OLS since the model is the usual linear model we have been
dealing with.
(2.58) Yes. The model can be linearised by taking logarithms of both sides and
by rearranging. Although this is a very specific case, it has sound theoretical
foundations (e.g. the Cobb-Douglas production function in economics), and it
is the case that many relationships can be approximately linearised by taking
logs of the variables. The effect of taking logs is to reduce the effect of extreme
values on the regression function, and it may be possible to turn multiplicative
models into additive ones which we can easily estimate.
(2.59) Yes. We can estimate this model using OLS, but we would not be able to
obtain the values of both β and γ separately; we would only obtain the value of
these two coefficients multiplied together.
(2.60) Yes, we can use OLS, since this model is linear in the logarithms. For
those who have done some economics, models of this kind which are linear in
the logarithms have the interesting property that the coefficients (α and β) can
be interpreted as elasticities.
(2.61) Yes, in fact we can still use OLS since the model is linear in the
parameters. If we make a substitution, say qt = xtzt, then we can run the
regression

yt = α + βqt + ut
as usual.
So, in fact, we can estimate a fairly wide range of model types using these
simple tools.
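The log-linearisation idea can be illustrated directly: a multiplicative model of the form y = A·x^β becomes linear after taking logs of both sides and can then be estimated by OLS. A small Python sketch, with made-up values of A and β and no disturbance, purely to show the mechanics:

```python
import numpy as np

# Hypothetical multiplicative (Cobb-Douglas-style) data: y = A * x^beta.
# A and beta are made-up illustration values.
A, beta = 2.5, 0.7
x = np.linspace(1.0, 10.0, 50)
y = A * x ** beta

# Taking logs turns the multiplicative model into an additive, linear one:
#   ln(y) = ln(A) + beta * ln(x)
ly, lx = np.log(y), np.log(x)
b_hat = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)
a_hat = ly.mean() - b_hat * lx.mean()

print(np.exp(a_hat), b_hat)  # recovers A = 2.5 and beta = 0.7
```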
6. The null hypothesis is that the true (but unknown) value of beta is equal to
one, against a one sided alternative that it is greater than one:
H0 : β = 1
H1 : β > 1

The test statistic is given by

test stat = (β̂ - β*) / SE(β̂) = (1.147 - 1) / SE(β̂)

where β* = 1 is the value of β under the null hypothesis.
We want to compare this with a value from the t-table with T-2 degrees of
freedom, where T is the sample size, and here T-2 =60. We want a value with
5% all in one tail since we are doing a 1-sided test. The critical t-value from the
t-table is 1.671:


[Figure: t-distribution with the 5% rejection region in the upper tail]
The value of the test statistic is in the rejection region and hence we can reject
the null hypothesis. We have statistically significant evidence that this security
has a beta greater than one, i.e. it is significantly more risky than the market
as a whole.
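The mechanics of this one-sided test can be sketched in Python. Note that the standard error used below is a made-up placeholder (its value is not reproduced in the text above); only the critical value matches the discussion:

```python
from scipy import stats

def one_sided_t_test(beta_hat, beta_null, se, df, alpha=0.05):
    """Upper-tail t-test of H0: beta = beta_null against H1: beta > beta_null."""
    t_stat = (beta_hat - beta_null) / se
    t_crit = stats.t.ppf(1 - alpha, df)
    return t_stat, t_crit, t_stat > t_crit

# se=0.05 is a hypothetical placeholder, NOT the value from the question
t_stat, t_crit, reject = one_sided_t_test(1.147, 1.0, se=0.05, df=60)
print(round(t_crit, 3))  # 1.671, matching the tabulated critical value
```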
7. We want to use a two-sided test to test the null hypothesis that shares in
Chris Mining are completely unrelated to movements in the market as a
whole. In other words, the value of beta in the regression model would be zero
so that whatever happens to the value of the market proxy, Chris Mining
would be completely unaffected by it.
The null and alternative hypotheses are therefore:
H0 : β = 0
H1 : β ≠ 0

The test statistic has the same format as before, and is given by:

test stat = (β̂ - β*) / SE(β̂) = (0.214 - 0) / 0.186 = 1.15

We want to find a value from the t-tables for a variable with 38 - 2 = 36 degrees
of freedom, and we want to look up the value that puts 2.5% of the distribution
in each tail, since we are doing a two-sided test and we want to have a 5% size
of test overall:

[Figure: t-distribution with 2.5% rejection regions in each of the two tails]
The critical t-value is therefore 2.03.

Since the test statistic is not within the rejection region, we do not reject the
null hypothesis. We therefore conclude that we have no statistically significant
evidence that Chris Mining has any systematic risk. In other words, we have
no evidence that changes in the company's value are driven by movements in
the market.
8. A confidence interval for beta is given by the formula:

(β̂ - SE(β̂)*tcrit , β̂ + SE(β̂)*tcrit)
Confidence intervals are almost invariably 2-sided, unless we are told
otherwise (which we are not here), so we want to look up the values which put
2.5% in the upper tail and 0.5% in the upper tail for the 95% and 99%
confidence intervals respectively. The 0.5% critical values are given as follows
for a t-distribution with T-2=38-2=36 degrees of freedom:


[Figure: t-distribution with 0.5% rejection regions in each of the two tails]


The confidence interval in each case is thus given by (0.214 ± 0.186*2.03) for a
95% confidence interval, which solves to (-0.164, 0.592), and by
(0.214 ± 0.186*2.72) for a 99% confidence interval, which solves to
(-0.292, 0.720).
There are a couple of points worth noting.
First, one intuitive interpretation of an X% confidence interval is that we are
X% sure that the true value of the population parameter lies within the
interval. So we are 95% sure that the true value of beta lies within the interval
(-0.164, 0.592) and we are 99% sure that the true population value of beta lies
within (-0.292, 0.720). Thus in order to be more sure that we have the true
value of beta contained in the interval, i.e. as we move from 95% to 99%
confidence, the interval must become wider.
The second point to note is that we can test an infinite number of hypotheses
about beta once we have formed the interval. For example, we would not reject
the null hypothesis contained in the last question (i.e. that beta = 0), since that
value of beta lies within the 95% and 99% confidence intervals. Would we
reject or not reject a null hypothesis that the true value of beta was 0.6? At the
5% level, we should have enough evidence against the null hypothesis to reject
it, since 0.6 is not contained within the 95% confidence interval. But at the 1%
level, we would no longer have sufficient evidence to reject the null hypothesis,
since 0.6 is now contained within the interval. Therefore we should always if
possible conduct some sort of sensitivity analysis to see if our conclusions are
altered by (sensible) changes in the level of significance used.
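The confidence interval arithmetic above can be reproduced in a few lines, using the tabulated critical values quoted in the text:

```python
beta_hat, se = 0.214, 0.186  # estimates from the Chris Mining regression above

# Confidence interval: (beta_hat - se * t_crit, beta_hat + se * t_crit),
# with the two-sided t-table critical values (36 df) quoted in the text
for t_crit in (2.03, 2.72):  # 5% and 1% two-sided critical values
    lower, upper = beta_hat - se * t_crit, beta_hat + se * t_crit
    print(round(lower, 3), round(upper, 3))
```

The wider 99% interval falls straight out of the larger critical value, as the discussion above explains.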
9. We test hypotheses about the actual coefficients, not the estimated values.
We want to make inferences about the likely values of the population
parameters (i.e. to test hypotheses about them). We do not need to test
hypotheses about the estimated values since we know exactly what our
estimates are.

Lecture 2. Multivariate CLRM

1. It can be proved that a t-distribution is just a special case of the more
general F-distribution. The square of a t-distribution with T-k degrees of
freedom will be identical to an F-distribution with (1,T-k) degrees of freedom.
But remember that if we use a 5% size of test, we will look up a 5% value for
the F-distribution because the test is 2-sided even though we only look in one
tail of the distribution. We look up a 2.5% value for the t-distribution since the
test is 2-tailed.
Examples at the 5% level from the tables: the square of the relevant t critical
value equals the corresponding F critical value.
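This equivalence between the squared t critical value and the F critical value can be checked numerically, e.g. with scipy (using 36 degrees of freedom, as in the earlier questions):

```python
from scipy import stats

# A t-distribution with T-k df, squared, is an F(1, T-k) distribution, so the
# squared 5% two-sided t critical value equals the 5% F critical value
df = 36  # e.g. T - k = 36
t_crit = stats.t.ppf(0.975, df)    # 2.5% in each tail of the t
f_crit = stats.f.ppf(0.95, 1, df)  # 5% in the upper tail of the F
print(round(t_crit ** 2, 4), round(f_crit, 4))  # identical values
```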

2. (a) H0 : β3 = 2
We could use an F- or a t- test for this one since it is a single hypothesis
involving only one coefficient. We would probably in practice use a t-test since
it is computationally simpler and we only have to estimate one regression.
There is one restriction.
(b) H0 : β3 + β4 = 1
Since this involves more than one coefficient, we should use an F-test. There is
one restriction.
(c) H0 : β3 + β4 = 1 and β5 = 1
Since we are testing more than one hypothesis simultaneously, we would use
an F-test. There are 2 restrictions.
(d) H0 : β2 = 0 and β3 = 0 and β4 = 0 and β5 = 0
As for (c), we are testing multiple hypotheses so we cannot use a t-test. We
have 4 restrictions.
(e) H0 : β2β3 = 1
Although there is only one restriction, it is a multiplicative restriction. We
therefore cannot use a t-test or an F-test to test it. In fact we cannot test it at
all using the methodology that has been examined in this chapter.
3. The regression F-statistic would be given by the test statistic associated
with hypothesis (d) above. We are always interested in testing this hypothesis
since it tests whether all of the coefficients in the regression (except the
constant) are jointly insignificant. If they are then we have a completely
useless regression, where none of the variables that we have said influence y
actually do. So we would need to go back to the drawing board!



The alternative hypothesis is:

H1 : β2 ≠ 0 or β3 ≠ 0 or β4 ≠ 0 or β5 ≠ 0
Note the form of the alternative hypothesis: 'or' indicates that only one of the
components of the null hypothesis would have to be rejected for us to reject
the null hypothesis as a whole.
4. The restricted residual sum of squares will always be at least as big as the
unrestricted residual sum of squares, i.e. RRSS ≥ URSS.
To see this, think about what we were doing when we determined what the
regression parameters should be: we chose the values that minimised the
residual sum of squares. We said that OLS would provide the best parameter
values given the actual sample data. Now when we impose some restrictions
on the model, so that they cannot all be freely determined, then the model
should not fit as well as it did before. Hence the residual sum of squares must
be higher once we have imposed the restrictions; otherwise, the parameter
values that OLS chose originally without the restrictions could not be the best.
In the extreme case (very unlikely in practice), the two sets of residual sum of
squares could be identical if the restrictions were already present in the data,
so that imposing them on the model would yield no penalty in terms of loss
of fit.

5. The null hypothesis is: H0 : β3 + β4 = 1 and β5 = 1
The first step is to impose this on the regression model:

yt = β1 + β2x2t + β3x3t + β4x4t + β5x5t + ut subject to β3 + β4 = 1 and β5 = 1.

We can rewrite the first part of the restriction as β4 = 1 - β3.
Then rewrite the regression with the restriction imposed:

yt = β1 + β2x2t + β3x3t + (1 - β3)x4t + x5t + ut

which can be re-written

yt = β1 + β2x2t + β3x3t + x4t - β3x4t + x5t + ut

and rearranging

(yt - x4t - x5t) = β1 + β2x2t + β3x3t - β3x4t + ut

(yt - x4t - x5t) = β1 + β2x2t + β3(x3t - x4t) + ut

Now create two new variables, call them pt and qt:

pt = (yt - x4t - x5t)
qt = (x3t - x4t)

We can then run the linear regression:

pt = β1 + β2x2t + β3qt + ut ,

which constitutes the restricted regression model.
The test statistic is calculated as ((RRSS - URSS)/URSS) * (T - k)/m.
In this case, m = 2, T = 96 and k = 5, so the test statistic = 5.704. Compare this
with an F-distribution with (2, 91) degrees of freedom, whose 5% critical value
is approximately 3.10.
Hence we reject the null hypothesis that the restrictions are valid. We cannot
impose these restrictions on the data without a substantial increase in the
residual sum of squares.
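The statistic and critical value can be computed with a short helper. The RSS values themselves are not reproduced in the text above, so the `rrss` and `urss` arguments in the call below are placeholders; only the critical value is verified against the figure quoted:

```python
from scipy import stats

def f_restriction_test(rrss, urss, T, k, m, alpha=0.05):
    """F-test of m linear restrictions: ((RRSS - URSS)/URSS) * (T - k)/m."""
    stat = ((rrss - urss) / urss) * ((T - k) / m)
    crit = stats.f.ppf(1 - alpha, m, T - k)
    return stat, crit

# rrss and urss below are placeholder values, purely to obtain the critical value
_, crit = f_restriction_test(rrss=1.0, urss=1.0, T=96, k=5, m=2)
print(round(crit, 2))  # approximately 3.10, as quoted above
```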

6. ri = 0.080 + 0.801Si + 0.321MBi + 0.164PEi - 0.084BETAi
        (0.064)   (0.147)    (0.136)

The figures in parentheses beneath the coefficient estimates are standard
errors (the remainder are not reproduced here). The t-ratios are calculated by
dividing each coefficient estimate by its standard error. The
relevant value from the t-tables is for a 2-sided test with 5% rejection overall.
T-k = 195; tcrit = 1.97. The null hypothesis is rejected at the 5% level if the
absolute value of the test statistic is greater than the critical value. We would
conclude based on this evidence that only firm size and market to book value
have a significant effect on stock returns.
If a stock's beta increases from 1 to 1.2, then we would expect the return on the
stock to fall by (1.2 - 1)*0.084 = 0.0168 = 1.68%.
This is not the sign we would have expected on beta, since beta would be
expected to be positively related to return, since investors would require
higher returns as compensation for bearing higher market risk.
7. We would thus consider deleting the price/earnings and beta variables from
the regression since these are not significant in the regression - i.e. they are
not helping much to explain variations in y. We would not delete the constant
term from the regression even though it is insignificant since there are good
statistical reasons for its inclusion.
yt = β1 + β2x2t + β3x3t + β4yt-1 + ut
Δyt = γ1 + γ2x2t + γ3x3t + γ4yt-1 + vt



Note that we have not changed anything substantial between these models in
the sense that the second model is just a re-parameterisation (rearrangement)
of the first, where we have subtracted yt-1 from both sides of the equation.
(a) Remember that the residual sum of squares is the sum of each of the
squared residuals. So let's consider what the residuals will be in each
case. For the first model, in the levels of y:

ût = yt - ŷt = yt - (β̂1 + β̂2x2t + β̂3x3t + β̂4yt-1)
Now for the second model, the dependent variable is now the change in y:
v̂t = Δyt - Δŷt = Δyt - (γ̂1 + γ̂2x2t + γ̂3x3t + γ̂4yt-1)

where ŷ denotes the fitted value in each case (note that we do not need at this
stage to assume they are the same). Rearranging this second model would give

v̂t = yt - yt-1 - (γ̂1 + γ̂2x2t + γ̂3x3t + γ̂4yt-1)
   = yt - (γ̂1 + γ̂2x2t + γ̂3x3t + (γ̂4 + 1)yt-1)
If we compare this formulation with the one we calculated for the first model,
we can see that the residuals are exactly the same for the two models, with
γ̂4 = β̂4 - 1 and γ̂i = β̂i (i = 1, 2, 3). Hence if the residuals are the same, the
residual sum of squares must also be the same. In fact the two models are
really identical, since one is just a rearrangement of the other.
(b) As for R², recall how we calculate R²:

R² = 1 - RSS / Σ(yi - ȳ)²

for the first model, and

R² = 1 - RSS / Σ(Δyi - Δȳ)²

in the second case. Therefore, since the total sum of squares (the
denominator) has changed, the value of R² must also have changed as a
consequence of changing the dependent variable.
(c) By the same logic, since the value of the adjusted R2 is just an algebraic
modification of R2 itself, the value of the adjusted R2 must also change.
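The claim that the two parameterisations share identical residuals but have different R² values is easy to verify by simulation. The data-generating values below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
x2, x3 = rng.normal(size=T), rng.normal(size=T)
ylag = rng.normal(size=T)          # stand-in for y_{t-1}
y = 1.0 + 0.5 * x2 - 0.3 * x3 + 0.8 * ylag + rng.normal(size=T)
dy = y - ylag                      # dependent variable of the second model

X = np.column_stack([np.ones(T), x2, x3, ylag])

# OLS for each parameterisation, via least squares
b_levels, *_ = np.linalg.lstsq(X, y, rcond=None)
b_diffs, *_ = np.linalg.lstsq(X, dy, rcond=None)

res_levels = y - X @ b_levels
res_diffs = dy - X @ b_diffs

def r_squared(res, dep):
    return 1 - np.sum(res ** 2) / np.sum((dep - dep.mean()) ** 2)

print(np.allclose(res_levels, res_diffs))                  # identical residuals
print(r_squared(res_levels, y), r_squared(res_diffs, dy))  # different R^2
```

The coefficient on the lagged term also differs by exactly one between the two fits, as the algebra above predicts.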
8. A researcher estimates the following two econometric models
y t 1 2 x 2t 3 x3t u t
yt 1 2 x2t 3 x3t 4 x4t vt



(a) The value of R2 will almost always be higher for the second model since it
has another variable added to the regression. The value of R2 would only be
identical for the two models in the very, very unlikely event that the estimated
coefficient on the x4t variable was exactly zero. Otherwise, the R2 must be
higher for the second model than the first.
(b) The value of the adjusted R2 could fall as we add another variable. The
reason for this is that the adjusted version of R2 has a correction for the loss of
degrees of freedom associated with adding another regressor into a regression.
This implies a penalty term, so that the value of the adjusted R2 will only rise if
the increase in this penalty is more than outweighed by the rise in the value of
R².

11. R² may be defined in various ways, but the most common is R² = ESS/TSS.
Since both ESS and TSS will have units of the square of the dependent
variable, the units will cancel out and hence R² will be unit-free!
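This unit-invariance is easy to check: rescaling the dependent variable (say, multiplying it by 100, as in converting pounds to pence) scales RSS and TSS by the same factor and leaves R² untouched. A quick simulated sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)

def r_squared(y, x):
    # Simple bivariate OLS fit, then R^2 = 1 - RSS/TSS
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    rss = np.sum((y - a - b * x) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - rss / tss

# Rescaling y leaves R^2 unchanged: the units cancel in RSS/TSS
print(np.isclose(r_squared(y, x), r_squared(100 * y, x)))  # True
```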



Lecture 3. CLRM assumptions and diagnostic tests

1. In the same way as we make assumptions about the true value of beta and
not the estimated values, we make assumptions about the true unobservable
disturbance terms rather than their estimated counterparts, the residuals.
We know the exact value of the residuals, since they are defined by
ût = yt - ŷt. So we do not need to make any assumptions about the residuals
since we already know their value. We make assumptions about the
unobservable error terms since it is always the true value of the population
disturbances that we are really interested in, although we never actually know
what these are.
2. We would like to see no pattern in the residual plot! If there is a pattern in
the residual plot, this is an indication that there is still some action or
variability left in yt that has not been explained by our model. This indicates
that potentially it may be possible to form a better model, perhaps using
additional or completely different explanatory variables, or by using lags of
either the dependent variable or of one or more of the explanatory variables.
Recall that the two plots shown on pages 157 and 159, where the residuals
followed a cyclical pattern and an alternating pattern respectively, were used
as indications that the residuals are positively and negatively autocorrelated.
Another problem if there is a pattern in the residuals is that, if it does
indicate the presence of autocorrelation, then this may suggest that our
standard error estimates for the coefficients could be wrong and hence any
inferences we make about the coefficients could be misleading.
3. The t-ratios for the coefficients in this model are calculated by dividing the
individual coefficients by their standard errors. The estimated regression, with
standard errors in parentheses, is

ŷt = 0.638 + 0.402x2t - 0.891x3t
      (0.436)  (0.291)

R² = 0.96, R̄² = 0.89

The problem appears to be that the regression parameters are all individually
insignificant (i.e. not significantly different from zero), although the value of
R2 and its adjusted version are both very high, so that the regression taken as a
whole seems to indicate a good fit. This looks like a classic example of what we
term near multicollinearity. This is where the individual regressors are very
closely related, so that it becomes difficult to disentangle the effect of each
individual variable upon the dependent variable.
The solution to near multicollinearity that is usually suggested is that since the
problem is really one of insufficient information in the sample to determine
each of the coefficients, then one should go out and get more data. In other
words, we should switch to a higher frequency of data for analysis (e.g. weekly
instead of monthly, monthly instead of quarterly etc.). An alternative is also to
get more data by using a longer sample period (i.e. one going further back in
time), or to combine the two independent variables in a ratio (e.g. x2t / x3t ).
Other, more ad hoc methods for dealing with the possible existence of near
multicollinearity were discussed in Chapter 4:

- Ignore it: if the model is otherwise adequate, i.e. statistically and in terms
of each coefficient being of a plausible magnitude and having an
appropriate sign. Sometimes, the existence of multicollinearity does not
reduce the t-ratios on variables that would have been significant without
the multicollinearity sufficiently to make them insignificant. It is worth
stating that the presence of near multicollinearity does not affect the BLUE
properties of the OLS estimator i.e. it will still be consistent, unbiased
and efficient since the presence of near multicollinearity does not violate
any of the CLRM assumptions 1-4. However, in the presence of near
multicollinearity, it will be hard to obtain small standard errors. This will
not matter if the aim of the model-building exercise is to produce forecasts
from the estimated model, since the forecasts will be unaffected by the
presence of near multicollinearity so long as this relationship between the
explanatory variables continues to hold over the forecasted sample.

- Drop one of the collinear variables, so that the problem disappears.

However, this may be unacceptable to the researcher if there were strong a
priori theoretical reasons for including both variables in the model. Also, if
the removed variable was relevant in the data generating process for y, an
omitted variable bias would result.

- Transform the highly correlated variables into a ratio and include only the
ratio and not the individual variables in the regression. Again, this may be
unacceptable if financial theory suggests that changes in the dependent
variable should occur following changes in the individual explanatory
variables, and not a ratio of them.

4. (a) The assumption of homoscedasticity is that the variance of the errors is
constant and finite over time. Technically, we write Var(ut) = σu² < ∞.
(b) The coefficient estimates would still be the correct ones (assuming
that the other assumptions required to demonstrate OLS optimality are
satisfied), but the problem would be that the standard errors could be
wrong. Hence if we were trying to test hypotheses about the true parameter
values, we could end up drawing the wrong conclusions. In fact, for all of
the variables except the constant, the standard errors would typically be too
small, so that we would end up rejecting the null hypothesis too many
times.
(c) There are a number of ways to proceed in practice, including

- Using heteroscedasticity robust standard errors which correct for the
problem by enlarging the standard errors relative to what they would have
been for the situation where the error variance is positively related to one of
the explanatory variables.
- Transforming the data into logs, which has the effect of reducing the effect of
large errors relative to small ones.
5. (a) This is where there is a relationship between the ith and jth residuals.
Recall that one of the assumptions of the CLRM was that such a relationship
did not exist. We want our residuals to be random, and if there is evidence of
autocorrelation in the residuals, then it implies that we could predict the sign
of the next residual and get the right answer more than half the time on
average.
(b) The Durbin Watson test is a test for first order autocorrelation. The test is
calculated as follows. You would run whatever regression you were interested
in, and obtain the residuals. Then calculate the statistic

DW = Σ(ût - ût-1)² / Σût²

where both summations run from t = 2 to T.


You would then need to look up the two critical values from the Durbin
Watson tables, and these would depend on how many observations and how
many regressors (excluding the constant this time) you had in the model.
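The statistic itself is straightforward to compute from a vector of residuals; a minimal sketch following the definition above, with both sums running from t = 2:

```python
import numpy as np

def durbin_watson(residuals):
    """DW statistic: sum of squared changes in the residuals divided by
    the sum of squared residuals, both sums taken from t = 2 onwards."""
    u = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(u) ** 2) / np.sum(u[1:] ** 2)

# Perfectly alternating residuals (extreme negative autocorrelation) give a
# DW near 4; long runs of same-signed residuals would give a DW near 0
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))  # 4.0
```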
The rejection / non-rejection rule would be given by selecting the appropriate
region from the following diagram:

[Figure: Durbin-Watson rejection, inconclusive and non-rejection regions]
(c) We have 60 observations, and the number of regressors excluding the

constant term is 3. The appropriate lower and upper limits are 1.48 and 1.69
respectively, so the Durbin Watson is lower than the lower limit. It is thus
clear that we reject the null hypothesis of no autocorrelation. So it looks like
the residuals are positively autocorrelated.
(d) Δyt = β1 + β2Δx2t + β3Δx3t + β4Δx4t + ut
The problem with a model entirely in first differences, is that once we calculate
the long run solution, all the first difference terms drop out (as in the long run
we assume that the values of all variables have converged on their own long
run values so that yt = yt-1, etc.). Thus when we try to calculate the long run
solution to this model, we cannot do it, because there isn't a long run solution
to this model!
(e) Δyt = β1 + β2Δx2t + β3Δx3t + β4Δx4t + β5x2t-1 + β6x3t-1 + β7x4t-1 + vt
The answer is yes, there is no reason why we cannot use Durbin Watson in this
case. You may have said no here because there are lagged values of the
regressors (the x variables) variables in the regression. In fact this would be
wrong since there are no lags of the DEPENDENT (y) variable and hence DW
can still be used.

6. Δyt = β1 + β2Δx2t + β3Δx3t + β4yt-1 + β5x2t-1 + β6x3t-1 + β7x3t-4 + ut

The major steps involved in calculating the long run solution are to
- set the disturbance term equal to its expected value of zero
- drop the time subscripts
- remove all difference terms altogether since these will all be zero by the
definition of the long run in this context.
Following these steps, we obtain
0 = β1 + β4y + β5x2 + β6x3 + β7x3

We now want to rearrange this so that all the terms in x3 are gathered together
and so that y is the subject of the formula:

-β4y = β1 + β5x2 + β6x3 + β7x3
-β4y = β1 + β5x2 + (β6 + β7)x3

y = -(β1/β4) - (β5/β4)x2 - ((β6 + β7)/β4)x3
The last equation above is the long run solution.
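The algebra of the long-run solution can be checked symbolically, e.g. with sympy: setting the disturbance to zero and dropping the difference terms leaves the static equation, which is then solved for y:

```python
import sympy as sp

b1, b4, b5, b6, b7 = sp.symbols('beta1 beta4 beta5 beta6 beta7')
y, x2, x3 = sp.symbols('y x2 x3')

# Long run: disturbance at zero, difference terms dropped, time subscripts removed
long_run = sp.Eq(0, b1 + b4 * y + b5 * x2 + b6 * x3 + b7 * x3)

# Make y the subject of the formula
solution = sp.solve(long_run, y)[0]
# Algebraically: y = -(beta1 + beta5*x2 + (beta6 + beta7)*x3) / beta4
print(sp.simplify(solution))
```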



7. Ramsey's RESET test is a test of whether the functional form of the

regression is appropriate. In other words, we test whether the relationship
between the dependent variable and the independent variables really should
be linear or whether a non-linear form would be more appropriate. The test
works by adding powers of the fitted values from the regression into a second
regression. If the appropriate model was a linear one, then the powers of the
fitted values would not be significant in this second regression.
If we fail Ramsey's RESET test, then the easiest solution is probably to
transform all of the variables into logarithms. This has the effect of turning a
multiplicative model into an additive one.
If this still fails, then we really have to admit that the relationship between the
dependent variable and the independent variables was probably not linear
after all so that we have to either estimate a non-linear model for the data
(which is beyond the scope of this course) or we have to go back to the drawing
board and run a different regression containing different variables.
8. (a) It is important to note that we did not need to assume normality in order
to derive the sample estimates of α̂ and β̂, or in calculating their standard
errors. We needed the normality assumption at the later stage when we come
to test hypotheses about the regression coefficients, either singly or jointly, so
that the test statistics we calculate would indeed have the distribution (t or F)
that we said they would.
(b) One solution would be to use a technique for estimation and inference
which did not require normality. But these techniques are often highly
complex and also their properties are not so well understood, so we do not
know with such certainty how well the methods will perform in different
circumstances.
One pragmatic approach to failing the normality test is to plot the estimated
residuals of the model, and look for one or more very extreme outliers. These
would be residuals that are much bigger (either very big and positive, or very
big and negative) than the rest. It is, fortunately for us, often the case that one
or two very extreme outliers will cause a violation of the normality
assumption. The reason that one or two extreme outliers can cause a violation
of the normality assumption is that they would lead the (absolute value of the)
skewness and / or kurtosis estimates to be very large.
Once we spot a few extreme residuals, we should look at the dates when these
outliers occurred. If we have a good theoretical reason for doing so, we can add
in separate dummy variables for big outliers caused by, for example, wars,
changes of government, stock market crashes, changes in market
microstructure (e.g. the 'Big Bang' of 1986). The effect of the dummy variable
is exactly the same as if we had removed the observation from the sample
altogether and estimated the regression on the remainder. If we only remove
observations in this way, then we make sure that we do not lose any useful
pieces of information represented by sample points.



9. (a) Parameter structural stability refers to whether the coefficient estimates

for a regression equation are stable over time. If the regression is not
structurally stable, it implies that the coefficient estimates would be different
for some sub-samples of the data compared to others. This is clearly not what
we want to find since when we estimate a regression, we are implicitly
assuming that the regression parameters are constant over the entire sample
period under consideration.

(b) rt = 0.0215 + 1.491 rmt    RSS = 0.189, T = 180  (whole sample)
    rt = 0.0163 + 1.308 rmt    RSS = 0.079, T = 82   (first sub-sample)
    rt = 0.0360 + 1.613 rmt    RSS = 0.082, T = 98   (second sub-sample)

(c) If we define the coefficient estimates for the first and second halves of the
sample as α̂1 and β̂1, and α̂2 and β̂2 respectively, then the null and alternative
hypotheses are

H0 : α1 = α2 and β1 = β2

H1 : α1 ≠ α2 or β1 ≠ β2

(d) The test statistic is calculated as

Test stat. = [RSS − (RSS1 + RSS2)] / (RSS1 + RSS2) × (T − 2k) / k
           = [0.189 − (0.079 + 0.082)] / (0.079 + 0.082) × (180 − 4) / 2 = 15.30
This follows an F distribution with (k,T-2k) degrees of freedom. F(2,176) =
3.05 at the 5% level. Clearly we reject the null hypothesis that the coefficients
are equal in the two sub-periods.
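As a cross-check, the Chow statistic can be computed directly from the quoted RSS values; this is a minimal sketch using only the numbers given above.

```python
# Chow test: compare whole-sample RSS with the sum of the sub-sample RSSs.
rss_whole, rss_1, rss_2 = 0.189, 0.079, 0.082  # values quoted above
T, k = 180, 2  # 180 observations, 2 parameters (intercept and slope)

chow = ((rss_whole - (rss_1 + rss_2)) / (rss_1 + rss_2)) * ((T - 2 * k) / k)
print(round(chow, 2))  # 15.3, far above the 5% critical value F(2, 176) = 3.05
```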
10. The data we have are

rt = 0.0215 + 1.491Rmt    RSS = 0.189, T = 180  (whole sample, 1981M1–1995M12)
rt = 0.0212 + 1.478Rmt    RSS = 0.148, T = 168  (1981M1–1994M12)
rt = 0.0217 + 1.523Rmt    RSS = 0.182, T = 168  (1982M1–1995M12)

First, the forward predictive failure test - i.e. we are trying to see if the model
for 1981M1-1994M12 can predict 1995M1-1995M12.
The test statistic is given by
Test stat. = [(RSS − RSS1)/RSS1] × [(T1 − k)/T2]
           = [(0.189 − 0.148)/0.148] × [(168 − 2)/12] = 3.83

where T1 is the number of observations in the first period (i.e. the period that
we actually estimate the model over), and T2 is the number of observations we
are trying to predict. The test statistic follows an F-distribution with (T2,
T1 − k) degrees of freedom. F(12, 166) = 1.81 at the 5% level. Since 3.83 > 1.81,
we reject the null hypothesis that the model can predict the observations for
1995. We would
conclude that our model is no use for predicting this period, and from a
practical point of view, we would have to consider whether this failure is a
result of atypical behaviour of the series out-of-sample (i.e. during 1995), or
whether it results from a genuine deficiency in the model.
The backward predictive failure test is a little more difficult to understand,
although no more difficult to implement. The test statistic is given by

Test stat. = [(RSS − RSS1)/RSS1] × [(T1 − k)/T2] = [(0.189 − 0.182)/0.182] × [(168 − 2)/12] = 0.53


Now we need to be a little careful in our interpretation of what exactly are the
first and second sample periods. It would be possible to define T1 as always
being the first sample period. But I think it easier to say that T1 is always the
sample over which we estimate the model (even though it now comes after the
hold-out-sample). Thus T2 is still the sample that we are trying to predict, even
though it comes first. You can use either notation, but you need to be clear and
consistent. If you wanted to choose the other way to the one I suggest, then
you would need to change the subscript 1 everywhere in the formula above so
that it was 2, and change every 2 so that it was a 1.
Either way, we conclude that there is little evidence against the null
hypothesis. Thus our model is able to adequately back-cast the first 12
observations of the sample.
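Both predictive failure statistics come from the same formula, so they can be sketched together; only the RSS and sample-size values quoted above are used.

```python
def predictive_failure(rss_whole, rss_sub, T1, T2, k):
    # F-test statistic: ((RSS - RSS1)/RSS1) * ((T1 - k)/T2),
    # distributed F(T2, T1 - k) under the null of no predictive failure
    return ((rss_whole - rss_sub) / rss_sub) * ((T1 - k) / T2)

forward = predictive_failure(0.189, 0.148, T1=168, T2=12, k=2)
backward = predictive_failure(0.189, 0.182, T1=168, T2=12, k=2)
print(round(forward, 2), round(backward, 2))  # 3.83 0.53, versus F(12, 166) = 1.81
```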
11. By definition, variables having associated parameters that are not
significantly different from zero are not, from a statistical perspective, helping
to explain variations in the dependent variable about its mean value. One
could therefore argue that empirically, they serve no purpose in the fitted
regression model. But leaving such variables in the model will use up valuable
degrees of freedom, implying that the standard errors on all of the other
parameters in the regression model will be unnecessarily higher as a result. If
the number of degrees of freedom is relatively small, then saving a couple by
deleting two variables with insignificant parameters could be useful. On the
other hand, if the number of degrees of freedom is already very large, the



impact of these additional irrelevant variables on the others is likely to be small.

12. An outlier dummy variable will take the value one for one observation in the
sample and zero for all others. The Chow test involves splitting the sample into two
parts. If we then try to run the regression on both the sub-parts but the model contains
such an outlier dummy, then the observations on that dummy will be zero everywhere
for one of the regressions. For that sub-sample, the outlier dummy would show
perfect multicollinearity with the intercept and therefore the model could not be
estimated.



Lecture 4. Univariate time series

1. Autoregressive models specify the current value of a series yt as a function of
its previous p values and the current value of an error term, ut, while moving
average models specify the current value of a series yt as a function of the
current and previous q values of an error term, ut. AR and MA models have
different characteristics in terms of the length of their memories, which has
implications for the time it takes shocks to yt to die away, and for the shapes of
their autocorrelation and partial autocorrelation functions.
2. ARMA models are of particular use for financial series due to their
flexibility. They are fairly simple to estimate, can often produce reasonable
forecasts, and most importantly, they require no knowledge of any structural
variables that might be required for more traditional econometric analysis.
When the data are available at high frequencies, we can still use ARMA models
while exogenous explanatory variables (e.g. macroeconomic variables,
accounting ratios) may be unobservable at any more than monthly intervals at best.

(1) yt = yt−1 + ut
(2) yt = 0.5yt−1 + ut
(3) yt = 0.8ut−1 + ut


(a) The first two models are roughly speaking AR(1) models, while the last is
an MA(1). Strictly, since the first model is a random walk, it should be called
an ARIMA(0,1,0) model, but it could still be viewed as a special case of an
autoregressive model.
(b) We know that the theoretical acf of an MA(q) process will be zero after q
lags, so the acf of the MA(1) will be zero at all lags after one. For an
autoregressive process, the acf dies away gradually. It will die away fairly
quickly for case (2), with each successive autocorrelation coefficient taking on
a value equal to half that of the previous lag. For the first case, however, the
acf will never die away, and in theory will always take on a value of one,
whatever the lag.
Turning now to the pacf, the pacf for the first two models would have a large
positive spike at lag 1, and no statistically significant pacfs at other lags.
Again, the unit root process of (1) would have a pacf the same as that of a
stationary AR process. The pacf for (3), the MA(1), will decline geometrically.
(c) Clearly the first equation (the random walk) is more likely to represent
stock prices in practice. The discounted dividend model of share prices states
that the current value of a share will be simply the discounted sum of all
expected future dividends. If we assume that investors form their expectations
about dividend payments rationally, then the current share price should
embody all information that is known about the future of dividend payments,


and hence today's price should only differ from yesterday's by the amount of
unexpected news which influences dividend payments.
Thus stock prices should follow a random walk. Note that we could apply a
similar rational expectations and random walk model to many other kinds of
financial series.
If the stock market really followed the process described by equations (2) or
(3), then we could potentially make useful forecasts of the series using our
model. In the latter case of the MA(1), we could only make one-step ahead
forecasts since the memory of the model is only that length. In the case of
equation (2), we could potentially make a lot of money by forming multiple
step ahead forecasts and trading on the basis of these.
Hence after a period, it is likely that other investors would spot this potential
opportunity and hence the model would no longer be a useful description of
the data.
(d) See the book for the algebra. This part of the question is really an extension
of the others. Analysing the simplest case first, the MA(1), the memory of
the process will only be one period, and therefore a given shock or
innovation, ut, will only persist in the series (i.e. be reflected in yt) for one
period. After that, the effect of a given shock would have completely worked through.
For the case of the AR(1) given in equation (2), a given shock, ut, will persist
indefinitely and will therefore influence the properties of yt for ever, but its
effect upon yt will diminish exponentially as time goes on.
In the first case, the series yt could be written as an infinite sum of past
shocks, and therefore the effect of a given shock will persist indefinitely, and
its effect will not diminish over time.
4. (a) Box and Jenkins were the first to consider ARMA modelling in this
logical and coherent fashion. Their methodology consists of 3 steps:
Identification - determining the appropriate order of the model using
graphical procedures (e.g. plots of autocorrelation functions).
Estimation - of the parameters of the model of size given in the first stage.
This can be done using least squares or maximum likelihood, depending on
the model.
Diagnostic checking - this step is to ensure that the model actually estimated is
adequate. B & J suggest two methods for achieving this:
- Overfitting, which involves deliberately fitting a model larger than
that suggested in step 1 and testing the hypothesis that all the additional
coefficients can jointly be set to zero.



- Residual diagnostics. If the model estimated is a good description of

the data, there should be no further linear dependence in the residuals of the
estimated model. Therefore, we could calculate the residuals from the
estimated model, and use the Ljung-Box test on them, or calculate their acf. If
either of these reveal evidence of additional structure, then we assume that the
estimated model is not an adequate description of the data.
If the model appears to be adequate, then it can be used for policy analysis and
for constructing forecasts. If it is not adequate, then we must go back to stage 1
and start again!
(b) The main problem with the B & J methodology is the inexactness of the
identification stage. Autocorrelation functions and partial autocorrelations for
actual data are very difficult to interpret accurately, rendering the whole
procedure often little more than educated guesswork. A further problem
concerns the diagnostic checking stage, which will only indicate when the
proposed model is too small and would not inform on when the model
proposed is too large.
(c) We could use Akaike's (AIC) or Schwarz's Bayesian (SBIC) information
criteria. Our objective would then be to fit the model order that minimises
these. We can calculate the values of the two criteria using the following
respective formulae

AIC = ln(σ̂²) + 2k/T
SBIC = ln(σ̂²) + k ln(T)/T
The information criteria trade off an increase in the number of parameters
and therefore an increase in the penalty term against a fall in the RSS,
implying a closer fit of the model to the data.
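The two criteria can be written as small helper functions; the numerical values of σ̂², k and T used in the example call below are purely illustrative.

```python
import math

def aic(sigma2_hat, k, T):
    # Akaike's criterion: ln(residual variance) + 2k/T penalty
    return math.log(sigma2_hat) + 2 * k / T

def sbic(sigma2_hat, k, T):
    # Schwarz's criterion: ln(residual variance) + k*ln(T)/T penalty
    return math.log(sigma2_hat) + k * math.log(T) / T

# SBIC penalises extra parameters more heavily than AIC whenever ln(T) > 2,
# i.e. for any sample larger than about 7 observations
print(aic(1.2, 3, 100), sbic(1.2, 3, 100))
```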
5. The best way to check for stationarity is to express the model as a lag
polynomial in yt.

yt = 0.803yt−1 + 0.682yt−2 + ut

Rewrite this as

(1 − 0.803L − 0.682L²)yt = ut

We want to find the roots of the lag polynomial 1 − 0.803z − 0.682z² = 0 and
determine whether they are greater than one in absolute value. It is easier (in
my opinion) to rewrite this formula (by multiplying through by −1/0.682,
using z for the characteristic equation and rearranging) as

z² + 1.177z − 1.466 = 0



Using the standard formula for obtaining the roots of a quadratic equation,

z = [−1.177 ± √(1.177² + 4 × 1.466)] / 2 = 0.758 or −1.935

Since ALL the roots must be greater than one in absolute value for the model to
be stationary, and here one root (0.758) lies inside the unit circle, we
conclude that the estimated model is not stationary in this case.
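The same conclusion can be checked numerically by finding the roots of the lag polynomial with numpy; this is a sketch of the calculation above.

```python
import numpy as np

# 1 - 0.803z - 0.682z^2 = 0; np.roots takes coefficients from the highest power down
roots = np.roots([-0.682, -0.803, 1.0])
print(np.sort(roots))  # approximately [-1.935, 0.758]

# Stationarity requires ALL roots to lie outside the unit circle
stationary = bool(np.all(np.abs(roots) > 1))
print(stationary)  # False
```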
6. Using the formulae above, we end up with the following values for each
criterion and for each model order (with an asterisk denoting the smallest
value of the information criterion in each case).
[Table of log(σ̂²), AIC and SBIC values for each ARMA(p,q) model order not
reproduced here.]
The result is pretty clear: both SBIC and AIC say that the appropriate model is
an ARMA(3,2).
7. We could still perform the Ljung-Box test on the residuals of the estimated
models to see if there was any linear dependence left unaccounted for by our
postulated models.
Another test of the model's adequacy that we could use is to leave out some of
the observations at the identification and estimation stage, and attempt to
construct out of sample forecasts for these. For example, if we have 2000
observations, we may use only 1800 of them to identify and estimate the
models, and leave the remaining 200 for construction of forecasts. We would
then prefer the model that gave the most accurate forecasts.
8. This is not true in general. Yes, we do want to form a model which fits the
data as well as possible. But in most financial series, there is a substantial
amount of noise. This can be interpreted as a number of random events that
are unlikely to be repeated in any forecastable way. We want to fit a model to
the data which will be able to generalise. In other words, we want a model
which fits to features of the data which will be replicated in future; we do not
want to fit to sample-specific noise.



This is why we need the concept of parsimony - fitting the smallest possible
model to the data. Otherwise we may get a great fit to the data in sample, but
any use of the model for forecasts could yield terrible results.
Another important point is that the larger the number of estimated
parameters (i.e. the more variables we have), then the smaller will be the
number of degrees of freedom, and this will imply that coefficient standard
errors will be larger than they would otherwise have been. This could lead to a
loss of power in hypothesis tests, and variables that would otherwise have
been significant are now insignificant.
9. (a) We class an autocorrelation coefficient or partial autocorrelation
coefficient as significant if it exceeds ±1.96 × 1/√T = ±1.96/√100 = ±0.196.
Under this rule, the sample autocorrelation functions (sacfs) at lags 1 and 4
are significant, and the spacfs at lags 1, 2, 3, 4 and 5 are all significant.
This clearly looks like the data are consistent with a first-order moving
average process, since the acf is insignificant at all lags beyond the first
(the significant lag 4 acf is a typical wrinkle that one might expect with real
data and should probably be ignored), and the pacf has a slowly declining
structure.
(b) The formula for the Ljung-Box Q* test is given by

Q* = T(T + 2) Σ_{k=1}^{m} τ̂k² / (T − k)

using the standard notation. In this case, T = 100 and m = 3. The null
hypothesis is H0: τ1 = 0 and τ2 = 0 and τ3 = 0. The test statistic is
calculated as

Q* = 100 × 102 × [0.420²/(100 − 1) + 0.104²/(100 − 2) + 0.032²/(100 − 3)] = 19.41

The 5% and 1% critical values for a χ² distribution with 3 degrees of freedom
are 7.81 and 11.34 respectively. Clearly, then, we would reject the null
hypothesis that the first three autocorrelation coefficients are jointly not
significantly different from zero.
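The Q* calculation can be reproduced in a few lines, using only the three autocorrelation coefficients given.

```python
def ljung_box(taus, T):
    # Q* = T(T+2) * sum over k of tau_k^2 / (T - k)
    return T * (T + 2) * sum(tau**2 / (T - k) for k, tau in enumerate(taus, start=1))

q_star = ljung_box([0.420, 0.104, 0.032], T=100)
print(round(q_star, 2))  # 19.41, versus chi-square(3) critical values 7.81 and 11.34
```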
10. (a) To solve this, we need the concept of a conditional expectation,
i.e. E(yt | yt−1, yt−2, ...). For example, consider an AR(1) model such as

yt = a0 + a1yt−1 + ut

If we are now at time t−1 (dropping the t−1 subscript on the expectations
operator for brevity),

E(yt) = a0 + a1yt−1
E(yt+1) = a0 + a1E(yt) = a0 + a1(a0 + a1yt−1) = a0 + a0a1 + a1²yt−1
E(yt+2) = a0 + a1E(yt+1) = a0 + a1(a0 + a0a1 + a1²yt−1) = a0 + a0a1 + a0a1² + a1³yt−1

In forecasting notation, writing ft−1,s for the s-step ahead forecast made at
time t−1,

ft−1,1 = a0 + a1yt−1
ft−1,2 = a0 + a1ft−1,1
ft−1,3 = a0 + a1ft−1,2
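The recursion ft−1,s = a0 + a1 ft−1,s−1 is easy to iterate. The parameter and starting values below are hypothetical, chosen only to illustrate how the forecasts converge.

```python
a0, a1 = 0.1, 0.5   # hypothetical AR(1) parameters, with |a1| < 1
f = 2.0             # hypothetical last observed value, y_{t-1}
forecasts = []
for _ in range(10):
    f = a0 + a1 * f  # f_{t-1,s} = a0 + a1 * f_{t-1,s-1}
    forecasts.append(f)

# Multi-step forecasts converge to the unconditional mean a0/(1 - a1) = 0.2
print(forecasts[0], forecasts[-1])
```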

To forecast an MA model, consider, e.g.

yt = ut + b1ut−1

Then

E(yt | yt−1, yt−2, ...) = E(ut + b1ut−1) = b1ut−1

since Et−1(ut) = 0, while

E(yt+1 | yt−1, yt−2, ...) = E(ut+1 + b1ut) = 0

since neither ut+1 nor ut is known at time t−1.

Going back to the example above,

yt = 0.036 + 0.69yt−1 + 0.42ut−1 + ut

Suppose that we are at time t−1, we know yt−1, yt−2, ... and we are trying to
forecast yt. With yt−1 = 3.4 and ut−1 = −1.3, our forecast for t is given by

ft−1,1 = E(yt | yt−1, yt−2, ...) = 0.036 + 0.69yt−1 + 0.42ut−1
       = 0.036 + 0.69 × 3.4 + 0.42 × (−1.3) = 1.836



ft−1,2 = E(yt+1 | yt−1, yt−2, ...) = 0.036 + 0.69E(yt) + 0.42E(ut)

But we do not know yt or ut at time t−1. We replace yt with our forecast of it,
ft−1,1, and E(ut) = 0, so

ft−1,2 = 0.036 + 0.69ft−1,1 = 0.036 + 0.69 × 1.836 = 1.303

ft−1,3 = 0.036 + 0.69ft−1,2 = 0.036 + 0.69 × 1.303 = 0.935
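The three forecasts can be reproduced directly, using the values from the worked example (yt−1 = 3.4 and ut−1 = −1.3).

```python
mu, ar, ma = 0.036, 0.69, 0.42  # ARMA(1,1) parameter estimates from the example

f1 = mu + ar * 3.4 + ma * (-1.3)  # 1-step ahead: the MA term still contributes
f2 = mu + ar * f1                 # 2-step ahead: the MA(1) memory is exhausted
f3 = mu + ar * f2
print(round(f1, 3), round(f2, 3), round(f3, 3))  # 1.836 1.303 0.935
```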

(b) Given the forecasts and the actual values, it is very easy to calculate the
MSE by plugging the numbers into the relevant formula, which in this case is

MSE = (1/N) Σ_{n=1}^{N} (xt−1+n − ft−1,n)²

if we are making N forecasts, numbered 1, 2, 3, ... With actual values of
−0.032, 0.961 and 0.203, the MSE is given by

MSE = (1/3) [(1.836 − (−0.032))² + (1.303 − 0.961)² + (0.935 − 0.203)²]
    = (1/3) (3.489 + 0.117 + 0.536) = 1.381


Notice also that 84% of the total MSE is coming from the error in the first
forecast. Thus error measures can be driven by one or two times when the
model fits very badly. For example, if the forecast period includes a stock
market crash, this can lead the mean squared error to be 100 times bigger
than it would have been if the crash observations were not included. This point
needs to be considered whenever forecasting models are evaluated. An idea of
whether this is a problem in a given situation can be gained by plotting the
forecast errors over time.
(c) This question is much simpler to answer than it looks! In fact, the inclusion
of the smoothing coefficient is a red herring - i.e. a piece of misleading and
useless information. The correct approach is to say that if we believe that the
exponential smoothing model is appropriate, then all useful information will
have already been used in the calculation of the current smoothed value
(which will of course have used the smoothing coefficient in its calculation).
Thus the three forecasts are all 0.0305.
(d) The solution is to work out the mean squared error for the exponential
smoothing model. The calculation is



MSE = (1/3) [(0.0305 − (−0.032))² + (0.0305 − 0.961)² + (0.0305 − 0.203)²]
    = (1/3) (0.0039 + 0.8658 + 0.0298) = 0.2998

Therefore, we conclude that since the mean squared error is smaller for the
exponential smoothing model than the Box Jenkins model, the former
produces the more accurate forecasts. We should, however, bear in mind that
the question of accuracy was determined using only 3 forecasts, which would
be insufficient in a real application.
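The comparison can be sketched in a few lines, using the actual values (−0.032, 0.961, 0.203) from above.

```python
def mse(forecasts, actuals):
    # mean of squared forecast errors
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)

actuals = [-0.032, 0.961, 0.203]
mse_arma = mse([1.836, 1.303, 0.935], actuals)   # Box-Jenkins forecasts
mse_es = mse([0.0305, 0.0305, 0.0305], actuals)  # exponential smoothing forecasts
print(round(mse_arma, 3), round(mse_es, 4))  # 1.381 0.2998
```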
11. (a) The shapes of the acf and pacf are perhaps best summarised in a table
(the process labels follow from the standard patterns):

Process       acf                                  pacf
White noise   No significant coefficients          No significant coefficients
AR(2)         Geometrically declining or           First 2 pacf coefficients
              damped sinusoid acf                  significant, all others insignificant
MA(1)         First acf coefficient significant,   Geometrically declining or
              all others insignificant             damped sinusoid pacf
ARMA(1,1)     Geometrically declining or           Geometrically declining or
              damped sinusoid acf                  damped sinusoid pacf

A couple of further points are worth noting. First, it is not possible to tell what
the signs of the coefficients for the acf or pacf would be for the last three
processes, since that would depend on the signs of the coefficients of the
processes. Second, for mixed processes, the AR part dominates from the point
of view of acf calculation, while the MA part dominates for pacf calculation.
(b) The important point here is to focus on the MA part of the model and to
ignore the AR dynamics. The characteristic equation would be
(1+0.42z) = 0
The root of this equation is -1/0.42 = -2.38, which lies outside the unit circle,
and therefore the MA part of the model is invertible.
(c) Since no values for the series y or the lagged residuals are given, the
answers should be stated in terms of y and of u. Assuming that information is
available up to and including time t, the 1-step ahead forecast would be for
time t+1, the 2-step ahead for time t+2 and so on. A useful first step would be
to write the model out for y at times t+1, t+2, t+3, t+4:
yt+1 = 0.036 + 0.69yt + 0.42ut + ut+1
yt+2 = 0.036 + 0.69yt+1 + 0.42ut+1 + ut+2
yt+3 = 0.036 + 0.69yt+2 + 0.42ut+2 + ut+3
yt+4 = 0.036 + 0.69yt+3 + 0.42ut+3 + ut+4

The 1-step ahead forecast would simply be the conditional expectation of y for
time t+1 made at time t. Denoting the 1-step ahead forecast made at time t as
ft,1, the 2-step ahead forecast made at time t as ft,2 and so on:

ft,1 = Et[yt+1] = Et[0.036 + 0.69yt + 0.42ut + ut+1] = 0.036 + 0.69yt + 0.42ut

since Et[ut+1] = 0. The 2-step ahead forecast would be given by

ft,2 = Et[yt+2] = Et[0.036 + 0.69yt+1 + 0.42ut+1 + ut+2] = 0.036 + 0.69ft,1

since Et[ut+1] = 0 and Et[ut+2] = 0. Thus, beyond 1-step ahead, the MA(1) part
of the model disappears from the forecast and only the autoregressive part
remains. Although we do not know yt+1, its expected value is the 1-step ahead
forecast that was made at the first stage, ft,1. The 3-step ahead forecast
would be given by

ft,3 = Et[yt+3] = Et[0.036 + 0.69yt+2 + 0.42ut+2 + ut+3] = 0.036 + 0.69ft,2

and the 4-step ahead by

ft,4 = Et[yt+4] = Et[0.036 + 0.69yt+3 + 0.42ut+3 + ut+4] = 0.036 + 0.69ft,3

(d) A number of methods for aggregating the forecast errors to produce a

single forecast evaluation measure were suggested in the paper by Makridakis
and Hibon (1995) and some discussion is presented in the book. Any of the
methods suggested there could be discussed. A good answer would present an
expression for the evaluation measures, with any notation introduced being
carefully defined, together with a discussion of why the measure takes the
form that it does and what the advantages and disadvantages of its use are
compared with other methods.
(e) Moving average and ARMA models cannot be estimated using OLS; they
are usually estimated by maximum likelihood. Autoregressive models can be
estimated using OLS or maximum likelihood. Pure autoregressive models
contain only lagged values of observed quantities on the RHS, and therefore,
the lags of the dependent variable can be used just like any other regressors.
However, in the context of MA and mixed models, the lagged values of the
error term that occur on the RHS are not known a priori. Hence, these
quantities are replaced by the residuals, which are not available until after the
model has been estimated. But equally, these residuals are required in order to
be able to estimate the model parameters. Maximum likelihood essentially
works around this by calculating the values of the coefficients and the
residuals at the same time. Maximum likelihood involves selecting the most
likely values of the parameters given the actual data sample, and given an



assumed statistical distribution for the errors. This technique will be

discussed in greater detail in the section on volatility modelling in Chapter 8.
12. (a) Some of the stylised differences between the typical characteristics of
macroeconomic and financial data were presented in Chapter 1. In particular,
one important difference is the frequency with which financial asset return
time series and other quantities in finance can be recorded. This is of
particular relevance for the models discussed in Chapter 5, since it is usually a
requirement that all of the time-series data series used in estimating a given
model must be of the same frequency. Thus, if, for example, we wanted to
build a model for forecasting hourly changes in exchange rates, it would be
difficult to set up a structural model containing macroeconomic explanatory
variables since the macroeconomic variables are likely to be measured on a
quarterly or at best monthly basis. This gives a motivation for using pure time-series approaches (e.g. ARMA models), rather than structural formulations
with separate explanatory variables.
It is also often of particular interest to produce forecasts of financial variables
in real time. Producing forecasts from pure time-series models is usually
simply an exercise in iterating with conditional expectations. But producing
forecasts from structural models is considerably more difficult, and would
usually require the production of forecasts for the structural variables as well.
(b) A simple rule of thumb for determining whether autocorrelation
coefficients and partial autocorrelation coefficients are statistically significant
is to classify them as significant at the 5% level if they lie outside of
±1.96 × 1/√T, where T is the sample size. In this case, T = 500, so a particular
coefficient would be deemed significant if it is larger than 0.088 or smaller
than −0.088. On this basis, the autocorrelation coefficients at lags 1 and 5 and
the partial autocorrelation coefficients at lags 1, 2, and 3 would be classed as
significant. The formulae for the Box-Pierce and the Ljung-Box test statistics
are respectively

Q = T Σ_{k=1}^{m} τ̂k²

Q* = T(T + 2) Σ_{k=1}^{m} τ̂k² / (T − k)

In this instance, the statistics would be calculated respectively as

Q = 500 × [0.307² + (−0.013)² + 0.086² + 0.031² + (−0.197)²] = 70.79

Q* = 500 × 502 × [0.307²/499 + (−0.013)²/498 + 0.086²/497 + 0.031²/496 + (−0.197)²/495] = 71.39


The test statistics will both follow a χ² distribution with 5 degrees of freedom
(the number of autocorrelation coefficients being used in the test). The critical
values are 11.07 and 15.09 at 5% and 1% respectively. Clearly, the null
hypothesis that the first 5 autocorrelation coefficients are jointly zero is
resoundingly rejected.
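Both statistics can be computed from the five coefficients given; this is a sketch of the calculation above.

```python
taus = [0.307, -0.013, 0.086, 0.031, -0.197]  # sample acf coefficients, lags 1-5
T = 500

q_bp = T * sum(tau**2 for tau in taus)  # Box-Pierce
q_lb = T * (T + 2) * sum(tau**2 / (T - k) for k, tau in enumerate(taus, start=1))  # Ljung-Box
print(round(q_bp, 2), round(q_lb, 2))  # 70.79 71.39, both above the 1% critical value 15.09
```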
(c) Setting aside the lag 5 autocorrelation coefficient, the pattern in the table is
for the autocorrelation coefficient to only be significant at lag 1 and then to fall
rapidly to values close to zero, while the partial autocorrelation coefficients
appear to fall much more slowly as the lag length increases. These
characteristics would lead us to think that an appropriate model for this series
is an MA(1). Of course, the autocorrelation coefficient at lag 5 is an anomaly
that does not fit in with the pattern of the rest of the coefficients. But such a
result would be typical of a real data series (as opposed to a simulated data
series that would have a much cleaner structure). This serves to illustrate that
when econometrics is used for the analysis of real data, the data generating
process was almost certainly not any of the models in the ARMA family. So all
we are trying to do is to find a model that best describes the features of the
data to hand. As one econometrician put it, all models are wrong, but some are useful.
(d) Forecasts from this ARMA model would be produced in the usual way.
Using the same notation as above, and letting fz,1 denote the forecast for time
z+1 made for x at time z, etc:
Model A: MA(1)

fz,1 = 0.38 − 0.10uz = 0.38 − 0.10 × 0.02 = 0.378
fz,2 = fz,3 = fz,4 = 0.38
Note that the MA(1) model only has a memory of one period, so all forecasts
further than one step ahead will be equal to the intercept.
Model B: AR(2)
xt = 0.63 + 0.17xt−1 − 0.09xt−2 + ut

fz,1 = 0.63 + 0.17 × 0.31 − 0.09 × 0.02 = 0.681
fz,2 = 0.63 + 0.17 × 0.681 − 0.09 × 0.31 = 0.718
fz,3 = 0.63 + 0.17 × 0.718 − 0.09 × 0.681 = 0.691
fz,4 = 0.63 + 0.17 × 0.691 − 0.09 × 0.718 = 0.683
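Model B's forecasts come from iterating the AR(2), replacing unknown future values with their own forecasts; a sketch with xz = 0.31 and xz−1 = 0.02 as in the numbers above.

```python
prev2, prev1 = 0.02, 0.31  # x_{z-1} and x_z
forecasts = []
for _ in range(4):
    f = 0.63 + 0.17 * prev1 - 0.09 * prev2  # AR(2) recursion
    forecasts.append(f)
    prev2, prev1 = prev1, f  # unknown future values replaced by their forecasts
print([round(f, 3) for f in forecasts])  # [0.681, 0.718, 0.691, 0.683]
```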
(e) The methods are overfitting and residual diagnostics. Overfitting involves
selecting a deliberately larger model than the proposed one, and examining
the statistical significances of the additional parameters. If the additional
parameters are statistically insignificant, then the originally postulated model
is deemed acceptable. The larger model would usually involve the addition of



one extra MA term and one extra AR term. Thus it would be sensible to try an
ARMA(1,2) in the context of Model A, and an ARMA(3,1) in the context of
Model B. Residual diagnostics would involve examining the acf and pacf of the
residuals from the estimated model. If the residuals showed any action, that
is, if any of the acf or pacf coefficients showed statistical significance, this
would suggest that the original model was inadequate. Residual diagnostics
in the Box-Jenkins sense of the term involved only examining the acf and pacf,
rather than the array of diagnostics considered in Chapter 4.
It is worth noting that these two model evaluation procedures would only
indicate a model that was too small. If the model were too large, i.e. it had
superfluous terms, these procedures would deem the model adequate.
(f) There are obviously several forecast accuracy measures that could be
employed, including MSE, MAE, and the percentage of correct sign
predictions. Assuming that MSE is used, the MSE for each model is

MSE(Model A) = (1/4) [(0.378 − 0.62)² + (0.38 − 0.19)² + (0.38 − 0.32)² + (0.38 − 0.72)²] = 0.053

MSE(Model B) = (1/4) [(0.681 − 0.62)² + (0.718 − 0.19)² + (0.691 − 0.32)² + (0.683 − 0.72)²] = 0.105

Therefore, since the mean squared error for Model A is smaller, it would be
concluded that the moving average model is the more accurate of the two in
this case.


Lecture 5. Multivariate time series

1. (a) This is simple to accomplish in theory, but difficult in practice as a
result of the algebra. The original equations are (renumbering them (1),
(2) and (3) for simplicity)

y1t 0 1 y 2 t 2 y 3t 3 X 1t 4 X 2 t u1t
y 2 t 0 1 y 3t 2 X 1t 3 X 3t u2 t
y 3t 0 1 y1t 2 X 2 t 3 X 3t u3t

( 3)

The easiest place to start (I think) is to take equation (1), and substitute in for
y3t, to get

y1t 0 1 y2t 2 ( 0 1 y1t 2 X 2t 3 X 3t u3t ) 3 X 1t 4 X 2t u1t

Working out the products that arise when removing the brackets,

y1t 0 1 y2t 2 0 2 1 y1t 2 2 X 2t 2 3 X 3t 2 u3t 3 X 1t 4 X 2t u1t

Gathering terms in y1t on the LHS:

y1t 2 1 y1t 0 1 y2t 2 0 2 2 X 2t 2 3 X 3t 2 u3t 3 X 1t 4 X 2t u1t

y1t (1 2 1 ) 0 1 y2t 2 0 2 2 X 2t 2 3 X 3t 2 u3t 3 X 1t 4 X 2t u1t
Now substitute into (2) for y3t from (3).
y2t 0 1 ( 0 1 y1t 2 X 2t 3 X 3t u3t ) 2 X 1t 3 X 3t u2t
Removing the brackets

y2t 0 1 0 1 1 y1t 1 2 X 2t 1 3 X 3t 1u3t 2 X 1t 3 X 3t u2t


Substituting into (4) for y2t from (5),

y1t (1 2 1 ) 0 1 ( 0 1 0 1 1 y1t 2 X 2 t 1 3 X 3t 1u3t 2 X 1t

3 X 3t u2 t ) 2 0 2 2 X 2 t 2 3 X 3t 2 u3t 3 X 1t 4 X 2 t u1t
Taking the y1t terms to the LHS:

y1t (1 2 1 1 1 1 ) 0 1 0 1 1 0 1 2 X 2 t 1 1 3 X 3t 1 1u3t 1 2 X 1t

13 X 3t 1u2 t 2 0 2 2 X 2 t 2 3 X 3t 2 u3t 3 X 1t 4 X 2 t u1t

Gathering like-terms in the other variables together:


Introductory Econometrics for Finance Chris Brooks 2008

y1t (1 2 1 1 1 1 ) 0 1 0 1 1 0 2 0 X 1t (1 2 3 ) X 2 t (1 1 2 2 2 4 )
X 3t (1 1 3 1 3 2 3 ) u3t (1 1 2 ) 1 u2 t u1t

Multiplying all through equation (3) by (1 2 1 1 1 1 ) :

y3t (1 2 1 11 1 ) 0 (1 2 1 11 1 ) 1 y1t (1 2 1 11 1 )

2 X 2 t (1 2 1 11 1 ) 3 X 3t (1 2 1 11 1 ) u3t (1 2 1 11 1 )
Replacing y1t (1 2 1 11 1 )

in (7) with the RHS of (6),

0 1 0 1 1 0 2 0 X 1t (1 2 3 )

y 3t (1 2 1 11 1 ) 0 (1 2 1 11 1 ) 1 X 2 t (1 1 2 2 2 4 ) X 3t (1 1 3 1 3
2 3 ) u3t (1 1 2 ) 1u2 t u1t

2 X 2 t (1 2 1 11 1 ) 3 X 3t (1 2 1 11 1 ) u3t (1 2 1 11 1 )
Expanding the brackets in equation (8) and cancelling the relevant terms

y3t (1 2 1 11 1 ) 0 10 11 0 X 1t (1 2 1 1 3 ) X 2 t ( 2 14 )
X 3t ( 11 3 3 ) u3t 11u2 t 1u1t
Multiplying all through equation (2) by (1 2 1 1 1 1 ) :

y2 t (1 1 1 1 12 ) 0 (1 1 1 1 12 ) 1 y3t (1 1 1 1 12 )

2 X 1t (1 1 1 1 12 ) 3 X 3t (1 1 11 12 ) u2 t (1 1 1 1 12 )
Replacing y3t (1 2 1 11 1 )

in (10) with the RHS of (9),

0 1 0 11 0 X 1t (1 2 1 1 3 )

y 2 t (1 1 1 1 1 2 ) 0 (1 1 1 1 12 ) 1 X 2 t ( 2 1 4 ) X 3t ( 3 11 3 ) u3t
11u2 t 1u1t

2 X 1t (1 1 1 1 12 ) 3 X 3t (1 1 1 1 1 2 ) u2 t (1 1 1 1 1 2 )
Expanding the brackets in (11) and cancelling the relevant terms

y2t (1 1 1
1 12 ) 0 02 1
1 0
1 10 X 1t
1 1 3 2 22 1 ) X 2 t (
1 2
1 14 )
X 3t (
1 3 3 32 1 ) 1u3t u2 t (1 2 1 )
1 1u1t


Introductory Econometrics for Finance Chris Brooks 2008

Although it might not look like it (!), equations (6), (12), and (9) respectively
will give the reduced form equations corresponding to (1), (2), and (3), by
doing the necessary division to make y1t, y2t, or y3t the subject of the formula.
From (6),

y_{1t} = \frac{\alpha_0 + \alpha_1\beta_0 + \alpha_1\beta_1\gamma_0 + \alpha_2\gamma_0}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} + \frac{\alpha_1\beta_2 + \alpha_3}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{1t} + \frac{\alpha_1\beta_1\gamma_2 + \alpha_2\gamma_2 + \alpha_4}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{2t}
+ \frac{\alpha_1\beta_1\gamma_3 + \alpha_1\beta_3 + \alpha_2\gamma_3}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{3t} + \frac{u_{3t}(\alpha_1\beta_1 + \alpha_2) + \alpha_1 u_{2t} + u_{1t}}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1}   (13)
From (12),

y_{2t} = \frac{\beta_0 - \beta_0\alpha_2\gamma_1 + \beta_1\gamma_0 + \beta_1\gamma_1\alpha_0}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} + \frac{\beta_1\gamma_1\alpha_3 + \beta_2 - \beta_2\alpha_2\gamma_1}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{1t} + \frac{\beta_1\gamma_2 + \beta_1\gamma_1\alpha_4}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{2t}
+ \frac{\beta_1\gamma_3 + \beta_3 - \beta_3\alpha_2\gamma_1}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{3t} + \frac{\beta_1 u_{3t} + u_{2t}(1 - \alpha_2\gamma_1) + \beta_1\gamma_1 u_{1t}}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1}   (14)
From (9),

y_{3t} = \frac{\gamma_0 + \gamma_1\alpha_0 + \gamma_1\alpha_1\beta_0}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} + \frac{\gamma_1\alpha_1\beta_2 + \gamma_1\alpha_3}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{1t} + \frac{\gamma_2 + \gamma_1\alpha_4}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{2t}
+ \frac{\gamma_1\alpha_1\beta_3 + \gamma_3}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1} X_{3t} + \frac{u_{3t} + \gamma_1\alpha_1 u_{2t} + \gamma_1 u_{1t}}{1 - \alpha_2\gamma_1 - \alpha_1\beta_1\gamma_1}   (15)


Notice that all of the reduced form equations (13)-(15) in this case depend on
all of the exogenous variables, which is not always the case, and that the
equations contain only exogenous variables on the RHS, which must be the
case for these to be reduced forms.
(b) The term identification refers to whether or not it is in fact possible to
obtain the structural form coefficients (the , , and s in equations (1)-(3))
from the reduced form coefficients (the s) by substitution. An equation can
be over-identified, just-identified, or under-identified, and the equations in a
system can have differing orders of identification. If an equation is under-identified (or not identified), then we cannot obtain the structural form
coefficients from the reduced forms using any technique. If it is just identified,
we can obtain unique structural form estimates by back-substitution, while if
it is over-identified, we cannot obtain unique structural form estimates by
substituting from the reduced forms.
There are two rules for determining the degree of identification of an
equation: the rank condition, and the order condition. The rank condition is a
necessary and sufficient condition for identification, so if the rule is satisfied,
it guarantees that the equation is indeed identified. The rule centres around a
restriction on the rank of a sub-matrix containing the reduced form



coefficients, and is rather complex and not particularly illuminating, and was
therefore not covered in this course.
The order condition, can be expressed in a number of ways, one of which is the
following. Let G denote the number of structural equations (equal to the
number of endogenous variables). An equation is just identified if G-1
variables are absent. If more than G-1 are absent, then the equation is over-identified, while if fewer are absent, then it is not identified.
Applying this rule to equations (1)-(3), G=3, so for an equation to be
identified, we require 2 to be absent. The variables in the system are y1, y2, y3,
X1, X2, X3. Is this the case?
Equation (1): X3t only is missing, so the equation is not identified.
Equation (2): y1t and X2t are missing, so the equation is just identified.
Equation (3): y2t and X1t are missing, so the equation is just identified.
However, the order condition is only a necessary (and not a sufficient)
condition for identification, so there will exist cases where a given equation
satisfies the order condition, but we still cannot obtain the structural form
coefficients. Fortunately, for small systems this is rarely the case. Also, in
practice, most systems are designed to contain equations that are over-identified.
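The counting exercise behind the order condition can be sketched in a few lines of code. The variable lists below simply encode the system described above (equation (1) omits only X3t, and so on), and the helper name is of course just illustrative:

```python
# Sketch: applying the order condition by counting absent variables.
all_vars = ["y1", "y2", "y3", "X1", "X2", "X3"]
included = {
    1: ["y1", "y2", "y3", "X1", "X2"],   # equation (1): only X3 missing
    2: ["y2", "y3", "X1", "X3"],         # equation (2): y1 and X2 missing
    3: ["y3", "y1", "X2", "X3"],         # equation (3): y2 and X1 missing
}
G = 3  # number of structural equations (= number of endogenous variables)

def order_condition(eq):
    absent = len(all_vars) - len(included[eq])
    if absent < G - 1:
        return "not identified"
    elif absent == G - 1:
        return "just identified"
    return "over-identified"

for eq in included:
    print(f"Equation ({eq}): {order_condition(eq)}")
```

Running this reproduces the conclusions above: equation (1) is not identified, while equations (2) and (3) are just identified.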
(c). It was stated in Chapter 4 that omitting a relevant variable from a
regression equation would lead to an omitted variable bias (in fact an
inconsistency as well), while including an irrelevant variable would lead to
unbiased but inefficient coefficient estimates. There is a direct analogy with
the simultaneous variable case. Treating a variable as exogenous when it really
should be endogenous because there is some feedback, will result in biased
and inconsistent parameter estimates. On the other hand, treating a variable
as endogenous when it really should be exogenous (that is, having an equation
for the variable and then substituting the fitted value from the reduced form if
2SLS is used, rather than just using the actual value of the variable) would
result in unbiased but inefficient coefficient estimates.
If we take the view that consistency and unbiasedness are more important than
efficiency (which is the view that I think most econometricians would take),
this implies that treating an endogenous variable as exogenous represents the
more severe mis-specification. So if in doubt, include an equation for it!
(Although, of course, we can test for exogeneity using a Hausman-type test).
(d). A tempting response to the question might be to describe indirect least
squares (ILS), that is estimating the reduced form equations by OLS and then
substituting back to get the structural forms; however, this response would be
WRONG, since the question tells us that the system is over-identified.
A correct answer would be to describe either two stage least squares (2SLS) or
instrumental variables (IV). Either would be acceptable, although IV requires
the user to determine an appropriate set of instruments and hence 2SLS is
simpler in practice. 2SLS involves estimating the reduced form equations, and



obtaining the fitted values in the first stage. In the second stage, the structural
form equations are estimated, but replacing the endogenous variables on the
RHS with their stage one fitted values. Application of this technique will yield
unique and unbiased structural form coefficients.
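The mechanics of the two stages can be sketched on simulated data. Everything below (the coefficient values, the simple two-equation data-generating process, the `ols` helper) is invented for illustration and is not the system from question 1; here X2 is excluded from the y1 equation and so provides the identifying variation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical DGP: y2 is endogenous in the y1 equation because its
# error contains u1; X1 and X2 are exogenous
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
u1 = rng.normal(size=n)
u2 = rng.normal(size=n)
y2 = 1.0 + 0.5 * X1 + 0.8 * X2 + u2 + 0.5 * u1
y1 = 2.0 + 1.5 * y2 + 1.0 * X1 + u1     # true y2 coefficient is 1.5

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: regress the endogenous regressor on ALL exogenous variables
Z = np.column_stack([np.ones(n), X1, X2])
y2_hat = Z @ ols(Z, y2)

# Stage 2: structural equation with y2 replaced by its fitted values
W = np.column_stack([np.ones(n), y2_hat, X1])
beta_2sls = ols(W, y1)

# Naive OLS for comparison: biased upwards here because cov(y2, u1) > 0
beta_ols = ols(np.column_stack([np.ones(n), y2, X1]), y1)

print("2SLS estimate of the y2 coefficient:", beta_2sls[1])
print("OLS estimate of the y2 coefficient: ", beta_ols[1])
```

Note that the second-stage OLS standard errors are not the correct 2SLS standard errors, so inference in practice should use a dedicated 2SLS routine rather than this hand-rolled second stage.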
2. (a) A glance at equations (6.97) and (6.98) reveals that the dependent
variable in (6.97) appears as an explanatory variable in (6.98) and that the
dependent variable in (6.98) appears as an explanatory variable in (6.97). The
result is that it would be possible to show that the explanatory variable y2t in
(6.97) will be correlated with the error term in that equation, u1t, and that the
explanatory variable y1t in (6.98) will be correlated with the error term in that
equation, u2t. Thus, there is causality from y1t to y2t and from y2t to y1t, so that
this is a simultaneous equations system. If OLS were applied separately to
each of equations (6.97) and (6.98), the result would be biased and
inconsistent parameter estimates. That is, even with an infinitely large
number of observations, OLS could not be relied upon to deliver the
appropriate parameter estimates.
(b) If the variable y1t had not appeared on the RHS of equation (6.98), this
would no longer be a simultaneous system, but would instead be an example
of a triangular system (see question 3). Thus it would be valid to apply OLS
separately to each of the equations (6.97) and (6.98).
(c) The order condition for determining whether an equation from a
simultaneous system is identified was described in question 1, part (b). There
are 2 equations in the system of (6.97) and (6.98), so that only 1 variable
would have to be missing from an equation to make it just identified. If no
variables are absent, the equation would not be identified, while if more than
one were missing, the equation would be over-identified. Considering
equation (6.97), no variables are missing so that this equation is not
identified, while equation (6.98) excludes only variable X2t, so that it is just identified.
(d) Since equation (6.97) is not identified, no method could be used to obtain
estimates of the parameters of this equation, while either ILS or 2SLS could be
used to obtain estimates of the parameters of (6.98), since it is just identified.
ILS operates by obtaining and estimating the reduced form equations and
then obtaining the structural parameters of (6.98) by algebraic back-substitution. 2SLS involves again obtaining and estimating the reduced form
equations, and then estimating the structural equations but replacing the
endogenous variables on the RHS of (6.97) and (6.98) with their reduced form
fitted values.
Comparing between ILS and 2SLS, the former method only requires one set of
estimations rather than two, but this is about its only advantage, and
conducting a second stage OLS estimation is usually a computationally trivial
exercise. The primary disadvantage of ILS is that it is only applicable to just
identified equations, whereas many sets of equations that we may wish to
estimate are over-identified. Second, obtaining the structural form coefficients



via algebraic substitution can be a very tedious exercise in the context of large
systems (as the solution to question 1, part (a) shows!).
(e) The Hausman procedure works by first obtaining and estimating the
reduced form equations, and then estimating the structural form equations
separately using OLS, but also adding the fitted values from the reduced form
estimations as additional explanatory variables in the equations where those
variables appear as endogenous RHS variables. Thus, if the reduced form
fitted values corresponding to equations (6.97) and (6.98) are given by \hat{y}_{1t} and
\hat{y}_{2t} respectively, the Hausman test equations would be

y_{1t} = \alpha_0 + \alpha_1 y_{2t} + \alpha_2 X_{1t} + \alpha_3 X_{2t} + \alpha_4 \hat{y}_{2t} + u_{1t}
y_{2t} = \beta_0 + \beta_1 y_{1t} + \beta_2 X_{1t} + \beta_3 \hat{y}_{1t} + u_{2t}

Separate tests of the significance of the \hat{y}_{1t} and \hat{y}_{2t} terms would then be
performed. If it were concluded that they were both significant, this would
imply that additional explanatory power can be obtained by treating the
variables as endogenous.
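A sketch of this Hausman-type regression on simulated data follows; the structural coefficients, sample size and helper function are all invented for illustration. Only the second test equation is run here: in a system with exogenous variables X1 and X2 only, the fitted value \hat{y}_{2t} is an exact linear combination of the constant, X1t and X2t already included in (6.97), so that augmented regression would be perfectly collinear (consistent with (6.97) not being identified).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Hypothetical system in the spirit of (6.97)-(6.98):
#   y1 = 1 + 0.5*y2 + X1 + X2 + u1,   y2 = 2 + 0.4*y1 + X1 + u2
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
u1 = rng.normal(size=n)
u2 = rng.normal(size=n)
# generate the data from the solved (reduced) form so both equations hold
y2 = (2.4 + 1.4 * X1 + 0.4 * X2 + 0.4 * u1 + u2) / 0.8
y1 = 1 + 0.5 * y2 + X1 + X2 + u1

def ols(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se

# reduced-form fitted values for y1 (regression on all exogenous variables)
Z = np.column_stack([np.ones(n), X1, X2])
y1_hat = Z @ ols(Z, y1)[0]

# Hausman-type regression for the y2 equation: add y1_hat and t-test it
H = np.column_stack([np.ones(n), y1, X1, y1_hat])
b, se = ols(H, y2)
t_fitted = b[3] / se[3]
print("t-ratio on the fitted-value term:", t_fitted)
```

Because y1t really is endogenous in this simulated system, the t-ratio on the fitted-value term comes out clearly significant.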
3. An example of a triangular system was given in Section 6.7. Consider a
scenario where there are only two endogenous variables. The key distinction
between this and a fully simultaneous system is that in the case of a triangular
system, causality runs only in one direction, whereas for a simultaneous
equation, it would run in both directions. Thus, to give an example, for the
system to be triangular, y1 could appear in the equation for y2 and not vice
versa. For the simultaneous system, y1 would appear in the equation for y2,
and y2 would appear in the equation for y1.
4. (a) p=2 and k=3 implies that there are two variables in the system, and that
both equations have three lags of the two variables. The VAR can be written in
long-hand form as:

y_{1t} = \alpha_{10} + \beta_{111} y_{1t-1} + \beta_{211} y_{2t-1} + \beta_{112} y_{1t-2} + \beta_{212} y_{2t-2} + \beta_{113} y_{1t-3} + \beta_{213} y_{2t-3} + u_{1t}
y_{2t} = \alpha_{20} + \beta_{121} y_{1t-1} + \beta_{221} y_{2t-1} + \beta_{122} y_{1t-2} + \beta_{222} y_{2t-2} + \beta_{123} y_{1t-3} + \beta_{223} y_{2t-3} + u_{2t}

where \alpha_0 = (\alpha_{10}, \alpha_{20})', y_t = (y_{1t}, y_{2t})', u_t = (u_{1t}, u_{2t})', and the coefficients on
the lags of y_t are defined as follows: \beta_{ijk} refers to the kth lag of the ith
variable in the jth equation. This seems like a natural notation to use, although
of course any sensible alternative would also be correct.
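Because every RHS variable is a lag, each equation of this VAR can be estimated by OLS, and both equations share the same regressors. A minimal sketch on simulated data (the coefficient matrices are invented, chosen only to give a stable system):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300

# Illustrative coefficient matrices for a stable bivariate VAR(3)
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
A2 = np.array([[0.1, 0.0], [0.0, 0.1]])
A3 = np.array([[0.05, 0.0], [0.0, 0.05]])

y = np.zeros((T, 2))
for t in range(3, T):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + A3 @ y[t - 3] + rng.normal(size=2)

# The same regressor matrix (intercept plus three lags of both variables)
# serves every equation, so the whole system fits in one lstsq call
X = np.column_stack([np.ones(T - 3)] + [y[3 - k:T - k] for k in (1, 2, 3)])
coefs = np.linalg.lstsq(X, y[3:], rcond=None)[0]   # (7, 2): one column per equation

# In the notation above, coefs[1, 0] estimates beta_111 (lag 1 of variable 1
# in equation 1) and coefs[2, 0] estimates beta_211
print("equation 1 coefficients:", coefs[:, 0])
print("equation 2 coefficients:", coefs[:, 1])
```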
(b) This is basically a what are the advantages of VARs compared with
structural models? type question, to which a simple and effective response
would be to list and explain the points made in the book.



The most important point is that structural models require the researcher to
specify some variables as being exogenous (if all variables were endogenous,
then none of the equations would be identified, and therefore estimation of the
structural equations would be impossible). This can be viewed as a restriction
(a restriction that the exogenous variables do not have any simultaneous
equations feedback), often called an identifying restriction. Determining
what are the identifying restrictions is supposed to be based on economic or
financial theory, but Sims, who first proposed the VAR methodology, argued
that such restrictions were 'incredible'. He thought that they were too loosely
based on theory, and were often specified by researchers on the basis of giving
the restrictions that the models required to make the equations identified.
Under a VAR, all the variables have equations, and so in a sense, every variable
is endogenous, which takes the ability to cheat (either deliberately or
inadvertently) or to mis-specify the model in this way, out of the hands of the
researcher.
Another possible reason why VARs are popular in the academic literature is
that standard form VARs can be estimated using OLS since all of the lags on
the RHS are counted as pre-determined variables.
Further, a glance at the academic literature which has sought to compare the
forecasting accuracies of structural models with VARs, reveals that VARs seem
to be rather better at forecasting (perhaps because the identifying restrictions
are not valid). Thus, from a purely pragmatic point of view, researchers may
prefer VARs if the purpose of the modelling exercise is to produce precise
point forecasts.
(c) VARs have, of course, also been subject to criticisms. The most important
of these criticisms is that VARs are atheoretical. In other words, they use very
little information from economic or financial theory to guide the model
specification process. The result is that the models often have little or no
theoretical interpretation, so that they are of limited use for testing and
evaluating theories.
Second, VARs can often contain a lot of parameters. The resulting loss in
degrees of freedom if the VAR is unrestricted and contains a lot of lags, could
lead to a loss of efficiency and the inclusion of lots of irrelevant or marginally
relevant terms. Third, it is not clear how the VAR lag lengths should be
chosen. Different methods are available (see part (d) of this question), but
they could lead to widely differing answers.
Finally, the very tools that have been proposed to help to obtain useful
information from VARs, i.e. impulse responses and variance decompositions,
are themselves difficult to interpret! See Runkle (1987).
(d) The two methods that we have examined are model restrictions and
information criteria. Details on how these work are contained in Sections
6.12.4 and 6.12.5. But briefly, the model restrictions approach involves
starting with the larger of the two models and testing whether it can be
restricted down to the smaller one using the likelihood ratio test based on the



determinants of the variance-covariance matrices of residuals in each case.

The alternative approach would be to examine the value of various
information criteria and to select the model that minimises the criteria. Since
there are only two models to compare, either technique could be used. The
restriction approach assumes normality for the VAR error terms, while use of
the information criteria does not. On the other hand, the information criteria
can lead to quite different answers depending on which criterion is used and
the severity of its penalty term. A completely different approach would be to
put the VARs in the situation that they were intended for (e.g. forecasting,
making trading profits, determining a hedge ratio etc.), and see which one



Lecture 6. Modelling long-run relationships in finance

1. (a) Many series in finance and economics in their levels (or log-levels) forms
are non-stationary and exhibit stochastic trends. They have a tendency not to
revert to a mean level, but they wander for prolonged periods in one
direction or the other. Examples would be most kinds of asset or goods prices,
GDP, unemployment, money supply, etc. Such variables can usually be made
stationary by transforming them into their differences or by constructing
percentage changes of them.
(b) Non-stationarity can be an important determinant of the properties of a
series. Also, if two series are non-stationary, we may experience the problem
of spurious regression. This occurs when we regress one non-stationary
variable on a completely unrelated non-stationary variable, and the regression
yields a reasonably high value of R2, apparently indicating that the model fits well.
Most importantly therefore, we are not able to perform any hypothesis tests in
models which inappropriately use non-stationary data since the test statistics
will no longer follow the distributions which we assumed they would (e.g. a t
or F), so any inferences we make are likely to be invalid.
(c) A weakly stationary process was defined in Chapter 5, and has the
following characteristics:
1. E(y_t) = \mu
2. E[(y_t - \mu)(y_t - \mu)] = \sigma^2 < \infty
3. E[(y_{t_1} - \mu)(y_{t_2} - \mu)] = \gamma_{t_2 - t_1} \quad \forall t_1, t_2
That is, a stationary process has a constant mean, a constant variance, and a
constant covariance structure. A strictly stationary process could be defined by
an equation such as

F_{x_{t_1}, x_{t_2}, \ldots, x_{t_T}}(x_1, \ldots, x_T) = F_{x_{t_1+k}, x_{t_2+k}, \ldots, x_{t_T+k}}(x_1, \ldots, x_T)

for any t_1, t_2, \ldots, t_T \in \mathbb{Z}, any k \in \mathbb{Z} and T = 1, 2, \ldots, and where F denotes the
joint distribution function of the set of random variables. It should be evident
from the definitions of weak and strict stationarity that the latter is a stronger
definition and is a special case of the former. In the former case, only the first
two moments of the distribution (i.e. the mean, variances and covariances)
have to be constant, whilst in the latter case, all moments of the
distribution (i.e. the whole of the probability distribution) have to be constant.
Both weakly stationary and strictly stationary processes will cross their mean
value frequently and will not wander a long way from that mean value.
An example of a deterministic trend process was given in Figure 7.5. Such a
process will have random variations about a linear (usually upward) trend. An
expression for a deterministic trend process yt could be



y_t = \alpha + \beta t + u_t

where t = 1, 2, \ldots is the trend and u_t is a zero mean white noise disturbance
term. This is called deterministic non-stationarity because the source of the
non-stationarity is a deterministic straight line process.
A variable containing a stochastic trend will also not cross its mean value
frequently and will wander a long way from its mean value. A stochastically
non-stationary process could be a unit root or explosive autoregressive process
such as

y_t = \phi y_{t-1} + u_t

where \phi \geq 1.
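The contrast between mean-reversion and wandering can be made concrete with a small simulation (the AR coefficient, sample size and seed are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
u = rng.normal(size=T)

# Stationary AR(1) with phi = 0.5 versus a driftless random walk
# (phi = 1), built from the same sequence of shocks
stationary = np.zeros(T)
walk = np.zeros(T)
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + u[t]
    walk[t] = walk[t - 1] + u[t]

def mean_crossings(x):
    # number of times the series crosses its own sample mean
    c = x - x.mean()
    return int(np.sum(c[1:] * c[:-1] < 0))

print("mean crossings, stationary AR(1):", mean_crossings(stationary))
print("mean crossings, random walk:     ", mean_crossings(walk))
```

The stationary series crosses its mean far more often, while the random walk drifts away for long stretches, exactly the behaviour described above.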
2. (a) The null hypothesis is of a unit root against a one-sided stationary
alternative, i.e. we have
H0: y_t \sim I(1)
H1: y_t \sim I(0)
which is also equivalent to
H0: \psi = 0
H1: \psi < 0
(b) The test statistic is given by \hat{\psi}/SE(\hat{\psi}), which equals -0.02/0.31 = -0.06.
Since this is not more negative than the appropriate critical value, we do not
reject the null hypothesis.
(c) We therefore conclude that there is at least one unit root in the series
(there could be 1, 2, 3 or more). What we would do now is to regress \Delta^2 y_t on
\Delta y_{t-1} and test if there is a further unit root. The null and alternative
hypotheses would now be
H0: \Delta y_t \sim I(1), i.e. y_t \sim I(2)
H1: \Delta y_t \sim I(0), i.e. y_t \sim I(1)
If we rejected the null hypothesis, we would therefore conclude that the first
differences are stationary, and hence the original series was I(1). If we did not
reject at this stage, we would conclude that yt must be at least I(2), and we
would have to test again until we rejected.
(d) We cannot compare the test statistic with that from a t-distribution since
we have non-stationarity under the null hypothesis and hence the test statistic
will no longer follow a t-distribution.
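The test-statistic calculation in parts (a)-(b) can be sketched directly. The regression here is the simplest Dickey-Fuller form (no constant, no augmentation), run on simulated data; in a real application the statistic would be compared with Dickey-Fuller critical values (roughly -1.95 at the 5% level for this no-constant case), not with t-tables:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 250

def df_test(y):
    """Dickey-Fuller regression Delta y_t = psi * y_{t-1} + u_t
    (no constant, no lagged differences); returns psi_hat / SE(psi_hat)."""
    dy = np.diff(y)
    ylag = y[:-1]
    psi = (ylag @ dy) / (ylag @ ylag)
    resid = dy - psi * ylag
    s2 = resid @ resid / (len(dy) - 1)
    se = np.sqrt(s2 / (ylag @ ylag))
    return psi / se

walk = np.cumsum(rng.normal(size=T))         # unit root process
ar = np.zeros(T)
for t in range(1, T):
    ar[t] = 0.5 * ar[t - 1] + rng.normal()   # stationary AR(1)

print("DF statistic, random walk:      ", df_test(walk))
print("DF statistic, stationary AR(1): ", df_test(ar))
```

The stationary series produces a strongly negative statistic, while the random walk does not, mirroring the reject/do-not-reject outcomes in questions 2 and 3.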
3. Using the same regression as above, but on a different set of data, the
researcher now obtains the estimate \hat{\psi} = -0.52 with standard error = 0.16.



(a) The test statistic is calculated as above. The value of the test statistic =
-0.52/0.16 = -3.25. We therefore reject the null hypothesis since the test
statistic is smaller (more negative) than the critical value.
(b) We conclude that the series is stationary since we reject the unit root null
hypothesis. We need do no further tests since we have already rejected.
(c) The researcher is correct. One possible source of non-whiteness is when
the errors are autocorrelated. This will occur if there is autocorrelation in the
original dependent variable in the regression (\Delta y_t). In practice, we can easily
get around this by augmenting the test with lags of the dependent variable to
soak up the autocorrelation. The appropriate number of lags can be
determined using the information criteria.
4. (a) If two or more series are cointegrated, in intuitive terms this implies that
they have a long run equilibrium relationship that they may deviate from in
the short run, but which will always be returned to in the long run. In the
context of spot and futures prices, the fact that these are essentially prices of
the same asset but with different delivery and payment dates, means that
financial theory would suggest that they should be cointegrated. If they were
not cointegrated, this would imply that the series did not contain a common
stochastic trend and that they could therefore wander apart without bound
even in the long run. If the spot and futures prices for a given asset did
separate from one another, market forces would work to bring them back to
follow their long run relationship given by the cost of carry formula.
The Engle-Granger approach to cointegration involves first ensuring that the
variables are individually unit root processes (note that the test is often
conducted on the logs of the spot and of the futures prices rather than on the
price series themselves). Then a regression of one of the series on the other
(i.e. regressing spot on futures prices, or futures on spot prices) would be
conducted and the residuals from that regression collected.
These residuals would then be subjected to a Dickey-Fuller or augmented
Dickey-Fuller test. If the null hypothesis of a unit root in the DF test
regression residuals is not rejected, it would be concluded that a stationary
combination of the non-stationary variables has not been found and thus that
there is no cointegration. On the other hand, if the null is rejected, it would be
concluded that a stationary combination of the non-stationary variables has
been found and thus that the variables are cointegrated.
Forming an error correction model (ECM) following the Engle-Granger
approach is a 2-stage process. The first stage is (assuming that the original
series are non-stationary) to determine whether the variables are
cointegrated. If they are not, obviously there would be no sense in forming an
ECM, and the appropriate response would be to form a model in first
differences only. If the variables are cointegrated, the second stage of the
process involves forming the error correction model which, in the context of
spot and futures prices, could be of the form given in equation (7.57).
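The two stages can be sketched on a simulated cointegrated pair; the series, noise scales and the very simple ECM specification below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 500

# Hypothetical cointegrated pair: both series are I(1) and share one
# common stochastic trend
trend = np.cumsum(rng.normal(size=T))
futures = trend + rng.normal(scale=0.1, size=T)
spot = trend + rng.normal(scale=0.1, size=T)

# Stage 1: cointegrating regression; collect the residuals
X = np.column_stack([np.ones(T), futures])
b = np.linalg.lstsq(X, spot, rcond=None)[0]
z = spot - X @ b        # stationary if the series are cointegrated

# DF regression on the residuals (Engle-Granger critical values would
# apply here, not the ordinary Dickey-Fuller ones)
dz, zlag = np.diff(z), z[:-1]
psi = (zlag @ dz) / (zlag @ zlag)
e = dz - psi * zlag
df_resid = psi / np.sqrt((e @ e / (len(dz) - 1)) / (zlag @ zlag))
print("DF statistic on the stage-1 residuals:", df_resid)

# Stage 2: error correction model
#   Delta spot_t = a0 + a1 * Delta futures_t + a2 * z_{t-1} + v_t
W = np.column_stack([np.ones(T - 1), np.diff(futures), zlag])
ecm = np.linalg.lstsq(W, np.diff(spot), rcond=None)[0]
print("error-correction coefficient (on z_{t-1}):", ecm[2])
```

The strongly negative residual-based statistic points to cointegration, and the negative error-correction coefficient shows deviations from the long-run relationship being pulled back, as the discussion above describes.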



(b) There are many other examples that one could draw from financial or
economic theory of situations where cointegration would be expected to be
present and where its absence could imply a permanent disequilibrium. It is
usually the presence of market forces and investors continually looking for
arbitrage opportunities that would lead us to expect cointegration to exist.
Good illustrations include equity prices and dividends, or price levels in a set
of countries and the exchange rates between them. The latter is embodied in
the purchasing power parity (PPP) theory, which suggests that a
representative basket of goods and services should, when converted into a
common currency, cost the same wherever in the world it is purchased. In the
context of PPP, one may expect cointegration since again, its absence would
imply that relative prices and the exchange rate could wander apart without
bound in the long run. This would imply that the general price of goods and
services in one country could get permanently out of line with those, when
converted into a common currency, of other countries. This would not be
expected to happen since people would spot a profitable opportunity to buy
the goods in one country where they were cheaper and to sell them in the
country where they were more expensive until the prices were forced back into
line. There is some evidence against PPP, however, and one explanation is that
transactions costs including transportation costs, currency conversion costs,
differential tax rates and restrictions on imports, stop full adjustment from
taking place. Services are also much less portable than goods and everybody
knows that everything costs twice as much in the UK as anywhere else in the world.
5. (a) The Johansen test is computed in the following way. Suppose we have p
variables that we think might be cointegrated. First, ensure that all the
variables are of the same order of non-stationarity, and in fact are I(1), since it is
very unlikely that variables will be of a higher order of integration. Stack the
variables that are to be tested for cointegration into a p-dimensional vector,
called, say, y_t. Then construct a p \times 1 vector of first differences, \Delta y_t, and form
and estimate the following VAR

\Delta y_t = \Pi y_{t-k} + \Gamma_1 \Delta y_{t-1} + \Gamma_2 \Delta y_{t-2} + \ldots + \Gamma_{k-1} \Delta y_{t-(k-1)} + u_t
Then test the rank of the matrix \Pi. If \Pi is of zero rank (i.e. all the eigenvalues
are not significantly different from zero), there is no cointegration, otherwise,
the rank will give the number of cointegrating vectors. (You could also go into
a bit more detail on how the eigenvalues are used to obtain the rank.)
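The idea that the rank of \Pi reveals the number of cointegrating vectors can be illustrated with a crude numpy sketch. To be clear, this is not the Johansen procedure itself (which works with the eigenvalues of a particular product-moment matrix, includes the \Gamma_i terms, and uses non-standard critical values); the data-generating process and the k = 1 specification below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 500

# Two I(1) series sharing one common stochastic trend, so there is one
# cointegrating vector and Pi should have (approximately) rank 1
trend = np.cumsum(rng.normal(size=T))
y = np.column_stack([trend + rng.normal(scale=0.2, size=T),
                     trend + rng.normal(scale=0.2, size=T)])

# Estimate Delta y_t = Pi y_{t-1} + u_t by OLS, equation by equation
dy = np.diff(y, axis=0)
ylag = y[:-1]
B = np.linalg.lstsq(ylag, dy, rcond=None)[0]
Pi = B.T                                  # rows = equations

eigs = sorted(np.abs(np.linalg.eigvals(Pi)))
print("|eigenvalues| of the estimated Pi:", eigs)
```

One eigenvalue comes out well away from zero while the other is close to zero, consistent with a single stationary combination of the two non-stationary series.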
(b) Repeating the table given in the question, but adding the null and
alternative hypotheses in each case, and letting r denote the number of
cointegrating vectors:








Considering each row in the table in turn, and looking at the first one first, the
test statistic is greater than the critical value, so we reject the null hypothesis
that there are no cointegrating vectors. The same is true of the second row
(that is, we reject the null hypothesis of one cointegrating vector in favour of
the alternative that there are two). Looking now at the third row, we cannot
reject (at the 5% level) the null hypothesis that there are two cointegrating
vectors, and this is our conclusion. There are two independent linear
combinations of the variables that will be stationary.
(c) Johansen's method allows the testing of hypotheses by considering them
effectively as restrictions on the cointegrating vector. The first thing to note is
that all linear combinations of the cointegrating vectors are also cointegrating
vectors. Therefore, if there are many cointegrating vectors in the unrestricted
case and if the restrictions are relatively simple, it may be possible to satisfy
the restrictions without causing the eigenvalues of the estimated coefficient
matrix to change at all. However, as the restrictions become more complex,
renormalisation will no longer be sufficient to satisfy them, so that imposing
them will cause the eigenvalues of the restricted coefficient matrix to be
different to those of the unrestricted coefficient matrix. If the restriction(s)
implied by the hypothesis is (are) nearly already present in the data, then the
eigenvectors will not change significantly when the restriction is imposed. If,
on the other hand, the restriction on the data is severe, then the eigenvalues
will change significantly compared with the case when no restrictions were imposed.
The test statistic for testing the validity of these restrictions is given by

-T \sum_{i=r+1}^{p} [\ln(1 - \lambda_i^*) - \ln(1 - \lambda_i)] \sim \chi^2(p - r)

where:
\lambda_i^* are the characteristic roots (eigenvalues) of the restricted model,
\lambda_i are the characteristic roots (eigenvalues) of the unrestricted model,
r is the number of non-zero characteristic roots (eigenvalues) in the
unrestricted model, and
p is the number of variables in the system.


If the restrictions are supported by the data, the eigenvalues will not change
much when the restrictions are imposed and so the test statistic will be small.
(d) There are many applications that could be considered, and tests for PPP,
for cointegration between international bond markets, and tests of the
expectations hypothesis were presented in Sections 7.9, 7.10, and 7.11
respectively. These are not repeated here.
(e) Both Johansen statistics can be thought of as being based on an
examination of the eigenvalues of the long-run coefficient or \Pi matrix. In both
cases, the g eigenvalues (for a system containing g variables) are placed in
descending order: \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_g. The maximal eigenvalue (i.e. the \lambda_{max})
statistic is based on an examination of each eigenvalue separately, while the
trace statistic is based on a joint examination of the g - r smallest eigenvalues. If
the test statistic is greater than the critical value from Johansen's tables, reject
the null hypothesis that there are r cointegrating vectors in favour of the
alternative that there are r+1 (for max) or more than r (for trace). The testing
is conducted in a sequence and under the null, r = 0, 1, ..., g-1 so that the
hypotheses for trace and max are as follows

Null hypothesis (both tests)    Trace alternative       \lambda_{max} alternative
H0: r = 0                       H1: 0 < r \leq g        H1: r = 1
H0: r = 1                       H1: 1 < r \leq g        H1: r = 2
H0: r = 2                       H1: 2 < r \leq g        H1: r = 3
...                             ...                     ...
H0: r = g-1                     H1: r = g               H1: r = g

Thus the trace test starts by examining all eigenvalues together to test H0: r =
0, and if this is not rejected, this is the end and the conclusion would be that
there is no cointegration. If this hypothesis is rejected, the largest
eigenvalue would be dropped and a joint test conducted using all of the
eigenvalues except the largest to test H0: r = 1. If this hypothesis is not
rejected, the conclusion would be that there is one cointegrating vector, while
if this is rejected, the second largest eigenvalue would be dropped and the test
statistic recomputed using the remaining g-2 eigenvalues and so on. The
testing sequence would stop when the null hypothesis is not rejected.
The maximal eigenvalue test follows exactly the same testing sequence with
the same null hypothesis as for the trace test, but the max test only considers
one eigenvalue at a time. The null hypothesis that r = 0 is tested using the
largest eigenvalue. If this null is rejected, the null that r = 1 is examined using
the second largest eigenvalue and so on.
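Given a set of estimated eigenvalues, forming the two statistics is simple arithmetic; the eigenvalues and sample size below are invented for illustration, and in practice each statistic would be compared with the corresponding critical value from Johansen's tables:

```python
import numpy as np

# Hypothetical estimated eigenvalues for a g = 3 variable system,
# ordered lambda_1 >= lambda_2 >= lambda_3, with an invented sample size
T = 200
lambdas = np.array([0.45, 0.20, 0.02])

def trace_stat(lambdas, r, T):
    # joint test of H0: r cointegrating vectors, using the g - r
    # smallest eigenvalues
    return -T * np.sum(np.log(1 - lambdas[r:]))

def max_stat(lambdas, r, T):
    # tests H0: r vectors against H1: r + 1, one eigenvalue at a time
    return -T * np.log(1 - lambdas[r])

for r in range(len(lambdas)):
    print(f"H0: r = {r}:  trace = {trace_stat(lambdas, r, T):7.2f}   "
          f"lambda-max = {max_stat(lambdas, r, T):7.2f}")
```

Note the identity trace(r) = \lambda_{max}(r) + trace(r+1): the trace statistic at each stage is the maximal-eigenvalue statistic plus the joint contribution of the remaining smaller eigenvalues.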



6. (a) The operation of the Johansen test has been described in the book, and
also in question 5, part (a) above. If the rank of the \Pi matrix is zero, this
implies that there is no cointegration or no common stochastic trends between
the series. A finding that the rank of \Pi is one or two would imply that there
were one or two linearly independent cointegrating vectors or combinations of
the series that would be stationary respectively. A finding that the rank of \Pi is
3 would imply that the matrix is of full rank. Since the maximum number of
cointegrating vectors is g-1, where g is the number of variables in the system,
this does not imply that there are 3 cointegrating vectors. In fact, the implication
of a rank of 3 would be that the original series were stationary, and provided
that unit root tests had been conducted on each series, this would have
effectively been ruled out.
(b) The first test of H0: r = 0 is conducted using the first row of the table.
Clearly, the test statistic is greater than the critical value so the null hypothesis
is rejected. Considering the second row, the same is true, so that the null of r =
1 is also rejected. Considering now H0: r = 2, the test statistic is smaller than
the critical value so that the null is not rejected. So we conclude that there are
2 cointegrating vectors, or in other words 2 linearly independent combinations
of the non-stationary variables that are stationary.
7. The fundamental difference between the Engle-Granger and the Johansen
approaches is that the former is a single-equation methodology whereas
Johansen is a systems technique involving the estimation of more than one
equation. The two approaches have been described in detail in Chapter 7 and
in the answers to the questions above, and will therefore not be covered again.
The main (arguably only) advantage of the Engle-Granger approach is its
simplicity and its intuitive interpretability. However, it has a number of
disadvantages that have been described in detail in Chapter 7, including its
inability to detect more than one cointegrating relationship and the
impossibility of validly testing hypotheses about the cointegrating vector.
Introductory Econometrics for Finance Chris Brooks 2008
Lecture 7. Modelling volatility and correlation

1. (a). A number of stylised features of financial data have been suggested at
the start of Chapter 8 and in other places throughout the book:
- Frequency: Stock market prices are measured every time there is a trade or
somebody posts a new quote, so often the frequency of the data is very high
- Non-stationarity: Financial data (asset prices) are covariance non-stationary;
but if we assume that we are talking about returns from here on, then we can
validly consider them to be stationary.
- Linear Independence: They typically have little evidence of linear
(autoregressive) dependence, especially at low frequency.
- Non-normality: They are not normally distributed; rather, they are fat-tailed.
- Volatility pooling and asymmetries in volatility: The returns exhibit volatility
clustering and leverage effects.
Of these, we can allow for the non-stationarity within the linear (ARIMA)
framework, and we can use whatever frequency of data we like to form the
models, but we cannot hope to capture the other features using a linear model
with Gaussian disturbances.
(b) GARCH models are designed to capture the volatility clustering effects in
the returns (GARCH(1,1) can model the dependence in the squared returns, or
squared residuals), and they can also capture some of the unconditional
leptokurtosis, so that even if the residuals of a linear model of the form given
by the first part of the equation in part (e), the ut's, are leptokurtic, the
standardised residuals from the GARCH estimation are likely to be less
leptokurtic. Standard GARCH models cannot, however, account for leverage
effects.
(c) This is essentially a 'which disadvantages of ARCH are overcome by
GARCH' question. The disadvantages of ARCH(q) are:
- How do we decide on q?
- The required value of q might be very large
- Non-negativity constraints might be violated (the estimated coefficients in
the conditional variance equation could turn out negative, even though a
variance cannot be negative).
GARCH(1,1) goes some way to get around these. The GARCH(1,1) model has
only three parameters in the conditional variance equation, compared to q+1
for the ARCH(q) model, so it is more parsimonious. Since there are fewer
parameters than for a typical qth-order ARCH model, it is less likely that the
estimated values of one or more of these 3 parameters would be negative than
of all q+1 parameters. Also, the GARCH(1,1) model can usually still capture all of
the significant dependence in the squared returns, since it is possible to write a
GARCH(1,1) model as an ARCH(∞), so that squared errors stretching back
into the infinite past help to explain the current value of the conditional
variance, ht.
(d) There are a number of models that you could choose from; the relevant ones
discussed in Chapter 8 include EGARCH, GJR and GARCH-M.
The first two of these are designed to capture leverage effects. These are
asymmetries in the response of volatility to positive or negative returns. The
standard GARCH model cannot capture these, since we are squaring the
lagged error term, and we are therefore losing its sign.
The conditional variance equations for the GJR and EGARCH models are,
respectively,
ht = α0 + α1(ut-1)² + βht-1 + γ(ut-1)²It-1
where It-1 = 1 if ut-1 < 0, and It-1 = 0 otherwise, and
ln(ht) = ω + β ln(ht-1) + γ(ut-1/√ht-1) + α[|ut-1|/√ht-1 - √(2/π)].
The EGARCH model also has the added benefit that the model is expressed in
terms of the log of ht, so that even if the parameters are negative, the
conditional variance will always be positive. We do not therefore have to
artificially impose non-negativity constraints.
One form of the GARCH-M model can be written
yt = μ + δht-1 + ut
so that the model allows the lagged value of the conditional variance to affect
the return. In other words, our best current estimate of the total risk of the
asset influences the expected return on it.
(e) Since the yt are returns, we would expect their mean value (which will be
given by the intercept in the mean equation) to be small and positive. For
example, if we had a year of daily returns, the mean would be the
average daily percentage return over the year, which might be, say, 0.05, or
something of that order. The unconditional variance of the disturbances would
be given by α0/(1 - (α1 + α2)). The important thing is that all three alphas
must be positive, and the sum of α1 and α2 would be expected to be less than
one.
(f) Since the model was estimated using maximum likelihood, it does not seem
natural to test this restriction using the F-test via comparisons of residual
sums of squares (and a t-test cannot be used since it is a test involving more
than one coefficient). Thus we should use one of the approaches to hypothesis
testing based on the principles of maximum likelihood (Wald, Lagrange
Multiplier, Likelihood Ratio). The easiest one to use would be the likelihood
ratio test, which would be computed as follows:
Estimate the unrestricted model and obtain the maximised value of the
log-likelihood function.
Impose the restriction by rearranging the model, and estimate the
restricted model, again obtaining the value of the likelihood at the new
optimum. Note that this value of the LLF is likely to be lower than the
unconstrained maximum.
Then form the likelihood ratio test statistic, given by
LR = -2(Lr - Lu) ~ χ²(m)
where Lr and Lu are the values of the LLF for the restricted and unrestricted
models respectively, and m denotes the number of restrictions, which in this
case is one.
If the value of the test statistic is greater than the critical value, reject
the null hypothesis that the restrictions are valid.
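The mechanics of the LR test can be sketched as follows. The log-likelihood values are hypothetical, and the chi-squared survival function is specialised to one degree of freedom, the case in this question:

```python
import math

def lr_statistic(llf_restricted, llf_unrestricted):
    """Likelihood ratio statistic: LR = -2*(Lr - Lu)."""
    return -2.0 * (llf_restricted - llf_unrestricted)

def chi2_sf_1df(x):
    """P(X > x) for X ~ chi-squared with 1 degree of freedom,
    using chi2(1) CDF(x) = erf(sqrt(x/2))."""
    return 1.0 - math.erf(math.sqrt(x / 2.0))

# Hypothetical maximised log-likelihood values for illustration
lr = lr_statistic(llf_restricted=-1315.4, llf_unrestricted=-1312.5)
p_value = chi2_sf_1df(lr)  # compare lr with the 5% critical value of 3.841
```

Note that the restricted LLF is lower than the unrestricted one, as the text explains it must be; here the statistic exceeds 3.841, so the single restriction would be rejected at the 5% level.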
(g) In fact, it is possible to produce volatility (conditional variance) forecasts in
exactly the same way as forecasts are generated from an ARMA model by
iterating through the equations with the conditional expectations operator.
The forecasts are made conditional upon all information available up to and
including time T. The answer to this question will use the convention from the
GARCH modelling literature of denoting the conditional variance by ht rather
than σ²t, while ΩT denotes all information available up to and including
observation T. Adding 1, then 2, then 3 to each of the time subscripts, we have
the conditional variance equations for times T+1, T+2, and T+3:
hT+1 = α0 + α1(uT)² + α2hT    (1)
hT+2 = α0 + α1(uT+1)² + α2hT+1    (2)
hT+3 = α0 + α1(uT+2)² + α2hT+2    (3)
Let hfT+1 be the one-step-ahead forecast for h made at time T. This is easy to
calculate since, at time T, we know the values of all the terms on the RHS of
(1). Given hfT+1, how do we calculate hfT+2, that is, the 2-step-ahead forecast
for h made at time T?
From (2), we can write
hfT+2 = α0 + α1ET((uT+1)²) + α2hfT+1    (4)
where ET((uT+1)²) is the expectation, made at time T, of (uT+1)², which is the
squared disturbance term. To deal with this term, we can write
Var(ut) = E[(ut - E(ut))²] = E[(ut)²]
since E(ut) = 0. The conditional variance of ut is ht, so
ht = E[(ut)²].
Turning this argument around, and applying it to the problem that we have,
ET[(uT+1)²] = hT+1
but we do not know hT+1, so we replace it with hfT+1, so that (4) becomes
hfT+2 = α0 + α1hfT+1 + α2hfT+1 = α0 + (α1 + α2)hfT+1.
What about the 3-step-ahead forecast? By similar arguments,
hfT+3 = ET(α0 + α1(uT+2)² + α2hT+2) = α0 + (α1 + α2)hfT+2
       = α0 + (α1 + α2)[α0 + (α1 + α2)hfT+1].
And so on. This is the method we could use to forecast the conditional variance
of yt. If yt were, say, daily returns on the FTSE, we could use these volatility
forecasts as an input in the Black-Scholes equation to help determine the
appropriate price of FTSE index options.
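The iteration just described can be sketched in code. This is a minimal illustration assuming the GARCH(1,1) form used here, ht = α0 + α1(ut-1)² + α2ht-1, with hypothetical parameter values:

```python
def garch_variance_forecasts(alpha0, alpha1, alpha2, u_T, h_T, horizon):
    """Multi-step conditional variance forecasts from a GARCH(1,1).

    The 1-step forecast uses the known u_T and h_T; beyond one step,
    E_T[u^2] is replaced by its own variance forecast, giving
    hf(T+s) = alpha0 + (alpha1 + alpha2) * hf(T+s-1)."""
    forecasts = [alpha0 + alpha1 * u_T**2 + alpha2 * h_T]  # hf(T+1)
    for _ in range(horizon - 1):
        forecasts.append(alpha0 + (alpha1 + alpha2) * forecasts[-1])
    return forecasts

# Hypothetical values with alpha1 + alpha2 < 1, so the forecasts converge
# on the unconditional variance alpha0 / (1 - (alpha1 + alpha2)) = 0.2
fc = garch_variance_forecasts(alpha0=0.01, alpha1=0.1, alpha2=0.85,
                              u_T=0.5, h_T=0.2, horizon=20)
```

Because the one-step forecast here starts above the long-run level of 0.2, each successive forecast shrinks geometrically towards it, which is the stationary-in-variance behaviour discussed in question 3(a) below.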
(h) An s-step-ahead forecast for the conditional variance could be written
hfT+s = α0 Σ(i=1 to s-1) (α1 + α2)^(i-1) + (α1 + α2)^(s-1) hfT+1.
For the coefficient estimates given in the question, (α1 + α2) is bigger than 1.
It is obvious that this will cause the forecasts to explode: the forecasts will
keep on increasing and will tend to infinity as the forecast horizon increases
(i.e. as s increases). This is obviously an undesirable property of a forecasting
model! This is called non-stationarity in variance.
For (α1 + α2) = 1, the model is known as integrated
GARCH or IGARCH; there is a unit root in the conditional variance, and the
forecasts will stay constant as the forecast horizon increases.
2. (a) Maximum likelihood works by finding the most likely values of the
parameters given the actual data. More specifically, a log-likelihood function is
formed, usually based upon a normality assumption for the disturbance terms,
and the values of the parameters that maximise it are sought. Maximum
likelihood estimation can be employed to find parameter values for both linear
and non-linear models.
(b) The three hypothesis testing procedures available within the maximum
likelihood approach are Lagrange multiplier (LM), likelihood ratio (LR) and
Wald tests. The differences between them are described in Figure 8.4, and are
not defined again here. The Lagrange multiplier test involves estimation only
under the null hypothesis, the likelihood ratio test involves estimation under
both the null and the alternative hypothesis, while the Wald test involves
estimation only under the alternative. Given this, it should be evident that the
LM test will in many cases be the simplest to compute since the restrictions
implied by the null hypothesis will usually lead to some terms cancelling out to
give a simplified model relative to the unrestricted model.
(c) OLS will give identical parameter estimates for all of the intercept and
slope parameters, but will give a slightly different parameter estimate for the
variance of the disturbances. These are shown in the Appendix to Chapter 8.
The difference in the OLS and maximum likelihood estimators for the variance
of the disturbances can be seen by comparing the divisors of equations
(8A.25) and (8A.26).
3. (a) The unconditional variance of a random variable could be thought of,
abusing the terminology somewhat, as the variance without reference to a
time index, or rather the variance of the data taken as a whole, without
conditioning on a particular information set. The conditional variance, on the
other hand, is the variance of a random variable at a particular point in time,
conditional upon a particular information set. The variance of ut conditional
upon its own previous values would be written var(ut | ut-1, ut-2, ...) =
E[(ut - E(ut))² | ut-1, ut-2, ...], while the unconditional variance would simply be
var(ut) = E[(ut - E(ut))²].
Forecasts from models such as GARCH would be conditional forecasts,
produced for a particular point in time, while historical volatility is an
unconditional measure that would generate unconditional forecasts. For
producing 1-step ahead forecasts, it is likely that a conditional model making
use of recent relevant information will provide more accurate forecasts
(although whether it would in any particular application is an empirical
question). As the forecast horizon increases, however, a GARCH model that is
stationary in variance will yield forecasts that converge upon the long-term
average (historical) volatility. By the time we reach 20-steps ahead, the
GARCH forecast is likely to be very close to the unconditional variance so that

there is little gain likely from using GARCH models for forecasts with very
long horizons. For approaches such as EWMA, where there is no convergence
upon an unconditional average as the prediction horizon increases, inferior
forecasts are likely as the horizon increases for series that show a long-term
mean-reverting pattern in volatility. This arises because, if the volatility
estimate is above its historical average at the end of the in-sample estimation
period, EWMA would predict that it would continue at this level, while in
reality it is likely to fall back towards its long-term mean eventually.
(b) Equation (8.110) is an equation showing that the variance of the
disturbances is not fixed over time, but rather varies systematically according
to a GARCH process. This is therefore an example of heteroscedasticity. Thus,
the consequences if it were present but ignored would be those described in
Chapter 4. In summary, the coefficient estimates would still be consistent and
unbiased but not efficient. There is therefore the possibility that the standard
error estimates calculated using the usual formulae would be incorrect leading
to inappropriate inferences.
(c) There are of course a large number of competing methods for measuring
and forecasting volatility, and it is worth stating at the outset that no research
has suggested that one method is universally superior to all others, so that
each method has its merits and may work well in certain circumstances.
Historical measures of volatility are just simple average measures: for
example, the standard deviation of daily returns over a 3-year period. As such,
they are the simplest to calculate, but suffer from a number of shortcomings.
First, since the observations are unweighted, historical volatility can be slow to
respond to changing market circumstances, and would not take advantage of
short-term persistence in volatility that could lead to more accurate short-term forecasts. Second, if there is an extreme event (e.g. a market crash), this
will lead the measured volatility to be high for a number of observations equal
to the measurement sample length. For example, suppose that volatility is
being measured using a 1-year (250-day) sample of returns, which is being
rolled forward one observation at a time to produce a series of 1-step ahead
volatility forecasts. If a market crash occurs on day t, this will increase the
measured level of volatility by the same amount right until day t+250 (i.e. it
will not decay away) and then it will disappear completely from the sample so
that measured volatility will fall abruptly. Exponential weighting of
observations as the EWMA model does, where the weight attached to each
observation in the calculation of volatility declines exponentially as the
observations go further back in time, will resolve both of these issues.
However, if forecasts are produced from an EWMA model, these forecasts will
not converge upon the long-term mean volatility estimate as the prediction
horizon increases, and this may be undesirable (see part (a) of this question).
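The exponential weighting scheme just described can be sketched as follows. The decay factor lam = 0.94 is the familiar RiskMetrics daily value, the returns are hypothetical, and initialising with the first squared return is a simplifying assumption:

```python
def ewma_variance(returns, lam=0.94):
    """EWMA variance: sigma2_t = lam*sigma2_{t-1} + (1-lam)*r_{t-1}^2,
    so the weight on each squared return decays exponentially with its age
    and an extreme observation fades out gradually rather than dropping
    abruptly from the sample.

    Initialising with the first squared return is a simplifying assumption."""
    sigma2 = returns[0] ** 2
    for r in returns[1:]:
        sigma2 = lam * sigma2 + (1.0 - lam) * r**2
    return sigma2

# Hypothetical daily percentage returns for illustration
est = ewma_variance([0.5, -0.3, 1.2, -0.8, 0.1, 0.4])
```

Note that the s-step-ahead forecast from this recursion is flat at the last estimate for every s, which is exactly the non-convergence property criticised in the text.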
(A further issue with the EWMA model is that the smoothing parameter is
typically imposed rather than estimated from the data, as in equation
(8.5) on page 443, although, of course, it can be estimated using maximum
likelihood.) GARCH models overcome this problem with the forecasts as well,
since a GARCH model that is stationary in variance will have forecasts that
converge upon the long-term average as the horizon increases (see part (a) of
this question). GARCH models will also overcome the two problems with
unweighted averages described above. However, GARCH models are far more
difficult to estimate than the other two models, and sometimes, when
estimation goes wrong, the resulting parameter estimates can be nonsensical,
leading to nonsensical forecasts as well. Thus it is important to apply a reality
check to estimated GARCH models to ensure that the coefficient estimates
are intuitively plausible. Finally, implied volatility estimates are those derived
from the prices of traded options. The market-implied volatility forecasts are
obtained by backing out the volatility from the price of an option using an
option pricing formula together with an iterative search procedure. Financial
market practitioners would probably argue that implied forecasts of the future
volatility of the underlying asset are likely to be more accurate than those
estimated from statistical models because the people who work in financial
markets know more about what is likely to happen to those instruments in the
future than econometricians do. Also, an inaccurate volatility forecast
implied from an option price may imply an inaccurate option price and
therefore the possibility of arbitrage opportunities. However, the empirical
evidence on the accuracy of implied versus statistical forecasting models is
mixed, and some research suggests that implied volatility systematically overestimates the true volatility of the underlying asset returns. This may arise
from the use of an incorrect option pricing formula to obtain the implied
volatility; for example, the Black-Scholes model assumes that the volatility of
the underlying asset is fixed (non-stochastic), and also that the returns to the
underlying asset are normally distributed. Both of these assumptions are at
best tenuous. A further reason for the apparent failure of the implied model
may be a manifestation of the peso problem. This occurs when market
practitioners include in the information set that they use to price options the
possibility of a very extreme return that has a low probability of occurrence,
but has important ramifications for the price of the option due to its sheer
size. If this event does not occur in the sample period over which the implied
and actual volatilities are compared, the implied model will appear inaccurate.
Yet this does not mean that the practitioners' forecasts were wrong, but rather
simply that the low-probability, high-impact event did not happen during that
sample period. It is also worth stating that only one implied volatility can be
calculated from each option price for the average volatility of the underlying
asset over the remaining lifetime of the option.
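The "backing out" of implied volatility via a pricing formula and an iterative search, as described above, can be sketched as follows. This assumes the Black-Scholes formula for a European call on a non-dividend-paying asset and uses bisection as the search procedure; the option inputs are hypothetical:

```python
import math

def bs_call_price(S, K, r, T, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(market_price, S, K, r, T, lo=1e-4, hi=3.0, tol=1e-8):
    """Find the sigma that equates the model price to the observed option
    price by bisection (the call price is increasing in sigma)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call_price(S, K, r, T, mid) < market_price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical at-the-money option with three months to expiry
iv = implied_vol(market_price=4.0, S=100.0, K=100.0, r=0.05, T=0.25)
```

As the text notes, this yields a single implied volatility per option price, interpreted as the average volatility of the underlying over the option's remaining life.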

4. (a) A possible diagonal VECH model would comprise a conditional mean
equation with just an intercept for each of the two return series, together with
conditional variance and covariance equations of the form
h11,t = α10 + α11(u1,t-1)² + β11h11,t-1
h22,t = α20 + α21(u2,t-1)² + β21h22,t-1
h12,t = α30 + α31u1,t-1u2,t-1 + β31h12,t-1
The coefficients in the conditional mean equations would be expected to be
very small, and could be positive or negative, although a positive average
return is probably more likely. Similarly, the intercept terms in the conditional
variance equations would also be expected to be small and positive, since this
is daily data. The
coefficients on the lagged squared error and lagged conditional variance in the
conditional variance equations must lie between zero and one. The values of
the parameters in the conditional covariance equation are more difficult to
predict, since there is no non-negativity requirement for covariances. The
parameters in this equation could be negative, although given that the returns
for two stock markets are likely to be positively correlated, the parameters
would probably be positive, although the model would still be a valid one if
they were not.
(b) One of two procedures could be used. Either the daily returns data would
be transformed into weekly returns data by adding up the returns over all of
the trading days in each week, or the model would be estimated using the daily
data and daily forecasts would then be produced up to 10 days (2 trading
weeks) ahead. In both cases, the models would be estimated, and forecasts
made of the conditional variance and conditional covariance. If daily data were
used to estimate the model, the conditional covariance forecasts for the 5
trading days in a week would be added together to form a covariance forecast
for that week, and similarly for the variance. If the returns had been
aggregated to the weekly frequency, the forecasts used would simply be the
1-step and 2-step ahead forecasts. Finally, the conditional covariance forecast
for the week would be divided by the product of the square roots of the
conditional variance forecasts to obtain a correlation forecast.
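The aggregation of daily forecasts into a weekly correlation forecast described above can be sketched as follows (the daily forecast values are hypothetical):

```python
import math

def weekly_correlation_forecast(daily_var1, daily_var2, daily_cov):
    """Sum the 5 daily variance and covariance forecasts into weekly
    figures, then divide the weekly covariance by the product of the
    square roots of the weekly variances to obtain the correlation."""
    v1, v2, c = sum(daily_var1), sum(daily_var2), sum(daily_cov)
    return c / math.sqrt(v1 * v2)

# Hypothetical daily forecasts for one trading week
rho = weekly_correlation_forecast(
    daily_var1=[1.10, 1.00, 0.95, 0.92, 0.90],
    daily_var2=[1.40, 1.30, 1.25, 1.20, 1.18],
    daily_cov=[0.60, 0.55, 0.52, 0.50, 0.49],
)
```

The division in the last step is what turns variance and covariance forecasts into a number bounded between -1 and +1.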
(c) There are various approaches available, including computing simple
historical correlations, exponentially weighted measures, and implied
correlations derived from the prices of traded options.
(d) The simple historical approach is obviously the simplest to calculate, but
has two main drawbacks. First, it does not weight information: so any
observations within the sample will be given equal weight, while those outside
the sample will automatically be given a weight of zero. Second, any extreme
observations in the sample will have an equal effect until they abruptly drop
out of the measurement period. For example, suppose that one year of daily
data is used to estimate volatility. If the sample is rolled through one day at a
time, an observation corresponding to a market crash will appear in the next
250 samples, with equal effect, but will then disappear altogether.
Exponentially weighted moving average models of covariance and variance
(which can be used to construct correlation measures) more plausibly give
additional weight to more recent observations, with the weight given to each
observation declining exponentially as they go further back into the past.
These models have the undesirable property that the forecasts for different
numbers of steps ahead will be the same. Hence the forecasts will not tend to
the unconditional mean as those from a suitable GARCH model would.
Finally, implied correlations may at first blush appear to be the best method
for calculating correlation forecasts accurately, for they rely on information
obtained from the market itself. After all, who should know better about future
correlations in the markets than the people who work in those markets?
However, market-based measures of volatility and correlation are sometimes
surprisingly inaccurate, and are also sometimes difficult to obtain. Most
fundamentally, correlation forecasts will only be available where there is an
option traded whose payoffs depend on the prices of two underlying assets.
For all other situations, a market-based correlation forecast will simply not be
available. Finally, multivariate GARCH models will give more weight to recent
observations in computing the forecasts, but may be difficult and
computationally intensive to estimate.
5. A news impact curve shows the effect of shocks of different magnitudes on
the next period's volatility. These curves can be used to examine visually
whether there are any asymmetry effects in volatility for a particular set of
data. For the data given in this question, the way I would approach it is to put
values of the lagged error into column A, ranging from -1 to +1 in increments
of 0.01. Then simply enter the formulae for the GARCH and EGARCH models
into columns 2 and 3, so that they refer to those values of the lagged error put
in column A. [Figure: news impact curves for the GARCH and EGARCH
models.]
This graph is a bit of an odd one, in the sense that the conditional variance is
always lower for the EGARCH model. This may suggest estimation error in
one of the models. There is some evidence for asymmetries in the case of the
EGARCH model, since the value of the conditional variance is 0.1 for a shock
of +1 and 0.12 for a shock of -1.
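The spreadsheet calculation described above can equally be sketched in code. The parameter values below are hypothetical, and both curves are computed holding the lagged conditional variance at a fixed level, which is the usual convention for news impact curves:

```python
import math

def garch_nic(u, alpha0, alpha1, beta, h_bar):
    """GARCH(1,1) news impact curve: next-period conditional variance as
    a function of the lagged shock u, holding lagged variance at h_bar.
    Symmetric, because only u squared enters."""
    return alpha0 + alpha1 * u**2 + beta * h_bar

def egarch_nic(u, omega, beta, gamma, alpha, h_bar):
    """EGARCH news impact curve: with gamma < 0, negative shocks raise
    next-period variance by more than positive shocks of the same size."""
    z = u / math.sqrt(h_bar)
    log_h = (omega + beta * math.log(h_bar) + gamma * z
             + alpha * (abs(z) - math.sqrt(2.0 / math.pi)))
    return math.exp(log_h)

# Lagged errors from -1 to +1 in increments of 0.01, as in the text
shocks = [x / 100 for x in range(-100, 101)]
# Hypothetical parameter values for illustration
garch_curve = [garch_nic(u, 0.01, 0.1, 0.85, 0.2) for u in shocks]
egarch_curve = [egarch_nic(u, -0.1, 0.9, -0.08, 0.15, 0.2) for u in shocks]
```

Plotting the two curves against the shocks reproduces the comparison discussed above: the GARCH curve is a symmetric parabola, while the EGARCH curve is tilted, sitting higher for negative shocks than for positive shocks of the same size.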
(b) This is a tricky one. The leverage effect is used to rationalise a finding of
asymmetries in equity returns, but such an argument cannot be applied to
foreign exchange returns, since the concept of a Debt/Equity ratio has no
meaning in that context.
On the other hand, there is equally no reason to suppose that there are no
asymmetries in the case of fx data. The data used here were daily USD_GBP
returns for 1974-1994. It might be the case, for example, that news relating to
one country has a differential impact compared with equally good or bad news
relating to another. To offer one illustration, it might be the case that bad
news for the currently weak euro has a bigger impact on volatility than news
about the currently strong dollar. This would lead to asymmetries in the news
impact curve. Finally, it is also worth noting that the asymmetry term in the