
Swiss Finance Institute

Research Paper Series N°06-5

Model Combination and Stock
Return Predictability

Matthias HAGMANN
University of Geneva and Concordia Advisors

Joachim LOEBB
University of Zurich and Swiss Banking Institute
Established at the initiative of the Swiss Bankers' Association, the Swiss
Finance Institute is a private foundation funded by the Swiss banks and
SWX. It merges 3 existing foundations: the International Center FAME, the
Swiss Banking School and the Stiftung "Banking and Finance" in Zurich.
With its university partners, the Swiss Finance Institute pursues the
objective of forming a competence center in banking and finance
commensurate to the importance of the Swiss financial center. It will be
active in research, doctoral training and executive education while also
proposing activities fostering interactions between academia and the
industry. The Swiss Finance Institute supports and promotes promising
research projects in selected subject areas. It develops its activity in
complete symbiosis with the NCCR FinRisk.

The National Centre of Competence in Research Financial Valuation and


Risk Management (FinRisk) was launched in 2001 by the Swiss National
Science Foundation (SNSF). FinRisk constitutes an academic forum that
fosters cutting-edge finance research, education of highly qualified finance
specialists at the doctoral level and knowledge transfer between finance
academics and practitioners. It is managed from the University of Zurich and
includes various academic institutions from Geneva, Lausanne, Lugano,
St.Gallen and Zurich. For more information see www.nccr-finrisk.ch .

This paper can be downloaded without charge from the Swiss Finance
Institute Research Paper Series hosted on the Social Science Research
Network electronic library at:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=897810
Model Combination and Stock Return
Predictability
Matthias Hagmann Joachim Loebb
This version: March 2006

Abstract
Bayesian Model Averaging (BMA) has recently been discussed in the financial
literature as an effective way to account for model uncertainty. In this paper
we compare BMA to a new model uncertainty framework introduced by Yang (2004),
called Aggregate Forecasting Through Exponential Reweighting (AFTER), which
also has a Bayesian interpretation but enjoys several attractive features not
shared by BMA. The AFTER algorithm has nice theoretical properties if the true
model does not belong to the class of considered models, and can easily
incorporate stylized facts of financial data, such as time-varying volatility
and fat tails, in the weighting scheme. Most importantly, the determination of
model weights in AFTER is based on pseudo out-of-sample performance and not on
within-sample criteria, as is the case for BMA. This seems rather attractive
from an investment perspective.
JEL classification: G11; G12; C11

Keywords: Predictability, model combination, Bayesian Model Averaging,
investment strategies.

1 Introduction
The predictability of stock returns has been one of the most discussed issues in
empirical asset pricing over the last decade and a half. Since existing equilibrium
asset pricing theories are relatively silent about which variables should enter the
correct predictive regression model, the empirical findings in the literature are prone
to over-fitting and data snooping concerns as outlined by Ferson, Sarkissian and
Simin (2003) and White (2000). To overcome some of these concerns, Pesaran and
Timmermann (1995, 2000) and Bossaerts and Hillion (1999) consider an investor

The first author gratefully acknowledges financial support from the Swiss National Science
Foundation through the National Center of Competence in Research: Financial Valuation and
Risk Management (NCCR FINRISK). The views expressed in this paper are solely those of the
authors; they are not related to those of the first author's employer. The usual disclaimer applies.
We also would like to thank Rajna Gibson and Greg Connor for helpful comments and suggestions.

Concordia Advisors, United Kingdom and HEC Geneva, Switzerland.

Swiss Banking Institute - University of Zurich, Switzerland.

who chooses a linear prediction model in real time. Using several different statistical
model selection criteria, the investor chooses a single best model from a large set
of models spanned by various potential predictor variables. Clearly, this approach
neglects an important issue, namely the tremendous uncertainty the researcher has
about the correct model. A natural step for the forecaster facing model uncertainty
and/or model instability is to average over a number of forecasts from different
models. The idea of combining forecasts to achieve better forecast performance
in terms of lower mean-square error was introduced by Bates and Granger (1969)
and was widely used successfully by researchers in various statistical fields; see
Clemen (1989) for a general overview of the literature. In a recent application of the
thick modeling approach of Granger and Jeon (2004) in a financial setting, Aiolfi
and Favero (2005) find that averaging over a subgroup of models, chosen by some
statistical selection criteria, significantly improves the quality of the return forecast.
Performing a Monte Carlo simulation and empirical studies, Hendry and Clements
(2004) find more formal evidence on the value-added of averaging over several
models. Further, using the results of the M3-competition (a competition on the
statistical performance of a large number of forecasting models given various data
sets), Hibon and Evgeniou (2005) conclude that averaging over different forecasts
does not in general outperform the best single models. But facing model uncertainty
and instability, it is less risky in practice to combine forecasts rather than selecting
an individual forecasting model.
From a Bayesian point of view, Bayesian model averaging (BMA) is a natural
approach to account for model uncertainty. BMA updates the prior probabili-
ties of a set of candidate models, chosen by the researcher, using within sample
information and then weights the candidate models according to their posterior
probabilities. In fact, as long as the true model belongs to the set of candidate
models, averaging over all candidate models in this fashion provides better pre-
dictive ability, as measured by a logarithmic scoring rule, than using any single
candidate model. In the financial literature, BMA has been applied by Avramov
(2002) and Cremers (2002) to forecast U.S. stock market indices. Both authors find
that the empirical evidence of out-of-sample predictability improves if one accounts
for model uncertainty. Although BMA is a natural way to account for model un-
certainty, there are several question marks which have to be addressed. From a
statistical perspective, it is unrealistic that the true model is a member of the set
of candidate models. Little is known about the behavior of BMA in this case. In
the context of predicting monthly U.S. excess stock returns, Tang (2003) shows in
a simulation study that the performance of BMA decreases in such a case, once
one assumes a realistic structure for the data generating process. Furthermore,
due to its computational complexity, BMA has been carried out assuming that
financial returns are conditionally Gaussian with constant variance. Although the
latter assumption holds approximately true for monthly index data1 , fat tails are
still a prominent feature of the data and in sharp contrast with the assumption of
BMA. From a financial perspective, the calculation of posterior weights in BMA
1
It is well known that time-varying volatility dynamics such as GARCH are more prominent
at higher than at monthly frequency. Especially for index data, those effects seem to be less
important for monthly data.
is based on within sample information. However, Aiolfi and Favero (2005) find
strong empirical evidence that the ranking of different prediction models according
to their within-sample performance does not at all match the ranking of models in
terms of their ex-post forecasting power. The impact of this observation, namely
the performance of investment strategies based on BMA, has to the best of our
knowledge not been examined in the literature so far.
In this study we introduce Yang's (2004) new model combination algorithm into
the financial literature. The so-called AFTER algorithm (Aggregate Forecasting
Through Exponential Re-weighting) allows us to take into account prominent features
of financial data, such as time-varying volatility and a non-Gaussian distribution
of the standardized residuals, in the weighting scheme. Although AFTER is not a
formal Bayesian procedure, the algorithm has an obvious Bayesian interpretation.
A further key feature of AFTER is that the calculation of posterior model proba-
bilities is not based on within sample information (as is the case for BMA), but is
directly linked to the out-of-sample performance of the candidate models. This is
attractive from a financial investment perspective as argued earlier on. Yang (2004)
shows that AFTER enjoys attractive theoretical properties which will be outlined
later on. It is worth noting at this point, that these properties also hold when the
true model is not included in the candidate model set, which will almost always
be the case in real-life situations. Most important, despite these highly attractive
features of AFTER, the algorithm is very simple to implement.
Apart from introducing the AFTER algorithm to the financial return forecast-
ing literature, the contribution of this study is as follows: first, in the context of
predicting excess returns of the S&P 500 index, we provide a detailed out-of-sample
comparison of AFTER and BMA and discuss the resulting implications for the ev-
idence on the predictability of excess stock returns. This out-of-sample comparison
also allows us to characterize the time-varying importance of a popular set of pre-
dictor variables for the potential predictability in excess stock returns.2 Second,
we examine the profitability of a set of trading strategies in real time for both
methodologies. To the best of our knowledge, it has not been examined whether
the attractive properties of BMA, reported by Cremers (2002) and Avramov (2002),
give rise to economically important profits compared to profits generated by the
buy-and-hold strategy. Third, we examine the performance of AFTER in the con-
text of different assumptions concerning the conditional distribution of the data.
So far, in an empirical setting, the algorithm has just been applied by Zou and
Yang (2004) in a Gaussian context.
Our first result is that a simple implementation of AFTER significantly beats
the constant, unconditional benchmark model: AFTER outperforms from a statis-
tical perspective in terms of lower RMSE, whereas in economic terms it also yields
very attractive dynamic investment strategies. These properties do not hold true for
BMA. We also add to the growing evidence, that accounting for model uncertainty
is very important in an asset return forecasting application: All model combination
frameworks we consider in this study produce better results than model selection
2
More concretely, we show the probability or weight of each predictor variable in the weighted
predictive regression model over a long time period. Both Cremers (2002) and Avramov (2002)
give this analysis at just one point in time, namely at the end of their sample.
based on statistical criteria as examined by Bossaerts and Hillion (1999) or Pesaran
and Timmermann (2000). By using two different types of priors for a skeptical and
optimistic investor similar to the study of Cremers (2002), we are able to compare
results across different investor beliefs concerning the likelihood that stock returns
can be predicted using a set of financial and macroeconomic variables. Third, by
calculating a long history of out-of-sample forecasts, we show the time-varying
inclusion weights of popular information variables over 476 months for AFTER and
BMA. Among the variables with the largest weights attributed by BMA and AFTER
were the dividend yield, the default premium, the change in the T-bill yield,
and the change in industrial production.
The outline of the chapter is as follows: Section 2 gives an overview of methods
to account for model uncertainty, Section 3 briefly reviews BMA and Section 4
describes Yang's AFTER algorithm and discusses its theoretical properties. Section
5 collects empirical evidence on the predictability of stock returns arising from the
different modeling approaches to account for model uncertainty. The properties of
different trading strategies are examined in detail in Section 6 to determine whether
an investor could generate economically meaningful profits using such frameworks.
The last section of this chapter summarizes the results and concludes.

2 Accounting for model uncertainty


We consider an investor who believes that future excess stock returns r_{t+1} can
potentially be predicted by a set of financial and macroeconomic indicators
collected in the vector X_t, which are available at time t. However, he does not
know the true data generating process, given by

    r_{t+1} = m_0(X_t^0) + ε_{t+1},                                    (1)

where m_0(·) is the conditional mean function and X_t^0 is a vector collecting the
true set Γ_0 of predictor variables. The classical approach to modeling (1) would
be the selection of one best approximating model for forecasting and inference.
The concern with this approach is that the uncertainty associated with the overall
estimation process is to a large part ignored, which can lead to an overly optimistic
assessment of predictive accuracy (see e.g.
Chatfield (1995) and Draper (1995)). Equation (1) clearly indicates that different
sources of model uncertainty are relevant for the investor: first of all uncertainty
about the functional form of the conditional mean function, and second uncertainty
concerning the true set of predictor variables Γ_0.
Apart from a few exceptions (e.g. Racine (2001) and Kanas (2003); see Clements,
Franses and Swanson (2004) for a recent survey), the vast majority of papers in
the literature has considered a linear functional form for the predictive regression
model:

    r_{t+1} = β_0' X_t^0 + ε_{t+1}.                                    (2)
We will follow this approach and concentrate on model uncertainty concerning
the choice of the appropriate set of financial and macroeconomic variables. Note
that there are two relevant situations: the true set Γ_0 of predictor variables is
contained in the set Γ which the investor considers to be important (M-closed
perspective), or the true set Γ_0 contains some variables which are not contained
in Γ (M-open perspective). Since asset pricing theories are relatively silent about
which variables are contained in Γ_0, the choice of Γ by the investor is mainly driven
by empirical findings, and it is most likely that the true model is not contained in
the set of 2^k possible models {M_j}_{j=1}^{2^k} the investor can generate from the
k variables in Γ. Of course,
the relevance of this assertion grows even stronger once one considers uncertainty
about the functional form of the regression function. This issue will become relevant
once we examine theoretical properties of different approaches to account for model
uncertainty.
In this study, we assume that the investor builds his conditional forecast of
future excess returns by averaging over the resulting forecasts from a whole set of
2^k possible models {M_j}_{j=1}^{2^k}:

    r̂_{t+1} = Σ_{j=1}^{2^k} w_{j,t} r̂_{j,t+1} = Σ_{j=1}^{2^k} w_{j,t} β̂_j' X_t^j,        (3)

where X_t^j is a vector of k_j predictor variables associated with model M_j, β̂_j is the
associated ordinary least squares estimate, r̂_{j,t+1} is the forecast implied by model
M_j, and {w_{j,t}}_{j=1}^{2^k} is a time-varying sequence of non-negative model weights
which sum to one.
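The combination in (3) is simple to implement. The sketch below (function and variable names are ours, for illustration only) forms the weighted average of the candidate models' OLS forecasts:

```python
import numpy as np

def combined_forecast(weights, X_list, betas):
    """Equation (3): weighted average of the candidate models' forecasts.

    X_list[j] holds model j's predictor values at time t, betas[j] its OLS
    coefficient estimates; weights are non-negative and sum to one."""
    forecasts = np.array([X @ b for X, b in zip(X_list, betas)])
    return float(np.dot(weights, forecasts))
```

The model combination frameworks discussed below differ only in how the weight sequence is chosen, so this one function can serve all of them.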
We briefly describe the options the investor has in this study to choose the
sequence of weights {w_{j,t}}_{j=1}^{2^k} and contrast their ability to account for
model uncertainty. Our study will have a strong focus on AFTER and BMA:
Real-time model selection: The model selection approach of Pesaran and
Timmermann (1995) can be regarded as a special case of the above model
combination framework. The investor evaluates all possible models, but then
puts all the weight on a single model chosen by a statistical selection
criterion. Thus the investor does account for model uncertainty in estimating
all possible models at each time t. He then chooses one single model under
the assumption that past statistical performance as measured by a selection
criterion proves useful in choosing a good forecasting model. There is growing
evidence (e.g. Yuan and Yang (2003), Dell'Aquila and Ronchetti (2006)
among others) that model selection is often unstable in the sense that a slight
change in the data causes the selection of a different model, which usually
makes the estimator or forecast based on the selected model have unnecessarily
large variance.
Thick modeling: To overcome the instability of model selection criteria, Aiolfi
and Favero (2005) apply the thick modeling approach of Granger and Jeon
(2004) to forecast financial returns. The investor averages over all or a sub-
group of the best models ranked by a statistical criterion. In other words the
investor tries to reduce model uncertainty in his assessment of future returns
by simple equally weighted averaging over a group of plausible models.
Bayesian model averaging: BMA expresses model uncertainty in the estima-
tion problem with the language of probability theory. The investor has a
prior view on the inclusion probability of a specific model. This view is
updated having observed the sample. In a financial context, this is pursued in
Avramov (2002) and Cremers (2002), where the sequence of model weights
coincides with the model posterior probabilities in a BMA framework.

Adaptive modeling strategy with AFTER: Given a set of models, however


not knowing which model will perform best, the investor is interested in finding
a combined strategy which will perform as well as the best strategy out of
the given set. Notice that this idea is directly linked to the out-of-sample
performance of the model. As will be shown shortly, the AFTER algorithm
has exactly this property. The weights will change favoring models with better
out-of-sample performance and eliminating inferior models at an optimal rate.

We stress the point that the weights of the first three strategies are determined
based on within sample information. This may not be necessarily a good modeling
strategy in a financial context as argued in the introduction. Model selection, BMA
and AFTER will asymptotically recover the true model if this model is contained
in the set of all estimated models. This is in sharp contrast to thick modeling
which will never recover the true model by construction. In an asymptotic sense,
an additional price is paid by thick modeling because it averages over potentially
inferior models as well. However this method may have interesting properties in
small samples.
We note that a large number of combining methods exist in the literature
with an important distinction from the ones examined in this study. All the above
methods try to combine the forecasts for adaptation to the true process. The first
paper on combining, by Bates and Granger (1969), had a more aggressive motivation:
to improve the forecast further by combining different methods. This group of
methods looks at the statistical properties of the out-of-sample forecasts relative
to the realizations and tries to combine for improvement (see e.g. Elliott and
Timmermann (2004) for recent advances in that area). We do not consider this
second motivation for combining methods. Note, however, that averaging through
thick modeling could share some properties with the latter strand of combining
methods.
In the next two sections, we describe the determination of the time-varying
sequence of model weights by BMA and Yang's AFTER algorithm. Thereby, we
focus especially on the properties of the AFTER algorithm which has to the best
of our knowledge not been applied yet in the financial literature.

3 Bayesian Model Averaging


Suppose that the true model is (2), with the additional assumption that
ε ~ N(0, σ²), which implies that r is conditionally Gaussian with constant variance
σ². The investor observes a series of data D = {r_{t'}, X_{t'}}_{t'≤t}, where the
implicit assumption in BMA is that X_t contains X_t^0 (M-closed perspective). The
Bayesian solution to account for model uncertainty in predicting future excess
returns is to choose the model weights in (3) as

    w_{j,t} = Pr(M_j | D) = Pr(D | M_j) Pr(M_j) / Σ_{l=1}^{2^k} Pr(D | M_l) Pr(M_l),

where

    Pr(D | M_j) = ∫ Pr(D | M_j, β_j, σ_j²) Pr(β_j, σ_j² | M_j) d(β_j, σ_j²),        (4)

is the marginal likelihood of model M_j, (β_j, σ_j²) are the parameters of model M_j,
Pr(β_j, σ_j² | M_j) is the prior density of (β_j, σ_j²) under model M_j, Pr(D | M_j, β_j, σ_j²)
is the likelihood function, and Pr(M_j) is the prior probability that model M_j is the
true model. In this case, the combined forecast arising from (3) is an average over
all model forecasts, weighted by the corresponding posterior model probabilities.
Note that all probabilities are implicitly conditional on the model set M, meaning
by construction that the true model is assumed to be in the model set. Under this
condition, Raftery, Madigan and Hoeting (1997) note that averaging over all models
in this fashion provides better predictive ability, as measured by a logarithmic
scoring rule, than using any single model M_j:

    E[ log { Σ_{j=1}^{2^k} Pr(r_{t+1} | M_j, D) Pr(M_j | D) } ]
        ≥ E[ log { Pr(r_{t+1} | M_j, D) } ],        j = 1, ..., 2^k,

where the expectation is with respect to Σ_{j=1}^{2^k} Pr(r_{t+1} | M_j, D) Pr(M_j | D).
This follows from the non-negativity of the Kullback-Leibler information
divergence.

3.1 Bayesian model priors


Concerning model priors, we assume that the investor does not have any a priori
reason to believe that one potential predictor variable is more likely to predict
excess returns than another one. In this case, we can specify the prior probability
of model M_j by

    Pr(M_j) = θ^{k_j} (1 − θ)^{k − k_j},                               (5)

which depends on the parameter θ ∈ [0, 1] and the number k_j of predictor variables
included in model M_j. For θ = 0.5, the investor assigns an equal prior probability
to each model. If θ = 0.25, a model including k_j − 1 predictor variables is a priori
0.75/0.25 = 3 times more favorable for the investor than a model including k_j
variables. The reverse holds true if θ = 0.75.
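As a quick numerical check of (5), the snippet below (our own variable names) verifies the 3:1 prior ratio for θ = 0.25 and confirms that the prior is a proper distribution once every one of the 2^k models of each size is counted:

```python
from math import comb

def model_prior(theta, k_j, k):
    """Equation (5): prior probability of one specific model that includes
    k_j of the k candidate predictor variables."""
    return theta ** k_j * (1 - theta) ** (k - k_j)

# Ratio between a model with k_j - 1 variables and one with k_j variables:
# (1 - theta) / theta = 3 for theta = 0.25.
ratio = model_prior(0.25, 2, 12) / model_prior(0.25, 3, 12)

# Summing over the comb(k, j) models of each size j gives a total prior mass of 1.
total = sum(comb(12, j) * model_prior(0.25, j, 12) for j in range(13))
```

With k = 12 as in the empirical application below, `ratio` evaluates to 3 and `total` to 1, as the binomial structure of (5) requires.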
For the parameters (β_j, σ_j²) of model M_j we follow Raftery et al. (1997) and use
the standard normal-gamma conjugate class of priors

    β_j | σ_j² ~ N(b_j, σ_j² V_j),
    νλ / σ_j² ~ χ²_ν,
where b_j = (α̂_{j0}, 0, 0, ..., 0)', α̂_{j0} is the OLS estimate of the constant in the
regression model M_j, V_j = diag(s_r², g s_1^{−2}, ..., g s_{k_j}^{−2}), g is the
shrinkage parameter for the standard Zellner's g-priors, s_j² is the sample variance
of the predictor variable X_j, and ν and λ are hyperparameters. This particular
choice of the prior structure and the assumption that r is conditionally Gaussian with constant variance
allows us to obtain a closed form solution for the marginal likelihood in (4). This
computational convenience is the main reason that those assumptions have been
imposed, as in the empirical applications involving BMA in a financial context by
Cremers (2002) and Avramov (2002).
The choice of an appropriate prior structure is an ongoing discussion in the
literature on Bayesian model averaging; see Fernandez, Ley and Steel (2001) for
a recent discussion and proposition of a benchmark prior choice. However, to
obtain a better comparison with previous work in a similar context, we borrow some
of our prior settings from Cremers (2002), to whom we refer for a more detailed
treatment of BMA in a financial framework. To examine the effect of different
priors on the out-of-sample forecasting performance of BMA, we will consider in
our empirical application two investors (skeptical and optimistic), differing in their
beliefs whether excess stock returns can be predicted by financial and macroeconomic
variables. The skeptical investor will favor models which are parsimonious
and therefore punishes models which include a lot of variables (θ = 0.25). The
optimistic investor, on the contrary, does not harshly punish model complexity and
imposes less shrinkage (θ = 0.75). We use similar values for the shrinkage parameter
as Cremers (2002): g = 1.25 and g = 3.5. Further, in our empirical implementation,
we fix ν (the degrees of freedom of the Chi-square distribution) at 4
to get a rather flat prior. We then choose λ such that the a priori expected variance
coincides with the one chosen in Cremers (2002) for the skeptical and optimistic
investors (E[σ²] = .99 and E[σ²] = .92).3

3.2 Posteriors and computational implementation


Given the above normal conjugate prior structure, the marginal likelihood in (4)
has a closed-form solution:

    Pr(D | M_j) = [ Γ((ν+n)/2) (νλ)^{ν/2} ] / [ π^{n/2} Γ(ν/2) |I + X_j V_j X_j'|^{1/2} ]
                  × [ νλ + (r − X_j b_j)' (I + X_j V_j X_j')^{−1} (r − X_j b_j) ]^{−(ν+n)/2}.    (6)
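Equation (6) is the density of a multivariate t distribution with location X_j b_j, scale matrix λ(I + X_j V_j X_j') and ν degrees of freedom, evaluated at r. A minimal sketch (our own function and variable names), computed in logs for numerical stability, together with the resulting posterior model weights:

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(r, X, b, V, nu, lam):
    """Log of Pr(D | M_j) in Equation (6) for the normal-gamma prior
    beta | sigma^2 ~ N(b, sigma^2 V), nu*lam / sigma^2 ~ chi^2_nu."""
    n = len(r)
    A = np.eye(n) + X @ V @ X.T                # I + X V X'
    _, logdet = np.linalg.slogdet(A)
    resid = r - X @ b
    Q = resid @ np.linalg.solve(A, resid)      # (r - Xb)' A^{-1} (r - Xb)
    return (gammaln((nu + n) / 2) - gammaln(nu / 2)
            + (nu / 2) * np.log(nu * lam)
            - (n / 2) * np.log(np.pi)
            - 0.5 * logdet
            - (nu + n) / 2 * np.log(nu * lam + Q))

def posterior_weights(log_mls, log_priors):
    """Posterior model probabilities: combine log marginal likelihoods with
    log model priors and normalize via a log-sum-exp shift."""
    z = np.asarray(log_mls) + np.asarray(log_priors)
    z = z - z.max()
    w = np.exp(z)
    return w / w.sum()
```

Working in logs matters here: with 4096 candidate models the raw marginal likelihoods can under- or overflow long before normalization.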

Since the computational burden can grow very quickly when 2^k models need
to be calculated, Raftery et al. (1997) propose two methods for approximating the
posterior probabilities: Occam's window and a Markov chain Monte Carlo (MCMC)
approach. However, since in our empirical setup we only use 12 explanatory
variables, totaling 4096 models, we calculate the posterior for each single model. A
more important concern for our out-of-sample study is the fact that the computation
time of Equation (6) grows with the length of X_j with order O(n²). This is
a particular problem for our recursive out-of-sample studies.4 To keep the overall
computation time feasible, we only calculate a new BMA estimate for β at 6-month
intervals and use this estimate with the new information in X to obtain a forecast
for the next period. Note, however, that posterior model weights are updated monthly.

3
Note that the prior distribution for σ² can be rewritten as an inverse Gamma distribution
IG2(b, a), where E[σ²] = b/(a − 2) or E[σ²] = νλ/(ν − 2) using the hyperparameters of the
Chi-square distribution.

4 The AFTER algorithm


The AFTER model combination algorithm has recently been introduced by Yang
(2004). We will first describe this new model uncertainty framework when stock
returns are supposed to be conditionally normal and discuss its theoretical
properties. Then we discuss combination in the non-Gaussian case and describe the
empirical implementation.

4.1 Combining under Gaussianity


We consider the following model for excess stock returns:

    r_{t+1} = β_0' X_t^0 + ε_{t+1},
    ε_{t+1} = σ_{t+1} η_{t+1},
    η_{t+1} ~ N(0, 1),                                                 (7)

where σ_{t+1}² denotes the conditional variance of excess returns. Note that
as in BMA, this model implies that excess returns are conditionally Gaussian, but
the variance is allowed to be time-varying. Yang (2004) proposes to choose model
weights in (3) as

    w_{j,t} = w_{j,t−1} σ̂_{j,t}^{−1} exp{ −(r_t − r̂_{j,t})² / (2σ̂_{j,t}²) }
              / Σ_{j'=1}^J w_{j',t−1} σ̂_{j',t}^{−1} exp{ −(r_t − r̂_{j',t})² / (2σ̂_{j',t}²) },    (8)

2
where rj,t and j,t are forecasts of model Mj for the conditional mean and variance
respectively, which have been produced at time (t 1). Note that the weight of
each candidate model is updated after each additional observation5 and depends,
by iterating (8) , on its whole past forecasting performance. So whereas BMA uses
in sample information to determine the weight for model Mj , AFTER allocates
model weights on the basis of past pseudo out-of-sample performance.
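The recursion in (8) takes only a few lines. The sketch below (function and variable names are ours) performs one update step, working in logs so that a model with a very small predictive density does not underflow:

```python
import numpy as np

def after_update(w_prev, r_t, mean_fcasts, var_fcasts):
    """One AFTER step under Gaussianity (Equation (8)): multiply each model's
    weight by its Gaussian predictive density at the realized return r_t,
    then renormalize.  mean_fcasts and var_fcasts were produced at t-1."""
    sd = np.sqrt(var_fcasts)
    log_dens = -np.log(sd) - (r_t - mean_fcasts) ** 2 / (2 * var_fcasts)
    z = np.log(w_prev) + log_dens
    z = z - z.max()                 # log-sum-exp shift for stability
    w = np.exp(z)
    return w / w.sum()
```

Starting from equal weights and iterating this step over the pseudo out-of-sample period reproduces the exponential reweighting: models whose density forecasts track the realized returns gain weight, inferior models are driven toward zero.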
The weighting in (8) has an obvious Bayesian interpretation. If we view w_{j,t−1}
as the prior probability of model M_j before observing r_t, then w_{j,t} is the posterior
probability of this model after r_t has become known. Although AFTER has relations
to the Bayesian framework, it is not a formal Bayesian procedure. In particular,
no prior distributions for the parameters are considered.

4
The computation time on a 2.6 GHz processor grows from 20 seconds for the first estimate
with 120 observations to almost an hour for the full set of 596 observations.

5
This is where the name Aggregate Forecasting Through Exponential Reweighting, or AFTER
for short, comes from.

This may be considered
as a drawback compared to BMA, since AFTER does not explicitly account for
parameter uncertainty. However, this may also be considered an advantage, since
fewer inputs have to be chosen to make the algorithm operational. This feature makes
the algorithm more robust with respect to the choice of priors than e.g. BMA. Also,
note that Avramov (2002) reports that in a financial context, model uncertainty
appears more important than parameter uncertainty. Furthermore, since AFTER
needs less input in the form of prior information than BMA, we expect that its
results are more robust than those arising from BMA, whose outcome is sensitive
to the provided prior information.
An attractive feature of AFTER is that, compared to other combining procedures,
its theoretical properties are known. Yang shows that under some regularity
conditions, the combined forecast in (3) satisfies

    (1/T) Σ_{t=1}^T E[ (β_0' X_t^0 − r̂_{t+1})² / σ_{t+1}² ]
        ≤ c inf_{j≥1} { (log 2^k)/T
                        + (1/T) Σ_{t=1}^T E[ (β_0' X_t^0 − r̂_{j,t+1})² / σ_{t+1}² ]
                        + (1/T) Σ_{t=1}^T E[ (σ̂_{j,t+1} − σ_{t+1})² / σ_{t+1}² ] },        (9)

where c is a constant explicitly given in Yang (2004). Therefore, the risk of
the combined forecast is automatically within a multiple of the risk of the best
model plus the risk of variance estimation. This implies that with an appropriate
variance estimator, AFTER attains the best rate of convergence provided by the
individual forecasting procedures. Most importantly, this result also holds in
the M-open perspective, where none of the models considered by the investor is
the true one.
Yang argues that the weighting scheme in (8) should also perform reasonably
well if the conditional distribution is not Gaussian. However, a more efficient
combination scheme can be designed in a non-Gaussian setting. In our application
to monthly excess returns of the S&P 500, we do not find strong evidence of time-
varying variance. Volatility clustering is clearly a more important issue at a higher
than monthly frequency and at a lower aggregation level. Nevertheless, the
error distribution in our application still exhibits fat tails. We will show next how
AFTER allows us to take this feature into account by designing a more efficient
combination scheme than the one based on Gaussian errors.

4.2 Combination under Non-Gaussianity


Suppose that at each time t′ the investor made estimates of the conditional density
of r_{t′+1}, denoted by p_{j,t′}(r_{t′+1}), using his set of different models. For a general
form of the conditional prediction density, Yang proposes the use of the weighting
scheme

    w_{j,t} = w_{j,t−1} p_{j,t−1}(r_t) / Σ_{j'=1}^J w_{j',t−1} p_{j',t−1}(r_t).        (10)

Note that in case p_{j,t−1} is Gaussian with mean β̂_j' X_{t−1}^j and variance σ̂_{j,t}², then (10)
coincides with (8). To account for tail-fatness, we will compare in our empirical
application the Gaussian case to the case where the error term in (7) follows a
standardized t-distribution with density h_ν, where ν denotes the degrees of
freedom. We can then write (10) as

    w_{j,t} = w_{j,t−1} σ̂_{j,t}^{−1} h_{ν̂_{j,t}}( (r_t − r̂_{j,t}) / σ̂_{j,t} )
              / Σ_{j'=1}^J w_{j',t−1} σ̂_{j',t}^{−1} h_{ν̂_{j',t}}( (r_t − r̂_{j',t}) / σ̂_{j',t} ),    (11)

where ν̂_{j,t} is an estimate of the degrees of freedom of the error distribution of model
M_j, obtained by the investor at time (t − 1). Again, under regularity conditions, an
analogous property as in (9) for the Gaussian case holds when the weights of the
different models are chosen under more general distributional assumptions. Note
that the flexibility of the AFTER algorithm to incorporate different assumptions
concerning the conditional distribution of the data is very attractive for financial
applications, where volatility clustering and fat tails are prominent features.
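The density-based update (10)/(11) changes only the predictive density plugged into the recursion. A sketch using SciPy's location-scale t density (names are ours; note that the standardized t in the paper has unit variance, which corresponds to SciPy's `scale` shrunk by the factor sqrt((ν − 2)/ν)):

```python
import numpy as np
from scipy.stats import t as student_t

def after_update_t(w_prev, r_t, mean_fcasts, sd_fcasts, df_fcasts):
    """One AFTER step under Equation (11): weight each model by its
    standardized-t predictive density at the realized return r_t.
    sd_fcasts are conditional standard-deviation forecasts made at t-1."""
    # SciPy's scale-t has variance scale^2 * df/(df-2); rescale so that
    # sd_fcasts really is the predictive standard deviation (requires df > 2).
    scale = sd_fcasts * np.sqrt((df_fcasts - 2.0) / df_fcasts)
    log_dens = student_t.logpdf(r_t, df_fcasts, loc=mean_fcasts, scale=scale)
    z = np.log(w_prev) + log_dens
    z = z - z.max()
    w = np.exp(z)
    return w / w.sum()
```

Relative to the Gaussian update, the fat-tailed density penalizes a single large forecast error less severely, so one outlier month does not wipe out an otherwise good model's weight.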

4.3 A comparison to model selection and Bayesian model averaging
In this paragraph we provide a comparison between AFTER, BMA and selection
methods. Note that the AFTER algorithm for combining density forecasts in Equa-
tion (10) can be written as
$$w_{j,t} = \frac{w_{j,0}\, \prod_{i=1}^{t-1} p_{j,i}(r_i)}{\sum_{j'=1}^{J} w_{j',0}\, \prod_{i=1}^{t-1} p_{j',i}(r_i)}. \qquad (12)$$

The numerator in Equation (12) is an out-of-sample version of the likelihood


function. Following Buckland, Burnham and Augustin (1997), BMA can be approximated
using the weighting scheme

$$w_{j,t} = \frac{w_{j,0}\, \exp(-0.5\, I_{j,t-1})}{\sum_{j'=1}^{J} w_{j',0}\, \exp(-0.5\, I_{j',t-1})}, \qquad (13)$$

where $I_{j,t-1}$ is an information criterion of model $j$ at time step $t-1$, defined as

$$I_{j,t} = -2 \log(L_{j,t}) + q. \qquad (14)$$

$L$ denotes the likelihood function evaluated at the estimated model parameters, and $q$ is
a penalty function depending on the number of parameters and/or the number
of observations: for the AIC, $q = 2p$, and for the BIC, $q = p \log(t)$, where $p$ is the number
of model parameters and $t$ is the number of sample observations.
Examining Equations (12) and (13), the similarities between the two weighting
methods become apparent. BMA and AFTER both weight the different models
using an adjusted likelihood function. Whereas AFTER uses an out-of-sample
version of the likelihood function to discriminate between models, BMA uses the
standard in-sample likelihood function but penalizes model complexity by $q$. Alternatively,
one can say that both algorithms follow a logarithmic scoring rule,
either in-sample or out-of-sample.6 The proposed approximation of BMA further
leads to a preference for parsimonious models. If the estimation environment
exhibits large explanatory power, BMA and model selection will lead to similar
results, because the dominant model will be given the entire weight due to a
large difference in the likelihood functions, which affects the weights exponentially.
In this case AFTER will also achieve a result similar to selection, provided that in-sample
performance is informative about out-of-sample performance. In summary:
combining methods such as AFTER and BMA should be superior to selection-based
methods if model uncertainty is large and statistical discrimination
between different models is therefore difficult. This is especially the case in the low $R^2$
environment of predicting asset returns in small samples, which is the situation we
deal with in our empirical application.
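The approximation in (13)-(14) can be sketched as follows (a sketch, not the authors' code: the function name is ours, and the minimum-IC shift is a standard numerical stabilization, not part of the paper):

```python
import numpy as np

def ic_weights(loglik, n_params, n_obs, criterion="aic"):
    """Approximate BMA weights from an information criterion:
    w_j proportional to exp(-0.5 * I_j), with I_j = -2 log L_j + q."""
    loglik = np.asarray(loglik, float)
    p = np.asarray(n_params, float)
    q = 2.0 * p if criterion == "aic" else p * np.log(n_obs)
    ic = -2.0 * loglik + q
    ic = ic - ic.min()          # shift for numerical stability; weights unchanged
    w = np.exp(-0.5 * ic)
    return w / w.sum()
```

With equal likelihoods the smaller model receives the larger weight, and a large likelihood gap drives the weights towards model selection, as discussed above.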

4.4 Starting weights and empirical implementation


To implement the AFTER algorithm, some starting weights have to be chosen at
the very beginning of the forecasting period. To do so, we borrow from
BMA and choose the model priors as in (5). We again consider a skeptical and
an optimistic investor (prior inclusion probability of 0.25 and 0.75, respectively),
which allows us to examine the sensitivity of the empirical results with respect to
the starting values. A comparison to the analogous cases in BMA is also interesting,
where the model priors have the same interpretation, but parameter uncertainty is
additionally taken care of.
Furthermore, a measure of the error variance for each model is necessary to
implement the AFTER algorithm. Since we do not find strong evidence of time-varying
volatility on a monthly basis in our data, we simply estimate $\sigma^2_{j,t}$ on an
expanding d-month window based on the realized pseudo out-of-sample forecast
errors of model $M_j$. For data exhibiting volatility clustering, exponential smoothing
of the squared residuals would provide an easy to implement variance estimate. A
computationally more expensive method would be to fit an unrestricted GARCH
model to the residuals of the different models. For the Gaussian case, this completes
the implementation. For the standardized t-distribution, we estimate the
degrees of freedom $\nu_j$ for each model from the realized forecast errors of model $M_j$.
Although a maximum likelihood procedure would yield more efficient estimates, we
apply the method-of-moments estimator to an expanding d-month window to reduce
the computational burden.7
6
The logarithmic scoring rule is defined out-of-sample as $\frac{1}{T-1}\sum_{t=1}^{T-1}\log p_t(y_{t+1})$, where $T$ is
the number of out-of-sample steps and $p$ is the density forecast. Note that for the Gaussian kernel,
as is implicitly assumed in many empirical applications, the logarithmic scoring rule is equivalent
to a quadratic scoring rule such as the mean squared error (MSE).
7
The method-of-moments estimate for the degrees of freedom of the normalized (variance
equals 1) t-distribution is $\hat{\nu} = \frac{2(2\hat{m}_4 - 3)}{\hat{m}_4 - 3}$, where $\hat{m}_i$ denotes the $i$-th estimated moment and $\hat{\nu} > 4$
(see e.g. Hansen (1994) or Rockinger and Jondeau (2003) for a discussion of the normalized
t-distribution).
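The moment estimator in footnote 7 can be sketched as follows (names ours; the cap at 40 when the sample kurtosis barely exceeds the Gaussian value is our illustrative choice, echoing the large initial estimates mentioned in footnote 12):

```python
import numpy as np

def nu_from_m4(m4):
    """Footnote 7: nu = 2(2*m4 - 3)/(m4 - 3) = 4 + 6/(m4 - 3),
    valid for m4 > 3, i.e. nu > 4."""
    return 2.0 * (2.0 * m4 - 3.0) / (m4 - 3.0)

def estimate_nu(residuals, cap=40.0):
    """Estimate nu from a window of forecast errors; fall back to the cap
    when the sample fourth moment would imply nu above it."""
    z = np.asarray(residuals, float)
    z = (z - z.mean()) / z.std()        # standardize to unit variance
    m4 = np.mean(z ** 4)
    # nu = 4 + 6/(m4 - 3) exceeds the cap whenever m4 - 3 < 6/(cap - 4)
    if m4 - 3.0 < 6.0 / (cap - 4.0):
        return cap
    return nu_from_m4(m4)
```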

5 Empirical Results
In this section we provide a thorough empirical comparison between BMA and
the AFTER algorithm as outlined in the previous section. After a description of
the data, we will analyze the time-varying predictive content of twelve different
popular predictor variables. We thereafter collect average statistical and financial
performance measures of the different model uncertainty algorithms. To examine
the value of time-varying model weights, we also report results based on the thick
modeling approach of Granger and Jeon (2004) and Aiolfi and Favero (2005), which
simply applies equal weights to each model. Finally, to make our results comparable
to previous research, we also report results when a unit weight is put on a single
optimal model as in Pesaran and Timmermann (1995) and Bossaerts and Hillion
(1999). The selection criteria we consider are Akaike's Information Criterion (AIC),
Akaike (1974), Schwarz's Bayesian Information Criterion (BIC), Schwarz (1978),
and the adjusted $R^2$.

5.1 The data


We use the S&P 500 monthly excess return over the 3-month Tbill rate as the
endogenous variable. The data period runs from February 1955 to September
2004, which is a total of 596 monthly observations. Our employed set of predictor
variables consists of popular financial and macroeconomic indicators, which have
been used extensively in the literature to forecast aggregate market returns (see
Cremers (2002) for an overview of variables used in past research articles). All
data are collected from Bloomberg or the Federal Reserve internet site (mainly
interest rate data) if not stated differently:

The one-month S&P 500 excess return over the 3-month Tbill rate (dRet).8

The dividend yield (Div), calculated as the 3-month trailing difference be-
tween the total return index and the S&P 500 capital gain return.

The earnings yield (E/P) using reported earnings.

The difference of the earnings yield and the 10-year Treasury bond rate (Fed).
This series is often quoted as the Fed model.

The change in the 3-month Treasury bill rate (dTbill).

The default premium (DefP) defined as the difference between the yields on
Moodys BAA and AAA rated corporate bonds.

The term premium (Term) measured as the difference between the 10-year
constant maturity Treasury bond yield and the 3-month Treasury bill rate.

Changes in the US industrial production index (IProd).


8
The S&P 500 total return index prior to 1970 is taken from Ibbotson. After 1970 the
source is Bloomberg.

The seasonally adjusted inflation measured as the change in the US producer
price index for finished goods (PPI).
Changes in inflation (dPPI).
The Fama and French (1993) factor returns SMB and HML.9

Since the macroeconomic data is not available at month-end (IProd, PPI, and
dPPI) we use one month lagged values for those data series.

[TABLE 1 ABOUT HERE]

Table 1 provides some descriptive statistics of the data set used, reporting the
mean, standard deviation, skewness, kurtosis and the Jarque-Bera statistic.
All variables have been multiplied by 100 for better readability. All information
variables except the lagged return exhibit autocorrelation, as can be seen from the
significant Q-statistics. The minima and maxima of the series often lie several
standard deviations away from the mean; e.g. the minimum monthly return of
the S&P 500 was -21.88 percent (a five-standard-deviation event), while the
maximum was 16.06 percent.

[TABLE 2 ABOUT HERE]

Table 2 shows that, over the whole sample period, all information variables have
only a small correlation with the endogenous variable. Among the information variables
there is, however, considerable correlation between PPI inflation and the earnings
yield (0.69). Significant correlation in absolute terms can also be observed between
the Fed model and the dividend yield (0.51), and between the earnings and the dividend
yield (-0.38). Nevertheless, the information data set contains sufficient orthogonal
information.

5.2 Constructing the out-of-sample forecasts


All results reported in this section are based on pseudo out-of-sample forecasts
from February 1965 till September 2004. The period of 120 months from February
1955 to January 1965 is used for calculating initial parameter estimates of the
$2^{12} = 4096$ regression models. In each forecasting period, we re-estimated all $2^{12}$
different regression models with an expanding window, which results in a total of
476 out-of-sample forecasts. Based on the estimated parameters, we constructed pseudo
out-of-sample forecasts for each time period. These time series of forecasts were
then used to generate a sequence of model weights for the following methods:

AFTER in the Gaussian case, where starting weights are chosen by either the
skeptical (ANS) or the optimistic investor (ANO). The weights are calculated
using Equation (8).
AFTER with t-distributed error terms, where starting weights are chosen by
either the skeptical (ATS) or the optimistic investor (ATO). The weights are
calculated using Equation (11).
9
These data series are available from their website and are free for download.

BMA, where priors are chosen by the skeptical (BMAS) or optimistic investor
(BMAO). The posterior probability weights are calculated as described in
Section 2.1.

Thick modeling, where each forecast model gets the same weight.

Selection, where in each period a single optimal model is chosen according
to either the AIC, BIC or adjusted $R^2$.

Once all different weight series are constructed, the final pseudo out-of-sample
forecasts for each different combination or selection algorithm can be computed
using (3).
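The construction above can be sketched as follows (names ours): each model is re-estimated by OLS on the expanding window, and the final forecast is the weighted combination in (3).

```python
import numpy as np

def ols_forecast(y, X, x_next):
    """Fit the linear prediction model on the expanding window by OLS and
    return the one-step-ahead forecast for the predictor values x_next."""
    Z = np.column_stack([np.ones(len(X)), X])      # add an intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.concatenate(([1.0], x_next)) @ beta

def combined_forecast(weights, model_forecasts):
    """Equation (3): the combined forecast is the weighted average of the
    individual model forecasts."""
    return float(np.dot(weights, model_forecasts))
```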

[FIGURE 1 ABOUT HERE]

Figure 1 shows the excess return forecasts generated by these different methods.
For AFTER and BMA, we concentrate in all graphs on the skeptical investor to
save space.10 This investor achieves better results and therefore provides the more
interesting case study. For the same reason, we also omit graphs for model selection
by the adjusted $R^2$.
An apparent feature in the graphs of figure 1 is that the forecast methods relying
on combination rather than selection yield less variable conditional mean estimates.
Note that the plotted forecast series are not as similar as one would expect ex
ante. Table 4 shows the correlation matrix of the different forecasting methods.
The correlations between conditional mean estimates can be as low as 0.58, which
shows that the differences generated by alternative model uncertainty frameworks
are worth analyzing.

[TABLE 4 ABOUT HERE]

[FIGURE 2 AND 3 ABOUT HERE]

Concentrating on the comparison of AFTER and BMA for the skeptical in-
vestor, figure 2 plots the rolling squared correlation coefficient (RSC) between the
forecasts and the actual returns.11 The high RSC in figure 2 during the sixties certainly
stems from the overlap of in-sample and out-of-sample forecasts. However,
the RSC remains between 5% and 8% during 1975-1990 for ANS and is worse
for BMAS and ATS. A similar pattern can be observed for the AIC, BIC and
thick modeling. Thick modeling and ANS exhibit the highest RSC. In the nineties,
however, predictability as measured by the RSC decreases virtually to zero. Interestingly,
the worst modeling approaches catch up in terms of RSC after the year 2000, while
the previous winners remain at low RSC levels. A reason for this could be the
adjustment speed of the weights for models which involve variables whose
10
Graphs for the optimistic investor can be obtained from the authors on request.
11
We first calculated the squared correlation coefficient of the different forecasts with the actual
values over a rolling window of 120 months. To ease comparison, the measures plotted are
6-month moving averages of those numbers.

parameter coefficients suffer from a structural break. As Pesaran and Timmermann
(2002) have shown, structural breaks play an important role in forecasting asset
returns. Thick modeling, which puts equal weight on all models, does not adjust the
weights of models experiencing a structural break at all. Model selection, BMA
and AFTER will exclude or down-weight models, which use information variables
adversely influenced by a structural break. The adjustment speed depends on the
relative in-sample performance before and after the break for BMA and selection,
and on the relative out-of-sample performance for AFTER. Note that the weights
for AFTER depend on their entire history in Equation (10). So if a model had a
very small weight before the break, it will take longer to gain weight based on the
relative performance after the break.
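The RSC measure of footnote 11, a squared correlation over a rolling 120-month window followed by a 6-month moving average, can be sketched as (names ours):

```python
import numpy as np

def rolling_sq_corr(forecast, actual, window=120):
    """Squared correlation between forecasts and realizations over a rolling
    window; entries before the first full window are NaN."""
    f = np.asarray(forecast, float)
    a = np.asarray(actual, float)
    out = np.full(len(f), np.nan)
    for t in range(window - 1, len(f)):
        c = np.corrcoef(f[t - window + 1:t + 1], a[t - window + 1:t + 1])[0, 1]
        out[t] = c ** 2
    return out

def moving_average(x, k=6):
    """k-period moving average used to smooth the plotted RSC series."""
    out = np.full(len(x), np.nan)
    for t in range(k - 1, len(x)):
        out[t] = np.mean(x[t - k + 1:t + 1])
    return out
```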
We note that ANS seems to have a slight advantage over BMAS. However, taking
account of the non-Gaussian error distribution in the design of the combination
algorithm does not seem to improve results. The reason for this slightly disappointing
result might be twofold: first, the deviation from Gaussianity may be too small to
be relevant in this empirical example; second, AFTER with t-distributed errors
requires the additional estimation of the degrees of freedom of the standardized
error distribution.12 These inputs add extra variability and lead to a larger risk
bound, which may worsen results compared to the Gaussian version of AFTER.

5.3 Out-of-sample performance


Table 3 reports summary statistics for the out-of-sample predictability of all con-
sidered modeling frameworks. All results are compared to the unconditional model
involving only a constant as an explanatory variable (the iid model). As expected,
it is very difficult to beat the iid model in statistical out-of-sample performance.
In 56.09% of the months a simple mean forecast would yield the correct sign.
ANS has, with 59.66%, the best sign predictability. The other model uncertainty
algorithms based on AFTER, as well as thick modeling, produce higher sign
predictability than the iid model. We note that this is not the case for BMA and
the selection algorithms. In the second column we report the p-value of the
Pesaran and Timmermann (1992) test against the null of no predictive performance,
which is asymptotically equivalent to the Henriksson and Merton (1981) test for
market timing ability. All AFTER models and thick modeling reject the null of no
market timing ability at the 1% level, while BMA and selection by AIC or BIC reject
the null at the 5% level.
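The Pesaran and Timmermann (1992) statistic can be sketched as follows; this follows the standard published form of the test (names ours), with a one-sided normal p-value:

```python
import numpy as np
from scipy.stats import norm

def pesaran_timmermann(actual, forecast):
    """Directional-accuracy test: compares the hit rate with the rate expected
    under sign independence; asymptotically N(0,1) under the null."""
    y = np.asarray(actual) > 0
    x = np.asarray(forecast) > 0
    n = len(y)
    p_hat = np.mean(y == x)                    # observed hit rate
    py, px = y.mean(), x.mean()
    p_star = py * px + (1 - py) * (1 - px)     # expected hit rate under the null
    v_p = p_star * (1 - p_star) / n
    v_star = ((2 * py - 1) ** 2 * px * (1 - px) / n
              + (2 * px - 1) ** 2 * py * (1 - py) / n
              + 4 * py * px * (1 - py) * (1 - px) / n ** 2)
    stat = (p_hat - p_star) / np.sqrt(v_p - v_star)
    return stat, 1.0 - norm.cdf(stat)
```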

[TABLE 3 ABOUT HERE]

Measuring forecasting quality by the root-mean-squared error (RMSE), ANS


and thick modeling produce significantly better results than the iid model at the
10% level according to the Diebold and Mariano (1995) test statistic including 4 lags
12
As we estimate the degrees of freedom with an expanding window, the average degrees of
freedom per model starts at more than 40 in the first 10 years. Through the various crises of
the seventies the number falls to values around 10, and after the crash of 1987 the degrees of
freedom fall further to 7.

for autocorrelation. The RMSE for the constant, unconditional model equals 0.2008
whereas it is 0.1976 for ANS and 0.1983 for thick modeling. These two models also
have slightly better mean absolute deviation (MAD) than the iid model. All other
forecasting methods based on AFTER or BMA yield statistical results similar to those of
the iid model. Selection-based methods tend to underperform the iid model, which
confirms the results of Bossaerts and Hillion (1999).
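The Diebold and Mariano (1995) comparison with 4 autocorrelation lags can be sketched as follows (a sketch with squared-error loss and a Bartlett-weighted long-run variance; names ours):

```python
import numpy as np

def diebold_mariano(e1, e2, h=4):
    """DM statistic for the loss differential d_t = e1_t^2 - e2_t^2, with a
    Newey-West long-run variance using h lags; asymptotically N(0,1) under
    the null of equal predictive accuracy. Negative values favor model 1."""
    d = np.asarray(e1, float) ** 2 - np.asarray(e2, float) ** 2
    n = len(d)
    dc = d - d.mean()
    lrv = np.mean(dc ** 2)                          # variance at lag 0
    for k in range(1, h + 1):
        gamma = np.mean(dc[k:] * dc[:-k])           # autocovariance at lag k
        lrv += 2.0 * (1.0 - k / (h + 1.0)) * gamma  # Bartlett weight
    return d.mean() / np.sqrt(lrv / n)
```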
It can be seen from Table 3 that there is a tendency for the skeptical investor,
who relies more heavily on parsimonious models, to perform better statistically than
the optimistic investor. For this reason, and to save space, we restrict most of
our graphs to results for ANS, ATS and BMAS.

[TABLE 5 ABOUT HERE]

Table 5 collects the average weight of the iid model and the full model (i.e. all
information variables included) for the two AFTER algorithms and BMA. ANO
gives some weight to the full model: especially in the early seventies this weight
rises up to 20%, as shown in figure 5. BMA yields a zero posterior probability for
the full model in both the skeptical and the optimistic prior case. The BMAS
investor, however, assigns a large weight of over 30% to the iid model in the first
ten years of the sample period (1965 to 1975). ATS, ATO and BMAO put only
little weight on these two polar cases out of the 4096 considered models. To deepen
this analysis, figures 6 and 7 show how the aggregated
weight of parsimonious models evolves through time for the various algorithms.
The aggregated weight of parsimonious models is defined as the sum of the weights
allocated at time t to models with 6 or fewer information variables included.
Both BMA priors lead to a strong preference for parsimony. Surprisingly, the two
AFTER algorithms show very different patterns. The Gaussian AFTER version
exhibits highly persistent weights for the parsimonious models, depending on
the starting weights. In contrast, AFTER with t-distributed errors shows a clear
preference for parsimonious models through time. However, even given the large
number of 476 out-of-sample steps, the adjustment speed is not very high.

[FIGURES 4 TO 7 ABOUT HERE]

When model uncertainty is large, as we think is certainly the case in forecasting
asset returns, the initial weights play an important role, as can be seen for the
AFTER algorithm. If no single model or group of models has a clear advantage
in terms of out-of-sample predictability over the others, the initial weights will show
high persistence in the AFTER algorithm (Equation (10)). This is a possible
explanation for the better statistical performance of the skeptical investor under the
two proposed AFTER weighting schemes. For BMA, the slightly better statistical
out-of-sample performance of the skeptical investor is to a large extent caused by
the chosen prior structure, which leads to a preference for parsimonious models and a
tendency towards equal weighting. In a recent study, Dell'Aquila and Ronchetti (2006)
look at the discrimination power of well-known model selection criteria when the
$R^2$ is low, as is the case in our asset return predictability study. They find that,
out of the $2^k$ models, there is potentially a large group of models whose members

are not significantly distinguishable from each other in terms of selection criteria.
This potentially makes any ranking procedure, and especially the use of a single
model, spurious. Dell'Aquila and Ronchetti (2006) further argue that the posterior
probabilities of BMA will only be spuriously different from an equal weighting in a
low $R^2$ setting, indicating why thick modeling might be a satisfactory solution in
such an environment. Although AFTER uses out-of-sample information, a similar
property could arise in a low $R^2$ environment. However, whereas both BMA and
AFTER automatically adjust the weights to the number of relevant models in such
a case, thick modeling requires one to choose explicitly, in an ad-hoc fashion, the
number of models over which the average forecast is taken. We think
that the good performance of averaging over all models in our study is specific to
this data set. Results for averaging over subgroups selected by AIC and BIC were less
favorable (in the range of the BMA results) and are omitted because we focus on
the comparison of BMA and AFTER.
In summary, we conclude that methods accounting for model uncertainty perform
better in an asset return forecasting framework than methods relying on a
single optimal model. The simplest way to adjust for model uncertainty, weighting
all models equally over time, also performs well, which confirms the findings
of Aiolfi and Favero (2005). All combining models perform better than
the selection models. Our results also show that it remains difficult to beat the iid
model by a large margin.

5.4 Time-varying weights of information variables


Since we calculate a large number of out-of-sample forecasting steps, we can
analyze how the weights of specific information variables included in the models
change through time. Table 6 shows the mean and variance of the weight given to each
information variable. The inclusion weight is calculated as the sum of the weights
of all models that include the information variable at hand.
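The inclusion weight can be sketched as follows (names ours; each of the $2^k$ models is identified with the binary vector of variables it includes):

```python
import numpy as np
from itertools import product

def inclusion_weights(model_weights, k):
    """Sum, for each of the k variables, the weights of all models that
    include it: the rows of `models` enumerate all 2^k inclusion patterns."""
    models = np.array(list(product([0, 1], repeat=k)), dtype=float)
    w = np.asarray(model_weights, float)
    return w @ models       # length-k vector of inclusion weights
```

With equal weights over the 4 models on 2 variables, each variable has an inclusion weight of 0.5, which is the thick-modeling benchmark.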

[TABLE 6 ABOUT HERE]

Looking first at the difference in weights between the skeptical and the optimistic
investor, we see a clear down-weighting of all variable inclusion weights for the
skeptical investor. In the BMA case this is a direct consequence of the prior structure
and is driven by choosing 0.25 as the prior inclusion probability of each variable in
Equation (5). As we argued earlier, a model including $j-1$ predictor variables is
a priori $0.75/0.25 = 3$ times more favorable for the investor than a model including
$j$ variables. This prior view remains constant over time. Hence the average weight
of each information variable is driven by the other prior hyperparameters and, with
an augmenting sample size, increasingly by relative model performance. The weights,
or rather posterior probabilities, appear to be very sensitive to the choice of the
shrinkage parameter g. For the AFTER algorithm, on the contrary, Equation (5)
is only used for the starting weights: every information variable has a starting
weight, summed over all models, equal to 75% for the optimistic and 25% for the
skeptical investor. These weights are then updated at each new out-of-sample
step. However, as seen in the previous part,

with only little explanatory power of each single information variable on the one hand
and low overall statistical fit on the other, starting weights show high persistence.
This is the case for the lagged return (dRet) and the Fama and French HML variable,
as the weights remain around the starting weights with little volatility. In contrast,
information variables with better explanatory power do move in the same direction
away from their starting weights or tend towards a 50% inclusion weight. AFTER
with t-distributed errors shrinks all weights more towards zero than its Gaussian
counterpart. These results would remain unchanged if we analyzed sub-periods of
Table 6. In this asset return forecasting exercise it seems crucial for the application
of the AFTER algorithm to start with the weights of the skeptical investor, favoring
parsimonious models, and then let the weights adjust through time.
The weights for BMA are direct averages of the posterior probabilities of models
including a specific information variable. In sharp contrast to Cremers (2002), the
dividend yield (Div) gets the highest posterior probability: 68.56% for the optimistic
and 71.19% for the skeptical investor. All other interest-rate-related variables
also achieve considerable posterior probability weights.

[FIGURE 8 AND 9 ABOUT HERE]

For the reasons given above, we focus our analysis of the time-varying inclusion
weights on the skeptical investor's priors. The time-varying weights are plotted in
the graphs of figures 8 and 9. All market-return-related variables (lagged return,
HML and SMB factors) show high persistence in the inclusion weights. The low
explanatory power of these variables over the entire time span is confirmed by the
little posterior weight assigned by BMA. Most interest-rate-related information variables
show, on the contrary, highly volatile patterns for all three approaches, although
not necessarily in the same direction. Surprisingly, the ATS patterns tend to move
similarly to BMA for the dividend yield, the default premium and the change in the
Tbill yield, while ANS moves in the opposite direction for these information variables.
This graphically confirms that the lowest correlation of return forecasts among
all forecasting algorithms, 0.58, is between the two different AFTER methods. However,
we think that the correlation among the interest-rate-related variables leads to these
effects, as there might be a problem of collinearity through certain periods. The
two oil shocks and the resulting strong focus on monetary policy are reflected in the
weights of the Tbill yield and the Fed model and generally lead to volatile weights
for all interest-rate-related variables through that period. This holds especially
true at the beginning of the eighties. Another striking pattern is that almost all
inclusion weights for ATS have fallen to virtually zero, except for industrial
production, PPI and the change in the Tbill yield. Again, a faster adjustment speed
to a generally more difficult forecasting environment (recall the low rolling $R^2$ in
the nineties in figure 2) could be a possible explanation.

6 Economic significance of predictability


As argued in Granger and Pesaran (2000), it is important to analyze the statistical
performance of different forecast models together with the economic relevance
of any decisions taken upon the results. To conclude our empirical study, we use the
obtained out-of-sample return forecasts from the various models to look at the long-term
performance of different investment strategies. We examine these strategies
from a statistical perspective rather than from a real-world performance point of
view. For this reason we abstract from transaction costs. Furthermore,
while e.g. a switching strategy would have been nearly impossible at reasonable
cost at the beginning of our sample, it can be implemented at virtually no cost today
using financial futures or other derivatives. Finally, leverage is excluded for all
trading strategies, or more formally $|w| \le 1$, where $w$ is the weight of the risky
asset.
The (long-term) evaluation of dynamic trading strategies is difficult for various
reasons. Most importantly, dynamic trading strategies can be used to create
nonlinear pay-off structures and non-normal returns. For this reason, standard
mean-variance performance evaluation metrics such as the Sharpe ratio can lead to
wrong conclusions, especially when pay-offs are asymmetric, since the Sharpe ratio
will favor concave strategies (see e.g. Leland (1999)). In an empirical study, Lhabitant
(2000) shows, among other things, that applying the Sharpe ratio as a risk-adjusted
measure to portfolios with option strategies, such as writing covered calls, leads to
a wrong evaluation of outperformance.
In our assessment of the results of the trading strategies implemented in this
study we report a number of measures. First we give the statistical measures of
excess returns: the mean, standard deviation, skewness and kurtosis. Being aware
of the drawbacks of the standard Sharpe measure, we use an adjusted Sharpe
ratio advocated by Graham and Harvey (1997) to keep its intuition but adjust
for market timing capabilities. To calculate the adjusted Sharpe ratio we leverage
or de-leverage the trading strategy's expected return by the ratio of the volatility
of the S&P 500 to the volatility of the trading strategy. We also calculate the
maximum drawdown of each strategy. Finally, we use the Diebold and Mariano
(1995) test statistic as an autocorrelation-adjusted substitute for the popular t-test
of significant outperformance against a benchmark. At the end of this section
we look at certainty equivalent returns as an economically more appropriate
measure of long-term performance.
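The volatility-matched return underlying the adjusted Sharpe ratio, and the maximum drawdown, can be sketched as follows (names ours; the leverage step follows the description above, not Graham and Harvey's exact implementation):

```python
import numpy as np

def vol_matched_return(strategy_excess, market_excess):
    """(De-)leverage the strategy's mean excess return by the ratio of market
    to strategy volatility, so both are compared at the S&P 500 risk level."""
    s = np.asarray(strategy_excess, float)
    m = np.asarray(market_excess, float)
    return s.mean() * m.std() / s.std()

def max_drawdown(returns):
    """Largest peak-to-trough loss of the cumulative wealth curve."""
    wealth = np.cumprod(1.0 + np.asarray(returns, float))
    peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peak))
```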

6.1 Switching strategy


The first strategy shown in Table 7 is a switching strategy depending on the sign of
the forecast, similar to Pesaran and Timmermann (1995). The long/short strategy
invests in the S&P 500 depending on the sign of the forecast, i.e. it is either long or
short the market. The second strategy is either long the market given a positive
sign, or long the risk-free asset when the forecasted excess return turns negative. All
combining models beat the buy-and-hold strategy of the iid benchmark model in
terms of excess return and adjusted Sharpe ratio. The ANS investor is significantly
better than the benchmark for both strategies according to the Diebold and Mariano
(1995) test statistic on the outperformance against the benchmark. For the long-only
strategy, ANS and thick modeling are significantly better at the 10% level. ANS
achieves, with 9.32%, the highest excess return, compared to 3.77% for the S&P 500.

Again the simple method of thick modeling works fairly well. Investment strategies
based on BMA and the selection algorithms have large drawdowns of around 50%,
which means that an investor who had followed such a strategy from the worst
entry moment would have lost 50% of his initial wealth. The long-only strategy
seems more favorable, since all models outperform the benchmark in every
respect. As expected from the disappointing statistical results, the model-selection
based approaches are inferior.
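The two switching strategies can be sketched as follows (names ours). Working with excess returns means the risk-free position in the long-only variant earns zero excess return:

```python
import numpy as np

def switching_returns(forecast, market_excess, long_only=False):
    """Sign-switching strategy: long the market on a positive forecast; on a
    negative forecast, short the market (long/short) or hold the risk-free
    asset (long only)."""
    f = np.asarray(forecast, float)
    pos = np.where(f > 0, 1.0, 0.0 if long_only else -1.0)
    return pos * np.asarray(market_excess, float)
```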

[TABLE 7 ABOUT HERE]

6.2 Weighted sign allocation strategy


To stress the importance of accounting for model uncertainty, we implement a second
type of strategy. Instead of using the sign of the overall forecast, we calculate the
weighted sign over all $2^k$ models, using either the time-varying weights of AFTER,
the constant equal weights of thick modeling, or the posterior probabilities of
BMA:
$$\mathrm{wsign}_t = \sum_{j=1}^{2^k} w_{j,t}\,\mathrm{sign}(\hat{r}_{j,t}) \qquad (15)$$
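Equation (15) and the resulting market position can be sketched as follows (names ours; the weighted sign lies in [-1, 1], so the leverage constraint is satisfied by construction):

```python
import numpy as np

def weighted_sign_position(weights, model_forecasts, long_only=False):
    """Equation (15): the fraction of wealth invested in the market is the
    weight-averaged sign of the individual model forecasts; the long-only
    variant truncates negative positions at zero."""
    ws = float(np.dot(weights, np.sign(model_forecasts)))
    return max(ws, 0.0) if long_only else ws
```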

Again we split this strategy into a long/short and a long-only version. The
investor invests a fraction of his wealth, given by the above formula, in the
market and the risk-free asset, taking the long-only constraint into account for the
second strategy. Table 8 shows the results of the described weighted sign strategies.
As expected, the risk level as measured by the standard deviation and the maximum
drawdown is lower than for the pure sign switching strategy. For the long/short
strategy, ANS and thick modeling achieve a higher adjusted Sharpe ratio than under
the pure sign switching strategy. However, only AFTER performs significantly
better than the S&P 500 at the 10% significance level. The long-only weighted
sign strategy does not yield significant outperformance for any model, but yields
better risk-adjusted returns than the benchmark. Overall, the weighted
sign strategy yields the highest risk-adjusted performance of all other
strategies considered in this study.

[TABLE 8 ABOUT HERE]

6.3 Mean-variance rule strategy


As a third strategy we look at the optimal rule for the portfolio problem. Since
the focus of this study lies in the comparison of different statistical models
for return predictability and model combination, we take a pragmatic approach
and approximate the solution of the single-period optimal portfolio problem with a
simple mean-variance portfolio rule, applied equally to all models. However, we briefly
discuss the implied assumptions and some of the difficulties that arise when one wishes
to solve the optimal portfolio problem in a more general setting including model
uncertainty.
In a general approach, the investor maximizes the expected utility of terminal
wealth by solving the following dynamic program and by choosing the optimal
allocation weights w of the risky asset over the time span s = t to T :

J(Dt ) = max E [U (WT )|Dt ] (16)


{w}T
s=t
Z
= max U (WT ) p(WT |Dt )dWT .
{w}T
s=t

$p(W_T \mid D_t)$ is the predictive distribution of terminal wealth given the data $D$
up to time $t$. Though the formulation might look rather simple, a closed-form
solution to this dynamic program exists only in stylized cases. Even if one
chooses a specific parametric form of the utility function (e.g. the HARA class),
the distribution of terminal wealth depends heavily on the assumptions driving the
opportunity set and the model framework. To solve the multi-period problem, one
usually assumes an observable and parametric structure of state variables driving
the stochastic opportunity set. Detemple, Garcia and Rindisbacher (2003) provide
a general simulation-based solution to the multi-period portfolio problem when
asset and state variable dynamics are known.
In a Bayesian setting, however, the parameters of $p(W_T \mid D_t)$ might not be
observable directly and must be learned (see e.g. Xia (2001)). Further, in the case
of model uncertainty there does not exist one single correct model. Even if every
single model has a parametric form, the combined density is not necessarily
easily obtained.
Second, solving for the optimal portfolio weights conditional on $p(W_T \mid D_t)$ is
extremely difficult, since to obtain the predictive density $p(W_T \mid D_t)$ one
would need to integrate over the entire history of the returns and state variables,
which is analytically and numerically intractable.
We follow the existing literature and reduce the multi-period problem to a
single-period one instead.13 Since the seminal paper of Merton (1971), the main
difference between a single-period and a multi-period problem is known to be hedging
demands against adverse shifts in the stochastic opportunity set. However, these
hedging demands are found to be relatively small compared to the market risk (see e.g.
Ang and Bekaert (1999), Brandt (1999) and Aït-Sahalia and Brandt (2001)). For
the single period, the investor maximizes the expected utility of next period's wealth

    J(D_t) = max_{w_t} E[ U(W_{t+1}) | D_t ]                                (17)
           = max_{w_t} ∫ U( W_t [ w_t exp(r_{t+1} + r_f) + (1 - w_t) exp(r_f) ] ) p(r_{t+1} | D_t) dr_{t+1},

where p(r_{t+1} | D_t) denotes the distribution of next period's return conditional
on the information set D_t, and w_t is the weight on the risky asset. Given a parametric
13
Specifically within a Bayesian framework; see e.g. Kandel and Stambaugh (1996), Pastor
(2000), Pastor and Stambaugh (2000) and Barberis (2000).

form of the utility function, whether a closed-form solution to the problem exists
depends on the predictive return distribution. Within a model combination framework
the predictive distribution will be a mixture of densities, so a closed-form solution
is usually not attainable; only in the model selection case does one deal with a
single density function. To obtain the optimal portfolio weight one would first need
to construct the combined predictive density of all models. Within the simple
framework of this study, this would e.g. be a mixture of normals for the (normal)
AFTER algorithm and thick modeling, and a mixture of t-distributions for BMA. The
optimal weight is then obtained numerically by sampling from the combined density
estimate (see Avramov (2002) for an application and further details).
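The sampling approach just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two-component mixture of normals, its weights, means and volatilities, and the risk-free rate are all hypothetical numbers; the expected utility is approximated by Monte Carlo and the weight is found by grid search over the long-only range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical combined predictive density: a two-component mixture of
# normals, e.g. two candidate models with combination weights 0.7 / 0.3.
# All parameter values are illustrative only.
weights = np.array([0.7, 0.3])
means   = np.array([0.004, -0.002])   # monthly expected excess returns
stds    = np.array([0.04, 0.05])

n = 100_000
comp = rng.choice(2, size=n, p=weights)          # pick mixture component
r = rng.normal(means[comp], stds[comp])          # draws of next period's excess return

rf, gamma, W0 = 0.003, 5.0, 1.0                  # risk-free rate, CRRA, initial wealth

def crra(W, gamma):
    """CRRA utility U(W) = W^(1-gamma)/(1-gamma), log utility for gamma = 1."""
    return np.log(W) if gamma == 1 else W**(1 - gamma) / (1 - gamma)

# Grid search over the risky weight maximizing the Monte Carlo estimate
# of E[U(W_{t+1})] as in equation (17).
grid = np.linspace(0.0, 1.0, 201)
eu = [crra(W0 * (w * np.exp(r + rf) + (1 - w) * np.exp(rf)), gamma).mean()
      for w in grid]
w_star = grid[int(np.argmax(eu))]
print(f"optimal risky weight: {w_star:.3f}")
```

With these illustrative numbers the optimum lies in the interior of [0, 1], close to what the mean-variance rule of Section 6.3 would suggest.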
However, if one assumes continuous-time diffusions for the expected returns
and volatilities that are independent of the return shocks, then the solution to the
single-period optimal portfolio problem takes the following well-known form:14 the
investor allocates a weight w_t to the risky asset of

    w_t = - [ J_W (μ_t - r_f) ] / [ J_WW W σ_t^2 ].                          (18)
This is usually referred to as the mean-variance rule, where μ_t - r_f is the instan-
taneous expected excess return on the risky asset. For continuously compounded
returns over a discrete time span from t to t+1, and taking a CRRA utility func-
tion U(W) = W^(1-γ)/(1-γ) for γ ≠ 1 and U(W) = ln(W) if γ = 1, the optimal portfolio
allocation rule is

    w_t = (1/γ) (μ_{t+1} - r_f) / σ_{t+1}^2 + 1/(2γ),                       (19)
where 1/γ is the inverse of the relative risk aversion. Kandel and Stambaugh (1996)
find that this simple rule approximates the portfolio allocation in their Bayesian
setting very well when compared to the results obtained from the true t-distributed
predictive density. We use this simple rule as an approximation to the solution of
the single-period problem for all models by plugging in the appropriate model
estimates for the expected returns. For the volatility estimate we take a rolling
sample estimate over 120 months, equally for all models, to keep the focus on the
expected return estimate of each model. Finally, we impose a relative risk aversion
of γ = 5.
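Rule (19) reduces to a one-line plug-in computation. The sketch below assumes illustrative values for the expected excess return `mu_hat` and the 120-month rolling volatility `sigma_hat`; they are not estimates from the paper.

```python
def mv_weight(mu_excess, sigma, gamma=5.0):
    """Mean-variance rule (19) for continuously compounded returns:
    w_t = (1/gamma) * (mu_{t+1} - r_f) / sigma_{t+1}^2 + 1/(2*gamma),
    where mu_excess = mu_{t+1} - r_f is a model's expected excess return
    and sigma is the rolling-sample volatility estimate."""
    return mu_excess / (gamma * sigma**2) + 1.0 / (2.0 * gamma)

# Illustrative monthly numbers: expected excess return of 0.5% and a
# 120-month rolling volatility of 4.2%, with relative risk aversion 5.
mu_hat, sigma_hat = 0.005, 0.042
w = mv_weight(mu_hat, sigma_hat, gamma=5.0)
print(round(w, 3))   # -> 0.667
```

Note the second term 1/(2γ): even a zero expected excess return implies a small positive allocation (here 0.1), which is the Jensen correction for continuously compounded returns.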
Table 9 shows the results for the mean-variance trading strategy. ANS is the
best long/short strategy, achieving a 6.83% excess return per annum and an adjusted
Sharpe ratio of 1.04. It is the only strategy that is significantly better than the iid
strategy at the 10% level. Thick modeling is the second best, while the selection-based
models are slightly inferior to the combined models. In the long only domain, ANS is
again the highest yielding strategy with 4.76% and a Sharpe ratio of 1.22; however,
this time BMA, thick modeling, BIC and AIC achieve the highest Sharpe ratios. Due to
the large skewness and kurtosis values, though, the adjusted Sharpe ratios must be
interpreted cautiously.
14
See Gron, Jørgensen and Polson (2005) for a thorough mathematical treatment of the single-
period case with stochastic volatility.
period case with stochastic volatility

[TABLE 9 ABOUT HERE]

6.4 Certainty equivalent returns


In the introduction to this section we already pointed to some drawbacks of mean-
variance based performance measures such as the Sharpe ratio. The economically
most appropriate way to compare the performance of different trading strategies or
pay-offs to contingent claims is the concept of the certainty equivalent return, which
takes the preferences of the investor into account. The certainty equivalent return
r_ceq is the return at which an investor is ex ante indifferent between the trading
strategy return r_{t+1} and the certain return. In other words, we solve

    U(exp(r_ceq)) = E[ U(exp(r_{t+1})) | D_t ]                              (20)
                  = ∫ U(exp(r_{t+1})) p(r_{t+1} | D_t) dr_{t+1}

for r_ceq, where p(r_{t+1} | D_t) denotes the predictive distribution of next period's
return conditional on the information D_t. Under the simplifying assumption that
the returns to our trading strategy are approximately iid, the estimate of r_ceq is
defined as the solution to the empirical version of (20):
    U(exp(r_ceq)) = (1/T) Σ_{t=1}^{T} U(exp(r_t)).

We use the CRRA utility function U(W) = W^(1-γ)/(1-γ) for γ ≠ 1 and U(W) = ln(W) if
γ = 1. As for the mean-variance rule, we set the relative risk aversion parameter
to γ = 5. In the tables we report certainty equivalent returns r_ceq in excess of
the average risk-free rate over the corresponding period.
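For CRRA utility the empirical version of (20) can be inverted in closed form. The following sketch applies it to simulated iid monthly returns; the return series and its parameters are illustrative, not the paper's data.

```python
import numpy as np

def cer(returns, gamma=5.0):
    """Empirical certainty equivalent return: solve
    U(exp(r_ceq)) = (1/T) * sum_t U(exp(r_t)) for r_ceq,
    with CRRA utility U(W) = W^(1-gamma)/(1-gamma) (log utility if gamma = 1).
    `returns` are continuously compounded strategy returns."""
    r = np.asarray(returns, dtype=float)
    if gamma == 1:
        return r.mean()                        # log utility: CER equals the mean
    u_bar = np.mean(np.exp(r) ** (1 - gamma)) / (1 - gamma)
    # Invert U: W_ceq = ((1-gamma) * u_bar)^(1/(1-gamma)), then r_ceq = ln(W_ceq)
    return np.log(((1 - gamma) * u_bar) ** (1.0 / (1 - gamma)))

# Illustrative: 476 simulated iid monthly returns (mean 0.5%, std 4.5%).
rng = np.random.default_rng(1)
r = rng.normal(0.005, 0.045, size=476)
print(cer(r, gamma=5.0) < r.mean())   # risk penalty pushes the CER below the mean
```

The gap between the mean return and the CER is the risk penalty; for γ = 5 it is roughly (γ/2) times the return variance, which is why the iid strategy can show a negative excess CER in Table 10 despite a positive mean return.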
Table 10 shows a summary of the CER over all three types of strategies. The
first line shows the value of the CER for the iid strategy. The negative sign reflects
that a risk-averse investor (γ = 5) would prefer a smaller certain return to bearing
the risk of the iid strategy, especially when fully invested in the risky asset as in
the switching and weighted sign strategies. ANS achieves the highest CER in all
strategies and is the single best model. Simple thick modeling is the second best
strategy. All long only strategies achieve large economic gains compared to the
constant benchmark, while for the long/short strategies the selection-based models,
except for the AIC, are clearly inferior. However, the next section will show that
these economic gains disappeared after 1993.

[TABLE 10 ABOUT HERE]

6.5 Results for sub-periods


The statistical results and the performance of the trading strategies over the entire
time span point to both statistical and economic gains from predictability in stock
returns. However, as shown in the rolling R2 Figures 2 and 3, the predictive
ability of the models decreased to virtually zero after 1992 up until 2000. It is
therefore natural to look at the statistical and economic performance of
the various models and trading strategies in the two sub-periods 1964 to 1993 and
1993 to 2004.
Table 11 presents summary statistics of the out-of-sample predictability
of the various models by sample period. The more recent 1993 to 2004 period was
governed by up-trending markets, which is why the iid model predicts the sign of
returns correctly in 62.75% of the cases. This number reads only 52.94% in the
1965 to 1993 period, where markets were less upward trending.

[TABLE 11 ABOUT HERE]

All statistical models show significant market-timing ability at the 5% level, and
except for selection by adjusted R2, all models are better than the iid model in
terms of RMSE and MAD. The picture changes dramatically for the second period. The
iid model is best across all statistical performance measures, and only ATS can
significantly time the market at the 10% level. Although this result seems
discouraging, the AFTER algorithm nevertheless shows its desirable properties: the
two AFTER versions for the skeptical investor, favoring parsimonious models, achieve
levels of sign predictability above 60% even in the more recent sub-period. BMA,
thick modeling and selection-based forecasting methods perform very poorly, with
sign predictability ranging from only 49% to 54%.

[TABLE 12 ABOUT HERE]

In Table 12 we compare the certainty equivalent returns for the two periods.
The picture is similar to the statistical results. In the first sub-period all models
achieve large economic gains as measured by the CER compared to the iid strategy,
with ANS being the single best strategy, followed by thick modeling and BMA;
selection based on the AIC also achieves a very high CER. Clearly, when the
achievable R2 is reasonably high, as in the first period, model selection is at no
disadvantage relative to combining. However, the picture again changes dramatically
after 1993. ANS is the only forecasting method that achieves a CER in excess of the
iid strategy for the switching and weighted sign strategies; all other forecasting
methods are clearly inferior to the constant benchmark.
We think that one interpretation of these results is directly connected to the
efficient market hypothesis. Around the beginning of the 1990s, several academic
studies examined time-variation in risk premia and the predictability of stock
returns.15 Once this information became public and quantitative tools more widely
available, it could no longer be exploited for an economic gain.
Another interpretation of the results is that the period after 1993 is potentially
influenced by a structural break that investors should account for in their investment
behavior. We refer to Pesaran and Timmermann (2002) for estimation methods
accounting for this feature.
15
See e.g. Ferson (1990), Ferson and Harvey (1991, 1993), Bekaert and Hodrick (1992), Solnik
(1993) among others.

7 Concluding remarks
In this article we have introduced Yang's (2004) AFTER algorithm into a financial
framework and compared it with a recently applied Bayesian framework to account
for model uncertainty, Bayesian model averaging. Unlike BMA, the AFTER al-
gorithm can account for prominent features of financial data in the weighting
scheme, such as time-varying volatility and a non-Gaussian distribution of the stan-
dardized residuals. Furthermore, the true model is not required to be in the set of
candidate models. Most importantly, compared to BMA and other combination
methods based on statistical criteria, the update of weights in the AFTER algorithm
is based on out-of-sample information.
Our out-of-sample forecast study on the S&P 500 excess return yielded several
important results. First, the important positive message of this study is that
AFTER, in the most simple case with Gaussian errors, significantly beats
the constant, unconditional benchmark model both in terms of RMSE and in the
outperformance of all trading strategies considered: the switching strategy, the
weighted sign strategy and the mean-variance rule strategy. This is not the
case for BMA. We also compare the results with a recent application of the
thick modeling approach of Granger and Jeon (2004) by Aiolfi and Favero (2005)
in a financial setting. Giving constant equal weight to all models, our application of
thick modeling yields significantly better RMSE results than the iid model, beating
all other models except AFTER with Gaussian errors. Unfortunately, adjusting for
fat tails by estimating AFTER with t-distributed errors did not improve results over
the simple Gaussian case. A possible explanation is that estimating the degrees of
freedom of the t-distribution adds further noise and thereby expands the algorithm's
risk bound relative to the best model.
Second, we add to the growing evidence that accounting for model uncertainty
is very important in an asset return forecasting application. All model combination
methods performed better than selection methods based on statistical criteria, as
examined in Bossaerts and Hillion (1999) or Pesaran and Timmermann (2000). By
using two different types of priors for a skeptical and an optimistic investor, similar
to the study of Cremers (2002), we were able to compare results expressing different
views about model parsimony. As expected, the skeptical investor's prior, i.e.
giving more initial weight to parsimonious models, led to better results. However,
in a typically low-R2 environment the initial weights play a crucial role for the
AFTER algorithm and must be chosen carefully; the intuition behind preferring
parsimonious models is a good starting point. A sensitivity analysis of the speed
at which the AFTER algorithm adjusts towards the best model, depending on the
ex-post R2, could yield more insight in future research.
Third, by calculating a long history of out-of-sample forecasts, we were able
to show the time-varying inclusion weights of popular information variables over
476 months for AFTER and BMA. Among the variables attributed the largest
weights by BMA and AFTER were the dividend yield, the default premium, the
change in the Tbill yield and the change in industrial production. However, during
the 1990s the explanatory power of all applied forecasting models fell to virtually
zero, which confirms the view of Timmermann and Granger (2004): the constant
search by researchers and market participants for predictable patterns affects prices
as they attempt to exploit new trading opportunities. The recent rise in explanatory
power leaves room for alternative explanations based on nonlinearity,
non-stationarity and structural breaks in the data generating process during certain
time periods.
Research on accounting for model uncertainty using BMA, thick modeling and
other combining methods in general, and the AFTER algorithm in particular, is
far from exhausted. These initial results for AFTER within a simple linear
framework are very promising and call for several extensions. Combined forecasts
from linear and nonlinear models are a natural extension to our study along the
lines of Terui and van Dijk (2002) and Stock and Watson (1998); especially the
time-variation in the weight attributed to linear and nonlinear models could shed
more light on the ongoing discussion of whether nonlinear models should in fact be
preferred over linear models. As Pesaran and Timmermann (2002) have pointed
out, and as our results for the sub-period after 1993 indicate, accounting for
structural breaks should increase forecasting power; a sensitivity analysis could
shed more light on this question for AFTER. Further, within a low-R2 environment
the question remains open whether simple averaging algorithms such as thick modeling
with constant weights are superior to more complicated and time-varying methods
such as AFTER and BMA, i.e. whether the time-variation in weights adds further
noise to the forecast. Using weekly or daily data would allow analyzing the
combination of models with more realistic specifications of the conditional
distribution of the error terms. We leave these questions for future research.

References
Aiolfi, M. and Favero, C. A.: 2005, Model uncertainty, thick modelling and the predictability of stock returns, Journal of Forecasting 24(4), 233–254.

Aït-Sahalia, Y. and Brandt, M.: 2001, Variable selection for portfolio choice, Journal of Finance 56(4), 1297–1351.

Akaike, H.: 1974, A new look at the statistical model identification, IEEE Transactions on Automatic Control 19(6), 716–723.

Ang, A. and Bekaert, G.: 1999, International asset allocation with time-varying correlations, National Bureau of Economic Research Working Paper 7056.

Avramov, D.: 2002, Stock return predictability and model uncertainty, Journal of Financial Economics 64(3), 423–458.

Barberis, N.: 2000, Investing for the long run when returns are predictable, Journal of Finance 55(1), 225–264.

Bates, J. and Granger, C.: 1969, The combination of forecasts, Operational Research Quarterly 20(4), 451–468.

Bekaert, G. and Hodrick, R. J.: 1992, Characterizing predictable components in excess returns on equity and foreign exchange markets, Journal of Finance 47(2), 467–509.

Bossaerts, P. and Hillion, P.: 1999, Implementing statistical criteria to select return forecasting models: what do we learn?, Review of Financial Studies 12(2), 405–428.

Brandt, M. W.: 1999, Estimating portfolio and consumption choice: A conditional Euler equations approach, Journal of Finance 54(5), 1609–1645.

Buckland, S. T., Burnham, K. P. and Augustin, N. H.: 1997, Model selection: An integral part of inference, Biometrics 53(2), 603–618.

Chatfield, C.: 1995, Model uncertainty, data mining and statistical inference, Journal of the Royal Statistical Society. Series A 158(3), 419–466.

Clemen, R. T.: 1989, Combining forecasts: A review and annotated bibliography, International Journal of Forecasting 5(4), 559–583.

Clements, M. P., Franses, P. H. and Swanson, N. R.: 2004, Forecasting economic and financial time-series with non-linear models, International Journal of Forecasting 20(2), 169–183.

Cremers, K. J. M.: 2002, Stock return predictability: A Bayesian model selection perspective, Review of Financial Studies 15(4), 1223–1249.
Dell'Aquila, R. and Ronchetti, E.: 2006, Stock and bond return predictability: The discrimination power of model selection criteria, Computational Statistics & Data Analysis 50(6), 1478–1495.

Detemple, J., Garcia, R. and Rindisbacher, M.: 2003, A Monte Carlo method for optimal portfolios, Journal of Finance 58(1), 401–446.

Diebold, F. X. and Mariano, R. S.: 1995, Comparing predictive accuracy, Journal of Business and Economic Statistics 13(3), 253–265.

Draper, D.: 1995, Assessment and propagation of model uncertainty, Journal of the Royal Statistical Society. Series B (Methodological) 57(1), 45–97.

Elliott, G. and Timmermann, A.: 2004, Optimal forecast combinations under general loss functions and forecast error distributions, Journal of Econometrics 122(1), 47–79.

Fama, E. F. and French, K. R.: 1993, Common risk factors in the returns on stocks and bonds, Journal of Financial Economics 33(1), 3–56.

Fernandez, C., Ley, E. and Steel, M. F.: 2001, Benchmark priors for Bayesian model averaging, Journal of Econometrics 100(2), 381–427.

Ferson, W. E.: 1990, Are the latent variables in time-varying expected returns compensation for consumption risk?, Journal of Finance 45(2), 397–429.

Ferson, W. E. and Harvey, C. R.: 1991, The variation of economic risk premiums, Journal of Political Economy 99(2), 385–415.

Ferson, W. E. and Harvey, C. R.: 1993, The risk and predictability of international equity returns, Review of Financial Studies 6(3), 527–566.

Ferson, W. E., Sarkissian, S. and Simin, T. T.: 2003, Spurious regressions in financial economics?, Journal of Finance 58(4), 1393–1414.

Graham, J. R. and Harvey, C. R.: 1997, Grading the performance of market-timing newsletters, Financial Analysts Journal 53(6), 54–66.

Granger, C. W. J. and Jeon, Y.: 2004, Thick modeling, Economic Modelling 21(2), 323–343.

Granger, C. W. J. and Pesaran, M. H.: 2000, Economic and statistical measures of forecast accuracy, Journal of Forecasting 19(7), 537–560.

Gron, A., Jørgensen, B. N. and Polson, N. G.: 2005, Optimal portfolio choice and stochastic volatility. Working Paper, University of Chicago.

Hansen, B. E.: 1994, Autoregressive conditional density estimation, International Economic Review 35(3), 705–730.

Hendry, D. F. and Clements, M. P.: 2004, Pooling of forecasts, Econometrics Journal 7(1).
Henriksson, R. D. and Merton, R. C.: 1981, On market timing and investment performance. II. Statistical procedures for evaluating forecasting skills, Journal of Business 54(4), 513–533.

Hibon, M. and Evgeniou, T.: 2005, To combine or not to combine: selecting among forecasts and their combinations, International Journal of Forecasting 21(1), 15–24.

Kanas, A.: 2003, Non-linear forecasts of stock returns, Journal of Forecasting 22(4), 299–315.

Kandel, S. and Stambaugh, R. F.: 1996, On the predictability of stock returns: An asset-allocation perspective, Journal of Finance 51(2), 385–424.

Leland, H. E.: 1999, Beyond mean-variance: Performance measurement in a nonsymmetrical world, Financial Analysts Journal 55(1), 27–36.

Lhabitant, F.-S.: 2000, Derivatives in portfolio management: Why beating the market is easy, Derivatives Quarterly 7(2), 37–46.

Merton, R. C.: 1971, Optimum consumption and portfolio rules in a continuous-time model, Journal of Economic Theory 3(4), 373–413.

Pastor, L.: 2000, Portfolio selection and asset pricing models, Journal of Finance 55(1), 179–223.

Pastor, L. and Stambaugh, R. F.: 2000, Comparing asset pricing models: An investment perspective, Journal of Financial Economics 56(3), 335–381.

Pesaran, M. H. and Timmermann, A.: 1992, A simple nonparametric test of predictive performance, Journal of Business and Economic Statistics 10(4), 461–465.

Pesaran, M. H. and Timmermann, A.: 1995, Predictability of stock returns: Robustness and economic significance, Journal of Finance 50(4), 1201–1228.

Pesaran, M. H. and Timmermann, A.: 2000, A recursive modelling approach to predicting UK stock returns, The Economic Journal 110(460), 159–191.

Pesaran, M. H. and Timmermann, A.: 2002, Market timing and return prediction under model instability, Journal of Empirical Finance 9(5), 495–510.

Racine, J.: 2001, On the nonlinear predictability of stock returns using financial and economic variables, Journal of Business and Economic Statistics 19(3), 380–383.

Raftery, A. E., Madigan, D. and Hoeting, J. A.: 1997, Bayesian model averaging for linear regression models, Journal of the American Statistical Association 92(437), 179–191.

Rockinger, M. and Jondeau, E.: 2003, Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements, Journal of Economic Dynamics and Control 27(10), 1699–1737.
Schwarz, G.: 1978, Estimating the dimension of a model, The Annals of Statistics 6(2), 461–464.

Solnik, B.: 1993, The performance of international asset allocation strategies using conditioning information, Journal of Empirical Finance 1(1), 33–55.

Stock, J. H. and Watson, M. W.: 1998, A comparison of linear and nonlinear univariate models for forecasting macroeconomic time series, National Bureau of Economic Research Working Paper 6607.

Tang, D. Y.: 2003, Asset return predictability and Bayesian model averaging, Working Paper, University of Texas at Austin.

Terui, N. and van Dijk, H. K.: 2002, Combined forecasts from linear and nonlinear time series models, International Journal of Forecasting 18(3), 421–438.

Timmermann, A. and Granger, C. W. J.: 2004, Efficient market hypothesis and forecasting, International Journal of Forecasting 20(1), 15–27.

White, H.: 2000, A reality check for data snooping, Econometrica 68(5), 1097–1126.

Xia, Y.: 2001, Learning about predictability: The effects of parameter uncertainty on dynamic asset allocation, Journal of Finance 56(1), 205–246.

Yang, Y.: 2004, Combining forecasting procedures: Some theoretical results, Econometric Theory 20(1), 176–222.

Yuan, Z. and Yang, Y.: 2003, Combining linear regression models: When and how?, Working Paper.

Zou, H. and Yang, Y.: 2004, Combining time series models for forecasting, International Journal of Forecasting 20(1), 69–84.
Table 1: Descriptive statistics for the data
mean std. skew. kurt. B-J p-Q(4) min max
dRet 0.50 4.22 -0.36 4.65 80.16 0.81 -21.88 16.06
Div -1.96 2.32 -0.58 4.02 58.97 0.00 -10.26 4.29
E/P 6.68 2.63 0.90 3.29 81.83 0.00 1.61 14.37
DefP 0.96 0.42 1.36 5.00 284.23 0.00 0.32 2.69
dTbill 0.00 0.49 -1.39 15.66 4168.40 0.00 -3.85 2.40
Fed -0.05 2.26 0.52 2.56 31.77 0.00 -4.90 6.21
Term 1.39 1.21 0.02 2.59 4.21 0.00 -2.02 4.72
IProd 3.44 5.17 -0.35 3.77 26.81 0.00 -12.42 21.67
dPPI 0.01 0.65 0.43 6.26 281.98 0.00 -2.40 3.60
PPI 3.28 3.78 1.52 5.36 368.73 0.00 -2.90 19.60
SMB 0.22 2.92 0.30 4.89 97.51 0.06 -11.60 14.62
HML 0.34 2.99 -0.56 10.26 1341.00 0.01 -20.79 14.92
Note: Descriptive statistics for the data running from 02/1955 to 09/2004.
The explanatory variables are the following series: The one month lagged
S&P 500 excess return over the 3 month Tbill rate (dRet), the dividend yield
(Div), calculated as the 3-month trailing difference between the total return
index and the S&P 500, the earnings yield (E/P) and the difference of the
earnings yield and the 10-year Treasury bond rate (Fed), the change in the
3-month Treasury bill rate (dTbill), the default premium measured as the
difference between the yields on Moody's BAA and AAA rated corporate
bonds (DefP), the term premium (Term) measured as the difference of the
10-year constant maturity Treasury bond yield and the 3-month Treasury bill
rate, the US industrial production data (IProd), the seasonally adjusted US
producer price index for finished goods (PPI) and its first difference (dPPI).
The macroeconomic data is lagged one month. All data is from Bloomberg
or the Federal Reserve internet site except the S&P 500 total return index
previous to 1970 which is taken from Ibbotson. The Fama and French style
factors SMB and HML are taken from their internet site. The shown statistics
are the mean, standard deviation, skewness and kurtosis, the Jarque-Bera test
statistic, the p-level of the Q-test statistic up to the fourth lag, and the
minimum and maximum of the series.

Table 2: Correlation of information set and endogenous variable
Ret dRet Div E/P DefP dTbill Fed Term IProd dPPI PPI SMB HML
Ret 1.00
dRet 0.02 1.00
Div 0.13 0.13 1.00
E/P 0.05 -0.02 -0.38 1.00
DefP 0.07 0.07 -0.42 0.52 1.00
dTbill -0.15 -0.09 -0.12 -0.03 -0.17 1.00
Fed 0.10 0.04 0.51 0.42 -0.23 -0.01 1.00
Term 0.11 0.10 0.33 -0.20 0.29 -0.25 -0.38 1.00
IProd -0.08 -0.08 0.07 -0.19 -0.53 0.15 0.01 -0.14 1.00
dPPI -0.07 -0.04 0.04 -0.04 -0.16 0.03 0.05 -0.06 0.20 1.00
PPI -0.07 -0.08 -0.45 0.69 0.35 0.04 0.28 -0.38 -0.28 0.08 1.00
SMB 0.05 0.16 0.07 0.03 0.08 -0.02 0.06 0.06 -0.12 -0.04 0.04 1.00
HML -0.07 -0.26 0.00 0.00 0.01 -0.10 -0.03 0.05 0.05 -0.03 0.01 -0.22 1.00

Figure 1: Expected return forecasts for the skeptic investor
(Six panels: ANS, ATS, BMAS, AIC selection, BIC selection, Thick modeling.)

Table 3: Statistics of out-of-sample predictability
signpred signtest RMSE MAD bias
iid 56.09% 50.00% 0.2008 3.34 -0.07
ANO 57.35% 0.39% 0.2009 3.37 0.06
ANS 59.66% 0.01% 0.1976 3.31 0.12
ATO 56.72% 0.81% 0.2016 3.39 0.21
ATS 57.77% 0.77% 0.2006 3.34 0.03
BMAO 54.41% 1.77% 0.2011 3.40 0.38
BMAS 54.62% 3.39% 0.2009 3.37 0.38
Thick 56.93% 0.35% 0.1983 3.33 0.24
AIC 54.83% 1.77% 0.2040 3.44 0.37
BIC 53.99% 1.44% 0.2032 3.40 0.51
R2 53.78% 7.59% 0.2041 3.44 0.21
Note: Reported are statistics of the out-of-sample pre-
dictability of the various models. The first column denotes
the sign predictability of a model, i.e. the percentage of the
time that the sign of the forecast equals the sign of the excess
return. The second column reports the probability against the
null of no predictive performance of the Pesaran and Timmer-
mann (1992) test, which is asymptotically equal to the test of
Henriksson and Merton (1981) for market timing ability. Fi-
nally, the root mean square error (RMSE), the mean absolute
deviation (MAD) and the bias of the forecasts are reported.
Values denoted with a * are significantly different from the
iid model at the 10%-level using the Diebold and Mariano
(1995) test statistic with 4 lags for autocorrelation.

Table 4: Correlation results between models


ANO ANS ATO ATS BMAO BMAS Thick AIC BIC R2
ANO 1.00
ANS 0.86 1.00
ATO 0.80 0.67 1.00
ATS 0.69 0.58 0.95 1.00
BMAO 0.81 0.72 0.89 0.81 1.00
BMAS 0.68 0.68 0.84 0.79 0.95 1.00
Thick 0.93 0.82 0.93 0.84 0.91 0.84 1.00
AIC 0.83 0.67 0.90 0.83 0.89 0.79 0.91 1.00
BIC 0.66 0.66 0.82 0.79 0.86 0.89 0.83 0.78 1.00
R2 0.86 0.67 0.93 0.84 0.89 0.79 0.92 0.94 0.75 1.00
Note: The table shows the correlations between the estimated excess returns r_{j,t} of
the various modeling approaches.

Figure 2: Rolling R2 for AFTER, AFTER-t and BMA for the skeptical investor (series: ANS, ATS, BMAS)
Figure 3: Rolling R2 for thick modeling, selection by AIC and BIC (series: Thick, AIC, BIC)

Note: The R2 is calculated as the rolling squared correlation coefficient between the
estimated out-of-sample excess return forecasts and the true excess returns, with a window
of 120 observations (including the in-sample results for the first observations). The series
are further smoothed with a 6-month moving average.

Table 5: Average weights of the full and iid model
iid model full model
mean std. mean std.
Equal weight 0.0244% 0.0244%
ANO 0.0000% 0.0000% 5.2427% 5.7262%
ANS 0.1433% 0.5654% 0.0009% 0.0023%
ATO 0.0000% 0.0000% 0.3050% 0.7859%
ATS 0.5159% 1.0854% 0.0000% 0.0000%
BMAO 0.1402% 0.3533% 0.0000% 0.0000%
BMAS 4.1286% 8.5113% 0.0000% 0.0000%
Note: Average weight (AFTER) / average posterior probability
(BMA) of the full model (i.e. including all information variables)
and the iid model (i.e. including only the constant).

Figure 4: Posterior probabilities of the iid model for BMA (series: BMAO, BMAS)


Figure 5: Weights of the full model for AFTER (series: ANO, ANS)

Figure 6: Sum of weights for all models with #variables ≤ 6 for the optimistic investor (series: ANO, ATO, BMAO)
Figure 7: Sum of weights for all models with #variables ≤ 6 for the skeptic investor (series: ANS, ATS, BMAS)

Table 6: Average inclusion weights of information variables
ANO ANS ATO ATS BMAO BMAS
dRet mean 74.75% 26.61% 58.41% 7.67% 11.54% 2.35%
std 4.71% 6.57% 10.17% 5.51% 2.96% 0.80%
Div mean 71.59% 62.80% 51.20% 19.07% 68.56% 71.19%
std 24.60% 28.47% 25.39% 17.27% 15.45% 21.93%
E/P mean 67.11% 32.34% 27.55% 8.13% 28.94% 5.05%
std 26.09% 26.40% 21.19% 8.49% 5.71% 1.98%
DefP mean 83.46% 29.70% 54.02% 27.85% 53.99% 44.11%
std 11.31% 16.46% 23.37% 21.15% 29.07% 38.19%
dTbill mean 77.15% 47.19% 69.10% 30.32% 51.63% 32.11%
std 28.01% 30.59% 16.61% 22.67% 28.48% 30.26%
Fed mean 57.27% 30.69% 42.70% 16.98% 40.57% 13.70%
std 23.06% 11.24% 19.31% 15.29% 12.67% 8.59%
Term mean 68.78% 28.90% 48.24% 18.96% 31.13% 15.17%
std 20.41% 22.78% 16.98% 17.32% 11.64% 9.64%
IProd mean 94.59% 56.08% 72.62% 55.58% 22.48% 6.21%
std 4.15% 17.24% 13.26% 25.15% 8.81% 3.56%
dPPI mean 80.47% 41.70% 31.89% 7.76% 15.90% 4.27%
std 11.90% 18.00% 19.53% 7.34% 7.21% 2.38%
PPI mean 91.61% 64.14% 60.92% 63.73% 37.56% 7.74%
std 5.32% 15.86% 20.60% 24.68% 16.02% 5.20%
SMB mean 95.30% 72.65% 59.41% 12.12% 12.26% 2.78%
std 2.88% 10.45% 11.13% 4.86% 7.31% 2.20%
HML mean 72.34% 24.85% 64.02% 18.23% 10.55% 2.15%
std 10.11% 6.83% 4.90% 2.91% 4.83% 1.19%
Note: Reported are the means and standard deviations of the inclusion weights
of the information variables for the various models over the entire out-of-sample
period. For AFTER the inclusion weight of a variable in a given period is the
combined weight given to all models that include that variable; for BMA it is
the total posterior probability of the models that include that variable.

[Plots: six panels (dRet, DivYield, E/P, DefPrem, dTbill, FedModel), each showing
series ANS, ATS and BMAS over 1970–2000 on a 0.0–1.0 scale.]
Figure 8: Inclusion probability/weight for information variables dRet, DivYield,
E/P, DefPrem, dTbill and FedModel

[Plots: six panels (Term, IProd, dPPI, PPI, SMB, HML), each showing series ANS,
ATS and BMAS over 1970–2000 on a 0.0–1.0 scale.]
Figure 9: Inclusion probability/weight for information variables Term, IProd, dPPI,
PPI, SMB and HML

Table 7: Results of switching investment strategies based on the forecasted
sign
Long/short strategy
ex. ret. std. ann. skew. kurt. SR MaxD dm-test
S&P 500 3.77% 15.21% -0.58 5.50 0.25 44.73% -
iid 3.77% 15.21% -0.58 5.50 0.25 44.73% -
ANO 7.26% 14.93% 0.08 4.22 0.51 41.98% 0.1305
ANS 9.32% 14.81% 0.01 4.31 0.66 38.87% 0.0253
ATO 5.38% 15.07% -0.19 4.49 0.37 49.20% 0.2953
ATS 6.20% 15.03% -0.15 4.51 0.42 44.73% 0.1699
BMAO 4.85% 15.01% 0.24 3.99 0.34 49.38% 0.3878
BMAS 4.60% 15.07% 0.07 4.22 0.31 49.38% 0.4091
Thick 7.01% 14.95% 0.03 4.25 0.49 40.54% 0.1451
AIC 4.83% 15.08% -0.04 4.37 0.33 50.08% 0.3728
BIC 3.52% 15.09% 0.17 4.16 0.24 52.55% 0.4721
R2 3.28% 15.15% -0.03 4.33 0.22 38.87% 0.4367
Long only strategy
ex. ret. std. ann. skew. kurt. SR MaxD dm-test
iid 3.77% 15.21% -0.58 5.50 0.25 44.73% -
ANO 6.03% 11.14% 0.10 5.29 0.94 38.87% 0.0791
ANS 7.04% 11.25% 0.04 5.26 1.04 38.87% 0.0133
ATO 5.09% 11.24% -0.33 6.13 0.80 45.32% 0.1954
ATS 5.36% 12.36% -0.27 4.83 0.65 44.73% 0.1140
BMAO 4.95% 9.96% 0.39 7.78 1.08 38.87% 0.2701
BMAS 4.81% 10.20% 0.12 6.06 0.99 38.87% 0.2867
Thick 5.96% 10.60% 0.07 5.86 1.06 39.77% 0.0832
AIC 4.91% 10.30% -0.11 7.83 0.98 46.03% 0.2486
BIC 4.33% 9.58% 0.34 7.92 1.09 34.51% 0.3828
R2 4.11% 10.63% -0.18 7.33 0.80 38.87% 0.4172
Note: The investment strategies are based on the out-of-sample excess return
forecasts of the various models. There are no transaction costs. The switching
strategy invests according to the sign of the forecasted excess return: in the
long/short strategy the investor is either long or short the S&P 500, depending
on the sign of the forecasted excess return, while in the long only strategy the
investor holds either the S&P 500 or the risk-free asset. Reported are the
excess returns and their standard deviation, skewness and kurtosis. Additionally
the adjusted Sharpe ratio and the maximum drawdown of each strategy are shown.
Finally, outperformance against the S&P 500 is tested using the Diebold and
Mariano (1995) test statistic with 4 lags for autocorrelation; the table reports
the p-value under the null hypothesis of zero outperformance.
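The Diebold–Mariano comparison reported in the last column can be sketched as follows. This is a minimal illustration: it truncates the long-run variance after 4 autocovariances with flat weights, since the paper does not state which kernel it uses, so it should not be read as the authors' exact implementation.

```python
import numpy as np
from math import erf, sqrt

def diebold_mariano(loss_diff, lags=4):
    """Diebold-Mariano (1995) test on a loss differential.

    loss_diff[t] = loss(benchmark)[t] - loss(strategy)[t], so positive
    values favour the strategy. The long-run variance truncates after
    `lags` autocovariances with flat weights (an illustrative choice).
    Returns the DM statistic and the one-sided p-value of the null of
    zero outperformance.
    """
    d = np.asarray(loss_diff, dtype=float)
    T, d_bar = len(d), float(np.mean(d))
    # Long-run variance: gamma_0 + 2 * (gamma_1 + ... + gamma_lags).
    lrv = float(np.mean((d - d_bar) ** 2))
    for k in range(1, lags + 1):
        lrv += 2.0 * float(np.mean((d[k:] - d_bar) * (d[:-k] - d_bar)))
    dm_stat = d_bar / sqrt(lrv / T)
    p_value = 0.5 * (1.0 - erf(dm_stat / sqrt(2.0)))
    return dm_stat, p_value
```

A small p-value, as for ANS in the tables, indicates that the average loss differential in favour of the strategy is too large to attribute to sampling noise.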

Table 8: Results of investment strategies accounting for model uncertainty
in sign forecasts
Long/short strategy
ex. ret. std. ann. skew. kurt. SR MaxD dm-test
S&P 500 3.77% 15.21% -0.58 5.50 0.25 44.73% -
iid 3.77% 15.21% -0.58 5.50 0.25 44.73% 0.3761
ANO 5.81% 12.05% -0.02 5.03 0.74 33.77% 0.2191
ANS 7.07% 10.58% 0.21 5.19 1.21 35.02% 0.0828
ATO 4.49% 12.30% -0.16 5.52 0.57 42.57% 0.3932
ATS 5.46% 12.05% -0.45 5.89 0.70 44.47% 0.2219
BMAO 3.94% 11.76% 0.54 6.01 0.59 32.53% 0.4799
BMAS 3.99% 11.72% 0.43 6.22 0.60 37.83% 0.4727
Thick 5.13% 9.93% 0.24 6.15 1.12 31.27% 0.2994
AIC 4.83% 15.08% -0.04 4.37 0.33 50.08% 0.3728
BIC 3.52% 15.09% 0.17 4.16 0.24 52.55% 0.4721
R2 3.28% 15.15% -0.03 4.33 0.22 38.87% 0.4367
Long only strategy
ex. ret. std. ann. skew. kurt. SR MaxD dm-test
iid 3.77% 15.21% -0.58 5.50 0.25 44.73% -
ANO 4.41% 9.47% 0.19 7.07 1.13 33.77% 0.3518
ANS 5.20% 8.82% 0.13 7.54 1.51 35.02% 0.1913
ATO 3.59% 9.65% -0.27 7.69 0.95 41.00% 0.4543
ATS 4.53% 10.34% -0.45 6.71 0.92 44.47% 0.3053
BMAO 3.46% 8.00% 0.33 10.37 1.50 31.81% 0.4338
BMAS 4.03% 8.33% 0.17 9.14 1.48 32.97% 0.4454
Thick 3.51% 7.81% 0.32 10.13 1.61 31.27% 0.4414
AIC 4.91% 10.30% -0.11 7.83 0.98 46.03% 0.2486
BIC 4.33% 9.58% 0.34 7.92 1.09 34.51% 0.3828
R2 4.11% 10.63% -0.18 7.33 0.80 38.87% 0.4172
Note: The investment strategies are based on the out-of-sample excess return fore-
casts of the various models. There are no transaction costs. The considered in-
vestment strategy invests the fraction of wealth given by the weighted average of
all models' sign forecasts, wsign_t = \sum_{j=1}^{2^k} w_{j,t} sign(\hat{r}_{j,t}).
By definition the weighted average sign forecast can take values between -1 and 1.
For BMA the weight is the posterior probability of the individual model. In the
long/short strategy the investor holds this fraction of his wealth long or short
in the S&P 500, with the remainder in the risk-free asset. In the long only
strategy the investor holds a fraction in the S&P 500 plus the risk-free asset,
or the risk-free asset only if the weighted sign is negative. Reported are the
excess returns and their standard deviation, skewness and kurtosis. Additionally
the adjusted Sharpe ratio and the maximum drawdown of each strategy are shown.
Finally, outperformance against the S&P 500 is tested using the Diebold and
Mariano (1995) test statistic with 4 lags for autocorrelation; the table reports
the p-value under the null hypothesis of zero outperformance.
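The weighted-sign position can be sketched in a few lines; the weights and forecasts below are placeholder inputs, not the paper's data.

```python
import numpy as np

def weighted_sign_position(weights, forecasts):
    """Fraction of wealth held in the risky asset under the weighted-sign
    rule: wsign_t = sum_j w_{j,t} * sign(r_hat_{j,t}). When the weights
    sum to one the result lies in [-1, 1]; a negative value means a short
    position in the long/short variant and, in the long-only variant,
    the investor holds the risk-free asset instead.
    """
    w = np.asarray(weights, dtype=float)
    r = np.asarray(forecasts, dtype=float)
    return float(np.sum(w * np.sign(r)))
```

For example, four equally weighted models of which three forecast a positive excess return give wsign_t = 0.25·(1 + 1 + 1 − 1) = 0.5, i.e. half of wealth in the S&P 500.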

Table 9: Results of investment strategies with mean-variance criterion
Long/short strategy
ex. ret. std. ann. skew. kurt. SR MaxD dm-test
S&P 500 3.77% 15.21% -0.58 5.50 0.25 44.73% -
iid 1.88% 10.11% -0.75 5.15 0.58 34.83% 0.0265
ANO 5.08% 11.72% -0.13 6.08 0.72 34.09% 0.3009
ANS 6.83% 11.11% 0.12 5.12 1.04 37.14% 0.0952
ATO 4.09% 12.44% -0.36 6.31 0.51 41.57% 0.4505
ATS 5.07% 12.07% -0.37 6.66 0.66 43.17% 0.2759
BMAO 4.26% 11.06% 0.39 5.21 0.74 31.16% 0.4364
BMAS 3.63% 10.67% 0.42 5.76 0.73 32.45% 0.4794
Thick 5.26% 10.96% -0.05 7.14 0.88 32.59% 0.2771
AIC 3.85% 11.53% -0.06 6.15 0.61 37.31% 0.4894
BIC 2.75% 12.13% 0.03 5.60 0.41 43.75% 0.3737
R2 3.89% 12.19% -0.25 6.56 0.52 39.70% 0.4833
Long only strategy
ex. ret. std. ann. skew. kurt. SR MaxD dm-test
iid 1.88% 10.11% -0.75 5.15 0.58 34.83% -
ANO 3.80% 9.46% 0.09 7.38 1.03 34.09% 0.4927
ANS 4.76% 9.40% 0.13 7.25 1.22 37.14% 0.2665
ATO 3.52% 9.88% -0.35 8.68 0.88 40.43% 0.4357
ATS 4.29% 10.21% -0.32 7.40 0.92 43.17% 0.3622
BMAO 3.50% 8.27% 0.40 10.06 1.39 31.16% 0.4431
BMAS 3.56% 8.14% 0.34 10.27 1.46 32.45% 0.4543
Thick 3.64% 8.55% 0.21 9.05 1.31 32.59% 0.4679
AIC 3.31% 8.32% 0.22 9.84 1.33 37.31% 0.3979
BIC 3.20% 7.86% 0.54 12.23 1.51 31.61% 0.3826
R2 2.94% 9.46% -0.39 9.97 0.89 38.87% 0.3108
Note: The investment strategies are based on the out-of-sample excess return fore-
casts of the various models. There are no transaction costs and no leverage, i.e.
|w_t| <= 1. The weight invested in the risky asset follows the mean-variance
criterion w_t = \gamma^{-1} (\hat{r}_{t+1} - r^f)/\hat{\sigma}^2_{t+1} + 1/2,
where \gamma^{-1} is the inverse of the relative risk aversion; we set \gamma
equal to 5. In the long only strategy the investor invests a fraction in the
S&P 500 and the remainder in the risk-free asset; in the long/short strategy the
investor can additionally short the S&P 500. Reported are the excess returns and
their standard deviation, skewness and kurtosis. Additionally the adjusted Sharpe
ratio and the maximum drawdown of each strategy are shown. Finally, outperformance
against the S&P 500 is tested using the Diebold and Mariano (1995) test statistic
with 4 lags for autocorrelation; the table reports the p-value under the null
hypothesis of zero outperformance.
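A sketch of the allocation rule, under one reading of the partly garbled formula in the note: the "+ 1/2" intercept and the exact form of the no-leverage truncation are our assumptions, not something the text states unambiguously.

```python
import numpy as np

def mv_weight(excess_forecast, var_forecast, gamma=5.0, long_only=False):
    """Mean-variance weight in the risky asset, read as
    w_t = (1/gamma) * excess_forecast / var_forecast + 1/2,
    then truncated to the no-leverage region |w_t| <= 1 (and floored at
    0 for the long-only strategy). The intercept and the clipping are
    assumptions based on our reading of the note.
    """
    w = excess_forecast / (gamma * var_forecast) + 0.5
    lo = 0.0 if long_only else -1.0
    return float(np.clip(w, lo, 1.0))
```

With a zero excess-return forecast the rule parks half of wealth in the index; strongly positive (negative) forecasts push the weight to the +1 (-1, or 0 when long-only) bound.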

Table 10: Certainty equivalent returns for investment strategies
switching weighted sign mean-variance
long/short long only long/short long only long/short long only
iid -1.03 -1.03 -1.03 -1.03 -0.23 -0.23
ANO 2.82 3.55 2.90 2.62 2.31 2.01
ANS 4.93 4.51 4.85 3.64 4.37 3.00
ATO 0.77 2.52 1.43 1.70 0.93 1.53
ATS 1.64 2.27 2.48 2.35 2.09 2.17
BMAO 0.40 2.99 1.23 2.19 1.85 2.15
BMAS 0.07 2.73 1.29 2.65 1.39 2.25
Thick 2.55 3.72 3.17 2.30 2.85 2.18
AIC 0.27 2.77 0.27 2.77 1.17 1.93
BIC -1.00 2.51 -1.00 2.51 -0.19 1.98
R2 -1.33 1.82 -1.33 1.82 0.87 1.12
Note: The certainty equivalent return solves U(exp(r_{ceq})) =
\int U(exp(r_{t+1})) p(r_{t+1}|D_t) dr_{t+1}, where p(r_{t+1}|D_t) denotes the
predictive distribution of next period's excess return conditional on the
information D_t. Under the assumption that the returns of a trading strategy are
iid, we estimate the expected utility from past strategy returns as
U(exp(r_{ceq})) = (1/T) \sum_{t=1}^{T} U(exp(r_t)) and then solve for the
certainty equivalent return. We use the CRRA utility function
U(W) = W^{1-\gamma}/(1-\gamma), where we set \gamma equal to 5. The certainty
equivalent returns are reported as annualized returns in excess of the average
risk-free rate over the period.
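The certainty equivalent computation reduces to averaging utilities and inverting U. A minimal sketch, assuming the CRRA utility U(W) = W^(1-γ)/(1-γ) with γ = 5 described in the note:

```python
import numpy as np

def certainty_equivalent(strategy_returns, gamma=5.0):
    """Certainty equivalent return of a strategy under CRRA utility
    U(W) = W^(1-gamma)/(1-gamma). Expected utility is estimated as the
    sample average of U(exp(r_t)) over past strategy returns (treating
    them as iid), and the CE return is recovered by inverting U.
    """
    r = np.asarray(strategy_returns, dtype=float)
    u_bar = np.mean(np.exp(r) ** (1.0 - gamma)) / (1.0 - gamma)
    # Invert U: W_ceq = ((1-gamma) * u_bar)^(1/(1-gamma)); r_ceq = log W_ceq.
    w_ceq = ((1.0 - gamma) * u_bar) ** (1.0 / (1.0 - gamma))
    return float(np.log(w_ceq))
```

Sanity checks: a riskless stream of returns has a certainty equivalent equal to that return, while a volatile zero-mean stream has a negative certainty equivalent, reflecting risk aversion.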

Table 11: Statistics of out-of-sample predictability for
sub-periods
Panel A: Out-of-sample predictability (1965-1993)
signpred PT-test RMSE MAD bias
iid 52.94% 0.2506 3.42 -0.14
ANO 59.44% 0.06% 0.2487 3.42 -0.03
ANS 58.82% 0.11% 0.2435 3.34 0.15
ATO 55.73% 1.84% 0.2496 3.43 0.33
ATS 56.66% 1.40% 0.2481 3.38 0.12
BMAO 56.35% 0.71% 0.2483 3.44 0.39
BMAS 56.66% 0.84% 0.2475 3.39 0.41
Thick 58.20% 0.16% 0.2448 3.36 0.23
AIC 55.42% 1.28% 0.2531 3.49 0.41
BIC 56.35% 0.46% 0.2498 3.40 0.54
R2 55.42% 2.21% 0.2525 3.48 0.24

Panel B: Out-of-sample predictability (1993-2004)


signpred PT-test RMSE MAD bias
iid 62.75% 0.3321 3.18 0.06
ANO 52.94% 43.20% 0.3387 3.27 0.25
ANS 61.44% 10.21% 0.3369 3.23 0.04
ATO 58.82% 35.66% 0.3403 3.29 -0.04
ATS 60.13% 5.86% 0.3391 3.25 -0.16
BMAO 50.33% 40.59% 0.3414 3.32 0.36
BMAS 50.33% 24.74% 0.3427 3.34 0.30
Thick 54.25% 38.91% 0.3371 3.26 0.25
AIC 53.60% 23.39% 0.3424 3.34 0.27
BIC 49.02% 37.61% 0.3484 3.38 0.43
R2 50.33% 10.09% 0.3452 3.34 0.15
Note: Reported are out-of-sample predictability statistics of
the various forecasting algorithms. The first column contains
the sign predictability of a model, i.e. the percentage of
periods in which the sign of the forecast equals the sign of
the realized excess return. The second column reports the
p-value under the null of no predictive performance for the
Pesaran and Timmermann (1992) test, which is asymptotically
equivalent to the Henriksson and Merton (1981) test for
market-timing ability. Finally, the root mean square error
(RMSE), the mean absolute deviation (MAD) and the bias are
reported.
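The sign predictability and Pesaran–Timmermann columns can be computed as below. This is an illustrative implementation of the standard PT (1992) statistic, not necessarily the authors' exact code.

```python
import numpy as np
from math import erf, sqrt

def pesaran_timmermann(actual, forecast):
    """Pesaran-Timmermann (1992) test of directional predictability.
    Returns the hit rate (sign predictability) and the one-sided p-value
    of the null of no predictive performance.
    """
    y = np.asarray(actual) > 0
    x = np.asarray(forecast) > 0
    n = len(y)
    p_hat = float(np.mean(y == x))           # sign predictability
    py, px = float(y.mean()), float(x.mean())
    p_star = py * px + (1 - py) * (1 - px)   # hit rate under independence
    v_hat = p_star * (1 - p_star) / n
    v_star = ((2 * py - 1) ** 2 * px * (1 - px) / n
              + (2 * px - 1) ** 2 * py * (1 - py) / n
              + 4 * px * py * (1 - px) * (1 - py) / n ** 2)
    stat = (p_hat - p_star) / sqrt(v_hat - v_star)
    p_value = 0.5 * (1.0 - erf(stat / sqrt(2.0)))
    return p_hat, p_value
```

The statistic compares the observed hit rate with the hit rate expected if forecast and realized signs were independent, which is why a model can have a hit rate above 50% yet an insignificant PT p-value when it mostly predicts the dominant sign.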

Table 12: Certainty equivalent returns in sub-periods
Panel A: Certainty equivalent returns (1965-1993)
switching weighted sign mean-variance
long/short long only long/short long only long/short long only
iid -2.28 -2.28 -2.28 -2.28 -1.01 -1.01
ANO 6.71 5.08 4.90 3.81 3.65 3.15
ANS 6.07 4.89 6.03 4.25 5.93 4.08
ATO 1.49 3.33 2.75 2.88 1.88 2.58
ATS 2.34 2.92 3.20 3.00 3.25 3.36
BMAO 3.98 4.53 4.34 3.55 4.31 3.54
BMAS 3.51 4.29 4.31 4.10 4.00 3.63
Thick 5.19 4.94 5.06 3.65 4.38 3.45
AIC 2.86 4.42 2.86 4.42 3.25 3.53
BIC 3.25 4.31 3.25 4.31 3.27 3.66
R2 1.65 3.44 1.65 3.44 3.33 3.04
Panel B: Certainty equivalent returns (1993-2004)
switching weighted sign mean-variance
long/short long only long/short long only long/short long only
iid 1.59 1.59 1.59 1.59 1.42 1.42
ANO -5.12 0.41 -1.23 0.16 -0.44 -0.34
ANS 2.57 3.73 2.42 2.40 1.16 0.77
ATO -0.71 0.85 -1.30 -0.73 -1.04 -0.62
ATS 0.18 0.93 1.01 1.01 -0.31 -0.27
BMAO -6.91 -0.20 -5.14 -0.62 -3.23 -0.71
BMAS -6.98 -0.48 -4.91 -0.36 -3.98 -0.60
Thick -2.89 1.20 -0.72 -0.50 -0.32 -0.45
AIC -5.07 -0.62 -5.07 -0.62 -3.11 -1.38
BIC -9.67 -1.20 -9.67 -1.20 -7.29 -1.49
R2 -7.46 -1.52 -7.46 -1.52 -4.18 -2.85
Note: The certainty equivalent return solves U(exp(r_{ceq})) =
\int U(exp(r_{t+1})) p(r_{t+1}|D_t) dr_{t+1}, where p(r_{t+1}|D_t) denotes the
predictive distribution of next period's excess return conditional on the
information D_t. Under the assumption that the returns of a trading strategy
are iid, we estimate the expected utility from past strategy returns as
U(exp(r_{ceq})) = (1/T) \sum_{t=1}^{T} U(exp(r_t)) and then solve for the
certainty equivalent return. We use the CRRA utility function
U(W) = W^{1-\gamma}/(1-\gamma), where we set \gamma equal to 5. The certainty
equivalent returns are reported as annualized returns in excess of the average
risk-free rate over the two sub-periods.


c/o University of Geneva


40 bd du Pont d'Arve
1211 Geneva 4
Switzerland

T +41 22 379 84 71
F +41 22 379 82 77
RPS@sfi.ch
www.SwissFinanceInstitute.ch