
Econometric Analysis of Financial Market Data

ZONGWU CAI
E-mail address: zcai@uncc.edu
Department of Mathematics & Statistics and Department of Economics,
University of North Carolina, Charlotte, NC 28223, U.S.A.
Wang Yanan Institute for Studies in Economics, Xiamen University, China
February 3, 2010
© 2010, ALL RIGHTS RESERVED by ZONGWU CAI

¹This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.
Preface
The main purpose of these lecture notes is to provide you with a foundation for pursuing the basic theory and methodology, as well as applied projects, involving the skills to analyze financial data. This course also gives an overview of the econometric methods (models and their modeling techniques) applicable to financial economic modeling. More importantly, its ultimate goal is to bring you to the research frontier of empirical (quantitative) finance. To model financial data, some packages will be used, such as R, which is a very convenient programming language for doing homework assignments and projects. You can download it for free from the web site at http://www.r-project.org/.
Several projects, including heavy computer work, are assigned throughout the semester. Group discussion is allowed for the projects and the computer-related homework, particularly for writing the computer code. But the final report for each project or homework assignment must be written in your own words. Copying from each other will be regarded as cheating.
If you use the R language, which is similar to SPLUS, you can download it from the public web site at http://www.r-project.org/ and install it on your own computer, or you can use the PCs in our labs. You are STRONGLY encouraged to use (but not limited to) the package R, since it is a very convenient programming language for statistical analysis and Monte Carlo simulations as well as various applications in quantitative economics and finance. Of course, you are welcome to use any other package such as SAS, MATLAB, GAUSS, or STATA. But I might not be able to help you if you do so.
How to Install R?
The main package used is R, which is free from R-Project for Statistical Computing.
(1) go to the web site http://www.r-project.org/;
(2) click CRAN;
(3) choose a site for downloading, say http://cran.cnr.Berkeley.edu;
(4) click Windows (95 and later);
(5) click base;
(6) click R-2.10.1-win32.exe (version of December 14, 2009) to save this file first and
then run it to install (note that the setup program is 32 megabytes and is updated
almost every three months).
The above steps install the basic R on your computer. If you need to install other
packages, do the following:
(7) After it is installed, there is an icon on the screen. Click the icon to get into R;
(8) Go to the top, find Packages, and then click it;
(9) Go down to Install package(s)... and click it;
(10) There is a new window. Choose a location for downloading packages, say USA(CA1),
move the mouse there and click OK;
(11) There is a new window listing all packages. You can select any one of the packages
and click OK, or you can select all of them and then click OK.
Data Analysis and Graphics Using R: An Introduction (109 pages)
I encourage you to download the file r-notes.pdf (109 pages) from
http://www.math.uncc.edu/~zcai/r-notes.pdf and learn it by yourself. Please
see me if you have any questions.
CRAN Task View: Empirical Finance
This CRAN Task View contains a list of packages useful for empirical work in Finance and
it can be downloaded from the web site at
http://cran.cnr.berkeley.edu/src/contrib/Views/Finance.html.
CRAN Task View: Computational Econometrics
Base R ships with a lot of functionality useful for computational econometrics, in particular
in the stats package. This functionality is complemented by many packages on CRAN. It
can be downloaded from the web site at
http://cran.cnr.berkeley.edu/src/contrib/Views/Econometrics.html.
Contents
1 A Motivation Example 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Preliminary Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Jump-Diffusion Modeling Procedures . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Pricing American-style Options Using Stratification Simulation Method . . . 8
1.5 Hedging Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Basic Concepts of Prices and Returns 13
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Time Value of Money . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.2 Assets and Markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.3 Financial Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Statistical Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.1 Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.2 Frequency of Observations . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.3 Definition of Returns . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Stylized Facts for Financial Returns . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Linear Time Series Models and Their Applications 31
3.1 Stationary Stochastic Process . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Constant Expected Return Model . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.1 Model Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.2 Regression Model Representation . . . . . . . . . . . . . . . . . . . . 34
3.2.3 CER Model of Asset Returns and Random Walk Model of Asset Prices 35
3.2.4 Monte Carlo Simulation Method . . . . . . . . . . . . . . . . . . . . . 36
3.2.5 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.6 Statistical Properties of Estimates . . . . . . . . . . . . . . . . . . . . 38
3.3 AR(1) Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.1 Estimation and Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.2 White Noise Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.3 Unit Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.4 Estimation and Tests in the Presence of a Unit Root . . . . . . . . . 42
3.4 MA(1) Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5 ARMA, ARIMA, and ARFIMA Processes . . . . . . . . . . . . . . . . . . . 45
3.5.1 ARMA(1,1) Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5.2 ARMA(p,q) Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5.3 AR(p) Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.4 MA(q) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5.5 AR(∞) Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.6 MA(∞) Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.7 ARIMA Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.8 ARFIMA Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.6 R Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.7 Regression Models With Correlated Errors . . . . . . . . . . . . . . . . . . . 56
3.8 Comments on Nonlinear Models and Their Applications . . . . . . . . . . . . 56
3.9 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.9.1 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.9.2 R Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.10 Appendix A: Linear Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.11 Appendix B: Forecasting Based on AR(p) Model . . . . . . . . . . . . . . . . 62
3.12 Appendix C: Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.13 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4 Predictability of Asset Returns 69
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.1.1 Martingale Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . 69
4.1.2 Tests of MD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Random Walk Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2.1 IID Increments (RW1) . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2.2 Independent Increments (RW2) . . . . . . . . . . . . . . . . . . . . . 71
4.2.3 Uncorrelated Increments (RW3) . . . . . . . . . . . . . . . . . . . . . 72
4.2.4 Unconditional Mean is the Best Predictor (RW4) . . . . . . . . . . . 72
4.3 Tests of Predictability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.3.1 Nonparametric Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3.2 Autocorrelation Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3.3 Variance Ratio Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.3.4 Trading Rules and Market Efficiency . . . . . . . . . . . . . . . . . . 80
4.4 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.4.1 Evidence About Returns Predictability Using VR and Autocorrelation Tests 84
4.4.2 Cross Lag Autocorrelations and Lead-Lag Relations . . . . . . . . . . 85
4.4.3 Evidence About Returns Predictability Using Trading Rules . . . . . 86
4.5 Predictability of Real Stock and Bond Returns . . . . . . . . . . . . . . . . . 87
4.5.1 Financial Predictors . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.5.2 Models and Modeling Methods . . . . . . . . . . . . . . . . . . . . . 88
4.6 A Recent Perspective on Predictability of Asset Return . . . . . . . . . . . . 95
4.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.6.2 Conditional Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.6.3 Conditional Variances . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.6.4 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.6.5 The future . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.7 Comments on Predictability Based on Nonlinear Models . . . . . . . . . . . 101
4.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.8.1 Exercises for Homework . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.8.2 R Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.8.3 Project #1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5 Market Model 111
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.2 Assumptions About Asset Returns . . . . . . . . . . . . . . . . . . . . . . . 112
5.3 Unconditional Properties of Returns . . . . . . . . . . . . . . . . . . . . . . . 112
5.4 Conditional Properties of Returns . . . . . . . . . . . . . . . . . . . . . . . . 113
5.5 Beta as a Measure of Portfolio Risk . . . . . . . . . . . . . . . . . . . . . . . 114
5.6 Diagnostics for Constant Parameters . . . . . . . . . . . . . . . . . . . . . . 115
5.7 Estimation and Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . 116
5.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6 Event-Study Analysis 119
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.2 Outline of an Event Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.3 Models for Measuring Normal Returns . . . . . . . . . . . . . . . . . . . . . 121
6.4 Measuring and Analyzing Abnormal Returns . . . . . . . . . . . . . . . . . . 122
6.4.1 Estimation Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.4.2 Aggregation of Abnormal Returns . . . . . . . . . . . . . . . . . . . . 124
6.4.3 Modifying the Null Hypothesis: . . . . . . . . . . . . . . . . . . . . . 127
6.4.4 Nonparametric Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.4.5 Cross-Sectional Models . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.4.6 Power of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.5 Further Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7 Introduction to Portfolio Theory 136
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
7.1.1 Efficient Portfolios With Two Risky Assets . . . . . . . . . . . . . . . 137
7.1.2 Efficient Portfolios with One Risky Asset and One Risk-Free Asset . . 138
7.1.3 Efficient portfolios with two risky assets and a risk-free asset . . . . . 139
7.2 Efficient Portfolios with N risky assets . . . . . . . . . . . . . . . . . . . . . 140
7.3 Another Look at Mean-Variance Eciency . . . . . . . . . . . . . . . . . . . 142
7.4 The Black-Litterman Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.4.1 Expected Returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.4.2 The Black-Litterman Model . . . . . . . . . . . . . . . . . . . . . . . 145
7.4.3 Building the Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.5 Estimation of Covariance Matrix . . . . . . . . . . . . . . . . . . . . . . . . 147
7.5.1 Estimation Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.5.2 Shrinkage estimator of the covariance matrix . . . . . . . . . . . . . . 150
7.5.3 Recent Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8 Capital Asset Pricing Model 155
8.1 Review of the CAPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.2 Statistical Framework for Estimation and Testing . . . . . . . . . . . . . . . 157
8.2.1 Time-Series Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.2.2 Cross-Sectional Regression . . . . . . . . . . . . . . . . . . . . . . . . 159
8.2.3 Fama-MacBeth Procedure . . . . . . . . . . . . . . . . . . . . . . . . 162
8.3 Empirical Results on CAPM . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.3.1 Testing CAPM Based On Cross-Sectional Regressions . . . . . . . . . 163
8.3.2 Return-Measurement Interval and Beta . . . . . . . . . . . . . . . . . 165
8.3.3 Results of FF and KSS . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
9 Multifactor Pricing Models 169
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
9.1.1 Why Do We Expect Multiple Factors? . . . . . . . . . . . . . . . . . 169
9.1.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
9.2 Selection of Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.2.1 Theoretical Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.2.2 Small and Value/Growth Stocks . . . . . . . . . . . . . . . . . . . . . 171
9.2.3 Macroeconomic Factors . . . . . . . . . . . . . . . . . . . . . . . . . . 172
9.2.4 Statistical Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 176
9.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
9.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
List of Tables
2.1 Illustration of the Effects of Compounding . . . . . . . . . . . . . . . . . . . 15
3.1 Definitions of ten types of stochastic process . . . . . . . . . . . . . . . . . . 32
3.2 Large-sample critical values for the ADF statistic . . . . . . . . . . . . . . . 43
3.3 Summary of DF test for unit roots in the absence of serial correlation . . . . 44
4.1 Variance ratio test values, daily 1991-2000 (from Taylor, 2005) . . . . . . . . 86
4.2 Variance ratio test values, weekly 1962-1994 (from Taylor, 2005) . . . . . . . 86
4.3 Autocorrelations in daily, weekly, and monthly stock index returns . . . . . . 87
7.1 Example Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.2 Expected excess return vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7.3 Recommended portfolio weights . . . . . . . . . . . . . . . . . . . . . . . . . 146
List of Figures
1.1 The time series plot of the swap rates. . . . . . . . . . . . . . . . . . . . . . 3
1.2 The time series plot of the log of swap rates. . . . . . . . . . . . . . . . . . . 4
1.3 The scatter plot of the log return versus the level of log of swap rates. . . . . 5
2.1 The weekly and monthly prices of IBM stock. . . . . . . . . . . . . . . . . . 18
2.2 The weekly and monthly returns of IBM stock. . . . . . . . . . . . . . . . . . 20
2.3 The empirical distribution of standardized IBM daily returns and the pdf of the standard normal . . .
2.4 The empirical distribution of standardized Microsoft daily returns and the pdf of the standard normal . . .
2.5 Q-Q plots for the standardized IBM returns (top panel) and the standardized Microsoft returns (bottom panel) . . .
3.1 Some examples of different categories of stochastic processes. . . . . . . . . 33
3.2 Relationships between categories of uncorrelated processes. . . . . . . . . . . 33
3.3 Monte Carlo Simulation of the CER model. . . . . . . . . . . . . . . . . . . 37
3.4 Sample autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted index . . .
6.1 Time Line of an event study. . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.2 Power function of the J_1 test at the 5% significance level for sample sizes 1, 10, 20 and 50 . . . 133
7.1 Plot of portfolio expected return, μ_p, versus portfolio standard deviation, σ_p . . 137
7.2 Plot of portfolio expected return versus standard deviation. . . . . . . . . . . 139
7.3 Plot of portfolio expected return versus standard deviation. . . . . . . . . . . 140
7.4 Deriving the new combined return vector E(R). . . . . . . . . . . . . . . . . 148
8.1 Cross-sectional regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
Chapter 1
A Motivation Example
The purpose of this chapter is to present, as a motivating example, a simple procedure
that can be used for proposing a reasonable jump-diffusion model for real market data
(swap rates), calibrating the parameters of the jump-diffusion model, and pricing American-
style options under the proposed jump-diffusion process. In addition, we will discuss hedging
issues for such options and the sensitivity of parameters for American-style options.
1.1 Introduction
It is well known (see, e.g., Duffie (1996)) that under some regularity conditions there is
an equivalent martingale measure Q such that any European contingent claim on an
underlying {X_t; t ≥ 0} paying no dividends, with maturity T in the market, can be
priced as follows:

    P(0, T) = E_0^Q[ exp( − ∫_0^T r(s) ds ) g(X_T, T) ],    (1.1)

where g(·, ·) stands for the payoff function of the underlying for this contingent claim, P(0, T) is
the claim's arbitrage-free (fair) price at time 0, r_t is the riskless short-term interest rate,
and E_0^Q[·] denotes the expectation operator conditional on the information up to now.
For an American contingent claim on the same underlying with maturity T in the market,
the price is given similarly by

    P(0, T) = sup_{τ ∈ 𝒯} E_0^Q[ exp( − ∫_0^τ r(s) ds ) g(X_τ, τ) ],    (1.2)

where 𝒯 is the collection of all stopping times less than the maturity time T. A comparison
of (1.1) with (1.2) reveals that the theory is similar, but the computation for the American
option is much more difficult.
This theory provides us with a risk-neutral scheme to price any contingent claim. More
precisely, we can pretend to live in a risk-neutral world when modeling and calibrating
parameters, using the data from the real world, and then do the pricing using equation (1.1)
or (1.2). We will use this scheme throughout this chapter.
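As a concrete illustration of this risk-neutral pricing scheme, the following sketch prices a European call by Monte Carlo as in equation (1.1). Purely for illustration, it assumes a constant short rate and a geometric Brownian motion underlying (not the swap-rate model developed below), and all parameter values are made up.

```python
import numpy as np

def price_european_mc(x0, r, sigma, T, strike, n_paths=100_000, seed=0):
    """Monte Carlo estimate of (1.1): discounted Q-expectation of the payoff.

    Assumes a constant short rate r (so exp(-int_0^T r(s) ds) = exp(-r*T))
    and a geometric Brownian motion for X_t under Q."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    # Terminal value X_T under the risk-neutral measure Q.
    x_T = x0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    payoff = np.maximum(x_T - strike, 0.0)     # g(X_T, T) for a call
    return np.exp(-r * T) * payoff.mean()      # E_0^Q[ exp(-rT) g(X_T, T) ]

price = price_european_mc(x0=100.0, r=0.05, sigma=0.2, T=1.0, strike=100.0)
```

With these made-up inputs the estimate should land close to the Black-Scholes value (about 10.45), a useful sanity check for the simulation.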
In this chapter, we will present a simple procedure that can be used for proposing a
reasonable jump-diffusion model for real market data, calibrating parameters for the jump-
diffusion model, and pricing American-style options under the proposed jump-diffusion pro-
cess. In addition, we will discuss hedging issues for such options and the sensitivity of parameters
for American-style options. The remainder of the chapter is structured as follows. Section 2
presents some empirical properties of the data by graphing, mining the data, and doing some pre-
liminary statistical analysis. Section 3 provides a jump-diffusion model based on the
properties observed in Section 2, a calibration of parameters under this jump-diffusion
setting by the MLE method, and a test for the existence of jumps. Section 4 proposes a uni-
versal algorithm for American-style options with a one-factor underlying model, and uses this
algorithm to price an American option for the real data. Section 5 presents hedging issues for
the given American option. Section 6 concludes this chapter and discusses an extension of
our model to a more general tractable jump-diffusion setting, the affine jump-diffusion
model proposed by Duffie, Pan and Singleton (2000).
1.2 Preliminary Statistical Analysis
The data we will investigate are a collection of swap rates (the differences between 10-year
LIBOR rates and 10-year Treasury bond yields) from December 19, 2002 to October 15,
2004. The data are presented graphically in Figure 1.1 by the time series plot. From the
graph, we observe the following:
(O1) We can visually find some possible jumps in the swap rates, and positive and negative
jumps seem to occur with almost the same frequency. In addition, from an economic
standpoint, the difference between the LIBOR rate and the Treasury yield should always
be positive, since the former always includes some credit risk.
(O2) We can find mean reversion in the graph: a very high swap rate tends
to go lower, while a low swap rate tends to bounce back to a higher level. Economically,
this makes sense, since we cannot expect a sequence of swap rates to keep going up without any
pull-back.

Figure 1.1: The time series plot of the swap rates.
(O3) We should note that the graph does not represent the data exactly, since we do not
account for the irregular time spacing on the x-axis: holidays and weekends are not
recorded. For details on calendar effects, see the book by Taylor (2005, Section
4.5). One implication of this irregular time spacing is that some of the possible jumps
may come from long periods with no transactions, which lead to the accumulated effect
of a series of bad or good news on the next transaction day(s).
(O4) We can see visually that jumps seem to be clustered, which means that if a jump
occurs, more jumps are likely to follow, and a sequence of positive
jumps is likely to be followed by a sequence of negative jumps with a greater probability.
This is an awkward finding, since we will not deal with this issue in this chapter,
but it is an important research topic for academics and practitioners.
A formal way to model a dynamic system for data that must be positive is to model the
logarithm of the original data. The transformed data are graphed in Figure 1.2.

Figure 1.2: The time series plot of the log of swap rates.

Since our objective is to model the dynamic mechanism of the evolution of swap rates, we propose a
general stochastic differential equation for the transformed variable (the logarithm of the swap rate),
which is usually called the state variable. Let S_t be the swap rate at time t, and denote by
X_t the logarithm of S_t, namely, X_t = log(S_t). The general stochastic differential equation
(SDE, or Black-Scholes type model) for X_t is as follows:

    dX_t = μ(X_t) dt + σ(X_t) dW_t + dJ_t,    X_0 = x_0,    (1.3)

where μ(·) (drift) and σ(·) (diffusion) stand for the instantaneous mean function and the volatility
function of the process, respectively, and W_t and J_t are a standard Brownian motion and a
pure jump process, respectively.
The objective of the modeling, in fact, is to specify the explicit forms of μ(·) and σ(·), and the
probability mechanism of the pure jump process J_t. In this section, we get some idea about
the possible shape of σ(·) from a preliminary approximation of the SDE applied to the transformed
data. First, for a very small time interval Δt, the SDE can be approximated by a difference
equation (the Euler approximation) as follows:

    X_{t+Δt} − X_t ≈ μ(X_t)Δt + σ(X_t)(W_{t+Δt} − W_t) + (J_{t+Δt} − J_t)
                   ≈ σ(X_t)(W_{t+Δt} − W_t) + (J_{t+Δt} − J_t).    (1.4)

The term μ(X_t)Δt is omitted in the second line because it is of order O(Δt), while the other
two terms are of order O(√Δt) and therefore dominate when Δt is small. By (1.4), we can have a preliminary
visual sense of the form of σ(·) by plotting the transformed data with X_t on the
x-axis and X_{t+1} − X_t (the log return) on the y-axis; see Figure 1.3.

Figure 1.3: The scatter plot of the log return versus the level of the log of swap rates.

The theory behind
this idea can be found in Stanton (1997) or Cai and Hong (2003); we will discuss this idea
in detail later. In Figure 1.3, each horizontal line except the x-axis marks a number
of standard deviations away from zero. Except for some outliers, which can be explained
partly by the existence of jumps in the system, most of the data points fall within 3 standard
deviations of 0. The figure strongly indicates that the variation (volatility) of the
difference between X_{t+1} and X_t is almost the same at every level of X_t, which means it is reasonable
to assume that σ(·) is a constant function.
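This diagnostic is easy to reproduce. The sketch below uses a simulated stand-in for the log swap rate series (the original data are not included in these notes) and checks what fraction of the first differences lies within three standard deviations of zero; a roughly constant spread of the differences across levels of X_t is the visual evidence for a constant σ(·).

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated stand-in for the log swap rate series X_t: an AR(1)
# fluctuating around a long-term mean of 3.9 (made-up parameters).
n = 450
x = np.empty(n)
x[0] = 3.9
for t in range(1, n):
    x[t] = 3.9 + 0.98 * (x[t-1] - 3.9) + 0.018 * rng.standard_normal()

diff = np.diff(x)                       # log returns X_{t+1} - X_t
sd = diff.std()
within_3sd = np.mean(np.abs(diff) <= 3.0 * sd)
# Plotting diff against x[:-1] would reproduce the scatter of Figure 1.3;
# a level-independent vertical spread supports sigma(.) being constant.
```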
1.3 Jump-Diffusion Modeling Procedures
Based on the regularities observed in Section 2, we can specify our model under the so-called
equivalent martingale measure Q (see, e.g., Duffie (1996)) as follows:

(M1). We assume the volatility function σ(·) is a constant function, namely,

    σ(x) = σ,    x ≥ 0.    (1.5)

(M2). By (O2) in Section 2, we assume the instantaneous mean function μ(x) is an affine
function,

    μ(x) = A(x̄ − x),    x ≥ 0,    (1.6)

where x̄ stands for the long-term mean of the process, and A > 0 is the speed at which the process
reverts to the long-term mean x̄. We will explain more about these two parameters below.

(M3). We assume the pure jump process J_t is a compound Poisson process independent
of the continuous part of X_t and {W_t; t ≥ 0}, although this assumption might not be
necessary. More formally, we assume that the intensity of the Poisson process is a
constant λ and that the jump sizes are i.i.d. with common distribution ν. From (O1) in Section 2,
we can assume that ν is a normal distribution with mean 0 and standard deviation σ_J,
although the normality assumption on jump sizes might not be appropriate due to its
lack of fat tails (one can instead assume a double exponential distribution, as in Kou (2002)
or Tsay (2002, 2005, Section 6.9)).
By assumptions (M1)-(M3) above, we can reformulate (1.3) as follows:

    dX_t = A(x̄ − X_t) dt + σ dW_t + dJ_t,    X_0 = x_0,    (1.7)

where the compensator measure ν of J_t satisfies

    ν(de, dt) = (λ/√(2π σ_J²)) exp( −e²/(2σ_J²) ) de dt,    (1.8)

and

    E^Q[dW_t dJ_s] = 0,    s, t ≥ 0.    (1.9)
Using the Ito lemma for semimartingales, we can solve equation (1.7) explicitly. That
is, for any given times t and T (we always assume t ≤ T in what follows), we have

    X_T = X_t e^{−A(T−t)} + x̄ (1 − e^{−A(T−t)})
          + e^{−A(T−t)} [ σ ∫_t^T e^{A(s−t)} dW_s + ∫_t^T e^{A(s−t)} dJ_s ].    (1.10)
By taking the expectation on both sides of (1.10), we obtain

    E^Q[X_T] = E^Q[X_t] e^{−A(T−t)} + x̄ (1 − e^{−A(T−t)}).    (1.11)

Since A > 0, as T − t → ∞ the first term on the right side of (1.11) diminishes to 0,
while E^Q[X_T] → x̄ at the exponential rate A. These facts explain why x̄ is called the long-term
mean and A the speed at which the process reverts to the long-term mean.
Suppose that the observation times of the process are equally spaced; namely, we
observe the process at regular times t_1, t_2, . . . , t_{N+1}, giving the data (X_{t_1}, X_{t_2}, . . . , X_{t_{N+1}})
(for notational simplicity, we write X_n = X_{t_n} for 1 ≤ n ≤ N + 1), where the common
time interval is Δ = t_{n+1} − t_n. Then (X_1, X_2, . . . , X_{N+1}) follows an AR(1) model;
that is,

    X_{n+1} = a + b X_n + ε_{n+1},    1 ≤ n ≤ N,    (1.12)

where

    a = x̄ (1 − e^{−AΔ}),    b = e^{−AΔ},    (1.13)

and

    ε_n ~ σ e^{−AΔ} ∫_0^Δ e^{As} dW_s + e^{−AΔ} ∫_0^Δ e^{As} dJ_s,    i.i.d.    (1.14)
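The mapping (1.13) can be inverted to recover the continuous-time parameters from the AR(1) fit: Â = −log(b̂)/Δ and x̄̂ = â/(1 − b̂). A sketch on simulated data (assumed parameter values; the jump part is omitted for brevity, so this is the pure diffusion case):

```python
import numpy as np

rng = np.random.default_rng(2)
A_true, xbar_true, sigma, delta = 0.02, 3.73, 0.018, 1.0

# Simulate the exact AR(1) implied by (1.12)-(1.13), without jumps.
a = xbar_true * (1.0 - np.exp(-A_true * delta))
b = np.exp(-A_true * delta)
eps_sd = sigma * np.sqrt((1.0 - np.exp(-2.0 * A_true * delta)) / (2.0 * A_true))
n = 20_000
x = np.empty(n)
x[0] = xbar_true
for t in range(1, n):
    x[t] = a + b * x[t-1] + eps_sd * rng.standard_normal()

# OLS fit of X_{n+1} on X_n, then invert (1.13).
b_hat, a_hat = np.polyfit(x[:-1], x[1:], 1)
A_hat = -np.log(b_hat) / delta
xbar_hat = a_hat / (1.0 - b_hat)
```

With 20,000 observations the recovered Â and x̄̂ sit close to the true values, although b̂ (and hence Â) carries the usual small-sample AR(1) bias.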
Using (1.12), (1.13) and (1.14), and to overcome the curse of dimensionality in the estimation
procedure, we propose the so-called two-stage estimating technique to obtain preliminary
parameter estimates. Formally speaking, we first estimate the parameters A and x̄ by the
weighted least squares method, and then use the residuals to implement an MLE procedure
to estimate σ, λ, and σ_J. The only thing left is to find the probability density
function of ε_n, which is given by

    f_{ε_n}(x) = e^{−λΔ} (1/√v_0) φ(x/√v_0)
        + Σ_{k=1}^∞ (λ^k e^{−λΔ}/k!) ∫_0^Δ ··· ∫_0^Δ (1/√v_k(s_1, …, s_k)) φ(x/√v_k(s_1, …, s_k)) ds_1 ··· ds_k,    (1.15)

where v_0 = σ²(1 − e^{−2AΔ})/(2A) is the diffusion variance over one interval,
v_k(s_1, …, s_k) = v_0 + Σ_{l=1}^k e^{−2A(Δ−s_l)} σ_J² adds the contribution of k jumps at
times s_1, …, s_k, and φ(x) = (1/√(2π)) e^{−x²/2} is the p.d.f. of the standard normal
distribution.
The two-stage estimates of the parameters can then be computed numerically. To estimate
the parameters more efficiently, we then run the full MLE procedure, using the Newton-Raphson
algorithm with the two-stage estimates as the initial point. Our two-stage estimates
(based on daily data) are as follows:

    Â = 0.03110101,  x̄ = 3.743758,  σ̂ = 0.01841,  λ̂ = 0.06385,  σ̂_J = 0.09299.    (1.16)

Our full MLE estimates (based on daily data) are as follows:

    Â = 0.017124,  x̄ = 3.73213,  σ̂ = 0.018181,  λ̂ = 0.064548,  σ̂_J = 0.092432.    (1.17)
Now we turn to testing whether the jump-diffusion model is adequate. For testing the pa-
rameters, we only carry out a test for λ; equivalently, this tests whether there are jumps in the
swap rate evolution. Tests for the remaining parameters can be done similarly. The statistical
hypothesis can be formulated as follows:

    H_0: λ = 0    versus    H_1: λ > 0.    (1.18)

We use the likelihood ratio method to test this hypothesis. It is well known that 2 times
the difference of the two maximized log likelihoods converges asymptotically to a χ²-distribution
with degrees of freedom equal to the difference of the dimensions of the two parameter spaces. For this
hypothesis, the degrees of freedom equal 2, since λ = 0 makes σ_J irrelevant to the process. We
find that the p-value of the test statistic is much less than 0.001, which means that H_0 is rejected.
So a model for this dataset without jumps would be inappropriate.
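The likelihood ratio test can be sketched in a few lines; the log-likelihood values below are hypothetical (the maximized values for the swap data are not reproduced in these notes).

```python
from scipy.stats import chi2

# Hypothetical maximized log-likelihoods for the two nested models.
loglik_jump = 1250.4      # full jump-diffusion model
loglik_nojump = 1238.9    # restricted model with lambda = 0

lr_stat = 2.0 * (loglik_jump - loglik_nojump)   # asymptotically chi2(2)
p_value = chi2.sf(lr_stat, df=2)                # survival function = 1 - cdf
reject_h0 = p_value < 0.001
```

Here the restriction removes two parameters (λ and, implicitly, σ_J), hence the 2 degrees of freedom in the reference χ² distribution.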
1.4 Pricing American-style Options Using Stratification Simulation Method
To price an American option by a simulation method (see, e.g., Glasserman, 2004),
one typically approximates the American option, with its intrinsically infinite set of exercise
opportunities, by a Bermudan option with finitely many exercise opportunities. Suppose the ap-
proximating Bermudan option can be exercised only at a fixed set of exercise opportunities
t_1 < t_2 < . . . < t_m, which are often equally spaced, and denote the underlying process by
{X_t; t ≥ 0}. To reduce notation, we write X_{t_i} as X_i. Then, if {X_t} is a Markov process,
{X_i; 0 ≤ i ≤ m} is a Markov chain, where X_0 denotes the initial state of the underlying. Let
h_i denote the payoff function for exercise at t_i, which is allowed to depend on i. Let V_i(x)
denote the value of the option at t_i given X_i = x. By assuming the option has not previously been exercised, we are ultimately interested in V_0(X_0). This value can be determined
recursively as follows:
V_m(x) = h_m(x)   (1.19)
and
V_{i−1}(x) = max{ h_{i−1}(x), E^Q[ D_{i−1,i}(X_i) V_i(X_i) | X_{i−1} = x ] },   (1.20)
where i = 1, 2, . . . , m, and D_{i−1,i}(X_i) stands for the discount factor from t_{i−1} to t_i, which
could have the form
D_{i−1,i}(X_i) = exp( −∫_{t_{i−1}}^{t_i} r(u) du ).   (1.21)
So for simulation, the main job will be on implementing (1.20), and this is also where the main
difficulty lies. Actually, if the underlying state is of one dimension, as for instance in our setting,
then we can efficiently implement (1.20) by a stratification method. That is, we discretize not
only the time dimension but also the state space. Formally speaking, for each exercise date t_i, let
A_{i1}, . . . , A_{i b_i} be a partition of the state space of X_i into b_i subsets. For the initial time 0,
take b_0 = 1 and A_{01} = {X_0}. Define the transition probabilities
p^i_{j,k} = P^Q( X_{i+1} ∈ A_{i+1,k} | X_i ∈ A_{ij} )   (1.22)
for all j = 1, . . . , b_i, k = 1, . . . , b_{i+1}, and i = 0, . . . , m − 1. (This is taken to be 0 if
P^Q(X_i ∈ A_{ij}) = 0.) For each i = 1, . . . , m and j = 1, . . . , b_i, we also define
h_{i,j} = E^Q[ h_i(X_i) | X_i ∈ A_{ij} ],   (1.23)
taking this to be 0 if P^Q(X_i ∈ A_{ij}) = 0. Now we consider the backward induction
V_{ij} = max{ h_{ij}, Σ_{k=1}^{b_{i+1}} p^i_{jk} V_{i+1,k} }   (1.24)
for all j = 1, . . . , b_i, k = 1, . . . , b_{i+1}, and i = 0, . . . , m − 1, initialized with V_{mj} = h_{mj}. This
method takes the value V_{01} calculated through (1.24) as an approximation to V_0(X_0).
To implement this method, we need to carry out the following steps:
(A1) Simulate a reasonably large number of replications of the Markov chain X_0, X_1, . . . , X_m.
(A2) Record N^i_{jk}, the number of paths that move from A_{ij} to A_{i+1,k}, for all i = 0, . . . , m − 1,
j = 1, . . . , b_i and k = 1, . . . , b_{i+1}.
(A3) Calculate the estimates
p̂^i_{j,k} = N^i_{j,k} / ( N^i_{j,1} + . . . + N^i_{j,b_{i+1}} ),   (1.25)
taking the ratio to be 0 whenever the denominator is 0. Also calculate ĥ_{i,j} as the
average value of h_i(X_i) over those replications in which X_i ∈ A_{ij}, taking it to be 0
whenever there is no path in which X_i ∈ A_{ij}.
(A4) Set V̂_{mj} = ĥ_{mj} for all j = 1, . . . , b_m, and recursively calculate
V̂_{ij} = max{ ĥ_{ij}, Σ_{k=1}^{b_{i+1}} p̂^i_{jk} V̂_{i+1,k} }   (1.26)
for all j = 1, . . . , b_i and i = 0, . . . , m − 1. Then V̂_{01} is our estimate of V_{01}.
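The steps (A1)-(A4) can be sketched in code. The notes use R for computing, but the following Python sketch prices a toy Bermudan put under plain geometric Brownian motion, not the jump-diffusion swap-rate model of this chapter; all parameter values, the equiprobable-quantile choice of strata, and the variable names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0   # made-up GBM/put parameters
m, n_paths, b = 50, 20000, 40                       # exercise dates, paths, strata
dt = T / m
disc = np.exp(-r * dt)                              # deterministic D_{i-1,i}

# (A1) simulate the Markov chain X_0, X_1, ..., X_m (here: GBM prices)
z = rng.standard_normal((n_paths, m))
logS = np.log(S0) + np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
S = np.exp(logS)                                    # S[:, i-1] holds X_i, i = 1..m

def payoff(x):                                      # put payoff h_i(x), same for every i
    return np.maximum(K - x, 0.0)

# strata A_{ij}: equiprobable bins from the empirical quantiles at each date
edges = [np.quantile(S[:, i], np.linspace(0, 1, b + 1)) for i in range(m)]
strata = [np.clip(np.searchsorted(edges[i], S[:, i], side="right") - 1, 0, b - 1)
          for i in range(m)]

# (A4) backward induction, with (A2)-(A3) done on the fly at each date
V = np.array([payoff(S[strata[m - 1] == j, m - 1]).mean()
              if np.any(strata[m - 1] == j) else 0.0 for j in range(b)])
for i in range(m - 2, -1, -1):
    # (A2) transition counts N^i_{jk}; (A3) estimated probabilities p-hat^i_{jk}
    N = np.zeros((b, b))
    np.add.at(N, (strata[i], strata[i + 1]), 1.0)
    rowsum = N.sum(axis=1, keepdims=True)
    p_hat = np.divide(N, rowsum, out=np.zeros_like(N), where=rowsum > 0)
    h_bar = np.array([payoff(S[strata[i] == j, i]).mean()
                      if np.any(strata[i] == j) else 0.0 for j in range(b)])
    V = np.maximum(h_bar, disc * (p_hat @ V))       # (1.26)

# initial step: b_0 = 1 and A_{01} = {X_0}, so all paths start in one stratum
p0 = np.bincount(strata[0], minlength=b) / n_paths
price = max(payoff(np.array([S0]))[0], disc * (p0 @ V))
print(round(price, 2))
```

The transition probabilities are estimated in-sample from the same paths, as in (A2)-(A3), so the estimate carries some bias; averaging over independent replications, as done for (1.27) below, is the standard remedy.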
For our example, the American-style option is defined with the payoff function 1000000 (exp(X) − K)^+,
where K = 44 bps, maturity T = 1 year, and initial price exp(X_0) = 44 bps. Using the
parameters presented in (1.17), we simulate 10000 paths with m = 400 exercise opportunities and decompose the state space into b_i = 100 subsets. Then, using the above algorithm, we
can approximate the American option price. Based on the simulation with 25 replications,
the mean value and standard deviation of the approximations, assuming that the risk-free interest rate is
2.5% annually, are as follows:
P̂ = 781.762  and  s_{P̂} = 2.632.   (1.27)
Note that the estimated value of the price based on this jump model is quite close to the real
value.
1.5 Hedging Issues
In the previous implementation, we have assumed that the risk-free interest rate is 2.5% annually, with
the parameters as presented in (1.17). In this section, we consider hedging problems given these
parameters. We only discuss first-order hedging for the American option. Denote by P(S_0)
the option price, where we have omitted all parameters other than the initial swap rate in the
function P(·). To do first-order hedging for the derivative is to find the value of ∂P/∂S (S_0). We can
find this value numerically by using the Euler approximation, namely using a first difference
ratio to approximate the partial derivative,
∂P/∂S (S_0) ≈ [ P(S_0 + ΔS) − P(S_0) ] / ΔS.   (1.28)
Then we can use the simulation method to find P(S_0 + ΔS) for sufficiently small ΔS, so that we
can find an approximate hedging ratio. For our example, we let ΔS = 0.25 bps. The
simulated hedging ratio is 0.2515. We can use a similar technique to find the other Greeks.
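The finite-difference ratio in (1.28) can be sketched as follows. To keep Monte Carlo noise from swamping the difference, both prices are computed with common random numbers (the same seed). The pricer below is a hypothetical stand-in European call under GBM with made-up parameters, not the stratification pricer of Section 1.4:

```python
import numpy as np

def price(S0, K=100.0, r=0.025, sigma=0.2, T=1.0, n=200000, seed=42):
    # Monte Carlo price of a European call; the fixed seed gives
    # common random numbers across the two evaluations below.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

S0, dS = 100.0, 0.25
delta = (price(S0 + dS) - price(S0)) / dS      # forward-difference delta, as in (1.28)
print(round(delta, 3))
```

With common random numbers, the simulation error largely cancels in the numerator; with independent draws, the same difference quotient would be dominated by noise for small ΔS.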
1.6 Conclusions
We have presented a whole procedure of modelling, estimating, pricing, and hedging for a real
dataset under a simple jump-diffusion setting. Some shortcomings are obvious in our setting,
since we don't consider some issues which may be important for the price of this option.
For instance, we assume that the interest rate is deterministic and that the intensity of the jump event is
constant. Most critically, we don't deal with the issue observed in (O4) presented in Section
2. But sometimes we need to compromise between accuracy and tractability in practice,
since calibrating a jump-diffusion model usually needs a large amount of computational effort. A
reasonable extension (still not touching (O4)) of our model is to take so-called multi-factor
models, which usually include interest rates, CPI, GDP growth rate, volatility, and other
economic variables as factors. See Duffie, Pan and Singleton (2000) for more details.
1.7 References
Cai, Z. and Y. Hong (2003). Nonparametric methods in continuous-time finance: A selective
review. In Recent Advances and Trends in Nonparametric Statistics (M.G. Akritas and
D.M. Politis, eds.), 283-302.
Duffie, D. (2001). Dynamic Asset Pricing Theory, 3rd Edition. Princeton University Press,
Princeton, NJ.
Duffie, D., J. Pan and K. Singleton (2000). Transform analysis and asset pricing for affine
jump-diffusions. Econometrica, 68, 1343-1376.
Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. Springer-Verlag,
New York.
Kou, S.G. (2002). A jump diffusion model for option pricing. Management Science, 48,
1086-1101.
Merton, R.C. (1976). Option pricing when underlying stock returns are discontinuous.
Journal of Financial Economics, 3, 125-144.
Stanton, R. (1997). A nonparametric model of term structure dynamics and the market
price of interest rate risk. Journal of Finance, 52, 1973-2002.
Taylor, S. (2005). Asset Price Dynamics, Volatility, and Prediction. Princeton University
Press, Princeton, NJ. (Chapter 4)
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons,
New York.
Chapter 2
Basic Concepts of Prices and Returns
2.1 Introduction
Any empirical analysis of the dynamics of asset prices through time requires price data, which
raises some questions:
1. The first question is where we can find the data. There are many sources of data, including web sites, commercial vendors, university research centers, and financial markets.
Here are some of them, listed below:
(a) CRSP: http://www.crsp.com (US stocks)
(b) Commodity Systems Inc: http://www.csidata.com (Futures)
(c) Datastream: http://www.datastream.com/product/has/ (Stocks, bonds, curren-
cies, etc.)
(d) IFM (Institute for Financial Markets): http://www.theifm.org (futures, US stocks)
(e) Olsen & Associates: http://www.olsen.ch (Currencies, etc.)
(f) Trades and Quotes DB: http://www.nyse.com/marketinfo (US stocks)
(g) US Federal Reserve: http://www.federalreserve.gov/releases (Currencies, etc.)
(h) Yahoo! (free): http://biz.yahoo.com/r/ (Stocks, many countries)
(i) For downloading Chinese financial data, please see the file on my home page,
http://www.math.uncc.edu/~zcai/finance-data.doc, which is downloadable.
Further, high-frequency data (tick-by-tick data) can be downloaded from the
Bloomberg machine located in Room 33 of the Friday Building on our campus, but you
CHAPTER 2. BASIC CONCEPTS OF PRICES AND RETURNS 14
might ask the Department of Finance for help. Finally, you can download some data
through the web site of Wharton Research Data Services (WRDS),
http://wrds.wharton.upenn.edu/index.shtml, to which our UNCC subscribes partially.
To log in to WRDS, you need to have an account, which can be obtained by contacting
Jon Finn through e-mail jcfinn@uncc.edu or phone (704) 687-3156.
2. The second question is what the frequency of the data is. It depends on what kind of
data you have and what kind of topics you are doing. For the study of the microstructure of
financial markets, you need to have high-frequency data. For most studies, you might
need daily/weekly/monthly data.
3. The third one is how many periods (say, years) (the length) of data we need for the analysis.
Theoretically, a larger sample size would be better, but there might be structural
changes over a long sample period. In other words, the dynamics might change over
time.
4. The last one is how many prices for each period we wish to obtain and what kind of
price we need.
Answer: It depends on the purpose of your study.
2.2 Basic Definitions
First, we introduce some basic concepts, with which you might be very familiar.
2.2.1 Time Value of Money
Consider an amount $V invested for n years at a simple interest rate of r per annum (where
r is expressed as a decimal). If compounding takes place only at the end of the year, the
future value after n years is:
FV_n = V (1 + r)^n.
If interest is paid m times per year, then the future value after n years is:
FV_n^m = V (1 + r/m)^{mn}.
Table 2.1: Illustration of the Effects of Compounding:
The Time Interval Is 1 Year and the Interest Rate Is 10% per Annum.

Type            Number of payments   Interest rate per period   Net Value
Annual                  1                   0.1                 $1.10000
Semiannual              2                   0.05                $1.10250
Quarterly               4                   0.025               $1.10381
Monthly                12                   0.0083              $1.10471
Weekly                 52                   0.1/52              $1.10506
Daily                 365                   0.1/365             $1.10516
Continuously           --                   exp(0.1)            $1.10517
As m, the frequency of compounding, increases, the rate becomes continuously compounded,
and it can be shown that the future value becomes:
FV_n^c = lim_{m→∞} V (1 + r/m)^{mn} = V exp(r n),   (2.1)
where exp(·) is the exponential function.
Example: Assume that the interest rate of a bank deposit is 10% per annum and the initial
deposit is $1.00. If the bank pays interest once a year, then the net value of the deposit
becomes $1(1 + 0.1) = $1.1 one year later. If the bank pays interest semi-annually, the 6-month
interest rate is 10%/2 = 5% and the net value is $1(1 + 0.1/2)² = $1.1025 after the first year.
In general, if the bank pays interest m times a year, then the interest rate for each payment
is 10%/m and the net value of the deposit becomes $1(1 + 0.1/m)^m one year later. Table 2.1
gives the results for some commonly used time intervals on a deposit of $1.00 with interest
rate 10% per annum. In particular, the net value approaches $1.1052, which is obtained by
exp(0.1) and referred to as the result of continuous compounding.
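The entries of Table 2.1 can be reproduced in a few lines. A sketch in Python follows (the notes use R, but the arithmetic is identical):

```python
import math

# Net value of a $1 deposit after one year at 10% per annum under
# different compounding frequencies, plus the continuous limit (2.1).
V, r, n = 1.0, 0.10, 1
for m in [1, 2, 4, 12, 52, 365]:
    print(m, round(V * (1 + r / m) ** (m * n), 5))   # m=12 gives 1.10471
print("cont.", round(V * math.exp(r * n), 5))        # exp(0.1) gives 1.10517
```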
2.2.2 Assets and Markets
Financial Assets:
1. Zero-Coupon Bond (discount bond). A zero-coupon bond with maturity date T provides a monetary unit at date T. At date t with t ≤ T, the zero-coupon bond has a
residual maturity of H = T − t and a price of B(t, H) (or B(t, T − t)), which is the
price at time t,
B(t, T) =  (1 + r)^{−(T−t)}        pay at the end of the maturity day,
           (1 + r/m)^{−m(T−t)}     compounding with frequency m,
           exp(−r(T − t))          continuous compounding,
where r is the interest rate. In particular, B(0, T) is the current (time-0) price of the bond,
and B(T, T) = 1 is equal to the face value, which is a certain amount of money that
the issuing institution (for example, a government, a bank or a company) promises to
exchange the bond for.
2. Coupon Bond. Bonds promising a sequence of payments are called coupon bonds. The
price p_t at which the coupon bond is traded at any date t between 0 and the maturity
date T differs from the issuing price p_0.
3. Stocks
4. Buying and Selling Foreign Currency
5. Options
6. More
For more details about bonds, see the book by Capiński and Zastawniak (2003).
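The three pricing conventions for the zero-coupon bond B(t, T) above can be sketched as a small function; the rate and residual maturity below are made-up values for illustration:

```python
import math

def zcb_price(r, H, convention="continuous", m=2):
    """Price of a bond paying 1 at maturity, H = T - t years from now."""
    if convention == "annual":
        return (1 + r) ** (-H)
    if convention == "m-times":
        return (1 + r / m) ** (-m * H)
    return math.exp(-r * H)            # continuous compounding

r, H = 0.05, 10
print(round(zcb_price(r, H, "annual"), 4))      # (1.05)^(-10) = 0.6139
print(round(zcb_price(r, H), 4))                # exp(-0.5) = 0.6065
```

Note that B(t, T) decreases in the residual maturity H for any positive rate, and B(T, T) = 1 in all three conventions.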
2.2.3 Financial Theory
Basic theoretical concepts in financial theory (the best book for this aspect is the book by
Cochrane (2002)):
1. Equilibrium Models (CAPM, CCAPM, market microstructure theory). Our focus
is only on the CAPM. Please read the paper by Cai and Kuan (2008) and the references
therein on the recent developments in the conditional CAPM/APT models. Also, for the
market microstructure theory, please read Chapter 3 of Campbell, Lo and MacKinlay
(1997, CLM hereafter), or Part IV of Taylor (2005), or Chapter 5 of Tsay (2005).
2. Absence of Arbitrage Opportunity. The theory is based on the assumption that it
is impossible to achieve a sure, strictly positive gain with a zero initial endowment. This
assumption suggests imposing deterministic inequality restrictions on asset prices.
3. Actuarial Approach. This approach assumes a deterministic environment and emphasizes the concept of a fair price of a financial asset.
Example: The price of a stock at period 0 that provides future dividends d_1, d_2, . . . , d_t, . . . at
the predetermined dates 1, 2, . . . , t, . . . has to coincide with the discounted sum of future cash flows:
S_0 = Σ_{t=1}^∞ d_t B(0, t),
where B(0, t) is the price of the zero-coupon bond with maturity t (the discount factor). The
actuarial approach is not confirmed by empirical research because it does not take into
account uncertainty.
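The fair-price formula above can be illustrated with a made-up dividend stream and a flat continuously compounded rate, so that B(0, t) = exp(−r t):

```python
import math

# S_0 = sum_t d_t B(0, t) with B(0, t) = exp(-r t); all numbers are
# hypothetical, chosen only to illustrate the discounting.
r = 0.04
dividends = [2.0, 2.1, 2.2, 2.3, 2.4]            # d_1, ..., d_5
S0 = sum(d * math.exp(-r * t) for t, d in enumerate(dividends, start=1))
print(round(S0, 4))                              # 9.7362
```

Discounting makes the fair price strictly smaller than the undiscounted sum of the dividends whenever r > 0.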
2.3 Statistical Features
2.3.1 Prices
Prices: closing prices in stock markets; currency exchange rates; option prices; and more.
2.3.2 Frequency of Observations
It depends on the data available and the questions that interest a researcher. The
interval between prices should be sufficient to ensure that trade occurs in most intervals,
and it is preferable that the volume of trade is substantial. Daily data are fine for most of
the applications. Also, it is important to distinguish price data indexed by transaction
counts from data indexed by the time of the associated transactions.
2.3.3 Definition of Returns
The statistical inference on asset prices is complicated because asset prices might have nonstationary behavior (upward and downward movements). One can transform asset prices
into returns, which empirically display more stationary behavior. Also, returns are scale-free
and not limited to being positive. You may notice the difference in the behavior of price
data and returns by looking at the IBM prices and IBM returns in Figure 2.1 and Figure 2.2.
1. Return of a financial asset (stock) with price P_t at date t that produces no dividends over the period (t, t + H) is defined as:
r(t, t + H) = (P_{t+H} − P_t)/P_t.   (2.2)
[Figure: two time series panels of Date versus Close, titled "The stock price of IBM, weekly observations" and "The stock price of IBM, monthly observations".]
Figure 2.1: The weekly and monthly prices of IBM stock.
Very often, we will investigate returns at a fixed unitary horizon. In this case H = 1
and the return is defined as:
r(t, t + 1) = (P_{t+1} − P_t)/P_t = P_{t+1}/P_t − 1.   (2.3)
Returns r(t, t + H) and r(t, t + 1) in (2.2) and (2.3) are sometimes called the simple
net return. Very often, r(t, t + 1) is simply denoted as r_{t+1}. The simple gross return is
defined as:
R(t, t + H) = P_{t+H}/P_t = 1 + r(t, t + H).
Since
P_{t+H}/P_t = (P_{t+H}/P_{t+H−1}) (P_{t+H−1}/P_{t+H−2}) · · · (P_{t+1}/P_t),
R(t, t + H) can be rewritten as:
R(t, t + H) = (P_{t+H}/P_{t+H−1}) (P_{t+H−1}/P_{t+H−2}) · · · (P_{t+1}/P_t)
            = R(t + H − 1, t + H) · R(t + H − 2, t + H − 1) · · · R(t, t + 1)
            = ∏_{j=1}^H R(t + H − j, t + H + 1 − j).
The simple gross return over H periods is the product of the one-period returns.
The formula in (2.3) is often replaced by the following approximation:
r(t, t + 1) ≡ r_{t+1} ≈ ln(P_{t+1}) − ln(P_t) = ln( P_{t+1}/P_t ) = ln(R(t, t + 1)).   (2.4)
The return in (2.4) is also known as the continuously compounded return or log return. To
see why r(t, t + 1) is called the continuously compounded return, take the exponential
of both sides of (2.4) and rearrange to get
P_{t+1} = P_t exp(r(t, t + 1)).   (2.5)
By comparing (2.5) with (2.1), one can see that r(t, t + 1) is the continuously compounded growth rate in prices between periods t and t + 1. Rearranging (2.4), one can
show that:
r(t, t + H) = Σ_{j=1}^H r(t + H − j, t + H + 1 − j).
2. Return of a financial asset (stock) with price P_t at date t that produces dividends D_{t+1} over the period (t, t + 1) is defined as:
r(t, t + 1) = (P_{t+1} + D_{t+1} − P_t)/P_t = (P_{t+1} − P_t)/P_t + D_{t+1}/P_t,   (2.6)
where D_{t+1}/P_t is the ratio of dividend over price (the d-p ratio), which is a very important
financial instrument for studying financial behavior.
3. Spot currency returns. Suppose that P_t is the dollar price in period t for one unit
of foreign currency (say, euro). Let i*_{t−1} be the continuously compounded interest rate
[Figure: two time series panels of Date versus return, titled "The returns of IBM, weekly observations" and "The returns of IBM, monthly observations".]
Figure 2.2: The weekly and monthly returns of IBM stock.
for deposits in foreign currency from time t − 1 until time t. Then one dollar used to
buy 1/P_{t−1} euros in period t − 1, which are sold with accumulated interest in period t,
gives proceeds equal to P_t exp(i*_{t−1})/P_{t−1}, and the return is
r_t = log(P_t) − log(P_{t−1}) + i*_{t−1} = p_t − p_{t−1} + i*_{t−1}.
In practice, the foreign interest rate is ignored because it is very small compared with
the magnitude of the typical daily logarithmic price change.
4. Futures returns. Suppose F_{t,T} is the futures price in period t for delivery or cash settlement in some later period T. As there are no dividend payouts on futures contracts,
the futures return is defined as:
r_t = log(F_{t,T}) − log(F_{t−1,T}) = f_{t,T} − f_{t−1,T},
where f_{t,T} = log(F_{t,T}).
5. Excess return is defined as the difference between the asset's return and the return
on some reference asset. The reference asset is usually assumed to be riskless and in
practice is usually a short-term Treasury bill return. The excess return is defined as:
z(t, t + 1) = z_{t+1} = r(t, t + 1) − r_0(t, t + 1),   (2.7)
where r_0(t, t + 1) is the reference return from period t to period t + 1.
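The return definitions above can be checked numerically. A short Python sketch with a made-up price series follows (the notes use R, but the arithmetic is identical): gross returns multiply across periods, log returns add.

```python
import math

P = [100.0, 102.0, 99.0, 103.0]                       # P_t, P_{t+1}, ...
simple = [P[t + 1] / P[t] - 1 for t in range(3)]      # simple net returns (2.3)
logret = [math.log(P[t + 1] / P[t]) for t in range(3)]  # log returns (2.4)

gross_3 = math.prod(1 + r for r in simple)            # product of gross returns
print(round(gross_3 - 1, 6))                          # 3-period simple return: 0.03
print(round(sum(logret), 6))                          # 3-period log return: log(1.03)
```

Note that the 3-period simple return 103/100 − 1 = 3% and the 3-period log return log(1.03) ≈ 2.96% differ slightly, which is exactly the approximation error in (2.4).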
2.4 Stylized Facts for Financial Returns
When you have data, the first and very important step you need to take is to explore the
data. That is, before you build models for the given data, you need to examine the data to
see what kind of key features the data have, to avoid mis-specification, so that, intuitively,
you have some basic ideas about the data and possible models for them. Here are
some important and common properties that are found in almost all sets of daily returns
obtained from a few years of prices:
1. The distribution of returns is not normal (do you believe this?), but it has
the following empirical properties:
Stationarity. There are two definitions: weakly (second-moment) stationary and
strictly stationary. The former is referred to in most applications. Question: How
to check stationarity?
It is approximately symmetric. Sample estimates of the skewness (μ_3/σ³, where
μ_i is the ith central moment, μ_i = E(r_t − μ)^i, μ is the mean, and σ² is the variance)
for daily US stock returns tend to be negative for stock indices but close to zero
or positive for individual stocks.
It has fat tails. The kurtosis (the ratio of the fourth central moment over the square of
the second central moment, minus 3; that is, κ = μ_4/μ_2² − 3) for daily US stock
returns is large and positive for both indices and individual stocks, which means
that returns have more probability mass in the tail areas than would be predicted
by a normal distribution (leptokurtic, or κ > 0).
It has a high peak. See Figure 2.3 for IBM daily returns in comparison with
the standard normal.
Figures 2.3-2.4 compare empirical estimates of the probability distribution function
Figure 2.3: The empirical distribution of standardized IBM daily returns and the pdf of the standard normal. Notice the fat tails of the empirical distribution compared with the tails of the standard normal.
(pdf) of standardized IBM and Microsoft (MSFT) returns, z_t = (r_t − r̄)/σ̂, with the
probability density function of the normal distribution. This empirical density estimate
CHAPTER 2. BASIC CONCEPTS OF PRICES AND RETURNS 23
Figure 2.4: The empirical distribution of standardized Microsoft daily returns and the pdf of the standard normal. Notice the fat tails of the empirical distribution compared with the tails of the standard normal.
has been calculated using the nonparametric kernel density estimate:
f̂(z) = (1/T) Σ_{t=1}^T (1/h) K( (z − z_t)/h ),   (2.8)
where K(·) is a kernel function and h = h(T) → 0 as T → ∞ is called the bandwidth.
In practice, h = c T^{−0.2} for some positive c dependent on the features of the data. Note
that (2.8) is well known in the nonparametric statistics literature. For details, see the
book by Fan and Gijbels (1996). The estimated kurtosis for the standardized IBM
and Microsoft returns is 5.59 and 5.04, respectively (excess kurtosis, with standard error
√(24/T)). The fact that the distribution of returns is not normal implies that classical
linear regression models for returns may not be good enough. A satisfactory
probability distribution for daily returns must have high kurtosis and be either exactly
or approximately symmetric. Figure 2.5 displays the quantile-quantile (Q-Q) plots
for the standardized IBM returns (top panel) and the standardized Microsoft returns
(bottom panel). It is evident that the IBM and MSFT returns are not exactly normally
distributed. For more examples, see Table 1.2 (p. 11) and Figure 1.4 (p. 19) in Tsay
(2005) or Table 4.6 and Figures 4.1 and 4.2 (pp. 70-72) in Taylor (2005).
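A minimal sketch of the kernel estimator (2.8) with a Gaussian kernel follows; the data are simulated rather than actual IBM returns, and the constant c = 1.06 is the usual normal-reference choice, an assumption rather than a value from the notes. (The problems at the end of this chapter do the same with R's density().)

```python
import math, random

random.seed(1)
z = [random.gauss(0, 1) for _ in range(2000)]   # stand-in for standardized returns
T = len(z)
h = 1.06 * T ** (-0.2)                          # bandwidth h = c * T^(-0.2)

def f_hat(x):
    # (1/T) * sum_t (1/h) K((x - z_t)/h), with K the standard normal pdf
    return sum(math.exp(-((x - zt) / h) ** 2 / 2) / math.sqrt(2 * math.pi)
               for zt in z) / (T * h)

print(round(f_hat(0.0), 3))                     # should be near phi(0) = 0.399
```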
Figure 2.5: Q-Q plots for the standardized IBM returns (top panel) and the standardized
Microsoft returns (bottom panel).
Question 1: How to model the distribution of a return or returns? (A) Parametric
models; (B) Mixture models (see Section 4.8 in Taylor (2005) and Maheu and McCurdy
(2009)); (C) Nonparametric models.
Question 2: How do you know the distribution of a return belongs to a particular family?
(A) The informal way is to do model checking using graphical methods, such as the Q-Q plot;
(B) the official way is to do hypothesis testing, say the Jarque-Bera and Kolmogorov-Smirnov tests or other advanced tests, say nonparametric versus parametric tests.
2. There is almost no correlation between returns for different days. Recall that
the correlation between returns τ periods apart is estimated from T observations by
the sample autocorrelation at lag τ:
ρ̂_τ = Σ_{t=1}^{T−τ} (r_t − r̄)(r_{t+τ} − r̄) / Σ_{t=1}^{T} (r_t − r̄)²,
where r̄ is the sample mean of all T observations. The command acf() in R produces the plot
of ρ̂_τ versus τ, which is called the ACF plot.
To test H_0: ρ_1 = 0, one can use the Durbin-Watson test statistic, which is
DW = Σ_{t=2}^{T} (r_t − r_{t−1})² / Σ_{t=1}^{T} r_t².
Straightforward calculation shows that DW ≈ 2(1 − ρ̂_1), where ρ̂_1 is the lag-1 ACF of
{r_t}.
Consider testing that several autocorrelation coefficients are simultaneously zero, i.e.,
H_0: ρ_1 = ρ_2 = . . . = ρ_m = 0. Under the null hypothesis, it is easy to show (see Box
and Pierce (1970)) that
Q = T Σ_{k=1}^{m} ρ̂_k²  →  χ²_m.   (2.9)
Ljung and Box (1978) provided the following finite-sample correction, which yields a
better fit to the χ²_m for small sample sizes:
Q* = T(T + 2) Σ_{k=1}^{m} ρ̂_k² / (T − k)  →  χ²_m.   (2.10)
Both are called Q-tests and are well known in the statistics literature. Of course, they are
very useful in applications.
The function in R for the Ljung-Box test is
Box.test(x, lag = 1, type = c("Box-Pierce", "Ljung-Box"))
and the Durbin-Watson test for autocorrelation of disturbances is
dwtest(formula, order.by = NULL, alternative = c("greater","two.sided",
"less"),iterations = 15, exact = NULL, tol = 1e-10, data = list())
which is in the package lmtest.
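The sample ACF and the Ljung-Box statistic Q* in (2.10) are also easy to compute directly, as a cross-check on the built-in R functions. A Python sketch follows; the simulated white-noise series is a hypothetical stand-in for a return series.

```python
import random

random.seed(7)
r = [random.gauss(0, 1) for _ in range(500)]     # stand-in return series
T = len(r)
rbar = sum(r) / T
denom = sum((x - rbar) ** 2 for x in r)

def acf(k):
    # sample autocorrelation at lag k
    return sum((r[t] - rbar) * (r[t + k] - rbar) for t in range(T - k)) / denom

m = 5
Q_star = T * (T + 2) * sum(acf(k) ** 2 / (T - k) for k in range(1, m + 1))
print([round(acf(k), 3) for k in range(1, m + 1)], round(Q_star, 2))
```

For white noise, Q* is approximately χ²_5, so values well below the 5% critical value 11.07 are typical; for real absolute or squared returns it is usually far larger.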
3. The correlations between magnitudes of returns on nearby days are positive
and statistically significant. Functions of returns can have substantial autocorrelations even though returns have very small autocorrelations. Usually, autocorrelations
are discussed for {|r_t|^ν}, ν = 1, 2. It is a stylized fact that there is positive dependence
between absolute returns on nearby days, and likewise for squared returns. See Section
4.10 in Taylor (2005) and Section 3.5.8.
The autocorrelations of absolute returns are always positive at a lag of one day, and
positive dependence continues to be found for several further lags. Squared returns
also exhibit positive dependence, but to a lesser degree. The dependence
in absolute returns may be explained by volatility clustering or regime
switching or nonlinearity. See Section 4.9 in Taylor (2005).
4. Nonlinearity of the Returns Process. For example, Hong and Lee (2003) conducted studies on exchange rates, and they found that some of them are predictable
based on nonlinear time series models. There are many ongoing research activities in
this direction. See Chapter 4 in Tsay (2005) and Cai and Kuan (2008). If we have
time, we will spend some time exploring this topic further.
2.5 Problems
1. Download weekly (daily) price data for any two stocks, for example, IBM stock (P_{1t})
for 01/02/62 - 01/15/08 and for Microsoft stock (P_{2t}) for 03/13/86 - 01/15/2008.
(a) Create a time series of continuously compounded weekly returns for IBM (r_{1t})
and for Microsoft (r_{2t}).
(b) Use the constructed weekly returns to construct a series of monthly returns. You
may assume for simplicity that one month consists of four weeks.
(c) Construct a graph of the stock price series (P_{1t}, P_{2t}) and the return series (r_{1t}, r_{2t}).
(d) Compute and graph the rolling estimates of the sample mean and variance for
stock prices and returns. In computation of rolling estimates, you may use the
last quarter of data (13 weeks).
NOTE: You can either write code by yourself or use the built-in function in R. To use
the built-in function for the rolling analysis in R, you need to do the following:
the first thing you need to do is to load fTrading, which is a package of RMetrics.
When you open the R window, go to Packages → Local packages, scroll down
to fTrading, and finally double-click it. After you load the package fTrading, the
command for the rolling analysis is
roll=rollFun(x,n,FUN=mean) # x is the series for the rolling
Or, you can use rapply or rollmean in the package zoo. To use the package zoo,
you need to load it first.
x1=zoo(x)
x2=rapply(x1,n,FUN=mean) # x is the series for the rolling
(e) What is the denition of a stationary stochastic process? Do prices look like a
stationary process? Why? Do returns look like a stationary process? Why?
(f) Compute the autocorrelation coefficients ρ̂_k for 1 ≤ k ≤ 5 for the price and return series.
To compute autocorrelation coefficients, you may use the acf function in
R. This function is called as follows:
rho=acf(x,k, plot=F)
win.graph()
# open a graph window
plot(rho)
# make a plot
rho_value=rho$acf
# get the estimated $\rho$-values
print(rho_value)
# print the estimated $\rho$-values on screen
# where $x$ is a time-series vector (stock prices, stock returns,
# etc.), $k$ is the maximum lag considered ($5$ in this example).
(g) Based on the computed autocorrelations for IBM and MSFT stock prices and
returns, what can you say about the correlation between stock prices for different
days? What can you say about the correlation between stock returns for different
days?
(h) Using your stock returns for IBM and MSFT, r_{it}, i = 1, 2, construct four more
series y_{it} = |r_{it}|^ν, i = 1, 2 and ν = 1, 2. Compute the autocorrelation coefficients
ρ̂_k for 1 ≤ k ≤ 5 for the newly constructed series. Compare the computed
autocorrelations for |r_{it}|² and |r_{it}|. Are the results as you expected?
(i) Use the Jarque-Bera test (see Jarque and Bera (1980, 1987)) to test the assumption of return normality for IBM and Microsoft stock returns.
NOTE: The Jarque-Bera test evaluates the hypothesis that X has a normal distribution with unspecified mean and variance, against the alternative that X does
not have a normal distribution. The test is based on the sample skewness and
kurtosis of X. For a true normal distribution, the sample skewness should be
near 0 and the sample kurtosis should be near 3. The test has the following general
form:
JB = (T/6) [ S_k² + (K − 3)²/4 ]  →  χ²_2,
where S_k and K are the measures of skewness and kurtosis, respectively.
the build-in function for the Jarque-Bera test in R, you need to do the followings:
The rst thing you need to do is to load tseries, which is a package for Time
Series and Computational Finance. When you open R window, go to packages
local packages, and go down to tseries, and nally, double click it. After
you load the package tseries, the command for the Jarque-Bera test is
jb=jarque.bera.test(x) # x is the series for the test
print(jb)
Alternatively, you can also use the Kolmogorov-Smirnov tests as
ks.test(x, y, ..., alternative = c("two.sided", "less", "greater"),
exact = NULL)
To use Kolmogorov-Smirnov tests, you need to standardize the data first.
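The JB statistic in part (i) is easy to compute from scratch, which makes a useful sanity check on the built-in test. A Python sketch follows, with simulated data standing in for returns; in R you would simply call jarque.bera.test.

```python
import random

random.seed(3)
x = [random.gauss(0, 1) for _ in range(1000)]    # stand-in return sample
T = len(x)
m = sum(x) / T
mu2 = sum((v - m) ** 2 for v in x) / T           # central moments
mu3 = sum((v - m) ** 3 for v in x) / T
mu4 = sum((v - m) ** 4 for v in x) / T
Sk = mu3 / mu2 ** 1.5                            # sample skewness
K = mu4 / mu2 ** 2                               # sample kurtosis (3 for a normal)
JB = T / 6 * (Sk ** 2 + (K - 3) ** 2 / 4)
print(round(JB, 2))                              # compare with chi2(2) critical value 5.99
```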
2. Use an R program to estimate the probability density function (see (2.8)) of the standardized
IBM and MSFT stock returns z_{it} = (r_{it} − r̄_i)/σ̂_i, where r̄_i and σ̂_i are the sample
mean and standard deviation of r_{it}, i = 1, 2. The R program is called as follows:
Suppose that Z is a vector of standardized stock returns,
y0=density(Z, m=100, from=-3, to=3)
# m is the number of grid points from interval (from, to)
y1=y0$y
# get estimated density values at m grid points
x0=seq(-3,3,length=100)
# set the values for the m grid points
win.graph()
matplot(x0,cbind(y1,dnorm(x0)),type="l", lty=c(1,2),xlab="",ylab="")
# make a plot with two graphs
win.graph()
qqnorm(Z)
qqline(Z,col=2)
# make a Q-Q plot of Z
# where $y1$ is a vector of estimated probabilities at $m=100$
# grid points from $-3$ to $3$. Compare the empirical distribution
# with a graph of standard normal distribution.
(a) Estimate and construct a graph of the estimated probability density function for
the IBM and Microsoft stock returns.
(b) On the same graph with the empirical density, construct a graph of the standard
normal density function. Comment on your results.
(c) Construct a Q-Q plot for the standardized IBM and MSFT returns. You may use the
R commands above for this. Comment on your results.
2.6 References
Box, G. and D. Pierce (1970). Distribution of residual autocorrelations in autoregressive
integrated moving average time series models. Journal of the American Statistical
Association, 65, 1509-1526.
Cai, Z. and C.-M. Kuan (2008). Time-varying betas models: A nonparametric analysis.
Working paper, Department of Mathematics and Statistics, University of North Car-
olina at Charlotte.
Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997). The Econometrics of Financial
Markets. Princeton University Press, Princeton, NJ. (Chapter 1)
Capiński, M. and T. Zastawniak (2003). Mathematics for Finance. Springer-Verlag, Lon-
don.
Cochrane, J.H. (2002). Asset Pricing. Princeton University Press, Princeton, NJ.
(financial theory)
Fan, J. and I. Gijbels (1996). Local Polynomial Modelling and Its Applications. Chapman
and Hall, London.
Gourieroux, C. and J. Jasiak (2001). Financial Econometrics: Problems, Models, and
Methods. Princeton University Press, Princeton, NJ. (Chapter 1)
Hong, Y. and T.-H. Lee (2003). Inference on predictability of foreign exchange rates via
generalized spectrum and nonlinear time series models. The Review of Economics and
Statistics, 85, 1048-1062.
Jarque, C.M. and A.K. Bera (1980). Efficient tests for normality, homoscedasticity and
serial independence of regression residuals. Economics Letters, 6, 255-259.
Jarque, C.M. and A.K. Bera (1987). A test for normality of observations and regression
residuals. International Statistical Review, 55, 163-172.
Ljung, G. and G. Box (1978). On a measure of lack of fit in time series models. Biometrika,
65, 297-303.
Maheu, J.M. and T.H. McCurdy (2009). How useful are historical data for forecasting
the long-run equity return distribution? Journal of Business & Economic Statistics,
27, 95-112.
Taylor, S. (2005). Asset Price Dynamics, Volatility, and Prediction. Princeton University
Press, Princeton, NJ. (Chapters 1-4)
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons,
New York. (Chapter 1)
Zivot, E. (2002). Lecture Notes on Applied Econometric Modeling in Finance. The web
link is: http://faculty.washington.edu/ezivot/econ483/483notes.htm
Chapter 3
Linear Time Series Models and Their
Applications
In this chapter, we discuss basic theories of linear time series analysis, introduce some simple
econometric models useful for analyzing financial time series, and apply the models to asset
returns. Discussions of the concepts are brief, with emphasis on those relevant to financial
applications. Understanding the simple time series models introduced here will go a long
way toward better appreciating the more sophisticated financial econometric models of the later
chapters. There are many time series textbooks available. For basic concepts of linear time
series analysis, see Box, Jenkins, and Reinsel (1994, Chapters 2 and 3) and Brockwell and
Davis (1996, Chapter 1).
Treating an asset return (e.g., the log return r_t of a stock) as a collection of random variables
over time, we have a time series {r_t}. Linear time series analysis provides a natural
framework to study the dynamic structure of such a series. The theories of linear time series
discussed include stationarity, dynamic dependence, autocorrelation function, modeling, and
forecasting. The econometric models introduced include
(a) simple autoregressive (AR) models,
(b) simple moving-average (MA) models,
(c) mixed autoregressive moving-average (ARMA) models,
(d) a simple regression model (constant expected return model) with time series errors, and
(e) differenced models (ARIMA).
For an asset return r_t, simple models attempt to capture the linear relationship between r_t
and information available prior to time t. The information may contain the historical values
CHAPTER 3. LINEAR TIME SERIES MODELS AND THEIR APPLICATIONS 32
Table 3.1: Definitions of ten types of stochastic process
1. Strictly stationary: the multivariate distribution function for k consecutive variables does not
depend on the time subscript attached to the first variable (any k).
2. Stationary: means and variances do not depend on time subscripts; covariances depend
only on the difference between two subscripts.
3. Uncorrelated: the correlation between variables having different time subscripts is always
zero.
4. Autocorrelated: it is not uncorrelated.
5. White noise: the variables are uncorrelated, stationary, and have mean equal to 0.
6. Strict white noise: the variables are independent and have identical distributions whose mean
is equal to 0.
7. A martingale: the expected value of the variable at time t, conditional on the information
provided by all previous values, equals the variable at time t−1.
8. A martingale difference: the expected value of the variable at time t, conditional on the information
provided by all previous values, always equals 0.
9. Gaussian: all multivariate distributions are multivariate normal.
10. Linear: it is a linear combination of present and past terms from a strict white
noise process.
of r_t and the random vector Y_t that describes the economic environment under which the
asset price is determined. As such, correlation plays an important role in understanding
these models. In particular, correlations between the variable of interest and its past values
become the focus of linear time series analysis. These correlations are referred to as serial
correlations or autocorrelations. They are the basic tool for studying a stationary time series.
3.1 Stationary Stochastic Process
A stochastic process (time series) is a sequence of random variables in time order. Some-
times it is called the data generating process (DGP) of a model. A stochastic process is
often denoted by a typical variable in curly brackets, such as {X_t}. A time-ordered set of
observations, {x_1, x_2, . . . , x_T}, is called a time series. Much of time series and financial
econometrics is about methods for inferring and estimating the properties of the stochastic
process that generates a time series of returns. Table 3.1 gives definitions of some categories
of stochastic processes; see Taylor (2005, p. 31). Some examples of categories of stochastic
processes are displayed in Figure 3.1, and relationships between categories of uncorrelated
processes are given in Figure 3.2. Note that the correlation or autocorrelation coefficient mea-
sures only a linear relationship between two variables, and the martingale difference corresponds to
market efficiency in finance.
[Figure 3.1 here: three simulated time series of length 300, with panel titles
"Strictly stationary, Uncorrelated, Strict white noise, MD";
"Not stationary, Uncorrelated, Not white noise, Not MD"; and
"Not stationary, Autocorrelated, Not white noise, Martingale".]
Figure 3.1: Some examples of different categories of stochastic processes.
[Figure 3.2 here: a nested diagram relating the categories Gaussian white noise,
strict white noise, stationary martingale difference, white noise, uncorrelated
zero-mean processes, and martingale differences.]
Figure 3.2: Relationships between categories of uncorrelated processes.
Question: Is a time series of stock or market index returns really stationary? How can one check
stationarity?
Exercises: As exercises, please find some stock and market index returns and examine them.
Try to draw conclusions by yourself and see what you can make of them. Also, similar to Figures 3.1
and 3.2, please simulate various time series (different types and different sample sizes) and
make time series plots of them to get some intuitive feeling for their behavior.
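As a minimal sketch of such a simulation exercise (the sample size and the AR coefficient are arbitrary choices), three of the categories from Table 3.1 can be generated and plotted in base R:

```r
set.seed(1)
n <- 300
wn  <- rnorm(n)              # Gaussian (strict) white noise: stationary
rw  <- cumsum(rnorm(n))      # random walk: not stationary, a martingale
ar1 <- arima.sim(model = list(ar = 0.8), n = n)  # stationary but autocorrelated
op <- par(mfrow = c(3, 1))
plot.ts(wn,  main = "Strict white noise")
plot.ts(rw,  main = "Random walk (not stationary)")
plot.ts(ar1, main = "Stationary AR(1), coefficient 0.8")
par(op)
```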
3.2 Constant Expected Return Model
Although this model is very simple and might not be appropriate for all applications, it allows
us to discuss and develop important econometric topics such as estimation and hypothesis
testing. We will touch on some more sophisticated and modern models later, but they require
much deeper knowledge.
3.2.1 Model Assumptions
Let r_it denote the continuously compounded return on asset i at time t, r_it = log(P_it)
− log(P_{i,t−1}) = p_it − p_{i,t−1}. We make the following assumptions about the probability distribution
of r_it for i = 1, . . . , N assets over the time horizon t = 1, . . . , T:
Assumption 1. Normality of returns: r_it ∼ N(μ_i, σ²_i), i = 1, . . . , N and t = 1, . . . , T.
Assumption 2. Constant variances and covariances: Cov(r_it, r_jt) = σ_ij, i, j = 1, . . . , N
and t = 1, . . . , T.
Assumption 3. No serial correlation across assets over time: Cov(r_it, r_js) = 0 for t ≠ s
and i, j = 1, . . . , N.
3.2.2 Regression Model Representation
A convenient mathematical representation or model of asset returns can be given based on
Assumptions 1-3. This is the constant expected return (CER) regression model. For assets
i = 1, . . . , N and time periods t = 1, . . . , T, the CER model is represented as

r_it = μ_i + e_it, with e_it ∼ iid N(0, σ²_i) and Cov(e_it, e_jt) = σ_ij,   (3.1)

where μ_i is a constant and e_it is a normally distributed random variable with mean zero
and variance σ²_i. Using the basic properties of expectation, variance, and covariance, we can
derive the following properties of returns:

E(r_it) = μ_i, Var(r_it) = σ²_i, Cov(r_it, r_jt) = σ_ij, and Cov(r_it, r_js) = 0 for t ≠ s,
so that

Corr(r_it, r_jt) = σ_ij/(σ_i σ_j) = ρ_ij and Corr(r_it, r_js) = 0/(σ_i σ_j) = 0, i ≠ j, t ≠ s.

Since the random variable e_it is independent and identically distributed normal, the asset
returns r_it will also be iid normal:

r_it ∼ iid N(μ_i, σ²_i).

Therefore, the CER model (3.1) is equivalent to the model implied by Assumptions 1-3. The random
variable e_it can be interpreted as representing the unexpected news concerning the value of
the asset that arrives between time t−1 and time t:

e_it = r_it − μ_i = r_it − E(r_it).

The assumption that E(e_it) = 0 means that news, on average, is neutral. The assumption
that Var(e_it) = σ²_i can be interpreted as saying that the volatility of news arrival is constant
over time.
Question: Do you think that the CER model is a good model for applications? Please
answer this question from an empirical standpoint.
3.2.3 CER Model of Asset Returns and Random Walk Model of
Asset Prices
The CER model of asset returns (3.1) gives rise to the random walk (RW) model of the
logarithm of asset prices. Recall that the continuously compounded return, r_it, is defined as

ln(P_it) − ln(P_{i,t−1}) = r_it.

Letting p_it = ln(P_it) and using the representation of r_it in the CER model (3.1), we may
rewrite the above as the random walk (RW) model

p_it − p_{i,t−1} = μ_i + e_it.   (3.2)

In the RW model, μ_i represents the expected change in the log of asset prices between periods
t−1 and t, and e_it represents the unexpected change in prices. The RW model gives the
following interpretation for the evolution of asset prices:

p_it = μ_i + p_{i,t−1} + e_it,   p_iT = T μ_i + p_i0 + Σ_{t=1}^{T} e_it.

At time t = 0 the expected price at time t = T is E(p_iT) = p_i0 + T μ_i.
3.2.4 Monte Carlo Simulation Method
A good way to understand the probabilistic behavior of a model is to use simulation methods
to create pseudo data from the model. The process of creating such pseudo data is called
Monte Carlo simulation. The steps to create a Monte Carlo simulation from the CER model
are:
1. Fix values for the CER model parameters μ and σ (or σ²).
2. Determine the number of simulated values, T, to create.
3. Use a computer random number generator to simulate T iid values e_t from the N(0, σ²)
distribution. Denote these simulated values as e*_1, . . . , e*_T.
4. Create the simulated return data r*_t = μ + e*_t for t = 1, . . . , T.
The Monte Carlo simulation of returns and prices using the CER model is presented in
Figure 3.3.
Exercises: Please follow the above steps to do some Monte Carlo simulations, draw your
own conclusions, and interpret them.
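The steps above can be sketched in base R as follows, using the parameter values μ = 0.023 and σ = 0.11 reported in Figure 3.3 (the sample size T = 160 is chosen to match the figure's horizontal axis):

```r
set.seed(123)
mu    <- 0.023          # step 1: fix mu ...
sigma <- 0.11           # ... and sigma
T     <- 160            # step 2: number of simulated values
e     <- rnorm(T, mean = 0, sd = sigma)  # step 3: T iid N(0, sigma^2) draws
r     <- mu + e                          # step 4: simulated CER returns
p     <- cumsum(c(0, r))  # implied RW log-price path with p_0 = 0, as in (3.2)
plot.ts(r, main = "Simulated returns from the CER model")
```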
3.2.5 Estimation
The CER model states that r_it ∼ iid N(μ_i, σ²_i). Our best guess for the return at the end of
the month is E(r_it) = μ_i, our measure of uncertainty about our best guess is captured by
σ_i = √Var(r_it), and our measure of the direction of linear association between r_it and r_jt
is σ_ij = Cov(r_it, r_jt). A key task in financial econometrics is estimating the values of μ_i, σ²_i,
and σ_ij from the observed historical data. The ordinary least squares (OLS) estimates are:

μ̂_i = (1/T) 1′ r_i = (1/T) Σ_{t=1}^{T} r_it = r̄_i,

σ̂²_i = (1/(T−1)) (r_i − r̄_i 1)′ (r_i − r̄_i 1) = (1/(T−1)) Σ_{t=1}^{T} (r_it − r̄_i)², σ̂_i = √(σ̂²_i),

σ̂_ij = (1/(T−1)) (r_i − r̄_i 1)′ (r_j − r̄_j 1) = (1/(T−1)) Σ_{t=1}^{T} (r_it − r̄_i)(r_jt − r̄_j), and ρ̂_ij = σ̂_ij/(σ̂_i σ̂_j),

where 1 is a T × 1 vector of ones and r_i = (r_i1, r_i2, . . . , r_iT)′ is a T × 1 vector of returns.
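A minimal base-R sketch of these estimators; the two return series are simulated stand-ins for actual stock data:

```r
set.seed(7)
T  <- 120
ri <- rnorm(T, mean = 0.010, sd = 0.05)              # hypothetical returns, asset i
rj <- 0.5 * ri + rnorm(T, mean = 0.005, sd = 0.04)   # hypothetical returns, asset j
mu.hat   <- mean(ri)                         # estimate of mu_i
sig2.hat <- sum((ri - mu.hat)^2) / (T - 1)   # estimate of sigma_i^2
sig.ij   <- cov(ri, rj)                      # estimate of sigma_ij
rho.ij   <- cor(ri, rj)                      # estimate of rho_ij
c(mu.hat, sig2.hat, sig.ij, rho.ij)
```

Note that sig2.hat coincides with var(ri), since R's var() also divides by T − 1.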
[Figure 3.3 here: top panel, "Simulated returns from CER model" with μ = 0.023 and
σ = 0.11, plotted over 160 months; bottom panel, "Monte Carlo simulation of the RW
model based on the CER model", showing p(t), E[p(t)], and p(t) − E[p(t)].]
Figure 3.3: Monte Carlo simulation of the CER model.
Example: Please find the estimates of the CER model parameters for any three stocks and
two market indices, such as the S&P 500 index.
3.2.6 Statistical Properties of Estimates
It follows from the properties of OLS estimators that, as T → ∞,

μ̂_i ∼ N(μ_i, σ²_i/T)

based on the Central Limit Theorem (CLT). Since σ²_i is not observed, one uses an estimate
σ̂²_i of σ²_i and the standard error SE(μ̂_i) = σ̂_i/√T. Then,

(μ̂_i − μ_i)/SE(μ̂_i) ∼ t_{T−1}.   (3.3)

To compute a (1 − α) × 100% confidence interval for μ_i, we use (3.3) and the quantile (critical
value) t_{T−1,α/2} to give

Pr( −t_{T−1,α/2} ≤ (μ̂_i − μ_i)/(σ̂_i/√T) ≤ t_{T−1,α/2} ) ≈ 1 − α,

which can be rearranged as

Pr( μ̂_i − t_{T−1,α/2} σ̂_i/√T ≤ μ_i ≤ μ̂_i + t_{T−1,α/2} σ̂_i/√T ) ≈ 1 − α.

Hence the confidence interval [μ̂_i − t_{T−1,α/2} σ̂_i/√T, μ̂_i + t_{T−1,α/2} σ̂_i/√T] covers the true un-
known value of μ_i with approximate probability 1 − α. Therefore, the above results can
be used for statistical inference, such as testing hypotheses.
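A short base-R sketch of this confidence interval (again with simulated returns as a stand-in for real data):

```r
set.seed(99)
r  <- rnorm(120, mean = 0.01, sd = 0.05)  # hypothetical monthly returns
T  <- length(r)
mu.hat <- mean(r)
se     <- sd(r) / sqrt(T)                 # SE(mu.hat) = sigma.hat/sqrt(T)
alpha  <- 0.05
tc     <- qt(1 - alpha / 2, df = T - 1)   # critical value t_{T-1, alpha/2}
ci     <- c(mu.hat - tc * se, mu.hat + tc * se)
ci
```

The same interval is returned by t.test(r, conf.level = 0.95)$conf.int.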
3.3 AR(1) Model
The series {y_t; t ∈ Z} follows an autoregressive (AR) process of order 1, denoted by AR(1),
if and only if it can be written as

y_t = ρ y_{t−1} + e_t,   (3.4)

where {e_t; t ∈ Z} is a weak white noise with variance Var(e_t) = σ², and ρ is a real number
of absolute value less than 1. The dynamics of AR models depend on:
1. The past history, i.e., the last realization y_{t−1} for the AR(1) model.
2. The random shock e_t that occurs at time t. It is called the innovation and is not observable.
Proposition 3.1: An AR(1) process can be written as the sum of all past innovations:

y_t = e_t + ρ e_{t−1} + ρ² e_{t−2} + . . . = Σ_{h=0}^{∞} ρ^h e_{t−h},

which is called a linear process. This is the infinite moving average, MA(∞), representation
of the AR(1) process, and ρ^h is the moving average coefficient of order h. It is easiest to show
that this is true using the lag operator L, defined by L a_t = a_{t−1} for any infinite sequence of
variables or numbers {a_t}. Recall that L^k X_t = X_{t−k} for all integers k. Equation
(3.4) can be rewritten as (1 − ρL) y_t = e_t. As |ρ| < 1, there is the result

1/(1 − ρL) = Σ_{i=0}^{∞} (ρL)^i,

and therefore

y_t = (1/(1 − ρL)) e_t = Σ_{i=0}^{∞} (ρL)^i e_t = Σ_{h=0}^{∞} ρ^h e_{t−h}.

The moving average coefficients ρ^h can be viewed as dynamic multipliers, i.e., they show the
effect on y_h of a transitory shock e_0 at time 0. See Taylor (2005,
Chapter 3) or Tsay (2005, Chapter 2) for details. Also, the moving average Σ_{h=0}^{m} ρ^h e_{t−h} is
called exponential smoothing.
Proposition 3.2: The AR(1) process is such that
1. E(y_t) = 0 for all t;
2. Cov(y_t, y_{t−h}) = σ² ρ^{|h|}/(1 − ρ²) for all t, h; in particular, Var(y_t) = σ²/(1 − ρ²);
3. ρ(t, h) = ρ^h for all t, h;
4. {y_t} is second-order stationary (or covariance stationary), i.e., the mean and variance
are the same for all t.

The autocorrelation coefficient ρ(t, h) is an extension of the correlation coefficient between
two random variables X and Y:

Corr(X, Y) = Cov(X, Y) / ( √Var(X) √Var(Y) ).

And for a second-order stationary process the autocorrelation coefficient is

ρ(t, h) = Cov(y_t, y_{t−h}) / ( √Var(y_t) √Var(y_{t−h}) ) = Cov(y_t, y_{t−h}) / Var(y_t).

Note that the AR(1) process is second-order stationary when |ρ| < 1, since the mean of y_t and
Var(y_t) do not depend on the time index t. Also note that the variance of y_t is a function of both
σ² and ρ. As a function of ρ, it increases with |ρ| and tends to infinity when ρ approaches the
value +1 or −1. The autoregressive parameter can be viewed as a persistence measure of an
additional transitory shock. Since ρ(t, h) = ρ^h, an increase of the autoregressive parameter
results in higher autocorrelations and stronger persistence of past shocks. The optimal
linear forecast of y_{t+H}, made at time t, is given by

f_{t,H} = ρ^H y_t.

See Taylor (2005, Chapter 3) or Tsay (2005, Chapter 2) for details.
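A quick simulation check of Proposition 3.2 (the coefficient ρ = 0.6, σ = 1, and the sample size are arbitrary choices for illustration):

```r
set.seed(2024)
rho <- 0.6
y <- arima.sim(model = list(ar = rho), n = 50000, sd = 1)  # AR(1), sigma = 1
var(y)                        # should be close to 1/(1 - rho^2) = 1.5625
acf(y, plot = FALSE)$acf[2]   # lag-1 sample autocorrelation, close to rho
```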
3.3.1 Estimation and Tests
The estimator of ρ can be obtained using ordinary least squares (OLS):

ρ̂_T = Σ_{t=2}^{T} y_t y_{t−1} / Σ_{t=2}^{T} y²_{t−1} = (Y′_{t−1} Y_{t−1})^{−1} Y′_{t−1} Y_t,

where Y_t = (y_2, y_3, . . . , y_T)′ is a (T−1) × 1 vector of observations and Y_{t−1} = (y_1, y_2, . . . , y_{T−1})′
is a (T−1) × 1 vector of lagged observations.

Proposition 3.3: If {y_t} is an AR(1) process with a strong white noise, then
1. the estimator ρ̂_T converges to the true value of ρ when T tends to infinity;
2. it is asymptotically normal:

√T (ρ̂_T − ρ) → N(0, 1 − ρ²).   (3.5)

From (3.5), we can see that if ρ is close to one, then the limiting variance approaches
zero and the distribution becomes degenerate. An OLS estimator of the variance is as follows:

σ̂²_T = (1/(T−1)) Σ_{t=2}^{T} ê²_t = (1/(T−1)) ê′ ê,

where ê = (ê_2, . . . , ê_T)′ is a (T−1) × 1 vector of the residuals. One can also assume that
the white noise is Gaussian, i.e., follows a normal distribution. The maximum likelihood (ML)
estimators are obtained by maximizing the likelihood function with respect to ρ and σ². For
an AR(1) model the ML estimators and OLS estimators are equivalent. See Taylor (2005,
Chapter 3) or Tsay (2005, Chapter 2) for details.
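A base-R sketch of the OLS estimator ρ̂_T and the asymptotic standard error implied by (3.5), on simulated data with an assumed true ρ = 0.5:

```r
set.seed(5)
rho <- 0.5
y <- as.numeric(arima.sim(model = list(ar = rho), n = 2000))
T <- length(y)
fit <- lm(y[-1] ~ y[-T] - 1)     # regress y_t on y_{t-1}, no intercept
rho.hat <- unname(coef(fit))
rho.hat                           # close to the true rho
sqrt((1 - rho.hat^2) / T)         # asymptotic SE from (3.5)
```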
3.3.2 White Noise Hypothesis
One may want to test the hypothesis that the last realization y_{t−1} does not affect the real-
ization y_t, i.e., one may want to test the null hypothesis H_0: ρ = 0. Note from
the distribution of ρ̂_T in (3.5) that, under the null hypothesis,

√T ρ̂_T → N(0, 1).

Therefore, the 95% confidence band is |ρ̂_T| ≤ 1.96/√T, which shows up in the ACF plot
as two blue dotted lines. The test consists of accepting H_0: ρ = 0 if |√T ρ̂_T| ≤ 1.96 and of
rejecting it otherwise. See Taylor (2005, Chapter 3) or Tsay (2005, Chapter 2) for details.

Remark: An AR(1) process is invariant with respect to the selected sampling frequency, i.e.,
an AR(1) series of weekly returns remains an AR(1) series when the frequency is reduced to
monthly data or increased to daily data.

Remark: From (3.5), when ρ = 1 or ρ is very close to 1, the asymptotic distribution becomes
degenerate. This means that the asymptotic distribution of ρ̂_T needs to be reconsidered, and
it might not be normal.
3.3.3 Unit Root
The process {y_t; t ∈ Z} is integrated of order 1, denoted by I(1), if and only if it satisfies
the recursive equation

y_t = y_{t−1} + e_t,

where {e_t} is a weak white noise. The process {y_t; t ∈ Z} is an I(1) process with a drift if it
has a constant term:

y_t = μ + y_{t−1} + e_t.   (3.6)

The mean and variance of y_t in (3.6) are as follows:

E(y_t) = E(y_0) + μt, and Var(y_t) = Var(y_0) + σ²t.

Compare these with the mean and variance of the covariance-stationary AR(1) process in Propo-
sition 3.2. Note that for an I(1) process with drift, the variance depends on t whenever
σ² ≠ 0, and the mean varies with t as well whenever μ ≠ 0. Therefore, I(1) processes are
non-stationary. See Hamilton (1994, Chapter 17), Taylor (2005, Chapter 3), and Tsay (2005)
for details.
3.3.4 Estimation and Tests in the Presence of a Unit Root
The I(1) specification can be represented by a regression model:

y_t = ρ y_{t−1} + e_t, without drift,   (3.7)
y_t = μ + ρ y_{t−1} + e_t, with drift,   (3.8)

and corresponds to the case when ρ = 1. The OLS estimators of the parameters ρ and μ in
(3.7) and (3.8) can still be found, but their properties are different from the standard case
when |ρ| < 1.

Proposition 3.4: If {y_t} is an I(1) process without drift, the OLS estimate ρ̂_T of ρ tends
asymptotically to 1.

Proposition 3.5: If {y_t} is an I(1) process with drift, the OLS estimate ρ̂_T of ρ tends asymp-
totically to 1 and μ̂_T to μ.

Proposition 3.6: The ACF of a non-stationary time series decays very slowly as a function
of the lag h. The PACF of a non-stationary time series tends to have a peak very near unity at
lag 1, with other values less than the significance level. Indeed, if h > 0,

ρ(y_t, y_{t+h}) = √( t/(t + h) ),

which depends on t.

The starting point for the Dickey-Fuller (DF) test is the autoregressive model of order one,
AR(1), as in (3.8). If ρ = 1, y_t is nonstationary and contains a stochastic trend. Therefore,
within the AR(1) model, the hypothesis that y_t has a stochastic trend can be tested by testing:

H_0: ρ = 1 vs. H_1: ρ < 1.

This test is most easily implemented by estimating a modified version of (3.8). Subtract y_{t−1}
from both sides and let δ = ρ − 1. Then, model (3.8) becomes:

Δy_t = μ + δ y_{t−1} + e_t   (3.9)
Table 3.2: Large-sample critical values for the ADF statistic
Deterministic regressors 10% 5% 1%
Intercept only -2.57 -2.86 -3.43
Intercept and time trend -3.12 -3.41 -3.96
and the testing hypothesis is:

H_0: δ = 0 vs. H_1: δ < 0.

The OLS t-statistic in (3.9) testing δ = 0 is known as the Dickey-Fuller test statistic.
The extension of the DF test to the AR(p) model is a test of the null hypothesis H_0: δ = 0
against the one-sided alternative H_1: δ < 0 in the following regression:

Δy_t = μ + δ y_{t−1} + γ_1 Δy_{t−1} + · · · + γ_p Δy_{t−p} + e_t.   (3.10)
Under the null hypothesis, y_t has a stochastic trend, and under the alternative hypothesis, y_t is
stationary. If instead the alternative hypothesis is that y_t is stationary around a deterministic
linear time trend, then this trend must be added as an additional regressor in model (3.10),
and the DF regression becomes

Δy_t = μ + βt + δ y_{t−1} + γ_1 Δy_{t−1} + · · · + γ_p Δy_{t−p} + e_t.   (3.11)

This is called the augmented Dickey-Fuller (ADF) test, and the test statistic is the OLS
t-statistic testing that δ = 0 in equation (3.11).
The ADF statistic does not have a normal distribution, even in large samples. Critical
values for the one-sided ADF test depend on whether the test is based on equation (3.10) or
(3.11) and are given in Table 3.2. Table 17.1 of Hamilton (1994, p. 502) presents a summary
of DF tests for unit roots in the absence of serial correlation, for testing the null hypothesis
of a unit root against several different alternative hypotheses. It is very important for you to
understand what your alternative hypothesis is when conducting unit root tests. I reproduce
this table here, but you need to check Hamilton's (1994) book for the critical values of the DF
statistic in the different cases. The critical values are presented in the Appendix of the book.
In the above models (4 cases), the basic assumption is that u_t is iid. But this assumption
is violated if u_t is serially correlated and potentially heteroskedastic. To take account of
Table 3.3: Summary of DF tests for unit roots in the absence of serial correlation

Case 1:
True process: y_t = y_{t−1} + u_t, u_t ∼ iid N(0, σ²).
Estimated regression: y_t = ρ y_{t−1} + u_t.
T(ρ̂ − 1) has the distribution described under the heading Case 1 in Table B.5.
(ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 1 in Table B.6.

Case 2:
True process: y_t = y_{t−1} + u_t, u_t ∼ iid N(0, σ²).
Estimated regression: y_t = α + ρ y_{t−1} + u_t.
T(ρ̂ − 1) has the distribution described under Case 2 in Table B.5.
(ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 2 in Table B.6.
The OLS F-test of the joint hypothesis that α = 0 and ρ = 1 has the distribution described under Case 2
in Table B.7.

Case 3:
True process: y_t = α + y_{t−1} + u_t, α ≠ 0, u_t ∼ iid N(0, σ²).
Estimated regression: y_t = α + ρ y_{t−1} + u_t.
(ρ̂ − 1)/σ̂_ρ̂ → N(0, 1).

Case 4:
True process: y_t = α + y_{t−1} + u_t (trend coefficient β = 0), u_t ∼ iid N(0, σ²).
Estimated regression: y_t = α + ρ y_{t−1} + βt + u_t.
T(ρ̂ − 1) has the distribution described under Case 4 in Table B.5.
(ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 4 in Table B.6.
The OLS F-test of the joint hypothesis that ρ = 1 and β = 0 has the distribution described under Case 4
in Table B.7.
serial correlation and potential heteroskedasticity, one way is to use the Phillips-Perron
test (PP test) proposed by Phillips and Perron (1988). For other tests for unit roots, please
read the book by Hamilton (1994, p. 506, Section 17.6). Some recent testing methods have
also been proposed. Finally, notice that in R there are at least five packages that provide unit root
tests, such as tseries, urca, uroot, fUnitRoots, and FinTS.
library(tseries) # call library(tseries)
library(urca) # call library(urca)
library(quadprog) # call library(quadprog)
# for Functions to solve Quadratic Programming Problems
library(zoo)
test1=adf.test(cpi) # Augmented Dickey-Fuller test
test2=pp.test(cpi) # do Phillips-Perron test
test3=ur.df(y=cpi,lag=5,type=c("drift"))
See Hamilton (1994, Chapter 17), Taylor (2005, Chapter 3), and Tsay (2005, Chapter 2)
for details.
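Proposition 3.4 is easy to check by simulation; this sketch (with an arbitrary sample size) regresses a driftless random walk on its own lag:

```r
set.seed(11)
y <- cumsum(rnorm(5000))   # I(1) process without drift
T <- length(y)
rho.hat <- unname(coef(lm(y[-1] ~ y[-T] - 1)))  # OLS slope on the lag
rho.hat                    # very close to 1, as Proposition 3.4 states
```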
3.4 MA(1) Model
The moving average process of order one, denoted MA(1), is defined as:

y_t = μ + e_t + θ e_{t−1}.

It is assumed that the moving-average parameter θ satisfies the invertibility condition |θ| < 1, and
then the optimal linear forecasts can be calculated. An MA(1) process has autocorrelations

ρ_1 = θ/(1 + θ²), ρ_τ = 0 for τ ≥ 2.

The optimal linear forecasts are given by

f_{t,1} = μ + θ (y_t − f_{t−1,1}), and f_{t,H} = μ, H ≥ 2.

See Hamilton (1994, Chapter 4), Taylor (2005, Chapter 3), and Tsay (2005, Chapter 2) for
details.
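The autocorrelation pattern above can be verified by simulation; with θ = 0.5 the lag-1 autocorrelation should be θ/(1 + θ²) = 0.4 (the parameter and sample size are arbitrary choices):

```r
set.seed(8)
theta <- 0.5
y <- arima.sim(model = list(ma = theta), n = 50000)
a <- acf(y, plot = FALSE)$acf   # a[1] is lag 0 (= 1), a[2] is lag 1, ...
a[2]   # lag-1 autocorrelation: near theta/(1 + theta^2) = 0.4
a[3]   # lag-2 autocorrelation: near 0
```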
3.5 ARMA, ARIMA, and ARFIMA Processes
According to the Wold theorem, any second-order stationary process can be written as
a moving average of order infinity. See Hamilton (1994, Chapter 4), Taylor (2005, Chapter
3), and Tsay (2005, Chapter 2) for details.
3.5.1 ARMA(1,1) Process
Consider a combination of the AR(1) and MA(1) models defined by

y_t = ρ y_{t−1} + e_t + θ e_{t−1},

which is called the autoregressive moving-average process, denoted ARMA(1,1). It is as-
sumed that 0 < |ρ| < 1 and 0 < θ < 1. Autocorrelations are given by

ρ_τ = A(ρ, θ) ρ^{τ−1}, τ ≥ 1, with A(ρ, θ) = (1 + ρθ)(ρ + θ) / (1 + 2ρθ + θ²).

The ARMA(1,1) process can be written using the lag operator as:

(1 − ρL) y_t = (1 + θL) e_t.

This implies that

y_t = ((1 + θL)/(1 − ρL)) e_t = ( Σ_{i=0}^{∞} ρ^i L^i ) (1 + θL) e_t = e_t + (ρ + θ) Σ_{i=1}^{∞} ρ^{i−1} e_{t−i},

i.e., the ARMA(1,1) process can be written as an MA(∞) process. The optimal linear forecast
of y_{t+1} is

f_{t,1} = (ρ + θ) Σ_{i=1}^{∞} (−θ)^{i−1} y_{t+1−i}

or

f_{t,1} = (ρ + θ) y_t − θ f_{t−1,1}.

To forecast observed values, we replace the parameters ρ, θ, and σ by their estimates. The
optimal linear forecast further ahead is constructed as follows:

f_{t,H} = ρ^{H−1} f_{t,1}.
3.5.2 ARMA(p,q) Process
A second-order stationary (covariance-stationary) process {y_t} is an ARMA(p,q) process of
autoregressive order p and moving-average order q if it can be written as

y_t = φ_1 y_{t−1} + . . . + φ_p y_{t−p} + e_t + θ_1 e_{t−1} + . . . + θ_q e_{t−q},

where φ_p ≠ 0, θ_q ≠ 0, and {e_t} is a weak white noise. The ARMA process can be written as

φ(L) y_t = θ(L) e_t,   (3.12)

where φ(L) = 1 − φ_1 L − φ_2 L² − . . . − φ_p L^p and θ(L) = 1 + θ_1 L + θ_2 L² + . . . + θ_q L^q.
Now the question is how to select among various plausible models. Box, Jenkins, and
Reinsel (1994) described the Box-Jenkins methodology for selecting an appropriate ARMA
model. We mention two criteria which reward reducing the squared error and penalize
additional parameters: the Akaike information criterion,

AIC(K) = log σ̂² + 2K/n,

and the Schwarz information criterion,

SIC(K) = log σ̂² + K log(n)/n

(Schwarz, 1978), where K is the number of parameters fitted (exclusive of variance parame-
ters) and σ̂² is the maximum likelihood estimator of the variance. The latter is sometimes termed
the Bayesian information criterion (BIC) and will often yield models with fewer parameters
than the other selection methods. A modification of AIC(K) that is particularly well suited
for small samples was suggested by Hurvich and Tsai (1989). This is the corrected AIC,
given by

AICC(K) = log σ̂² + (n + K)/(n − K − 2).

The rule for all three measures above is to choose the value of K leading to the smallest
value of AIC(K), SIC(K), or AICC(K). See Brockwell and Davis (1991, Section 9.3) for
details. For more details about model selection methodologies, please read Chapter 2 of my
lecture notes (see Cai (2007, Chapter 2)).
The R commands for fitting and simulating an ARIMA model are
arima(x, order = c(0, 0, 0),seasonal = list(order = c(0, 0, 0), period = NA),
xreg = NULL, include.mean = TRUE, transform.pars = TRUE, fixed = NULL,
init = NULL, method = c("CSS-ML", "ML", "CSS"), n.cond,
optim.control = list(), kappa = 1e6)
arima.sim(model, n, rand.gen = rnorm, innov = rand.gen(n, ...),
n.start = NA, start.innov = rand.gen(n.start, ...), ...)
ar(x, aic = TRUE, order.max = NULL,
method=c("yule-walker", "burg", "ols", "mle", "yw"),
na.action, series, ...)
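As a hedged illustration of model selection with these functions (simulated data; the candidate orders tried are an arbitrary choice), one can fit several models and compare their AIC values:

```r
set.seed(3)
y <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 1000)
fits <- list(ar1    = arima(y, order = c(1, 0, 0)),
             ma1    = arima(y, order = c(0, 0, 1)),
             arma11 = arima(y, order = c(1, 0, 1)))
sapply(fits, AIC)   # choose the model with the smallest AIC
```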
3.5.3 AR(p) Model
The series {y_t; t ∈ Z} follows an autoregressive process of order p, denoted AR(p), if and
only if it can be written as

y_t = Σ_{j=1}^{p} φ_j y_{t−j} + e_t,   (3.13)

where {e_t; t ∈ Z} is a weak white noise with variance Var(e_t) = σ². It is convenient to
rewrite (3.13), using the back-shift operator, as

φ(L) y_t = w_t, where φ(L) = 1 − φ_1 L − φ_2 L² − · · · − φ_p L^p   (3.14)

is a polynomial with roots (solutions of φ(L) = 0) outside the unit circle (|L_j| > 1)¹. These
restrictions are necessary for expressing the solution y_t of (3.14) in terms of present and past
values of w_t, which is called invertibility of an AR(p) series. That solution has the form

y_t = ψ(L) w_t, where ψ(L) = Σ_{k=0}^{∞} ψ_k L^k   (3.15)

is an infinite polynomial (ψ_0 = 1), with coefficients determined by equating the coefficients of L
in

φ(L) ψ(L) = 1.   (3.16)

Equation (3.15) can be obtained formally by noting that choosing ψ(L) satisfying (3.16),
and multiplying both sides of (3.14) by ψ(L), gives the representation (3.15). It is clear that
the random walk has φ_1 = 1 and φ_k = 0 for all k ≥ 2, which does not satisfy the restriction,
so the process is nonstationary. y_t is stationary if Σ_k |ψ_k| < ∞; see Proposition 3.1.2 in
Brockwell and Davis (1991, p. 84), which can be weakened to Σ_k ψ²_k < ∞; see Hamilton
(1994, p. 52).

Question: How can one identify the order p of an AR(p) model intuitively?

Proposition 3.7: The partial autocorrelation function (PACF), as a function of the lag h, is zero
for h > p, the order of the autoregressive process. This enables one to make a prelimi-
nary identification of the order p of the process using the PACF:
simply choose the order beyond which most of the sample values of the PACF are
approximately zero.

¹This restriction is a sufficient and necessary condition for an ARMA time series to be invertible; see
Section 3.7 in Hamilton (1994) or Theorem 3.1.2 in Brockwell and Davis (1991, p. 86) and the related
discussions.
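Proposition 3.7 in action, as a minimal sketch: the sample PACF of a simulated AR(2) (the coefficients 0.5 and −0.3 are arbitrary choices) is sizable at lags 1 and 2 and approximately zero beyond:

```r
set.seed(6)
y <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 5000)
p <- pacf(y, plot = FALSE)$acf   # p[1], p[2], ... are lags 1, 2, ...
round(p[1:4], 2)   # lags 1 and 2 sizable; lags 3 and 4 near zero
```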
To verify the above, note that the PACF is basically the last coefficient obtained when
minimizing the squared error

MSE = E[ ( y_{t+h} − Σ_{k=1}^{h} a_k y_{t+h−k} )² ].

Setting the derivatives with respect to a_j equal to zero leads to the equations

E[ ( y_{t+h} − Σ_{k=1}^{h} a_k y_{t+h−k} ) y_{t+h−j} ] = 0.

This can be written as

γ_y(j) − Σ_{k=1}^{h} a_k γ_y(j − k) = 0

for 1 ≤ j ≤ h. Now, it is clear that, for an AR(p), we may take a_k = φ_k for k ≤ p and a_k = 0
for k > p to get a solution of the above equations. This implies Proposition 3.7 above.
To estimate the coefficients of the pth order AR in (3.13), write the equation (3.14) as
y_t − ∑_{k=1}^p φ_k y_{t−k} = w_t
and multiply both sides by y_{t−h} for any h ≥ 0. Assuming that the mean E(y_t) = 0, and using the definition of the autocovariance function, leads to the equation
E[ y_t y_{t−h} − ∑_{k=1}^p φ_k y_{t−k} y_{t−h} ] = E[w_t y_{t−h}].
The left-hand side immediately becomes γ_y(h) − ∑_{k=1}^p φ_k γ_y(h − k). The representation (3.15) implies that
E[w_t y_{t−h}] = E[ w_t ( w_{t−h} + ψ_1 w_{t−h−1} + ψ_2 w_{t−h−2} + ··· ) ] = σ_w^2 if h = 0, and 0 otherwise.
Hence, we may write the equations for determining γ_y(h) as
γ_y(0) − ∑_{k=1}^p φ_k γ_y(−k) = σ_w^2   (3.17)
and
γ_y(h) − ∑_{k=1}^p φ_k γ_y(h − k) = 0  for h ≥ 1.   (3.18)
Note that one will need the property γ_y(−h) = γ_y(h) in solving these equations. Equations (3.17) and (3.18) are called the Yule-Walker Equations (see Yule, 1927, Walker, 1931).
Having decided on the order p of the model, it is clear that, for the estimation step, one may write the model (3.13) in the regression form
y_t = φ′ z_t + w_t,   (3.19)
where φ = (φ_1, φ_2, …, φ_p)′ is the coefficient vector and z_t = (y_{t−1}, y_{t−2}, …, y_{t−p})′ is the vector of explanatory (lagged) variables. Taking into account the fact that y_t is not observed for t ≤ 0, we may run the regression for t = p + 1, …, n to get estimators for φ and for σ^2, the variance of the white noise process. These so-called conditional maximum likelihood estimators are commonly used because the exact maximum likelihood estimators involve solving nonlinear equations; see Chapter 5 in Hamilton (1994) for details and we will discuss this issue later.
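Both routes just described, solving the Yule-Walker equations (3.17)-(3.18) with sample autocovariances and running the conditional regression (3.19), can be sketched in a few lines of R; the AR(2) coefficients below are illustrative choices, not taken from these notes:

```r
# Simulate an AR(2) series with illustrative coefficients phi = (0.6, 0.2)
set.seed(42)
y <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 2000)

# Yule-Walker: solve (3.17)-(3.18) using sample autocovariances gamma_y(0..2)
g <- acf(y, lag.max = 2, type = "covariance", plot = FALSE)$acf
G <- matrix(c(g[1], g[2], g[2], g[1]), 2, 2)  # Toeplitz matrix of gamma_y
phi_yw <- solve(G, g[2:3])

# Conditional least squares: regress y_t on (y_{t-1}, y_{t-2}) as in (3.19)
n <- length(y)
fit_ols <- lm(y[3:n] ~ y[2:(n - 1)] + y[1:(n - 2)] - 1)

phi_yw           # Yule-Walker estimates
coef(fit_ols)    # regression estimates; both should be close to (0.6, 0.2)
```

The built-in function ar(y, method = "yw") automates the Yule-Walker route, including the order selection by AIC.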
3.5.4 MA(q)
We may also consider processes that contain linear combinations of underlying unobserved shocks, say, represented by a white noise series w_t. These moving average components generate a series of the form
y_t = w_t − ∑_{k=1}^q θ_k w_{t−k},   (3.20)
where q denotes the order of the moving average component and θ_k (1 ≤ k ≤ q) are parameters to be estimated. Using the back-shift notation, the above equation can be written in the form
y_t = θ(L) w_t  with  θ(L) = 1 − ∑_{k=1}^q θ_k L^k,   (3.21)
where θ(L) is another polynomial in the shift operator L. It should be noted that the MA process of order q is a linear process of the form considered earlier with ψ_0 = 1, ψ_1 = −θ_1, …, ψ_q = −θ_q. This implies that the ACF will be zero for lags larger than q because the terms in the covariance function will all be zero. Specifically, the exact forms are
γ_y(0) = σ_w^2 ( 1 + ∑_{k=1}^q θ_k^2 )
and
γ_y(h) = σ_w^2 ( −θ_h + ∑_{k=1}^{q−h} θ_{k+h} θ_k )   (3.22)
for 1 ≤ h ≤ q − 1, with γ_y(q) = −σ_w^2 θ_q, and γ_y(h) = 0 for h > q. Hence, we have the following property of the ACF for MA series.
Property 3.8: For a moving average series of order q, the autocorrelation function (ACF) is zero for lags h > q, i.e. ρ_y(h) = 0 for h > q. Such a result enables us to diagnose the order of a moving average component by examining ρ_y(h) and choosing q as the value beyond which the coefficients are essentially zero.
Fitting the pure moving average term turns into a nonlinear problem, as we can see by noting that either maximum likelihood or regression involves solving (3.20) or (3.21) for w_t and minimizing the sum of the squared errors. Suppose that the roots of θ(L) = 0 are all outside the unit circle; then this is possible by solving π(L) θ(L) = 1, so that, for the vector parameter θ = (θ_1, …, θ_q)′, we may write
w_t(θ) = π(L) y_t   (3.23)
and minimize SSE(θ) = ∑_{t=q+1}^n w_t^2(θ) as a function of the vector parameter θ. We do not really need to find the operator π(L) but can simply solve (3.23) recursively for w_t, with w_1 = w_2 = ··· = w_q = 0, and
w_t(θ) = y_t + ∑_{k=1}^q θ_k w_{t−k}
for q + 1 ≤ t ≤ n. It is easy to verify that SSE(θ) will be a nonlinear function of θ_1, θ_2, …, θ_q. However, note that, by the Taylor expansion,
w_t(θ) ≈ w_t(θ_0) + ( ∂w_t(θ)/∂θ |_{θ_0} )′ (θ − θ_0),
where the derivative is evaluated at the previous guess θ_0. Rearranging the above equation leads to
w_t(θ_0) ≈ −( ∂w_t(θ)/∂θ |_{θ_0} )′ (θ − θ_0) + w_t(θ),
which is just a regression model. Hence, we can begin with an initial guess θ_0 = (0.1, 0.1, …, 0.1)′, say, and successively minimize SSE(θ) until convergence. See Chapter 5 in Hamilton (1994) for details and we will discuss this issue later.
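The recursive computation of w_t(θ) and the minimization of SSE(θ) described above can be sketched in R for the MA(1) case; the true value θ_1 = 0.5 is an illustrative choice, and in practice one would simply call arima():

```r
set.seed(1)
theta_true <- 0.5
# Note: arima.sim() writes the MA part with a plus sign, so ma = -theta_true
# corresponds to y_t = w_t - 0.5 w_{t-1} in the notation of (3.20)
y <- arima.sim(model = list(ma = -theta_true), n = 1000)

# Conditional sum of squares: solve (3.23) recursively with w_1 = 0
sse <- function(theta, y) {
  n <- length(y); w <- numeric(n)
  for (t in 2:n) w[t] <- y[t] + theta * w[t - 1]  # w_t = y_t + theta_1 w_{t-1}
  sum(w[2:n]^2)
}
opt <- optimize(sse, interval = c(-0.99, 0.99), y = y)
opt$minimum   # conditional least-squares estimate of theta_1, near 0.5
```

For q > 1 the same recursion applies with a vector θ, and optimize() is replaced by optim() (or by the Gauss-Newton iteration sketched above).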
Forecasting: In order to forecast a moving average series, note that
y_{t+h} = w_{t+h} − ∑_{k=1}^q θ_k w_{t+h−k}.
The results below (3.28) imply that y^t_{t+h} = 0 if h > q and, if h ≤ q,
y^t_{t+h} = −∑_{k=h}^q θ_k w_{t+h−k},
where the w_t values needed for the above are computed recursively as before. Because of (3.15), it is clear that ψ_0 = 1 and ψ_k = −θ_k for 1 ≤ k ≤ q, and these values can be substituted directly into the variance formula (3.31). That is,
P^t_{t+h} = σ_w^2 ( 1 + ∑_{k=1}^{h−1} θ_k^2 ).
3.5.5 AR(∞) Process
Under the condition that the roots of the moving average polynomial θ(z) lie outside the unit circle, (3.12) can be rewritten as:
[φ(L)/θ(L)] y_t = e_t,  or  B(L) y_t = e_t,  or  ∑_{h=0}^∞ b_h y_{t−h} = e_t,
where b_1, b_2, … are appropriately defined functions of the φ's and θ's.
3.5.6 MA(∞) Process
Under the condition that the roots of the autoregressive polynomial φ(z) lie outside the unit circle, we can rewrite (3.12) as:
y_t = [θ(L)/φ(L)] e_t = A(L) e_t = ∑_{h=0}^∞ a_h e_{t−h},
where A(L) = θ(L) φ(L)^{−1} = 1 + a_1 L + a_2 L^2 + ··· and the parameters a_1, a_2, … are appropriately defined functions of the φ's and θ's. This model is also called a linear process in the stochastic processes literature.
3.5.7 ARIMA Processes
The acronym ARIMA(p,1,q) is used for a process {y_t} when it is non-stationary but its first differences, {y_t − y_{t−1}}, follow a stationary ARMA(p,q) process. The additional letter I states that the process {y_t} is integrated, while the numeral 1 indicates that only one application of differencing is required to achieve stationarity.
3.5.8 ARFIMA Process
An ARMA(p,q) process can be described as
φ(L) y_t = θ(L) e_t.
The ARFIMA(p,d,q) process can be written as
(1 − L)^d φ(L) y_t = θ(L) e_t,  or  y_t = (1 − L)^{−d} φ(L)^{−1} θ(L) e_t.
This ARFIMA process is stationary when d < 0.5. Assuming that d is positive, it is a special case of a long memory process, or a fractional process, or a long range dependent time series.
The letter d means that the process {y_t} is fractional with the index d, which is called the long memory parameter; H = d + 1/2 is called the Hurst parameter; see Hurst (1951). This is a very big area and there are a lot of research activities in it. Long memory processes have been widely used in financial applications, such as modeling the relationship between the implied and realized volatilities; see the survey paper by Andersen, Bollerslev, Christoffersen and Diebold (2005).
Long memory time series have been a popular area of research in economics, finance, statistics and other applied fields, such as the hydrological sciences, during recent years. Long memory dependence was first observed by the hydrologist Hurst (1951) when analyzing the minimal water flow of the Nile River in planning the Aswan Dam, and an intensive discussion of the application of long memory dependence in economics and its consequences was initiated by Granger (1966). Here we only briefly discuss some of the most useful time series models in the literature. For more details about the aforementioned models, please read the books by Brockwell and Davis (1991) and Hamilton (1994).
Exercises: Please use the Monte Carlo simulation method to generate data from the above models and make graphs to see what conclusions you can draw from them.
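As a starting point for this exercise (a sketch only, with arbitrary parameter values chosen for illustration), the base-R function arima.sim() generates AR, MA, and ARMA samples whose sample paths and ACFs can then be compared:

```r
set.seed(123)
n <- 500
y_ar   <- arima.sim(model = list(ar = 0.7), n = n)             # AR(1), phi = 0.7
y_ma   <- arima.sim(model = list(ma = 0.4), n = n)             # MA(1)
y_arma <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = n)   # ARMA(1,1)

# Sample path and ACF for each simulated series, one row per model
oldpar <- par(mfrow = c(3, 2))
for (s in list(y_ar, y_ma, y_arma)) { plot.ts(s); acf(s) }
par(oldpar)
```

The ACF of the MA(1) sample should cut off after lag 1, while the AR(1) and ARMA(1,1) ACFs decay geometrically, which is exactly the identification logic of Propositions 3.7 and 3.8.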
Applications
The usage of the function fracdiff() is
fracdiff(x, nar = 0, nma = 0,
ar = rep(NA, max(nar, 1)), ma = rep(NA, max(nma, 1)),
dtol = NULL, drange = c(0, 0.5), h, M = 100)
This function can be used to compute the maximum likelihood estimators of the parameters of a fractionally-differenced ARIMA(p, d, q) model, together (if possible) with their estimated covariance and correlation matrices and standard errors, as well as the value of the maximized likelihood. The likelihood is approximated using the fast and accurate method of Haslett and Raftery (1989). To generate simulated long-memory time series data from the fractional ARIMA(p, d, q) model, we can use the function fracdiff.sim(), whose usage is
fracdiff.sim(n, ar = NULL, ma = NULL, d,
rand.gen = rnorm, innov = rand.gen(n+q, ...),
n.start = NA, allow.0.nstart = FALSE, ..., mu = 0.)
An alternative way to simulate a long memory time series is to use the function arima.sim().
The manual for the package fracdiff can be downloaded from the web site at
http://cran.cnr.berkeley.edu/doc/packages/fracdiff.pdf
The function spec.pgram() in R calculates the periodogram using a fast Fourier transform, and optionally smooths the result with a series of modified Daniell smoothers (moving averages giving half weight to the end values). The usage of this function is
spec.pgram(x, spans = NULL, kernel, taper = 0.1,
pad = 0, fast = TRUE, demean = FALSE, detrend = TRUE,
plot = TRUE, na.action = na.fail, ...)
We can also use the function spectrum() to estimate the spectral density of a time series; its usage is
spectrum(x, ..., method = c("pgram", "ar"))
Finally, it is worth pointing out that there is a package called longmemo for long-memory processes, which can be downloaded from
http://cran.cnr.berkeley.edu/doc/packages/longmemo.pdf. This package also provides a simple periodogram estimate via the function per() and other functions, such as llplot() and lxplot(), for making graphs of the spectral density. See the manual for details.
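A minimal sketch of spec.pgram() on a simulated AR(1) series (the AR coefficient and the smoothing spans are arbitrary choices); for a long-memory series the same plot would instead show the spectral density diverging at frequency zero:

```r
set.seed(7)
y <- arima.sim(model = list(ar = 0.7), n = 1024)

# Raw and smoothed periodograms; 'spans' requests modified Daniell smoothers
sp_raw    <- spec.pgram(y, taper = 0, plot = FALSE)
sp_smooth <- spec.pgram(y, spans = c(7, 7), taper = 0, plot = FALSE)

# For a stationary AR(1) with phi > 0, the spectrum peaks near frequency 0
sp_smooth$freq[which.max(sp_smooth$spec)]
```

Plotting log(sp_smooth$spec) against sp_smooth$freq reproduces the kind of display shown in the bottom panels of Figure 3.4.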
Example: As an illustration, Figure 3.4 shows the sample ACFs of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes from July 3, 1962 to December 31, 1997, and the sample partial autocorrelation functions of the absolute series of daily simple returns for the CRSP value-weighted (left middle panel) and equal-weighted (right middle panel) indexes. The ACFs are relatively small in magnitude but decay very slowly; they appear to be significant at the 5% level even after 300 lags. Only the first few lags of the PACFs fall outside the confidence interval, and the rest are basically within it. For more information about the behavior of the sample ACF of absolute return series, see Ding, Granger, and Engle (1993). To estimate the long memory parameter d, we can use the function fracdiff() in the package fracdiff in R; the results are d̂ = 0.1867 for the absolute
Figure 3.4: Sample autocorrelation function of the absolute series of daily simple returns
for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes.
Sample partial autocorrelation function of the absolute series of daily simple returns for the
CRSP value-weighted (left middle panel) and equal-weighted (right middle panel) indexes.
The log smoothed spectral density estimation of the absolute series of daily simple returns
for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel)
indexes.
returns of the value-weighted index and d̂ = 0.2732 for the absolute returns of the equal-weighted index. To support our conclusion above, we plot the log smoothed spectral density estimates of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes. They show clearly that both log spectral densities decay like a log function, which supports the long-memory behavior of the spectral densities.
3.6 R Commands
Classical time series functionality in R is provided by the arima() and KalmanLike() commands in the basic R distribution. The dse package provides a variety of more advanced estimation methods; fracdiff can estimate fractionally integrated series; longmemo covers related material. For volatility modeling, the standard GARCH(1,1) model can be estimated with the garch() function in the tseries package. Unit root and cointegration tests are provided by tseries, urca and uroot. The Rmetrics bundle, comprised of the fArma, fAsianOptions, fAssets, fBasics, fBonds, fCalendar, fCopulae, fEcofin, fExoticOptions, fExtremes, fGarch, fImport, fMultivar, fNonlinear, fOptions, fPortfolio, fRegression, fSeries, fTrading, fUnitRoots and fUtilities packages, contains a very large number of relevant functions for different aspects of empirical and computational finance, including a number of estimation functions for ARMA, GARCH, long memory models, unit roots and more. The ArDec package implements autoregressive time series decomposition in a Bayesian framework. The dyn and dynlm packages are suitable for dynamic (linear) regression models. Several packages provide wavelet analysis functionality: rwt, wavelets, waveslim, wavethresh. Some methods from chaos theory are provided by the package tseriesChaos. For more details, please see the file on the web site at http://www.math.uncc.edu/~zcai/CRAN-Finance.html or http://cran.cnr.berkeley.edu/src/contrib/Views/Finance.html, which is downloadable.
3.7 Regression Models With Correlated Errors
See my lecture notes on Advanced Topics in Analysis of Economic and Financial Data Using R and SAS, which can be downloaded from http://www.math.uncc.edu/~zcai/cai-notes.pdf
3.8 Comments on Nonlinear Models and Their Applications
All aforementioned models are basically linear, but we have not touched on nonlinear time series models, which require much deeper statistical knowledge. Indeed, during the last two decades there have been a lot of research activities on nonlinear models and their applications, particularly in finance; see Tsay (2005, Chapter 4) and Fan and Yao (2003). Also, see Chapter 12 of Gourieroux and Jasiak (2001) and Chapter 16 of Taylor (2005) for nonlinear models in finance.
3.9 Problems
3.9.1 Problems
1. Download weekly (daily) price data for any stock or index, for example, Microsoft (MSFT) stock prices (P_t) for 03/13/86 - 1/15/2008. It is okay if you download other stocks and indices.
(a) Compute the mean (μ̂), standard deviation (σ̂), skewness (Ŝ_k), and kurtosis (K̂_r) for Microsoft stock returns. Comment on your findings on the skewness and kurtosis of Microsoft stock returns. Are your results as expected?
# Mean, Variance:
rt=rnorm(100)
mean(rt)
var(rt)
# Skewness, Kurtosis:
library(fUtilities) # call library -- fUtilities
skewness(rt)
kurtosis(rt)
(b) Use the constant expected returns (CER) model to simulate a sample of artificial data:
r_t = μ + e_t,  1 ≤ t ≤ T,  e_t ∼ N(0, σ^2).
In generating the artificial sample, set μ equal to the sample mean of Microsoft returns and σ^2 equal to the sample variance of Microsoft returns, i.e. μ = μ̂, σ^2 = σ̂^2. Use the R random number generator to generate error terms {e_t} for 1 ≤ t ≤ T for different values of T. Generate the artificial sample of returns and prices using the above model. For generating prices, set p_0 = 1.
(c) If the CER model is a good model to describe stock market returns, then the simulated (artificial) sample of returns should have the same properties (mean, variance, skewness, kurtosis, persistence) as the sample of Microsoft stock market returns.
(i) Compare the mean from the simulated sample and the sample mean of Microsoft returns.
(ii) Compare the variance from the simulated sample and the sample variance of Microsoft returns.
(iii) Compare the skewness from the simulated sample and the sample skewness of Microsoft returns.
(iv) Compare the kurtosis from the simulated sample and the sample kurtosis of Microsoft returns.
(v) Can the CER model explain the excess kurtosis of stock returns?
What you need is to simulate 1000 times for the given sample size. For each sample, compute the sample mean, sample variance, sample skewness, and sample kurtosis, and then compute the median of each of them as the estimated mean, variance, skewness and kurtosis from the simulated model. Finally, compare the estimated values from the simulated model with the true values from the real data.
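A sketch of the Monte Carlo design just described (the values of mu.hat and sig2.hat are hypothetical stand-ins for the MSFT sample moments, and skewness and kurtosis are computed by hand rather than via fUtilities so that the sketch is self-contained):

```r
set.seed(99)
mu.hat <- 0.0012; sig2.hat <- 0.0004   # hypothetical stand-ins for MSFT moments
T <- 1000; nsim <- 1000

skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
kurt <- function(x) mean((x - mean(x))^4) / sd(x)^4

# Each replication: draw a CER sample r_t = mu + e_t and record four moments
stats <- replicate(nsim, {
  rt <- mu.hat + rnorm(T, sd = sqrt(sig2.hat))
  c(mean(rt), var(rt), skew(rt), kurt(rt))
})
apply(stats, 1, median)  # median mean, variance, skewness, kurtosis across draws
```

Because the CER errors are Gaussian, the simulated kurtosis concentrates near 3, which is the point of part (v): the CER model cannot generate the excess kurtosis typically seen in stock returns.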
2. Estimate the CER model for Microsoft using OLS estimation.
(a) Use the t-statistic to test the null hypothesis that the mean of Microsoft returns is zero.
(b) Look at the coefficient of determination R^2. What can you say about the fit of the CER model for Microsoft returns?
fit=lm(rt~1) # fit a constant term of regression model
print(summary(fit)) # print the results on the screen
(c) Use the CER model to form the forecast of r_{T+1} given all the information up to period T, i.e. r_{T+1|T}.
(d) Use the CER model to form the forecast r_{T+2|T}.
Please think about how to do forecasting for a CER model.
3. Estimate the AR(1) process of the following form for Microsoft stock returns:
r_t = φ r_{t−1} + e_t,  t = 1, …, T.
(a) Estimate the model using OLS.
(b) Use both the t-statistic based on the OLS estimate and the ADF test to test the hypothesis of a unit root for this model.
(c) Test the null hypothesis that φ is zero.
(d) Look at the coefficient of determination R^2. What can you say about the fit of the AR(1) model without drift for Microsoft returns?
n=length(rt) # rt is the series of returns
y1<-rt[2:n]
x1<-rt[1:(n-1)]
fit=lm(y1~-1+x1) # fit an AR(1) model without intercept
# Alternatively, you can use the command ar() to fit AR(p) using
fit1=ar(rt) # Let AIC select automatically the best model
print(summary(fit)) # print the results on the screen
(e) Use the AR(1) model without drift to form the forecast r_{T+1|T}. Write down the formula.
(f) Use the AR(1) model without drift to form the forecast r_{T+2|T}. Write down the formula.
See Section 3.9.2 for R codes for predictions.
4. Estimate the AR(1) process with drift for Microsoft stock returns:
r_t = μ + φ r_{t−1} + e_t,  1 ≤ t ≤ T.
(a) Estimate the model using OLS.
(b) Use both the t-statistic based on the OLS estimate and the ADF test to test the hypothesis of a unit root for this model.
(c) Find the estimate of the first autocorrelation coefficient of the error term (you need to use the residuals ê_t = r_t − μ̂ − φ̂ r_{t−1}). Test the null hypothesis that this coefficient of ê_t is zero.
(d) Look at the coefficient of determination R^2. What can you say about the fit of the AR(1) model with drift for Microsoft returns?
(e) Use the AR(1) model with drift to form the forecast r_{T+1|T}. Write down the formula and make the computations in R. For details, see Section 3.9.2.
(f) Use the AR(1) model with drift to form the forecast r_{T+2|T}. Write down the formula and make the computations in R. For details, see Section 3.9.2.
5. Estimate the AR(p) process with drift for Microsoft stock returns:
r_t = μ + φ_1 r_{t−1} + ··· + φ_p r_{t−p} + e_t,  1 ≤ t ≤ T.
(a) Estimate the model using OLS. Explain how you choose the lag length p.
(b) Test the null hypothesis that all autoregressive parameters are simultaneously equal to zero, i.e. H_0: φ_1 = ··· = φ_p = 0.
(c) Test the null hypothesis that μ = 0.
(d) What do these tests tell you about the predictability of MSFT stock returns using an AR(p) model?
n=length(rt) # rt the series for some return
p<-??? # set up the number of lags
y1<-rt[(p+1):n]
xx<-rep(1,p*(n-p))
dim(xx)=c(n-p,p)
for(i in 1:p){
xx[,i]<-rt[i:(n-p+i-1)]
}
fit=lm(y1~xx)
# fit an AR(p) model with an intercept
3.9.2 R Code
Predictions
# 2-16-2008
graphics.off()
data=read.csv(file="c:/zcai/res-teach/econ6219/Bank-of-America.csv",header=T)
x=data[,5] # get the closing prices
x=rev(x) # reverse order of observations
n=length(x) # sample size
rt=diff(log(x)) # log return
n1=length(rt)
# do prediction
m=20 # leave the last m observations for prediction
# One-Step Ahead Forecasting
pred_1=rep(0,m)
for(i in 1:m){
fit1=arima0(rt[1:(n1-m+i-1)],order=c(1,0,0)) # fit an AR(1) model
pred0=predict(fit1,n.ahead=1)
pred_1[i]=pred0$pred[1] # compute predicted values
}
print(c("One-Step Ahead Forecasting"))
print(pred_1)
# Two-Step Ahead Forecasting
pred_2=rep(0,m)
for(i in 1:m){
fit1=arima0(rt[1:(n1-m+i-2)],order=c(1,0,0)) # fit an AR(1) model
pred0=predict(fit1,n.ahead=2) # two-step ahead forecasting
pred_2[i]=pred0$pred[2]
}
print(c("Two-Step Ahead Forecasting"))
print(pred_2)
3.10 Appendix A: Linear Forecasting
Assume that the record of an AR(1) process y_t contains observations up to time T and we wish to predict the unknown future value y_{T+H} that is H steps ahead. H is called the forecast horizon.
Proposition 3.9: If y_t is an AR(1) process, the linear forecast at horizon H is
LE[y_{T+H} | Y_T] = φ^H y_T,
while the corresponding forecast error is
e_T(H) = y_{T+H} − φ^H y_T.
When the forecast horizon increases, the accuracy of the forecast deteriorates. The relative forecast accuracy can be measured by the ratio:
1 − Var(e_T(H)) / Var(y_{T+H}) = φ^{2H}.
Since we use an estimate φ̂_T in practice, the empirical forecast for H steps ahead is
ŷ_{T+H} = φ̂_T^H y_T,
and the associated prediction interval is
[ ŷ_{T+H} ± 2 σ̂_T ( (1 − φ̂_T^{2H}) / (1 − φ̂_T^2) )^{1/2} ].
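Proposition 3.9 is easy to check numerically against R's own state-space forecasts; a minimal sketch (φ = 0.8 and H = 5 are illustrative choices), where predict() should reproduce both φ̂_T^H y_T and the two-standard-error band above:

```r
set.seed(5)
phi <- 0.8
y <- arima.sim(model = list(ar = phi), n = 500)

fit   <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
phi.T <- as.numeric(coef(fit)["ar1"])   # estimated AR coefficient
sig.T <- sqrt(fit$sigma2)               # estimated innovation s.d.

H <- 5
y.fore <- phi.T^H * y[500]              # phi_hat^H * y_T
band   <- 2 * sig.T * sqrt((1 - phi.T^(2 * H)) / (1 - phi.T^2))
c(forecast = y.fore, lower = y.fore - band, upper = y.fore + band)

# predict() applies the same recursion internally
p <- predict(fit, n.ahead = H)
```

The point forecast p$pred[H] matches phi.T^H * y[500], and p$se[H] matches band/2, which is exactly the formula in Proposition 3.9.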
3.11 Appendix B: Forecasting Based on AR(p) Model
Time series analysis has proved to be a fairly good way of producing forecasts. Its drawback is that it is typically not conducive to structural or economic analysis of the forecast. The model has forecasting power only if the future variable being forecasted is related to current values of the variables that we include in the model.
The goal is to forecast the variable y_s based on a set of variables X_t (X_t may consist of the lags of the variable y_t). Let y^t_s denote a forecast of y_s based on X_t. A quadratic loss function is the same as in OLS regression, i.e. choose y^t_s to minimize E(y^t_s − y_s)^2, and the mean squared error (MSE) is defined as MSE(y^t_s) = E[(y^t_s − y_s)^2 | X_t]. It can be shown that the forecast with the smallest MSE is the expectation of y_s conditional on X_t, that is, y^t_s = E(y_s | X_t). Then, the MSE of the optimal forecast is the conditional variance of y_s given X_t, that is, Var(y_s | X_t).
We now consider the class of forecasts that are linear projections. These forecasts are used very often in empirical analysis of time series data. There are two conditions for the forecast y^t_s to be a linear projection: (1) the forecast y^t_s needs to be a linear function of X_t, that is, y^t_s = E(y_s | X_t) = β′ X_t, and (2) the coefficients β should be chosen in such a way that E[(y_s − β′ X_t) X_t′] = 0. The forecast β′ X_t satisfying (1) and (2) is called the linear
projection of y_s on X_t. One of the reasons linear projections are popular is that the linear projection produces the smallest MSE among the class of linear forecasting rules.
Finally, we give a general approach to forecasting for any process that can be written in the form (3.15), a linear process. This includes the AR, MA and ARMA processes. We begin by defining an h-step forecast of the process y_t as
y^t_{t+h} = E[y_{t+h} | y_t, y_{t−1}, …].   (3.24)
For an AR(p) model
y_t = α + φ_1 y_{t−1} + ··· + φ_p y_{t−p} + e_t,
the one-step ahead forecasting formula in (3.24) becomes
y^t_{t+1} = E[y_{t+1} | y_t, y_{t−1}, …] = α + φ_1 y_t + ··· + φ_p y_{t−p+1},   (3.25)
and the two-step ahead forecasting formula in (3.24) is
y^t_{t+2} = E[y_{t+2} | y_t, y_{t−1}, …] = α + φ_1 E[y_{t+1} | y_t, y_{t−1}, …] + φ_2 y_t + ··· + φ_p y_{t−p+2}
          = α + φ_1 y^t_{t+1} + φ_2 y_t + ··· + φ_p y_{t−p+2}.   (3.26)
A general formula for h-step ahead forecasting can be expressed as
y^t_{t+h} = α + φ_1 y^t_{t+h−1} + ··· + φ_{h−1} y^t_{t+1} + φ_h y_t + ··· + φ_p y_{t+h−p},  if h ≤ p,
y^t_{t+h} = α + φ_1 y^t_{t+h−1} + ··· + φ_p y^t_{t+h−p},  if h > p.   (3.27)
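The recursion (3.27) can be coded directly; in the sketch below, the helper name ar_forecast and the AR(2) coefficients are illustrative (not from the notes), and values at or before time t are treated as known while future values are replaced by their forecasts:

```r
set.seed(11)
alpha <- 0.1; phi <- c(0.5, 0.3)
# AR(2) with intercept alpha has mean alpha / (1 - phi_1 - phi_2)
y <- arima.sim(model = list(ar = phi), n = 300) + alpha / (1 - sum(phi))

# h-step forecasts via (3.27): lagged terms come from the observed sample
# when the lag index is <= t, and from earlier forecasts otherwise
ar_forecast <- function(y, alpha, phi, h) {
  p <- length(phi); n <- length(y)
  path <- c(as.numeric(y), numeric(h))
  for (j in 1:h) path[n + j] <- alpha + sum(phi * path[n + j - (1:p)])
  path[n + (1:h)]
}
fc <- ar_forecast(y, alpha, phi, h = 4)
fc   # forecasts for horizons 1 through 4
```

As h grows, the forecasts converge to the unconditional mean α/(1 − φ_1 − ··· − φ_p), which is the usual mean-reversion property of stationary AR forecasts.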
Note that this is not exactly right because we only have y_1, y_2, …, y_t available, so that conditioning on the infinite past is only an approximation. From this definition, it is reasonable to intuit that y^t_s = y_s for s ≤ t and
E[w_s | y_t, y_{t−1}, …] = E[w_s | w_t, w_{t−1}, …] = w^t_s = w_s   (3.28)
for s ≤ t. For s > t, use y^t_s and
E[w_s | y_t, y_{t−1}, …] = E[w_s | w_t, w_{t−1}, …] = w^t_s = E(w_s) = 0,   (3.29)
since w_s will be independent of past values of w_t. We define the h-step forecast variance as
P^t_{t+h} = E[(y_{t+h} − y^t_{t+h})^2 | y_t, y_{t−1}, …].   (3.30)
To develop an expression for this mean square error, note that, with ψ_0 = 1, we can write
y_{t+h} = ∑_{k=0}^∞ ψ_k w_{t+h−k}.
Then, since w^t_{t+h−k} = 0 for t + h − k > t, i.e. k < h, we have
y^t_{t+h} = ∑_{k=0}^∞ ψ_k w^t_{t+h−k} = ∑_{k=h}^∞ ψ_k w_{t+h−k},
so that the residual is
y_{t+h} − y^t_{t+h} = ∑_{k=0}^{h−1} ψ_k w_{t+h−k}.
Hence, the mean square error (3.30) is just the variance of a linear combination of independent zero-mean errors with common variance σ_w^2:
P^t_{t+h} = σ_w^2 ∑_{k=0}^{h−1} ψ_k^2.   (3.31)
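The ψ-weights needed in (3.31) can be computed in R with ARMAtoMA(); a small sketch with illustrative ARMA(1,1) parameters (note that R writes the MA polynomial with a plus sign, so its ma coefficient equals −θ_1 in the notation of these notes):

```r
phi <- 0.7; theta <- 0.4; sig2 <- 1   # illustrative parameters, R's sign convention
h <- 5

# psi_1, ..., psi_{h-1}; psi_0 = 1 is implicit
psi <- ARMAtoMA(ar = phi, ma = theta, lag.max = h - 1)

# h-step forecast variance (3.31): sigma_w^2 * sum_{k=0}^{h-1} psi_k^2
P_h <- sig2 * (1 + sum(psi^2))
P_h
```

For this ARMA(1,1), the weights satisfy ψ_k = (φ + θ) φ^{k−1}, which the test below checks against ARMAtoMA().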
For more discussions, see Hamilton (1994, Chapter 4).
The R code for doing prediction is given by the following examples:
pred1=predict(arima(lh, order=c(3,0,0)), n.ahead = 12)
fit1=arima(USAccDeaths, order=c(0,1,1), seasonal=list(order=c(0,1,1)))
pred2=predict(fit1, n.ahead = 6)
Alternatively, you can use the functions arima0() and predict().
3.12 Appendix C: Random Variables
One Variable
The level of a stock market index on the following day may be regarded as a random variable. For any random variable X, with possible outcomes that may range across all real numbers, the cumulative distribution function (cdf) F(·) is defined as the probability of an outcome at a particular level, or lower, as F(x) = P(X ≤ x), with P(·) referring to the probability of the bracketed event. The probability distribution function f(·) of a discrete random variable satisfies:
f(x) = P(X = x),  f(x) ≥ 0,  ∑_{x=−∞}^∞ f(x) = 1,  F(x) = ∑_{u=−∞}^x f(u),
and F(·) is not a differentiable function of x. Most random variables are continuous and their cdf is differentiable. The density function (pdf) f(·) of a continuous variable is f(x) = dF(x)/dx, with
f(x) ≥ 0,  ∫_{−∞}^∞ f(x) dx = 1,  F(x) = ∫_{−∞}^x f(t) dt.
The probability of an outcome within a short interval of width δ, from x − δ/2 to x + δ/2, is approximately δ f(x), while the exact probability for a given interval from a to b is given by
P(a ≤ X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx.
The expectation or mean of a continuous random variable X is defined by
E(X) = ∫_{−∞}^∞ x f(x) dx
if the integral exists. For any function Y = g(X) of a random variable X, the expectation is defined as
E(g(X)) = ∫_{−∞}^∞ g(x) f(x) dx
if the integral exists. The variance of X is defined as follows:
Var(X) = σ^2 = ∫_{−∞}^∞ (x − μ)^2 f(x) dx.
The mean and variance are two key measures used to characterize the features of a distribution, and they are widely used in practice. But please note that the mean and variance cannot completely determine a distribution.
Normal and Lognormal Distributions
The normal (or Gaussian) distribution, denoted X ∼ N(μ, σ^2), is one of the most important continuous distributions in applications. The normal density function is:
f(x) = (1 / (σ √(2π))) exp( −(x − μ)^2 / (2σ^2) ),
which is also called the bell-shaped curve. This density has two parameters: the mean μ and variance σ^2. It is well known that
X ∼ N(μ, σ^2)  if and only if  Z = (X − μ)/σ ∼ N(0, 1),
where the variable Z is called the standard normal distribution. A positive random variable Y has a lognormal distribution whenever log(Y) has a normal distribution. When log(Y) ∼ N(μ, σ^2), the density of Y is
f(y) = (1 / (y σ √(2π))) exp( −(log(y) − μ)^2 / (2σ^2) ),  y > 0,
and f(y) = 0 for y ≤ 0. For this variable, it is easy to show that E[Y^n] = exp(nμ + n^2 σ^2 / 2) for all n. As a result, the mean and variance are given by:
E[Y] = exp( μ + σ^2/2 ),  and  Var(Y) = exp( 2μ + σ^2 ) [ exp(σ^2) − 1 ].
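The lognormal moment formulas above are easily confirmed by simulation; a small sketch with illustrative values μ = 0 and σ = 0.5:

```r
set.seed(2024)
mu <- 0; sigma <- 0.5
y <- rlnorm(1e6, meanlog = mu, sdlog = sigma)   # log(y) ~ N(mu, sigma^2)

mean_theory <- exp(mu + sigma^2 / 2)                       # E[Y]
var_theory  <- exp(2 * mu + sigma^2) * (exp(sigma^2) - 1)  # Var(Y)

c(mean(y), mean_theory)   # simulated vs theoretical mean
c(var(y), var_theory)     # simulated vs theoretical variance
```

With a million draws the simulated moments agree with the formulas to two or three decimal places.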
Question: Why is the lognormal distribution useful in finance?
Multivariate Cases
Two random variables X and Y have a bivariate cumulative distribution function (cdf) that gives the probabilities of both outcomes being less than or equal to levels x and y respectively: F(x, y) = P(X ≤ x, Y ≤ y). The bivariate pdf is defined for continuous variables by f(x, y) = ∂^2 F(x, y) / ∂x ∂y.
Conditional pdf
Conditional expectation
Covariance and correlation between two variables
Independent random variables
The combination a + ∑_i b_i Y_i has a normal distribution when the component variables have a multivariate normal distribution.
A multivariate normal distribution has the pdf:
f(y) = (2π)^{−n/2} det(Σ)^{−1/2} exp( −(1/2) (y − μ)′ Σ^{−1} (y − μ) )
for vectors y = (y_1, …, y_n)′ and μ = (μ_1, …, μ_n)′, with μ_i = E(Y_i), and a matrix Σ that has elements Σ_{i,j} = Cov(Y_i, Y_j).
Question: Why are multivariate distributions important in finance?
3.13 References
Andersen, T.G., T. Bollerslev, P.E. Christoersen and F.X. Diebold (2005). Volatility and
Correlation Forecasting. In Handbook of Economic Forecasting (G. Elliott, C.W.J.
Granger and A. Timmermann, eds.). Amsterdam: North-Holland.
Brockwell, P.J. and R.A. Davis (1991). Time Series Theory and Methods. Springer, New
York.
Brockwell, P.J. and R.A. Davis (1996). Introduction to Time Series and Forecasting.
Springer, New York.
Box, G.E.P., G.M. Jenkins and G.G. Reinsel (1994). Time Series Analysis, Forecasting and
Control, (3th ed.). Englewood Clis, NJ: Prentice-Hall.
Cai, Z. (2007). Lecture Notes on Advanced Topics in Analysis of Economic and Finan-
cial Data Using R and SAS. The web link is: http://www.math.uncc.edu/ zcai/cai-
notes.pdf
Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997). The Econometrics of Financial
Markets. Princeton University Press, Princeton, NJ. (Chapter 2).
Diebold, F.X. and R.S. Mariano (1995). Comparing predictive accuracy. Journal of Busi-
ness and Economic Statistics, 13(3), 253-263.
Dickey, D.A. and W.A. Fuller (1979). Distribution of the estimators for autoregressive time
series with a unit root. Journal of the American Statistical Association, 74, 427-431.
Ding, Z., C.W.J. Granger and R.F. Engle (1993). A long memory property of stock returns
and a new model. Journal of Empirical Finance, 1, 83-106.
Haslett, J. and A.E. Raftery (1989). Space-time modelling with long-memory dependence:
Assessing Irelands wind power resource (with discussion). Applied Statistics, 38, 1-50.
Hurst, H.E. (1951). Long-term storage capacity of reservoirs. Transactions of the American
Society of Civil Engineers, 116, 770-799.
Fan, J. and Q. Yao (2003). Nonlinear Time Series: Nonparametric and Parametric Model.
Springer, New York.
Gourieroux, C. and J. Jasiak (2001). Financial Econometrics: Problems, Models, and
Methods. Princeton University Press, Princeton, NJ. (Chapter 2)
Granger, C.W.J. (1966). The typical spectral shape of an economic variable. Econometrica,
34, 150-161.
Hamilton, J. (1994). Time Series Analysis. Princeton University Press.
Hurvich, C. and C.L. Tsai (1989). Regression and time series model selection in small
samples. Biometrika, 76, 297-307.
CHAPTER 3. LINEAR TIME SERIES MODELS AND THEIR APPLICATIONS 68
Phillips, P.C.B. and P. Perron (1988). Testing for a unit root in time series regression.
Biometrika, 75, 335-346.
Schwarz, F.(1978). Estimating the dimension of a model. Annals of Statist, 6, 461464.
Sullivan, R., A. Timmermann and H. White (1999). Data snooping, technical trading rule
performance, and the bootstrap. Journal of Finance, 54, 1647-1692.
Taylor, S. (2005). Asset Price Dynamics, Volatility, and Prediction. Princeton University
Press, Princeton, NJ. (Chapter 3)
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons,
New York. (Chapter 2)
Walker, G. (1931). On the periodicity in series of related terms. Proceedings of the Royal
Society of London, Series A, 131, 518-532.
Yule, G.U. (1927). On a method of investigating periodicities in disturbed series with
special reference to Wolfer's sunspot numbers. Philosophical Transactions of the
Royal Society of London, Series A, 226, 267-298.
Zivot, E. (2002). Lecture Notes on Applied Econometric Modeling in Finance. The web
link is: http://faculty.washington.edu/ezivot/econ483/483notes.htm
Chapter 4
Predictability of Asset Returns
4.1 Introduction
4.1.1 Martingale Hypothesis
A process {y_t; t ∈ N} is a martingale (this is a mathematical term) if and only if
E_t(y_{t+1}) = y_t for all t ≥ 0, where E_t denotes the conditional expectation given the
information at period t, denoted by I_t; that is, E(y_{t+1} | I_t) = y_t. Equivalently, this
condition can be written as

    y_t = y_{t-1} + e_t,

where the process {e_t; t ≥ 0} satisfies

    E_{t-1}(e_t) = 0.                                                  (4.1)
Note that (4.1) is called the martingale difference (MD) condition: {y_t} is a martingale
if and only if {e_t = y_t - y_{t-1}} is an MD sequence. Note that condition (4.1) is stronger
than a weak white noise condition for an I(1) process; i.e., condition (4.1) is stronger than
imposing E(e_t) = 0 and Cov(e_t, e_{t-h}) = 0 for all h ≠ 0. The essence of a martingale is
the notion of a fair game, a game which is neither in your favor nor your opponent's. The
martingale condition on prices implies that the best (nonlinear) prediction of the future
price is the current price. Another aspect of the martingale hypothesis is that
non-overlapping price changes are uncorrelated at all leads and lags, which implies the
ineffectiveness of all linear forecasting rules for future price changes based on historical
prices alone. However, one of the central tenets of financial economics is the necessity of
some trade-off between risk and expected return, and although the martingale hypothesis
places a restriction on expected returns, it does not account for risk in any way. The terms
"efficient market hypothesis" and "martingale hypothesis" are equivalent.
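The fair-game property is easy to see in a simulation. The sketch below (Python is used here purely for illustration; the notes themselves work in R) builds a martingale from IID increments and checks that the increments are serially uncorrelated, as condition (4.1) implies:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 10_000
e = rng.standard_normal(T)      # MD increments: E_{t-1}(e_t) = 0
y = np.cumsum(e)                # martingale: y_t = y_{t-1} + e_t

# Under the MD condition the increments are serially uncorrelated,
# so the sample lag-1 autocorrelation should be close to zero.
rho1 = np.corrcoef(e[:-1], e[1:])[0, 1]
print(round(rho1, 3))
```

The best prediction of y_{t+1} given the history is y_t itself, which is exactly what a regression of e_{t+1} on the past would (fail to) improve upon.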
4.1.2 Tests of MD
It is important in many economic and financial studies to test whether a time series is a
martingale difference sequence. For example, the martingale version of the market efficiency
hypothesis requires the asset returns in an efficient market to follow an MD process, so that
currently available information does not help improve the forecasts of future returns; see,
e.g., Fama (1970, 1991) and LeRoy (1989). Hall (1978) also argued that changes in consumption
between any two consecutive periods should be unpredictable. The concept of MD has also been
used to define the correctness of econometric models. A time series regression model is said
to be correctly specified (for the conditional mean) if the disturbances of the model follow
an MD sequence. Therefore, a test of the MD hypothesis is useful in evaluating economic
hypotheses as well as econometric models.

It is well known that tests based on the autocorrelation function and its spectral
counterpart are not consistent against non-MD sequences that are serially uncorrelated.
The autocorrelation-based Q-tests of Box and Pierce (1970) and Ljung and Box (1978) and
the spectrum-based tests of Durlauf (1991) and Hong (1996) are leading examples. The
modified Q-test (Lobato, Nankervis, and Savin, 2001; Hong, 2001) and the modified Durlauf
test (Deo, 2000), although robust to conditional heteroskedasticity, have the same problem.
There are several consistent tests of the MD hypothesis in the literature; see, e.g., Bierens
(1982, 1984), De Jong (1996), Bierens and Ploberger (1997), Dominguez and Lobato (2000),
and Whang (2000, 2001). While consistency is an important property, these MD tests
typically suffer from the drawback that their limiting distributions are data dependent.
Implementing these tests is therefore practically cumbersome because their critical values
cannot be tabulated. An exception is the test proposed by Hong (1999); yet Hong's test
is in effect a test of pairwise independence, which is not necessary for the MD hypothesis.
To overcome the aforementioned drawbacks, Kuan and Lee (2004) proposed a class of MD
tests based on a set of unconditional moment conditions that are equivalent to the MD
hypothesis (Bierens, 1982). The test proposed by Kuan and Lee (2004) has the following
advantages relative to existing tests: it has a standard limiting distribution and is easy to
implement; it has power against a much larger class of alternatives than the commonly used
autocorrelation- and spectrum-based tests; and its validity does not rely on the
assumption of conditional homoskedasticity. These features make the proposed test a sensible
tool for testing economic and financial time series. For details, see Kuan and Lee (2004) and
the references therein.
4.2 Random Walk Hypotheses
Let P_t be the price, p_t = log(P_t) its log price, and r_t = Δp_t = p_t - p_{t-1} the log
return. That is,

    p_t = μ + p_{t-1} + e_t,    r_t = μ + e_t,

where μ is the expected price change or drift. This is the CER model discussed in Chapter 3.
4.2.1 IID Increments (RW1)
The simplest version of the random walk hypothesis (RWH) is the independent and identically
distributed (IID) increments case, in which the dynamics of {r_t} are given by the following
equation:

    r_t = μ + e_t,    e_t ~ IID(0, σ²).

Unrealistic. The independence of the increments {e_t} is much stronger than the martingale
property: independence implies not only that the increments are uncorrelated, but also that
any nonlinear functions of the increments are uncorrelated.

Note: working with log prices avoids a violation of limited liability.

This RWH will be rejected by an appropriate test if the conditional variance of returns has
sufficient variation through time, but this may tell nothing about the predictability of
returns. The statistically significant autocorrelation in absolute and squared returns rejects
the IID hypothesis, but it does not prove that returns can be predicted.
4.2.2 Independent Increments (RW2)
    r_t = μ + e_t,    e_t independent.

For this formulation, E[r_t] = E[r_{t+τ}] and Cov(r_t, r_{t+τ}) = 0 for all t and all τ > 0.

The assumption of IID increments is not plausible for financial asset prices over long time
periods. The assertion that the probability law of daily stock returns has remained unchanged
over the two-hundred-year history of the NYSE is not reasonable. Therefore, researchers relax
the assumptions of RW1 to include processes with independent but not identically distributed
increments.

RW2 is weaker than RW1 (it allows for heteroskedasticity). RW2 still has the economic
property of the IID random walk: any arbitrary transformation of future price changes is
unforecastable using any transformation of past price changes.
4.2.3 Uncorrelated Increments (RW3)
The next model is

    r_t = μ + e_t,    e_t uncorrelated.

One may also relax the assumptions of RW2 to include processes with dependent but
uncorrelated increments. This is the weakest form of the random walk hypothesis and the one
most often tested in the empirical literature; it allows for heteroskedasticity as well as
dependence in higher moments. The RW3 model contains RW1 and RW2 as special cases.
4.2.4 Unconditional Mean is the Best Predictor (RW4)
The weaker formulations of the random walk hypothesis, in particular RW3, do not rule out the
possibility that a nonlinear predictor is more accurate than the unconditional expectation.
The unconditional mean is the best predictor under RW4:

    E(r_{t+1} | I_t) = μ

for some constant μ, for all times t and return histories I_t = {r_{t-i}, i ≥ 0}.
4.3 Tests of Predictability
For testing the predictability of stock returns, which is a cornerstone question in finance,
researchers have used a variety of tests:

1. Nonparametric tests
2. Autocorrelation tests
3. Variance ratio tests
4. Tests based on trading rules

We will consider some of these tests next. For more tests, please read the book by Lo and
MacKinlay (1999) or any statistics book on this subject. Two nonlinear correlation measures
are Kendall's τ, defined as

    τ = 4 ∫∫ F(x, y) dF(x, y) - 1,

proposed by Kendall (1938), where F(x, y) is the joint cdf of (X, Y), and Spearman's ρ,
defined as

    ρ_s = 12 Cov(F_x(X), F_y(Y)),

proposed by Spearman (1904), where F_x(x) and F_y(y) are the marginal cdfs of X and Y,
respectively. For details, see the book by Nelsen (2005).
4.3.1 Nonparametric Tests
There are several nonparametric tests for testing the IID assumption of the increments.
Some examples are the Spearman rank correlation test, Spearman's footrule test, the runs
test, and the Kendall correlation test. Those tests can be found in the function
correlationTest in the package fBasics, and the runs test is built into the package tseries
with the function runs.test:

correlationTest(x, y, method = c("pearson", "kendall", "spearman"),
                title = NULL, description = NULL)
runs.test(x, alternative = c("two.sided", "less", "greater"))

A brief description of the Pearson, Spearman rank, and Kendall correlation tests is given
below.
A. Pearson correlation test: The test statistic is

       r / √[(1 - r²)/(n - 2)] ~ t_{n-2} under H_0,

   which has a t-distribution with n - 2 degrees of freedom, where r is the sample
   correlation coefficient

       r = Σ_{t=1}^n (x_t - x̄)(y_t - ȳ) / [Σ_{t=1}^n (x_t - x̄)² Σ_{t=1}^n (y_t - ȳ)²]^{1/2}.

   Here a Gaussian distribution is assumed.
B. Spearman's rank correlation test: The Pearson correlation is unduly influenced by
   outliers, unequal variances, non-normality, and nonlinearity. An important competitor
   of the Pearson correlation coefficient is Spearman's rank correlation coefficient. This
   latter correlation is calculated by applying the Pearson correlation formula to the ranks
   of the data rather than to the actual data values themselves. In so doing, many of the
   distortions that plague the Pearson correlation are reduced considerably. The Pearson
   correlation measures the strength of the linear relationship between x and y; in the case
   of a nonlinear but monotonic relationship, a useful measure is Spearman's rank correlation
   coefficient.

   Spearman's rank correlation test is a test for correlation between a sequence of pairs of
   values. Using ranks eliminates the sensitivity of the correlation test to the function
   linking the pairs of values. In particular, the standard correlation test is used to find
   linear relations between test pairs, but the rank correlation test is not restricted in
   this way. Given pairs of observations (x_t, y_t), the x_t values are assigned a rank and,
   separately, the y_t values are assigned a rank. For each pair (x_t, y_t), the difference
   d_t between the ranks of x_t and y_t is found, and R = Σ_{t=1}^n d_t². For large samples
   the test statistic is

       Z = [6R - n(n² - 1)] / [n(n + 1)√(n - 1)],

   which is approximately normally distributed.
C. Kendall's τ correlation test: This is a measure of correlation between two ordinal-level
   variables. It is most appropriate for square tables. For any sample of n observations,
   there are n(n - 1)/2 possible comparisons of points (x_i, y_i) and (x_j, y_j). Let C be
   the number of pairs that are concordant and D the number of pairs that are not concordant.
   Then

       Kendall's τ = (C - D) / [n(n - 1)/2].

   Obviously, τ has the range -1 ≤ τ ≤ 1. If x_i = x_j, or y_i = y_j, or both, the comparison
   is called a tie. Ties are not counted as concordant or discordant. If there are a large
   number of ties, then the denominator n(n - 1)/2 has to be replaced by

       √{[n(n - 1)/2 - n_x][n(n - 1)/2 - n_y]},

   where n_x is the number of ties involving x and n_y is the number of ties involving y.
   In large samples, the statistic

       3τ √[n(n - 1)] / √[2(2n + 5)]

   has a standard normal distribution, and therefore can be used as a test statistic for
   testing the null hypothesis of zero correlation.
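The three statistics above are straightforward to compute directly. The following sketch (in Python for illustration; the notes use the R functions listed earlier, and the simulated data and variable names here are my own) evaluates the Pearson t-statistic, the large-sample Spearman Z, and the Kendall z on a correlated pair of series, using exactly the formulas in A-C:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)       # correlated pair, no ties

# A. Pearson: t = r / sqrt((1 - r^2)/(n - 2)) ~ t_{n-2} under H0
r = np.corrcoef(x, y)[0, 1]
t_stat = r / np.sqrt((1 - r**2) / (n - 2))

# B. Spearman: R = sum of squared rank differences,
#    Z = [6R - n(n^2 - 1)] / [n(n + 1) sqrt(n - 1)]
rank = lambda v: v.argsort().argsort() + 1  # ranks 1..n (no ties here)
R = np.sum((rank(x) - rank(y)) ** 2)
Z = (6 * R - n * (n**2 - 1)) / (n * (n + 1) * np.sqrt(n - 1))

# C. Kendall: tau = (C - D)/[n(n-1)/2], z = 3 tau sqrt(n(n-1))/sqrt(2(2n+5))
C = sum((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in combinations(range(n), 2))
D = n * (n - 1) // 2 - C                   # continuous data: no ties
tau = (C - D) / (n * (n - 1) / 2)
z_tau = 3 * tau * np.sqrt(n * (n - 1)) / np.sqrt(2 * (2 * n + 5))

print(t_stat, Z, z_tau)
```

With positive dependence, t_stat and z_tau are large and positive, while Z as defined above is large and negative (small rank differences make 6R smaller than n(n² - 1)).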
D. Runs Test: See Section 6.5 in Taylor (2005, p.133).
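For the runs test, a minimal sketch of the standard Wald-Wolfowitz version (which may differ in detail from the variant described in Taylor (2005); the function name is mine) counts runs of same-sign returns and standardizes by the mean and variance implied by the IID hypothesis:

```python
import numpy as np

def runs_test(x):
    """Wald-Wolfowitz runs test on the signs of x (zeros dropped);
    approximately N(0,1) under IID increments."""
    s = np.sign(x)
    s = s[s != 0]
    n1 = int(np.sum(s > 0))
    n2 = int(np.sum(s < 0))
    n = n1 + n2
    runs = 1 + int(np.sum(s[1:] != s[:-1]))   # number of same-sign runs
    mean = 1 + 2 * n1 * n2 / n
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / np.sqrt(var)

# Too many runs (alternating signs) gives a large positive statistic;
# too few runs (one long block of each sign) a large negative one.
print(round(runs_test(np.tile([1.0, -1.0], 100)), 2))
print(round(runs_test(np.repeat([1.0, -1.0], 100)), 2))
```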
One of the first tests of RW1 was proposed by Cowles and Jones (1937) and consists of a
comparison of the frequency of sequences (pairs of consecutive returns with the same sign)
and reversals (pairs of consecutive returns with opposite signs) in historical returns.
Specifically, Cowles and Jones (1937) assumed that the log price follows an IID random walk:

    p_t = μ + p_{t-1} + e_t,    e_t ~ IID(0, σ²).                      (4.2)

Given a sample of T + 1 prices p_1, p_2, ..., p_{T+1}, the number of sequences N_s and the
number of reversals N_r may be expressed as

    N_s ≡ Σ_{t=1}^T Y_t,   Y_t ≡ I_t I_{t+1} + (1 - I_t)(1 - I_{t+1}),   and   N_r ≡ T - N_s,

where

    I_t = 1 if r_t ≡ p_t - p_{t-1} > 0,   and   I_t = 0 if r_t ≡ p_t - p_{t-1} ≤ 0.

If log prices follow a driftless (μ = 0) random walk and the distribution of e_t is symmetric,
the positive and negative values of r_t should be equally likely. The Cowles-Jones ratio for
testing the IID assumption is defined as

    CJ ≡ N_s / N_r = (N_s/T) / (N_r/T) = π̂_s / (1 - π̂_s),

where π_s = E(Y_t) and π̂_s = N_s/T is the sample version of π_s. Cowles and Jones (1937)
found that this ratio exceeded one for many historical stock returns and concluded that this
represents conclusive evidence of structure in stock prices. Under the null hypothesis of IID
increments, one may show that

    √T [CJ - π_s/(1 - π_s)] → N(0, [π_s(1 - π_s) + 2(π_s³ + (1 - π_s)³ - π_s²)] / (1 - π_s)⁴)
    under H_0,                                                         (4.3)

where π_s = Φ(μ/σ) and Φ(·) is the distribution function of the standard normal. By assuming
that μ = 0 in (4.2), under the null hypothesis of IID increments π_s = 1/2, so that under the
null hypothesis of IID increments we have

    z_0 = √T (CJ - 1)/2 → N(0, 1) under H_0,                           (4.4)

and the p-value can be approximated as

    p-value ≈ P(|N(0, 1)| > |z_0|) = 2[1 - Φ(√T |CJ - 1|/2)].

Note that if μ ≠ 0, we need to center the data first and then apply the Cowles-Jones test.
Alternatively, we can estimate μ and σ², estimate π_s by π̂_s = Φ(μ̂/σ̂), and then use
equation (4.3). The test statistic in (4.4) then becomes

    z_1 = √T [CJ - π̂_s/(1 - π̂_s)] / σ̂_s → N(0, 1) under H_0,          (4.5)

where σ̂_s² = [π̂_s(1 - π̂_s) + 2(π̂_s³ + (1 - π̂_s)³ - π̂_s²)] / (1 - π̂_s)⁴, and the p-value
can be approximated as

    p-value ≈ P(|N(0, 1)| > |z_1|) = 2[1 - Φ(√T |CJ - π̂_s/(1 - π̂_s)| / σ̂_s)].

Note that μ̂ is the sample mean of returns and σ̂² is the sample variance of returns.
For details, see CLM (1997, Section 2.2.2).
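Under the driftless null, the z_0 statistic in (4.4) takes only a few lines. A sketch (Python for illustration; the function and variable names are mine, and no correction for a nonzero drift is attempted):

```python
import numpy as np

def cowles_jones(r):
    """CJ ratio and the z0 statistic of (4.4) for a return series r
    (driftless case; assumes at least one reversal so Nr > 0)."""
    I = (r > 0).astype(int)                              # I_t
    Y = I[:-1] * I[1:] + (1 - I[:-1]) * (1 - I[1:])      # sequence indicator
    T = len(Y)
    Ns = Y.sum()                                         # number of sequences
    Nr = T - Ns                                          # number of reversals
    CJ = Ns / Nr
    z0 = np.sqrt(T) * (CJ - 1) / 2
    return CJ, z0

rng = np.random.default_rng(7)
CJ, z0 = cowles_jones(rng.standard_normal(5000))
print(round(CJ, 3), round(z0, 2))    # CJ should be near 1 under RW1
```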
4.3.2 Autocorrelation Tests
Assume that r_t is covariance stationary and ergodic. Then

    γ_k = Cov(r_t, r_{t-k}),    ρ_k = γ_k / γ_0,

and the sample estimates γ̂_k and ρ̂_k replace these population moments with their sample
counterparts.

Result: Under RW1 it can be shown that

    √T ρ̂_k → N(0, 1) under H_0.

This result can be used to check whether each autocorrelation coefficient ρ_k is individually
statistically significant, H_0: ρ_k = 0.
Box-Pierce Q-statistic

Consider testing that several autocorrelation coefficients are simultaneously zero, i.e.,
H_0: ρ_1 = ρ_2 = ... = ρ_m = 0. Under the RW1 null hypothesis, it is easy to show (see Box
and Pierce (1970)) that

    Q = T Σ_{k=1}^m ρ̂_k² → χ²_m under H_0.                             (4.6)

Ljung and Box (1978) provided the following finite-sample correction, which yields a better
fit to the χ²_m distribution for small sample sizes:

    Q* = T(T + 2) Σ_{k=1}^m ρ̂_k²/(T - k) → χ²_m under H_0.             (4.7)

Both are called Q-tests (the Q-statistic in (4.6) or the Q*-statistic in (4.7)) and are well
known in the statistics literature. Of course, they are very useful in applications. Finally,
note that many versions of the modified Q-test can be found in the literature; see Lobato,
Nankervis, and Savin (2001) and Hong (2001).
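Both statistics follow directly from the sample autocorrelations. A sketch (Python for illustration; the helper names are mine) computes ρ̂_k, Q of (4.6), and Q* of (4.7):

```python
import numpy as np

def acf(r, m):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_m."""
    r = r - r.mean()
    g0 = np.sum(r * r)
    return np.array([np.sum(r[k:] * r[:-k]) / g0 for k in range(1, m + 1)])

def q_tests(r, m):
    """Box-Pierce Q of (4.6) and Ljung-Box Q* of (4.7)."""
    T = len(r)
    rho = acf(r, m)
    Q = T * np.sum(rho ** 2)
    Qstar = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, m + 1)))
    return Q, Qstar

rng = np.random.default_rng(3)
Q, Qstar = q_tests(rng.standard_normal(1000), m=5)
print(round(Q, 2), round(Qstar, 2))   # compare with the chi-square(5) critical value 11.07
```

Since (T + 2)/(T - k) > 1 for every lag k, Q* always exceeds Q, which is exactly the finite-sample inflation Ljung and Box introduced.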
4.3.3 Variance Ratio Tests
The white noise hypothesis can also be verified by aggregating data sampled at various
frequencies and comparing properties of the resulting time series. Let us consider a series
obtained by adding n consecutive observations:

    r_t^n = r_t + r_{t+1} + ... + r_{t+n-1}.

Under the white noise hypothesis, μ = 0 and we get

    Var(r_t^n) = Var(r_t + ... + r_{t+n-1}) = Var(e_t + ... + e_{t+n-1}) = n Var(e_t) = n Var(r_t).

The variance of a multi-period return is the sum of the single-period variances when the RW
hypothesis is true. Then, under the null hypothesis of white noise for the error term (i.e.,
the RW hypothesis),

    Var(r_t^n) / [n Var(r_t)] = 1.
Example: Under RW1,

    VR(2) = Var(r_t(2)) / [2 Var(r_t)] = Var(r_t + r_{t-1}) / [2 Var(r_t)] = 2σ² / 2σ² = 1.

If r_t is a covariance stationary process, then

    VR(2) = [Var(r_t) + Var(r_{t-1}) + 2 Cov(r_t, r_{t-1})] / [2 Var(r_t)]
          = (2σ² + 2γ_1) / 2σ² = 1 + ρ_1.

Three cases are possible:

    ρ_1 = 0  ⟹  VR(2) = 1,
    ρ_1 > 0  ⟹  VR(2) > 1  (mean aversion),
    ρ_1 < 0  ⟹  VR(2) < 1  (mean reversion).

A general n-period variance ratio (VR) under stationarity is

    VR(n) = Var(r_t^n) / [n Var(r_t)] = 1 + 2 Σ_{k=1}^{n-1} (1 - k/n) ρ_k.
The asymptotic distribution of VR̂(n) is as follows:

    √T [VR̂(n) - 1] → N(0, 2(n - 1)) under H_0,

where VR̂(n) is the sample version of VR(n) and the sample version of Var(r_t^n) is based on
non-overlapping sums; see Theorem 2.1 in Lo and MacKinlay (1999, p. 22). The null hypothesis
of white noise can be tested by computing the standardized statistic

    √T [VR̂(n) - 1] / √[2(n - 1)].

If it lies outside the interval [-1.96, 1.96], the white noise hypothesis can be rejected.
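This standardized statistic is simple to compute from non-overlapping n-period sums. A sketch (Python for illustration; it ignores the small-sample bias corrections used by Lo and MacKinlay, and the names are mine):

```python
import numpy as np

def vr_test(r, n):
    """VR_hat(n) from non-overlapping n-period sums and the standardized
    statistic sqrt(T)(VR_hat(n) - 1)/sqrt(2(n - 1)); finite-sample bias
    corrections are omitted in this sketch."""
    T = len(r)
    m = T // n
    rn = r[:m * n].reshape(m, n).sum(axis=1)   # non-overlapping n-period returns
    vr = rn.var(ddof=1) / (n * r.var(ddof=1))
    z = np.sqrt(T) * (vr - 1) / np.sqrt(2 * (n - 1))
    return vr, z

rng = np.random.default_rng(11)
vr, z = vr_test(rng.standard_normal(4000), n=4)
print(round(vr, 3), round(z, 2))    # |z| > 1.96 would reject white noise
```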
Lo and MacKinlay's VR Test Statistics
This test was proposed by Lo and MacKinlay (1988, 1989) and is described as follows; it can
also be found in the book by Lo and MacKinlay (1999). Under RW1, the standardized variance
ratio

    ψ(n) = √T [VR̂(n) - 1] · [2(2n - 1)(n - 1)/(3n)]^{-1/2} → N(0, 1) under H_0,      (4.8)

where VR̂(n) is the sample version of VR(n) and the sample version of Var(r_t^n) is based on
overlapping sums; see Theorem 2.2 in Lo and MacKinlay (1999, p. 23).

Under RW2 and RW3, the heteroskedasticity-robust standardized variance ratio is

    ψ*(n) = √T [VR̂(n) - 1] · θ̂(n)^{-1/2} → N(0, 1) under H_0,

where

    θ̂(n) = 4 Σ_{j=1}^{n-1} [(n - j)/n]² δ̂_j,
    δ̂_j = Σ_{t=j+1}^T ε̂_{0t} ε̂_{jt} / (Σ_{t=1}^T ε̂_{0t})²,

and ε̂_{jt} = [p_{t-j} - p_{t-j-1} - (p_T - p_0)/T]². For more material, see Lo and MacKinlay
(1999) and the references therein. Based on the results from Lo and MacKinlay (1999,
Section 2.2), weekly stock price returns (for both market indices and individual securities)
do not follow random walks according to the variance ratio tests.
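The homoskedastic statistic ψ(n) of (4.8), with overlapping n-period sums, can be sketched as follows (Python for illustration; the finite-sample adjustments of Lo and MacKinlay (1988) are omitted, and the function name is mine):

```python
import numpy as np

def psi_stat(r, n):
    """Homoskedastic Lo-MacKinlay statistic psi(n) of (4.8) with
    overlapping n-period sums; a bare-bones sketch without the
    finite-sample adjustments of Lo and MacKinlay (1988)."""
    T = len(r)
    mu = r.mean()
    var1 = np.sum((r - mu) ** 2) / T
    rn = np.convolve(r, np.ones(n), mode="valid")   # overlapping n-period sums
    varn = np.sum((rn - n * mu) ** 2) / (n * T)
    vr = varn / var1
    z = np.sqrt(T) * (vr - 1) / np.sqrt(2 * (2 * n - 1) * (n - 1) / (3 * n))
    return vr, z

rng = np.random.default_rng(5)
vr, z = psi_stat(rng.standard_normal(3000), n=5)
print(round(vr, 3), round(z, 2))
```

Note the smaller asymptotic variance 2(2n - 1)(n - 1)/(3n) compared with 2(n - 1) in the non-overlapping case: reusing overlapping sums yields a more efficient estimator of VR(n).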
R Functions
The above variance-ratio-type tests can be found in the package vrtest in R. There are
several functions available for variance ratio tests:

Boot.test(y, kvec, nboot, indicator)
# This function returns bootstrap p-values of the Lo-MacKinlay (1988)
# and Chow-Denning (1993) tests
Chow.Denning(y, kvec)
# This function returns Chow-Denning test statistics.
# CD1: test for iid series;
# CD2: test for uncorrelated series with possible heteroskedasticity
Wright(y, kvec)
# This function returns the R1, R2 and S1 test statistics detailed in Wright (2000)
Wright.crit(n, k, nit)
# This function returns critical values of Wright's tests based on
# the simulation method detailed in Wright (2000)
Joint.Wright(y, kvec)
# This function returns the joint or multiple version of Wright's rank and sign
# tests; see Wright (2000), Belaire-Franch and Contreras (2004) and
# Kim and Shamsuddin (2004).
# The test takes the maximum value of the individual rank or sign tests,
# in the same manner as the Chow-Denning test
JWright.crit(n, kvec, nit)
# This function runs a simulation to calculate the critical values of the
# joint versions of Wright's tests.
Lo.Mac(y, kvec)
# This function returns the M1 and M2 statistics of Lo and MacKinlay (1988)
# M1: tests for iid series;
# M2: for uncorrelated series with possible heteroskedasticity.
Subsample.test(y, kvec)
# This function returns the p-values of the subsampling test; see Whang
# and Kim (2003). The block lengths are chosen internally using the rule
# proposed in Whang and Kim (2003)
Wald(y, kvec)
# This function returns the Wald test statistic with critical values;
# see Richardson and Smith (1991)

For details about the aforementioned functions, please read the manual of the package vrtest.
4.3.4 Trading Rules and Market Efficiency
Testing for independence without assuming identical distributions is quite difficult. There
are two lines of empirical research that can be viewed as economic tests of RW2: trading
rules and technical analysis. To test RW2, one can apply a filter rule in which an asset is
purchased when its price increases by x% and sold when its price drops by x%. The total
return of this dynamic portfolio strategy is then a measure of the predictability in asset
returns. A comparison of the total return to the return from a buy-and-hold strategy for the
Dow Jones and S&P 500 indices led some researchers to conclude that there are some trends
in stock market prices. However, if the empirical analysis is corrected for dividends and
trading costs, filter rules do not perform as well as the buy-and-hold strategy.

A trading rule is a method for converting the history of prices into investment decisions.
Trend-following trading rules have the potential to exploit any positive autocorrelation in
the stochastic process that generates returns. The idea is that efficient markets lead to
prior beliefs that trading rules cannot achieve anything of practical value. There are four
popular trading rules:

1. The double moving-average trading rule
2. The channel rule
3. The filter rule
4. The rule designed around ARMA(1,1) forecasts of future returns

In investment decisions, a typical decision variable at time t is the quantity q_{t+1} of an
asset that is owned from the time of price observation t until the next observation at time
t + 1. The quantity q_{t+1} is some function of the price history
I_t = {p_t, p_{t-1}, p_{t-2}, ...}.
The Moving-Average Rule
Two averages, of length S (a short period of time) and L (a longer period), are calculated
at time t from the most recent price observations, including p_t:

    a_{t,S} = (1/S) Σ_{j=1}^S p_{t-S+j} = (p_{t-S+1} + ... + p_t)/S,
    a_{t,L} = (1/L) Σ_{j=1}^L p_{t-L+j} = (p_{t-L+1} + ... + p_t)/L.

Alternatively, one might use the exponential smoothing technique discussed in Chapter 3;
the R code for exponential smoothing is in the package fTrading. We consider the relative
difference between the short- and long-term averages:

    R_t = (a_{t,S} - a_{t,L}) / a_{t,L}.

Some popular parameter combinations have S = 5 (one week) and L = 50 (10 weeks). When the
short-term average is above [below] the long-term average, it may be imagined that prices
are following an upward [downward] trend. The investment decision is defined as follows:

    Buy      if R_t > B,
    Neutral  if |R_t| ≤ B,
    Sell     if R_t < -B.

This algorithm has three parameters: S, L, and B. The bandwidth B can be zero, and then
(almost) all days are either Buys or Sells. For more about the moving-average technical
trading rule (MATTR), please see the papers by LeBaron (1997, 1999).
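The decision rule above can be sketched in a few lines (Python for illustration; the function name and the toy price paths are mine):

```python
import numpy as np

def ma_signal(p, S=5, L=50, B=0.0):
    """Classify the latest day as Buy/Sell/Neutral from the double
    moving-average rule applied to the price history p (a sketch)."""
    t = len(p)
    a_S = np.mean(p[t - S:])          # short-term average a_{t,S}
    a_L = np.mean(p[t - L:])          # long-term average a_{t,L}
    R = (a_S - a_L) / a_L             # relative difference R_t
    if R > B:
        return "Buy"
    if R < -B:
        return "Sell"
    return "Neutral"

# Steadily rising prices: the 5-day average exceeds the 50-day average.
print(ma_signal(np.linspace(100, 120, 60)))   # prints Buy
```

With B = 0, constant prices are the only way to land on Neutral; a positive bandwidth B creates a no-trade zone around equality of the two averages.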
The Channel Rule

By analogy with the moving-average rule, the short-term average is replaced by the most
recent price (S = 1), and the long-term average is replaced by either a minimum or a maximum
of the L previous prices, defined by

    m_{t-1} = min(p_{t-L}, ..., p_{t-2}, p_{t-1}),   and   M_{t-1} = max(p_{t-L}, ..., p_{t-2}, p_{t-1}).

A person who believes prices have been following an upward [downward] trend may be willing
to believe the trend has changed direction when the latest price is less [more] than all
recent previous prices. The rule has two parameters: the channel length L and the bandwidth
B. The algorithm is defined as follows. If day t is a Buy, then day t + 1 is

    Buy      if p_t ≥ (1 + B) m_{t-1},
    Sell     if p_t < (1 - B) m_{t-1},                                  (4.9)
    Neutral  otherwise.

If day t is a Sell, then symmetric principles classify day t + 1 as

    Sell     if p_t ≤ (1 - B) M_{t-1},
    Buy      if p_t > (1 + B) M_{t-1},                                  (4.10)
    Neutral  otherwise.

For a Neutral day t, day t + 1 is

    Buy      if p_t > (1 + B) M_{t-1},
    Sell     if p_t < (1 - B) m_{t-1},                                  (4.11)
    Neutral  otherwise.
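The three transition rules (4.9)-(4.11) amount to a small state machine. A sketch (Python for illustration; the starting state is taken to be Neutral, and the function name and toy prices are mine):

```python
import numpy as np

def channel_positions(p, L=20, B=0.0):
    """Daily Buy/Sell/Neutral states from the channel rule (4.9)-(4.11);
    the initial state is taken to be Neutral (a sketch)."""
    state = "Neutral"
    states = []
    for t in range(L, len(p)):
        m = p[t - L:t].min()        # m_{t-1}
        M = p[t - L:t].max()        # M_{t-1}
        if state == "Buy":          # rule (4.9)
            state = ("Buy" if p[t] >= (1 + B) * m
                     else "Sell" if p[t] < (1 - B) * m else "Neutral")
        elif state == "Sell":       # rule (4.10)
            state = ("Sell" if p[t] <= (1 - B) * M
                     else "Buy" if p[t] > (1 + B) * M else "Neutral")
        else:                       # rule (4.11)
            state = ("Buy" if p[t] > (1 + B) * M
                     else "Sell" if p[t] < (1 - B) * m else "Neutral")
        states.append(state)
    return states

# A flat stretch followed by a sustained rise ends in the Buy state.
p_up = np.concatenate([np.full(20, 100.0), np.linspace(101, 120, 20)])
print(channel_positions(p_up)[-1])    # prints Buy
```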
Filter rule

In this algorithm, the short-term average is replaced by the most recent price, and the
long-term average is replaced by some multiple of the maximum or minimum since the most
recent trend is believed to have commenced. The terms m_t and M_t are defined, for a positive
filter-size parameter f and a trend commencing at time s, by

    m_{t-1} = (1 - f) min(p_s, ..., p_{t-2}, p_{t-1}),   and   M_{t-1} = (1 + f) max(p_s, ..., p_{t-2}, p_{t-1}).

A person may believe an upward (downward) trend has changed direction when the latest price
has fallen (risen) by a fraction f from the highest (lowest) price during the upward
(downward) trend. The parameters of the filter rule are f and B. If day t is a Buy, then
s + 1 is the earliest Buy day for which there are no intermediate Sell days, and day t + 1
is classified using (4.9); it is possible that s + 1 = t. If day t is a Sell, then s + 1 is
the earliest Sell day for which there are no intermediate Buy days, and day t + 1 is
classified using (4.10). If day t is Neutral, then find the most recent non-neutral day and
use its value of s: if this non-neutral day is a Buy, then apply (4.9), and otherwise apply
(4.10). To start the classification, the first non-neutral day is identified when either
p_t > (1 + B) M_{t-1} or p_t < (1 - B) m_{t-1}, with s = 1.
A Statistical Rule

Trading rules based upon ARMA models (say, an ARMA(1,1)) are also popular, even though the
profits from these rules are slightly less than those from the simpler moving-average,
channel, and filter rules. The statistical trading rule applies ARMA forecasting theory to
rescaled returns defined by r_t/√h_t, with the conditional standard deviation √h_t obtained
from a special case of a simple ARCH or GARCH type model. The rule relies on k_{t+1}, which
is defined as

    k_{t+1} = f_{t,1} / σ_f,

where f_{t,1} is the one-day-ahead forecast and σ_f is its standard error. They are defined
as

    f_{t,1} = (h_{t+1}/h_t)^{1/2} [(φ + θ) r_t - θ f_{t-1,1}],
    σ_f = {h_{t+1} [A(φ + θ)/(1 + θ)]}^{1/2},

and

    √h_{t+1} = 0.9 √h_t + 0.1253 |r_t|.

An upward [downward] trend is predicted when k_{t+1} is positive [negative]. A nonnegative
threshold parameter k* determines the classification of days. If day t is a Buy, then day
t + 1 is

    Buy      if k_{t+1} > 0,
    Sell     if k_{t+1} ≤ -k*,
    Neutral  otherwise.

If day t is a Sell, then day t + 1 is

    Sell     if k_{t+1} < 0,
    Buy      if k_{t+1} ≥ k*,
    Neutral  otherwise.

The day after a Neutral day t is

    Buy      if k_{t+1} ≥ k*,
    Sell     if k_{t+1} ≤ -k*,
    Neutral  otherwise.
R Functions
The package TTR contains functions to construct technical trading rules in R.
4.4 Empirical Results
4.4.1 Evidence About Returns Predictability Using VR and Autocorrelation Tests
Taylor (2005) presented some results on daily, weekly, and monthly returns using variance
ratio tests; see Table 5.2 in Taylor (2005, p. 110). Empirical results can also be found in
Section 2.8 of CLM (1997), who considered CRSP value-weighted and equal-weighted indices and
individual securities over 1962-1994.

- Daily, weekly, and monthly continuously compounded returns from the value-weighted and
  equal-weighted indices show significant first-order positive autocorrelation (Table 4.3).
- The VR̂(n) > 1 and ψ*(n) statistics reject the RW hypothesis for the equal-weighted index
  but not for the value-weighted index (Tables 4.1 and 4.2, and Table 2.5 in CLM (1997,
  p. 69)).
- Poterba and Summers (1988) compared monthly and annual variances of US market returns in
  excess of the risk-free rate from 1962 to 1985. The variance ratio from the value-weighted
  index is VR̂(12) = 1.31, with a similar ratio of 1.27 for the equal-weighted index.
- Rejection of the RW hypothesis by the equal-weighted index but not by the value-weighted
  index suggests that market capitalization or size may play a role in the behavior of the
  variance ratios. It turns out that VR̂(n) > 1 and ψ*(n) are largest for portfolios of small
  firms.
- For individual securities, typically VR̂(n) < 1 (i.e., slightly negative autocorrelation)
  and ψ*(n) is not significant.
  - That returns have statistically insignificant autocorrelation is not surprising:
    individual returns contain much specific or idiosyncratic noise that makes it difficult
    to detect the presence of predictable components.
  - Nevertheless, how is it possible that the portfolio VR̂(n) > 1 (positive autocorrelation)
    when the individual-security VR̂(n) < 1?
4.4.2 Cross Lag Autocorrelations and Lead-Lag Relations
Explanation: Portfolio returns can be positively autocorrelated and individual security
returns negatively autocorrelated if there are positive cross-lag autocovariances between
the securities in the portfolio.

Let R_t denote an N × 1 vector of N security returns. Define the cross-lag autocovariance

    γ_k^{ij} = Cov(r_{it}, r_{j,t-k}).

Then

    Γ_k = Cov(R_t, R_{t-k}) =
        [ γ_k^{11}  γ_k^{12}  ...  γ_k^{1N} ]
        [ γ_k^{21}  γ_k^{22}  ...  γ_k^{2N} ]
        [    ...       ...    ...     ...   ]
        [ γ_k^{N1}  γ_k^{N2}  ...  γ_k^{NN} ].

Let R_{mt} denote the return on the equal-weighted portfolio, i.e., R_{mt} = ι'R_t/N, where
ι is an N × 1 vector of ones. Then

    Cov(R_{m,t}, R_{m,t-1}) = (1/N²) ι'Γ_1 ι.

The first-order autocorrelation of the portfolio can be expressed as

    Corr(R_{mt}, R_{m,t-1}) = Cov(R_{mt}, R_{m,t-1}) / Var(R_{mt})
                            = [ι'Γ_1 ι - tr(Γ_1)] / (ι'Γ_0 ι) + tr(Γ_1) / (ι'Γ_0 ι).   (4.12)

The first term on the right-hand side of (4.12) contains only cross-autocovariances and the
second term only own-autocovariances. Tables 2.8 and 2.9 in CLM (1997) show how market
capitalization or size may play a role in the behavior of the variance ratios.

- The autocorrelation matrices of the different size-sorted (according to CRSP quintile)
  portfolios are discussed in Table 2.8 of CLM (1997, p. 75).
- Lead-lag pattern: larger-capitalization stocks lead and smaller-capitalization stocks lag.
  See Table 2.9 of CLM (1997, p. 77) for the empirical study.
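The decomposition in (4.12) can be checked with a tiny two-security example (Python for illustration; the autocovariance matrices below are made up purely to show the sign pattern of negative own- and positive cross-autocovariances):

```python
import numpy as np

# Made-up autocovariance matrices for two securities (illustration only).
Gamma0 = np.array([[1.00, 0.30],
                   [0.30, 1.00]])        # contemporaneous covariances
Gamma1 = np.array([[-0.05, 0.20],
                   [0.15, -0.05]])       # lag 1: negative own, positive cross

iota = np.ones(2)
# First-order autocorrelation of the equal-weighted portfolio, eq. (4.12);
# the 1/N^2 factors cancel between numerator and denominator.
rho_p = iota @ Gamma1 @ iota / (iota @ Gamma0 @ iota)

own = np.trace(Gamma1) / (iota @ Gamma0 @ iota)   # own-autocovariance term
cross = rho_p - own                                # cross-autocovariance term
print(round(rho_p, 3), round(own, 3), round(cross, 3))
```

Here rho_p is positive even though both securities have negative own-autocovariances: the positive cross-lag terms dominate, which is exactly the mechanism invoked above.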
Table 4.1: Variance ratio test values, daily 1991-2000 (from Taylor, 2005)

                      Variance ratios VR(n)
                      n = 2    n = 5    n = 20
S&P 100 index         0.976    0.905    0.759
Spot DM/$             1.018    1.042    1.36

                      z(n) statistic
S&P 100 index         -0.73    -1.41    -1.76
Spot DM/$              0.73     0.80     0.30
S&P 500 index          4.00     2.66     0.62
Nikkei 225-share       1.83    -0.01     0.46
Coca Cola             -1.24    -2.33    -2.05
General Electric      -0.92    -1.93    -1.27
General Motors         0.57    -1.29    -0.75
Glaxo                  3.56     1.85     0.48

Notes: The crash week, commencing on 19 October 1987, is excluded from the time series.
Overall, these tests do not provide much evidence against randomness.
Table 4.2: Variance ratio test values, weekly 1962-1994 (from Taylor, 2005)

                   Variance ratios VR(n)
                   n = 2    n = 4    n = 8    n = 16
Equal weighted     1.20     1.42     1.65     1.74
Value weighted     1.02     1.02     1.04     1.02

                   z(n) statistic
Equal weighted     4.53     5.30     5.84     4.85
Value weighted     0.51     0.30     0.41     0.14

Notes: CLM considered equal- and value-weighted indices calculated by pooling returns from
the NYSE and AMEX.
4.4.3 Evidence About Returns Predictability Using Trading Rules
We here present some evidence about equity returns predictability and about the
predictability of currency and other returns; see Taylor (2005) and CLM (1997). For recent
developments, see the paper by Polk, Thompson, and Vuolteenaho (2006).
Table 4.3: Autocorrelations in daily, weekly, and monthly stock index returns
(from CLM, 1997, p.67)

Sample            Mean     SD      ρ̂₁      ρ̂₂      ρ̂₃      ρ̂₄      Q₅      Q₁₀
Daily returns, CRSP value-weighted index
  period I        0.041   0.824   0.176  -0.007   0.001  -0.008   263.3   269.5
  period II       0.054   0.901   0.108  -0.022  -0.029  -0.035    69.5    72.1
Daily returns, CRSP equal-weighted index
  period I        0.070   0.764   0.35    0.093   0.085   0.099   1301    1369
  period II       0.078   0.756   0.26    0.049   0.020   0.049   348.9   379.5
Weekly returns, CRSP value-weighted index
  period I        0.196   2.093   0.015  -0.025   0.035  -0.007     8.8    36.7
  period II       0.248   2.188  -0.020  -0.015   0.016  -0.033     5.3    25.2
Weekly returns, CRSP equal-weighted index
  period I        0.339   2.321   0.203   0.061   0.091   0.048    94.3   109.3
  period II       0.354   2.174   0.184   0.043   0.055   0.022    33.7    51.3
Monthly returns, CRSP value-weighted index
  period I        0.861   4.336   0.043  -0.053  -0.013  -0.040     6.8    12.5
  period II       1.076   4.450   0.013  -0.063  -0.083  -0.077     7.5    14.0
Monthly returns, CRSP equal-weighted index
  period I        1.077   5.749   0.171  -0.034  -0.033  -0.016    12.8    21.3
  period II       1.105   5.336   0.150  -0.016  -0.124  -0.074     8.9    14.2

Notes: period I = 62:07:03 - 94:12:30; period II = 78:10:30 - 94:12:30. For reference,
χ²_{5,0.005} = 16.7.
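The ρ̂_k and Q columns of Table 4.3 are sample autocorrelations and portmanteau statistics. A sketch of their computation in Python (illustrative; the Ljung-Box small-sample form of the Q statistic is used here, and the AR(1) data are simulated, not CRSP returns):

```python
import random

def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    T = len(x)
    mu = sum(x) / T
    c0 = sum((v - mu) ** 2 for v in x) / T
    ck = sum((x[t] - mu) * (x[t + k] - mu) for t in range(T - k)) / T
    return ck / c0

def ljung_box_q(x, m):
    """Ljung-Box Q(m); compare with a chi-square(m) critical value."""
    T = len(x)
    return T * (T + 2) * sum(acf(x, k) ** 2 / (T - k) for k in range(1, m + 1))

# A persistent AR(1) series should produce a clearly positive acf(1) and a large Q(5),
# well above the chi-square(5) 0.5% critical value of 16.7 quoted in the notes.
random.seed(2)
x, prev = [], 0.0
for _ in range(500):
    prev = 0.3 * prev + random.gauss(0.0, 1.0)
    x.append(prev)
print(round(acf(x, 1), 2), round(ljung_box_q(x, 5), 1))
```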
4.5 Predictability of Real Stock and Bond Returns
4.5.1 Financial Predictors
There is some evidence that the following financial variables (instruments) may help predict log
real stock and bond returns over horizons of 1-10 years based on some linear or nonlinear
models:
- Dividend-price ratio. The dividend-price ratio in year t is the ratio of nominal dividends
  during year t to the nominal stock price in January of year t + 1.
- Dividend yield. The dividend yield in year t is the ratio of nominal dividends for year t
  to the nominal stock price in January of year t.
- Earnings-price ratio.
- Book-to-market ratio.
- Federal q. This is the ratio of the total market value of equities outstanding to corporate
  net worth.
- Payout ratio. The ratio of dividends to earnings.
- Term spread. This is the difference between annualized long-term and short-term
  government yields.
- Default spread. This is the difference between Moody's seasoned Baa corporate bond yield
  and Moody's seasoned Aaa corporate bond yield.
- Short-term rate. This is the 3-month Treasury bill rate (secondary market).
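To fix the timing conventions in the first two definitions, here is a tiny sketch (Python; all series names and numbers are hypothetical, purely for illustration):

```python
# Hypothetical data: nominal dividends paid during year t, and the
# nominal stock price in January of each year (values are made up).
dividends = {2000: 2.0, 2001: 2.1, 2002: 2.2}
jan_price = {2000: 100.0, 2001: 110.0, 2002: 105.0, 2003: 108.0}

def dividend_price_ratio(t):
    # D_t / P_{Jan of t+1}: dividends over the *following* January price
    return dividends[t] / jan_price[t + 1]

def dividend_yield(t):
    # D_t / P_{Jan of t}: dividends over the *same-year* January price
    return dividends[t] / jan_price[t]

print(dividend_price_ratio(2000))  # 2.0 / 110.0
print(dividend_yield(2000))        # 2.0 / 100.0
```

The only difference between the two predictors is whether the price in the denominator is taken before or after the dividends are paid.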
4.5.2 Models and Modeling Methods
Introduction
The predictability of stock returns has been studied for decades as a cornerstone research
topic in economics and finance. See, for example, Fama and French (1988), Keim and
Stambaugh (1986), Campbell and Shiller (1988), Cutler, Poterba, and Summers (1991),
Balvers, Cosimano, and McDonald (1990), Schwert (1990), Fama (1990), and Kothari and
Shanken (1997). In many financial applications, such as mutual fund performance, the
conditional capital asset pricing model, and optimal asset allocation, people routinely examine
the predictability problem. See, for example, Christopherson et al. (1998), Ferson and Schadt
(1996), Ferson and Harvey (1991), Ghysels (1998), Ait-Sahalia and Brandt (2001), Barberis
(2000), Brandt (1999), Campbell and Viceira (1998), and Kandel and Stambaugh (1996).
Numerous empirical studies document the predictability of stock returns using various
lagged financial variables, such as the dividend yield, the term spread and default premium,
the dividend-price ratio, the earnings-price ratio, the book-to-market ratio, and the interest
rates. Important questions are often asked about whether the returns are predictable and
whether the predictability is stable over time. Since many of the predictive financial variables
are highly persistent and even nonstationary, it is statistically very challenging to answer
these questions.
The predictability issues are generally assessed in the context of parametric predictive
regression models in which rates of returns are regressed against the lagged values of stochastic
explanatory variables (or state variables). Now let us review the efforts in the literature on
this topic. Mankiw and Shapiro (1986) and Stambaugh (1986) were the first to discern
the econometric (statistical) difficulties inherent in the estimation of predictive regressions
through the structural predictive linear model:

    y_t = β₁ + β x_{t−1} + ε_t,    x_t = ρ x_{t−1} + u_t,    1 ≤ t ≤ n,    (4.13)

where the innovations {(ε_t, u_t)} are independently and identically distributed bivariate normal
N(0, Σ) with

    Σ = ( σ_ε²   σ_εu )
        ( σ_εu   σ_u² ),

y_t is the predictable variable, say excess stock returns, in period t, and x_{t−1} is a financial
variable, such as the log dividend-price ratio at t − 1, which is commonly modeled by an AR(1)
model as in (4.13). Note that the correlation between the innovations is δ = σ_εu/(σ_ε σ_u),
which is unfortunately non-zero in many empirical applications; see Table 4 in Campbell and
Yogo (2006) and Table 1 in Paye and Timmermann (2006). This creates endogeneity (x_{t−1}
and ε_t are correlated), which makes modeling difficult. The parameter ρ is the unknown
degree of persistence of the variable x_t. That is, x_t is stationary (|ρ| < 1), see Amihud and
Hurvich (2004) and Paye and Timmermann (2006); or it is local-to-unity or nearly integrated
(ρ = 1 + c/n with c < 0); or it has a unit root or is integrated (denoted by I(1)) (ρ = 1).
See, for example, Elliott and Stock (1994), Cavanagh, Elliott, and Stock (1995), Torous,
Valkanov, and Yan (2004), Campbell and Yogo (2006), Polk, Thompson, and Vuolteenaho
(2006), and Rossi (2007), among others. This means that the predictive variable x_t is highly
persistent, not really exogenous, and even nonstationary, which causes a lot of trouble for
statistical modeling.
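The finite-sample consequence of this endogeneity is easy to see by simulation. Here is a minimal Monte Carlo sketch of model (4.13) in Python (the course software is R, but the logic carries over; the parameter values n = 60, ρ = 0.95, δ = −0.9 are purely illustrative): the true slope is β = 0, yet the average OLS estimate is pulled away from zero.

```python
import random

def simulate_ols_beta(n, beta, rho, delta, reps, seed=3):
    """Average OLS slope in y_t = b1 + beta*x_{t-1} + e_t when corr(e_t, u_t) = delta.

    Illustrates the finite-sample (Stambaugh) bias: with beta = 0 and delta < 0,
    the average estimate is pushed above zero."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x_prev = 0.0
        xs, ys = [], []
        for _ in range(n):
            u = rng.gauss(0.0, 1.0)
            e = delta * u + (1.0 - delta ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
            xs.append(x_prev)                  # regressor is x_{t-1}
            ys.append(beta * x_prev + e)       # y_t = beta * x_{t-1} + e_t
            x_prev = rho * x_prev + u          # x_t = rho * x_{t-1} + u_t
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        estimates.append(sxy / sxx)
    return sum(estimates) / reps

# True beta is zero, yet the average OLS estimate is noticeably positive.
bias = simulate_ols_beta(n=60, beta=0.0, rho=0.95, delta=-0.9, reps=2000)
print(round(bias, 3))
```

With δ < 0, as is typical for valuation ratios, the bias is positive, which is exactly the direction that produces spurious findings of predictability.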
As shown in Nelson and Kim (1993), the ordinary least squares (OLS) estimates of the
slope coefficient and its standard errors are substantially biased in finite samples if x_t is
highly persistent, not really exogenous, and even nonstationary. Conventional tests based
on standard t-statistics from OLS estimates tend to over-reject the null of non-predictability
in Monte Carlo simulations, although some improvements have been developed recently.
In an effort to deal with the aforementioned difficulties associated with the endogeneity
and to obtain efficient inference about the coefficient β, researchers have made
contributions that can be summarized as follows:
(1) The bias correction of the OLS estimate, using information conveyed by the autoregressive
process of the predictive variable. See, for example, the first order bias-corrected
[Reproduced here in the original notes: Table 4 of Campbell and Yogo (2006, Journal of
Financial Economics 81, 27-60), together with part of their Section 4.3 on testing the
predictability of returns through confidence intervals for β from the Bonferroni Q-test.
The table reports, for returns on the annual S&P 500 index and the annual, quarterly, and
monthly CRSP value-weighted index, with the log dividend-price ratio (d-p), the log
earnings-price ratio (e-p), the three-month T-bill rate (r3), and the long-short yield spread
(y-r1) as predictor variables: the estimated autoregressive lag length p, the estimated
correlation δ between the innovations to returns and the predictor variable, and 95%
confidence intervals for the largest autoregressive root ρ and the corresponding
local-to-unity parameter c, computed using the DF-GLS statistic.]
[Reproduced here in the original notes: Table 1 of Torous, Valkanov, and Yan (2004,
Journal of Business), which provides 95% confidence intervals for the largest autoregressive
root ρ of stochastic explanatory variables typically used in predictive regressions: the log
real dividend yield (constructed as in Fama and French (1988)), the default spread (the log
of the difference between monthly averaged annualized yields of bonds rated Baa and Aaa
by Moody's), the log of Pontiff and Schall's (1998) DJIA book-to-market ratio, the term
spread (the difference between annualized yields of Treasury bonds with maturity closest
to 10 years and 3-month Treasury bills), and the nominal 1-month Treasury bill rate.
The intervals are computed from the augmented Dickey-Fuller (ADF) statistic, with the
maximum lag length k chosen by the sequential pretesting method of Ng and Perron (1995),
for the full 1926-1994 sample and the pre-1952 and post-1952 subsamples. In almost every
case the 95% confidence intervals include the unit root ρ = 1. The exceptions are the log
dividend yield over 1926:12-1994:12, whose upper limit of 0.996 is nearly indistinguishable
from 1, and the term-spread series over the entire sample period, although the interval
based on the post-1952 subsample does contain 1.]
OLS estimator in Kothari and Shanken (1997) and Stambaugh (1999), the second order
bias-correction method in Amihud and Hurvich (2004), and the conservative bias-correction
method in Lewellen (2004), which assumes the true autoregressive coefficient of the
AR(1) to be close to one.
(2) Econometric inferences about the linear regression coefficient β. Inference for the
slope coefficient β is unreliable, due to the discontinuity in the asymptotic distribution
of the estimator of the I(1) or nearly I(1) autoregressive coefficient of the predictive
variable, which is often persistent and nonstationary. This is another difficulty in
modeling predictive regression models. In finite samples, this problem thwarts the
drawing of correct inferences about the slope coefficient even when the coefficient in an
AR(1) process is close to, but not necessarily equal to, one. In the literature, people
seek more accurate sampling distributions of test statistics. Some apply the exact finite-
sample theory under the assumption of normality (see Kothari and Shanken (1997),
Stambaugh (1999), and Lewellen (2004), among others) and others employ nearly I(1)
asymptotics to approximate the finite sample distributions. It is noteworthy that these
hypothesis testing procedures are all based on the biased OLS coefficient estimates.
Note that OLS estimates of the coefficient in predictive linear regression are also widely
used in the finance literature on out-of-sample forecasting; see Goyal and Welch (2003a,b).
(3) The instability of return forecasting models. In fact, in forecasting models based on the
dividend and earnings yields, the short interest rate, and the term spread and default
premium, much evidence has been found of instability of prediction in the
second half of the 1990s, which leads to the conclusion that the coefficients should
change over time; see Lettau and Ludvigsson (2001), Goyal and Welch (2003a), Paye
and Timmermann (2006), and Ang and Bekaert (2007).
However, existing approaches may not be appropriate in many real applications due to
restrictive assumptions on the functional forms in regression. In fact, the above studies
are mostly based on linear predictive models and produce biased and inefficient estimates,
especially when the predictive variable follows an AR(1) model with the innovation highly
correlated with the error series of the return (endogenous). In addition, most studies assume
that the coefficients of the state variables are fixed over time, which may not hold in practice.
Recent empirical studies have cast doubt upon the constant-coefficient assumption; see Goyal
and Welch (2003a) and Paye and Timmermann (2006).
To tackle the above problems, I would like to point out a host of new semiparametric
and nonparametric modeling techniques to reduce possible modeling biases in the parametric
predictive regression models and to capture the time-varying dynamics of the returns. New
models and cutting-edge techniques will be introduced to check the predictability of returns
and to test the stability of predictability, which have been puzzling us since the 1980s.
The proposed models belong to the nonlinear additive time series models and time-varying
coefficient models, but with possibly highly persistent, not really exogenous, and even
nonstationary financial predictors. As expected, they will avoid misspecification and produce
more accurate and efficient estimates of the true functions. Fundamental theoretical results
for the proposed methodology will be established, which will enrich the theory of statistics and
econometrics, enlarge the scope of application of nonparametric/semiparametric modeling,
and improve our understanding of the predictability of returns.
Finally, it is necessary to point out the differences between classical (standard)
nonparametric regression models [see Fan and Yao (2003)] and the nonlinear predictive
regression models proposed here. The biggest difference is that the latter involve endogenous
(predetermined) and persistent and nonstationary (nearly integrated or I(1)) predictive
variables, which make the asymptotic analysis of the associated estimators much more
challenging. As far as we are aware, there are no theoretical results available in the
literature for nonparametric/semiparametric predictive regression models.
Existing Methods for Predictive Regression Models
For simplicity, we follow the notation in Campbell and Yogo (2006) and consider the single-
variable predictive regression model formulated in (4.13), which postulates the structural
relationship between x_{t−1} and y_t.
The main effort in the literature is to estimate β efficiently and to test whether the returns
are predictable using the state variable, which amounts to testing the null hypothesis
H₀: β = 0, treating β₁ and ρ as nuisance parameters. Due to the non-zero correlation
between ε_t and u_t, this model violates the classical OLS assumption of independence
between the variable x_{t−1} and the error ε_t at all leads and lags. Therefore, the OLS
estimates β̂ and ρ̂ are biased, and
the biases of the two estimators are closely related, since E[β̂ − β] = γ E[ρ̂ − ρ], where
γ = σ_εu/σ_u². Furthermore, the persistent financial variable x_t renders difficulties in making
inference about predictability. Even if the predictor variable x_t is indeed I(0), the first-order
asymptotics can be a poor approximation when ρ is close to one. This is because of the
discontinuity in the asymptotic distribution at ρ = 1, where the variance of x_t diverges to
infinity. Inference about β based on the first order asymptotics, such as conventional t-tests,
is therefore invalid due to large size distortions; see the aforementioned papers for details.
In what follows, I briefly delineate the existing mainstream approaches to dealing with
the bias-correction and inference problems. Clearly, the finite sample bias in β̂ comes from
the bias of the autoregressive estimation of ρ and is magnified by γ. A common solution
is to obtain a more precise finite sample approximation to the bias of β̂ by utilizing the
bias-corrected estimate of ρ. This includes the following three methods:
(i) The first order bias-correction estimator in Stambaugh (1999), ρ̂_c = ρ̂ + (1 + 3ρ̂)/n,
where γ̂ = σ̂_εu/σ̂_u², and σ̂_εu and σ̂_u² are all obtained from OLS estimation, so that the
corresponding bias-corrected slope estimate is β̂_c = β̂ + γ̂ (1 + 3ρ̂)/n. This estimator is
obtained based on Kendall's (1954) analytical result, E(ρ̂ − ρ) = −(1 + 3ρ)/n + O(n⁻²).
(ii) The two-stage least squares method in Amihud and Hurvich (2004). Assuming ρ < 1
and a linear relationship between ε_t and u_t (indeed, the projection of ε_t onto u_t),

    ε_t = φ u_t + v_t,    (4.14)

the predictive regression model (4.13) can be rewritten as

    y_t = β₁ + β x_{t−1} + φ u_t + v_t,    (4.15)

where v_t is white noise independent of both x_t and u_t at all leads and lags. The
regression thus meets the classical OLS assumption of no endogeneity if u_t were
known. This motivated Amihud and Hurvich (2004) to obtain the OLS estimate of ρ
first and then to regress y_t on x_{t−1} and the fitted residuals û_t to obtain a bias-corrected
estimate β̂_c, which is indeed a second order bias-correction method.
(iii) The conservative bias-adjusted estimator in Lewellen (2004), β̂_c = β̂ + γ̂ (0.9999 − ρ̂),
which applies when ρ is very close to one. It can be shown easily that β̂_c must be the
least biased estimator of β when the true ρ is indeed very close to one.
While these methods provide evidence on the predictability of returns, they have at least
the following drawbacks. First, they work under a linear relationship between the return
and the state variables, which may not hold. Second, they do not consider instability issues
(coefficients in the predictive models might change over time). For example, they do not
determine whether the coefficients might change over time and where the return models may
have changed, nor do they consider the possibility of structural breaks or the time of their
occurrence. These important issues should be addressed. See, for example, Bossaerts and
Hillion (1999), Sullivan, Timmermann and White (1999), Marquering and Verbeek (2004),
and Cooper, Gutierrez and Marcum (2005). Furthermore, if financial prediction models are
evolving (unstable) over time, the economic significance of return predictability can only
be assessed once it is determined how widespread such instability is, both internationally
and over time, and the extent to which it affects the predictability of stock returns. To
investigate these problems, using a sample of excess returns for international equity indices,
Paye and Timmermann (2006) analyzed both how widespread the evidence of structural
breaks is and to what extent the breaks affect the predictability of stock returns. Also,
Inoue and Kilian (2004) showed that tests based on in-sample predictability typically are
much more powerful than out-of-sample tests, which generally use much smaller sample
sizes. Indeed, it is possible that the absence of strong out-of-sample predictability in stock
returns is entirely due to the use of relatively short evaluation samples. Using the full sample
for analysis, Paye and Timmermann (2006) argued that there is sufficient power to address
whether this explanation is valid or whether predictability genuinely has declined over time.
4.6 A Recent Perspective on Predictability of Asset Returns
To summarize the above and to see what the future should be in this direction, I strongly
recommend that you read the following paper by Professor Clive W.J. Granger, the Nobel
Laureate in Economics in 2003, which appeared in the Journal of Econometrics (2005). As
you might know, Professor Clive Granger received the Nobel Prize in Economics in 2003 for
his contributions to time series econometrics.
4.6.1 Introduction
Granger and Morgenstern (1970) published a book about the forecastability of stock market
prices, generally using lower frequency (say, daily or weekly or monthly) data to test
the random walk theory using autocorrelations and spectra. However, they did also consider
high-frequency transaction (say, tick-by-tick) data plus dividends and earnings in macroeconomic
relationships.
Unsurprisingly, we found that returns are difficult to forecast, except in the very short run
and the very long run. In the third of a century since the book appeared, empirical finance
has changed dramatically from just a few active workers to hundreds, maybe thousands.
The number of finance journals has grown from one to dozens, and the techniques have become
considerably more advanced. The availability of much more data and greatly increased
computer power has produced more impressive research publications. It can be argued that
many of these publications have relatively little practical usefulness. In fact, the purpose
of much of the work is unclear. Papers still keep appearing that reaffirm the random walk
theory. Of course, if a researcher had discovered a method of successfully forecasting returns,
she would not have published it, but would have accumulated considerable wealth. It may
well have happened, and we just do not know.
Occasionally, papers are published suggesting how returns can be forecast using a simple
statistical model, and presumably these techniques are the basis of the decisions of some
financial analysts. More likely the results are fragile: once you try to use them, they go away.
There now exist several excellent textbooks on financial econometrics, and they generally
do a good job of surveying the safe features of the most popular procedures. I plan to
take a rather more realistic and forward looking viewpoint on the available and forthcoming
techniques. I will use four sections, about conditional means, conditional variances, then
conditional distributions, and finally, the future.
4.6.2 Conditional Means
The original objective of much of the empirical financial research concentrated on mean
returns, conditional on previous returns, and possibly on other economic variables. Only
quite recently has the pair of return and volume been modelled jointly, as would be suggested
by a microeconomics text. Most of the techniques considered are those developed in statistical
and macro time series analysis, that is, autoregressive models, VARs, unit root models,
cointegration, seasonality, and the usual bundle of nonlinear models, including chaos, neural
networks, and various other nonlinear autoregressive models. Some of these models seem to
be relevant and helpful; most do not.
Quite a lot of attention has been given to a property known as long memory, in which
autocorrelations decline very slowly compared to any simple autoregressive model. It is
observed that the autocorrelations of measures of volatility, such as |r_t|^d, where r_t is a
return series and d is positive, have the long-memory property. This observation, which
is widespread and occurs for many assets and markets, has produced a misinterpretation.
Theoretical results show that the fractionally integrated (I(d)) model has the long-memory
property, and so it was concluded that any process with this property must be an I(d)
process. However, the conclusion is incorrect, as pointed out in Granger (2000) and elsewhere,
as other processes can produce long memory, particularly processes with breaks. If X_t is a
positive process, and therefore has positive mean, and if it is I(d), it must have a mean that is
proportional to t^d, and so will have a distinct trend in mean. As volatility has no such trend,
it cannot be I(d), especially as the estimated value of d is often found to be near 1/2. It
follows that the I(d) model is not appropriate for volatility, but a break model remains a
plausible candidate to explain the observed long-memory property.
There have been several papers pointing out that a stationary process with occasional
level shifts will have the long memory properties, for example Granger and Hyung (2004)
(based on Hyung's 1999 Ph.D. thesis) and Diebold and Inoue (2001). The breaks need to
be not too frequent but stochastic in magnitude. A break process considered by Hyung and
Franses (2002) takes the form

    y_t = m_t + ε_t,    m_t = m_{t−1} + q_t η_t,    (4.16)

with ε_t, η_t being zero mean, white noise, and where q_t follows an i.i.d. binomial distribution,
so that q_t = 1 with probability p and q_t = 0 with probability 1 − p. The expected number
of breaks is affected by p and the magnitude of σ_η². The break process for stock prices
produces returns with a longer-tailed distribution but volatilities, such as absolute returns,
that do not suffer from the trending problem. These volatilities are found to fit as well, if
not better in other respects, than an I(d) model, by Granger and Hyung (2004).
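Model (4.16) is straightforward to simulate, and the slowly decaying sample autocorrelations of the level-shift series can be seen directly. A Python sketch (the values of p and σ_η are illustrative only):

```python
import random

def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    T = len(x)
    mu = sum(x) / T
    c0 = sum((v - mu) ** 2 for v in x)
    return sum((x[t] - mu) * (x[t + k] - mu) for t in range(T - k)) / c0

rng = random.Random(6)
p, sigma_eta, T = 0.01, 1.0, 5000         # rare, stochastic level shifts
m, y = 0.0, []
for _ in range(T):
    q = 1 if rng.random() < p else 0      # q_t = 1 with probability p
    m += q * rng.gauss(0.0, sigma_eta)    # m_t = m_{t-1} + q_t * eta_t
    y.append(m + rng.gauss(0.0, 1.0))     # y_t = m_t + eps_t
# autocorrelations stay high out to very long lags (a long-memory look-alike)
print([round(acf(y, k), 2) for k in (1, 5, 25, 100)])
```

Although each y_t is just white noise around an occasionally shifting level, the sample autocorrelations decay far more slowly than any low-order AR model would allow, which is exactly the misinterpretation discussed above.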
4.6.3 Conditional Variances
If one wants to describe a distribution, just knowing the mean is totally inadequate; knowing
the mean and variance is clearly better. For those of us interested in empirical studies, our
immediate problem is that variance is not easily observed. One can form a sum of squared
deviations of returns around a mean, but they take time to accumulate. The ARCH class
of models partly circumvents this problem and provides quite up-to-date values for the
variance. The purpose of measuring variance is somewhat less clear, particularly as returns
have been shown, consistently, to have non-Gaussian distributions. The part of economics
that discusses uncertainty, risk, and insurance has for many years emphasized that measures
of volatility based on E(|r_t|^d) for positive d are quite inappropriate measures of risk. The
topic is mentioned in Granger (2002). The problem is easy to illustrate. Suppose a small
portfolio experiences a large negative shock to an asset; this will be treated as an increase
in risk, as it increases the chance of selling the asset at a lower price than its purchase price.
However, if an asset receives a large positive price shock, this is considered an increase in
uncertainty, but not in risk. Nevertheless, both shocks will produce an increase in variance,
which treats movements in either tail of the distribution equally, although only those on one
side are undesirable. Measurements of risk based on quantiles, such as Value-at-Risk, or
VaR, avoid such problems, as does the semi-variance suggested by Markowitz in his original
book on portfolio theory.
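The asymmetry argument is easy to reproduce numerically: add one large positive or one large negative shock to the same return sample and compare the variance with two downside measures. A Python sketch (all data are simulated and illustrative):

```python
import random

def variance(r):
    mu = sum(r) / len(r)
    return sum((x - mu) ** 2 for x in r) / (len(r) - 1)

def semi_variance(r):
    """Markowitz-style semi-variance: mean squared shortfall below the mean."""
    mu = sum(r) / len(r)
    return sum((x - mu) ** 2 for x in r if x < mu) / (len(r) - 1)

def hist_var(r, alpha=0.05):
    """Historical Value-at-Risk: the loss at the alpha-quantile of returns."""
    return -sorted(r)[int(alpha * len(r))]

random.seed(7)
base = [random.gauss(0.0, 1.0) for _ in range(1000)]
up_shock = base + [8.0]      # one large positive shock ("uncertainty")
down_shock = base + [-8.0]   # one large negative shock ("risk")
# Variance rises by almost the same amount in both cases...
print(round(variance(up_shock), 3), round(variance(down_shock), 3))
# ...but only the negative shock moves the downside measures appreciably.
print(round(semi_variance(up_shock), 3), round(semi_variance(down_shock), 3))
print(round(hist_var(up_shock), 3), round(hist_var(down_shock), 3))
```

Variance cannot tell the two cases apart, while the semi-variance and the VaR respond only to the shock in the undesirable tail.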
4.6.4 Distributions
The next obvious step is towards using predictive, or conditional, distributions. Major
problems remain, particularly with parametric forms and in the multivariate case. For
the center of the distribution a mixture of Gaussians appears to work well, but these do
not represent tail probabilities in a satisfactory fashion. By thinking about a multivariate
distribution written in terms of marginals and a rectangular copula, it seems that all tail
properties will come from the marginals. A very practical time-series approach to conditional
distributions is to model quantiles, which can take autoregressive forms and have breaks, unit
roots, and other driving variables. Modeling and estimation are not very difficult, and in
practice the problem of estimated quantiles crossing appears not to be difficult (see Granger
and Sin, 2000). The observed long-memory properties of volatility should be observed in the
quantiles due to breaks.
4.6.5 The Future
The immediate future in any active academic field always involves topics that have already
started. I believe that conditional distributions will continue to be a major subject as finance
learns how to generalize its fundamental theories into distributional forms: arbitrage, portfolio
theory, efficient market theory and its consequences, the Black-Scholes formula, and so forth.
This will be an exciting period, and very general results will appear and new testing methods
will be devised. It is also likely that there will be structural breaks in the present framework,
but such breaks are difficult to forecast, which is the basic element of their nature. However,
there are two that I think may be seen: the first is a new approach to volatility and the second
is a reformulation of basic financial theory. Most of the old literature on prices, returns,
and volatility had, basically, a linear foundation. From studying the models suggested by
these approaches a number of stylized facts have been accumulated, these being empirical
facts that have been observed to occur for many (possibly all) assets in most (possibly all)
markets, most time periods, and most data frequencies. A list of these stylized facts would
include:
(i) Returns are nearly white noise; that is, they have little or no serial (auto)correlation.
(ii) The autocorrelations of r_t² and |r_t|^d decline slowly with increasing lag (the
long-memory effect).
(iii) Similarly, the autocorrelations of |r_t|^d decline slowly, with the slowest decline for
d = 1 (the Taylor effect).
(iv) Autocorrelations of sign(r_t) are all small and insignificant.
(v) If one fits a GARCH(1,1) model to the series, then α̂ + β̂ ≈ 1, with the usual
notation.
In a remarkable paper, Yoon (2003) shows, largely by simulation, that the simple stochastic
unit root model

    P_t = (1 + a_t) P_{t−1} + ε_t,

where P_t is the log stock price and a_t, ε_t are independent white noise series, produces return
series that have all of the stylized facts observed with actual data. It does not imply that
actual log stock prices are generated by this model, but it does suggest that it can capture
many realistic properties in a very simple model, and so deserves further study. Yoon's model
is an example of a stochastic unit root process as discussed by Granger and Swanson (1997),
Leybourne, McCabe and Mills (1996), and Leybourne, McCabe and Tremayne (1996). Yoon
considers a particularly simple case where a_t is a zero mean i.i.d. sequence and ε_t is a zero
mean white noise.
Let me finally turn to an area in which I do not claim to have much special knowledge,
continuous-time finance theory. I have looked over a number of books in the area and note
that much of the work starts with an assumption that a price or a return can be written in
terms of a standard diffusion, which is based on a Gaussian distribution.
This immediately brings up warning signals, because much of early econometrics used a
similar Gaussian assumption, just for mathematical convenience, and without proper test-
ing. Occasionally, it was asked whether a marginal distribution could pass a test with a null of
Gaussianity, but I never saw a joint test of normality, which was really needed for much of
the theory to be operative. For the continuous-time theory there is effectively no evaluation
of the theory using empirical tests, because there is no continuous-time data. When the
theory is brought over to discrete time, it is unclear if it continues to hold. There could be
a bifurcation in going from continuous to discrete time. Itô's lemma, which uses a Gaussian
assumption, I believe, need no longer work in the discrete-time setting. In fact, the majority of
the empirical work that I have seen appears to find that in the highest-frequency data the
best models do not agree with continuous-time theory.
Some recent work by Aït-Sahalia (2002) suggests that the discrete data results are more
consistent with jump-diffusions, that is, diffusions with breaks, rather than standard diffu-
sions. If further evidence for that result accumulates, it is likely that the majority of
current financial theory will have to be rewritten, with jump-diffusion replacing diffusion,
and with some consequent changes in theorems and results. As a great deal of human
capital would be devalued by such a development, it will certainly be opposed by many editors
and referees, as happens with all radical new ideas.
4.7 Comments on Predictability Based on Nonlinear
Models
The aforementioned predictability of asset returns is mainly based on linear models, not much
on nonlinear models. As advocated by Granger (2005), nonlinear conditional mean functions
might be a good avenue to explore, as in Hyung and Franses (2002) or model (4.16), which
can be regarded as a threshold-type model, a special case of nonlinear models. Of course,
other types of nonlinear forms warrant further study, or they can be regarded as
a future research topic. To explore a possible research topic, you may be interested in
the data set in the file SP-A.txt [the first column is the return for the S&P
500 CRSP value-weighted index, the second column is the log dividend-price ratio, and the
third column is the log earnings-price ratio], which can be downloaded from the course web
site. As mentioned in Chapter 2, Hong and Lee (2003) conducted studies on exchange rates
and found that some of them are predictable based on nonlinear time series models.
There are many ongoing research activities in this direction. See Chapter 4 of Tsay (2005),
Chapter 12 of Campbell, Lo and MacKinlay (1997), and the book by Fan and Yao (2003).
If we have time, we will come back to this topic later.
4.8 Problems
4.8.1 Exercises for Homework
1. Please download weekly (daily) price data for any stock, for example, Microsoft (MSFT)
stock prices (P_t) for 03/13/1986 - 02/15/2008.
2. Estimate the CER model for Microsoft using OLS estimation and construct the series
of residuals: ê_t = r_t − μ̂, where μ̂ is the sample mean return.
(a) Compute the autocorrelation function (ACF) of the residuals, {ρ̂_k}_{k=1}^{10}. Graph the
autocorrelation coefficients and confidence intervals around them. What does this
suggest about autocorrelation in returns and the predictability of returns?
(b) Test the following null hypotheses: (i) H_0: ρ_1 = 0, (ii) H_0: ρ_2 = 0, and (iii)
H_0: ρ_7 = 0.
(c) Use the modified Ljung-Box Q-test defined in equation (4.7) for testing autocorre-
lation. In testing, set the number of autocorrelations used to m = 10. This modified
Q-test will give you different results from the Q-test in the previous
problem because the test statistic is different.
(d) Use the variance ratio statistic VR(n) in equation (4.8) to test for predictability in
stock returns. The variance ratio statistic can be computed using R. The program
also computes the standardized variance ratio statistic, which follows a standard
normal distribution. Present your results and comment on the predictability of MSFT
stock returns.
(e) Consider the following model for MSFT prices: p_t = p_{t−1} + e_t. Use the CJ test
statistic to test the predictability of MSFT prices. Are your results as expected?
You may refer to your results of the significance test in Problem 5 in Chapter 3.
3. Use autocorrelation tests and variance ratio tests to check predictability of IBM,
Coca-Cola, and Glaxo stock returns, for both weekly and daily data, for the period 03/13/1986 -
2/15/2008. Comment on your results.
4. Use autocorrelation tests and variance ratio tests to check predictability of the S&P500
index and the DJIA index, for weekly and daily data, for the period 03/13/1986 - 2/15/2008.
Comment on your results.
5. Assume that you have an equally weighted portfolio that consists of four stocks: IBM,
Microsoft, Coca-Cola, and Glaxo, for both weekly and daily data. For the period 03/13/1986
- 2/15/2008, construct the returns of this portfolio and conduct autocorrelation
and variance ratio tests of predictability. Comment on your results.
4.8.2 R Codes
# 2-13-2008
# R code for computing the p-value for Cowles-Jones test
data=read.csv(file="c:/zcai/res-teach/econ6219/Bank-of-America.csv",header=T)
x=data[,5] # get the closing prices
x=rev(x) # reverse
n=length(x) # sample size
rt=diff(log(x)) # log return
rt_0=rt-mean(rt) # centered returns
n1=length(rt_0)
I_t=(rt_0>0) # indicator for return is positive
n2=n1-1
I_t1=I_t[2:n1]
Y_t=I_t[1:n2]*I_t1+(1-I_t[1:n2])*(1-I_t1) # compute Y_t
n_s=sum(Y_t) # number of Y_t=1
n_r=n2-n_s
cj=n_s/n_r # CJ statistic
z=sqrt(n2)*abs(cj-1) # Z-score
p_value=2*(1-pnorm(z)) # p-value
print(c("The p-value for Cowles-Jones test is", p_value))
# Variance Ratio Test
library(vrtest) # load package
kvec1=c(2,5,10,20)
LM_test=Lo.Mac(rt,kvec1)
print(c("Results for Lo-MacKinlay test:", LM_test))
4.8.3 Project #1
1. Read the article "Efficient Capital Markets: II" by Fama (1991).
(a) Briefly describe the main results of the literature on the predictability of short-run
returns.
(b) Briefly describe the main results of the literature on the predictability of long-run
returns.
2. Read Chapter 7 of Taylor (2005). Briefly explain the main findings about the pre-
dictability of equities, currencies, and futures based on trading-rules analysis.
3. After you read the survey paper by Granger (2005), please think about some possible
and interesting projects in this area that you can do, and write a short report on your
thoughts.
4. After you read the papers by Campbell and Yogo (2006) and Paye and Timmermann
(2006), and other papers related to this topic, please think about some possible and
interesting projects in this area that you can do in your research. First, please explore
the data set SP-A.txt to see what you can find. Say, consider a possible relationship
between the return and the log dividend-price ratio, or a relationship between the return
and the log earnings-price ratio. The first column is the excess return for the S&P 500 CRSP
value-weighted index, the second column is the log dividend-price ratio, and the third
column is the log earnings-price ratio. The sample period is 1880-2002 at a yearly fre-
quency. Write a report on your findings based on your analysis of this data set.
(a) Based on what you have learned from our class, please re-analyze this data set.
Can you find any problems? What are your new findings?
(b) Did the previous models fit the data?
(c) For your new findings, please describe your possible solutions to the problems.
4.9 References
Amihud, Y. and C. Hurvich (2004). Predictive regressions: A reduced-bias estimation
method. Journal of Financial and Quantitative Analysis, 39, 813-841.
Aït-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diffusions: A
closed-form approximation approach. Econometrica, 70, 223-262.
Aït-Sahalia, Y. and M. Brandt (2001). Variable selection for portfolio choice. Journal of
Finance, 56, 1297-1350.
Ang, A. and G. Bekaert (2007). Stock return predictability: Is it there? Review of Financial
Studies, 20, 651-707.
Barberis, N. (2000). Investing for the long run when returns are predictable. Journal of
Finance, 55, 225-264.
Balvers, R.J., T.F. Cosimano and B. McDonald (1990). Predicting stock returns in an
efficient market. Journal of Finance, 45, 1109-1128.
Belaire-Franch, G. and D. Contreras (2004). Ranks and signs-based multiple variance ratio
tests. Working paper, University of Valencia.
Bierens, H.J. (1982). Consistent model specification tests. Journal of Econometrics, 20,
105-134.
Bierens, H.J. (1984). Model specification testing of time series regressions. Journal of
Econometrics, 26, 323-353.
Bierens, H.J. (1990). A consistent conditional moment test of functional form. Economet-
rica, 58, 1443-1458.
Bierens, H.J. and W. Ploberger (1997). Asymptotic theory of integrated conditional mo-
ment tests. Econometrica, 65, 1129-1151.
Bossaerts, P. and P. Hillion (1999). Implementing statistical criteria to select return fore-
casting models: what do we learn? Review of Financial Studies, 12, 405-428.
Box, G. and D. Pierce (1970). Distribution of residual autocorrelations in autoregressive
integrated moving average time series models. Journal of the American Statistical
Association, 65, 1509-1526.
Brandt, M.W. (1999). Estimating portfolio and consumption choice: A conditional Euler
equations approach. Journal of Finance, 54, 1609-1646.
Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997). The Econometrics of Financial
Markets. Princeton University Press, Princeton, NJ. (Chapter 2).
Campbell, J. and R. Shiller (1988). The dividend-price ratio and expectations of future
dividends and discount factors. Review of Financial Studies, 1, 195-227.
Campbell, J.Y. and L. Viceira (1998). Consumption and portfolio decisions when expected
returns are time varying. Quarterly Journal of Economics, 114, 433-495.
Campbell, J. and M. Yogo (2006). Efficient tests of stock return predictability. Journal of
Financial Economics, 81, 27-60.
Cavanagh, C.L., G. Elliott and J.H. Stock (1995). Inference in models with nearly integrated
regressors. Econometric Theory, 11, 1131-1147.
Chow, K.V. and K.C. Denning (1993). A simple multiple variance ratio test. Journal of
Econometrics, 58, 385-401.
Christopherson, J.A., W. Ferson and D.A. Glassman (1998). Conditioning manager alphas
on economic information: another look at the persistence of performance. Review of
Financial Studies, 11, 111-142.
Cooper, M., R.C. Gutierrez Jr. and W. Marcum (2005). On the predictability of stock
returns in real time. Journal of Business, 78, 469-499.
Cowles, A. and H. Jones (1937). Some posterior probabilities in stock market action.
Econometrica, 5, 280-294.
Cutler, D.M., J.M. Poterba and L.H. Summers (1991). Speculative dynamics. Review of
Economic Studies, 58, 529-546.
De Jong, R.M. (1996). The Bierens test under data dependence. Journal of Econometrics,
72, 1-32.
Deo, R.S. (2000). Spectral tests of the martingale hypothesis under conditional het-
eroscedasticity. Journal of Econometrics, 99, 291-315.
Diebold, F.X. and A. Inoue (2001). Long memory and regime switching. Journal of Econo-
metrics, 105, 131-159.
Diebold F.X. and R.S. Mariano (1995). Comparing predictive accuracy. Journal of Business
and Economic Statistics, 13(3), 253-263.
Dominguez, M.A. and I.N. Lobato (2000). A consistent test for the martingale difference
hypothesis. Working Paper, Instituto Tecnologico Autonomo de Mexico.
Durlauf, S.N. (1991). Spectral based testing of the martingale hypothesis. Journal of
Econometrics, 50, 355-376.
Elliott, G. and J.H. Stock (1994). Inference in time series regression when the order of
integration of a regressor is unknown. Econometric Theory, 10, 672-700.
Fama, E.F. (1970). Efficient capital markets: A review of theory and empirical work.
Journal of Finance, 25, 383-417.
Fama, E.F. (1990). Stock returns, real returns, and economic activity. Journal of Finance,
45, 1089-1108.
Fama, E.F. (1991). Efficient capital markets: II. The Journal of Finance, 46, 1575-1617.
Fama, E.F. and K.R. French (1988). Dividend yields and expected stock returns. Journal
of Financial Economics, 22, 3-26.
Fan, J. and Q. Yao (2003). Nonlinear Time Series: Nonparametric and Parametric Methods.
Springer, New York.
Ferson, W. and C.R. Harvey (1991). The variation of economic risk premiums. Journal of
Political Economy, 99, 385-415.
Ferson, W.E. and R.W. Schadt (1996). Measuring fund strategy and performance in chang-
ing economic conditions. Journal of Finance, 51, 425-461.
Ghysels, E. (1998). On stable factor structures in the pricing of risk: do time-varying betas
help or hurt? Journal of Finance, 53, 549-574.
Gourieroux, C. and J. Jasiak (2001). Financial Econometrics: Problems, Models, and
Methods. Princeton University Press, Princeton, NJ. (Chapter 2)
Goyal, A. and I. Welch (2003a). Predicting the equity premium with dividend ratios.
Management Science, 49, 639-654.
Goyal, A. and I. Welch (2003b). A note on Predicting Returns with Financial Ratios.
Working Paper.
Granger, C.W.J. (2000). Current perspectives on long memory processes. Chung-Hua
Series of Lectures, No. 26, Institute of Economics, Academia Sinica, Taiwan.
Granger, C.W.J. (2002). Some comments on risk. Journal of Applied Econometrics, 17,
447-456.
Granger, C.W.J. (2005). The past and future of empirical finance: some personal comments.
Journal of Econometrics, 129, 35-40.
Granger, C.W.J. and N. Hyung (2004). Occasional structural breaks and long memory.
Journal of Empirical Finance, 11, 399-421.
Granger, C.W.J. and O. Morgenstern (1970). Predictability of Stock Market Prices. Heath
Lexington Books, Lexington, MA.
Granger, C.W.J. and C.-Y. Sin (2000). Modelling the absolute returns of different stock
indices: Exploring the forecastability of an alternative measure of risk. Journal of Fore-
casting, 19, 277-298.
Granger, C.W.J. and N. Swanson (1997). An introduction to stochastic unit root processes.
Journal of Econometrics, 80, 35-61.
Hall, R.E. (1978). Stochastic implications of the life cycle-permanent income hypothesis:
Theory and evidence. Journal of Political Economy, 86, 971-987.
Hamilton, J. (1994). Time Series Analysis. Princeton University Press, Princeton, NJ.
Hong, Y. (1996). Consistent testing for serial correlation of unknown form. Econometrica,
64, 837-864.
Hong, Y. (1999). Hypothesis testing in time series via the empirical characteristic func-
tion: A generalized spectral density approach. Journal of the American Statistical
Association, 94, 1201-1220.
Hong, Y. (2001). A test for volatility spillover with application to exchange rates. Journal
of Econometrics, 103, 183-224.
Hong, Y. and T.-H. Lee (2003). Inference on predictability of foreign exchange rates via
generalized spectrum and nonlinear time series models. The Review of Economics and
Statistics, 85, 1048-1062.
Hyung, N. and P.H. Franses (2002). Inflation rates: Long memory, level shifts, or both?
Econometric Institute, Erasmus University Rotterdam, Report 2002-08.
Kendall, M.G. (1938). A new measure of rank correlation. Biometrika, 30, 81-93.
Kandel, S. and R. Stambaugh (1996). On the predictability of stock returns: an asset
allocation perspective. Journal of Finance, 51, 385-424.
Keim, D.B. and R.F. Stambaugh (1986). Predicting returns in the stock and bond markets.
Journal of Financial Economics, 17, 357-390.
Kim, J.H. and A. Shamsuddin (2004). Are Asian stock markets efficient? Evidence from
new multiple variance ratio tests. Working Paper, Monash University.
Kothari, S.P. and J. Shanken (1997). Book-to-market, dividend yield, and expected market
returns: A time-series analysis. Journal of Financial Economics, 44, 169-203.
Kuan, C.-M. and W.-M. Lee (2004). A new test of the martingale difference hypothesis.
Studies in Nonlinear Dynamics & Econometrics, 8, Issue 4, Article 1.
LeRoy, S.F. (1989). Efficient capital markets and martingales. Journal of Economic Liter-
ature, 27, 1583-1621.
Lettau, M. and S. Ludvigsson (2001). Consumption, aggregate wealth, and expected stock
returns. Journal of Finance, 56, 815-849.
Leybourne, S., M. McCabe and M. Mills (1996). Randomized unit root processes for
modeling and forecasting financial time series: theory and applications. Journal of
Forecasting, 15, 153-270.
LeBaron, B. (1997). Technical trading rules and regime shifts in foreign exchange. In
Advances in Trading Rules (eds E. Acar and S. Satchell), pp. 5-40. Oxford: Butter-
worth-Heinemann.
LeBaron, B. (1999). Technical trading rule profitability and foreign exchange intervention.
Journal of International Economics, 49, 125-143.
Lewellen, J. (2004). Predicting returns with financial ratios. Journal of Financial Eco-
nomics, 74, 209-235.
Leybourne, S., M. McCabe and J. Tremayne (1996). Can economic time series be differ-
enced to stationarity? Journal of Business and Economic Statistics, 14, 435-446.
Lo, A.W. and A.C. MacKinlay (1999). A Non-Random Walk Down Wall Street. Princeton
University Press, Princeton, NJ.
Lo, A.W. and A.C. MacKinlay (1988). Stock market prices do not follow random walks:
Evidence from a simple specification test. Review of Financial Studies, 1, 41-66.
Lo, A.W. and A.C. MacKinlay (1989). The size and power of the variance ratio test in
finite samples: A Monte Carlo investigation. Journal of Econometrics, 40, 203-238.
Lobato, I., J.C. Nankervis and N.E. Savin (2001). Testing for autocorrelation using a
modified Box-Pierce Q test. International Economic Review, 42, 187-205.
Ljung, G. and G. Box (1978). On a measure of lack of fit in time series models. Biometrika,
65, 297-303.
Mankiw, N.G. and M. Shapiro (1986). Do we reject too often? Small sample properties of
tests of rational expectation models. Economics Letters, 20, 139-145.
Marquering, W. and M. Verbeek (2004). The economic value of predicting stock index
returns and volatility. Journal of Financial and Quantitative Analysis, 39, 407-429.
Nelson, C.R. and M.J. Kim (1993). Predictable stock returns: The role of small sample
bias. Journal of Finance, 48, 641-661.
Nelsen, R.B. (1998). An Introduction to Copulas, Springer-Verlag, New York.
Paye, B.S. and A. Timmermann (2006). Instability of return prediction models. Journal
of Empirical Finance, 13, 274-315.
Polk, C., S. Thompson and T. Vuolteenaho (2006). Cross-sectional forecasts of the equity
premium. Journal of Financial Economics, 81, 101-141.
Poterba, J. and L. Summers (1988). Mean reversion in stock returns: Evidence and impli-
cations. Journal of Financial Economics, 22, 27-60.
Richardson, M. and T. Smith (1991). Tests of financial models in the presence of overlapping
observations. The Review of Financial Studies, 4, 227-254.
Rossi, B. (2007). Expectation hypothesis tests and predictive regressions at long horizons.
Econometrics Journal, 10, 1-26.
Schwert, G.W. (1990). Stock returns and real activity: A century of evidence. Journal of
Finance, 45, 1237-1257.
Spearman, C. (1904). The proof and measurement of association between two things.
The American Journal of Psychology, 15, 72-101.
Stambaugh, R. (1986). Bias in regressions with lagged stochastic regressors. Working
Paper, University of Chicago.
Stambaugh, R. (1999). Predictive regressions. Journal of Financial Economics, 54, 375-
421.
Sullivan, R., A. Timmermann and H. White (1999). Data snooping, technical trading rule
performance, and the bootstrap. Journal of Finance, 54, 1647-1692.
Taylor, S. (2005). Asset Price Dynamics, Volatility, and Prediction. Princeton University
Press, Princeton, NJ. (Chapters 3 and 7)
Torous, W., R. Valkanov and S. Yan (2004). On predicting stock returns with nearly
integrated explanatory variables. Journal of Business, 77, 937-966.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons,
New York. (Chapter 2)
Whang, Y.-J. (2000). Consistent bootstrap tests of parametric regression functions. Journal
of Econometrics, 98, 27-46.
Whang, Y.-J. (2001). Consistent specication testing for conditional moment restrictions.
Economics Letters, 71, 299-306.
Whang, Y.-J. and J. Kim (2003). A multiple variance ratio test using subsampling. Eco-
nomics Letters, 79, 225-230.
Wright, J.H. (2000). Alternative variance-ratio tests using ranks and signs. Journal of
Business & Economic Statistics, 18, 1-9.
Yoon, G. (2003). A simple model that generates stylized facts of returns. Pusan National
University, Korea, UCSD Working Paper, San Diego, CA.
Zivot, E. (2002). Lecture Notes on Applied Econometric Modeling in Finance. The web
link is: http://faculty.washington.edu/ezivot/econ483/483notes.htm
Chapter 5
Market Model
5.1 Introduction
The single index model is a purely statistical model used to explain the behavior of asset
returns. It is known as Sharpe's single index model (SIM), the market model, the
single factor model, or the β-representation in the capital asset pricing model (CAPM) /
arbitrage pricing theory (APT) context. The single index model has the form of a simple bivariate
linear regression model:
    r_{it} = α_i + β_{i,m} r_{mt} + e_{it},   1 ≤ i ≤ N; 1 ≤ t ≤ T,   (5.1)
where r_{it} is the continuously compounded return on asset i (i = 1, . . . , N) between time
periods t−1 and t, and r_{mt} is the continuously compounded return on a market index portfolio
or an individual stock return.
The intuition behind the single index model is as follows. The market index return r_{mt} captures
macro or market-wide systematic risk factors. This type of risk, called systematic risk or
market risk, cannot be eliminated in a well diversified portfolio. The random error term e_{it}
captures micro or firm-specific risk factors that affect an individual asset return and that are
not related to macro events. This type of risk is called firm-specific risk, idiosyncratic risk,
or non-market risk, and it can be eliminated in a well diversified portfolio.
The CER model is a special case of the single index model where β_{i,m} = 0 for all i. In
this case, α_i = μ_i. Also, the single index model can be extended to capture multiple factors:
    r_{it} = α_i + β_{i,1} f_{1t} + β_{i,2} f_{2t} + ... + β_{i,k} f_{kt} + e_{it},
where f_{jt} denotes the j-th systematic factor, β_{i,j} denotes asset i's loading on the j-th factor,
and e_{it} denotes the random component independent of all the systematic factors.
The single index model is heavily used in empirical finance. It is used to estimate the expected
returns, variances, and covariances that are needed to implement portfolio theory. It is used
as a model of the normal or usual rate of return on an asset for use in event studies.
An excellent overview of event studies is given in Chapter 4 of CLM, and we will study them in
detail in the next chapter. Cochrane (2002) provides a detailed mathematical derivation
of single index models. As advocated by Cochrane (2002), the single index model is used to
explain the variation in average returns across assets, not to predict returns from
variables seen ahead of time.
5.2 Assumptions About Asset Returns
The following assumptions are made about the probability distribution of r_{it} for i = 1, . . . , N
assets over the time horizon t = 1, . . . , T:
1. (r_{it}, r_{mt}) are jointly normally distributed for i = 1, . . . , N and t = 1, . . . , T.
2. E(e_{it}) = 0 for i = 1, . . . , N and t = 1, . . . , T.
3. Var(e_{it}) = σ²_{e,i} for i = 1, . . . , N (constant variance, or homoskedasticity).
4. Cov(e_{it}, r_{mt}) = 0 for i = 1, . . . , N and t = 1, . . . , T (errors uncorrelated with the market return).
5. Cov(e_{it}, e_{js}) = 0 for all t, s and i ≠ j (errors uncorrelated across assets and time).
6. e_{it} is normally distributed.
5.3 Unconditional Properties of Returns
Under the above assumptions, we can show easily that
    E(r_{it}) = μ_i = α_i + β_{im} E(r_{mt}) = α_i + β_{im} μ_m,
    Cov(r_{it}, r_{jt}) = σ_{ij} = σ²_m β_{im} β_{jm},
    Var(r_{it}) = σ²_i = β²_{im} Var(r_{mt}) + Var(e_{it}) = β²_{im} σ²_m + σ²_{ei},
so that
    β_{im} = Cov(r_{it}, r_{mt}) / Var(r_{mt}) = σ_{im} / σ²_m.
Further,
    r_{it} ~ N(μ_i, σ²_i)  and  r_{mt} ~ N(μ_m, σ²_m).
There are several things to notice:
1. The unconditional expected return on asset i, μ_i, is constant. This relationship may
be used to create a prediction of expected returns over some future period. For example,
suppose α_i = 0.015, β_{im} = 0.7, and a market analyst forecasts μ_m = 0.05. Then
the forecast for the expected return on the asset is
    μ_i = 0.015 + 0.7 × 0.05 = 0.05.
2. The unconditional variance of the return on asset i is constant and consists of variability
due to the market index, β²_{im} σ²_m, and variability due to specific risk, σ²_{ei}. Notice that
σ²_i = β²_{im} σ²_m + σ²_{ei}, or
    β²_{im} σ²_m / σ²_i + σ²_{ei} / σ²_i = 1.
Then, one can define
    R²_i = β²_{im} σ²_m / σ²_i = 1 − σ²_{ei} / σ²_i
as the proportion of the total variability of r_{it} that is attributable to variability in
the market index. One can think of R²_i as measuring the proportion of risk in asset i
that cannot be diversified away when forming a portfolio; it can be computed as the
coefficient of determination from regression (5.1). Similarly,
    1 − R²_i = σ²_{ei} / σ²_i
is the proportion of the variability of r_{it} that is due to firm-specific factors. One can
think of 1 − R²_i as measuring the proportion of risk in asset i that can be diversified
away. Sharpe (1970) computed R²_i for thousands of assets and found that for a typical
stock R²_i ≈ 0.30, which is regarded as a rule of thumb in applications.
5.4 Conditional Properties of Returns
Suppose that an analyst observes the return on the market portfolio at period t, r_{mt}. The
properties of the single index model conditional on r_{mt} are:
    E(r_{it} | r_{mt}) = μ_{i|r_{mt}} = α_i + β_{im} r_{mt},  Var(r_{it} | r_{mt}) = Var(e_{it}) = σ²_{ei},  Cov(r_{it}, r_{jt} | r_{mt}) = 0.   (5.2)
Property (5.2) shows that once the movements in the market are controlled for, assets are
uncorrelated. The single index model for the entire set of N assets may be conveniently
represented using matrix algebra:
    R_t = α + β r_{mt} + e_t,   t = 1, . . . , T,
where R_t = (r_{1t}, . . . , r_{Nt})′, e_t = (e_{1t}, e_{2t}, . . . , e_{Nt})′, α = (α_1, . . . , α_N)′, and β = (β_{1m}, . . . , β_{Nm})′.
The variance-covariance matrix may be computed as:
    Var(R_t) = E[(R_t − E R_t)(R_t − E R_t)′] = Σ = σ²_m β β′ + D,
where Σ is the N × N variance-covariance matrix of all stock returns and D is a diagonal matrix
with σ²_{ei} along the main diagonal. Suppose that the single index model describes the returns
on two assets:
    r_{1t} = α_1 + β_{1m} r_{mt} + e_{1t},  and  r_{2t} = α_2 + β_{2m} r_{mt} + e_{2t}.
Consider forming a portfolio of these two assets. Let w_1 denote the share of wealth in asset
1 and w_2 the share of wealth in asset 2, with w_1 + w_2 = 1. It can be shown that the return on this
portfolio is:
    r_{pt} = w_1 r_{1t} + w_2 r_{2t} = α_p + β_{pm} r_{mt} + e_{pt},
where α_p = w_1 α_1 + w_2 α_2, β_{pm} = w_1 β_{1m} + w_2 β_{2m}, and e_{pt} = w_1 e_{1t} + w_2 e_{2t}. This additivity
result of the single index model holds for portfolios of any size; i.e., for a portfolio consisting
of N assets,
    α_p = Σ_{i=1}^{N} w_i α_i,  β_p = Σ_{i=1}^{N} w_i β_{im},  and  e_{pt} = Σ_{i=1}^{N} w_i e_{it}.
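The matrix form Var(R_t) = σ²_m β β′ + D and the portfolio additivity result can be verified together in a few lines. This is a sketch in Python/numpy (the notes use R); the betas, variances, and portfolio weights are arbitrary illustrative numbers.

```python
import numpy as np

# Illustrative single index parameters for N = 3 assets (made-up numbers).
beta = np.array([0.8, 1.1, 0.6])          # asset betas
sig2_m = 0.05**2                          # market variance
sig2_e = np.array([0.06, 0.04, 0.09])**2  # idiosyncratic variances

# Var(R_t) = sig2_m * beta beta' + D, with D diagonal
Sigma = sig2_m * np.outer(beta, beta) + np.diag(sig2_e)

# Portfolio additivity: beta_p = sum_i w_i beta_i, so
# Var(r_p) = beta_p^2 sig2_m + sum_i w_i^2 sig2_e_i
w = np.array([0.5, 0.3, 0.2])
beta_p = w @ beta
var_p_direct = w @ Sigma @ w
var_p_additive = beta_p**2 * sig2_m + np.sum(w**2 * sig2_e)
print(round(var_p_direct, 8), round(var_p_additive, 8))
```

The two variance calculations agree exactly, since w′(σ²_m β β′ + D) w = σ²_m (w′β)² + Σᵢ wᵢ² σ²_{ei}.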
5.5 Beta as a Measure of Portfolio Risk
The individual specific risk of an asset, measured by the asset's own variance, can be diver-
sified away in well diversified portfolios, whereas the covariance of the asset with the other
assets in the portfolio cannot be completely diversified away. Consider an equally weighted
portfolio of 99 stocks, with the return on this portfolio denoted r_{99} and its variance σ²_{99}. Next,
consider adding one more stock, say IBM, to the portfolio. Let r_{IBM} and σ²_{IBM} denote the
return and variance of IBM, and let σ_{99,IBM} = Cov(r_{99}, r_{IBM}). What is the contribution
of IBM to the portfolio risk, as measured by the portfolio variance? A new equally weighted
portfolio is constructed as:
    r_{100} = 0.99 r_{99} + 0.01 r_{IBM}.
The variance of this portfolio is:
    σ²_{100} = 0.99² σ²_{99} + 0.01² σ²_{IBM} + 2 × 0.99 × 0.01 σ_{99,IBM} ≈ 0.98 σ²_{99} + 0.02 σ_{99,IBM}.   (5.3)
Define
    β_{99,IBM} = Cov(r_{99}, r_{IBM}) / Var(r_{99}) = σ_{99,IBM} / σ²_{99}.
Then
    σ_{99,IBM} = β_{99,IBM} σ²_{99},
and (5.3) becomes:
    σ²_{100} ≈ 0.98 σ²_{99} + 0.02 β_{99,IBM} σ²_{99}.
Thus adding IBM does not change the variability of the portfolio as long as β_{99,IBM} = 1. If
β_{99,IBM} > 1, then σ²_{100} > σ²_{99}, and if β_{99,IBM} < 1, then σ²_{100} < σ²_{99}.
In general, let r_p denote the return on a large diversified portfolio and let r_i denote the
return on some asset i. Then
    β_{p,i} = Cov(r_p, r_i) / Var(r_p)
can be used as a measure of the portfolio risk of a specific asset i.
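The arithmetic in (5.3) is easy to check numerically. The sketch below (Python; the two variance inputs are assumed illustrative values, not data) evaluates the exact portfolio variance for several values of β_{99,IBM} and confirms that a beta above (below) one raises (lowers) the portfolio variance relative to σ²_{99}.

```python
# Numerical check of the 99-stock example in (5.3), with illustrative inputs.
sig2_99 = 0.04        # variance of the 99-stock portfolio (assumed)
sig2_ibm = 0.09       # variance of IBM returns (assumed)

def var_100(beta_99_ibm):
    """Exact variance of the 100-stock portfolio for a given beta."""
    cov = beta_99_ibm * sig2_99   # sigma_{99,IBM} = beta_{99,IBM} * sigma^2_{99}
    return 0.99**2 * sig2_99 + 0.01**2 * sig2_ibm + 2 * 0.99 * 0.01 * cov

# beta = 1 leaves the variance essentially unchanged; beta = 1.5 raises it,
# beta = 0.5 lowers it, exactly as the approximation in (5.3) predicts.
print(round(var_100(1.0), 6), round(var_100(1.5), 6), round(var_100(0.5), 6))
```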
5.6 Diagnostics for Constant Parameters
The assumption of constant α and β has been challenged in the literature. Cui, He and Zhu
(2002), Akdeniz, Altay-Salih and Caner (2003), You and Jiang (2005), Cai (2007), and
others showed that in many applications β changes over time. In other words, we
need to do diagnostics for constant parameters, which can be formulated as follows. Assume that
R_t ~ iid N(μ, σ²) for 1 ≤ t ≤ T. The null hypothesis is H_0: the parameters are constant over
time, versus H_1: the parameters change over time.
To see intuitively whether the parameters change over time, we use a very simple method:
the rolling idea. Compute estimates of μ over rolling windows of length n < T,
    μ̂_t(n) = (1/n) Σ_{i=0}^{n−1} R_{t−i} = (1/n) (R_t + R_{t−1} + ... + R_{t−n+1}),
and compute estimates of σ² over rolling windows of length n < T as
    σ̂²_t(n) = (1/(n−1)) Σ_{i=0}^{n−1} (R_{t−i} − μ̂_t(n))².
Similarly, compute estimates of σ_{jk} and ρ_{jk} over rolling windows of length n < T, σ̂_{jk,t}(n)
and ρ̂_{jk,t}(n). Make time series plots and check whether those estimates are time-varying.
Further, compute estimates of α_i and β_i from the SI model over rolling windows of length n < T:
    R_{it} = α̂_i(n) + β̂_i(n) R_{Mt} + ε̂_{it}(n).
Finally, use the rolling estimates of μ and Σ to compute rolling efficient portfolios: the global mini-
mum variance portfolio, the tangency portfolio, and the efficient frontier.
Exercises: Please download several stocks and market indices and check whether the pa-
rameters change over time by using the rolling method.
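The rolling computations above can be sketched as follows (Python/numpy here; in R, which the notes use, `rollapply` from the zoo package plays the same role). The simulated data, with a beta drifting from 0.5 to 1.5, is purely an illustrative assumption chosen to show that the rolling OLS slope tracks a time-varying parameter.

```python
import numpy as np

def rolling_mean(x, n):
    """mu_hat_t(n) = (1/n) * sum_{i=0}^{n-1} x_{t-i}, for t = n-1, ..., T-1."""
    c = np.cumsum(np.insert(x, 0, 0.0))
    return (c[n:] - c[:-n]) / n

def rolling_beta(r_i, r_m, n):
    """OLS slope of r_i on r_m over each rolling window of length n."""
    T = len(r_i)
    betas = np.empty(T - n + 1)
    for t in range(n - 1, T):
        xi = r_i[t - n + 1 : t + 1]
        xm = r_m[t - n + 1 : t + 1]
        betas[t - n + 1] = np.cov(xi, xm)[0, 1] / np.var(xm, ddof=1)
    return betas

# Illustrative data: the true beta drifts from 0.5 to 1.5 over the sample.
rng = np.random.default_rng(2)
T, n = 2000, 250
true_beta = np.linspace(0.5, 1.5, T)
r_m = rng.normal(0.0, 0.02, T)
r_i = true_beta * r_m + rng.normal(0.0, 0.01, T)

b = rolling_beta(r_i, r_m, n)
print(round(b[0], 2), round(b[-1], 2))   # early vs. late window estimates
```

A time series plot of `b` against the window end date is the diagnostic described in the text: a flat line supports constant β, a trend or breaks argue against it.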
5.7 Estimation and Hypothesis Testing
Ordinary least squares (OLS) regression can be used to find the OLS estimates of the
model parameters, and the usual statistical tests, such as t-tests for an individual parameter or F-tests
for multiple parameters, may be applied to this model. For details, please see Chapter 4 of
CLM.
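As a concrete illustration of the OLS estimates and a t-test on β, here is a minimal sketch (Python/numpy rather than the R used elsewhere in the notes; the simulated data with α = 0 and β = 1.2 is an assumption made for the example). It computes the usual OLS standard errors and the t-statistic for H_0: β = 1, the test asked for in Problem 1(c) below.

```python
import numpy as np

def market_model_ols(r, r_m):
    """OLS fit of r_t = alpha + beta * r_mt + e_t; returns (estimates, std errors)."""
    T = len(r)
    X = np.column_stack([np.ones(T), r_m])          # design matrix [1, r_m]
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)    # (alpha_hat, beta_hat)
    resid = r - X @ coef
    s2 = resid @ resid / (T - 2)                    # error variance estimate
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, se

# Illustrative simulated data with alpha = 0 and beta = 1.2.
rng = np.random.default_rng(3)
T = 5000
r_m = rng.normal(0.001, 0.02, T)
r = 1.2 * r_m + rng.normal(0.0, 0.01, T)

coef, se = market_model_ols(r, r_m)
t_beta_eq_1 = (coef[1] - 1.0) / se[1]               # t-statistic for H0: beta = 1
print(round(coef[1], 3), round(t_beta_eq_1, 1))
```

In R the same fit is `fit <- lm(r ~ rm)` with the t-statistic built from `summary(fit)`; either way, comparing `t_beta_eq_1` to standard normal (or t) critical values gives the test in (c).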
5.8 Problems
1. Download weekly (daily) price data for several stocks, for example, IBM stock prices (P_t) for
02/13/86 - 02/15/08. Create the stock return series for IBM, {r_t}_{t=1}^{T}. Download
weekly (daily) data on the S&P500 or S&P100 index for the same period.
(a) Estimate the market model:
    r_t = α + β r_{mt} + e_t,   1 ≤ t ≤ T,
where you may use returns on the S&P100 index as the market returns.
(b) If one uses the variance of IBM returns as a measure of volatility, what is the
proportion of total risk of IBM stock returns attributed to the market factor?
What is the proportion of idiosyncratic risk?
(c) Test the null hypothesis that β = 1 against the alternative that β ≠ 1, and against
the alternative that β > 1.
(d) Test the null hypothesis that α = 0 against the alternative that α ≠ 0, and against
the alternative that α > 0.
(e) Use F-statistics to test the following simultaneous restrictions on the parameters:
H_0: α = 0 and β = 1.
(f) Repeat the above steps for several stocks.
2. Use the rolling method to estimate the parameters. Based on your conclusions, do you
support the assumption that the parameters in the model are constant?
3. Read the papers by Cui, He and Zhu (2002), Akdeniz, Altay-Salih and Caner (2003),
You and Jiang (2005), and Cai (2007). What would you suggest as a better model for building
a single index model between an individual stock return (say, IBM) and a market
index (say, the S&P100 index)? Explore this topic further, regard it as a project, and
write up and explain your methodologies and conclusions in detail.
5.9 References
Akdeniz, L., A. Altay-Salih and M. Caner (2003). Time-varying betas help in asset pricing:
The threshold CAPM. Studies in Nonlinear Dynamics and Econometrics, 6, No.4,
Article 1.
Cai, Z. (2007). Trending time-varying coefficient time series models with serially correlated
errors. Journal of Econometrics, 137, 163-188.
Cochrane, J.H. (2002). Asset Pricing. Princeton University Press, Princeton, NJ.
Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997). The Econometrics of Financial
Markets. Princeton University Press, Princeton, NJ. (Chapter 4.3-4.4).
Cui, H., X. He and L. Zhu (2002). On regression estimators with de-noised variables.
Statistica Sinica, 12, 1191-1205.
Sharpe, W. (1970). Portfolio Theory and Capital Markets. McGraw-Hill, New York.
You, J. and J. Jiang (2005). Inferences for varying-coefficient partially linear models with
serially correlated errors. In Advances in Statistical Modeling and Inference: Essays in
Honor of Kjell A. Doksum, Ed. Vijay Nair. Series in Biostatistics, 3, 175-195. World
Scientific Publishing Co. Pte. Ltd., Singapore.
Zivot, E. (2002). Lecture Notes on Applied Econometric Modeling in Finance. The web
link is: http://faculty.washington.edu/ezivot/econ483/483notes.htm.
Chapter 6
Event-Study Analysis
6.1 Introduction
Event studies are an important part of corporate finance. This research documents interesting
regularities in the response of stock prices to investment decisions, financing decisions,
and changes in corporate control. Event studies have a long history: Dolley (1933) investigated
the impact of stock splits, and other important papers include Brown and Warner (1980, 1985)
and Boehmer, Musumeci and Poulsen (1991). In particular, Fama (1991) listed the following
main results from the event studies research:
1. Unexpected changes in dividends are on average associated with stock-price changes
of the same sign.
2. New issues of common stocks are bad news for stock prices and redemptions, through
tenders or open-market purchases, are good news.
3. The following findings emerge from the analysis of corporate-control transactions:
(a) Mergers and tender offers on average produce large gains for stockholders of the
target firms.
(b) Management buyouts are also wealth-enhancing for target stockholders.
As to market efficiency, the typical result in event studies on daily data is that stock
prices seem to adjust within a day to event announcements. As Fama (1991) pointed out,
event studies provide the cleanest evidence on market efficiency. On average, this evidence is
supportive.
CHAPTER 6. EVENT-STUDY ANALYSIS 120
6.2 Outline of an Event Study
Usually, an event study analysis has seven steps:
1. Event denition.
- The event of interest: earnings announcements, stock splits, mergers, etc.
- The event window: the day of the announcement and the day after the announcement.
This is the period over which security prices will be examined. The period
prior to the event window and the period after the event window are investigated
separately.
2. Selection criteria: determine the selection criteria for inclusion of a given firm in the
study. Possible selection criteria include listing on the NYSE, membership in a specific
industry, region, etc.
3. Normal and abnormal returns.
- The normal return is the return that would be expected if the event did not take
place.
- The abnormal return is the actual ex post return of the security over the event
window minus the normal return of the firm over the event window; i.e., for each
firm i and event date t, we have

    ε*_{it} = R_{it} − E[R_{it} | X_t],

where ε*_{it} is the abnormal return, R_{it} is the actual ex post return, and
E[R_{it} | X_t] is the normal return. X_t is the conditioning information for the
normal-performance model.
- Two common choices for modeling the normal return:
(a) The constant-mean-return model: X_t is a constant. This model assumes
that the mean return is constant.
(b) The market model: X_t is the market return. This model assumes a stable
linear relation between the market return and the security return.
4. Estimation procedure
- Estimation window: the subset of the data used to estimate the parameters of the
normal-return model.
- The most common choice for the estimation window is the period prior to the event
window. Generally, the event period is not included in the estimation period.
5. Testing procedure
- Calculate the abnormal returns.
- Define the null hypothesis to be tested.
- Determine the techniques for aggregating the abnormal returns of individual firms.
6. Empirical results
- Results and some diagnostics.
- The empirical results can be heavily influenced by one or two firms.
7. Interpretation and conclusions
6.3 Models for Measuring Normal Returns
There are a number of approaches available to calculate the normal return of a given security.
Here are two common approaches to measure the normal performance:
1. Statistical: approaches based on statistical assumptions about the behavior of asset
returns.
(a) Constant-mean-return model. The performance of this simple model is similar
to that of more sophisticated models.
(b) Market model (single-index model). The potential improvement of the market
model over the constant-mean model is that it removes the portion of the return
that is related to variation in the market's return, thus reducing the variance of
the abnormal return.
(c) Factor model. The potential improvement is the reduction of the variance of the
abnormal return by explaining more of the variation in the normal return. In practice,
the gains from employing multifactor models in event studies are limited because
the marginal explanatory power of additional factors beyond the market factor is
small.
(d) Market-adjusted-return model. This model can be viewed as a restricted market
model with α_i constrained to be 0 and β_i constrained to be 1.
2. Economic: approaches based on assumptions concerning investors' behavior (some
statistical assumptions are still needed to use economic models in practice). They can be
classified as follows:
(a) CAPM: The use of the capital asset pricing model (CAPM) in event studies has
almost ceased.
(b) APT: The arbitrage pricing theory (APT) model has little practical advantage
relative to the unrestricted market model.
6.4 Measuring and Analyzing Abnormal Returns
Notation:
- τ = 0 is the event date.
- T_0 < T_1 < T_2 < T_3.
- (T_0, T_1] is the estimation window.
- (T_1, T_2] is the event window, with T_1 + 1 ≤ 0 ≤ T_2.
- (T_2, T_3] is the post-event window.
- L_1 = T_1 − T_0 is the length (sample size) of the estimation window.
- L_2 = T_2 − T_1 is the length of the event window.
- L_3 = T_3 − T_2 is the length of the post-event window.
The abnormal return over the event window is interpreted as a measure of the impact of the
event on the value of the firm. The time line of an event study is presented in Figure 6.1.
[Figure 6.1: Time line of an event study — the estimation window (T_0, T_1] of length L_1, over which the model for "normal" returns (market model, CER model, or factor model) is estimated; the event window (T_1, T_2] of length L_2, containing the event date 0, over which abnormal returns are aggregated; and the post-event window (T_2, T_3] of length L_3.]
Note that:
- It is typical for the estimation window and the event window not to overlap. This
ensures that the estimators for the parameters of the normal-return model are not influenced
by the event-related returns.
- The methodology implicitly assumes that the event is exogenous with respect to the
change in the market value of the security.
- There are examples where the event is triggered by a change in the market value of a
security, i.e., the event is endogenous.
6.4.1 Estimation Procedure
The estimation-window observations can be expressed as a regression system

    R_i = X_i θ_i + ε_i,    (6.1)

where R_i = (R_{i,T_0+1}, ..., R_{i,T_1})' is an L_1 × 1 vector, X_i = (ι  R_m) is an L_1 × 2 matrix with
a vector of ones ι in the first column and the vector of market return observations
R_m = (R_{m,T_0+1}, ..., R_{m,T_1})' in the second column, and θ_i = (α_i, β_i)' is a 2 × 1 parameter vector.
One estimates model (6.1) by OLS and obtains the estimates θ̂_i, σ̂²_{ε_i}, ε̂_i, and Var(θ̂_i). The sample
vector of abnormal returns ε̂*_i for firm i over the event window, T_1 + 1 to T_2, is computed
as follows:

    ε̂*_i = R*_i − X*_i θ̂_i,

where R*_i = (R_{i,T_1+1}, ..., R_{i,T_2})' is an L_2 × 1 vector of event-window returns, X*_i = (ι  R*_m)
is an L_2 × 2 matrix with a vector of ones in the first column and the vector of market return
observations R*_m = (R_{m,T_1+1}, ..., R_{m,T_2})' in the second column, and θ̂_i is the OLS estimate.
Conditional on the market return over the event window, the abnormal returns will be
jointly normal with zero mean and conditional covariance matrix V_i, which is defined as

    V_i = σ²_{ε_i} [ I + X*_i (X_i' X_i)^{-1} X*_i' ].    (6.2)

The covariance matrix of the abnormal returns consists of two parts: the first term is the variance
due to future disturbances, and the second term is the additional variance due to the sampling
error in θ̂_i.
Under the null hypothesis H_0 that the given event has no impact on the mean or
variance of returns, the vector of event-window sample abnormal returns has the following
distribution:

    ε̂*_i ~ N(0, V_i),

where V_i is defined in (6.2).
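The estimation step above can be sketched in R. The data below are simulated stand-ins for real security and market returns, and all variable names are illustrative, not from the text:

```r
# Minimal sketch of the event-study estimation step, equation (6.1),
# using simulated data (assumed parameter values, not real returns).
set.seed(1)
L1 <- 100; L2 <- 21                      # estimation- and event-window lengths
Rm <- rnorm(L1 + L2, 0.001, 0.01)        # "market" returns over both windows
Ri <- 0.0005 + 1.2 * Rm + rnorm(L1 + L2, 0, 0.02)  # one security's returns

est <- 1:L1                              # estimation window (T0, T1]
evt <- (L1 + 1):(L1 + L2)                # event window (T1, T2]

fit <- lm(Ri[est] ~ Rm[est])             # market model
theta.hat <- coef(fit)                   # (alpha.hat, beta.hat)

# Abnormal returns over the event window: eps* = R* - X* theta.hat
ar <- Ri[evt] - (theta.hat[1] + theta.hat[2] * Rm[evt])
sigma2.eps <- sum(residuals(fit)^2) / (L1 - 2)  # disturbance variance estimate
```

With real data, Ri and Rm would come from observed returns; the rest of the computation is unchanged.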
6.4.2 Aggregation of Abnormal Returns
The abnormal return observations must be aggregated in order to draw overall inferences
for the event of interest. The aggregation is along two dimensions: through time and across
securities.
The aggregation through time:
To accommodate multiple sampling intervals within the event window, one needs to
introduce cumulative abnormal returns (CAR).
Define CAR_i(τ1, τ2) as the cumulative abnormal return for security i from τ1 to τ2,
where T_1 < τ1 ≤ τ2 ≤ T_2. Let γ be an L_2 × 1 vector with ones in positions τ1 − T_1 to τ2 − T_1
and zeros elsewhere. Then we have

    CAR_i(τ1, τ2) = γ' ε̂*_i,  and  Var[CAR_i(τ1, τ2)] = σ²_i(τ1, τ2) = γ' V_i γ.

Under H_0 that the given event has no impact on the mean or variance of returns,

    CAR_i(τ1, τ2) ~ N(0, σ²_i(τ1, τ2)).
One can construct a test of H_0 for security i as follows:

    SCAR_i(τ1, τ2) = CAR_i(τ1, τ2) / σ̂_i(τ1, τ2),    (6.3)

where σ̂²_i(τ1, τ2) is calculated with σ̂²_{ε_i} substituted for σ²_{ε_i}.
Under the null hypothesis, the distribution of SCAR_i(τ1, τ2) in (6.3) is Student-t with
L_1 − 2 degrees of freedom.
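The CAR and SCAR computations can be sketched in R as follows, continuing with simulated data; the sub-window positions (5 to 15) and all names are illustrative assumptions:

```r
# Sketch: CAR_i(tau1, tau2) and SCAR_i for one security, simulated data.
set.seed(2)
L1 <- 100; L2 <- 21
Rm <- rnorm(L1 + L2, 0.001, 0.01)
Ri <- 0.0005 + 1.2 * Rm + rnorm(L1 + L2, 0, 0.02)
fit <- lm(Ri[1:L1] ~ Rm[1:L1])
evt <- (L1 + 1):(L1 + L2)
Xs  <- cbind(1, Rm[evt])                       # X*_i over the event window
ar  <- Ri[evt] - Xs %*% coef(fit)              # abnormal returns eps*_i

# Conditional covariance matrix V_i from equation (6.2)
s2 <- sum(residuals(fit)^2) / (L1 - 2)
X  <- cbind(1, Rm[1:L1])
Vi <- s2 * (diag(L2) + Xs %*% solve(crossprod(X)) %*% t(Xs))

gam  <- as.numeric(seq_len(L2) %in% 5:15)      # gamma: picks out tau1..tau2
car  <- sum(gam * ar)                          # CAR_i = gamma' eps*_i
scar <- car / sqrt(t(gam) %*% Vi %*% gam)      # SCAR_i, ~ t(L1 - 2) under H0
```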
The aggregation through time and across securities:
1. The first approach is as follows.
- Assume that there is no correlation across the abnormal returns of different
securities. This implies that there is no overlap in the event windows of the
included securities.
- Given a sample of N securities, defining ε̄* as the sample average of the N abnormal
return vectors, one has

    ε̄* = (1/N) Σ_{i=1}^N ε̂*_i,  and  Var[ε̄*] = V̄ = (1/N²) Σ_{i=1}^N V_i.
- Define CAR̄(τ1, τ2), the cumulative average abnormal return, as follows:

    CAR̄(τ1, τ2) = γ' ε̄*,  and  Var[CAR̄(τ1, τ2)] = σ̄²(τ1, τ2) = γ' V̄ γ.

- Under the assumption that the event windows of the N securities do not overlap,
inferences about the cumulative abnormal returns can be drawn using

    CAR̄(τ1, τ2) ~ N(0, σ̄²(τ1, τ2)).
- In practice, σ̄²(τ1, τ2) is unknown, and one needs to use

    σ̄̂²(τ1, τ2) = (1/N²) Σ_{i=1}^N σ̂²_i(τ1, τ2)

as a consistent estimator to test H_0 using

    J_1 = CAR̄(τ1, τ2) / [σ̄̂²(τ1, τ2)]^{1/2} ~ N(0, 1).
2. The second approach to aggregation is to give equal weight to the individual SCAR_i's.
Define

    SCAR̄(τ1, τ2) = (1/N) Σ_{i=1}^N SCAR_i(τ1, τ2).

Assuming that the event windows of the N securities do not overlap in calendar
time, the null hypothesis H_0 can be tested using

    J_2 = [ N(L_1 − 4)/(L_1 − 2) ]^{1/2} SCAR̄(τ1, τ2) ~ N(0, 1).

Note that the powers of the tests J_1 and J_2 are likely to be similar for most studies and, of course,
the power depends on the alternative.
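A sketch of the two aggregate statistics in R, using simulated per-security CARs and variances (illustrative inputs, not real event-study output):

```r
# J1 and J2 computed from simulated CAR_i, their variances, and SCAR_i.
set.seed(3)
N  <- 30; L1 <- 100
sig2 <- rep(0.0016, N)                       # per-security CAR variances
car  <- rnorm(N, mean = 0, sd = sqrt(sig2))  # CAR_i generated under H0
scar <- car / sqrt(sig2)                     # SCAR_i

car.bar <- mean(car)                         # average CAR
var.bar <- sum(sig2) / N^2                   # estimator of Var(average CAR)
J1 <- car.bar / sqrt(var.bar)                # ~ N(0,1) under H0

scar.bar <- mean(scar)
J2 <- sqrt(N * (L1 - 4) / (L1 - 2)) * scar.bar  # ~ N(0,1) under H0
```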
Sensitivity to the normal-return model:
- Use of the market model reduces the variance of the abnormal return compared with the
constant-mean-return model. This is because

    σ²_{ε_it} = (1 − ρ²_{im}) Var[R_it],

where ρ_im = Corr(R_it, R_mt) (please verify the above formula). For the constant-mean
model R_it = μ_i + ζ_it,

    σ²_{ζ_it} = Var[R_it − μ_i] = Var[R_it].

Thus σ²_{ε_it} = (1 − ρ²_{im}) σ²_{ζ_it} ≤ σ²_{ζ_it}, because 0 ≤ ρ²_{im} ≤ 1. See the empirical examples in
CLM (p. 163) and Table 4.1.
Inferences with clustering:
- The basic assumption in the aggregation over securities is that individual securities are
uncorrelated in the cross section. This is the case if the event windows of different
securities do not overlap in calendar time. If they do, the correlation should be taken
into account. One way is to aggregate the individual securities with overlapping event
windows into portfolios, and then apply the above standard event-study analysis. Another
way is to analyze without aggregation.
6.4.3 Modifying the Null Hypothesis
So far the null hypothesis has been that the event has no impact on the behavior of the
returns. Either a mean effect or a variance effect violates this hypothesis. If we are interested
only in the mean effect, say, the analysis must be expanded to allow for changing variances.
A popular way to do this is to estimate the cross-sectional variance at each time point within
the event window:

    V̂ar[CAR̄(τ1, τ2)] = (1/N²) Σ_{i=1}^N [CAR_i(τ1, τ2) − CAR̄(τ1, τ2)]²,

and

    V̂ar[SCAR̄(τ1, τ2)] = (1/N²) Σ_{i=1}^N [SCAR_i(τ1, τ2) − SCAR̄(τ1, τ2)]².
Note that one can find a rationale for these variance estimators and discuss the assumptions
behind their validity (please verify this; it is left as an exercise). Using
these variance estimators in the J_1 and J_2 test statistics allows testing for the mean effect under
a possible variance effect.
6.4.4 Nonparametric Tests
The advantage of the nonparametric approach is that it is free of specific assumptions concerning
the return distribution. Common and classical nonparametric tests are the sign and rank
tests, which can be found in standard statistics books; see, for example, Conover (1999). The
sign test is based on the sign of the abnormal return, with two assumptions: (1) independence:
returns are independent across securities; (2) symmetry: positive and negative returns are
equally likely under the null hypothesis of no event effect.
Let p = P(CAR_i ≥ 0). If the research hypothesis is that there is a positive return
effect of the event, the statistical null and alternative hypotheses are H_0: p = 0.5 versus
H_1: p > 0.5. Let N_+ be the number of cases with positive returns, and N the total number
of cases. Then a statistic based on this information for testing the null hypothesis H_0 can
be formulated as

    J_3 = [ N_+/N − 0.5 ] N^{1/2} / 0.5 ~ N(0, 1).

Large values of J_3 lead to rejection of H_0. Note that you can derive a small-sample test for
the null hypothesis. What you need to do is to use the Central Limit Theorem and try to
rationalize the asymptotic distribution result for J_3. For example, define random variables Y_i
such that Y_i = 1 if CAR_i > 0 and Y_i = 0 otherwise. Then N_+ = Σ_{i=1}^N Y_i.
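The sign test is easy to code directly; here is a minimal R sketch on a simulated vector of CARs (the mean, standard deviation, and sample size are assumed values for illustration):

```r
# Sign test J3 on a vector of cumulative abnormal returns.
set.seed(4)
car <- rnorm(50, mean = 0.01, sd = 0.04)   # illustrative CAR_i for N = 50 firms
N  <- length(car)
Np <- sum(car > 0)                          # N_plus: number of positive CARs
J3 <- (Np / N - 0.5) * sqrt(N) / 0.5        # compare with an N(0,1) quantile
p.value <- 1 - pnorm(J3)                    # one-sided p-value for H1: p > 0.5
```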
Note that a weakness of the sign test is that it may not be well defined if the
(abnormal) return distribution is skewed, i.e., if P(ε*_it ≥ 0 | H_0) ≠ P(ε*_it < 0 | H_0). A rank
test is one choice that allows non-symmetry. Consider only the case of testing the null
hypothesis that the event-day abnormal return is zero. The rank test (Wilcoxon rank-sum
test) is as follows. Consider a sample of L_2 abnormal returns for each of N securities. Order
the returns from smallest to largest, and let K_{i,τ} = rank(ε̂*_{i,τ}) be the rank number (i.e., K_{i,τ}
ranges from 1 to L_2). Under the null hypothesis of no event impact, the abnormal return
is just an arbitrary random value and consequently obtains an arbitrary rank position
from 1 to L_2. That is, each observation takes each rank value equally likely, i.e., with
probability 1/L_2. Consequently, the expected value of K_{i,τ} at each time point and for each
security i under the null hypothesis is

    θ_K = E[K_{i,τ}] = Σ_{j=1}^{L_2} j P(K_{i,τ} = j) = (1/L_2) Σ_{j=1}^{L_2} j = (L_2 + 1)/2,

and the variance is

    Var[K_{i,τ}] = Σ_{j=1}^{L_2} (j − θ_K)² P(K_{i,τ} = j).

A test statistic for testing the event-day (τ = 0) effect, suggested by Corrado (1989), is

    J_4 = (1/N) Σ_{i=1}^N [ K_{i,0} − (L_2 + 1)/2 ] / s(L_2),

where

    s(L_2) = √{ (1/L_2) Σ_{τ=T_1+1}^{T_2} [ (1/N) Σ_{i=1}^N ( K_{i,τ} − (L_2 + 1)/2 ) ]² }.

Under the null hypothesis, J_4 ~ N(0, 1) asymptotically. Typically, nonparametric tests are used in conjunction
with the parametric tests. The R code for implementing the Wilcoxon rank-sum
test is wilcox.test().
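The Corrado-type rank statistic above (labeled J_4 here) can also be coded directly. The sketch below uses a simulated N × L_2 matrix of event-window abnormal returns, with an assumed event day in the middle of the window; all names and sizes are illustrative:

```r
# Rank statistic on simulated abnormal returns, plus a wilcox.test() call.
set.seed(5)
N <- 25; L2 <- 21; day0 <- 11              # event day tau = 0 sits mid-window
ar <- matrix(rnorm(N * L2, 0, 0.02), N, L2)
K  <- t(apply(ar, 1, rank))                # K[i, tau]: rank of eps*_{i,tau} in 1..L2

dev <- colMeans(K) - (L2 + 1) / 2          # average rank deviation at each tau
sL2 <- sqrt(mean(dev^2))                   # scale factor s(L2)
J4  <- dev[day0] / sL2                     # compare with N(0,1)

# Base-R Wilcoxon test applied to the event day's abnormal returns:
w <- wilcox.test(ar[, day0])
```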
6.4.5 Cross-Sectional Models
Here the interest is in the magnitude of the association between the abnormal return and characteristics
specific to the observed event. Let Y be an N × 1 vector of CARs and X be an N × K
matrix of K − 1 characteristics (the first column is a vector of ones for the intercept term).
Then a cross-sectional (linear) model to explain the magnitudes of the CARs is

    Y = X θ + η,

where θ is a K × 1 coefficient vector and η is an N × 1 disturbance vector. The OLS estimator
is θ̂ = (X'X)^{-1} X'Y, which is consistent (i.e., θ̂ → θ) if E[X'η] = 0 (i.e., the residuals are not
correlated with the explanatory variables), and

    Var[θ̂] = (X'X)^{-1} σ²_η.

Replacing σ²_η by its consistent estimator

    σ̂²_η = η̂'η̂ / (N − K),

where η̂ = Y − X θ̂, makes it possible to calculate the standard errors of the regression coefficients
θ̂ and to construct t-tests to make inference on the θ-coefficients.
In financial markets, homoskedasticity is a questionable assumption. This is why it is
usually suggested to use White's (1980) heteroskedasticity-consistent (HC) standard errors
for the θ-estimates. These are obtained as the square roots of the main-diagonal elements of

    V̂ar[θ̂] = (X'X)^{-1} [ Σ_{i=1}^N x_i x_i' η̂²_i ] (X'X)^{-1}.

These are usually available in most econometric packages, or you can compute them by
yourself.
Newey and West (1987, 1994) proposed a more general estimator that is consistent under
both heteroskedasticity and autocorrelation (HAC). In general, this estimator essentially
uses a nonparametric method to estimate the covariance matrix of Σ_{t=1}^n η_t x_t, and a class
of kernel-based heteroskedasticity and autocorrelation consistent (HAC) covariance matrix
estimators was introduced by Andrews (1991). Note, however, that this may be used only for
time-series regression, not for cross-sectional regression! For a discussion of studies applying
cross-sectional models in conjunction with event studies, see CLM (p. 174).
To use an HC or HAC estimator, we can use the package sandwich in R; the commands
are vcovHC(), vcovHAC(), and meatHAC(). There is a set of functions implementing
a class of kernel-based heteroskedasticity and autocorrelation consistent (HAC) covariance
matrix estimators as introduced by Andrews (1991). In vcovHC(), these estimators differ in
their choice of the ω_i in Ω = Var(e) = diag{ω_1, ..., ω_n}; an overview of the most important
cases is given in the following:

    const: ω_i = σ²
    HC0:   ω_i = ê²_i
    HC1:   ω_i = [n/(n − k)] ê²_i
    HC2:   ω_i = ê²_i / (1 − h_i)
    HC3:   ω_i = ê²_i / (1 − h_i)²
    HC4:   ω_i = ê²_i / (1 − h_i)^{δ_i}

where h_i = H_ii are the diagonal elements of the hat matrix and δ_i = min{4, h_i/h̄}.
vcovHC(x, type = c("HC3", "const", "HC", "HC0", "HC1", "HC2", "HC4"),
omega = NULL, sandwich = TRUE, ...)
meatHC(x, type = , omega = NULL)
vcovHAC(x, order.by = NULL, prewhite = FALSE, weights = weightsAndrews,
adjust = TRUE, diagnostics = FALSE, sandwich = TRUE, ar.method = "ols",
data = list(), ...)
meatHAC(x, order.by = NULL, prewhite = FALSE, weights = weightsAndrews,
adjust = TRUE, diagnostics = FALSE, ar.method = "ols", data = list())
kernHAC(x, order.by = NULL, prewhite = 1, bw = bwAndrews,
kernel = c("Quadratic Spectral", "Truncated", "Bartlett", "Parzen",
"Tukey-Hanning"), approx = c("AR(1)", "ARMA(1,1)"), adjust = TRUE,
diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", tol = 1e-7,
data = list(), verbose = FALSE, ...)
weightsAndrews(x, order.by = NULL,bw = bwAndrews,
kernel = c("Quadratic Spectral","Truncated","Bartlett","Parzen",
"Tukey-Hanning"), prewhite = 1, ar.method = "ols", tol = 1e-7,
data = list(), verbose = FALSE, ...)
bwAndrews(x,order.by=NULL,kernel=c("Quadratic Spectral", "Truncated",
"Bartlett","Parzen","Tukey-Hanning"), approx=c("AR(1)", "ARMA(1,1)"),
weights = NULL, prewhite = 1, ar.method = "ols", data = list(), ...)
Also, there is a set of functions implementing the Newey and West (1987, 1994) heteroskedasticity
and autocorrelation consistent (HAC) covariance matrix estimators.
NeweyWest(x, lag = NULL, order.by = NULL, prewhite = TRUE, adjust = FALSE,
diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", data = list(),
verbose = FALSE)
bwNeweyWest(x, order.by = NULL, kernel = c("Bartlett", "Parzen",
"Quadratic Spectral", "Truncated", "Tukey-Hanning"), weights = NULL,
prewhite = 1, ar.method = "ols", data = list(), ...)
For more details, see the papers by Zeileis (2004, 2006).
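As a base-R illustration (so that no package is required to run it), the White standard errors of the cross-sectional regression can be computed directly from the sandwich formula above. The data, coefficients, and names below are simulated assumptions:

```r
# Cross-sectional regression of CARs on firm characteristics with
# White (HC0) standard errors computed by hand.
set.seed(6)
N  <- 80
x1 <- rnorm(N); x2 <- rnorm(N)                  # two firm characteristics
Y  <- 0.02 + 0.5 * x1 + rnorm(N, 0, 0.05 * (1 + abs(x1)))  # heteroskedastic CARs
X  <- cbind(1, x1, x2)

theta.hat <- solve(crossprod(X), crossprod(X, Y))  # OLS estimate
e <- as.vector(Y - X %*% theta.hat)                # residuals eta-hat

XtXinv <- solve(crossprod(X))
meat   <- t(X) %*% (X * e^2)                       # sum_i x_i x_i' e_i^2
V.hc   <- XtXinv %*% meat %*% XtXinv               # White sandwich estimator
se.hc  <- sqrt(diag(V.hc))                         # HC standard errors
t.stat <- theta.hat / se.hc
```

With the sandwich package installed, vcovHC(lm(Y ~ x1 + x2), type = "HC0") should reproduce the same covariance matrix.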
6.4.6 Power of Tests
The goodness of a statistical test is its ability to detect a false null hypothesis. This is called
the power of the test, and it is technically measured by the power function, which depends on the
parameter value under H_1 (in the case of abnormal returns, δ):

    π_α(δ) = P_δ(reject H_0 when H_0 is not true),

where α denotes the size of the test (i.e., the significance level, which usually is 1% or 5%)
and P_δ(·) denotes the probability as a function of δ. Thus the power function gives the
probability of rejecting H_0 for different values of the tested parameter (δ).
Example:
Consider the J_1 test and test the event-day abnormal return. Furthermore, assume for
simplicity that the market-model parameters are known, with σ²_A(τ1, τ2) = 0.0016. Then the
power depends on the sample size N, the level of significance α, and the magnitude of the
(average) abnormal return δ. For the fixed α = 0.05, the two-sided test, i.e., H_0: δ = 0
vs. H_1: δ ≠ 0, has the power function

    π_{0.05}(δ) = P_δ(J_1 < −z_{0.025}) + P_δ(J_1 > z_{0.025}).

The distribution of J_1 depends on δ, such that

    E[J_1] = δ / (σ_A(τ1, τ2)/√N) = √N δ / σ_A(τ1, τ2) ≡ θ.

Thus J_1 ~ N(θ, 1). Note that J_1 − θ ~ N(0, 1). The power function is then

    π_{0.05}(δ) = P(J_1 < −z_{0.025}) + P(J_1 > z_{0.025}) = Φ(−z_{0.025} − θ) + [1 − Φ(z_{0.025} − θ)],

where z_{0.025} is the critical value at the 0.025 level and Φ(·) is the cumulative distribution function
(CDF) of the standardized normal distribution N(0, 1). Figure 6.2 shows graphs of the power function of
the J_1 test at the 5% significance level for sample sizes 1, 10, 20 and 50. We observe that
the smaller the effect is, the larger the sample size must be in order for the test statistic to
detect it. Especially for N = 1 (individual stocks), the effect must be relatively large before
it can be statistically identified. The important factor affecting the power is the parameter
θ = √N δ/σ_A, which is a kind of signal-to-noise ratio, where δ is the amount of signal and
σ_A/√N is the noise component, which decreases as a function of the sample size (number
of events).
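The power calculation just described can be sketched in R; the σ_A value is the one assumed in the example (√0.0016 = 0.04):

```r
# Power of the two-sided J1 test at the 5% level as a function of delta.
power.J1 <- function(delta, N, sigmaA = 0.04, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  theta <- sqrt(N) * delta / sigmaA           # noncentrality E[J1]
  pnorm(-z - theta) + (1 - pnorm(z - theta))  # P(reject H0)
}
p0 <- power.J1(0, 50)      # at delta = 0 the power equals the size (0.05)
p1 <- power.J1(0.02, 50)   # a 2% average abnormal return, N = 50
```

Evaluating power.J1 over a grid of delta values and several N reproduces power curves like those in Figure 6.2.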
[Figure 6.2: Power function of the J_1 test at the 5% significance level for sample sizes 1, 10, 20 and 50.]

6.5 Further Issues
1. Role of the sampling interval: The interval between adjacent observations constitutes
the sampling interval (minutes, hours, days, weeks, months). If the event time is
known accurately, a shorter sampling interval is expected to lead to a higher ability to identify
the event effect (the power of the test increases). The use of intraday data may involve some
complications due to thin trading, autocorrelation, etc., so the benefit of a very short
interval is unclear. For an empirical analysis/example, see Morse (1984).
2. Inferences with event-date uncertainty: Sometimes the exact event date may be
difficult to identify. Usually the uncertainty is about whether the event information
published, e.g., in newspapers was available to the markets already a day before. A
practical way to accommodate this uncertainty is to expand the event window to two
days, the event day 0 and the next day +1. This, however, reduces the power of the test
(extra noise is incorporated into the testing).
3. Possible biases: Nonsynchronous and thin trading: the actual time between, e.g., daily
returns (based on closing prices) is not exactly one day long but irregular, which is a
potential source of bias in the variance and correlation estimates.
6.6 Problems
1. In this problem set, you will conduct a small event study which examines the effect
of the September 11 terrorist attack on the performance of six companies: Continental
Airlines (CAL), Delta Airlines (DAL), Southwest Airlines (LUV), the Boeing Co. (BA),
Allied Defense Group (ADG), and Engineered Support Systems (EASI).¹ To implement
the event study, we will use data for the period 01/01/2001 - 12/01/2001. We will
assume that the event date is September 17 because this is the day when the market
reopened. In the analysis, we will examine abnormal returns for the period 20 days
before and 20 days after the event.
Use the standardized cumulative abnormal return (SCAR) to test that the event has no
effect on stock prices:
(a) Estimate the market model and construct normal returns.
(b) Construct abnormal returns.
(c) Construct cumulative abnormal returns (CAR) for each stock.
(d) Construct the standardized cumulative abnormal return for each stock.
Comment on your results for each part.
2. Split the stocks into two groups. The first group contains airline-related stocks (CAL, DAL,
LUV, BA) and the second group contains the stocks of defense-oriented companies
(ADG, EASI). Use the two approaches discussed in class and in the book by CLM (1997)
to aggregate abnormal stock market returns. Test the null hypothesis that the event has no
effect on stock prices. Are the results for the two groups different? Is it what you expected?
Discuss your results.
3. Read the paper by Bernanke and Kuttner (2005) and write a referee report on this
paper. Think about possible projects applying the approaches proposed in this
paper to studying the US stock market's reaction to policy changes by the Federal
Reserve Board.
¹Engineered Support Systems designs, manufactures, and supplies integrated military electronics, support
equipment, and technical and logistics services for all branches of America's armed forces and certain foreign
militaries, homeland security forces, and selected government and intelligence agencies.
6.7 References
Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance
matrix estimation. Econometrica, 59, 817-858.
Bernanke, B.S. and K.N. Kuttner (2005). What explains the stock market's reaction to
Federal Reserve policy? Journal of Finance, 60, 1221-1257.
Boehmer, E., J. Musumeci and A. Poulsen (1991). Event-study methodology under conditions
of event-induced variance. Journal of Financial Economics, 30, 253-272.
Brown, S. and J. Warner (1980). Measuring security price performance. Journal of Financial
Economics, 8, 205-258.
Brown, S. and J. Warner (1985). Using daily stock returns: The case of event studies.
Journal of Financial Economics, 14, 3-31.
Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997). The Econometrics of Financial
Markets. Princeton University Press, Princeton, NJ. (Chapter 4).
Conover, W.J. (1999). Practical Nonparametric Statistics, 3rd Edition. John Wiley & Sons,
New York.
Corrado, C. (1989). A nonparametric test for abnormal security price performance. Journal
of Financial Economics, 23, 385-395.
Cochrane, J.H. (2002). The Asset Pricing Theory. Princeton University Press, Princeton, NJ.
Dolley, J. (1933). Characteristics and procedure of common stock split-ups. Harvard Business
Review, 316-326.
Fama, E.F. (1991). Efficient capital markets: II. The Journal of Finance, 46, 1599-1603.
Morse, D. (1984). An econometric analysis of the choice of daily versus monthly returns in
tests of information content. Journal of Accounting Research, 22, 605-623.
Newey, W. and K. West (1987). A simple, positive semi-definite, heteroskedasticity and
autocorrelation consistent covariance matrix. Econometrica, 55, 703-708.
Newey, W.K. and K.D. West (1994). Automatic lag selection in covariance matrix estimation.
Review of Economic Studies, 61, 631-653.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct
test for heteroskedasticity. Econometrica, 48, 817-838.
Zeileis, A. (2004). Econometric computing with HC and HAC covariance matrix estimators.
Journal of Statistical Software, Volume 11, Issue 10.
Zeileis, A. (2006). Object-oriented computation of sandwich estimators. Journal of Statistical
Software, 16, 1-16.
Chapter 7
Introduction to Portfolio Theory
7.1 Introduction
Consider the following investment problem.¹ One can invest in two non-dividend-paying
stocks A and B. Let r_A denote the monthly return on stock A and r_B denote the monthly
return on stock B. Assume that the returns r_A and r_B are jointly normally distributed with
the following parameters:

    μ_A = E(r_A),  σ²_A = Var(r_A),  μ_B = E(r_B),  σ²_B = Var(r_B),  and  σ_AB = Cov(r_A, r_B).
We assume that these values are given (estimated using the historical return data). The
portfolio problem is as follows. An investor has a given amount of wealth, and it is assumed
that she will exhaust all her wealth between investments in the two stocks. Let w_A denote
the share of wealth invested in stock A and w_B denote the share of wealth invested in stock
B, with w_A + w_B = 1. The shares w_A and w_B are referred to as portfolio weights (allocations).
A long position means that w_A > 0 and w_B > 0, and a short position (in stock A) means that
w_A < 0 and w_B > 0. The return on the portfolio over the next period is given by

    r_p = w_A r_A + w_B r_B.
You should be able to show that:

    μ_p = E(r_p) = w_A μ_A + w_B μ_B,  and  σ²_p = Var(r_p) = w²_A σ²_A + w²_B σ²_B + 2 w_A w_B σ_AB.
¹This section is mostly from the lecture notes of Zivot. For those of you who are interested in more details
on asset allocation, please visit the website of Campbell R. Harvey for the course Global Asset Allocation
and Stock Selection at http://www.duke.edu/~charvey/Classes/ba453/syl453.htm.
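The two formulas above are easy to check numerically in R, using the example parameter values from Table 7.1 (the allocation w_A = 0.6 is an arbitrary illustrative choice):

```r
# Expected return and variance of a two-stock portfolio (Table 7.1 values).
mu.A <- 0.175; mu.B <- 0.055
sig2.A <- 0.067; sig2.B <- 0.013
sig.AB <- -0.004875

w.A <- 0.6; w.B <- 1 - w.A                 # an arbitrary illustrative allocation
mu.p   <- w.A * mu.A + w.B * mu.B
sig2.p <- w.A^2 * sig2.A + w.B^2 * sig2.B + 2 * w.A * w.B * sig.AB
sig.p  <- sqrt(sig2.p)
```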
CHAPTER 7. INTRODUCTION TO PORTFOLIO THEORY 137
7.1.1 Ecient Portfolios With Two Risky Assets
Assumptions:
1. Returns are jointly normally distributed. This implies that the means, variances, and
covariances of returns completely characterize the joint distribution of returns.
2. Investors only care about portfolio expected return and portfolio variance. Investors
like portfolios with high expected return but dislike portfolios with high return variance.
Under these assumptions, the distribution of the portfolio return r_p is N(μ_p, σ²_p). We want to
find the set of portfolios that have the highest expected return for a given level of risk as
measured by the portfolio variance. We summarize the expected return-risk (mean-variance)
properties of the feasible portfolios in a plot with the portfolio expected return, μ_p, on the
vertical axis and the portfolio standard deviation, σ_p, on the horizontal axis. The investment
possibilities set, or portfolio frontier, for the data in Table 7.1 is illustrated in Figure 7.1.

Table 7.1: Example Data

    μ_A     μ_B     σ²_A    σ²_B    σ_A     σ_B     σ_AB        ρ_AB
    0.175   0.055   0.067   0.013   0.258   0.115   -0.004875   -0.164
Figure 7.1: Plot of portfolio expected return, μ_p, versus portfolio standard deviation, σ_p ("Portfolio Frontier with Two Risky Assets").
CHAPTER 7. INTRODUCTION TO PORTFOLIO THEORY 138
The portfolio weight on asset A, w_A, is varied from −0.4 to 1.4 in increments of 0.1, and
the weight on asset B varies correspondingly from 1.4 to −0.4; i.e., the portfolios have weights
(w_A, w_B) = (−0.4, 1.4), (−0.3, 1.3), ..., (1.4, −0.4). We compute μ_p and σ_p for each of
these portfolios. The portfolio at the bottom of the parabola, denoted by M, has the smallest variance
among all feasible portfolios. This portfolio is called the global minimum variance portfolio. To
find the minimum variance portfolio, one solves the constrained optimization problem

    min_{w_A, w_B} σ²_p = w²_A σ²_A + w²_B σ²_B + 2 w_A w_B σ_AB   subject to   w_A + w_B = 1.
Solving this problem, one finds that the weights of stocks A and B in the minimum variance
portfolio are as follows:

    w^min_A = (σ²_B − σ_AB) / (σ²_A + σ²_B − 2 σ_AB),  and  w^min_B = 1 − w^min_A.

For our example, using the data in Table 7.1, we get w^min_A = 0.2 and w^min_B = 0.8. Note that
the shape of the investment possibilities set is very sensitive to the correlation between assets A and
B.
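The closed-form solution above can be evaluated in R with the Table 7.1 data:

```r
# Global minimum variance portfolio weights for the Table 7.1 data.
sig2.A <- 0.067; sig2.B <- 0.013; sig.AB <- -0.004875
w.min.A <- (sig2.B - sig.AB) / (sig2.A + sig2.B - 2 * sig.AB)
w.min.B <- 1 - w.min.A
round(w.min.A, 2)   # approximately 0.2, as reported in the text
```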
7.1.2 Efficient Portfolios with One Risky Asset and One Risk-Free Asset
Continuing with the example, consider an investment in asset B and the risk-free asset (for
example, the US T-bill rate) and suppose that r_f = 0.03. The risk-free asset has some special
properties:

    μ_f = E[r_f] = r_f,  Var(r_f) = 0,  and  Cov(r_B, r_f) = 0.

The portfolio expected return and variance are:

    r_p = w_B r_B + (1 − w_B) r_f,   μ_p = w_B (μ_B − r_f) + r_f,    (7.1)
    σ²_p = w²_B σ²_B.    (7.2)
Note that (7.2) implies that w_B = σ_p / σ_B. Plugging this result into (7.1), we obtain that the set of efficient portfolios follows the equation:

μ_p = r_f + [(μ_B - r_f) / σ_B] σ_p.   (7.3)

Therefore, the efficient set of portfolios is a straight line in (σ_p, μ_p)-space with intercept r_f and slope (μ_B - r_f)/σ_B. The slope of the combination line between the risk-free asset and a risky
[Figure: two lines titled "Portfolio Frontier with one Risky Asset and T-bill", one combining asset A with the T-bill and one combining asset B with the T-bill; portfolio expected return against portfolio standard deviation.]
Figure 7.2: Plot of portfolio expected return versus standard deviation.
asset is called the Sharpe ratio, proposed by Sharpe (1963); it measures the risk premium on the asset per unit of risk (measured by the standard deviation of the asset). The portfolio frontier with one risky asset and a T-bill is illustrated in Figure 7.2.
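Equation (7.3) is straightforward to evaluate. A short sketch (Python here, though the course software is R) computes the Sharpe ratio of asset B and one point on the efficient line, using the Table 7.1 values and r_f = 0.03; the σ_p value of 0.10 is an arbitrary illustration:

```python
# Sharpe ratio of asset B and the efficient line (7.3) with r_f = 0.03.
r_f, mu_B, sig_B = 0.03, 0.055, 0.115

sharpe_B = (mu_B - r_f) / sig_B       # slope of the efficient set
mu_p = r_f + sharpe_B * 0.10          # expected return at sigma_p = 0.10
print(round(sharpe_B, 3), round(mu_p, 4))  # 0.217 0.0517
```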
7.1.3 Efficient Portfolios with Two Risky Assets and a Risk-Free Asset

Now we consider the case when the investor is allowed to form portfolios of assets A, B, and T-bills. The efficient set in this case is still a straight line in (σ_p, μ_p)-space with intercept r_f. The slope of the efficient set, the maximum Sharpe ratio, is such that it is tangent to the frontier constructed using just the two risky assets. We can determine the proportions of each asset in the tangency portfolio by finding the values w_A and w_B that maximize the Sharpe ratio of a portfolio. Formally, one solves
max_{(w_A, w_B): w_A + w_B = 1}  (μ_p - r_f) / σ_p,
where μ_p = w_A μ_A + w_B μ_B and σ²_p = w²_A σ²_A + w²_B σ²_B + 2 w_A w_B σ_AB. The above problem may be reduced to
max_{w_A}  [w_A (μ_A - r_f) + (1 - w_A)(μ_B - r_f)] / [w²_A σ²_A + (1 - w_A)² σ²_B + 2 w_A (1 - w_A) σ_AB]^{1/2}.
[Figure: "Portfolio Frontier with two Risky Assets and T-bill" — the risky-asset frontier, the lines combining each of assets A and B with the T-bill, and the tangency portfolio; portfolio expected return against portfolio standard deviation.]
Figure 7.3: Plot of portfolio expected return versus standard deviation.
The solution to this problem is:

w_A^T = [(μ_A - r_f) σ²_B - (μ_B - r_f) σ_AB] / [(μ_A - r_f) σ²_B + (μ_B - r_f) σ²_A - (μ_A - r_f + μ_B - r_f) σ_AB],

and w_B^T = 1 - w_A^T.
For the example data in Table 7.1 and using r_f = 0.03, we get w_A^T = 0.458 and w_B^T = 0.542. The expected return on the tangency portfolio is μ_T = 0.11 and σ_T = 0.124. The portfolio frontier with two risky assets and a T-bill is illustrated in Figure 7.3. The efficient portfolios are combinations of the tangency portfolio and the T-bill. This important result is known as the mutual fund separation theorem. Which combination of the tangency portfolio and the T-bill an investor will choose depends on the investor's risk preferences. For example, a highly risk-averse investor may choose to put 10% of her wealth in the tangency portfolio and 90% in the T-bill. Then she will hold 4.58% (0.1 × 0.458) of her wealth in asset A, 5.42% of her wealth in asset B, and 90% of her wealth in the T-bill.
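The tangency calculation can be checked directly; the sketch below (Python for illustration) reproduces the weights and the tangency mean and standard deviation up to rounding:

```python
# Tangency portfolio for the two-asset example (Table 7.1, r_f = 0.03).
mu_A, mu_B = 0.175, 0.055
sig2_A, sig2_B, sig_AB = 0.067, 0.013, -0.004875
r_f = 0.03

exA, exB = mu_A - r_f, mu_B - r_f
num = exA * sig2_B - exB * sig_AB
den = exA * sig2_B + exB * sig2_A - (exA + exB) * sig_AB
w_A = num / den                        # about 0.457-0.458
w_B = 1 - w_A

mu_T = w_A * mu_A + w_B * mu_B
sig_T = (w_A**2 * sig2_A + w_B**2 * sig2_B + 2 * w_A * w_B * sig_AB) ** 0.5
print(round(mu_T, 2), round(sig_T, 3))  # 0.11 0.124
```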
7.2 Efficient Portfolios with N Risky Assets

Assume that there are N risky assets with mean vector μ and covariance matrix Σ. Assume that the expected returns of at least two assets differ and that the covariance matrix Σ is of full rank. Define w_a as the N × 1 vector of portfolio weights for an arbitrary portfolio a with weights summing to unity. Portfolio a has mean return μ_a = w′_a μ and variance σ²_a = w′_a Σ w_a. The covariance between any two portfolios a and b is w′_a Σ w_b. We consider minimum-variance portfolios in the absence of a risk-free asset.
Definition: Portfolio p is the minimum-variance portfolio of all portfolios with mean return μ_p if its portfolio weight vector is the solution to the following constrained optimization:

min_w { w′Σw : w′μ = μ_p, and w′ι = 1 },
where ι is a conforming vector of ones. To solve this problem, we form a Lagrangian function L, differentiate with respect to w, set the resulting equations equal to zero, and then solve for w. For the Lagrangian function we have:

L = w′Σw + 2λ₁(μ_p - w′μ) + 2λ₂(1 - w′ι),

where 2λ₁ and 2λ₂ are Lagrange multipliers. Differentiating L with respect to w, we get:

w_p = Σ⁻¹(λ₁ μ + λ₂ ι).   (7.4)
We find the Lagrange multipliers from the constraints, which satisfy

( B  A ) ( λ₁ )   ( μ_p )
( A  C ) ( λ₂ ) = (  1  ),

where A = ι′Σ⁻¹μ, B = μ′Σ⁻¹μ, and C = ι′Σ⁻¹ι. Hence, with D = BC - A²,

λ₁ = (C μ_p - A)/D,   and   λ₂ = (B - A μ_p)/D.
Plugging in to (7.4), we get the portfolio weights:

w_p = g + μ_p h,

where g = [B(Σ⁻¹ι) - A(Σ⁻¹μ)]/D and h = [C(Σ⁻¹μ) - A(Σ⁻¹ι)]/D. There is a number
of results for minimum-variance portfolios (you may refer to CLM for more results):

Result 1: The minimum-variance frontier can be generated from any two distinct minimum-variance portfolios.

Result 2: For the global minimum-variance portfolio, g, we have:

w_g = (1/C) Σ⁻¹ ι,   μ_g = A/C,   and   σ²_g = 1/C.
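These frontier constants and weights are easy to compute with matrix algebra. The sketch below uses a hypothetical 3-asset μ and Σ (not from the text) and checks the w_p = g + μ_p h representation:

```python
# Minimum-variance frontier constants and weights for hypothetical inputs.
import numpy as np

mu = np.array([0.08, 0.05, 0.03])
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.025, 0.005],
                  [0.004, 0.005, 0.010]])
iota = np.ones(3)
Si = np.linalg.inv(Sigma)

A = iota @ Si @ mu
B = mu @ Si @ mu
C = iota @ Si @ iota
D = B * C - A**2

w_g = Si @ iota / C                        # global minimum-variance weights
mu_g, var_g = A / C, 1 / C

g = (B * (Si @ iota) - A * (Si @ mu)) / D  # frontier: w_p = g + mu_p * h
h = (C * (Si @ mu) - A * (Si @ iota)) / D
w_p = g + 0.06 * h                         # frontier portfolio with mean 0.06
print(np.round(w_g, 3), round(mu_g, 4))
```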
Given a risk-free asset with return r_f, the minimum-variance portfolio with expected return μ_p will be a solution to the constrained optimization:

min_w w′Σw,   s.t.  w′μ + (1 - w′ι) r_f = μ_p.

The solution is:

w_p = [(μ_p - r_f) / ((μ - r_f ι)′ Σ⁻¹ (μ - r_f ι))] Σ⁻¹ (μ - r_f ι).
In this case w_p can be expressed as follows:

w_p = c_p w̄,

where

c_p = (μ_p - r_f) / ((μ - r_f ι)′ Σ⁻¹ (μ - r_f ι)),   and   w̄ = Σ⁻¹ (μ - r_f ι).
With a risk-free asset, all minimum-variance portfolios are a combination of a given risky-asset portfolio with weights proportional to w̄ and the risk-free asset. This portfolio is called the tangency portfolio and has the weight vector:

w_q = [1 / (ι′ Σ⁻¹ (μ - r_f ι))] Σ⁻¹ (μ - r_f ι).
The Sharpe ratio for any portfolio a is defined as the mean excess return divided by the standard deviation of return:

sr_a = (μ_a - r_f)/σ_a.

The Sharpe ratio is the slope of the line from the risk-free point (0, r_f) to the portfolio (σ_a, μ_a). The tangency portfolio q can be characterized as the portfolio with the maximum Sharpe ratio of all portfolios of risky assets. Therefore, testing the mean-variance efficiency of a given portfolio is equivalent to testing whether the Sharpe ratio of the portfolio is the maximum of the set of Sharpe ratios of all possible portfolios.
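To illustrate the maximum-Sharpe-ratio characterization, the sketch below computes w_q for hypothetical 3-asset inputs (μ, Σ, and r_f are all made up) and checks that random feasible portfolios never beat its Sharpe ratio:

```python
# Tangency portfolio and maximum Sharpe ratio for hypothetical inputs.
import numpy as np

mu = np.array([0.08, 0.05, 0.03])
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.025, 0.005],
                  [0.004, 0.005, 0.010]])
r_f = 0.02
iota = np.ones(3)
Si = np.linalg.inv(Sigma)

ex = mu - r_f * iota
w_q = Si @ ex / (iota @ Si @ ex)          # tangency weights (sum to one)
sr_q = (w_q @ mu - r_f) / np.sqrt(w_q @ Sigma @ w_q)

rng = np.random.default_rng(0)
for _ in range(1000):                     # random long-only portfolios
    w = rng.dirichlet(np.ones(3))
    sr = (w @ mu - r_f) / np.sqrt(w @ Sigma @ w)
    assert sr <= sr_q + 1e-9
print(round(sr_q, 3))
```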
7.3 Another Look at Mean-Variance Efficiency

Review of the capital asset pricing model (CAPM):

• There is a finite number of securities indexed by i, i = 0, . . . , N.

• Let r_ft denote the risk-free rate at period t.
• The security 0 is risk-free. It has a price of 1 at date t and its price is 1 + r_ft at period t + 1.

• Other securities are risky and have prices p_it, i = 1, . . . , N, t = 1, . . . , T. There are no dividends.

• A portfolio is described by an allocation vector (w_0, w_1, . . . , w_N)′ = (w_0, w′)′.

• The acquisition cost of the portfolio at date t is W_t = w_0 + w′ p_t.
• The value of the portfolio at date t + 1 is unknown, but its expectation and variance are as follows:

μ_Wt(w_0, w) = E_t[W_{t+1}] = w_0 (1 + r_ft) + w′ E_t[p_{t+1}],

and

σ²_Wt(w_0, w) = Var_t(W_{t+1}) = w′ Var_t[p_{t+1}] w.
• The investor's optimization objective is:

max_{w_0, w} { μ_Wt(w_0, w) - (γ/2) σ²_Wt(w_0, w) }   (7.5)

subject to the budget constraint

w_0 + w′ p_t = W,   (7.6)

where W is the initial endowment (wealth) at time t and γ is the investor's risk aversion. From the budget constraint (7.6), one can derive the quantity of the risk-free asset: w_0 = W - w′ p_t.
• The objective function (7.5) can be rewritten as:

max_w { W(1 + r_ft) + w′ {E_t(p_{t+1}) - p_t (1 + r_ft)} - (γ/2) w′ Var_t[p_{t+1}] w },

or equivalently,

max_w { w′ μ_t - (γ/2) w′ Σ_t w },

where Y_{t+1} = p_{t+1} - p_t (1 + r_ft) is the N × 1 vector of excess gains on the risky assets (excess returns), μ_t = E_t(Y_{t+1}) is the expected mean of excess returns (an N × 1 vector), and Σ_t = Var_t(p_{t+1}) is the N × N conditional covariance matrix (note that Var_t(p_{t+1}) = Var_t(Y_{t+1}), since p_t(1 + r_ft) is known at time t).
The objective function is concave in w, and the optimal allocation satisfies the first-order condition:

μ_t = γ Σ_t w*_t,

which implies that the solutions of the mean-variance optimization, that is, the mean-variance efficient portfolio allocations, consist of allocations in risky assets as follows:

w*_t = (1/γ) Σ_t⁻¹ μ_t.   (7.7)

The corresponding quantity of the risk-free asset is w*_{0,t} = W - w*′_t p_t.
7.4 The Black-Litterman Model

7.4.1 Expected Returns

In the traditional mean-variance approach the user inputs a complete set of expected returns and the covariance matrix of returns, and then the portfolio optimizer generates the optimal portfolio weights according to equation (7.7). In the Black-Litterman model proposed by Black and Litterman (1992), the user inputs

(1) any number of views or statements about the expected returns of arbitrary portfolios, and

(2) equilibrium values.

The model combines the views with the equilibrium, producing both the set of expected returns of assets and the optimal portfolio weights. The Black-Litterman (BL) model creates stable, mean-variance efficient portfolios, which overcomes the problem of input sensitivity. It provides the flexibility to combine the market equilibrium with additional market views of the investor.

This model uses equilibrium returns that clear the market as a starting point for the neutral expected returns. The equilibrium returns are derived using a reverse optimization method:

Π = γ Σ w_mkt,   (7.8)

where Π is the N × 1 vector of implied excess equilibrium returns, γ is the risk-aversion coefficient, Σ is the N × N covariance matrix of excess returns, and w_mkt is the N × 1 vector of market capitalization weights. The risk-aversion coefficient measures the rate at which an investor will forgo expected return for less variance. Therefore, the average risk tolerance of the world is represented by the risk-aversion parameter γ. The equilibrium expected returns are Π, and the CAPM prior distribution for the expected returns is Π + ε_e, where ε_e is normally distributed with mean zero and covariance τΣ, and the parameter τ is a scalar measuring the uncertainty of the CAPM prior. As you have seen in the previous section, the solution to the unconstrained maximization problem max_w [w′μ - γ w′Σw/2] implies

w = (1/γ) Σ⁻¹ μ,   (7.9)

where μ is the expected mean of excess returns. One may use the historical return vector (μ_hist) as an estimate of the next-period return, or an estimate of μ obtained by other methods. If μ = Π, then the optimal weight vector w in (7.9) equals w_mkt. Otherwise, w will not equal w_mkt.
He and Litterman (1999) cited two problems with the Markowitz (1952) framework:

1. The Markowitz formulation requires expected returns to be specified for every component of the relevant universe, while investment managers tend to focus on small segments of their potential investment universe.

2. When managers try to optimize using the Markowitz approach, they usually find that the portfolio weights (when not overly constrained) appear extreme and not particularly intuitive. Also, the optimal weights seem to change dramatically from period to period. This is illustrated in Tables 7.2 and 7.3.
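Returning to the reverse optimization in (7.8) and (7.9), the round trip between the two equations can be verified directly; the sketch below uses hypothetical 2-asset inputs and an assumed γ = 3:

```python
# Reverse optimization: Pi = gamma * Sigma * w_mkt, then back via (7.9).
import numpy as np

gamma = 3.0                               # assumed risk-aversion coefficient
Sigma = np.array([[0.040, 0.006],
                  [0.006, 0.020]])
w_mkt = np.array([0.6, 0.4])

Pi = gamma * Sigma @ w_mkt                # implied equilibrium returns (7.8)
w = np.linalg.inv(Sigma) @ Pi / gamma     # optimal weights (7.9) with mu = Pi
print(np.round(Pi, 4), np.round(w, 3))    # w recovers w_mkt
```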
7.4.2 The Black-Litterman Model

The BL formulas for expected returns are written as follows:

E(R) = [(τΣ)⁻¹ + P′ Ω⁻¹ P]⁻¹ [(τΣ)⁻¹ Π + P′ Ω⁻¹ Q],   (7.10)
Var(R) = [(τΣ)⁻¹ + P′ Ω⁻¹ P]⁻¹,   (7.11)

where E(R) is the N × 1 updated (posterior) return vector, τ is a scalar, P is a K × N matrix that identifies the assets involved in the K views, Ω is a K × K diagonal covariance matrix of error terms from the expressed views, and Q is a K × 1 view vector. The expressions for E(R) and Var(R) are used in formula (7.9) to find the optimal weights. The BL model allows investor
Table 7.2: Expected excess return vectors

Asset Class           Historical μ_hist   CAPM GSMI μ_GSMI   CAPM Portfolio μ_p   Implied Equilibrium Return Π
US Bonds                   3.15%              0.02%               0.08%                0.08%
Intl Bonds                 1.75%              0.18%               0.67%                0.67%
US Large Growth           -6.39%              5.57%               6.41%                6.41%
US Large Value            -2.86%              3.39%               4.08%                4.08%
US Small Growth           -6.75%              6.59%               7.43%                7.43%
US Small Value            -0.54%              3.16%               3.70%                3.70%
Intl Dev Equity           -6.75%              3.92%               4.80%                4.80%
Intl Emerg. Equity        -5.26%              5.60%               6.60%                6.60%
Weighted Average          -1.97%              2.41%               3.00%                3.00%
Standard Deviation         3.73%              2.28%               2.53%                2.53%
High                       3.15%              6.59%               7.43%                7.43%
Low                       -6.75%              0.02%               0.08%                0.08%

All four estimates are based on 60 months of excess returns over the risk-free rate. The two CAPM estimates are based on a risk premium of 3%. Dividing the risk premium by the variance of the market (or benchmark) excess returns (σ²) results in a risk-aversion coefficient (γ) of approximately 3.07. All the assets show evidence of fat tails, since the kurtosis exceeds 3, which is the normal value.
Table 7.3: Recommended portfolio weights

Asset Class           Weight based on μ_hist   Weight based on μ_GSMI   Weight based on Π   Market Capitalization w_mkt
US Bonds                    1144.32%                 21.33%                 19.34%               19.34%
Intl Bonds                  -104.59%                  5.19%                 26.13%               26.13%
US Large Growth               54.99%                 10.80%                 12.09%               12.09%
US Large Value                -5.29%                 10.82%                 12.09%               12.09%
US Small Growth              -60.52%                  3.73%                  1.34%                1.34%
US Small Value                81.47%                 -0.49%                  1.34%                1.34%
Intl Dev Equity             -104.36%                 17.10%                 24.18%               24.18%
Intl Emerg. Equity            14.59%                  2.14%                  3.49%                3.49%
High                        1144.32%                 21.33%                 26.13%               26.13%
Low                         -104.59%                 -0.49%                  1.34%                1.34%
views to be expressed in either absolute or relative terms. Three sample views may be as follows:

View 1: International Developed Equity will have an absolute excess return of 5.25%. Confidence of view is 25%.

View 2: International Bonds will outperform US Bonds by 25 basis points. Confidence of view is 50%.

View 3: US Large Growth and US Small Growth will outperform US Large Value and US Small Value by 2%. Confidence of view is 65%.
7.4.3 Building the Inputs

The model does not require that investors specify views on all assets, i.e. K may be less than N. The uncertainty of the views results in a random, unknown, independent, normally distributed error term vector e with mean 0 and covariance matrix Ω, i.e. a view is Q + e, and for the three views considered Q = (5.25, 0.25, 2)′. The variance of the error terms is Ω = diag{ω_1, . . . , ω_K}. The expressed views in the column vector Q are matched to specific assets by the matrix P = (p_ij); for the views considered
P = (  0    0     0      0      0      0     1    0
      -1    1     0      0      0      0     0    0
       0    0    1/2   -1/2    1/2   -1/2    0    0 ),
where the equal weighting scheme in row 3 of P is used. Another option is to use a market capitalization scheme. Once the matrix P is defined, one can calculate the variance of each individual view portfolio, p_k Σ p′_k, where p_k is the kth 1 × N row of the matrix P. He and Litterman (1999) assumed that τ = 0.025 and defined:

Ω = diag{p_1 Σ p′_1 τ, . . . , p_K Σ p′_K τ}.

The process of construction of the new combined (or updated) returns is summarized in Figure 7.4.
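Putting the pieces together, the sketch below runs the BL update (7.10)-(7.11) for a hypothetical 3-asset market with a single relative view; Σ, w_mkt, γ, τ, and the view are all made up for illustration, and Ω follows the He-Litterman calibration Ω = diag(P(τΣ)P′):

```python
# Black-Litterman posterior mean (7.10) with one view, hypothetical inputs.
import numpy as np

Sigma = np.array([[0.040, 0.010, 0.005],
                  [0.010, 0.030, 0.008],
                  [0.005, 0.008, 0.020]])
w_mkt = np.array([0.5, 0.3, 0.2])
gamma, tau = 3.0, 0.025

Pi = gamma * Sigma @ w_mkt                 # implied equilibrium returns (7.8)

P = np.array([[1.0, -1.0, 0.0]])           # view: asset 1 outperforms asset 2
Q = np.array([0.02])                       # ... by 2%
Omega = np.diag(np.diag(P @ (tau * Sigma) @ P.T))

tSi = np.linalg.inv(tau * Sigma)
Oi = np.linalg.inv(Omega)
M = np.linalg.inv(tSi + P.T @ Oi @ P)
ER = M @ (tSi @ Pi + P.T @ Oi @ Q)         # posterior mean, equation (7.10)
print(np.round(Pi, 4), np.round(ER, 4))
```

With this calibration the posterior spread between assets 1 and 2 ends up between the equilibrium spread and the stated view, as the precision weighting implies.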
7.5 Estimation of Covariance Matrix

The estimation of the covariance matrix of stock returns is very important in the portfolio selection process. There are two major methods in the literature.
[Figure: flowchart combining the inputs — the risk-aversion coefficient γ = (E(r) - r_f)/σ², the covariance matrix Σ, the market capitalization weights w_mkt, the views Q, and the uncertainty of views Ω — into the implied equilibrium return vector Π = γ Σ w_mkt, the prior equilibrium distribution r ~ N(Π, τΣ), and the view distribution r ~ N(Q, Ω), which are merged into the new combined return distribution r ~ N(μ̄, Σ̄) with μ̄ = [(τΣ)⁻¹ + P′Ω⁻¹P]⁻¹[(τΣ)⁻¹Π + P′Ω⁻¹Q] and Σ̄ = [(τΣ)⁻¹ + P′Ω⁻¹P]⁻¹.]
Figure 7.4: Deriving the new combined return vector E(R).
7.5.1 Estimation Approaches

Let R_t = (r_1t, r_2t, . . . , r_Nt)′ be an N × 1 vector of stock returns at period t and R̄ = (1/T) Σ_{t=1}^T R_t. There are two popular approaches to estimate the covariance matrix of stock returns:
1. The sample variance-covariance matrix, which can be computed as follows:

S = (1/T) Σ_{t=1}^T (R_t - R̄)(R_t - R̄)′,

where S is an N × N sample variance-covariance matrix. The main advantage of this approach is that the estimator does not impose much structure on the process generating returns. The disadvantage is that S is singular if T < N.
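The singularity issue is easy to demonstrate: with T observations, the demeaned return matrix has rank at most T - 1, and so does S. A quick sketch with random data:

```python
# Sample covariance S is singular when T < N: rank(S) <= T - 1.
import numpy as np

rng = np.random.default_rng(1)
T, N = 5, 10
R = rng.normal(size=(T, N))               # rows are the return vectors R_t
S = (R - R.mean(axis=0)).T @ (R - R.mean(axis=0)) / T
print(np.linalg.matrix_rank(S))           # 4, so the 10 x 10 matrix S is singular
```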
2. The covariance matrix may be computed using factor models of the following form:

r_it = α_i + β_i1 r_mt + β_i2 f_2t + · · · + β_ik f_kt + e_it,   i = 1, ..., N;  t = 1, ..., T,   (7.12)

where e_it ~ N(0, σ²_i) is uncorrelated with the factors. Model (7.12) may be written in matrix notation as follows:

R_t = α + B X_t + E_t,   t = 1, ..., T,   (7.13)
where

B = ( β_11  · · ·  β_1k
      β_21  · · ·  β_2k
        ⋮            ⋮
      β_N1  · · ·  β_Nk ),   and   X_t = (r_mt, f_2t, . . . , f_kt)′.
The covariance matrix of returns in model (7.12) can be written as follows:

Σ = (σ_ij) = B Σ_X B′ + Δ,   (7.14)

where Σ_X is the covariance matrix of the factors X_t and Δ is a diagonal matrix. Note that
• The factor model (7.12) can be used for risk decomposition of the portfolio. In particular, the portfolio return is defined as r_p = w′R_t, where w is an N × 1 vector of weight allocations. The portfolio variance is equal to:

σ²_p = w′Σw = w′ B Σ_X B′ w + w′ Δ w,

where w′ B Σ_X B′ w is the risk attributed to the common factors and w′ Δ w is the risk attributed to the idiosyncratic component.
• For the single-index factor model (market model) the covariance matrix (7.14) becomes:

Σ = σ²_m β β′ + Δ,   (7.15)

where σ²_m is the variance of the market factor.
3. The advantages of the factor approach to computing the covariance matrix are that the covariance matrix is nonsingular and the factors may have economic meaning. The disadvantages are that there is no consensus on the number of factors to be used in the model and no consensus on which factors should be included.
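A single-index estimate as in (7.15) can be built from market-model regressions; the sketch below simulates returns (all parameters made up) and forms F from the OLS betas and residual variances:

```python
# Single-index covariance estimate F = s2_m * b b' + D, from simulated data.
import numpy as np

rng = np.random.default_rng(2)
T, N = 200, 4
r_m = 0.01 + 0.04 * rng.normal(size=T)                  # market returns
beta_true = np.array([0.8, 1.0, 1.2, 0.5])
R = 0.002 + np.outer(r_m, beta_true) + 0.02 * rng.normal(size=(T, N))

x = r_m - r_m.mean()
b = x @ (R - R.mean(axis=0)) / (x @ x)                  # OLS market betas
resid = R - R.mean(axis=0) - np.outer(x, b)             # market-model residuals
s2_m = x @ x / T
F = s2_m * np.outer(b, b) + np.diag(resid.var(axis=0))  # equation (7.15)
print(np.round(b, 2))
```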
Ledoit and Wolf (2003) suggested using a weighted average of the sample covariance matrix and the covariance matrix computed from the single-index model as the estimate of the covariance matrix, i.e. compute the covariance matrix as follows:

S* = α̂ F + (1 - α̂) S,   (7.16)

where 0 ≤ α̂ ≤ 1 and F = (f_ij) is the estimate of the covariance matrix in equation (7.15). The advantages are that the covariance matrix S* is nonsingular and there is no question about the selection of appropriate factors. The problem with (7.16) is how to choose α̂. To choose it, Ledoit and Wolf (2003) proposed a shrinkage method, described next.
7.5.2 Shrinkage Estimator of the Covariance Matrix

Assumptions:

A1: Stock returns are independent and identically distributed (IID) through time.

A2: The number of stocks N is fixed and finite, while the number of observations T goes to infinity.

A3: Stock returns have finite fourth moments.

A4: Φ ≠ Σ = Var(R_t) = (σ_ij), where Φ = (φ_ij) is the single-index covariance matrix in (7.15).

A5: The market portfolio has positive variance, i.e. σ²_m > 0.
Actual stock returns do not satisfy Assumption A1, because it ignores:

1. Lead-lag effects.

2. Volatility clustering: autoregressive conditional heteroskedasticity (ARCH).

3. Nonsynchronous trading.

Also, note that

1. Any broad-based market index can be used as the market portfolio.

2. Equal-weighted indices are better at explaining stock market variance than value-weighted indices.

3. The assumption that the residuals are uncorrelated should theoretically preclude the portfolio that makes up the market from containing any of the N stocks in the sample. However, as long as the size of the portfolio is large, such a violation has a small effect and is typically ignored in applications.
Ledoit and Wolf (2003) showed that the optimal choice of the shrinkage constant satisfies:

α = κ/T,   where   κ = (π - ρ)/γ,

and π, ρ and γ are appropriately defined. It can be shown from Ledoit and Wolf (2003) that for the optimal shrinkage constant the following are true:

π = Σ_{i=1}^N Σ_{j=1}^N π_ij,   ρ = Σ_{i=1}^N Σ_{j=1}^N ρ_ij,   and   γ = Σ_{i=1}^N Σ_{j=1}^N γ_ij,
where π_ij is the asymptotic variance of √T s_ij, ρ_ij is the asymptotic covariance of √T f_ij and √T s_ij, and γ_ij = (φ_ij - σ_ij)². Keeping the same notation as in the paper by Ledoit and Wolf (2003), the consistent estimators for π_ij, ρ_ij and γ_ij are as follows:
π̂_ij = (1/T) Σ_{t=1}^T {(r_it - r̄_i)(r_jt - r̄_j) - s_ij}²,   ρ̂_ij = (1/T) Σ_{t=1}^T ρ̂_ijt for i ≠ j,   ρ̂_ii = π̂_ii,
and γ̂_ij = (f_ij - s_ij)², where

ρ̂_ijt = [ s_j0 s_00 (r_it - r̄_i) + s_i0 s_00 (r_jt - r̄_j) - s_i0 s_j0 (r_0t - r̄_0) ] (r_0t - r̄_0)(r_it - r̄_i)(r_jt - r̄_j) / s²_00 - f_ij s_ij,

with s_00 = σ̂²_m the sample variance of the market returns, s_j0 the sample covariance of r_j and r_m, and r_0t = r_mt. It can be shown that κ̂ = (π̂ - ρ̂)/γ̂ is a consistent estimator for the optimal shrinkage constant κ = (π - ρ)/γ, where

π̂ = Σ_{i=1}^N Σ_{j=1}^N π̂_ij,   ρ̂ = Σ_{i=1}^N Σ_{j=1}^N ρ̂_ij,   and   γ̂ = Σ_{i=1}^N Σ_{j=1}^N γ̂_ij.
As a result, Ledoit and Wolf (2003) recommended the following shrinkage estimator for the covariance matrix of stock returns:

Σ̂ = α̂ F + (1 - α̂) S,

where α̂ = κ̂/T. For more details about the theory and the methodology, please read the paper by Ledoit and Wolf (2003).
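A direct, unoptimized sketch of this estimator (following the formulas as reconstructed above; the data are simulated, and the market return plays the role of r_0t):

```python
# Ledoit-Wolf (2003) style shrinkage of S toward the single-index estimate F.
import numpy as np

rng = np.random.default_rng(3)
T, N = 120, 6
r_m = 0.01 + 0.04 * rng.normal(size=T)                  # market factor
R = np.outer(r_m, rng.uniform(0.5, 1.5, size=N)) + 0.03 * rng.normal(size=(T, N))

Rc = R - R.mean(axis=0)                                 # demeaned returns
xc = r_m - r_m.mean()
S = Rc.T @ Rc / T                                       # sample covariance
s00 = xc @ xc / T                                       # market variance
s0 = Rc.T @ xc / T                                      # cov(r_i, r_m)
b = s0 / s00
F = s00 * np.outer(b, b)
F = F - np.diag(np.diag(F)) + np.diag(np.diag(S))       # f_ii = s_ii on diagonal

pi_ij = ((Rc[:, :, None] * Rc[:, None, :] - S) ** 2).mean(axis=0)
pi_hat = pi_ij.sum()

rho_hat = np.trace(pi_ij)                               # rho_ii = pi_ii
for i in range(N):
    for j in range(N):
        if i != j:
            t = ((s0[j] * s00 * Rc[:, i] + s0[i] * s00 * Rc[:, j]
                  - s0[i] * s0[j] * xc) * xc * Rc[:, i] * Rc[:, j]).mean() / s00**2
            rho_hat += t - F[i, j] * S[i, j]

gamma_hat = ((F - S) ** 2).sum()
kappa = (pi_hat - rho_hat) / gamma_hat
alpha = min(1.0, max(0.0, kappa / T))                   # shrinkage intensity
S_star = alpha * F + (1 - alpha) * S
print(round(alpha, 3))
```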
7.5.3 Recent Developments
For the recent developments in this area, please read the papers by Ledoit and Wolf (2004)
and Fan, Fan and Lv (2008).
7.6 Problems

1. Read the paper by Fan, Fan and Lv (2008). Write a referee report in which you summarize the main motivation for the paper, the novel approach proposed for the estimation of the variance-covariance matrix, and the main findings.

2. Refer to the paper by Ledoit and Wolf (2003) to do this problem. Use the data for 34 stocks in 34stocks.csv (or other stocks) to find the weights in the construction of the optimal mean-variance portfolio using different approaches. The sample period is from January, 1985 to September, 2004, with 237 observations. The first column is the date of the stock observations, and columns 37-39 contain information about the names of the companies. If you need the market returns (say, S&P500), please download them by yourself, but the sample period must be the same as that for the 34 stocks. You may use historical sample averages as estimates of the expected values of stock returns.

(a) Use the sample variance-covariance matrix of stock returns S to construct the optimal portfolio.

(b) Use the estimate of the variance-covariance matrix of stock returns from the market model F to construct the optimal portfolio.

(c) Use the improved estimate of the variance-covariance matrix of stock returns S* to construct the optimal portfolio.

3. Construct the mean-variance efficient frontier for the portfolio of the examined 34 stocks for the last month of the sample. If you need a value for the risk-aversion coefficient (γ), you can take it to be approximately 3. You may use any estimator of the variance-covariance matrix of stock returns. You may use the historical sample average of stock returns as the estimate of the expected value of returns.
4. Download the data for returns on 30 Industry Portfolios² provided by Ken French at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/datalibrary.html

²You need to use the new specification of industries and monthly returns.

(a) Use the sample variance-covariance matrix of portfolio returns S to construct the optimal portfolio consisting of 30 industry portfolios (asset classes).

(b) Use the estimate of the variance-covariance matrix of portfolio returns from the market model F to construct the optimal portfolio consisting of 30 industry portfolios.

(c) Use the Ledoit and Wolf (2003) or Fan, Fan and Lv (2008) estimate of the variance-covariance matrix of stock returns S* to construct the optimal portfolio consisting of 30 industry portfolios.
7.7 References
Bevan, A. and K. Winkelmann (1998). Using the Black-Litterman global asset alloca-
tion model: Three years of practical experience. Goldman Sachs. The web link is
http://faculty.fuqua.duke.edu/ charvey/Teaching/BA453 2005/GS Using the black.pdf
Black, F. and R. Litterman (1990). Asset allocation: Combining investor views with market
equilibrium. Fixed Income Research, Goldman, Sachs & Co., October.
Black, F. and R. Litterman (1992). Global portfolio optimization. Financial Analysts
Journal, September/October, 28-43.
Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997). The Econometrics of Financial
Markets. Princeton University Press, Princeton, NJ. (Chapter 5.2).
Fan, J., Y. Fan and J. Lv (2008). High dimensional covariance matrix estimation using a
factor model. Journal of Econometrics, 147, 186-197.
Gourieroux, C. and J. Jasiak (2001). Financial Econometrics: Problems, Models, and
Methods. Princeton University Press, Princeton, NJ. (Chapter 3.4, 4.2)
He, G. and R. Litterman (1999). The intuition behind the Black-Litterman model portfo-
lios. Investment Management Research, Goldman, Sachs & Co., December. The web
link is
http://faculty.fuqua.duke.edu/ charvey/Teaching/BA453 2005/GS The intuition behind.pdf
Idzorek, T.M. (2004). A step-by-step guide to the Black-Litterman model. The web link is
http://faculty.fuqua.duke.edu/charvey/Teaching/BA453 2005/Idzorek onBL.pdf
Markowitz, H. (1952). Portfolio selection. Journal of Finance, 7, 71-99.
Ledoit, O. and M. Wolf (2003). Improved estimation of the covariance matrix of stock
returns with an application to portfolio selection. Journal of Empirical Finance, 10,
603-621.
Ledoit, O. and M. Wolf (2004). A well-conditioned estimator for large-dimensional covari-
ance matrices. Journal of Multivariate Analysis, 88, 365-411
Sharpe, W.F. (1963). A simplied model for portfolio analysis. Management Science, 9,
277-293.
Zivot, E. (2002). Lecture Notes on Applied Econometric Modeling in Finance. The web
link is: http://faculty.washington.edu/ezivot/econ483/483notes.htm
Chapter 8
Capital Asset Pricing Model
8.1 Review of the CAPM
Markowitz (1959) laid the groundwork for the capital asset pricing model (CAPM). He cast the investor's portfolio selection problem in terms of expected return and variance of return, and argued that investors would optimally hold a mean-variance efficient portfolio, i.e. a portfolio with the highest expected return for a given level of variance. Sharpe (1964) and Lintner (1965a, 1965b) showed that if investors have homogeneous expectations and optimally hold mean-variance efficient portfolios then, in the absence of market frictions, the portfolio of all invested wealth (the market portfolio) will itself be a mean-variance efficient portfolio.

The Sharpe and Lintner version of the CAPM can be expressed in terms of the following statistical model:
i
) = R
f
+
im
(E(R
m
) R
f
),
im
=
Cov(R
i
, R
m
)
Var(R
m
)
, (8.1)
where R_i is the ith asset return, R_m is the return on the market portfolio, R_f is the return on the risk-free asset, and stock market returns are assumed to be i.i.d. and jointly normally distributed (CER model). The Sharpe-Lintner version can be expressed in terms of excess returns:

E(Z_i) = β_im E(Z_m),   β_im = Cov(Z_i, Z_m) / Var(Z_m),   (8.2)
where Z_i = R_i - R_f and Z_m = R_m - R_f. In empirical applications, the estimates of β_im from (8.1) and (8.2) may differ because R_f is stochastic. Notice that model (8.2) may be written as:

E(Z_i) = [E(Z_m) / Var(Z_m)] Cov(Z_i, Z_m).
CHAPTER 8. CAPITAL ASSET PRICING MODEL 156
There are several derivations of the CAPM model.¹ One way to derive the CAPM is to assume exponential utility and a normally distributed set of returns. In this case, the expected utility is

E[u(c)] = E[-exp(-Ac)],

where A is the coefficient of absolute risk aversion and c is consumption. If consumption is normally distributed, c ~ N(μ_c, σ²_c), we have

E[u(c)] = -exp( -A μ_c + (A²/2) σ²_c ).
Suppose that the investor has initial wealth W which can be split between a risk-free asset paying R_f and a set of risky assets paying return R, which are assumed to be normally distributed. Let y denote the amount² of the wealth W invested in each security. Therefore, the budget constraint is:

c = y_f R_f + y′R,   and   W = y_f + y′ι,

where ι is an N × 1 vector of ones. Then, consumption is normally distributed because the risky assets are normally distributed, with mean μ_c = y_f R_f + y′μ_R and variance σ²_c = y′Σy, where Σ is the N × N covariance matrix of risky returns and μ_R = E(R). Plugging
these equations into the utility function, we obtain:

E[u(c)] = -exp( -A(y_f R_f + y′E(R)) + (A²/2) y′Σy )
        = -exp( -A W R_f - A y′(E(R) - R_f ι) + (A²/2) y′Σy ),   (8.3)

where we use the constraint y_f = W - y′ι. Maximizing (8.3), we obtain the first-order condition describing the optimal amount to be invested in the risky assets,

-A(E(R) - R_f ι) + A² Σ y = 0,

so that

y = (1/A) Σ⁻¹ [E(R) - R_f ι].   (8.4)
¹You may check Chapter 9 of Cochrane (2001) for a rigorous discussion.
²Note that this is an amount and not a fraction.
Note that the amount of wealth invested in risky assets is independent of the level of wealth. That is why one usually says that the investor has absolute rather than relative risk aversion. One may rewrite equation (8.4) as:

E(R) - R_f ι = A Σ y.   (8.5)

Note that

Σ y = E[(R - μ_R)(R - μ_R)′] y = E[(R - μ_R)(y′(R - μ_R))]
    = E[(R - μ_R)(y′R + y_f R_f - (y′μ_R + y_f R_f))] = Cov(R, R_p),

where R_p = y′R + y_f R_f, which is the investor's overall portfolio. Therefore, Σy gives the covariance of each return with the investor's overall portfolio. If all investors are identical, then the market portfolio is the same as each individual's portfolio, so Σy also gives the covariance of each return with R_m, i.e. Σy = Cov(R, R_m). Equation (8.5) then becomes:

E(R) - R_f ι = A Cov(R, R_m).   (8.6)
Note that equation (8.1) may be written as:

E(R) - R_f ι = Cov(R, R_m) [ (E(R_m) - R_f) / Var(R_m) ],

which is the same as the model given in (8.2). Therefore, this derivation of the CAPM ties the market price of risk to the risk-aversion coefficient. This can also be seen by applying (8.6) to the market return itself:

E(R_m) - R_f = A Var(R_m).
8.2 Statistical Framework for Estimation and Testing

Define Z_t as an N × 1 vector of excess returns for N assets (or portfolios of assets). For these N assets, the excess returns can be described using the excess-return market model:

Z_t = α + β Z_mt + e_t,   E(e_t) = 0,   E(e_t e′_t) = Σ,   Cov(Z_mt, e_t) = 0,

where β is the N × 1 vector of betas, Z_mt is the time-period-t market portfolio excess return, and α and e_t are N × 1 vectors of asset return intercepts and disturbances. Denote E(Z_mt) = μ_m and E(Z_mt - μ_m)² = σ²_m. Three implications of the Sharpe-Lintner version of the CAPM:
1. The vector of asset return intercepts α is zero. The regression intercepts may be viewed as the pricing errors.

2. The cross-sectional variation of expected excess returns is entirely captured by the betas.

3. The market risk premium, E(Z_mt), is positive.
There are three major methods of estimating the parameters: time-series, cross-sectional, and Fama-MacBeth, described next.

8.2.1 Time-Series Regression

The implication of the Sharpe-Lintner version of the CAPM that the regression intercepts of the excess returns model are zero may be tested using time-series regressions. One runs N time-series regressions:

Z_it = α_i + β_im Z_mt + e_it,   i = 1, . . . , N.

The estimate of the factor premium (market premium), λ = E(Z_m), may be found as the sample mean of the factor:

λ̂ = (1/T) Σ_{t=1}^T Z_mt.
For the case of uncorrelated and homoskedastic regression errors, one may use standard t-tests to check that the pricing errors α_i, i = 1, ..., N, are in fact zero. However, one usually wants to know whether all the pricing errors are jointly equal to zero. This hypothesis can be tested using the following Wald-type χ² test³:

T [1 + (μ̂_m/σ̂_m)²]⁻¹ α̂′ Σ̂⁻¹ α̂ ~ χ²_N,

where Σ̂ is the residual covariance matrix, i.e. the sample estimate of E(e_t e′_t) = Σ. This test is valid asymptotically, i.e. as T → ∞, and does not require the assumption of no autocorrelation or heteroskedasticity. A finite-sample F-test for the hypothesis that a set of parameters are jointly zero is:

[(T - N - 1)/N] [1 + (μ̂_m/σ̂_m)²]⁻¹ α̂′ Σ̂⁻¹ α̂ ~ F_{N, T-N-1}.
³You may check Chapter 5.3 of CLM (1997) and Chapter 12 of Cochrane (2001) for a rigorous discussion.
This distribution requires that the errors are normal as well as uncorrelated and homoskedastic. Note that the assumption of uncorrelated residuals is needed to make sure that Σ̂ is non-singular. See CLM (1997, p. 193) for details.
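The time-series test is easy to run on simulated data where the null holds (all intercepts are zero by construction); everything below is hypothetical:

```python
# Time-series CAPM regressions and the finite-sample F-test under the null.
import numpy as np

rng = np.random.default_rng(4)
T, N = 240, 5
z_m = 0.005 + 0.04 * rng.normal(size=T)                    # market excess returns
beta = rng.uniform(0.5, 1.5, size=N)
Z = np.outer(z_m, beta) + 0.02 * rng.normal(size=(T, N))   # alphas are zero

X = np.column_stack([np.ones(T), z_m])
coef, *_ = np.linalg.lstsq(X, Z, rcond=None)               # row 0: alphas, row 1: betas
alpha_hat = coef[0]
resid = Z - X @ coef
Sig_hat = resid.T @ resid / T                              # residual covariance

mu_m, sig_m = z_m.mean(), z_m.std()
q = alpha_hat @ np.linalg.inv(Sig_hat) @ alpha_hat
F_stat = (T - N - 1) / N * q / (1 + (mu_m / sig_m) ** 2)
print(round(F_stat, 2))      # compare with F(N, T-N-1) critical values
```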
If there are many factors that are excess returns, the same ideas work. The regression equation is

Z_it = α_i + β′_i f_t + e_it,

where f_t is a K × 1 vector of excess returns and β_i is a K × 1 vector of factor loadings. The asset pricing model has the following form:

E(Z_it) = β′_i E(f_t).
We can estimate $\alpha$ and $\beta$ with ordinary least squares (OLS) time-series regressions. Assuming normal i.i.d. errors with constant variance, one may use the following test statistic:
\[
\frac{T - N - K}{N} \left[ 1 + \hat{\mu}_f' \hat{\Omega}_f^{-1} \hat{\mu}_f \right]^{-1} \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} \sim F_{N,\, T-N-K},
\]
where $N$ is the number of assets, $K$ is the number of factors, and $\hat{\Omega}_f = \frac{1}{T} \sum_{t=1}^{T} (f_t - \hat{\mu}_f)(f_t - \hat{\mu}_f)'$. Cochrane (2001, p. 234) showed that the asymptotic $\chi^2$ test
\[
T \left[ 1 + \hat{\mu}_f' \hat{\Omega}_f^{-1} \hat{\mu}_f \right]^{-1} \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} \sim \chi^2_N
\]
does not require the assumption of i.i.d. errors or independence from the factors.
8.2.2 Cross-Sectional Regression
The central economic question is why average returns vary across assets. For the excess returns model of Sharpe and Lintner (see (8.2)), we have
\[
E(Z_i) = \beta_{im} \lambda,
\]
where $E(Z_m) = \lambda$ is the factor risk premium. This model states that the expected return of an asset should be high if that asset has high betas, i.e., a large risk exposure to factor(s) that carry high risk premia. This is illustrated in Figure 8.1. The model says that average returns should be proportional to betas. However, even if the model is true, it will not work out perfectly in each sample, so there will be some spread $\alpha_i$ as shown. Given these facts, a natural idea is to run a cross-sectional regression to fit a line through the scatter plot of Figure 8.1. Cross-sectional regressions consist of two steps:
[Figure 8.1: Cross-sectional regression. Average excess returns $E(Z_i)$ are plotted against betas $\beta_i$ across assets $i$; the fitted line through the scatter has slope $\lambda$.]
1. Find estimates of the betas from time-series regressions:
\[
Z_{it} = \alpha_i + \beta_i' f_t + e_{it}, \qquad i = 1, \ldots, N.
\]
Use the estimated parameters $\hat{\beta}_i$, $i = 1, \ldots, N$, to form an $N \times K$ matrix $B$ of factor loadings to be used in the second step, $B' = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_N)$.
2. Estimate the factor risk premia $\lambda$ from a regression across assets of average returns on the betas:
\[
\bar{Z} = B\lambda + \alpha, \qquad (8.7)
\]
where $\bar{Z} = (\bar{Z}_1, \bar{Z}_2, \ldots, \bar{Z}_N)'$ is an $N \times 1$ vector with $\bar{Z}_i = \frac{1}{T} \sum_{t=1}^{T} Z_{it}$, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)'$, $\lambda$ is a $K \times 1$ vector of risk premia (or factor returns), and $\beta_i$ is a $K \times 1$ vector. As in the figure, the $\beta$'s are right-hand variables, the $\lambda$'s are regression coefficients, and the cross-sectional regression residuals $\alpha$ are the pricing errors. You can run the cross-sectional regression with or without a constant; the theory says that the constant should be zero.
OLS Cross-Sectional Regression
Consider a model with factors only, without an intercept, in the cross-sectional regression. The OLS cross-sectional estimates are:
\[
\hat{\lambda} = (B'B)^{-1} B' \bar{Z}, \qquad \hat{\alpha} = \bar{Z} - B\hat{\lambda} = \left[ I - B(B'B)^{-1}B' \right] \bar{Z},
\]
where the true errors are i.i.d. over time and independent of the factors. Since the $\hat{\alpha}_i$ are just time-series averages of the true $e_{it}$, the errors in the cross-sectional regression have covariance matrix $E(\alpha\alpha') = \frac{1}{T}\Sigma$. Then,
\[
\mathrm{Var}(\hat{\lambda}) = \frac{1}{T} (B'B)^{-1} B' \Sigma B (B'B)^{-1},
\]
and
\[
\mathrm{Var}(\hat{\alpha}) = \frac{1}{T} \left[ I - B(B'B)^{-1}B' \right] \Sigma \left[ I - B(B'B)^{-1}B' \right].
\]
We could test whether all pricing errors are zero with the statistic:
\[
\hat{\alpha}' \,\mathrm{Var}(\hat{\alpha})^{-1}\, \hat{\alpha} \sim \chi^2_{N-K}. \qquad (8.8)
\]
Note that the asymptotic distribution in (8.8) is $\chi^2_{N-K}$, not $\chi^2_N$, because the covariance matrix is singular and one has to use a generalized inverse.
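The OLS two-pass procedure can be sketched as follows (an illustrative Python/NumPy version with simulated data; all names are mine, not from the text): estimate betas in a first-pass time-series regression, then regress average returns on the betas without a constant.

```python
import numpy as np

def cross_sectional_ols(Z, F):
    """Two-pass estimates: time-series betas, then OLS cross-sectional
    regression of average returns on betas (no constant).
    Z: T x N excess returns; F: T x K factor excess returns."""
    T, N = Z.shape
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)
    B = coef[1:].T                                 # N x K matrix of betas
    Zbar = Z.mean(axis=0)                          # N-vector of average returns
    lam = np.linalg.solve(B.T @ B, B.T @ Zbar)     # lambda-hat = (B'B)^-1 B' Zbar
    alpha = Zbar - B @ lam                         # pricing errors
    return lam, alpha, B

# simulated single-factor data in which the model holds (alpha = 0)
rng = np.random.default_rng(1)
T, N = 1000, 8
f = 0.006 + 0.05 * rng.standard_normal((T, 1))
beta = rng.uniform(0.4, 1.6, (N, 1))
Z = f @ beta.T + 0.03 * rng.standard_normal((T, N))
lam, alpha, B = cross_sectional_ols(Z, f)
# lam should be close to the factor's sample mean and the alphas near zero
```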
GLS Cross-Sectional Regression
Generalized least squares (GLS) cross-sectional estimates are:
\[
\hat{\lambda} = (B' \Sigma^{-1} B)^{-1} B' \Sigma^{-1} \bar{Z}, \qquad \hat{\alpha} = \bar{Z} - B\hat{\lambda}.
\]
The variances of these estimates are as follows:
\[
\mathrm{Var}(\hat{\lambda}) = \frac{1}{T} (B' \Sigma^{-1} B)^{-1}, \qquad \mathrm{Var}(\hat{\alpha}) = \frac{1}{T} \left[ \Sigma - B (B' \Sigma^{-1} B)^{-1} B' \right].
\]
One could use the test in (8.8),
\[
\hat{\alpha}' \,\mathrm{Var}(\hat{\alpha})^{-1}\, \hat{\alpha} \sim \chi^2_{N-K},
\]
or an equivalent test that does not require a generalized inverse:
\[
T \hat{\alpha}' \Sigma^{-1} \hat{\alpha} \sim \chi^2_{N-K}. \qquad (8.9)
\]
For details, see Cochrane (2001, p. 238).
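A hedged sketch of the GLS estimates and the statistic in (8.9), again in Python/NumPy with simulated data (the function name and data are my own assumptions):

```python
import numpy as np

def cross_sectional_gls(Zbar, B, Sigma, T):
    """GLS cross-sectional estimates and the chi^2_{N-K} statistic in (8.9):
    lambda = (B' Sigma^-1 B)^-1 B' Sigma^-1 Zbar, alpha = Zbar - B lambda,
    stat = T * alpha' Sigma^-1 alpha."""
    Si_B = np.linalg.solve(Sigma, B)                   # Sigma^-1 B
    lam = np.linalg.solve(B.T @ Si_B, Si_B.T @ Zbar)   # GLS risk premia
    alpha = Zbar - B @ lam                             # pricing errors
    stat = T * alpha @ np.linalg.solve(Sigma, alpha)   # test statistic (8.9)
    return lam, alpha, stat

# simulated single-factor data; first pass supplies betas and Sigma-hat
rng = np.random.default_rng(2)
T, N = 1500, 6
f = 0.005 + 0.045 * rng.standard_normal((T, 1))        # single factor (K = 1)
beta = rng.uniform(0.5, 1.5, (N, 1))
Z = f @ beta.T + 0.025 * rng.standard_normal((T, N))   # model holds: alpha = 0
X = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(X, Z, rcond=None)           # first-pass betas
resid = Z - X @ coef
lam, alpha, stat = cross_sectional_gls(Z.mean(axis=0), coef[1:].T,
                                       resid.T @ resid / T, T)
# under H0, stat is approximately chi^2 with N - K = 5 degrees of freedom
```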
Correction for the Fact that B Is Estimated
In applying the standard OLS and GLS formulas to a cross-sectional regression, we assume that the right-hand variables $B$ are fixed. This is not true, since the $B$ in the cross-sectional regression are not fixed but are estimated in the time-series regressions. The correction for the estimation of $B$ is due to Shanken (1992):
\[
\mathrm{Var}(\hat{\lambda}_{OLS}) = \frac{1}{T}\left[ (B'B)^{-1} B' \Sigma B (B'B)^{-1} \left(1 + \lambda' \Omega_f^{-1} \lambda\right) + \Omega_f \right],
\]
\[
\mathrm{Var}(\hat{\lambda}_{GLS}) = \frac{1}{T}\left[ (B' \Sigma^{-1} B)^{-1} \left(1 + \lambda' \Omega_f^{-1} \lambda\right) + \Omega_f \right],
\]
\[
\mathrm{Var}(\hat{\alpha}_{OLS}) = \frac{1}{T}\left[ I - B(B'B)^{-1}B' \right] \Sigma \left[ I - B(B'B)^{-1}B' \right] \left(1 + \lambda' \Omega_f^{-1} \lambda\right),
\]
\[
\mathrm{Var}(\hat{\alpha}_{GLS}) = \frac{1}{T}\left[ \Sigma - B(B'\Sigma^{-1}B)^{-1}B' \right] \left(1 + \lambda' \Omega_f^{-1} \lambda\right).
\]
One can use the test (8.8) with the corrected estimates of the variances. One can also use the test in (8.9) for the corrected GLS estimates:
\[
T \left( 1 + \hat{\lambda}_{GLS}' \hat{\Omega}_f^{-1} \hat{\lambda}_{GLS} \right)^{-1} \hat{\alpha}_{GLS}' \hat{\Sigma}^{-1} \hat{\alpha}_{GLS} \sim \chi^2_{N-K}. \qquad (8.10)
\]
For details, see Cochrane (2001, p. 239).
Time Series versus Cross Section
The main difference between cross-sectional and time-series regressions is that one can run the cross-sectional regression when the factor is not a return. The time-series test requires factors that are also returns, so that one can estimate factor risk premia by
\[
\hat{\lambda} = \frac{1}{T} \sum_{t=1}^{T} f_t.
\]
If the factor is an excess return, the GLS cross-sectional regression, including the factor as a test asset, is identical to the time-series regression.
8.2.3 Fama-MacBeth Procedure
Fama and MacBeth (1973) suggested an alternative procedure for running cross-sectional regressions and for producing standard errors and test statistics. This procedure is widely used in practice and consists of two steps.
1. Find beta estimates with a time-series regression.
2. Instead of estimating a single cross-sectional regression with the sample averages, treating the $\beta$'s as known, run a cross-sectional regression at each time period, i.e.,
\[
Z_{it} = \hat{\beta}_i' \lambda_t + \alpha_{it}, \qquad i = 1, \ldots, N, \text{ for each } t = 1, \ldots, T.
\]
Then, Fama and MacBeth (1973) suggested that one estimate $\lambda$ and $\alpha$ as the averages of the cross-sectional regression estimates:
\[
\hat{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_t, \qquad \hat{\alpha} = \frac{1}{T} \sum_{t=1}^{T} \hat{\alpha}_t.
\]
One can use the standard deviations of the cross-sectional regression estimates to generate sampling errors for these estimates:
\[
\widehat{\mathrm{Cov}}(\hat{\lambda}) = \frac{1}{T^2} \sum_{t=1}^{T} (\hat{\lambda}_t - \hat{\lambda})(\hat{\lambda}_t - \hat{\lambda})', \qquad
\widehat{\mathrm{Cov}}(\hat{\alpha}) = \frac{1}{T^2} \sum_{t=1}^{T} (\hat{\alpha}_t - \hat{\alpha})(\hat{\alpha}_t - \hat{\alpha})'.
\]
The factor $1/T^2$ appears because we are finding standard errors of sample means, $\sigma^2/T$. To test whether all the pricing errors are jointly zero, one can use the $\chi^2$ test (or $t$-test) that we have used before:
\[
\hat{\alpha}' \,\widehat{\mathrm{Cov}}(\hat{\alpha})^{-1}\, \hat{\alpha} \sim \chi^2_{N-K},
\]
where $K = 1$. Fama and MacBeth (1973) used the variation in the statistic $\hat{\lambda}_t$ over time to deduce its variation across samples. For more details, see Chapter 12 of Cochrane (2001, pp. 244-246) and CLM (1997, pp. 215-216).
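The two steps above can be sketched in a few lines (an illustrative Python/NumPy version; the simulated data and names are my own assumptions): a first pass gives the betas, then a cross-sectional regression is run at each date and the period estimates are averaged, with standard errors from their time-series variation.

```python
import numpy as np

def fama_macbeth(Z, B):
    """Second pass: cross-sectional regression of Z_t on a constant and the
    betas B at each date t, then time-averages of the period estimates.
    Z: T x N excess returns; B: N x K first-pass betas."""
    T = Z.shape[0]
    X = np.column_stack([np.ones(B.shape[0]), B])  # constant + betas
    G = np.linalg.solve(X.T @ X, X.T @ Z.T)        # (K+1) x T period estimates
    est = G.mean(axis=1)                           # [gamma0-hat, lambda-hat]
    se = G.std(axis=1, ddof=1) / np.sqrt(T)        # Fama-MacBeth standard errors
    return est, se

# simulated single-factor data in which the CAPM-style model holds
rng = np.random.default_rng(3)
T, N = 900, 10
f = 0.007 + 0.05 * rng.standard_normal(T)
beta = rng.uniform(0.5, 1.5, N)
Z = np.outer(f, beta) + 0.02 * rng.standard_normal((T, N))
X1 = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(X1, Z, rcond=None)      # first-pass time-series betas
est, se = fama_macbeth(Z, coef[1:].T)
# est[0] (the constant) should be near zero, est[1] near the factor's mean
```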
8.3 Empirical Results on CAPM
8.3.1 Testing CAPM Based On Cross-Sectional Regressions
The early evidence on testing the CAPM was largely positive, reporting evidence consistent with the mean-variance efficiency of the market portfolio, which implies that (a) expected returns on securities are a positive linear function of their market $\beta$'s and (b) market $\beta$'s suffice to describe the cross-section of expected returns. However, less favorable evidence for the CAPM started to appear in the so-called anomalies literature. The anomalies literature shows that, contrary to the prediction of the CAPM, firm characteristics provide explanatory power for the cross-section of average returns beyond the betas of the CAPM. This literature documents several deviations from the CAPM that are related to the following variables:
1. Size: market equity (ME) adds to the explanation of the cross-section of average returns.
2. Earnings yield effect.
3. Leverage.
4. The ratio of a firm's book value of equity to its market value (BE/ME or B/M).
5. The ratio of earnings to price (E/P).
We will consider how the cross-sectional regressions are used in practice to test the CAPM by looking at the paper of Fama and French (1992), denoted by FF, and the paper of Kothari, Shanken and Sloan (1995), denoted by KSS (1995).
The FF findings can be summarized as follows:
1. There is only a weak positive relation between average return and beta over the period 1941-1990. There is virtually no relation over 1963-1990.
2. Firm size and the B/M ratio do a good job of capturing cross-sectional variation in average returns over 1963-1990. Moreover, the combination of size and the B/M ratio seems to absorb the roles of leverage and the E/P ratio in average stock returns.
The goals of KSS (1995) are as follows:
1. Re-estimate betas to see whether betas can explain cross-sectional variation over 1941-1990 and 1926-1990 using a different data set.
2. Examine whether B/M captures cross-sectional variation in average returns over 1947-1987.
The analysis of KSS (1995) is done using cross-sectional regressions of average monthly returns on annual betas. The KSS (1995) findings may be summarized as follows:
1. There is substantial ex post compensation for beta risk over 1941-1990 and even more so over 1927-1990. Estimated risk premia for different portfolio aggregations range from 6.2% to 11.7%.
2. Using an alternative data source, S&P industry-level data, KSS (1995) found that the B/M ratio has a weaker effect on returns than in FF.
3. Size, as well as beta, is needed to account for the cross-section of expected returns.
8.3.2 Return-Measurement Interval and Beta
KSS (1995) used annual data to estimate the market betas, unlike FF, who used monthly return data for the beta estimation. KSS (1995) argued that there are at least three reasons to prefer longer measurement-interval returns:
1. The CAPM does not provide guidance on the choice of horizon.
2. Beta estimates are biased due to trading frictions, non-synchronous trading, or other phenomena. These biases are mitigated by using longer-interval return observations.
3. There appears to be a significant seasonal component to monthly returns. Annual return data is one way to avoid the statistical complications that arise from seasonality in returns.
8.3.3 Results of FF and KSS
KSS (1995) presented the results of cross-sectional regressions for a variety of portfolio aggregation procedures:
- Grouping on beta alone.
- Grouping on size alone.
- Taking intersections of independent beta and size groupings.
- Ranking first on beta and then on size within each beta group.
- Ranking first on size and then on beta.
Note that to form portfolios, KSS (1995) estimated beta using monthly return data over 2 or 5 years. The annual time series of post-ranked beta-size portfolios are then used to re-estimate the full-period post-ranking betas for use in the cross-sectional regressions. The cross-sectional model is:
\[
R_{pt} = \gamma_{0t} + \gamma_{1t} \beta_p + \gamma_{2t} \mathrm{Size}_{p,t-1} + e_{pt}, \qquad (8.11)
\]
where $R_{pt}$ is the equally weighted (or value-weighted) buy-and-hold return on portfolio $p$ for month $t$; $\beta_p$ is the full-period post-ranking beta of portfolio $p$; $\mathrm{Size}_{p,t-1}$ is the natural log of the average market capitalization on June 30 of year $t$ of the stocks in portfolio $p$; $\gamma_{0t}$, $\gamma_{1t}$ and $\gamma_{2t}$ are regression parameters; and $e_{pt}$ is the regression error. FF also included other
variables in the cross-sectional regression (8.11). In particular, FF included leverage, E/P, and B/M.
The estimation of models like (8.11) is known as a horse race because it allows one to test whether one set of factors drives out another. For example, we want to know, given market betas $\beta_p$, whether we need the Size factor to price assets, i.e., whether $\gamma_{2t} = 0$. Obviously, one can use the asymptotic covariance matrix for $\gamma_{0t}$, $\gamma_{1t}$, $\gamma_{2t}$ (by using the improved method of Ledoit and Wolf (2003)) to form the standard $t$-test. Note also that $\gamma_{jt}$ in (8.11) asks whether factor $j$ helps to price assets given the other factors: $\gamma_{jt}$ gives the multiple regression coefficient of $R_{pt}$ on factor $j$ given the other factors. The risk premium $\lambda_j$ asks whether factor $j$ is priced.
Results: See Tables I, II, III, IV, and V of FF and Tables I, II, and III of KSS (1995).
The conclusion of KSS (1995) is that beta continues to dominate for size-ranked portfolios. KSS (1995) then analyzed selection biases and how they may affect the results for the B/M factor. The intuition is that many firms with high B/M values in 1973 went bankrupt before 1978 and therefore were not included in the COMPUSTAT database. Only the firms with high B/M that did unexpectedly well were included in the database. As a result, this may have created a selection bias and affected the estimated effect of the B/M factor.
8.4 Problems
1. Download the monthly data for 34 stock prices in the file 34stocks.csv. Estimate the single index model for all stocks in the file 34stocks.csv. You can download the market returns (say, the S&P 500 index return) by yourself, but the sample period must be the same as that for the 34 stocks in the file.
(a) Use time-series regressions to test the validity of the CAPM model for all stocks simultaneously and individually.
(b) For each stock, present the estimates of market beta and the proportion of risk attributed to the systematic risk. What can you say about the relationship between the stock systematic risk and the stock beta?
(c) For each stock, present the estimates of market beta and average sample returns. What can you say about the relationship between average stock returns and their market betas?
(d) Sort your stocks according to the estimates of market beta. Split your stocks into three portfolios containing approximately equal numbers of stocks. In the first portfolio collect stocks with low beta, in the second portfolio collect stocks with medium beta, and in the third portfolio collect stocks with the highest beta. This way, you will create a portfolio of low-beta stocks, a portfolio of medium-beta stocks, and a portfolio of high-beta stocks.
(e) Compute the equal-weighted portfolio returns for the constructed portfolios.
(f) Estimate the portfolio market betas. What can you say about the relationship between average portfolio returns and portfolio betas?
(g) Run Fama-MacBeth cross-sectional regressions for the constructed portfolios and test for the validity of the CAPM model.
8.5 References
Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997). The Econometrics of Financial Markets. Princeton University Press, Princeton, NJ. (Chapter 5)
Chan, L.K.C., J. Karceski and J. Lakonishok (1998). The risk and return from factors. Journal of Financial and Quantitative Analysis, 33, 159-188.
Chan, L.K.C., J. Karceski and J. Lakonishok (1999). On portfolio optimization: Forecasting covariances and choosing the risk model. Review of Financial Studies, 12, 937-974.
Cochrane, J.H. (2001). Asset Pricing. Princeton University Press, Princeton, NJ. (Chapters 9 and 12)
Davis, J.L., E.F. Fama and K.R. French (2000). Characteristics, covariances, and average returns: 1929 to 1997. Journal of Finance, 55, 389-406.
Fama, E.F. and K.R. French (1992). The cross-section of expected stock returns. Journal of Finance, 47, 427-465.
Fama, E.F. and K.R. French (1998). Value versus growth: The international evidence. Journal of Finance, 53, 1975-1999.
Fama, E.F. and J. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81, 607-636.
Gourieroux, C. and J. Jasiak (2001). Financial Econometrics: Problems, Models, and Methods. Princeton University Press, Princeton, NJ. (Chapters 3-4)
Kothari, S.P., J. Shanken and R.G. Sloan (1995). Another look at the cross-section of expected stock returns. Journal of Finance, 50, 185-224.
Liew, J. and M. Vassalou (2000). Can book-to-market, size and momentum be risk factors that predict economic growth? Journal of Financial Economics, 57, 221-245.
Lintner, J. (1965a). Security prices, risk and maximal gains from diversification. Journal of Finance, 20, 587-615.
Lintner, J. (1965b). The valuation of risky assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics, 47, 163-196.
Markowitz, H. (1959). Portfolio Selection: Efficient Diversification of Investments. John Wiley, New York.
Shanken, J. (1992). On the estimation of beta-pricing models. Review of Financial Studies, 5, 1-34.
Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19, 425-442.
Chapter 9
Multifactor Pricing Models
9.1 Introduction
We have discussed the papers by Fama and French (1992, denoted by FF hereafter) and Kothari, Shanken and Sloan (1995), who showed that the CAPM (a single-factor model) does not completely explain the cross-section of expected returns and that some additional factors may be needed to explain the dynamics of expected returns. Two main theoretical approaches exist to allow for multiple risk factors: the arbitrage pricing theory (APT) and the intertemporal capital asset pricing model (ICAPM). The APT is based on arbitrage arguments and the ICAPM is based on equilibrium arguments.
9.1.1 Why Do We Expect Multiple Factors?
The CAPM simplifies matters by assuming that the average investor cares only about the performance of his/her portfolio. This is not true in practice, since the average investor has a job. Investors are hurt during recessions because some investors lose jobs while others may have lower income (lower salaries). As a result, most investors may prefer stocks that do well in recessions, i.e., counter-cyclical stocks. Therefore, pro-cyclical stocks that do well during expansions and worse during recessions will have to offer higher average returns than counter-cyclical stocks that do well in recessions. This leads Cochrane (1999) to conclude that we may expect another dimension of risk, arising from covariation with recessions, bad times, that will matter for explaining average returns.
Empirically useful multifactor asset pricing models include more direct measures of good times or bad times:
1. The market return.
2. Events such as recessions, or macroeconomic factors that drive investors' non-investment sources of income.
3. Variables such as the D/P ratio or the yield curve that forecast stock or bond returns, so-called state variables for changing investment opportunity sets.
4. Returns on other well-diversified portfolios. These portfolios are called factor-mimicking portfolios. They can be constructed as the fitted value of a regression of any pricing factor on the set of all asset returns. Such a portfolio carries exactly the same pricing information as the original factor.
Note that it is important from a theoretical point of view that the extra factors affect the average investor.
9.1.2 The Model
The arbitrage pricing theory provides an approximate relation for expected asset returns with an unknown number of unidentified factors. The APT assumes that markets are competitive and frictionless and that the return-generating process for the asset returns being considered is:
\[
R_{it} - R_{ft} = c_i + \beta_{im}(R_{mt} - R_{ft}) + \beta_{iA} F_{At} + \beta_{iB} F_{Bt} + \cdots + e_{it}, \quad 1 \le i \le N, \; 1 \le t \le T. \qquad (9.1)
\]
Therefore, multifactor models use a time-series multiple regression to quantify an asset's tendency to move with multiple risk factors $F_A$, $F_B$,¹ etc. Equation (9.1) can be written as follows:
\[
R_{it} = c_i + b_i' F_t + e_{it}, \qquad E(e_{it} \mid F_t) = 0, \quad E(e_{it}^2) = \sigma_i^2 < \infty, \quad 1 \le i \le N, \; 1 \le t \le T,
\]
where $R_{it}$ is the return for asset $i$ in period $t$, $c_i$ is the intercept of the factor model, $b_i$ is a $K \times 1$ vector of factor loadings for asset $i$, $F_t$ is a $K \times 1$ vector of common factor realizations, and $e_{it}$ is the disturbance term. For the system of $N$ assets the model is written as:
\[
R_t = c + B F_t + e_t, \qquad E(e_t \mid F_t) = 0, \quad E(e_t e_t') = \Sigma, \quad 1 \le t \le T,
\]
¹Note that the APT does not specify that one of the factors should be the excess market return, but it is usually assumed that one of the factors is the excess market return.
where $R_t$ is an $N \times 1$ vector with $R_t = (R_{1t}, R_{2t}, \ldots, R_{Nt})'$, $c$ is an $N \times 1$ vector with $c = (c_1, c_2, \ldots, c_N)'$, and $B$ is an $N \times K$ matrix with $B = (b_1, b_2, \ldots, b_N)'$. It is also assumed that the disturbance term for large well-diversified portfolios vanishes.
Given this structure, Ross (1976) showed that the absence of arbitrage in large economies implies that:
\[
\mu \approx \iota \lambda_0 + B \lambda_K,
\]
where $\mu$ is the $N \times 1$ expected return vector, $\lambda_0$ is the model zero-beta parameter, equal to the riskfree return if such an asset exists, and $\lambda_K$ is a $K \times 1$ vector of factor risk premia. Exact factor pricing can be derived from an intertemporal asset pricing framework. We will analyze models where we have exact factor pricing and will not differentiate the APT from the ICAPM. Therefore,
\[
\mu = \iota \lambda_0 + B \lambda_K.
\]
The multifactor models specify neither the number of factors nor the identity of the factors. Therefore, to estimate and test the model, we need to determine the factors, which may be observed or unobserved.
9.2 Selection of Factors
There are two approaches to specifying the factors: statistical and theoretical.
9.2.1 Theoretical Approaches
Theoretically based approaches fall into two main categories. One approach is to specify macroeconomic and financial market variables that are thought to capture the systematic risks of the economy. A second approach is to specify characteristics of firms which are likely to explain differential sensitivity to the systematic risks, and then form portfolios of stocks based on those characteristics.
9.2.2 Small and Value/Growth Stocks
"Small cap," "large cap," "value," and "growth" stocks are names often used in the finance industry. Small cap stocks have small market values (price times shares outstanding). Value stocks, or high book/market stocks, have market values that are small relative to the accountant's book value. Recall that FF (1993) group stocks into portfolios according to size and B/M variables and show that both categories of stocks, small cap and value, have relatively high average returns. Large cap and growth stocks are the opposite and seem to have unusually low average returns.
To explain the difference between stocks related to size and B/M, FF (1993) advocated a three-factor model with the market return, the return on a small-minus-big stocks (SMB) portfolio, and the return on a high-B/M-minus-low-B/M stocks (HML) portfolio. These three factors seem to explain the cross-sectional variation in average returns for 25 size and B/M portfolios. FF (1995) argued that the size and value factors are related to the profitability or financial distress of a firm. Cochrane (1999) noted that one cannot count the distress of an individual firm as a risk factor, because such distress is idiosyncratic and can be diversified away. However, the typical investor is an owner of a small business, and an investor's income may be sensitive to the kinds of financial distress among small and distressed value firms. Therefore, the typical investor would demand a big premium to hold value stocks instead of growth stocks at a low premium.
9.2.3 Macroeconomic Factors
Researchers look at labor income, industrial production, inflation, and investment growth as possible other factors that explain the cross-section of returns. These factors are easier to motivate from a theoretical point of view but are not as successful as the size and value factors of Fama and French (1993).
Momentum Factor
There is evidence of a momentum effect, which states that the stocks with the higher average returns (winners) during the most recent 12 months (excluding the most recent month) continue to win, i.e., to earn relatively higher average returns, than the stocks with low returns (losers). The three-factor model of FF (1993) cannot explain this phenomenon.
Note that even though the model of FF cannot explain the momentum phenomenon, it can explain the reversal phenomenon.
Multifactor Model of FF (1993)
FF (1993) identified five common risk factors in the returns on stocks and bonds:
1. Stock-market risk factors
(a) A market factor
(b) A factor related to size, the so-called size factor
(c) A factor related to B/M, the so-called value factor
2. Bond-market risk factors
(a) Term spread: a factor that should capture unexpected changes in interest rates
(b) Default spread: a factor that should capture the shifts in economic conditions that change the likelihood of default
The paper by FF (1993) extended the paper by FF (1992) in several ways:
1. The set of asset returns is expanded. FF (1993) analyzed stock returns as well as bond returns, while FF (1992) analyzed only stock returns.
2. The set of possible factors that may explain the stock returns is expanded. FF (1993) analyzed the effect of bond-market risk factors on stock returns.
3. A different econometric approach is used. FF (1993) used a time-series approach while FF (1992) used a cross-sectional approach. To make the use of the time-series approach possible, FF constructed factor-mimicking portfolios.
Construction of the Explanatory Variables for the Time-Series Regressions
Bond-market factors are constructed as follows:
- Term spread factor: TERM = the monthly long-term government bond return minus the one-month Treasury bill rate.
- Default factor: DEF = the monthly return on a market portfolio of long-term corporate bonds minus the long-term government bond return.
Construction of the market factor is easy: it is simply the excess return on the market portfolio. Construction of the factor-mimicking portfolios that are meant to capture the size effect and the B/M effect is more involved and consists of two steps:
1. Construct six size-B/M portfolios. To construct the six size-B/M portfolios, one ranks NYSE stocks on market capitalization. The median NYSE size is then used to split NYSE, Amex and NASDAQ stocks into two groups, small and big (S and B). Then rank NYSE stocks on the B/M ratio and compute the breakpoints for the bottom 30% (Low), middle 40% (Medium), and top 30% (High) of the ranked B/M values. Then split all NYSE, Amex, and NASDAQ stocks into three B/M portfolios. Construct six portfolios (S/L, S/M, S/H, B/L, B/M, B/H) from the intersections of the two market capitalization groups and the three B/M groups. For example, the S/L portfolio contains the stocks in the small market capitalization group that are also in the low B/M group.
2. Construct the size (SMB) and value (HML) factors.
(a) The size factor is the return on the SMB (small minus big) portfolio. It is designed to mimic the risk factor in returns related to size:
\[
\mathrm{SMB} = \frac{1}{3} \left[ (R_{S/L} - R_{B/L}) + (R_{S/M} - R_{B/M}) + (R_{S/H} - R_{B/H}) \right],
\]
where $R_{S/L}$ is the return of the S/L portfolio, and so on. SMB is the difference between the returns on small- and big-stock portfolios with about the same weighted-average book-to-market equity.
(b) The value factor is the return on the HML (high minus low) portfolio. It is designed to mimic the risk factor in returns related to book-to-market equity:
\[
\mathrm{HML} = \frac{1}{2} \left[ (R_{S/H} - R_{S/L}) + (R_{B/H} - R_{B/L}) \right].
\]
The two components of HML are returns on high- and low-B/M portfolios with about the same weighted-average size.
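Given the six portfolio returns for a month, SMB and HML are simple averages of return spreads. A tiny sketch with hypothetical numbers (the return values below are made up for illustration, not from the text):

```python
# hypothetical monthly returns for the six size-B/M portfolios
r = {"S/L": 0.012, "S/M": 0.014, "S/H": 0.018,
     "B/L": 0.009, "B/M": 0.010, "B/H": 0.013}

# SMB: average small-minus-big spread within each B/M group
smb = ((r["S/L"] - r["B/L"]) + (r["S/M"] - r["B/M"]) + (r["S/H"] - r["B/H"])) / 3

# HML: average high-minus-low spread within each size group
hml = ((r["S/H"] - r["S/L"]) + (r["B/H"] - r["B/L"])) / 2
```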
The Returns to be Explained
The returns to be explained (the dependent variables in the time-series regressions) are the excess returns on two government and five corporate bond portfolios and on 25 stock portfolios formed on size and B/M equity. The twenty-five size-B/M stock portfolios are formed in the same way as in FF (1992). Time-series regressions are run:
1. To analyze whether the bond-market factors capture the common variation in stock returns, FF (1993) ran the following regression:
\[
R_t - R_{ft} = a + m\,\mathrm{TERM}_t + d\,\mathrm{DEF}_t + e_t.
\]
Based on the $t$-test, both $m$ and $d$ are significant.
2. Analysis of the stock-market factors is done by running three different types of regressions:
\[
R_t - R_{ft} = a + b\,[R_{mt} - R_{ft}] + e_t, \qquad (9.2)
\]
\[
R_t - R_{ft} = a + s\,\mathrm{SMB}_t + h\,\mathrm{HML}_t + e_t, \qquad (9.3)
\]
\[
R_t - R_{ft} = a + b\,[R_{mt} - R_{ft}] + s\,\mathrm{SMB}_t + h\,\mathrm{HML}_t + e_t. \qquad (9.4)
\]
Regression (9.2) analyzes how much of the variation in stock returns may be captured by the market factor alone; regression (9.3) analyzes how much of the variation in stock returns may be captured by the size and value factors alone; and the last regression analyzes how much of the variation is captured by the three stock-market factors.
3. FF (1993) also ran a five-factor model:
\[
R_t - R_{ft} = a + b\,[R_{mt} - R_{ft}] + s\,\mathrm{SMB}_t + h\,\mathrm{HML}_t + m\,\mathrm{TERM}_t + d\,\mathrm{DEF}_t + e_t.
\]
For the detailed results, see Tables 1-8 in FF (1993). We list a summary of the results as follows:
- The regression slopes and $R^2$ values establish that the stock-market returns, SMB, HML and $R_m - R_f$, and the bond-market returns, TERM and DEF, proxy for risk factors. These three stock-market factors and two bond-market factors capture common variation in stock and bond returns.
- Stock returns have shared variation related to the three stock-market factors, and they are linked to bond returns through shared variation in the two term-structure factors.
The next step that FF (1993) took was to run cross-sectional regressions of different factor models and test whether the intercept in the cross-sectional regression is different from zero. FF (1993) also analyzed whether their factor model can explain the cross-section of returns formed on E/P and D/P ratios and concluded that their model can explain the E/P and D/P anomalies.
9.2.4 Statistical Approaches
We will now consider a model in which the factors are simple linear functions of some observable variables. Assume that there are actually many variables that affect the stock returns $R_t$. This may be represented by a system of seemingly unrelated equations:
\[
R_t = B X_t + e_t, \qquad (9.5)
\]
where $B$ is an $N \times L$ matrix, $X_t$ is an $L \times 1$ vector of observable explanatory variables, and $e_t$ is an $N$-dimensional error term with $E(e_t \mid X_t) = 0$ and $\mathrm{Var}(e_t \mid X_t) = \Sigma$. Note that in this model the matrix $X_t$ is different from the matrix of factors $F_t$. Our goal is to create a matrix of factors $F_t$ by decreasing the number of variables in $X_t$, so that the common explanatory effect of the variables in $X$ can be summarized by a smaller number of variables in $F_t$.
If the rank of the matrix $B$ is $\mathrm{rank}(B) = K < N$, the model (9.5) can be written as:
\[
R_t = \Theta A X_t + e_t = \Theta F_t + e_t, \qquad (9.6)
\]
where $\Theta$ is an $N \times K$ matrix, $A$ is a $K \times L$ matrix,² and $F_t$ is a $K \times 1$ vector of factors with
\[
F_t = A X_t, \qquad F_{k,t} = \sum_{l=1}^{L} a_{kl} X_{l,t}, \quad k = 1, \ldots, K, \qquad (9.7)
\]
or, in matrix form, $F = X A'$, where $F$ is a $T \times K$ matrix of factors, $X$ is a $T \times L$ matrix of observations, and $A$ is a $K \times L$ matrix. The coefficient $\theta_{ik}$ is the sensitivity of the stock return $R_i$ with respect to the factor $F_k$. As mentioned before, there exist various possible choices of the set of observable explanatory variables for the model (9.5):
1. The explanatory variables may consist of macroeconomic variables.
2. The explanatory variables may include lagged values of endogenous variables, leading to a VAR specification.
3. The explanatory variables may consist of the values of some specific portfolios.
Once we have the matrix of variables $X$, how do we estimate $A$ so that we can form $F = X A'$?
²Note that this $A$ has nothing to do with the $A$ in the regressions of FF (1993).
Principal Components Analysis
Principal components analysis (PCA) is a technique to reduce the number of variables being studied without losing too much information in the covariance matrix. The principal components serve as the factors. The first sample principal component is a_1' R, where the N × 1 vector a_1 is the solution to the following problem:

max_{a_1} a_1' Ω̂ a_1

subject to a_1' a_1 = 1, where Ω̂ is the sample covariance matrix of the stock returns R (or factors). The solution a_1 is the eigenvector associated with the largest eigenvalue of Ω̂. We can define the first factor F_1 as follows: F_1 = w_1' R, where w_1 = a_1/(ι' a_1) and ι is an N × 1 vector of ones, so that the weights in w_1 sum to one. The second sample principal component solves the following problem:

max_{a_2} a_2' Ω̂ a_2

subject to a_2' a_2 = 1 and a_1' a_2 = 0. The solution is the eigenvector associated with the second largest eigenvalue of Ω̂. The second factor portfolio will be F_2 = w_2' R, where w_2 = a_2/(ι' a_2), and F_1 and F_2 are uncorrelated. In general, the jth factor will be F_j = w_j' R, where w_j is the re-scaled eigenvector associated with the jth largest eigenvalue of Ω̂, and {F_j} are uncorrelated. Also, λ_j = Var(a_j' R) is the jth largest eigenvalue of Ω̂; in other words, λ_1 ≥ λ_2 ≥ · · · ≥ λ_N ≥ 0.

The underlying theory of factor models does not specify the number of factors, K, that are required in the estimation. One approach to determining K is to estimate the model for different values of K and observe whether tests and results are sensitive to increasing the number of factors. Alternatively, one can choose K such that

Σ_{j=1}^{K} λ_j / Σ_{j=1}^{N} λ_j = a certain percentage, say 85%, 90%, or 95%.
For more details about principal components analysis, see Chapter 9 of Tsay (2005). The R function for PCA is princomp(), and the R function for computing eigenvalues and their associated eigenvectors is eigen().
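As a small illustration of these R functions (on simulated data; all object names here are illustrative, not from the text), one can extract the eigenvalues of the sample covariance matrix, pick K by the cumulative-percentage rule above, and form the first factor portfolio:

```r
# Simulated return matrix: T = 200 periods, N = 6 assets (illustrative only)
set.seed(1)
R <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)

# Eigenvalues/eigenvectors of the sample covariance matrix
Omega.hat <- cov(R)
ev <- eigen(Omega.hat)          # eigenvalues returned in decreasing order
lambda <- ev$values

# Choose K as the smallest number of components explaining, say, 90%
prop <- cumsum(lambda) / sum(lambda)
K <- which(prop >= 0.90)[1]

# First factor: re-scale the first eigenvector so its weights sum to one
a1 <- ev$vectors[, 1]
w1 <- a1 / sum(a1)
F1 <- R %*% w1

# The same decomposition via princomp(); pc$loadings holds the eigenvectors
pc <- princomp(R)
```

With real data one would replace the simulated R by the return matrix read from the data file.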
Factor Analysis
Estimation using factor analysis involves two steps:
1. The factor sensitivity matrix B and the disturbance covariance matrix D are estimated.
2. The estimates of B and D are used to construct the factors.
Step 1:
For standard factor analysis it is assumed that the disturbance covariance matrix D is diagonal. Given this assumption, the covariance matrix of asset returns in the model (9.6) is as follows:

Ω = B Ω_K B' + D, (9.8)

where E(F_t F_t') = Ω_K and the disturbance covariance matrix is written as D to indicate that it is diagonal. For identification purposes, it is assumed that the factors are orthogonal and have unit variance, which implies that Ω_K = I. With these restrictions, (9.8) can be written as:

Ω = BB' + D. (9.9)

Given the assumption in (9.9), estimators B̂ and D̂ can be formulated using MLE.
Step 2:
Without loss of generality, we can restrict the factors to have zero means and express the factor model in terms of deviations about the means:

R_t − μ = B F_t + e_t.

Given the MLE estimates B̂ and D̂, the generalized least squares (GLS) estimator of f_t is found as follows:

f̂_t = (B̂' D̂^{-1} B̂)^{-1} B̂' D̂^{-1} (R_t − μ̂).

Here we are estimating f_t by regressing R_t − μ̂ onto B̂. The series f̂_t, t = 1, . . . , T, can be used to test the model. Since the factors are linear combinations of returns, we can construct portfolios which are perfectly correlated with the factors. Denoting R̂_{Kt} as the K × 1 vector of factor portfolio returns for time period t, we have

R̂_{Kt} = A W R_t,

where W = (B̂' D̂^{-1} B̂)^{-1} B̂' D̂^{-1}, A is defined as a diagonal matrix with 1/(W_j ι) as the jth diagonal element, W_j is the jth row of W, and ι is an N × 1 vector of ones. The factor portfolio weights obtained for the jth factor from this procedure are equivalent to the weights that would result from solving the following optimization problem and then normalizing the weights so that they sum to one:

min_{w_j} { w_j' D w_j : w_j' b_k = 0 for k ≠ j, and w_j' b_j = 1 }.
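To see the closed-form weights at work, the following R sketch (with a made-up loading matrix B and diagonal D; all names are illustrative) computes W = (B'D^{-1}B)^{-1}B'D^{-1}, checks the unit/zero loading constraints, and re-scales the rows so each factor portfolio's weights sum to one:

```r
# Hypothetical loadings B (N = 5 assets, K = 2 factors) and diagonal D
set.seed(7)
B <- matrix(runif(5 * 2), nrow = 5)
D <- diag(runif(5, 0.2, 1))

Dinv <- solve(D)
W <- solve(t(B) %*% Dinv %*% B) %*% t(B) %*% Dinv  # K x N weight matrix

# Each row of W has a unit loading on its own factor, zero on the other:
# W %*% B equals the K x K identity matrix
round(W %*% B, 10)

# A re-scales each row of W so the portfolio weights sum to one
A <- diag(1 / rowSums(W))
weights <- A %*% W
rowSums(weights)  # each row sums to 1
```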
Therefore, the factor portfolio weights minimize the residual variance subject to the constraints that each factor portfolio has a unit loading on its own factor and zero loadings on the other factors. For more details about factor analysis, see Chapter 9 of Tsay (2005). The R function for factor analysis is factanal(); see Section 9.5.3 in Tsay (2005) for applications and the corresponding R code.
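The two-step procedure can be sketched in R with factanal(), again on simulated data (all names illustrative). Note that factanal() works with standardized variables, so B̂ and D̂ are on the correlation scale, and its Bartlett scores are exactly the GLS estimator f̂_t above:

```r
# Simulated returns: T = 300 periods, N = 8 assets driven by K = 2 factors
set.seed(42)
f <- matrix(rnorm(300 * 2), ncol = 2)
B <- matrix(runif(8 * 2, 0.5, 1.5), ncol = 2)
R <- f %*% t(B) + matrix(rnorm(300 * 8, sd = 0.5), ncol = 8)

# Step 1: MLE of the loadings and the diagonal disturbance covariance
fa <- factanal(R, factors = 2, scores = "Bartlett")
Bhat <- fa$loadings            # estimated factor sensitivities
Dhat <- diag(fa$uniquenesses)  # estimated diagonal disturbance covariance

# Step 2: GLS factor scores f_t = (B'D^{-1}B)^{-1} B'D^{-1} (R_t - mean),
# computed by hand on the standardized data; they match fa$scores
X <- scale(R)
W <- solve(t(Bhat) %*% solve(Dhat) %*% Bhat) %*% t(Bhat) %*% solve(Dhat)
fhat <- X %*% t(W)
```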
9.3 Problems
1. Consider the monthly log stock returns, in percentages and including dividends, of Merck & Company, Johnson & Johnson, General Electric, General Motors, Ford Motor Company, and the value-weighted index from January 1960 to December 1999; see the file ch9-1.txt, which has six columns in the order listed before.
(a) Perform a principal component analysis of the data using the sample covariance
matrix.
(b) Perform a principal component analysis of the data using the sample correlation
matrix.
(c) Perform a statistical factor analysis on the data. Identify the number of com-
mon factors. Obtain estimates of factor loadings using the principal component
method.
2. The file ch9-2.txt contains the monthly simple excess returns of ten stocks and the
S&P500 index. The three-month Treasury bill rate on the secondary market is used
to compute the excess returns. The sample period is from January 1990 to December
2003 for 168 observations. The 11 columns in the le contain the returns for ABT,
LLY, MRK, PFE, F, GM, BP, CVX, RD, XOM, and SP5, respectively.
(a) Analyze the ten stocks' excess returns using the single-index market model. Plot the beta estimate and R-square for each stock, and use the global minimum variance portfolio to compare the covariance matrices of the fitted model and the data.
(b) Perform a statistical principal component analysis on the data. How many com-
mon factors are there?
(c) Perform a statistical factor analysis on the data. How many common factors are there if the 5% significance level is used? Plot the estimated factor loadings of the fitted model. Are the common factors meaningful?
9.4 References
Campbell, J.Y., A.W. Lo and A.C. MacKinlay (1997). The Econometrics of Financial
Markets. Princeton University Press, Princeton, NJ. (Chapter 6).
Cochrane, J.H. (1999). New facts in finance. NBER Working Paper #7169. Economic
Perspectives Federal Reserve Bank of Chicago, 23(3), 36-58.
Fama, E.F. (1993). Multifactor portfolio efficiency and multifactor asset pricing models.
Working Paper, CRSP, University of Chicago.
Fama, E.F. and K.R. French (1992). The cross-section of expected stock returns. The
Journal of Finance, 47, 427-465.
Fama, E.F. and K.R. French (1993). Common risk factors in the returns on stocks and
bonds. Journal of Financial Economics, 33, 3-56.
Fama, E.F. and K.R. French (1995). Size and book-to-market factors in earnings and
returns. The Journal of Finance, 50, 131-155.
Gourieroux, C. and J. Jasiak (2001). Financial Econometrics: Problems, Models, and
Methods. Princeton University Press, Princeton, NJ. (Chapter 9)
Kothari, S.P., J. Shanken and R.G. Sloan (1995). Another look at the cross-section of
expected stock returns. The Journal of Finance, 50, 185-224.
Liew, J. and M. Vassalou (2000). Can book-to-market, size and momentum be risk factors
that predict economic growth? Journal of Financial Economics, 57, 221-245.
Ross, S. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory,
13, 341-360.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons,
New York. (Chapter 9)