
Sam Robert Parry

CT8: Financial Economics

Revision Notes based on Teaching Units in Online Classroom

Contents
1 The Efficient Markets Hypothesis 2

2 Utility Theory 5

3 Stochastic Dominance and Behavioural Finance 8

4 Measures of Investment Risk 13

5 Portfolio Theory 20

6 Models of Asset Returns 28

7 Asset Pricing Models 33

8 Brownian Motion and Martingales 37

9 Stochastic Calculus and Ito Processes 42

10 Stochastic Models of Security Prices 47

11 Introduction to the Valuation of Derivative Securities 50

12 The Greeks 62

13 The Binomial Model 65

14 The Black-Scholes Option Pricing Formula 73

15 The 5-step Method in Discrete and Continuous Time 81

16 The Term Structure of Interest Rates 94

17 Credit Risk 103

Tutorial Day 1
1 The Efficient Markets Hypothesis
The Efficient Markets Hypothesis (EMH) states that security prices quickly and accurately reflect all relevant information.

It comes in THREE forms:

• Weak Form: Prices incorporate all information contained in the price history.
• Semi-strong Form: Prices incorporate all publicly available information, which includes the information contained in the price history.
• Strong Form: Prices incorporate all information, including insider information.

So any investor who believes in the strong form EMH automatically believes in the semi-strong and weak forms too, since each form includes the information set of the weaker ones.

So how does EMH impact investor behaviour?

Fundamental Analysis, which includes the analysis of balance sheets, company strategy and the operating environment, is the analysis of publicly available information. So if the market is semi-strong efficient, fundamental analysis will not help an investor to identify mis-priced securities that can be traded to generate excess risk-adjusted returns. Therefore it can only add value when the market is semi-strong inefficient.

Insider Trading is illegal in the UK stock market and involves trading on the basis of information that has not been published or is not known to the public. This information goes beyond what is built into prices in a market that is only semi-strong efficient. If the market is not strong form efficient, then insider trading would enable an investor to generate excess risk-adjusted returns. Making it illegal aims to remove this advantage from privileged individuals, and the need for such a law suggests that some markets may not be strong form efficient.

Technical Analysis is the study of chart patterns. A variety of trends, triangles and peak formations are used as bullish (stock going up) or bearish (stock going down) indicators. This means we can draw a graph of the share price in the past and try to use it to predict what it is going to do in the future.
The weak form EMH says that this would be a waste of time, as security prices already incorporate the information in historical prices. So if it is possible to generate excess risk-adjusted returns using technical analysis, this implies the market is weak form inefficient and, consequently, semi-strong and strong form inefficient too.

Testing the Efficient Markets Hypothesis

Shiller’s Excessive Volatility Test
An excessively volatile market is one in which changes in the value of stocks are greater than can be justified
by the news arriving. This is claimed to be evidence of over-reaction of the market which is not compatible
with efficiency.

Shiller conducted his investigation in 1981, using data that went back 100 years. To test whether the market was excessively volatile, he used a discounted cashflow model, based on the actual dividends that were paid and a terminal value for the share, to calculate a perfect foresight price for the equity.

For example, on a particular stock, in 1981, Shiller could have looked at what the share price would have
been back in 1961, given that he knew the dividends that were paid out over the last 20 years and some
terminal value for the stock. Theoretically any share price is the discounted value of all its future dividends.
So we can use a discounted cashflow model as follows:

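$$\text{Perfect Foresight Price} = X = \sum_{k=0}^{20} v^k D_k + v^{20} T_{20}$$
where $v$ is the discount factor, $D_k$ is the dividend paid at time $k$ and $T_{20}$ is the terminal value of the share.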
The difference between the perfect foresight price and actual price arises from forecast errors of future
dividends. So if we look at the difference between them we can call this the forecast error.

$$\left|\text{Actual Price} - X\right| = \text{Forecast Error}$$
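As a minimal sketch of the perfect foresight calculation, the Python below discounts a dividend stream and a terminal value at a constant rate, following the formula with $k$ running from 0 to 20, and then takes the absolute difference from an observed price. The function names, the 5% discount rate and all of the input numbers are hypothetical illustrations, not Shiller's data.

```python
def perfect_foresight_price(dividends, terminal_value, i):
    """X = sum_{k=0}^{n} v^k D_k + v^n T_n, with v = 1 / (1 + i).

    dividends[k] is the dividend D_k paid at time k (k = 0, ..., n) and
    terminal_value is T_n, the assumed share value at time n.
    """
    v = 1.0 / (1.0 + i)
    n = len(dividends) - 1
    pv_dividends = sum(v**k * d for k, d in enumerate(dividends))
    return pv_dividends + v**n * terminal_value


def forecast_error(actual_price, foresight_price):
    """|Actual Price - X|."""
    return abs(actual_price - foresight_price)


# Hypothetical inputs: a flat dividend of 4 for k = 0..20, T_20 = 100,
# a constant 5% discount rate and an observed price of 110.
x = perfect_foresight_price([4.0] * 21, terminal_value=100.0, i=0.05)
print(f"perfect foresight price X = {x:.2f}")
print(f"forecast error = {forecast_error(110.0, x):.2f}")
```

Note that the single constant discount rate in this sketch is exactly the kind of simplification later critics of Shiller's test objected to.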


We can hardly expect the actual price to equal the perfect foresight price all the time, but in an efficient market investors should not make these errors consistently. Shiller found that they were making systematic forecast errors. Also, if markets are efficient, then broad movements in the perfect foresight price should be correlated with movements in the actual price, as both are reacting to the same news and hence to the same changes in anticipated future cashflows.

Shiller found instead that the actual price changes were significantly greater than the changes in the perfect foresight price. This suggests that the markets for the particular securities he studied were excessively volatile, thereby violating the EMH. More specifically, Shiller is claiming here that the market is NOT semi-strong form efficient. However, later studies suggested flaws in his conclusion, specifically:
• incorrect choice of terminal stock value
• use of a constant discount rate
• bias in estimates of variances because of autocorrelation
• possible non-stationarity of time series

2 Utility Theory
Utility Theory is a mathematical theory used to predict what investment choices people will make. Below is a graph of my personal utility; it shows the change in happiness (y-axis) against the change in the amount of money I have (x-axis).

There are two main properties. First, the graph slopes upwards, i.e. the more money you have, the more satisfied you are. This property is called Non-satiation. Second, the gradient decreases as the amount of money increases. We call this property Risk-aversion. To a risk-averse investor, losing £10 is more significant than gaining £10, so these investors would reject a fair gamble.

To define this mathematically, we define money as the variable w for wealth and happiness as the utility, a
function of this wealth, U (w). Thus the red line drawn in the figure above is known as a utility function.
The ability to draw this for individual investors is the first part of the Expected Utility Theorem.
The second part is that an investor, when making decisions, will do so in order to maximise their expected
utility, not necessarily their wealth.

Non-satiation says that, basically, the more money you have the better. So mathematically the utility function is increasing, i.e. its gradient is positive: $U'(w) > 0$ for all $w$.

Risk-aversion says that the gradient decreases as wealth increases, so mathematically $U''(w) < 0$.

We can go further than saying someone is risk-averse. There are measures that show just how risk-averse an investor is, the so-called Arrow-Pratt measures.

Absolute Risk Aversion

$$A(w) = -\frac{U''(w)}{U'(w)}$$
As wealth goes up, the most likely situation is that we will be happier to put a higher absolute amount of money into risky assets, say shares. Therefore our absolute risk aversion will go down. In this case $A'(w) < 0$, as $A(w)$ is a decreasing function.
Relative Risk Aversion relates to the proportion of wealth we are happy to invest in risky assets. Mathematically it is defined as
$$R(w) = w\,A(w) = -w\,\frac{U''(w)}{U'(w)}$$
As wealth increases, relative risk aversion would stay the same if our investment policy involved always putting, say, 30% of our wealth into risky assets. If this is the case we call the utility function iso-elastic. Alternatively, as we get wealthier there is less need to take risk to get further wealth; we already have enough, so we invest in safer assets to protect the level of wealth that we have. In this case relative risk aversion increases as wealth increases. The final possibility is that, as we get wealthier and wealthier, we are less worried about losing money, so we are prepared to take greater risks in order to increase our wealth further. In this case relative risk aversion decreases as wealth increases.
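To make the Arrow-Pratt measures concrete, here is a small sketch that estimates $A(w)$ and $R(w)$ by central finite differences for the log and power utility functions covered later in this section. The choice $\gamma = 0.5$ for the power function and the relative step size are illustrative assumptions; for these two functions the exact answers are $A(w) = (1-\gamma)/w$ and $R(w) = 1-\gamma$, with log utility corresponding to $\gamma = 0$.

```python
import math


def derivatives(u, w, rel_step=1e-3):
    """Central finite-difference estimates of U'(w) and U''(w).

    The step h is proportional to w so the estimates stay accurate
    across very different wealth levels.
    """
    h = rel_step * w
    u1 = (u(w + h) - u(w - h)) / (2 * h)
    u2 = (u(w + h) - 2 * u(w) + u(w - h)) / (h * h)
    return u1, u2


def absolute_risk_aversion(u, w):
    """A(w) = -U''(w) / U'(w)."""
    u1, u2 = derivatives(u, w)
    return -u2 / u1


def relative_risk_aversion(u, w):
    """R(w) = w A(w) = -w U''(w) / U'(w)."""
    return w * absolute_risk_aversion(u, w)


def log_utility(w):
    return math.log(w)


def power_utility(w, gamma=0.5):
    # gamma = 0.5 is an arbitrary illustrative choice
    return (w**gamma - 1) / gamma


for w in (10.0, 100.0, 1000.0):
    print(f"w={w:>6}: log A={absolute_risk_aversion(log_utility, w):.5f} "
          f"R={relative_risk_aversion(log_utility, w):.4f}; "
          f"power R={relative_risk_aversion(power_utility, w):.4f}")
```

For both functions $A(w)$ falls as wealth rises while $R(w)$ stays roughly constant (about 1 for log utility and about 0.5 here), i.e. decreasing absolute and constant relative risk aversion, so both are iso-elastic.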

Common Utility Functions
Quadratic utility functions have the general form:

$$U(w) = a + bw + cw^2 \;\equiv\; w + dw^2$$

The constant $a$ just shifts the function and doesn't change the choices that we make at all. In the same way, the coefficient $b$ of $w$ acts just as a scaling parameter (for $b > 0$), so we can take $a = 0$ and $b = 1$, giving $d = c/b$.

If we think about the kinds of utility function we can get from this, there are two kinds of quadratic (opening upwards or downwards), seen below. We divide each into two segments at its vertex and then look at these to see which parts are desirable.

If we assume that investors prefer more to less, we want our function to be an increasing function of wealth.
Therefore we can rule out two of the segments as seen below.

The remaining segments both exhibit non-satiation, so investors prefer more to less over those two ranges. However, if we make the additional assumption that investors are risk-averse, we need the second derivative to be negative. This rules out the remaining LHS section, and we are left with the curve resembling my utility graph at the start of this section.

A key question is where the vertex of the graph on the RHS lies, as we will want to use this utility function only for ranges of wealth below that point. To find it, we look for the stationary point of the function:
$$U'(w) = 1 + 2dw = 0 \implies w = -\frac{1}{2d}$$
Hence we require $d$ to be negative, so that the stationary point is a maximum, and to be on the increasing part of the utility function we must have $w < -\frac{1}{2d}$.
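The restrictions above can be enforced in code. This is a minimal sketch, assuming a hypothetical coefficient $d = -0.001$ (so the vertex sits at $w = 500$); it refuses any $d \ge 0$ or any wealth at or beyond the vertex, where the quadratic would stop exhibiting non-satiation.

```python
def quadratic_utility(w, d):
    """U(w) = w + d*w**2, only used for d < 0 and w below the vertex -1/(2d)."""
    if d >= 0:
        raise ValueError("risk aversion (U'' = 2d < 0) requires d < 0")
    vertex = -1.0 / (2.0 * d)  # where U'(w) = 1 + 2dw = 0
    if w >= vertex:
        raise ValueError(f"w = {w} is not below the vertex {vertex}")
    return w + d * w * w


def marginal_utility(w, d):
    """U'(w) = 1 + 2dw, positive on the valid range w < -1/(2d)."""
    return 1.0 + 2.0 * d * w


d = -0.001  # hypothetical coefficient, giving a vertex at w = 500
for w in (100.0, 300.0, 499.0):
    print(w, quadratic_utility(w, d), marginal_utility(w, d))
```

Marginal utility shrinks towards zero as $w$ approaches the vertex, which is why the quadratic is only a sensible model well below that point.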

Logarithmic utility functions have the general form:

$$U(w) = \log(w)$$
We could use any base for the logarithm, but usually we use the natural log, ln. If we look at the graph of this (RHS), it passes through the point (1, 0). The first derivative $U'(w) = 1/w$ is positive, so investors prefer more to less (non-satiation), and the second derivative $U''(w) = -1/w^2$ is negative, so it also complies with the requirement for investors to be risk-averse.

The fact that utility is negative for wealth less than 1 doesn't really matter. The function has the right shape, and we are only going to use it to compare choices when making decisions, so the sign of the utility is irrelevant.

Power utility functions have the general form:

$$U(w) = \frac{w^\gamma - 1}{\gamma}$$
At first sight, this may seem a little odd. When looking at the quadratic utility function we said that we didn't need the $a$ and $b$ coefficients, as they just shift and scale the function, yet here we seem to be doing exactly that with the $-1$ and the division by $\gamma$. We will consider an example to see why this might be the case.

Example

Let us choose wealth as 5 units, w = 5 and take γ to equal some small arbitrary constant. We will do this
for both the power and logarithmic utility functions to see if we can spot anything interesting. In this case,
let us choose γ = 0.0001.
Under the power utility function we have
$$U(5) = \frac{5^{0.0001} - 1}{0.0001} = 1.60956743$$
Under the log utility function we have
$$U(5) = \ln(5) = 1.60943791$$
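Hence we get approximately the same answer, and the smaller $\gamma$ is, the closer the two utilities get to one another. So the log utility function is just a special (limiting) case of the power utility function as $\gamma \to 0$, and this is why the power function includes the $-1$ and the division by $\gamma$: they make that limit equal $\ln(w)$. The power function gives us an extra degree of freedom through the choice of $\gamma$.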
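The convergence can be checked numerically. This short sketch, assuming $w = 5$ as in the example, evaluates the power utility for a sequence of shrinking $\gamma$ values and prints the gap to $\ln(5)$.

```python
import math


def power_utility(w, gamma):
    """U(w) = (w**gamma - 1) / gamma, for gamma != 0."""
    return (w**gamma - 1.0) / gamma


w = 5.0
for gamma in (0.1, 0.01, 0.001, 0.0001):
    u = power_utility(w, gamma)
    print(f"gamma={gamma:<7} power={u:.8f} gap to ln(5)={u - math.log(w):.2e}")
```

The gap shrinks roughly in proportion to $\gamma$, and at $\gamma = 0.0001$ the power utility reproduces the 1.60956743 figure above.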

The Certainty Equivalent is how much we would be prepared to pay in order to avoid having to take the gamble.

Let us consider an initial wealth of $W_0 = 1000$ and a gamble in which we toss a coin: if we get heads we lose 100, but if we get tails we don't lose anything.

If we were risk-neutral, we would just look at the expected loss, which is 50, and that would be the certainty equivalent.

However, we usually assume that investors are risk-averse rather than risk-neutral, and therefore the certainty equivalent will differ from 50. A risk-averse investor doesn't like the gamble and would not take a fair gamble, so they would be prepared to pay more than 50 to avoid the possibility of losing 100. To work out how much more than 50 they would pay, we need to choose a utility function and then find the payment that gives the same utility as the expected utility of the gamble.

Example

Assume our investor makes decisions using the log utility function, $U(w) = \ln(w)$. Let's work out the expected utility of the gamble:
$$E[U(w)] = \tfrac{1}{2}\ln(900) + \tfrac{1}{2}\ln(1000)$$
Now we want to get the same utility by paying a fixed premium, so we set this equal to $\ln(1000 - CE)$. Rearranging for CE gives $1000 - CE = \sqrt{900 \times 1000} = 948.68$, so the certainty equivalent of the gamble for the risk-averse investor is CE = 51.32. So the investor would be prepared to pay this to avoid having to take the gamble of losing 100 with 50% probability.
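The same calculation works for any utility function with a known inverse. This is a minimal sketch using the notes' definition of CE as the premium paid to avoid the gamble; the function name and the list-of-pairs representation of the gamble are illustrative choices.

```python
import math


def certainty_equivalent(w0, outcomes, u, u_inv):
    """Premium CE solving U(w0 - CE) = E[U(w)].

    outcomes: list of (probability, final wealth) pairs.
    u, u_inv: the utility function and its inverse.
    """
    expected_utility = sum(p * u(w) for p, w in outcomes)
    return w0 - u_inv(expected_utility)


gamble = [(0.5, 900.0), (0.5, 1000.0)]
ce = certainty_equivalent(1000.0, gamble, math.log, math.exp)
print(round(ce, 2))  # 51.32
```

Passing the identity function for both `u` and `u_inv` gives the risk-neutral answer of 50, which is a quick sanity check.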

3 Stochastic Dominance and Behavioural Finance
Let's suppose that we have two different investment opportunities to choose between. If one of them will give greater returns than the other no matter what happens, then it is an easy choice. The one with the superior returns is said to dominate the other absolutely; we have Absolute Dominance. This, however, isn't very likely to be the case, so we'll see if there's a way to make sensible decisions based on certain assumptions about the properties of the investment opportunities.

If we know the Utility function for an investor, then we can act to maximise the expected utility. But in lots
of cases we don’t have a utility function, so we’ll see if we can make any decisions without those.

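Stochastic Dominance Theorems
Let $F_A$ and $F_B$ be the cumulative distribution functions of the returns on investments A and B, and let $G(x) = \int_{-\infty}^{x} F(y)\,dy$ be the integral of $F$.
• First-order: if $F_A(x) \le F_B(x)$ for all $x$, with strict inequality for some $x$, then A is preferred to B by any investor who prefers more to less (non-satiation).
• Second-order: if $G_A(x) \le G_B(x)$ for all $x$, with strict inequality for some $x$, then A is preferred to B by any investor who is non-satiated and risk-averse.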

Graphical Representations of Stochastic Dominance Theorems


For our first example we consider the following investment opportunities A and B, which have the same variance. The distributions of returns are:
$$A \sim N\left(10\%, (5\%)^2\right) \qquad B \sim N\left(5\%, (5\%)^2\right)$$

Let's start by looking at the density functions of these below. It looks like investment A is better than B, as it seems to give higher returns. However, there is a possibility that B could give a higher return than A, so we can't say that A absolutely dominates B. So we look to see if we can say anything about stochastic dominance.

The first-order stochastic dominance theorem involves analysis of the cumulative distribution function (cdf) of returns, not the density function sketched above. Therefore we sketch this for both investment opportunities below, alongside the pdfs.

Here we can see that the distribution function for A is always less than or equal to that for B, and over much of the range it's strictly less. This alone doesn't allow us to conclude anything, but if we can say that investors prefer more to less (non-satiation), then the condition for the first-order stochastic dominance theorem applies and we can say that investment A will be preferred to investment B.
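A quick way to see the first-order condition is to compare the two cdfs numerically. This sketch uses Python's `statistics.NormalDist` for the two return distributions (in percent) and checks $F_A \le F_B$ on an assumed grid from -20% to 35%; a grid check illustrates the condition rather than proving it.

```python
from statistics import NormalDist

# Return distributions from the first example, in percent.
A = NormalDist(mu=10, sigma=5)
B = NormalDist(mu=5, sigma=5)

# Compare the two cdfs on a grid of returns from -20% to 35% in 0.1% steps.
grid = [x / 10 for x in range(-200, 351)]
pairs = [(A.cdf(x), B.cdf(x)) for x in grid]

# First-order condition: F_A <= F_B everywhere, strictly less somewhere.
first_order = all(fa <= fb for fa, fb in pairs) and any(fa < fb for fa, fb in pairs)
print("A first-order dominates B on this grid:", first_order)
```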

Looking at a slightly different example, we now consider two investment opportunities A and B which have the same mean and family of distribution, but different variances:
$$A \sim N\left(5\%, (5\%)^2\right) \qquad B \sim N\left(5\%, (10\%)^2\right)$$

The figure below shows the distribution functions for the two choices. On the RHS we see that $F_B < F_A$, whereas on the LHS we see that $F_A < F_B$. So we can't use the first-order stochastic dominance theorem, as its condition doesn't hold. Instead we look at the G functions, where $G(x) = \int_{-\infty}^{x} F(y)\,dy$ is the integral of F.

This graph shows us that $G_A \le G_B$. For the second-order theorem to apply we have to be able to say that investors prefer more to less, i.e. they are non-satiated, and also that they are risk-averse. If we are prepared to say that, then in this particular case investment A is preferred to investment B.

That should seem reasonable as, looking at the two distributions, the means are the same and we can get
that mean under investment A with a lower variance, a lower risk, than investment B.
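For a normal distribution the G function has a closed form, $G(x) = \sigma\left(z\Phi(z) + \phi(z)\right)$ with $z = (x-\mu)/\sigma$, which can be checked by differentiating it back to $\Phi(z)$. The sketch below uses that formula to compare $G_A$ and $G_B$ on an assumed grid of returns from -25% to 35%; as before, a grid check illustrates the condition rather than proving it.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.cdf is Phi, STD.pdf is phi


def g_normal(x, mu, sigma):
    """G(x) = integral of F from -inf to x, for N(mu, sigma^2).

    Closed form: sigma * (z * Phi(z) + phi(z)) with z = (x - mu) / sigma.
    """
    z = (x - mu) / sigma
    return sigma * (z * STD.cdf(z) + STD.pdf(z))


grid = [x / 10 for x in range(-250, 351)]  # returns from -25% to 35%
ga = [g_normal(x, 5, 5) for x in grid]     # A ~ N(5%, (5%)^2)
gb = [g_normal(x, 5, 10) for x in grid]    # B ~ N(5%, (10%)^2)

print("G_A <= G_B everywhere on the grid:", all(a <= b for a, b in zip(ga, gb)))
```

The gap between the two curves is largest around the common mean of 5%, where $G = \sigma\,\phi(0)$ and so grows with the standard deviation.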

Tabular Representations of Stochastic Dominance Theorems
If we have discrete distributions of returns, such as the following, we adopt a tabular approach.
$$A = \begin{cases} -1\% & \text{with probability } 0.5 \\ 4\% & \text{with probability } 0.5 \end{cases} \qquad B = +1\% \text{ guaranteed}$$
We proceed by listing all the possible returns down the LHS column, with equal spacing between consecutive returns. We then construct columns for the probabilities $p$, the cumulative sum of probabilities $\sum p$ (the equivalent of F) and the cumulative sum of those, $\sum\sum p$ (the equivalent of the G function), for both investment opportunities A and B.

In terms of the stochastic dominance theorems, we look to see whether the numbers in one column are always less than or equal to those in another. Looking at the $\sum p$ columns specifically, we see that the first-order stochastic dominance theorem doesn't apply here, as neither column is always less than or equal to the other.

Similarly, with the $\sum\sum p$ columns we have the same situation, so we can't conclude anything from the equivalent of our function G. So the second-order stochastic dominance theorem doesn't apply in this case either, and we can't say that either asset stochastically dominates the other.
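The tabular method is easy to mechanise. This sketch builds the $\sum p$ and $\sum\sum p$ columns for A and B on equally spaced returns from -1% to 4% and then checks each pair of columns in both directions; the function names are illustrative.

```python
def sd_columns(dist, returns):
    """Columns p, sum p (like F) and sum sum p (like G).

    dist maps a return (in %) to its probability; returns is the list of
    equally spaced returns down the left-hand column.
    """
    p = [dist.get(r, 0.0) for r in returns]
    cum_p, cum_cum_p = [], []
    running, running2 = 0.0, 0.0
    for prob in p:
        running += prob      # sum p
        running2 += running  # sum sum p
        cum_p.append(running)
        cum_cum_p.append(running2)
    return p, cum_p, cum_cum_p


def dominates(x, y):
    """True if every entry of column x is <= the matching entry of column y."""
    return all(a <= b for a, b in zip(x, y))


returns = list(range(-1, 5))  # -1%, 0%, ..., 4%
A = {-1: 0.5, 4: 0.5}
B = {1: 1.0}

_, FA, GA = sd_columns(A, returns)
_, FB, GB = sd_columns(B, returns)

print("return  sum_p(A) sum_p(B)  sumsum_p(A) sumsum_p(B)")
for r, fa, fb, ga, gb in zip(returns, FA, FB, GA, GB):
    print(f"{r:>5}%  {fa:8.2f} {fb:8.2f}  {ga:11.2f} {gb:11.2f}")

print("First-order condition holds:", dominates(FA, FB) or dominates(FB, FA))
print("Second-order condition holds:", dominates(GA, GB) or dominates(GB, GA))
```

Both checks come out False, matching the conclusion above that neither asset stochastically dominates the other.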

If we consider expected returns, asset A gives $0.5 \times (-1\%) + 0.5 \times 4\% = 1.5\%$, so its expected return is higher than B's. But we only get that if we're prepared to take on some risk: the variance is 0 for asset B but positive for asset A. We can't say much more without knowing an investor's individual utility function or risk-return preferences.

Behavioural Finance
Behavioural finance is the study of how mental biases and errors in decision making affect financial decisions. In this section we explore the 8 main themes found in research on behavioural finance and the biases that underlie them.

The themes can be summarised by the acronym FOAM POEM.

Framing
The words that are used in a question have an impact on the decision that is made.

Overconfidence
Generally people are overconfident in their own abilities, knowledge and skills. Two biases contribute to this
overconfidence.

• Hindsight bias - looking back, you think you anticipated events in advance.
• Confirmation bias - suppose you are faced with a difficult decision but you have an initial view of what to do. You examine lots of data to help you make a decision, but confirmation bias says that you will tend to look for and remember evidence that confirms your view and dismiss contradictory evidence. This leads you to believe that your initial view is correct and makes you overconfident in your abilities.

Anchoring
Anchoring attempts to explain how people produce estimates. They start with an initial idea of the answer,
known as the anchor. This may be based on past experience or expert opinion. They then adjust or amend
this estimate to reflect changes since the anchor was formed. The problem is that people give too much
weight to the anchor, they have difficulty moving away from it. Studies have shown that this is the case,
even if people are aware that their anchor is ridiculous.

Example - Consider a company whose share price has recently fallen because of new information clearly
showing the company’s prospects are worse than previously thought. People who purchase shares in this
company may be experiencing anchoring and adjustment bias. The investor finds it hard to give weight to the new information and believes the share price should be higher, as they are anchored to the price before the news came out.

Mental Accounting
Mental Accounting says that people tend to separate related events and decisions and find it hard to aggregate them. Therefore, when making financial decisions, people tend to set up separate mental accounts
and consider them individually rather than netting out all the gains and losses. As a consequence, they may
make sub-optimal decisions.

Example - Consider somebody who owes money on a credit card and also has cash in an instant access savings account. They are probably paying a high rate of interest on their credit card debt, whilst receiving a low rate of interest (if any) on their instant access savings. Arguably this is a sub-optimal decision.

Mental accounting bias means that they think about savings and debt separately. However, they would pay less interest overall, and so be better off, if they used their savings to pay off their credit card debt.
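The arithmetic behind this example can be sketched with hypothetical figures: 2,000 owed on the card at 20% a year and 3,000 in savings at 1% a year. The rates, balances and the use of simple one-year interest (no compounding) are all assumptions for illustration.

```python
def net_interest(debt, debt_rate, savings, savings_rate):
    """Net interest received over one year (negative means a net cost).

    Simple interest only; compounding is ignored for illustration.
    """
    return savings * savings_rate - debt * debt_rate


# Hypothetical figures: 2,000 owed at 20% a year, 3,000 saved at 1% a year.
kept_separate = net_interest(2000, 0.20, 3000, 0.01)
paid_off = net_interest(0, 0.20, 3000 - 2000, 0.01)
print(f"accounts kept separate: {kept_separate:.2f}")
print(f"debt paid off from savings: {paid_off:.2f}")
print(f"gain from netting: {paid_off - kept_separate:.2f}")
```

Netting the two accounts turns a net cost of 370 into a small net gain of 10, which is the sub-optimality that mental accounting hides.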

Prospect Theory
Expected Utility Theory, as we have used it, assumes that investors are risk-averse. Prospect theory says that this isn't always true. According to prospect theory, people consider risk relative to a reference point. They are risk-averse when facing the prospect of a gain, as they wish to crystallise the gain rather than put it at risk by taking a gamble, but they are risk-seeking when facing the prospect of a loss, as they want to try to gamble away their loss.

Options
There are lots of theories around how people choose when presented with a range of options.
• Primacy Effect says that people will select the option that they consider first.
• Recency Effect says that people will select the option that they see last, or most recently.
• Middle Option says that people will avoid the options that represent the extremes and will select
the one in the middle.
• Complexity - Individuals dislike complexity and find it overwhelming, so the greater the number of choices you offer, the more likely it is that an individual will defer a decision entirely.
• Status Quo Bias is present when choosing between sticking with a current option or making a change
to a new option. Status Quo bias leads people to stick with their current decision. It says that the
investor is naturally conservative and will favour the current situation - the status quo.
• Regret Aversion is also present when choosing between sticking with a current option or changing to a new one, for example switching investment manager. Regret aversion bias leads people to stick with their current decision. It says that the investor will stick with the current situation because, if they switch to a new manager and things go wrong, they will feel more responsible for the loss than if they stick with the current manager and things go equally wrong. However, the investor is equally responsible whether or not they switch, so the bias is irrational.
• Ambiguity Aversion says that people dislike uncertainty or ambiguity and so will pay a premium
to remove this ambiguity. For example, they may be willing to pay for information or analysis that
reduces the ambiguity around a potential investment.

Estimating Probabilities
Generally people are bad at estimating probabilities and three biases illustrate that.
• Dislike of Negative Events - People tend to underestimate the probabilities of events that will have
a negative impact.
• Availability Bias reflects that if someone finds it easy to imagine or visualise an event, they think it is more likely to occur than an event which is hard to visualise.
• Representativeness Heuristic - as the level of detail in a scenario increases, it becomes more specific and so must become less likely. But the detail makes the scenario seem more believable, so people think it's more likely.

Myopic Loss Aversion


Myopic Loss Aversion says that investors are less risk-averse when faced with a series of repeated gambles than when faced with a single gamble. So an investor who recognises that they are investing for the long term may be less risk-averse when setting their investment strategy than someone who focuses on what will happen in the next 3 months. Also, the longer the gap between reviews of strategy, the less risk-averse the investor will be.

4 Measures of Investment Risk
We will start by defining X as a random variable that represents investment return. It could be a percentage return or a monetary value, i.e. pounds or dollars. In this section we will look at the formulae for 5 different measures of investment risk, for both continuous and discrete distributions of investment returns.

1. Variance
The variance measures the spread of returns both above and below the mean and mathematically the variance
is defined as the expected squared deviation of returns from the mean return µ.

$$\operatorname{Var}[X] = E\left[(X - \mu)^2\right] = \int_{\text{all } x} (x - \mu)^2 f(x)\,dx$$
Now if X is a continuous random variable, we can work out this expectation by integrating as seen above.
If X is discrete we amend the formula in the following way:

$$\operatorname{Var}[X] = \sum_{\text{all } x} (x - \mu)^2\, P[X = x]$$

One criticism of the variance is that it considers both upside and downside risk, when in practice most
investors are more worried about downside risk. So one way of getting over this is to consider the downside
semi-variance.

2. Downside semi-variance
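The formulae are almost identical to those for the variance; we just change the limits of the sum or integral so that we only consider values of x below the mean return:
$$\text{Downside semi-variance} = \sum_{x<\mu} (x - \mu)^2\, P[X = x] \quad \text{(discrete)}$$
$$\text{Downside semi-variance} = \int_{x<\mu} (x - \mu)^2 f(x)\,dx \quad \text{(continuous)}$$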
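For a discrete distribution both formulas reduce to weighted sums. This sketch represents a distribution as a list of (value, probability) pairs (an illustrative choice) and uses the four-point distribution from the worked example later in this section: returns of 1%, 2%, 4% and 5% with equal probabilities, so $\mu = 3\%$.

```python
def mean(dist):
    """dist is a list of (value, probability) pairs."""
    return sum(x * p for x, p in dist)


def variance(dist):
    """Var[X] = sum over all x of (x - mu)^2 P[X = x]."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist)


def downside_semi_variance(dist):
    """Same sum as the variance, restricted to x < mu."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist if x < mu)


returns = [(1, 0.25), (2, 0.25), (4, 0.25), (5, 0.25)]  # returns in %
print(variance(returns), downside_semi_variance(returns))  # 2.5 1.25
```

Because this distribution is symmetric about its mean, the downside semi-variance is exactly half the variance; for a skewed distribution the two would tell different stories.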

3. Shortfall Probability
This is simply the probability that the investment return falls short of some benchmark level, L. It could be
an absolute value; a return on an index; a return on a benchmark fund or even some other variable such as
inflation.
$$P[X < L] = F_X(L) = \int_{x<L} f(x)\,dx \quad \text{(continuous)}$$
$$P[X < L] = \sum_{x<L} P[X = x] \quad \text{(discrete)}$$

4. Expected Shortfall
There are many different ways of defining the expected shortfall; here we define it in the continuous and discrete cases as:
$$E[\max(L - X, 0)] = \int_{x<L} (L - x) f(x)\,dx \quad \text{(continuous)}$$
$$E[\max(L - X, 0)] = \sum_{x<L} (L - x)\, P[X = x] \quad \text{(discrete)}$$

This definition is mainly a building block for a more useful measure called the conditional expected shortfall.

5. Conditional Expected Shortfall


This is the expected shortfall given that a shortfall occurs. We use conditional probability to calculate this as follows:
$$E[L - X \mid X < L] = \frac{\int_{x<L} (L - x) f(x)\,dx}{P[X < L]} \quad \text{(continuous)}$$
$$E[L - X \mid X < L] = \frac{\sum_{x<L} (L - x)\, P[X = x]}{P[X < L]} \quad \text{(discrete)}$$
To help understand these formulae we can look at a simple discrete example.

We will set the benchmark level L to be 3%, and X is our random variable denoting investment returns, which takes the values 1%, 2%, 4% and 5% with equal probabilities.

We now look at the shortfall given that a shortfall occurs, so X is either 1% or 2%, for which the shortfall is 2% and 1% respectively. Substituting into the discrete equation above gives
$$E[L - X \mid X < L] = \frac{0.25 \times 2\% + 0.25 \times 1\%}{0.5} = 1.5\%$$

The conditional expected shortfall is a weighted average of the shortfalls, where the weights are the probabilities of each shortfall occurring given that a shortfall occurs.
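The three shortfall-based measures can be computed side by side for the worked example. This is a minimal sketch using the same (value, probability) representation as an illustrative choice; the conditional measure is undefined if $P[X < L] = 0$, and the code would raise a `ZeroDivisionError` in that case.

```python
def shortfall_probability(dist, L):
    """P[X < L] = sum over x < L of P[X = x]."""
    return sum(p for x, p in dist if x < L)


def expected_shortfall(dist, L):
    """E[max(L - X, 0)] = sum over x < L of (L - x) P[X = x]."""
    return sum((L - x) * p for x, p in dist if x < L)


def conditional_expected_shortfall(dist, L):
    """E[L - X | X < L] = E[max(L - X, 0)] / P[X < L] (needs P[X < L] > 0)."""
    return expected_shortfall(dist, L) / shortfall_probability(dist, L)


returns = [(1, 0.25), (2, 0.25), (4, 0.25), (5, 0.25)]  # returns in %
L = 3  # benchmark return in %
print(shortfall_probability(returns, L))           # 0.5
print(expected_shortfall(returns, L))              # 0.75
print(conditional_expected_shortfall(returns, L))  # 1.5
```

The output makes the building-block relationship visible: the conditional figure of 1.5% is just the expected shortfall of 0.75% scaled up by the shortfall probability of 0.5.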

Now we will consider some of the advantages and disadvantages of the measures of investment risk that we
have just considered.

Points worth noting about the above table:

• Mathematically tractable means that the variance has nice statistical properties that make it easy to work with mathematically.

• MVPT (mean-variance portfolio theory) is a theory used to help investors select their optimal portfolio. There are lots of criticisms of MVPT, of its assumptions and of how realistic it is. Having said that, it can be shown to be a good approximation to other methodologies for selecting portfolios. In fact, in certain cases MVPT results in the selection of the same optimal portfolio as maximising investors' expected utility. Let's look at this in more detail.

We will start by considering the case where the investor has a quadratic utility function.

$$U(w) = w + dw^2, \qquad d < 0$$

$$E[U(w)] = E[w] + d\,E\left[w^2\right]$$

So we can see that the expected utility is a function of the first two non-central moments of wealth. Since $E[w^2] = \operatorname{Var}[w] + (E[w])^2$, to maximise the expected utility we only need to consider the mean and the variance. MVPT considers only the mean and the variance, and no higher moments, hence expected utility maximisation and MVPT lead to the same optimal portfolio.

Additionally, if investment returns have a normal distribution, we recall that the MGF is
$$M(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$$

Since the MGF depends only on the mean and variance, the normal distribution is completely characterised by its first 2 moments. So whatever the utility function, the expected utility will always be a function of the mean and variance only.

• The conditional expected shortfall does have the advantage of telling us the size of a loss, not only the probability of a loss. However, it is just a single figure, an expected value, so it can fail to capture differences in the range of the shortfall. For example, let us consider two probability density functions.

So here (in red) we have a pdf where the probability of falling short of the value L will be set at 5%
and we can also calculate the expected shortfall given that a shortfall occurs. Now consider a different
distribution (green), we can set this distribution up so that the probability of shortfall is also 5% and
the expected shortfall is of the same magnitude as the red distribution. However, since the green
distribution has a longer lower tail, it is arguably more risky than the red distribution. Therefore the
expected shortfall has failed to capture this difference in risk.
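The quadratic-utility point in the MVPT note above can be checked numerically. This sketch, assuming a hypothetical three-point wealth distribution and $d = -0.001$ (all wealth values below the vertex at 500), computes $E[U(w)]$ directly and again from the mean and variance alone via $E[w] + d\left(\operatorname{Var}[w] + E[w]^2\right)$.

```python
def expected_quadratic_utility(dist, d):
    """E[U(w)] for U(w) = w + d w^2, computed directly from the distribution."""
    return sum((w + d * w * w) * p for w, p in dist)


def expected_quadratic_utility_mv(mean, var, d):
    """Same quantity from mean and variance only: E[w] + d (Var[w] + E[w]^2)."""
    return mean + d * (var + mean * mean)


# Hypothetical end-of-period wealth distribution: (wealth, probability) pairs.
wealth = [(90.0, 0.25), (100.0, 0.5), (120.0, 0.25)]
d = -0.001
m = sum(w * p for w, p in wealth)
v = sum((w - m) ** 2 * p for w, p in wealth)

direct = expected_quadratic_utility(wealth, d)
from_moments = expected_quadratic_utility_mv(m, v, d)
print(direct, from_moments)
```

The two figures agree up to floating-point rounding, so any two portfolios with the same mean and variance have the same expected quadratic utility.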

Value at Risk and Tail Value at Risk


Now we will look at two closely associated measures of investment risk called VaR and Tail VaR.

Value at Risk estimates how much you might lose on a portfolio over a specific period of time at a given confidence level. Typical confidence levels are 95%, 99% and 99.5%; for a 1-year VaR these correspond to losses expected to be exceeded once every 20, 100 and 200 years respectively. If the confidence level is 95%, this is sometimes called the 95% VaR and sometimes the 5% VaR.

In layman's terms, if we're told the 1-day 5% VaR = $50,000, this means that we are 95% confident that losses will not exceed $50,000 over the next day. It is important to note that VaR is always given as a LOSS figure.

A negative VaR would represent a profit. So if the 1-year 95% VaR = −$30,000, we are 95% confident that we will make a profit of at least $30,000 over the next year.

VaR can be expressed in absolute terms, as we have seen above, or relative to a benchmark. For example, if the 1-month 1% VaR = 3% relative to an index, this means that we are 99% confident that we will not under-perform the index by more than 3% next month.

We will look at two main examples, one involving a continuous random variable and the other a discrete
one.

Continuous Example of Value at Risk

X = annual profit in $m
X ∼ N (0, 1)

We are tasked to find the 1-year 95% VaR. It is helpful to draw the pdf of the distribution on a graph. We are trying to find the point on the x-axis such that we are 95% confident that we will exceed this value over the next year. This is the same as finding the lower 5% point of the standard normal distribution, which is given in the tables as −1.645. Noting that x is measured in millions of dollars and that the VaR is a loss figure rather than a profit, we say that the 1-year 95% VaR is $1.645m.

Mathematically the VaR is defined to be,

$$\text{VaR} = -t \quad \text{where} \quad P[X < t] = p$$


In our example above we had that p = 0.05 and t = −1.645.
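The definition translates directly into code via the inverse cdf. This sketch uses Python's `statistics.NormalDist.inv_cdf` (available from Python 3.8) for a normally distributed profit; the function name `normal_var` is illustrative.

```python
from statistics import NormalDist


def normal_var(mu, sigma, p):
    """VaR = -t where P[X < t] = p, for profit X ~ N(mu, sigma^2)."""
    t = NormalDist(mu, sigma).inv_cdf(p)
    return -t


# X = annual profit in $m, X ~ N(0, 1); the 1-year 95% VaR uses p = 0.05.
print(round(normal_var(0.0, 1.0, 0.05), 3))  # 1.645
```

Changing `p` to 0.01 or 0.005 gives the 99% and 99.5% figures without needing the tables.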

Discrete Example of Value at Risk


Let X = 1-year return in $ on an investment, such that X has the following discrete distribution:
$$X = \begin{cases} 5000 & \text{with probability } 0.9 \\ -5000 & \text{with probability } 0.05 \\ -10000 & \text{with probability } 0.04 \\ -20000 & \text{with probability } 0.01 \end{cases}$$

We are asked to work out the 1-year 95% VaR: we want to be able to say that we are 95% confident that losses will not exceed ____ next year, and the blank will be our VaR.

According to our distribution of X we are 95% confident that returns will be no lower than −$5000 over
the next year. Equivalently we are 95% sure that losses will not exceed $5000 next year. Similarly if we are
asked to find the 1-year, 99% VaR we are 99% confident that losses will not exceed $10000 next year.

What if we were asked to find the 1-year, 96% VaR? Remember that X has a discrete distribution and X
can’t take any values other than the 4 given above. So can we say that we are 96% confident that losses will
not exceed 5000 next year? No we can’t, as there is a small probability that losses will be 10000. We can
say that we are 96% sure that losses will not exceed 10000 next year. So the 96% VaR is $10000.
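The reasoning above (step up through the possible losses until the cumulative probability reaches the confidence level) can be sketched as follows. The function name and the small tolerance, which only guards against floating-point rounding in the running total, are illustrative choices.

```python
def discrete_var(dist, confidence, tol=1e-12):
    """Smallest loss l with P[loss <= l] >= confidence.

    dist is a list of (profit, probability) pairs; loss = -profit.
    tol absorbs floating-point rounding in the running total.
    """
    running = 0.0
    for loss, p in sorted((-x, p) for x, p in dist):
        running += p
        if running >= confidence - tol:
            return loss
    raise ValueError("probabilities sum to less than the confidence level")


X = [(5000, 0.90), (-5000, 0.05), (-10000, 0.04), (-20000, 0.01)]
for c in (0.95, 0.96, 0.99):
    print(f"1-year {c:.0%} VaR: ${discrete_var(X, c):,}")
```

This reproduces the three answers above, including the jump from $5000 to $10000 at 96% because X cannot take any value in between.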

As with the other measures of investment risk, we should look at the pros and cons of the VaR.

It is particularly popular and widely used in the world of finance in measuring market and credit risk. Also,
a potential loss figure is very easy for people to understand and it helps financial organisations to know how
much capital they need to hold in order to cover particular risks. VaR focuses on downside risk which is the
main worry of most investors.

However, to make the calculations easy, VaR is often computed on the assumption that investment returns are normally distributed, which is unlikely to be the case in reality. For example, as soon as there is credit risk present in the portfolio, or the portfolio invests in derivatives, we would want distributions that are more skewed and have fatter tails. Having said this, it is possible to compute VaR using other distributions, either directly or using Monte Carlo simulation.
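To illustrate the Monte Carlo route, the sketch below estimates VaR as minus the empirical lower p-quantile of simulated returns. The fat-tailed alternative is our own assumption, not from the notes: a Student t with 4 degrees of freedom, rescaled to unit variance so it is comparable with N(0, 1). With the variance held fixed, the fat-tailed distribution gives a smaller 95% VaR but a larger 99% VaR here, showing that the choice of distribution matters most far out in the tail.

```python
import math
import random

def mc_var(sampler, p, n=100_000, seed=1):
    """Monte Carlo VaR: minus the empirical lower p-quantile of simulated returns."""
    rng = random.Random(seed)
    sims = sorted(sampler(rng) for _ in range(n))
    return -sims[int(p * n)]

def normal(rng):
    return rng.gauss(0.0, 1.0)

def student_t(rng, df=4):
    # Student t = Z / sqrt(chi2_df / df), where chi2_df is Gamma(shape df/2, scale 2).
    # Rescaled by sqrt((df - 2) / df) so that it has unit variance, like N(0, 1).
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2, 2.0)
    return z / math.sqrt(chi2 / df) * math.sqrt((df - 2) / df)

for p in (0.05, 0.01):
    print(p, round(mc_var(normal, p), 3), round(mc_var(student_t, p), 3))
```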

VaR is used to measure the risk associated with extreme events, for example events expected to occur once every 200 years. Since the events are extreme, there is often very little past data with which to choose a suitable distribution, and this calls into question the accuracy of any VaR figure. VaR has been criticised in the past for giving a false sense of security to investors; in reality, investors should worry more about what happens if losses exceed the VaR.

Tail VaR looks at the expected loss if the loss exceeds the value at risk. The term means different things to different people. Its most common definition is the conditional expectation of the values in a particular tail of the distribution. However, the Core Reading defines Tail VaR in terms of a particular tail value L (for example the lower 5% point), either as the expected shortfall below L:

$$E[\max(L - X, 0)]$$

or as the conditional expected shortfall in that tail, i.e.

$$E[L - X \mid X < L]$$

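Let us return to our earlier continuous example, in which X (in $m) has a standard normal distribution. This time we shall calculate the Tail VaR, specifically the 1-year, 95% Tail VaR. It is given by E[L − X | X < L], where L is the lower 5% point of the distribution, which we calculated earlier as L = −1.645. As X is a continuous random variable with density f, we can calculate the required expectation using integrals. For the standard normal density f′(x) = −x f(x), so $\int_{-\infty}^{L} x f(x)\,dx = -[f(L) - f(-\infty)] = -f(L)$. Hence

$$\begin{aligned}
E[L - X \mid X < L] &= \frac{\int_{-\infty}^{L} (L - x) f(x)\,dx}{P[X < L]} \\
&= \frac{1}{0.05}\left( L \int_{-\infty}^{L} f(x)\,dx - \int_{-\infty}^{L} x f(x)\,dx \right) \\
&= 20\,\big(0.05L + [f(L) - f(-\infty)]\big) \\
&= 20\,(0.02086) \\
&= \$0.417\text{m}
\end{aligned}$$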
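The closed form can be checked against a direct numerical integration. This sketch assumes X ~ N(0, 1) as above and uses the exact quantile from `statistics.NormalDist` rather than the table value −1.645, so it gives about 0.418 rather than 0.417; the small difference comes only from rounding L. The integration range [−10, L] and step count are arbitrary choices for the check.

```python
from statistics import NormalDist

std = NormalDist()              # X ~ N(0, 1), measured in $m, as in the example
p = 0.05
L = std.inv_cdf(p)              # exact lower 5% point, about -1.6449

# Closed form, using f'(x) = -x f(x) so that the integral of x f(x) up to L is -f(L):
# E[L - X | X < L] = (L * P[X < L] + f(L)) / P[X < L]
closed_form = (L * p + std.pdf(L)) / p

# Numerical check: midpoint rule for the integral of (L - x) f(x) over [-10, L],
# divided by P[X < L]; the probability mass below -10 is negligible.
n, lo = 100_000, -10.0
h = (L - lo) / n
integral = h * sum((L - x) * std.pdf(x) for x in (lo + (k + 0.5) * h for k in range(n)))
numeric = integral / std.cdf(L)

print(round(closed_form, 3), round(numeric, 3))   # 0.418 0.418
```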

Now we return to the discrete example, and this time we want to work out the 1-year, 95% Tail VaR. As before, L = −5000, and

$$E[L - X \mid X < L] = \frac{\sum_{x < L} (L - x)\,P[X = x]}{P[X < L]} = \frac{(-5000 - (-10000))(0.04) + (-5000 - (-20000))(0.01)}{0.05} = 7000$$

This means that in the worst 5% of scenarios we can expect to lose, on average, a further $7000 on top of the $5000 VaR.
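Both Core Reading definitions can be computed directly from the discrete distribution. A minimal sketch, using exact fractions; the variable names are our own.

```python
from fractions import Fraction as F

# 1-year return X in $ (discrete example above); L is the 95% point, -5000.
dist = {5000: F("0.9"), -5000: F("0.05"), -10000: F("0.04"), -20000: F("0.01")}
L = -5000

# Expected shortfall E[max(L - X, 0)]: only outcomes below L contribute.
shortfall = sum((L - x) * p for x, p in dist.items() if x < L)
# Conditional expected shortfall E[L - X | X < L]
prob_tail = sum(p for x, p in dist.items() if x < L)
conditional = shortfall / prob_tail

print(shortfall, conditional)   # 350 7000
```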

5 Portfolio Theory
Here we will discuss the assumptions underlying portfolio theory and describe the process followed. The key question that we're seeking to address with portfolio theory is: how can an individual investor choose an optimal portfolio of risky assets, reflecting their risk-return preferences?

The assumptions for portfolio theory are given below; their initial letters spell out the acronym SARDINE.

With any model or theory it is quite likely that we can pick holes in many of the assumptions, and this is no exception. For example, our investment decisions may be based on other considerations, such as ethical issues. Also, the single period ahead might not be the only thing that matters to us; we may want to make decisions at various future points in time. Finally, assumption 7 isn't quite right: you can't buy half a share, nor can you buy more of a company than actually exists.

Other factors that might influence the investment decision in practice are also ignored. These include:

• the suitability of the asset(s) for the investor's liabilities
• the marketability of the asset(s)
• higher moments of the distribution, e.g. skewness and kurtosis
• restrictions imposed by legislation
• taxes and investment expenses
• restrictions imposed by the fund's trustees
• ethical issues surrounding specific asset(s)

But rather than just pick holes in the assumptions, a better approach is to look at the output of the model or theory and see whether it ties up with reality. If it does, it can give some insight into what is going on and how the world works.

We will be able to make our decision about our optimal portfolio by looking at the graph below. We look at all the different possible assets that we could invest in and plot them on our graph, along with all the combinations of different assets that we could hold. As we look at more and more combinations, we build up a picture of lots and lots of dots in what we call E-σ space.

This gives us the dense semi-ovular area above, which is known as the opportunity set: the set of all possible investment options. Our optimal portfolio is hidden somewhere in there. The point plotted on the graph above is clearly not our optimal portfolio: if we wanted that level of risk, we could choose a portfolio higher up with a higher expected return, and if we wanted that expected return, we could choose a point further to the left with less volatility of returns.
So we can rule out every portfolio for which another portfolio offers a higher expected return for no more risk, or less risk for no less expected return, and call these portfolios inefficient. That leaves just the portfolios on the red line below, which we call the Efficient Frontier. So our optimal portfolio is going to be somewhere along that line.

The next step is to see which of those is right for us. To do that we look at indifference curves, examples of which are plotted below; each one joins together all the points that give us exactly the same expected utility.

We maximise expected utility by finding the indifference curve that is tangential to the efficient frontier: it is not possible to find another portfolio in our opportunity set that is on a higher indifference curve. Thus this tangential point is our optimal portfolio. It represents how we would choose to invest in risky assets, given the assumptions underlying portfolio theory.

Deriving the Minimum Variance Portfolio for 2 Assets
The graph below shows the expected return plotted against the standard deviation for two assets, A and B. Asset B has a higher expected return than asset A, but this is offset by a higher standard deviation of returns. The line joining them on the graph represents a possible opportunity set: if we invest varying proportions of our wealth in assets A and B, we trace out a line connecting them.

For example, point B represents 100% of our wealth in asset B and nothing in asset A. If we want to move beyond point B along this line, we would have to short sell some of asset A in order to buy more of asset B. There is a particular point on this line, marked in purple, which is of great interest to us: the point with the lowest standard deviation of all the possibilities. We call this the Minimum Variance Portfolio (MVP).

It might be of interest to find out what proportion of our wealth we should invest in asset A and asset B to achieve this. We can start by writing down an expression for the return on our portfolio if we split our wealth between just these two assets. This is given by the weighted average of the returns on the two assets:

$$R_P = x_A R_A + x_B R_B$$

As we are interested in the MVP, it also makes sense for us to calculate the variance of the portfolio. This is given below, including a term in case the returns on A and B are correlated. Substituting x_B = 1 − x_A and setting the derivative to zero:

$$\begin{aligned}
V_P &= x_A^2 V_A + x_B^2 V_B + 2 x_A x_B C_{AB} \\
&= x_A^2 V_A + (1 - x_A)^2 V_B + 2 x_A (1 - x_A) C_{AB} \\
\frac{dV_P}{dx_A} &= 2 x_A V_A - 2(1 - x_A) V_B + 2(1 - 2x_A) C_{AB} = 0 \\
\implies x_A (V_A + V_B - 2C_{AB}) &= V_B - C_{AB} \\
\implies x_A &= \frac{V_B - C_{AB}}{V_A + V_B - 2C_{AB}}, \qquad x_B = 1 - x_A
\end{aligned}$$

If our aim is to minimise the variance of the portfolio, ignoring all other considerations, we would invest proportion x_A of our wealth in asset A and the rest in asset B. That gives us our required portfolio. For completeness we should check the second derivative: $\frac{d^2 V_P}{dx_A^2} = 2(V_A + V_B - 2C_{AB}) = 2\,\text{Var}(R_A - R_B)$, which is positive unless R_A − R_B is constant, so this stationary point is indeed a minimum.
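The MVP formula is easy to check numerically. A minimal Python sketch, assuming hypothetical inputs that are not from the notes (σ_A = 20%, σ_B = 30%, correlation 0.2); `mvp_weight` and `portfolio_variance` are illustrative names. Nudging x_A either side of the formula's value should only increase the variance.

```python
import math

def mvp_weight(var_a, var_b, cov_ab):
    """Proportion in asset A of the two-asset minimum variance portfolio."""
    return (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

def portfolio_variance(x_a, var_a, var_b, cov_ab):
    """V_P = x_A^2 V_A + x_B^2 V_B + 2 x_A x_B C_AB, with x_B = 1 - x_A."""
    x_b = 1 - x_a
    return x_a**2 * var_a + x_b**2 * var_b + 2 * x_a * x_b * cov_ab

# Hypothetical inputs: sigma_A = 20%, sigma_B = 30%, correlation 0.2
sd_a, sd_b, rho = 0.20, 0.30, 0.2
var_a, var_b, cov_ab = sd_a**2, sd_b**2, rho * sd_a * sd_b

x_a = mvp_weight(var_a, var_b, cov_ab)
v_min = portfolio_variance(x_a, var_a, var_b, cov_ab)
print(f"x_A = {x_a:.4f}, x_B = {1 - x_a:.4f}, sigma_P = {math.sqrt(v_min):.4f}")
```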

Deriving an Equation for the Efficient Frontier (2 Asset Case)
Let's suppose there are 2 assets that we could invest in, A and B. Their expected returns and standard deviations are as shown on the graph; the line joining them represents all the combinations of E and σ that are possible by varying the proportions invested in A and B.

In order to get an equation for this line, we can write down expressions for the expected return and standard
deviations in terms of the proportions invested and somehow combine them.

As discussed earlier, the return and expected return on the portfolio are given by,

$$\begin{aligned}
R_P &= x_A R_A + x_B R_B \\
E_P &= x_A E_A + x_B E_B = x_A E_A + (1 - x_A) E_B
\end{aligned}$$

The variance is given by the following expression,

$$V_P = x_A^2 V_A + x_B^2 V_B + 2 x_A x_B C_{AB} = x_A^2 V_A + (1 - x_A)^2 V_B + 2 x_A (1 - x_A) C_{AB}$$

Now, in order to get the variance in terms of the expected return, we can rearrange the expression for the expected return to make x_A the subject of the formula and substitute it into the expression for the variance:

$$x_A = \frac{E_B - E_P}{E_B - E_A}$$

One of the assumptions of portfolio theory was that the expected returns, variances and covariances of the available assets are all known. So in our expression the only unknown is x_A. Therefore, if we substitute for x_A in the expression for the variance, we get a function of the expected return on our portfolio, V_P = f(E_P). This is an equation in V-E space; to get it in σ-E space we take square roots. Hence,

$$\sigma_P = \sqrt{V_P} = \sqrt{f(E_P)}$$

We now have an equation for the purple line in the earlier diagram, but not all of this line is the efficient frontier: the Efficient Frontier runs from the MVP upwards. So the last step is to work out the expected return corresponding to the minimum variance; the Efficient Frontier is then the equation we derived, restricted to expected returns E_P above this value.

To find this value, we would differentiate the variance with respect to the expected return on the portfolio and set the result equal to zero.
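The same hypothetical two-asset inputs can be used to trace σ_P = √f(E_P). Because E_P is linear in x_A, minimising over E_P gives the same point as minimising over x_A, so this sketch locates the MVP's expected return from the MVP weight instead of differentiating f. The inputs (E_A = 4%, E_B = 8%, σ_A = 20%, σ_B = 30%, ρ = 0.2) are assumptions, not from the notes; E_P = 10% requires short selling asset A.

```python
import math

# Hypothetical two-asset inputs
e_a, e_b = 0.04, 0.08
var_a, var_b = 0.20**2, 0.30**2
cov_ab = 0.2 * 0.20 * 0.30

def sigma_of_return(e_p):
    """sigma_P = sqrt(f(E_P)): substitute x_A = (E_B - E_P)/(E_B - E_A) into V_P."""
    x_a = (e_b - e_p) / (e_b - e_a)
    x_b = 1 - x_a
    v_p = x_a**2 * var_a + x_b**2 * var_b + 2 * x_a * x_b * cov_ab
    return math.sqrt(v_p)

# Expected return at the minimum variance portfolio: the efficient frontier
# is sigma_of_return(E_P) restricted to E_P >= e_mvp.
x_mvp = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)
e_mvp = x_mvp * e_a + (1 - x_mvp) * e_b

for e_p in (e_mvp, 0.06, 0.08, 0.10):
    print(f"E_P = {e_p:.4f}  sigma_P = {sigma_of_return(e_p):.4f}")
```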

So far we have been assuming that both assets A and B are risky assets. Now let us consider the case where
Asset A is in fact risk-free. In this case the standard deviation of the asset would be 0, so the point A would
be on the vertical axis as seen below.

Our equation for the expected return is still the same, but we can simplify the equation for the variance of the portfolio. If asset A is risk-free its variance is 0, so V_A = 0, and it has no covariance with asset B, so C_AB = 0. Hence,

$$V_P = x_B^2 V_B$$

We can take the square root of this to get an expression for σ_P (for x_B ≥ 0), using $x_B = 1 - x_A = \frac{E_P - E_A}{E_B - E_A}$:

$$\sigma_P = x_B \sigma_B = \frac{E_P - E_A}{E_B - E_A}\,\sigma_B$$

We can see that σ_P is linear in E_P; in other words, our Efficient Frontier in E-σ space is a straight line starting at asset A and passing through asset B. As before, we need to specify that the efficient frontier is the part of this line with expected return greater than the return on asset A.

Impact of Correlation on the Efficient Frontier (2 Asset Case)

The graph below shows two possible opportunity sets when there are just 2 risky assets, i.e. all the combinations of expected return and standard deviation that are possible by varying the proportions invested in assets A and B. The two cases shown are perfect positive correlation (ρ = 1) and perfect negative correlation (ρ = −1). From the graph we can see that in both cases it is possible to eliminate all risk by holding the appropriate combination of assets A and B.

Where assets A and B are perfectly positively correlated, ρ = 1, we need to short sell asset B in order to invest more than 100% of our wealth in asset A. That explains why the expected return on this zero-risk portfolio is only about 2%: we've had to short sell the higher-yielding asset B in order to invest in the lower-yielding asset A.

The situation is much better with perfect negative correlation, shown by the pink line in the graph. Here we hold a positive proportion of both asset A and asset B and achieve an expected return of around 5%.
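Setting ρ = ±1 in the MVP formula gives the zero-risk mixes directly: x_A = σ_B/(σ_B − σ_A) when ρ = 1 (more than 100% in A, short B) and x_A = σ_B/(σ_A + σ_B) when ρ = −1. The asset figures in this sketch (E = 4% and 8%, σ = 10% and 20%) are hypothetical, so the resulting returns will not match the graph's 2% and 5% exactly, but the pattern is the same.

```python
def zero_risk_weight(sd_a, sd_b, rho):
    """x_A from the MVP formula with C_AB = rho * sd_a * sd_b."""
    var_a, var_b, cov = sd_a**2, sd_b**2, rho * sd_a * sd_b
    return (var_b - cov) / (var_a + var_b - 2 * cov)

def portfolio_sd(x_a, sd_a, sd_b, rho):
    x_b = 1 - x_a
    v = x_a**2 * sd_a**2 + x_b**2 * sd_b**2 + 2 * x_a * x_b * rho * sd_a * sd_b
    return max(v, 0.0) ** 0.5   # guard against a tiny negative rounding error

# Hypothetical assets: A has E = 4%, sd = 10%; B has E = 8%, sd = 20%
e_a, e_b, sd_a, sd_b = 0.04, 0.08, 0.10, 0.20
for rho in (1.0, -1.0):
    x_a = zero_risk_weight(sd_a, sd_b, rho)
    e_p = x_a * e_a + (1 - x_a) * e_b
    print(f"rho = {rho:+.0f}: x_A = {x_a:.3f}, x_B = {1 - x_a:.3f}, "
          f"E_P = {e_p:.4f}, sd_P = {portfolio_sd(x_a, sd_a, sd_b, rho):.6f}")
```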

The graph on the right above shows more realistic cases for the correlation between two assets. The blue line shows partial positive correlation and the pink line partial negative correlation. The gold line is interesting, however: it shows that even when there is no correlation between the assets (ρ = 0), diversification still reduces risk, since some combinations of A and B have a lower standard deviation than either asset on its own.

Efficient Frontier in the 3 Asset Case

Let's suppose we have 3 assets, A, B and C, whose expected returns and standard deviations are plotted in the E-σ space below. We want to find the Efficient Frontier. If we had just 2 assets, we would substitute the equation for the expected return on the portfolio into the variance; doing this for each pair of assets gives the following curves.

For 2 assets it's nice and easy: for any expected return we want, there's only one way of getting it, and that pins down the standard deviation exactly. But with three assets there isn't just one way of attaining a specific expected return.

For example, suppose we wanted an expected return of 5.5%. Using pairs of assets there are 3 possibilities: a combination of A and B, of B and C, or of A and C. Looking at the graph, it doesn't make sense to use anything other than the green curve, as the others have more risk for the same expected return. But it isn't that simple, as we could invest in all 3 assets. In that case, rather than just a line, our opportunity set is a densely packed region bounded by a hyperbola. Our task is to minimise the standard deviation subject to the expected return being equal to 5.5% and to the constraint x_A + x_B + x_C = 1.

If we manage to solve that and get an equation in E-σ space, we could add it to our graph (black line) and see the actual Efficient Frontier for 3 assets. So let's imagine we have done that and drawn it on the graph below. It hugs the other 3 curves; in fact it touches each of them exactly once. We can see that for the return of 5.5% discussed above, the black line, which incorporates all 3 assets, has less risk than investing solely in assets A and C (green line).

It's all very well to draw this line, but how would we go about finding its equation? Our plan is to fix the expected return and then minimise the variance associated with that return. So we want

$$\begin{aligned}
\min\; V_P &= \sum_{i}\sum_{j} x_i x_j C_{ij} \\
\text{subject to}\quad \sum_i x_i E_i &= E_P \\
\text{and}\quad \sum_i x_i &= 1.
\end{aligned}$$

This is a constrained optimization problem and the process for solving it involves Lagrangian multipliers.
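For a quadratic objective with linear constraints, the Lagrangian conditions form a linear system, so the problem can be solved exactly. This is a minimal sketch in plain Python (a small Gaussian elimination, to avoid any library dependency) for three hypothetical assets; all the asset figures are assumptions, not those on the graph. As the text says, the 3-asset minimum variance at E_P = 5.5% can be no higher than that of the A-and-C mix with the same expected return.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def min_variance_weights(e, cov, e_p):
    """Minimise sum_ij x_i x_j C_ij s.t. sum x_i E_i = E_P and sum x_i = 1.

    Lagrangian: V - lam * (sum x_i E_i - E_P) - mu * (sum x_i - 1).
    dL/dx_i = 2 sum_j C_ij x_j - lam E_i - mu = 0, plus the two constraints,
    gives a linear system in (x_1, ..., x_n, lam, mu).
    """
    n = len(e)
    a = [[2 * cov[i][j] for j in range(n)] + [-e[i], -1.0] for i in range(n)]
    b = [0.0] * n
    a.append(list(e) + [0.0, 0.0])      # expected-return constraint
    b.append(e_p)
    a.append([1.0] * n + [0.0, 0.0])    # weights sum to 1
    b.append(1.0)
    return solve_linear(a, b)[:n]

def portfolio_variance(x, cov):
    return sum(x[i] * x[j] * cov[i][j] for i in range(len(x)) for j in range(len(x)))

# Hypothetical assets A, B, C
e = [0.04, 0.06, 0.08]
sd = [0.10, 0.15, 0.20]
rho = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.4], [0.1, 0.4, 1.0]]
cov = [[rho[i][j] * sd[i] * sd[j] for j in range(3)] for i in range(3)]

x = min_variance_weights(e, cov, 0.055)
x_ac = [0.625, 0.0, 0.375]              # the only A-and-C mix with E_P = 5.5%
print([round(w, 4) for w in x])
print(portfolio_variance(x, cov) ** 0.5, portfolio_variance(x_ac, cov) ** 0.5)
```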

Example using Lagrangian Multipliers
Let's suppose our task is to solve the following problem (normally we would be minimising a variance, but we use this simpler expression to illustrate the method):

$$\min\; 3x^2 + 2y^2 + z^2 \quad \text{subject to} \quad x + y + z = 11$$

We start by forming the Lagrangian, denoted by L. It is a function of the variables in the expression we want to minimise and also of a dummy variable λ, the Lagrangian multiplier:

$$L = 3x^2 + 2y^2 + z^2 + \lambda (x + y + z - 11)$$

We minimise by partially differentiating with respect to x, y and z and setting each derivative equal to 0, to obtain the following:

$$\frac{\partial L}{\partial x} = 6x + \lambda = 0, \qquad \frac{\partial L}{\partial y} = 4y + \lambda = 0, \qquad \frac{\partial L}{\partial z} = 2z + \lambda = 0$$

From the above equations we can see that 6x = 4y = 2z, so y = 6x/4 and z = 6x/2. We can combine this with the constraint that the variables sum to 11 (which is what ∂L/∂λ = 0 gives us):

$$x + \frac{6x}{4} + \frac{6x}{2} = 11 \implies \frac{11x}{2} = 11 \implies x = 2,\; y = 3,\; z = 6.$$

We finish by substituting these values into the expression we began with: 3(2)² + 2(3)² + 6² = 66.
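The worked example generalises to any objective of the form Σ a_i v_i² subject to Σ v_i = c (this generalisation is ours, for illustration): each first-order condition gives v_i = −λ/(2a_i), and the constraint then pins down λ. A short sketch with exact fractions reproduces the numbers above.

```python
from fractions import Fraction

# min sum a_i * v_i^2 subject to sum v_i = c (the worked example: a = 3, 2, 1; c = 11)
a = [Fraction(3), Fraction(2), Fraction(1)]
c = Fraction(11)

# FOC: 2 a_i v_i + lam = 0  =>  v_i = -lam / (2 a_i); the constraint then fixes lam.
lam = -2 * c / sum(1 / ai for ai in a)
v = [-lam / (2 * ai) for ai in a]
objective = sum(ai * vi**2 for ai, vi in zip(a, v))

print("lambda =", lam, "| x, y, z =", [int(vi) for vi in v], "| minimum =", objective)
# lambda = -12 | x, y, z = [2, 3, 6] | minimum = 66
```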

6 Models of Asset Returns
The Single-index Model is a simple statistical model of the return on a security. Under the single-index model, the return on security i is given by,

$$R_i = \alpha_i + \beta_i R_M + \varepsilon_i$$

In practice R_M would be the return on an index such as the FTSE All Share in the UK.
α_i signifies how an investment is expected to perform after allowing for the systematic or market risk involved. However, if we assume the EMH we set α_i = 0, as it would then be impossible to outperform the market through active investment management. The β here is the same β that will be discussed later in the CAPM.

There are 3 key assumptions underlying the single-index model: the mean of the error term is 0, E[ε_i] = 0; the error terms on different securities are uncorrelated, Cov(ε_i, ε_j) = 0 for i ≠ j; and the error term is uncorrelated with the return on the market, Cov(ε_i, R_M) = 0.

It is worth noting that the single-index model is a statistical model and not an economic one, i.e. α_i and β_i are estimated from past data using linear regression. It is a single-index model because only one factor, the market return, is assumed to affect the returns on the securities. We could extend it to become a multi-factor model, but studies have shown that the key influence on asset prices is the correlation with the market return.
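Fitting the model by linear regression amounts to β̂ = sample Cov(R_i, R_M)/sample Var(R_M) and α̂ = mean(R_i) − β̂·mean(R_M). The sketch below applies this to simulated data with a known α and β; the "true" values and the simulated returns are hypothetical, standing in for real index and share price histories.

```python
import random
from statistics import fmean

def fit_single_index(r_i, r_m):
    """OLS estimates of alpha_i and beta_i in R_i = alpha_i + beta_i R_M + e_i."""
    mean_i, mean_m = fmean(r_i), fmean(r_m)
    cov = sum((a - mean_i) * (b - mean_m) for a, b in zip(r_i, r_m))
    var_m = sum((b - mean_m) ** 2 for b in r_m)
    beta = cov / var_m
    alpha = mean_i - beta * mean_m
    return alpha, beta

# Simulated (hypothetical) monthly data with true alpha = 0.002 and beta = 1.2
rng = random.Random(42)
r_m = [rng.gauss(0.006, 0.04) for _ in range(5000)]
r_i = [0.002 + 1.2 * m + rng.gauss(0.0, 0.03) for m in r_m]

alpha_hat, beta_hat = fit_single_index(r_i, r_m)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.3f}")
```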

Systematic and Specific Risk
Systematic risk is risk that relates to the market as a whole. It is also known as market risk or undiversifiable risk.

Specific risk, on the other hand, relates to an individual security. It is also known as unsystematic risk, residual risk or diversifiable risk, as you can reduce it by investing in lots of different securities.

Let's consider an example. You have $10,000 to invest and you decide to invest it all in shares of an ice-cream manufacturer. If there is a case of food poisoning at the manufacturer's plant, there may be bad publicity and the share price will fall, so you will make a loss. However, had you instead invested $1,000 in each of 10 different ice-cream companies and there was a case of food poisoning at one of them, the loss would be a lot smaller. So food poisoning is a specific risk: we can reduce it by diversifying the portfolio.

A recession in the country where the ice-cream manufacturer operates might lead to a reduction in the production of ice-cream as a whole, whether you've invested in 1 company or 10, so the impact on your portfolio is likely to be similar. A recession is a systematic risk.

We will now look at how systematic and specific risk relate to the single-index model.

As discussed above, the return on a particular security i is given by,

$$R_i = \alpha_i + \beta_i R_M + \varepsilon_i$$

Since ε_i is uncorrelated with R_M, its variance is given by,

$$V_i = \beta_i^2 V_M + V_{\varepsilon_i}$$

where V_M = Var(R_M) and V_{ε_i} = Var(ε_i). This formula tells us that the variance of returns on security i is the sum of a market-related component and a component that relates just to security i. Variance is a measure of risk, so the risk on security i is the sum of market-related (systematic) risk and risk that is specific to security i.

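It is possible to show mathematically how we can diversify away specific risk in the single-index model. Instead of looking at just one security, we will look at a portfolio of securities, denoted P. With x_i the proportion of the portfolio held in security i, the return on the portfolio is

$$\begin{aligned}
R_P &= \sum_{i=1}^{n} x_i R_i \\
&= \sum_{i=1}^{n} x_i (\alpha_i + \beta_i R_M + \varepsilon_i) \\
&= \sum_{i=1}^{n} x_i \alpha_i + \left( \sum_{i=1}^{n} x_i \beta_i \right) R_M + \sum_{i=1}^{n} x_i \varepsilon_i \\
&= \alpha_P + \beta_P R_M + \varepsilon_P
\end{aligned}$$

One of the results of the single-index model is that the variance of the portfolio is given by,

$$V_P = \beta_P^2 V_M + V_{\varepsilon_P}$$

where, since the error terms on different securities are uncorrelated,

$$V_{\varepsilon_P} = \text{Var}(\varepsilon_P) = \text{Var}\left( \sum_{i=1}^{n} x_i \varepsilon_i \right) = \sum_{i=1}^{n} x_i^2\, \text{Var}(\varepsilon_i)$$

We will consider the case where n is large and the proportion of the portfolio held in each security is equal. Hence x_i = 1/n, so

$$V_{\varepsilon_P} = \sum_{i=1}^{n} x_i^2\, \text{Var}(\varepsilon_i) = \sum_{i=1}^{n} \frac{1}{n^2}\, \text{Var}(\varepsilon_i) = \frac{1}{n} \left( \frac{1}{n} \sum_{i=1}^{n} \text{Var}(\varepsilon_i) \right)$$

This is 1/n multiplied by the average specific risk across the n securities. Provided this average stays bounded, V_{ε_P} → 0 as the number of securities held increases. So we are eliminating the specific risk associated with the portfolio.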
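The 1/n effect can be seen numerically. This sketch assumes hypothetical inputs (V_M = 0.04, every β_i = 1 so β_P = 1, and specific variances drawn uniformly between 0.02 and 0.06): as n grows, the specific component shrinks towards 0 while the total variance settles at the systematic part β_P² V_M.

```python
import random

def specific_variance_equal_weights(spec_vars):
    """V_epsP = sum (1/n)^2 Var(e_i) = (1/n) * average specific variance."""
    n = len(spec_vars)
    return sum(v / n**2 for v in spec_vars)

# Hypothetical inputs: market variance, common beta, specific variances in [0.02, 0.06]
rng = random.Random(0)
v_m, beta = 0.04, 1.0
for n in (1, 10, 100, 1000):
    spec = [rng.uniform(0.02, 0.06) for _ in range(n)]
    v_eps = specific_variance_equal_weights(spec)
    total = beta**2 * v_m + v_eps    # V_P = beta_P^2 V_M + V_epsP
    print(f"n = {n:4d}: specific = {v_eps:.5f}, total = {total:.5f}")
```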

Multi-factor Models
A multi-factor model is an extension of the single-index model in which several factors explain the return on an individual security:

$$R_i = a_i + b_{i,1} I_1 + b_{i,2} I_2 + \cdots + b_{i,n} I_n + c_i$$

We will call these factors I_1, ..., I_n. In fact they represent the change in these factors rather than the factors themselves. For example, if the k-th factor was short-term interest rates, then I_k might be the change in short-term interest rates.

a_i is the equivalent of α_i in the single-index model. It is a constant and represents the expected return on security i due to company-specific factors.

b_{i,1}, ..., b_{i,n} measure how sensitive the return on security i is to the various factors. The larger the absolute value of a b-term, the more sensitive the return is to that factor.

c_i is the equivalent of ε_i in the single-index model. It is not a constant: it is a random variable and represents the unexplained part of the return.

Like the single-index model, there are three key assumptions, given in the red boxes above: E[c_i] = 0, c_i and c_j are uncorrelated for i ≠ j, and c_i is uncorrelated with each of the factors I_k.

Multifactor models are statistical models and not economic models. This means that the constants are
determined by fitting the model to past data using regression techniques.

In terms of deciding how many factors to include, you would start with a few, fit the model to past data
and then measure the variance of the actual versus the fitted returns. Keep adding factors until there is no
longer a significant drop in the variance.

Finally, once the factors are chosen, it is possible to restructure a multi-factor model so that the I’s are
uncorrelated with each other. These are called Orthogonal Factors.

We will look at an example of a model with two factors, I_1 and I_2. These are the factors in the original model. We will then restructure the model and define new factors I_1^* and I_2^*.

We set I_1^* equal to I_1 and look at how I_2 relates to I_1^*. We can do this by creating a scatterplot of I_2 against I_1^* and using regression techniques to find the line of best fit. We call the intercept γ_1 and the gradient γ_2. So,

$$I_2 = \gamma_1 + \gamma_2 I_1^* + d_1$$

where d_1 is the error term, which is uncorrelated with I_1^*. Next we set I_2^* to be,

$$I_2^* = I_2 - \gamma_1 - \gamma_2 I_1^* = d_1$$

Since d_1 is uncorrelated with I_1^*, so is I_2^*, giving us two uncorrelated factors.
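The restructuring can be sketched with a simple least-squares fit. The factor data here is simulated and hypothetical (I_2 is built to be correlated with I_1); after replacing I_2 with the regression residual, the sample covariance between I_1^* and I_2^* is zero up to rounding error, which is a property of least-squares residuals.

```python
import random
from statistics import fmean

def ols(y, x):
    """Intercept gamma_1 and slope gamma_2 of the regression y = gamma_1 + gamma_2 x + d."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def sample_cov(x, y):
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical correlated factor changes I1 and I2
rng = random.Random(7)
i1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
i2 = [0.5 + 0.8 * a + rng.gauss(0.0, 0.6) for a in i1]

i1_star = i1                                              # I1* = I1
g1, g2 = ols(i2, i1_star)
i2_star = [b - g1 - g2 * a for a, b in zip(i1_star, i2)]  # I2* = I2 - g1 - g2 I1* = d1

print(round(sample_cov(i1, i2), 3), sample_cov(i1_star, i2_star))
```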

There are three types of multi-factor model, depending on the factors used: macroeconomic, fundamental (company/industry-related) and statistical factor models. Examples of each are included below.

7 Asset Pricing Models
The Capital Asset Pricing Model (CAPM) is, as its name suggests, an asset pricing model: it is used to come up with a discount rate for valuing the cashflows on an asset, and so for pricing that asset. CAPM is an example of an economic model rather than a statistical model, as it's based on economic theory rather than a fit to empirical data. In fact it is an extension of mean-variance portfolio theory (MVPT): MVPT considers how an individual investor selects portfolios, and CAPM extends this to how all investors select portfolios.

We will begin with the assumptions underlying MVPT (SARDINE, listed earlier), which CAPM extends to all investors.

In particular, we now assume that all investors have the same one-period time horizon and the same estimates of expected returns, variances and covariances of returns over that period. In addition, we assume that all investors can borrow or lend unlimited amounts at the same risk-free rate r, and that all investors measure returns in the same currency.

The key implications of these homogeneous assumptions are that all investors will have the same Efficient Frontier, and that this Efficient Frontier will collapse to a straight line given the existence of a risk-free asset.
Under CAPM we also assume that the market is perfect. This is why CAPM is an economic model: it is based on the economic theory of a perfect market. A couple of characteristics of a perfect market are:

• All investors have the same access to information.
• No one investor has the power to set or influence prices.

Finally, we assume the market is in equilibrium, i.e. that supply is equal to demand. This means that we can assume the return investors require is equal to the return they expect to get. So we can use the expected returns derived from CAPM to discount the cashflows on assets and determine their price.

It is worth noting that a lot of these assumptions don't hold up in practice.

Capital Market Line
Now we will consider the derivation of the capital market line, one of the key results of CAPM. A consequence of all investors having the same homogeneous expectations is that all investors have the same efficient frontier of risky assets.

A consequence of the assumption that all investors can lend and borrow at the same risk-free rate r is that this efficient frontier collapses to a straight line. The portfolio represented by the blob on the E_P axis below is the risk-free portfolio: it consists of 100% investment in the risk-free asset. This would suit a very risk-averse investor.

The portfolio represented by the blob M, where the straight line is tangential to the efficient frontier of risky assets, consists of 100% investment in risky assets. This would suit a risk-tolerant investor. The portfolio represented by the blob to the right of this represents going short in the risk-free asset, i.e. borrowing at the risk-free rate and investing the extra in the portfolio of risky assets. This would suit a risk lover.

Under CAPM all investors would choose a portfolio on the straight line, namely the portfolio where their indifference curve is tangential to this straight-line efficient frontier. Under another CAPM assumption, that the market is perfect, we can be more precise about what the point M actually is.

If everyone has the same information and the same assessments about returns, and there's no superior information, then every investor will hold the same portfolio of risky assets as everyone else. Different investors will hold different proportions of their money in this portfolio (0%, 25%, 50%, etc.), but they all hold some proportion of their money in this same portfolio. And if all investors are holding it to some degree or another, it must be the Market Portfolio.

If a particular investor did get hold of some superior information that caused them to consider investing in a different portfolio of risky assets, everyone else would quickly follow suit, because in a perfect market information quickly becomes available to all investors and they would all make the same assessment of it. This isn't to say that investors can't exhibit personal risk preferences: we have already seen that investors with different appetites would hold portfolios at different points on the straight line. But portfolio M can be determined without knowing anything about individuals' preferences for risk and return or their liabilities. This is known as Separation Theory.

The equation of this straight line is given as,

$$E_P = r + \left( \frac{E_M - r}{\sigma_M} \right) \sigma_P$$
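A quick numerical sketch of why mixtures of M and the risk-free asset lie on this line: investing proportion w in M gives E_P = (1 − w)r + wE_M and σ_P = wσ_M, since the risk-free asset has zero variance. The market figures (r = 3%, E_M = 8%, σ_M = 20%) are hypothetical; w > 1 corresponds to borrowing at r.

```python
# Hypothetical market inputs
r, e_m, sd_m = 0.03, 0.08, 0.20

def cml(sd_p):
    """Expected return on the Capital Market Line at volatility sd_p."""
    return r + (e_m - r) / sd_m * sd_p

# Put a proportion w in portfolio M and 1 - w in the risk-free asset.
# w > 1 means borrowing at r to invest more than 100% in M.
for w in (0.0, 0.5, 1.0, 1.5):
    e_p = (1 - w) * r + w * e_m
    sd_p = w * sd_m          # the risk-free asset has zero variance
    print(f"w = {w:.1f}: E_P = {e_p:.3f}, sigma_P = {sd_p:.3f}, "
          f"on CML: {abs(cml(sd_p) - e_p) < 1e-12}")
```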

Security Market Line
Now we will consider another key result of CAPM, the derivation of the security market line.

Most of the graphs we have looked at so far for MVPT and CAPM have had expected return on the y-axis and standard deviation (volatility) on the x-axis. This time we are going to look at a graph of the expected return on an investment, E_i, on the y-axis against the beta of the investment, β_i, on the x-axis. Remember that beta measures how sensitive an investment's return is to movements in the market return. It is defined as,

$$\beta_i = \frac{\text{Cov}(R_i, R_M)}{\text{Var}(R_M)}$$

We will plot some investments on the graph below. Firstly, the risk-free asset: this has an expected return of r, the risk-free rate, and a beta of 0, since its return is a constant and so has zero covariance with the market return. The second investment that we'll plot is the market portfolio M. The expected return on the market is denoted E_M and the beta of the market is β_M = 1.

But what about investment P, which represents investing 50% of wealth in the risk-free asset and 50% in the market portfolio? The expected return on this is given by,

$$E[R_P] = 0.5r + 0.5E[R_M]$$

and its beta is given by,

$$\beta_P = \frac{\text{Cov}(0.5r + 0.5R_M,\, R_M)}{\text{Var}(R_M)} = \frac{0.5\,\text{Cov}(r, R_M) + 0.5\,\text{Cov}(R_M, R_M)}{\text{Var}(R_M)} = 0.5$$

We can now join these three investments with a straight line. This line is called the Security Market Line, and from the graph above we can derive its equation as,

$$E_i = r + \beta_i (E_M - r)$$

What if a portfolio did not lie on the straight line but below it? We will call this portfolio S, and suppose it has β = 0.5. No one would want this portfolio, as for the same beta we could get a better expected return by investing in portfolio P. So it is reasonable to conclude that no portfolios would lie below the security market line. A key difference between the Capital Market Line and the Security Market Line is that the CML applies only to efficient portfolios, whereas the SML applies to all portfolios and all securities regardless of their efficiency.
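The beta calculation for portfolio P can be mirrored with simulated data. In this sketch the risk-free rate, E_M and the simulated market returns are hypothetical; because the risk-free return is a constant, the sample beta of the 50/50 portfolio comes out at 0.5 up to rounding, and the SML then gives its expected return.

```python
import random
from statistics import fmean

def beta(r_i, r_m):
    """Sample beta: Cov(R_i, R_M) / Var(R_M)."""
    mi, mm = fmean(r_i), fmean(r_m)
    cov = sum((a - mi) * (b - mm) for a, b in zip(r_i, r_m))
    var = sum((b - mm) ** 2 for b in r_m)
    return cov / var

def sml(b, r, e_m):
    """Expected return on the Security Market Line for beta b."""
    return r + b * (e_m - r)

r, e_m = 0.03, 0.08                                 # hypothetical risk-free rate and E_M
rng = random.Random(3)
r_m = [rng.gauss(e_m, 0.2) for _ in range(1000)]    # simulated market returns
r_p = [0.5 * r + 0.5 * m for m in r_m]              # portfolio P: 50% risk-free, 50% M

b_p = beta(r_p, r_m)
print(b_p, sml(b_p, r, e_m))   # 0.5 and 0.055, up to rounding
```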

Arbitrage Pricing Theory
The Arbitrage Pricing Theory (APT) is an alternative model to CAPM. One of the criticisms of CAPM is that it relies on too many assumptions which turn out to be unrealistic. It is also difficult under CAPM to establish a measure of the Market Portfolio.

APT tries to get round this by having far fewer assumptions. But like CAPM, APT is a model used for pricing assets: it assumes the market is in equilibrium, and we can use the expected return on an investment derived from APT to discount the cashflows on that investment and obtain its price.

APT is an economic model rather than a statistical model because it is based on the economic principle of no arbitrage. However, APT builds on the statistical multi-factor model that we looked at earlier.

Under APT we are interested in the expected value of R_i:

$$E[R_i] = \lambda_0 + b_{i,1}\lambda_1 + b_{i,2}\lambda_2 + \cdots + b_{i,n}\lambda_n$$

The key assumption of APT is that the market is arbitrage free. This means that an investor cannot make a risk-free profit. As mentioned earlier, APT also assumes the market is in equilibrium.

The key result of APT is that all securities and portfolios have expected returns described by the n-dimensional hyperplane given by the formula above.

The graph below helps to explain this key result. Consider a multi-factor model with just 2 factors, I_1 and I_2. So we have,

$$E[R_i] = \lambda_0 + b_{i,1}\lambda_1 + b_{i,2}\lambda_2$$

The intercept on each axis of the graph is found by setting the other two variables to 0; for example, setting b_{i,1} = b_{i,2} = 0 gives E[R_i] = λ_0. The main result of APT says that all assets and all portfolios lie on this plane in three-dimensional space. Consider portfolio P on the graph above, and portfolio Q directly above it. Investors will flock to buy portfolio Q, as it offers a higher expected return for the same factor sensitivities. This extra demand will push its price up, and the higher price will reduce its expected return back down to the level on the plane. This happens because we have assumed the market is arbitrage free: anomalies are quickly corrected.
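The arbitrage argument can be made concrete with a small sketch. Everything here is hypothetical: the λ values, portfolio Q's sensitivities and expected return, and three assets assumed to lie on the plane (one with zero factor sensitivity and one pure-factor asset for each factor). A mix of these assets matches Q's factor sensitivities; buying Q and selling the mix is then a zero-cost position with no factor exposure (ignoring specific risk) but a positive expected return, which is what the market would compete away.

```python
# Hypothetical two-factor APT: E[R] = lam0 + b1 * lam1 + b2 * lam2
lam0, lam1, lam2 = 0.03, 0.02, 0.01

def apt_plane(b1, b2):
    return lam0 + b1 * lam1 + b2 * lam2

# Portfolio Q has sensitivities (b1, b2) but an expected return above the plane.
q_b1, q_b2, q_expected = 0.8, 1.5, 0.075

# Replicate Q's factor sensitivities with three hypothetical assets on the plane:
# a zero-sensitivity asset (b = 0, 0), a pure factor-1 asset (1, 0), a pure factor-2 asset (0, 1).
weights = {"zero": 1 - q_b1 - q_b2, "factor1": q_b1, "factor2": q_b2}
replica_expected = (weights["zero"] * apt_plane(0, 0)
                    + weights["factor1"] * apt_plane(1, 0)
                    + weights["factor2"] * apt_plane(0, 1))

# Buy Q and sell the replica: zero net investment, zero net factor exposure
# (ignoring specific risk), and a positive expected return.
print(round(replica_expected, 4), round(q_expected - replica_expected, 4))
```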

The main disadvantage of APT is that it doesn’t specify the underlying multi-factor model.

Tutorial Day 2
8 Brownian Motion and Martingales
First of all we will cover Standard Brownian Motion and its use in Financial Economics.

It is a continuous-time stochastic process, B_t, with a continuous state space. It has five key defining properties that we will look at:

• B_0 = 0
• It has independent increments, i.e. for s < t, B_t − B_s is independent of F_s (the history of the process up to time s)
• It has stationary increments, i.e. the distribution of B_t − B_s depends only on t − s
• It has Gaussian increments, with B_t − B_s ∼ N(0, t − s)
• It has continuous sample paths.
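The defining properties translate directly into a simulation recipe: start at 0 and add independent N(0, Δt) increments. A computer can only sample the path on a discrete time grid, so this is an approximation of the continuous path; the grid size and number of paths below are arbitrary choices. The sample mean and variance of B_1 should be close to 0 and 1.

```python
import math
import random

def brownian_path(t_end, n_steps, rng):
    """Simulate B on a grid 0 = t_0 < ... < t_n = t_end from independent N(0, dt) increments."""
    dt = t_end / n_steps
    path = [0.0]                                 # B_0 = 0
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

rng = random.Random(11)
ends = [brownian_path(1.0, 50, rng)[-1] for _ in range(20_000)]
mean = sum(ends) / len(ends)
var = sum((b - mean) ** 2 for b in ends) / (len(ends) - 1)
print(f"B_1: sample mean {mean:.3f}, sample variance {var:.3f}")   # close to 0 and 1
```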

It is used as a building block for modelling financial quantities, such as share prices, interest rates and exchange rates, over time. For these we use functions of standard Brownian motion that more realistically reflect how they move. Once we have our process, we can then go on to value derivatives: financial assets whose value depends on the value of some other underlying process.

So maybe we have an insurance policy that pays out in line with some share index, subject to minimum and maximum values. If we have a model for how the underlying index moves, then it becomes easier to value the derivative. Consider a basic call option, which gives the holder the right, but not the obligation, to purchase one unit of the underlying at a particular time in the future for a price that is agreed now. How the underlying moves in the future will determine whether or not the option is exercised.

Another use is to investigate how assets and liabilities move relative to one another. For an insurance company, for example, it is vital that its assets always exceed the value of its liabilities.

Future Probabilities
Let's suppose that we are currently at time $s$ and are using Brownian Motion to model our assets. Say $B_s = a$, and we require the probability that at some future time $t > s$ the Brownian Motion exceeds $b$.

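$$
\begin{aligned}
P[B_t > b \mid B_s = a] &= P[B_t - B_s > b - a] \\
&= P\left[Z > \frac{b-a}{\sqrt{t-s}}\right] \\
&= 1 - \Phi\left(\frac{b-a}{\sqrt{t-s}}\right)
\end{aligned}
$$
where $Z \sim N(0, 1)$ and $\Phi$ is its cumulative distribution function.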
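The final expression is easy to evaluate numerically. This is a minimal sketch that uses only Python's standard `math` module. That module has no normal CDF, so $\Phi$ is written via the error function. The names `std_normal_cdf` and `prob_exceeds` are illustrative, not from the notes.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_exceeds(a: float, b: float, s: float, t: float) -> float:
    """P[B_t > b | B_s = a] for Standard Brownian Motion, requiring s < t."""
    if t <= s:
        raise ValueError("need s < t")
    z = (b - a) / math.sqrt(t - s)
    return 1.0 - std_normal_cdf(z)


p = prob_exceeds(a=0.5, b=1.5, s=1.0, t=2.0)  # (b - a) / sqrt(t - s) = 1
print(round(p, 4))  # 0.1587
```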

Properties of Standard Brownian Motion
Markov - Firstly, Standard Brownian Motion has the Markov Property. This means that, given its current value, its future statistical properties are independent of its past values. We only need its current value to estimate future probabilities.

This follows from the Independent Increments property: future movements $B_t - B_s$ are independent of the whole history $F_s$ up to time $s$, including the current value. So, given the current value, the future is certainly independent of the past.

Martingale - The next property is that Standard Brownian Motion is a Martingale. A Martingale is a
process whose Future Expected Value, given the past history of the process, is just its current value. In
other words, it is expected to go straight ahead on average.

Non-differentiable - Sample Paths of Standard Brownian Motion are Non-differentiable. With many processes, if you zoom in far enough the path becomes nice and smooth and you can work out a gradient. That's not the case with Standard Brownian Motion. For example, the circled series below appears on average to be gradually increasing. Zooming in, however, shows that it is fractal in nature: each zoomed-in section looks as jagged as the whole. No matter how many times we zoom in, the curve will never be smooth.

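Covariance - One of the defining properties of Standard Brownian Motion is that, for $s < t$,

$$B_t - B_s \sim N(0, t - s)$$

We can use this, together with independent increments, to determine the covariance function:

$$
\begin{aligned}
\operatorname{Cov}(B_t, B_s) &= \operatorname{Cov}(B_s + (B_t - B_s), B_s) \\
&= \operatorname{Cov}(B_s, B_s) + \operatorname{Cov}(B_t - B_s, B_s) \\
&= \operatorname{Var}(B_s) + 0 \\
&= s
\end{aligned}
$$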

This derivation assumes $s < t$. If instead $t < s$, the same argument gives $t$. So the covariance of Standard Brownian Motion at two time points is the minimum of the two times: $\operatorname{Cov}(B_t, B_s) = \min(s, t)$.
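We can check $\operatorname{Cov}(B_t, B_s) = \min(s, t)$ quickly by Monte Carlo simulation. The sample size, seed and function name below are our own choices. The sketch draws $B_s$ and the independent increment $B_t - B_s$ exactly as in the derivation, then computes the sample covariance with an $n - 1$ divisor.

```python
import math
import random


def sample_cov_bm(s: float, t: float, n_paths: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Cov(B_t, B_s) for Standard Brownian Motion, s < t.

    Follows the decomposition B_t = B_s + (B_t - B_s) used in the derivation:
    B_s ~ N(0, s) and the increment ~ N(0, t - s) are drawn independently.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_paths):
        b_s = rng.gauss(0.0, math.sqrt(s))
        b_t = b_s + rng.gauss(0.0, math.sqrt(t - s))
        pairs.append((b_s, b_t))
    mean_s = sum(p[0] for p in pairs) / n_paths
    mean_t = sum(p[1] for p in pairs) / n_paths
    return sum((x - mean_s) * (y - mean_t) for x, y in pairs) / (n_paths - 1)


print(sample_cov_bm(0.5, 2.0))  # close to min(0.5, 2.0) = 0.5
```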

Scaling Property - For some constant $c > 0$, we define a new process $B^*(t)$ by

$$B^*(t) = \frac{B(ct)}{\sqrt{c}}$$

So we are squashing up time by a factor of $c$, and at the same time reducing the size of the Brownian Motion by a factor of $\sqrt{c}$. The Scaling Property says that this new process $B^*(t)$ is itself a Standard Brownian Motion. So let's check it has the right properties.

First of all, $B^*$ is a function of Standard Brownian Motion in which we only scale the time and the magnitude. It therefore still starts at 0, has continuous paths, and has independent, Normally distributed increments. A Gaussian process is completely characterised by its mean and covariance functions, so we will check these out.
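$$
\begin{aligned}
E[B^*(t)] &= E[B(ct)]/\sqrt{c} = 0/\sqrt{c} = 0 \\
\operatorname{Cov}(B^*(t), B^*(s)) &= \operatorname{Cov}\left(B(ct)/\sqrt{c},\, B(cs)/\sqrt{c}\right) \\
&= \operatorname{Cov}(B(ct), B(cs))/c \\
&= \min\{ct, cs\}/c \\
&= cs/c = s \quad (s < t)
\end{aligned}
$$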

So overall we have shown that it has the same distribution, mean and covariance as Standard Brownian Motion, so $B^*(t)$ must itself be a Standard Brownian Motion.

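Time Inversion Property - Now we define another new process, for $t > 0$ (with $B^*(0) = 0$),

$$B^*(t) = tB(1/t)$$

As with the scaling property, we will check out the distribution and its properties. $B^*$ is a function of Standard Brownian Motion. Since we are just inverting time and scaling, we still have Normally distributed increments. The expected value and covariance are given by,

$$
\begin{aligned}
E[B^*(t)] &= t\,E[B(1/t)] = t \times 0 = 0 \\
\operatorname{Cov}(B^*(t), B^*(s)) &= \operatorname{Cov}(tB(1/t), sB(1/s)) \\
&= ts \operatorname{Cov}(B(1/t), B(1/s)) \\
&= ts \times \min(1/t, 1/s) = ts \times (1/t) \\
&= s
\end{aligned}
$$

where, for $s < t$, $\min(1/t, 1/s) = 1/t$. So $B^*(t)$ also has the mean and covariance of Standard Brownian Motion.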

General Brownian Motion
Let's suppose we are trying to model Share Prices. We could try to use Standard Brownian Motion for this, in which case the increments are normally distributed: $B_t - B_s \sim N(0, t - s)$. This wouldn't be ideal for share price movements, for several reasons. The main one is that the increments have a mean of 0. This would be bad for share prices, as investors would only buy shares if they went up on average. Also, SBM starts at 0, whereas a model of share prices should start at the current share price.

So we will define a new process and call this $W$:

$$W_t = W_0 + \mu t + \sigma B_t$$

Instead of starting at 0, we start at $W_0$, which we can set equal to the current share price. Rather than having a mean increment of 0, we add some drift to the process through the term $\mu t$: if $\mu$ is positive, we go up by $\mu$ per time period on average. We call $\mu$ the drift of the process. Without a random term the process would be purely deterministic, so we then add on the SBM process. We multiply it by $\sigma$ to get more or less variability from the SBM; $\sigma$ is the Volatility Parameter.

We will now look at increments of General Brownian Motion:

$$W_t - W_s \sim N\left(\mu(t - s), \sigma^2(t - s)\right)$$
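To see the increment distribution of General Brownian Motion numerically, we can sample $W_t - W_s$ directly from its definition. This is a minimal sketch; the function name, parameter values and seed are illustrative assumptions. Note that $W_0$ never enters the calculation.

```python
import math
import random


def general_bm_increments(mu: float, sigma: float, s: float, t: float,
                          n: int, seed: int = 2) -> list[float]:
    """Draw n samples of W_t - W_s for W_t = W_0 + mu*t + sigma*B_t (s < t).

    W_t - W_s = mu*(t - s) + sigma*(B_t - B_s); W_0 cancels, so the
    increments do not depend on where the process started.
    """
    rng = random.Random(seed)
    dt = t - s
    return [mu * dt + sigma * rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]


incs = general_bm_increments(mu=0.1, sigma=0.2, s=1.0, t=3.0, n=100_000)
mean = sum(incs) / len(incs)
var = sum((x - mean) ** 2 for x in incs) / (len(incs) - 1)
print(mean, var)  # roughly mu*(t - s) = 0.2 and sigma^2*(t - s) = 0.08
```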

So now we have more control over our process. We can determine where it starts and also add a positive or
negative drift if we wish and we can also scale the volatility.

Now we will look at 3 graphs and decide which are examples of Brownian Motion (General or Standard), depending on whether they exhibit the properties we have discussed in this chapter.

Martingales
A Stochastic Process is a Martingale if the Expected Future Value of the process at some point, conditional
on the past and present value of the process, is just its current value. So in other words, it is expected to go
straight ahead on average.

Mathematically, if our process is $X_t$,

$$E[X_t \mid F_s] = X_s \quad \text{for all } s < t.$$
Strictly speaking, for $X_t$ to be a martingale we also need one other condition to be satisfied,

$$E[|X_t|] < \infty \quad \text{for all } t$$

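Now let's see how we would go about checking whether a process is a Martingale, taking Standard Brownian Motion as an example. Since $B_s$ is known at time $s$, and the increment $B_t - B_s$ is independent of $F_s$ with mean 0,

$$
\begin{aligned}
E[B_t \mid F_s] &= E[B_s + (B_t - B_s) \mid F_s] \\
&= B_s + E[B_t - B_s] \\
&= B_s
\end{aligned}
$$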

Thus Standard Brownian Motion is a Martingale, as its Expected future value at time t, is just its current
value at time s.

Now we will consider a second example, testing whether or not SBM squared is in fact also a Martingale.
So we start off like we did above, given past history to time s, what is the expected future value at time t of
this process.

 
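$$
\begin{aligned}
E[B_t^2 \mid F_s] &= E\left[(B_s + (B_t - B_s))^2 \mid F_s\right] \\
&= E\left[B_s^2 + 2B_s(B_t - B_s) + (B_t - B_s)^2 \mid F_s\right] \\
&= B_s^2 + 2B_s E[B_t - B_s \mid F_s] + E\left[(B_t - B_s)^2 \mid F_s\right] \\
&= B_s^2 + \operatorname{Var}(B_t - B_s) + \left(E[B_t - B_s]\right)^2 \\
&= B_s^2 + t - s
\end{aligned}
$$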

Since $t - s > 0$, we conclude that $B_t^2$ is not a Martingale.

In fact, if we considered $B_t^2 - t$ as the function of Brownian Motion, we would get an expected future value of $E[B_t^2 - t \mid F_s] = B_s^2 - s$. Hence this would be a Martingale.
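We can check the martingale calculation by simulation. The sketch below is a Monte Carlo illustration, not a proof. It conditions only on the value $B_s = b_s$, which is enough here because of the Markov property. The function name and sample size are our own choices.

```python
import math
import random


def conditional_mean_sq(b_s: float, s: float, t: float,
                        n: int = 200_000, seed: int = 3) -> float:
    """Monte Carlo estimate of E[B_t^2 | B_s = b_s] for Standard Brownian Motion."""
    rng = random.Random(seed)
    sd = math.sqrt(t - s)
    total = 0.0
    for _ in range(n):
        b_t = b_s + rng.gauss(0.0, sd)  # B_t = B_s + independent N(0, t - s) increment
        total += b_t * b_t
    return total / n


est = conditional_mean_sq(b_s=1.0, s=1.0, t=3.0)
print(est)        # close to B_s^2 + (t - s) = 3, not B_s^2 = 1
print(est - 3.0)  # estimate of E[B_t^2 - t | B_s = 1], close to B_s^2 - s = 0
```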

We use the same process for all Martingale questions.

9 Stochastic Calculus and Ito Processes
SDEs for Functions of Standard Brownian Motion
In order to work out Stochastic Differential Equations (SDEs) for functions of Standard Brownian Motion, we need to be aware of the following 2 x 2 multiplication grid. It involves the change in time over a very small period, $dt$, and the change in Standard Brownian Motion over the same small period, $dB_t$. The change in time is so small that multiplying it by anything else small gives 0. The change in SBM, $dB_t$, is also very small, but not so small that multiplying it by itself gives 0: when you square $dB_t$ you get $dt$.

$$
\begin{array}{c|cc}
\times & dt & dB_t \\
\hline
dt & 0 & 0 \\
dB_t & 0 & dt
\end{array}
$$

In order to calculate SDEs for functions of SBM we also need to be aware of another result, Taylor's formula. It tells us that the change in a function of a Stochastic Process $X_t$ is given as,

$$df(X_t) = f'(X_t)\,dX_t + \tfrac{1}{2} f''(X_t)\,(dX_t)^2$$

Make sure you know Taylor’s Formula and the grid above!!
For functions of Standard Brownian Motion we can also apply Taylor’s Formula.

$$df(B_t) = f'(B_t)\,dB_t + \tfrac{1}{2} f''(B_t)\,(dB_t)^2$$
Example of Finding SDEs for a Function of Standard Brownian Motion

If our function is given as,

$$f(B_t) = \sin(B_t)$$

we obtain the following derivatives,

$$f'(B_t) = \cos(B_t) \quad \text{and} \quad f''(B_t) = -\sin(B_t).$$

So, using Taylor's formula with $(dB_t)^2 = dt$, it follows that,

$$d(\sin(B_t)) = \cos(B_t)\,dB_t - \tfrac{1}{2}\sin(B_t)\,dt$$
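To see the $-\tfrac{1}{2}\sin(B_t)\,dt$ correction at work, we can discretise the SDE along one simulated path and compare the accumulated sum with $\sin(B_T)$ computed directly. This is a minimal sketch assuming a fine even grid and left-point (Ito) sums; the function name is ours. Since $\sin(B_0) = 0$, the two numbers should agree up to discretisation error.

```python
import math
import random


def ito_sin_check(t_max: float = 1.0, n_steps: int = 100_000, seed: int = 4):
    """Compare sin(B_T) with a discretised version of the SDE for sin(B_t).

    Along one simulated path, accumulate sum(cos(B) dB) - 0.5 * sum(sin(B) dt),
    evaluating the coefficients at the left end of each step (Ito convention).
    Returns (sin(B_T), accumulated sum).
    """
    rng = random.Random(seed)
    dt = t_max / n_steps
    b = 0.0  # B_0 = 0, so sin(B_0) = 0
    ito_sum = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += math.cos(b) * db - 0.5 * math.sin(b) * dt
        b += db
    return math.sin(b), ito_sum


exact, ito = ito_sin_check()
print(exact, ito)  # the two values agree closely
```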

Stochastic Differential Equations for Diffusion Processes
A diffusion process is any process that satisfies the following Stochastic Differential Equation.

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t$$

We can see that the change in the process itself, $dX_t$, is some kind of drift term times the change in time, $dt$, plus some kind of volatility term times the change in SBM, $dB_t$. All diffusion processes can be defined in this way.

Now we will consider an example of a function of $X_t$. Let,

$$f(X_t) = \sin(X_t), \quad \text{so} \quad f'(X_t) = \cos(X_t) \quad \text{and} \quad f''(X_t) = -\sin(X_t)$$

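So, using $(dX_t)^2 = \sigma^2(X_t)\,dt$ from the multiplication grid,

$$
\begin{aligned}
d(\sin(X_t)) = df(X_t) &= f'(X_t)\,dX_t + \tfrac{1}{2} f''(X_t)\,(dX_t)^2 \\
&= \cos(X_t)\{\mu(X_t)\,dt + \sigma(X_t)\,dB_t\} - \tfrac{1}{2}\sin(X_t)\,\sigma^2(X_t)\,dt \\
&= \left[\mu(X_t)\cos(X_t) - \tfrac{1}{2}\sin(X_t)\,\sigma^2(X_t)\right]dt + \cos(X_t)\,\sigma(X_t)\,dB_t
\end{aligned}
$$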

By collecting together the $dt$ and $dB_t$ terms, we get it back in the form of a diffusion process. The coefficient of $dt$ is the drift of $f(X_t)$, whereas the coefficient of $dB_t$ is its volatility.

Derivation of Ito’s Lemma

Below we will demonstrate how using Ito’s Lemma is the same as using Taylor’s Formula and the 2 by 2
multiplication grid.

We start with a diffusion process $X_t$ as defined above. But instead of drift and volatility functions of $X_t$ only, we now have functions of two variables, $X_t$ and $t$, and the same goes for $f$. We have,

$$
\begin{aligned}
df(X_t, t) &= f'_X(X_t, t)\,dX_t + \tfrac{1}{2} f''_X(X_t, t)\,(dX_t)^2 + f'_t(X_t, t)\,dt \\
&= f'_X(X_t, t)\left(\mu(X_t, t)\,dt + \sigma(X_t, t)\,dB_t\right) \\
&\quad + \tfrac{1}{2} f''_X(X_t, t)\,\sigma^2(X_t, t)\,dt + f'_t(X_t, t)\,dt \\
&= \left[\mu(X_t, t) f'_X(X_t, t) + \tfrac{1}{2}\sigma^2(X_t, t) f''_X(X_t, t) + f'_t(X_t, t)\right]dt + \sigma(X_t, t) f'_X(X_t, t)\,dB_t
\end{aligned}
$$
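Once the partial derivatives are known, the drift and volatility from Ito's Lemma can be computed mechanically. In this minimal sketch, `ito_coefficients` is a hypothetical helper. The caller must supply $f'_X$, $f''_X$ and $f'_t$ by hand, since no symbolic differentiation is done. The example reproduces the $\sin(B_t)$ result at $B_t = \pi/2$.

```python
import math
from typing import Callable

Fn = Callable[[float, float], float]  # a function of (x, t)


def ito_coefficients(f_x: Fn, f_xx: Fn, f_t: Fn, mu: Fn, sigma: Fn,
                     x: float, t: float) -> tuple[float, float]:
    """Drift and volatility of df(X_t, t) at the point (x, t), by Ito's Lemma.

    For dX_t = mu(X_t, t) dt + sigma(X_t, t) dB_t:
        drift = mu * f_x + 0.5 * sigma^2 * f_xx + f_t
        vol   = sigma * f_x
    """
    m, s = mu(x, t), sigma(x, t)
    drift = m * f_x(x, t) + 0.5 * s * s * f_xx(x, t) + f_t(x, t)
    vol = s * f_x(x, t)
    return drift, vol


# f = sin(x) with X = Standard Brownian Motion (mu = 0, sigma = 1).
drift, vol = ito_coefficients(
    f_x=lambda x, t: math.cos(x),
    f_xx=lambda x, t: -math.sin(x),
    f_t=lambda x, t: 0.0,
    mu=lambda x, t: 0.0,
    sigma=lambda x, t: 1.0,
    x=math.pi / 2, t=0.0,
)
print(drift, vol)  # drift = -0.5 * sin(pi/2) = -0.5; vol = cos(pi/2), about 0
```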

Solving the Geometric Brownian Motion SDE
The SDE for Geometric Brownian Motion is given on page 46 of the tables and is defined as,

$$dS_t = S_t\left[\mu\,dt + \sigma\,dB_t\right]$$

It is very similar to the SDE for General Brownian Motion: we have a drift term $\mu$ and a volatility term $\sigma$. But with Geometric Brownian Motion, both the drift and the volatility are multiplied by the process itself, $S_t$.

The way you solve this SDE is to let,

$$f(S_t) = \log(S_t)$$

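We then apply Taylor's formula to this function, using $(dS_t)^2 = \sigma^2 S_t^2\,dt$,

$$
\begin{aligned}
d\log(S_t) = df(S_t) &= f'(S_t)\,dS_t + \tfrac{1}{2} f''(S_t)\,(dS_t)^2 \\
&= S_t^{-1} S_t\left[\mu\,dt + \sigma\,dB_t\right] - \tfrac{1}{2} S_t^{-2}\sigma^2 S_t^2\,dt \\
&= \left(\mu - \tfrac{1}{2}\sigma^2\right)dt + \sigma\,dB_t
\end{aligned}
$$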

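We now change the letter $t$ to the letter $w$, because we want to integrate between 0 and $t$. Using $B_0 = 0$,

$$
\begin{aligned}
\int_0^t d\log(S_w) &= \int_0^t \left(\mu - \tfrac{1}{2}\sigma^2\right)dw + \int_0^t \sigma\,dB_w \\
\left[\log(S_w)\right]_0^t &= \left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma\left[B_w\right]_0^t \\
\log(S_t) - \log(S_0) &= \left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma B_t
\end{aligned}
$$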


 

Rearranging, we get the solution

$$S_t = S_0 \exp\left[\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma B_t\right]$$
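Because the solution is explicit, $S_t$ can be sampled without stepping through time. This is a minimal sketch with illustrative parameter values and a hypothetical function name. The final check uses the standard GBM result $E[S_t] = S_0 e^{\mu t}$, which is not derived in these notes.

```python
import math
import random


def gbm_samples(s0: float, mu: float, sigma: float, t: float,
                n: int, seed: int = 5) -> list[float]:
    """Sample S_t = S_0 exp((mu - sigma^2/2) t + sigma B_t) with B_t ~ N(0, t).

    This is the explicit solution of dS_t = S_t (mu dt + sigma dB_t), so no
    time-stepping is needed to sample the value at a single time t.
    """
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * t
    return [s0 * math.exp(drift + sigma * rng.gauss(0.0, math.sqrt(t)))
            for _ in range(n)]


samples = gbm_samples(s0=100.0, mu=0.08, sigma=0.2, t=1.0, n=200_000)
print(min(samples) > 0)               # the solution never goes negative
print(sum(samples) / len(samples))    # close to S_0 * exp(mu * t), about 108.33
```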

Solving the Ornstein-Uhlenbeck SDE
The SDE for the Ornstein-Uhlenbeck process is defined below as,

$$dX_t = -\gamma X_t\,dt + \sigma\,dB_t$$

The way to solve the Ornstein-Uhlenbeck process is to start with this function of the variables $X_t$ and $t$,

$$f(X_t, t) = X_t e^{\gamma t}$$

The coefficient of $t$ in the exponent is minus the coefficient of $X_t$ in the drift term.

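We then apply Taylor's formula in two variables to this function. Here $f'_X = e^{\gamma t}$, $f''_X = 0$ and $f'_t = \gamma X_t e^{\gamma t}$, so

$$
\begin{aligned}
d\left(X_t e^{\gamma t}\right) = df(X_t, t) &= f'_X(X_t, t)\,dX_t + \tfrac{1}{2} f''_X(X_t, t)\,(dX_t)^2 + f'_t(X_t, t)\,dt \\
&= e^{\gamma t}\left(-\gamma X_t\,dt + \sigma\,dB_t\right) + 0 + \gamma X_t e^{\gamma t}\,dt \\
&= \sigma e^{\gamma t}\,dB_t
\end{aligned}
$$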

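We then change the letter $t$ to the letter $s$, because we are going to integrate over the range 0 to $t$.

$$
\begin{aligned}
\int_0^t d\left(X_s e^{\gamma s}\right) &= \sigma\int_0^t e^{\gamma s}\,dB_s \\
X_t e^{\gamma t} - X_0 &= \sigma\int_0^t e^{\gamma s}\,dB_s
\end{aligned}
$$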

Hence, upon rearranging, we have the solution

$$X_t = X_0 e^{-\gamma t} + \sigma\int_0^t e^{-\gamma(t-s)}\,dB_s$$

The original SDE tells us how $X_t$ behaves over a very small period of time. We can see the drift component, which is proportional to the process itself, and the volatility, which is constant. In this so-called solution the first term is fine, but the second term is an Ito integral. We may ask how this is any more intuitive than the original SDE.

Recall that $dB_s$ is the change in Standard Brownian Motion over a very small period of time,

$$dB_s = B_{s+ds} - B_s \sim N(0, ds)$$

The Ito integral is a sum of independent increments, each with mean 0. So for the mean of $X_t$ we have,

$$E[X_t] = X_0 e^{-\gamma t} + 0$$

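Now look at the variance of $X_t$. Each increment $\sigma e^{-\gamma(t-s)}\,dB_s$ contributes variance $\sigma^2 e^{-2\gamma(t-s)}\,ds$, and we obtain,

$$\operatorname{Var}(X_t) = 0 + \sigma^2\int_0^t e^{-2\gamma(t-s)}\,ds = \frac{\sigma^2}{2\gamma}\left(1 - e^{-2\gamma t}\right)$$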

Note that the variance of a sum equals the sum of the variances only when the random variables are independent, as the increments $dB_s$ are here.
So we can work out the mean and the variance of $X_t$. We also know that $X_t$ is normally distributed, being a linear combination of independent normal random variables.

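So, when we say that we have 'solved' the Ornstein-Uhlenbeck process, we mean that we know that,

$$X_t \sim N\left(X_0 e^{-\gamma t},\ \frac{\sigma^2}{2\gamma}\left(1 - e^{-2\gamma t}\right)\right)$$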
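The mean and variance above can be packaged as a function. This is a minimal sketch assuming $\gamma > 0$; the function name and parameter values are illustrative. The usage example compares the closed-form variance with a midpoint-rule evaluation of the integral $\int_0^t e^{-2\gamma(t-s)}\,ds$.

```python
import math


def ou_mean_var(x0: float, gamma: float, sigma: float, t: float) -> tuple[float, float]:
    """Mean and variance of X_t for dX_t = -gamma X_t dt + sigma dB_t, X_0 = x0 (gamma > 0)."""
    mean = x0 * math.exp(-gamma * t)
    # Closed form of sigma^2 * integral_0^t exp(-2 gamma (t - s)) ds.
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * gamma * t)) / (2.0 * gamma)
    return mean, var


m, v = ou_mean_var(x0=1.0, gamma=0.5, sigma=0.3, t=2.0)

# Check the closed form against a midpoint-rule evaluation of the integral.
n = 10_000
h = 2.0 / n
integral = sum(math.exp(-2 * 0.5 * (2.0 - (k + 0.5) * h)) * h for k in range(n))
print(m, v, 0.3 ** 2 * integral)  # mean = exp(-1); the two variance figures agree closely
```

As $t$ grows, the mean decays to 0 and the variance tends to $\sigma^2 / (2\gamma)$.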

10 Stochastic Models of Security Prices
Log-normal Model for Modelling Share Prices
Let's suppose we want to come up with a model for Share Prices. We might think about using Standard Brownian Motion, $B_t$. We would then have increments that are normally distributed with mean 0 and variance $t - s$. This wouldn't be good. First, $B_0 = 0$, which isn't the case for share prices. Second, there is zero drift in the increments, whereas we would expect share prices to increase on average over time.

We may go on to then consider General Brownian Motion,

$$W_t = W_0 + \mu t + \sigma B_t$$

With this process, we can start wherever we like: we can set $W_0$ equal to the current share price. We also have $\mu$ and $\sigma$, so we can choose a particular drift and volatility. The increments have the following distribution,

$$W_t - W_s \sim N\left(\mu(t - s), \sigma^2(t - s)\right)$$


It should be noted that the mean and variance are independent of $W_s$. This doesn't seem realistic: the future share price should be related somewhat to the current share price, and we would expect a bigger mean movement for larger starting values of the share. Additionally, General BM may not be a good model because it can go negative, since $B_t$ can go negative.

To get round these two issues we can use the Lognormal model, defined as,

$$S_t = e^{W_t}$$

By introducing exponentials, the process can never go negative, and we will also see that increments are proportional to the current share price. With this model, the log of $S_t$ is equal to $W_t$. So the increments in the logs of the share price have the same distribution as the increments of $W$,

$$\log(S_t) - \log(S_s) = \log\left(\frac{S_t}{S_s}\right) \sim N\left(\mu(t - s), \sigma^2(t - s)\right)$$

Therefore it follows that the ratio of future share price to current share price has the distribution,

$$\frac{S_t}{S_s} \sim \text{LogNormal}\left(\mu(t - s), \sigma^2(t - s)\right)$$

These parameters are still independent of the starting share price $S_s$, but as this is a model for the Percentage Growth, this is fine. We can see that the share price at time $t$ is going to be,

$$S_t = S_s e^{\mu(t-s) + \sigma(B_t - B_s)}$$

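So the log-increments have mean and variance proportional to $t - s$. They are Gaussian and stationary, and they are independent of $F_s$: they do not depend on previous or current values of the share price. Given $S_s$, the mean and variance of the share price at time $t$ are

$$
\begin{aligned}
E[S_t] &= S_s e^{\mu(t-s) + \frac{1}{2}\sigma^2(t-s)} \\
\operatorname{Var}[S_t] &= E[S_t]^2\left(e^{\sigma^2(t-s)} - 1\right)
\end{aligned}
$$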
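The two moment formulas can be evaluated directly, and the mean can be checked by Monte Carlo. This is a minimal sketch; the function name, parameter values and seed are illustrative. Here $\mu$ and $\sigma$ are the drift and volatility of $\log S_t$ (the parameters of $W$), not of the share price itself.

```python
import math
import random


def lognormal_share_moments(s_s: float, mu: float, sigma: float,
                            tau: float) -> tuple[float, float]:
    """E[S_t] and Var[S_t] given S_s = s_s in the lognormal model, with tau = t - s."""
    mean = s_s * math.exp(mu * tau + 0.5 * sigma ** 2 * tau)
    var = mean ** 2 * (math.exp(sigma ** 2 * tau) - 1.0)
    return mean, var


mean, var = lognormal_share_moments(s_s=50.0, mu=0.05, sigma=0.25, tau=2.0)

# Monte Carlo check: S_t = S_s * exp(mu * tau + sigma * (B_t - B_s)), B_t - B_s ~ N(0, tau).
rng = random.Random(8)
sims = [50.0 * math.exp(0.05 * 2.0 + 0.25 * math.sqrt(2.0) * rng.gauss(0.0, 1.0))
        for _ in range(200_000)]
print(mean, sum(sims) / len(sims))  # the two means agree closely
print(math.sqrt(var))               # standard deviation of S_t
```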

Cross-sectional and Longitudinal Properties
Now we will consider two ways of looking at the properties of a time series model. These are called Cross-sectional and Longitudinal properties. By 'properties' we mean things like the statistical distribution and its parameters.

To illustrate these concepts, we're going to look at an example relating to the Wilkie model of inflation. Below is a table of data representing 10 runs (or simulations) of inflation, denoted $I(t)$. Each column represents a run, and each row represents a projection of inflation for a different year. The details of the formula used for the projections are not needed.

Suppose that someone has asked us to estimate the volatility, or standard deviation, of 1-year inflation. One way of doing this would be to fix on a particular row, say $I(1)$, and work out the standard deviation of the figures in this row. We could work out the volatility using this formula,

$$\sqrt{\frac{1}{10 - 1}\sum_{i=1}^{10}(x_i - \bar{x})^2} = 3.84\%$$

Here the $x_i$ represent the 10 inflation figures in the row $I(1)$. If we put in the actual figures from the $I(1)$ row, the sample standard deviation is 3.84%. This figure is called the 1-year cross-sectional volatility: we have fixed a time horizon and calculated the standard deviation across all simulations.

Another way of working out the volatility of 1-year inflation would be to pick a particular run (simulation). Let's pick run 1. We have,

$$
\begin{aligned}
I(1) &= 1.2\% \mid I(0) = 3\% \\
I(2) &= 8.9\% \mid I(1) = 1.2\% \\
&\;\;\vdots \\
I(10) &= 10.9\% \mid I(9) = 9.4\%
\end{aligned}
$$

Note that these are all estimates of 1-year inflation, but each is generated from a different starting point.

Once again we can estimate the standard deviation, or volatility, using the formula below. This time the $x_i$ represent the 10 inflation figures in run 1 ($I(1)$ to $I(10)$). If we put in the figures from this column, the sample standard deviation is 4.09%. This is called the 1-year longitudinal volatility: we have picked a simulation and looked at 1-year inflation sampled repeatedly along it.

$$\sqrt{\frac{1}{10 - 1}\sum_{i=1}^{10}(x_i - \bar{x})^2} = 4.09\%$$

The estimate of the longitudinal volatility is 4.09%, whereas the estimate of the cross-sectional volatility was 3.84%. The longitudinal volatility is GREATER than the cross-sectional volatility, and this is what we should expect. The cross-sectional distribution of 1-year inflation figures depends on the initial conditions: it is based on a set of 1-year inflation figures that all have the same initial condition, $I(0) = 3\%$.

The longitudinal volatility is based on a set of 1-year inflation figures that have different starting conditions as you move down the column. This causes additional variability in the longitudinal estimate. The longitudinal estimate is almost independent of the initial condition $I(0) = 3\%$. As the time horizon increases, it should converge to some limiting distribution, also called an ergodic distribution.

Finally, let's look at the relationship between the two types of volatility by considering inflation 20 years ahead instead of 1 year ahead. This time the cross-sectional volatility would be based on the row $I(20)$.

However, by the time we get to this row, the impact of the initial condition $I(0) = 3\%$ will be greatly reduced. For inflation 20 years ahead, the estimates of the cross-sectional and longitudinal volatilities should be closer together than the estimates for 1-year inflation.
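A toy model reproduces the effect of the initial condition. The sketch below is not the Wilkie model. It is a hypothetical AR(1) process, $I(t) = m + a(I(t-1) - m) + \sigma Z_t$, with made-up parameters ($m = 4\%$, $a = 0.6$, $\sigma = 4\%$, $I(0) = 3\%$). For this process, the 1-year cross-sectional standard deviation is exactly $\sigma$ (4% here), while the limiting (ergodic) standard deviation is $\sigma/\sqrt{1-a^2}$ (5% here). The sketch uses many more runs and a much longer single run than the 10-by-10 table, so that sampling noise does not hide the pattern. `statistics.stdev` uses the same $n - 1$ divisor as the formulas above.

```python
import random
import statistics


def simulate_ar1(n_runs: int, n_years: int, i0: float, m: float, a: float,
                 sd: float, seed: int = 6) -> list[list[float]]:
    """Hypothetical AR(1) stand-in for an inflation model (NOT the Wilkie formula).

    Each run is [I(1), ..., I(n_years)] with I(t) = m + a (I(t-1) - m) + sd * Z_t,
    all started from the same initial condition I(0) = i0.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        prev, path = i0, []
        for _ in range(n_years):
            prev = m + a * (prev - m) + sd * rng.gauss(0.0, 1.0)
            path.append(prev)
        runs.append(path)
    return runs


params = dict(i0=0.03, m=0.04, a=0.6, sd=0.04)
runs = simulate_ar1(n_runs=5000, n_years=20, **params)
cross_1 = statistics.stdev([run[0] for run in runs])    # fixed row I(1)
cross_20 = statistics.stdev([run[19] for run in runs])  # fixed row I(20)
long_run = simulate_ar1(n_runs=1, n_years=5000, **params)[0]
longitudinal = statistics.stdev(long_run)               # along a single run
print(cross_1, cross_20, longitudinal)  # roughly 0.04, 0.05, 0.05
```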

When Wilkie chose the parameters for his model, he looked at many years of historical data, from 1923 to 1994. This is equivalent to looking at a single simulation. The parameter estimates used in the Wilkie model are therefore based on a longitudinal distribution, and they reflect the average of the economic conditions over that time period.

As such, longitudinal parameter estimates are appropriate if we want to project an average over all future economic conditions. They are much less appropriate if we want to project 1-year figures starting from a given set of market conditions.

11 Introduction to the Valuation of Derivative Securities
Definitions, Notations and Terminology
We will start with the definition of a Derivative: a financial asset whose value depends on the value of some other underlying asset or process.

The main types of derivatives are Futures, Forwards, Options, and Swaps.

With a Future or a Forward there is a commitment for one party to buy or sell the underlying on terms
that are agreed at outset.

A Future is an agreement traded on an established exchange. A Forward is where two parties get together privately and agree terms.

With an Option, as the name suggests, you have a choice of whether to go ahead with the agreement or
not.

Finally, Swaps are another common type of derivative, where two counterparties agree to exchange cashflows on certain terms in the future.

The main use of Derivatives is to REDUCE RISK, that is, to protect yourself from adverse events. The underlying of a derivative contract can be almost anything. Some underlyings are assets, but others, for example interest rates and weather conditions, aren't actual assets. As long as precise terms can be specified, that's fine.

Moving onto Options we can have CALLS or PUTS. A Call Option is the right to BUY an asset in the
future for an amount agreed now, but we don’t have to. Similarly a Put Option is the right to SELL an
asset in the future for an amount agreed now, but again you have a choice.

Options can be European, American or Bermudan in nature.

European Options can be exercised at EXPIRY ONLY.


American Options can be exercised at ANY TIME up until expiry.
Bermudan Options can be exercised at SET TIMES ONLY.
Now let's look at some terminology.

The VALUE or the PRICE of an option is what we are willing to pay for it, and this can be broken down into two components: INTRINSIC and TIME.

The Intrinsic value is what you would get if you could Exercise the Option NOW, subject to a minimum of 0. If you exercised a Call option now, you would receive an asset worth $S_t$ in exchange for $K$. If you exercised a Put option now, you would sell the asset worth $S_t$ and receive $K$ in return. Hence the intrinsic values are given as,

$$\text{IV}_{\text{call}} = \max(S_t - K, 0), \qquad \text{IV}_{\text{put}} = \max(K - S_t, 0)$$

The Time Value is just the BALANCING ITEM: it is whatever is left when you take the Intrinsic value (IV) from the Total Value of the option, $c_t$ or $p_t$.

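This leads us on to the Moneyness of an Option. We say that an option is either in the money, out of the money or at the money, and the conditions depend on whether we are considering a Call or a Put Option. A Call is in the money if $S_t > K$ and out of the money if $S_t < K$. A Put is in the money if $S_t < K$ and out of the money if $S_t > K$. Either is at the money if $S_t = K$.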
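The definitions of intrinsic value, time value and moneyness can be written as small functions. In this minimal sketch the function names and example prices are hypothetical and serve only to illustrate the arithmetic. The put example shows a negative time value for an in-the-money European put, a case discussed later in this chapter.

```python
def intrinsic_value(s: float, k: float, kind: str) -> float:
    """Intrinsic value at share price s: max(s - k, 0) for a call, max(k - s, 0) for a put."""
    if kind == "call":
        return max(s - k, 0.0)
    if kind == "put":
        return max(k - s, 0.0)
    raise ValueError("kind must be 'call' or 'put'")


def time_value(price: float, s: float, k: float, kind: str) -> float:
    """Time value as the balancing item: total option price minus intrinsic value."""
    return price - intrinsic_value(s, k, kind)


def moneyness(s: float, k: float, kind: str) -> str:
    """Classify the option as in, at or out of the money."""
    if s == k:
        return "at the money"
    return "in the money" if intrinsic_value(s, k, kind) > 0 else "out of the money"


# Hypothetical prices, for illustration only.
print(time_value(24.5, 120.0, 100.0, "call"))  # 24.5 - 20.0 = 4.5
print(time_value(38.0, 60.0, 100.0, "put"))    # 38.0 - 40.0 = -2.0 (negative)
print(moneyness(60.0, 100.0, "put"))           # in the money
```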

The other terminology that we need is the idea of having a LONG or SHORT position. We have the long position on an option if we are the HOLDER, that is, if we have the choice to exercise or not. We have the short position if we are the WRITER of the option, that is, if we have given someone else the choice to exercise.

The typical notation used when dealing with derivatives is discussed below.

Payoffs and Position Diagrams
We use $K$ for the Strike (or Exercise) price of the option and $S_T$ for the price of the asset at expiry.

For a CALL option, we have the right to BUY the asset at expiry for $K$. So if $K < S_T$ we would exercise and our payoff would be $S_T - K$. If the asset was worth less than $K$, there would be no point in exercising and we would receive no payoff. So the payoff is $\max(S_T - K, 0)$.

For a PUT option, we have the right to SELL the asset for $K$. If we do so, our payoff will be $K - S_T$. We would only do this if $K > S_T$; otherwise we would do nothing. So the payoff is $\max(K - S_T, 0)$.

These are the mathematical definitions, but it is also useful to look at what they mean graphically. There are four cases to consider: a long position in a call or a put, and a short position in a call or a put. These are graphs of Payoff versus Price of Asset.

We will start by considering a Long Call option, so we have bought one of these and we have the choice
of whether to exercise or not. If the asset price ends up less than K, there is no point paying K for the
asset. We wouldn’t exercise the option and therefore get 0 payoff. As soon as the asset price rises above K
it becomes worth our while to pay K for it and therefore our payoff increases as S increases.

For a Short Call we are looking at the opposite position to the Long Call, i.e. from the writer's point of view. This is the mirror image of the long call in the x-axis, since any payoff the holder receives has to come from us, the writer.

For a Long Put, if the asset price ends up above $K$, we would be better off selling the asset in the open market than receiving $K$ for it. So there would be no payoff for higher asset prices. But for asset prices below $K$, we would exercise and sell the asset for $K$, getting a higher payoff the lower the asset price.

For a Short Put, again this is the reflection of the Long Put in the x-axis.
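The four payoff diagrams can be tabulated from the two payoff formulas. This is a minimal sketch; `payoff` is a hypothetical helper, and the short position is simply the negative of the long one.

```python
def payoff(s_T: float, k: float, kind: str, position: str) -> float:
    """Payoff at expiry of one option with strike k, for a long or short position."""
    if kind == "call":
        holder = max(s_T - k, 0.0)
    elif kind == "put":
        holder = max(k - s_T, 0.0)
    else:
        raise ValueError("kind must be 'call' or 'put'")
    if position == "long":
        return holder
    if position == "short":
        # The writer pays whatever the holder receives (0.0 - x avoids printing -0.0).
        return 0.0 - holder
    raise ValueError("position must be 'long' or 'short'")


# Tabulate the four payoff diagrams at a few asset prices, with K = 100.
for s_T in (80.0, 100.0, 120.0):
    row = [payoff(s_T, 100.0, kind, pos)
           for kind in ("call", "put") for pos in ("long", "short")]
    print(s_T, row)  # [long call, short call, long put, short put]
```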

We may also be interested in the Position Diagram. For this we replace Payoff by Profit which is net of
the money we paid for the option in the first place.

Let us consider a Long Call. All we need to do is adapt the payoff diagram to take off the premium that we paid for the option, so the graph moves downwards by $c$. For low asset prices we don't exercise the option, but we still had to pay for it, so we make an overall loss of $c$.

As soon as the asset price is above $K$, we would exercise. We would get a payoff, but we still had to pay the premium, so we get the line shown above, crossing the axis at $K + c$. We would just break even at this point: if the asset price ends up at $K + c$, our payoff would be $S_T - K = K + c - K = c$. When we take off the premium we have paid, we get down to a profit of 0 as shown.

If the asset price was between K and K + c we would exercise the option but we would still end up with a
loss because the payoff is less than the premium paid.
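The long call position diagram follows by subtracting the premium from the payoff. This is a minimal sketch with a hypothetical strike and premium. As in the position diagram above, the premium is subtracted without adding interest.

```python
def long_call_profit(s_T: float, k: float, premium: float) -> float:
    """Profit at expiry of a long call: payoff max(S_T - K, 0) less the premium c."""
    return max(s_T - k, 0.0) - premium


k, c = 100.0, 5.0  # hypothetical strike and premium
for s_T in (90.0, 100.0, 102.0, 105.0, 110.0):
    print(s_T, long_call_profit(s_T, k, c))
# Loss of c below K, a smaller loss between K and K + c, break-even at K + c = 105.
```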

We can also make position diagrams for the other scenarios but I will leave this as an EXERCISE.

Intrinsic Value and Time Value, Call and Put Option Graphs
When people refer to the value of an option, they usually refer to the Market Value or the Total Value
someone would pay to buy the option.

The Intrinsic and Time Values for Put and Call options were defined earlier in this chapter. This section discusses what they look like graphically.

The graph below shows the value of a European Call Option. We have the option price as a function of
the underlying asset price. This option happens to be a 1-year option with a strike price K = 100. We can
break this value down into two parts. The dotted line shows the intrinsic value of the option.

So for asset prices below 100, the value of the asset is less than the strike price and therefore there is no
intrinsic value. But if the price is above 100, the intrinsic value goes up at 45 degrees. That leaves the time
value, the gap between the lines. It shows the extra value that we’ve got from not having to decide now
whether we buy the asset or not.

For very low asset prices, our option is out of the money and there is very little chance of wanting to exercise it in the future, so the time value is very low. At the other end of the graph, we are very likely to exercise. Again the time value of waiting (not having to decide now) isn't that great, as we will probably exercise anyway. At asset prices around 100, Time Value is at its greatest, because around these values we're unsure whether we will exercise or not. So the extra value that the option gives us, because we can wait and decide later, is at its highest.

Now we will consider a European Put Option. In this case for high asset prices we have no intrinsic value
as the option is out-of-the-money. For low asset prices the option is in-the-money and the intrinsic value
increases at 45 degrees as we go to the left. As before, the time value is the difference between these 2 lines.
To the right of the graph we have positive time value as before, but interestingly on the left, for asset prices
below about 67, we have negative time value. This occurs when the intrinsic value is higher than the total
value of the option. So the balancing item, the time value is negative.

This is because, for low asset prices, our option is deep in the money: if we could exercise it now, we would get a big payoff. As it is a European option, we can't exercise now; we have to wait until expiry. During this wait there is not much further for the asset price to fall, so there is not much to gain. However, the asset price could rise again, possibly taking us out of the money so that we lose our payoff.

Factors Affecting Option Prices
In the table below we list all the factors that affect option prices, together with the effect that increasing each factor has on the value of a call and a put.

As the Payoff for a Call option is given as $\max(S_T - K, 0)$, if the price of the underlying increases, the value of the call option increases. This has the opposite effect on the put option.

The Strike price K has the opposite impact.

If we increase time to expiry, we have a longer period for the share price to move in our favour. At the same
time we have protection against the share price moving against us. So generally we would expect value of
call and put to increase.

By a similar argument, if we increase the volatility of the underlying, it is more likely that the asset will move in our favour. We still have protection against the asset moving against us, so this is good for our option.

If we exercise a call option, we’re going to need to pay K for the asset. The PV of K will be reduced if
we increase the discount rate r. Therefore having a higher r increases the value of our Call option. The
opposite happens for a Put option.

For a call option, we are going to receive the asset if we exercise the option. If the dividend rate increases, the capital value of the share decreases, so we are receiving something less valuable. For a put option, increasing the dividend rate also reduces the capital value. But as we are giving up the share and receiving a fixed $K$ for it, this is good news, so the value of our Put Option would increase.

Bounds for Option Prices
We will now consider the upper and lower bounds for Call and Put Options, both American and European. Let's suppose our underlying is a non-dividend paying share and we want to derive the various bounds for option values on this share.

We will start with a Call option. Let's consider a portfolio at the current time $t$. Since we want the bound for a Call option, we had better have one in our portfolio; we will go for a European option, which has value $c_t$. If we want to exercise our option we will need the cash, so together with the option we will hold the PV of that cash, which is $Ke^{-r(T-t)}$. Now we wait and see what happens by the time we get to maturity, $T$.

There are no dividends or other cashflows from $t$ to $T$, so we will just look at what these assets are worth. At time $T$ the portfolio is worth $\max(S_T - K, 0) + K = \max(S_T, K)$.

The next stage in deriving our bound is to note that the maximum of $S_T$ and $K$ has to be at least as big as $S_T$.

So let's look for another portfolio now that will give us a payoff equal to $S_T$. If we hold one unit of our asset at the start, worth $S_t$, it will grow to be worth $S_T$ at time $T$.

Now we can bring in the principle of No-Arbitrage. If the portfolio on the left is worth at least as much as the portfolio on the right at time $T$, the same must be true at all earlier times.

So to get our bound for our European Call option, we can rearrange the inequality that we have at time $t$ and say that,

$$S_t \ge c_t \ge \max\left(S_t - Ke^{-r(T-t)}, 0\right)$$
The minimum payoff from our option is 0, and we might get a positive payoff. So the lower bound is the larger of the share price less the discounted cash, and 0. This is the lower bound derived algebraically. For the upper bound we use general reasoning: it is the share price $S_t$. We would never pay more than that for a call option, because for that amount we could buy the share itself rather than just the option to buy it.

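For an American Call Option we consider the intrinsic value of the option, $S_t - K$. This isn't as great as the lower bound calculated above, $S_t - Ke^{-r(T-t)}$, so it is never optimal to exercise early. Therefore the same bounds apply to American Call options as to European Call options: $S_t \ge C_t \ge \max\left(S_t - Ke^{-r(T-t)}, 0\right)$.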

Put Options
We use a similar technique here, at time t we consider a portfolio that has a put option and we go from there.

Let's start with the European case again. The Put option will have a value $p_t$. If we choose to exercise it, we will receive $K$ in return for the share, so we had better have the share ready to give up. This will have current value $S_t$.

Now we wait and see what this is worth at time $T$, at maturity. The value of the portfolio at $T$ is $\max(K - S_T, 0) + S_T = \max(K, S_T)$.

The maximum of $K$ and $S_T$ is at least as big as $K$. So for our second portfolio, we are looking for something that gives a payoff at $T$ equal to $K$. This would be cash of $Ke^{-r(T-t)}$ at time $t$.

Again by the principle of NO ARBITRAGE, if the left portfolio is worth at least as much as the right portfolio at time T, it must also be worth at least as much at all earlier times, including t.

We can rearrange the inequality at time t to obtain the lower bound for European put options, shown here together with the upper bound:

$$Ke^{-r(T-t)} \ge p_t \ge \max\left(Ke^{-r(T-t)} - S_t,\ 0\right)$$
The put option can never give us a negative payoff, so its value can never be less than 0. This is why the lower bound is the maximum of $Ke^{-r(T-t)} - S_t$ and 0.

For the upper bound we use general reasoning again: what is the best thing that can happen to our put option?

The best case is that the share is worthless when we exercise, in which case we receive K in return for nothing. The present value of this is the discounted strike, $Ke^{-r(T-t)}$, which must therefore be our upper bound.

That leaves the American put option, $P_t$. To see whether we would want to exercise it early, we again look at the intrinsic value, which is $K - S_t$ if the option is in the money. This is higher than the European lower bound $Ke^{-r(T-t)} - S_t$, so nothing stops the value of a European put from falling below its intrinsic value. That is the same as saying it could have negative time value.

So if we hold the American version of the option, we could exercise it early and receive the intrinsic value, which would be better for us than selling the option for a lower market value. Because we can exercise now, we replace discounted K by K in the bounds:

$$K \ge P_t \ge \max\left(K - S_t,\ 0\right)$$
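The bounds derived above can be collected in a short Python sketch. As a sanity check it compares them with Black-Scholes prices (a model introduced later in these notes); the inputs (K = 100, r = 5%, σ = 20%, one year to expiry) are illustrative assumptions, not figures from the notes.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_put(S, K, r, sigma, tau):
    """Black-Scholes European call and put prices on a non-dividend-paying share (tau = T - t)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    call = S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    put = K * math.exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put

def option_bounds(S, K, r, tau):
    """(lower, upper) no-arbitrage bounds from these notes."""
    disc_K = K * math.exp(-r * tau)
    return {
        "call": (max(S - disc_K, 0.0), S),            # European, and American on a non-dividend share
        "european_put": (max(disc_K - S, 0.0), disc_K),
        "american_put": (max(K - S, 0.0), K),         # early exercise: K replaces discounted K
    }

# Illustrative (assumed) inputs: K = 100, r = 5%, sigma = 20%, one year to expiry.
for S in (60.0, 100.0, 140.0):
    bounds = option_bounds(S, 100.0, 0.05, 1.0)
    call, put = bs_call_put(S, 100.0, 0.05, 0.2, 1.0)
    print(S, round(call, 2), bounds["call"], round(put, 2), bounds["european_put"])
```

For S = 60 the European put price lies inside its bounds but below its intrinsic value K − S = 40, which is the negative time value described above.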

Put-Call Parity
We will look at two methods for deriving the Put-Call Parity relationship for option prices on non-dividend-paying shares. The first method uses an approach similar to the one used for deriving bounds, and the second involves more algebraic manipulation.

Method One
We will consider two portfolios at time t. The first portfolio will contain a European Call Option and enough
cash so that with interest we’ll have enough to exercise the call. The other portfolio will contain a European
Put Option and a share. Then we will wait and see what these portfolios are worth at time T . The call
option has a payoff of max (ST − K, 0) and the cash will have accumulated to K, giving max (ST , K). The
other portfolio would give the payoff for a Put, which is max (K − ST , 0) plus the value of the share at time
T , ST , also giving max (K, ST ).

As you can see by the figure, these are equal at time T . The principle of no-arbitrage tells us that the
portfolios must be equal at all earlier times. Otherwise we could have set up an arbitrage opportunity.

Method Two
Here, we will consider a call option and some cash and go straight to the end and look at its value at maturity.

CT + K = max (ST − K, 0) + K
= max (ST , K)
= ST + max (0, K − ST )
= ST + PT

Now, C + K and S + P both represent portfolio values at time T. So again we can use the principle of no arbitrage to say that the values of these portfolios must be the same at all earlier times. Therefore, in particular, at time t,

$$c_t + Ke^{-r(T-t)} = S_t + p_t$$

On Page 47 of the Tables, the Put-Call Parity relationship is stated for a dividend-paying share. We will discuss how to adapt Method One to account for this. All that changes is the share holding: at time T we only need one share to exchange for K, so at time t we hold $e^{-q(T-t)}$ shares, worth $S_t e^{-q(T-t)}$. Reinvesting the dividends (received continuously at rate q) grows this holding to exactly one share by time T, worth $S_T$. The relationship becomes

$$c_t + Ke^{-r(T-t)} = S_t e^{-q(T-t)} + p_t$$
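Put-call parity can be used directly to back out a put price from a call price. The sketch below does that, and also checks the Method Two identity at maturity; the call price of 10 and the other inputs are illustrative assumptions.

```python
import math

def put_from_call(c, S, K, r, tau, q=0.0):
    """Put price implied by put-call parity: c + K e^{-r tau} = S e^{-q tau} + p."""
    return c + K * math.exp(-r * tau) - S * math.exp(-q * tau)

def payoffs_agree(S_T, K):
    """Method Two at maturity: call + K and share + put both pay max(S_T, K)."""
    call_plus_cash = max(S_T - K, 0.0) + K
    share_plus_put = S_T + max(K - S_T, 0.0)
    return math.isclose(call_plus_cash, share_plus_put) and math.isclose(call_plus_cash, max(S_T, K))

# Illustrative (assumed) figures: call price 10, S = K = 100, r = 5%, one year, no dividends.
p = put_from_call(10.0, 100.0, 100.0, 0.05, 1.0)
print(round(p, 4))  # 5.1229
print(all(payoffs_agree(S_T, 100.0) for S_T in (50.0, 100.0, 150.0)))  # True
```

Passing q > 0 gives the dividend version: the share holding $S e^{-q\tau}$ is smaller, so the implied put price is higher.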

No Arbitrage Pricing of Forwards
We will start by defining what is meant by a Forward Contract: an agreement to buy or sell an asset at a specified future time for a specified price.

So, unlike with an option, we have no choice in the matter: we have to honour the agreement. The specified price is referred to as the Forward Price of the contract. When the agreement is set up, the forward price is chosen to make the initial value of the contract equal to 0, so that no cash changes hands at the outset.

We will use a No-Arbitrage argument to derive a fair forward price for a non-income producing asset.

We will use the following terms in the derivation.

The way to calculate the forward price is to consider two portfolios, A and B. Portfolio A contains a long position in a forward contract and enough cash to accumulate to the forward price and honour our agreement at time T. Portfolio B consists of one unit of our asset. Let's look at the values of our portfolios at time 0 and time T.

We can see that Portfolios A and B have the same value, $S_T$, at maturity. Neither of these portfolios involves the payment of any income during the period from 0 to T. Therefore, by the principle of No-Arbitrage, they must have the same value at the start. Writing K for the forward price and $S_0$ for the asset price at time 0, the principle of no-arbitrage implies that

$$Ke^{-rT} = S_0 \quad\Longrightarrow\quad K = S_0 e^{rT}$$
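The argument can be checked numerically with a small sketch, writing K for the forward price (notation assumed here, since the notes' table of terms is not reproduced). The asset price, rate and term are illustrative assumptions.

```python
import math

def forward_price(S0, r, T):
    """No-arbitrage forward price of a non-income-producing asset: K = S0 e^{rT}."""
    return S0 * math.exp(r * T)

# Illustrative (assumed) figures: S0 = 100, r = 5% p.a. continuously compounded, T = 1 year.
S0, r, T = 100.0, 0.05, 1.0
K = forward_price(S0, r, T)

# Time 0: portfolio A = forward (worth 0) + cash K e^{-rT}; portfolio B = one unit of the asset.
value_A_0 = 0.0 + K * math.exp(-r * T)
value_B_0 = S0

# Time T: the forward pays S_T - K and the cash has grown to K, so A is worth S_T, like B.
S_T = 93.0  # any terminal price would do
value_A_T = (S_T - K) + K
value_B_T = S_T
print(round(K, 4), value_A_0, value_B_0)
```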

12 The Greeks

Definitions of the Greeks and Signs


We will use f to denote the value of a derivative, or the value of a portfolio of derivatives. Mathematically we can say that:

$\Delta = \partial f/\partial S$: the sensitivity of the derivative value to the price of the underlying
$\Gamma = \partial^2 f/\partial S^2$: the sensitivity of $\Delta$ to the price of the underlying
$\Theta = \partial f/\partial t$: the sensitivity of the derivative value to time
$\nu = \partial f/\partial \sigma$: the sensitivity of the derivative value to volatility (Vega)
$\rho = \partial f/\partial r$: the sensitivity of the derivative value to the risk-free interest rate
$\Lambda = \partial f/\partial q$: the sensitivity of the derivative value to the continuously compounded dividend yield

The table above shows the likely signs of these Greeks for call and put options.
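The likely signs can be checked numerically. The sketch below assumes the Black-Scholes model with continuous dividend yield q and estimates each Greek as a central finite-difference partial derivative of the price, matching the definitions above. Θ here is per year of calendar time t, and the inputs (S = K = 100, σ = 20%, r = 5%, q = 0, one year) are illustrative assumptions. Put Θ is usually, but not always, negative.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_price(kind, S, t, sigma, r, q, K=100.0, T=1.0):
    """Black-Scholes price of a European call or put with continuous dividend yield q."""
    tau = T - t
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    if kind == "call":
        return S * math.exp(-q * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    return K * math.exp(-r * tau) * norm_cdf(-d2) - S * math.exp(-q * tau) * norm_cdf(-d1)

def greeks(kind, S=100.0, t=0.0, sigma=0.2, r=0.05, q=0.0, h=1e-3):
    """Estimate each Greek as a central finite-difference partial derivative of f."""
    base = dict(S=S, t=t, sigma=sigma, r=r, q=q)

    def f(**bump):
        # Re-price with the bumped input; every other input stays at its base value.
        return bs_price(kind, **{**base, **bump})

    return {
        "delta": (f(S=S + h) - f(S=S - h)) / (2 * h),
        "gamma": (f(S=S + h) - 2 * f() + f(S=S - h)) / h**2,
        "theta": (f(t=t + h) - f(t=t - h)) / (2 * h),
        "vega": (f(sigma=sigma + h) - f(sigma=sigma - h)) / (2 * h),
        "rho": (f(r=r + h) - f(r=r - h)) / (2 * h),
        "lambda": (f(q=q + h) - f(q=q - h)) / (2 * h),
    }

for kind in ("call", "put"):
    print(kind, {name: round(value, 4) for name, value in greeks(kind).items()})
```

Finite differences are used so the code mirrors the partial-derivative definitions directly; closed-form Greeks would be faster but are easier to mistype.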

Use of Greeks in Risk Management
We will look at each greek in turn, highlighting what information they can give us.

If we were a derivatives trader, we would make our money from buying and selling derivatives, so we don't really want to be exposed to the price of the underlying. In that case we want to be Delta Hedged, which means having a portfolio delta of 0. If we manage to get delta equal to 0, we are protected against small changes in the price of the underlying. If we start off with a non-zero delta, one way to bring it closer to 0 is to buy or sell the underlying asset, which has a delta of 1. It isn't practical to keep ∆ = 0 all the time, so we need to decide how often to rebalance. For example, we might put on trades to bring delta back to 0 at the end of each day.
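Because the underlying has a delta of 1, the number of shares to trade is simply minus the portfolio delta. A minimal sketch, using a hypothetical book of positions:

```python
def shares_to_trade(positions):
    """Shares to buy (positive) or sell (negative) to make the portfolio delta zero.

    positions: list of (quantity, delta) pairs for the derivatives held.
    The underlying share has delta 1, so trading x shares changes the portfolio delta by x.
    """
    portfolio_delta = sum(quantity * delta for quantity, delta in positions)
    return -portfolio_delta

# Hypothetical book: short 1,000 calls with delta 0.6, long 500 puts with delta -0.4.
trade = shares_to_trade([(-1000, 0.6), (500, -0.4)])
print(trade)
```

Here the portfolio delta is −600 − 200 = −800, so we would buy 800 shares. After the underlying moves, the deltas change and the calculation is repeated at the next rebalancing time.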

Gamma gives us useful information about the need for Re-balancing. If Gamma is 0 or close to 0, this
tells us that Delta is quite insensitive to changes in the underlying. Therefore if asset price changes a bit,
delta would stay close to 0 and we’d stay delta-hedged. However if Gamma is large, then a small change in
the price of the underlying is going to change our delta quite a lot and therefore we won’t be delta hedged.
In that case we would have to rebalance much more often, or trade larger amounts each time.

The value of Theta shows the sensitivity of our portfolio to the passage of time. Theta gives us an idea of how quickly the Time Value of the derivatives in our portfolio decreases as time passes. This is known as Time Decay.

Vega tells us about the sensitivity of our portfolio to changes in volatility. Volatility is unique amongst the factors that affect derivative values, since it cannot be observed directly like the other factors can. We need to estimate volatility to be able to value our derivatives, and if Vega is large, the accuracy of our volatility estimate becomes much more important.

Interest rates don't usually change very often, so ρ isn't usually as important as the other Greeks. But it can give us useful information about how our portfolio value would change in response to changes in r.

Dividend yields are usually fairly constant, but Λ tells us how our portfolio value would change in response to changes in q.

Use of Greeks in the df (S, t) Equation
Estimating the Change in the Derivative Value
Let's suppose we have a share whose current price is S = 700, and a derivative on this share, denoted f, with value f = 30. This might be a call option, for example.

We are interested in estimating the likely change in the value of our derivative for changes in some of the factors that affect it. We'll start by looking at $\Delta = \partial f/\partial S$, which we can rearrange to give a rough estimate of the impact that a change in share price will have on our derivative value: $df \approx \Delta\,dS$.

We can say that

$$df \approx \Delta\,dS + \tfrac{1}{2}\Gamma\,(dS)^2 + \Theta\,dt + \nu\,d\sigma + \rho\,dr$$
Let's suppose that ∆ = 0.6 and we want to know the impact of a 10p change in the share price. Then $df \approx 0.6 \times 10 = 6$.

Comparing the percentage changes in share and derivative price: S has increased by about 1.4% (10/700), whereas f has increased by 20% (6/30). This is just a rough estimate, however. If we think about Taylor's formula for f again, we have to add an extra term, $\tfrac{1}{2}\Gamma(dS)^2$, because S is stochastic.

Assuming that Γ = 0.008, this term is $\tfrac{1}{2} \times 0.008 \times 10^2 = 0.4$, so we obtain df = 6.4.

So we have seen how these two Greeks help us to estimate the change in derivative value. That's fine if the change in share price is instantaneous, but let's suppose it happens over a period of one day. In that case the passage of time also affects the change in derivative value.

So, thinking about Taylor's formula in two variables, we need to add another term, $\Theta\,dt$. We assume that Θ = −0.05 per day, which means our derivative is getting less valuable as its time value decays, and the change in time is 1 day.

With this refinement, df changes by a further −0.05. There might be other changes as well. For example, our estimate of volatility might change, and we want to know the likely impact of that in addition to the changes we already have. In that case we add the term $\nu\,d\sigma$, where ν = 1 per percentage point of volatility and dσ = 2 percentage points. That gives an extra 2 to add to df.

In the same way, we could look at the sensitivity to interest rates through the term $\rho\,dr$. If ρ = 0.4 per percentage point and we are concerned with a quarter of a percentage point change in interest rates, this gives an extra 0.4 × 0.25 = 0.1 on our change in derivative value.

If we add all these changes up (6 + 0.4 − 0.05 + 2 + 0.1) we get 8.45, so our estimated new value of the derivative is 30 + 8.45 = 38.45. If we have reasonable estimates for all these Greeks, estimating changes this way can be quicker and more efficient than revaluing the derivative from scratch.
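The worked example can be reproduced term by term. This sketch follows the notes' conventions as interpreted here: Θ per day, and ν and ρ per percentage-point change, so dσ = 2 and dr = 0.25.

```python
def estimate_change(delta, gamma, theta, vega, rho, dS, dt, dsigma, dr):
    """Greek-based approximation df ~ Delta dS + 1/2 Gamma dS^2 + Theta dt + nu dsigma + rho dr."""
    return delta * dS + 0.5 * gamma * dS**2 + theta * dt + vega * dsigma + rho * dr

# Figures from the worked example: S = 700, f = 30, a 10p rise over one day,
# volatility up 2 percentage points, interest rates up 0.25 percentage points.
df = estimate_change(delta=0.6, gamma=0.008, theta=-0.05, vega=1.0, rho=0.4,
                     dS=10, dt=1, dsigma=2, dr=0.25)
new_value = 30 + df
print(round(df, 2), round(new_value, 2))
```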

13 The Binomial Model
Binomial Model 1-Step
As its name implies, in the Binomial Model, we are going to assume that one of two things can happen to a
share price. We start off at time 0, at S0 and we assume that the share price either goes up to S0 u or down
to S0 d.

At the same time we have a derivative. If the Share price goes up, the derivative pays off cu and if the share
price goes down, the derivative pays off cd .

The values of $c_u$ and $c_d$ depend entirely on the share price at time 1: they are the derivative's payoffs in the up and down states. The subscripts refer to the share price going up or down, not to the derivative's value going up or down.

What we are going to ask is: what should we pay at time 0 for this derivative? To answer this we set up what is known as a Replicating Portfolio, where at time 0 we buy φ shares and place an amount ψ in the bank. This portfolio has a value at time 0, which we call $V_0 = \phi S_0 + \psi$.

Now if the share price goes up, the portfolio will be worth $\phi S_0 u + \psi e^r$. If the share price goes down, it will be worth $\phi S_0 d + \psi e^r$. At the moment we haven't fixed the quantities φ and ψ. What we can try to do is choose them so that if the share price goes up our portfolio is worth $c_u$, and if it goes down it is worth $c_d$.

If we choose φ and ψ to force these equalities, then all we have really done is set up a portfolio that is exactly the same as the derivative. It behaves like the derivative in every single respect, replicating its payoff.

In that case, by the principle of No-Arbitrage, the amount we should pay at time 0 for the derivative is the same as the amount it cost us to set up the portfolio. Forcing the portfolio to match the payoffs gives us two simultaneous equations, labelled (1) and (2) below, which we can then solve for φ and ψ.

This is frequently examined, so the method is set out below for convenience.

$$c_u = \phi S_0 u + \psi e^r \qquad (1)$$

$$c_d = \phi S_0 d + \psi e^r \qquad (2)$$

Subtracting (2) from (1) we obtain:

$$c_u - c_d = \phi\,(S_0 u - S_0 d) \implies \phi = \frac{c_u - c_d}{S_0 (u - d)}$$

Substituting this into (2) we get

$$c_d = \frac{c_u - c_d}{u - d}\,d + \psi e^r \implies \psi = e^{-r}\,\frac{u c_d - d c_u}{u - d}$$
What we can then do is put these values of φ and ψ back into the formula for $V_0$. Notice that everything in the numerator involves either a $c_u$ term or a $c_d$ term, so we can rearrange $V_0$ as below.

$$V_0 = e^{-r}\left[\left(\frac{e^r - d}{u - d}\right)c_u + \left(\frac{u - e^r}{u - d}\right)c_d\right]$$
Then hopefully we can notice straight away that these two coefficients sum to 1, which means we can write

$$V_0 = e^{-r}\left[q c_u + (1 - q) c_d\right] \quad \text{where} \quad q = \frac{e^r - d}{u - d}$$
Now we are going to think about what No-Arbitrage means in the Binomial Model.

From now on, we're going to Pretend q is a Probability: the probability that the share price goes up. It isn't in reality. It is just a number that lies between 0 and 1, because no-arbitrage requires $d < e^r < u$.

But if we pretend it is this probability, then the probability it goes down is $1 - q$.
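The replication argument and the risk-neutral formula can be checked against each other numerically. This sketch solves equations (1) and (2) for φ and ψ and compares $\phi S_0 + \psi$ with $e^{-r}[q c_u + (1 - q) c_d]$, using the figures of the put example that follows (S₀ = 100, u = 1.2, d = 0.83, r = 5%, strike 110).

```python
import math

def one_step_price(S0, u, d, r, c_u, c_d):
    """Price a derivative in a one-step binomial model by replication and by q."""
    phi = (c_u - c_d) / (S0 * (u - d))                   # shares held
    psi = math.exp(-r) * (u * c_d - d * c_u) / (u - d)    # cash placed in the bank
    v0_replication = phi * S0 + psi
    q = (math.exp(r) - d) / (u - d)                       # pretend up-probability
    v0_risk_neutral = math.exp(-r) * (q * c_u + (1 - q) * c_d)
    return phi, psi, q, v0_replication, v0_risk_neutral

# Put with strike 110 on a share at 100 that moves to 120 or 83.
K = 110.0
phi, psi, q, v_rep, v_rn = one_step_price(100.0, 1.2, 0.83, 0.05,
                                          c_u=max(K - 120.0, 0.0), c_d=max(K - 83.0, 0.0))
print(round(phi, 4), round(psi, 4), round(q, 4), round(v_rep, 4), round(v_rn, 4))
```

Both routes give the same time-0 value, which is the point of the derivation: q only appears because of the algebra of replication.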

Let's have another look at our derivative pricing formula bearing this in mind. We have $e^{-r}$ and then, in the brackets, the probability that the share price goes up times the value of the derivative if it does, plus the probability that the share price goes down times the value if it does. This is the same as the Expected Value of the Derivative Payoff at time 1, given that we are at time 0 and know the history up until then. We add a subscript Q to denote that we are pretending that q is a probability.

Mathematically this is

$$V_0 = e^{-r}\,E_Q\left[C_1 \mid F_0\right]$$

Example
Here we have a share price that starts at 100 and can either go up to 120 or down to 83, so we have implicitly been told that u = 1.2 and d = 0.83.

We are also told that the risk-free interest rate is 5%. Using the formula for q above,

$$q = \frac{e^r - d}{u - d} = \frac{e^{0.05} - 0.83}{1.2 - 0.83} = 0.5980$$
Let's suppose that we have a put option with a strike price of 110. What are the values of $c_u$ and $c_d$? Recall that $c_u$ is the payoff of the derivative if the upward movement in share price occurs. The option holder wouldn't exercise if the share price exceeds the strike price, as they could get more for the share on the open market, so $c_u = 0$. By a similar argument, $c_d = 110 - 83 = 27$.

So, using the formula, the value of our derivative is

$$V_0 = e^{-r}\left[q c_u + (1 - q) c_d\right] = e^{-0.05}\left[0.5980 \times 0 + 0.4020 \times 27\right] = 10.32$$

2-Step Recombining Trees
Let us assume we have a Put Option with a Strike Price of 110. Over one time period, the share price
can either go up by 20% or down by 17%, keeping the same up and down steps into the second time period
giving us 3 possible share prices at time 2.

So after two time periods the share price is 144, 99.6 or 68.89, and the put payoffs are 0, 10.4 and 41.11 respectively. The up and down steps are the same from one time period to the next, so q is the same at each node of the tree and we can just use the usual binomial expansion:

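$$\begin{aligned}
V_0 &= e^{-2r}\left[0 \times q^2 + 10.4 \times 2q(1 - q) + 41.11 \times (1 - q)^2\right] \\
&= e^{-0.1}\left[10.4 \times 2 \times 0.5980 \times 0.4020 + 41.11 \times 0.4020^2\right] \\
&= 10.54
\end{aligned}$$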

EXAM QUESTION
Consider a two-period Binomial Model of a Stock whose current price is S0 = 100.
Over the next two 6-month periods, the stock price can either move up or down
by 10%. Risk free rate of return is r = 8% per period. Calculate the price of a
1-year European Call Option with a strike price of K = 95.

Let's draw the tree for the underlying asset. As this is a recombining tree, we have the same q at every node.

Substituting the values given in the question, $q = (e^{0.08} - 0.9)/(1.1 - 0.9) = 0.9164$. The share prices at the end of the second period are 121, 99 and 81, so the call payoffs are 26, 4 and 0 respectively. Substituting these into the formula for valuing the option we obtain

$$V_0 = e^{-2 \times 0.08}\left[26q^2 + 4 \times 2q(1 - q)\right] = 19.13$$


 

n-step General Pricing Formula
Here is a picture of an n-step recombining binomial tree, sometimes called a Binomial Lattice.

We can show the time periods going from time 0 to time n. The convention is to number the states using the natural numbers. If there are n time periods, there are n + 1 possible final values for the share price at time n.

Notationally, each node is labelled by the time at which it starts and its state. So the node highlighted in red starts at time n − 2 and is in state 2, as it is second from the top of the tree. If we want to describe, for example, the derivative value at that node, we write V with the time period, n − 2, as a subscript and the state, 2, in brackets:

$$V_{n-2}(2)$$

We might be asked in the exam to come up with a formula for the value of the derivative at time 0. For this we discount as necessary and then sum across all the possible final states at time n: for each final state, we take the (risk-neutral) probability that the share price ends up in that state multiplied by the value of the derivative in that state at time n.
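The general formula can be sketched in Python. Final states are indexed here by j, the number of up-steps (an assumed convention, which may differ from the state numbering in the figure), so state j is reached with risk-neutral probability $\binom{n}{j} q^j (1-q)^{n-j}$. The notes round q to four decimal places, so answers from this sketch can differ from theirs in the second decimal place.

```python
import math

def binomial_european(S0, u, d, r, n, payoff):
    """Time-0 value of a European derivative on an n-step recombining binomial tree.

    r is the continuously compounded risk-free rate per step; payoff maps S_n to the payout.
    """
    q = (math.exp(r) - d) / (u - d)
    expected_payoff = sum(
        math.comb(n, j) * q**j * (1 - q) ** (n - j) * payoff(S0 * u**j * d ** (n - j))
        for j in range(n + 1)
    )
    return math.exp(-n * r) * expected_payoff

# The 2-step put (strike 110) and the exam-question call (strike 95) from these notes.
put2 = binomial_european(100.0, 1.2, 0.83, 0.05, 2, lambda s: max(110.0 - s, 0.0))
call2 = binomial_european(100.0, 1.1, 0.9, 0.08, 2, lambda s: max(s - 95.0, 0.0))
print(round(put2, 2), round(call2, 2))
```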

American Options
Now we will think about how it might be possible to value American Options using the Binomial Model.

For a non-dividend paying share, if it is an American Call Option, because it is never optimal to
exercise an American Call Option early, the value is exactly the same as a European one.

However, for an American put option it is not necessarily that straightforward. Let's revisit the two-step European put example from the 2-Step Recombining Trees section, which gave the value of the derivative at time 0 as 10.54, and consider the same example for an American put.

American put options can be exercised at any time on or before expiry. So in the 2-step binomial tree, at time 1 we have to decide whether to exercise or to hold the option until expiry, as we would a European option.

We need to do more investigation at the points where the Share Price is 83 and 120. To do this, we perform
the One Step Binomial Method at the two nodes at time 1.

Performing this on the bottom node (share price 83) gives a value of 21.64. This is less than the 110 − 83 = 27 we would receive by exercising immediately. As this is an American put option, we can take the 27 by exercising now, which is what every investor would do.

Now suppose the share price goes up to 120 after one time period. Here we hold on to the option, as for a European option, in the hope that the share price goes down and we get a payout of 10.4. We can still perform a one-step calculation to work out the value of the option at this node, which is 3.98. As 3.98 > 0, the amount we would get for exercising (the option is out of the money), we don't exercise and instead take its value of 3.98, as if it were a European put option.

We can take these values of 3.98 and 27 and perform one more 1-step calculation to give the value of the derivative at time 0. We get 12.59, which makes sense: the right to exercise early can only add value, so the American put is worth more than the 10.54 for the European put option observed earlier.
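The node-by-node procedure above generalises to backward induction on an n-step tree: at each node, take the larger of the discounted risk-neutral continuation value and the immediate exercise value. A minimal sketch, indexing nodes by the number of up-steps (an assumed convention):

```python
import math

def binomial_american_put(S0, u, d, r, n, K):
    """Backward induction on a recombining tree, checking early exercise at every node."""
    q = (math.exp(r) - d) / (u - d)
    # Option values at maturity, indexed by the number of up-steps j.
    values = [max(K - S0 * u**j * d ** (n - j), 0.0) for j in range(n + 1)]
    for step in range(n - 1, -1, -1):
        new_values = []
        for j in range(step + 1):
            # Node (step, j) moves to j + 1 up-steps (up) or stays at j (down).
            continuation = math.exp(-r) * (q * values[j + 1] + (1 - q) * values[j])
            exercise = max(K - S0 * u**j * d ** (step - j), 0.0)
            new_values.append(max(continuation, exercise))
        values = new_values
    return values[0]

american = binomial_american_put(100.0, 1.2, 0.83, 0.05, 2, 110.0)
print(round(american, 2))
```

At the node with share price 83 the exercise value (27) beats the continuation value (about 21.64), reproducing the decision described above.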

State Price Deflators in the Binomial Model
The most likely place in the Exam that we’re going to be asked about State Price Deflators,
is in the two-period recombining Binomial Trees.

Let us assume we have a put option with strike price 110 on a share starting at price 100, with the same two-period binomial tree as in the earlier put example (u = 1.2, d = 0.83, r = 5%).

The formula we are used to for calculating the value of the derivative is

$$V_0 = e^{-2r} \sum_{s \in S} q(s)\, c(s)$$

We sum, across the three possible final values of the share, the discounted risk-neutral probability of reaching each final state times the payoff in that state.

This is a put option so at the three states the payoffs are 0, 10.4 and 41.11 respectively.

Another way to value any financial asset at time 0 is to use what are known as STATE PRICES. The state price of a final state is the discounted value of the risk-neutral probability of reaching that state, so $V_0$ is the sum over final states of the state price times the payoff.

You can think of State Prices as the cost of a unit bet. So it is the amount you’d pay to receive 1 if the
share price ends up in that final state.

Finally, we can decompose each state price into a State Price Deflator, D(s), multiplied by the Real-World Probability, p(s), of arriving at that final state:

$$V_0 = \sum_{s \in S} D(s)\,p(s)\,c(s)$$

Now we know that all probabilities are between 0 and 1. So we know the real world probability of an up-step
in the share is between 0 and 1. But we can get an even tighter bound than that.

Remember that the risk-neutral probability q is the one under which the share price behaves like cash on average, as risk-neutral investors do not demand a higher expected return for investing in risky assets.

In the real world Investors are Risk-Averse and would demand a higher expected return for investing in
Shares than they would on cash.

Therefore the real world probability of an up-step has to be greater than the risk neutral one.

For example, we might be told in the question that the probability the share price goes up is 80%. The state price deflator multiplied by the real-world probability of reaching a final state equals the state price, so if we divide each state price by the corresponding real-world probability, we get the State Price Deflator in each of the 3 final states.
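The calculation can be sketched for the put example (strike 110, u = 1.2, d = 0.83, r = 5%) with the 80% real-world up-probability. States are ordered (up-up, up-down, down-down); the function name and layout are illustrative, not from the notes.

```python
import math

def state_price_deflators(u, d, r, p, payoffs):
    """Two-period recombining tree: state prices, deflators and the time-0 value both ways.

    States are ordered (up-up, up-down, down-down); p is the real-world up-probability.
    """
    q = (math.exp(r) - d) / (u - d)
    risk_neutral = [q**2, 2 * q * (1 - q), (1 - q) ** 2]
    real_world = [p**2, 2 * p * (1 - p), (1 - p) ** 2]
    state_prices = [math.exp(-2 * r) * qs for qs in risk_neutral]
    deflators = [sp / ps for sp, ps in zip(state_prices, real_world)]
    v0_state_prices = sum(sp * c for sp, c in zip(state_prices, payoffs))
    v0_deflators = sum(D * ps * c for D, ps, c in zip(deflators, real_world, payoffs))
    return state_prices, deflators, v0_state_prices, v0_deflators

sp, D, v_sp, v_D = state_price_deflators(1.2, 0.83, 0.05, 0.8, payoffs=[0.0, 10.4, 41.11])
print([round(x, 4) for x in sp], [round(x, 4) for x in D], round(v_sp, 2))
```

Because p > q, the deflator is smallest in the up-up state and largest in the down-down state: money is worth more in the states investors fear.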

Tutorial Day 3
14 The Black-Scholes Option Pricing Formula
Black-Scholes Assumptions
There are SEVEN ASSUMPTIONS of the Black-Scholes Model. You need to be able to state them and
discuss their realism in practice.

• St follows Geometric Brownian Motion


This means that increments in the log of $S_t$ are normally distributed. This could be argued to be unrealistic, as in reality log-returns tend to have higher, narrower peaks and fatter tails than the normal distribution. Additionally, geometric Brownian motion has independent increments, whereas in reality share prices may mean-revert, which contradicts this.

• The Market is Complete


This means that every derivative payoff can be replicated in some way. This isn't necessarily realistic, but most derivatives can probably be replicated, at least approximately.

• The Market is Arbitrage-Free


This assumption is realistic for most investors, as any potential arbitrage opportunities would be exploited immediately by arbitrage teams in large investment banks, bringing markets back into line.

• The Risk-free rate of interest r is Constant


The risk-free interest rate r does vary, and in an unpredictable way. However, over the relatively short term of a typical derivative, the assumption of a constant risk-free rate of interest is not far from reality.

• Assets may be bought and sold at Any Time


In reality, shares can only be bought and sold at specific times, when the market is open.

• Assets may be held in Any Amount


This would involve short selling, and unlimited short selling may not be allowed, although these problems could be mitigated by holding a mixture of derivatives that reduces the need for short selling. In addition, shares can only be dealt in integer multiples of one unit.

• There are No Taxes or Transaction Costs


This is very unrealistic. Every single deal in the share market would incur transaction costs, and almost all of them would involve tax of some sort.

Using the BS Formula and Calculating Implied Volatility
We will start by looking at notation that will be used throughout this and remaining chapters.

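The formula for a European call option on a share with continuous dividend yield q is given below.

$$c_t = S_t e^{-q(T-t)}\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2)$$

$$d_1 = \frac{\ln(S_t/K) + (r - q + \tfrac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t}$$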

This is on Page 47 of the Tables. $\Phi(d_1)$ and $\Phi(d_2)$ are values of the standard normal distribution function (from the tables). We look at an example to illustrate this.

As this call option is out of the money, its intrinsic value is 0, so its Time Value is the whole premium of 10.3.

This procedure is fairly straightforward, but the potential for silly slip-ups is quite high!
Let us set q to be 0, so that we are considering a non-dividend-paying share. In real life, if we were considering whether or not to buy a call option, we would know the price of that call option. We would also know the current price of the underlying asset, the interest rate over the period of the option, the term of the option and the strike price.

What we wouldn't know is the volatility of the underlying asset. Because this is the situation in real life, it is often set as an exam question. It isn't possible to rearrange the formula to make σ the subject, so we have to use Trial and Error. The value of σ found this way is known as the implied volatility.

We consider another example to illustrate this. The data below relates to a European Call Option.

Subbing in all the values we are given in the question. We see that all we are missing is σ. As discussed we
work this out by trial and error.

Volatility on Equities is often around the range of 20%-40%.


So if we start off by trying 20%, we find that this gives a call option premium below the required value of 20. We then need to try a higher value of σ, as option values increase as the volatility of the underlying asset increases.

For exam purposes, we plug in two volatilities and interpolate linearly between them to get a final answer.
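Both approaches can be sketched in Python. `interpolate_vol` follows the exam method of pricing at two trial volatilities and interpolating linearly; `implied_vol` swaps in bisection as a systematic version of trial and error, which works because the call price increases with σ. The exam data are not reproduced here, so the inputs below (S = 100, K = 110, r = 5%, six months, true σ = 30%) are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau, q=0.0):
    """Black-Scholes European call price with continuous dividend yield q."""
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-q * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def interpolate_vol(target, S, K, r, tau, sigma_lo, sigma_hi, q=0.0):
    """Exam-style estimate: price at two trial volatilities and interpolate linearly."""
    c_lo, c_hi = bs_call(S, K, r, sigma_lo, tau, q), bs_call(S, K, r, sigma_hi, tau, q)
    return sigma_lo + (target - c_lo) * (sigma_hi - sigma_lo) / (c_hi - c_lo)

def implied_vol(target, S, K, r, tau, q=0.0, lo=1e-4, hi=3.0, tol=1e-10):
    """Bisection on sigma, relying on the call price increasing with volatility."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, tau, q) < target:
            lo = mid   # premium too low: volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative (assumed) data: observe the premium generated by sigma = 30%, then recover sigma.
target = bs_call(100.0, 110.0, 0.05, 0.3, 0.5)
print(round(interpolate_vol(target, 100.0, 110.0, 0.05, 0.5, 0.2, 0.4), 4))
print(round(implied_vol(target, 100.0, 110.0, 0.05, 0.5), 6))
```

The interpolated answer is close to, but not exactly, 30%, because the call price is not quite linear in σ; that is why exam answers are usually accepted within a small tolerance.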

Derivation of the Black-Scholes Partial Differential Equation
We will start by looking at the Stochastic Differential Equation for Geometric Brownian Motion, which is on Page 46 of the Tables. We are going to assume this describes the behaviour of an underlying asset, say a share:

$$dS_t = S_t\left[\mu\,dt + \sigma\,dB_t\right]$$

We then look at the change in the value of a derivative based on this share over a very small period of time, which is given by Taylor's formula (in this stochastic setting, Itô's lemma):

$$df(S_t, t) = f'_S(S_t, t)\,dS_t + \tfrac{1}{2} f''_S(S_t, t)\,(dS_t)^2 + f'_t(S_t, t)\,dt$$

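Now in the context of options, these partial derivatives have names: they are the Greeks. Using $(dS_t)^2 = \sigma^2 S_t^2\,dt$, which follows from the SDE above since $(dB_t)^2 = dt$ and the other products vanish, we can rewrite this as

$$df(S_t, t) = \Delta\,dS_t + \tfrac{1}{2}\Gamma S_t^2\sigma^2\,dt + \theta\,dt$$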

The next thing we do is to set up a portfolio where we hold:

• Minus one Derivative


• Plus ∆ Shares

So ∆ is an actual number. If this was a call option, for example, we could sell a call option and work out its delta, which might turn out to be 0.6. Then we would buy 0.6 shares. Our portfolio's value at time t, $V_t$, is given by

Vt = −f (St , t) + ∆St

How does the value of our portfolio change over a very small time period?

dVt = −df (St , t) + ∆dSt

The change in the value of the derivative, df , we calculated above. So we can replace this by that expression.
If we do this we are left with the following (as the ∆dSt terms cancel).

$$dV_t = -\tfrac{1}{2}\Gamma S_t^2\sigma^2\,dt - \theta\,dt$$
The portfolio we set up was deliberately manufactured so that this would happen. The reason is that we
don’t know in advance what the change in Standard Brownian Motion is going to be because it is a random
process. We are now left with just terms which depend on the change in time, and we know what this is
going to be.

So the change in the value of our portfolio depends only on the change in time. Any asset whose change in value we know in advance is a risk-free asset, so this portfolio is risk-free.

By the principle of No-Arbitrage, Vt must earn the risk-free rate of interest.

We can think of Vt now as being a bank account. The change in the value in our bank account would be the
size of our bank account times by the risk free rate of interest multiplied by the change in time.

dVt = Vt r dt
We have the value of Vt so we can sub this in.

dVt = (−f (St , t) + ∆St ) r dt


Because the principle of no arbitrage forces these two expressions for $dV_t$ to be equal, we now have a relationship between Γ, θ, f and ∆.

We set the two expressions equal, rearrange so that there are no minus signs, and cancel the dt terms from both sides. We are left with the Black-Scholes Partial Differential Equation.

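$$-\tfrac{1}{2}\Gamma S_t^2\sigma^2\,dt - \theta\,dt = \left(-f(S_t, t) + \Delta S_t\right) r\,dt$$

$$r f(S_t, t)\,dt = \theta\,dt + r\Delta S_t\,dt + \tfrac{1}{2}\Gamma S_t^2\sigma^2\,dt$$

and, cancelling dt,

$$r f(S_t, t) = \theta + r S_t\Delta + \tfrac{1}{2}\sigma^2 S_t^2\Gamma$$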

Additionally we need to make sure that we know where dividends fit in, in this derivation.

The only change is when we first looked at the change in value of portfolio dVt , we now have,

dVt = −df (St , t) + ∆ [dSt + St q dt]


We follow the algebra through in exactly the same way as above. This gives an extra term, $-qS_t\Delta$, so the PDE becomes $r f(S_t, t) = \theta + (r - q)S_t\Delta + \tfrac{1}{2}\sigma^2 S_t^2\Gamma$, which is the Black-Scholes Partial Differential Equation as given on Page 46 of the Tables.

How the Black-Scholes PDE is used
Now we are going to be looking at how the Black-Scholes PDE is used to find the fair price of derivatives.
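Here is the Black-Scholes PDE (for a non-dividend-paying share):

$$r f(S_t, t) = \theta + r S_t\Delta + \tfrac{1}{2}\sigma^2 S_t^2\Gamma$$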

This PDE has been derived on the basis of No-Arbitrage. That means that any derivative pricing formula that does not satisfy this PDE must allow arbitrage.

For example, let's say that we are trying to find the fair price for a call option. We might decide that the fair price for a call option, $f(S_t, t)$, is given by

$$f(S_t, t) = e^{S_t} + 4t - 4S_t^2 + 7t^3$$

We can verify whether or not we are correct by calculating the Greeks for this formula and seeing if it satisfies the Black-Scholes PDE.

If we calculate these and substitute them into the right-hand side of the equation, we will see that we don't get $r \times f(S_t, t)$. So this isn't the fair price: it allows arbitrage, and we need to choose a different formula.

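$$f(S_t, t) = S_t - Ke^{-r(T-t)}$$

Again, we calculate the Greeks θ, ∆ and Γ. Here $\theta = -rKe^{-r(T-t)}$, ∆ = 1 and Γ = 0, so the right-hand side is $-rKe^{-r(T-t)} + rS_t = r f(S_t, t)$, which equals the left-hand side. Hence this formula satisfies the PDE and therefore does not allow arbitrage.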

There are many, many formulae that satisfy the Black-Scholes PDE. They can't all be the fair price of a call option, can they?

The other thing that a derivative pricing formula needs to do is satisfy the boundary condition. That is, when we get to time T, it needs to give us the payoff of the derivative, which we already know.

So what we can do, with our example above, is substitute in time T and see if it gives us our expected payoff
for a call option. If we did so we would get,

f (ST , T ) = ST − K

This isn’t the payoff of a call option, it doesn’t satisfy the boundary condition. So the formula we proposed
doesn’t represent the fair price of a call option.

At this point we decide to give up and let Black and Scholes have a go. Their proposed fair price is given on Page 47 of the Tables. If we perform the same calculation of the Greeks, we see that the Black-Scholes PDE is satisfied, showing that this formula doesn't allow arbitrage. Also, if we let t tend to T in this formula, we do get the expected payoff for a call option, $\max(S_T - K, 0)$, and hence the boundary condition is satisfied.
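Both checks described above (does the formula satisfy the PDE, and does it match the call payoff at T?) can be done numerically. The sketch below estimates θ, ∆ and Γ by central finite differences and evaluates the PDE residual $\theta + rS\Delta + \tfrac{1}{2}\sigma^2S^2\Gamma - rf$ for the two candidate formulae from the notes and for the Black-Scholes call. The market parameters are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Illustrative (assumed) market parameters for a non-dividend-paying share.
R, SIGMA, K, T = 0.05, 0.2, 100.0, 1.0

def bs_call(S, t):
    """Black-Scholes call price f(S_t, t)."""
    tau = T - t
    d1 = (math.log(S / K) + (R + 0.5 * SIGMA**2) * tau) / (SIGMA * math.sqrt(tau))
    d2 = d1 - SIGMA * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-R * tau) * norm_cdf(d2)

def first_guess(S, t):
    """The first proposed formula from the notes."""
    return math.exp(S) + 4 * t - 4 * S**2 + 7 * t**3

def second_guess(S, t):
    """The second proposed formula from the notes."""
    return S - K * math.exp(-R * (T - t))

def pde_residual(f, S, t, hS=1e-2, ht=1e-5):
    """theta + r S Delta + 1/2 sigma^2 S^2 Gamma - r f, with Greeks by central differences."""
    delta = (f(S + hS, t) - f(S - hS, t)) / (2 * hS)
    gamma = (f(S + hS, t) - 2 * f(S, t) + f(S - hS, t)) / hS**2
    theta = (f(S, t + ht) - f(S, t - ht)) / (2 * ht)
    return theta + R * S * delta + 0.5 * SIGMA**2 * S**2 * gamma - R * f(S, t)

def boundary_ok(f, S, eps=1e-12):
    """Does f(S, t) approach the call payoff max(S - K, 0) as t -> T?"""
    return abs(f(S, T - eps) - max(S - K, 0.0)) < 1e-6

print(pde_residual(first_guess, 1.0, 0.5), pde_residual(second_guess, 100.0, 0.5))
print(pde_residual(bs_call, 100.0, 0.5), boundary_ok(second_guess, 80.0), boundary_ok(bs_call, 80.0))
```

The first guess fails the PDE, the second passes the PDE but fails the boundary condition when $S_T < K$, and the Black-Scholes call passes both, up to finite-difference error.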

Deriving the Greeks
Now we will look at how to derive the greeks from the Black-Scholes Call Option Pricing Formula.

This isn’t in the CT8 Core reading but has come up a few times in the exam!

We start off with a useful result that helps us simplify the differentiation later on.


$$\frac{\phi(d_1)}{\phi(d_2)} = \exp\left(-\tfrac{1}{2}d_1^2 + \tfrac{1}{2}\left(d_1 - \sigma\sqrt{T-t}\right)^2\right)$$
Here we have taken the pdf for the Standard Normal Distribution, from Page 11 of the tables, and have
replaced d2 with its equivalent definition in terms of d1 on page 47 of the Tables. The expression simplifies
as follows.


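$$\begin{aligned}
\frac{\phi(d_1)}{\phi(d_2)} &= \exp\left(-\tfrac{1}{2}d_1^2 + \tfrac{1}{2}\left(d_1 - \sigma\sqrt{T-t}\right)^2\right) \\
&= \exp\left(-d_1\sigma\sqrt{T-t} + \tfrac{1}{2}\sigma^2(T-t)\right) \\
&= \exp\left(-\frac{\ln(S_t/K) + (r + \tfrac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}\,\sigma\sqrt{T-t} + \tfrac{1}{2}\sigma^2(T-t)\right) \\
&= \exp\left(-\left[\ln(S_t/K) + (r + \tfrac{1}{2}\sigma^2)(T-t)\right] + \tfrac{1}{2}\sigma^2(T-t)\right) \\
&= \frac{K}{S_t}\exp\left(-(r + \tfrac{1}{2}\sigma^2)(T-t) + \tfrac{1}{2}\sigma^2(T-t)\right) \\
\implies \frac{\phi(d_1)}{\phi(d_2)} &= \frac{K}{S_t}e^{-r(T-t)}
\end{aligned}$$

Hence,

$$S_t\,\phi(d_1) = Ke^{-r(T-t)}\,\phi(d_2)$$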

It might not be clear at this stage why we have done any of this algebra. But hopefully it will become clear
in due course.

Example of Deriving Vega
Whichever Greek you're deriving, it's a good idea to start with the definition of $d_2$ in terms of $d_1$.


$$d_2 = d_1 - \sigma\sqrt{T-t}$$

Now remember that Vega is the derivative of the call option price with respect to σ (volatility). So we differentiate the left- and right-hand sides of this equation with respect to σ, and get

$$\frac{\partial d_2}{\partial \sigma} = \frac{\partial d_1}{\partial \sigma} - \sqrt{T-t}$$
Here is the Black-Scholes formula for a call option directly from the tables, with q = 0:

$$c_t = S_t\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2)$$

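Now we differentiate with respect to σ to get Vega. Only d1 and d2 depend on σ, so we use the chain rule to differentiate Φ(d1) and Φ(d2) with respect to σ, noting that the derivative of Φ is the pdf φ.

$$
\begin{aligned}
\frac{\partial c_t}{\partial \sigma} &= S_t\frac{\partial \Phi(d_1)}{\partial \sigma} - Ke^{-r(T-t)}\frac{\partial \Phi(d_2)}{\partial \sigma} \\
&= S_t\frac{\partial \Phi(d_1)}{\partial d_1}\times\frac{\partial d_1}{\partial \sigma} - Ke^{-r(T-t)}\frac{\partial \Phi(d_2)}{\partial d_2}\times\frac{\partial d_2}{\partial \sigma} \\
&= S_t\phi(d_1)\times\frac{\partial d_1}{\partial \sigma} - Ke^{-r(T-t)}\phi(d_2)\times\frac{\partial d_2}{\partial \sigma}
\end{aligned}
$$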

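This is where we use the relationship between the partial derivatives of d1 and d2. We substitute it into the expression, and then use the identity $S_t\phi(d_1) = Ke^{-r(T-t)}\phi(d_2)$ from the previous page, so that the square bracket vanishes, to get the final result.

$$
\begin{aligned}
\frac{\partial c_t}{\partial \sigma} &= S_t\phi(d_1)\frac{\partial d_1}{\partial \sigma} - Ke^{-r(T-t)}\phi(d_2)\left(\frac{\partial d_1}{\partial \sigma} - \sqrt{T-t}\right) \\
&= \frac{\partial d_1}{\partial \sigma}\left[S_t\phi(d_1) - Ke^{-r(T-t)}\phi(d_2)\right] + Ke^{-r(T-t)}\phi(d_2)\sqrt{T-t} \\
&= Ke^{-r(T-t)}\phi(d_2)\sqrt{T-t}.
\end{aligned}
$$

Using the same identity once more, this can equivalently be written as $S_t\phi(d_1)\sqrt{T-t}$.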
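To check the Vega result numerically, the sketch below compares the derived formula with a central finite difference of the Black-Scholes call price in σ. It is a minimal sketch with illustrative inputs. The helper names and the step size `h` are assumptions of this example.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def d1_d2(S, K, r, sigma, tau):
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)


def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price c_t, with tau = T - t."""
    d1, d2 = d1_d2(S, K, r, sigma, tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def vega_formula(S, K, r, sigma, tau):
    """Vega as derived above: K e^{-r tau} phi(d2) sqrt(tau)."""
    _, d2 = d1_d2(S, K, r, sigma, tau)
    return K * math.exp(-r * tau) * norm_pdf(d2) * math.sqrt(tau)


def vega_central_difference(S, K, r, sigma, tau, h=1e-5):
    """Numerical dc/dsigma by a central difference in sigma."""
    return (bs_call(S, K, r, sigma + h, tau) - bs_call(S, K, r, sigma - h, tau)) / (2.0 * h)


args = (100.0, 95.0, 0.03, 0.25, 0.75)  # (S, K, r, sigma, tau), illustrative inputs only
print(vega_formula(*args), vega_central_difference(*args))
```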

15 The 5-step Method in Discrete and Continuous Time
Definitions and Theorems
The material surrounding the 5-step method is Extremely Theoretical. It hasn't been examined much and it's very difficult to examine, but definitions and theorems are fair game.

Suppose that at time t we hold the portfolio (φt , ψt ), where:

• φt represents the number of units of St held at time t


• ψt represents the number of units of the cash bond held at time t.
Previsible - This concerns knowing what our holdings of Shares and Cash are in advance. φt is previsible if its value is known based on information up to, but not including, time t.

Self-Financing - The portfolio strategy is defined as Self-Financing if
$$dV_t = \phi_t\,dS_t + \psi_t\,dB_t.$$
That is, the change in the portfolio value comes only from changes in the prices of the shares and cash, so at t + dt no money needs to flow in or out to bring the portfolio value up to $V_{t+dt}$.
Replicating Strategy - A Replicating Strategy is a self-financing strategy (φt, ψt), defined for 0 ≤ t < T, such that
$$V_T = \phi_T S_T + \psi_T B_T = X.$$
In other words, for an initial investment of V0 at time 0, if we follow the self-financing portfolio strategy we will be able to reproduce the derivative payment without risk.
Complete Market - This is one of the assumptions of the Black-Scholes Model. In a complete market, every derivative payoff can be replicated by some self-financing strategy.

Equivalent Probability Measures - The risk-neutral probability measure and the real world probability measure are equivalent. This may sound odd, as they give different probabilities. They do, but they are equivalent in the sense that both agree on what is possible: if an event has a non-zero probability under one measure, it automatically has a non-zero probability under the other.

Cameron-Martin Girsanov Theorem - Suppose that Zt is a SBM under P. Furthermore, suppose that γt is a previsible process. Then there exists a measure Q Equivalent to P such that
$$\tilde{Z}_t = Z_t + \int_0^t \gamma_s\,ds$$
is a Standard Brownian Motion under Q.

Martingale Representation Theorem - Suppose that Xt is a Martingale with respect to a measure P. That is:
$$\text{for any } t < s, \quad E_P[X_s \mid F_t] = X_t.$$
Suppose also that Yt is another Martingale with respect to P. The Martingale Representation Theorem states that there exists a unique previsible process φt such that
$$Y_t = Y_0 + \int_0^t \phi_s\,dX_s \quad\text{or}\quad dY_t = \phi_t\,dX_t$$
if and only if there is no other measure equivalent to P under which Xt is a Martingale.

Motivation for the 5-Step Method
This proof is very theoretical, so it's quite difficult to make intuitive. But we can say a few things about the motivation for where we start.

The final conclusion of the 5-step method is that
$$V_t = e^{-r(T-t)}E_Q[X \mid F_t],$$
the discounted value of the expected derivative payoff in the risk-neutral world, is the fair price for the derivative at time t.

We are going to end up concluding that this is the value of a REPLICATING PORTFOLIO.

To be a Replicating Portfolio we need Self-Financing and we also need the Boundary Condition to be
satisfied.

The definition of Self-Financing tells us that we need our holdings of shares and cash to be Previsible and that we need
$$dV_t = \phi_t\,dS_t + \psi_t\,dB_t.$$
That is, the change in V_t must equal the number of shares times the change in the share price, plus the number of units of cash times the change in the cash process. Here B_t is the cash process: we place 1 in the bank at time 0, so that at time t it has accumulated to $B_t = e^{rt}$.

Now we use the Martingale Representation Theorem to show that the holdings are previsible. To use the Martingale Representation Theorem we need two Martingales. We then work our way through the five steps below to conclude that V_t is the fair price for the derivative.

The 5-Step Method
STEP ONE - Establish the risk-neutral probability measure Q such that $D_t = S_te^{-rt}$ is a Martingale.
First of all, if D_t is the discounted value of S_t, then S_t is the accumulated value of D_t,

$$S_t = e^{rt}D_t$$

In the risk-neutral world, we expect the share price to behave, on average, in exactly the same way as cash. That is, the expected value of the share price at a future time t, given we are stood at time w and know the history up to time w, is the share price at time w rolled up with interest:

$$E_Q[S_t \mid F_w] = S_we^{r(t-w)}$$

If we multiply both sides by e^{−rt} and place this factor inside the expectation on the LHS, this is equivalent to saying that e^{−rt}S_t is a Martingale. We can see that in the final equation below, as the expected future value is equal to the current value at time w.

$$e^{-rt}E_Q[S_t \mid F_w] = S_we^{-rw}$$

$$E_Q\left[e^{-rt}S_t \mid F_w\right] = S_we^{-rw}$$

So finding the risk-neutral probability measure Q under which the discounted share price process is a Martingale is just another way of saying: find the probability measure under which, on average, the share behaves like cash.
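Here is a minimal Monte Carlo sketch of what Step One asks for. It assumes the share follows Geometric Brownian Motion, whose risk-neutral form is derived later in this section. With drift r (the measure Q) the average discounted price stays at S_0. With a real-world drift µ > r it does not. The parameter values, the seed and the function name are illustrative assumptions.

```python
import math
import random


def mean_discounted_price(S0, drift, r, sigma, t, n_paths=100_000, seed=1):
    """Monte Carlo estimate of E[e^{-rt} S_t] when S follows GBM with the given drift."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, math.sqrt(t))                  # Brownian motion at time t ~ N(0, t)
        s_t = S0 * math.exp((drift - 0.5 * sigma ** 2) * t + sigma * z)
        total += math.exp(-r * t) * s_t                   # discount at the risk-free rate
    return total / n_paths


S0, r, mu, sigma, t = 100.0, 0.05, 0.10, 0.2, 1.0
print(mean_discounted_price(S0, r, r, sigma, t))   # drift r (measure Q): close to S0
print(mean_discounted_price(S0, mu, r, sigma, t))  # drift mu (measure P): close to S0 * e^{(mu - r) t}
```

The estimates are subject to sampling error of roughly 0.07 here, so they match the targets only approximately.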

STEP TWO - Propose $V_t = e^{-r(T-t)}E_Q[X \mid F_t]$ as the fair price at time t for a derivative paying X at time T.
On a timeline, we know the derivative will pay X at time T, and we propose that at time t we should pay the discounted value of the expected payoff of the derivative at time T. If we substitute t = T, then since X is known at time T we have $E_Q[X \mid F_T] = X$, so V_T = X and the boundary condition is satisfied.

STEP THREE - Define a new process, E_t, such that $E_t = e^{-rT}E_Q[X \mid F_t] = V_te^{-rt}$. E_t is a Q-Martingale.
Now if E_t is the discounted value of V_t at time 0, then V_t is the accumulated value of E_t.

It is actually quite straightforward to demonstrate that E_t is a Martingale. To show that something is a Martingale, we need to show that its expected future value is equal to its current value.

This uses the conditional tower law of expectations, given on page 16 of the Tables. For s < t,
$$E_Q[E_t \mid F_s] = e^{-rT}E_Q\left[E_Q[X \mid F_t] \mid F_s\right] = e^{-rT}E_Q[X \mid F_s] = E_s.$$

STEP FOUR - Define $\phi_t = \dfrac{dE_t}{dD_t}$ and $\psi_t = E_t - \phi_tD_t$.
φt is Previsible by the Martingale Representation Theorem

The Martingale Representation Theorem says that if we have two Martingales, here E_t and D_t under Q, and a process φ satisfying an equation such as $dE_t = \phi_t\,dD_t$, then we can conclude that φ_t is previsible.

If you look at the definition of φ_t, we don't know what the numerator is and we don't know what the denominator is; neither of these two quantities is previsible. But, because they are both Martingales, when we arrange them in this way we do know what φ_t is going to be.

We can then say that ψ_t is previsible by construction, because it consists of quantities that are known at time t or have already been shown to be previsible by the Martingale Representation Theorem.

We're thinking here over the time period from t to t + dt. So by saying that φ_t and ψ_t are previsible, we are saying that we can stand at time t and know what they will be across the time period, without having to be at the end of it.

STEP FIVE
On the interval [t, t + dt), hold φ_t shares and ψ_t cash, with φ_t and ψ_t defined as above.

At time t, the portfolio is worth
$$\phi_tS_t + \psi_tB_t.$$
Recall from earlier that B_t and S_t can be written as
$$B_t = e^{rt}, \qquad S_t = e^{rt}D_t.$$
Substituting these in, and using $\psi_t = E_t - \phi_tD_t$ from Step Four, we have
$$
\begin{aligned}
\phi_tS_t + \psi_tB_t &= e^{rt}(\phi_tD_t + \psi_t) \\
&= e^{rt}E_t \\
&= V_t
\end{aligned}
$$
We have shown that the portfolio we set up at time t is worth V_t, the pricing formula we proposed in Step 2.

Now let's see what this portfolio is worth at the end of the time period, t + dt.
$$
\begin{aligned}
\phi_tS_{t+dt} + \psi_tB_{t+dt} &= e^{r(t+dt)}(\phi_tD_{t+dt} + \psi_t) \\
&= e^{r(t+dt)}(\phi_tD_t + \psi_t + \phi_t\,dD_t) \\
&= e^{r(t+dt)}(E_t + dE_t) \\
&= e^{r(t+dt)}E_{t+dt} \\
&= V_{t+dt}
\end{aligned}
$$

So consider the change in V:
$$
\begin{aligned}
dV_t = V_{t+dt} - V_t &= \phi_tS_{t+dt} + \psi_tB_{t+dt} - (\phi_tS_t + \psi_tB_t) \\
&= \phi_t\,dS_t + \psi_t\,dB_t
\end{aligned}
$$

Now we will summarise what we have demonstrated.
On the time period from t to t + dt, we hold φt Shares and ψt Cash.

We showed, with help from the Martingale Representation Theorem, that φt and ψt are previsible.

We also showed that the change in our formula from Step Two is
$$dV_t = \phi_t\,dS_t + \psi_t\,dB_t,$$
so V_t is the value of a Self-Financing portfolio. Finally, we also showed that the boundary condition V_T = X is satisfied, so V_t is the value of a Replicating portfolio.

We can conclude that our proposal in Step Two was a good proposal and that
$$V_t = e^{-r(T-t)}E_Q[X \mid F_t]$$
is indeed the Fair Price to pay at time t for a derivative with a random payoff X at time T.

Differences in Proof for Discrete and Continuous Time
There are subtle differences to parts of the proof that depend on whether Et and Dt are continuous or
discrete processes.

The method of the proof is the same for the first three steps and then diverges slightly.

Additionally, there are also differences in checking that our portfolio is Self-Financing and Replicating.
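In discrete time the same five steps are usually carried out on a binomial tree, with φ given by a ratio of changes rather than dE/dD. The sketch below is a hypothetical one-period version with time step 1. The up and down factors u and d, the call payoff and the function name are illustrative assumptions. It checks that the Step Four holdings are worth V_0 now and reproduce the payoff in both states.

```python
import math


def one_period_five_step(S0, u, d, r, payoff):
    """Hypothetical one-period binomial version of the 5-step method (time step 1)."""
    B1 = math.exp(r)                         # cash bond: B_0 = 1, B_1 = e^r
    S_up, S_down = S0 * u, S0 * d

    # Step 1: choose q so that D_t = S_t e^{-rt} is a Q-martingale:
    # q * S_up / B1 + (1 - q) * S_down / B1 = S0.
    q = (B1 - d) / (u - d)

    # Step 2: proposed price V_0 = e^{-r} E_Q[X].
    X_up, X_down = payoff(S_up), payoff(S_down)
    V0 = (q * X_up + (1 - q) * X_down) / B1

    # Step 3: discounted processes E_t = e^{-rt} V_t and D_t = e^{-rt} S_t.
    E0, E_up, E_down = V0, X_up / B1, X_down / B1
    D0, D_up, D_down = S0, S_up / B1, S_down / B1

    # Step 4: phi = change in E / change in D, psi = E_0 - phi * D_0.
    phi = (E_up - E_down) / (D_up - D_down)
    psi = E0 - phi * D0

    # Step 5: hold phi shares and psi units of cash over [0, 1).
    value_now = phi * S0 + psi * 1.0
    value_up = phi * S_up + psi * B1
    value_down = phi * S_down + psi * B1
    return V0, phi, psi, value_now, (value_up, X_up), (value_down, X_down)


K = 100.0
result = one_period_five_step(100.0, 1.2, 0.9, 0.05, lambda s: max(s - K, 0.0))
print(result)
```

The inputs need d < e^r < u so that q lies strictly between 0 and 1.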

Distribution of the Share Price under Q
In this section, we’re going to be looking into the distribution of the underlying asset at expiry in the
risk-neutral world Q. We will start by looking at the SDE for Geometric Brownian Motion.

$$dS_t = S_t(\mu\,dt + \sigma\,dZ_t), \qquad Z_t \sim N(0, t)$$

Solving this (as done on page 44 of this document) gives the solution
$$S_t = S_0\exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma Z_t\right),$$
which we can write equivalently as the statistical distribution of S_t:
$$S_t \sim \log N\left(\log S_0 + \left(\mu - \frac{\sigma^2}{2}\right)t,\; \sigma^2t\right)$$
From here we can get the expected value of St in the real world P. This result makes sense as we’re saying
the expected value of the Share Price at time t is the Share Price at time 0 accumulated at the drift rate, µ.

$$E_P[S_t] = S_0e^{\mu t}$$

Now we want the distribution of the share price in the risk-neutral world, Q. There, the share price is expected to grow at the same guaranteed rate as cash.

So ideally, we would like to swap µ for r. We can do this using a result that is a Corollary to the Cameron-Martin Girsanov Theorem: we take the SBM in the real world, Z_t, and replace it with the SBM in the risk-neutral world, Z̃_t, minus the market price of risk multiplied by t. Mathematically this is
$$Z_t = \tilde{Z}_t - \frac{\mu - r}{\sigma}t$$
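If we replace Z_t with this and rearrange the algebra, the term −(µ − r)t cancels the excess drift and we get
$$S_t = S_0\exp\left(\left(r - \frac{\sigma^2}{2}\right)t + \sigma\tilde{Z}_t\right)$$
$$S_t \sim_Q \log N\left(\log S_0 + \left(r - \frac{\sigma^2}{2}\right)t,\; \sigma^2t\right)$$
This is from time 0 to t. From t to T, conditional on the information at time t, it is given as
$$S_T \mid F_t \sim_Q \log N\left(\log S_t + \left(r - \frac{\sigma^2}{2}\right)(T-t),\; \sigma^2(T-t)\right)$$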
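A small sketch of this change of measure, with illustrative parameter values and hypothetical function names. It substitutes $Z_t = \tilde{Z}_t - \frac{\mu - r}{\sigma}t$ into the real-world solution and checks that the result matches the risk-neutral formula. It then checks that the mean of the Q-lognormal for S_T given S_t is S_t e^{r(T−t)}.

```python
import math


def price_from_real_world_bm(S0, mu, sigma, t, Z):
    """S_t = S0 exp((mu - sigma^2/2) t + sigma Z_t), driven by the P-Brownian motion Z."""
    return S0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * Z)


def price_from_risk_neutral_bm(S0, r, sigma, t, Z_tilde):
    """S_t = S0 exp((r - sigma^2/2) t + sigma Z~_t), driven by the Q-Brownian motion Z~."""
    return S0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * Z_tilde)


def q_lognormal_params(S_t, r, sigma, tau):
    """(m, v) such that log S_T | F_t ~ N(m, v) under Q, with tau = T - t."""
    return math.log(S_t) + (r - 0.5 * sigma ** 2) * tau, sigma ** 2 * tau


S0, mu, r, sigma, t = 100.0, 0.09, 0.04, 0.3, 2.0  # illustrative inputs only
gamma = (mu - r) / sigma                              # market price of risk

for Z_tilde in (-1.5, 0.0, 0.7):
    Z = Z_tilde - gamma * t                           # the substitution Z_t = Z~_t - gamma t
    print(price_from_real_world_bm(S0, mu, sigma, t, Z),
          price_from_risk_neutral_bm(S0, r, sigma, t, Z_tilde))  # each pair agrees

m, v = q_lognormal_params(S0, r, sigma, t)
print(math.exp(m + 0.5 * v), S0 * math.exp(r * t))  # lognormal mean equals S_t e^{r(T-t)}
```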
2

Deriving the Fair Price of a Forward
We do this by using the result of the 5-step Method.

We start with the statistical distribution of the share price at expiry (as given on the previous page),
$$S_T \mid F_t \sim_Q \log N\left(\log S_t + \left(r - \frac{\sigma^2}{2}\right)(T-t),\; \sigma^2(T-t)\right).$$
Then we use the result of the 5-step method: the fair price for any derivative is
$$V_t = e^{-r(T-t)}E_Q[X \mid F_t]$$

Now we’re deriving the fair price of a Forward which is an agreement to buy the share at a later date
for a price K.

So the random variable payoff for a Forward is
$$X = S_T - K$$
We substitute this in for X and rearrange, and then use the statistical distribution of S_T to get its expected value, $E_Q[S_T \mid F_t] = S_te^{r(T-t)}$:
$$
\begin{aligned}
V_t &= e^{-r(T-t)}E_Q[X \mid F_t] \\
&= e^{-r(T-t)}E_Q[S_T - K \mid F_t] \\
&= e^{-r(T-t)}E_Q[S_T \mid F_t] - Ke^{-r(T-t)} \\
&= e^{-r(T-t)}S_te^{r(T-t)} - Ke^{-r(T-t)} \\
&= S_t - Ke^{-r(T-t)}
\end{aligned}
$$

This final line is the Fair Price at time t of a Forward contract with delivery price K.
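The forward result can be sketched in a few lines, using illustrative numbers and hypothetical function names. The second function recomputes the value directly as e^{−r(T−t)}E_Q[S_T − K] using the mean of the Q-lognormal, so the volatility drops out. Setting V_t = 0 gives the delivery price that makes the contract costless to enter, K = S_t e^{r(T−t)}.

```python
import math


def forward_value(S_t, K, r, tau):
    """Fair value at time t of a forward to buy at K at time T (tau = T - t)."""
    return S_t - K * math.exp(-r * tau)


def forward_value_by_expectation(S_t, K, r, sigma, tau):
    """e^{-r tau} * (E_Q[S_T | F_t] - K), using the Q-lognormal mean exp(m + v/2)."""
    m = math.log(S_t) + (r - 0.5 * sigma ** 2) * tau
    v = sigma ** 2 * tau
    return math.exp(-r * tau) * (math.exp(m + 0.5 * v) - K)


S_t, K, r, sigma, tau = 100.0, 102.0, 0.05, 0.25, 0.5  # illustrative inputs only
print(forward_value(S_t, K, r, tau), forward_value_by_expectation(S_t, K, r, sigma, tau))

K_zero = S_t * math.exp(r * tau)      # delivery price giving V_t = 0
print(forward_value(S_t, K_zero, r, tau))
```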

Deriving the Fair Price of a European Call Option
Again recall the statistical distribution of the share price at expiry,
$$S_T \mid F_t \sim_Q \log N\left(\log S_t + \left(r - \frac{\sigma^2}{2}\right)(T-t),\; \sigma^2(T-t)\right).$$
We start with the result of the 5-step method. We have a call option here, so we replace the RV X with the payoff for a call option, X = max(S_T − K, 0).
$$
\begin{aligned}
V_t &= e^{-r(T-t)}E_Q[X \mid F_t] \\
&= e^{-r(T-t)}E_Q[\max(S_T - K, 0) \mid F_t]
\end{aligned}
$$

To get the expected value of a function of a RV, we integrate the function multiplied by the PDF of the random variable across the range of that random variable. The range of a lognormal distribution is from 0 to infinity. So, writing f(S) for the Q-density of S_T given F_t,
$$
\begin{aligned}
V_t &= e^{-r(T-t)}\int_0^\infty \max(S - K, 0)f(S)\,dS \\
&= e^{-r(T-t)}\int_K^\infty (S - K)f(S)\,dS \\
&= e^{-r(T-t)}\int_K^\infty Sf(S)\,dS - Ke^{-r(T-t)}\int_K^\infty f(S)\,dS
\end{aligned}
$$

We will look at solving the first of these two integrals. We are going to use the formula on page 18 of the
Tables for the Truncated Moments of a LogNormal Distribution.


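$$
\begin{aligned}
\int_K^\infty Sf(S)\,dS &= \exp\left(\log S_t + \left(r - \frac{\sigma^2}{2}\right)(T-t) + \frac{1}{2}\sigma^2(T-t)\right) \\
&\quad\times\left[\Phi(\infty) - \Phi\left(\frac{\log K - \log S_t - \left(r - \frac{\sigma^2}{2}\right)(T-t)}{\sigma\sqrt{T-t}} - \sigma\sqrt{T-t}\right)\right] \\
&= S_te^{r(T-t)}\left[1 - \Phi\left(\frac{\log K - \log S_t - \left(r + \frac{\sigma^2}{2}\right)(T-t)}{\sigma\sqrt{T-t}}\right)\right] \\
&= S_te^{r(T-t)}\left[1 - \Phi(-d_1)\right]
\end{aligned}
$$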

By d1 we mean the value of d1 from Black-Scholes formula.

Additionally note that 1 − Φ(−d1 ) = Φ(d1 ).

Now substituting the value of this integral back into our equation for the fair price of the derivative, we obtain
$$V_t = e^{-r(T-t)}\left[S_te^{r(T-t)}\Phi(d_1)\right] - Ke^{-r(T-t)}\int_K^\infty f(S)\,dS$$
If you follow through the second integral in exactly the same way (using the truncated moment formula with power zero), you get $\int_K^\infty f(S)\,dS = 1 - \Phi(-d_2) = \Phi(d_2)$, so
$$V_t = e^{-r(T-t)}\left[S_te^{r(T-t)}\Phi(d_1)\right] - Ke^{-r(T-t)}\Phi(d_2)$$
where d2 is as defined on page 47 of the Tables.

To conclude, the fair price for a European Call Option is given by
$$V_t = S_t\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2)$$
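To see the two truncated integrals at work, the sketch below evaluates them numerically with a simple midpoint rule in y = log S, where the Q-lognormal density becomes a normal density. It compares them with S_t e^{r(T−t)}Φ(d1) and Φ(d2), and then compares the resulting price with the closed form. The inputs, grid size and upper cut-off are illustrative assumptions. The integration is a basic quadrature choice made for this example, not a method from the Tables.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Closed-form Black-Scholes call price, tau = T - t."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def truncated_integrals(S, K, r, sigma, tau, n=200_000):
    """Midpoint-rule estimates of int_K^inf s f(s) ds and int_K^inf f(s) ds,
    where f is the Q-lognormal density of S_T given S_t = S.
    Works in y = log s, where the density becomes a normal N(m, sd^2) density."""
    m = math.log(S) + (r - 0.5 * sigma ** 2) * tau
    sd = sigma * math.sqrt(tau)
    lo = math.log(K)
    hi = max(lo, m + sd * sd) + 12.0 * sd    # the tail beyond this is negligible
    h = (hi - lo) / n
    first, zeroth = 0.0, 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        dens = math.exp(-0.5 * ((y - m) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        first += math.exp(y) * dens
        zeroth += dens
    return first * h, zeroth * h


S, K, r, sigma, tau = 100.0, 105.0, 0.04, 0.2, 1.0  # illustrative inputs only
I1, I0 = truncated_integrals(S, K, r, sigma, tau)
price = math.exp(-r * tau) * (I1 - K * I0)
print(price, bs_call(S, K, r, sigma, tau))
```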

5-Step Method vs PDE Approach to Derivation of Black-Scholes
We can prove the Black-Scholes Call Option Pricing formula in two different ways. We can either use the
PDE approach and check that it satisfies the Black-Scholes PDE and the boundary conditions or we can use
the result of the 5-step method to derive it. We need to be aware of the Advantages and Disadvantages
of both methods.

Most people would argue that the PDE derivation is much easier than using the 5-step method. However, the main problem with the PDE derivation is that we have to guess the answer. We don't start with our assumptions and derive the result; we start with the result and check that it satisfies a PDE. We are left with no explanation of where it comes from. Another problem with the Black-Scholes PDE approach is that it is derived assuming the underlying asset follows Geometric Brownian Motion, so we fix a statistical distribution very early on.

The advantages and disadvantages of using the 5-step result mirror those of the PDE approach. It is mathematically harder, but we derive the call option pricing formula rather than guessing the answer, and the statistical distribution isn't fixed straight away (although at a later stage we do fix Geometric Brownian Motion).

16 The Term Structure of Interest Rates
The Yield Curve, Spot and Forward Rates
We will start with the price of a Zero-Coupon Bond.

With a ZCB you receive 1 in τ years' time, with no other payments in the meantime. So the price to pay for this now is the present value of 1, which is
$$B(\tau) = e^{-\tau s(\tau)}$$
where s(τ) is the average force of interest over the period from 0 to τ. This average force of interest is known as the spot yield.

By rearranging this we can see that the spot yield, s(τ), is equal to
$$s(\tau) = \frac{-\log B(\tau)}{\tau}$$
A forward yield is slightly different. The h-year forward yield applicable at time τ, denoted F(τ, h), is the rate of interest over the period from τ to τ + h.

We also sometimes talk about THE forward yield, which implies some kind of uniqueness. For this we take the limit of the h-year forward yield as h → 0. You can think of this as the rate of interest you'd be offered if you placed your money in the bank in 10 years' time for one hour, or perhaps overnight.

We can derive a formula for it by considering the prices of Zero-Coupon bonds:
$$\frac{B(\tau + h)}{B(\tau)} = e^{-hF(\tau, h)}$$
Rearranging this to get the forward yield, we get
$$-F(\tau, h) = \frac{\log B(\tau + h) - \log B(\tau)}{h}$$
Now we consider the limit of this as h → 0:
$$f(\tau) = -\frac{d}{d\tau}\log B(\tau)$$

So we have a relationship between the ZCB price and the spot yield, and a relationship between the ZCB price and the forward yield. Carrying on further, we can define a relationship between the spot yield and the forward yield like so:
$$f(\tau) = -\frac{d}{d\tau}\log\left(e^{-\tau s(\tau)}\right) = \frac{d}{d\tau}\left(\tau s(\tau)\right) = \tau s'(\tau) + s(\tau)$$
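These relationships can be checked numerically on any smooth ZCB curve. The sketch below uses a hypothetical curve, B(τ) = exp(−(0.03τ + 0.001τ²)), chosen only because its spot yield s(τ) = 0.03 + 0.001τ and forward yield f(τ) = 0.03 + 0.002τ are easy to verify by hand. The derivative step sizes are illustrative assumptions.

```python
import math


def zcb_price(tau):
    """Hypothetical ZCB curve B(tau) = exp(-(0.03 tau + 0.001 tau^2)), for illustration only."""
    return math.exp(-(0.03 * tau + 0.001 * tau ** 2))


def spot_yield(B, tau):
    """s(tau) = -log B(tau) / tau."""
    return -math.log(B(tau)) / tau


def forward_yield(B, tau, h):
    """h-year forward yield F(tau, h), from B(tau + h) / B(tau) = exp(-h F(tau, h))."""
    return -(math.log(B(tau + h)) - math.log(B(tau))) / h


def instantaneous_forward(B, tau, eps=1e-6):
    """f(tau) = -d/dtau log B(tau), estimated by a central difference."""
    return -(math.log(B(tau + eps)) - math.log(B(tau - eps))) / (2.0 * eps)


tau = 5.0
s = spot_yield(zcb_price, tau)
s_prime = (spot_yield(zcb_price, tau + 1e-6) - spot_yield(zcb_price, tau - 1e-6)) / 2e-6
f = instantaneous_forward(zcb_price, tau)
print(s, f, tau * s_prime + s)                  # f(tau) = tau s'(tau) + s(tau)
for h in (1.0, 0.1, 0.001):
    print(h, forward_yield(zcb_price, tau, h))  # approaches f(tau) as h -> 0
```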
Two more definitions to be aware of are the short rate and the long forward yield.

The short rate is denoted by r, and is defined mathematically as
$$r = \lim_{\tau \to 0} s(\tau) = \lim_{\tau \to 0} f(\tau)$$
The long forward yield is denoted by L, and is defined mathematically as
$$L = \lim_{\tau \to \infty} s(\tau) = \lim_{\tau \to \infty} f(\tau)$$
Let us now summarise all these definitions. Consider a yield curve, showing the yields on ZCBs at different terms τ. Usually when we draw a yield curve we're thinking about the spot yield; on the same graph we can also plot the forward yield.

As the term of the bond, τ , tends towards 0, both the spot yield and the forward yield approach the short
rate, r. As the term, τ tends to infinity, both the spot yield and the forward yield approach a quantity
known as the Long Forward Yield, L.

In reality the long end of the yield curve in the UK for example would be about 50 years.

Both the formula for the spot yield and the formula for the forward yield are on page 44 of the Tables. It is reasonably straightforward to derive the relationship between them, as seen above.

Deriving s(t) and B(t) from f(t)
You might be told that the forward yield equals the following, from this you may then be asked to calculate
the price of a zero-coupon bond.

We are given,

$$f(\tau) = e^{-\alpha\tau}r + \left(1 - e^{-\alpha\tau}\right)L$$


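We also know that the price of a ZCB given the forward rate can be calculated by
$$
\begin{aligned}
B(\tau) &= \exp\left(-\int_0^\tau f(s)\,ds\right) \\
&= \exp\left(-\int_0^\tau \left[e^{-\alpha s}r + \left(1 - e^{-\alpha s}\right)L\right]ds\right) \\
&= \exp\left(-\left[\frac{e^{-\alpha s}}{-\alpha}r + \left(s - \frac{e^{-\alpha s}}{-\alpha}\right)L\right]_0^\tau\right) \\
&= \exp\left(-\frac{1 - e^{-\alpha\tau}}{\alpha}r - \left(\tau - \frac{1 - e^{-\alpha\tau}}{\alpha}\right)L\right)
\end{aligned}
$$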
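From this, we can go from the ZCB price to the spot yield (and you may well be asked to do this) by using
$$s(\tau) = -\frac{\ln B(\tau)}{\tau}$$
The log and the exponential cancel, as do the minus signs, giving
$$s(\tau) = \frac{1 - e^{-\alpha\tau}}{\alpha\tau}r + \left(1 - \frac{1 - e^{-\alpha\tau}}{\alpha\tau}\right)L.$$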

We can be given any one of these three things and be asked to find
the value of the other two in the exam, using these relationships.
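As a check on the algebra, the sketch below computes B(τ) both from the closed form above and by numerically integrating the given forward curve, and then converts to the spot yield. The values of r, L and α are illustrative assumptions, and the midpoint-rule integration is just a convenient choice for the check.

```python
import math


def forward_curve(tau, r, L, alpha):
    """f(tau) = e^{-alpha tau} r + (1 - e^{-alpha tau}) L."""
    return math.exp(-alpha * tau) * r + (1.0 - math.exp(-alpha * tau)) * L


def zcb_closed_form(tau, r, L, alpha):
    """B(tau) from the integrated forward curve derived above."""
    a = (1.0 - math.exp(-alpha * tau)) / alpha
    return math.exp(-a * r - (tau - a) * L)


def zcb_by_quadrature(tau, r, L, alpha, n=10_000):
    """B(tau) = exp(-int_0^tau f(s) ds), with the integral done by the midpoint rule."""
    h = tau / n
    integral = h * sum(forward_curve((i + 0.5) * h, r, L, alpha) for i in range(n))
    return math.exp(-integral)


def spot_from_zcb(tau, B):
    """s(tau) = -ln B(tau) / tau."""
    return -math.log(B) / tau


r, L, alpha = 0.03, 0.05, 0.5  # illustrative short rate, long forward yield and speed
for tau in (1.0, 10.0, 50.0):
    B = zcb_closed_form(tau, r, L, alpha)
    print(tau, B, zcb_by_quadrature(tau, r, L, alpha), spot_from_zcb(tau, B))
```

The printed spot yields move from near r at short terms towards L at long terms, as the limits in the definitions require.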

Desirable Features of Interest Rate Models
There are EIGHT desirable features for an interest rate model given in the core reading which you should
memorise for the exam.

A good interest rate model should...

• be arbitrage free
• not allow negative interest rates
• exhibit mean reversion
• be mathematically tractable
• exhibit realistic interest rate dynamics
• fit historic market data
• fit current market data
• help us value interest rate derivatives
MR MAN HCV

One-Factor Models: Vasicek, Cox-Ingersoll-Ross, Hull-White
Now we will look at one-factor models of the term structure of interest rates. We will look at what a one-
factor model is, the various types of one-factor models and we will look at how the 3 types of one-factor
model given in the core reading measure up against the desirable objectives stated on the previous page.

Consider a yield curve showing yields on zero-coupon bonds at different terms. The far left of the yield curve is the short rate, r, which is very similar to the base rate.

Over time, the short rate can change. So we’re going to try and build a model where the short rate, rt ,
moves about randomly over time.

The other end of the yield curve is called the long forward yield. We’re going to assume for our 1-factor
models that this is fixed.

There is only one source of randomness here, namely the short rate, r_t. As such, this is called a One-factor Model for the term structure of interest rates.

All we need now is some kind of process that will help us model r_t randomly in continuous time. In CT8, we have exactly the tools we need to do this, namely diffusion processes.
We are going to say that
$$dr_t = \alpha(r_t, t)\,dt + \sigma(r_t, t)\,dZ_t$$

Let's now look at possible diffusion processes for r_t.

We might try General Brownian Motion so we have a constant drift and a constant volatility,

$$dr_t = \mu\,dt + \sigma\,dZ_t$$

One problem with this would be that it would allow interest rates to become negative.

We might try Geometric Brownian Motion which would prevent the negative interest rates,

$$dr_t = \mu r_t\,dt + \sigma r_t\,dZ_t$$

However, a problem with this is that there is no mean reversion. If µ were positive here we would always expect the short rate to go up, which isn't realistic.

We have the Vasicek Model based on the Ornstein-Uhlenbeck SDE,

$$dr_t = \alpha(\mu - r_t)\,dt + \sigma\,dZ_t.$$

Here you can see some mean reversion. When r_t is below µ the drift is positive, and when it is above µ the drift is negative. So we always expect interest rates to move towards µ.

We also have the Cox-Ingersoll-Ross (CIR) Model. This is very similar to the Vasicek model, except that we have a $\sqrt{r_t}$ term in the volatility coefficient:
$$dr_t = \alpha(\mu - r_t)\,dt + \sigma\sqrt{r_t}\,dZ_t$$

Also we have the Hull-White Model. This is also very similar to the Vasicek model but instead of a
constant µ, we have a deterministic function of t, µ(t),

$$dr_t = \alpha\left[\mu(t) - r_t\right]dt + \sigma\,dZ_t$$

The three main one-factor models which are studied in the notes are the:

• Vasicek Model
• Cox-Ingersoll-Ross (CIR) Model
• Hull-White Model
All three of these Stochastic Differential Equations are on page 46 of the Tables.
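To illustrate mean reversion, here is a minimal Euler-scheme sketch of the Vasicek and CIR SDEs. All parameter values, the seed, the step count and the function names are illustrative assumptions. For both models the expected short rate at time T is µ + (r_0 − µ)e^{−αT}, so the simulated averages should sit close to that value. In the CIR step the square root is taken of max(r, 0), which is a choice made in this sketch so that the discrete step is always defined.

```python
import math
import random


def simulate_terminal_rate(model, r0, alpha, mu, sigma, T, n_steps, rng):
    """Euler scheme for dr = alpha (mu - r) dt + vol dZ, returning r_T.

    vol = sigma for Vasicek and sigma * sqrt(r) for CIR; the CIR square root is
    taken of max(r, 0) so that the discrete step is always defined."""
    dt = T / n_steps
    r = r0
    for _ in range(n_steps):
        dz = rng.gauss(0.0, math.sqrt(dt))
        if model == "vasicek":
            vol = sigma
        elif model == "cir":
            vol = sigma * math.sqrt(max(r, 0.0))
        else:
            raise ValueError("model must be 'vasicek' or 'cir'")
        r += alpha * (mu - r) * dt + vol * dz
    return r


def sample_terminal_rates(model, sigma, n_paths=2000, seed=42,
                          r0=0.02, alpha=0.5, mu=0.05, T=5.0, n_steps=250):
    rng = random.Random(seed)
    return [simulate_terminal_rate(model, r0, alpha, mu, sigma, T, n_steps, rng)
            for _ in range(n_paths)]


# Illustrative parameters: both models revert from r0 = 2% towards mu = 5%.
theory_mean = 0.05 + (0.02 - 0.05) * math.exp(-0.5 * 5.0)
vasicek = sample_terminal_rates("vasicek", sigma=0.01)
cir = sample_terminal_rates("cir", sigma=0.05)
print(theory_mean, sum(vasicek) / len(vasicek), sum(cir) / len(cir))
print(min(vasicek), min(cir))  # Vasicek rates are Gaussian, so negative values are possible in principle
```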

Let's think about how these one-factor models fit in with the desirable features of interest rate models that we listed earlier.

The CIR model makes sure that we do not allow negative interest rates, through the inclusion of the $\sqrt{r_t}$ term in its SDE above.

All three of the models are mean-reverting, and it is worth noting that the Hull-White model mean-reverts to a moving long-run mean.

Ideally we would like the models to be mathematically tractable. The Vasicek model is the most tractable, whereas the CIR model is the least because of the $\sqrt{r_t}$ term. So the CIR Model doesn't allow for negative interest rates, but it does this at the expense of tractability.

Interest rate models must have realistic interest rate dynamics. There are a lot of things that can be said here for one-factor models; the most important would be:

• All three models have a constant σ. There is lots of empirical evidence suggesting that σ should not be constant. Normally in times of economic uncertainty we would expect volatility, σ, to be a lot greater.
• One-factor models take the yields on long- and short-term bonds to be perfectly positively correlated, since a single source of randomness drives them all. This isn't realistic: thinking about supply and demand, sometimes the demand for long-term bonds increases, so yields go up at the short end and down at the long end.

We would like the models to fit current and historic market data. Neither the Vasicek nor the CIR model is very good at this, and both require constant reparameterisation. The Hull-White model stands the best chance, because it has the moving long-run mean component, µ(t), that we can basically throw lots of parameters into.

Finally, we would like a good model to help us value interest rate derivatives. These are one-factor models, so they won't be very flexible when it comes to valuing interest rate swaps, but they would otherwise be sufficient.

Two-Factor Models
Consider again a yield curve showing the spot rate over different terms of zero-coupon bonds. The far left of it is called the short rate, r_t, as defined for one-factor models, and that short rate varies over time.

The far right of the yield curve is the long forward yield. Now with a one-factor model, this was fixed. It
was called a one-factor model as there was only one source of randomness namely the short rate, rt ,
which moves over time.

One possibility for a two-factor model is rather than the long forward yield being fixed, we could allow it to
also move about randomly over time and define this as Lt .

So with two-factor models we would have far more flexibility in terms of the number of different yield curve shapes that we can get from our model.

Unfortunately, with more flexibility comes increased difficulty. A two-factor model is going to be far less tractable.

The only two-factor model mentioned in the course notes is the Two-Factor Vasicek Model.

There isn’t much in the course notes about this model but what there is has been examined
previously so make sure you know this bit of core reading.

Relationships Between Rates and Bond Prices

17 Credit Risk
Definitions of Default, Credit Events and Recovery Rate

A breach might be, for example, failing to maintain adequate income and asset cover ratios. It isn't necessarily the case that if a debtor defaults then the investor receives nothing. Default can have many outcomes; for example, the payment stream may be:

• rescheduled
• cancelled by the payment of an amount that is less than the default-free value of the
original contract.
• continued but at a reduced rate
• totally wiped out

3 Types of Credit Risk Model
Now we will look at three different types of credit models. These are Structural Models,
Reduced-form Models and Intensity-based Models.

Structural models look at the debt and equity of a company and link the credit risk of a company to how well the company is doing. In their favour, they are simple and they give an insight into the nature of default and the interaction between debt and equity holders. However, they cannot realistically be used to price credit risk. The Merton Model is an example of a structural model.

Reduced-form models look at Market Statistics rather than specific data from the company
itself. The most commonly used market statistics are Credit ratings on bonds issued
by the company. Credit ratings are issued by agencies such as Moody’s. A reduced-form
model looks at movements in credit ratings on bonds over time. The output of the model
would be a statistical distribution of the time until default.

Intensity-based models are a subset of reduced-form models. They are continuous-time multi-state models where the states are usually credit ratings and the jumps between states are governed by transition intensities. Two examples of intensity-based models are the Two-state Model and the Jarrow-Lando-Turnbull (JLT) Model.

The Merton Model
We will now look at a specific example of a structural model known as the Merton model.

We will start by assuming that a company has some Zero-coupon debt. We are at time t and the debt is due for redemption at time T for an amount L. The company also has some equity capital, and together with the debt capital this gives the total assets of the company, F(t).

Now let's fast forward to time T. If everything has gone well for the company, the total assets, F(T), should have grown and should be more than sufficient to repay the debt amount L.

If things have gone badly for the company, its total assets, F(T), may well have fallen below the level L needed to repay the debt, in which case the Company will Default on the debt.


In this situation the debt holders will get something back less than L and the equity holders get nothing
back as they rank behind debt-holders.

Therefore at time T we can say that the Shareholders' Funds are worth $\max(F(T) - L, 0)$, whereas the bondholders will receive $\min(F(T), L) = L - \max(L - F(T), 0)$.
So far we have looked at what the shareholders' and bondholders' funds are worth at time T. But what about at time t?

We recognise that the expression used for the Shareholders funds is the same as the payoff on European
Call Options with Strike Price L and expiry time T . Note that the underlying here is not the shares of the
company but the total assets of the company.

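Therefore we could value the shareholders' funds at time t as a call option with price c_t, and use the Black-Scholes formula to calculate this as follows:
$$c_t = F(t)\Phi(d_1) - Le^{-r(T-t)}\Phi(d_2)$$
where d1 and d2 are calculated with F(t) in place of S_t, L in place of K, and the volatility of the company's total assets.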

For bondholders, max(L − F(T), 0) is the payoff on a European Put Option with Strike Price L and expiry at time T.

Therefore in the risk-neutral world we can value the bondholders' funds at time t as
$$Le^{-r(T-t)} - p_t$$
where we can calculate p_t by the Put-Call Parity result or by using the Black-Scholes formula directly,
$$p_t = Le^{-r(T-t)}\Phi(-d_2) - F(t)\Phi(-d_1)$$

As well as being used to find out the risk-neutral value of the debt, the Merton model can also be used to
calculate the risk-neutral Probability of Default.

The company will default if F(T) < L. The risk-neutral probability of this is
$$P(F(T) < L \mid F(t))$$
This is the same as $1 - P(F(T) > L \mid F(t))$.


The probability that F(T) > L is given by Φ(d2). To see why, remember that we can think of the call option pricing formula as being the PV of what we get multiplied by the probability we get it, less the PV of what we pay multiplied by the probability we pay it.

The probability we pay it is the same as the probability that we would exercise the option and we would
only do so if it were in-the-money, i.e. if F (T ) > L.

So the risk-neutral probability of default will be given by,

$$P(F(T) < L \mid F(t)) = 1 - \Phi(d_2)$$

The Merton model can also be used to work out the credit spread on a bond.

The credit spread is the difference between the Gross Redemption Yield on the bond and the Gross
Redemption Yield on Government Debt.

The GRY on the bond can be worked out by equating the value of the bond-holders funds with the present
value of L and solving for the interest rate.
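Here is a minimal Merton-model sketch in Python with illustrative numbers. `sigma_F` (the volatility of the company's total assets) and the function name are assumptions introduced here. For simplicity, the credit spread is computed from continuously compounded yields on the zero-coupon debt, as a stand-in for the gross redemption yields in the text, and the government yield is taken to be r.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def merton(F, L, r, sigma_F, tau):
    """Merton model with total assets F = F(t), zero-coupon debt L due at T, tau = T - t."""
    d1 = (math.log(F / L) + (r + 0.5 * sigma_F ** 2) * tau) / (sigma_F * math.sqrt(tau))
    d2 = d1 - sigma_F * math.sqrt(tau)
    equity = F * norm_cdf(d1) - L * math.exp(-r * tau) * norm_cdf(d2)    # call on F, strike L
    put = L * math.exp(-r * tau) * norm_cdf(-d2) - F * norm_cdf(-d1)     # put on F, strike L
    debt = L * math.exp(-r * tau) - put                                  # bondholders' funds
    prob_default_q = 1.0 - norm_cdf(d2)                                  # Q-probability F(T) < L
    debt_yield = math.log(L / debt) / tau                                # solves debt = L e^{-y tau}
    spread = debt_yield - r                                              # over the risk-free yield r
    return equity, debt, prob_default_q, spread


equity, debt, pd_q, spread = merton(F=120.0, L=100.0, r=0.04, sigma_F=0.3, tau=3.0)
print(equity, debt, pd_q, spread)
```

As a consistency check, the equity and debt values add back up to the total assets F(t).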

The Two-State Model
Now we will look at the two-state model which is an example of an intensity based credit risk model.

Just as a person can be alive or dead, a bond can be not defaulted, N, or defaulted, D. λ(t) is the transition
intensity between the two states and here we’re going to assume that it is a deterministic function of t.

We're going to investigate the probability of a bond not defaulting:
$$
\begin{aligned}
{}_{t+h}p_0^{NN} &= {}_tp_0^{NN}\times{}_hp_t^{NN} \\
&= {}_tp_0^{NN}\times\left[1 - h\lambda(t)\right] + o(h)
\end{aligned}
$$

Here, we have replaced the probability that the bond doesn't default for a further h years by 1 minus the probability that it does, and we have used hλ(t) to approximate the probability of default. This approximation is only good over very small time periods, so we assume h is very small. The o(h) term collects the difference between the approximation and the actual probability; it tends to zero faster than h, in the sense that o(h)/h → 0 as h → 0.

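Rearranging to form a derivative on the LHS, we obtain
$$\frac{{}_{t+h}p_0^{NN} - {}_tp_0^{NN}}{h} = -{}_tp_0^{NN}\lambda(t) + \frac{o(h)}{h}$$
Now let h → 0:
$$\frac{d}{dt}\,{}_tp_0^{NN} = -{}_tp_0^{NN}\lambda(t)$$
$$\frac{\frac{d}{dt}\,{}_tp_0^{NN}}{{}_tp_0^{NN}} = -\lambda(t)$$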

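We now integrate both sides with respect to t with limits 0 and n:
$$\int_0^n \frac{\frac{d}{dt}\,{}_tp_0^{NN}}{{}_tp_0^{NN}}\,dt = -\int_0^n \lambda(t)\,dt$$
$$\log{}_np_0^{NN} - \log{}_0p_0^{NN} = -\int_0^n \lambda(t)\,dt$$
Since ${}_0p_0^{NN} = 1$, its log is zero, and so
$${}_np_0^{NN} = \exp\left(-\int_0^n \lambda(t)\,dt\right)$$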
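The derivation above can be mimicked numerically. Start from p = 1 and repeatedly apply the one-step relation p(t + h) ≈ p(t)(1 − hλ(t)). As h shrinks, the result approaches exp(−∫λ). The linear intensity λ(t) = a + bt and all the values below are hypothetical, chosen only so that the exact integral is easy to write down.

```python
import math


def survival_by_small_steps(lam, n, steps):
    """Repeatedly apply p(t + h) = p(t) * (1 - h * lam(t)), starting from p(0) = 1."""
    h = n / steps
    p = 1.0
    for i in range(steps):
        p *= 1.0 - h * lam(i * h)
    return p


def survival_exact_linear(a, b, n):
    """exp(-int_0^n (a + b t) dt) for the hypothetical intensity lam(t) = a + b t."""
    return math.exp(-(a * n + 0.5 * b * n ** 2))


a, b, n = 0.02, 0.005, 10.0   # illustrative intensity parameters and horizon


def lam(t):
    return a + b * t


exact = survival_exact_linear(a, b, n)
for steps in (10, 1_000, 100_000):
    print(steps, survival_by_small_steps(lam, n, steps), exact)  # converges to exact as h -> 0
```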

We are going to use the probability of a bond not defaulting to help us price a corporate bond. In particular, we are considering a Corporate Zero-coupon Bond that is due to pay an amount 1 at time T. We are going to do this pricing in the risk-neutral world, so we know that we can discount at the risk-free force of interest.

To do this we also need risk-neutral transition intensities. If λ(t) is the transition intensity in the real world, we can use the idea of equivalent probability measures to say that there will be a corresponding intensity in the risk-neutral world, which we denote by λ̃(t).

Here is the pricing formula that we are going to use to price a Zero-coupon bond that is going to pay an amount 1 at time T:
$$B(t, T) = e^{-r(T-t)}\left\{1 - (1 - \delta)\left(1 - e^{-\int_t^T \tilde{\lambda}(s)\,ds}\right)\right\}.$$

The formula looks pretty horrific until we start breaking it down.

B(t, T ) is the price of the Zero-coupon bond at time t.

The formula in the red oval below is the one that we derived in the previous slide. But now we are using
risk-neutral intensities.

That means that the green oval must be the risk-neutral probability that the bond does default.

If the bond defaults we are going to assume that we can recover a proportion δ. So we aren’t going to
recover the proportion (1 − δ).

So if we multiply the proportion of payment lost by the probability of losing it we’re going to get the ex-
pected default loss in the risk-neutral world.

The expected payoff overall would be 1 minus this, signified by the black oval in the figure above.

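We said at the start that $\lambda(t)$ is a deterministic function of $t$. If it were stochastic, we would amend the formula by replacing the survival probability $e^{-\int_t^T \tilde{\lambda}(s)\,ds}$ with its conditional risk-neutral expectation $E_Q\left[e^{-\int_t^T \tilde{\lambda}(s)\,ds} \mid \mathcal{F}_t\right]$.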

The Jarrow-Lando-Turnbull Model
Finally we will look at a very simple example of the Jarrow-Lando-Turnbull (JLT) Model. This is a type of
intensity-based credit risk model.

The JLT model is really just an extension of the two-state model to allow for more credit ratings, not just
the two states defaulted and not-defaulted.

Therefore the JLT model is more realistic than the two-state model.

We will look at a very simple example with just 3 credit ratings. Below is a transition diagram of the three states and the intensities between them. You may recall that it looks very similar to the healthy-sick-dead (HSD) model from CT4.

$\rho$, $\sigma$, $\mu$ and $\nu$ are the transition intensities for moving from one state to another. In this example we assume that they are constant, but they need not be: they could be deterministic functions of time or even stochastic processes.

We assume that these are risk-neutral transition intensities, because later we will price a zero-coupon bond (ZCB) using the risk-neutral pricing formula. We put a tilde on the intensities to indicate this.

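Next we define $\tilde{\Lambda}$ to be the 3 × 3 matrix of the constant transition intensities. Its off-diagonal entries are the intensities between states, and each diagonal entry is minus the sum of the other entries in its row, so every row sums to zero.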

We also need the matrix of probabilities of moving between states between times $t$ and $T$; we denote this matrix by $\tilde{\Pi}(t, T)$:

$$\tilde{\Pi}(t,T) = \exp\left(\int_t^T \tilde{\Lambda}\,ds\right)$$
The formula that we have here is analogous to the formula that we had for the probability of not defaulting
in the two-state model. To integrate a matrix, you integrate each element in turn. As the transition
intensities are constant we deduce that,

$$\tilde{\Pi}(t,T) = \exp\left\{\tilde{\Lambda}(T-t)\right\}.$$

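We can take this one step further by evaluating the matrix exponential with its Taylor series expansion:

$$\exp\left\{\tilde{\Lambda}(T-t)\right\} = I + \tilde{\Lambda}(T-t) + \frac{\tilde{\Lambda}^2(T-t)^2}{2!} + \frac{\tilde{\Lambda}^3(T-t)^3}{3!} + \cdots$$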

Each term in the expansion is a 3 × 3 matrix, where $I$ denotes the 3 × 3 identity matrix.

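The end result is a 3 × 3 matrix whose $(i, j)$ entry $\tilde{p}_{i,j}$ is the risk-neutral probability of being in state $j$ at time $T$, given state $i$ at time $t$. These probabilities are not constants: they depend on $(T - t)$.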
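To see the Taylor series at work, the sketch below sums the truncated series $I + \tilde{\Lambda}(T-t) + \tilde{\Lambda}^2(T-t)^2/2! + \cdots$ in plain Python. The numerical intensities, and the assignment of $\sigma$, $\rho$, $\mu$, $\nu$ to the transitions 1→2, 2→1, 1→3 and 2→3 (borrowed from the usual HSD labelling), are assumptions, because the transition diagram is not reproduced here; state 3 is taken to be the absorbing default state. The helper names are hypothetical, and in practice a library routine such as `scipy.linalg.expm` would normally be used instead of a hand-rolled series.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_exp_taylor(L, tau, terms=30):
    """Approximate exp(L * tau) = I + L tau + (L tau)^2/2! + ... (truncated series)."""
    n = len(L)
    Lt = [[L[i][j] * tau for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # I
    term = [row[:] for row in result]  # current term (L tau)^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_mul(term, Lt)                   # (L tau)^k / (k-1)!
        term = [[x / k for x in row] for row in term]  # divide by k to get / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Hypothetical constant risk-neutral intensities (HSD-style labelling assumed).
sigma, rho, mu, nu = 0.10, 0.05, 0.01, 0.04
Lambda = [
    [-(sigma + mu), sigma, mu],   # state 1: A
    [rho, -(rho + nu), nu],       # state 2: intermediate rating
    [0.0, 0.0, 0.0],              # state 3: default (absorbing)
]
Pi = mat_exp_taylor(Lambda, tau=5.0)  # T - t = 5 years
for row in Pi:
    print([round(x, 4) for x in row])
```

Each row of the result should sum to 1, since from any starting state the process must be in some state at time $T$, and the last row stays at $(0, 0, 1)$ because default is absorbing.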

We will now look at how this will help us price a ZCB that is due to pay an amount of 1 at time T .

Suppose we are currently in state A (state 1) and that default occurs if the credit rating falls to BB (state 3). The risk-neutral probability of this happening by time $T$ can be read from the matrix $\tilde{\Pi}(t, T)$: it is the entry $\tilde{p}_{1,3}$.

We denote the price of the bond by $B(t, T, X(t))$, where $X(t)$ denotes the state that we are currently in.

We have seen that the risk-neutral probability of default is $\tilde{p}_{1,3}$. If the bond defaults, we expect to recover a proportion $\delta$ of the amount of 1, so $(1 - \delta)$ is the proportion lost. Multiplying this by the probability of default gives the expected default loss.

Subtracting this from 1 gives the expected amount received.

Finally to get the price of the ZCB we discount at the risk-free force of interest.

The full equation is thus given as,

$$B(t, T, X(t)) = e^{-r(T-t)}\left\{1 - (1-\delta)\,\tilde{p}_{1,3}\right\}$$
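The JLT price can be sketched as below. Rather than repeating the Taylor series, this version obtains the first row of $\tilde{\Pi}(t, T)$ by solving the Kolmogorov forward equations $\frac{d}{dT}\tilde{\Pi}(t,T) = \tilde{\Pi}(t,T)\,\tilde{\Lambda}$ numerically with a classical Runge-Kutta (RK4) scheme; the matrix exponential is the exact solution of these equations, so the two routes should agree. The function names, the HSD-style assignment of the intensities and all numerical values are hypothetical. Python indexes from 0, so state A (state 1) is index 0 and BB (state 3) is index 2.

```python
import math


def forward_row(Lambda, start_state, tau, steps=2_000):
    """Solve dp/ds = p Lambda with p(0) = unit vector at start_state, by RK4.

    Returns p(tau), i.e. one row of Pi(t, t + tau).
    """
    n = len(Lambda)

    def deriv(p):
        # Row vector times matrix: (p Lambda)_j = sum_i p_i * Lambda_ij
        return [sum(p[i] * Lambda[i][j] for i in range(n)) for j in range(n)]

    p = [1.0 if j == start_state else 0.0 for j in range(n)]
    h = tau / steps
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([p[j] + 0.5 * h * k1[j] for j in range(n)])
        k3 = deriv([p[j] + 0.5 * h * k2[j] for j in range(n)])
        k4 = deriv([p[j] + h * k3[j] for j in range(n)])
        p = [p[j] + h / 6.0 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j]) for j in range(n)]
    return p


def jlt_zcb_price(Lambda, tau, r, delta, start_state=0, default_state=2):
    """B(t, T, X(t)) = exp(-r tau) * {1 - (1 - delta) * p~_{X(t), default}}, tau = T - t."""
    p_default = forward_row(Lambda, start_state, tau)[default_state]
    return math.exp(-r * tau) * (1.0 - (1.0 - delta) * p_default)


sigma, rho, mu, nu = 0.10, 0.05, 0.01, 0.04  # hypothetical intensities
Lambda = [[-(sigma + mu), sigma, mu],
          [rho, -(rho + nu), nu],
          [0.0, 0.0, 0.0]]
price = jlt_zcb_price(Lambda, tau=5.0, r=0.03, delta=0.4)
print(price)
```

As a check, if the middle state is removed (only the intensity $\mu$ from A to default is non-zero), $\tilde{p}_{1,3}$ reduces to $1 - e^{-\mu(T-t)}$ and the JLT price coincides with the two-state formula with a constant intensity.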

