
Numerical Methods for Finance

Dr Robert Nürnberg
This course introduces the major numerical methods needed for quantitative work in finance. To this end, the course strikes a balance between a general survey of significant numerical methods anyone working in a quantitative field should know, and a detailed study of some numerical methods specific to financial mathematics. The first part of the course covers e.g.
linear and nonlinear equations, interpolation and optimization,
while the second part introduces e.g.
binomial and trinomial methods, finite difference methods, Monte Carlo simulation, random number generators, option pricing and hedging.
The assessment consists of an exam (80%) and a project (20%).
1. References
1. Burden and Faires (2004), Numerical Analysis.
2. Clewlow and Strickland (1998), Implementing Derivative Models.
3. Fletcher (2000), Practical Methods of Optimization.
4. Glasserman (2004), Monte Carlo Methods in Financial Engineering.
5. Higham (2004), An Introduction to Financial Option Valuation.
6. Hull (2005), Options, Futures, and Other Derivatives.
7. Kwok (1998), Mathematical Models of Financial Derivatives.
8. Press et al. (1992), Numerical Recipes in C. (online)
9. Press et al. (2002), Numerical Recipes in C++.
10. Seydel (2006), Tools for Computational Finance.
11. Wilmott, Dewynne, and Howison (1993), Option Pricing.
12. Winston (1994), Operations Research: Applications and Algorithms.
2. Preliminaries
1. Algorithms.
An algorithm is a set of instructions to construct an approximate solution to a mathematical problem.
A basic requirement for an algorithm is that the error can be made as small as we like. Usually, the higher the accuracy we demand, the greater is the amount of computation required.
An algorithm is convergent if it produces a sequence of values which converge to the desired solution of the problem.
Example: Given $c > 1$ and $\epsilon > 0$, use the bisection method to seek an approximation to $\sqrt{c}$ with error not greater than $\epsilon$.
Example
Find $x = \sqrt{c}$, $c > 1$ constant.
Answer
$x = \sqrt{c} \iff x^2 = c \iff f(x) := x^2 - c = 0$.
$f(1) = 1 - c < 0$ and $f(c) = c^2 - c > 0$
$\Rightarrow \exists\, \bar x \in (1,c)$ s.t. $f(\bar x) = 0$.
$f'(x) = 2x > 0 \Rightarrow f$ monotonically increasing $\Rightarrow \bar x$ is unique.
Denote $I_n := [a_n, b_n]$ with $I_0 = [a_0, b_0] = [1, c]$. Let $x_n := \frac{a_n + b_n}{2}$.
(i) If $f(x_n) = 0$ then $\bar x = x_n$.
(ii) If $f(a_n)\, f(x_n) > 0$ then $\bar x \in (x_n, b_n)$; let $a_{n+1} := x_n$, $b_{n+1} := b_n$.
(iii) If $f(a_n)\, f(x_n) < 0$ then $\bar x \in (a_n, x_n)$; let $a_{n+1} := a_n$, $b_{n+1} := x_n$.
Length of $I_n$: $m(I_n) = \frac{1}{2}\, m(I_{n-1}) = \cdots = \frac{1}{2^n}\, m(I_0) = \frac{c-1}{2^n}$.
Algorithm
The algorithm stops if $m(I_n) < \epsilon$, and we let $x^* := x_n$.
Error as small as we like?
$\bar x, x^* \in I_n \Rightarrow$ error $|x^* - \bar x| = |x_n - \bar x| \le m(I_n) \to 0$ as $n \to \infty$.
Convergence?
$I_0 \supset I_1 \supset \cdots \supset I_n \supset \cdots \Rightarrow \exists!\, \bar x \in \bigcap_{n=0}^{\infty} I_n$, $f(\bar x) = 0$, i.e. $\bar x = \sqrt{c}$.
Implementation:
No need to define $I_n = [a_n, b_n]$. It is sufficient to store only 3 points throughout.
Suppose $\bar x \in (a, b)$, define $x := \frac{a+b}{2}$.
If $\bar x \in (a, x)$ let $a := a$, $b := x$, otherwise let $a := x$, $b := b$.
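The implementation note above translates directly into a short routine. The following C++ sketch is an illustration only; the function name, tolerance handling and test value are my own choices and not part of the notes.

#include <iostream>
#include <cmath>

// Bisection for f(x) = x*x - c on [1, c], c > 1.
// Returns an approximation to sqrt(c) with error at most eps.
double bisect_sqrt(double c, double eps) {
    double a = 1.0, b = c;              // I_0 = [1, c]
    while (b - a >= eps) {              // stop when m(I_n) < eps
        double x = 0.5 * (a + b);       // midpoint x_n
        double fx = x * x - c;
        if (fx == 0.0) return x;        // exact root found
        if ((a * a - c) * fx > 0.0)     // f(a_n) f(x_n) > 0  =>  root in (x_n, b_n)
            a = x;
        else                            // otherwise root in (a_n, x_n)
            b = x;
    }
    return 0.5 * (a + b);
}

int main() {
    std::cout << bisect_sqrt(7.0, 1e-7) << "\n";   // approx 2.6457513
    return 0;
}

Note that, as in the implementation remark, only the three points a, b and x are stored.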
2. Errors.
There are various errors in computed solutions, such as
discretization error (discrete approximation to continuous systems),
truncation error (termination of an infinite process), and
rounding error (finite digit limitation in computer arithmetic).
If $a$ is a number and $\bar a$ is an approximation to $a$, then
the absolute error is $|a - \bar a|$ and
the relative error is $\frac{|a - \bar a|}{|a|}$, provided $a \ne 0$.
Example: Discuss sources of errors in deriving the numerical solution of the nonlinear differential equation $x' = f(x)$ on the interval $[a,b]$ with initial condition $x(a) = x_0$.
Example
Discretization error:
$$x' = f(x) \;\text{[differential equation]} \quad\leadsto\quad \frac{x(t+h) - x(t)}{h} = f(x(t)) \;\text{[difference equation]}$$
$$DE = \Big|\frac{x(t+h) - x(t)}{h} - x'(t)\Big|$$
Truncation error:
$\lim_{n\to\infty} x_n = x$; approximate $x$ with $x_N$, $N$ a large number. $TE = |x - x_N|$.
Rounding error:
We cannot express $x$ exactly, due to the finite digit limitation. We get $\tilde x$ instead. $RE = |x - \tilde x|$.
Total error = DE + TE + RE.
3. Well/Ill Conditioned Problems.
A problem is well-conditioned (or ill-conditioned) if every small perturbation of the data results in a small (or large) change in the solution.
Example: Show that the solution to the equations $x + y = 2$ and $x + 1.01\,y = 2.01$ is ill-conditioned.
Exercise: Show that the following problems are ill-conditioned:
(a) the solution to the differential equation $x'' - 10x' - 11x = 0$ with initial conditions $x(0) = 1$ and $x'(0) = -1$,
(b) the value of $q(x) = x^2 + x - 1150$ if $x$ is near a root.
Example
$$\begin{cases} x + y = 2 \\ x + 1.01\,y = 2.01 \end{cases} \quad\Rightarrow\quad \begin{cases} x = 1 \\ y = 1 \end{cases}$$
Change 2.01 to 2.02:
$$\begin{cases} x + y = 2 \\ x + 1.01\,y = 2.02 \end{cases} \quad\Rightarrow\quad \begin{cases} x = 0 \\ y = 2 \end{cases}$$
I.e. a 0.5% change in the data produces a 100% change in the solution: ill-conditioned!
[reason: $\det\begin{pmatrix} 1 & 1 \\ 1 & 1.01 \end{pmatrix} = 0.01$, nearly singular]
4. Taylor Polynomials.
Suppose $f, f', \ldots, f^{(n)}$ are continuous on $[a,b]$ and $f^{(n+1)}$ exists on $(a,b)$. Let $x_0 \in [a,b]$. Then for every $x \in [a,b]$, there exists $\xi$ between $x_0$ and $x$ with
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}\,(x - x_0)^k + R_n(x),$$
where $R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x - x_0)^{n+1}$ is the remainder.
[Equivalently: $R_n(x) = \int_{x_0}^{x} \frac{f^{(n+1)}(t)}{n!}\,(x-t)^n\,dt$.]
Examples:
$$\exp(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}, \qquad \sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!}\,x^{2k+1}$$
5. Gradient and Hessian Matrix.
Assume $f : \mathbb{R}^n \to \mathbb{R}$.
The gradient of $f$ at a point $x$, written as $\nabla f(x)$, is a column vector in $\mathbb{R}^n$ with $i$th component $\frac{\partial f}{\partial x_i}(x)$.
The Hessian matrix of $f$ at $x$, written as $\nabla^2 f(x)$, is an $n \times n$ matrix with $(i,j)$th component $\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$. [As $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$, $\nabla^2 f(x)$ is symmetric.]
Examples:
$f(x) = a^T x$, $a \in \mathbb{R}^n$ $\Rightarrow$ $\nabla f = a$, $\nabla^2 f = 0$.
$f(x) = \frac{1}{2}\,x^T A x$, $A$ symmetric $\Rightarrow$ $\nabla f(x) = Ax$, $\nabla^2 f = A$.
$f(x) = \exp(\frac{1}{2}\,x^T A x)$, $A$ symmetric $\Rightarrow$
$$\nabla f(x) = \exp(\tfrac{1}{2}\,x^T A x)\,Ax, \qquad \nabla^2 f(x) = \exp(\tfrac{1}{2}\,x^T A x)\,A x x^T A + \exp(\tfrac{1}{2}\,x^T A x)\,A.$$
6. Taylor's Theorem.
Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and that $p \in \mathbb{R}^n$. Then we have
$$f(x + p) = f(x) + \nabla f(x + t\,p)^T p,$$
for some $t \in (0,1)$.
Moreover, if $f$ is twice continuously differentiable, we have
$$\nabla f(x + p) = \nabla f(x) + \int_0^1 \nabla^2 f(x + t\,p)\,p\,dt,$$
and
$$f(x + p) = f(x) + \nabla f(x)^T p + \tfrac{1}{2}\,p^T \nabla^2 f(x + t\,p)\,p,$$
for some $t \in (0,1)$.
7. Positive Definite Matrices.
An $n \times n$ matrix $A = (a_{ij})$ is positive definite if it is symmetric (i.e. $A^T = A$) and $x^T A x > 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$. [I.e. $x^T A x \ge 0$ with equality only if $x = 0$.]
The following statements are equivalent:
(a) $A$ is a positive definite matrix,
(b) all eigenvalues of $A$ are positive,
(c) all leading principal minors of $A$ are positive.
The leading principal minors of $A$ are the determinants $\Delta_k$, $k = 1, 2, \ldots, n$, defined by
$$\Delta_1 = \det[a_{11}], \quad \Delta_2 = \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad \ldots, \quad \Delta_n = \det A.$$
A matrix $A$ is symmetric and positive semi-definite if $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$.
Exercise.
Show that any positive definite matrix $A$ has only positive diagonal entries.
8. Convex Sets and Functions.
A set $S \subset \mathbb{R}^n$ is a convex set if the straight line segment connecting any two points in $S$ lies entirely inside $S$, i.e., for any two points $x, y \in S$ we have
$$\lambda x + (1-\lambda)\,y \in S \quad \forall\,\lambda \in [0,1].$$
A function $f : D \to \mathbb{R}$ is a convex function if its domain $D \subset \mathbb{R}^n$ is a convex set and if for any two points $x, y \in D$ we have
$$f(\lambda x + (1-\lambda)\,y) \le \lambda f(x) + (1-\lambda)\,f(y) \quad \forall\,\lambda \in [0,1].$$
Exercise.
Let $D \subset \mathbb{R}^n$ be a convex, open set.
(a) If $f : D \to \mathbb{R}$ is continuously differentiable, then $f$ is convex if and only if
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \forall\,x, y \in D.$$
(b) If $f$ is twice continuously differentiable, then $f$ is convex if and only if $\nabla^2 f(x)$ is positive semi-definite for all $x$ in the domain.
Exercise 2.8.
(a)
"$\Rightarrow$": As $f$ is convex we have for any $x, y$ in the convex set $D$ that
$$f(\lambda y + (1-\lambda)\,x) \le \lambda f(y) + (1-\lambda)\,f(x) \quad \forall\,\lambda \in [0,1].$$
Hence
$$f(y) \ge \frac{f(x + \lambda(y-x)) - f(x)}{\lambda} + f(x).$$
Letting $\lambda \to 0$ yields $f(y) \ge f(x) + \nabla f(x)^T (y - x)$.
"$\Leftarrow$": For any $x_1, x_2 \in D$ and $\lambda \in [0,1]$ let $x := \lambda x_1 + (1-\lambda)\,x_2 \in D$ and $y := x_1$.
On noting that $y - x = x_1 - \lambda x_1 - (1-\lambda)\,x_2 = (1-\lambda)(x_1 - x_2)$ we have that
$$f(x_1) = f(y) \ge f(x) + \nabla f(x)^T (y-x) = f(x) + (1-\lambda)\,\nabla f(x)^T (x_1 - x_2). \quad (*)$$
Similarly, letting $x := \lambda x_1 + (1-\lambda)\,x_2$ and $y := x_2$ gives, on noting that $y - x = \lambda(x_2 - x_1)$, that
$$f(x_2) \ge f(x) + \lambda\,\nabla f(x)^T (x_2 - x_1). \quad (**)$$
Combining $\lambda\,(*) + (1-\lambda)\,(**)$ gives
$$\lambda f(x_1) + (1-\lambda)\,f(x_2) \ge f(x) = f(\lambda x_1 + (1-\lambda)\,x_2) \quad\Rightarrow\quad f \text{ is convex.}$$
(b)
"$\Leftarrow$": For any $x, x_0 \in D$ use Taylor's theorem at $x_0$:
$$f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \tfrac{1}{2}(x - x_0)^T \nabla^2 f(x_0 + \theta(x - x_0))\,(x - x_0), \quad \theta \in (0,1).$$
As $\nabla^2 f$ is positive semi-definite, this immediately gives
$$f(x) \ge f(x_0) + \nabla f(x_0)^T (x - x_0) \quad\Rightarrow\quad f \text{ is convex.}$$
"$\Rightarrow$": Assume $\nabla^2 f$ is not positive semi-definite in the domain $D$. Then there exist $x_0 \in D$ and $x \in \mathbb{R}^n$ s.t. $x^T \nabla^2 f(x_0)\,x < 0$. As $D$ is open we can find $x_1 := x_0 + \epsilon\,x \in D$, for $\epsilon > 0$ sufficiently small. Hence $(x_1 - x_0)^T \nabla^2 f(x_0)\,(x_1 - x_0) < 0$. For $x_1$ sufficiently close to $x_0$ the continuity of $\nabla^2 f$ gives $(x_1 - x_0)^T \nabla^2 f(x_0 + \theta(x_1 - x_0))\,(x_1 - x_0) < 0$ for all $\theta \in (0,1)$. Taylor's theorem then yields $f(x_1) < f(x_0) + \nabla f(x_0)^T (x_1 - x_0)$. This contradicts $f$ being convex, see (a).
9. Vector Norms.
A vector norm on $\mathbb{R}^n$ is a function, $\|\cdot\|$, from $\mathbb{R}^n$ into $\mathbb{R}$ with the following properties:
(i) $\|x\| \ge 0$ for all $x \in \mathbb{R}^n$, and $\|x\| = 0$ if and only if $x = 0$.
(ii) $\|\alpha x\| = |\alpha|\,\|x\|$ for all $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$.
(iii) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{R}^n$.
Common vector norms are the $l_1$, $l_2$ (Euclidean), and $l_\infty$ norms:
$$\|x\|_1 = \sum_{i=1}^n |x_i|, \qquad \|x\|_2 = \Big(\sum_{i=1}^n x_i^2\Big)^{1/2}, \qquad \|x\|_\infty = \max_{1 \le i \le n} |x_i|.$$
Exercise.
(a) Prove that $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are norms.
(b) Given a symmetric positive definite matrix $A$, prove that $\|x\|_A := \sqrt{x^T A x}$ is a norm.
Example.
Draw the regions defined by $\|x\|_1 \le 1$, $\|x\|_2 \le 1$, $\|x\|_\infty \le 1$ when $n = 2$.
[Figure: the unit balls of the $l_1$, $l_2$ and $l_\infty$ norms.]
Exercise. Prove that for all $x, y \in \mathbb{R}^n$ we have
(a) $\sum_{i=1}^n |x_i y_i| \le \|x\|_2\,\|y\|_2$ [Schwarz inequality]
and
(b) $\frac{1}{\sqrt{n}}\,\|x\|_2 \le \|x\|_\infty \le \|x\|_1 \le \sqrt{n}\,\|x\|_2$.
10. Spectral Radius.
The spectral radius of a matrix $A \in \mathbb{R}^{n\times n}$ is defined by $\rho(A) = \max_{1 \le i \le n} |\lambda_i|$, where $\lambda_1, \ldots, \lambda_n$ are all the eigenvalues of $A$.
11. Matrix Norms.
For an $n \times n$ matrix $A$, the natural matrix norm $\|A\|$ for a given vector norm $\|\cdot\|$ is defined by
$$\|A\| = \max_{\|x\|=1} \|Ax\|.$$
The common matrix norms are
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|, \qquad \|A\|_2 = \sqrt{\rho(A^T A)} \;\;(= \rho(A) \text{ if } A = A^T), \qquad \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|.$$
Exercise: Compute $\|A\|_1$, $\|A\|_\infty$, and $\|A\|_2$ for
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$
(Answer: $\|A\|_1 = \|A\|_\infty = 4$ and $\|A\|_2 = \sqrt{7 + \sqrt{7}} \approx 3.1$.)
12. Convergence.
A sequence of vectors $\{x^{(k)}\} \subset \mathbb{R}^n$ is said to converge to a vector $x \in \mathbb{R}^n$ if $\|x^{(k)} - x\| \to 0$ as $k \to \infty$ for an arbitrary vector norm $\|\cdot\|$. This is equivalent to componentwise convergence, i.e., $x_i^{(k)} \to x_i$ as $k \to \infty$, $i = 1, \ldots, n$.
A square matrix $A \in \mathbb{R}^{n\times n}$ is said to be convergent if $\|A^k\| \to 0$ as $k \to \infty$, which is equivalent to $(A^k)_{ij} \to 0$ as $k \to \infty$ for all $i, j$.
The following statements are equivalent:
(i) $A$ is a convergent matrix,
(ii) $\rho(A) < 1$,
(iii) $\lim_{k\to\infty} A^k x = 0$ for every $x \in \mathbb{R}^n$.
Exercise. Show that $A$ is convergent, where
$$A = \begin{pmatrix} 1/2 & 0 \\ 1/4 & 1/2 \end{pmatrix}.$$
3. Algebraic Equations
1. Decomposition Methods for Linear Equations.
A matrix $A \in \mathbb{R}^{n\times n}$ is said to have an LU decomposition if $A = LU$, where $L \in \mathbb{R}^{n\times n}$ is a lower triangular matrix ($l_{ij} = 0$ if $1 \le i < j \le n$) and $U \in \mathbb{R}^{n\times n}$ is an upper triangular matrix ($u_{ij} = 0$ if $1 \le j < i \le n$).
The decomposition is unique if one assumes e.g. $l_{ii} = 1$ for $1 \le i \le n$.
$$L = \begin{pmatrix} l_{11} & & & & \\ l_{21} & l_{22} & & & \\ l_{31} & l_{32} & l_{33} & & \\ \vdots & & & \ddots & \\ l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn} \end{pmatrix}, \qquad U = \begin{pmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ & u_{22} & u_{23} & \cdots & u_{2n} \\ & & \ddots & & \vdots \\ & & & u_{n-1,n-1} & u_{n-1,n} \\ & & & & u_{nn} \end{pmatrix}$$
In general, the diagonal elements of either $L$ or $U$ are given and the remaining elements of the matrices are determined by directly comparing the two sides of the equation.
The linear system $Ax = b$ is then equivalent to $Ly = b$ and $Ux = y$.
Exercise.
Show that the solution to $Ly = b$ is
$$y_1 = b_1/l_{11}, \qquad y_i = \Big(b_i - \sum_{k=1}^{i-1} l_{ik}\,y_k\Big)\Big/l_{ii}, \quad i = 2, \ldots, n$$
(forward substitution) and the solution to $Ux = y$ is
$$x_n = y_n/u_{nn}, \qquad x_i = \Big(y_i - \sum_{k=i+1}^{n} u_{ik}\,x_k\Big)\Big/u_{ii}, \quad i = n-1, \ldots, 1$$
(backward substitution).
2. Crout Algorithm.
Exercise.
Let $A$ be tridiagonal, i.e. $a_{ij} = 0$ if $|i - j| > 1$ ($a_{ij} = 0$ except perhaps $a_{i-1,i}$, $a_{ii}$ and $a_{i,i+1}$), and strictly diagonally dominant ($|a_{ii}| > \sum_{j\ne i} |a_{ij}|$ holds for $i = 1, \ldots, n$).
Show that $A$ can be factorized as $A = LU$, where $l_{ii} = 1$ for $i = 1, \ldots, n$, $u_{11} = a_{11}$, and
$$u_{i,i+1} = a_{i,i+1}, \qquad l_{i+1,i} = a_{i+1,i}/u_{ii}, \qquad u_{i+1,i+1} = a_{i+1,i+1} - l_{i+1,i}\,u_{i,i+1}$$
for $i = 1, \ldots, n-1$. [Note: $L$ and $U$ are tridiagonal.]
C++ Exercise: Write a program to solve a tridiagonal and strictly diagonally dominant linear system $Ax = b$ by the Crout algorithm. The inputs are the number of variables $n$, the matrix $A$, and the vector $b$. The output is the solution $x$.
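One possible C++ sketch of the above, storing only the three bands of the matrix; the band storage scheme and the names sub, diag, sup are my own assumptions and not prescribed by the exercise.

#include <vector>
#include <iostream>

// Solve a tridiagonal, strictly diagonally dominant system A x = b via A = L U
// with l_ii = 1.  sub[i] = a_{i+2,i+1}, diag[i] = a_{i+1,i+1}, sup[i] = a_{i+1,i+2} (0-based).
std::vector<double> crout_tridiag(const std::vector<double>& sub, const std::vector<double>& diag,
                                  const std::vector<double>& sup, const std::vector<double>& b) {
    int n = diag.size();
    std::vector<double> u(n), l(n - 1), y(n), x(n);
    u[0] = diag[0];
    for (int i = 0; i + 1 < n; ++i) {            // factorization
        l[i] = sub[i] / u[i];                    // l_{i+1,i} = a_{i+1,i} / u_{ii}
        u[i + 1] = diag[i + 1] - l[i] * sup[i];  // u_{i+1,i+1} = a_{i+1,i+1} - l_{i+1,i} u_{i,i+1}
    }
    y[0] = b[0];                                 // forward substitution  L y = b
    for (int i = 1; i < n; ++i) y[i] = b[i] - l[i - 1] * y[i - 1];
    x[n - 1] = y[n - 1] / u[n - 1];              // backward substitution U x = y
    for (int i = n - 2; i >= 0; --i) x[i] = (y[i] - sup[i] * x[i + 1]) / u[i];
    return x;
}

int main() {
    // A = [[4,1,0],[1,4,1],[0,1,4]], b = (5,6,5)  =>  x = (1,1,1)
    std::vector<double> x = crout_tridiag({1, 1}, {4, 4, 4}, {1, 1}, {5, 6, 5});
    for (double xi : x) std::cout << xi << " ";
    std::cout << "\n";
}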
Exercise 3.2.
That $u_{11} = a_{11}$ and $u_{i,i+1} = a_{i,i+1}$, $l_{i+1,i} = a_{i+1,i}/u_{ii}$, $u_{i+1,i+1} = a_{i+1,i+1} - l_{i+1,i}\,u_{i,i+1}$, for $i = 1, \ldots, n-1$, can easily be shown.
It remains to show that $u_{ii} \ne 0$ for $i = 1, \ldots, n$. We proceed by induction to show that
$$|u_{ii}| > |a_{i,i+1}|,$$
where for convenience we define $a_{n,n+1} := 0$.
$i = 1$: $|u_{11}| = |a_{11}| > |a_{1,2}|$.
$i \to i+1$:
$$|u_{i+1,i+1}| = \Big|a_{i+1,i+1} - \frac{a_{i+1,i}\,a_{i,i+1}}{u_{ii}}\Big| \ge |a_{i+1,i+1}| - |a_{i+1,i}|\,\frac{|a_{i,i+1}|}{|u_{ii}|} \ge |a_{i+1,i+1}| - |a_{i+1,i}| > |a_{i+1,i+2}|.$$
Overall we have that $|u_{ii}| > 0$, and so the Crout algorithm is well defined. Moreover, $\det(A) = \det(L)\det(U) = \det(U) = \prod_{i=1}^n u_{ii} \ne 0$.
3. Choleski Algorithm.
Exercise.
Let $A$ be a positive definite matrix. Show that $A$ can be factorized as $A = LL^T$, where $L$ is a lower triangular matrix.
(i) Compute the 1st column:
$$l_{11} = \sqrt{a_{11}}, \qquad l_{i1} = a_{i1}/l_{11}, \quad i = 2, \ldots, n.$$
(ii) For $j = 2, \ldots, n-1$ compute the $j$th column:
$$l_{jj} = \Big(a_{jj} - \sum_{k=1}^{j-1} l_{jk}^2\Big)^{\frac{1}{2}}, \qquad l_{ij} = \Big(a_{ij} - \sum_{k=1}^{j-1} l_{ik}\,l_{jk}\Big)\Big/l_{jj}, \quad i = j+1, \ldots, n.$$
(iii) Compute the $n$th column:
$$l_{nn} = \Big(a_{nn} - \sum_{k=1}^{n-1} l_{nk}^2\Big)^{\frac{1}{2}}.$$
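A minimal C++ sketch of the column-by-column recursion (i)-(iii); the dense row-major storage is my own choice. Combined with forward and backward substitution it solves $Ax = b$ for positive definite $A$.

#include <vector>
#include <cmath>
#include <iostream>

// Choleski factorization A = L L^T of a symmetric positive definite matrix.
std::vector<std::vector<double>> choleski(const std::vector<std::vector<double>>& A) {
    int n = A.size();
    std::vector<std::vector<double>> L(n, std::vector<double>(n, 0.0));
    for (int j = 0; j < n; ++j) {
        double s = A[j][j];
        for (int k = 0; k < j; ++k) s -= L[j][k] * L[j][k];      // a_jj - sum l_jk^2
        L[j][j] = std::sqrt(s);
        for (int i = j + 1; i < n; ++i) {
            double t = A[i][j];
            for (int k = 0; k < j; ++k) t -= L[i][k] * L[j][k];  // a_ij - sum l_ik l_jk
            L[i][j] = t / L[j][j];
        }
    }
    return L;
}

int main() {
    auto L = choleski({{4, 2}, {2, 3}});
    std::cout << L[0][0] << " " << L[1][0] << " " << L[1][1] << "\n";   // 2 1 1.41421
}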
4. Iterative Methods for Linear Equations.
Split $A$ into $A = M + N$ with $M$ nonsingular, and convert the equation $Ax = b$ into an equivalent equation $x = Cx + d$ with $C = -M^{-1}N$ and $d = M^{-1}b$.
Choose an initial vector $x^{(0)}$ and then generate a sequence of vectors by
$$x^{(k)} = C\,x^{(k-1)} + d, \quad k = 1, 2, \ldots$$
The resulting sequence converges to the solution of $Ax = b$, for an arbitrary initial vector $x^{(0)}$, if and only if $\rho(C) < 1$.
The objective is to choose $M$ such that $M^{-1}$ is easy to compute and $\rho(C) < 1$.
The iteration stops if $\|x^{(k)} - x^{(k-1)}\| < \epsilon$.
[Note: In practice one solves $M\,x^{(k)} = -N\,x^{(k-1)} + b$, for $k = 1, 2, \ldots$.]
Claim.
The iteration $x^{(k)} = C\,x^{(k-1)} + d$ is convergent if and only if $\rho(C) < 1$.
Proof.
Define $e^{(k)} := x^{(k)} - x$, the error of the $k$th iterate. Then
$$e^{(k)} = C\,x^{(k-1)} + d - (C\,x + d) = C\,(x^{(k-1)} - x) = C\,e^{(k-1)} = C^2 e^{(k-2)} = \ldots = C^k e^{(0)},$$
where $e^{(0)} = x^{(0)} - x$ is an arbitrary vector.
Assume $C$ is similar to the diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_i$ are the eigenvalues of $C$.
$\exists\,X$ nonsingular s.t. $C = X\Lambda X^{-1}$
$$\Rightarrow\; e^{(k)} = C^k e^{(0)} = X\Lambda^k X^{-1} e^{(0)} = X\begin{pmatrix} \lambda_1^k & & \\ & \ddots & \\ & & \lambda_n^k \end{pmatrix} X^{-1} e^{(0)} \to 0 \;\text{ as } k \to \infty$$
$$\iff |\lambda_i| < 1 \quad \forall\,i = 1, \ldots, n \iff \rho(C) < 1.$$
5. Jacobi Algorithm.
Exercise: Let $M = D$ and $N = L + U$ ($L$ the strict lower triangular part of $A$, $D$ the diagonal, $U$ the strict upper triangular part). Show that the $i$th component at the $k$th iteration is
$$x_i^{(k)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}\,x_j^{(k-1)} - \sum_{j=i+1}^{n} a_{ij}\,x_j^{(k-1)}\Big)$$
for $i = 1, \ldots, n$.
6. Gauss-Seidel Algorithm.
Exercise: Let $M = D + L$ and $N = U$. Show that the $i$th component at the $k$th iteration is
$$x_i^{(k)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}\,x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij}\,x_j^{(k-1)}\Big)$$
for $i = 1, \ldots, n$.
7. SOR Algorithm.
Exercise.
Let $M = \frac{1}{\omega}D + L$ and $N = U + (1 - \frac{1}{\omega})D$, where $0 < \omega < 2$. Show that the $i$th component at the $k$th iteration is
$$x_i^{(k)} = (1-\omega)\,x_i^{(k-1)} + \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}\,x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij}\,x_j^{(k-1)}\Big)$$
for $i = 1, \ldots, n$.
C++ Exercise: Write a program to solve a diagonally dominant linear system $Ax = b$ by the SOR algorithm. The inputs are the number of variables $n$, the matrix $A$, the vector $b$, the initial vector $x^{(0)}$, the relaxation parameter $\omega$, and the tolerance $\epsilon$. The output is the number of iterations $k$ and the approximate solution $x^{(k)}$.
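A possible C++ sketch for this exercise; the interface (solution passed by reference, iteration count returned) and the infinity-norm stopping test are my own choices.

#include <vector>
#include <cmath>
#include <algorithm>
#include <iostream>

// SOR iteration for A x = b; x is overwritten with the approximate solution.
int sor(const std::vector<std::vector<double>>& A, const std::vector<double>& b,
        std::vector<double>& x, double omega, double eps, int maxit = 1000) {
    int n = b.size();
    for (int k = 1; k <= maxit; ++k) {
        double diff = 0.0;
        for (int i = 0; i < n; ++i) {
            double s = b[i];
            for (int j = 0; j < n; ++j)
                if (j != i) s -= A[i][j] * x[j];    // new x_j for j < i, old x_j for j > i
            double xnew = (1.0 - omega) * x[i] + omega * s / A[i][i];
            diff = std::max(diff, std::fabs(xnew - x[i]));
            x[i] = xnew;
        }
        if (diff < eps) return k;                   // ||x^{(k)} - x^{(k-1)}||_inf < eps
    }
    return maxit;
}

int main() {
    std::vector<double> x = {0, 0, 0};
    int k = sor({{4, 3, 0}, {3, 4, -1}, {0, -1, 4}}, {7, 6, 3}, x, 1.24, 1e-8);
    std::cout << k << " iterations: " << x[0] << " " << x[1] << " " << x[2] << "\n";  // -> (1,1,1)
}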
8. Special Matrices.
If $A$ is strictly diagonally dominant, then Jacobi and Gauss-Seidel converge for any initial vector $x^{(0)}$. In addition, SOR converges for $\omega \in (0,1]$.
If $A$ is positive definite and $0 < \omega < 2$, then the SOR method converges for any initial vector $x^{(0)}$.
If $A$ is positive definite and tridiagonal, then $\rho(C_{GS}) = [\rho(C_J)]^2 < 1$ and the optimal choice of $\omega$ for the SOR method is
$$\omega = \frac{2}{1 + \sqrt{1 - \rho(C_{GS})}} \in [1, 2).$$
With this choice of $\omega$, $\rho(C_{SOR}) = \omega - 1 \le \rho(C_{GS})$.
Exercise.
Find the optimal $\omega$ for the SOR method for the matrix
$$A = \begin{pmatrix} 4 & 3 & 0 \\ 3 & 4 & -1 \\ 0 & -1 & 4 \end{pmatrix}.$$
(Answer: $\omega \approx 1.24$.)
9. Condition Numbers.
The condition number of a nonsingular matrix $A$ relative to a norm $\|\cdot\|$ is defined by
$$\kappa(A) = \|A\|\,\|A^{-1}\|.$$
Note that $\kappa(A) \ge \|A\,A^{-1}\| = \|I\| = \max_{\|x\|=1}\|x\| = 1$.
A matrix $A$ is well-conditioned if $\kappa(A)$ is close to one, and is ill-conditioned if $\kappa(A)$ is much larger than one.
Suppose $\|\delta A\| < \frac{1}{\|A^{-1}\|}$. Then the solution $\tilde x$ to $(A + \delta A)\,\tilde x = b + \delta b$ approximates the solution $x$ of $Ax = b$ with error estimate
$$\frac{\|x - \tilde x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \|\delta A\|\,\|A^{-1}\|}\Big(\frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|}\Big).$$
In particular, if $\delta A = 0$ (no perturbation of the matrix $A$) then
$$\frac{\|x - \tilde x\|}{\|x\|} \le \kappa(A)\,\frac{\|\delta b\|}{\|b\|}.$$
Example.
Consider Example 1.3.
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1.01 \end{pmatrix} \quad\Rightarrow\quad A^{-1} = \frac{1}{\det A}\begin{pmatrix} 1.01 & -1 \\ -1 & 1 \end{pmatrix} = \frac{1}{0.01}\begin{pmatrix} 1.01 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 101 & -100 \\ -100 & 100 \end{pmatrix}$$
Recall
$$\|A\|_1 = \max_{1\le j\le n}\sum_{i=1}^n |a_{ij}|.$$
Hence
$$\|A\|_1 = \max(2, 2.01) = 2.01, \qquad \|A^{-1}\|_1 = \max(201, 200) = 201.$$
$$\Rightarrow\; \kappa_1(A) = \|A\|_1\,\|A^{-1}\|_1 = 404.01 \gg 1 \quad\text{(ill-conditioned!)}$$
Similarly $\kappa_\infty = 404.01$ and $\kappa_2 = \rho(A)\,\rho(A^{-1}) = \frac{\lambda_{\max}}{\lambda_{\min}} = 402.0075$.
10. Hilbert Matrix.
An $n \times n$ Hilbert matrix $H_n = (h_{ij})$ is defined by $h_{ij} = 1/(i + j - 1)$ for $i, j = 1, 2, \ldots, n$.
Hilbert matrices are notoriously ill-conditioned and $\kappa(H_n) \to \infty$ very rapidly as $n \to \infty$.
$$H_n = \begin{pmatrix} 1 & \frac{1}{2} & \cdots & \frac{1}{n} \\ \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{n+1} \\ \vdots & \vdots & & \vdots \\ \frac{1}{n} & \frac{1}{n+1} & \cdots & \frac{1}{2n-1} \end{pmatrix}$$
Exercise.
Compute the condition numbers $\kappa_1(H_3)$ and $\kappa_1(H_6)$.
(Answer: $\kappa_1(H_3) = 748$ and $\kappa_1(H_6) \approx 2.9 \times 10^6$.)
11. Fixed Point Method for Nonlinear Equations.
A function $g : \mathbb{R} \to \mathbb{R}$ has a fixed point $\bar x$ if $g(\bar x) = \bar x$.
A function $g$ is a contraction mapping on $[a,b]$ if $g : [a,b] \to [a,b]$ and
$$|g'(x)| \le L < 1 \quad \forall\,x \in (a,b),$$
where $L$ is a constant.
Exercise.
Assume $g$ is a contraction mapping on $[a,b]$. Prove that $g$ has a unique fixed point $\bar x$ in $[a,b]$, and for any $x_0 \in [a,b]$, the sequence defined by
$$x_{n+1} = g(x_n), \quad n \ge 0,$$
converges to $\bar x$. The algorithm is called fixed point iteration.
Exercise 3.11.
Existence:
Define $h(x) = x - g(x)$ on $[a,b]$. Then $h(a) = a - g(a) \le 0$ and $h(b) = b - g(b) \ge 0$.
As $h$ is continuous, $\exists\,c \in [a,b]$ s.t. $h(c) = 0$, i.e. $c = g(c)$.
Uniqueness:
Suppose $p, q \in [a,b]$ are two fixed points. Then
$$|p - q| = |g(p) - g(q)| \underset{\text{MVT},\,\xi\in(a,b)}{=} |g'(\xi)\,(p - q)| \le L\,|p - q|$$
$$\Rightarrow\; (1 - L)\,|p - q| \le 0 \;\Rightarrow\; |p - q| \le 0 \;\Rightarrow\; p = q.$$
Convergence:
$$|x_n - \bar x| = |g(x_{n-1}) - g(\bar x)| = |g'(\xi)\,(x_{n-1} - \bar x)| \le L\,|x_{n-1} - \bar x| \le \ldots \le L^n\,|x_0 - \bar x| \to 0 \;\text{ as } n \to \infty.$$
Hence $x_n \to \bar x$ as $n \to \infty$.
12. Newton Method for Nonlinear Equations.
Assume that $f \in C^1([a,b])$, $f(\bar x) = 0$ ($\bar x$ is a root or zero) and $f'(\bar x) \ne 0$.
The Newton method can be used to find the root $\bar x$ by generating a sequence $\{x_n\}$ satisfying
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \quad n = 0, 1, \ldots$$
provided $f'(x_n) \ne 0$ for all $n$.
The sequence $x_n$ converges to the root $\bar x$ as long as the initial point $x_0$ is sufficiently close to $\bar x$.
The algorithm stops if $|x_{n+1} - x_n| < \epsilon$, a prescribed error tolerance, and $x_{n+1}$ is taken as an approximation to $\bar x$.
Geometric Interpretation:
The tangent line at $(x_n, f(x_n))$ is
$$Y = f(x_n) + f'(x_n)\,(X - x_n).$$
Setting $Y = 0$ yields $x_{n+1} := X = x_n - \frac{f(x_n)}{f'(x_n)}$.
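A minimal C++ sketch of the Newton iteration with the stopping rule above; the generic interface via std::function and the test problem in main (which anticipates the sqrt(c) example below) are my own choices.

#include <cmath>
#include <functional>
#include <iostream>

// Newton iteration x_{n+1} = x_n - f(x_n)/f'(x_n),
// stopping when |x_{n+1} - x_n| < eps or f'(x_n) vanishes.
double newton(std::function<double(double)> f, std::function<double(double)> fp,
              double x0, double eps, int maxit = 100) {
    double x = x0;
    for (int n = 0; n < maxit; ++n) {
        double d = fp(x);
        if (d == 0.0) break;                 // method breaks down if f'(x_n) = 0
        double xnew = x - f(x) / d;
        if (std::fabs(xnew - x) < eps) return xnew;
        x = xnew;
    }
    return x;
}

int main() {
    // root of f(x) = x^2 - 7 starting from x_0 = 4
    double r = newton([](double x) { return x * x - 7.0; },
                      [](double x) { return 2.0 * x; }, 4.0, 1e-12);
    std::cout << r << "\n";                  // 2.6457513...
}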
13. Choice of Initial Point.
Suppose $f \in C^2([a,b])$ and $f(\bar x) = 0$ with $f'(\bar x) \ne 0$. Then there exists $\delta > 0$ such that the Newton method generates a sequence $x_n$ converging to $\bar x$ for any initial point $x_0 \in [\bar x - \delta, \bar x + \delta]$ ($x_0$ can only be chosen locally).
However, if $f$ satisfies the following additional conditions:
1. $f(a)\,f(b) < 0$,
2. $f''$ does not change sign on $[a,b]$,
3. the tangent lines to the curve $y = f(x)$ at both $a$ and $b$ cut the $x$-axis within $[a,b]$ (i.e. $a - \frac{f(a)}{f'(a)},\; b - \frac{f(b)}{f'(b)} \in [a,b]$),
then $f(x) = 0$ has a unique root $\bar x$ in $[a,b]$ and the Newton method converges to $\bar x$ for any initial point $x_0 \in [a,b]$ ($x_0$ can be chosen globally).
Example.
Use the Newton method to compute $x = \sqrt{c}$, $c > 1$, and show that the initial point can be any point in $[1, c]$.
Example.
Find $x = \sqrt{c}$, $c > 1$.
Answer.
$\bar x$ is a root of $f(x) := x^2 - c$.
Newton:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - c}{2x_n} = \frac{1}{2}\Big(x_n + \frac{c}{x_n}\Big), \quad n \ge 0.$$
How to choose $x_0$? Check the 3 conditions on $[1, c]$:
1. $f(1) = 1 - c < 0$, $f(c) = c^2 - c > 0$ $\Rightarrow$ $f(1)\,f(c) < 0$.
2. $f'' = 2$.
3. Tangent line at 1: $Y = f(1) + f'(1)\,(X - 1) = 1 - c + 2\,(X - 1)$.
Let $Y = 0$, then $X = 1 + \frac{c-1}{2} \in (1, c)$.
Tangent line at $c$: $Y = f(c) + f'(c)\,(X - c) = c^2 - c + 2c\,(X - c)$.
Let $Y = 0$, then $X = c - \frac{c-1}{2} \in (1, c)$.
$\Rightarrow$ Newton convergence for any $x_0 \in [1, c]$.
Numerical Example.
Find $\sqrt{7}$. (From a calculator: $\sqrt{7} = 2.6457513$.)
Newton converges for all $x_0 \in [1, 7]$. Choose $x_0 = 4$.
$x_1 = \frac{1}{2}\big(x_0 + \frac{7}{x_0}\big) = 2.875$
$x_2 = 2.6548913$
$x_3 = 2.6457670$
$x_4 = 2.6457513$
Comparison to the bisection method with $I_0 = [1, 7]$:
$I_1 = [1, 4]$
$I_2 = [2.5, 4]$
$I_3 = [2.5, 3.25]$
$I_4 = [2.5, 2.875]$
...
$I_{25} = [2.6457512, 2.6457513]$
14. Pitfalls.
Here are some difficulties which may be encountered with the Newton method:
1. $\{x_n\}$ may wander around and not converge (there are only complex roots to the equation),
2. the initial approximation $x_0$ is too far away from the desired root and $\{x_n\}$ converges to some other root (this usually happens when $f'(x_0)$ is small),
3. $\{x_n\}$ may diverge to $+\infty$ (the function $f$ is positive and monotonically decreasing on an unbounded interval), and
4. $\{x_n\}$ may repeat (cycling).
15. Rate of Convergence.
Suppose $\{x_n\}$ is a sequence that converges to $\bar x$.
The convergence is said to be linear if there is a constant $r \in (0,1)$ such that
$$\frac{|x_{n+1} - \bar x|}{|x_n - \bar x|} \le r \quad \text{for all } n \text{ sufficiently large.}$$
The convergence is said to be superlinear if
$$\lim_{n\to\infty} \frac{|x_{n+1} - \bar x|}{|x_n - \bar x|} = 0.$$
In particular, the convergence is said to be quadratic if
$$\frac{|x_{n+1} - \bar x|}{|x_n - \bar x|^2} \le M \quad \text{for all } n \text{ sufficiently large,}$$
where $M$ is a positive constant, not necessarily less than 1.
Example. $x_n = \bar x + 0.5^n$ is linear, $x_n = \bar x + 0.5^{2^n}$ is quadratic.
Example. Show that the Newton method converges quadratically.
Example.
Define $g(x) = x - \frac{f(x)}{f'(x)}$. Then the Newton method is given by
$$x_{n+1} = g(x_n).$$
Moreover, $f(\bar x) = 0$ and $f'(\bar x) \ne 0$ imply that
$$g(\bar x) = \bar x, \qquad g'(\bar x) = 1 - \frac{(f')^2 - f\,f''}{(f')^2}(\bar x) = f(\bar x)\,\frac{f''(\bar x)}{(f'(\bar x))^2} = 0, \qquad g''(\bar x) = \frac{f''(\bar x)}{f'(\bar x)}.$$
Assuming that $x_n \to \bar x$ we have that
$$\frac{|x_{n+1} - \bar x|}{|x_n - \bar x|^2} = \frac{|g(x_n) - g(\bar x)|}{|x_n - \bar x|^2} \underset{\text{Taylor}}{=} \frac{|g'(\bar x)\,(x_n - \bar x) + \tfrac{1}{2}\,g''(\xi_n)\,(x_n - \bar x)^2|}{|x_n - \bar x|^2} = \tfrac{1}{2}\,|g''(\xi_n)| \to \tfrac{1}{2}\,|g''(\bar x)| =: M \;\text{ as } n \to \infty.$$
Hence $|x_{n+1} - \bar x| \lesssim M\,|x_n - \bar x|^2$ for $n$ sufficiently large $\Rightarrow$ quadratic convergence rate.
4. Interpolations
1. Polynomial Approximation.
For any continuous function $f$ defined on an interval $[a,b]$, there exist polynomials $P$ that can be as close to the given function as desired.
Taylor polynomials agree closely with a given function at a specific point, but they concentrate their accuracy only near that point.
A good polynomial needs to provide a relatively accurate approximation over an entire interval.
2. Interpolating Polynomial: Lagrange Form.
Suppose $x_i \in [a,b]$, $i = 0, 1, \ldots, n$, are pairwise distinct mesh points in $[a,b]$.
The Lagrange polynomial $p$ is a polynomial of degree $n$ such that
$$p(x_i) = f(x_i), \quad i = 0, 1, \ldots, n.$$
$p$ can be constructed explicitly as
$$p(x) = \sum_{i=0}^n L_i(x)\,f(x_i),$$
where $L_i$ is a polynomial of degree $n$ satisfying
$$L_i(x_j) = 0,\; j \ne i, \qquad L_i(x_i) = 1.$$
This results in
$$L_i(x) = \prod_{j\ne i} \frac{x - x_j}{x_i - x_j}, \quad i = 0, 1, \ldots, n.$$
$p$ is called linear interpolation if $n = 1$ and quadratic interpolation if $n = 2$.
Exercise.
Find the Lagrange polynomial $p$ for the following points $(x, f(x))$: $(1, 0)$, $(1, 3)$, and $(2, 4)$. Assume that a new point $(0, 2)$ is observed, and construct a Lagrange polynomial to incorporate this new information in it.
Error formula.
Suppose $f$ is $n+1$ times differentiable on $[a,b]$. Then it holds that
$$f(x) = p(x) + \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x - x_0)\cdots(x - x_n),$$
where $\xi = \xi(x)$ lies in $(a,b)$.
Proof.
Define $g(x) = f(x) - p(x) + \alpha\prod_{j=0}^n (x - x_j)$, where $\alpha \in \mathbb{R}$ is a constant. Clearly, $g(x_j) = 0$ for $j = 0, \ldots, n$. To estimate the error at a point $\hat x \notin \{x_0, \ldots, x_n\}$, choose $\alpha$ such that $g(\hat x) = 0$.
Hence
$$g(x) = f(x) - p(x) - \big(f(\hat x) - p(\hat x)\big)\prod_{j=0}^n \frac{x - x_j}{\hat x - x_j}.$$
$g$ has at least $n+2$ zeros: $x_0, \ldots, x_n, \hat x$.
The Mean Value Theorem yields that
$g'$ has at least $n+1$ zeros,
...
$g^{(n+1)}$ has at least 1 zero, say $\xi$.
Moreover, as $p$ is a polynomial of degree $n$, it holds that $p^{(n+1)} = 0$.
Hence
$$0 = g^{(n+1)}(\xi) = f^{(n+1)}(\xi) - \big(f(\hat x) - p(\hat x)\big)\,\frac{(n+1)!}{\prod_{j=0}^n (\hat x - x_j)}$$
$$\Rightarrow\quad \text{Error} = f(\hat x) - p(\hat x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\prod_{j=0}^n (\hat x - x_j).$$
3. Trapezoid Rule.
We can use linear interpolation ($n = 1$, $x_0 = a$, $x_1 = b$) to approximate $f(x)$ on $[a,b]$ and then compute $\int_a^b f(x)\,dx$ to get the trapezoid rule:
$$\int_a^b f(x)\,dx \approx \frac{1}{2}\,(b-a)\,[f(a) + f(b)].$$
If we partition $[a,b]$ into $n$ equal subintervals with mesh points $x_i = a + ih$, $i = 0, \ldots, n$, and step size $h = (b-a)/n$, we can derive the composite trapezoid rule:
$$\int_a^b f(x)\,dx \approx \frac{h}{2}\Big[f(x_0) + 2\sum_{i=1}^{n-1} f(x_i) + f(x_n)\Big].$$
The approximation error is of order $O(h^2)$ if $|f''|$ is bounded.
Use linear interpolation ($n = 1$, $x_0 = a$, $x_1 = b$) to approximate $f(x)$ on $[a,b]$ and then compute $\int_a^b f(x)\,dx$.
Answer.
The linear interpolating polynomial is $p(x) = f(a)\,L_0(x) + f(b)\,L_1(x)$, where
$$L_0(x) = \frac{x-b}{a-b}, \qquad L_1(x) = \frac{x-a}{b-a}.$$
$$\Rightarrow\;\int_a^b f(x)\,dx \approx \int_a^b p(x)\,dx = f(a)\int_a^b \frac{x-b}{a-b}\,dx + f(b)\int_a^b \frac{x-a}{b-a}\,dx$$
$$= f(a)\,\frac{1}{a-b}\,\frac{1}{2}(x-b)^2\Big|_a^b + f(b)\,\frac{1}{b-a}\,\frac{1}{2}(x-a)^2\Big|_a^b = f(a)\,\frac{1}{a-b}\,\frac{1}{2}\big(-(a-b)^2\big) + f(b)\,\frac{1}{b-a}\,\frac{1}{2}(b-a)^2$$
$$= \frac{b-a}{2}\,\big(f(a) + f(b)\big) \qquad \text{Trapezoid Rule}$$
[Of course, one could have arrived at this formula with a simple geometric argument.]
Error Analysis.
Let $f(x) = p(x) + E(x)$, where $E(x) = \frac{f''(\xi)}{2}\,(x-a)(x-b)$ with $\xi \in (a,b)$. Assume that $|f''| \le M$ is bounded. Then
$$\Big|\int_a^b E(x)\,dx\Big| \le \int_a^b |E(x)|\,dx \le \frac{M}{2}\int_a^b (x-a)(b-x)\,dx = \frac{M}{2}\int_a^b (x-a)\big[(b-a) - (x-a)\big]\,dx$$
$$= \frac{M}{2}\int_a^b \big[-(x-a)^2 + (b-a)(x-a)\big]\,dx = \frac{M}{2}\Big[-\frac{1}{3}(b-a)^3 + \frac{1}{2}(b-a)^3\Big] = \frac{M}{12}\,(b-a)^3.$$
The composite formula can be obtained by considering the partition of $[a,b]$ into $a = x_0 < x_1 < \ldots < x_{n-1} < x_n = b$, where $x_i = a + ih$ with $h := \frac{b-a}{n}$.
$$\int_a^b f(x)\,dx = \sum_{i=0}^{n-1}\int_{x_i}^{x_{i+1}} f(x)\,dx \approx \sum_{i=0}^{n-1} \frac{x_{i+1} - x_i}{2}\,\big(f(x_i) + f(x_{i+1})\big) = \sum_{i=0}^{n-1} \frac{h}{2}\,\big(f(x_i) + f(x_{i+1})\big)$$
$$= h\Big[\tfrac{1}{2}f(a) + f(x_1) + \ldots + f(x_{n-1}) + \tfrac{1}{2}f(b)\Big].$$
Error analysis then yields that
$$\text{Error} \le \frac{M}{12}\,h^3\,n = \frac{M\,(b-a)}{12}\,h^2 = O(h^2).$$
4. Simpson's Rule.
Exercise.
Use quadratic interpolation ($n = 2$, $x_0 = a$, $x_1 = \frac{a+b}{2}$, $x_2 = b$) to approximate $f(x)$ on $[a,b]$ and then compute $\int_a^b f(x)\,dx$ to get Simpson's rule:
$$\int_a^b f(x)\,dx \approx \frac{1}{6}\,(b-a)\Big[f(a) + 4f\Big(\frac{a+b}{2}\Big) + f(b)\Big].$$
Derive the composite Simpson's rule:
$$\int_a^b f(x)\,dx \approx \frac{h}{3}\Big[f(x_0) + 2\sum_{i=2}^{n/2} f(x_{2i-2}) + 4\sum_{i=1}^{n/2} f(x_{2i-1}) + f(x_n)\Big],$$
where $n$ is an even number and $x_i$ and $h$ are chosen as in the composite trapezoid rule.
[One can show that the approximation error is of order $O(h^4)$ if $|f^{(4)}|$ is bounded.]
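A short C++ sketch of the composite Simpson rule above; the template interface and the test integral are my own choices.

#include <cmath>
#include <iostream>

// Composite Simpson rule on [a,b] with an even number n of subintervals, h = (b-a)/n:
// (h/3) [ f(x_0) + 2 sum f(x_{2i-2}) + 4 sum f(x_{2i-1}) + f(x_n) ].
template <class F>
double simpson(F f, double a, double b, int n) {
    double h = (b - a) / n, s = f(a) + f(b);
    for (int i = 1; i < n; ++i)
        s += (i % 2 == 1 ? 4.0 : 2.0) * f(a + i * h);   // weight 4 at odd, 2 at even interior nodes
    return s * h / 3.0;
}

int main() {
    // int_0^1 exp(x) dx = e - 1 = 1.7182818...
    std::cout << simpson([](double x) { return std::exp(x); }, 0.0, 1.0, 100) << "\n";
}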
5. Newton-Cotes Formula.
Suppose $x_0, \ldots, x_n$ are mesh points in $[a,b]$; usually the mesh points are equally spaced and $x_0 = a$, $x_n = b$. Then the integral can be approximated by the Newton-Cotes formula:
$$\int_a^b f(x)\,dx \approx \sum_{i=0}^n A_i\,f(x_i),$$
where the parameters $A_i$ are determined in such a way that the integral is exact for all polynomials of degree $n$.
[Note: $n+1$ unknowns $A_i$ and $n+1$ coefficients for a polynomial of degree $n$.]
Exercise.
Use the Newton-Cotes formula to derive the trapezoid rule and Simpson's rule. Prove that if $f$ is $n+1$ times differentiable and $|f^{(n+1)}| \le M$ on $[a,b]$, then
$$\Big|\int_a^b f(x)\,dx - \sum_{i=0}^n A_i\,f(x_i)\Big| \le \frac{M}{(n+1)!}\int_a^b \prod_{i=0}^n |x - x_i|\,dx.$$
Exercise 4.5.
We have that $\int_a^b q(x)\,dx = \sum_{i=0}^n A_i\,q(x_i)$ for all polynomials $q$ of degree $n$.
Let $q(x) = L_j(x)$, where $L_j$ is the $j$th Lagrange polynomial for the data points $x_0, x_1, \ldots, x_n$. I.e. $L_j$ is of degree $n$ and satisfies $L_j(x_i) = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}$. Now
$$\int_a^b L_j\,dx = \sum_{i=0}^n A_i\,L_j(x_i) = A_j$$
$$\Rightarrow\;\int_a^b f(x)\,dx \approx \sum_{i=0}^n A_i\,f(x_i) = \sum_{i=0}^n f(x_i)\int_a^b L_i(x)\,dx = \int_a^b \sum_{i=0}^n f(x_i)\,L_i(x)\,dx = \int_a^b p(x)\,dx,$$
where $p(x)$ is the interpolating Lagrange polynomial to $f$. Hence we find the trapezoid rule ($n = 1$) and Simpson's rule ($n = 2$, with $x_1 = \frac{a+b}{2}$).
The Lagrange polynomial has the error term
$$f(x) = p(x) + E(x), \qquad E(x) := \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x - x_0)\cdots(x - x_n),$$
where $\xi = \xi(x)$ lies in $(a,b)$. Hence
$$\Big|\int_a^b f(x)\,dx - \int_a^b p(x)\,dx\Big| \le \int_a^b |E(x)|\,dx \le \frac{M}{(n+1)!}\int_a^b \prod_{i=0}^n |x - x_i|\,dx.$$
6. Ordinary Differential Equations.
An initial value problem for an ODE has the form
$$x'(t) = f(t, x(t)), \quad a \le t \le b, \qquad x(a) = x_0. \quad (1)$$
(1) is equivalent to the integral equation
$$x(t) = x_0 + \int_a^t f(s, x(s))\,ds, \quad a \le t \le b. \quad (2)$$
To solve (2) numerically we divide $[a,b]$ into subintervals with mesh points $t_i = a + ih$, $i = 0, \ldots, n$, and step size $h = (b-a)/n$. (2) implies
$$x(t_{i+1}) = x(t_i) + \int_{t_i}^{t_{i+1}} f(s, x(s))\,ds, \quad i = 0, \ldots, n-1.$$
(a) If we approximate $f(s, x(s))$ on $[t_i, t_{i+1}]$ by $f(t_i, x(t_i))$, we get the Euler (explicit) method for equation (1):
$$w_{i+1} = w_i + h\,f(t_i, w_i), \qquad w_0 = x_0.$$
We have $x(t_{i+1}) \approx w_{i+1}$ if $h$ is sufficiently small.
[Taylor: $x(t_{i+1}) = x(t_i) + x'(t_i)\,h + O(h^2) = x(t_i) + f(t_i, x(t_i))\,h + O(h^2)$.]
(b) If we approximate $f(s, x(s))$ on $[t_i, t_{i+1}]$ by linear interpolation with points $(t_i, f(t_i, x(t_i)))$ and $(t_{i+1}, f(t_{i+1}, x(t_{i+1})))$, we get the trapezoidal (implicit) method for equation (1):
$$w_{i+1} = w_i + \frac{h}{2}\,\big[f(t_i, w_i) + f(t_{i+1}, w_{i+1})\big], \qquad w_0 = x_0.$$
(c) If we combine the Euler method with the trapezoidal method, we get the modified Euler (explicit) method (or Runge-Kutta 2nd order method):
$$w_{i+1} = w_i + \frac{h}{2}\,\big[f(t_i, w_i) + f(t_{i+1}, \underbrace{w_i + h\,f(t_i, w_i)}_{\approx\,w_{i+1}})\big], \qquad w_0 = x_0.$$
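A minimal C++ sketch of method (c); the template interface and the test problem x' = x on [0,1] are my own choices, not part of the notes.

#include <cmath>
#include <iostream>

// Modified Euler (Runge-Kutta 2nd order) method for x'(t) = f(t, x), x(a) = x0:
// w_{i+1} = w_i + (h/2) [ f(t_i, w_i) + f(t_{i+1}, w_i + h f(t_i, w_i)) ].
template <class F>
double modified_euler(F f, double a, double b, double x0, int n) {
    double h = (b - a) / n, w = x0;
    for (int i = 0; i < n; ++i) {
        double t = a + i * h;
        double k1 = f(t, w);                 // Euler predictor slope
        double k2 = f(t + h, w + h * k1);    // slope at the predicted point
        w += 0.5 * h * (k1 + k2);
    }
    return w;
}

int main() {
    // x' = x, x(0) = 1 on [0,1]; exact value x(1) = e = 2.71828...
    std::cout << modified_euler([](double /*t*/, double x) { return x; }, 0.0, 1.0, 1.0, 100) << "\n";
}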
7. Divided Differences.
Suppose a function $f$ and $(n+1)$ distinct points $x_0, x_1, \ldots, x_n$ are given. Divided differences of $f$ can be expressed in a table format as follows:
x_k   0DD       1DD           2DD                3DD                      ...
x_0   f[x_0]
x_1   f[x_1]    f[x_0,x_1]
x_2   f[x_2]    f[x_1,x_2]    f[x_0,x_1,x_2]
x_3   f[x_3]    f[x_2,x_3]    f[x_1,x_2,x_3]     f[x_0,x_1,x_2,x_3]
...
where $f[x_i] = f(x_i)$,
$$f[x_i, x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i}, \qquad f[x_i, x_{i+1}, \ldots, x_{i+k}] = \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+k}] - f[x_i, x_{i+1}, \ldots, x_{i+k-1}]}{x_{i+k} - x_i},$$
$$f[x_1, \ldots, x_n] = \frac{f[x_2, \ldots, x_n] - f[x_1, \ldots, x_{n-1}]}{x_n - x_1}.$$
8. Interpolating Polynomial: Newton Form.
One drawback of Lagrange polynomials is that there is no recursive relationship between $P_{n-1}$ and $P_n$, which implies that each polynomial has to be constructed individually. Hence, in practice one uses the Newton polynomials.
The Newton interpolating polynomial $P_n$ of degree $n$ that agrees with $f$ at the points $x_0, x_1, \ldots, x_n$ is given by
$$P_n(x) = f[x_0] + \sum_{k=1}^n f[x_0, x_1, \ldots, x_k]\prod_{i=0}^{k-1}(x - x_i).$$
Note that $P_n$ can be computed recursively using the relation
$$P_n(x) = P_{n-1}(x) + f[x_0, x_1, \ldots, x_n]\,(x - x_0)(x - x_1)\cdots(x - x_{n-1}).$$
[Note that $f[x_0, x_1, \ldots, x_k]$ can be found on the diagonal of the DD table.]
Exercise.
Suppose values $(x, y)$ are given as $(1, -2)$, $(-2, -56)$, $(0, -2)$, $(3, 4)$, $(-1, -16)$, and $(7, 376)$. Is there a cubic polynomial that takes these values? (Answer: $2x^3 - 7x^2 + 5x - 2$.)
Exercise 4.8.
Data points: $(1, -2)$, $(-2, -56)$, $(0, -2)$, $(3, 4)$, $(-1, -16)$, $(7, 376)$.
x_k    0DD    1DD    2DD    3DD    4DD    5DD
 1      -2
-2     -56     18
 0      -2     27     -9
 3       4      2     -5      2
-1     -16      5     -3      2      0
 7     376     49     11      2      0      0
Newton polynomial:
$$p(x) = -2 + 18\,(x-1) - 9\,(x-1)(x+2) + 2\,(x-1)(x+2)\,x = 2x^3 - 7x^2 + 5x - 2.$$
[Note: We could have stopped at the row for $x_3 = 3$ and checked whether $p_3$ goes through the remaining data points.]
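A short C++ sketch of the divided-difference table and the Newton form, reproducing Exercise 4.8; the in-place coefficient computation and Horner-style evaluation are my own implementation choices.

#include <vector>
#include <iostream>

// Compute the diagonal of the DD table: f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n].
std::vector<double> divided_differences(const std::vector<double>& x, std::vector<double> f) {
    int n = x.size();
    for (int k = 1; k < n; ++k)                    // k-th order divided differences
        for (int i = n - 1; i >= k; --i)
            f[i] = (f[i] - f[i - 1]) / (x[i] - x[i - k]);
    return f;                                      // f[k] now holds f[x_0,...,x_k]
}

// Evaluate the Newton interpolating polynomial with coefficients c at the point t.
double newton_eval(const std::vector<double>& x, const std::vector<double>& c, double t) {
    double p = c.back();                           // nested (Horner-like) evaluation
    for (int k = (int)c.size() - 2; k >= 0; --k) p = p * (t - x[k]) + c[k];
    return p;
}

int main() {
    std::vector<double> xs = {1, -2, 0, 3, -1, 7}, ys = {-2, -56, -2, 4, -16, 376};
    std::vector<double> c = divided_differences(xs, ys);   // -2, 18, -9, 2, 0, 0
    std::cout << newton_eval(xs, c, 2.0) << "\n";           // p(2) = 16 - 28 + 10 - 2 = -4
}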
9. Piecewise Polynomial Approximations.
Another drawback of interpolating polynomials is that $P_n$ tends to oscillate widely when $n$ is large, which implies that $P_n(x)$ may be a poor approximation to $f(x)$ if $x$ is not close to the interpolating points.
If an interval $[a,b]$ is divided into a set of subintervals $[x_i, x_{i+1}]$, $i = 0, 1, \ldots, n-1$, and on each subinterval a different polynomial is constructed to approximate a function $f$, such an approximation is called a spline.
The simplest spline is the linear spline $P$, which approximates the function $f$ on the interval $[x_i, x_{i+1}]$, $i = 0, 1, \ldots, n-1$, with $P$ agreeing with $f$ at $x_i$ and $x_{i+1}$.
Linear splines are easy to construct but are not smooth at the points $x_1, x_2, \ldots, x_{n-1}$.
10. Natural Cubic Splines.
Given a function $f$ defined on $[a,b]$ and a set of points $a = x_0 < x_1 < \cdots < x_n = b$, a function $S$ is called a natural cubic spline if there exist $n$ cubic polynomials $S_i$ such that:
(a) $S(x) = S_i(x)$ for $x$ in $[x_i, x_{i+1}]$ and $i = 0, 1, \ldots, n-1$;
(b) $S_i(x_i) = f(x_i)$ and $S_i(x_{i+1}) = f(x_{i+1})$ for $i = 0, 1, \ldots, n-1$;
(c) $S'_{i+1}(x_{i+1}) = S'_i(x_{i+1})$ for $i = 0, 1, \ldots, n-2$;
(d) $S''_{i+1}(x_{i+1}) = S''_i(x_{i+1})$ for $i = 0, 1, \ldots, n-2$;
(e) $S''(x_0) = S''(x_n) = 0$.
Natural Cubic Splines.
[Figure: neighbouring intervals $[x_i, x_{i+1}]$ and $[x_{i+1}, x_{i+2}]$ with spline pieces $S_i$ and $S_{i+1}$.]
(a) $4n$ parameters
(b) $2n$ equations
(c) $n-1$ equations
(d) $n-1$ equations
(e) $2$ equations
$\Rightarrow$ $4n$ equations in total.
Example. Assume $S$ is a natural cubic spline that interpolates $f \in C^2([a,b])$ at the nodes $a = x_0 < x_1 < \cdots < x_n = b$. We have the following smoothness property of cubic splines:
$$\int_a^b [S''(x)]^2\,dx \le \int_a^b [f''(x)]^2\,dx.$$
In fact, it even holds that
$$\int_a^b [S''(x)]^2\,dx = \min_{g\in G}\int_a^b [g''(x)]^2\,dx,$$
where $G := \{g \in C^2([a,b]) : g(x_i) = f(x_i)\;\; i = 0, 1, \ldots, n\}$.
Exercise: Determine the parameters $a$ to $h$ so that $S(x)$ is a natural cubic spline, where
$$S(x) = ax^3 + bx^2 + cx + d \quad \text{for } x \in [-1, 0] \qquad \text{and} \qquad S(x) = ex^3 + fx^2 + gx + h \quad \text{for } x \in [0, 1],$$
with interpolation conditions $S(-1) = 1$, $S(0) = 2$, and $S(1) = -1$.
(Answer: $a = -1$, $b = -3$, $c = -1$, $d = 2$, $e = 1$, $f = -3$, $g = -1$, $h = 2$.)
11. Computation of Natural Cubic Splines.
Denote
$$c_i = S''(x_i), \quad i = 0, 1, \ldots, n.$$
Then $c_0 = c_n = 0$.
Since $S_i$ is a cubic function on $[x_i, x_{i+1}]$, we know that $S''_i$ is a linear function on $[x_i, x_{i+1}]$. Hence it can be written as
$$S''_i(x) = c_i\,\frac{x_{i+1} - x}{h_i} + c_{i+1}\,\frac{x - x_i}{h_i}, \qquad \text{where } h_i := x_{i+1} - x_i.$$
Exercise.
Show that $S_i$ is given by
$$S_i(x) = \frac{c_i}{6h_i}\,(x_{i+1} - x)^3 + \frac{c_{i+1}}{6h_i}\,(x - x_i)^3 + p_i\,(x_{i+1} - x) + q_i\,(x - x_i),$$
where
$$p_i = \frac{f(x_i)}{h_i} - \frac{c_i\,h_i}{6}, \qquad q_i = \frac{f(x_{i+1})}{h_i} - \frac{c_{i+1}\,h_i}{6},$$
and $c_1, \ldots, c_{n-1}$ satisfy the linear equations
$$h_{i-1}\,c_{i-1} + 2\,(h_{i-1} + h_i)\,c_i + h_i\,c_{i+1} = u_i,$$
where
$$u_i = 6\,(d_i - d_{i-1}), \qquad d_i = \frac{f(x_{i+1}) - f(x_i)}{h_i}$$
for $i = 1, 2, \ldots, n-1$.
C++ Exercise: Write a program to construct a natural cubic spline. The inputs are the number of points $n+1$ and all the points $(x_i, y_i)$, $i = 0, 1, \ldots, n$. The output is a natural cubic spline expressed in terms of the functions $S_i$ defined on the intervals $[x_i, x_{i+1}]$ for $i = 0, 1, \ldots, n-1$.
74
5. Basic Probability Theory
1. CDF and PDF.
Let (, F, P) be a probability space, X be a random variable.
The cumulative distribution function (cdf) F of X is dened by
F(x) = P(X x), x R.
F is an increasing right-continuous function satisfying
F() = 0, F(+) = 1.
If F is absolutely continuous then X has a probability density function (pdf) f dened
by
f(x) = F

(x), x R.
F can be recovered from f by the relation
F(x) =
_
x

f(u) du.
75
2. Normal Distribution.
A random variable $X$ has a normal distribution with parameters $\mu$ and $\sigma^2$, written $X \sim N(\mu, \sigma^2)$, if $X$ has the pdf
$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } x \in \mathbb{R}.$$
If $\mu = 0$ and $\sigma^2 = 1$ then $X$ is called a standard normal random variable and its cdf is usually written as
$$\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\,e^{-\frac{u^2}{2}}\,du.$$
If $X \sim N(\mu, \sigma^2)$ then the characteristic function (Fourier transform) of $X$ is given by
$$c(s) = E(e^{isX}) = e^{i\mu s - \frac{\sigma^2 s^2}{2}},$$
where $i = \sqrt{-1}$. The moment generating function (mgf) is given by
$$m(s) = E(e^{sX}) = e^{\mu s + \frac{\sigma^2 s^2}{2}}.$$
3. Approximation of the Normal CDF.
It is suggested that the standard normal cdf $\Phi(x)$ can be approximated by a polynomial expression $\tilde\Phi(x)$ as follows:
$$\tilde\Phi(x) := 1 - \phi(x)\,(a_1 k + a_2 k^2 + a_3 k^3 + a_4 k^4 + a_5 k^5) \quad (3)$$
when $x \ge 0$, and $\tilde\Phi(x) := 1 - \tilde\Phi(-x)$ when $x < 0$.
The parameters are given by $k = \frac{1}{1 + \gamma x}$, $\gamma = 0.2316419$, $a_1 = 0.319381530$, $a_2 = -0.356563782$, $a_3 = 1.781477937$, $a_4 = -1.821255978$, and $a_5 = 1.330274429$. This approximation has a maximum absolute error less than $7.5 \times 10^{-8}$ for all $x$.
C++ Exercise: Write a program to compute $\Phi(x)$ with (3) and compare the result with that derived with the composite Simpson rule (see Exercise 4.4). The input is a variable $x$ and the number of partitions $n$ over the interval $[0, x]$. The output is the results from the two methods and their error.
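The formula (3) part of this exercise might look as follows in C++ (the nested Horner evaluation of the polynomial in k is my own choice); it can be compared against the composite Simpson routine sketched earlier.

#include <cmath>
#include <iostream>

// Polynomial approximation (3) of the standard normal cdf Phi(x).
double norm_cdf(double x) {
    if (x < 0.0) return 1.0 - norm_cdf(-x);              // Phi(x) = 1 - Phi(-x) for x < 0
    const double gamma = 0.2316419;
    const double a1 = 0.319381530, a2 = -0.356563782, a3 = 1.781477937,
                 a4 = -1.821255978, a5 = 1.330274429;
    const double pi = 3.14159265358979323846;
    double k = 1.0 / (1.0 + gamma * x);
    double phi = std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);          // standard normal pdf
    double poly = k * (a1 + k * (a2 + k * (a3 + k * (a4 + k * a5))));   // a1 k + ... + a5 k^5
    return 1.0 - phi * poly;
}

int main() {
    std::cout << norm_cdf(0.0) << " " << norm_cdf(1.0) << " " << norm_cdf(-1.96) << "\n";
    // approx 0.5  0.841345  0.0249979
}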
4. Lognormal Random Variable.
Let $Y = e^X$ with $X$ a $N(\mu, \sigma^2)$ random variable. Then $Y$ is a lognormal random variable.
Exercise: Show that
$$E(Y) = e^{\mu + \frac{1}{2}\sigma^2}, \qquad E(Y^2) = e^{2\mu + 2\sigma^2}.$$
5. An Important Formula in Pricing European Options.
If $V$ is lognormally distributed and the standard deviation of $\ln V$ is $s$, then (1)
$$E(\max(V - K, 0)) = E(V)\,\Phi(d_1) - K\,\Phi(d_2),$$
where
$$d_1 = \frac{1}{s}\ln\frac{E(V)}{K} + \frac{s}{2} \qquad \text{and} \qquad d_2 = d_1 - s.$$
(1) $K$ will later be denoted by $X$. We use a different notation here to avoid an abuse of notation, as $K$ is not a random variable.
$$E(V - K)^+ = E(V)\,\Phi(d_1) - K\,\Phi(d_2), \qquad d_1 = \frac{1}{s}\ln\frac{E(V)}{K} + \frac{s}{2}, \quad d_2 = d_1 - s.$$
Proof.
Let $g$ be the pdf of $V$. Then
$$E(V - K)^+ = \int_{-\infty}^{\infty} (v - K)^+\,g(v)\,dv = \int_K^{\infty} (v - K)\,g(v)\,dv.$$
As $V$ is lognormal, $\ln V$ is normal $N(m, s^2)$, where $m = \ln(E(V)) - \frac{1}{2}s^2$.
Let $Y := \frac{\ln V - m}{s}$, i.e. $V = e^{m + sY}$. Then $Y \sim N(0,1)$ with pdf $\phi(y) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2}}$.
$$E(V - K)^+ = E(e^{m+sY} - K)^+ = \int_{\frac{\ln K - m}{s}}^{\infty}\big(e^{m+sy} - K\big)\,\phi(y)\,dy = \int_{\frac{\ln K - m}{s}}^{\infty} e^{m+sy}\,\phi(y)\,dy - K\int_{\frac{\ln K - m}{s}}^{\infty}\phi(y)\,dy = I_1 - K\,I_2.$$
$$I_1 = \int_{\frac{\ln K - m}{s}}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2} + m + sy}\,dy = \int_{\frac{\ln K - m}{s}}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{(y-s)^2}{2} + m + \frac{s^2}{2}}\,dy = e^{m + \frac{s^2}{2}}\int_{\frac{\ln K - m}{s} - s}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2}}\,dy \quad [y - s \to y]$$
$$= e^{m + \frac{s^2}{2}}\Big[1 - \Phi\Big(\frac{\ln K - m}{s} - s\Big)\Big] = e^{m + \frac{s^2}{2}}\,\Phi\Big(-\frac{\ln K - m}{s} + s\Big) = e^{\ln E(V)}\,\Phi\Big(\frac{-\ln K + \ln E(V) - \frac{s^2}{2}}{s} + s\Big) = E(V)\,\Phi\Big(\frac{1}{s}\ln\frac{E(V)}{K} + \frac{s}{2}\Big) = E(V)\,\Phi(d_1),$$
on recalling $m = \ln(E(V)) - \frac{1}{2}s^2$.
Similarly
$$I_2 = 1 - \Phi\Big(\frac{\ln K - m}{s}\Big) = \Phi\Big(-\frac{\ln K - m}{s}\Big) = \Phi(d_1 - s) = \Phi(d_2).$$
6. Correlated Random Variables.
Assume $X = (X_1, \ldots, X_n)$ is an $n$-vector of random variables.
The mean of $X$ is an $n$-vector $\mu = (E(X_1), \ldots, E(X_n))$.
The covariance of $X$ is an $n \times n$ matrix $\Sigma$ with components
$$\Sigma_{ij} = (\mathrm{Cov}\,X)_{ij} = E\big((X_i - \mu_i)(X_j - \mu_j)\big).$$
The variance of $X_i$ is given by $\sigma_i^2 = \Sigma_{ii}$ and the correlation between $X_i$ and $X_j$ is given by $\rho_{ij} = \frac{\Sigma_{ij}}{\sigma_i\sigma_j}$.
$X$ is called a multi-dimensional normal vector, written $X \sim N(\mu, \Sigma)$, if $X$ has pdf
$$f(x) = \frac{1}{(2\pi)^{n/2}}\,\frac{1}{(\det\Sigma)^{1/2}}\,\exp\Big(-\frac{1}{2}\,(x - \mu)^T\Sigma^{-1}(x - \mu)\Big),$$
where $\Sigma$ is a symmetric positive definite matrix.
7. Convergence.
Let $\{X_n\}$ be a sequence of random variables. There are four types of convergence concepts associated with $\{X_n\}$:
Almost sure convergence, written $X_n \xrightarrow{a.s.} X$, if there exists a null set $N$ such that for all $\omega \in \Omega\setminus N$ one has
$$X_n(\omega) \to X(\omega), \quad n \to \infty.$$
Convergence in probability, written $X_n \xrightarrow{P} X$, if for every $\epsilon > 0$ one has
$$P(|X_n - X| > \epsilon) \to 0, \quad n \to \infty.$$
Convergence in norm, written $X_n \xrightarrow{L^p} X$, if $X_n, X \in L^p$ and
$$E|X_n - X|^p \to 0, \quad n \to \infty.$$
Convergence in distribution, written $X_n \xrightarrow{D} X$, if for any $x$ at which $P(X \le x)$ is continuous one has
$$P(X_n \le x) \to P(X \le x), \quad n \to \infty.$$
8. Strong Law of Large Numbers.
Let $\{X_n\}$ be independent, identically distributed (iid) random variables with finite expectation $E(X_1) = \mu$. Then
$$\frac{Z_n}{n} \xrightarrow{a.s.} \mu, \qquad \text{where } Z_n = X_1 + \cdots + X_n.$$
9. Central Limit Theorem. Let $\{X_n\}$ be iid random variables with finite expectation $\mu$ and finite variance $\sigma^2 > 0$. For each $n$, let
$$Z_n = X_1 + \cdots + X_n.$$
Then
$$\frac{Z_n - n\mu}{\sigma\sqrt{n}} = \frac{\frac{Z_n}{n} - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} Z,$$
where $Z$ is a $N(0,1)$ random variable, i.e.,
$$P\Big(\frac{Z_n - n\mu}{\sigma\sqrt{n}} \le z\Big) \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^z e^{-\frac{x^2}{2}}\,dx, \quad n \to \infty.$$
10. Lindeberg-Feller Central Limit Theorem.
Suppose $X$ is a triangular array of random variables, i.e.,
$$X = \{X^n_1, X^n_2, \ldots, X^n_{k(n)} : n \in \{1, 2, \ldots\}\}, \quad \text{with } k(n) \to \infty \text{ as } n \to \infty,$$
such that, for each $n$, $X^n_1, \ldots, X^n_{k(n)}$ are independently distributed and are bounded in absolute value by a constant $y_n$ with $y_n \to 0$. Let
$$Z_n = X^n_1 + \cdots + X^n_{k(n)}.$$
If $E(Z_n) \to \mu$ and $\mathrm{var}(Z_n) \to \sigma^2 > 0$, then $Z_n$ converges in distribution to a normally distributed random variable with mean $\mu$ and variance $\sigma^2$.
Note: Lindeberg-Feller implies the Central Limit Theorem (5.9).
If $X_1, X_2, \ldots$ are iid with expectation $\mu$ and variance $\sigma^2$, then define
$$X^n_i := \frac{X_i - \mu}{\sigma\sqrt{n}}, \quad i = 1, 2, \ldots, k(n) := n.$$
For each $n$, $X^n_1, \ldots, X^n_{k(n)}$ are independent and
$$E(X^n_i) = \frac{E(X_i) - \mu}{\sigma\sqrt{n}} = 0, \qquad \mathrm{Var}(X^n_i) = \frac{1}{n\sigma^2}\,\mathrm{Var}\,X_i = \frac{1}{n\sigma^2}\,\sigma^2 = \frac{1}{n}.$$
Let $Z_n = X^n_1 + \cdots + X^n_{k(n)} = \big(\sum_{i=1}^n X_i - n\mu\big)\frac{1}{\sigma\sqrt{n}}$; then
$$E(Z_n) = \sum_{i=1}^{k(n)} E(X^n_i) = 0, \qquad \mathrm{Var}(Z_n) = \sum_{i=1}^{k(n)}\mathrm{Var}(X^n_i) = 1.$$
Hence, by Lindeberg-Feller,
$$Z_n \xrightarrow{D} Z \sim N(0,1).$$
6. Optimization
1. Unconstrained Optimization.
Given $f : \mathbb{R}^n \to \mathbb{R}$.
Minimize $f(x)$ over $x \in \mathbb{R}^n$.
$f$ has a local minimum at a point $\bar x$ if $f(\bar x) \le f(x)$ for all $x$ near $\bar x$, i.e.
$$\exists\,\delta > 0 \text{ s.t. } f(\bar x) \le f(x) \quad \forall\,x : \|x - \bar x\| < \delta.$$
$f$ has a global minimum at $\bar x$ if
$$f(\bar x) \le f(x) \quad \forall\,x \in \mathbb{R}^n.$$
2. Optimality Conditions.
First order necessary conditions:
Suppose that $f$ has a local minimum at $\bar x$ and that $f$ is continuously differentiable in an open neighbourhood of $\bar x$. Then $\nabla f(\bar x) = 0$. ($\bar x$ is called a stationary point.)
Second order sufficient conditions:
Suppose that $f$ is twice continuously differentiable in an open neighbourhood of $\bar x$ and that $\nabla f(\bar x) = 0$ and $\nabla^2 f(\bar x)$ is positive definite. Then $\bar x$ is a strict local minimizer of $f$.
Example: Show that $f = (2x_1^2 - x_2)(x_1^2 - 2x_2)$ has a minimum at $(0,0)$ along any straight line passing through the origin, but $f$ has no minimum at $(0,0)$.
Exercise: Find the minimum solution of
$$f(x_1, x_2) = 2x_1^2 + x_1 x_2 + x_2^2 - x_1 - 3x_2. \quad (4)$$
(Answer: $\frac{1}{7}(-1, 11)$.)
Sufficient Condition.
Taylor gives for any $d \in \mathbb{R}^n$:
$$f(\bar x + d) = f(\bar x) + \nabla f(\bar x)^T d + \tfrac{1}{2}\,d^T\nabla^2 f(\bar x + \theta d)\,d, \quad \theta \in (0,1).$$
If $\bar x$ is not a strict local minimizer, then
$$\exists\,\{x_k\} \subset \mathbb{R}^n\setminus\{\bar x\} : x_k \to \bar x \;\text{ s.t. }\; f(x_k) \le f(\bar x).$$
Define $d_k := \frac{x_k - \bar x}{\|x_k - \bar x\|}$. Then $\|d_k\| = 1$ and there exists a subsequence $\{d_{k_j}\}$ such that $d_{k_j} \to d^*$ as $j \to \infty$ and $\|d^*\| = 1$. W.l.o.g. we assume $d_k \to d^*$ as $k \to \infty$.
$$f(\bar x) \ge f(x_k) = f(\bar x + \|x_k - \bar x\|\,d_k) = f(\bar x) + \|x_k - \bar x\|\,\nabla f(\bar x)^T d_k + \tfrac{1}{2}\,\|x_k - \bar x\|^2\,d_k^T\nabla^2 f(\bar x + \theta_k\|x_k - \bar x\|\,d_k)\,d_k$$
$$= f(\bar x) + \tfrac{1}{2}\,\|x_k - \bar x\|^2\,d_k^T\nabla^2 f(\bar x + \theta_k\|x_k - \bar x\|\,d_k)\,d_k.$$
Hence $d_k^T\nabla^2 f(\bar x + \theta_k\|x_k - \bar x\|\,d_k)\,d_k \le 0$, and on letting $k \to \infty$,
$$(d^*)^T\nabla^2 f(\bar x)\,d^* \le 0.$$
As $d^* \ne 0$, this is a contradiction to $\nabla^2 f(\bar x)$ being symmetric positive definite. Hence $\bar x$ is a strict local minimizer.
Example 6.2. Show that $f = (2x_1^2 - x_2)(x_1^2 - 2x_2)$ has a minimum at $(0,0)$ along any straight line passing through the origin, but $f$ has no minimum at $(0,0)$.
Answer.
Straight line through $(0,0)$: $x_2 = \lambda x_1$, $\lambda \in \mathbb{R}$ fixed.
$$g(r) := f(r, \lambda r) = (2r^2 - \lambda r)(r^2 - 2\lambda r)$$
$$g'(r) = 8r^3 - 15\lambda r^2 + 4\lambda^2 r, \qquad g''(r) = 24r^2 - 30\lambda r + 4\lambda^2$$
$$\Rightarrow\; g'(0) = 0 \;\text{ and }\; g''(0) = 4\lambda^2 > 0.$$
Hence $r = 0$ is a minimizer for $g$ $\Rightarrow$ $(0,0)$ is a minimizer for $f$ along any straight line.
Now let $(x_1^k, x_2^k) = (\frac{1}{k}, \frac{1}{k^2}) \to (0,0)$ as $k \to \infty$. Then
$$f(x_1^k, x_2^k) = -\frac{1}{k^2}\cdot\frac{1}{k^2} < 0 = f(0,0) \quad \forall\,k.$$
Hence $(0,0)$ is not a minimizer for $f$.
[Note: $\nabla f(0,0) = 0$, but $\nabla^2 f(0,0) = \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix}$.]
3. Convex Optimization.
Exercise. When $f$ is convex, any local minimizer $\bar x$ is a global minimizer of $f$. If in addition $f$ is differentiable, then any stationary point $\bar x$ is a global minimizer of $f$.
(Hint: Use a contradiction argument.)
Exercise 6.3.
When $f$ is convex, any local minimizer $\bar x$ is a global minimizer of $f$.
Proof.
Suppose $\bar x$ is a local minimizer, but not a global minimizer. Then $\exists\,\hat x$ s.t. $f(\hat x) < f(\bar x)$.
Since $f$ is convex, we have that
$$f(\lambda\hat x + (1-\lambda)\,\bar x) \le \lambda f(\hat x) + (1-\lambda)\,f(\bar x) < \lambda f(\bar x) + (1-\lambda)\,f(\bar x) = f(\bar x) \quad \forall\,\lambda \in (0,1].$$
Let $x_\lambda := \lambda\hat x + (1-\lambda)\,\bar x$. Then $x_\lambda \to \bar x$ and $f(x_\lambda) < f(\bar x)$ as $\lambda \to 0$.
This is a contradiction to $\bar x$ being a local minimizer. Hence $\bar x$ is a global minimizer for $f$.
4. Line Search.
The basic procedure to solve numerically an unconstrained problem (minimize $f(x)$ over $x \in \mathbb{R}^n$) is as follows.
(i) Choose an initial point $x_0 \in \mathbb{R}^n$ and an initial search direction $d_0 \in \mathbb{R}^n$ and set $k = 0$.
(ii) Choose a step size $\alpha_k$ and define a new point $x_{k+1} = x_k + \alpha_k d_k$. Check if the stopping criterion is satisfied ($\|\nabla f(x_{k+1})\| < \epsilon$?). If yes, $x_{k+1}$ is the optimal solution; stop. If no, go to (iii).
(iii) Choose a new search direction $d_{k+1}$ (descent direction) and set $k = k+1$. Go to (ii).
The essential and most difficult part in any search algorithm is to choose a descent direction $d_k$ and a step size $\alpha_k$ with good convergence and stability properties.
5. Steepest Descent Method.
$f$ is differentiable.
Choose $d_k = -g_k$, where $g_k = \nabla f(x_k)$, and choose $\alpha_k$ s.t.
$$f(x_k + \alpha_k d_k) = \min_{\alpha\in\mathbb{R}} f(x_k + \alpha d_k).$$
Note that the successive descent directions are orthogonal to each other, i.e. $(g_k)^T g_{k+1} = 0$, and the convergence for some functions may be very slow, called zigzagging.
Exercise.
Use the steepest descent (SD) method to solve (4) with the initial point $x_0 = (1,1)$.
(Answer: the first three iterations give $x_1 = (0,1)$, $x_2 = (0, \frac{3}{2})$, and $x_3 = (-\frac{1}{8}, \frac{3}{2})$.)
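As a sketch of this exercise, the C++ program below runs steepest descent on the quadratic (4), written as $f(x) = \frac{1}{2}x^T Q x + c^T x$ with $Q = \begin{pmatrix}4&1\\1&2\end{pmatrix}$, $c = (-1,-3)$. The closed-form exact line search $\alpha_k = (g_k^T g_k)/(g_k^T Q g_k)$ is a standard fact for quadratics and an assumption of this sketch, not a formula stated in the notes.

#include <array>
#include <cmath>
#include <iostream>

using Vec = std::array<double, 2>;

int main() {
    const double Q[2][2] = {{4, 1}, {1, 2}};
    const Vec c = {-1, -3};
    Vec x = {1, 1};                                         // x_0
    for (int k = 0; k < 50; ++k) {
        Vec g = {Q[0][0] * x[0] + Q[0][1] * x[1] + c[0],
                 Q[1][0] * x[0] + Q[1][1] * x[1] + c[1]};   // g_k = grad f(x_k) = Q x + c
        double gg = g[0] * g[0] + g[1] * g[1];
        if (std::sqrt(gg) < 1e-10) break;                   // stationary point reached
        Vec Qg = {Q[0][0] * g[0] + Q[0][1] * g[1], Q[1][0] * g[0] + Q[1][1] * g[1]};
        double alpha = gg / (g[0] * Qg[0] + g[1] * Qg[1]);  // exact line search for a quadratic
        x = {x[0] - alpha * g[0], x[1] - alpha * g[1]};     // x_{k+1} = x_k - alpha_k g_k
        std::cout << "x_" << k + 1 << " = (" << x[0] << ", " << x[1] << ")\n";
    }
}

The first iterates printed are (0,1), (0,1.5), (-0.125,1.5), matching the exercise answer, and the zigzagging towards (1/7)(-1,11) is clearly visible.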
Steepest Descent.
Taylor gives:
$$f(x_k + \alpha d_k) = f(x_k) + \alpha\,\nabla f(x_k)^T d_k + O(\alpha^2).$$
As
$$\nabla f(x_k)^T d_k = \|\nabla f(x_k)\|\,\|d_k\|\,\cos\theta_k,$$
with $\theta_k$ the angle between $d_k$ and $\nabla f(x_k)$, we see that $d_k$ is a descent direction if $\cos\theta_k < 0$. The descent is steepest when $\theta_k = \pi$, i.e. $\cos\theta_k = -1$.
Zigzagging.
$\alpha_k$ is a minimizer of $\varphi(\alpha) := f(x_k + \alpha d_k)$ with $d_k = -g_k$.
Hence
$$0 = \varphi'(\alpha_k) = \nabla f(x_k + \alpha_k d_k)^T d_k = \nabla f(x_{k+1})^T(-g_k) = -(g_{k+1})^T g_k.$$
Hence $d_{k+1} \perp d_k$, which leads to zigzagging.
Exercise 6.5.
Use the SD method to solve (4) with the initial point $x_0 = (1,1)$. [min: $\frac{1}{7}(-1, 11)$.]
Answer.
$\nabla f = (4x_1 + x_2 - 1,\; 2x_2 + x_1 - 3)$.
Iteration 0: $d_0 = -\nabla f(x_0) = -(4, 0) \ne (0,0)$.
$\varphi(\alpha) = f(x_0 + \alpha d_0) = f(1 - 4\alpha, 1) = 2\,(1 - 4\alpha)^2 - 2$
$\Rightarrow$ minimum point at $\alpha_0 = \frac{1}{4}$ $\Rightarrow$ $x_1 = x_0 + \alpha_0 d_0 = (0, 1)$,
$d_1 = -\nabla f(x_1) = -(0, -1) = (0, 1) \ne (0,0)$.
Iteration 1: $x_2 = (0, \frac{3}{2})$, $d_2 = (-\frac{1}{2}, 0)$.
Iteration 2: $x_3 = (-\frac{1}{8}, \frac{3}{2})$, $d_3 = (0, \frac{1}{8})$.
6. Newton Method.
$f$ is twice differentiable.
Choose $d_k = -[H_k]^{-1} g_k$, where $H_k = \nabla^2 f(x_k)$.
Set $x_{k+1} = x_k + d_k$.
If $H_k$ is positive definite then $d_k$ is a descent direction.
The main drawback of the Newton method is that it requires the computation of $\nabla^2 f(x_k)$ and its inverse, which can be difficult and time-consuming.
Exercise.
Use the Newton method to solve (4) with $x_0 = (1,1)$.
(Answer: the first iteration gives $x_1 = \frac{1}{7}(-1, 11)$.)
Newton Method.
Taylor gives
$$f(x_k + d) \approx f(x_k) + d^T\nabla f(x_k) + \tfrac{1}{2}\,d^T\nabla^2 f(x_k)\,d =: m(d)$$
$$\min_d m(d) \;\Rightarrow\; \nabla m(d) = 0 \;\Rightarrow\; \nabla f(x_k) + \nabla^2 f(x_k)\,d = 0.$$
Hence choose $d_k = -[\nabla^2 f(x_k)]^{-1}\nabla f(x_k) = -[H_k]^{-1} g_k$.
If $H_k$ is positive definite, then so is $(H_k)^{-1}$, and we get
$$(d_k)^T g_k = -(g_k)^T(H_k)^{-1} g_k \le -\lambda_k\,\|g_k\|^2 < 0$$
for some $\lambda_k > 0$.
Hence $d_k$ is a descent direction.
[Aside: The Newton method for $\min_x f(x)$ is equivalent to the Newton method for finding a root of the system of nonlinear equations $\nabla f(x) = 0$.]
Exercise 6.6.
Use the Newton method to minimize
$$f(x_1, x_2) = 2x_1^2 + x_1 x_2 + x_2^2 - x_1 - 3x_2$$
with $x_0 = (1,1)^T$.
Answer.
$$\nabla f = \begin{pmatrix} 4x_1 + x_2 - 1 \\ 2x_2 + x_1 - 3 \end{pmatrix}, \qquad H := \nabla^2 f = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}, \qquad H^{-1} = \frac{1}{\det H}\begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix} = \frac{1}{7}\begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix}.$$
Iteration 0: $x_0 = (1,1)^T$, $\nabla f(x_0) = (4, 0)^T$.
$$x_1 = x_0 - [H_0]^{-1}\nabla f(x_0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} - \frac{1}{7}\begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix}\begin{pmatrix} 4 \\ 0 \end{pmatrix} = \frac{1}{7}\begin{pmatrix} -1 \\ 11 \end{pmatrix}.$$
$\nabla f(x_1) = (0, 0)^T$ and $H$ positive definite $\Rightarrow$ $x_1$ is the minimum point.
7. Choice of Stepsize.
In computing the step size $\alpha_k$ we face a tradeoff. We would like to choose $\alpha_k$ to give a substantial reduction of $f$, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function $\varphi : \mathbb{R} \to \mathbb{R}$ defined by
$$\varphi(\alpha) = f(x_k + \alpha d_k), \quad \alpha > 0,$$
but in general it is too expensive to identify this value.
A common strategy is to perform an inexact line search to identify a step size that achieves adequate reductions in $f$ with minimum cost.
$\alpha$ is normally chosen to satisfy the Wolfe conditions:
$$f(x_k + \alpha_k d_k) \le f(x_k) + c_1\,\alpha_k\,(g_k)^T d_k \quad (5)$$
$$\nabla f(x_k + \alpha_k d_k)^T d_k \ge c_2\,(g_k)^T d_k, \quad (6)$$
with $0 < c_1 < c_2 < 1$. (5) is called the sufficient decrease condition, and (6) is the curvature condition.
Choice of Stepsize.
The simple condition
$$f(x_k + \alpha_k d_k) < f(x_k) \quad (*)$$
is not appropriate, as it may not lead to a sufficient reduction.
Example: $f(x) = (x-1)^2 - 1$. So $\min f(x) = -1$, but we can choose $x_k$ satisfying $(*)$ such that $f(x_k) = \frac{1}{k} \to 0$.
Note that the sufficient decrease condition (5),
$$\varphi(\alpha) = f(x_k + \alpha d_k) \le \ell(\alpha) := f(x_k) + c_1\,\alpha\,(g_k)^T d_k,$$
yields acceptable regions for $\alpha$. Here $\varphi(\alpha) < \ell(\alpha)$ for small $\alpha > 0$, as $(g_k)^T d_k < 0$ for descent directions.
The curvature condition (6) is equivalent to
$$\varphi'(\alpha) \ge c_2\,\varphi'(0) \qquad [\,> \varphi'(0)\,],$$
i.e. a condition on the desired slope, and so rules out unacceptably short steps $\alpha$. In practice $c_1 = 10^{-4}$ and $c_2 = 0.9$.
8. Convergence of Line Search Methods.
An algorithm is said to be globally convergent if
$$\lim_{k\to\infty}\|g_k\| = 0.$$
It can be shown that if the step sizes satisfy the Wolfe conditions,
then the steepest descent method is globally convergent;
so is the Newton method, provided the Hessian matrices $\nabla^2 f(x_k)$ have a bounded condition number and are positive definite.
Exercise. Show that the steepest descent method is globally convergent if the following conditions hold:
(a) $\alpha_k$ satisfies the Wolfe conditions,
(b) $f(x) \ge M$ $\forall\,x \in \mathbb{R}^n$,
(c) $f \in C^1$ and $\nabla f$ is Lipschitz: $\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|$ $\forall\,x, y \in \mathbb{R}^n$.
[Hint: Show that $\sum_{k=0}^{\infty}\|g_k\|^2 < \infty$.]
Exercise 6.8.
Assume that $d_k$ is a descent direction, i.e. $(g_k)^T d_k < 0$, where $g_k := \nabla f(x_k)$. Then if
1. $\alpha_k$ satisfies the Wolfe conditions,
2. $f(x) \ge M$ $\forall\,x \in \mathbb{R}^n$,
3. $f \in C^1$ and $\nabla f$ is Lipschitz, i.e. $\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|$ $\forall\,x, y \in \mathbb{R}^n$,
it holds that
$$\sum_{k=0}^{\infty}\cos^2\theta_k\,\|g_k\|^2 < \infty, \qquad \text{where } \cos\theta_k := \frac{-(g_k)^T d_k}{\|g_k\|\,\|d_k\|}.$$
[Note: the SD method is the special case with $\cos^2\theta_k = 1$ $\Rightarrow$ $\lim_{k\to\infty}\|g_k\| = 0$.]
Proof.
Wolfe condition (6) $\Rightarrow$ $(g_{k+1})^T d_k \ge c_2\,(g_k)^T d_k$ $\Rightarrow$
$$(g_{k+1} - g_k)^T d_k \ge (c_2 - 1)\,(g_k)^T d_k. \quad (*)$$
The Lipschitz condition yields that
$$(g_{k+1} - g_k)^T d_k \le \|g_{k+1} - g_k\|\,\|d_k\| = \|\nabla f(x_{k+1}) - \nabla f(x_k)\|\,\|d_k\| \le L\,\|x_{k+1} - x_k\|\,\|d_k\| = \alpha_k\,L\,\|d_k\|^2. \quad (**)$$
Combining $(*)$ and $(**)$ gives $\alpha_k \ge \frac{c_2 - 1}{L}\,\frac{(g_k)^T d_k}{\|d_k\|^2}$, and hence
$$\alpha_k\,(g_k)^T d_k \le \frac{c_2 - 1}{L}\,\frac{[(g_k)^T d_k]^2}{\|d_k\|^2}.$$
Together with Wolfe condition (5) we get
$$f(x_{k+1}) \le f(x_k) + c_1\,\frac{c_2 - 1}{L}\,\frac{[(g_k)^T d_k]^2}{\|d_k\|^2} = f(x_k) - c\,\cos^2\theta_k\,\|g_k\|^2,$$
where $c := c_1\,\frac{1 - c_2}{L} > 0$.
$$\Rightarrow\; f(x_{k+1}) \le f(x_k) - c\,\cos^2\theta_k\,\|g_k\|^2 \le \ldots \le f(x_0) - c\sum_{j=0}^{k}\cos^2\theta_j\,\|g_j\|^2$$
$$\Rightarrow\; \sum_{j=0}^{k}\cos^2\theta_j\,\|g_j\|^2 \le \frac{1}{c}\,\big(f(x_0) - M\big) \quad \forall\,k \qquad\Rightarrow\qquad \sum_{j=0}^{\infty}\cos^2\theta_j\,\|g_j\|^2 < \infty.$$
9. Popular Search Methods.
In practice the steepest descent method and the Newton method are rarely used, due to the slow convergence rate and the difficulty in computing Hessian matrices, respectively.
The popular search methods are
the conjugate gradient method (a variation of the SD method with superlinear convergence) and
the quasi-Newton method (a variation of the Newton method without computation of Hessian matrices).
There are some efficient algorithms based on the trust-region approach. See Fletcher (2000) for details.
10. Constrained Optimization.
Minimize $f(x)$ over $x \in \mathbb{R}^n$ subject to
the equality constraints
$$h_i(x) = 0, \quad i = 1, \ldots, l,$$
and the inequality constraints
$$g_j(x) \le 0, \quad j = 1, \ldots, m.$$
Assume that all functions involved are differentiable.
11. Linear Programming.
The problem is to minimize
$$z = c_1 x_1 + \cdots + c_n x_n$$
subject to
$$a_{i1} x_1 + \cdots + a_{in} x_n \ge b_i, \quad i = 1, \ldots, m,$$
and
$$x_1, \ldots, x_n \ge 0.$$
LPs can be easily and efficiently solved with the simplex algorithm or the interior point method.
MS-Excel has a good built-in LP solver capable of solving problems with up to 200 variables. MATLAB with the optimization toolbox also provides a good LP solver.
12. Graphic Method.
If an LP problem has only two decision variables $(x_1, x_2)$, then it can be solved by the graphic method as follows:
First draw the feasible region from the given constraints and a contour line of the objective function,
then, on establishing the increasing direction perpendicular to the contour line, find the optimal point on the boundary of the feasible region,
then find two linear equations which define that point,
and finally solve the two equations to obtain the optimal point.
Exercise. Use the graphic method to solve the LP:
minimize $z = -3x_1 - 2x_2$
subject to $x_1 + x_2 \le 80$, $2x_1 + x_2 \le 100$, $x_1 \le 40$, and $x_1, x_2 \ge 0$.
(Answer: $x_1 = 20$, $x_2 = 60$.)
13. Quadratic Programming.
Minimize
$$x^T Q x + c^T x$$
subject to
$$Ax \le b \quad \text{and} \quad x \ge 0,$$
where $Q$ is an $n \times n$ symmetric positive definite matrix, $A$ is an $m \times n$ matrix, $x, c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$.
To solve a QP problem, one
first derives a set of equations from the Kuhn-Tucker conditions, and
then applies the Wolfe algorithm or the Lemke algorithm to find the optimal solution.
The MS-Excel solver is capable of solving reasonably sized QP problems; similarly for MATLAB.
14. Kuhn-Tucker Conditions.
$$\min f(x) \text{ over } x \in \mathbb{R}^n \quad \text{s.t.} \quad h_i(x) = 0,\; i = 1, \ldots, l; \qquad g_j(x) \le 0,\; j = 1, \ldots, m.$$
Assume that $\bar x$ is an optimal solution.
Under some regularity conditions, called the constraint qualifications, there exist two vectors $\bar u = (\bar u_1, \ldots, \bar u_l)$ and $\bar v = (\bar v_1, \ldots, \bar v_m)$, called the Lagrange multipliers, such that the following set of conditions is satisfied:
$$\frac{\partial L}{\partial x_k}(\bar x, \bar u, \bar v) = 0, \quad k = 1, \ldots, n$$
$$h_i(\bar x) = 0, \quad i = 1, \ldots, l$$
$$g_j(\bar x) \le 0, \quad j = 1, \ldots, m$$
$$\bar v_j\,g_j(\bar x) = 0, \quad \bar v_j \ge 0, \quad j = 1, \ldots, m$$
where
$$L(x, u, v) = f(x) + \sum_{i=1}^l u_i\,h_i(x) + \sum_{j=1}^m v_j\,g_j(x)$$
is called the Lagrange function or Lagrangian.
Furthermore, if $f : \mathbb{R}^n \to \mathbb{R}$ and $h_i, g_j : \mathbb{R}^n \to \mathbb{R}$ are convex, then $\bar x$ is an optimal solution if and only if $(\bar x, \bar u, \bar v)$ satisfies the Kuhn-Tucker conditions.
This holds in particular when $f$ is convex and $h_i$, $g_j$ are linear.
Example.
Find the minimum solution to the function $x_2 - x_1$ subject to $x_1^2 + x_2^2 \le 1$.
Exercise.
Find the minimum solution to the function $x_1^2 + x_2^2 - 2x_1 - 4x_2$ subject to $x_1 + 2x_2 \le 2$ and $x_2 \ge 0$.
(Answer: $(x_1, x_2) = \frac{1}{5}(2, 4)$.)
Interpretation of the Kuhn-Tucker conditions
Assume that no equality constraints are present.
If $\bar x$ is an interior point, i.e. no constraints are active, then we recover the usual optimality condition: $\nabla f(\bar x) = 0$.
Now assume that $\bar x$ lies on the boundary of the feasible set and let $g_{j_k}$ be the active constraints at $\bar x$. Then a necessary condition for optimality is that we cannot find a descent direction for $f$ at $\bar x$ that is also a feasible direction. Such a vector cannot exist if
$$\nabla f(\bar x) = -\sum_k v_{j_k}\,\nabla g_{j_k}(\bar x) \quad \text{with } v_{j_k} \ge 0. \quad (*)$$
This is because, if $d \in \mathbb{R}^n$ is a descent direction, then $\nabla f(\bar x)^T d < 0$ and $\sum_k v_{j_k}\,\nabla g_{j_k}(\bar x)^T d > 0$.
So there must exist a $j_k$ such that $\nabla g_{j_k}(\bar x)^T d > 0$. But that means that $d$ is an ascent direction for $g_{j_k}$, and as $g_{j_k}$ is active at $\bar x$, it is not a feasible direction.
If we require $v_j\,g_j(\bar x) = 0$ for all inactive constraints, then we can re-write $(*)$ as $0 = \nabla f(\bar x) + \sum_{j=1}^m v_j\,\nabla g_j(\bar x)$ with $v_j \ge 0$. These are the KT conditions.
Application of Kuhn-Tucker: LP Duality
Let $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m\times n}$.
$$\min c^T x \quad \text{s.t.} \quad Ax \ge b, \; x \ge 0. \quad (P)$$
Equivalent to $\min c^T x$ s.t. $b - Ax \le 0$, $-x \le 0$.
Lagrangian: $L = c^T x + v^T(b - Ax) + y^T(-x)$.
Hence $\bar x$ is the solution if there exist $\bar v$ and $\bar y$ such that
$$\nabla_x L = c - A^T\bar v - \bar y = 0 \;\Rightarrow\; \bar y = c - A^T\bar v,$$
KT conditions: $\bar v^T(b - A\bar x) = 0$, $\bar y^T(-\bar x) = 0$, $\bar v, \bar y \ge 0$, $b - A\bar x \le 0$, $\bar x \ge 0$.
Eliminate $\bar y$ to find $\bar v, \bar x$:
$$A\bar x \ge b, \; \bar x \ge 0 \qquad \text{feasible region: primal}$$
$$A^T\bar v \le c, \; \bar v \ge 0 \qquad \text{feasible region: dual}$$
$$\left.\begin{aligned} \bar v^T(b - A\bar x) &= 0 \\ \bar x^T(c - A^T\bar v) &= 0 \end{aligned}\right\} \;\Rightarrow\; \bar x^T c = \bar x^T A^T\bar v = \bar v^T b.$$
Hence $\bar v \in \mathbb{R}^m$ solves the dual:
$$\max b^T v \quad \text{s.t.} \quad A^T v \le c, \; v \ge 0. \quad (D)$$
Here we have used that
$$c^T\bar x = \min_{x\ge 0,\,Ax\ge b} c^T x = \min_{x\ge 0}\max_{v\ge 0}\,\big[c^T x + v^T(b - Ax)\big] \ge \max_{v\ge 0}\min_{x\ge 0}\,\big[c^T x + v^T(b - Ax)\big]$$
$$= \max_{v\ge 0}\min_{x\ge 0}\,\big[v^T b + x^T(c - A^T v)\big] = \max_{v\ge 0,\,A^T v\le c} v^T b \ge \bar v^T b = c^T\bar x.$$
Example 6.14.
Find the minimum solution to the function $x_2 - x_1$ subject to $x_1^2 + x_2^2 \le 1$.
Answer.
$L = x_2 - x_1 + v\,(x_1^2 + x_2^2 - 1)$, so the KT conditions become
$$\frac{\partial L}{\partial x_1} = -1 + 2vx_1 = 0 \quad (1)$$
$$\frac{\partial L}{\partial x_2} = 1 + 2vx_2 = 0 \quad (2)$$
$$x_1^2 + x_2^2 \le 1 \quad (3)$$
$$v\,(x_1^2 + x_2^2 - 1) = 0, \quad v \ge 0 \quad (4)$$
(1) $\Rightarrow$ $v > 0$ and hence $x_1 = \frac{1}{2v}$, $x_2 = -\frac{1}{2v}$.
Plugging this into (4) yields $\frac{2}{4v^2} = 1$ and hence
$$v = \frac{1}{\sqrt{2}} \;\Rightarrow\; x_1 = \frac{1}{\sqrt{2}}, \quad x_2 = -\frac{1}{\sqrt{2}},$$
with the optimal value being $z = -\sqrt{2}$.
Example.
$$\min x_1 \quad \text{s.t.} \quad x_2 - x_1^3 \le 0, \quad x_1 \le 1, \quad -x_2 \le 0.$$
Since $x_1^3 \ge x_2 \ge 0$ we have $x_1 \ge 0$ and hence $\bar x = (0, 0)$ is the unique minimizer.
Lagrangian: $L = x_1 + v_1\,(x_2 - x_1^3) + v_2\,(x_1 - 1) + v_3\,(-x_2)$.
KT conditions for a feasible point $x$:
$$\nabla L = \begin{pmatrix} 1 - 3v_1 x_1^2 + v_2 \\ v_1 - v_3 \end{pmatrix} = 0 \quad (1)$$
$$v_1\,(x_2 - x_1^3) = 0, \quad v_2\,(x_1 - 1) = 0, \quad v_3\,(-x_2) = 0 \quad (2)$$
$$v_1, v_2, v_3 \ge 0 \quad (3)$$
Check the KT conditions at $\bar x = (0, 0)$:
(1) $\Rightarrow$ $v_2 = -1 < 0$, impossible!
The KT condition is not satisfied, since the constraint qualifications do not hold.
Here $g_1 = x_2 - x_1^3$ and $g_3 = -x_2$ are active at $(0,0)$, and $\nabla g_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\nabla g_3 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}$. Hence $\nabla g_1$ and $\nabla g_3$ are not linearly independent!
Exercise 6.14
Find the minimum solution to the function $x_1^2 + x_2^2 - 2x_1 - 4x_2$ subject to $x_1 + 2x_2 \le 2$ and $x_2 \ge 0$.
Answer.
Lagrangian: $L = x_1^2 + x_2^2 - 2x_1 - 4x_2 + v_1\,(x_1 + 2x_2 - 2) + v_2\,(-x_2)$
KT conditions:
$$\nabla L = \begin{pmatrix} 2x_1 - 2 + v_1 \\ 2x_2 - 4 + 2v_1 - v_2 \end{pmatrix} = 0 \quad (1)$$
$$v_1\,(x_1 + 2x_2 - 2) = 0, \quad v_2\,x_2 = 0, \quad v_1, v_2 \ge 0 \quad (2)$$
$$x_1 + 2x_2 \le 2, \quad x_2 \ge 0 \quad (3)$$
If $x_1 + 2x_2 - 2 < 0$ then $v_1 = 0$ $\Rightarrow$ $x_1 = 1$, $x_2 = 2 + \frac{1}{2}v_2 \ge 2$. Hence from (2), $v_2 = 0$, and so $x_1 = 1$, $x_2 = 2$. But that contradicts (3), and so it must hold that $x_1 + 2x_2 - 2 = 0$.
If $x_2 > 0$, then $v_2 = 0$ $\Rightarrow$ solving (1) together with $x_1 + 2x_2 - 2 = 0$ gives $x_1 = \frac{2}{5}$, $x_2 = \frac{4}{5}$; and $v_1 = \frac{6}{5}$, $v_2 = 0$.
If $x_2 = 0$ then $x_1 = 2 - 2x_2 = 2$. But then from (1), $v_1 = -2 < 0$. Impossible.
7. Lattice (Tree) Methods
1. Concepts in Option Pricing Theory.
A call (or put) option is a contract that gives the holder the right to buy (or sell) a prescribed asset (the underlying asset) by a certain date $T$ (the expiration date) for a predetermined price $X$ (the exercise price).
A European option can only be exercised on the expiration date, while an American option can be exercised at any time prior to the expiration date.
The other party to the holder of the option contract is called the writer.
The holder and the writer are said to be in long and short positions of the contract, respectively.
The terminal payoff of a European call (or put) option is $(S(T) - X)^+ := \max(S(T) - X, 0)$ (or $(X - S(T))^+$), where $S(T)$ is the underlying asset price at time $T$.
2. Random Walk Model and Assumption.
Assume $S(t)$ and $V(t)$ are the asset price and the option price at time $t$, respectively, and the current time is $t$ and the current asset price is $S$, i.e. $S(t) = S$.
After one period of time $\Delta t$, the asset price $S(t + \Delta t)$ is either $uS$ (up state) with probability $q$ or $dS$ (down state) with probability $1 - q$, where $q$ is a real (subjective) probability.
To avoid riskless arbitrage opportunities, we must have
$$u > R > d,$$
where $R := e^{r\Delta t}$ and $r$ is the riskless interest rate.
Here $r$ is the riskless interest rate that allows for unlimited borrowing or lending at the same rate $r$.
Investing \$1 at time $t$ yields a value (return) at time $t + \Delta t$ of $\$R = \$\,e^{r\Delta t}$ (continuous compounding).
Continuous compounding
Saving an amount B at time t, yields with interest rate r at time t + t without
compounding: (1 + r t) B.
Compounding the interest n times yields
_
1 +
r t
n
_
n
B e
r t
B as n .
Example:
Consider t = 1 and r = 0.05.
Then R = e
r
= 1.0512711, equivalent to an AER of 5.127%.
n (1 +
r
n
)
n
AER
1 1.05 5 %
4 1.05094534 5.095 %
12 1.05116190 5.116 %
52 1.05124584 5.125 %
365 1.05126750 5.127 %
125
No Arbitrage. u > R > d
If R u, then short sell > 0 units of the asset, deposit S in the bank; giving a
portfolio value at time t of:
(t) = 0
and at time t + t of
(t + t) = R(S) S(t + t)
R(S) (uS) 0 .
Moreover, in the down state (with probability 1 q > 0) the value is
(t + t) = R(S) (d S) > 0 .
Hence we can make a riskless prot with positive probability.
No arbitrage implies u > R.
A similar argument yields that d < R.
126
3. Replicating Portfolio.
We form a portfolio consisting of
  Δ units of the underlying asset (Δ > 0 buying, Δ < 0 short selling) and
  a cash amount B in the riskless cash bond (B > 0 lending, B < 0 borrowing).
If we choose
    Δ = [ V(u S, t + Δt) − V(d S, t + Δt) ] / (u S − d S) ,
    B = [ u V(d S, t + Δt) − d V(u S, t + Δt) ] / (u R − d R) ,
we have replicated the payoffs of the option at time t + Δt, no matter how the asset
price changes.
127
Replicating Portfolio.
Value of the portfolio at time t:  Π(t) = Δ S + B, and at time t + Δt:
    Π(t + Δt) = Δ S(t + Δt) + R B = { Δ u S + R B =: Π_u   w. prob. q
                                    { Δ d S + R B =: Π_d   w. prob. 1 − q
Call option value at time t + Δt (expiration time):
    C(t + Δt) = { (u S − X)^+ =: C_u   w. prob. q
                { (d S − X)^+ =: C_d   w. prob. 1 − q
To replicate the call option, we must have Π_u = C_u and Π_d = C_d. Hence
    { Δ u S + R B = C_u       ⇒   Δ = (C_u − C_d) / (u S − d S) ,
    { Δ d S + R B = C_d           B = (u C_d − d C_u) / (u R − d R) .
This gives the fair price for the call option at time t:  Π(t) = Δ S + B.
128
Replicating Portfolio.
The value of the call option at time t is:
    C(t) = Π(t) = Δ S + B
         = [ (C_u − C_d) / (u S − d S) ] S + (u C_d − d C_u) / (u R − d R)
         = (1/R) [ (R − d)/(u − d) C_u + (u − R)/(u − d) C_d ]
         = (1/R) [ p C_u + (1 − p) C_d ] ,
where
    p = (R − d) / (u − d) .
No arbitrage argument: If C(t) < Π(t), then buy the call option at price C(t) and sell the
portfolio at Π(t) (that is, short sell Δ units of the asset and lend B amounts of cash to the
bank). This gives an instantaneous profit of Π(t) − C(t) > 0. And we know that at
time t + Δt, the payoff from our call option will exactly compensate for the value of the
portfolio, as C(t + Δt) = Π(t + Δt).
A similar argument for the case C(t) > Π(t) yields that C(t) = Π(t).
129
4. Risk Neutral Option Price.
The current option price is given by
    V(S, t) = (1/R) [ p V(u S, t + Δt) + (1 − p) V(d S, t + Δt) ] ,       (7)
where p = (R − d)/(u − d). Note that
1. the real probability q does not appear in the option pricing formula,
2. p is a probability (0 < p < 1) since d < R < u, and
3. p (u S) + (1 − p) (d S) = R S, so p is called the risk neutral probability and the
   process of finding V is called the risk neutral valuation.
130
5. Riskless Hedging Principle.
We can also derive the option pricing formula (7) as follows:
form a portfolio with a long position of one unit of the option and a short position of Δ
units of the underlying asset.
By choosing an appropriate Δ we can ensure such a portfolio is riskless.
Exercise.
Find Δ (called delta) and derive the pricing formula (7).
131
6. Asset Price Process.
In a risk neutral continuous time model, the asset price S is assumed to follow a
lognormal process
(as discussed in detail in Stochastic Processes I, omitted here).
Over a period [t, t + τ] the asset price S(t + τ) can be expressed in terms of S(t) = S
as
    S(t + τ) = S e^{ (r − σ²/2) τ + σ √τ Z } ,
where r is the riskless interest rate, σ the volatility, and Z ∼ N(0, 1) a standard
normal random variable.
S(t + τ) has the first moment S e^{r τ} and the second moment S² e^{(2 r + σ²) τ}.
132
7. Relations Between u, d and p.
By equating the first and second moments of the asset price S(t + Δt) in both the
continuous and discrete time models, we obtain the relations
    S [ p u + (1 − p) d ] = S e^{r Δt} ,
    S² [ p u² + (1 − p) d² ] = S² e^{(2 r + σ²) Δt} ,
or equivalently,
    p u + (1 − p) d = e^{r Δt} ,                                 (8)
    p u² + (1 − p) d² = e^{(2 r + σ²) Δt} .                      (9)
There are two equations and three unknowns u, d, p. An extra condition is needed to
uniquely determine a solution.
[Note that (8) implies p = (R − d)/(u − d), where R = e^{r Δt} as before.]
133
8. Cox–Ross–Rubinstein Model.
The extra condition is
    u d = 1 .                                                    (10)
The solutions to (8), (9), (10) are
    u = (1/(2R)) [ W + 1 + sqrt( (W + 1)² − 4 R² ) ] ,
    d = (1/(2R)) [ W + 1 − sqrt( (W + 1)² − 4 R² ) ] ,
where
    W = e^{(2 r + σ²) Δt} .
If the higher order term O((Δt)^{3/2}) is ignored in u, d, then
    u = e^{σ √Δt} ,   d = e^{−σ √Δt} ,   p = (R − d)/(u − d) .
These are the parameters chosen by Cox–Ross–Rubinstein for their model.
Note that with this choice (9) is satisfied up to O((Δt)²).
134
Proof.
    W = e^{(2 r + σ²) Δt} = p (u² − d²) + d²
      = [ (R − d)/(u − d) ] (u + d) (u − d) + d²
      = R u − d u + R d
      = R u − 1 + R/u                 [ using d = 1/u ]
⇒   R u² − (1 + W) u + R = 0 .
As u > d, we get the unique solutions
    u_true = [ 1 + W + sqrt( (1 + W)² − 4 R² ) ] / (2 R)
           = (1 + W)/(2 R) + sqrt( ( (1 + W)/(2 R) )² − 1 )
and
    d_true = 1/u = [ 1 + W − sqrt( (1 + W)² − 4 R² ) ] / (2 R)
           = (1 + W)/(2 R) − sqrt( ( (1 + W)/(2 R) )² − 1 ) .
135
Moreover
    (1 + W)/(2 R) = (1/2) [ e^{(2 r + σ²) Δt} + 1 ] e^{−r Δt}
                  = (1/2) [ 2 + (2 r + σ²) Δt + O((Δt)²) ] [ 1 − r Δt + O((Δt)²) ]
                  = (1/2) [ 2 + σ² Δt + O((Δt)²) ]
                  = 1 + (1/2) σ² Δt + O((Δt)²) ,
and
    sqrt( ( (1 + W)/(2 R) )² − 1 ) = sqrt( ( 1 + (1/2) σ² Δt + O((Δt)²) )² − 1 )
                                   = sqrt( σ² Δt + O((Δt)²) )
                                   = σ √Δt sqrt( 1 + O(Δt) )
                                   = σ √Δt ( 1 + O(Δt) )
                                   = σ √Δt + O((Δt)^{3/2}) .
⇒   u_true = 1 + (1/2) σ² Δt + σ √Δt + O((Δt)^{3/2}) .
136
The Cox–Ross–Rubinstein (CRR) model uses
    u = e^{σ √Δt} = 1 + σ √Δt + (1/2) σ² Δt + O((Δt)^{3/2}) .
Hence the first three terms in the Taylor series of the CRR value u and the true value
u_true match. So
    u = u_true + O((Δt)^{3/2}) .
Estimating the error in (9) made by this choice of u gives
    Error = p u² + (1 − p) d² − e^{(2 r + σ²) Δt}
          = R (u + d) − 1 − e^{(2 r + σ²) Δt}
          = e^{r Δt} [ e^{σ √Δt} + e^{−σ √Δt} ] − 1 − e^{(2 r + σ²) Δt}
          = [ 1 + r Δt + O((Δt)²) ] [ 2 + σ² Δt + O((Δt)²) ] − 1
            − [ 1 + (2 r + σ²) Δt + O((Δt)²) ]
          = O((Δt)²) .
137
9. Jarrow–Rudd Model.
The extra condition is
    p = 1/2 .                                                    (11)
Exercise.
Show that the solutions to (8), (9), (11) are
    u = R [ 1 + sqrt( e^{σ² Δt} − 1 ) ] ,
    d = R [ 1 − sqrt( e^{σ² Δt} − 1 ) ] .
Show that if O((Δt)^{3/2}) is ignored then
    u = e^{ (r − σ²/2) Δt + σ √Δt } ,
    d = e^{ (r − σ²/2) Δt − σ √Δt } .
(These are the parameters chosen by Jarrow–Rudd for their model.)
Also show that (8) and (9) are satisfied up to O((Δt)²).
138
10. Tian Model.
The extra condition is
    p u³ + (1 − p) d³ = e^{3 r Δt + 3 σ² Δt} .                   (12)
Exercise.
Show that the solutions to (8), (9), (12) are
    u = (R Q / 2) [ Q + 1 + sqrt( Q² + 2 Q − 3 ) ] ,
    d = (R Q / 2) [ Q + 1 − sqrt( Q² + 2 Q − 3 ) ] ,
    p = (R − d)/(u − d) ,
where R = e^{r Δt} and Q = e^{σ² Δt}.
Also show that if O((Δt)^{3/2}) is ignored, then
    p = 1/2 − (3/4) σ √Δt .
139
(Note that u d = R² Q² instead of u d = 1 and that the binomial tree loses its
symmetry about S whenever u d ≠ 1.)
140
11. Black–Scholes Equation (BSE).
As the time interval Δt tends to zero, the one period option pricing formula, (7) with
(8) and (9), tends to the Black–Scholes Equation
    ∂V/∂t + r S ∂V/∂S + (1/2) σ² S² ∂²V/∂S² − r V = 0 .          (13)
Proof.
A two variable Taylor expansion at (S, t) gives
    V(u S, t + Δt) = V + V_S (u S − S) + V_t Δt
                     + (1/2) [ V_SS (u S − S)² + 2 V_St (u S − S) Δt + V_tt (Δt)² ]
                     + higher order terms .
Similarly,
    V(d S, t + Δt) = V + V_S (d S − S) + V_t Δt
                     + (1/2) [ V_SS (d S − S)² + 2 V_St (d S − S) Δt + V_tt (Δt)² ]
                     + higher order terms .
141
Substituting into the right hand side of (7) gives
    V = { V + S V_S [ p (u − 1) + (1 − p) (d − 1) ] + V_t Δt
          + (1/2) [ S² V_SS ( p (u − 1)² + (1 − p) (d − 1)² )
                    + 2 S V_St ( p (u − 1) + (1 − p) (d − 1) ) Δt + V_tt (Δt)² ]
          + higher order terms } e^{−r Δt} .
Now it follows from (8) that
    p (u − 1) + (1 − p) (d − 1) = p u + (1 − p) d − 1 = e^{r Δt} − 1 = r Δt + O((Δt)²) .
Similarly, (8) and (9) give that
    p (u − 1)² + (1 − p) (d − 1)² = p u² + (1 − p) d² − 2 ( p u + (1 − p) d ) + 1
                                  = e^{(2 r + σ²) Δt} − 2 e^{r Δt} + 1 = σ² Δt + O((Δt)²) .
142
So, for the RHS of (7) we get
    V = { V + S V_S [ r Δt + O((Δt)²) ] + V_t Δt
          + (1/2) [ S² V_SS ( σ² Δt + O((Δt)²) ) + 2 S V_St ( r Δt + O((Δt)²) ) Δt ]
          + O((Δt)²) } e^{−r Δt}
      = { V + [ r S V_S + V_t + (1/2) σ² S² V_SS ] Δt + O((Δt)²) } { 1 − r Δt + O((Δt)²) }
      = V + [ −r V + r S V_S + V_t + (1/2) σ² S² V_SS ] Δt + O((Δt)²) .
Cancelling V and dividing by Δt yields
    −r V + r S V_S + V_t + (1/2) σ² S² V_SS + O(Δt) = 0 .
Ignoring the O(Δt) term, we get the BSE.
The binomial model approximates the BSE to 1st order accuracy.
143
12. n-Period Option Price Formula.
For a multiplicative n-period binomial process, the call value is
    C = R^{−n} Σ_{j=0}^{n} C(n,j) p^j (1 − p)^{n−j} max( u^j d^{n−j} S − X, 0 ) ,   (14)
where C(n,j) = n! / ( j! (n − j)! ) is the binomial coefficient.
Define k to be the smallest non-negative integer such that u^k d^{n−k} S ≥ X.
The call pricing formula can then be simplified as
    C = S Φ(n, k, p′) − X R^{−n} Φ(n, k, p) ,                                       (15)
where p′ = u p / R and Φ is the complementary binomial distribution function defined
by
    Φ(n, k, p) = Σ_{j=k}^{n} C(n,j) p^j (1 − p)^{n−j} .
144
[Figure: two-period recombined trees over the times t, t + Δt, t + 2 Δt.
 Asset (recombined tree):  S → { u S, d S } → { u² S, u d S, d² S } .
 Call:  C → { C_u, C_d } → { C_uu = (u² S − X)^+ , C_ud = (u d S − X)^+ , C_dd = (d² S − X)^+ } .]
145
Call prices at time t + Δt:
    C_u = (1/R) [ p C_uu + (1 − p) C_ud ] ,   C_d = (1/R) [ p C_ud + (1 − p) C_dd ] .
Call price at time t:
    C = (1/R) [ p C_u + (1 − p) C_d ]
      = (1/R²) [ p² C_uu + 2 p (1 − p) C_ud + (1 − p)² C_dd ] .
Proof of (14) by induction:
Assume (14) holds for n ≤ k − 1; then for n = k, there are k − 1 periods between time
t + Δt and t + k Δt. So
    C_u = R^{−(k−1)} Σ_{j=0}^{k−1} C(k−1, j) p^j (1 − p)^{k−1−j} [ u^j d^{k−1−j} (u S) − X ]^+
        = R^{−(k−1)} Σ_{j=1}^{k}   C(k−1, j−1) p^{j−1} (1 − p)^{k−j} [ u^j d^{k−j} S − X ]^+ .
146
Hence
    p C_u = R^{−(k−1)} Σ_{j=1}^{k} C(k−1, j−1) p^j (1 − p)^{k−j} [ u^j d^{k−j} S − X ]^+ ,
and similarly
    (1 − p) C_d = R^{−(k−1)} Σ_{j=0}^{k−1} C(k−1, j) p^j (1 − p)^{k−j} [ u^j d^{k−j} S − X ]^+ .
Combining gives
    C = (1/R) [ p C_u + (1 − p) C_d ]
      = R^{−k} [ p^k (u^k S − X)^+ + (1 − p)^k (d^k S − X)^+
                 + Σ_{j=1}^{k−1} ( C(k−1, j−1) + C(k−1, j) ) p^j (1 − p)^{k−j} (u^j d^{k−j} S − X)^+ ] ,
and C(k−1, j−1) + C(k−1, j) = C(k, j). This proves (14).
147
Let k be the smallest non-negative integer such that
    u^k d^{n−k} S ≥ X   ⟺   (u/d)^k ≥ X / (S d^n)   ⟺   k ≥ ln( X / (S d^n) ) / ln(u/d) .
Then
    (u^j d^{n−j} S − X)^+ = { 0                    j < k
                            { u^j d^{n−j} S − X    j ≥ k .
Hence
    C = R^{−n} Σ_{j=k}^{n} C(n,j) p^j (1 − p)^{n−j} ( u^j d^{n−j} S − X )
      = S Σ_{j=k}^{n} C(n,j) ( p u / R )^j ( (1 − p) d / R )^{n−j}
        − X R^{−n} Σ_{j=k}^{n} C(n,j) p^j (1 − p)^{n−j} .
On setting p′ = p u / R, it holds that
    1 − p′ = 1 − [ (R − d)/(u − d) ] (u/R) = ( u d − R d ) / ( R (u − d) )
           = [ (u − R)/(u − d) ] (d/R) = (1 − p) d / R .
This proves (15).
148
13. Black–Scholes Call Price Formula.
As the time interval Δt tends to zero, the n-period call price formula (15) tends to
the Black–Scholes call option pricing formula
    c = S N(d_1) − X e^{−r (T−t)} N(d_2) ,                       (16)
where
    d_1 = [ ln(S/X) + (r + σ²/2) (T − t) ] / ( σ sqrt(T − t) )
and
    d_2 = [ ln(S/X) + (r − σ²/2) (T − t) ] / ( σ sqrt(T − t) ) = d_1 − σ sqrt(T − t) .
Proof.
The n-period call price (15) is given by
    C = S Φ(n, k, p′) − X R^{−n} Φ(n, k, p) = S Φ(n, k, p′) − X e^{−r (T−t)} Φ(n, k, p) ,
since R^{−n} = e^{−r n Δt} = e^{−r (T−t)}.
149
Hence we need to prove
    Φ(n, k, p′) → N(d_1)   and   Φ(n, k, p) → N(d_2)   as Δt → 0 .
150
Let j be the binomial random variable counting the number of upward moves in n periods. Then
    S_n = S u^j d^{n−j} = S (u/d)^j d^n ,
where S_n is the asset price at time T = t + n Δt. Hence
    ln(S_n / S) = j ln(u/d) + n ln d ,
    E(j) = n p ,   Var(j) = n p (1 − p) ,
    E[ ln(S_n/S) ] = n p ln(u/d) + n ln d ,   Var[ ln(S_n/S) ] = n p (1 − p) ( ln(u/d) )² .
Moreover,
    1 − Φ(n, k, p) = P( j ≤ k − 1 ) = P( ln(S_n/S) ≤ (k − 1) ln(u/d) + n ln d ) .
As k is the smallest integer such that k ≥ ln( X/(S d^n) ) / ln(u/d), there exists an α ∈ (0, 1] such
that
    k − 1 = ln( X/(S d^n) ) / ln(u/d) − α ,
151
and so
    1 − Φ(n, k, p) = P( ln(S_n/S) ≤ ln(X/S) − α ln(u/d) ) .
152
Define
    X_i^n := { ln u   with prob. p
             { ln d   with prob. 1 − p       i = 1, . . . , n ,
and Z_n := X_1^n + . . . + X_n^n.
Then, for each n, {X_1^n, . . . , X_n^n} are independent and Z_n ∼ ln(S_n/S). Hence
    1 − Φ(n, k, p) = P( Z_n ≤ ζ_n ) ,   ζ_n := ln(X/S) − α ln(u/d) .
Assuming e.g. the CRR model, we get
    E(Z_n) = n p ln u + n (1 − p) ln d → μ (T − t) ,   where μ := r − σ²/2 ,
    Var(Z_n) = n p (1 − p) ln²(u/d) → σ² (T − t) ,
    ζ_n = ln(X/S) − α ln(u/d) → ln(X/S) ,
and |X_i^n| ≤ |σ √Δt| = σ sqrt(T − t) / √n =: y_n → 0 as n → ∞.
153
Hence Theorem 5.10 implies that
    Z_n →_D Z ∼ N( μ (T − t), σ² (T − t) ) .
154
So, as n → ∞,
    1 − Φ(n, k, p) = P( Z_n ≤ ζ_n )
                   → [ 1 / sqrt( 2 π σ² (T − t) ) ] ∫_{−∞}^{ln(X/S)} e^{ −(s − μ(T−t))² / (2 σ² (T−t)) } ds .
On letting
    y := ( s − μ (T − t) ) / ( σ sqrt(T − t) ) ,
we have
    1 − Φ(n, k, p) → (1/√(2π)) ∫_{−∞}^{ [ln(X/S) − μ(T−t)] / (σ sqrt(T−t)) } e^{−y²/2} dy
                   = N( [ ln(X/S) − μ (T − t) ] / ( σ sqrt(T − t) ) ) .
Hence
    Φ(n, k, p) → 1 − N( [ ln(X/S) − μ (T − t) ] / ( σ sqrt(T − t) ) )
               = N( −[ ln(X/S) − μ (T − t) ] / ( σ sqrt(T − t) ) )
               = N( [ ln(S/X) + (r − σ²/2) (T − t) ] / ( σ sqrt(T − t) ) ) = N(d_2) .
Similarly one can show Φ(n, k, p′) → N(d_1).
155
14. Implementation.
Choose the number of time steps required to march down to the expiration time T
to be n, so that Δt = (T − t)/n.
At time t_i = t + i Δt there are i + 1 nodes.
Denote these nodes as (i, j) with i = 0, 1, . . . , n and j = 0, 1, . . . , i, where the first
component i refers to time t_i and the second component j refers to the asset price
S u^j d^{i−j}.
Compute the option prices V_j^n = V( S u^j d^{n−j}, T ), j = 0, 1, . . . , n, at expiration time
t_n = T over the n + 1 nodes.
Then go backwards one step and compute the option prices at time t_{n−1} = T − Δt using
the one period binomial option pricing formula over n nodes.
Continue until reaching the current time t_0 = t.
156
The recursive formula is
    V_j^i = (1/R) [ p V_{j+1}^{i+1} + (1 − p) V_j^{i+1} ] ,   j = 0, . . . , i and i = n − 1, . . . , 0 ,
where
    p = (R − d)/(u − d) .
The value V_0^0 = V(S, t) is the option price at the current time t.
C++ Exercise: Write a program to implement the Cox–Ross–Rubinstein model to
price options with a non-dividend paying underlying asset. The inputs are the asset
price S, the strike price X, the riskless interest rate r, the current time t and the
maturity time T, the volatility σ, and the number of steps n. The outputs are the
European call and put option prices. (A possible sketch is given below.)
157
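One possible minimal C++ sketch for this exercise (not the official solution; the function name crrPrice and its interface are my own choice). It builds the CRR parameters, fills the payoffs at expiry, and applies the backward recursion above.

    // Minimal sketch of a CRR binomial pricer (illustrative only).
    #include <algorithm>
    #include <cmath>
    #include <iostream>
    #include <vector>

    // Returns the European option price; isCall selects the call or put payoff.
    double crrPrice(double S, double X, double r, double sigma,
                    double t, double T, int n, bool isCall)
    {
        double dt = (T - t) / n;
        double u  = std::exp(sigma * std::sqrt(dt));
        double d  = 1.0 / u;                       // CRR condition u*d = 1
        double R  = std::exp(r * dt);
        double p  = (R - d) / (u - d);             // risk neutral probability

        // option values at expiry; V[j] corresponds to S*u^j*d^(n-j)
        std::vector<double> V(n + 1);
        for (int j = 0; j <= n; ++j) {
            double ST = S * std::pow(u, j) * std::pow(d, n - j);
            V[j] = isCall ? std::max(ST - X, 0.0) : std::max(X - ST, 0.0);
        }
        // backward induction: V_j^i = (p V_{j+1}^{i+1} + (1-p) V_j^{i+1}) / R
        for (int i = n - 1; i >= 0; --i)
            for (int j = 0; j <= i; ++j)
                V[j] = (p * V[j + 1] + (1.0 - p) * V[j]) / R;

        return V[0];
    }

    int main()
    {
        std::cout << "call: " << crrPrice(100, 100, 0.05, 0.2, 0, 1, 200, true)  << "\n"
                  << "put:  " << crrPrice(100, 100, 0.05, 0.2, 0, 1, 200, false) << "\n";
    }

The in-place update of V is possible because node (i, j) only needs the level-(i+1) values V[j] and V[j+1].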
15. Greeks.
Assume the current time is t = 0. Some common sensitivity measures of a European
call option price c (the rates of change of the call price with respect to the underlying
factors) are delta, gamma, vega, rho, theta, defined by
    Δ = ∂c/∂S ,   Γ = ∂²c/∂S² ,   v = ∂c/∂σ ,   ρ = ∂c/∂r ,   θ = ∂c/∂T .
Exercise.
Derive the following relations from the call price (16):
    Δ = N(d_1) ,
    Γ = N′(d_1) / ( S σ √T ) ,
    v = S √T N′(d_1) ,
    ρ = X T e^{−r T} N(d_2) ,
    θ = S σ N′(d_1) / ( 2 √T ) + r X e^{−r T} N(d_2) .
158
16. Dynamic Hedging.
Greeks can be used to hedge an option position against changes of underlying factors.
For example, to hedge a short position of a European call against changes of the asset
price, one should keep a long position of Δ units of the underlying asset.
This strategy requires continuous trading and is prohibitively expensive if there are
any transaction costs.
159
17. Dividends.
(a) If the asset pays a continuous dividend yield at a rate q, then use r − q instead of
    r in building the asset price tree, but still use r in discounting the option values
    in the backward calculation.
    In the risk-neutral model, the asset has a rate of return of r − q.
    So
        (8) becomes  p u + (1 − p) d = e^{(r − q) Δt} ,
        (9) becomes  p u² + (1 − p) d² = e^{(2 r − 2 q + σ²) Δt} .
    However, the discount factor 1/R remains the same:  1/R = e^{−r Δt} .
160
(b) If the asset pays a discrete proportional dividend δ S between t_{i−1} and t_i, then the
    asset price at node (i, j) drops to (1 − δ) u^j d^{i−j} S instead of the usual u^j d^{i−j} S.
    The tree is still recombined.
161
[Figure: recombined tree after a proportional dividend between periods 1 and 2, with nodes
 S → { u S, d S } → { (1 − δ) u² S, (1 − δ) u d S, (1 − δ) d² S } → . . . → { (1 − δ) u⁴ S, . . . , (1 − δ) d⁴ S }.]
(c) If the asset pays a discrete cash dividend D between t_{i−1} and t_i, then the asset
    price at node (i, j) drops to u^j d^{i−j} S − D instead of the usual u^j d^{i−j} S. There are
    i + 1 new recombined trees emanating from the nodes at time t_i, but the original
    tree is no longer recombined.
162
Suppose a cash dividend D is paid between t_{i−1} and t_i; then at time t_{i+m} there
are (i + 1) (m + 1) nodes, instead of i + m + 1 in a recombined tree.
If there are several cash dividends, then the number of nodes is much larger than
in a normal tree.
Example. Assume there is a cash dividend D between periods 1 and 2. Then
the possible asset prices at time n = 2 are {u² S − D, u d S − D, d² S − D}. At
n = 3 there are 6 nodes:  u {u² S − D, u d S − D, d² S − D}  and  d {u² S − D,
u d S − D, d² S − D}, instead of 4.
[Reason: d (u² S − D) ≠ u (u d S − D).]
163
[Figure: tree after a cash dividend between periods 1 and 2, no longer recombined; the sub-trees
 starting from u² S − D, u d S − D and d² S − D evolve separately, between u² (u² S − D) and d² (d² S − D).]
164
18. Trinomial Model.
Suppose the asset price is S at time t. At time t + Δt the asset price can be
u S (up-state) with probability p_u, m S (middle-state) with probability p_m, and d S
(down-state) with probability p_d. Then the risk neutral one period option pricing
formula is
    V(S, t) = (1/R) [ p_u V(u S, t + Δt) + p_m V(m S, t + Δt) + p_d V(d S, t + Δt) ] .
The trinomial tree approach is equivalent to the explicit finite difference method. By
equating the first and second moments of the asset price S(t + Δt) in both the continuous
and discrete models, we obtain the following equations:
    p_u + p_m + p_d = 1 ,
    p_u u + p_m m + p_d d = e^{r Δt} ,
    p_u u² + p_m m² + p_d d² = e^{(2 r + σ²) Δt} .
There are three equations and six unknowns u, m, d and p_u, p_m, p_d. We need three
more conditions to uniquely determine these variables.
165
19. Boyle Model.
The three additional conditions are
    m = 1 ,   u d = 1 ,   u = e^{λ σ √Δt} ,
where λ is a free parameter.
Exercise.
Show that the risk-neutral probabilities are given by
    p_u = [ (W − R) u − (R − 1) ] / [ (u − 1) (u² − 1) ] ,
    p_d = [ (W − R) u² − (R − 1) u³ ] / [ (u − 1) (u² − 1) ] ,
    p_m = 1 − p_u − p_d ,
where R = e^{r Δt} and W = e^{(2 r + σ²) Δt}.
Note that if λ = 1, which corresponds to the choice of u as in the CRR model, certain
sets of parameters can lead to p_m < 0. To rectify this, choose λ > 1.
166
Boyle claimed that if p_u ≈ p_m ≈ p_d ≈ 1/3, then the trinomial scheme with 5 steps is
comparable to the CRR binomial scheme with 20 steps.
167
20. Hull–White Model.
This is the same as the Boyle model with λ = √3.
Exercise.
Show that the risk-neutral probabilities are
    p_d = 1/6 − sqrt( Δt / (12 σ²) ) ( r − σ²/2 ) ,
    p_m = 2/3 ,
    p_u = 1/6 + sqrt( Δt / (12 σ²) ) ( r − σ²/2 ) ,
if terms of order O(Δt) are ignored.
168
21. Kamrad–Ritchken Model.
If S follows a lognormal process, then we can write ln S(t + Δt) = ln S(t) + Z, where
Z is a normal random variable with mean (r − σ²/2) Δt and variance σ² Δt.
KR suggested to approximate Z by a discrete random variable Z^a as follows: Z^a =
Δx with probability p_u, 0 with probability p_m, and −Δx with probability p_d, where
Δx = λ σ √Δt and λ ≥ 1.
The corresponding u, m, d in the trinomial tree are u = e^{Δx}, m = 1, d = e^{−Δx}.
By omitting the higher order term O((Δt)²), they showed that the risk-neutral prob-
abilities are
    p_u = 1/(2 λ²) + [ 1/(2 λ σ) ] ( r − σ²/2 ) √Δt ,
    p_m = 1 − 1/λ² ,
    p_d = 1/(2 λ²) − [ 1/(2 λ σ) ] ( r − σ²/2 ) √Δt .
Note that if λ = 1 then p_m = 0 and the trinomial scheme is reduced to the binomial
169
scheme.
KR claimed that if p_u = p_m = p_d = 1/3, then the trinomial scheme with 15 steps is
comparable to the binomial scheme (p_m = 0) with 55 steps. They also discussed the
trinomial scheme with two correlated state variables.
170
With
    Z^a = {  Δx   with prob. p_u
          {  0    with prob. p_m
          { −Δx   with prob. p_d
the equations become
    p_u + p_m + p_d = 1 ,                                                       (1)
    E(Z^a) = Δx ( p_u − p_d ) = ( r − σ²/2 ) Δt ,                               (2)
    Var(Z^a) = (Δx)² ( p_u + p_d ) − (Δx)² ( p_u − p_d )² = σ² Δt ,
where the term (Δx)² ( p_u − p_d )² is of order O((Δt)²).
Dropping the O((Δt)²) term leads to
    (Δx)² ( p_u + p_d ) = σ² Δt .                                               (3)
171
Hence
    p_u = (1/2) [ σ² Δt / (Δx)² + ( r − σ²/2 ) Δt / Δx ] ,
    p_d = (1/2) [ σ² Δt / (Δx)² − ( r − σ²/2 ) Δt / Δx ] ,
i.e., with Δx = λ σ √Δt,
    p_u = (1/2) [ 1/λ² + ( r − σ²/2 ) √Δt / (λ σ) ] ,
    p_d = (1/2) [ 1/λ² − ( r − σ²/2 ) √Δt / (λ σ) ] .
172
8. Finite Difference Methods
1. Diffusion Equations of One State Variable.
    ∂u/∂t = c² ∂²u/∂x² ,   (x, t) ∈ D,                           (17)
where t is a time variable, x is a state variable, and u(x, t) is an unknown function
satisfying the equation.
To find a well-defined solution, we need to impose the initial condition
    u(x, 0) = u_0(x)                                             (18)
and, if D = [a, b] × [0, ∞), the boundary conditions
    u(a, t) = g_a(t)   and   u(b, t) = g_b(t) ,                  (19)
where u_0, g_a, g_b are continuous functions.
173
If D = (−∞, ∞) × (0, ∞), we need to impose the boundary condition
    lim_{|x| → ∞} u(x, t) e^{−a x²} = 0   for any a > 0.         (20)
(20) implies that u(x, t) does not grow too fast as |x| → ∞.
The diffusion equation (17) with the initial condition (18) and the boundary conditions
(19) is well-posed, i.e. there exists a unique solution that depends continuously on u_0,
g_a and g_b.
174
2. Grid Points.
To find a numerical solution to equation (17) with finite difference methods, we first
need to define a set of grid points in the domain D as follows:
Choose a state step size Δx = (b − a)/N (N is an integer) and a time step size Δt, draw
a set of horizontal and vertical lines across D, and get all intersection points (x_j, t_n),
or simply (j, n), where x_j = a + j Δx, j = 0, . . . , N, and t_n = n Δt, n = 0, 1, . . . .
If D = [a, b] × [0, T] then choose Δt = T/M (M is an integer) and t_n = n Δt, n =
0, . . . , M.
175
[Figure: the grid on [a, b] × [0, T], with x_0 = a, x_1, x_2, . . . , x_N = b on the horizontal axis
 and t_0 = 0, t_1, t_2, t_3, . . . , t_M = T on the vertical axis.]
176
3. Finite Differences.
The partial derivatives u_x := ∂u/∂x and u_xx := ∂²u/∂x² are always approximated by
central difference quotients, i.e.
    u_x ≈ ( u_{j+1}^n − u_{j−1}^n ) / (2 Δx)   and   u_xx ≈ ( u_{j+1}^n − 2 u_j^n + u_{j−1}^n ) / (Δx)²     (21)
at a grid point (j, n). Here u_j^n = u(x_j, t_n).
Depending on how u_t is approximated, we have three basic schemes: the explicit, implicit,
and Crank–Nicolson schemes.
177
4. Explicit Scheme.
If u_t is approximated by a forward difference quotient
    u_t ≈ ( u_j^{n+1} − u_j^n ) / Δt   at (j, n),
then the corresponding difference equation to (17) at grid point (j, n) is
    w_j^{n+1} = α w_{j+1}^n + (1 − 2 α) w_j^n + α w_{j−1}^n ,                                               (22)
where
    α = c² Δt / (Δx)² .
The initial condition is w_j^0 = u_0(x_j), j = 0, . . . , N, and
the boundary conditions are w_0^n = g_a(t_n) and w_N^n = g_b(t_n), n = 0, 1, . . . .
The difference equations (22), j = 1, . . . , N − 1, can be solved explicitly. (A short code
sketch of this time stepping is given below.)
178
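The following is a minimal, illustrative C++ sketch of the explicit scheme (22) for the model problem (17) on [a, b] × [0, T]; the function name explicitScheme and its interface are my own choice, not part of the course material.

    // Illustrative sketch: explicit scheme (22) for u_t = c^2 u_xx on [a,b] x [0,T].
    #include <functional>
    #include <vector>

    std::vector<double> explicitScheme(double a, double b, double T, double c,
                                       int N, int M,
                                       std::function<double(double)> u0,
                                       std::function<double(double)> ga,
                                       std::function<double(double)> gb)
    {
        double dx = (b - a) / N, dt = T / M;
        double alpha = c * c * dt / (dx * dx);     // needs alpha <= 1/2 for stability

        std::vector<double> w(N + 1), wNew(N + 1);
        for (int j = 0; j <= N; ++j) w[j] = u0(a + j * dx);   // initial condition (18)

        for (int n = 0; n < M; ++n) {
            double t = (n + 1) * dt;
            wNew[0] = ga(t);                       // boundary conditions (19)
            wNew[N] = gb(t);
            for (int j = 1; j < N; ++j)            // interior update, formula (22)
                wNew[j] = alpha * w[j + 1] + (1.0 - 2.0 * alpha) * w[j] + alpha * w[j - 1];
            w.swap(wNew);
        }
        return w;                                  // approximation of u(., T)
    }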
5. Implicit Scheme.
If u_t is approximated by a backward difference quotient
    u_t ≈ ( u_j^{n+1} − u_j^n ) / Δt   at (j, n + 1),
then the corresponding difference equation to (17) at grid point (j, n + 1) is
    −α w_{j+1}^{n+1} + (1 + 2 α) w_j^{n+1} − α w_{j−1}^{n+1} = w_j^n .                          (23)
The difference equations (23), j = 1, . . . , N − 1, together with the initial and boundary
conditions as before, can be solved using the Crout algorithm or the SOR algorithm.
179
Explicit Method.
    ( w_j^{n+1} − w_j^n ) / Δt = c² ( w_{j−1}^n − 2 w_j^n + w_{j+1}^n ) / (Δx)²                 (∗)
Letting α := c² Δt / (Δx)² gives (22).
Implicit Method.
    ( w_j^{n+1} − w_j^n ) / Δt = c² ( w_{j−1}^{n+1} − 2 w_j^{n+1} + w_{j+1}^{n+1} ) / (Δx)²     (∗∗)
Letting α := c² Δt / (Δx)² gives (23).
In matrix form:
    B w = b ,   where B = tridiag( −α, 1 + 2α, −α ) .
180
The matrix is tridiagonal and diagonally dominant ⇒ Crout / SOR. (A sketch of a
tridiagonal Crout solver is given below.)
181
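As an illustration, here is a minimal C++ sketch of the Crout (Thomas) algorithm for a tridiagonal system, as needed for the implicit and Crank–Nicolson schemes; the function name and interface are my own choice.

    // Illustrative sketch: Crout/Thomas algorithm for the tridiagonal system
    //   lower[i] x[i-1] + diag[i] x[i] + upper[i] x[i+1] = rhs[i],  i = 0..n-1
    // (lower[0] and upper[n-1] are unused). Assumes the matrix is diagonally dominant.
    #include <vector>

    std::vector<double> solveTridiagonal(std::vector<double> lower,
                                         std::vector<double> diag,
                                         std::vector<double> upper,
                                         std::vector<double> rhs)
    {
        int n = static_cast<int>(diag.size());
        // forward elimination
        for (int i = 1; i < n; ++i) {
            double m = lower[i] / diag[i - 1];
            diag[i] -= m * upper[i - 1];
            rhs[i]  -= m * rhs[i - 1];
        }
        // back substitution
        std::vector<double> x(n);
        x[n - 1] = rhs[n - 1] / diag[n - 1];
        for (int i = n - 2; i >= 0; --i)
            x[i] = (rhs[i] - upper[i] * x[i + 1]) / diag[i];
        return x;
    }

The arguments are taken by value so that the caller's coefficient arrays are not overwritten during elimination.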
6. Crank–Nicolson Scheme.
The Crank–Nicolson scheme is the average of the explicit scheme at (j, n) and the
implicit scheme at (j, n + 1).
The resulting difference equation is
    −(α/2) w_{j−1}^{n+1} + (1 + α) w_j^{n+1} − (α/2) w_{j+1}^{n+1}
        = (α/2) w_{j−1}^n + (1 − α) w_j^n + (α/2) w_{j+1}^n .                  (24)
The difference equations (24), j = 1, . . . , N − 1, together with the initial and boundary
conditions as before, can be solved using the Crout algorithm or the SOR algorithm.
182
Crank–Nicolson.
Taking (1/2) [ (∗) + (∗∗) ] gives
    ( w_j^{n+1} − w_j^n ) / Δt = (1/2) c² ( w_{j−1}^n − 2 w_j^n + w_{j+1}^n ) / (Δx)²
                                 + (1/2) c² ( w_{j−1}^{n+1} − 2 w_j^{n+1} + w_{j+1}^{n+1} ) / (Δx)² .
Letting α̃ := (1/2) c² Δt / (Δx)² = α/2 gives
    −α̃ w_{j+1}^{n+1} + (1 + 2 α̃) w_j^{n+1} − α̃ w_{j−1}^{n+1} = w̃_j^{n+1}
and
    w̃_j^{n+1} = α̃ w_{j+1}^n + (1 − 2 α̃) w_j^n + α̃ w_{j−1}^n .
This can be interpreted as
    w̃_j^{n+1} : predictor (explicit method)
    w_j^{n+1} : corrector (implicit method)
183
7. Local Truncation Errors.
These are measures of the error by which the exact solution of a differential equation
does not satisfy the difference equation at the grid points and are obtained by
substituting the exact solution of the continuous problem into the numerical scheme.
A necessary condition for the convergence of the numerical solutions to the continuous
solution is that the local truncation error tends to zero as the step size goes to zero.
In this case the method is said to be consistent.
It can be shown that all three methods are consistent.
The explicit and implicit schemes have local truncation errors O(Δt, (Δx)²), while
that of the Crank–Nicolson scheme is O((Δt)², (Δx)²).
184
Local Truncation Error.
For the explicit scheme we get for the LTE at (j, n)
    E_j^n = [ u(x_j, t_{n+1}) − u(x_j, t_n) ] / Δt
            − c² [ u(x_{j−1}, t_n) − 2 u(x_j, t_n) + u(x_{j+1}, t_n) ] / (Δx)² .
With the help of a Taylor expansion at (x_j, t_n) we find that
    [ u(x_j, t_{n+1}) − u(x_j, t_n) ] / Δt = u_t(x_j, t_n) + O(Δt) ,
    [ u(x_{j−1}, t_n) − 2 u(x_j, t_n) + u(x_{j+1}, t_n) ] / (Δx)² = u_xx(x_j, t_n) + O((Δx)²) .
Hence
    E_j^n = u_t(x_j, t_n) − c² u_xx(x_j, t_n) + O(Δt) + O((Δx)²) = O(Δt) + O((Δx)²) ,
since u_t − c² u_xx = 0.
185
8. Numerical Stability.
Consistency is only a necessary but not a sufficient condition for convergence.
Roundoff errors incurred during the calculations may lead to a blow up of the solution or
erode the whole computation.
A scheme is stable if roundoff errors are not amplified in the calculations.
The Fourier method can be used to check whether a scheme is stable.
Assume that a numerical scheme admits a solution of the form
    v_j^n = a^{(n)}(ω) e^{i ω j Δx} ,                            (25)
where ω is the wave number and i = √−1.
186
Define
    G(ω) = a^{(n+1)}(ω) / a^{(n)}(ω) ,
where G(ω) is an amplification factor, which governs the growth of the Fourier com-
ponent a(ω).
The von Neumann stability condition is given by
    |G(ω)| ≤ 1   for 0 ≤ ω Δx ≤ π .
It can be shown that the explicit scheme is stable if and only if α ≤ 1/2, called con-
ditionally stable, and the implicit and Crank–Nicolson schemes are stable for any
values of α, called unconditionally stable.
187
Stability Analysis.
For the explicit scheme we get on substituting (25) into (22) that
    a^{(n+1)}(ω) e^{i ω j Δx} = α a^{(n)}(ω) e^{i ω (j+1) Δx} + (1 − 2 α) a^{(n)}(ω) e^{i ω j Δx}
                                + α a^{(n)}(ω) e^{i ω (j−1) Δx}
⇒   G(ω) = a^{(n+1)}(ω) / a^{(n)}(ω) = α e^{i ω Δx} + (1 − 2 α) + α e^{−i ω Δx} .
The von Neumann stability condition then is
    |G(ω)| ≤ 1  ⟺  | α e^{i ω Δx} + (1 − 2 α) + α e^{−i ω Δx} | ≤ 1
               ⟺  | (1 − 2 α) + 2 α cos(ω Δx) | ≤ 1
               ⟺  | 1 − 4 α sin²( ω Δx / 2 ) | ≤ 1        [ cos 2φ = 1 − 2 sin² φ ]
               ⟺  0 ≤ 4 α sin²( ω Δx / 2 ) ≤ 2
               ⟺  α ≤ 1 / ( 2 sin²( ω Δx / 2 ) )   for all 0 ≤ ω Δx ≤ π .
188
This is equivalent to 0 < α ≤ 1/2.
189
Remark.
The explicit method is stable if and only if
    Δt ≤ (Δx)² / (2 c²) .                                        (∗)
(∗) is a strong restriction on the time step size Δt. If Δx is reduced to (1/2) Δx, then Δt
must be reduced to (1/4) Δt.
So the total computational work increases by a factor of 8.
Example.
    u_t = u_xx ,   (x, t) ∈ [0, 1] × [0, 1] .
Take Δx = 0.01. Then
    α ≤ 1/2   ⇒   Δt ≤ 0.00005 .
I.e. the number of grid points is equal to
    (1/Δx) · (1/Δt) = 100 · 20,000 = 2 · 10⁶ .
190
Remark.
In vector notation, the explicit scheme can be written as
    w^{n+1} = A w^n + α b^n ,
where w^n = ( w_1^n, . . . , w_{N−1}^n )^T ∈ R^{N−1},
    A = tridiag( α, 1 − 2α, α ) ∈ R^{(N−1)×(N−1)}   and   b^n = ( w_0^n, 0, . . . , 0, w_N^n )^T ∈ R^{N−1} .
For the implicit method we get
    B w^{n+1} = w^n + α b^{n+1} ,   where B = tridiag( −α, 1 + 2α, −α ) .
191
Remark.
Forward diffusion equation:   u_t − c² u_xx = 0 ,   t ≥ 0 .
Backward diffusion equation:  u_t + c² u_xx = 0 ,   t ≤ T ,
    u(x, T) = u_T(x)   for all x ,
    u(a, t) = g_a(t) ,  u(b, t) = g_b(t)   for all t   [as before] .
[Note: We could use the transformation v(x, t) := u(x, T − t) in order to transform this
into a standard forward diffusion problem.]
We can solve the backward diffusion equation directly by starting at t = T and solving
backwards, i.e. given w^{n+1}, find w^n.
    Implicit:  w^{n+1} = Ã w^n + b̃^n
    Explicit:  B̃ w^{n+1} = w^n + b̃^n
The von Neumann stability condition for the backward problem then becomes
    |G̃(ω)| = | a^{(n)}(ω) / a^{(n+1)}(ω) | ≤ 1 .
192
Stability of the Binomial Model.
The binomial model is an explicit method for a backward equation:
    V_j^n = (1/R) ( p V_{j+1}^{n+1} + (1 − p) V_{j−1}^{n+1} )
          = (1/R) ( p V_{j+1}^{n+1} + 0 · V_j^{n+1} + (1 − p) V_{j−1}^{n+1} )
for j = −n, −n + 2, . . . , n − 2, n and n = N − 1, . . . , 1, 0.
Here the initial values V_{−N}^N, V_{−N+2}^N, . . . , V_{N−2}^N, V_N^N are given.
Now let V_j^n = a^{(n)}(ω) e^{i ω j Δx}; then
    a^{(n)}(ω) e^{i ω j Δx} = (1/R) [ p a^{(n+1)}(ω) e^{i ω (j+1) Δx} + (1 − p) a^{(n+1)}(ω) e^{i ω (j−1) Δx} ]
⇒   G̃(ω) = [ p e^{i ω Δx} + (1 − p) e^{−i ω Δx} ] e^{−r Δt}
          = [ cos(ω Δx) + q i sin(ω Δx) ] e^{−r Δt} ,   where q := 2 p − 1 ,
⇒   |G̃(ω)|² = [ cos²(ω Δx) + q² sin²(ω Δx) ] e^{−2 r Δt}
             = [ 1 + (q² − 1) sin²(ω Δx) ] e^{−2 r Δt} ≤ e^{−2 r Δt} ≤ 1
if q² ≤ 1 ⟺ −1 ≤ q ≤ 1 ⟺ p ∈ [0, 1]. Hence the binomial model is stable.
193
Stability of the CRR Model.
We know that the binomial model is stable if p ∈ (0, 1).
For the CRR model we have that
    u = e^{σ √Δt} ,   d = e^{−σ √Δt} ,   p = (R − d)/(u − d) ,
so p ∈ (0, 1) is equivalent to u > R > d.
Clearly, for Δt small, we can ensure that
    e^{σ √Δt} > e^{r Δt} .
Hence the CRR model is stable if Δt is sufficiently small, i.e. if Δt < σ² / r².
Alternatively, one can argue (less rigorously) as follows. Since Δx = u S − S =
S ( e^{σ √Δt} − 1 ) ≈ S σ √Δt and as the BSE can be written as
    u_t + (1/2) σ² S² u_SS + . . . = 0 ,   i.e. with c² = (1/2) σ² S² ,
194
it follows that
    α = c² Δt / (Δx)² = (1/2) σ² S² Δt / ( S² σ² Δt ) = 1/2
⇒   CRR is stable.
195
9. Simplification of the BSE.
Assume V(S, t) is the price of a European option at time t.
Then V satisfies the Black–Scholes equation (13) with appropriate initial and boundary
conditions.
Define
    τ = T − t ,   x = ln S ,   w(x, τ) = e^{α x + β τ} V(S, t) ,
where α and β are parameters.
Then the Black–Scholes equation can be transformed into a basic diffusion equation:
    ∂w/∂τ = (1/2) σ² ∂²w/∂x²
with a new set of initial and boundary conditions.
Finite difference methods can be used to solve the corresponding difference equations
and hence to derive option values at grid points.
196
Transformation of the BSE.
Consider a call option.
Let τ = T − t be the remaining time to maturity. Set u(S, τ) = V(S, t). Then u_τ = −V_t
and the BSE (13) is equivalent to
    u_τ = (1/2) σ² S² u_SS + r S u_S − r u ,                                 (∗)
    u(S, 0) = V(S, T) = (S − X)^+ ,                                          (IC)
    u(0, τ) = V(0, t) = 0 ,   u(S, τ) = V(S, t) ∼ S as S → ∞ .               (BC)
Let x = ln S (⟺ S = e^x). Set ū(x, τ) = u(S, τ). Then
    ū_x = u_S e^x = S u_S ,   ū_xx = S u_S + S² u_SS
and (∗) becomes
    ū_τ = (1/2) σ² ū_xx + ( r − (1/2) σ² ) ū_x − r ū ,                        (∗∗)
    ū(x, 0) = u(S, 0) = (e^x − X)^+ ,                                        (IC)
    ū(x, τ) = u(0, τ) = 0 as x → −∞ ,   ū(x, τ) = u(e^x, τ) ∼ e^x as x → ∞ .  (BC)
197
Note that the growth condition (20), lim_{|x|→∞} ū(x, τ) e^{−a x²} = 0 for any a > 0, is satisfied.
Hence (∗∗) is well defined.
Let w(x, τ) = e^{α x + β τ} ū(x, τ)  ⟺  ū(x, τ) = e^{−α x − β τ} w(x, τ) =: C w(x, τ). Then
    ū_τ = C ( −β w + w_τ ) ,
    ū_x = C ( −α w + w_x ) ,
    ū_xx = C ( −α (−α w + w_x) + (−α w_x + w_xx) ) = C ( α² w − 2 α w_x + w_xx ) .
So (∗∗) is equivalent to
    C ( −β w + w_τ ) = (1/2) σ² C ( α² w − 2 α w_x + w_xx )
                       + ( r − (1/2) σ² ) C ( −α w + w_x ) − r C w .
In order to cancel the w and w_x terms we need to have
    −β = (1/2) σ² α² − ( r − (1/2) σ² ) α − r ,
     0 = (1/2) σ² (−2 α) + r − (1/2) σ² ,
⇒    α = (1/σ²) ( r − (1/2) σ² ) ,
     β = (1/(2 σ²)) ( r − (1/2) σ² )² + r .
198
With this choice of α and β the equation (∗∗) is equivalent to
    w_τ = (1/2) σ² w_xx ,                                                    (∗∗∗)
    w(x, 0) = e^{α x} ū(x, 0) = e^{α x} (e^x − X)^+ ,                         (IC)
    w(x, τ) → 0 as x → −∞ ,   w(x, τ) ∼ e^{α x + β τ} e^x as x → ∞ .          (BC)
Note that the growth condition (20) is satisfied. Hence (∗∗∗) is well defined.
Implementation.
1. Choose a truncated interval [a, b] to approximate (−∞, ∞).
   e^{−8} ≈ 0.0003, e^8 ≈ 2981  ⇒  [a, b] = [−8, 8] serves all practical purposes.
2. Choose integers N, M to get the step sizes Δx = (b − a)/N and Δτ = (T − t)/M.
   Grid points (x_j, τ_n):  x_j = a + j Δx, j = 0, 1, . . . , N, and τ_n = n Δτ, n = 0, 1, . . . , M.
   Note: x_0, x_N and τ_0 represent the boundary of the grid with known values.
199
3. Solve (∗∗∗) with
       w(x, 0) = e^{α x} (e^x − X)^+ ,                                        (IC)
       w(a, τ) = 0 ,   w(b, τ) = { e^{(α+1) b + β τ}   or
                                 { e^{α b + β τ} ( e^b − X e^{−r τ} )   (a better choice) .   (BC)
   Note: If the explicit method is used, N and M need to be chosen such that
       (1/2) σ² Δτ / (Δx)² ≤ 1/2   ⟺   M ≥ σ² (T − t) N² / (b − a)² .
   If the implicit or Crank–Nicolson scheme is used, there are no restrictions on N, M.
   Use Crout or SOR to solve.
4. Assume w(x_j, τ_M), j = 0, 1, . . . , N, are the solutions from step 3; then the call option
   price at time t is
       V(S_j, t) = e^{−α x_j − β (T−t)} w(x_j, T − t) ,   j = 0, 1, . . . , N ,
   where S_j = e^{x_j} and T − t = τ_M.
200
Note: The S_j are not equally spaced.
201
10. Direct Discretization of the BSE.
Exercise: Apply the Crank–Nicolson scheme directly to the BSE (13), i.e. without any
transformation of variables, write out the resulting difference equations and
do a stability analysis.
C++ Exercise: Write a program to solve the BSE (13) using the result of the
previous exercise and the Crout algorithm. The inputs are the interest rate r, the
volatility σ, the current time t, the expiry time T, the strike price X, the maximum
price S_max, the number of intervals N in [0, S_max], and the number of subintervals
M in [t, T]. The outputs are the asset prices S_j, j = 0, 1, . . . , N, at time t, and their
corresponding European call and put prices (with the same strike price X).
202
11. Greeks.
Assume that the asset prices S_j and option values V_j, j = 0, 1, . . . , N, are known at
time t.
The sensitivities of V at S_j, j = 1, . . . , N − 1, are computed as follows:
    Δ_j = ∂V/∂S |_{S=S_j} ≈ ( V_{j+1} − V_{j−1} ) / ( S_{j+1} − S_{j−1} ) ,
which is ( V_{j+1} − V_{j−1} ) / (2 ΔS) if S is equally spaced;
    Γ_j = ∂²V/∂S² |_{S=S_j} ≈ [ ( V_{j+1} − V_j ) / ( S_{j+1} − S_j ) − ( V_j − V_{j−1} ) / ( S_j − S_{j−1} ) ]
                              / ( S_j − S_{j−1} ) ,
which is ( V_{j+1} − 2 V_j + V_{j−1} ) / (ΔS)² if S is equally spaced.
203
12. Diffusion Equations of Two State Variables.
    ∂u/∂t = c² ( ∂²u/∂x² + ∂²u/∂y² ) ,   (x, y, t) ∈ [a, b] × [c, d] × [0, ∞) .      (26)
The initial condition is
    u(x, y, 0) = u_0(x, y) ,   (x, y) ∈ [a, b] × [c, d] ,
and the boundary conditions are
    u(a, y, t) = g_a(y, t) ,   u(b, y, t) = g_b(y, t) ,   y ∈ [c, d], t ≥ 0 ,
and
    u(x, c, t) = g_c(x, t) ,   u(x, d, t) = g_d(x, t) ,   x ∈ [a, b], t ≥ 0 .
Here we assume that all the functions involved are consistent, in the sense that they
have the same value at common points, e.g. g_a(c, t) = g_c(a, t) for all t ≥ 0.
204
13. Grid Points.
(x_i, y_j, t_n), where
    x_i = a + i Δx ,   Δx = (b − a)/I ,   i = 0, . . . , I ,
    y_j = c + j Δy ,   Δy = (d − c)/J ,   j = 0, . . . , J ,
    t_n = n Δt ,       Δt = T/N ,         n = 0, . . . , N ,
and I, J, N are integers.
Recalling the finite differences (21), we have
    u_xx ≈ ( u_{i+1,j}^n − 2 u_{i,j}^n + u_{i−1,j}^n ) / (Δx)²   and
    u_yy ≈ ( u_{i,j+1}^n − 2 u_{i,j}^n + u_{i,j−1}^n ) / (Δy)²
at a grid point (i, j, n).
205
Depending on how u_t is approximated, we have three basic schemes: the explicit, implicit,
and Crank–Nicolson schemes.
206
14. Explicit Scheme.
If u_t is approximated by a forward difference quotient
    u_t ≈ ( u_{i,j}^{n+1} − u_{i,j}^n ) / Δt   at (i, j, n),
then the corresponding difference equation at grid point (i, j, n) is
    w_{i,j}^{n+1} = (1 − 2 λ − 2 μ) w_{i,j}^n + λ w_{i+1,j}^n + λ w_{i−1,j}^n + μ w_{i,j+1}^n + μ w_{i,j−1}^n     (27)
for i = 1, . . . , I − 1 and j = 1, . . . , J − 1, where
    λ = c² Δt / (Δx)²   and   μ = c² Δt / (Δy)² .
(27) can be solved explicitly. It has local truncation error O(Δt, (Δx)², (Δy)²), but
is only conditionally stable.
207
15. Implicit Scheme.
If u_t is approximated by a backward difference quotient u_t ≈ ( u_{i,j}^{n+1} − u_{i,j}^n ) / Δt at (i, j, n + 1),
then the difference equation at grid point (i, j, n + 1) is
    (1 + 2 λ + 2 μ) w_{i,j}^{n+1} − λ w_{i+1,j}^{n+1} − λ w_{i−1,j}^{n+1} − μ w_{i,j+1}^{n+1} − μ w_{i,j−1}^{n+1} = w_{i,j}^n     (28)
for i = 1, . . . , I − 1 and j = 1, . . . , J − 1.
For fixed n, there are (I − 1) (J − 1) unknowns and equations. (28) can be solved by
relabeling the grid points and using the SOR algorithm.
(28) is unconditionally stable with local truncation error O(Δt, (Δx)², (Δy)²), but is
more difficult to solve, as it is no longer tridiagonal, so the Crout algorithm cannot
be applied.
16. Crank–Nicolson Scheme.
It is the average of the explicit scheme at (i, j, n) and the implicit scheme at (i, j, n + 1).
It is similar to the implicit scheme but with the improved local truncation error
O((Δt)², (Δx)², (Δy)²).
208
Solving the Implicit Scheme.
    (1 + 2 λ + 2 μ) w_{i,j}^{n+1} − λ w_{i+1,j}^{n+1} − λ w_{i−1,j}^{n+1} − μ w_{i,j+1}^{n+1} − μ w_{i,j−1}^{n+1} = w_{i,j}^n
With SOR for ω ∈ (0, 2):
For each n = 0, 1, . . . , N
1. Set w^{n+1,0} := w^n and
   fill in the boundary conditions w_{0,j}^{n+1,0}, w_{I,j}^{n+1,0}, w_{i,0}^{n+1,0}, w_{i,J}^{n+1,0} for all i, j.
2. For k = 0, 1, . . .
     For i = 1, . . . , I − 1, j = 1, . . . , J − 1
       ŵ_{i,j}^{k+1} = [ 1/(1 + 2λ + 2μ) ] ( w_{i,j}^n + λ w_{i+1,j}^{n+1,k} + λ w_{i−1,j}^{n+1,k+1}
                                              + μ w_{i,j+1}^{n+1,k} + μ w_{i,j−1}^{n+1,k+1} )
       w_{i,j}^{n+1,k+1} = (1 − ω) w_{i,j}^{n+1,k} + ω ŵ_{i,j}^{k+1}
   until ‖ w^{n+1,k+1} − w^{n+1,k} ‖ < ε .
3. Set w^{n+1} = w^{n+1,k+1} .
209
With Block Jacobi / Gauss–Seidel.
Denote
    w_j = ( w_{1,j}, w_{2,j}, . . . , w_{I−1,j} )^T ∈ R^{I−1} ,   j = 1, . . . , J − 1 ,
    w = ( w_1, w_2, . . . , w_{J−1} )^T ∈ R^{(I−1)(J−1)} .
210
On setting c̄ = 1 + 2 λ + 2 μ, we have from (28) for fixed j
    A w_j^{n+1} + B w_{j+1}^{n+1} + B w_{j−1}^{n+1} = d_j^n ,
where
    A = tridiag( −λ, c̄, −λ ) ∈ R^{(I−1)×(I−1)} ,   B = −μ I_{I−1} ,
    d_j^n = ( w_{1,j}^n + λ w_{0,j}^{n+1}, w_{2,j}^n, . . . , w_{I−2,j}^n, w_{I−1,j}^n + λ w_{I,j}^{n+1} )^T .
211
Rewriting
    A w_j^{n+1} + B w_{j+1}^{n+1} + B w_{j−1}^{n+1} = d_j^n ,   j = 1, . . . , J − 1 ,
as the block tridiagonal system
    tridiag( B, A, B ) ( w_1^{n+1}, . . . , w_{J−1}^{n+1} )^T
        = ( d_1^n − B w_0^{n+1}, d_2^n, . . . , d_{J−2}^n, d_{J−1}^n − B w_J^{n+1} )^T
        =: ( d̃_1^n, d_2^n, . . . , d_{J−2}^n, d̃_{J−1}^n )^T ,
where w_0^{n+1} and w_J^{n+1} represent boundary points, leads to the following Block Jacobi
212
iteration: For k = 0, 1, . . .
    A w_1^{n+1,k+1} = −B w_2^{n+1,k} + d̃_1^n
    A w_2^{n+1,k+1} = −B w_1^{n+1,k} − B w_3^{n+1,k} + d_2^n
      . . .
    A w_{J−2}^{n+1,k+1} = −B w_{J−3}^{n+1,k} − B w_{J−1}^{n+1,k} + d_{J−2}^n
    A w_{J−1}^{n+1,k+1} = −B w_{J−2}^{n+1,k} + d̃_{J−1}^n
Similarly, the Block Gauss–Seidel iteration is given by:
For k = 0, 1, . . .
    A w_1^{n+1,k+1} = −B w_2^{n+1,k} + d̃_1^n
    A w_2^{n+1,k+1} = −B w_1^{n+1,k+1} − B w_3^{n+1,k} + d_2^n
      . . .
    A w_{J−2}^{n+1,k+1} = −B w_{J−3}^{n+1,k+1} − B w_{J−1}^{n+1,k} + d_{J−2}^n
    A w_{J−1}^{n+1,k+1} = −B w_{J−2}^{n+1,k+1} + d̃_{J−1}^n
213
In each case, use the Crout algorithm to solve for w_j^{n+1,k+1}, j = 1, . . . , J − 1.
Note on Stability.
Recall that in 1d a scheme was stable if |G(ω)| = | a^{(n+1)}(ω) / a^{(n)}(ω) | ≤ 1, where
v_j^n = a^{(n)}(ω) e^{i ω j Δx}.
In 2d, this is adapted to
    v_{i,j}^n = a^{(n)}(ω_1, ω_2) e^{ √−1 ( ω_1 i Δx + ω_2 j Δy ) } .
214
214
17. Alternating Direction Implicit (ADI) Method.
An alternative nite dierence method is the ADI scheme, which is unconditionally
stable while the dierence equations are still tridiagonal and diagonally dominant.
The ADI algorithm can be used to eciently solve the BlackScholes two asset pricing
equation:
V
t
+
1
2

2
1
S
2
1
V
S
1
S
1
+
1
2

2
2
S
2
2
V
S
2
S
2
+
1

2
S
1
S
2
V
S
1
S
2
+r S
1
V
S
1
+r S
2
V
S
2
r V = 0.
(29)
See Clewlow and Strickland (1998) for details on how to transform the BlackScholes
equation (29) into the basic diusion equation (26) and then to solve it with the ADI
scheme.
215
ADI scheme
Implicit method at (i, j, n + 1):
    ( w_{i,j}^{n+1} − w_{i,j}^n ) / Δt = c² ( w_{i+1,j}^n − 2 w_{i,j}^n + w_{i−1,j}^n ) / (Δx)²        [ u_xx approximated with (i, j, n) data ]
                                        + c² ( w_{i,j+1}^{n+1} − 2 w_{i,j}^{n+1} + w_{i,j−1}^{n+1} ) / (Δy)²
Implicit method at (i, j, n + 2):
    ( w_{i,j}^{n+2} − w_{i,j}^{n+1} ) / Δt = c² ( w_{i+1,j}^{n+2} − 2 w_{i,j}^{n+2} + w_{i−1,j}^{n+2} ) / (Δx)²
                                            + c² ( w_{i,j+1}^{n+1} − 2 w_{i,j}^{n+1} + w_{i,j−1}^{n+1} ) / (Δy)²   [ u_yy approximated with (i, j, n + 1) data ]
We can write the two equations as follows:
    −μ w_{i,j+1}^{n+1} + (1 + 2 μ) w_{i,j}^{n+1} − μ w_{i,j−1}^{n+1}
        = λ w_{i+1,j}^n + (1 − 2 λ) w_{i,j}^n + λ w_{i−1,j}^n                                          (∗)
    −λ w_{i+1,j}^{n+2} + (1 + 2 λ) w_{i,j}^{n+2} − λ w_{i−1,j}^{n+2}
        = μ w_{i,j+1}^{n+1} + (1 − 2 μ) w_{i,j}^{n+1} + μ w_{i,j−1}^{n+1}                              (∗∗)
216
To solve (∗), fix i = 1, . . . , I − 1 and solve a tridiagonal system to get w_{i,j}^{n+1} for j =
1, . . . , J − 1.
This can be done with e.g. the Crout algorithm.
To solve (∗∗), fix j = 1, . . . , J − 1 and solve a tridiagonal system to get w_{i,j}^{n+2} for i =
1, . . . , I − 1.
Currently the method works on the interval [t_n, t_{n+2}] and has features of an explicit
method. In order to obtain an (unconditionally stable) implicit method, we need to
adapt the method so that it works on the interval [t_n, t_{n+1}] and hence gives values w_{i,j}^n
for all n = 1, . . . , N.
Introduce the intermediate time point n + 1/2. Then (∗) generates w_{i,j}^{n+1/2} (not used) and
(∗∗) generates w_{i,j}^{n+1}:
    −(μ/2) w_{i,j+1}^{n+1/2} + (1 + μ) w_{i,j}^{n+1/2} − (μ/2) w_{i,j−1}^{n+1/2}
        = (λ/2) w_{i+1,j}^n + (1 − λ) w_{i,j}^n + (λ/2) w_{i−1,j}^n                                     (∗)
    −(λ/2) w_{i+1,j}^{n+1} + (1 + λ) w_{i,j}^{n+1} − (λ/2) w_{i−1,j}^{n+1}
        = (μ/2) w_{i,j+1}^{n+1/2} + (1 − μ) w_{i,j}^{n+1/2} + (μ/2) w_{i,j−1}^{n+1/2}                   (∗∗)
217
9. Simulation
1. Uniform Random Number Generators.
In order to use simulation techniques, we first need to generate independent samples
from some given distribution functions.
The simplest and the most important distribution in simulation is the uniform dis-
tribution U[0, 1].
Note that if X is a U[0, 1] random variable, then Y = a + (b − a) X is a U[a, b]
random variable.
We focus on how to generate U[0, 1] random variables.
Examples.
  2 outcomes, p_1 = p_2 = 1/2  (coin)
  6 outcomes, p_1 = . . . = p_6 = 1/6  (dice)
  3 outcomes, p_1 = p_2 = 0.49, p_3 = 0.05  (roulette wheel)
218
2. Linear Congruential Generator.
A common technique is to generate a sequence of integers n_i, defined recursively by
    n_i = (a n_{i−1}) mod m                                      (30)
for i = 1, 2, . . . , N, where n_0 (≠ 0) is called the seed, and a > 0 and m > 0 are integers
such that a and m have no common factors.
(30) generates a sequence of numbers in the set {1, 2, . . . , m − 1}.
Note that the n_i are periodic with period p ≤ m − 1; this is because there are not m
different n_i and two in {n_0, . . . , n_{m−1}} must be equal: n_i = n_{i+p} with p ≤ m − 1.
If the period is m − 1 then (30) is said to have full period.
The condition for full period is that m is a prime, a^{m−1} − 1 is divisible by m, and
a^j − 1 is not divisible by m for j = 1, . . . , m − 2.
Example. n_0 = 35, a = 13, m = 100.
Then the sequence is {35, 55, 15, 95, 35, . . .}. So p = 4.
219
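As an illustration, here is a minimal C++ sketch of a linear congruential generator, using the parameters a = 16807 and m = 2^31 − 1 recommended in paragraph 4 below; the class name Lcg is my own choice.

    // Illustrative sketch of a linear congruential generator, n_i = (a * n_{i-1}) mod m.
    #include <cstdint>

    class Lcg {
    public:
        explicit Lcg(std::uint64_t seed) : n_(seed % m_) { if (n_ == 0) n_ = 1; }

        // next pseudo-U[0,1] number x_i = n_i / m
        double next() {
            n_ = (a_ * n_) % m_;          // 64-bit arithmetic avoids overflow
            return static_cast<double>(n_) / static_cast<double>(m_);
        }
    private:
        static constexpr std::uint64_t a_ = 16807;
        static constexpr std::uint64_t m_ = 2147483647;  // 2^31 - 1
        std::uint64_t n_;
    };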
3. Pseudo-Uniform Random Numbers.
If we define
    x_i = n_i / m ,
then x_i is a sequence of numbers in the interval (0, 1).
If (30) has full period then these x_i's are called pseudo-U[0, 1] random numbers.
In view of the periodic property, the number m should be as large as possible, because
a small set of numbers makes the outcome easier to predict, in contrast to randomness.
The main drawback of linear congruential generators is that the consecutive points
obtained lie on parallel hyperplanes, which implies that the unit cube cannot be uni-
formly filled with these points.
For example, if a = 6 and m = 11, then the ten distinct points (x_i, x_{i+1}) generated lie
on just two parallel lines in the unit square.
220
4. Choice of Parameters.
A good choice of a and m is given by a = 16807 and m = 2147483647 = 2^31 − 1.
The seed n_0 can be any positive integer and can be chosen manually.
This allows us to repeatedly generate the same set of numbers, which may be useful
when we want to compare different simulation techniques.
In general, we let the computer choose n_0 for us; a common choice is the computer's
internal clock.
For details on the implementation of LCGs, see Press et al. (1992).
For state of the art random number generators (linear and nonlinear), see the pLab
website at http://random.mat.sbg.ac.at/. It includes extensive information on
random number generation, as well as links to free software in a variety of computing
languages.
221
5. Normal Random Number Generators.
Once we have generated a set of pseudo-U[0, 1] random numbers, we can generate
pseudo-N(0, 1) random numbers.
Again there are several methods to generate a sequence of independent N(0, 1) ran-
dom numbers.
Note that if X is a N(0, 1) random variable, then Y = μ + σ X is a N(μ, σ²) random
variable.
222
6. Convolution Method.
This method is based on the Central Limit Theorem.
We know that if X_i are independent identically distributed random variables with
finite mean μ and finite variance σ², then
    Z_n = ( Σ_{i=1}^n X_i − n μ ) / ( σ √n )                     (31)
converges in distribution to a N(0, 1) random variable, i.e.
    P(Z_n ≤ x) → N(x) ,   n → ∞ .
If X is U[0, 1] distributed then its mean is 1/2 and its variance is 1/12.
If we choose n = 12 then (31) simplifies to
    Z_n = Σ_{i=1}^{12} X_i − 6 .                                 (32)
223
Note that Z_n generated in this way is only an approximate normal random number.
This is due to the fact that the Central Limit Theorem only gives convergence in
distribution, not almost surely, and n should tend to infinity.
In practice there are hardly any differences between pseudo-N(0, 1) random numbers
generated by (32) and those generated by other techniques.
The disadvantage is that we need to generate 12 uniform random numbers to generate
1 normal random number, and this does not seem efficient.
224
7. Box–Muller Method.
This is a direct method as follows: Generate two independent U[0, 1] random numbers
X_1 and X_2, and define
    Z = h(X) ,
where X = (X_1, X_2) and Z = (Z_1, Z_2) are R² vectors and h : [0, 1]² → R² is a vector
function defined by
    (z_1, z_2) = h(x_1, x_2) = ( sqrt(−2 ln x_1) cos(2 π x_2) ,  sqrt(−2 ln x_1) sin(2 π x_2) ) .
Then Z_1 and Z_2 are two independent N(0, 1) random numbers.
225
This result can be easily proved with the transformation method. Specifically, we can
find the inverse function h^{−1} by
    (x_1, x_2) = h^{−1}(z_1, z_2) = ( e^{ −(z_1² + z_2²)/2 } ,  (1/(2π)) arctan( z_2 / z_1 ) ) .
The absolute value of the determinant of the Jacobian matrix is
    | ∂(x_1, x_2) / ∂(z_1, z_2) | = [ (1/√(2π)) e^{−z_1²/2} ] [ (1/√(2π)) e^{−z_2²/2} ] .     (33)
From the transformation theorem of random variables and (33) we know that if X_1
and X_2 are independent U[0, 1] random variables, then Z_1 and Z_2 are independent
N(0, 1) random variables.
226
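A minimal C++ sketch of the Box–Muller map above (the function name boxMuller is my own choice):

    // Illustrative sketch of the Box-Muller method: maps two independent U(0,1]
    // samples (x1, x2) to two independent N(0,1) samples (z1, z2).
    #include <cmath>
    #include <utility>

    std::pair<double, double> boxMuller(double x1, double x2)
    {
        const double pi = 3.14159265358979323846;
        double r = std::sqrt(-2.0 * std::log(x1));   // x1 must lie in (0,1]
        return { r * std::cos(2.0 * pi * x2),
                 r * std::sin(2.0 * pi * x2) };
    }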
8. Correlated Normal Distributions.
Assume X ∼ N(μ, Σ), where Σ ∈ R^{n×n} is symmetric positive definite.
To generate a normal vector X do the following:
(a) Calculate the Cholesky decomposition Σ = L L^T.
(b) Generate n independent N(0, 1) random numbers Z_i and let Z = (Z_1, . . . , Z_n)^T.
(c) Set X = μ + L Z.
Example.
To generate 2 correlated N(μ, σ²) RVs with correlation coefficient ρ, let
    (Y_1, Y_2) = ( Z_1 ,  ρ Z_1 + sqrt(1 − ρ²) Z_2 )
and then set
    (X_1, X_2) = ( μ + σ Y_1 ,  μ + σ Y_2 ) .
227
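A minimal C++ sketch of the two-dimensional example above (the function name is my own choice; z1 and z2 are two independent N(0,1) samples, e.g. from the Box–Muller sketch):

    // Illustrative sketch: two correlated N(mu, sigma^2) samples with correlation rho,
    // i.e. step (c) above with the 2x2 Cholesky factor L = [[1, 0], [rho, sqrt(1 - rho^2)]]
    // scaled by sigma.
    #include <cmath>
    #include <utility>

    std::pair<double, double> correlatedNormals(double z1, double z2,
                                                double mu, double sigma, double rho)
    {
        double y1 = z1;
        double y2 = rho * z1 + std::sqrt(1.0 - rho * rho) * z2;
        return { mu + sigma * y1, mu + sigma * y2 };
    }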
9. Inverse Transformation Method.
To generate a random variable X with known cdf F(x), the inverse transform method
sets
    X = F^{−1}(U) ,
where U is a U[0, 1] random variable (i.e. X satisfies F(X) = U ∼ U[0, 1]).
The inverse of F is well defined if F is strictly increasing and continuous; otherwise
we set
    F^{−1}(u) = inf{ x : F(x) ≥ u } .
Exercise.
Use the inversion method to generate X from a Rayleigh distribution, i.e.
    F(x) = 1 − exp( −x² / (2 σ²) ) .
228
Inverse Transformation Method.
    X = F^{−1}(U)
⇒   P(X ≤ x) = P( F^{−1}(U) ≤ x ) = P( U ≤ F(x) ) = ∫_0^{F(x)} 1 du = F(x)
⇒   X ∼ F .
Examples.
Exponential with parameter a > 0:  F(x) = 1 − e^{−a x}, x ≥ 0.
Let U ∼ U[0, 1] and
    F(X) = U  ⟺  1 − e^{−a X} = U  ⟺  X = −(1/a) ln(1 − U) ∼ −(1/a) ln U .
229
Hence X ∼ Exp(a).
Arcsin law: Let B_t be a Brownian motion on the time interval [0, 1]. Let T =
arg max_{0≤t≤1} B_t. Then, for any t ∈ [0, 1] we have that
    P(T ≤ t) = (2/π) arcsin √t =: F(t) .
Similarly, let L = sup{ t ≤ 1 : B_t = 0 }. Then, for any s ∈ [0, 1] we have that
    P(L ≤ s) = (2/π) arcsin √s = F(s) .
How to generate from this distribution?
Let U ∼ U[0, 1] and
    F(X) = U  ⟺  (2/π) arcsin √X = U  ⟺  X = ( sin( π U / 2 ) )² = 1/2 − (1/2) cos(π U) .
Hence X ∼ F.
230
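A minimal C++ sketch of the exponential example above (the function name is my own choice; u is a U(0,1] sample from any of the generators sketched earlier):

    // Illustrative sketch: exponential sample via the inverse transform method,
    // X = -(1/a) ln(U) with U ~ U(0,1].
    #include <cmath>

    double sampleExponential(double a, double u)
    {
        return -std::log(u) / a;
    }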
10. Acceptance–Rejection Method.
Assume X has pdf f(x) on a set S, g is another pdf on S from which we know how
to generate samples, and f is dominated by a multiple of g on S, i.e. there exists a constant c
such that
    f(x) ≤ c g(x)   for all x ∈ S .
The acceptance–rejection method
  generates a sample X from g and a sample U from U[0, 1], and
  accepts X as a sample from f if U ≤ f(X) / ( c g(X) ), and
  rejects X otherwise, and repeats the process.
Exercise.
Suppose the pdf f is defined over a finite interval [a, b] and is bounded by M. Use
the acceptance–rejection method to generate X from f.
231
Acceptance–Rejection Method.
Let Y be a sample returned by the algorithm.
Then Y has the same distribution as X conditional on U ≤ f(X)/(c g(X)).
So for any measurable subset A ⊆ S, we have
    P(Y ∈ A) = P( X ∈ A | U ≤ f(X)/(c g(X)) )
             = P( X ∈ A & U ≤ f(X)/(c g(X)) ) / P( U ≤ f(X)/(c g(X)) ) .
Now, as U ∼ U[0, 1],
    P( U ≤ f(X)/(c g(X)) ) = ∫_S P( U ≤ f(x)/(c g(x)) ) g(x) dx = ∫_S [ f(x)/(c g(x)) ] g(x) dx
                           = (1/c) ∫_S f(x) dx = 1/c ,
which yields
    P(Y ∈ A) = c P( X ∈ A & U ≤ f(X)/(c g(X)) ) = c ∫_A [ f(x)/(c g(x)) ] g(x) dx = ∫_A f(x) dx .
As A was chosen arbitrarily, we have that Y ∼ f.
232
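A minimal C++ sketch for the exercise above, taking g to be the uniform density on [a, b] so that c = M (b − a); all names are my own and drawU stands for any U[0,1] generator, e.g. the LCG sketched earlier.

    // Illustrative sketch: acceptance-rejection sampling of a pdf f on [a,b] with f <= M,
    // using the uniform proposal g = 1/(b-a), so that f(x) <= c g(x) with c = M (b - a).
    #include <functional>

    double sampleByRejection(std::function<double(double)> f,
                             double a, double b, double M,
                             std::function<double()> drawU)   // returns U[0,1] samples
    {
        for (;;) {
            double x = a + (b - a) * drawU();   // proposal X ~ g
            double u = drawU();
            if (u <= f(x) / M)                  // accept if U <= f(X)/(c g(X)) = f(X)/M
                return x;
        }
    }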
11. Monte Carlo Method for Option Pricing.
A wide class of derivative pricing problems come down to the evaluation of the fol-
lowing expectation
    E [ f( Z(T; t, z) ) ] ,
where Z denotes the stochastic process that describes the price evolution of one or
more underlying financial variables such as asset prices and interest rates, under the
respective risk neutral probability distributions.
The process Z has the initial value z at time t.
The function f specifies the value of the derivative at the expiration time T.
Monte Carlo simulation is a powerful and versatile technique for estimating the ex-
pected value of a random variable.
Examples.
  E[ e^{−r (T−t)} ( S_T − X )^+ ], where S_T = S_t e^{ (r − σ²/2)(T−t) + σ sqrt(T−t) Z }, Z ∼ N(0, 1).
  E[ e^{−r (T−t)} ( max_{t≤τ≤T} S_τ − X )^+ ], where S_τ is as before.
233
12. Simulation Procedure.
(a) Generate sample paths of the underlying state variables (asset prices, interest
    rates, etc.) according to the risk neutral probability distributions.
(b) For each simulated sample path, evaluate the discounted cash flows of the derivative.
(c) Take the sample average of the discounted cash flows over all sample paths.
C++ Exercise: Write a program to simulate paths of stock prices S satisfying the
discretized SDE:
    ΔS = r S Δt + σ S √Δt Z ,
where Z ∼ N(0, 1). The inputs are the initial asset price S, the time horizon T, the
number of partitions n (time-step Δt = T/n), the interest rate r, the volatility σ, and
the number of simulations M. The output is a graph of paths of the stock price (or
the data needed to generate the graph).
234
13. Main Advantage and Drawback.
With the Monte Carlo approach it is easy to price complicated terminal payoff func-
tions such as path-dependent options.
Practitioners often use brute force Monte Carlo simulation to study newly invented
options.
However, the method requires a large number of simulation trials to achieve a high
level of accuracy, which makes it less competitive compared to the binomial and finite
difference methods when the analytic properties of an option pricing model are better
known and formulated.
[Note: The CLT tells us that the Monte Carlo estimate is correct to order O(1/√M). So
to increase the accuracy by a factor of 10, we need to compute 100 times more paths.]
235
14. Computation Efficiency.
Suppose W_total is the total amount of computational work units available to generate
an estimate of the value of the option V. Assume there are two methods for gener-
ating MC estimates, requiring W_1 and W_2 units of computational work for each run.
Assume W_total is divisible by both W_1 and W_2. Denote by V_1^i and V_2^i the samples for
the estimator of V using methods 1 and 2, and by σ_1 and σ_2 their standard deviations.
The sample means for estimating V using W_total amount of work are, respectively,
    (1/N_1) Σ_{i=1}^{N_1} V_1^i   and   (1/N_2) Σ_{i=1}^{N_2} V_2^i ,
where N_i = W_total / W_i, i = 1, 2. The law of large numbers tells us that the above two
estimators are approximately normal with mean V and variances σ_1²/N_1 and σ_2²/N_2.
Hence, method 1 is preferred over method 2 provided that σ_1² W_1 ≤ σ_2² W_2.
236
15. Antithetic Variate Method.
Suppose {ε_i} are independent samples from N(0, 1) for the simulation of asset prices,
so that
    S_T^i = S_t e^{ (r − σ²/2)(T−t) + σ sqrt(T−t) ε_i }
for i = 1, . . . , M, where M is the total number of simulation runs.
An unbiased estimator of the European call option price is given by
    ĉ = (1/M) Σ_{i=1}^M c_i = (1/M) Σ_{i=1}^M e^{−r (T−t)} max( S_T^i − X, 0 ) .
We observe that {−ε_i} are also independent samples from N(0, 1), and therefore the
simulated prices
    S̃_T^i = S_t e^{ (r − σ²/2)(T−t) − σ sqrt(T−t) ε_i }
for i = 1, . . . , M, are a valid sample of the terminal asset price.
237
A new unbiased estimator is given by
    c̃ = (1/M) Σ_{i=1}^M c̃_i = (1/M) Σ_{i=1}^M e^{−r (T−t)} max( S̃_T^i − X, 0 ) .
Normally, we expect c_i and c̃_i to be negatively correlated, i.e. if one estimate over-
shoots the true value, the other estimate undershoots the true value.
The antithetic variate estimate is defined to be
    c̄ = ( ĉ + c̃ ) / 2 .
The antithetic variate method improves the computational efficiency provided that
cov(c_i, c̃_i) ≤ 0.
[Reason: Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y), so
    Var(c̄) = (1/2) Var(ĉ) + (1/2) cov(ĉ, c̃) . ]
238
16. Control Variate Method.
Suppose A and B are two similar options and the analytic price formula for option B
is available.
Let V_A and V̂_A be the true and estimated values of option A; V_B and V̂_B are the similar
notations for option B.
The control variate estimate of option A is defined to be
    Ṽ_A = V̂_A + ( V_B − V̂_B ) ,
where the error V_B − V̂_B is used as an adjustment in the estimation of V_A.
The control variate method reduces the variance of the estimator of V_A when the
options A and B are strongly (positively) correlated.
239
The general control variate estimate of option A is defined to be
    Ṽ_A^β = V̂_A + β ( V_B − V̂_B ) ,
where β ∈ R is a parameter.
The minimum variance of Ṽ_A^β is achieved when
    β* = cov( V̂_A, V̂_B ) / Var( V̂_B ) .
Unfortunately, cov(V̂_A, V̂_B) is in general not available.
One may estimate β* using the regression technique from the simulated option values
V̂_A^i and V̂_B^i, i = 1, . . . , M.
Note:  E( X + β (E(Y) − Y) ) = E(X) + β ( E(Y) − E(Y) ) = E(X)  and
    Var( X + β (E(Y) − Y) ) = Var( X − β Y ) = Var(X) + β² Var(Y) − 2 β cov(X, Y) .
240
17. Other Variance Reduction Procedures.
Other methods include
  importance sampling, which modifies distribution functions to make sampling more
  efficient,
  stratified sampling, which divides the distribution regions and takes samples from
  these regions according to their probabilities,
  moment matching, which adjusts the samples such that their moments are the same
  as those of the distribution functions,
  low-discrepancy sequences, which lead to an estimation error proportional to 1/M rather
  than 1/√M, where M is the sample size.
For details of these methods see Hull (2005) or Glasserman (2004).
241
18. Pricing European Options.
If the payoff is a function of the underlying asset at one specific time, then we can
simplify the above Monte Carlo procedure.
For example, a European call option has a payoff max(S(T) − X, 0) at expiry.
If we assume that S follows a lognormal process, then
    S(T) = S_0 e^{ (r − σ²/2) T + σ √T Z } ,                     (34)
where Z is a standard N(0, 1) random variable.
To value the call option, we only need to simulate Z to get samples of S(T).
There is no need to simulate the whole path of S.
The computational work load is thereby considerably reduced.
Of course, if we want to price path-dependent European options, we will have to
simulate the whole path of the asset process S.
242
C++ Exercise: Write a program to price European call and put options using
Monte Carlo simulation with the antithetic variate method. The standard normal
random variables can be generated by first generating U[0, 1] random variables and
then using the Box–Muller method. The terminal asset prices can be generated by
(34). The inputs are the current asset price S_0, the exercise price X, the volatility
σ, the interest rate r, the exercise time T, and the number of simulations M. The
outputs are European call and put prices at time t = 0 (same strike price).
243
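One possible minimal C++ sketch for this exercise (not the official solution; for brevity it uses the standard library uniform generator in place of a hand-written LCG, and the function name and seed are my own choices):

    // Illustrative sketch: Monte Carlo pricing of a European call and put with
    // antithetic variates, terminal prices generated by (34).
    #include <algorithm>
    #include <cmath>
    #include <random>
    #include <utility>

    // returns (call, put) estimates at time t = 0
    std::pair<double, double> mcPriceAntithetic(double S0, double X, double sigma,
                                                double r, double T, int M)
    {
        std::mt19937 gen(12345);
        std::uniform_real_distribution<double> unif(0.0, 1.0);
        const double pi = 3.14159265358979323846;

        double drift = (r - 0.5 * sigma * sigma) * T, vol = sigma * std::sqrt(T);
        double disc = std::exp(-r * T);
        double callSum = 0.0, putSum = 0.0;

        for (int i = 0; i < M; ++i) {
            // Box-Muller: one N(0,1) sample from two uniforms
            double u1 = 1.0 - unif(gen), u2 = unif(gen);      // u1 lies in (0,1]
            double z  = std::sqrt(-2.0 * std::log(u1)) * std::cos(2.0 * pi * u2);

            double ST1 = S0 * std::exp(drift + vol * z);      // path using  z
            double ST2 = S0 * std::exp(drift - vol * z);      // antithetic path using -z

            callSum += 0.5 * (std::max(ST1 - X, 0.0) + std::max(ST2 - X, 0.0));
            putSum  += 0.5 * (std::max(X - ST1, 0.0) + std::max(X - ST2, 0.0));
        }
        return { disc * callSum / M, disc * putSum / M };
    }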
10. American Option Pricing
1. Formulation.
A general class of American option pricing problems can be formulated by specifying
a Markov process {X(t), 0 ≤ t ≤ T} representing the relevant financial variables such as
underlying asset prices, an option payoff h(X(t)) at time t, an instantaneous short
rate process {r(t), 0 ≤ t ≤ T}, and a class of admissible stopping times T with
values in [0, T].
The American option pricing problem is to find the optimal expected discounted
payoff
    sup_{τ ∈ T} E[ e^{ −∫_0^τ r(u) du } h(X(τ)) ] .
It is implicit that the expectation is taken with respect to the risk-neutral measure.
In this course we assume that the short rate r(t) = r, a non-negative constant, for
0 ≤ t ≤ T.
244
For example, if the option can only be exercised at times 0 < t_1 < t_2 < . . . < t_m =
T (this type of option is often called a Bermudan option), then the value of an
American put can be written as
    sup_{i=1,...,m} E[ e^{−r t_i} (K − S_i)^+ ] ,
where K is the exercise price, S_i the underlying asset price S(t_i), and r the risk-free
interest rate.
245
American Option.
In general, the value of an American option is higher than that of the corresponding
European option.
However, an American call on a non-dividend paying asset has the same value as the
European call.
This follows from the fact that in this case the optimal exercise time is always the
expiration time T, as can be seen from the following argument.
Suppose the holder wants to exercise the call at time t < T, when S(t) > X. Exercising
would give a profit of S(t) − X. Instead, one could keep the option and sell the asset
short at time t, then purchase the asset at time T by either
(a) exercising the option at time T, or
(b) buying at the market price at time T.
Hence the holder gains an amount S(t) > X at time t and pays out min(S(T), X) ≤ X
at time T. This is better than just S(t) − X at time t.
246
2. Dynamic Programming Formulation.
Let h_i denote the payoff function for exercise at time t_i, and V_i(x) the value of the option
at t_i given X_i = x, assuming the option has not previously been exercised.
We are interested in V_0(X_0).
This value is determined recursively as follows:
    V_m(x) = h_m(x) ,
    V_{i−1}(x) = max{ h_{i−1}(x), C_{i−1}(x) } ,   i = m, . . . , 1 ,
where h_i(x) is the immediate exercise value at time t_i,
    C_{i−1}(x) = E[ D_i V_i(X_i) | X_{i−1} = x ]
is the expected discounted continuation value at time t_{i−1}, and
    D_i = e^{−r (t_i − t_{i−1})}
is the discount factor over [t_{i−1}, t_i].
247
3. Stopping Rule.
The optimal stopping time is determined by
    τ = min{ t_i : h_i(X_i) ≥ C_i(X_i), i = 1, . . . , m } .
If Ĉ_i(x) is an approximation to C_i(x), then a sub-optimal stopping time is
determined by
    τ̂ = min{ t_i : h_i(X_i) ≥ Ĉ_i(X_i), i = 1, . . . , m } .
248
4. Binomial Method.
At the terminal exercise time t_m = T, the American option value at node j is given by V^m_j = h^m_j, j = 0, 1, ..., m, where h^i_j = h(S^i_j) is the intrinsic value at (i, j), e.g. h(s) = (K - s)^+.
At time t_i, i = m-1, ..., 0, the option value at node j is given by
V^i_j = max{ h^i_j, (1/R) [ p V^{i+1}_{j+1} + (1 - p) V^{i+1}_j ] }.    (35)
Dynamic programming is used to find the option value at time t_0.
In (35) we have p = (R - d)/(u - d) and the values of p, u, d are determined as usual.
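A short C++ sketch of the recursion (35) for an American put, assuming the usual CRR-type choices u = e^{σ√Δt}, d = 1/u, R = e^{rΔt} (all names and parameter values are illustrative):

// Binomial pricing of an American put via the dynamic programming recursion (35).
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    double S0 = 100.0, K = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
    int m = 500;                                       // number of time steps
    double dt = T / m;
    double u  = std::exp(sigma * std::sqrt(dt)), d = 1.0 / u;
    double R  = std::exp(r * dt);
    double p  = (R - d) / (u - d);                     // risk-neutral up probability

    // Terminal values: V_j^m = (K - S0 u^j d^{m-j})^+.
    std::vector<double> V(m + 1);
    for (int j = 0; j <= m; ++j)
        V[j] = std::max(K - S0 * std::pow(u, j) * std::pow(d, m - j), 0.0);

    // Backward recursion (35): compare continuation with immediate exercise.
    for (int i = m - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j) {
            double cont = (p * V[j + 1] + (1.0 - p) * V[j]) / R;
            double hij  = std::max(K - S0 * std::pow(u, j) * std::pow(d, i - j), 0.0);
            V[j] = std::max(hij, cont);
        }

    std::cout << "American put value V_0 = " << V[0] << std::endl;
    return 0;
}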
249
5. Finite Difference Method.
The dynamic programming approach cannot be applied with the implicit or Crank-Nicolson scheme.
Suppose the difference equation with an implicit scheme has the form
a_{j-1} V_{j-1} + a_j V_j + a_{j+1} V_{j+1} = d_j    (36)
for j = 1, ..., N-1, where the superscript n+1 is omitted for brevity and d_j represents the known quantities.
Recall the SOR algorithm to solve (36) is given by
V^{(k)}_j = V^{(k-1)}_j + (ω/a_j) [ d_j - a_{j-1} V^{(k)}_{j-1} - a_j V^{(k-1)}_j - a_{j+1} V^{(k-1)}_{j+1} ]
for j = 1, ..., N-1 and k = 1, 2, ....
250
Let e.g. h_j = (S^{n+1}_j - K)^+ be the intrinsic value of the American option at node (j, n+1).
The solution procedure to find V^{n+1}_j is then
V^{(k)}_j = max{ h_j, V^{(k-1)}_j + (ω/a_j) ( d_j - a_{j-1} V^{(k)}_{j-1} - a_j V^{(k-1)}_j - a_{j+1} V^{(k-1)}_{j+1} ) }    (37)
for j = 1, ..., N-1 and k = 1, 2, ....
The procedure (37) is called the projected SOR scheme.
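A minimal C++ sketch of the projected SOR sweep (37) for the tridiagonal system (36); the function name, the stopping test and the default ω are my own choices, and the boundary values V_0 and V_N are assumed to be set by the caller:

// Projected SOR for a_{j-1} V_{j-1} + a_j V_j + a_{j+1} V_{j+1} = d_j
// subject to the American constraint V_j >= h_j, cf. (36)-(37).
#include <algorithm>
#include <cmath>
#include <vector>

// a_sub, a_diag, a_sup: coefficients a_{j-1}, a_j, a_{j+1}; d: right-hand side;
// h: intrinsic values; V: initial guess on entry, solution on exit.
void projectedSOR(const std::vector<double>& a_sub, const std::vector<double>& a_diag,
                  const std::vector<double>& a_sup, const std::vector<double>& d,
                  const std::vector<double>& h, std::vector<double>& V,
                  double omega = 1.2, double tol = 1e-8, int maxIter = 10000) {
    int N = static_cast<int>(V.size()) - 1;            // nodes j = 0, ..., N
    for (int k = 0; k < maxIter; ++k) {
        double err = 0.0;
        for (int j = 1; j <= N - 1; ++j) {
            // V[j-1] already holds the new (k) value, V[j], V[j+1] the old (k-1) values.
            double resid = d[j] - a_sub[j] * V[j - 1] - a_diag[j] * V[j] - a_sup[j] * V[j + 1];
            double Vnew  = std::max(h[j], V[j] + omega / a_diag[j] * resid);
            err += (Vnew - V[j]) * (Vnew - V[j]);
            V[j] = Vnew;
        }
        if (std::sqrt(err) < tol) break;                // stop when a sweep changes little
    }
}

With ω = 1 this reduces to a projected Gauss-Seidel iteration.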
251
6. Random Tree Method.
This is a Monte Carlo method based on simulating a tree of paths of the underlying Markov chain X_0, X_1, ..., X_m.
Fix a branching parameter b ≥ 2.
From the initial state X_0 simulate b independent successor states X^1_1, ..., X^b_1, all having the law of X_1.
From each X^i_1 simulate b independent successors X^{i1}_2, ..., X^{ib}_2 from the conditional law of X_2 given X_1 = X^i_1.
From each X^{i_1 i_2}_2 generate b successors X^{i_1 i_2 1}_3, ..., X^{i_1 i_2 b}_3, and so on.
A generic node in the tree at time step i is denoted by X^{j_1 j_2 ... j_i}_i.
At the mth time step there are b^m nodes and the computational cost has exponential growth.
(E.g. if b = 5 and m = 5, there are 3125 nodes; if b = 5 and m = 10, there are about 10 million nodes.)
252
7. High Estimator.
Write V̂^{j_1 j_2 ... j_i}_i for the value of the high estimator at node X^{j_1 j_2 ... j_i}_i.
At the terminal nodes we set
V̂^{j_1 j_2 ... j_m}_m = h_m(X^{j_1 j_2 ... j_m}_m).
Working backwards we get
V̂^{j_1 j_2 ... j_i}_i = max{ h_i(X^{j_1 j_2 ... j_i}_i), (1/b) Σ_{j=1}^b D_{i+1} V̂^{j_1 j_2 ... j_i j}_{i+1} }.
V̂_0 is biased high in the sense that
E[V̂_0] ≥ V_0(X_0)
and converges in probability and in norm to the true value V_0(X_0) as b → ∞.
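An illustrative C++ sketch of the high estimator for a Bermudan put under geometric Brownian motion; the depth-first recursion (which avoids storing the whole tree) and all names are my own, and the companion low estimator would be coded analogously:

// High estimator on a random tree (depth-first), GBM asset, Bermudan put.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>

double S0 = 100.0, K = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
int m = 3, b = 5;                                   // exercise dates, branching parameter
std::mt19937 gen(42);
std::normal_distribution<double> N01(0.0, 1.0);

double payoff(double S) { return std::max(K - S, 0.0); }

// Returns the high estimator at a node with asset price S at time step i.
double highEstimator(double S, int i) {
    if (i == m) return payoff(S);                   // terminal nodes
    double dt = T / m, disc = std::exp(-r * dt);
    double cont = 0.0;
    for (int j = 0; j < b; ++j) {                   // b independent successors
        double Snext = S * std::exp((r - 0.5 * sigma * sigma) * dt
                                    + sigma * std::sqrt(dt) * N01(gen));
        cont += disc * highEstimator(Snext, i + 1);
    }
    cont /= b;
    return std::max(payoff(S), cont);               // max of exercise and continuation
}

int main() {
    std::cout << "high estimator at the root = " << highEstimator(S0, 0) << std::endl;
    return 0;
}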
253
8. Low Estimator.
Write v̂^{j_1 j_2 ... j_i}_i for the value of the low estimator at node X^{j_1 j_2 ... j_i}_i.
At the terminal nodes we set
v̂^{j_1 j_2 ... j_m}_m = h_m(X^{j_1 j_2 ... j_m}_m).
Working backwards, for k = 1, ..., b, we set
v̂^{j_1 j_2 ... j_i}_{i,k} = h_i(X^{j_1 j_2 ... j_i}_i)  if  (1/(b-1)) Σ_{j=1, j≠k}^b D_{i+1} v̂^{j_1 j_2 ... j_i j}_{i+1} ≤ h_i(X^{j_1 j_2 ... j_i}_i),
v̂^{j_1 j_2 ... j_i}_{i,k} = D_{i+1} v̂^{j_1 j_2 ... j_i k}_{i+1}  otherwise;
we then set
v̂^{j_1 j_2 ... j_i}_i = (1/b) Σ_{k=1}^b v̂^{j_1 j_2 ... j_i}_{i,k}.
v̂_0 is biased low and converges in probability and in norm to V_0(X_0) as b → ∞.
254
Example. High Estimator. Low Estimator.
[Figure: a random tree with exercise price K = 100 and interest rate r = 0; each node shows the asset price with the call price in parentheses, together with the resulting high and low estimator values.]
255
9. Confidence Interval.
Let V̂_0(M) denote the sample mean of the M replications of V̂_0, s_V(M) the sample standard deviation, and z_{α/2} the 1 - α/2 quantile of the normal distribution. Then
V̂_0(M) ± z_{α/2} s_V(M)/√M
provides an asymptotically valid (for large M) 1 - α confidence interval for E[V̂_0].
Similarly, we can get the confidence interval for E[v̂_0].
The interval
[ v̂_0(M) - z_{α/2} s_v(M)/√M ,  V̂_0(M) + z_{α/2} s_V(M)/√M ]
contains the unknown value V_0(X_0) with probability of at least 1 - α.
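A small C++ helper computing this combined interval from the M replications of the two estimators (illustrative names; z_{α/2} ≈ 1.96 corresponds to α = 0.05, and both samples are assumed to have the same size M):

// Combined confidence interval for the true value V_0(X_0) from M replications
// of the high estimator ("high") and of the low estimator ("low").
#include <cmath>
#include <utility>
#include <vector>

std::pair<double, double> confidenceInterval(const std::vector<double>& high,
                                             const std::vector<double>& low,
                                             double z = 1.96) {          // z_{alpha/2}
    auto meanSd = [](const std::vector<double>& x) {
        double m = 0.0, s2 = 0.0;
        for (double v : x) m += v;
        m /= x.size();
        for (double v : x) s2 += (v - m) * (v - m);
        s2 /= (x.size() - 1);                                            // sample variance
        return std::make_pair(m, std::sqrt(s2));
    };
    auto [mH, sH] = meanSd(high);
    auto [mL, sL] = meanSd(low);
    double M = static_cast<double>(high.size());
    // lower end from the low estimator, upper end from the high estimator
    return { mL - z * sL / std::sqrt(M), mH + z * sH / std::sqrt(M) };
}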
256
10. Regression Method.
Assume that the continuation value can be expressed as
E[ V_{i+1}(X_{i+1}) | X_i = x ] = Σ_{r=1}^n β_{ir} ψ_r(x) = β_i^T ψ(x),    (38)
where β_i = (β_{i1}, ..., β_{in})^T, ψ(x) = (ψ_1(x), ..., ψ_n(x))^T, and ψ_r, r = 1, ..., n, are basis functions (e.g. polynomials 1, x, x², ...). Then the vector β_i is given by
β_i = B_ψ^{-1} b_{ψV},
where B_ψ = E[ψ(X_i) ψ(X_i)^T] is an n × n matrix (assumed to be non-singular), b_{ψV} = E[ψ(X_i) V_{i+1}(X_{i+1})] is an n column vector, and the expectation is over the joint distribution of (X_i, X_{i+1}).
257
The regression method can be used to estimate β_i as follows:
generate b independent paths (X_{1j}, ..., X_{mj}), j = 1, ..., b, and
suppose that the values V_{i+1}(X_{i+1,j}) are known, then
β̂_i = B̂_ψ^{-1} b̂_{ψV},
where B̂_ψ is an n × n matrix with qr entry
(1/b) Σ_{j=1}^b ψ_q(X_{ij}) ψ_r(X_{ij})
and b̂_{ψV} is an n vector with rth entry
(1/b) Σ_{j=1}^b ψ_r(X_{ij}) V_{i+1}(X_{i+1,j}).
However, V_{i+1} is unknown and must be replaced by some estimated value V̂_{i+1}.
258
11. Pricing Algorithm.
(a) Generate b independent paths {X_{0j}, X_{1j}, ..., X_{mj}}, j = 1, ..., b, of the Markov chain, where X_{0j} = X_0, j = 1, ..., b.
(b) Set V̂_{mj} = h_m(X_{mj}), j = 1, ..., b.
(c) For i = m-1, ..., 0, and given estimated values V̂_{i+1,j}, j = 1, ..., b, use the regression method to calculate β̂_i and set
V̂_{ij} = max{ h_i(X_{ij}), D_{i+1} β̂_i^T ψ(X_{ij}) }.
(d) Set
V̂_0 = (1/b) Σ_{j=1}^b V̂_{0j}.
The regression algorithm is biased high and V̂_0 converges to V_0(X_0) as b → ∞ if the relation (38) holds for all i = 0, ..., m-1.
259
12. Longstaff-Schwartz Algorithm.
Same as 10.11, except in computing V̂_{ij}:
V̂_{ij} = h_i(X_{ij})  if  D_{i+1} β̂_i^T ψ(X_{ij}) ≤ h_i(X_{ij}),
V̂_{ij} = D_{i+1} V̂_{i+1,j}  otherwise.
The LS algorithm is biased low and V̂_0 converges to V_0(X_0) as b → ∞ if the relation (38) holds for all i = 0, ..., m-1.
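A compact C++ sketch of the Longstaff-Schwartz algorithm for a Bermudan put under geometric Brownian motion, with basis functions 1, x, x² (scaled by the strike for conditioning). The path generation, the small normal-equation solver and all names are my own simplifications; as in Longstaff-Schwartz, the regression uses in-the-money paths only, and the degenerate regression at i = 0 (where all paths start at S_0) is skipped:

// Longstaff-Schwartz lower-bound estimator for a Bermudan put under GBM.
#include <algorithm>
#include <array>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

int main() {
    double S0 = 100.0, K = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
    int m = 50;                        // exercise dates
    int b = 20000;                     // simulated paths
    double dt = T / m, disc = std::exp(-r * dt);

    std::mt19937 gen(1);
    std::normal_distribution<double> N01(0.0, 1.0);

    // Simulate b independent paths X_{ij}, i = 0, ..., m.
    std::vector<std::vector<double>> S(b, std::vector<double>(m + 1));
    for (int j = 0; j < b; ++j) {
        S[j][0] = S0;
        for (int i = 1; i <= m; ++i)
            S[j][i] = S[j][i - 1] * std::exp((r - 0.5 * sigma * sigma) * dt
                                             + sigma * std::sqrt(dt) * N01(gen));
    }

    auto payoff = [&](double s) { return std::max(K - s, 0.0); };

    // \hat V_{mj} = h_m(X_{mj}): realized cashflow along each path.
    std::vector<double> V(b);
    for (int j = 0; j < b; ++j) V[j] = payoff(S[j][m]);

    // Backward induction with regression on psi(y) = (1, y, y^2), y = x/K.
    for (int i = m - 1; i >= 1; --i) {
        std::array<std::array<double, 3>, 3> B{};     // normal equations B beta = c
        std::array<double, 3> c{};
        for (int j = 0; j < b; ++j) {
            double x = S[j][i];
            if (payoff(x) <= 0.0) continue;            // in-the-money paths only
            double y = x / K;
            double psi[3] = {1.0, y, y * y};
            for (int q = 0; q < 3; ++q) {
                for (int s = 0; s < 3; ++s) B[q][s] += psi[q] * psi[s];
                c[q] += psi[q] * disc * V[j];
            }
        }
        // Solve the 3x3 system by Gaussian elimination and back substitution.
        std::array<double, 3> beta{};
        for (int p = 0; p < 3; ++p)
            for (int q = p + 1; q < 3; ++q) {
                double f = B[q][p] / B[p][p];
                for (int s = p; s < 3; ++s) B[q][s] -= f * B[p][s];
                c[q] -= f * c[p];
            }
        for (int p = 2; p >= 0; --p) {
            beta[p] = c[p];
            for (int s = p + 1; s < 3; ++s) beta[p] -= B[p][s] * beta[s];
            beta[p] /= B[p][p];
        }
        // Exercise if the immediate payoff beats the regressed continuation value.
        for (int j = 0; j < b; ++j) {
            double x = S[j][i], h = payoff(x), y = x / K;
            double cont = beta[0] + beta[1] * y + beta[2] * y * y;
            V[j] = (h > 0.0 && cont <= h) ? h : disc * V[j];
        }
    }
    double V0 = 0.0;
    for (int j = 0; j < b; ++j) V0 += disc * V[j];     // discount from t_1 to t_0
    std::cout << "Longstaff-Schwartz estimate = " << V0 / b << std::endl;
    return 0;
}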
13. Other Methods. Parametric approximation, state-space partitioning, stochastic mesh method, duality algorithm, and obstacle (free boundary-value) problem formulation.
(See Glasserman (2004) and Seydel (2006) for details.)
260
11. Exotic Option Pricing
1. Introduction.
Derivatives with more complicated payoffs than the standard European or American calls and puts are called exotic options, many of which are so-called path-dependent options.
Some common exotic options are
Asian option: Average price call and put, average strike call and put. Arithmetic or geometric average can be taken.
Barrier option: Up/down-and-in/out call/put. Double barrier, partial barrier and other time-dependent effects possible.
Range forward contract: It is a portfolio of a long call with a higher strike price and a short put with a lower strike price.
Bermudan option: It is a nonstandard American option in which early exercise is restricted to certain fixed times.
261
Compound option: It is an option on an option. There are four basic forms: call-on-call, call-on-put, put-on-call, put-on-put.
Chooser option: The option holders have the right to choose whether the option is a call or a put.
Binary (digital) option: The terminal payoff is a fixed amount of cash if the underlying asset price is above a strike price, and it pays nothing otherwise.
Lookback option: The terminal payoff depends on the maximum or minimum realized asset price over the life of the option.
262
Examples.
Range forward contract: Long call and short put
Assume you have a long position in a call with strike price X_2 and a short position in a put with strike price X_1 < X_2. Then the terminal payoff is given by
(S - X_2)^+ - (X_1 - S)^+ = { S - X_2 if S > X_2;  0 if X_1 ≤ S ≤ X_2;  S - X_1 if S < X_1 }.
[Figure: the terminal payoff as a function of S, with kinks at X_1 and X_2.]
Buy this option, if you do not think that the asset price will go down.
You can generate cash from a price increase.
263
Option price = Call price - Put price
264
Compound option: Call on call option
Timeline: t < T_1 < T_2.
At T_2 the underlying call pays C̃(T_2) = (S(T_2) - X_2)^+; its value at T_1 is C̃(T_1) = C̃(S(T_1), X_2, T_1, ...).
At T_1 the compound call pays C(T_1) = (C̃(T_1) - X_1)^+.
C(t) = ?
Binary (digital) option:
Price = E[ e^{-r T} Q 1_{S_T ≥ X} ] = e^{-r T} Q E[ 1_{S_T ≥ X} ] = e^{-r T} Q P(S_T ≥ X).
The price can be easily computed, since S_T is lognormally distributed.
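For instance, at t = 0 under the lognormal model P(S_T ≥ X) = Φ(d_2) with d_2 = [ln(S_0/X) + (r - σ²/2)T]/(σ√T), so a one-line C++ check reads (illustrative names):

// Cash-or-nothing call paying Q if S_T >= X: price = Q e^{-rT} Phi(d2).
#include <cmath>

double binaryCall(double S0, double X, double Q, double r, double sigma, double T) {
    double d2  = (std::log(S0 / X) + (r - 0.5 * sigma * sigma) * T) / (sigma * std::sqrt(T));
    double Phi = 0.5 * std::erfc(-d2 / std::sqrt(2.0));   // standard normal cdf
    return Q * std::exp(-r * T) * Phi;
}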
265
Lookback Option
Let
m = min_{t ≤ u ≤ T} S(u)  and  M = max_{t ≤ u ≤ T} S(u).
(Floating strike) Call: (S_T - m)^+.
(Floating strike) Put: (M - S_T)^+.
Fixed strike Call: (M - X)^+.
Fixed strike Put: (X - m)^+.
266
2. Barrier Options.
Whether the option is alive at the expiration date depends on whether the underlying asset price has crossed certain values (barriers).
There are four basic forms: down-and-out, up-and-out, down-and-in, up-and-in.
If there is only one barrier then we can derive analytic formulas for standard European barrier options.
However, there are many situations (two barriers, American options, etc.) in which we have to rely on numerical procedures to find the option values.
The standard binomial and trinomial schemes can be used for this purpose, but their convergence is very slow, because the barrier assumed by the tree is different from the true barrier.
267
[Figure: the true barrier and the inner and outer barriers assumed by binomial trees.]
The usual tree calculations implicitly assume that the outer barrier is the true barrier.
268
[Figure: the true barrier and the inner and outer barriers assumed by trinomial trees.]
269
3. Positioning Nodes on Barriers.
Assume that there are two barriers H_1 and H_2 with H_1 < H_2.
Assume that the trinomial scheme is used for the option pricing with m = 1 and d = 1/u.
Choose u such that nodes lie on both barriers.
u must satisfy H_2 = H_1 u^N (and hence H_1 = H_2 d^N), or equivalently
ln H_2 = ln H_1 + N ln u,
with N an integer.
It is known that u = e^{σ√(3 Δt)} is a good choice in the standard trinomial tree, recall the Hull-White model (7.20).
270
A good choice of N is therefore
N = int[ (ln H_2 - ln H_1) / (σ√(3 Δt)) + 0.5 ]
and u is determined by
u = exp[ (1/N) ln(H_2/H_1) ].
Normally, the trinomial tree is constructed so that the central node is the initial asset price S. In this case, the asset price at the first node is the initial asset price. After that, we choose the central node of the tree (beginning with the first period) to be H_1 u^M, where M is an integer that makes this quantity as close as possible to S, i.e.
M = int[ (ln S - ln H_1) / ln u + 0.5 ].
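A small C++ sketch of these choices of N, u and M (struct and function names are illustrative; int[·] is taken as rounding to the nearest integer):

// Choose the trinomial spacing so that nodes lie on both barriers H1 < H2.
#include <cmath>

struct BarrierGrid { int N; double u; int M; };

BarrierGrid positionNodes(double S, double H1, double H2, double sigma, double dt) {
    BarrierGrid g;
    // N = int[ (ln H2 - ln H1) / (sigma sqrt(3 dt)) + 0.5 ]
    g.N = static_cast<int>(std::floor((std::log(H2) - std::log(H1))
                                      / (sigma * std::sqrt(3.0 * dt)) + 0.5));
    // u = exp( (1/N) ln(H2/H1) ), so that H2 = H1 u^N exactly
    g.u = std::exp(std::log(H2 / H1) / g.N);
    // Central nodes at H1 u^M, with M chosen so that H1 u^M is as close as possible to S
    g.M = static_cast<int>(std::floor((std::log(S) - std::log(H1)) / std::log(g.u) + 0.5));
    return g;
}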
271
[Figure: Tree with nodes lying on each of the two barriers H_1 and H_2; the central nodes are at H_1 u^3, close to the initial asset price S.]
272
4. Adaptive Mesh Model.
Computational efficiency can be greatly improved if one projects a high resolution
tree onto a low resolution tree in order to achieve a more detailed modelling of the
asset price in some regions of the tree.
E.g. to price a standard American option, it is useful to have a high resolution tree
near its maturity around the strike price.
To price a barrier option it is useful to have a high resolution tree near its barriers.
See Hull (2005) for details.
273
5. Asian Options.
The terminal payoff depends on some form of averaging of the underlying asset prices over a part or the whole of the life of the option.
There are two types of terminal payoffs:
max(A - X, 0) (average price call) and
max(S_T - A, 0) (average strike call),
where A is the average of asset prices, and there are many different ways of computing A.
From now on we focus on pricing average price calls; the results for average strike calls can be obtained similarly.
We also assume that the asset price S follows a lognormal process in a risk-neutral world.
274
6. Continuous Arithmetic Average.
The average A is computed as
A = (1/T) I(T),
where the function I(t) is defined by
I(t) = ∫_0^t S(u) du.
Assume that V(S, I, t) is the option value at time t. Then V satisfies the following diffusion equation:
∂V/∂t + r S ∂V/∂S + (1/2) σ² S² ∂²V/∂S² - r V + S ∂V/∂I = 0    (39)
with suitable boundary and terminal conditions.
If finite difference methods are used to solve (39), then the schemes are prone to serious oscillations and implicit schemes perform poorly, because one diffusion term is missing in this two-dimensional PDE.
275
7. Continuous Geometric Average.
The average A is computed as
A = exp[ (1/T) I(T) ],
where the function I(t) is defined by
I(t) = ∫_0^t ln S(u) du.
The option value V satisfies the following diffusion equation:
∂V/∂t + r S ∂V/∂S + (1/2) σ² S² ∂²V/∂S² - r V + ln S ∂V/∂I = 0.    (40)
Exercise. Show that if we define a new variable
y = [ I + (T - t) ln S ] / T
and seek a solution of the form V(S, I, t) = F(y, t), then the equation (40) can be reduced to a one-state-variable PDE:
∂F/∂t + (1/2) [(T - t)/T]² σ² ∂²F/∂y² + (r - σ²/2) [(T - t)/T] ∂F/∂y - r F = 0.
276
8. Discrete Geometric Average: Case t ≥ T_0.
Assume that the averaging period is [T_0, T] and that the sampling points are t_i = T_0 + i Δt, i = 1, 2, ..., n, with Δt = (T - T_0)/n.
The average A is computed as
A = ( Π_{i=1}^n S(t_i) )^{1/n}.
Assume
t ≥ T_0,
i.e. the current time is already within the averaging period.
Assume
t = t_k + ξ Δt
for some integer k: 0 ≤ k ≤ n-1 and 0 < ξ ≤ 1.
277
Then ln A is a normal random variable with mean ln S̃(t) + μ_A and variance σ²_A, where
S̃(t) = [ S(t_1) ··· S(t_k) ]^{1/n} S(t)^{(n-k)/n},
μ_A = (r - σ²/2) (T - T_0) [ (n-k)/n² (1 - ξ) + (n-k-1)(n-k)/(2n²) ],
σ²_A = σ² (T - T_0) [ (n-k)²/n³ (1 - ξ) + (n-k-1)(n-k)(2n-2k-1)/(6n³) ].
The European call price at time t is
c(S, t) = e^{-r(T-t)} [ S̃(t) e^{μ_A + σ²_A/2} Φ(d_1) - X Φ(d_2) ]
where
d_1 = (1/σ_A) [ ln(S̃(t)/X) + μ_A ] + σ_A  and  d_2 = d_1 - σ_A.
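A C++ sketch of this closed-form price (all names are my own; pastS denotes the product S(t_1)···S(t_k) of the prices already observed, and the normal cdf is computed via erfc):

// Discrete geometric-average price call for the case t >= T0 (current time inside
// the averaging period).  pastS = S(t_1)*...*S(t_k); t = t_k + xi*dt with 0 < xi <= 1.
#include <cmath>

double normCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

double geomAsianCall(double S, double pastS, int n, int k, double xi,
                     double X, double r, double sigma, double t, double T0, double T) {
    double nd = n, kd = k, tau = T - T0;
    double Stil = std::pow(pastS, 1.0 / nd) * std::pow(S, (nd - kd) / nd);
    double muA  = (r - 0.5 * sigma * sigma) * tau
                  * ((nd - kd) / (nd * nd) * (1.0 - xi)
                     + (nd - kd - 1.0) * (nd - kd) / (2.0 * nd * nd));
    double s2A  = sigma * sigma * tau
                  * ((nd - kd) * (nd - kd) / (nd * nd * nd) * (1.0 - xi)
                     + (nd - kd - 1.0) * (nd - kd) * (2.0 * nd - 2.0 * kd - 1.0)
                       / (6.0 * nd * nd * nd));
    double sA = std::sqrt(s2A);
    double d1 = (std::log(Stil / X) + muA) / sA + sA;
    double d2 = d1 - sA;
    return std::exp(-r * (T - t))
           * (Stil * std::exp(muA + 0.5 * s2A) * normCdf(d1) - X * normCdf(d2));
}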
278
Proof.
t = t_k + ξ Δt, Δt = (T - T_0)/n, 0 ≤ k ≤ n-1, 0 < ξ ≤ 1.
A = [ S(t_1) ··· S(t_n) ]^{1/n}
= ( [S(t_n)/S(t_{n-1})] [S²(t_{n-1})/S²(t_{n-2})] ··· [S^{n-k}(t_{k+1})/S^{n-k}(t)] · S^{n-k}(t) S(t_k) ··· S(t_1) )^{1/n},
where the last factor S^{n-k}(t) S(t_k) ··· S(t_1) is known at time t. Hence
A = ( R_n R²_{n-1} ··· R^{n-k-1}_{k+2} R^{n-k}_t )^{1/n} S̃(t),
with R_i := S(t_i)/S(t_{i-1}) and R_t := S(t_{k+1})/S(t), so that
ln A = ln S̃(t) + (1/n) [ ln R_n + 2 ln R_{n-1} + ... + (n-k-1) ln R_{k+2} + (n-k) ln R_t ].
Since S is a lognormal process (dS = r S dt + σ S dW), we have that ln R_n, ..., ln R_{k+2}, ln R_t are independent normally distributed and
ln R_n, ..., ln R_{k+2} ~ N(μ Δt, σ² Δt),
ln R_t ~ N(μ (t_{k+1} - t), σ² (t_{k+1} - t)) = N(μ (1 - ξ) Δt, σ² (1 - ξ) Δt),
where μ = r - σ²/2.
279
Hence ln A is normal with mean
E[ln A] = E[ln S̃(t)] + (1/n) E[ ln R_n + 2 ln R_{n-1} + ... + (n-k-1) ln R_{k+2} + (n-k) ln R_t ]
= ln S̃(t) + (1/n) [ μ Δt + 2 μ Δt + ... + (n-k-1) μ Δt + (n-k) μ (1 - ξ) Δt ]
= ln S̃(t) + (1/n) μ Δt [ 1 + 2 + ... + (n-k-1) + (n-k)(1 - ξ) ]
= ln S̃(t) + μ Δt [ (n-k-1)(n-k)/(2n) + (n-k)(1 - ξ)/n ]
= ln S̃(t) + μ_A.
Moreover, the variance of ln A is given by
Var(ln A) = (1/n²) [ Var(ln R_n) + ... + (n-k-1)² Var(ln R_{k+2}) + (n-k)² Var(ln R_t) ]
= (1/n²) σ² Δt [ 1² + 2² + ... + (n-k-1)² + (n-k)² (1 - ξ) ]
= σ² Δt [ (n-k-1)(n-k)(2n-2k-1)/(6n²) + (n-k)² (1 - ξ)/n² ]
= σ²_A.
[Used: Σ_{i=1}^n i = n(n+1)/2 and Σ_{i=1}^n i² = n(n+1)(2n+1)/6.]
280
As A = e^{ln A}, A is lognormally distributed and the important formula in 5.5 tells us that
E[(A - X)^+] = E(A) Φ(d_1) - X Φ(d_2),
where d_1 = (1/s) ln(E(A)/X) + s/2 and d_2 = d_1 - s, with s² = Var(ln A).
Now
ln A ~ N( ln S̃(t) + μ_A, σ²_A )
and so, on recalling 5.4,
E(A) = S̃(t) e^{μ_A + σ²_A/2}.
Combining gives
E[(A - X)^+] = S̃(t) e^{μ_A + σ²_A/2} Φ(d_1) - X Φ(d_2),
where
d_1 = (1/σ_A) [ ln(S̃(t)/X) + μ_A + σ²_A/2 ] + σ_A/2 = (1/σ_A) [ ln(S̃(t)/X) + μ_A ] + σ_A
and d_2 = d_1 - σ_A.
281
Finally, the Asian (European) call price is e^{-r(T-t)} E[(A - X)^+].
282
Aside:
Σ_{i=1}^n i = n(n+1)/2  and  Σ_{i=1}^n i² = n(n+1)(2n+1)/6.
Let f(x) = x, then
1/2 = ∫_0^1 f(x) dx = Σ_{i=1}^n ∫_{(i-1)/n}^{i/n} x dx = Σ_{i=1}^n (1/2) x² |_{(i-1)/n}^{i/n}
= Σ_{i=1}^n (1/2) [ (i/n)² - ((i-1)/n)² ] = Σ_{i=1}^n (1/2) [ 2i/n² - 1/n² ]
= (1/n²) Σ_{i=1}^n i - (1/(2n²)) Σ_{i=1}^n 1 = (1/n²) Σ_{i=1}^n i - 1/(2n).
Hence
(1/n²) Σ_{i=1}^n i = 1/2 + 1/(2n) = (n+1)/(2n)  ⟹  Σ_{i=1}^n i = n(n+1)/2.
Similarly, for f(x) = x² we obtain
1/3 = ∫_0^1 f(x) dx = Σ_{i=1}^n ∫_{(i-1)/n}^{i/n} x² dx = ...
283
9. Discrete Geometric Average: Case t < T_0.
Exercise.
Assume
t < T_0,
i.e. the current time is before the averaging period.
Show that ln A is a normal random variable with mean ln S(t) + μ_A and variance σ²_A, where
μ_A = (r - σ²/2) [ (T_0 - t) + (n+1)/(2n) (T - T_0) ],
σ²_A = σ² [ (T_0 - t) + (n+1)(2n+1)/(6n²) (T - T_0) ].
The European call price at time t is
c(S, t) = e^{-r(T-t)} [ S e^{μ_A + σ²_A/2} Φ(d_1) - X Φ(d_2) ]
where d_1 = (1/σ_A) [ ln(S/X) + μ_A ] + σ_A and d_2 = d_1 - σ_A.
284
10. Limiting Case n = 1.
If n = 1 then A = S(T).
Furthermore,
μ_A = (r - σ²/2) (T - t),
σ²_A = σ² (T - t).
The call price reduces to the Black-Scholes call price.
285
A = ( Π_{i=1}^1 S(t_i) )^{1/1} = S(t_1) = S(T), independently of the choice of T_0.
Hence choose e.g. T_0 > t. Then
μ_A = (r - σ²/2) [ (T_0 - t) + (T - T_0) ] = (r - σ²/2) (T - t),
σ²_A = σ² [ (T_0 - t) + (T - T_0) ] = σ² (T - t).
Call price:
c = e^{-r(T-t)} [ S e^{(r - σ²/2)(T-t) + σ²(T-t)/2} Φ(d_1) - X Φ(d_2) ]
= S Φ(d_1) - X e^{-r(T-t)} Φ(d_2),
where
d_1 = (1/σ_A) [ ln(S/X) + μ_A ] + σ_A
= (1/(σ√(T-t))) [ ln(S/X) + (r - σ²/2)(T-t) ] + σ√(T-t)
= (1/(σ√(T-t))) [ ln(S/X) + r(T-t) ] + (1/2) σ√(T-t)
and d_2 = d_1 - σ_A = (1/(σ√(T-t))) [ ln(S/X) + r(T-t) ] - (1/2) σ√(T-t).
286
This is just the Black-Scholes pricing formula (16), see 7.13.
287
11. Limiting Case n = ∞.
If n → ∞ then A → exp[ (1/(T - T_0)) ∫_{T_0}^T ln S(u) du ].
Furthermore, if time t < T_0 then
μ_A → (r - σ²/2) [ (1/2) T + (1/2) T_0 - t ],
σ²_A → σ² [ (1/3) T + (2/3) T_0 - t ].
If time t ≥ T_0 then
S̃(t) → [S(t)]^{(T-t)/(T-T_0)} exp[ (1/(T - T_0)) ∫_{T_0}^t ln S(u) du ],
μ_A → (r - σ²/2) (T - t)² / (2 (T - T_0)),
σ²_A → σ² (T - t)³ / (3 (T - T_0)²).
The above values, together with the discrete average price formulas, then yield pricing formulas for the continuous geometric mean.
289
Proof.
Two cases: t < T_0 and t ≥ T_0. Here we look at the latter.
μ_A = (r - σ²/2) (T - T_0) [ (1/n)(1 - k/n)(1 - ξ) + (1/2)(1 - k/n - 1/n)(1 - k/n) ],
σ²_A = σ² (T - T_0) [ (1/n)(1 - k/n)²(1 - ξ) + (1/6)(1 - k/n - 1/n)(1 - k/n)(2 - 2k/n - 1/n) ].
Since t = t_k + ξ Δt = T_0 + k Δt + ξ Δt = T_0 + k (T-T_0)/n + ξ (T-T_0)/n, we have that
(t - T_0)/(T - T_0) = k/n + ξ/n  ⟹  k/n → (t - T_0)/(T - T_0)  and  1 - k/n → (T - t)/(T - T_0).
Hence, as n → ∞, we have that
μ_A → (r - σ²/2) (T - T_0) [ (1/2) (T - t)²/(T - T_0)² ] = (r - σ²/2) (T - t)²/(2 (T - T_0)),
σ²_A → σ² (T - T_0) [ (1/3) (T - t)³/(T - T_0)³ ] = σ² (T - t)³/(3 (T - T_0)²).
290
Moreover,
S̃(t) = [ S(t_1) ··· S(t_k) ]^{1/n} S(t)^{(n-k)/n}
⟹ ln S̃(t) = (1/n) [ ln S(t_1) + ... + ln S(t_k) ] + ((n-k)/n) ln S(t)
= (1/(T - T_0)) Δt [ ln S(t_1) + ... + ln S(t_k) ] + (1 - k/n) ln S(t).
As t = t_k + ξ Δt → t_k as n → ∞, we have
ln S̃(t) → (1/(T - T_0)) ∫_{T_0}^t ln S(u) du + ((T - t)/(T - T_0)) ln S(t)
= (1/(T - T_0)) ∫_{T_0}^t ln S(u) du + ln[ S(t)^{(T-t)/(T-T_0)} ]
⟹ S̃(t) → S(t)^{(T-t)/(T-T_0)} exp[ (1/(T - T_0)) ∫_{T_0}^t ln S(u) du ].
Note that the average A = [ S(t_1) ··· S(t_n) ]^{1/n} converges to
exp[ (1/(T - T_0)) ∫_{T_0}^T ln S(u) du ]  as n → ∞,
which is just the continuous geometric average.
292
12. Discrete Arithmetic Average.
The average A is computed as
A = (1/n) Σ_{i=1}^n S(t_i).
The option value is difficult to compute because A is not lognormal.
The binomial method can be used but great care must be taken so that the number of nodes does not increase exponentially.
Monte Carlo simulation is possibly the most efficient method.
Since an option with discrete arithmetic average is very similar to that with discrete geometric average, and since the latter has an analytic pricing formula, the control variate method (with the geometric call price as control variate) can be used to reduce the variance of the Monte Carlo simulation.
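A C++ sketch of this control variate estimator with averaging over the whole life of the option (t = T_0 = 0, so the closed-form geometric price follows from the formulas of 11.9); all names and parameter values are illustrative, and the control coefficient is simply taken as 1:

// Arithmetic-average price call by Monte Carlo, with the geometric-average
// call (whose price is known in closed form) as control variate.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>

double normCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

int main() {
    double S0 = 100.0, X = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
    int  n = 50;                 // sampling dates t_i = i*T/n (T0 = 0)
    long M = 20000;              // simulated paths
    double dt = T / n;

    // Closed-form discrete geometric-average call (11.9 with t = T0 = 0).
    double muA = (r - 0.5 * sigma * sigma) * (n + 1.0) / (2.0 * n) * T;
    double s2A = sigma * sigma * (n + 1.0) * (2.0 * n + 1.0) / (6.0 * n * n) * T;
    double sA  = std::sqrt(s2A);
    double d1  = (std::log(S0 / X) + muA) / sA + sA, d2 = d1 - sA;
    double geoExact = std::exp(-r * T)
                      * (S0 * std::exp(muA + 0.5 * s2A) * normCdf(d1) - X * normCdf(d2));

    std::mt19937 gen(7);
    std::normal_distribution<double> N01(0.0, 1.0);
    double sumCV = 0.0;
    for (long p = 0; p < M; ++p) {
        double S = S0, sumS = 0.0, sumLogS = 0.0;
        for (int i = 1; i <= n; ++i) {
            S *= std::exp((r - 0.5 * sigma * sigma) * dt + sigma * std::sqrt(dt) * N01(gen));
            sumS += S;  sumLogS += std::log(S);
        }
        double arith = std::exp(-r * T) * std::max(sumS / n - X, 0.0);
        double geo   = std::exp(-r * T) * std::max(std::exp(sumLogS / n) - X, 0.0);
        sumCV += arith - (geo - geoExact);      // control variate estimator
    }
    std::cout << "arithmetic Asian call (control variate) = " << sumCV / M << std::endl;
    return 0;
}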
293