A fundamental tool in the analysis of DTMC’s and continuous-time Markov processes is the
notion of a martingale. Martingales also underlie the definition we will adopt for stochastic
integrals with respect to Brownian motion. A martingale is, roughly speaking, a real-valued random
sequence that suitably generalizes a random walk with independent, mean-zero increments.
Definition 10.1.1 Let (Mn : n ≥ 0) be a sequence of real-valued random variables. Then, (Mn :
n ≥ 0) is said to be a martingale (with respect to the sequence of random elements (Zn : n ≥ 0)) if:

(i) E|Mn| < ∞ for n ≥ 0;

(ii) for each n ≥ 0, there exists a deterministic function gn such that Mn = gn(Z0, Z1, . . . , Zn);

(iii) E[Mn+1 | Z0, . . . , Zn] = Mn for n ≥ 0.
Remark 10.1.1 When a process (Mn : n ≥ 0) satisfies condition (ii), one says that (Mn : n ≥ 0)
is adapted to (Zn : n ≥ 0).
The critical component of the martingale definition is condition (iii). If we view Mn as the
fortune of a gambler at time n, then condition (iii) is asserting that the gambler is involved in
playing a “fair game”, in which he/she has no propensity (in expectation) to either win or lose on
any given gamble. As we asserted earlier, a random walk with independent mean-zero increments
is a martingale. To see this, let S0 , X1 , X2 , . . . be independent random variables with finite mean,
and suppose that EXi = 0 for i ≥ 1. Set Zn = Sn = S0 + X1 + · · · + Xn . Then, conditions (i) and
(ii) of Definition 10.1.1 are trivial to verify. For condition (iii), observe that

E[Sn+1 | Z0, . . . , Zn] = Sn + E[Xn+1 | Z0, . . . , Zn] = Sn + EXn+1 = Sn.
Martingales inherit many of the properties of mean-zero random walks. In view of the analogy with
random walks, it is natural to consider the increments
Di = Mi − Mi−1,  i ≥ 1,
§ SECTION 10: MARTINGALES
namely, the martingale differences. The following proposition is a clear generalization of two of the
most important properties of mean-zero random walks.
Proposition 10.1.1 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0), with
EMn^2 < ∞ for n ≥ 0. Then,

E[Dn+1 | Z0, . . . , Zn] = 0,  n ≥ 0,  (10.1.1)

and

Cov(Di, Dj) = 0,  i ≠ j,  (10.1.2)

so that

Var[Mn] = Var[M0] + Σ_{i=1}^{n} Var[Di].  (10.1.3)
Proof: Relation (10.1.1) is immediate from condition (iii) of the martingale definition. For
(10.1.2), note that (10.1.1) implies that EDi = 0, so that (10.1.2) is equivalent to asserting that
E[Di Dj] = 0 for i < j. But

E[Di Dj | Z0, . . . , Zj−1] = Di E[Dj | Z0, . . . , Zj−1] = 0,

where condition (ii) of the martingale definition was used for the first equality (Di is a function of
Z0, . . . , Zj−1), and condition (iii) was used for the final step. Taking expectations with respect to
(Z0, . . . , Zj−1), we get (10.1.2). Finally, (10.1.3) is immediate from (10.1.2).
Definition 10.1.2 A martingale (Mn : n ≥ 0) for which EMn^2 < ∞ for n ≥ 0 is called a
square-integrable martingale.
Before we turn to exploring further properties of martingales, let us develop some additional
examples of martingales in the random walk setting.
Example 10.1.1 Let (Xn : n ≥ 1) be a sequence of iid mean-zero random variables with finite
variance σ^2. Let Sn = X1 + · · · + Xn (with S0 = 0) and let

Mn = Sn^2 − nσ^2.

Then (Mn : n ≥ 0) is a martingale with respect to (Sn : n ≥ 0). The critical property to verify is
(iii). Note that

E[Mn+1 | S0, . . . , Sn] = E[(Sn + Xn+1)^2 − (n + 1)σ^2 | S0, . . . , Sn]
= Sn^2 + 2Sn EXn+1 + EXn+1^2 − (n + 1)σ^2 = Sn^2 − nσ^2 = Mn.
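Since EM0 = 0 here, the martingale property forces EMn = 0 for every n, which is easy to check by simulation. The following sketch uses an illustrative ±1 step distribution (so σ^2 = 1); the step law and sample sizes are choices made here, not taken from the text.

```python
import numpy as np

# Monte Carlo sanity check for Example 10.1.1: with S0 = 0 and iid
# mean-zero steps, M_n = S_n^2 - n*sigma^2 should satisfy E[M_n] = 0.
# The +/-1 step distribution (sigma^2 = 1) is an illustrative choice.
rng = np.random.default_rng(0)
paths, n = 200_000, 50
X = rng.choice([-1.0, 1.0], size=(paths, n))
S = np.cumsum(X, axis=1)
for k in (10, 25, 50):
    M_k = S[:, k - 1] ** 2 - k      # M_k = S_k^2 - k*sigma^2 with sigma^2 = 1
    print(k, M_k.mean())            # each sample mean should be near 0
```

Only the constancy of the mean is checked here; the martingale property itself is of course a stronger, conditional statement.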
Example 10.1.2 Let (Xn : n ≥ 1) be a sequence of iid random variables with common density g.
Suppose that f is another density with the property that whenever g(x) = 0, then f (x) = 0. Set
L0 = 1 and

Ln = Π_{i=1}^{n} f(Xi)/g(Xi),  n ≥ 1.
Then, (Ln : n ≥ 0) is a martingale with respect to (Xn : n ≥ 1). Again, the critical property is
verifying (iii). Here,
E[Ln+1 | X1, . . . , Xn] = E[ Ln f(Xn+1)/g(Xn+1) | X1, . . . , Xn ] = Ln E[ f(Xn+1)/g(Xn+1) ]
= Ln ∫ ( f(x)/g(x) ) g(x) dx = Ln ∫ f(x) dx = Ln.
To show why the likelihood ratio martingale arises naturally, suppose that we have observed an
iid sample from a population, yielding observations X1 , X2 , . . . , Xn . Assume that the underlying
population is known to be iid, either with common density f or with common density g. To test
the hypothesis that the Xi ’s have common density f (the “f -hypothesis”) against the hypothesis
that the Xi ’s have common density g (the “g-hypothesis”), the Neyman-Pearson lemma asserts
that we should accept the “f -hypothesis” if the relative likelihood
f(X1) · · · f(Xn) / ( g(X1) · · · g(Xn) )  (10.1.4)
is sufficiently large, and reject it otherwise. So, studying Ln in the case where the Xi ’s have common
density g corresponds to studying the test statistic (10.1.4) when the “state of nature” is that the
"g-hypothesis" is true. Given this interpretation, it seems natural to expect that Ln converges to
zero as the sample size n goes to infinity: for a large sample size n, it is extremely unlikely that
such a sample will be better explained by the "f-hypothesis" than by the other one. The fact that
Ln ought to go to zero as n → ∞ is perhaps a bit surprising, given that
ELn = 1 for n ≥ 0.
To prove that Ln → 0 almost surely as n → ∞, note that

log Ln = Σ_{i=1}^{n} log( f(Xi)/g(Xi) ).

The summands are iid, so the strong law of large numbers gives

(1/n) log Ln → ∫ log( f(x)/g(x) ) g(x) dx  a.s.  (10.1.5)

(The right-hand side of (10.1.5) is the negative of what is known as a relative entropy.) Since log
is strictly concave, Jensen's inequality asserts that if f ≠ g,

E[ log( f(Xi)/g(Xi) ) ] < log E[ f(Xi)/g(Xi) ] = log 1 = 0.  (10.1.6)
As a consequence, not only does Ln converge to zero a.s. as n → ∞, but the rate of convergence
is exponentially fast. It is worth noting that this is an example of a sequence of random variables
(Ln : n ≥ 0) for which Ln → 0 a.s. and yet ELn ↛ 0 as n → ∞ (in other words, passing limits
through expectations is not always valid).
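This dichotomy is easy to see numerically. In the sketch below the true density is g = N(0, 1) and f = N(1/2, 1), an illustrative pair not taken from the text; then log(f(x)/g(x)) = x/2 − 1/8, and the a.s. limit in (10.1.5) equals −1/8.

```python
import numpy as np

# Simulation of the likelihood ratio martingale: X_i drawn from g = N(0,1),
# with f = N(1/2,1), so log(f(x)/g(x)) = x/2 - 1/8 (illustrative choice).
# Every path's L_n collapses to ~0 even though E[L_n] = 1 exactly.
rng = np.random.default_rng(1)
paths, n = 2_000, 1_000
X = rng.standard_normal((paths, n))        # iid sample from g
log_L = (X / 2 - 0.125).cumsum(axis=1)     # log L_1, ..., log L_n per path

print(np.exp(log_L[:, -1]).max())          # L_n is tiny on every path
print(log_L[:, -1].mean() / n)             # close to -1/8, the a.s. rate
```

The sample average of Ln itself is a poor estimate of ELn = 1 for large n, precisely because the expectation is carried by vanishingly rare paths.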
Example 10.1.3 In this example, we specialize the likelihood ratio martingale a bit. Suppose that
the Xi ’s are iid with common density g, and suppose that the moment generating function mX (θ) =
EeθXi converges in some neighborhood of the origin. For θ within the domain of convergence of
mX(·), let

f(x) = e^{θx} g(x) / mX(θ),

or, equivalently, f(x) = e^{θx − ψ(θ)} g(x), where ψ(θ) = log mX(θ). In this case,

Ln = Π_{i=1}^{n} f(Xi)/g(Xi) = e^{θSn − nψ(θ)}.  (10.1.7)
Some indication of the power of this martingale should be apparent, if we explicitly display the
dependence of Ln on θ as follows:
Ln (θ) = eθSn −nψ(θ)
The defining property (iii) of a martingale asserts that

E[Ln+1(θ) | S0, . . . , Sn] = Ln(θ).

For θ inside the domain of convergence of mX(·), one can interchange the derivative and expectation,
yielding

E[L′n+1(θ) | S0, . . . , Sn] = L′n(θ).

In particular, (L′n(0) : n ≥ 0) is a martingale. But

L′n(0) = Sn − nψ′(0).

It turns out that ψ′(0) = EX1. So, by differentiating our exponential martingale, we retrieve the
random walk martingale. And by differentiating a second time, it turns out that L″n(0) is the
martingale of Example 10.1.1. Through successive differentiation, we can obtain a whole infinite
family of such martingales.
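These observations can be checked numerically. In the sketch below, g is taken to be the N(0, 1) density (an illustrative choice), so that ψ(θ) = θ^2/2 and Sn is exactly N(0, n); the sample means of Ln(θ), L′n(0) = Sn, and L″n(0) = Sn^2 − n should then be near 1, 0, and 0, respectively.

```python
import numpy as np

# Numerical check of the exponential martingale family when g is the
# N(0,1) density, so psi(theta) = theta^2 / 2 (an illustrative choice).
rng = np.random.default_rng(2)
paths, n, theta = 400_000, 25, 0.2
S = np.sqrt(n) * rng.standard_normal(paths)   # S_n ~ N(0, n) exactly here

L = np.exp(theta * S - n * theta**2 / 2)      # L_n(theta): mean should be 1
print(L.mean())
print(S.mean())                               # L'_n(0) = S_n: mean 0
print((S**2 - n).mean())                      # L''_n(0) = S_n^2 - n: mean 0
```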
We now turn to a fundamental result in the theory of martingales known as the Martingale
Convergence Theorem.

Theorem 10.1.1 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0) for which
sup_{n≥0} EMn^2 < ∞. Then there exists a square-integrable random variable M∞ such that
Mn → M∞ in L2 as n → ∞.
Proof: The space L2 of square-integrable random variables is a Hilbert space under the inner
product ⟨X, Y⟩ = E[XY]. Since

EMn^2 = EM0^2 + Σ_{i=1}^{n} EDi^2,

it follows that Σ_{i=1}^{∞} EDi^2 < ∞. For ε > 0, choose m = m(ε) so that Σ_{i=m}^{∞} EDi^2 < ε.
Then, for n2 > n1 ≥ m,

E(Mn2 − Mn1)^2 = Σ_{j=n1+1}^{n2} EDj^2 < ε,

so that (Mn : n ≥ 0) is a Cauchy sequence in L2. Then, the completeness of L2 yields the conclusion
of the theorem.
Actually, one does not need square integrability in order that the Martingale Convergence
Theorem hold.

Theorem 10.1.2 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0) for which
sup_{n≥0} E|Mn| < ∞. Then there exists a finite-valued random variable M∞ such that Mn → M∞
a.s. as n → ∞.

For a proof, see p. 233 of "Probability: Theory and Examples" 3rd ed. by R. Durrett.
We conclude this section with a brief discussion of stochastic integrals in discrete time. Let
(Mn : n ≥ 0) be a square-integrable martingale with respect to (Zn : n ≥ 0). Suppose that
(Wn : n ≥ 0) is a sequence of random variables that is adapted to (Zn : n ≥ 0). We define the
stochastic integral of (Wn : n ≥ 0) with respect to (Mn : n ≥ 0) as the sequence

Vn = Σ_{i=1}^{n} Wi−1 Di = Σ_{i=1}^{n} Wi−1 (Mi − Mi−1) = Σ_{i=1}^{n} Wi−1 ΔMi.

We could also have defined the stochastic integral here as Σ_{i=1}^{n} Wi ΔMi. But in that case,
we would lose the nice properties listed below.
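The role of the predictable integrand Wi−1 can be seen in a small simulation: with Wi−1 the integral has mean zero, while replacing it by Wi introduces a systematic bias. The choices Wi = sign(Mi) and ±1 martingale differences below are illustrative, not from the text.

```python
import numpy as np

# Predictable vs anticipating integrands in the discrete stochastic
# integral.  W_i = sign(M_i) and +/-1 differences are illustrative.
rng = np.random.default_rng(3)
paths, n = 100_000, 40
dM = rng.choice([-1.0, 1.0], size=(paths, n))       # martingale differences
M = np.cumsum(dM, axis=1)
W = np.sign(np.hstack([np.zeros((paths, 1)), M]))   # W_0 = 0, W_i = sign(M_i)

V_pred = (W[:, :-1] * dM).sum(axis=1)   # sum W_{i-1} dM_i: a martingale
V_ant  = (W[:, 1:]  * dM).sum(axis=1)   # sum W_i dM_i: not a martingale

print(V_pred.mean())   # near 0
print(V_ant.mean())    # strictly positive bias
```

The anticipating sum "peeks" at the sign of the step it multiplies, which is exactly what the predictable convention rules out.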
Exercise 10.1.2 Let (Mn : n ≥ 0) be a square-integrable martingale with respect to (Zn : n ≥ 0),
with M0 = 0. Suppose (Wn : n ≥ 0) is a square-integrable sequence that is adapted to (Zn : n ≥ 0).

(a) Prove that if V0 = 0 and Vn = Σ_{i=1}^{n} Wi−1 ΔMi for n ≥ 1, then (Vn : n ≥ 0) is a
martingale with respect to (Zn : n ≥ 0).

(b) Suppose that the martingale differences (Di : i ≥ 1) are a stationary sequence of independent
random variables. Show that EVn^2 = σ^2 Σ_{i=0}^{n−1} EWi^2, where σ^2 = Var[Di].
Example 10.2.1 Let (Sn : n ≥ 0) be a random walk with S0 = 0 and iid increments (Xn : n ≥ 1)
defined by

P(Xn = 1) = P(Xn = −1) = 1/2.

Put T = inf{n ≥ 0 : Sn = 1}. Since (Sn : n ≥ 0) is null recurrent, T < ∞ a.s. and ST = 1.
Therefore, EST = 1 and ES0 = 0, and so EST ≠ ES0. Hence, the class of stopping times needs to
be restricted somewhat.
Theorem 10.2.1 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0). Suppose that
T is a bounded random variable that is a stopping time with respect to (Zn : n ≥ 0). Then
EMT = EM0 .
Proof: Let m be such that P(T ≤ m) = 1. Then MT = M0 + Σ_{i=1}^{m} Di I(T ≥ i), and thus

EMT = EM0 + Σ_{i=1}^{m} E[Di I(T ≥ i)].  (10.2.3)

Because T is a stopping time, I(T ≥ i) = 1 − I(T ≤ i − 1) is a function of (Z0, . . . , Zi−1), so
E[Di I(T ≥ i) | Z0, . . . , Zi−1] = I(T ≥ i) E[Di | Z0, . . . , Zi−1] = 0, and so

Σ_{i=1}^{m} E[Di I(T ≥ i)] = 0.
If T is a stopping time, then T ∧ n is a stopping time for n ≥ 0 (and is clearly bounded). So,
optional sampling applies at T ∧ n (see Theorem 10.2.1), i.e. EMT∧n = EM0 for n ≥ 0.
If T < ∞ a.s., then MT∧n → MT a.s. as n → ∞. Hence, if

EMT∧n → EMT as n → ∞,  (10.2.4)

then

EMT = EM0.  (10.2.2)
Therefore, the key to establishing (10.2.2) is (10.2.4). There are various results which one
can invoke to justify (10.2.4); the most powerful of these results is the Dominated Convergence
Theorem. To apply this result, we need to find a random variable W having finite mean, such that
|MT ∧n | ≤ W for n ≥ 0. The obvious candidate for W is
W = |M0| + Σ_{i=1}^{T} D̃i,  (10.2.5)

where D̃i = |Di|.
Proposition 10.2.1 Suppose that there exists c < ∞ such that P(D̃i ≤ c) = 1 for i ≥ 1. If
ET < ∞, then

EMT = EM0.
Proof: Note that W ≤ |M0| + cT. Since ET < ∞, EW < ∞. The Dominated Convergence
Theorem then implies that EMT∧n → EMT as n → ∞, yielding the result.
Now, let’s turn to an application of optional sampling.
Application 10.2.1 Let (Sn : n ≥ 0) be a random walk with S0 = 0 and iid increments (Xn : n ≥ 1)
defined by

P(Xn = 1) = P(Xn = −1) = 1/2.

Let T = inf{n ≥ 0 : Sn ≤ −a or Sn ≥ b} be the "exit time" from [−a, b]. Suppose that we wish
to compute P(ST = −a), the probability that the random walk exits through the left boundary. (This
is basically the "gambler's ruin" computation for the probability of ruin.) Note that D̃i = 1 and
ET < ∞ (see Exercise 10.2.1). Hence, Proposition 10.2.1 applies and EST = 0. However,

0 = EST = −a P(ST = −a) + b P(ST = b) = −a P(ST = −a) + b(1 − P(ST = −a)),

so that

P(ST = −a) = b/(a + b).
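A Monte Carlo sketch confirms this exit probability, and (anticipating Application 10.2.2) also estimates ET, whose value turns out to be ab; the parameters a = 3, b = 5 are illustrative.

```python
import numpy as np

# Monte Carlo check for the symmetric walk exiting [-a, b]:
# P(S_T = -a) = b/(a+b) and E[T] = a*b.  a = 3, b = 5 are illustrative.
rng = np.random.default_rng(4)
a, b, paths = 3, 5, 20_000
hits_left = 0
total_steps = 0
for _ in range(paths):
    s, t = 0, 0
    while -a < s < b:
        s += 1 if rng.random() < 0.5 else -1
        t += 1
    hits_left += (s == -a)
    total_steps += t
print(hits_left / paths)    # ~ b/(a+b) = 0.625
print(total_steps / paths)  # ~ a*b = 15
```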
Application 10.2.2 In this continuation of Application 10.2.1, we wish to compute ET. (In the
gambler's ruin setting, this is the mean duration of the game.) Let Mn = Sn^2 − nσ^2, where
σ^2 = Var Xi = 1. Assuming that (10.2.2) holds,

0 = EM0 = EMT = EST^2 − ET,

so that

ET = EST^2 = a^2 · b/(a + b) + b^2 · a/(a + b) = ab.
Clearly, the D̃i do not satisfy the hypotheses of Proposition 10.2.1 (here Di = 2Si−1 Xi, so the
D̃i are unbounded), so something else is needed here. The following variant suffices.

Proposition 10.2.2 Suppose that there exists c < ∞ such that E[D̃i | Z0, . . . , Zi−1] ≤ c a.s. on
{T ≥ i} for i ≥ 1. If ET < ∞, then EMT = EM0.
Proof: Note that EW = E|M0| + E[ Σ_{i=1}^{∞} D̃i I(T ≥ i) ] = E|M0| + Σ_{i=1}^{∞} E[D̃i I(T ≥ i)].
However,

E[D̃i I(T ≥ i)] = E[ I(T ≥ i) E[D̃i | Z0, . . . , Zi−1] ] ≤ c E[I(T ≥ i)].

Thus, EW ≤ E|M0| + c Σ_{i=1}^{∞} E[I(T ≥ i)] = E|M0| + cET < ∞, and consequently the Dominated
Convergence Theorem applies.
Application 10.2.3 Here, for the walk with P(Xn = 1) = p = 1 − P(Xn = −1), mX(θ) =
pe^θ + (1 − p)e^{−θ}, so ψ(θ) = log(pe^θ + (1 − p)e^{−θ}). Then, the martingale of interest is
Ln(θ) = e^{θSn − nψ(θ)}. Assuming that optional sampling applies at time T, we arrive at
ELT(θ) = 1, or, in other words,

E[e^{θST − Tψ(θ)}] = 1.  (10.2.7)

To compute the exit probabilities from [−a, b], it is desirable to eliminate the term Tψ(θ) from the
exponent of (10.2.7).

Recall that ψ is convex (see Exercise 10.1.1). For p ≠ 1/2, there exists a unique θ* ≠ 0 such that
ψ(θ*) = 0, given by

θ* = log( (1 − p)/p ).
Substituting θ = θ* into (10.2.7), we get E[e^{θ*ST}] = 1. But E[e^{θ*ST}] = e^{−θ*a} P(ST = −a) +
e^{θ*b} P(ST = b). Hence,

P(ST = −a) = ( 1 − ((1 − p)/p)^b ) / ( (p/(1 − p))^a − ((1 − p)/p)^b ).
(This is basically the probability of ruin in a gambler’s ruin problem that is not fair.)
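The formula can be checked by simulation; the values p = 0.6, a = 3, b = 5 below are illustrative.

```python
import numpy as np

# Monte Carlo check of the biased exit probability: with r = (1-p)/p,
# P(S_T = -a) = (1 - r**b) / (r**(-a) - r**b).  Parameters illustrative.
rng = np.random.default_rng(5)
p, a, b, paths = 0.6, 3, 5, 40_000
r = (1 - p) / p
predicted = (1 - r**b) / (r**(-a) - r**b)

hits_left = 0
for _ in range(paths):
    s = 0
    while -a < s < b:
        s += 1 if rng.random() < p else -1
    hits_left += (s == -a)
print(hits_left / paths, predicted)   # the two should agree
```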
Exercise 10.2.2 Rigorously apply the optional sampling theorem in Application 10.2.3.
Application 10.2.4 Let (Sn : n ≥ 0) be a random walk with S0 = 0 and iid increments (Xn : n ≥
1) given by
P(Xn = 1) = p = 1 − P(Xn = −1)
with p > 1/2. This is a walk with positive drift, so that T < ∞ a.s. if we set T = inf{n ≥ 0 : Sn ≥
b}. Our goal here is to compute the moment generating function of T , using martingale methods.
Assuming that we can invoke the optional sampling theorem at T, E[e^{θST − Tψ(θ)}] = 1. Since
ST = b here,

E[e^{−Tψ(θ)}] = e^{−θb},  (10.2.8)

so that, setting γ = ψ(θ), Ee^{−γT} = e^{−ψ^{−1}(γ)b} is the moment generating function of T. (In
computing ψ^{−1}(γ), one may find multiple roots; to formally determine the appropriate root, note
that the function Ee^{−γT} of the non-negative random variable T must be non-increasing in γ.)
To make this result rigorous, note that if p > 1/2, then ψ′(0) > 0. The convexity of ψ(·) then
guarantees that ψ(θ) > 0 for θ > 0. Consequently, for θ > 0,

e^{θST∧n − (T∧n)ψ(θ)} ≤ e^{θb},

so the Dominated Convergence Theorem ensures that (10.2.8) holds for θ > 0. Then for γ > 0, let
η = ψ^{−1}(γ) be the non-negative root of

ψ(η) = γ.  (10.2.9)
Relation (10.2.9) yields the expression

Ee^{−γT} = e^{−ψ^{−1}(γ)b} = e^{−ηb},

where ψ^{−1} is defined as above. Note that a rigorous application of optional sampling theory has
led us to the correct choice of root for the equation (10.2.9).
A similar analysis is possible for the one-sided hitting time T = inf{n ≥ 0 : Sn ≤ −a} with
a > 0. Since p > 1/2, T is infinite with positive probability in this case. Again, consider the
sequence e^{θST∧n − ψ(θ)(T∧n)}. Note that if θ < 0 and ψ(θ) > 0, this sequence is bounded above by
e^{−θa}. Hence, we may interchange limits and expectations in the expression

e^{−θa} E[e^{−Tψ(θ)} I(T ≤ n)] + E[e^{θSn − nψ(θ)} I(T > n)] = 1,

yielding E[e^{−Tψ(θ)}; T < ∞] = e^{θa}. Letting θ increase to θ* = log((1 − p)/p), for which
ψ(θ*) = 0, gives

P(T < ∞) = e^{θ*a} = ((1 − p)/p)^a.

In other words, we have computed the probability that a positive drift "nearest neighbor" random
walk ever drops below −a.
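A truncated-horizon simulation agrees with this probability. The parameters p = 0.6 and a = 2, and the finite horizon standing in for "ever", are illustrative choices; with drift 2p − 1 = 0.2, visits to −a after the chosen horizon are negligibly likely.

```python
import numpy as np

# Monte Carlo check that a drift-up nearest-neighbor walk ever reaches
# level -a with probability ((1-p)/p)**a.  The 200-step horizon is an
# illustrative truncation of the infinite-time event.
rng = np.random.default_rng(6)
p, a, paths, horizon = 0.6, 2, 50_000, 200
steps = np.where(rng.random((paths, horizon)) < p, 1.0, -1.0)
ever = (np.cumsum(steps, axis=1).min(axis=1) <= -a).mean()
print(ever, ((1 - p) / p) ** a)   # both near (2/3)^2 = 4/9
```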
The theory of optional sampling extends beyond the martingale setting to supermartingales
and submartingales.
Definition 10.2.1 Let (Mn : n ≥ 0) be an integrable sequence of random variables that is adapted
to (Zn : n ≥ 0). If for n ≥ 0,

E[Mn+1 | Z0, . . . , Zn] ≤ Mn,

then (Mn : n ≥ 0) is said to be a supermartingale with respect to (Zn : n ≥ 0). On the other hand,
if for n ≥ 0,

E[Mn+1 | Z0, . . . , Zn] ≥ Mn,

then (Mn : n ≥ 0) is said to be a submartingale with respect to (Zn : n ≥ 0).
Proposition 10.2.3 Let T be a stopping time with respect to (Zn : n ≥ 0). If (Mn : n ≥ 0) is
a supermartingale with respect to (Zn : n ≥ 0), then

EMT∧n ≤ EM0,  n ≥ 0.

On the other hand, if (Mn : n ≥ 0) is a submartingale with respect to (Zn : n ≥ 0), then

EMT∧n ≥ EM0,  n ≥ 0.
Exercise 10.2.4 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0). Suppose that
φ : R → R is a convex function for which E|φ(Mn )| < ∞ for n ≥ 0. Prove that (φ(Mn ) : n ≥ 0) is
a submartingale with respect to (Zn : n ≥ 0).
In this section, we show how the random walk martingales introduced earlier generalize to the
DTMC setting. Each of the martingales constructed here will have natural analogs in the SDE
context.
Let (Yn : n ≥ 0) be a real-valued sequence of random variables, not necessarily Markov. A
standard trick for constructing a martingale in this very general setting is to set Di = Yi −
E[Yi | Y0, . . . , Yi−1] for i ≥ 1. Assuming that the Yi's are integrable, the Di's are martingale
differences with respect to the Yi's. Hence,

Mn = Σ_{i=1}^{n} [ Yi − E[Yi | Y0, . . . , Yi−1] ]
is a martingale. The same kind of idea works nicely in the DTMC setting. For f : S → R that is
bounded, note that

Di = f(Xi) − E[f(Xi) | X0, . . . , Xi−1] = f(Xi) − E[f(Xi) | Xi−1] = f(Xi) − (Pf)(Xi−1)
is a martingale difference with respect to (Xi : i ≥ 0). Hence,
M̃n = Σ_{i=1}^{n} [ f(Xi) − (Pf)(Xi−1) ]

is a martingale. Writing A = P − I, it follows easily that Mn = f(Xn) − Σ_{i=0}^{n−1} (Af)(Xi) =
M̃n + f(X0) is a martingale whenever f is bounded. We have proved the following result.
Proposition 10.3.1 For f : S → R bounded, Mn = f(Xn) − Σ_{i=0}^{n−1} (Af)(Xi) is a martingale
with respect to (Xn : n ≥ 0).
This martingale is known as the Dynkin martingale. Viewing (Af )(Xi ) as the increment of a
random walk-type process, this is clearly the DTMC analog to the random walk martingale.
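As a sanity check, the Dynkin martingale's constant expectation EMn = EM0 = f(X0) can be verified by simulation for a small chain; the 3-state transition matrix P and the function f below are illustrative choices, not from the text.

```python
import numpy as np

# Simulation of the Dynkin martingale M_n = f(X_n) - sum_{i<n} (Af)(X_i)
# for an illustrative 3-state chain started at state 0.
rng = np.random.default_rng(7)
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, -2.0, 0.5])
Af = P @ f - f                       # A = P - I acting on f

paths, n = 200_000, 12
x = np.zeros(paths, dtype=int)       # X_0 = 0 on every path
comp = np.zeros(paths)               # running compensator sum_{i<n} (Af)(X_i)
for _ in range(n):
    comp += Af[x]
    # sample X_{i+1} ~ P(X_i, .) by inverting the row cdf
    x = (rng.random(paths)[:, None] < P[x].cumsum(axis=1)).argmax(axis=1)
M = f[x] - comp
print(M.mean())                      # should be near E[M_0] = f(0) = 1.0
```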
Suppose that Af = 0; such an f is said to be harmonic for the chain. Then Proposition 10.3.1
implies that (f(Xn) : n ≥ 0) is a martingale with respect to (Xn : n ≥ 0).
The term “harmonic function” is widely used in the analysis literature. It refers to functions
f : Rd → R for which ∆f = 0, where
∆ = ∂²/∂x1² + ∂²/∂x2² + · · · + ∂²/∂xd².
(The operator ∆ is known as the “Laplacian operator”.) Note that if the Markov chain X corre-
sponds (for example) to simple random walk on the lattice plane, then
P((x1, y1), (x2, y2)) = 1/4 if (x2, y2) ∈ {(x1 + 1, y1), (x1 − 1, y1), (x1, y1 + 1), (x1, y1 − 1)},
and 0 otherwise,

so that (Af)(x, y) is one quarter of the discrete Laplacian of f at (x, y); that is, Af is a
finite-difference analog of ∆f.
Again, this definition extends the classical usage, which states that f is superharmonic if ∆f ≤ 0
and subharmonic if ∆f ≥ 0. It is in order to remain consistent with the classical usage that we
apply the term "supermartingale" rather than "submartingale" to an unfavorable game in which
Mn has a tendency to decrease in expectation.
There is a nice connection between harmonic functions and recurrence.
(a) If X is recurrent, prove that all the bounded harmonic functions are constants. (Hint: This
is easy if |S| < ∞ . To prove the general case, use Theorem 10.1.2.)
(b) If X is transient, show that there always exists at least one non-constant bounded harmonic
function.
with X Markov, the obvious device to apply is Proposition 10.3.1. So, note that if we could find f
such that
Af = −g (10.3.3)
then we effectively would have our desired martingale for (10.3.2), namely
Mn = f(Xn) + Σ_{j=0}^{n−1} g(Xj).  (10.3.4)
(In the Markov setting, one cannot expect (10.3.2) itself to be a martingale – it just isn’t. But
(10.3.4) shows that it can be represented as a martingale if one adds on the “correction term”
f (Xn ).) Because (10.3.3) plays a key role in representing (10.3.2) as a martingale, this equation has
an important place in the theory of Markov processes. Equation (10.3.3) is called Poisson’s equation.
(In the symmetric simple random walk setting, (10.3.3) is just a finite-difference approximation to
∆f = −g, which is Poisson’s equation in the partial differential equations setting.)
Poisson’s equation need not have a solution for arbitrary g.
Exercise 10.3.3 Suppose that X is an irreducible transient DTMC. If g has finite support (i.e.
{x ∈ S : g(x) 6= 0} has finite cardinality), show that Poisson’s equation has a solution.
Exercise 10.3.4 Suppose that X is an irreducible finite-state DTMC. Let π be the stationary
distribution of X. Let Π be the matrix in which all rows are identical to π.

(f) Prove that if g is such that πg = 0, then f = (Π − A)^{−1} g solves Poisson's equation Af = −g.
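Part (f) is easy to verify numerically; the 3-state transition matrix and the function g below are illustrative choices, not from the text.

```python
import numpy as np

# Sketch of Exercise 10.3.4(f): for an irreducible finite-state chain
# with pi g = 0, f = (Pi - A)^{-1} g solves Poisson's equation Af = -g.
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
n = P.shape[0]

# stationary distribution: left eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

Pi = np.tile(pi, (n, 1))            # every row equals pi
A = P - np.eye(n)                   # discrete generator A = P - I

g = np.array([1.0, -2.0, 0.5])
g = g - pi @ g                      # center so that pi g = 0
f = np.linalg.solve(Pi - A, g)
print(np.allclose(A @ f, -g))       # f solves Af = -g
```

The key identity behind this is π(Π − A) = π, which forces πf = πg = 0 and hence Πf = 0, so (Π − A)f = g reduces to Af = −g.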
Exercise 10.3.5 We extend here the existence of solutions to Poisson's equation to infinite-state
irreducible positive recurrent Markov chains X = (Xn : n ≥ 0). Let f : S → R be such that
Σ_x π(x)|f(x)| < ∞. Set fc(x) = f(x) − Σ_y π(y)f(y), and, for a fixed state z ∈ S with
τ(z) = inf{n ≥ 1 : Xn = z}, put

u*(x) = Ex Σ_{n=0}^{τ(z)−1} fc(Xn).

(a) Prove that Ex Σ_{n=0}^{τ(z)−1} |fc(Xn)| < ∞ for each x ∈ S (so that u*(·) is finite-valued).
We now turn to developing an analog to the likelihood ratio martingale that was discussed in
the random walk setting. Let X = (Xn : n ≥ 0) be an S-valued DTMC with initial distribution
ν and (one-step) transition matrix Q = (Q(x, y) : x, y ∈ S). Suppose that we select a stochastic
vector µ and transition matrix P such that µ(x) = 0 whenever ν(x) = 0, and P(x, y) = 0 whenever
Q(x, y) = 0.
Proposition 10.3.3 The sequence (Ln : n ≥ 0) is a martingale with respect to (Xn : n ≥ 0),
where

Ln = ( µ(X0)/ν(X0) ) Π_{j=0}^{n−1} P(Xj, Xj+1)/Q(Xj, Xj+1),  n ≥ 0.
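The martingale property implies in particular that ELn = 1 for all n when X evolves under (ν, Q), which can be checked by simulation; the two-state (ν, Q) and (µ, P) below are illustrative choices, not from the text.

```python
import numpy as np

# Monte Carlo check that the DTMC likelihood ratio L_n has E[L_n] = 1
# when X evolves under (nu, Q).  All distributions are illustrative.
rng = np.random.default_rng(8)
nu, Q = np.array([0.5, 0.5]), np.array([[0.7, 0.3], [0.2, 0.8]])
mu, P = np.array([0.3, 0.7]), np.array([[0.5, 0.5], [0.6, 0.4]])

paths, n = 500_000, 6
x = (rng.random(paths) > nu[0]).astype(int)   # X_0 ~ nu
L = mu[x] / nu[x]
for _ in range(n):
    x_next = (rng.random(paths) > Q[x, 0]).astype(int)  # X_{j+1} ~ Q(X_j, .)
    L *= P[x, x_next] / Q[x, x_next]
    x = x_next
print(L.mean())    # should be near 1
```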
We close this section with a discussion of the exponential martingale's extension to the DTMC
setting. Suppose that we wish to study an additive process of the form Σ_{j=0}^{n−1} g(Xj), where
(Xn : n ≥ 0) is an irreducible finite-state DTMC. In the random walk setting, the moment generating
function of the random walk played a critical role in constructing the exponential martingale. This
suggests considering

un(θ, x, y) = Ex[ e^{θ Σ_{j=0}^{n−1} g(Xj)} ; Xn = y ]

for x, y ∈ S. Observe that

un(θ, x, y) = Σ_{x1,...,xn−1} e^{θg(x)} P(x, x1) e^{θg(x1)} P(x1, x2) · · · e^{θg(xn−1)} P(xn−1, y)
= K^n(θ, x, y),

where K^n(θ, x, y) is the (x, y)'th component of the nth power of the matrix K(θ) = (K(θ, x, y) :
x, y ∈ S), with K(θ, x, y) = e^{θg(x)} P(x, y).
Note that K(θ) is a non-negative finite irreducible matrix. Then, the Perron-Frobenius theorem
for non-negative matrices implies that there exists a positive eigenvalue λ(θ) and corresponding
positive column eigenvector r(θ) such that
K(θ)r(θ) = λ(θ)r(θ). (10.3.6)
Let ψ(θ) = log λ(θ). We can rewrite (10.3.6) as

Σ_y e^{−ψ(θ)} K(θ, x, y) r(θ, y)/r(θ, x) = 1,  x ∈ S,  (10.3.7)

or equivalently,

Ex[ e^{θg(x) − ψ(θ)} r(θ, X1)/r(θ, X0) ] = 1.
Proposition 10.3.4 For each θ ∈ R,

Ln(θ) = e^{θ Σ_{j=0}^{n−1} g(Xj) − nψ(θ)} · r(θ, Xn)/r(θ, X0)

is a martingale with respect to (Xn : n ≥ 0).

Proof: The critical verification involves showing that E[Ln+1(θ) | X0, . . . , Xn] = Ln(θ). But

E[Ln+1(θ) | X0, . . . , Xn] = Ln(θ) E[ e^{θg(Xn) − ψ(θ)} r(θ, Xn+1)/r(θ, Xn) | X0, . . . , Xn ] = Ln(θ).
We can rewrite this martingale as follows. Set h(θ, x) = log r(θ, x). Then, Proposition 10.3.4
asserts that

e^{h(θ, Xn) + θ Σ_{j=0}^{n−1} g(Xj) − nψ(θ)}

is a martingale. This exponential martingale can be used in a manner identical to the random walk
setting to study Σ_{j=0}^{n−1} g(Xj).
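The Perron-Frobenius construction is straightforward to carry out numerically; the 3-state P, the function g, and the value of θ below are illustrative choices, not from the text.

```python
import numpy as np

# Sketch of the Perron-Frobenius construction behind the DTMC
# exponential martingale, for an illustrative 3-state chain.
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
g = np.array([1.0, -1.0, 0.5])
theta = 0.7

K = np.exp(theta * g)[:, None] * P     # K(theta, x, y) = e^{theta g(x)} P(x, y)
w, V = np.linalg.eig(K)
i = np.argmax(np.real(w))              # Perron-Frobenius eigenvalue lambda(theta)
lam = np.real(w[i])
r = np.abs(np.real(V[:, i]))           # positive right eigenvector r(theta, .)
psi = np.log(lam)

# the one-step identity (10.3.7): rows of the twisted kernel sum to one
twisted = np.exp(-psi) * K * r[None, :] / r[:, None]
print(twisted.sum(axis=1))             # each row sums to 1
```

The "twisted" kernel built here is itself a transition matrix, which is the change-of-measure viewpoint underlying the exponential martingale.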
As for sums of independent mean-zero rv's, we expect that in great generality,

n^{−1} Σ_{i=1}^{n} Di → 0 a.s.  (10.4.1)
so that if

sup_{n≥1} EDn^2 < ∞,  (10.4.2)

then we expect (10.4.1) to hold. There remains, however, a "gap" between n^{−1} Σ_{i=1}^{n} Di and
the world of martingales. The appropriate "bridge" is Kronecker's lemma.
Kronecker's Lemma: If (xn : n ≥ 1) and (an : n ≥ 1) are two real-valued sequences for which
(an : n ≥ 1) is non-negative and increasing to infinity, then the existence of a finite-valued z such
that

Σ_{j=1}^{n} xj/aj → z

as n → ∞ implies that

(1/an) Σ_{j=1}^{n} xj → 0

as n → ∞.
To apply this result in our martingale setting, let

M̃n = Σ_{j=1}^{n} Dj/j,

so that in the presence of (10.4.2), the Martingale Convergence Theorem can be applied, yielding
the conclusion that there exists a finite-valued M̃∞ for which

M̃n → M̃∞ a.s. as n → ∞.

Kronecker's lemma (with aj = j and xj = Dj) then yields (10.4.1).
Exercise 10.4.1 Use the above argument to prove the strong law

n^{−1} Σ_{i=0}^{n−1} f(Xi) → Σ_z π(z) f(z) a.s.

as n → ∞.

Since discrete-time martingales are just a special case of continuous-time martingales, we focus
on (10.5.1).
Note that

M(t) = M(0) + Σ_{i=1}^{n} ( M(it/n) − M((i − 1)t/n) ).
Theorem 10.5.1 Let (M(t) : t ≥ 0) be a square-integrable martingale with right-continuous paths
with left limits. If either:

(1/√t) E sup_{0≤s≤t} |M(s) − M(s−)| → 0

and

(1/t) [M](t) → σ^2 in probability

as t → ∞, or

(1/t) E sup_{0≤s≤t} |M(s) − M(s−)|^2 → 0,

(1/t) E sup_{0≤s≤t} |⟨M⟩(s) − ⟨M⟩(s−)| → 0,

and

(1/t) ⟨M⟩(t) → σ^2 in probability

as t → ∞, then

t^{−1/2} M(t) ⇒ σN(0, 1)

as t → ∞.
Remark 10.5.1 Note that Markov jump processes have right continuous paths with left limits, so
this result applies in the Markov jump process setting.
In discrete time, the corresponding quantities are

[M](n) = Σ_{i=1}^{n} Di^2

and

⟨M⟩(n) = Σ_{i=1}^{n} E[Di^2 | Z0, . . . , Zi−1].
Exercise 10.5.1 Use the Martingale CLT to prove that there exists σ for which

n^{−1/2} ( Σ_{i=0}^{n−1} f(Xi) − n Σ_z π(z) f(z) ) ⇒ σN(0, 1)

as n → ∞.