A fundamental tool in the analysis of DTMC’s and continuous-time Markov processes is the
notion of a martingale. Martingales also underlie the definition we will adopt for stochastic
integrals with respect to Brownian motion. A martingale is, roughly speaking, a real-valued random
sequence that suitably generalizes a random walk with independent, mean-zero increments.
Definition 10.1.1 Let (Mn : n ≥ 0) be a sequence of real-valued random variables. Then, (Mn :
n ≥ 0) is said to be a martingale (with respect to the sequence of random elements (Zn : n ≥ 0)) if:

(i) E|Mn| < ∞ for n ≥ 0;

(ii) for each n ≥ 0, there exists a deterministic function gn such that Mn = gn(Z0, Z1, . . . , Zn);

(iii) E[Mn+1 | Z0, . . . , Zn] = Mn for n ≥ 0.
Remark 10.1.1 When a process (Mn : n ≥ 0) satisfies condition (ii), one says that (Mn : n ≥ 0)
is adapted to (Zn : n ≥ 0).
The critical component of the martingale definition is condition (iii). If we view Mn as the
fortune of a gambler at time n, then condition (iii) is asserting that the gambler is involved in
playing a “fair game”, in which he/she has no propensity (in expectation) to either win or lose on
any given gamble. As we asserted earlier, a random walk with independent mean-zero increments
is a martingale. To see this, let S0 , X1 , X2 , . . . be independent random variables with finite mean,
and suppose that EXi = 0 for i ≥ 1. Set Zn = Sn = S0 + X1 + · · · + Xn . Then, conditions (i) and
(ii) of Definition 10.1.1 are trivial to verify. For condition (iii), observe that

E[Sn+1 | Z0, . . . , Zn] = Sn + E[Xn+1 | Z0, . . . , Zn] = Sn + EXn+1 = Sn.
Martingales inherit many of the properties of mean-zero random walks. In view of the analogy with
random walks, it is natural to consider the increments
Di = Mi − Mi−1,  i ≥ 1,
§ SECTION 10: MARTINGALES
namely, the martingale differences. The following proposition is a clear generalization of two of the
most important properties of mean-zero random walks.
Proposition 10.1.1 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0), with
EMn^2 < ∞ for n ≥ 0. Then,

E[Dn+1 | Z0, . . . , Zn] = 0,  n ≥ 0,  (10.1.1)

and

Cov(Di, Dj) = 0,  i ≠ j,  (10.1.2)

so that

Var[Mn] = Var[M0] + Σ_{i=1}^{n} Var[Di].  (10.1.3)
Proof: Relation (10.1.1) is immediate from condition (iii) of the martingale definition. For
(10.1.2), note that (10.1.1) implies that EDi = 0, so that (10.1.2) is equivalent to asserting that
E[Di Dj] = 0 for i < j. But

E[Di Dj | Z0, . . . , Zj−1] = Di E[Dj | Z0, . . . , Zj−1] = 0,

where condition (ii) of the martingale definition was used for the first equality (Di is a function of
Z0, . . . , Zj−1), and condition (iii) was used for the final step. Taking expectations with respect to
(Z0, . . . , Zj−1), we get (10.1.2). Finally, (10.1.3) is immediate from (10.1.2).
Definition 10.1.2 A martingale (Mn : n ≥ 0) for which EMn^2 < ∞ for n ≥ 0 is called a
square-integrable martingale.
Before we turn to exploring further properties of martingales, let us develop some additional
examples of martingales in the random walk setting.
Example 10.1.1 Let (Xn : n ≥ 1) be a sequence of iid mean-zero random variables with finite
variance σ^2. Let Sn = X1 + · · · + Xn (with S0 = 0) and let

Mn = Sn^2 − nσ^2.

Then (Mn : n ≥ 0) is a martingale with respect to (Sn : n ≥ 0). The critical property to verify is
(iii). Note that

E[Mn+1 | S0, . . . , Sn] = E[(Sn + Xn+1)^2 − (n + 1)σ^2 | S0, . . . , Sn]
= Sn^2 + 2Sn EXn+1 + EXn+1^2 − (n + 1)σ^2 = Sn^2 − nσ^2 = Mn.
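Since EM0 = 0 here, the martingale property forces EMn = 0 for every n, which is easy to check by simulation. The following sketch uses an illustrative ±1 step distribution (so σ^2 = 1); the step law and sample sizes are choices made here, not taken from the text.

```python
import numpy as np

# Monte Carlo sanity check for Example 10.1.1: with S0 = 0 and iid
# mean-zero steps, M_n = S_n^2 - n*sigma^2 should satisfy E[M_n] = 0.
# The +/-1 step distribution (sigma^2 = 1) is an illustrative choice.
rng = np.random.default_rng(0)
paths, n = 200_000, 50
X = rng.choice([-1.0, 1.0], size=(paths, n))
S = np.cumsum(X, axis=1)
for k in (10, 25, 50):
    M_k = S[:, k - 1] ** 2 - k      # M_k = S_k^2 - k*sigma^2 with sigma^2 = 1
    print(k, M_k.mean())            # each sample mean should be near 0
```

Only the constancy of the mean is checked here; the martingale property itself is of course a stronger, conditional statement.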
Example 10.1.2 Let (Xn : n ≥ 1) be a sequence of iid random variables with common density g.
Suppose that f is another density with the property that whenever g(x) = 0, then f (x) = 0. Set
L0 = 1 and

Ln = Π_{i=1}^{n} f(Xi)/g(Xi),  n ≥ 1.
Then, (Ln : n ≥ 0) is a martingale with respect to (Xn : n ≥ 1). Again, the critical property is
verifying (iii). Here,
E[Ln+1 | X1, . . . , Xn] = E[ Ln f(Xn+1)/g(Xn+1) | X1, . . . , Xn ] = Ln E[ f(Xn+1)/g(Xn+1) ]
= Ln ∫ ( f(x)/g(x) ) g(x) dx = Ln ∫ f(x) dx = Ln.
To show why the likelihood ratio martingale arises naturally, suppose that we have observed an
iid sample from a population, yielding observations X1 , X2 , . . . , Xn . Assume that the underlying
population is known to be iid, either with common density f or with common density g. To test
the hypothesis that the Xi ’s have common density f (the “f -hypothesis”) against the hypothesis
that the Xi ’s have common density g (the “g-hypothesis”), the Neyman-Pearson lemma asserts
that we should accept the “f -hypothesis” if the relative likelihood
f(X1) · · · f(Xn) / ( g(X1) · · · g(Xn) )  (10.1.4)
is sufficiently large, and reject it otherwise. So, studying Ln in the case where the Xi ’s have common
density g corresponds to studying the test statistic (10.1.4) when the “state of nature” is that the
"g-hypothesis" is true. Given this interpretation, it seems natural to expect that Ln converges to
zero as the sample size n goes to infinity: for a large sample size n, it is extremely unlikely that
such a sample will be better explained by the "f-hypothesis" than by the other one. The fact that
Ln ought to go to zero as n → ∞ is perhaps a bit surprising, given that
ELn = 1 for n ≥ 0.
To prove that Ln → 0 almost surely as n → ∞, note that

log Ln = Σ_{i=1}^{n} log( f(Xi)/g(Xi) ).

The summands are iid, so the strong law of large numbers gives

(1/n) log Ln → ∫ log( f(x)/g(x) ) g(x) dx  a.s.  (10.1.5)

(The right-hand side of (10.1.5) is the negative of what is known as a relative entropy.) Since log
is strictly concave, Jensen's inequality asserts that if f ≠ g,

E[ log( f(Xi)/g(Xi) ) ] < log E[ f(Xi)/g(Xi) ] = log 1 = 0.  (10.1.6)
As a consequence, not only does Ln converge to zero a.s. as n → ∞, but the rate of convergence
is exponentially fast. It is worth noting that this is an example of a sequence of random variables
(Ln : n ≥ 0) for which Ln → 0 a.s. and yet ELn ↛ 0 as n → ∞ (in other words, passing limits
through expectations is not always valid).
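This dichotomy is easy to see numerically. In the sketch below the true density is g = N(0, 1) and f = N(1/2, 1), an illustrative pair not taken from the text; then log(f(x)/g(x)) = x/2 − 1/8, and the a.s. limit in (10.1.5) equals −1/8.

```python
import numpy as np

# Simulation of the likelihood ratio martingale: X_i drawn from g = N(0,1),
# with f = N(1/2,1), so log(f(x)/g(x)) = x/2 - 1/8 (illustrative choice).
# Every path's L_n collapses to ~0 even though E[L_n] = 1 exactly.
rng = np.random.default_rng(1)
paths, n = 2_000, 1_000
X = rng.standard_normal((paths, n))        # iid sample from g
log_L = (X / 2 - 0.125).cumsum(axis=1)     # log L_1, ..., log L_n per path

print(np.exp(log_L[:, -1]).max())          # L_n is tiny on every path
print(log_L[:, -1].mean() / n)             # close to -1/8, the a.s. rate
```

The sample average of Ln itself is a poor estimate of ELn = 1 for large n, precisely because the expectation is carried by vanishingly rare paths.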
Example 10.1.3 In this example, we specialize the likelihood ratio martingale a bit. Suppose that
the Xi ’s are iid with common density g, and suppose that the moment generating function mX (θ) =
EeθXi converges in some neighborhood of the origin. For θ within the domain of convergence of
mX(·), let

f(x) = e^{θx} g(x) / mX(θ),

or, equivalently, f(x) = e^{θx − ψ(θ)} g(x), where ψ(θ) = log mX(θ). In this case,

Ln = Π_{i=1}^{n} f(Xi)/g(Xi) = e^{θSn − nψ(θ)}.  (10.1.7)
Some indication of the power of this martingale should be apparent, if we explicitly display the
dependence of Ln on θ as follows:
Ln (θ) = eθSn −nψ(θ)
The defining property (iii) of a martingale asserts that

E[Ln+1(θ) | S0, . . . , Sn] = Ln(θ).

For θ inside the domain of convergence of mX(·), one can interchange the derivative and expectation,
yielding

E[L′n+1(θ) | S0, . . . , Sn] = L′n(θ).

In particular, (L′n(0) : n ≥ 0) is a martingale. But

L′n(0) = Sn − nψ′(0).

It turns out that ψ′(0) = EX1. So, by differentiating our exponential martingale, we retrieve the
random walk martingale. And by differentiating a second time, it turns out that L″n(0) is the
martingale of Example 10.1.1. Through successive differentiation, we can obtain a whole infinite
family of such martingales.
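These observations can be checked numerically. In the sketch below, g is taken to be the N(0, 1) density (an illustrative choice), so that ψ(θ) = θ^2/2 and Sn is exactly N(0, n); the sample means of Ln(θ), L′n(0) = Sn, and L″n(0) = Sn^2 − n should then be near 1, 0, and 0, respectively.

```python
import numpy as np

# Numerical check of the exponential martingale family when g is the
# N(0,1) density, so psi(theta) = theta^2 / 2 (an illustrative choice).
rng = np.random.default_rng(2)
paths, n, theta = 400_000, 25, 0.2
S = np.sqrt(n) * rng.standard_normal(paths)   # S_n ~ N(0, n) exactly here

L = np.exp(theta * S - n * theta**2 / 2)      # L_n(theta): mean should be 1
print(L.mean())
print(S.mean())                               # L'_n(0) = S_n: mean 0
print((S**2 - n).mean())                      # L''_n(0) = S_n^2 - n: mean 0
```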
We now turn to a fundamental result in the theory of martingales known as the Martingale
Convergence Theorem.

Theorem 10.1.1 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0) for which
sup_{n≥0} EMn^2 < ∞. Then there exists a square-integrable random variable M∞ such that
Mn → M∞ in L2 as n → ∞.
Proof: The space L2 of square-integrable random variables is a Hilbert space under the inner
product ⟨X, Y⟩ = E[XY]. Since

EMn^2 = EM0^2 + Σ_{i=1}^{n} EDi^2,

it follows that Σ_{i=1}^{∞} EDi^2 < ∞. For ε > 0, choose m = m(ε) so that Σ_{i=m}^{∞} EDi^2 < ε.
Then, for n2 > n1 ≥ m,

E(Mn2 − Mn1)^2 = Σ_{j=n1+1}^{n2} EDj^2 < ε,

so that (Mn : n ≥ 0) is a Cauchy sequence in L2. Then, the completeness of L2 yields the conclusion
of the theorem.
Actually, one does not need square integrability in order that the Martingale Convergence
Theorem hold.

Theorem 10.1.2 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0) for which
sup_{n≥0} E|Mn| < ∞. Then there exists a finite-valued random variable M∞ such that Mn → M∞
a.s. as n → ∞.

For a proof, see p. 233 of "Probability: Theory and Examples" 3rd ed. by R. Durrett.
We conclude this section with a brief discussion of stochastic integrals in discrete time. Let
(Mn : n ≥ 0) be a square-integrable martingale with respect to (Zn : n ≥ 0). Suppose that
(Wn : n ≥ 0) is a sequence of random variables that is adapted to (Zn : n ≥ 0). We define the
stochastic integral of (Wn : n ≥ 0) with respect to (Mn : n ≥ 0) as the sequence

Vn = Σ_{i=1}^{n} Wi−1 Di = Σ_{i=1}^{n} Wi−1 (Mi − Mi−1) = Σ_{i=1}^{n} Wi−1 ΔMi.

We could also have defined the stochastic integral here as Σ_{i=1}^{n} Wi ΔMi. But in that case,
we would lose the nice properties listed below.
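The role of the predictable integrand Wi−1 can be seen in a small simulation: with Wi−1 the integral has mean zero, while replacing it by Wi introduces a systematic bias. The choices Wi = sign(Mi) and ±1 martingale differences below are illustrative, not from the text.

```python
import numpy as np

# Predictable vs anticipating integrands in the discrete stochastic
# integral.  W_i = sign(M_i) and +/-1 differences are illustrative.
rng = np.random.default_rng(3)
paths, n = 100_000, 40
dM = rng.choice([-1.0, 1.0], size=(paths, n))       # martingale differences
M = np.cumsum(dM, axis=1)
W = np.sign(np.hstack([np.zeros((paths, 1)), M]))   # W_0 = 0, W_i = sign(M_i)

V_pred = (W[:, :-1] * dM).sum(axis=1)   # sum W_{i-1} dM_i: a martingale
V_ant  = (W[:, 1:]  * dM).sum(axis=1)   # sum W_i dM_i: not a martingale

print(V_pred.mean())   # near 0
print(V_ant.mean())    # strictly positive bias
```

The anticipating sum "peeks" at the sign of the step it multiplies, which is exactly what the predictable convention rules out.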
Exercise 10.1.2 Let (Mn : n ≥ 0) be a square-integrable martingale with respect to (Zn : n ≥ 0),
with M0 = 0. Suppose (Wn : n ≥ 0) is a square-integrable sequence that is adapted to (Zn : n ≥ 0).

(a) Prove that if V0 = 0 and Vn = Σ_{i=1}^{n} Wi−1 ΔMi for n ≥ 1, then (Vn : n ≥ 0) is a
martingale with respect to (Zn : n ≥ 0).

(b) Suppose that the martingale differences (Di : i ≥ 1) are a stationary sequence of independent
random variables. Show that EVn^2 = σ^2 Σ_{i=0}^{n−1} EWi^2, where σ^2 = Var[Di].
Example 10.2.1 Let (Sn : n ≥ 0) be a random walk with S0 = 0 and iid increments (Xn : n ≥ 1)
defined by

P(Xn = 1) = P(Xn = −1) = 1/2.

Put T = inf{n ≥ 0 : Sn = 1}. Since (Sn : n ≥ 0) is null recurrent, T < ∞ a.s. and ST = 1.
Therefore, EST = 1 and ES0 = 0, and so EST ≠ ES0. Hence, the class of stopping times needs to
be restricted somewhat.
Theorem 10.2.1 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0). Suppose that
T is a bounded random variable that is a stopping time with respect to (Zn : n ≥ 0). Then
EMT = EM0 .
Proof: Let m be such that P(T ≤ m) = 1. Then MT = M0 + Σ_{i=1}^{m} Di I(T ≥ i), and thus

EMT = EM0 + Σ_{i=1}^{m} E[Di I(T ≥ i)].  (10.2.3)

Because T is a stopping time, I(T ≥ i) = 1 − I(T ≤ i − 1) is a function of (Z0, . . . , Zi−1), so
E[Di I(T ≥ i) | Z0, . . . , Zi−1] = I(T ≥ i) E[Di | Z0, . . . , Zi−1] = 0, and so

Σ_{i=1}^{m} E[Di I(T ≥ i)] = 0.
If T is a stopping time, then T ∧ n is a stopping time for n ≥ 0 (and is clearly bounded). So,
optional sampling applies at T ∧ n (see Theorem 10.2.1), i.e. EMT∧n = EM0 for n ≥ 0.
If T < ∞ a.s., then MT∧n → MT a.s. as n → ∞. Hence, if

EMT∧n → EMT as n → ∞,  (10.2.4)

then

EMT = EM0.  (10.2.2)
Therefore, the key to establishing (10.2.2) is (10.2.4). There are various results which one
can invoke to justify (10.2.4); the most powerful of these results is the Dominated Convergence
Theorem. To apply this result, we need to find a random variable W having finite mean, such that
|MT ∧n | ≤ W for n ≥ 0. The obvious candidate for W is
W = |M0| + Σ_{i=1}^{T} D̃i,  (10.2.5)

where D̃i = |Di|.
Proposition 10.2.1 Suppose that there exists c < ∞ such that P(D̃i ≤ c) = 1 for i ≥ 1. If
ET < ∞, then

EMT = EM0.
Proof: Note that W ≤ |M0| + cT. Since ET < ∞, EW < ∞. The Dominated Convergence
Theorem then implies that EMT∧n → EMT as n → ∞, yielding the result.
Now, let’s turn to an application of optional sampling.
Application 10.2.1 Let (Sn : n ≥ 0) be a random walk with S0 = 0 and iid increments (Xn : n ≥ 1)
defined by

P(Xn = 1) = P(Xn = −1) = 1/2.

Let T = inf{n ≥ 0 : Sn ≤ −a or Sn ≥ b} be the "exit time" from [−a, b]. Suppose that we wish
to compute P(ST = −a), the probability that the random walk exits through the left boundary. (This
is basically the "gambler's ruin" computation for the probability of ruin.) Note that D̃i = 1 and
ET < ∞ (see Exercise 10.2.1). Hence, Proposition 10.2.1 applies and EST = 0. However,

0 = EST = −a P(ST = −a) + b P(ST = b) = −a P(ST = −a) + b(1 − P(ST = −a)),

so that

P(ST = −a) = b/(a + b).
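A Monte Carlo sketch confirms this exit probability, and (anticipating Application 10.2.2) also estimates ET, whose value turns out to be ab; the parameters a = 3, b = 5 are illustrative.

```python
import numpy as np

# Monte Carlo check for the symmetric walk exiting [-a, b]:
# P(S_T = -a) = b/(a+b) and E[T] = a*b.  a = 3, b = 5 are illustrative.
rng = np.random.default_rng(4)
a, b, paths = 3, 5, 20_000
hits_left = 0
total_steps = 0
for _ in range(paths):
    s, t = 0, 0
    while -a < s < b:
        s += 1 if rng.random() < 0.5 else -1
        t += 1
    hits_left += (s == -a)
    total_steps += t
print(hits_left / paths)    # ~ b/(a+b) = 0.625
print(total_steps / paths)  # ~ a*b = 15
```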
Application 10.2.2 In this continuation of Application 10.2.1, we wish to compute ET. (In the
gambler's ruin setting, this is the mean duration of the game.) Let Mn = Sn^2 − nσ^2, where
σ^2 = Var Xi = 1. Assuming that (10.2.2) holds,

0 = EM0 = EMT = EST^2 − ET,

so that

ET = EST^2 = a^2 · b/(a + b) + b^2 · a/(a + b) = ab.
Clearly, the D̃i do not satisfy the hypotheses of Proposition 10.2.1 (here Di = 2Si−1 Xi, so the
D̃i are unbounded), so something else is needed here. The following variant suffices.

Proposition 10.2.2 Suppose that there exists c < ∞ such that E[D̃i | Z0, . . . , Zi−1] ≤ c a.s. on
{T ≥ i} for i ≥ 1. If ET < ∞, then EMT = EM0.
Proof: Note that EW = E|M0| + E[ Σ_{i=1}^{∞} D̃i I(T ≥ i) ] = E|M0| + Σ_{i=1}^{∞} E[D̃i I(T ≥ i)].
However,

E[D̃i I(T ≥ i)] = E[ I(T ≥ i) E[D̃i | Z0, . . . , Zi−1] ] ≤ c E[I(T ≥ i)].

Thus, EW ≤ E|M0| + c Σ_{i=1}^{∞} E[I(T ≥ i)] = E|M0| + cET < ∞, and consequently the Dominated
Convergence Theorem applies.
Application 10.2.3 Here, for the walk with P(Xn = 1) = p = 1 − P(Xn = −1), mX(θ) =
pe^θ + (1 − p)e^{−θ}, so ψ(θ) = log(pe^θ + (1 − p)e^{−θ}). Then, the martingale of interest is
Ln(θ) = e^{θSn − nψ(θ)}. Assuming that optional sampling applies at time T, we arrive at
ELT(θ) = 1, or, in other words,

E[e^{θST − Tψ(θ)}] = 1.  (10.2.7)

To compute the exit probabilities from [−a, b], it is desirable to eliminate the term Tψ(θ) from the
exponent of (10.2.7).

Recall that ψ is convex (see Exercise 10.1.1). For p ≠ 1/2, there exists a unique θ* ≠ 0 such that
ψ(θ*) = 0, given by

θ* = log( (1 − p)/p ).
Substituting θ = θ* into (10.2.7), we get E[e^{θ*ST}] = 1. But E[e^{θ*ST}] = e^{−θ*a} P(ST = −a) +
e^{θ*b} P(ST = b). Hence,

P(ST = −a) = ( 1 − ((1 − p)/p)^b ) / ( (p/(1 − p))^a − ((1 − p)/p)^b ).
(This is basically the probability of ruin in a gambler’s ruin problem that is not fair.)
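The formula can be checked by simulation; the values p = 0.6, a = 3, b = 5 below are illustrative.

```python
import numpy as np

# Monte Carlo check of the biased exit probability: with r = (1-p)/p,
# P(S_T = -a) = (1 - r**b) / (r**(-a) - r**b).  Parameters illustrative.
rng = np.random.default_rng(5)
p, a, b, paths = 0.6, 3, 5, 40_000
r = (1 - p) / p
predicted = (1 - r**b) / (r**(-a) - r**b)

hits_left = 0
for _ in range(paths):
    s = 0
    while -a < s < b:
        s += 1 if rng.random() < p else -1
    hits_left += (s == -a)
print(hits_left / paths, predicted)   # the two should agree
```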
Exercise 10.2.2 Rigorously apply the optional sampling theorem in Application 10.2.3.
Application 10.2.4 Let (Sn : n ≥ 0) be a random walk with S0 = 0 and iid increments (Xn : n ≥
1) given by
P(Xn = 1) = p = 1 − P(Xn = −1)
with p > 1/2. This is a walk with positive drift, so that T < ∞ a.s. if we set T = inf{n ≥ 0 : Sn ≥
b}. Our goal here is to compute the moment generating function of T , using martingale methods.
Assuming that we can invoke the optional sampling theorem at T, E[e^{θST − Tψ(θ)}] = 1. Since
ST = b here,

E[e^{−Tψ(θ)}] = e^{−θb},  (10.2.8)

so that, setting γ = ψ(θ), Ee^{−γT} = e^{−ψ^{−1}(γ)b} is the moment generating function of T. (In
computing ψ^{−1}(γ), one may find multiple roots; to formally determine the appropriate root, note
that the function Ee^{−γT} of the non-negative random variable T must be non-increasing in γ.)
To make this result rigorous, note that if p > 1/2, then ψ′(0) > 0. The convexity of ψ(·) then
guarantees that ψ(θ) > 0 for θ > 0. Consequently, for θ > 0,

e^{θST∧n − (T∧n)ψ(θ)} ≤ e^{θb},

so the Dominated Convergence Theorem ensures that (10.2.8) holds for θ > 0. Then for γ > 0, let
η = ψ^{−1}(γ) be the non-negative root of

ψ(η) = γ.  (10.2.9)
Relation (10.2.9) yields the expression

Ee^{−γT} = e^{−ψ^{−1}(γ)b} = e^{−ηb},

where ψ^{−1} is defined as above. Note that a rigorous application of optional sampling theory has
led us to the correct choice of root for the equation (10.2.9).
A similar analysis is possible for the one-sided hitting time T = inf{n ≥ 0 : Sn ≤ −a} with
a > 0. Since p > 1/2, T is infinite with positive probability in this case. Again, consider the
sequence e^{θST∧n − ψ(θ)(T∧n)}. Note that if θ < 0 and ψ(θ) > 0, this sequence is bounded above by
e^{−θa}. Hence, we may interchange limits and expectations in the expression

e^{−θa} E[e^{−Tψ(θ)} I(T ≤ n)] + E[e^{θSn − nψ(θ)} I(T > n)] = 1,

yielding E[e^{−Tψ(θ)}; T < ∞] = e^{θa}. Letting θ increase to θ* = log((1 − p)/p), for which
ψ(θ*) = 0, gives

P(T < ∞) = e^{θ*a} = ((1 − p)/p)^a.

In other words, we have computed the probability that a positive drift "nearest neighbor" random
walk ever drops below −a.
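A truncated-horizon simulation agrees with this probability. The parameters p = 0.6 and a = 2, and the finite horizon standing in for "ever", are illustrative choices; with drift 2p − 1 = 0.2, visits to −a after the chosen horizon are negligibly likely.

```python
import numpy as np

# Monte Carlo check that a drift-up nearest-neighbor walk ever reaches
# level -a with probability ((1-p)/p)**a.  The 200-step horizon is an
# illustrative truncation of the infinite-time event.
rng = np.random.default_rng(6)
p, a, paths, horizon = 0.6, 2, 50_000, 200
steps = np.where(rng.random((paths, horizon)) < p, 1.0, -1.0)
ever = (np.cumsum(steps, axis=1).min(axis=1) <= -a).mean()
print(ever, ((1 - p) / p) ** a)   # both near (2/3)^2 = 4/9
```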
The theory of optional sampling extends beyond the martingale setting to supermartingales
and submartingales.
Definition 10.2.1 Let (Mn : n ≥ 0) be an integrable sequence of random variables that is adapted
to (Zn : n ≥ 0). If for n ≥ 0,

E[Mn+1 | Z0, . . . , Zn] ≤ Mn,

then (Mn : n ≥ 0) is said to be a supermartingale with respect to (Zn : n ≥ 0). On the other hand,
if for n ≥ 0,

E[Mn+1 | Z0, . . . , Zn] ≥ Mn,

then (Mn : n ≥ 0) is said to be a submartingale with respect to (Zn : n ≥ 0).
Proposition 10.2.3 Let T be a stopping time with respect to (Zn : n ≥ 0). If (Mn : n ≥ 0) is
a supermartingale with respect to (Zn : n ≥ 0), then

EMT∧n ≤ EM0,  n ≥ 0.

On the other hand, if (Mn : n ≥ 0) is a submartingale with respect to (Zn : n ≥ 0), then

EMT∧n ≥ EM0,  n ≥ 0.
Exercise 10.2.4 Let (Mn : n ≥ 0) be a martingale with respect to (Zn : n ≥ 0). Suppose that
φ : R → R is a convex function for which E|φ(Mn )| < ∞ for n ≥ 0. Prove that (φ(Mn ) : n ≥ 0) is
a submartingale with respect to (Zn : n ≥ 0).
In this section, we show how the random walk martingales introduced earlier generalize to the
DTMC setting. Each of the martingales constructed here will have natural analogs in the SDE
context.
Let (Yn : n ≥ 0) be a real-valued sequence of random variables, not necessarily Markov. A
standard trick for constructing a martingale in this very general setting is to set Di = Yi −
E[Yi | Y0, . . . , Yi−1] for i ≥ 1. Assuming that the Yi's are integrable, the Di's are martingale
differences with respect to the Yi's. Hence,

Mn = Σ_{i=1}^{n} [ Yi − E[Yi | Y0, . . . , Yi−1] ]
is a martingale. The same kind of idea works nicely in the DTMC setting. For f : S → R that is
bounded, note that

Di = f(Xi) − E[f(Xi) | X0, . . . , Xi−1] = f(Xi) − E[f(Xi) | Xi−1] = f(Xi) − (Pf)(Xi−1)
is a martingale difference with respect to (Xi : i ≥ 0). Hence,
M̃n = Σ_{i=1}^{n} [ f(Xi) − (Pf)(Xi−1) ]

is a martingale. Writing A = P − I, it follows easily that Mn = f(Xn) − Σ_{i=0}^{n−1} (Af)(Xi) =
M̃n + f(X0) is a martingale whenever f is bounded. We have proved the following result.
Proposition 10.3.1 For f : S → R bounded, Mn = f(Xn) − Σ_{i=0}^{n−1} (Af)(Xi) is a martingale
with respect to (Xn : n ≥ 0).
This martingale is known as the Dynkin martingale. Viewing (Af )(Xi ) as the increment of a
random walk-type process, this is clearly the DTMC analog to the random walk martingale.
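As a sanity check, the Dynkin martingale's constant expectation EMn = EM0 = f(X0) can be verified by simulation for a small chain; the 3-state transition matrix P and the function f below are illustrative choices, not from the text.

```python
import numpy as np

# Simulation of the Dynkin martingale M_n = f(X_n) - sum_{i<n} (Af)(X_i)
# for an illustrative 3-state chain started at state 0.
rng = np.random.default_rng(7)
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, -2.0, 0.5])
Af = P @ f - f                       # A = P - I acting on f

paths, n = 200_000, 12
x = np.zeros(paths, dtype=int)       # X_0 = 0 on every path
comp = np.zeros(paths)               # running compensator sum_{i<n} (Af)(X_i)
for _ in range(n):
    comp += Af[x]
    # sample X_{i+1} ~ P(X_i, .) by inverting the row cdf
    x = (rng.random(paths)[:, None] < P[x].cumsum(axis=1)).argmax(axis=1)
M = f[x] - comp
print(M.mean())                      # should be near E[M_0] = f(0) = 1.0
```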
Suppose that Af = 0; such an f is said to be harmonic for the chain. Then Proposition 10.3.1
implies that (f(Xn) : n ≥ 0) is a martingale with respect to (Xn : n ≥ 0).
The term “harmonic function” is widely used in the analysis literature. It refers to functions
f : Rd → R for which ∆f = 0, where
∆ = ∂²/∂x1² + ∂²/∂x2² + · · · + ∂²/∂xd².
(The operator ∆ is known as the “Laplacian operator”.) Note that if the Markov chain X corre-
sponds (for example) to simple random walk on the lattice plane, then
P((x1, y1), (x2, y2)) = 1/4 if (x2, y2) ∈ {(x1 + 1, y1), (x1 − 1, y1), (x1, y1 + 1), (x1, y1 − 1)},
and 0 otherwise,

so that (Af)(x, y) is one quarter of the discrete Laplacian of f at (x, y); that is, Af is a
finite-difference analog of ∆f.
Again, this definition extends the classical usage, which states that f is superharmonic if ∆f ≤ 0
and subharmonic if ∆f ≥ 0. It is in order to remain consistent with the classical usage that we
apply the term "supermartingale" rather than "submartingale" to an unfavorable game in which
Mn has a tendency to decrease in expectation.
There is a nice connection between harmonic functions and recurrence.
(a) If X is recurrent, prove that all the bounded harmonic functions are constants. (Hint: This
is easy if |S| < ∞ . To prove the general case, use Theorem 10.1.2.)
(b) If X is transient, show that there always exists at least one non-constant bounded harmonic
function.
with X Markov, the obvious device to apply is Proposition 10.3.1. So, note that if we could find f
such that
Af = −g (10.3.3)
then we effectively would have our desired martingale for (10.3.2), namely
Mn = f(Xn) + Σ_{j=0}^{n−1} g(Xj).  (10.3.4)
(In the Markov setting, one cannot expect (10.3.2) itself to be a martingale – it just isn’t. But
(10.3.4) shows that it can be represented as a martingale if one adds on the “correction term”
f (Xn ).) Because (10.3.3) plays a key role in representing (10.3.2) as a martingale, this equation has
an important place in the theory of Markov processes. Equation (10.3.3) is called Poisson’s equation.
(In the symmetric simple random walk setting, (10.3.3) is just a finite-difference approximation to
∆f = −g, which is Poisson’s equation in the partial differential equations setting.)
Poisson’s equation need not have a solution for arbitrary g.
Exercise 10.3.3 Suppose that X is an irreducible transient DTMC. If g has finite support (i.e.
{x ∈ S : g(x) 6= 0} has finite cardinality), show that Poisson’s equation has a solution.
Exercise 10.3.4 Suppose that X is an irreducible finite-state DTMC. Let π be the stationary
distribution of X. Let Π be the matrix in which all rows are identical to π.

(f) Prove that if g is such that πg = 0, then f = (Π − A)^{−1} g solves Poisson's equation Af = −g.
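Part (f) is easy to verify numerically; the 3-state transition matrix and the function g below are illustrative choices, not from the text.

```python
import numpy as np

# Sketch of Exercise 10.3.4(f): for an irreducible finite-state chain
# with pi g = 0, f = (Pi - A)^{-1} g solves Poisson's equation Af = -g.
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
n = P.shape[0]

# stationary distribution: left eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

Pi = np.tile(pi, (n, 1))            # every row equals pi
A = P - np.eye(n)                   # discrete generator A = P - I

g = np.array([1.0, -2.0, 0.5])
g = g - pi @ g                      # center so that pi g = 0
f = np.linalg.solve(Pi - A, g)
print(np.allclose(A @ f, -g))       # f solves Af = -g
```

The key identity behind this is π(Π − A) = π, which forces πf = πg = 0 and hence Πf = 0, so (Π − A)f = g reduces to Af = −g.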
Exercise 10.3.5 We extend here the existence of solutions to Poisson's equation to infinite-state
irreducible positive recurrent Markov chains X = (Xn : n ≥ 0). Let f : S → R be such that
Σ_x π(x)|f(x)| < ∞. Set fc(x) = f(x) − Σ_y π(y)f(y), and, for a fixed state z ∈ S with
τ(z) = inf{n ≥ 1 : Xn = z}, put

u*(x) = Ex Σ_{n=0}^{τ(z)−1} fc(Xn).

(a) Prove that Ex Σ_{n=0}^{τ(z)−1} |fc(Xn)| < ∞ for each x ∈ S (so that u*(·) is finite-valued).
We now turn to developing an analog to the likelihood ratio martingale that was discussed in
the random walk setting. Let X = (Xn : n ≥ 0) be an S-valued DTMC with initial distribution
ν and (one-step) transition matrix Q = (Q(x, y) : x, y ∈ S). Suppose that we select a stochastic
vector µ and transition matrix P such that µ(x) = 0 whenever ν(x) = 0, and P(x, y) = 0 whenever
Q(x, y) = 0.
Proposition 10.3.3 The sequence (Ln : n ≥ 0) is a martingale with respect to (Xn : n ≥ 0),
where

Ln = ( µ(X0)/ν(X0) ) Π_{j=0}^{n−1} P(Xj, Xj+1)/Q(Xj, Xj+1),  n ≥ 0.
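The martingale property implies in particular that ELn = 1 for all n when X evolves under (ν, Q), which can be checked by simulation; the two-state (ν, Q) and (µ, P) below are illustrative choices, not from the text.

```python
import numpy as np

# Monte Carlo check that the DTMC likelihood ratio L_n has E[L_n] = 1
# when X evolves under (nu, Q).  All distributions are illustrative.
rng = np.random.default_rng(8)
nu, Q = np.array([0.5, 0.5]), np.array([[0.7, 0.3], [0.2, 0.8]])
mu, P = np.array([0.3, 0.7]), np.array([[0.5, 0.5], [0.6, 0.4]])

paths, n = 500_000, 6
x = (rng.random(paths) > nu[0]).astype(int)   # X_0 ~ nu
L = mu[x] / nu[x]
for _ in range(n):
    x_next = (rng.random(paths) > Q[x, 0]).astype(int)  # X_{j+1} ~ Q(X_j, .)
    L *= P[x, x_next] / Q[x, x_next]
    x = x_next
print(L.mean())    # should be near 1
```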
We close this section with a discussion of the exponential martingale's extension to the DTMC
setting. Suppose that we wish to study an additive process of the form Σ_{j=0}^{n−1} g(Xj), where
(Xn : n ≥ 0) is an irreducible finite-state DTMC. In the random walk setting, the moment generating
function of the random walk played a critical role in constructing the exponential martingale. This
suggests considering

un(θ, x, y) = Ex[ e^{θ Σ_{j=0}^{n−1} g(Xj)} ; Xn = y ]

for x, y ∈ S. Observe that

un(θ, x, y) = Σ_{x1,...,xn−1} e^{θg(x)} P(x, x1) e^{θg(x1)} P(x1, x2) · · · e^{θg(xn−1)} P(xn−1, y)
= K^n(θ, x, y),

where K^n(θ, x, y) is the (x, y)'th component of the nth power of the matrix K(θ) = (K(θ, x, y) :
x, y ∈ S), with K(θ, x, y) = e^{θg(x)} P(x, y).
Note that K(θ) is a non-negative finite irreducible matrix. Then, the Perron-Frobenius theorem
for non-negative matrices implies that there exists a positive eigenvalue λ(θ) and corresponding
positive column eigenvector r(θ) such that
K(θ)r(θ) = λ(θ)r(θ). (10.3.6)
Let ψ(θ) = log λ(θ). We can rewrite (10.3.6) as

Σ_y e^{−ψ(θ)} K(θ, x, y) r(θ, y)/r(θ, x) = 1,  x ∈ S,  (10.3.7)

or equivalently,

Ex[ e^{θg(x) − ψ(θ)} r(θ, X1)/r(θ, X0) ] = 1.
Proposition 10.3.4 For each θ ∈ R,

Ln(θ) = e^{θ Σ_{j=0}^{n−1} g(Xj) − nψ(θ)} · r(θ, Xn)/r(θ, X0)

is a martingale with respect to (Xn : n ≥ 0).

Proof: The critical verification involves showing that E[Ln+1(θ) | X0, . . . , Xn] = Ln(θ). But

E[Ln+1(θ) | X0, . . . , Xn] = Ln(θ) E[ e^{θg(Xn) − ψ(θ)} r(θ, Xn+1)/r(θ, Xn) | X0, . . . , Xn ] = Ln(θ).
We can rewrite this martingale as follows. Set h(θ, x) = log r(θ, x). Then, Proposition 10.3.4
asserts that

e^{h(θ, Xn) + θ Σ_{j=0}^{n−1} g(Xj) − nψ(θ)}

is a martingale. This exponential martingale can be used in a manner identical to the random walk
setting to study Σ_{j=0}^{n−1} g(Xj).
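The Perron-Frobenius construction is straightforward to carry out numerically; the 3-state P, the function g, and the value of θ below are illustrative choices, not from the text.

```python
import numpy as np

# Sketch of the Perron-Frobenius construction behind the DTMC
# exponential martingale, for an illustrative 3-state chain.
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
g = np.array([1.0, -1.0, 0.5])
theta = 0.7

K = np.exp(theta * g)[:, None] * P     # K(theta, x, y) = e^{theta g(x)} P(x, y)
w, V = np.linalg.eig(K)
i = np.argmax(np.real(w))              # Perron-Frobenius eigenvalue lambda(theta)
lam = np.real(w[i])
r = np.abs(np.real(V[:, i]))           # positive right eigenvector r(theta, .)
psi = np.log(lam)

# the one-step identity (10.3.7): rows of the twisted kernel sum to one
twisted = np.exp(-psi) * K * r[None, :] / r[:, None]
print(twisted.sum(axis=1))             # each row sums to 1
```

The "twisted" kernel built here is itself a transition matrix, which is the change-of-measure viewpoint underlying the exponential martingale.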
As for sums of independent mean-zero rv's, we expect that in great generality,

n^{−1} Σ_{i=1}^{n} Di → 0 a.s.  (10.4.1)
so that if

sup_{n≥1} EDn^2 < ∞,  (10.4.2)

then we expect (10.4.1) to hold. There remains, however, a "gap" between n^{−1} Σ_{i=1}^{n} Di and
the world of martingales. The appropriate "bridge" is Kronecker's lemma.
Kronecker's Lemma: If (xn : n ≥ 1) and (an : n ≥ 1) are two real-valued sequences for which
(an : n ≥ 1) is non-negative and increasing to infinity, then the existence of a finite-valued z such
that

Σ_{j=1}^{n} xj/aj → z

as n → ∞ implies that

(1/an) Σ_{j=1}^{n} xj → 0

as n → ∞.
To apply this result in our martingale setting, let

M̃n = Σ_{j=1}^{n} Dj/j,

so that in the presence of (10.4.2), the Martingale Convergence Theorem can be applied, yielding
the conclusion that there exists a finite-valued M̃∞ for which

M̃n → M̃∞ a.s. as n → ∞.

Kronecker's lemma (with aj = j and xj = Dj) then yields (10.4.1).
Exercise 10.4.1 Use the above argument to prove the strong law

n^{−1} Σ_{i=0}^{n−1} f(Xi) → Σ_z π(z) f(z) a.s.

as n → ∞.

Since discrete-time martingales are just a special case of continuous-time martingales, we focus
on (10.5.1).
Note that

M(t) = M(0) + Σ_{i=1}^{n} ( M(it/n) − M((i − 1)t/n) ).
Theorem 10.5.1 Let (M(t) : t ≥ 0) be a square-integrable martingale with right-continuous paths
with left limits. If either:

(1/√t) E sup_{0≤s≤t} |M(s) − M(s−)| → 0

and

(1/t) [M](t) → σ^2 in probability

as t → ∞, or

(1/t) E sup_{0≤s≤t} |M(s) − M(s−)|^2 → 0,

(1/t) E sup_{0≤s≤t} |⟨M⟩(s) − ⟨M⟩(s−)| → 0,

and

(1/t) ⟨M⟩(t) → σ^2 in probability

as t → ∞, then

t^{−1/2} M(t) ⇒ σN(0, 1)

as t → ∞.
Remark 10.5.1 Note that Markov jump processes have right continuous paths with left limits, so
this result applies in the Markov jump process setting.
In discrete time, the corresponding quantities are

[M](n) = Σ_{i=1}^{n} Di^2

and

⟨M⟩(n) = Σ_{i=1}^{n} E[Di^2 | Z0, . . . , Zi−1].
Exercise 10.5.1 Use the Martingale CLT to prove that there exists σ for which

n^{−1/2} ( Σ_{i=0}^{n−1} f(Xi) − n Σ_z π(z) f(z) ) ⇒ σN(0, 1)

as n → ∞.