
Omri Sarig

Lecture Notes on Ergodic Theory


Penn State, Fall 2008
December 12, 2008
(Prepared using the Springer svmono author package)
Contents
1 Basic definitions and constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 What is ergodic theory and how it came about . . . . . . . . . . . . . . . . . 1
1.2 The abstract setup of ergodic theory . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 The probabilistic point of view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Ergodicity and mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5.1 Circle rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5.2 Angle Doubling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5.3 Bernoulli Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.4 Finite Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.5 The geodesic flow on a hyperbolic surface . . . . . . . . . . . . . . 16
1.6 Basic constructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6.1 Skew-products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.6.2 Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.6.3 The natural extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6.4 Induced transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.6.5 Suspensions and Kakutani skyscrapers . . . . . . . . . . . . . . . . . 26
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2 Ergodic Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.1 The Mean Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 The Pointwise Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 The non-ergodic case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3.1 Conditional expectations and the limit in the ergodic theorem 35
2.3.2 Conditional probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.3 The ergodic decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4 The Subadditive Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 The Multiplicative Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5.1 Preparations from Multilinear Algebra . . . . . . . . . . . . . . . . . 43
2.5.2 Proof of the Multiplicative Ergodic Theorem . . . . . . . . . . . . 48
2.5.3 The Multiplicative Ergodic Theorem for Invertible Cocycles 57
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3 Spectral Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.1 The spectral approach to ergodic theory . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Weak mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.1 Definition and characterization . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.2 Spectral measures and weak mixing . . . . . . . . . . . . . . . . . . . 66
3.3 The Koopman operator of a Bernoulli scheme . . . . . . . . . . . . . . . . . 69
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.1 Information content and entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.2 Properties of the entropy of a partition . . . . . . . . . . . . . . . . . . . . . . . 79
4.2.1 The entropy of α ∨ β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2.2 Convexity properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2.3 Information and independence . . . . . . . . . . . . . . . . . . . . . . . 81
4.3 The Metric Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.1 Definition and meaning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.2 The Shannon–McMillan–Breiman Theorem . . . . . . . . . . . . 84
4.3.3 Sinai's Generator theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.4.1 Bernoulli schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.4.2 Irrational rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.4.3 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.4.4 Expanding Markov Maps of the Interval . . . . . . . . . . . . . . . 88
4.5 Abramov's Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.6 Topological Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.6.1 The Adler–Konheim–McAndrew definition . . . . . . . . . . . . . 91
4.6.2 Bowen's definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.6.3 The variational principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
A The isomorphism theorem for standard measure spaces . . . . . . . . . . . 99
A.1 Polish spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
A.2 Standard probability spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
A.3 Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
A.4 The isomorphism theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
A The Monotone Class Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Chapter 1
Basic definitions and constructions
1.1 What is ergodic theory and how it came about
Dynamical systems and ergodic theory. Ergodic theory is a part of the theory of dynamical systems. In its simplest form, a dynamical system is a function T defined on a set X. The iterates of the map are defined by induction: T^0 := id, T^n := T ∘ T^{n−1}, and the aim of the theory is to describe the behavior of T^n(x) as n → ∞.
More generally, one may consider the action of a semi-group of transformations, namely a family of maps T_g : X → X (g ∈ G) satisfying T_{g_1} ∘ T_{g_2} = T_{g_1 g_2}. In the particular case G = ℝ^+ or G = ℝ we have a family of maps T_t such that T_t ∘ T_s = T_{t+s}, and we speak of a semi-flow or a flow.
The original motivation was classical mechanics. There X is the set of all possible states of a given dynamical system (sometimes called the configuration space or phase space), and T : X → X is the law of motion, which prescribes that if the system is at state x now, then it will evolve to state T(x) after one unit of time. The orbit {T^n(x)}_{n∈ℤ} is simply a record of the time evolution of the system, and understanding the behavior of T^n(x) as n → ∞ is the same as understanding the behavior of the system in the far future. Flows T_t arise when one insists on studying continuous, rather than discrete, time. More complicated group actions, e.g. ℤ^d-actions, arise in material science. There x ∈ X codes the configuration of a d-dimensional lattice (e.g. a crystal), and {T_v : v ∈ ℤ^d} are the symmetries of the lattice.
The theory of dynamical systems splits into subfields which differ by the structure one imposes on X and T:
1. Differentiable dynamics deals with actions of differentiable maps on smooth manifolds;
2. Topological dynamics deals with actions of continuous maps on topological spaces, usually compact metric spaces;
3. Ergodic theory deals with measure preserving actions of measurable maps on a measure space, usually assumed to be finite.
It may seem strange to assume so little on X and T. The discovery that such meagre assumptions yield non-trivial information is due to Poincaré, who should be considered the progenitor of the field.
Poincaré's Recurrence Theorem and the birth of ergodic theory. Imagine a box filled with gas, made of N identical molecules. Classical mechanics says that if we know the positions q_i = (q_i^1, q_i^2, q_i^3) and momenta p_i = (p_i^1, p_i^2, p_i^3) of the i-th molecule for all i = 1, ..., N, then we can determine the positions and momenta of each molecule at time t by solving Hamilton's equations

  dp_i^j/dt = −∂H/∂q_i^j ,   dq_i^j/dt = ∂H/∂p_i^j .   (1.1)

H = H(q_1, ..., q_N; p_1, ..., p_N), the Hamiltonian, is the total energy of the system.
It is natural to call (q, p) := (q_1, ..., q_N; p_1, ..., p_N) the state of the system. Let X denote the collection of all possible states. If we assume (as we may) that the total energy is bounded above, then this is an open bounded subset of ℝ^{6N}. Let

  T_t : (q, p) → (q(t), p(t))

denote the map which gives the solution of (1.1) with initial condition (q(0), p(0)). If H is sufficiently regular, then (1.1) has a unique solution for all t and every initial condition. The uniqueness of the solution implies that (T_t) is a flow. The law of conservation of energy implies that x ∈ X ⇒ T_t(x) ∈ X for all t.
Question: Suppose the system starts at a certain state (q(0), p(0)). Will it eventually return to a state close to (q(0), p(0))?

For general H, the question seems intractable, because (1.1) is a strongly coupled system of an enormous number of equations (N ≈ 10^{24}). Poincaré's startling discovery is that the question is trivial, if viewed from the right perspective. To understand his solution, we need to recall a classical fact, known as Liouville's theorem: the Lebesgue measure m on X satisfies m(T_t E) = m(E) for all t and all measurable E ⊂ X (problem 1.1).
Here is Poincaré's solution. Define T := T_1, and observe that T^n = T_n. Fix ε > 0 and consider the set W of all states x = (q, p) such that d(x, T^n(x)) > ε for all n ≥ 1. Divide W into finitely many disjoint pieces W_i of diameter less than ε.

For each fixed i, the sets T^{−n}(W_i) (n ≥ 1) are pairwise disjoint: if T^{−n}(W_i) ∩ T^{−(n+k)}(W_i) ≠ ∅, then W_i ∩ T^{−k}(W_i) ≠ ∅, so there is some x ∈ W_i ∩ T^{−k}(W_i), and

1. x ∈ T^{−k}(W_i) implies that T^k(x) ∈ W_i, whence d(x, T^k x) ≤ diam(W_i) < ε; whereas
2. x ∈ W_i ⊂ W implies that d(x, T^k x) > ε by the definition of W;

a contradiction.

Since {T^{−n}W_i}_{n≥1} are pairwise disjoint, m(X) ≥ ∑_{k≥1} m(T^{−k}W_i). But the sets T^{−k}(W_i) all have the same measure (Liouville's theorem), and m(X) < ∞, so we must have m(W_i) = 0. Summing over i we get that m(W) = 0. In summary, a.e. x has the property that d(T^n(x), x) ≤ ε for some n ≥ 1. Considering the countable collection ε = 1/n, we obtain the following:
Poincaré's Recurrence Theorem: For almost every x = (q(0), p(0)), if the system is at state x at time zero, then it will return arbitrarily close to this state infinitely many times in the arbitrarily far future.
Poincaré's Recurrence Theorem is a tour de force, because it turns a problem which looks intractable into a triviality, simply by looking at it from a different angle. The only thing the solution requires is the existence of a finite measure m on X such that m(T^{−1}E) = m(E) for all measurable sets E. This startling realization raises the following mathematical question: what other dynamical information can one extract from the existence of a measure m such that m = m ∘ T^{−1}? Of particular interest was the justification of the following assumption, made by Boltzmann in his work on statistical mechanics:
The Ergodic Hypothesis: For certain invariant measures μ, many functions f : X → ℝ, and many states x = (q, p), the time average of f,

  lim_{T→∞} (1/T) ∫_0^T f(T_t(x)) dt,

exists, and equals the space average of f, namely (1/μ(X)) ∫ f dμ.

(This is not Boltzmann's original formulation.) The ergodic hypothesis is a quantitative version of Poincaré's recurrence theorem: if f is the indicator of the ε-ball around a state x, then the time average of f is the frequency of times when T_t(x) is within ε of x, and the ergodic hypothesis is a statement on its value.
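A numerical illustration of the hypothesis (a Python sketch, not part of the notes): for an irrational circle rotation, an example for which the time-average statement does hold (see Section 1.5.1), the Birkhoff average of the indicator of [0, 1/2) approaches its space average 1/2.

```python
import math

def time_average(f, x0, n_steps, alpha):
    """Birkhoff average (1/N) sum_{n<N} f(T^n x0) for the rotation T = R_alpha."""
    x, total = x0, 0.0
    for _ in range(n_steps):
        total += f(x)
        x = (x + alpha) % 1.0
    return total / n_steps

f = lambda x: 1.0 if x < 0.5 else 0.0   # indicator of [0, 1/2)
space_avg = 0.5                          # its integral over [0, 1)

alpha = math.sqrt(2) - 1
print(time_average(f, 0.3, 100000, alpha))   # close to 0.5
```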
1.2 The abstract setup of ergodic theory
The proof of Poincaré's Recurrence Theorem suggests the study of the following setup.

Definition 1.1. A measure space is a triplet (X, B, μ) where

1. X is a set, sometimes called the space.
2. B is a σ-algebra, namely a collection of subsets of X which contains the empty set, and which is closed under complements and countable unions. The elements of B are called measurable sets.
3. μ : B → [0, ∞], called the measure, is a σ-additive function, namely a function s.t. if E_1, E_2, ... ∈ B are pairwise disjoint, then μ(⊎_i E_i) = ∑_i μ(E_i).

If μ(X) = 1 then we say that μ is a probability measure and (X, B, μ) is a probability space.

In order to avoid measure theoretic pathologies, we will always assume that (X, B, μ) is the completion (see problem 1.2) of a standard measure space, namely a measure space (X, B', μ'), where X is a complete, separable metric space and B' is its Borel σ-algebra. It can be shown that such spaces are Lebesgue spaces, namely measure spaces which are isomorphic to a compact interval equipped with the Lebesgue σ-algebra and measure, together with a finite or countable collection of points with positive measure. See the appendix for details.
Definition 1.2. A measure preserving transformation (mpt) is a quartet (X, B, μ, T) where (X, B, μ) is a measure space, and
1. T is measurable: E ∈ B ⇒ T^{−1}E ∈ B;
2. μ is T-invariant: μ(T^{−1}E) = μ(E) for all E ∈ B.
A probability preserving transformation (ppt) is an mpt on a probability space.
This is the minimal setup needed to prove (problem 1.3):

Theorem 1.1 (Poincaré's Recurrence Theorem). Suppose (X, B, μ, T) is a p.p.t. If E is a measurable set, then for almost every x ∈ E there is a sequence n_k ↑ ∞ such that T^{n_k}(x) ∈ E.

Poincaré's theorem is not true for general infinite measure preserving transformations, as the example T(x) = x + 1 on ℤ demonstrates.
Having defined the objects of the theory, we proceed to declare when we consider two objects to be isomorphic:

Definition 1.3. Two mpt's (X_i, B_i, μ_i, T_i) (i = 1, 2) are called measure theoretically isomorphic if there exists a measurable map π : X_1 → X_2 such that
1. there are X'_i ∈ B_i such that μ_i(X_i \ X'_i) = 0 and such that π : X'_1 → X'_2 is invertible with measurable inverse;
2. for every E ∈ B_2, π^{−1}(E) ∈ B_1 and μ_1(π^{−1}E) = μ_2(E);
3. T_2 ∘ π = π ∘ T_1 on X'_1.

One of the main aims of ergodic theorists is to devise methods for deciding whether two mpt's are isomorphic.
1.3 The probabilistic point of view.
Much of the power and usefulness of ergodic theory is due to the following probabilistic interpretation of the abstract setup discussed above. Suppose (X, B, μ, T) is a ppt. We think of
1. X as a sample space, namely the collection of all possible states ω of a random system;
2. B as the collection of all measurable events, namely all sets E ⊂ X such that we have enough information to answer the question "is ω ∈ E?";
3. μ as the probability law: Pr[ω ∈ E] := μ(E);
4. measurable functions f : X → ℝ as random variables f(ω);
5. the sequence X_n := f ∘ T^n (n ≥ 1) as a stochastic process, whose distribution is given by the formula

  Pr[X_{i_1} ∈ E_{i_1}, ..., X_{i_k} ∈ E_{i_k}] := μ( ⋂_{j=1}^k {x ∈ X : f(T^{i_j} x) ∈ E_{i_j}} ).
The invariance of μ guarantees that such stochastic processes are always stationary:

  Pr[X_{i_1+m} ∈ E_{i_1}, ..., X_{i_k+m} ∈ E_{i_k}] = Pr[X_{i_1} ∈ E_{i_1}, ..., X_{i_k} ∈ E_{i_k}] for all m.

The point is that we can ask what are the properties of the stochastic processes {f ∘ T^n}_{n≥1} arising out of the ppt (X, B, μ, T), and thus bring in tools and intuition from probability theory to the study of dynamical systems. Note that we have found a way of studying stochastic phenomena in a context which is, a priori, completely deterministic (if we know that the state of the system at time zero is x, then we know with full certainty that its state at time n is T^n(x)). The modern treatment of the question "how come a deterministic system can behave randomly?" is based on this idea.
1.4 Ergodicity and mixing
Suppose (X, B, μ, T) is an mpt. A measurable set E ∈ B is called invariant if T^{−1}(E) = E. Evidently, in this case T can be split into two measure preserving transformations T|_E : E → E and T|_{E^c} : E^c → E^c, which do not interact.

Definition 1.4. An mpt (X, B, μ, T) is called ergodic if every invariant set E satisfies μ(E) = 0 or μ(X \ E) = 0. We say that μ is an ergodic measure.
Proposition 1.1. Suppose (X, B, μ, T) is an mpt on a complete measure space. Then the following are equivalent:
1. μ is ergodic;
2. if E ∈ B and μ(T^{−1}E △ E) = 0, then μ(E) = 0 or μ(X \ E) = 0;
3. if f : X → ℝ is measurable and f ∘ T = f a.e., then there is c ∈ ℝ s.t. f = c a.e.

Proof. Suppose μ is ergodic, and E is measurable s.t. μ(E △ T^{−1}E) = 0. We construct a measurable set E_0 such that T^{−1}E_0 = E_0 and μ(E_0 △ E) = 0. By ergodicity μ(E_0) = 0 or μ(X \ E_0) = 0. Since μ(E △ E_0) = 0 implies that μ(E) = μ(E_0) and μ(X \ E) = μ(X \ E_0), we get that either μ(E) = 0 or μ(X \ E) = 0.
The set E_0 we use is E_0 := {x ∈ X : T^k(x) ∈ E infinitely often}. It is obvious that this set is measurable and invariant. To estimate μ(E_0 △ E) note that
(a) if x ∈ E_0 \ E, then there exists some k s.t. x ∈ T^{−k}(E) \ E;
(b) if x ∈ E \ E_0, then there exists some k s.t. x ∉ T^{−k}(E), whence x ∈ E \ T^{−k}(E).
Thus E_0 △ E ⊂ ⋃_{k≥1} (E △ T^{−k}E). We now use the following "triangle inequality": μ(A_1 △ A_3) ≤ μ(A_1 △ A_2) + μ(A_2 △ A_3) (A_i ∈ B) (prove!):

  μ(E_0 △ E) ≤ ∑_{k=1}^∞ μ(E △ T^{−k}E) ≤ ∑_{k=1}^∞ ∑_{i=0}^{k−1} μ(T^{−i}E △ T^{−(i+1)}E)
             = ∑_{k=1}^∞ k · μ(E △ T^{−1}E)   (∵ μ ∘ T^{−1} = μ).
Since μ(E △ T^{−1}E) = 0, μ(E_0 △ E) = 0, and we have shown that (1) implies (2).

Next assume (2), and let f be a measurable function s.t. f ∘ T = f almost everywhere. For every t, [f > t] △ T^{−1}[f > t] ⊂ [f ≠ f ∘ T], so

  μ([f > t] △ T^{−1}[f > t]) = 0.

By assumption, this implies that either μ[f > t] = 0 or μ[f ≤ t] = 0. In other words, either f ≤ t a.e., or f > t a.e. Define c := sup{t : f > t a.e.}; then f = c almost everywhere, proving (3). The implication (3) ⇒ (1) is obvious: take f = 1_E. ∎
An immediate corollary is that ergodicity is an invariant of measure theoretic isomorphism: if two mpt's are isomorphic, then the ergodicity of one implies the ergodicity of the other.

The next definition is motivated by the probabilistic notion of independence. Suppose (X, B, μ) is a probability space. We think of elements of B as events, we interpret measurable functions f : X → ℝ as random variables, and we view μ as a probability law μ(E) = P[x ∈ E]. Two events E, F ∈ B are called independent if μ(E ∩ F) = μ(E)μ(F) (because in the case μ(E), μ(F) ≠ 0 this is equivalent to saying that μ(E|F) = μ(E) and μ(F|E) = μ(F)).

Definition 1.5. A probability preserving transformation (X, B, μ, T) is called mixing (or strongly mixing) if for all E, F ∈ B, μ(E ∩ T^{−k}F) → μ(E)μ(F) as k → ∞. (There is no notion of strong mixing for infinite measure spaces.)

In other words, T^{−k}(F) is asymptotically independent of E. It is easy to see that strong mixing is an invariant of measure theoretic isomorphism.

It can be shown that the sets E, F in the definition of mixing can be taken to be equal (problem 1.12).
Proposition 1.2. Strong mixing implies ergodicity.

Proof. Suppose E is invariant. Then μ(E) = μ(E ∩ T^{−n}E) → μ(E)^2 as n → ∞, whence μ(E)^2 = μ(E). It follows that μ(E) = 0 or μ(E) = 1 = μ(X). ∎
Just like ergodicity, strong mixing can be defined in terms of functions. Before we state the condition, we recall a relevant notion from statistics. The correlation coefficient of f, g ∈ L^2(μ) is defined to be

  ρ(f, g) := ( ∫ f g dμ − ∫ f dμ ∫ g dμ ) / ( ||f − ∫ f dμ||_2 · ||g − ∫ g dμ||_2 ).

The numerator is equal to

  Cov(f, g) := ∫ [ (f − ∫ f dμ)(g − ∫ g dμ) ] dμ,

called the covariance of f, g. The idea behind this quantity is that if f, g are weakly correlated, then they will not always deviate from their means in the same direction, leading to many cancellations in the integral, and a small net result. If f, g are strongly correlated, there will be fewer cancellations, and a larger net result. (The denominator in the definition of ρ is not important; it is there to force ρ(f, g) to take values in [−1, 1].)
Proposition 1.3. A ppt (X, B, μ, T) is strongly mixing iff for every f, g ∈ L^2,

  ∫ f · (g ∘ T^n) dμ → ∫ f dμ ∫ g dμ as n → ∞, equivalently Cov(f, g ∘ T^n) → 0.
Proof. The condition implies mixing (take f = 1_E, g = 1_F). We show the other direction. We need the following simple observations:
1. Since μ ∘ T^{−1} = μ, ||f ∘ T||_p = ||f||_p for all f ∈ L^p and 1 ≤ p ≤ ∞;
2. Cov(f, g) is bilinear in f, g;
3. |Cov(f, g)| ≤ 4 ||f||_2 ||g||_2.
The first two statements are left as an exercise. For the third we use the Cauchy–Schwarz inequality (twice) to observe that

  |Cov(f, g)| ≤ ||f − ∫ f||_2 · ||g − ∫ g||_2 ≤ (||f||_2 + ||f||_1)(||g||_2 + ||g||_1) ≤ (2||f||_2)(2||g||_2).

Assume that μ is mixing, and let f, g be two elements of L^2. If f, g are indicators of measurable sets, then Cov(f, g ∘ T^n) → 0 by mixing. If f, g are finite linear combinations of indicators, Cov(f, g ∘ T^n) → 0 because of the bilinearity of the covariance. For general f, g ∈ L^2, we can find for every ε > 0 finite linear combinations of indicators f_ε, g_ε s.t. ||f − f_ε||_2, ||g − g_ε||_2 < ε. By the observations above,

  |Cov(f, g ∘ T^n)| ≤ |Cov(f − f_ε, g ∘ T^n)| + |Cov(f_ε, g_ε ∘ T^n)| + |Cov(f_ε, (g − g_ε) ∘ T^n)|
                   ≤ 4ε||g||_2 + o(1) + 4ε(||f||_2 + ε), as n → ∞.

It follows that limsup |Cov(f, g ∘ T^n)| ≤ 4ε(||f||_2 + ||g||_2 + ε). Since ε is arbitrary, the limsup, whence the limit itself, is equal to zero. ∎
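The covariance criterion can also be checked by simulation. The Python sketch below (not part of the notes) estimates Cov(f, g ∘ T^n) for the doubling map T(x) = 2x mod 1 of Section 1.5.2 by Monte Carlo sampling of Lebesgue measure (which is T-invariant): at lag 0 the covariance is the variance of the indicator, about 1/4, while at lag 10 it is already near zero.

```python
import random

def doubling(x):
    """The angle doubling map T(x) = 2x mod 1."""
    return (2.0 * x) % 1.0

def estimate_cov(f, g, n, samples=100000, seed=0):
    """Monte Carlo estimate of Cov(f, g o T^n), sampling x uniformly."""
    rng = random.Random(seed)
    fg = fs = gs = 0.0
    for _ in range(samples):
        x = rng.random()
        y = x
        for _ in range(n):
            y = doubling(y)
        fx, gy = f(x), g(y)
        fg += fx * gy
        fs += fx
        gs += gy
    fg /= samples; fs /= samples; gs /= samples
    return fg - fs * gs

E = lambda x: 1.0 if x < 0.5 else 0.0   # indicator of [0, 1/2)
print(estimate_cov(E, E, 0))    # the variance, about 0.25
print(estimate_cov(E, E, 10))   # near 0: mixing at lag 10
```

(Float arithmetic only supports a limited number of doublings before the binary expansion is exhausted, so the sketch keeps the lag small.)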
1.5 Examples
We illustrate these definitions by examples.
1.5.1 Circle rotations
Let T := [0, 1) be equipped with the Lebesgue measure m, and define for α ∈ [0, 1) the map R_α : T → T by R_α(x) = x + α mod 1. R_α is called a circle rotation, because the map ϑ(x) = exp[2πix] is an isomorphism between R_α and the rotation by the angle 2πα on S^1.
Proposition 1.4.
1. R_α is measure preserving for every α;
2. R_α is ergodic iff α ∉ ℚ;
3. R_α is never strongly mixing.

Proof. A direct calculation shows that the Lebesgue measure m satisfies m(R_α^{−1}I) = m(I) for all intervals I ⊂ [0, 1). Thus the collection M := {E ∈ B : m(R_α^{−1}E) = m(E)} contains the algebra of finite disjoint unions of intervals. It is easy to check that M is a monotone class, so by the monotone class theorem (see appendix) M contains all Borel sets. It clearly contains all null sets. Therefore it contains all Lebesgue measurable sets. Thus M = B, and (1) is proved.

We prove (2). Suppose first that α = p/q for p, q ∈ ℕ. Then R_α^q = id. Fix some x ∈ [0, 1), and pick ε so small that the ε-neighborhoods of x + kα mod 1 for k = 0, ..., q − 1 are disjoint. The union of these neighborhoods is an invariant set of positive measure, and if ε is sufficiently small then it is not equal to T. Thus R_α is not ergodic.

Next assume that α ∉ ℚ. Suppose E is an invariant set, and set f = 1_E. Expand f into a Fourier series:

  f = ∑_{n∈ℤ} f̂(n) e^{2πint}   (convergence in L^2).

The invariance of E dictates f = f ∘ R_α. The Fourier expansion of f ∘ R_α is

  f ∘ R_α = ∑_{n∈ℤ} e^{2πinα} f̂(n) e^{2πint}.

Equating coefficients, we see that f̂(n) = f̂(n) exp[2πinα]. Thus either f̂(n) = 0 or exp[2πinα] = 1. Since α ∉ ℚ, f̂(n) = 0 for all n ≠ 0. We obtain that f = f̂(0) a.e., whence 1_E = m(E) almost everywhere. This can only happen if m(E) = 0 or m(E) = 1, proving the ergodicity of m.

To show that m is not mixing, we consider the function f(x) = exp[2πix]. This function satisfies f ∘ R_α = λf with λ = exp[2πiα] (such a function is called an eigenfunction). For every α there is a sequence n_k ↑ ∞ s.t. n_k α mod 1 → 0 (Dirichlet's theorem), thus

  ||f ∘ R_α^{n_k} − f||_2 = |λ^{n_k} − 1| → 0 as k → ∞.

It follows that F := Re(f) = cos(2πx) satisfies ||F ∘ R_α^{n_k} − F||_2 → 0, whence ∫ (F ∘ R_α^{n_k}) · F dm → ∫ F^2 dm ≠ (∫ F dm)^2, and m is not mixing. ∎
1.5.2 Angle Doubling
Again, we work with T := [0, 1] equipped with the Lebesgue measure m, and define T : T → T by T(x) = 2x mod 1. T is called the angle doubling map, because the map ϑ(x) := exp[2πix] is an isomorphism between T and the map e^{iθ} → e^{2iθ} on S^1.
Proposition 1.5. The angle doubling map is probability preserving and strongly mixing, whence ergodic.

Proof. It is convenient to work with binary expansions x = 0.d_1 d_2 d_3... (d_i = 0, 1), because with this representation T(0.d_1 d_2 ...) = 0.d_2 d_3 .... For every finite n-word of zeroes and ones (d_1, ..., d_n), define the sets (called cylinders)

  [d_1, ..., d_n] := {x ∈ [0, 1) : x = 0.d_1 ... d_n ε_{n+1} ε_{n+2} ..., for some ε_i ∈ {0, 1}}.

This is a (dyadic) interval, of length 1/2^n.

It is clear that T^{−1}[d_1, ..., d_n] = [∗, d_1, ..., d_n], where ∗ stands for 0 or 1. Thus m(T^{−1}[d]) = m[0, d] + m[1, d] = 2 · 2^{−(n+1)} = 2^{−n} = m[d]. We see that M := {E ∈ B : m(T^{−1}E) = m(E)} contains the algebra of finite disjoint unions of cylinders. Since M is obviously a monotone class, and since the cylinders generate the Borel σ-algebra (prove!), we get that M = B, whence T is measure preserving.

We prove that T is mixing. Suppose f, g are indicators of cylinders: f = 1_{[a_1,...,a_n]}, g = 1_{[b_1,...,b_m]}. Then for all k > n,

  ∫ f · (g ∘ T^k) dm = m[a_1, ..., a_n, ∗, ..., ∗, b_1, ..., b_m]  (with k − n stars)  = m[a] · m[b].

Thus Cov(f, g ∘ T^k) → 0 as k → ∞ for all indicators of cylinders. Since every L^2 function can be approximated in L^2 by finite linear combinations of indicators of cylinders (prove!), one can proceed as in the proof of Proposition 1.3 to show that Cov(f, g ∘ T^k) → 0 for all L^2 functions. ∎
1.5.3 Bernoulli Schemes
Let S be a finite set, called the alphabet, and let X := S^ℕ be the set of all one-sided infinite sequences of elements of S. Impose the following metric on X:

  d((x_n)_{n≥0}, (y_n)_{n≥0}) := 2^{−min{k : x_k ≠ y_k}}.   (1.2)

The resulting topology is generated by the collection of cylinders

  [a_0, ..., a_{n−1}] := {x ∈ X : x_i = a_i (0 ≤ i ≤ n−1)}.

It can also be characterized as the product topology on S^ℕ, when S is given the discrete topology. In particular, this topology is compact.

The left shift is the transformation T : (x_0, x_1, x_2, ...) → (x_1, x_2, ...). The left shift is continuous.

Next fix a probability vector p = (p_a)_{a∈S}, namely a vector of positive numbers whose sum is equal to one.
Definition 1.6. The Bernoulli measure corresponding to p is the unique measure μ on the Borel σ-algebra of X such that μ[a_0, ..., a_{n−1}] = p_{a_0} ··· p_{a_{n−1}} for all cylinders.

It is useful to recall why such a measure exists. The collection of cylinders is a semi-algebra, namely a collection S such that
1. S is closed under intersections, and
2. for every A ∈ S, X \ A is a finite disjoint union of elements of S.
Carathéodory's Extension Theorem says that every σ-additive function on a semi-algebra S has a unique extension to a σ-additive function on the σ-algebra generated by S. Thus it is enough to check that

  μ : [a_0, ..., a_{n−1}] → p_{a_0} ··· p_{a_{n−1}}

is σ-additive on S, namely that if [a] is a countable disjoint union of cylinders [b_j], then μ[a] = ∑ μ[b_j].

Each cylinder is open and compact (prove!), so such unions are necessarily finite. Let N be the maximal length of the cylinders [b_j]. Since [b_j] ⊂ [a], we can write [b_j] = [a, ξ_j] = ⊎_{|c| = N−|b_j|} [a, ξ_j, c], and a direct calculation shows that

  ∑_{|c| = N−|b_j|} μ[a, ξ_j, c] = μ[a, ξ_j] ( ∑_c p_c )^{N−|b_j|} = μ[a, ξ_j] = μ[b_j].

Summing over j, we get that

  ∑_j μ[b_j] = ∑_j ∑_{|c| = N−|b_j|} μ[a, ξ_j, c].

Now [a] = ⊎_j [b_j] = ⊎_j ⊎_{|c| = N−|b_j|} [a, ξ_j, c], so the collection of words (ξ_j, c) is equal to the collection of all possible words w of length N − |a| (otherwise the right hand side misses some sequences). Thus

  ∑_j μ[b_j] = ∑_{|w| = N−|a|} μ[a, w] = μ[a] ( ∑_s p_s )^{N−|a|} = μ[a],

proving the σ-additivity of μ on S. The existence and uniqueness of μ now follows from Carathéodory's theorem.
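The consistency conditions behind the extension argument can be checked mechanically for a concrete probability vector. The Python sketch below (not part of the notes; the two-letter alphabet with p = (1/3, 2/3) is an arbitrary choice) verifies in exact arithmetic that refining a cylinder by one coordinate preserves its measure, and that the cylinders of a fixed length have total measure one.

```python
from fractions import Fraction
from itertools import product

p = {'a': Fraction(1, 3), 'b': Fraction(2, 3)}   # probability vector

def mu(word):
    """Bernoulli measure of the cylinder [word]: product of the p's."""
    m = Fraction(1)
    for s in word:
        m *= p[s]
    return m

# refining [w] one more coordinate preserves its measure
w = ('a', 'b')
print(mu(w), sum(mu(w + (s,)) for s in p))       # equal: 2/9 each

# the cylinders of length N partition X, so their measures sum to 1
N = 4
total = sum(mu(word) for word in product(p.keys(), repeat=N))
print(total)   # 1
```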
Proposition 1.6. Suppose X = {0, 1}^ℕ, μ is the (1/2, 1/2)-Bernoulli measure, and σ is the left shift. Then (X, B(X), μ, σ) is measure theoretically isomorphic to the angle doubling map.

Proof. The isomorphism is π(x_0, x_1, ...) := ∑_{n≥0} x_n 2^{−(n+1)}. This is a bijection between

  X' := {x ∈ {0, 1}^ℕ : ∄ n s.t. x_m = 1 for all m ≥ n}

and [0, 1) (prove that μ(X') = 1), and it is clear that π ∘ σ = T ∘ π. Since the image of a cylinder of length n is a dyadic interval of length 2^{−n}, π preserves the measures of cylinders. The collection of measurable sets which are mapped by π to sets of the same measure is a σ-algebra. Since this σ-algebra contains all the cylinders and all the null sets, it contains all measurable sets. ∎
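The conjugacy π ∘ σ = T ∘ π can be verified exactly on finite binary strings (sequences with a tail of zeroes), since π is then a rational number. A Python sketch, not from the notes:

```python
from fractions import Fraction

def pi(seq):
    """pi(x_0, x_1, ...) = sum of x_n * 2^{-(n+1)}: digits -> point of [0,1)."""
    return sum(Fraction(d, 2 ** (n + 1)) for n, d in enumerate(seq))

def doubling(y):
    """The angle doubling map, exact on rationals."""
    return (2 * y) % 1

x = (1, 0, 1, 1, 0, 0, 1)
print(pi(x), pi(x[1:]))               # 89/128 and 25/64
print(doubling(pi(x)) == pi(x[1:]))   # True: pi(sigma x) = T(pi(x))
```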
Proposition 1.7. Every Bernoulli scheme is mixing, whence ergodic.
The proof is the same as in the case of the angle doubling map, and is therefore
omitted.
1.5.4 Finite Markov Chains
We saw that the angle doubling map is isomorphic to a dynamical system acting
as the left shift on a space of sequences (a Bernoulli scheme). Such representations
appear frequently in the theory of dynamical systems, but more often than not, the
space of sequences is slightly more complicated than the set of all sequences.
1.5.4.1 Subshifts of finite type
Let S be a finite set, and A = (t_{ij})_{S×S} a matrix of zeroes and ones, without columns or rows made entirely of zeroes.

Definition 1.7. The subshift of finite type with alphabet S and transition matrix A is the set

  Σ_A^+ := {x = (x_0, x_1, ...) ∈ S^ℕ : t_{x_i x_{i+1}} = 1 for all i},

together with the metric d(x, y) := 2^{−min{k : x_k ≠ y_k}} and the action σ(x_0, x_1, x_2, ...) = (x_1, x_2, ...).

This is a compact metric space, and the left shift map σ : Σ_A^+ → Σ_A^+ is continuous. We think of Σ_A^+ as the space of all infinite paths on a directed graph with vertices S and an edge a → b for every pair a, b ∈ S such that t_{ab} = 1.
Let Σ_A^+ be a topologically mixing SFT with set of states S, |S| < ∞, and transition matrix A = (A_{ab})_{S×S}.

A stochastic matrix is a matrix P = (p_{ab})_{a,b∈S} with non-negative entries, such that ∑_b p_{ab} = 1 for all a, i.e. P1 = 1. The matrix is called compatible with A if A_{ab} = 0 ⇒ p_{ab} = 0.

A probability vector is a vector p = (p_a : a ∈ S) of non-negative entries s.t. ∑ p_a = 1. A stationary probability vector is a probability vector p = (p_a : a ∈ S) s.t. pP = p, i.e. ∑_a p_a p_{ab} = p_b for all b.

Given a probability vector p and a stochastic matrix P compatible with A, one can define a Markov measure μ on Σ_A^+ (or Σ_A) by

  μ[a_0, ..., a_{n−1}] := p_{a_0} p_{a_0 a_1} ··· p_{a_{n−2} a_{n−1}}.

The stochasticity of P guarantees that this measure is finitely additive on the algebra of cylinders, and σ-additivity follows because cylinders are open and compact, so a countable disjoint union of cylinders equal to a cylinder is necessarily finite. Thus this gives a Borel probability measure on Σ_A^+.
Proposition 1.8. μ is shift invariant iff p is stationary w.r.t. P. Every stochastic matrix has a stationary probability vector.

Proof. To see the first half of the statement, we note that μ is shift invariant iff μ[∗, b] = μ[b] for all cylinders [b] = [b_0, ..., b_{n−1}], which is equivalent to

  ∑_a p_a p_{a b_0} p_{b_0 b_1} ··· p_{b_{n−2} b_{n−1}} = p_{b_0} p_{b_0 b_1} ··· p_{b_{n−2} b_{n−1}}.

Canceling the identical terms on both sides gives ∑_a p_a p_{a b_0} = p_{b_0}. Thus μ is shift invariant iff p is P-stationary.

We now show that every stochastic matrix has a stationary probability vector. Consider the right action of P on the simplex of probability vectors in ℝ^N (N := |S|):

  Δ := {(x_1, ..., x_N) : x_i ≥ 0, ∑ x_i = 1},   T(x) := xP.

We have T(Δ) ⊂ Δ, since ∑_a (Tx)_a = ∑_a ∑_b x_b p_{ba} = ∑_b x_b ∑_a p_{ba} = ∑_b x_b = 1. By Brouwer's fixed point theorem (T is a continuous mapping of a compact convex set into itself), T has a fixed point. This is the stationary probability vector. ∎
Thus every stochastic matrix determines (at least one) shift invariant measure. We ask when this measure is ergodic, and when it is mixing.
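Brouwer's theorem is non-constructive, but in practice the stationary vector can often be found by simply iterating $T(x)=xP$ on the simplex. A sketch, with a made-up $2\times2$ stochastic matrix:

```python
# Iterating x -> xP on the simplex of probability vectors; for this
# made-up irreducible aperiodic P the iterates converge to the
# stationary vector p.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def step(x, P):
    n = len(P)
    return [sum(x[a] * P[a][b] for a in range(n)) for b in range(n)]

x = [1.0, 0.0]                    # start at a vertex of the simplex
for _ in range(200):
    x = step(x, P)

# pP = p has the solution p = (0.8, 0.2): 0.8*0.9 + 0.2*0.4 = 0.8
assert all(abs(xi - pi) < 1e-10 for xi, pi in zip(x, [0.8, 0.2]))
```

Convergence here is geometric with ratio given by the second eigenvalue of $P$ (here $0.5$), a fact proved in section 1.5.4.3 below.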
1.5.4.2 Ergodicity and mixing

There are obvious obstructions to ergodicity and mixing. To state them concisely, we introduce some terminology. Suppose $A$ is a transition matrix. We say that $a$ connects to $b$ in $n$ steps, and write $a\xrightarrow{n}b$, if there is a path of length $n+1$ on the directed graph representing $\Sigma_A^+$ which starts at $a$ and ends at $b$. In terms of $A$, this means that there are states $b_1,\ldots,b_{n-1}$ s.t. $t_{ab_1}t_{b_1b_2}\cdots t_{b_{n-1}b}>0$ (see problem 1.5).

Definition 1.8. A transition matrix $A=(t_{ab})_{a,b\in S}$ is called irreducible, if for every $a,b\in S$ there exists an $n$ s.t. $a\xrightarrow{n}b$. The period of an irreducible transition matrix $A$ is the number $p:=\gcd\{n : a\xrightarrow{n}a\}$ (this is independent of $a$).¹ An irreducible transition matrix is called aperiodic if its period is equal to one.

For example, the SFT with transition matrix $\begin{pmatrix}0&1\\1&0\end{pmatrix}$ is irreducible with period two.
¹ Let $p_a,p_b$ denote the periods of $a,b$, and let $\Lambda_b:=\{n : b\xrightarrow{n}b\}$. By irreducibility, there are $\alpha,\beta$ s.t. $a\xrightarrow{\alpha}b\xrightarrow{\beta}a$. In this case for all $n\in\Lambda_b$, $a\xrightarrow{\alpha}b\xrightarrow{n}b\xrightarrow{\beta}a$, whence $p_a\mid\gcd(\alpha+\beta+\Lambda_b)\mid\gcd(\Lambda_b-\Lambda_b)$. Now $\gcd(\Lambda_b-\Lambda_b)\mid\gcd(\Lambda_b)$, because $\gcd(\Lambda_b)\in\Lambda_b-\Lambda_b$. Thus $p_a\mid p_b$. By symmetry, $p_b\mid p_a$, and we obtain $p_a=p_b$.
If $A$ is not irreducible, then any (globally supported) Markov chain measure on $\Sigma_A^+$ is non-ergodic. To see why, pick $a,b\in S$ s.t. $a$ does not connect to $b$ in any number of steps. The set
$$E:=\{x\in\Sigma_A^+ : x_i\neq b \text{ for all } i \text{ sufficiently large}\}$$
is a shift invariant set which contains $[a]$, but which is disjoint from $[b]$. Thus it has positive, non-full measure w.r.t. every (globally supported) Markov chain measure: an obstruction to ergodicity.

If $A$ is irreducible, but not aperiodic, then any Markov chain measure on $\Sigma_A^+$ is non-mixing, because the function $f:=1_{[a]}$ satisfies $f\cdot f\circ\sigma^n\equiv0$ for all $n$ not divisible by the period. This means that $\int f\cdot f\circ\sigma^n\,d\mu$ is equal to zero on a subsequence, and therefore cannot converge to $\mu[a]^2$.

We claim that these are the only possible obstructions to ergodicity and mixing. The proof is based on the following fundamental fact, whose proof will be given in the next section.
Theorem 1.2 (Ergodic Theorem for Markov Chains). Suppose $P$ is a stochastic matrix, and write $P^n=(p^{(n)}_{ab})$; then $P$ has a stationary probability vector $p$, and
1. if $P$ is irreducible, then $\displaystyle\frac1n\sum_{k=1}^n p^{(k)}_{ab}\xrightarrow[n\to\infty]{}p_b$ $(a,b\in S)$;
2. if $P$ is irreducible and aperiodic, then $p^{(n)}_{ab}\xrightarrow[n\to\infty]{}p_b$ $(a,b\in S)$.
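Theorem 1.2 is easy to test numerically: for an irreducible aperiodic $P$, every row of $P^n$ converges to the stationary vector. A sketch with a made-up $2\times2$ example:

```python
# Numerical illustration of Theorem 1.2 (the matrix P is a made-up example).
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.0, 1.0],
     [0.5, 0.5]]                  # irreducible and aperiodic (loop at state 1)
p = [1/3, 2/3]                    # stationary: (1/3)*0 + (2/3)*0.5 = 1/3

Pn = P
for _ in range(100):
    Pn = mat_mul(Pn, P)

for a in range(2):
    for b in range(2):
        assert abs(Pn[a][b] - p[b]) < 1e-10    # p^(n)_ab -> p_b
```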
Corollary 1.1. A shift invariant Markov chain measure on an SFT $\Sigma_A^+$ is ergodic iff $A$ is irreducible, and mixing iff $A$ is irreducible and aperiodic.

Proof. Let $\mu$ be a Markov chain measure with stochastic matrix $P$ and stationary probability vector $p$. For all cylinders $[\underline a]=[a_0,\ldots,a_{n-1}]$ and $[\underline b]=[b_0,\ldots,b_{m-1}]$, and all $k>n$,
$$\mu([\underline a]\cap\sigma^{-k}[\underline b])=\sum_{\underline w\in W_{k-n}}\mu[\underline a,\underline w,\underline b],\qquad W_\ell:=\{\underline w=(w_1,\ldots,w_\ell) : [\underline a,\underline w,\underline b]\neq\varnothing\},$$
$$=\mu[\underline a]\sum_{\underline w\in W_{k-n}}p_{a_{n-1}w_1}p_{w_1w_2}\cdots p_{w_{k-n}b_0}\cdot\frac{\mu[\underline b]}{p_{b_0}}=\mu[\underline a]\,\mu[\underline b]\,\frac{p^{(k-n+1)}_{a_{n-1}b_0}}{p_{b_0}}.$$
If $A$ is irreducible, then by theorem 1.2,
$$\frac1n\sum_{k=0}^{n-1}\mu([\underline a]\cap\sigma^{-k}[\underline b])\xrightarrow[n\to\infty]{}\mu[\underline a]\,\mu[\underline b].$$

We claim that this implies ergodicity. Suppose $E$ is an invariant set, and fix $\varepsilon>0$, arbitrarily small. There are pairwise disjoint cylinders $A_1,\ldots,A_N$ s.t. $\mu\big(E\,\triangle\,\bigcup_{i=1}^NA_i\big)<\varepsilon$.² Thus (writing $x=y\pm\varepsilon$ for $|x-y|\leq\varepsilon$)
$$\mu(E)=\mu(E\cap\sigma^{-k}E)=\sum_{i=1}^N\mu(A_i\cap\sigma^{-k}E)\pm\varepsilon=\sum_{i,j=1}^N\mu(A_i\cap\sigma^{-k}A_j)\pm2\varepsilon.$$
Averaging over $k$, and passing to the limit, we get
$$\mu(E)=\sum_{i,j=1}^N\lim_{n\to\infty}\frac1n\sum_{k=1}^n\mu(A_i\cap\sigma^{-k}A_j)\pm2\varepsilon=\sum_{i,j=1}^N\mu(A_i)\mu(A_j)\pm2\varepsilon=\Big(\sum_{i=1}^N\mu(A_i)\Big)^2\pm2\varepsilon=[\mu(E)\pm\varepsilon]^2\pm2\varepsilon.$$
Passing to the limit $\varepsilon\to0^+$, we obtain $\mu(E)=\mu(E)^2$, whence $\mu(E)=0$ or $1$.

Now assume that $A$ is irreducible and aperiodic. The ergodic theorem for Markov chains says that $\mu([\underline a]\cap\sigma^{-k}[\underline b])\xrightarrow[k\to\infty]{}\mu[\underline a]\,\mu[\underline b]$. Since any measurable sets $E,F$ can be approximated by finite disjoint unions of cylinders, an argument similar to the previous one shows that $\mu(E\cap\sigma^{-k}F)\xrightarrow[k\to\infty]{}\mu(E)\mu(F)$ for all $E,F\in\mathcal B$, and so $\mu$ is mixing. $\square$
1.5.4.3 Proof of the Ergodic Theorem for Markov chains

Suppose first that $P$ is an irreducible and aperiodic stochastic matrix. This implies that there is some power $m$ such that all the entries of $P^m$ are strictly positive.³

Let $N:=|S|$ denote the number of states, and consider the $N$-simplex $\Delta:=\{(x_1,\ldots,x_N) : x_i\geq0,\ \sum x_i=1\}$ (the set of all probability vectors). Since $P$ is stochastic, the map $T(x)=xP$ maps $\Delta$ continuously into itself. By the Brouwer Fixed Point Theorem, there is a probability vector $p$ s.t. $pP=p$ (this is the stationary probability vector).

The set $C:=\Delta-p$ is a compact convex neighborhood of the origin, and
$$T(C)\subseteq C,\qquad T^m(C)\subseteq\mathrm{int}[C]\quad\text{(we mean the relative interior).}$$
The image is in the interior because all entries of $P^m$ are positive, so all coordinates of $T^m(x)=xP^m$ are positive for $x\in\Delta$.
² Proof: The collection of sets $E$ satisfying this approximation property is a $\sigma$-algebra which contains all cylinders, therefore it is equal to $\mathcal B$.
³ Begin by proving that if $A$ is irreducible and aperiodic, then for every $a$ there is an $N_a$ s.t. $a\xrightarrow{n}a$ for all $n>N_a$. Use this to show that for all $a,b$ there is an $N_{ab}$ s.t. $a\xrightarrow{n}b$ for all $n>N_{ab}$. Take $m=\max N_{ab}$.
Now consider $L:=\mathrm{span}(C)$ (an $(N-1)$-dimensional space). This is an invariant space for $T$, whence for $P^t$. We claim that all the eigenvalues of $P^t|_L$ have absolute value less than 1:

1. Eigenvalues of modulus larger than one are impossible, because $P$ is stochastic, so $\|vP\|_1\leq\|v\|_1$, so the spectral radius of $P^t$ cannot be more than 1.
2. Roots of unity are impossible, because in this case $P^{km}$ has eigenvalue one for some $k$, with a real-valued eigenvector $v\in L$. Normalizing it we can ensure $v\in\partial C$. But $T^{km}$ cannot have fixed points on $\partial C$, because $T^{km}(C)\subseteq\mathrm{int}[C]$.
3. Eigenvalues $e^{i\theta}$ with $\theta\notin2\pi\mathbb Q$ are impossible, because if $e^{i\theta}$ is an eigenvalue then there are two real vectors $u,v$ such that the action of $P^t$ on $\mathrm{span}\{u,v\}$ is conjugate to $\begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix}$, namely an irrational rotation. This means that there are $k_n\uparrow\infty$ s.t. $uP^{mk_n}\to u$; normalizing $u$, we may take $u\in\partial C$. But this cannot be the case, because $T^m(C)\subseteq\mathrm{int}[C]$, and by compactness $T^m(C)$ cannot accumulate on $\partial C$.

In summary, the spectral radius of $P^t|_L$ is less than one.

But $\mathbb R^N=L\oplus\mathrm{span}\{p\}$. If we decompose a general vector $v=q+tp$ with $q\in L$, then the above implies that $vP^n=tp+O(\|P^n|_L\|\,\|q\|)\xrightarrow[n\to\infty]{}tp$. It follows that $p^{(n)}_{ab}\xrightarrow[n\to\infty]{}p_b$ for all $a,b$ (take $v$ to be the $a$-th standard basis vector; then $t=1$, because the coordinates of vectors in $L$ sum to zero).

This is almost the ergodic theorem for irreducible aperiodic Markov chains; the only thing which remains is to show that $P$ has a unique stationary probability vector. Suppose $q$ is another probability vector s.t. $qP=q$. We can write $p^{(n)}_{ab}\to p_b$ in matrix form as follows:
$$P^n\xrightarrow[n\to\infty]{}Q,\quad\text{where } Q=(q_{ab})_{S\times S}\text{ and } q_{ab}=p_b.$$
This means that $q=qP^n\to qQ$, whence $qQ=q$, so $q_a=\sum_b q_bq_{ba}=\sum_b q_bp_a=p_a$.
Consider now the periodic irreducible case. Let $A$ be the transition matrix associated to $P$ (with entries $t_{ab}=1$ when $p_{ab}>0$ and $t_{ab}=0$ otherwise). Fix once and for all a state $v$. Working with the SFT $\Sigma_A^+$, we let
$$S_k:=\{b\in S : v\xrightarrow{n}b \text{ for some } n\equiv k\ \mathrm{mod}\ p\}\qquad(k=0,\ldots,p-1).$$
These sets are pairwise disjoint, because if $b\in S_{k_1}\cap S_{k_2}$, then there are $\alpha_i\equiv k_i\ \mathrm{mod}\ p$ and $\beta$ s.t. $v\xrightarrow{\alpha_i}b\xrightarrow{\beta}v$ for $i=1,2$. By the definition of the period, $p\mid\alpha_i+\beta$ for $i=1,2$, whence $k_1\equiv\alpha_1\equiv-\beta\equiv\alpha_2\equiv k_2\ \mathrm{mod}\ p$.

It is also clear that every path of length $\ell$ which starts at $S_k$ ends at $S_{k'}$, where $k'\equiv k+\ell\ \mathrm{mod}\ p$. In particular, every path of length $p$ which starts at $S_k$ ends at $S_k$. This means that if $p^{(p)}_{ab}>0$, then $a,b$ belong to the same $S_k$.

It follows that $P^p$ is conjugate, via a coordinate permutation, to a block matrix with blocks $(p^{(p)}_{ab})_{S_k\times S_k}$. Each of the blocks is stochastic, irreducible, and aperiodic. Let $\pi^{(k)}$ denote the stationary probability vectors of the blocks (viewed as vectors on $S$ supported on $S_k$). By the first part of the proof,
$$p^{(\ell p)}_{ab}\xrightarrow[\ell\to\infty]{}\pi^{(k)}_b \text{ for all } a,b \text{ in the same } S_k,\quad\text{and}\quad p^{(n)}_{ab}=0 \text{ for } n\not\equiv0\ \mathrm{mod}\ p\ (a,b\in S_k).$$
More generally, if $a\in S_{k_1}$ and $b\in S_{k_2}$, then
$$\lim_{\ell\to\infty}p^{(\ell p+k_2-k_1)}_{ab}=\lim_{\ell\to\infty}\sum_{\xi\in S_{k_2}}p^{(k_2-k_1)}_{a\xi}\,p^{(\ell p)}_{\xi b}=\sum_{\xi\in S_{k_2}}p^{(k_2-k_1)}_{a\xi}\,\pi^{(k_2)}_b\quad\text{(by the above)}$$
$$=\pi^{(k_2)}_b\sum_{\xi\in S}p^{(k_2-k_1)}_{a\xi}=\pi^{(k_2)}_b\qquad(\text{because } p^{(k_2-k_1)}_{a\xi}=0 \text{ when } \xi\notin S_{k_2}).$$
A similar calculation shows that $\lim_{k\to\infty}p^{(kp+\ell)}_{ab}=0$ when $\ell\not\equiv k_2-k_1\ \mathrm{mod}\ p$. We conclude that the following limit exists:
$$\lim_{n\to\infty}\frac1n\sum_{k=0}^{n}p^{(k)}_{ab}=\pi_b:=\frac1p\sum_{k=0}^{p-1}\pi^{(k)}_b.$$
The limit $\pi=(\pi_b)_{b\in S}$ is a probability vector.

We claim that $\pi$ is the unique stationary probability vector of $P$. The limit theorem for $p^{(n)}_{ab}$ can be written in the form $\frac1n\sum_{k=0}^{n-1}P^k\to Q$, where $Q=(q_{ab})_{S\times S}$ and $q_{ab}=\pi_b$. As before, this implies that $\pi P=\pi$, and that for any probability vector $q$ such that $qP=q$, we also have $qQ=q$, whence $q=\pi$. $\square$
Remark 1: The theorem has the following nice probabilistic interpretation. Imagine that we distribute mass on the states of $S$ according to a probability distribution $q=(q_a)_{a\in S}$. Now shift mass from one state to another using the rule that a $p_{ab}$-fraction of the mass at $a$ is moved to $b$. The new mass distribution is $qP$ (check!). After $n$ steps, the mass distribution is $qP^n$. The previous theorem says that, in the aperiodic case, the mass distribution converges to the stationary distribution, the "equilibrium state". It can be shown that the rate of convergence is exponential (problem 1.7).

Remark 2: The ergodic theorem for Markov chains has an important generalization to all matrices with non-negative entries; see problem 1.6.
1.5.5 The geodesic flow on a hyperbolic surface

The hyperbolic plane is the surface $\mathbb H:=\{z\in\mathbb C : \mathrm{Im}(z)>0\}$, equipped with the Riemannian metric $ds=2|dz|/\mathrm{Im}(z)$, which gives it constant curvature $(-1)$.

It is known that the orientation preserving isometries (i.e. distance preserving maps) are the Möbius transformations which preserve $\mathbb H$. They form the group
$$\text{Möb}(\mathbb H)=\Big\{z\mapsto\frac{az+b}{cz+d} : a,b,c,d\in\mathbb R,\ ad-bc=1\Big\}\cong\Big\{\begin{pmatrix}a&b\\c&d\end{pmatrix} : a,b,c,d\in\mathbb R,\ ad-bc=1\Big\}\Big/\{\pm\mathrm{id}\}=:\mathrm{PSL}(2,\mathbb R).$$

The geodesics (i.e. length minimizing curves) on the hyperbolic plane are vertical half-lines, or circle arcs which meet $\partial\mathbb H$ at right angles. Here is why: Suppose $\omega\in T^1\mathbb H$ is a unit tangent vector which points directly up; then it is not difficult to see that the geodesic at direction $\omega$ is a vertical line. For general unit tangent vectors $\omega$, find an element $\varphi\in\text{Möb}(\mathbb H)$ which rotates them so that $d\varphi(\omega)$ points up. The geodesic in direction $\omega$ is the $\varphi$-preimage of the geodesic in direction $d\varphi(\omega)$ (a vertical half-line). Since Möbius transformations map lines to lines or circles in a conformal way, the geodesic of $\omega$ is a circle meeting $\partial\mathbb H$ at right angles.

The geodesic flow of $\mathbb H$ is the flow $g^t$ on the unit tangent bundle of $\mathbb H$,
$$T^1\mathbb H:=\{\omega\in T\mathbb H : \|\omega\|=1\},$$
which moves $\omega$ along the geodesic it determines at unit speed.

To describe this flow it is useful to find a convenient parametrization for $T^1\mathbb H$. Fix $\omega_0\in T^1\mathbb H$ (e.g. the unit vector based at $i$ and pointing up). For every $\omega$, there is a unique $\varphi_\omega\in\text{Möb}(\mathbb H)$ such that $\omega=d\varphi_\omega(\omega_0)$; thus we can identify
$$T^1\mathbb H\cong\text{Möb}(\mathbb H)\cong\mathrm{PSL}(2,\mathbb R).$$
It can be shown that, in this coordinate system, the geodesic flow takes the form
$$g^t\begin{pmatrix}a&b\\c&d\end{pmatrix}=\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}e^{t/2}&0\\0&e^{-t/2}\end{pmatrix}.$$
To verify this, it is enough to calculate the geodesic flow on $\omega_0\leftrightarrow\begin{pmatrix}1&0\\0&1\end{pmatrix}$.
Next we describe the Riemannian volume measure on $T^1\mathbb H$ (up to normalization). Such a measure must be invariant under the action of all isometries. In our coordinate system, the isometry $\varphi(z)=(az+b)/(cz+d)$ acts by
$$\varphi\cdot\begin{pmatrix}x&y\\z&w\end{pmatrix}=\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}x&y\\z&w\end{pmatrix}.$$
Since $\mathrm{PSL}(2,\mathbb R)$ is a locally compact topological group, there is only one Borel measure on $\mathrm{PSL}(2,\mathbb R)$ (up to normalization) which is left invariant by all left translations on the group: the Haar measure of $\mathrm{PSL}(2,\mathbb R)$. Thus the Riemannian volume measure is a left Haar measure of $\mathrm{PSL}(2,\mathbb R)$, and this determines it up to normalization.

It is a particular feature of $\mathrm{PSL}(2,\mathbb R)$ that its left Haar measure is also invariant under right translations. It follows that the geodesic flow preserves the volume measure on $T^1\mathbb H$.
But this measure is infinite, and the flow is not ergodic (prove!). To obtain ergodic flows, we need to pass to compact quotients of $\mathbb H$. These are called hyperbolic surfaces.

A hyperbolic surface is a Riemannian surface $M$ such that every point in $M$ has a neighborhood $V$ which is isometric to an open subset of $\mathbb H$. Recall that a Riemannian surface is called complete, if every geodesic can be extended indefinitely in both directions.

Theorem 1.3 (Killing–Hopf Theorem). Every complete connected hyperbolic surface is isometric to $\Gamma\backslash\mathbb H$, where
1. $\Gamma$ is a subgroup of $\text{Möb}(\mathbb H)$;
2. every point $z\in\mathbb H$ is in the interior of some open set $U\subseteq\mathbb H$ s.t. $\{g(U) : g\in\Gamma\}$ are pairwise disjoint;
3. the Riemannian structure on $\{\Gamma z' : z'\in U\}$ is the one induced by the Riemannian structure on $U\subseteq\mathbb H$.

If we identify $\Gamma$ with a subgroup of $\mathrm{PSL}(2,\mathbb R)$, then we get the identification $T^1(\Gamma\backslash\mathbb H)\cong\Gamma\backslash\mathrm{PSL}(2,\mathbb R)$. It is clear that the Haar measure on $\mathrm{PSL}(2,\mathbb R)$ induces a unique locally finite measure on $\Gamma\backslash\mathrm{PSL}(2,\mathbb R)$, and that the geodesic flow on $T^1(\Gamma\backslash\mathbb H)$ takes the form
$$g^t(\Gamma\omega)=\Gamma\omega\begin{pmatrix}e^{t/2}&0\\0&e^{-t/2}\end{pmatrix},$$
and preserves this measure.

Definition 1.9. A measure preserving flow $g^t:X\to X$ is called ergodic, if every measurable set $E$ such that $g^t(E)=E$ for all $t$ satisfies $m(E)=0$ or $m(E^c)=0$. This is equivalent to asking that all $L^2$ functions $f$ such that $f\circ g^t=f$ a.e. for all $t\in\mathbb R$ are constant (prove!).
Theorem 1.4. If $\Gamma\backslash\mathbb H$ is compact, then the geodesic flow on $T^1(\Gamma\backslash\mathbb H)$ is ergodic.

Proof. Consider the following flows (the stable and unstable horocycle flows):
$$h^t_{st}(\Gamma\omega)=\Gamma\omega\begin{pmatrix}1&t\\0&1\end{pmatrix},\qquad h^t_{un}(\Gamma\omega)=\Gamma\omega\begin{pmatrix}1&0\\t&1\end{pmatrix}.$$
If we can show that any geodesic invariant function $f$ is also invariant under these flows then we are done, because it is known that
$$\Big\langle\begin{pmatrix}1&t\\0&1\end{pmatrix},\ \begin{pmatrix}1&0\\t&1\end{pmatrix},\ \begin{pmatrix}e^{t/2}&0\\0&e^{-t/2}\end{pmatrix} : t\in\mathbb R\Big\rangle=\mathrm{PSL}(2,\mathbb R)$$
(prove!), and any $\mathrm{PSL}(2,\mathbb R)$-invariant function on $\Gamma\backslash\mathrm{PSL}(2,\mathbb R)$ is constant.

Since our measure is induced by the Haar measure, the flows $h^t_{un},h^t_{st}$ are measure preserving. A matrix calculation shows:
$$g^s\circ h^t_{st}\circ g^{-s}=h^{te^{-s}}_{st}\xrightarrow[s\to\infty]{}\mathrm{id},\qquad g^{-s}\circ h^t_{un}\circ g^{s}=h^{te^{-s}}_{un}\xrightarrow[s\to\infty]{}\mathrm{id}.$$
Step 1. If $f\in L^2$, then $\|f\circ h^{te^{-s}}_{st}-f\|_2\xrightarrow[s\to\infty]{}0$.

Proof. Approximate $f$ by continuous functions of compact support, and observe that $h^t_{st}$ is an isometry of $L^2$.

Step 2. If $f\in L^2$ and $f\circ g^s=f$, then $f\circ h^t_{un}=f$ and $f\circ h^t_{st}=f$.

Proof.
$$\|f\circ h^t_{st}-f\|_2=\|f\circ g^s\circ h^t_{st}\circ g^{-s}-f\|_2=\|f\circ h^{te^{-s}}_{st}-f\|_2\xrightarrow[s\to\infty]{}0,$$
where the first equality uses $f\circ g^{\pm s}=f$ and the fact that $g^{-s}$ preserves the measure. Thus $f$ is $h^t_{st}$-invariant. A similar calculation shows that it is $h^t_{un}$-invariant, and we are done. $\square$
1.6 Basic constructions
In this section we discuss several standard methods for creating new measure preserving transformations from old ones. These constructions appear quite frequently in applications.
Products

Recall that the product of two measure spaces $(X_i,\mathcal B_i,\mu_i)$ $(i=1,2)$ is the measure space $(X_1\times X_2,\mathcal B_1\otimes\mathcal B_2,\mu_1\times\mu_2)$, where $\mathcal B_1\otimes\mathcal B_2$ is the smallest $\sigma$-algebra which contains all sets of the form $B_1\times B_2$ where $B_i\in\mathcal B_i$, and $\mu_1\times\mu_2$ is the unique measure such that $(\mu_1\times\mu_2)(B_1\times B_2)=\mu_1(B_1)\mu_2(B_2)$.

This construction captures the idea of independence from probability theory: if $(X_i,\mathcal B_i,\mu_i)$ are the probability models of two random experiments, and these experiments are independent, then $(X_1\times X_2,\mathcal B_1\otimes\mathcal B_2,\mu_1\times\mu_2)$ is the probability model of the pair of experiments: If $E_i\in\mathcal B_i$, then

$F_1:=E_1\times X_2$ is the event "in experiment 1, $E_1$ happened";
$F_2:=X_1\times E_2$ is the event "in experiment 2, $E_2$ happened";

and $F_1\cap F_2=E_1\times E_2$; so $(\mu_1\times\mu_2)(F_1\cap F_2)=(\mu_1\times\mu_2)(F_1)\,(\mu_1\times\mu_2)(F_2)$, showing that the events $F_1,F_2$ are independent.
Definition 1.10. The product of two measure preserving systems $(X_i,\mathcal B_i,\mu_i,T_i)$ $(i=1,2)$ is the measure preserving system $(X_1\times X_2,\mathcal B_1\otimes\mathcal B_2,\mu_1\times\mu_2,T_1\times T_2)$, where $(T_1\times T_2)(x_1,x_2)=(T_1x_1,T_2x_2)$.

(Check that $T_1\times T_2$ is measure preserving.)
Proposition 1.9. The product of two ergodic mpt is not always ergodic. The product of two mixing mpt is always mixing.

Proof. The product of an (ergodic) irrational rotation with itself, $S:=R_\alpha\times R_\alpha:\mathbb T^2\to\mathbb T^2$, $S(x,y)=(x+\alpha,y+\alpha)\ \mathrm{mod}\ 1$, is not ergodic: $F(x,y)=y-x\ \mathrm{mod}\ 1$ is a non-constant invariant function. (See problem 1.8.)

The product of two mixing mpt is however mixing. To see this set $\mu=\mu_1\times\mu_2$, $S=T_1\times T_2$, and $\mathscr S:=\{A\times B : A\in\mathcal B_1,B\in\mathcal B_2\}$. For any $E_1:=A_1\times B_1$, $E_2:=A_2\times B_2$ in $\mathscr S$,
$$\mu(E_1\cap S^{-n}E_2)=\mu\big((A_1\times B_1)\cap(T_1\times T_2)^{-n}(A_2\times B_2)\big)=\mu\big((A_1\cap T_1^{-n}A_2)\times(B_1\cap T_2^{-n}B_2)\big)$$
$$=\mu_1(A_1\cap T_1^{-n}A_2)\,\mu_2(B_1\cap T_2^{-n}B_2)\xrightarrow[n\to\infty]{}\mu_1(A_1)\mu_2(B_1)\,\mu_1(A_2)\mu_2(B_2)=\mu(A_1\times B_1)\,\mu(A_2\times B_2).$$
Since $\mathscr S$ is a semi-algebra which generates $\mathcal B_1\otimes\mathcal B_2$, any element of $\mathcal B_1\otimes\mathcal B_2$ can be approximated by a finite disjoint union of elements of $\mathscr S$, and therefore $\mu(E\cap S^{-n}F)\to\mu(E)\mu(F)$ for all $E,F\in\mathcal B_1\otimes\mathcal B_2$. $\square$
1.6.1 Skew-products

We start with an example. Let $\mu$ be the $(\frac12,\frac12)$-Bernoulli measure on the two shift $\Sigma_2^+:=\{0,1\}^{\mathbb N}$. Let $f:\Sigma_2^+\to\mathbb Z$ be the function $f(x_0,x_1,\ldots)=(-1)^{x_0}$. Consider the transformation
$$T_f:\Sigma_2^+\times\mathbb Z\to\Sigma_2^+\times\mathbb Z,\qquad T_f(x,k)=(\sigma(x),\,k+f(x)),$$
where $\sigma:\Sigma_2^+\to\Sigma_2^+$ is the left shift. This system preserves the (infinite) measure $\mu\times m_{\mathbb Z}$, where $m_{\mathbb Z}$ is the counting measure on $\mathbb Z$. The $n$-th iterate is
$$T^n_f(x,k)=(\sigma^nx,\ k+X_0+\cdots+X_{n-1}),\quad\text{where } X_i:=(-1)^{x_i}.$$
What we see in the second coordinate is the symmetric random walk on $\mathbb Z$, started at $k$, because (1) the steps $X_i$ take the values $\pm1$, and (2) the $X_i$ are independent because of the choice of $\mu$. We say that the second coordinate is a random walk on $\mathbb Z$ driven by the noise process $(\Sigma_2^+,\mathcal B,\mu,\sigma)$.
Here is a variation on this example. Suppose $T_0,T_1$ are two measure preserving transformations of the same measure space $(Y,\mathcal C,\nu)$. Consider the transformation $T_f$ on $(\Sigma_2^+\times Y,\mathcal B\otimes\mathcal C,\mu\times\nu)$ given by
$$T_f(x,y)=(\sigma x,\,T_{x_0}y).$$
The $n$-th iterate takes the form $T^n_f(x,y)=(\sigma^nx,\ T_{x_{n-1}}\circ\cdots\circ T_{x_0}y)$. The second coordinate looks like the random concatenation of elements of $\{T_0,T_1\}$. We say that $T_f$ is a random dynamical system driven by the noise process $(\Sigma_2^+,\mathcal B,\mu,\sigma)$.

These examples suggest the following abstract constructions.
Suppose $(X,\mathcal B,\mu,T)$ is a mpt, and $G$ is a locally compact Polish⁴ topological group, equipped with a left invariant Haar measure $m_G$. Suppose $f:X\to G$ is measurable.

Definition 1.11. The skew-product with cocycle $f$ over the basis $(X,\mathcal B,\mu,T)$ is the mpt $(X\times G,\mathcal B\otimes\mathcal B(G),\mu\times m_G,T_f)$, where $T_f:X\times G\to X\times G$ is the transformation $T_f(x,g)=(Tx,\,g\,f(x))$.

(Check, using Fubini's theorem, that this is a mpt.) The $n$-th iterate is $T^n_f(x,g)=(T^nx,\ g\,f(x)f(Tx)\cdots f(T^{n-1}x))$, a random walk on $G$.
Now imagine that the group $G$ is acting in a measure preserving way on some space $(Y,\mathcal C,\nu)$. This means that there are measurable maps $T_g:Y\to Y$ such that $\nu\circ T_g^{-1}=\nu$, $T_{g_1g_2}=T_{g_1}\circ T_{g_2}$, and $(g,y)\mapsto T_g(y)$ is measurable from $G\times Y$ to $Y$.

Definition 1.12. The random dynamical system on $(Y,\mathcal C,\nu)$ with action $\{T_g : g\in G\}$, cocycle $f:X\to G$, and noise process $(X,\mathcal B,\mu,T)$, is the system $(X\times Y,\mathcal B\otimes\mathcal C,\mu\times\nu,T_f)$ given by $T_f(x,y)=(Tx,\,T_{f(x)}y)$.

(Check using Fubini's theorem that this is measure preserving.) Here the $n$-th iterate is $T^n_f(x,y)=(T^nx,\ T_{f(T^{n-1}x)}\circ\cdots\circ T_{f(Tx)}\circ T_{f(x)}y)$.
It is obvious that if a skew-product (or a random dynamical system) is ergodic or mixing, then its base is ergodic or mixing. The converse is not always true. The ergodic properties of a skew-product depend in a subtle way on the interaction between the base and the cocycle.

Here are two important obstructions to ergodicity and mixing for skew-products. In what follows $G$ is a Polish group and $\widehat G$ is its group of characters,
$$\widehat G:=\{\gamma:G\to S^1 : \gamma \text{ is a continuous homomorphism}\}.$$

Definition 1.13. Suppose $(X,\mathcal B,\mu,T)$ is a ppt and $f:X\to G$ is Borel.
1. $f$ is called arithmetic w.r.t. $\mu$, if there exist $h:X\to S^1$ measurable and $\gamma\in\widehat G$ non-constant, such that $\gamma\circ f=h/h\circ T$ a.e.
2. $f$ is called periodic w.r.t. $\mu$, if there exist $h:X\to S^1$ measurable, $\lambda$ with $|\lambda|=1$, and $\gamma\in\widehat G$ non-constant, such that $\gamma\circ f=\lambda h/h\circ T$ a.e.

Proposition 1.10. Let $(X,\mathcal B,\mu,T)$ be a ppt, $f:X\to G$ Borel, and $(X\times G,\mathcal B\otimes\mathcal B(G),\mu\times m_G,T_f)$ the corresponding skew-product. If $f$ is arithmetic, then $T_f$ is not ergodic, and if $f$ is periodic, then $T_f$ is not mixing.
⁴ Polish = has a topology which makes it a complete separable metric space.
Proof. Suppose $f$ is arithmetic. The function $F(x,y):=h(x)\gamma(y)$ satisfies
$$F(Tx,\,y\,f(x))=h(Tx)\gamma(y)\gamma(f(x))=h(Tx)\gamma(y)\,h(x)/h(Tx)=F(x,y),$$
and we have a non-constant invariant function, meaning that $T_f$ is not ergodic. Now suppose $f$ is periodic; then $F(x,y)=h(x)\gamma(y)$ satisfies
$$F(Tx,\,y\,f(x))=h(Tx)\gamma(y)\gamma(f(x))=\lambda\,h(Tx)\gamma(y)\,h(x)/h(Tx)=\lambda F(x,y),$$
whence $F\circ T_f=\lambda F$. Pick $n_k$ s.t. $\lambda^{n_k}\to1$; then $\mathrm{Cov}(F,F\circ T^{n_k}_f)\xrightarrow[k\to\infty]{}\int|F|^2-\big|\int F\big|^2$. Since $F$ is not constant a.e., the limit is non-zero, and we get a contradiction to mixing. $\square$
1.6.2 Factors

When we construct skew-products over a base, we enrich the space. There are sometimes constructions which deplete the space.

Definition 1.14. A mpt $(X,\mathcal B,\mu,T)$ is called a (measure theoretic) factor of a mpt $(Y,\mathcal C,\nu,S)$, if there are sets of full measure $X'\subseteq X$, $Y'\subseteq Y$ such that $T(X')\subseteq X'$, $S(Y')\subseteq Y'$, and a measurable onto map $\pi:Y'\to X'$ such that $\nu\circ\pi^{-1}=\mu$ and $\pi\circ S=T\circ\pi$ on $Y'$. We call $\pi$ the factor map.

If $(X,\mathcal B,\mu,T)$ is a factor of $(Y,\mathcal C,\nu,S)$, then it is customary to call $(Y,\mathcal C,\nu,S)$ an extension of $(X,\mathcal B,\mu,T)$.

There are three principal examples:
1. Any measure theoretic isomorphism between two mpt is a factor map between them.
2. A skew-product $T_f:X\times Y\to X\times Y$ is an extension of its base $T:X\to X$. The factor map is $\pi:X\times Y\to X$, $\pi(x,y)=x$.
3. Suppose $(X,\mathcal B,\mu,T)$ is an mpt and $T$ is measurable w.r.t. a smaller $\sigma$-algebra $\mathcal C\subseteq\mathcal B$; then $(X,\mathcal C,\mu,T)$ is a factor of $(X,\mathcal B,\mu,T)$. The factor map is the identity.

We dwell a bit more on the third example. In probability theory, $\sigma$-algebras model information: a set $E$ is "measurable" if we can answer the question "is $x$ in $E$?" using the information available to us. For example, if a real number $x\in\mathbb R$ is unknown, but we can measure $|x|$, then the information we have on $x$ is modeled by the $\sigma$-algebra $\{E\subseteq\mathbb R : E=-E\}$, because we can determine whether $x\in E$ only for symmetric sets $E$. By decreasing the $\sigma$-algebra, we are forgetting some information. For example, if instead of knowing $|x|$ we only know whether $0\leq|x|\leq1$ or not, then our $\sigma$-algebra is the finite $\sigma$-algebra generated by $[-1,1]$.

Here is a typical example. Suppose we have a dynamical system $(X,\mathcal B,\mu,T)$, and we cannot measure $x$, but we can measure $f(x)$ for some measurable $f:X\to\mathbb R$. Then the information we have by observing the dynamical system is encoded in the sub-$\sigma$-algebra
$$\mathcal C:=\sigma(f\circ T^n : n\geq0),$$
defined to be the smallest $\sigma$-algebra with respect to which the functions $f\circ T^n$ are all measurable.⁵ The dynamical properties we feel in this case are those of the factor $(X,\mathcal C,\mu,T)$, and not of $(X,\mathcal B,\mu,T)$. For example, it could be the case that $\mu(E\cap T^{-n}F)\to\mu(E)\mu(F)$ for all $E,F\in\mathcal C$ but not for all $E,F\in\mathcal B$, and then we will observe mixing simply because our information is not sufficient to detect the non-mixing in the system.
1.6.3 The natural extension

Every ppt is a factor of an invertible ppt. Moreover, there is an extension of this type which is minimal, in the sense that it is a factor of any other invertible extension. This extension is unique up to isomorphism, and is called the natural extension. We describe the construction.

Definition 1.15. Suppose $(X,\mathcal B,\mu,T)$ is a ppt, and that $T(X)=X$.⁶ The natural extension of $(X,\mathcal B,\mu,T)$ is the system $(\widetilde X,\widetilde{\mathcal B},\widetilde\mu,\widetilde T)$, where
1. $\widetilde X:=\{\widetilde x=(\ldots,x_{-1},x_0,x_1,x_2,\ldots) : x_i\in X,\ T(x_i)=x_{i+1} \text{ for all } i\}$;
2. $\widetilde{\mathcal B}$ is the $\sigma$-algebra generated by the sets $\pi_n^{-1}(E)$ ($n\in\mathbb Z$, $E\in\mathcal B$), where $\pi_n(\widetilde x):=x_n$; we write $\pi:=\pi_0$;
3. $\widetilde\mu$ is the measure determined by $\widetilde\mu(\pi^{-1}E):=\mu(E)$;
4. $\widetilde T$ is the left shift.
Theorem 1.5. The natural extension of $(X,\mathcal B,\mu,T)$ is an invertible extension of $(X,\mathcal B,\mu,T)$, and it is a factor of any other invertible extension of $(X,\mathcal B,\mu,T)$. Ergodicity and mixing are preserved under natural extensions.

Proof. It is clear that the natural extension is an invertible extension, and that $\pi\circ\widetilde T=T\circ\pi$. Since $TX=X$, every point has a preimage, and so $\pi$ is onto. Thus $(\widetilde X,\widetilde{\mathcal B},\widetilde\mu,\widetilde T)$ is an invertible extension of $(X,\mathcal B,\mu,T)$.

Suppose $(Y,\mathcal C,\nu,S)$ is another invertible extension, and let $\pi_Y:Y\to X$ be the factor map (defined a.e. on $Y$). We show that $(Y,\mathcal C,\nu,S)$ extends $(\widetilde X,\widetilde{\mathcal B},\widetilde\mu,\widetilde T)$. Let $(\widetilde Y,\widetilde{\mathcal C},\widetilde\nu,\widetilde S)$ be the natural extension of $(Y,\mathcal C,\nu,S)$. It is isomorphic to $(Y,\mathcal C,\nu,S)$, with the isomorphism given by $\theta(y)=(y_k)_{k\in\mathbb Z}$, $y_k:=S^k(y)$. Thus it is enough to show that $(\widetilde X,\widetilde{\mathcal B},\widetilde\mu,\widetilde T)$ is a factor of $(\widetilde Y,\widetilde{\mathcal C},\widetilde\nu,\widetilde S)$. Here is the factor map: $\vartheta:(y_k)_{k\in\mathbb Z}\mapsto(\pi_Y(y_k))_{k\in\mathbb Z}$.
⁵ Such a $\sigma$-algebra exists: take the intersection of all sub-$\sigma$-algebras which make all the $f\circ T^n$ measurable, and note that this collection is not empty, because it contains $\mathcal B$.
⁶ Actually all that we need is that $T^nX$ is measurable for all $n$. In general, the measurable (or even continuous) forward image of a measurable set is not measurable.
We show that ergodicity is preserved under natural extensions: Suppose $\widetilde E$ is a $\widetilde T$-invariant $\widetilde{\mathcal B}$-measurable set. By the definition of $\widetilde{\mathcal B}$, we can write $\widetilde E=\pi^{-1}(E)$ with $E\in\mathcal B$ (up to a set of $\widetilde\mu$-measure zero). We saw above that $\pi$ is onto, therefore $E=\pi(\widetilde E)$. We have
$$\pi^{-1}(T^{-1}E)=\{\widetilde x\in\widetilde X : T(x_0)\in E\}=\{\widetilde x\in\widetilde X : x_1\in E\}=\{\widetilde x\in\widetilde X : \widetilde T(\widetilde x)\in\pi^{-1}E\}=\widetilde T^{-1}(\pi^{-1}E)=\widetilde T^{-1}\widetilde E=\widetilde E=\pi^{-1}(E).$$
Thus $T^{-1}E=E\ \mathrm{mod}\ \mu$, so by ergodicity $\mu(E)=0$ or $1$, whence $\widetilde\mu(\widetilde E)=\mu(E)$ is $0$ or $1$. The proof that mixing is preserved under natural extensions is left to the reader. $\square$
1.6.4 Induced transformations

Suppose $(X,\mathcal B,\mu,T)$ is a probability preserving transformation, and let $A\in\mathcal B$ be a set of positive measure. By Poincaré's Recurrence Theorem, for a.e. $x\in A$ there is some $n\geq1$ such that $T^n(x)\in A$. Define
$$\varphi_A(x):=\min\{n\geq1 : T^nx\in A\},$$
with the minimum of the empty set being understood as infinity. Note that $\varphi_A<\infty$ a.e. on $A$.

Definition 1.16. The induced transformation on $A$ is $(A_0,\mathcal B(A),\mu_A,T_A)$, where $A_0:=\{x\in A : \varphi_A(x)<\infty\}$, $\mathcal B(A):=\{E\subseteq A : E\in\mathcal B\}$, $\mu_A$ is the measure $\mu_A(E):=\mu(E|A)=\mu(E\cap A)/\mu(A)$, and $T_A:A_0\to A_0$ is $T_A(x)=T^{\varphi_A(x)}(x)$.
Theorem 1.6. Suppose $(X,\mathcal B,\mu,T)$ is a ppt, and $A\in\mathcal B$ has positive finite measure.
1. $\mu_A\circ T_A^{-1}=\mu_A$;
2. if $T$ is ergodic, then $T_A$ is ergodic (but the mixing of $T$ does not imply the mixing of $T_A$);
3. Kac Formula: If $T$ is ergodic, then $\displaystyle\int f\,d\mu=\int_A\sum_{k=0}^{\varphi_A-1}f\circ T^k\,d\mu$ for every $f\in L^1(X)$. In particular $\displaystyle\int_A\varphi_A\,d\mu_A=1/\mu(A)$.
Proof. Given $E\subseteq A$ measurable,
$$\mu(E)=\underbrace{\mu(T^{-1}E\cap A)}_{\mu(T_A^{-1}E\,\cap\,[\varphi_A=1])}+\mu(T^{-1}E\cap A^c)=\underbrace{\mu(T^{-1}E\cap A)}_{\mu(T_A^{-1}E\,\cap\,[\varphi_A=1])}+\underbrace{\mu(T^{-2}E\cap T^{-1}A^c\cap A)}_{\mu(T_A^{-1}E\,\cap\,[\varphi_A=2])}+\mu(T^{-2}E\cap T^{-1}A^c\cap A^c)$$
$$=\cdots=\sum_{j=1}^N\mu(T_A^{-1}E\cap[\varphi_A=j])+\mu\Big(T^{-N}E\cap\bigcap_{j=0}^{N-1}T^{-j}A^c\Big).$$
Passing to the limit as $N\to\infty$, we see that $\mu(E)\geq\mu(T_A^{-1}E)$. Working with $A\setminus E$ in place of $E$, and using the assumption that $\mu(X)<\infty$ (note that $\mu(T_A^{-1}A)=\mu(A_0)=\mu(A)$ by Poincaré recurrence), we get $\mu(A)-\mu(E)\geq\mu(A)-\mu(T_A^{-1}E)$, whence $\mu(E)=\mu(T_A^{-1}E)$. Since $\mu_A$ is proportional to $\mu$ on $\mathcal B(A)$, we get $\mu_A=\mu_A\circ T_A^{-1}$.
We now assume that $T$ is ergodic, and prove that $T_A$ is ergodic. Since $T$ is ergodic, and the set
$$\{x : T^n(x)\in A \text{ for infinitely many } n\geq0\}$$
is a $T$-invariant set of non-zero measure (bounded below by $\mu(A)$), a.e. $x\in X$ has some $n\geq0$ s.t. $T^n(x)\in A$. Thus $r_A(x):=\min\{n\geq0 : T^nx\in A\}<\infty$ a.e. in $X$ (not just a.e. in $A$).

Suppose $f:A_0\to\mathbb R$ is a $T_A$-invariant $L^2$ function. Define
$$F(x):=f(T^{r_A(x)}x).$$
This makes sense a.e. in $X$, because $r_A<\infty$ almost everywhere. This function is $T$-invariant, because $T^{r_A(Tx)}(Tx)\in\{T^{r_A(x)}(x),\ T_A(T^{r_A(x)}x)\}$, and $f\circ T_A=f$ almost everywhere. Since $T$ is ergodic, $F$ is constant. Since $F|_{A_0}=f$, $f$ is constant. Thus the ergodicity of $T$ implies the ergodicity of $T_A$.
Here is an example showing that the mixing of $T$ does not imply the mixing of $T_A$. Let $\Sigma^+$ be an SFT with states $\{a,1,2,b\}$ and allowed transitions
$$a\to1;\quad 1\to1,b;\quad b\to2;\quad 2\to a.$$
Let $A=\{x : x_0\in\{a,b\}\}$. Any (globally supported) shift invariant Markov chain measure on $\Sigma^+$ is mixing, because $\Sigma^+$ is irreducible and aperiodic ($1\to1$). But $T_A$ is not mixing, because $T_A[a]\subseteq[b]$ and $T_A[b]\subseteq[a]$, so $[a]\cap T_A^{-n}[a]=\varnothing$ for all $n$ odd.
Next we prove the Kac formula. Suppose first that $f\in L^\infty(X,\mathcal B,\mu)$ and $f\geq0$. Then
$$\int f\,d\mu=\int_A f\,d\mu+\int f\,1_{A^c}\,d\mu=\int_A f\,d\mu+\int f\circ T\cdot1_{T^{-1}A^c}\,d\mu$$
$$=\int_A f\,d\mu+\int f\circ T\cdot1_{T^{-1}A^c}1_A\,d\mu+\int f\circ T\cdot1_{T^{-1}A^c}1_{A^c}\,d\mu$$
$$=\int_A f\,d\mu+\int_A f\circ T\cdot1_{[\varphi_A>1]}\,d\mu+\int f\circ T^2\cdot1_{T^{-2}A^c\cap T^{-1}A^c}\,d\mu$$
$$=\cdots=\int_A\sum_{j=0}^{N-1}f\circ T^j\cdot1_{[\varphi_A>j]}\,d\mu+\int f\circ T^N\cdot1_{\bigcap_{j=1}^NT^{-j}A^c}\,d\mu.$$
The first term tends, as $N\to\infty$, to
$$\int_A\sum_{j=0}^\infty f\circ T^j\sum_{i=j+1}^\infty1_{[\varphi_A=i]}\,d\mu=\int_A\sum_{j=0}^{\varphi_A-1}f\circ T^j\,d\mu.$$
The second term is bounded by $\|f\|_\infty\,\mu\{x : T^k(x)\notin A \text{ for all } 1\leq k\leq N\}$. This bound tends to zero, because $\mu\{x : T^k(x)\notin A \text{ for all } k\geq1\}=0$, because $T$ is ergodic and recurrent (fill in the details). This proves the Kac formula for all $L^\infty$ functions.
Every non-negative $L^1$ function is the increasing limit of $L^\infty$ functions. By the monotone convergence theorem, the Kac formula must hold for all non-negative $L^1$ functions.

Every $L^1$ function is the difference of two non-negative $L^1$ functions ($f=f\,1_{[f>0]}-|f|\,1_{[f<0]}$). It follows that the Kac formula holds for all absolutely integrable functions. $\square$
1.6.5 Suspensions and Kakutani skyscrapers

The operation of inducing can be inverted, as follows. Let $(X,\mathcal B,\mu,T)$ be a ppt, and $r:X\to\mathbb N$ an integrable measurable function.

Definition 1.17. The Kakutani skyscraper with base $(X,\mathcal B,\mu,T)$ and height function $r$ is the system $(X_r,\mathcal B(X_r),\nu,S)$, where
1. $X_r:=\{(x,n) : x\in X,\ 0\leq n\leq r(x)-1\}$;
2. $\mathcal B(X_r)=\{E\in\mathcal B(X)\otimes\mathcal B(\mathbb N) : E\subseteq X_r\}$, where $\mathcal B(\mathbb N)=2^{\mathbb N}$;
3. $\nu$ is the unique measure such that $\nu(B\times\{k\})=\mu(B)\big/\int r\,d\mu$ for $B\subseteq[r>k]$;
4. $S$ is defined by $S(x,n)=(x,n+1)$, when $n<r(x)-1$, and $S(x,n)=(Tx,0)$, when $n=r(x)-1$.
(Check that this is a ppt.)
We think of $X_r$ as a skyscraper made of stories $\{(x,k) : r(x)>k\}$; the orbits of $S$ climb up the skyscraper until they reach the top floor possible, and then move to the ground floor according to $T$.

If we induce a Kakutani skyscraper on $\{(x,0) : x\in X\}$, we get a system which is isomorphic to $(X,\mathcal B,\mu,T)$.
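Here is a finite toy sketch of the last remark: build a skyscraper over a cyclic shift on three points (the heights are made-up values), and check that inducing $S$ on the ground floor recovers $T$.

```python
# Kakutani skyscraper over a toy base: T is the cyclic shift on X = {0,1,2},
# with made-up heights r = (1,2,3). Inducing S on the ground floor
# {(x,0)} should give back T.
X = [0, 1, 2]
r = {0: 1, 1: 2, 2: 3}
T = lambda x: (x + 1) % 3

def S(pt):
    x, n = pt
    return (x, n + 1) if n < r[x] - 1 else (T(x), 0)

def induced_on_ground(x):
    pt = S((x, 0))
    while pt[1] != 0:              # climb until we land on the ground floor
        pt = S(pt)
    return pt[0]

assert all(induced_on_ground(x) == T(x) for x in X)
```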
Proposition 1.11. A Kakutani skyscraper over an ergodic base is ergodic, but there
are non-mixing skyscrapers over mixing bases.
The proof is left as an exercise.
There is a straightforward and important continuous-time version of this construction. Suppose $(X,\mathcal B,\mu,T)$ is a ppt, and $r:X\to\mathbb R^+$ is a measurable function such that $\inf r>0$.

Definition 1.18. The suspension semi-flow with base $(X,\mathcal B,\mu,T)$ and height function $r$ is the semi-flow $(X_r,\mathcal B(X_r),\nu,T^s)$, where
1. $X_r:=\{(x,t)\in X\times\mathbb R : 0\leq t<r(x)\}$;
2. $\mathcal B(X_r)=\{E\in\mathcal B(X)\otimes\mathcal B(\mathbb R) : E\subseteq X_r\}$;
3. $\nu$ is the measure such that $\displaystyle\int_{X_r}f\,d\nu=\int_X\int_0^{r(x)}f(x,t)\,dt\,d\mu(x)\Big/\int_X r\,d\mu$;
4. $\displaystyle T^s(x,t)=\Big(T^nx,\ t+s-\sum_{k=0}^{n-1}r(T^kx)\Big)$, where $n$ is s.t. $0\leq t+s-\sum_{k=0}^{n-1}r(T^kx)<r(T^nx)$.
(Check that this is a measure preserving semi-flow.)

Suspension flows appear in applications in the following way. Imagine a flow $T^t$ on a manifold $X$. It is often possible to find a curve $S\subseteq X$ such that (almost) every orbit of the flow intersects $S$ transversally infinitely many times. Such a curve is called a Poincaré section. If it exists, then one can define a map $T_S:S\to S$ which maps $x\in S$ into $T^tx$ with $t:=\min\{s>0 : T^s(x)\in S\}$. This map is called the section map. The flow itself is isomorphic to a suspension flow over its Poincaré section.
Problems
1.1. Proof of Liouville's theorem in section 1.1
(a) Write $x:=(q,p)$ and $y:=T^t(q,p)$. Use Hamilton's equations to show that the Jacobian matrix of $y=y(x)$ satisfies $\frac{\partial y}{\partial x}=I+tA+O(t^2)$ as $t\to0$, where $\mathrm{tr}(A)=0$.
(b) Show that for every matrix $A$, $\det(I+tA+O(t^2))=1+t\,\mathrm{tr}(A)+O(t^2)$ as $t\to0$.
(c) Prove that the Jacobian of $T^t$ is equal to one for all $t$. Deduce Liouville's theorem.
1.2. The completion of a measure space. Suppose $(X,\mathcal B,\mu)$ is a measure space. A set $N\subseteq X$ is called a null set, if there is a measurable set $E\supseteq N$ such that $\mu(E)=0$. A measure space is called complete, if every null set is measurable. Every measure space can be completed, and this exercise shows how to do this.
(a) Let $\mathcal B_0$ denote the collection of all sets of the form $E\cup N$ where $E\in\mathcal B$ and $N$ is a null set. Show that $\mathcal B_0$ is a $\sigma$-algebra.
(b) Show that $\mu$ has a unique extension to a $\sigma$-additive measure on $\mathcal B_0$.
1.3. Prove Poincaré's Recurrence Theorem for a general probability preserving transformation (theorem 1.1).
1.4. Fill in the details in the proof above that the Markov chain measure corresponding to a stationary probability vector and a stochastic matrix exists, and is shift invariant.
1.5. Suppose $\Sigma^+_A$ is a SFT with transition matrix $A$. Write $A^n = (t^{(n)}_{ab})$. Prove that $t^{(n)}_{ab}$ is the number of paths of length $n$ starting at $a$ and ending at $b$. In particular: $a\xrightarrow{\,n\,}b \iff t^{(n)}_{ab} > 0$.
1.6. The Perron–Frobenius Theorem:^7 Suppose $A = (a_{ij})$ is a matrix all of whose entries are non-negative, and let $B := (b_{ij})$ be the matrix with $b_{ij} = 1$ if $a_{ij} > 0$ and $b_{ij} = 0$ if $a_{ij} = 0$. Assume that $B$ is irreducible; then $A$ has a positive eigenvalue $\lambda$ with the following properties:
(i) There are positive vectors $r$ and $\ell$ s.t. $\ell A = \lambda\ell$, $Ar = \lambda r$.
(ii) The eigenvalue $\lambda$ is simple.
(iii) The spectrum of $\lambda^{-1}A$ consists of $1$, several (possibly zero) roots of unity, and a finite subset of the open unit disc. In this case the limit $\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\lambda^{-k}A^k$ exists.
(iv) If $B$ is irreducible and aperiodic, then the spectrum of $\lambda^{-1}A$ consists of $1$ and a finite subset of the open unit disc. In this case the limit $\lim_{n\to\infty}\lambda^{-n}A^n$ exists.
1. Prove the Perron–Frobenius theorem in case $A$ is stochastic, first in the aperiodic case, then in the general case.
2. Now consider the case of a general non-negative matrix:
a. Use a fixed point theorem to show that $\lambda$, $\ell$, $r$ exist;
b. Set $\mathbf{1} := (1,\ldots,1)$ and let $V$ be the diagonal matrix such that $V\mathbf{1} = r$. Prove that $\lambda^{-1}V^{-1}AV$ is stochastic.
c. Prove the Perron–Frobenius theorem.

^7 Perron first proved this in the aperiodic case. Frobenius later treated the periodic irreducible case.
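Parts (iii)–(iv) are easy to observe numerically. The following sketch is not part of the problem; the matrix is an arbitrary choice whose $0/1$ pattern is irreducible and aperiodic (it has cycles of lengths 2 and 3, so the gcd of cycle lengths is 1):

```python
import numpy as np

# Arbitrary non-negative matrix with an irreducible, aperiodic pattern.
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])

lam = max(abs(np.linalg.eigvals(A)))      # the Perron eigenvalue of part (i)
P1 = np.linalg.matrix_power(A, 60) / lam**60
P2 = np.linalg.matrix_power(A, 61) / lam**61
print(np.allclose(P1, P2, atol=1e-8))     # part (iv): lambda^{-n} A^n stabilizes
```

The same experiment with a periodic pattern (e.g. a permutation matrix) shows why only the Cesàro limit of part (iii) exists in general.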
1.7. Suppose $P = (p_{ab})_{S\times S}$ is an irreducible aperiodic stochastic matrix. Use the spectral description of $P$ obtained in problem 1.6 to show that $p^{(n)}_{ab}\to p_b$ exponentially fast, where $(p_b)$ is the stationary probability vector of $P$.
1.8. Show that the product of $n$ irrational rotations $R_{\alpha_1},\ldots,R_{\alpha_n}$ is ergodic iff $1,\alpha_1,\ldots,\alpha_n$ are linearly independent over the rationals.
1.9. Suppose $g_t: X\to X$ is a measure preserving flow. The time one map of the flow is the measure preserving map $g_1: X\to X$. Give an example of an ergodic flow whose time one map is not ergodic.
1.10. The adding machine
Let $X = \{0,1\}^{\mathbb{N}}$, equipped with the $\sigma$-algebra $\mathcal{B}$ generated by the cylinders, and the Bernoulli $(\frac12,\frac12)$-measure $\mu$. The adding machine is the ppt $(X,\mathcal{B},\mu,T)$ defined by the rule $T(1^n 0\,x) = (0^n 1\,x)$, $T(1^\infty) = 0^\infty$. Prove that the adding machine is invertible and probability preserving. Show that $T(x) = x\oplus(10^\infty)$, where $\oplus$ is addition with carry to the right.
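A quick sketch of the rule $T(x) = x\oplus(10^\infty)$ on finite prefixes (illustrative code, not part of the problem):

```python
def adding_machine(x):
    """One step of the dyadic adding machine on {0,1}^N:
    add 1 in the leftmost coordinate, carrying to the right.
    x is a finite list standing for the first coordinates of a sequence."""
    y = list(x)
    for i in range(len(y)):
        if y[i] == 0:
            y[i] = 1          # 1 + 0 -> 1, carry stops here
            return y
        y[i] = 0              # 1 + 1 -> 0, carry moves right
    return y                  # an all-ones prefix maps to all zeros

# T(1^n 0 w) = 0^n 1 w:
print(adding_machine([1, 1, 1, 0, 1]))  # [0, 0, 0, 1, 1]
```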
1.11. Prove proposition 1.11.
1.12. Show that a ppt $(X,\mathcal{B},\mu,T)$ is mixing whenever $\mu(A\cap T^{-n}A)\xrightarrow[n\to\infty]{}\mu(A)^2$ for all $A\in\mathcal{B}$. Guidance:
1. $\int 1_A\, f\circ T^n\,d\mu\xrightarrow[n\to\infty]{}\mu(A)\int f\,d\mu$ for all $f\in\overline{\operatorname{span}}_{L^2}\{1, 1_A\circ T, 1_A\circ T^2,\ldots\}$.
2. $\int 1_A\, f\circ T^n\,d\mu\xrightarrow[n\to\infty]{}\mu(A)\int f\,d\mu$ for all $f\in L^2$.
3. $\int g\, f\circ T^n\,d\mu\xrightarrow[n\to\infty]{}\int g\,d\mu\int f\,d\mu$ for all $f,g\in L^2$.
1.13. Show that a Kakutani skyscraper over an invertible transformation is invertible, and find a formula for its inverse.
1.14. Conservativity
Let $(X,\mathcal{B},\mu,T)$ be a measure preserving transformation on an infinite, $\sigma$-finite measure space.^8 A set $W\in\mathcal{B}$ is called wandering, if $\{T^{-n}W : n\ge 0\}$ are pairwise disjoint. A mpt is called conservative, if every wandering set has measure zero.
1. Show that any ppt is conservative. Give an example of a non-conservative mpt on a $\sigma$-finite infinite measure space.
2. Show that Poincaré's recurrence theorem extends to conservative mpt.
3. Suppose $(X,\mathcal{B},\mu,T)$ is a conservative ergodic mpt, and let $A$ be a set of finite positive measure. Show that the induced transformation $T_A: A\to A$ is well defined a.e. on $A$, and is an ergodic ppt.
4. Prove Kac's formula for conservative ergodic transformations under the previous set of assumptions.
Notes for chapter 1
The standard references for measure theory are [7] and [4]. The standard references for ergodic theory of probability preserving transformations are [1] and [3]. The standard reference on ergodic theory of mpt on infinite measure spaces is [1]. Our proof of the Perron–Frobenius theorem is taken from [3]. Kac's formula has a very simple proof when $T$ is invertible. The proof we use has the merit that it works for non-invertible transformations, and that it extends to the (conservative) infinite measure setting. It is taken from [2]. The ergodicity of the geodesic flow was first proved by E. Hopf. The short proof we use was found much later by Gelfand, and is reproduced in [2].
References
1. Aaronson, J.: An introduction to infinite ergodic theory. Mathematical Surveys and Monographs 50, AMS, 1997. xii+284pp.
2. Bekka, M.B. and Mayer, M.: Ergodic theory and topological dynamics of group actions on homogeneous spaces. London Mathematical Society Lecture Note Series 269, 2000. x+200pp.
3. Brin, M. and Stuck, G.: Introduction to dynamical systems. Cambridge University Press, Cambridge, 2002. xii+240 pp.
4. Halmos, P. R.: Measure Theory. D. Van Nostrand Company, Inc., New York, N.Y., 1950. xi+304 pp.
5. Krengel, U.: Ergodic theorems. de Gruyter Studies in Mathematics 6, 1985. viii+357pp.
6. Petersen, K.: Ergodic theory. Corrected reprint of the 1983 original. Cambridge Studies in Advanced Mathematics 2. Cambridge University Press, Cambridge, 1989. xii+329 pp.
7. Royden, H. L.: Real analysis. Third edition. Macmillan Publishing Company, New York, 1988. xx+444 pp.
8. Walters, P.: An introduction to ergodic theory. Graduate Texts in Mathematics 79. Springer-Verlag, New York–Berlin, 1982. ix+250 pp.

^8 A measure space is called $\sigma$-finite, if its sample space is the countable union of finite measure sets.
Chapter 2
Ergodic Theorems

2.1 The Mean Ergodic Theorem

Theorem 2.1 (von Neumann's Mean Ergodic Theorem). Suppose $(X,\mathcal{B},\mu,T)$ is a ppt. If $f\in L^2$, then $\frac{1}{N}\sum_{k=0}^{N-1} f\circ T^k\xrightarrow[N\to\infty]{L^2}\overline{f}$, where $\overline{f}\in L^2$ is invariant. If $T$ is ergodic, then $\overline{f} = \int f\,d\mu$.
Proof. Observe that since $T$ is measure preserving, $\|f\circ T\|_2 = \|f\|_2$ for all $f\in L^2$ (prove this, first for indicator functions, then for all $L^2$ functions).
Suppose $f = g - g\circ T$ where $g\in L^2$ (in this case we say that $f$ is a coboundary with transfer function $g\in L^2$); then it is obvious that
\[
\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1} f\circ T^k\Bigr\|_2 = \frac{1}{N}\|g - g\circ T^N\|_2\le\frac{2\|g\|_2}{N}\xrightarrow[N\to\infty]{}0.
\]
Thus the theorem holds for all elements of $C := \{g - g\circ T : g\in L^2\}$.
We claim that the theorem holds for all elements of $\overline{C}$ ($L^2$ closure). Suppose $f\in\overline{C}$; then for every $\varepsilon>0$, there is an $F\in C$ s.t. $\|f-F\|_2<\varepsilon$. Choose $N_0$ such that for every $N>N_0$, $\bigl\|\frac{1}{N}\sum_{k=0}^{N-1}F\circ T^k\bigr\|_2<\varepsilon$; then for all $N>N_0$
\[
\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1} f\circ T^k\Bigr\|_2
\le\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1}(f-F)\circ T^k\Bigr\|_2+\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1}F\circ T^k\Bigr\|_2
\le\frac{1}{N}\sum_{k=0}^{N-1}\|(f-F)\circ T^k\|_2+\varepsilon<2\varepsilon.
\]
This shows that $\bigl\|\frac{1}{N}\sum_{k=0}^{N-1} f\circ T^k\bigr\|_2\xrightarrow[N\to\infty]{}0$.
Next we claim that $C^{\perp} = \{\text{invariant functions}\}$. Suppose $f\perp C$; then
\[
\|f-f\circ T\|_2^2 = \langle f-f\circ T,\ f-f\circ T\rangle = \|f\|_2^2-2\langle f, f\circ T\rangle+\|f\circ T\|_2^2
= 2\|f\|_2^2-2\langle f,\ f-(f-f\circ T)\rangle = 2\|f\|_2^2-2\|f\|_2^2 = 0,
\]
because $f-f\circ T\in C$ and $f\perp C$. Thus $f = f\circ T$ a.e., and we have proved that $L^2 = \overline{C}\oplus\{\text{invariant functions}\}$.
We saw above that the MET holds for all elements of $\overline{C}$, and it is obvious for all invariant functions. Therefore the MET holds for all $L^2$ functions.
The proof shows that the limit $\overline f$ is the projection of $f$ on the space of invariant functions, and is thus invariant. If $T$ is ergodic, then it is constant. This constant is equal to $\int f\,d\mu$, because if $f_n\to\overline f$ in $L^2$ on a finite measure space, then $f_n\to\overline f$ in $L^1$ (Cauchy–Schwarz), and so $\int f_n\to\int\overline f$. $\square$
Remark 1. The proof shows that the limit $\overline f$ is the projection of $f$ on the space of invariant functions.
Remark 2. The proof only uses the fact that $Uf = f\circ T$ is an isometry of $L^2$. In fact it works for all contractions, see problem 2.1.
Remark 3. If $f_n\xrightarrow[n\to\infty]{L^2}f$, then $\langle f_n,g\rangle\xrightarrow[n\to\infty]{}\langle f,g\rangle$ for all $g\in L^2$. Specializing to the case $f_n = \frac{1}{n}\sum_{k=0}^{n-1}1_B\circ T^k$, $g = 1_A$, we obtain the following corollary of the MET:

Corollary 2.1. A ppt $(X,\mathcal{B},\mu,T)$ is ergodic iff for all $A,B\in\mathcal{B}$,
\[
\frac{1}{n}\sum_{k=0}^{n-1}\mu(A\cap T^{-k}B)\xrightarrow[n\to\infty]{}\mu(A)\mu(B).
\]
So ergodicity is "mixing on the average."
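Corollary 2.1 can be seen numerically for an irrational rotation, which is ergodic but not mixing: the correlations $\mu(A\cap T^{-k}B)$ oscillate and do not converge, but their Cesàro averages converge to $\mu(A)\mu(B)$. The sketch below (not from the notes; $\alpha$ and $A=B=[0,\frac12)$ are illustrative choices) checks this, using the exact arc-overlap formula:

```python
import numpy as np

alpha = np.sqrt(2) - 1                 # an irrational rotation number
n = 200_000
# For T x = x + alpha (mod 1) and the arcs A = B = [0, 1/2),
# T^{-k}B = B - k*alpha (mod 1), and Leb(A ∩ T^{-k}B) = |frac(-k*alpha) - 1/2|.
cs = np.mod(-alpha * np.arange(n), 1.0)
corr = np.abs(cs - 0.5)                # mu(A ∩ T^{-k}B), k = 0, ..., n-1
print(abs(corr.mean() - 0.25) < 1e-3)  # Cesaro average -> mu(A)mu(B) = 1/4
```

Plotting `corr` itself shows it keeps returning near $1/2$; only the averages settle down, which is exactly "mixing on the average."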
2.2 The Pointwise Ergodic Theorem

Theorem 2.2 (Birkhoff's Pointwise Ergodic Theorem). Let $(X,\mathcal{B},\mu,T)$ be a ppt. If $f\in L^1$, then the limit $\lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1} f\circ T^k$ exists almost everywhere. The limit is a $T$-invariant function. If $T$ is ergodic, it is equal a.e. to $\int f\,d\mu$.
Proof. We begin by proving convergence when $f = 1_B$, $B\in\mathcal{B}$. Define
\[
A_n(x) := \frac{1}{n}\sum_{k=0}^{n-1}1_B(T^k x),\qquad
\overline{A}(x) := \limsup_{n\to\infty}A_n(x),\qquad
\underline{A}(x) := \liminf_{n\to\infty}A_n(x),
\]
and note that $\overline A$, $\underline A$ are $T$-invariant. Fix $\varepsilon>0$ and $M>0$ (to be determined later), and set
\[
\tau(x) := \min\{n>0 : A_n(x)>\overline{A}(x)-\varepsilon\},\qquad B' := B\cup[\tau>M].
\]
For a given $N$, we color the time interval $\{0,1,2,\ldots,N-1\}$ as follows.
If $\tau(T^0 x)>M$, color $0$ red; if $\tau(T^0 x)\le M$, color the next $\tau(x)$ times blue, and move to the first uncolored $k$.
If $\tau(T^k x)>M$, color $k$ red; otherwise color the next $\tau(T^k x)$ times blue, and move to the first uncolored $k$.
Continue in this way until all times are colored, or until $\tau(T^k x)>N-k$.
This partitions $\{0,1,\ldots,N-1\}$ into red points, (possibly consecutive) blue segments of length $\le M$, and one last segment of length $\le M$. Note:
If $k$ is red, then $T^k x\in B'$, so $\sum_{\mathrm{red}\ k}1_{B'}(T^k x) = $ number of red $k$'s.
The average of $1_B$ on each blue segment of length $\tau(T^k x)$ is larger than $\overline A(T^k x)-\varepsilon = \overline A(x)-\varepsilon$. It follows that $\sum_{\mathrm{blue\ segment}}1_{B'}(T^k x)\ge(\text{length of blue segment})(\overline A-\varepsilon)$. Summing over all segments: $\sum_{\mathrm{blue}\ k}1_{B'}(T^k x)\ge(\overline A-\varepsilon)\cdot(\text{number of blues})$.
Since $\overline A-\varepsilon<1$, we get
\[
\sum_{k=0}^{N-1}1_{B'}(T^k x)\ge(\text{number of colored times})\cdot(\overline A-\varepsilon)\ge(N-M)(\overline A-\varepsilon).
\]
Integrating, we get $N\mu(B')\ge(N-M)\bigl(\int\overline A\,d\mu-\varepsilon\bigr)$, whence
\[
\mu(B')\ge\int\overline A\,d\mu-\varepsilon,
\]
or $\mu(B)\ge\int\overline A\,d\mu-(\varepsilon+\mu[\tau>M])$. Now choose $M$ so large that $\mu[\tau>M]<\varepsilon$. We get $\mu(B)\ge\int\overline A\,d\mu-2\varepsilon$, or $\int\overline A\,d\mu\le\mu(B)+2\varepsilon$.
Applying this argument to $B^c$, we get $\int\underline A\,d\mu\ge\mu(B)-2\varepsilon$. It follows that
\[
0\le\int(\overline A-\underline A)\,d\mu\le4\varepsilon\xrightarrow[\varepsilon\to0^+]{}0.
\]
Thus $\int(\overline A-\underline A)\,d\mu = 0$, whence $\overline A = \underline A$ almost everywhere. This proves almost everywhere convergence for $f = 1_B$.
It automatically follows that the pointwise ergodic theorem holds for all simple functions $\sum_{i=1}^n\alpha_i 1_{B_i}$, $B_i\in\mathcal{B}$. Any $L^\infty$ function is the uniform limit of simple functions. Therefore the pointwise ergodic theorem holds for all bounded measurable functions.
Now suppose $f\in L^1$. Find some $\varphi\in L^\infty$ such that $\|f-\varphi\|_1<\varepsilon$. Let $A_n(g)$, $\overline A(g)$, $\underline A(g)$ be the corresponding objects for a function $g$. We have
\[
\overline A(f)\le\overline A(\varphi)+\overline A(f-\varphi).
\]
Integrating, we have
\[
\int\overline A(f)\,d\mu\le\int\overline A(\varphi)\,d\mu+\int\liminf_{n\to\infty}A_n(f-\varphi)\,d\mu
\le\int\overline A(\varphi)\,d\mu+\liminf_{n\to\infty}\int A_n(f-\varphi)\,d\mu
\le\int\overline A(\varphi)\,d\mu+\|f-\varphi\|_1.
\]
Applying this argument to $-f$ gives $\int\underline A(f)\,d\mu\ge\int\underline A(\varphi)\,d\mu-\|f-\varphi\|_1$. Since $\overline A(\varphi) = \underline A(\varphi)$ (because $\varphi$ is bounded), we get
\[
\int(\overline A(f)-\underline A(f))\,d\mu\le2\|f-\varphi\|_1<2\varepsilon\xrightarrow[\varepsilon\to0^+]{}0.
\]
Again, $\overline A(f) = \underline A(f)$ almost everywhere, and we obtain pointwise convergence. $\square$
Remark: If $f\in L^1$ then the Birkhoff averages of $f$ converge in $L^1$. (For bounded functions this is the bounded convergence theorem; for functions with small $L^1$ norm, the $L^1$ norm of the averages remains small; any $L^1$ function is the sum of a bounded function and an $L^1$ function with small norm.)
2.3 The non-ergodic case

The almost sure limit in the pointwise ergodic theorem is clear when the map is ergodic: $\frac{1}{N}\sum_{k=0}^{N-1}f\circ T^k\xrightarrow[N\to\infty]{}\int f\,d\mu$. In this section we ask what is the limit in the non-ergodic case.
If $f$ belongs to $L^2$, the limit is the projection of $f$ on the space of invariant functions, because of the Mean Ergodic Theorem and the fact that every sequence of functions which converges in $L^2$ has a subsequence which converges almost everywhere to the same limit.^1 But if $f\in L^1$ we cannot speak of projections. The right notion in this case is that of the conditional expectation.

^1 Proof: Suppose $f_n\xrightarrow[n\to\infty]{L^2}f$. Pick a subsequence $n_k$ s.t. $\|f_{n_k}-f\|_2<2^{-k}$. Then $\sum_{k\ge1}\|f_{n_k}-f\|_2<\infty$. This means that $\bigl\|\sum_{k\ge1}|f_{n_k}-f|\bigr\|_2<\infty$, whence $\sum(f_{n_k}-f)$ converges absolutely almost surely. It follows that $f_{n_k}-f\to0$ a.e.
2.3.1 Conditional expectations and the limit in the ergodic theorem

Let $(X,\mathcal{B},\mu)$ be a probability space. Let $\mathcal{F}\subseteq\mathcal{B}$ be a $\sigma$-algebra. We think of $\mathcal{F}$ as the collection of all sets $F$ for which we have sufficient information to answer the question "is $x\in F$?". The functions we have sufficient information to calculate are exactly the $\mathcal{F}$-measurable functions, as can be seen from the formula $f(x) := \inf\{t : x\in[f<t]\}$.
Suppose $g$ is not $\mathcal{F}$-measurable. What is the "best guess" for $g(x)$ given the information $\mathcal{F}$?
Had $g$ been in $L^2$, then the closest $\mathcal{F}$-measurable function (in the $L^2$ sense) is the projection of $g$ on $L^2(X,\mathcal{F},\mu)$. The defining property of the projection $Pg$ of $g$ is $\langle Pg,h\rangle = \langle g,h\rangle$ for all $h\in L^2(X,\mathcal{F},\mu)$. The following definition mimics this case when $g$ is not necessarily in $L^2$:

Definition 2.1. The conditional expectation of $f\in L^1(X,\mathcal{B},\mu)$ given $\mathcal{F}$ is the unique $L^1(X,\mathcal{F},\mu)$-element $\mathbb{E}(f|\mathcal{F})$ such that
1. $\mathbb{E}(f|\mathcal{F})$ is $\mathcal{F}$-measurable;
2. $\forall\varphi\in L^\infty$ $\mathcal{F}$-measurable, $\int\varphi\,\mathbb{E}(f|\mathcal{F})\,d\mu = \int\varphi f\,d\mu$.
Note: $\mathbb{E}(f|\mathcal{F})$ is only determined almost everywhere.

Proposition 2.1. The conditional expectation exists for every $L^1$ element, and is unique up to sets of measure zero.

Proof. Consider the measures $\mu_f := f\,d\mu|_{\mathcal{F}}$ and $\mu|_{\mathcal{F}}$ on $(X,\mathcal{F})$. Then $\mu_f\ll\mu|_{\mathcal{F}}$. The function $\mathbb{E}(f|\mathcal{F}) := \frac{d\mu_f}{d\mu}$ (Radon–Nikodym derivative) is $\mathcal{F}$-measurable, and it is easy to check that it satisfies the conditions of the definition of the conditional expectation. The uniqueness of the conditional expectation is left as an exercise. $\square$
Proposition 2.2.
1. $f\mapsto\mathbb{E}(f|\mathcal{F})$ is linear, and a contraction in the $L^1$ metric;
2. $f\ge0\Rightarrow\mathbb{E}(f|\mathcal{F})\ge0$ a.e.;
3. if $\varphi$ is convex, then $\mathbb{E}(\varphi\circ f|\mathcal{F})\ge\varphi(\mathbb{E}(f|\mathcal{F}))$ (Jensen's inequality);
4. if $h$ is $\mathcal{F}$-measurable, then $\mathbb{E}(hf|\mathcal{F}) = h\,\mathbb{E}(f|\mathcal{F})$;
5. if $\mathcal{F}_1\subseteq\mathcal{F}_2$, then $\mathbb{E}[\mathbb{E}(f|\mathcal{F}_2)|\mathcal{F}_1] = \mathbb{E}(f|\mathcal{F}_1)$.
We leave the proof as an exercise.

Theorem 2.3. Let $(X,\mathcal{B},\mu,T)$ be a ppt, and $f\in L^1(X,\mathcal{B},\mu)$. Then
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}f\circ T^k = \mathbb{E}(f|\operatorname{Inv}(T)),
\]
where $\operatorname{Inv}(T) := \{E\in\mathcal{B} : E = T^{-1}E\}$. Alternatively, $\operatorname{Inv}(T)$ is the $\sigma$-algebra generated by all $T$-invariant functions.
Proof. Set $\overline f := \lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}f\circ T^k$ on the set where the limit exists, and zero otherwise. Then $\overline f$ is $\operatorname{Inv}(T)$-measurable. For every $\varphi\in L^\infty$ which is $T$-invariant,
\[
\int\varphi\overline f\,d\mu = \int\varphi\cdot\frac{1}{N}\sum_{k=0}^{N-1}f\circ T^k\,d\mu+O\Bigl(\|\varphi\|_\infty\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1}f\circ T^k-\overline f\Bigr\|_1\Bigr)
= \frac{1}{N}\sum_{k=0}^{N-1}\int(\varphi\circ T^k)(f\circ T^k)\,d\mu+o(1)\xrightarrow[N\to\infty]{}\int\varphi f\,d\mu,
\]
because the convergence in the ergodic theorem is also in $L^1$. $\square$
2.3.2 Conditional probabilities

Recall that a standard probability space is a probability space $(X,\mathcal{B},\mu)$ where $X$ is a complete, metric, separable space, and $\mathcal{B}$ is its Borel $\sigma$-algebra.

Theorem 2.4 (Existence of Conditional Probabilities). Let $\mu$ be a Borel probability measure on a standard probability space $(X,\mathcal{B},\mu)$, and let $\mathcal{F}\subseteq\mathcal{B}$ be a $\sigma$-algebra. There exist Borel probability measures $\{\mu_x\}_{x\in X}$ s.t.:
1. $x\mapsto\mu_x(E)$ is $\mathcal{F}$-measurable for every $E\in\mathcal{B}$;
2. if $f$ is integrable, then $x\mapsto\int f\,d\mu_x$ is integrable, and $\int f\,d\mu = \int_X\bigl(\int_X f\,d\mu_x\bigr)d\mu$;
3. if $f$ is integrable, then $\int f\,d\mu_x = \mathbb{E}(f|\mathcal{F})(x)$ for a.e. $x$.

Definition 2.2. The measures $\mu_x$ are called the conditional probabilities of $\mathcal{F}$. Note that they are only determined almost everywhere.

Proof. By the isomorphism theorem for standard spaces (see appendix), there is no loss of generality in assuming that $X$ is compact (indeed, we may take $X$ to be a compact interval).
Fix a countable dense set $\{f_n\}_{n\ge0}$ in $C(X)$ s.t. $f_0\equiv1$. Let $\mathcal{A}_{\mathbb{Q}}$ be the algebra generated by these functions over $\mathbb{Q}$. It is still countable.
Choose for every $g\in\mathcal{A}_{\mathbb{Q}}$ an $\mathcal{F}$-measurable version $\widehat{\mathbb{E}}(g|\mathcal{F})$ of $\mathbb{E}(g|\mathcal{F})$ (recall that $\mathbb{E}(g|\mathcal{F})$ is an $L^1$ element, namely not a function at all but an equivalence class of functions). Consider the following collection of conditions:
1. $\forall\alpha,\beta\in\mathbb{Q}$, $\forall g_{1,2}\in\mathcal{A}_{\mathbb{Q}}$: $\widehat{\mathbb{E}}(\alpha g_1+\beta g_2|\mathcal{F})(x) = \alpha\widehat{\mathbb{E}}(g_1|\mathcal{F})(x)+\beta\widehat{\mathbb{E}}(g_2|\mathcal{F})(x)$;
2. $\forall g\in\mathcal{A}_{\mathbb{Q}}$: $\min g\le\widehat{\mathbb{E}}(g|\mathcal{F})(x)\le\max g$.
This is a countable collection of $\mathcal{F}$-measurable conditions, each of which holds with full probability. Let $X_0$ be the set of $x$'s which satisfy all of them. This is an $\mathcal{F}$-measurable set of full measure.
We see that for each $x\in X_0$, $\Lambda_x[g] := \widehat{\mathbb{E}}(g|\mathcal{F})(x)$ is a linear functional on $\mathcal{A}_{\mathbb{Q}}$, and $\|\Lambda_x\|\le1$. It follows that $\Lambda_x$ extends uniquely to a positive bounded linear functional on $C(X)$. By the Riesz representation theorem, this functional is given by a measure $\mu_x$.
Step 1. $\int\bigl(\int f\,d\mu_x\bigr)d\mu(x) = \int f\,d\mu$ for all $f\in C(X)$.
Proof. This is true for all $f\in\mathcal{A}_{\mathbb{Q}}$ by definition, and extends to all of $C(X)$ because $\mathcal{A}_{\mathbb{Q}}$ is dense in $C(X)$. (But for $f\in L^1$ it is not even clear that the statement makes sense, because $\mu_x$ could live on a set with zero $\mu$-measure!)
Step 2. $x\mapsto\mu_x(E)$ is $\mathcal{F}$-measurable for all $E\in\mathcal{B}$.
Exercise: Prove this using the following steps.
1. The indicator function of any open set is the pointwise limit of a sequence of continuous functions $0\le h_n\le1$, thus the step holds for open sets.
2. The collection of sets whose indicators are pointwise limits of a bounded sequence of continuous functions forms an algebra. The step holds for every set in this algebra.
3. The collection of sets for which the claim holds is a monotone class which contains a generating algebra.
Step 3. If $f = g$ $\mu$-a.e., then $f = g$ $\mu_x$-a.e. for a.e. $x$.
Proof. Suppose $\mu(E) = 0$. Choose open sets $U_n\supseteq E$ such that $\mu(U_n)\to0$. Choose continuous functions $0\le h^\varepsilon_n\le1$ s.t. $h^\varepsilon_n$ vanishes outside $U_n$, $h^\varepsilon_n$ is non-zero inside $U_n$, and $h^\varepsilon_n\xrightarrow[\varepsilon\to0^+]{}1_{U_n}$ (e.g. $h^\varepsilon_n(x) := [\operatorname{dist}(x,U_n^c)/\operatorname{diam}(X)]^{\varepsilon}$).
By construction $1_E\le1_{U_n} = \lim_{\varepsilon\to0^+}h^\varepsilon_n$, whence
\[
\int\mu_x(E)\,d\mu(x)\le\int\!\!\int\lim_{\varepsilon\to0^+}h^\varepsilon_n\,d\mu_x\,d\mu
\le\lim_{\varepsilon\to0^+}\int\!\!\int h^\varepsilon_n\,d\mu_x\,d\mu
= \lim_{\varepsilon\to0^+}\int h^\varepsilon_n\,d\mu\le\mu(U_n)\xrightarrow[n\to\infty]{}0.
\]
It follows that $\mu_x(E) = 0$ a.e.
Step 4. For all absolutely integrable $f$, $\mathbb{E}(f|\mathcal{F})(x) = \int f\,d\mu_x$ a.e.
Proof. Find $g_n\in C(X)$ such that $f = \sum_{n\ge1}g_n$ $\mu$-a.e., and $\sum\|g_n\|_{L^1(\mu)}<\infty$. Then
\[
\mathbb{E}(f|\mathcal{F}) = \sum_{n\ge1}\mathbb{E}(g_n|\mathcal{F})\qquad(\text{because }\mathbb{E}(\cdot|\mathcal{F})\text{ is a bounded operator on }L^1)
\]
\[
= \sum_{n\ge1}\int_X g_n\,d\mu_x\ \text{a.e.}\qquad(\text{because }g_n\in C(X))
\]
\[
= \int_X\sum_{n\ge1}g_n\,d\mu_x\ \text{a.e.}\qquad(\text{justification below})
\]
\[
= \int_X f\,d\mu_x\ \text{a.e.}
\]
Here is the justification: $\sum\int|g_n|\,d\mu_x<\infty$ a.e., because the integral of this expression, by the monotone convergence theorem, is less than $\sum\|g_n\|_1<\infty$. $\square$
2.3.3 The ergodic decomposition

Theorem 2.5 (The Ergodic Decomposition). Let $\mu$ be an invariant Borel probability measure of a Borel map $T$ on a standard probability space $X$. Let $\{\mu_x\}_{x\in X}$ be the conditional probabilities w.r.t. $\operatorname{Inv}(T)$. Then
1. $\mu = \int_X\mu_x\,d\mu(x)$ (i.e. this holds when applied to $L^1$ functions or Borel sets);
2. $\mu_x$ is $T$-invariant for a.e. $x\in X$;
3. $\mu_x$ is ergodic for a.e. $x\in X$.

Proof. By the isomorphism theorem for standard probability spaces, there is no loss of generality in assuming that $X$ is a compact metric space, and that $\mathcal{B}$ is its $\sigma$-algebra of Borel sets.
For every $f\in L^1$, $\int f\,d\mu = \int\mathbb{E}(f|\operatorname{Inv}(T))\,d\mu(x) = \int_X\int_X f\,d\mu_x\,d\mu(x)$. This shows (1). We have to show that $\mu_x$ is invariant and ergodic for a.e. $x$.
Fix a countable set $\{f_n\}$ which is dense in $C(X)$, and choose Borel versions $\widehat{\mathbb{E}}(f_n|\operatorname{Inv}(T))(x)$. By the ergodic theorem, there is a set $\Omega$ of full measure such that for all $x\in\Omega$,
\[
\int f_n\,d\mu_x = \widehat{\mathbb{E}}(f_n|\operatorname{Inv}(T))(x) = \lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}f_n(T^k x)\quad\text{for all }n.
\]
Step 1. $\mu_x$ is $T$-invariant for a.e. $x\in\Omega$.
Proof. For every $n$,
\[
\int f_n\circ T\,d\mu_x = \lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}f_n(T^{k+1}x)\ \text{a.e. (by the PET)}
= \lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}f_n(T^k x) = \widehat{\mathbb{E}}(f_n|\operatorname{Inv}(T))(x) = \int f_n\,d\mu_x.
\]
Let $\Omega'\subseteq\Omega$ be the set of full measure for which the above holds for all $n$. Since $\{f_n\}$ is $\|\cdot\|_\infty$-dense in $C(X)$, we have $\int f\circ T\,d\mu_x = \int f\,d\mu_x$ for all $f\in C(X)$ and $x\in\Omega'$. Fix $x\in\Omega'$. $C(X)$ is $\|\cdot\|_{L^1(\mu_x)}$-dense in $L^1(\mu_x)$, so $\int f\circ T\,d\mu_x = \int f\,d\mu_x$ for all $\mu_x$-integrable functions. This means that $\mu_x\circ T^{-1} = \mu_x$ for all $x\in\Omega'$.
Step 2. $\mu_x$ is ergodic for a.e. $x\in\Omega$.
Proof. With $\{f_n\}_{n\ge1}$ as above, let
\[
\widetilde\Omega := \Bigl\{x\in\Omega : \forall n,\ \lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}f_n(T^k x) = \int f_n\,d\mu_x\Bigr\}.
\]
This is a set of full measure because of the ergodic theorem. Now
\[
0 = \lim_{N\to\infty}\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1}f_n\circ T^k-\int_X f_n\,d\mu_x\Bigr\|_{L^1(\mu)}\qquad(\text{the convergence in the PET is also in }L^1)
\]
\[
= \lim_{N\to\infty}\int_X\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1}f_n\circ T^k-\int_X f_n\,d\mu_x\Bigr\|_{L^1(\mu_x)}d\mu(x)\qquad\Bigl(\because\ \mu = \int_X\mu_x\,d\mu\Bigr)
\]
\[
= \int_X\lim_{N\to\infty}\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1}f_n\circ T^k-\int_X f_n\,d\mu_x\Bigr\|_{L^1(\mu_x)}d\mu(x),
\]
because of the bounded convergence theorem. But this means that
\[
\lim_{N\to\infty}\Bigl\|\frac{1}{N}\sum_{k=0}^{N-1}f_n\circ T^k-\int_X f_n\,d\mu_x\Bigr\|_{L^1(\mu_x)} = 0\quad\text{for a.e. }x.
\]
Since $n$ ranges over a countable set, we get that for a.e. $x$,
\[
\frac{1}{N}\sum_{k=0}^{N-1}f_n\circ T^k\xrightarrow[N\to\infty]{L^1(\mu_x)}\int f_n\,d\mu_x\quad\text{for all }n.
\]
But $\{f_n\}_{n\ge1}$ is dense in $L^1(\mu_x)$, because it is dense in $C(X)$. Therefore we have $L^1(\mu_x)$ convergence for all $L^1(\mu_x)$ functions. We just showed that for a.e. $x$, the ergodic averages converge in $L^1(\mu_x)$ to constants. This means that all $T$-invariant functions are $\mu_x$-a.e. constant, i.e. $\mu_x$ is ergodic. $\square$
2.4 The Subadditive Ergodic Theorem

We begin with two examples.
Example 1 (Random walks on groups). Let $(X,\mathcal{B},\mu,T)$ be the Bernoulli scheme with probability vector $p = (p_1,\ldots,p_d)$. Suppose $G$ is a group, and $f: X\to G$ is the function $f(x_0,x_1,\ldots) = g_{x_0}$, where $g_1,\ldots,g_d\in G$. The expression
\[
f_n(x) := f(x)f(Tx)\cdots f(T^{n-1}x)
\]
describes the position of a random walk on $G$, which starts at the identity, and whose steps have the distribution $\Pr[\text{step} = g_i] = p_i$. What can be said on the behavior of this random walk?
In the special case $G = \mathbb{Z}^d$ or $G = \mathbb{R}^d$, $f_n(x) = f(x)+f(Tx)+\cdots+f(T^{n-1}x)$, and the ergodic theorem^2 says that $\frac{1}{n}f_n(x)$ has an almost sure limit, equal to $\int f\,d\mu = \sum p_i g_i$. So: the random walk has speed $\|\sum p_i g_i\|$, and direction $\sum p_i g_i/\|\sum p_i g_i\|$. (Note that if $G = \mathbb{Z}^d$, the direction need not lie in $G$.)

^2 Applied to each coordinate of the vector valued function $f = (f_1,\ldots,f_d)$.
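For $G = \mathbb{Z}$ the speed statement is just the strong law of large numbers, and is easy to simulate. The sketch below is illustrative only (the choices of $p$ and of the steps $g_i$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])   # probability vector of the Bernoulli scheme
g = np.array([1, -1, 2])        # steps g_1, g_2, g_3 in the group G = Z

n = 500_000
steps = rng.choice(g, size=n, p=p)
fn = steps.cumsum()             # f_n(x) = f(x) + f(Tx) + ... + f(T^{n-1} x)
print(abs(fn[-1] / n - p @ g) < 0.01)   # speed -> sum p_i g_i = 0.6
```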
Example 2 (The derivative cocycle). Suppose $T: V\to V$ is a diffeomorphism acting on an open set $V\subseteq\mathbb{R}^d$. The derivative of $T$ at $x\in V$ is a linear transformation $(dT)(x)$ on $\mathbb{R}^d$, $v\mapsto[(dT)(x)]v$. By the chain rule,
\[
(dT^n)(x) = (dT)(T^{n-1}x)\,(dT)(T^{n-2}x)\cdots(dT)(x).
\]
If we write $f(x) := (dT)(x)\in GL(d,\mathbb{R})$, then we see that
\[
(dT^n)(x) = f(T^{n-1}x)\,f(T^{n-2}x)\cdots f(Tx)\,f(x)
\]
is a "random walk" on $GL(d,\mathbb{R})$. (But notice the order of multiplication!) What is its speed? Is there an asymptotic direction?
The problem of describing the direction of a random walk on a group is deep, and at the front line of research to this day (even in the case of matrix groups!). We postpone it for the moment, and focus on the conceptually easier task of defining the speed. To do this, we assume that $G$ possesses a right invariant metric (e.g. $\operatorname{dist}(A,B) = \bigl|\log\|AB^{-1}\|\bigr|$ on $GL(d,\mathbb{R})$), and we ask for the asymptotic behavior of $g^{(n)}(x) := \operatorname{dist}(\mathrm{id}, f_n(x))$ as $n\to\infty$.
Key observation: $g^{(n+m)}\le g^{(n)}+g^{(m)}\circ T^n$, because
\[
g^{(n+m)} = \operatorname{dist}(\mathrm{id}, f_{n+m}) = \operatorname{dist}(\mathrm{id}, (f_m\circ T^n)f_n)
\le\operatorname{dist}(\mathrm{id}, f_n)+\operatorname{dist}(f_n, (f_m\circ T^n)f_n)
\]
\[
= \operatorname{dist}(\mathrm{id}, f_n)+\operatorname{dist}(\mathrm{id}, f_m\circ T^n)\qquad(\text{right invariance})
= g^{(n)}+g^{(m)}\circ T^n.
\]
We say that $\{g^{(n)}\}_{n\ge1}$ is a subadditive cocycle.
Theorem 2.6 (Kingman's Subadditive Ergodic Theorem). Let $(X,\mathcal{B},m,T)$ be a probability preserving transformation, and suppose $g^{(n)}: X\to\mathbb{R}$ is a sequence of absolutely integrable functions such that $g^{(n+m)}\le g^{(n)}+g^{(m)}\circ T^n$ for all $n,m$. Then the limit $g := \lim_{n\to\infty}g^{(n)}/n$ exists almost surely, and is an invariant function.

Proof. We begin by observing that it is enough to treat the case when the $g^{(n)}$ are all non-positive. Indeed, the functions
\[
h^{(n)} := g^{(n)}-(g^{(1)}+g^{(1)}\circ T+\cdots+g^{(1)}\circ T^{n-1})
\]
are non-positive, satisfy $h^{(n+m)}\le h^{(n)}+h^{(m)}\circ T^n$, and $h^{(n)}/n$ converges a.e. to an invariant function iff $g^{(n)}/n$ converges a.e. to an invariant function, because $(g^{(n)}-h^{(n)})/n = (g^{(1)}+g^{(1)}\circ T+\cdots+g^{(1)}\circ T^{n-1})/n\to\mathbb{E}(g^{(1)}|\operatorname{Inv})$ a.e. by Birkhoff's ergodic theorem.
Assume then that the $g^{(n)}$ are all non-positive. Define $G(x) := \liminf_{n\to\infty}g^{(n)}(x)/n$ (the limit may be equal to $-\infty$). We claim that $G\circ T = G$ almost surely. Starting from the subadditivity inequality $g^{(n+1)}\le g^{(n)}\circ T+g^{(1)}$, we see that $G\le G\circ T$. Suppose there were a set of positive measure $E$ where $G\circ T>G+\varepsilon$. Then for every $x\in E$, $G(T^n x)>G(x)+\varepsilon$, in contradiction to the Poincaré Recurrence Theorem. Thus $G = G\circ T$ almost surely.
Henceforth we work on the set of full measure $X_0 := \bigcap_{n\ge1}[G\circ T^n = G]$.
Fix $M>0$, and define $G_M := G\vee(-M)$. This is an invariant function on $X_0$. We aim at showing $\limsup_{n\to\infty}\frac{g^{(n)}}{n}\le G_M$ a.s. Since $M$ is arbitrary, this implies that $\limsup_{n}g^{(n)}/n\le G = \liminf_{n}g^{(n)}/n$, whence the theorem.
Fix $x\in X_0$, $N\in\mathbb{N}$, and $\varepsilon>0$. Call $k\in\mathbb{N}_0$:
good, if $\exists\ell\in\{1,\ldots,N\}$ s.t. $g^{(\ell)}(T^k x)/\ell\le G_M(T^k x)+\varepsilon = G_M(x)+\varepsilon$;
bad, if $g^{(\ell)}(T^k x)/\ell>G_M(x)+\varepsilon$ for all $\ell = 1,\ldots,N$.
Color the integers $\{1,\ldots,n\}$ inductively as follows, starting from $k = 1$. Let $k$ be the smallest non-colored integer:
(a) if $k\le n-N$ and $k$ is bad, color it red;
(b) if $k\le n-N$ and $k$ is good, find the smallest $\ell\le N$ s.t. $g^{(\ell)}(T^k x)/\ell\le G_M(T^k x)+\varepsilon$ and color the segment $[k, k+\ell)$ blue;
(c) if $k>n-N$, color $k$ white.
Repeat this procedure until all integers $\{1,\ldots,n\}$ are colored.
The blue part can be decomposed into segments $[\ell_i,\ell_i+\lambda_i)$, with $\lambda_i$ s.t. $g^{(\lambda_i)}(T^{\ell_i}x)/\lambda_i\le G_M(x)+\varepsilon$. Let $b$ denote the total length of these segments.
The red part has size at most $\sum_{k=1}^{n}1_{B(N,M,\varepsilon)}(T^k x)$, where
\[
B(N,M,\varepsilon) := \{x\in X_0 : g^{(\ell)}(x)/\ell>G_M(x)+\varepsilon\ \text{for all }1\le\ell\le N\}.
\]
Let $r$ denote the size of the red part.
The white part has size at most $N$. Let $w$ be this size.
By the subadditivity condition,
\[
\frac{g^{(n)}(x)}{n}\le\frac{1}{n}\sum_{i}g^{(\lambda_i)}(T^{\ell_i}x)+\underbrace{\frac{1}{n}\sum_{k\ \mathrm{red}}g^{(1)}(T^k x)+\frac{1}{n}\sum_{k\ \mathrm{white}}g^{(1)}(T^k x)}_{\text{non-positive}}
\]
\[
\le\frac{1}{n}\sum_{i}g^{(\lambda_i)}(T^{\ell_i}x)\le\frac{1}{n}\sum_{i}(G_M(x)+\varepsilon)\lambda_i = \frac{b}{n}(G_M(x)+\varepsilon).
\]
Now $b = n-(r+w) = n-\sum_{k=1}^{n}1_{B(N,M,\varepsilon)}(T^k x)+O(1)$, so by the Birkhoff ergodic theorem, for almost every $x$, $b/n\xrightarrow[n\to\infty]{}1-\mathbb{E}(1_{B(N,M,\varepsilon)}|\operatorname{Inv})$. Thus
\[
\limsup_{n\to\infty}\frac{g^{(n)}(x)}{n}\le(G_M(x)+\varepsilon)\bigl(1-\mathbb{E}(1_{B(N,M,\varepsilon)}|\operatorname{Inv})\bigr)\quad\text{almost surely.}
\]
Now $N$ was arbitrary, and for fixed $M$ and $\varepsilon$, $B(N,M,\varepsilon)\downarrow\varnothing$ as $N\to\infty$, because $G_M\ge G = \liminf_{\ell}g^{(\ell)}/\ell$. It is not difficult to deduce from this that $\mathbb{E}(1_{B(N,M,\varepsilon)}|\operatorname{Inv})\xrightarrow[N\to\infty]{}0$ almost surely.^3 Thus
\[
\limsup_{n\to\infty}\frac{g^{(n)}(x)}{n}\le G_M(x)+\varepsilon\quad\text{almost surely.}
\]
Since $\varepsilon$ was arbitrary, $\limsup_{n}g^{(n)}/n\le G_M$ almost surely, which proves the theorem by the discussion above. $\square$
Proposition 2.3. Suppose $m$ is ergodic. Then the limit in Kingman's ergodic theorem is the constant $\inf_n\bigl[\frac{1}{n}\int g^{(n)}\,dm\bigr]$ (possibly equal to $-\infty$).

Proof. Let $G := \lim g^{(n)}/n$. Subadditivity implies that $G\le G\circ T$. Recurrence implies that $G\circ T = G$. Ergodicity implies that $G = c$ a.e., for some constant $c = c(g)$. We claim that $c\le\inf_n\bigl[\frac{1}{n}\int g^{(n)}\,dm\bigr]$. This is because
\[
c = \lim_{k\to\infty}\frac{1}{kn}g^{(kn)}\le\lim_{k\to\infty}\frac{1}{k}\Bigl[\frac{g^{(n)}}{n}+\frac{g^{(n)}}{n}\circ T^n+\cdots+\frac{g^{(n)}}{n}\circ T^{n(k-1)}\Bigr]
= \frac{1}{n}\int g^{(n)}\,dm\quad(\text{Birkhoff's ergodic theorem}),
\]
proving that $c\le\frac{1}{n}\int g^{(n)}\,dm$ for all $n$.
To prove the other inequality we first note (as in the proof of Kingman's subadditive theorem) that it is enough to treat the case when the $g^{(n)}$ are all non-positive. Otherwise work with $h^{(n)} := g^{(n)}-(g^{(1)}+\cdots+g^{(1)}\circ T^{n-1})$. Since $g^{(1)}\in L^1$,
\[
\frac{1}{n}(g^{(1)}+\cdots+g^{(1)}\circ T^{n-1})\xrightarrow[n\to\infty]{}\int g^{(1)}\,dm\quad\text{pointwise and in }L^1.
\]
Thus, writing $S_n g^{(1)} := g^{(1)}+\cdots+g^{(1)}\circ T^{n-1}$,
\[
c(g) = \lim\frac{g^{(n)}}{n} = c(h)+\int g^{(1)}\,dm = \inf_n\frac{1}{n}\Bigl[\int h^{(n)}\,dm+\int S_n g^{(1)}\,dm\Bigr] = \inf_n\frac{1}{n}\int g^{(n)}\,dm.
\]
Suppose then that the $g^{(n)}$ are all non-positive. Fix $N$, and set $g^{(n)}_N := \max\{g^{(n)}, -nN\}$. This is, again, subadditive because
\[
g^{(n+m)}_N = \max\{g^{(n+m)}, -(n+m)N\}\le\max\{g^{(n)}+g^{(m)}\circ T^n, -(n+m)N\}
\le\max\{g^{(n)}_N+g^{(m)}_N\circ T^n, -(n+m)N\}\le g^{(n)}_N+g^{(m)}_N\circ T^n.
\]
By Kingman's theorem, $g^{(n)}_N/n$ converges pointwise to a constant $c(g_N)$. By definition, $-N\le g^{(n)}_N/n\le0$, so by the bounded convergence theorem,
\[
c(g_N) = \lim_{n\to\infty}\frac{1}{n}\int g^{(n)}_N\,dm\ \ge\ \inf_n\frac{1}{n}\int g^{(n)}_N\,dm\ \ge\ \inf_n\frac{1}{n}\int g^{(n)}\,dm.\tag{2.1}
\]
^3 Suppose $0\le f_n\le1$ and $f_n\downarrow0$. The conditional expectation is monotone, so $\mathbb{E}(f_n|\mathcal{F})$ is decreasing at almost every point. Let $\varphi$ be its almost sure limit; then $0\le\varphi\le1$ a.s., and by the BCT, $\mathbb{E}(\varphi) = \mathbb{E}(\lim\mathbb{E}(f_n|\mathcal{F})) = \lim\mathbb{E}(\mathbb{E}(f_n|\mathcal{F})) = \lim\mathbb{E}(f_n) = \mathbb{E}(\lim f_n) = 0$, whence $\varphi = 0$ almost everywhere.
Case 1: $c(g) = -\infty$. In this case $g^{(n)}/n\to-\infty$, and for every $N$ there exists $N(x)$ s.t. $n>N(x)\Rightarrow g^{(n)}_N(x) = -nN$. Thus $c(g_N) = -N$, and (2.1) implies $\inf_n\bigl[\frac{1}{n}\int g^{(n)}\,dm\bigr]\le-N$ for every $N$, i.e. $\inf_n\bigl[\frac{1}{n}\int g^{(n)}\,dm\bigr] = -\infty = c(g)$.
Case 2: $c(g)$ is finite. Take $N>|c(g)|+1$; then for a.e. $x$, if $n$ is large enough, then $g^{(n)}/n>c(g)-1>-N$, whence $g^{(n)}_N = g^{(n)}$. Thus $c(g) = c(g_N)\ge\inf_n\frac{1}{n}\int g^{(n)}\,dm$, and we get the other inequality. $\square$
Here is a direct consequence of the subadditive ergodic theorem (historically, it predates the subadditive ergodic theorem):

Theorem 2.7 (Furstenberg–Kesten). Let $(X,\mathcal{B},\mu,T)$ be a ppt, and suppose $A: X\to GL(d,\mathbb{R})$ is a measurable function s.t. $\log\|A\|\in L^1$. If $A_n(x) := A(T^{n-1}x)\cdots A(x)$, then the following limit exists a.e. and is invariant: $\lim_{n\to\infty}\frac{1}{n}\log\|A_n(x)\|$.
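The Furstenberg–Kesten limit is easy to approximate numerically. The sketch below is not from the notes; the two matrices and the i.i.d. coin-flip scheme are arbitrary illustrative choices. It estimates $\frac{1}{n}\log\|A_n(x)v\|$ by tracking a single vector (for a generic $v$ this gives the same limit as $\frac{1}{n}\log\|A_n(x)\|$), renormalizing at each step to avoid overflow:

```python
import numpy as np

rng = np.random.default_rng(1)
M1 = np.array([[2.0, 1.0], [1.0, 1.0]])   # two invertible matrices,
M2 = np.array([[1.0, 1.0], [0.0, 1.0]])   # chosen i.i.d. with prob. 1/2 each

n = 5000
v = np.array([1.0, 0.0])
log_norm = 0.0
for _ in range(n):
    v = (M1 if rng.random() < 0.5 else M2) @ v
    s = np.linalg.norm(v)
    log_norm += np.log(s)                 # accumulate log ||A_n v|| stably
    v /= s

print(log_norm / n)                       # ~ the a.s. limit of the theorem
```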
The following immediate consequence will be used in the proof of the Oseledets theorem for invertible cocycles.
Remark: Suppose $(X,\mathcal{B},m,T)$ is invertible, and let $g^{(n)}$ be a subadditive cocycle s.t. $g^{(1)}\in L^1$. Then for a.e. $x$, $\lim_{n\to\infty}g^{(n)}\circ T^{-n}/n$ exists and equals $\lim_{n\to\infty}g^{(n)}/n$.
Proof. Since $g^{(n)}$ is subadditive, $g^{(n)}\circ T^{-n}$ is subadditive:
\[
g^{(n+m)}\circ T^{-(n+m)}\le[g^{(n)}\circ T^m+g^{(m)}]\circ T^{-(n+m)} = g^{(n)}\circ T^{-n}+[g^{(m)}\circ T^{-m}]\circ T^{-n}.
\]
Let $m = \int m_y\,d\nu(y)$ be the ergodic decomposition of $m$. Kingman's ergodic theorem and Proposition 2.3 say that for a.e. $y$,
\[
\lim_{n\to\infty}\frac{g^{(n)}\circ T^{-n}}{n} = \inf_n\frac{1}{n}\int g^{(n)}\circ T^{-n}\,dm_y = \inf_n\frac{1}{n}\int g^{(n)}\,dm_y
= \lim_{n\to\infty}\frac{g^{(n)}}{n}\quad m_y\text{-a.e.}
\]
Thus the set where the statement of the remark fails has zero measure with respect to all the ergodic components of $m$, and this means that the statement is satisfied on a set of full $m$-measure. $\square$
2.5 The Multiplicative Ergodic Theorem

2.5.1 Preparations from Multilinear Algebra

Multilinear forms. Let $V = \mathbb{R}^n$ equipped with the usual euclidean metric. A linear functional on $V$ is a linear map $\varphi: V\to\mathbb{R}$. The set of linear functionals is denoted by $V^*$. Any $v\in V$ determines $v^*\in V^*$ via $v^* = \langle v,\cdot\rangle$. Any linear functional is of this form.
A $k$-multilinear function is a function $T: V^k\to\mathbb{R}$ such that for all $i$ and $v_1,\ldots,v_{i-1},v_{i+1},\ldots,v_k\in V$, $T(v_1,\ldots,v_{i-1},\cdot,v_{i+1},\ldots,v_k)$ is a linear functional.
The set of all $k$-multilinear functions on $V$ is denoted by $T^k(V)$. The tensor product of $\varphi\in T^k(V)$ and $\psi\in T^\ell(V)$ is $\varphi\otimes\psi\in T^{k+\ell}(V)$ given by
\[
(\varphi\otimes\psi)(v_1,\ldots,v_{k+\ell}) := \varphi(v_1,\ldots,v_k)\,\psi(v_{k+1},\ldots,v_{k+\ell}).
\]
The tensor product is bilinear and associative, but it is not commutative.
The dimension of $T^k(V)$ is $n^k$. Here is a basis: $\{e^*_{i_1}\otimes\cdots\otimes e^*_{i_k} : 1\le i_1,\ldots,i_k\le n\}$. To see this, note that every element of $T^k(V)$ is completely determined by its action on $\{(e_{i_1},\ldots,e_{i_k}) : 1\le i_1,\ldots,i_k\le n\}$.
Define an inner product on $T^k(V)$ by declaring the above basis to be orthonormal.
Alternating multilinear forms. A multilinear form $\omega$ is called alternating, if it satisfies: $\exists i\neq j$ $(v_i = v_j)$ $\Rightarrow$ $\omega(v_1,\ldots,v_n) = 0$. Equivalently,
\[
\omega(v_1,\ldots,v_i,\ldots,v_j,\ldots,v_n) = -\omega(v_1,\ldots,v_j,\ldots,v_i,\ldots,v_n)
\]
(to see the equivalence, expand $\omega(v_1,\ldots,v_i+v_j,\ldots,v_j+v_i,\ldots,v_n)$). The set of all $k$-alternating forms is denoted by $\Lambda^k(V)$.
Any multilinear form $\varphi$ gives rise to an alternating form $\operatorname{Alt}(\varphi)$ via
\[
\operatorname{Alt}(\varphi) := \frac{1}{k!}\sum_{\sigma\in S_k}\operatorname{sgn}(\sigma)\,\sigma\cdot\varphi,
\]
where $S_k$ is the group of $k$-permutations, and the action of a permutation $\sigma$ on $\varphi\in T^k(V)$ is given by $(\sigma\cdot\varphi)(v_1,\ldots,v_k) = \varphi(v_{\sigma(1)},\ldots,v_{\sigma(k)})$. The normalization $\frac{1}{k!}$ is to guarantee $\operatorname{Alt}|_{\Lambda^k(V)} = \mathrm{id}$, and $\operatorname{Alt}^2 = \operatorname{Alt}$. Note that $\operatorname{Alt}$ is linear.

Lemma 2.1. $\operatorname{Alt}[\operatorname{Alt}(\omega_1\otimes\omega_2)\otimes\omega_3] = \operatorname{Alt}(\omega_1\otimes\omega_2\otimes\omega_3)$.
Proof. We show that if $\operatorname{Alt}(\varphi) = 0$, then $\operatorname{Alt}(\varphi\otimes\psi) = 0$ for all $\psi$. Specializing to the case $\varphi = \operatorname{Alt}(\omega_1\otimes\omega_2)-\omega_1\otimes\omega_2$ and $\psi = \omega_3$, we get (since $\operatorname{Alt}^2 = \operatorname{Alt}$)
\[
\operatorname{Alt}[(\operatorname{Alt}(\omega_1\otimes\omega_2)-\omega_1\otimes\omega_2)\otimes\omega_3] = 0,
\]
which is equivalent to the statement of the lemma.
Suppose $\varphi\in T^k(V)$, $\psi\in T^\ell(V)$, and $\operatorname{Alt}(\varphi) = 0$. Let $G := \{\sigma\in S_{k+\ell} : \sigma(i) = i$ for all $i = k+1,\ldots,k+\ell\}$. This is a subgroup of $S_{k+\ell}$, and there is a natural isomorphism $\sigma\mapsto\sigma' := \sigma|_{\{1,\ldots,k\}}$ from $G$ to $S_k$. Let $S_{k+\ell} = \biguplus_j G\sigma_j$ be the corresponding right coset decomposition; then
\[
(k+\ell)!\operatorname{Alt}(\varphi\otimes\psi)(v_1,\ldots,v_{k+\ell})
= \sum_j\sum_{\sigma\in G}\operatorname{sgn}(\sigma\sigma_j)\,\bigl((\sigma\sigma_j)\cdot(\varphi\otimes\psi)\bigr)(v_1,\ldots,v_{k+\ell})
\]
\[
= \sum_j\operatorname{sgn}(\sigma_j)\,\psi(v_{\sigma_j(k+1)},\ldots,v_{\sigma_j(k+\ell)})\sum_{\sigma'\in S_k}\operatorname{sgn}(\sigma')\,(\sigma'\cdot\varphi)(v_{\sigma_j(1)},\ldots,v_{\sigma_j(k)})
\]
\[
= \sum_j\operatorname{sgn}(\sigma_j)\,\psi(v_{\sigma_j(k+1)},\ldots,v_{\sigma_j(k+\ell)})\cdot k!\operatorname{Alt}(\varphi)(v_{\sigma_j(1)},\ldots,v_{\sigma_j(k)}) = 0.\qquad\square
\]
Using this antisymmetrization operator, we define the following product, called exterior product or wedge product: if $\varphi\in\Lambda^k(V)$, $\psi\in\Lambda^\ell(V)$, then
\[
\varphi\wedge\psi := \frac{(k+\ell)!}{k!\,\ell!}\operatorname{Alt}(\varphi\otimes\psi).
\]
The wedge product is bilinear, and the previous lemma shows that it is associative. It is "almost" anti-commutative: if $\varphi\in\Lambda^k(V)$, $\psi\in\Lambda^\ell(V)$, then $\varphi\wedge\psi = (-1)^{k\ell}\psi\wedge\varphi$. We'll see the reason for the peculiar normalization later.

Proposition 2.4. $\{e^*_{i_1}\wedge\cdots\wedge e^*_{i_k} : 1\le i_1<\cdots<i_k\le n\}$ is a basis for $\Lambda^k(V)$, whence $\dim\Lambda^k(V) = \binom{n}{k}$.
Proof. Suppose $\omega\in\Lambda^k(V)$; then $\omega\in T^k(V)$ and so $\omega = \sum a_{i_1,\ldots,i_k}\,e^*_{i_1}\otimes\cdots\otimes e^*_{i_k}$, where the sum ranges over all $k$-tuples of numbers between $1$ and $n$. Since $\omega\in\Lambda^k(V)$, $\operatorname{Alt}(\omega) = \omega$ and so
\[
\omega = \sum a_{i_1,\ldots,i_k}\operatorname{Alt}(e^*_{i_1}\otimes\cdots\otimes e^*_{i_k}).
\]
Fix $\varphi := e^*_{i_1}\otimes\cdots\otimes e^*_{i_k}$. If $i_\alpha = i_\beta$ for some $\alpha\neq\beta$, then the permutation $\sigma_0$ which switches $\alpha,\beta$ preserves $\varphi$. Thus for all $\sigma\in S_k$,
\[
\operatorname{sgn}(\sigma\sigma_0)\,(\sigma\sigma_0)\cdot\varphi = -\operatorname{sgn}(\sigma)\,\sigma\cdot\varphi,
\]
and we conclude that $\operatorname{Alt}(\varphi) = 0$. If, on the other hand, $i_1,\ldots,i_k$ are all different, then it is easy to see using lemma 2.1 that
\[
\operatorname{Alt}(e^*_{i_1}\otimes e^*_{i_2}\otimes\cdots\otimes e^*_{i_k}) = \frac{1}{k!}\,e^*_{i_1}\wedge\cdots\wedge e^*_{i_k}.
\]
Thus $\omega = \frac{1}{k!}\sum a_{i_1,\ldots,i_k}\,e^*_{i_1}\wedge\cdots\wedge e^*_{i_k}$, and we have proved that the set of forms in the statement spans $\Lambda^k(V)$ (rearranging each wedge into increasing index order changes at most the sign). To see that this set is independent, note that we can determine the coefficient of $e^*_{i_1}\wedge\cdots\wedge e^*_{i_k}$ by evaluating the form on $(e_{i_1},\ldots,e_{i_k})$. $\square$
Corollary 2.2. $e^*_1\wedge\dots\wedge e^*_n$ is the determinant. This is the reason for the peculiar normalization in the definition of $\wedge$.

Proof. The determinant is an alternating $n$-form, and $\dim\Lambda^n(V)=1$, so the determinant is proportional to $e^*_1\wedge\dots\wedge e^*_n$. Since the values of both forms on the standard basis equal one (because $e^*_1\wedge\dots\wedge e^*_n=n!\,\mathrm{Alt}(e^*_1\otimes\dots\otimes e^*_n)$), they are equal. $\square$
We define an inner product on $\Lambda^k(V)$ by declaring the basis in the proposition to be orthonormal. Let $\|\cdot\|$ be the resulting norm.
Lemma 2.2. For $v\in V$, let $v^*:=\langle v,\cdot\rangle$; then

(a) $\|\omega\wedge\eta\|\le\|\omega\|\,\|\eta\|$.
(b) $\langle v^*_1\wedge\dots\wedge v^*_k,\ w^*_1\wedge\dots\wedge w^*_k\rangle=\det\big(\langle v_i,w_j\rangle\big)$.
(c) If $u_1,\dots,u_n$ is an orthonormal basis for $V$, then $\{u^*_{i_1}\wedge\dots\wedge u^*_{i_k}:1\le i_1<\dots<i_k\le n\}$ is an orthonormal basis for $\Lambda^k(V)$.
(d) If $\mathrm{span}\{v_1,\dots,v_k\}=\mathrm{span}\{u_1,\dots,u_k\}$, then $v^*_1\wedge\dots\wedge v^*_k$ and $u^*_1\wedge\dots\wedge u^*_k$ are proportional.
Proof. Write, for $I=(i_1,\dots,i_k)$ such that $1\le i_1<\dots<i_k\le n$, $e^*_I:=e^*_{i_1}\wedge\dots\wedge e^*_{i_k}$. Represent $\omega=\sum_I\omega_Ie^*_I$ and $\eta=\sum_J\eta_Je^*_J$; then
$$\|\omega\wedge\eta\|^2=\Big\|\sum_{I,J}\omega_I\eta_J\,e^*_I\wedge e^*_J\Big\|^2=\Big\|\sum_{I\cap J=\varnothing}\omega_I\eta_J\,e^*_I\wedge e^*_J\Big\|^2\le\sum_{I\cap J=\varnothing}\omega_I^2\eta_J^2\le\|\omega\|^2\|\eta\|^2.$$
For part (b), take two multi-indices $I,J$. If $I=J$, then the inner product matrix is the identity matrix. If $I\neq J$, then there exists $i\in I\setminus J$, and the row of the inner product matrix corresponding to $i$ is zero. Thus the formula holds for any pair $e^*_I,e^*_J$. Since part (b) of the lemma holds for all basis vectors, it holds for all vectors. Part (c) immediately follows.

Next we prove part (d). Represent $v_i=\sum_j\alpha_{ij}u_j$; then
$$v^*_1\wedge\dots\wedge v^*_k=\mathrm{const}\cdot\mathrm{Alt}(v^*_1\otimes\dots\otimes v^*_k)=\mathrm{const}\cdot\mathrm{Alt}\Big(\sum_j\alpha_{1j}u^*_j\otimes\dots\otimes\sum_j\alpha_{kj}u^*_j\Big)=\mathrm{const}\cdot\sum\alpha_{1j_1}\cdots\alpha_{kj_k}\,\mathrm{Alt}(u^*_{j_1}\otimes\dots\otimes u^*_{j_k}).$$
The terms where $j_1,\dots,j_k$ are not all different are annihilated by $\mathrm{Alt}$. The terms where $j_1,\dots,j_k$ are all different are mapped by $\mathrm{Alt}$ to a form which is proportional to $u^*_1\wedge\dots\wedge u^*_k$. Thus the result of the sum is proportional to $u^*_1\wedge\dots\wedge u^*_k$. $\square$
Exterior product of linear operators. Let $A:V\to V$ be a linear operator. The $k$-th exterior product of $A$ is $A^{\wedge k}:\Lambda^k(V)\to\Lambda^k(V)$ given by
$$(A^{\wedge k}\omega)(v_1,\dots,v_k):=\omega(A^tv_1,\dots,A^tv_k).$$
The transpose is used to get $A^{\wedge k}(v^*_1\wedge\dots\wedge v^*_k)=(Av_1)^*\wedge\dots\wedge(Av_k)^*$.
Theorem 2.8. $\|A^{\wedge k}\|=\alpha_1\cdots\alpha_k$, where $\alpha_1\ge\alpha_2\ge\dots\ge\alpha_n$ are the eigenvalues of $(A^tA)^{1/2}$, listed in decreasing order with multiplicities.
Proof. The matrix $A^tA$ is symmetric, so it can be orthogonally diagonalized. Let $v_1,\dots,v_d$ be an orthonormal basis of eigenvectors, listed so that $(A^tA)v_i=\alpha_i^2v_i$. Then $\{v^*_I:I\subset\{1,\dots,d\},\ |I|=k\}$ is an orthonormal basis for $\Lambda^k(\mathbb R^d)$, where we are using the multi-index notation
$$v^*_I=v^*_{i_1}\wedge\dots\wedge v^*_{i_k},$$
where $i_1<\dots<i_k$ is an ordering of $I$.

Given $\omega\in\Lambda^k(\mathbb R^d)$, write $\omega=\sum_I\omega_Iv^*_I$; then
$$\|A^{\wedge k}\omega\|^2=\langle A^{\wedge k}\omega,A^{\wedge k}\omega\rangle=\Big\langle\sum_I\omega_IA^{\wedge k}v^*_I,\ \sum_J\omega_JA^{\wedge k}v^*_J\Big\rangle=\sum_{I,J}\omega_I\omega_J\big\langle A^{\wedge k}v^*_I,A^{\wedge k}v^*_J\big\rangle.$$
Now,
$$\big\langle A^{\wedge k}v^*_I,A^{\wedge k}v^*_J\big\rangle=\big\langle(Av_{i_1})^*\wedge\dots\wedge(Av_{i_k})^*,\ (Av_{j_1})^*\wedge\dots\wedge(Av_{j_k})^*\big\rangle$$
$$=\det\big(\langle Av_{i_\alpha},Av_{j_\beta}\rangle\big)\quad\text{(Lemma 2.2(b))}$$
$$=\det\big(\langle v_{i_\alpha},A^tAv_{j_\beta}\rangle\big)=\det\big(\langle v_{i_\alpha},\alpha_{j_\beta}^2v_{j_\beta}\rangle\big)=\prod_{j\in J}\alpha_j^2\,\det\big(\langle v_{i_\alpha},v_{j_\beta}\rangle\big)=\prod_{j\in J}\alpha_j^2\,\langle v^*_I,v^*_J\rangle=\begin{cases}\prod_{i\in I}\alpha_i^2,&I=J,\\ 0,&I\neq J.\end{cases}$$
Thus $\|A^{\wedge k}\omega\|^2=\sum_I\omega_I^2\prod_{i\in I}\alpha_i^2\le\|\omega\|^2\prod_{i=1}^k\alpha_i^2$. It follows that $\|A^{\wedge k}\|\le\alpha_1\cdots\alpha_k$.

To see that the inequality is in fact an equality, consider the case $\omega=v^*_I$ where $I=\{1,\dots,k\}$:
$$\|A^{\wedge k}\omega\|^2=\langle A^{\wedge k}v^*_I,A^{\wedge k}v^*_I\rangle=(\alpha_1\cdots\alpha_k)^2=(\alpha_1\cdots\alpha_k)^2\|\omega\|^2.\qquad\square$$
Exterior products and angles between vector spaces. The angle between vector spaces $V,W\subset\mathbb R^d$ is
$$\angle(V,W):=\min\{\arccos\langle v,w\rangle:v\in V,\ w\in W,\ \|v\|=\|w\|=1\}.$$
So $V\cap W\neq\{0\}$ iff $\angle(V,W)=0$, and $V\perp W$ iff $\angle(V,W)=\pi/2$.
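Theorem 2.8 above is easy to test numerically: in the orthonormal basis $\{e^*_I\}$, $A^{\wedge k}$ is represented by the $k$-th compound matrix of $A$ (the matrix of its $k\times k$ minors), and an SVD supplies the $\alpha_i$. The Python sketch below is an illustration, not part of the notes:

```python
import itertools

import numpy as np

def compound(A, k):
    """k-th compound matrix of A: the (I, J) entry is the minor det(A[I, J]),
    |I| = |J| = k. It represents A^{wedge k} in the orthonormal basis {e*_I}."""
    n = A.shape[0]
    idx = list(itertools.combinations(range(n), k))
    return np.array([[np.linalg.det(A[np.ix_(I, J)]) for J in idx] for I in idx])

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
alpha = np.linalg.svd(A, compute_uv=False)   # eigenvalues of (A^t A)^{1/2}, decreasing
for k in range(1, 6):
    op_norm = np.linalg.norm(compound(A, k), ord=2)   # ||A^{wedge k}||
    assert np.isclose(op_norm, alpha[:k].prod())      # Theorem 2.8
```

The same compound-matrix representation reappears below when $\|A_n^{\wedge i}\|$ is used to build subadditive cocycles.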
Proposition 2.5. If $(w_1,\dots,w_k)$ is a basis of $W$, and $(v_1,\dots,v_\ell)$ is a basis of $V$, then
$$\|(v^*_1\wedge\dots\wedge v^*_\ell)\wedge(w^*_1\wedge\dots\wedge w^*_k)\|\ \ge\ \|v^*_1\wedge\dots\wedge v^*_\ell\|\cdot\|w^*_1\wedge\dots\wedge w^*_k\|\cdot|\sin\angle(V,W)|.$$
Proof. If $V\cap W\neq\{0\}$ then both sides are zero, so suppose $V\cap W=\{0\}$, and pick an orthonormal basis $e_1,\dots,e_{\ell+k}$ for $V\oplus W$. Let $w\in W$, $v\in V$ be unit vectors s.t. $\angle(V,W)=\angle(v,w)$, and write $v=\sum v_ie_i$, $w=\sum w_je_j$; then
$$\|v^*\wedge w^*\|^2=\Big\|\sum_{i,j}v_iw_j\,e^*_i\wedge e^*_j\Big\|^2=\Big\|\sum_{i<j}(v_iw_j-v_jw_i)\,e^*_i\wedge e^*_j\Big\|^2=\sum_{i<j}(v_iw_j-v_jw_i)^2$$
$$=\frac12\sum_{i,j}(v_iw_j-v_jw_i)^2\quad\text{(the terms where $i=j$ vanish)}$$
$$=\frac12\sum_{i,j}\big(v_i^2w_j^2+v_j^2w_i^2-2v_iw_iv_jw_j\big)=\frac12\Big(2\sum_iv_i^2\sum_jw_j^2-2\Big(\sum_iv_iw_i\Big)^2\Big)$$
$$=\|v\|^2\|w\|^2-\langle v,w\rangle^2=1-\cos^2\angle(v,w)=\sin^2\angle(V,W).$$
Complete $v$ to an orthonormal basis $(v,v'_2,\dots,v'_\ell)$ of $V$, and complete $w$ to an orthonormal basis $(w,w'_2,\dots,w'_k)$ of $W$. Then
$$\|(v^*\wedge v'^*_2\wedge\dots\wedge v'^*_\ell)\wedge(w^*\wedge w'^*_2\wedge\dots\wedge w'^*_k)\|\ \ge\ \|v^*\wedge w^*\|\cdot\|v'^*_2\wedge\dots\wedge v'^*_\ell\|\cdot\|w'^*_2\wedge\dots\wedge w'^*_k\|=|\sin\angle(V,W)|\cdot1\cdot1,$$
because of orthonormality. By Lemma 2.2(d),
$$v^*_1\wedge\dots\wedge v^*_\ell=\|v^*_1\wedge\dots\wedge v^*_\ell\|\cdot v^*\wedge v'^*_2\wedge\dots\wedge v'^*_\ell,\qquad w^*_1\wedge\dots\wedge w^*_k=\|w^*_1\wedge\dots\wedge w^*_k\|\cdot w^*\wedge w'^*_2\wedge\dots\wedge w'^*_k,$$
and the proposition follows. $\square$
2.5.2 Proof of the Multiplicative Ergodic Theorem
Let $(X,\mathcal B,m,f)$ be a ppt, and $A:X\to GL(d,\mathbb R)$ some Borel map. We define $A_n:=(A\circ f^{n-1})\cdots(A\circ f)\cdot A$; then the cocycle identity holds: $A_{n+m}(x)=A_n(f^mx)A_m(x)$.
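The cocycle identity is easy to verify by direct computation. The Python sketch below is an illustration (the base map and the matrix assignment are invented for the test): it builds $A_n$ by composition along the orbit and checks $A_{n+m}(x)=A_n(f^mx)A_m(x)$:

```python
import numpy as np

rng = np.random.default_rng(2)
GOLDEN = 0.6180339887498949
cache = {}                         # the sampled values of A at visited points

def f(x):                          # the base transformation: an irrational rotation
    return (x + GOLDEN) % 1.0

def A(x):                          # a map X -> GL(2, R), sampled lazily per point
    if x not in cache:
        cache[x] = rng.normal(size=(2, 2)) + 3.0 * np.eye(2)   # safely invertible
    return cache[x]

def A_n(x, n):                     # A_n(x) = A(f^{n-1}x) ... A(fx) A(x)
    P = np.eye(2)
    for _ in range(n):
        P = A(x) @ P
        x = f(x)
    return P

def f_iter(x, m):
    for _ in range(m):
        x = f(x)
    return x

x0 = 0.2
for n, m in [(1, 1), (2, 3), (4, 2)]:
    # cocycle identity: A_{n+m}(x) = A_n(f^m x) A_m(x)
    assert np.allclose(A_n(x0, n + m), A_n(f_iter(x0, m), n) @ A_n(x0, m))
```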
Theorem 2.9 (Multiplicative Ergodic Theorem). Let $(X,\mathcal B,m,T)$ be a ppt, and $A:X\to GL(d,\mathbb R)$ a Borel function s.t. $\ln\|A(x)^{\pm1}\|\in L^1(m)$. Then
$$\Lambda(x):=\lim_{n\to\infty}\big[A_n(x)^tA_n(x)\big]^{1/2n}$$
exists a.e., and $\lim\limits_{n\to\infty}\frac1n\ln\|A_n(x)\Lambda(x)^{-n}\|=\lim\limits_{n\to\infty}\frac1n\ln\|(A_n(x)\Lambda(x)^{-n})^{-1}\|=0$ a.s.
Proof. The matrix $\sqrt{A_n(x)^tA_n(x)}$ is symmetric, therefore it can be orthogonally diagonalized. Let $\lambda^n_1(x)<\dots<\lambda^n_{s_n(x)}(x)$ be its different eigenvalues, and $\mathbb R^d=W^n_{\lambda^n_1(x)}\oplus\dots\oplus W^n_{\lambda^n_{s_n(x)}(x)}$ the orthogonal decomposition of $\mathbb R^d$ into the corresponding eigenspaces. The proof has the following structure:

Part 1: Let $t^n_1(x)\le\dots\le t^n_d(x)$ be a list of the eigenvalues of $\sqrt{A_n(x)^tA_n(x)}$ with multiplicities; then for a.e. $x$, there is a limit $t_i(x)=\lim_{n\to\infty}[t^n_i(x)]^{1/n}$, $i=1,\dots,d$.

Part 2: Let $\lambda_1(x)<\dots<\lambda_{s(x)}(x)$ be a list of the different values of $\{t_i(x)\}_{i=1}^d$. Divide $\{t^n_i(x)\}_{i=1}^d$ into $s(x)$ subsets of values $\{t^n_i(x):i\in I^n_j\}$ $(1\le j\le s(x))$ in such a way that $[t^n_i(x)]^{1/n}\to\lambda_j(x)$ for all $i\in I^n_j$. Let
$$U^n_j(x):=\text{sum of the eigenspaces of }t^n_i(x),\ i\in I^n_j=\text{the part of the space where }B_n(x):=\sqrt{A_n(x)^tA_n(x)}\text{ dilates by approximately }\lambda_j(x)^n.$$
We show that the spaces $U^n_j(x)$ converge as $n\to\infty$ to some limiting spaces $U_j(x)$ (in the sense that the orthogonal projections on $U^n_j(x)$ converge to the orthogonal projection on $U_j(x)$).

Part 3: The theorem holds with $\Lambda(x):\mathbb R^d\to\mathbb R^d$ given by $v\mapsto\lambda_i(x)v$ on $U_i(x)$.

Part 1 is proved by applying the subadditive ergodic theorem to a cleverly chosen subadditive cocycle ("Raghunathan's trick"). Parts 2 and 3 are (non-trivial) linear algebra.
Part 1: Set $g^{(n)}_i(x):=\sum_{j=d-i+1}^d\ln t^n_j(x)$. This quantity is finite, because $A_n^tA_n$ is invertible, so none of its eigenvalues vanish.

The sequence $g^{(n)}_i$ is subadditive! This is because the theory of exterior products says that $\exp g^{(n)}_i=$ the product of the $i$ largest eigenvalues of $\sqrt{A_n(x)^tA_n(x)}=\|A_n(x)^{\wedge i}\|$, so
$$\exp g^{(n+m)}_i(x)=\|A_{n+m}(x)^{\wedge i}\|=\|A_m(T^nx)^{\wedge i}A_n(x)^{\wedge i}\|\le\|A_m(T^nx)^{\wedge i}\|\,\|A_n(x)^{\wedge i}\|=\exp\big[g^{(m)}_i(T^nx)+g^{(n)}_i(x)\big],$$
whence $g^{(n+m)}_i\le g^{(n)}_i+g^{(m)}_i\circ T^n$.

We want to apply Kingman's subadditive ergodic theorem. First we need to check that $g^{(1)}_i\in L^1$. We use the following fact from linear algebra: if $\lambda$ is an eigenvalue of a matrix $B$, then $\|B^{-1}\|^{-1}\le|\lambda|\le\|B\|$. (Proof: let $v$ be an eigenvector of $\lambda$ with norm one; then $|\lambda|=\|Bv\|\le\|B\|$ and $1=\|B^{-1}Bv\|\le\|B^{-1}\|\,\|Bv\|=\|B^{-1}\|\,|\lambda|$.) Therefore
$$|\ln t^n_i(x)|\le\tfrac12\max\big\{|\ln\|A_n^tA_n\||,\ |\ln\|(A_n^tA_n)^{-1}\||\big\}\le\max\big\{|\ln\|A_n(x)\||,\ |\ln\|A_n(x)^{-1}\||\big\}\le\sum_{k=0}^{n-1}\big(|\ln\|A(T^kx)\||+|\ln\|A(T^kx)^{-1}\||\big)$$
$$\therefore\quad|g^{(n)}_i(x)|\le i\sum_{k=0}^{n-1}\big(|\ln\|A(T^kx)\||+|\ln\|A(T^kx)^{-1}\||\big).\qquad(2.2)$$
In the particular case $n=1$, we get that $\|g^{(1)}_i\|_1\le d\big(\big\|\ln\|A\|\big\|_1+\big\|\ln\|A^{-1}\|\big\|_1\big)<\infty$.

Thus Kingman's ergodic theorem says that $\lim\frac1ng^{(n)}_i(x)$ exists almost surely, and belongs to $[-\infty,\infty)$. In fact the limit is finite almost everywhere, because (2.2) and the pointwise ergodic theorem imply that
$$\lim_{n\to\infty}\frac1n|g^{(n)}_i|\le i\,\mathbb E\big(|\ln\|A\||+|\ln\|A^{-1}\||\ \big|\ \mathrm{Inv}\big)<\infty\quad\text{a.e.}$$
Taking differences, we see that the following limit exists a.e.:
$$\ln t_i(x):=\lim_{n\to\infty}\frac1n\big[g^{(n)}_{d-i+1}(x)-g^{(n)}_{d-i}(x)\big]=\lim_{n\to\infty}\frac1n\ln t^n_i(x).$$
Thus $[t^n_i(x)]^{1/n}\to t_i(x)$ almost surely, for some $t_i(x)\in(0,\infty)$.
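The subadditivity in Part 1 can be observed directly: $g^{(n)}_i$ is the sum of the $i$ largest log singular values of $A_n$. The Python sketch below is an illustration in which an i.i.d. matrix sequence stands in for $A(T^kx)$:

```python
import numpy as np

rng = np.random.default_rng(3)
seq = [rng.normal(size=(3, 3)) + 2.0 * np.eye(3) for _ in range(60)]  # A(T^k x)

def A_n(start, n):
    """A_n evaluated at the point T^{start} x."""
    P = np.eye(3)
    for k in range(start, start + n):
        P = seq[k] @ P
    return P

def g(start, n, i):
    """g_i^{(n)} = log of the product of the i largest singular values of A_n."""
    s = np.linalg.svd(A_n(start, n), compute_uv=False)
    return float(np.log(s[:i]).sum())

for i in (1, 2, 3):
    for n, m in [(5, 7), (10, 20), (3, 40)]:
        # subadditivity: g_i^{(n+m)}(x) <= g_i^{(n)}(x) + g_i^{(m)}(T^n x)
        assert g(0, n + m, i) <= g(0, n, i) + g(n, m, i) + 1e-8
```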
Part 2: Fix $x$ s.t. $[t^n_i(x)]^{1/n}\to t_i(x)$ for all $1\le i\le d$. Henceforth we work with this $x$ only, and write for simplicity $A_n=A_n(x)$, $t_i=t_i(x)$, etc.

Let $s=s(x)$ be the number of the different $t_i$. List the different values of these quantities in increasing order: $\lambda_1<\lambda_2<\dots<\lambda_s$. Set $\chi_j:=\log\lambda_j$. Fix $0<\varepsilon<\min_j(\chi_{j+1}-\chi_j)$. Since $(t^n_i)^{1/n}\to t_i$, the following sets eventually stabilize and are independent of $n$:
$$I_j:=\{i:|(t^n_i)^{1/n}-\lambda_j|<\varepsilon\}\qquad(j=1,\dots,s).$$
Define, relative to $\sqrt{A_n^tA_n}$,
$$U^n_j:=\bigoplus_{i\in I_j}[\text{eigenspace of }t^n_i(x)];\qquad V^n_r:=\bigoplus_{j\le r}U^n_j;\qquad\overline V^n_r:=\bigoplus_{j\ge r}U^n_j.$$
The linear spaces $U^n_1,\dots,U^n_s$ are orthogonal, since they are eigenspaces of different eigenvalues of a symmetric matrix ($\sqrt{A_n^tA_n}$). We show that they converge as $n\to\infty$, in the sense that their orthogonal projections converge.

The proof is based on the following technical lemma. Denote the projection of a vector $v$ on a subspace $W$ by $v|W$, and write $\chi_i:=\log\lambda_i$.

Technical lemma: For every $\varepsilon>0$ there exist constants $K_1,\dots,K_s>1$ and $N$ s.t. for all $n>N$, $t=1,\dots,s$, $k\in\mathbb N$, and $u\in V^n_r$,
$$\big\|u\big|\overline V^{n+k}_{r+t}\big\|\le K_t\,\|u\|\exp\big(-n(\chi_{r+t}-\chi_r-t\varepsilon)\big).$$
We give the proof later. First we show how it can be used to finish Parts 2 and 3.

We show that the $V^n_r$ converge as $n\to\infty$. Since the projection on $U^n_i$ is the projection on $V^n_i$ minus the projection on $V^n_{i-1}$, it will then follow that the projections of $U^n_i$ converge.

Fix $N$ large. We need it to be so large that
1. $I_j$ are independent of $n$ for all $n>N$;
2. the technical lemma works for $n>N$ with $\varepsilon$ as above.
There will be other requirements below.
Fix an orthonormal basis $(v^n_1,\dots,v^n_{d_r})$ for $V^n_r$ (where $d_r=\dim(V^n_r)=\sum_{j\le r}|I_j|$). Write
$$v^n_i=\beta^n_iw^{n+1}_i+u^{n+1}_i,\quad\text{where }w^{n+1}_i\in V^{n+1}_r,\ \|w^{n+1}_i\|=1,\ u^{n+1}_i\in\overline V^{n+1}_{r+1}.$$
Note that $\|u^{n+1}_i\|=\|v^n_i|\overline V^{n+1}_{r+1}\|\le K_1\exp(-n(\chi_{r+1}-\chi_r-\varepsilon))$. Using the identity $\beta^n_i=\sqrt{1-\|u^{n+1}_i\|^2}$, it is easy to see that for some constants $C_1$ and $0<\theta<1$ independent of $n$ and $(v^n_i)$,
$$\|v^n_i-w^{n+1}_i\|\le C_1\theta^n.$$
($\theta:=\max_r\exp[-(\chi_{r+1}-\chi_r-\varepsilon)]$ and $C_1:=2K_1$ should work.)

The system $\{w^{n+1}_i\}$ is very close to being orthonormal:
$$\langle w^{n+1}_i,w^{n+1}_j\rangle=\langle w^{n+1}_i-v^n_i,w^{n+1}_j\rangle+\langle v^n_i,v^n_j\rangle+\langle v^n_i,w^{n+1}_j-v^n_j\rangle=\delta_{ij}+O(\theta^n),$$
because $\{v^n_i\}$ is an orthonormal system. It follows that for all $n$ large enough, the $w^{n+1}_i$ are linearly independent. A quick way to see this is to note that
$$\|(w^{n+1}_1)^*\wedge\dots\wedge(w^{n+1}_{d_r})^*\|^2=\det\big(\langle w^{n+1}_i,w^{n+1}_j\rangle\big)\ \text{(Lemma 2.2)}\ =\det(I+O(\theta^n))\neq0,\ \text{provided $n$ is large enough},$$
and to observe that the wedge product of a linearly dependent system vanishes.

It follows that $\{w^{n+1}_1,\dots,w^{n+1}_{d_r}\}$ is a linearly independent subset of $V^{n+1}_r$. Since $\dim(V^{n+1}_r)=\sum_{j\le r}|I_j|=d_r$, this is a basis for $V^{n+1}_r$.

Let $(v^{n+1}_i)$ be the orthonormal basis obtained by applying the Gram-Schmidt procedure to $(w^{n+1}_i)$. We claim that there is a global constant $C_2$ such that
$$\|v^n_i-v^{n+1}_i\|\le C_2\theta^n.\qquad(2.3)$$
Write $v_i=v^{n+1}_i$, $w_i=w^{n+1}_i$; then the Gram-Schmidt process is to set $v_i=u_i/\|u_i\|$, where the $u_i$ are defined by induction by $u_1:=w_1$, $u_i:=w_i-\sum_{j<i}\langle w_i,v_j\rangle v_j$. We construct by induction global constants $C^i_2$ s.t. $\|v_i-w_i\|\le C^i_2\theta^n$, and then take $C_2:=\max C^i_2$. When $i=1$, we can take $C^1_2:=C_1$, because $v_1=w_1$, and $\|w_1-v^n_1\|\le C_1\theta^n$. Suppose we have constructed $C^1_2,\dots,C^{i-1}_2$. Then
$$\|u_i-w_i\|\le\sum_{j<i}|\langle w_i,v_j\rangle|\le\sum_{j<i}\big(|\langle w_i,w_j\rangle|+\|w_j-v_j\|\big)\le\Big(2C_1(i-1)+\sum_{j<i}C^j_2\Big)\theta^n,$$
because $|\langle w_i,w_j\rangle|=|\langle w_i-v^n_i,w_j\rangle+\langle v^n_i,w_j-v^n_j\rangle+\langle v^n_i,v^n_j\rangle|\le2C_1\theta^n$ for $i\neq j$. Call the term in the brackets $K$, and assume $n$ is so large that $K\theta^n<1/2$; then $|\|u_i\|-1|\le\|u_i-w_i\|\le K\theta^n$, whence
$$\|v_i-w_i\|=\Big\|\frac{u_i-\|u_i\|w_i}{\|u_i\|}\Big\|\le\frac{\|u_i-w_i\|+|1-\|u_i\||}{\|u_i\|}\le4K\theta^n,$$
and we can take $C^i_2:=4K$. This proves (2.3).

Starting from the orthonormal basis $(v^n_i)$ for $V^n_r$, we have constructed an orthonormal basis $(v^{n+1}_i)$ for $V^{n+1}_r$ such that $\|v^n_i-v^{n+1}_i\|\le C_2\theta^n$. Continue this procedure by induction, and construct the orthonormal bases $(v^{n+k}_i)$ for $V^{n+k}_r$. By (2.3), these bases form Cauchy sequences: $v^{n+k}_i\to v_i$ as $k\to\infty$.

The limit vectors must also be orthonormal. Denote their span by $V_r$. The projection on $V_r$ takes the form
$$\sum_{i=1}^{d_r}\langle v_i,\cdot\rangle v_i=\lim_{k\to\infty}\sum_{i=1}^{d_r}\langle v^{n+k}_i,\cdot\rangle v^{n+k}_i=\lim_{k\to\infty}\mathrm{proj}_{V^{n+k}_r}.$$
Thus $V^{n+k}_r\to V_r$.
Part 3: We saw that $\mathrm{proj}_{U^n_i(x)}\to\mathrm{proj}_{U_i(x)}$ as $n\to\infty$, for some linear spaces $U_i(x)$. Set $\Lambda(x)\in GL(\mathbb R^d)$ to be the matrix representing
$$\Lambda(x)=\sum_{j=1}^{s(x)}e^{\chi_j(x)}\,\mathrm{proj}_{U_j(x)}.$$
Since the $U_i(x)$ are limits of the $U^n_i$, they are orthogonal, and they sum up to $\mathbb R^d$. It follows that $\Lambda$ is invertible, symmetric, and positive.

Let $W^n_i$ be the eigenspace of $t^n_i(x)$ for $\sqrt{A_n^tA_n}$; then for all $v\in\mathbb R^d$,
$$(A_n^tA_n)^{1/2n}v=\big(\sqrt{A_n^tA_n}\big)^{1/n}v=\sum_{i=1}^dt^n_i(x)^{1/n}\,\mathrm{proj}_{W^n_i}(v)=\sum_{j=1}^s\sum_{i\in I_j}t^n_i(x)^{1/n}\,\mathrm{proj}_{W^n_i}(v)$$
$$=\sum_{j=1}^se^{\chi_j(x)}\sum_{i\in I_j}\mathrm{proj}_{W^n_i}(v)+o(\|v\|),\quad\text{where $o(\|v\|)$ denotes a vector with norm $o(\|v\|)$,}$$
$$=\sum_{j=1}^se^{\chi_j(x)}\,\mathrm{proj}_{U^n_j}(v)+o(\|v\|)\xrightarrow[n\to\infty]{}\Lambda(x)v.$$
Thus $(A_n^tA_n)^{1/2n}\to\Lambda$.
We show that $\frac1n\log\|(A_n\Lambda^{-n})^{\pm1}\|\to0$. It's enough to show that
$$\lim_{n\to\infty}\frac1n\log\|A_nv\|=\chi_r:=\log\lambda_r\quad\text{uniformly on the unit ball in }U_r.\qquad(2.4)$$
To see that this is enough, note that $\Lambda^{-n}v=\sum_{r=1}^se^{-n\chi_r}(v|U_r)$; for all $\varepsilon>0$, if $n$ is large enough, then
$$\|A_n\Lambda^{-n}v\|\le\sum_{r=1}^se^{-n\chi_r}\|A_n(v|U_r)\|\le\sum_{r=1}^se^{-n\chi_r}e^{n(\chi_r+\varepsilon)}\|v\|\le se^{n\varepsilon}\|v\|\qquad(v\in\mathbb R^d),$$
$$\|A_n\Lambda^{-n}v\|=e^{-n\chi_r}\|A_nv\|=e^{\pm n\varepsilon}\|v\|\qquad(v\in U_r).$$
Thus $\|A_n\Lambda^{-n}\|\le se^{n\varepsilon}$ for all $n$ large enough, whence $\frac1n\log\|A_n\Lambda^{-n}\|\to0$ a.e.
To see that $\frac1n\log\|(A_n\Lambda^{-n})^{-1}\|\to0$, we use a duality trick.

Define, for a matrix $C$, $C^\#:=(C^{-1})^t$; then $(C_1C_2)^\#=C_1^\#C_2^\#$. Thus $(A^\#)_n=(A_n)^\#$, and
$$B^\#_n:=\sqrt{(A^\#)_n^t(A^\#)_n}=\big(\sqrt{A_n^tA_n}\big)^\#=\big(\sqrt{A_n^tA_n}\big)^{-1}.$$
Thus we have the following relations between the objects associated to $A^\#$ and $A$:
1. the eigenvalues of $B^\#_n$ are $1/t^n_d\le\dots\le1/t^n_1$ (the order is flipped);
2. the eigenspace of $1/t^n_i$ for $B^\#_n$ is the eigenspace of $t^n_i$ for $B_n$;
3. $\chi^\#_j=-\chi_{s-j+1}$;
4. $(U^n_j)^\#=U^n_{s-j+1}$, $(V^n_r)^\#=\overline V^n_{s-r+1}$, $(\overline V^n_r)^\#=V^n_{s-r+1}$;
5. $\Lambda^\#=\Lambda^{-1}$.
Thus $\|\Lambda^nA_n^{-1}\|=\|(\Lambda^nA_n^{-1})^t\|=\|A^\#_n(\Lambda^\#)^{-n}\|$, so the claim $\frac1n\log\|\Lambda^nA_n^{-1}\|\to0$ a.e. follows from what we did above, applied to $A^\#$.
Here is another consequence of this duality: there exist $K^\#_1,\dots,K^\#_s$ s.t. for all $\varepsilon$, there is an $N$ s.t. for all $n>N$, if $u\in U^n_r$, then for all $k$,
$$\|u|V^{n+k}_{r-t}\|\le K^\#_t\exp[-n(\chi_r-\chi_{r-t}-t\varepsilon)].\qquad(2.5)$$
To see this, note that $V^{n+k}_{r-t}=(\overline V^{n+k}_{s-r+t+1})^\#$ and $U^n_r\subset(V^n_{s-r+1})^\#$, and apply the technical lemma to the cocycle generated by $A^\#$.
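The algebraic facts behind the duality trick can be checked directly. The Python sketch below is an illustration verifying that $C\mapsto C^\#$ is multiplicative and that it inverts and flips the singular values:

```python
import numpy as np

def sharp(C):
    """C^# := (C^{-1})^t."""
    return np.linalg.inv(C).T

rng = np.random.default_rng(5)
C1 = rng.normal(size=(3, 3)) + 2.0 * np.eye(3)
C2 = rng.normal(size=(3, 3)) + 2.0 * np.eye(3)
# (C1 C2)^# = C1^# C2^#, so the #-images of a cocycle form the cocycle of A^#
assert np.allclose(sharp(C1 @ C2), sharp(C1) @ sharp(C2))
# singular values of C^# are the reciprocals of those of C, in flipped order
s = np.linalg.svd(C1, compute_uv=False)
s_sharp = np.linalg.svd(sharp(C1), compute_uv=False)
assert np.allclose(s_sharp, 1.0 / s[::-1])
```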
We prove (2.4). Fix $\varepsilon>0$ and $N$ large (we will see how large later), and assume $n>N$. Suppose $v\in U_r$ and $\|v\|=1$. Write $v=\lim_{k\to\infty}v_{n+k}$ with $v_{n+k}:=v|U^{n+k}_r\in U^{n+k}_r$. Note that $\|v_{n+k}\|\le1$. We decompose $v_{n+k}$ as follows:
$$v_{n+k}=\big(v_{n+k}|V^n_{r-1}\big)+\big(v_{n+k}|U^n_r\big)+\sum_{t=1}^{s-r}\big(v_{n+k}|U^n_{r+t}\big),$$
and estimate the size of the image of each of the summands under $A_n$.

First summand:
$$\|A_n(v_{n+k}|V^n_{r-1})\|^2=\big\langle B_n^2(v_{n+k}|V^n_{r-1}),(v_{n+k}|V^n_{r-1})\big\rangle\le e^{2n(\chi_{r-1}+o(1))}\|v_{n+k}|V^n_{r-1}\|^2\le e^{2n(\chi_{r-1}+o(1))}.$$
Thus the image of the first summand is less than $\exp[n(\chi_{r-1}+o(1))]$.

Second summand:
$$\|A_n(v_{n+k}|U^n_r)\|^2=\big\langle B_n^2(v_{n+k}|U^n_r),(v_{n+k}|U^n_r)\big\rangle=e^{2n(\chi_r+o(1))}\|v_{n+k}|U^n_r\|^2$$
$$\ge e^{2n(\chi_r+o(1))}\big(\|v|U^n_r\|-\|(v_{n+k}-v)|U^n_r\|\big)^2\ge e^{2n(\chi_r+o(1))}\big(\|v|U^n_r\|-\|v_{n+k}-v\|\big)^2=e^{2n(\chi_r+o(1))}[1+o(1)]\ \text{uniformly in }v.$$
Thus the image of the second summand is $[1+o(1)]\exp[n(\chi_r+o(1))]$, uniformly in $v\in U_r$, $\|v\|=1$.

Third summand: For every $t$,
$$\|A_n(v_{n+k}|U^n_{r+t})\|^2=\big\langle B_n^2(v_{n+k}|U^n_{r+t}),(v_{n+k}|U^n_{r+t})\big\rangle\le e^{2n(\chi_{r+t}+o(1))}\|v_{n+k}|U^n_{r+t}\|^2$$
$$\le e^{2n(\chi_{r+t}+o(1))}\Big(\sup_{u\in U^n_{r+t},\|u\|=1}\langle v_{n+k},u\rangle\Big)^2,\quad\text{because }\|x|W\|=\sup_{w\in W,\|w\|=1}\langle x,w\rangle,$$
$$\le e^{2n(\chi_{r+t}+o(1))}\Big(\sup_{u\in U^n_{r+t},\|u\|=1}\ \sup_{v'\in V^{n+k}_r,\|v'\|\le1}\langle v',u\rangle\Big)^2=e^{2n(\chi_{r+t}+o(1))}\sup_{u\in U^n_{r+t},\|u\|=1}\|u|V^{n+k}_r\|^2$$
$$\le(K^\#_t)^2\,e^{2n(\chi_{r+t}+o(1))}\exp[-2n(\chi_{r+t}-\chi_r-o(1))]\quad\text{by (2.5)}\ =\ (K^\#_t)^2e^{2n(\chi_r+o(1))}.$$
Note the cancellation of $\chi_{r+t}$; this is the essence of the technical lemma. We get $\|A_n(v_{n+k}|U^n_{r+t})\|=O(\exp[n(\chi_r+o(1))])$. Summing over $t=1,\dots,s-r$, we get that the image of the third summand is $O(\exp[n(\chi_r+o(1))])$.

Putting these estimates together, we get that
$$\|A_nv_{n+k}\|\le\mathrm{const}\cdot\exp[n(\chi_r+o(1))]\quad\text{uniformly in $k$, and on the unit ball in }U_r.$$
Uniformity means that the $o(1)$ can be made independent of $v$ and $k$. It allows us to pass to the limit as $k\to\infty$ and obtain
$$\|A_nv\|\le\mathrm{const}\cdot\exp[n(\chi_r+o(1))]\quad\text{uniformly on the unit ball in }U_r.$$
On the other hand, an orthogonality argument shows that
$$\|A_nv_{n+k}\|^2=\langle B_n^2v_{n+k},v_{n+k}\rangle=\|A_n(v_{n+k}|V^n_{r-1})\|^2+\|A_n(v_{n+k}|U^n_r)\|^2+\sum_{t=1}^{s-r}\|A_n(v_{n+k}|U^n_{r+t})\|^2$$
$$\ge\|A_n(v_{n+k}|U^n_r)\|^2=[1+o(1)]\exp[2n(\chi_r+o(1))].$$
Thus $\|A_nv_{n+k}\|\ge[1+o(1)]\exp[n(\chi_r+o(1))]$ uniformly in $v,k$. Passing to the limit as $k\to\infty$, we get $\|A_nv\|\ge\mathrm{const}\cdot\exp[n(\chi_r+o(1))]$ uniformly on the unit ball in $U_r$. These estimates imply (2.4).
Proof of the technical lemma: We are asked to estimate the norm of the projection of a vector in $V^n_r$ on $\overline V^{n+k}_{r+t}$. We do this in three steps:
1. $V^n_r\to\overline V^{n+1}_{r+t}$, all $t>0$;
2. $V^n_r\to\overline V^{n+k}_{r+1}$, all $k>0$;
3. $V^n_r\to\overline V^{n+k}_{r+t}$, all $t,k>0$.

Step 1. The technical lemma for $k=1$: Fix $\delta>0$; then for all $n$ large enough and for all $r'>r$, if $u\in V^n_r$, then $\|u|\overline V^{n+1}_{r'}\|\le\|u\|\exp(-n(\chi_{r'}-\chi_r-\delta))$.

Proof: Fix $\varepsilon$, and choose $N=N(\varepsilon)$ so large that $t^n_i=e^{n(\log t_i\pm\varepsilon)}$ for all $n>N$, $i=1,\dots,d$. For every $t=1,\dots,s$, if $u\in V^n_r$, then
$$\|A_{n+1}u\|=\sqrt{\langle A^t_{n+1}A_{n+1}u,u\rangle}=\sqrt{\big\langle A^t_{n+1}A_{n+1}(u|\overline V^{n+1}_{r+t}),(u|\overline V^{n+1}_{r+t})\big\rangle+\big\langle A^t_{n+1}A_{n+1}(u|V^{n+1}_{r+t-1}),(u|V^{n+1}_{r+t-1})\big\rangle}$$
(because $V^{n+1}_{r+t-1}$ and $\overline V^{n+1}_{r+t}$ are orthogonal, $A^t_{n+1}A_{n+1}$-invariant, and $\mathbb R^d=V^{n+1}_{r+t-1}\oplus\overline V^{n+1}_{r+t}$)
$$=\sqrt{\|A_{n+1}(u|\overline V^{n+1}_{r+t})\|^2+\|A_{n+1}(u|V^{n+1}_{r+t-1})\|^2}\ \ge\ \|A_{n+1}(u|\overline V^{n+1}_{r+t})\|\ \ge\ e^{(\chi_{r+t}-\varepsilon)(n+1)}\|u|\overline V^{n+1}_{r+t}\|.$$
On the other hand,
$$\|A_{n+1}u\|=\|A(T^nx)A_n(x)u\|\le\|A(T^nx)\|\sqrt{\langle A^t_nA_nu,u\rangle}\le\|A(T^nx)\|\,e^{n(\chi_r+\varepsilon)}\|u\|=e^{n(\chi_r+\varepsilon)+o(n)}\|u\|,$$
because by the ergodic theorem
$$\frac1n\log\|A(T^nx)\|=\frac1n\sum_{k=0}^n\log\|A(T^kx)\|-\frac1n\sum_{k=0}^{n-1}\log\|A(T^kx)\|\xrightarrow[n\to\infty]{}0\quad\text{a.e.}$$
By further increasing $N$, we can arrange $|o(n)|<\varepsilon n$, which gives
$$e^{(\chi_{r+t}-\varepsilon)(n+1)}\|u|\overline V^{n+1}_{r+t}\|\le e^{n(\chi_r+2\varepsilon)}\|u\|,$$
whence $\|u|\overline V^{n+1}_{r+t}\|\le\|u\|e^{-n(\chi_{r+t}-\chi_r-3\varepsilon)}$. Now take $\varepsilon:=\delta/3$.
Step 2. Fix $\delta>0$. Then for all $n$ large enough and for all $k$, if $u\in V^n_r$, then $\|u|\overline V^{n+k}_{r+1}\|\le\|u\|\sum_{j=0}^{k-1}\exp(-(n+j)(\chi_{r+1}-\chi_r-\delta))$. Thus there exists $K_1$ s.t.
$$\|u|\overline V^{n+k}_{r+1}\|\le K_1\|u\|\exp[-n(\chi_{r+1}-\chi_r-\delta)].$$
Proof: We use induction on $k$. The case $k=1$ is dealt with in Step 1. We assume by induction that the statement holds for $k-1$, and prove it for $k$. Decompose
$$u|\overline V^{n+k}_{r+1}=\big[(u|V^{n+k-1}_r)\big|\overline V^{n+k}_{r+1}\big]+\big[(u|\overline V^{n+k-1}_{r+1})\big|\overline V^{n+k}_{r+1}\big].$$
First summand: $u|V^{n+k-1}_r\in V^{n+k-1}_r$, so by Step 1 the norm of the first summand is less than $\|u|V^{n+k-1}_r\|\exp[-(n+k-1)(\chi_{r+1}-\chi_r-\delta)]$, whence less than $\|u\|\exp[-(n+k-1)(\chi_{r+1}-\chi_r-\delta)]$.

Second summand: the norm is at most $\|u|\overline V^{n+k-1}_{r+1}\|$. By the induction hypothesis, this is less than $\|u\|\sum_{j=0}^{k-2}\exp(-(n+j)(\chi_{r+1}-\chi_r-\delta))$.

We get the statement for $k$, and Step 2 follows by induction.

As a result, we obtain the existence of a constant $K_1>1$ for which $u\in V^n_r$ implies $\|u|\overline V^{n+k}_{r+1}\|\le K_1\|u\|\exp(-n(\chi_{r+1}-\chi_r-\delta))$.
Step 3. There exist $K_1,\dots,K_{s-1}>1$ s.t. for all $n$ large enough and for all $k$, $u\in V^n_r$ implies
$$\|u|\overline V^{n+k}_{r+\ell}\|\le K_\ell\|u\|\exp(-n(\chi_{r+\ell}-\chi_r-\ell\delta))\qquad(\ell=1,\dots,s-r).$$
Proof: We saw that $K_1$ exists. We assume by induction that $K_1,\dots,K_{t-1}$ exist, and construct $K_t$. Fix $0<\varepsilon_0<\min_j(\chi_{j+1}-\chi_j)$; the idea is to first prove that if $u\in V^n_r$, then
$$\|u|\overline V^{n+k}_{r+t}\|\le\|u\|\Big(\prod_{\ell=1}^{t-1}K_\ell\Big)\Big(\sum_{j=0}^{k-1}e^{-\varepsilon_0j}\Big)\Big(\sum_{j=0}^{k-1}\exp[-(n+j)(\chi_{r+t}-\chi_r-t\delta)]\Big).\qquad(2.6)$$
Once this is done, Step 3 follows with $K_t:=\big(\prod_{\ell=1}^{t-1}K_\ell\big)\big(\sum_{j\ge0}e^{-\varepsilon_0j}\big)^2$.

We prove (2.6) using induction on $k$. When $k=1$ this is because of Step 1. Suppose, by induction, that (2.6) holds for $k-1$. Decompose:
$$u|\overline V^{n+k}_{r+t}=\underbrace{(u|V^{n+k-1}_r)\big|\overline V^{n+k}_{r+t}}_{\mathcal A}+\underbrace{\sum_{r<r'<r+t}(u|U^{n+k-1}_{r'})\big|\overline V^{n+k}_{r+t}}_{\mathcal B}+\underbrace{(u|\overline V^{n+k-1}_{r+t})\big|\overline V^{n+k}_{r+t}}_{\mathcal C}$$
Estimate of $\|\mathcal A\|$: by Step 1, $\|\mathcal A\|\le\|u\|\exp(-(n+k-1)(\chi_{r+t}-\chi_r-\delta))$.

Estimate of $\|\mathcal B\|$: by Step 1, and the induction hypothesis (on $t$),
$$\|\mathcal B\|\le\sum_{r<r'<r+t}\|u|U^{n+k-1}_{r'}\|\exp(-(n+k-1)(\chi_{r+t}-\chi_{r'}-\delta))$$
$$\le\sum_{r<r'<r+t}\|u|\overline V^{n+k-1}_{r'}\|\exp(-(n+k-1)(\chi_{r+t}-\chi_{r'}-\delta))$$
$$\le\sum_{r<r'<r+t}K_{r'-r}\|u\|\exp\big(-n(\chi_{r'}-\chi_r-(r'-r)\delta)\big)\exp\big(-(n+k-1)(\chi_{r+t}-\chi_{r'}-\delta)\big)$$
$$\le\|u\|\Big(\prod_{t'=1}^{t-1}K_{t'}\Big)e^{-(k-1)(\chi_{r+t}-\chi_{r'}-\delta)}\exp(-n(\chi_{r+t}-\chi_r-t\delta))\le\|u\|\Big(\prod_{t'=1}^{t-1}K_{t'}\Big)e^{-\varepsilon_0(k-1)}\exp(-n(\chi_{r+t}-\chi_r-t\delta)).$$
Estimate of $\|\mathcal C\|$: $\|\mathcal C\|\le\|u|\overline V^{n+k-1}_{r+t}\|$. By the induction hypothesis on $k$,
$$\|\mathcal C\|\le\|u\|\Big(\prod_{t'=1}^{t-1}K_{t'}\Big)\Big(\sum_{j=0}^{k-2}e^{-\varepsilon_0j}\Big)\Big(\sum_{j=0}^{k-2}\exp[-(n+j)(\chi_{r+t}-\chi_r-t\delta)]\Big).$$
It is not difficult to see that when we add these bounds for $\|\mathcal C\|$, $\|\mathcal B\|$ and $\|\mathcal A\|$, the result is smaller than the RHS of (2.6) for $k$. This completes the proof by induction of (2.6). As explained above, Step 3 follows by induction. $\square$
Corollary 2.3. Let $\chi_1(x)<\dots<\chi_{s(x)}(x)$ denote the logarithms of the (different) eigenvalues of $\Lambda(x)$. Let $U_{\chi_i}$ be the eigenspace of $\Lambda(x)$ corresponding to $\exp\chi_i$. Set $V_{\chi_i}:=\bigoplus_{j\le i}U_{\chi_j}$.
1. $\chi(x,v):=\lim_{n\to\infty}\frac1n\log\|A_n(x)v\|$ exists a.s., and is invariant.
2. $\chi(x,v)=\chi_i$ on $V_{\chi_i}\setminus V_{\chi_{i-1}}$.
3. If $\|A^{-1}\|,\|A\|\in L^\infty$, then $\frac1n\log|\det A_n(x)|\to\sum_ik_i\chi_i$, where $k_i=\dim U_{\chi_i}$.

The $\chi_i(x)$ are called the Lyapunov exponents of $x$. $\{V_{\chi_i}\}$ is called the Lyapunov filtration of $x$. Property (2) implies that $V_\chi$ is $A$-invariant: $A(x)V_\chi(x)=V_\chi(Tx)$. Property (3) is sometimes called "regularity".

Remark: $V_{\chi_i}\setminus V_{\chi_{i-1}}$ is $A$-invariant, but if $A(x)$ is not orthogonal, then $U_{\chi_i}$ doesn't need to be $A$-invariant. When $T$ is invertible, there is a way of writing $V_{\chi_i}=\bigoplus_{j\le i}H_j$ so that $A(x)H_j(x)=H_j(Tx)$ and $\chi(x,\cdot)=\chi_j$ on $H_j(x)$; see the next section.
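In practice Lyapunov exponents are estimated with a QR-based recursion rather than by diagonalizing $A_n^tA_n$. The Python sketch below is an illustration (the i.i.d. cocycle is invented): it estimates the exponents and checks the regularity identity of Corollary 2.3(3), which in this scheme holds exactly at every $n$ because $|\det Q|=1$ in each QR step:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 2000
Q = np.eye(d)
log_diag = np.zeros(d)
log_det = 0.0
for _ in range(n):
    M = rng.normal(size=(d, d))             # A(T^k x): an i.i.d. GL(d) cocycle
    log_det += np.log(abs(np.linalg.det(M)))
    Q, R = np.linalg.qr(M @ Q)              # re-orthonormalize the frame
    log_diag += np.log(np.abs(np.diag(R)))  # accumulate the growth rates
chi = np.sort(log_diag / n)                 # Lyapunov exponents, with multiplicity
# regularity: the sum of the exponents equals (1/n) log|det A_n|
assert abs(chi.sum() - log_det / n) < 1e-8
```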
2.5.3 The Multiplicative Ergodic Theorem for Invertible Cocycles
Suppose $A:X\to GL(d,\mathbb R)$. There is a unique extension of the definition of $A_n(x)$ to non-positive $n$'s which preserves the cocycle identity: $A_0:=\mathrm{id}$, $A_{-n}:=(A_n\circ T^{-n})^{-1}$. (Start from $A_{n-n}=A_0=\mathrm{id}$ and use the cocycle identity.)

The following theorem establishes a compatibility between the Lyapunov spectra and filtrations of $A_n$ and $A_{-n}$.
Theorem 2.10. Let $(X,\mathcal B,m,T)$ be an invertible probability preserving transformation, and $A:X\to GL(d,\mathbb R)$ a Borel function s.t. $\ln\|A(x)^{\pm1}\|\in L^1(m)$. There are invariant Borel functions $p(x)$, $\chi_1(x)<\dots<\chi_{p(x)}(x)$, and a splitting $\mathbb R^d=\bigoplus_{i=1}^{p(x)}H_i(x)$ s.t.
1. $A_n(x)H_i(x)=H_i(T^nx)$ for all $n\in\mathbb Z$;
2. $\lim_{n\to\pm\infty}\frac1{|n|}\log\|A_n(x)v\|=\pm\chi_i(x)$ on the unit sphere of $H_i(x)$;
3. $\frac1n\log\sin\angle(H_i(T^nx),H_j(T^nx))\xrightarrow[n\to\pm\infty]{}0$ for $i\neq j$.
Proof. Fix $x$, and let $t^n_1\le\dots\le t^n_d$ and $\tilde t^n_1\le\dots\le\tilde t^n_d$ be the eigenvalues of $(A_n^tA_n)^{1/2}$ and $(A_{-n}^tA_{-n})^{1/2}$. Let $t_i:=\lim(t^n_i)^{1/n}$, $\tilde t_i:=\lim(\tilde t^n_i)^{1/n}$. These limits exist almost surely, and $\{\log t_i\}$, $\{\log\tilde t_i\}$ are lists of the Lyapunov exponents of $\{A_n\}$ and $\{A_{-n}\}$, repeated with multiplicity. The proof of the Oseledets theorem shows that
$$\sum_{k=d-i+1}^d\log\tilde t_k\ \xleftarrow[n\to\infty]{}\ \frac1n\log\|A_{-n}^{\wedge i}\|=\frac1n\log\Big(|\det A_{-n}|\cdot\big\|\big((A_{-n})^{-1}\big)^{\wedge(d-i)}\big\|\Big)\quad\text{(write using eigenvalues)}$$
$$=\frac1n\log\Big(|\det(A_n\circ T^{-n})|^{-1}\cdot\big\|A_n^{\wedge(d-i)}\circ T^{-n}\big\|\Big),$$
and this has the same a.s. limit as $\frac1n\log\big(|\det A_n|^{-1}\|A_n^{\wedge(d-i)}\|\big)$ (remark after Kingman's theorem), which equals
$$-\sum_{k=1}^d\log t_k+\sum_{k=i+1}^d\log t_k=-\sum_{k=1}^i\log t_k.$$
Since this is true for all $i$, $\log\tilde t_i=-\log t_{d-i+1}$.

It follows that if the Lyapunov exponents of $\{A_n\}$ are $\chi_1<\dots<\chi_s$, then the Lyapunov exponents of $\{A_{-n}\}$ are $-\chi_s<\dots<-\chi_1$.
Let $V_1(x)\subset V_2(x)\subset\dots\subset V_s(x)$ be the Lyapunov filtration of $\{A_n\}$:
$$V_i(x):=\big\{v:\lim_{n\to\infty}\tfrac1n\log\|A_n(x)v\|\le\chi_i(x)\big\}.$$
Let $\tilde V_1(x)\supset\tilde V_2(x)\supset\dots\supset\tilde V_s(x)$ be the following decreasing filtration, given by
$$\tilde V_i(x):=\big\{v:\lim_{n\to\infty}\tfrac1n\log\|A_{-n}(x)v\|\le-\chi_i(x)\big\}.$$
These filtrations are invariant: $A(x)V_i(x)=V_i(Tx)$, $A(x)\tilde V_i(x)=\tilde V_i(Tx)$.

Set $H_i(x):=V_i(x)\cap\tilde V_i(x)$. We must have $A(x)H_i(x)=H_i(Tx)$.
We claim that $\mathbb R^d=\bigoplus H_i(x)$ almost surely. It is enough to show that for a.e. $x$, $\mathbb R^d=V_i(x)\oplus\tilde V_{i+1}(x)$, because
$$\mathbb R^d=\tilde V_1=(V_1\cap\tilde V_1)\oplus(\tilde V_1\cap\tilde V_2)\qquad(\because V_1\oplus\tilde V_2=\mathbb R^d)$$
$$=H_1\oplus(\tilde V_1\cap\tilde V_2)=H_1\oplus\tilde V_2\qquad(\because\tilde V_1\supset\tilde V_2)$$
$$=H_1\oplus\big[(V_2\cap\tilde V_2)\oplus(\tilde V_2\cap\tilde V_3)\big]\qquad(\because V_2\oplus\tilde V_3=\mathbb R^d)$$
$$=H_1\oplus H_2\oplus\tilde V_3=\dots=H_1\oplus\dots\oplus H_s.$$
Since the spectra of $\{A_n\}$, $\{A_{-n}\}$ agree with matching multiplicities, $\dim V_i+\dim\tilde V_{i+1}=d$. Thus it is enough to show that $E:=\{x:V_i(x)\cap\tilde V_{i+1}(x)\neq\{0\}\}$ has zero measure for all $i$.

Assume otherwise; then by the Poincaré recurrence theorem, for almost every $x\in E$ there is a sequence $n_k\uparrow\infty$ for which $T^{n_k}(x)\in E$. By the Oseledets theorem, for every $\varepsilon>0$, there is $N_\varepsilon(x)$ such that for all $n>N_\varepsilon(x)$,
$$\|A_n(x)u\|\le\|u\|\exp[n(\chi_i+\varepsilon)]\quad\text{for all }u\in V_i\cap\tilde V_{i+1},\qquad(2.7)$$
$$\|A_{-n}(x)u\|\le\|u\|\exp[-n(\chi_{i+1}-\varepsilon)]\quad\text{for all }u\in V_i\cap\tilde V_{i+1}.\qquad(2.8)$$
If $n_k>N_\varepsilon(x)$, then $A_{n_k}(x)u\in V_i(T^{n_k}x)\cap\tilde V_{i+1}(T^{n_k}x)$ and $T^{n_k}(x)\in E$, so
$$\|u\|=\|A_{-n_k}(T^{n_k}x)A_{n_k}(x)u\|\le\|A_{n_k}(x)u\|\exp[-n_k(\chi_{i+1}-\varepsilon)],$$
whence $\|A_{n_k}(x)u\|\ge\|u\|\exp[n_k(\chi_{i+1}-\varepsilon)]$. By (2.7),
$$\exp[n_k(\chi_{i+1}-\varepsilon)]\le\exp[n_k(\chi_i+\varepsilon)],$$
whence $|\chi_{i+1}-\chi_i|<2\varepsilon$. But $\varepsilon$ was arbitrary, and could be chosen to be much smaller than the gaps between the Lyapunov exponents. With this choice, we get a contradiction which shows that $m(E)=0$.
Thus $\mathbb R^d=\bigoplus H_i(x)$. Evidently, $V_i\supset\bigoplus_{j\le i}H_j$ and $V_i\cap\bigoplus_{j>i}H_j\subset V_i\cap\tilde V_{i+1}=\{0\}$, so $V_i=\bigoplus_{j\le i}H_j$. In the same way, $\tilde V_i=\bigoplus_{j\ge i}H_j$. It follows that $H_i\setminus\{0\}\subset(V_i\setminus V_{i-1})\cap(\tilde V_i\setminus\tilde V_{i+1})$. Thus $\lim_{n\to\pm\infty}\frac1{|n|}\log\|A_nv\|=\pm\chi_i$ on the unit sphere of $H_i$.
Next we study the angle between $H_i(x)$ and $\hat H_i(x):=\bigoplus_{j\neq i}H_j(x)$. Pick a basis $(v^i_1,\dots,v^i_{m_i})$ for $H_i(x)$, and a basis $(w^i_1,\dots,w^i_{d-m_i})$ for $\hat H_i(x)$. Since $A_n(x)$ is invertible, $A_n(x)$ maps $(v^i_1,\dots,v^i_{m_i})$ onto a basis of $H_i(T^nx)$, and $(w^i_1,\dots,w^i_{d-m_i})$ onto a basis of $\hat H_i(T^nx)$. Thus if $v:=\bigwedge_j(v^i_j)^*$ and $w:=\bigwedge_j(w^i_j)^*$, then by Proposition 2.5,
$$|\sin\angle(H_i(T^nx),\hat H_i(T^nx))|\le\frac{\|A_n(x)^{\wedge d}(v\wedge w)\|}{\|A_n(x)^{\wedge m_i}v\|\cdot\|A_n(x)^{\wedge(d-m_i)}w\|}.$$
We view $A_n^{\wedge p}$ as an invertible matrix acting on $\mathrm{span}\{e^*_{i_1}\wedge\dots\wedge e^*_{i_p}:i_1<\dots<i_p\}$ via $e^*_{i_1}\wedge\dots\wedge e^*_{i_p}\mapsto(A_n(x)e_{i_1})^*\wedge\dots\wedge(A_n(x)e_{i_p})^*$. It is clear that
$$\Lambda_p(x):=\lim_{n\to\infty}\big((A_n^{\wedge p})^t(A_n^{\wedge p})\big)^{1/2n}=\Big(\lim_{n\to\infty}(A_n^tA_n)^{1/2n}\Big)^{\wedge p}=\Lambda(x)^{\wedge p},$$
thus the eigenspaces of $\Lambda_p(x)$ are the wedge products of the eigenspaces of $\Lambda(x)$. This determines the Lyapunov filtration of $A_n(x)^{\wedge p}$, and implies by the Oseledets theorem that if $v_j\in V_{\chi_{k(j)}}\setminus V_{\chi_{k(j)-1}}$, and $v_1,\dots,v_p$ are linearly independent, then
$$\lim_{n\to\infty}\frac1n\log\|A_n(x)^{\wedge p}\xi\|=\sum_{j=1}^p\chi_{k(j)},\quad\text{for }\xi:=v^*_1\wedge\dots\wedge v^*_p.$$
Applying this to the numerator and the denominator above, the exponents cancel, and it follows that $\lim\frac1n\log|\sin\angle(H_i(T^nx),\hat H_i(T^nx))|\ge0$; since the sine is at most one, the limit is zero. $\square$
Problems
2.1. The Mean Ergodic Theorem for Contractions
Suppose $H$ is a Hilbert space, and $U:H\to H$ is a bounded linear operator such that $\|U\|\le1$. Prove that $\frac1N\sum_{k=0}^{N-1}U^kf$ converges in norm for all $f\in H$, and the limit is the projection of $f$ on the space $\{f:Uf=f\}$.
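A numerical illustration of Problem 2.1 (the operator below is invented for the demo): for a contraction which is a block sum of the identity on its fixed space and strictly contracting blocks, the Cesàro averages converge to the fixed-space projection.

```python
import numpy as np

# U = diag(1, 0.9 * rotation, -0.5): operator norm 1, fixed space = first coordinate
U = np.zeros((4, 4))
U[0, 0] = 1.0
c, s = np.cos(1.0), np.sin(1.0)
U[1:3, 1:3] = 0.9 * np.array([[c, -s], [s, c]])
U[3, 3] = -0.5

f = np.array([2.0, -1.0, 0.5, 3.0])
N = 5000
avg = np.zeros(4)
Ukf = f.copy()
for _ in range(N):              # (1/N) sum_{k=0}^{N-1} U^k f
    avg += Ukf
    Ukf = U @ Ukf
avg /= N

proj = np.array([f[0], 0.0, 0.0, 0.0])   # projection of f on {g : Ug = g}
assert np.linalg.norm(avg - proj) < 1e-2
```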
2.2. Ergodicity as a mixing property
Prove that a ppt $(X,\mathcal B,\mu,T)$ is ergodic iff for every $A,B\in\mathcal B$,
$$\frac1N\sum_{k=0}^{N-1}\mu(A\cap T^{-k}B)\xrightarrow[N\to\infty]{}\mu(A)\mu(B).$$
2.3. Use the pointwise ergodic theorem to show that any two different ergodic invariant probability measures for the same transformation are mutually singular.
2.4. Ergodicity and extremality
An invariant probability measure $\mu$ is called extremal if it cannot be written in the form $\mu=t\mu_1+(1-t)\mu_2$, where $\mu_1,\mu_2$ are different invariant probability measures and $0<t<1$. Prove that an invariant measure is extremal iff it is ergodic, using the following steps.
1. Show that if $E$ is a $T$-invariant set of non-zero measure, then $\mu(\cdot|E)$ is a $T$-invariant measure. Deduce that if $\mu$ is not ergodic, then it is not extremal.
2. Show that if $\mu$ is ergodic, and $\mu=t\mu_1+(1-t)\mu_2$ where the $\mu_i$ are invariant and $0<t<1$, then
a. for every $E\in\mathcal B$, $\frac1N\sum_{k=0}^{N-1}1_E\circ T^k\xrightarrow[N\to\infty]{}\mu(E)$ $\mu_i$-a.e. $(i=1,2)$;
b. conclude that $\mu_i(E)=\mu(E)$ for all $E\in\mathcal B$ $(i=1,2)$. This shows that ergodicity implies extremality.
2.5. Prove that the Bernoulli $(\frac12,\frac12)$-measure is the invariant probability measure for the adding machine (Problem 1.10), by showing that all cylinders of length $n$ must have the same measure as $[0^n]$. Deduce from the previous problem that the adding machine is ergodic.
2.6. Explain why, when $f\in L^2$, $\mathbb E(f|\mathcal F)$ is the projection of $f$ on $L^2(\mathcal F)$. Prove:
1. If $\mathcal F=\{\varnothing,A,X\setminus A,X\}$, then $\mathbb E(1_B|\mathcal F)=\mu(B|A)1_A+\mu(B|A^c)1_{A^c}$;
2. If $\mathcal F=\{\varnothing,X\}$, then $\mathbb E(f|\mathcal F)=\int f\,d\mu$;
3. If $X=[-1,1]$ with Lebesgue measure, and $\mathcal F=\{A:A\text{ is Borel and }A=-A\}$, then $\mathbb E(f|\mathcal F)=\frac12[f(x)+f(-x)]$.
2.7. Prove:
1. $f\mapsto\mathbb E(f|\mathcal F)$ is linear, and a contraction in the $L^1$ metric;
2. $f\ge0\ \Rightarrow\ \mathbb E(f|\mathcal F)\ge0$ a.e.;
3. if $\varphi$ is convex, then $\mathbb E(\varphi\circ f|\mathcal F)\ge\varphi(\mathbb E(f|\mathcal F))$;
4. if $h$ is $\mathcal F$-measurable, then $\mathbb E(hf|\mathcal F)=h\,\mathbb E(f|\mathcal F)$;
5. if $\mathcal F_1\subset\mathcal F_2$, then $\mathbb E[\mathbb E(f|\mathcal F_2)|\mathcal F_1]=\mathbb E(f|\mathcal F_1)$.
2.8. The Martingale Convergence Theorem (Doob)
Suppose $(X,\mathcal B,\mu)$ is a probability space, and $\mathcal F_1\subset\mathcal F_2\subset\dots$ are $\sigma$-algebras, all of which are contained in $\mathcal B$. Let $\mathcal F:=\sigma(\bigcup_{n\ge1}\mathcal F_n)$ (the smallest $\sigma$-algebra containing the union). If $f\in L^1$, then $\mathbb E(f|\mathcal F_n)\xrightarrow[n\to\infty]{}\mathbb E(f|\mathcal F)$ a.e. and in $L^1$.

Prove this theorem, using the following steps (W. Parry). It is enough to consider non-negative $f\in L^1$.
1. Prove that $\mathbb E(f|\mathcal F_n)\xrightarrow{L^1}\mathbb E(f|\mathcal F)$ using the following observations:
a. the convergence holds for all elements of $\bigcup_{n\ge1}L^1(X,\mathcal F_n,\mu)$;
b. $\bigcup_{n\ge1}L^1(X,\mathcal F_n,\mu)$ is dense in $L^1(X,\mathcal F,\mu)$.
2. Set $E_a:=\{x:\max_{1\le n\le N}\mathbb E(f|\mathcal F_n)(x)>a\}$. Show that $\mu(E_a)\le\frac1a\int f\,d\mu$. (Hint: $E_a=\biguplus_{n\ge1}\{x:\mathbb E(f|\mathcal F_n)(x)>a$, and $\mathbb E(f|\mathcal F_k)(x)\le a$ for $k=1,\dots,n-1\}$.)
3. Prove that $\mathbb E(f|\mathcal F_n)\xrightarrow[n\to\infty]{}\mathbb E(f|\mathcal F)$ a.e. for every non-negative $f\in L^1$, using the following steps. Fix $f\in L^1$. For every $\varepsilon>0$, choose $n_0$ and $g\in L^1(X,\mathcal F_{n_0},\mu)$ such that $\|\mathbb E(f|\mathcal F)-g\|_1<\varepsilon$.
a. Show that $|\mathbb E(f|\mathcal F_n)-\mathbb E(f|\mathcal F)|\le\mathbb E(|f-g|\,\big|\,\mathcal F_n)+|\mathbb E(f|\mathcal F)-g|$ for all $n\ge n_0$. Deduce that
$$\mu\Big(\limsup_{n\to\infty}|\mathbb E(f|\mathcal F_n)-\mathbb E(f|\mathcal F)|>\delta\Big)\le\mu\Big(\sup_n\mathbb E(|f-g|\,\big|\,\mathcal F_n)>\tfrac12\delta\Big)+\mu\Big(|\mathbb E(f|\mathcal F)-g|>\tfrac12\delta\Big).$$
b. Show that $\mu\big(\limsup_{n\to\infty}|\mathbb E(f|\mathcal F_n)-\mathbb E(f|\mathcal F)|>\delta\big)\xrightarrow[\varepsilon\to0^+]{}0$. (Hint: prove first that for every $L^1$ function $F$, $\mu[|F|>a]\le\frac1a\|F\|_1$.)
c. Finish the proof.
2.9. Hopf's ratio ergodic theorem
Let $(X,\mathcal B,\mu,T)$ be a conservative ergodic mpt on a $\sigma$-finite measure space. If $f,g\in L^1$ and $\int g\,d\mu\neq0$, then
$$\frac{\sum_{k=0}^{n-1}f\circ T^k}{\sum_{k=0}^{n-1}g\circ T^k}\xrightarrow[n\to\infty]{}\frac{\int f\,d\mu}{\int g\,d\mu}\quad\text{almost everywhere.}$$
Prove this theorem using the following steps (R. Zweimüller). Fix a set $A\in\mathcal B$ s.t. $0<\mu(A)<\infty$, and let $(A,\mathcal B_A,T_A,\mu_A)$ denote the induced system on $A$ (Problem 1.14). For every function $F$, set
$$S_nF:=F+F\circ T+\dots+F\circ T^{n-1},\qquad S^A_nF:=F+F\circ T_A+\dots+F\circ T_A^{n-1}.$$
1. Read Problem 1.14, and show that a.e. $x$ has an orbit which enters $A$ infinitely many times. Let $0<\tau_1(x)<\tau_2(x)<\dots$ be the times $\tau$ when $T^\tau(x)\in A$.
2. Suppose $f\ge0$. Prove that for every $n\in(\tau_{k-1}(x),\tau_k(x)]$ and a.e. $x\in A$,
$$\frac{(S^A_{k-1}f)(x)}{(S^A_k1_A)(x)}\le\frac{(S_nf)(x)}{(S_n1_A)(x)}\le\frac{(S^A_kf)(x)}{(S^A_{k-1}1_A)(x)}.$$
3. Verify that $S^A_j1_A=j$ a.e. on $A$, and show that $(S_nf)(x)/(S_n1_A)(x)\xrightarrow[n\to\infty]{}\frac1{\mu(A)}\int f\,d\mu$ a.e. on $A$.
4. Finish the proof.
Notes for chapter 2
For a comprehensive reference to ergodic theorems, see [2]. The mean ergodic theorem was proved by von Neumann. The pointwise ergodic theorem was proved by Birkhoff. By now there are many proofs of this theorem. The one we use is taken from [1], where it is attributed to Kamae, who apparently found it using ideas from nonstandard analysis. The subadditive ergodic theorem was first proved by Kingman. The proof we give is due to Steele [5]. The multiplicative ergodic theorem is due to Oseledets. The proof we use is due to Raghunathan and Ruelle, and is taken from [4]. The martingale convergence theorem (Problem 2.8) is due to Doob. The proof sketched in Problem 2.8 is taken from [2]. The proof of Hopf's ratio ergodic theorem sketched in Problem 2.9 is due to R. Zweimüller and is taken from [6].
References
1. Keane, M.: Ergodic theory and subshifts of finite type. In: Ergodic theory, symbolic dynamics, and hyperbolic spaces (Trieste, 1989), 35-70, Oxford Sci. Publ., Oxford Univ. Press, New York, 1991.
2. Krengel, U.: Ergodic theorems. de Gruyter Studies in Mathematics 6, 1985. viii+357 pp.
3. Parry, W.: Topics in ergodic theory. Cambridge Tracts in Mathematics 75, Cambridge University Press, Cambridge-New York, 1981. x+110 pp.
4. Ruelle, D.: Ergodic theory of differentiable dynamical systems. Inst. Hautes Études Sci. Publ. Math. No. 50 (1979), 27-58.
5. Steele, J. M.: Kingman's subadditive ergodic theorem. Ann. Inst. H. Poincaré Probab. Statist. 25 (1989), no. 1, 93-98.
6. Zweimüller, R.: Hopf's ratio ergodic theorem by inducing. Colloq. Math. 101 (2004), no. 2, 289-292.
Chapter 3
Spectral Theory
3.1 The spectral approach to ergodic theory
A basic problem in ergodic theory is to determine whether two ppt are measure
theoretically isomorphic. This is done by studying invariants: properties, quantities,
or objects which are equal for any two isomorphic systems. The idea is that if two
ppt have different invariants, then they cannot be isomorphic. Ergodicity and mixing
are examples of invariants for measure theoretic isomorphism.
An effective method for inventing invariants is to look for a weaker equivalence
relation, which is better understood. Any invariant for the weaker equivalence rela-
tion is automatically an invariant for measure theoretic isomorphism. The spectral
point of view is based on this approach.
The idea is to associate to the ppt (X, B, μ, T) the operator U_T : L²(X, B, μ) → L²(X, B, μ), U_T f = f∘T. This is an isometry of L² (i.e. ‖U_T f‖₂ = ‖f‖₂ and ⟨U_T f, U_T g⟩ = ⟨f, g⟩). It is useful here to think of L² as a Hilbert space over ℂ.
Definition 3.1. Two ppt (X, B, μ, T), (Y, C, ν, S) are called spectrally isomorphic, if their associated L² isometries U_T and U_S are unitarily equivalent, namely if there exists a linear operator W : L²(X, B, μ) → L²(Y, C, ν) s.t.
1. W is invertible;
2. ⟨W f, W g⟩ = ⟨f, g⟩ for all f, g ∈ L²(X, B, μ);
3. W U_T = U_S W.
It is easy to see that any two measure theoretically isomorphic ppt are spectrally iso-
morphic, but we will see later that there are Bernoulli schemes which are spectrally
isomorphic but not measure theoretically isomorphic.
Definition 3.2. A property of ppt is called a spectral invariant, if whenever it holds for (X, B, μ, T), it holds for all ppt which are spectrally isomorphic to (X, B, μ, T).
Proposition 3.1. Ergodicity and mixing are spectral invariants.
Proof. Suppose (X, B, μ, T) is a ppt, and let U_T be as above. The trick is to phrase ergodicity and mixing in terms of U_T.
Ergodicity is equivalent to the statement "all invariant L² functions are constant", which is the same as saying that dim{f : U_T f = f} = 1. Obviously, this is a spectral invariant.
Mixing is equivalent to the following statement: dim{f : U_T f = f} = 1, and ⟨f, U_T^n g⟩ → ⟨f, 1⟩⟨g, 1⟩ as n → ∞, for all f, g ∈ L².
To see that this property is preserved by spectral isomorphisms, note that if dim{f : U_T f = f} = 1, then any unitary equivalence W satisfies W1 = c with |c| = 1. □
The spectral point of view immediately suggests the following invariant.
Definition 3.3. Suppose (X, B, μ, T) is a ppt. If f : X → ℂ, f ∈ L² ∖ {0} satisfies f∘T = λ f, then we say that f is an eigenfunction and that λ is an eigenvalue. The point spectrum of T is the set H(T) := {λ ∈ ℂ : ∃f ∈ L², f∘T = λ f}.
H(T) is a countable subgroup of the unit circle (problem 3.1). Evidently H(T) is a spectral invariant of T.
It is easy to see using Fourier expansions that for the irrational rotation R_α, H(R_α) = {e^{ikα} : k ∈ ℤ} (problem 3.2), thus irrational rotations by different angles are non-isomorphic.
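The eigenvalues of an irrational rotation are easy to exhibit directly: f_k(z) = z^k satisfies f_k∘R_α = e^{ikα} f_k. The following is a small numerical sanity check of this relation (our own sketch; the function names and the choice of α are ours, not from the notes):

```python
import cmath

ALPHA = 2 * cmath.pi * (5 ** 0.5 - 1) / 2  # an irrational rotation angle (assumption)

def rotate(z, alpha=ALPHA):
    """The rotation R_alpha(z) = e^{i alpha} z on the unit circle."""
    return cmath.exp(1j * alpha) * z

def eigenfunction(k):
    """f_k(z) = z^k, the candidate eigenfunction with eigenvalue e^{ik alpha}."""
    return lambda z: z ** k

def check_eigen(k, n_samples=100):
    """Verify f_k(R_alpha z) = e^{ik ALPHA} f_k(z) at sample points of S^1."""
    f = eigenfunction(k)
    lam = cmath.exp(1j * k * ALPHA)
    points = (cmath.exp(2j * cmath.pi * j / n_samples) for j in range(n_samples))
    return all(abs(f(rotate(z)) - lam * f(z)) < 1e-9 for z in points)
```

Since f_k(R_α z) = (e^{iα}z)^k = e^{ikα} z^k, the check holds up to floating-point error for every integer k, illustrating that H(R_α) contains the whole group {e^{ikα} : k ∈ ℤ}.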
Here are other related invariants:
Definition 3.4. Given a ppt (X, B, μ, T), let V_d := span{eigenfunctions}. We say that (X, B, μ, T) has
1. discrete spectrum (sometimes called pure point spectrum), if V_d = L²,
2. continuous spectrum, if V_d = {constants} (i.e. V_d is smallest possible),
3. mixed spectrum, if V_d ≠ L², {constants}.
Any irrational rotation has discrete spectrum (problem 3.2). Any mixing transformation has continuous spectrum, because a non-constant eigenfunction f∘T = λ f satisfies

  ⟨f, f∘T^{n_k}⟩ → ‖f‖₂² ≠ (∫ f dμ)²  along any sequence n_k s.t. λ^{n_k} → 1.

(To see that ‖f‖₂² ≠ (∫ f dμ)² for all non-constant functions, apply Cauchy–Schwarz to f, or note that non-constant L² functions have positive variance.)
The invariant H(T) is tremendously successful for transformations with discrete
spectrum:
Theorem 3.1 (Discrete Spectrum Theorem). Two ppt with discrete spectrum are
measure theoretically isomorphic iff they have the same group of eigenvalues.
But this invariant cannot distinguish transformations with continuous spectrum. In particular, it is unsuitable for the study of mixing transformations.
3.2 Weak mixing
3.2.1 Definition and characterization
We saw that if a transformation is mixing, then it does not have non-constant eigenfunctions. But the absence of non-constant eigenfunctions is not equivalent to mixing (see problems 3.8–3.10 for an example). Here we study the dynamical significance of this property. First we give it a name.
Definition 3.5. A ppt is called weak mixing, if every f ∈ L² s.t. f∘T = λ f a.e. for some λ ∈ ℂ is constant almost everywhere.
Theorem 3.2. The following are equivalent for a ppt (X, B, μ, T) on a Lebesgue space:
1. weak mixing;
2. for all E, F ∈ B, (1/N) Σ_{n=0}^{N−1} |μ(E ∩ T^{−n}F) − μ(E)μ(F)| → 0 as N → ∞;
3. for every E, F ∈ B, there exists N ⊂ ℕ of density zero (i.e. |N ∩ [1, N]|/N → 0) s.t. μ(E ∩ T^{−n}F) → μ(E)μ(F) as n → ∞, n ∉ N;
4. T × T is ergodic.
Proof. We prove (2) ⇒ (3) ⇒ (4) ⇒ (1). The remaining implication (1) ⇒ (2) requires additional preparation, and will be shown later.
The implication (2) ⇒ (3) is a general fact from calculus (the Koopman–von Neumann Lemma): if {a_n} is a bounded sequence of non-negative numbers, then (1/N) Σ_{n=1}^{N} a_n → 0 iff there is a set of zero density N ⊂ ℕ s.t. a_n → 0 as n → ∞, n ∉ N (Problem 3.3).
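The Koopman–von Neumann phenomenon can be seen concretely (our own illustration, not from the notes): let a_n = 1 on the perfect squares and 0 elsewhere. The Cesàro averages vanish, and the exceptional set where a_n is not small is the density-zero set of squares.

```python
def cesaro(a, N):
    """(1/N) * sum_{n=1}^{N} a(n)."""
    return sum(a(n) for n in range(1, N + 1)) / N

def density(S, N):
    """|S ∩ [1, N]| / N for a set given by a predicate S."""
    return sum(1 for n in range(1, N + 1) if S(n)) / N

is_square = lambda n: int(n ** 0.5 + 0.5) ** 2 == n
a = lambda n: 1.0 if is_square(n) else 0.0

# Cesàro averages tend to 0: there are only ~sqrt(N) squares up to N ...
avg = cesaro(a, 10 ** 6)
# ... and a_n = 0 identically outside the density-zero set of squares.
dens = density(is_square, 10 ** 6)
```

Here avg = 1000/10⁶ = 0.001 and dens = 0.001, both tending to 0 as the horizon grows, matching both characterizations in the lemma.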
We show that (3) ⇒ (4). Let S be the semi-algebra {E × F : E, F ∈ B} which generates B ⊗ B, and fix E_i, F_i ∈ B (i = 1, 2). By (3), there exist N_i ⊂ ℕ of density zero s.t.

  μ(E_i ∩ T^{−n}F_i) → μ(E_i)μ(F_i)  (n → ∞, n ∉ N_i, i = 1, 2).

The set N = N_1 ∪ N_2 also has zero density, and

  μ(E_i ∩ T^{−n}F_i) → μ(E_i)μ(F_i)  (n → ∞, n ∉ N, i = 1, 2).

Writing m = μ × μ and S = T × T, we see that this implies that

  m[(E_1 × E_2) ∩ S^{−n}(F_1 × F_2)] → m(E_1 × E_2) m(F_1 × F_2)  (n → ∞, n ∉ N),

whence (1/N) Σ_{n=0}^{N−1} m[(E_1 × E_2) ∩ S^{−n}(F_1 × F_2)] → m(E_1 × E_2) m(F_1 × F_2). In summary,

  (1/N) Σ_{n=0}^{N−1} m[A ∩ S^{−n}B] → m(A)m(B)  for all A, B ∈ S.

Since S generates B ⊗ B, the above holds for all A, B ∈ B ⊗ B, and this implies that T × T is ergodic.
Proof that (4) ⇒ (1): Suppose T were not weak mixing; then T has a non-constant eigenfunction f with eigenvalue λ. The eigenvalue has absolute value equal to one, because |λ| ‖f‖₂ = ‖f∘T‖₂ = ‖f‖₂. Thus

  F(x, y) = f(x) f̄(y)

is T × T–invariant. Since f is non-constant, F is non-constant, and we get a contradiction to the ergodicity of T × T.
The proof that (1) ⇒ (2) is presented in the next section. □
3.2.2 Spectral measures and weak mixing
It is convenient to introduce the following notation: U_T^n := (U_T^∗)^{|n|} when n < 0, where U_T^∗ is the unique operator s.t. ⟨U_T^∗ f, g⟩ = ⟨f, U_T g⟩ for all g ∈ L². This makes sense even if U_T is not invertible. The reader can check that when U_T is invertible, U_T^∗ = (U_T)^{−1}, so that there is no risk of confusion.
We are interested in the behavior of U_T^n f as n → ∞. To study it, it is enough to study U_T : H_f → H_f, where H_f := span{U_T^n f : n ∈ ℤ} (closure in L²).
It turns out that U_T : H_f → H_f is unitarily equivalent to the operator M : g(z) ↦ z g(z) on L²(S¹, B(S¹), ν_f), where ν_f is some finite measure on S¹, called the spectral measure of f, which contains all the information on U_T : H_f → H_f.
To construct it, we need the following important tool from harmonic analysis. Recall that the n-th Fourier coefficient of μ is the number μ̂(n) = ∫_{S¹} z^n dμ.
Theorem 3.3 (Herglotz). A sequence {r_n}_{n∈ℤ} is the sequence of Fourier coefficients of a positive Borel measure on S¹ iff r_{−n} = r̄_n and {r_n} is positive definite:

  Σ_{n,m=−N}^{N} r_{n−m} a_n ā_m ≥ 0  for all sequences {a_n} and all N.

This measure is unique.
It is easy to check that r_n = ⟨U_T^n f, f⟩ is positive definite (to see this, expand 0 ≤ ⟨Σ_{n=−N}^{N} a_n U_T^n f, Σ_{m=−N}^{N} a_m U_T^m f⟩, noting that ⟨U_T^n f, U_T^m f⟩ = ⟨U_T^{n−m} f, f⟩).
Definition 3.6. Suppose (X, B, μ, T) is a ppt, and f ∈ L² ∖ {0}. The spectral measure of f is the unique measure ν_f on S¹ s.t. ⟨f∘T^n, f⟩ = ∫_{S¹} z^n dν_f for all n ∈ ℤ.
Proposition 3.2. Let H_f := span{U_T^n f : n ∈ ℤ}; then U_T : H_f → H_f is unitarily equivalent to the operator g(z) ↦ z g(z) on L²(S¹, B(S¹), ν_f).
Proof. By the definition of the spectral measure,

  ‖Σ_{n=−N}^{N} a_n z^n‖²_{L²(ν_f)} = ⟨Σ_{n=−N}^{N} a_n z^n, Σ_{m=−N}^{N} a_m z^m⟩ = Σ_{n,m=−N}^{N} a_n ā_m ∫_{S¹} z^{n−m} dν_f(z)
  = Σ_{n,m=−N}^{N} a_n ā_m ⟨U_T^{n−m} f, f⟩ = Σ_{n,m=−N}^{N} a_n ā_m ⟨U_T^n f, U_T^m f⟩ = ‖Σ_{n=−N}^{N} a_n U_T^n f‖²_{L²(μ)}.

In particular, if Σ_{n=−N}^{N} a_n U_T^n f = 0 in L²(μ), then Σ_{n=−N}^{N} a_n z^n = 0 in L²(ν_f). It follows that W : U_T^n f ↦ z^n extends to a linear map from span{U_T^n f : n ∈ ℤ} to L²(ν_f). This map is an isometry, and it is bounded. It follows that W extends to a linear isometry W : H_f → L²(ν_f). The image of W contains all the trigonometric polynomials, therefore W(H_f) is dense in L²(ν_f). Since W is an isometry, its image is closed (exercise). It follows that W is an isometric bijection from H_f onto L²(ν_f). Since W(U_T U_T^n f) = z^{n+1} = z · W(U_T^n f), we have W U_T = M W on span{U_T^n f : n ∈ ℤ}, hence on H_f, and so W is the required unitary equivalence. □
Proposition 3.3. If T is a weak mixing ppt on a Lebesgue space, then all the spectral measures of f ∈ L² s.t. ∫ f dμ = 0 are non-atomic (this explains the terminology "continuous spectrum").
Proof. Suppose f ∈ L² has integral zero and ν_f has an atom λ ∈ S¹. We construct an eigenfunction (with eigenvalue λ). Consider the sequence

  (1/N) Σ_{n=0}^{N−1} λ^{−n} U_T^n f.

This sequence is bounded in norm, therefore has a weakly convergent subsequence (here we use the fact that L² is separable, a consequence of the fact that (X, B, μ) is a Lebesgue space):

  (1/N_k) Σ_{n=0}^{N_k−1} λ^{−n} U_T^n f → g weakly.

The limit g must satisfy ⟨U_T g, h⟩ = ⟨λ g, h⟩ for all h (check!), therefore it must be an eigenfunction with eigenvalue λ.
But it could be that g = 0. We rule this out using the assumption that ν_f({λ}) ≠ 0:

  ⟨g, f⟩ = lim_{k→∞} (1/N_k) Σ_{n=0}^{N_k−1} λ^{−n} ⟨U_T^n f, f⟩ = lim_{k→∞} (1/N_k) Σ_{n=0}^{N_k−1} ∫_{S¹} λ^{−n} z^n dν_f(z)
  = ν_f({λ}) + lim_{k→∞} (1/N_k) Σ_{n=0}^{N_k−1} ∫_{S¹∖{λ}} λ^{−n} z^n dν_f(z)
  = ν_f({λ}) + lim_{k→∞} ∫_{S¹∖{λ}} (1/N_k) · ((λ^{−1}z)^{N_k} − 1)/(λ^{−1}z − 1) dν_f(z).

The limit is equal to zero, because the integrand tends to zero and is uniformly bounded (by one). Thus ⟨g, f⟩ = ν_f({λ}) ≠ 0, whence g ≠ 0. □
Lemma 3.1. Suppose T is a ppt on a Lebesgue space. If T is weak mixing, then for every f ∈ L²,

  (1/N) Σ_{n=0}^{N−1} | ∫ f · f∘T^n dμ − (∫ f dμ)² | → 0 as N → ∞.
Proof. It is enough to treat the case when ∫ f dμ = 0. Let ν_f denote the spectral measure of f; then

  (1/N) Σ_{n=0}^{N−1} |∫ f · f∘T^n dμ|² = (1/N) Σ_{n=0}^{N−1} |⟨U_T^n f, f⟩|² = (1/N) Σ_{n=0}^{N−1} |∫_{S¹} z^n dν_f(z)|²
  = (1/N) Σ_{n=0}^{N−1} (∫_{S¹} z^n dν_f(z)) (∫_{S¹} w̄^n dν_f(w))
  = (1/N) Σ_{n=0}^{N−1} ∫_{S¹} ∫_{S¹} z^n w̄^n dν_f(z) dν_f(w)
  = ∫_{S¹} ∫_{S¹} [ (1/N) Σ_{n=0}^{N−1} (z w̄)^n ] dν_f(z) dν_f(w).

The integrand tends to zero and is bounded outside Δ := {(z, w) : z = w}. If we can show that (ν_f × ν_f)(Δ) = 0, then it will follow that

  (1/N) Σ_{n=0}^{N−1} |∫ f · f∘T^n dμ|² → 0 as N → ∞.

This is indeed the case: T is weak mixing, so by the previous proposition ν_f is non-atomic, whence (ν_f × ν_f)(Δ) = ∫_{S¹} ν_f({w}) dν_f(w) = 0 by Fubini–Tonelli.
It remains to note that by the Koopman–von Neumann lemma, for every bounded non-negative sequence {a_n}, (1/N) Σ_{n=1}^{N} a_n² → 0 iff (1/N) Σ_{n=1}^{N} a_n → 0, because both conditions are equivalent to saying that a_n converges to zero outside a set of indices of density zero. □
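The conclusion of Lemma 3.1 fails for transformations with non-constant eigenfunctions. A small numerical contrast (our own illustration; the function names, test function f(x) = cos 2πx, and grid size are our assumptions): for the angle doubling map the correlations of f vanish, while for an irrational rotation the absolute correlations have a Cesàro average bounded away from zero, so the rotation is not weak mixing.

```python
import math

N_GRID = 4096  # grid fine enough for exact discrete orthogonality below

def corr_doubling(n):
    """Riemann-sum approximation of ∫ f · f∘T^n dm for T(x) = 2x mod 1."""
    return sum(
        math.cos(2 * math.pi * j / N_GRID)
        * math.cos(2 * math.pi * (2 ** n) * j / N_GRID)
        for j in range(N_GRID)
    ) / N_GRID

def corr_rotation(n, alpha=(5 ** 0.5 - 1) / 2):
    """∫ f · f∘R^n dm for the rotation by alpha; exactly (1/2) cos(2π n α)."""
    return 0.5 * math.cos(2 * math.pi * n * alpha)

# Cesàro averages of |correlation|: ~0 for doubling, ~1/π ≈ 0.318 for rotation.
avg_doubling = sum(abs(corr_doubling(n)) for n in range(1, 9)) / 8
avg_rotation = sum(abs(corr_rotation(n)) for n in range(1, 501)) / 500
```

For the doubling map, ∫ cos(2πx) cos(2π·2ⁿx) dx = 0 exactly, and the discrete grid reproduces this by orthogonality of the frequencies; for the rotation the averages of (1/2)|cos(2πnα)| converge to 1/π by equidistribution.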
We can now complete the proof of the theorem in the previous section:
Proposition 3.4. If T is weak mixing, then for all f, g ∈ L²,

  (1/N) Σ_{n=0}^{N−1} | ∫ g · f∘T^n dμ − (∫ f dμ)(∫ g dμ) | → 0 as N → ∞.  (3.1)
Proof. Assume first that T is invertible; then U_T is invertible, with a bounded inverse (equal to U_{T^{−1}}). Fix f ∈ L², and set

  S(f) := span{U_T^k f : k ∈ ℤ}.

Write L² = S(f) + {constants} + [S(f) + {constants}]^⊥.
1. Every g ∈ S(f) satisfies (3.1), because S(f) is generated by functions of the form g := U_T^k f, and these functions satisfy (3.1) by Lemma 3.1.
2. Every constant g satisfies (3.1) trivially.
3. Every g ⊥ S(f) ∪ {constants} satisfies (3.1), because ⟨g, f∘T^n⟩ vanishes for every n.
It follows that every g ∈ L² satisfies (3.1).
Now consider the case of a non-invertible ppt. Let (X̃, B̃, μ̃, T̃) be the natural extension. A close look at the definition of B̃ shows that if f̃ : X̃ → ℂ is an eigenfunction of T̃, then the value of f̃(…, x_{−1}, x_0, x_1, …) is completely determined by x_0; that is, f̃ = f∘π for the natural projection π(…, x_0, …) = x_0, with f B-measurable. Thus every eigenfunction for T̃ is the lift of an eigenfunction for T. It follows that if T is weak mixing, then T̃ is weak mixing.
By the first part of the proof, T̃ satisfies (3.1). Since T is a factor of T̃, it also satisfies (3.1). □
3.3 The Koopman operator of a Bernoulli scheme
In this section we analyze the Koopman operator of an invertible Bernoulli scheme. The idea is to produce an orthonormal basis for L² which makes the action of U_T transparent.
We cannot expect to diagonalize U_T: Bernoulli schemes are mixing, so they have no non-constant eigenfunctions. But we shall see that we can get the following nice structure:
Definition 3.7. An invertible ppt is said to have countable Lebesgue spectrum if L² has an orthonormal basis of the form {1} ∪ {f_{λ,j} : λ ∈ Λ, j ∈ ℤ}, where Λ is countable and U_T f_{λ,j} = f_{λ,j+1} for all λ, j.
The reason for the terminology is that the spectral measure of each f_{λ,j} is proportional to the Lebesgue measure on S¹ (problem 3.6).
Example. The invertible Bernoulli scheme with probability vector (1/2, 1/2) has countable Lebesgue spectrum.
Proof. The phase space is X = {0, 1}^ℤ. Define for every finite non-empty A ⊂ ℤ the function φ_A(x) := Π_{j∈A} (−1)^{x_j}. Define φ_∅ := 1. Then:
1. if A ≠ B, then φ_A ⊥ φ_B;
2. span{φ_A : A ⊂ ℤ finite} is an algebra of functions which separates points, and contains the constants.
By the Stone–Weierstrass theorem, span{φ_A : A ⊂ ℤ finite} is dense in L², so {φ_A} is an orthonormal basis of L². This is called the Fourier–Walsh system.
Note that U_T φ_A = φ_{A+1}, where A+1 := {a+1 : a ∈ A}. Take Λ to be the set of equivalence classes of the relation A ∼ B ⟺ ∃c s.t. A = c+B. Let A_λ be a representative of λ ∈ Λ. The basis is {1} ∪ {φ_{A_λ+n} : λ ∈ Λ, n ∈ ℤ} = {Fourier–Walsh functions}. □
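On a finite window {0,1}^M with the (1/2,1/2) product measure, both the orthonormality of the Fourier–Walsh functions and the shift relation U_T φ_A = φ_{A+1} can be checked by exact enumeration. A small sketch with our own helper names (the window size M is an assumption):

```python
from itertools import product

M = 6  # coordinates 0..M-1 of a finite window of {0,1}^Z

def phi(A, x):
    """Fourier-Walsh function: product of (-1)^{x_j} over j in A (empty A gives 1)."""
    out = 1
    for j in A:
        out *= (-1) ** x[j]
    return out

def inner(A, B):
    """<phi_A, phi_B> w.r.t. the uniform (Bernoulli 1/2) measure on {0,1}^M."""
    return sum(phi(A, x) * phi(B, x) for x in product((0, 1), repeat=M)) / 2 ** M

def shifted_equals(A):
    """Check phi_A(Tx) = phi_{A+1}(x), where (Tx)_j = x_{j+1} (left shift)."""
    A_plus_1 = [a + 1 for a in A]
    if max(A) + 1 >= M:
        return True  # relation only testable inside the window
    return all(
        phi(A, x[1:] + (0,)) == phi(A_plus_1, x)
        for x in product((0, 1), repeat=M)
    )
```

The inner products come out 1 when A = B and 0 otherwise, matching claim 1 of the proof, and shifting the set by one reproduces the Koopman action.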
It is not easy to produce such bases for other Bernoulli schemes. But they exist. To prove this we introduce the following sufficient condition for countable Lebesgue spectrum, which turns out to be satisfied by many smooth dynamical systems:
Definition 3.8. An invertible ppt (X, B, μ, T) is called a K automorphism if there is a σ-algebra A ⊂ B s.t.
1. T^{−1}A ⊂ A;
2. A generates B: σ(∪_{n∈ℤ} T^n A) = B mod μ;¹
3. the tail of A is trivial: ∩_{n=0}^{∞} T^{−n}A = {∅, X} mod μ.

¹ F_1 ⊂ F_2 mod μ means: for every F_1 ∈ F_1 there is a set F_2 ∈ F_2 s.t. μ(F_1 △ F_2) = 0; and F_1 = F_2 mod μ iff F_1 ⊂ F_2 mod μ and F_2 ⊂ F_1 mod μ.
Proposition 3.5. Every invertible Bernoulli scheme has the K property.
Proof. Let (S^ℤ, B(S^ℤ), μ, T) be a Bernoulli scheme, i.e. B(S^ℤ) is the sigma-algebra generated by the cylinders _k[a_k, …, a_ℓ] := {x ∈ S^ℤ : x_i = a_i (k ≤ i ≤ ℓ)}, T is the left shift map, and μ(_k[a_k, …, a_ℓ]) = p_{a_k} ⋯ p_{a_ℓ}.
Call a cylinder non-negative, if it is of the form _0[a_0, …, a_n]. Let A be the sigma-algebra generated by all non-negative cylinders. It is clear that T^{−1}A ⊂ A and that ∪_{n∈ℤ} T^n A generates B(S^ℤ). We show that the measure of every element of ∩_{n=0}^{∞} T^{−n}A is either zero or one. Probabilists call the elements of this intersection tail events. The fact that every tail event for a sequence of independent identically distributed random variables has probability zero or one is called Kolmogorov's zero–one law.
Two measurable sets A, B are called independent, if μ(A ∩ B) = μ(A)μ(B). For Bernoulli schemes, any two cylinders with non-overlapping sets of indices are independent (check). Thus for every cylinder B of length |B|,

  B is independent of T^{−|B|}A for all non-negative cylinders A.

It follows that B is independent of every element of T^{−|B|}A (a monotone class theorem argument). Thus every cylinder B is independent of every element of ∩_{n≥1} T^{−n}A. Thus every element of B is independent of every element of ∩_{n≥1} T^{−n}A (another monotone class theorem argument).
This means that every E ∈ ∩_{n≥1} T^{−n}A is independent of itself. Thus μ(E) = μ(E ∩ E) = μ(E)², whence μ(E) = 0 or 1. □
Proposition 3.6. Every K automorphism on a non-atomic standard probability space has countable Lebesgue spectrum.
Proof. Let (X, B, μ, T) be a K automorphism of a non-atomic standard probability space. Since (X, B, μ) is a non-atomic standard space, L²(X, B, μ) is (i) infinite dimensional, and (ii) separable.
Let A be a sigma-algebra as in the definition of the K property. Set V := L²(X, A, μ). This is a closed subspace of L²(X, B, μ), and
1. U_T(V) ⊂ V, because T^{−1}A ⊂ A;
2. ∪_{n∈ℤ} U_T^n(V) is dense in L²(X, B, μ), because ∪_{n∈ℤ} T^n A generates B, so every B ∈ B can be approximated by a finite disjoint union of elements of ∪_{n∈ℤ} T^n A;
3. ∩_{n=1}^{∞} U_T^n(V) = {constant functions}, because ∩_{n≥1} T^{−n}A = {∅, X} mod μ.
Now let W := V ⊖ U_T(V) (the orthogonal complement of U_T(V) in V). For all n > 0, U_T^n(W) ⊂ U_T^n(V) ⊂ U_T(V) ⊥ W. Thus W ⊥ U_T^n(W) for all n > 0. Since U_T^{−1} is an isometry, W ⊥ U_T^n(W) also for all n < 0. It follows that

  L²(X, B, μ) = {constants} ⊕ ⊕_{n∈ℤ} U_T^n(W)  (orthogonal sum).

If {f_λ : λ ∈ Λ} is an orthonormal basis for W, then the above implies that {1} ∪ {U_T^n f_λ : λ ∈ Λ, n ∈ ℤ} is an orthonormal basis of L²(X, B, μ) (check!).
This is almost the full countable Lebesgue spectrum property. It remains to show that |Λ| = ℵ₀. |Λ| ≤ ℵ₀ because L²(X, B, μ) is separable. We show that Λ is infinite by proving dim(W) = ∞. We use the following fact (to be proved later):

  ∀N ∃A_1, …, A_N ∈ A pairwise disjoint sets, with positive measure.  (3.2)

Suppose we know this. Pick f ∈ W ∖ {0} (W ≠ {0}, otherwise L² = {constants} and (X, B, μ) is atomic). Set w_i := f · 1_{A_i}∘T with A_1, …, A_N as above; then (i) the w_i are linearly independent (because they have disjoint supports); (ii) w_i ∈ V (because T^{−1}A_i ∈ T^{−1}A ⊂ A, so w_i is A-measurable); and (iii) w_i ⊥ U_T(V) (check, using f ∈ W). It follows that dim(W) ≥ N. Since N was arbitrary, dim(W) = ∞.
Here is the proof of (3.2). Since (X, B, μ) is non-atomic, there exist B_1, …, B_N ∈ B pairwise disjoint with positive measure. By assumption, ∪_{n∈ℤ} T^n A generates B, thus we can approximate the B_i arbitrarily well by elements of ∪_{n∈ℤ} T^n A. By assumption, A ⊂ TA; this means that we can approximate the B_i arbitrarily well by sets from T^n A by choosing n sufficiently large. It follows that L²(X, T^n A, μ) has dimension at least N. This forces T^n A to contain at least N pairwise disjoint sets of positive measure. It follows that A contains at least N pairwise disjoint sets of positive measure. □
Corollary 3.1. All systems with countable Lebesgue spectrum, whence all invertible Bernoulli schemes, are spectrally isomorphic.
Proof. Problem 3.7. □
But it is not true that all Bernoulli schemes are measure theoretically isomorphic. To prove this one needs new (non-spectral) invariants. Enter the measure theoretic entropy, which we discuss in the next chapter.
Problems
3.1. Suppose (X, B, μ, T) is an ergodic ppt on a Lebesgue space, and let H(T) be its group of eigenvalues.
1. Show that if f is an eigenfunction, then |f| = const. a.e., and that if λ, η ∈ H(T), then so do 1, λη, λ/η.
2. Show that eigenfunctions of different eigenvalues are orthogonal. Deduce that H(T) is a countable subgroup of the unit circle.
3.2. Prove that the irrational rotation R_α has discrete spectrum, and calculate H(R_α).
3.3. Koopman–von Neumann Lemma
Suppose {a_n} is a bounded sequence of non-negative numbers. Prove that (1/N) Σ_{n=1}^{N} a_n → 0 iff there is a set of zero density N ⊂ ℕ s.t. a_n → 0 as n → ∞, n ∉ N. Guidance: fill in the details in the following argument.
1. Suppose N ⊂ ℕ has density zero and a_n → 0 (n → ∞, n ∉ N); then (1/N) Σ_{n=1}^{N} a_n → 0.
2. Now assume that (1/N) Σ_{n=1}^{N} a_n → 0.
a. Show that the sets N_m := {k : a_k > 1/m} form an increasing sequence of sets of density zero.
b. Fix ε_i ↓ 0, and choose k_i such that if n > k_i, then (1/n)|N_i ∩ [1, n]| < ε_i. Show that N := ∪_i N_i ∩ (k_i, k_{i+1}] has density zero.
c. Show that a_n → 0 as n → ∞, n ∉ N.
3.4. Here is a sketch of an alternative proof of Proposition 3.4, which avoids natural extensions (B. Parry). Fill in the details.
1. Set H := L², V := ∩_{n≥0} U_T^n(H), and W := H ⊖ U_T H := {g ∈ H : g ⊥ U_T H}.
a. H = V ⊕ [(U_T H)^⊥ + (U_T² H)^⊥ + ⋯].
b. {U_T^k H} is decreasing, {(U_T^k H)^⊥} is increasing.
c. H = V ⊕ ⊕_{k≥0} U_T^k W (orthogonal space decomposition).
2. U_T : V → V has a bounded inverse (hint: use the fact from Banach space theory that a bounded linear operator mapping one Banach space onto another, which is one-to-one, has a bounded inverse).
3. (3.1) holds for any f, g ∈ V.
4. If g ∈ U_T^k W for some k, then (3.1) holds for all f ∈ L².
5. If g ∈ V, but f ∈ U_T^k W for some k, then (3.1) holds for f, g.
6. (3.1) holds for all f, g ∈ L².
3.5. Show that every invertible ppt with countable Lebesgue spectrum is mixing, whence ergodic.
3.6. Suppose (X, B, μ, T) has countable Lebesgue spectrum. Show that {f ∈ L² : ∫ f dμ = 0} is spanned by functions f whose spectral measures ν_f are equal to the Lebesgue measure on S¹.
3.7. Show that any two ppt with countable Lebesgue spectrum are spectrally isomorphic.
3.8. Cutting and Stacking and Chacon's Example
This is an example of a ppt which is weak mixing but not mixing. The example is a certain map of the unit interval, which preserves Lebesgue's measure. It is constructed using the method of cutting and stacking, which we now explain.
Let A_0 = [0, 2/3) and R_0 := [2/3, 1] (thought of as a reservoir).
Step 1: Divide A_0 into three equal subintervals of length 2/9. Cut a subinterval B_0 of length 2/9 from the left end of the reservoir.
Stack the three thirds of A_0 one on top of the other, starting from the left and moving to the right.
Stick B_0 between the second and third interval.
Define a partial map f_1 by moving points vertically in the stack. The map is defined everywhere except on R_0 ∖ B_0 and the top floor of the stack. It can be viewed as a partially defined map of the unit interval.
Update the reservoir: R_1 := R_0 ∖ B_0. Let A_1 be the base of the new stack (equal to the rightmost third of A_0).
Step 2: Cut the stack vertically into three equal stacks. The base of each of these thirds has length (1/3)·(2/9). Cut an interval B_1 of length (1/3)·(2/9) from the left side of the reservoir R_1.
Stack the three stacks one on top of the other, starting from the left and moving to the right.
Stick B_1 between the second stack and the third stack.
Define a partial map f_2 by moving points vertically in the stack. This map is defined everywhere except on the union of the top floor and R_1 ∖ B_1.
Update the reservoir: R_2 := R_1 ∖ B_1. Let A_2 be the base of the new stack (equal to the rightmost third of A_1).
Step 3: Cut the stack vertically into three equal stacks. The base of each of these thirds has length (1/3²)·(2/9). Cut an interval B_2 of length (1/3²)·(2/9) from the left side of the reservoir R_2.
Stack the three stacks one on top of the other, starting from the left and moving to the right.
Stick B_2 between the second stack and the third stack.
Define a partial map f_3 by moving points vertically in the stack. This map is defined everywhere except on the union of the top floor and R_2 ∖ B_2.
Update the reservoir: R_3 := R_2 ∖ B_2. Let A_3 be the base of the new stack (equal to the rightmost third of A_2).
Continue in this manner, to obtain a sequence of partially defined maps f_n. There is a canonical way of viewing the intervals composing the stacks as subintervals of the unit interval. Using this identification, we may view the f_n as partially defined maps of the unit interval.
1. Show that f_n is measure preserving where it is defined (the measure is Lebesgue's measure). Calculate the Lebesgue measure of the domain of f_n.
2. Show that f_{n+1} extends f_n (i.e. the maps agree on the intersection of their domains). Deduce that the common extension of the f_n defines an invertible probability preserving map of the open unit interval. This is Chacon's example. Denote it by (I, B, m, T).
3. Let ℓ_n denote the height of the stack at step n. Show that the sets {T^i(A_n) : i = 0, …, ℓ_n, n ≥ 1} generate the Borel σ-algebra of the unit interval.
Fig. 3.1 The construction of Chacon's example
3.9. (Continuation) Prove that Chacon's example is weak mixing using the following steps. Suppose f is an eigenfunction with eigenvalue λ.
1. We first show that if f is constant on A_n for some n, then f is constant everywhere. (A_n is the base of the stack at step n.)
a. Let ℓ_k denote the height of the stack at step k. Show that A_{n+1} ⊂ A_n, and T^{ℓ_n}(A_{n+1}) ⊂ A_n. Deduce that λ^{ℓ_n} = 1.
b. Prove that λ^{ℓ_{n+1}} = 1. Find a recursive formula for ℓ_n. Deduce that λ = 1.
c. The previous steps show that f is an invariant function. Show that any invariant function which is constant on A_n is constant almost everywhere.
2. We now consider the case of a general L² eigenfunction.
a. Show, using Lusin's theorem, that there exists an n such that f is nearly constant on most of A_n. (Hint: part 3 of the previous question.)
b. Modify the argument done above to show that any L² eigenfunction is constant almost everywhere.
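The key arithmetic in part 1b is the height recursion: each step stacks three copies of the previous stack and inserts one spacer, so ℓ_1 = 4 and ℓ_{k+1} = 3ℓ_k + 1. If λ^{ℓ_n} = λ^{ℓ_{n+1}} = 1 then λ^{ℓ_{n+1}−3ℓ_n} = λ = 1. A quick check of the recursion and its closed form (our own sketch, under the convention ℓ_1 = 4):

```python
def heights(n_steps):
    """Stack heights of Chacon's construction: l_1 = 4, l_{k+1} = 3*l_k + 1."""
    l = [4]
    for _ in range(n_steps - 1):
        l.append(3 * l[-1] + 1)
    return l

def eigenvalue_forced_to_one(l):
    """l_{k+1} - 3*l_k = 1 for all k, so lam^{l_k} = 1 for all k forces lam = 1."""
    return all(l[k + 1] - 3 * l[k] == 1 for k in range(len(l) - 1))

# Closed form for the k-th entry (k = 0, 1, ...): l_{k+1} = (3^{k+2} - 1) / 2.
```

The first few heights are 4, 13, 40, 121, …, and the difference ℓ_{k+1} − 3ℓ_k is identically 1, which is exactly what kills every non-trivial eigenvalue.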
3.10. (Continuation) Prove that Chacon's example is not mixing, using the following steps.
1. Inspect the image of the top floor of the stack at step n, and show that for every n and 0 ≤ k ≤ ℓ_{n−1},

  m(T^k A_n ∩ T^{k+ℓ_n} A_n) ≥ (1/3) m(T^k A_n).

2. Use problem 3.8 part 3 and an approximation argument to show that for every Borel set E and every ε > 0, m(E ∩ T^{ℓ_n} E) ≥ (1/3) m(E) − ε for all n sufficiently large. Deduce that T cannot be mixing.
Notes to chapter 3
The spectral approach to ergodic theory is due to von Neumann. For a thorough modern introduction to the theory, see Nadkarni's book [1]. Our exposition follows in parts the books by Parry [2] and Petersen [3]. A proof of the discrete spectrum theorem mentioned in the text can be found in Walters' book [4]. A proof of Herglotz's theorem is given in [2].
References
1. Nadkarni, M. G.: Spectral theory of dynamical systems. Birkhäuser Advanced Texts, Birkhäuser Verlag, Basel, 1998. x+182 pp.
2. Parry, W.: Topics in ergodic theory. Cambridge Tracts in Mathematics, 75. Cambridge University Press, Cambridge-New York, 1981. x+110 pp.
3. Petersen, K.: Ergodic theory. Corrected reprint of the 1983 original. Cambridge Studies in Advanced Mathematics 2. Cambridge University Press, Cambridge, 1989. xii+329 pp.
4. Walters, P.: An introduction to ergodic theory. Graduate Texts in Mathematics, 79. Springer-Verlag, New York-Berlin, 1982. ix+250 pp.
Chapter 4
Entropy
In the end of the last chapter we saw that any two invertible Bernoulli schemes are spectrally isomorphic (because they have countable Lebesgue spectrum). The question whether any two Bernoulli schemes are measure theoretically isomorphic was a major open question in the field. It was solved by Kolmogorov and Sinai, through the invention of a new invariant: entropy. Later, Ornstein proved that this invariant is complete within the class of Bernoulli schemes.
4.1 Information content and entropy
Let α = {A_1, …, A_N} be a measurable partition of (X, B, μ), and suppose T : X → X is measurable. Let

  α(x) := the element of α which contains x.

The itinerary of x is (α(x), α(Tx), α(T²x), …), a sequence taking values in {A_1, …, A_N}.
Suppose x is not known, but (α(x), …, α(T^{n−1}x)) is known; how much uncertainty do we have regarding α(T^n x)?
Example 1: Irrational Rotations. R_θ(z) = e^{iθ}z with 0 < θ < 1/100 irrational, and α := {A_0, A_1}, where A_0 := {e^{it} : 0 ≤ t < π} and A_1 := {e^{it} : π ≤ t < 2π}. If the first five symbols are

  (1, 1, 1, 0, 0, …)

then we are certain that the next one is 0; this is the case whenever the block contains a 1 followed by a 0, because the rotation is so slow that a fresh crossing into A_0 is followed by a long run of 0's. In the case (0, 0, 0, 0, 0, …) we can guess that the next one is 0 with certainty of 99%. Thus knowing α(T⁵x) isn't worth much, if we already know (α(x), α(Tx), …, α(T⁴x)): we can guess it with high certainty anyway.
Example 2: Angle Doubling. T(z) = z², same partition. Knowing the first five symbols tells us nothing about the sixth one: it is zero or one with probability 50%. So the information content of α(T⁵x) given (α(x), α(Tx), …, α(T⁴x)) is maximal: a full fair coin toss.
The following question arises: Let (X, B, μ) be a probability space. Suppose x ∈ X is unknown. How do we quantify the information content I(A) of the statement "x belongs to A"?
Our guiding principle is to think of the information content of an event A as the uncertainty lost when learning that x ∈ A. Thus the information content of an event of small probability is large. Here are some intuitively clear requirements that a good definition of I(A) should satisfy:
1. I(A) should be a continuous function of the probability of A;
2. I(A) should be non-negative, decreasing in μ(A), and if μ(A) = 1 then I(A) = 0;
3. If A, B are independent, i.e. μ(A ∩ B) = μ(A)μ(B), then I(A ∩ B) = I(A) + I(B).
Proposition 4.1. The only functions φ : [0, 1] → ℝ₊ such that I(A) = φ[μ(A)] satisfies the above axioms for all probability spaces (X, B, μ) are φ(t) = c ln t with c < 0.
We leave the proof as an exercise. This leads to the following definition.
Definition 4.1 (Shannon). Let (X, B, μ) be a probability space.
1. The Information Content of a set A ∈ B is I_μ(A) := −log μ(A).
2. The Information Function of a finite measurable partition α is

  I_μ(α)(x) := Σ_{A∈α} I_μ(A) 1_A(x) = −Σ_{A∈α} log μ(A) · 1_A(x).

3. The Entropy of a finite measurable partition α is the average of the information content of its elements:

  H_μ(α) := ∫_X I_μ(α) dμ = −Σ_{A∈α} μ(A) log μ(A).

Conventions: the base of the log is 2; 0 log 0 := 0.
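These formulas only involve the probability vector of the partition, so they are easy to compute directly. A minimal implementation of Definition 4.1, with base-2 logarithms and the convention 0 log 0 = 0 (the function names are ours):

```python
from math import log2

def information_content(p):
    """I(A) = -log2 mu(A): rarer events carry more information."""
    return -log2(p)

def entropy(probs):
    """H = -sum p log2 p over the partition elements, with 0 log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)
```

For instance, the fair-coin partition {A_0, A_1} with μ(A_i) = 1/2 has entropy 1 bit, a four-set uniform partition has entropy 2 bits, and the trivial partition has entropy 0, in line with the angle-doubling example above.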
There are important conditional versions of these notions:
Definition 4.2. Let (X, B, μ) be a probability space, and suppose F is a sub-σ-algebra of B. We use the notation μ(A|F)(x) := E_μ(1_A|F)(x) (as L¹ elements).
1. The information content of A given F is I_μ(A|F)(x) := −log μ(A|F)(x).
2. The information function of a finite measurable partition α given F is I_μ(α|F) := Σ_{A∈α} I_μ(A|F) 1_A.
3. The conditional entropy of α given F is H_μ(α|F) := ∫ I_μ(α|F) dμ.
Convention: Let α, β be partitions; we write H_μ(α|β) for H_μ(α|σ(β)), where σ(β) := the smallest σ-algebra which contains β.
The following formulæ are immediate:

  H_μ(α|F) = −∫_X Σ_{A∈α} μ(A|F)(x) log μ(A|F)(x) dμ(x),
  H_μ(α|β) = −Σ_{B∈β} μ(B) Σ_{A∈α} μ(A|B) log μ(A|B),  where μ(A|B) = μ(A ∩ B)/μ(B).
4.2 Properties of the entropy of a partition
We need some notation and terminology. Let α, β be two countable partitions.
1. σ(α) is the smallest σ-algebra which contains α.
2. α ≤ β means that α ⊂ σ(β) mod μ, i.e. every element of α is equal up to a set of measure zero to an element of σ(β); equivalently, every element of α is equal up to a set of measure zero to a union of elements of β. We say that β is finer than α, and that α is coarser than β.
3. α = β mod μ iff α ≤ β mod μ and β ≤ α mod μ.
4. α ∨ β is the smallest partition which is finer than both α and β. Equivalently, α ∨ β := {A ∩ B : A ∈ α, B ∈ β}.
If F_1, F_2 are two σ-algebras, then F_1 ∨ F_2 is the smallest σ-algebra which contains F_1 ∪ F_2.
4.2.1 The entropy of α ∨ β
It is useful to think of a partition α = {A_1, …, A_n} as of the information "which element of α contains the unknown point x".
We state and prove a formula which says that the information content of α ∨ β is the information content of α plus the information content of β given the knowledge of α.
Theorem 4.1 (The Basic Identity). Suppose α, β are measurable countable partitions, and assume H_μ(α), H_μ(β) < ∞; then
1. I_μ(α ∨ β|F) = I_μ(α|F) + I_μ(β|F ∨ σ(α));
2. H_μ(α ∨ β) = H_μ(α) + H_μ(β|α).
Proof. We calculate $I_\mu(\beta|\mathcal{F}\vee\sigma(\alpha))$:
$$I_\mu(\beta|\mathcal{F}\vee\sigma(\alpha)) = -\sum_{B\in\beta} 1_B\log\mu(B|\mathcal{F}\vee\sigma(\alpha)).$$
Claim: $\mu(B|\mathcal{F}\vee\sigma(\alpha)) = \sum_{A\in\alpha} 1_A\,\dfrac{\mu(B\cap A|\mathcal{F})}{\mu(A|\mathcal{F})}$:
1. This expression is $\mathcal{F}\vee\sigma(\alpha)$-measurable.
2. Observe that $\mathcal{F}\vee\sigma(\alpha) = \{\bigcup_{A\in\alpha} A\cap F_A : F_A\in\mathcal{F}\}$ (this is a $\sigma$-algebra which contains $\alpha$ and $\mathcal{F}$). Thus every $\mathcal{F}\vee\sigma(\alpha)$-measurable function is of the form $\sum_{A\in\alpha}1_A\varphi_A$ with $\varphi_A$ $\mathcal{F}$-measurable. It is therefore enough to check test functions of the form $1_A\varphi$ with $\varphi\in L^\infty(\mathcal{F})$. For such functions
$$\int 1_A\varphi\sum_{A'\in\alpha}1_{A'}\frac{\mu(B\cap A'|\mathcal{F})}{\mu(A'|\mathcal{F})}\,d\mu = \int 1_A\varphi\,\frac{\mu(B\cap A|\mathcal{F})}{\mu(A|\mathcal{F})}\,d\mu = \int E(1_A\varphi|\mathcal{F})\,\frac{\mu(B\cap A|\mathcal{F})}{\mu(A|\mathcal{F})}\,d\mu$$
$$= \int\varphi\,\mu(B\cap A|\mathcal{F})\,d\mu = \int\varphi\,1_{B\cap A}\,d\mu = \int 1_A\varphi\cdot 1_B\,d\mu.$$
Using the claim, we see that
$$I_\mu(\beta|\mathcal{F}\vee\sigma(\alpha)) = -\sum_{B\in\beta}1_B\log\sum_{A\in\alpha}1_A\frac{\mu(B\cap A|\mathcal{F})}{\mu(A|\mathcal{F})} = -\sum_{A\in\alpha,B\in\beta}1_{A\cap B}\log\frac{\mu(B\cap A|\mathcal{F})}{\mu(A|\mathcal{F})}$$
$$= -\sum_{A,B}1_{A\cap B}\log\mu(B\cap A|\mathcal{F}) + \sum_{A,B}1_{A\cap B}\log\mu(A|\mathcal{F}) = I_\mu(\alpha\vee\beta|\mathcal{F}) - I_\mu(\alpha|\mathcal{F}).$$
This proves the first part of the theorem.
Integrating, we get $H_\mu(\alpha\vee\beta|\mathcal{F}) = H_\mu(\alpha|\mathcal{F}) + H_\mu(\beta|\mathcal{F}\vee\sigma(\alpha))$. If $\mathcal{F} = \{\emptyset,X\}$, then $H_\mu(\alpha\vee\beta) = H_\mu(\alpha) + H_\mu(\beta|\alpha)$. □
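The basic identity $H_\mu(\alpha\vee\beta)=H_\mu(\alpha)+H_\mu(\beta|\alpha)$ is easy to check numerically on a finite probability space. The following sketch (the weights and the two partitions are arbitrary illustrative choices, not taken from the text) verifies it:

```python
import math, random

def H(partition, mu):
    # entropy of a partition, given as a list of disjoint index sets
    tot = 0.0
    for block in partition:
        p = sum(mu[i] for i in block)
        if p > 0:
            tot -= p * math.log2(p)
    return tot

def H_cond(alpha, beta, mu):
    # conditional entropy H(alpha | beta) = -sum_B mu(B) sum_A mu(A|B) log mu(A|B)
    tot = 0.0
    for B in beta:
        pB = sum(mu[i] for i in B)
        if pB == 0:
            continue
        for A in alpha:
            pAB = sum(mu[i] for i in A & B)
            if pAB > 0:
                tot -= pAB * math.log2(pAB / pB)
    return tot

random.seed(0)
w = [random.random() for _ in range(8)]
mu = [x / sum(w) for x in w]           # a probability vector on 8 points
alpha = [{0, 1, 2}, {3, 4}, {5, 6, 7}]
beta = [{0, 3, 5}, {1, 4, 6}, {2, 7}]
join = [A & B for A in alpha for B in beta if A & B]   # alpha v beta

lhs = H(join, mu)
rhs = H(alpha, mu) + H_cond(beta, alpha, mu)
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)
```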
4.2.2 Convexity properties

Lemma 4.1. Let $\varphi(t) := -t\log t$; then for every probability vector $(p_1,\dots,p_n)$ and $x_1,\dots,x_n\in[0,1]$,
$$\varphi(p_1x_1+\cdots+p_nx_n) \ge p_1\varphi(x_1)+\cdots+p_n\varphi(x_n),$$
with equality iff all the $x_i$ with $i$ s.t. $p_i\ne0$ are equal.

Proof. This is because $\varphi$ is strictly concave. Let $m := \sum p_ix_i$. If $m=0$ then the lemma is obvious, so suppose $m>0$. It is an exercise in calculus to see that $\varphi(t)\le\varphi(m)+\varphi'(m)(t-m)$ for $t\in[0,1]$, with equality iff $t=m$. In the particular case $t=x_i$, multiplying by $p_i$, we get
$$p_i\varphi(x_i)\le p_i\varphi(m)+\varphi'(m)(p_ix_i-p_im),\quad\text{with equality iff }p_i=0\text{ or }x_i=m.$$
Summing over $i$, we get $\sum p_i\varphi(x_i)\le\varphi(m)+\varphi'(m)(\sum p_ix_i-m)=\varphi(m)$. There is an equality iff for every $i$, $p_i=0$ or $x_i=m$. □
Proposition 4.2 (Convexity properties). Let $\alpha,\beta,\gamma$ be countable measurable partitions with finite entropies; then
1. $H_\mu(\alpha|\gamma) \le H_\mu(\alpha\vee\beta|\gamma)$;
2. $H_\mu(\alpha|\beta\vee\gamma) \le H_\mu(\alpha|\gamma)$.

Proof. The basic identity shows that $\alpha\vee\beta$ has finite entropy, and so
$$H_\mu(\alpha\vee\beta|\gamma) = H_\mu(\alpha|\gamma) + H_\mu(\beta|\alpha\vee\gamma) \ge H_\mu(\alpha|\gamma).$$
For the second inequality, note that $\varphi(t)=-t\log t$ is strictly concave (i.e. its negative is convex); therefore by Jensen's inequality
$$H_\mu(\alpha|\gamma) = \int\sum_{A\in\alpha}\varphi[E(1_A|\sigma(\gamma))]\,d\mu = \int\sum_{A\in\alpha}\varphi\big[E\big(E(1_A|\sigma(\beta\vee\gamma))\,\big|\,\sigma(\gamma)\big)\big]\,d\mu$$
$$\ge \int\sum_{A\in\alpha}E\big(\varphi[E(1_A|\sigma(\beta\vee\gamma))]\,\big|\,\sigma(\gamma)\big)\,d\mu = \sum_{A\in\alpha}\int\varphi[E(1_A|\sigma(\beta\vee\gamma))]\,d\mu = H_\mu(\alpha|\beta\vee\gamma),$$
proving the inequality. □
4.2.3 Information and independence

We say that two partitions $\alpha,\beta$ are independent if $\forall A\in\alpha, B\in\beta$, $\mu(A\cap B)=\mu(A)\mu(B)$. This is the same as saying that the random variables $\alpha(x),\beta(x)$ are independent.

Proposition 4.3 (Information and Independence). $H_\mu(\alpha\vee\beta)\le H_\mu(\alpha)+H_\mu(\beta)$, with equality iff $\alpha,\beta$ are independent.

Proof. $H_\mu(\alpha\vee\beta)=H_\mu(\alpha)+H_\mu(\beta)$ iff $H_\mu(\alpha|\beta)=H_\mu(\alpha)$. But
$$H_\mu(\alpha|\beta) = -\sum_{B\in\beta}\mu(B)\sum_{A\in\alpha}\mu(A|B)\log\mu(A|B).$$
Let $\varphi(t)=-t\log t$. For each $A\in\alpha$ we have $\mu(A)=\sum_B\mu(B)\mu(A|B)$, so
$$H_\mu(\alpha|\beta) = \sum_{A}\sum_{B}\mu(B)\varphi[\mu(A|B)],\qquad H_\mu(\alpha)=\sum_A\varphi[\mu(A)].$$
But $\varphi$ is strictly concave, so $\sum_B\mu(B)\varphi[\mu(A|B)]\le\varphi[\mu(A)]$, with equality iff the $\mu(A|B)$ are equal for all $B$ s.t. $\mu(B)\ne0$.
In the case of equality we conclude that $\mu(A|B)=c(A)$ for all $B$ s.t. $\mu(B)\ne0$. For such $B$, $\mu(A\cap B)=c(A)\mu(B)$. Summing over $B$ gives $c(A)=\mu(A)$, and we obtain the independence condition. □
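The equality case is easy to watch numerically: take a genuine product measure, so that the partitions into first and second coordinates are independent and the entropies add exactly (the weights below are arbitrary illustrative choices):

```python
import math

# Product measure on {0,1,2} x {0,1}; alpha and beta are the partitions
# according to the first and second coordinates respectively.
pa = [0.2, 0.5, 0.3]
pb = [0.6, 0.4]
mu = {(i, j): pa[i] * pb[j] for i in range(3) for j in range(2)}

def H(label):
    # entropy of the partition whose atoms are the level sets of `label`
    w = {}
    for pt, m in mu.items():
        w[label(pt)] = w.get(label(pt), 0.0) + m
    return -sum(m * math.log2(m) for m in w.values() if m > 0)

Ha, Hb = H(lambda pt: pt[0]), H(lambda pt: pt[1])
Hab = H(lambda pt: pt)   # the join alpha v beta
assert abs(Hab - (Ha + Hb)) < 1e-9   # equality, since the coordinates are independent
print(Ha, Hb, Hab)
```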
4.3 The Metric Entropy
4.3.1 Definition and meaning
Definition 4.3 (Kolmogorov, Sinai). The metric entropy of a ppt $(X,\mathcal{B},\mu,T)$ is defined to be
$$h_\mu(T) := \sup\{h_\mu(T,\alpha) : \alpha\text{ is a countable measurable partition s.t. }H_\mu(\alpha)<\infty\},$$
where
$$h_\mu(T,\alpha) := \lim_{n\to\infty}\frac1n H_\mu\Big(\bigvee_{i=0}^{n-1}T^{-i}\alpha\Big).$$
Proposition 4.4. The limit which defines $h_\mu(T,\alpha)$ exists.

It can be shown that the supremum in the definition of $h_\mu(T)$ is attained by finite measurable partitions (problem 4.9).

Proof. Write $\alpha_0^{n-1} := \bigvee_{i=0}^{n-1}T^{-i}\alpha$. Then $a_n := H_\mu(\alpha_0^{n-1})$ is subadditive, because
$$a_{n+m} = H_\mu(\alpha_0^{n+m-1}) \le H_\mu(\alpha_0^{n-1}) + H_\mu(T^{-n}\alpha_0^{m-1}) = a_n + a_m.$$
We claim that any sequence of numbers $\{a_n\}_{n\ge1}$ which satisfies $a_{n+m}\le a_n+a_m$ converges to a limit (possibly equal to minus infinity), and that this limit is $\inf[a_n/n]$.
Fix $n$. Then for every $m$, $m = kn+r$ with $0\le r\le n-1$, so $a_m\le ka_n+a_r$. Dividing by $m$, we get that for all $m>n$
$$\frac{a_m}{m}\le\frac{ka_n+a_r}{kn+r}\le\frac{a_n}{n}+\frac{a_r}{m},$$
whence $\limsup(a_m/m)\le a_n/n$. Since this is true for all $n$, $\limsup a_m/m\le\inf a_n/n$. But it is obvious that $\liminf a_m/m\ge\inf a_n/n$, so the limsup and liminf are equal, and their common value is $\inf a_n/n$.
We remark that in our case the limit is not minus infinity, because the $H_\mu(\bigvee_{i=0}^{n-1}T^{-i}\alpha)$ are all non-negative. □
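The convergence $a_n/n\to\inf a_n/n$ for subadditive sequences (Fekete's lemma) can be watched numerically on a toy sequence, say $a_n = n+\sqrt n$ (chosen purely for illustration; here $\inf a_n/n = \lim a_n/n = 1$):

```python
import math, random

def a(n):
    # a subadditive sequence: sqrt is subadditive, so a(n+m) <= a(n) + a(m)
    return n + math.sqrt(n)

random.seed(1)
for _ in range(1000):
    n, m = random.randint(1, 500), random.randint(1, 500)
    assert a(n + m) <= a(n) + a(m) + 1e-12   # spot-check subadditivity

ratios = [a(n) / n for n in (1, 10, 100, 1000, 10000)]
assert all(r1 >= r2 for r1, r2 in zip(ratios, ratios[1:]))  # decreasing along this subsequence
assert abs(ratios[-1] - 1.0) < 0.02   # a(n)/n approaches inf a(n)/n = 1
print(ratios)
```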
$H_\mu(\alpha_0^{n-1})$ is the average information content in the first $n$ digits of the $\alpha$-itinerary. Dividing by $n$ gives the average information per unit time. Thus the entropy measures the maximal rate of information production the system is capable of generating.
It is also possible to think of entropy as a measure of unpredictability. Let's think of $T$ as moving backward in time. Then $\alpha_1^\infty := \sigma(\bigvee_{n=1}^\infty T^{-n}\alpha)$ contains the information on the past of the itinerary. Given the past, how unpredictable is the present, on average? This is measured by $H_\mu(\alpha|\alpha_1^\infty)$.
Theorem 4.2. If $H_\mu(\alpha)<\infty$, then $h_\mu(T,\alpha) = H_\mu(\alpha|\alpha_1^\infty)$, where $\alpha_1^\infty = \sigma\big(\bigvee_{n=1}^\infty T^{-n}\alpha\big)$.

Proof. Observe that
$$H_\mu(\alpha|\alpha_1^n) = H_\mu(\alpha_0^n) - H_\mu(T^{-1}\alpha_0^{n-1}) = H_\mu(\alpha_0^n) - H_\mu(\alpha_0^{n-1}).$$
Summing over $n$, we obtain
$$H_\mu(\alpha_0^n) - H_\mu(\alpha) = \sum_{k=1}^n H_\mu(\alpha|\alpha_1^k).$$
Dividing by $n$ and passing to the limit we get
$$h_\mu(T,\alpha) = \lim_{n\to\infty}\frac1n\sum_{k=1}^n H_\mu(\alpha|\alpha_1^k).$$
It is therefore enough to show that $H_\mu(\alpha|\alpha_1^k)\xrightarrow[k\to\infty]{}H_\mu(\alpha|\alpha_1^\infty)$.
This is dangerous! It is true that $H_\mu(\alpha|\alpha_1^k) = \int I_\mu(\alpha|\alpha_1^k)\,d\mu$ and that by the martingale convergence theorem
$$I_\mu(\alpha|\alpha_1^k)\xrightarrow[k\to\infty]{}I_\mu(\alpha|\alpha_1^\infty)\quad\text{a.e.}$$
But the claim that the integral of the limit is equal to the limit of the integrals requires justification.
If $|\alpha|<\infty$, then we can bypass the problem by writing
$$H_\mu(\alpha|\alpha_1^k) = \int\sum_{A\in\alpha}\varphi[\mu(A|\alpha_1^k)]\,d\mu,\quad\text{with }\varphi(t)=-t\log t,$$
and noting that the integrand is bounded (by $|\alpha|\max\varphi$). Thus the bounded convergence theorem applies and gives $H_\mu(\alpha|\alpha_1^k)\xrightarrow[k\to\infty]{}H_\mu(\alpha|\alpha_1^\infty)$.
If $|\alpha|=\infty$ (but $H_\mu(\alpha)<\infty$) then we need to be more clever, and appeal to the following lemma (proved below):

Lemma 4.2 (Chung–Neveu). Suppose $\alpha$ is a countable measurable partition with finite entropy; then the function $f_\alpha := \sup_{n\ge1}I_\mu(\alpha|\alpha_1^n)$ is absolutely integrable.

The result now follows from the dominated convergence theorem. □
Here is the proof of the Chung–Neveu Lemma. Fix $A\in\alpha$; then we may decompose $A\cap[f_\alpha>t] = \biguplus_{m\ge1}A\cap B_m(t;A)$, where
$$B_m(t;A) := \{x\in X : m\text{ is the minimal natural number s.t. }-\log_2\mu(A|\alpha_1^m) > t\}.$$
We have
$$\mu[A\cap B_m(t;A)] = E_\mu\big(1_A 1_{B_m(t;A)}\big) = E_\mu\Big(E_\mu\big(1_A 1_{B_m(t;A)}\,\big|\,\sigma(\alpha_1^m)\big)\Big)$$
$$= E_\mu\Big(1_{B_m(t;A)}\,E_\mu\big(1_A\,\big|\,\sigma(\alpha_1^m)\big)\Big),\quad\text{because }B_m(t;A)\in\sigma(\alpha_1^m)$$
$$= E_\mu\Big(1_{B_m(t;A)}\,2^{\log_2\mu(A|\sigma(\alpha_1^m))}\Big) \le E_\mu\big(1_{B_m(t;A)}\,2^{-t}\big) = 2^{-t}\,\mu[B_m(t;A)].$$
Summing over $m$ we see that $\mu(A\cap[f_\alpha>t])\le 2^{-t}$. Of course we also have $\mu(A\cap[f_\alpha>t])\le\mu(A)$. Thus $\mu(A\cap[f_\alpha>t])\le\min\{\mu(A),2^{-t}\}$.
We now use the following fact from measure theory: if $g\ge0$, then $\int g\,d\mu = \int_0^\infty\mu[g>t]\,dt$. (Proof: $\int_X g\,d\mu = \int_X\int_0^\infty 1_{[0\le t<g(x)]}(x,t)\,dt\,d\mu(x) = \int_0^\infty\int_X 1_{[g>t]}(x,t)\,d\mu(x)\,dt = \int_0^\infty\mu[g>t]\,dt$.) Thus
$$\int_A f_\alpha\,d\mu = \int_0^\infty\mu(A\cap[f_\alpha>t])\,dt \le \int_0^\infty\min\{\mu(A),2^{-t}\}\,dt$$
$$\le \int_0^{-\log_2\mu(A)}\mu(A)\,dt + \int_{-\log_2\mu(A)}^\infty 2^{-t}\,dt = -\mu(A)\log_2\mu(A) - \frac{2^{-t}}{\ln2}\bigg|_{-\log_2\mu(A)}^\infty = -\mu(A)\log_2\mu(A) + \frac{\mu(A)}{\ln2}.$$
Summing over $A\in\alpha$ we get that $\int f_\alpha\,d\mu \le H_\mu(\alpha)+(\ln2)^{-1}<\infty$. □
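The measure-theoretic fact used above, $\int g\,d\mu = \int_0^\infty\mu[g>t]\,dt$, is easy to sanity-check numerically on a finite measure space (the weights and values below are arbitrary illustrative choices):

```python
import random

random.seed(2)
w = [random.random() for _ in range(6)]
mu = [x / sum(w) for x in w]                 # a probability vector
g = [random.uniform(0, 5) for _ in range(6)]  # a nonnegative function

direct = sum(m * v for m, v in zip(mu, g))
# integrate t -> mu[g > t] over [0, 6] by a left Riemann sum
dt, T = 0.0005, 6.0
layer = sum(sum(m for m, v in zip(mu, g) if v > t) * dt
            for t in (i * dt for i in range(int(T / dt))))
assert abs(direct - layer) < 0.01   # the two integrals agree up to Riemann-sum error
print(direct, layer)
```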
4.3.2 The Shannon–McMillan–Breiman Theorem

Theorem 4.3 (Shannon–McMillan–Breiman). Let $(X,\mathcal{B},\mu,T)$ be an ergodic ppt, and $\alpha$ a countable measurable partition of finite entropy; then
$$\frac1n I_\mu(\alpha_0^{n-1})\xrightarrow[n\to\infty]{}h_\mu(T,\alpha)\quad\text{a.e.}$$
In particular, if $\alpha_n(x) :=$ the element of $\alpha_0^{n-1}$ which contains $x$, then
$$-\frac1n\log\mu(\alpha_n(x))\xrightarrow[n\to\infty]{}h_\mu(T,\alpha)\quad\text{a.e.}$$
Proof. We start with the basic identity $I_\mu(\alpha_0^{n-1}) = I_\mu(\alpha_1^{n-1}) + I_\mu(\alpha|\alpha_1^{n-1})$. This gives
$$I_\mu(\alpha_0^n) = I_\mu(\alpha|\alpha_1^n) + I_\mu(\alpha_0^{n-1})\circ T = I_\mu(\alpha|\alpha_1^n) + \big[I_\mu(\alpha|\alpha_1^{n-1}) + I_\mu(\alpha_0^{n-2})\circ T\big]\circ T$$
$$= \cdots = \sum_{k=0}^{n-1}I_\mu(\alpha|\alpha_1^{n-k})\circ T^k = \sum_{k=1}^{n}I_\mu(\alpha|\alpha_1^{k})\circ T^{n-k}.$$
By the Martingale Convergence Theorem, $I_\mu(\alpha|\alpha_1^k)\xrightarrow[k\to\infty]{}I_\mu(\alpha|\alpha_1^\infty)$ a.e. The idea of the proof is to use this to say
$$\lim_{n\to\infty}\frac1n I_\mu(\alpha_0^n) = \lim_{n\to\infty}\frac1n\sum_{k=1}^n I_\mu(\alpha|\alpha_1^k)\circ T^{n-k} \overset{?}{=} \lim_{n\to\infty}\frac1n\sum_{k=1}^n I_\mu(\alpha|\alpha_1^\infty)\circ T^{n-k}$$
$$= \lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}I_\mu(\alpha|\alpha_1^\infty)\circ T^k = \int I_\mu(\alpha|\alpha_1^\infty)\,d\mu\ \ \text{(Ergodic Theorem)} = H_\mu(\alpha|\alpha_1^\infty) = h_\mu(T,\alpha).$$
The point is to justify the question mark. Write $f_n := I_\mu(\alpha|\alpha_1^n)$ and $f_\infty := I_\mu(\alpha|\alpha_1^\infty)$. It is enough to show that
$$\int\limsup_{n\to\infty}\frac1n\sum_{k=0}^{n-1}|f_{n-k}-f_\infty|\circ T^k\,d\mu = 0.$$
(This implies that the limsup is zero almost everywhere.) Set $F_N := \sup_{k>N}|f_k-f_\infty|$. Then $F_N\to0$ almost everywhere. We claim that $F_N\to0$ in $L^1$. This is because of the dominated convergence theorem and the fact that $F_N\le 2f^* := 2\sup_m f_m\in L^1$ (Chung–Neveu Lemma). Fix some large $N$; then
$$\int\limsup_{n\to\infty}\frac1n\sum_{k=0}^{n-1}|f_{n-k}-f_\infty|\circ T^k\,d\mu$$
$$= \int\limsup_{n\to\infty}\frac1n\sum_{k=0}^{n-N-1}|f_{n-k}-f_\infty|\circ T^k\,d\mu + \int\limsup_{n\to\infty}\frac1n\sum_{k=n-N}^{n-1}|f_{n-k}-f_\infty|\circ T^k\,d\mu$$
$$\le \int\limsup_{n\to\infty}\frac1n\sum_{k=0}^{n-N-1}F_N\circ T^k\,d\mu + \int\limsup_{n\to\infty}\frac1n\Big(\sum_{k=0}^{N-1}2f^*\circ T^k\Big)\circ T^{n-N}\,d\mu$$
$$= \int F_N\,d\mu + \lim_{n\to\infty}\frac1n\int\sum_{k=0}^{N-1}2f^*\circ T^k\,d\mu = \int F_N\,d\mu.$$
Since $F_N\to0$ in $L^1$, $\int F_N\,d\mu\to0$, and this proves that the integral of the limsup is zero. □
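The second formulation is easy to watch numerically. For the $(p,1-p)$-Bernoulli scheme, $\mu(\alpha_n(x))$ is a product of symbol probabilities, and $-\frac1n\log_2\mu(\alpha_n(x))$ should converge a.e. to $-p\log_2p-(1-p)\log_2(1-p)$. A Monte Carlo sketch (the parameter $p=0.3$ and sample length are arbitrary choices):

```python
import math, random

random.seed(3)
p, n = 0.3, 20000
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy of the scheme

# sample one orbit of length n and accumulate log2 of the cylinder measure
logmu = 0.0
for _ in range(n):
    symbol_prob = p if random.random() < p else 1 - p
    logmu += math.log2(symbol_prob)
est = -logmu / n
assert abs(est - H) < 0.05   # SMB: the empirical rate is near the entropy
print(est, H)
```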
4.3.3 Sinai's Generator theorem

Let $\mathcal{F}_1,\mathcal{F}_2$ be two sub-$\sigma$-algebras of a probability space $(X,\mathcal{B},\mu)$. We write $\mathcal{F}_1\subseteq\mathcal{F}_2$ mod $\mu$ if $\forall F_1\in\mathcal{F}_1\ \exists F_2\in\mathcal{F}_2$ s.t. $\mu(F_1\triangle F_2)=0$. We write $\mathcal{F}_1=\mathcal{F}_2$ mod $\mu$ if both inclusions hold mod $\mu$. For example, $\mathcal{B}(\mathbb{R}) = \mathcal{B}_0(\mathbb{R})$ mod Lebesgue's measure. For every partition $\alpha$, let
$$\alpha_{-\infty}^{\infty} := \bigvee_{i=-\infty}^{\infty}T^{-i}\alpha,\qquad \alpha_0^{\infty} := \bigvee_{i=0}^{\infty}T^{-i}\alpha$$
denote the smallest $\sigma$-algebras generated by, respectively, $\bigcup_{i=-\infty}^{\infty}T^{-i}\alpha$ and $\bigcup_{i=0}^{\infty}T^{-i}\alpha$.

Definition 4.4. A countable measurable partition $\alpha$ is called a generator for an invertible $(X,\mathcal{B},\mu,T)$ if $\bigvee_{i=-\infty}^{\infty}T^{-i}\alpha = \mathcal{B}$ mod $\mu$, and a strong generator if $\bigvee_{i=0}^{\infty}T^{-i}\alpha = \mathcal{B}$ mod $\mu$.
(The latter definition makes sense in the non-invertible case as well.)
Example: $\alpha = \{[0,\frac12),[\frac12,1]\}$ is a strong generator for $Tx = 2x$ mod 1, because $\bigvee_{i=0}^{\infty}T^{-i}\alpha = \sigma\big(\bigcup_{n\ge1}\alpha_0^{n-1}\big)$ is the Borel $\sigma$-algebra (it contains all dyadic intervals, whence all open sets).

Theorem 4.4 (Sinai's Generator Theorem). Let $(X,\mathcal{B},\mu,T)$ be a ppt. If $\alpha$ is a generator of finite entropy, then $h_\mu(T) = h_\mu(T,\alpha)$.
Proof. Fix a finite measurable partition $\beta$; we must show that $h_\mu(T,\beta)\le h_\mu(T,\alpha)$.

Step 1. $h_\mu(T,\beta)\le h_\mu(T,\alpha)+H_\mu(\beta|\alpha)$.
$$\frac1n H_\mu(\beta_0^{n-1}) \le \frac1n\Big[H_\mu(\alpha_0^{n-1}) + H_\mu(\beta_0^{n-1}|\alpha_0^{n-1})\Big] \le \frac1n\Big[H_\mu(\alpha_0^{n-1}) + \sum_{k=0}^{n-1}H_\mu(T^{-k}\beta|\alpha_0^{n-1})\Big]$$
$$\le \frac1n\Big[H_\mu(\alpha_0^{n-1}) + \sum_{k=0}^{n-1}H_\mu(T^{-k}\beta|T^{-k}\alpha)\Big] = \frac1n H_\mu(\alpha_0^{n-1}) + H_\mu(\beta|\alpha).$$
Now pass to the limit.

Step 2. For every $N$, $h_\mu(T,\beta)\le h_\mu(T,\alpha)+H_\mu(\beta|\alpha_{-N}^N)$, where $\alpha_{-N}^N := \bigvee_{i=-N}^{N}T^{-i}\alpha$.
Repeat the previous step with $\alpha_{-N}^N$ instead of $\alpha$, and check that $h_\mu(T,\alpha_{-N}^N) = h_\mu(T,\alpha)$.

Step 3. $H_\mu(\beta|\alpha_{-N}^N)\xrightarrow[N\to\infty]{}H_\mu(\beta|\mathcal{B}) = 0$.
$$H_\mu(\beta|\alpha_{-N}^N) = \int I_\mu(\beta|\alpha_{-N}^N)\,d\mu = -\sum_{B\in\beta}\int 1_B\log\mu(B|\alpha_{-N}^N)\,d\mu$$
$$= -\sum_{B\in\beta}\int\mu(B|\alpha_{-N}^N)\log\mu(B|\alpha_{-N}^N)\,d\mu = \sum_{B\in\beta}\int\varphi[\mu(B|\alpha_{-N}^N)]\,d\mu$$
$$\xrightarrow[N\to\infty]{}\sum_{B\in\beta}\int\varphi[\mu(B|\mathcal{B})]\,d\mu = 0,$$
because $\mu(B|\mathcal{B}) = 1_B$, $\varphi[1_B] = 0$, and $|\beta|<\infty$ (so the martingale and bounded convergence theorems apply; here we use that $\alpha$ is a generator, so $\sigma\big(\bigcup_N\alpha_{-N}^N\big) = \mathcal{B}$ mod $\mu$).

This proves that $h_\mu(T,\alpha)\ge\sup\{h_\mu(T,\beta) : |\beta|<\infty\}$. Problem 4.9 says that this supremum is equal to $h_\mu(T)$, so we are done. □
4.4 Examples
4.4.1 Bernoulli schemes

Proposition 4.5. The entropy of the Bernoulli shift with probability vector $p$ is $-\sum p_i\log p_i$. Thus the $(\frac13,\frac13,\frac13)$-Bernoulli scheme and the $(\frac12,\frac12)$-Bernoulli scheme are not isomorphic.

Proof. $\alpha = \{[1],\dots,[n]\}$ is a strong generator, and
$$H_\mu(\alpha_0^{n-1}) = -\sum_{x_0,\dots,x_{n-1}}p_{x_0}\cdots p_{x_{n-1}}\big(\log p_{x_0}+\cdots+\log p_{x_{n-1}}\big) = -n\sum p_i\log p_i.\qquad\Box$$
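These entropies are one-line computations; a quick sketch (base-2 logarithm, matching the convention above):

```python
import math

def bernoulli_entropy(p):
    # entropy (in bits) of the Bernoulli scheme with probability vector p
    return -sum(x * math.log2(x) for x in p if x > 0)

h3 = bernoulli_entropy([1/3, 1/3, 1/3])
h2 = bernoulli_entropy([1/2, 1/2])
assert abs(h3 - math.log2(3)) < 1e-12
assert abs(h2 - 1.0) < 1e-12
print(h3, h2)   # log2(3) vs 1.0: different entropies, hence not isomorphic
```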
4.4.2 Irrational rotations

Proposition 4.6. The irrational rotation by $\theta$ has entropy zero w.r.t. the Haar measure.

Proof. The reason is that it is an invertible transformation with a strong generator. We first explain why any invertible map with a strong generator must have zero entropy. Suppose $\alpha$ is such a strong generator. Then
$$h_\mu(T,\alpha) = H_\mu(\alpha|\alpha_1^\infty) = H_\mu(T\alpha|T(\alpha_1^\infty)) = H_\mu(T\alpha|\alpha_0^\infty) = H_\mu(T\alpha|\mathcal{B}) = 0,\quad\text{because }T\alpha\subseteq\mathcal{B}.$$
We now claim that $\alpha := \{A_0,A_1\}$ (the two halves of the circle) is a strong generator. It is enough to show that for every $\varepsilon$, $\bigcup_{n\ge1}\alpha_0^{n-1}$ contains open covers of the circle by open arcs of diameter $<\varepsilon$ (because this forces $\alpha_0^\infty$ to contain all open sets).
It is enough to manufacture one arc of diameter less than $\varepsilon$, because the translations of this arc by $k\theta$ will eventually cover the circle.
But such an arc necessarily exists: choose some $n$ s.t. $n\theta$ mod 1 $\in(0,\varepsilon)$. Then $A_1\setminus T^{-n}A_1 = A_1\setminus(A_1-n\theta)$ is an arc of diameter less than $\varepsilon$. □
4.4.3 Markov chains

Proposition 4.7. Suppose $\mu$ is a shift invariant Markov measure with transition matrix $P = (p_{ij})$ and stationary probability vector $(p_i)$. Then $h_\mu(T) = -\sum_{i,j}p_ip_{ij}\log p_{ij}$.

Proof. The natural partition $\alpha = \{[a] : a\in S\}$ is a strong generator.
$$H_\mu(\alpha_0^n) = -\sum_{\xi_0,\dots,\xi_n\in S}\mu[\underline{\xi}]\log\mu[\underline{\xi}]$$
$$= -\sum_{\xi_0,\dots,\xi_n\in S}p_{\xi_0}p_{\xi_0\xi_1}\cdots p_{\xi_{n-1}\xi_n}\big(\log p_{\xi_0} + \log p_{\xi_0\xi_1} + \cdots + \log p_{\xi_{n-1}\xi_n}\big)$$
$$= -\sum_{j=0}^{n-1}\ \sum_{\xi_0,\dots,\xi_n\in S}p_{\xi_0}p_{\xi_0\xi_1}\cdots p_{\xi_{n-1}\xi_n}\log p_{\xi_j\xi_{j+1}} - \sum_{\xi_0,\dots,\xi_n\in S}p_{\xi_0}p_{\xi_0\xi_1}\cdots p_{\xi_{n-1}\xi_n}\log p_{\xi_0}.$$
Summing out the coordinates other than $\xi_j,\xi_{j+1}$ (using $\sum_k p_{ik} = 1$ and the stationarity relation $\sum_i p_ip_{ij} = p_j$, which gives $\mu[\text{the }j\text{-th coordinate is }\xi_j] = p_{\xi_j}$), this equals
$$-\sum_{j=0}^{n-1}\ \sum_{\xi_j,\xi_{j+1}\in S}p_{\xi_j}p_{\xi_j\xi_{j+1}}\log p_{\xi_j\xi_{j+1}} - \sum_{\xi_0\in S}p_{\xi_0}\log p_{\xi_0} = -n\sum_{i,j}p_ip_{ij}\log p_{ij} - \sum_i p_i\log p_i.$$
Now divide by n+1 and pass to the limit. □
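The computation is easy to cross-check numerically: since $H_\mu(\alpha_0^n) = H(p) + nh$ for a stationary chain, the difference of consecutive cylinder entropies must equal $h = -\sum p_ip_{ij}\log_2 p_{ij}$. A sketch with an arbitrary two-state transition matrix (the stationary vector is found here by simple iteration):

```python
import math

P = [[0.9, 0.1],
     [0.4, 0.6]]
# stationary probability vector: p P = p, obtained by iterating the chain
p = [0.5, 0.5]
for _ in range(200):
    p = [sum(p[i] * P[i][j] for i in range(2)) for j in range(2)]

h = -sum(p[i] * P[i][j] * math.log2(P[i][j])
         for i in range(2) for j in range(2) if P[i][j] > 0)

def Hn(n):
    # entropy of the partition into cylinders of length n+1
    cyls = [([i], p[i]) for i in range(2)]
    for _ in range(n):
        cyls = [(w + [j], q * P[w[-1]][j]) for w, q in cyls for j in range(2)]
    return -sum(q * math.log2(q) for _, q in cyls if q > 0)

assert abs((Hn(6) - Hn(5)) - h) < 1e-9   # consecutive differences give h
print(h)
```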
4.4.4 Expanding Markov Maps of the Interval

Theorem 4.5. Suppose $T:[0,1]\to[0,1]$ and $\alpha = \{I_1,\dots,I_N\}$ is a partition into intervals s.t.
1. $\alpha$ is a Markov partition;
2. the restriction of $T$ to each $I_j$ is $C^1$, monotonic, and $|T'|\ge\lambda>1$;
3. $T$ has an invariant measure $\mu$.
Then $h_\mu(T) = \int\log\frac{d\mu\circ T}{d\mu}\,d\mu$, where $(\mu\circ T)(E) = \sum_{A\in\alpha}\mu[T(A\cap E)]$.
Proof. One checks that the elements of $\alpha_0^{n-1}$ are all intervals of length $O(\lambda^{-n})$. Therefore $\alpha$ is a strong generator, whence
$$h_\mu(T) = h_\mu(T,\alpha) = H_\mu(\alpha|\alpha_1^\infty) = \int I_\mu(\alpha|\alpha_1^\infty)\,d\mu.$$
We calculate $I_\mu(\alpha|\alpha_1^\infty)$.
First note that $\alpha_1^\infty = T^{-1}(\alpha_0^\infty) = T^{-1}\mathcal{B}$; thus $I_\mu(\alpha|\alpha_1^\infty) = -\sum_{A\in\alpha}1_A\log\mu(A|T^{-1}\mathcal{B})$.
We need to calculate $E(\cdot|T^{-1}\mathcal{B})$. For this purpose, introduce the operator $\widehat{T}: L^1\to L^1$ given by
$$(\widehat{T}f)(x) = \sum_{Ty=x}\frac{d\mu}{d\mu\circ T}(y)\,f(y).$$
Exercise: verify that $\forall\varphi\in L^\infty$ and $f\in L^1$, $\int\widehat{T}f\cdot\varphi\,d\mu = \int f\cdot\varphi\circ T\,d\mu$.
We claim that $E(f|T^{-1}\mathcal{B}) = (\widehat{T}f)\circ T$. Indeed, the $T^{-1}\mathcal{B}$-measurable functions are exactly the functions of the form $\varphi\circ T$ with $\varphi$ $\mathcal{B}$-measurable; therefore $(\widehat{T}f)\circ T$ is $T^{-1}\mathcal{B}$-measurable, and
$$\int\varphi\circ T\cdot(\widehat{T}f)\circ T\,d\mu = \int\varphi\,\widehat{T}f\,d\mu = \int\varphi\circ T\cdot f\,d\mu,$$
proving the identity.
We can now calculate and see that
$$I_\mu(\alpha|\alpha_1^\infty) = -\sum_{A\in\alpha}1_A(x)\log E(1_A|T^{-1}\mathcal{B})(x) = -\sum_{A\in\alpha}1_A(x)\log\sum_{Ty=Tx}\frac{d\mu}{d\mu\circ T}(y)1_A(y)$$
$$= -\sum_{A\in\alpha}1_A(x)\log\frac{d\mu}{d\mu\circ T}(x) = -\log\frac{d\mu}{d\mu\circ T}(x),$$
because $T$ is injective on each $A\in\alpha$, so the only $y\in A$ with $Ty = Tx$ is $y = x$.
We conclude that $h_\mu(T) = \int\log\frac{d\mu\circ T}{d\mu}(x)\,d\mu(x)$. □
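For a sanity check of the formula, consider a toy example (not from the text): the piecewise linear map $T(x)=\frac32x$ on $[0,\frac23]$, $T(x)=3(x-\frac23)$ on $(\frac23,1]$ is Markov with respect to these two intervals and preserves Lebesgue measure, and $\frac{d\mu\circ T}{d\mu} = |T'|$ on each branch. So the formula predicts $h = \frac23\log_2\frac32+\frac13\log_23$ bits; the sketch below compares the exact value with a Monte Carlo estimate of the integral:

```python
import math, random

random.seed(4)

# Piecewise linear Markov map preserving Lebesgue measure:
#   T(x) = 3x/2       on [0, 2/3]   (slope 3/2)
#   T(x) = 3(x - 2/3) on (2/3, 1]   (slope 3)
def slope(x):
    return 1.5 if x <= 2 / 3 else 3.0

# Rokhlin-type formula: h = integral of log2 |T'| w.r.t. Lebesgue measure
exact = (2 / 3) * math.log2(1.5) + (1 / 3) * math.log2(3.0)
mc = sum(math.log2(slope(random.random())) for _ in range(200000)) / 200000
assert abs(mc - exact) < 0.01
print(exact, mc)
```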
4.5 Abramov's Formula

Suppose $(X,\mathcal{B},\mu,T)$ is a ppt. A set $A$ is called spanning if $X = \bigcup_{n=0}^{\infty}T^{-n}A$ mod $\mu$. If $T$ is ergodic, then every set of positive measure is spanning.

Theorem 4.6 (Abramov). Suppose $(X,\mathcal{B},\mu,T)$ is a ppt on a Lebesgue space, let $A$ be a spanning measurable set, and let $(A,\mathcal{B}_A,\mu_A,T_A)$ be the induced system; then
$$h_{\mu_A}(T_A) = \frac{1}{\mu(A)}\,h_\mu(T).$$
Proof. (Scheller) We prove the theorem in the case when $T$ is invertible. The non-invertible case is handled by passing to the natural extension.

The idea is to show, for as many partitions as possible, that $h_\mu(T,\beta) = \mu(A)\,h_{\mu_A}(T_A,\beta\cap A)$, where $\beta\cap A := \{E\cap A : E\in\beta\}$. As it turns out, this is the case for all partitions $\beta$ s.t. (a) $H_\mu(\beta)<\infty$; (b) $A^c\in\beta$; and (c) $\forall n$, $T_A[\varphi_A=n]\in\sigma(\beta)$. Here, as always, $\varphi_A(x) := 1_A(x)\inf\{n\ge1 : T^nx\in A\}$ (the first return time).

To see that there are such partitions, we let
$$\beta_A := \{A^c\}\cup T_A\eta_A,\quad\text{where }\eta_A := \{[\varphi_A=n] : n\in\mathbb{N}\}$$
(the coarsest possible), and show that $H_\mu(\beta_A)<\infty$. A routine calculation shows that $H_\mu(\beta_A) = H_\mu(\{A,A^c\}) + \mu(A)H_{\mu_A}(T_A\eta_A) \le 1 + H_{\mu_A}(\eta_A)$. It is thus enough to show that $-\sum p_n\log_2 p_n < \infty$, where $p_n := \mu_A[\varphi_A=n]$. This is because $\sum np_n = 1/\mu(A)$ (Kac formula) and the following fact from calculus: probability vectors with finite expectations have finite entropy. (Proof: enumerate $(p_n)$ in decreasing order $p_{n_1}\ge p_{n_2}\ge\cdots$. If $C=\sum np_n$, then $C\ge\sum_{i=1}^k n_ip_{n_i}\ge p_{n_k}(1+\cdots+k)$, whence $p_{n_k}=O(k^{-2})$. Since $-x\log x = O(x^{1-\varepsilon})$ as $x\to0^+$, this means that $-p_{n_k}\log p_{n_k} = O(k^{-2(1-\varepsilon)})$, which is summable for small $\varepsilon$, and so $-\sum p_n\log p_n = -\sum_k p_{n_k}\log p_{n_k} < \infty$.)

Assume now that $\beta$ is a partition which satisfies (a)-(c) above. We will use throughout the following fact:
$$A,\ A^c,\ [\varphi_A=n] \in \beta_1^\infty. \tag{4.1}$$
Here is why: $[\varphi_A=n] = T^{-n}T_A[\varphi_A=n] \in T^{-n}\sigma(\beta)\subseteq\beta_1^\infty$. Since $A = \biguplus_{n\ge1}[\varphi_A=n]$, we automatically have $A,A^c\in\beta_1^\infty$.

Let $\beta$ be a finite entropy countable measurable partition of $X$ such that $A^c$ is an atom of $\beta$ and such that $\beta\ge\beta_A$. In what follows we use the notation $\beta\cap A := \{B\cap A : B\in\beta\}$, $\beta_1^\infty\cap A := \{B\cap A : B\in\beta_1^\infty\}$. Since $H_\mu(\beta)<\infty$,
$$h_\mu(T,\beta) = H_\mu(\beta|\beta_1^\infty) = -\int\sum_{B\in\beta\cap A}1_B\log\mu(B|\beta_1^\infty)\,d\mu - \int 1_{A^c}\log\mu(A^c|\beta_1^\infty)\,d\mu$$
$$= -\int\sum_{B\in\beta\cap A}1_B\log\mu(B|\beta_1^\infty)\,d\mu,\quad\text{because }A^c\in\beta_1^\infty\text{ (so the second integral vanishes)}$$
$$= \mu(A)\Big(-\int_A\sum_{B\in\beta\cap A}1_B\log\mu_A(B|\beta_1^\infty\cap A)\,d\mu_A\Big),$$
because $A\in\beta_1^\infty$ and $B\subseteq A$ imply $E_\mu(1_B|\mathcal{F}) = 1_A\,E_{\mu_A}(1_B|A\cap\mathcal{F})$.

It follows that $h_\mu(T,\beta) = \mu(A)H_{\mu_A}(\beta\cap A\,|\,\beta_1^\infty\cap A)$. We will show later that
$$\beta_1^\infty\cap A = \sigma\Big(\bigvee_{i=1}^{\infty}T_A^{-i}(\beta\cap A)\Big). \tag{4.2}$$
This implies that $h_\mu(T,\beta) = \mu(A)\,h_{\mu_A}(T_A,\beta\cap A)$. Passing to the supremum over all $\beta$ which contain $A^c$ as an atom, we obtain
$$\mu(A)\,h_{\mu_A}(T_A) = \sup\{h_\mu(T,\beta) : \beta\ge\beta_A,\ A^c\in\beta,\ H_\mu(\beta)<\infty\}$$
$$= \text{entropy of }(X,\mathcal{B}',\mu,T),\quad\mathcal{B}' := \sigma\Big(\bigvee\{\beta : A^c\in\beta,\ H_\mu(\beta)<\infty\}\Big)$$
(see problem 4.11).

Now $\mathcal{B}'=\mathcal{B}$ mod $\mu$, because $A$ is spanning, so $\forall E\in\mathcal{B}$, $E = \bigcup_{n=0}^{\infty}T^{-n}(T^nE\cap A)$ mod $\mu$, whence $E\in\mathcal{B}'$ mod $\mu$. This shows Abramov's formula, given (4.2).
The proof of (4.2):
Proof of $\subseteq$: Suppose $B$ is an atom of $(\beta\vee\bigvee_{j=1}^{n}T^{-j}\beta)\cap A$; then $B$ has the form $A\cap\bigcap_{j=1}^{n}T^{-j}A_j$ where $A_j\in\beta$. Let $j_1<j_2<\cdots<j_N$ be an enumeration of the $j$'s s.t. $A_j\subseteq A$ (possibly an empty list). Since $A^c$ is an atom of $\beta$, $A_j = A^c$ for $j$ not in this list, and so
$$B = \bigcap_{k=1}^{N-1}T_A^{-k}\big(A_{j_k}\cap[\varphi_A=j_{k+1}-j_k]\big)\cap T_A^{-N}\big(A_{j_N}\cap[\varphi_A>n-j_N]\big).$$
Since $T_A\eta_A\le\beta\cap A$, each factor belongs to $\sigma\big(\bigvee_{i\ge1}T_A^{-i}(\beta\cap A)\big)$, whence $B\in\bigvee_{i=1}^{\infty}T_A^{-i}(\beta\cap A)$.

Proof of $\supseteq$: $T_A^{-1}(\beta\cap A)\subseteq A\cap\bigvee_{i=1}^{\infty}T^{-i}\beta$, because if $B\in\beta\cap A$, then
$$T_A^{-1}B = \biguplus_{n=1}^{\infty}T^{-n}\big(B\cap T_A[\varphi_A=n]\big) \in \bigvee_{n\ge1}T^{-n}\sigma(\beta) \subseteq \beta_1^\infty,$$
since $B\cap T_A[\varphi_A=n]\in\sigma(\beta)$ by (b) and (c).
The same proof shows that $T_A^{-1}(T^{-n}\beta\cap A)\subseteq A\cap\bigvee_{i=1}^{\infty}T^{-i}\beta$. It follows that
$$T_A^{-2}(\beta\cap A)\subseteq T_A^{-1}\Big(A\cap\bigvee_{i=1}^{\infty}T^{-i}\beta\Big)\subseteq A\cap\bigvee_{i=1}^{\infty}T_A^{-1}(A\cap T^{-i}\beta)\subseteq A\cap\bigvee_{i=1}^{\infty}T^{-i}\beta.$$
Iterating this procedure we see that $T_A^{-n}(\beta\cap A)\subseteq A\cap\bigvee_{i=1}^{\infty}T^{-i}\beta$ for all $n$, and $\supseteq$ follows. □
4.6 Topological Entropy
Suppose $T: X\to X$ is a continuous mapping of a compact topological space $(X,d)$. Such a map can have many different invariant Borel probability measures. For example, the left shift on $\{0,1\}^{\mathbb{N}}$ has an abundance of Bernoulli measures and Markov measures, and there are many others.
Different measures may have different entropies. What is the largest possible value? We study this question in the context of continuous maps on topological spaces which are compact and metric.

4.6.1 The Adler–Konheim–McAndrew definition

Let $(X,d)$ be a compact metric space, and $T: X\to X$ a continuous map. Some terminology and notation:
1. An open cover of $X$ is a collection of open sets $\mathcal{U} = \{U_\alpha : \alpha\in\Lambda\}$ s.t. $X = \bigcup_\alpha U_\alpha$;
2. if $\mathcal{U} = \{U_\alpha : \alpha\in\Lambda\}$ is an open cover, then $T^{-k}\mathcal{U} := \{T^{-k}U_\alpha : \alpha\in\Lambda\}$. Since $T$ is continuous, this is another open cover;
3. if $\mathcal{U},\mathcal{V}$ are open covers, then $\mathcal{U}\vee\mathcal{V} := \{U\cap V : U\in\mathcal{U},\ V\in\mathcal{V}\}$.
Since $X$ is compact, every open cover of $X$ has a finite subcover. Define
$$N(\mathcal{U}) := \min\{\#\mathcal{V} : \mathcal{V}\subseteq\mathcal{U}\text{ is finite, and }X = \bigcup\mathcal{V}\}.$$
It is easy to check that $N(\cdot)$ is submultiplicative in the following sense:
$$N(\mathcal{U}\vee\mathcal{V}) \le N(\mathcal{U})\,N(\mathcal{V}),$$
so that $\log N(\cdot)$ is subadditive.

Definition 4.5. Suppose $T: X\to X$ is a continuous mapping of a compact metric space $(X,d)$, and let $\mathcal{U}$ be an open cover of $X$. The topological entropy of $\mathcal{U}$ is
$$h_{top}(T,\mathcal{U}) := \lim_{n\to\infty}\frac1n\log_2 N(\mathcal{U}_0^{n-1}),\quad\text{where }\mathcal{U}_0^{n-1} := \bigvee_{k=0}^{n-1}T^{-k}\mathcal{U}.$$
The limit exists because of the subadditivity of $\log N(\cdot)$: $a_n := \log N(\mathcal{U}_0^{n-1})$ satisfies $a_{m+n}\le a_m+a_n$, so $\lim a_n/n$ exists.

Definition 4.6. Suppose $T: X\to X$ is a continuous mapping of a compact metric space $(X,d)$; then the topological entropy of $T$ is the (possibly infinite)
$$h_{top}(T) := \sup\{h_{top}(T,\mathcal{U}) : \mathcal{U}\text{ is an open cover of }X\}.$$
The following theorem was first proved by Goodwyn.

Theorem 4.7. Suppose $T$ is a continuous mapping of a compact metric space; then every invariant Borel probability measure $\mu$ satisfies $h_\mu(T)\le h_{top}(T)$.
Proof. Eventually everything boils down to the following inequality, which can be checked using Lagrange multipliers: for every probability vector $(p_1,\dots,p_k)$,
$$-\sum_{i=1}^k p_i\log_2 p_i \le \log_2 k, \tag{4.3}$$
with equality iff $p_1=\cdots=p_k=1/k$.
Suppose $\mu$ is an invariant probability measure, and let $\alpha := \{A_1,\dots,A_k\}$ be a measurable partition.
We approximate $\alpha$ by a partition into sets with better topological properties. Fix $\varepsilon>0$ (to be determined later), and construct compact sets
$$B_j\subseteq A_j\ \text{s.t.}\ \mu(A_j\setminus B_j)<\varepsilon\quad(j=1,\dots,k).$$
Let $B_0 := X\setminus\bigcup_{j=1}^{k}B_j$ be the remainder (of measure less than $k\varepsilon$), and define $\beta := \{B_0;B_1,\dots,B_k\}$.
Step 1 in the proof of Sinai's theorem says that $h_\mu(T,\alpha)\le h_\mu(T,\beta)+H_\mu(\alpha|\beta)$. We claim that $H_\mu(\alpha|\beta)$ can be made uniformly bounded by a suitable choice of $\varepsilon = \varepsilon(k)$:
$$H_\mu(\alpha|\beta) = -\sum_{B\in\beta}\sum_{A\in\alpha}\mu(A\cap B)\log_2\mu(A|B)$$
$$= -\sum_{B\in\beta,\,B\ne B_0}\ \sum_{A\in\alpha}\mu(A\cap B)\log_2\mu(A|B) - \sum_{A\in\alpha}\mu(A\cap B_0)\log_2\mu(A|B_0)$$
$$= \sum_{i=1}^{k}\mu(B_i)\log_2 1 - \sum_{A\in\alpha}\mu(A\cap B_0)\log_2\mu(A|B_0)\qquad(\text{since }B_i\subseteq A_i)$$
$$= \mu(B_0)\Big(-\sum_{A\in\alpha}\mu(A|B_0)\log_2\mu(A|B_0)\Big) \le \mu(B_0)\log_2(\#\alpha) \le k\varepsilon\log_2 k.$$
If we choose $\varepsilon<1/(k\log_2 k)$, then we get $H_\mu(\alpha|\beta)\le1$, and
$$h_\mu(T,\alpha)\le h_\mu(T,\beta)+1. \tag{4.4}$$
We now create an open cover from $\beta$ by setting $\mathcal{U} := \{B_0\cup B_1,\dots,B_0\cup B_k\}$. This is a cover. To see that it is open note that
$$B_0\cup B_j = B_0\cup A_j = X\setminus\bigcup_{i\ne j}B_i$$
(the first equality because $A_j\setminus B_j\subseteq B_0$); this is open because the $B_i$ are compact, hence closed.
We compare the number of elements in $\mathcal{U}_0^{n-1}$ to the number of elements in $\beta_0^{n-1}$. Every element of $\mathcal{U}_0^{n-1}$ is of the form
$$(B_0\cup B_{i_0})\cap T^{-1}(B_0\cup B_{i_1})\cap\cdots\cap T^{-(n-1)}(B_0\cup B_{i_{n-1}}).$$
This can be written as a pairwise disjoint union of $2^n$ elements of $\beta_0^{n-1}$ (some of which may be empty sets). Thus every element of $\mathcal{U}_0^{n-1}$ contains at most $2^n$ elements of $\beta_0^{n-1}$. Forming the union over a subcover of $\mathcal{U}_0^{n-1}$ with cardinality $N(\mathcal{U}_0^{n-1})$, we get that $\#\beta_0^{n-1}\le 2^nN(\mathcal{U}_0^{n-1})$.
We now appeal to (4.3): $H_\mu(\beta_0^{n-1})\le\log_2(\#\beta_0^{n-1})\le\log_2 N(\mathcal{U}_0^{n-1})+n$. Dividing by $n$ and passing to the limit as $n\to\infty$, we see that $h_\mu(T,\beta)\le h_{top}(T,\mathcal{U})+1$. By (4.4),
$$h_\mu(T,\alpha)\le h_{top}(T,\mathcal{U})+2\le h_{top}(T)+2.$$
Passing to the supremum over all $\alpha$, we get that $h_\mu(T)\le h_{top}(T)+2$, and this holds for all continuous mappings $T$ and invariant Borel measures $\mu$. In particular, this holds for $T^n$ (note that $\mu\circ(T^n)^{-1}=\mu$): $h_\mu(T^n)\le h_{top}(T^n)+2$. But $h_\mu(T^n) = nh_\mu(T)$ and $h_{top}(T^n) = nh_{top}(T)$ (problems 4.4 and 4.13). Thus we get upon division by $n$ that $h_\mu(T)\le h_{top}(T)+(2/n)\xrightarrow[n\to\infty]{}h_{top}(T)$, which proves the theorem. □
In fact, $h_{top}(T) = \sup\{h_\mu(T) : \mu\text{ is an invariant Borel probability measure}\}$. But to prove this we need some more preparations. These are done in the next section.
4.6.2 Bowen's definition

We assume as usual that $(X,d)$ is a compact metric space, and that $T: X\to X$ is continuous. For every $n$ we define a new metric $d_n$ on $X$ as follows:
$$d_n(x,y) := \max_{0\le i\le n-1}d(T^ix,T^iy).$$
This is called Bowen's metric. It depends on $T$. A set $F\subseteq X$ is called $(n,\varepsilon)$-separated if for every $x,y\in F$ s.t. $x\ne y$, $d_n(x,y)>\varepsilon$.

Definition 4.7.
1. $s_n(\varepsilon,T) := \max\{\#F : F\text{ is }(n,\varepsilon)\text{-separated}\}$;
2. $s(\varepsilon,T) := \limsup_{n\to\infty}\frac1n\log s_n(\varepsilon,T)$;
3. $\overline{h}_{top}(T) := \lim_{\varepsilon\to0^+}s(\varepsilon,T)$.
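To get a feel for these quantities, one can lower-bound $s_n(\varepsilon,T)$ by greedily building an $(n,\varepsilon)$-separated set for a concrete map, say the doubling map $Tx = 2x$ mod 1, whose topological entropy is $\log_22 = 1$ bit. In the sketch below (the grid size and $\varepsilon$ are arbitrary choices), $\frac1n\log_2 s_n(\varepsilon,T)$ drifts down toward 1 plus an $\varepsilon$-dependent correction:

```python
import math

def sep_count(n, eps, grid=1024):
    # length-n orbits of the doubling map for every point of a finite grid
    orbits = []
    for k in range(grid):
        x, o = k / grid, []
        for _ in range(n):
            o.append(x)
            x = (2 * x) % 1.0
        orbits.append(o)

    def dn(i, j):
        # Bowen metric d_n, using the circle metric on each coordinate
        return max(min(abs(a - b), 1 - abs(a - b))
                   for a, b in zip(orbits[i], orbits[j]))

    # greedy construction of an (n, eps)-separated set: a lower bound for s_n
    chosen = []
    for i in range(grid):
        if all(dn(i, j) > eps for j in chosen):
            chosen.append(i)
    return len(chosen)

rates = [math.log2(sep_count(n, 0.1)) / n for n in (2, 4, 6)]
print(rates)   # decreasing toward h_top = 1 bit
```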
Theorem 4.8 (Bowen). Suppose $T$ is a continuous mapping of a compact metric space $X$; then $\overline{h}_{top}(T) = h_{top}(T)$.

Proof. Suppose $\mathcal{U}$ is an open cover all of whose elements have diameters less than $\varepsilon$. We claim that $N(\mathcal{U}_0^{n-1})\ge s_n(\varepsilon,T)$ for all $n$. To see this suppose $F$ is an $(n,\varepsilon)$-separated set of maximal cardinality, and fix a subcover of $\mathcal{U}_0^{n-1}$ of cardinality $N(\mathcal{U}_0^{n-1})$. Each $x\in F$ is contained in some $U_x$ from this subcover. Since the $d$-diameter of every element of $\mathcal{U}$ is less than $\varepsilon$, the $d_n$-diameter of every element of $\mathcal{U}_0^{n-1}$ is less than $\varepsilon$. Thus the assignment $x\mapsto U_x$ is one-to-one, whence $N(\mathcal{U}_0^{n-1})\ge s_n(\varepsilon,T)$.
It follows that $s(\varepsilon,T)\le h_{top}(T,\mathcal{U})\le h_{top}(T)$, whence $\overline{h}_{top}(T)\le h_{top}(T)$.
To see the other inequality we use Lebesgue numbers: a number $\delta$ is called a Lebesgue number for an open cover $\mathcal{U}$ if for every $x\in X$, the ball with radius $\delta$ and center $x$ is contained in some element of $\mathcal{U}$. (Lebesgue numbers exist because of compactness.)
Fix $\varepsilon$ and let $\mathcal{U}$ be an open cover with Lebesgue number bigger than or equal to $\varepsilon$. It is easy to check that for every $n$, $\varepsilon$ is a Lebesgue number for $\mathcal{U}_0^{n-1}$ w.r.t. $d_n$.
Let $F$ be an $(n,\varepsilon/2)$-separated set of maximal cardinality, i.e. $\#F = s_n(\varepsilon/2,T)$. Then any point $y$ we add to $F$ will break the $(n,\varepsilon/2)$-separation property, and so
$$\forall y\in X\ \exists x\in F\ \text{s.t.}\ d_n(x,y)\le\varepsilon/2.$$
It follows that the sets $B_n(x;\varepsilon/2) := \{y : d_n(x,y)\le\varepsilon/2\}$ ($x\in F$) cover $X$.
Every $B_n(x;\varepsilon/2)$ ($x\in F$) is contained in some element of $\mathcal{U}_0^{n-1}$, because $\mathcal{U}_0^{n-1}$ has Lebesgue number $\varepsilon$ w.r.t. $d_n$. The union of these elements covers $X$. We found a subcover of $\mathcal{U}_0^{n-1}$ of cardinality at most $\#F = s_n(\varepsilon/2,T)$. This shows that
$$N(\mathcal{U}_0^{n-1})\le s_n(\varepsilon/2,T).$$
We just proved that for every open cover $\mathcal{U}$ with Lebesgue number at least $\varepsilon$, $h_{top}(T,\mathcal{U})\le s(\varepsilon/2,T)$. It follows that
$$\sup\{h_{top}(T,\mathcal{U}) : \mathcal{U}\text{ has Lebesgue number at least }\varepsilon\}\le s(\varepsilon/2,T).$$
We now pass to the limit $\varepsilon\to0^+$. The left hand side tends to the supremum over all open covers, which is equal to $h_{top}(T)$. We obtain $h_{top}(T)\le\lim_{\varepsilon\to0^+}s(\varepsilon/2,T) = \overline{h}_{top}(T)$. □
Corollary 4.1. Suppose $T$ is an isometry; then all its invariant probability measures have entropy zero.

Proof. If $T$ is an isometry, then $d_n = d$ for all $n$; therefore $s(\varepsilon,T) = 0$ for all $\varepsilon>0$, so $\overline{h}_{top}(T) = 0$. The theorem says that $h_{top}(T) = 0$. The corollary follows from Goodwyn's theorem. □
4.6.3 The variational principle

The following theorem was first proved under additional assumptions by Dinaburg, and then in the general case by Goodman. The proof below is due to Misiurewicz.

Theorem 4.9 (Variational principle). Suppose $T: X\to X$ is a continuous map of a compact metric space; then $h_{top}(T) = \sup\{h_\mu(T) : \mu\text{ is an invariant Borel measure}\}$.
Proof. We have already seen that the topological entropy is an upper bound for the metric entropies. We just need to show that this is the least upper bound.
Fix $\varepsilon$, and let $F_n$ be a sequence of $(n,\varepsilon)$-separated sets of maximal cardinality (so $\#F_n = s_n(\varepsilon,T)$). Let
$$\sigma_n := \frac{1}{\#F_n}\sum_{x\in F_n}\delta_x,$$
where $\delta_x$ denotes the Dirac measure at $x$ (i.e. $\delta_x(E)=1_E(x)$). These measures are not invariant, so we set
$$\mu_n := \frac1n\sum_{k=0}^{n-1}\sigma_n\circ T^{-k}.$$
Any weak star limit of $\mu_n$ will be $T$-invariant (check!).
Fix some sequence $n_k$ s.t. $\mu_{n_k}\xrightarrow{w^*}\mu$ and s.t. $\frac{1}{n_k}\log s_{n_k}(\varepsilon,T)\xrightarrow[k\to\infty]{}s(\varepsilon,T)$. We show that the entropy of $\mu$ is at least $s(\varepsilon,T)$. Since $s(\varepsilon,T)\xrightarrow[\varepsilon\to0^+]{}h_{top}(T)$, this will prove the theorem.
Let $\alpha = \{A_1,\dots,A_N\}$ be a measurable partition of $X$ s.t. (1) $\operatorname{diam}(A_i)<\varepsilon$; and (2) $\mu(\partial A_i)=0$. (Such a partition can be generated from a cover of $X$ by balls of radius less than $\varepsilon/2$ with boundaries of zero measure.) It is easy to see that the $d_n$-diameter of every element of $\alpha_0^{n-1}$ is also less than $\varepsilon$. It is an exercise to see that every element of $\alpha_0^{n-1}$ has boundary with measure equal to zero.
We calculate $H_{\sigma_n}(\alpha_0^{n-1})$. Since $F_n$ is $(n,\varepsilon)$-separated and every atom of $\alpha_0^{n-1}$ has $d_n$-diameter less than $\varepsilon$, $\alpha_0^{n-1}$ has $\#F_n$ elements whose $\sigma_n$-measure is $1/\#F_n$, and the other elements of $\alpha_0^{n-1}$ have measure zero. Thus
$$H_{\sigma_n}(\alpha_0^{n-1}) = \log_2\#F_n = \log_2 s_n(\varepsilon,T).$$
We now play with $H_{\sigma_n}(\alpha_0^{n-1})$ with the aim of bounding it by something involving a sum of the form $\sum_{i=0}^{n-1}H_{\sigma_n\circ T^{-i}}(\alpha_0^{q-1})$. Fix $q$, and $j\in\{0,\dots,q-1\}$; then
$$\log_2 s_n(\varepsilon,T) = H_{\sigma_n}(\alpha_0^{n-1}) \le H_{\sigma_n}\Big(\alpha_0^{j-1}\vee\bigvee_{i=0}^{[n/q]-1}T^{-(qi+j)}\alpha_0^{q-1}\vee\alpha_{q([n/q]-1)+j+1}^{n-1}\Big)$$
$$\le \sum_{i=0}^{[n/q]-1}H_{\sigma_n\circ T^{-(qi+j)}}(\alpha_0^{q-1}) + 2q\log_2\#\alpha.$$
Summing over $j=0,\dots,q-1$, we get
$$q\log_2 s_n(\varepsilon,T) \le n\cdot\frac1n\sum_{k=0}^{n-1}H_{\sigma_n\circ T^{-k}}(\alpha_0^{q-1}) + 2q^2\log_2\#\alpha \le nH_{\mu_n}(\alpha_0^{q-1}) + 2q^2\log_2\#\alpha,$$
because $\mu_n = \frac1n\sum_{i=0}^{n-1}\sigma_n\circ T^{-i}$ and $\varphi(t)=-t\log_2 t$ is concave. Thus
$$\frac{1}{n_k}\log_2 s_{n_k}(\varepsilon,T) \le \frac1q H_{\mu_{n_k}}(\alpha_0^{q-1}) + \frac{2q}{n_k}\log_2\#\alpha, \tag{4.5}$$
where $n_k$ is the subsequence chosen above.
Since every $A\in\alpha_0^{q-1}$ satisfies $\mu(\partial A)=0$, $\mu_{n_k}(A)\xrightarrow[k\to\infty]{}\mu(A)$ for all $A\in\alpha_0^{q-1}$ (sandwich $u\le1_A\le v$ with $u,v$ continuous s.t. $\int(v-u)\,d\mu$ is small). It follows that $H_{\mu_{n_k}}(\alpha_0^{q-1})\xrightarrow[k\to\infty]{}H_\mu(\alpha_0^{q-1})$. Passing to the limit $k\to\infty$ in (4.5), we have $s(\varepsilon,T)\le\frac1qH_\mu(\alpha_0^{q-1})\xrightarrow[q\to\infty]{}h_\mu(T,\alpha)\le h_\mu(T)$. Thus $h_\mu(T)\ge s(\varepsilon,T)$. Since $s(\varepsilon,T)\xrightarrow[\varepsilon\to0^+]{}h_{top}(T)$, the theorem is proved. □
Problems
4.1. Prove: $H_\mu(\alpha|\beta) = -\sum_{B\in\beta}\mu(B)\sum_{A\in\alpha}\mu(A|B)\log\mu(A|B)$, where $\mu(A|B) = \frac{\mu(A\cap B)}{\mu(B)}$.

4.2. Prove: if $H_\mu(\alpha|\beta) = 0$, then $\alpha\le\beta$ mod $\mu$.

4.3. Prove that $h_\mu(T)$ is an invariant of measure theoretic isomorphism.

4.4. Prove that $h_\mu(T^n) = nh_\mu(T)$.

4.5. Prove that if $T$ is invertible, then $h_\mu(T^{-1}) = h_\mu(T)$.

4.6. Entropy is affine
Let $T$ be a measurable map on $X$, and $\mu_1,\mu_2$ be two $T$-invariant probability measures. Set $\mu = t\mu_1+(1-t)\mu_2$ ($0\le t\le1$). Show: $h_\mu(T) = th_{\mu_1}(T)+(1-t)h_{\mu_2}(T)$.
Guidance: Start by showing that for all $0\le x,y,t\le1$,
$$0 \le \varphi(tx+(1-t)y) - [t\varphi(x)+(1-t)\varphi(y)] \le -t\log t - (1-t)\log(1-t).$$

4.7. Let $(X,\mathcal{B},\mu)$ be a probability space. If $\alpha,\beta$ are two measurable partitions of $X$, then we write $\alpha = \beta$ mod $\mu$ if $\alpha = \{A_1,\dots,A_n\}$ and $\beta = \{B_1,\dots,B_n\}$ where $\mu(A_i\triangle B_i) = 0$ for all $i$. Let $\mathcal{P}$ denote the set of all countable measurable partitions of $X$, modulo the equivalence relation $\alpha = \beta$ mod $\mu$. Show that
$$\rho(\alpha,\beta) := H_\mu(\alpha|\beta) + H_\mu(\beta|\alpha)$$
induces a metric on $\mathcal{P}$.

4.8. Let $(X,\mathcal{B},\mu,T)$ be a ppt. Show that $|h_\mu(T,\alpha)-h_\mu(T,\beta)| \le H_\mu(\alpha|\beta)+H_\mu(\beta|\alpha)$.

4.9. Use the previous problem to show that the supremum in the definition of metric entropy is attained by finite measurable partitions.

4.10. Suppose $\alpha = \{A_1,\dots,A_n\}$ is a finite measurable partition. Show that for every $\varepsilon$, there exists $\delta = \delta(\varepsilon,n)$ such that if $\beta = \{B_1,\dots,B_n\}$ is a measurable partition s.t. $\mu(A_i\triangle B_i)<\delta$, then $\rho(\alpha,\beta)<\varepsilon$.

4.11. Entropy via generating sequences of partitions
Suppose $(X,\mathcal{B},\mu)$ is a probability space, and $\mathcal{A}$ is an algebra of $\mathcal{B}$-measurable subsets (namely a collection of sets which contains $\emptyset$ and which is closed under finite unions, finite intersections, and forming complements). Suppose $\mathcal{A}$ generates $\mathcal{B}$ (i.e. $\mathcal{B}$ is the smallest $\sigma$-algebra which contains $\mathcal{A}$).
1. Show that for every $F\in\mathcal{B}$ and $\varepsilon>0$, there exists $A\in\mathcal{A}$ s.t. $\mu(A\triangle F)<\varepsilon$.
2. Show that for every $\mathcal{B}$-measurable finite partition $\beta$ and $\varepsilon>0$, there exists an $\mathcal{A}$-measurable finite partition $\alpha$ s.t. $\rho(\alpha,\beta)<\varepsilon$.
3. If $T: X\to X$ is probability preserving, then
$$h_\mu(T) = \sup\{h_\mu(T,\alpha) : \alpha\text{ is an }\mathcal{A}\text{-measurable finite partition}\}.$$
4. Suppose $\alpha_1\le\alpha_2\le\cdots$ is an increasing sequence of finite measurable partitions such that $\sigma\big(\bigcup_{n\ge1}\alpha_n\big) = \mathcal{B}$ mod $\mu$; then $h_\mu(T) = \lim_{n\to\infty}h_\mu(T,\alpha_n)$.

4.12. Show that the entropy of the product of two ppt's is the sum of their entropies.

4.13. Show that $h_{top}(T^n) = nh_{top}(T)$.
Notes to chapter 4
The notion of entropy as a measure of information is due to Shannon, the father of the modern theory of information. Kolmogorov had the idea to adapt this notion to the ergodic theoretic context for the purpose of inventing an invariant able to distinguish Bernoulli schemes. This became possible once Sinai proved his "generator theorem", which enables the calculation of this invariant for Bernoulli schemes. Later, in the 1970s, Ornstein proved that entropy is a complete invariant for Bernoulli schemes: they are isomorphic iff they have the same entropy. The maximum of the possible entropies for a topological Markov shift was first calculated by Parry, who also found the maximizing measure. The material in this chapter is all classical; [3] and [1] are both excellent references. For an introduction to Ornstein's isomorphism theorem, see [2].
References
1. Petersen, K.: Ergodic theory. Corrected reprint of the 1983 original. Cambridge Studies in
Advanced Mathematics, 2. Cambridge University Press, Cambridge, 1989. xii+329 pp.
2. Rudolph, D.: Fundamentals of measurable dynamics: Ergodic theory on Lebesgue spaces,
Oxford Science Publications, 1990. x+168 pp.
3. Walters, P.: An introduction to ergodic theory. Graduate Texts in Mathematics, 79. Springer-
Verlag, New York-Berlin, 1982. ix+250 pp.
Appendix A
The isomorphism theorem for standard measure spaces
A.1 Polish spaces
Definition A.1. A Polish space is a metric space $(X,d)$ which is
1. complete (every Cauchy sequence has a limit);
2. and separable (there is a countable dense subset).
Every compact metric space is Polish. But a Polish space need not be compact, or even locally compact. For example,
$$\mathbb{N}^{\mathbb{N}} := \{x = (x_1,x_2,x_3,\dots) : x_k\in\mathbb{N}\}$$
equipped with the metric $d(x,y) := \sum_{k\ge1}2^{-k}\frac{|x_k-y_k|}{1+|x_k-y_k|}$ is a non locally compact Polish metric space.
Notation: $B(x,r) := \{y\in X : d(x,y)<r\}$ (the open ball with center $x$ and radius $r$).
Proposition A.1. Suppose $(X,d)$ is a Polish space, then
1. Second axiom of countability: there exists a countable family of open sets $\mathcal{U}$ such that every open set in $X$ is a union of a subfamily of $\mathcal{U}$.
2. Lindelöf property: every cover of $X$ by open sets has a countable sub-cover.
3. The intersection of any decreasing sequence of closed balls whose radii tend to zero is a single point.

Proof. Since $X$ is separable, it contains a countable dense set $\{x_n\}_{n\ge1}$. Define
$$\mathcal{U} := \{B(x_n,r) : n\in\mathbb{N},\ r\in\mathbb{Q}\}.$$
This is a countable collection, and we claim that it satisfies (1). Take some open set $U$. For every $x\in U$ there are
1. $R>0$ such that $B(x,R)\subseteq U$ (because $U$ is open);
2. $x_n\in B(x,R/2)$ (because $\{x_n\}$ is dense); and
3. $r\in\mathbb{Q}$ such that $d(x,x_n)<r<R/2$.
It is easy to check that $x\in B(x_n,r)\subseteq B(x,2r)\subseteq B(x,R)\subseteq U$. Thus for every $x\in U$ there is $U_x\in\mathcal{U}$ s.t. $x\in U_x\subseteq U$. It follows that $U$ is a union of elements from $\mathcal{U}$, and (1) is proved. (2) is an immediate consequence.
To see (3), suppose $B_n := \overline{B}(z_n,r_n)$ is a sequence of closed balls such that $B_n\supseteq B_{n+1}$ and $r_n\to0$. It is easy to verify that $\{z_n\}$ is a Cauchy sequence. Since $X$ is complete, it converges to a limit $z$. This limit belongs to $B_n$ because $z_k\in B_k\subseteq B_n$ for all $k>n$, and $B_n$ is closed. Thus the intersection of the $B_n$ contains at least one point. It cannot contain more than one point, because its diameter is zero (it is bounded by $\operatorname{diam}[B_n]\le2r_n\to0$). □
A.2 Standard probability spaces
Definition A.2. A standard probability space is a probability space $(X,\mathcal{B},\mu)$ where $X$ is Polish, $\mathcal{B}$ is the $\sigma$-algebra of Borel sets of $X$, and $\mu$ is a Borel probability measure.

Theorem A.1. Suppose $(X,\mathcal{B},\mu)$ is a standard probability space, then
1. Regularity: suppose $E\in\mathcal{B}$. For every $\varepsilon>0$ there exist an open set $U$ and a closed set $F$ such that $F\subseteq E\subseteq U$ and $\mu(U\setminus F)<\varepsilon$.
2. Separability: there exists a countable collection of measurable sets $\{E_n\}_{n\ge1}$ such that for every $E\in\mathcal{B}$ and $\varepsilon>0$ there exists some $n$ s.t. $\mu(E\triangle E_n)<\varepsilon$. Equivalently, $L^p(X,\mathcal{B},\mu)$ is separable for all $1\le p<\infty$.
Proof. Say that a set $E$ satisfies the approximation property if for every $\varepsilon$ there are a closed set $F$ and an open set $U$ s.t. $F\subseteq E\subseteq U$ and $\mu(U\smallsetminus F)<\varepsilon$.
Open balls $B(x,r)$ have the approximation property: take $U=B(x,r)$ and $F=\bar B(x,r-\tfrac1n)$ for $n$ sufficiently large (these sets increase to $B(x,r)$, so their measure tends to that of $B(x,r)$).
Open sets $U$ have the approximation property. The approximating open set is the set itself. To find the approximating closed set, use the second axiom of countability to write the open set as the countable union of balls $B_n$, and approximate each $B_n$ from within by a closed set $F_n$ such that $\mu(B_n\smallsetminus F_n)<\varepsilon/2^{n+1}$. Then $\mu(U\smallsetminus\bigcup_{n\geq1}F_n)<\varepsilon/2$. Now take $F:=\bigcup_{i=1}^{N}F_i$ for $N$ large enough.
Thus the collection $\mathscr{C}:=\{E\in\mathscr{B}: E\text{ has the approximation property}\}$ contains the open sets. Since it is a $\sigma$-algebra (check!), it must be equal to $\mathscr{B}$, proving (1).
We prove separability. Polish spaces satisfy the second axiom of countability, so there is a countable family of open balls $\mathscr{U}=\{B_n: n\in\mathbb{N}\}$ such that every open set is the union of a countable subfamily of $\mathscr{U}$. This means that every open set can be approximated by a finite union of elements of $\mathscr{U}$ to arbitrary precision. By the regularity property shown above, every measurable set can be approximated by a finite union of elements of $\mathscr{U}$ to arbitrary precision. It remains to observe that $\mathscr{E}:=\{\text{finite unions of elements of }\mathscr{U}\}$ is countable.
The separability of $L^p$ for $1\leq p<\infty$ follows from the above and the obvious fact that the collection $\{\sum_{i=1}^{N}\alpha_i 1_{E_i}: N\in\mathbb{N},\ \alpha_i\in\mathbb{Q},\ E_i\in\mathscr{B}\}$ is dense in $L^p$ (prove!). The other direction is left to the reader. $\square$
The following statement will be used in the proof of the isomorphism theorem.
Lemma A.1. Suppose $(X,\mathscr{B},\mu)$ is a standard probability space and $E$ is a measurable set of positive measure, then there is a point $x\in X$ such that $\mu[E\cap B(x,r)]\neq0$ for all $r>0$.
Proof. Fix $\varepsilon_n\downarrow0$. Write $X$ as a countable union of open balls of radius $\varepsilon_1$ (second axiom of countability). At least one of these, $B_1$, satisfies $\mu(E\cap B_1)\neq0$. Write $B_1$ as a countable union of open balls of radius $\varepsilon_2$. At least one of these, $B_2$, satisfies $\mu[E\cap B_1\cap B_2]\neq0$. Continue in this manner. The result is a decreasing sequence of open balls with shrinking diameters $B_1\supseteq B_2\supseteq\cdots$ which intersect $E$ at a set of positive measure.
The sequence of centers of these balls is a Cauchy sequence. Since $X$ is polish, it converges to a limit $x\in X$. This $x$ belongs to the closure of each $B_n$.
For every $r$ find $n$ so large that $\varepsilon_n<r/2$. Since $x\in\bar B_n$, $d(x,x_n)\leq\varepsilon_n$, and this implies that $B(x,r)\supseteq B(x_n,\varepsilon_n)=B_n$. Since $B_n$ intersects $E$ with positive measure, $B(x,r)$ intersects $E$ with positive measure. $\square$
A.3 Atoms
Definition A.3. An atom of a measure space $(X,\mathscr{B},\mu)$ is a measurable set $A$ of non-zero measure with the property that for all other measurable sets $B$ contained in $A$, either $\mu(B)=\mu(A)$ or $\mu(B)=0$. A measure space is called non-atomic if it has no atoms.
Proposition A.2. For standard spaces $(X,\mathscr{B},\mu)$, every atom is of the form $\{x\}\cup\{\text{null set}\}$ for some $x$ s.t. $\mu\{x\}\neq0$.
Proof. Suppose $A$ is an atom. Since $X$ can be covered by a countable collection of open balls of radius $r_1:=1$, $A=\bigcup_{i\geq1}A_i$ where the $A_i$ are measurable subsets of $A$ of diameter at most $r_1$. One of those sets, $A_{i_1}$, has non-zero measure. Since $A$ is an atom, $\mu(A_{i_1})=\mu(A)$. Setting $A^{(1)}:=A_{i_1}$, we see that
$$A^{(1)}\subseteq A,\quad \operatorname{diam}(A^{(1)})\leq r_1,\quad \mu(A^{(1)})=\mu(A).$$
Of course $A^{(1)}$ is an atom.
Now repeat this argument with $A^{(1)}$ replacing $A$ and $r_2:=1/2$ replacing $r_1$. We obtain an atom $A^{(2)}$ s.t.
$$A^{(2)}\subseteq A^{(1)},\quad \operatorname{diam}(A^{(2)})\leq r_2,\quad \mu(A^{(2)})=\mu(A).$$
We continue in this manner, to obtain a sequence of atoms $A\supseteq A^{(1)}\supseteq A^{(2)}\supseteq\cdots$ of the same measure, with diameters $r_k=1/k\to0$. The intersection $\bigcap A^{(k)}$ is non-empty, because its measure is $\lim\mu(A^{(k)})=\mu(A)\neq0$. But its diameter is zero. Therefore it is a single point $\{x\}$, and by construction $\{x\}\subseteq A$ and $\mu\{x\}=\mu(A)$. $\square$
Lemma A.2. Suppose $(X,\mathscr{B},\mu)$ is a non-atomic standard probability space, and $r>0$. Every $\mathscr{B}$-measurable set $E$ can be written in the form $E=\biguplus_{i=1}^{\infty}F_i\uplus N$ where $\mu(N)=0$, the $F_i$ are closed, $\operatorname{diam}(F_i)<r$, and $\mu(F_i)\neq0$.
Proof. Since every measurable set is a finite or countable disjoint union of sets of diameter less than $r$ (prove!), it is enough to treat sets $E$ such that $\operatorname{diam}(E)<r$.
Standard spaces are regular, so we can find a closed set $F_1\subseteq E$ such that $\mu(E\smallsetminus F_1)<\tfrac12$. If $\mu(E\smallsetminus F_1)=0$, then stop. Otherwise apply the argument to $E\smallsetminus F_1$ to find a closed set $F_2\subseteq E\smallsetminus F_1$ of positive measure such that $\mu[E\smallsetminus(F_1\cup F_2)]<\tfrac1{2^2}$. Continuing in this manner, we obtain pairwise disjoint closed sets $F_i\subseteq E$ such that $\mu(E\smallsetminus\bigcup_{i=1}^{n}F_i)<2^{-n}$ for all $n$, or until we get to an $n$ such that $\mu(E\smallsetminus\bigcup_{i=1}^{n}F_i)=0$.
If the procedure did not stop at any stage, then the lemma follows with $N:=E\smallsetminus\bigcup_{i\geq1}F_i$.
We show what to do in case the procedure stops after $n$ steps. Set $F=F_n$, the last closed set. The idea is to split $F$ into countably many disjoint closed sets, plus a set of measure zero.
Find an $x$ such that $\mu[F\cap B(x,r)]\neq0$ for all $r>0$ (previous lemma). Since $X$ is non-atomic, $\mu\{x\}=0$. Since $B(x,r)\downarrow\{x\}$ as $r\downarrow0$, $\mu[F\cap B(x,r)]\to0$. Choose $r_n\downarrow0$ for which $\mu[F\cap B(x,r_n)]$ is strictly decreasing. Define
$$C_1:=F\cap\bar B(x,r_1)\smallsetminus B(x,r_2),\quad C_2:=F\cap\bar B(x,r_3)\smallsetminus B(x,r_4),\ \text{and so on.}$$
This is an infinite sequence of closed pairwise disjoint sets of positive measure inside $F$. By the construction of $F$ they are disjoint from $F_1,\ldots,F_{n-1}$, and they are contained in $E$.
Now consider $E':=E\smallsetminus(\bigcup_{i=1}^{n-1}F_i\cup\bigcup_i C_i)$. Applying the argument in the first paragraph to $E'$, we write it as a finite or countable disjoint union of closed sets plus a null set. Adding these sets to the collection $\{F_1,\ldots,F_{n-1}\}\cup\{C_i: i\geq1\}$ gives us the required decomposition of $E$. $\square$
A.4 The isomorphism theorem
Definition A.4. Two measure spaces $(X_i,\mathscr{B}_i,\mu_i)$ ($i=1,2$) are called isomorphic if there are measurable subsets of full measure $X_i'\subseteq X_i$ and a measurable bijection $\pi: X_1'\to X_2'$ with measurable inverse such that $\mu_2=\mu_1\circ\pi^{-1}$.
Theorem A.2 (Isomorphism theorem). Every non-atomic standard probability space is isomorphic to the unit interval equipped with the Lebesgue measure.
Proof. Fix a decreasing sequence of positive numbers $\varepsilon_n$ which tends to zero. Using lemma A.2, decompose $X=\biguplus_{j=1}^{\infty}F(j)\uplus N$ where the $F(j)$ are pairwise disjoint closed sets of positive measure and diameter less than $\varepsilon_1$, and $N$ is a null set.
Applying lemma A.2 to each $F(j)$, decompose $F(j)=\biguplus_{k=1}^{\infty}F(j,k)\uplus N(j)$ where the $F(j,k)$ are pairwise disjoint closed sets of positive measure and diameter less than $\varepsilon_2$, and $N(j)$ is a null set.
Continuing in this way we obtain a family of sets $F(x_1,\ldots,x_n)$, $N(x_1,\ldots,x_n)$ ($n,x_1,\ldots,x_n\in\mathbb{N}$) such that
1. $F(x_1,\ldots,x_n)$ are closed, have positive measure, and $\operatorname{diam}[F(x_1,\ldots,x_n)]<\varepsilon_n$;
2. $F(x_1,\ldots,x_{n-1})=\biguplus_{y\in\mathbb{N}}F(x_1,\ldots,x_{n-1},y)\uplus N(x_1,\ldots,x_{n-1})$;
3. $\mu[N(x_1,\ldots,x_n)]=0$.
Set $X':=\bigcap_{n\geq1}\bigcup_{x_1,\ldots,x_n\in\mathbb{N}}F(x_1,\ldots,x_n)$. It is a calculation to see that $\mu(X\smallsetminus X')=0$. The set $X'$ has a tree-like structure: every $x\in X'$ determines a unique sequence $(x_1,x_2,\ldots)\in\mathbb{N}^{\mathbb{N}}$ such that $x\in F(x_1,\ldots,x_n)$ for all $n$. Define $\pi: X'\to[0,1]$ by
$$\pi(x)=\cfrac{1}{x_1+\cfrac{1}{x_2+\cdots}}$$
This map is one-to-one on $X'$, because if $\pi(x)=\pi(y)$, then $1/(x_1+1/(x_2+\cdots))=1/(y_1+1/(y_2+\cdots))$, whence $x_k=y_k$ for all $k$;$^1$ this means that $x,y\in F(x_1,\ldots,x_n)$ for all $n$, whence $d(x,y)\leq\varepsilon_n\to0$.
This map is onto $[0,1]\smallsetminus\mathbb{Q}$, because every irrational $t\in[0,1]$ has an infinite continued fraction expansion $1/(a_1+1/(a_2+\cdots))$, so $t=\pi(x)$ for the unique $x$ in $\bigcap_{n\geq1}F(a_1,\ldots,a_n)$. (This intersection is non-empty because it is the decreasing intersection of closed sets of shrinking diameters in a complete metric space.)
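The coding by continued fractions can be made concrete. The sketch below (helper names are mine) evaluates a finite continued fraction $1/(x_1+1/(x_2+\cdots))$ and recovers its digits using the footnote's hint $x\mapsto[1/x]$:

```python
from fractions import Fraction

def cf_value(digits):
    # 1/(x1 + 1/(x2 + ... + 1/xn)), evaluated from the inside out.
    v = Fraction(0)
    for x in reversed(digits):
        v = 1 / (x + v)
    return v

def cf_digits(t, n):
    # Recover the digits of t in (0,1): x1 = [1/t], then recurse on the
    # fractional part of 1/t (exactly the footnote's hint).
    out = []
    for _ in range(n):
        q = Fraction(1) / t
        a = int(q)          # floor, since q > 0
        out.append(a)
        t = q - a
        if t == 0:          # rational t: finite expansion
            break
    return out

print(cf_value([1, 2, 3]))            # 7/10
print(cf_digits(Fraction(7, 10), 5))  # [1, 2, 3]
```

Recovering the digits from the value is precisely the injectivity argument above, carried out with exact rational arithmetic.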
We claim that $\pi: X'\to[0,1]\smallsetminus\mathbb{Q}$ is Borel measurable. Let $[a_1,\ldots,a_n]$ denote the collection of all irrationals in $[0,1]$ whose continued fraction expansion starts with $(a_1,\ldots,a_n)$. We call such sets cylinders. We have $\pi^{-1}[a_1,\ldots,a_n]=F(a_1,\ldots,a_n)=$ a closed set, so the preimage of every cylinder is Borel measurable. Thus
$$\mathscr{C}:=\{E\in\mathscr{B}([0,1]\smallsetminus\mathbb{Q}): \pi^{-1}(E)\in\mathscr{B}\}$$
contains the cylinders. It is easy to check that $\mathscr{C}$ is a $\sigma$-algebra. The cylinders generate $\mathscr{B}([0,1]\smallsetminus\mathbb{Q})$ (these are intervals whose length tends to zero as $n\to\infty$). It follows that $\mathscr{C}=\mathscr{B}([0,1]\smallsetminus\mathbb{Q})$, and the measurability of $\pi$ is proved.
Next we claim that $\pi^{-1}:[0,1]\smallsetminus\mathbb{Q}\to X'$ is Borel measurable. This is because $\pi[F(a_1,\ldots,a_n)]=[a_1,\ldots,a_n]$ and an argument similar to the one in the previous paragraph.
It follows that $\pi:(X,\mathscr{B},\mu)\to([0,1]\smallsetminus\mathbb{Q},\mathscr{B}([0,1]\smallsetminus\mathbb{Q}),\mu\circ\pi^{-1})$ is an isomorphism of measure spaces. There is an obvious extension of $\mu\circ\pi^{-1}$ to $\mathscr{B}([0,1])$, obtained by declaring $m(\mathbb{Q}):=0$. Let $m$ denote this extension. Then we get an isomorphism between $(X,\mathscr{B},\mu)$ and $([0,1],\mathscr{B}([0,1]),m)$, where $m$ is some Borel probability measure on $[0,1]$. Since $\mu$ is non-atomic, $m$ is non-atomic.
$^1$ Hint: apply the transformation $x\mapsto[1/x]$ to both sides.
We now claim that $([0,1],\mathscr{B}([0,1]),m)$ is isomorphic to $([0,1],\mathscr{B}([0,1]),\lambda)$, where $\lambda$ is the Lebesgue measure.
Consider first the distribution function of $m$, $s\mapsto m[0,s)$. This is a monotone increasing function (in the weak sense). We claim that it is continuous. Otherwise it has a jump $J$ at some point $x_0$:
$$m[0,x_0+\varepsilon)-m[0,x_0-\varepsilon)>J\quad\text{for all }\varepsilon>0.$$
This means that $m\{x_0\}\geq J$, which cannot be the case since $m$ is non-atomic.
It follows that the following definition makes sense for all $t$:
$$\vartheta(t):=\min\{s\geq0: m([0,s))=t\}.$$
This is a monotone increasing map $\vartheta:[0,1]\to[0,1]$. We aim at showing that $\vartheta$ is an isomorphism with the unit interval equipped with Lebesgue's measure $\lambda$.
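The inverse-distribution construction can be tried on a concrete non-atomic measure. In the sketch below the measure $m$ (with $m[0,s)=s^2$, a choice of mine for illustration) has a continuous distribution function, and the map just defined is computed by bisection; note that $m[0,\text{theta}(t)]=t$, as in Step 2 below:

```python
def cdf_m(s):
    # Distribution function s -> m[0, s) of a concrete non-atomic Borel
    # probability measure on [0, 1]: here m has density 2s, so m[0,s) = s*s.
    return s * s

def theta(t, tol=1e-12):
    # theta(t) = min{s >= 0 : m([0, s)) = t}, found by bisection; the
    # minimum is attained because cdf_m is continuous (m has no atoms).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf_m(mid) < t:
            lo = mid
        else:
            hi = mid
    return hi

print(round(theta(0.25), 6))  # 0.5, since m[0, 0.5) = 0.25
```

Pushing Lebesgue measure through this map reproduces $m$, which is the content of Step 4 for this particular example.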
Step 1. $m([0,1]\smallsetminus\vartheta[0,1])=0$.
The reader should note that in general $\vartheta[0,1]\neq[0,1]$ as sets: if $m[[s',s)]=0$ for some $s'<s$, then $s\notin\vartheta[0,1]$. The opposite is also true: if $s\notin\vartheta([0,1])$, then there exists $s'<s$ such that $m[0,s)=m[0,s')$, whence $m[s',s)=0$.
Define for every $s\notin\vartheta[0,1]$, $I(s):=[s_1,s_2]$, where
$$s_1:=\min\{s'<s: m[0,s')=m[0,s)\}\quad\text{and}\quad s_2:=\max\{s'>s: m[0,s')=m[0,s)\}.$$
Then $s\in I(s)$, $|I(s)|\geq|s-s'|>0$, and since $m$ is not atomic, $m[I(s)]=m(s_1,s_2)=0$. Moreover, for any two $s',s''\in[0,1]\smallsetminus\vartheta[0,1]$, either $I(s')=I(s'')$ or $I(s')\cap I(s'')=\varnothing$. Since there can be at most countably many disjoint intervals of positive length, the collection $\{I(s): s\in[0,1]\smallsetminus\vartheta[0,1]\}$ is countable. It follows that $[0,1]\smallsetminus\vartheta[0,1]$ is covered by a countable collection of sets of $m$-measure zero. The step follows.
Step 2. $\vartheta$ is one-to-one on $[0,1]$.
Observe first that since $m$ has no atoms, $m([0,s))=m([0,s])$ for all $s$. This implies the equation $m[0,\vartheta(t)]=t$. It immediately follows that $\vartheta$ is one-to-one.
Step 3. $\vartheta$ is measurable, with measurable inverse.
$\vartheta$ is monotonic, so it maps intervals to intervals. This implies that $\vartheta$ and $\vartheta^{-1}$ are measurable (prove!).
Step 4. $m\circ\vartheta=\lambda$.
Recall the equation $m[0,\vartheta(t)]=t$ found above. It implies that $m[(\vartheta(s),\vartheta(t)]]=t-s$ for all $0<s<t<1$, and this is enough to deduce that $m\circ\vartheta=\lambda$ (prove!).
Steps 1–4 show that $\vartheta$ is an isomorphism between $([0,1],\mathscr{B}([0,1]),\lambda)$ and $([0,1],\mathscr{B}([0,1]),m)$. Composing this with $\pi$, we get an isomorphism between $(X,\mathscr{B},\mu)$ and the unit interval equipped with Lebesgue's measure. $\square$
We comment on the atomic case. A standard probability space $(X,\mathscr{B},\mu)$ can have at most countably many atoms (otherwise it would contain an uncountable collection of pairwise disjoint sets of positive measure, which cannot be the case). Let $\{x_i: i\in\Lambda\}$ be a list of the atoms, where $\Lambda\subseteq\mathbb{N}$. Then
$$\mu=\mu'+\sum_{i\in\Lambda}\mu\{x_i\}\,\delta_{x_i}\qquad(\delta_x=\text{Dirac measure at }x)$$
where $\mu'$ is non-atomic.
Suppose w.l.o.g. that $X\cap\mathbb{N}=\varnothing$. The map
$$\phi: X\to X\cup\Lambda,\qquad \phi(x)=\begin{cases}x & x\notin\{x_i: i\in\Lambda\}\\ i & x=x_i\end{cases}$$
is an isomorphism between $X$ and the measure space obtained by adding to $(X,\mathscr{B},\mu')$ atoms with the right mass at the points of $\Lambda$. The space $(X,\mathscr{B},\mu')$ is non-atomic, so it is isomorphic to $[0,\mu'(X)]$ equipped with the Lebesgue measure. We obtain the following generalization of the isomorphism theorem: Every standard probability space is isomorphic to the measure space consisting of a finite interval equipped with Lebesgue's measure, and a finite or countable collection of atoms.
Definition A.5. A measure space is called a Lebesgue space if it is isomorphic to the measure space consisting of a finite interval equipped with the Lebesgue measurable sets and Lebesgue's measure, and a finite or countable collection of atoms.
Note that the $\sigma$-algebra in the definition is the Lebesgue $\sigma$-algebra, not the Borel $\sigma$-algebra. (The Lebesgue $\sigma$-algebra is the completion of the Borel $\sigma$-algebra with respect to the Lebesgue measure, see problem 1.2.) The isomorphism theorem and the discussion above say that the completion of a standard space is a Lebesgue space. So the class of Lebesgue probability spaces is enormous!
Appendix A
The Monotone Class Theorem
Definition A.1. A sequence of sets $\{A_n\}$ is called increasing (resp. decreasing) if $A_n\subseteq A_{n+1}$ for all $n$ (resp. $A_n\supseteq A_{n+1}$ for all $n$).
Notation: $A_n\uparrow A$ means that $\{A_n\}$ is an increasing sequence of sets and $A=\bigcup A_n$; $A_n\downarrow A$ means that $\{A_n\}$ is a decreasing sequence of sets and $A=\bigcap A_n$.
Proposition A.1. Suppose $(X,\mathscr{B},\mu)$ is a measure space, and $A_n\in\mathscr{B}$.
1. If $A_n\uparrow A$, then $\mu(A_n)\to\mu(A)$;
2. If $A_n\downarrow A$ and $\mu(A_n)<\infty$ for some $n$, then $\mu(A_n)\to\mu(A)$.
Proof. For (1), observe that $A=A_1\uplus\biguplus_{n\geq1}(A_{n+1}\smallsetminus A_n)$ and use $\sigma$-additivity. For (2), fix $n_0$ s.t. $\mu(A_{n_0})<\infty$, and observe that $A_n\downarrow A$ implies that $(A_{n_0}\smallsetminus A_n)\uparrow(A_{n_0}\smallsetminus A)$. $\square$
The example $A_n=(n,\infty)$, $\mu=$ Lebesgue measure on $\mathbb{R}$, shows that the condition in (2) cannot be removed.
Definition A.2. Let $X$ be a set. A monotone class of subsets of $X$ is a collection $\mathscr{M}$ of subsets of $X$ which contains the empty set, and such that if $A_n\in\mathscr{M}$ and $A_n\uparrow A$ or $A_n\downarrow A$, then $A\in\mathscr{M}$.
Recall that an algebra of subsets of a set $X$ is a collection of subsets of $X$ which contains the empty set, and which is closed under finite unions, finite intersections, and forming the complement.
Theorem A.1 (Monotone Class Theorem). A monotone class which contains an algebra also contains the $\sigma$-algebra generated by this algebra.
Proof. Let $\mathscr{M}$ be a monotone class which contains an algebra $\mathscr{A}$. Let $M(\mathscr{A})$ denote the intersection of all the collections $\mathscr{M}'\subseteq\mathscr{M}$ such that (a) $\mathscr{M}'$ is a monotone class, and (b) $\mathscr{M}'\supseteq\mathscr{A}$. This is a monotone class (check!). In fact it is the minimal monotone class which contains $\mathscr{A}$. We prove that it is a $\sigma$-algebra. Since $M(\mathscr{A})\subseteq\mathscr{M}$, this completes the proof.
We begin by claiming that $M(\mathscr{A})$ is closed under forming complements. Suppose $E\in M(\mathscr{A})$. The set
$$\mathscr{M}':=\{E'\in M(\mathscr{A}): (E')^c\in M(\mathscr{A})\}$$
contains $\mathscr{A}$ (because $\mathscr{A}$ is an algebra), and it is a monotone class (check!). But $M(\mathscr{A})$ is the minimal monotone class which contains $\mathscr{A}$, so $\mathscr{M}'\supseteq M(\mathscr{A})$. It follows that $E\in\mathscr{M}'$, whence $E^c\in M(\mathscr{A})$.
Next we claim that $M(\mathscr{A})$ has the following property:
$$E\in M(\mathscr{A}),\ A\in\mathscr{A}\ \Longrightarrow\ E\cup A\in M(\mathscr{A}).$$
Again, the reason is that the collection $\mathscr{M}'$ of sets with this property contains $\mathscr{A}$ and is a monotone class.
Now fix $E\in M(\mathscr{A})$, and consider the collection
$$\mathscr{M}':=\{F\in M(\mathscr{A}): E\cup F\in M(\mathscr{A})\}.$$
By the previous paragraph, $\mathscr{M}'$ contains $\mathscr{A}$. It is clear that $\mathscr{M}'$ is a monotone class. Thus $M(\mathscr{A})\subseteq\mathscr{M}'$, and as a result $E\cup F\in M(\mathscr{A})$ for all $F\in M(\mathscr{A})$. But $E\in M(\mathscr{A})$ was arbitrary, so this means that $M(\mathscr{A})$ is closed under finite unions.
Since $M(\mathscr{A})$ is closed under finite unions and countable increasing unions, it is closed under general countable unions.
Since $M(\mathscr{A})$ is closed under forming complements and taking countable unions, it is a $\sigma$-algebra. By definition this $\sigma$-algebra contains $\mathscr{A}$ and is contained in $\mathscr{M}$. $\square$
Index
Abramov formula, 89
Action, 1
adding machine, 28, 60
algebra, 98
Alt, 44
arithmeticity, 21
atom, 103
Bernoulli scheme, 9, 69
entropy of, 87
Bowen's metric, 94
Carathéodory Extension Theorem, 10
Chacon's example, 72
Chung-Neveu Lemma, 83
coboundary, 31
complete
measure space, 27
Riemannian surface, 18
conditional
entropy, 78
expectation, 35
probabilities, 36
configuration, 1
conservative mpt, 29
covariance, 6
cutting and stacking, 72
cylinders, 9
Dynamical system, 1
eigenfunction, 64
eigenvalue, 64
entropy
conditional, 78
of a measure, 81
of a partition, 78
topological, 91
ergodic, 5
decomposition, 38
hypothesis, 3
ergodicity and countable Lebesgue
spectrum, 72
ergodicity and extremality, 60
ergodicity and mixing, 60
flows, 18, 28
theory, 1
Ergodic Theorem
Mean, 31, 60
Multiplicative, 48
Pointwise, 32
Ratio, 61
Subadditive, 40
ergodicity and mixing, 32
extension, 22
exterior product, 45
and angles, 47
of linear operators, 46
extremal measure, 60
factor, 22
flow, 1
Fourier–Walsh system, 69
Furstenberg-Kesten theorem, 43
generator, 85
geodesic flow, 17
Herglotz theorem, 66
hyperbolic
plane, 16
surface, 18
independence, 6
for partitions, 81
induced transformation, 24
Abramov formula, 89
entropy of, 89
for innite mpt, 29
Kac formula, 24
Kakutani skyscraper, 26
information
conditional, 78
content, 78
function, 78
invariant set, 5
isomorphism, 4
measure theoretic, 4
of measure spaces, 104
spectral, 63
isomorphism theorem for measure spaces, 104
iterates, 1
itinerary, 77
K automorphism, 69
Kac formula, 24, 29
Kakutani skyscraper, 26
Lebesgue number, 94
Liouville's theorem, 2
Martingale convergence theorem, 61
measure, 3
measure preserving transformation, 4
measure space, 3
Lebesgue, 3, 107
non-atomic, 103
sigma finite, 29
standard, 3
mixing
and countable Lebesgue spectrum, 72
weak, 65
mixing and ergodicity, 32
monotone class, 109
mpt, 4
multilinear function, 44
alternating, 44
natural extension, 23
orbit, 1
partition
finer or coarser, 79
wedge product of, 79
periodicity, 21
Perron-Frobenius Theorem, 27
Phase space, 1
Poincaré Recurrence Theorem, 3
Poincaré section, 27
Polish, 21
polish space, 21, 101
positive definite, 66
probability
preserving transformation, 4
stationary probability vector, 11
measure, 3
space, 3
vector, 11
product
of measure spaces, 19
of mpt, 20
PSL(2,R), 16
regularity (of a measure space), 102
rotation, 7
Rotations, 28
section map, 27
semi-algebra, 10
semi-flow, 1
sigma algebra, 3
sigma niteness, 29
skew product, 21
spectral
invariant, 63
isomorphism, 63
measure, 66
spectrum
and the K property, 70
continuous, 64, 67
countable Lebesgue, 69
discrete, 64
Lebesgue, 72
mixed, 64
point, 64
pure point, 64
standard
probability space, 102
stationary
probability vector, 11
stationary stochastic process, 5
stochastic matrix, 11
stochastic process, 4
subadditive
cocycle, 40
ergodic theorem, 40
subshift of nite type, 11
suspension, 26
tail events, 70
tensor product, 44
time one map, 28
Topological entropy, 91
transition matrix
aperiodic, 12
irreducible, 12
period, 12
unitary equivalence, 63
wandering set, 29
wedge product
of algebras, 79
of multilinear forms, 45
of partitions, 85
Zero-one law, 70
