
MA40042 Measure Theory and Integration

class notes prepared by Dr. Antal A. Jarai

December 9, 2014

Contents

0 Introduction
  0.1 Basic information
  0.2 Conventions regarding infinities
1 Systems of sets
  1.1 $\sigma$-algebras
  1.2 Some easy properties
  1.3 Examples and non-examples
  1.4 Algebras
  1.5 Generated $\sigma$-algebras
  1.6 Borel sets
2 Measures
  2.1 Definition of measure
  2.2 Some simple properties
  2.3 Some simple examples
3 Construction of measures, Lebesgue measure in $\mathbb{R}^d$
  3.1 Volume in $\mathbb{R}^d$, the overall idea
  3.2 Boxes
  3.3 Semi-algebras
  3.4 Generated algebra
  3.5 Volume of boxes in $\mathbb{R}^d$
  3.6 Extension from a semi-algebra to the generated algebra
  3.7 Pre-measures
  3.8 Verifying the conditions for boxes
  3.9 $\sigma$-finiteness
  3.10 Carathéodory's extension theorem
  3.11 Dynkin's lemma ($\pi$-$\lambda$ theorem)
  3.12 Uniqueness lemma
  3.13 Construction of the extension, Outer measures
  3.14 Example of a set that is not Lebesgue measurable
4 Measurable functions and their properties
  4.1 Open sets in $[-\infty, \infty]$
  4.2 Continuous functions
  4.3 Measurable functions
  4.4 Compositions of functions
  4.5 Borel functions
  4.6 Limits of measurable functions
  4.7 Simple functions
  4.8 Monotone class theorem
5 Abstract integration theory (Lebesgue integral)
  5.1 Arithmetic in $[0, \infty]$
  5.2 Integration of non-negative simple functions
  5.3 Integration of non-negative functions
  5.4 Basic properties of the integral
  5.5 Integral as a new measure
  5.6 Monotone Convergence Theorem
  5.7 Sums of non-negative series
  5.8 Fatou's Lemma
  5.9 Density functions
  5.10 Integration of signed functions
  5.11 Linearity of the integral
  5.12 Dominated Convergence Theorem
  5.13 The role of sets of measure 0
  5.14 Completion
  5.15 Series absolutely convergent in $L^1(\mu)$
  5.16 Examples of a.e.-type conclusions
6 Inequalities and $L^p$ spaces
  6.1 Convex functions
  6.2 Jensen's inequality
  6.3 Examples
  6.4 Hölder's inequality; Minkowski's inequality
  6.5 $L^p$-spaces
  6.6 Completeness of $L^p(\mu)$
7 Product spaces and Fubini's theorem
  7.1 Product measure
  7.2 Example
  7.3 Fubini's Theorem
  7.4 Important counterexamples
  7.5 Convolutions
8 Applications to probability
  8.1 Product spaces
  8.2 Independence
  8.3 Conditional expectation
A Appendix

0 Introduction

0.1 Basic information

These are class notes for the unit MA40042 Measure Theory and Integration.
They combine some of the material (adapted to the level of prerequisites
students in MA40042 have) from the following two sources:
1. W. Rudin, Real and complex analysis, third edition, McGraw-Hill, New
York, 1987.
2. R. Durrett, Probability: theory and examples, fourth edition, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge
Univ. Press, Cambridge, 2010.
These notes are a summary of the lectures and their goal is to supplement the lectures by recording the essentials. Motivation, further detail and
explanation may be given in the lectures, in particular, if there are student
questions.
Numbering conventions: If there is only a single theorem (definition,
lemma, etc.) in Subsection X.Y, then in later sections this theorem is referred to as Theorem X.Y. The theorem does not carry a separate number in
Subsection X.Y itself. If there are, say, three theorems in Subsection X.Y,
then in later subsections these are referred to as Theorems X.Y.1, X.Y.2
and X.Y.3. In Subsection X.Y itself, the theorems are simply numbered as
Theorems 1, 2 and 3.

In-text exercises: Some statements made in the text are followed by (HW). This means they are left as exercises for you to do. Some of these will appear on problem sheets, but not all of them. You will be expected to know how to do them.
Potential typos: Unavoidably, some typos will have crept in. Please
ask if anything is not clear.
Non-examinable: Any part that is non-examinable is clearly marked as such, printed in smaller font size and indented. All else is examinable.

0.2 Conventions regarding infinities

A number of conventions regarding infinities will be very useful in measure theory. This subsection summarizes them. [Note: the conventions are restricted to this subject only. Do not try to use them without care in other units.]
When discussing measures and integrals, one inevitably encounters $\infty$. The area or volume of a set may be infinite, or the integral of a function may be $+\infty$ or $-\infty$. Also, although we will mainly be interested in functions that take real values, at some points it may be reasonable to define a function to take an infinite value. This is often the case when we take limits or form infinite sums of functions. We need precise definitions to deal with such cases unambiguously.
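Arithmetic in $[0, \infty]$. If $0 \le a \le \infty$, we define:
$$a + \infty = \infty + a = \infty;$$
and
$$a \cdot \infty = \infty \cdot a = \begin{cases} \infty & \text{if } 0 < a \le \infty, \\ 0 & \text{if } a = 0. \end{cases}$$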

To illustrate in an example why it will be convenient to define $0 \cdot \infty = 0$, think about the set in the two-dimensional plane consisting of the points on the $x$-axis. This set has infinite length in the $x$-direction and $0$ width in the $y$-direction, and correspondingly it has area $\infty \cdot 0 = 0$.
Warning: Careful with cancellations! Suppose $a, b, c \in [0, \infty]$. If $a + b = a + c$, this implies $b = c$ provided that $a < \infty$. If $ab = ac$, this implies $b = c$ provided that $0 < a < \infty$.
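The conventions above can be mirrored in code, with one caveat: floating-point arithmetic (IEEE 754, which Python floats follow) evaluates `0 * inf` to `nan`, so $0 \cdot \infty = 0$ has to be imposed by hand. The following is a minimal sketch, assuming elements of $[0, \infty]$ are represented as Python floats with `math.inf` standing for $\infty$; the helper names `ext_add` and `ext_mul` are hypothetical.

```python
import math

INF = math.inf  # represents +infinity


def ext_add(a: float, b: float) -> float:
    """Addition in [0, inf]: a + inf = inf + a = inf (float addition already agrees)."""
    if a < 0 or b < 0:
        raise ValueError("arguments must lie in [0, inf]")
    return a + b


def ext_mul(a: float, b: float) -> float:
    """Multiplication in [0, inf] with the convention 0 * inf = inf * 0 = 0."""
    if a < 0 or b < 0:
        raise ValueError("arguments must lie in [0, inf]")
    if a == 0 or b == 0:  # IEEE floats would give nan for 0 * inf
        return 0.0
    return a * b


# The x-axis example: infinite length times zero width gives area 0.
print(ext_mul(INF, 0.0))  # 0.0
print(ext_mul(2.0, INF))  # inf
# Cancellation fails for a = inf: inf + 1 == inf + 2 although 1 != 2.
print(ext_add(INF, 1.0) == ext_add(INF, 2.0))  # True
```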
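Arithmetic in $[-\infty, \infty]$. If $a, b \in [-\infty, \infty]$, we define $a + b$ and $a - b$ provided the sum or difference is not of the indeterminate form $\infty - \infty$. That is, when $a < \infty$, we define $a + (-\infty) = a - \infty = -\infty + a = -\infty$, and similarly, when $a > -\infty$, we define $a + \infty = a - (-\infty) = \infty + a = \infty$. (In particular: $-\infty - \infty = -\infty$, but $(-\infty) + \infty$ is not defined.)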
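Similarly, here is a minimal sketch of addition in $[-\infty, \infty]$, assuming extended reals are Python floats; the hypothetical helpers raise an error on the indeterminate form instead of silently returning `nan`, which is what plain float arithmetic gives for `inf + (-inf)`.

```python
import math

INF = math.inf


def ext_real_add(a: float, b: float) -> float:
    """Addition in [-inf, inf]; the indeterminate form inf + (-inf) is rejected."""
    if (a == INF and b == -INF) or (a == -INF and b == INF):
        raise ValueError("(-inf) + inf is not defined")
    return a + b


def ext_real_sub(a: float, b: float) -> float:
    """a - b is defined as a + (-b), subject to the same restriction."""
    return ext_real_add(a, -b)


print(ext_real_add(5.0, -INF))  # -inf
print(ext_real_sub(-INF, INF))  # -inf, i.e. -inf - inf = -inf
```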

1 Systems of sets

1.1 $\sigma$-algebras

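Definition. A collection $\mathcal{M}$ of subsets of $X$ is called a $\sigma$-algebra in $X$ if it has the following properties:
(i) $X \in \mathcal{M}$;
(ii) if $A \in \mathcal{M}$, then also $X \setminus A = A^c \in \mathcal{M}$;
(iii) if $A_1, A_2, \ldots \in \mathcal{M}$, then also $\bigcup_{n=1}^\infty A_n \in \mathcal{M}$.
The pair $(X, \mathcal{M})$ is called a measurable space.
Note: The name $\sigma$-field is also customary as an alternative for $\sigma$-algebra.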
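For a finite set $X$ the definition can be checked mechanically. The sketch below assumes sets are represented as Python `frozenset`s; `is_sigma_algebra` is a hypothetical helper. Since a finite collection has only finitely many distinct members, every countable union in (iii) reduces to a finite union, so closure under pairwise unions is enough.

```python
from itertools import combinations


def is_sigma_algebra(X, M):
    """Check Definition 1.1 for a collection M of subsets of a finite set X.

    Because M is finite, any countable union of its members is a union of
    finitely many distinct members, so closure under pairwise unions
    (and hence, by induction, finite unions) is enough for (iii).
    """
    X = frozenset(X)
    M = {frozenset(A) for A in M}
    if not all(A <= X for A in M):
        return False
    if X not in M:                                   # (i)
        return False
    if not all(X - A in M for A in M):               # (ii)
        return False
    return all(A | B in M for A, B in combinations(M, 2))  # (iii)


X = {1, 2, 3, 4}
print(is_sigma_algebra(X, [set(), {1, 2}, {3, 4}, X]))                  # True
print(is_sigma_algebra(X, [set(), {1}, {2, 3, 4}, {2}, {1, 3, 4}, X]))  # False: {1, 2} missing
```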

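1.2 Some easy properties

Lemma. Let $\mathcal{M}$ be a $\sigma$-algebra in $X$.
(a) If $A_1, A_2, \ldots \in \mathcal{M}$, then also $\bigcap_{n=1}^\infty A_n \in \mathcal{M}$;
(b) $\emptyset \in \mathcal{M}$;
(c) If $A_1, \ldots, A_n \in \mathcal{M}$, then also $\bigcup_{k=1}^n A_k \in \mathcal{M}$;
(d) If $A_1, \ldots, A_n \in \mathcal{M}$, then also $\bigcap_{k=1}^n A_k \in \mathcal{M}$;
(e) If $A, B \in \mathcal{M}$, then also $A \setminus B \in \mathcal{M}$.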
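Proof. Here (i)-(iii) refer to Definition 1.1.
(a) By De Morgan's law,
$$\bigcap_{n=1}^\infty A_n = \Big( \bigcup_{n=1}^\infty \underbrace{A_n^c}_{\in \mathcal{M} \text{ by (ii)}} \Big)^c \in \mathcal{M},$$
since the union is in $\mathcal{M}$ by (iii), and hence so is its complement, by (ii).
(b) $\emptyset = X^c \in \mathcal{M}$ by (i)+(ii).
(c) Use (b) to take $A_{n+1} = A_{n+2} = \cdots = \emptyset$ in (iii).
(d) Use (i) to take $A_{n+1} = A_{n+2} = \cdots = X$ in (a).
(e) $A \setminus B = A \cap B^c \in \mathcal{M}$ due to (ii)+(d).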

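1.3 Examples and non-examples

(a) Example: $X$ any non-empty set,
$$\mathcal{M} = \text{all subsets of } X = \{A : A \subseteq X\}.$$
$\mathcal{M}$ is a $\sigma$-algebra.
(b) Non-example: $X = \{1, 2, 3, \ldots\}$,
$$\mathcal{A} = \{A \subseteq X : A \text{ is finite or } X \setminus A \text{ is finite}\}.$$
$\mathcal{A}$ is not a $\sigma$-algebra: $\{1\}, \{3\}, \{5\}, \ldots \in \mathcal{A}$, but their union $\{1, 3, 5, \ldots\} \notin \mathcal{A}$.
(c) Example: $X$ is an uncountable set,
$$\mathcal{M} = \{A \subseteq X : A \text{ is countable or } X \setminus A \text{ is countable}\}.$$
$\mathcal{M}$ is a $\sigma$-algebra (HW).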

1.4 Algebras

Definition. A collection $\mathcal{A}$ of subsets of $X$ is called an algebra in $X$ if it has the following properties:
(i) $X \in \mathcal{A}$;
(ii) if $A \in \mathcal{A}$, then also $X \setminus A = A^c \in \mathcal{A}$;
(iii) if $A_1, \ldots, A_n \in \mathcal{A}$, then also $\bigcup_{k=1}^n A_k \in \mathcal{A}$.
Example 1.3(b) is an algebra. Any $\sigma$-algebra is also an algebra (HW).

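1.5 Generated $\sigma$-algebras

Theorem. If $\mathcal{C}$ is any collection of subsets of $X$, then there is a smallest $\sigma$-algebra $\mathcal{M}$ in $X$ such that $\mathcal{C} \subseteq \mathcal{M}$.
$\mathcal{M}$ is called the $\sigma$-algebra generated by $\mathcal{C}$.
Notation: $\mathcal{M} = \sigma(\mathcal{C})$.
Proof of Theorem. Let $\mathcal{M}$ be the intersection of all $\sigma$-algebras that contain $\mathcal{C}$. (We know there is at least one such $\sigma$-algebra, namely the one containing all subsets of $X$.) Then clearly $\mathcal{C} \subseteq \mathcal{M}$. We show that $\mathcal{M}$ is itself a $\sigma$-algebra. For this, we show the more general statement that if $\{\mathcal{M}_\alpha\}_{\alpha \in I}$ is any family of $\sigma$-algebras in $X$, then $\bigcap_{\alpha \in I} \mathcal{M}_\alpha$ is also a $\sigma$-algebra in $X$. This statement can then be applied to the family of all $\sigma$-algebras containing $\mathcal{C}$. (Here $I$ is an arbitrary non-empty index set, possibly uncountable.)
Let $\mathcal{F} = \bigcap_{\alpha \in I} \mathcal{M}_\alpha$; we check that $\mathcal{F}$ satisfies the requirements of Definition 1.1(i)-(iii).
(i): $X \in \mathcal{M}_\alpha$ for all $\alpha \in I$, because each $\mathcal{M}_\alpha$ is a $\sigma$-algebra. Therefore $X \in \bigcap_{\alpha \in I} \mathcal{M}_\alpha = \mathcal{F}$.
(ii): $A \in \mathcal{F} \Rightarrow A \in \mathcal{M}_\alpha\ \forall \alpha \in I \Rightarrow A^c \in \mathcal{M}_\alpha\ \forall \alpha \in I \Rightarrow A^c \in \bigcap_{\alpha \in I} \mathcal{M}_\alpha = \mathcal{F}$.
(iii): $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow A_1, A_2, \ldots \in \mathcal{M}_\alpha\ \forall \alpha \in I \Rightarrow \bigcup_{n=1}^\infty A_n \in \mathcal{M}_\alpha\ \forall \alpha \in I \Rightarrow \bigcup_{n=1}^\infty A_n \in \bigcap_{\alpha \in I} \mathcal{M}_\alpha = \mathcal{F}$.
We get that $\mathcal{F}$ is a $\sigma$-algebra. In particular, our $\mathcal{M}$ is a $\sigma$-algebra.
It is clear that $\mathcal{M}$ is necessarily the smallest $\sigma$-algebra containing $\mathcal{C}$: any $\sigma$-algebra containing $\mathcal{C}$ is one of the $\sigma$-algebras being intersected, and hence contains $\mathcal{M}$.
(HW): (i) If $\mathcal{M}_1 \subseteq \mathcal{M}_2 \subseteq \cdots$ are $\sigma$-algebras in $X$, then $\bigcup_{n=1}^\infty \mathcal{M}_n$ is always an algebra in $X$, but it may not be a $\sigma$-algebra.
(ii) The smallest $\sigma$-algebra that contains all the $\mathcal{M}_n$'s is $\sigma\big(\bigcup_{n=1}^\infty \mathcal{M}_n\big)$; that is, in general, it is necessary to apply the $\sigma(\cdot)$ operation to the union.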
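On a finite $X$, $\sigma(\mathcal{C})$ can also be computed from below rather than as an intersection: repeatedly add complements and pairwise unions until nothing new appears. This bottom-up closure is a different route to the same object than the proof above, and it only works because $X$ is finite; `generated_sigma_algebra` is a hypothetical name.

```python
def generated_sigma_algebra(X, C):
    """Return sigma(C) for a collection C of subsets of a finite set X.

    Starts from C together with X and keeps adding complements and pairwise
    unions until nothing new appears. On a finite X the result is the smallest
    family containing C that is closed under complements and finite unions,
    i.e. the smallest sigma-algebra containing C.
    """
    X = frozenset(X)
    family = {frozenset(A) for A in C} | {X}
    while True:
        new = {X - A for A in family}
        new |= {A | B for A in family for B in family}
        if new <= family:  # closed under both operations: done
            return family
        family |= new


sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
print(len(sigma))  # 8
print(sorted(sorted(A) for A in sigma))
# [[], [1], [1, 2], [1, 2, 3, 4], [1, 3, 4], [2], [2, 3, 4], [3, 4]]
```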

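1.6 Borel sets

Definition 1.
A set $U \subseteq \mathbb{R}^d$ is called open if for all $x \in U$ there exists $r > 0$ such that $B(x, r) \subseteq U$. Here $B(x, r) = \{y \in \mathbb{R}^d : |y - x| < r\}$ is the open ball of radius $r$ centred at $x$.
A set $F \subseteq \mathbb{R}^d$ is called closed if $F^c$ is open.
A set $K \subseteq \mathbb{R}^d$ is called compact if any open cover of $K$ admits a finite subcover. [That is, if $K \subseteq \bigcup_{\alpha \in I} U_\alpha$, where the $U_\alpha$ are arbitrary open sets, then there exist $n \ge 1$ and $\alpha_1, \ldots, \alpha_n \in I$ such that also $K \subseteq \bigcup_{i=1}^n U_{\alpha_i}$.]
Theorem (Heine-Borel Theorem; see the Appendix). A set $K \subseteq \mathbb{R}^d$ is compact if and only if it is closed and bounded.
Definition 2. We denote
$$\mathcal{G} := \text{collection of all open sets} = \{U : U \subseteq \mathbb{R}^d,\ U \text{ open}\}.$$
We define
$$\mathcal{B} := \sigma(\mathcal{G}) = \text{smallest } \sigma\text{-algebra containing all open sets}.$$
$\mathcal{B}$ is called the collection of Borel sets in $\mathbb{R}^d$.
(HW): $\mathcal{B} = \sigma(\text{closed sets}) = \sigma(\text{compact sets})$.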

2 Measures

In this section, let $(X, \mathcal{M})$ be a measurable space.

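2.1 Definition of measure

Definition.
(a) A measure is a function $\mu$ defined on a $\sigma$-algebra $\mathcal{M}$ with the following properties:
(i) $\mu : \mathcal{M} \to [0, \infty]$;
(ii) $\mu(\emptyset) = 0$;
(iii) if $A_1, A_2, \ldots \in \mathcal{M}$ are pairwise disjoint, then $\mu\big(\bigcup_{n=1}^\infty A_n\big) = \sum_{n=1}^\infty \mu(A_n)$.
(b) The triple $(X, \mathcal{M}, \mu)$ is called a measure space, and the members of $\mathcal{M}$ are called measurable sets.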

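2.2 Some simple properties

Theorem. Let $\mu$ be a measure on $(X, \mathcal{M})$. Then we have (all sets below are assumed to belong to $\mathcal{M}$):
(i) If $A_1, \ldots, A_n$ are pairwise disjoint, then
$$\mu(A_1 \cup \cdots \cup A_n) = \mu(A_1) + \cdots + \mu(A_n).$$
(ii) If $A \subseteq B$, then $\mu(A) \le \mu(B)$ (monotonicity).
(iii) If $A_1 \subseteq A_2 \subseteq \cdots$ and $A = \bigcup_{n=1}^\infty A_n$, then
$$\mu(A) = \lim_{n \to \infty} \mu(A_n)$$
(continuity for increasing unions).
(iv) If $A_1 \supseteq A_2 \supseteq \cdots$, $A = \bigcap_{n=1}^\infty A_n$, and $\mu(A_1) < \infty$, then
$$\mu(A) = \lim_{n \to \infty} \mu(A_n)$$
(continuity for decreasing intersections).
(v) If $A \subseteq \bigcup_{n=1}^\infty A_n$, then
$$\mu(A) \le \sum_{n=1}^\infty \mu(A_n).$$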

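Proof. (i) Take $A_{n+1} = A_{n+2} = \cdots = \emptyset$ in Definition 2.1(a)(iii) and use Definition 2.1(a)(ii).
(ii) Due to part (i), and since $B = A \cup (B \setminus A)$ is a disjoint union, we have
$$\mu(B) = \mu(A) + \underbrace{\mu(B \setminus A)}_{\ge 0} \ge \mu(A).$$
(iii) Put
$$B_1 = A_1, \quad B_2 = A_2 \setminus A_1, \quad \ldots, \quad B_n = A_n \setminus A_{n-1}, \quad \ldots$$
Then $B_1, B_2, \ldots$ are pairwise disjoint, $\bigcup_{n=1}^\infty A_n = \bigcup_{n=1}^\infty B_n$, and $\bigcup_{k=1}^n B_k = \bigcup_{k=1}^n A_k = A_n$. Therefore, we have
$$\mu(A) = \mu\Big(\bigcup_{n=1}^\infty B_n\Big) \overset{\text{Defn 2.1(a)(iii)}}{=} \sum_{n=1}^\infty \mu(B_n) = \lim_{n \to \infty} \sum_{k=1}^n \mu(B_k) \overset{\text{part (i)}}{=} \lim_{n \to \infty} \mu\Big(\bigcup_{k=1}^n B_k\Big) = \lim_{n \to \infty} \mu\Big(\bigcup_{k=1}^n A_k\Big) = \lim_{n \to \infty} \mu(A_n),$$
as required.
(iv) Put $B_k = A_1 \setminus A_k$, $k = 1, 2, \ldots$. Then $B_1 \subseteq B_2 \subseteq \cdots$, and
$$\bigcup_{n=1}^\infty B_n = A_1 \cap \Big(\bigcup_{n=1}^\infty A_n^c\Big) = A_1 \cap \Big(\bigcap_{n=1}^\infty A_n\Big)^c = A_1 \cap A^c = A_1 \setminus A.$$
Also, $\mu(B_n) = \mu(A_1) - \mu(A_n)$, by part (i) applied to the disjoint union $A_1 = A_n \cup B_n$, since both $\mu(A_1)$ and $\mu(A_n)$ are finite. Therefore,
$$\mu(A_1) = \mu(A) + \mu(A_1 \setminus A) \overset{\text{part (iii)}}{=} \mu(A) + \lim_{n \to \infty} [\mu(A_1) - \mu(A_n)] = \mu(A) + \mu(A_1) - \lim_{n \to \infty} \mu(A_n).$$
We may cancel $\mu(A_1)$ from both sides, since $\mu(A_1) < \infty$, and get the statement.
(v) (HW).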
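The assumption $\mu(A_1) < \infty$ in (iv) cannot simply be dropped. A standard counterexample, sketched here using counting measure from Section 2.3(a), is the following:

```latex
\begin{aligned}
&X = \{1, 2, 3, \ldots\}, \quad \mu = \text{counting measure on } X, \quad A_n = \{n, n+1, n+2, \ldots\},\\
&A_1 \supseteq A_2 \supseteq \cdots, \quad A = \bigcap_{n=1}^{\infty} A_n = \emptyset, \quad \mu(A_n) = \infty \text{ for every } n,\\
&\text{so } \lim_{n \to \infty} \mu(A_n) = \infty \neq 0 = \mu(A).
\end{aligned}
```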

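2.3 Some simple examples

(a) Counting measure. $X$ any set, $\mathcal{M}$ = all subsets of $X$,
$$\mu(A) = \begin{cases} \text{number of elements of } A & \text{when } A \text{ is finite;} \\ +\infty & \text{when } A \text{ is infinite.} \end{cases}$$
$\mu$ is called counting measure on $X$.
(b) Point mass (Dirac measure). $X$ any set, $\mathcal{M}$ = all subsets of $X$, $x_0 \in X$ a fixed point.
$$\delta_{x_0}(A) = \begin{cases} 1 & \text{if } x_0 \in A; \\ 0 & \text{if } x_0 \notin A. \end{cases}$$
$\delta_{x_0}$ is called the point mass at $x_0$, or Dirac measure at $x_0$.
(c) Discrete measures. $X$ a countable set, $\mathcal{M}$ = all subsets of $X$, $m_i \ge 0$, $i \in X$, given numbers (weights).
$$\mu(A) = \sum_{i \in A} m_i.$$
(HW): Check that (a), (b), (c) are indeed measures.
Whenever $(X, \mathcal{M}, \mu)$ is a measure space and $\mu(X) = 1$, we say that $\mu$ is a probability measure. (b) is a probability measure, and whenever $\sum_{i \in X} m_i = 1$, so is (c).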
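The three examples can be written as set-functions on finite sets. This is a minimal sketch, assuming finite Python sets (so the $+\infty$ case of counting measure never arises) and exact `Fraction` weights; the function names are hypothetical.

```python
from fractions import Fraction


def counting_measure(A):
    """Counting measure, restricted here to finite sets A (so the value is never +inf)."""
    return len(A)


def dirac(x0):
    """Point mass at x0: returns the set-function A -> 1 if x0 in A else 0."""
    return lambda A: 1 if x0 in A else 0


def discrete_measure(weights):
    """Discrete measure with weights m_i >= 0: A -> sum of m_i over i in A."""
    return lambda A: sum((weights[i] for i in A), Fraction(0))


X = {1, 2, 3}
mu = discrete_measure({1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)})
delta = dirac(2)
A, B = {1}, {2, 3}                 # disjoint sets with A | B = X
print(mu(A | B) == mu(A) + mu(B))  # True (additivity on this pair)
print(mu(X), delta(X))             # 1 1 (both are probability measures)
print(counting_measure(A | B))     # 3
```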

3 Construction of measures, Lebesgue measure in $\mathbb{R}^d$

One general approach to defining a measure is to define it first on a collection of simple sets, where we already know what we want the measure to be, and then extend it to all sets in the generated $\sigma$-algebra. The extension step is often non-trivial, and in this section we will see what properties one has to verify for it. Thus there will be two parallel threads: the general construction, and, as an illustration, the concrete construction of $d$-dimensional volume on the Borel sets in $\mathbb{R}^d$ (called Lebesgue measure).
A lot of the material in this section is quite abstract, so it is important to keep in mind that once we have constructed a measure, we can essentially forget how it was obtained: the rest of the material, namely integration theory and the three main theorems of integration, will not rely on the method of construction, and it is perfectly possible to understand the later sections without knowing the internal workings of this section. But one needs to be aware that the construction step, in almost all interesting cases, involves something non-trivial.
Finally, if in the future you will be using more delicate properties of functions than we treat in this course, you may want to know that one can construct measures in a somewhat more elegant way than we do here; see e.g. [2, Chapter 2]. The reason I chose a more pedestrian approach in this unit, although it is slightly longer, is that I believe it is more intuitive on a first encounter with the subject.

3.1 Volume in $\mathbb{R}^d$, the overall idea

For a rectangular set $[a_1, b_1] \times \cdots \times [a_d, b_d]$, we already know what its volume should be: $\prod_{i=1}^d (b_i - a_i)$. We can go a step further and consider sets that are finite unions of rectangular sets. These can be decomposed into a finite disjoint union of rectangular sets, so their volume is determined by finite additivity. After taking care of a few uninteresting technicalities, we can ensure that the volume is defined on the algebra generated by rectangular sets, and that it is finitely additive there. The main step is to go from this algebra to the Borel $\sigma$-algebra, and this requires the most work. But the basic idea is simple: given a Borel set $B \subseteq \mathbb{R}^d$, we consider coverings $B \subseteq \bigcup_{n=1}^\infty A_n$, where the $A_n$'s are rectangular sets. Then the sum of the volumes of the $A_n$'s is an approximation to what the volume of $B$ should be. Making the covering as efficient as possible, we define the measure of $B$ as the infimum of these approximations over all coverings.

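3.2 Boxes

For technical reasons (that are not essential, but help make some of the statements neater), we will work with half-open intervals. A box $S \subseteq \mathbb{R}^d$ is a set of the form $S = I_1 \times \cdots \times I_d$, where, for each $i = 1, \ldots, d$, the interval $I_i$ takes one of the following four forms:
$$I_i = \begin{cases} (a_i, b_i] & \text{for some } -\infty < a_i \le b_i < \infty, \text{ or} \\ (-\infty, b_i] & \text{for some } -\infty < b_i < \infty, \text{ or} \\ (a_i, \infty) & \text{for some } -\infty < a_i < \infty, \text{ or} \\ (-\infty, \infty). & \end{cases}$$
We put $\mathcal{S} = \{S \subseteq \mathbb{R}^d : S \text{ is a box}\}$. Note that $\mathcal{S}$ is not an algebra, because it is not closed under complements.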

3.3 Semi-algebras

Definition. A non-empty collection $\mathcal{S}$ of subsets of $X$ is called a semi-algebra if
(i) $S_1, S_2 \in \mathcal{S} \Rightarrow S_1 \cap S_2 \in \mathcal{S}$;
(ii) $S \in \mathcal{S} \Rightarrow S^c = X \setminus S$ is a finite disjoint union of members of $\mathcal{S}$.
Example. The collection of boxes in $\mathbb{R}^d$, defined in Section 3.2, is a semi-algebra. (HW): The complement of any box is the disjoint union of at most $2d$ boxes.
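The (HW) claim suggests an explicit decomposition: a point lies outside the box exactly when some first coordinate $i$ falls outside $I_i$. The sketch below uses that idea. The encoding of a box as a tuple of endpoint pairs and the helpers `in_box` and `box_complement` are assumptions made for illustration, and the random test is a sanity check rather than a proof.

```python
import math
import random

INF = math.inf
# Hypothetical encoding: a box in R^d is a tuple of d pairs (a, b), meaning the
# interval (a, b] if b < inf and (a, inf) if b = inf; a may equal -inf.


def in_box(x, box):
    """True if the finite point x satisfies a < x_i <= b in every coordinate."""
    return all(a < xi <= b for xi, (a, b) in zip(x, box))


def box_complement(box):
    """Write R^d minus the box as a disjoint union of at most 2d boxes.

    The piece for coordinate i keeps coordinates 0..i-1 inside the box's
    intervals, puts coordinate i outside (a_i, b_i], and leaves the later
    coordinates unrestricted. Pieces for different i are disjoint.
    """
    d = len(box)
    pieces = []
    for i, (a, b) in enumerate(box):
        prefix = list(box[:i])
        suffix = [(-INF, INF)] * (d - i - 1)
        if a > -INF:  # the part (-inf, a_i]
            pieces.append(tuple(prefix + [(-INF, a)] + suffix))
        if b < INF:   # the part (b_i, inf)
            pieces.append(tuple(prefix + [(b, INF)] + suffix))
    return pieces


S = ((0.0, 1.0), (-INF, 2.0))  # the box (0, 1] x (-inf, 2]
pieces = box_complement(S)
print(len(pieces))  # 3, which is at most 2d = 4

# Sanity check by sampling: each point lies in exactly one of S and the pieces.
rng = random.Random(0)
for _ in range(10_000):
    x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    hits = sum(in_box(x, P) for P in pieces)
    assert hits == (0 if in_box(x, S) else 1)
```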

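3.4 Generated algebra

Lemma. Let $\mathcal{S}$ be a semi-algebra in $X$. Put
$$\mathcal{A} = \{\text{finite disjoint unions of members of } \mathcal{S}\}.$$
Then $\mathcal{A}$ is an algebra in $X$. (It is called the algebra generated by $\mathcal{S}$.)
Proof. Step 1. We first show that $\mathcal{A}$ is closed under intersections. Suppose $A = \bigcup_{i=1}^k S_i$ and $B = \bigcup_{j=1}^\ell T_j$ (disjoint unions). Then
$$A \cap B = \bigcup_{i=1}^k \bigcup_{j=1}^\ell (\underbrace{S_i \cap T_j}_{\in \mathcal{S} \text{ by Defn 3.3(i)}}) \in \mathcal{A}$$
by definition of $\mathcal{A}$, since the sets $S_i \cap T_j$ are pairwise disjoint.
Step 2. We now show that $\mathcal{A}$ is closed under complements. Suppose $A = \bigcup_{i=1}^k S_i$ (disjoint union). Then
$$A^c = \bigcap_{i=1}^k \underbrace{S_i^c}_{\in \mathcal{A} \text{ by Defn 3.3(ii)}} \in \mathcal{A}$$
by Step 1 and induction. This verifies that $\mathcal{A}$ is closed under complements.
Step 3. We check that $X \in \mathcal{A}$. Take any $S \in \mathcal{S}$ (possible, since $\mathcal{S}$ is non-empty) and write $S^c = \bigcup_{i=1}^k S_i$ (disjoint). Then $X = S \cup \bigcup_{i=1}^k S_i \in \mathcal{A}$.
We check that $\mathcal{A}$ is closed under finite unions: if $A, B \in \mathcal{A}$, we have
$$A \cup B = (\underbrace{A^c \cap B^c}_{\in \mathcal{A} \text{ by Steps 1 and 2}})^c \in \mathcal{A}$$
by Step 2; the case of $n$ sets follows by induction.
Example. If $\mathcal{S}$ = collection of boxes in $\mathbb{R}^d$, then
$$\mathcal{A} = \{\text{finite disjoint unions of boxes}\}$$
is an algebra.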

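3.5 Volume of boxes in $\mathbb{R}^d$

For a box $S = I_1 \times \cdots \times I_d \in \mathcal{S}$ we define
$$\lambda(S) := \prod_{i=1}^d (b_i - a_i),$$
where $a_i$ and $b_i$ are the endpoints of $I_i$; these are allowed to take infinite values. (Recall that the convention $0 \cdot \infty = 0$ is in force.)
Lemma. The set-function $\lambda$ is finitely additive on $\mathcal{S}$, that is, if $S \in \mathcal{S}$ and $S = \bigcup_{i=1}^k S_i$ with $S_1, \ldots, S_k \in \mathcal{S}$ disjoint, then
$$\lambda(S) = \sum_{i=1}^k \lambda(S_i).$$
Proof. Let us say that $S = \bigcup_{j=1}^\ell B_j$ is a regular subdivision of $S$ if, for each $i = 1, \ldots, d$, there exists a sequence
$$a_i = \alpha_{i,0} < \alpha_{i,1} < \cdots < \alpha_{i,n_i} = b_i$$
such that each $B_j$ is of the form
$$(\alpha_{1,r_1-1}, \alpha_{1,r_1}] \times \cdots \times (\alpha_{d,r_d-1}, \alpha_{d,r_d}] \qquad \text{for some } 1 \le r_i \le n_i,\ i = 1, \ldots, d$$
(and consequently $\ell = n_1 \cdots n_d$). It follows easily from the distributive law that for a regular subdivision $\lambda(S) = \sum_{j=1}^\ell \lambda(B_j)$.
General case: let $S = \bigcup_{i=1}^k S_i$. Using, in each coordinate, the grid formed by all endpoints of all the $S_i$'s, we can subdivide each $S_i$ regularly in such a way that
$$S_i = \bigcup_{j=1}^{m_i} T_{i,j} \quad \text{(regular subdivision)}, \qquad S = \bigcup_{i,j} T_{i,j} \quad \text{(also a regular subdivision)}.$$
It follows that
$$\lambda(S) = \sum_{i=1}^k \sum_{j=1}^{m_i} \lambda(T_{i,j}) = \sum_{i=1}^k \lambda(S_i).$$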
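A minimal sketch of $\lambda$ and of a regular subdivision, assuming a hypothetical encoding of a box as a tuple of endpoint pairs $(a_i, b_i)$. The explicit zero check implements the convention $0 \cdot \infty = 0$, which `math.prod` alone would not (it returns `nan` for an empty box with an infinite side).

```python
import math
from itertools import product

INF = math.inf


def volume(box):
    """lambda(S) = prod_i (b_i - a_i), using the convention 0 * inf = 0.

    `box` is a tuple of endpoint pairs (a_i, b_i); endpoints may be infinite.
    """
    sides = [b - a for a, b in box]
    if any(side == 0 for side in sides):  # empty box: volume 0 even if a side is infinite
        return 0.0
    return math.prod(sides)


def regular_subdivision(grids):
    """Cells of the regular subdivision given by one increasing grid per coordinate."""
    cells_per_axis = [list(zip(g[:-1], g[1:])) for g in grids]
    return list(product(*cells_per_axis))


S = ((0.0, 2.0), (0.0, 3.0))
cells = regular_subdivision([[0.0, 0.5, 2.0], [0.0, 1.0, 1.5, 3.0]])
print(len(cells))                                # 6 = 2 * 3
print(volume(S), sum(volume(c) for c in cells))  # 6.0 6.0
print(volume(((0.0, 0.0), (-INF, INF))))         # 0.0
```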

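3.6 Extension from a semi-algebra to the generated algebra

Theorem. Let $\mathcal{S}$ be a semi-algebra in $X$, and suppose that the set-function $\mu : \mathcal{S} \to [0, \infty]$ satisfies:
(i) $\mu(\emptyset) = 0$;
(ii) (finitely additive) if $S = \bigcup_{i=1}^k S_i$ as a disjoint union, then $\mu(S) = \sum_{i=1}^k \mu(S_i)$.
Then $\mu$ has a unique extension $\bar\mu$ to the algebra $\mathcal{A}$ generated by $\mathcal{S}$ such that the following property holds:
(a) (the extension is still finitely additive) If $A, B_1, \ldots, B_n \in \mathcal{A}$ and $A = \bigcup_{i=1}^n B_i$ is a disjoint union, then
$$\bar\mu(A) = \sum_{i=1}^n \bar\mu(B_i).$$
Moreover, $\bar\mu$ also has the following property:
(b) ($\bar\mu$ is finitely sub-additive) If $A, B_1, \ldots, B_n \in \mathcal{A}$ and $A \subseteq \bigcup_{i=1}^n B_i$ (not necessarily a disjoint union), then
$$\bar\mu(A) \le \sum_{i=1}^n \bar\mu(B_i).$$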

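Proof. We define $\bar\mu$ on $\mathcal{A}$ by writing any $A \in \mathcal{A}$ in the form $A = \bigcup_{i=1}^k S_i$ (disjoint union of members of $\mathcal{S}$) and putting $\bar\mu(A) = \sum_{i=1}^k \mu(S_i)$.
Note: this is the only possible way to define $\bar\mu(A)$ if we want property (a) to hold at all. Hence uniqueness of the extension is immediate.
But we have to check that $\bar\mu$ is well-defined, that is, the value of $\bar\mu(A)$ does not depend on the way we represented $A$ as a union. Suppose $A = \bigcup_{i=1}^k S_i = \bigcup_{j=1}^\ell T_j$. Then
$$S_i = \bigcup_{j=1}^\ell (S_i \cap T_j) \qquad \text{and} \qquad T_j = \bigcup_{i=1}^k (S_i \cap T_j).$$
This implies:
$$\sum_{i=1}^k \mu(S_i) \overset{(ii)}{=} \sum_{i=1}^k \sum_{j=1}^\ell \mu(S_i \cap T_j) \overset{(ii)}{=} \sum_{j=1}^\ell \mu(T_j).$$
So any two possible definitions of $\bar\mu(A)$ agree.
Now to see that (a) holds: write $B_i = \bigcup_{j=1}^{m_i} S_{i,j}$ as a disjoint union of members of $\mathcal{S}$. Then $A = \bigcup_{i=1}^n \bigcup_{j=1}^{m_i} S_{i,j}$, and hence
$$\bar\mu(A) \overset{\text{defn of } \bar\mu(A)}{=} \sum_{i=1}^n \sum_{j=1}^{m_i} \mu(S_{i,j}) \overset{\text{defn of } \bar\mu(B_i)}{=} \sum_{i=1}^n \bar\mu(B_i).$$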

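Finally, to see that (b) holds: start with $n = 1$, $B_1 = B$, so $A \subseteq B$. Then $B = A \cup (\underbrace{B \cap A^c}_{\in \mathcal{A}})$ is a disjoint union, so
$$\bar\mu(A) \le \bar\mu(A) + \bar\mu(B \cap A^c) = \bar\mu(B).$$
Now we handle $n > 1$. Let $F_k = B_1^c \cap \cdots \cap B_{k-1}^c \cap B_k \in \mathcal{A}$. Then $\bigcup_{i=1}^n B_i = \bigcup_{i=1}^n F_i$, and the latter is a disjoint union. Since $A = \bigcup_{i=1}^n (F_i \cap A)$ is then also a disjoint union, we have:
$$\bar\mu(A) \overset{(a)}{=} \sum_{i=1}^n \bar\mu(F_i \cap A) \overset{n=1 \text{ case}}{\le} \sum_{i=1}^n \bar\mu(F_i) \overset{n=1 \text{ case}}{\le} \sum_{i=1}^n \bar\mu(B_i).$$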

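Corollary. If $A = \bigcup_{i=1}^\infty B_i$ as a disjoint union, and $A, B_1, B_2, \ldots \in \mathcal{A}$, then
$$\bar\mu(A) \ge \sum_{i=1}^\infty \bar\mu(B_i). \tag{1}$$
Proof. Write $A = \big(\bigcup_{i=1}^n B_i\big) \cup C_n$ as a disjoint union of members of $\mathcal{A}$ (note that $C_n = A \cap \big(\bigcup_{i=1}^n B_i\big)^c \in \mathcal{A}$). Hence we get:
$$\bar\mu(A) \overset{(a)}{=} \bar\mu(B_1) + \cdots + \bar\mu(B_n) + \bar\mu(C_n) \ge \sum_{i=1}^n \bar\mu(B_i).$$
Letting $n \to \infty$, we get (1).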

Example. The theorem implies that the set-function $\lambda$ defined on boxes in Section 3.5 extends uniquely to a finitely additive set-function $\bar\lambda$ on the generated algebra (finite disjoint unions of boxes).
Note: Of course, we would like $=$ in (1), but this does not follow from the assumptions we made on $\mu$. For example, consider $X = \{1, 2, 3, \ldots\}$, $\mathcal{A}$ = finite or co-finite subsets, $\mu(A) = 0$ if $A$ is finite and $\mu(A) = 1$ if $A$ is co-finite. This is finitely additive, but the inequality in (1) is strict for $B_i = \{i\}$, $A = \{1, 2, 3, \ldots\}$: the left-hand side is $1$, while every term on the right-hand side is $0$. This set-function simply cannot be extended to a measure on the generated $\sigma$-algebra. We will need extra conditions on $\mu$ to guarantee that we can further extend to the generated $\sigma$-algebra.

3.7 Pre-measures

First, a brief summary of what we have achieved so far. Let $\mathcal{S}$ be a semi-algebra in $X$ and $\mathcal{A}$ the generated algebra.
Assume: $\mu$ is finitely additive on $\mathcal{S}$, that is:
  $\mu(S) = \sum_{i=1}^k \mu(S_i)$ if $S = \bigcup_{i=1}^k S_i$ (a disjoint union).
$\Downarrow$
Unique extension $\bar\mu$ to $\mathcal{A}$, which satisfies:
  $\bar\mu(A) = \sum_{i=1}^k \bar\mu(A_i)$ if $A = \bigcup_{i=1}^k A_i$ (a disjoint union);
  $\bar\mu(A) \le \sum_{i=1}^k \bar\mu(A_i)$ if $A \subseteq \bigcup_{i=1}^k A_i$ (not necessarily a disjoint union);
  $\bar\mu(A) \ge \sum_{i=1}^\infty \bar\mu(A_i)$ if $A = \bigcup_{i=1}^\infty A_i$ (a disjoint union).
Examples of $\mu$'s we have seen so far: $\lambda$ on boxes in $\mathbb{R}^d$, and $\mu_F$ on 1D boxes.
We want countable additivity for $\bar\mu$. As the Note at the end of Section 3.6 shows, for this it is not enough to assume only finite additivity of $\mu$, as above. We are going to assume that $\mu$ is countably sub-additive on $\mathcal{S}$, and show that this implies the same for $\bar\mu$ on $\mathcal{A}$. Such a $\bar\mu$ will be called a pre-measure.
Definition. Let $\mathcal{A}$ be an algebra in a set $X$. A set-function $\bar\mu : \mathcal{A} \to [0, \infty]$ is called a pre-measure if it satisfies:
(i) $\bar\mu(\emptyset) = 0$;
(ii) if $A_1, A_2, \ldots \in \mathcal{A}$ are disjoint and $A = \bigcup_{n=1}^\infty A_n \in \mathcal{A}$, then $\bar\mu(A) = \sum_{n=1}^\infty \bar\mu(A_n)$.
Note: In short, a pre-measure has the same defining properties as a measure, except that it is defined on an algebra, not a $\sigma$-algebra.
Note: There is an appropriate definition of pre-measure also when $\mathcal{A}$ is not an algebra (which you might see in some books), but it is more technical.

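Theorem. (An enhancement of Theorem 3.6) Let $\mathcal{S}$ be a semi-algebra in $X$ and $\mathcal{A}$ the generated algebra. Suppose that $\mu : \mathcal{S} \to [0, \infty]$ satisfies:
(i) $\mu(\emptyset) = 0$;
(ii) if $S = \bigcup_{i=1}^k S_i$, a disjoint union, then $\mu(S) = \sum_{i=1}^k \mu(S_i)$;
(iii) if $S = \bigcup_{i=1}^\infty S_i$, a disjoint union, then $\mu(S) \le \sum_{i=1}^\infty \mu(S_i)$.
Then $\mu$ has a unique extension $\bar\mu$ to $\mathcal{A}$ such that $\bar\mu$ is a pre-measure on $\mathcal{A}$.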

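Proof. In Theorem 3.6 we saw that (i) + (ii) imply the existence of a unique extension $\bar\mu$ on $\mathcal{A}$ satisfying 3.6(a), 3.6(b) and 3.6(1). What is left to check is that if $A = \bigcup_{n=1}^\infty A_n$, a disjoint union, with $A, A_1, A_2, \ldots \in \mathcal{A}$, then
$$\bar\mu(A) \le \sum_{n=1}^\infty \bar\mu(A_n). \tag{2}$$
(Because (2) together with 3.6(1) implies $\sigma$-additivity, the required property (ii) of a pre-measure.)
In order to show (2), for each $n \ge 1$ write
$$A_n = \bigcup_{j=1}^{k_n} S_{n,j}, \qquad \text{disjoint}, \quad S_{n,j} \in \mathcal{S}.$$
We have
$$\sum_{n=1}^\infty \bar\mu(A_n) = \sum_{n=1}^\infty \sum_{j=1}^{k_n} \mu(S_{n,j}). \tag{3}$$
For ease of reference, reindex the $S_{n,j}$'s as a single sequence $S_1, S_2, \ldots$; these are disjoint (since the $A_n$'s are also disjoint), and $A = \bigcup_{i=1}^\infty S_i$. Since $A \in \mathcal{A}$, we can also write $A = \bigcup_{j=1}^\ell T_j$ (disjoint, $T_j \in \mathcal{S}$). Then, for each $j$,
$$T_j = \bigcup_{i=1}^\infty (\underbrace{T_j \cap S_i}_{\in \mathcal{S}}), \qquad \text{so by (iii)} \qquad \mu(T_j) \le \sum_{i=1}^\infty \mu(T_j \cap S_i).$$
Hence, swapping the order of summation (all terms are non-negative) and using 3.6(a) for $A \cap S_i = \bigcup_{j=1}^\ell (T_j \cap S_i)$,
$$\bar\mu(A) = \sum_{j=1}^\ell \mu(T_j) \le \sum_{j=1}^\ell \sum_{i=1}^\infty \mu(T_j \cap S_i) = \sum_{i=1}^\infty \sum_{j=1}^\ell \mu(T_j \cap S_i) = \sum_{i=1}^\infty \bar\mu(A \cap S_i) = \sum_{i=1}^\infty \mu(S_i) \overset{\text{reindexing}}{=} \sum_{n=1}^\infty \sum_{j=1}^{k_n} \mu(S_{n,j}) \overset{(3)}{=} \sum_{n=1}^\infty \bar\mu(A_n).$$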

3.8 Verifying the conditions for boxes

Let $\mathcal{S}$ = boxes in $\mathbb{R}^d$, let $\mathcal{A}$ be the generated algebra, $\lambda$ the volume function defined in Section 3.5, and $\bar\lambda$ its extension to $\mathcal{A}$.
In this section we verify condition (iii) of Theorem 3.7 for $\lambda$, and, as a consequence, that $\bar\lambda$ is a pre-measure on the generated algebra $\mathcal{A}$.
Suppose that $S = \bigcup_{i=1}^\infty S_i$, a disjoint union, where $S, S_1, S_2, \ldots$ are boxes. We need to check that
$$\lambda(S) \le \sum_{i=1}^\infty \lambda(S_i). \tag{4}$$
You might want to pause here for a moment and try to think how you would prove this. It is very intuitive that the two sides should be equal, yet proving just the inequality above involves something non-trivial.
Before presenting the argument, I should mention that one could streamline the proof below somewhat; the reason I do not is that this way the argument will apply equally well to the set-functions $\mu_F$ we looked at on Problem Sheet 2.
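Proof of (4). First assume that $S$ is bounded, so
$$S = (a_1, b_1] \times \cdots \times (a_d, b_d].$$
We may assume that the sum on the right-hand side of (4) is $< \infty$ (otherwise (4) holds trivially). In particular, we may assume that $\lambda(S_i) < \infty$ for all $i \ge 1$. Discarding any empty $S_i$'s (which contribute nothing), each remaining $S_i$ is a non-empty box of finite volume, so all its sides are finite and
$$S_i = (a_1^i, b_1^i] \times \cdots \times (a_d^i, b_d^i].$$
Fix any $\varepsilon > 0$. We can find $\delta > 0$ such that for the box
$$S' = (a_1 + \delta, b_1] \times \cdots \times (a_d + \delta, b_d],$$
which is slightly smaller than $S$, we have
$$\lambda(S) \le \lambda(S') + \varepsilon. \tag{5}$$
Similarly, for each $i = 1, 2, \ldots$, we can choose $\delta_i > 0$ such that for the box
$$S_i' = (a_1^i, b_1^i + \delta_i] \times \cdots \times (a_d^i, b_d^i + \delta_i],$$
which is slightly larger than $S_i$, we have
$$\lambda(S_i') \le \lambda(S_i) + \frac{\varepsilon}{2^i}, \qquad i = 1, 2, \ldots. \tag{6}$$
Put
$$S'' := [a_1 + \delta, b_1] \times \cdots \times [a_d + \delta, b_d]; \qquad S_i'' := (a_1^i, b_1^i + \delta_i) \times \cdots \times (a_d^i, b_d^i + \delta_i),$$
and observe that $S''$ is compact and the $S_i''$'s are open. Moreover, we have a covering, because of the inclusions:
$$S'' \subseteq (a_1, b_1] \times \cdots \times (a_d, b_d] = \bigcup_{i=1}^\infty S_i \subseteq \bigcup_{i=1}^\infty S_i''. \tag{7}$$
Due to the Heine-Borel Theorem (see the Appendix), there exists a finite $N$ such that $S'' \subseteq \bigcup_{i=1}^N S_i''$. From this the following inclusion follows:
$$S' \subseteq S'' \subseteq \bigcup_{i=1}^N S_i'' \subseteq \bigcup_{i=1}^N S_i'.$$
Therefore
$$\lambda(S) \overset{(5)}{\le} \lambda(S') + \varepsilon \overset{\text{Thm 3.6(b)}}{\le} \sum_{i=1}^N \lambda(S_i') + \varepsilon \overset{(6)}{\le} \sum_{i=1}^N \lambda(S_i) + 2\varepsilon \le \sum_{i=1}^\infty \lambda(S_i) + 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, $\lambda(S) \le \sum_{i=1}^\infty \lambda(S_i)$, which is what we wanted to prove.
Finally, we deal with the case of unbounded $S$. Let $\tilde S \subseteq S$ be a bounded box, so that $\tilde S \subseteq \bigcup_{i=1}^\infty S_i$. We can repeat the previous argument and get
$$\lambda(\tilde S) \le \sum_{i=1}^\infty \lambda(S_i).$$
(For this, note that we only need inclusion, and not equality, to hold in the middle of (7).) Since this holds for any bounded box $\tilde S \subseteq S$, and $\lambda(S)$ is the supremum of $\lambda(\tilde S)$ over such boxes, (4) follows.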

A summary of the achievement of the last two sections:
Assume $\mu$ is:
  finitely additive on $\mathcal{S}$: $\mu(S) = \sum_{i=1}^k \mu(S_i)$ if $S = \bigcup_{i=1}^k S_i$ (a disjoint union);
  countably sub-additive on $\mathcal{S}$: $\mu(S) \le \sum_{i=1}^\infty \mu(S_i)$ if $S = \bigcup_{i=1}^\infty S_i$ (a disjoint union).
$\Downarrow$
Unique extension $\bar\mu$ to $\mathcal{A}$, a pre-measure, that is:
  $\bar\mu(A) = \sum_{i=1}^\infty \bar\mu(A_i)$ if $A = \bigcup_{i=1}^\infty A_i$ (a disjoint union).
Examples of $\mu$ we have seen that satisfy the assumptions above: $\lambda$ in $\mathbb{R}^d$, $\mu_F$ in 1D (Problem Sheets).
What is left of the construction of measures is to extend a pre-measure $\bar\mu$ defined on an algebra $\mathcal{A}$ to a measure on $\sigma(\mathcal{A})$. This involves two separate parts: showing that an extension exists, and proving that it is unique. The uniqueness part is the easier one, and we will start with that. In doing so, we will need to impose another restriction (which is, however, usually easy to verify). This is explained in the next section.

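3.9 $\sigma$-finiteness

The following concept is needed to ensure that a pre-measure $\bar\mu$ defined on an algebra $\mathcal{A}$ has at most one extension to a measure on $\sigma(\mathcal{A})$.
Definition. Let $\bar\mu$ be a pre-measure on an algebra $\mathcal{A}$ in $X$ (or a measure on a $\sigma$-algebra $\mathcal{M}$ in $X$). We say that $\bar\mu$ is $\sigma$-finite if there exist $A_1, A_2, \ldots \in \mathcal{A}$ (or $\mathcal{M}$) such that
$$\bigcup_{n=1}^\infty A_n = X \qquad \text{and} \qquad \bar\mu(A_n) < \infty, \quad n = 1, 2, \ldots.$$
Remark 1. If we need to, we may assume that $A_1 \subseteq A_2 \subseteq \cdots$. Simply take $A_n' = \bigcup_{k=1}^n A_k$ instead.
Remark 2. If we need to, we may assume that $A_1, A_2, \ldots$ are disjoint. Simply take $A_n' = A_n \setminus (A_1 \cup \cdots \cup A_{n-1})$ instead.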
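Both remarks are purely set-theoretic and can be illustrated on finite Python sets. This is a minimal sketch with hypothetical helper names; in the Remarks the $A_n$ are typically infinite sets, and the finiteness of $\bar\mu(A_n')$ (from finite sub-additivity in Remark 1, from monotonicity in Remark 2) is not modelled by the code.

```python
def make_increasing(sets):
    """Remark 1: replace A_n by A_1 | ... | A_n (the union is unchanged)."""
    result, acc = [], set()
    for A in sets:
        acc = acc | A
        result.append(acc)
    return result


def make_disjoint(sets):
    """Remark 2: replace A_n by A_n minus (A_1 | ... | A_{n-1}) (the union is unchanged)."""
    result, seen = [], set()
    for A in sets:
        result.append(A - seen)
        seen = seen | A
    return result


A = [{1, 2}, {2, 3}, {3, 4, 5}]
print([sorted(B) for B in make_increasing(A)])  # [[1, 2], [1, 2, 3], [1, 2, 3, 4, 5]]
print([sorted(B) for B in make_disjoint(A)])    # [[1, 2], [3], [4, 5]]
```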

3.10 Carathéodory's extension theorem

Theorem. Let $\bar\mu$ be a $\sigma$-finite pre-measure on an algebra $\mathcal{A}$ in $X$. Then $\bar\mu$ has a unique extension to a measure on $\sigma(\mathcal{A})$.
Example. By the work in Sections 3.7 and 3.8, the volume function $\bar\lambda$ is a pre-measure on the algebra $\mathcal{A}$ generated by the boxes. It is not difficult to check that it is $\sigma$-finite: $\bar\lambda((-n, n]^d) = (2n)^d < \infty$ for all $n \ge 1$, and $\mathbb{R}^d = \bigcup_{n \ge 1} (-n, n]^d$. It follows from the Theorem that there is a unique extension of $\bar\lambda$ to a measure $\lambda$ on $\sigma(\mathcal{A}) = \mathcal{B}$. We call this measure the $d$-dimensional Lebesgue measure.
The proof of the Theorem is quite technical, and, as mentioned earlier, the uniqueness part is easier. We do this next. Following that, we define the extension. The details of seeing that the definition indeed works and defines a measure will be non-examinable.

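3.11 Dynkin's lemma ($\pi$-$\lambda$ theorem)

In this section we prove a very useful lemma that helps with statements of the sort: if we know that a property holds for a certain (often rather small) collection of sets, then it necessarily holds for all sets in the generated $\sigma$-algebra.
Let $X$ be a non-empty set.
Definition.
A collection $\mathcal{P}$ of subsets of $X$ is called a $\pi$-system if it is closed under intersection:
(i) $A, B \in \mathcal{P} \Rightarrow A \cap B \in \mathcal{P}$.
A collection $\mathcal{L}$ of subsets of $X$ is called a $\lambda$-system if it has the following three properties:
(ii) $X \in \mathcal{L}$;
(iii) (closed under monotonic difference) $A, B \in \mathcal{L}$, $A \subseteq B \Rightarrow B \setminus A \in \mathcal{L}$;
(iv) (closed under monotonic union) $A_1, A_2, \ldots \in \mathcal{L}$, $A_1 \subseteq A_2 \subseteq \cdots \Rightarrow \bigcup_{n=1}^\infty A_n \in \mathcal{L}$.
Note: Sometimes a $\lambda$-system is called a $d$-system (after Dynkin).
Theorem (Dynkin's $\pi$-$\lambda$ theorem). If $\mathcal{P}$ is a $\pi$-system, $\mathcal{L}$ is a $\lambda$-system, and $\mathcal{P} \subseteq \mathcal{L}$, then also $\sigma(\mathcal{P}) \subseteq \mathcal{L}$.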
Proof. (Non-examinable)
Let (P) denote the smallest -system that contains P. (That there is
a smallest such -system can be seen similarly to the case of -algebras;
Theorem 1.5.) The Theorem will follow, once we show that:
(P) is a -algebra.

21

(a)

Indeed, then we get:

(P) (P) L.
The first inclusion holds because (P) is the smallest -algebra containing P, and by (a) (P) is one such -algebra. The second inclusion
holds because (P) is the smallest -system containing P, and L is one
such -system, by our assumption P L.
In order to prove (a), it is sufficient to prove:

(b)

Indeed, if we have (b), we get:

X (P), since (P) is a -system;
If A (P), then X A (P) (since this is a monotonic difference);
If A1 , A2 , (P), then
!c
n
n
[
\
Ai =
Aci
(P),
|{z}
i=1

i=1

{z

and then ni=1 Ai

i=1 Ai (P), as it is a monotonic union. The
three bullet points verify the three requirements for a -algebra, so
indeed (b) implies (a), and it is left to prove (b).
Proving (b) is the main part of the argument. For any set A (P),
we define
DA := {B X : A B (P)}.
We claim that:
DA is a -system.

(c)

Indeed:
A X = A (P), so X DA ;
If B, C DA , and B C, then A(CB) = (AC)(AB) (P),
since this is a monotonic difference of members of (P);
If B1 , B2 , DA and B1 B2 . . . , then A (
n=1 Bn ) =

n=1 (A Bn ) L, since this is a monotonic union of members of

(P).
The above verifies (c), saying that DA is a -system.
The remaining part of the argument is simply tracing definitions in the
right way. First we show that:
If A P, then P DA .

(d)

Indeed: for any B P, we have A B P, since P is closed under

intersection, and by the definition of DA we have B DA .

22

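Next, (c) + (d) imply that if $A \in \mathcal{P}$ then $\lambda(\mathcal{P}) \subseteq \mathcal{D}_A$ (since $\mathcal{D}_A$ is a $\lambda$-system containing $\mathcal{P}$, and $\lambda(\mathcal{P})$ is the smallest one). Tracing the definition of $\mathcal{D}_A$, this means that:
If $A \in \mathcal{P}$ and $B \in \lambda(\mathcal{P})$, then $A \cap B \in \lambda(\mathcal{P})$.
Swapping the roles of $A$ and $B$ in this statement, we can rephrase it as:
$$\text{If } A \in \lambda(\mathcal{P}) \text{ and } B \in \mathcal{P}, \text{ then } A \cap B \in \lambda(\mathcal{P}). \qquad (e)$$
But (e) says that if $A \in \lambda(\mathcal{P})$, then $\mathcal{P} \subseteq \mathcal{D}_A$. Since $\mathcal{D}_A$ is a $\lambda$-system by (c), and $\lambda(\mathcal{P})$ is the smallest $\lambda$-system containing $\mathcal{P}$, this implies that if $A \in \lambda(\mathcal{P})$, then $\lambda(\mathcal{P}) \subseteq \mathcal{D}_A$. By the definition of $\mathcal{D}_A$, the last statement says:
If $A \in \lambda(\mathcal{P})$ and $B \in \lambda(\mathcal{P})$, then $A \cap B \in \lambda(\mathcal{P})$.
And this is the statement (b) that we wanted to prove. The proof of the Theorem is complete.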
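On a finite set the theorem can be checked by brute force. The sketch below is our own illustration, not part of the notes, and the helper names `lambda_closure` and `sigma_closure` are hypothetical. Since $X$ is finite, an increasing sequence of subsets is eventually constant, so closure under increasing unions adds nothing and only $X$ and proper differences $C \setminus B$ ($B \subseteq C$) need to be generated. The example compares the generated $\lambda$-system with the generated $\sigma$-algebra for a $\pi$-system and for a family that is not closed under intersection.

```python
def lambda_closure(X, P):
    """Smallest family containing P that contains X and is closed under
    differences C - B with B a subset of C (finite X: increasing unions
    add nothing new)."""
    X = frozenset(X)
    fam = {frozenset(A) for A in P} | {X}
    changed = True
    while changed:
        changed = False
        for B in list(fam):
            for C in list(fam):
                if B <= C and (C - B) not in fam:
                    fam.add(C - B)
                    changed = True
    return fam


def sigma_closure(X, P):
    """Smallest family containing P that contains X and is closed under
    complements and unions (finite X, so finite unions suffice)."""
    X = frozenset(X)
    fam = {frozenset(A) for A in P} | {X}
    changed = True
    while changed:
        changed = False
        for A in list(fam):
            if (X - A) not in fam:
                fam.add(X - A)
                changed = True
            for B in list(fam):
                if (A | B) not in fam:
                    fam.add(A | B)
                    changed = True
    return fam


X = {1, 2, 3, 4}
pi_system = [{1, 2}, {2}]   # closed under intersection
not_pi = [{1, 2}, {2, 3}]   # {1,2} & {2,3} = {2} is missing

print(lambda_closure(X, pi_system) == sigma_closure(X, pi_system))   # True
print(len(lambda_closure(X, not_pi)), len(sigma_closure(X, not_pi)))  # 6 16
```

For the second family the $\lambda$-system has only 6 members and does not contain $\{2\}$, so the hypothesis that $\mathcal{P}$ is a $\pi$-system cannot be dropped.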

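3.12 Uniqueness lemma

Theorem. Suppose that $\mu_1$ and $\mu_2$ are measures on $\sigma$-algebras $\mathcal{M}_1$ and $\mathcal{M}_2$ in $X$. Suppose that $\mu_1$ and $\mu_2$ are equal on a $\pi$-system $\mathcal{P} \subseteq \mathcal{M}_1 \cap \mathcal{M}_2$. Suppose also that there exist $A_1, A_2, \ldots \in \mathcal{P}$ with $A_n \uparrow X$, such that $\mu_1(A_n) = \mu_2(A_n) < \infty$ for all $n \ge 1$. Then $\mu_1$ and $\mu_2$ are equal on $\sigma(\mathcal{P})$.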
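Proof. Fix $A \in \mathcal{P}$ such that $\mu_1(A) = \mu_2(A) < \infty$. Let
$$\mathcal{L} := \{B \in \sigma(\mathcal{P}) : \mu_1(A \cap B) = \mu_2(A \cap B)\}.$$
Since for any $B \in \mathcal{P}$ we have $A \cap B \in \mathcal{P}$, we have $\mathcal{P} \subseteq \mathcal{L}$ by assumption. In order to use the $\pi$-$\lambda$ theorem, we check that $\mathcal{L}$ is a $\lambda$-system:
• Since $\mu_1(A \cap X) = \mu_1(A) = \mu_2(A) = \mu_2(A \cap X)$, it follows that $X \in \mathcal{L}$.
• Suppose $B, C \in \mathcal{L}$ and $B \subseteq C$. We have:
$$\mu_1(A \cap (C \setminus B)) = \mu_1((A \cap C) \setminus (A \cap B)) \overset{\mu_1(A) < \infty}{=} \mu_1(A \cap C) - \mu_1(A \cap B) \overset{B, C \in \mathcal{L}}{=} \mu_2(A \cap C) - \mu_2(A \cap B) \overset{\mu_2(A) < \infty}{=} \mu_2((A \cap C) \setminus (A \cap B)) = \mu_2(A \cap (C \setminus B)).$$
Hence $C \setminus B \in \mathcal{L}$.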
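• Suppose $B_1, B_2, \ldots \in \mathcal{L}$ and $B_1 \subseteq B_2 \subseteq \cdots$. We have:
$$\mu_1\Big(A \cap \bigcup_{n=1}^\infty B_n\Big) = \mu_1\Big(\bigcup_{n=1}^\infty (A \cap B_n)\Big) \overset{\text{increasing union}}{=} \lim_{n\to\infty} \mu_1(A \cap B_n) \overset{B_1, B_2, \ldots \in \mathcal{L}}{=} \lim_{n\to\infty} \mu_2(A \cap B_n) \overset{\text{increasing union}}{=} \mu_2\Big(\bigcup_{n=1}^\infty (A \cap B_n)\Big) = \mu_2\Big(A \cap \bigcup_{n=1}^\infty B_n\Big).$$
Hence $\bigcup_{n=1}^\infty B_n \in \mathcal{L}$.
The above verifies the three requirements for $\mathcal{L}$ to be a $\lambda$-system. The $\pi$-$\lambda$ theorem implies that $\sigma(\mathcal{P}) \subseteq \mathcal{L}$, and hence that $\mu_1(A \cap B) = \mu_2(A \cap B)$ for all sets $B \in \sigma(\mathcal{P})$. We apply this now with $A = A_1, A_2, \ldots$ from the assumption of the Theorem. Since $A_n \cap B \uparrow B$, continuity from below gives, for all $B \in \sigma(\mathcal{P})$, the equality:
$$\mu_1(B) = \lim_{n\to\infty} \mu_1(A_n \cap B) = \lim_{n\to\infty} \mu_2(A_n \cap B) = \mu_2(B).$$
Hence the statement is proved.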

Corollary. Suppose that $\tilde\mu$ is a $\sigma$-finite pre-measure on an algebra $\mathcal{A}$. Then $\tilde\mu$ has at most one extension to a measure on $\sigma(\mathcal{A})$.
Proof. Suppose that $\mu_1$ and $\mu_2$ are both extensions of $\tilde\mu$. Then $\mu_1$ and $\mu_2$ are equal on $\mathcal{A}$, which is a $\pi$-system. The $\sigma$-finiteness assumption implies that the Theorem can be applied, and hence $\mu_1$ and $\mu_2$ are equal on $\sigma(\mathcal{A})$.
Remark. In particular, $d$-dimensional Lebesgue measure is uniquely determined on the Borel sets by specifying it on boxes.
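The $\pi$-system hypothesis in the Theorem cannot simply be dropped. The following small check is our own example (the point masses are made-up numbers, not taken from the notes): on $X = \{1, 2, 3, 4\}$, two probability measures agree on the family $\{\{1,2\}, \{2,3\}, X\}$, which is not closed under intersection, yet they differ on $\{2\} = \{1,2\} \cap \{2,3\}$, a set in the generated $\sigma$-algebra.

```python
from fractions import Fraction

# Two measures on X = {1, 2, 3, 4}, given by point masses (all subsets measurable).
mu1 = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}
mu2 = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}


def measure(weights, A):
    """Measure of the set A as the sum of its point masses."""
    return sum((weights[x] for x in A), Fraction(0))


family = [{1, 2}, {2, 3}, {1, 2, 3, 4}]   # not closed under intersection
agree = all(measure(mu1, A) == measure(mu2, A) for A in family)
print(agree)                                  # True
print(measure(mu1, {2}), measure(mu2, {2}))   # 1/4 0
```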

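3.13 Construction of the extension, Outer measures

Let $X$ be a non-empty set, and write $\mathcal{E}(X)$ for the collection of all subsets of $X$.
Definition 1. A set-function $\mu^*: \mathcal{E}(X) \to [0, \infty]$ is called an outer measure if it satisfies:
(i) $\mu^*(\emptyset) = 0$;
(ii) $\mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n)$ whenever $E \subseteq \bigcup_{n=1}^\infty E_n$ ($\sigma$-subadditivity).
Remark 1. Observe that (i) + (ii) imply monotonicity: if $E \subseteq F$, then $\mu^*(E) \le \mu^*(F)$, because $E \subseteq F \cup \emptyset \cup \emptyset \cup \cdots$.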

One way to obtain an outer measure is to start with an arbitrary set-function and cover sets as efficiently as possible, as we next show. Suppose, as in the earlier sections, that $\tilde\mu$ is a pre-measure on an algebra $\mathcal{A}$ in $X$.
Definition 2. The outer measure constructed from $\tilde\mu$ is defined as:
$$\mu^*(E) := \inf\Big\{ \sum_{n=1}^\infty \tilde\mu(A_n) : E \subseteq \bigcup_{n=1}^\infty A_n,\ A_1, A_2, \ldots \in \mathcal{A} \Big\}. \qquad (9)$$

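Lemma. The set-function $\mu^*$ is indeed an outer measure, in the sense of Definition 1.
Proof. The empty set $E = \emptyset$ can be covered with $\emptyset = A_1 = A_2 = \cdots$, showing that $\mu^*(\emptyset) = 0$.
In order to see $\sigma$-subadditivity, suppose that $E \subseteq \bigcup_{n=1}^\infty E_n$; we want to show that
$$\mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n). \qquad (10)$$
If for any $n \ge 1$ we have $\mu^*(E_n) = \infty$, then the inequality (10) holds trivially. Henceforth assume that $\mu^*(E_n) < \infty$ for all $n \ge 1$. Let $\varepsilon > 0$ be fixed. For each $n \ge 1$, due to the definition of $\mu^*(E_n)$ by an infimum, as well as its finiteness, we can find a covering $E_n \subseteq \bigcup_{k=1}^\infty A_{n,k}$, with $A_{n,1}, A_{n,2}, \ldots \in \mathcal{A}$, such that
$$\sum_{k=1}^\infty \tilde\mu(A_{n,k}) \le \mu^*(E_n) + \frac{\varepsilon}{2^n}. \qquad (11)$$
Since
$$E \subseteq \bigcup_{n=1}^\infty E_n \subseteq \bigcup_{n=1}^\infty \bigcup_{k=1}^\infty A_{n,k},$$
the doubly indexed collection $\{A_{n,k}\}_{n \ge 1, k \ge 1}$ is a countable covering of $E$ by elements of $\mathcal{A}$. Therefore, using (11), we have
$$\mu^*(E) \le \sum_{n=1}^\infty \sum_{k=1}^\infty \tilde\mu(A_{n,k}) \le \sum_{n=1}^\infty \Big(\mu^*(E_n) + \frac{\varepsilon}{2^n}\Big) = \sum_{n=1}^\infty \mu^*(E_n) + \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, we can let $\varepsilon \to 0$ to obtain $\mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n)$. This was the required inequality (10). This completes the proof that $\mu^*$ is an outer measure.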

Remark. Note that, apart from $\emptyset \in \mathcal{A}$ and $\tilde\mu(\emptyset) = 0$, we did not use in the proof that $\tilde\mu$ was a pre-measure, nor that it was defined on an algebra. So we could have started from any such set-function, and (9) would have given an outer measure. This is sometimes useful. The fact that $\tilde\mu$ is a pre-measure comes in when we want to show that $\mu^*$ is an extension of $\tilde\mu$.

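Example 1. Consider $\tilde\lambda$, defined on $\mathcal{A}$, the finite unions of boxes in $\mathbb{R}^d$. We call
$$\lambda^*(E) := \inf\Big\{ \sum_{n=1}^\infty \tilde\lambda(A_n) : E \subseteq \bigcup_{n=1}^\infty A_n,\ A_1, A_2, \ldots \in \mathcal{A} \Big\}$$
the $d$-dimensional Lebesgue outer measure. Writing each $A_n$ as a finite disjoint union of boxes, and re-indexing, $\lambda^*$ can also be written in terms of coverings with boxes:
$$\lambda^*(E) = \inf\Big\{ \sum_{n=1}^\infty \tilde\lambda(S_n) : E \subseteq \bigcup_{n=1}^\infty S_n,\ S_1, S_2, \ldots \in \mathcal{S} \Big\}.$$
The infimum is now over all countable coverings of $E$ with boxes.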
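For $d = 1$ the box formula turns the familiar fact that countable sets have Lebesgue outer measure zero into a short computation: cover the $n$-th point by an interval of length $\varepsilon 2^{-n}$, the same $\varepsilon/2^n$ budget as in (11). The sketch below is our own illustration (the function name `cover_points` and the sample points are hypothetical). It builds such a covering for the first few points of a list and checks that the total length stays below $\varepsilon$; for a whole sequence $q_1, q_2, \ldots$ the same covering has total length $\varepsilon$, so $\lambda^*(\{q_1, q_2, \ldots\}) \le \varepsilon$ for every $\varepsilon > 0$.

```python
from fractions import Fraction


def cover_points(points, eps):
    """Cover the n-th point (n = 1, 2, ...) by an open interval of length
    eps / 2**n centred at it; return the intervals as (left, right) pairs."""
    intervals = []
    for n, q in enumerate(points, start=1):
        half = Fraction(eps) / 2 ** (n + 1)
        intervals.append((q - half, q + half))
    return intervals


# the first few points of an enumeration of the rationals in [0, 1]
points = [Fraction(0), Fraction(1), Fraction(1, 2), Fraction(1, 3),
          Fraction(2, 3), Fraction(1, 4), Fraction(3, 4)]
eps = Fraction(1, 100)
boxes = cover_points(points, eps)
total = sum(b - a for a, b in boxes)
print(total < eps, all(a < q < b for q, (a, b) in zip(points, boxes)))  # True True
```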

The following is the crucial definition for the construction.
Definition 3. Let $\mu^*$ be an outer measure on $\mathcal{E}(X)$. We say that a set $E \subseteq X$ is $\mu^*$-measurable if for all $T \subseteq X$ we have:
$$\mu^*(T) = \mu^*(T \cap E) + \mu^*(T \cap E^c). \qquad (12)$$
Note: One way to remember the formula (12) is that it says: $E$ cuts every set $T$ nicely (nice meaning: additively).
Remark 2. (Important) Since an outer measure is $\sigma$-subadditive, and $\mu^*(\emptyset) = 0$, the inequality $\mu^*(T) \le \mu^*(T \cap E) + \mu^*(T \cap E^c)$ always holds trivially. Hence, in showing that a given set $E \subseteq X$ is $\mu^*$-measurable, it will always be enough to show that $\mu^*(T) \ge \mu^*(T \cap E) + \mu^*(T \cap E^c)$ for all $T \subseteq X$.
Example 2. With the Lebesgue outer measure $\lambda^*$, we say that a set $E \subseteq \mathbb{R}^d$ is Lebesgue measurable if it is $\lambda^*$-measurable, in the sense of Definition 3.

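Theorem (Main theorem). Suppose that $\tilde\mu$ is a pre-measure on an algebra $\mathcal{A}$ in $X$, and let $\mu^*$ be the outer measure constructed from $\tilde\mu$, according to Definition 2.
(i) The collection $\mathcal{M}^*$ of $\mu^*$-measurable sets is a $\sigma$-algebra.
(ii) $\mathcal{M}^* \supseteq \mathcal{A}$, and hence $\mathcal{M}^* \supseteq \sigma(\mathcal{A})$.
(iii) $\mu^*$ restricted to $\mathcal{M}^*$ is a measure.
(iv) $\mu^*|_{\mathcal{A}} = \tilde\mu$.
In particular, $\mu^*$ extends $\tilde\mu$ to a measure on a $\sigma$-algebra that is at least as large as $\sigma(\mathcal{A})$.
Example 3. Consider the Lebesgue outer measure $\lambda^*$, and let $\mathcal{M}^*$ denote the collection of Lebesgue measurable sets. Then $\mathcal{M}^* \supseteq \mathcal{B}$, and $\lambda^*$ restricted to $\mathcal{M}^*$ is called $d$-dimensional Lebesgue measure.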
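The Main Theorem can be watched in action on a four-point space. The sketch below is a toy example of our own: the algebra $\mathcal{A} = \{\emptyset, \{0,1\}, \{2,3\}, X\}$, the values $\tilde\mu(\{0,1\}) = 1$, $\tilde\mu(\{2,3\}) = 2$, and all function names are hypothetical. Because $X$ is finite, the countable coverings in (9) can be replaced by finite subfamilies of $\mathcal{A}$ (repeating a set never lowers the sum), so $\mu^*$ is computed by brute force, and the condition (12) is tested for every $T \subseteq X$.

```python
from itertools import chain, combinations


def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    X = list(X)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))]


X = frozenset({0, 1, 2, 3})
pre = {frozenset(): 0, frozenset({0, 1}): 1, frozenset({2, 3}): 2, X: 3}


def outer(E):
    """mu*(E) as in (9): cheapest covering of E by members of the algebra."""
    E = frozenset(E)
    best = float("inf")
    for r in range(len(pre) + 1):
        for fam in combinations(pre, r):
            if E <= frozenset().union(*fam):
                best = min(best, sum(pre[A] for A in fam))
    return best


def measurable(E):
    """Caratheodory's condition (12), checked for every test set T."""
    return all(outer(T) == outer(T & E) + outer(T - E) for T in subsets(X))


M = [E for E in subsets(X) if measurable(E)]
print(sorted(map(sorted, M)))                 # exactly the four sets of the algebra
print(all(outer(A) == pre[A] for A in pre))   # True: mu* agrees with the pre-measure on A
```

In this toy case $\mathcal{M}^*$ coincides with $\mathcal{A}$; for example $\{0\}$ fails (12) with $T = \{0, 1\}$, since $\mu^*(\{0,1\}) = 1$ while $\mu^*(\{0\}) + \mu^*(\{1\}) = 2$.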
Proof of Main Theorem. (Non-examinable) We start with the proofs of (i) and (iii). It is an important fact that these hold regardless of how the outer measure was obtained, so we state them in this greater generality in the next two lemmas.
Lemma 1. Let $\mu^*$ be any outer measure on $\mathcal{E}(X)$, and let $\mathcal{M}^*$ denote the collection of $\mu^*$-measurable sets. Then $\mathcal{M}^*$ is a $\sigma$-algebra.
Proof. It is easy to check that $X \in \mathcal{M}^*$. Indeed, for any $T \subseteq X$ we have
$$\mu^*(T) = \mu^*(T \cap X) = \mu^*(T \cap X) + \mu^*(\emptyset) = \mu^*(T \cap X) + \mu^*(T \cap X^c).$$
It is also easy to see that $\mathcal{M}^*$ is closed under complements, since the definition (12) of $\mu^*$-measurability is symmetric in $E$ and $E^c$.
We show that $\mathcal{M}^*$ is closed under countable intersections. This is sufficient to finish the proof, because taking complements we get that $\mathcal{M}^*$ is closed under countable unions. So suppose that $E_1, E_2, \ldots \in \mathcal{M}^*$, and recall that it is sufficient to show that for any $T \subseteq X$ we have
$$\mu^*(T) \ge \mu^*\Big(T \cap \Big(\bigcap_{j=1}^\infty E_j\Big)^c\Big) + \mu^*\Big(T \cap \bigcap_{j=1}^\infty E_j\Big). \qquad (13)$$
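Using $\mu^*$-measurability of $E_1, E_2, \ldots$ in turn, with the role of the test set $T$ being played by $T, T \cap E_1, T \cap E_1 \cap E_2, \ldots$, we get:
$$\begin{aligned}
\mu^*(T) &= \mu^*(T \cap E_1^c) + \mu^*(T \cap E_1) \\
&= \mu^*(T \cap E_1^c) + \mu^*(T \cap E_1 \cap E_2^c) + \mu^*(T \cap E_1 \cap E_2) \\
&= \mu^*(T \cap E_1^c) + \mu^*(T \cap E_1 \cap E_2^c) + \mu^*(T \cap E_1 \cap E_2 \cap E_3^c) + \mu^*(T \cap E_1 \cap E_2 \cap E_3) \\
&\;\;\vdots \\
&= \sum_{i=1}^{n} \mu^*\Big(T \cap \Big(\bigcap_{j=1}^{i-1} E_j\Big) \cap E_i^c\Big) + \mu^*\Big(T \cap \bigcap_{j=1}^{n} E_j\Big).
\end{aligned}$$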

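Due to monotonicity, the last term on the right hand side is at least $\mu^*\big(T \cap \bigcap_{j=1}^\infty E_j\big)$, so we have:
$$\mu^*(T) \ge \sum_{i=1}^{n} \mu^*\Big(T \cap \Big(\bigcap_{j=1}^{i-1} E_j\Big) \cap E_i^c\Big) + \mu^*\Big(T \cap \bigcap_{j=1}^\infty E_j\Big).$$
Since this inequality holds for any $n \ge 1$, we can let $n \to \infty$ on the right hand side and obtain:
$$\mu^*(T) \ge \sum_{i=1}^{\infty} \mu^*\Big(T \cap \Big(\bigcap_{j=1}^{i-1} E_j\Big) \cap E_i^c\Big) + \mu^*\Big(T \cap \bigcap_{j=1}^\infty E_j\Big).$$
The union of the sets $T \cap \big(\bigcap_{j=1}^{i-1} E_j\big) \cap E_i^c$ (that appear in the sum) over $i = 1, 2, \ldots$ equals $T \cap \big(\bigcup_{j=1}^\infty E_j^c\big) = T \cap \big(\bigcap_{j=1}^\infty E_j\big)^c$. Hence, by $\sigma$-subadditivity of $\mu^*$, the sum is at least $\mu^*\big(T \cap (\bigcap_{j=1}^\infty E_j)^c\big)$, and we get
$$\mu^*(T) \ge \mu^*\Big(T \cap \Big(\bigcap_{j=1}^\infty E_j\Big)^c\Big) + \mu^*\Big(T \cap \bigcap_{j=1}^\infty E_j\Big).$$
And this is the inequality (13) we wanted to show. The proof is complete.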
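Lemma 2. Let $\mu^*$ be any outer measure on $\mathcal{E}(X)$, and let $\mathcal{M}^*$ be the collection of $\mu^*$-measurable sets. Then $\mu^*|_{\mathcal{M}^*}$ is a measure.
Proof. It is easy to see finite additivity: let $E, F \in \mathcal{M}^*$ be disjoint. Using the $\mu^*$-measurability of $F$ with $T = E \cup F$, we have:
$$\mu^*(E \cup F) = \mu^*((E \cup F) \cap F) + \mu^*((E \cup F) \cap F^c) = \mu^*(F) + \mu^*(E).$$
Since due to Lemma 1 $\mathcal{M}^*$ is an algebra, finite additivity follows by induction.
We saw in Eqn. 1 of Corollary 3.6 that finite additivity on an algebra implies countable super-additivity: if $E = \bigcup_{n=1}^\infty E_n$ as a disjoint union, with $E_1, E_2, \ldots \in \mathcal{M}^*$, then
$$\mu^*(E) \ge \sum_{n=1}^\infty \mu^*(E_n).$$
On the other hand, $\sigma$-subadditivity of the outer measure gives
$$\mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n).$$
This proves that $\mu^*$ is $\sigma$-additive on $\mathcal{M}^*$, and the lemma follows.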

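We next prove (ii) in the Main Theorem. Here we need to use that $\tilde\mu$ is finitely additive, and that the outer measure $\mu^*$ was constructed using coverings with elements of $\mathcal{A}$.
Lemma 3. Under the assumptions of the Main Theorem, we have $\mathcal{M}^* \supseteq \mathcal{A}$, and consequently $\mathcal{M}^* \supseteq \sigma(\mathcal{A})$.
Proof. We need to show that any $A \in \mathcal{A}$ is $\mu^*$-measurable, and for this it is enough to show that for any $T \subseteq X$ we have
$$\mu^*(T) \ge \mu^*(T \cap A) + \mu^*(T \cap A^c). \qquad (14)$$
We may assume that $\mu^*(T) < \infty$, otherwise the inequality holds trivially. Fix any $\varepsilon > 0$. By the definition of $\mu^*$, we can find a covering $T \subseteq \bigcup_{n=1}^\infty B_n$, with $B_1, B_2, \ldots \in \mathcal{A}$, such that
$$\mu^*(T) + \varepsilon \ge \sum_{n=1}^\infty \tilde\mu(B_n).$$
Writing $\tilde\mu(B_n) = \tilde\mu(B_n \cap A) + \tilde\mu(B_n \cap A^c)$, this can be reformulated as:
$$\mu^*(T) + \varepsilon \ge \sum_{n=1}^\infty \tilde\mu(B_n \cap A) + \sum_{n=1}^\infty \tilde\mu(B_n \cap A^c). \qquad (15)$$
Since $T \cap A \subseteq \bigcup_{n=1}^\infty (B_n \cap A)$ and $T \cap A^c \subseteq \bigcup_{n=1}^\infty (B_n \cap A^c)$ are coverings by elements of $\mathcal{A}$, the right hand side of (15) is bounded below by $\mu^*(T \cap A) + \mu^*(T \cap A^c)$. This gives:
$$\mu^*(T) + \varepsilon \ge \mu^*(T \cap A) + \mu^*(T \cap A^c).$$
Since $\varepsilon > 0$ was arbitrary, we can let $\varepsilon \to 0$, and this gives the required (14).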
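Finally, we prove (iv) of the Main Theorem. This is where we use that $\tilde\mu$ is a pre-measure.
Lemma 4. Under the assumptions of the Main Theorem, $\mu^*|_{\mathcal{A}} = \tilde\mu$.
Proof. Let $A \in \mathcal{A}$. Since $A$ is covered by $A \cup \emptyset \cup \emptyset \cup \cdots$, we immediately have $\mu^*(A) \le \tilde\mu(A)$. What we need to show is that there does not exist a countable covering of $A$ that is more efficient than $A$ itself. For this, consider any covering $A \subseteq \bigcup_{n=1}^\infty B_n$, with $B_1, B_2, \ldots \in \mathcal{A}$. Let $B_n' := A \cap B_n$, $n \ge 1$; these sets still cover $A$, and are in $\mathcal{A}$. We have $A = \bigcup_{n=1}^\infty B_n'$. Further, let us make the sets disjoint, that is, consider $B_n'' := B_n' \cap \big((B_1')^c \cap \cdots \cap (B_{n-1}')^c\big)$. Then $A = \bigcup_{n=1}^\infty B_n''$ as a disjoint union.
Using that $\tilde\mu$ is a pre-measure (and hence monotone on $\mathcal{A}$), we have:
$$\tilde\mu(A) = \sum_{n=1}^\infty \tilde\mu(B_n'') \le \sum_{n=1}^\infty \tilde\mu(B_n') \le \sum_{n=1}^\infty \tilde\mu(B_n).$$
Since the covering $A \subseteq \bigcup_{n=1}^\infty B_n$ was arbitrary, we get $\tilde\mu(A) \le \mu^*(A)$. This completes the proof.
Together Lemmas 1, 2, 3, 4 prove the Main Theorem.

Together the Main Theorem and the Uniqueness Lemma complete the proof of Carathéodory's Extension Theorem.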
(Non-examinable)

3.14 Example of a set that is not Lebesgue measurable

Define the following relation between elements of the interval $[0, 1]$: for $x, y \in [0, 1]$, we write $x \sim y$ if $x - y \in \mathbb{Q}$. This is an equivalence relation:
(i) $x \sim x$, because $x - x = 0 \in \mathbb{Q}$;
(ii) $x \sim y$ implies $y \sim x$, because $y - x = -(x - y) \in \mathbb{Q}$;
(iii) $x \sim y$ and $y \sim z$ imply $x \sim z$, because $x - z = (x - y) + (y - z) \in \mathbb{Q}$.
Consider the equivalence classes with respect to the equivalence relation $\sim$. Let $E$ be a set that contains exactly one number from each equivalence class. We show that $E$ is not Lebesgue measurable.
For each $q \in [-1, 1]$, denote $E_q := E + q$. We claim that the sets $E_q$, $q \in [-1, 1] \cap \mathbb{Q}$, are pairwise disjoint. Indeed, suppose we had $x \in E_q \cap E_r$, with $q, r \in [-1, 1] \cap \mathbb{Q}$. Then due to the definitions of $E_q$ and $E_r$, we must have $x = y + q = z + r$ for some $y, z \in E$. But then $y - z = r - q \in \mathbb{Q}$, and hence $y \sim z$. But since $E$ contains exactly one number from each equivalence class, we can only have $y \sim z$ if $y = z$, and hence $q = r$. This proves the disjointness claim.
Next we claim that
$$[0, 1] \subseteq \bigcup_{q \in [-1, 1] \cap \mathbb{Q}} E_q \subseteq [-1, 2]. \qquad (16)$$
The second inclusion is obvious, because $E \subseteq [0, 1]$ and each $q$ is in $[-1, 1]$. To see the first inclusion, let $x \in [0, 1]$. Then there exists a unique $y \in E$ such that $x \sim y$. Let $q = x - y \in [-1, 1] \cap \mathbb{Q}$. Then $x = y + q \in E_q$, so $x$ is an element of the middle set in (16).

Suppose that $E$ was Lebesgue measurable. Then so are all the sets $E_q$, and $\lambda(E_q) = \lambda(E)$, because Lebesgue outer measure is translation invariant (because the volumes of boxes are, and Lebesgue outer measure was defined in terms of those). From (16) we have:
$$0 < 1 = \lambda([0, 1]) \le \lambda\Big(\bigcup_{q \in [-1,1] \cap \mathbb{Q}} E_q\Big) = \sum_{q \in [-1,1] \cap \mathbb{Q}} \lambda(E_q) = \sum_{q \in [-1,1] \cap \mathbb{Q}} \lambda(E).$$
It follows from this that we must have $\lambda(E) > 0$. But then, again from (16), we get
$$3 = \lambda([-1, 2]) \ge \lambda\Big(\bigcup_{q \in [-1,1] \cap \mathbb{Q}} E_q\Big) = \sum_{q \in [-1,1] \cap \mathbb{Q}} \lambda(E_q) = \sum_{q \in [-1,1] \cap \mathbb{Q}} \lambda(E) = \infty,$$
because an infinite sum, all of whose terms are a fixed positive number, is $\infty$. This is a contradiction, so $E$ cannot be Lebesgue measurable.
Remark 1. In defining the set $E$ we used the so-called Axiom of Choice, which asserts that if there is a family of non-empty sets $\{A_\alpha : \alpha \in I\}$, then there exists a function $f: I \to \bigcup_{\alpha \in I} A_\alpha$ such that $f(\alpha) \in A_\alpha$ for each $\alpha \in I$. That is, there exists a choice function that selects one element from each set in the family. In our case the sets $A_\alpha$ are the equivalence classes with respect to $\sim$.

4 Measurable functions and their properties

4.1 Open sets in $[-\infty, \infty]$

Sometimes it will be useful to consider functions taking values in $[-\infty, \infty]$, rather than in $\mathbb{R}$. The following two definitions adapt the notion of open set to this setting.
Definition 1. We say that $I \subseteq [-\infty, \infty]$ is an open interval if either $I \subseteq \mathbb{R}$ and $I$ is an open interval in the ordinary sense, or if $I$ has one of the following forms:
• $[-\infty, b)$ where $-\infty < b \le \infty$; or
• $(a, \infty]$ where $-\infty \le a < \infty$; or
• $[-\infty, \infty]$.
Definition 2. We say that $U \subseteq [-\infty, \infty]$ is open if for any $x \in U$ there exists an open interval $I$ such that $x \in I \subseteq U$. A set $F \subseteq [-\infty, \infty]$ is closed if $F^c = [-\infty, \infty] \setminus F$ is open.

4.2 Continuous functions

Notation (inverse image): For any function $f: X \to Y$ between two sets, and any $B \subseteq Y$, we denote the inverse image of $B$ under $f$ as
$$f^{-1}(B) := \{x \in X : f(x) \in B\}.$$
Note that nothing is implied about $f$ having an inverse: in general, it will not have one. The notation $f^{-1}(B)$ is simply a shorthand for the set of points in $X$ that $f$ maps into the set $B$.
Definition 1. Let $U \subseteq \mathbb{R}^d$ be an open set. A function $f: U \to \mathbb{R}^k$ is called continuous if for all open sets $V \subseteq \mathbb{R}^k$ the set $f^{-1}(V)$ is an open subset of $U$.
Note: A similar definition can be made for functions $f: U \to [-\infty, \infty]$.
Remark. (HW) The above definition of continuity is equivalent to the usual definition of continuity: for all $x \in U$ and all $\varepsilon > 0$, there exists $\delta > 0$ such that whenever $y \in U$ and $|y - x| < \delta$, then $|f(y) - f(x)| < \varepsilon$.

4.3 Measurable functions

Definition 1. Let $(X, \mathcal{M})$ be a measurable space. We say that a function $f: X \to \mathbb{R}^k$ is measurable if for all open sets $V \subseteq \mathbb{R}^k$, the set $f^{-1}(V) \in \mathcal{M}$.
Note: When more than one $\sigma$-algebra is considered in $X$ (a situation common in probability), then we say $f$ is $\mathcal{M}$-measurable, in order to clarify which $\sigma$-algebra is meant.
We sometimes need the following.
Definition 2. Let $(X, \mathcal{M})$ be a measurable space. We say that a function $f: X \to [-\infty, \infty]$ is measurable if for all open $V \subseteq [-\infty, \infty]$, the set $f^{-1}(V) \in \mathcal{M}$.
Example 1. Let $E \in \mathcal{M}$. Then $\mathbf{1}_E: X \to \mathbb{R}$ defined by
$$\mathbf{1}_E(x) := \begin{cases} 1 & \text{if } x \in E; \\ 0 & \text{if } x \notin E, \end{cases}$$
is a measurable function (HW).
Example 2. Let $U \subseteq \mathbb{R}^d$ be open, and let
$$\mathcal{B}_U := \{B \subseteq U : B \in \mathcal{B}\} = \sigma(\text{open subsets of } U) \quad \text{(HW)},$$
so that $(U, \mathcal{B}_U)$ is a measurable space. Suppose that $f: U \to \mathbb{R}^k$ is continuous. Then $f$ is also measurable on $(U, \mathcal{B}_U)$, as can be seen as follows. Let $V \subseteq \mathbb{R}^k$ be open. Then $f^{-1}(V) \subseteq U$ is open, and hence it is in $\mathcal{B}_U$. Therefore, $f$ is measurable on $(U, \mathcal{B}_U)$.

4.4 Compositions of functions

Theorem 1. Let $U \subseteq \mathbb{R}^d$ be an open set. If $f: X \to U$ is measurable, and $g: U \to \mathbb{R}^k$ is continuous, then $h = g \circ f$, defined by $h(x) = g(f(x))$, $h: X \to \mathbb{R}^k$, is measurable.
Proof. Let $V \subseteq \mathbb{R}^k$ be open. We have:
$$h^{-1}(V) = \{x \in X : h(x) \in V\} = \{x \in X : g(f(x)) \in V\} = \{x \in X : f(x) \in g^{-1}(V)\} = \underbrace{f^{-1}\big(\underbrace{g^{-1}(V)}_{\text{open by assump. on } g}\big)}_{\in \mathcal{M} \text{ by assump. on } f}.$$

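Theorem 2. Let $u, v: X \to \mathbb{R}$ be measurable, let $\Phi: \mathbb{R}^2 \to \mathbb{R}$ be continuous, and put $h(x) = \Phi(u(x), v(x))$, $h: X \to \mathbb{R}$. Then $h$ is measurable.
Proof. Put $f(x) = (u(x), v(x))$, so that $f: X \to \mathbb{R}^2$, and $h = \Phi \circ f$. Due to Theorem 1, it is enough to show that $f$ is measurable. Let $R \subseteq \mathbb{R}^2$ be any open rectangle, that is $R = I_1 \times I_2$ for open intervals $I_1, I_2 \subseteq \mathbb{R}$. We have:
$$\begin{aligned} f^{-1}(R) &= \{x \in X : f(x) \in R\} = \{x \in X : (u(x), v(x)) \in I_1 \times I_2\} \\ &= \{x \in X : u(x) \in I_1\} \cap \{x \in X : v(x) \in I_2\} \\ &= u^{-1}(I_1) \cap v^{-1}(I_2) \in \mathcal{M}. \end{aligned}$$
Hence the inverse image of any open rectangle is measurable. Now if $V \subseteq \mathbb{R}^2$ is any open set, we can write $V = \bigcup_{i=1}^\infty R_i$, with all $R_i$'s open rectangles (HW). Then we have
$$f^{-1}(V) = f^{-1}\Big(\bigcup_{i=1}^\infty R_i\Big) \overset{\text{(HW)}}{=} \bigcup_{i=1}^\infty \underbrace{f^{-1}(R_i)}_{\in \mathcal{M}} \in \mathcal{M}.$$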

Corollary.
(i) If $f: X \to \mathbb{R}$ is measurable, so are $|f|$, $f^+ := \max\{f, 0\}$ and $f^- := \max\{-f, 0\}$.
(ii) If $f, g: X \to \mathbb{R}$ are measurable, so are $f + g$ and $fg$.
(iii) If $f: X \to \mathbb{C}$ is measurable, there exists a measurable function $\alpha: X \to \mathbb{C}$ such that $|\alpha| \equiv 1$ and $f = \alpha |f|$.
Proof. (i) Follows from Theorem 1 with $g(y) = |y|$, $g(y) = \max\{y, 0\}$ and $g(y) = \max\{-y, 0\}$.
(ii) Follows from Theorem 2 with $\Phi(s, t) = s + t$ and $\Phi(s, t) = st$.
(iii) Put $X_0 := f^{-1}(\mathbb{C} \setminus \{0\}) \in \mathcal{M}$. The restriction $f|_{X_0}: X_0 \to \mathbb{C} \setminus \{0\}$ is measurable (with respect to $\mathcal{M}_0 := \{E \subseteq X_0 : E \in \mathcal{M}\}$), and $g: \mathbb{C} \setminus \{0\} \to \mathbb{C}$ defined by $g(z) = z/|z|$ is continuous. Therefore, on $X_0$ we can define $\alpha(x) = g(f(x))$, and on $X \setminus X_0$ we can set $\alpha(x) = 1$.
Note: It is not difficult to check that (i) remains true if $f: X \to [-\infty, \infty]$.
Note: In general, I will not spell out the case of complex-valued functions in these notes. These do not present any major difficulty compared to the real-valued case, as you can check by consulting the text.

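4.5 Borel functions

Definition 1. Consider the measurable space $(\mathbb{R}^d, \mathcal{B})$, or more generally the space $(U, \mathcal{B}_U)$, where $U \subseteq \mathbb{R}^d$ is open. A function $g: U \to \mathbb{R}$ is called a Borel function if it is measurable with respect to $(U, \mathcal{B}_U)$.
Theorem. Let $\mathcal{M}$ be a $\sigma$-algebra in $X$, and $f: X \to \mathbb{R}^d$ a function.
(i) Let $\mathcal{F} := \{E \subseteq \mathbb{R}^d : f^{-1}(E) \in \mathcal{M}\}$. Then $\mathcal{F}$ is a $\sigma$-algebra in $\mathbb{R}^d$.
(ii) If $f$ is measurable, and $E$ is a Borel set in $\mathbb{R}^d$, then $f^{-1}(E) \in \mathcal{M}$.
(iii) If $f: X \to \mathbb{R}$ and $f^{-1}((\alpha, \infty)) \in \mathcal{M}$ for every $\alpha \in \mathbb{R}$, then $f$ is measurable.
(iii′) If $f: X \to [-\infty, \infty]$ and $f^{-1}((\alpha, \infty]) \in \mathcal{M}$ for every $\alpha \in \mathbb{R}$, then $f$ is measurable.
(iv) If $f: X \to \mathbb{R}^d$ is measurable, $g: \mathbb{R}^d \to \mathbb{R}$ is a Borel function, and $h = g \circ f$, then $h: X \to \mathbb{R}$ is also measurable.
Proof. (i) We check the three properties for $\mathcal{F}$ to be a $\sigma$-algebra. First, $f^{-1}(\mathbb{R}^d) = X \in \mathcal{M}$, so $\mathbb{R}^d \in \mathcal{F}$. Second, if $E \in \mathcal{F}$, then
$$f^{-1}(\mathbb{R}^d \setminus E) = X \setminus \underbrace{f^{-1}(E)}_{\in \mathcal{M}} \in \mathcal{M},$$
and hence $\mathbb{R}^d \setminus E \in \mathcal{F}$, so $\mathcal{F}$ is closed under complements. Finally, if $E_1, E_2, \ldots \in \mathcal{F}$, then
$$f^{-1}\Big(\bigcup_{i=1}^\infty E_i\Big) = \bigcup_{i=1}^\infty \underbrace{f^{-1}(E_i)}_{\in \mathcal{M}} \in \mathcal{M},$$
so $\bigcup_{i=1}^\infty E_i \in \mathcal{F}$, and $\mathcal{F}$ is closed under countable unions. This completes the proof that $\mathcal{F}$ is a $\sigma$-algebra.
(ii) If $f$ is measurable, then all open sets of $\mathbb{R}^d$ are in $\mathcal{F}$. Since by part (i) the collection $\mathcal{F}$ is a $\sigma$-algebra, it follows that $\mathcal{B} \subseteq \mathcal{F}$. This proves statement (ii).
(iii), (iii′) We only prove (iii′); the proof of (iii) is very similar. Let $\mathcal{F}' = \{E \subseteq [-\infty, \infty] : f^{-1}(E) \in \mathcal{M}\}$. By the same proof as in part (i), $\mathcal{F}'$ is a $\sigma$-algebra. Choose $\beta \in \mathbb{R}$, and choose $\alpha_n < \beta$ such that $\alpha_n \uparrow \beta$. Then
$$[-\infty, \beta) = \bigcup_{n=1}^\infty [-\infty, \alpha_n] = \bigcup_{n=1}^\infty \big(\underbrace{(\alpha_n, \infty]}_{\in \mathcal{F}' \text{ by assump.}}\big)^c \in \mathcal{F}'.$$
It follows that
$$(\alpha, \beta) = [-\infty, \beta) \cap (\alpha, \infty] \in \mathcal{F}'.$$
Every open set $V \subseteq [-\infty, \infty]$ is a countable union of such segments and of sets of the form $[-\infty, \beta)$ and $(\alpha, \infty]$, so all open sets in $[-\infty, \infty]$ belong to $\mathcal{F}'$. This implies that $f$ is measurable.
(iv) Let $V \subseteq \mathbb{R}$ be open. Then
$$h^{-1}(V) = f^{-1}\big(\underbrace{g^{-1}(V)}_{\in \mathcal{B}}\big) \in \mathcal{M}$$
by part (ii).
Remark. Statement (iv) remains true if $\mathbb{R}^d$ is replaced by $[-\infty, \infty]$, $[0, \infty]$, or in fact any Borel subset of $[-\infty, \infty]$.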

4.6

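Let $(X, \mathcal{M})$ be a measurable space.
Theorem. If $f_n: X \to [-\infty, \infty]$, $n \ge 1$, are measurable functions, then
$$g := \sup_{n \ge 1} f_n \quad \text{and} \quad h := \limsup_{n \to \infty} f_n$$
are also measurable. Similarly for $\inf_{n \ge 1} f_n$ and $\liminf_{n \to \infty} f_n$.
Proof. For any $\alpha \in \mathbb{R}$ we have
$$\begin{aligned} g^{-1}((\alpha, \infty]) &= \{x \in X : \sup_{n \ge 1} f_n(x) > \alpha\} = \{x \in X : f_n(x) > \alpha \text{ for some } n \ge 1\} \\ &= \bigcup_{n=1}^\infty \{x \in X : f_n(x) > \alpha\} = \bigcup_{n=1}^\infty \underbrace{f_n^{-1}((\alpha, \infty])}_{\in \mathcal{M}, \text{ since } f_n \text{ meas.}} \in \mathcal{M}. \end{aligned}$$
This implies that $g$ is measurable by Theorem 4.5(iii) (in its version for $[-\infty, \infty]$-valued functions). From this we also get that
$$\tilde g := \inf_{n \ge 1} f_n = -\sup_{n \ge 1} (-f_n)$$
is also measurable.
We can write
$$h = \limsup_{n \to \infty} f_n = \inf_{n \ge 1} \sup_{k \ge n} f_k,$$
and this is measurable by the previous paragraph. Similarly, $\liminf_{n \to \infty} f_n = \sup_{n \ge 1} \inf_{k \ge n} f_k$ is also measurable.
Corollary. (i) If $f_n: X \to [-\infty, \infty]$ are measurable, and $f_n(x) \to f(x)$ in $[-\infty, \infty]$ at every $x \in X$, then $f$ is also measurable.
(ii) If $f_n: X \to \mathbb{R}$ are measurable, and $f_n(x) \to f(x)$ in $\mathbb{R}$ at every $x \in X$, then $f$ is also measurable.
(iii) If $f: X \to [-\infty, \infty]$ is measurable, so are $f^+ := \max\{f, 0\}$ and $f^- := \max\{-f, 0\}$.
Remark. $|f| = f^+ + f^-$ and $f = f^+ - f^-$ are useful identities.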

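4.7 Simple functions

Let $(X, \mathcal{M})$ be a measurable space.
Definition. A function $s: X \to \mathbb{R}$ is called simple if its range is a finite set.
Note: Here we exclude $\pm\infty$ from the possible values.
If $\alpha_1, \ldots, \alpha_n$ are the distinct values of $s$, we set
$$A_i = \{x \in X : s(x) = \alpha_i\}.$$
Then
$$s = \sum_{i=1}^n \alpha_i \mathbf{1}_{A_i}.$$
Remark. (HW) $s$ is measurable $\iff$ $A_i \in \mathcal{M}$ for all $1 \le i \le n$.
Theorem. Let $f: X \to [0, \infty]$ be measurable. There exist simple functions $s_n$ such that:
(a) $0 \le s_1 \le s_2 \le \cdots \le f$;
(b) $s_n(x) \to f(x)$ as $n \to \infty$ at every $x \in X$.
Proof. For each $n \ge 1$, define the function:
$$\varphi_n(t) := \begin{cases} k 2^{-n} & \text{if } k 2^{-n} \le t < (k+1) 2^{-n} \text{ for some integer } 0 \le k < n 2^n; \\ n & \text{if } t \ge n. \end{cases}$$
It is easy to see that:
• each $\varphi_n$ is a Borel function on $[0, \infty]$;
• $t - 2^{-n} < \varphi_n(t) \le t$ if $0 \le t \le n$;
• $0 \le \varphi_1(t) \le \varphi_2(t) \le \cdots \le t$;
• $\varphi_n(t) \to t$ as $n \to \infty$ for all $t \in [0, \infty]$.
Let us set $s_n(x) := \varphi_n(f(x))$. Then (a) + (b) are clear. Also, $s_n$ is measurable as a composition of the measurable function $f$ with the Borel function $\varphi_n$ (Theorem 4.5(iv) and the Remark following it).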
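The dyadic functions $\varphi_n$ are easy to experiment with. The following is a minimal Python sketch, assuming exact rational inputs: the function name `phi` is ours, `fractions.Fraction` is used to avoid rounding, and `math.inf` stands for $\infty$. It evaluates $\varphi_n(t)$ for $t \in [0, \infty]$ and spot-checks the listed properties on a few sample points.

```python
import math
from fractions import Fraction


def phi(n, t):
    """phi_n(t) from the proof of Theorem 4.7, for t in [0, inf]."""
    if t >= n:                                # covers t = math.inf as well
        return Fraction(n)
    k = math.floor(Fraction(t) * 2 ** n)      # the k with k 2^-n <= t < (k+1) 2^-n
    return Fraction(k, 2 ** n)


samples = [Fraction(0), Fraction(7, 10), Fraction(3, 2), Fraction(5), math.inf]
for t in samples:
    values = [phi(n, t) for n in range(1, 8)]
    assert all(a <= b for a, b in zip(values, values[1:]))   # monotone in n
    assert all(v <= t for v in values)                       # never exceeds t
    for n in range(1, 8):
        if t <= n:
            assert t - Fraction(1, 2 ** n) < phi(n, t)       # within 2^-n of t

print(phi(3, Fraction(7, 10)))   # 5/8
```

With a measurable $f$ the approximating simple functions are then `s_n = lambda x: phi(n, f(x))`, exactly as in the proof.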

4.8

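Let $X$ be a non-empty set.
Theorem (Monotone class theorem). Let $\mathcal{P}$ be a $\pi$-system in $X$ such that $X \in \mathcal{P}$. Let $\mathcal{H}$ be a collection of functions $X \to \mathbb{R}$ satisfying the following:
(i) If $A \in \mathcal{P}$, then $\mathbf{1}_A \in \mathcal{H}$;
(ii) If $f, g \in \mathcal{H}$, then also $f + g \in \mathcal{H}$, and for all $c \in \mathbb{R}$ also $cf \in \mathcal{H}$ (linear space);
(iii) If $f_n \in \mathcal{H}$, $f_n \ge 0$, $f_n \uparrow f$, and $f$ is bounded, then also $f \in \mathcal{H}$ (closed under bounded monotone convergence of non-negative functions).
Then $\mathcal{H}$ contains all bounded functions that are measurable with respect to $\sigma(\mathcal{P})$.
Proof. (Non-examinable) Step 1. Put
$$\mathcal{G} := \{A \subseteq X : \mathbf{1}_A \in \mathcal{H}\}.$$
Then $\mathcal{P} \subseteq \mathcal{G}$ by assumption. We show that $\mathcal{G}$ is a $\lambda$-system in $X$:
• $X \in \mathcal{P}$ by assumption, so by (i) we have $X \in \mathcal{G}$;
• If $B, C \in \mathcal{G}$, $B \subseteq C$, then $\mathbf{1}_{C \setminus B} = \mathbf{1}_C - \mathbf{1}_B \in \mathcal{H}$ by (ii). Hence $C \setminus B \in \mathcal{G}$;
• If $B_n \in \mathcal{G}$, $B_n \uparrow B$, then $\mathbf{1}_{B_n} \uparrow \mathbf{1}_B$, and by (iii) we have $\mathbf{1}_B \in \mathcal{H}$. Hence $B \in \mathcal{G}$.
This verifies that $\mathcal{G}$ is a $\lambda$-system. By the $\pi$-$\lambda$ theorem, we conclude that $\sigma(\mathcal{P}) \subseteq \mathcal{G}$.
Summarizing the previous paragraph: the indicator function of any element of $\sigma(\mathcal{P})$ is in $\mathcal{H}$. It follows using (ii) that all simple $\sigma(\mathcal{P})$-measurable functions belong to $\mathcal{H}$.
Using (iii) and Theorem 4.7, we get that all non-negative bounded $\sigma(\mathcal{P})$-measurable functions belong to $\mathcal{H}$. Using $f = f^+ - f^-$, the statement follows for all bounded $\sigma(\mathcal{P})$-measurable functions.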

5 Abstract integration theory (Lebesgue integral)

5.1 Arithmetic in $[0, \infty]$

Recall the arithmetic operations introduced in Section 0.2.
Proposition. If $0 \le a_1 \le a_2 \le \cdots$ and $0 \le b_1 \le b_2 \le \cdots$, and $a_n \to a$ and $b_n \to b$, then $a_n b_n \to ab$.
Proof. (HW).
With the arithmetic introduced, sums and products of measurable functions with values in $[0, \infty]$ are also measurable. Indeed, if we have $0 \le s_n \uparrow f$ and $0 \le t_n \uparrow g$ (with $s_n, t_n$ simple functions), then using the Proposition we have $(s_n + t_n) \uparrow (f + g)$ and $s_n t_n \uparrow fg$, showing (by Corollary 4.6(i)) that $f + g$ and $fg$ are also measurable.

5.2

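Let $(X, \mathcal{M}, \mu)$ be a measure space.
Definition. Let $s: X \to [0, \infty)$ be a measurable simple function with distinct values $\alpha_1, \ldots, \alpha_n$, so that $s = \sum_{i=1}^n \alpha_i \mathbf{1}_{A_i}$. For any $E \in \mathcal{M}$, we define
$$\int_E s \, d\mu := \sum_{i=1}^n \alpha_i \, \mu(E \cap A_i).$$
Note: we use here the convention $0 \cdot \infty = 0$, because we may well have $\alpha_i = 0$ and $\mu(E \cap A_i) = \infty$.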
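Definition 5.2 is easy to evaluate when the measure is given by point masses on a finite set. In the minimal Python sketch below (our own illustration; `ext_mul` and `integral_simple` are hypothetical helper names, and `math.inf` stands for $\infty$), the explicit guard in `ext_mul` implements the convention $0 \cdot \infty = 0$. Plain floating-point multiplication would instead give `0 * math.inf`, which is `nan`.

```python
import math


def ext_mul(a, b):
    """Product in [0, inf] with the measure-theoretic convention 0 * inf = 0."""
    if a == 0 or b == 0:
        return 0.0
    return a * b


def integral_simple(s, weights, E):
    """Integral over E of the simple function s (dict: point -> value in
    [0, inf)) w.r.t. the measure with point masses `weights`
    (dict: point -> mass in [0, inf]), i.e. sum_i alpha_i * mu(E ∩ A_i)."""
    total = 0.0
    for alpha in set(s.values()):                            # distinct values alpha_i
        mass = sum(weights[x] for x in E if s[x] == alpha)   # mu(E ∩ A_i)
        total += ext_mul(alpha, mass)
    return total


weights = {"a": 1.0, "b": math.inf, "c": 2.0}
s = {"a": 3.0, "b": 0.0, "c": 3.0}
print(integral_simple(s, weights, {"a", "b", "c"}))   # 9.0: the 0 * inf term counts as 0
```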
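Lemma. If $0 \le s \le t$ are measurable simple functions, then
$$\int_E s \, d\mu \le \int_E t \, d\mu.$$
Proof. If $s = \sum_i \alpha_i \mathbf{1}_{A_i}$ and $t = \sum_j \beta_j \mathbf{1}_{B_j}$, then whenever $A_i \cap B_j$ is non-empty, we must have $\alpha_i \le \beta_j$ (since we assumed $s \le t$). Hence, we have
$$\int_E s \, d\mu = \sum_i \alpha_i \, \mu(E \cap A_i) = \sum_i \sum_j \alpha_i \, \mu(E \cap A_i \cap B_j) \le \sum_i \sum_j \beta_j \, \mu(E \cap A_i \cap B_j) = \sum_j \beta_j \, \mu(E \cap B_j) = \int_E t \, d\mu.$$

5.3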

Definition. Let $f: X \to [0, \infty]$ be a measurable function. For any $E \in \mathcal{M}$, we define
$$\int_E f \, d\mu := \sup\Big\{ \int_E s \, d\mu : 0 \le s \le f,\ s \text{ a measurable simple function} \Big\}, \qquad (*)$$
that is, the supremum is over all measurable simple functions $s$ satisfying $0 \le s \le f$. The expression (*) is called the Lebesgue integral of $f$ with respect to $\mu$. It is a value in $[0, \infty]$.
Note: If $f$ itself is a simple function, then $s = f$ appears in the supremum, and due to Lemma 5.2 it gives the maximal value considered in the supremum. Therefore, for simple functions the definition (*) agrees with the earlier Definition 5.2.

5.4

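Theorem. All functions below are assumed measurable.
(i) If $0 \le f \le g$, then $\int_E f \, d\mu \le \int_E g \, d\mu$.
(ii) If $A \subseteq B$ and $f \ge 0$, then $\int_A f \, d\mu \le \int_B f \, d\mu$.
(iii) If $f \ge 0$ and $c$ is a constant with $0 \le c < \infty$, then $\int_E cf \, d\mu = c \int_E f \, d\mu$.
(iv) If $f(x) = 0$ for all $x \in E$, then $\int_E f \, d\mu = 0$, even if $\mu(E) = \infty$.
(v) If $\mu(E) = 0$, then $\int_E f \, d\mu = 0$, even if $f(x) = \infty$ for all $x \in E$.
(vi) If $f \ge 0$, then $\int_E f \, d\mu = \int_X \mathbf{1}_E f \, d\mu$.
(vii) If $f \ge 0$ and $\int_X f \, d\mu < \infty$, then $\mu(\{x \in X : f(x) = \infty\}) = 0$.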
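Proof. (i) If $0 \le s \le f$, $s$ simple, then also $0 \le s \le g$, so
$$\int_E s \, d\mu \le \int_E g \, d\mu.$$
Taking the sup of the left hand side over all simple $0 \le s \le f$ yields the statement.
(ii) If $0 \le s \le f$, $s$ simple, $s = \sum_i \alpha_i \mathbf{1}_{A_i}$, then $\mu(A \cap A_i) \le \mu(B \cap A_i)$. Hence
$$\int_A s \, d\mu \le \int_B s \, d\mu \le \int_B f \, d\mu.$$
Taking the sup of the left hand side over all simple $0 \le s \le f$ yields the statement.
(iii) If $c = 0$, then $cf \equiv 0$, and $c \int_E f \, d\mu = 0$ as well, so the statement holds. So assume $0 < c < \infty$. Write $s = \sum_i \alpha_i \mathbf{1}_{A_i}$. Then $c\alpha_1, \ldots, c\alpha_n$ are the distinct values of $cs$, and
$$\int_E cs \, d\mu = \sum_i c\alpha_i \, \mu(E \cap A_i) = c \sum_i \alpha_i \, \mu(E \cap A_i) = c \int_E s \, d\mu. \qquad (17)$$
Also note that $0 \le s \le f$ if and only if $0 \le cs \le cf$. Taking the sup on both sides of (17) we get the statement.
(iv) If $E = \emptyset$, then $\int_E s \, d\mu = 0$ for any $0 \le s \le f$. If $E \ne \emptyset$ and $0 \le s \le f$, then $0 \in \operatorname{rng}(s)$. Without loss of generality, assume $\alpha_1 = 0$. Then $E \cap A_i = \emptyset$ for $i \ge 2$, and
$$\int_E s \, d\mu = 0 \cdot \mu(E \cap A_1) + \sum_{i \ge 2} \alpha_i \, \mu(\emptyset) = 0$$
(even if $\mu(E) = \infty$).
(v) If $\mu(E) = 0$, then $\mu(E \cap A_i) = 0$ for all $i$, so for any $0 \le s \le f$ we have $\int_E s \, d\mu = 0$ (even if $f(x) = \infty$ for all $x \in E$). This implies the statement.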
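(vi) We may assume that $E \ne X$ (otherwise the statement is trivial). Let $0 \le s \le f$, $s = \sum_i \alpha_i \mathbf{1}_{A_i}$. Observe that $\mathbf{1}_E s$ is also a simple function. We distinguish two cases according to whether $0 \in \operatorname{rng}(s)$ or $0 \notin \operatorname{rng}(s)$.
If $0 \in \operatorname{rng}(s)$, assume (without loss of generality) that $\alpha_1 = 0$, and that $E \cap A_i \ne \emptyset$ for $i = 2, \ldots, n'$, and $E \cap A_i = \emptyset$ for $n' < i \le n$. Then the distinct values of $\mathbf{1}_E s$ are $0 = \alpha_1, \alpha_2, \ldots, \alpha_{n'}$, and
$$\mathbf{1}_E s = 0 \cdot \mathbf{1}_{E^c \cup A_1} + \sum_{i=2}^{n} \alpha_i \mathbf{1}_{E \cap A_i} = 0 \cdot \mathbf{1}_{E^c \cup A_1} + \sum_{i=2}^{n'} \alpha_i \mathbf{1}_{E \cap A_i}$$
gives the decomposition of the simple function $\mathbf{1}_E s$ into a sum of indicators. The decomposition gives:
$$\int_X \mathbf{1}_E s \, d\mu = 0 \cdot \mu(E^c \cup A_1) + \sum_{i=2}^{n'} \alpha_i \, \mu(E \cap A_i) = \sum_{i=1}^{n} \alpha_i \, \mu(E \cap A_i) = \int_E s \, d\mu.$$
If $0 \notin \operatorname{rng}(s)$, then assume without loss of generality that $E \cap A_i \ne \emptyset$ for $i = 1, \ldots, n'$, and $E \cap A_i = \emptyset$ for $n' < i \le n$. Then the distinct values of $\mathbf{1}_E s$ are $0, \alpha_1, \alpha_2, \ldots, \alpha_{n'}$, and
$$\mathbf{1}_E s = 0 \cdot \mathbf{1}_{E^c} + \sum_{i=1}^{n} \alpha_i \mathbf{1}_{E \cap A_i} = 0 \cdot \mathbf{1}_{E^c} + \sum_{i=1}^{n'} \alpha_i \mathbf{1}_{E \cap A_i}$$
gives the decomposition of the simple function $\mathbf{1}_E s$ into a sum of indicators. The decomposition gives:
$$\int_X \mathbf{1}_E s \, d\mu = 0 \cdot \mu(E^c) + \sum_{i=1}^{n'} \alpha_i \, \mu(E \cap A_i) = \sum_{i=1}^{n} \alpha_i \, \mu(E \cap A_i) = \int_E s \, d\mu.$$
Thus in both cases we have
$$\int_X \mathbf{1}_E s \, d\mu = \int_E s \, d\mu. \qquad (18)$$
Now we use that all simple functions $0 \le t \le \mathbf{1}_E f$ are of the form $\mathbf{1}_E s$ for some simple function $0 \le s \le f$ (take $s = t$). Therefore, taking the sup on both sides of (18) we get the statement.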
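(vii) Put $E_n = \{x \in X : f(x) \ge n\}$. Then for all $n \ge 1$, we have
$$n \, \mu(E_n) = \int_{E_n} n \, d\mu \overset{(i)}{\le} \int_{E_n} f \, d\mu \overset{(ii)}{\le} \int_X f \, d\mu =: K < \infty.$$
Therefore, we have
$$\mu(\{x \in X : f(x) = \infty\}) \le \mu(E_n) \le \frac{K}{n}, \qquad n \ge 1.$$
Letting $n \to \infty$, this implies the statement.
Remark. If $E \in \mathcal{M}$, then $E$ itself is a measure space: put $\mathcal{M}_E := \{F \subseteq E : F \in \mathcal{M}\}$; then $\mathcal{M}_E$ is a $\sigma$-algebra in $E$. Put $\mu_E := \mu|_{\mathcal{M}_E}$. Then it is easy to see that $\int_E f \, d\mu = \int_E f \, d\mu_E$. This shows that in defining the integral, we could have restricted to integrating over the entire space $X$.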

5.5

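As in the previous sections, let $(X, \mathcal{M}, \mu)$ be any measure space.
Theorem. Let $s, t$ be non-negative measurable simple functions on $X$.
(i) For $E \in \mathcal{M}$ define
$$\varphi(E) := \int_E s \, d\mu.$$
Then $\varphi$ is a measure on $\mathcal{M}$.
(ii) We have
$$\int_E (s + t) \, d\mu = \int_E s \, d\mu + \int_E t \, d\mu.$$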

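Proof. (i) It is clear that $\varphi: \mathcal{M} \to [0, \infty]$. We also have $\varphi(\emptyset) = 0$, due to Theorem 5.4(v). Suppose now that $E_1, E_2, \ldots \in \mathcal{M}$ are disjoint, and $s = \sum_{i=1}^n \alpha_i \mathbf{1}_{A_i}$. Then
$$\begin{aligned} \varphi\Big(\bigcup_{m=1}^\infty E_m\Big) &= \int_{\bigcup_{m} E_m} s \, d\mu = \sum_{i=1}^n \alpha_i \, \mu\Big(\Big(\bigcup_{m=1}^\infty E_m\Big) \cap A_i\Big) = \sum_{i=1}^n \alpha_i \, \mu\Big(\bigcup_{m=1}^\infty (E_m \cap A_i)\Big) \\ &= \sum_{i=1}^n \alpha_i \sum_{m=1}^\infty \mu(E_m \cap A_i) = \sum_{m=1}^\infty \sum_{i=1}^n \alpha_i \, \mu(E_m \cap A_i) = \sum_{m=1}^\infty \int_{E_m} s \, d\mu = \sum_{m=1}^\infty \varphi(E_m). \end{aligned}$$
This completes the proof that $\varphi$ is a measure.
(ii) Let $t = \sum_{j=1}^m \beta_j \mathbf{1}_{B_j}$, and put $E_{ij} := E \cap A_i \cap B_j$. Since $s + t \equiv \alpha_i + \beta_j$ on $E_{ij}$, we have
$$\int_{E_{ij}} (s + t) \, d\mu = (\alpha_i + \beta_j) \, \mu(E_{ij}) = \alpha_i \, \mu(E_{ij}) + \beta_j \, \mu(E_{ij}) = \int_{E_{ij}} s \, d\mu + \int_{E_{ij}} t \, d\mu.$$
Since $E$ is the disjoint union of the sets $E_{ij}$, summing over $i$ and $j$ and using part (i) (applied to each of the simple functions $s + t$, $s$ and $t$) gives the statement.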

5.6

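Theorem (Monotone Convergence Theorem). Let $\{f_n\}_{n \ge 1}$ be a sequence of measurable functions on $X$, and suppose that:
(i) $0 \le f_1(x) \le f_2(x) \le \cdots$ for all $x \in X$;
(ii) $f_n(x) \to f(x)$ as $n \to \infty$ for all $x \in X$.
Then $f$ is measurable, and
$$\int_X f_n \, d\mu \to \int_X f \, d\mu, \quad \text{as } n \to \infty. \qquad (*)$$
Proof. By Theorem 4.6, $f$ is measurable. Due to Theorem 5.4(i), we have $\int_X f_n \, d\mu \le \int_X f_{n+1} \, d\mu$, and therefore there exists $\alpha \in [0, \infty]$ such that $\int_X f_n \, d\mu \to \alpha$ as $n \to \infty$. Since $f_n \le f$ for all $n \ge 1$, it is also clear that $\alpha \le \int_X f \, d\mu$. Hence it remains to show that
$$\alpha \ge \int_X f \, d\mu. \qquad (19)$$
Fix $0 < c < 1$, and let $0 \le s \le f$ be a simple function. Consider the sets
$$E_n := \{x \in X : f_n(x) \ge c s(x)\}.$$
Due to the monotonicity of the $f_n$'s we have $E_1 \subseteq E_2 \subseteq \cdots$. Also, every $x \in X$ belongs to some $E_n$: if $f(x) = 0$, then $s(x) = 0$ and $x \in E_1$; if $f(x) > 0$, then $c s(x) < f(x)$ (because $c < 1$ and $s(x) \le f(x)$ with $s(x)$ finite), and since $f_n(x) \to f(x)$, we have $x \in E_n$ for large enough $n$. This shows that $\bigcup_{n=1}^\infty E_n = X$. Now observe that
$$\int_X f_n \, d\mu \ge \int_{E_n} f_n \, d\mu \ge \int_{E_n} c s \, d\mu = c \int_{E_n} s \, d\mu.$$
As $n \to \infty$, the left hand side approaches $\alpha$. Using that $\varphi(E) := \int_E s \, d\mu$ is a measure (Theorem 5.5), the right hand side approaches $c \varphi(X) = c \int_X s \, d\mu$, which gives:
$$\alpha \ge c \int_X s \, d\mu.$$
Since $0 < c < 1$ is arbitrary here, we can let $c \uparrow 1$, and this yields:
$$\alpha \ge \int_X s \, d\mu.$$
Finally, taking the sup over $s$ on the right hand side yields (19). The proof is complete.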
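As a concrete instance of the Monotone Convergence Theorem, take Lebesgue measure on $[0, 1]$, $f(x) = x$, and $s_n = \varphi_n \circ f$ as in Theorem 4.7. For $n \ge 1$ and $x \in [0, 1)$, $s_n$ equals $k 2^{-n}$ on $[k 2^{-n}, (k+1) 2^{-n})$, an interval of length $2^{-n}$ (the point $x = 1$ has measure zero). Hence $\int s_n \, d\lambda = \sum_{k=0}^{2^n - 1} k\, 2^{-2n} = (1 - 2^{-n})/2$, which increases to $\int_{[0,1]} x \, dx = 1/2$. The short Python sketch below is our own check, using only the fact that intervals have measure equal to their length, and evaluates these sums exactly.

```python
from fractions import Fraction


def integral_s_n(n):
    """Integral over [0, 1] of s_n = phi_n(x) w.r.t. Lebesgue measure, n >= 1:
    s_n = k 2^-n on [k 2^-n, (k+1) 2^-n), which has measure 2^-n."""
    h = Fraction(1, 2 ** n)
    return sum(k * h * h for k in range(2 ** n))


values = [integral_s_n(n) for n in range(1, 11)]
print(all(a < b for a, b in zip(values, values[1:])))   # True: increasing in n
print(Fraction(1, 2) - integral_s_n(12))                # 1/8192, i.e. 2^-13
```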

5.7

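As in earlier sections, let $(X, \mathcal{M}, \mu)$ be any measure space.
Theorem.
(i) If $f, g: X \to [0, \infty]$ are measurable, then
$$\int_X (f + g) \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu.$$
(ii) If $f_n: X \to [0, \infty]$, $n = 1, 2, \ldots$, are measurable, and $f(x) = \sum_{n=1}^\infty f_n(x)$, then
$$\int_X f \, d\mu = \sum_{n=1}^\infty \int_X f_n \, d\mu.$$
Proof. (i) There exist simple functions $0 \le s_i \uparrow f$ and $0 \le t_i \uparrow g$ (Theorem 4.7). Then due to Proposition 5.1 we have $0 \le (s_i + t_i) \uparrow (f + g)$. Applying the Monotone Convergence Theorem (Theorem 5.6), we get:
$$\int_X (f + g) \, d\mu \overset{\text{MCT}}{=} \lim_{i \to \infty} \int_X (s_i + t_i) \, d\mu \overset{\text{Thm 5.5}}{=} \lim_{i \to \infty} \int_X s_i \, d\mu + \lim_{i \to \infty} \int_X t_i \, d\mu \overset{\text{MCT}}{=} \int_X f \, d\mu + \int_X g \, d\mu.$$
(ii) Consider the partial sums $g_n(x) = \sum_{i=1}^n f_i(x)$. Then $0 \le g_1(x) \le g_2(x) \le \cdots$, and $g_n(x) \to f(x)$ for all $x \in X$. Hence by part (i) and the Monotone Convergence Theorem we get
$$\int_X f \, d\mu \overset{\text{MCT}}{=} \lim_{n \to \infty} \int_X g_n \, d\mu \overset{(i)}{=} \lim_{n \to \infty} \Big[ \sum_{i=1}^n \int_X f_i \, d\mu \Big] = \sum_{n=1}^\infty \int_X f_n \, d\mu.$$
Corollary. Suppose that $a_{ij} \ge 0$, $i, j = 1, 2, \ldots$, are real numbers. Then
$$\sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij} = \sum_{j=1}^\infty \sum_{i=1}^\infty a_{ij}.$$
Proof. Indeed, take $X = \{1, 2, \ldots\}$ with counting measure, and let $f_i: X \to [0, \infty]$ be defined by $f_i(j) = a_{ij}$. Then apply part (ii), using that the integral of a non-negative function with respect to counting measure on $X$ is the sum of its values.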
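The hypothesis $a_{ij} \ge 0$ in the Corollary matters. The following example is a standard counterexample, added here for illustration and not taken from the notes: $a_{ii} = 1$, $a_{i,i+1} = -1$, and all other entries vanish. Every row and every column has only finitely many non-zero entries, so the inner sums below are exact finite sums, yet the two iterated sums differ (every row sums to $0$, while column $1$ sums to $1$ and every later column to $0$).

```python
def a(i, j):
    """a_ii = 1, a_{i,i+1} = -1, all other entries 0 (indices start at 1)."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i):
    # row i has non-zero entries only in columns i and i + 1
    return sum(a(i, j) for j in range(1, i + 2))


def col_sum(j):
    # column j has non-zero entries only in rows j - 1 and j
    return sum(a(i, j) for i in range(1, j + 1))


N = 1000
print(sum(row_sum(i) for i in range(1, N + 1)))   # 0: every row sums to 0
print(sum(col_sum(j) for j in range(1, N + 1)))   # 1: only column 1 is non-zero
```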

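5.8 Fatou's Lemma

As earlier, $(X, \mathcal{M}, \mu)$ is any measure space.
Theorem (Fatou's Lemma). If $f_n: X \to [0, \infty]$, $n = 1, 2, \ldots$, are measurable, then
$$\int_X \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int_X f_n \, d\mu.$$
Proof. Put $g_n(x) := \inf_{k \ge n} f_k(x)$, $n = 1, 2, \ldots$. Then we have $0 \le g_1(x) \le g_2(x) \le \cdots$, and $g_n(x) \to \liminf_{n \to \infty} f_n(x)$ as $n \to \infty$. Hence due to the Monotone Convergence Theorem, we get
$$\int_X \liminf_{n \to \infty} f_n \, d\mu = \int_X \lim_{n \to \infty} g_n \, d\mu \overset{\text{MCT}}{=} \lim_{n \to \infty} \int_X g_n \, d\mu. \qquad (20)$$
But $g_n(x) \le f_k(x)$ for all $k \ge n$, and therefore $\int_X g_n \, d\mu \le \int_X f_k \, d\mu$ for all $k \ge n$. Therefore, $\int_X g_n \, d\mu \le \inf_{k \ge n} \int_X f_k \, d\mu$. Inserting this into (20), the right hand side is at most $\liminf_{n \to \infty} \int_X f_n \, d\mu$, and the statement follows.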
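The inequality in Fatou's Lemma can be strict. A minimal example of our own uses counting measure on the two-point set $X = \{0, 1\}$: let $f_n = \mathbf{1}_{\{0\}}$ for even $n$ and $f_n = \mathbf{1}_{\{1\}}$ for odd $n$. Then $\liminf_n f_n \equiv 0$, while $\int_X f_n \, d\mu = 1$ for every $n$. The Python sketch below evaluates both sides.

```python
X = [0, 1]   # counting measure: the integral of h is the sum of its values


def f(n, x):
    """f_n is the indicator of {0} for even n and of {1} for odd n."""
    return 1 if x == n % 2 else 0


def integral(h):
    return sum(h(x) for x in X)


def liminf_f(x):
    # n -> f(n, x) has period 2, so every tail infimum inf_{k >= n} f(k, x)
    # equals min(f(n, x), f(n + 1, x)); we take n = 0
    return min(f(0, x), f(1, x))


lhs = integral(liminf_f)                                          # left side of Fatou
integrals = [integral(lambda x, n=n: f(n, x)) for n in range(10)]
print(lhs, min(integrals))   # 0 1: strict inequality
```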

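5.9 Density functions

Let $(X, \mathcal{M}, \mu)$ be any measure space.
Theorem 1. Let $f: X \to [0, \infty]$ be measurable, and for $E \in \mathcal{M}$, put $\nu(E) := \int_E f \, d\mu$. Then $\nu$ is a measure on $(X, \mathcal{M})$, and for all measurable $g: X \to [0, \infty]$ we have:
$$\int_X g \, d\nu = \int_X g f \, d\mu.$$
Proof. It is clear that $\nu: \mathcal{M} \to [0, \infty]$, and that $\nu(\emptyset) = 0$. Suppose that $E_1, E_2, \ldots \in \mathcal{M}$ are disjoint and $E = \bigcup_{n=1}^\infty E_n$. Then $\mathbf{1}_E f = \sum_{n=1}^\infty \mathbf{1}_{E_n} f$. Therefore, using Theorem 5.7(ii) we have
$$\nu(E) = \int_E f \, d\mu = \int_X \mathbf{1}_E f \, d\mu \overset{\text{Thm 5.7}}{=} \sum_{n=1}^\infty \int_X \mathbf{1}_{E_n} f \, d\mu = \sum_{n=1}^\infty \int_{E_n} f \, d\mu = \sum_{n=1}^\infty \nu(E_n).$$
This proves that $\nu$ is a measure.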

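For the second statement, first note that it holds when $g = \mathbf{1}_E$ for some $E \in \mathcal{M}$, since:
$$\int_X \mathbf{1}_E \, d\nu = \nu(E) = \int_E f \, d\mu = \int_X \mathbf{1}_E f \, d\mu.$$
Next, it holds for non-negative simple functions $g = \sum_{i=1}^n \alpha_i \mathbf{1}_{A_i}$. Indeed:
$$\int_X \Big[\sum_{i=1}^n \alpha_i \mathbf{1}_{A_i}\Big] d\nu \overset{\substack{\text{Thm 5.7(i)} \\ \text{Thm 5.4(iii)}}}{=} \sum_{i=1}^n \alpha_i \int_X \mathbf{1}_{A_i} \, d\nu = \sum_{i=1}^n \alpha_i \int_X \mathbf{1}_{A_i} f \, d\mu \overset{\substack{\text{Thm 5.7(i)} \\ \text{Thm 5.4(iii)}}}{=} \int_X \Big[\sum_{i=1}^n \alpha_i \mathbf{1}_{A_i}\Big] f \, d\mu.$$
For general non-negative measurable $g$, take simple $0 \le s_i \uparrow g$ (so that $s_i f \uparrow g f$ by Proposition 5.1), and use the Monotone Convergence Theorem on both sides of the equality
$$\int_X s_i \, d\nu = \int_X s_i f \, d\mu$$
to get the statement.
Remark. It is customary to write $d\nu = f \, d\mu$, and call $f$ the density function of $\nu$ with respect to $\mu$.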
There is an important converse to the above theorem that we now state.
Definition. If $\nu$ and $\mu$ are both measures on $(X, \mathcal{M})$, we say that $\nu$ is absolutely continuous with respect to $\mu$, denoted $\nu \ll \mu$, if whenever $\mu(E) = 0$ then also $\nu(E) = 0$.
Observe that $d\nu = f \, d\mu$ has this property: if $\mu(E) = 0$, then $\nu(E) = \int_E f \, d\mu = 0$ by Theorem 5.4(v).
The following theorem states that under a $\sigma$-finiteness assumption, any finite measure that is absolutely continuous with respect to $\mu$ has a density.
Theorem 2 (Radon-Nikodym Theorem). If $\mu$ is $\sigma$-finite, $\nu \ll \mu$, and $\nu(X) < \infty$, then there exists a non-negative measurable function $f$ on $X$ such that $\nu(E) = \int_E f \, d\mu$ for all $E \in \mathcal{M}$.
(Non-examinable) For the proof see for example .
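On a finite set with all subsets measurable, both theorems of this section reduce to bookkeeping with point masses. In the sketch below (our own illustration; the masses are made-up numbers and the helper names are hypothetical), $\nu \ll \mu$ holds because $\nu$ vanishes wherever $\mu$ does. The density is $f(x) = \nu(\{x\})/\mu(\{x\})$ on points of positive $\mu$-mass; its value on $\mu$-null points does not matter, and we set it to $0$. The identity $\int_X g \, d\nu = \int_X g f \, d\mu$ of Theorem 1 is then checked for one $g$.

```python
from fractions import Fraction as F

mu = {"a": F(1, 2), "b": F(1, 4), "c": F(0), "d": F(1, 4)}
nu = {"a": F(1, 3), "b": F(1, 2), "c": F(0), "d": F(1, 6)}


def absolutely_continuous(nu, mu):
    """nu << mu: every mu-null point is also nu-null."""
    return all(nu[x] == 0 for x in mu if mu[x] == 0)


def density(nu, mu):
    """f = d nu / d mu on a finite space (set to 0 on mu-null points)."""
    return {x: (nu[x] / mu[x] if mu[x] > 0 else F(0)) for x in mu}


def integral(g, m):
    """Integral of g with respect to the point-mass measure m."""
    return sum((g[x] * m[x] for x in m), F(0))


f = density(nu, mu)
g = {"a": F(3), "b": F(1, 2), "c": F(7), "d": F(1)}
print(absolutely_continuous(nu, mu))                                 # True
print(integral(g, nu), integral({x: g[x] * f[x] for x in mu}, mu))   # 17/12 17/12
```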

5.10

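Let $(X, \mathcal{M}, \mu)$ be any measure space.
Definition 1. We define:
$$L^1(\mu) := \Big\{ f: X \to \mathbb{R} : f \text{ is measurable and } \int_X |f| \, d\mu < \infty \Big\} = \text{summable functions (w.r.t. } \mu).$$
Note: Due to Theorem 5.7(i) and $|f| = f^+ + f^-$, we have $f \in L^1(\mu)$ if and only if $\int_X f^+ \, d\mu < \infty$ and $\int_X f^- \, d\mu < \infty$.
Definition 2. For $f \in L^1(\mu)$ we define
$$\int_E f \, d\mu := \int_E f^+ \, d\mu - \int_E f^- \, d\mu, \qquad E \in \mathcal{M}.$$
Sometimes it is convenient to extend this definition, as long as we do not have $\infty - \infty$.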
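Definition 3. Given a measurable $f: X \to [-\infty, \infty]$, we set
$$\int_X f \, d\mu := \begin{cases} \int_X f^+ \, d\mu - \int_X f^- \, d\mu & \text{if both are finite;} \\ +\infty & \text{if } \int_X f^+ \, d\mu = \infty \text{ but } \int_X f^- \, d\mu < \infty; \\ -\infty & \text{if } \int_X f^- \, d\mu = \infty \text{ but } \int_X f^+ \, d\mu < \infty; \\ \text{undefined} & \text{if } \int_X f^+ \, d\mu = \infty = \int_X f^- \, d\mu. \end{cases}$$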

Remark. In the case of complex-valued measurable functions $f: X \to \mathbb{C}$, the definition of $L^1(\mu)$ is identical (where $|\cdot|$ now denotes the modulus of a complex number). If $f \in L^1(\mu)$ and $f = u + iv$ is the decomposition of $f$ into its real and imaginary parts, then we set $\int_X f\,d\mu := \int_X u\,d\mu + i\int_X v\,d\mu \in \mathbb{C}$. As mentioned earlier, the complex-valued case does not present any major difficulties, and we will mostly restrict to real-valued functions in these notes.

5.11

Let $(X, \mathcal{M}, \mu)$ be any measure space.

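Theorem 1. If $f, g \in L^1(\mu)$ and $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g \in L^1(\mu)$ and we have
$$\int_X (\alpha f + \beta g)\,d\mu = \alpha \int_X f\,d\mu + \beta \int_X g\,d\mu. \tag{21}$$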

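Proof. It is clear from Corollary 4.4(ii) that $\alpha f + \beta g$ is measurable. To see integrability, we estimate
$$\int_X |\alpha f + \beta g|\,d\mu \le \int_X (|\alpha||f| + |\beta||g|)\,d\mu \overset{\substack{\text{Thm 5.7}\\ \text{Thm 5.4(iii)}}}{=} |\alpha| \int_X |f|\,d\mu + |\beta| \int_X |g|\,d\mu < \infty.$$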

This shows that $\alpha f + \beta g \in L^1(\mu)$. In order to prove (21), we first show that
$$\int_X (f+g)\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu. \tag{22}$$
For this, write $f = f^+ - f^-$, $g = g^+ - g^-$, and with $h = f + g$, write $h = h^+ - h^-$. Then we have
$$h^+ + f^- + g^- = h^- + f^+ + g^+.$$
Since all terms on both sides are non-negative, we can apply Theorem 5.7(i) to get:
$$\int_X (f+g)^+\,d\mu + \int_X f^-\,d\mu + \int_X g^-\,d\mu = \int_X (f+g)^-\,d\mu + \int_X f^+\,d\mu + \int_X g^+\,d\mu.$$
Since all terms here are finite, we are allowed to rearrange, and this yields (22).
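Finally, we prove
$$\int_X \alpha f\,d\mu = \alpha \int_X f\,d\mu. \tag{23}$$
When $\alpha \ge 0$, we have $(\alpha f)^+ = \alpha f^+$ and $(\alpha f)^- = \alpha f^-$, so
$$\text{l.h.s. of (23)} = \int_X (\alpha f)^+\,d\mu - \int_X (\alpha f)^-\,d\mu = \int_X \alpha f^+\,d\mu - \int_X \alpha f^-\,d\mu \overset{5.4\text{(iii)}}{=} \alpha\int_X f^+\,d\mu - \alpha\int_X f^-\,d\mu = \text{r.h.s. of (23)}.$$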

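When $\alpha < 0$, we can reduce to the case of positive $\alpha$ as follows, writing $\alpha f = (-\alpha)(-f)$ with $-\alpha > 0$, and using $(-f)^+ = f^-$ and $(-f)^- = f^+$:
$$\begin{aligned} \text{l.h.s. of (23)} &= \int_X ((-\alpha)(-f))^+\,d\mu - \int_X ((-\alpha)(-f))^-\,d\mu = \int_X (-\alpha) f^-\,d\mu - \int_X (-\alpha) f^+\,d\mu \\ &= (-\alpha)\int_X f^-\,d\mu - (-\alpha)\int_X f^+\,d\mu = \alpha\Big(\int_X f^+\,d\mu - \int_X f^-\,d\mu\Big) = \text{r.h.s. of (23)}. \end{aligned}$$
Combining (22) and (23) gives (21).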

The following theorem is essentially trivial for real-valued functions, so we spell out the complex-valued case, which is only slightly more tricky.
Theorem 2. Suppose $f: X \to \mathbb{C}$. If $f \in L^1(\mu)$, then $\left|\int_X f\,d\mu\right| \le \int_X |f|\,d\mu$.

Proof. When $f$ is real-valued, the statement follows from the fact that the distance between two non-negative real numbers is at most their sum:
$$\left|\int_X f\,d\mu\right| = \left|\int_X f^+\,d\mu - \int_X f^-\,d\mu\right| \le \int_X f^+\,d\mu + \int_X f^-\,d\mu = \int_X |f|\,d\mu.$$

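When $f$ is complex-valued, we use that there exists a real number $\theta$ such that $\left|\int_X f\,d\mu\right| = e^{i\theta}\int_X f\,d\mu$. Therefore, we have
$$\left|\int_X f\,d\mu\right| = e^{i\theta}\int_X f\,d\mu \overset{\text{Thm 1}}{=} \int_X e^{i\theta} f\,d\mu = \Re\int_X e^{i\theta} f\,d\mu = \int_X \Re(e^{i\theta} f)\,d\mu.$$
Here $\Re(z)$ denotes the real part of the complex number $z$. Taking the real part changes nothing, because the quantity we started with on the left-hand side is real.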
Now, the real part of a complex number is at most its modulus, so we have
$$\left|\int_X f\,d\mu\right| = \int_X \Re(e^{i\theta} f)\,d\mu \le \int_X |e^{i\theta} f|\,d\mu = \int_X |f|\,d\mu,$$
since $|e^{i\theta}| = 1$. This proves the statement in the complex case.

Remark. It is not difficult to see that equality holds here if and only if $\Re(e^{i\theta} f) = |f|$, except on a set of measure 0. In other words, equality holds if and only if $f = e^{-i\theta}|f|$ a.e., that is, the argument of $f$ is constant apart from a set of measure 0.
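The rotation trick in the proof of Theorem 2 can be tried on a discrete measure, where integrals are weighted sums. In the sketch below the weights `w` (playing the role of $\mu(\{x_k\})$) and the complex values `z` (playing $f(x_k)$) are made-up numbers. The angle is taken as $\theta = -\arg\int_X f\,d\mu$, which is one valid choice of the real number $\theta$ used in the proof. The last lines check the equality case from the Remark, using a function of constant argument.

```python
import cmath

# Hypothetical discrete measure: weights w[k] = mu({x_k}) and complex values z[k] = f(x_k).
w = [0.5, 1.0, 2.0, 0.25]
z = [1 + 2j, -3 + 0.5j, 0.5 - 1j, 3j]

integral_f = sum(wk * zk for wk, zk in zip(w, z))          # integral of f d(mu)
integral_abs = sum(wk * abs(zk) for wk, zk in zip(w, z))   # integral of |f| d(mu)

# theta with e^{i theta} * (integral of f) = |integral of f|, i.e. theta = -arg(integral of f).
theta = -cmath.phase(integral_f)
# Integral of Re(e^{i theta} f) d(mu); the proof shows it equals |integral of f|.
rotated = sum(wk * (cmath.exp(1j * theta) * zk).real for wk, zk in zip(w, z))

# Equality case of the Remark: f has constant argument 0.7.
z_eq = [cmath.exp(0.7j) * r for r in (1.0, 2.0, 0.5, 3.0)]
eq_lhs = abs(sum(wk * zk for wk, zk in zip(w, z_eq)))
eq_rhs = sum(wk * abs(zk) for wk, zk in zip(w, z_eq))
```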

5.12

Let $(X, \mathcal{M}, \mu)$ be a measure space.

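Theorem (Dominated Convergence Theorem). Suppose that $f_n: X \to \mathbb{C}$ are measurable, and $f_n(x) \to f(x)$ for every $x \in X$. If there exists $g: X \to [0,\infty]$ such that $g \in L^1(\mu)$ and $|f_n(x)| \le g(x)$ for every $x \in X$ and every $n$, then:
(i) $\lim_{n\to\infty} \int_X |f_n - f|\,d\mu = 0$;
(ii) $\lim_{n\to\infty} \int_X f_n\,d\mu = \int_X f\,d\mu$.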

Proof. Letting $n \to \infty$ in the inequality $|f_n(x)| \le g(x)$, we get $|f(x)| \le g(x)$, and hence $f \in L^1(\mu)$. Since $|f_n - f| \le 2g$, we have $2g - |f_n - f| \ge 0$. Therefore, Fatou's Lemma (Theorem 5.8) can be applied to the sequence $2g - |f_n - f|$. This gives
$$\begin{aligned} \int_X 2g\,d\mu &= \int_X \liminf_{n\to\infty} (2g - |f_n - f|)\,d\mu \le \liminf_{n\to\infty} \int_X (2g - |f_n - f|)\,d\mu \\ &= \int_X 2g\,d\mu + \liminf_{n\to\infty} \Big(-\int_X |f_n - f|\,d\mu\Big) = \int_X 2g\,d\mu - \limsup_{n\to\infty} \int_X |f_n - f|\,d\mu. \end{aligned}$$
Since $\int_X 2g\,d\mu < \infty$, we may cancel it from the left-hand side and the last expression, and rearrange to get
$$\limsup_{n\to\infty} \int_X |f_n - f|\,d\mu \le 0.$$
But the lim sup of a non-negative sequence can only be less than or equal to 0 if the sequence converges to 0, so we get statement (i). Statement (ii) now follows from (i), because $\left|\int_X f_n\,d\mu - \int_X f\,d\mu\right| \le \int_X |f_n - f|\,d\mu$.
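The role of the dominating function can be seen numerically. The Python sketch below uses a midpoint Riemann sum on $[0,1]$ as a stand-in for the Lebesgue integral, which is adequate for these piecewise continuous integrands. The step count and the two sequences are choices made for the illustration. The sequence $f_n(x) = x^n$ tends to 0 for every $x \in [0,1)$, hence a.e., and is dominated by $g \equiv 1$. Its integrals $1/(n+1)$ tend to 0, as statement (ii) predicts. The sequence $f_n = n\,1_{(0,1/n)}$ also tends to 0 pointwise, but it has no integrable dominating function (its supremum behaves like $1/x$), and its integrals stay equal to 1.

```python
def midpoint_integral(h, a, b, n_steps=100_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n_steps
    return sum(h(a + (k + 0.5) * dx) for k in range(n_steps)) * dx


# Dominated case: f_n(x) = x**n -> 0 on [0, 1), with |f_n| <= 1 and 1 integrable on [0, 1].
dominated = [midpoint_integral(lambda x, n=n: x**n, 0.0, 1.0) for n in (1, 10, 100)]

# Undominated case: f_n = n * 1_(0, 1/n) -> 0 pointwise, but every integral equals 1.
undominated = [
    midpoint_integral(lambda x, n=n: n if 0 < x < 1 / n else 0.0, 0.0, 1.0)
    for n in (1, 10, 100)
]
```

The default argument `n=n` in each `lambda` fixes the current value of `n`. Without it, every closure would see only the last value of the loop variable.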

5.13

Let $(X, \mathcal{M}, \mu)$ be a measure space.

Definition. Let $P$ be a property that a point $x \in X$ may or may not have. For example, "$f(x) > 0$" (with $f$ a given function), or "$f_n(x)$ converges to a limit" (with $\{f_n\}$ a given sequence of functions). Given $E \in \mathcal{M}$, we say that $P$ holds almost everywhere on $E$ if there exists $N \in \mathcal{M}$ with $\mu(N) = 0$ such that $P$ holds for all $x \in E \setminus N$. We often abbreviate this to "$P$ holds a.e. on $E$", and, when the measure needs to be emphasized, to "$P$ holds a.e.[$\mu$] on $E$".

Example. If $f, g: X \to \mathbb{R}$ are measurable, and $\mu(\{x \in X : f(x) \ne g(x)\}) = 0$, we say that $f = g$ a.e.[$\mu$] on $X$, or simply that $f = g$ a.e.[$\mu$]. We are going to write $f \sim g$ for this relation ($f$ and $g$ are equivalent). When $f \sim g$, then for all $E \in \mathcal{M}$ we have $\int_E f\,d\mu = \int_E g\,d\mu$. That is, equivalent functions behave identically for the purposes of integration.
Lemma 1. The relation $\sim$ is indeed an equivalence relation.
Proof. Reflexivity is clear: $f = f$ a.e.[$\mu$] (we can take the exceptional set $N = \emptyset$).
Symmetry is also clear, since if $f \sim g$ with exceptional set $N$, then the same exceptional set can be used to show that $g \sim f$.
Let us check transitivity. Suppose that $f \sim g$ and $g \sim h$, and let $N_1$ and $N_2$ be the exceptional sets: $f(x) = g(x)$ for all $x \in X \setminus N_1$, and $g(x) = h(x)$ for all $x \in X \setminus N_2$, where $\mu(N_1) = 0 = \mu(N_2)$. Put $N = N_1 \cup N_2$. Then $\mu(N) \le \mu(N_1) + \mu(N_2) = 0$, and we have $f(x) = h(x)$ for all $x \in X \setminus N$.
Remark 1. As in the last part of the proof above, we can combine exceptional
sets to show that several statements simultaneously hold a.e. The only thing
we have to be careful about is that we are only allowed to combine countably
many statements.
$L^1(\mu)$ and $L^1[\mu]$. It is sometimes convenient to identify equivalent functions. Let us write
$$[f] := \{ f_1 \in L^1(\mu) : f_1 \sim f \}.$$
Define
$$L^1[\mu] := \{ [f] : f \in L^1(\mu) \} = \Big\{ [f] : \int_X |f|\,d\mu < \infty \Big\}.$$

Lemma 2.
(i) If $f \sim f_1$, $g \sim g_1$ and $\alpha \in \mathbb{R}$, then $f + g \sim f_1 + g_1$ and $\alpha f \sim \alpha f_1$.
(ii) $L^1[\mu]$ is a vector space with addition $[f] + [g] := [f + g]$ and scalar multiplication $\alpha[f] := [\alpha f]$.
Proof. This is straightforward to check (HW).
Remark 2. Note that when we work with a representative for an equivalence class in $L^1[\mu]$, we can even allow $f: X \to [-\infty, \infty]$. Since $\int_X |f|\,d\mu < \infty$, we know that $|f| < \infty$ a.e.[$\mu$]; see Theorem 5.4(vii). Therefore, $f$ is equal a.e. to a real-valued representative. Also, in computing $f - g$, we may have $\infty - \infty$ at some points, but only on a set of measure 0, so it can be ignored.

5.14

Completion

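Theorem. Let $(X, \mathcal{M}, \mu)$ be a measure space. Let
$$\widehat{\mathcal{M}} := \{ E \subset X : \exists\, A, B \in \mathcal{M},\ A \subset E \subset B, \text{ such that } \mu(B \setminus A) = 0 \}.$$
For $E \in \widehat{\mathcal{M}}$ define $\hat\mu(E) = \mu(A)$, with $A$ as in the definition of $\widehat{\mathcal{M}}$. Then $\widehat{\mathcal{M}}$ is a $\sigma$-algebra, and $\hat\mu$ is well-defined and is a measure on $\widehat{\mathcal{M}}$.

Proof. (HW)

Definition. We call $\widehat{\mathcal{M}}$ the completion of $\mathcal{M}$ with respect to $\mu$. It has the property that if $E \in \widehat{\mathcal{M}}$, $\hat\mu(E) = 0$, and $F \subset E$, then we also have $F \in \widehat{\mathcal{M}}$ (and of course $\hat\mu(F) = 0$ as well). We say that the measure space $(X, \widehat{\mathcal{M}}, \hat\mu)$ is complete.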

5.15

Let $(X, \mathcal{M}, \mu)$ be a measure space.

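Theorem. Suppose the functions $f_n$ are defined a.e.[$\mu$] on $X$, and are measurable. Suppose
$$\sum_{n=1}^\infty \int_X |f_n|\,d\mu < \infty. \tag{24}$$
Then the series
$$f(x) = \sum_{n=1}^\infty f_n(x) \tag{25}$$
converges for almost every $x$, $f \in L^1(\mu)$, and
$$\int_X f\,d\mu = \sum_{n=1}^\infty \int_X f_n\,d\mu. \tag{*}$$
Proof. Let $S_n$ be the set on which $f_n$ is defined, so that $\mu(S_n^c) = 0$. Put
$$\varphi(x) = \sum_{n=1}^\infty |f_n(x)|, \qquad x \in S := \bigcap_{n=1}^\infty S_n.$$
Then $\mu(S^c) = 0$, and by (24) and Theorem 5.7(ii),
$$\int_S \varphi\,d\mu = \sum_{n=1}^\infty \int_X |f_n|\,d\mu < \infty. \tag{26}$$
Let $E = \{x \in S : \varphi(x) < \infty\}$. From (26) we have $\mu(E^c) = 0$. For every $x \in E$, the series (25) converges absolutely, and $|f(x)| \le \varphi(x)$ on $E$; put $f(x) = 0$ for $x \notin E$. Hence $f \in L^1(\mu)$. If $g_n = f_1 + \cdots + f_n$, then $|g_n| \le \varphi$ and $g_n(x) \to f(x)$ at every $x \in E$, so by the Dominated Convergence Theorem (applied on $E$, whose complement has measure 0) we get (*).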

5.16

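Let $(X, \mathcal{M}, \mu)$ be a measure space.
1. If $f: X \to [-\infty, \infty]$ is measurable, $E \in \mathcal{M}$, and $\int_E |f|\,d\mu = 0$, then $f = 0$ a.e. on $E$.
2. If $f \in L^1(\mu)$ and $\int_E f\,d\mu = 0$ for all $E \in \mathcal{M}$, then $f = 0$ a.e. on $X$.
3. If $f \in L^1(\mu)$ and $\left|\int_X f\,d\mu\right| = \int_X |f|\,d\mu$, then there exists a constant $\alpha$ with $|\alpha| = 1$ such that $\alpha f = |f|$ a.e. on $X$.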

6
6.1

Convex functions

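Definition. A function $\varphi: (a,b) \to \mathbb{R}$ (where $-\infty \le a < b \le \infty$) is called convex if
$$\varphi(\lambda x + (1-\lambda) y) \le \lambda\varphi(x) + (1-\lambda)\varphi(y) \tag{27}$$
for all $x, y \in (a,b)$ and for all $0 \le \lambda \le 1$. Equivalent to (27) is the requirement that
$$\frac{\varphi(t) - \varphi(s)}{t - s} \le \frac{\varphi(u) - \varphi(t)}{u - t} \tag{28}$$
for all $a < s < t < u < b$.
How to check convexity? A differentiable function $\varphi$ is convex on $(a,b)$ if $\varphi'(s) \le \varphi'(t)$ for all $a < s < t < b$.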
We will use the following theorem from elementary analysis.
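Theorem. If $\varphi$ is convex on $(a,b)$, then it is continuous on $(a,b)$.
Proof. (Non-examinable) Fix $t \in (a,b)$. We show that $\varphi$ is continuous at $t$. Fix any $a < u < t < v < b$. Then for $u < s < v$, $s \ne t$, we have:
$$\frac{\varphi(t) - \varphi(u)}{t - u} \le \frac{\varphi(s) - \varphi(t)}{s - t} \le \frac{\varphi(v) - \varphi(t)}{v - t}.$$
Hence
$$|\varphi(s) - \varphi(t)| \le |s - t| \max\left\{ \left|\frac{\varphi(t) - \varphi(u)}{t - u}\right|, \left|\frac{\varphi(v) - \varphi(t)}{v - t}\right| \right\}.$$
This shows that $\varphi(s) \to \varphi(t)$ as $s \to t$. (We even get the stronger statement that $\varphi$ is Lipschitz on any compact subinterval of $(a,b)$.)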

6.2

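Jensen's inequality

Theorem. Suppose $(\Omega, \mathcal{M}, \mu)$ is a measure space such that $\mu(\Omega) = 1$ (a probability space). If $f \in L^1(\mu)$, $a < f(x) < b$ for all $x \in \Omega$, and $\varphi$ is convex on $(a,b)$, then
$$\varphi\Big(\int_\Omega f\,d\mu\Big) \le \int_\Omega (\varphi \circ f)\,d\mu. \tag{29}$$
Remark. It may happen that $\varphi \circ f \notin L^1(\mu)$. In this case the proof below will show that the right-hand side is $+\infty$.
Proof of Theorem. Denote $t := \int_\Omega f\,d\mu$. Then $a < t < b$. Let
$$\beta := \sup_{s:\, a < s < t} \frac{\varphi(t) - \varphi(s)}{t - s}.$$
Then we have
$$\varphi(s) \ge \varphi(t) + \beta(s - t), \qquad a < s < t. \tag{30}$$
By (28), $\beta \le \dfrac{\varphi(u) - \varphi(t)}{u - t}$ for $t < u < b$, so we also have
$$\varphi(u) \ge \varphi(t) + \beta(u - t), \qquad t < u < b. \tag{31}$$
Combining (30) and (31), and noting that equality holds at $s = t$, we get
$$\varphi(s) \ge \varphi(t) + \beta(s - t), \qquad a < s < b.$$
Therefore, for all $x \in \Omega$, we have:
$$\varphi(f(x)) - \varphi(t) - \beta(f(x) - t) \ge 0. \tag{32}$$
Since $\varphi$ is continuous, $\varphi \circ f$ is measurable. Integrating (32) over $\Omega$ and using $\mu(\Omega) = 1$, we get:
$$\int_\Omega (\varphi \circ f)\,d\mu - \varphi\Big(\int_\Omega f\,d\mu\Big) - \beta \underbrace{\Big(\int_\Omega f\,d\mu - \Big(\int_\Omega f\,d\mu\Big)\mu(\Omega)\Big)}_{=0} \ge 0,$$
which is (29).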
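The proof can be followed step by step on a finite probability space. In the Python sketch below the point probabilities `p`, the values `f` and the choice $\varphi = \exp$ are assumptions made for the illustration. For a differentiable convex $\varphi$, the supremum $\beta$ in the proof equals $\varphi'(t)$, so here $\beta = e^t$. The sketch checks the pointwise inequality (32), checks (29), and confirms that integrating the left side of (32) gives exactly the gap $\int (\varphi\circ f)\,d\mu - \varphi(\int f\,d\mu)$.

```python
import math

# Hypothetical probability space: four points with probabilities p (summing to 1).
p = [0.1, 0.2, 0.3, 0.4]
f = [-1.0, 0.5, 2.0, 3.0]   # values f(x_k)
phi = math.exp              # convex on (-inf, inf)

t = sum(pk * fk for pk, fk in zip(p, f))   # t = integral of f d(mu)
beta = math.exp(t)                         # for phi = exp, beta = phi'(t) = e^t

# Left-hand side of (32) at every point: phi(f(x)) - phi(t) - beta * (f(x) - t).
residuals = [phi(fk) - phi(t) - beta * (fk - t) for fk in f]

lhs = phi(t)                                      # phi(integral of f)
rhs = sum(pk * phi(fk) for pk, fk in zip(p, f))   # integral of phi o f
```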

6.3

Examples

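1. Taking $\varphi(x) = e^x$, which is convex on $(-\infty, \infty)$, in (29) we get
$$\exp\Big(\int_\Omega f\,d\mu\Big) \le \int_\Omega e^f\,d\mu.$$
2. Suppose $\Omega$ is finite, $\Omega = \{x_1, \ldots, x_n\}$, $\mu(\{x_i\}) = \frac{1}{n}$, $f(x_i) = a_i$. Then item (1) specializes to:
$$\exp\Big(\frac{1}{n}(a_1 + \cdots + a_n)\Big) \le \frac{1}{n}(e^{a_1} + \cdots + e^{a_n}).$$
3. Putting $b_i = e^{a_i}$ in item (2), we get the familiar inequality between the geometric and arithmetic mean of positive numbers $b_1, \ldots, b_n$:
$$(b_1 \cdots b_n)^{1/n} \le \frac{1}{n}(b_1 + \cdots + b_n).$$
4. A bit more generally than item (3), if $\mu(\{x_i\}) = p_i > 0$, where $\sum_{i=1}^n p_i = 1$, we have:
$$b_1^{p_1} \cdots b_n^{p_n} \le p_1 b_1 + \cdots + p_n b_n.$$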
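Item 4 (the weighted inequality between geometric and arithmetic means) is easy to test on random data. The Python sketch below draws positive numbers $b_i$ and weights $p_i$ from a seeded generator. The ranges and the number of trials are arbitrary choices. A relative tolerance of $10^{-12}$ absorbs floating-point rounding in the nearly equal cases. The last check is the equality case $b_1 = \cdots = b_n$.

```python
import math
import random


def weighted_means(b, p):
    """Return (b_1^p_1 * ... * b_n^p_n, p_1 b_1 + ... + p_n b_n) for b_i > 0 and weights p_i summing to 1."""
    geometric = math.prod(bk ** pk for bk, pk in zip(b, p))
    arithmetic = sum(pk * bk for bk, pk in zip(b, p))
    return geometric, arithmetic


rng = random.Random(0)
checks = []
for _ in range(1000):
    n = rng.randint(1, 6)
    b = [rng.uniform(0.01, 10.0) for _ in range(n)]
    raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
    p = [r / sum(raw) for r in raw]   # normalise the weights so they sum to 1
    checks.append(weighted_means(b, p))
```

`math.prod` requires Python 3.8 or later.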

6.4

Hölder's inequality; Minkowski's inequality

Let $(X, \mathcal{M}, \mu)$ be a measure space.

Definition. If $1 < p < \infty$, $1 < q < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, we call $p$ and $q$ conjugate exponents. (A special case is $p = q = 2$.) We extend the definition to the pairs $p = 1, q = \infty$ and $p = \infty, q = 1$.

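Theorem. Let $p$ and $q$ be conjugate exponents, $1 < p < \infty$. Let $f, g: X \to [0, \infty]$ be measurable functions. Then
$$\int_X fg\,d\mu \le \Big(\int_X f^p\,d\mu\Big)^{1/p} \Big(\int_X g^q\,d\mu\Big)^{1/q} \qquad \text{(Hölder's inequality)}$$
and
$$\Big(\int_X (f+g)^p\,d\mu\Big)^{1/p} \le \Big(\int_X f^p\,d\mu\Big)^{1/p} + \Big(\int_X g^p\,d\mu\Big)^{1/p} \qquad \text{(Minkowski's inequality)}$$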

Proof. We start with the proof of Hölder's inequality. Put
$$A := \Big(\int_X f^p\,d\mu\Big)^{1/p} \qquad \text{and} \qquad B := \Big(\int_X g^q\,d\mu\Big)^{1/q}.$$
Let us first discard some easy cases. If $A = 0$, then $f = 0$ a.e., and therefore $fg = 0$ a.e. It follows that the left-hand side of (Hölder's inequality) is 0, and the inequality holds. If $A > 0$ and $B = \infty$, then the right-hand side of (Hölder's inequality) is $\infty$, and the inequality holds. By symmetric arguments, we may discard the case $B = 0$ and the case $B > 0$, $A = \infty$. Henceforth we may assume $0 < A < \infty$ and $0 < B < \infty$.
Put $F = \frac{f}{A}$ and $G = \frac{g}{B}$. Then we have $\int_X F^p\,d\mu = 1 = \int_X G^q\,d\mu$. Consider a point $x \in X$ such that $0 < F(x) < \infty$ and $0 < G(x) < \infty$. Write these numbers in the form $F(x) = e^{s/p}$ and $G(x) = e^{t/q}$, with some numbers $s, t \in \mathbb{R}$. Then by Section 6.3 (item 4), we have
$$F(x)G(x) = e^{s/p + t/q} \le \frac{1}{p}e^s + \frac{1}{q}e^t = \frac{1}{p}F(x)^p + \frac{1}{q}G(x)^q. \tag{33}$$
Observe now that (33) also holds if $F(x)$ or $G(x)$ equals 0 or $\infty$, and therefore we have the inequality (33) at every $x \in X$. Integrating both sides of (33) we get:
$$\int_X FG\,d\mu \le \frac{1}{p}\int_X F^p\,d\mu + \frac{1}{q}\int_X G^q\,d\mu = \frac{1}{p} + \frac{1}{q} = 1.$$
Therefore,
$$\int_X fg\,d\mu \le AB,$$
which is the desired inequality (Hölder's inequality).

We now prove Minkowski's inequality. Write
$$(f+g)^p = f(f+g)^{p-1} + g(f+g)^{p-1}. \tag{34}$$
Using Hölder's inequality for the first term, we have
$$\int f(f+g)^{p-1} \le \Big(\int f^p\Big)^{1/p} \Big(\int (f+g)^{(p-1)q}\Big)^{1/q} = \Big(\int f^p\Big)^{1/p} \Big(\int (f+g)^p\Big)^{1/q},$$
where in the last step we used
$$(p-1)q = (p-1)\frac{1}{1 - \frac{1}{p}} = \frac{p(p-1)}{p-1} = p.$$
Similarly, using Hölder's inequality for the second term in (34), we have
$$\int g(f+g)^{p-1} \le \Big(\int g^p\Big)^{1/p} \Big(\int (f+g)^{(p-1)q}\Big)^{1/q} = \Big(\int g^p\Big)^{1/p} \Big(\int (f+g)^p\Big)^{1/q}.$$
Adding the two inequalities, we have:
$$\int (f+g)^p \le \left[\Big(\int f^p\Big)^{1/p} + \Big(\int g^p\Big)^{1/p}\right] \Big(\int (f+g)^p\Big)^{1/q}. \tag{35}$$
We now discard some trivial cases, to make sure there is no problem with infinities when re-arranging. If $\int (f+g)^p = 0$, then $f = g = 0$ a.e., and (Minkowski's inequality) holds. Assume $\int (f+g)^p > 0$. If we have $\int f^p = \infty$ or $\int g^p = \infty$, then (Minkowski's inequality) clearly holds. Otherwise, using that $\left(\frac{f+g}{2}\right)^p \le \frac{1}{2}(f^p + g^p)$ (by convexity of $u \mapsto u^p$ on $[0,\infty)$), we have $\int (f+g)^p < \infty$. Therefore, we can divide through by the factor $\left(\int (f+g)^p\right)^{1/q}$ in (35), and use $1 - \frac{1}{q} = \frac{1}{p}$ to get (Minkowski's inequality).
When does equality hold? Examining the proof of Hölder's inequality, we see that in (33) equality holds if and only if $s = t$, that is, if $F(x)^p = G(x)^q$. Therefore, after integration, equality will hold if and only if $F^p = G^q$ a.e. This holds if and only if there are constants $\alpha, \beta$, not both equal to 0, such that $\alpha f^p = \beta g^q$ a.e. (HW): when does equality hold in Minkowski's inequality?
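Both inequalities, and the equality case of Hölder's inequality, can be checked numerically for a discrete measure, where $\int h\,d\mu = \sum_k h(x_k)\mu(\{x_k\})$. In the Python sketch below the weights `w`, the functions `f` and `g`, and the exponent $p = 3$ are arbitrary assumptions. For the equality case it takes $g = f^{p-1}$, so that $g^q = f^{(p-1)q} = f^p$ and the condition $\alpha f^p = \beta g^q$ holds with $\alpha = \beta = 1$.

```python
import random


def lp_norm(f, w, p):
    """(sum_k w_k |f_k|^p)^(1/p): the L^p norm for a discrete measure with point weights w."""
    return sum(wk * abs(fk) ** p for fk, wk in zip(f, w)) ** (1 / p)


rng = random.Random(1)
w = [rng.uniform(0.1, 2.0) for _ in range(8)]   # hypothetical weights mu({x_k})
f = [rng.uniform(0.0, 5.0) for _ in range(8)]
g = [rng.uniform(0.0, 5.0) for _ in range(8)]
p = 3.0
q = p / (p - 1)   # conjugate exponent: 1/p + 1/q = 1

holder_lhs = sum(wk * fk * gk for fk, gk, wk in zip(f, g, w))
holder_rhs = lp_norm(f, w, p) * lp_norm(g, w, q)

mink_lhs = lp_norm([fk + gk for fk, gk in zip(f, g)], w, p)
mink_rhs = lp_norm(f, w, p) + lp_norm(g, w, p)

# Equality case in Hoelder: g = f^(p-1), so that g^q = f^p.
g_eq = [fk ** (p - 1) for fk in f]
eq_lhs = sum(wk * fk * gk for fk, gk, wk in zip(f, g_eq, w))
eq_rhs = lp_norm(f, w, p) * lp_norm(g_eq, w, q)
```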

6.5

$L^p$-spaces

Let $(X, \mathcal{M}, \mu)$ be a measure space.

Definition 1. If $0 < p < \infty$, and $f: X \to [-\infty, \infty]$ is measurable, we define
$$\|f\|_p := \Big(\int_X |f|^p\,d\mu\Big)^{1/p},$$
and let
$$L^p(\mu) := \{ f: X \to [-\infty, \infty] : f \text{ is measurable and } \|f\|_p < \infty \}.$$
We call $\|f\|_p$ the $L^p$-norm of $f$.
Example 1. When $\mu$ is Lebesgue measure on $\mathbb{R}^d$, we write $L^p(\mathbb{R}^d)$ for $L^p(\mu)$.
Example 2. When $\mu$ is counting measure on a set $A$, we write $\ell^p(A)$ for $L^p(\mu)$. When $A$ is countably infinite, the elements of $\ell^p(A)$ are infinite sequences $x = (\xi_1, \xi_2, \ldots)$ such that $\sum_{n=1}^\infty |\xi_n|^p < \infty$. It is customary to abbreviate this space to $\ell^p$. In this case $\|x\|_p = \left(\sum_{n=1}^\infty |\xi_n|^p\right)^{1/p}$.

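We next define $\|\cdot\|_\infty$ and the space $L^\infty(\mu)$. The idea is that $\|f\|_\infty$ should be the smallest a.e.-bound we have on the function. (Recall that we want definitions to be insensitive to changing a function on a set of measure 0.) We now state the formal definition.
Definition 2. Suppose $g: X \to [0, \infty]$ is measurable. Let
$$S := \{ \alpha \ge 0 : \mu(g^{-1}((\alpha, \infty])) = 0 \},$$
and let $\beta := \inf S$ (defined to be $\infty$ when $S = \emptyset$). Note that when $\beta < \infty$, since
$$\mu(g^{-1}((\beta, \infty])) = \mu\Big(\bigcup_{n=1}^\infty g^{-1}\big((\beta + \tfrac{1}{n}, \infty]\big)\Big) = \lim_{n\to\infty} \mu\big(g^{-1}((\beta + \tfrac{1}{n}, \infty])\big) = 0,$$
we have $\beta \in S$. That is, $g \le \beta$ a.e., and $\beta$ is the smallest number with this property. We call $\beta$ the essential supremum of $g$, and denote it $\operatorname{ess\,sup} g$.
If $f: X \to [-\infty, \infty]$ is measurable, we define $\|f\|_\infty := \operatorname{ess\,sup} |f|$, and let
$$L^\infty(\mu) := \{ f: X \to [-\infty, \infty] : f \text{ is measurable and } \|f\|_\infty < \infty \}.$$
We call $\|f\|_\infty$ the $L^\infty$-norm of $f$. The functions in $L^\infty(\mu)$ are also called essentially bounded functions.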
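On a finite measure space the essential supremum simply ignores the points of measure zero. The Python sketch below is a hypothetical helper written for this illustration, not a library function. The example values are made up: a function that is large, even infinite, only on null points still has essential supremum 3. If every point has measure zero, then $\mu(\{g > 0\}) = 0$, so $0 \in S$ and the essential supremum is 0. For counting measure it is the ordinary maximum.

```python
import math


def ess_sup(values, weights):
    """Essential supremum of a function g >= 0 on a finite measure space.

    values[k] = g(x_k) and weights[k] = mu({x_k}). Points of measure zero are
    ignored, so the result is the smallest a >= 0 with mu({g > a}) = 0.
    """
    charged = [v for v, m in zip(values, weights) if m > 0]
    return max(charged, default=0.0)


# Hypothetical example: g is huge (even infinite) only on points of measure zero.
values = [1.0, 100.0, 3.0, math.inf]
weights = [0.5, 0.0, 2.0, 0.0]
print(max(values), ess_sup(values, weights))  # inf 3.0
```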

Example 3. We write $L^\infty(\mathbb{R}^d)$ when $\mu$ is Lebesgue measure on $\mathbb{R}^d$. When $\mu$ is counting measure on $A$, all functions are measurable, and only the empty set has measure 0, so the definition reduces to:
$$\ell^\infty(A) = \{ f: A \to \mathbb{R} : f \text{ is bounded} \}.$$
When $A$ is countable, we write $\ell^\infty$ for the space of bounded sequences.
The following theorem is essentially obvious from our definitions and Hölder's inequality for non-negative functions.
Theorem 1. If $p$ and $q$ are conjugate exponents, $1 \le p \le \infty$, and $f \in L^p(\mu)$ and $g \in L^q(\mu)$, then $fg \in L^1(\mu)$ and $\|fg\|_1 \le \|f\|_p \|g\|_q$.
Proof. If $1 < p < \infty$, apply Hölder's inequality to $|f|$ and $|g|$ to get
$$\int_X |fg|\,d\mu = \int_X |f||g|\,d\mu \le \|f\|_p \|g\|_q < \infty.$$
This shows both that $fg \in L^1(\mu)$, and that $\|fg\|_1 \le \|f\|_p \|g\|_q$.
If $p = 1$, $q = \infty$, we observe that $|f(x)g(x)| = |f(x)||g(x)| \le |f(x)|\,\|g\|_\infty$ a.e. Integrating, we get
$$\int_X |fg|\,d\mu \le \|f\|_1 \|g\|_\infty < \infty.$$
The argument when $p = \infty$, $q = 1$ is identical.

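Theorem 2. Let $1 \le p \le \infty$, and $f, g \in L^p(\mu)$. Then $f + g \in L^p(\mu)$ and $\|f + g\|_p \le \|f\|_p + \|g\|_p$.
Proof. When $1 < p < \infty$, apply Minkowski's inequality to $|f|$ and $|g|$ to get
$$\int_X |f + g|^p\,d\mu \le \int_X (|f| + |g|)^p\,d\mu \le (\|f\|_p + \|g\|_p)^p < \infty.$$
This shows both that $f + g \in L^p(\mu)$ and that $\|f + g\|_p \le \|f\|_p + \|g\|_p$.
When $p = 1$, we simply use
$$\int_X |f + g|\,d\mu \le \int_X (|f| + |g|)\,d\mu = \|f\|_1 + \|g\|_1 < \infty.$$
When $p = \infty$, we use that
$$|f(x) + g(x)| \le |f(x)| + |g(x)| \le \|f\|_\infty + \|g\|_\infty < \infty \quad \text{for almost every } x \in X.$$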

The spaces $L^p(\mu)$ and $L^p[\mu]$. Note that for $\alpha \in \mathbb{R}$ we have $\|\alpha f\|_p = |\alpha|\,\|f\|_p$; together with Theorem 2 this shows that, for $1 \le p \le \infty$, $L^p(\mu)$ is a vector space. The following abstract nonsense is sometimes useful. We write $L^p[\mu]$ for the collection of equivalence classes of functions in $L^p(\mu)$. Then we can define a distance on $L^p[\mu]$ as follows:
$$d([f], [g]) := \|f - g\|_p.$$
It is easy to check that the right-hand side indeed does not depend on which representatives we pick from the equivalence classes. With this definition, the distance function $d$ satisfies the triangle inequality: for $[f], [g], [h] \in L^p[\mu]$ we have
$$d([f], [h]) = \|f - h\|_p \le \|f - g\|_p + \|g - h\|_p = d([f], [g]) + d([g], [h]).$$
Moreover, if $d([f], [g]) = 0$, we have $\|f - g\|_p = 0$, hence $|f - g| = 0$ a.e., and hence $[f] = [g]$. The latter property would not have been true without the identification of equivalent functions.

6.6

Completeness of $L^p(\mu)$

Let $(X, \mathcal{M}, \mu)$ be a measure space.

Definition 1. If $f_n \in L^p(\mu)$, $n = 1, 2, \ldots$ and $f \in L^p(\mu)$, we say that $f_n$ converges to $f$ in $L^p(\mu)$ if $\lim_{n\to\infty} \|f_n - f\|_p = 0$. We denote this as $f_n \xrightarrow{L^p} f$.
Definition 2. We say that a sequence $\{f_n\}_{n\ge1}$ of elements of $L^p(\mu)$ is a Cauchy sequence in $L^p(\mu)$ if for any $\varepsilon > 0$ there exists $N = N(\varepsilon)$ such that for all $n, m \ge N$ we have $\|f_n - f_m\|_p < \varepsilon$.
Theorem 1 (Riesz–Fischer Theorem). Let $1 \le p \le \infty$. Then $L^p(\mu)$ is complete, that is, every Cauchy sequence $\{f_n\}_{n\ge1}$ in $L^p(\mu)$ converges to a limit $f \in L^p(\mu)$ in $L^p(\mu)$.
Proof. We first prove the case $1 \le p < \infty$ (the case $p = \infty$ being easy). The idea is to find a subsequence along which convergence happens a.e. We show that we can find a subsequence $n_1 < n_2 < \ldots$ such that
$$\|f_{n_{i+1}} - f_{n_i}\|_p < \frac{1}{2^i}, \qquad i = 1, 2, \ldots. \tag{36}$$
Indeed, using the Cauchy property with $\varepsilon = 1/2$, we can find an index $n_1$ such that
$$\|f_m - f_{n_1}\|_p < \frac{1}{2} \quad \text{for all } m > n_1.$$
Next, using the Cauchy property with $\varepsilon = 1/4$, we can find $n_2 > n_1$ such that
$$\|f_m - f_{n_2}\|_p < \frac{1}{4} \quad \text{for all } m > n_2.$$
Continuing inductively, we find indices $n_1 < n_2 < \ldots$ such that
$$\|f_m - f_{n_i}\|_p < \frac{1}{2^i} \quad \text{for all } m > n_i. \tag{37}$$
Taking $m = n_{i+1}$ in (37) gives (36). Put
$$g_k := \sum_{i=1}^k |f_{n_{i+1}} - f_{n_i}|, \qquad g := \sum_{i=1}^\infty |f_{n_{i+1}} - f_{n_i}|.$$
By Minkowski's inequality (Theorem 2 of Section 6.5) and (36),
$$\|g_k\|_p \le \sum_{i=1}^k \|f_{n_{i+1}} - f_{n_i}\|_p \le \sum_{i=1}^k \frac{1}{2^i} \le 1.$$
Since $0 \le g_k^p \uparrow g^p$, the Monotone Convergence Theorem gives:
$$\int_X g^p\,d\mu = \lim_{k\to\infty} \int_X g_k^p\,d\mu \le 1.$$
In particular, $g < \infty$ a.e. At any point $x \in X$ where $g(x) < \infty$, the series
$$f_{n_1}(x) + \sum_{i=1}^\infty \big(f_{n_{i+1}}(x) - f_{n_i}(x)\big)$$
converges absolutely, so this series converges a.e. Denote the sum of this series by $f(x)$ whenever it converges, and let $f(x) = 0$ on the complementary set (of measure 0). Since the partial sums of the series are
$$f_{n_1} + \sum_{i=1}^{k-1} (f_{n_{i+1}} - f_{n_i}) = f_{n_k},$$
we have $f_{n_k}(x) \to f(x)$ a.e. Therefore, we have found the claimed subsequence converging to a limit a.e. What is left to show is that $f \in L^p(\mu)$ and that $f_n \xrightarrow{L^p} f$.

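Fix $\varepsilon > 0$. Due to the Cauchy property, there exists $N$ such that for $n, m \ge N$ we have $\|f_m - f_n\|_p < \varepsilon$. In particular, fixing an $n \ge N$, and applying Fatou's Lemma, we have
$$\int_X |f - f_n|^p\,d\mu = \int_X \liminf_{k\to\infty} |f_{n_k} - f_n|^p\,d\mu \le \liminf_{k\to\infty} \int_X |f_{n_k} - f_n|^p\,d\mu = \liminf_{k\to\infty} \|f_{n_k} - f_n\|_p^p \le \varepsilon^p. \tag{38}$$
Since $f_n \in L^p(\mu)$ and
$$\|f\|_p \le \|f_n\|_p + \|f - f_n\|_p < \infty,$$
it follows that $f \in L^p(\mu)$. Looking at (38) again, we get that for all $n \ge N$ we have $\|f - f_n\|_p \le \varepsilon$. Since $\varepsilon$ was arbitrary, we obtain that $\|f - f_n\|_p \to 0$ as $n \to \infty$, and the theorem is proved in the case $1 \le p < \infty$.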
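The case $p = \infty$ is a lot easier. The sets
$$A_k = \{x \in X : |f_k(x)| > \|f_k\|_\infty\}, \qquad B_{n,m} = \{x \in X : |f_n(x) - f_m(x)| > \|f_n - f_m\|_\infty\}$$
have measure 0. Let $E = \left(\bigcup_{k=1}^\infty A_k\right) \cup \left(\bigcup_{n,m=1}^\infty B_{n,m}\right)$, so that $\mu(E) = 0$. At every $x \in X \setminus E$, the sequence $\{f_n(x)\}_{n\ge1}$ is a Cauchy sequence in $\mathbb{R}$, and hence converges to a limit $f(x)$ in $\mathbb{R}$; put $f(x) = 0$ for $x \in E$. Since the Cauchy property holds uniformly at every $x \in X \setminus E$, $|f(x)|$ is uniformly bounded for $x \in X \setminus E$, and the convergence is uniform in $x \in X \setminus E$. Therefore, $f \in L^\infty(\mu)$, and $\|f_n - f\|_\infty \to 0$ as $n \to \infty$.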

The following theorem is a corollary of the proof of Theorem 1, and is useful in itself.
Theorem 2. Let $1 \le p \le \infty$. If $\{f_n\}_{n\ge1}$ is a Cauchy sequence in $L^p(\mu)$ with limit $f$, then there exists a subsequence $f_{n_k}$ such that $f_{n_k} \to f$ a.e.

7
7.1

Product measure

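Definition 1. Let $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ be measure spaces. A set of the form $A \times B \subset X \times Y$, where $A \in \mathcal{M}$ and $B \in \mathcal{N}$, is called a measurable rectangle.
We denote
$$\mathcal{S} := \{ A \times B : A \in \mathcal{M}, B \in \mathcal{N} \}.$$
Lemma. $\mathcal{S}$ is a semi-algebra.
Proof. If $A \times B, C \times D \in \mathcal{S}$, we have
$$(A \times B) \cap (C \times D) = \underbrace{(A \cap C)}_{\in \mathcal{M}} \times \underbrace{(B \cap D)}_{\in \mathcal{N}} \in \mathcal{S},$$
so $\mathcal{S}$ is closed under intersection. If $A \times B \in \mathcal{S}$, we have:
$$(A \times B)^c = (A^c \times B) \cup (A \times B^c) \cup (A^c \times B^c),$$
which is a disjoint union of members of $\mathcal{S}$. This shows that $\mathcal{S}$ is a semi-algebra.
Definition 2. We denote $\mathcal{M} \times \mathcal{N} := \sigma(\mathcal{S})$, and call it the product $\sigma$-algebra (of $\mathcal{M}$ and $\mathcal{N}$).
Caution: Although we write $\mathcal{M} \times \mathcal{N}$, this is not a Cartesian product of the collections $\mathcal{M}$ and $\mathcal{N}$.
(HW): Show that if $\mathcal{B}^n = \mathcal{B}(\mathbb{R}^n)$, then $\mathcal{B}^n \times \mathcal{B}^m = \mathcal{B}^{n+m}$.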
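Theorem. If $\mu$ and $\nu$ are $\sigma$-finite measures, then there exists a unique measure $\pi$ on the product $\sigma$-algebra $\mathcal{M} \times \mathcal{N}$ such that
$$\pi(A \times B) = \mu(A)\nu(B) \qquad \text{for all } A \in \mathcal{M},\ B \in \mathcal{N}. \tag{39}$$
Definition 3. We denote $\pi =: \mu \times \nu$, and call it the product measure (of $\mu$ and $\nu$).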
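Proof of Theorem. Formula (39) defines $\pi$ on $\mathcal{S}$. We show that the conditions of Theorem 3.7 are satisfied. We have
$$\pi(\emptyset) = \pi(\emptyset \times \emptyset) = \mu(\emptyset)\nu(\emptyset) = 0.$$
We also show that if $A \times B = \bigcup_{i=1}^\infty (A_i \times B_i)$ is a disjoint union, then
$$\pi(A \times B) = \sum_{i=1}^\infty \pi(A_i \times B_i). \tag{40}$$
Given $x \in A$, write $I(x) := \{1 \le i < \infty : x \in A_i\}$. Then $B = \bigcup_{i \in I(x)} B_i$ is a disjoint union, and hence by countable additivity of $\nu$, we have
$$1_A(x)\nu(B) = \sum_{i=1}^\infty 1_{A_i}(x)\nu(B_i).$$
(Both sides vanish when $x \notin A$.) Integrating both sides with respect to $\mu$, and using Theorem 5.7(ii), we get
$$\mu(A)\nu(B) = \int_X \Big(\sum_{i=1}^\infty 1_{A_i}(x)\nu(B_i)\Big)\,d\mu(x) = \sum_{i=1}^\infty \int_X 1_{A_i}(x)\nu(B_i)\,d\mu(x) = \sum_{i=1}^\infty \mu(A_i)\nu(B_i).$$
This proves the claim (40). An application of Theorem 3.7 yields that $\pi$ extends uniquely to a pre-measure on the algebra generated by $\mathcal{S}$.
Since we assumed that $\mu$ and $\nu$ are $\sigma$-finite, we can write $X = \bigcup_{n=1}^\infty X_n$, $Y = \bigcup_{m=1}^\infty Y_m$, where $\mu(X_n), \nu(Y_m) < \infty$. Therefore, we have $X \times Y = \bigcup_{n,m=1}^\infty X_n \times Y_m$, with $\pi(X_n \times Y_m) < \infty$, so $\pi$ is $\sigma$-finite. Therefore, we can apply Carathéodory's Extension Theorem 3.10, and the statement is proved.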

7.2

Example

Write $(\mathbb{R}^n, \mathcal{B}^n, \lambda^n)$ for Lebesgue measure on $\mathbb{R}^n$. It can be shown using the Uniqueness Lemma 3.12 that $\lambda^n \times \lambda^m = \lambda^{n+m}$.

7.3

Fubini's Theorem

Theorem. Let $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ be $\sigma$-finite measure spaces. Let $f$ be an $\mathcal{M} \times \mathcal{N}$-measurable function. Suppose that either:
(i) $f \ge 0$; or
(ii) $f \in L^1(\mu \times \nu)$.
Then we have
$$\int_X \Big(\int_Y f(x,y)\,d\nu(y)\Big)\,d\mu(x) = \int_{X \times Y} f\,d(\mu \times \nu) = \int_Y \Big(\int_X f(x,y)\,d\mu(x)\Big)\,d\nu(y).$$
Proof. It is enough to show the first equality (the other one being completely analogous). As part of the proof, we have to show that:
(i) for all $x \in X$ the function $y \mapsto f(x,y)$ is $\mathcal{N}$-measurable; and
(ii) the function $x \mapsto \int_Y f(x,y)\,d\nu(y)$ is defined $\mu$-a.e. and is $\mathcal{M}$-measurable.
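Assume first that $f = 1_E$, where $E \in \mathcal{M} \times \mathcal{N}$. For $x \in X$, define
$$E_x := \{ y \in Y : (x,y) \in E \},$$
called the cross section of $E$ at $x$.
Lemma 1. Let $E \in \mathcal{M} \times \mathcal{N}$. Then for all $x \in X$, we have $E_x \in \mathcal{N}$.
Proof. Fix $x \in X$, and consider the collection of sets for which the statement in the lemma is true, that is, let
$$\mathcal{E} := \{ E \in \mathcal{M} \times \mathcal{N} : E_x \in \mathcal{N} \}.$$
We have $\mathcal{S} \subset \mathcal{E}$: indeed, if $A \times B \in \mathcal{S}$, we have
$$(A \times B)_x = \begin{cases} B & \text{if } x \in A; \\ \emptyset & \text{if } x \notin A. \end{cases}$$
We show that $\mathcal{E}$ is a $\sigma$-algebra in $X \times Y$, which will imply that $\mathcal{E}$ contains $\sigma(\mathcal{S}) = \mathcal{M} \times \mathcal{N}$, and hence equals $\mathcal{M} \times \mathcal{N}$.
It is clear that $X \times Y \in \mathcal{E}$. Let now $E \in \mathcal{E}$. Then
$$(E^c)_x = \{ y \in Y : (x,y) \in E^c \} = \{ y \in Y : (x,y) \notin E \} = \{ y \in Y : (x,y) \in E \}^c = (E_x)^c \in \mathcal{N},$$
showing that $\mathcal{E}$ is closed under complements. Let $E_1, E_2, \ldots \in \mathcal{E}$. Then
$$\Big(\bigcup_{i=1}^\infty E_i\Big)_x = \Big\{ y \in Y : (x,y) \in \bigcup_{i=1}^\infty E_i \Big\} = \bigcup_{i=1}^\infty \{ y \in Y : (x,y) \in E_i \} = \bigcup_{i=1}^\infty (E_i)_x \in \mathcal{N},$$
showing that $\mathcal{E}$ is closed under countable unions. Therefore, $\mathcal{E}$ is indeed a $\sigma$-algebra, and the proof of the lemma is complete.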
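Lemma 2. Let $E \in \mathcal{M} \times \mathcal{N}$. The function $g(x) := \nu(E_x)$ is $\mathcal{M}$-measurable, and $\int_X g\,d\mu = (\mu \times \nu)(E)$.
Proof. Let $X = \bigcup_{n=1}^\infty X_n$ and $Y = \bigcup_{n=1}^\infty Y_n$, such that $\mu(X_n), \nu(Y_n) < \infty$; replacing $X_n$ by $X_1 \cup \cdots \cup X_n$ (and similarly for $Y_n$), we may assume that $X_1 \subset X_2 \subset \ldots$ and $Y_1 \subset Y_2 \subset \ldots$. We first prove the statement for $E \subset X_n \times Y_n$. Let
$$\mathcal{P}_n := \text{measurable rectangles in } X_n \times Y_n = \{ A \times B : A \subset X_n,\ A \in \mathcal{M},\ B \subset Y_n,\ B \in \mathcal{N} \}.$$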

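Then $\mathcal{P}_n$ is a $\pi$-system in $X_n \times Y_n$. Let $\mathcal{L}_n$ denote the collection of subsets of $X_n \times Y_n$ that are in $\mathcal{M} \times \mathcal{N}$ and for which the statement of the lemma is true:
$$\mathcal{L}_n := \left\{ E \in \mathcal{M} \times \mathcal{N} : \begin{array}{l} E \subset X_n \times Y_n, \text{ the function } x \mapsto \nu(E_x) \text{ is } \mathcal{M}\text{-measurable} \\ \text{and } \int_X \nu(E_x)\,d\mu = (\mu \times \nu)(E) \end{array} \right\}.$$
We first check that $\mathcal{P}_n \subset \mathcal{L}_n$. Let $E = A \times B \in \mathcal{P}_n$. Then
$$g(x) = \nu(E_x) = \begin{cases} \nu(B) & \text{if } x \in A; \\ 0 & \text{if } x \notin A. \end{cases}$$
So $g(x) = \nu(B)1_A(x)$, and this function is indeed $\mathcal{M}$-measurable, since $A \in \mathcal{M}$. Also, $\int_X g\,d\mu = \mu(A)\nu(B) = (\mu \times \nu)(E)$.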
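Now we check that $\mathcal{L}_n$ is a $\lambda$-system. First, it is clear that $X_n \times Y_n \in \mathcal{L}_n$, by the previous paragraph. Let $E, F \in \mathcal{L}_n$, $E \subset F$. Then
$$\nu((F \setminus E)_x) = \nu(F_x \setminus E_x) = \nu(F_x) - \nu(E_x).$$
Here we used that we are allowed to subtract, since the sets have finite measure due to $\nu(Y_n) < \infty$. Since $E, F \in \mathcal{L}_n$, the functions $x \mapsto \nu(F_x)$ and $x \mapsto \nu(E_x)$ are both $\mathcal{M}$-measurable, and hence $x \mapsto \nu((F \setminus E)_x)$ is $\mathcal{M}$-measurable. Also, we have
$$\int_X \nu((F \setminus E)_x)\,d\mu = \int_X \nu(F_x)\,d\mu - \int_X \nu(E_x)\,d\mu = (\mu \times \nu)(F) - (\mu \times \nu)(E) = (\mu \times \nu)(F \setminus E).$$
Here we used that we are allowed to subtract, since the integrals are finite due to $\mu(X_n) < \infty$. Therefore, we have $F \setminus E \in \mathcal{L}_n$. The proof is similar for monotone unions. Let $E_1 \subset E_2 \subset \ldots$, $E_1, E_2, \ldots \in \mathcal{L}_n$, and write $E = \bigcup_{i=1}^\infty E_i$. Then $E_x = \bigcup_{i=1}^\infty (E_i)_x$, and this is an increasing union. Therefore
$$\nu(E_x) = \nu\Big(\bigcup_{i=1}^\infty (E_i)_x\Big) = \lim_{i\to\infty} \nu((E_i)_x).$$
Thus $x \mapsto \nu(E_x)$ is a limit of $\mathcal{M}$-measurable functions, and hence $\mathcal{M}$-measurable. Also, due to the Monotone Convergence Theorem, we have
$$\int_X \nu(E_x)\,d\mu = \lim_{i\to\infty} \int_X \nu((E_i)_x)\,d\mu = \lim_{i\to\infty} (\mu \times \nu)(E_i) = (\mu \times \nu)(E).$$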

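Now we can apply the $\pi$-$\lambda$ Theorem to get $\mathcal{L}_n \supset \sigma(\mathcal{P}_n) = \mathcal{M}|_{X_n} \times \mathcal{N}|_{Y_n}$, that is, the statement of the lemma holds for all $E \subset X_n \times Y_n$, $E \in \mathcal{M} \times \mathcal{N}$.
To complete the proof of the lemma, let now $E \in \mathcal{M} \times \mathcal{N}$ be arbitrary, and let $E_n = E \cap (X_n \times Y_n)$. We have
$$\nu(E_x) = \lim_{n\to\infty} \nu((E_n)_x),$$
so the function $x \mapsto \nu(E_x)$ is a limit of $\mathcal{M}$-measurable functions and hence itself $\mathcal{M}$-measurable. Also, the Monotone Convergence Theorem gives
$$\int_X \nu(E_x)\,d\mu = \lim_{n\to\infty} \int_X \nu((E_n)_x)\,d\mu = \lim_{n\to\infty} (\mu \times \nu)(E_n) = (\mu \times \nu)(E).$$
This completes the proof of the lemma.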

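We can now finish the proof of Fubini's Theorem in a few steps. If $f = 1_E$ for some $E \in \mathcal{M} \times \mathcal{N}$, the statement holds due to Lemma 2.
If $f = \sum_{i=1}^n \alpha_i 1_{E_i}$ is a non-negative simple function, then the function
$$y \mapsto f(x,y) = \sum_{i=1}^n \alpha_i 1_{(E_i)_x}(y)$$
is $\mathcal{N}$-measurable by Lemma 1, and
$$\int_Y f(x,y)\,d\nu(y) = \sum_{i=1}^n \alpha_i \nu((E_i)_x).$$
So using Lemma 2, the function $x \mapsto \int_Y f(x,y)\,d\nu(y)$ is $\mathcal{M}$-measurable. Integrating with respect to $\mu$, we get:
$$\int_X \Big(\int_Y f(x,y)\,d\nu(y)\Big)\,d\mu(x) = \sum_{i=1}^n \alpha_i \int_X \nu((E_i)_x)\,d\mu(x) = \sum_{i=1}^n \alpha_i (\mu \times \nu)(E_i) = \int_{X \times Y} f\,d(\mu \times \nu).$$
So the statement holds for non-negative simple functions.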

Let now $f$ be any non-negative $\mathcal{M} \times \mathcal{N}$-measurable function. Choose simple functions $0 \le s_n \uparrow f$. Then by the previous paragraph,
$$y \mapsto f(x,y) = \lim_{n\to\infty} s_n(x,y)$$
is a limit of $\mathcal{N}$-measurable functions, and hence is $\mathcal{N}$-measurable. By the Monotone Convergence Theorem,
$$\int_Y f(x,y)\,d\nu(y) = \lim_{n\to\infty} \int_Y s_n(x,y)\,d\nu(y).$$
Hence $x \mapsto \int_Y f(x,y)\,d\nu(y)$ is a limit of $\mathcal{M}$-measurable functions, and hence is itself $\mathcal{M}$-measurable. Integrating with respect to $\mu$ and using the Monotone Convergence Theorem again yields
$$\int_X \Big(\int_Y f(x,y)\,d\nu(y)\Big)\,d\mu(x) = \lim_{n\to\infty} \int_X \Big(\int_Y s_n(x,y)\,d\nu(y)\Big)\,d\mu(x) = \lim_{n\to\infty} \int_{X \times Y} s_n\,d(\mu \times \nu) = \int_{X \times Y} f\,d(\mu \times \nu).$$
This completes the non-negative case of the Theorem.
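On finite spaces every function is measurable and every integral is a finite sum, so the non-negative case just proved can be checked by direct computation. The Python sketch below is an illustration under these assumptions only. The spaces, the point masses `mu` and `nu`, and the function `f` are hypothetical choices. It builds $\mu \times \nu$ on points as $\mu(\{x\})\nu(\{y\})$, checks (39) on one rectangle, and compares the two iterated integrals with the double integral.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite measure spaces X = {0, 1, 2} and Y = {0, 1}, given by point masses.
mu = {0: Fraction(1), 1: Fraction(1, 2), 2: Fraction(3)}
nu = {0: Fraction(2), 1: Fraction(1, 3)}

# Product measure on points: (mu x nu)({(x, y)}) = mu({x}) * nu({y}), i.e. (39) for singletons.
pi = {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}


def measure(m, E):
    """Measure of a subset E of a finite space with point masses m."""
    return sum(m[e] for e in E)


def f(x, y):
    """An arbitrary non-negative function on X x Y."""
    return Fraction(x * x + 3 * y + 1)


A, B = {0, 2}, {1}
rect = set(product(A, B))   # the measurable rectangle A x B

iterated_xy = sum(mu[x] * sum(nu[y] * f(x, y) for y in nu) for x in mu)  # integrate over Y first
iterated_yx = sum(nu[y] * sum(mu[x] * f(x, y) for x in mu) for y in nu)  # integrate over X first
double = sum(pi[xy] * f(*xy) for xy in pi)                               # integral d(mu x nu)
```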

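Finally, suppose that $f \in L^1(\mu \times \nu)$. Write $f = f^+ - f^-$. Then, since $f^+$ and $f^-$ are $\mathcal{M} \times \mathcal{N}$-measurable, the non-negative case shows that the function
$$y \mapsto f(x,y) = f^+(x,y) - f^-(x,y)$$
is $\mathcal{N}$-measurable. Applying the non-negative case to $|f|$, we have that
$$\int_X \Big(\int_Y |f(x,y)|\,d\nu(y)\Big)\,d\mu(x) = \int_{X \times Y} |f|\,d(\mu \times \nu) < \infty. \tag{41}$$
Therefore
$$\varphi(x) := \int_Y |f(x,y)|\,d\nu(y) < \infty \qquad \text{for } \mu\text{-a.e. } x.$$
This implies that the function
$$g(x) := \int_Y f(x,y)\,d\nu(y) = \int_Y f^+(x,y)\,d\nu(y) - \int_Y f^-(x,y)\,d\nu(y)$$
is well-defined for $\mu$-a.e. $x$. For definiteness, let us put $g(x) = 0$ when $\varphi(x) = \infty$. Then $g$ is a difference of two $\mathcal{M}$-measurable functions, and hence is itself $\mathcal{M}$-measurable. We have $\int_Y f^+(x,y)\,d\nu(y) \le \varphi(x)$ and $\int_Y f^-(x,y)\,d\nu(y) \le \varphi(x)$. Therefore, due to (41) we have $g \in L^1(\mu)$ and
$$\begin{aligned} \int_X \Big(\int_Y f(x,y)\,d\nu(y)\Big)\,d\mu(x) &= \int_X g(x)\,d\mu(x) = \int_X \Big(\int_Y f^+(x,y)\,d\nu(y)\Big)\,d\mu(x) - \int_X \Big(\int_Y f^-(x,y)\,d\nu(y)\Big)\,d\mu(x) \\ &= \int_{X \times Y} f^+\,d(\mu \times \nu) - \int_{X \times Y} f^-\,d(\mu \times \nu) = \int_{X \times Y} f\,d(\mu \times \nu). \end{aligned}$$
This completes the $L^1$-case of the Theorem.

7.4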

Important counterexamples

Example 1. Let $X = Y = \{1, 2, \ldots\}$ with $\mu = \nu$ = counting measure. Put $f(m,m) = 1$, $f(m, m+1) = -1$ for all $m \ge 1$, and put $f(m,n) = 0$ otherwise. Then
$$\sum_n \sum_m f(m,n) = 1 \qquad \text{and} \qquad \sum_m \sum_n f(m,n) = 0.$$
Here $f \not\ge 0$ and $f \notin L^1(\mu \times \nu)$.
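In Example 1 every inner sum has at most two non-zero terms, so both iterated sums can be computed exactly. The Python sketch below does this, with the outer sums truncated at a hypothetical cut-off `N`. The truncation changes nothing, because every row sum is 0 and every column sum with $n \ge 2$ is 0. Summing each row first gives 0, while summing each column first gives 1, matching the two values above.

```python
def f(m, n):
    """The function of Example 1 on {1, 2, ...} x {1, 2, ...}."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0


def row_sum(m):
    # Sum over n of f(m, n): only n = m and n = m + 1 contribute.
    return sum(f(m, n) for n in range(1, m + 2))


def column_sum(n):
    # Sum over m of f(m, n): only m = n and m = n - 1 contribute.
    return sum(f(m, n) for m in range(1, n + 1))


N = 500  # hypothetical cut-off for the outer sums (exact here, see above)
sum_rows_first = sum(row_sum(m) for m in range(1, N + 1))        # sum_m sum_n f(m, n)
sum_columns_first = sum(column_sum(n) for n in range(1, N + 1))  # sum_n sum_m f(m, n)
```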
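Example 2. Take $X = (0,1)$, $Y = (1,\infty)$ with Lebesgue measure. Let $f(x,y) = e^{-xy} - 2e^{-2xy}$. Since $\int_1^\infty f(x,y)\,dy = \frac{e^{-x} - e^{-2x}}{x} > 0$ and $\int_0^1 f(x,y)\,dx = \frac{e^{-2y} - e^{-y}}{y} < 0$, we get
$$\int_0^1 \Big(\int_1^\infty f(x,y)\,dy\Big)\,dx > 0, \qquad \int_1^\infty \Big(\int_0^1 f(x,y)\,dx\Big)\,dy < 0.$$
Here again $f \not\ge 0$ and $f \notin L^1((0,1) \times (1,\infty))$.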
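Example 3. Let $X = (0,1)$, $\mathcal{M} = \mathcal{B}$, $\mu$ = Lebesgue measure, and let $Y = (0,1)$, $\mathcal{N}$ = all subsets of $(0,1)$, and $\nu$ = counting measure. Let
$$f(x,y) = \begin{cases} 1 & \text{if } x = y; \\ 0 & \text{if } x \ne y. \end{cases}$$
We have
$$\int_X \Big(\int_Y f(x,y)\,d\nu(y)\Big)\,d\mu(x) = \int_X 1\,d\mu(x) = 1, \qquad \int_Y \Big(\int_X f(x,y)\,d\mu(x)\Big)\,d\nu(y) = \int_Y 0\,d\nu(y) = 0.$$
What fails here is that $\nu$ is not $\sigma$-finite.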

Example 4. (Non-examinable) Take $X = Y = [0,1]$ with Lebesgue measure. Let $\prec$ be a linear ordering of $[0,1]$ with the property that any non-empty subset $\emptyset \ne A \subset [0,1]$ has a least element with respect to $\prec$ (called a well-ordering). Such $\prec$ exists if we assume the so-called Axiom of Choice. Further assume that $\prec$ has the property that for all $x \in [0,1]$ the set $\{y \in [0,1] : y \prec x\}$ is countable. This can be achieved if we assume the so-called Continuum Hypothesis. Let
$$f(x,y) = \begin{cases} 1 & \text{if } y \prec x; \\ 0 & \text{otherwise.} \end{cases}$$
We have
$$\int_X \Big(\int_Y f(x,y)\,dy\Big)\,dx = \int_X 0\,dx = 0; \qquad \int_Y \Big(\int_X f(x,y)\,dx\Big)\,dy = \int_Y 1\,dy = 1.$$
What goes wrong here is that $f$ is not $\mathcal{B} \times \mathcal{B}$-measurable (although it is measurable separately in both variables).

7.5

Convolutions

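Let $f, g \in L^1(\mathbb{R})$. We define
$$h(x) := (f * g)(x) := \int_{-\infty}^{\infty} f(y) g(x - y)\,dy, \tag{42}$$
whenever the integral exists.

Theorem. Suppose $f, g \in L^1(\mathbb{R})$ are $\mathcal{B}$-measurable. Then
$$\int_{-\infty}^{\infty} |f(y) g(x - y)|\,dy < \infty \quad \text{for a.e. } x.$$
Also, if $h$ is defined by (42), then $h \in L^1(\mathbb{R})$ and $\|h\|_1 \le \|f\|_1 \|g\|_1$.

Proof. The plan is to apply Fubini's Theorem to $|F|$, where
$$F(x,y) = f(y) g(x - y), \qquad (x,y) \in \mathbb{R}^2.$$
Define $\varphi, \psi: \mathbb{R}^2 \to \mathbb{R}$ by
$$\varphi(x,y) := y \qquad \text{and} \qquad \psi(x,y) := x - y.$$
Since $\varphi$ and $\psi$ are continuous on $\mathbb{R}^2$, they are Borel functions. It follows that
$$(x,y) \mapsto f(y) = (f \circ \varphi)(x,y) \qquad \text{and} \qquad (x,y) \mapsto g(x - y) = (g \circ \psi)(x,y)$$
are Borel functions, and hence so is their product $F(x,y)$. Apply Fubini's Theorem to $|F|$:
$$\begin{aligned} \int \Big(\int |f(y) g(x-y)|\,dy\Big)\,dx &= \int \Big(\int |f(y)||g(x-y)|\,dx\Big)\,dy = \int |f(y)| \Big(\int |g(x-y)|\,dx\Big)\,dy \\ &= \int |f(y)| \Big(\int |g(x)|\,dx\Big)\,dy = \|f\|_1 \|g\|_1 < \infty, \end{aligned} \tag{43}$$
where the third equality uses the translation invariance of Lebesgue measure. In particular, for a.e. $x$ we have
$$\int |f(y) g(x - y)|\,dy < \infty,$$
and hence the integral (42) defining $h(x)$ exists for a.e. $x$. Also, since
$$|h(x)| \le \int |f(y) g(x - y)|\,dy,$$
the calculation (43) shows that $h \in L^1(\mathbb{R})$ and $\|h\|_1 \le \|f\|_1 \|g\|_1$.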

Remark. The result and proof extend to the case when f and g are Lebesgue
measurable. In that case h is Lebesgue measurable.
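The same bound holds, with the same Fubini argument, for counting measure on the integers: finitely supported sequences $a, b$ have $(a*b)_k = \sum_j a_j b_{k-j}$ and $\|a*b\|_1 \le \|a\|_1\|b\|_1$. This is the discrete analogue on $\mathbb{Z}$ of the Theorem above, not the statement for Lebesgue measure on $\mathbb{R}$. The Python sketch below uses it as an exactly computable illustration. The sequences are arbitrary, and indexing from 0 is a convenience of the sketch. It also checks that summing the convolution gives $\sum a \cdot \sum b$, and that equality holds in the bound for non-negative sequences.

```python
def convolve(a, b):
    """(a * b)[k] = sum_j a[j] * b[k - j] for finitely supported sequences indexed from 0."""
    h = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for i, bi in enumerate(b):
            h[j + i] += aj * bi
    return h


def l1(seq):
    """l^1 norm (integral of |.| with respect to counting measure)."""
    return sum(abs(x) for x in seq)


a = [1.0, -2.0, 0.5]
b = [3.0, 1.0, -1.0, 2.0]
h = convolve(a, b)
```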

Applications to probability

This section summarizes some applications of measure theory to probability. This distinction is of course a bit artificial, because all of measure theory applies to probability spaces. The measure theory needed

71

for probability is summarized in many texts, for example  or . A

nice short treatment is provided by the book of Kolmogorov , who
was the first to work out the measure theoretic foundations of probability. Statements not proved in detail in this section can be found in
these references.

8.1

Product spaces

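Example 1. Let $X = \{0,1\}$, $\mathcal{M} = \{\emptyset, \{0\}, \{1\}, \{0,1\}\}$, and $\mu_i = (1 - p_i)\delta_0 + p_i\delta_1$, $i = 1, \ldots, n$. The probability space $(X, \mathcal{M}, \mu_i)$ models the flip of a coin with bias $p_i$. The product space $(\Omega_n, \mathcal{F}_n, \mathbb{P}_n)$, where $\Omega_n = X \times \cdots \times X$, $\mathcal{F}_n = \mathcal{M} \times \cdots \times \mathcal{M}$ and $\mathbb{P}_n = \mu_1 \times \cdots \times \mu_n$, models flipping the coins independently (see Section 8.2 below):
$$\mathbb{P}_n[\{(\omega_1, \ldots, \omega_n)\}] = \mu_1(\{\omega_1\}) \cdots \mu_n(\{\omega_n\}).$$
Sometimes it is necessary to model an infinite sequence of independent experiments. Then we take $\Omega = \prod_{i=1}^\infty X$ as the sample space. Each $\mathcal{F}_n$ can be regarded as a $\sigma$-algebra in $\Omega$ in a natural way (as the collection of events that can be specified in terms of the first $n$ coordinates only). Then we take $\mathcal{F} := \sigma\left(\bigcup_{n=1}^\infty \mathcal{F}_n\right)$ as the $\sigma$-algebra for the infinite sequence of coin tosses. One can show that there is a unique measure $\mathbb{P}$ on $\mathcal{F}$ such that its restriction to $\mathcal{F}_n$ is given by $\mathbb{P}_n$ for every $n \ge 1$. This is a special case of what is known as Kolmogorov's Extension Theorem.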

8.2

Independence

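Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space.

Definition 1. Events $A, B \in \mathcal{F}$ are called independent if $\mathbb{P}[A \cap B] = \mathbb{P}[A]\,\mathbb{P}[B]$.
Definition 2. We say that $\sigma$-algebras $\mathcal{F}_1, \mathcal{F}_2 \subset \mathcal{F}$ are independent if for any $A \in \mathcal{F}_1$ and $B \in \mathcal{F}_2$ the events $A$ and $B$ are independent.
Definition 3. If $X: \Omega \to \mathbb{R}$ is a random variable (RV) (that is: a measurable function), we define the $\sigma$-algebra generated by $X$ as
$$\sigma(X) := \{ X^{-1}(B) : B \subset \mathbb{R},\ B \in \mathcal{B} \},$$
where recall that $\mathcal{B}$ denotes the Borel sets in $\mathbb{R}$.
(HW): $\sigma(X)$ is a $\sigma$-algebra in $\Omega$, $\sigma(X) \subset \mathcal{F}$, and $X$ is measurable with respect to $\sigma(X)$.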

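Definition 4. The RVs $X$ and $Y$ are called independent if $\sigma(X)$ and $\sigma(Y)$ are independent $\sigma$-algebras.
Lemma 1. The collection
$$\mathcal{P}_X := \{ X^{-1}((-\infty, x]) : x \in \mathbb{R} \}$$
is a $\pi$-system that generates $\sigma(X)$.
Proof. We have
$$X^{-1}((-\infty, x_1]) \cap X^{-1}((-\infty, x_2]) = X^{-1}((-\infty, x_1] \cap (-\infty, x_2]) = X^{-1}((-\infty, \min(x_1, x_2)]) \in \mathcal{P}_X,$$
so $\mathcal{P}_X$ is a $\pi$-system. Since $\mathcal{P}_X \subset \sigma(X)$, we have $\sigma(\mathcal{P}_X) \subset \sigma(X)$. Let
$$\mathcal{M} := \{ B \in \mathcal{B} : X^{-1}(B) \in \sigma(\mathcal{P}_X) \}.$$
Then $\mathcal{M}$ is a $\sigma$-algebra in $\mathbb{R}$. Indeed:
• $X^{-1}(\mathbb{R}) = \bigcup_{n=1}^\infty X^{-1}((-\infty, n]) \in \sigma(\mathcal{P}_X)$;
• if $B \in \mathcal{M}$, then $X^{-1}(\mathbb{R} \setminus B) = \Omega \setminus X^{-1}(B) \in \sigma(\mathcal{P}_X)$, so $\mathbb{R} \setminus B \in \mathcal{M}$;
• if $B_1, B_2, \ldots \in \mathcal{M}$, we have $X^{-1}\left(\bigcup_{n=1}^\infty B_n\right) = \bigcup_{n=1}^\infty X^{-1}(B_n) \in \sigma(\mathcal{P}_X)$, so $\bigcup_{n=1}^\infty B_n \in \mathcal{M}$.
Since $\mathcal{M}$ contains the intervals $(-\infty, x]$, $x \in \mathbb{R}$, and these intervals generate $\mathcal{B}$, we get that $\mathcal{M} = \mathcal{B}$. This implies $\sigma(X) \subset \sigma(\mathcal{P}_X)$, and the lemma follows.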

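Lemma 2. Let $\mathcal{P}_1$ and $\mathcal{P}_2$ be $\pi$-systems such that for all $A \in \mathcal{P}_1$ and $B \in \mathcal{P}_2$ the events $A$ and $B$ are independent. Then $\sigma(\mathcal{P}_1)$ and $\sigma(\mathcal{P}_2)$ are independent.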
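Proof. First fix $A \in \mathcal{P}_1$, and consider the collection
$$\mathcal{L}^A := \{B \in \sigma(\mathcal{P}_2) : \mathbb{P}[A \cap B] = \mathbb{P}[A]\,\mathbb{P}[B]\}.$$
We have $\mathcal{P}_2 \subseteq \mathcal{L}^A$ by assumption. We show that $\mathcal{L}^A$ is a $\lambda$-system. Indeed:
• since $\mathbb{P}[A \cap \Omega] = \mathbb{P}[A] = \mathbb{P}[A]\,\mathbb{P}[\Omega]$, the event $\Omega$ is independent of $A$, so $\Omega \in \mathcal{L}^A$;
• if $B \subseteq C$ and $B, C \in \mathcal{L}^A$, we have
$$\mathbb{P}[A \cap (C \setminus B)] = \mathbb{P}[A \cap C] - \mathbb{P}[A \cap B] = \mathbb{P}[A]\,\mathbb{P}[C] - \mathbb{P}[A]\,\mathbb{P}[B] = \mathbb{P}[A]\,\mathbb{P}[C \setminus B],$$
so $C \setminus B \in \mathcal{L}^A$;
• if $B_1 \subseteq B_2 \subseteq \cdots$ and $B_1, B_2, \dots \in \mathcal{L}^A$, we have, by continuity of $\mathbb{P}$ from below,
$$\mathbb{P}\left[A \cap \bigcup_{n=1}^\infty B_n\right] = \lim_{n \to \infty} \mathbb{P}[A \cap B_n] = \lim_{n \to \infty} \mathbb{P}[A]\,\mathbb{P}[B_n] = \mathbb{P}[A]\,\mathbb{P}\left[\bigcup_{n=1}^\infty B_n\right],$$
so $\bigcup_{n=1}^\infty B_n \in \mathcal{L}^A$.
An application of the $\pi$-$\lambda$ theorem yields $\sigma(\mathcal{P}_2) \subseteq \mathcal{L}^A \subseteq \sigma(\mathcal{P}_2)$, so $\mathcal{L}^A = \sigma(\mathcal{P}_2)$. Now fix $B \in \sigma(\mathcal{P}_2)$, and put
$$\widetilde{\mathcal{L}}^B := \{A \in \sigma(\mathcal{P}_1) : \mathbb{P}[A \cap B] = \mathbb{P}[A]\,\mathbb{P}[B]\}.$$
We have $\mathcal{P}_1 \subseteq \widetilde{\mathcal{L}}^B$, due to the previous paragraph. By the same arguments as there, $\widetilde{\mathcal{L}}^B$ is a $\lambda$-system. By the $\pi$-$\lambda$ theorem, we get $\widetilde{\mathcal{L}}^B = \sigma(\mathcal{P}_1)$. Since $B \in \sigma(\mathcal{P}_2)$ was arbitrary, this is exactly the claim of the lemma.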
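Lemma 2 can be checked by brute force on a finite sample space, where $\sigma(\mathcal{P})$ is obtained by repeatedly closing $\mathcal{P} \cup \{\emptyset, \Omega\}$ under complements and pairwise unions. The following Python sketch (an illustration with hypothetical helper names and exact rational probabilities, assuming a finite $\Omega$) does this for two independent coins.

```python
from fractions import Fraction
from itertools import product


def sigma_closure(omega, generators):
    """Smallest family containing the generators, the empty set and omega, closed under
    complements and pairwise unions; on a finite omega this is sigma(generators)."""
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - A for A in F} | {A | B for A in F for B in F}
        if new <= F:
            return F
        F |= new


def prob(A, weights):
    """P[A] on a finite space with point masses weights[w]."""
    return sum((weights[w] for w in A), Fraction(0))


def independent(F1, F2, weights):
    """True if P[A & B] == P[A] P[B] for all A in F1 and B in F2."""
    return all(prob(A & B, weights) == prob(A, weights) * prob(B, weights)
               for A in F1 for B in F2)


# Two independent coins with biases 1/3 and 1/4 on Omega = {0,1}^2.
weights = {(a, b): (Fraction(1, 3) if a else Fraction(2, 3)) * (Fraction(1, 4) if b else Fraction(3, 4))
           for a, b in product((0, 1), repeat=2)}
P1 = [{w for w in weights if w[0] == 1}]  # a one-element pi-system
P2 = [{w for w in weights if w[1] == 1}]
F1, F2 = sigma_closure(weights, P1), sigma_closure(weights, P2)
```

The same helpers also show why the $\pi$-system hypothesis matters. On $\Omega = \{1, 2, 3, 4\}$ with the uniform measure, $\{1, 2\}$ and $\{1, 3\}$ are each independent of $\{1, 4\}$, but their intersection $\{1\}$ is not, so $\sigma(\{1,2\}, \{1,3\})$ and $\sigma(\{1,4\})$ are not independent.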

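Lemma 3. The RVs $X$ and $Y$ are independent if and only if for all $x, y \in \mathbb{R}$ we have
$$\mathbb{P}[X \le x, Y \le y] = \mathbb{P}[X \le x]\,\mathbb{P}[Y \le y].$$
Proof. The "only if" direction holds because $\{X \le x\} \in \sigma(X)$ and $\{Y \le y\} \in \sigma(Y)$. For the "if" direction, apply Lemma 2 to the $\pi$-systems $\mathcal{P}_X$ and $\mathcal{P}_Y$ defined in Lemma 1, which generate $\sigma(X)$ and $\sigma(Y)$, respectively.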
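For a pair $(X, Y)$ taking finitely many values, the criterion of Lemma 3 can be tested exactly. Both sides are step functions of $(x, y)$ that can only change at values taken by $X$ or $Y$: below the smallest such value both sides vanish, and above the largest they are constant. So it is enough to check the identity for $x$ and $y$ in the supports. A hedged Python sketch, assuming the joint law is given as a dictionary of point masses (the function name is hypothetical):

```python
from fractions import Fraction


def cdf_factorises(pmf):
    """pmf maps (x, y) to P[X = x, Y = y] for a finitely supported pair (X, Y).

    Returns True if P[X <= x, Y <= y] == P[X <= x] P[Y <= y] for all real x, y,
    testing only x, y in the supports (which suffices for step functions).
    """
    xs = sorted({x for x, _ in pmf})
    ys = sorted({y for _, y in pmf})

    def F(x, y):  # joint distribution function
        return sum((p for (a, b), p in pmf.items() if a <= x and b <= y), Fraction(0))

    def FX(x):  # distribution function of X
        return sum((p for (a, _), p in pmf.items() if a <= x), Fraction(0))

    def FY(y):  # distribution function of Y
        return sum((p for (_, b), p in pmf.items() if b <= y), Fraction(0))

    return all(F(x, y) == FX(x) * FY(y) for x in xs for y in ys)
```

For two fair coins with joint law $1/4$ on each of the four outcomes the check passes, while for $Y = X$ it already fails at $x = y = 0$, where the left side is $1/2$ and the right side is $1/4$.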

8.3 Conditional expectation

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space.

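Theorem. Let $X : \Omega \to \mathbb{R}$ be a RV such that $X \in L^1(\mathbb{P})$. Suppose that $\mathcal{G} \subseteq \mathcal{F}$ is a $\sigma$-algebra. There exists a RV $Y : \Omega \to \mathbb{R}$ such that:
(i) $Y$ is measurable with respect to $\mathcal{G}$;
(ii) $Y \in L^1(\mathbb{P})$;
(iii) for every $B \in \mathcal{G}$ we have
$$\int_B Y \, d\mathbb{P} = \int_B X \, d\mathbb{P}.$$

Moreover, $Y$ is unique up to a.e. [$\mathbb{P}$] equivalence.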

Definition 1. The RV Y constructed in the Theorem is called the conditional expectation of X given G, and is denoted Y =: E[X | G].
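Proof of Theorem. We prove the statement when $X \ge 0$. The general case follows by considering the positive and negative parts of $X$ and using linearity of the integral. Observe that $(\Omega, \mathcal{G})$ is a measurable space itself. Define the measure $Q : \mathcal{G} \to [0, \infty)$ by the formula
$$Q(B) := \int_B X \, d\mathbb{P}, \qquad B \in \mathcal{G}.$$
We have $Q \ll \mathbb{P}|_{\mathcal{G}}$, where $\mathbb{P}|_{\mathcal{G}}$ denotes the restriction of $\mathbb{P}$ to the $\sigma$-algebra $\mathcal{G}$. Indeed, if $B \in \mathcal{G}$ and $\mathbb{P}[B] = 0$, we have $Q(B) = \int_B X \, d\mathbb{P} = 0$. Also, $Q(\Omega) = \int_\Omega X \, d\mathbb{P} < \infty$, since $X \in L^1(\mathbb{P})$. The Radon–Nikodym Theorem implies that there exists a function $Y : \Omega \to \mathbb{R}$, measurable on the space $(\Omega, \mathcal{G})$, such that $Y \in L^1(\mathbb{P}|_{\mathcal{G}})$ and $Q(B) = \int_B Y \, d\mathbb{P}$ for all $B \in \mathcal{G}$. This proves statements (i)–(iii) of the theorem. If $\tilde{Y}$ is another RV satisfying (i)–(iii), then we have $\int_B Y \, d\mathbb{P} = \int_B \tilde{Y} \, d\mathbb{P}$ for all $B \in \mathcal{G}$, and hence $Y = \tilde{Y}$ a.e. [$\mathbb{P}|_{\mathcal{G}}$].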
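When $\Omega$ is finite and $\mathcal{G}$ is generated by a partition of $\Omega$ into blocks of positive probability, the Radon–Nikodym derivative in the proof can be written down explicitly: on each block $C$, $E[X \mid \mathcal{G}]$ is the constant $\frac{1}{\mathbb{P}[C]} \int_C X \, d\mathbb{P}$. The Python sketch below (an illustration under these assumptions; the names are hypothetical) computes this and checks property (iii) for every $B \in \mathcal{G}$, that is, for every union of blocks.

```python
from fractions import Fraction
from itertools import combinations


def cond_exp(X, weights, partition):
    """E[X | G] where G is generated by a finite partition of a finite Omega.

    Assumes every block has positive probability; on a block C the value is
    (1 / P[C]) * sum_{w in C} X(w) P({w}).
    """
    Y = {}
    for block in partition:
        mass = sum(weights[w] for w in block)
        value = sum(X[w] * weights[w] for w in block) / mass
        for w in block:
            Y[w] = value
    return Y


def satisfies_iii(X, Y, weights, partition):
    """Check int_B Y dP == int_B X dP for every B in G, i.e. every union of blocks."""
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            B = [w for block in blocks for w in block]
            if sum(Y[w] * weights[w] for w in B) != sum(X[w] * weights[w] for w in B):
                return False
    return True


# A fair die, X(w) = w^2, and G generated by the event "the outcome is odd".
omega = range(1, 7)
weights = {w: Fraction(1, 6) for w in omega}
X = {w: w * w for w in omega}
partition = [[1, 3, 5], [2, 4, 6]]
Y = cond_exp(X, weights, partition)
```

Here $E[X \mid \mathcal{G}]$ equals $35/3$ on the odd outcomes and $56/3$ on the even ones, whereas the constant $E[X] = 91/6$ is $\mathcal{G}$-measurable but violates (iii) on the odd block.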
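It can be shown that if $X \in L^p(\mathbb{P})$ for some $p \ge 1$, then $E[X \mid \mathcal{G}] \in L^p(\mathbb{P})$ and $\|E[X \mid \mathcal{G}]\|_p \le \|X\|_p$, so conditional expectation is a contraction in each $L^p$-norm.
The case $p = 2$ is especially important. In this case, $E[X \mid \mathcal{G}]$ is the best $\mathcal{G}$-measurable predictor of $X$ with respect to the $L^2$ distance, in the sense that it minimises $\|X - Z\|_2$ over all $Z \in L^2(\mathbb{P})$ measurable with respect to $\mathcal{G}$. We have the orthogonality property
$$\int Z \,(X - E[X \mid \mathcal{G}]) \, d\mathbb{P} = 0$$
for all RVs $Z \in L^\infty(\mathbb{P}|_{\mathcal{G}})$. Notice that when $Z = 1_B$, $B \in \mathcal{G}$, this is just (iii), from which the general case can be derived. The conditional variance formula in this language is simply Pythagoras' Theorem with respect to the $L^2$-norm: if $X \in L^2(\mathbb{P})$ and $E[X] = 0$, we have
$$\operatorname{Var}(X) = \|X\|_2^2 = \|E[X \mid \mathcal{G}]\|_2^2 + \|X - E[X \mid \mathcal{G}]\|_2^2 = \operatorname{Var}(E[X \mid \mathcal{G}]) + E[\operatorname{Var}(X \mid \mathcal{G})],$$
where $\operatorname{Var}(X \mid \mathcal{G}) := E[(X - E[X \mid \mathcal{G}])^2 \mid \mathcal{G}]$.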
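The $L^2$ picture can be verified numerically in the same finite setting (a finite $\Omega$, and $\mathcal{G}$ generated by a partition into blocks of positive probability; both are assumptions of this sketch). The Python code below, with hypothetical helper names, sets up a centred variable so that the orthogonality property, the Pythagoras identity and the best-predictor property can be checked with exact arithmetic.

```python
from fractions import Fraction


def cond_exp(X, weights, partition):
    """E[X | G] for G generated by a finite partition whose blocks have positive probability."""
    Y = {}
    for block in partition:
        mass = sum(weights[w] for w in block)
        value = sum(X[w] * weights[w] for w in block) / mass
        Y.update({w: value for w in block})
    return Y


def inner(U, V, weights):
    """The L^2(P) inner product int U V dP on a finite sample space."""
    return sum(U[w] * V[w] * weights[w] for w in weights)


def diff(U, V):
    """Pointwise difference U - V."""
    return {w: U[w] - V[w] for w in U}


# Fair die, centred X(w) = w - 7/2, G generated by the event {w <= 3}.
omega = range(1, 7)
weights = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) - Fraction(7, 2) for w in omega}
partition = [[1, 2, 3], [4, 5, 6]]
Y = cond_exp(X, weights, partition)
R = diff(X, Y)  # the residual X - E[X | G]
```

In this example $E[X \mid \mathcal{G}] = -3/2$ on $\{1, 2, 3\}$ and $3/2$ on $\{4, 5, 6\}$, and the identity reads $35/12 = 9/4 + 2/3$.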

Appendix

The purpose of this appendix is to prove the following theorem.

Theorem (Heine–Borel Theorem). Let $K \subseteq \mathbb{R}^d$. The following two statements are equivalent.
(i) $K$ is closed and bounded.
(ii) Every open cover of $K$ admits a finite subcover.
Recall that a sequence $x_1, x_2, \dots \in \mathbb{R}^d$ is called a Cauchy sequence if for any $\varepsilon > 0$ there exists an $N$ such that for all $n, m \ge N$ we have $|x_n - x_m| \le \varepsilon$. We assume as known the following fact.
If $x_1, x_2, \dots$ is a Cauchy sequence in $\mathbb{R}^d$, then it converges, that is, there exists $x \in \mathbb{R}^d$ such that $\lim_{n \to \infty} |x - x_n| = 0$. (44)

Proof of Theorem. (Non-examinable)

(i) ⇒ (ii): Let $\{U_\alpha\}_{\alpha \in I}$ be an open cover of $K$, that is, the sets $U_\alpha$ are all open, $I$ is an arbitrary index set, and $K \subseteq \bigcup_{\alpha \in I} U_\alpha$. The proof is by contradiction: we assume that no finite subcover of $K$ exists, and show that this leads to a contradiction.
Since $K$ is bounded, there exists an integer $n \ge 0$ such that $K$ is contained in the cube centred at $0$ of radius $2^n$ (that is, of side length $2^{n+1}$):
$$K \subseteq [-2^n, 2^n]^d = [-2^n, 2^n] \times \cdots \times [-2^n, 2^n].$$
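Write $[-2^n, 2^n]^d$ as a union of unit cubes:
$$[-2^n, 2^n]^d = \bigcup_{-2^n \le k_1, \dots, k_d < 2^n} [k_1, k_1 + 1] \times \cdots \times [k_d, k_d + 1].$$
Then we have
$$K = K \cap [-2^n, 2^n]^d = \bigcup_{-2^n \le k_1, \dots, k_d < 2^n} K \cap \big([k_1, k_1 + 1] \times \cdots \times [k_d, k_d + 1]\big). \tag{45}$$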

If each of the finitely many sets on the right-hand side of (45) could be covered by finitely many of the $U_\alpha$'s, then putting these covers together we would get a finite subcover of $K$. Hence there must exist indices $k_1, \dots, k_d$ such that $K \cap ([k_1, k_1 + 1] \times \cdots \times [k_d, k_d + 1])$ cannot be covered by finitely many of the $U_\alpha$'s. Put $Q_0 = [k_1, k_1 + 1] \times \cdots \times [k_d, k_d + 1]$.
We continue by repeating the above argument, subdividing the cube $Q_0$ further and further. In order to express this neatly, write $k_1^{(0)} = k_1, \dots, k_d^{(0)} = k_d$.
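Subdivide $Q_0 = [k_1^{(0)}, k_1^{(0)} + 1] \times \cdots \times [k_d^{(0)}, k_d^{(0)} + 1]$ into $2^d$ smaller cubes of side length $1/2$:
$$Q_0 = \bigcup_{\substack{k_1^{(1)}, \dots, k_d^{(1)} :\\ 2k_i^{(0)} \le k_i^{(1)} < 2k_i^{(0)} + 2,\ i = 1, \dots, d}} \left[\frac{k_1^{(1)}}{2}, \frac{k_1^{(1)} + 1}{2}\right] \times \cdots \times \left[\frac{k_d^{(1)}}{2}, \frac{k_d^{(1)} + 1}{2}\right].$$
Hence we have
$$K \cap Q_0 = \bigcup_{\substack{k_1^{(1)}, \dots, k_d^{(1)} :\\ 2k_i^{(0)} \le k_i^{(1)} < 2k_i^{(0)} + 2,\ i = 1, \dots, d}} K \cap \left(\left[\frac{k_1^{(1)}}{2}, \frac{k_1^{(1)} + 1}{2}\right] \times \cdots \times \left[\frac{k_d^{(1)}}{2}, \frac{k_d^{(1)} + 1}{2}\right]\right). \tag{46}$$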

If all the finitely many sets on the right-hand side of (46) could be covered with finitely many $U_\alpha$'s, we would get that $K \cap Q_0$ can be covered with finitely many $U_\alpha$'s, which was not the case. Hence there exists a cube
$$Q_1 = \left[\frac{k_1^{(1)}}{2}, \frac{k_1^{(1)} + 1}{2}\right] \times \cdots \times \left[\frac{k_d^{(1)}}{2}, \frac{k_d^{(1)} + 1}{2}\right]$$
such that $K \cap Q_1$ cannot be covered with finitely many $U_\alpha$'s.

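Suppose that we have inductively selected a descending sequence of cubes $Q_0 \supseteq Q_1 \supseteq \cdots \supseteq Q_\ell$ such that $K \cap Q_j$ cannot be covered with finitely many $U_\alpha$'s for $j = 0, \dots, \ell$. Subdivide $Q_\ell$ as
$$Q_\ell = \left[\frac{k_1^{(\ell)}}{2^\ell}, \frac{k_1^{(\ell)} + 1}{2^\ell}\right] \times \cdots \times \left[\frac{k_d^{(\ell)}}{2^\ell}, \frac{k_d^{(\ell)} + 1}{2^\ell}\right] = \bigcup_{\substack{k_1^{(\ell+1)}, \dots, k_d^{(\ell+1)} :\\ 2k_i^{(\ell)} \le k_i^{(\ell+1)} < 2k_i^{(\ell)} + 2,\ i = 1, \dots, d}} \left[\frac{k_1^{(\ell+1)}}{2^{\ell+1}}, \frac{k_1^{(\ell+1)} + 1}{2^{\ell+1}}\right] \times \cdots \times \left[\frac{k_d^{(\ell+1)}}{2^{\ell+1}}, \frac{k_d^{(\ell+1)} + 1}{2^{\ell+1}}\right].$$
Hence we have
$$K \cap Q_\ell = \bigcup_{\substack{k_1^{(\ell+1)}, \dots, k_d^{(\ell+1)} :\\ 2k_i^{(\ell)} \le k_i^{(\ell+1)} < 2k_i^{(\ell)} + 2,\ i = 1, \dots, d}} K \cap \left(\left[\frac{k_1^{(\ell+1)}}{2^{\ell+1}}, \frac{k_1^{(\ell+1)} + 1}{2^{\ell+1}}\right] \times \cdots \times \left[\frac{k_d^{(\ell+1)}}{2^{\ell+1}}, \frac{k_d^{(\ell+1)} + 1}{2^{\ell+1}}\right]\right). \tag{47}$$
And we deduce as before that there exists a cube
$$Q_{\ell+1} = \left[\frac{k_1^{(\ell+1)}}{2^{\ell+1}}, \frac{k_1^{(\ell+1)} + 1}{2^{\ell+1}}\right] \times \cdots \times \left[\frac{k_d^{(\ell+1)}}{2^{\ell+1}}, \frac{k_d^{(\ell+1)} + 1}{2^{\ell+1}}\right]$$
such that $K \cap Q_{\ell+1}$ cannot be covered by finitely many of the $U_\alpha$'s.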
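For each $\ell = 0, 1, 2, \dots$, select a point $x_\ell$ from $K \cap Q_\ell$ (this set is nonempty, since the empty set is trivially covered by finitely many $U_\alpha$'s). For $m, n \ge \ell$ we have $x_m, x_n \in Q_\ell$, hence
$$|x_n - x_m| \le \operatorname{diam}(Q_\ell) = \sqrt{d}\, 2^{-\ell},$$
and this shows that $x_0, x_1, x_2, \dots$ is a Cauchy sequence. By (44), there exists $x \in \mathbb{R}^d$ such that $x_n \to x$ as $n \to \infty$.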

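Since $K$ is closed, $K \cap Q_\ell$ is closed for all $\ell = 0, 1, 2, \dots$. As $x_n \in K \cap Q_\ell$ for all $n \ge \ell$, we have $x \in K \cap Q_\ell$ for all $\ell = 0, 1, 2, \dots$. In particular, $x \in K$. Let $\alpha_0 \in I$ be such that $x \in U_{\alpha_0}$. Since $U_{\alpha_0}$ is open, $x \in Q_\ell$ for all $\ell$, and $\operatorname{diam}(Q_\ell) \to 0$, there exists an index $\ell_0$ such that $Q_{\ell_0} \subseteq U_{\alpha_0}$. This shows that $K \cap Q_{\ell_0}$ is covered by the single open set $U_{\alpha_0}$, which contradicts the choice of $Q_{\ell_0}$.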
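The bookkeeping with the indices $k_i^{(\ell)}$ in the subdivision step of the proof can be mirrored in a few lines of Python. The sketch below (hypothetical names, exact rational coordinates) lists the $2^d$ children of a dyadic cube and follows a nested sequence $Q_0 \supseteq Q_1 \supseteq \cdots$. The rule used to pick $Q_{\ell+1}$, namely the first child containing a fixed point, is only a stand-in: the proof picks a child $Q_{\ell+1}$ for which $K \cap Q_{\ell+1}$ has no finite subcover, which a program cannot decide in general.

```python
import math
from fractions import Fraction
from itertools import product


def cube(k, level):
    """The closed dyadic cube prod_i [k_i / 2^level, (k_i + 1) / 2^level] as a list of intervals."""
    side = Fraction(1, 2 ** level)
    return [(ki * side, (ki + 1) * side) for ki in k]


def children(k):
    """Indices k' of the 2^d subcubes at the next level: 2 k_i <= k'_i < 2 k_i + 2 for each i."""
    return list(product(*[(2 * ki, 2 * ki + 1) for ki in k]))


def diameter(k, level):
    """Diameter of a cube of side 2^-level in R^d, namely sqrt(d) * 2^-level."""
    return math.sqrt(len(k)) / 2 ** level


def contains(intervals, x):
    """True if the point x lies in the closed box given by the intervals."""
    return all(lo <= xi <= hi for (lo, hi), xi in zip(intervals, x))


def nested_cubes(k0, x, steps):
    """Indices of Q_0, Q_1, ..., Q_steps, where Q_{l+1} is the first child of Q_l containing x.

    Hypothetical stand-in selection rule; assumes x lies in the unit cube Q_0 with index k0.
    """
    k = tuple(k0)
    indices = [k]
    for level in range(1, steps + 1):
        k = next(c for c in children(k) if contains(cube(c, level), x))
        indices.append(k)
    return indices
```

Any points chosen from these cubes, for instance their lower-left corners, form a Cauchy sequence because the diameters $\sqrt{d}\,2^{-\ell}$ tend to $0$, which is the mechanism the proof uses.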
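(ii) ⇒ (i): The union of the open balls $B(0, n)$, $n = 1, 2, 3, \dots$ covers $K$, and hence, by (ii), finitely many of these balls cover $K$. Since the balls increase with $n$, there exists $n_0$ such that $K \subseteq B(0, n_0)$. This shows that $K$ is bounded.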
To show that $K$ is closed, let $y \in K^c$ be any point. For each point $x \in K$ pick a radius $r_x > 0$ with the property that $B(x, r_x) \cap B(y, r_x) = \emptyset$. Such a radius exists, since $x \neq y$ (for example, $r_x = |x - y|/2$). Then $K$ has the open cover
$$K \subseteq \bigcup_{x \in K} B(x, r_x).$$

By the assumed (ii), there is a finite subcover, so that there exist $x_1, \dots, x_N \in K$ such that
$$K \subseteq \bigcup_{i=1}^N B(x_i, r_{x_i}).$$

Put $r = \min_{1 \le i \le N} r_{x_i}$. Then $B(y, r) \subseteq B(y, r_{x_i})$ is disjoint from each $B(x_i, r_{x_i})$, and hence it is disjoint from $K$. It follows that $B(y, r) \subseteq K^c$. Since $y \in K^c$ was arbitrary, it follows that $K^c$ is open, and hence $K$ is closed.
We have thus shown that K is closed and bounded.

References

Rick Durrett, Probability: theory and examples, 4th ed., Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, 2010. MR2722836 (2011e:60001)

Herbert Federer, Geometric measure theory, Die Grundlehren der mathematischen Wissenschaften, Band 153, Springer-Verlag New York Inc., New York, 1969. MR0257325 (41 #1976)

A. N. Kolmogorov, Foundations of the theory of probability, Chelsea Publishing Co., New York, 1956. Translation edited by Nathan Morrison, with an added bibliography by A. T. Bharucha-Reid. MR0079843 (18,155e)

Walter Rudin, Real and complex analysis, 3rd ed., McGraw-Hill Book Co., New York, 1987. MR924157 (88k:00002)

David Williams, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991. MR1155402 (93d:60002)