Real Analysis II Functions Spaces

Real Analysis II
John Loftin
May 4, 2013
1 Spaces of functions
1.1 Banach spaces
Many natural spaces of functions form infinite-dimensional vector spaces.
Examples are the space of polynomials and the space of smooth functions.
If we are interested in solving differential equations, then, it is important to
understand analysis in infinite-dimensional vector spaces (over R or C).
First of all, we should recognize the following straightforward fact about
finite-dimensional vector spaces:
Homework Problem 1. Let $x = (x^1, \dots, x^m)$ denote a point in $\mathbb{R}^m$, and let $\{x_n\} = \{(x_n^1, \dots, x_n^m)\}$ be a sequence of points in $\mathbb{R}^m$. Then $x_n \to x$ if and only if $x_n^i \to x^i$ for all $i = 1, \dots, m$.
(Recall the standard metric on $\mathbb{R}^m$ is given by $|x - y|$, where the norm $|\cdot|$ is given by $|x| = \sqrt{(x^1)^2 + \cdots + (x^m)^2}$.)
Thus for taking limits in $\mathbb{R}^m$, we could even dispense with the metric on $\mathbb{R}^m$, and simply define $x_n \to x$ by $x_n^i \to x^i$ for each $i = 1, \dots, m$. This reflects the fact that there is only one natural topology on a finite-dimensional vector space: that given by the standard norm.
For infinite-dimensional vector spaces, say with a countable basis, so that $x = (x^1, x^2, \dots)$, it is possible to define a topology by $x_n \to x$ if and only if each $x_n^i \to x^i$. It turns out that this is not usually the most useful way to define limits in infinite-dimensional spaces, however (though a related construction is used in defining the topology of Fréchet spaces).
(Footnote: partially supported by NSF Grant DMS0405873.)
Finite-dimensional vector spaces are also all complete with respect to their standard norm (in other words, they are all Banach spaces). Given a norm on an infinite-dimensional vector space, however, completeness must be proved. There are many examples of Banach function spaces: on a measure space, the $L^p$ spaces of functions are all Banach spaces for $1 \le p \le \infty$. Also, on a metric space $X$, the space of all bounded continuous functions $C^0(X)$ is a Banach space under the norm
$$\|f\|_{C^0(X)} = \sup_{x \in X} |f(x)|.$$
The $L^p$ and $C^0$ spaces form the basis of most other useful Banach spaces, with extensions typically provided by measuring not just the functions themselves, but also their partial derivatives (as in Sobolev and $C^k$ spaces) or their difference quotients (Hölder spaces).
Completeness of a metric space of course means that any Cauchy sequence
has a unique limit. More roughly, this means that any sequence that should
converge, in that its elements are becoming infinitesimally close to each other,
will converge to a limit in the space. As we will see, taking such limits is
a powerful way to construct solutions to analytic problems. Unfortunately,
many of the most familiar spaces of functions (such as smooth functions) do
not have the structure of a Banach space, and so it is difficult to ensure that
a given limit of smooth functions is smooth. In fact we have the following
theorem, which we state without proof:
Theorem 1. On $\mathbb{R}^n$ equipped with Lebesgue measure, the space $C_0^\infty(\mathbb{R}^n)$ of smooth functions with compact support is dense in $L^p(\mathbb{R}^n)$ for all $1 \le p < \infty$. In other words, the completion of the space of smooth functions with compact support on $\mathbb{R}^n$ with respect to the $L^p$ norm is simply the space of all $L^p$ functions for $1 \le p < \infty$.
If we are working in $L^2$, for example, it is possible for the limit of smooth functions to be quite non-smooth: there are many $L^2$ functions which are discontinuous everywhere. This poses a potential problem if the limit we have produced is supposed to be a solution to a differential equation. In particular, such a limit may be nowhere differentiable. Some of our goals then are to understand (1) how to make sense of taking derivatives of functions which are not classically differentiable (the theory of distributions and weak derivatives), and (2) how to show that a limit function actually has enough derivatives to solve the equation (bootstrapping).
Theorem 1 reminds us that the $L^p$ Banach spaces have a very large overlap, which of course includes many more functions than the smooth functions with compact support. In particular, it is often useful to take the point of view that these Banach function spaces are not so much different spaces but different tools to study either the space of all functions or (via the completion process) the space of only very nice functions (e.g., smooth functions of compact support).
In particular, two function spaces which are very closely related to each other are $L^\infty$ and $C^0$. As we will see below, they have essentially the same norm. First of all, we show that $C^0(X)$ is a Banach space for any metric space $X$.
1.2 The Banach space $C^0$

Given a metric space $X$, define
$$C^0(X) = \{f : X \to \mathbb{R} : f \text{ is continuous and } \sup_X |f| < \infty\}.$$
Define the norm
$$\|f\|_{C^0(X)} = \sup_X |f|.$$
It is straightforward to verify that $\|\cdot\|_{C^0}$ satisfies the requirements for a norm:
$$\|f\|_{C^0} = 0 \iff f \equiv 0,$$
$$\|\lambda f\|_{C^0} = |\lambda| \, \|f\|_{C^0},$$
$$\|f + g\|_{C^0} \le \|f\|_{C^0} + \|g\|_{C^0}.$$
Remark. If $f_i \to f$ in $C^0(X)$, then we say $f_i \to f$ uniformly on $X$, and $C^0(X)$ convergence is the same as uniform convergence.
The main thing to check is that the norm gives $C^0(X)$ the structure of a complete metric space:
Proposition 1. For any metric space $X$, $C^0(X)$ is a Banach space with norm $\|\cdot\|_{C^0}$.
Proof. We simply need to check the metric induced on $C^0(X)$ is complete. Let $d$ denote the metric on $X$, and consider a Cauchy sequence $\{f_i\} \subset C^0(X)$. In other words, for all $\epsilon > 0$, there is an $N$ so that $n, m > N$ implies $\|f_n - f_m\|_{C^0} < \epsilon$. By the definition of the norm, this implies $|f_n(x) - f_m(x)| < \epsilon$ for all $x \in X$. Now for each $x \in X$, $\{f_i(x)\} \subset \mathbb{R}$ is a Cauchy sequence, and since $\mathbb{R}$ is complete, there is a limit $f(x) = \lim_{i \to \infty} f_i(x)$.
Now we have produced a limit function $f$. We need to show that $\|f_i - f\|_{C^0} \to 0$ and $f \in C^0(X)$. The first statement is straightforward: For all $\epsilon > 0$, there is an $N$ so that for all $n, m > N$, for all $x \in X$,
$$|f_n(x) - f_m(x)| < \epsilon.$$
Now let $m \to \infty$ to see that
$$|f_n(x) - f(x)| \le \epsilon.$$
So we have that for all $\epsilon > 0$, there is an $N$ so that for all $n > N$, and for all $x \in X$,
$$|f_n(x) - f(x)| \le \epsilon.$$
Since this is true for all $x \in X$, we have
$$\|f_n - f\|_{C^0} = \sup_{x \in X} |f_n(x) - f(x)| \le \epsilon,$$
and so $\|f_i - f\|_{C^0} \to 0$. In particular $f$ is bounded, since $\sup_X |f| \le \sup_X |f_n| + \epsilon$.
We still need to prove that the limit function $f$ is continuous. So let $x \in X$ and choose $\epsilon > 0$. Then there is an $N$ so that for $n > N$, $\|f_n - f\|_{C^0} < \epsilon$. By the previous paragraph and the definition of $\|\cdot\|_{C^0}$,
$$|f_n(x) - f(x)| < \epsilon \quad \text{and} \quad |f_n(y) - f(y)| < \epsilon \quad \text{for all } y \in X.$$
Choose a particular $n > N$. Since $f_n$ is continuous at $x$, there is a $\delta > 0$ so that $|f_n(x) - f_n(y)| < \epsilon$ for $y$ so that $d(x, y) < \delta$. Then for such $y$ in a $\delta$-ball around $x$,
$$|f(x) - f(y)| = |[f(x) - f_n(x)] + [f_n(x) - f_n(y)] + [f_n(y) - f(y)]|$$
$$\le |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)|$$
$$< \epsilon + \epsilon + \epsilon = 3\epsilon.$$
So we have proved that for all $\epsilon > 0$, $x \in X$, there is a $\delta > 0$ so that $d(x, y) < \delta \implies |f(x) - f(y)| < 3\epsilon$. This proves $f$ is continuous.
The last bit of the proof can be remembered as this: Any uniform limit
of continuous functions is continuous.
Remark. The previous proposition works as well for functions whose range is the complex numbers $\mathbb{C}$, or a vector space $\mathbb{R}^n$, or in fact any Banach space $B$. The proof is the same. In this last case, we could refer to the Banach space $C^0(X; B)$ as the Banach space of continuous functions from $X$ into $B$.
Consider an open set $\Omega \subset \mathbb{R}^n$. On $\Omega$, the $C^0$ norm is essentially the same as the $L^\infty$ norm, but is simpler to define because we can consider functions as elements of $C^0$, while we need equivalence classes of functions to define $L^\infty$. In fact, more is true. Let $\Omega$ inherit the standard metric and Lebesgue measure from $\mathbb{R}^n$. For a measurable function $f : \Omega \to \mathbb{R}$, let $[f]$ be the equivalence class whose members are all functions from $\Omega \to \mathbb{R}$ which agree with $f$ almost everywhere.
Proposition 2. The map $\iota : C^0(\Omega) \to L^\infty(\Omega)$ given by $\iota(f) = [f]$ is one-to-one and preserves the norm.
Proof. First of all, note that it follows immediately from the definitions that for $f \in C^0(\Omega)$, $\iota(f) \in L^\infty(\Omega)$. Also, we should show that $\|f\|_{C^0} = \|\iota(f)\|_{L^\infty}$ to show $\iota$ preserves the norm.
The proof hinges on the simple fact that every full-measure subset $V$ of $\Omega$ is dense in $\Omega$. (Recall $V$ has full measure if $\Omega \setminus V$ has Lebesgue measure zero.) This fact may be proved as follows: let $V \subset \Omega$ have full measure. Then there is no open ball contained in $\Omega \setminus V$ (since open balls have positive measure). This shows $V$ is dense in $\Omega$. (Question: We need to use that $\Omega$ is an open subset of $\mathbb{R}^n$ in this paragraph. Where did we use that $\Omega$ is open?)
Now we prove the map $\iota$ is injective. So if $f$ and $g$ are in $C^0(\Omega)$, and $[f] = [g]$, then by definition, $f = g$ on a set $V$ of full measure. Let $x \in \Omega$. Since $V$ is dense, there is a sequence $x_n \to x$, $x_n \in V$. Then
$$f(x) = f(\lim_{n\to\infty} x_n) = \lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} g(x_n) = g(\lim_{n\to\infty} x_n) = g(x)$$
since $f$ and $g$ are continuous and $f(x_n) = g(x_n)$. So $f$ and $g$ coincide at each point of $\Omega$ and so $f = g$ in $C^0(\Omega)$.
Finally, we show that for $f \in C^0(\Omega)$, $\|f\|_{C^0} = \|f\|_{L^\infty}$. In particular, let $\mu$ denote Lebesgue measure and compute (recall we often write $\|f\|_{L^\infty}$ instead of the more correct $\|[f]\|_{L^\infty} = \|\iota(f)\|_{L^\infty}$)
$$\|f\|_{L^\infty(\Omega)} = \inf\{a : |f(x)| \le a \text{ for almost every } x \in \Omega\} = \inf\{a : \mu\{x : |f(x)| > a\} = 0\}.$$
But $\mu\{x : |f(x)| > a\} = 0$ implies that $\{x : |f(x)| > a\} = \emptyset$. (Proof: If the set is not empty it is an open subset of $\Omega$ since $|f|$ is continuous. The only open subset of $\Omega$ with measure zero is the empty set.) So now
$$\|f\|_{L^\infty(\Omega)} = \inf\{a : \mu\{x : |f(x)| > a\} = 0\}$$
$$= \inf\{a : \{|f(x)| > a\} = \emptyset\}$$
$$= \inf\{a : |f(x)| \le a \text{ for all } x \in \Omega\}$$
$$= \sup_{x \in \Omega} |f(x)| = \|f\|_{C^0(\Omega)}.$$
Remark. The previous Proposition is true for any measurable subset $\Omega$ of $\mathbb{R}^n$ with the following property: every nonempty open subset of $\Omega$ has positive measure.
Remark. The map $\iota$ from $C^0(\Omega)$ to $L^\infty(\Omega)$ is far from being onto. A typical discontinuous function $g$ cannot be changed on a set of measure zero to be continuous. The following homework problem is to show this is the case with the Heaviside function.
Homework Problem 2. Let $g(x)$ be the Heaviside function on $\mathbb{R}$. In other words, let $g(x) = 0$ if $x < 0$ and $g(x) = 1$ if $x \ge 0$.
(a) Show there is no function in $C^0(\mathbb{R})$ which is equal to $g$ almost everywhere.
(b) Show that there is no sequence of functions $f_n \in C^0(\mathbb{R})$ which satisfy $f_n \to g$ in $L^\infty(\mathbb{R})$.
Hint for (b): Show that if $f_n \to g$ in $L^\infty(\mathbb{R})$, then $\{f_n\}$ is a Cauchy sequence in $C^0(\mathbb{R})$. Then use Proposition 1 and show the resulting limit function $f \in C^0(\mathbb{R})$ must be equal to $g$ almost everywhere. (This amounts to showing that $\iota(C^0)$ is a closed subspace of $L^\infty$.) Provide a contradiction.
1.3 Quantifiers
It is worth taking the time to look in some detail at $C^0$ convergence, and to compare it to pointwise convergence. $C^0$ convergence is often called uniform convergence.
For a metric space $X$, $f_n \to f$ in $C^0(X)$ if for all $\epsilon > 0$, there is an $N$ so that
$$n > N \implies \|f_n - f\|_{C^0(X)} < \epsilon.$$
In other words, for all $\epsilon > 0$, there is an $N$ so that
$$n > N \implies \sup_{x \in X} |f_n(x) - f(x)| < \epsilon.$$
So then $f_n \to f$ in $C^0(X)$ implies that for all $\epsilon > 0$, there is an $N$ so that for all $x \in X$,
$$n > N \implies |f_n(x) - f(x)| < \epsilon.$$
A few easy manipulations imply in fact the following:
Lemma 3. Let $X$ be a metric space and let $f_n \in C^0(X)$. Then $f_n \to f$ in $C^0(X)$ if and only if for every $\epsilon > 0$, there is an $N = N(\epsilon)$ so that for all $x \in X$,
$$n > N \implies |f_n(x) - f(x)| < \epsilon.$$
Homework Problem 3. Prove Lemma 3.
Since $C^0(X)$ is a Banach space, we know that the limit function $f \in C^0(X)$ as well, and thus the uniform limit of continuous functions is continuous. $C^0$ convergence is called uniform convergence because the $N$ in Lemma 3 depends only on $\epsilon > 0$ and not on $x \in X$: thus $N$ is uniform over all $x \in X$.
We contrast this with pointwise convergence. If $f_n$ are functions on $X$, then $f_n \to f$ pointwise if for all $\epsilon > 0$ and $x \in X$, there is an $N = N(\epsilon, x)$ so that
$$n > N \implies |f_n(x) - f(x)| < \epsilon.$$
The difference between pointwise and uniform convergence is subtle but very important: in pointwise convergence $N = N(\epsilon, x)$ may depend on $\epsilon$ and $x$, while in uniform convergence $N = N(\epsilon)$ only depends on $\epsilon$ and is independent of $x$.
We have belabored this point because it is one of the major issues in analysis: keeping track of which constants, or quantifiers, depend on which other quantifiers. (It is even better to have explicit bounds (estimates) on the behavior of quantifiers with respect to each other.) Of course it is desirable (though not always possible) to have more uniform dependence of quantifiers, as we see in the following standard example:
We have seen that the uniform limit of continuous functions is continuous. On the other hand, a pointwise limit of continuous functions may not be:
Example 1. Consider $X = [0, 1]$ and $f_n(x) = x^n$. Then $f_n \to f$ pointwise on $[0, 1]$, where
$$f(x) = \begin{cases} 0 & \text{for } x \in [0, 1), \\ 1 & \text{for } x = 1. \end{cases}$$
So the pointwise limit $f$ is discontinuous, and thus we see that $f_n \not\to f$ uniformly.
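The failure of uniform convergence in Example 1 can also be seen numerically. The following is a minimal sketch, not part of the original notes; the helper names `f_n`, `f_limit`, `witness_point` and `gap_at_witness` are illustrative assumptions. For each $n$ it exhibits a point $x \in [0, 1)$ with $f_n(x) = 1/2$, so that $\sup_x |f_n(x) - f(x)| \ge 1/2$ for every $n$, even though $f_n(x) \to 0$ at each fixed $x < 1$.

```python
def f_n(x: float, n: int) -> float:
    """The n-th function of Example 1, f_n(x) = x**n on [0, 1]."""
    return x ** n


def f_limit(x: float) -> float:
    """The pointwise limit: 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


def witness_point(n: int) -> float:
    """A point x in [0, 1) chosen so that f_n(x) = 1/2 (up to rounding)."""
    return 0.5 ** (1.0 / n)


def gap_at_witness(n: int) -> float:
    """|f_n(x) - f(x)| at the witness point; a lower bound for the sup norm."""
    x = witness_point(n)
    return abs(f_n(x, n) - f_limit(x))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # The gap stays near 1/2: N cannot be chosen independently of x.
        print(n, gap_at_witness(n))
    # At a fixed point the values do tend to 0 (pointwise convergence).
    print(f_n(0.9, 500))
```

The witness point moves toward 1 as $n$ grows, which is exactly the dependence of $N$ on $x$ described above.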
1.4 Derivatives
The theory of derivatives in one variable is fairly straightforward: if a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $p$ (i.e., $f'(p)$ exists), then $f$ must be continuous at $p$. For functions of more than one variable, however, consider the following example:
Example 2. The function
$$f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{for } (x, y) \ne (0, 0), \\ 0 & \text{for } (x, y) = (0, 0), \end{cases}$$
has first partial derivatives everywhere but is not even continuous at $(0, 0)$.
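A quick numerical check of Example 2, offered as a hedged sketch rather than a proof (the function names below are illustrative assumptions): the difference quotients along the coordinate axes at the origin vanish, so both partial derivatives at $(0, 0)$ are $0$, while along the diagonal $y = x$ the function is identically $1/2$ and so cannot tend to $f(0, 0) = 0$.

```python
def f(x: float, y: float) -> float:
    """The function of Example 2: xy / (x^2 + y^2), extended by 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


def partial_x_quotient(k: float) -> float:
    """Difference quotient for the x-partial derivative at the origin."""
    return (f(k, 0.0) - f(0.0, 0.0)) / k


def partial_y_quotient(k: float) -> float:
    """Difference quotient for the y-partial derivative at the origin."""
    return (f(0.0, k) - f(0.0, 0.0)) / k


if __name__ == "__main__":
    for k in (1e-1, 1e-3, 1e-6):
        # Both quotients are 0, but f(k, k) stays at 1/2 as k -> 0.
        print(k, partial_x_quotient(k), partial_y_quotient(k), f(k, k))
```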
Even though $f$ has all its first partial derivatives at $(0, 0)$, we do not consider $f$ to be differentiable at $(0, 0)$. For functions of more than one variable, we introduce the following definition of differentiability, which is stronger than just the existence of all the partial derivatives. Instead of $\mathbb{R}$-valued functions, we consider the slightly more general case of maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. A basic reference is Spivak, Calculus on Manifolds, Chapter 2.
Let $O \subset \mathbb{R}^n$ be a domain, and let $f = (f^1, \dots, f^m) : O \to \mathbb{R}^m$. Then $f$ is differentiable at a point $a \in O$ if there is a linear map $Df(a) : \mathbb{R}^n \to \mathbb{R}^m$ which satisfies
$$\lim_{h \to 0} \frac{|f(a + h) - f(a) - Df(a)(h)|}{|h|} = 0,$$
where $h \in \mathbb{R}^n$. $Df(a)$ is called the derivative, or total derivative, of $f$ at $a$.
Lemma 4. In terms of standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, $Df(a)$ is written as the Jacobian matrix
$$Df(a) = \left( \frac{\partial f^i}{\partial x^j}(a) \right), \quad i = 1, \dots, m, \quad j = 1, \dots, n.$$
In particular, if $f$ is differentiable at $a$, then all the partial derivatives $\partial f^i / \partial x^j$ exist at $a$.
Proof. Write $Df(a)$ as the matrix $(\lambda^i_j)$. Also consider a path $h = (0, \dots, k, \dots, 0)$, where $k \to 0$ sits in the $j$th slot. (In other words, $h^l = \delta^l_j k$, where $\delta^l_j$ is the Kronecker delta, which is 1 if $l = j$ and 0 otherwise.) We also use Einstein's summation convention. In $n$-space, this summation convention requires that any repeated index which appears in both up and down positions (such as the $l$ in the last two lines below) is assumed to be summed from 1 to $n$. Compute
$$\frac{\partial f^i}{\partial x^j}(a) = \lim_{k \to 0} \frac{f^i(a^1, \dots, a^j + k, \dots, a^n) - f^i(a)}{k}$$
$$= \lim_{k \to 0} \frac{[f^i(a^1, \dots, a^j + k, \dots, a^n) - f^i(a) - \lambda^i_l h^l] + \lambda^i_l h^l}{k}$$
$$= 0 + \lim_{k \to 0} \frac{\lambda^i_l \delta^l_j k}{k}$$
$$= \lambda^i_l \delta^l_j = \lambda^i_j.$$
The key step, going from the second to the third line, follows from the assumption that $f$ is differentiable at $a$.
Another important result with essentially the same proof concerns directional derivatives. For a vector $v = (v^j) \in \mathbb{R}^n$, the directional derivative at $a$ in the direction $v$ is the vector
$$D_v f(a) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t}.$$
(Note we do not require $\|v\| = 1$ to define the directional derivative.) We have the following lemma:
Lemma 5. If $f$ is differentiable at $a$, then the directional derivative $D_v f(a)$ exists and
$$D_v f(a) = v^j \frac{\partial f}{\partial x^j}(a).$$
As Example 2 above shows, the converse of Lemma 4 is not true without extra assumptions on the partial derivatives. The following proposition gives an easy criterion for a function to be differentiable:
Proposition 6. If $f = (f^1, \dots, f^m)$ has continuous first partial derivatives $\partial f^i / \partial x^j$ on a neighborhood of $a$, then $f$ is differentiable at $a$.
Proof. For a component function $f^i$, write
$$f^i(a + h) - f^i(a) = f^i(a^1 + h^1, a^2, \dots, a^n) - f^i(a^1, a^2, \dots, a^n)$$
$$+ f^i(a^1 + h^1, a^2 + h^2, \dots, a^n) - f^i(a^1 + h^1, a^2, \dots, a^n)$$
$$+ \cdots + f^i(a^1 + h^1, a^2 + h^2, \dots, a^n + h^n) - f^i(a^1 + h^1, a^2 + h^2, \dots, a^{n-1} + h^{n-1}, a^n).$$
Now consider the first term in terms of the function $f^i(x^1, a^2, \dots, a^n)$ of the first variable $x^1$ alone. The Mean Value Theorem shows that there is a $b^1$ between $a^1$ and $a^1 + h^1$ so that
$$f^i(a^1 + h^1, a^2, \dots, a^n) - f^i(a^1, a^2, \dots, a^n) = h^1 \frac{\partial f^i}{\partial x^1}(b^1, a^2, \dots, a^n).$$
Similarly, for all other terms the difference equals
$$h^j \frac{\partial f^i}{\partial x^j}(a^1 + h^1, \dots, a^{j-1} + h^{j-1}, b^j, a^{j+1}, \dots, a^n)$$
for $b^j$ between $a^j$ and $a^j + h^j$. So if we set $c_j = (a^1 + h^1, \dots, a^{j-1} + h^{j-1}, b^j, a^{j+1}, \dots, a^n)$, then we have
$$f^i(a + h) - f^i(a) = \sum_{j=1}^n h^j \frac{\partial f^i}{\partial x^j}(c_j),$$
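where each $c_j \to a$ as $h \to 0$. So compute
$$\lim_{h \to 0} \frac{\left| f^i(a + h) - f^i(a) - \sum_{j=1}^n \frac{\partial f^i}{\partial x^j}(a) h^j \right|}{|h|} = \lim_{h \to 0} \frac{\left| \sum_{j=1}^n \left[ \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right] h^j \right|}{|h|}$$
$$\le \lim_{h \to 0} \frac{\sum_{j=1}^n \left| \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right| |h^j|}{|h|}$$
$$\le \lim_{h \to 0} \sum_{j=1}^n \left| \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right|$$
$$= 0$$
since each $\partial f^i / \partial x^j$ is assumed to be continuous at $a$. (The second inequality uses $|h^j| \le |h|$.)
So we have proved that each component function $f^i$ is differentiable at $a$. To show $f$ is differentiable, just note
$$\frac{\left| f(a + h) - f(a) - \frac{\partial f}{\partial x^j}(a) h^j \right|}{|h|} \le \sum_{i=1}^m \frac{\left| f^i(a + h) - f^i(a) - \frac{\partial f^i}{\partial x^j}(a) h^j \right|}{|h|},$$
which goes to 0 as $h \to 0$.
Recall a function is (locally) $C^1$ if its first partial derivatives are continuous. The previous Proposition 6 shows that such functions are differentiable, and Lemma 5 then shows that directional derivatives work as expected for $C^1$ functions.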
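Now, for functions $f$ on an open subset $\Omega$ of $\mathbb{R}^m$, consider the norm
$$\|f\|_{C^1(\Omega)} = \|f\|_{C^0(\Omega)} + \sum_{i=1}^m \left\| \frac{\partial f}{\partial x^i} \right\|_{C^0(\Omega)}$$
and the space
$$C^1(\Omega) = \{f : \Omega \to \mathbb{R} : f, \partial_1 f, \dots, \partial_m f \text{ are bounded and continuous}\}.$$
Similarly, we can consider $\mathbb{R}^p$-valued $C^1$ functions, the difference being that the functions $f$, $\partial_i f$ have bounded values in $\mathbb{R}^p$.
Proposition 7. On any open set $\Omega \subset \mathbb{R}^m$, $C^1(\Omega, \mathbb{R}^p)$ is a Banach space.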
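Proof. It is straightforward to check $\|\cdot\|_{C^1}$ is a norm.
Since $\|f\|_{C^1} \ge \|f\|_{C^0}$ and $\|f\|_{C^1} \ge \left\| \frac{\partial f}{\partial x^i} \right\|_{C^0}$, for any Cauchy sequence $\{f_n\}$ in $C^1$, $\{f_n\}$ and $\left\{ \frac{\partial f_n}{\partial x^i} \right\}$ are Cauchy sequences in $C^0$. Therefore, since $C^0$ is a Banach space, there are uniform limits
$$f = \lim_{n \to \infty} f_n, \qquad g_i = \lim_{n \to \infty} \frac{\partial f_n}{\partial x^i}, \quad i = 1, \dots, m, \qquad (1)$$
and $f, g_i \in C^0$. Since
$$\|f\|_{C^1} = \|f\|_{C^0} + \sum_{i=1}^m \left\| \frac{\partial f}{\partial x^i} \right\|_{C^0},$$
(1) shows it suffices to prove that
$$\frac{\partial f}{\partial x^i} = g_i, \quad i = 1, \dots, m.$$
As usual, we recognize that integrating has better properties than differentiating. For $x \in \Omega$, choose an $x_0 = x - (0, \dots, k, \dots, 0)$, where the $k > 0$ is in the $i$th slot. Since $\Omega$ is open, we may choose $k$ small enough so that the line segment from $x_0$ to $x$ is contained in $\Omega$. Compute
$$f(x) = \lim_{n \to \infty} f_n(x)$$
$$= \lim_{n \to \infty} \left[ f_n(x_0) + \int_{y = x^i - k}^{y = x^i} \frac{\partial f_n}{\partial x^i}(x^1, \dots, x^{i-1}, y, x^{i+1}, \dots, x^m) \, dy \right]$$
$$= f(x_0) + \int_{y = x^i - k}^{y = x^i} g_i(x^1, \dots, x^{i-1}, y, x^{i+1}, \dots, x^m) \, dy. \qquad (2)$$
The key step in the computation is the last one: $f_n(x_0) \to f(x_0)$ is easy, and the integral converges by the Dominated Convergence Theorem: Since $g_i \in C^0$, there is a constant $C$ so that $|g_i| \le C$ on $\Omega$. Moreover, since $\frac{\partial f_n}{\partial x^i} \to g_i$ in $C^0$, there is an $N$ so that $\left| \frac{\partial f_n}{\partial x^i} - g_i \right| \le 1$ for all $n \ge N$. Thus the $\frac{\partial f_n}{\partial x^i}$ for $n \ge N$ are all bounded by the integrable function $C + 1$, and the Dominated Convergence Theorem applies.
Now we can differentiate (2) with respect to $x^i$ (using the Fundamental Theorem of Calculus, since $g_i$ is continuous) and we see that $\frac{\partial f}{\partial x^i} = g_i$ at each $x \in \Omega$. This completes the proof.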
The last part of the proof is of independent interest. We record it as
Proposition 8. Let $f_n$ be $C^1$ functions on a domain $\Omega \subset \mathbb{R}^m$. Then if $f_n \to f$ uniformly and $\partial f_n / \partial x^i \to g_i$ uniformly for $i = 1, \dots, m$, then $g_i = \partial f / \partial x^i$.
Remark. We can also define $C^k(\Omega, \mathbb{R}^p)$ to be the space of all functions $f : \Omega \to \mathbb{R}^p$ so that $f$ and all its partial derivatives up to order $k$ are continuous and bounded. The norm is given by
$$\|f\|_{C^k} = \sum_{|\alpha| \le k} \|\partial^\alpha f\|_{C^0}, \qquad (3)$$
where $\alpha = (\alpha_1, \dots, \alpha_m)$, each $\alpha_i \ge 0$, $|\alpha| = \alpha_1 + \cdots + \alpha_m$, and
$$\partial^\alpha f = \frac{\partial^{|\alpha|} f}{(\partial x^1)^{\alpha_1} \cdots (\partial x^m)^{\alpha_m}}$$
(if some $\alpha_i = 0$, then there is no differentiation with respect to $x^i$).
We can use the same proof as above to conclude that $C^k$ is a Banach space. In particular, we can apply the theorem to $F = (f, f_{,1}, \dots, f_{,m})$ and then relate $\|F\|_{C^1}$ to $\|f\|_{C^2}$ to provide an inductive step.
$C^\infty$ is not a Banach space, as the analog of (3) would involve an infinite sum.
We've used the following problem implicitly a few times above.
Homework Problem 4. Show that if $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a point $a$, then it is continuous at $a$.
Homework Problem 5. Let $f$ be a real-valued function defined on a domain in $\mathbb{R}^2$. Show that if the second mixed partials $f_{,12} = \frac{\partial^2 f}{\partial x^1 \partial x^2}$ and $f_{,21} = \frac{\partial^2 f}{\partial x^2 \partial x^1}$ are continuous in a neighborhood of a point $y$, then
$$\frac{\partial^2 f}{\partial x^1 \partial x^2}(y) = \frac{\partial^2 f}{\partial x^2 \partial x^1}(y).$$
Hint: If the two are not equal, assume without loss of generality that the difference $f_{,12} - f_{,21} > 0$ at $y$. Then it must be positive on a rectangular neighborhood. Integrate this quantity over the rectangular neighborhood, and use Fubini's Theorem and the Fundamental Theorem of Calculus to arrive at a contradiction.
Finally, we introduce the Chain Rule. We need the following lemma first:
Lemma 9. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Then there is a constant $C = C(A)$ so that $|Ax| \le C|x|$ for all $x \in \mathbb{R}^n$.
Homework Problem 6. Prove Lemma 9. Hint: write down $Ax$ in terms of the matrix entries of $A$.
Proposition 10 (Chain Rule). Let $g : O \to \mathbb{R}^n$, $f : U \to O$, where $O \subset \mathbb{R}^m$ and $U \subset \mathbb{R}^l$ are domains. Assume $f$ is differentiable at $a \in U$, and $g$ is differentiable at $f(a) \in O$. Then $g \circ f$ is differentiable at $a$, and its derivative is the composition of linear maps
$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$
In terms of partial derivatives, this is equivalent to
$$\frac{\partial g^p}{\partial x^i} = \frac{\partial g^p}{\partial y^j} \frac{\partial y^j}{\partial x^i},$$
where $\{x^i\}$ are coordinates on $\mathbb{R}^l$, $\{y^j\}$ are coordinates on $\mathbb{R}^m$, and we follow the usual rules of Leibniz notation and Einstein summation.
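Proof. Let $A = Df(a)$, $B = Dg(f(a))$. Now consider the remainder terms in the definition of differentiable maps. For $h \in \mathbb{R}^l$, $k \in \mathbb{R}^m$,
$$\varphi(h) = f(a + h) - f(a) - A(h),$$
$$\psi(k) = g(f(a) + k) - g(f(a)) - B(k),$$
$$\rho(h) = (g \circ f)(a + h) - (g \circ f)(a) - (B \circ A)(h).$$
Then since $f$ and $g$ are differentiable,
$$\lim_{h \to 0} \frac{|\varphi(h)|}{|h|} = 0, \qquad (4)$$
$$\lim_{k \to 0} \frac{|\psi(k)|}{|k|} = 0, \qquad (5)$$
and we want to show that
$$\lim_{h \to 0} \frac{|\rho(h)|}{|h|} = 0.$$
So compute
$$\rho(h) = g(f(a + h)) - g(f(a)) - B(A(h))$$
$$= g(f(a + h)) - g(f(a)) - B(f(a + h) - f(a) - \varphi(h))$$
$$= [g(f(a + h)) - g(f(a)) - B(f(a + h) - f(a))] + B(\varphi(h))$$
$$= \psi(f(a + h) - f(a)) + B(\varphi(h)).$$
So then
$$\frac{|\rho(h)|}{|h|} \le \frac{|\psi(f(a + h) - f(a))|}{|h|} + \frac{|B(\varphi(h))|}{|h|}.$$
$|B(\varphi(h))| / |h| \to 0$ as $h \to 0$ by (4) and Lemma 9. On the other hand (5) shows that for all $\epsilon > 0$ there is a $\delta$ so that
$$|k| < \delta \implies |\psi(k)| \le \epsilon |k|.$$
Therefore if $|f(a + h) - f(a)| < \delta$ (which can be achieved if $|h| < \eta$ for a suitable $\eta > 0$, since $f$ is continuous at $a$),
$$\frac{|\psi(f(a + h) - f(a))|}{|h|} \le \epsilon \frac{|f(a + h) - f(a)|}{|h|} \le \epsilon \left( \frac{|A(h)|}{|h|} + \frac{|\varphi(h)|}{|h|} \right).$$
Now if we let $h \to 0$, using (4) and Lemma 9,
$$\limsup_{h \to 0} \frac{|\psi(f(a + h) - f(a))|}{|h|} \le \epsilon C.$$
Now we may let $\epsilon \to 0$ to show that $|\rho(h)| / |h| \to 0$ as $h \to 0$.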
1.5 Contraction mappings

Another tool we need is a basic fact about complete metric spaces, the Contraction Mapping Theorem.
A fixed point of a map $f : X \to X$ is a point $x \in X$ so that $f(x) = x$. For a metric space $X$ with metric $d$, a contraction map is a map $g : X \to X$ so that there is a constant $\lambda \in (0, 1)$ for which
$$d(g(x), g(y)) \le \lambda \, d(x, y) \quad \text{for all } x, y \in X.$$
Remark. It is important that the constant $\lambda < 1$ is independent of the $x$ and $y$ in $X$. As we'll see below in a homework exercise, the following theorem is false if we let $\lambda$ depend on $x$ and $y$.
Theorem 2 (Contraction Mapping). Any contraction mapping on a complete metric space has a unique fixed point.
Proof. As above, denote our metric space by $X$ with metric $d$, and let $\lambda \in (0, 1)$ be the constant for the contraction map $g$: for all $x, y \in X$,
$$d(g(x), g(y)) \le \lambda \, d(x, y).$$
First we prove uniqueness. If $x$ and $y$ are fixed points of $g$ (so $g(x) = x$, $g(y) = y$), then
$$d(x, y) = d(g(x), g(y)) \le \lambda \, d(x, y).$$
So $(1 - \lambda) d(x, y) \le 0$. Since $\lambda < 1$ and $d(x, y) \ge 0$ (since $X$ is a metric space), we must have $d(x, y) = 0$ and so $x = y$ (again since $X$ is a metric space).
To prove existence of the fixed point, we consider any point $x_0 \in X$, and consider iterates defined inductively by $x_{n+1} = g(x_n)$ for all $n \ge 0$. We claim $\{x_n\}$ is a Cauchy sequence and the limit $x_\infty$ of $x_n$ is the fixed point. For $n > m \ge 1$, compute
$$d(x_n, x_m) \le d(x_n, x_{n-1}) + \cdots + d(x_{m+1}, x_m)$$
$$= d(g(x_{n-1}), g(x_{n-2})) + \cdots + d(g(x_m), g(x_{m-1}))$$
$$\le \lambda \, d(x_{n-1}, x_{n-2}) + \cdots + \lambda \, d(x_m, x_{m-1})$$
$$\le \lambda^2 d(x_{n-2}, x_{n-3}) + \cdots + \lambda^2 d(x_{m-1}, x_{m-2})$$
$$\le \lambda^{n-1} d(x_1, x_0) + \cdots + \lambda^m d(x_1, x_0)$$
$$= d(x_1, x_0) \sum_{i=m}^{n-1} \lambda^i \le d(x_1, x_0) \sum_{i=m}^{\infty} \lambda^i = d(x_1, x_0) \frac{\lambda^m}{1 - \lambda}.$$
(Note that in this computation, we've used the exact sum of the geometric series, and it is crucial that $\lambda \in (0, 1)$: the geometric series diverges for $\lambda \ge 1$.) So if $N$ is a positive integer, then for all $n, m > N$, $d(x_n, x_m) \le d(x_1, x_0) \lambda^N / (1 - \lambda)$, and this last quantity $d(x_1, x_0) \lambda^N / (1 - \lambda) \to 0$ as $N \to \infty$. Thus $\{x_n\}$ is a Cauchy sequence which has a limit $x_\infty \in X$ since $X$ is a complete metric space.
Now we prove that $x_\infty$ is a fixed point. Since $x_\infty = \lim_{i \to \infty} x_i = \lim_{i \to \infty} x_{i+1}$, we have
$$g(x_\infty) = g(\lim_{i \to \infty} x_i) = \lim_{i \to \infty} g(x_i) = \lim_{i \to \infty} x_{i+1} = x_\infty,$$
and so $x_\infty$ is a fixed point. One point to note is that we have interchanged $g$ with $\lim$, which is valid only if $g$ is continuous (this is a homework problem below).
Homework Problem 7. Show any contraction map is continuous.
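The iteration $x_{n+1} = g(x_n)$ in the proof of Theorem 2 is directly computable. Below is a minimal sketch under stated assumptions: the helper name `fixed_point`, the stopping tolerance, and the choice of example are illustrative and not from the notes. The example takes $g = \cos$ on the complete metric space $[0, 1]$; by the Mean Value Theorem $|\cos x - \cos y| \le \sin(1) |x - y|$ there, and $\cos$ maps $[0, 1]$ into itself, so $g$ is a contraction with constant $\sin(1) < 1$.

```python
import math
from typing import Callable


def fixed_point(g: Callable[[float], float], x0: float,
                tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Iterate x_{n+1} = g(x_n) until successive iterates differ by < tol."""
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge within max_iter steps")


if __name__ == "__main__":
    x_star = fixed_point(math.cos, 0.5)
    # x_star approximately solves cos(x) = x.
    print(x_star, math.cos(x_star))
```

Stopping when successive iterates are close is justified by the estimate in the proof: the distance from $x_n$ to the limit is at most $\lambda / (1 - \lambda)$ times $d(x_n, x_{n-1})$.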
Homework Problem 8. Newton's method is an iterative method for finding zeros of differentiable functions. For an initial $x_0$, we proceed by the recursive definition
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}.$$
Then the limit $\lim x_n$ should produce a zero of the function $f$.
A differentiable function $f : \mathbb{R} \to \mathbb{R}$ has a nondegenerate zero at $x$ if $f(x) = 0$ and $f'(x) \ne 0$.
Assume $f : \mathbb{R} \to \mathbb{R}$ is a locally $C^2$ function (i.e., $f''$ is continuous on all of $\mathbb{R}$). Show that every nondegenerate zero $x$ of $f$ has a neighborhood $N_x$ so that for any initial $x_0 \in N_x$, Newton's method converges to $x$. Hints:
(a) The main point is to exhibit the Newton's method iteration as a contraction map on a complete metric space (recall a closed subset of any complete metric space is complete). You must find an appropriately small neighborhood of $x$ on whose closure Newton's method is a contraction map.
(b) You will need the following lemma: For a $C^1$ function $g : \mathbb{R} \to \mathbb{R}$,
$$y \ne z \in [a, b] \implies \frac{|g(y) - g(z)|}{|y - z|} \le \max_{w \in [a, b]} |g'(w)|.$$
(c) Show that any fixed point of Newton's method is a zero.
(d) Show the zero you have produced via Newton's method must be the original zero $x$.
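For readers who want to experiment with the recursion in Homework Problem 8, here is a minimal sketch of the iteration itself; it illustrates the recursive definition only and is not a solution to the problem. The function name `newton`, the tolerance, and the test function $f(x) = x^2 - 2$ are assumptions chosen for illustration.

```python
from typing import Callable


def newton(f: Callable[[float], float], df: Callable[[float], float],
           x0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Run x_{i+1} = x_i - f(x_i) / f'(x_i) until the step is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0.0:
            # The Newton step is undefined where the derivative vanishes.
            raise ZeroDivisionError("derivative vanished at an iterate")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("Newton iteration did not converge")


if __name__ == "__main__":
    # f(x) = x^2 - 2 has a nondegenerate zero at sqrt(2), since f'(sqrt(2)) != 0.
    root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
    print(root)
```

Starting far from a zero, or near a point where $f'$ vanishes, the iteration may fail, which is why the problem asks only for convergence on a small neighborhood of a nondegenerate zero.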
1.6 Differentiating under the Integral

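Proposition 11. Let $f = f(y, x)$ be a locally $C^1$ real-valued function for $y \in \mathbb{R}^n$, $x \in O$ an open subset of $\mathbb{R}^m$. Then on a measurable $\Omega \subset\subset O \subset \mathbb{R}^m$ equipped with Lebesgue measure $dx$,
$$\frac{\partial}{\partial y^i} \int_\Omega f(y, x) \, dx = \int_\Omega \frac{\partial f}{\partial y^i}(y, x) \, dx.$$
$\int_\Omega f(y, x) \, dx$ is $C^1$ as a function of $y$.
Remark. $\Omega \subset\subset O$ means that the closure $\bar{\Omega}$ in $\mathbb{R}^m$ is a compact subset of $O$.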
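Proof. Compute. Let $e_i$ be the standard $i$th basis vector on $\mathbb{R}^n$.
$$\frac{\partial}{\partial y^i} \int_\Omega f(y, x) \, dx = \lim_{k \to 0} \frac{1}{k} \left[ \int_\Omega f(y + k e_i, x) \, dx - \int_\Omega f(y, x) \, dx \right]$$
$$= \lim_{k \to 0} \int_\Omega \frac{f(y + k e_i, x) - f(y, x)}{k} \, dx.$$
Clearly as $k \to 0$, the integrand goes to $\frac{\partial f}{\partial y^i}(y, x)$ pointwise. We need to show that the integrands are bounded in absolute value by a fixed integrable function to use the Dominated Convergence Theorem. This follows from the Mean Value Theorem, which shows that the integrand is equal to
$$\frac{\partial f}{\partial y^i}(\tilde{y}, x)$$
for $\tilde{y} = (y^1, \dots, y^{i-1}, b^i, y^{i+1}, \dots, y^n)$, $b^i$ between $y^i$ and $y^i + k$. Since $f$ is $C^1$, $\partial f / \partial y^i$ is continuous, $\bar{\Omega}$ is compact, and $\tilde{y}$ stays in a compact neighborhood of $y$, then the absolute value of the integrand is bounded by a constant $M$. Since $\int_\Omega M \, dx < \infty$, the Dominated Convergence Theorem shows that
$$\frac{\partial}{\partial y^i} \int_\Omega f(y, x) \, dx = \lim_{k \to 0} \int_\Omega \frac{f(y + k e_i, x) - f(y, x)}{k} \, dx$$
$$= \int_\Omega \lim_{k \to 0} \frac{f(y + k e_i, x) - f(y, x)}{k} \, dx$$
$$= \int_\Omega \frac{\partial f}{\partial y^i}(y, x) \, dx.$$
To show that $\int_\Omega f(y, x) \, dx$ is $C^1$ as a function of $y$, note that its partial derivatives
$$g_i(y) = \int_\Omega \frac{\partial f}{\partial y^i}(y, x) \, dx$$
are continuous in $y$ by the Dominated Convergence Theorem again, since if $y \to y_0$, then
$$\lim_{y \to y_0} g_i(y) = \lim_{y \to y_0} \int_\Omega \frac{\partial f}{\partial y^i}(y, x) \, dx$$
$$= \int_\Omega \lim_{y \to y_0} \frac{\partial f}{\partial y^i}(y, x) \, dx$$
$$= \int_\Omega \frac{\partial f}{\partial y^i}(y_0, x) \, dx$$
$$= g_i(y_0)$$
because $\partial f / \partial y^i$ is continuous in $y$.
Remark. The last argument also shows that if $f = f(z, x)$ is a continuous function of $z$ and $x$, and $x \in \Omega$ a compact subset of $\mathbb{R}^n$, then the function
$$z \mapsto \int_\Omega f(z, x) \, dx$$
is continuous.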
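Proposition 11 can be sanity-checked numerically in a simple case. The sketch below is an illustration under assumptions that are not in the notes: the integrand $f(y, x) = \sin(xy)$ on $\Omega = (0, 1)$, a midpoint-rule quadrature, and a central difference quotient standing in for the derivative in $y$. It compares the difference quotient of $F(y) = \int_0^1 \sin(xy) \, dx$ with the integral of $\partial f / \partial y = x \cos(xy)$.

```python
import math
from typing import Callable


def midpoint_integral(h: Callable[[float], float], a: float, b: float,
                      n: int = 2000) -> float:
    """Approximate the integral of h over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return width * sum(h(a + (j + 0.5) * width) for j in range(n))


def F(y: float) -> float:
    """F(y) = integral over [0, 1] of sin(x * y) dx."""
    return midpoint_integral(lambda x: math.sin(x * y), 0.0, 1.0)


def dF_by_difference(y: float, k: float = 1e-5) -> float:
    """Central difference quotient of F at y (approximates dF/dy)."""
    return (F(y + k) - F(y - k)) / (2.0 * k)


def dF_under_integral(y: float) -> float:
    """Integral over [0, 1] of the y-derivative of the integrand, x * cos(x * y)."""
    return midpoint_integral(lambda x: x * math.cos(x * y), 0.0, 1.0)


if __name__ == "__main__":
    y = 1.0
    # The two values should agree up to discretization error.
    print(dF_by_difference(y), dF_under_integral(y))
```

Here $F(y) = (1 - \cos y)/y$ in closed form, so at $y = 1$ both quantities should be close to $\sin 1 + \cos 1 - 1$.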
1.7 The Inverse Function Theorem
We need the following lemma first:
Lemma 12. If $f$ is a $C^1$ function from a ball $B$ in $\mathbb{R}^n$ to $\mathbb{R}^m$, which satisfies
$$\left| \frac{\partial f^i}{\partial x^j} \right| \le C$$
on $B$, then for $y, z \in B$,
$$|f(y) - f(z)| \le Cmn |y - z|.$$
Proof. If $y, z \in B$, then the line segment $\{ty + (1 - t)z : 0 \le t \le 1\}$ between them is also contained in $B$ (see Homework Problem 13 below). Then use the Chain Rule to compute for $i = 1, \dots, m$,
$$|f^i(y) - f^i(z)| = \left| \int_0^1 \frac{\partial}{\partial t} f^i(ty + (1 - t)z) \, dt \right|$$
$$= \left| \int_0^1 (y^j - z^j) \frac{\partial f^i}{\partial x^j}(ty + (1 - t)z) \, dt \right|$$
$$\le Cn |y - z|.$$
(Note this argument is essentially the same as the use of the Mean Value Theorem.) Now apply
$$|f(y) - f(z)| \le \sum_{i=1}^m |f^i(y) - f^i(z)|.$$
Theorem 3 (Inverse Function Theorem). Let $f : O \to U$ be a $C^1$ map between domains in $\mathbb{R}^m$. Assume that for $a \in O$, $Df(a)$ is an invertible matrix (i.e., $\det Df(a) \ne 0$). Then there are neighborhoods $O' \ni a$ and $U' \ni f(a)$ so that $f : O' \to U'$ is a bijection and $f^{-1}$ is also a $C^1$ map. For every $b \in O'$, $D(f^{-1})(f(b)) = (Df(b))^{-1}$.
Proof. First of all, we may reduce to the case that $a = f(a) = 0$ and $Df(a) = I$ the identity map from $\mathbb{R}^m$ to itself. (This can be achieved by replacing $f(x)$ by $(Df(a))^{-1}(f(x + a) - f(a))$. Then use the Chain Rule and the fact that the derivative of the linear map $(Df(a))^{-1}$ is $(Df(a))^{-1}$ itself.)
Now consider $g(x) = x - f(x)$ and note that $Dg(0) = 0$ the zero linear transformation. Since $g$ is $C^1$, there is an $r > 0$ so that $|x| < 2r$ implies
$$\left| \frac{\partial g^i}{\partial x^j}(x) \right| < \frac{1}{2m^2}, \quad \text{for } i, j = 1, \dots, m. \qquad (6)$$
Let $B(r) = \{x \in \mathbb{R}^m : |x| < r\}$, and write $\bar{B}(r)$ for its closure, the closed ball of radius $r$. Then Lemma 12 and $g(0) = 0$ imply that $g(\bar{B}(r)) \subset \bar{B}(r/2)$.
Now let $y \in B(r/2)$ and consider
$$g_y(x) = g(x) + y = x - f(x) + y.$$
Then $g_y(x) = x$ is equivalent to $f(x) = y$, and so a fixed point of $g_y$ is equivalent to a solution to $f(x) = y$.
If $x \in \bar{B}(r)$, $|g_y(x)| \le |g(x)| + |y| \le r$, and so $g_y$ is a map from the complete metric space $\bar{B}(r)$ to itself.
Lemma 12 and (6) imply $g_y$ is a contraction map (with $\lambda = 1/2$). In other words, for $x_1, x_2 \in \bar{B}(r)$,
$$|g_y(x_1) - g_y(x_2)| = |g(x_1) - g(x_2)| \le \tfrac{1}{2} |x_1 - x_2|. \qquad (7)$$
Therefore, for each $y \in B(r/2)$, there is a unique fixed point $x$ of $g_y$, which shows there is a unique solution $x$ to $f(x) = y$ in $\bar{B}(r)$.
Now we show $x = f^{-1}(y)$ is continuous: for $x_1, x_2 \in \bar{B}(r)$, we have, by the definition $g(x) = x - f(x)$ and (7),
$$|x_1 - x_2| \le |g(x_1) - g(x_2)| + |f(x_1) - f(x_2)|$$
$$\le \tfrac{1}{2} |x_1 - x_2| + |f(x_1) - f(x_2)|,$$
$$\tfrac{1}{2} |x_1 - x_2| \le |f(x_1) - f(x_2)|,$$
$$|f^{-1}(y_1) - f^{-1}(y_2)| \le 2 |y_1 - y_2| \qquad (8)$$
for $y_i = f(x_i)$. Thus $f^{-1}$ is continuous.
To show $f^{-1}$ is differentiable at $y_2$ with total derivative $(Df(x_2))^{-1}$, we need to show that
$$\lim_{y_1 \to y_2} \frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|y_1 - y_2|} = 0.$$
To show this, compute
$$|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|$$
$$= |x_1 - x_2 - (Df(x_2))^{-1}(f(x_1) - f(x_2))|$$
$$= |(Df(x_2))^{-1} [Df(x_2)(x_1 - x_2) - (y_1 - y_2)]|$$
$$\le C |Df(x_2)(x_1 - x_2) - (y_1 - y_2)| \quad \text{(by Lemma 9)}$$
$$= C |Df(x_2)(x_1 - x_2) - [f(x_1) - f(x_2)]|. \qquad (9)$$
Therefore,
$$\frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|y_1 - y_2|} = \frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|x_1 - x_2|} \cdot \frac{|x_1 - x_2|}{|y_1 - y_2|}.$$
(Note $y_1 \ne y_2$ implies $x_1 \ne x_2$ since $y_i = f(x_i)$.) This expression goes to zero as $y_1 \to y_2$ by (8) and (9), since $f$ is differentiable at $x_2$.
Finally we show the total derivative $(Df(x))^{-1}$ is continuous in $y$. We can think of $Df$ as a map from $x$ to $\mathbb{R}^{m^2}$, which represents the space of $m \times m$ matrices. $Df(x)$ is continuous in $x$ ($f$ is $C^1$), and thus is continuous in $y$. The determinant function $\det : \mathbb{R}^{m^2} \to \mathbb{R}$ is continuous, since it is a polynomial in the matrix entries. So $\det Df(x)$ is bounded away from zero, by compactness of $\bar{B}(r)$. We are left to prove the continuity of the matrix inverse operation for square matrices with determinant bounded away from 0. This follows from the formula for the inverse in terms of cofactor matrices: Each entry of the inverse matrix $A^{-1} = (a_{ij})^{-1}$ is of the form
$$\frac{(m - 1)^{\text{st}}\text{-order polynomial in the } a_{ij}}{\det(a_{ij})}.$$
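The proof above is constructive: the local inverse is found as the fixed point of $g_y(x) = x - f(x) + y$. The following one-dimensional sketch illustrates this under assumptions that are not in the notes: the map $f(x) = x + x^2/4$ (which satisfies $f(0) = 0$, $f'(0) = 1$), the radius $r = 1$, and the helper name `local_inverse`. Here $g(x) = x - f(x) = -x^2/4$ has $|g'(x)| \le 1/2$ for $|x| \le 1$, so $g_y$ is a contraction of the closed interval $[-1, 1]$ whenever $|y| \le 1/2$.

```python
def f(x: float) -> float:
    """Example map with f(0) = 0 and f'(0) = 1 (an illustrative choice)."""
    return x + x * x / 4.0


def local_inverse(y: float, tol: float = 1e-14, max_iter: int = 200) -> float:
    """Solve f(x) = y for |y| <= 1/2 by iterating g_y(x) = x - f(x) + y from x = 0."""
    if abs(y) > 0.5:
        raise ValueError("this sketch only treats |y| <= 1/2")
    x = 0.0
    for _ in range(max_iter):
        x_next = x - f(x) + y  # one application of the contraction g_y
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    y = 0.3
    x = local_inverse(y)
    # f(x) should reproduce y, and |x| <= 2 |y| in line with estimate (8).
    print(x, f(x))
```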
Homework Problem 9. If, in the Inverse Function Theorem, $f$ is a smooth ($C^\infty$) map, then $f^{-1} : U' \to O'$, the $C^1$ local inverse of $f$, is also $C^\infty$. Hints:
(a) If $A = A(s)$ is a family of invertible $n \times n$ matrices which depend differentiably on a real parameter $s$, differentiate the equation $A A^{-1} = I$ to show
$$\frac{d(A^{-1})}{ds} = -A^{-1} \frac{dA}{ds} A^{-1}.$$
(b) Use the formula for $D(f^{-1})$ to show that $f^{-1}$ is $C^\infty$.
Hints: It may be helpful to use the following notation. If $f = f(x) = f(x^1, \dots, x^n)$, we may write $(y^1, \dots, y^n) = y = y(x) = f(x)$. And so $f^{-1}(y) = x$ may be written simply as $x = x(y)$. To show $f^{-1}$ is $C^2$, for example, you should write
$$\frac{\partial^2 (f^{-1})^k}{\partial y^i \partial y^j} = \frac{\partial^2 x^k}{\partial y^i \partial y^j}$$
in terms of (the components of) the first and second derivatives
$$\frac{\partial f}{\partial x^i} = \frac{\partial y}{\partial x^i}, \quad \text{and} \quad \frac{\partial^2 f}{\partial x^i \partial x^j} = \frac{\partial^2 y}{\partial x^i \partial x^j}$$
and verify that the resulting expression is continuous.
Remember to use the Chain Rule, as in e.g.,
$$\frac{\partial}{\partial y^j} = \frac{\partial x^i}{\partial y^j} \frac{\partial}{\partial x^i},$$
and recall that $Df^{-1} = (Df)^{-1}$ can be written as
$$\left( \frac{\partial x^i}{\partial y^j} \right) = \left( \frac{\partial y^k}{\partial x^l} \right)^{-1}.$$
It will also be helpful to use Einstein's summation notation. In particular, the matrix notation used in part (a) is insufficient, as there may be quantities with more than 2 indices which need to be summed.
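Theorem 4 (Implicit Function Theorem). Suppose $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$
is $C^1$ in an open set containing $(a, b)$, and assume $f(a, b) = 0$. Assume the
$m \times m$ matrix
\[ \left( \frac{\partial f^i}{\partial x^{n+j}}(a, b) \right), \quad 1 \le i, j \le m \]
is invertible. Then there is an open set $O \subset \mathbb{R}^n$ containing $a$ and an open
set $U \subset \mathbb{R}^m$ containing $b$ so that for each $x \in O$, there is a unique $g(x) \in U$
so that $f(x, g(x)) = 0$. $g$ is locally $C^1$.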
Homework Problem 10. Prove the Implicit Function Theorem. Hints:
(a) Consider $F : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n \times \mathbb{R}^m$ defined by $F(x, y) = (x, f(x, y))$ and
apply the Inverse Function Theorem to $F$.
(b) Show that, on a suitably small neighborhood, $F^{-1}$ is of the form $F^{-1}(x, y) =
(x, p(x, y))$ for $p : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$.
(c) Show that $g(x) = p(x, 0)$ satisfies the conditions of the theorem.
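As a purely numerical illustration of the statement of Theorem 4 (not a proof, and not the construction suggested in the hints), the following sketch takes the hypothetical example $f(x, y) = x^2 + y^2 - 1$ near $(a, b) = (0, 1)$, where $\partial f/\partial y = 2 \neq 0$. It finds $g(x)$ by Newton's method in $y$ and compares with the explicit branch $\sqrt{1 - x^2}$. The example function, the starting point and the iteration count are assumptions made here for illustration.

```python
import math

def f(x, y):
    # Hypothetical example: the unit circle as the zero set of f.
    return x * x + y * y - 1.0

def df_dy(x, y):
    # Partial derivative of f in y; nonzero near (a, b) = (0, 1).
    return 2.0 * y

def g(x, y_start=1.0, iterations=20):
    """Solve f(x, y) = 0 for y near b = 1 by Newton's method in y."""
    y = y_start
    for _ in range(iterations):
        y -= f(x, y) / df_dy(x, y)
    return y

xs = [-0.5, -0.1, 0.0, 0.3, 0.5]
errors = [abs(g(x) - math.sqrt(1.0 - x * x)) for x in xs]
print(errors)
```

For $x$ in a small neighborhood of $a = 0$ the computed $g(x)$ agrees with the upper branch of the circle, which is the unique solution near $b = 1$.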
1.8 Lipschitz constants and functions

A closely related concept to the contraction map is the Lipschitz constant.
A map $f : X \to Y$ has Lipschitz constant
\[ L = \sup_{x, x' \in X :\, x \neq x'} \frac{d_Y(f(x), f(x'))}{d_X(x, x')}. \]
Here of course $d_X$ and $d_Y$ are the metrics on $X$ and $Y$ respectively. An
equivalent definition is that $L$ is the smallest constant so that
\[ d_Y(f(x), f(x')) \le L\, d_X(x, x') \quad \text{for all } x, x' \in X. \]
A function with finite Lipschitz constant is called Lipschitz. A basic fact is

the following:
Lemma 13. Any Lipschitz function is continuous.
If $f : X \to X$, then the Lipschitz constant gives a criterion for a mapping
to be a contraction mapping:
Lemma 14. $f : X \to X$ is a contraction map if and only if the Lipschitz
constant $L$ of $f$ is strictly less than 1.
Idea of proof. The Lipschitz constant is the smallest value of the contraction
constant for which $f$ is a contraction map.
If $f : \mathbb{R} \to \mathbb{R}$, then the Lipschitz constant is simply
\[ L = \sup_{x \neq y} \frac{|f(x) - f(y)|}{|x - y|}, \]
which of course is suggestive of the definition of the derivative. In fact, the
following is true:
Homework Problem 11. The Lipschitz constant of a locally $C^1$ function
$f : \mathbb{R} \to \mathbb{R}$ is equal to $\sup_{x \in \mathbb{R}} |f'(x)|$.
Hint: To show the two quantities are equal, you need to relate the sup of
the derivative to the sup of the difference quotients. To relate the derivative
$f'(x)$ to difference quotients, use the definition of the derivative. To relate a
given difference quotient to a derivative, use the Mean Value Theorem.
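The relationship in Problem 11 can be observed numerically (this is only a sanity check, not a proof). The sketch below, using the assumed example $f = \sin$ on the arbitrarily chosen interval $[-3, 3]$, compares the largest difference quotient over a grid with the largest value of $|f'| = |\cos|$ on the same grid. The function names and the grid size are choices made here.

```python
import math

def difference_quotient_sup(f, lo, hi, n):
    """Largest |f(x) - f(y)| / |x - y| over distinct points of a uniform grid."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    best = 0.0
    for i, x in enumerate(xs):
        for y in xs[i + 1:]:
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best

def derivative_sup(df, lo, hi, n):
    """Largest |f'(x)| over the same uniform grid."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return max(abs(df(x)) for x in xs)

L_est = difference_quotient_sup(math.sin, -3.0, 3.0, 300)
D_est = derivative_sup(math.cos, -3.0, 3.0, 300)
print(L_est, D_est)
```

The grid estimate of the Lipschitz constant stays just below the sup of the derivative and approaches it as the grid is refined, consistent with the two directions of the hint.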
The previous problem shows that any differentiable function with bounded
derivative is Lipschitz. The converse is false, as we see in the following ex-
ample.
Example 3. The function $x \mapsto |x|$ is a Lipschitz function from $\mathbb{R}$ to $\mathbb{R}$. This
follows from the observation that for each $x \neq y \in \mathbb{R}$,
\[ \frac{\big|\, |x| - |y| \,\big|}{|x - y|} \le 1. \]
(This can be proved using the Triangle Inequality.)

Example 4. For any constant $\alpha \in (0, 1)$, the function from $\mathbb{R}$ to $\mathbb{R}$ given by $x \mapsto |x|^\alpha$
is not Lipschitz. In particular,
\[ \lim_{x \to 0} \frac{|x|^\alpha - |0|^\alpha}{|x - 0|} = \lim_{x \to 0} |x|^{\alpha - 1} = \infty. \]
In terms of the graph of a function, a function whose graph has a corner
(as does $x \mapsto |x|$) is Lipschitz, while a function whose graph has a cusp (as
does $x \mapsto |x|^\alpha$) is not Lipschitz.
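The contrast between a corner and a cusp in Examples 3 and 4 shows up directly in the difference quotients at the origin. The following sketch evaluates them along a sequence of points shrinking to 0; the exponent $1/2$ is just one illustrative choice from $(0, 1)$, and the helper names are assumptions of this sketch.

```python
def quotient_at_zero(f, x):
    """Difference quotient |f(x) - f(0)| / |x - 0|."""
    return abs(f(x) - f(0.0)) / abs(x)

def corner(x):
    return abs(x)

def cusp(x):
    # Exponent 1/2 chosen for illustration; any exponent in (0, 1) behaves similarly.
    return abs(x) ** 0.5

hs = [10.0 ** (-k) for k in range(1, 7)]
corner_q = [quotient_at_zero(corner, h) for h in hs]
cusp_q = [quotient_at_zero(cusp, h) for h in hs]
print(corner_q)
print(cusp_q)
```

The quotients for the corner stay equal to 1, while those for the cusp grow like $|x|^{-1/2}$, so no finite Lipschitz constant can work.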
Another basic fact we establish is this: the conclusion of the Contraction
Map Theorem may be false if the Lipschitz constant is equal to 1. An easy
example is the map $x \mapsto x + 1$ from $\mathbb{R}$ to $\mathbb{R}$. The Lipschitz constant is
obviously 1, and there is no fixed point. A related, but somewhat more
surprising fact, is outlined in the following problem:
Homework Problem 12. Find an example of a differentiable function $f :
\mathbb{R} \to \mathbb{R}$ so that for each $x \neq y$,
\[ \frac{|f(x) - f(y)|}{|x - y|} < 1, \]
and yet $f$ has no fixed point. Prove your answer works.
Hint: The point of this problem is that there should be no uniform $L < 1$
which works for all $x$ and $y$. To construct such a function $f$, use Problem
11 above. In particular, first construct the derivative $f'$ and then integrate
to find $f$. (You'll need $\sup_x |f'(x)| = 1$; why?) Use the Mean Value Theorem
to relate values of $f'$ to difference quotients.
A subset $C$ of a real vector space is convex if every line segment connecting
two points in $C$ is contained in $C$. More formally, $C$ is convex if
\[ x, y \in C,\ t \in [0, 1] \implies tx + (1 - t)y \in C. \]
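Proposition 15. Any globally $C^1$ function from a convex domain $\Omega \subset \mathbb{R}^n$
to $\mathbb{R}^m$ is globally Lipschitz.
Proof. Lemma 12 above shows that for any $x, y \in \Omega$,
\[ |f(x) - f(y)| \le Cnm|x - y|, \quad \text{for } C = \sup\left\{ \left| \frac{\partial f^i}{\partial x^j}(z) \right| : z \in \Omega,\ i \le m,\ j \le n \right\}. \]
$C < \infty$ since $f$ is $C^1$. Thus $f$ is Lipschitz.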

Consider $X$ a locally compact metric space and $Y$ any metric space. Then
we say a function $f : X \to Y$ is locally Lipschitz if $f$ satisfies one of the two
following equivalent definitions:
1. $f$ is Lipschitz when restricted to any compact set of $X$. In other words,
if $K \subset X$ is compact, then there is a constant $L_K$ so that
\[ x, x' \in K \implies d_Y(f(x), f(x')) \le L_K\, d_X(x, x'). \]
2. Each $x \in X$ has a neighborhood on which $f$ is Lipschitz.
We prove these two definitions are equivalent below.
Corollary 16. On any domain $\Omega \subset \mathbb{R}^n$, any locally $C^1$ function $f$ is locally
Lipschitz.
Proof. Any ball is convex (see the following homework problem), and so if
$f$ is $C^1$ on a small ball, then it is Lipschitz on the ball by the previous
Proposition 15.
Homework Problem 13. Show that any ball $B_x(r) = \{y \in \mathbb{R}^n : |y - x| < r\}$
is convex.
Proposition 17. Let $X$ be a locally compact metric space and $Y$ be any
metric space. Then for maps $f$ from $X$ to $Y$, the two definitions (1) and (2)
above are equivalent.
Proof. To prove (1) $\implies$ (2), consider $x \in X$. Since $X$ is locally compact,
there is a neighborhood $O$ of $x$ with compact closure $\bar{O}$. By definition (1) of
locally Lipschitz, $f$ is Lipschitz when restricted to $\bar{O}$, and is thus Lipschitz
on $O$ also.
To prove part (2) $\implies$ (1), let $K \subset X$ be a compact subset. Given that
all points in $X$ have neighborhoods on which $f$ is Lipschitz, we need to prove
that $f$ is Lipschitz on $K$. The set of all neighborhoods of points in $K$ on
which $f$ is Lipschitz forms an open cover of $K$, and thus there is a finite
subcover $O_1, \dots, O_n$. The set
\[ P = (K \times K) \setminus \bigcup_{i=1}^{n} (O_i \times O_i) \]
is compact, and so the function
\[ \frac{d_Y(f(x), f(x'))}{d_X(x, x')}, \]
which is continuous on $P$, attains its maximum $M$ on $P$.
Consider any $x \neq x' \in K$. Then either $(x, x') \in P$ or $x, x' \in O_i$ for
some $i = 1, \dots, n$. Let $L_i$ be the Lipschitz constant of $f|_{O_i}$. Choose $L =
\max\{M, L_1, \dots, L_n\}$. Then for every $x \neq x' \in K$,
\[ \frac{d_Y(f(x), f(x'))}{d_X(x, x')} \le L \]
and $f$ is Lipschitz on $K$.
2 Ordinary Differential Equations
2.1 Introduction
An ordinary differential equation (an ODE) is an equation of the form
\[ x^{(n)}(t) = F(x^{(n-1)}, \dots, \dot{x}, x, t), \tag{10} \]
where $x : I \to \mathbb{R}$ is a function of $t$, $I$ is an open interval in $\mathbb{R}$,
\[ \dot{x} = \frac{dx}{dt}, \quad \text{and} \quad x^{(n)} = \frac{d^n x}{dt^n}. \]
The order of the above equation is $n$, the highest derivative of $x$ which
appears. It is also useful to consider the case
\[ x = (x^1, \dots, x^m) : I \to \mathbb{R}^m, \]
which is called a system of ODEs.

Some ODEs can be solved explicitly by using integration techniques, but
most cannot. For most ODEs, instead of explicit solutions, we must rely
on an abstract existence theorem to show that for nice enough F (Lipschitz
suffices), there is a unique solution locally. We also investigate the regularity
of solutions, showing, for example, if F is smooth, then any solution to (10)
is smooth. Existence, uniqueness, and regularity are three main themes in
the theory of all differential equations, and there are satisfactory theorems
to handle all three for ODEs.
Consider the following example (where $x$, not $t$, is the independent variable):
Example 5. Consider the differential equation $dy/dx = x^2 y$. This first order
ODE is called separable, since it is written in the form $dy/dx = f(x)g(y)$.
Recall the solution procedure for a separable ODE:
If $c$ is a root of $g(y)$, then $y = c$ is a solution. (Why?) So in the present
case, $y = 0$ is a solution.
For other values of $y$ (where $g(y) \neq 0$), compute
\begin{align*}
\frac{dy}{dx} &= x^2 y, \\
\frac{dy}{y} &= x^2\, dx, \\
\int \frac{dy}{y} &= \int x^2\, dx, \\
\ln |y| &= \frac{x^3}{3} + C, \\
y &= \pm e^C e^{x^3/3} = C' e^{x^3/3},
\end{align*}
where $C' = \pm e^C$ is a nonzero constant.
If we let $C'$ be any real number, then we capture both cases above, and
the general solution is $y = C' e^{x^3/3}$.
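The general solution found in Example 5 can be checked against a direct numerical integration of the ODE. The sketch below uses a hand-written classical Runge-Kutta step (the step count and the sample constants $C'$ are arbitrary choices made here) and compares the result at $x = 1$ with $C' e^{x^3/3}$.

```python
import math

def rk4(f, x0, y0, x1, steps):
    """Classical fourth-order Runge-Kutta for dy/dx = f(x, y) from x0 to x1."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

def rhs(x, y):
    return x * x * y

def exact(x, c):
    # General solution from Example 5 with y(0) = c.
    return c * math.exp(x ** 3 / 3)

# Sample constants, including the equilibrium case c = 0.
errors = {c: abs(rk4(rhs, 0.0, c, 1.0, 200) - exact(1.0, c)) for c in (-2.0, 0.0, 1.5)}
print(errors)
```

Including $C' = 0$ in the sample illustrates that the formula also captures the constant solution $y = 0$.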
Homework Problem 14. Consider the ODE
\[ \frac{dy}{dx} = \frac{1 + y^2}{1 + x^2}. \]
(a) Find the general solution to this differential equation. Your answer
should be rational functions of x. You may need to write your answer
using more than one case.
(b) Find the particular solution passing through (x, y) = (1, 1).
(c) Find the particular solution passing through (x, y) = (1, 1). (Hint:
What is the formula for $\tan(\theta + \frac{\pi}{2})$?)
2.2 Local Existence and Uniqueness

The most natural setting for systems of ODEs is in terms of an initial value
problem. Let $x = (x^1, \dots, x^n) = x(t)$. An initial value problem for a first
order system of ODEs at $t = t_0$ consists of
    a system of ODEs $\dot{x} = v(x, t)$
    and an initial condition $x(t_0) = x_0$.
We'll see below that if $v$ satisfies a Lipschitz condition, then for $t$ in a small
interval around $t_0$, there is a unique solution to the initial value problem.
Example 6. Consider the following problem: Find a solution to the ODE
$\dot{y} = y^2$ subject to the initial condition $y(0) = 1$. Interpreting $t$ as a time
variable, what happens as time goes forward from $t = 0$?
Solution: $dy/dt = y^2$ is separable, and so compute
\[ \frac{dy}{y^2} = dt \implies \int \frac{dy}{y^2} = \int dt \implies -\frac{1}{y} = t + C \implies y = -\frac{1}{t + C}. \]
Plug in the initial condition $y = 1$ and $t = 0$ to solve for $C$ to find $C = -1$
and
\[ y = \frac{1}{1 - t}. \]
Note that $y(t)$ is discontinuous at $t = 1$, so as time goes forward from $t = 0$,
the solution only exists until time 1. Also note there is no problem going
backward in time, and so the solution to the initial value problem is
\[ y = \frac{1}{1 - t}, \quad t \in (-\infty, 1). \]
It does not make sense to talk about the solution to the initial value problem
beyond $t = 1$.
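The behavior in Example 6 can be seen numerically. The following sketch integrates $\dot{y} = y^2$, $y(0) = 1$ with a hand-written Runge-Kutta scheme, forward toward the singularity and backward in time, and compares with $1/(1-t)$. The sample times and step counts are arbitrary choices of this sketch.

```python
def rk4(f, t0, y0, t1, steps):
    """Classical fourth-order Runge-Kutta from t0 to t1 (t1 < t0 integrates backward)."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

def rhs(t, y):
    return y * y

def exact(t):
    return 1.0 / (1.0 - t)

# Forward in time the solution grows rapidly as t approaches 1.
forward = {t: rk4(rhs, 0.0, 1.0, t, 9000) for t in (0.5, 0.9)}
# Backward in time there is no obstruction.
backward = rk4(rhs, 0.0, 1.0, -5.0, 5000)
print(forward, backward)
```

A fixed-step method cannot detect the blow-up at $t = 1$ by itself; the point here is only that the numerical values track the explicit formula on the interval where the solution exists.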
The previous example shows that it is not in general possible to extend a
solution to an initial value problem for all time. However, we can still hope
to find a solution to an initial value problem on a neighborhood $(t_0 - \epsilon, t_0 + \epsilon)$
of $t_0$.
Theorem 5. Consider the initial value problem
\[ \dot{x} = v(x, t), \qquad x(t_0) = x_0 \tag{11} \]
for $x : I \to \mathbb{R}^n$ for $I$ an open neighborhood of $t_0$. Assume $v$ is a Lipschitz
function from $O \times I \to \mathbb{R}^n$, where $O \subset \mathbb{R}^n$ is an open neighborhood of $x_0$.
Then on a neighborhood $\tilde{I}$ of $t_0$ contained in $I$, there is a unique solution
to (11).
Before we give the proof, let us consider a few examples.
Example 7. The differential equation $\dot{x} = x^2 + t$ has no solution which can
be written down in terms of standard algebraic and transcendental functions
(such as roots, exponentials, trigonometric functions). Theorem 5 states that
there is a local solution for every initial value problem. For example, for
initial conditions $x(0) = 1$, there is a solution valid on an open interval
containing $t = 0$.
Theorem 5 does not guarantee a solution which is valid for all time $t$ (see
Example 6 above). In fact the solution for the present initial-value problem
will also blow up in finite time. This is basically because for $t \ge 0$, $\dot{x} =
x^2 + t \ge x^2$, and so the solution should grow faster than the solution to
Example 6, which goes to infinity in finite time.
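The comparison described in Example 7 can be checked numerically on an interval where both solutions exist. The sketch below integrates $\dot{x} = x^2 + t$, $x(0) = 1$ with a hand-written Runge-Kutta scheme and compares with $1/(1-t)$, the solution from Example 6. The sample times and step count are assumptions of this sketch, and the check is an illustration rather than a proof.

```python
def rk4(f, t0, x0, t1, steps):
    """Classical fourth-order Runge-Kutta for dx/dt = f(t, x) from t0 to t1."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def rhs(t, x):
    return x * x + t

times = [0.1, 0.2, 0.3, 0.4, 0.5]
phi = [rk4(rhs, 0.0, 1.0, t, 2000) for t in times]   # solution of x' = x^2 + t
psi = [1.0 / (1.0 - t) for t in times]               # solution of x' = x^2
gaps = [p - q for p, q in zip(phi, psi)]
print(gaps)
```

The gaps are positive at every sample time, in line with the heuristic that the solution of $\dot{x} = x^2 + t$ stays above $1/(1-t)$ for $t > 0$.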
If v in Theorem 5 is not Lipschitz, then it is possible to lose the uniqueness

statement from Theorem 5 (although existence is still valid).
Example 8. Consider the initial value problem
\[ \dot{x} = x^{2/3}, \quad x(0) = 0. \]
Then it is straightforward to verify that $x(t) = 0$ is a solution. There is
another solution, however. Solve the equation
\begin{align*}
\frac{dx}{dt} &= x^{2/3}, \\
x^{-2/3}\, dx &= dt, \\
\int x^{-2/3}\, dx &= \int dt, \\
3x^{1/3} &= t + C, \\
x &= \left( \tfrac{1}{3} t + \tfrac{1}{3} C \right)^3.
\end{align*}
Then plug in $x(0) = 0$ to find $C = 0$ and the solution $x(t) = (\tfrac{1}{3} t)^3$.
The point of this example is that $v = x^{2/3}$ is not Lipschitz (see Example 4
above). Therefore, Theorem 5 does not apply.
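The non-uniqueness in Example 8 is easy to confirm numerically: both candidate functions satisfy the equation and the initial condition, yet they differ. The sketch below measures the residual $|\dot{x} - x^{2/3}|$ with a central difference; the step size and sample times are arbitrary choices, and `abs` is used inside the power only so that the expression is defined in floating point.

```python
def v(x):
    # Right-hand side x^(2/3), written with abs so it is real-valued for all x.
    return abs(x) ** (2.0 / 3.0)

def zero_solution(t):
    return 0.0

def cubic_solution(t):
    return (t / 3.0) ** 3

def residual(sol, t, h=1e-5):
    """|x'(t) - v(x(t))| with x'(t) approximated by a central difference."""
    deriv = (sol(t + h) - sol(t - h)) / (2 * h)
    return abs(deriv - v(sol(t)))

times = [0.0, 0.5, 1.0, 2.0]
res_zero = [residual(zero_solution, t) for t in times]
res_cubic = [residual(cubic_solution, t) for t in times]
print(res_zero, res_cubic)
```

Both residuals vanish up to discretization error, while the two functions take different values at every $t \neq 0$.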
Proof of Theorem 5. The idea of the proof is to set up the problem in terms
of a contraction mapping. We first find an iteration whose fixed point solves
the differential equation and then find an appropriate complete metric space
on which the iteration is a contraction map.
For a continuous $\mathbb{R}^n$-valued function $\phi$ defined on a neighborhood of $t_0$,
let $A\phi$ be another such function defined as follows:
\[ (A\phi)(t) = x_0 + \int_{t_0}^{t} v(\phi(\tau), \tau)\, d\tau. \tag{12} \]
(Note we are integrating an $\mathbb{R}^n$-valued function. This may be related to the
usual $\mathbb{R}$-valued integration theory by considering each component separately.)
$A$ will be our iterative map, and we consider $\phi$, $A\phi$, $A^2\phi$, etc., to be the
Picard approximations for the initial value problem. We consider Picard
approximations because of the following
Lemma 18. A continuous fixed point of the Picard approximation (12) is a
solution to the initial value problem (11). In particular, any such fixed point
is continuously differentiable.
Proof. If $A\phi = \phi$, then compute
\[ \dot{\phi} = \frac{d}{dt}\left( x_0 + \int_{t_0}^{t} v(\phi(\tau), \tau)\, d\tau \right) = v(\phi(t), t) \]
by the Fundamental Theorem of Calculus. In particular, since $\phi$ and $v$ are
continuous (Lemma 13), $\dot{\phi}$ is continuous, and so $\phi$ is continuously
differentiable. Lastly, check the initial condition
\[ \phi(t_0) = x_0 + \int_{t_0}^{t_0} v(\phi(\tau), \tau)\, d\tau = x_0 \]
to complete the proof of the lemma.
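To see the Picard approximations of (12) in a concrete case, consider the assumed example $\dot{x} = x$, $x(0) = 1$ (so $v(x, t) = x$, $t_0 = 0$, $x_0 = 1$). Each iterate is a polynomial, so the map $A$ can be applied exactly on coefficient lists. The sketch below is one way to do this; the representation and the function name are choices made here.

```python
from fractions import Fraction

def picard_step(coeffs):
    """Apply (A phi)(t) = 1 + integral_0^t phi(s) ds to phi(t) = sum coeffs[k] * t**k."""
    # Integrating t**k gives t**(k+1) / (k+1); the constant term starts at 0.
    integrated = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    integrated[0] += 1  # add the initial value x0 = 1
    return integrated

phi = [Fraction(1)]          # start from the constant function phi(t) = 1
for _ in range(4):
    phi = picard_step(phi)

print(phi)                   # coefficients of 1 + t + t^2/2 + t^3/6 + t^4/24
print(float(sum(phi)))       # value of the fourth iterate at t = 1
```

The iterates are the Taylor partial sums of $e^t$, whose limit is the fixed point of $A$ and hence, by Lemma 18, the solution of the initial value problem.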

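Our complete metric space will be
\[ X = \{ \phi \in C^0(\tilde{I}, \mathbb{R}^n) : \phi(t_0) = x_0,\ \sup_{t \in \tilde{I}} |\phi(t) - x_0| \le P \}, \]
where $\tilde{I} = [t_0 - \epsilon, t_0 + \epsilon] \subset I$ for a small positive $\epsilon$ to be determined later,
$|\cdot|$ is the norm on $\mathbb{R}^n$, and $P$ is chosen so that the closed ball $\bar{B}_{x_0}(P) = \{x :
|x - x_0| \le P\} \subset O$. We first demonstrate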
Lemma 19. $X$ is a complete metric space.
Proof. First of all, $C^0(\tilde{I}, \mathbb{R}^n)$ is complete by Proposition 1. Moreover, the
conditions imposed give closed subsets of the Banach space $C^0$. The second
condition is obviously closed since the norm on any Banach space is continuous.
To check the condition $\phi(t_0) = x_0$ is closed, use the following lemma,
whose proof is immediate:
Lemma 20. For a metric space $J$ and $y \in J$, the map from the Banach
space $C^0(J, \mathbb{R}^n)$ to $\mathbb{R}^n$ given by $f \mapsto f(y)$ is continuous.
Since these two conditions are closed, $X$ is a closed subset of the complete
metric space $C^0(\tilde{I}, \mathbb{R}^n)$, and so is complete with the induced metric.
Remark. Lemma 20 is false for the Banach space $L^\infty$. Why?
So we have proved that X is a complete metric space. Next we show
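Lemma 21. For $\epsilon > 0$ small enough, $A : X \to X$.
Proof. First of all, choose $\delta > 0$ so that $[t_0 - \delta, t_0 + \delta] \subset I$. Since $v$ is
continuous and $\{x : |x - x_0| \le P\} \times [t_0 - \delta, t_0 + \delta]$ is compact, there is a
constant $M$ so that
\[ \sup_{|t - t_0| \le \delta,\ |x - x_0| \le P} |v(x, t)| \le M. \]
In order for this bound to work below, we must have $\epsilon \le \delta$ (so then $\tilde{I} \subset
[t_0 - \delta, t_0 + \delta]$). To check $A : X \to X$, we need to check for each $\phi \in X$,
1. $A\phi$ is continuous. This follows as in Lemma 18 above.
2. $(A\phi)(t_0) = x_0$. This is easy to check as in Lemma 18.
3. $\sup_{t \in \tilde{I}} |(A\phi)(t) - x_0| \le P$. To check this, write
\[ |(A\phi)(t) - x_0| = \left| \int_{t_0}^{t} v(\phi(\tau), \tau)\, d\tau \right| \le M |t - t_0| \le M\epsilon, \]
where we have used the fact that $\phi \in X$ and the definition of $M$ to
show the first inequality. So this condition is satisfied if $\epsilon \le P/M$.
So $A : X \to X$ if $\epsilon \le \min\{\delta, P/M\}$.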
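Finally we use the Lipschitz hypothesis on $v$ to show that $A$ is a contraction
map. Let $L$ be the Lipschitz constant for $v$. Then for $\phi, \psi \in X$,
compute
\begin{align*}
|(A\phi)(t) - (A\psi)(t)| &= \left| \int_{t_0}^{t} [v(\phi(\tau), \tau) - v(\psi(\tau), \tau)]\, d\tau \right| \\
&\le \left| \int_{t_0}^{t} |v(\phi(\tau), \tau) - v(\psi(\tau), \tau)|\, d\tau \right| \\
&\le \left| \int_{t_0}^{t} L |\phi(\tau) - \psi(\tau)|\, d\tau \right| \\
&\le L \|\phi - \psi\|_{C^0} |t - t_0| \\
&\le L\epsilon \|\phi - \psi\|_{C^0}.
\end{align*}
Then since $\|A\phi - A\psi\|_{C^0} = \sup_{t \in \tilde{I}} |(A\phi)(t) - (A\psi)(t)|$, we see that
\[ \|A\phi - A\psi\|_{C^0} \le L\epsilon \|\phi - \psi\|_{C^0}. \]
So $A$ is a contraction map if $\epsilon < 1/L$. Thus all together, if we require
$\epsilon < \min\{\delta, P/M, 1/L\}$, then $A$ is a contraction map on $X$, and its fixed
point is a solution to the initial value problem.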
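The contraction estimate above can be observed on a discretized version of the Picard map. In the sketch below, the right-hand side $v(x, t) = \sin x$ (Lipschitz constant 1), the data $t_0 = 0$, $x_0 = 0$, the half-width 0.5 of the interval, and the two test functions are all assumptions chosen for illustration; the integral in (12) is approximated by the trapezoid rule on a uniform grid.

```python
import math

def picard_map(phi_vals, ts, x0, v):
    """Approximate (A phi)(t_i) = x0 + integral_{t0}^{t_i} v(phi(s), s) ds.

    ts is a uniform grid whose middle entry is t0; the trapezoid rule is used.
    """
    mid = len(ts) // 2
    integrand = [v(p, t) for p, t in zip(phi_vals, ts)]
    out = [x0] * len(ts)
    for i in range(mid + 1, len(ts)):      # accumulate forward from t0
        out[i] = out[i - 1] + 0.5 * (ts[i] - ts[i - 1]) * (integrand[i] + integrand[i - 1])
    for i in range(mid - 1, -1, -1):       # accumulate backward from t0
        out[i] = out[i + 1] - 0.5 * (ts[i + 1] - ts[i]) * (integrand[i] + integrand[i + 1])
    return out

def v(x, t):
    return math.sin(x)                     # Lipschitz in x with constant L = 1

eps, n, L = 0.5, 100, 1.0
ts = [-eps + 2 * eps * i / n for i in range(n + 1)]
phi = [t for t in ts]                      # phi(t) = t, with phi(0) = 0
psi = [t * t for t in ts]                  # psi(t) = t^2, with psi(0) = 0

A_phi = picard_map(phi, ts, 0.0, v)
A_psi = picard_map(psi, ts, 0.0, v)
dist = max(abs(p - q) for p, q in zip(phi, psi))
lhs = max(abs(p - q) for p, q in zip(A_phi, A_psi))
print(lhs, L * eps * dist)
```

The sup-norm distance between the images is bounded by $L \cdot 0.5$ times the distance between the inputs, matching the estimate with interval half-width 0.5.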
In order to show uniqueness of the initial value problem, note that the
Contraction Mapping Theorem automatically proves that any two continuous
solutions $\phi_1$ and $\phi_2$ to the initial value problem from $\tilde{I}$ to $\mathbb{R}^n$ must coincide
if the additional constraint
\[ \sup_{t \in \tilde{I}} |\phi(t) - x_0| \le P \]
is satisfied. Since $\phi_1$ and $\phi_2$ are continuous and satisfy the initial condition,
this condition is automatically satisfied for both $\phi_1$ and $\phi_2$ on a (perhaps
smaller) interval $\hat{I} \subset \tilde{I}$ containing $t_0$. Then uniqueness applies on this smaller
interval, since $A$ is a contraction map for any $\epsilon$ small enough. Note that the
interval $\hat{I}$ on which $\phi_1 = \phi_2$ may depend on $\phi_1$ and $\phi_2$. The proof that the
two solutions must coincide on all of $\tilde{I}$ depends on the Extension Theorem 6
below.
We record what we have proven so far with respect to uniqueness here.
Proposition 22. Any two solutions $\phi_1$ and $\phi_2$ to the initial value problem
(11) coincide on a small interval containing $t_0$. The interval may depend on
the solutions $\phi_1$ and $\phi_2$.
Remark. Note that in the proof of the previous theorem, we only use that
$v$ is Lipschitz in the $x$ variables (with a uniform Lipschitz constant
valid for all $t$). We still require $v$ to be continuous in $t$.
The previous theorem provides a continuously differentiable solution on
an interval $\tilde{I}$ containing the initial time $t_0$ and proves uniqueness on a
(perhaps) smaller interval $\hat{I}$. There is a satisfactory more global theory of ODEs
which we detail in the next subsection.
2.3 Extension of solutions

Recall, from Corollary 16 above, that any locally $C^1$ function $f$ from $\Omega$, a
domain in $\mathbb{R}^n$, to $\mathbb{R}^m$ is locally Lipschitz. In other words, $f$ is Lipschitz when
restricted to any compact subset of $\Omega$.
Theorem 6 (Extension). Consider an initial value problem
\[ \dot{x} = v(x, t), \quad x(t_0) = x_0. \tag{13} \]
Assume $v$ is continuous and locally Lipschitz in $\mathbb{R}^n \times I$, where $I$ is an open
interval containing $t_0$. Then there is an open interval $J$ satisfying $t_0 \in J \subset I$
and a unique solution $\phi : J \to \mathbb{R}^n$ to the initial value problem. Moreover,
$J$ is maximal in the following sense: if there is a time $T \in I \cap \partial J$, then
$\limsup_{t \to T} |\phi(t)| = \infty$.
So this theorem says that if we start with an initial condition $x(t_0) = x_0$
and flow forward (or backward) in time by satisfying the ODE, then there
is a unique solution which continues until (1) the end of the interval $I$ is
reached, or (2) the solution blows up.
Proof. We first consider the following lemma, which is a consequence of the
proof of Theorem 5 above:
Lemma 23. On any compact subset $K$ of $\mathbb{R}^n \times I$, there is an $\epsilon > 0$ so
that for any $(x_0, t_0) \in K$, there exists a solution to the initial value problem
$\dot{x} = v(x, t)$, $x(t_0) = x_0$ which is valid on $[t_0 - \epsilon, t_0 + \epsilon]$.
The point is that there is a uniform $\epsilon$ which works for all initial conditions
$(x_0, t_0) \in K$.
Proof. Recall that in the proof of Theorem 5, any $\epsilon < \min\{\delta, P/M, 1/L\}$
works. By compactness of $K$ and since $I$ is open, we can choose a uniform
$\delta > 0$ so that for all $(x_0, t_0) \in K$, $[t_0 - \delta, t_0 + \delta] \subset I$. We may choose $P$ to
be any positive number (since $O = \mathbb{R}^n$ in the present case). The Lipschitz
constant $L = L_{\tilde{K}}$ is uniform over any compact set $\tilde{K}$ by the locally Lipschitz
property of $v$ (Proposition 17). Let
\[ M = \max_{(x, t) \in \tilde{K}} |v(x, t)|, \]
where
\[ \tilde{K} = \{(x, t) \in \mathbb{R}^{n+1} : \exists (x_0, t_0) \in K : |t - t_0| \le \delta,\ |x - x_0| \le P\}. \]
It is straightforward to check $\tilde{K}$ is compact (it is the image of the compact
set $K \times \bar{B}_P(0) \times [-\delta, \delta] \subset \mathbb{R}^{n+1} \times \mathbb{R}^{n+1}$ under the continuous map
$+ : \mathbb{R}^{n+1} \times \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$.) Therefore, since $v$ is continuous, $M$ can be chosen
independently of $(x_0, t_0) \in K$.
(Note the reason we need to go to all of $\tilde{K}$: the definition of $M$ in the
proof of Theorem 5 above is
\[ M = \sup_{|t - t_0| \le \delta,\ |x - x_0| \le P} |v(x, t)|. \]
In order to have a single $M$ work for all $(x_0, t_0) \in K$, we must let
$(x, t) \in \tilde{K}$. $L$ must be valid on all of $\tilde{K}$ as well, since we consider integrals
from $t_0$ to $t$, where $(x_0, t_0) \in K$, $|t - t_0| < \epsilon$.)
Now we must ensure that $\epsilon < \min\{\delta, P/M, 1/L\}$. All of these quantities
can be chosen independently of $(x_0, t_0) \in K$.
Lemma 24 (Gluing solutions). Consider any two solutions to $\dot{x} = v(x, t)$
which are defined on intervals in $\mathbb{R}$. If the two coincide on any interval
in $\mathbb{R}$ then they must coincide on the entire intersection of their intervals of
definition. Thus they can be glued together to form a solution on the union
of their intervals of definition.
Proof. Consider two solutions $\phi_1, \phi_2$ to $\dot{x} = v(x, t)$ defined on intervals $I_1$
and $I_2$. Assume they coincide on an interval $I_3 \subset I_1 \cap I_2$. We want to show
$\phi_1 = \phi_2$ on all of $I_1 \cap I_2$. Let $I_4$ be the largest interval containing $I_3$ on which
$\phi_1$ and $\phi_2$ coincide (take $I_4$ to be the path-connected component of the closed
set $\{t : \phi_1(t) = \phi_2(t)\}$ containing $I_3$). Now we will show that $I_4 = I_1 \cap I_2$.
Assume $I_4 \neq I_1 \cap I_2$. Then since $I_4$ is a relatively closed subinterval of
$I_1 \cap I_2$, there is an endpoint $T$ of $I_4$ in the interior of $I_1 \cap I_2$. Now $\phi_1$ and $\phi_2$
are both solutions of
\[ \dot{x} = v(x, t), \quad x(T) = \phi_1(T)\ [= \phi_2(T)]. \]
Proposition 22 shows that $\phi_1$ and $\phi_2$ must agree on a small interval $I_5 \ni T$.
Thus $I_4$ must contain $I_5$, and we have a contradiction to the assumption that
$T$ is an endpoint of $I_4$ in the interior of $I_1 \cap I_2$. Thus $I_4 = I_1 \cap I_2$.
It may help to refer to the following picture of the intervals involved.
[Figure: the intervals $I_1$, $I_2$, $I_1 \cap I_2$, $I_3$, $I_4$ (with endpoint $T$), and $I_5$.]
Now we have proved that $\phi_1 = \phi_2$ on the intersection of their domains of
definition $I_1 \cap I_2$. To extend to $I_1 \cup I_2$, define
\[ \phi(t) = \begin{cases} \phi_1(t) & \text{for } t \in I_1, \\ \phi_2(t) & \text{for } t \in I_2 \setminus I_1. \end{cases} \]
Note that $\phi$ is a solution to the differential equation since both $\phi_1$ and $\phi_2$
are. There is no trouble with the differentiability of this piecewise-defined
function since $\phi_1 = \phi_2$ on the whole interval $I_1 \cap I_2$.
For simplicity, consider only solutions moving forward in time. Let
\[ E = \{t \in I_+ : \text{there is a unique solution } \phi \text{ to (13) on } [t_0, t)\}, \]
where $I_+ = I \cap (t_0, \infty)$. We will set this $E$ to be equal to $J_+ = J \cap (t_0, \infty)$.
Uniqueness on $[t_0, t)$ means any other solution to the initial value problem
defined on an interval containing $[t_0, t)$ must coincide with $\phi$ there. It will
suffice to prove the following
Lemma 25. If $\sup_E |\phi| \le C < \infty$, then $E = I_+$.
Proof. Assume $|\phi|$ is uniformly bounded on $E$. Then to prove the lemma it
is enough to show that $E$ is a nonempty, open, and closed subset of $I_+$ (and
so $E = I_+$ since $I_+$ is connected). $E$ is nonempty by Theorem 5 and Lemma
24 above.
To show $E$ is open in $I_+$, let $T \in E$. Then there is a unique solution $\phi$
defined on $[t_0, T)$. First we note that $(t_0, T] \subset E$. To see this, let $T' \in (t_0, T]$.
Then the restriction of $\phi = \phi_T$ to $[t_0, T')$ is a solution to (13) on $[t_0, T')$.
Moreover, it is unique, since any other solution to (13) on $[t_0, T')$ agrees with
$\phi$ on a neighborhood of $t_0$, and so Lemma 24 shows they must agree on all of
$[t_0, T')$.
So to show $E$ is open, we may restrict our attention to times larger than
$T$. Since $|\phi|$ is uniformly bounded by $C$ and $[t_0, T]$ is a compact subinterval
of $I$, we may apply Lemma 23 to show there is a uniform $\epsilon$ so that any solution
to the differential equation with initial condition $x(\tau) = \xi$ for $\tau \in [t_0, T]$,
$|\xi| \le C$ must exist on $[\tau - \epsilon, \tau + \epsilon]$. Now we may consider the initial value
problem
\[ \dot{x} = v(x, t), \quad x(T - \tfrac{\epsilon}{2}) = \phi(T - \tfrac{\epsilon}{2}). \tag{14} \]
So Lemma 23 shows there is a solution $\psi$ to this initial value problem which
exists on $[T - \tfrac{3\epsilon}{2}, T + \tfrac{\epsilon}{2}]$. Moreover, Lemma 24 says that $\phi = \psi$ on the
intersection of their intervals of definition, and moreover, that $\phi$ may be
extended by $\psi$ to a solution on $[t_0, T + \tfrac{\epsilon}{2}]$. Lemma 24 also implies this
extension is unique on every subinterval containing $t_0$, and so in particular
$[T - \tfrac{\epsilon}{2}, T + \tfrac{\epsilon}{2}] \subset E$ and $E$ is open.
[Figure: the intervals $[t_0, T]$ and $[T - \tfrac{3\epsilon}{2}, T + \tfrac{\epsilon}{2}]$, with the points $T - \tfrac{\epsilon}{2}$ and $T$ marked.]
It remains to show that $E$ is closed in $I_+$. Let $T \in \bar{E} \cap I_+$. Let $t_i \in E$,
$t_i \to T$. Then the assumption that $|\phi| \le C$ on $E$ implies there is a uniform
$\epsilon$ so that for all $t_i$, there is a solution on $[t_i - \epsilon, t_i + \epsilon]$. Choose $t_i$ so that
$|T - t_i| < \epsilon$. Also, let $\tau < t_i$ so that $|T - \tau| < \epsilon$. Now we use the same
argument as in previous paragraphs: Use the solution $\phi$ on $[t_0, t_i)$ to construct
a solution $\psi$ on $[\tau - \epsilon, \tau + \epsilon] \ni T$. Lemma 24 allows us to glue $\phi$ and $\psi$ together
to form a unique solution valid on $[t_0, \tau + \epsilon] \ni T$. So $T \in E$ as above and $E$
is closed in $I_+$.
[Figure: the intervals $[t_0, T]$ and $[\tau - \epsilon, \tau + \epsilon]$, with the points $T - \epsilon$, $\tau$, $t_i$, and $T$ marked.]
This Lemma 25 completes the proof of the Extension Theorem 6, at least
for solutions moving forward in time. The reason is this: if there is a time
$T \in I_+ \cap \partial J$ (we may choose $I_+$ since we are only moving forward in time),
then
\[ E = J_+ \neq I_+. \]
Therefore, by the contrapositive of Lemma 25, $\sup_E |\phi| = \infty$. But since $\phi$ is
continuous on $[t_0, T)$, we must have $\limsup_{t \to T} |\phi(t)| = \infty$.
The argument for solutions moving backward in time is the same.
The above theorem may be improved as follows:
Theorem 7. Consider an initial value problem $\dot{x} = v(x, t)$, $x(t_0) = x_0$.
Assume $v$ is continuous and locally Lipschitz in $U$, where $U$ is a connected
open subset of $\mathbb{R}^n \times \mathbb{R}$ containing $(x_0, t_0)$. Then there is an open interval
$J$ satisfying $t_0 \in J$ and a unique solution $\phi : J \to \mathbb{R}^n$ to the initial value
problem. Moreover, $J$ is maximal in the following sense: Let $J_+ = J \cap (t_0, \infty)$
and $J_- = J \cap (-\infty, t_0)$. Then neither of the graphs $G_\pm = \{(t, \phi(t)) : t \in J_\pm\}$
is contained in any compact subset of $U$.
The proof is essentially the same as that of Theorem 6.

Here is an important principle which follows from the basic theorems:
Proposition 26. Consider the graph of a solution $(t, x(t))$ to a differential
equation $\dot{x} = v(x, t)$, where $v$ is Lipschitz. If any two solutions have graphs
which cross, then they must coincide on the intersection of their intervals of
definition.
Proof. Let $\phi_1$ and $\phi_2$ be the two solutions. If their graphs cross at $(t_0, x_0)$,
then they both solve the initial value problem
\[ \dot{x} = v(x, t), \quad x(t_0) = x_0. \]
The solutions must coincide on a small interval by Proposition 22, and then
must coincide on the whole intersection of their intervals of definition by
Lemma 24.
Homework Problem 15. Consider the initial value problem $\dot{x} = x^2 + t$,
$x(0) = 1$. Show that the solution to this problem (moving forward in time)
exists only until some time $T > 0$, where $T < 1$.
Hint: See Examples 6 and 7 above. Let $\phi(t)$ be the solution to the current
initial value problem. We will compare $\phi$ to the solution $\psi(t) = \frac{1}{1 - t}$ of the
initial value problem $\dot{x} = x^2$, $x(0) = 1$. Let $J$ be the maximal interval on
which $\phi$ can be extended. Let $J_+ = J \cap (0, \infty)$; $T$ is then the positive endpoint
of $J_+$. Now consider the interval
\[ E = \{t \in J_+ : \phi(\tau) \ge \psi(\tau) \text{ for all } \tau \in (0, t]\}. \]
(a) Show that $E = J_+$ implies $T \le 1$. (Use Theorem 6.)
(b) Proceed to show $E = J_+$. It suffices to show $E$ is nonempty, open and
closed in $J_+$. Why?
(c) To show $E$ is nonempty, differentiate the equation $\dot{\phi} = \phi^2 + t$ at $t = 0$.
This will allow you to compute $\ddot{\phi}(0)$. Show that $\phi(0) = \psi(0)$, $\dot{\phi}(0) =
\dot{\psi}(0)$, and $\ddot{\phi}(0) > \ddot{\psi}(0)$. Why does this show $E$ is nonempty? (Use
Taylor's Theorem or integrate in $t$ twice; in particular, by the regularity
results in Subsection 2.5 below, $\ddot{\phi}$ is continuous.)
(d) To show $E$ is open, show that $\dot{\phi}(t) > \dot{\psi}(t)$ for $t \in E$.
(e) To show $E$ is closed, use the continuity of $\phi$ and $\psi$. So this proves
$E = J_+$ and so $T \le 1$.
(f) To show $T < 1$, note that part (c) implies there is a point $\tau \in E$
where $\phi(\tau) > \psi(\tau)$. Let $\tilde{\psi}(t)$ be the solution to the initial value problem
$\dot{x} = x^2$, $x(\tau) = \phi(\tau)$. Solve this equation explicitly and show that $\tilde{\psi}$
blows up at a time $\tilde{T} < 1$. Then note that parts (a)-(e) can be repeated
to show that $J_+ \subset (0, \tilde{T})$.
2.4 Linear systems

If $x \in \mathbb{R}^n$, a homogeneous linear system is a system of the form $\dot{x} = A(t)x$,
where $A(t)$ is an $n \times n$ matrix valued function of $t$ alone. In this case, it is
straightforward to see that the space of solutions is a vector space over $\mathbb{R}$.
In other words, if $\alpha \in \mathbb{R}$ and $\phi$, $\psi$ satisfy the equation, then $\alpha\phi + \psi$ also satisfies
the equation. The existence and uniqueness theorem allows us to find the
dimension of the solution space.
Proposition 27. Consider the equation $\dot{x} = A(t)x$, where $A(t)$ is a continuous
$n \times n$ matrix valued function of $t$, and $x(t) \in \mathbb{R}^n$. For each $t_0$, there is
an interval $I \ni t_0$ so that the space of solutions $\phi(t)$ on $I$ has dimension $n$.
Consider an initial value condition $x(t_0) = x_0$. Let $\phi_{x_0}(t)$ be the solution to
this initial value problem. Then the map $S : x_0 \mapsto \phi_{x_0}$ is a linear isomorphism
from $\mathbb{R}^n$ to the space of solutions defined on $I$.
Remark. It is not too hard to show that the interval I can be taken to be
the maximal open interval containing t0 on which A(t) is continuous. (See
Michael Taylor, Partial Differential Equations, Basic Theory.)
Proof. $A(t)x$ is locally Lipschitz in $x$ and continuous in $t$, as needed for
Theorems 5 and 6. First of all, for a basis $e_i$ of $\mathbb{R}^n$, let $I$ be a small interval
on which all the solutions $\phi_{e_i}$ exist. Note the map $x_0 \mapsto \phi_{x_0}$ is obviously
linear. $S$ is injective since if $x_0 \neq y_0$, $\phi_{x_0}(t_0) \neq \phi_{y_0}(t_0)$, and thus $\phi_{x_0} \neq \phi_{y_0}$.
Therefore, if $x_0 = a^i e_i$, $\phi_{x_0} = a^i \phi_{e_i}$. Again by uniqueness, any solution $\phi$ to
$\dot{x} = A(t)x$ is determined by the initial value $\phi(t_0) = x_0$, and so $S$ is onto.
Given a linear equation $\dot{x} = A(t)x$, for $x = x(t) \in \mathbb{R}^n$, we can consider a
similar equation $\dot{X} = A(t)X$ for $X = X(t)$ an $n \times n$ matrix valued function.
The solution $\Phi(t)$ of the initial value problem
\[ \dot{X} = A(t)X, \quad X(t_0) = I \text{ the identity matrix}, \]
is called the fundamental solution of the equation $\dot{x} = A(t)x$. It is
straightforward to see that the $i$th column of $\Phi(t)$ is the solution to $\dot{x} = A(t)x$,
$x^j(t_0) = \delta^j_i$. Moreover, the fundamental solution can be used to compute any
solution to the differential equation near $t_0$.
Lemma 28. On the maximal interval of existence of the fundamental solution
$\Phi(t)$ of $\dot{x} = A(t)x$, the solution to the initial value problem
\[ \dot{x} = A(t)x, \quad x(t_0) = x_0, \]
is given by $\Phi(t)x_0$.
Proof. The proof is an immediate calculation.
Homework Problem 16. An inhomogeneous linear system is a system of
the form
\[ \dot{x} = A(t)x + b(t), \tag{15} \]
where $A(t)$ and $x$ are as above and $b(t)$ is a continuous $\mathbb{R}^n$-valued function.
(a) Let $\psi(t)$ be a solution to (15). Show that the solution space to (15) is
equal to
\[ \{\psi(t) + \phi(t) : \phi(t) \text{ solves } \dot{x} = A(t)x\}. \]
(b) In dimension 1, let $\Phi(t)$ be the fundamental solution to $\dot{x} = A(t)x$.
Show that the general solution to (15) is
\[ \Phi(t) \left( \int \frac{b(t)}{\Phi(t)}\, dt + C \right). \]
(c) Still in dimension 1, solve the initial value problem $\dot{x} = x + t$, $x(0) = 1$.

An important class of examples consists of equations with constant
coefficients, $\dot{x} = Ax$, for $A$ a constant $n \times n$ matrix. The fundamental solution to
such an equation (with $t_0 = 0$) can be calculated directly. In the case that $A$
is diagonalizable, write $A = PDP^{-1}$, with $D = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ the diagonal
matrix with the eigenvalues $\lambda_i$ of $A$ along the diagonal and $P$ the matrix
whose columns are a basis of eigenvectors for the appropriate eigenvalues.
Then if we define
\[ e^{tD} = \mathrm{diag}(e^{t\lambda_1}, \dots, e^{t\lambda_n}), \]
then the fundamental solution to $\dot{x} = Ax$ is given by
\[ e^{tA} = P e^{tD} P^{-1}. \]
To check that $e^{tA}$ is the fundamental solution, note that $e^{0A} = I$ and
\begin{align*}
\frac{d}{dt} e^{tA} &= \frac{d}{dt} (P e^{tD} P^{-1}) \\
&= P \left( \frac{d}{dt} e^{tD} \right) P^{-1} \\
&= P D e^{tD} P^{-1} \\
&= P D P^{-1} P e^{tD} P^{-1} \\
&= A e^{tA}.
\end{align*}
One thing to note is $D$ and $P$ may be complex-valued matrices. This doesn't
cause any problem if we use Euler's formula
\[ e^{x + iy} = e^x (\cos y + i \sin y). \]
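The formula $e^{tA} = P e^{tD} P^{-1}$ can be tested on a small example. In the sketch below, the matrix $A$ with rows $(0, 1)$ and $(-2, -3)$ is a hypothetical example chosen here (eigenvalues $-1$ and $-2$, eigenvectors $(1, -1)$ and $(1, -2)$). The code checks $e^{0A} = I$ and compares a central-difference derivative of $e^{tA}$ with $A e^{tA}$.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

# Hypothetical example: A = P D P^{-1} with D = diag(-1, -2).
A = [[0.0, 1.0], [-2.0, -3.0]]
P = [[1.0, 1.0], [-1.0, -2.0]]       # columns are eigenvectors of A
P_inv = [[2.0, 1.0], [-1.0, -1.0]]
eigenvalues = [-1.0, -2.0]

def exp_tA(t):
    """Fundamental solution e^{tA} = P e^{tD} P^{-1}."""
    exp_tD = [[math.exp(t * eigenvalues[0]), 0.0], [0.0, math.exp(t * eigenvalues[1])]]
    return matmul(matmul(P, exp_tD), P_inv)

t, h = 0.7, 1e-6
plus, minus = exp_tA(t + h), exp_tA(t - h)
derivative = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

residual = max_abs_diff(derivative, matmul(A, exp_tA(t)))
identity_error = max_abs_diff(exp_tA(0.0), [[1.0, 0.0], [0.0, 1.0]])
print(residual, identity_error)
```

Both eigenvalues in this example are negative, so every entry of $e^{tA}$ decays as $t$ grows.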
Not every matrix B is diagonalizable. To find a general formula for the
fundamental solution etB , we need to deal with the case of Jordan blocks.
The following problem addresses this.
Homework Problem 17. Let $B$ be the $n \times n$ Jordan block matrix
\[ \begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
0 & 0 & \lambda & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda
\end{pmatrix} \tag{16} \]
with $\lambda$ on the diagonal, 1 just above the diagonal, and 0 elsewhere. Find the
fundamental solution $e^{tB}$ to $\dot{x} = Bx$.
Hint: Write out the system of equations in terms of components. Note
that $\dot{x}^n$ only involves $x^n$ and not any other $x^i$. So first solve the appropriate
initial value problems for $x^n$ (you'll need to do one initial value problem for
each column of the identity matrix $I$). Then do $x^{n-1}$, then $x^{n-2}$, etc., and
find a formula that works for all $x^i$.
Alternatively, it is possible to write out etB as a power series. If you
approach the problem this way, you must check to be sure your answer works.
Of course the reason we consider Jordan blocks is the following famous

theorem.
Theorem 8 (Jordan Canonical Form). Let $A$ be an $n \times n$ complex matrix.
Then we can write $A = PBP^{-1}$, where $B$ is an upper triangular, block
diagonal matrix of the form
\[ B = \begin{pmatrix}
B_1 & 0 & 0 & \cdots & 0 \\
0 & B_2 & 0 & \cdots & 0 \\
0 & 0 & B_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & B_m
\end{pmatrix}, \]
where each $B_i$ is an $l_i \times l_i$ Jordan block matrix of the form (16) for $i =
1, \dots, m$, $\lambda = \lambda_i$ an eigenvalue of $A$. Of course $l_1 + \cdots + l_m = n$. If $\lambda$ is a
root of the characteristic polynomial $\det(\lambda I - A)$ repeated $k$ times, then
\[ \sum_{\lambda_i = \lambda} l_i = k. \]
$B$ is unique up to the ordering of the blocks $B_i$.
Remark. $A$ is diagonalizable if and only if each Jordan block is $1 \times 1$. If
the characteristic polynomial of $A$ has distinct roots, then $A$ is diagonalizable,
but the converse is false in general ($A = I$ the identity matrix is a
counterexample).
Homework Problem 18. Assume that all the eigenvalues of the $n \times n$
matrix $A$ have negative real part. ($A$ is not necessarily diagonalizable.) Show
that $e^{tA} \to 0$ as $t \to \infty$. (Just check that each entry in the matrix $e^{tA}$ goes
to 0.)
Homework Problem 19. Solve the initial value problem
\[ \dot{x} = 2x - y, \quad \dot{y} = 2x + 5y, \quad x(0) = 2, \quad y(0) = 1. \]
2.5 Regularity
Regularity of a function refers to how many times the function may be
differentiated. A function is (locally) $C^k$ if it and all of its partial derivatives up to
order $k$ are continuous. A function is $C^\infty$ if it and all of its partial derivatives
of all orders are continuous. For the purposes of this course, a function is
smooth if it is $C^\infty$ (in other settings a function may be called smooth if it has
as many derivatives as the purpose at hand requires). There are other
notions of regularity in which the function and perhaps its derivatives, suitably
defined, are in $L^p$ or other Banach spaces.
A vector-valued function is smooth or $C^k$ if and only if each of its
component functions is smooth or $C^k$ respectively.
Theorem 9. Assume $v : O \times I \to \mathbb{R}^n$ is smooth ($O \subset \mathbb{R}^n$ is a domain and
$I \subset \mathbb{R}$ is an open interval). Any solution to $\dot x = v(x, t)$ is smooth.
Proof. Let $\phi$ be a solution. Since $\dot\phi$ exists, $\phi$ is differentiable, and thus
continuous. Since $v$ is continuous as well, $\dot\phi = v(\phi, t)$ is continuous and so
$\phi$ is (locally) $C^1$. Now since $v$ is smooth, we may differentiate to find
$$\ddot\phi(t) = \frac{\partial v}{\partial x^i}(\phi, t)\, \dot\phi^i(t) + \frac{\partial v}{\partial t}(\phi, t).$$
Now since $\phi$ and $\dot\phi$ and the partial derivatives of $v$ are continuous, we see that
$\ddot\phi$ is continuous and $\phi$ is (locally) $C^2$. Since $v$ is smooth, we can keep
differentiating, using the chain and product rules, to find by induction that $d^m\phi/dt^m$
is continuous for all $m$ and so $\phi$ is $C^\infty$.
Remark. The technique used in the proof of Theorem 9 above is called
bootstrapping. In this process, once we know that $\phi$ is $C^0$, we plug into the
equation to find that $\phi$ is $C^1$. Then we use the fact that $\phi$ is $C^1$ to prove
$\phi$ is $C^2$, etc.
Remark. The proof above also shows that if $v$ is $C^k$, then $\phi$ is $C^{k+1}$.
2.6 Higher order equations

A higher-order system of ODEs is of the form
$$x^{(m)} = v(x^{(m-1)}, \dots, \dot x, x, t), \tag{17}$$
where of course $x^{(m)} = \frac{d^m x}{dt^m}$. There is an easy trick to transform this system to
an equivalent first-order system with more variables. Let $y^1 = \dot x, \dots, y^{m-1} = x^{(m-1)}$.
Then it is easy to see the system (17) above is equivalent to the
system
$$\begin{aligned}
\dot y^{m-1} &= v(y^{m-1}, \dots, y^1, x, t), \\
\dot y^{m-2} &= y^{m-1}, \\
&\;\;\vdots \\
\dot y^1 &= y^2, \\
\dot x &= y^1.
\end{aligned} \tag{18}$$

This first-order system leads us to the appropriate formulation of the
initial-value problem:
Theorem 10. Let $U$ be a neighborhood of $(x_0^{m-1}, \dots, x_0^1, x_0, t_0)$ in
$\mathbb{R}^{nm+1} = \mathbb{R}^n \times \cdots \times \mathbb{R}^n \times \mathbb{R}$. Let $v : U \to \mathbb{R}^n$ be locally Lipschitz. Then there is an
interval $J$ on which there is a unique solution to the initial value problem
$$\begin{aligned}
x^{(m)} &= v(x^{(m-1)}, \dots, \dot x, x, t), \\
x^{(m-1)}(t_0) &= x_0^{m-1}, \\
&\;\;\vdots \\
\dot x(t_0) &= x_0^1, \\
x(t_0) &= x_0.
\end{aligned} \tag{19}$$
Moreover, if $T$ is an endpoint of $J$ (either finite or infinite), then as $t \to T$,
$(x^{(m-1)}, \dots, \dot x, x, t)$ leaves every compact subset of $U$.
Proof. Apply Theorems 5 and 7.
So for an $m$th order differential equation, we need initial conditions for
the function and its derivatives up to order $m - 1$.
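The reduction to a first-order system is also how higher-order equations are usually handed to a numerical solver. The sketch below is an illustration under assumptions that are not in the notes: it takes the second-order equation $\ddot x = -x$ with $x(0) = 1$, $\dot x(0) = 0$ (exact solution $\cos t$), rewrites it as $\dot x = y^1$, $\dot y^1 = -x$, and integrates with a hand-written classical Runge-Kutta step. The function names and the step count are hypothetical choices.

```python
import math

def first_order_system(state, t):
    """x'' = -x rewritten with state = (x, y1), y1 = x': returns (x', y1')."""
    x, y1 = state
    return [y1, -x]

def rk4_step(f, state, t, h):
    """One classical fourth-order Runge-Kutta step for state' = f(state, t)."""
    k1 = f(state, t)
    k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)], t + 0.5 * h)
    k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)], t + 0.5 * h)
    k4 = f([s + h * k for s, k in zip(state, k3)], t + h)
    return [s + h * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def solve(x0, v0, t_end, steps=1000):
    """Integrate from the initial data x(0) = x0, x'(0) = v0 up to t_end."""
    state, t, h = [x0, v0], 0.0, t_end / steps
    for _ in range(steps):
        state = rk4_step(first_order_system, state, t, h)
        t += h
    return state

x, v = solve(1.0, 0.0, 1.0)
print(x, math.cos(1.0))    # position vs exact solution
print(v, -math.sin(1.0))   # velocity vs exact derivative
```

Note that the solver needs both $x(0)$ and $\dot x(0)$, matching the count of initial conditions for a second-order equation.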
Remark. The trick of introducing new variables into a system of ODEs is
standard in physics. For a particle at position $x = x(t)$, a typical equation
involves how a force acts on the particle. The sum $F$ of the forces acting on
the particle must be equal to $m\ddot x$, where $m$ is a constant called the mass. It is
standard to introduce a new vector quantity, called the momentum $q = m\dot x$.
Then $F = m\ddot x$ is equivalent to the system
$$\dot q = F, \quad \dot x = \frac{q}{m}.$$
Again, an important class of examples is linear equations with constant
coefficients. If
$$x^{(m)} + a_{m-1} x^{(m-1)} + \cdots + a_1 \dot x + a_0 x = 0,$$
for $x$ a real-valued function, the functions $\{e^{\lambda_k t}\}$ are linearly independent in
the solution space, if the $\lambda_k$ are distinct solutions of the characteristic equation
$$\lambda^m + a_{m-1} \lambda^{m-1} + \cdots + a_1 \lambda + a_0 = 0.$$
If all the roots are distinct, then $\{e^{\lambda_k t}\}$ form a basis. If a root $\lambda_k$ is repeated $l$
times, then we must consider functions of the form $t^j e^{\lambda_k t}$ for $j = 0, \dots, l - 1$
to form a basis of the solution space.
Euler's formula again allows us to handle complex roots of the
characteristic equation.
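As a small worked illustration of the complex case (the specific equation is chosen here for convenience and does not come from the notes), consider $\ddot x + 2\dot x + 5x = 0$:
$$\begin{aligned}
\lambda^2 + 2\lambda + 5 = 0 \quad &\Longrightarrow \quad \lambda = -1 \pm 2i, \\
e^{(-1 \pm 2i)t} &= e^{-t}(\cos 2t \pm i \sin 2t).
\end{aligned}$$
Taking real and imaginary parts gives the real-valued solutions $e^{-t}\cos 2t$ and $e^{-t}\sin 2t$, which form a basis of the two-dimensional real solution space.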
Homework Problem 20. For which real values of the constants $a$ and $b$ do
all the solutions to
$$\ddot x + a\dot x + bx = 0$$
go to 0 as $t \to \infty$? Prove your answer, and draw your answer as a region in
the $(a, b)$ plane.
2.7 Dependence on initial conditions and parameters

We've shown above that if $v = v(x, t)$ is smooth, then the resulting solution
to $\dot x = v(x, t)$, $x(t_0) = x_0$ is also smooth as a function of $t$. The initial value
problem also depends on the initial point $x_0$. We investigate regularity of
the solution depending on $x_0$.
First of all we remark that there is a neighborhood $N$ of $(x_0, t_0)$ in $\mathbb{R}^{n+1}$
and an $\epsilon > 0$ so that every solution to the equation with initial condition
$x(\tau) = y$ for $(y, \tau) \in N$ exists by Lemma 23. This existence on a
neighborhood allows us to consider taking derivatives in $y$ in what follows.
Theorem 11. Let $v$ be a $C^2$ function on a neighborhood of the initial
conditions $(y, t_0) \in \mathbb{R}^n \times \mathbb{R}$. Then the solution $\phi = \phi(y, t)$ to the initial value
problem
$$\dot x = v(x, t), \quad x(t_0) = y,$$
is $C^1$ in $y$.
Proof. If $\partial\phi/\partial y^i$ exists, then it must satisfy
$$\frac{\partial}{\partial t} D_y\phi = D_x v(\phi, t)\, D_y\phi.$$
(Here $D_y\phi$ is the total derivative matrix with respect to the $y$ variables. So
its entries are $\partial\phi^j/\partial y^i$.) So $\Phi = (\phi, D_y\phi) = (x, z)$ should satisfy the initial
value problem
$$\begin{aligned}
\dot x &= v(x, t), \\
\dot z &= D_x v(x, t)\, z, \\
x(t_0) &= y, \\
z(t_0) &= I \text{ the identity matrix.}
\end{aligned} \tag{20}$$
Note that since $v$ is $C^2$, $D_x v = (\partial v^k/\partial x^j)$ is $C^1$ and is thus locally Lipschitz
by Proposition 15. Even though we don't yet know that the derivative $D_y\phi$
satisfies the equation, we do know that the initial value problem (20) is
solvable.
In order to show the solution to (20) is the partial derivative, we return to the
proof of Theorem 5. Let $\phi_0 = y$, $\psi_0 = I$ the identity matrix. Then $(\phi_0, \psi_0)$
satisfy the initial conditions in (20). Now we form Picard approximations
$$\begin{aligned}
\phi_{n+1}(y, t) &= y + \int_{t_0}^t v(\phi_n(y, \tau), \tau)\, d\tau, \\
\psi_{n+1}(y, t) &= I + \int_{t_0}^t D_x v(\phi_n(y, \tau), \tau)\, \psi_n(y, \tau)\, d\tau.
\end{aligned}$$
It is easy to show by induction that $D_y\phi_n = \psi_n$. We already have the
initial step $n = 0$, and since we can differentiate under the integral sign
(see Proposition 11 above), we can easily check that $D_y\phi_n = \psi_n$ implies
$D_y\phi_{n+1} = \psi_{n+1}$.
We know by the proof of Theorem 5 that $\phi_n \to \phi$ and $\psi_n \to \psi$ uniformly
on a small interval containing $t_0$. Then Proposition 8 shows that $\partial\phi/\partial y^i = \psi_i$,
the $i$th component of $\psi$, for $i = 1, \dots, n$. Since these partial derivatives are
continuous (the uniform limit of continuous functions is continuous),
Proposition 6 shows $D_y\phi = \psi$.
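The system (20) can also be explored numerically. The sketch below is an illustration under assumptions that are not in the notes: it takes the scalar equation $\dot x = x^2$ (so $D_x v = 2x$), whose solution with $x(0) = y$ is $y/(1 - yt)$ in closed form, integrates the pair $(x, z)$ with a hand-written classical Runge-Kutta step, and compares $z$ with the exact derivative in $y$, namely $1/(1 - yt)^2$. The function names and the step count are hypothetical choices.

```python
def rhs(state):
    """System (20) for v(x) = x**2: x' = x^2 and z' = D_x v(x) z = 2 x z."""
    x, z = state
    return (x * x, 2.0 * x * z)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step for the autonomous system rhs."""
    def shifted(k, c):
        return tuple(s + c * h * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, 0.5))
    k3 = rhs(shifted(k2, 0.5))
    k4 = rhs(shifted(k3, 1.0))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def solve_with_derivative(y, t_end, steps=2000):
    """Integrate from x(0) = y, z(0) = 1 and return (x(t_end), z(t_end))."""
    state, h = (y, 1.0), t_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

y, t = 1.0, 0.5
x_num, z_num = solve_with_derivative(y, t)
print(x_num, y / (1 - y * t))         # numerical vs exact solution
print(z_num, 1 / (1 - y * t) ** 2)    # numerical vs exact derivative in y
```

The second component is computed without ever differencing in $y$; it comes entirely from solving the linear equation for $z$ along the solution $x$.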
Remark. The previous theorem is true if we assume $v$ is only $C^1$ and not
necessarily $C^2$. The proof is more involved in the case $v$ is only $C^1$. (See
Taylor, Partial Differential Equations, Basic Theory, section 1.6.)
A bootstrapping argument can be used to prove the following
Proposition 29. For $r \ge 2$, let $v$ be a $C^r$ function on a neighborhood of
the initial conditions $(x_0, t_0) \in \mathbb{R}^n \times \mathbb{R}$. Then the solution $\phi = \phi(y, t)$ to the
initial value problem
$$\dot x = v(x, t), \quad x(t_0) = y,$$
is $C^{r-1}$ in $y$.
Proof. Let Proposition $T_r$ be the proposition for a given $r \ge 2$. We proceed
by induction. The case $r = 2$ is proved above in Theorem 11. Now assume
that Proposition $T_r$ has been proved. To prove $T_{r+1}$, assume that $v$ is
locally $C^{r+1}$ and let $\phi$ be a solution to the initial value problem. Then $D_x v$
is locally $C^r$. Now as above, the pair $(\phi, D_y\phi) = (x, z)$ satisfies
$$\dot x = v(x, t), \quad \dot z = D_x v(x, t)\, z. \tag{21}$$
Now analyze the right-hand side of the equations in (21). They are $C^r$
functions of $x$, $z$, $t$. Therefore, Proposition $T_r$ shows that $z = D_y\phi$ is locally
$C^{r-1}$ in $y$. Since the first partial derivatives of $\phi$ are $C^{r-1}$, $\phi$ is $C^r$. This
proves the inductive step, and the proposition.
We also have the following
Corollary 30. If $v = v(x, t)$ is smooth ($C^\infty$), then the solution $\phi$ to the
initial value problem $\dot x = v(x, t)$, $x(t_0) = y$ is smooth in $y$.
Moreover, it is not too hard to prove the following:
Theorem 12. Let $r \ge 2$. If $v(x, t)$ is $C^r$ jointly in $x$ and $t$, and if $\phi$ is the
solution to $\dot x = v(x, t)$, $x(t_0) = y$, then $\phi$ is jointly $C^{r-1}$ in $y$, $t$ and $t_0$.
Idea of proof. The difficult part is already done (the $C^{r-1}$ dependence on $y$).
For the rest, recall that any solution $\phi = \phi(y, t_0, t)$ satisfies
$$\phi = y + \int_{t_0}^t v(\phi(y, t_0, \tau), \tau)\, d\tau.$$
Then use the Fundamental Theorem of Calculus and Proposition 11 above
to produce a bootstrapping argument to show that the appropriate partial
derivatives are continuous.
For a complete proof, see Arnold, Ordinary Differential Equations,
section 32.5.
Homework Problem 21. For $f = f(x, t, y)$ a smooth function of the real variables
$x$, $t$, and $y$, compute
$$\frac{d}{dt} \int_0^{t^2} f(x(t, y), t, y)\, dy.$$
Make sure your answer works for the functions $f(x, t, y) = x^2 ty + t^3 y^2 + x$,
$x(t, y) = y^2 + t^2$.
Hint: Carefully rename all intermediate variables and apply the Chain
Rule. It also should help to write down the anti-derivative $F = \int f(x(t, y), t, y)\, dy$
and work with the function $F$ using the Fundamental Theorem of Calculus.
Homework Problem 22 (Smooth dependence on parameters). Show
that if $v = v(x, t, \lambda)$ is jointly smooth on a neighborhood of $(x_0, t_0, \lambda_0)$ in
$\mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^m$, then the solution to the initial value problem
$$\dot x = v(x, t, \lambda), \quad x(t_0) = x_0$$
is smooth as a function of $\lambda$.
Hint: Show that this initial value problem is equivalent to the problem
$$\dot x = v(x, t, \lambda), \quad x(t_0) = x_0, \quad \dot\lambda = 0, \quad \lambda(t_0) = \lambda.$$
2.8 Autonomous equations
An ODE system of the form $\dot x = v(x)$ is autonomous. In other words, a
system is autonomous if there is no explicit dependence on $t$. The main fact
about autonomous systems is the following proposition, whose proof is an
easy computation:
Proposition 31. If $\phi$ is a solution to $\dot x = v(x)$, then for all $T \in \mathbb{R}$,
$\tilde\phi(t) = \phi(t + T)$ is also a solution.
A constant solution to an ODE system is called an equilibrium solution.
The equilibrium solutions to autonomous equations correspond to the roots
of $v$.
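Example 9. Consider the initial value problem $\dot x = x^2 - 1$. Then to solve,
we have the equilibrium solutions $x = 1$ and $x = -1$. If $x^2 - 1 \neq 0$, compute
$$\begin{aligned}
\frac{dx}{dt} &= x^2 - 1, \\
\int \frac{dx}{x^2 - 1} &= \int dt, \\
\int \frac12 \left( \frac{1}{x - 1} - \frac{1}{x + 1} \right) dx &= t + C, \\
\frac12 \ln \left| \frac{x - 1}{x + 1} \right| &= t + C, \\
\frac{x - 1}{x + 1} &= \pm e^{2t + 2C} \\
&= A e^{2t}, \qquad A = \pm e^{2C} \neq 0, \\
x &= \frac{1 + A e^{2t}}{1 - A e^{2t}}, \\
A &= \frac{x(0) - 1}{x(0) + 1}.
\end{aligned}$$
If $x(0) \in (-1, 1)$, then $A < 0$, and the solution $x$ exists for all time and
is bounded between the equilibrium solutions at $1$ and $-1$. Moreover, $x$
approaches the equilibrium solutions $x \to -1$ as $t \to \infty$ and $x \to 1$ as
$t \to -\infty$. If $x(0) > 1$, then $A \in (0, 1)$ and the solution exists only for
$t \in (-\infty, -\frac12 \ln A)$. If $x(0) < -1$, then $A > 1$ and the solution exists only
for $t \in (-\frac12 \ln A, \infty)$.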
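The formulas of Example 9 are easy to test numerically. The following is a minimal sketch, not part of the original notes: it evaluates the closed-form solution, checks by a centered difference that it satisfies $\dot x = x^2 - 1$, and reports the finite endpoint $-\frac12 \ln A$ of the interval of existence when $A > 0$. The function names and sample initial values are hypothetical choices.

```python
import math

def closed_form(x0, t):
    """x(t) = (1 + A e^{2t}) / (1 - A e^{2t}) with A = (x0 - 1) / (x0 + 1)."""
    a = (x0 - 1.0) / (x0 + 1.0)
    e = a * math.exp(2.0 * t)
    return (1.0 + e) / (1.0 - e)

def blow_up_time(x0):
    """Finite endpoint -(1/2) ln A of the interval of existence, or None if A <= 0."""
    a = (x0 - 1.0) / (x0 + 1.0)
    return -0.5 * math.log(a) if a > 0 else None

def residual(x0, t, h=1e-6):
    """|x'(t) - (x(t)^2 - 1)| with x' approximated by a centered difference."""
    deriv = (closed_form(x0, t + h) - closed_form(x0, t - h)) / (2 * h)
    return abs(deriv - (closed_form(x0, t) ** 2 - 1.0))

print(residual(0.5, 0.3))        # small: the formula solves the equation
print(closed_form(0.5, 20.0))    # close to the equilibrium -1
print(blow_up_time(3.0))         # positive: solution stops existing in the future
print(blow_up_time(-3.0))        # negative: solution stops existing in the past
```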
This behavior is typical of the behavior of scalar autonomous equations for
Lipschitz $v$. Any bounded solution which exists for all time must be asymptotic
to equilibrium solutions as $t \to \pm\infty$. Also note that any integral curve $I$
acts as a barrier to other solutions, in that no other integral curves can cross
$I$ (see Proposition 26 above).
Homework Problem 23. Let $v : \mathbb{R} \to \mathbb{R}$ be locally Lipschitz. Show
that any bounded solution $\phi$ of $\dot x = v(x)$ which exists for all time satisfies
$\lim_{t \to \infty} \phi(t) = c$, where $v(c) = 0$.
Hint: There are three cases:
Case 1: $v(\phi(0)) = 0$. Show that $\phi$ is constant by uniqueness.
Case 2: $v(\phi(0)) > 0$. Show that $v(\phi(t)) > 0$ for all $t$ (if it is ever equal to
zero, apply the argument of Case 1 above to show $\phi$ is constant; also use the
continuity of $v \circ \phi$). Now show $\phi(t)$ is always increasing, and so must have
a finite limit $c$ as $t \to \infty$. Compute $\lim_{t \to \infty} v(\phi(t))$. Write
$$\infty > c = \phi(0) + \int_0^\infty \dot\phi(t)\, dt = \phi(0) + \int_0^\infty v(\phi(t))\, dt,$$
and show that $v(c) = 0$.
Case 3: $v(\phi(0)) < 0$ is essentially the same as Case 2.
2.9 Vector fields and flows

An important interpretation of autonomous systems of equations is given in
terms of vector fields. Interpret $x(t)$ as a parametrized curve $x : I \to \mathbb{R}^n$,
where $I \subset \mathbb{R}$ is an interval. Then $\dot x(t)$ is the tangent vector to the curve at
time $t$. For $O \subset \mathbb{R}^n$ an open set, a function $v : O \to \mathbb{R}^n$ can be thought
of as a vector field. In other words, at every point $x \in O$, $v(x)$ is a vector
in $\mathbb{R}^n$ based at $x$. Then we have a natural interpretation of an autonomous
differential equation $\dot x = v(x)$ as the flow along the vector field $v$.
For any solution to $\dot x = v(x)$, the tangent vector $\dot x(t)$ must be equal to
the value of the vector field $v(x(t))$. The solution $x(t)$ is an integral curve
to the equation $\dot x = v(x)$. The integral curves are tangent
to the vector field at each point $x$. Moreover, if $v(x)$ is locally Lipschitz,
then the solutions are unique, and we may think of the vector field as giving
unique directions for how to proceed in time at each point in space. By
the invariance of solutions in time, we have the following strong version of
uniqueness:
Proposition 32. Let $O \subset \mathbb{R}^n$ be an open set, and let $v : O \to \mathbb{R}^n$ be locally
Lipschitz. If $\phi_1$ and $\phi_2$ are two maximally extended solutions to $\dot x = v(x)$
which satisfy $\phi_1(t_1) = \phi_2(t_2)$, then $\phi_1(t) = \phi_2(t + t_2 - t_1)$ for all $t$ in the
maximal interval of definition of $\phi_1$.
Proof. $\phi_1(t)$ and $\tilde\phi_2(t) = \phi_2(t + t_2 - t_1)$ both satisfy the initial value problem
$$\dot x = v(x), \quad x(t_1) = \phi_1(t_1),$$
and so must be the same by Theorems 5 and 6.

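For a vector field $v$ on $O \subset \mathbb{R}^n$, a picture of all the integral curves on $O$
is called the phase portrait of $v$. Recall we drew in class the phase portraits
of the two systems in $\mathbb{R}^2$
$$\dot x = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} x, \qquad \dot x = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix} x.$$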
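Homework Problem 24.
(a) Draw the phase portrait of the system in $\mathbb{R}^2$
$$\dot x = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} x.$$
Show that each integral curve lies in a parabola or a line in $\mathbb{R}^2$.
(b) Draw the phase portrait of the system in $\mathbb{R}^2$
$$\dot x = \begin{pmatrix} \frac32 & \frac12 \\ \frac12 & \frac32 \end{pmatrix} x.$$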
Here is the principal theorem regarding flows of vector fields on open sets:
Theorem 13. Let $O \subset \mathbb{R}^n$ be open, and $v : O \to \mathbb{R}^n$ be smooth. Then there
is an open set $U$ so that $O \times \{0\} \subset U \subset O \times \mathbb{R}$ on which the solution $\phi(y, t)$
to
$$\dot x = v(x), \quad x(0) = y$$
exists, is unique, and is smooth jointly as a function of $(y, t)$.
Proof. This follows immediately from Theorems 5, 7 and 11.
Remark. It may not be possible to find an $\epsilon > 0$ so that $O \times (-\epsilon, \epsilon) \subset U$. The
reason is that solutions may leave $O$ in shorter and shorter times for initial
conditions $y \in O$. A simple example is given by $v(x) = 1$, $O = (0, 1)$.
This problem cannot be fixed by considering $O = \mathbb{R}^n$, since we may have
$v(y) \to \infty$ rapidly as $y \to \infty$ in $\mathbb{R}^n$. However, see the following corollary.
Corollary 33. Under the conditions of Theorem 13 above, if $K \subset O$ is
compact, then there is an $\epsilon > 0$ so that the solution
$$\phi : K \times (-\epsilon, \epsilon) \to O.$$
Proposition 34. Consider $\phi(y, t)$ the solution to $\dot x = v(x)$, $x(0) = y$, for $v$
smooth. Then as long as $\phi(y, t_1), \phi(y, t_1 + t_2) \in O$, then
$$\phi(y, t_1 + t_2) = \phi(\phi(y, t_1), t_2).$$
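Proof. Consider
$$\gamma(t) = \phi(y, t_1 + t), \qquad \eta(t) = \phi(\phi(y, t_1), t).$$
Then if we show $\gamma$ and $\eta$ satisfy the same initial value problem, then
uniqueness will show that $\gamma(t) = \eta(t)$ and we are done.
Compute
$$\begin{aligned}
\gamma(0) &= \phi(y, t_1), \\
\eta(0) &= \phi(\phi(y, t_1), 0) = \phi(y, t_1), \\
\dot\gamma(t) &= \dot\phi(y, t_1 + t) \cdot 1 = v(\phi(y, t_1 + t)) = v(\gamma(t)), \\
\dot\eta(t) &= \dot\phi(\phi(y, t_1), t) = v(\phi(\phi(y, t_1), t)) = v(\eta(t)).
\end{aligned}$$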
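The identity of Proposition 34 can be checked on a flow that is known in closed form. The sketch below is an illustration, not part of the original notes: it reuses the explicit solution of $\dot x = x^2 - 1$ from Example 9 as the flow and compares the two sides of the identity for one sample choice of $y$, $t_1$, $t_2$ (these values and the function name `flow` are hypothetical).

```python
import math

def flow(y, t):
    """Solution at time t of x' = x**2 - 1 with x(0) = y (formula of Example 9)."""
    a = (y - 1.0) / (y + 1.0)
    e = a * math.exp(2.0 * t)
    return (1.0 + e) / (1.0 - e)

y, t1, t2 = 0.5, 0.3, 0.4
left = flow(y, t1 + t2)           # flow for time t1 + t2 in one go
right = flow(flow(y, t1), t2)     # flow for time t1, then restart and flow for t2
print(abs(left - right))          # agreement up to rounding error
```

The starting point $y = 0.5$ lies between the two equilibria, so the solution exists for all time and both sides are defined.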
Note that it is necessary in the previous Proposition 34 to restrict to
times in which the solution does not leave $O$. In fact, long-time existence
of flows along vector fields is problematic on open subsets of $\mathbb{R}^n$. Recall we
require our subsets to be open for ODEs since we want to be able to take
two-sided limits for any derivatives involved. On the other hand, compactness
guarantees a uniform time interval for existence. But compact subsets of $\mathbb{R}^n$
are closed and bounded, and thus (if nonempty) cannot be open. The way
out of this problem is to consider compact manifolds, which we will realize
as compact lower-dimensional subsets of $\mathbb{R}^n$. For example,
$$S^1 = \{(x^1, x^2) : (x^1)^2 + (x^2)^2 = 1\}$$
is a compact one-dimensional submanifold of $\mathbb{R}^2$.
2.10 Vector fields as differential operators
A vector field $v$ on $O$ naturally differentiates functions $f$ on $O$ by the
directional derivative:
$$vf = D_v f = v^i \frac{\partial f}{\partial x^i}$$
for $v^i$ the components of $v$. Therefore, we often write
$$v = v^i \frac{\partial}{\partial x^i}.$$
We say that $v$ is a first-order differential operator on functions $f$.
This observation is natural from the point of view of ODEs by the
following
Proposition 35. For an interval $I \subset \mathbb{R}$, let $\phi : I \to \mathbb{R}^n$ be a solution to the
autonomous system $\dot x = v(x)$, where $v : O \to \mathbb{R}^n$ is a continuous function and
$O$ an open subset in $\mathbb{R}^n$. Also consider a differentiable function $f : O \to \mathbb{R}$.
Then the derivative
$$(f \circ \phi)'(t) = (D_v f)(\phi(t)) = (vf)(\phi(t)).$$
Proof. Compute
$$\begin{aligned}
(f \circ \phi)'(t) &= (Df)(\phi(t))\, (D\phi)(t) \\
&= \frac{\partial f}{\partial x^i}(\phi(t))\, \frac{d\phi^i}{dt}(t) \\
&= \frac{\partial f}{\partial x^i}(\phi(t))\, v^i(\phi(t)) \\
&= \left( v^i \frac{\partial}{\partial x^i} f \right)(\phi(t)) \\
&= (vf)(\phi(t)).
\end{aligned}$$
Define the bracket $[v, w]$ of two operators to be
$$[v, w]f = (vw - wv)f = v(wf) - w(vf).$$
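As a quick illustration of the bracket (the two fields are chosen here for convenience and are not taken from the notes), let $v = \partial/\partial x^1$ and $w = x^1\, \partial/\partial x^2$ on $\mathbb{R}^2$. For a smooth function $f$,
$$\begin{aligned}
v(wf) &= \frac{\partial}{\partial x^1} \left( x^1 \frac{\partial f}{\partial x^2} \right) = \frac{\partial f}{\partial x^2} + x^1 \frac{\partial^2 f}{\partial x^1 \partial x^2}, \\
w(vf) &= x^1 \frac{\partial^2 f}{\partial x^2 \partial x^1}, \\
[v, w]f &= v(wf) - w(vf) = \frac{\partial f}{\partial x^2}.
\end{aligned}$$
In this example the second-order terms cancel because mixed partial derivatives of a smooth function commute, and what remains is the first-order operator $\partial/\partial x^2$.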
Homework Problem 25. Let $v$ and $w$ be two smooth vector fields on $O$.
(a) Show that the differential operator $[v, w]$ is also a first-order differential
operator determined by a vector field (which we also write as $[v, w]$).
What are the components of $[v, w]$?
(b) For smooth vector fields $u$, $v$ and $w$, show that
$$[u, v] = -[v, u]$$
and
$$[[u, v], w] + [[v, w], u] + [[w, u], v] = 0.$$
(This last identity is the Jacobi identity.)
Remark. Part (b) of the previous problem shows that the vector space of
smooth vector fields on $O$ is a Lie algebra. The bracket $[\cdot, \cdot]$ is called the Lie
bracket.
3 Manifolds
3.1 Smooth manifolds
We define smooth manifolds as subsets of $\mathbb{R}^N$. We basically follow Spivak,
Calculus on Manifolds, Chapter 5. When we say smooth in this section, we
mean $C^\infty$.
We say a subset $M \subset \mathbb{R}^n$ is a smooth $k$-dimensional manifold (or, more
properly, a submanifold of $\mathbb{R}^n$), if for all $x \in M$, there are open subsets
$U \subset \mathbb{R}^k$ and $O \subset M$ with $x \in O$ and a one-to-one $C^\infty$ map $\phi : U \to \mathbb{R}^n$
satisfying
1. $\phi(U) = O$.
2. For all $y \in U$, $D\phi(y)$ has rank $k$.
3. $\phi^{-1} : O \to U$ is continuous.
Such a pair $(\phi, U)$ is called a local parametrization of $M$. The components
of the map $\phi^{-1} : O \to \mathbb{R}^k$ are local coordinates on $M$. A set of triples
$(\phi_\alpha, U_\alpha, O_\alpha)$ is called an atlas of $M$ if $\{O_\alpha\}$ is an open cover of $M$.
Since $O$ is an open subset of $M$, there is an open subset $W \subset \mathbb{R}^n$ so that
$O = M \cap W$. In this case, we may rewrite condition (1) as
(1') $\phi(U) = M \cap W$.
Also note that $\phi : U \to O$ is a homeomorphism from $U$ to $O$ since it is
smooth, one-to-one, onto, and $\phi^{-1}$ is continuous.
Now we note with a few examples why conditions (2) and (3) are
necessary. First of all, consider $\phi : \mathbb{R} \to \mathbb{R}^2$ given by $\phi(t) = (t^2, t^3)$. Then
$\phi$ is smooth, one-to-one, and $\phi^{-1} : \phi(\mathbb{R}) \to \mathbb{R}$ is continuous. But we note
the image $\phi(\mathbb{R})$, which is the graph of $x^1 = (x^2)^{2/3}$ in $\mathbb{R}^2$, is not smooth at
$(0, 0) \in \mathbb{R}^2$. We also check that
$$D\phi = \begin{pmatrix} 2t \\ 3t^2 \end{pmatrix} = 0 \quad \text{when } t = 0 \text{ and } \phi(t) = (0, 0),$$
and so $D\phi$ has rank $0 < 1$ at the point at which $\phi(\mathbb{R})$ is not smooth.
Condition (3) is necessary by the following problem:
Homework Problem 26. Recall polar coordinates $(x, y) = (r \cos\theta, r \sin\theta)$
in $\mathbb{R}^2$. Show that a portion of the polar graph $r = \sin 2\theta$ can be parametrized,
for $I$ an open interval in $\mathbb{R}$, by $\phi : I \to \mathbb{R}^2$ so that $\phi$ is one-to-one, $C^\infty$, and
$D\phi$ is never 0, but so that $\phi^{-1} : \phi(I) \to I$ is not continuous. Sketch the graph
and indicate pictorially why $\phi(I)$ should not be considered a submanifold of
$\mathbb{R}^2$.
If $W$ and $V$ are open subsets of $\mathbb{R}^n$, then a map $f : W \to V$ is a
diffeomorphism if $f$ is one-to-one, onto, $C^\infty$, and $f^{-1}$ is $C^\infty$. The Inverse Function
Theorem and Problem 9 show
Lemma 36. $f : W \to V$ is a diffeomorphism if and only if $f$ is one-to-one,
onto, $C^\infty$, and $\det Df(x) \neq 0$ for all $x \in W$.
The following theorem is useful in proving properties about manifolds:
Theorem 14. $M \subset \mathbb{R}^n$ is a $k$-dimensional manifold if and only if for all
$x \in M$, there are two open subsets $V, W$ of $\mathbb{R}^n$, with $x \in W$, and a diffeomorphism
$h : W \to V$ satisfying
$$h(W \cap M) = V \cap (\mathbb{R}^k \times \{0\}) = \{y \in V : y^{k+1} = \cdots = y^n = 0\}.$$
Proof. ($\Leftarrow$) Let $U = \{a \in \mathbb{R}^k : (a, 0) \in h(W)\}$, and define $\phi : U \to \mathbb{R}^n$ by
$\phi(a) = h^{-1}(a, 0)$. $\phi$ is smooth and one-to-one since $h$ is a diffeomorphism.
Moreover, $\phi(U) = M \cap W$ to satisfy condition (1'). $\phi^{-1} = h|_{W \cap M}$ is
continuous.
So all that is left to check is the rank condition (2). Consider $H : W \to \mathbb{R}^k$,
$$H(z) = (h^1(z), \dots, h^k(z)).$$
Then $H(\phi(y)) = y$ for all $y \in U$. Then use the Chain Rule to compute
$DH(\phi(y))\, D\phi(y) = I$, and so $D\phi(y)$ must be an injective linear map, and
so must have rank $k$. Thus $M$ is a smooth manifold.
($\Rightarrow$) Now assume $M$ is a manifold, and define $y = \phi^{-1}(x)$. Then $D\phi(y)$
has rank $k$, and so there is at least one $k \times k$ submatrix of $D\phi(y)$ with
nonzero determinant. (We may think of $D\phi(y)$ as an $n \times k$ matrix mapping
column vectors in $\mathbb{R}^k$ to column vectors in $\mathbb{R}^n$. Then a $k \times k$ submatrix is
simply a collection of $k$ distinct rows of $D\phi(y)$.) By a linear change of basis,
if necessary, then, we may assume that
$$\det \left( \frac{\partial \phi^i}{\partial y^j}(y) \right)_{1 \le i, j \le k} \neq 0.$$
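By continuity, this is true on an open neighborhood $U'$ of $y$.
Define $g : U' \times \mathbb{R}^{n-k} \to \mathbb{R}^n$ by $g(a, b) = \phi(a) + (0, b)$. Then, in block matrix
form,
$$Dg(a, b) = \begin{pmatrix}
\left( \dfrac{\partial \phi^i}{\partial y^j} \right)_{1 \le i, j \le k} & 0 \\[2ex]
\left( \dfrac{\partial \phi^i}{\partial y^j} \right)_{1 \le j \le k,\; k < i \le n} & I_{n-k}
\end{pmatrix}.$$
So $\det Dg(a, b) = \det \left( \frac{\partial \phi^i}{\partial y^j} \right)_{1 \le i, j \le k} \neq 0$. So we may apply the Inverse Function
Theorem to find that there are open subsets of $\mathbb{R}^n$, $V_1' \ni (y, 0)$ and $V_2' \ni g(y, 0) = x$,
so that $g : V_1' \to V_2'$ has a smooth inverse $h : V_2' \to V_1'$.
Define $\tilde O$ via
$$\tilde O = \{\phi(a) : (a, 0) \in V_1'\} = (\phi^{-1})^{-1}(\iota^{-1}(V_1')),$$
where $\iota : \mathbb{R}^k \to \mathbb{R}^n$ sends $a$ to $(a, 0)$. Since $\phi^{-1}$ is continuous, $\tilde O$ is an open
subset of $\phi(U')$, and of $M$. Therefore, there is an open subset $\tilde V$ of $\mathbb{R}^n$ so
that $\tilde O = M \cap \tilde V$.
Let $W = \tilde V \cap V_2'$, and $V = g^{-1}(W)$. Then $h : W \to V$ is a diffeomorphism
and
$$\begin{aligned}
W \cap M &= \{\phi(a) : (a, 0) \in V\} \\
&= \{g(a, 0) : (a, 0) \in V\}, \\
h(W \cap M) &= g^{-1}(W \cap M) \\
&= g^{-1}(\{g(a, 0) : (a, 0) \in V\}) \\
&= V \cap (\mathbb{R}^k \times \{0\}).
\end{aligned}$$
This completes the proof.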

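This characterization of manifolds is quite useful. Consider two smooth
local parametrizations $\phi_\alpha : U_\alpha \to O_\alpha$, and $\phi_\beta : U_\beta \to O_\beta$. Then if $O_\alpha \cap O_\beta \neq \emptyset$,
we have the following
Proposition 37. $\phi_\beta^{-1} \circ \phi_\alpha : \phi_\alpha^{-1}(O_\beta) \to \phi_\beta^{-1}(O_\alpha)$ is a diffeomorphism.
Proof. Consider $\pi : \mathbb{R}^n \to \mathbb{R}^k$ given by $(a, b) \mapsto a$ for $(a, b) \in \mathbb{R}^k \times \mathbb{R}^{n-k}$, and
$\iota : \mathbb{R}^k \to \mathbb{R}^n$ given by $\iota(a) = (a, 0)$. Let $h_\alpha$ and $h_\beta$ be the diffeomorphisms
guaranteed by Theorem 14. Then $\phi_\alpha(a) = h_\alpha^{-1}(a, 0)$, $\phi_\beta^{-1}(x) = \pi(h_\beta(x))$, and
so
$$\phi_\beta^{-1} \circ \phi_\alpha = \pi \circ h_\beta \circ h_\alpha^{-1} \circ \iota$$
is smooth since $h_\alpha$, $h_\beta$ are diffeomorphisms.
The maps $\phi_\beta^{-1} \circ \phi_\alpha$ are called gluing maps.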
Remark. It is often useful to think of a manifold $M$ as being glued together
from domains $U_\alpha$ in $\mathbb{R}^k$ by the gluing maps. In fact, the previous
proposition is the starting point for the abstract definition of a smooth manifold:
A smooth $k$-dimensional manifold is a Hausdorff, sigma-compact topological
space for which each point $x$ has a neighborhood $O_\alpha$ homeomorphic to a
domain $U_\alpha \subset \mathbb{R}^k$ via $\phi_\alpha : U_\alpha \to O_\alpha$. In addition, we require the gluing maps
$\phi_\beta^{-1} \circ \phi_\alpha$ to be smooth on $\phi_\alpha^{-1}(O_\beta)$.
If $M$ is a smooth manifold, then a function $f : M \to \mathbb{R}^p$ is said to
be smooth if for each smooth parametrization $\phi_\alpha : U_\alpha \to M$, $f \circ \phi_\alpha : U_\alpha \to \mathbb{R}^p$
is smooth. If $N \subset \mathbb{R}^p$ is a smooth submanifold, then $f : M \to N$ is
said to be smooth if the induced map $f : M \to \mathbb{R}^p$ is smooth. (For abstract
target manifolds $N$, we may work with local parametrizations instead.) This
definition of smooth maps from manifolds is consistent in the following sense:
Proposition 38. If $f : M \to \mathbb{R}^p$, and $f \circ \phi_\alpha$ is smooth from $U_\alpha \to \mathbb{R}^p$, then
on $\phi_\beta^{-1}(O_\alpha) \subset U_\beta$, $f \circ \phi_\beta$ is also smooth.
Proof. Apply Proposition 37 and the Chain Rule.

Proposition 39. If $M \subset \mathbb{R}^n$ is a smooth manifold and $f : M \to \mathbb{R}^p$, then $f$
is smooth if and only if $f$ can be locally extended to smooth functions from
domains in $\mathbb{R}^n$ to $\mathbb{R}^p$. In other words, $f$ is smooth if and only if every $x \in M$
has a neighborhood $W$ in $\mathbb{R}^n$, and there is a smooth function $F : W \to \mathbb{R}^p$ so
that $F|_{W \cap M} = f$.
Proof. ($\Rightarrow$) For $x \in M$, consider the local diffeomorphism $h : W \to V$
guaranteed by Theorem 14. Then for the smooth parametrization
$\phi(a) = h^{-1}(a, 0)$, we know $f \circ \phi$ is smooth. Now define
$$F = f \circ h^{-1} \circ \iota \circ \pi \circ h : W \to \mathbb{R}^p$$
for $\pi : (a, b) \mapsto a$. $F$ is smooth since
$$F = f \circ h^{-1} \circ \iota \circ \pi \circ h = (f \circ \phi) \circ \pi \circ h.$$
($\Leftarrow$) For a local parametrization $\phi$, $f \circ \phi$ is smooth since locally,
$f \circ \phi = F \circ \phi$, which is smooth by the Chain Rule.
$X \subset \mathbb{R}^N$ is a smooth manifold of dimension $k$ if every $x \in X$ has a
neighborhood that is diffeomorphic to an open subset of $\mathbb{R}^k$. In other words,
there is an open cover $O_\alpha$ of $X$ so that each $O_\alpha$ is diffeomorphic to an open
subset $U_\alpha \subset \mathbb{R}^k$. Let $\phi_\alpha : U_\alpha \to O_\alpha$ be the diffeomorphism. $\phi_\alpha$ is called a
parametrization of $O_\alpha \subset X$, and the inverse map $\phi_\alpha^{-1}$ is called a coordinate
system. The open cover, together with the coordinate systems
$$\{O_\alpha, \phi_\alpha, U_\alpha\}$$
is called a smooth atlas of $X$, and $X$ is a smooth manifold if and only if it
has a smooth atlas.
Example 10. The unit sphere
$$S^2 = \{(x^1, x^2, x^3) \in \mathbb{R}^3 : (x^1)^2 + (x^2)^2 + (x^3)^2 = 1\}$$
is a two-dimensional submanifold of $\mathbb{R}^3$.
To show this, we provide an atlas. Let $N = (0, 0, 1)$ be the north pole and
$S = (0, 0, -1)$ be the south pole. Then let $O_1 = S^2 \setminus \{N\}$, $O_2 = S^2 \setminus \{S\}$,
$U_1 = U_2 = \mathbb{R}^2$. We construct the coordinate systems $\phi_\alpha^{-1}$, $\alpha = 1, 2$, by
stereographic projection. We may realize $\mathbb{R}^2$ as the plane $\{x^3 = 0\} \subset \mathbb{R}^3$.
For a point $x$ in $O_1$, consider the line $L_{x,N}$ in $\mathbb{R}^3$ through $N$ and $x$. We
define $\phi_1^{-1}(x)$ to be the unique point in $\mathbb{R}^2 \cap L_{x,N}$. It is easy to compute
$$\begin{aligned}
(y^1, y^2) &= \phi_1^{-1}(x^1, x^2, x^3) = \left( \frac{x^1}{1 - x^3}, \frac{x^2}{1 - x^3} \right), \\
(x^1, x^2, x^3) &= \phi_1(y^1, y^2) = \left( \frac{2y^1}{|y|^2 + 1}, \frac{2y^2}{|y|^2 + 1}, \frac{|y|^2 - 1}{|y|^2 + 1} \right).
\end{aligned}$$
Similarly, for any point $x \in O_2$, define $\phi_2^{-1}(x)$ to be the unique point in
$\mathbb{R}^2 \cap L_{x,S}$, and we find as above
$$\begin{aligned}
(z^1, z^2) &= \phi_2^{-1}(x^1, x^2, x^3) = \left( \frac{x^1}{1 + x^3}, \frac{x^2}{1 + x^3} \right), \\
(x^1, x^2, x^3) &= \phi_2(z^1, z^2) = \left( \frac{2z^1}{|z|^2 + 1}, \frac{2z^2}{|z|^2 + 1}, \frac{1 - |z|^2}{|z|^2 + 1} \right).
\end{aligned}$$
It is straightforward to check that each of these coordinate systems is a
diffeomorphism, and since $S^2 = O_1 \cup O_2$, we have produced a smooth atlas of
$S^2$ and thus have shown that $S^2$ is a two-dimensional manifold.
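The stereographic formulas can be spot-checked numerically. The following is a minimal sketch, not part of the original notes: it implements the parametrization from the north-pole chart and its inverse, then confirms at one sample point that the image lies on the unit sphere and that the two maps undo each other. The function names `phi1` and `phi1_inv` and the sample point are hypothetical choices.

```python
def phi1(y1, y2):
    """Parametrization of the sphere minus the north pole by the plane."""
    s = y1 * y1 + y2 * y2                      # |y|^2
    return (2 * y1 / (s + 1), 2 * y2 / (s + 1), (s - 1) / (s + 1))

def phi1_inv(x1, x2, x3):
    """Stereographic projection from the north pole (requires x3 != 1)."""
    return (x1 / (1 - x3), x2 / (1 - x3))

y = (0.3, -1.2)
x = phi1(*y)
print(sum(c * c for c in x))   # squared norm, should be 1
print(phi1_inv(*x))            # should recover y
```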
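Given a smooth manifold $X$ with a smooth atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$, let
$O_{\alpha\beta} = O_\alpha \cap O_\beta$. Also define $U_{\alpha\beta} = U_\alpha \cap \phi_\alpha^{-1}(O_{\alpha\beta})$. As long as $O_{\alpha\beta} \neq \emptyset$, the map
$$\phi_{\alpha\beta} = \phi_\beta^{-1} \circ \phi_\alpha : U_{\alpha\beta} \to U_{\beta\alpha}$$
is a diffeomorphism. These maps are called the gluing maps of the
manifold $X$ associated to the atlas. In particular, the manifold can be thought of
as the union of the coordinate charts $U_\alpha$ glued together by the gluing maps.
It is straightforward to see, at least as a set, we may identify
$$X = \left( \bigsqcup_\alpha U_\alpha \right) / \sim,$$
where $\sqcup$ means disjoint union and the equivalence relation $\sim$ is given by
$$x \sim y \quad \text{if } x \in U_{\alpha\beta} \subset U_\alpha,\ y \in U_{\beta\alpha} \subset U_\beta,\ y = \phi_{\alpha\beta}(x).$$
Gluing maps may be used to define smooth manifolds which are not
necessarily subsets of $\mathbb{R}^N$ (though we won't do so here). It is instructive to think of
$k$-dimensional smooth manifolds as spaces that are smoothly glued together
from open sets in $\mathbb{R}^k$.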
Example 11. Recall the example of the atlas of $S^2$ above. Compute
$$\begin{aligned}
O_{12} &= S^2 \setminus \{S, N\}, \\
U_{12} &= \mathbb{R}^2 \setminus \{0\}, \\
U_{21} &= \mathbb{R}^2 \setminus \{0\}, \\
z = \phi_{12}(y) &= \phi_2^{-1}(\phi_1(y)) = \left( \frac{y^1}{|y|^2}, \frac{y^2}{|y|^2} \right) = \frac{y}{|y|^2}.
\end{aligned}$$
This gluing map is called inversion across the circle $|y|^2 = 1$ in $\mathbb{R}^2$. Each
point is mapped to a point on the same ray through the origin, but the distance
to the origin is replaced by its reciprocal. So we can think of $S^2$ as two copies
of $\mathbb{R}^2$ glued together along $\mathbb{R}^2 \setminus \{0\}$ by the inversion map across the unit circle.
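The gluing map of Example 11 can be checked by composing the two charts directly. The sketch below is an illustration, not part of the original notes: it composes the north-pole parametrization with the south-pole projection and compares the result with $y/|y|^2$ at one sample point. The function names and the sample point are hypothetical choices.

```python
def phi1(y):
    """Parametrization of the sphere minus the north pole by the plane."""
    s = y[0] * y[0] + y[1] * y[1]              # |y|^2
    return (2 * y[0] / (s + 1), 2 * y[1] / (s + 1), (s - 1) / (s + 1))

def phi2_inv(x):
    """Stereographic projection from the south pole (requires x3 != -1)."""
    return (x[0] / (1 + x[2]), x[1] / (1 + x[2]))

def gluing(y):
    """The gluing map: apply phi1, then the inverse of phi2 (requires y != 0)."""
    return phi2_inv(phi1(y))

y = (0.3, -1.2)
s = y[0] ** 2 + y[1] ** 2
print(gluing(y), (y[0] / s, y[1] / s))   # composition vs inversion formula
print(gluing(gluing(y)))                 # inverting twice returns y
```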
3.2 Tangent vectors on manifolds

Recall that for a solution $\phi$ to an autonomous system $\dot x = v(x)$, the
parametric curve $\phi(t)$ has tangent vector $\dot\phi(t) = v(\phi(t))$ at time $t$. We will use
this to define tangent vectors to manifolds. A tangent vector at a point $p$
in a smooth manifold $X$ is given by the derivative $\dot\gamma(0)$
of a smooth curve $\gamma : (-\epsilon, \epsilon) \to X \subset \mathbb{R}^N$ so that $\gamma(0) = p$. (Note the fact $\mathbb{R}^N$ is a vector space
allows us to differentiate $\gamma$.) The space of all tangent vectors at $p$ is called
the tangent space $T_pX$ of $X$ at $p$, and it is characterized by the following
proposition.
Proposition 40. If $X \subset \mathbb{R}^N$ is a $k$-dimensional smooth manifold, then the
tangent space $T_pX$ is the following: Given a local parametrization of $X$
$$\phi : U \to O \ni p$$
so that $\phi(0) = p$,
$$T_pX = D\phi(0)(\mathbb{R}^k).$$
In particular, $T_pX$ is naturally a $k$-dimensional vector space.
Proof. First of all, given a curve $\gamma : (-\epsilon, \epsilon) \to X$ so that $\gamma(0) = p$, we can
ensure (by shrinking $\epsilon$ if necessary), that the image of $\gamma$ is contained in the
coordinate neighborhood $O$. Now
$$\gamma = \phi \circ (\phi^{-1} \circ \gamma)$$
and the chain rule shows that
$$\gamma'(0) = D\phi(0)[(\phi^{-1} \circ \gamma)'(0)] \in D\phi(0)(\mathbb{R}^k).$$
Thus we've shown $T_pX \subset D\phi(0)(\mathbb{R}^k)$.
To show $D\phi(0)(\mathbb{R}^k) \subset T_pX$, for any vector $v \in \mathbb{R}^k$, consider $\gamma(t) = \phi(tv)$
for $|t|$ small enough that the image of $\gamma$ is contained in $O$. Then
$$\gamma'(0) = D\phi(0)v$$
and so $D\phi(0)(\mathbb{R}^k) = T_pX$.
Also note the following corollary of our definition of $T_pX$:
Corollary 41. $T_pX$ is independent of the coordinate neighborhood $O$ of $p$.
If $f : X \to \mathbb{R}^m$ is a smooth map from a smooth $k$-dimensional manifold
$X$, and if $p \in X$, then we define
$$Df(p) : T_pX \to \mathbb{R}^m$$
by using a local parametrization $\phi : U \to X$ so that $\phi(q) = p$. Then we define
$$Df(p) = D(f \circ \phi)(q) \circ (D\phi(q))^{-1}.$$
The following exercise verifies this definition makes sense (see Guillemin and
Pollack).
(a) Show that $D\phi(q)$ is invertible as a linear map from $\mathbb{R}^k$ to $T_pX$.
(b) Show that the definition of $Df(p)$ is independent of the coordinate
parametrization $\phi$.
(c) Show that if $f : X \to Y$ for $Y \subset \mathbb{R}^m$ a manifold, then
$Df(p)(T_pX) \subset T_{f(p)}Y$.
Tangent vectors naturally differentiate functions at a point. So if
$f : X \to \mathbb{R}$, and the tangent vector $v = \gamma'(0)$ for a curve $\gamma$ so that $\gamma(0) = p$,
then we may define
$$(vf)(p) = (f \circ \gamma)'(0) = Df(p)\gamma'(0) = Df(p)v.$$
This definition depends only on $v$, and not on the curve $\gamma$ used. (For each $v$
there are many $\gamma$, since $v$ only depends on the first derivative $\gamma'(0)$ and no
higher Taylor coefficients.)
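For a coordinate system
$$\phi^{-1} = (x^1, \dots, x^k) : O \to \mathbb{R}^k$$
(where we assume as usual that $\phi(0) = p$), the coordinate basis of $T_pX$
induced by $\phi$ may be written as $\{\partial/\partial x^i\}$, which are thought of as tangent
vectors differentiating functions $f$ by
$$\frac{\partial}{\partial x^i} f = \left. \frac{\partial}{\partial x^i} \right|_p f = \left. \frac{\partial}{\partial x^i} \right|_0 (f \circ \phi)(x^1, \dots, x^k).$$
($\partial/\partial x^i$ is the tangent vector associated to the curve $\gamma(t) = \phi(te_i)$, for $e_i$ the $i$th
standard basis vector in $\mathbb{R}^k$.) Thus we can write any tangent vector $v$
at $p$ as
$$v = v^i \frac{\partial}{\partial x^i}.$$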
Writing tangent vectors in terms of the coordinate basis of $T_pX$ is much more
useful than writing them in terms of a basis of $\mathbb{R}^N \supset T_pX$.
The components $v^i$ will change depending on the local coordinates. On
$O_{\alpha\beta} = O_\alpha \cap O_\beta$, the intersection of two coordinate neighborhoods of $p$,
we have two coordinate systems $\phi_\alpha^{-1} = (x^1, \dots, x^k)$ and $\phi_\beta^{-1} = (y^1, \dots, y^k)$.
We can write by using the chain rule
$$v = v^i(x) \frac{\partial}{\partial x^i} = v^i(x) \frac{\partial y^j}{\partial x^i} \frac{\partial}{\partial y^j} = v^j(y) \frac{\partial}{\partial y^j}.$$
Therefore, we know how the $v^i$ change under coordinate transformations
$x \to y$:
$$v^j(y) = v^i(x) \frac{\partial y^j}{\partial x^i}. \tag{22}$$
(In a more coordinate-free notation, the Jacobian matrix $\partial y^j/\partial x^i$ is the
derivative of the gluing map $\phi_{\alpha\beta} = \phi_\beta^{-1} \circ \phi_\alpha$. It is easy to check that
$y = \phi_{\alpha\beta}(x)$.)
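As a sanity check of the transformation rule (22), here is a small worked example under assumptions that are not in the notes: take $X = \mathbb{R}^2 \setminus \{0\}$ with Cartesian coordinates $(x^1, x^2)$ and polar coordinates $(y^1, y^2) = (r, \theta)$ on a region where $\theta$ is well defined, and let $v = -x^2\, \partial/\partial x^1 + x^1\, \partial/\partial x^2$ be the rotation field. Using
$$\frac{\partial r}{\partial x^1} = \frac{x^1}{r}, \quad \frac{\partial r}{\partial x^2} = \frac{x^2}{r}, \quad \frac{\partial \theta}{\partial x^1} = -\frac{x^2}{r^2}, \quad \frac{\partial \theta}{\partial x^2} = \frac{x^1}{r^2},$$
the rule gives
$$\begin{aligned}
v^r &= (-x^2)\frac{x^1}{r} + (x^1)\frac{x^2}{r} = 0, \\
v^\theta &= (-x^2)\left( -\frac{x^2}{r^2} \right) + (x^1)\frac{x^1}{r^2} = \frac{(x^1)^2 + (x^2)^2}{r^2} = 1.
\end{aligned}$$
So in polar coordinates the same vector field is simply $v = \partial/\partial\theta$.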
All the tangent spaces of a manifold $X$ patch together to make a larger
manifold $TX$ called the tangent bundle. We define the tangent bundle
$$TX = \{(p, w) \in \mathbb{R}^N \times \mathbb{R}^N : p \in X, w \in T_pX\}.$$
Homework Problem 28. If $X$ is a $k$-dimensional manifold, show that $TX$
is a $2k$-dimensional submanifold of $\mathbb{R}^{2N}$. To prove this, consider a local
parametrization $\phi : U \to X \subset \mathbb{R}^N$.
(a) Define $\Phi : U \times \mathbb{R}^k \to \mathbb{R}^{2N}$ for $y = (y^1, \dots, y^k)$ by
$$\Phi(x, y) = \left( \phi(x), \frac{\partial \phi}{\partial x^i}(x)\, y^i \right).$$
Show that $\Phi(U \times \mathbb{R}^k)$ is an open subset of $TX$ and that $\Phi$ is one-to-one.
(b) Show that $D\Phi$ has rank $2k$.
(c) Show that $\Phi^{-1}$ is continuous from $\Phi(U \times \mathbb{R}^k)$ to $U \times \mathbb{R}^k$.
There is a natural smooth map
$$\pi : TX \to X, \quad \pi(p, w) = p,$$
and each $\pi^{-1}(\{p\})$ is the vector space $T_pX$.
Each coordinate system $\phi^{-1} = (x^1, \dots, x^k)$ provides a local frame $\{\partial/\partial x^i\}$
of the tangent bundle. A local frame is a basis of the tangent space for
every $p$ in a neighborhood $O \subset X$. These frames are patched together in the
following paragraph.
A more abstract view of the tangent bundle is given by looking at a given
smooth atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ of $X$. Then as a set, we may identify
$$TX = \left( \bigsqcup_\alpha U_\alpha \times \mathbb{R}^k \right) / \sim,$$
where the equivalence relation is given by
$$(x, v) \sim (y, w) \quad \text{if } x \in U_{\alpha\beta},\ y \in U_{\beta\alpha},\ y = \phi_{\alpha\beta}(x),\ w = D\phi_{\alpha\beta}\, v.$$
A vector field on a manifold $X$ provides a tangent vector at every point
in $X$. More precisely, a vector field is a section of the tangent bundle. In
other words, $v : X \to TX$ is a vector field if $\pi(v(p)) = p$ for all $p \in X$. So
$v(p) = (p, w(p))$ for $w(p) \in T_pX$. In fact, for $X \subset \mathbb{R}^N$, $w : X \to \mathbb{R}^N$ so that
$w(p) \in T_pX$ is equivalent to $v(p) = (p, w(p))$. (Clearly $v$ and $w$ carry the
same amount of information, and we often will refer to both of them using
the same symbol $v$.)
A vector field $v$ is smooth if it is given as a smooth map from $X$ to
$\mathbb{R}^N \times \mathbb{R}^N \supset TX$ as above. Equivalently, $v$ is smooth if for every local
coordinate system $(x^1, \dots, x^k)$,
$$v = v^i(x) \frac{\partial}{\partial x^i}$$
for $v^i$ smooth on $U \subset \mathbb{R}^k$.
3.3 Flows on manifolds

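A smooth vector field $v$ on a manifold $X$ defines a system of ODEs in the
local coordinates of $X$ (or we may say more simply a system on $X$). The
ODE system is given by
$$\dot x = v(x)$$
for $x : I \to X$ a parametric curve.
In order to describe the relationship between the local and global pictures
of the ODE system, consider $X \subset \mathbb{R}^N$ and $v : X \to \mathbb{R}^N$ so that for each
$p \in X$, $v(p) \in T_pX$. Consider a local parametrization $\phi_\alpha : U_\alpha \to O_\alpha$. Let
$\phi_\alpha^{-1} = (x_\alpha^1, \dots, x_\alpha^k)$. Locally on $U_\alpha \subset \mathbb{R}^k$, we represent $v$ by
$$v = v_\alpha^i \frac{\partial}{\partial x_\alpha^i}.$$
In other words, for $p \in O_\alpha \subset X$ and $q = \phi_\alpha^{-1}(p)$, we have
$$v(p) = D\phi_\alpha(q)\, v_\alpha(q).$$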
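Proposition 42. Consider $v$ a smooth vector field on $X \subset \mathbb{R}^N$. Consider a solution $\psi_\alpha$ to $\dot x_\alpha = v_\alpha(x_\alpha)$, where $\psi_\alpha : I \to U_\alpha$ for a time interval $I$. Then $\phi_\alpha \circ \psi_\alpha$ is a solution to $\dot x = v(x)$ from $I$ to $O_\alpha \subset X$. Every solution to $\dot x = v(x)$ restricted to $O_\alpha$ is of this form.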
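Proof. First of all, note that $\dot x_\alpha = v_\alpha(x_\alpha)$ is a well-defined system of ODEs on the open set $U_\alpha \subset \mathbb{R}^k$. On the other hand, on $X$, the system $\dot x = v(x)$ is not an ODE system on $\mathbb{R}^N \supset X$. This may be remedied locally as follows: For each $p \in X$, $v(p) \in T_pX \subset \mathbb{R}^N$. Then since $v$ is a smooth function, we may locally extend $v$ to a smooth function on an open subset of $\mathbb{R}^N$ (we refer to each local extension simply as $v$).

Consider a solution $\psi_\alpha$ to $\dot x_\alpha = v_\alpha(x_\alpha)$. Then if we let $\psi = \phi_\alpha \circ \psi_\alpha$, compute
$$\dot\psi = D\phi_\alpha(\dot\psi_\alpha) = D\phi_\alpha(v_\alpha) = v.$$
Thus $\psi$ is a solution. To show that every solution to $\dot x = v(x)$ is of this form, note that since $T_pX$ is the image of $D\phi_\alpha(q)$ for $\phi_\alpha(q) = p$ (Proposition 40), every smooth vector field $v$ is locally equal to $D\phi_\alpha v_\alpha$. Then by uniqueness of solutions to ODEs, the solution to $\dot x = v(x)$ must be the image under $\phi_\alpha$ of the solution to $\dot x_\alpha = v_\alpha(x_\alpha)$.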
Remark. The restriction to autonomous equations $\dot x = v(x)$ is unnecessary. The same proof works for non-autonomous systems $\dot x = v(x, t)$ on manifolds.
Recall a subset $X$ of a metric space $Y$ is compactly contained in another subset $Z$ if $\bar X$ is compact and $\bar X \subset Z$. In this case we write $X \subset\subset Z$, and say $X$ is a precompact subset of $Z$.
Theorem 15. Let $v$ be a smooth vector field on a compact manifold $X$. Then the flow $F(y, t)$ along the vector field (the solution to
$$\dot x = v(x), \quad x(0) = y)$$
is a smooth function from $X \times \mathbb{R}$ to $X$. In particular, any flow on a compact manifold exists for all time.
Proof. Consider an atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ of $X$. First of all, by Lemma 43 below, there is an open cover $\{Q_\beta\}$ of $X$ so that each $Q_\beta \subset\subset O_\alpha$ for some $O_\alpha$ in the atlas. Then each $\phi_\alpha^{-1} \bar Q_\beta$ is a compact subset of $U_\alpha$. Our differential equation is equivalent to $\dot x_\alpha = v_\alpha(x_\alpha)$ on each $U_\alpha$.

Since $X$ is compact, we can choose a finite subcover $\{Q_1, \dots, Q_n\}$ of the open cover $\{Q_\beta\}$. For each $i = 1, \dots, n$, a straightforward analog of Lemma 23 shows there is an $\epsilon_i > 0$ so that if $x_0 \in \phi_\alpha^{-1} Q_i$, then the solution to
$$\dot x_\alpha = v_\alpha(x_\alpha), \quad x(0) = x_0$$
stays in $U_\alpha$ for $t \in [-\epsilon_i, \epsilon_i]$. Moreover, by Proposition 31, for any $T \in \mathbb{R}$, the solution with initial condition $x(T) = x_0 \in \phi_\alpha^{-1} Q_i$ stays within $U_\alpha$ for time $t \in [T - \epsilon_i, T + \epsilon_i]$.

Let $\epsilon = \min\{\epsilon_1, \dots, \epsilon_n\} > 0$. Then for every $T \in \mathbb{R}$, $p \in X$, we claim the solution to $\dot x = v(x)$, $x(T) = p$ exists for all $t \in [T - \epsilon, T + \epsilon]$. To prove the claim, note that each $p \in X$ lies in one of the $Q_i \subset\subset O_\alpha$, and that the solution to
$$\dot x_\alpha = v_\alpha(x_\alpha), \quad x(T) = \phi_\alpha^{-1}(p)$$
lies in $U_\alpha$ for $t \in [T - \epsilon, T + \epsilon]$. Thus Proposition 42 shows that the solution to $\dot x = v(x)$, $x(T) = p$ is in $O_\alpha$ for $t \in [T - \epsilon, T + \epsilon]$, and the claim is proved.

In order to prove the Theorem, continue as in the proof of Lemma 25 to show the solution exists for all time. The smoothness of the solution follows from Theorem 12 and Proposition 42.
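Lemma 43. Given an atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ of a manifold $X$, there is an open cover $\{Q_\beta\}$ of $X$ so that each $Q_\beta$ is precompact in some $O_\alpha$.
Proof. We can cover each open $U_\alpha \subset \mathbb{R}^k$ by open balls $B \subset\subset U_\alpha$. Then the sets $Q = \phi_\alpha(B)$ form an open cover of $X$.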
The support of an $\mathbb{R}^m$-valued function $f$ is the closure
$$\operatorname{supp}(f) = \overline{\{x : f(x) \neq 0\}}.$$
An important class of functions is smooth functions with compact support. Prominent examples can be constructed using the smooth function on $\mathbb{R}$
$$f(x) = \begin{cases} e^{-1/x} & \text{for } x > 0, \\ 0 & \text{for } x \le 0. \end{cases}$$
See the notes on bump functions.
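As a small illustration of how the function above yields compactly supported examples, the following Python sketch builds a bump on an interval $(a, b)$ as the product $f(x - a)\, f(b - x)$. This product is one standard construction and is an assumption here, not necessarily the one used in the notes on bump functions; the names `bump`, `a`, and `b` are illustrative.

```python
import math

def f(x):
    """The smooth function e^{-1/x} for x > 0, and 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def bump(x, a=-1.0, b=1.0):
    """A smooth function that is positive on (a, b) and zero elsewhere.

    Built as f(x - a) * f(b - x); the interval endpoints are illustrative.
    """
    return f(x - a) * f(b - x)

# The bump vanishes outside (-1, 1) and is positive inside.
for x in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
    print(x, bump(x))
```

The support of `bump` is the closed interval $[a, b]$, which is compact.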

Homework Problem 29. Let $\Omega \subset \mathbb{R}^n$ be a domain. Consider a smooth vector field $v : \Omega \to \mathbb{R}^n$ with compact support. Show that any solution to $\dot x = v(x)$, $x(0) = x_0$, exists for all time $t \in \mathbb{R}$.
Hint: First show that if $v(y) = 0$, then any solution to $\dot x = v(x)$, $x(t_0) = y$, must be constant for all time. Use this to show that any solution to $\dot x = v(x)$, $v(x(0)) \neq 0$, must remain in $\operatorname{supp}(v)$ for its entire maximal interval of definition. Apply Theorem 7.
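Given a smooth manifold $X$, consider the set $\mathrm{Diff}(X)$ of diffeomorphisms from $X$ to itself. Then for $f, g \in \mathrm{Diff}(X)$, it is easy to see that
$$f \circ g \in \mathrm{Diff}(X), \quad f^{-1} \in \mathrm{Diff}(X), \quad f \circ f^{-1} = \mathrm{id}$$
for $\mathrm{id}$ the identity map. Therefore, $\mathrm{Diff}(X)$ is a group.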

Proposition 44. Let $v$ be a smooth vector field on a compact manifold $X$. Then for the flow $F(y, t)$, define $F_t(y) = F(y, t)$. Then $F_t \in \mathrm{Diff}(X)$, $F_{t_1 + t_2} = F_{t_1} \circ F_{t_2}$, and $F_{-t} = F_t^{-1}$. (And so $F$ is a group homomorphism from the additive group $\mathbb{R}$ to $\mathrm{Diff}(X)$.)
Proof. Theorem 15 shows that $F_t$ is smooth for any $t$. The group homomorphism property is simply a restatement of Proposition 34. Therefore, $F_t \circ F_{-t} = F_0$, which is the flow along $v$ for time 0. By definition, $F_0 = \mathrm{id}$, the identity map. Now $F_t^{-1} = F_{-t}$ is smooth, and so $F_t$ is a diffeomorphism.
Remark. Note the only place we used the fact that X is compact is to guar-
antee the existence of the flow for all time. So the proposition still holds for
any smooth vector field v on a smooth manifold X so that the flow exists for
all time.
Example 12. For the sphere $S^2 \subset \mathbb{R}^3$, consider the vector field defined by $v(x^1, x^2, x^3) = (-x^2, x^1, 0)$. It is straightforward to show that the tangent space to $S^2$ at $(x^1, x^2, x^3)$ is given by $v = (v^1, v^2, v^3) \in \mathbb{R}^3$ so that $v^1 x^1 + v^2 x^2 + v^3 x^3 = 0$. (Proof: $S^2 = \{f = 1\}$ for $f = (x^1)^2 + (x^2)^2 + (x^3)^2$, and so for any local parametrization $\phi$, we have $f \circ \phi = 1$. Thus the Chain Rule shows that $Df(x)(T_x S^2) = 0$, and so $T_x S^2 \subset \ker Df(x)$. They must be equal since both are two-dimensional vector spaces. Then simply compute $\ker Df(x)$.) Since $(-x^2) x^1 + x^1 x^2 + 0 \cdot x^3 = 0$, $v$ is a smooth vector field on $S^2$.
Recall that the coordinate systems of the atlas introduced above are
$$(y^1, y^2) = \phi_1^{-1}(x^1, x^2, x^3) = \left( \frac{x^1}{1 - x^3}, \frac{x^2}{1 - x^3} \right),$$
$$(z^1, z^2) = \phi_2^{-1}(x^1, x^2, x^3) = \left( \frac{x^1}{1 + x^3}, \frac{x^2}{1 + x^3} \right).$$
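On $U_1$, compute at $x = (x^1, x^2, x^3) \in O_1 \subset S^2$,
$$D\phi_1^{-1}(x)(v) = \begin{pmatrix} \frac{1}{1 - x^3} & 0 & \frac{x^1}{(1 - x^3)^2} \\ 0 & \frac{1}{1 - x^3} & \frac{x^2}{(1 - x^3)^2} \end{pmatrix} \begin{pmatrix} -x^2 \\ x^1 \\ 0 \end{pmatrix} = \begin{pmatrix} -\frac{x^2}{1 - x^3} \\ \frac{x^1}{1 - x^3} \end{pmatrix} = \begin{pmatrix} -y^2 \\ y^1 \end{pmatrix}.$$
It turns out that for $x \in O_2$,
$$D\phi_2^{-1}(x)(v) = \begin{pmatrix} -z^2 \\ z^1 \end{pmatrix}$$
as well.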
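In the coordinate charts, these systems can be solved explicitly. For $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, compute the fundamental solution
$$\begin{aligned}
e^{At} &= P e^{tD} P^{-1} \\
&= \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix} \exp\left[ t \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \right] \begin{pmatrix} \frac{1}{2} & \frac{i}{2} \\ \frac{1}{2} & \frac{1}{2i} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix} \begin{pmatrix} \cos t + i \sin t & 0 \\ 0 & \cos t - i \sin t \end{pmatrix} \begin{pmatrix} \frac{1}{2} & \frac{i}{2} \\ \frac{1}{2} & \frac{1}{2i} \end{pmatrix} \\
&= \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}.
\end{aligned}$$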
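The closed form for $e^{At}$ can be sanity-checked numerically. The sketch below sums a truncated power series $\sum_m (At)^m / m!$ in plain Python and compares it with the rotation matrix; this is a different route from the diagonalization used in the text, and the helper names `mat_mul` and `exp_series` are illustrative.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(a, t, terms=30):
    """Approximate e^{At} by the truncated power series sum_m (At)^m / m!."""
    at = [[a[i][j] * t for j in range(2)] for i in range(2)]
    result = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at the identity
    term = [[1.0, 0.0], [0.0, 1.0]]     # current term (At)^m / m!
    for m in range(1, terms):
        term = mat_mul(term, at)
        term = [[term[i][j] / m for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

A = [[0.0, -1.0], [1.0, 0.0]]
t = 0.7
series = exp_series(A, t)
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
max_error = max(abs(series[i][j] - rotation[i][j])
                for i in range(2) for j in range(2))
print(max_error)  # difference between the series and the rotation matrix
```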
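Therefore, for $y \in U_1$, the solution to $\dot y = v(y)$, $y(0) = y_0$ is
$$y(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} y_0. \tag{23}$$
And also, for $z \in U_2$, the solution to $\dot z = v(z)$, $z(0) = z_0$ is
$$z(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} z_0. \tag{24}$$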
Proposition 42 implies that these two flows should be related, since they both correspond to flows on $S^2$. In particular, for $y_0 \in U_{12}$, let $z_0 = \phi_{12}(y_0) = y_0 / |y_0|^2$. Then we check that
$$z(t) = \phi_{12}(y(t))$$
for $y(t)$ from (23) and $z(t)$ from (24). So compute
$$y(t) = \begin{pmatrix} y_0^1 \cos t - y_0^2 \sin t \\ y_0^1 \sin t + y_0^2 \cos t \end{pmatrix} \quad \text{for } y_0 = \begin{pmatrix} y_0^1 \\ y_0^2 \end{pmatrix},$$
$$|y(t)|^2 = (y_0^1 \cos t - y_0^2 \sin t)^2 + (y_0^1 \sin t + y_0^2 \cos t)^2 = |y_0|^2,$$
$$\phi_{12}(y(t)) = \frac{y(t)}{|y(t)|^2} = \frac{1}{|y_0|^2} \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} y_0 = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} z_0 = z(t).$$
Therefore, the flow patches from $U_1$ to $U_2$.
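The patching computation can also be checked numerically. The following sketch takes an arbitrary sample point $y_0$ (an illustrative choice), flows it in the $y$ chart and then applies the gluing map $y \mapsto y/|y|^2$, and compares the result with flowing the glued point in the $z$ chart.

```python
import math

def rotate(p, t):
    """Apply the rotation matrix [[cos t, -sin t], [sin t, cos t]] to p."""
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def phi_12(y):
    """Gluing map y -> y / |y|^2 between the two stereographic charts."""
    r2 = y[0] ** 2 + y[1] ** 2
    return (y[0] / r2, y[1] / r2)

y0 = (0.3, -1.2)          # arbitrary nonzero point in the overlap
z0 = phi_12(y0)
gap = 0.0
for t in (0.0, 0.5, 1.0, 2.5):
    via_y = phi_12(rotate(y0, t))   # flow in the y chart, then change charts
    via_z = rotate(z0, t)           # change charts, then flow in the z chart
    gap = max(gap, abs(via_y[0] - via_z[0]), abs(via_y[1] - via_z[1]))
print(gap)  # largest discrepancy between the two routes
```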

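The flow itself can be represented on $U_1$ by
$$F_t(y) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} y,$$
on $U_2$ by
$$F_t(z) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} z,$$
and even on $S^2 \subset \mathbb{R}^3$ itself by
$$F_t(x) = \exp\left[ \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} t \right] x = \begin{pmatrix} \cos t & -\sin t & 0 \\ \sin t & \cos t & 0 \\ 0 & 0 & 1 \end{pmatrix} x.$$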
Homework Problem 30. Consider the atlas given above for S2 . On U1 ,
consider the vector field
2
v = y 1 y .
y 1 y 2
Show that $D\phi_1 v$ extends to a smooth vector field on all of $S^2$ (i.e., it extends smoothly across $N = S^2 \setminus O_1$). Write down this vector field in the $z$ coordinates on $U_2$ as well. Solve for the flow on $U_1$ and $U_2$, and explicitly check they agree on the overlap $O_{12}$.
3.4 Riemannian metrics

For a vector $v$ at a point $p$ on a manifold $X \subset \mathbb{R}^N$, we can measure the length of $v$ by using the inner product on $\mathbb{R}^N$. So if $v \in T_pX \subset \mathbb{R}^N$, and
$$v = v^a\, \frac{\partial}{\partial y^a}$$
for $y = (y^1, \dots, y^N)$ coordinates on $\mathbb{R}^N$, then the length $|v|$ of $v$ is given by
$$|v|^2 = \sum_{a=1}^N (v^a)^2 = \delta_{ab} v^a v^b$$
for the Kronecker $\delta_{ab} = 1$ if $a = b$ and $\delta_{ab} = 0$ if $a \neq b$. In this usage for computing the length of a tangent vector on $\mathbb{R}^N$, the Kronecker $\delta$ is a Riemannian metric.
(Note we use the following convention for an $n$-dimensional manifold $X \subset \mathbb{R}^N$: use indices $a, b, c$ from 1 to $N$ to represent coordinates in $\mathbb{R}^N$, and use $i, j, k$ from 1 to $n$ to represent local coordinates on $X$.)
On a manifold $X$, a Riemannian metric is a smoothly varying positive definite inner product on $T_pX$ for all $p \in X$. Recall the definitions involved. An inner product on a real vector space $V$ is a pairing $g : V \times V \to \mathbb{R}$ which is bilinear and symmetric. $g$ is bilinear if for every $v \in V$, the maps $g(v, \cdot)$ and $g(\cdot, v)$ from $V$ to $\mathbb{R}$ are linear maps, and $g$ is symmetric if for each $v, w \in V$, $g(v, w) = g(w, v)$. An inner product is positive definite if $g(v, v) \ge 0$ for all $v \in V$ and $g(v, v) = 0$ only if $v = 0$.
If the vector space $V$ has a basis $e_i$, then the inner product $g$ is determined by $g_{ij} = g(e_i, e_j)$, since for any linear combinations $v = v^i e_i$, $w = w^j e_j$, bilinearity shows
$$g(v, w) = g(v^i e_i, w^j e_j) = v^i g(e_i, w^j e_j) = v^i w^j g(e_i, e_j) = v^i w^j g_{ij}.$$
The fact $g$ is symmetric is equivalent to $g_{ij} = g_{ji}$.
Note that a positive definite inner product $g$ provides a way to measure the length of a vector, $|v|_g = \sqrt{g(v, v)}$, and it also provides a measurement of the angle $\theta$ between two nonzero vectors $v$ and $w$:
$$\cos\theta = \frac{g(v, w)}{|v|_g\, |w|_g}.$$
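To make the length and angle formulas concrete, here is a short Python sketch that evaluates them from the components $g_{ij}$. The particular matrix `g` and the helper names are illustrative assumptions, chosen only so that the numbers are easy to check by hand.

```python
import math

def inner(g, v, w):
    """g(v, w) = g_ij v^i w^j for a Gram matrix g given as nested lists."""
    n = len(v)
    return sum(g[i][j] * v[i] * w[j] for i in range(n) for j in range(n))

def length(g, v):
    """|v|_g = sqrt(g(v, v))."""
    return math.sqrt(inner(g, v, v))

def angle(g, v, w):
    """Angle theta with cos(theta) = g(v, w) / (|v|_g |w|_g)."""
    return math.acos(inner(g, v, w) / (length(g, v) * length(g, w)))

g = [[2.0, 1.0], [1.0, 2.0]]   # symmetric, positive definite (illustrative)
e1, e2 = [1.0, 0.0], [0.0, 1.0]
print(length(g, e1))            # approximately sqrt(2)
print(angle(g, e1, e2))         # approximately pi / 3
```

With respect to this $g$, the standard basis vectors are no longer orthonormal: each has length $\sqrt{2}$ and they meet at an angle of $\pi/3$.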
A Riemannian metric on $X$ gives a positive definite inner product on each tangent space $T_pX$. We also require these inner products to vary smoothly as the point $p$ varies in $X$. To describe this, consider a smooth atlas on $X$, and a local coordinate system $(x^1, \dots, x^k)$ around $p$. Then a smooth vector field $v$ can be represented as $v = v^i \frac{\partial}{\partial x^i}$ for the standard local frame $\{\partial/\partial x^i\}$ of the tangent bundle. Then at each point, the inner product $g$ is represented by $g_{ij}(x)$, and
$$g(v, w) = g_{ij} v^i w^j, \qquad v^i = v^i(x),\ w^j = w^j(x),\ g_{ij} = g_{ij}(x).$$
Then $g$ is smoothly varying on $X$ if the functions $g_{ij}$ are smoothly varying on each coordinate chart in the smooth atlas of $X$.
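Euclidean space $\mathbb{R}^N$ has a standard Riemannian metric given by the standard inner product $\delta_{ab}$. As we've seen above, this endows any submanifold $X \subset \mathbb{R}^N$ with a Riemannian metric. In particular, for $v, w \in T_pX \subset \mathbb{R}^N$, we can form $g(v, w)$ using the inner product $\delta_{ab}$. In particular, consider a smooth parametrization $\phi : U \to O \subset X \subset \mathbb{R}^N$. Then $\phi = (\phi^1, \dots, \phi^N)$. A vector field represented by
$$v = v^i\, \frac{\partial}{\partial x^i}$$
on $U \subset \mathbb{R}^n$ is represented by
$$D\phi(x)(v) = \frac{\partial \phi^a}{\partial x^i}(x)\, v^i(x) \in T_{\phi(x)}X \subset \mathbb{R}^N.$$
$D\phi(x)(v)$ is called the push-forward of $v$ under the map $\phi$. For $v, w \in T_pX$, we may define the metric
$$g_{ij} v^i w^j = g(v, w) = \left( \frac{\partial \phi^a}{\partial x^i} v^i \right) \left( \frac{\partial \phi^b}{\partial x^j} w^j \right) \delta_{ab} = \delta_{ab}\, \frac{\partial \phi^a}{\partial x^i} \frac{\partial \phi^b}{\partial x^j}\, v^i w^j.$$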
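Therefore, the Euclidean inner product on $\mathbb{R}^N$ induces the Riemannian metric on $X$ locally given by the formula
$$g\left( \frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j} \right) = g_{ij} = \delta_{ab}\, \frac{\partial \phi^a}{\partial x^i} \frac{\partial \phi^b}{\partial x^j}. \tag{25}$$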
Given a real vector space $V$, the dual vector space $V^*$ is given by the set of all linear functions from $V$ to $\mathbb{R}$. It is easy to check $V^*$ is a vector space. If $V$ has a basis $\{e_i\}$, then there is a dual basis $\{\theta^i\}$ of $V^*$, which is defined as follows:
$$\theta^i(e_j) = \delta^i_j.$$
Given a local coordinate frame $\{\partial/\partial x^i\}$ of $TX$, the local frame on the dual space is written as $\{dx^i\}$. Each $dx^i$ is called a differential. The dual space $T_p^*X$ of $T_pX$ is called the cotangent space of $X$ at $p$.
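Lemma 45. If $y = y(x)$ is a coordinate change as in (22), then
$$dy^j = \frac{\partial y^j}{\partial x^i}\, dx^i.$$
Proof. Write $dy^j = \alpha^j_\ell\, dx^\ell$. Then we have
$$\delta^j_i = dy^j\left( \frac{\partial}{\partial y^i} \right) = \alpha^j_\ell\, dx^\ell\left( \frac{\partial x^k}{\partial y^i} \frac{\partial}{\partial x^k} \right) = \alpha^j_\ell\, \frac{\partial x^k}{\partial y^i}\, dx^\ell\left( \frac{\partial}{\partial x^k} \right) = \alpha^j_\ell\, \frac{\partial x^k}{\partial y^i}\, \delta^\ell_k = \alpha^j_k\, \frac{\partial x^k}{\partial y^i}.$$
Therefore, $(\alpha^j_k)$ is the inverse matrix of $\left( \frac{\partial x^k}{\partial y^i} \right)$, and so $\alpha^j_k = \frac{\partial y^j}{\partial x^k}$.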
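A Riemannian metric can be naturally written as
$$g_{k\ell}\, dy^k\, dy^\ell = g_{k\ell}\, \frac{\partial y^k}{\partial x^i} \frac{\partial y^\ell}{\partial x^j}\, dx^i\, dx^j.$$
This makes sense because the natural pairing
$$dx^i\left( \frac{\partial}{\partial x^j} \right) = \delta^i_j$$
between the tangent and cotangent spaces implies that
$$g(v, w) = g_{ij}\, dx^i\left( v^k \frac{\partial}{\partial x^k} \right) dx^j\left( w^\ell \frac{\partial}{\partial x^\ell} \right) = g_{ij} (v^k \delta^i_k)(w^\ell \delta^j_\ell) = g_{k\ell} v^k w^\ell.$$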
A Riemannian metric is an example of a tensor on $X$. The tensor product $V \otimes W$ of two real vector spaces with bases respectively $\sigma_i$ and $\rho_j$ is the real vector space formed from the basis
$$\{\sigma_i \otimes \rho_j\}.$$
This implies
$$\dim V \otimes W = (\dim V)(\dim W).$$
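A tensor of type $(k, \ell)$ on a manifold $X$ assigns to each point $p \in X$ an element of
$$(T_pX)^{\otimes k} \otimes (T_p^*X)^{\otimes \ell},$$
which has as its basis
$$\frac{\partial}{\partial x^{i_1}} \otimes \cdots \otimes \frac{\partial}{\partial x^{i_k}} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_\ell}.$$
Locally, we write a tensor as
$$\tau^{i_1 \cdots i_k}_{j_1 \cdots j_\ell}\, \frac{\partial}{\partial x^{i_1}} \otimes \cdots \otimes \frac{\partial}{\partial x^{i_k}} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_\ell},$$
or simply as $\tau^{i_1 \cdots i_k}_{j_1 \cdots j_\ell}$. We say $\tau$ is smooth if each $\tau^{i_1 \cdots i_k}_{j_1 \cdots j_\ell}$ is smooth locally for all coordinates in a smooth atlas of $X$.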

A Riemannian metric is then a smooth symmetric $(0, 2)$ tensor on a manifold $X$. Since the product is symmetric, we omit the $\otimes$ and simply write $g_{ij}\, dx^i\, dx^j$ for a Riemannian metric in local coordinates $x$. (There are also antisymmetric $(0, k)$ tensors, or $k$-forms, for which the tensor product $\otimes$ is replaced by $\wedge$.)
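Example 13. For $S^2$, in the local coordinate given by stereographic projection, recall the coordinate chart $\phi = \phi_1$:
$$\phi(y^1, y^2) = \left( \frac{2y^1}{|y|^2 + 1}, \frac{2y^2}{|y|^2 + 1}, \frac{|y|^2 - 1}{|y|^2 + 1} \right),$$
and the Riemannian metric induced from $\mathbb{R}^3$ is
$$\begin{aligned}
g_{ij}\, dy^i\, dy^j &= \delta_{ab}\, \frac{\partial \phi^a}{\partial y^i} \frac{\partial \phi^b}{\partial y^j}\, dy^i\, dy^j \\
&= \delta_{ab}\, d\phi^a\, d\phi^b \\
&= d\phi^1\, d\phi^1 + d\phi^2\, d\phi^2 + d\phi^3\, d\phi^3 \\
&= \left( \frac{-2(y^1)^2 + 2(y^2)^2 + 2}{(|y|^2 + 1)^2}\, dy^1 - \frac{4 y^1 y^2}{(|y|^2 + 1)^2}\, dy^2 \right)^2 \\
&\quad + \left( -\frac{4 y^1 y^2}{(|y|^2 + 1)^2}\, dy^1 + \frac{2(y^1)^2 - 2(y^2)^2 + 2}{(|y|^2 + 1)^2}\, dy^2 \right)^2 \\
&\quad + \left( \frac{4 y^1}{(|y|^2 + 1)^2}\, dy^1 + \frac{4 y^2}{(|y|^2 + 1)^2}\, dy^2 \right)^2 \\
&= \frac{4}{(|y|^2 + 1)^2}\, (dy^1\, dy^1 + dy^2\, dy^2).
\end{aligned}$$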
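The final formula of this example can be checked numerically from formula (25), $g_{ij} = \delta_{ab}\, \partial_i \phi^a\, \partial_j \phi^b$. The sketch below approximates the partial derivatives of $\phi$ by central differences at one sample point; the step size `h` and the sample point are illustrative assumptions.

```python
import math

def phi(y1, y2):
    """Inverse stereographic projection from the plane onto S^2."""
    d = y1 * y1 + y2 * y2 + 1.0
    return (2.0 * y1 / d, 2.0 * y2 / d, (d - 2.0) / d)

def induced_metric(y1, y2, h=1e-6):
    """g_ij = delta_ab (d phi^a/d y^i)(d phi^b/d y^j), by central differences."""
    plus1, minus1 = phi(y1 + h, y2), phi(y1 - h, y2)
    plus2, minus2 = phi(y1, y2 + h), phi(y1, y2 - h)
    col1 = [(p - m) / (2 * h) for p, m in zip(plus1, minus1)]
    col2 = [(p - m) / (2 * h) for p, m in zip(plus2, minus2)]
    cols = (col1, col2)
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(2)]
            for i in range(2)]

y1, y2 = 0.4, -0.9
g = induced_metric(y1, y2)
conformal_factor = 4.0 / (y1 * y1 + y2 * y2 + 1.0) ** 2
print(g)                 # close to conformal_factor times the identity
print(conformal_factor)
```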
Note in the previous example, we used the formula for differentials
$$d\phi^a = \frac{\partial \phi^a}{\partial y^i}\, dy^i.$$
It is also useful to have the following notation: If $h = h_{ab}\, dz^a\, dz^b$ is a Riemannian metric on $Z$, and $\phi : Y \to Z$ is a smooth map, then we denote the pullback metric
$$\phi^* h = h_{ab}(\phi)\, d\phi^a\, d\phi^b$$
on $Y$. Thus in the construction above, if $\delta = \delta_{ab}\, dx^a\, dx^b$ is the Euclidean metric on $\mathbb{R}^N$, then the metric $g$ induced on a submanifold $\iota : X \hookrightarrow \mathbb{R}^N$ is the pullback $\iota^* \delta$.
Homework Problem 31. Let $\phi : X \to Y$ be a smooth map of manifolds. Let $Y$ have a Riemannian metric $h$ on it. Show that $\phi^* h$ is a Riemannian metric on $X$ if and only if the tangent map $D\phi(x) : T_xX \to T_{\phi(x)}Y$ is injective for every $x \in X$. (In this case $\phi$ is called an immersion.)
Hint: Do the calculations in local coordinates on $X$ and $Y$. The key point to check is whether $\phi^* h$ is positive definite. Show $\phi^* h(x)$ is 0 on the kernel of $D\phi(x)$.
Note in the previous example, we considered the Riemannian metric on $S^2$ pulled back from the Euclidean metric on $\mathbb{R}^3$. It is possible to write down other Riemannian metrics as well.
Example 14. Consider hyperbolic space
$$H^n = \{x = (x^1, \dots, x^n) \in \mathbb{R}^n : x^n > 0\}$$
equipped with the Riemannian metric
$$\frac{dx^1\, dx^1 + \cdots + dx^n\, dx^n}{(x^n)^2}.$$
A famous theorem of John Nash shows that for every Riemannian metric $g$ on a smooth manifold $X$, there is an embedding $i : X \to \mathbb{R}^N$ so that $g$ is induced from the standard metric on $\mathbb{R}^N$. (Although it is not in most cases obvious what the embedding is.)
3.5 Vector bundles and tensors

In order to explain better what tensors are, we introduce the idea of a vector bundle. The tangent bundle $TX$ of a smooth $n$-dimensional manifold $X$ is a vector bundle. Recall there is a map
$$\pi : TX \to X.$$
The fiber $\pi^{-1}(p) = T_pX$ over a point $p \in X$ is an $n$-dimensional vector space. Moreover, over each coordinate neighborhood $O \subset X$ with coordinates $\{x^1, \dots, x^n\}$, $\pi^{-1}O$ is diffeomorphic to $O \times \mathbb{R}^n$, the diffeomorphism being
$$(p, v) \mapsto (p, v^1, \dots, v^n)$$
for $p \in O$, $v = v^i \frac{\partial}{\partial x^i} \in T_pX$.
We generalize these properties of $TX$ to define a vector bundle. A vector bundle of rank $k$ over a manifold $X$ is given by an $(n + k)$-dimensional manifold $V$ with a smooth map $\pi : V \to X$. $V$ is called the total space of the vector bundle. Every point in $X$ has a neighborhood $O$ so that $\pi^{-1}O$ is diffeomorphic to $O \times \mathbb{R}^k$. Under this diffeomorphism, $\pi$ is simply the natural projection from $O \times \mathbb{R}^k \to O$. Thus vector bundles are locally trivial, in that each vector bundle is locally a product of a neighborhood times $\mathbb{R}^k$. Note that each diffeomorphism
$$\pi^{-1}O \to O \times \mathbb{R}^k$$
provides for each $p \in O$ a basis of the vector space $\pi^{-1}(p)$ by taking the preimage of the standard basis of $\mathbb{R}^k$ under the diffeomorphism. Such a smoothly varying basis is called a local frame of the vector bundle over $O$.

Given a gluing map $y = y(x)$ of two small coordinate neighborhoods $O_x$ and $O_y$ in $X$, there is a corresponding gluing map of $O_x \times \mathbb{R}^k$ and $O_y \times \mathbb{R}^k$. We require this gluing map to be of the form
$$(x, v) \mapsto (y(x), A(x) v)$$
for $v$ a vector in $\mathbb{R}^k$ and $A(x)$ a smoothly varying nonsingular matrix in $x$. Therefore, above each point $p$, if we change coordinates from $x$ to $y$, the frame changes by the matrix $A(x)$. $A(x)$ is a transition function of the vector bundle $V$. So the transition functions act on the fibers of a vector bundle as linear isomorphisms. This preserves the vector-space structure on each fiber when changing coordinates.
Remark. We have defined real vector bundles of rank $k$, for which each fiber is diffeomorphic to $\mathbb{R}^k$. We may also define complex vector bundles with fibers diffeomorphic to $\mathbb{C}^k$.
A section of a vector bundle $\pi : V \to X$ is a map $s : X \to V$ satisfying $\pi(s(p)) = p$ for all $p \in X$. So for each $p \in X$, $s(p)$ is an element of the vector space $\pi^{-1}(p)$. A vector field is precisely a section of the tangent bundle. Locally, $k$ sections which are linearly independent on each fiber form a frame of the vector bundle. For example, $\{\partial/\partial x^i\}$ are $n$ linearly independent sections of the tangent bundle over a coordinate chart.
Since vector bundles preserve the linear structure on each fiber, we may do linear algebra on the fibers to create new vector bundles. In particular, we can take duals and tensor products of the fiber space to form new vector bundles. The tensor bundle of type $(k, \ell)$ over an $n$-dimensional manifold $X$ is the vector bundle of rank $n^{k + \ell}$ with the fiber over $p$ given by
$$(T_pX)^{\otimes k} \otimes (T_p^*X)^{\otimes \ell}.$$
Over each coordinate chart, the natural frame of the tensor bundle is
$$\frac{\partial}{\partial x^{i_1}} \otimes \cdots \otimes \frac{\partial}{\partial x^{i_k}} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_\ell}$$
for $i_1, \dots, i_k, j_1, \dots, j_\ell \in \{1, \dots, n\}$. The transition functions of a tensor bundle are determined by the formulas
$$\frac{\partial}{\partial x^i} = \frac{\partial y^k}{\partial x^i} \frac{\partial}{\partial y^k}, \qquad dx^j = \frac{\partial x^j}{\partial y^\ell}\, dy^\ell.$$
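For example, the transition functions for the $(0, 2)$ tensor bundle are given by
$$dx^i \otimes dx^j = \frac{\partial x^i}{\partial y^k} \frac{\partial x^j}{\partial y^\ell}\, dy^k \otimes dy^\ell.$$
Note we can view
$$\frac{\partial x^i}{\partial y^k} \frac{\partial x^j}{\partial y^\ell}$$
as a nonsingular $n^2 \times n^2$ matrix, which is the tensor product of the matrix
$$\frac{\partial x^i}{\partial y^k}$$
with itself.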
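The $n^2 \times n^2$ matrix described above is the Kronecker product of the Jacobian with itself. As a hedged illustration for $n = 2$, the sketch below transforms the components of a $(0, 2)$ tensor in two ways: directly by $g_{k\ell} = g_{ij}\, M^i_k M^j_\ell$, and by applying the $4 \times 4$ Kronecker matrix to the flattened components. The matrices `M` and `g_x` are made-up sample values, and `kron` is a hand-written helper for square matrices rather than a library call.

```python
def kron(a, b):
    """Kronecker product of square matrices: entry ((i, j), (k, l)) is a[i][k] * b[j][l]."""
    n, m = len(a), len(b)
    return [[a[i][k] * b[j][l] for k in range(n) for l in range(m)]
            for i in range(n) for j in range(m)]

def transform_direct(g, m):
    """Components in y: g_kl = g_ij M^i_k M^j_l, i.e. the matrix M^T g M."""
    n = len(g)
    return [[sum(g[i][j] * m[i][k] * m[j][l]
                 for i in range(n) for j in range(n))
             for l in range(n)] for k in range(n)]

M = [[2.0, 1.0], [0.5, 3.0]]     # illustrative nonsingular Jacobian dx^i/dy^k
g_x = [[1.0, 0.2], [0.2, 4.0]]   # illustrative symmetric components in x

K = kron(M, M)                   # the 4 x 4 matrix (n^2 x n^2 with n = 2)
flat = [g_x[i][j] for i in range(2) for j in range(2)]
via_kron = [sum(flat[r] * K[r][c] for r in range(4)) for c in range(4)]
direct = transform_direct(g_x, M)
direct_flat = [direct[k][l] for k in range(2) for l in range(2)]
print(via_kron)
print(direct_flat)
```

The two printed lists agree up to rounding, which is the statement that the transition function of the $(0, 2)$ tensor bundle acts through the tensor product of the Jacobian with itself.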
A smooth tensor of type $(k, \ell)$ is a smooth section of the $(k, \ell)$ tensor bundle. Thus a Riemannian metric is a smooth symmetric, positive-definite $(0, 2)$ tensor.
3.6 Integration and densities

We begin by introducing the Change of Variables Formula for multiple integrals:
Theorem 16 (Change of Variables). Let $\Omega \subset \mathbb{R}^n$ be an open set, and let $g : \Omega \to \mathbb{R}^n$ be one-to-one and locally $C^1$. Then for every $L^1$ function $f$ on $g(\Omega)$ with Lebesgue measure $dx$ and $dy$,
$$\int_{g(\Omega)} f(y)\, dy = \int_\Omega f(g(x))\, |\det Dg(x)|\, dx.$$
Proof. See Spivak, Calculus on Manifolds.

Here is another useful concept. Given an open cover $\{O_\alpha\}$ of a smooth manifold $X$, a partition of unity subordinate to the cover is a collection of smooth functions $\rho_\beta : X \to \mathbb{R}$ satisfying
1. $\rho_\beta(x) \in [0, 1]$.
2. For each $\beta$, there is an $\alpha$ so that $\operatorname{supp}(\rho_\beta) \subset O_\alpha$.
3. Every $x \in X$ has a neighborhood which intersects only finitely many of the supports of the $\rho_\beta$.
4. $\sum_\beta \rho_\beta(x) = 1$.
Proposition 46. For every open cover of a smooth manifold X, there exists
a subordinate partition of unity.
For a proof, see Spivak or Guillemin and Pollack.
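As a concrete one-dimensional illustration (not the general proof), the sketch below builds a partition of unity on the interval $[0, 3]$ by normalizing bump functions. The cover is assumed to be the open intervals $(c - 1.5, c + 1.5)$ for $c = 0, 1, 2, 3$, and each bump is supported in $[c - 1, c + 1]$; the centers, widths, and function names are all illustrative choices.

```python
import math

def f(x):
    """The smooth function e^{-1/x} for x > 0, and 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

CENTERS = [0.0, 1.0, 2.0, 3.0]

def bump(x, c):
    """Smooth, positive on (c - 1, c + 1), zero elsewhere."""
    return f(x - (c - 1.0)) * f((c + 1.0) - x)

def rho(x):
    """Values rho_0(x), ..., rho_3(x) of the normalized bumps at x in [0, 3]."""
    values = [bump(x, c) for c in CENTERS]
    total = sum(values)          # positive on [0, 3] since the supports overlap
    return [v / total for v in values]

for x in (0.0, 0.25, 1.5, 2.9, 3.0):
    r = rho(x)
    print(x, r, sum(r))          # each row of values sums to 1
```

Each normalized function takes values in $[0, 1]$, has support inside one set of the cover, and the values at every point of $[0, 3]$ sum to 1.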
Theorem 17. A Riemannian metric g on a manifold X provides a measure
on X called the Riemannian density.
The construction of this measure follows below, along with a sketch of a
proof.
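Let $\{O_\alpha, \phi_\alpha, U_\alpha\}$ be a smooth atlas of $X$. A function $f : X \to \mathbb{R}$ is measurable if each $f \circ \phi_\alpha : U_\alpha \to \mathbb{R}$ is measurable. For a Riemannian metric $g$ on $X$, the density $dV_g$ is defined first for measurable functions $f : X \to \mathbb{R}$ whose supports are contained in some $O_\alpha$. In this case, define
$$\int_X f\, dV_g = \int_{O_\alpha} f\, dV_g = \int_{U_\alpha} f(x)\, \sqrt{\det g_{ij}(x)}\, dx$$
for local coordinate $x$ on $O_\alpha$ and Lebesgue measure $dx$ on $U_\alpha \subset \mathbb{R}^n$.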

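The key point is to make sure this definition makes sense for functions $f$ whose support is contained in two open charts $O_\alpha$ and $O_\beta$. As above, let $x$ be the local coordinates on $O_\alpha$, and let $y$ be the coordinates on $O_\beta$. Then we use the rule (25) for changing $g_{ij}$ under a change $y = y(x)$ and the Change of Variables Theorem 16 to show
$$\begin{aligned}
\int_{U_\beta} f(y)\, \sqrt{\det g_{ij}(y)}\, dy &= \int_{U_\alpha} f(x)\, \sqrt{\det g_{ij}(y)}\, \left| \det \frac{\partial y^i}{\partial x^j} \right| dx \\
&= \int_{U_\alpha} f(x)\, \sqrt{\det\left( g_{k\ell}(x)\, \frac{\partial x^k}{\partial y^i} \frac{\partial x^\ell}{\partial y^j} \right)}\, \left| \det \frac{\partial y^i}{\partial x^j} \right| dx \\
&= \int_{U_\alpha} f(x)\, \sqrt{\det g_{k\ell}(x)}\, \left| \det \frac{\partial x^k}{\partial y^i} \right| \left| \det \frac{\partial y^i}{\partial x^j} \right| dx \\
&= \int_{U_\alpha} f(x)\, \sqrt{\det g_{k\ell}(x)}\, dx.
\end{aligned}$$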
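Let $\{\rho_\beta\}$ be a partition of unity subordinate to the atlas $\{O_\alpha\}$ of $X$. For any measurable subset $\Omega \subset X$, consider its characteristic function $\chi_\Omega$. Then
$$V_g(\Omega) = \int_X \chi_\Omega\, dV_g = \sum_\beta \int_X \rho_\beta\, \chi_\Omega\, dV_g.$$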
The calculation in the previous paragraph can be used to ensure that this definition is independent of the atlas and partition of unity used. It is straightforward to check that $dV_g$ defines a measure on $X$. Then for any $L^1$ function $f$ on $X$ (measured by $dV_g$ of course),
$$\int_X f\, dV_g = \sum_\beta \int_X \rho_\beta\, f\, dV_g.$$
Homework Problem 32. Check that Vg is a measure on X.

Remark. To complete a proof of Theorem 17, it is necessary to check that the definition depends only on $g$ and not on the atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ or the partition of unity $\{\rho_\beta\}$ subordinate to the open cover $\{O_\alpha\}$.
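If $\Omega$ is a domain in $\mathbb{R}^n$ with smooth boundary, then the measure on the boundary $\partial\Omega$ is given by the restriction of the Riemannian metric on $\mathbb{R}^n$ to $\partial\Omega$. (So this gives a Riemannian metric on $\partial\Omega$, and thus a density as above.) If $\partial\Omega$ is locally given by the graph of a function $(x^1, \dots, x^{n-1}, f(x^1, \dots, x^{n-1}))$, then
$$\phi(x^1, \dots, x^{n-1}) = (x^1, \dots, x^{n-1}, f(x^1, \dots, x^{n-1}))$$
is a local parametrization of the $(n-1)$-dimensional manifold $\partial\Omega \subset \mathbb{R}^n$. Its tangent map is the $n \times (n-1)$ matrix
$$D\phi = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ f_{,1} & f_{,2} & \cdots & f_{,n-1} \end{pmatrix}.$$
Then the pullback metric is
$$\begin{aligned}
\sum_{i,j=1}^{n-1} g_{ij}\, dx^i\, dx^j = \phi^* \delta &= \delta_{ab}\, d\phi^a\, d\phi^b \\
&= (dx^1)^2 + \cdots + (dx^{n-1})^2 + (f_{,1}\, dx^1 + \cdots + f_{,n-1}\, dx^{n-1})^2.
\end{aligned}$$
As a matrix,
$$(g_{ij}) = (\delta_{ij} + f_{,i} f_{,j}).$$
In order to compute the volume form, we should compute $\det g_{ij}$. Fortunately, it is easy to compute in this case
$$\det g = 1 + |df|^2 = 1 + f_{,1}^2 + \cdots + f_{,n-1}^2$$
(see Problem 33 below). So the density is
$$dV_g = \sqrt{1 + |df|^2}\, dx_{n-1}$$
for $dx_{n-1}$ Lebesgue measure on $\mathbb{R}^{n-1}$.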
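The determinant formula $\det(\delta_{ij} + f_{,i} f_{,j}) = 1 + |df|^2$ can be spot-checked numerically before proving it. The sketch below uses a small recursive determinant and made-up sample values for the partial derivatives $f_{,i}$; it is a numerical check at one point, not a proof.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def graph_metric(df):
    """(g_ij) = (delta_ij + f_,i f_,j) for the gradient components df of f."""
    n = len(df)
    return [[(1.0 if i == j else 0.0) + df[i] * df[j] for j in range(n)]
            for i in range(n)]

df = [0.5, -2.0, 1.5]                  # illustrative values of f_,1, f_,2, f_,3
g = graph_metric(df)
lhs = det(g)
rhs = 1.0 + sum(c * c for c in df)     # 1 + |df|^2
print(lhs, rhs)
```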
Homework Problem 33. For $w$ an $n$-dimensional column vector, and $I$ the $n \times n$ identity matrix, show that $\det(I + w w^\top) = 1 + |w|^2$.
Hint: Show that $I + w w^\top$ can be diagonalized, with one eigenvalue $1 + |w|^2$, and with the eigenvalue 1 repeated $n - 1$ times. (For this last step, show that on the $(n-1)$-dimensional space orthogonal to the natural $(1 + |w|^2)$-eigenvector, $I + w w^\top$ acts as the identity. What is a natural eigenvector to try?)

For a function $f : \Omega \to \mathbb{R}$, the differential, or one-form, is $df = \frac{\partial f}{\partial x^i}\, dx^i$. Under a change of coordinates $y = y(x)$, $df$ transforms via the chain rule:
$$\frac{\partial f}{\partial y^j}\, dy^j = df = \frac{\partial f}{\partial x^i}\, dx^i = \frac{\partial f}{\partial y^j} \frac{\partial y^j}{\partial x^i}\, dx^i.$$
In particular, this gives the formula for differentials (cf. Lemma 45)
$$dy^j = \frac{\partial y^j}{\partial x^i}\, dx^i.$$
It also shows that for each $p \in X$ a manifold, we can think of $df(p) \in T_p^*X$, the cotangent space. This is investigated further in the following problem:
Homework Problem 34. If $f$ is a smooth function on $X$ and $v$ is a smooth vector field, show that at each point $p \in X$,
$$(vf)(p) = df(p)(v(p)).$$
(In the expression on the right, consider $df(p)$ as an element of the dual space $T_p^*X$.)
Hint: Check it in a single coordinate chart.
On a Riemannian manifold $(X, g)$ (i.e., $g$ is a Riemannian metric on the manifold $X$), for each smooth function $f$, there is a vector field called the gradient of $f$. We define the gradient $\nabla f$ in local coordinates to be
$$(\nabla f)^i = g^{ij} f_{,j}, \qquad g^{k\ell} g_{\ell m} = \delta^k_m.$$
(So $g^{ij}$ is the inverse of the matrix $g_{ij}$.) Note that the Einstein convention with one index up (typically) indicates that $\nabla f$ is a vector field.
Homework Problem 35. Show that $\nabla f$ transforms as a vector field under coordinate changes. In other words, check that if $y = y(x)$,
$$(\nabla f)^j(y) = (\nabla f)^i(x)\, \frac{\partial y^j}{\partial x^i}$$
as in (22).
Hint: First check how the inverse of the metric $g^{ij}$ transforms. Note that in the definition $g^{ij} g_{jk} = \delta^i_k$, $\delta^i_k$ is independent of coordinate changes.
In the case of Euclidean space, it is common to use the gradient of a function instead of its differential. In this case, $(\nabla f)^b = \delta^{ab} f_{,a}$. Note that on any Riemannian manifold
$$|df|^2 = g^{ab} f_{,a} f_{,b} = g^{ac} g_{cd}\, g^{db} f_{,a} f_{,b} = g_{cd}\, (\nabla f)^c (\nabla f)^d = |\nabla f|^2.$$
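The identity $|df|^2 = |\nabla f|^2$ can be illustrated with numbers. The sketch below takes made-up components $g_{ij}$ and $f_{,j}$ at a single point, raises the index with the inverse metric to get $(\nabla f)^i = g^{ij} f_{,j}$, and evaluates both norms; the sample values and the helper `inverse_2x2` are illustrative assumptions.

```python
def inverse_2x2(g):
    """Inverse of a 2x2 matrix, giving the components g^{ij}."""
    d = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / d, -g[0][1] / d], [-g[1][0] / d, g[0][0] / d]]

g = [[2.0, 1.0], [1.0, 3.0]]   # illustrative metric components g_ij at a point
df = [0.7, -1.1]               # illustrative partial derivatives f_,1 and f_,2

g_inv = inverse_2x2(g)
# (grad f)^i = g^{ij} f_,j
grad = [sum(g_inv[i][j] * df[j] for j in range(2)) for i in range(2)]
# |df|^2 = g^{ab} f_,a f_,b
norm_df_sq = sum(g_inv[a][b] * df[a] * df[b]
                 for a in range(2) for b in range(2))
# |grad f|^2 = g_cd (grad f)^c (grad f)^d
norm_grad_sq = sum(g[c][d] * grad[c] * grad[d]
                   for c in range(2) for d in range(2))
print(grad, norm_df_sq, norm_grad_sq)
```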
Let $v = v^i \frac{\partial}{\partial x^i}$ be a vector field on a domain in $\mathbb{R}^n$. Then the divergence of $v$ is a function defined to be
$$\nabla \cdot v = \frac{\partial v^i}{\partial x^i}.$$
The divergence of a vector field may also be defined on Riemannian manifolds, but the definition is somewhat more involved.
Here is another important theorem, which is a consequence of Stokes's Theorem (see Spivak, Guillemin and Pollack, or Taylor). We only state it for domains in $\mathbb{R}^n$, and not in its more general context of compact manifolds with boundary.
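Theorem 18 (Divergence Theorem). Let $\Omega \subset \mathbb{R}^n$ be a domain with smooth boundary $\partial\Omega$. Then for any $C^1$ vector field $v$ on $\bar\Omega$,
$$\int_\Omega \nabla \cdot v\, dx_n = \int_{\partial\Omega} v \cdot n\, dV.$$
(Here $n$ is the unit outward normal vector field to $\partial\Omega$, and $dV$ is the measure on $\partial\Omega$ induced from the Euclidean metric.)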
Remark. The way we have put the integration depends on the Euclidean metric (to form the dot product, $dV$ and $n$). In the general form of Stokes's Theorem, it is unnecessary to use the metric. (We may recast $\nabla \cdot v$ and $v$ as differential forms.)
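Idea of proof. We do the computation in a very special case, for $v$ having compact support in $\bar\Omega$, which is the lower half-space $\{x = (x^1, \dots, x^n) \in \mathbb{R}^n : x^n \le 0\}$.

In this case the unit normal vector is $n = (0, \dots, 0, 1)$ and $dV = dx_{n-1}$, Lebesgue measure on $\mathbb{R}^{n-1} = \{x^n = 0\}$. Then, using Fubini's Theorem, we want to prove
$$\int_{-\infty}^\infty \cdots \int_{-\infty}^\infty \int_{-\infty}^0 \frac{\partial v^i}{\partial x^i}\, dx^n\, dx^{n-1} \cdots dx^1 = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty v^n\, dx^{n-1} \cdots dx^1,$$
where $v^n$ on the right is evaluated at $x^n = 0$. Note that the left-hand integrand is a sum from $i = 1$ to $n$. For $i = n$, compute
$$\int_{-\infty}^0 \frac{\partial v^n}{\partial x^n}\, dx^n = v^n(x^1, \dots, x^{n-1}, 0) - \lim_{t \to -\infty} v^n(x^1, \dots, x^{n-1}, t) = v^n(x^1, \dots, x^{n-1}, 0)$$
since $v$ has compact support. On the other hand, for each fixed $i \neq n$ (no sum over $i$),
$$\int_{-\infty}^\infty \frac{\partial v^i}{\partial x^i}\, dx^i = 0$$
since $v$ has compact support. Therefore, using Fubini's Theorem, for each $i \neq n$, we can integrate $\partial v^i / \partial x^i$ with respect to $x^i$ first to get zero. The remaining term is the case $i = n$, and so
$$\begin{aligned}
\int_{-\infty}^\infty \cdots \int_{-\infty}^\infty \int_{-\infty}^0 \frac{\partial v^i}{\partial x^i}\, dx^n\, dx^{n-1} \cdots dx^1 &= \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty \int_{-\infty}^0 \frac{\partial v^n}{\partial x^n}\, dx^n\, dx^{n-1} \cdots dx^1 \\
&= \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty v^n\, dx^{n-1} \cdots dx^1.
\end{aligned}$$
This proves the Divergence Theorem in this special case.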

The general case can be reduced to this special case by using a partition of unity and the Implicit Function Theorem (see Spivak). In particular, near each point in $\partial\Omega$, there is a local diffeomorphism of $\bar\Omega$ to the lower half-space, sending the boundary to the boundary. Together with open subsets of $\Omega$, these form an open cover of the compact $\bar\Omega$, and so we may take a finite subcover, and a partition of unity subordinate to this subcover. Then we can apply the above special case to $\rho_\beta v$ for $\rho_\beta$ in the partition of unity and $v$ the vector field.

It is also necessary to make sure that the various terms in the integrals transform well with respect to the local diffeomorphisms. This can be checked directly, but it is better to use the language of differential forms (see Spivak or Guillemin and Pollack).
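A quick numerical experiment can make the Divergence Theorem concrete. The sketch below takes $\Omega$ to be the unit disk in $\mathbb{R}^2$ and the sample field $v = ((x^1)^3, (x^2)^3)$, both illustrative choices, and compares midpoint-rule approximations of the two sides. For this field both integrals equal $3\pi/2$ exactly.

```python
import math

def divergence(x, y):
    """Divergence of the illustrative field v = (x^3, y^3)."""
    return 3.0 * x * x + 3.0 * y * y

def v_dot_n(theta):
    """v . n on the unit circle, where n = (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return c ** 3 * c + s ** 3 * s

def interior_integral(nr=400, nt=400):
    """Midpoint rule for the integral of div v over the unit disk (polar coordinates)."""
    dr, dt = 1.0 / nr, 2.0 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dt
            # r * dr * dt is the area element in polar coordinates
            total += divergence(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

def boundary_integral(nt=400):
    """Midpoint rule for the integral of v . n over the unit circle."""
    dt = 2.0 * math.pi / nt
    return sum(v_dot_n((j + 0.5) * dt) for j in range(nt)) * dt

lhs, rhs = interior_integral(), boundary_integral()
print(lhs, rhs, 1.5 * math.pi)
```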
Homework Problem 36. Let $\Omega$ be a domain in $\mathbb{R}^n$ with smooth boundary. On a neighborhood $N \subset \mathbb{R}^n$ of a point in the boundary $\partial\Omega$, assume that
$$\Omega \cap N = \{x \in N : x^n < f(x^1, \dots, x^{n-1})\}$$
so that $\Omega$ is locally the region under the graph of a smooth function $f$. Compute $n$ and $dV$. For a smooth vector field $v$, compute
$$\int_{\partial\Omega \cap N} v \cdot n\, dV$$
in terms of the integral of a function times Lebesgue measure on $\mathbb{R}^{n-1}$.
Hint: Locally, $\partial\Omega$ is a submanifold of $\mathbb{R}^n$ which is the image of
$$\phi(x^1, \dots, x^{n-1}) = (x^1, \dots, x^{n-1}, f(x^1, \dots, x^{n-1})).$$
Show that $n$ is proportional to $\nabla\psi$, for
$$\psi(x^1, \dots, x^n) = x^n - f(x^1, \dots, x^{n-1}).$$
Your answer should be of the form
$$\int_{\phi^{-1}(\partial\Omega \cap N)} h\, dx_{n-1}$$
for $h$ a function of $x^1, \dots, x^{n-1}$.

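Corollary 47 (Integration by Parts). Let $\Omega \subset \mathbb{R}^n$ be a domain with smooth boundary $\partial\Omega$. Then for any $C^1$ vector field $v$ on $\bar\Omega$ and $C^1$ function $f$ on $\bar\Omega$,
$$\int_\Omega v \cdot \nabla f\, dx_n = -\int_\Omega f\, \nabla \cdot v\, dx_n + \int_{\partial\Omega} f\, v \cdot n\, dV.$$
Proof. It is easy to check that $\nabla \cdot (f v) = (\nabla f) \cdot v + f\, \nabla \cdot v$, and
$$\int_\Omega \nabla \cdot (f v)\, dx_n = \int_{\partial\Omega} f\, v \cdot n\, dV.$$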
3.7 The $\epsilon$-Neighborhood Theorem
Theorem 19. Let $X \subset \mathbb{R}^n$ be a compact $k$-dimensional manifold. Then there is an $\epsilon > 0$ so that for
$$X_\epsilon = X + B_\epsilon(0) = \{y \in \mathbb{R}^n : \text{there is an } x \in X \text{ so that } |x - y| < \epsilon\},$$
there is a smooth projection map from $X_\epsilon$ to $X$ which restricts to the identity on $X$.
Before we prove Theorem 19, we need to introduce the normal bundle $NX$, which is a vector bundle over $X$ for $X \subset \mathbb{R}^n$. Let $\langle \cdot, \cdot \rangle$ denote the standard inner product on $\mathbb{R}^n$. Define
$$NX = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^n : x \in X,\ \langle y, z \rangle = 0 \text{ for all } z \in T_xX\}.$$
Then $NX$ is a vector bundle of rank $n - k$, with $\pi : NX \to X$ given by $\pi : (x, y) \mapsto x$. For a given $x \in X$, $N_xX = \pi^{-1}(x)$ is the normal space to $X$ at $x$, which consists of all vectors in $\mathbb{R}^n$ perpendicular to the tangent space $T_xX$.
First of all, we show that $NX$ is a smooth $n$-dimensional manifold.
Homework Problem 37. $NX$ is a smooth manifold of dimension $n$.
(a) Show that $X \subset \mathbb{R}^n$ is a smooth manifold if and only if for each $x \in X$, there is a neighborhood $W$ of $x$ in $\mathbb{R}^n$ and a smooth function $\psi : W \to \mathbb{R}^{n-k}$ so that $D\psi$ has constant rank $n - k$ and $X \cap W = \psi^{-1}(0)$. (To show $\Rightarrow$, use Theorem 14, and to show $\Leftarrow$, use the Implicit Function Theorem.)
(b) At each $x \in X$, and given a smooth function $\psi$ as above, show that the normal space $N_x$ is the image of the transpose of the tangent map, $D\psi(x)^\top : \mathbb{R}^{n-k} \to \mathbb{R}^n$.
(c) Use the previous section and the techniques of Problem 28 to show $NX$ is a manifold.
We will prove the $\epsilon$-Neighborhood Theorem by showing that there is a neighborhood of $X$ in $\mathbb{R}^n$ which is diffeomorphic to a neighborhood of the zero section $\{(x, 0) : x \in X\} \subset NX$, and the map required by the $\epsilon$-Neighborhood Theorem then comes from $\pi : NX \to X$.
Proof of the $\epsilon$-Neighborhood Theorem. Consider the map $F : NX \to \mathbb{R}^n$ given by $F : (x, y) \mapsto x + y$. For each $x \in X$, $DF(x, 0) : T_{(x,0)}(NX) \to \mathbb{R}^n$ is a linear isomorphism. This can be proved since $T_{(x,0)}(NX)$ can be written as a sum $T_x(X) + N_x(X)$, and $DF(x, 0)$, when restricted to each factor, is a linear isomorphism onto its image. The Inverse Function Theorem then shows that for each $x \in X$, there are neighborhoods $N_x$ of $(x, 0)$ in $NX$ and $W_x$ of $x$ in $\mathbb{R}^n$ so that $F|_{N_x}$ is a diffeomorphism from $N_x$ to $W_x$. Note we may apply the Inverse Function Theorem by considering a local parametrization of $NX$, since diffeomorphisms of (open subsets of) manifolds are defined in terms of these parametrizations.
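Consider the following lemma:
Lemma 48. There are open sets $\mathcal{N}$ and $\mathcal{X}$ so that $X \times \{0\} \subset \mathcal{N} \subset NX$ and $X \subset \mathcal{X} \subset \mathbb{R}^n$ and the restriction of $F$ is a diffeomorphism from $\mathcal{N}$ to $\mathcal{X}$.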
Proof. First of all, we note that $DF$ is a linear isomorphism on $\mathcal{N}' = \bigcup_{x \in X} N_x$. The Inverse Function Theorem then shows that $F|_{\mathcal{N}'}$ is a diffeomorphism onto its image as long as it is one-to-one. Therefore, we need only find an open $\mathcal{N}$ satisfying $X \times \{0\} \subset \mathcal{N} \subset \mathcal{N}'$ on which $F$ is one-to-one.

Now assume by contradiction that no such $\mathcal{N}$ exists. Then there are points $(x_n, y_n) \neq (x_n', y_n') \in NX$ satisfying $F(x_n, y_n) = F(x_n', y_n')$ and so that $|y_n|, |y_n'| < \frac{1}{n}$. (Why? You must use the compactness of $X$.) Since $X$ is compact, there must be a subsequence $n_i$ so that $(x_{n_i}, y_{n_i}) \to (x, 0)$ as $i \to \infty$. Then we may take a further subsequence $n_{i_j}$ so that $(x_{n_{i_j}}', y_{n_{i_j}}') \to (x', 0)$ as $j \to \infty$. For simplicity, we rename the subsequence $n_{i_j}$ as simply $n$. Then the continuity of $F$ shows that
$$x = F(x, 0) = \lim_{n \to \infty} F(x_n, y_n) = \lim_{n \to \infty} F(x_n', y_n') = F(x', 0) = x'.$$
Since $F$ is injective on $X \times \{0\}$, we have $x = x'$. But then $F|_{N_x}$ is injective, which contradicts our assumption that $(x_n, y_n) \neq (x_n', y_n')$ for large $n$. Therefore, the lemma is proved.
Now since $X$ is compact, there is a small $\epsilon > 0$ so that $X_\epsilon \subset F(\mathcal{N})$. The projection map from $X_\epsilon \to X$ is then given by $\pi \circ F^{-1}$, which is smooth. This completes the proof of the $\epsilon$-Neighborhood Theorem.
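For a concrete instance of the construction in this proof, take $X$ to be the unit circle in $\mathbb{R}^2$ (an illustrative choice). The normal space at $x$ is the line spanned by $x$, so a normal vector can be written $y = s x$, and $F(x, y) = (1 + s) x$. For $|s| < 1$ the map $\pi \circ F^{-1}$ is then $p \mapsto p / |p|$. The sketch below checks this numerically; the function names are illustrative.

```python
import math

def F(x, s):
    """F(x, y) = x + y on the normal bundle of the unit circle, with y = s * x."""
    return ((1.0 + s) * x[0], (1.0 + s) * x[1])

def project(p):
    """pi o F^{-1} for the unit circle: the nearest point p / |p| (needs p != 0)."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

theta = 1.1
x = (math.cos(theta), math.sin(theta))
worst = 0.0
for s in (-0.5, -0.1, 0.0, 0.3, 0.9):      # normal offsets with |s| < 1
    q = project(F(x, s))
    worst = max(worst, abs(q[0] - x[0]), abs(q[1] - x[1]))
print(worst)  # the projection recovers the base point x
```

In this example any $\epsilon < 1$ works, since the projection is smooth away from the origin and restricts to the identity on the circle (the case $s = 0$).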
4 The Calculus of Variations
4.1 The variational principle
In this section, we want to consider the problem of constructing a function
which minimizes a given functional. (A functional is a map from functions
to R.)
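Example 15. Let $\Omega \subset \mathbb{R}^n$ be a domain with smooth boundary. Then we consider the class
$$\mathcal{F} = \{f \in C^2(\Omega) \cap C^0(\bar\Omega) : f = g \text{ on } \partial\Omega\}$$
for a given $C^2$ function $g$ on $\partial\Omega$. Consider the graph of $f$
$$\{(x, f(x)) \in \bar\Omega \times \mathbb{R}\}.$$
By pulling back the Euclidean metric on $\mathbb{R}^{n+1}$, we can consider the $n$-volume of the graph. We have computed above
$$\mathrm{Vol}(f) = \int_\Omega \sqrt{1 + |\nabla f|^2}\, dx_n.$$
Then we want to consider the following question: Is there an $f \in \mathcal{F}$ which minimizes $\mathrm{Vol}(f)$ over all of $\mathcal{F}$?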
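If it exists, $f$ must satisfy
$$\left. \frac{d}{d\epsilon} \right|_{\epsilon = 0} \mathrm{Vol}(f + \epsilon h) = 0$$
for every $h$ so that $f + \epsilon h \in \mathcal{F}$. We compute and integrate by parts to find a differential equation $f$ must satisfy. First of all, $f + \epsilon h \in \mathcal{F}$ if and only if $h \in C^2(\Omega) \cap C^0(\bar\Omega)$ and $h = 0$ on $\partial\Omega$.
$$\begin{aligned}
0 &= \left. \frac{d}{d\epsilon} \right|_{\epsilon = 0} \mathrm{Vol}(f + \epsilon h) \\
&= \left. \frac{d}{d\epsilon} \right|_{\epsilon = 0} \int_\Omega \sqrt{1 + |\nabla f + \epsilon \nabla h|^2}\, dx_n \\
&= \left. \frac{d}{d\epsilon} \right|_{\epsilon = 0} \int_\Omega \sqrt{1 + |\nabla f|^2 + 2\epsilon\, \nabla f \cdot \nabla h + \epsilon^2 |\nabla h|^2}\, dx_n \\
&= \left. \int_\Omega \frac{2\, \nabla f \cdot \nabla h + 2\epsilon |\nabla h|^2}{2 \sqrt{1 + |\nabla f + \epsilon \nabla h|^2}}\, dx_n \right|_{\epsilon = 0} \\
&= \int_\Omega \frac{\nabla f \cdot \nabla h}{\sqrt{1 + |\nabla f|^2}}\, dx_n \\
&= -\int_\Omega h\, \nabla \cdot \left( \frac{\nabla f}{\sqrt{1 + |\nabla f|^2}} \right) dx_n + \int_{\partial\Omega} h \left( \frac{\nabla f}{\sqrt{1 + |\nabla f|^2}} \right) \cdot n\, dV \\
&= -\int_\Omega h\, \nabla \cdot \left( \frac{\nabla f}{\sqrt{1 + |\nabla f|^2}} \right) dx_n.
\end{aligned}$$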
which vanishes
This last integral must be equal to zero for every h C 0 ()
on . We claim this forces
!
f
g = p =0
1 + |f |2
on .
To prove the claim, note that since $f$ is $C^2$, $g$ is continuous on $\Omega$. We
prove the claim by contradiction. If $g$ is nonzero at any point $x \in \Omega$, assume
without loss of generality that $g(x) > 0$. Then by continuity, $g > 0$ in a small
ball $B \subset \Omega$ centered at $x$. Now it is easy to find a smooth nonnegative bump function $h$,
not identically zero, whose support is contained in $B$. In this case
\[ \int_\Omega h g\, dx_n = \int_B h g\, dx_n > 0, \]
which provides the contradiction.
Thus any function $f$ which minimizes the functional $\mathrm{Vol}$ satisfies the
Euler-Lagrange equation of the functional
\[ \nabla \cdot \left( \frac{\nabla f}{\sqrt{1 + |\nabla f|^2}} \right) = 0. \]
This equation is known as the minimal surface equation.
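As a numerical sanity check of the minimal surface equation, the following sketch evaluates the divergence $\nabla \cdot (\nabla f / \sqrt{1 + |\nabla f|^2})$ by centered finite differences for $n = 2$. The test function $f(x, y) = \log(\cos y / \cos x)$ (Scherk's surface) does not appear in these notes; it is a classical solution chosen here only for illustration, and the helper names and step sizes are assumptions of this sketch.

```python
import math

def scherk(x, y):
    """Scherk's surface f = log(cos y / cos x), defined for |x|, |y| < pi/2."""
    return math.log(math.cos(y) / math.cos(x))

def paraboloid(x, y):
    """A graph that is not minimal, used for comparison."""
    return x * x + y * y

def unit_flux(f, x, y, h=1e-4):
    """Approximate grad f / sqrt(1 + |grad f|^2) by centered differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    w = math.sqrt(1.0 + fx * fx + fy * fy)
    return fx / w, fy / w

def minimal_surface_residual(f, x, y, h=1e-3):
    """Approximate div(grad f / sqrt(1 + |grad f|^2)) at the point (x, y)."""
    dpx = (unit_flux(f, x + h, y)[0] - unit_flux(f, x - h, y)[0]) / (2 * h)
    dqy = (unit_flux(f, x, y + h)[1] - unit_flux(f, x, y - h)[1]) / (2 * h)
    return dpx + dqy

if __name__ == "__main__":
    # The first value should be close to zero, the second clearly nonzero.
    print(minimal_surface_residual(scherk, 0.3, 0.2))
    print(minimal_surface_residual(paraboloid, 0.3, 0.2))
```

The residual vanishes (up to discretization error) for Scherk's surface and not for the paraboloid, which is consistent with only the former being a critical point of Vol.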

So a solution to our problem satisfies the minimal surface equation, and
the boundary condition $f = g$ on $\partial\Omega$. This sort of boundary condition of
specifying the value of a solution $f$ is called a Dirichlet boundary condition.
The problem of finding a solution to the equation with this boundary condition
is a Dirichlet boundary value problem. Note that the Dirichlet boundary
condition is essential in making sure the variational function $h$ vanishes on
the boundary, and thus there are no boundary terms when we integrate by
parts. There is another useful type of boundary condition, the Neumann
boundary condition, in which the normal derivative $\partial f / \partial n = 0$ on $\partial\Omega$. Notice that
this also makes the integral over $\partial\Omega$ vanish in the integration by parts.
In the previous example, we computed the Euler-Lagrange equation for
Vol. There may be solutions to the Euler-Lagrange equation which are not
minimizers of Vol, since we have only checked the first-derivative test. A
solution to the Euler-Lagrange equation may correspond to a local maximum,
a saddle point or a local but non-global minimum. We'll see below
specific techniques for finding a global minimizer, which we apply in another
geometric problem.
The Euler-Lagrange equations come from the first variational formula
that a minimizer must satisfy: Given a family $f_\epsilon$ with $f = f_0$, then if $f$
minimizes a functional $P$,
\[ \frac{d}{d\epsilon}\Big|_{\epsilon=0} P(f_\epsilon) = 0. \]
This is the formula of the first variation, which comes from the first derivative
test in calculus. We may also use the second derivative test. A minimizer $f$
as above must satisfy the second variation formula
\[ \frac{d^2}{d\epsilon^2}\Big|_{\epsilon=0} P(f_\epsilon) \geq 0. \]
Homework Problem 38. Consider a variational problem for $C^2$ functions
$y = y(x)$ from a domain $[a, b]$ and fixed endpoints $y(a) = y_0$, $y(b) = y_1$.
Assume the functional is of the form
\[ J(y) = \int_a^b F(y, y')\, dx, \]
for $F$ a smooth function of 2 variables.
(a) Compute the general Euler-Lagrange equation for $J$.
(b) Multiply the Euler-Lagrange equation by $y'$ to show that any solution to
the Euler-Lagrange equation must satisfy
\[ \frac{dG}{dx} = 0 \]
for a function $G$ depending on $F$, $y$ and their derivatives.
(c) A graph $y = y(x)$ of a $C^1$ positive function determines a surface of
revolution around the $x$-axis with surface area
\[ A(y) = 2\pi \int_a^b y \sqrt{1 + (y')^2}\, dx. \]
Compute the Euler-Lagrange equation for $A$ (assume $y$ is $C^2$) and compute
its general solution. (The graph of this solution is called a catenary.)
4.2 Geodesics
Given a $C^1$ path $\gamma : I \to X$ for $I = [\alpha, \beta]$ an interval and $X \subset \mathbb{R}^N$ a manifold
with Riemannian metric $g$ induced from the Euclidean metric on $\mathbb{R}^N$, the
length of the path $\gamma(I)$ is given by
\[ L(\gamma) = \int_I |\dot\gamma|_g\, dt = \int_I \sqrt{g(\dot\gamma, \dot\gamma)}\, dt = \int_I \sqrt{g_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot\gamma^j(t)}\, dt. \]
(In the last formulation, note the use of local coordinates. So the last formulation
is strictly only true when $\gamma(I)$ is contained in a single coordinate
chart.) $L(\gamma)$ is called the length functional, which takes paths to $\mathbb{R}$.
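Proposition 49. The length of a path is independent of the parametrization.
In other words, if $\tilde\gamma(\tau) = \gamma(t(\tau))$ for $t = t(\tau)$ a $C^1$ diffeomorphism onto $I$,
then $L(\tilde\gamma) = L(\gamma)$.
Proof. Let $t = t(\tau)$ with $t(\tilde\alpha) = \alpha$, $t(\tilde\beta) = \beta$. Assume that $\tilde\alpha < \tilde\beta$; since $t$
is a diffeomorphism, then $dt/d\tau > 0$. Then compute
\begin{align*}
L(\tilde\gamma) &= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\left( \frac{d\tilde\gamma}{d\tau}, \frac{d\tilde\gamma}{d\tau} \right)}\, d\tau \\
&= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\left( \frac{d\gamma}{dt} \frac{dt}{d\tau}, \frac{d\gamma}{dt} \frac{dt}{d\tau} \right)}\, d\tau \\
&= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\left( \frac{d\gamma}{dt}, \frac{d\gamma}{dt} \right)}\, \frac{dt}{d\tau}\, d\tau \\
&= \int_{\alpha}^{\beta} \sqrt{g\left( \frac{d\gamma}{dt}, \frac{d\gamma}{dt} \right)}\, dt \\
&= L(\gamma).
\end{align*}
The case when $dt/d\tau < 0$ and $\tilde\alpha > \tilde\beta$ is similar.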
So this definition corresponds to the usual definition of the arc length of a
parametric curve. In particular, it is invariant under change of parametrization.
This particular feature turns out to cause trouble analytically. In the
following sections, we'll seek to find paths minimizing arc length by constructing
a sequence of paths approaching a length-minimizing one. The fact
that a potentially minimizing path has many different parametrizations will
make the analysis more difficult, since it will be difficult to find a sequence of
paths which approaches a particular minimizing path among all the possible
parametrizations. Another analytic objection to the length functional is that
it is the $L^1$ norm of the length of the tangent vector $\dot\gamma$. $L^2$ norms tend to
behave better, since we can use the structure of Hilbert spaces.
Assume for convenience that the interval $I = [0, 1]$. This can always be
achieved by using a linear map to take a given $I$ to $[0, 1]$.
Thus we introduce a related functional, the energy of a $C^1$ path $\gamma : [0, 1] \to X$. Define
\[ E(\gamma) = \int_0^1 |\dot\gamma|_g^2\, dt. \]
The energy is related to the length by the following proposition.
Proposition 50. For a given homotopy class $C$ of curves $\gamma : [0, 1] \to X$, a
$C^1$ curve $\gamma$ minimizes $E$ in $C$ if and only if it minimizes $L$ among $C^1$ curves
in $C$ and the speed $|\dot\gamma(t)|_g$ is constant.
Before we start the proof, we recall a little about homotopy classes.
Two continuous curves $\gamma_i : [0, 1] \to X$, $i = 0, 1$, are homotopic if $\gamma_i(0) = p$,
$\gamma_i(1) = q$ for $i = 0, 1$, and if there is a continuous function (called a homotopy)
$G : [0, 1] \times [0, 1] \to X$ so that $G(0, t) = \gamma_0(t)$, $G(1, t) = \gamma_1(t)$ for all $t \in [0, 1]$,
and $G(s, 0) = p$ and $G(s, 1) = q$ for all $s \in [0, 1]$. (More generally, if $Y$
and $X$ are both metric spaces, then two continuous maps $f_0, f_1 : Y \to X$
are said to be homotopic if there is a continuous map $F : [0, 1] \times Y \to X$
with $F(0, y) = f_0(y)$, $F(1, y) = f_1(y)$ for all $y \in Y$. In the present case, the
space $Y = [0, 1]$ and we impose the extra conditions that the values at the
endpoints $t = 0, 1$ are fixed at $p, q$ respectively as well.)
Since we are measuring length and energy, we are only interested in curves
$\gamma_i$ which are $C^1$, while we allow the homotopy $G$ to be only continuous.
Proposition 51. The condition of two paths being homotopic is an equivalence
relation, and thus we may consider homotopy classes of paths.
Proof. We need to show the property is reflexive, symmetric, and transitive.
If $\gamma : [0, 1] \to X$ is a continuous path, then it is homotopic to itself via the
homotopy $G(s, t) = \gamma(t)$ for $s \in [0, 1]$. This shows the reflexive property.
If $\gamma_0$ is homotopic to $\gamma_1$ via the homotopy $G$, then we see $\gamma_1$ is homotopic
to $\gamma_0$ via the homotopy $\tilde G(s, t) = G(1 - s, t)$. This shows the symmetric
property.
Finally, to show the transitive property, if $\gamma_0$ is homotopic to $\gamma_1$ via a
homotopy $G$ and $\gamma_1$ is homotopic to $\gamma_2$ via a homotopy $F$, then we construct
a homotopy from $\gamma_0$ to $\gamma_2$ by the formula
\[ H(s, t) = \begin{cases} G(2s, t) & \text{for } s \in [0, 1/2], \\ F(2s - 1, t) & \text{for } s \in [1/2, 1]. \end{cases} \]
Note that $H$ is well defined, since $H(1/2, t) = \gamma_1(t)$ under either
formula above. This observation also shows that $H$ is continuous. It is
straightforward to show $H$ is a homotopy.
A $C^1$ diffeomorphism $t = t(\tau)$ of $[0, 1]$ is called orientation preserving if
$dt/d\tau > 0$. Another fact about homotopy we'll presently use is the following.
Lemma 52. If $\tilde\gamma(\tau) = \gamma(t(\tau))$ for $t = t(\tau)$ an orientation-preserving diffeomorphism
of $[0, 1]$, then $\gamma$ and $\tilde\gamma$ are homotopic.
Proof. For $s, \tau \in [0, 1]$, define $\psi(s, \tau) = s\tau + (1 - s)t(\tau)$. Then we will
show that $G(s, \tau) = \gamma(\psi(s, \tau))$ is the required homotopy. First of all, since
$t(\tau)$ is an orientation-preserving diffeomorphism, we see $t(0) = 0$, $t(1) = 1$.
Now check that for $s, \tau \in [0, 1]$, $\psi(s, \tau) \in [0, 1]$: because $0 \leq \tau \leq 1$ and
$0 \leq t(\tau) \leq 1$, then
\[ 0 = s(0) + (1 - s)0 \leq s\tau + (1 - s)t(\tau) \leq s(1) + (1 - s)(1) = 1. \]
This shows the homotopy $G$ is well-defined. It is obvious for $\tau \in [0, 1]$
that $G(0, \tau) = \gamma(t(\tau)) = \tilde\gamma(\tau)$ and $G(1, \tau) = \gamma(\tau)$. Also compute for $s \in [0, 1]$,
$G(s, 0) = \gamma(0)$ and $G(s, 1) = \gamma(1)$.
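Also, note the following
Lemma 53. For any $C^1$ path $\gamma : [0, 1] \to X$, $E(\gamma) \geq L(\gamma)^2$ and they are equal if and
only if $|\dot\gamma(t)|_g$ is constant.
Proof. Apply Hölder's inequality
\[ L(\gamma) = \int_0^1 |\dot\gamma(t)|_g\, dt \leq \left( \int_0^1 1^2\, dt \right)^{\frac12} \left( \int_0^1 |\dot\gamma(t)|_g^2\, dt \right)^{\frac12} = \sqrt{E(\gamma)} \]
with equality if and only if 1 is proportional to $|\dot\gamma(t)|_g$, which is the same as
$|\dot\gamma(t)|_g$ being constant.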
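The inequality $L(\gamma)^2 \leq E(\gamma)$ and its equality case can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the two sample paths in the Euclidean plane (a parabola arc and a straight segment) and the function names are chosen here and do not come from the notes, and the integrals are approximated by the midpoint rule.

```python
import math

def length_and_energy(speed, n=10000):
    """Midpoint-rule approximations of
    L = int_0^1 |gamma'(t)| dt  and  E = int_0^1 |gamma'(t)|^2 dt,
    given the speed function t -> |gamma'(t)|."""
    dt = 1.0 / n
    length = 0.0
    energy = 0.0
    for k in range(n):
        s = speed((k + 0.5) * dt)
        length += s * dt
        energy += s * s * dt
    return length, energy

def parabola_speed(t):
    # gamma(t) = (t, t^2), so |gamma'(t)| = sqrt(1 + 4 t^2): non-constant speed
    return math.sqrt(1.0 + 4.0 * t * t)

def segment_speed(t):
    # gamma(t) = (3t, 4t), so |gamma'(t)| = 5: constant speed
    return 5.0

if __name__ == "__main__":
    for name, speed in [("parabola", parabola_speed), ("segment", segment_speed)]:
        L, E = length_and_energy(speed)
        print(name, "L^2 =", L * L, "E =", E)
```

For the parabola the energy is strictly larger than the squared length, while for the constant-speed segment the two agree, as the lemma predicts.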
Proof of Proposition 50. Let $\gamma \in C$ satisfy $E(\gamma) \leq E(\gamma')$ for all $\gamma' \in C$. Given
$\gamma$, let $\gamma_c$ be the constant speed reparametrization of $\gamma$ (this exists by Problem
39 below, and $\gamma_c \in C$ by Lemma 52). Then we have by Proposition 49 and Lemma 53
\[ L(\gamma_c)^2 = L(\gamma)^2 \leq E(\gamma) \leq E(\gamma_c) = L(\gamma_c)^2. \]
Thus all the inequalities in the above equation must be equalities and $L(\gamma)^2 =
E(\gamma)$. Then Lemma 53 implies $\gamma$ must have constant speed. So we've shown
so far that if $\gamma$ minimizes $E$, then $\gamma$ has constant speed.
Let $\gamma$ minimize $E$. For each $C^1$ curve $\gamma' \in C$, let $\gamma_c'$ be a constant speed
reparametrization. Then since $\gamma$ has constant speed, Lemma 53 and Proposition 49 show
\[ L(\gamma)^2 = E(\gamma) \leq E(\gamma_c') = L(\gamma_c')^2 = L(\gamma')^2. \]
So we've shown that if $\gamma$ minimizes $E$ in $C$, then $\gamma$ minimizes $L$ in $C$.
We leave the converse statement as Problem 40 below.
Homework Problem 39. (a) Let $\gamma : [0, 1] \to X$, $\gamma = \gamma(t)$ be a $C^1$ path
into a Riemannian manifold $X$. Assume $|\dot\gamma(t)|_g \neq 0$ for all $t \in [0, 1]$.
Show that there is a reparametrization $t(\tau)$ so that $t(0) = 0$, $t(1) = 1$,
$dt/d\tau > 0$, and $\left| \frac{d\gamma}{d\tau} \right|_g$ is constant.
Hint: Show the constant must be equal to $L(\gamma)$. Then show the condition
is an ODE in $\tau = \tau(t)$. (Note that if $dt/d\tau > 0$, then $t(\tau)$ is
strictly increasing and thus has an inverse on $[0, 1]$.)
(b) Remove the condition that $|\dot\gamma(t)|_g \neq 0$. In this case, $t(\tau)$ will only be
Lipschitz.
Hint: Consider the open set $O = \{t : \dot\gamma(t) \neq 0\}$. Perform a similar
analysis on each connected component of $O$.
*** This still needs work. ***
Homework Problem 40. For a given homotopy class $C$ of curves $\gamma :
[0, 1] \to X$, assume $\gamma$ has constant speed $|\dot\gamma(t)|_g$ and minimizes $L$ among
$C^1$ curves in $C$. Then $\gamma$ minimizes $E$ among $C^1$ curves in $C$.
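Now we compute the first variation of the energy functional. Let $\gamma$ be
a smooth curve from $[0, 1]$ to $X$ so that $\gamma(0) = p$, $\gamma(1) = q$. $X \subset \mathbb{R}^N$ has
the Riemannian metric pulled back from $\mathbb{R}^N$. Assume $\gamma$ minimizes $E$ in a
homotopy class $C$, and that $\gamma$ is $C^2$. Then for each smooth family $\gamma_\epsilon(t)$ with $\gamma_0 = \gamma$, we
have
\[ \frac{d}{d\epsilon}\Big|_{\epsilon=0} E(\gamma_\epsilon) = 0. \]
Consider a variation of the following special form. Near a point in $\gamma([0, 1])$,
pick local coordinates $x : O \to U \subset \mathbb{R}^n$. Then there is a small time interval
$I \subset \gamma^{-1}(O) \subset [0, 1]$. Assume for simplicity that $I$ doesn't contain either
endpoint 0 or 1. In terms of the local coordinates $x$, we write $x(\gamma(t)) = \gamma(t) \in U \subset \mathbb{R}^n$
for $t \in I$. Then let $h : \mathbb{R} \to \mathbb{R}^n$ be a smooth function so that $\operatorname{supp}(h) \subset I$.
For $\epsilon$ near 0,
\[ \gamma_\epsilon(t) = \gamma(t) + \epsilon h(t) \in U \]
for $t \in I$. For $t$ outside of $I$, we define $\gamma_\epsilon(t)$ to be simply $\gamma(t)$. Apply the first variational
formula
\begin{align*}
\frac{d}{d\epsilon}\Big|_{\epsilon=0} E(\gamma_\epsilon) &= \frac{d}{d\epsilon}\Big|_{\epsilon=0} \int_0^1 g(\dot\gamma_\epsilon(t), \dot\gamma_\epsilon(t))\, dt \\
&= \frac{d}{d\epsilon}\Big|_{\epsilon=0} \int_I g_{ij}(\gamma(t) + \epsilon h(t)) [\dot\gamma^i(t) + \epsilon \dot h^i(t)] [\dot\gamma^j(t) + \epsilon \dot h^j(t)]\, dt \\
&= \int_I \frac{\partial g_{ij}}{\partial x^k}(\gamma(t))\, h^k(t)\, \dot\gamma^i(t)\, \dot\gamma^j(t)\, dt \\
&\quad + \int_I g_{ij}(\gamma(t))\, \dot h^i(t)\, \dot\gamma^j(t)\, dt \\
&\quad + \int_I g_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot h^j(t)\, dt.
\end{align*}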
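Now we integrate by parts in the last two integrals. Note that since $h$ has
compact support, all the boundary terms involving $h$ vanish. Compute
\begin{align*}
\int_I g_{ij}(\gamma(t))\, \dot h^i(t)\, \dot\gamma^j(t)\, dt &= -\int_I \frac{\partial g_{ij}}{\partial x^k}(\gamma(t))\, \dot\gamma^k(t)\, h^i(t)\, \dot\gamma^j(t)\, dt \\
&\quad - \int_I g_{ij}(\gamma(t))\, h^i(t)\, \ddot\gamma^j(t)\, dt.
\end{align*}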
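We may plug this in to find for a minimizer $\gamma$
\begin{align*}
0 &= \frac{d}{d\epsilon}\Big|_{\epsilon=0} E(\gamma_\epsilon) \\
&= \int_I \left( \frac{\partial g_{ij}}{\partial x^k} h^k \dot\gamma^i \dot\gamma^j - \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^k h^i \dot\gamma^j - g_{ij} h^i \ddot\gamma^j - \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^k \dot\gamma^i h^j - g_{ij} \ddot\gamma^i h^j \right) dt \\
&= \int_I h^k \left( \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^i \dot\gamma^j - \frac{\partial g_{kj}}{\partial x^i} \dot\gamma^i \dot\gamma^j - g_{kj} \ddot\gamma^j - \frac{\partial g_{ik}}{\partial x^j} \dot\gamma^j \dot\gamma^i - g_{jk} \ddot\gamma^j \right) dt.
\end{align*}
Since this is true for each $h$ with compact support in $I$, then we must have
for each $k = 1, \dots, n$, and for all $t$ in the open interval $I$,
\[ 0 = \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^i \dot\gamma^j - \frac{\partial g_{kj}}{\partial x^i} \dot\gamma^i \dot\gamma^j - g_{kj} \ddot\gamma^j - \frac{\partial g_{ik}}{\partial x^j} \dot\gamma^j \dot\gamma^i - g_{jk} \ddot\gamma^j. \]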
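Since $g_{kj} = g_{jk}$, we have
\begin{align*}
0 &= g_{jk} \ddot\gamma^j + \frac12 \left( -\frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} \right) \dot\gamma^i \dot\gamma^j, \\
0 &= \ddot\gamma^\ell + \frac12 g^{k\ell} \left( \frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k} \right) \dot\gamma^i \dot\gamma^j \\
&= \ddot\gamma^\ell + \Gamma^\ell_{ij} \dot\gamma^i \dot\gamma^j, \\
\Gamma^\ell_{ij} &= \frac12 g^{k\ell} \left( \frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k} \right).
\end{align*}
$\Gamma^\ell_{ij}$ are called the Christoffel symbols of the metric $g_{ij}$, and
\[ \ddot\gamma^\ell + \Gamma^\ell_{ij} \dot\gamma^i \dot\gamma^j = 0 \tag{26} \]
is called the geodesic equation for the metric $g$. Note
\[ \Gamma^\ell_{ij} = \Gamma^\ell_{ji}. \]
Any curve satisfying this second-order system is called a geodesic on the
Riemannian manifold $X$.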
Remark. Our definition of geodesic requires a specific parametrization to
solve the equation (the constant speed parametrization). Many other authors
define a geodesic to be a curve which satisfies the first variational equation
of arc-length. These geodesics are the same as our geodesics as subsets of the
Riemannian manifold, but the parametrization is not required to be constant
speed.
Note that this analysis does not work at the endpoints 0 and 1. There,
we simply have the conditions $\gamma(0) = p$ and $\gamma(1) = q$ to remain in the class
$C$. This is essentially a Dirichlet boundary condition on the problem.
Homework Problem 41. Let $p, q$ be points in a manifold $X$, and consider
the class $C$ of all smooth paths from $p$ to $q$.
(a) Compute the Euler-Lagrange equations for the length functional $L(\gamma)$
for $\gamma \in C$. Show that any $\gamma : [0, 1] \to X$ which is a critical point of $L$
must satisfy
\[ \ddot\gamma^\ell(t) + \Gamma^\ell_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot\gamma^j(t) = c(t)\, \dot\gamma^\ell(t) \]
for $t \in (0, 1)$ and $c(t)$ a real-valued function of $t$.
(b) Use part (a) to prove the following generalization of Proposition 50:
A curve in C is a critical point of E if and only if it is a critical point
of L and it has constant speed.
Homework Problem 42. Let $(X, g)$ be an $n$-dimensional smooth compact
Riemannian manifold. By Nash's Theorem, we may assume that $g = i^*\delta$, the
pull-back of the Euclidean metric on $\mathbb{R}^N$ for some embedding $i : X \to \mathbb{R}^N$.
If $(p, v) \in TX$ (i.e. $p \in X$ and $v \in T_pX$), show that the solution to the
geodesic equation (26) on $X$ with initial conditions $\gamma(0) = p$ and $\dot\gamma(0) = v$
exists for all time.
Hints:
(a) Show that if $\gamma(t)$ solves the geodesic equation (26), then the speed $|\dot\gamma(t)|_g$
is constant in $t$.
(b) Reduce the problem to the case the initial speed $|v|_{g(p)} = 1$.
(c) The unit tangent bundle $UTX$ is defined by
\[ UTX = \{(p, v) \in TX : |v|_{g(p)} = 1\}. \]
Show $UTX$ is compact as long as $X$ is compact.
(d) Mimic the proof of Theorem 15 to complete the proof.
Example 16. Euclidean space is $\mathbb{R}^n$ with the standard Euclidean metric
$\delta = \delta_{ij}\, dx^i dx^j$. In this case, all the Christoffel symbols $\Gamma^k_{ij}$ vanish, since each
term involves differentiating the components of the metric tensor, all of which
are constant. Therefore, the geodesic system is simply $\ddot\gamma^k = 0$. Solutions to
this ODE are simply linear functions of $t$, and so geodesics are of the form
$\gamma = tv + w$ for $v, w \in \mathbb{R}^n$. So geodesics on Euclidean space are straight lines
traversed at constant speed.
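Example 17. For hyperbolic space, recall the metric $g_{ij} = (x^n)^{-2} \delta_{ij}$ on $\{x \in
\mathbb{R}^n : x^n > 0\}$. Compute the Christoffel symbols:
\begin{align*}
g^{ij} &= (x^n)^2 \delta^{ij}, \\
g_{ij,k} &= -2 (x^n)^{-3} \delta_{ij} \delta_{kn}, \\
\Gamma^k_{ij} &= \tfrac12 (x^n)^2 \delta^{k\ell} (g_{i\ell,j} + g_{\ell j,i} - g_{ij,\ell}) \\
&= \tfrac12 (x^n)^2 \delta^{k\ell} [-2 (x^n)^{-3}] (\delta_{i\ell} \delta_{jn} + \delta_{\ell j} \delta_{in} - \delta_{ij} \delta_{\ell n}) \\
&= -(x^n)^{-1} (\delta_i^k \delta_{jn} + \delta_j^k \delta_{in} - \delta^{kn} \delta_{ij}).
\end{align*}
Now consider $i, j, k$ distinct integers in $\{1, \dots, n\}$ (with no summation over repeated indices):
\begin{align*}
\Gamma^k_{ij} &= 0, \\
\Gamma^i_{ik} = \Gamma^i_{ki} &= -(x^n)^{-1} \delta_{kn}, \\
\Gamma^k_{ii} &= (x^n)^{-1} \delta^{kn}, \\
\Gamma^i_{ii} &= -(x^n)^{-1} \delta_{in}.
\end{align*}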
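First, we look for solutions in which $\dot\gamma^k = 0$ for $k = 1, \dots, n - 1$ (so only
$\gamma^n$ varies in $t$). It is plausible to look for such solutions since the coefficients
$g_{ij}$ of the metric depend only on $x^n$.
In this case, for $k < n$, compute
\begin{align*}
0 = \ddot\gamma^k &= -\Gamma^k_{ij} \dot\gamma^i \dot\gamma^j \\
&= -\Gamma^k_{nn} \dot\gamma^n \dot\gamma^n \\
&= -(x^n)^{-1} \delta^{kn} \dot\gamma^n \dot\gamma^n = 0.
\end{align*}
Thus if $\dot\gamma^1 = \cdots = \dot\gamma^{n-1} = 0$, then the geodesic equations for $\gamma^k$ for $k < n$
are automatically solved.
Now compute the geodesic equation for $\gamma^n$:
\begin{align*}
\ddot\gamma^n &= -\Gamma^n_{ij} \dot\gamma^i \dot\gamma^j \\
&= -\Gamma^n_{nn} \dot\gamma^n \dot\gamma^n \\
&= (x^n)^{-1} \dot\gamma^n \dot\gamma^n \\
&= (\gamma^n)^{-1} \dot\gamma^n \dot\gamma^n. \tag{27}
\end{align*}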
This is a second-order nonlinear equation in $\gamma^n$, and we do not have any
general technique to solve such an equation. We can, however, make some
educated guesses. In particular, note that
\[ (\gamma^n \dot\gamma^n)^{\cdot} = \gamma^n \ddot\gamma^n + \dot\gamma^n \dot\gamma^n, \]
and that each of these terms is similar to those in the geodesic equation (27)
above.
In particular, compute for a function $f$ of $\gamma^n$
\begin{align*}
0 &= (f(\gamma^n)\, \dot\gamma^n)^{\cdot} \tag{28} \\
&= f(\gamma^n)\, \ddot\gamma^n + f'(\gamma^n)\, \dot\gamma^n \dot\gamma^n, \\
0 &= \ddot\gamma^n + \frac{f'(\gamma^n)\, \dot\gamma^n \dot\gamma^n}{f(\gamma^n)}. \tag{29}
\end{align*}
This last equation is the same as the geodesic equation (27) if
\[ \frac{f'(\gamma^n)}{f(\gamma^n)} = -\frac{1}{\gamma^n}, \]
and this is now a first-order separable equation for $f$. We may solve to find
$f = (\gamma^n)^{-1}$ is a solution.
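Now plug into (28) to find
\begin{align*}
0 &= \left( \frac{\dot\gamma^n}{\gamma^n} \right)^{\cdot}, \\
C &= \frac{\dot\gamma^n}{\gamma^n} = (\log \gamma^n)^{\cdot}, \\
Ct + D &= \log \gamma^n, \\
\gamma^n &= A e^{Ct}
\end{align*}
for $A$ a positive constant (since in hyperbolic space, we have $x^n = \gamma^n > 0$)
and $C$ any real constant. Therefore,
\[ \gamma^1 = \gamma_0^1, \quad \dots, \quad \gamma^{n-1} = \gamma_0^{n-1}, \quad \gamma^n = A e^{Ct} \]
solves the geodesic system on hyperbolic space.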

So far we have only found geodesics in the special case that $\dot\gamma^1 = \cdots =
\dot\gamma^{n-1} = 0$. To find all the geodesics on hyperbolic space, we introduce the
notion of an isometry of a Riemannian manifold.
Given a Riemannian manifold $(X, g)$, a diffeomorphism $\phi : X \to X$ is an
isometry if $\phi^* g = g$. Isometries of $\mathbb{H}^n$ are well understood, and we introduce
a specific type. For $\lambda > 0$, let
\[ \phi_\lambda : x \mapsto \frac{\lambda x}{|x|^2}, \]
where $x \in \mathbb{H}^n \subset \mathbb{R}^n$ and $|x|^2 = (x^1)^2 + \cdots + (x^n)^2$ comes from $\mathbb{R}^n$. It is easy
to see that $\phi_\lambda$ is a diffeomorphism of $\mathbb{H}^n$. To show that it is an isometry, let
$y = \phi_\lambda(x)$. Then
\[ \phi_\lambda^* g = \phi_\lambda^* \left( \frac{\sum_{j=1}^n (dy^j)^2}{(y^n)^2} \right). \]
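Dropping the pull back notation, we compute (for each fixed $j$, with no summation over $j$)
\begin{align*}
y^j &= \frac{\lambda x^j}{|x|^2}, \\
dy^j &= \frac{\partial y^j}{\partial x^i}\, dx^i = \lambda \sum_{i=1}^n \frac{|x|^2 \delta_i^j - 2 x^i x^j}{|x|^4}\, dx^i, \\
(dy^j)^2 &= \lambda^2 \left( \sum_{i=1}^n \frac{|x|^2 \delta_i^j - 2 x^i x^j}{|x|^4}\, dx^i \right)^2 \\
&= \lambda^2 \left( \sum_{i=1}^n \frac{|x|^2 \delta_i^j - 2 x^i x^j}{|x|^4}\, dx^i \right) \left( \sum_{k=1}^n \frac{|x|^2 \delta_k^j - 2 x^k x^j}{|x|^4}\, dx^k \right) \\
&= \frac{\lambda^2}{|x|^8} \sum_{i,k=1}^n \left[ 4 x^i x^k (x^j)^2 - 2|x|^2 x^i x^j \delta_k^j - 2|x|^2 x^k x^j \delta_i^j + |x|^4 \delta_i^j \delta_k^j \right] dx^i dx^k \\
&= \frac{\lambda^2}{|x|^8} \left\{ 4 (x^j)^2 \sum_{i,k=1}^n x^i x^k\, dx^i dx^k - 2|x|^2 x^j dx^j \sum_{i=1}^n x^i dx^i - 2|x|^2 x^j dx^j \sum_{k=1}^n x^k dx^k + |x|^4 (dx^j)^2 \right\} \\
&= \frac{\lambda^2}{|x|^8} \left\{ 4 (x^j)^2 \sum_{i,k=1}^n x^i x^k\, dx^i dx^k - 4|x|^2 x^j dx^j \sum_{i=1}^n x^i dx^i + |x|^4 (dx^j)^2 \right\}.
\end{align*}
Summing over $j$,
\begin{align*}
\sum_{j=1}^n (dy^j)^2 &= \frac{\lambda^2}{|x|^8} \left\{ 4 \left( \sum_{j=1}^n (x^j)^2 \right) \left( \sum_{i,k=1}^n x^i x^k\, dx^i dx^k \right) - 4|x|^2 \sum_{i,j=1}^n x^j x^i\, dx^i dx^j + |x|^4 \sum_{j=1}^n (dx^j)^2 \right\} \\
&= \frac{\lambda^2}{|x|^8} \left\{ 4|x|^2 \sum_{i,k=1}^n x^i x^k\, dx^i dx^k - 4|x|^2 \sum_{i,k=1}^n x^i x^k\, dx^i dx^k + |x|^4 \sum_{j=1}^n (dx^j)^2 \right\} \\
&= \frac{\lambda^2}{|x|^4} \sum_{j=1}^n (dx^j)^2, \\
\frac{\sum_{j=1}^n (dy^j)^2}{(y^n)^2} &= \frac{\dfrac{\lambda^2}{|x|^4} \sum_{j=1}^n (dx^j)^2}{\dfrac{\lambda^2 (x^n)^2}{|x|^4}} \\
&= \frac{\sum_{j=1}^n (dx^j)^2}{(x^n)^2}.
\end{align*}
Therefore, $\phi_\lambda^* g = g$ and $\phi_\lambda$ is an isometry.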
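The isometry property of the inversion can also be checked numerically at a sample point. The sketch below is an illustration only: it uses the Jacobian $\partial y^j / \partial x^i = \lambda (|x|^2 \delta_{ij} - 2 x^i x^j) / |x|^4$ from the computation above, and compares $\sum_j \frac{\partial y^j}{\partial x^i} \frac{\partial y^j}{\partial x^k} / (y^n)^2$ with $\delta_{ik} / (x^n)^2$. The function names, the sample point, and the value of the scaling parameter are assumptions made here.

```python
def inversion(x, lam):
    """The map x -> lam * x / |x|^2."""
    r2 = sum(c * c for c in x)
    return [lam * c / r2 for c in x]

def inversion_jacobian(x, lam):
    """J[j][i] = d y^j / d x^i = lam * (|x|^2 delta_ij - 2 x^i x^j) / |x|^4."""
    n = len(x)
    r2 = sum(c * c for c in x)
    return [[lam * (r2 * (1.0 if i == j else 0.0) - 2.0 * x[i] * x[j]) / r2 ** 2
             for i in range(n)] for j in range(n)]

def pulled_back_metric(x, lam):
    """Components of the pulled-back hyperbolic metric at x:
    sum_j J[j][i] * J[j][k] / (y^n)^2, where y is the image of x."""
    n = len(x)
    y = inversion(x, lam)
    J = inversion_jacobian(x, lam)
    return [[sum(J[j][i] * J[j][k] for j in range(n)) / y[-1] ** 2
             for k in range(n)] for i in range(n)]

def hyperbolic_metric(x):
    """Components g_ik = delta_ik / (x^n)^2 of the upper half-space metric."""
    n = len(x)
    return [[(1.0 if i == k else 0.0) / x[-1] ** 2 for k in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    point = [0.7, -1.2, 0.5]   # a sample point with last coordinate > 0
    print(pulled_back_metric(point, 2.5))
    print(hyperbolic_metric(point))
```

The two printed matrices should agree up to rounding, in line with the algebraic cancellation carried out above.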
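Moreover, it is trivial to check that any translation $x \mapsto x + x_0$ is an
isometry of $\mathbb{H}^n$ if the last component $x_0^n = 0$. Also, note that the composition
of two isometries is again an isometry (indeed the set of isometries of a
Riemannian manifold $X$ forms a subgroup of the diffeomorphism group called
the isometry group).
Proposition 55 below shows that for any geodesic $\gamma : \mathbb{R} \to \mathbb{H}^n$, $\phi_\lambda \circ \gamma$
is also a geodesic. Recall we know so far that
\[ \gamma = (\gamma_0^1, \dots, \gamma_0^{n-1}, A e^{Ct}) \]
are geodesics for $A > 0$, $C \in \mathbb{R}$. Compute for $\lambda > 0$,
\[ \phi_\lambda \circ \gamma = \frac{\lambda \gamma}{|\gamma|^2} = \frac{\lambda (\gamma_0^1, \dots, \gamma_0^{n-1}, A e^{Ct})}{(\gamma_0^1)^2 + \cdots + (\gamma_0^{n-1})^2 + A^2 e^{2Ct}}. \]
When $(\gamma_0^1, \dots, \gamma_0^{n-1}) \neq 0$ and $C \neq 0$, the image $\phi_\lambda \circ \gamma(\mathbb{R})$ is then the half-circle in $\mathbb{R}^n$ which intersects $\{x^n = 0\}$
perpendicularly at
\[ 0 \quad \text{and} \quad \frac{\lambda (\gamma_0^1, \dots, \gamma_0^{n-1}, 0)}{(\gamma_0^1)^2 + \cdots + (\gamma_0^{n-1})^2}. \]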
Then if we apply the isometry given by adding a constant $x_0$ with $x_0^n = 0$,
then every half-circle in $\mathbb{H}^n$ which intersects $\{x^n = 0\}$ perpendicularly at both
endpoints is the image of a geodesic path in $\mathbb{H}^n$.
All together, for constants
\[ \gamma_0^1, \dots, \gamma_0^{n-1}, x_0^1, \dots, x_0^{n-1}, C \in \mathbb{R}, \quad A, \lambda > 0, \]
the path for $t \in \mathbb{R}$
\[ \gamma(t) = \frac{\lambda (\gamma_0^1, \dots, \gamma_0^{n-1}, A e^{Ct})}{(\gamma_0^1)^2 + \cdots + (\gamma_0^{n-1})^2 + A^2 e^{2Ct}} + (x_0^1, \dots, x_0^{n-1}, 0) \tag{30} \]
is a geodesic in $\mathbb{H}^n$, and the image $\gamma(\mathbb{R})$ is a ray or a half-circle in $\mathbb{R}^n$
perpendicular to $\{x^n = 0\}$. All such rays and semicircles are represented by
such geodesic paths.
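One way to gain confidence in formula (30) is to substitute it into the geodesic equation $\ddot\gamma^k + \Gamma^k_{ij} \dot\gamma^i \dot\gamma^j = 0$ numerically. The following sketch does this with centered finite differences, using the Christoffel symbols $\Gamma^k_{ij} = -(x^n)^{-1} (\delta_i^k \delta_{jn} + \delta_j^k \delta_{in} - \delta^{kn} \delta_{ij})$ of Example 17. The function names, parameter values, and step size are assumptions chosen for illustration.

```python
import math

def christoffel(k, i, j, x):
    """Gamma^k_{ij} for the metric g_ij = (x^n)^(-2) delta_ij (indices start at 0)."""
    last = len(x) - 1
    def delta(a, b):
        return 1.0 if a == b else 0.0
    return -(delta(i, k) * delta(j, last) + delta(j, k) * delta(i, last)
             - delta(k, last) * delta(i, j)) / x[last]

def geodesic_residual(path, t, h=1e-3):
    """Components of gamma'' + Gamma(gamma', gamma') at time t,
    with derivatives approximated by centered differences."""
    p0, pm, pp = path(t), path(t - h), path(t + h)
    dim = len(p0)
    vel = [(pp[a] - pm[a]) / (2 * h) for a in range(dim)]
    acc = [(pp[a] - 2 * p0[a] + pm[a]) / (h * h) for a in range(dim)]
    return [acc[k] + sum(christoffel(k, i, j, p0) * vel[i] * vel[j]
                         for i in range(dim) for j in range(dim))
            for k in range(dim)]

def path_30(t, gamma0, A, C, lam, x0):
    """The path of formula (30); gamma0 and x0 list the first n-1 components."""
    last = A * math.exp(C * t)
    denom = sum(c * c for c in gamma0) + last * last
    head = [lam * c / denom + s for c, s in zip(gamma0, x0)]
    return head + [lam * last / denom]

def euclidean_arc(t):
    """Unit semicircle with Euclidean constant speed: the same image as a
    hyperbolic geodesic, but not the geodesic parametrization."""
    return [math.cos(t), math.sin(t)]

if __name__ == "__main__":
    sample = lambda t: path_30(t, [1.0, 0.5], 1.5, 0.7, 2.0, [-1.0, 0.3])
    print(geodesic_residual(sample, 0.3))        # close to zero
    print(geodesic_residual(euclidean_arc, 1.0)) # not close to zero
```

The second path illustrates the earlier remark on parametrization: its image is a geodesic as a set, but the curve solves equation (26) only when traversed at constant hyperbolic speed.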
We claim that we have found all the geodesics in $\mathbb{H}^n$. The way to check
this is to recognize that the geodesic system, as a second-order ODE system
with smooth coefficients, has a unique solution for each initial value problem
\[ \ddot\gamma^k = -\Gamma^k_{ij} \dot\gamma^i \dot\gamma^j, \quad \gamma(0) = y_0, \quad \dot\gamma(0) = v_0. \]
Then if we can check that every initial condition $(y_0, v_0) \in T\mathbb{H}^n$ occurs as
$(\gamma(0), \dot\gamma(0))$ for a geodesic $\gamma(t)$ in (30), uniqueness of solutions to the geodesic system
will imply that we have found all the geodesics in $\mathbb{H}^n$.
So we must check that every $(y_0, v_0) \in T\mathbb{H}^n = \mathbb{H}^n \times \mathbb{R}^n$ can be represented
by $(\gamma(0), \dot\gamma(0))$ for a $\gamma(t)$ in (30). For a given point $y_0 \in \mathbb{H}^n$, and vector
$v_0 \in T_{y_0}\mathbb{H}^n = \mathbb{R}^n$, consider first the case when
\[ v_0^1 = \cdots = v_0^{n-1} = 0. \]
In this case, we can choose $A > 0$ and $C$ so that
\[ \gamma(t) = (y_0^1, \dots, y_0^{n-1}, A e^{Ct}) \]
satisfies $\gamma(0) = y_0$ and $\dot\gamma(0) = v_0$. Otherwise, let $P$ be the 2-plane through $y_0$
spanned by $v_0$ and the vertical direction $(0, \dots, 0, 1)$, and
let $L = P \cap \{x^n = 0\}$. It is straightforward to check that there is a unique
semicircle in the plane $P$ which hits $L$ perpendicularly, passes through $y_0$ and
is tangent to $v_0$ at $y_0$. This is the image of some geodesic $\gamma(t)$ in (30). Then
we can adjust $C$ and $A$ to ensure that $\gamma(0) = y_0$ and $\dot\gamma(0) = v_0$. Therefore,
every initial condition $(y_0, v_0)$ is achieved by a geodesic on our list, and we
have found all the geodesics in hyperbolic space.
The following proposition was discussed in Example 17 above.
Proposition 54. Consider a Riemannian manifold $(X, g)$. Given $p \in X$,
$v \in T_pX$, there is an $\epsilon > 0$ and a unique geodesic $\gamma : (-\epsilon, \epsilon) \to X$ with
$\gamma(0) = p$, $\dot\gamma(0) = v$.
Remark. In general, the geodesic may not exist for all time, although we
have seen that all the geodesics on hyperbolic space (Example 17) and on
compact Riemannian manifolds (Problem 42) do exist for all time.
A map $\phi : X \to Y$ for manifolds $X$ and $Y$ with Riemannian metrics $g$
and $h$ respectively is a local isometry if every point in $X$ has a neighborhood
$O$ on which $\phi : O \to \phi(O) \subset Y$ is an isometry.
Proposition 55. If $\phi : X \to Y$ is a local isometry of Riemannian manifolds,
then for every geodesic $\gamma : (-\epsilon, \epsilon) \to X$, $\phi \circ \gamma$ is a geodesic on $Y$. Any geodesic
on $\phi(X) \subset Y$ is of this form.
Proof. In local coordinates on $X$ and $Y$, we can write the isometry as $y =
y(x)$. Note this is the same form as a coordinate change, and the condition
that the map is an isometry is simply that the metric pulls back as a $(0, 2)$
tensor when changing coordinates.
Therefore, the proof boils down to the following fact: for a local isometry,
and for any $C^2$ path $\gamma$, the quantity
\[ w^k = \ddot\gamma^k + \Gamma^k_{ij} \dot\gamma^i \dot\gamma^j \]
transforms like a tangent vector (i.e. a $(1, 0)$ tensor) under changes of coordinates.
Therefore,
\[ w^k \frac{\partial}{\partial x^k} = w^k \frac{\partial y^I}{\partial x^k} \frac{\partial}{\partial y^I} \]
and $w^k(x) = 0$ for $k = 1, \dots, n$ is equivalent to $w^I(y) = 0$ for $I = 1, \dots, n$.
This is because $\frac{\partial y^I}{\partial x^k}$ is nonsingular for $y = y(x)$ a diffeomorphism.
In order to compute how $w^k$ transforms, we use the following index convention.
Indices $i, j, k, \dots$ are with respect to the $x$ variables, while indices
$I, J, K, \dots$ are with respect to the $y$ variables. For example, $g_{ij}$ is the metric
in the $x$ coordinates, while $g_{IJ}$ is the metric in the $y$ coordinates.
First of all, note
\[ g_{IJ} = g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J}, \qquad g^{IJ} = g^{ij} \frac{\partial y^I}{\partial x^i} \frac{\partial y^J}{\partial x^j}. \]
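Compute
\begin{align*}
g_{IJ,K} &= \frac{\partial g_{IJ}}{\partial y^K} \\
&= \frac{\partial}{\partial y^K} \left( g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} \right) \\
&= \frac{\partial g_{ij}}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial^2 x^j}{\partial y^J \partial y^K} \\
&= g_{ij,k} \frac{\partial x^k}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial^2 x^j}{\partial y^J \partial y^K}.
\end{align*}
Then compute
\begin{align*}
g_{KJ,I} + g_{IK,J} - g_{IJ,K}
&= g_{ij,k} \frac{\partial x^k}{\partial y^I} \frac{\partial x^i}{\partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial x^i}{\partial y^K} \frac{\partial^2 x^j}{\partial y^J \partial y^I} \\
&\quad + g_{ij,k} \frac{\partial x^k}{\partial y^J} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^K} + g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^J} \frac{\partial x^j}{\partial y^K} + g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial^2 x^j}{\partial y^J \partial y^K} \\
&\quad - g_{ij,k} \frac{\partial x^k}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} - g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^K} \frac{\partial x^j}{\partial y^J} - g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial^2 x^j}{\partial y^J \partial y^K} \\
&= g_{ij,k} \frac{\partial x^k}{\partial y^I} \frac{\partial x^i}{\partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij,k} \frac{\partial x^k}{\partial y^J} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^K} - g_{ij,k} \frac{\partial x^k}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} \\
&\quad + 2 g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^J} \frac{\partial x^j}{\partial y^K}.
\end{align*}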
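Then the Christoffel symbols
\begin{align*}
\Gamma^L_{IJ} &= \tfrac12 g^{KL} (g_{KJ,I} + g_{IK,J} - g_{IJ,K}) \\
&= \tfrac12 g^{m\ell} \frac{\partial y^K}{\partial x^m} \frac{\partial y^L}{\partial x^\ell} \left( g_{ij,k} \frac{\partial x^k}{\partial y^I} \frac{\partial x^i}{\partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij,k} \frac{\partial x^k}{\partial y^J} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^K} \right. \\
&\qquad \left. - g_{ij,k} \frac{\partial x^k}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + 2 g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^J} \frac{\partial x^j}{\partial y^K} \right) \\
&= \tfrac12 g^{m\ell} \frac{\partial y^L}{\partial x^\ell} \left( g_{mj,k} \frac{\partial x^k}{\partial y^I} \frac{\partial x^j}{\partial y^J} + g_{im,k} \frac{\partial x^k}{\partial y^J} \frac{\partial x^i}{\partial y^I} \right. \\
&\qquad \left. - g_{ij,m} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + 2 g_{im} \frac{\partial^2 x^i}{\partial y^I \partial y^J} \right) \\
&= \frac{\partial y^L}{\partial x^\ell} \Gamma^\ell_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + \frac{\partial y^L}{\partial x^\ell} \frac{\partial^2 x^\ell}{\partial y^I \partial y^J}.
\end{align*}
Note that the second term in the last formula shows that the Christoffel
symbols do not transform as a tensor. In fact, this is fortunate, as the extra
non-tensorial term will cancel out a similar term coming from the second
derivative $\ddot\gamma^k$.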
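Note that
\begin{align*}
\dot\gamma^I &= \frac{\partial y^I}{\partial x^i} \dot\gamma^i, \\
\ddot\gamma^L &= \frac{d}{dt} \left( \frac{\partial y^L}{\partial x^\ell}(\gamma)\, \dot\gamma^\ell \right) \\
&= \frac{\partial y^L}{\partial x^\ell} \ddot\gamma^\ell + \frac{\partial^2 y^L}{\partial x^\ell \partial x^j} \dot\gamma^j \dot\gamma^\ell.
\end{align*}
Compute
\begin{align*}
\Gamma^L_{IJ} \dot\gamma^I \dot\gamma^J &= \left( \frac{\partial y^L}{\partial x^\ell} \Gamma^\ell_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + \frac{\partial y^L}{\partial x^\ell} \frac{\partial^2 x^\ell}{\partial y^I \partial y^J} \right) \frac{\partial y^I}{\partial x^m} \dot\gamma^m \frac{\partial y^J}{\partial x^p} \dot\gamma^p \\
&= \frac{\partial y^L}{\partial x^\ell} \Gamma^\ell_{ij} \dot\gamma^i \dot\gamma^j + \frac{\partial^2 x^k}{\partial y^I \partial y^J} \frac{\partial y^L}{\partial x^k} \frac{\partial y^I}{\partial x^j} \frac{\partial y^J}{\partial x^\ell} \dot\gamma^j \dot\gamma^\ell.
\end{align*}
Therefore, $\ddot\gamma^L + \Gamma^L_{IJ} \dot\gamma^I \dot\gamma^J$ will transform like a tensor if we can show that the
non-tensorial terms cancel: We need to show
\[ \frac{\partial^2 y^L}{\partial x^\ell \partial x^j} + \frac{\partial^2 x^k}{\partial y^I \partial y^J} \frac{\partial y^L}{\partial x^k} \frac{\partial y^I}{\partial x^j} \frac{\partial y^J}{\partial x^\ell} = 0. \tag{31} \]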
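This equation follows from the formula for the first derivative of an inverse
matrix. If $\dot A$ represents the first derivative of a matrix $A$ (with respect to
any parameter or variable), then
\[ (A^{-1})^{\cdot} = -A^{-1} \dot A A^{-1}. \]
(Proof: Differentiate the equation $A A^{-1} = I$ to find $\dot A A^{-1} + A (A^{-1})^{\cdot} = 0$.)
Then since $(\partial y^L / \partial x^\ell)$ is the inverse matrix of $(\partial x^\ell / \partial y^L)$,
\begin{align*}
\frac{\partial^2 y^L}{\partial x^\ell \partial x^j} &= \frac{\partial}{\partial x^j} \left( \frac{\partial y^L}{\partial x^\ell} \right) \\
&= -\frac{\partial y^L}{\partial x^k} \left[ \frac{\partial}{\partial x^j} \left( \frac{\partial x^k}{\partial y^J} \right) \right] \frac{\partial y^J}{\partial x^\ell} \\
&= -\frac{\partial y^L}{\partial x^k} \frac{\partial y^I}{\partial x^j} \frac{\partial^2 x^k}{\partial y^I \partial y^J} \frac{\partial y^J}{\partial x^\ell}.
\end{align*}
Upon plugging in, this proves formula (31) and the proposition.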
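The matrix identity $(A^{-1})^{\cdot} = -A^{-1} \dot A A^{-1}$ used in this proof is easy to test numerically. The sketch below compares the formula with a centered finite-difference derivative of $A(t)^{-1}$ for one sample $2 \times 2$ matrix family. The particular family $A(t)$, the helper names, and the step size are assumptions made for this illustration.

```python
import math

def A(t):
    """A sample smooth family of invertible 2x2 matrices (an arbitrary choice)."""
    return [[2.0 + math.sin(t), t], [t * t, 3.0 + math.cos(t)]]

def A_dot(t):
    """Entrywise derivative of A(t)."""
    return [[math.cos(t), 1.0], [2.0 * t, -math.sin(t)]]

def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_derivative_formula(t):
    """-A^{-1} A' A^{-1}, the claimed derivative of A^{-1}."""
    Ainv = inv2(A(t))
    prod = matmul(matmul(Ainv, A_dot(t)), Ainv)
    return [[-prod[i][j] for j in range(2)] for i in range(2)]

def inverse_derivative_numeric(t, h=1e-5):
    """Centered finite-difference derivative of t -> A(t)^{-1}."""
    plus, minus = inv2(A(t + h)), inv2(A(t - h))
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

if __name__ == "__main__":
    print(inverse_derivative_formula(0.4))
    print(inverse_derivative_numeric(0.4))
```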
Remark. There is also a more geometric proof of the previous proposition.
Recall that we derived the geodesic equation as the Euler-Lagrange equation
of the energy functional. So any path which minimizes the energy satisfies the
geodesic equation. It is easy to see that the energy of a path is invariant under
an isometry; therefore, the notion of energy-minimizing path is invariant
under isometries.
The problem is that there are geodesics which do not minimize the en-
ergy. (They may be saddle points of the energy functional.) This can be
surmounted by restricting to small domains by using the following fact from
Riemannian geometry: Every point in a Riemannian manifold has a neighbor-
hood O so that all geodesic paths in O are energy-minimizing for endpoints
in O. (In Riemannian geometry books, this fact is usually stated in terms
of the length functional instead; to translate to the present situation, re-
call that energy-minimizing paths are length-minimizing paths parametrized
with constant speed.)
Homework Problem 43. Given a smooth function $f$ on a Riemannian manifold,
the Hessian of $f$ is defined locally by the formula
\[ H(f)_{ij} = \frac{\partial^2 f}{\partial x^i \partial x^j} - \Gamma^k_{ij} \frac{\partial f}{\partial x^k}. \]
Show that the Hessian of $f$ is a symmetric $(0, 2)$ tensor.
Homework Problem 44. Compute all the geodesics on $S^2$.
Hint: Use the expression for the metric in local coordinates $(y^1, y^2)$ from
Example 13. Compute the Christoffel symbols. Analyze the case when $y^2 = 0$
and only $y^1$ varies. Solve the resulting second-order ODE for $\gamma^1 = y^1$. Then
move these geodesics around via the isometry group of $S^2$.
(The isometry group of $S^2$ is given by the orthogonal group of $3 \times 3$ matrices
\[ O(3) = \{A : A A^\top = I\}. \]
Show that each such linear action is an isometry of $\mathbb{R}^3$ which takes the unit
sphere $S^2$ to itself. For every line $L$ through the origin in $\mathbb{R}^3$, show that
rotating by an angle $\theta$ around the line $L$ is a linear map in $O(3)$. Show
that every initial condition $(p, v) \in TS^2$ of the geodesic equation on $S^2$ can
be realized by the examples you computed above, when acted on by such a
rotation in $O(3)$.)
4.3 The direct method: An example

We have computed the Euler-Lagrange equations of the energy functional.
Now we introduce an example of the direct method in the calculus of varia-
tions.
The direct method is this: Given a functional $E : C \to \mathbb{R}$, if there is a
lower bound $I = \inf_{\gamma \in C} E(\gamma) > -\infty$, then there is a sequence of paths $\gamma_i \in C$ so
that $E(\gamma_i) \to I$. The direct method is to show that there is a subsequence
of $\{\gamma_i\}$ which converges to some $\gamma$, and to show that the limiting $\gamma \in C$ and
that $E(\gamma) = I$. Thus we have constructed a minimizer over the class $C$
of the functional $E$. There are subtle points to deal with along the way.
Typically, the class $C$ is a closed subset of a Banach space, and in passing
to the limit of a subsequence, the limit we construct may be in a weaker
Banach space (for example, a sequence in $C^1$ may produce a limit only in
$C^0$, which will be problematic if the functional involves any derivatives).
A related issue is that in passing to the limit $\gamma_{i_j} \to \gamma$, we may not have
$E(\gamma_{i_j}) \to E(\gamma)$. In particular, below we will have to deal with the situation
in which we only know $\lim_{j \to \infty} E(\gamma_{i_j}) \geq E(\gamma)$, so that the functional is only
lower semi-continuous under the limit. Thus we will typically need to spend
time improving the regularity of the limit and showing some semi-continuity
of the functional under the limiting subsequence.
The direct method of the calculus of variations is very useful in solving
elliptic PDEs. The problem we approach involves geodesics, and thus the
solution we produce will be a solution to an ODE. This will allow us to proceed
with much of the general picture of the calculus of variations while avoiding
some of the more technical points. In particular, we will learn about
distributions, weak derivatives, Hilbert spaces, and compact maps between
Banach spaces in solving our problem.
Given a smooth manifold $X$, a loop is a continuous map from the circle $S^1$
to $X$. Each such loop is equivalent to a continuous map $\gamma : \mathbb{R} \to X$ which is
periodic in the sense that $\gamma(t + 1) = \gamma(t)$ for all $t \in \mathbb{R}$. We will abuse notation
by using the same $\gamma$ for $\gamma : S^1 \to X$ and the periodic $\gamma : \mathbb{R} \to X$. (This is
because $S^1$ is naturally the quotient $\mathbb{R}/\mathbb{Z}$, where $\mathbb{Z}$ acts on $\mathbb{R}$ by adding
integers to real numbers.) Two loops $\gamma_0, \gamma_1 : S^1 \to X$ are freely homotopic if
there is a continuous homotopy
\[ G : [0, 1] \times S^1 \to X, \quad G(0, t) = \gamma_0(t), \quad G(1, t) = \gamma_1(t). \]
The condition of being freely homotopic is an equivalence relation, and thus
each loop on a manifold $X$ is a member of a free homotopy class.
Here is our problem:
Problem: Find a curve of least length in a free homotopy class of loops on

a compact Riemannian manifold.
The problem may have no solution on a noncompact Riemannian manifold.
There may be loops of arbitrarily small length in a given nontrivial free
homotopy class, corresponding to loops slipping off a narrowing end of the
manifold.
Homotopy classes are objects defined by continuity, and the following
result should come as no surprise.
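Proposition 56. For a smooth compact manifold $X \subset \mathbb{R}^N$, there is an $\epsilon > 0$
so that if two loops $\gamma_0, \gamma_1 : S^1 \to X \subset \mathbb{R}^N$ satisfy
\[ \|\gamma_0 - \gamma_1\|_{C^0(S^1, \mathbb{R}^N)} < \epsilon, \]
then $\gamma_0$ and $\gamma_1$ are homotopic as loops in $X$.
Proof. We apply the $\epsilon$-Neighborhood Theorem (19): For $\epsilon > 0$, let $X^\epsilon$ be the
open subset of $\mathbb{R}^N$ consisting of all points of distance less than $\epsilon$ from $X$. There
is an $\epsilon > 0$ small enough so that every point in $X^\epsilon$ has a unique closest point
in $X$. Then the map $\pi : X^\epsilon \to X$ which sends a point in $X^\epsilon$ to its closest
point in $X$ is a smooth map of $X^\epsilon$ to $X$, and it fixes each point in $X \subset X^\epsilon$.
Let $\gamma_0$ and $\gamma_1$ be loops on $X$ satisfying
\[ \|\gamma_0 - \gamma_1\|_{C^0(S^1, \mathbb{R}^N)} < \epsilon. \]
Then consider the homotopy in $\mathbb{R}^N$
\[ \tilde G(s, t) = (1 - s)\gamma_0(t) + s\gamma_1(t) \in \mathbb{R}^N. \]
For $s, t \in [0, 1]$, the distance in $\mathbb{R}^N$
\[ |\tilde G(s, t) - \gamma_0(t)| = s|\gamma_0(t) - \gamma_1(t)| < 1 \cdot \epsilon. \]
So $\tilde G(s, t) \in X^\epsilon$ for all $s, t \in [0, 1]$, and we may define a homotopy in $X$ by
$G(s, t) = \pi(\tilde G(s, t))$.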
Remark. The homotopy $G(s, t)$ constructed is a smooth homotopy if $\gamma_0$ and
$\gamma_1$ are smooth. Thus the same theorem works with smooth homotopy classes
(as considered in Guillemin and Pollack).
Corollary 57. If $\gamma_i$ are a sequence of loops in a free homotopy class in
$X \subset \mathbb{R}^N$, and
\[ \lim_{i \to \infty} \|\gamma_i - \gamma\|_{C^0(S^1, \mathbb{R}^N)} = 0, \]
then the loop $\gamma$ is in the same free homotopy class.
Proof. For the $\epsilon > 0$ of Proposition 56 above, there is a $\gamma_i$ so that
\[ \|\gamma_i - \gamma\|_{C^0(S^1, \mathbb{R}^N)} < \epsilon. \]
Apply Proposition 56 to show $\gamma$ and $\gamma_i$ are in the same free homotopy class.
The $\epsilon$-Neighborhood Theorem, together with the mollifier technique of
approximation, allows us to prove an important foundational result in topology:
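The construction in the proof of Proposition 56 can be made concrete when $X$ is the unit circle in $\mathbb{R}^2$, where the closest-point map is $p \mapsto p / |p|$ and is defined on all points within distance 1 of the circle. The sketch below is an illustration under these assumptions; the two sample loops and all function names are chosen here and are not taken from the notes.

```python
import math

def closest_point_on_circle(p):
    """Closest point of the unit circle to p (requires p != 0)."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def gamma0(t):
    """The standard loop t -> (cos 2 pi t, sin 2 pi t)."""
    a = 2 * math.pi * t
    return (math.cos(a), math.sin(a))

def gamma1(t):
    """A nearby loop on the circle: the same loop with a perturbed angle."""
    a = 2 * math.pi * t + 0.3 * math.sin(2 * math.pi * t)
    return (math.cos(a), math.sin(a))

def homotopy(s, t):
    """Straight-line homotopy in R^2, followed by the closest-point map."""
    p0, p1 = gamma0(t), gamma1(t)
    q = ((1 - s) * p0[0] + s * p1[0], (1 - s) * p0[1] + s * p1[1])
    return closest_point_on_circle(q)

if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        print(s, homotopy(s, 0.25))
```

Here the two loops differ by at most $2\sin(0.15) < 1$ in the $C^0$ norm, so the straight-line homotopy never passes through the origin and the projection is well defined.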
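Theorem 20. Let $f : \mathbb{R}^n \to Y$ be uniformly continuous, where $Y \subset \mathbb{R}^N$ is
a compact submanifold without boundary. Then $f$ is homotopic to a smooth
map from $\mathbb{R}^n \to Y$.
Proof. Since $f$ is uniformly continuous, for all $\epsilon > 0$, there is a $\delta > 0$ so
that if $|x - x'| < \delta$, then $|f(x) - f(x')| < \epsilon$. The $\epsilon$-Neighborhood Theorem
shows that there is an $\epsilon > 0$ so that the map $\pi : Y^\epsilon \to Y$ is well-defined and
smooth. Let $\delta$ be the corresponding $\delta$ from the uniform continuity of $f$.
Let $\rho$ be a smooth nonnegative bump function with support in the unit
ball $B_1(0)$ in $\mathbb{R}^n$ so that $\int_{\mathbb{R}^n} \rho\, dx_n = 1$. Then for $\sigma > 0$, define $\rho_\sigma(x) =
\sigma^{-n} \rho(x / \sigma)$. Note $\operatorname{supp} \rho_\sigma \subset \overline{B_\sigma(0)}$. Define
\[ f_\sigma(x) = \int_{\mathbb{R}^n} f(y)\, \rho_\sigma(x - y)\, dy_n = \int_{\{y : |x - y| \leq \sigma\}} f(y)\, \rho_\sigma(x - y)\, dy_n. \]
(Note each $f_\sigma$ is $\mathbb{R}^N$-valued.) If $\sigma < \delta$, then $|f(y) - f(x)| < \epsilon$ for $y$ in the
domain of integration, and so
\begin{align*}
f_\sigma(x) &= \int_{\{y : |x - y| \leq \sigma\}} f(y)\, \rho_\sigma(x - y)\, dy_n \\
&= \int_{\{y : |x - y| \leq \sigma\}} [f(y) - f(x)]\, \rho_\sigma(x - y)\, dy_n + \int_{\{y : |x - y| \leq \sigma\}} f(x)\, \rho_\sigma(x - y)\, dy_n \\
&= \int_{\{y : |x - y| \leq \sigma\}} [f(y) - f(x)]\, \rho_\sigma(x - y)\, dy_n + f(x)
\end{align*}
since
\[ \int_{\{y : |x - y| \leq \sigma\}} \rho_\sigma(x - y)\, dy_n = \int_{\mathbb{R}^n} \rho_\sigma(x - y)\, dy_n = \int_{\mathbb{R}^n} \rho_\sigma(z)\, dz_n = 1 \]
for the substitution $z = x - y$. So
\begin{align*}
|f_\sigma(x) - f(x)| &= \left| \int_{\{y : |x - y| \leq \sigma\}} [f(y) - f(x)]\, \rho_\sigma(x - y)\, dy_n \right| \tag{32} \\
&\leq \int_{\{y : |x - y| \leq \sigma\}} |f(y) - f(x)|\, \rho_\sigma(x - y)\, dy_n \\
&< \epsilon \int_{\{y : |x - y| \leq \sigma\}} \rho_\sigma(x - y)\, dy_n = \epsilon.
\end{align*}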
Therefore if (0, ), then f (x) Y . Then we check that f (x) =
(f (x)) is the desired homotopy. In particular, as 0, f (x) f (x)
uniformly by (32) (view as varying to zero instead of fixed for this inter-
pretation). Since and f are smooth, then f is smooth for small > 0.
In particular, we have shown that
f (x) for > 0 small

F (, x) =
f (x) for = 0
is the desired homotopy.
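The uniform estimate (32) can be checked numerically in one dimension. The following is only a sketch under stated assumptions: the choice $f(x) = |x|$ (for which $\delta = \epsilon$ works), the particular bump function, the midpoint quadrature, and all names in the code are illustrative and are not taken from the notes.

```python
import math

def bump(z):
    """Unnormalized smooth bump supported in [-1, 1] (an illustrative choice of rho)."""
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1.0 else 0.0

def mollifier_weights(m=400):
    """Midpoint nodes z_k in (-1, 1) and weights proportional to rho(z_k), normalized to sum to 1."""
    dz = 2.0 / m
    nodes = [-1.0 + (k + 0.5) * dz for k in range(m)]
    raw = [bump(z) for z in nodes]
    total = sum(raw)
    return nodes, [r / total for r in raw]

NODES, WEIGHTS = mollifier_weights()

def mollify(f, tau, x):
    """Approximate f_tau(x) = int f(y) rho_tau(x - y) dy using the substitution y = x - tau*z."""
    return sum(w * f(x - tau * z) for z, w in zip(NODES, WEIGHTS))

def sup_error(f, tau, xs):
    """Largest value of |f_tau(x) - f(x)| over the sample points xs."""
    return max(abs(mollify(f, tau, x) - f(x)) for x in xs)

f = abs  # uniformly continuous; |f(x) - f(x')| <= |x - x'|, so delta = epsilon
xs = [-2.0 + 0.01 * k for k in range(401)]
errors = {tau: sup_error(f, tau, xs) for tau in (0.5, 0.1, 0.02)}
```

For this $f$, each entry of `errors` should stay below the corresponding $\tau$, in line with (32), and should shrink as $\tau$ decreases.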
Theorem 21. Let $f : X \to Y$ be a continuous map between smooth manifolds. Then $f$ is homotopic to a smooth map from $X \to Y$.
Sketch of proof. We may assume $X \subset \mathbb{R}^M$ by Whitney's Embedding Theorem. Then there is a $\delta > 0$ so that $\pi_M : X^\delta \to X$ is well-defined and smooth. Define $g : \mathbb{R}^M \to \mathbb{R}^N$ by $g(p) = f(\pi_M(p))$ for $p \in X^\delta$ and $g(p) = 0$ for $p \notin X^\delta$. Note $g(p)$ is uniformly continuous on a neighborhood of $X$. Apply the mollifier argument as above to $g$ and show that the homotopy constructed in the proof of Theorem 20, when restricted to $X \subset \mathbb{R}^M$, has the desired properties.
The discussion above about energy and length still holds. Assuming the
minimizer is smooth enough, then a constant-speed length-minimizing loop
is the same as an energy-minimizing loop. Thus we may as well consider
energy-minimizing loops, and we have the equivalent problem.
Problem: Find a curve of least energy in a free homotopy class of loops on a compact Riemannian manifold.
So far in our discussion, the formulation of length and energy depends on the loop $\gamma$ being $C^1$ (so that the derivative $\dot\gamma$ is $C^0$ and thus can be integrated). If we look more closely, the energy is defined as the square of the $L^2$ norm of $\dot\gamma$:
\[ E(\gamma) = \int_0^1 |\dot\gamma|_g^2\,dt. \]
Therefore, we really do not need $\dot\gamma$ to be continuous, but only $L^2$. In terms of $\gamma$ itself, we need to develop a theory of how to take a derivative which ends up not being continuous, but only $L^2$. For this purpose, we define derivatives in the sense of distributions, or weak derivatives.
4.4 Distributions
On $\mathbb{R}^n$, we consider each smooth function with compact support to be a test function. For any $C^1$ function $f$ on $\mathbb{R}^n$ and test function $\phi$, we have the following formula by integrating by parts:
\[ \int_{\mathbb{R}^n} f_{,i}\,\phi\,dx_n = -\int_{\mathbb{R}^n} f\,\phi_{,i}\,dx_n. \tag{33} \]
For two locally $L^1$ functions $f$ and $h$ on $\mathbb{R}^n$, we say $f_{,i} = h$ in the sense of distributions if for all test functions $\phi$,
\[ \int_{\mathbb{R}^n} h\,\phi\,dx_n = -\int_{\mathbb{R}^n} f\,\phi_{,i}\,dx_n. \]
Let $\mathcal{D}(\mathbb{R}^n)$ be the vector space of all smooth functions with compact support in $\mathbb{R}^n$. For our purposes, we will define a distribution on $\mathbb{R}^n$ to be a linear map from $\mathcal{D}(\mathbb{R}^n) \to \mathbb{R}$. We often allow $\mathbb{C}$-valued test functions and consider complex linear maps to $\mathbb{C}$; complex-valued functions are useful when doing Fourier analysis. (The usual definition of a distribution is more involved: one must define a topology on $\mathcal{D}(\mathbb{R}^n)$ and then consider distributions to be only continuous linear maps to $\mathbb{C}$. For our purposes, the simpler definition suffices. See Section 4.9 below for a more standard treatment of distributions on the circle $S^1$.) Recall a measurable function $f$ is locally $L^1$ if over every compact subset $K$ of the domain of $f$, $\int_K |f| < \infty$. Any locally $L^1$ function $f$ on $\mathbb{R}^n$ gives a distribution by sending
\[ f : \phi \mapsto f(\phi) = \int_{\mathbb{R}^n} f\,\phi\,dx_n. \]
Notice that there is a slight abuse of notation: $f(\phi)$ for $\phi$ a test function is not to be confused with $f(x)$ for $x \in \mathbb{R}^n$. Two locally $L^1$ functions $f_1, f_2$ are said to be equal in the sense of distributions if for every test function $\phi$,
\[ \int_{\mathbb{R}^n} f_1\,\phi\,dx_n = \int_{\mathbb{R}^n} f_2\,\phi\,dx_n \iff \int_{\mathbb{R}^n} (f_1 - f_2)\,\phi\,dx_n = 0. \]
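Remark. On $\mathbb{R}^n$, note that any locally $L^p$ function for $p \ge 1$ is also locally $L^1$. This is because for compact $K \subset \mathbb{R}^n$, $\frac1p + \frac1q = 1$, and $f$ locally $L^p$, Hölder's inequality states
\[ \int_K |f|\,dx_n \le \left( \int_K 1\,dx_n \right)^{\frac1q} \left( \int_K |f|^p\,dx_n \right)^{\frac1p} < \infty. \]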
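Example 18. Any locally finite Borel measure $d\mu$ on $\mathbb{R}^n$ defines a distribution by sending
\[ \phi \mapsto \int_{\mathbb{R}^n} \phi\,d\mu \]
for any test function $\phi$.
An important example of this is the inaptly named $\delta$-function, or unit point mass, at the origin. The $\delta$-function is a measure on $\mathbb{R}^n$ so that for any subset $\Omega \subset \mathbb{R}^n$,
\[ \delta(\Omega) = \begin{cases} 1 & \text{if } 0 \in \Omega, \\ 0 & \text{if } 0 \notin \Omega. \end{cases} \]
So the distribution defined by this measure is
\[ \delta : \phi \mapsto \phi(0), \]
which is just evaluation of $\phi$ at the origin. The following problem shows there is no locally $L^1$ function which is equal to the $\delta$-function.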
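Although no locally $L^1$ function equals the $\delta$-function, rescaled bump functions come close when paired with a fixed smooth function. The sketch below is a numerical illustration only: the bump function, the function $\phi(x) = \cos x + x$ (smooth with $\phi(0) = 1$, but not compactly supported, which is harmless here since only its values on $[-\epsilon, \epsilon]$ are used), and the quadrature rule are all assumptions made for the example.

```python
import math

def bump(z):
    """Unnormalized smooth bump supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1.0 else 0.0

def pair_with_mollifier(phi, eps, m=2000):
    """Midpoint-rule value of int rho_eps(x) phi(x) dx, where rho_eps(x) = rho(x/eps)/eps."""
    dz = 2.0 / m
    nodes = [-1.0 + (k + 0.5) * dz for k in range(m)]
    weights = [bump(z) for z in nodes]
    total = sum(weights)  # numerical normalization so that rho integrates to 1
    # The substitution x = eps*z turns rho_eps(x) dx into rho(z) dz.
    return sum(w * phi(eps * z) for z, w in zip(nodes, weights)) / total

def phi(x):
    return math.cos(x) + x  # phi(0) = 1

values = {eps: pair_with_mollifier(phi, eps) for eps in (0.5, 0.1, 0.01)}
```

As $\epsilon$ decreases, the entries of `values` should approach $\phi(0) = 1$, which is the value $\delta(\phi)$.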
Homework Problem 45. Show that there is no $L^1$ function $f$ on $\mathbb{R}^n$ so that
\[ \int_{\mathbb{R}^n} f\,\phi\,dx_n = \phi(0) \quad \text{for all } \phi \in \mathcal{D}(\mathbb{R}^n). \]
Hint: Consider a smooth nonnegative function $\rho : \mathbb{R}^n \to \mathbb{R}$ with support in $B_1(0)$, the unit ball centered at 0, and so that $\int_{\mathbb{R}^n} \rho\,dx_n = 1$. Use this to define $\rho_\epsilon(x) = \epsilon^{-n}\rho(x/\epsilon)$. If there were such an $L^1$ function $f$, recall that if
\[ f_\epsilon(x) = \int_{\mathbb{R}^n} f(y)\rho_\epsilon(x-y)\,dy_n, \]
then $f_\epsilon \to f$ in $L^1$ as $\epsilon \to 0$.
(a) Show that for all $x \ne 0$, $f_\epsilon(x) = 0$ for $\epsilon$ small enough. (Follow the proof of Proposition 58.)
(b) Suppose a family of continuous functions $f_\epsilon \to f$ in $L^1(\mathbb{R}^n)$ as $\epsilon \to 0^+$, and let $O \subset \mathbb{R}^n$ be a measurable subset on which $f_\epsilon = 0$ identically for all $\epsilon$ sufficiently small. Show that $f = 0$ almost everywhere on $O$. (Split up the relevant integrals on $\mathbb{R}^n$ into integrals on $O$ and $\mathbb{R}^n \setminus O$.)
(c) Show our $f = 0$ almost everywhere on $\mathbb{R}^n$.
(d) Find a contradiction.
We have just seen that distributions are more general than functions.
In particular, it is possible to differentiate any distribution by mimicking
formula (33). A distributional derivative of a function may no longer be a
function, but it will be well-defined as a distribution. Given a distribution f
defined by a map $f : \phi \mapsto f(\phi) \in \mathbb{R}$, the partial derivative $f_{,i}$ in the sense of distributions is defined to be the distribution
\[ f_{,i} : \phi \mapsto -f(\phi_{,i}). \]
It is this innovation which allows us to define the derivatives of $L^2$ functions.
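As a concrete instance of this definition on $\mathbb{R}$, the function $|x|$ is not differentiable at 0, yet its distributional derivative is represented by the sign function. The sketch below tests the defining identity numerically for a single test function; the shifted bump used as the test function, the constant `CENTER`, and the midpoint quadrature are assumptions chosen for illustration.

```python
import math

CENTER = 0.3  # hypothetical shift, so that the test function is not symmetric about 0

def phi(x):
    """Smooth test function supported in [CENTER - 1, CENTER + 1]."""
    z = x - CENTER
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1.0 else 0.0

def dphi(x):
    """Derivative of phi, computed by the chain rule."""
    z = x - CENTER
    if abs(z) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * z) / (1.0 - z * z) ** 2

def integrate(g, a=-2.0, b=2.0, m=4000):
    """Midpoint rule on [a, b]; with these defaults x = 0 is a cell boundary."""
    dx = (b - a) / m
    return sum(g(a + (k + 0.5) * dx) for k in range(m)) * dx

def sign(x):
    return 1.0 if x > 0 else -1.0

lhs = -integrate(lambda x: abs(x) * dphi(x))  # the derivative of |x| applied to phi: -f(phi')
rhs = integrate(lambda x: sign(x) * phi(x))   # the function sign(x) paired with phi
```

The two numbers `lhs` and `rhs` should agree up to quadrature error, which is what it means for the derivative of $|x|$ to equal $\operatorname{sign}(x)$ in the sense of distributions (tested here against one $\phi$ only).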

Remark. Note that the equation (33) motivating the distributional derivative is essentially the same as the integration by parts used to calculate the Euler-Lagrange equations for $\gamma + \epsilon h$. Thus if $h$ is smooth with compact support, we can still integrate by parts even if $\gamma$ is no longer regular enough for ordinary differentiation; we simply consider the derivatives to be taken in the sense of distributions.
Homework Problem 46. Consider the Heaviside function
\[ h(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases} \]
Show that the derivative $h'$ (taken in the sense of distributions) is the $\delta$-function on $\mathbb{R}$.
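Homework Problem 47. Consider for any test function $\phi \in \mathcal{D}(\mathbb{R})$,
\[ PV\!\left(\tfrac1x\right)(\phi) = \lim_{\epsilon\to0^+} \left[ \int_{-\infty}^{-\epsilon} \frac1x\,\phi(x)\,dx + \int_{\epsilon}^{\infty} \frac1x\,\phi(x)\,dx \right]. \]
Part (a) shows that $PV(\frac1x)$ is a distribution. It is called the principal value of $\frac1x$.
(a) Show $PV(\frac1x)(\phi)$ converges for all smooth test functions $\phi$. (Hint: The potential problem is clearly at $x = 0$. Use Taylor's Theorem to write $\phi = \phi(0) + O(x)$, where $O(x)$ represents a term so that $O(x)/x$ converges to a real limit as $x \to 0$.)
(b) Show that the first derivative in the sense of distributions of $PV(\frac1x)$ is given in terms of $\phi \in \mathcal{D}(\mathbb{R})$ as
\[ \lim_{\epsilon\to0^+} \left[ \int_{-\infty}^{-\epsilon} -\frac1{x^2}\,\phi(x)\,dx + \int_{\epsilon}^{\infty} -\frac1{x^2}\,\phi(x)\,dx + \frac2\epsilon\,\phi(0) \right]. \]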
One more thing is needed to complete the picture of distributions as generalizations of functions. Recall that every locally $L^p$ function for $p \ge 1$ defines a distribution. The following proposition shows this map is injective.
Proposition 58. If two locally $L^1$ functions $f_1$ and $f_2$ on $\mathbb{R}^n$ define the same distribution, then $f_1 = f_2$ almost everywhere.
Proof. We first consider the case when $f_1$ and $f_2$ are both globally $L^1$ on $\mathbb{R}^n$. Then recall that we can use a mollifier to approximate each in $L^1$ by smooth functions. In particular, if $\rho$ is a smooth nonnegative function with compact support so that $\int_{\mathbb{R}^n} \rho\,dx_n = 1$, then define
\[ \rho_\epsilon(x) = \frac1{\epsilon^n}\,\rho\!\left(\frac x\epsilon\right), \qquad f_i^\epsilon(x) = \int_{\mathbb{R}^n} \rho_\epsilon(x-y) f_i(y)\,dy_n, \quad i = 1, 2. \]
Then each $f_i^\epsilon$ is a smooth $L^1$ function on $\mathbb{R}^n$ and $f_i^\epsilon \to f_i$ in $L^1$ as $\epsilon \to 0$.
Now for each fixed $x \in \mathbb{R}^n$, $\rho_\epsilon(x - y)$ is a smooth test function with compact support in $y$, and $f_i^\epsilon(x)$ is simply the evaluation of this test function by the distribution $f_i$. Since $f_1 = f_2$ in the sense of distributions, then $f_1^\epsilon(x) = f_2^\epsilon(x)$ for all $x \in \mathbb{R}^n$. So then
\[ \|f_1 - f_2\|_{L^1} = \lim_{\epsilon\to0} \|f_1^\epsilon - f_2^\epsilon\|_{L^1} = \lim_{\epsilon\to0} 0 = 0. \]
Then $f_1 = f_2$ in $L^1$, which is equivalent to $f_1 = f_2$ almost everywhere.
If $f_1$ and $f_2$ are only locally $L^1$, consider a smooth function $\phi_R$ with compact support which is identically equal to 1 on $B_R = \{|x| \le R\}$. It is easy to check that the condition $f_1 = f_2$ in the sense of distributions implies $\phi_R f_1 = \phi_R f_2$ in the sense of distributions. Then since each $f_i$ is locally $L^1$, each $\phi_R f_i$ is globally in $L^1$. We apply the argument of the previous paragraph; so $\phi_R f_1 = \phi_R f_2$ almost everywhere on $\mathbb{R}^n$. This implies that $f_1 = f_2$ almost everywhere on the ball $B_R$. Now let $R \to \infty$ to conclude that $f_1 = f_2$ almost everywhere on $\mathbb{R}^n$.
So far, we have discussed distributions on $\mathbb{R}^n$. On the circle $S^1$, the definitions are similar, the main difference being that since $S^1$ is compact, our test functions are simply all smooth functions on $S^1$. In particular, we can think of test functions on $S^1$ as smooth periodic functions on $\mathbb{R}$ with period 1. In this way, an $L^1$ function $f$ on $S^1$ acts on test functions by
\[ f : \phi \mapsto \int_0^1 f\,\phi\,dt. \]
One thing to check is that integration by parts still works. If $f$ is $C^1$ on $S^1$ and $\phi$ is smooth on $S^1$, then
\[ \begin{aligned} \int_{S^1} \dot f\,\phi\,dt &= \int_0^1 \dot f\,\phi\,dt \\ &= -\int_0^1 f\,\dot\phi\,dt + (f\phi)\Big|_0^1 \\ &= -\int_{S^1} f\,\dot\phi\,dt + f(1)\phi(1) - f(0)\phi(0) \\ &= -\int_{S^1} f\,\dot\phi\,dt \end{aligned} \]
because $f(0) = f(1)$ and $\phi(0) = \phi(1)$ since $f$ and $\phi$ are periodic. So we have the same basic formula as in (33), and we may define distributions and distributional derivatives in the same manner as above.
Now we return to our problem. We want to consider all loops $\gamma : S^1 \to X \subset \mathbb{R}^N$ so that
\[ E(\gamma) = \int_0^1 |\dot\gamma|_g^2\,dt = \|\dot\gamma\|^2_{L^2(S^1,\mathbb{R}^N)} < \infty. \]
Therefore, we consider the Sobolev space
\[ L^2_1(S^1,\mathbb{R}^N) = \{ \gamma : S^1 \to \mathbb{R}^N : \|\gamma\|^2_{L^2_1} = \|\gamma\|^2_{L^2} + \|\dot\gamma\|^2_{L^2} < \infty \}, \]
where the derivative $\dot\gamma$ is taken in the sense of distributions. Note that $\dot\gamma \in L^2(S^1,\mathbb{R}^N)$ implies that $\dot\gamma$, when defined in the sense of distributions, may be represented as a function (and an $L^2$ function at that).
We may consider each component $\gamma^1, \dots, \gamma^N$ separately, and it should be clear that $\gamma_i \to \gamma$ in $L^2_1(S^1,\mathbb{R}^N)$ if and only if each $\gamma_i^a \to \gamma^a$ in $L^2_1(S^1,\mathbb{R})$ for each $a = 1, \dots, N$. Thus we may work with each component of $\gamma$ separately in $\mathbb{R}^N$. Below we will see that $L^2_1$ is a Hilbert space, but for now we are content to show that every function in $L^2_1(S^1)$ is continuous. Recall that elements of $L^2_1(S^1)$ are only equivalence classes of functions, two functions being equivalent if they agree almost everywhere.
Proposition 59. Every element of $L^2_1(\mathbb{R})$ contains a continuous representative.
Remark. This proposition is an important example of the Sobolev embedding theorem, which gives a means to embed Sobolev spaces $L^p_k(\mathbb{R}^n)$ into appropriate $C^\ell$ spaces, $\ell = \ell(p, k, n)$. In particular, the present result depends strongly on the fact that the dimension of the domain $\mathbb{R}$ of the functions is one. (There are elements of $L^2_1(\mathbb{R}^2)$ which do not have continuous representatives.)
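Proof. Let $f \in L^2_1(\mathbb{R})$. So $\int_{\mathbb{R}} |\dot f|^2\,dt = C^2 < \infty$. Then compute for $t_2 \ge t_1$
\[ \begin{aligned} |f(t_2) - f(t_1)| &= \left| \int_{t_1}^{t_2} \dot f(t)\,dt \right| \\ &\le \left( \int_{t_1}^{t_2} |\dot f(t)|^2\,dt \right)^{\frac12} \left( \int_{t_1}^{t_2} dt \right)^{\frac12} \\ &\le C (t_2 - t_1)^{\frac12}. \end{aligned} \]
So this formula shows $f$ is continuous, as long as we can justify using the Fundamental Theorem of Calculus
\[ f(t_2) - f(t_1) = \int_{t_1}^{t_2} \dot f(t)\,dt. \]
We achieve this by defining $g(t) = \int_0^t \dot f(s)\,ds$. The previous argument implies that $g$ is continuous. Now we argue that there is a constant $K$ so that $f - g = K$ almost everywhere. This will show there is a continuous representative $g + K$ in the equivalence class of $f$.
First we show that $\dot g = \dot f$ in the sense of distributions. Consider a test function $\phi$. Then
\[ \begin{aligned} \dot g(\phi) &= -\int_{-\infty}^{\infty} g(t)\dot\phi(t)\,dt \\ &= -\int_{-\infty}^{\infty} \left( \int_0^t \dot f(s)\,ds \right) \dot\phi(t)\,dt \\ &= -\iint_{R_1} \dot f(s)\dot\phi(t)\,ds\,dt + \iint_{R_2} \dot f(s)\dot\phi(t)\,ds\,dt \end{aligned} \]
by Fubini's Theorem, for the regions in the plane
\[ R_1 = \{(s,t) : s \ge 0,\ t \ge s\}, \qquad R_2 = -R_1. \]
Then again by Fubini, and since $\phi$ has compact support,
\[ \begin{aligned} \dot g(\phi) &= -\int_0^{\infty} \left( \int_s^{\infty} \dot\phi(t)\,dt \right) \dot f(s)\,ds + \int_{-\infty}^0 \left( \int_{-\infty}^s \dot\phi(t)\,dt \right) \dot f(s)\,ds \\ &= -\int_0^{\infty} (-\phi(s))\dot f(s)\,ds + \int_{-\infty}^0 \phi(s)\dot f(s)\,ds \\ &= \int_{-\infty}^{\infty} \phi(s)\dot f(s)\,ds \\ &= \dot f(\phi). \end{aligned} \]
Therefore, $\dot g = \dot f$ in the sense of distributions.
The following proposition, applied to $f - g$, shows that there is a constant $K$ so that $f = g + K$ in the sense of distributions. Then Proposition 58 above shows $f = g + K$ almost everywhere, and thus there is a continuous representative in the equivalence class of $f$.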
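The key estimate $|f(t_2) - f(t_1)| \le C (t_2 - t_1)^{1/2}$ can be tried out on an explicit function. The sketch below is an illustration under assumptions not made in the notes: it restricts attention to the interval $[0,1]$ and uses $f(t) = t^{3/4}$, whose derivative is square-integrable there but unbounded near 0, so that $f$ satisfies the square-root bound without being Lipschitz.

```python
import math

def f(t):
    return t ** 0.75  # f'(t) = 0.75 * t**(-0.25) is square-integrable on [0, 1]

# C^2 = int_0^1 |f'|^2 dt = 0.5625 * int_0^1 t**(-0.5) dt = 1.125
C = math.sqrt(1.125)

grid = [k / 200 for k in range(201)]

# Largest observed value of |f(t2) - f(t1)| / sqrt(t2 - t1) over all grid pairs t1 < t2.
worst_ratio = max(
    abs(f(t2) - f(t1)) / math.sqrt(t2 - t1)
    for i, t1 in enumerate(grid)
    for t2 in grid[i + 1:]
)

# f is not Lipschitz at 0: the difference quotient grows like t**(-0.25).
lipschitz_quotient = abs(f(1e-8) - f(0.0)) / 1e-8
```

Here `worst_ratio` should not exceed `C`, while `lipschitz_quotient` is already large, which shows that the exponent $\frac12$ cannot in general be improved to 1.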
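Proposition 60. If a distribution $h$ on $\mathbb{R}$ satisfies $\dot h = 0$ in the sense of distributions, then there is a constant $K$ so that $h = K$ as distributions.
Proof. Let $\psi$ be a test function with integral $\int_{\mathbb{R}} \psi\,dt = 1$. Let $K = h(\psi)$. Then for a test function $\phi$ with $\int_{\mathbb{R}} \phi\,dt = L$, compute
\[ h(\phi) = h(\phi - L\psi) + L\,h(\psi) = h(\phi - L\psi) + LK. \]
But now
\[ \int_{-\infty}^{\infty} (\phi - L\psi)\,dt = L - L \cdot 1 = 0, \]
and thus the function
\[ \chi(t) = \int_{-\infty}^t [\phi(s) - L\psi(s)]\,ds \tag{34} \]
is a smooth function with compact support. Proof: Let $\operatorname{supp}(\phi - L\psi) \subset [T, T']$. It is clear that $\chi(t) = 0$ for $t < T$. For $t > T'$, note that $\chi'(t) = \phi(t) - L\psi(t) = 0$ and so $\chi$ is constant on $(T', \infty)$. Then (34) shows that $\chi(t) \to 0$ as $t \to \infty$, and so $\chi = 0$ on $(T', \infty)$.
Then since $\dot\chi = \phi - L\psi$,
\[ h(\phi) = LK + h(\phi - L\psi) = LK + h(\dot\chi) = LK - \dot h(\chi) = LK \]
since $\dot h = 0$ in the sense of distributions. But then
\[ h(\phi) = LK = K \int_{\mathbb{R}} \phi\,dt = \int_{\mathbb{R}} K\phi\,dt, \]
and $h = K$ as distributions.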
Homework Problem 48. Prove Propositions 59 and 60 above for distributions on $S^1$ instead of on $\mathbb{R}$. Here are the key steps:
(a) Let $f : S^1 \to \mathbb{R}$ be an $L^2$ function, and assume that the distributional derivative $\dot f$ is $L^2$ as well. Represent $f$ and $\dot f$ as periodic functions from $\mathbb{R} \to \mathbb{R}$. For any $t \in \mathbb{R}$, define
\[ g(t) = \int_0^t \dot f(s)\,ds. \]
Show that $g$ is periodic and continuous (and so defines a continuous function on $S^1$). Note that the constant function 1 is a test function on $S^1$.
(b) Show that $\dot f = \dot g$ in the sense of distributions. In other words, for every smooth periodic test function $\phi \in \mathcal{D}(S^1)$, show that
\[ \int_0^1 f\,\dot\phi\,dt = \int_0^1 g\,\dot\phi\,dt. \]
(c) If $h$ is a distribution on $S^1$ which satisfies $\dot h = 0$ in the sense of distributions, show there is a constant $K$ so that $h = K$ as distributions. In other words, show that for every periodic smooth $\phi : \mathbb{R} \to \mathbb{R}$,
\[ h(\phi) = K \int_0^1 \phi\,dt. \]
Now since any $L^2_1$ map from $S^1 \to X \subset \mathbb{R}^N$ is continuous, each one is in a free homotopy class of loops on $X$. With that in mind, we formulate our final version of the problem:
For $X \subset \mathbb{R}^N$ a smooth submanifold with Riemannian metric pulled back from the Euclidean metric on $\mathbb{R}^N$, define
\[ L^2_1(S^1, X) = \{ \gamma \in L^2_1(S^1,\mathbb{R}^N) : \gamma(S^1) \subset X \}. \]
Here we assume that $\gamma$ is continuous, as we may by Proposition 59 above.
Problem: Let $X \subset \mathbb{R}^N$ be a smooth compact manifold equipped with the Riemannian metric pulled back from the Euclidean metric on $\mathbb{R}^N$. Let $C$ be the class of loops $\gamma : S^1 \to X$ in a free homotopy class on $X$ and in $L^2_1(S^1, X)$. Find a loop of least energy in $C$.
Proposition 61. Let $\gamma \in L^2_1(S^1, X)$ be energy minimizing in a free homotopy class on $X$ for $X \subset \mathbb{R}^N$ a smooth manifold without boundary. Then $\gamma$ solves (a version of) the geodesic equation
\[ 2(g_{ik}\dot\gamma^i)^{\cdot} - g_{ij,k}\dot\gamma^i\dot\gamma^j = 0 \]
for all $k = 1, \dots, n$, in the sense of distributions.
Proof. First of all, note that we can choose $\gamma$ to be continuous by Problem 48 above. Thus it makes sense that $\gamma$ is in a free homotopy class. Since $\gamma$ minimizes energy, for each $h$ smooth with compact support so that $\gamma(\operatorname{supp} h)$ lies in a single coordinate chart in $X$, we have
\[ \frac{d}{d\epsilon}\Big|_{\epsilon=0} E(\gamma + \epsilon h) = 0. \]
Compute the first variation as in the derivation of the Euler-Lagrange equations in Subsection 4.2 above:
\[ \int_{S^1} g_{ij,k}\,h^k\dot\gamma^i\dot\gamma^j\,dt + \int_{S^1} g_{ij}\,\dot h^i\dot\gamma^j\,dt + \int_{S^1} g_{ij}\,\dot\gamma^i\dot h^j\,dt = 0. \]
Since the components of $h$ are smooth with compact support, they act as test functions, and we may then integrate by parts in the second and third integrals, in the sense of distributions, to conclude that
\[ 0 = (g_{kj}\dot\gamma^j)^{\cdot} + (g_{ik}\dot\gamma^i)^{\cdot} - g_{ij,k}\dot\gamma^i\dot\gamma^j = 2(g_{ik}\dot\gamma^i)^{\cdot} - g_{ij,k}\dot\gamma^i\dot\gamma^j \]
in the sense of distributions.

Remark. In the previous proposition, we cannot immediately perform the
usual rules of calculus, since the objects involved are only distributions. In
particular, we show in the next homework problem that functions which are
only continuous cannot be meaningfully multiplied by distributions which
are not Borel measures.
Homework Problem 49. Note that if $\phi : \mathbb{R}^n \to \mathbb{R}$ is a smooth function, and $f$ is a locally $L^1$ function, then the product $\phi f$ is also a locally $L^1$ function.
(a) If $\phi : \mathbb{R}^n \to \mathbb{R}$ is a smooth function, and $p$ is a distribution on $\mathbb{R}^n$, then show that it is possible to define the product $\phi p$ in such a way that if $p$ is induced from a locally $L^1$ function, then $\phi p$ is induced from the usual product of two functions.
(b) Let $\delta$ be the $\delta$-function on $\mathbb{R}$. Compute its first derivative $\delta'$ in the sense of distributions.
(c) Show that if $g : \mathbb{R} \to \mathbb{R}$ is a continuous function which is not differentiable at 0, then the formula for the product developed in part (a) above does not give a reliable answer for the product $g\delta'$ of the continuous function $g$ and the distribution $\delta'$.
4.5 Hilbert spaces

Recall that a Hilbert space is a Banach space whose norm comes from a positive definite inner product. We now show that $L^2_1(S^1,\mathbb{R})$ is a Hilbert space. Recall that $L^2_1(S^1,\mathbb{R})$ consists of all $L^2$ functions on $S^1$ whose derivative in the sense of distributions is also $L^2$. This suggests a natural inner product:
\[ \langle f, h \rangle_{L^2_1} = \int_{S^1} f h\,dt + \int_{S^1} \dot f\,\dot h\,dt. \]
Then plug in $h = f$ to find
\[ \|f\|^2_{L^2_1} = \int_{S^1} |f|^2\,dt + \int_{S^1} |\dot f|^2\,dt = \langle f, f \rangle_{L^2_1}, \]
and so the norm on $L^2_1$ is induced by the inner product. Below in Corollary 67, we show that any positive definite inner product defines a norm.
Remark. $L^2_1(S^1,\mathbb{R}^N)$ is also naturally a Hilbert space, with inner product given by
\[ \langle f, h \rangle_{L^2_1} = \int_{S^1} \langle f, h \rangle\,dt + \int_{S^1} \langle \dot f, \dot h \rangle\,dt, \]
where $\langle \cdot, \cdot \rangle$ is the inner product on $\mathbb{R}^N$.
It is also useful to define complex Hilbert spaces, in which the inner product $\langle \cdot, \cdot \rangle$ is Hermitian and positive definite. A Hermitian inner product on a complex vector space $V$ is a map from $V \times V \to \mathbb{C}$ which satisfies for $\lambda \in \mathbb{C}$ and $f, g, h \in V$,
\[ \begin{aligned} \langle \lambda f + g, h \rangle &= \lambda \langle f, h \rangle + \langle g, h \rangle, \\ \langle f, \lambda g + h \rangle &= \bar\lambda \langle f, g \rangle + \langle f, h \rangle, \\ \langle f, g \rangle &= \overline{\langle g, f \rangle}. \end{aligned} \]
These three conditions are respectively that the inner product is complex linear in the first slot, complex antilinear in the second slot, and conjugate-symmetric. The first two conditions together are called sesquilinear.
Then $L^2_1(S^1,\mathbb{C})$ is a complex Hilbert space with inner product
\[ \langle f, g \rangle = \int_{S^1} f\,\bar g\,dt + \int_{S^1} \dot f\,\bar{\dot g}\,dt. \]
We can also define the Sobolev space $L^2_1(\mathbb{R}^n,\mathbb{R})$ by the inner product
\[ \langle f, g \rangle = \int_{\mathbb{R}^n} f g\,dx_n + \sum_{i=1}^n \int_{\mathbb{R}^n} f_{,i}\,g_{,i}\,dx_n, \]
the derivatives taken in the sense of distributions. The elements of $L^2_1(\mathbb{R}^n,\mathbb{R})$ are then equivalence classes of functions in $L^2$ so that all the first partials in the sense of distributions are also in $L^2$.
We will work with $L^2_1(S^1,\mathbb{R})$ instead of $L^2_1(S^1,\mathbb{R}^N)$, since convergence in $L^2_1(S^1,\mathbb{R}^N)$ is equivalent to each component converging in $L^2_1(S^1,\mathbb{R})$. The proofs that follow will work with minor modifications for the spaces $L^2_1(S^1,\mathbb{R}^N)$ and $L^2_1(S^1,\mathbb{C})$.
We focus on $L^2_1(S^1,\mathbb{R})$, which we refer to simply as $L^2_1$.
Proposition 62. $L^2_1(S^1,\mathbb{R})$ is a Hilbert space.
Proof. We've exhibited an inner product on $L^2_1$, and it is easy to check that it is positive definite (if we consider elements to be equivalence classes of functions, two functions being equivalent if they agree almost everywhere). Thus the remaining thing to check is that the metric on $L^2_1(S^1,\mathbb{R})$ is complete (and so it is a Banach space).
First of all note that $f_n \to f$ in $L^2_1$ is equivalent to $f_n \to f$ in $L^2$ and $\dot f_n \to \dot f$ in $L^2$.
Let $f_n$ be a Cauchy sequence in $L^2_1$. Then by the definition of the norm, it is clear that $f_n$ and $\dot f_n$ are both Cauchy sequences in $L^2$. Then we have limits $f_n \to f$ and $\dot f_n \to g$ in $L^2$. In order to show that $f_n \to f$ in $L^2_1$, it suffices to show that $\dot f = g$ in the sense of distributions.
Let $\phi$ be a test function, and note that $f_n \to f$ in $L^2$ implies by Hölder's inequality that
\[ |f_n(\phi) - f(\phi)| = \left| \int_{S^1} (f_n - f)\phi\,dt \right| \le \|f_n - f\|_{L^2} \|\phi\|_{L^2} \to 0 \]
as $n \to \infty$. We use this fact for both $f_n \to f$ and $\dot f_n \to g$ to compute for a test function $\phi$
\[ \begin{aligned} g(\phi) = \int_{S^1} g\,\phi\,dt &= \lim_{n\to\infty} \int_{S^1} \dot f_n\,\phi\,dt \\ &= -\lim_{n\to\infty} \int_{S^1} f_n\,\dot\phi\,dt \\ &= -\int_{S^1} f\,\dot\phi\,dt \\ &= -f(\dot\phi) \\ &= \dot f(\phi). \end{aligned} \]
Therefore, $g = \dot f$ in the sense of distributions.
Remark. Essentially the same proof shows that $L^2_1(\mathbb{R}^n,\mathbb{R}^m)$ is a Hilbert space.
For a real Hilbert space $H$, an orthonormal basis is a collection of elements $\{e_\alpha\}_{\alpha\in A}$ which are orthonormal in that
\[ \langle e_\alpha, e_\beta \rangle = \delta_{\alpha\beta} \]
and so that every element $v \in H$ can be written as
\[ v = \sum_{\alpha\in A} v^\alpha e_\alpha \]
for $v^\alpha \in \mathbb{R}$. Here $A$ is an index set, which may be finite, countably infinite, or uncountable (and of course the convergence of any infinite sum is controlled by the norm). A Hilbert space which has a countable (finite or infinite) orthonormal basis is called separable. The following is true:
Proposition 63. Every Hilbert space has an orthonormal basis. In fact,
every orthonormal set in a Hilbert space can be completed to an orthonormal
basis.
We omit the proof, which is similar to the proof of the corresponding
fact for vector spaces (any linearly independent set can be completed to a
basis). In particular, Zorn's Lemma is needed in the case of non-separable
Hilbert spaces. But see Problem 54 below for a proof of this Proposition
for separable Hilbert spaces, and for a discussion of how this special case is
adequate for the proofs of the results in this section.
Theorem 22 (Pythagorean Theorem). If $v, w \in H$ a Hilbert space, and $\langle v, w \rangle = 0$, then
\[ \|v\|^2 + \|w\|^2 = \|v + w\|^2. \]
Proof. Compute
\[ \|v + w\|^2 = \langle v + w, v + w \rangle = \langle v, v \rangle + 2\langle v, w \rangle + \langle w, w \rangle = \|v\|^2 + \|w\|^2. \]
Lemma 64 (Bessel's Inequality). If $\{e_1, \dots, e_n\}$ is a finite orthonormal set in $H$, then for all $y \in H$,
\[ \|y\|^2 \ge \sum_{i=1}^n |\langle y, e_i \rangle|^2. \]
Proof. Check that for $w = \sum_{i=1}^n \langle y, e_i \rangle e_i$, $\langle y - w, w \rangle = 0$. Then apply the Pythagorean Theorem to $y = (y - w) + w$, and note that $\|w\|^2 = \sum_{i=1}^n |\langle y, e_i \rangle|^2$.
Corollary 65. If $\{e_i\}$ is a countable orthonormal set, then
\[ \|y\|^2 \ge \sum_{i=1}^\infty |\langle y, e_i \rangle|^2. \]
Proof. Use Bessel's Inequality and take limits of partial sums.
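The steps in the proof of Bessel's Inequality can be followed on a small example. The sketch below works in $\mathbb{R}^4$ with the standard inner product; the orthonormal pair $e_1, e_2$ and the vector $y$ are arbitrary choices made for illustration, with entries chosen so that the floating-point arithmetic is exact.

```python
def inner(a, b):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

# An orthonormal set in R^4 (not a basis, so the inequality can be strict).
e1 = [0.5, 0.5, 0.5, 0.5]
e2 = [0.5, -0.5, 0.5, -0.5]
y = [3.0, 1.0, -2.0, 5.0]

coeffs = [inner(y, e) for e in (e1, e2)]                     # the coefficients <y, e_i>
w = [coeffs[0] * a + coeffs[1] * b for a, b in zip(e1, e2)]  # w = sum <y, e_i> e_i
residual = [yi - wi for yi, wi in zip(y, w)]                 # y - w

norm_y_sq = inner(y, y)                  # ||y||^2
bessel_sum = sum(c * c for c in coeffs)  # sum |<y, e_i>|^2
```

Here $\langle y - w, w \rangle = 0$, so the Pythagorean Theorem gives $\|y\|^2 = \|y - w\|^2 + \|w\|^2$ with $\|w\|^2$ equal to `bessel_sum`; in this example $39 \ge 18.5$.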

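Theorem 23. Given a Hilbert space $H$ with an orthonormal basis $\{e_\alpha\}_{\alpha\in A}$, for every element $v \in H$,
\[ v = \sum_{\alpha\in A} \langle v, e_\alpha \rangle e_\alpha, \tag{35} \]
\[ \|v\|^2_H = \langle v, v \rangle = \sum_{\alpha\in A} |\langle v, e_\alpha \rangle|^2, \tag{36} \]
where the (possibly uncountable) sums are defined by using Homework Problem 50 below. Moreover, if there are $v^\alpha \in \mathbb{R}$ so that $\sum_{\alpha\in A} |v^\alpha|^2 < \infty$, then $v = \sum_{\alpha\in A} v^\alpha e_\alpha$ converges to an element of $H$.
Remark. For each $v \in H$, only a countable number of the coefficients $v^\alpha = \langle v, e_\alpha \rangle$ are nonzero. This is due to the following fact: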
Homework Problem 50. Let $A$ be an uncountable set, and for each $\alpha \in A$, let $x_\alpha \ge 0$.
(a) If $A' \subset A$ is a finite set, let $S_{A'} = \sum_{\alpha\in A'} x_\alpha$. Show that if the set
\[ \{ S_{A'} : A' \subset A \text{ is a finite set} \} \]
is bounded, then $x_\alpha = 0$ for all but countably many $\alpha \in A$.
(b) Use part (a) to define
\[ \sum_{\alpha\in A} x_\alpha = \sup\{ S_{A'} : A' \subset A \text{ is a finite set} \} \]
as an element of $[0, \infty]$ for any $x_\alpha \ge 0$. In particular, if the sum is finite, show that
\[ \sum_{\alpha\in A} x_\alpha = \sum_{\alpha\in\tilde A} x_\alpha, \]
where $\tilde A = \{\alpha \in A : x_\alpha > 0\}$ is countable. Show that if $\tilde A$ is infinite, the right-hand sum is the usual sum of a convergent countably infinite series (for any bijection between $\tilde A$ and the natural numbers).
Hint for (a): Each $x_\alpha > 0$ satisfies $x_\alpha \in [2^n, 2^{n+1})$ for some $n \in \mathbb{Z}$. Derive a contradiction if the number of positive $x_\alpha$ is uncountable.
Remark. Note that
\[ \sum_{\alpha\in A} x_\alpha = \int_A x_\alpha\,dc \]
for $dc$ the counting measure on $A$. If $A' \subset A$, then the counting measure $c(A') = |A'|$, the cardinality of $A'$ (and so $c(A') = +\infty$ when $A'$ is infinite).
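Proof of Theorem 23. First assume that $v^\alpha \in \mathbb{R}$ and $\sum_{\alpha\in A} |v^\alpha|^2 < \infty$. Then Homework Problem 50 above shows that all but countably many of the $v^\alpha$ are zero, and so we may write $v$ as a countable sum $\sum_{i=1}^\infty v^i e_i$. Let $v_n = \sum_{i=1}^n v^i e_i$. Then for $n > m$
\[ \|v_n - v_m\|^2 = \left\| \sum_{i=m+1}^n v^i e_i \right\|^2 = \sum_{i=m+1}^n |v^i|^2 \le \sum_{i=m+1}^\infty |v^i|^2. \]
Here, the second equality is by the Pythagorean Theorem. Since the series $\sum_{i=1}^\infty |v^i|^2$ converges, the tail of the series $\sum_{i=m+1}^\infty |v^i|^2$ must go to zero as $m \to \infty$, and thus $\{v_n\}$ is a Cauchy sequence in $H$. Since $H$ is complete, $v_n$ converges to the limit $v \in H$.
Now let $v \in H$ and $v^\alpha = \langle v, e_\alpha \rangle$. Then Bessel's Inequality shows that for all finite subsets $A' \subset A$,
\[ \sum_{\alpha\in A'} |v^\alpha|^2 \le \|v\|^2. \]
So for the collection $\{|v^\alpha|^2\}_{\alpha\in A}$, the set $S$ of finite partial sums is bounded. So Homework Problem 50 shows that all but countably many $v^\alpha = 0$. Denumerate the countable number of nonzero terms as $v^1, v^2, \dots$, and the corresponding elements of the orthonormal basis as $e_1, e_2, \dots$.
Since the sequence $\sum_{i=1}^N |v^i|^2$ is bounded and increasing, it has a finite limit as $N \to \infty$. We have shown above that the series $\sum_{i=1}^\infty v^i e_i$ converges to a limit $v' \in H$. Compute
\[ \langle v - v', e_i \rangle = \lim_{n\to\infty} \left\langle v - \sum_{j=1}^n v^j e_j,\ e_i \right\rangle = v^i - v^j \delta_{ji} = 0. \]
And for any $e_\alpha \notin \{e_1, e_2, \dots\}$, compute
\[ \langle v - v', e_\alpha \rangle = \lim_{n\to\infty} \left\langle v - \sum_{j=1}^n v^j e_j,\ e_\alpha \right\rangle = 0. \]
So for all $e_\alpha$ in the orthonormal basis,
\[ \langle v, e_\alpha \rangle = \langle v', e_\alpha \rangle = \left\langle \sum_{i=1}^\infty v^i e_i,\ e_\alpha \right\rangle = v^\alpha. \]
Now the definition of orthonormal basis shows that there are $\tilde v^\alpha \in \mathbb{R}$ so that $\sum_{\alpha\in A} \tilde v^\alpha e_\alpha = v$. By the same analysis as above, all but countably many $\tilde v^\alpha$ are zero, and we may write $v = \sum_{i=1}^\infty \tilde v^i e_i$. Moreover, as in the previous paragraph,
\[ v^\alpha = \langle v, e_\alpha \rangle = \left\langle \sum_{i=1}^\infty \tilde v^i e_i,\ e_\alpha \right\rangle = \tilde v^\alpha \]
and so (35) is proved.
To prove (36), note that (35) shows that $v = \lim_{n\to\infty} v_n$ in $H$, for $v_n = \sum_{i=1}^n \langle v, e_i \rangle e_i$. Since the norm is continuous, then
\[ \|v\|^2 = \lim_{n\to\infty} \|v_n\|^2 = \lim_{n\to\infty} \sum_{i=1}^n |\langle v, e_i \rangle|^2 = \sum_{i=1}^\infty |\langle v, e_i \rangle|^2 = \sum_{\alpha\in A} |\langle v, e_\alpha \rangle|^2. \]
This concludes the proof of the theorem.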

Corollary 66. If $v = \sum_{i=1}^\infty v^i e_i$, $w = \sum_{i=1}^\infty w^i e_i$ for $\{e_i\}$ an orthonormal basis of a separable Hilbert space, then
\[ \langle v, w \rangle = \sum_{i=1}^\infty v^i w^i. \]
Proof. Compute
\[ \begin{aligned} \|v + w\|^2 &= \|v\|^2 + 2\langle v, w \rangle + \|w\|^2, \\ \langle v, w \rangle &= \tfrac12 \left[ \|v + w\|^2 - \|v\|^2 - \|w\|^2 \right] \\ &= \tfrac12 \sum_{i=1}^\infty \left[ (v^i + w^i)^2 - (v^i)^2 - (w^i)^2 \right] \\ &= \sum_{i=1}^\infty v^i w^i. \end{aligned} \]
Remark. The formula for a complex Hilbert space is
\[ \langle v, w \rangle = \sum_{i=1}^\infty v^i\,\overline{w^i}. \]
Remark. Homework Problem 50 shows that this result still holds for non-separable Hilbert spaces, since the number of basis elements with nonzero coefficients for $v$ and/or $w$ is countable.
Here is another basic result in Hilbert spaces:
Homework Problem 51 (Cauchy-Schwarz Inequality). If $v, w \in H$ a real Hilbert space, then $|\langle v, w \rangle| \le \|v\|\|w\|$, and there is equality if and only if $v$ and $w$ are linearly dependent.
Hint: Use calculus to compute the minimum value of $\|tv + w\|^2$ as a function of $t$, and note the minimum value must be nonnegative.
Remark. The Cauchy-Schwarz Inequality is also true for complex Hilbert spaces, but for the proof, note that the minimum value of $\|te^{i\theta}v + w\|^2$, for $t \in \mathbb{R}$ and $\theta$ so that $e^{i\theta}\langle v, w \rangle = |\langle v, w \rangle|$, is nonnegative.
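The hint to Homework Problem 51 can be explored numerically before attempting the proof. The sketch below evaluates the quadratic $q(t) = \|tv + w\|^2$ for two sample vectors in $\mathbb{R}^3$; the vectors and the names `t_star` and `min_value` are illustrative assumptions, and a numerical check of this kind does not replace the argument asked for in the problem.

```python
def inner(a, b):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

v = [1.0, 2.0, 2.0]
w = [3.0, 0.0, -1.0]

def q(t):
    """q(t) = ||t v + w||^2 = t^2 ||v||^2 + 2 t <v, w> + ||w||^2."""
    u = [t * a + b for a, b in zip(v, w)]
    return inner(u, u)

t_star = -inner(v, w) / inner(v, v)                       # critical point of the quadratic
min_value = inner(w, w) - inner(v, w) ** 2 / inner(v, v)  # value of q at t_star
```

Since `min_value` is a squared norm, it is nonnegative, and rearranging $\|w\|^2 - \langle v, w \rangle^2 / \|v\|^2 \ge 0$ gives the inequality for this pair of vectors.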
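Corollary 67. Any positive definite inner product on a real vector space $V$ produces a norm by the formula $\|v\|^2 = \langle v, v \rangle$.
Proof. The main thing to check is the triangle inequality. Let $v, w \in V$ and note that
\[ \begin{aligned} \|v + w\| &\le \|v\| + \|w\| \\ \iff \|v + w\|^2 &\le \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \\ \iff \|v\|^2 + 2\langle v, w \rangle + \|w\|^2 &\le \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \\ \iff \langle v, w \rangle &\le \|v\|\|w\|. \end{aligned} \]
The last inequality follows from the Cauchy-Schwarz Inequality, whose proof uses only the inner product and not completeness.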
The main results we will use regarding Hilbert spaces involve another topology on the Hilbert space which is different from the topology defined by the metric. The usual metric convergence of sequences is called strong convergence. So a sequence $v_i \to v$ in $H$ strongly if
\[ \|v_i - v\|_H \to 0, \]
and we write $v_i \to v$. On the other hand, a sequence $v_i \in H$ is weakly convergent to a limit $v \in H$ if
\[ \langle v_i, w \rangle \to \langle v, w \rangle \quad \text{for all } w \in H, \]
and we write $v_i \rightharpoonup v$. If $v_i \to v$, then $v_i \rightharpoonup v$ (Homework Problem 52 below), but the converse is not true in general, as the following example shows:
Example 19. Let $H$ be a Hilbert space with a countably infinite orthonormal basis $e_1, e_2, \dots$. Then $e_i \rightharpoonup 0$ in $H$, but $\{e_i\}$ does not converge strongly.
Proof. Let $w \in H$. Then since $\|w\|^2 = \sum_{i=1}^\infty |\langle w, e_i \rangle|^2 < \infty$, we must have each term $|\langle w, e_i \rangle|^2 \to 0$ as $i \to \infty$. This shows $e_i \to 0$ weakly in $H$ as $i \to \infty$.
To show $\{e_i\}$ does not converge strongly, note that
\[ \|e_i - e_j\|_H = \sqrt2 \quad \text{for } i \ne j \]
by the Pythagorean Theorem. Thus $\{e_i\}$ cannot be a Cauchy sequence in $H$, and thus cannot converge strongly.
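Both halves of Example 19 can be seen in a finite truncation. The sketch below is only a model under stated assumptions: it replaces $H$ by $\mathbb{R}^{1000}$ with the standard basis, and it pairs the basis vectors with the sample vector $w = (1, \frac12, \frac13, \dots)$; weak convergence itself is a statement about the infinite-dimensional space.

```python
import math

DIM = 1000  # truncation dimension (assumption: a finite model of a separable Hilbert space)

def basis(i):
    """The i-th standard basis vector of R^DIM (0-indexed)."""
    return [1.0 if k == i else 0.0 for k in range(DIM)]

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(inner(a, a))

w = [1.0 / (k + 1) for k in range(DIM)]  # coefficients 1, 1/2, 1/3, ...

# <e_i, w> tends to 0 along the sequence ...
pairings = [inner(basis(i), w) for i in (0, 9, 99, 999)]

# ... but distinct basis vectors stay at distance sqrt(2) from each other.
distances = [
    norm([a - b for a, b in zip(basis(i), basis(j))])
    for i, j in ((0, 1), (5, 50), (10, 999))
]
```

The entries of `pairings` decrease toward 0, while every entry of `distances` equals $\sqrt2$, so the sequence is not Cauchy.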
Homework Problem 52. Show that if $v_i \to v$ converges strongly in a Hilbert space $H$, then $v_i \rightharpoonup v$ weakly in $H$.
Hint: Use Cauchy-Schwarz.
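Theorem 24. Let $\{v_i\}$ be a sequence in a Hilbert space $H$ satisfying $\|v_i\| \le K$ for a uniform constant $K$. Then there is a weakly convergent subsequence to a limit $v$ which satisfies $\|v\| \le K$. In other words, the closed ball of radius $K$ is (sequentially) compact in the weak topology on $H$.
Proof. Let $\{e_\alpha\}_{\alpha\in A}$ be an orthonormal basis. Problem 50 shows that for each of $\{v_1, v_2, \dots\}$, only countable subsets $A_{v_1}, A_{v_2}, \dots \subset A$ have nonzero coefficients in the orthonormal decomposition. Then the union
\[ \bigcup_{i=1}^\infty A_{v_i} \]
is also countable, and it represents all the basis elements with nonvanishing coefficients for all the $v_i$. Denumerate these elements as $e_1, e_2, \dots$, and write
\[ v_i = \sum_{j=1}^\infty v_i^j e_j. \]
Since there is a constant $K$ so that $\|v_i\| \le K$, then Theorem 23 shows for each $N$
\[ \sum_{j=1}^N |v_i^j|^2 \le K^2. \tag{37} \]
Thus, since the interval $[-K, K] \subset \mathbb{R}$ is compact, there is a subsequence $\{{}_1v_i\}$ of $\{v_i\}$ so that
\[ \lim_{i\to\infty} {}_1v_i^1 = v^1 \in [-K, K]. \]
Now there is a subsequence $\{{}_2v_i\}$ of $\{{}_1v_i\}$ so that
\[ \lim_{i\to\infty} {}_2v_i^1 = v^1, \qquad \lim_{i\to\infty} {}_2v_i^2 = v^2, \qquad |v^1|^2 + |v^2|^2 \le K^2. \]
This is because ${}_1v_i^2 \in [-K, K]$, which is compact, and the bound follows from (37). Recursively, we may define for each $N$ a subsequence $\{{}_Nv_i\}$ and a real number $v^N$ so that
\[ \{{}_Nv_i\} \text{ is a subsequence of } \{{}_{(N-1)}v_i\}, \]
\[ \lim_{i\to\infty} {}_Nv_i^j = v^j \quad \text{for } j = 1, \dots, N, \tag{38} \]
\[ |v^1|^2 + \dots + |v^N|^2 \le K^2. \tag{39} \]
We use a diagonalization procedure to find a weakly convergent subsequence. $\{{}_iv_i\}$ is a subsequence of $\{v_i\}$, and we will show that it converges weakly to $v = \sum_{i=1}^\infty v^i e_i$ and $v \in H$. Note by construction that ${}_iv_i^j \to v^j$ as $i \to \infty$ for each $j = 1, 2, \dots$. This is because, for each $j$, $\{{}_iv_i\}_{i=j}^\infty$ is a subsequence of $\{{}_jv_i\}_{i=1}^\infty$ and by condition (38).
That $v \in H$ follows directly from (39) and Theorem 23. Now we show ${}_iv_i \to v$ weakly in $H$. Let $w \in H$, and let $\epsilon > 0$. Write
\[ \begin{aligned} |\langle {}_iv_i, w \rangle - \langle v, w \rangle| &= |\langle {}_iv_i - v, w \rangle| \\ &\le \sum_{\alpha\in A} |({}_iv_i^\alpha - v^\alpha) w^\alpha| \\ &= \sum_{j=1}^\infty |({}_iv_i^j - v^j) w^j| \\ &= \sum_{j=1}^n |({}_iv_i^j - v^j) w^j| + \sum_{j=n+1}^\infty |({}_iv_i^j - v^j) w^j| \end{aligned} \]
for all $n$. Here the third line follows from the second since ${}_iv_i^\alpha = v^\alpha = 0$ if $e_\alpha \notin \{e_1, e_2, \dots\}$.
Since $\|{}_iv_i\| \le K$ and $\|v\| \le K$, then $\|{}_iv_i - v\| \le 2K$ and Cauchy-Schwarz shows that
\[ \begin{aligned} \sum_{j=n+1}^\infty |({}_iv_i^j - v^j) w^j| &\le \left( \sum_{j=n+1}^\infty |{}_iv_i^j - v^j|^2 \right)^{\frac12} \left( \sum_{j=n+1}^\infty |w^j|^2 \right)^{\frac12} \\ &\le \left( \sum_{j=1}^\infty |{}_iv_i^j - v^j|^2 \right)^{\frac12} \left( \sum_{j=n+1}^\infty |w^j|^2 \right)^{\frac12} \\ &\le 2K \left( \sum_{j=n+1}^\infty |w^j|^2 \right)^{\frac12}. \end{aligned} \]
Since $w \in H$, $\sum_{j=1}^\infty |w^j|^2 \le \sum_{\alpha\in A} |w^\alpha|^2 = \|w\|^2$ converges, and there is an $n$ so that
\[ \left( \sum_{j=n+1}^\infty |w^j|^2 \right)^{\frac12} < \epsilon. \]
Now for $j = 1, 2, \dots, n$, each ${}_iv_i^j \to v^j$ as $i \to \infty$. So we may choose an $I$ so that for all $i \ge I$ and $j = 1, \dots, n$, $|{}_iv_i^j - v^j| < \epsilon'$ for $(|w^1| + \dots + |w^n|)\epsilon' < \epsilon$. Therefore, for $i \ge I$,
\[ \begin{aligned} |\langle {}_iv_i, w \rangle - \langle v, w \rangle| &\le \sum_{j=1}^n |({}_iv_i^j - v^j) w^j| + \sum_{j=n+1}^\infty |({}_iv_i^j - v^j) w^j| \\ &\le \epsilon' (|w^1| + \dots + |w^n|) + 2K\epsilon \\ &< \epsilon + 2K\epsilon. \end{aligned} \]
Since $K$ is independent of $i$, $\langle {}_iv_i, w \rangle \to \langle v, w \rangle$ as $i \to \infty$ and thus ${}_iv_i \rightharpoonup v$ weakly in $H$.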
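Theorem 25. Let $v_i \rightharpoonup v$ in a Hilbert space. Then
\[ \|v\| \le \liminf_{i\to\infty} \|v_i\|. \]
In other words, the Hilbert space norm is lower semicontinuous under weak convergence.
Proof. The proof is to translate the current problem into Fatou's Lemma. Let $\{e_\alpha\}_{\alpha\in A}$ be an orthonormal basis of our Hilbert space $H$. Then put the counting measure $c$ on the index set $A$. Let $f : A \to [0, \infty)$, $f : \alpha \mapsto f^\alpha$ be a nonnegative real-valued function on $A$. Then it is straightforward to check that
\[ \int_A f\,dc = \sum_{\alpha\in A} f^\alpha, \]
and thus each sum may be thought of as an integral with respect to the counting measure.
In our case, if
\[ v_i = \sum_{\alpha\in A} v_i^\alpha e_\alpha, \qquad v = \sum_{\alpha\in A} v^\alpha e_\alpha, \]
we may view $v_i$ as a function from $A \to \mathbb{R}$ by $v_i : \alpha \mapsto v_i^\alpha$. (The same holds for $v$.) Theorem 23 shows
\[ \|v_i\|^2 = \sum_{\alpha\in A} |v_i^\alpha|^2, \qquad \|v\|^2 = \sum_{\alpha\in A} |v^\alpha|^2. \tag{40} \]
Now since $v_i \rightharpoonup v$, then
\[ v_i^\alpha = \langle v_i, e_\alpha \rangle \to \langle v, e_\alpha \rangle = v^\alpha \]
as $i \to \infty$ for all $\alpha$. Thus with respect to the counting measure on $A$, $v_i \to v$ everywhere on $A$. Since each term in the sums in (40) is nonnegative, and for each $\alpha$, $|v_i^\alpha|^2 \to |v^\alpha|^2$, the limit $v_i \to v$ satisfies the hypotheses of Fatou's Lemma with respect to the counting measure, and so
\[ \begin{aligned} \liminf_{i\to\infty} \|v_i\|^2 &= \liminf_{i\to\infty} \sum_{\alpha\in A} |v_i^\alpha|^2 \\ &= \liminf_{i\to\infty} \int_A |v_i|^2\,dc \\ &\ge \int_A |v|^2\,dc = \sum_{\alpha\in A} |v^\alpha|^2 = \|v\|^2. \end{aligned} \]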
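The inequality in Theorem 25 can be strict. The sketch below illustrates this with the sequence $v_i = v + e_i$, again in a finite truncation: the dimension `DIM`, the vector $v$, and the helper names are assumptions made for the example, and the coordinatewise convergence shown is only the finite-dimensional shadow of weak convergence.

```python
DIM = 50  # truncation dimension (assumption: finite model of a separable Hilbert space)

def basis(i):
    """The i-th standard basis vector of R^DIM (0-indexed)."""
    return [1.0 if k == i else 0.0 for k in range(DIM)]

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

v = [1.0, 2.0] + [0.0] * (DIM - 2)  # ||v||^2 = 5

def v_seq(i):
    """v_i = v + e_i, taken for i >= 2 so that e_i is orthogonal to v."""
    return [a + b for a, b in zip(v, basis(i))]

norms_sq = [inner(v_seq(i), v_seq(i)) for i in range(2, DIM)]  # ||v_i||^2 = 6 for every i
fixed_component = [v_seq(i)[5] for i in range(2, DIM)]         # sixth coordinate of v_i
```

Each fixed coordinate of $v_i$ eventually agrees with that of $v$, yet $\|v_i\|^2 = 6$ for all $i$ while $\|v\|^2 = 5$, so $\|v\| < \liminf \|v_i\|$ in this model.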
Note that the above proofs depend heavily on the existence of an orthonormal basis, Proposition 63, which we did not prove. The following problem outlines a standard procedure for getting around the proof of Proposition 63, by proving the existence of an orthonormal basis for any Hilbert space with a countable spanning set. A subset $S$ of a Hilbert space $H$ is said to be a spanning set if the (strong) closure of finite linear combinations of elements in $S$ is equal to all of $H$. For example, in the proof of Theorem 24, we need only deal with the closure $H'$ of the span of $\{v_1, v_2, \dots\}$. The existence of an orthonormal basis of $H'$ is sufficient for the proof of Theorem 24.
Homework Problem 53. Show that any strongly closed linear subspace of a Hilbert space $H$ is again a Hilbert space (with the same inner product).
We say a subset $\{v_\alpha\}_{\alpha\in A} \subset H$ is linearly independent (in the sense of Banach spaces) if any convergent sum
\[ \sum_{\alpha\in A} b^\alpha v_\alpha = 0 \]
implies $b^\alpha = 0$ for all $\alpha \in A$. Note in particular, the implication holds for any finite sum (and thus this notion of linear independence in the Banach-space sense implies linear independence in the usual vector-space sense).
Homework Problem 54 (Gram-Schmidt Orthogonalization).
(a) Let $H$ be a Hilbert space with a spanning set $\{v_1, v_2, \dots\}$
which is finite or countably infinite. Show that there is a subset of
$\{v_1, v_2, \dots\}$ which is a linearly independent spanning set of $H$.
(b) Given a linearly independent spanning set $\{v_1, v_2, \dots\}$ of a Hilbert
space $H$, define $f_i$ and $e_i$ recursively by
\[
\begin{aligned}
f_1 &= v_1, & e_1 &= \frac{f_1}{\|f_1\|}, \\
f_2 &= v_2 - \langle v_2, e_1\rangle e_1, & e_2 &= \frac{f_2}{\|f_2\|}, \\
f_n &= v_n - \sum_{i=1}^{n-1} \langle v_n, e_i\rangle e_i, & e_n &= \frac{f_n}{\|f_n\|}.
\end{aligned}
\]
Show that this recursive definition can be carried out (in particular,
show that $f_n \neq 0$). Then show that $\{e_1, e_2, \dots\}$ is an orthonormal
basis for $H$. In other words, show that $\langle e_i, e_j\rangle = \delta_{ij}$ and that any $v$ in
$H$ can be written as a convergent sum $v = \sum_{i=1}^{\infty} v^i e_i$.
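The recursion in part (b) can be tried out numerically in the finite-dimensional case. The following is only a sketch under stated assumptions: vectors are plain Python lists in $\mathbb R^m$ with the standard inner product, and the names `inner`, `gram_schmidt` and the tolerance `tol` are illustrative choices, not part of the notes.

```python
import math

def inner(u, v):
    """Standard real inner product on R^m (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a linearly independent list of vectors.

    Implements f_n = v_n - sum_i <v_n, e_i> e_i and e_n = f_n / ||f_n||.
    Raises ValueError if some f_n is numerically zero, which signals
    that the input was not linearly independent.
    """
    basis = []
    for v in vectors:
        f = list(v)
        for e in basis:
            c = inner(v, e)  # coefficient <v_n, e_i>
            f = [fj - c * ej for fj, ej in zip(f, e)]
        norm = math.sqrt(inner(f, f))
        if norm < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([fj / norm for fj in f])
    return basis

# A small linearly independent set in R^3.
es = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

The check $f_n \neq 0$ in the problem corresponds to the `norm < tol` test here: it fails exactly when $v_n$ lies in the span of the earlier vectors.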
The use of the previous problem isn't strictly necessary for our purposes,
as $L^2_1(S^1, \mathbb R)$ is separable (though we won't prove that it is).
Recall that for every Banach space $B$, the dual Banach space $B^*$ is
the space of all continuous linear functionals $\lambda : B \to \mathbb R$, with norm given by
\[ \|\lambda\|_{B^*} = \sup_{x\in B\setminus\{0\}} \frac{|\lambda(x)|}{\|x\|_B}. \]
Also recall that for any $p \in (1, \infty)$, the dual Banach space of $L^p(\mathbb R^n)$ is
$L^q(\mathbb R^n)$ for $p^{-1} + q^{-1} = 1$. Thus $L^2(\mathbb R^n)$ is dual to itself. This fact is true for
all Hilbert spaces, as the following problem shows in the separable case.
Homework Problem 55. Let $H$ be a separable real Hilbert space. Show
that the dual Banach space $H^*$ is naturally equal to $H$. In particular, the
inner product provides a map from $H \to H^*$ by
\[ x \mapsto \lambda_x = \langle \cdot, x\rangle. \]
Show that this map preserves the norm, is one-to-one and onto.
Hint: The most significant step is showing the map is onto. First reduce
to the case $\lambda \neq 0$. Show that $L = \lambda^{-1}(0)$ is also a separable Hilbert space,
and let $\{e_i\}$ be an orthonormal basis for $L$. Let $y \notin L$ and use a version of
Gram-Schmidt to show we may assume $y \perp L$. Construct $x$ from $y$ and $\lambda$.
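A sequence $v_i$ in a Banach space $B$ converges to $v \in B$ in the weak*
topology if for every $\lambda \in B^*$, $\lambda(v_i) \to \lambda(v)$. (This notion is more commonly
called weak convergence in $B$. It agrees with weak* convergence when $B$ is
reflexive, i.e. when $B$ is naturally the dual of $B^*$, as is the case for every
Hilbert space.) The previous problem shows that
Theorem 24 is a special case of the following more general theorem about
Banach spaces:
Theorem 26 (Banach-Alaoglu). In a reflexive Banach space $B$, the unit ball
$\{x \in B : \|x\|_B \le 1\}$ is compact in the weak* topology. In other words, if $x_i$ is a
sequence in the unit ball, then there is a subsequence $x_{i_j}$ and a limit $x \in B$
so that for all $\lambda \in B^*$, $\lambda(x_{i_j}) \to \lambda(x)$ as $j \to \infty$.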
Example 20 (Fourier series). In the following theorem, we compute perhaps
the easiest nontrivial example of an orthonormal basis on an
infinite-dimensional Hilbert space. $L^2(S^1, \mathbb C)$ is a complex Hilbert space with inner
product given by
\[ \langle f, g\rangle = \int_{S^1} f \bar g \, dt. \]
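Theorem 27. The complex exponential functions
\[ \{e^{2\pi i k t} : k \in \mathbb Z\} \]
form an orthonormal basis of $L^2(S^1, \mathbb C)$.
Proof. It is clear that each $e^{2\pi i k t} \in L^2(S^1, \mathbb C)$, and we compute
\[
\langle e^{2\pi i k t}, e^{2\pi i \ell t}\rangle
 = \int_{S^1} e^{2\pi i k t}\, \overline{e^{2\pi i \ell t}} \, dt
 = \int_0^1 e^{2\pi i (k-\ell) t} \, dt
 = \begin{cases}
   \left. \dfrac{e^{2\pi i (k-\ell) t}}{2\pi i (k-\ell)} \right|_0^1 = 0 & \text{if } k \neq \ell, \\[2ex]
   \displaystyle\int_0^1 dt = 1 & \text{if } k = \ell.
   \end{cases}
\]
Therefore, $\{e^{2\pi i k t}\}_{k=-\infty}^{\infty}$ forms an orthonormal set in $L^2(S^1, \mathbb C)$.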
We must show that every element $f \in L^2(S^1, \mathbb C)$ can be written as a
Fourier series
\[ f = \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t}, \]
with the convergence in the $L^2$ sense.
First, we address this problem for smooth functions $f \in C^\infty(S^1, \mathbb C)$. Recall
that $C^\infty(S^1, \mathbb C)$ is dense in $L^2(S^1, \mathbb C)$ (which may be proved by mollifying
$L^2$ functions).
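Lemma 68. If $f \in C^\infty(S^1, \mathbb C)$, then for every polynomial $P = P(k)$,
\[ \lim_{k\to\infty} P(k) \langle f, e^{2\pi i k t}\rangle = \lim_{k\to-\infty} P(k) \langle f, e^{2\pi i k t}\rangle = 0. \]
Proof. We use the following claim: For any $L^2$ function $f$, the Fourier
coefficients $\langle f, e^{2\pi i k t}\rangle \to 0$ as $k \to \pm\infty$. This follows from Bessel's Inequality
\[ \sum_{k=-\infty}^{\infty} |\langle f, e^{2\pi i k t}\rangle|^2 \le \|f\|^2_{L^2} < \infty. \]
If $f$ is smooth, then $\dot f$ is also smooth (and thus is in $L^2$), and integration
by parts gives us
\[
\begin{aligned}
\langle \dot f, e^{2\pi i k t}\rangle &= \int_0^1 \dot f \, e^{-2\pi i k t} \, dt \\
 &= -\int_0^1 f \, \frac{d}{dt}\bigl(e^{-2\pi i k t}\bigr) \, dt + \Bigl. f(t) e^{-2\pi i k t} \Bigr|_0^1 \\
 &= 2\pi i k \langle f, e^{2\pi i k t}\rangle + 0.
\end{aligned}
\]
(The boundary term vanishes because $f(t) e^{-2\pi i k t}$ is periodic with period 1.)
Now by the claim applied to $\dot f$, $\langle \dot f, e^{2\pi i k t}\rangle = 2\pi i k \langle f, e^{2\pi i k t}\rangle \to 0$ as $k \to \pm\infty$. Now we may
apply induction to show that
\[ \lim_{k\to\pm\infty} k^n \langle f, e^{2\pi i k t}\rangle = 0 \quad \text{for each } n = 0, 1, 2, \dots. \]
Thus any polynomial $P(k)$ times the Fourier coefficients also goes to zero as
$k \to \pm\infty$.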
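The identity $\langle \dot f, e^{2\pi i k t}\rangle = 2\pi i k \langle f, e^{2\pi i k t}\rangle$ and the rapid decay of the coefficients can be observed numerically. This is a rough sketch, not part of the proof: the test function $f(t) = e^{\cos 2\pi t}$, the sample count `N`, and the helper `coeff` are all choices made here for illustration, and the integral is approximated by an equally spaced Riemann sum.

```python
import cmath
import math

N = 256  # number of sample points on [0, 1); an arbitrary choice

def coeff(func, k):
    """Approximate <func, e^{2 pi i k t}> = int_0^1 func(t) e^{-2 pi i k t} dt
    by an N-point Riemann sum, which is very accurate for smooth periodic func."""
    return sum(func(j / N) * cmath.exp(-2j * math.pi * k * j / N)
               for j in range(N)) / N

def f(t):
    """A smooth function on S^1, chosen only as an example."""
    return math.exp(math.cos(2 * math.pi * t))

def f_dot(t):
    """The derivative of f, computed by hand."""
    return -2 * math.pi * math.sin(2 * math.pi * t) * f(t)

for k in range(0, 13, 3):
    lhs = coeff(f_dot, k)
    rhs = 2j * math.pi * k * coeff(f, k)
    # Columns: k, |<f, e_k>|, and the discrepancy in the integration-by-parts identity.
    print(k, abs(coeff(f, k)), abs(lhs - rhs))
```

The second column shrinks much faster than any power of $k$ grows, which is the content of the lemma for this particular $f$.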
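The previous lemma shows that for any smooth function $f \in C^\infty(S^1, \mathbb C)$,
the Fourier series
\[ g(t) = \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t} \]
converges uniformly: This is because there is a constant $C > 0$ so that
\[ |\langle f, e^{2\pi i k t}\rangle| \le \frac{C}{1 + k^2} \]
(why?), which shows that the $C^0$ norm of the Fourier series satisfies
\[ \Bigl\| \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t} \Bigr\|_{C^0} \le \sum_{k=-\infty}^{\infty} \frac{C}{1 + k^2} < \infty. \]
So the sup norm of the tails of the series
\[ \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t} \]
must go to zero, as they are bounded by the tails of an absolutely convergent
series.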
Therefore, uniform convergence implies that $g(t)$ is continuous (and thus
is in $L^2$ as well; why?). (In fact, $g(t)$ is smooth; see Homework Problem 58
below.) If we let
\[ h(t) = f(t) - g(t) = f(t) - \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t}, \]
then by the same techniques in the proof of Theorem 23 above, we see that
$\langle h, e^{2\pi i k t}\rangle = 0$ for all $k \in \mathbb Z$.
The following lemma shows that $h = 0$:
Lemma 69. Given a function $h \in C^0(S^1, \mathbb C)$ all of whose Fourier coefficients
$\langle h, e^{2\pi i k t}\rangle = 0$, then $h = 0$ identically.
Proof. We prove by contradiction. If $h$ is not identically zero, then there is
a point $\tau \in S^1$ at which $h(\tau) \neq 0$. Then we know that at least one of the
following is true:
\[ \mathrm{Re}\, h(\tau) > 0, \quad \mathrm{Re}\, h(\tau) < 0, \quad \mathrm{Im}\, h(\tau) > 0, \quad \mathrm{Im}\, h(\tau) < 0. \]
Assume that $\mathrm{Re}\, h(\tau) > 0$ (the other cases are similar), and let $\rho(t) = \mathrm{Re}\, h(t)$.
Since $\rho$ is continuous, there is a $\delta > 0$ so that
\[ \rho(t) > \tfrac12 \rho(\tau) > 0 \quad \text{if } t \in (\tau - \delta, \tau + \delta). \]
We will construct an approximate bump function to prove a contradiction.
For $n$ a positive integer, define
\[ b_n(t) = \bigl[\tfrac12 + \tfrac12 \cos 2\pi(t - \tau)\bigr]^n = \bigl(\tfrac12 + \tfrac14 e^{-2\pi i \tau} e^{2\pi i t} + \tfrac14 e^{2\pi i \tau} e^{-2\pi i t}\bigr)^n. \]
It is obvious that $b_n(t)$ is real-valued, periodic with period 1 (and so defines
a function on $S^1$), and is equal to a finite Fourier series. Moreover, note that
\[ \tfrac12 + \tfrac12 \cos 2\pi(t - \tau) \in [0, 1] \]
always, and is equal to 1 only if $t = \tau$ in $S^1$. Thus the powers $b_n(t) \to 0$ as
$n \to \infty$ away from $t = \tau$, while $b_n(\tau) = 1$ for all $n$. This is the property that makes
$b_n$ similar to bump functions centered around $t = \tau$.
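The concentration of $b_n$ near $t = \tau$ is easy to see numerically. The sketch below simply evaluates the formula for $b_n$; the center `tau = 0.3` and the sample points are arbitrary values picked for illustration.

```python
import math

def b(n, t, tau):
    """b_n(t) = [1/2 + 1/2 cos(2 pi (t - tau))]^n, the approximate bump at tau."""
    return (0.5 + 0.5 * math.cos(2 * math.pi * (t - tau))) ** n

tau = 0.3  # hypothetical center of the bump
for n in (1, 10, 100, 1000):
    # Values at the center, at distance 0.1 from it, and at the antipodal point.
    print(n, b(n, tau, tau), b(n, tau + 0.1, tau), b(n, tau + 0.5, tau))
```

The value at the center stays equal to 1, while the value at any fixed distance from $\tau$ decays geometrically in $n$.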
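Now compute
\[
\begin{aligned}
|\mathrm{Re}\, \langle h, b_n\rangle| &= \left| \mathrm{Re} \int_{S^1} h(t) b_n(t) \, dt \right| \\
 &= \left| \int_{S^1} \rho(t) b_n(t) \, dt \right| \\
 &= \left| \int_{\tau-\delta}^{\tau+\delta} \rho(t) b_n(t) \, dt + \int_{S^1 \setminus [\tau-\delta, \tau+\delta]} \rho(t) b_n(t) \, dt \right| \\
 &\ge \left| \int_{\tau-\delta}^{\tau+\delta} \rho(t) b_n(t) \, dt \right| - \left| \int_{S^1 \setminus [\tau-\delta, \tau+\delta]} \rho(t) b_n(t) \, dt \right| \\
 &> \left| \int_{\tau-\delta/2}^{\tau+\delta/2} \rho(t) b_n(t) \, dt \right| - \left| \int_{S^1 \setminus [\tau-\delta, \tau+\delta]} \rho(t) b_n(t) \, dt \right|.
\end{aligned}
\]
(Note the last inequality follows since the integrand is positive on $(\tau - \delta, \tau + \delta)$.) Also, we
have the following bounds:
\[
\begin{aligned}
t \in [\tau - \tfrac{\delta}{2}, \tau + \tfrac{\delta}{2}] &\implies \rho(t) > \tfrac12 \rho(\tau) > 0, \quad b_n(t) \ge (\tfrac12 + \tfrac12 \cos \pi\delta)^n, \\
t \notin [\tau - \delta, \tau + \delta] &\implies |\rho(t)| < C, \quad b_n(t) \le (\tfrac12 + \tfrac12 \cos 2\pi\delta)^n,
\end{aligned}
\]
for some constant $C$ (since $\rho$ is continuous). The bounds on $b_n$ follow by
examining the graph of the cosine function. The key point is that
\[ \tfrac12 + \tfrac12 \cos \pi\delta > \tfrac12 + \tfrac12 \cos 2\pi\delta > 0. \tag{41} \]
Now compute
\[
\begin{aligned}
|\mathrm{Re}\, \langle h, b_n\rangle| &> \left| \int_{\tau-\delta/2}^{\tau+\delta/2} \rho(t) b_n(t) \, dt \right| - \left| \int_{S^1 \setminus [\tau-\delta, \tau+\delta]} \rho(t) b_n(t) \, dt \right| \\
 &\ge \tfrac12 \rho(\tau)\, \delta\, (\tfrac12 + \tfrac12 \cos \pi\delta)^n - (1 - 2\delta)\, C\, (\tfrac12 + \tfrac12 \cos 2\pi\delta)^n.
\end{aligned}
\]
Now (41) shows the ratio of the first term over the second goes to $+\infty$ as
$n \to \infty$ and thus there is an $n$ so that $|\mathrm{Re}\, \langle h, b_n\rangle| > 0$.
Now the contradiction is this: Since $b_n$ is a finite Fourier series, $\langle h, b_n\rangle$ is
a finite linear combination of Fourier coefficients $\langle h, e^{2\pi i k t}\rangle$, which we assume
are all zero. Thus $\langle h, b_n\rangle = 0$, and we have a contradiction.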
Since $h$ is the difference between the smooth $f$ and its Fourier series, we
have shown
Lemma 70. Let $f \in C^\infty(S^1, \mathbb C)$. Then
\[ f(t) = \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle \, e^{2\pi i k t}, \]
and the series converges uniformly in $t$.
Uniform convergence on $S^1$ implies $L^2$ convergence (since $S^1$ has finite
measure; why does this imply $L^2$ convergence?). Therefore, as in Theorem
23, we have
\[ \|f\|^2_{L^2} = \int_{S^1} |f|^2 \, dt = \sum_{k=-\infty}^{\infty} |\langle f, e^{2\pi i k t}\rangle|^2, \]
for $f \in C^\infty(S^1, \mathbb C)$.
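The norm identity for smooth $f$ can be sanity-checked numerically. The following sketch makes several assumptions of its own: the smooth function $f(t) = e^{\cos 2\pi t}$, the sample count `N`, and the truncation of the sum to $|k| \le 20$ are illustrative choices, and both sides are approximated by Riemann sums.

```python
import cmath
import math

N = 512  # sample points on [0, 1); an arbitrary choice

def f(t):
    """A smooth function on S^1, chosen only as an example."""
    return math.exp(math.cos(2 * math.pi * t))

def coeff(k):
    """Riemann-sum approximation of the Fourier coefficient <f, e^{2 pi i k t}>."""
    return sum(f(j / N) * cmath.exp(-2j * math.pi * k * j / N)
               for j in range(N)) / N

# Left side: the squared L^2 norm of f.
norm_sq = sum(f(j / N) ** 2 for j in range(N)) / N
# Right side: the sum of squared coefficients, truncated to |k| <= 20.
coeff_sq = sum(abs(coeff(k)) ** 2 for k in range(-20, 21))
print(norm_sq, coeff_sq)
```

Because the coefficients of a smooth function decay faster than any power of $k$, the truncated sum already agrees with the norm to many digits.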

To complete the proof of Theorem 27, first define the Hilbert space $\ell^2 =
L^2(\mathbb Z, \mathbb C)$ for the counting measure on $\mathbb Z$. In other words, $\ell^2$ is the set of all
complex-valued integer-indexed sequences $\{v^k\}_{k\in\mathbb Z}$ so that
\[ \sum_{k=-\infty}^{\infty} |v^k|^2 < \infty. \]
Then we have the operation $\mathcal F$ of defining Fourier series:
\[ \mathcal F : L^2(S^1, \mathbb C) \to \ell^2, \qquad \mathcal F : f \mapsto \hat f, \quad \hat f^k = \langle f, e^{2\pi i k t}\rangle. \]
Moreover, on the dense subset $C^\infty(S^1, \mathbb C) \subset L^2(S^1, \mathbb C)$, $\mathcal F$ is an isometry.
Bessel's Inequality and the fact that $\{e^{2\pi i k t}\}$ is an orthonormal set in $L^2(S^1, \mathbb C)$
show that for all $f \in L^2(S^1, \mathbb C)$,
\[ \|f\|^2_{L^2} \ge \sum_{k=-\infty}^{\infty} |\langle f, e^{2\pi i k t}\rangle|^2 = \|\mathcal F(f)\|^2_{\ell^2}. \]
Therefore $\mathcal F$ is a bounded linear map from $L^2(S^1, \mathbb C)$ to $\ell^2$. A linear map $L$
from a Banach space $B_1$ to another Banach space $B_2$ is called bounded if
there is a positive constant $C$ so that for all $v \in B_1$,
\[ \|L(v)\|_{B_2} \le C \|v\|_{B_1}. \]
A linear map between Banach spaces is bounded if and only if it is continuous
(see Problem 56 below). Therefore, $\mathcal F$ is continuous.
Also, define the linear map $\mathcal G : \ell^2 \to L^2(S^1, \mathbb C)$ by
\[ \mathcal G(v) = \sum_{k=-\infty}^{\infty} v^k e^{2\pi i k t}. \]
The proof of Theorem 23 shows that $\mathcal G$ preserves the norms. In other words,
\[ \|\mathcal G(v)\|_{L^2(S^1, \mathbb C)} = \|v\|_{\ell^2}. \]
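Let $f \in L^2(S^1, \mathbb C)$. Since smooth functions are dense in $L^2$, there is a
sequence $f_n \to f$ in $L^2$ for $f_n \in C^\infty(S^1, \mathbb C)$. Since $\mathcal F$ is continuous, then
$\mathcal F(f_n) \to \mathcal F(f)$ in $\ell^2$ as $n \to \infty$. In other words,
\[
\begin{aligned}
0 &= \lim_{n\to\infty} \|\mathcal F(f_n) - \mathcal F(f)\|^2_{\ell^2} \\
 &= \lim_{n\to\infty} \|\hat f_n - \hat f\|^2_{\ell^2} \\
 &= \lim_{n\to\infty} \|\mathcal G(\hat f_n) - \mathcal G(\hat f)\|^2_{L^2}.
\end{aligned}
\]
Now recall that
\[ \mathcal G(\hat f_n) = \sum_{k=-\infty}^{\infty} \langle f_n, e^{2\pi i k t}\rangle e^{2\pi i k t} = f_n \]
since $f_n$ is smooth. Therefore,
\[ 0 = \lim_{n\to\infty} \|\mathcal G(\hat f_n) - \mathcal G(\hat f)\|_{L^2} = \lim_{n\to\infty} \|f_n - \mathcal G(\hat f)\|_{L^2}. \]
So in $L^2$,
\[ f_n \to \mathcal G(\hat f) = \sum_{k=-\infty}^{\infty} \hat f^k e^{2\pi i k t}. \]
Since we assumed $f_n \to f$ in $L^2$, this shows
\[ f = \sum_{k=-\infty}^{\infty} \hat f^k e^{2\pi i k t} \]
in $L^2$, and since the sum converges in $L^2$, finite linear combinations of the
orthonormal set $\{e^{2\pi i k t}\}$ are dense in $L^2(S^1, \mathbb C)$, and $\{e^{2\pi i k t}\}$ is an orthonormal
basis of $L^2(S^1, \mathbb C)$. So Theorem 27 is proved.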
Homework Problem 56. Let $L : B_1 \to B_2$ be a linear map between Banach
spaces. Show that $L$ is bounded if and only if $L$ is continuous.
Homework Problem 57. Using the notation of the proof of Theorem 27
above, show that $\mathcal F : L^2(S^1, \mathbb C) \to \ell^2$ is an isometry and that $\mathcal F \circ \mathcal G$ is the
identity map.
Homework Problem 58. Let $f^k \in \mathbb C$ for all $k \in \mathbb Z$, and assume for all
$n \ge 0$ that
\[ \lim_{k\to\infty} k^n f^k = \lim_{k\to-\infty} k^n f^k = 0. \]
Then the Fourier series
\[ f(t) = \sum_{k=-\infty}^{\infty} f^k e^{2\pi i k t} \]
converges uniformly to a smooth function from $S^1 \to \mathbb C$.
Hint: The key is being able to change the order of the derivative $d/dt$
with the summation $\sum_{k=-\infty}^{\infty}$. Recall that the summation $\sum_{k=-\infty}^{\infty}$ can be
interpreted as an integral over $\mathbb Z$ with respect to the counting measure $d\mu$. Thus
for all $t \in S^1$,
\[ f(t) = \int_{\mathbb Z} f^k e^{2\pi i k t} \, d\mu(k). \]
To show that $f(t) \in C^1(S^1, \mathbb C)$, show that there is a constant $C > 0$ so
that
\[ |f^k| \le \frac{C}{1 + |k|^3}. \]
Mimic the proof of Proposition 11: Show that the absolute value of the
difference quotient
\[ \frac{f^k e^{2\pi i k (t+h)} - f^k e^{2\pi i k t}}{h} \]
is uniformly $\le \dfrac{C'(|k| + 1)}{1 + |k|^3}$ for a constant $C'$. (Apply the Mean Value Theorem
to the real and imaginary parts of $e^{2\pi i k t}$ separately.) Show that the series
\[ \sum_{k=-\infty}^{\infty} \frac{C'(|k| + 1)}{1 + |k|^3} \]
converges by using the procedure in the proof of Lemma 70 above.
Use induction to show $f(t)$ is smooth.
4.6 Compact maps and the Ascoli-Arzelà Theorem
Recall that every element of $L^2_1(S^1) = L^2_1(S^1, \mathbb R)$ has a continuous
representative (Proposition 59). So there is a natural linear map $L^2_1(S^1) \to C^0(S^1)$.
In this section, we show that this map is compact. A linear map between
Banach spaces $\Phi : B_1 \to B_2$ is called compact if the closure of the image of
the unit ball in $B_1$ is strongly compact in $B_2$. In other words, if $v_i \in B_1$
satisfy $\|v_i\|_{B_1} \le 1$, then $\{\Phi(v_i)\}$ has a strongly convergent subsequence in
$B_2$: i.e. there is a subsequence $\{v_{i_j}\}$ and an element $w \in B_2$ so that
\[ \lim_{j\to\infty} \|\Phi(v_{i_j}) - w\|_{B_2} = 0. \]
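The basic observation which allows us to conclude that the natural
inclusion map $L^2_1(S^1) \to C^0(S^1)$ is compact comes from the proof of Proposition
59. If $f \in L^2_1(S^1)$, then
\[
\begin{aligned}
|f(t_2) - f(t_1)| &= \left| \int_{t_1}^{t_2} \dot f(t) \, dt \right| \\
 &\le \left( \int_{t_1}^{t_2} |\dot f(t)|^2 \, dt \right)^{1/2} \left( \int_{t_1}^{t_2} dt \right)^{1/2} \\
 &\le \left( \int_{S^1} |\dot f(t)|^2 \, dt \right)^{1/2} (t_2 - t_1)^{1/2} \\
 &\le \|f\|_{L^2_1} (t_2 - t_1)^{1/2}.
\end{aligned}
\]
(Note that the first equality was justified in the proof of Proposition 59. The
second line is the Cauchy-Schwarz inequality.)
Therefore, $f$ is continuous. But moreover, for every $\epsilon > 0$, we may choose
\[ \delta = \left( \frac{\epsilon}{\|f\|_{L^2_1}} \right)^2 \]
so that
\[ |t_2 - t_1| < \delta \implies |f(t_2) - f(t_1)| < \epsilon. \]
So the modulus of continuity does not depend on $t$, and depends only on
the norm $\|f\|_{L^2_1}$, and on no other information about $f$.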
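As a concrete illustration of the choice of $\delta$, the following sketch checks the implication $|t_2 - t_1| < \delta \implies |f(t_2) - f(t_1)| < \epsilon$ on a grid. The example $f(t) = \sin 2\pi t$, for which $\|f\|^2_{L^2_1} = \tfrac12 + 2\pi^2$ by direct integration, and the helper names `delta` and `modulus_holds` are assumptions made here for illustration only.

```python
import math

def f(t):
    """Example element of L^2_1(S^1): f(t) = sin(2 pi t)."""
    return math.sin(2 * math.pi * t)

# ||f||_{L^2_1}^2 = int f^2 dt + int (f')^2 dt = 1/2 + 2 pi^2 for this f.
NORM = math.sqrt(0.5 + 2 * math.pi ** 2)

def delta(eps, norm=NORM):
    """The modulus of continuity delta = (eps / ||f||)^2 from the text."""
    return (eps / norm) ** 2

def modulus_holds(eps, samples=1000):
    """Check on a grid that |t2 - t1| < delta implies |f(t2) - f(t1)| < eps."""
    d = delta(eps)
    pts = [j / samples for j in range(samples + 1)]
    for i, t1 in enumerate(pts):
        for t2 in pts[i + 1:]:
            if t2 - t1 >= d:
                break  # later grid points are even farther from t1
            if abs(f(t2) - f(t1)) >= eps:
                return False
    return True

print(delta(0.5), modulus_holds(0.5))
```

Note that `delta` uses only the norm of $f$, which is the point of the paragraph above: the same $\delta$ works for every function in a ball of $L^2_1(S^1)$.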
A family $\Omega$ of functions from a metric space $X$ to a metric
space $Y$ is called equicontinuous at a point $x \in X$ if for all $\epsilon > 0$, there is a
$\delta > 0$ so that
\[ d_X(x, x') < \delta \implies d_Y(f(x), f(x')) < \epsilon \]
for all $f \in \Omega$. The point is that $\delta$ does not depend on $f \in \Omega$. Such a family
of functions is called equicontinuous on $X$ if it is equicontinuous at each
point $x \in X$.
Note that if $\Omega$ is equicontinuous on $X$ then each $f \in \Omega$ is continuous.
The computations above show
Lemma 71. The unit ball in $L^2_1(S^1)$ is equicontinuous on $S^1$.
Theorem 28 (Ascoli-Arzelà). Let $X$ be a compact metric space, and let
$\Omega$ be an equicontinuous family of real-valued functions on $X$. Assume there
is a uniform $C$ so that $|f(x)| \le C$ for all $f \in \Omega$ and $x \in X$. Then each
sequence $\{f_n\} \subset \Omega$ has a uniformly convergent subsequence.
Proof. We'll prove the theorem with the help of a few lemmas.
Lemma 72. Any compact metric space has a countable dense subset.
Proof. Let $X$ be the compact metric space. For $\epsilon = 1/n$, obviously
\[ X = \bigcup_{x\in X} B_\epsilon(x), \qquad B_\epsilon(x) = \{y \in X : d_X(x, y) < \epsilon\}. \]
For each positive integer $n$, this open cover of $X$ has a finite subcover
consisting of balls of radius $1/n$ centered at points $x_{n,1}, \dots, x_{n,m_n}$. The union
\[ \bigcup_{n=1}^{\infty} \{x_{n,1}, \dots, x_{n,m_n}\} \]
is a countable dense subset of $X$.
Lemma 73. Let $P$ be a countable set, and let $f_n : P \to \mathbb R$ be a sequence
of functions. Assume there is a constant $C$ so that $|f_n(p)| \le C$ for all
$n = 1, 2, \dots$ and all $p \in P$. Then there is a subsequence of $\{f_n\}$ which
converges everywhere on $P$ to a function $f : P \to \mathbb R$.
Proof. See Problem 59 below.
Lemma 74. Let $\{f_n\}$ be an equicontinuous sequence of mappings from a
compact metric space $X$ to $\mathbb R$. If the sequence $\{f_n(x)\}$ converges for each $x$
in a dense subset of $X$, then $\{f_n\}$ converges uniformly on $X$ to a continuous
limit function.
Proof. First we show that $f_n(x)$ converges pointwise everywhere to a function
$f(x)$. Let $y \in X$ and let $\epsilon > 0$. Then by equicontinuity, there is a $\delta > 0$ so
that
\[ d_X(x, y) < \delta \implies |f_n(x) - f_n(y)| < \epsilon. \]
(Note $\delta$ is independent of $n$.) Since $f_n$ converges on a dense subset of $X$,
there is an $x \in B_\delta(y)$ for which $f_n(x)$ converges. Therefore, $\{f_n(x)\}$ is a
Cauchy sequence in $\mathbb R$, and so there is an $N$ so that
\[ n, m \ge N \implies |f_n(x) - f_m(x)| < \epsilon. \]
Therefore, for $n, m \ge N$,
\[ |f_n(y) - f_m(y)| \le |f_n(y) - f_n(x)| + |f_n(x) - f_m(x)| + |f_m(x) - f_m(y)| < 3\epsilon. \]
Therefore, $\{f_n(y)\}$ is a Cauchy sequence in the complete metric space $\mathbb R$, and
so it converges to a limit which we call $f(y)$.
Let $y \in X$ and $\epsilon > 0$. Then equicontinuity shows that there is a $\delta > 0$ so
that
\[ x \in B_\delta(y) \implies |f_n(x) - f_n(y)| < \epsilon \tag{42} \]
for all $n$. By letting $n \to \infty$, we also have
\[ x \in B_\delta(y) \implies |f(x) - f(y)| \le \epsilon. \tag{43} \]
These $B_\delta(y)$ form an open cover of $X$, and so there is a finite subcover
\[ X = \bigcup_{i=1}^{k} B_{\delta_i}(y_i) \]
since $X$ is compact. Choose $N$ large enough so that
\[ n \ge N \implies |f_n(y_i) - f(y_i)| < \epsilon, \quad i = 1, \dots, k. \tag{44} \]
Then for $x \in X$, $x \in B_{\delta_i}(y_i)$ for some $y_i$, and so (42), (43) and (44) show
\[ |f_n(x) - f(x)| \le |f_n(x) - f_n(y_i)| + |f_n(y_i) - f(y_i)| + |f(y_i) - f(x)| < 3\epsilon. \]
Since the same $N$ works for all $x \in X$, the convergence is uniform.
$f$, as the uniform limit of continuous functions, is continuous.
This completes the proof of Theorem 28.
Homework Problem 59. Let $P$ be a countable set, and let $f_n : P \to \mathbb R$
be a sequence of functions. Assume that for each $p \in P$, there is a constant
$C = C_p$ so that $|f_n(p)| \le C$ for all $n = 1, 2, \dots$. Show there is a subsequence
of $\{f_n\}$ which converges everywhere on $P$ to a function $f : P \to \mathbb R$.
Hint: Use a diagonalization argument.
An important version of the Ascoli-Arzelà Theorem is the following:
Theorem 29. Let $X$ be a metric space so that there is a countable number
of open subsets $O_i$ satisfying
\[ X = \bigcup_{i=1}^{\infty} O_i, \qquad O_i \subset\subset O_{i+1}, \tag{45} \]
and let $\Omega$ be an equicontinuous set of real-valued functions on $X$. If for a
sequence of functions $\{f_n\} \subset \Omega$, there is a uniform $C$ so that $|f_n(x)| \le C$
for all $n$ and all $x \in X$, then there is a subsequence of $\{f_n\}$ which converges
pointwise to a function $f : X \to \mathbb R$, and the convergence is uniform on every
compact subset of $X$.
Remark. Recall $A \subset\subset B$ for $A$ a subspace of a topological space $B$ means
that the closure $\bar A$ relative to $B$ is compact.
Remark. A sequence of functions converging uniformly on compact subsets
of $X$ is said to converge normally on $X$.
We relegate the proof of Theorem 29 to the following problem:
Homework Problem 60. Prove Theorem 29.
Hint: Consider $X$, $O_i$ as in the previous theorem. Note we may apply
Theorem 28 to each of the compact sets $\bar O_i$. Use a diagonalization argument
to find a uniformly convergent subsequence on each $\bar O_i$. Show that every
compact subset of $X$ is contained in some $O_i$.
Remark. For every smooth manifold $X$ (which is Hausdorff and sigma-compact),
there is a countable collection of open sets $O_i$ satisfying condition (45). See
the notes on "The Real Definition of a Smooth Manifold."
The Ascoli-Arzelà Theorem provides the following.
Proposition 75. If $C > 0$ and $\{f_n\}$ is a sequence of functions in $L^2_1(S^1, \mathbb R)$
which satisfy $\|f_n\|_{L^2_1} \le C$, then there is a uniformly convergent subsequence.
Proof. This follows from the Ascoli-Arzelà Theorem and Lemma 71 above,
once we know in addition that there is a constant $K$ so that $|f_n| \le K$
pointwise. First of all, note that
\[ |f_n(t_2) - f_n(t_1)| \le \|f_n\|_{L^2_1} |t_2 - t_1|^{1/2} \le C |t_2 - t_1|^{1/2} \]
shows that for every $t_2, t_1 \in S^1$,
\[ |f_n(t_2) - f_n(t_1)| \le C \]
since we may choose $t_2, t_1 \in [0, 1)$. Since
\[ \left( \int_0^1 |f_n|^2 \, dt \right)^{1/2} = \|f_n\|_{L^2} \le \|f_n\|_{L^2_1} \le C, \]
there must be a $t_1 \in S^1$ so that $|f_n(t_1)| \le C$. Then for any $t_2 \in S^1$,
\[ |f_n(t_2)| \le |f_n(t_1)| + |f_n(t_2) - f_n(t_1)| \le 2C. \]
Thus the hypotheses of the Ascoli-Arzelà Theorem are satisfied.
Corollary 76. The inclusion $L^2_1(S^1, \mathbb R) \hookrightarrow C^0(S^1, \mathbb R)$ is compact.
Proof. Take $C = 1$ in Proposition 75.
Corollary 77. Let $C > 0$ and let $X \subset \mathbb R^N$ be a compact manifold, and let
$\gamma_n \in L^2_1(S^1, X) \subset L^2_1(S^1, \mathbb R^N)$ satisfy $E(\gamma_n) \le C$. Then there is a uniformly
convergent subsequence of $\{\gamma_n\}$, and the limit is a continuous function $\gamma :
S^1 \to X$.
Proof. Recall
\[ \|\gamma_n\|^2_{L^2_1(S^1, \mathbb R^N)} = \|\gamma_n\|^2_{L^2(S^1, \mathbb R^N)} + \|\dot\gamma_n\|^2_{L^2(S^1, \mathbb R^N)} = \|\gamma_n\|^2_{L^2(S^1, \mathbb R^N)} + E(\gamma_n). \]
Since $\gamma_n(S^1) \subset X$ and $X$ is compact, there is a constant $K$ so that $|\gamma_n(t)| \le K$
for all $n$ and $t$. Therefore,
\[ \|\gamma_n\|^2_{L^2(S^1, \mathbb R^N)} \le \int_{S^1} K^2 \, dt = K^2, \]
and moreover,
\[ \|\gamma_n\|^2_{L^2_1(S^1, \mathbb R^N)} \le C + K^2 \]
independently of $n$. So each component function $\gamma_n^a$ for $a = 1, \dots, N$ satisfies
\[ \|\gamma_n^a\|_{L^2_1(S^1, \mathbb R)} \le \sqrt{C + K^2}. \]
Then Proposition 75 shows that there is a subsequence $\{{}_1\gamma_n\}$ of $\{\gamma_n\}$ so
that the component ${}_1\gamma_n^1$ converges uniformly. Let $\{{}_2\gamma_n\}$ be a subsequence of
$\{{}_1\gamma_n\}$ so that ${}_2\gamma_n^1$ and ${}_2\gamma_n^2$ converge uniformly. By induction, as in the proof
of Theorem 24, there is a subsequence $\{{}_N\gamma_n\}$ of $\{\gamma_n\}$ so that ${}_N\gamma_n^a$ converges
uniformly for $a = 1, \dots, N$. Since this subsequence converges uniformly on
each component in $\mathbb R^N$, ${}_N\gamma_n$ converges uniformly as $n \to \infty$ to a limit $\gamma$ in
$C^0(S^1, \mathbb R^N)$.
Since $X$ is closed in $\mathbb R^N$ and the subsequence converges pointwise, the
limit $\gamma : S^1 \to X$.
It is also useful to define the Hölder norm for functions $f : S^1 \to \mathbb R$
\[ \|f\|_{C^{0,\frac12}(S^1)} = \|f\|_{C^0} + \sup_{t_1 \neq t_2} \frac{|f(t_1) - f(t_2)|}{d_{S^1}(t_1, t_2)^{1/2}}. \]
(Here we define
\[ d_{S^1}(t_1, t_2) = \inf_{k\in\mathbb Z} |(t_1 + k) - t_2|. \]
This definition is necessary, since we identify the real numbers $t$ and $t + k$ on
the circle $S^1$. For example, $d_{S^1}(0, 0.9) = |1 - 0.9| = 0.1$.) It is easy to check
that this defines a norm. Define the space $C^{0,\frac12}(S^1)$ to be all $f$ from $S^1 \to \mathbb R$
so that $\|f\|_{C^{0,\frac12}(S^1)} < \infty$.
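The circle distance and the Hölder quotient are simple to compute. The sketch below is only an illustration: the function names `d_circle` and `holder_half_seminorm` are made up here, and the supremum is approximated by a maximum over a finite grid, so it gives a lower estimate of the true seminorm.

```python
import math

def d_circle(t1, t2):
    """Distance on S^1 = R/Z: the infimum over integers k of |(t1 + k) - t2|."""
    x = (t1 - t2) % 1.0  # representative of t1 - t2 in [0, 1)
    return min(x, 1.0 - x)

def holder_half_seminorm(f, samples=200):
    """Largest sampled value of |f(t1) - f(t2)| / d(t1, t2)^(1/2) over a grid.

    This only approximates the supremum in the Hoelder norm from below.
    """
    pts = [j / samples for j in range(samples)]
    return max(abs(f(s) - f(t)) / math.sqrt(d_circle(s, t))
               for i, s in enumerate(pts) for t in pts[i + 1:])

print(d_circle(0.0, 0.9))
print(holder_half_seminorm(lambda t: math.sin(2 * math.pi * t)))
```

Taking the minimum of `x` and `1.0 - x` is what makes points near 0 and near 1 close to each other, as the example $d_{S^1}(0, 0.9) = 0.1$ requires.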
$C^{0,\frac12}(S^1)$ is a Banach space (Proposition 78 below), and the calculations
above show that there is a natural continuous inclusion map from $L^2_1(S^1) \to
C^{0,\frac12}(S^1)$. Moreover, the natural inclusion map from $C^{0,\frac12}(S^1) \to C^0(S^1)$ is
compact. Then Problem 63 below shows that the composite inclusion from
$L^2_1(S^1) \to C^0(S^1)$ is compact.
In general, for any metric space $X$, $\alpha \in (0, 1]$, we can define
\[
\begin{aligned}
C^{0,\alpha}(X) &= \{f : X \to \mathbb R : \|f\|_{C^{0,\alpha}} < \infty\}, \\
\|f\|_{C^{0,\alpha}} &= \sup_{x\in X} |f(x)| + \sup_{x \neq y \in X} \frac{|f(x) - f(y)|}{d_X(x, y)^\alpha}.
\end{aligned}
\]
These are called Hölder spaces and Hölder norms respectively.
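Example 21. This is the standard example for $X = [-1, 1] \subset \mathbb R$. $f(x) = |x|^\alpha$
is in $C^{0,\alpha}(X)$.
Proof. It clearly suffices to bound the difference quotient
\[ q(x, y) = \frac{\bigl| |x|^\alpha - |y|^\alpha \bigr|}{|x - y|^\alpha}, \qquad x \neq y \in [-1, 1]. \]
We will show that this is always $\le 1$. First, simplify to the case $x$ and $y$
have the same sign, since if they have opposite signs, $q(x, y) < q(x, -y)$.
We may assume $x$ and $y$ have the same sign. By possibly interchanging
$(x, y) \to (-x, -y)$ and switching $x$ and $y$, we may assume $x > y \ge 0$. Then
write
\[ q(x, y) = \frac{x^\alpha - y^\alpha}{(x - y)^\alpha} = \frac{1 - \lambda^\alpha}{(1 - \lambda)^\alpha}, \qquad \lambda = \frac{y}{x} \in [0, 1). \]
Then we compute
\[ \frac{dq}{d\lambda} = \frac{\alpha (1 - \lambda^{\alpha-1})}{(1 - \lambda)^{\alpha+1}} \le 0. \]
Therefore, the max of $q(\lambda)$ is achieved at $\lambda = 0$, $q = 1$.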
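The bound $q(x, y) \le 1$ can be spot-checked on a grid. This is a rough numerical sketch rather than a proof: the grid size and the exponents tried are arbitrary, and the helper names `q` and `max_quotient` are introduced here for illustration.

```python
def q(x, y, alpha):
    """Difference quotient | |x|^alpha - |y|^alpha | / |x - y|^alpha for x != y."""
    return abs(abs(x) ** alpha - abs(y) ** alpha) / abs(x - y) ** alpha

def max_quotient(alpha, samples=201):
    """Largest value of q over a uniform grid on [-1, 1] (the grid contains 0)."""
    pts = [-1 + 2 * j / (samples - 1) for j in range(samples)]
    return max(q(x, y, alpha) for x in pts for y in pts if x != y)

for alpha in (0.25, 0.5, 1.0):
    print(alpha, max_quotient(alpha))
```

The maximum is attained at pairs with $y = 0$, matching the computation $\lambda = 0$, $q = 1$ in the proof.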
We also say $f(x) = |x|^\alpha$ is locally $C^{0,\alpha}$ on $\mathbb R$, since the Hölder norm of
$f$ is finite on any compact subset of $\mathbb R$.
In the case $\alpha = 1$, note that a function in $C^{0,1}$ is simply a $C^0$ function
which is globally Lipschitz.
Homework Problem 61.
(a) Show that the inclusion $C^1(S^1) \hookrightarrow C^0(S^1)$ is compact (Hint: use the
Mean Value Theorem).
(b) Show that every bounded sequence $f_n \in C^1(\mathbb R)$ (i.e., there is a uniform
$C$ so that $\|f_n\|_{C^1} \le C$ for all $n$) has a subsequence which converges
uniformly on compact subsets of $\mathbb R$ to a continuous limit $f$. Hint: It is
easy to show that $\mathbb R$ satisfies condition (45).
(c) Find an example of a bounded sequence of functions $f_n \in C^1(\mathbb R)$ which
does not have a convergent subsequence in $C^0(\mathbb R)$. Thus the inclusion
$C^1(\mathbb R) \hookrightarrow C^0(\mathbb R)$ is not compact. (Hint: How is this situation
different from parts (a) and (b)? You must use the noncompactness of $\mathbb R$.
Therefore, the interesting behavior of the $f_n$ should be moving off to
infinity.)
It is also useful to apply Hölder norms to the derivatives of a function.
In particular, on $\mathbb R^n$, we may define for $k$ a positive integer, $\alpha \in (0, 1]$,
\[ \|f\|_{C^{k,\alpha}} = \sum_{|\beta| \le k} \|\partial^\beta f\|_{C^{0,\alpha}}, \]
where, as in (3) above, we use the multi-index notation $\beta$ to denote all the
partial derivatives of $f$ of order $\le k$.
Remark. It is not useful to define $C^{0,\alpha}$ for $\alpha > 1$, as the following problem
shows.
Homework Problem 62. Let $\alpha > 1$, and let $f : \mathbb R \to \mathbb R$. Assume that
\[ \sup_{x \neq y} \frac{|f(x) - f(y)|}{|x - y|^\alpha} = C < \infty. \]
Show that $f$ is a constant function.
Hint: Use the definition of the derivative to show that $f'(x) = 0$ for all
$x$.
Proposition 78. Let $X$ be a metric space and $\alpha \in (0, 1]$. Then $C^{0,\alpha}(X)$ is
a Banach space.
Proof. It is straightforward to show that $\|\cdot\|_{C^{0,\alpha}}$ is a norm. As always, we
must check completeness carefully.
Let $\{f_n\}$ be a Cauchy sequence in $C^{0,\alpha}(X)$. We want to show that there
is a limit $f \in C^{0,\alpha}$ and that $\|f_n - f\|_{C^{0,\alpha}} \to 0$ as $n \to \infty$.
First of all, it is obvious from the definition of the Hölder norm that
$\{f_n\}$ is a Cauchy sequence in $C^0(X)$, and since $C^0$ is complete, there is a
continuous limit function $f$, and $f_n \to f$ uniformly.
Now we show $f \in C^{0,\alpha}$. Let $\epsilon > 0$. Then there is an $N$ so that
\[ m, n \ge N \implies \|f_m - f_n\|_{C^{0,\alpha}} < \epsilon. \tag{46} \]
Then for all $m \ge N$, $\|f_m\|_{C^{0,\alpha}} < \|f_N\|_{C^{0,\alpha}} + \epsilon = C$. By the definition of the
Hölder norm, for all $x, y \in X$,
\[ |f_m(x) - f_m(y)| \le C \, d_X(x, y)^\alpha. \]
Taking $m \to \infty$ shows that $f \in C^{0,\alpha}$. Now (46) also implies that for all
$x, y \in X$,
\[ |f_m(x) - f_n(x) - f_m(y) + f_n(y)| \le \epsilon \, d_X(x, y)^\alpha, \]
and so again let $m \to \infty$ to show for all $x, y \in X$, and for all $n \ge N$,
\[ |f(x) - f_n(x) - f(y) + f_n(y)| \le \epsilon \, d_X(x, y)^\alpha. \]
Since we already know $f_n \to f$ in $C^0$, this is exactly the additional statement
we need to show $f_n \to f$ in $C^{0,\alpha}$.
Remark. If $X$ is a smooth manifold, then it is possible (by using an atlas and
a subordinate partition of unity) to define $C^{k,\alpha}(X)$. If $X$ is compact, then
$C^{k,\alpha}(X) \hookrightarrow C^k(X)$ is a compact inclusion.
Homework Problem 63. Let $\Phi : B_1 \to B_2$ and $\Psi : B_2 \to B_3$ be linear maps
between Banach spaces.
(a) Assume $\Phi$ is continuous and $\Psi$ is compact. Then $\Psi \circ \Phi$ is compact.
(b) Assume $\Phi$ is compact and $\Psi$ is continuous. Then $\Psi \circ \Phi$ is compact.
Homework Problem 64. Let $\Phi : B_1 \to B_2$ be a compact linear map of
Banach spaces. Show $\Phi$ is continuous.
Hint: It suffices to show $\Phi$ is bounded. For $B_1(0)$ the unit ball in $B_1$,
consider the image of the compact set
\[ \overline{\Phi(B_1(0))} \subset B_2 \]
under the norm map $\|\cdot\|_{B_2} : B_2 \to \mathbb R$.
Remark. The Hölder spaces $C^{k,\alpha}$, for $\alpha \in (0, 1)$, and the Sobolev spaces $L^p_k$,
for $p \in (1, \infty)$, play a very important role in the theory of partial differential
equations. In particular, they behave much better than the more obvious
spaces $C^k$. Our simple proofs that $L^2_1(S^1)$ embeds continuously in $C^{0,\frac12}(S^1)$
and compactly in $C^0(S^1)$ constitute some of the easiest cases of the Sobolev
embedding theorem. The Sobolev embedding theorem allows us to embed certain
Sobolev spaces, in which derivatives are defined only in the sense of
distributions, into Hölder and $C^k$ spaces, in which we may take derivatives in the
usual sense. These spaces are crucial to the regularity theory of solutions to
PDEs.
4.7 Convergence
Now we have finally developed the tools needed to solve our problem. Recall
Problem: Let $X \subset \mathbb R^N$ be a smooth compact manifold equipped with the
Riemannian metric pulled back from the Euclidean metric on $\mathbb R^N$. Let $\mathcal C$ be
the class of loops $\gamma : S^1 \to X$ in a free homotopy class on $X$ and in $L^2_1(S^1, X)$.
Find a loop of least energy in $\mathcal C$.
Our strategy is as follows: Define
\[ L = \inf_{\gamma\in\mathcal C} E(\gamma). \]
Since $E(\gamma) \ge 0$ always, $L \ge 0$. Now there is a sequence of $\gamma_i \in \mathcal C$ so that
$E(\gamma_i) \to L$. We want to find a subsequence $\gamma_{i_j}$ which converges to a limit
$\gamma \in \mathcal C$ so that $E(\gamma) = L$. Moreover, we expect $\gamma$ to be a geodesic: it should
satisfy the geodesic equations not just in the sense of distributions, but also
in the usual sense. Therefore, by the theory of ODEs, $\gamma$ should be smooth.
First of all, we show the existence of a limit $\gamma$. Corollary 77 shows that
there is a subsequence of $\gamma_i$ which converges uniformly to a continuous $\gamma :
S^1 \to X$. (For simplicity, we just refer to this subsequence as $\gamma_i$ again.) Since
$\gamma_i \to \gamma$ uniformly, Corollary 57 shows that $\gamma$ is in the same free homotopy
class. Thus we have
Proposition 79. There is a subsequence of $\gamma_i$ which converges uniformly to
a limit $\gamma$ in the same free homotopy class.
Proposition 80. Let $X \subset \mathbb R^N$ be a compact manifold. If $\gamma_i : S^1 \to X$ satisfy
$E(\gamma_i) \to L$, then there is a constant $K$ independent of $i$ so that $\|\gamma_i\|_{L^2_1(\mathbb R^N)} \le K$.
Proof. Since $X$ is compact, there is a uniform $C$ so that $\|\gamma_i\|_{L^2(S^1, \mathbb R^N)} \le C$.
Since $E(\gamma_i) \to L$, $\{E(\gamma_i)\}$ is a bounded sequence. Therefore,
\[ \|\gamma_i\|^2_{L^2_1(S^1, \mathbb R^N)} = E(\gamma_i) + \|\gamma_i\|^2_{L^2(S^1, \mathbb R^N)} \]
is bounded independent of $i$.
This proposition shows there is a further subsequence of $\gamma_i$ which
converges weakly to a $\tilde\gamma \in L^2_1(S^1, \mathbb R^N)$ by Theorem 24. (Explanatory note: a
further subsequence means that we take a subsequence not just of the
original $\gamma_i$, but of the subsequence taken in the paragraph above Proposition 79.)
We still refer to this further subsequence as $\gamma_i$. Then Theorem 25 shows that
the Hilbert space norm
\[ \|\tilde\gamma\|_{L^2_1(S^1, \mathbb R^N)} \le \liminf_{i\to\infty} \|\gamma_i\|_{L^2_1(S^1, \mathbb R^N)}. \]
Note a potential problem: We have taken a subsequence of the original
$\gamma_i$ to converge uniformly to a continuous $\gamma$, and then we take a further
subsequence to converge weakly in $L^2_1$ to $\tilde\gamma$ in $L^2_1$. We must show $\gamma$ and $\tilde\gamma$ are the
same. This will follow from the fact that they must be equal in the sense of
distributions, and thus are equal almost everywhere (Proposition 58). Since
both $\gamma$ and $\tilde\gamma$ are continuous, they must be equal everywhere. In particular,
we require
Proposition 81. $\gamma = \tilde\gamma$ in the sense of distributions.
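Proof. It suffices to show each component $\gamma^a = \tilde\gamma^a$ in the sense of
distributions for $a = 1, \dots, N$.
For each $a = 1, \dots, N$, $\gamma_i^a \to \gamma^a$ uniformly as $i \to \infty$. So if $\phi \in \mathcal D(S^1)$ is
a smooth test function, then
\[ |\gamma_i^a(\phi) - \gamma^a(\phi)| = \left| \int_{S^1} (\gamma_i^a - \gamma^a) \phi \, dt \right| \le \|\phi\|_{L^1} \|\gamma_i^a - \gamma^a\|_{C^0}, \]
which goes to 0 as $i \to \infty$ by uniform convergence. Therefore,
\[ \gamma^a(\phi) = \lim_{i\to\infty} \gamma_i^a(\phi). \tag{47} \]
Also, $\gamma_i^a \to \tilde\gamma^a$ weakly in $L^2_1(S^1)$. Let $\psi \in \mathcal D(S^1) \subset L^2_1(S^1)$ be a test
function. Let $f_i = \gamma_i^a - \tilde\gamma^a$. Then $f_i \to 0$ weakly in $L^2_1$. Compute
\[ \langle f_i, \psi\rangle_{L^2_1} = \int_{S^1} (f_i \psi + \dot f_i \dot\psi) \, dt = \int_{S^1} (f_i \psi - f_i \ddot\psi) \, dt = f_i(\psi - \ddot\psi), \]
the last term denoting $f_i$ acting in the sense of distributions. Therefore, for
all $\psi \in \mathcal D(S^1)$,
\[ \lim_{i\to\infty} f_i(\psi - \ddot\psi) = \lim_{i\to\infty} \langle f_i, \psi\rangle_{L^2_1} = 0. \]
By Proposition 82 below, for every $\phi \in \mathcal D(S^1)$, there is a $\psi \in \mathcal D(S^1)$ so that
$\phi = \psi - \ddot\psi$. Therefore, for all $\phi \in \mathcal D(S^1)$,
\[ \lim_{i\to\infty} f_i(\phi) = 0 \iff \lim_{i\to\infty} \gamma_i^a(\phi) = \tilde\gamma^a(\phi). \]
Therefore, by (47) above, $\gamma = \tilde\gamma$ in the sense of distributions.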
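Proposition 82. For every $\phi \in \mathcal D(S^1)$, there is a $\psi \in \mathcal D(S^1)$ so that $\phi =
\psi - \ddot\psi$.
Proof. Recall $\mathcal D(S^1) = C^\infty(S^1, \mathbb R)$. Moreover, Lemma 70 and Problem 58
show that
\[ C^\infty(S^1, \mathbb C) = \left\{ \sum_{k=-\infty}^{\infty} f^k e^{2\pi i k t} : \lim_{k\to\pm\infty} f^k |k|^n = 0 \text{ for } n = 1, 2, \dots \right\}. \tag{48} \]
The convergence of each such series is uniform, and the sum commutes with
the derivative $d/dt$.
Therefore, if
\[ \psi = \sum_{k=-\infty}^{\infty} \psi^k e^{2\pi i k t} \in C^\infty(S^1, \mathbb C), \]
then
\[
\begin{aligned}
\ddot\psi &= \sum_{k=-\infty}^{\infty} (-4\pi^2 k^2) \psi^k e^{2\pi i k t}, \\
\psi - \ddot\psi &= \sum_{k=-\infty}^{\infty} (1 + 4\pi^2 k^2) \psi^k e^{2\pi i k t}.
\end{aligned}
\]
So if
\[ \phi = \sum_{k=-\infty}^{\infty} \phi^k e^{2\pi i k t} \in C^\infty(S^1, \mathbb C), \]
then we may let
\[ \psi = \sum_{k=-\infty}^{\infty} \frac{\phi^k}{1 + 4\pi^2 k^2} e^{2\pi i k t}, \]
so that $\psi - \ddot\psi = \phi$.
We must prove that $\psi \in C^\infty(S^1, \mathbb C)$. Let $n$ be a positive integer. Then
\[ \lim_{k\to\pm\infty} \psi^k |k|^n = \lim_{k\to\pm\infty} \frac{\phi^k |k|^n}{1 + 4\pi^2 k^2} = 0, \]
because $|\phi^k| |k|^{n-2} \to 0$. So $\psi$ is smooth. (Note that we went from a $|k|^n$
limit to a $|k|^{n-2}$ limit. This is because the differential equation is of order
two.)
We have considered $\mathbb C$-valued functions so far. It is easy to check that
$\phi \in C^\infty(S^1, \mathbb R)$ implies $\psi \in C^\infty(S^1, \mathbb R)$.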
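The proof reduces the differential equation $\psi - \ddot\psi = \phi$ to one algebraic equation per Fourier coefficient, which is straightforward to carry out for a finite Fourier series. The following is a minimal sketch under assumptions made here: coefficients are stored in a Python dictionary indexed by $k$, the test function `phi_hat` is a hypothetical example, and the helper names are not from the notes.

```python
import cmath
import math

def solve_coefficients(phi_hat):
    """Given Fourier coefficients {k: phi^k}, return those of the solution psi
    of psi - psi'' = phi, namely psi^k = phi^k / (1 + 4 pi^2 k^2)."""
    return {k: c / (1 + 4 * math.pi ** 2 * k ** 2) for k, c in phi_hat.items()}

def second_derivative(coeffs):
    """Differentiate twice term by term: each coefficient is multiplied by -4 pi^2 k^2."""
    return {k: -4 * math.pi ** 2 * k ** 2 * c for k, c in coeffs.items()}

def evaluate(coeffs, t):
    """Sum the finite Fourier series sum_k c^k e^{2 pi i k t} at the point t."""
    return sum(c * cmath.exp(2j * math.pi * k * t) for k, c in coeffs.items())

# Hypothetical test function phi(t) = 3 + cos(2 pi t) - 2 sin(4 pi t).
phi_hat = {0: 3.0, 1: 0.5, -1: 0.5, 2: 1j, -2: -1j}
psi_hat = solve_coefficients(phi_hat)

t = 0.1
residual = (evaluate(psi_hat, t)
            - evaluate(second_derivative(psi_hat), t)
            - evaluate(phi_hat, t))
print(abs(residual))
```

Since $1 + 4\pi^2 k^2$ is real and even in $k$, the symmetry $\phi^{-k} = \overline{\phi^k}$ of a real-valued $\phi$ is inherited by $\psi$, which is the last remark of the proof.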
Remark. The previous proposition uses a standard technique for solving
constant-coefficient differential equations on $S^1$. The differential equation
then breaks into an algebraic equation for each Fourier coefficient, each of
which can typically be solved.
This also works for functions on the $n$-torus $(S^1)^n$. In this case, the Fourier
series is summed over $\mathbb Z^n$, and we can solve constant-coefficient PDEs. Also,
on $\mathbb R^n$, the Fourier transform turns constant-coefficient PDEs into algebraic
equations of the Fourier transform variable.
Homework Problem 65. $L^2_2(S^1, \mathbb C)$ is the complex Hilbert space defined by
the inner product
\[ \langle f, g\rangle_{L^2_2} = \int_{S^1} (f \bar g + \dot f \bar{\dot g} + \ddot f \bar{\ddot g}) \, dt. \]
The elements of $L^2_2(S^1, \mathbb C)$ are all complex-valued functions $f$ on $S^1$ which are
$L^2$ and whose first and second derivatives $\dot f$ and $\ddot f$ in the sense of distributions
are also $L^2$ functions. (You may assume $L^2_2(S^1, \mathbb C)$ is a Hilbert space, as in
Proposition 62.)
Show that if $f_n \to f$ converges weakly in $L^2_2(S^1, \mathbb C)$, then for all $\phi \in \mathcal D(S^1)$,
$f_n(\phi) \to f(\phi)$.
Hint: Mimic the proofs of Propositions 81 and 82.
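To recap, so far we have a sequence of loops $\gamma_i$ in $\mathcal C$ so that
\[
\begin{aligned}
\lim_{i\to\infty} E(\gamma_i) &= L = \inf_{\gamma\in\mathcal C} E(\gamma), \\
\lim_{i\to\infty} \gamma_i &= \gamma \quad \text{uniformly and weakly in } L^2_1(S^1, \mathbb R^N).
\end{aligned}
\]
Moreover, $\gamma \in \mathcal C$, the same free homotopy class of $L^2_1$ loops containing the $\gamma_i$.
Since $\gamma_i \to \gamma$ uniformly, we have
\[ \|\gamma_i - \gamma\|^2_{L^2(S^1, \mathbb R^N)} = \int_{S^1} |\gamma_i - \gamma|^2 \, dt \le \sup_t |\gamma_i - \gamma|^2 \to 0, \]
and so $\gamma_i \to \gamma$ in $L^2$.
Now Theorem 25 shows that
\[
\begin{aligned}
\|\gamma\|^2_{L^2_1(S^1, \mathbb R^N)} &\le \liminf_{i\to\infty} \|\gamma_i\|^2_{L^2_1(S^1, \mathbb R^N)} \\
 &= \liminf_{i\to\infty} \bigl[ E(\gamma_i) + \|\gamma_i\|^2_{L^2(S^1, \mathbb R^N)} \bigr] \\
 &= L + \|\gamma\|^2_{L^2(S^1, \mathbb R^N)}, \\
E(\gamma) &= \|\gamma\|^2_{L^2_1(S^1, \mathbb R^N)} - \|\gamma\|^2_{L^2(S^1, \mathbb R^N)} \\
 &\le L.
\end{aligned}
\]
Since $L$ is the infimum of the energy of all loops in $\mathcal C$, and $\gamma \in \mathcal C$, then
$E(\gamma) \ge L$ as well. So $E(\gamma) = L$. Thus we have proved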
Theorem 30. Let $X$ be a compact Riemannian manifold without boundary.
Then in each free homotopy class of loops, there is a $\gamma \in L^2_1(S^1, X)$ which
minimizes the energy.
Corollary 83. This minimizing $\gamma$ satisfies the geodesic equations (in local
coordinates on $X$) in the form
\[ 2 (g_{ik} \dot\gamma^i)^{\cdot} - g_{ij,k} \dot\gamma^i \dot\gamma^j = 0 \]
in the sense of distributions.
Proof. See Proposition 61.
Note in the proof of Theorem 30 above, we implicitly use the fact that
the map from $L^2_1(S^1) \to L^2(S^1)$ is compact, by using the inclusions
\[ L^2_1(S^1) \hookrightarrow C^0(S^1) \hookrightarrow L^2(S^1), \]
the first of which is compact and the second of which is continuous. The
following problem gives a direct proof.
Homework Problem 66. Show directly that the inclusion $L^2_1(S^1, \mathbb C) \hookrightarrow
L^2(S^1, \mathbb C)$ is a compact linear map.
Hints:
(a) Use the characterization of $L^2_1(S^1, \mathbb C)$ in terms of Fourier series from
Proposition 87 below.
(b) If $\|f_i(t)\|_{L^2_1} \le 1$, then use a diagonalization argument to produce a
subsequence $\{f_{i_j}\}$ so that for each $k \in \mathbb Z$, the Fourier coefficients $\hat f_{i_j}^k$
converge to constants $g^k \in \mathbb C$ as $j \to \infty$.
(c) For all $\epsilon > 0$, show that there is an $N$ so that
\[ \sum_{|k| \ge N} |\hat f^k|^2 < \epsilon \]
for all $f$ such that $\|f\|_{L^2_1} \le 1$.
(d) Conclude that the subsequence $\{f_{i_j}\}$ converges strongly to
\[ \sum_{k\in\mathbb Z} g^k e^{2\pi i k t} \]
in $L^2(S^1, \mathbb C)$.
Remark. The proof presented in the previous problem works for Sobolev
spaces in higher dimensions (for functions on the $n$-dimensional torus $S^1 \times \cdots \times
S^1$), whereas the use of the Sobolev embedding theorem for the compact
inclusion $L^2_1(S^1, \mathbb C) \hookrightarrow C^0(S^1, \mathbb C)$ is only available in dimension $n = 1$.
4.8 Regularity
Now we show that $\gamma$ is smooth. First of all, note that $\Gamma^k_{ij}$ is smooth in each
set of local coordinates $x$ on $X$. Also, since $\gamma \in L^2_1(S^1, \mathbb R^N)$, then we know
that $\gamma$ is continuous in $t \in S^1$, and so $\Gamma^k_{ij}(\gamma)$ is continuous on $S^1$.
Until now, we've been lax about distinguishing between $\gamma = (\gamma^1, \dots, \gamma^N) \in
X \subset \mathbb R^N$ and $\gamma$ in local coordinates. There is an important point in which
we should make a distinction. Recall we are working on a coordinate chart
$\varphi : U \to O \subset X \subset \mathbb R^N$, where $U \subset \mathbb R^n$. Our notation has been this: $\gamma^a$ is the
$a$th coordinate of $\gamma$ in $\mathbb R^N \supset X$, while $\gamma^i$ has been shorthand for $(\varphi^{-1} \circ \gamma)^i$,
the $i$th coordinate of $\varphi^{-1} \circ \gamma$ in $\mathbb R^n \supset U$.
In the previous subsections, we have dealt with the $L^2_1$ norm of $\gamma$ in $\mathbb R^N$,
while in local coordinates, we should deal with the $L^2_1$ norm of $\varphi^{-1} \circ \gamma$ in
$U \subset \mathbb R^n$. Let $\varphi^{-1} : O \to U$ be the restriction of the smooth map
\[ y = (y^1, \dots, y^n) : Q \to U, \]
where $Q$ is an open subset of $\mathbb R^N$ which contains $O \subset X \subset \mathbb R^N$. (Recall
we may do this by the definition of smooth maps from $O$ to $\mathbb R^n$.) Let $x =
(x^1, \dots, x^N)$ represent coordinates on $\mathbb R^N$. Compute for $k = 1, \dots, n$
\[ \frac{\partial}{\partial t} (y \circ \gamma)^k = \frac{\partial y^k}{\partial x^a} \dot\gamma^a, \]
where $a$ is summed from 1 to $N$.
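Proposition 84. Let $\phi : U \to O$ be a smooth coordinate parametrization of
$X$. Let $I \subset \mathbb{R}$ be a compact interval, and let $K \subset O$ be compact. Then there
are positive constants $C_1, \dots, C_5$ so that
$$C_1\|\gamma\|_{L^2_1(I,\mathbb{R}^N)} + C_3 \le \|\phi^{-1}\circ\gamma\|_{L^2_1(I,\mathbb{R}^n)} + C_4 \le C_2\|\gamma\|_{L^2_1(I,\mathbb{R}^N)} + C_5 \tag{49}$$
for all $\gamma$ so that $\gamma(I) \subset K$. (The point is that $C_1, C_2, C_3, C_4, C_5$ are independent
of $\gamma$.)
Corollary 85. $\|\gamma\|_{L^2_1(I,\mathbb{R}^N)}$ is bounded if and only if $\|\phi^{-1}\circ\gamma\|_{L^2_1(I,\mathbb{R}^n)}$ is
bounded.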
Remark. A related, simpler notion is the following: Two norms $\|\cdot\|_{B_1}$ and
$\|\cdot\|_{B_2}$ on a single linear space $B$ are called equivalent if there are constants
$C_1 > C_2 > 0$ so that for all $x \in B$,
$$C_1\|x\|_{B_1} \ge \|x\|_{B_2} \ge C_2\|x\|_{B_1}.$$
Remark. As long as we restrict to compact subsets of coordinate charts,
the norms in $\mathbb{R}^N \supset X$ and in local coordinates on $\mathbb{R}^n$ are equivalent. The
corollary holds for all the Banach function spaces we have discussed, not just
for $L^2_1$. Also, a similar proposition holds for Banach spaces of functions from
$X$ to $\mathbb{R}$, not simply spaces of maps from $S^1$ to $X$:
For $K \subset O$, the norms on $L^2_1(K)$ and $L^2_1(\phi^{-1}K)$ are equivalent under
the map
$$L^2_1(K) \to L^2_1(\phi^{-1}K), \qquad f \mapsto f\circ\phi.$$
Proof of Proposition 84. We claim it suffices to prove the bound (49) separately
for the $L^2$ norm of $\gamma$ and for the $L^2$ norm of $\dot\gamma$. Proof: if $A = \|\gamma\|_{L^2}$
and $B = \|\dot\gamma\|_{L^2}$, then
$$\|\gamma\|_{L^2_1} = \sqrt{A^2 + B^2}.$$
Then it is easy to check that for $A, B \ge 0$,
$$\frac{1}{\sqrt{2}}(A + B) \le \sqrt{A^2 + B^2} \le A + B.$$
In other words, the norm on $\gamma$ given by the sum of the $L^2$ norm of $\gamma$ and the
$L^2$ norm of $\dot\gamma$ is equivalent to the $L^2_1$ norm. It is straightforward to use this
fact to prove the claim.
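For completeness, here is a short derivation of the two-sided bound between $A + B$ and $\sqrt{A^2 + B^2}$. It is added as a worked check and uses only $A, B \ge 0$.

```latex
\begin{align*}
(A+B)^2 &= A^2 + 2AB + B^2 \;\ge\; A^2 + B^2
  && \text{since } AB \ge 0, \\
(A+B)^2 &= A^2 + 2AB + B^2 \;\le\; 2\,(A^2 + B^2)
  && \text{since } 2AB \le A^2 + B^2 .
\end{align*}
```

Taking square roots gives $\sqrt{A^2+B^2} \le A + B \le \sqrt{2}\,\sqrt{A^2+B^2}$, which is the equivalence of the two norms used in the claim.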
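Since $\phi^{-1}$ is $C^1$ on $K$, it is locally Lipschitz and thus globally Lipschitz
on $K$ (see Proposition 17). So for $C$ the Lipschitz constant and $x_0$ a point
in $K$, for all $x \in K$,
$$|\phi^{-1}(x)| \le |\phi^{-1}(x_0)| + C|x - x_0| \le C' + C|x|, \qquad C' = |\phi^{-1}(x_0)| + C|x_0|.$$
Therefore, the Triangle Inequality gives
$$\begin{aligned}
\|\phi^{-1}(\gamma)\|_{L^2(S^1,\mathbb{R}^n)} &= \left(\int_{S^1} |\phi^{-1}(\gamma(t))|^2\,dt\right)^{1/2} \\
&\le \left(\int_{S^1} (C' + C|\gamma(t)|)^2\,dt\right)^{1/2} \\
&\le \left(\int_{S^1} (C')^2\,dt\right)^{1/2} + \left(\int_{S^1} [C|\gamma(t)|]^2\,dt\right)^{1/2} \\
&= C' + C\|\gamma\|_{L^2(S^1,\mathbb{R}^N)}.
\end{aligned}$$
This is essentially one half of (49) for the $L^2$ norm of $\gamma$. The other half
follows from the fact that $\phi$ is a $C^1$ function on the compact set $\phi^{-1}K$.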
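We still must address the derivative part of the $L^2_1$ norm, namely the $L^2$ norm
of $\dot\gamma$. Recall for $y = \phi^{-1}$ as above, that
$$(\phi^{-1}\circ\gamma)^{\cdot} = \frac{\partial y}{\partial x^a}\,\dot\gamma^a.$$
On the compact set $K$, since $\phi^{-1}$ is $C^1$, there is a constant $C$ so that
$$\left|\frac{\partial y}{\partial x^a}\right| \le C \quad\text{on } K,$$
and so on $K$
$$\left|(\phi^{-1}\circ\gamma)^{\cdot}\right| = \left|\frac{\partial y}{\partial x^a}\,\dot\gamma^a\right| \le C\sum_{a=1}^N |\dot\gamma^a| \le CN|\dot\gamma|.$$
Thus, as in the previous paragraph,
$$\|(\phi^{-1}\circ\gamma)^{\cdot}\|_{L^2(S^1,\mathbb{R}^n)} \le CN\|\dot\gamma\|_{L^2(S^1,\mathbb{R}^N)}.$$
The opposite inequality can be obtained by considering $\phi$ as a $C^1$ map instead
of $\phi^{-1}$.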
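Remark. In the previous proof, it sufficed to consider the $L^2$ norms of $\gamma$ and
$\dot\gamma$ separately. For higher derivatives, this is no longer adequate: Compute
$$(\phi^{-1}\circ\gamma)^{\cdot\cdot} = \frac{\partial y}{\partial x^a}\,\ddot\gamma^a + \frac{\partial^2 y}{\partial x^a\,\partial x^b}\,\dot\gamma^a\dot\gamma^b.$$
So first derivative terms of $\gamma$ come into the calculations of the second derivatives
of $\phi^{-1}\circ\gamma$.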
The geodesic equation is written in terms of the coordinates on $U \subset \mathbb{R}^n$,
and for an open interval $I \subset S^1$, $\gamma(I) \subset O$. On any compact subinterval of
$I$, there is a constant $C$ so that the components of the metric $g_{k\ell}(\gamma)$ and
its first derivatives $g_{k\ell,m}(\gamma)$ have absolute values bounded by $C$
(this is since $\gamma$ is continuous on the compact interval). Since $\gamma \in L^2_1$, each
$\dot\gamma^i \in L^2$. Therefore, Hölder's inequality shows that
$$\int_I \left|\tfrac12 g_{ij,k}\,\dot\gamma^i\dot\gamma^j\right| dt \le \frac{C}{2}\sum_{i,j=1}^n \left(\int_I |\dot\gamma^i|^2\,dt\right)^{1/2}\left(\int_I |\dot\gamma^j|^2\,dt\right)^{1/2} < \infty.$$
Thus $\frac12 |g_{ij,k}\,\dot\gamma^i\dot\gamma^j| \in L^1(I)$ for each $k$, and thus Corollary 83 shows that, for each $\ell$,
$$\left(g_{k\ell}(\gamma)\,\dot\gamma^k\right)^{\cdot} \in L^1(I) \tag{50}$$
in the sense of distributions. Lemma 86 below and the proof of Proposition 59
above then show that $g_{k\ell}(\gamma)\,\dot\gamma^k$ is continuous. Now since the inverse metric
$g^{\ell m}(\gamma)$ is continuous in $t$, we may multiply by it to show that each $\dot\gamma^i$ is
continuous as well. Thus $\gamma$ is locally $C^1$.
Now bootstrap using Corollary 83 again to show that $(g_{ik}(\gamma)\,\dot\gamma^i)^{\cdot}$ is continuous
as well. Thus $g_{ik}(\gamma)\,\dot\gamma^i$ is, in the sense of distributions, a $C^1$ function.
As above, this shows $\dot\gamma^i$ is also $C^1$, and thus $\gamma$ is locally $C^2$.
We now have enough regularity to rewrite Corollary 83 as the
geodesic equation
$$\ddot\gamma^k = -\Gamma^k_{ij}(\gamma)\,\dot\gamma^i\dot\gamma^j$$
for $C^2$ functions $\gamma^k$. The equation holds in the usual sense of ODEs. Therefore,
since $\Gamma^k_{ij}$ is smooth, the usual regularity theory for ODEs, Theorem 9,
applies, and the geodesic is smooth.
Lemma 86. Let $f \in L^1_{\mathrm{loc}}(\mathbb{R})$. Then
$$g(t) = \int_{t_0}^t f(s)\,ds$$
is continuous.
Proof. Let $t \in \mathbb{R}$, and let $h > 0$ (the case $h < 0$ is similar). Compute
$$g(t + h) - g(t) = \int_t^{t+h} f(s)\,ds = \int_{\mathbb{R}} \chi_{[t,t+h]}(s)\,f(s)\,ds$$
for $\chi_{[t,t+h]}$ the characteristic function of the interval $[t, t + h]$. Then as $h \to 0$,
$$\chi_{[t,t+h]}(s)\,f(s) \to 0$$
almost everywhere on $\mathbb{R}$. For small $h$,
$$\left|\chi_{[t,t+h]}(s)\,f(s)\right| \le \left|\chi_{[t-1,t+1]}(s)\,f(s)\right|,$$
and the right-hand function is integrable since $f$ is locally $L^1$. Then the
Dominated Convergence Theorem says that
$$g(t + h) - g(t) = \int_{\mathbb{R}} \chi_{[t,t+h]}(s)\,f(s)\,ds \to \int_{\mathbb{R}} 0\,ds = 0$$
as $h \to 0^+$. The case $h \to 0^-$ is similar. Thus $g(t + h) \to g(t)$ as $h \to 0$ and
$g$ is continuous at each $t \in \mathbb{R}$.
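Lemma 86 can be illustrated with an unbounded integrand. The sketch below is an added example, not from the original notes: it takes the hypothetical choice $f(s) = |s|^{-1/2}$, which is locally integrable but blows up at $s = 0$, and uses the closed form $g(t) = \int_0^t f(s)\,ds = \operatorname{sign}(t)\,2\sqrt{|t|}$ to watch the increments at the singular point shrink.

```python
import math

def g(t):
    """Closed form of the integral from 0 to t of f(s) = |s|**-0.5."""
    return math.copysign(2.0 * math.sqrt(abs(t)), t)

def increment(t, h):
    """|g(t + h) - g(t)|, the quantity shown to tend to 0 in the lemma."""
    return abs(g(t + h) - g(t))

# Increments at t = 0, where the integrand f is unbounded.
steps = [10.0 ** (-n) for n in range(1, 8)]
at_zero = [increment(0.0, h) for h in steps]

for h, d in zip(steps, at_zero):
    print(f"h = {h:.0e}   |g(h) - g(0)| = {d:.3e}")
```

The increments decay like $2\sqrt{h}$, so $g$ is continuous at $0$ even though it is not Lipschitz there.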
Homework Problem 67. Let $f : \mathbb{R} \to \mathbb{R}$ be an $L^1$ function. Show that
$$\phi(t) = \exp\int_0^t f(\tau)\,d\tau$$
is a continuous function which satisfies $\phi(0) = 1$ and solves $\dot\phi = f(t)\,\phi$ in
the sense of distributions. (Hint: approximate $f$ in $L^1$ by a sequence of $C^\infty$
functions.)
4.9 Sobolev spaces, distributions, and Fourier series

In this subsection, we provide some more background results about Sobolev
spaces and distributions on S1 .
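First of all, we describe $\mathbb{C}$-valued distributions. A complex valued distribution
is a $\mathbb{C}$-linear map from $C^\infty(S^1,\mathbb{C})$ to $\mathbb{C}$.
Example 22. For $k \in \mathbb{Z}$, the map
$$\phi \mapsto \hat\phi^k = \int_{S^1} \phi\, e^{-2\pi i k t}\,dt$$
is a distribution.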
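Proposition 87.
$$L^2_1(S^1,\mathbb{C}) = \left\{ \sum_{k\in\mathbb{Z}} f^k e^{2\pi i k t} : \sum_{k\in\mathbb{Z}} |f^k|^2(k^2 + 1) < \infty \right\}.$$
Moreover, the norm $\|f\|_{L^2_1}$ is equivalent to
$$\left(\sum_{k\in\mathbb{Z}} |\hat f^k|^2(k^2 + 1)\right)^{1/2}.$$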
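Proof. First we show $\subset$. Let $f \in L^2_1(S^1,\mathbb{C})$ and compute
$$\dot f(e^{-2\pi i k t}) = -\int_{S^1} f(t)\,\frac{d}{dt}\left(e^{-2\pi i k t}\right)dt = -\int_{S^1} f(t)(-2\pi i k)\,e^{-2\pi i k t}\,dt = 2\pi i k\,\hat f^k.$$
Since $\dot f \in L^2$,
$$4\pi^2\sum_{k\in\mathbb{Z}} k^2|\hat f^k|^2 = \|\dot f\|^2_{L^2} < \infty \implies \sum_{k\in\mathbb{Z}} k^2|\hat f^k|^2 < \infty.$$
Now since $f \in L^2$ also, then
$$\sum_{k\in\mathbb{Z}} |\hat f^k|^2 < \infty \quad\text{and}\quad \sum_{k\in\mathbb{Z}} |\hat f^k|^2(k^2 + 1) < \infty.$$
This proves $\subset$.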
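To show $\supset$, note that
$$\sum_{k\in\mathbb{Z}} |f^k|^2(k^2 + 1) < \infty \implies \sum_{k\in\mathbb{Z}} |f^k|^2 < \infty \quad\text{and}\quad \sum_{k\in\mathbb{Z}} k^2|f^k|^2 < \infty.$$
Therefore,
$$f = \sum_{k\in\mathbb{Z}} f^k e^{2\pi i k t} \in L^2,$$
and by the computations in the previous paragraph $\hat{\dot f}^k = \dot f(e^{-2\pi i k t}) =
2\pi i k f^k$. Consider a test function $\phi \in C^\infty(S^1,\mathbb{C})$. Then compute
$$\begin{aligned}
\dot f(\phi) &= -f(\dot\phi) \\
&= -\int_{S^1} f\,\dot\phi\,dt \\
&= -\langle f, \overline{\dot\phi}\rangle_{L^2} \\
&= -\sum_{k\in\mathbb{Z}} f^k\,(-2\pi i k)\,\hat\phi^{-k} \\
&= \sum_{k\in\mathbb{Z}} (2\pi i k) f^k\,\hat\phi^{-k} \\
&= \sum_{k\in\mathbb{Z}} \hat{\dot f}^k\,\hat\phi^{-k} \\
&= \left\langle \sum_{k\in\mathbb{Z}} \hat{\dot f}^k e^{2\pi i k t},\ \overline{\phi} \right\rangle_{L^2}.
\end{aligned}$$
(Here $\langle u, v\rangle_{L^2} = \int_{S^1} u\,\bar v\,dt$ and $\hat\phi^{-k} = \int_{S^1} \phi\,e^{2\pi i k t}\,dt$.)
This shows that
$$\dot f = \sum_{k\in\mathbb{Z}} \hat{\dot f}^k e^{2\pi i k t} = \sum_{k\in\mathbb{Z}} (2\pi i k) f^k e^{2\pi i k t}$$
in the sense of distributions. Therefore, both $f$ and $\dot f$ are in $L^2$, and thus
$f \in L^2_1(S^1,\mathbb{C})$.
The statement about equivalence of the norms follows easily.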
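The key identity in the proof, that differentiation multiplies the $k$th Fourier coefficient by $2\pi i k$, can be checked numerically on a trigonometric polynomial. This is a minimal sketch under stated assumptions: $S^1$ is parametrized by $t \in [0, 1)$, coefficients use the kernel $e^{-2\pi i k t}$, and the sample function `f` and the helper `coeff` are hypothetical names introduced only for illustration.

```python
import cmath
import math

def coeff(func, k, n=64):
    """Riemann-sum approximation of the integral over [0, 1) of
    func(t) * exp(-2 pi i k t) dt, using n equally spaced points."""
    total = sum(func(j / n) * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
    return total / n

def f(t):
    """Sample trigonometric polynomial with frequencies 3 and -1."""
    return cmath.exp(2j * math.pi * 3 * t) + 0.5 * cmath.exp(-2j * math.pi * t)

def f_dot(t):
    """Derivative of f, computed by hand term by term."""
    return (2j * math.pi * 3) * cmath.exp(2j * math.pi * 3 * t) \
        + 0.5 * (-2j * math.pi) * cmath.exp(-2j * math.pi * t)

for k in range(-5, 6):
    lhs = coeff(f_dot, k)                    # k-th coefficient of the derivative
    rhs = 2j * math.pi * k * coeff(f, k)     # 2 pi i k times k-th coefficient of f
    print(k, abs(lhs - rhs))
```

For a trigonometric polynomial of low degree the equally spaced Riemann sum recovers the coefficients up to rounding, so the printed differences are at the level of floating-point error.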
Remark. Similar easy calculations show that
$$L^2_m(S^1,\mathbb{C}) = \left\{ \sum_{k\in\mathbb{Z}} f^k e^{2\pi i k t} : \sum_{k\in\mathbb{Z}} |f^k|^2(k^2 + 1)^m < \infty \right\}$$
for every $m = 0, 1, 2, \dots$. Our characterization of smooth functions in (48)
above then shows that
$$C^\infty(S^1,\mathbb{C}) = \bigcap_{m=0}^{\infty} L^2_m(S^1,\mathbb{C}).$$
Proof: it is straightforward to show that $L^2_m(S^1,\mathbb{C})$ compactly embeds in
$C^{m-1}(S^1,\mathbb{C})$ for all $m \ge 1$.
The Fourier series isometry between $L^2(S^1,\mathbb{C})$ and sequences $\ell^2 = L^2(\mathbb{Z},\mathbb{C})$
also allows us to define even more Sobolev spaces.
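For any $s \in \mathbb{R}$, define $L^2_s(S^1,\mathbb{C})$ to be the set of distributions $f$ which act
on
$$\phi = \sum_{k\in\mathbb{Z}} \hat\phi^k e^{2\pi i k t}$$
by
$$f(\phi) = \sum_{k\in\mathbb{Z}} \hat f^k\,\hat\phi^{-k}, \tag{51}$$
where $\hat f^k = f(e^{-2\pi i k t})$ and we assume that
$$\sum_{k\in\mathbb{Z}} |\hat f^k|^2(1 + k^2)^s < \infty. \tag{52}$$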
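To see how condition (52) behaves for negative $s$, consider the constant sequence $\hat f^k = 1$. This is an added illustration, not from the original notes: it is the coefficient sequence of evaluation at $t = 0$, $\phi \mapsto \phi(0)$. The sketch below compares partial sums of (52) for $s = -1$, where the series converges (to $\pi\coth\pi$, a standard closed form), with $s = 0$, where it diverges. The function name is hypothetical.

```python
import math

def partial_sum(s, K):
    """Sum over |k| <= K of |f^k|^2 (1 + k^2)^s for the constant sequence f^k = 1."""
    return sum((1.0 + k * k) ** s for k in range(-K, K + 1))

# Closed form of the full series for s = -1: sum over all k of 1/(1 + k^2).
limit_s_minus_1 = math.pi / math.tanh(math.pi)

for K in (10, 1000, 100000):
    print(K, partial_sum(-1, K), partial_sum(0, K))
print("limit for s = -1:", limit_s_minus_1)
```

The $s = 0$ column grows like $2K + 1$, so this functional is not in $L^2$, while the $s = -1$ column stabilizes, consistent with membership in a Sobolev space of negative order.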
Homework Problem 68. Show that if $\hat f^k$ is a sequence of complex numbers
satisfying (52), then for any $\phi \in C^\infty(S^1,\mathbb{C})$, the sum in (51) converges.
Now we are able to put a topology on $C^\infty(S^1,\mathbb{C})$. We only describe this
topology in terms of convergence of sequences. We say $\phi_j \to \phi$ in $C^\infty(S^1,\mathbb{C})$
if $\phi_j \to \phi$ in $L^2_m(S^1,\mathbb{C})$ for all $m \ge 0$.
Homework Problem 69. Show that $\phi_j \to \phi$ in $C^\infty(S^1,\mathbb{C})$ if and only if
$\phi_j \to \phi$ in $C^p(S^1,\mathbb{C})$ for all $p \ge 0$.
Hint: You may use the fact that $L^2_m(S^1,\mathbb{C})$ embeds compactly into $C^{m-1}(S^1,\mathbb{C})$
for each $m \ge 1$. Also show that $C^p(S^1,\mathbb{C})$ embeds continuously into $L^2_p(S^1,\mathbb{C})$
for all $p \ge 0$.
Now we finally give the correct definition of complex distributions on $S^1$.
A distribution on $S^1$ is a continuous $\mathbb{C}$-linear map from $C^\infty(S^1,\mathbb{C})$ to $\mathbb{C}$.
Denote the space of complex distributions on $S^1$ by $\mathcal{D}'(S^1,\mathbb{C})$.
Proposition 88. $\mathcal{D}'(S^1,\mathbb{C}) = \bigcup_{m\in\mathbb{Z}} L^2_m(S^1,\mathbb{C})$, and the image of $\mathcal{D}'(S^1,\mathbb{C})$
under the Fourier transform is the set of all polynomially bounded complex
sequences. In other words, it is the set of all sequences $\{f^k\}$ so that there are
$m \in \mathbb{Z}$, $C > 0$ so that $|f^k| \le C(k^2 + 1)^{m/2}$ for all $k \in \mathbb{Z}$.
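Proof. We prove the first equality, and leave the rest as an exercise.
To prove $\supset$, if $f$ is in the union, then $f \in L^2_{-m}(S^1,\mathbb{C})$ for some positive
$m$. To show $f \in \mathcal{D}'(S^1,\mathbb{C})$, consider a sequence $\phi_j \to \phi$ in $C^\infty(S^1,\mathbb{C})$.
Then by definition, $\phi_j \to \phi$ in $L^2_m$. Then
$$\begin{aligned}
|f(\phi_j) - f(\phi)| &= |f(\phi_j - \phi)| \\
&\le \sum_{k\in\mathbb{Z}} \left|\hat f^k\left(\hat\phi_j^{-k} - \hat\phi^{-k}\right)\right| \\
&= \sum_{k\in\mathbb{Z}} \frac{|\hat f^k|}{(1 + k^2)^{m/2}} \left[\,|\hat\phi_j^{-k} - \hat\phi^{-k}|\,(1 + k^2)^{m/2}\right] \\
&\le \left(\sum_{k\in\mathbb{Z}} \frac{|\hat f^k|^2}{(1 + k^2)^m}\right)^{1/2} \left(\sum_{k\in\mathbb{Z}} |\hat\phi_j^{-k} - \hat\phi^{-k}|^2 (1 + k^2)^m\right)^{1/2}.
\end{aligned}$$
The second term in the last line goes to zero by the remark after Proposition
87, while the first term is finite by the fact $f \in L^2_{-m}$. Therefore, $f(\phi_j) \to f(\phi)$
for every test function $\phi$, and $f \in \mathcal{D}'(S^1,\mathbb{C})$.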
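We prove $\subset$ by contradiction. If $f \in \mathcal{D}'(S^1,\mathbb{C})$ is not in $L^2_m(S^1,\mathbb{C})$ for
any $m \in \mathbb{Z}$, then for all $m \in \mathbb{Z}$,
$$\sum_{k=-\infty}^{\infty} |\hat f^k|^2(1 + k^2)^m = \infty.$$
This implies that
$$\sup_{k\in\mathbb{Z}} |\hat f^k|^2(1 + k^2)^m = \infty \quad\text{for all } m \in \mathbb{Z}.$$
(Proof of the contrapositive:
$$|\hat f^k|^2(1 + k^2)^m \le C \implies \sum_{k\in\mathbb{Z}} |\hat f^k|^2(1 + k^2)^{m-1} \le \sum_{k\in\mathbb{Z}} \frac{C}{1 + k^2} < \infty.)$$
So for each $j$, there is a $k_j$ so that
$$\frac{|\hat f^{k_j}|}{(1 + k_j^2)^{j/2}} \ge 1.$$
We may assume $k_j \ne 0$.
Now we construct a sequence $\phi_j$ which converges to 0 in $C^\infty(S^1,\mathbb{C})$, but
for which $f(\phi_j) \not\to 0$. Define
$$\phi_j = \frac{\overline{\hat f^{k_j}}}{|\hat f^{k_j}|\,(1 + k_j^2)^{j/2}}\; e^{-2\pi i k_j t}.$$
Compute
$$\|\phi_j\|^2_{L^2_n} \sim \frac{(1 + k_j^2)^n}{(1 + k_j^2)^j} = (1 + k_j^2)^{n-j},$$
where $\sim$ denotes equivalence of norms. For each fixed $n$, since each $k_j^2 \ge 1$,
then
$$\lim_{j\to\infty} \|\phi_j\|^2_{L^2_n} = 0,$$
and so $\phi_j \to 0$ in $C^\infty(S^1,\mathbb{C})$. On the other hand,
$$f(\phi_j) = \hat f^{k_j}\,\frac{\overline{\hat f^{k_j}}}{|\hat f^{k_j}|\,(1 + k_j^2)^{j/2}} = \frac{|\hat f^{k_j}|}{(1 + k_j^2)^{j/2}} \ge 1.$$
So $f(\phi_j) \not\to 0 = f(0) = f(\lim \phi_j)$, where $\phi_j \to 0$ in $C^\infty(S^1,\mathbb{C})$. This
contradicts the continuity of $f$.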