
Birkhäuser Advanced Texts

Edited by
Herbert Amann, Zürich University
Steven G. Krantz, Washington University, St. Louis
Shrawan Kumar, University of North Carolina at Chapel Hill
Jan Nekovář, Université Pierre et Marie Curie, Paris

Pavel Drábek
Jaroslav Milota

Methods of Nonlinear Analysis


Applications to Differential Equations

Birkhäuser
Basel · Boston · Berlin

Authors:
Pavel Drábek
Department of Mathematics
Faculty of Applied Sciences
University of West Bohemia in Pilsen
Univerzitní 8
Plzeň
Czech Republic

Jaroslav Milota
Department of Mathematical Analysis
Faculty of Mathematics and Physics
Charles University in Prague
Sokolovská 83
Praha
Czech Republic



Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use permission of the copyright owner must be obtained.

Birkhäuser Verlag AG
Basel · Boston · Berlin
P.O. Box …, CH-… Basel, Switzerland
Part of Springer Science+Business Media
Printed on acid-free paper produced of chlorine-free pulp. TCF
Printed in Germany
www.birkhauser.ch

Dedicated to the memory of

Svatopluk Fučík

Contents
Preface

1 Preliminaries
1.1 Elements of Linear Algebra
1.2 Normed Linear Spaces

2 Properties of Linear and Nonlinear Operators
2.1 Linear Operators
2.2 Compact Operators
2.3 Contraction Principle

3 Abstract Integral and Differential Calculus
3.1 Integration of Vector Functions
3.2 Differential Calculus in Normed Linear Spaces
3.2A Newton Method

4 Local Properties of Differentiable Mappings
4.1 Inverse Function Theorem
4.2 Implicit Function Theorem
4.3 Local Structure of Differentiable Maps, Bifurcations
4.3A Differentiable Manifolds, Tangent Spaces and Vector Fields
4.3B Differential Forms
4.3C Integration on Manifolds
4.3D Brouwer Degree

5 Topological and Monotonicity Methods
5.1 Brouwer and Schauder Fixed Point Theorems
5.1A Fixed Point Theorems for Noncompact Operators
5.2 Topological Degree
5.2A Global Bifurcation Theorem
5.2B Topological Degree for Generalized Monotone Operators
5.3 Theory of Monotone Operators
5.3A Browder and Leray–Lions Theorem
5.4 Supersolutions, Subsolutions, Monotone Iterations
5.4A Minorant Principle and Krein–Rutman Theorem
5.4B Supersolutions, Subsolutions and Topological Degree

6 Variational Methods
6.1 Local Extrema
6.2 Global Extrema
6.2A Ritz Method
6.2B Supersolutions, Subsolutions and Global Extrema
6.3 Relative Extrema and Lagrange Multipliers
6.3A Contractible Sets
6.3B Krasnoselski Potential Bifurcation Theorem
6.4 Mountain Pass Theorem
6.4A Pseudogradient Vector Fields in Banach Spaces
6.4B Lusternik–Schnirelmann Method
6.5 Saddle Point Theorem
6.5A Linking Theorem

7 Boundary Value Problems for Partial Differential Equations
7.1 Classical Solution, Functional Setting
7.2 Classical Solution, Applications
7.3 Weak Solutions, Functional Setting
7.4 Weak Solutions, Application of Fixed Point Theorems
7.5 Weak Solutions, Application of Degree Theory
7.5A Application of the Degree of Generalized Monotone Operators
7.6 Weak Solutions, Application of Theory of Monotone Operators
7.6A Application of Leray–Lions Theorem
7.7 Weak Solutions, Application of Variational Methods
7.7A Application of the Saddle Point Theorem

Summary of Methods
Typical Applications
Comparison of Bifurcation Results
List of Symbols
Index
Bibliography

Preface
Motto:
Real world problems are in essence nonlinear. Hence methods of nonlinear
analysis became important tools of modern mathematical modeling.

There are many books and monographs devoted to methods of nonlinear analysis and their applications. Typically, such a book is either dedicated to a particular topic and treats details which are difficult for a student to understand, or it deals with an application to complicated nonlinear partial differential equations in which a lot of technicalities are involved. In both cases it is very difficult for a student to get oriented in this kind of material and to pick up the ideas underlying the main tools for treating the problems in question. The purpose of this book is to describe the basic methods of nonlinear analysis and to illustrate them on simple examples. Our aim is to motivate each method considered, to explain it in a general form but in the simplest possible abstract framework, and finally, to show its application (typically to boundary value problems for elementary ordinary or partial differential equations). To keep the text free of technical details and make it accessible also to beginners, we did not formulate some key assertions and illustrative examples in the most general form.
The exposition of the book is at two levels, visually differentiated by different font sizes. The basic material is contained in the body of the seven chapters. The more advanced material is contained in appendices to a number of sections and is presented in a smaller font size. The basic material is independent of the advanced material, is self-contained, and can be read by students new to the subject. It should prepare an undergraduate student in mathematics to read scientific papers in nonlinear analysis and to understand applications of the methods presented to more complex problems.
Each chapter contains a number of exercises that should provoke the reader's creativity and help develop his or her own style of approaching problems. Moreover, the exercises play an additional role: they carry some of the technical material that was omitted in simplifying some of the basic proofs. They are thus an organic part of the exposition for graduate students who already have experience with the methods of nonlinear analysis and are interested in generalizations.
We have organized the material in this book as follows.
In Chapters 1–3, we introduce some necessary notions and basic assertions from linear algebra (Section 1.1) and linear functional analysis (Sections 1.2–2.2), and we also present some preliminaries concerning the Contraction Principle and differential and integral calculus in normed linear spaces (Sections 2.3–3.2). In this part of the text we give proofs of the results which are closely related to the nonlinear part of the book. On the other hand, several very important statements of linear functional analysis are left without proofs.
In Chapter 4, local properties of differentiable mappings are treated. In particular, it includes topics such as the Inverse Function Theorem, the Implicit Function Theorem together with the Rank Theorem and the notion of the differentiable manifold. Results such as the Lyapunov–Schmidt Reduction and the Morse Theorem are used to prove the Local Bifurcation Theorem of Crandall and Rabinowitz.
Chapter 5 is devoted to the topological and monotonicity methods of nonlinear analysis. We focus on the Brouwer and Schauder Fixed Point Theorems, the Sard Theorem and the analytic approach to the degree of a mapping, monotone operators and the method of monotone iterations based on the notions of super- and sub-solutions.
In Chapter 6, basic variational methods are presented. We start with local and global extrema and then continue with the method of Lagrange Multipliers with applications to eigenvalue problems (Courant–Fischer and Courant–Weinstein Principles), the Mountain Pass Theorem and the Saddle Point Theorem. Abstract results from Chapters 4–6 are accompanied by examples of various boundary value problems for ordinary differential equations. Since these applications are spread over a large number of pages, we add at the end of the book a brief account of examples of boundary value problems for both ordinary and partial differential equations, together with the methods used. The reader will also find there a short guide to the bifurcation results presented in the book.
Chapter 7 deals with several applications of the preceding methods to boundary value problems for elementary nonlinear partial differential equations. We present and discuss the notions of classical and weak solutions and try to minimize the technical difficulties connected with the formulation of the problems. All this material represents a self-contained introduction to the methods of nonlinear analysis with simple applications to elementary differential equations.
More advanced material is presented in appendices which are attached to a number of sections.
In particular, Appendix 3.2A explores an abstract Newton Method as an application of the Contraction Principle and differential calculus in Banach spaces.
Appendices 4.3A to 4.3D are devoted to analysis on manifolds (vector fields, differential forms and integration on manifolds). The main results presented in these appendices are an abstract version of the Stokes Theorem (and its applications) and the construction of the Brouwer degree by means of differential forms.
Some fixed point theorems for noncompact operators are presented in Appendix 5.1A. As an application of the Leray–Schauder degree theory we consider global bifurcation theorems in Appendix 5.2A, while Appendix 5.2B is devoted to the generalization of the Leray–Schauder degree to generalized monotone mappings. Appendix 5.3A deals with the generalization of the theory of monotone operators to a more general functional setting and to operators which are monotone only in the principal part. In Appendix 5.4A we give the proof of the famous Krein–Rutman Theorem, which itself falls within the linear theory but plays an essential role in the study of nonlinear problems. Appendix 5.4B illustrates the connection between the method of supersolutions and subsolutions and the topological degree.
The Ritz Method is presented in Appendix 6.2A as an application of an abstract variational principle. Appendix 6.2B illustrates the connection between the method of supersolutions and subsolutions and the existence of global extrema. Appendix 6.3A has an auxiliary character for establishing the potential bifurcation theorem in Appendix 6.3B. In Appendix 6.4A we generalize the so-called Deformation Lemma and present the Mountain Pass Theorem in a more general setting. The generalization of the Lagrange Multiplier Method is carried out in Appendix 6.4B. Appendix 6.5A is dedicated to the generalization of the Saddle Point Theorem.
Appendices 7.5A, 7.6A and 7.7A are devoted to applications of the degree of generalized monotone mappings, the Leray–Lions Theorem and the Saddle Point Theorem, respectively, to boundary value problems for elementary partial differential equations. This more advanced part contains several generalizations of the methods presented in the basic part, and a beginner in the subject can skip it. On the other hand, there are (a few) places in the appendices where we have to refer to the basic text, and the notions we refer to are contained in later chapters or sections. This, however, corresponds to our philosophy of two levels of the text and, in our opinion, does not impair the smoothness of reading.
In order to make the text self-contained, we decided to comment on several notions and statements in footnotes. Placing the material from the footnotes in the main text could disturb a more advanced reader and make the exposition more complicated. In order to emphasize the role of the statements in our exposition we identify them as Theorem, Proposition, Lemma or Corollary. However, the reader should be aware that this by no means expresses the importance of the statement within the whole of mathematics. So, several times, we call important theorems Propositions, Lemmas or Corollaries.
Although the book should primarily serve as a textbook for students at the graduate level, it can be a suitable source for scientists and engineers who need modern methods of nonlinear analysis.


At this point we would like to include a few words about our good friend, colleague and mentor Svatopluk Fučík, to whom we dedicate this book. His work in the field of nonlinear analysis is well recognized and, although he died in 1979 at the age of 34, he ranks among the most important and gifted Czech mathematicians of the 20th century.
We would like to thank Marie Benediktová and Jiří Benedikt for the excellent typesetting of this book in LaTeX 2ε, for excellent figures and illustrations, and for their valuable comments which improved the quality of the text. Our thanks belong also to Eva Fašangová, Gabriela Holubová, Eva Kaspříková and Petr Stehlík for their careful reading of the manuscript and useful comments which have decreased the number of our mistakes and made this text more readable. Our special thanks belong to Jiří Jarník for correction of our English, and to Ralph Chill and Herbert Leinfelder for their improvements of the text and methodological advice.
Both authors appreciate the support of the research projects of the Ministry
of Education, Youth and Sports of the Czech Republic MSM 4977751301 and
MSM 0021620839.

Plzeň–Praha, November 2006

Pavel Drábek
Jaroslav Milota

Chapter 1

Preliminaries
1.1 Elements of Linear Algebra
This section is rather brief since we suppose that the reader already has some knowledge of linear algebra. Therefore, it should be viewed mainly as a source of concepts and notation which will be frequently used in the sequel. There are plenty of textbooks on this subject. As we are interested in applications to analysis, we recommend the classical book Halmos [64] for more detailed information.
A decisive part of analysis concerns the study of various sets of numbers such as $\mathbb{R}$, $\mathbb{C}$, $\mathbb{R}^M, \dots$, sets of functions (continuous, integrable, differentiable), and mappings between them. These sets usually allow some algebraic operations, mainly addition and multiplication by scalars. We will denote the set of scalars by $\Lambda$ and usually have in mind either the set of real numbers $\mathbb{R}$ or that of complex numbers $\mathbb{C}$.
Definition 1.1.1. A set $X$ on which two operations, addition and multiplication by scalars, are defined is called a linear space over a field $\Lambda$ if the following conditions are fulfilled:
(1) $X$ with respect to the operation $x, y \in X \mapsto x + y \in X$ forms a commutative group with a unit element denoted by $o$ and the inverse element to $x \in X$ denoted by $-x$.
(2) The operation $a \in \Lambda,\ x \in X \mapsto ax \in X$ satisfies
(a) $a(bx) = (ab)x$, $a, b \in \Lambda$, $x \in X$,
(b) $1x = x$, $x \in X$, where $1$ is the multiplicative unit of the field $\Lambda$.
(3) For the two operations the distributive laws hold, i.e., for $a, b \in \Lambda$, $x, y \in X$, we have
(a) $(a + b)x = ax + bx$,
(b) $a(x + y) = ax + ay$.


If $\Lambda = \mathbb{R}$ or $\mathbb{C}$, then $X$ is called a real or complex linear space, respectively. If a subset $Y \subset X$ is itself a linear space with respect to the operations induced by those defined on $X$, then $Y$ is said to be a (linear) subspace of $X$.
In the rest of this section the letter $X$ always denotes a linear space over $\Lambda$. If $\Lambda$ is not specified, then a definition or a statement holds for an arbitrary field $\Lambda$. For $x_1, \dots, x_n \in X$ and $a_1, \dots, a_n \in \Lambda$ the sum $\sum_{i=1}^{n} a_i x_i$ is well defined and determines an element $x \in X$, which is called a linear combination of $x_1, \dots, x_n$ (with coefficients $a_1, \dots, a_n$). Notice that only finite linear combinations are defined, since infinite sums cannot be defined without a topology on $X$.
If $A$ is a subset of $X$, then the set of all linear combinations of elements of $A$ is denoted by $\operatorname{Lin} A$ and is called the span of $A$. A span is always a subspace of $X$. We can ask whether $x \in \operatorname{Lin}\{x_1, \dots, x_n\}$ can be expressed in a unique way as a linear combination of $x_1, \dots, x_n$. This uniqueness holds if and only if $x_1, \dots, x_n$ are linearly independent, i.e., the condition
$$\sum_{i=1}^{n} a_i x_i = o \implies a_1 = \dots = a_n = 0$$
is satisfied. More generally, we have the following definition.


Definition 1.1.2. A set $A \subset X$ is said to be linearly independent if every finite subset of $A$ is linearly independent. A set $A \subset X$ is called a basis¹ of $X$ provided $A$ is linearly independent and $\operatorname{Lin} A = X$.
Theorem 1.1.3. Every linear space $X \neq \{o\}$ has a basis. If $A$, $B$ are two bases of $X$, then there is a bijective (i.e., injective and surjective) mapping from $A$ onto $B$.
We will give the proof of the existence part since it contains a very important method which is frequently used. To see the idea of the proof, notice that a basis is a linearly independent set which is maximal in the sense that, by adding an element, it will cease to be linearly independent. Why such a maximal set has to exist is, in general, a question of mathematical philosophy. There are several equivalent statements of set theory which guarantee this existence result. We have found the following one to be the most useful.²
Theorem 1.1.4 (Zorn's Lemma). Let $(\mathcal{A}, \le)$ be an ordered set in which every chain has the lowest upper bound. Then for any $a \in \mathcal{A}$ there is a maximal $m \in \mathcal{A}$ such that $a \le m$.³
¹ It is sometimes called a Hamel basis in order to emphasize the distinction from a Schauder or orthonormal basis in a Banach space or a Hilbert space, respectively (see Section 1.2).
² It can also be viewed as an axiom of set theory.
³ A binary relation $\le$ on $\mathcal{A} \times \mathcal{A}$ is called an ordering if
(1) $x \le x$ for all $x \in \mathcal{A}$,
(2) if $x \le y$ and $y \le x$, then $x = y$,
(3) if $x \le y$ and $y \le z$, then $x \le z$.


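We now return to the
Proof of Theorem 1.1.3. Let $\mathcal{A}$ be the collection of all linearly independent subsets of $X$ and define $A \le B$ for $A, B \in \mathcal{A}$ if $A$ is a subset of $B$. Choose $A \in \mathcal{A}$ ($\mathcal{A} \neq \emptyset$ since $X \neq \{o\}$) and let $M$ be a maximal element of $\mathcal{A}$ with $A \le M$, whose existence is guaranteed by Zorn's Lemma (if $\mathcal{B}$ is a chain in $\mathcal{A}$, then $\sup \mathcal{B} = \bigcup_{B \in \mathcal{B}} B$). Then $\operatorname{Lin} M = X$: if there were $x \in X \setminus \operatorname{Lin} M$, the set $M \cup \{x\}$ would be linearly independent, contradicting the maximality of $M$.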
The proof of the latter part of Theorem 1.1.3 is more involved (the construction of an injection of $A$ into $B$ is also based on the application of Zorn's Lemma) and it is omitted.

Definition 1.1.5. Let $X$ be a linear space and let $A$ be a basis of $X$. Then the cardinality of $A$ is called the dimension of $X$.
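Example 1.1.6.
(i) Assume that $A$ is a basis of a linear space $X$. Then there is a set $\Gamma$ (the so-called index set) and a bijection $\gamma \mapsto e_\gamma$ of $\Gamma$ onto $A$. We will also say that $\{e_\gamma\}_{\gamma \in \Gamma}$ is a basis of $X$. For any $x \in X$ there is a finite subset $K \subset \Gamma$ and scalars $\{\xi_\gamma\}_{\gamma \in K}$ such that
$$x = \sum_{\gamma \in K} \xi_\gamma e_\gamma.$$
These scalars are uniquely determined and will be called the coordinates of $x$ with respect to the basis $\{e_\gamma\}_{\gamma \in \Gamma}$.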
(ii) The space $\mathbb{R}^M$ of real $M$-tuples with the usual operations is a real linear space and the elements
$$e_k = (0, \dots, 0, 1, 0, \dots, 0), \quad k = 1, \dots, M$$
($1$ is at the $k$th position) form a basis of $\mathbb{R}^M$. It will be called the standard basis of $\mathbb{R}^M$. If $x = (x_1, \dots, x_M) \in \mathbb{R}^M$, then $x_1, \dots, x_M$ are the coordinates of $x$ with respect to the standard basis.
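If $(\mathcal{A}, \le)$ is an ordered set, then $\mathcal{B} \subset \mathcal{A}$ is called a chain if for every $x, y \in \mathcal{B}$ we have either $x \le y$ or $y \le x$.
An element $b \in \mathcal{A}$ is called the lowest upper bound of a subset $\mathcal{B} \subset \mathcal{A}$ ($b = \sup \mathcal{B}$) if
(1) $a \in \mathcal{B} \implies a \le b$;
(2) if $a \le c$ for all $a \in \mathcal{B}$, then $b \le c$.
Similarly, we call $d \in \mathcal{A}$ the greatest lower bound of a subset $\mathcal{B} \subset \mathcal{A}$ ($d = \inf \mathcal{B}$) if
(1) $a \in \mathcal{B} \implies d \le a$;
(2) if $c \le a$ for all $a \in \mathcal{B}$, then $c \le d$.
An element $m \in \mathcal{A}$ is called a maximal element of $\mathcal{A}$ if $m \le x$ for an $x \in \mathcal{A}$ implies $m = x$.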


(iii) Similarly, $\mathbb{C}^M$ is the space of complex $M$-tuples and the set of elements $e_1, \dots, e_M$ defined as above is the standard basis of $\mathbb{C}^M$. More generally, if $X$ is a real linear space and $iX$ is defined by $iX := \{ix : x \in X\}$ where $i^2 = -1$, then
$$X_{\mathbb{C}} := X + iX \quad (= \{x + iy : x, y \in X\})$$
is the complexification of $X$. The equality $x + iy = o$ holds if and only if $x = y = o$. The operations in $X_{\mathbb{C}}$ are defined as follows:
$$(x_1 + iy_1) + (x_2 + iy_2) := (x_1 + x_2) + i(y_1 + y_2), \quad x_1, x_2, y_1, y_2 \in X,$$
$$(a + ib)(x + iy) := (ax - by) + i(bx + ay), \quad a, b \in \mathbb{R},\ x, y \in X.$$
It is easy to verify that $X_{\mathbb{C}}$ is a complex linear space.


(iv) Let $\mathcal{P}$ be the family of all polynomials of one variable with real or complex coefficients. Then $\mathcal{P}$ is a real or complex linear space, respectively, and the polynomials
$$P_k(z) = z^k, \quad k = 0, 1, \dots,$$
form a basis of $\mathcal{P}$.
(v) The space $C[0,1]$ of all real (complex) continuous functions on the interval $[0,1]$ is a real (complex) linear space. According to Theorem 1.1.3, $C[0,1]$ has a basis, but it is uncountable (this is not so easy to prove). We will not distinguish among different infinite cardinals; we refer to spaces like $\mathcal{P}$ and $C[0,1]$ as infinite dimensional spaces and use (incorrectly) the symbol $\dim = \infty$.
(vi) We can consider $\mathbb{R}$ as a linear space over the field $\mathbb{Q}$ of rational numbers. For example, the elements $1$ and $\sqrt{2}$ are linearly independent in this case. Here a basis is uncountable and serves as a tool for the construction of pathological examples in analysis, like a discontinuous (or, equivalently, non-measurable) solution $f$ of the functional equation
$$f(x + y) = f(x) + f(y), \quad x, y \in \mathbb{R}.$$
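The operations defining the complexification in (iii) are easy to test numerically. The following Python sketch is our own illustration, not part of the original text: it assumes $X = \mathbb{R}^M$, stores $x + iy$ as a pair of real coordinate lists, and for $M = 1$ compares the result with Python's built-in complex arithmetic. The helper names `c_add` and `c_scale` are hypothetical.

```python
# Sketch (assumption: X = R^M): an element x + iy of the complexification
# X_C = X + iX is stored as the pair (x, y) of real coordinate lists.
from typing import List, Tuple

Vec = List[float]
CVec = Tuple[Vec, Vec]


def c_add(u: CVec, v: CVec) -> CVec:
    """(x1 + iy1) + (x2 + iy2) := (x1 + x2) + i(y1 + y2)."""
    (x1, y1), (x2, y2) = u, v
    return ([a + b for a, b in zip(x1, x2)], [a + b for a, b in zip(y1, y2)])


def c_scale(a: float, b: float, v: CVec) -> CVec:
    """(a + ib)(x + iy) := (ax - by) + i(bx + ay) for real a, b."""
    x, y = v
    real_part = [a * xk - b * yk for xk, yk in zip(x, y)]
    imag_part = [b * xk + a * yk for xk, yk in zip(x, y)]
    return (real_part, imag_part)


# For M = 1 the definitions reproduce ordinary complex arithmetic.
w = c_scale(1.0, -4.0, ([2.0], [3.0]))   # (1 - 4i)(2 + 3i)
z = (1 - 4j) * (2 + 3j)
print(w, z)  # ([14.0], [-5.0]) (14-5j)
```

Storing the pair explicitly mirrors the definition $X_{\mathbb{C}} = X + iX$ rather than relying on a built-in complex type, which only exists for $M = 1$.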

Remark 1.1.7. In the sequel we will use the symbol … to warn the reader that the statement in question is true only in linear spaces of finite dimension.
Next we state a corollary of Theorem 1.1.3.
Corollary 1.1.8. Let $X$ be a linear space and let $Y$ be a subspace of $X$. Then there exists a subspace $Z$ of $X$ with the following properties:
(i) for every $x \in X$ there are unique $y \in Y$, $z \in Z$ such that $x = y + z$;
(ii) $Y \cap Z = \{o\}$.


Notation. $X = Y \oplus Z$; $X$ is called the direct sum of $Y$, $Z$, and $Z$ a direct complement of $Y$ in $X$.
Proof. Let $A$ be a basis of $Y$ and
$$\mathcal{A} = \{B \text{ linearly independent subset of } X : A \subset B\}.$$
By Zorn's Lemma, $\mathcal{A}$ has a maximal element $\tilde{A}$. Put $C = \tilde{A} \setminus A$ (the set complement). It is easy to see that $Z := \operatorname{Lin} C$ satisfies both (i) and (ii).

Notice that the elements $y \in Y$, $z \in Z$ are uniquely determined by $x$ in (i). If $\{o\} \neq Y$ and $Y \neq X$, then $Z$ is not uniquely determined. A simple example can be given in $\mathbb{R}^2$ and the reader is invited to provide one!
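One possible answer, sketched in Python as our own illustration (the helper `split` is hypothetical): in $\mathbb{R}^2$ take $Y = \operatorname{Lin}\{(1,0)\}$. Both $Z_1 = \operatorname{Lin}\{(0,1)\}$ and $Z_2 = \operatorname{Lin}\{(1,1)\}$ are direct complements of $Y$, yet they split the same vector $x$ differently. Exact rational arithmetic via `fractions.Fraction` avoids rounding.

```python
# Sketch: decompose x = y + z in R^2 = Lin{y_dir} ⊕ Lin{z_dir} by Cramer's rule.
from fractions import Fraction


def split(x, y_dir, z_dir):
    """Return (y, z) with y in Lin{y_dir}, z in Lin{z_dir} and x = y + z."""
    det = y_dir[0] * z_dir[1] - y_dir[1] * z_dir[0]
    if det == 0:
        raise ValueError("y_dir and z_dir do not form a basis of R^2")
    # Solve s * y_dir + t * z_dir = x for the scalars s, t.
    s = Fraction(x[0] * z_dir[1] - x[1] * z_dir[0], det)
    t = Fraction(y_dir[0] * x[1] - y_dir[1] * x[0], det)
    return (s * y_dir[0], s * y_dir[1]), (t * z_dir[0], t * z_dir[1])


x = (3, 2)
y1, z1 = split(x, (1, 0), (0, 1))   # y1 = (3, 0), z1 = (0, 2)
y2, z2 = split(x, (1, 0), (1, 1))   # y2 = (1, 0), z2 = (2, 2)
```

The $Y$-component depends on the chosen complement, which is exactly why $Z$ (and the decomposition) is not unique.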
Definition 1.1.9. Let $X$ and $Y$ be linear spaces over the same field $\Lambda$. A mapping $A\colon X \to Y$ is said to be a linear operator if it possesses the following properties:
(1) $A(x + y) = Ax + Ay$ for all $x, y \in X$;
(2) $A(\alpha x) = \alpha Ax$ for every $\alpha \in \Lambda$, $x \in X$.
The collection of all linear operators from $X$ into $Y$ is denoted by $\mathcal{L}(X, Y)$. We will use the simpler notation $\mathcal{L}(X)$ if $X = Y$.
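Remark 1.1.10.
(i) A linear operator $A \in \mathcal{L}(X, Y)$ is uniquely determined by its values on the elements of a basis $\{e_\gamma\}_{\gamma \in \Gamma}$. Indeed, let $f_\gamma := Ae_\gamma$, $\gamma \in \Gamma$, and
$$x = \sum_{\gamma \in K} \xi_\gamma e_\gamma, \quad K \text{ finite}.$$
If $A$ is linear, then $Ax$ has to be equal to $\sum_{\gamma \in K} \xi_\gamma f_\gamma$. On the other hand, if $\{f_\gamma\}_{\gamma \in \Gamma}$ are given, then
$$Ax := \sum_{\gamma \in K} \xi_\gamma f_\gamma \quad \text{for} \quad x = \sum_{\gamma \in K} \xi_\gamma e_\gamma$$
satisfies (1) and (2) from Definition 1.1.9.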


(ii) Assume that both $X$ and $Y$ are finite dimensional spaces and $\{e_1, \dots, e_M\}$ and $\{f_1, \dots, f_N\}$ are bases of $X$ and $Y$, respectively. If $A \in \mathcal{L}(X, Y)$, then
$$Ae_j = \sum_{i=1}^{N} a_{ij} f_i, \quad j = 1, \dots, M, \tag{1.1.1}$$
for some scalars $a_{ij}$. These scalars form a matrix
$$\mathbf{A} = (a_{ij})_{\substack{i=1,\dots,N\\ j=1,\dots,M}}$$
($N$ rows and $M$ columns; the $j$th column consists of the coordinates of $Ae_j$). This matrix $\mathbf{A}$ is called the matrix representation of the linear operator $A$ with respect to the bases $\{e_1, \dots, e_M\}$ and $\{f_1, \dots, f_N\}$.
On the other hand, if $\{e_1, \dots, e_M\}$ and $\{f_1, \dots, f_N\}$ are bases of $X$ and $Y$, respectively, and $\mathbf{A}$ is an $N \times M$ matrix, then formula (1.1.1) determines a linear operator $A \in \mathcal{L}(X, Y)$.
(iii) If $A, B \in \mathcal{L}(X, Y)$ have matrix representations $\mathbf{A}$ and $\mathbf{B}$ (with respect to the same bases), then
$$\mathbf{A} + \mathbf{B} := (a_{ij} + b_{ij})_{\substack{i=1,\dots,N\\ j=1,\dots,M}}$$
is the matrix representation of
$$A + B\colon x \mapsto Ax + Bx.$$
Similarly,
$$\alpha\mathbf{A} := (\alpha a_{ij})_{\substack{i=1,\dots,N\\ j=1,\dots,M}}$$
is the matrix representation of
$$\alpha A\colon x \mapsto \alpha Ax.$$
It is obvious that $\mathcal{L}(X, Y)$ is a linear space (over the same scalar field $\Lambda$) under these definitions of $A + B$, $\alpha A$. This is true without any restrictions on the dimensions of $X$ and $Y$.
(iv) If $X$, $Y$, $Z$ are linear spaces over the same scalar field $\Lambda$ and $A \in \mathcal{L}(X, Y)$, $B \in \mathcal{L}(Y, Z)$, then
$$BA\colon x \mapsto B(Ax), \quad x \in X,$$
is a linear operator from $X$ into $Z$. Moreover, if $X$, $Y$, $Z$ are finite dimensional spaces and $\mathbf{A} = (a_{ij})_{\substack{i=1,\dots,N\\ j=1,\dots,M}}$, $\mathbf{B} = (b_{ki})_{\substack{k=1,\dots,P\\ i=1,\dots,N}}$ are matrix representations of $A$ and $B$, respectively, then
$$\mathbf{B}\mathbf{A} := \Big(\sum_{i=1}^{N} b_{ki} a_{ij}\Big)_{\substack{k=1,\dots,P\\ j=1,\dots,M}}$$
is the matrix representation of $BA$. This product of operators is non-commutative in general, even in the case $X = Y = Z$.
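The recipe of Remark 1.1.10 can be traced in a few lines of Python. This is a sketch of our own, assuming $X = \mathbb{R}^M$, $Y = \mathbb{R}^N$, $Z = \mathbb{R}^P$ with standard bases, so that the coordinates of $Ae_j$ are simply the entries of the vector $Ae_j$; the operators `op_a`, `op_b` and the helper names are hypothetical. It builds each matrix column by column as in (1.1.1), checks that the matrix of $BA$ is the product from (iv), and exhibits non-commutativity on $\mathbb{R}^2$.

```python
# Sketch (assumption: standard bases, so coordinates of A e_j = entries of A e_j).


def matrix_of(op, m):
    """N x M matrix whose j-th column holds the coordinates of op(e_j), cf. (1.1.1)."""
    columns = [op([1 if k == j else 0 for k in range(m)]) for j in range(m)]
    n = len(columns[0])
    return [[columns[j][i] for j in range(m)] for i in range(n)]


def matmul(b, a):
    """(BA)_{kj} = sum_i b_{ki} a_{ij}."""
    return [[sum(b[k][i] * a[i][j] for i in range(len(a)))
             for j in range(len(a[0]))]
            for k in range(len(b))]


def op_a(x):  # hypothetical A: R^2 -> R^3
    return [x[0] + x[1], 2 * x[1], -x[0]]


def op_b(y):  # hypothetical B: R^3 -> R^2
    return [y[0] - y[2], y[1]]


A_mat, B_mat = matrix_of(op_a, 2), matrix_of(op_b, 3)
BA_mat = matrix_of(lambda x: op_b(op_a(x)), 2)
assert BA_mat == matmul(B_mat, A_mat)   # the matrix of BA is the product

# Non-commutativity already for X = Y = Z = R^2:
S = [[0, 1], [0, 0]]
T = [[0, 0], [1, 0]]
print(matmul(S, T), matmul(T, S))  # [[1, 0], [0, 0]] [[0, 0], [0, 1]]
```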


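For $A \in \mathcal{L}(X, Y)$ we denote by
$$\operatorname{Ker} A := \{x \in X : Ax = o\}$$
the kernel of $A$, and by
$$\operatorname{Im} A := \{Ax : x \in X\}$$
the image of $A$. Evidently, $\operatorname{Ker} A$ and $\operatorname{Im} A$ are linear subspaces of $X$ and $Y$, respectively.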
Definition 1.1.11. A linear operator $A \in \mathcal{L}(X, Y)$ is said to be
(1) injective if $\operatorname{Ker} A = \{o\}$,
(2) surjective if $\operatorname{Im} A = Y$,
(3) an isomorphism if $A$ is both injective and surjective.
Remark 1.1.12.
(i) If $A \in \mathcal{L}(X, Y)$ is injective and $e_1, \dots, e_n$ are linearly independent elements of $X$, then $Ae_1, \dots, Ae_n$ are linearly independent elements of $Y$. Further, $A \in \mathcal{L}(X, Y)$ is an isomorphism if and only if $\{Ae_\gamma\}_{\gamma \in \Gamma}$ is a basis of $Y$ whenever $\{e_\gamma\}_{\gamma \in \Gamma}$ is a basis of $X$. In other words:
linear spaces $X$, $Y$ (over the same scalar field $\Lambda$) have the same dimension if and only if there is an isomorphism $A \in \mathcal{L}(X, Y)$.
(ii) Assume that $A \in \mathcal{L}(X, Y)$ is an isomorphism and put $A^{-1}y = x$ where $y = Ax$. Then $A^{-1} \in \mathcal{L}(Y, X)$ and
$$AA^{-1} = I_Y, \qquad A^{-1}A = I_X,$$
where $I_X$ and $I_Y$ denote the identity maps on $X$ and $Y$, respectively. $A^{-1}$ is called the inverse of $A$. If $X = Y$ and $\mathbf{A}$ is a matrix representation of $A$, then $A^{-1}$ has the inverse matrix $\mathbf{A}^{-1}$ as its representation in the same bases.
(iii) (Transformation of coordinates in a finite dimensional space)
Let $E = \{e_1, \dots, e_M\}$ and $F = \{f_1, \dots, f_M\}$ be two bases of a linear space $X$. There are two questions:
(a) What is the relation between the coordinates of a given $x \in X$ with respect to these bases?
(b) Let $A \in \mathcal{L}(X)$ have matrix representations $\mathbf{A}_E$ and $\mathbf{A}_F$ with respect to these bases. What is the relation between $\mathbf{A}_E$ and $\mathbf{A}_F$?
The answer to the first question is easy: Put
$$Te_j = f_j, \quad j = 1, \dots, M,$$
and extend $T$ to a linear operator on $X$. Then $T$ is an isomorphism. Denote by $\mathbf{T} = (t_{ij})_{\substack{i=1,\dots,M\\ j=1,\dots,M}}$ its matrix representation with respect to the basis $E$, i.e.,
$$Te_j = \sum_{i=1}^{M} t_{ij} e_i, \quad j = 1, \dots, M.$$
For $x = \sum_{j=1}^{M} \xi_j f_j$ we have
$$x = \sum_{j=1}^{M} \xi_j \sum_{i=1}^{M} t_{ij} e_i = \sum_{i=1}^{M} \Big(\sum_{j=1}^{M} t_{ij} \xi_j\Big) e_i.$$
This means that the column vector $\eta = (\eta_1, \dots, \eta_M)^{\mathsf{T}}$ of the coordinates of $x$ in the basis $E$ is given by $\eta = \mathbf{T}\xi$, where $\xi = (\xi_1, \dots, \xi_M)^{\mathsf{T}}$ and
$$\eta_i = \sum_{j=1}^{M} t_{ij} \xi_j.$$
The second question can be answered by the same method, but a certain caution in computation is desirable. Write $\mathbf{A}_E = (a^{(E)}_{ij})$ and $\mathbf{A}_F = (a^{(F)}_{ij})$. Then
$$Af_j = A\Big(\sum_{k=1}^{M} t_{kj} e_k\Big) = \sum_{k,i=1}^{M} t_{kj} a^{(E)}_{ik} e_i = \sum_{k=1}^{M} a^{(F)}_{kj} Te_k = \sum_{k,i=1}^{M} a^{(F)}_{kj} t_{ik} e_i.$$
This equality can be expressed in matrix notation as
$$\mathbf{A}_E \mathbf{T} = \mathbf{T} \mathbf{A}_F.$$
Since the matrix $\mathbf{T}$ has an inverse, we get
$$\mathbf{A}_F = \mathbf{T}^{-1} \mathbf{A}_E \mathbf{T}. \tag{1.1.2}$$
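Formula (1.1.2) and the relation $\eta = \mathbf{T}\xi$ can be checked on a small example. The Python sketch below is our own illustration under stated assumptions: $X = \mathbb{R}^2$, $E$ is the standard basis, $F = \{f_1, f_2\}$ with $f_1 = e_1 + e_2$, $f_2 = e_1 - e_2$, and the matrix `A_E` is hypothetical data. The check recomputes $Af_j$ directly instead of trusting (1.1.2).

```python
# Sketch: verify (1.1.2) for a hypothetical operator on R^2.
from fractions import Fraction


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


T = [[1, 1], [1, -1]]     # j-th column = E-coordinates of f_j, i.e. T e_j = f_j
A_E = [[2, 1], [0, 3]]    # matrix of A in the basis E (hypothetical data)
A_F = matmul(inverse_2x2(T), matmul(A_E, T))   # formula (1.1.2)

# Independent check: the j-th column of A_F must hold the F-coordinates of A f_j.
f = [[1, 1], [1, -1]]     # f[j] = E-coordinates of f_{j+1}
for j in range(2):
    a_fj = [sum(A_E[i][k] * f[j][k] for k in range(2)) for i in range(2)]
    rebuilt = [sum(A_F[k][j] * f[k][i] for k in range(2)) for i in range(2)]
    assert rebuilt == a_fj

# Coordinates: x = 2 f1 + 5 f2 has E-coordinates eta = T xi.
xi = [2, 5]
eta = [sum(T[i][j] * xi[j] for j in range(2)) for i in range(2)]
# Here A_F equals [[3, -1], [0, 2]] and eta equals [7, -3].
```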

Example 1.1.13.
(i) Let $X = Y \oplus Z$. Define
$$Px = y \quad \text{where } x = y + z,\ y \in Y,\ z \in Z.$$
Then $P$ is the so-called projection of $X$ onto $Y$ and has the following properties:
(a) $P^2 := P \circ P = P$,
(b) $\operatorname{Ker} P = Z$.
It is easy to see that the properties (a), (b) determine uniquely the projection $P$ and hence also the decomposition $X = Y \oplus Z$ ($Y = \operatorname{Im} P$).
(ii) Let $Y$ be a subspace of $X$. For $x \in X$ put
$$[x] := x + Y = \{x + y : y \in Y\}.$$
If $x, y \in X$, then either $[x] = [y]$ ($\iff x - y \in Y$) or $[x] \cap [y] = \emptyset$. Define
$$[x] + [y] := [x + y], \qquad \alpha[x] := [\alpha x] \quad \text{for } x, y \in X,\ \alpha \in \Lambda.$$
These operations are well defined and endow the set
$$X|_Y := \{[x] : x \in X\}$$
with the structure of a linear space. The space $X|_Y$ is called a factor space or simply a $Y$-factor. Put
$$\pi\colon x \mapsto [x], \quad x \in X.$$
Then $\pi$ (the so-called canonical embedding of $X$ onto $X|_Y$) is a linear, surjective operator from $X$ onto $X|_Y$, and $\operatorname{Ker} \pi = Y$. If $x = y + z$ where $y \in Y$, $z \in Z$ and $X = Y \oplus Z$, then the mapping $j\colon [x] \mapsto z$ is an isomorphism of $X|_Y$ onto $Z$. In particular, $X|_Y$ and $Z$ have the same dimension. The dimension of $X|_Y$ is sometimes called the codimension of $Y$ ($\operatorname{codim} Y$) and
$$\dim X = \dim Y + \operatorname{codim} Y. \tag{1.1.3}$$

Warning. If $X$ is an infinite dimensional space, then the right-hand side of (1.1.3) is a sum of cardinal numbers, at least one of which is infinite!
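Properties (a) and (b) of the projection in Example 1.1.13 (i) can be seen concretely in $\mathbb{R}^2$. In this sketch of our own (helper names hypothetical) we take $Y = \operatorname{Lin}\{(1,0)\}$ and two different complements $Z$; each choice gives a different matrix $P$ in the standard basis, and both satisfy $P^2 = P$ and $P = 0$ on $Z$.

```python
# Sketch: the projection P of R^2 onto Y = Lin{y_dir} along Z = Lin{z_dir},
# written as a matrix in the standard basis.
from fractions import Fraction


def projection_matrix(y_dir, z_dir):
    det = y_dir[0] * z_dir[1] - y_dir[1] * z_dir[0]
    if det == 0:
        raise ValueError("Lin{y_dir} and Lin{z_dir} do not give a direct sum")
    # s(x) = (x[0] z_dir[1] - x[1] z_dir[0]) / det is the Y-coordinate of x,
    # and P x = s(x) * y_dir.
    s_row = [Fraction(z_dir[1], det), Fraction(-z_dir[0], det)]
    return [[y_dir[i] * s_row[j] for j in range(2)] for i in range(2)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply_matrix(p, x):
    return [sum(p[i][k] * x[k] for k in range(2)) for i in range(2)]


P1 = projection_matrix((1, 0), (0, 1))   # along Z1 = Lin{(0, 1)}
P2 = projection_matrix((1, 0), (1, 1))   # along Z2 = Lin{(1, 1)}
for P, z_dir in ((P1, (0, 1)), (P2, (1, 1))):
    assert matmul(P, P) == P                  # property (a): P^2 = P
    assert apply_matrix(P, z_dir) == [0, 0]   # z_dir lies in Ker P, cf. (b)
# Same Y, different complements, different projections: P1 != P2.
```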
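Proposition 1.1.14. Let $A \in \mathcal{L}(X, Y)$ and let $\pi$ be the canonical embedding of $X$ onto $X|_{\operatorname{Ker} A}$. If $\tilde{A}[x] := Ax$, then $\tilde{A}$ is injective and the diagram in Figure 1.1.1 is commutative, i.e., $A = \tilde{A}\pi$.
Figure 1.1.1. Commutative diagram: $A\colon X \to Y$ factors as $\pi\colon X \to X|_{\operatorname{Ker} A}$ followed by $\tilde{A}\colon X|_{\operatorname{Ker} A} \to Y$.
Proof. The assertion is obvious, but do not forget to prove that $\tilde{A}$ is well defined.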

Corollary 1.1.15. Let $A \in \mathcal{L}(X, Y)$. Then
$$\dim X = \dim \operatorname{Ker} A + \dim \operatorname{Im} A. \tag{1.1.4}$$
In particular, if $X = Y$ and $\dim X < \infty$, then $A \in \mathcal{L}(X, Y)$ is injective if and only if it is surjective.

Proof. We have
$$\operatorname{codim} \operatorname{Ker} A = \dim X|_{\operatorname{Ker} A} = \dim \operatorname{Im} \tilde{A} = \dim \operatorname{Im} A$$
since $\tilde{A}$ is an isomorphism of $X|_{\operatorname{Ker} A}$ onto its image. Equality (1.1.4) follows immediately from (1.1.3). If $A$ is injective, then
$$\dim X = \dim \operatorname{Im} A,$$
and this implies (only in the case of $X$ and $Y$ having the same finite dimension) that $Y = \operatorname{Im} A$. If $\operatorname{Im} A = Y$, then (finite dimensions!)
$$\dim \operatorname{Ker} A = 0,$$
i.e., $A$ is injective.
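Formula (1.1.4) can be checked mechanically for a matrix. The sketch below is our own illustration, assuming $X = \mathbb{Q}^M$, $Y = \mathbb{Q}^N$ with standard bases; it uses Gauss–Jordan elimination (a standard technique, not the proof above), reads $\dim \operatorname{Im} A$ off as the number of pivot columns, and builds a basis of $\operatorname{Ker} A$ from the free columns. The matrix `A` is hypothetical data.

```python
# Sketch: dim X = dim Ker A + dim Im A for a matrix over Q.
from fractions import Fraction


def rref(matrix):
    """Gauss-Jordan elimination over Q; returns (reduced rows, pivot columns)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots


def kernel_basis(matrix):
    """One kernel vector per free (non-pivot) column."""
    rows, pivots = rref(matrix)
    n = len(matrix[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -rows[r][free]
        basis.append(v)
    return basis


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]   # hypothetical A in L(Q^3)
_, pivots = rref(A)
kernel = kernel_basis(A)
dim_im, dim_ker = len(pivots), len(kernel)
assert dim_ker + dim_im == len(A[0])    # (1.1.4): dim X = dim Ker A + dim Im A
for v in kernel:                        # each basis vector really lies in Ker A
    assert all(sum(a * x for a, x in zip(row, v)) == 0 for row in A)
```

Here $\dim \operatorname{Im} A = 2 < 3$, so this $A$ is neither surjective nor injective, in agreement with the corollary for $X = Y$ of finite dimension.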

Example 1.1.16. Let $X$ be the space $l^\infty(\mathbb{N})$ of bounded (real) sequences and define the right shift
$$S_R\colon x = (x_1, x_2, \dots) \mapsto (0, x_1, x_2, \dots)$$
and the left shift
$$S_L\colon x = (x_1, x_2, \dots) \mapsto (x_2, x_3, \dots).$$
Then $S_R$ is injective but not surjective, and $S_L$ is surjective but not injective. Moreover,
$$S_L S_R x = x \quad \text{for every } x \in X.$$
What is $S_R S_L$?
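The claims about the shifts can be probed numerically. In this Python sketch (our own; the names are hypothetical) a sequence is modelled as a function $n \mapsto x_n$, $n \ge 1$, so infinite sequences never have to be stored. Checking finitely many terms is of course only an illustration, not a proof, and the question about $S_R S_L$ is left to the reader.

```python
# Sketch: a sequence x = (x_1, x_2, ...) is modelled as a function n -> x_n.
from typing import Callable, List

Seq = Callable[[int], float]


def right_shift(x: Seq) -> Seq:
    """S_R x = (0, x_1, x_2, ...)."""
    return lambda n: 0.0 if n == 1 else x(n - 1)


def left_shift(x: Seq) -> Seq:
    """S_L x = (x_2, x_3, ...)."""
    return lambda n: x(n + 1)


def prefix(x: Seq, length: int = 6) -> List[float]:
    return [x(n) for n in range(1, length + 1)]


def harmonic(n: int) -> float:    # hypothetical bounded test sequence x_n = 1/n
    return 1.0 / n


def first_unit(n: int) -> float:  # the sequence (1, 0, 0, ...)
    return 1.0 if n == 1 else 0.0


# S_L S_R x = x on every inspected term.
assert prefix(left_shift(right_shift(harmonic))) == prefix(harmonic)
# S_L is not injective: it maps (1, 0, 0, ...) to the zero sequence.
assert prefix(left_shift(first_unit)) == [0.0] * 6
# S_R is not surjective: every S_R x starts with 0, so (1, 0, 0, ...) is missed.
assert right_shift(harmonic)(1) == 0.0
```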

The following special case of linear operators plays an important role both
in the theory of linear spaces and in applications.
Definition 1.1.17. Let $X$ be a linear space over a field $\Lambda$. A linear operator from $X$ into $\Lambda$ is called a linear form. The linear space of all linear forms on $X$ is called the (algebraic) dual space of $X$ and is denoted by $X^\#$.
Example 1.1.18.
(i) Let $\{e_1, \dots, e_M\}$ be a basis of $X$, i.e., for every $x$ there is a unique $M$-tuple $(\xi_1, \dots, \xi_M) \in \Lambda^M$ (coordinates of $x$) such that $x = \sum_{i=1}^{M} \xi_i e_i$. The mapping $e_i^\#\colon x \mapsto \xi_i$ is a linear form (the $i$th coordinate form). It is straightforward to show that $e_1^\#, \dots, e_M^\#$ are linearly independent and
$$\operatorname{Lin}\{e_1^\#, \dots, e_M^\#\} = X^\#,$$
i.e., $\{e_1^\#, \dots, e_M^\#\}$ is a basis of $X^\#$ (the so-called dual basis of $X^\#$, dual to $\{e_1, \dots, e_M\}$).

1.1. Elements of Linear Algebra

11

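(ii) If $f \in X^\# \setminus \{o\}$, then
$$\operatorname{codim} \operatorname{Ker} f = 1.$$
To see this, choose $x_0 \in X$ such that $f(x_0) = 1$. Then
$$x = (x - f(x)x_0) + f(x)x_0 \in \operatorname{Ker} f \oplus \operatorname{Lin}\{x_0\}.$$
On the other hand, if $Y$ is a subspace of $X$ of codimension 1,⁴ then
$$X = Y \oplus \operatorname{Lin}\{x_0\} \quad \text{for some } x_0 \in X.$$
For $x = y + \alpha x_0$, $y \in Y$, we put $f(x) = \alpha$. Then
$$f \in X^\# \setminus \{o\} \quad \text{and} \quad \operatorname{Ker} f = Y.$$
Moreover, if $f, g \in X^\#$ are such that $\operatorname{Ker} f = \operatorname{Ker} g$, then there is an $\alpha \in \Gamma$
for which $f = \alpha g$.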
This fact has the following generalization, which will be used in Section 6.3,
more precisely in the proof of Theorem 6.3.2.
Proposition 1.1.19. Let $f_1, \dots, f_n, g$ be linear forms on $X$. Then
$$g \in \operatorname{Lin}\{f_1, \dots, f_n\} \quad \text{if and only if} \quad \bigcap_{i=1}^n \operatorname{Ker} f_i \subset \operatorname{Ker} g.$$
Proof. The "only if" part is obvious. For the "if" part notice that the assertion
$g \in \operatorname{Lin}\{f_1, \dots, f_n\}$ can be interpreted as the existence of a linear form $\varphi \in (\Gamma^n)^\#$
such that
$$g = \varphi \circ F, \quad \text{where } F(x) := (f_1(x), \dots, f_n(x)). \qquad (1.1.5)$$
Let $\Gamma^n = F(X) \oplus Y$ (Corollary 1.1.8). If $\eta = \xi + \zeta$, $\xi = F(x)$, $\zeta \in Y$, then the
mapping $\varphi(\eta) := g(x)$ is a well-defined linear form (by assumption). This means
that (1.1.5) holds.
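For $X = \mathbb{R}^m$ a linear form can be identified with its row of coefficients, and Proposition 1.1.19 turns into a rank statement. A minimal sketch, assuming that identification (the forms and the helper names `rank`, `in_span`, `dot` are arbitrary illustrations): here $\operatorname{Ker} f_1 \cap \operatorname{Ker} f_2$ is spanned by $(1, 1, -1)$, and $g$ lies in $\operatorname{Lin}\{f_1, f_2\}$ exactly when it vanishes there.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of row vectors by Gaussian elimination over Q."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def in_span(forms, g):
    """g lies in Lin{f_1, ..., f_n} iff adding g to the list does not raise the rank."""
    return rank(forms + [g]) == rank(forms)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Linear forms on R^3 as coefficient rows: f(x) = row . x  (arbitrary examples).
f1, f2 = [1, 0, 1], [0, 1, 1]
common = [1, 1, -1]      # spans Ker f1 ∩ Ker f2
g_good = [2, -3, -1]     # = 2 f1 - 3 f2
g_bad = [1, 0, 0]        # does not vanish at `common`
print(dot(g_good, common), in_span([f1, f2], g_good))   # 0 True
print(dot(g_bad, common), in_span([f1, f2], g_bad))     # 1 False
```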

Definition 1.1.20. Let $A \in L(X, Y)$ and $g \in Y^\#$. Then the linear form $f(x) :=
g(Ax)$ is denoted by $f = A^\# g$, and $A^\#$ is called the adjoint operator to $A$.
Remark 1.1.21.
(i) $A^\# \in L(Y^\#, X^\#)$.
(ii) If $A$ has a matrix representation $\mathbf{A} = (a_{ij})_{i=1,\dots,N;\ j=1,\dots,M}$ with respect to bases
$E = \{x_1, \dots, x_M\}$ in $X$ and $F = \{y_1, \dots, y_N\}$ in $Y$, then the adjoint operator
$A^\#$ has the representation
$$\mathbf{A}^\# = (a_{ji})_{j=1,\dots,M;\ i=1,\dots,N}$$
(i.e., $\mathbf{A}^\#$ is the transpose of $\mathbf{A}$) with respect to the dual bases.
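Both the dual basis of Example 1.1.18(i) and Remark 1.1.21(ii) can be seen in coordinates. In the following minimal sketch (arbitrary example matrices; the helper names are ours), the coordinate forms $e_i^\#$ of a basis of $\mathbb{R}^3$ are the rows of the inverse of the matrix whose columns are $e_1, e_2, e_3$. In the standard bases, the coefficient row of $A^\# g$ is obtained by applying the transpose of $\mathbf{A}$ to $g$.

```python
from fractions import Fraction


def inverse(a):
    """Inverse of a regular square matrix over Q by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, x):
    return [sum(Fraction(p) * q for p, q in zip(row, x)) for row in a]


# (1) Dual basis: the columns of E form a basis e_1, e_2, e_3 of R^3 (arbitrary example);
#     the i-th row of E^{-1} holds the coefficients of the coordinate form e_i^#.
E = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
dual = inverse(E)
basis = transpose(E)                                   # basis[j] = e_j
delta = [[sum(d * e for d, e in zip(dual[i], basis[j])) for j in range(3)]
         for i in range(3)]                            # delta[i][j] = e_i^#(e_j)

# (2) Adjoint: A : R^3 -> R^2 in the standard bases; g in (R^2)^# is a coefficient row.
A = [[1, 2, 3],
     [4, 5, 6]]
g, x = [7, -1], [1, 0, 2]
lhs = sum(gi * yi for gi, yi in zip(g, matvec(A, x)))           # (A^# g)(x) = g(Ax)
rhs = sum(c * xi for c, xi in zip(matvec(transpose(A), g), x))  # transpose applied to g
print(delta == [[1, 0, 0], [0, 1, 0], [0, 0, 1]], lhs, rhs)     # True 33 33
```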


4 Such a subspace is often called a hyperplane.


Warning. We will encounter different adjoint operators in the next section,
and the adjoint $A^*$ with respect to a scalar product will have a different
representation in a complex space!
Now we turn our attention to a system of linear equations
$$\sum_{j=1}^M a_{ij} x_j = b_i, \quad i = 1, \dots, N. \qquad (1.1.6)$$
This system can be written in a more compact form, namely as
$$\mathbf{A}x = b \qquad (1.1.7)$$
where $\mathbf{A}$ is a matrix representation of the linear operator $A$ from $X$ into $Y$. By
choosing fixed bases $E = \{e_1, \dots, e_M\}$ in $X$ and $F = \{f_1, \dots, f_N\}$ in $Y$ (also
$Y = \mathbb{R}^N$ or $\mathbb{C}^N$), $A$ is defined by its matrix representation $\mathbf{A} = (a_{ij})_{i=1,\dots,N;\ j=1,\dots,M}$ with
respect to these bases. In order to formulate results on solvability of (1.1.6) (or,
equivalently, of (1.1.7)) the following notation will be useful.
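Notation. If $U$ is a subset of $X$ (not necessarily a subspace of $X$), then
$$U^\perp = \{f \in X^\# : f(x) = 0 \text{ for all } x \in U\}.$$
Similarly,
$$W_\perp = \{x \in X : f(x) = 0 \text{ for all } f \in W\} \quad \text{for } W \subset X^\#.$$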

Proposition 1.1.22.
(i) $(U^\perp)_\perp = \operatorname{Lin} U$ for every $U \subset X$.
(ii) If $\dim X < \infty$, then $(W_\perp)^\perp = \operatorname{Lin} W$ for all $W \subset X^\#$.
Proof. We include the proof because it contains a construction which should be
compared with an analogous one in Section 2.1 (see Proposition 2.1.27 and its
proof).
(i) We can assume $U$ to be a subspace of $X$ since $U^\perp = (\operatorname{Lin} U)^\perp$. The inclusion $U \subset (U^\perp)_\perp$ is obvious. To prove the reverse inclusion, let us suppose by contradiction
that there is an element $x_0 \in (U^\perp)_\perp \setminus U$. By the method of proof of Theorem 1.1.3,
a subspace $Y$ of $X$ can be found such that
$$X = \operatorname{Lin}\{x_0\} \oplus Y \quad \text{and} \quad U \subset Y.$$
According to Example 1.1.18(ii) there exists $f \in X^\#$ with $\operatorname{Ker} f = Y$. In particular, $f \in U^\perp$ and $f(x_0) \neq 0$, which contradicts the choice of $x_0$.
(ii) This part follows from (i) by replacing $X$ by $X^\#$. To repeat the proof we
need that $(X^\#)^\#$ can be identified with $X$. We note that this is possible because
$\dim X < \infty$.



The main idea, separation of a point from a subspace by a linear form
(i.e., by a hyperplane), can be substantially generalized.
Definition 1.1.23. A subset $C$ of a (real or complex) linear space $X$ is called convex
if for every $x, y \in C$, $t \in [0, 1]$, the point
$$tx + (1 - t)y$$
belongs to $C$.

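Proposition 1.1.24. Let $X$ be a real linear space and $C \neq \emptyset$ a convex subset of $X$ with
a nonempty algebraic interior
$$C^0 := \{a \in C : \forall y \in X\ \exists t_0 > 0 \text{ such that } a + ty \in C \text{ for all } t \in [0, t_0)\}.$$
Let $x_0 \in X \setminus C$. Then there is $f \in X^\# \setminus \{o\}$ such that
$$f(x) \le f(x_0) \quad \text{for all } x \in C.$$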

Proof. It needs a special tool for the treatment of convex sets and a considerably
more sophisticated extension procedure,⁵ and, therefore, it is omitted. See, e.g.,
Rockafellar [109, §11], where the interested reader can also find applications to
convex optimization, and also Corollary 2.1.18.

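Theorem 1.1.25. For $A \in L(X, Y)$ we have
(i) $\operatorname{Im} A = (\operatorname{Ker} A^\#)_\perp$,
(ii) $\operatorname{Im} A^\# = (\operatorname{Ker} A)^\perp$.
(iii) If, moreover, $\dim X = \dim Y < \infty$, then
$$\dim \operatorname{Ker} A = \dim \operatorname{Ker} A^\#. \qquad (1.1.8)$$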

Proof. (i) It is straightforward to prove both the inclusions which lead to the
equality $(\operatorname{Im} A)^\perp = \operatorname{Ker} A^\#$. The result then follows from Proposition 1.1.22(i).
(ii) Let $Y = \operatorname{Im} A \oplus Z$ (Corollary 1.1.8). For $f \in (\operatorname{Ker} A)^\perp$ and $y = Ax + z$,
$z \in Z$, put $g(y) = f(x)$. This definition does not depend on a concrete choice
of $x$ since $f \in (\operatorname{Ker} A)^\perp$. This proves that $f = A^\# g$ and hence the inclusion
$(\operatorname{Ker} A)^\perp \subset \operatorname{Im} A^\#$ holds. The reverse inclusion is trivial.
(iii) Observe first that $(X|U)^\#$ is isomorphic to $U^\perp$ for any subspace $U$ of
$X$, namely,
$$(\Phi F)(x) := F([x]), \quad F \in (X|U)^\#,$$
is the desired isomorphism. If $\dim X < \infty$, then $X|U$ is isomorphic to $(X|U)^\#$
(both spaces have the same dimension) and, therefore, $X|U$ is isomorphic to $U^\perp$.
Now, we apply this observation to $U = \operatorname{Ker} A$. We recall that $\operatorname{Im} A$ is isomorphic to $X|\operatorname{Ker} A$ (Proposition 1.1.14) and therefore to $(\operatorname{Ker} A)^\perp$. By (ii), $\operatorname{Im} A$ is
isomorphic to $\operatorname{Im} A^\#$. The equality (1.1.8) follows from Corollary 1.1.15.

5 See Corollary 2.1.18 for a similar process.


Remark 1.1.26.
(i) Note that Theorem 1.1.25(i) is an existence result for the equation (1.1.6)
(or (1.1.7)) because it can be reformulated as follows:
The equation (1.1.6) has a solution for $b = (b_1, \dots, b_N)$ if and only if
$$\sum_{i=1}^N b_i f_i = 0$$
for all solutions $f = (f_1, \dots, f_N)$ of the adjoint homogeneous equation
$$\sum_{i=1}^N a_{ij} f_i = 0, \quad j = 1, \dots, M.$$
In particular, we also have the alternative result:
Either the equation (1.1.6) has a solution for all right-hand sides
or⁶ the adjoint homogeneous equation has a nontrivial solution.
Theorem 1.1.25(ii) can be reformulated similarly.
(ii) If $\mathbf{A}$ is a matrix representation of $A \in L(X, Y)$ ($X$ and $Y$ are finite dimensional spaces), then $\dim \operatorname{Im} A$ is equal to the number of linearly independent
columns of $\mathbf{A}$ and is called the rank of $\mathbf{A}$. If $X = Y$, then $\mathbf{A}$ is a square
matrix of the type $M \times M$ ($M = \dim X$), and it is called a regular matrix
provided
$$M = \operatorname{rank} \mathbf{A}.$$
Equivalently, $\mathbf{A}$ is a regular matrix if and only if its determinant $\det \mathbf{A}$ does
not vanish. By the proof of Theorem 1.1.25(iii),
$$\dim \operatorname{Im} A = \dim \operatorname{Im} A^\#.$$
In particular, this means that the rank of $\mathbf{A}$ is equal to the rank of its transpose. The reader is asked to find more matrix formulations of the previous
results.
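The equivalence in Remark 1.1.26(i) can be tested on small examples. The following minimal sketch (the matrix, right-hand sides and helper names are arbitrary choices of ours) compares the rank criterion $\operatorname{rank} \mathbf{A} = \operatorname{rank}(\mathbf{A}\,|\,b)$ with the orthogonality of $b$ to a basis of the solutions of the adjoint homogeneous equation $\sum_i a_{ij} f_i = 0$. By linearity, a basis suffices.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form over Q; returns (matrix, pivot columns)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def kernel_basis(rows):
    """One kernel vector per non-pivot column of the RREF."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -m[i][free]
        basis.append(v)
    return basis


def transpose(a):
    return [list(col) for col in zip(*a)]


def solvable(a, b):
    """Ax = b has a solution iff rank A = rank (A | b)."""
    augmented = [row + [bi] for row, bi in zip(a, b)]
    return len(rref(a)[1]) == len(rref(augmented)[1])


def orthogonal_to_adjoint_kernel(a, b):
    """sum_i b_i f_i = 0 for every f in a basis of the solutions of A^T f = 0."""
    return all(sum(Fraction(bi) * fi for bi, fi in zip(b, f)) == 0
               for f in kernel_basis(transpose(a)))


A = [[1, 2],
     [2, 4],
     [0, 1]]          # N = 3 equations, M = 2 unknowns (arbitrary example)
for b in ([1, 2, 5], [1, 3, 5]):
    print(b, solvable(A, b), orthogonal_to_adjoint_kernel(A, b))
# [1, 2, 5] True True
# [1, 3, 5] False False
```

Here the adjoint homogeneous equation has the nontrivial solution $f = (-2, 1, 0)$, so, in line with the alternative, (1.1.6) is not solvable for every right-hand side.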
We often do calculations with a matrix representation instead of the operator
itself. Since there are plenty of representations of the same operator it would be
convenient to work with the simplest possible form. To examine this problem we
start with some notions.
Definition 1.1.27. Let $X$ be a complex linear space and $A \in L(X)$. A complex
number $\lambda$ is called an eigenvalue of $A$ if there is $x \neq o$ such that
$$Ax = \lambda x.$$
Such an element $x$ is called an eigenvector of $A$ (associated with the eigenvalue $\lambda$).
The set of all eigenvalues of $A$ is called the spectrum of $A$ and is denoted by $\sigma(A)$.
6 The conjunction "or" has an exclusive character here. This alternative result is sometimes called
a Fredholm alternative since I. Fredholm proved such a result for linear integral equations. See
also Section 2.2.


Warning. In infinite dimensions the spectrum of a linear operator can also contain
points other than the eigenvalues and is defined in a different way (see page 56)!
Remark 1.1.28. It is obvious that the following statements are equivalent in a
finite dimensional complex space $X$:
$$\lambda \in \sigma(A) \iff \operatorname{Ker}(\lambda I - A) \neq \{o\} \iff \det(\lambda \mathbf{I} - \mathbf{A}) = 0 \iff \operatorname{rank}(\lambda \mathbf{I} - \mathbf{A}) < \dim X,$$
where $\mathbf{A}$ is a representation of $A$.
Since $P(z) := \det(z\mathbf{I} - \mathbf{A})$ is a polynomial (the so-called characteristic polynomial) of degree $M = \dim X$, the problem of finding $\sigma(A)$ is equivalent to solving
an algebraic equation (the so-called characteristic equation of $A$)
$$P(z) = 0. \qquad (1.1.9)$$
According to the Fundamental Theorem of Algebra (see Theorem 4.3.111) there
exists at least one solution of (1.1.9) in $\mathbb{C}$. The reason for considering complex
spaces here is the fact that (1.1.9) need not have a real solution. It is an easy
consequence of the Fundamental Theorem of Algebra that the polynomial $P$ can
be written in the form
$$P(\lambda) = (\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_k)^{m_k} \qquad (1.1.10)$$
where $\sigma(A) = \{\lambda_1, \dots, \lambda_k\}$ ($\lambda_1, \dots, \lambda_k$ are different) and $m_1 + \dots + m_k = \dim X$.
The positive integer $m_i$ is called the multiplicity of the eigenvalue $\lambda_i$.
Definition 1.1.29. Let $A \in L(X)$.
(1) A subspace $Y \subset X$ is said to be $A$-invariant if $A(Y) \subset Y$.
(2) An $A$-invariant subspace $Y \subset X$ is said to reduce $A$ if there is a decomposition
$X = Y \oplus Z$ where $Z$ is also $A$-invariant.
From now on until the end of this section we consider exclusively finite dimensional spaces.
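Example 1.1.30.
(i) Let $X = Y \oplus Z$ where both $Y$ and $Z$ are $A$-invariant. If $\{e_1, \dots, e_m\}$ is a
basis of $Y$ and $\{e_{m+1}, \dots, e_M\}$ is a basis of $Z$, then the matrix representation
$\mathbf{A}$ of $A$ with respect to $\{e_1, \dots, e_M\}$ has the block form
$$\mathbf{A} = \begin{pmatrix} \mathbf{A}_Y & \mathbf{O} \\ \mathbf{O} & \mathbf{A}_Z \end{pmatrix}$$
where $\mathbf{A}_Y$ and $\mathbf{A}_Z$ are representations of the restrictions of $A$ to $Y$ and $Z$,
respectively.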


(ii) Assume that there is a basis $\{e_1, \dots, e_M\}$ of $X$ consisting of eigenvectors
of $A \in L(X)$ and $Ae_i = \lambda_i e_i$, $i = 1, \dots, M$ ($\lambda_1, \dots, \lambda_M$ are not necessarily
distinct). Then the matrix representation of $A$ with respect to this basis is
the diagonal matrix
$$\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_M \end{pmatrix}.$$
(iii) The matrix $\mathbf{A} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is a representation of a linear operator $A \in L(\mathbb{C}^2)$
which has no one-dimensional reducing subspace. Hence $A$ has no diagonal
representation.

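Because of the last example we have to improve our previous idea:
Choose $\lambda \in \sigma(A)$ and denote
$$N_k := \operatorname{Ker}(\lambda I - A)^k.$$
It is obvious that $N_k \subset N_{k+1}$ and that they cannot all be distinct. If $N_k = N_{k+1}$, then
$N_i = N_k$ for all $i > k$. Denote by $n(\lambda)$ the least such $k$ and set
$$N(\lambda) := \bigcup_{j=1}^{n(\lambda)} N_j = N_{n(\lambda)}, \qquad R(\lambda) := \operatorname{Im}(\lambda I - A)^{n(\lambda)}.$$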
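The stabilization of the chain $N_1 \subset N_2 \subset \cdots$ can be observed numerically. A minimal sketch, assuming exact rational arithmetic and computing $\dim N_k$ as $M - \operatorname{rank}(\lambda I - A)^k$ via (1.1.4); the matrix and the names `kernel_dims`, `n_lam` are illustrative. For the eigenvalue $2$ the chain stabilizes at $n(2) = 3$, and $\dim N(2) = 3$ agrees with the multiplicity, as Lemma 1.1.31(ii) asserts.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(rows):
    """Rank by Gaussian elimination over Q."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def kernel_dims(a, lam, kmax):
    """[dim N_1, ..., dim N_kmax] with N_k = Ker(lam I - A)^k, using dim Ker = M - rank."""
    m = len(a)
    b = [[Fraction(lam if i == j else 0) - a[i][j] for j in range(m)] for i in range(m)]
    power, dims = b, []
    for _ in range(kmax):
        dims.append(m - rank(power))
        power = matmul(power, b)
    return dims


# Arbitrary example: eigenvalue 2 of multiplicity 3 (one Jordan cell), eigenvalue 5 simple.
A = [[2, 1, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 5]]
dims = kernel_dims(A, 2, 5)
n_lam = next(k for k in range(1, len(dims)) if dims[k - 1] == dims[k])   # least k with N_k = N_{k+1}
print(dims, n_lam)   # [1, 2, 3, 3, 3] 3
```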

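Lemma 1.1.31. Let $A \in L(X)$ and $\lambda \in \sigma(A)$.
(i) Both $N(\lambda)$ and $R(\lambda)$ are $A$-invariant subspaces and the decomposition
$$X = N(\lambda) \oplus R(\lambda) \qquad (1.1.11)$$
holds.
(ii) Denote by $A|_N$ and $A|_R$ the restrictions of $A$ to $N(\lambda)$ and $R(\lambda)$, respectively.
Then
$$\sigma(A|_N) = \{\lambda\}, \qquad \sigma(A|_R) = \sigma(A) \setminus \{\lambda\}.$$
Moreover, the dimension of $N(\lambda)$ is equal to the multiplicity of the eigenvalue $\lambda$.
(iii) If $\sigma(A) = \{\lambda_1, \dots, \lambda_k\}$, then
$$X = N(\lambda_1) \oplus \cdots \oplus N(\lambda_k). \qquad (1.1.12)$$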

Proof. (i) Since $R(\lambda) \cap N(\lambda) = \{o\}$ (by the definition of $n(\lambda)$) and $\dim X =
\dim N(\lambda) + \dim R(\lambda)$ (Corollary 1.1.15), we deduce the decomposition (1.1.11). If
$$y = (\lambda I - A)^{n(\lambda)} x \in R(\lambda),$$

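then
$$Ay = -(\lambda I - A)y + \lambda y = -(\lambda I - A)^{n(\lambda)}(\lambda I - A)x + \lambda y \in R(\lambda).$$
The $A$-invariance of $N(\lambda)$ is also clear.
(ii) Obviously, $\sigma(A|_N) \subset \sigma(A)$. Let $\mu \in \sigma(A) \setminus \{\lambda\}$ and let $x$ be a corresponding eigenvector. By (1.1.11) we have $x = y + z$ where $y \in N(\lambda)$, $z \in R(\lambda)$.
Further,
$$o = (\mu I - A)x = (\lambda I - A)y + (\mu - \lambda)y + (\mu I - A)z.$$
By virtue of the uniqueness of the decomposition we have
$$(\lambda I - A)y = (\lambda - \mu)y.$$
This implies that
$$o = (\lambda I - A)^{n(\lambda)} y = (\lambda - \mu)(\lambda I - A)^{n(\lambda)-1} y,$$
i.e., $y \in \operatorname{Ker}(\lambda I - A)^{n(\lambda)-1}$.
By repeating this procedure we get
$$y \in \operatorname{Ker}(\lambda I - A)$$
and, therefore, $(\lambda - \mu)y = o$, i.e.,
$$y = o \quad \text{and} \quad x = z \in R(\lambda).$$
This shows that $\mu \notin \sigma(A|_N)$ and $\mu \in \sigma(A|_R)$. Since $N(\lambda) \cap R(\lambda) = \{o\}$, the
eigenvalue $\lambda$ does not belong to $\sigma(A|_R)$.
The matrix representation $\mathbf{A}$ of $A$ with respect to the basis formed by joining
the bases of $N(\lambda)$ and $R(\lambda)$ has the block form
$$\mathbf{A} = \begin{pmatrix} \mathbf{A}_N & \mathbf{O} \\ \mathbf{O} & \mathbf{A}_R \end{pmatrix}.$$
It follows that
$$\det(z\mathbf{I} - \mathbf{A}) = \det(z\mathbf{I} - \mathbf{A}_N) \det(z\mathbf{I} - \mathbf{A}_R),$$
and hence the characteristic polynomial of $\mathbf{A}_N$ is $P_N(z) = (z - \lambda)^{m(\lambda)}$ where $m(\lambda)$
is the multiplicity of the eigenvalue $\lambda$ of $A$. Therefore $\dim N(\lambda) = m(\lambda)$.
(iii) This follows by induction with respect to the eigenvalues of $A$.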

For a polynomial $P(z) = a_n z^n + \dots + a_1 z + a_0$ and $A \in L(X)$ we put
$$P(A) = a_n A^n + \dots + a_1 A + a_0 I.$$
Corollary 1.1.32 (Hamilton–Cayley). Let $A \in L(X)$ and let $P$ be the characteristic
polynomial of $A$. Then $P(A) = O$.


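Proof. Assume that $P$ has the form (1.1.10) and $x = x_1 + \dots + x_k$ is the decomposition given by (1.1.12). Since $m_k \ge n(\lambda_k)$ (the spaces $N_1 \subsetneq \dots \subsetneq N_{n(\lambda_k)}$ increase strictly and $\dim N(\lambda_k) = m_k$ by Lemma 1.1.31(ii)), we have $(A - \lambda_k I)^{m_k} x_k = o$ and hence
$$(A - \lambda_k I)^{m_k} x = \sum_{j=1}^{k-1} (A - \lambda_k I)^{m_k} x_j + o.$$
The result follows by induction.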
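Corollary 1.1.32 can be checked in exact arithmetic. The following sketch obtains the characteristic polynomial by the Faddeev–LeVerrier recursion (a standard method that the text does not use; it is our choice here, as are the helper names) and evaluates $P(A)$ by Horner's scheme. The matrix is an arbitrary example with $\sigma(A) = \{2, 3\}$, $m_1 = 2$, $m_2 = 1$.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def char_poly(a):
    """Coefficients [c_0, ..., c_M] of P(z) = det(zI - A), c_M = 1 (Faddeev-LeVerrier)."""
    m = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    c = [Fraction(0)] * m + [Fraction(1)]
    mk = [[Fraction(0)] * m for _ in range(m)]                 # M_0 = O
    for k in range(1, m + 1):
        am = matmul(a, mk)
        mk = [[am[i][j] + (c[m - k + 1] if i == j else 0) for j in range(m)]
              for i in range(m)]                               # M_k = A M_{k-1} + c_{M-k+1} I
        amk = matmul(a, mk)
        c[m - k] = -sum(amk[i][i] for i in range(m)) / k       # c_{M-k} = -tr(A M_k) / k
    return c


def poly_at(c, a):
    """P(A) = c_M A^M + ... + c_1 A + c_0 I by Horner's scheme."""
    m = len(a)
    result = [[Fraction(0)] * m for _ in range(m)]
    for coeff in reversed(c):
        result = matmul(result, a)
        for i in range(m):
            result[i][i] += coeff
    return result


A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]                   # sigma(A) = {2, 3} with m_1 = 2, m_2 = 1
c = char_poly(A)
print([int(v) for v in c])       # [-12, 16, -7, 1], i.e. (z - 2)^2 (z - 3)
print(all(v == 0 for row in poly_at(c, A) for v in row))   # True: P(A) = O
```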

It remains to compute the representation of the restriction of $\lambda_i I - A$ to
$N(\lambda_i)$. Notice that this restriction is nilpotent.⁷

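Lemma 1.1.33. Let $B \in L(X)$ be a nilpotent operator of order $n$. Then for any
$x \in X \setminus \operatorname{Ker} B^{n-1}$ the elements $x, Bx, \dots, B^{n-1}x$ are linearly independent and
the subspace
$$Y = \operatorname{Lin}\{x, Bx, \dots, B^{n-1}x\}$$
reduces $B$. The restriction $B|_Y$ of $B$ to $Y$ has the representation
$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}$$
with respect to the basis $\{B^{n-1}x, \dots, x\}$. There exists a $B$-invariant direct complement of $Y$, and the restriction of $B$ to such a complement is nilpotent of order
at most $n$.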
Proof. It is easy to see the linear independence of the elements $x, \dots, B^{n-1}x$.
Indeed, if
$$\sum_{j=0}^{n-1} \alpha_j B^j x = o,$$
then, by applying $B^{n-1}$, we get
$$\alpha_0 B^{n-1} x = o, \quad \text{i.e.,} \quad \alpha_0 = 0.$$
Repetition shows that $\alpha_j = 0$ for all $j = 0, 1, \dots, n-1$. The form of the representation
of $B|_Y$ is obvious. The existence of an invariant direct complement of $Y$ can be
proved by induction with respect to the order of nilpotency. We omit the details and
refer to, e.g., Halmos [64, §57].

We are now ready to summarize all the information and obtain the following fundamental result.


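Theorem 1.1.34 (Jordan Canonical Form). Let $X$ be a complex linear space of
finite dimension and let $A \in L(X)$. Assume that $\sigma(A) = \{\lambda_1, \dots, \lambda_k\}$. Then there
exists a basis $F$ of $X$ in which $A$ has the canonical block representation
$$\mathbf{A}^F = \begin{pmatrix} \mathbf{A}_1^{(1)} & & & & \mathbf{O} \\ & \ddots & & & \\ & & \mathbf{A}_1^{(l_1)} & & \\ & & & \ddots & \\ \mathbf{O} & & & & \mathbf{A}_k^{(l_k)} \end{pmatrix}$$
where the block matrices (the so-called Jordan cells) have the form
$$\mathbf{A}_j^{(i)} = \begin{pmatrix} \lambda_j & 1 & 0 & \cdots & 0 \\ 0 & \lambda_j & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda_j & 1 \\ 0 & \cdots & 0 & 0 & \lambda_j \end{pmatrix}, \quad i = 1, \dots, l_j,\ j = 1, \dots, k. \qquad (1.1.13)$$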

Remark 1.1.35.
(i) We can also interpret Theorem 1.1.34 as follows. Let $\mathbf{A}^E$ be the representation
of $A$ with respect to the basis $E$. By Remark 1.1.12(iii), there is a regular
transformation matrix $\mathbf{T}$ such that (1.1.2) holds. The canonical matrix $\mathbf{A}^F$
may be viewed as a representation of a $B \in L(X)$ with respect to the basis
$E$. Denote by $T$ a linear operator represented in the basis $E$ by the matrix $\mathbf{T}$.
Then one has
$$B = T^{-1} A T. \qquad (1.1.14)$$
(ii) Assume that $A \in L(X)$ where $X$ is a real linear space. The problem in the
application of Theorem 1.1.34 lies in the fact that the spectrum $\sigma(A) \subset \mathbb{R}$ is
not sufficient to guarantee the decomposition (1.1.12). This obstacle can be
overcome by the complexification $X_{\mathbb{C}}$ of $X$. Namely, $A$ is extendable to $X_{\mathbb{C}}$
by the formula
$$A_{\mathbb{C}}(x + iy) = Ax + iAy.$$
If $\lambda = \alpha + i\beta$, $\beta \neq 0$, is an eigenvalue of $A_{\mathbb{C}}$ with an eigenvector $u + iv$, then $u$
and $v$ are linearly independent in $X$, the complex conjugate $\bar\lambda$ is also an
eigenvalue of $A_{\mathbb{C}}$, and $u - iv$ is the corresponding eigenvector. Moreover, both
$\lambda$ and $\bar\lambda$ have the same multiplicity. Rearranging the $A_{\mathbb{C}}$-canonical basis by
joining its parts which correspond to $\lambda$ and $\bar\lambda$, we obtain a basis of the real
space $X$ in which the representation of $A$ has blocks of the form
$$\begin{pmatrix} \Lambda & I_2 & & \mathbf{O} \\ & \Lambda & \ddots & \\ & & \ddots & I_2 \\ \mathbf{O} & & & \Lambda \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}, \quad I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
We omit the simple computations which confirm these statements and leave them
to the reader.

7 An operator $B \in L(X)$ is said to be nilpotent if there is such an $n \in \mathbb{N}$ that $B^n = O$. The least
such integer $n$ is called the order of nilpotency.

The simple canonical form is convenient for solving a system of linear differential equations with real constant coefficients. Such a system can be written in
the form
$$\dot x := \frac{dx}{dt} = Ax, \qquad A \in L(X). \qquad (1.1.15)$$
If $X = \mathbb{R}^M$ and $\mathbf{A} = (a_{ij})$ is the representation of $A$ with respect to the
standard basis $e_1, \dots, e_M$, then (1.1.15) is an abstract formulation of the system
$$\dot x_i(t) = \sum_{j=1}^M a_{ij} x_j(t), \quad i = 1, \dots, M, \quad \text{where } x(t) = \sum_{i=1}^M x_i(t) e_i. \qquad (1.1.16)$$

In order to find a solution, it is convenient to transform (1.1.16) into a canonical
form. If $T \in L(X)$ is invertible, then $x = Ty$ is a solution of (1.1.15) if and only if
$y$ solves the equation
$$\dot y = By, \quad \text{where } By := T^{-1}ATy.$$
Theorem 1.1.34 says that $T$ can be chosen in such a way that the representation of
$B$ with respect to the standard basis is the Jordan Canonical Form of $A$. Having
this form, it is easy to solve (1.1.16) (see Exercise 1.1.41).
Qualitative properties of solutions of (1.1.15) are often more interesting than
an involved formula for solutions. Therefore it would be convenient to generalize
the exponential function solving $\dot x = ax$ in $\mathbb{R}$ to $L(X)$. Similarly to the one-dimensional case we put
$$e^{tA}x := \sum_{n=0}^\infty \frac{t^n}{n!} A^n x$$
provided the series is convergent in $L(X)$. We postpone the question of convergence
of this series (see Exercise 2.1.34) and give instead an equivalent definition of a
function $f(A)$ for $A \in L(X)$ without any use of infinite series.
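Although convergence is only discussed in Exercise 2.1.34, the series can already be summed numerically. The following sketch uses plain Python floats; the matrix, the truncation after 40 terms, the step $h$ and the helper names are arbitrary choices of ours. It truncates the series and checks by a central difference that $t \mapsto e^{tA}x_0$ satisfies (1.1.15). For this $A$ and $x_0 = (1, 0)$ the exact solution is $(\cos t, -\sin t)$.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(a, x):
    return [sum(p * q for p, q in zip(row, x)) for row in a]


def exp_series(a, t, terms=40):
    """Partial sum of sum_n t^n A^n / n! (terms = number of summands kept)."""
    m = len(a)
    term = [[float(i == j) for j in range(m)] for i in range(m)]     # n = 0: I
    total = [row[:] for row in term]
    for n in range(1, terms):
        term = [[v * t / n for v in row] for row in matmul(term, a)]  # t^n A^n / n!
        total = [[p + q for p, q in zip(r, s)] for r, s in zip(total, term)]
    return total


A = [[0.0, 1.0],
     [-1.0, 0.0]]            # x1' = x2, x2' = -x1 (arbitrary example)
x0 = [1.0, 0.0]
t, h = 0.7, 1e-5
x_t = apply(exp_series(A, t), x0)
exact = [math.cos(t), -math.sin(t)]                      # known solution for this A
deriv = [(p - q) / (2 * h) for p, q in zip(apply(exp_series(A, t + h), x0),
                                           apply(exp_series(A, t - h), x0))]
print(max(abs(p - q) for p, q in zip(x_t, exact)))                # tiny (rounding level)
print(max(abs(d - v) for d, v in zip(deriv, apply(A, x_t))))      # small (difference error)
```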


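First we will define $f(B)$ for $B \in L(\mathbb{C}^M)$ which has a representation in the
form
$$\mathbf{B} = \begin{pmatrix} \lambda & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}. \qquad (1.1.17)$$
Assume that $f$ is a polynomial $P : z \mapsto a_0 z^n + \dots + a_n$. Obviously, we define
$$P(B) = a_0 B^n + \dots + a_n I.$$
It will be convenient to rewrite $P(B)$ in a form which is more adequate for generalization. Since
$$P(z) = \sum_{j=0}^n \frac{P^{(j)}(\lambda)}{j!} (z - \lambda)^j,$$
we can write
$$P(z) = \sum_{j=0}^{M-1} \frac{P^{(j)}(\lambda)}{j!} (z - \lambda)^j + (z - \lambda)^M R(z)$$
where $R$ is a polynomial, possibly equal to 0. Since $z \mapsto (z - \lambda)^M$ is the characteristic polynomial of $B$, we have $(B - \lambda I)^M = O$ (by Corollary 1.1.32). This means
that
$$P(B) = \sum_{j=0}^{M-1} \frac{P^{(j)}(\lambda)}{j!} (B - \lambda I)^j. \qquad (1.1.18)$$
This shows that we may define
$$f(B) := \sum_{j=0}^{M-1} \frac{f^{(j)}(\lambda)}{j!} (B - \lambda I)^j \qquad (1.1.19)$$
for a function $f$ holomorphic on a neighborhood (depending on $f$) of $\sigma(B) = \{\lambda\}$.⁸
We denote by $H(\sigma(B))$ the collection of such functions.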
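For $f = \exp$ every derivative equals $f$, so (1.1.19) gives $e^{B}$ in closed form for the cell (1.1.17). The following sketch uses floats; $\lambda = 0.5$, $M = 3$, the 60-term truncation and the helper names are arbitrary choices of ours. It compares that closed form with a partial sum of the exponential series.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def jordan_cell(lam, m):
    """The M x M matrix (1.1.17): lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(m)]
            for i in range(m)]


def f_of_cell(derivs, lam, m):
    """(1.1.19): f(B) = sum_{j<M} f^(j)(lam)/j! (B - lam I)^j; derivs = [f(lam), f'(lam), ...]."""
    nil = [[1.0 if j == i + 1 else 0.0 for j in range(m)] for i in range(m)]   # B - lam I
    power = [[float(i == j) for j in range(m)] for i in range(m)]             # (B - lam I)^0
    result = [[0.0] * m for _ in range(m)]
    for j in range(m):
        coeff = derivs[j] / math.factorial(j)
        result = [[r + coeff * p for r, p in zip(rr, pr)] for rr, pr in zip(result, power)]
        power = matmul(power, nil)
    return result


def exp_series(a, terms=60):
    """Partial sum of sum_n A^n / n!."""
    m = len(a)
    term = [[float(i == j) for j in range(m)] for i in range(m)]
    total = [row[:] for row in term]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in matmul(term, a)]   # A^n / n!
        total = [[p + q for p, q in zip(r, s)] for r, s in zip(total, term)]
    return total


lam, M = 0.5, 3
B = jordan_cell(lam, M)
closed_form = f_of_cell([math.exp(lam)] * M, lam, M)   # every derivative of exp is exp
series = exp_series(B)
err = max(abs(p - q) for r, s in zip(closed_form, series) for p, q in zip(r, s))
print(err < 1e-12)   # True
```

The entries of `closed_form` are $e^{\lambda}$, $e^{\lambda}$ and $e^{\lambda}/2!$ along the diagonals, the pattern of Exercise 1.1.41 at $t = 1$.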
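It is easy to check that the formula
$$(fg)(B) = f(B)g(B) = g(B)f(B)$$
holds for $f, g \in H(\sigma(B))$. In particular, for $w \in \mathbb{C} \setminus \{\lambda\}$ and $r_w(z) = (w - z)^{-1}$ we
get
$$r_w(B) = (wI - B)^{-1} = \sum_{j=0}^{M-1} \frac{(B - \lambda I)^j}{(w - \lambda)^{j+1}}. \qquad (1.1.20)$$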
8 A weaker assumption on $f$ would also be sufficient, but we do not try to obtain an unduly
general definition. See also Lemma 1.1.37 below.


Remark 1.1.36. The following assertion yields another equivalent definition of
$f(B)$ which can also be used in a general Banach space for a linear continuous
operator $B : X \to X$ (see Section 1.2 for the notions of a Banach space and a
continuous linear operator). Theorem 1.1.38 also holds in this more general setting
(Dunford Functional Calculus, see Proposition 3.1.14 or Dunford & Schwartz [44]).
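Lemma 1.1.37. Let $\gamma$ be a positively oriented Jordan curve, $\sigma(B) \subset \operatorname{int} \gamma$, and let
$f$ be a holomorphic function on a neighborhood of $\overline{\operatorname{int} \gamma}$. Then
$$f(B)x = \frac{1}{2\pi i} \int_\gamma f(w)(wI - B)^{-1} x \, dw, \quad x \in X.^{9}$$
Proof. By (1.1.20) we have
$$\frac{1}{2\pi i} \int_\gamma f(w)(wI - B)^{-1} x \, dw = \sum_{j=0}^{M-1} \left( \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{(w - \lambda)^{j+1}} \, dw \right) (B - \lambda I)^j x.$$
The result now follows from the Cauchy Integral Formula.¹⁰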
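Lemma 1.1.37 also suggests a numerical recipe. A minimal sketch, assuming $\gamma$ is the unit circle and using the trapezoidal rule with 64 nodes (both our choices, as are the helper names `inv2` and `contour_f`): for the $2 \times 2$ cell (1.1.17) with $\lambda = 0.3$ the contour integral reproduces the value of $e^{B}$ given by (1.1.19).

```python
import cmath


def inv2(m):
    """Inverse of a regular 2 x 2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def contour_f(f, B, center=0.0, radius=1.0, nodes=64):
    """Trapezoidal rule for (1/(2 pi i)) * integral of f(w)(wI - B)^{-1} dw over |w - center| = radius."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(nodes):
        z = radius * cmath.exp(2j * cmath.pi * k / nodes)
        w = center + z
        dw = 1j * z * (2 * cmath.pi / nodes)                 # w'(theta) * d(theta)
        r = inv2([[w - B[0][0], -B[0][1]], [-B[1][0], w - B[1][1]]])
        for i in range(2):
            for j in range(2):
                total[i][j] += f(w) * r[i][j] * dw
    return [[v / (2j * cmath.pi) for v in row] for row in total]


lam = 0.3
B = [[lam, 1.0], [0.0, lam]]              # the cell (1.1.17) with M = 2; sigma(B) = {0.3}
approx = contour_f(cmath.exp, B)
exact = [[cmath.exp(lam), cmath.exp(lam)], [0.0, cmath.exp(lam)]]   # (1.1.19) for f = exp
err = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(err)    # of the order of rounding error
```

The trapezoidal rule is chosen because it converges very fast for smooth periodic integrands such as this one.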

Let $A \in L(\mathbb{C}^M)$ have the canonical form (1.1.17), i.e., $A = TBT^{-1}$. Then we
define $f(A)$ by (1.1.19), replacing $B$ by $A$. Notice that
$$f(A) = T f(B) T^{-1}.$$
We can proceed in the same way for a general $A \in L(X)$ using the decomposition
(1.1.12). This leads to the following theorem.
Theorem 1.1.38 (Functional Calculus). Let $X$ be a complex linear space and let
$A \in L(X)$. Then there exists a unique linear operator
$$\Phi : H(\sigma(A)) \to L(X)$$
with the following properties:
(i) $\Phi(fg) = \Phi(f)\Phi(g) = \Phi(g)\Phi(f)$ for $f, g \in H(\sigma(A))$;
(ii) if $P(z) = \sum_{j=0}^n a_j z^j$, then $\Phi(P) = \sum_{j=0}^n a_j A^j$;
(iii) if $f(z) = \frac{1}{w - z}$ for $w \notin \sigma(A)$, then $\Phi(f) = (wI - A)^{-1}$.

9 Since the integrand is a $\mathbb{C}^{M \times M}$-valued function of $w$ (in a matrix representation), the integral is
an $M \times M$-tuple of standard curve integrals.
10 We recall the following result from the theory of functions of a complex variable:
If $f$ and $\gamma$ are as in Lemma 1.1.37, then
$$f^{(j)}(z) = \frac{j!}{2\pi i} \int_\gamma \frac{f(w)}{(w - z)^{j+1}} \, dw$$
holds for $z \in \operatorname{int} \gamma$ and $j \in \mathbb{N} \cup \{0\}$.


Remark 1.1.39.
(i) A mapping $\Phi(f)$ can be computed either by Lemma 1.1.37, which is also valid
for a general $A$, or by the formula
$$f(A)x = \sum_{l=1}^k \sum_{j=0}^{m(\lambda_l)-1} \frac{f^{(j)}(\lambda_l)}{j!} (A - \lambda_l I)^j \pi_l x$$
where $\sigma(A) = \{\lambda_1, \dots, \lambda_k\}$ and $\pi_l$ is the projection onto $N(\lambda_l)$ defined by
the decomposition (1.1.12). We note that these projections are also functions
of $A$, namely $\pi_l = \chi_l(A)$ where
$$\chi_l(z) = \begin{cases} 1, & z \in B(\lambda_l; \delta), \\ 0, & z \notin B(\lambda_l; \delta), \end{cases}$$
and $\delta > 0$ is small enough so that $\sigma(A) \cap B(\lambda_l; \delta) = \{\lambda_l\}$.
(ii) If $X$ is a real linear space of finite dimension and $A \in L(X)$, then we can
construct a functional calculus for $X_{\mathbb{C}}$ and $A_{\mathbb{C}}$ (see Remark 1.1.35(ii)).
(iii) We deduced a functional calculus from Theorem 1.1.34. The opposite way
is also possible, namely to use the functional calculus for finding the canonical
form. An important role is played by the projections $\pi_l$ giving the decomposition
(1.1.12). The interested reader can find more details, e.g., in Dunford &
Schwartz [44, Section VII, §1].

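Exercise 1.1.40. Show that
$$\operatorname{sgn} \det \mathbf{A} = (-1)^p, \quad \text{where } p = \sum_{\lambda \in \sigma(A),\ \lambda < 0} m(\lambda),$$
for a matrix representation $\mathbf{A}$ of $A$ with real entries. (The sum over the empty set is
defined to be zero.)
Hint. Notice that $\det \mathbf{A} = \lambda_1^{m_1} \cdots \lambda_k^{m_k}$.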
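Exercise 1.1.41. Show that the formula (1.1.19) yields a matrix representation of
$e^{tB}$ in the form
$$\begin{pmatrix} e^{\lambda t} & te^{\lambda t} & \cdots & \frac{t^{M-1}}{(M-1)!}e^{\lambda t} \\ 0 & e^{\lambda t} & \cdots & \frac{t^{M-2}}{(M-2)!}e^{\lambda t} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda t} \end{pmatrix}$$
whenever $B$ has the representation (1.1.17) with respect to the same basis.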
Exercise 1.1.42. Use the formula in Remark 1.1.39(i) to estimate $\|e^{tA}x\|_{\mathbb{C}^M}$ for
large positive $t$ and large negative $t$ in dependence on $\sigma(A)$.


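Hint. Suppose that $\alpha < \operatorname{Re} \lambda < \beta$ for all $\lambda \in \sigma(A)$. Show that there are constants
$c_1, c_2$ such that
$$\|e^{tA}x\| \le \begin{cases} c_1 e^{\beta t}\|x\| & \text{for } t \ge 0, \\ c_2 e^{\alpha t}\|x\| & \text{for } t \le 0, \end{cases} \quad \text{for all } x \in \mathbb{C}^M.$$
In particular, if $\beta < 0$, then all solutions of (1.1.15) tend to zero as $t \to +\infty$
(asymptotic stability).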
Exercise 1.1.43. Let $A \in L(\mathbb{C}^M)$ have a regular matrix representation. Show that
(i) all matrix representations of $A$ are regular;
(ii) there exists $B \in L(\mathbb{C}^M)$ such that
$$e^B = A.$$
Is this $B$ unique? How is $\sigma(B)$ related to $\sigma(A)$?

1.2 Normed Linear Spaces


In the preface we mentioned that our main attention is focused on the properties of
nonlinear mappings defined on various spaces of functions. Besides the linear structure studied in the previous section, such spaces also have a topological structure.
We will assume that the two structures are joined together (like multiplication
and addition are joined together by the distributive law in the notion of the field).
A natural requirement of continuity of linear operations leads to the notion of a
linear topological space. These spaces are often too general for purposes of nonlinear analysis. For example, basic notions and results of differential calculus in
such spaces are not straightforward generalizations of the corresponding notions
for functions of several variables, and they frequently need profound ideas. Because
of that we restrict our interest to cases when the topological structure is given by
a metric, especially by a norm. Before starting with this concept we briefly introduce the main topological notions. For more information, the interested reader
can consult books like Dugundji [43] or Kelley [75].
A set $X$ with a collection $\mathcal{T}$ of its subsets is called a topological space if $\mathcal{T}$
possesses the following properties:
(1) $\emptyset, X \in \mathcal{T}$;
(2) an intersection of a finite number of sets of $\mathcal{T}$ belongs to $\mathcal{T}$;
(3) a union of any subcollection of $\mathcal{T}$ belongs to $\mathcal{T}$.
Elements of $\mathcal{T}$ are called open sets. A subset $U \subset X$ is called a neighborhood of a
point $x \in X$ if there is an open set $G \subset X$ such that $x \in G \subset U$.
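On a finite set the axioms (1)–(3) can be verified mechanically. A minimal sketch (the sets and the function name `is_topology` are arbitrary illustrations): since the collection is finite, closure under pairwise intersections and unions already gives closure under all finite intersections and arbitrary unions.

```python
from itertools import combinations


def is_topology(X, T):
    """Check axioms (1)-(3) for a collection T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(G) for G in T}
    if any(not G <= X for G in T):
        return False                                       # T must consist of subsets of X
    if frozenset() not in T or X not in T:
        return False                                       # axiom (1)
    for G, H in combinations(T, 2):
        if G & H not in T or G | H not in T:
            return False                                   # axioms (2) and (3)
    return True


X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))   # True
print(is_topology(X, [set(), {1}, {2}, X]))      # False: {1} ∪ {2} is missing
```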
An important special case of a topological space is the so-called metric space.
This is a set $X$ with a real function (metric) $\rho : X \times X \to [0, \infty)$ for which

1.2. Normed Linear Spaces

25

(1) $\rho(x, y) = 0 \iff x = y$,
(2) $x, y \in X \implies \rho(x, y) = \rho(y, x)$ (the so-called symmetry of the metric),
(3) $x, y, z \in X \implies \rho(x, z) \le \rho(x, y) + \rho(y, z)$ (the so-called triangle inequality).
If $\rho$ is a metric on $X$, then
$$B(x; r) := \{y \in X : \rho(x, y) < r\}$$
is called the open ball centered at $x \in X$ with radius $r > 0$. Open sets in a
metric space are defined as subsets $G \subset X$ which have the following property:
for every $x \in G$ there is $\delta > 0$ such that $B(x; \delta) \subset G$.
It is easy to prove that a metric space with this definition of open sets is also a
topological space. For the following notions and results see, e.g., Dieudonné [35].
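A subset $F$ of a topological space $X$ is called a closed set if $X \setminus F$ is open. If
$A \subset X$, then the intersection of all closed sets containing $A$ is called the closure
of $A$ and is denoted by $\overline{A}$, i.e.,
$$\overline{A} = \bigcap_{\substack{A \subset F \\ F \text{ is closed}}} F.$$
A dual notion is the interior ($\operatorname{int} A$) of $A$:
$$\operatorname{int} A = \bigcup_{\substack{G \subset A \\ G \text{ is open}}} G.$$
The boundary $\partial A$ of $A$ is defined by
$$\partial A := \overline{A} \cap \overline{X \setminus A}.$$
A subset $A$ of $X$ is said to be dense if $\overline{A} = X$.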
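It is almost obvious that in a metric space $X$ we have the following equivalences:¹¹
(i) $x \in \overline{A} \iff \exists \{x_n\}_{n=1}^\infty \subset A : \lim_{n\to\infty} \rho(x_n, x) = 0$,
(ii) $x \in \operatorname{int} A \iff \exists \delta > 0 : B(x; \delta) \subset A$.
A metric space $X$ is said to be separable if there is a countable dense subset of $X$.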
11 We also say that the sequence $\{x_n\}_{n=1}^\infty$ is convergent to $x$ and write $\lim_{n\to\infty} x_n = x$ or, more
simply, $x_n \to x$. The notion of a convergent sequence can also be introduced in topological
spaces:
$\{x_n\}_{n=1}^\infty$ is convergent to $x$ if for every neighborhood $U$ of $x$ there is an index $n_0 \in \mathbb{N}$
such that $x_n \in U$ for each $n \ge n_0$.
Warning. In a topological space there need not be enough convergent sequences to describe a closure, etc.! See, e.g., J. von Neumann's example in Dunford & Schwartz [44, Chapter
V, §7, Example 38].


If $X, Y$ are topological spaces and $f : X \to Y$, then $f$ is said to be continuous
on $X$ provided $f^{-1}(G)$ is open in $X$ whenever $G$ is an open set in $Y$. If $f$ is injective
and surjective and $f$, $f^{-1}$ are both continuous, then $f$ is called a homeomorphism of
$X$ onto $Y$. It is also possible to define continuity at a point $a \in X$ with the help of
the notion of a neighborhood:
$f$ is continuous at $a$ if $f^{-1}(U)$ is a neighborhood of $a$ whenever $U$ is a
neighborhood of $f(a)$.
A mapping $f$ is continuous on $X$ if and only if it is continuous at every point of
$X$. The following equivalence holds in metric spaces $X$ and $Y$:
$$f : X \to Y \text{ is continuous at } a \in X \iff (x_n \to a \implies f(x_n) \to f(a)).$$
A very important notion is that of compactness: A topological space $X$ is said to
be compact if for every open covering $\{G_\alpha\}_{\alpha \in I}$ of $X$ (i.e., $X = \bigcup_{\alpha \in I} G_\alpha$) there is a
finite subset $K \subset I$ such that
$$X = \bigcup_{\alpha \in K} G_\alpha \quad \text{(a finite subcovering)}.$$
Any subset $A$ of a topological space $X$ is itself a topological space with the
collection of open sets $\{G \cap A : G \text{ open in } X\}$. A subset $A$ of a topological space
$X$ is said to be compact in $X$ if $A$ is a compact topological space in this induced
topology. Further, $A \subset X$ is said to be relatively compact provided $\overline{A}$ is compact.
In metric spaces we have the following characterization.
In metric spaces we have the following characterization.
Proposition 1.2.1. Let $X$ be a metric space. Then $A \subset X$ is relatively compact if
and only if for any sequence $\{x_n\}_{n=1}^\infty \subset A$ there is a convergent subsequence.¹²
Besides this proposition, the importance of compactness in analysis is obvious
from the next result, which will be discussed more deeply in Section 6.2.
Proposition 1.2.2. Let $X$ be either a compact topological space or a sequentially
compact topological space and let $f$ be a continuous real function on $X$. Then there
exist a maximal and a minimal value of $f$, i.e., there are $x_1, x_2 \in X$ such that
$$f(x_1) \le f(x) \le f(x_2) \quad \text{for all } x \in X.$$

12 A topological space $X$ is said to be sequentially compact if for any sequence $\{x_n\}_{n=1}^\infty \subset X$
there is a subsequence $\{x_{n_k}\}_{k=1}^\infty$ which is convergent to a point $x \in X$.
Warning. These two notions of compactness are different in topological spaces. To be more
precise:
There is a compact topological space which is not sequentially compact, and there is
a sequentially compact topological space which is not compact!


Finding a criterion for compactness in a particular space need not be an easy
task. To formulate a general result we need one more notion, the significance of
which goes far beyond our present considerations. A sequence $\{x_n\}_{n=1}^\infty$ of elements
of a metric space $X$ is called a Cauchy sequence if for every $\varepsilon > 0$ there is $n_0 \in \mathbb{N}$
such that
$$\rho(x_m, x_n) < \varepsilon \quad \text{for all } m, n \ge n_0.$$
A metric space $X$ is said to be complete if any Cauchy sequence in $X$ is convergent
(to an element of $X$). We will encounter complete spaces almost everywhere in
the subsequent text.
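Proposition 1.2.3. Let $X$ be a complete metric space. Then $A \subset X$ is relatively
compact if and only if for every $\varepsilon > 0$ there is a finite set $K \subset X$ (the so-called
finite $\varepsilon$-net for $A$) such that
$$\forall a \in A\ \exists x \in K : \rho(a, x) < \varepsilon.$$
In other words, $A \subset \bigcup_{x \in K} B(x; \varepsilon)$.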
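The definition of a finite $\varepsilon$-net can be illustrated on a finite sample of a compact set. The following sketch builds a net greedily: a point becomes a center unless it already lies within $\varepsilon$ of an earlier one. The sample of the unit circle in $\mathbb{R}^2$, the value $\varepsilon = 0.25$ and the name `greedy_eps_net` are our choices. It only checks the net property on the sample and proves nothing about relative compactness.

```python
import math


def greedy_eps_net(points, eps):
    """A point becomes a center unless it is already within eps of an earlier center."""
    net = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in net):
            net.append(p)
    return net


# 1000 sample points of the unit circle (a compact subset of R^2)
pts = [(math.cos(2 * math.pi * k / 1000), math.sin(2 * math.pi * k / 1000))
       for k in range(1000)]
net = greedy_eps_net(pts, 0.25)
assert all(any(math.dist(a, x) < 0.25 for x in net) for a in pts)   # the eps-net property
print(len(net))   # a few dozen centers suffice
```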

Proposition 1.2.4. Let $X$ be a complete metric space and let $f : [\alpha, \beta) \to X$, $\beta < \infty$. If $f$ is
uniformly continuous on $[\alpha, \beta)$,¹³ then there exists $\lim_{x \to \beta} f(x) \in X$. In particular,
$f$ can be continuously extended to $[\alpha, \beta]$.


Definition 1.2.5. A topological space $X$ is called a connected space provided it is
not possible to find two disjoint nonempty open sets $G_1, G_2$ such that
$$X = G_1 \cup G_2.$$
For $a \in X$ put
$$C(a) := \bigcup \{A \subset X : a \in A \text{ and } A \text{ is connected}\}.$$
Then $C(a)$ is a connected set and it is called the component of the point $a$. If
$a, b \in X$, $a \neq b$, then either $C(a) = C(b)$ or $C(a) \cap C(b) = \emptyset$.
Proposition 1.2.6. Let $X$ be a connected space and let $f : X \to Y$ be continuous. Then
$f(X)$ is a connected subset of $Y$. In particular, if $\varphi : [0, 1] \to Y$ is continuous,
$A \subset Y$, and $\varphi(0) \in A$, $\varphi(1) \notin A$, then there exists $t_0 \in [0, 1]$ such that $\varphi(t_0) \in \partial A$.
Proposition 1.2.7. Let $X$ be a normed linear space and let $G$ be an open subset of
$X$. Then $G$ is connected if and only if for any two points $a, b \in G$ there exists a
continuous mapping $\varphi : [0, 1] \to G$ such that $\varphi(0) = a$, $\varphi(1) = b$. In particular,
$\varphi$ can be chosen piecewise linear.
Now we are ready to start with the main subject of this section.
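Definition 1.2.8. Let $X$ be a real or complex linear space. A function $\|\cdot\|_X : X \to
\mathbb{R}$ is called a norm on $X$ if it has the following properties:
(1) $\|x\|_X = 0 \iff x = o$,
(2) $\|\alpha x\|_X = |\alpha| \|x\|_X$ for $\alpha \in \mathbb{R}$ or $\mathbb{C}$ and $x \in X$,
(3) $\|x + y\|_X \le \|x\|_X + \|y\|_X$ for $x, y \in X$ (the so-called triangle inequality).

13 I.e., $\forall \varepsilon > 0\ \exists \delta > 0\ \forall x, y \in [\alpha, \beta) : |x - y| < \delta \implies \rho(f(x), f(y)) < \varepsilon$.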
If a linear space $X$ is endowed with a norm, then $X$ is called a normed linear
space.
In the sequel we will drop the index of the norm whenever there is no danger
of confusion.
It is obvious that $\rho(x, y) := \|x - y\|$ is a metric on $X$. Therefore all metric
notions and results carry over to normed linear spaces. If a normed linear
space is complete in this metric, then it is called a Banach space. Any metric space
can be embedded as a dense set into a complete metric space. For a normed linear
space $X$ we get a slightly stronger result:
There exists a Banach space $\hat X$ (the so-called completion of $X$) and a
linear injection $L : X \to \hat X$ such that $\operatorname{Im} L$ is a dense subset of $\hat X$ and
$$\|x\|_X = \|L(x)\|_{\hat X} \quad \text{for all } x \in X.$$

Example 1.2.9. Let $X$ be an $M$-dimensional real linear space. Choose a basis
$f_1, \dots, f_M$ of $X$ and let $e_1, \dots, e_M$ be the standard basis of $\mathbb{R}^M$. For $x = \sum_{i=1}^M x_i f_i \in X$ put $\Phi(x) = \sum_{i=1}^M x_i e_i$. Then $\Phi$ is an isomorphism of $X$ onto $\mathbb{R}^M$. Moreover,
$$\|(x_1, \dots, x_M)\|_1 := \sum_{i=1}^M |x_i|, \qquad \|(x_1, \dots, x_M)\|_\infty := \max_{i=1,\dots,M} |x_i|, \qquad \|(x_1, \dots, x_M)\|_2 := \Big(\sum_{i=1}^M |x_i|^2\Big)^{1/2}$$
are norms on $\mathbb{R}^M$ (for the indices $1$ and $\infty$ this is obvious; the triangle inequality for the index
$2$ needs some effort, see also Proposition 1.2.30 below). These norms can be
transmitted to $X$ with the help of $\Phi$, i.e.,
$$\|x\|_\alpha := \|\Phi(x)\|_\alpha, \quad \alpha = 1, 2, \infty.$$
Similar results are true also for a complex linear space $X$ when $\mathbb{C}^M$ is used instead
of $\mathbb{R}^M$. The space $X$ is a Banach space with respect to any of the above norms.

The classical Bolzano–Weierstrass result on the compactness of a closed
bounded interval in $\mathbb{R}$ has the following generalization:
Let $X$ be a finite dimensional space endowed with an $\alpha$-norm ($\alpha =
1, 2, \infty$). Then $A \subset X$ is relatively compact if and only if it is bounded
(i.e., there is a constant $c$ such that $\|x\|_\alpha \le c$ for every $x \in A$).


We note that this result is true for any norm on X (see Corollary 1.2.11(i) below).
Proposition 1.2.10. Let X and Y be normed linear spaces and let A be a linear
operator from X into Y . Then the following statements are equivalent:
(i) A is continuous on X;
(ii) $A$ is continuous at $o \in X$;
(iii) there is a constant $c$ such that the inequality
$$\|Ax\|_Y \le c\|x\|_X \qquad (1.2.1)$$
is valid for all $x \in X$.
Proof. The easy proof is left to the reader as an exercise.


We denote the collection of all continuous linear operators from $X$ into $Y$ by
$\mathcal{L}(X, Y)$ and the least possible constant $c$ in (1.2.1) by $\|A\|_{\mathcal{L}(X,Y)}$. This quantity
has all properties of a norm on the linear space $\mathcal{L}(X, Y)$. We will always consider
this norm (the so-called operator norm) on $\mathcal{L}(X, Y)$. If $X = Y$, we will use the
shorter notation $\mathcal{L}(X)$ instead of $\mathcal{L}(X, X)$.
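For matrices the operator norm can sometimes be computed explicitly. As an illustration (the choice of $\|\cdot\|_\infty$ on both spaces is ours), if $A\colon (\mathbb{R}^n, \|\cdot\|_\infty) \to (\mathbb{R}^m, \|\cdot\|_\infty)$ is given by a matrix $(a_{ij})$, then $\|A\|_{\mathcal{L}} = \max_i \sum_j |a_{ij}|$. The sketch below computes this quantity, spot-checks (1.2.1), and exhibits a vector at which the constant is attained, so no smaller $c$ works.

```python
import random

def op_norm_inf(A):
    """Operator norm of x -> Ax from (R^n, ||.||_inf) to (R^m, ||.||_inf):
    the largest absolute row sum of the matrix A."""
    return max(sum(abs(a) for a in row) for row in A)

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm_inf(v):
    return max(abs(t) for t in v)

A = [[1.0, -2.0], [3.0, 0.5]]
c = op_norm_inf(A)  # 3.5

# (1.2.1) holds with c = ||A||: spot-check on random vectors
rng = random.Random(0)
for _ in range(1000):
    x = [rng.uniform(-1, 1) for _ in range(2)]
    assert norm_inf(apply(A, x)) <= c * norm_inf(x) + 1e-12

# c is the least possible constant: the sign vector of the maximizing row attains it
i = max(range(len(A)), key=lambda r: sum(abs(a) for a in A[r]))
x_star = [1.0 if a >= 0 else -1.0 for a in A[i]]
print(c, norm_inf(apply(A, x_star)))  # 3.5 3.5
```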
We now return to Example 1.2.9. It is obvious that there are positive constants $c_1, c_2$ such that
$$c_1\|x\|_1 \le \|x\|_2 \le c_2\|x\|_1 \qquad \text{for all } x \in \mathbb{R}^M\ (\mathbb{C}^M).$$
(Here, e.g., $c_1 = \frac{1}{\sqrt{M}}$, $c_2 = \sqrt{M}$.) Such constants exist also for the norms $\|\cdot\|_1$,
$\|\cdot\|_\infty$. More generally, two norms on a linear space $X$ are called equivalent if they
satisfy such inequalities. In other words, two norms $\|\cdot\|'$, $\|\cdot\|''$ on a linear space
$X$ are equivalent if the identity map from $(X, \|\cdot\|')$ into $(X, \|\cdot\|'')$ is continuous
together with its inverse, i.e., it is an isomorphism.$^{14}$
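The constants $c_1 = 1/\sqrt{M}$ and $c_2 = \sqrt{M}$ can be checked numerically. In the sketch below (random Gaussian vectors are our choice of test data) every observed ratio $\|x\|_2 / \|x\|_1$ lies in $[c_1, c_2]$; the lower constant is attained at $x = (1, \dots, 1)$, while the ratio never exceeds $1$, so even $c_2 = 1$ would do.

```python
import math
import random

def norm_1(v):
    return sum(abs(t) for t in v)

def norm_2(v):
    return math.sqrt(sum(abs(t) ** 2 for t in v))

def ratio_range(M, trials=2000, seed=1):
    """Smallest and largest observed value of ||x||_2 / ||x||_1 over random x in R^M."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(M)]
        ratios.append(norm_2(x) / norm_1(x))
    return min(ratios), max(ratios)

M = 4
lo, hi = ratio_range(M)
# every ratio lies in [c1, c2] with c1 = 1/sqrt(M), c2 = sqrt(M)
assert 1 / math.sqrt(M) - 1e-12 <= lo and hi <= math.sqrt(M) + 1e-12
# c1 is attained at (1, ..., 1); the value 1 is attained at a unit vector
print(norm_2([1.0] * M) / norm_1([1.0] * M), norm_2([1.0, 0, 0, 0]) / norm_1([1.0, 0, 0, 0]))  # 0.5 1.0
```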
Corollary 1.2.11.
(i) Any two norms on a finite dimensional linear space $X$ are equivalent. In
particular, $X$ is a Banach space.
(ii) Let $X$, $Y$ be normed linear spaces and $\dim X < \infty$. Then $\mathcal{L}(X, Y) =
L(X, Y)$, i.e., any linear operator from $X$ into $Y$ is continuous.
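Proof. (i) Let $\Phi$ be as in Example 1.2.9 and consider $\mathbb{R}^M$ (or $\mathbb{C}^M$) equipped with
the $\|\cdot\|_1$-norm. Then for $x = (x_1, \dots, x_M) \in \mathbb{R}^M$ we have
$$\|\Phi^{-1}(x)\|_X = \Big\|\sum_{i=1}^M x_i f_i\Big\|_X \le \sum_{i=1}^M |x_i|\,\|f_i\|_X \le c\sum_{i=1}^M |x_i| = c\|x\|_1, \qquad c := \max_{i} \|f_i\|_X,$$
i.e., $\Phi^{-1}$ is continuous. Observe that for proving continuity of $\Phi$ it is sufficient to
show that
$$\inf\{\|\Phi^{-1}(x)\|_X : \|x\|_1 = 1\} > 0.$$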
$^{14}$ Unlike the algebraic isomorphism from Definition 1.1.11, here an isomorphism is understood in the topological sense. In general, $A \in \mathcal{L}(X, Y)$ is an isomorphism if $A$ is injective, surjective and
$A^{-1} \in \mathcal{L}(Y, X)$.

30

Chapter 1. Preliminaries

But this is true since the set $\{x \in \mathbb{R}^M : \|x\|_1 = 1\}$ is compact, $\Phi^{-1}$ is
continuous and $\Phi^{-1}(x) \ne o$ on this set.
Now let $\|\cdot\|$, $\|\cdot\|'$ be two norms on $X$, $\dim X = M$, and let $\iota$ be the identity
map from $X$ onto $\widetilde{X}$ ($= X$ with the norm $\|\cdot\|'$). The result follows from the
commutativity of the diagram in Figure 1.2.1. Since $\mathbb{R}^M$ and $\mathbb{C}^M$ are complete
spaces with respect to the $\|\cdot\|_1$-norm (the classical Bolzano–Cauchy condition)
and $\{u_n\}_{n=1}^\infty \subset X$ is a Cauchy sequence if and only if $\{\Phi(u_n)\}_{n=1}^\infty$ is a Cauchy
sequence, $X$ is a Banach space.

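Figure 1.2.1. (Diagram relating $X$, $\widetilde{X}$ and $\mathbb{R}^M$ ($\mathbb{C}^M$) through the coordinate isomorphisms and the identity map $\iota$.)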

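(ii) It is sufficient to prove continuity with respect to the 1-norm on $X$ (by (i), all norms on $X$ are equivalent).
For $x = \sum_{i=1}^M x_i f_i \in X$ we have
$$\|Ax\|_Y \le \sum_{i=1}^M |x_i|\,\|Af_i\|_Y \le c\sum_{i=1}^M |x_i| = c\|x\|_1, \qquad c := \max_{i} \|Af_i\|_Y.$$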

Example 1.2.12 (spaces of continuous functions). Let $T$ be a compact topological
space. Then any continuous real (complex) function $f$ is bounded on $T$ (Proposition 1.2.2), and
$$\|f\|_T = \sup\{|f(x)| : x \in T\}$$
is a norm on the space $C(T)$ of all such functions. Convergence of a sequence in
this norm is the uniform convergence on $T$. It follows that $C(T)$ is a Banach space.
If $T$ is not compact, then a continuous function on $T$ need not be bounded.
To get a topology on a family of continuous functions on $T$ we can either restrict
our attention to the space $BC(T)$ of all bounded, continuous functions on $T$ or
assume certain properties of $T$ which are weaker than compactness (the reader
can consider $\mathbb{R}^M$ as a model of such $T$). As a result we wish to obtain a topology
on $C(T)$ in which convergence of a sequence is equivalent to the locally uniform
convergence. This can be done as follows. Let a topological space $T$ be a countable
union of open, relatively compact subsets $T_n$.$^{15}$ We leave it to the reader to verify
that the sum
$$\rho(f, g) := \sum_{n=1}^\infty \frac{1}{2^n}\,\frac{\|f - g\|_n}{1 + \|f - g\|_n} \qquad (1.2.2)$$
$^{15}$ A basic example is $\mathbb{R}^M$ or $\mathbb{C}^M$. Another example is the set $\mathbb{N}$ of natural numbers with the
discrete metric: $d(m, n) = 1$ if $m \ne n$ and $d(m, m) = 0$.


where
$$\|f - g\|_n := \sup\{|f(x) - g(x)| : x \in T_n\},$$
defines a metric on $C(T)$ and the convergence of a sequence in this metric is
actually the locally uniform convergence, i.e., uniform convergence on any compact
subset of $T$. Since $\rho$ is bounded it cannot be induced by any norm. Even more is
true, namely, there is no norm on $C(T)$ which generates the same system of open
sets as the metric $\rho$ does (provided $T$ itself is not compact).
We now state two fundamental results concerning spaces of continuous functions. To formulate the first we need the concept of equicontinuity:
A family $F \subset C(T)$ is said to be equicontinuous if for all $x \in T$ and $\varepsilon > 0$
there is a neighborhood $U$ of $x$ such that
$$|f(y) - f(x)| < \varepsilon \qquad \text{for all } y \in U,\ f \in F.$$

Theorem 1.2.13 (Arzelà–Ascoli). Let $T$ be a topological space which is a union of a
sequence of open, relatively compact subsets. Then $F \subset C(T)$ is relatively compact
in the $\rho$-metric if and only if the following two conditions are satisfied:
(i) $F$ is equicontinuous;
(ii) for each $x \in T$ the set $\{f(x) : f \in F\}$ is bounded in $\mathbb{R}$ (or $\mathbb{C}$).$^{16}$
Proof. We omit the proof and refer, e.g., to Dugundji [43, Section XII, 6] or Kelley [75, Chapter 7, Theorem 17] where more general results are proved.

Since a continuous function can be very strange (e.g., nowhere differentiable),
it is often desirable to have an approximation procedure. The first result of this
type was the famous Weierstrass Theorem on uniform approximation by polynomials. One of the characteristic features of this approximation is
that the product of two continuous functions is a continuous function and the same
is true for polynomials. In algebraic terms: both sets are not only linear spaces
but also algebras.$^{17}$ The following generalization of the Weierstrass Theorem is
due to M.H. Stone.
Theorem 1.2.14 (Stone–Weierstrass). Let $T$ satisfy the assumption of Theorem
1.2.13 and let $C_{\mathbb{R}}(T)$ be the algebra of all real continuous functions on $T$. Let
$\mathcal{A} \subset C_{\mathbb{R}}(T)$ be a subalgebra which contains constant functions and separates points
of $T$ (i.e., for any $x, y \in T$, $x \ne y$, there is $f \in \mathcal{A}$ such that $f(x) \ne f(y)$). Then
$\mathcal{A}$ is dense in $C_{\mathbb{R}}(T)$ with respect to the $\rho$-metric.
$^{16}$ If $T$ is compact and (i) holds, then the assumption (ii) is equivalent to the boundedness of $F$
in $C(T)$.
$^{17}$ A linear space $X$ with a binary operation (product) which is associative and distributive with
respect to linear operations is called an algebra. Further, if $X$ is a normed linear space and for
the product the inequality $\|x \cdot y\| \le \|x\|\,\|y\|$ holds for every $x, y \in X$, then $X$ is called a normed
algebra and, in the case that $X$ is complete, a Banach algebra.


Proof. The proof can be found, e.g., in Dugundji [43, XIII, 3] or Kelley [75, Chapter
7, Exercise T].
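For $T = [0, 1]$ and $\mathcal{A}$ the polynomials, Theorem 1.2.14 reduces to the Weierstrass Theorem, which has a well-known constructive proof via the Bernstein polynomials $B_n f(x) = \sum_{k=0}^n f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$. This is not the argument in the references cited above; the Python sketch below only illustrates the uniform convergence numerically for $f(x) = |x - \frac12|$, with the sup norm replaced by a maximum over a grid (our simplification).

```python
import math

def bernstein(f, n, x):
    """Value at x of the n-th Bernstein polynomial of f on [0, 1]:
    B_n f(x) = sum_k f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

def sup_error(f, n, grid_size=200):
    """max |B_n f - f| over the grid {i / grid_size}: a discrete stand-in
    for the sup norm on C[0, 1]."""
    xs = [i / grid_size for i in range(grid_size + 1)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

f = lambda x: abs(x - 0.5)  # continuous, not differentiable at 1/2
errors = [sup_error(f, n) for n in (10, 40, 160)]
print(errors)  # decreasing, roughly like 1/sqrt(n)
```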

We note that Theorem 1.2.14 can be easily extended to the space of complex
continuous functions. In this case, $\mathcal{A}$ is assumed to possess the following additional
property:
If $f \in \mathcal{A}$, then also $\bar{f} \in \mathcal{A}$.$^{18}$
The reader may ask why certain additional properties are needed for compactness in infinite dimensional spaces like $C(T)$ in contrast to finite dimensional
spaces. The following theorem explains not only this situation but also the technical difficulties which one meets in the calculus of variations (see Chapter 6).
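Proposition 1.2.15 (F. Riesz). Let $X$ be a normed linear space. Then the closed
unit ball $B(o; 1) := \{x \in X : \|x\| \le 1\}$ is compact (in the norm topology) if and
only if $X$ has finite dimension.
Proof. Sufficiency is obvious (see Example 1.2.9 and Corollary 1.2.11(i)). It remains to prove necessity. We proceed by contradiction. Assume that $\dim X = \infty$.
Choose $0 < \varepsilon < 1$ and suppose that we have $x_1, \dots, x_n \in B(o; 1)$ such that
$$\|x_i - x_j\| > 1 - \varepsilon \qquad \text{for all } 1 \le i < j \le n.$$
We shall show that we can find another element $x_{n+1} \in B(o; 1)$ such that
$\{x_1, \dots, x_{n+1}\}$ has the same property. Since $X_n = \operatorname{Lin}\{x_1, \dots, x_n\} \ne X$, there
is $y \in X \setminus X_n$. Denote
$$d := \inf\{\|y - x\| : x \in X_n\}.$$
Observe that $d > 0$ since $X_n$ is a closed subspace.$^{19}$ By the definition of the
greatest lower bound, there exists $\tilde{x} \in X_n$ such that
$$d \le \|y - \tilde{x}\| < d(1 + \varepsilon).$$
For $x_{n+1} := \frac{y - \tilde{x}}{\|y - \tilde{x}\|} \in B(o; 1)$ and $x \in X_n$ we get
$$\|x_{n+1} - x\| = \frac{1}{\|y - \tilde{x}\|}\,\big\|y - (\tilde{x} + \|y - \tilde{x}\|x)\big\| \ge \frac{d}{d(1 + \varepsilon)} > 1 - \varepsilon,$$
since $\tilde{x} + \|y - \tilde{x}\|x \in X_n$ and $\frac{1}{1 + \varepsilon} > 1 - \varepsilon$. Thus an infinite sequence $\{x_n\}_{n=1}^\infty \subset B(o; 1)$ with no convergent subsequence has
been constructed, which contradicts compactness of $B(o; 1)$.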
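In the space $l^2$ of Remark 1.2.24 the construction in the proof can be made completely explicit: the unit vectors $e_n = (0, \dots, 0, 1, 0, \dots)$ lie in $B(o; 1)$ and satisfy $\|e_i - e_j\| = \sqrt{2}$ for $i \ne j$, so $\{e_n\}$ has no Cauchy (hence no convergent) subsequence. The sketch below checks the distances on truncations to $\mathbb{R}^N$; the truncation size $N$ is an arbitrary choice.

```python
import math

def unit_vector(n, N):
    """The n-th standard unit vector of R^N (a truncation of e_n in l^2)."""
    return [1.0 if i == n else 0.0 for i in range(N)]

def dist_2(u, v):
    """Euclidean (l^2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

N = 50
E = [unit_vector(n, N) for n in range(N)]
# all pairwise distances between distinct unit vectors, rounded to 12 digits
pairwise = {round(dist_2(E[i], E[j]), 12) for i in range(N) for j in range(i + 1, N)}
print(pairwise)  # a single value: sqrt(2)
```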

Example 1.2.16 (spaces of integrable functions). Let $\Omega$ be a Lebesgue measurable
subset of $\mathbb{R}^M$ and let $dx$ denote the Lebesgue measure in $\mathbb{R}^M$.
$^{18}$ If $z = x + iy$, $x, y \in \mathbb{R}$, then its complex conjugate $\bar{z}$ is defined by $\bar{z} := x - iy$.
$^{19}$ Every finite dimensional subspace $Y \subset X$ is complete, and therefore closed in $X$.


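For $p \in [1, \infty)$ we denote
$$\mathcal{L}^p(\Omega) := \Big\{f\colon \Omega \to \mathbb{R}\ (\text{or } \mathbb{C}) : f \text{ is measurable and } |f|_{L^p(\Omega)} := \Big(\int_\Omega |f(x)|^p\,dx\Big)^{\frac1p} < \infty\Big\}. \qquad (1.2.3)$$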

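The Minkowski inequality
$$|f + g|_{L^p(\Omega)} \le |f|_{L^p(\Omega)} + |g|_{L^p(\Omega)} \qquad (1.2.4)$$
implies that $\mathcal{L}^p(\Omega)$ is a linear space. Observe that $|\cdot|_{L^p(\Omega)}$ is not a norm since
$|f|_{L^p(\Omega)} = 0$ implies only $f = o$ almost everywhere (abbreviation: a.e.) in $\Omega$. Put
$$N(\Omega) = \{f\colon \Omega \to \mathbb{C} : f = o \text{ a.e. in } \Omega\}.$$
Then $N$ is a linear subspace of $\mathcal{L}^p$ and the factor space
$$L^p(\Omega) := \mathcal{L}^p(\Omega)|N$$
is a normed linear space with the norm $\|[f]\|_{L^p(\Omega)} = |f|_{L^p(\Omega)}$ for any $f \in [f]$.$^{20}$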

For the sake of simplicity we will use the notation $f$ instead of the superfluous $[f]$
for an element of $L^p(\Omega)$ and will call it simply a function. It is also convenient
to introduce the space $L^\infty(\Omega)$ of all (classes of) essentially bounded measurable
functions. We recall that $f$ is said to be essentially bounded on $\Omega$ if there is a
constant $c$ such that
$$|f(x)| \le c \qquad \text{for a.e. } x \text{ in } \Omega.$$
The least possible $c$ is denoted by $\|f\|_{L^\infty(\Omega)}$. Again $\|\cdot\|_{L^\infty(\Omega)}$ is a norm on $L^\infty(\Omega)$.
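We mention another important inequality, the so-called Hölder inequality:
If $1 \le p \le \infty$ and $p'$ is the conjugate exponent ($\frac{1}{p} + \frac{1}{p'} = 1$, where $\frac{1}{\infty}$ is
here defined to be $0$) and $f \in L^p(\Omega)$, $g \in L^{p'}(\Omega)$, then $fg \in L^1(\Omega)$ and
$$\|fg\|_1 \le \|f\|_p\,\|g\|_{p'}. \qquad (1.2.5)$$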
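Since sequence spaces are $L^p$ spaces for the counting measure (see Remark 1.2.24 below), the Minkowski inequality (1.2.4) and the Hölder inequality (1.2.5) can be sanity-checked on finite vectors. The following sketch (random data and tolerances are our choices) does this for several exponents, including $p = 1$ with $p' = \infty$.

```python
import random

INF = float("inf")

def p_norm(v, p):
    """Discrete L^p norm (counting measure); p = INF gives the max norm."""
    if p == INF:
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1 / p)

def conjugate(p):
    """Conjugate exponent p' with 1/p + 1/p' = 1 (1/inf read as 0)."""
    if p == 1:
        return INF
    if p == INF:
        return 1.0
    return p / (p - 1)

rng = random.Random(2)
for p in (1.0, 1.5, 2.0, 3.0, INF):
    q = conjugate(p)
    for _ in range(500):
        f = [rng.uniform(-1, 1) for _ in range(8)]
        g = [rng.uniform(-1, 1) for _ in range(8)]
        fg = [a * b for a, b in zip(f, g)]
        # Hölder (1.2.5): ||fg||_1 <= ||f||_p ||g||_p'
        assert p_norm(fg, 1) <= p_norm(f, p) * p_norm(g, q) + 1e-12
        # Minkowski (1.2.4): ||f + g||_p <= ||f||_p + ||g||_p
        f_plus_g = [a + b for a, b in zip(f, g)]
        assert p_norm(f_plus_g, p) <= p_norm(f, p) + p_norm(g, p) + 1e-12
print("Hölder and Minkowski hold on all samples")
```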

Proposition 1.2.17. $L^p(\Omega)$ is a Banach space for any $1 \le p \le \infty$.
Proof. We give the proof for $p = 1$ (some small modifications are needed for
$1 < p < \infty$, while the proof for $p = \infty$ is similar to the one of completeness of
$C(T)$, cf. Example 1.2.12). Let $\{f_n\}_{n=1}^\infty$ be a Cauchy sequence in $L^1(\Omega)$. Then for
$^{20}$ For the sake of simplicity we will use in the sequel the notation $\|\cdot\|_p$ instead of $\|\cdot\|_{L^p(\Omega)}$.


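any $k \in \mathbb{N}$ there is $n_k \in \mathbb{N}$ such that $\|f_n - f_{n_k}\|_1 < \frac{1}{2^k}$ for all $n \ge n_k$. We can assume
that the sequence $\{n_k\}_{k=1}^\infty$ is strictly increasing. Put $g_p = \sum_{k=1}^p |f_{n_{k+1}} - f_{n_k}|$. Since
$$\int_\Omega g_p(x)\,dx = \sum_{k=1}^p \int_\Omega |f_{n_{k+1}}(x) - f_{n_k}(x)|\,dx \le \sum_{k=1}^p \frac{1}{2^k},$$
the Monotone Convergence Theorem$^{21}$ gives that $g = \lim_{p\to\infty} g_p$ has a finite integral
over $\Omega$, and therefore $g$ is finite a.e. in $\Omega$. This means that
$$\sum_{k=1}^\infty |f_{n_{k+1}}(x) - f_{n_k}(x)|$$
is a.e. convergent, and therefore $f(x) := \lim_{k\to\infty} f_{n_k}(x)$ exists a.e. in $\Omega$. By the Fatou
Lemma$^{22}$ we have
$$\int_\Omega |f(x) - f_{n_k}(x)|\,dx \le \liminf_{l\to\infty}\int_\Omega |f_{n_l}(x) - f_{n_k}(x)|\,dx \le \frac{1}{2^{k-1}}.$$
In particular,
$$f \in L^1(\Omega) \qquad\text{and}\qquad \lim_{k\to\infty}\|f_{n_k} - f\|_1 = 0.$$
The rest of the proof is easy. Indeed, a Cauchy sequence which has a convergent
subsequence is itself convergent.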

Remark 1.2.18. The proof shows that the following statement is true:
If $\{f_n\}_{n=1}^\infty$ is convergent to $f$ in the $L^p$-norm, then there is a subsequence $\{f_{n_k}\}_{k=1}^\infty$ which converges to $f$ a.e., and there is $g \in L^p(\Omega)$,
$g \ge 0$, such that
$$|f_{n_k}(x)| \le g(x) \qquad \text{for a.e. } x \in \Omega.$$
Warning. The whole sequence need not be a.e. convergent! To see this arrange the
characteristic functions of the intervals $\big[\frac{k-1}{2^j}, \frac{k}{2^j}\big]$, $k = 1, \dots, 2^j$, $j = 0, 1, 2, \dots$, into a sequence.
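The sequence in the Warning (often called the "typewriter" sequence) can be generated explicitly. The sketch below, using exact rational arithmetic, lists the intervals level by level and records, for one fixed point $x$ (our choice $x = 1/3$), the values of the characteristic functions at $x$ together with their $L^1$ norms: the norms tend to $0$, yet the value $1$ occurs once on every level, so the sequence does not converge at $x$.

```python
from fractions import Fraction

def typewriter(levels):
    """Yield (j, k, a, b) for the intervals [a, b] = [(k-1)/2^j, k/2^j],
    k = 1, ..., 2^j, j = 0, ..., levels - 1, in the order of the sequence."""
    for j in range(levels):
        for k in range(1, 2 ** j + 1):
            yield j, k, Fraction(k - 1, 2 ** j), Fraction(k, 2 ** j)

def indicator(a, b, x):
    """Characteristic function of [a, b] evaluated at x."""
    return 1 if a <= x <= b else 0

x = Fraction(1, 3)  # any fixed point of [0, 1] behaves in the same way
values, l1_norms = [], []
for j, k, a, b in typewriter(10):
    values.append(indicator(a, b, x))
    l1_norms.append(b - a)  # ||chi_[a, b]||_1 = b - a = 2^(-j)

print(sum(values))   # 10: exactly one hit per level, so 1 recurs forever
print(l1_norms[-1])  # 1/512, tending to 0: convergence to 0 in L^1
```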
$^{21}$ This theorem reads as follows:
Let $\{g_n\}_{n=1}^\infty$ be an increasing sequence of nonnegative measurable functions on $\Omega$
and let $g = \lim_{n\to\infty} g_n$. Then
$$\lim_{n\to\infty}\int_\Omega g_n(x)\,dx = \int_\Omega g(x)\,dx.$$
$^{22}$ The Fatou Lemma reads:
Let $\{h_n\}_{n=1}^\infty$ be a sequence of measurable functions which are uniformly bounded
below by an $h \in L^1(\Omega)$. Then
$$\int_\Omega \liminf_{n\to\infty} h_n(x)\,dx \le \liminf_{n\to\infty}\int_\Omega h_n(x)\,dx.$$
The statement holds for $\limsup$ with the reverse inequality for a sequence bounded
above by an integrable function.
Put here $h_l = |f_{n_l} - f_{n_k}|$.


Approximations of integrable functions by more regular functions, like continuous or differentiable ones, are often desirable.
Proposition 1.2.19 (Density Theorem). For any $p \in [1, \infty)$ the subset $C(\Omega) \cap L^p(\Omega)$
is dense in $L^p(\Omega)$.
Proof. It is based on the application of the Luzin Theorem.$^{23}$ See also Proposition 1.2.21 below.

We now show another type of approximation which is more constructive and
therefore often more convenient in applications. If $f$, $g$ are measurable functions
on $\mathbb{R}^M$, then we define their convolution $f * g$ as
$$(f * g)(x) := \int_{\mathbb{R}^M} f(x - y)g(y)\,dy \qquad (1.2.6)$$
for all $x \in \mathbb{R}^M$ for which the integral exists. We note that the properties of the convolution follow from the Fubini Theorem provided measurability of the function $(x, y) \mapsto
f(x - y)g(y)$ is established. For details see, e.g., Folland [52], Gripenberg, Londen
& Staffans [62, Chapters 2–4], and also Example 2.1.28. The following assertion is
a basic result on convolutions.
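Proposition 1.2.20. Let $f \in L^1(\mathbb{R}^M)$.
(i) If $g \in L^p(\mathbb{R}^M)$, $1 \le p \le \infty$, then
$$f * g \in L^p(\mathbb{R}^M) \qquad\text{and}\qquad \|f * g\|_p \le \|f\|_1\,\|g\|_p.$$
(ii) If $g \in L^\infty(\mathbb{R}^M)$, then $f * g$ is bounded and uniformly continuous on $\mathbb{R}^M$.
(iii) If $g \in L^p(\mathbb{R}^M)$ and $\frac{\partial g}{\partial x_i} \in L^p(\mathbb{R}^M)$, then $\frac{\partial}{\partial x_i}(f * g) = f * \frac{\partial g}{\partial x_i}$
a.e. in $\mathbb{R}^M$.
(iv) If $\varphi$ is a nonnegative, measurable function with $\int_{\mathbb{R}^M}\varphi(x)\,dx = 1$ (the so-called
mollifier) and $\varphi_n(x) := n^M\varphi(nx)$, then $\varphi_n * g$ converge to $g$ in the
$L^p$-norm for any $g \in L^p(\mathbb{R}^M)$, $1 \le p < \infty$.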
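Proposition 1.2.20(iv) can be observed numerically in one dimension ($M = 1$, $p = 1$). As assumptions made only for this sketch, the mollifier is the hat function $\varphi(x) = \max(1 - |x|, 0)$ (nonnegative with integral $1$; (iv) does not require smoothness) and $g = \chi_{[0,1]}$. A direct computation gives $\|\varphi_n * g - g\|_1 = \frac{2}{3n}$ for $n \ge 2$, and the midpoint-rule approximation below should reproduce this value.

```python
def hat(x):
    """Mollifier phi(x) = max(1 - |x|, 0): nonnegative with integral 1."""
    return max(1.0 - abs(x), 0.0)

def phi_n(x, n):
    """phi_n(x) = n^M phi(n x) with M = 1."""
    return n * hat(n * x)

def g(y):
    """The characteristic function of [0, 1], an element of L^1(R)."""
    return 1.0 if 0.0 <= y <= 1.0 else 0.0

def mollify(x, n, m=200):
    """(phi_n * g)(x) = int phi_n(x - y) g(y) dy by the midpoint rule; the
    integrand vanishes outside [x - 1/n, x + 1/n] intersected with [0, 1]."""
    a, b = max(0.0, x - 1.0 / n), min(1.0, x + 1.0 / n)
    if a >= b:
        return 0.0
    h = (b - a) / m
    return h * sum(phi_n(x - (a + (i + 0.5) * h), n) for i in range(m))

def l1_error(n, lo=-0.5, hi=1.5, m=1200):
    """Midpoint approximation of ||phi_n * g - g||_1 over [lo, hi]
    (both functions vanish outside this interval when n >= 2)."""
    h = (hi - lo) / m
    return h * sum(abs(mollify(lo + (i + 0.5) * h, n) - g(lo + (i + 0.5) * h))
                   for i in range(m))

errors = {n: l1_error(n) for n in (4, 8, 16)}
for n, err in errors.items():
    print(n, err, 2 / (3 * n))  # numerical L^1 error vs. the exact value 2/(3n)
```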
If $T$ is a topological space and $f\colon T \to \mathbb{R}$ ($\mathbb{C}$), then the support of $f$ (abbreviation $\operatorname{supp} f$) is the set
$$\overline{\{x \in T : f(x) \ne 0\}}.$$
If $\Omega \subset \mathbb{R}^M$ is an open set, then $\mathcal{D}(\Omega)$ denotes the set of all infinitely differentiable
functions on $\Omega$ (i.e., their derivatives of arbitrary order are continuous in $\Omega$) which
have compact support lying in $\Omega$.
$^{23}$ Roughly speaking, the Luzin Theorem says that a bounded measurable function becomes continuous
when restricted to suitable sets whose measures are arbitrarily close to the measure of $\Omega$, provided the
latter is finite. For a more general formulation and the proof of the Luzin Theorem the reader
can consult, e.g., Rudin [113, 2.23].


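We show that $\mathcal{D}(\Omega)$ contains enough functions. Put
$$\varphi(x) = \begin{cases} e^{-\frac{1}{1 - x^2}}, & |x| < 1,\\ 0, & |x| \ge 1.\end{cases}$$
It is a matter of simple calculation to prove that $\varphi \in \mathcal{D}(\mathbb{R})$. If $a \in \Omega$, then
$\overline{B(a; \delta)} \subset \Omega$ for a $\delta > 0$ small enough and the function $\psi(y) := \varphi\big(\frac{2}{\delta}\|y - a\|_{\mathbb{R}^M}\big)$
belongs to $\mathcal{D}(\Omega)$. However, much more is true.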
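The function $\varphi$ and its rescaled version $\psi$ are easy to evaluate. The sketch below (function names are ours) checks the support property of $\psi$ and illustrates the flatness of $\varphi$ at $|x| = 1$: the quotient $\varphi(1 - t)/t^5$ tends to $0$ as $t \to 0+$ (the same holds for every power of $t$), which is the key point in showing that all derivatives of $\varphi$ vanish at $\pm 1$.

```python
import math

def bump(x):
    """phi(x) = exp(-1 / (1 - x^2)) for |x| < 1 and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def psi(y, a, delta):
    """psi(y) = phi((2 / delta) * ||y - a||) for points y, a of R^M given as
    tuples; it vanishes unless ||y - a|| < delta / 2."""
    return bump(2.0 * math.dist(y, a) / delta)

print(bump(0.0))                          # exp(-1), the maximum of phi
print(psi((0.3, 0.0), (0.0, 0.0), 0.5))   # 0.0: the point lies outside B(a; delta/2)
decay = [bump(1 - t) / t ** 5 for t in (0.1, 0.03, 0.01)]
print(decay)                              # tends to 0: phi is flat at x = 1
```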
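Proposition 1.2.21. Let $\Omega$ be an open set in $\mathbb{R}^M$ and let $p \in [1, \infty)$. Then $\mathcal{D}(\Omega)$
is dense in $L^p(\Omega)$.
Proof. The function $x \mapsto \varphi(\|x\|_{\mathbb{R}^M})$ with the just defined $\varphi$, multiplied by an appropriate constant, satisfies
the assumptions of Proposition 1.2.20(iv); we denote it again by $\varphi$. There is a strictly increasing sequence
of compact subsets $C_m$ of $\Omega$ such that
$$\bigcup_{m=1}^\infty C_m = \Omega.$$
Extend $f \in L^p(\Omega)$ by zero outside $\Omega$ and put
$$f_m = \chi_m f,$$
where $\chi_m$ is the characteristic function of the set $C_m$. Then $f_m \to f$ in the $L^p$-norm. By Proposition 1.2.20, $\varphi_n * f_m \in \mathcal{D}(\Omega)$ for $n \ge n_m$ and
$$\|\varphi_n * f_m - f\|_p \le \|\varphi_n * (f_m - f)\|_p + \|\varphi_n * f - f\|_p \le \|f_m - f\|_p + \|\varphi_n * f - f\|_p.$$
The result follows from Proposition 1.2.20(iv).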

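Remark 1.2.22. If $\operatorname{meas}\Omega < \infty$ and $1 \le p < \tilde{p} \le \infty$, then, by the Hölder inequality,
$$\|f\|_p \le (\operatorname{meas}\Omega)^{\frac{1}{p} - \frac{1}{\tilde{p}}}\,\|f\|_{\tilde{p}}, \qquad f \in L^{\tilde{p}}(\Omega). \qquad (1.2.7)$$
This means that the identity map of $L^{\tilde{p}}(\Omega)$ into $L^p(\Omega)$ is continuous. We will
denote this fact by $L^{\tilde{p}}(\Omega) \hookrightarrow L^p(\Omega)$ and say that $L^{\tilde{p}}(\Omega)$ is continuously embedded
into $L^p(\Omega)$.
Warning. Simple examples show that this is not true if $\operatorname{meas}\Omega = \infty$!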
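A quick numerical check of (1.2.7), together with a standard example behind the Warning; the specific functions and the midpoint rule are our choices for illustration. On $\Omega = (0, 2)$ with $p = 1$, $\tilde{p} = 2$ and $f(x) = x$, (1.2.7) reads $\|f\|_1 \le \sqrt{2}\,\|f\|_2$. On $(1, \infty)$ the function $g(x) = 1/x$ has $\|g\|_{L^2(1,R)} \le 1$ for every $R$, while $\|g\|_{L^1(1,R)} = \log R$ is unbounded.

```python
def lp_norm(f, p, a, b, m=100_000):
    """Midpoint-rule approximation of (int_a^b |f(x)|^p dx)^(1/p)."""
    h = (b - a) / m
    return (h * sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(m))) ** (1 / p)

# Omega = (0, 2), meas Omega = 2, p = 1 < p~ = 2, f(x) = x:
# (1.2.7) reads ||f||_1 <= 2^(1 - 1/2) ||f||_2
f = lambda x: x
lhs = lp_norm(f, 1, 0.0, 2.0)
rhs = 2 ** (1 - 1 / 2) * lp_norm(f, 2, 0.0, 2.0)
print(lhs, rhs)  # about 2.0 and 2.309

# meas Omega = infinity: g(x) = 1/x on (1, R) keeps ||g||_2 <= 1 for every R,
# while ||g||_1 = log R grows without bound, so no estimate like (1.2.7) holds.
g = lambda x: 1.0 / x
for R in (1e1, 1e3, 1e5):
    print(R, lp_norm(g, 1, 1.0, R), lp_norm(g, 2, 1.0, R))
```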
The following assertion is an analogue of Theorem 1.2.13.
Proposition 1.2.23 (A.N. Kolmogorov). Let $\Omega$ be an open set in $\mathbb{R}^M$. Then $\mathcal{M} \subset
L^p(\Omega)$, $p \in [1, \infty)$, is relatively compact if and only if the following conditions are
satisfied:
(i) $\mathcal{M}$ is bounded in $L^p(\Omega)$,



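(ii) $\forall \varepsilon > 0\ \exists \delta > 0\ \forall f \in \mathcal{M}$: $\displaystyle\int_\Omega |f(x + y) - f(x)|^p\,dx < \varepsilon$ for all $\|y\|_{\mathbb{R}^M} < \delta$,$^{24}$
(iii) $\forall \varepsilon > 0\ \exists \eta > 0\ \forall f \in \mathcal{M}$: $\displaystyle\int_{\{x \in \Omega : \|x\|_{\mathbb{R}^M} > \eta\}} |f(x)|^p\,dx < \varepsilon$.
Proof. For the proof based on Proposition 1.2.3 see Yosida [135, Chapter 10, 1].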

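Remark 1.2.24. All results from 1.2.16–1.2.23 also hold in spaces of sequences
$$l^p := \Big\{x = \{x_n\}_{n=1}^\infty : \|x\|_p = \Big(\sum_{n=1}^\infty |x_n|^p\Big)^{\frac1p} < \infty\Big\}$$
which can be regarded as $L^p(\mathbb{N})$ equipped with the counting measure ($\mu(A) =
\operatorname{card} A$).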
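Example 1.2.25 (spaces of differentiable functions). We can consider either classical derivatives (defined as limits of difference quotients) or weak derivatives. We
start with the former case.
Let $\alpha = (\alpha_1, \dots, \alpha_M)$ be a multiindex, i.e., $\alpha_i \in \mathbb{N} \cup \{0\}$, $i = 1, \dots, M$, and
$|\alpha| := \alpha_1 + \dots + \alpha_M$. For a function $f$ on an open set $\Omega \subset \mathbb{R}^M$ we put
$$D^\alpha f(x) := \frac{\partial^{|\alpha|} f(x)}{\partial x_1^{\alpha_1} \cdots \partial x_M^{\alpha_M}}$$
and say that $f \in C^n(\Omega)$ if $D^\alpha f$ are continuous for all multiindices $\alpha$ for which
$|\alpha| \le n$. We can use the metric given by (1.2.2) to define $\rho_\alpha(f, g) := \rho(D^\alpha f, D^\alpha g)$
for a multiindex $\alpha$ and set
$$\rho_n(f, g) := \sum_{|\alpha| \le n}\rho_\alpha(f, g).$$
Then $\rho_n$ is a metric on $C^n(\Omega)$ and the convergence in this metric is the locally
uniform convergence of all derivatives $D^\alpha$, $0 \le |\alpha| \le n$ ($D^o f = f$). Another
possibility is to consider only such functions $f \in C^n(\Omega)$ for which $D^\alpha f$ is bounded
in $\Omega$ for all $0 \le |\alpha| \le n$. We denote the collection of such functions by $C^n(\overline{\Omega})$$^{25}$
and put
$$\|f\|_{C^n(\overline{\Omega})} := \sum_{|\alpha| \le n}\sup_{x \in \Omega}|D^\alpha f(x)|.$$
This is a norm, $C^n(\overline{\Omega})$ is a Banach space, and the convergence of a sequence
$\{f_k\}_{k=1}^\infty \subset C^n(\overline{\Omega})$ to $f$ in this norm means that
$$D^\alpha f_k \to D^\alpha f \qquad \text{uniformly on } \Omega \text{ for all } |\alpha| \le n.$$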

$^{24}$ If $x + y \notin \Omega$, then we set $f(x + y) := 0$.
$^{25}$ In connection with this notation observe that for a relatively compact set $\Omega$ all derivatives
$D^\alpha f$, $|\alpha| \le n - 1$, are uniformly continuous, and therefore continuously extendable to $\overline{\Omega}$.


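It is sometimes convenient to have a finer scale of spaces of differentiable
functions. We can achieve that by introducing the Hölder continuous functions:
A function $f\colon \Omega \to \mathbb{R}$ (or $\mathbb{C}$) is called $\gamma$-Hölder continuous ($0 < \gamma \le 1$)
if there is a constant $c$ such that the inequality
$$|f(x) - f(y)| \le c\|x - y\|^\gamma$$
holds for all $x, y \in \Omega$.$^{26}$
The quantity
$$\|f\|_{C^{0,\gamma}(\overline{\Omega})} := \sup_{x \in \Omega}|f(x)| + \sup_{\substack{x, y \in \Omega\\ x \ne y}}\frac{|f(x) - f(y)|}{\|x - y\|^\gamma}$$
is a norm on the space $C^{0,\gamma}(\overline{\Omega})$ of $\gamma$-Hölder continuous, bounded functions on $\Omega$. The space $C^{n,\gamma}(\overline{\Omega})$ is defined similarly.
We note that $C^{n,\gamma}(\overline{\Omega})$ is a Banach space with respect to the above norm (cf. Exercise 7.1.4).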
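For instance, $f(x) = \sqrt{x}$ on $\Omega = (0, 1)$ is $\frac12$-Hölder continuous with $c = 1$ (since $\sqrt{x} - \sqrt{y} \le \sqrt{x - y}$ for $x \ge y \ge 0$) but not Lipschitz continuous. The sketch below estimates the Hölder quotient on sample points accumulating at $0$; the sampling is our choice, and a maximum over finitely many points only bounds the supremum in the norm from below.

```python
import itertools
import math

def holder_quotient(f, gamma, points):
    """max |f(x) - f(y)| / |x - y|^gamma over pairs of distinct sample points:
    a lower bound for the best constant c in the gamma-Hölder condition."""
    return max(abs(f(x) - f(y)) / abs(x - y) ** gamma
               for x, y in itertools.combinations(points, 2))

# sample points in [0, 1] accumulating at 0, where sqrt is steepest
pts = [0.0] + [2.0 ** -k for k in range(30)]
q_half = holder_quotient(math.sqrt, 0.5, pts)
q_one = holder_quotient(math.sqrt, 1.0, pts)
print(q_half)  # about 1.0: sqrt is 1/2-Hölder continuous with c = 1
print(q_one)   # about 2.3e4 = 2**14.5, growing as points are added near 0
```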
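Now we turn our attention to weak derivatives on an open set $\Omega \subset \mathbb{R}^M$. Let
$f \in L^1_{\mathrm{loc}}(\Omega)$
(this means that $f \in L^1(K)$ for every compact subset $K \subset \Omega$), and let $\alpha$ be a
multiindex. A function $g \in L^1_{\mathrm{loc}}(\Omega)$ is called an $\alpha$-weak derivative of $f$ if
$$\int_\Omega f(x)D^\alpha\varphi(x)\,dx = (-1)^{|\alpha|}\int_\Omega g(x)\varphi(x)\,dx \qquad \text{for every } \varphi \in \mathcal{D}(\Omega). \qquad (1.2.8)$$
We will denote $g = D^\alpha_w f$ and omit $w$ when there is no danger of ambiguity.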

Warning. Even in the one-dimensional case the ordinary derivative existing almost
everywhere need not be the weak derivative!
For example, the Heaviside function
$$H(x) = \begin{cases} 1, & x \ge 0,\\ 0, & x < 0,\end{cases}$$
satisfies
$$H'(x) = 0 \qquad \text{for } x \in \mathbb{R} \setminus \{0\},$$
but the weak derivative does not exist. The distributional derivative of $H$$^{27}$ is the
Dirac measure.
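Definition (1.2.8) and the Warning can be illustrated numerically for $\alpha = 1$. In the sketch below (the test function, its shift by $0.3$ and the quadrature are our choices) we check that $g = \operatorname{sign}$ satisfies (1.2.8) for $f(x) = |x|$ and one particular $\varphi \in \mathcal{D}(\mathbb{R})$, and that for the Heaviside function $\int H\varphi'\,dx = -\varphi(0) \ne 0$, so the a.e. derivative $H' = 0$ cannot satisfy (1.2.8). Testing a single $\varphi$ proves nothing; it only illustrates the identities.

```python
import math

def bump(x):
    """exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere: an element of D(R)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def dbump(x):
    """Classical derivative of bump: bump(x) * (-2x / (1 - x^2)^2) on (-1, 1)."""
    return bump(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1.0 else 0.0

def integrate(h, a=-2.0, b=2.0, m=40_000):
    """Midpoint rule on [a, b]; 0 is a cell boundary, so jumps at 0 fall between cells."""
    step = (b - a) / m
    return step * sum(h(a + (i + 0.5) * step) for i in range(m))

phi = lambda x: bump(x - 0.3)   # test function with phi(0) != 0, support [-0.7, 1.3]
dphi = lambda x: dbump(x - 0.3)
sign = lambda x: (x > 0) - (x < 0)
H = lambda x: 1.0 if x >= 0 else 0.0

# (1.2.8) with alpha = 1 for f(x) = |x| and g = sign: the two sides agree
lhs_abs = integrate(lambda x: abs(x) * dphi(x))
rhs_abs = -integrate(lambda x: sign(x) * phi(x))
# Heaviside: int H phi' dx = -phi(0), whereas H' = 0 a.e. would give 0
lhs_H = integrate(lambda x: H(x) * dphi(x))
print(lhs_abs, rhs_abs)  # agree up to quadrature error
print(lhs_H, -phi(0.0))  # agree, and both are nonzero
```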
$^{26}$ If $\gamma = 1$, then it is more common to say that $f$ is a Lipschitz continuous function. We note
that, for a connected $\Omega$, the inequality is satisfied for a $\gamma > 1$ only if $f$ is a constant function (cf. Exercise 7.1.6).
$^{27}$ A linear form $T$ on the linear space $\mathcal{D}(\Omega)$ is called a distribution (this notion is due to
L. Schwartz) if it has the following continuity property:
If $\varphi_n \in \mathcal{D}(\Omega)$ have their supports in the same compact set $K \subset \Omega$ and $D^\alpha\varphi_n \to D^\alpha\varphi$
uniformly on $\Omega$ for all multiindices $\alpha$, then
$$T(\varphi_n) \to T(\varphi).$$


We note that an absolutely continuous function $f$ on an interval $I \subset \mathbb{R}$ has
a derivative $f'$ a.e. (the Lebesgue Theorem, see Rudin [113]), and
$$f(x) = f(a) + \int_a^x f'(y)\,dy \qquad \text{for } a, x \in I.$$
This implies that $D_w f = f'$. The situation in higher dimensions is not so simple
since there are several non-equivalent definitions of absolutely continuous functions.
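Having a definition of weak derivatives we can define Sobolev spaces $W^{k,p}(\Omega)$
for an open set $\Omega \subset \mathbb{R}^M$ as follows:
$$W^{k,p}(\Omega) := \{f \in L^p(\Omega) : \text{derivatives } D^\alpha_w f \text{ exist and belong to } L^p(\Omega) \text{ for all } |\alpha| \le k\}$$
with the norm$^{28}$
$$\|f\|_{W^{k,p}(\Omega)} := \sum_{|\alpha| \le k}\|D^\alpha_w f\|_p. \qquad (1.2.9)$$
Similarly to the definition of $L^p$ spaces, classes of functions are considered here.
Since $L^p(\Omega)$ is a Banach space, $W^{k,p}(\Omega)$ is a Banach space, too: if $\{f_n\}_{n=1}^\infty$ is a Cauchy sequence in $W^{k,p}(\Omega)$, then each $D^\alpha_w f_n$ converges in $L^p(\Omega)$ to some $g_\alpha$, and passing to the limit in (1.2.8) shows that $g_\alpha = D^\alpha_w g_o$.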
As we will see later in this book, Sobolev spaces play an important role in
the study of boundary value problems. For this purpose the following assertions
are important.
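Theorem 1.2.26 (Sobolev Embedding Theorem). Let $k \in \mathbb{N}$ and let $p \in [1, \infty)$.
(i) If $k < \frac{N}{p}$, then $W^{k,p}(\mathbb{R}^N) \hookrightarrow L^{p^*}(\mathbb{R}^N)$ for $\frac{1}{p^*} = \frac{1}{p} - \frac{k}{N}$.$^{29}$
(ii) If $k = \frac{N}{p}$, then
$$W^{k,p}(\mathbb{R}^N) \hookrightarrow L^r(\mathbb{R}^N) \qquad \text{for all } r \in [p, \infty)$$
and
$$W^{k,p}(\mathbb{R}^N) \subset L^r_{\mathrm{loc}}(\mathbb{R}^N) \qquad \text{for all } r \ge 1.$$
(iii) If $k > \frac{N}{p}$, then $W^{k,p}(\mathbb{R}^N) \hookrightarrow C^{0,\gamma}(\overline{\mathbb{R}^N})$ for all $0 < \gamma \le k - \frac{N}{p}$, $\gamma < 1$.$^{30}$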

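Note that any $f \in L^1_{\mathrm{loc}}(\Omega)$ (and even a regular Borel measure on $\Omega$, see, e.g., Rudin [113]) yields
a distribution $T_f$ by the formula $T_f(\varphi) = \int_\Omega f(x)\varphi(x)\,dx$ for any $\varphi \in \mathcal{D}(\Omega)$. The distributional
derivative $D^\alpha T$ of a distribution $T$ is defined as $D^\alpha T(\varphi) := (-1)^{|\alpha|}T(D^\alpha\varphi)$, $\varphi \in \mathcal{D}(\Omega)$. It is easy
to prove that $D^\alpha T$ is again a distribution, and an $\alpha$-weak derivative of $f \in L^1_{\mathrm{loc}}(\Omega)$ is actually
equal to the distributional derivative $D^\alpha T_f$. As the Heaviside function shows, the converse is not
true.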
$^{28}$ Similarly as for the Lebesgue norm we will use in the sequel the notation $\|\cdot\|_{k,p}$ instead of
$\|\cdot\|_{W^{k,p}(\Omega)}$ for the Sobolev norm.
$^{29}$ The exponent $p^* := \frac{pN}{N - kp}$ is sometimes called the critical Sobolev exponent.
$^{30}$ This means that any function $f \in W^{k,p}(\mathbb{R}^N)$ can be changed on a set of measure zero in such
a way that the new function $\tilde{f}$ is $\gamma$-Hölder continuous and $\|\tilde{f}\|_{C^{0,\gamma}(\overline{\mathbb{R}^N})} \le c\,\|f\|_{W^{k,p}(\mathbb{R}^N)}$. The
symbol $\overline{\mathbb{R}^N}$ means that functions from $C^{0,\gamma}(\overline{\mathbb{R}^N})$ are bounded and uniformly $\gamma$-Hölder continuous
on the whole $\mathbb{R}^N$.


Proof. Proofs of these statements are quite involved and also have a long history.
The interested reader can consult, e.g., Adams [2], Kufner, John & Fučík [82],
Maz'ja [93], Stein [123, Chapters V, VI]. For a readable account of Sobolev spaces
we recommend Evans [48, Chapter 5]. Spaces with fractional derivatives, which
extend the class of Sobolev spaces, can also be defined; see, e.g., Triebel [128], [129].
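Which case of Theorem 1.2.26 applies depends only on the sign of $N - kp$. The helper below (a hypothetical name, for illustration only) classifies a triple $(k, p, N)$ and returns the critical exponent $p^* = \frac{pN}{N - kp}$ of footnote 29 in case (i), or the bound $k - \frac{N}{p}$ on the Hölder exponent in case (iii). Exact fractions are used so that the borderline test $k = \frac{N}{p}$ is not spoiled by rounding.

```python
from fractions import Fraction

def sobolev_case(k, p, N):
    """Which case of Theorem 1.2.26 applies to W^{k,p}(R^N) (k >= 1, 1 <= p < inf).

    Returns ("Lp*", p*) in case (i), ("Lr", None) in case (ii) and
    ("Holder", k - N/p) in case (iii)."""
    k, p, N = Fraction(k), Fraction(p), Fraction(N)
    if k * p < N:                          # (i): k < N/p
        return "Lp*", p * N / (N - k * p)  # 1/p* = 1/p - k/N
    if k * p == N:                         # (ii): k = N/p
        return "Lr", None
    return "Holder", k - N / p             # (iii): bound on the Hölder exponent

print(sobolev_case(1, 2, 3))  # ('Lp*', Fraction(6, 1)): W^{1,2}(R^3) into L^6(R^3)
print(sobolev_case(1, 2, 2))  # ('Lr', None): the borderline case
print(sobolev_case(1, 4, 2))  # ('Holder', Fraction(1, 2))
```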
Remark 1.2.27. The situation for an open set $\Omega$ with a nonempty boundary (in
particular, for a bounded $\Omega$) is even more complicated because some techniques
from harmonic analysis, like the Fourier transform, are not available. One possibility is
to extend $f \in W^{k,p}(\Omega)$ to a function $\tilde{f} \in W^{k,p}(\mathbb{R}^N)$. This is possible if the boundary $\partial\Omega$ possesses certain smoothness properties. To explain this more precisely we
would need some facts about manifolds (see Section 4.3 and Appendix 4.3A). So
we omit details and just state that Theorem 1.2.26 (with $\mathbb{R}^N$ replaced by $\Omega$) is true provided $\partial\Omega$ is locally
Lipschitz (see Section 7.3 for details).
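Theorem 1.2.28 (Rellich–Kondrachov). Let $\Omega$ be a bounded open set in $\mathbb{R}^N$ with
a locally Lipschitz boundary, $k \in \mathbb{N}$, $p \in [1, \infty)$.
(i) Let $k < \frac{N}{p}$ and $q \in [1, p^*)$ where
$$p^* := \frac{pN}{N - kp}. \qquad (1.2.10)$$
Then the embedding of $W^{k,p}(\Omega)$ into $L^q(\Omega)$ is compact.$^{31}$
(ii) If $k = \frac{N}{p}$, then $W^{k,p}(\Omega) \hookrightarrow\hookrightarrow L^q(\Omega)$ for all $q \in [1, \infty)$.
(iii) If $0 < \gamma < k - \frac{N}{p}$ and $\gamma < 1$, then $W^{k,p}(\Omega) \hookrightarrow\hookrightarrow C^{0,\gamma}(\overline{\Omega})$.
Proof. For the proof see references given above.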

Now, we turn our attention to abstract spaces. Proposition 1.2.15 has pointed
out the difference between finite dimensional spaces and (infinite dimensional)
function spaces. Another difference between the finite and infinite dimension lies
in the notion of a basis. It can be shown that any algebraic basis in an infinite
dimensional Banach space $X$ has to be uncountable, and therefore the representation of a point by its coordinates can hardly be of any use. This observation leads
to the necessity of expressing an element of $X$ by an infinite sum. A sequence
$\{e_n\}_{n=1}^\infty \subset X$ is called a Schauder basis of $X$ if for each $x \in X$ there is a (uniquely
determined) sequence $\{\alpha_n\}_{n=1}^\infty$ of numbers (real or complex according to whether
$X$ is real or complex) such that
$$x = \sum_{n=1}^\infty \alpha_n e_n. \qquad (1.2.11)$$

$^{31}$ We will use the notation $\hookrightarrow\hookrightarrow$ for compact embeddings. An embedding of $X$ into $Y$ is compact
if every ball in $X$ is relatively compact in $Y$.


There are several imperfections in this definition. Namely, there are separable$^{32}$
Banach spaces which do not possess a Schauder basis. Moreover, the convergence
of the sum in (1.2.11) can be understood in several non-equivalent senses. These
problems do not appear in a special class of spaces with an additional structure
which is connected with the norm and allows measuring angles.
Definition 1.2.29. Let $X$ be a real (or complex) linear space. A mapping $(\cdot, \cdot)_X\colon
X \times X \to \mathbb{R}$ (or $\mathbb{C}$) is called a scalar product on $X$ if the following conditions are
satisfied:
(1) for any $y \in X$ the mapping $x \mapsto (x, y)_X$ is linear;
(2) $(x, y)_X = (y, x)_X$ for all $x, y \in X$ in the real case and $(x, y)_X = \overline{(y, x)_X}$ in
the complex case;
(3) $(x, x)_X \ge 0$ for every $x \in X$ and $(x, x)_X = 0$ if and only if $x = o$.
Proposition 1.2.30. Let $(\cdot, \cdot)$ be a scalar product on a linear space $X$. Then
(i) the so-called Schwarz inequality
$$|(x, y)|^2 \le (x, x)(y, y) \qquad (1.2.12)$$
holds for all $x, y \in X$;
(ii) the mapping $\|\cdot\|\colon x \mapsto [(x, x)]^{\frac12}$ is a norm on $X$.
Proof. Assertion (i). For $x, y \in X$ there exists $c \in \mathbb{C}$, $|c| = 1$, such that for $\tilde{y} = cy$
we have $(x, \tilde{y}) \in \mathbb{R}$; since $|(x, \tilde{y})| = |(x, y)|$ and $(\tilde{y}, \tilde{y}) = (y, y)$, it suffices to prove (1.2.12) for the real space $X$. For
any $\lambda \in \mathbb{R}$ we have
$$0 \le (x + \lambda y, x + \lambda y) = (x, x) + 2\lambda(x, y) + |\lambda|^2(y, y),$$
i.e., the discriminant $4|(x, y)|^2 - 4(x, x)(y, y)$ is nonpositive. Hence (1.2.12) follows.
In assertion (ii), only the triangle inequality has to be checked. For $x, y \in X$
we get$^{33}$
$$\|x + y\|^2 = (x + y, x + y) = (x, x) + 2\operatorname{Re}(x, y) + (y, y) \le \|x\|^2 + 2|(x, y)| + \|y\|^2$$
and the Schwarz inequality completes the proof.

If $X$ is a linear space with a scalar product we will always consider the norm
on $X$ induced by this scalar product. If $X$ is complete with respect to this norm,
then $X$ is called a Hilbert space and will be usually denoted by $H$. We note that
if $X$ is not complete there exists a Hilbert space $\widetilde{H}$ which is a completion of $X$.
$^{32}$ If a space $X$ has a Schauder basis, then $X$ is separable. This is not a serious drawback since
most function spaces used in analysis are separable.
$^{33}$ Notice here a typical procedure with the norm induced by a scalar product, namely using the
second power of the norm in calculation.


Example 1.2.31.
(i) $\mathbb{R}^M$ with the scalar product
$$(x, y) = \sum_{i=1}^M \xi_i\eta_i, \qquad x = \sum_{i=1}^M \xi_i e_i, \quad y = \sum_{i=1}^M \eta_i e_i$$
($e_1, \dots, e_M$ the standard basis) is a Hilbert space. Similarly, $\mathbb{C}^M$ is a Hilbert
space with respect to the scalar product $(x, y) = \sum_{i=1}^M \xi_i\overline{\eta_i}$.
(ii) The norm on $L^2(\Omega)$ given by (1.2.3) is induced by the scalar product
$$(f, g)_{L^2(\Omega)} = \int_\Omega f(x)\overline{g(x)}\,dx \qquad (1.2.13)$$
(in the complex case). Similarly, for $p = 2$ the norm (1.2.9) is equivalent to
the norm induced by the scalar product
$$(f, g)_{W^{k,2}(\Omega)} = \sum_{|\alpha| \le k}(D^\alpha f, D^\alpha g)_{L^2(\Omega)}.$$
(iii) The sup norm on $BC(\Omega)$ is not induced by any scalar product. This can
be seen from the parallelogram identity
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2, \qquad x, y \in X, \qquad (1.2.14)$$
which is valid only in such a space $X$ the norm of which is induced by a
scalar product. Indeed, if a norm satisfies (1.2.14), then (in the real case)
$$(x, y) = \frac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2\big) \qquad (1.2.15)$$
(polarization identity) has all properties of a scalar product, and the induced norm coincides with $\|\cdot\|$. It is not difficult to show that the sup
norm does not satisfy (1.2.14). Even more is true, namely, the sup norm
is not equivalent to any norm on $BC(\Omega)$ induced by a scalar product. Since
$C[0, 1] \subset L^2(0, 1)$, the scalar product (1.2.13) is also a scalar product on
$C[0, 1]$. But the space $C[0, 1]$ is not complete in the $L^2$-norm and, therefore,
the $L^2$-norm on $C[0, 1]$ cannot be equivalent to the sup norm; only the
inequality
$$\|f\|_{L^2(0,1)} \le \|f\|_{C[0,1]}$$
holds. Observe that $L^2(0, 1)$ is a completion of $C[0, 1]$ with respect to the
integral norm given by (1.2.3).
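The pair $x(t) = 1$, $y(t) = t$ gives a concrete failure of (1.2.14) for the sup norm on $C[0, 1]$: $\|x + y\| = 2$ and $\|x - y\| = \|x\| = \|y\| = 1$, so the left-hand side is $5$ and the right-hand side is $4$. The sketch below (with the sup norm approximated by a maximum over a grid, our simplification) confirms this and, for contrast, recovers the Euclidean scalar product on $\mathbb{R}^2$ from its norm via (1.2.15).

```python
import math

def sup_norm(f, m=1000):
    """Approximate sup norm on C[0, 1] by the maximum over the grid {i / m}."""
    return max(abs(f(i / m)) for i in range(m + 1))

def parallelogram_defect(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 for functions x, y; by (1.2.14)
    this vanishes whenever the norm is induced by a scalar product."""
    s = lambda t: x(t) + y(t)
    d = lambda t: x(t) - y(t)
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

one = lambda t: 1.0
ident = lambda t: t
print(parallelogram_defect(sup_norm, one, ident))  # 4 + 1 - 2 - 2 = 1.0

def polarization(norm, u, v):
    """Right-hand side of (1.2.15) for vectors u, v (real case)."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    return (norm(plus) ** 2 - norm(minus) ** 2) / 4

euclid = lambda v: math.sqrt(sum(t * t for t in v))
print(polarization(euclid, [1.0, 2.0], [3.0, -1.0]))  # about 1.0 = 1*3 + 2*(-1)
```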
The most useful concept in spaces with a scalar product is the following one.


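Definition 1.2.32. Let $X$ be a linear space with a scalar product $(\cdot, \cdot)$.
(1) Subsets $A, B \subset X$ are said to be orthogonal (and denoted by $A \perp B$) if
$(a, b) = 0$ for every $a \in A$, $b \in B$.
(2) A system $\{x_\gamma\}_{\gamma \in \Gamma} \subset X$ is said to be orthonormal if
$$(x_\gamma, x_\delta) = \begin{cases} 0, & \gamma \ne \delta,\\ 1, & \gamma = \delta.\end{cases}$$
(3) A sequence $\{e_n\}_{n=1}^\infty \subset X$ is called an orthonormal basis of $X$ if $\{e_n\}_{n=1}^\infty$ is
both an orthonormal system and a Schauder basis of $X$.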
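Suppose that $x_1, \dots, x_n$ are linearly independent elements of a space $X$ with a
scalar product $(\cdot, \cdot)$. Put $e_1 = \frac{x_1}{\|x_1\|}$, and if orthonormal elements $e_1, \dots, e_k$ ($k < n$)
are constructed in such a way that
$$\operatorname{Lin}\{x_1, \dots, x_k\} = \operatorname{Lin}\{e_1, \dots, e_k\},$$
then define
$$y_{k+1} = x_{k+1} - \sum_{j=1}^k (x_{k+1}, e_j)e_j, \qquad e_{k+1} = \frac{y_{k+1}}{\|y_{k+1}\|}$$
(note that $y_{k+1} \ne o$ since $x_{k+1} \notin \operatorname{Lin}\{e_1, \dots, e_k\}$). It is obvious that
$$(e_j, e_{k+1}) = 0, \quad j = 1, \dots, k, \qquad \|e_{k+1}\| = 1,$$
and
$$\operatorname{Lin}\{x_1, \dots, x_{k+1}\} = \operatorname{Lin}\{e_1, \dots, e_{k+1}\}.$$
This procedure is called the Schmidt orthogonalization. For any $x \in Y :=
\operatorname{Lin}\{x_1, \dots, x_n\}$ we have $x = \sum_{k=1}^n \alpha_k e_k$. Taking the scalar product with $e_j$, we get
$$(x, e_j) = \sum_{k=1}^n \alpha_k(e_k, e_j) = \alpha_j,$$
and also
$$\|x\|^2 = \Big(\sum_{j=1}^n (x, e_j)e_j, \sum_{k=1}^n (x, e_k)e_k\Big) = \sum_{k=1}^n |(x, e_k)|^2.$$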
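The Schmidt orthogonalization translates directly into code. The sketch below works in $\mathbb{R}^n$ with the standard scalar product (an assumption made for concreteness; the text allows any scalar product) and follows the recursion literally, i.e., the classical variant; in floating point the so-called modified variant, which projects the partially reduced vector instead of $x_{k+1}$, is known to be numerically more stable. It also checks the identity $\|x\|^2 = \sum_k |(x, e_k)|^2$ for $x \in Y$.

```python
import math

def dot(u, v):
    """Real scalar product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def schmidt(xs):
    """Schmidt orthogonalization of linearly independent vectors xs:
    y_{k+1} = x_{k+1} - sum_j (x_{k+1}, e_j) e_j,  e_{k+1} = y_{k+1} / ||y_{k+1}||."""
    es = []
    for x in xs:
        y = list(x)
        for e in es:
            c = dot(x, e)  # the coefficient (x_{k+1}, e_j)
            y = [a - c * b for a, b in zip(y, e)]
        norm_y = math.sqrt(dot(y, y))
        if norm_y < 1e-12:
            raise ValueError("vectors are (numerically) linearly dependent")
        es.append([a / norm_y for a in y])
    return es

xs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = schmidt(xs)
print([[round(dot(a, b), 12) for b in es] for a in es])  # the identity matrix
# for x in Y = Lin{x_1, x_2, x_3}: ||x||^2 = sum_k |(x, e_k)|^2
x = xs[2]
print(dot(x, x), sum(dot(x, e) ** 2 for e in es))  # both 2 (up to rounding)
```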


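Assume now that $X \ne Y$ and let us look for an approximation of a $y \in X \setminus Y$ by
an element $x = \sum_{j=1}^n \alpha_j e_j$:
$$\begin{aligned}
\|y - x\|^2 &= \Big(y - \sum_{j=1}^n \alpha_j e_j,\ y - \sum_{j=1}^n \alpha_j e_j\Big)\\
&= \|y\|^2 - \sum_{j=1}^n \overline{\alpha_j}(y, e_j) - \sum_{j=1}^n \alpha_j\overline{(y, e_j)} + \sum_{j=1}^n |\alpha_j|^2\\
&= \|y\|^2 + \sum_{j=1}^n |\alpha_j - (y, e_j)|^2 - \sum_{j=1}^n |(y, e_j)|^2\\
&\ge \|y\|^2 - \sum_{j=1}^n |(y, e_j)|^2 = \Big\|y - \sum_{j=1}^n (y, e_j)e_j\Big\|^2.
\end{aligned} \qquad (1.2.16)$$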
Two consequences follow from this inequality. First, the best approximation of
$y \in X$ by an element of $Y$ is$^{34}$
$$P_n y := \sum_{j=1}^n (y, e_j)e_j.$$
Observe also that $(y - P_n y) \perp Y$. Second,
$$\sum_{j=1}^n |(y, e_j)|^2 \le \|y\|^2 \qquad \text{for all } y \in X.$$
Since $n$ is arbitrary (in an infinite dimensional space) we have obtained the so-called Bessel inequality:
If $\{e_n\}_{n=1}^\infty$ is an orthonormal system in $X$, then
$$\sum_{n=1}^\infty |(y, e_n)|^2 \le \|y\|^2 \qquad \text{for all } y \in X. \qquad (1.2.17)$$
In particular, the sum $\sum_{j=1}^\infty |(y, e_j)|^2$ is always convergent.
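The two consequences of (1.2.16) can be observed in $\mathbb{R}^4$ with the Euclidean scalar product. In the sketch below (the orthonormal system and the random data are our choices), $P_n y$ is computed from the formula above, the residual $y - P_n y$ is checked to be orthogonal to $Y$, no sampled element of $Y$ comes closer to $y$, and the finite Bessel inequality holds.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(y, es):
    """P_n y = sum_j (y, e_j) e_j, the best approximation of y in Lin{es}."""
    p = [0.0] * len(y)
    for e in es:
        c = dot(y, e)
        p = [a + c * b for a, b in zip(p, e)]
    return p

s = 1 / math.sqrt(2)
es = [[s, s, 0.0, 0.0], [0.0, 0.0, s, -s]]  # an orthonormal system in R^4
rng = random.Random(3)
y = [rng.uniform(-1, 1) for _ in range(4)]
r = [a - b for a, b in zip(y, project(y, es))]  # residual y - P_n y
dist = math.sqrt(dot(r, r))

# y - P_n y is orthogonal to Y = Lin{e_1, e_2}
assert all(abs(dot(r, e)) < 1e-12 for e in es)
# every other sampled element of Y is at least as far from y, cf. (1.2.16)
for _ in range(1000):
    a1, a2 = rng.uniform(-3, 3), rng.uniform(-3, 3)
    x = [a1 * u + a2 * v for u, v in zip(*es)]
    assert dist <= math.sqrt(sum((p - q) ** 2 for p, q in zip(y, x))) + 1e-12
# finite Bessel inequality: the first value does not exceed the second
print(sum(dot(y, e) ** 2 for e in es), dot(y, y))
```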
$^{34}$ We note that this result, namely the linearity of the operator $P_n$ of the best approximation,
is typical for spaces with scalar products. In a general normed linear space $X$ and a finite
dimensional subspace $Y$ the best approximation of an arbitrary $x \in X$ by elements of $Y$ exists
(by a compactness argument) but a special property of the norm is needed for the uniqueness
of the best approximation. Linearity of the best approximation on all subspaces of dimension $2$
implies that the norm is induced by a scalar product. More details can be found in the monograph
of Singer [120].


Proposition 1.2.33. Let $X$ be a linear space with a scalar product, and let $X$ be separable.$^{35}$ Then there exists an orthonormal basis in $X$.
Proof. Let $\{x_1, x_2, x_3, \dots\}$ be a dense set in $X$. Put
$$Y_n = \operatorname{Lin}\{x_1, \dots, x_n\}, \qquad Y = \bigcup_{n=1}^\infty Y_n.$$
Then $\overline{Y} = X$. By omitting linearly dependent elements we can assume that
$\dim Y_n = n$. According to the Schmidt orthogonalization there exists an orthonormal
sequence $\{e_n\}_{n=1}^\infty$ such that $Y_n = \operatorname{Lin}\{e_1, \dots, e_n\}$. Let $x \in X$ and let $\{y_n\}_{n=1}^\infty$
be a sequence such that $y_n \in Y_n$ and $\lim_{n\to\infty} y_n = x$ (the density of $Y$ in $X$). By the
inequality (1.2.16),
$$\|x - y_n\| \ge \Big\|x - \sum_{j=1}^n (x, e_j)e_j\Big\|.$$
This means that $x = \sum_{j=1}^\infty (x, e_j)e_j$.
To prove uniqueness, suppose that $x = \sum_{j=1}^\infty \alpha_j e_j$. Since the scalar product is
continuous, we have
$$(x, e_k) = \lim_{n\to\infty}\Big(\sum_{j=1}^n \alpha_j e_j, e_k\Big) = \alpha_k.$$

In order to obtain some useful properties which guarantee that an orthonormal sequence is a basis we need to use completeness. We start with a general
approximation result.
Theorem 1.2.34. Let $H$ be a Hilbert space and let $C$ be a nonempty closed convex subset of
$H$. Then for any $x \in H$ there exists a unique $y \in C$ such that
$$\|x - y\| = \inf\{\|x - z\| : z \in C\}. \qquad (1.2.18)$$
This best approximation $y$ is characterized by the following property: $y \in C$
and
$$\operatorname{Re}(x - y, y - z) \ge 0 \qquad \text{for all } z \in C \qquad (1.2.19)$$
(see Figure 1.2.2$^{36}$).


$^{35}$ The assumption on separability is redundant. Without separability an orthonormal basis
$\{e_\gamma\}_{\gamma \in \Gamma}$ still exists but is uncountable. Moreover, if $x \in X$, then $(x, e_\gamma) = 0$ for all but
countably many $\gamma$.
$^{36}$ For $A \subset H$ we denote $A^\perp := \{x \in H : a \in A \implies (a, x) = 0\}$.


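Figure 1.2.2. (The closed convex set $C$, a point $x$ with its best approximation $y \in C$, a point $z \in C$, the vectors $x - y$ and $y - z$, the hyperplane $y + \{x - y\}^\perp$ and the origin $o$.)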

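Proof.
Step 1 (Existence). Denote the right-hand side in (1.2.18) by $d$. If $d = 0$, then
$x \in C$ ($C$ is closed) and $y = x$. Suppose that $d > 0$. Then there are $\{z_n\}_{n=1}^\infty \subset C$
such that
$$d \le \|x - z_n\| < d + \frac{1}{n}.$$
By (1.2.14) we get
$$\begin{aligned}
\|z_n - z_m\|^2 &= \|x - z_m - (x - z_n)\|^2 = 2\big(\|x - z_m\|^2 + \|x - z_n\|^2\big) - 4\Big\|x - \frac{z_n + z_m}{2}\Big\|^2\\
&< 2\Big(d + \frac{1}{n}\Big)^2 + 2\Big(d + \frac{1}{m}\Big)^2 - 4d^2
\end{aligned}$$
(notice that $\frac{z_n + z_m}{2} \in C$ since $C$ is convex). This implies that $\{z_n\}_{n=1}^\infty$ is a Cauchy
sequence, and therefore it is convergent to a $y \in C$. Obviously, $\|x - y\| = d$.
Step 2 (Uniqueness). Assume that $\|x - y\| = \|x - \tilde{y}\| = d$ for $y, \tilde{y} \in C$. Using
(1.2.14) as above we get $y = \tilde{y}$.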
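Step 3 (Characterization). Let $y$ be the best approximation of $x$ and let $z \in C$.
Then
$$z_t := tz + (1 - t)y \in C \qquad \text{for } t \in (0, 1)$$
($C$ is convex) and
$$\|x - z_t\|^2 = \|x - y + t(y - z)\|^2 = \|x - y\|^2 + t^2\|y - z\|^2 + 2t\operatorname{Re}(x - y, y - z) \ge \|x - y\|^2,$$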


i.e.,
ty z2 + 2 Re(x y, y z) 0
and taking the limit for t 0+ , the inequality (1.2.19) follows. If (1.2.19) is
satised, then
x z2 = x y + y z2 = x y2 + y z2 + 2 Re(x y, y z) x y2 ,


and therefore y is the best approximation of x.
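The characterization (1.2.19) can be checked numerically in the simplest finite dimensional setting. The following minimal Python sketch assumes $H = \mathbb R^3$ with the Euclidean scalar product and takes for $C$ a closed box (a closed convex set), where the best approximation is obtained by clipping each coordinate; the helper names are hypothetical and chosen only for illustration.

```python
import random


def project_onto_box(x, lo, hi):
    """Best approximation of x in the box C = [lo_1, hi_1] x ... x [lo_n, hi_n].

    For a box, minimizing ||x - z|| over z in C decouples into coordinates,
    so each coordinate is simply clipped to its interval.
    """
    return [min(max(xi, a), b) for xi, a, b in zip(x, lo, hi)]


def inner(u, v):
    """Euclidean scalar product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def worst_characterization_value(x, lo, hi, samples=2000, seed=0):
    """Smallest sampled value of (x - y, y - z) over random z in C.

    By (1.2.19) this value should never be negative when y is the best approximation.
    """
    rng = random.Random(seed)
    y = project_onto_box(x, lo, hi)
    r = [xi - yi for xi, yi in zip(x, y)]
    worst = float("inf")
    for _ in range(samples):
        z = [rng.uniform(a, b) for a, b in zip(lo, hi)]
        worst = min(worst, inner(r, [yi - zi for yi, zi in zip(y, z)]))
    return y, worst


x = [2.0, -0.5, 0.3]
lo, hi = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
y, worst = worst_characterization_value(x, lo, hi)
print(y, worst >= 0.0)
```

Sampling only tests (1.2.19) at finitely many $z$; it illustrates the inequality but of course does not prove it.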

Corollary 1.2.35. Let $H$ be a Hilbert space and $M$ a closed linear subspace of $H$, $M \ne H$, $M \ne \{o\}$. Then there exists a unique subspace $M^\perp$ (the so-called orthogonal complement of $M$) such that
$$H = M \oplus M^\perp, \qquad M \perp M^\perp.$$
Moreover, if $P$ denotes the projection onto $M$ given by this direct sum³⁷ (the so-called orthogonal projection), then $P \in L(H)$, $\|P\|_{L(H)} = 1$ and
$$(Px, y) = (x, Py) \quad \text{for all } x, y \in H.$$

Proof. A closed linear subspace $M$ is a closed convex set. Denote by $Px \in M$ the best approximation of $x \in H$ in $M$. Choose $w \in M$ and put
$$z = Px - w \in M.$$
By (1.2.19) we get $\operatorname{Re}(x - Px, w) \ge 0$ and also $\operatorname{Re}(x - Px, iw) \ge 0$ (by taking $z = Px - iw$). Since $\operatorname{Re}(x - Px, iw) = \operatorname{Im}(x - Px, w)$, both the real and the imaginary parts of $(x - Px, w)$ are nonnegative. Since also $-w \in M$, we finally have
$$(x - Px, w) = 0 \quad \text{for all } w \in M. \qquad (1.2.20)$$
It is easy to see that (1.2.20) is a characterization of $Px$. By using (1.2.20) for $x_1$, $x_2$ and $\alpha x_1 + \beta x_2$, we see that $P$ is a linear operator. Since $P^2 = P$, $P$ is a projection onto $M$. The identity (1.2.20) also shows that
$$\operatorname{Ker} P = M^\perp := \{y \in H : x \in M \implies (x, y) = 0\}.$$
By the orthogonality of $Px$ and $x - Px$ we have
$$\|x\|^2 = \|Px\|^2 + \|x - Px\|^2 \ge \|Px\|^2, \quad \text{i.e.,} \quad \|P\|_{L(H)} \le 1.$$
Since $Px = x$ for $x \in M$, $\|P\|_{L(H)} = 1$. By (1.2.20) we get
$$(x, Py) = (x - Px + Px, Py) = (Px, Py) = (Px, Py - y + y) = (Px, y).$$
37 Cf. Example 1.1.13(i).
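The proof above characterizes $Px$ by (1.2.20). A minimal Python sketch, assuming $H = \mathbb R^3$ with the Euclidean scalar product and $M$ spanned by two illustrative vectors, builds an orthonormal basis $e_1, e_2$ of $M$ by Gram–Schmidt and computes $Px = \sum_k (x, e_k) e_k$; it then checks (1.2.20), $P^2 = P$, $(Px, y) = (x, Py)$ and $\|Px\| \le \|x\|$. The function names are hypothetical helpers.

```python
import math


def inner(u, v):
    """Euclidean scalar product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize the given vectors, dropping (numerically) dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(inner(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def orthogonal_projection(x, basis):
    """P x = sum_k (x, e_k) e_k for an orthonormal basis {e_k} of M."""
    px = [0.0] * len(x)
    for e in basis:
        c = inner(x, e)
        px = [pi + c * ei for pi, ei in zip(px, e)]
    return px


spanning = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # M = Lin{(1,1,0), (0,1,1)}
basis = gram_schmidt(spanning)
x, y = [1.0, 2.0, 3.0], [-1.0, 0.5, 4.0]
px, py = orthogonal_projection(x, basis), orthogonal_projection(y, basis)
residual = [a - b for a, b in zip(x, px)]        # x - P x, should be orthogonal to M
print(px, [inner(residual, w) for w in spanning])
```

Here $M^\perp$ is spanned by $(1, -1, 1)$, so one can also verify by hand that $Px = (1/3, 8/3, 7/3)$.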


Corollary 1.2.36. Let $H$ be a Hilbert space and let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal sequence in $H$. Then the following statements are equivalent:
(i) $\{e_n\}_{n=1}^{\infty}$ is an orthonormal basis;
(ii) if $(x, e_n) = 0$ for all $n$, then $x = o$;
(iii) $\operatorname{Lin}\{e_1, e_2, \dots\}$ is dense in $H$;
(iv) the Parseval equality
$$\|x\|^2 = \sum_{n=1}^{\infty} |(x, e_n)|^2 \qquad (1.2.21)$$
is valid for all $x \in H$.

Proof. The implication (i)$\Rightarrow$(ii) is obvious and follows from the definition of the orthonormal basis.
The implication (ii)$\Rightarrow$(iii): Denote $Y = \operatorname{Lin}\{e_1, e_2, \dots\}$. Assume that $Y$ is not dense, i.e., $\overline{Y} \ne H$. By Corollary 1.2.35 (applied to the closed subspace $\overline{Y}$) there exists $x \in \overline{Y}^{\perp} \setminus \{o\}$. In particular, $(x, e_n) = 0$ for all $n$, a contradiction.
The implication (iii)$\Rightarrow$(iv): The proof of Proposition 1.2.33 shows that the sequence
$$s_n := \sum_{k=1}^{n} (x, e_k) e_k$$
converges to $x$ for all $x \in H$. Moreover, $s_n \perp (x - s_n)$, and hence
$$\|x\|^2 = \|s_n\|^2 + \|x - s_n\|^2 = \sum_{k=1}^{n} |(x, e_k)|^2 + \|x - s_n\|^2.$$
By taking the limit, the Parseval equality follows.
The implication (iv)$\Rightarrow$(i): Let $x \in H$ be arbitrary. For $s_n$ defined as above and $m > n$ we have
$$\|s_m - s_n\|^2 = \sum_{k=n+1}^{m} |(x, e_k)|^2.$$
Since the series in (1.2.21) is convergent, the sequence $\{s_n\}_{n=1}^{\infty}$ is Cauchy, and therefore it is convergent to a $y \in H$ since $H$ is complete. Moreover, $(y, e_n) = (x, e_n)$, and by the Parseval equality
$$\|x - y\|^2 = \sum_{n=1}^{\infty} |(x - y, e_n)|^2 = 0,$$
i.e., $x = y = \sum_{n=1}^{\infty} (x, e_n) e_n$.
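A finite dimensional analogue of (1.2.21) can be checked directly. The sketch below assumes $H = \mathbb C^N$ with $(u, v) = \sum_k u_k \overline{v_k}$ and uses the discrete Fourier vectors $e_n(k) = e^{2\pi i nk/N}/\sqrt N$, which form an orthonormal basis of $\mathbb C^N$; the sample vector and helper names are illustrative assumptions. It checks the Parseval equality, the Bessel inequality for partial sums, and the reconstruction $x = \sum_n (x, e_n) e_n$.

```python
import cmath
import math


def dft_orthonormal_basis(N):
    """e_n(k) = exp(2*pi*i*n*k/N) / sqrt(N), n, k = 0, ..., N-1: an orthonormal basis of C^N."""
    return [[cmath.exp(2j * math.pi * n * k / N) / math.sqrt(N) for k in range(N)]
            for n in range(N)]


def inner(u, v):
    """Scalar product on C^N, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.0, 1.0, -1.0]
basis = dft_orthonormal_basis(len(x))
coeffs = [inner(x, e) for e in basis]                     # (x, e_n)

norm_sq = inner(x, x).real                                # ||x||^2 = 17.25
parseval = sum(abs(c) ** 2 for c in coeffs)               # sum_n |(x, e_n)|^2
partial = [sum(abs(c) ** 2 for c in coeffs[:m]) for m in range(len(x) + 1)]
reconstruction = [sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(len(x))]
print(norm_sq, parseval)
```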

Remark 1.2.37. Let $H$ be a Hilbert space and $\{e_n\}_{n=1}^{\infty}$ an orthonormal basis in $H$. The proof of the last implication shows that for an arbitrary sequence $\{\alpha_n\}_{n=1}^{\infty} \subset \mathbb R$ (or $\mathbb C$ depending on whether $H$ is a real or complex space) for which $\sum_{n=1}^{\infty} |\alpha_n|^2$ is convergent, the series $\sum_{n=1}^{\infty} \alpha_n e_n$ is convergent in $H$ to an $x \in H$ and $(x, e_n) = \alpha_n$. Moreover, the operator from $H$ into $l^2(\mathbb N)$³⁸,
$$U : x \in H \mapsto \{(x, e_n)\}_{n=1}^{\infty} \in l^2(\mathbb N),$$
is a unitary operator (i.e., $(Ux, Uy)_{l^2(\mathbb N)} = (x, y)$, $x, y \in H$) which is surjective. It implies also that all infinite dimensional separable Hilbert spaces over the same field of scalars are unitarily equivalent. This statement is known as the Riesz–Fischer Theorem. Having this result we can ask why we do not restrict our attention to a single abstract separable Hilbert space. The reason is that in a special function space like $W^{k,2}(\Omega)$ one has more ways of computation since its elements are functions.
Example 1.2.38.
(i) The space $L^2(-\pi, \pi)$ is a Hilbert space. It is separable since continuous $2\pi$-periodic functions are dense in $L^2(-\pi, \pi)$ and any such function can be approximated by trigonometric polynomials of the type $\sum_{k=-n}^{n} a_k e^{ikt}$ (either the classical Weierstrass Approximation Theorem or Theorem 1.2.14). It is easy to see that
$$e_n : t \mapsto \frac{1}{\sqrt{2\pi}} e^{int}, \quad t \in (-\pi, \pi), \ n \in \mathbb Z,$$
form an orthonormal system in $L^2(-\pi, \pi)$. By Corollary 1.2.36(iii) it is also an orthonormal basis.³⁹
(ii) Functions
$$H_n(t) e^{-t^2/2}, \quad \text{where} \quad H_n(t) = (-1)^n e^{t^2} \frac{d^n}{dt^n} e^{-t^2}$$
(the so-called Hermite polynomials), form (after normalization) an orthonormal basis in $L^2(\mathbb R)$. For the proof and relevant results in harmonic analysis we recommend the classical book Kaczmarz & Steinhaus [70]. We note that there are many different orthonormal bases in $L^2$-spaces. We will present one general method of their construction in Theorem 2.2.16.
38 $l^2(\mathbb N)$ is the space of all (generally complex) sequences $x = \{\xi_n\}_{n=1}^{\infty}$ such that $\sum_{n=1}^{\infty} |\xi_n|^2$ is convergent. The scalar product on $l^2(\mathbb N)$ is given by $(x, y)_{l^2(\mathbb N)} = \sum_{n=1}^{\infty} \xi_n \overline{\eta_n}$ for $x = \{\xi_n\}_{n=1}^{\infty}$, $y = \{\eta_n\}_{n=1}^{\infty}$ (see also Remark 1.2.24).
39 Here this means that
$$f(t) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{+\infty} \hat f(n) e^{int} \quad \text{where} \quad \hat f(n) = (f, e_n)_{L^2(-\pi,\pi)} = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt,$$
and the series is convergent in the $L^2$-norm for arbitrary $f \in L^2(-\pi, \pi)$. It is worth noting that the series is actually a.e. convergent to $f$, but this by no means follows from the norm convergence. This result is due to L. Carleson and it is one of the most difficult and profound results in analysis.
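The orthonormality of the normalized Hermite functions in Example 1.2.38(ii) can be checked numerically. The sketch below assumes the standard normalization $h_n(t) = H_n(t) e^{-t^2/2} / \sqrt{2^n n! \sqrt\pi}$, evaluates $H_n$ by the three-term recurrence $H_{k+1} = 2tH_k - 2kH_{k-1}$ (equivalent to the Rodrigues formula above), and approximates the $L^2(\mathbb R)$ scalar products by the midpoint rule on $[-12, 12]$; truncation and step size are illustrative choices, and completeness of the system is of course not tested.

```python
import math


def hermite(n, t):
    """H_n(t) = (-1)^n e^{t^2} d^n/dt^n e^{-t^2}, via H_{k+1} = 2t H_k - 2k H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * t
    for k in range(1, n):
        h_prev, h = h, 2.0 * t * h - 2.0 * k * h_prev
    return h


def hermite_function(n, t):
    """Normalized Hermite function H_n(t) e^{-t^2/2} / sqrt(2^n n! sqrt(pi))."""
    c = math.sqrt(2.0 ** n * math.factorial(n) * math.sqrt(math.pi))
    return hermite(n, t) * math.exp(-t * t / 2.0) / c


def l2_inner(f, g, a=-12.0, b=12.0, m=2000):
    """Midpoint-rule approximation of int_a^b f(t) g(t) dt for real f, g.

    Truncating R to [a, b] is harmless here because of the Gaussian factor."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(m))


N = 5
gram = [[l2_inner(lambda t, j=j: hermite_function(j, t),
                  lambda t, k=k: hermite_function(k, t))
         for k in range(N)] for j in range(N)]
print(max(abs(gram[j][k] - (1.0 if j == k else 0.0)) for j in range(N) for k in range(N)))
```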

Proposition 1.2.39. Let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal basis in a Hilbert space $H$. Then a bounded set $M \subset H$ is relatively compact if and only if for any $\varepsilon > 0$ there is $k \in \mathbb N$ such that
$$\sum_{n=k}^{\infty} |(x, e_n)|^2 < \varepsilon \quad \text{for all } x \in M.$$
Proof. The statement follows from Proposition 1.2.3.

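Theorem 1.2.40 (Riesz Representation Theorem). Let $H$ be a Hilbert space and let $F$ be a continuous linear form on $H$. Then there is a unique $f \in H$ such that
$$F(x) = (x, f) \quad \text{for all } x \in H.$$
Moreover, $\|F\| = \|f\|$ where $\|F\| = \|F\|_{L(H,\mathbb R)}$ or $\|F\| = \|F\|_{L(H,\mathbb C)}$ depending on whether $H$ is a real or a complex space.
Proof. Suppose that $H$ is a complex Hilbert space. If $F = o$, then $f = o$. Suppose that $F \ne o$. The idea of constructing $f$ is that $f$ has to be orthogonal to $\operatorname{Ker} F$, which is a closed subspace of $H$. By Corollary 1.2.35,
$$H = \operatorname{Ker} F \oplus (\operatorname{Ker} F)^\perp.$$
Since $F \ne o$, the subspace $\operatorname{Ker} F$ has codimension one, so $(\operatorname{Ker} F)^\perp$ is one-dimensional. Take $x_0 \in (\operatorname{Ker} F)^\perp$, $\|x_0\| = 1$, and put $f = \alpha x_0$ where $\alpha$ will be determined later. Every $x \in H$ can be written as $x = y + \lambda x_0$ with $y \in \operatorname{Ker} F$, $\lambda \in \mathbb C$. Then
$$(x, f) = \lambda \overline{\alpha}, \qquad F(x) = \lambda F(x_0).$$
Choose now $\alpha = \overline{F(x_0)}$. If there is another $g \in H$ such that $F(x) = (x, g)$, $x \in H$, then $0 = (x, f - g)$ for all $x \in H$, in particular, for $x = f - g$. Therefore $f = g$.
By the Schwarz inequality (1.2.12) we obtain
$$|F(x)| = |(x, f)| \le \|x\| \|f\|, \quad \text{i.e.,} \quad \|F\| \le \|f\|.$$
Since $F(f) = \|f\|^2$, we have $\|F\| \ge \|f\|$. This shows that $\|F\| = \|f\|$.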
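In $\mathbb C^n$ with $(u, v) = \sum_k u_k \overline{v_k}$ the representing element can be written down explicitly: $f_k = \overline{F(e_k)}$ for the canonical basis $e_k$. The following minimal sketch, assuming this finite dimensional setting and a hypothetical linear form with coefficients $a$, checks $F(x) = (x, f)$, $F(f) = \|f\|^2$ and $|F(x)| \le \|x\|\|f\|$.

```python
def inner(u, v):
    """Scalar product on C^n, linear in the first argument: (u, v) = sum u_k conj(v_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def riesz_representer(F, n):
    """Return f in C^n with F(x) = (x, f) for every x, namely f_k = conj(F(e_k))."""
    f = []
    for k in range(n):
        e_k = [0j] * n
        e_k[k] = 1 + 0j
        f.append(complex(F(e_k)).conjugate())
    return f


a = [1 + 2j, -1j, 3]
F = lambda x: sum(ak * xk for ak, xk in zip(a, x))   # a continuous linear form on C^3
f = riesz_representer(F, 3)                          # conj(a) = [1-2j, 1j, 3]
x = [0.5 - 1j, 2 + 0j, -1j]
print(F(x), inner(x, f))                             # the same complex number
```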

The following variant of the Riesz Representation Theorem is often used in the functional analysis approach to differential equations (see, e.g., Evans [48]).
Proposition 1.2.41 (Lax–Milgram). Let $H$ be a complex Hilbert space and let $B : H \times H \to \mathbb C$ be a mapping with the following properties:
(i) The mapping $x \mapsto B(x, y)$ is linear for any $y \in H$.
(ii) $B(x, \alpha_1 y_1 + \alpha_2 y_2) = \overline{\alpha_1} B(x, y_1) + \overline{\alpha_2} B(x, y_2)$ for every $x, y_1, y_2 \in H$, $\alpha_1, \alpha_2 \in \mathbb C$.
(iii) There is a constant $c$ such that $|B(x, y)| \le c\|x\|\|y\|$ for every $x, y \in H$.
Then there is $A \in L(H)$, $\|A\|_{L(H)} \le c$, such that
$$B(x, y) = (x, Ay), \quad x, y \in H.$$
Moreover,
(iv) if there is a positive constant $d$ such that
$$B(x, x) \ge d\|x\|^2 \quad \text{for each } x \in H,$$
then $A$ is invertible, $A^{-1} \in L(H)$ and $\|A^{-1}\|_{L(H)} \le \frac{1}{d}$.
Proof. The existence of $A$ follows from (i), (iii) and the Riesz Representation Theorem. The property (ii) yields the linearity of $A$. Since
$$\|Ay\|^2 = (Ay, Ay) = B(Ay, y) \le c\|Ay\|\|y\|,$$
we have $\|Ay\| \le c\|y\|$, i.e., $A \in L(H)$ and $\|A\|_{L(H)} \le c$.
The property (iv) means that
$$d\|y\|^2 \le B(y, y) = (y, Ay) \le \|y\|\|Ay\|,$$
i.e.,
$$\|Ay\| \ge d\|y\| \quad \text{for all } y \in H. \qquad (1.2.22)$$
In particular, $A$ is injective. Moreover, $\operatorname{Im} A$ is a closed subspace of $H$. Indeed, let $Ay_n \to z \in \overline{\operatorname{Im} A}$. By (1.2.22), $\{y_n\}_{n=1}^{\infty}$ is a Cauchy sequence, and hence it is convergent to a $y \in H$. By continuity of $A$, $Ay = z$, i.e., $z \in \operatorname{Im} A$. In fact,
$$\operatorname{Im} A = H.$$
Indeed, if $w \in (\operatorname{Im} A)^\perp$, then
$$d\|w\|^2 \le B(w, w) = (w, Aw) = 0 \quad \text{and} \quad w = o.$$
So $\operatorname{Dom}(A^{-1}) = \operatorname{Im} A = H$ and (1.2.22) implies that
$$\|A^{-1}\|_{L(H)} \le \frac{1}{d}.$$
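A real two-dimensional analogue makes the estimate $\|A^{-1}\| \le 1/d$ concrete. The sketch below assumes $H = \mathbb R^2$ and the illustrative form $B(x, y) = x^{\mathsf T} K y$, so that $B(x, y) = (x, Ay)$ with $A = K$; the best constant $d$ in (iv) is the smallest eigenvalue of the symmetric part $(K + K^{\mathsf T})/2$. The matrix $K$ and right-hand side $b$ are hypothetical choices.

```python
import math


def matvec(K, y):
    """Matrix-vector product K y."""
    return [sum(kij * yj for kij, yj in zip(row, y)) for row in K]


def solve_2x2(K, b):
    """Solve K y = b for an invertible 2x2 matrix K by Cramer's rule."""
    (k11, k12), (k21, k22) = K
    det = k11 * k22 - k12 * k21
    return [(k22 * b[0] - k12 * b[1]) / det, (-k21 * b[0] + k11 * b[1]) / det]


def coercivity_constant(K):
    """Largest d with y^T K y >= d ||y||^2: the smallest eigenvalue of (K + K^T)/2."""
    p, q, r = K[0][0], 0.5 * (K[0][1] + K[1][0]), K[1][1]
    return 0.5 * (p + r) - math.sqrt(0.25 * (p - r) ** 2 + q ** 2)


# B(x, y) = x^T K y on the real Hilbert space R^2, so B(x, y) = (x, A y) with A = K.
K = [[2.0, 1.0], [-1.0, 2.0]]
d = coercivity_constant(K)              # here y^T K y = 2 ||y||^2, so d = 2
b = [3.0, -4.0]
y = solve_2x2(K, b)                     # y = A^{-1} b
print(d, math.hypot(*y), math.hypot(*b) / d)   # ||A^{-1} b|| <= ||b|| / d
```

The non-symmetric part of $K$ does not affect $B(y, y)$, which is why coercivity only sees the symmetric part.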

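Exercise 1.2.42. Let $\{F_\alpha\}_{\alpha \in A}$ be a system of closed subsets of a compact space $M$. Prove the finite intersection property:
If $\bigcap_{\alpha \in K} F_\alpha \ne \emptyset$ for any finite $K \subset A$, then $\bigcap_{\alpha \in A} F_\alpha \ne \emptyset$.
(This property characterizes compact spaces.)
Hint. Suppose not. Then $\{M \setminus F_\alpha\}_{\alpha \in A}$ is an open covering of $M$.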


Exercise 1.2.43. Prove that $F \subset C[a, b]$ is relatively compact if and only if $F$ is bounded in $C[a, b]$ and the following equicontinuity condition is satisfied:
$$\forall \varepsilon > 0 \ \exists \delta > 0 \ \forall f \in F: \quad x, y \in [a, b], \ |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon.$$
Hint. Use Proposition 1.2.3. Obviously, this statement is also a special case of Theorem 1.2.13.

Exercise 1.2.44. Let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal basis in a Hilbert space $H$. Define
$$f(x) = \begin{cases} n & \text{if } x = e_n, \\ n(1 - 2\|x - e_n\|) & \text{if } \|x - e_n\| < \frac12, \\ 0 & \text{otherwise}. \end{cases}$$
Show that $f$ is a well-defined continuous functional on $H$ which is not bounded on the closed unit ball.
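Exercise 1.2.45. Let $\emptyset \ne \mathcal M \subset X$ be a subset of a normed linear space $X$. For $x \in X$ set
$$\operatorname{dist}(x, \mathcal M) = \inf\{\|x - y\| : y \in \mathcal M\}.$$
Prove that for any $x_1, x_2 \in X$ we have
$$|\operatorname{dist}(x_1, \mathcal M) - \operatorname{dist}(x_2, \mathcal M)| \le \|x_1 - x_2\|.$$
Hint. Assume $\operatorname{dist}(x_1, \mathcal M) \ge \operatorname{dist}(x_2, \mathcal M)$. For any $\varepsilon > 0$ there exists $x_\varepsilon \in \mathcal M$ such that
$$\|x_2 - x_\varepsilon\| < \operatorname{dist}(x_2, \mathcal M) + \varepsilon.$$
Use the triangle inequality for $\|x_1 - x_\varepsilon\|$.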
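Exercise 1.2.46.⁴⁰ Let $\Omega$ be a bounded open set in $\mathbb R^M$. For $p \in [1, \infty)$ and $k \in \mathbb N$ define $W_0^{k,p}(\Omega)$ to be the closure of $\mathcal D(\Omega)$ with respect to the $W^{k,p}(\Omega)$-norm (1.2.9).
(i) Prove that $W_0^{k,p}(\Omega) \subset W^{k,p}(\Omega)$ and that $W_0^{k,p}(\Omega)$ need not be dense in $W^{k,p}(\Omega)$ (compare it with the statement of Theorem 1.2.28(iii); see also the Trace Theorem (Theorem 7.3.1)).
(ii) Prove the Poincaré inequality: There exists a constant $c_p$ such that for all $u \in W_0^{1,p}(\Omega)$ the inequality⁴¹
$$\int_\Omega |u(x)|^p \, dx \le c_p \int_\Omega \|\nabla u(x)\|^p \, dx$$
holds.
Hint. It suffices to prove the assertion for $u \in \mathcal D(\Omega)$. Consider first $\Omega = (0, 1)$ and use the Mean Value Theorem. Then suppose (without loss of generality) $\Omega \subset (0, d) \times \mathbb R^{M-1}$ and notice that $\mathcal D(\Omega) \subset \mathcal D((0, d) \times \mathbb R^{M-1})$.
(iii) Use the Poincaré inequality to prove that
$$|u|_{W_0^{1,p}(\Omega)} = \Big( \int_\Omega \|\nabla u(x)\|^p \, dx \Big)^{1/p}$$
is an equivalent norm on $W_0^{1,p}(\Omega)$ with the norm
$$\|u\|_{W_0^{1,p}(\Omega)} = \Big( \int_\Omega |u(x)|^p \, dx \Big)^{1/p} + \Big( \int_\Omega \|\nabla u(x)\|^p \, dx \Big)^{1/p}.$$
40 Supplement to Example 1.2.25.
41 Finding the smallest possible value of the constant $c_p$ is a much more difficult problem. See also Exercise 6.3.19 and Example 7.4.4. Here $\nabla u(x) = \big( \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_M} \big)$, where $\frac{\partial u}{\partial x_i}$, $i = 1, \dots, M$, are weak derivatives (see (1.2.8)), is the gradient of $u$.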

Exercise 1.2.47. Let $u \in W^{1,p}(0, 1)$, $1 \le p < \infty$. Prove that the functions
$$u^+(x) := \max\{u(x), 0\}, \qquad u^-(x) := \max\{-u(x), 0\}$$
also belong to $W^{1,p}(0, 1)$. We remark that the corresponding result is false for $W^{k,p}(0, 1)$, $k \ge 2$.

Chapter 2
Properties of Linear and Nonlinear Operators
2.1 Linear Operators
In this section we point out some fundamental properties of linear operators in Banach spaces. The key assertions presented are the Uniform Boundedness Principle, the Banach–Steinhaus Theorem, the Open Mapping Theorem, the Hahn–Banach Theorem, the Separation Theorem, the Eberlein–Šmulyan Theorem and the Banach Theorem. We recall that the collection of all continuous linear operators from a normed linear space $X$ into a normed linear space $Y$ is denoted by $L(X, Y)$, and $L(X, Y)$ is a normed linear space with the norm
$$\|A\|_{L(X,Y)} = \sup\{\|Ax\|_Y : \|x\|_X \le 1\}.$$
Proposition 2.1.1. Let $Y$ be a Banach space. Then $L(X, Y)$ is a Banach space, too. In particular, the space $X^*$ of all linear continuous forms on $X$ is complete.
Proof. Let $\{A_n\}_{n=1}^{\infty}$ be a Cauchy sequence in $L(X, Y)$. Then for any $\varepsilon > 0$ there is $n_0 \in \mathbb N$ such that for all $n, m \ge n_0$ and $x \in X$,
$$\|A_n x - A_m x\| \le \|A_n - A_m\| \|x\| \le \varepsilon \|x\|.$$
Since $Y$ is complete, the sequence $\{A_n x\}_{n=1}^{\infty}$ is convergent to a point in $Y$ that can be denoted by $Ax$. Obviously $A$ is a linear operator from $X$ into $Y$ and
$$\|Ax - A_m x\| = \lim_{n \to \infty} \|A_n x - A_m x\| \le \varepsilon \|x\|, \quad m \ge n_0, \ x \in X.$$
This implies (Proposition 1.2.10) that $A \in L(X, Y)$ and $\|A - A_m\| \to 0$.


The importance of this result can be seen from the following statement.


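Proposition 2.1.2. Let $X$ be a Banach space and $A \in L(X)$. If $\|A\| < 1$, then the operator $I - A$ is continuously invertible and
$$(I - A)^{-1} = \sum_{n=0}^{\infty} A^n$$
where the sum is convergent in the $L(X)$-norm.
Proof. First we prove the convergence. Let $\varepsilon > 0$ be arbitrary. Put $S_k = \sum_{n=0}^{k} A^n$. Then
$$\|S_l - S_k\| = \Big\| \sum_{n=k+1}^{l} A^n \Big\| \le \sum_{n=k+1}^{l} \|A^n\| \le \sum_{n=k+1}^{l} \|A\|^n < \varepsilon \quad \text{for } l > k$$
provided $k$ is sufficiently large (we use $\|A^n\| \le \|A\|^n$¹). By Proposition 2.1.1, the limit of $S_k$ exists in the $L(X)$-norm. Denote $B := \lim_{k \to \infty} S_k = \sum_{n=0}^{\infty} A^n$. We have
$$(I - A)B = \lim_{k \to \infty} (I - A) \sum_{n=0}^{k} A^n = \lim_{k \to \infty} \Big( \sum_{n=0}^{k} A^n - \sum_{n=1}^{k+1} A^n \Big) = \lim_{k \to \infty} (I - A^{k+1}) = I$$
since $\lim_{n \to \infty} A^n = O$. Similarly,
$$B(I - A) = I, \quad \text{i.e.,} \quad B = (I - A)^{-1}.$$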
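The Neumann series can be summed numerically for a small matrix. The sketch below assumes $X = \mathbb R^2$ with the Euclidean norm and an illustrative matrix $A$ whose Frobenius norm is below $1$ (the Frobenius norm dominates the operator norm, so $\|A\| < 1$); it sums $I + A + \dots + A^{200}$ and compares the result with the exact inverse of $I - A$.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(A, terms=200):
    """Partial sum S = I + A + ... + A^terms of the Neumann series for (I - A)^{-1}."""
    n = len(A)
    S, P = identity(n), identity(n)
    for _ in range(terms):
        P = mat_mul(P, A)          # P = A^k
        S = mat_add(S, P)
    return S


A = [[0.2, 0.5], [0.1, 0.3]]
frobenius = sum(a * a for row in A for a in row) ** 0.5   # about 0.62, an upper bound for ||A||
S = neumann_inverse(A)
I_minus_A = mat_add(identity(2), [[-a for a in row] for row in A])
print(mat_mul(I_minus_A, S))                              # close to the identity
```

Here $\det(I - A) = 0.51$, so the exact inverse is $\frac{1}{0.51}\begin{pmatrix} 0.7 & 0.5 \\ 0.1 & 0.8 \end{pmatrix}$.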

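If $X$ is a complex Banach space and $A \in L(X)$, we denote
$$\varrho(A) := \{\lambda \in \mathbb C : \lambda I - A \text{ is continuously invertible in } L(X)\}$$
(the so-called resolvent set of $A$) and
$$\sigma(A) := \mathbb C \setminus \varrho(A)$$
(the so-called spectrum of $A$).² The operator-valued function
$$\lambda \mapsto (\lambda I - A)^{-1}, \quad \lambda \in \varrho(A),$$
is called the resolvent of $A$.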


1 If $A \in L(X, Y)$, $B \in L(Y, Z)$, then $BA \in L(X, Z)$ and $\|BA\|_{L(X,Z)} \le \|B\|_{L(Y,Z)} \|A\|_{L(X,Y)}$.
2 The reason for considering only complex spaces consists in the fact that $\sigma(A) \ne \emptyset$ for all $A \in L(X)$ in this case. This will be proved later in this section (see the discussion following Example 2.1.20).


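Corollary 2.1.3. Let $X$ be a complex Banach space and $A \in L(X)$. Then $\varrho(A)$ is an open set and
$$\{\lambda : |\lambda| > \|A\|\} \subset \varrho(A).$$
Proof. If $|\lambda| > \|A\|$, then
$$\lambda I - A = \lambda \Big( I - \frac{A}{\lambda} \Big), \qquad \Big\| \frac{A}{\lambda} \Big\| < 1,$$
and $\big( I - \frac{A}{\lambda} \big)^{-1} \in L(X)$ according to Proposition 2.1.2. Hence we have³
$$(\lambda I - A)^{-1} = \sum_{n=0}^{\infty} \frac{A^n}{\lambda^{n+1}}.$$
Similarly, if $\lambda_0 \in \varrho(A)$, then
$$\lambda I - A = (\lambda - \lambda_0) I + (\lambda_0 I - A) = (\lambda_0 I - A)\big[ I - (\lambda_0 - \lambda)(\lambda_0 I - A)^{-1} \big].$$
For a parameter $\lambda$ such that $\|(\lambda_0 - \lambda)(\lambda_0 I - A)^{-1}\| < 1$, the inverse operator $B = [I - (\lambda_0 - \lambda)(\lambda_0 I - A)^{-1}]^{-1}$ exists by Proposition 2.1.2 and
$$(\lambda I - A)^{-1} = B (\lambda_0 I - A)^{-1}.$$
Hence the whole disc $|\lambda - \lambda_0| < \|(\lambda_0 I - A)^{-1}\|^{-1}$ lies in $\varrho(A)$, i.e., $\varrho(A)$ is open.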

The next theorem, together with Theorems 2.1.8 and 2.1.13, is one of the most significant results in linear functional analysis. For the proofs the interested reader can consult textbooks on functional analysis, e.g., Conway [28], Dunford & Schwartz [44], Rudin [112], Yosida [135].
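Theorem 2.1.4 (Uniform Boundedness Principle). Let $X$ be a Banach space and $Y$ a normed linear space. If $\{A_\gamma\}_{\gamma \in \Gamma} \subset L(X, Y)$ is such that the sets $\{\|A_\gamma x\|_Y : \gamma \in \Gamma\}$ are bounded for all $x \in X$, then $\{\|A_\gamma\|_{L(X,Y)} : \gamma \in \Gamma\}$ is also bounded.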
This result is the quintessence of several results on approximation of functions
in classical analysis and can be used for modern proofs of such results. The
following example is typical.
Example 2.1.5. There exists a periodic continuous function the Fourier series of which is divergent at zero.⁴ To see this we recall that the $n$th partial sum of the Fourier series of a function $f$ at $0$ is given by
$$s_n(f)(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} D_n(0 - t) f(t) \, dt \quad \text{where} \quad D_n(t) = \frac{\sin\big(n + \frac12\big)t}{\sin\frac{t}{2}}, \quad 0 < |t| < \pi$$
(the $n$th Dirichlet kernel). Since $\varphi_n : f \mapsto s_n(f)(0)$ are continuous linear forms on the space $C[-\pi, \pi]$, the sequence of their norms $\{\|\varphi_n\|_{L(C[-\pi,\pi],\mathbb R)}\}_{n=1}^{\infty}$ should be bounded provided $\varphi_n(f)$ is convergent for all $f \in C[-\pi, \pi]$ (Theorem 2.1.4). One can calculate that
$$\|\varphi_n\| = \frac{1}{2\pi} \int_{-\pi}^{\pi} |D_n(t)| \, dt,$$
and a careful estimate shows that $\|\varphi_n\|$ grows like $\log n$ for large $n$.
3 This series actually converges for $\lambda$ such that $|\lambda| > r(A) := \sup\{|\mu| : \mu \in \sigma(A)\}$, but its proof is more involved. The quantity $r(A)$ is called the spectral radius of $A$.
4 The Fourier series can even diverge at uncountably many points, but the set of divergence always has measure zero. The set of such bad functions is dense in $C[-\pi, \pi]$.
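The growth of $\|\varphi_n\|$ can be observed numerically. The sketch below approximates $\|\varphi_n\| = \frac{1}{2\pi}\int_{-\pi}^{\pi} |D_n(t)|\,dt = \frac{1}{\pi}\int_0^{\pi} |D_n(t)|\,dt$ (these numbers are the classical Lebesgue constants) by the midpoint rule, which never evaluates the removable singularity at $t = 0$; the number of sample points is an illustrative choice. It compares the values with $\frac{4}{\pi^2}\log n$, the known leading term, and with the exact value $\|\varphi_1\| = \frac13 + \frac{2\sqrt3}{\pi}$.

```python
import math


def dirichlet_kernel(n, t):
    """D_n(t) = sin((n + 1/2) t) / sin(t / 2) for 0 < |t| < pi."""
    return math.sin((n + 0.5) * t) / math.sin(t / 2.0)


def lebesgue_constant(n, samples=100_000):
    """||phi_n|| = (1/(2 pi)) int_{-pi}^{pi} |D_n(t)| dt.

    D_n is even, so this equals (1/pi) int_0^pi |D_n(t)| dt, approximated
    here by the midpoint rule on (0, pi)."""
    h = math.pi / samples
    total = sum(abs(dirichlet_kernel(n, (i + 0.5) * h)) for i in range(samples))
    return total * h / math.pi


values = {n: lebesgue_constant(n) for n in (1, 10, 100, 1000)}
for n, L in values.items():
    # L - (4 / pi^2) log n stays bounded while L itself grows without bound
    print(n, round(L, 4), round(L - 4.0 / math.pi ** 2 * math.log(n), 4))
```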

As indicated in the previous example, Theorem 2.1.4 is essentially an approximation result. This is clearer from its next variant.
Corollary 2.1.6 (Banach–Steinhaus). Let $X$ and $Y$ be Banach spaces and let $\{A_n\}_{n=1}^{\infty} \subset L(X, Y)$. Then the limits $\lim_{n \to \infty} A_n x$ exist for every $x \in X$ if and only if the following conditions are satisfied:
(i) There is a dense set $\mathcal M \subset X$ such that $\lim_{n \to \infty} A_n x$ exists for each $x \in \mathcal M$.
(ii) The sequence of norms $\{\|A_n\|\}_{n=1}^{\infty}$ is bounded.
Moreover, under these conditions
$$Ax := \lim_{n \to \infty} A_n x$$
exists for all $x \in X$ and $A \in L(X, Y)$.⁵


The following proposition is also often useful.
Proposition 2.1.7. Let $X$ be a Banach space and $Y$ a normed linear space. If $B : X \times X \to Y$ is a bilinear operator (i.e., linear in both variables) and
(i) for every $y \in X$ the mapping $x \mapsto B(x, y)$ belongs to $L(X, Y)$;
(ii) for every $x \in X$ the mapping $y \mapsto B(x, y)$ belongs to $L(X, Y)$,
then there exists a constant $c$ such that
$$\|B(x, y)\|_Y \le c\|x\|_X \|y\|_X, \quad x, y \in X.$$
In particular, if $x_n \to x$, $y_n \to y$, then $B(x_n, y_n) \to B(x, y)$.
Proof. Denote $B_y : x \mapsto B(x, y)$. By (i), $B_y \in L(X, Y)$ for all $y \in X$, $\|y\| \le 1$. By (ii), $\|B_y(x)\| \le c(x)$ for these $y$. The Uniform Boundedness Principle implies the existence of a constant $c$ such that
$$\sup_{\|x\| \le 1} \sup_{\|y\| \le 1} \|B(x, y)\| \le c.$$

Theorem 2.1.8 (Open Mapping Theorem). Let $X$, $Y$ be Banach spaces, let $A \in L(X, Y)$ and let $A$ have a closed range $\operatorname{Im} A$. Then for any open set $G \subset X$ its image $A(G)$ is an open set in $\operatorname{Im} A$. In particular, if $A$ is, in addition, injective and surjective, then $A^{-1} \in L(Y, X)$.
5 This type of convergence is the so-called convergence in the strong operator topology. It is weaker than the norm convergence.

When applied to linear equations
$$Ax = y,$$
Theorem 2.1.8 says that the continuous dependence of a solution on the right-hand side is a consequence of the existence and uniqueness result. Such continuous dependence is important for any reasonable numerical approximation.
Theorem 2.1.8 can also be used in a negative sense:
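Example 2.1.9. Denote by
$$\hat f(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt$$
the $n$th Fourier coefficient of $f \in L^1(-\pi, \pi)$. Since $\hat f(n) \to 0$ for $|n| \to \infty$ for all trigonometric polynomials, which are dense in $L^1(-\pi, \pi)$, and since $|\hat f(n)| \le \frac{1}{2\pi} \|f\|_{L^1(-\pi,\pi)}$, we have
$$\hat f(n) \to 0 \quad \text{for all } f \in L^1(-\pi, \pi)$$
(the so-called Riemann–Lebesgue Lemma). In other words, $A : f \mapsto \hat f(\cdot)$ is a continuous linear operator from $L^1(-\pi, \pi)$ into
$$c_0(\mathbb Z) := \Big\{ \{a_n\}_{n \in \mathbb Z} : \lim_{|n| \to \infty} |a_n| = 0 \Big\}, \qquad \|\{a_n\}\|_{c_0(\mathbb Z)} = \sup_{n \in \mathbb Z} |a_n|.$$
Applications of Fourier series to various problems in analysis (like convolution equations, differential equations, ...) would be much easier if $A$ were a mapping onto $c_0(\mathbb Z)$. Theorem 2.1.8 shows that this cannot be true, for then $A^{-1}$ would be bounded, i.e.,
$$\|f\|_{L^1(-\pi,\pi)} \le c \sup_{n \in \mathbb Z} |\hat f(n)| \quad \text{for all } f \in L^1(-\pi, \pi).$$
If $\{D_k\}_{k=1}^{\infty}$ is the sequence of Dirichlet kernels (Example 2.1.5), then
$$\hat D_k(n) = \begin{cases} 1, & |n| \le k, \\ 0, & |n| > k, \end{cases} \qquad \text{and} \qquad \|D_k\|_{L^1(-\pi,\pi)} \sim \log k,$$
a contradiction.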
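The formula for $\hat D_k(n)$ used above can be confirmed numerically with the normalization of Example 2.1.9. The sketch below approximates $\hat f(n)$ by the midpoint rule on $[-\pi, \pi]$; for a trigonometric polynomial of degree smaller than the number of sample points minus $|n|$ this rule is exact up to rounding, which is why the tolerance in the check can be tight. The value $k = 5$ and the helper names are illustrative.

```python
import cmath
import math


def dirichlet_kernel(k, t):
    """D_k(t) = sin((k + 1/2) t) / sin(t / 2) = sum_{|j| <= k} e^{ijt} (t not a multiple of 2 pi)."""
    return math.sin((k + 0.5) * t) / math.sin(t / 2.0)


def fourier_coefficient(f, n, samples=4096):
    """hat f(n) = (1/(2 pi)) int_{-pi}^{pi} f(t) e^{-int} dt, approximated by the midpoint rule."""
    h = 2.0 * math.pi / samples
    total = 0j
    for j in range(samples):
        t = -math.pi + (j + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2.0 * math.pi)


k = 5
coeffs = {n: fourier_coefficient(lambda t: dirichlet_kernel(k, t), n) for n in range(-8, 9)}
print([round(coeffs[n].real, 6) for n in range(-8, 9)])   # approximately 1 for |n| <= 5, 0 otherwise
```

So $\sup_n |\hat D_k(n)| = 1$ for every $k$, whereas $\frac{1}{2\pi}\|D_k\|_{L^1(-\pi,\pi)}$ is exactly the unbounded quantity $\|\varphi_k\|$ of Example 2.1.5.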

Theorem 2.1.8 also yields a sufficient condition for a linear operator to be continuous. To formulate it we need the notion of a closed operator: Let $X$, $Y$ be normed linear spaces. A linear operator
$$A : \operatorname{Dom} A \subset X \to Y$$
is said to be closed if
$$\{x_n\}_{n=1}^{\infty} \subset \operatorname{Dom} A, \quad x_n \to x, \quad Ax_n \to y$$
implies that
$$x \in \operatorname{Dom} A \quad \text{and} \quad Ax = y.$$

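Equivalently, $A$ is a closed operator if and only if the graph of $A$, i.e.,
$$G(A) := \{(x, Ax) : x \in \operatorname{Dom} A\},$$
is a closed linear subspace of $X \times Y$.
Corollary 2.1.10 (Closed Graph Theorem). Let $X$, $Y$ be Banach spaces and let $A$ be a closed operator from $\operatorname{Dom} A = X$ into $Y$. Then $A$ is continuous.
Proof. If $G(A)$ denotes the graph of $A$, then put
$$T(x, Ax) = x.$$
Since $G(A)$ is a closed subspace of the Banach space $X \times Y$, it is itself a Banach space, and $T$ is a continuous linear bijection of $G(A)$ onto $X$. By Theorem 2.1.8, $T^{-1}$ is continuous, and therefore
$$A = \pi_2 T^{-1}$$
is continuous as well ($\pi_2$ is the projection of $X \times Y$ onto the second component $Y$).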

Example 2.1.11. Many differential operators are either closed or have closed extensions. If they are viewed as operators from $X$ into $X$, then they are only densely defined. A very simple example:
$$X = C[0, 1], \quad Ax = \dot x, \quad \operatorname{Dom} A = \{x \in X : \dot x(t) \text{ exists for all } t \in [0, 1] \text{ and } \dot x \in X\}.$$
A well-known classical result says that $A$ is a closed operator. But $A$ is not continuous: for $x_n(t) = t^n$ we have $\|x_n\| = 1$, $\|\dot x_n\| = n$.
Example 2.1.12. Let $X$ be a Banach space and $M$ a linear subspace of $X$. Let $N$ be an (algebraic) complement of $M$ and let $P$ be the corresponding projection onto $M$. Then $P$ is continuous if and only if both $M$ and $N$ are closed.
The sufficiency part follows from the Closed Graph Theorem and from the observation that $P$ is closed whenever $M$ and $N$ are closed subspaces. The necessity part is obvious since
$$M = \operatorname{Ker}(I - P), \qquad N = \operatorname{Ker} P.$$
This statement should be compared with the Hilbert space case (Corollary 1.2.35).
An important special case is $\operatorname{codim} M < \infty$. By definition, this means that an algebraic direct complement $N$ has a finite dimension ($\operatorname{codim} M := \dim N$) and therefore $N$ is closed (Corollary 1.2.11(i)). If $M$ is closed as well, then any projection onto $M$ is continuous. We postpone the case of $\dim M < \infty$ to Remark 2.1.19.
We note that if $X$ is a Banach space such that there exists a continuous projection $P$, $\|P\|_{L(X)} \le 1$, onto every closed subspace of $X$, then $X$ has an equivalent norm induced by a scalar product on $X$ (see Kakutani [71]).

Now we turn our attention to the dual space $X^*$ of all continuous linear forms on a normed linear space $X$. In Section 1.1 we have seen the importance of linear forms. Namely, they allowed us to define an algebraic adjoint operator $A^\#$ and formulate Theorem 1.1.25. The dual space $X^*$ is even more important for a normed linear space $X$ since another topology can be introduced on $X$ with the help of $X^*$ which in a certain sense has better properties (Theorem 2.1.25 below). Surprisingly, the following basic result does not need any topology.
Theorem 2.1.13 (Hahn–Banach). Let $X$ be a real linear space and let $Y$ be a linear subspace of $X$. Assume that $f$ is a linear form on $Y$ which is dominated by a sublinear functional $p$.⁶ Then there exists $F \in X^\#$ such that
(i) $F(y) = f(y)$ for all $y \in Y$ (extension);
(ii) $F(x) \le p(x)$ for all $x \in X$ (dominance).
Proof. The proof is based on an extension of $f$ to a subspace whose dimension is larger by 1 and such that this extension is dominated by the same $p$, and the use of Zorn's Lemma as an inductive argument, similarly as in the proof of Theorem 1.1.3.

Remark 2.1.14. If $X$ is a complex linear space, then we need $p$ to satisfy a stronger condition than (2) in footnote 6, namely
(2') $p(\alpha x) = |\alpha| p(x)$, $\alpha \in \mathbb C$, $x \in X$.
In this case $p$ is called a semi-norm.⁷ The dominance also has to be stronger: $|f(x)| \le p(x)$. The extension result follows from Theorem 2.1.13 by considering $\operatorname{Re} f$ and $\operatorname{Im} f$ and observing that $\operatorname{Re} f(ix) = -\operatorname{Im} f(x)$.
Corollary 2.1.15. Let $X$ be a normed linear space and let $Y$ be a linear subspace of $X$ (not necessarily closed). If $f \in Y^*$, then there exists $F \in X^*$ such that
(i) $F(y) = f(y)$ for $y \in Y$;
(ii) $\|F\|_{X^*} = \|f\|_{Y^*}$.
Proof. Put $p(x) = \|f\|\|x\|$, $x \in X$, and apply Theorem 2.1.13 or Remark 2.1.14, respectively.

Corollary 2.1.16 (Dual Characterization of the Norm). Let $X$ be a normed linear space. Then
$$\|x\|_X = \max\{|f(x)| : f \in X^* \text{ with } \|f\|_{X^*} \le 1\}. \qquad (2.1.1)$$
6 A mapping $p : X \to \mathbb R$ is called sublinear if
(1) $p(x + y) \le p(x) + p(y)$ for any $x, y \in X$;
(2) $p(\alpha x) = \alpha p(x)$ for any $x \in X$ and $\alpha \ge 0$.
7 The difference between a norm and a semi-norm is that a semi-norm need not satisfy the condition: $p(x) = 0 \implies x = o$.

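Proof. Put $g_0(\alpha x) = \alpha\|x\|$, $\alpha \in \mathbb R$ (or $\mathbb C$). Then $g_0$ is a continuous linear form on $\operatorname{Lin}\{x\}$ and its norm is $1$ (provided $x \ne o$). Let $f_0$ be its extension from Corollary 2.1.15. Then
$$f_0(x) = \|x\|, \quad \|f_0\| = 1, \quad \text{i.e.,} \quad \|x\| \le \sup\{|f(x)| : f \in X^* \text{ with } \|f\| \le 1\}.$$
The converse inequality follows from the definition of $\|f\|$. Since the supremum is attained at $f_0$, it is in fact a maximum.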

Remark 2.1.17.
(i) If $X$ is a Hilbert space, then the equality (2.1.1) can be obtained immediately from the Riesz Representation Theorem (Theorem 1.2.40). This theorem can often be used in Hilbert spaces instead of the Hahn–Banach Theorem.
(ii) A slightly weaker form of (2.1.1) is often used:
If $f(x) = 0$ for all $f \in X^*$, then $x = o$.
The equivalent assertion reads as follows:
$X^*$ separates points of $X$.
Corollary 2.1.18 (Separation Theorem). Let $X$ be a normed linear space and let $C$ be a nonempty, closed, convex set. If $x_0 \notin C$, then there exists $F \in X^*$ such that
$$\sup\{\operatorname{Re} F(x) : x \in C\} < \operatorname{Re} F(x_0). \qquad (2.1.2)$$
Proof. It is sufficient to give the proof for a real space $X$ and under the additional assumption $o \in C$. In particular, this assumption means that $x_0 \ne o$. We wish to extend the form $f$ defined on $\operatorname{Lin}\{x_0\}$ by $f(\alpha x_0) = \alpha$, $\alpha \in \mathbb R$. To do that we need a suitable dominating functional. Since $d := \operatorname{dist}(x_0, C) > 0$, there exists a convex neighborhood of $C$ which does not contain $x_0$, e.g.,
$$K = \Big\{ x + y : x \in C, \ \|y\| < \frac{d}{2} \Big\}.$$
Put⁸
$$p_K(z) := \inf\Big\{ \alpha > 0 : \frac{z}{\alpha} \in K \Big\} \quad \text{for } z \in X.$$
It is a matter of simple calculation to show that $p_K$ is sublinear, $p_K(x_0) > 1$, and $p_K(z) \le 1$ for $z \in K$. Let $F$ be an extension of $f$ given by Theorem 2.1.13. Since $o \in C$, we have
$$F(y) \le p_K(y) \le 1 \quad \text{for } \|y\| < \frac{d}{2}.$$
This shows that $\|F\| \le \frac{2}{d}$, i.e., $F \in X^*$.
The inequality (2.1.2) follows from domination: namely, we have
$$F(x) + F(y) \le p_K(x + y) \le 1 \quad \text{for } x \in C \text{ and all } \|y\| < \frac{d}{2},$$
i.e.,
$$F(x) \le 1 - \sup\Big\{ F(y) : \|y\| < \frac{d}{2} \Big\} < 1 = F(x_0).$$
8 $p_K$ is the so-called Minkowski functional of the convex set $K$.
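The Minkowski functional can be computed directly from its definition when membership in $K$ is easy to test. The sketch below assumes $X = \mathbb R^2$ and, instead of the set $K$ of the proof, takes the illustrative closed box $K = [-2, 2] \times [-1, 1]$, for which $p_K(z) = \max\{|z_1|/2, |z_2|\}$ in closed form; the infimum is found by bisection, using that for a convex $K$ containing $o$ every $\alpha$ larger than an admissible one is admissible. The function names are hypothetical.

```python
def minkowski_functional(z, contains, upper=1e6, iterations=200):
    """p_K(z) = inf{alpha > 0 : z / alpha in K}, computed by bisection.

    Assumes K is convex and contains a ball around o, so that the admissible
    alpha form an interval [p_K(z), infinity) (or (p_K(z), infinity))."""
    if not contains([zi / upper for zi in z]):
        raise ValueError("upper bound too small for this z")
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if contains([zi / mid for zi in z]):
            hi = mid          # mid is admissible, so the infimum is <= mid
        else:
            lo = mid          # mid is not admissible, so the infimum is >= mid
    return hi


def in_box(z, a=2.0, b=1.0):
    """Membership in the closed box K = [-a, a] x [-b, b]."""
    return abs(z[0]) <= a and abs(z[1]) <= b


p = lambda z: minkowski_functional(z, in_box)
print(p([3.0, 0.5]), max(3.0 / 2.0, 0.5 / 1.0))   # both approximately 1.5
```

For this box $p_K$ is even a norm on $\mathbb R^2$; in the proof above only sublinearity and the bounds $p_K \le 1$ on $K$, $p_K(x_0) > 1$ are needed.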


Remark 2.1.19. If $C$ from Corollary 2.1.18 is a closed linear subspace of $X$ and $F \in X^*$ satisfies (2.1.2), then $F(x) = 0$ for all $x \in C$. Notice that $F(x_0) = 1$ for the $F$ which has been constructed in the proof.
This observation yields the existence of a continuous projection onto a finite dimensional subspace $Y$ of $X$. Namely, suppose that $\{y_1, \dots, y_n\}$ is a basis of $Y$, and denote by $Y_k$ the span of $y_1, \dots, y_{k-1}, y_{k+1}, \dots, y_n$. Then $Y_k$ is a closed linear subspace of $X$ and $y_k \notin Y_k$. Let $F_k \in X^*$ be such that
$$F_k(y_j) = \begin{cases} 1, & j = k, \\ 0, & j \ne k, \end{cases} \qquad j = 1, \dots, n.$$
Then
$$Px = \sum_{k=1}^{n} F_k(x) y_k$$
is a continuous projection onto $Y$.
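In a finite dimensional example the functionals $F_k$ can be written down explicitly. The sketch below assumes $X = \mathbb R^3$ with the Euclidean scalar product and a two-dimensional $Y = \operatorname{Lin}\{y_1, y_2\}$; each $F_k$ is represented as $F_k(x) = (x, g_k)$, where $g_k$ is obtained from the inverse Gram matrix so that $F_k(y_j) = \delta_{jk}$. Adding to $g_k$ a vector orthogonal to $Y$ (the hypothetical `shifts` parameter) keeps $F_k(y_j) = \delta_{jk}$ and produces a different, non-orthogonal projection onto $Y$, illustrating that the $F_k$ are not unique.

```python
def inner(u, v):
    """Euclidean scalar product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def biorthogonal_functionals(y1, y2, shifts=((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))):
    """Vectors g1, g2 in R^3 with (y_j, g_k) = 1 if j == k and 0 otherwise.

    Each shift must be orthogonal to y1 and y2; it does not change (y_j, g_k)."""
    G = [[inner(y1, y1), inner(y1, y2)], [inner(y2, y1), inner(y2, y2)]]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    Ginv = [[G[1][1] / det, -G[0][1] / det], [-G[1][0] / det, G[0][0] / det]]
    g = []
    for k in range(2):
        g.append([Ginv[k][0] * a + Ginv[k][1] * b + s
                  for a, b, s in zip(y1, y2, shifts[k])])
    return g


def projection(x, ys, gs):
    """P x = sum_k F_k(x) y_k with F_k(x) = (x, g_k)."""
    coeffs = [inner(x, g) for g in gs]
    return [sum(c * y[i] for c, y in zip(coeffs, ys)) for i in range(len(x))]


y1, y2 = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
gs = biorthogonal_functionals(y1, y2, shifts=((0.0, 0.0, 1.0), (0.0, 0.0, -2.0)))
x = [1.0, 2.0, 3.0]
px = projection(x, [y1, y2], gs)
print(px, projection(px, [y1, y2], gs))   # P(P x) = P x
```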


Warning. It is not true that every projection onto $Y$ is continuous, even if $\dim Y = 1$, but the construction (i.e., the construction of a discontinuous linear form) is not obvious!
Example 2.1.20.
(i) By Corollary 1.2.11(ii), $(\mathbb R^M)^* = (\mathbb R^M)^\#$. This means that $(\mathbb R^M)^*$ can be identified with $\mathbb R^M$.
(ii) Let $K$ be a compact subset of $\mathbb R^M$. Then for any $F \in [C(K)]^*$ there exists a unique complex Borel measure $\mu$ on $K$ such that
$$F(f) = \int_K f(x) \, d\mu(x) \quad \text{for every } f \in C(K),$$
and $\|F\|_{[C(K)]^*} = |\mu|(K)$ where $|\mu|$ is the total variation of $\mu$. A similar statement holds under a more general assumption on $K$; for details and the corresponding notions see Dunford & Schwartz [44, Section IV, 6] or Rudin [113, Chapter 6] and, especially, Bourbaki [14]. In the last book the integration theory is developed on the basis of this representation theorem.
(iii) Let $\Omega$ be an open subset of $\mathbb R^M$ and let $p \in [1, \infty)$. Then the dual space $[L^p(\Omega)]^*$ can be identified with $L^{p'}(\Omega)$ ($p'$ is the conjugate exponent, i.e., $\frac{1}{p} + \frac{1}{p'} = 1$) in the following sense. For any $F \in [L^p(\Omega)]^*$ there exists a unique $\varphi \in L^{p'}(\Omega)$ such that
$$F(f) = \int_\Omega f(x) \varphi(x) \, dx \quad \text{for every } f \in L^p(\Omega).$$
Moreover, $\|F\|_{[L^p(\Omega)]^*} = \|\varphi\|_{L^{p'}(\Omega)}$. Details can be found in the books cited above.
Warning. The dual space $[L^\infty(\Omega)]^*$ is much larger than $L^1(\Omega)$!
(iv) The dual spaces to Sobolev spaces $W^{k,p}(\mathbb R^M)$ can be identified with special subspaces of tempered distributions, for example via the Fourier transform. We omit details since their description is beyond the scope of this book.
The reader can ask why we are so interested in continuous linear forms. One of the reasons is the following. Suppose that $\varphi$ is a vector-valued function (i.e., a mapping from $\mathbb R$ or $\mathbb C$ into a normed linear space $X$). For any $f \in X^*$ the composition $f \circ \varphi$ is a real or complex function of a real or complex variable and therefore results of classical analysis can be applied to $f \circ \varphi$. To be more specific, consider the resolvent (see page 56) of $A \in L(X)$
$$R(\lambda)x := (\lambda I - A)^{-1}x, \quad \lambda \in \varrho(A),$$
which is an $X$-valued function for every $x \in X$. Then for any $F \in X^*$, the complex function $\psi(\lambda) = F[(\lambda I - A)^{-1}x]$ is holomorphic in $\varrho(A)$. For $|\lambda| > \|A\|$ we also have
$$|\psi(\lambda)| \le \|F\|_{X^*} \|(\lambda I - A)^{-1}\|_{L(X)} \|x\|_X = \|F\|\|x\| \Big\| \sum_{n=0}^{\infty} \frac{A^n}{\lambda^{n+1}} \Big\| \le \|F\|\|x\| \sum_{n=0}^{\infty} \frac{\|A^n\|}{|\lambda|^{n+1}},$$
and so
$$\lim_{|\lambda| \to \infty} |\psi(\lambda)| = 0.$$
If $\varrho(A) = \mathbb C$, then $\psi$ would be identically zero (by the Liouville Theorem from the theory of complex functions). Since this should be true for all $F \in X^*$, we would get $(\lambda I - A)^{-1}x = o$ for all $x \in X$, a contradiction. Therefore, the spectrum $\sigma(A)$ is nonempty for each $A \in L(X)$. This is a generalization of the existence of an eigenvalue of a linear operator in a finite dimensional space and therefore also a generalization of the Fundamental Theorem of Algebra (cf. page 15). It is worth mentioning that the Jordan Canonical Form (Theorem 1.1.34) is based on this result.

Warning. It is not true that any $A \in L(X)$, $\dim X = \infty$, has an eigenvalue! A simple example is
$$X = C[0, 1], \qquad Ax(t) := t x(t).$$
Indeed, if $t x(t) = \lambda x(t)$ for all $t \in [0, 1]$, then $x(t) = 0$ for $t \ne \lambda$, and hence $x = o$ by continuity.
Our main reason for considering dual spaces comes from an attempt to find a weaker topology on a normed linear space in which bounded sets would be relatively compact. The importance of this fact will become clear in Chapter 6. We also ask the reader to return to Proposition 1.2.2 for motivation.
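Definition 2.1.21. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of elements in a normed linear space $X$. We say that $\{x_n\}_{n=1}^{\infty}$ converges weakly to $x \in X$ (notation $x_n \rightharpoonup x$ or $\text{w-}\lim_{n \to \infty} x_n = x$) if
$$\lim_{n \to \infty} f(x_n) = f(x) \quad \text{for every } f \in X^*.$$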
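A standard example of weak convergence without norm convergence: in $L^2(-\pi, \pi)$ the functions $u_n(t) = \sin nt$ satisfy $\|u_n\|^2 = \pi$ for all $n$, while $(g, u_n) \to 0$ for every $g \in L^2(-\pi, \pi)$ by the Bessel inequality, since $\{u_n/\sqrt\pi\}$ is an orthonormal system. The sketch below checks this numerically only for the single test element $g(t) = t$, for which $(g, u_n) = 2\pi(-1)^{n+1}/n$; the quadrature and the chosen values of $n$ are illustrative assumptions.

```python
import math


def l2_inner(f, g, a=-math.pi, b=math.pi, m=20_000):
    """Midpoint-rule approximation of the L^2(-pi, pi) scalar product of real f and g."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(m))


g = lambda t: t                                   # one fixed test element of L^2(-pi, pi)
results = {}
for n in (1, 5, 50):
    u_n = lambda t, n=n: math.sin(n * t)
    results[n] = (l2_inner(g, u_n), l2_inner(u_n, u_n))   # ((g, u_n), ||u_n||^2)

for n, (pairing, norm_sq) in results.items():
    exact = 2.0 * math.pi * (-1) ** (n + 1) / n   # int_{-pi}^{pi} t sin(nt) dt
    print(n, round(pairing, 5), round(exact, 5), round(norm_sq, 5))
```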

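Proposition 2.1.22.
(i) (uniqueness) If $x_n \rightharpoonup x$ and $x_n \rightharpoonup y$, then $x = y$.
(ii) If $\lim_{n \to \infty} \|x_n - x\| = 0$, then $x_n \rightharpoonup x$.⁹
(iii) A weakly convergent sequence is bounded. Moreover, if $x_n \rightharpoonup x$, then
$$\|x\| \le \liminf_{n \to \infty} \|x_n\|.$$
(iv) If $X$ is a uniformly convex Banach space,¹⁰ $x_n \rightharpoonup x$ and $\|x_n\| \to \|x\|$, then $\{x_n\}_{n=1}^{\infty}$ converges to $x$ in the norm topology.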
Proof. Assertion (i) follows immediately from Remark 2.1.17(ii) since in this case $f(x) = f(y)$ for every $f \in X^*$.
Assertion (ii) is obvious.
Assertion (iii) is basically a consequence of Theorem 2.1.4, but certain preliminaries are needed: Since $X^*$ is a normed linear space, its dual $X^{**} := (X^*)^*$ is defined. Put
$$\kappa(x) : f \mapsto f(x), \quad f \in X^*.$$
Then $\kappa$ (the so-called canonical embedding) is a linear continuous operator from $X$ into $X^{**}$, and
$$\|\kappa(x)\|_{X^{**}} = \sup_{\|f\|_{X^*} \le 1} |f(x)| = \|x\|_X$$
(Corollary 2.1.16).¹¹ Since the space $X^*$ is always complete (Proposition 2.1.1), Theorem 2.1.4 can be applied to the sequence $\{\kappa(x_n)\}_{n=1}^{\infty}$ of continuous linear forms on $X^*$. This shows that $\{x_n\}_{n=1}^{\infty}$ is bounded. If $x_n \rightharpoonup x$, we choose $f \in X^*$ such that
$$\|f\| = 1 \quad \text{and} \quad f(x) = \|x\|$$
(Corollary 2.1.16). Then
$$\|x\| = f(x) = \lim_{n \to \infty} f(x_n) \le \liminf_{n \to \infty} \|x_n\|.$$
Assertion (iv) is obvious for $x = o$. If $x \ne o$, then we may assume that also $x_n \ne o$ and put $y := \frac{x}{\|x\|}$ and $y_n := \frac{x_n}{\|x_n\|}$. Since $x_n \rightharpoonup x$ and $\|x_n\| \to \|x\|$, we have
$$f(y_n) = \frac{1}{\|x_n\|} f(x_n) \to \frac{1}{\|x\|} f(x) = f(y) \quad \text{for any } f \in X^*,$$
i.e., $y_n \rightharpoonup y$.
If we prove that $\|y_n - y\| \to 0$, then
$$\|x_n - x\| = \big\| \|x_n\| y_n - \|x\| y \big\| \le \|x_n\| \|y_n - y\| + \|y\| \big| \|x_n\| - \|x\| \big| \to 0$$
due to the assumption $\|x_n\| \to \|x\|$. To prove $y_n \to y$ we proceed by contradiction using the uniform convexity of $X$. Suppose that there is $\varepsilon > 0$ such that $\|y_n - y\| \ge \varepsilon$ for infinitely many $n$. Then, by the uniform convexity of $X$, for these $n$
$$\|y_n + y\| \le 2(1 - \delta).$$
Let us choose $f_0 \in X^*$, $\|f_0\| = 1$, $f_0(y) = \|y\| = 1$ (see Corollary 2.1.16). Then, taking the $\limsup$ along these $n$,
$$2(1 - \delta) \ge \limsup \|y_n + y\| \ge \limsup f_0(y_n + y) = 2f_0(y) = 2,$$
a contradiction.
9 Warning. The converse statement is not true in general (see Exercise 2.1.37)!
10 A Banach space $X$ is said to be uniformly convex if for every $\varepsilon > 0$ there is $\delta > 0$ such that
$$x, y \in X, \ \|x\| = \|y\| = 1, \ \|x - y\| \ge \varepsilon \implies \Big\| \frac{x + y}{2} \Big\| \le 1 - \delta.$$
Every uniformly convex space is reflexive, see Yosida [135, Chapter V, 2]. Hilbert spaces, $L^p(\Omega)$-spaces and $W^{1,p}(\Omega)$-spaces ($1 < p < \infty$) are uniformly convex (for a Hilbert space this follows from the parallelogram identity (1.2.14), for the other two cases see, e.g., Adams [2, Corollary 2.29 and Theorem 3.5]).
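The notion of uniform convexity used in assertion (iv) can be made concrete in the plane. For the Euclidean norm the parallelogram identity gives $\big\|\frac{x+y}{2}\big\|^2 = 1 - \frac{\|x - y\|^2}{4}$ for unit vectors, so one may take $\delta(\varepsilon) = 1 - \sqrt{1 - \varepsilon^2/4}$. For the maximum norm on $\mathbb R^2$ (an illustrative non-example) the unit vectors $(1, 1)$ and $(1, -1)$ are at distance $2$ while their midpoint still has norm $1$, so no $\delta > 0$ works. The sketch below checks both facts; the chosen angles are arbitrary.

```python
import math


def euclidean(v):
    """Euclidean norm on R^2."""
    return math.hypot(v[0], v[1])


def max_norm(v):
    """Maximum norm on R^2."""
    return max(abs(v[0]), abs(v[1]))


def convexity_gap(norm, x, y):
    """For unit vectors x, y return (||x - y||, 1 - ||(x + y) / 2||)."""
    diff = [a - b for a, b in zip(x, y)]
    mid = [(a + b) / 2.0 for a, b in zip(x, y)]
    return norm(diff), 1.0 - norm(mid)


# Euclidean plane: 1 - ||(x + y)/2|| = 1 - sqrt(1 - eps^2 / 4) with eps = ||x - y||.
x = [math.cos(0.3), math.sin(0.3)]
y = [math.cos(2.0), math.sin(2.0)]
eps, gap = convexity_gap(euclidean, x, y)
print(eps, gap, 1.0 - math.sqrt(1.0 - eps ** 2 / 4.0))

# Maximum norm: ||x - y|| = 2, but the midpoint (1, 0) still has norm 1.
print(convexity_gap(max_norm, [1.0, 1.0], [1.0, -1.0]))   # (2.0, 0.0)
```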

Remark 2.1.23. The weak convergence is the convergence in the weak topology. It is convenient to define this topology by systems of neighborhoods of points. We say that $U \subset X$ is a weak neighborhood of a point $x \in X$ if there are $f_1, \dots, f_n \in X^*$ such that
$$\{y \in X : |f_i(y) - f_i(x)| < 1 \text{ for } i = 1, \dots, n\} \subset U.$$
A subset $G \subset X$ is weakly open (i.e., open in the weak topology) provided it is a weak neighborhood of each of its points. It is easy to see that a weakly open set is also open in the norm topology. The converse is generally true only in finite dimensional spaces.
11 It is not generally true that $\kappa$ is surjective. A Banach space $X$ is said to be reflexive if $\kappa$ is surjective. Every Hilbert space and the spaces $L^p(\Omega)$, $1 < p < \infty$, are reflexive (the Riesz Representation Theorem and Example 2.1.20(iii)). The spaces $L^1(\Omega)$, $L^\infty(\Omega)$ and $C(\overline{\Omega})$ are not reflexive.

As we have mentioned, our aim is to find compact sets in the weak topology.
Remark 2.1.24. The weak topology in an infinite dimensional space is not metrizable. Therefore two concepts of compactness, namely the sequential and the covering one (see footnote 12 on page 26), are in principle different. It is surprising that they coincide for weak topologies in Banach spaces. This very deep result is known as the Eberlein–Šmulyan Theorem (see Dunford & Schwartz [44, Chapter 5]).
Theorem 2.1.25 (Eberlein–Šmulyan). Let $X$ be a reflexive space. Then any bounded sequence contains a weakly convergent subsequence.
Proof. We present a simple proof for the case that $X$ is a Hilbert space. A proof for an arbitrary reflexive space can be found, e.g., in Dunford & Schwartz [44], Fabian et al. [49], Yosida [135]. Let $\{x_n\}_{n=1}^\infty \subset X$ be a bounded sequence, and put
$$Y = \overline{\operatorname{Lin}}\{x_1, x_2, \dots\}$$
(the closure is taken in the norm topology). Since the sequence of scalar products $\{(x_1, x_n)\}_{n=1}^\infty$ is a bounded sequence of numbers (real or complex), there is a subsequence, say $\{x_n^{(1)}\}_{n=1}^\infty$, such that $\{(x_1, x_n^{(1)})\}_{n=1}^\infty$ converges. For the same reason there is a subsequence $\{x_n^{(2)}\}_{n=1}^\infty$ of $\{x_n^{(1)}\}_{n=1}^\infty$ such that $\{(x_2, x_n^{(2)})\}_{n=1}^\infty$ converges, etc. Put $y_k = x_k^{(k)}$ (the diagonal choice). Then $\lim_{k\to\infty}(x_j, y_k)$ exists for all $j \in \mathbb{N}$, and therefore $\lim_{k\to\infty}(x, y_k)$ exists for each $x \in \operatorname{Lin}\{x_1, x_2, \dots\}$. Since the sequence of linear forms $f_k: x \mapsto (x, y_k)$ is bounded in $Y^*$, the Banach–Steinhaus Theorem (Corollary 2.1.6) implies the existence of $f \in Y^*$ such that
$$\lim_{k\to\infty} f_k(x) = f(x) \quad \text{for all } x \in Y.$$
Let $P$ be the orthogonal projection onto $Y$. Put
$$g(x) = f(Px) \quad \text{for } x \in X.$$
Then $g \in X^*$ and by the Riesz Representation Theorem there is $y \in X$ such that
$$g(x) = (x, y) \quad \text{for } x \in X.$$
Moreover,
$$\lim_{k\to\infty}(x, y_k) = \lim_{k\to\infty}(Px, y_k) = f(Px) = (x, y) \quad \text{for all } x \in X.$$
This means that $y_k \rightharpoonup y$. ∎



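Remark 2.1.26. Weak convergence in a dual space $X^*$ is more confusing since two approaches can be used. We say that a sequence $\{f_n\}_{n=1}^\infty \subset X^*$
(i) converges weakly to $f \in X^*$ (notation $f_n \rightharpoonup f$ or $\text{w-}\lim_{n\to\infty} f_n = f$) if
$$\lim_{n\to\infty} F(f_n) = F(f) \quad \text{for every } F \in X^{**};$$
(ii) converges weak star to $f \in X^*$ (notation $f_n \overset{*}{\rightharpoonup} f$ or $\text{w}^*\text{-}\lim_{n\to\infty} f_n = f$) if
$$\lim_{n\to\infty} f_n(x) = f(x) \quad \text{for every } x \in X.$$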

Criteria for weak convergence in $L^p$-spaces can be found, e.g., in Dunford & Schwartz [44, Chapter IV, §8].
The weak convergence in $X^*$ obviously has the same properties as that in $X$. Because of the continuous embedding $\kappa: X \to X^{**}$ (see the proof of Proposition 2.1.22(iii)) the w-convergence implies the w$^*$-convergence. The converse is true if $X$ is a reflexive space, i.e., $\kappa(X) = X^{**}$.
Since the w$^*$-topology is generally weaker than the w-topology, there can exist more w$^*$-compact sets than w-compact ones. In fact, the following result (the Alaoglu–Bourbaki Theorem, see Conway [28], Dunford & Schwartz [44], Fabian et al. [49]) holds:
If $X$ is a normed linear space, then any closed ball in $X^*$ is w$^*$-compact. If, moreover, $X$ is separable, then the ball is also sequentially w$^*$-compact.
For example, this theorem can be applied to balls in $L^p(\Omega)$, $1 < p \le \infty$.
In the rest of this section we will examine adjoint operators. Suppose that $X$ and $Y$ are normed linear spaces and $A \in L(X,Y)$. If $g \in Y^*$, then
$$A^*g := g(A\,\cdot) \in X^*.$$
The operator $A^*: Y^* \to X^*$ is obviously linear, and it is also continuous since
$$|A^*g(x)| = |g(Ax)| \le \|g\|_{Y^*}\|Ax\|_Y \le \|g\|_{Y^*}\|A\|_{L(X,Y)}\|x\|_X.$$
If $H_1$, $H_2$ are Hilbert spaces and $A \in L(H_1, H_2)$, we have another approach to the definition of an adjoint operator, namely the one based on the Riesz Representation Theorem: For $y \in H_2$ the mapping $f: x \mapsto (Ax, y)_{H_2}$ is a continuous linear form on $H_1$, and hence there is $z \in H_1$ for which $f(x) = (x, z)_{H_1}$. This $z$ is uniquely determined by $y$, and we denote for a moment $z = A^+y$, i.e.,
$$(Ax, y)_{H_2} = (x, A^+y)_{H_1}.$$


There is a very slight difference between $A^*$ and $A^+$, e.g., $(\lambda A)^* = \lambda A^*$ and $(\lambda A)^+ = \overline{\lambda}A^+$ (see also Example 2.1.28 below). So we will use the same notation, namely $A^*$, for both concepts. Symmetric matrices have certain special properties (e.g., their canonical forms are diagonal). The same can be expected for their generalization in the Hilbert space setting, which is defined as follows:
An operator $A \in L(H)$ is said to be self-adjoint if $A = A^*$, i.e.,
$$(Ax, y) = (x, Ay) \quad \text{for all } x, y \in H.$$
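In finite dimensions the two adjoints can be compared directly. The sketch below is a minimal illustration, assuming $H_1 = \mathbb{C}^2$, $H_2 = \mathbb{C}^3$ with the standard scalar product $(u, v) = \sum_i u_i\overline{v_i}$ (linear in the first argument) and a sample matrix: the Hilbert adjoint $A^+$ is the conjugate transpose, while the Banach adjoint $A^*$, acting on the coefficient vector of a linear form $g(y) = \sum_i g_iy_i$, is the plain transpose. All helper names are hypothetical.

```python
# Minimal sketch (hypothetical helper names): Banach adjoint A* versus
# Hilbert adjoint A^+ of a complex 3 x 2 matrix A acting from C^2 to C^3.

def inner(u, v):
    """Scalar product (u, v) = sum u_i * conj(v_i): linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(A, x):
    """Matrix-vector product Ax, A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    """Banach adjoint: the form g(y) = sum g_i y_i goes to g(A .), with coefficients A^T g."""
    return [list(col) for col in zip(*A)]

def conj_transpose(A):
    """Hilbert adjoint A^+ characterized by (Ax, y) = (x, A^+ y)."""
    return [[a.conjugate() for a in col] for col in zip(*A)]

def scale(lam, A):
    return [[lam * a for a in row] for row in A]

def close(P, Q, tol=1e-12):
    return all(abs(p - q) < tol for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

A = [[1 + 2j, 3 + 0j], [0j, -1j], [2 + 0j, 1 - 1j]]
x = [1 - 1j, 2 + 0.5j]
y = [0.5j, 1 + 0j, -2 + 1j]
lam = 2 - 3j

# (Ax, y) = (x, A^+ y); (lam A)* = lam A*, whereas (lam A)^+ = conj(lam) A^+.
print(abs(inner(apply(A, x), y) - inner(x, apply(conj_transpose(A), y))))
print(close(transpose(scale(lam, A)), scale(lam, transpose(A))))
print(close(conj_transpose(scale(lam, A)), scale(lam.conjugate(), conj_transpose(A))))
```

The conjugation in the last line is exactly the "very slight difference" mentioned above; for real scalars both adjoints coincide.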

In order to generalize Theorem 1.1.25 to continuous linear operators on infinite dimensional normed linear spaces we will use the same notation but with a slightly different meaning:
If $M \subset X$, then $M^\perp := \{f \in X^* : f(x) = 0 \text{ for all } x \in M\}$.
If $N \subset X^*$, then ${}^\perp N := \{x \in X : f(x) = 0 \text{ for all } f \in N\}$.
We invite the reader to compare this symbol with that for orthogonal complements in Hilbert spaces.
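Proposition 2.1.27. Let $X$, $Y$ be normed linear spaces and let $A \in L(X,Y)$. Then
(i) if $x_n \rightharpoonup x$, then $Ax_n \rightharpoonup Ax$;
(ii) if $A$ is, moreover, continuously invertible, then $A^*$ is also continuously invertible and $(A^*)^{-1} = (A^{-1})^*$;
(iii) $\operatorname{Ker} A = {}^\perp(\operatorname{Im} A^*)$;
(iv) $\overline{\operatorname{Im} A} = {}^\perp(\operatorname{Ker} A^*)$.
Proof. (i) It is easy with the use of $A^*$: for $g \in Y^*$ we have $g(Ax_n) = A^*g(x_n) \to A^*g(x) = g(Ax)$.
(ii) It is sufficient to show that $(A^{-1})^*A^* = I_{Y^*}$ and $A^*(A^{-1})^* = I_{X^*}$. This follows from the more general result
$$(AB)^* = B^*A^*,$$
which is easily verified.
(iii) The inclusion $\subset$ is obvious from the definition; for the converse inclusion it is sufficient to use the fact that $Y^*$ separates the points of $Y$.
(iv) It is easy to see that $(\operatorname{Im} A)^\perp = \operatorname{Ker} A^*$. To get (iv) it suffices to prove that
$${}^\perp(M^\perp) = \overline{\operatorname{Lin} M} \quad \text{for } M \subset X.$$
If $x_0$ belonged to ${}^\perp(M^\perp) \setminus \overline{\operatorname{Lin} M}$, $x_0$ would be separated from $\overline{\operatorname{Lin} M}$ by a linear form $f \in X^*$ (Corollary 2.1.18). Since $\overline{\operatorname{Lin} M}$ is a subspace of $X$, this separating $f$ would be in $(\overline{\operatorname{Lin} M})^\perp = M^\perp$. Therefore $f(x_0) = 0$, and a contradiction is obtained. The converse inclusion $\overline{\operatorname{Lin} M} \subset {}^\perp(M^\perp)$ is obvious. ∎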



Notice that statement (iv) does not give a sufficient condition for solvability of the equation
$$Ax = y$$
since only the closure of $\operatorname{Im} A$ is characterized. There are many operators whose range is not closed. A simple example is $Ax(t) = \int_0^t x(s)\,ds$ considered either in $C[0,1]$ or in $L^\infty(0,1)$. It is not an easy task to decide whether an operator has a closed range or not. The following statement is useful in applications.
If $X$, $Y$ are Banach spaces and $A \in L(X,Y)$ is injective, then $\operatorname{Im} A$ is closed if and only if there is a positive constant $c$ such that
$$\|Ax\| \ge c\|x\| \quad \text{for all } x \in X.$$
Sufficiency is easy; the necessity part follows from the Open Mapping Theorem.
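There is an important subclass of operators with a closed range, namely the so-called Fredholm operators. An operator $A \in L(X)$ is said to be Fredholm if
$$\dim \operatorname{Ker} A < \infty, \qquad \operatorname{Im} A \text{ is closed}, \qquad \text{and} \qquad \operatorname{codim} \operatorname{Im} A < \infty$$
(i.e., the dimension of any direct complement of $\operatorname{Im} A$ is finite). We note that
$$\operatorname{codim} \operatorname{Im} A = \dim \operatorname{Ker} A^*$$
(this is basically Proposition 2.1.27(iv)). We define
$$\operatorname{ind} A := \dim \operatorname{Ker} A - \dim \operatorname{Ker} A^*$$
and call it the index of the Fredholm operator $A$. A special class of Fredholm operators will be examined in the next section.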
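In finite dimensions every linear map from $\mathbb{K}^n$ into $\mathbb{K}^m$ is Fredholm, and the rank–nullity theorem gives $\operatorname{ind} A = (n - \operatorname{rank} A) - (m - \operatorname{rank} A^*) = n - m$, independently of the entries; in particular, $\operatorname{ind} A = 0$ for $A \in L(X)$ with $\dim X < \infty$. The sketch below is only an illustration: it treats rectangular matrices (i.e., $A \in L(X,Y)$, as in Theorem 2.2.12 later in this section), computes both kernel dimensions by exact Gaussian elimination, and its function names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def fredholm_index(rows):
    """ind A = dim Ker A - dim Ker A* for the map K^n -> K^m given by an m x n matrix."""
    m, n = len(rows), len(rows[0])
    dim_ker = n - rank(rows)                  # rank-nullity for A
    dim_ker_adj = m - rank(transpose(rows))   # rank-nullity for A* (an n x m matrix)
    return dim_ker - dim_ker_adj

print(fredholm_index([[1, 2, 3], [2, 4, 6]]))   # n - m = 3 - 2
print(fredholm_index([[1, 2], [2, 4]]))         # square and singular, yet index 0
```

Both kernels can be large while the index stays fixed, which foreshadows the stability of the index under compact perturbations.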
We have not yet introduced any sufficiently broad family of continuous linear operators. The next example fills this gap.
Example 2.1.28 (Integral operators). Let $\Omega$ and $\tilde\Omega$ be open subsets of $\mathbb{R}^M$ and $\mathbb{R}^N$, respectively. Assume that $k: \tilde\Omega \times \Omega \to \mathbb{C}$ is a measurable function for which there are constants $c_1$, $c_2$ such that
$$\int_\Omega |k(t,s)|\,ds \le c_1 \quad \text{for a.a. } t \in \tilde\Omega, \qquad \int_{\tilde\Omega} |k(t,s)|\,dt \le c_2 \quad \text{for a.a. } s \in \Omega.$$
Then the operator $A$ defined by
$$Ax(t) = \int_\Omega k(t,s)x(s)\,ds \qquad (2.1.3)$$
is a linear bounded operator from $L^p(\Omega)$ into $L^p(\tilde\Omega)$ for $1 \le p \le \infty$.12
12 Conditions on the kernel $k$ which guarantee that $A \in L(L^p(\Omega), L^r(\tilde\Omega))$ can be found in Dunford & Schwartz [44, Chapter VI, §11A].


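To prove this assertion we have to show that $Ax(t)$ exists for a.a. $t \in \tilde\Omega$, is measurable on $\tilde\Omega$ and belongs to $L^p(\tilde\Omega)$.13 For $1 \le p < \infty$, by the Hölder inequality, we get for $\frac{1}{p'} = 1 - \frac{1}{p}$:
$$|Ax(t)| \le \int_\Omega |k(t,s)|^{\frac{1}{p'}}\,|k(t,s)|^{\frac{1}{p}}|x(s)|\,ds \le c_1^{\frac{1}{p'}}\left(\int_\Omega |k(t,s)|\,|x(s)|^p\,ds\right)^{\frac{1}{p}}.$$
Set
$$\varphi(t) := \int_\Omega |k(t,s)|\,|x(s)|^p\,ds.$$
Since the measurable function $(t,s) \mapsto |k(t,s)|\,|x(s)|^p$ can be approximated by step functions (consider first bounded domains), the function $t \mapsto \varphi(t)$ is measurable on $\tilde\Omega$. The Fubini Theorem yields
$$\int_{\tilde\Omega} |\varphi(t)|\,dt = \int_{\tilde\Omega}\int_\Omega |k(t,s)|\,|x(s)|^p\,ds\,dt = \int_\Omega\left(\int_{\tilde\Omega}|k(t,s)|\,dt\right)|x(s)|^p\,ds \le c_2\|x\|^p_{L^p(\Omega)}.$$
In particular, $\varphi$ is finite a.e. Since $t \mapsto Ax(t)$ is measurable on $\tilde\Omega$ (by the same argument as above), we also have
$$\|Ax\|_{L^p(\tilde\Omega)} = \left(\int_{\tilde\Omega}|Ax(t)|^p\,dt\right)^{\frac{1}{p}} \le c_1^{\frac{1}{p'}}c_2^{\frac{1}{p}}\|x\|_{L^p(\Omega)}.$$
The Fubini Theorem also yields (we identify $g \in [L^p(\tilde\Omega)]^*$, $1 \le p < \infty$, with a function from $L^{p'}(\tilde\Omega)$; see Example 2.1.20(iii))
$$\int_{\tilde\Omega} Ax(t)g(t)\,dt = \int_{\tilde\Omega}\left(\int_\Omega k(t,s)x(s)\,ds\right)g(t)\,dt = \int_\Omega\left(\int_{\tilde\Omega}k(t,s)g(t)\,dt\right)x(s)\,ds = \int_\Omega (A^*g)(s)x(s)\,ds,$$
i.e.,
$$A^*g: s \mapsto \int_{\tilde\Omega} k(t,s)g(t)\,dt, \qquad g \in L^{p'}(\tilde\Omega).$$
We note that the adjoint operator to $A$ for $p = 2$ in the sense of the Riesz Representation Theorem is of the form
$$A^*g(s) = \int_{\tilde\Omega}\overline{k(t,s)}\,g(t)\,dt.$$
In particular, $A$ is self-adjoint if $\Omega = \tilde\Omega$ and $k(t,s) = \overline{k(s,t)}$. We will continue the study of integral operators in the next section (Example 2.2.5).
13 The case $p = \infty$ is left to the reader.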
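The estimate $\|Ax\|_{L^p(\tilde\Omega)} \le c_1^{1/p'}c_2^{1/p}\|x\|_{L^p(\Omega)}$ can be observed numerically. The sketch below is an illustration under assumptions that are not in the text: $\Omega = \tilde\Omega = (0,1)$, a midpoint grid with $n$ cells, and the sample kernel $k(t,s) = (1+t)e^{-|t-s|}$ (chosen non-symmetric so that $c_1 \ne c_2$). With all integrals replaced by Riemann sums, the Hölder and Fubini steps of the proof go through verbatim, so the discrete inequality should hold up to rounding.

```python
import math
import random

def discretize(kernel, n):
    """Midpoint grid on (0, 1) with n cells and the matrix k(t_i, s_j)."""
    h = 1.0 / n
    nodes = [(i + 0.5) * h for i in range(n)]
    return h, [[kernel(t, s) for s in nodes] for t in nodes]

def apply_operator(K, x, h):
    """Riemann-sum version of Ax(t_i) = int k(t_i, s) x(s) ds."""
    return [h * sum(kij * xj for kij, xj in zip(row, x)) for row in K]

def lp_norm(x, h, p):
    return (h * sum(abs(v) ** p for v in x)) ** (1.0 / p)

def schur_constants(K, h):
    c1 = max(h * sum(abs(v) for v in row) for row in K)                    # sup_t int |k| ds
    c2 = max(h * sum(abs(row[j]) for row in K) for j in range(len(K[0])))  # sup_s int |k| dt
    return c1, c2

def schur_bound(c1, c2, p):
    """c1^(1/p') * c2^(1/p) with 1/p' = 1 - 1/p (for p = 1 this is just c2)."""
    return c1 ** (1.0 - 1.0 / p) * c2 ** (1.0 / p)

n = 200
h, K = discretize(lambda t, s: (1.0 + t) * math.exp(-abs(t - s)), n)
c1, c2 = schur_constants(K, h)
rng = random.Random(0)
ratios = {}
for p in (1.0, 1.5, 2.0, 4.0):
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ratios[p] = lp_norm(apply_operator(K, x, h), h, p) / (schur_bound(c1, c2, p) * lp_norm(x, h, p))
print(c1, c2, ratios)   # every ratio should be at most 1
```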


In Example 2.1.11 we have mentioned that differential operators on a function space are not continuous and are only densely defined. Therefore we wish to extend the notion of the adjoint operator to this case. Assume that $A$ is a linear operator defined on a dense subspace $\operatorname{Dom} A$ of $X$ with values in $Y$. Put
$$D = \{g \in Y^* : \text{the linear form } x \in \operatorname{Dom} A \mapsto g(Ax) \text{ has a continuous extension } f \text{ to the whole of } X\}.$$
Obviously, $D$ is a linear subspace of $Y^*$ containing $o$, and the extension $f$ is uniquely determined by $g$ (since $\operatorname{Dom} A$ is dense). We denote
$$A^*g := f, \qquad \operatorname{Dom}(A^*) = D,$$
and call $A^*$ the adjoint operator to $A$.


Example 2.1.29. The simplest differential operator is defined by
$$Ax(t) = \dot{x}(t).$$
This relation can be considered in various function spaces and also with different domains. If we are interested in its adjoint we should have a good representation of the dual space. This leads to the observation that spaces of integrable functions are more convenient than spaces of continuous functions. Therefore let $X = L^p(0,1)$, $1 \le p < \infty$, and $\operatorname{Dom} A = C^1[0,1]$. Consider $A: \operatorname{Dom} A \subset X \to X$. We wish to compute $A^*$. Assume $g \in \operatorname{Dom}(A^*) \subset L^{p'}(0,1)$ and $A^*g = f$, i.e.,
$$g(Ax) = \int_0^1 \dot{x}(t)g(t)\,dt = \int_0^1 x(t)f(t)\,dt = A^*g(x) \quad \text{for all } x \in \operatorname{Dom} A.$$
In particular, for
$$x \in V = \{x \in \operatorname{Dom} A : x(1) = 0\} \qquad \text{and} \qquad F(t) = \int_0^t f(s)\,ds,$$
the integration by parts14 yields
$$\int_0^1 x(t)f(t)\,dt = x(t)F(t)\Big|_0^1 - \int_0^1 \dot{x}(t)F(t)\,dt = -\int_0^1 \dot{x}(t)F(t)\,dt.$$
Since the restriction $A|_V$ of $A$ to $V$ has a dense range in $L^p(0,1)$ ($\operatorname{Im} A|_V = C[0,1]$), we have $F + g = o$ in $L^{p'}(0,1)$. This means that $g$ can be changed on a set of measure zero to have $g$ absolutely continuous and
$$\dot{g} = -f \in L^{p'}(0,1), \qquad \text{i.e.,} \qquad g \in W^{1,p'}(0,1).$$
14 If you are not familiar with integration by parts for the Lebesgue integral (notice that $f \in L^{p'}(0,1) \subset L^1(0,1)$), you can approximate $f$ by a continuous function to get a standard situation for integration by parts.


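Moreover, $g(0) = -F(0) = 0$. Taking $F(t) = -\int_t^1 f(s)\,ds$ (and $x \in \operatorname{Dom} A$ with $x(0) = 0$) we see that also $g(1) = 0$. This proves that
$$\operatorname{Dom}(A^*) \subset \{g \in W^{1,p'}(0,1) : g(0) = g(1) = 0\} = W_0^{1,p'}(0,1)\;{}^{15}$$
and
$$A^*g = -\dot{g}.$$
Integration by parts also yields the converse inclusion, i.e.,
$$\operatorname{Dom}(A^*) = W_0^{1,p'}(0,1).$$
Notice that $\operatorname{Im} A$ is dense in $L^p(0,1)$ but not closed, while $A^*$ is injective and
$$\operatorname{Im} A^* = \left\{f \in L^{p'}(0,1) : \int_0^1 f(t)\,dt = 0\right\}$$
is closed but not dense in $L^{p'}(0,1)$.
Notice also that $\sigma(A) = \sigma(A^*) = \mathbb{C}$ and any $\lambda \in \mathbb{C}$ is an eigenvalue of $A$. On the contrary, $A^*$ has no eigenvalues.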
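The role of the boundary conditions in $\operatorname{Dom}(A^*)$ can be seen numerically. A minimal sketch, assuming the sample functions $x(t) = e^t$ (no boundary condition, as for $\operatorname{Dom} A = C^1[0,1]$) and $g(t) = t(1-t) \in W_0^{1,p'}(0,1)$: composite Simpson quadrature confirms $\int_0^1 \dot{x}g\,dt = \int_0^1 x(-\dot{g})\,dt$, whereas for $g(t) = t$, which does not vanish at $t = 1$, the two sides differ by the boundary term $x(1)g(1) - x(0)g(0) = e$. The helper names are hypothetical.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def pairing_gap(x, dx, g, dg):
    """int_0^1 x' g dt - int_0^1 x (-g') dt, i.e. the boundary term x g |_0^1."""
    lhs = simpson(lambda t: dx(t) * g(t), 0.0, 1.0)
    rhs = simpson(lambda t: x(t) * (-dg(t)), 0.0, 1.0)
    return lhs - rhs

x = dx = math.exp                                        # x(t) = e^t, x'(t) = e^t
gap_dirichlet = pairing_gap(x, dx, lambda t: t * (1 - t), lambda t: 1 - 2 * t)
gap_free_end = pairing_gap(x, dx, lambda t: t, lambda t: 1.0)
print(gap_dirichlet, gap_free_end, math.e)
```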
A more general result (due to S. Banach) is stated in the following proposition (see, e.g., Yosida [135]).
Proposition 2.1.30. Let $X$, $Y$ be Banach spaces and let $A$ be a closed densely defined linear operator from $X$ into $Y$. Then $\operatorname{Im} A$ is closed if and only if $\operatorname{Im} A^*$ is closed. Moreover,
$$\overline{\operatorname{Im} A} = {}^\perp(\operatorname{Ker} A^*) \qquad \text{and} \qquad \operatorname{Ker} A = {}^\perp(\operatorname{Im} A^*).$$
Nevertheless, notice that $A$ is not closed in our example. Proposition 2.1.30 can be applied to $\overline{A}$16 ($A^*$ is always closed); $\operatorname{Dom}(\overline{A}) = W^{1,p}(0,1)$, $\overline{A}x = \dot{x}$. This simple example shows how the domain of a (linear) noncontinuous operator affects its properties.
15 The last equality should be proved. A deeper insight into these Sobolev spaces will be given in Chapter 7, cf. also Exercise 1.2.46.
16 Notice that $\overline{A}$ is an extension of $A$ and, moreover, the graph of $\overline{A}$ is the closure of the graph of $A$ (it is also said that $\overline{A}$ is the closure of $A$).
Example 2.1.31. Put
$$Ax = -\ddot{x} \qquad \text{with} \qquad \operatorname{Dom} A = \{x \in C^2(a,b) : x(a) = x(b) = 0\}.$$
If the equation
$$Ax = \lambda x$$


has a nonzero solution $w$ ($\in \operatorname{Dom} A$), then $\lambda$ is called an eigenvalue and $w$ a corresponding eigenfunction of $A$. A simple calculation shows that $\frac{k^2\pi^2}{(b-a)^2}$, $k \in \mathbb{N}$, are all eigenvalues of $A$,17 and $\sin\frac{k\pi}{b-a}(t-a)$ are the corresponding eigenfunctions. Consider now the boundary value problem
$$-\ddot{x}(t) = \lambda x(t) + f(t), \quad t \in (a,b), \qquad x(a) = x(b) = 0. \qquad (2.1.4)$$
Let $\varphi_1$, $\varphi_2$ be a fundamental system for the differential equation $-\ddot{x} - \lambda x = 0$. The Variation of Constants Formula shows that
$$x(t) = c_1\varphi_1(t) + c_2\varphi_2(t) + \int_a^t \frac{\varphi_1(t)\varphi_2(s) - \varphi_1(s)\varphi_2(t)}{W(s)}f(s)\,ds \qquad (2.1.5)$$
is a solution to $-\ddot{x} - \lambda x = f$. Here $W = \varphi_1\dot{\varphi}_2 - \dot{\varphi}_1\varphi_2$ is the Wronski determinant of $\varphi_1$, $\varphi_2$ (notice that for this equation we can always choose $\varphi_1$, $\varphi_2$ such that $W \equiv 1$). We wish to find constants $c_1$, $c_2$ such that $x$ given by (2.1.5) satisfies the boundary conditions $x(a) = x(b) = 0$. The number $\lambda$ is not an eigenvalue if and only if
$$\det\begin{pmatrix}\varphi_1(a) & \varphi_2(a)\\ \varphi_1(b) & \varphi_2(b)\end{pmatrix} \ne 0.$$
In this case the formula (2.1.5) shows that for any $f \in C[a,b]$ the problem (2.1.4) has a unique solution in $\operatorname{Dom} A$,18 which is called a classical solution. This means that $\lambda \in \varrho(A)$. Suppose now that $\lambda$ is an eigenvalue. Then we can take $\varphi_1$ as a corresponding eigenfunction and, choosing $W \equiv 1$, get
$$x(a) = c_2\varphi_2(a), \quad \text{i.e.,} \quad c_2 = 0$$
($\varphi_2(a) \ne 0$ since $\varphi_1$, $\varphi_2$ are linearly independent), and
$$x(b) = -\varphi_2(b)\int_a^b \varphi_1(s)f(s)\,ds = 0, \quad \text{i.e.,} \quad \int_a^b \varphi_1(s)f(s)\,ds = 0 \qquad (2.1.6)$$
since $\varphi_2(b) \ne 0$ (by the same argument as above). Notice that (2.1.6) is also a necessary condition for solvability of (2.1.4).
We will return to this example in the next section (see Example 2.2.17).
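The eigenvalues $\frac{k^2\pi^2}{(b-a)^2}$ can be compared with a finite-difference model of $A = -\frac{d^2}{dt^2}$. The sketch below is an illustration, not part of the text: it replaces $-\ddot{x}$ by the central difference quotient on $n$ interior grid points with zero boundary values, on the sample interval $[1,3]$. For this matrix the sampled eigenfunctions $\sin\frac{k\pi}{b-a}(t-a)$ are exact eigenvectors with eigenvalues $\frac{4}{h^2}\sin^2\frac{k\pi h}{2(b-a)}$, $h = \frac{b-a}{n+1}$, and these tend to $\frac{k^2\pi^2}{(b-a)^2}$ as $h \to 0$.

```python
import math

def apply_discrete_laplacian(v, h):
    """(-v[i-1] + 2 v[i] - v[i+1]) / h^2 with zero values outside (Dirichlet conditions)."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((-left + 2.0 * v[i] - right) / h ** 2)
    return out

def sampled_eigenfunction(k, a, b, n):
    """Step h and values of sin(k pi (t - a)/(b - a)) at t_i = a + i h, i = 1..n."""
    h = (b - a) / (n + 1)
    return h, [math.sin(k * math.pi * (i * h) / (b - a)) for i in range(1, n + 1)]

def discrete_eigenvalue(k, a, b, n):
    h = (b - a) / (n + 1)
    return 4.0 / h ** 2 * math.sin(k * math.pi * h / (2.0 * (b - a))) ** 2

def exact_eigenvalue(k, a, b):
    return (k * math.pi / (b - a)) ** 2

a, b, n = 1.0, 3.0, 400
for k in (1, 2, 3):
    h, v = sampled_eigenfunction(k, a, b, n)
    mu = discrete_eigenvalue(k, a, b, n)
    residual = max(abs(p - mu * q) for p, q in zip(apply_discrete_laplacian(v, h), v))
    print(k, mu, exact_eigenvalue(k, a, b), residual)
```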
17 The minus sign in the definition of $A$ is conventional; it is introduced to obtain positive eigenvalues.
18 If $f \in L^p(a,b)$, then it is possible to show that the function $x = x(t)$ given by (2.1.5) belongs to $W^{2,p}(a,b)$, $x(a) = x(b) = 0$, and the equation in (2.1.4) is satisfied a.e. in $(a,b)$. Such a solution is called a strong solution.
Example 2.1.32. Linear differential operators of the second order with nonconstant coefficients are more complicated. To simplify our exposition we consider a differential expression
$$Lx := p_0\ddot{x} + p_1\dot{x} + p_2x$$


where $p_0$, $p_1$, $p_2$ are continuous functions on a closed bounded interval $[a,b]$ and $p_0 < 0$ on this interval (the so-called regular case). Let $X = L^p(a,b)$, $1 \le p < \infty$, and
$$D = \{x \in W^{2,p}(a,b) : x(a) = x(b) = 0\}.$$
Put
$$Ax = Lx, \qquad x \in D = \operatorname{Dom} A,$$
and consider $A: \operatorname{Dom} A \subset X \to X$. A solution of $Ax = f$ is therefore a strong solution of
$$Lx(t) = f(t), \quad t \in (a,b), \qquad x(a) = x(b) = 0.$$
It can be proved that $A$ is injective provided $p_2 > 0$ in $[a,b]$. (Assume by contradiction that $\operatorname{Ker} A \ne \{o\}$ and show that there is $x_0 \in \operatorname{Ker} A$ which has a negative minimum at an interior point $c \in (a,b)$. Deduce that $Lx_0(c) < 0$.) The Variation of Constants Formula shows that the operator $A$ is also surjective and $A^{-1}$ is an integral operator
$$A^{-1}f(t) = \int_a^b G(t,s)f(s)\,ds \qquad (2.1.7)$$
where $G$ is the so-called Green function of $L$. The Green function is nonnegative on $[a,b]\times[a,b]$ and satisfies the estimates from Example 2.1.28. Therefore $A^{-1} \in L(X)$. In order to calculate the adjoint $A^*$ it is convenient to consider the so-called formal adjoint expression to $L$, i.e.,
$$My = (p_0y)'' - (p_1y)' + p_2y,$$
which is obtained by integrating by parts in the integral $\int_a^b Lx(t)y(t)\,dt$ and omitting the boundary terms. Put
$$By = My \qquad \text{for} \qquad y \in D = \operatorname{Dom} B.$$
The same integration as above shows that $B \subset A^*$. The proof of the equality $A^* = B$ needs a more careful calculation.
The interested reader can consult the books Coddington & Levinson [27, Chapter 9], Edmunds & Evans [46] or Dunford & Schwartz [45], in particular Chapter XIII, for details and also for more complicated singular cases which are important in applications, e.g., in Quantum Mechanics (the Schrödinger equation).
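For the simplest instance of (2.1.7) the Green function is explicit. As an illustration (an assumption beyond the text: $[a,b] = [0,1]$, $p_0 \equiv -1$, $p_1 = p_2 \equiv 0$, so $Lx = -\ddot{x}$; the sufficient condition $p_2 > 0$ fails here, but $A$ is still injective because $0$ is not an eigenvalue by Example 2.1.31) one has $G(t,s) = s(1-t)$ for $s \le t$ and $G(t,s) = t(1-s)$ for $t \le s$, which is nonnegative. The sketch evaluates $A^{-1}f$ by the midpoint rule and compares it with known solutions: $f \equiv 1$ gives $x(t) = \frac{t(1-t)}{2}$, and $f(s) = \pi^2\sin\pi s$ gives $x(t) = \sin\pi t$.

```python
import math

def green(t, s):
    """Green function of Lx = -x'' on [0, 1] with x(0) = x(1) = 0."""
    return s * (1.0 - t) if s <= t else t * (1.0 - s)

def inverse_apply(f, t, n=2000):
    """Midpoint-rule approximation of (A^{-1} f)(t) = int_0^1 G(t, s) f(s) ds."""
    h = 1.0 / n
    return h * sum(green(t, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(n))

for t in (0.1, 0.37, 0.5, 0.9):
    print(t,
          inverse_apply(lambda s: 1.0, t), t * (1 - t) / 2,
          inverse_apply(lambda s: math.pi ** 2 * math.sin(math.pi * s), t), math.sin(math.pi * t))
```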


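Exercise 2.1.33. Let $X$, $Y$ be Banach spaces. If $A \in L(X,Y)$ has a continuous inverse $A^{-1} \in L(Y,X)$ and $B \in L(X,Y)$ is such that
$$\|B - A\| < \frac{1}{\|A^{-1}\|},$$
then $B$ is also continuously invertible and
$$\|B^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\,\|B - A\|}, \qquad \|B^{-1} - A^{-1}\| \le \frac{\|A^{-1}\|^2}{1 - \|A^{-1}\|\,\|B - A\|}\,\|B - A\|.$$
Hint. Examine the proof of Corollary 2.1.3 and write $A^{-1}B = A^{-1}(B - A) + I$.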
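A quick numerical sanity check of both estimates (not a proof): the sketch below assumes $X = Y = \mathbb{R}^2$ with the maximum norm, whose induced operator norm is the maximal row sum, the sample operator $A = \operatorname{diag}(2,1)$ with $\|A^{-1}\| = 1$, and random perturbations $B$ with $\|B - A\| \le 0.9$. The dimension, the norm and the sampling are choices made for the illustration; the function names are hypothetical.

```python
import random

def sub(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def op_norm(M):
    """Operator norm induced by the maximum norm on R^2: the maximal row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def perturbation_check(A, B):
    """Return (||B^-1||, its bound, ||B^-1 - A^-1||, its bound) from Exercise 2.1.33."""
    A_inv, dist = inv2(A), op_norm(sub(B, A))
    q = op_norm(A_inv) * dist
    if q >= 1:
        raise ValueError("hypothesis ||B - A|| < 1/||A^-1|| is violated")
    B_inv = inv2(B)
    return (op_norm(B_inv), op_norm(A_inv) / (1 - q),
            op_norm(sub(B_inv, A_inv)), op_norm(A_inv) ** 2 / (1 - q) * dist)

A = [[2.0, 0.0], [0.0, 1.0]]
rng = random.Random(1)
worst = 0.0
for _ in range(1000):
    E = [[rng.uniform(-0.45, 0.45) for _ in range(2)] for _ in range(2)]   # ||E|| <= 0.9
    B = [[a + e for a, e in zip(ra, re)] for ra, re in zip(A, E)]
    n_inv, bound_inv, n_diff, bound_diff = perturbation_check(A, B)
    worst = max(worst, n_inv / bound_inv, n_diff / bound_diff)
print(worst)   # should not exceed 1 (up to rounding)
```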


Exercise 2.1.34. Show that
$$e^{tA} = \sum_{n=0}^\infty \frac{t^nA^n}{n!}$$
is well defined for all $t \in \mathbb{R}$, $A \in L(X)$, provided $X$ is a Banach space, and, moreover, the vector function
$$\varphi: t \mapsto e^{tA}x_0$$
solves the differential equation
$$\dot{x}(t) = Ax(t)$$
and satisfies the initial condition $\varphi(0) = x_0$. (See also the end of Section 1.1, in particular Exercise 1.1.41.)
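For $X = \mathbb{R}^2$ the series can simply be summed. A minimal sketch, assuming truncation after a fixed number of terms (harmless here since $\|t^nA^n/n!\| \le |t|^n\|A\|^n/n!$) and the sample generator $A = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$: the partial sum should reproduce the rotation matrix $\begin{pmatrix}\cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}$, and a central difference quotient of $\varphi(t) = e^{tA}x_0$ should agree with $A\varphi(t)$.

```python
import math

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def mat_vec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def exp_series(A, t, terms=40):
    """Partial sum of sum_{n < terms} t^n A^n / n!."""
    size = len(A)
    term = [[float(i == j) for j in range(size)] for i in range(size)]   # n = 0: identity
    total = [row[:] for row in term]
    for n in range(1, terms):
        term = [[v * t / n for v in row] for row in mat_mul(term, A)]    # t^n A^n / n!
        total = [[s + v for s, v in zip(rs, rv)] for rs, rv in zip(total, term)]
    return total

A = [[0.0, -1.0], [1.0, 0.0]]
t, x0, delta = 1.3, [1.0, 2.0], 1e-5
E = exp_series(A, t)
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
phi = lambda s: mat_vec(exp_series(A, s), x0)
derivative = [(p - m) / (2 * delta) for p, m in zip(phi(t + delta), phi(t - delta))]
print(E, rotation)
print(derivative, mat_vec(A, phi(t)))
```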
Exercise 2.1.35. Let $K$ be a continuous real function on $[a,b]\times[a,b]$ and let $h \in C[a,b]$ be fixed. Let
$$M = \max_{(t,\tau)\in[a,b]\times[a,b]} |K(t,\tau)|$$
and let $\lambda \in \mathbb{R}$ be such that
$$|\lambda| < \frac{1}{M(b-a)}.$$
Prove that the integral equation
$$x(t) = \lambda\int_a^b K(t,\tau)x(\tau)\,d\tau + h(t)$$
has a unique solution $x \in C[a,b]$.
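Exercise 2.1.35 is a typical application of the Contraction Principle (Section 2.3): the map $x \mapsto \lambda\int_a^b K(\cdot,\tau)x(\tau)\,d\tau + h$ has Lipschitz constant at most $|\lambda|M(b-a) < 1$ in $C[a,b]$. The sketch below iterates this map on a uniform grid with trapezoidal weights, a discretization assumed for illustration (the weights sum to $b - a$, so the discrete map is a contraction with the same constant). For the sample data $K(t,\tau) = t\tau$, $h(t) = t$ on $[0,1]$ and $\lambda = 0.9$ (here $M = 1$), the exact solution is $x(t) = \frac{t}{1 - \lambda/3} = \frac{t}{0.7}$. The function name is hypothetical.

```python
def solve_second_kind(K, h, lam, a, b, n=200, tol=1e-12, max_iter=1000):
    """Picard iteration x <- lam * int_a^b K(t, tau) x(tau) dtau + h(t) on a uniform grid.

    The integral is replaced by the trapezoidal rule; returns the grid nodes and values.
    """
    step = (b - a) / n
    nodes = [a + i * step for i in range(n + 1)]
    weights = [step] * (n + 1)
    weights[0] = weights[-1] = step / 2          # trapezoidal weights, summing to b - a
    kernel = [[K(t, tau) for tau in nodes] for t in nodes]
    rhs = [h(t) for t in nodes]
    x = rhs[:]
    for _ in range(max_iter):
        new = [lam * sum(w * k * v for w, k, v in zip(weights, row, x)) + r
               for row, r in zip(kernel, rhs)]
        change = max(abs(u - v) for u, v in zip(new, x))
        x = new
        if change < tol:
            break
    return nodes, x

nodes, x = solve_second_kind(lambda t, tau: t * tau, lambda t: t, 0.9, 0.0, 1.0)
print(max(abs(v - t / 0.7) for t, v in zip(nodes, x)))   # only the quadrature error remains
```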

Exercise 2.1.36. Let $\{x_n\}_{n=1}^\infty$, $\{y_n\}_{n=1}^\infty$ be sequences in a Hilbert space $H$ such that $x_n \rightharpoonup x$, $y_n \to y$. Then
$$(x_n, y_n) \to (x, y).$$
Hint. Use Proposition 2.1.22(iii).


Exercise 2.1.37. Let $\{e_n\}_{n=1}^\infty$ be an orthonormal sequence in a Hilbert space. Show that
$$e_n \rightharpoonup o.$$
Hint. Use the Bessel inequality (1.2.17).
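A concrete instance of Exercise 2.1.37, with data chosen only for illustration: in $L^2(0,1)$ the functions $e_n(t) = \sqrt{2}\sin n\pi t$ form an orthonormal sequence, and for $x(t) = t(1-t)$ one computes $(x, e_n) = \frac{2\sqrt{2}\,(1 - (-1)^n)}{(n\pi)^3} \to 0$, although $\|e_n\| = 1$ for every $n$. The sketch checks these values by Simpson quadrature together with the Bessel inequality $\sum_n |(x, e_n)|^2 \le \|x\|^2 = \frac{1}{30}$.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def e(n):
    """The n-th element of the orthonormal sequence sqrt(2) sin(n pi t)."""
    return lambda t: math.sqrt(2.0) * math.sin(n * math.pi * t)

def x(t):
    return t * (1.0 - t)

coeffs = [simpson(lambda t, n=n: x(t) * e(n)(t), 0.0, 1.0) for n in range(1, 51)]
exact = [2 * math.sqrt(2.0) * (1 - (-1) ** n) / (n * math.pi) ** 3 for n in range(1, 51)]
norms = [simpson(lambda t, n=n: e(n)(t) ** 2, 0.0, 1.0) for n in range(1, 51)]
print(coeffs[:4], coeffs[48])
print(sum(c * c for c in coeffs), 1 / 30)
```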
Exercise 2.1.38. Prove assertion (iv) of Proposition 2.1.22 for a Hilbert space $X$.
Hint. Use the relation between the scalar product and the norm in $X$.
Exercise 2.1.39. Show that a convex set (in particular a subspace) of a normed linear space is weakly closed if and only if it is closed in the norm topology.
Hint. Suppose by contradiction that $C$ is a norm-closed convex set which is not weakly closed. Then there is $x_0 \in \overline{C}^{\,w} \setminus C$. Use the Separation Theorem (Corollary 2.1.18) to obtain a contradiction.
Exercise 2.1.40. Prove that actually
$$\|A^*\|_{L(Y^*,X^*)} = \|A\|_{L(X,Y)}.$$
Hint. The inequality $\|A^*\| \le \|A\|$ follows from the calculation after Remark 2.1.26. For the converse inequality use the dual characterization of the norm $\|Ax\|$.

2.2 Compact Operators


In this section we present a class of continuous linear operators whose properties are closely related to the properties of finite dimensional linear operators. The key assertions presented concern the Riesz–Schauder Theory and the Hilbert–Schmidt Theorem.
Definition 2.2.1. Let $X$ and $Y$ be normed linear spaces. A linear operator $A \in L(X,Y)$ is called a compact operator if the image of a ball in $X$ is relatively compact in $Y$. The set of all compact operators from $X$ into $Y$ is denoted by $\mathcal{C}(X,Y)$.
Remark 2.2.2.
(i) Every compact linear operator is continuous.
(ii) The compactness condition is mostly used in the following equivalent form:
For any bounded sequence $\{x_n\}_{n=1}^\infty \subset X$ there is a subsequence $\{x_{n_k}\}_{k=1}^\infty$ such that $\{Ax_{n_k}\}_{k=1}^\infty$ converges in the norm topology of $Y$.
(iii) Replacing the norm topology in $Y$ by the weak topology, a weakly compact operator can be defined. If either $X$ or $Y$ is reflexive, then any $A \in L(X,Y)$ is weakly compact. This follows from the Eberlein–Šmulyan Theorem (Remark 2.1.24) and the observation that $A \in L(X,Y)$ maps a weakly convergent sequence into a weakly convergent one (cf. Proposition 2.1.27(i)).


Example 2.2.3.
(i) If $A \in L(X,Y)$ and $\dim \operatorname{Im} A < \infty$ (the so-called operator of finite rank), then $A \in \mathcal{C}(X,Y)$.
(ii) Let $\{e_n\}_{n=1}^\infty$ be an orthonormal basis in a Hilbert space $H$. Put
$$Ae_n = \lambda_ne_n$$
and extend $A$ by linearity to the dense set $D := \operatorname{Lin}\{e_1, e_2, \dots\}$ in $H$. The operator $A$ is bounded on $D$ (and therefore it can be uniquely extended to a continuous operator on $H$) if and only if $\{\lambda_n\}_{n=1}^\infty$ is a bounded sequence. In addition,
$$\|A\| = \sup_n |\lambda_n|.$$
This follows immediately from the identity
$$\|Ax\|^2 = \sum_{n=1}^\infty |\lambda_n|^2|(x, e_n)|^2 \quad \text{for every } x \in H.$$
Moreover, $A$ is a compact operator on $H$ if and only if
$$\lim_{n\to\infty}\lambda_n = 0.$$
This is an easy consequence of Proposition 1.2.39.
Proposition 2.2.4. Let $X$, $Y$ and $Z$ be normed linear spaces. Then
(i) if $A \in \mathcal{C}(X,Y)$, $B \in L(Y,Z)$, then $BA \in \mathcal{C}(X,Z)$;
(ii) if $A \in \mathcal{C}(Y,Z)$, $B \in L(X,Y)$, then $AB \in \mathcal{C}(X,Z)$;
(iii) if $A \in \mathcal{C}(X,Y)$ and a sequence $\{x_n\}_{n=1}^\infty \subset X$ converges weakly to $x \in X$, then
$$\lim_{n\to\infty}\|Ax_n - Ax\| = 0;$$
(iv) assume that $Y$ is a Banach space and a sequence $\{A_n\}_{n=1}^\infty \subset \mathcal{C}(X,Y)$ converges to $A \in L(X,Y)$ in the norm operator topology. Then $A \in \mathcal{C}(X,Y)$.
Proof. The assertions (i) and (ii) are obvious.
To prove (iii) assume by contradiction that there is a subsequence $\{x_{n_k}\}_{k=1}^\infty$ such that
$$\|Ax_{n_k} - Ax\| \ge c > 0.$$
The sequence $\{x_n\}_{n=1}^\infty$ is bounded (Proposition 2.1.22(iii)), and hence there exist a subsequence $\{x_{n_{k_l}}\}_{l=1}^\infty$ and $y \in Y$ such that
$$\|Ax_{n_{k_l}} - y\| \to 0.$$
Since
$$f(Ax_n) = A^*f(x_n) \to A^*f(x) = f(Ax) \quad \text{for every } f \in Y^*,$$
we have $y = Ax$, and hence a contradiction.


(iv) Let $B(o;1)$ be the unit ball. By Proposition 1.2.3 it suffices to show that for any $\varepsilon > 0$ there is a finite $\varepsilon$-net of $A(B(o;1))$. We choose $n$ such that $\|A_n - A\| < \frac{\varepsilon}{2}$, and a finite $\frac{\varepsilon}{2}$-net for $A_n(B(o;1))$. By the triangle inequality, this is the desired $\varepsilon$-net for $A(B(o;1))$. ∎
Example 2.2.5.
(i) Let $k$ be a continuous function on the Cartesian product $[a,b]\times[a,b]$. Then the operator
$$Ax: t \in [a,b] \mapsto \int_a^b k(t,s)x(s)\,ds$$
is compact as an operator from $C[a,b]$ into itself.19 We give two proofs of this assertion.
The first is based on the use of the Arzelà–Ascoli Theorem (Theorem 1.2.13). Its assumptions are satisfied for $\mathcal{F} = A(B(o;1))$ where $B(o;1)$ is the unit ball in $C[a,b]$. The equicontinuity of $\mathcal{F}$ follows from the uniform continuity of $k$ on $[a,b]\times[a,b]$.
The second proof uses Proposition 2.2.4(iv). Put
$$\mathcal{A} = \operatorname{Lin}\{(t,s) \mapsto x(t)y(s) : x, y \in C[a,b]\}.$$
It is easy to see that $\mathcal{A}$ is a subalgebra of $C([a,b]\times[a,b])$ which satisfies the assumptions of the real or complex Stone–Weierstrass Theorem (Theorem 1.2.14). Hence there are functions $k_n \in \mathcal{A}$,
$$k_n(t,s) = \sum_{i=1}^{m_n} q_{n,i}(t)r_{n,i}(s), \qquad q_{n,i}, r_{n,i} \in C[a,b],$$
such that $k_n \to k$ uniformly in $[a,b]\times[a,b]$. In particular, this means that the operators
$$A_nx: t \mapsto \sum_{i=1}^{m_n} q_{n,i}(t)\int_a^b r_{n,i}(s)x(s)\,ds$$
converge in the operator norm to $A$. Since $\operatorname{Im} A_n \subset \operatorname{Lin}\{q_{n,1}, \dots, q_{n,m_n}\}$, all $A_n$ are compact and, therefore, $A$ is compact.
(ii) Let $\Omega$ be a measurable subset of $\mathbb{R}^M$ and let $k \in L^2(\Omega\times\Omega)$. Then the operator
$$Ax(t) = \int_\Omega k(t,s)x(s)\,ds$$
(the so-called Hilbert–Schmidt operator) is compact as an operator from $L^2(\Omega)$ into itself.
We present again two proofs of this statement. The first will be a typical Hilbert space proof, the second will use the reflexivity of $L^2(\Omega)$, and we will show how it could be used to get compactness of an integral operator on $L^p(\Omega)$.
19 This is true under more general assumptions, e.g., if the interval $[a,b]$ is replaced by a compact topological space $K$, $\mu$ is a Borel measure on $K$ and $A$ is defined by $Ax(t) = \int_K k(t,s)x(s)\,d\mu(s)$.


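The first proof is based on the following observation:
Let $\{e_k\}_{k=1}^\infty$, $\{f_k\}_{k=1}^\infty$ be two orthonormal bases in a separable Hilbert space $H$. Let $B \in L(H)$. By the Parseval equality we have
$$n^2(B) := \sum_{k,n=1}^\infty |(Be_k, f_n)|^2 = \sum_{k=1}^\infty \|Be_k\|^2 = \sum_{n=1}^\infty \|B^*f_n\|^2.$$
This shows that the quantity $n(B)$ depends only on $B$ and not on the particular choice of bases. Moreover, if $n(B) < \infty$, then $B \in \mathcal{C}(H)$. To see this take $\varepsilon > 0$, choose $n_\varepsilon \in \mathbb{N}$ such that $\sum_{n=n_\varepsilon+1}^\infty \|B^*f_n\|^2 < \varepsilon$, and define
$$B_\varepsilon x = \sum_{n=1}^{n_\varepsilon}(Bx, f_n)f_n.$$
Then $\dim \operatorname{Im} B_\varepsilon < \infty$ and
$$\|B_\varepsilon x - Bx\|^2 = \sum_{n=n_\varepsilon+1}^\infty |(x, B^*f_n)|^2 \le \|x\|^2\sum_{n=n_\varepsilon+1}^\infty \|B^*f_n\|^2 < \varepsilon\|x\|^2.$$
The compactness of $B$ follows from Proposition 2.2.4(iv).
In order to apply this statement to the Hilbert–Schmidt operator choose an orthonormal basis $\{e_n\}_{n=1}^\infty$ in $L^2(\Omega)$ and notice that
$$\varphi_{m,n}(t,s) := e_m(t)\overline{e_n(s)}$$
is an orthonormal set in $L^2(\Omega\times\Omega)$. Actually, $\{\varphi_{m,n}\}_{m,n=1}^\infty$ is an orthonormal basis (use Corollary 1.2.36). Since
$$(Ae_n, e_m)_{L^2(\Omega)} = (k, \varphi_{m,n})_{L^2(\Omega\times\Omega)},$$
the finiteness of $n(A)$ follows from the Bessel inequality.
Now we give the second proof. Let $\{x_n\}_{n=1}^\infty$ be a bounded sequence in $L^2(\Omega)$. Since $L^2(\Omega)$ as a Hilbert space is reflexive, there is a subsequence (denote it again by $\{x_n\}_{n=1}^\infty$) which is weakly convergent to an $x$ in $L^2(\Omega)$. In particular,
$$\int_\Omega k(t,s)x_n(s)\,ds \to \int_\Omega k(t,s)x(s)\,ds \quad \text{for a.a. } t \in \Omega$$
(the Fubini Theorem shows that $k(t,\cdot) \in L^2(\Omega)$ for a.a. $t \in \Omega$). Since
$$|Ax_n(t) - Ax(t)| \le \int_\Omega |k(t,s)|\,|x_n(s) - x(s)|\,ds \le \|x_n - x\|_{L^2(\Omega)}\left(\int_\Omega |k(t,s)|^2\,ds\right)^{\frac12} \le c\left(\int_\Omega |k(t,s)|^2\,ds\right)^{\frac12},$$
the Lebesgue Dominated Convergence Theorem yields
$$\|Ax_n - Ax\|_{L^2(\Omega)} \to 0.$$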
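The basis independence of $n(B)$ in the first proof is easy to observe in $\mathbb{R}^3$, where every operator is Hilbert–Schmidt. In the sketch below (a sample matrix; the orthonormal bases are the columns of rotation matrices, an arbitrary choice; helper names hypothetical) the three sums $\sum_{k,n}|(Be_k, f_n)|^2$, $\sum_k\|Be_k\|^2$ and $\sum_n\|B^*f_n\|^2$ coincide with the squared Frobenius norm of $B$ for each pair of bases.

```python
import math

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def mat_vec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rotation(theta, phi):
    """Orthogonal matrix: rotation about the z-axis by theta, then about the x-axis by phi."""
    cz, sz, cx, sx = math.cos(theta), math.sin(theta), math.cos(phi), math.sin(phi)
    Rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    Rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    return mat_mul(Rx, Rz)

def columns(Q):
    """The columns of an orthogonal matrix form an orthonormal basis."""
    return [list(col) for col in zip(*Q)]

def hs_sums(B, E, F):
    Bt = [list(col) for col in zip(*B)]           # real case: B* is the transpose
    double = sum(dot(mat_vec(B, e), f) ** 2 for e in E for f in F)
    images = sum(dot(mat_vec(B, e), mat_vec(B, e)) for e in E)
    adjoint = sum(dot(mat_vec(Bt, f), mat_vec(Bt, f)) for f in F)
    return double, images, adjoint

B = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0], [0.0, 1.0, -2.0]]
frobenius_sq = sum(v * v for row in B for v in row)
pairs = [(columns(rotation(0.0, 0.0)), columns(rotation(0.3, 1.1))),
         (columns(rotation(2.0, -0.7)), columns(rotation(0.3, 1.1)))]
for E, F in pairs:
    print(hs_sums(B, E, F), frobenius_sq)
```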


Proposition 2.2.6. Let $H$ be a Hilbert space and $A \in L(H)$. Then $A$ is a compact operator if and only if there is a sequence $\{A_n\}_{n=1}^\infty \subset L(H)$ of operators of finite rank which converges to $A$ in the operator norm topology.
Proof. Because of Proposition 2.2.4 only the necessity part is left to be proved. Let $B(o;1)$ be the unit ball in $H$. Since $\overline{A(B(o;1))}$ is compact, it is a separable metric space, and therefore
$$Y = \overline{\operatorname{Lin}}\,A(B(o;1))$$
is a separable Hilbert space. Let $\{e_n\}_{n=1}^\infty$ be an orthonormal basis in $Y$. Put
$$A_nx = \sum_{k=1}^n (Ax, e_k)e_k.$$
Then $A_n$ have finite rank and, for a given $\varepsilon > 0$,
$$\|A_nx - Ax\|^2 = \sum_{k=n+1}^\infty |(Ax, e_k)|^2 < \varepsilon \quad \text{for every } x \in B(o;1)$$
provided $n$ is sufficiently large (Proposition 1.2.39). ∎
Remark 2.2.7. The proof of the preceding proposition indicates that the result holds also in a Banach space $X$ with a Schauder basis $\{e_n\}_{n=1}^\infty$ (see page 40). The famous conjecture of S. Banach was that any separable Banach space has a Schauder basis. The first counterexample was constructed by P. Enflo. He found a compact operator in a separable Banach space which cannot be approximated by operators of finite rank. We notice that separable Banach spaces of functions like $C(\overline{\Omega})$, $L^p(\Omega)$, $W^{k,p}(\Omega)$ ($1 \le p < \infty$) have a Schauder basis.
One of our goals in this section is to generalize the Fredholm alternative (see footnote 6 on page 14). As we have seen in Section 1.1, the notion of the adjoint operator is very important.
Proposition 2.2.8 (Schauder). Let $X$, $Y$ be Banach spaces and assume that $A \in L(X,Y)$. Then $A$ is compact if and only if $A^*$ is compact.
Proof.
Step 1 (the only if part). Suppose that $A \in \mathcal{C}(X,Y)$ and $\{g_n\}_{n=1}^\infty \subset Y^*$, $\|g_n\|_{Y^*} \le 1$. It is easy to verify the assumptions of the Arzelà–Ascoli Theorem (Theorem 1.2.13) for the sequence of functions
$$g_n: K := \overline{A(B(o;1))} \to \mathbb{R} \ (\text{or } \mathbb{C})$$
($B(o;1)$ is the unit ball in $X$). By this theorem there is a subsequence $\{g_{n_k}\}_{k=1}^\infty$ which is uniformly convergent on $K$. Since
$$|A^*g_{n_k}(x) - A^*g_{n_l}(x)| \le \sup_{y\in K}|g_{n_k}(y) - g_{n_l}(y)| \quad \text{for each } x \in B(o;1)$$
and $X^*$ is complete, the sequence $\{A^*g_{n_k}\}_{k=1}^\infty$ is convergent in $X^*$.


Step 2 (the if part). Assume now that $A^* \in \mathcal{C}(Y^*, X^*)$. We embed $X$ into $X^{**}$ and $Y$ into $Y^{**}$ with the help of the canonical isometric embeddings $\kappa_X$ and $\kappa_Y$ (see the proof of Proposition 2.1.22(iii)). Since $A^*$ is compact, $A^{**}$ is compact by the first part of the proof. It suffices to show that
$$\kappa_Y(Ax) = A^{**}\kappa_X(x) \quad \text{for } x \in X,$$
and we leave that to the reader. ∎
If $A \in \mathcal{C}(X,Y)$, then the equation
$$Ax = y \qquad (2.2.1)$$
is scarcely ever well posed,20 as follows from the first part of the next theorem. This is the reason why we are rather interested in equations of the type
$$x - Ax = y. \qquad (2.2.2)$$
Theorem 2.2.9 (Riesz–Schauder Theory). Let $X$ be a Banach space and $A \in \mathcal{C}(X)$. Then
(i) if $\operatorname{Im} A$ is closed, then $\dim \operatorname{Im} A < \infty$;
(ii) $\dim \operatorname{Ker}(I - A) < \infty$;
(iii) $\operatorname{Im}(I - A)$ is closed;
(iv) (the Fredholm alternative)
$$\operatorname{Im}(I - A) = X \quad \text{if and only if} \quad \operatorname{Ker}(I - A) = \{o\};$$
(v) $\dim \operatorname{Ker}(I - A) = \dim \operatorname{Ker}(I - A^*)$.
Proof. (i) If $Y = \operatorname{Im} A$ is closed, then $A: X \to Y$ is an open mapping (Theorem 2.1.8). This means that a certain ball $B(o;\delta)$ in $Y$ is contained in the relatively compact set $A(B(o;1))$, i.e., $B(o;\delta)$ itself is relatively compact. By Proposition 1.2.15, $\dim Y < \infty$.
(ii) For the rest of the proof we put
$$T := I - A \qquad \text{and} \qquad Y := \operatorname{Ker} T.$$
Then the restriction of $A$ to the Banach space $Y$ maps $Y$ onto $Y$ (indeed, $Ay = y$ for $y \in Y$). By (i), $\dim Y < \infty$.
(iii) Because of (ii) there exists a continuous projection $P$ of $X$ onto $Y$ (Remark 2.1.19). Denote
$$Z := \operatorname{Ker} P, \qquad \text{i.e.,} \qquad X = Y \oplus Z,$$
20 An equation (2.2.1) is said to be well posed if $A$ is injective and $A^{-1}$ is continuous. If $A$ is an integral operator, then (2.2.1) is called an integral equation of the first kind. The equation (2.2.2) is called an integral equation of the second kind. The study of these equations carried out by I. Fredholm is considered to be one of the starting points in the development of functional analysis.


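and both $Y$ and $Z$ are Banach spaces. Since $T$ is injective on $Z$ and $T(Z) = \operatorname{Im} T$, the range $\operatorname{Im} T$ is closed provided there is a positive constant $c$ such that
$$\|Tz\| \ge c\|z\| \quad \text{for each } z \in Z,$$
see page 70. Suppose by contradiction that such $c$ does not exist, i.e., there are $z_n \in Z$ such that
$$\|z_n\| = 1 \qquad \text{and} \qquad \|Tz_n\| < \frac{1}{n}\|z_n\|.$$
Then one can find a subsequence $\{z_{n_k}\}_{k=1}^\infty$ for which $Az_{n_k}$ converges to a $y$. Since $Tz_{n_k} \to o$ and $z_{n_k} = Tz_{n_k} + Az_{n_k}$, we have $\lim_{k\to\infty} z_{n_k} = y \in Z$. This means that
$$Ty = o, \qquad \text{i.e.,} \qquad y \in Y \cap Z, \qquad \text{and thus} \qquad y = o.$$
This is a contradiction since $z_{n_k} \to y$ implies that $\|y\| = 1$.
(iv) We will prove the necessity part by way of contradiction. Assume that $\operatorname{Im} T = X$ and $\operatorname{Ker} T \ne \{o\}$, and put
$$Y_k := \operatorname{Ker} T^k.$$
Then
$$Y_1 \subsetneq Y_2 \subsetneq \cdots \subsetneq Y_k \subsetneq \cdots$$
since for $x_1 \in \operatorname{Ker}(I - A)$, $x_1 \ne o$, there is $x_2$ such that $x_1 = Tx_2$, i.e., $x_2 \in Y_2 \setminus Y_1$, etc. It follows from the construction in the proof of Proposition 1.2.15 that there are $y_k \in Y_k$, $\|y_k\| = 1$, such that $\operatorname{dist}(y_{k+1}, Y_k) \ge \frac12$. For $k > l$ we have
$$\|Ay_k - Ay_l\| = \|y_k - (y_l - Ty_l + Ty_k)\| \ge \operatorname{dist}(y_k, Y_{k-1}) \ge \frac12,$$
because $y_l - Ty_l + Ty_k \in Y_{k-1}$. This means that there is no convergent subsequence of $\{Ay_k\}_{k=1}^\infty$, a contradiction.
The sufficiency part is now easy: It follows from Proposition 2.2.8 and the previous part (iii) that $\operatorname{Im} T^*$ is closed. Assume that $\operatorname{Ker} T = \{o\}$. By Proposition 2.1.27(iii),
$$\operatorname{Im} T^* = (\operatorname{Ker} T)^\perp = X^*.$$
According to the first part of this proof, $\operatorname{Ker} T^* = \{o\}$ and, again by (iii) and Proposition 2.1.27(iv),
$$\operatorname{Im} T = {}^\perp(\operatorname{Ker} T^*) = X.$$
(v) As in the proof of (iii), $X = Y \oplus Z$ and the corresponding projection $P$ of $X$ onto $Y$ is continuous. It can be shown that a direct complement $W$ of $\operatorname{Im} T$ in $X$ is isomorphic to $\operatorname{Ker} T^*$.21 This means that
$$\dim W = \dim \operatorname{Ker} T^* < \infty.$$
21 This is clear for $X$ being a Hilbert space, since $\operatorname{Im} T$ is closed and the orthogonal complement $(\operatorname{Im} T)^\perp$ is equal to $\operatorname{Ker} T^*$ (Proposition 2.1.27(iv)). In a general Banach space we can use the factor space $X|\operatorname{Im} T$, which is algebraically isomorphic to a direct complement $W$ of $\operatorname{Im} T$, and for $g \in (X|\operatorname{Im} T)^*$ put $f(x) = g([x])$. It remains to show that the correspondence $g \mapsto f$ is an (isometric) isomorphism onto $(\operatorname{Im} T)^\perp = \operatorname{Ker} T^*$.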


Denote
$$\dim \operatorname{Ker} T = n \qquad \text{and} \qquad \dim \operatorname{Ker} T^* = n^*.$$
We shall prove that $n = n^*$. Assume that
$$n > n^*.$$
In particular, this means that there is a surjective linear operator $\Phi \in L(Y, W)$. Such $\Phi$ cannot be injective (see Corollary 1.1.15), i.e., there is $x_0 \in Y$, $x_0 \ne o$, for which $\Phi(x_0) = o$. Put now
$$B := A + \Phi P.$$
Since $\Phi P \in \mathcal{C}(X)$ (it has finite rank), we have $B \in \mathcal{C}(X)$ and
$$Bx_0 = Ax_0 + o = x_0, \qquad \text{i.e.,} \qquad \operatorname{Ker}(I - B) \ne \{o\}.$$
By the Fredholm alternative (iv), $\operatorname{Im}(I - B) \ne X$. But
$$(I - B)(Z) = \operatorname{Im} T \qquad \text{and} \qquad (I - B)(Y) = -\Phi(Y) = W,$$
i.e.,
$$\operatorname{Im}(I - B) = \operatorname{Im} T + W = X,$$
a contradiction. This proves the inequality $n \le n^*$. By interchanging $T$ and $T^*$ we similarly obtain $n^* \le n$.22 ∎
Remark 2.2.10. The proof of the following statement is similar to that of Lemma 1.1.31(i).
If $A \in \mathcal{C}(X)$ and $1 \in \sigma(A)$, then there is $k \in \mathbb{N}$ such that
$$X = \operatorname{Ker}(I - A)^k \oplus \operatorname{Im}(I - A)^k.$$
Moreover, both the spaces on the right-hand side are $A$-invariant, and $\dim \operatorname{Ker}(I - A)^k < \infty$.23
Remark 2.2.11. Theorem 2.2.9 can be generalized to operators $A \in L(X)$ for which there is $k \in \mathbb{N}$ such that $A^k \in \mathcal{C}(X)$.
Another way of generalization is connected with perturbations of Fredholm operators. Notice that statement (v) of Theorem 2.2.9 says that $I - A$ is a Fredholm operator of index zero provided $A \in \mathcal{C}(X)$. The following theorem states the stability of the index.
Theorem 2.2.12. Let $X$, $Y$ be Banach spaces and let $A \in L(X,Y)$ be a Fredholm operator. Then
(i) if $B \in \mathcal{C}(X,Y)$, then $A + B$ is Fredholm and
$$\operatorname{ind} A = \operatorname{ind}(A + B); \qquad (2.2.3)$$
(ii) the set of Fredholm operators in $L(X,Y)$ is an open subset of $L(X,Y)$; furthermore, $\operatorname{ind}$ is a continuous function on this open set.
22 We recommend that the reader do this carefully to see that no reflexivity of $X$ is needed.
23 This dimension is called the multiplicity of the eigenvalue 1.


Proof. The proofs and further results can be found, e.g., in Kato [73, IV.5.].

Corollary 2.2.13. Let X be a complex Banach space and let A C (X). Then
(i) (A) \ {0} is a countable set of eigenvalues of nite multiplicity;
(ii) if dim X = , then 0 (A), and if is an accumulation point of (A),
then = 0.


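Proof. (i) If $\lambda \neq 0$, then $\lambda I - A = \lambda\left(I - \frac{A}{\lambda}\right)$ and Theorem 2.2.9 can be applied. In particular, if such $\lambda$ belongs to $\sigma(A)$, then $\lambda$ is an eigenvalue of finite multiplicity.
It remains to show that for any $r > 0$ the set $\sigma_r = \{\lambda \in \sigma(A) : |\lambda| > r\}$ is finite. Assume by way of contradiction that there is a sequence of mutually different points $\{\lambda_n\}_{n=1}^{\infty} \subset \sigma_r$ and let $x_n$ be the corresponding nonzero eigenvectors. Put
$$W_n = \operatorname{Lin}\{x_1, \dots, x_n\}.$$
It is easy to see by induction that $x_1, \dots, x_n$ are linearly independent. So we can find $y_{n+1} \in W_{n+1}$ such that
$$\|y_{n+1}\| = 1 \quad\text{and}\quad \operatorname{dist}(y_{n+1}, W_n) \geq \frac{1}{2}.$$
Now for $k > l$ we have
$$\|Ay_k - Ay_l\| = \left\|\lambda_k y_k - \left[(\lambda_k I - A)y_k - (\lambda_l I - A)y_l + \lambda_l y_l\right]\right\| \geq |\lambda_k| \operatorname{dist}(y_k, W_{k-1}) \geq \frac{r}{2},$$
since the expression in the square brackets belongs to $W_{k-1}$ (indeed, $(\lambda_k I - A)W_k \subset W_{k-1}$ and $y_l \in W_l \subset W_{k-1}$). Thus the bounded sequence $\{y_k\}$ is mapped onto a sequence $\{Ay_k\}$ without any convergent subsequence, and this contradicts the compactness of $A$.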


(ii) The statement on accumulation points follows immediately from the proof of (i). To see that $0$ is a point of $\sigma(A)$ provided $\dim X = \infty$, it is sufficient to realize that $\sigma(A)$ cannot be a finite set of nonzero numbers $\lambda_1, \dots, \lambda_n$. Indeed, with help of Remark 2.2.10 we get
$$X = \operatorname{Ker}(\lambda_1 I - A)^{k_1} \oplus \cdots \oplus \operatorname{Ker}(\lambda_n I - A)^{k_n} \oplus V \tag{2.2.4}$$
where $V$ is a nontrivial closed $A$-invariant subspace of $X$. Therefore the spectrum $\sigma(A|_V)$ of the restriction $A|_V$ of $A$ to $V$ is a subset of $\sigma(A)$. Since $\sigma(A|_V) \neq \emptyset$ (see the discussion following Example 2.1.20), we have
$$\{\lambda_1, \dots, \lambda_n\} \neq \sigma(A).$$

Example 2.2.14. Consider
$$Ax(t) := \int_0^t x(s)\,ds \quad\text{on the space } L^2(0,1).$$
This is a special case of the operators examined in Example 2.2.5(ii) with
$$k(t,s) = \begin{cases} 1 & \text{for } 0 \le s \le t \le 1,\\ 0 & \text{for } 0 \le t < s \le 1.\end{cases}$$
Therefore $A \in \mathcal{C}(L^2(0,1))$.


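If $\lambda \neq 0$ were an eigenvalue of $A$ with an eigenfunction $x$, then
$$x(t) = \frac{1}{\lambda}\int_0^t x(s)\,ds,$$
i.e., $x$ is absolutely continuous and
$$\dot{x} = \frac{1}{\lambda}x, \qquad x(0) = 0.$$
This implies that $x = o$ in $[0,1]$. Since $\sigma(A)$ cannot be empty, $\sigma(A) = \{0\}$, and $0$ is not an eigenvalue of $A$ (if $Ax = o$, then $\int_0^t x(s)\,ds = 0$ for every $t \in [0,1]$, hence $x = o$).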
We notice that the same statement (with a more complicated proof) is valid for any Volterra integral operator
$$Ax(t) = \int_0^t k(t-s)x(s)\,ds, \qquad x \in L^2(0,1),$$
provided, e.g., $k \in L^2(0,1)$. See also Example 2.3.7.
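The equality $\sigma(A) = \{0\}$ for the Volterra operator can be observed numerically through the spectral radius formula $r(A) = \lim_{n\to\infty}\|A^n\|^{1/n}$: the quantities $\|A^n\|^{1/n}$ should decrease towards $0$. The following is only a sketch under our own assumptions: $A$ is replaced by its trapezoidal discretization on a uniform grid of $[0,1]$ (a choice not made in the text), and the matrix 2-norm plays the role of the $L^2$ operator norm. The discrete matrix has its own tiny spectrum, so what the sketch shows is the rapid decay of $\|A^n\|^{1/n}$, which reflects the factorial decay of the iterated kernels.

```python
import numpy as np

def volterra_matrix(n_grid: int) -> np.ndarray:
    """Trapezoidal discretization of (Ax)(t) = int_0^t x(s) ds on a uniform grid of [0, 1]."""
    h = 1.0 / (n_grid - 1)
    A = np.zeros((n_grid, n_grid))
    for i in range(1, n_grid):
        A[i, : i + 1] = h      # interior trapezoidal weights on [0, t_i]
        A[i, 0] = h / 2        # end-point weights
        A[i, i] = h / 2
    return A

def power_norm_roots(A: np.ndarray, n_max: int) -> list:
    """Return ||A^n||_2 ** (1/n) for n = 1, ..., n_max (spectral norm)."""
    roots = []
    P = np.eye(A.shape[0])
    for n in range(1, n_max + 1):
        P = P @ A
        roots.append(np.linalg.norm(P, 2) ** (1.0 / n))
    return roots

roots = power_norm_roots(volterra_matrix(201), 20)
print(roots[0], roots[9], roots[-1])   # the roots decrease towards 0
```

The first value approximates $\|A\|$ on $L^2(0,1)$; the later ones shrink roughly like $(n!)^{-1/n}$.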

Corollary 2.2.13 can be significantly strengthened in the case that $X$ is a Hilbert space and $A$ is a compact, self-adjoint operator. To see this we need some technicalities.
Proposition 2.2.15. Let $H$ be a Hilbert space and $A$ a self-adjoint continuous operator on $H$. Then
(i) $\|A\| = \sup_{\|x\|=1}|(Ax, x)|$;
(ii) $m := \inf_{\|x\|=1}(Ax, x)$ and $M := \sup_{\|x\|=1}(Ax, x)$ belong to the spectrum of $A$;
(iii) $\|A\| = \sup\{|\lambda| : \lambda \in \sigma(A)\}$;
(iv) $\sigma(A) \subset \mathbb{R}$;
(v) if $Ax = \lambda x$, $Ay = \mu y$, $\lambda \neq \mu$, then $(x, y) = 0$.
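Proof. (i) Denote the right-hand side by $\gamma$. Obviously $\gamma \le \|A\|$. To prove the converse inequality take
$$o \neq x \in H, \qquad y = Ax.$$
Then for any $t > 0$, using (1.2.14), we have
$$\|Ax\|^2 = \frac{1}{t}(A(tx), Ax) = \frac{1}{t}(A(tx), y) = \frac{1}{4}\left[\left(A\left(tx + \tfrac{1}{t}y\right), tx + \tfrac{1}{t}y\right) - \left(A\left(tx - \tfrac{1}{t}y\right), tx - \tfrac{1}{t}y\right)\right]$$
$$\le \frac{\gamma}{4}\left[\left\|tx + \tfrac{1}{t}y\right\|^2 + \left\|tx - \tfrac{1}{t}y\right\|^2\right] = \frac{\gamma}{2}\left[t^2\|x\|^2 + \frac{1}{t^2}\|y\|^2\right].$$
Now we choose $t$ such that
$$t^2\|x\|^2 + \frac{1}{t^2}\|y\|^2 = 2\|x\|\|y\|, \quad\text{i.e.,}\quad \left(t\|x\| - \frac{1}{t}\|y\|\right)^2 = 0.$$
Hence
$$t = \left(\frac{\|y\|}{\|x\|}\right)^{1/2} \quad\text{and}\quad \|Ax\|^2 \le \gamma\|x\|\|y\|$$
follows (if $y = Ax = o$ there is nothing to prove, otherwise $t > 0$). Since $\|y\| = \|Ax\|$, we obtain $\|Ax\| \le \gamma\|x\|$, i.e., $\|A\| \le \gamma$.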

(ii) By taking $A + \|A\|I$ instead of $A$, we can assume that $0 \le m \le M = \|A\|$ (the last equality follows from (i)). Let $\{x_n\}_{n=1}^{\infty}$ be a sequence such that
$$\|x_n\| = 1 \quad\text{and}\quad \lim_{n\to\infty}(Ax_n, x_n) = M.$$
Then
$$\limsup_{n\to\infty}\|Ax_n - Mx_n\|^2 = \limsup_{n\to\infty}\left[(Ax_n, Ax_n) - 2M(Ax_n, x_n) + M^2\right] \le \limsup_{n\to\infty}\left[2M^2 - 2M(Ax_n, x_n)\right] = 0.$$
If $M \in \varrho(A)$, then there is a constant $c > 0$ such that
$$\|Ax - Mx\| \ge c\|x\| \quad\text{for all } x \in H.$$
The previous calculation shows that this cannot be true.
The assertion on $m$ is obtained by replacing $A$ by $-A$.
(iii) This is a consequence of (i) and (ii) and Corollary 2.1.3.
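(iv) Let $\lambda = \alpha + i\beta$, $\beta \neq 0$. A simple calculation (the cross term $2\operatorname{Re}(\alpha x - Ax, i\beta x)$ vanishes because $(\alpha x - Ax, x)$ is real) yields that
$$\|\lambda x - Ax\|^2 \ge |\beta|^2\|x\|^2 \quad\text{for every } x \in H.$$
This inequality shows that both
$$\lambda I - A \quad\text{and}\quad \bar{\lambda} I - A = (\lambda I - A)^*$$
are injective and $\operatorname{Im}(\lambda I - A)$ is closed. By Proposition 2.1.27(iv) and Corollary 1.2.35,
$$\operatorname{Im}(\lambda I - A) = [\operatorname{Ker}(\lambda I - A)^*]^\perp = [\operatorname{Ker}(\bar{\lambda} I - A)]^\perp = H.$$
Therefore $\lambda \in \varrho(A)$.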
(v) We have
$$\lambda(x, y) = (Ax, y) = (x, Ay) = \bar{\mu}(x, y) = \mu(x, y)$$
(by (iv), $\mu \in \mathbb{R}$). Since $\lambda \neq \mu$, we conclude that $(x, y) = 0$.
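In finite dimensions every real symmetric matrix is a self-adjoint continuous operator, so the assertions of Proposition 2.2.15 can be checked numerically. The sketch below uses an arbitrarily chosen random symmetric $6\times 6$ matrix (an assumption made only for illustration). For such a matrix $m$ and $M$ are its smallest and largest eigenvalues; the sketch compares $\|A\|$ with them, samples the quadratic form $(Ax,x)$ on unit vectors, and uses a general (non-symmetric) eigensolver so that the reality of the spectrum and the orthogonality of eigenvectors are genuinely tested rather than built in.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2                         # real symmetric, i.e. self-adjoint

eigvals = np.linalg.eigvalsh(A)           # ascending real eigenvalues
m, M = eigvals[0], eigvals[-1]            # inf and sup of (Ax, x) over unit vectors

# (i) and (iii): ||A|| = max(|m|, |M|) = sup of |lambda| over the spectrum
op_norm = np.linalg.norm(A, 2)
assert np.isclose(op_norm, max(abs(m), abs(M)))

# Sampled values of the quadratic form on unit vectors lie in [m, M]
X = rng.standard_normal((6, 1000))
X /= np.linalg.norm(X, axis=0)
rayleigh = np.einsum("ij,ij->j", X, A @ X)
assert rayleigh.min() >= m - 1e-12 and rayleigh.max() <= M + 1e-12

# (iv) and (v): a general eigensolver returns a real spectrum and orthonormal eigenvectors
w, V = np.linalg.eig(A)
assert np.allclose(w.imag, 0.0)
assert np.allclose(V.conj().T @ V, np.eye(6), atol=1e-8)
print(m, M, op_norm)
```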


Theorem 2.2.16 (Hilbert–Schmidt). Let $H$ be a separable Hilbert space and $A$ a self-adjoint compact operator. Then there exists an orthonormal basis $\{e_n\}_{n=1}^{\infty}$ where $e_n$ are the eigenvectors of $A$.
If
$$Ae_n = \lambda_n e_n \quad\text{and}\quad x = \sum_{n=1}^{\infty}(x, e_n)e_n,$$
then
$$Ax = \sum_{n=1}^{\infty}\lambda_n(x, e_n)e_n.$$

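Proof. Let $\{\lambda_n\}_{n=1}^{\infty}$ be the sequence of all nonzero and pairwise distinct eigenvalues of $A$. Choose an orthonormal basis $e_1^{(k)}, \dots, e_{n_k}^{(k)}$ of
$$N_k := \operatorname{Ker}(\lambda_k I - A).$$
Remember that $N_j \perp N_k$ for $j \neq k$ (Proposition 2.2.15(v)). Let us align the collection
$$\bigcup_k\left\{e_1^{(k)}, \dots, e_{n_k}^{(k)}\right\}$$
into a sequence $\{e_1, e_2, \dots\}$. This sequence is an orthonormal basis of
$$H_1 := \overline{\operatorname{Lin}}\{e_1, e_2, \dots\}.$$
If $H_1 = H$, the proof is complete. Assume therefore that $H \neq H_1$. The orthogonal complement $H_1^\perp$ is $A$-invariant. This means that the restriction
$$B := A|_{H_1^\perp}$$
is a compact self-adjoint operator on the Hilbert space $H_1^\perp$. By Corollary 2.2.13(i), a nonzero point of $\sigma(B)$ would be an eigenvalue of $B$, i.e., an eigenvalue of $A$ with an eigenvector in $H_1^\perp$; this is impossible, since all eigenvectors of $A$ corresponding to nonzero eigenvalues lie in $H_1$. As $\sigma(B) \neq \emptyset$, we have $\sigma(B) = \{0\}$ and, by Proposition 2.2.15(iii),
$$B = O \quad\text{on } H_1^\perp.$$
Hence $0$ is an eigenvalue of $B$ as well as of $A$. By adding an orthonormal basis of $H_1^\perp$ to $\{e_1, e_2, \dots\}$ we obtain an orthonormal basis of $H$.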
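A finite-dimensional analogue of Theorem 2.2.16 can be tried out by discretizing a symmetric kernel. The sketch below is built on assumptions of our own: the kernel $k(t,s) = e^{-|t-s|}$ on $[0,1]$ (symmetric and continuous, chosen only for illustration), a midpoint-rule discretization, and the Euclidean inner product in place of the $L^2$ one (they differ by the constant factor $h$). It checks the expansions $x = \sum (x,e_n)e_n$ and $Ax = \sum \lambda_n (x,e_n)e_n$ and shows that the eigenvalues decay towards $0$, as compactness requires.

```python
import numpy as np

n = 400
t = (np.arange(n) + 0.5) / n                        # midpoints of [0, 1]
h = 1.0 / n
K = h * np.exp(-np.abs(t[:, None] - t[None, :]))    # symmetric matrix approximating A

lam, E = np.linalg.eigh(K)                          # K = E diag(lam) E^T, orthonormal columns
order = np.argsort(-np.abs(lam))                    # sort by decreasing |lambda_n|
lam, E = lam[order], E[:, order]

x = np.sin(3 * t) + t**2                            # an arbitrary test vector
coeffs = E.T @ x                                    # coefficients (x, e_n)
Ax_expansion = E @ (lam * coeffs)                   # sum_n lambda_n (x, e_n) e_n

assert np.allclose(E @ coeffs, x)                   # x = sum_n (x, e_n) e_n
assert np.allclose(K @ x, Ax_expansion)             # Ax = sum_n lambda_n (x, e_n) e_n
print(lam[:5])                                      # leading eigenvalues, decaying towards 0
```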
Example 2.2.17.$^{24}$ We have found that the inverse operator to
$$Ax = -\frac{d}{dt}\left(p\dot{x}\right) + qx,^{25} \qquad x \in \operatorname{Dom} A = \{x \in W^{2,2}(a,b) : x(a) = x(b) = 0\},$$
exists provided $p, q \in C[a,b]$ and $p, q > 0$ on $[a,b]$. Moreover, $A^{-1}$ is an integral operator
$$A^{-1}f(t) = \int_a^b G(t,s)f(s)\,ds$$
where $G$ is the Green function of the differential expression. From the construction of $G$ it follows that $G \in C([a,b]\times[a,b])$, in particular, $G \in L^2((a,b)\times(a,b))$, and $G$ is a real symmetric function ($G(t,s) = G(s,t)$), see, e.g., Walter [131].
24 A continuation of Example 2.1.32.
25 This operator is called a Sturm–Liouville operator.
By Example 2.2.5(ii), $A^{-1}$ is a compact, self-adjoint$^{26}$ operator in the real space $L^2(a,b)$ and Theorem 2.2.16 can be applied to obtain an orthonormal basis of $L^2(a,b)$ formed by the eigenfunctions $\{e_n\}_{n=1}^{\infty}$ of $A^{-1}$, i.e., by the eigenfunctions of $A$. Since
$$(Ax, x)_{L^2(a,b)} = \int_a^b\left[p(t)\dot{x}^2(t) + q(t)|x(t)|^2\right]dt > 0 \quad\text{for all } x \in \operatorname{Dom} A,\ x \neq o,$$
all eigenvalues are positive. If $\lambda$ is an eigenvalue of $A$ (equivalently, $\lambda^{-1}$ is an eigenvalue of $A^{-1}$), then
$$\dim \operatorname{Ker}(\lambda I - A) = 1$$
since the equation
$$-\frac{d}{dt}\left(p\dot{x}\right) + (q - \lambda)x = 0$$
cannot have two linearly independent solutions satisfying the initial condition $x(a) = 0$. Let the eigenvalues $\lambda_n$ of $A$ be arranged into a sequence so that
$$0 < \lambda_1 < \lambda_2 < \cdots.$$
From the properties of compact operators (Corollary 2.2.13) it follows that $\lambda_n \to \infty$. It is sometimes important to know how quickly $\lambda_n$ tends to infinity. A simple estimate can be obtained with help of the quantity $n(A^{-1})$ (Example 2.2.5(ii)), namely
$$\sum_{n=1}^{\infty}\frac{1}{\lambda_n^2} = n^2(A^{-1}) < \infty.$$
However, this result is far from being optimal. We remark here that a variational approach to an eigenvalue problem for compact, self-adjoint operators will be briefly described in Section 6.3.
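The identity $\sum 1/\lambda_n^2 = n^2(A^{-1})$ can be checked in the simplest case $(a,b) = (0,1)$, $p \equiv 1$, $q \equiv 0$. Note that $q \equiv 0$ lies outside the standing assumption $q > 0$, but $-\ddot{x}$ with Dirichlet conditions is still invertible. For this case we use the standard facts (supplied by us, not by the text) that $\lambda_n = n^2\pi^2$ and $G(t,s) = \min(t,s) - ts$, and we assume, as the notation of Example 2.2.5(ii) suggests, that $n(A^{-1})$ is the $L^2$ norm of the kernel $G$. The sketch compares a midpoint-rule value of $\iint G^2$ with a partial sum of the series; both should be close to $1/90$.

```python
import numpy as np

def green(t: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Green function of -x'' on (0, 1) with x(0) = x(1) = 0."""
    return np.minimum(t, s) - t * s

# Squared L^2 norm of the kernel G over the unit square (midpoint rule)
n = 800
mid = (np.arange(n) + 0.5) / n
T, S = np.meshgrid(mid, mid, indexing="ij")
hs_norm_sq = np.sum(green(T, S) ** 2) / n**2

# Partial sum of 1 / lambda_n^2 with lambda_n = (n pi)^2
k = np.arange(1, 10001)
series = np.sum(1.0 / (k * np.pi) ** 4)

print(hs_norm_sq, series, 1 / 90)
```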
Consider now the equation
$$Ax = \lambda x + f \tag{2.2.5}$$
or, equivalently (cf. Exercise 2.2.23),
$$\sum_{n=1}^{\infty}(\lambda_n - \lambda)(x, e_n)e_n = \sum_{n=1}^{\infty}(f, e_n)e_n, \quad\text{i.e.,}\quad (\lambda_n - \lambda)(x, e_n) = (f, e_n) \quad\text{for } n \in \mathbb{N}.$$
If $\lambda$ is not an eigenvalue of $A$, then $\inf_n|\lambda_n - \lambda| > 0$ (since $\lambda_n \to \infty$) and
$$x = \sum_{n=1}^{\infty}\frac{(f, e_n)}{\lambda_n - \lambda}e_n$$
is a unique solution of (2.2.5). (Notice that this series is convergent, since $\sum_{n=1}^{\infty}|(f, e_n)|^2 < \infty$.) If $\lambda = \lambda_n$, then the condition
$$(f, e_n) = 0$$
is a necessary and sufficient condition for solvability of (2.2.5) (see also Example 2.1.31).
26 We restrict our attention to a special differential operator $A$ in contrast to the general operator from Example 2.1.32 in order to get a self-adjoint inverse $A^{-1}$.
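The solution formula $x = \sum (f,e_n)(\lambda_n - \lambda)^{-1}e_n$ can be tested on a case with a known closed form. As assumptions for this sketch we take the interval $(0,\pi)$, $p \equiv 1$, $q \equiv 0$ (again outside the standing assumption $q > 0$), so that $e_n(t) = \sqrt{2/\pi}\,\sin nt$ and $\lambda_n = n^2$, together with $f \equiv 1$. Then $(f,e_n) = \sqrt{2/\pi}\cdot 2/n$ for odd $n$ and $0$ for even $n$, while the problem $-\ddot{x} - \lambda x = 1$, $x(0) = x(\pi) = 0$, has for $0 < \lambda < 1$ the solution $x(t) = \left(\cos(\sqrt{\lambda}(t - \pi/2))/\cos(\sqrt{\lambda}\pi/2) - 1\right)/\lambda$.

```python
import numpy as np

def series_solution(t: np.ndarray, lam: float, n_terms: int = 4001) -> np.ndarray:
    """Partial sum of (f, e_n) / (lambda_n - lam) e_n(t) for f = 1, e_n = sqrt(2/pi) sin(n t)."""
    x = np.zeros_like(t)
    for n in range(1, n_terms + 1, 2):             # coefficients with even n vanish
        f_n = np.sqrt(2 / np.pi) * 2 / n            # (f, e_n) for odd n
        e_n = np.sqrt(2 / np.pi) * np.sin(n * t)
        x += f_n / (n**2 - lam) * e_n
    return x

def exact_solution(t: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form solution of -x'' - lam x = 1, x(0) = x(pi) = 0, for 0 < lam < 1."""
    k = np.sqrt(lam)
    return (np.cos(k * (t - np.pi / 2)) / np.cos(k * np.pi / 2) - 1) / lam

t = np.linspace(0, np.pi, 201)
lam = 0.5                                          # not an eigenvalue (these are 1, 4, 9, ...)
err = np.max(np.abs(series_solution(t, lam) - exact_solution(t, lam)))
print(err)   # small: the coefficients decay like n**-3
```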
If we examined singular differential operators, e.g., on the interval $[0, \infty)$, we would meet many difficulties arising, for example, from the fact that $A^{-1}$ is not compact and, therefore, its spectrum is more complicated. The interested reader can consult the book Dunford & Schwartz [45].
Remark 2.2.18. The Hilbert–Schmidt Theorem allows us to introduce a functional calculus for compact, self-adjoint operators similarly as it has been done for matrices in Theorem 1.1.38:
Let $A$ be a compact, self-adjoint operator on a Hilbert space $H$. Then there exists a unique mapping
$$\Phi : C(\sigma(A)) \to \mathcal{L}(H)^{27}$$
with the following properties:
(i) $\Phi$ is an algebra homomorphism;
(ii) $\Phi$ is a continuous mapping from $C(\sigma(A))$ into $\mathcal{L}(H)$ with the operator topology;
(iii) if $P(x) = \sum_{k=0}^{m} a_k x^k$, then $\Phi(P) = \sum_{k=0}^{m} a_k A^k$;
(iv) if $w \notin \sigma(A)$ and $f(x) = \frac{1}{w - x}$, then $\Phi(f) = (wI - A)^{-1}$;
(v) $\sigma(\Phi(f)) = f(\sigma(A))$ for every $f \in C(\sigma(A))$.
If $Ax = \sum_{n=1}^{\infty}\lambda_n(x, e_n)e_n$, then it is easy to verify properties (i)–(v) for
$$\Phi(f)x := \sum_{n=1}^{\infty} f(\lambda_n)(x, e_n)e_n.$$
We omit the proof of uniqueness.


It is worth mentioning that we can introduce a functional calculus for a linear operator $A$ which has a compact, self-adjoint resolvent $(\lambda_0 I - A)^{-1}$. We leave this easy construction to the interested reader. Example 2.2.17 shows a class of such operators.
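For a real symmetric matrix, that is, a self-adjoint operator with a finite spectrum, the formula $\Phi(f)x = \sum f(\lambda_n)(x,e_n)e_n$ becomes $\Phi(f) = E\,\operatorname{diag}(f(\lambda_n))\,E^{T}$ with orthonormal eigenvectors in the columns of $E$. The sketch below uses an arbitrarily chosen random symmetric matrix (an assumption for illustration) and checks property (iii) for a polynomial, property (iv) for $f(x) = 1/(w - x)$ with $w$ outside the spectrum, and the multiplicativity in (i) for the exponential.

```python
import numpy as np

def phi(f, A: np.ndarray) -> np.ndarray:
    """Phi(f) = sum_n f(lambda_n) (., e_n) e_n for a real symmetric matrix A."""
    lam, E = np.linalg.eigh(A)
    return E @ np.diag(f(lam)) @ E.T

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
I = np.eye(5)

def P(x):
    """The polynomial P(x) = 2 - x + 3 x^3."""
    return 2 - x + 3 * x**3

# (iii): Phi(P) = 2 I - A + 3 A^3
assert np.allclose(phi(P, A), 2 * I - A + 3 * A @ A @ A)

# (iv): for w outside sigma(A) and f(x) = 1 / (w - x), Phi(f) is the resolvent (w I - A)^{-1}
w = np.abs(np.linalg.eigvalsh(A)).max() + 1.0
assert np.allclose(phi(lambda x: 1 / (w - x), A), np.linalg.inv(w * I - A))

# (i): Phi is multiplicative, e.g. Phi(exp) Phi(exp) = Phi(exp(2 .))
assert np.allclose(phi(np.exp, A) @ phi(np.exp, A), phi(lambda x: np.exp(2 * x), A))
```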
Exercise 2.2.19. Suppose that $A \in \mathcal{L}(X, Y)$ maps weakly convergent sequences into strongly convergent ones. Prove that $A$ is compact provided $X$ is reflexive.
Exercise 2.2.20. Prove the assertion from Remark 2.2.10 and the decomposition (2.2.4).
27 If $\sigma(A) = \{0\} \cup \{\lambda_n\}_{n=1}^{\infty}$, then $f \in C(\sigma(A))$ if and only if $\lim_{n\to\infty} f(\lambda_n) = f(0)$.

Exercise 2.2.21. Consider a special case of the Sturm–Liouville operator
$$Ax = -\ddot{x}$$
in the space $L^2(0, \pi)$ with the boundary conditions
(i) $x(0) = x(\pi) = 0$ (Dirichlet boundary conditions),
(ii) $\dot{x}(0) = \dot{x}(\pi) = 0$ (Neumann boundary conditions),
(iii) $\alpha_0 x(0) + \beta_0\dot{x}(0) = 0$, $\alpha_1 x(\pi) + \beta_1\dot{x}(\pi) = 0$ (mixed or Newton–Robin boundary conditions),
(iv) $x(0) = x(\pi)$, $\dot{x}(0) = \dot{x}(\pi)$ (periodic conditions).
Find Green functions, eigenvalues and eigenfunctions. What follows from the Hilbert–Schmidt Theorem? Compare this result with that of Example 1.2.38(i).
Exercise 2.2.22. Define $e^{-tA}$ for the operator $A$ from Exercise 2.2.21 (see Remark 2.2.18). Take $x \in \operatorname{Dom} A$ and show that the function
$$u(t, \xi) := \left(e^{-tA}x\right)(\xi), \qquad t \ge 0,$$
is a solution to the heat equation
$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial \xi^2}$$
satisfying the initial condition $u(0, \xi) = x(\xi)$ and the boundary conditions given by $u(t, \cdot) \in \operatorname{Dom} A$. Do not forget to define the notion of a solution.
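Exercise 2.2.23. Let $A$ be as in Example 2.2.17. Prove that
$$\operatorname{Dom} A = \left\{x = \sum_{n=1}^{\infty}(x, e_n)e_n : \sum_{n=1}^{\infty}|\lambda_n|^2|(x, e_n)|^2 < \infty\right\}$$
and
$$Ax = \sum_{n=1}^{\infty}\lambda_n(x, e_n)e_n.$$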

2.3 Contraction Principle


The previous four sections have been devoted to some basic facts in the linear theory. It is now time to start with nonlinear problems, especially with the solution of the nonlinear equation
$$f(x) = a \quad\text{for}\quad f : X \to X. \tag{2.3.1}$$
The basic assertions in this section are fixed point theorems for contractive and non-expansive mappings. If $X$ is a linear space, (2.3.1) is equivalent to the equation
$$F(x) := a - f(x) + x = x.$$


The solution of this equation is called a fixed point of $F$. In the case that
$$f(x) = x - Ax \qquad (F(x) = Ax + a)$$
where $A \in \mathcal{L}(X)$, we succeeded in solving this equation in Section 2.1 (cf. Proposition 2.1.2) by applying the iteration process
$$x_0 = a, \qquad x_n = a + Ax_{n-1} \qquad\text{provided } \|A\| < 1.$$
This idea can be easily generalized to the following result which is often attributed to S. Banach.
Theorem 2.3.1 (Contraction Principle). Let $M$ be a complete metric space and let $F : M \to M$ be a contraction, i.e., there is $q \in [0, 1)$ such that
$$\rho(F(x), F(y)) \le q\,\rho(x, y) \quad\text{for every } x, y \in M.$$
Then there exists a unique fixed point $\tilde{x}$ of $F$ in $M$. Moreover, if
$$x_0 \in M, \qquad x_n = F(x_{n-1}),$$
then the sequence $\{x_n\}_{n=1}^{\infty}$ converges to $\tilde{x}$ and the estimates
$$\rho(x_n, \tilde{x}) \le \frac{q^n}{1-q}\rho(x_1, x_0) \quad\text{(a priori estimate)}, \tag{2.3.2}$$
$$\rho(x_n, \tilde{x}) \le \frac{q}{1-q}\rho(x_n, x_{n-1}) \quad\text{(a posteriori estimate)} \tag{2.3.3}$$
hold.

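Proof. We prove that $\{x_n\}_{n=1}^{\infty}$ is a Cauchy sequence. Indeed, for $m > n$ we have
$$\begin{aligned}
\rho(x_m, x_n) &\le \rho(x_m, x_{m-1}) + \cdots + \rho(x_{n+1}, x_n)\\
&= \rho(F(x_{m-1}), F(x_{m-2})) + \cdots + \rho(F(x_n), F(x_{n-1}))\\
&\le q\left[\rho(x_{m-1}, x_{m-2}) + \cdots + \rho(x_n, x_{n-1})\right]\\
&\le (q^{m-1} + \cdots + q^n)\,\rho(x_1, x_0) \le \frac{q^n}{1-q}\rho(x_1, x_0).
\end{aligned}$$
Since $q < 1$, the right-hand side is arbitrarily small for sufficiently large $n$. The Cauchy sequence $\{x_n\}_{n=1}^{\infty}$ has a limit $\tilde{x}$ in the complete space $M$, and for this limit the estimate (2.3.2) holds (let $m \to \infty$ in the above inequality). Being a contraction, $F$ is a continuous mapping, and therefore
$$\tilde{x} = \lim_{n\to\infty} x_n = \lim_{n\to\infty} F(x_{n-1}) = F\left(\lim_{n\to\infty} x_{n-1}\right) = F(\tilde{x}).$$
Uniqueness of a fixed point is even easier: If $\tilde{x} = F(\tilde{x})$, $\tilde{y} = F(\tilde{y})$, then
$$\rho(\tilde{x}, \tilde{y}) = \rho(F(\tilde{x}), F(\tilde{y})) \le q\,\rho(\tilde{x}, \tilde{y}),$$
i.e., $\rho(\tilde{x}, \tilde{y}) = 0$ ($q < 1$).
The a posteriori estimate also follows from the above estimate of $\rho(x_m, x_n)$.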
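The iteration and both estimates can be tried on a simple example. As an assumption for this sketch, take $M = [0,1]$ with the usual metric and $F(x) = \cos x$. It maps $[0,1]$ into $[\cos 1, 1] \subset [0,1]$, and by the mean value theorem it is a contraction with $q = \sin 1 < 1$. The sketch checks that at each step the actual error does not exceed the a priori bound (2.3.2) or the a posteriori bound (2.3.3); the reference value of the fixed point is simply obtained by many more iterations.

```python
import math

def banach_iteration(F, x0: float, n_steps: int) -> list:
    """Return [x_0, x_1, ..., x_n] with x_k = F(x_{k-1})."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(F(xs[-1]))
    return xs

q = math.sin(1.0)                                   # contraction constant of cos on [0, 1]
xs = banach_iteration(math.cos, 0.0, 30)
x_star = banach_iteration(math.cos, 0.0, 300)[-1]   # reference value of the fixed point

for n in range(1, len(xs)):
    error = abs(xs[n] - x_star)
    a_priori = q**n / (1 - q) * abs(xs[1] - xs[0])        # bound (2.3.2)
    a_posteriori = q / (1 - q) * abs(xs[n] - xs[n - 1])   # bound (2.3.3)
    assert error <= a_priori + 1e-12 and error <= a_posteriori + 1e-12

print(x_star)   # approximately 0.739085
```

The a posteriori bound uses only computed iterates, which makes it the natural stopping criterion in practice.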


The fixed point of $F$ whose existence has just been established often depends on a parameter. The following result is useful in investigating this dependence.
Corollary 2.3.2. Let $M$ be a complete metric space and $A$ a topological space. Assume that $F : A \times M \to M$ possesses the following properties:
(i) There is $q \in [0, 1)$ such that
$$\rho(F(a, x), F(a, y)) \le q\,\rho(x, y) \quad\text{for all } a \in A \text{ and } x, y \in M.$$
(ii) For every $x \in M$ the mapping $a \mapsto F(a, x)$ is continuous on $A$.
Then for each $a \in A$ there is a unique $\varphi(a) := \tilde{x}$ such that
$$F(a, \tilde{x}) = \tilde{x}.$$
Moreover, $\varphi$ is continuous on $A$.
Proof. The existence of $\varphi$ follows directly from Theorem 2.3.1. The estimates
$$\begin{aligned}
\rho(\varphi(a), \varphi(b)) &= \rho(F(a, \varphi(a)), F(b, \varphi(b)))\\
&\le \rho(F(a, \varphi(a)), F(b, \varphi(a))) + \rho(F(b, \varphi(a)), F(b, \varphi(b)))\\
&\le \rho(F(a, \varphi(a)), F(b, \varphi(a))) + q\,\rho(\varphi(a), \varphi(b))
\end{aligned}$$
yield
$$\rho(\varphi(a), \varphi(b)) \le \frac{1}{1-q}\rho(F(a, \varphi(a)), F(b, \varphi(a))),$$
and the continuity of $\varphi$ follows.

Remark 2.3.3. Notice that $\varphi$ is Lipschitz continuous provided $a \mapsto F(a, x)$ is Lipschitz continuous uniformly with respect to $x$ (and, of course, $A$ is a metric space).
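A one-dimensional illustration of Corollary 2.3.2 and Remark 2.3.3, with a choice of ours: $A = M = \mathbb{R}$ and $F(a,x) = a + \tfrac12\sin x$. This $F$ is a contraction in $x$ with $q = \tfrac12$ and is 1-Lipschitz in $a$ uniformly in $x$, so the estimate in the proof gives $|\varphi(a) - \varphi(b)| \le |a - b|/(1-q) = 2|a-b|$. The sketch computes $\varphi$ by iteration on a grid of parameters and checks this bound.

```python
import math

def phi(a: float, tol: float = 1e-14, max_iter: int = 1000) -> float:
    """The unique solution of x = a + 0.5 * sin(x), computed by the iteration of Theorem 2.3.1."""
    x = 0.0
    for _ in range(max_iter):
        x_new = a + 0.5 * math.sin(x)
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

q = 0.5
params = [k / 10 for k in range(-30, 31)]
values = [phi(a) for a in params]
for a, b, pa, pb in zip(params, params[1:], values, values[1:]):
    # Lipschitz bound from the proof of Corollary 2.3.2
    assert abs(pa - pb) <= abs(a - b) / (1 - q) + 1e-12
```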
There is an enormous number of applications of the Contraction Principle. The proof of the existence theorem for the initial value problem for ordinary differential equations is one of the standard applications. However, the historical development went in the opposite direction. The following theorem had been proved (by iteration) about thirty years before the Contraction Principle was formulated in its full generality. Another application will be given in Section 4.1.
Theorem 2.3.4 (Picard). Let $G$ be an open set in $\mathbb{R} \times \mathbb{R}^N$ and let $f : G \to \mathbb{R}^N$, $f = f(t, x_1, \dots, x_N)$, be continuous and locally Lipschitz continuous with respect to the $x$-variables, i.e., for every $(s, y) \in G$ there exist $\delta > 0$, $\Delta > 0$, $L > 0$ such that
$$\|f(t, x^1) - f(t, x^2)\| \le L\|x^1 - x^2\| \quad\text{whenever } |t - s| < \delta,\ \|x^i - y\| < \Delta,\ i = 1, 2.$$
Then for any $(t_0, \xi_0) \in G$ there exists $\delta > 0$ such that the equation
$$\dot{x} = f(t, x) \tag{2.3.4}$$
has a unique solution on the interval $(t_0 - \delta, t_0 + \delta)$ satisfying the initial condition
$$x(t_0) = \xi_0. \tag{2.3.5}$$

Proof. First we rewrite the initial value problem (2.3.4), (2.3.5) into an equivalent fixed point problem for an integral operator $F$ defined by$^{28}$
$$F(x) : t \mapsto \xi_0 + \int_{t_0}^{t} f(s, x(s))\,ds, \qquad t \in (t_0 - \delta, t_0 + \delta). \tag{2.3.6}$$
This equivalence is easy to establish (by integration and by differentiation with respect to $t$). Therefore we wish to solve the equation
$$F(x) = x$$
in a complete metric space $M$. We choose $M$ to be a closed subset of the Banach space $C[t_0 - \delta, t_0 + \delta]$ for a certain small $\delta > 0$.
We need two properties of $F$ and $M$, namely that $F$ maps $M$ into $M$ and $F$ is a contraction on $M$. Choose first $\delta_1$, $\Delta_1$ such that
$$R_1 := [t_0 - \delta_1, t_0 + \delta_1] \times \{x \in \mathbb{R}^N : \|x - \xi_0\| \le \Delta_1\} \subset G.$$
This set $R_1$ is compact, and therefore $f$ is bounded and uniformly Lipschitz continuous on it, i.e., there are constants $K$, $L$ such that
$$\|f(s, x)\| \le K, \qquad \|f(s, x) - f(s, y)\| \le L\|x - y\| \qquad\text{for } (s, x), (s, y) \in R_1.$$
Put
$$M = \{x \in C[t_0 - \delta, t_0 + \delta] : \|x(t) - \xi_0\| \le \Delta_1 \ \ \forall t \in [t_0 - \delta, t_0 + \delta]\} \quad\text{for a } \delta \le \delta_1.$$
Then
$$\sup_{t \in I}\|F(x)(t) - \xi_0\| \le K\delta, \qquad \sup_{t \in I}\|F(x)(t) - F(y)(t)\| \le L\delta \sup_{t \in I}\|x(t) - y(t)\|$$
where
$$I := [t_0 - \delta, t_0 + \delta].$$
If we choose $\delta$ so small that $K\delta \le \Delta_1$ and $L\delta \le \frac{1}{2}$, then $F$ maps $M$ into itself (the first condition) and is a contraction with $q = \frac{1}{2}$ (the second condition). By the Contraction Principle, $F$ has a unique fixed point $\tilde{y}$ in $M$ and this is a solution of (2.3.4), (2.3.5) on the interval $(t_0 - \delta, t_0 + \delta)$. If $\tilde{x}$ is a solution of (2.3.4), (2.3.5) on the interval $(t_0 - \delta, t_0 + \delta)$, then $\tilde{x} \in M$ (prove it!), i.e., $\tilde{y} = \tilde{x}$, and the uniqueness follows.
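The proof is constructive: the iterates $x_n = F(x_{n-1})$ of the integral operator (2.3.6) converge uniformly on $I$. The sketch below rests on assumptions of our own. It uses the test problem $\dot{x} = x$, $x(0) = 1$ (exact solution $e^t$), it takes only the forward half $[t_0, t_0 + \delta]$ with $t_0 = 0$, $\delta = \tfrac12$ (here $L = 1$, so $L\delta \le \tfrac12$ as in the proof), and it replaces the integral by the cumulative trapezoidal rule on a uniform grid.

```python
import numpy as np

def cumulative_trapezoid(y: np.ndarray, h: float) -> np.ndarray:
    """Approximate t -> int_{t_0}^t y(s) ds on a uniform grid starting at t_0."""
    return np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) * h / 2)))

def picard(f, t: np.ndarray, xi0: float, n_iter: int) -> np.ndarray:
    """Iterate F(x)(t) = xi0 + int_{t_0}^t f(s, x(s)) ds, starting from the constant xi0."""
    h = t[1] - t[0]
    x = np.full_like(t, xi0)
    for _ in range(n_iter):
        x = xi0 + cumulative_trapezoid(f(t, x), h)
    return x

t = np.linspace(0.0, 0.5, 501)
x = picard(lambda s, x: x, t, 1.0, 20)
print(np.max(np.abs(x - np.exp(t))))   # small: iteration error plus quadrature error
```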



28 If $t < t_0$, then we define $\int_{t_0}^{t} f(s, x(s))\,ds = -\int_{t}^{t_0} f(s, x(s))\,ds$, and $\int_{t_0}^{t_0} f(s, x(s))\,ds = 0$.


Remark 2.3.5. The mapping $F$ defined by (2.3.6) actually depends not only on $x$ but also on $t_0$, $\xi_0$. By taking $\delta$ smaller we can prove that $F$ is also Lipschitz continuous with respect to the initial conditions, and Corollary 2.3.2 yields that the solution $x(t; t_0, \xi_0)$ of (2.3.4), (2.3.5) is also Lipschitz continuous with respect to the initial conditions.
Remark 2.3.6. If we apply Theorem 2.3.4 (i.e., the Contraction Principle) to a system of linear differential equations
$$\dot{x} = A(t)x + g(t)$$
with a continuous matrix $A$ and a continuous vector function $g$ on an interval $(a, b)$, we need an extra effort to prove that a solution exists on the whole interval $(a, b)$. Namely, Theorem 2.3.4 gives only local existence, and in the continuation process (take $(t_0 + \delta, x(t_0 + \delta))$ as a new initial condition) there is no a priori evidence that $\delta$ could not be smaller and smaller.$^{29}$ It is therefore sometimes more convenient not to refer to the Contraction Principle but to prove the convergence of iterations directly. The following example demonstrates this approach.
Example 2.3.7. Let $k$ be a bounded measurable function on the set
$$M = \{(s, t) \in \mathbb{R}^2 : 0 \le s \le t \le 1\}.$$
Then for any $f \in L^1(0,1)$ and $\lambda \neq 0$ there is a unique solution to the integral equation
$$x(t) - \lambda\int_0^t k(t, s)x(s)\,ds = f(t). \tag{2.3.7}$$
To prove this assertion, denote
$$Ax(t) = \int_0^t k(t, s)x(s)\,ds.$$
Then $A \in \mathcal{L}(L^1(0,1))$ (Example 2.1.28). Put
$$x_0 = f, \qquad x_n = f + \lambda Ax_{n-1}.$$
Due to the completeness of $L^1(0,1)$ the sequence $\{x_n\}_{n=1}^{\infty}$ is convergent in $L^1(0,1)$ if the sum $\sum_{n=1}^{\infty}\|x_n - x_{n-1}\|_{L^1(0,1)}$ is convergent. We have
$$x_n - x_{n-1} = \lambda^n A^n f \quad\text{and}\quad A^n f(t) = \int_0^t k_n(t, s)f(s)\,ds$$
where
$$k_1 = k \quad\text{and}\quad k_n(t, s) = \int_s^t k_{n-1}(t, \tau)k(\tau, s)\,d\tau$$
(check this relation).


29 This situation does not occur for the equation $\dot{x} = f(t, x)$ provided, e.g., that $G = \mathbb{R}^{N+1}$ and there exists $L > 0$ such that for all $(t, x), (t, y) \in \mathbb{R}^{N+1}$ the inequality $\|f(t, x) - f(t, y)\| \le L\|x - y\|$ holds.


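It is easy to prove by induction that
$$|k_n(t, s)| \le \|k\|^n_{L^\infty(M)}\frac{(t - s)^{n-1}}{(n-1)!}, \qquad (t, s) \in M,$$
and hence
$$\|x_n - x_{n-1}\|_{L^1(0,1)} = \int_0^1\left|\lambda^n\int_0^t k_n(t, s)f(s)\,ds\right|dt \le |\lambda|^n\int_0^1|f(s)|\int_s^1|k_n(t, s)|\,dt\,ds \le \frac{|\lambda|^n\|k\|^n_{L^\infty(M)}}{n!}\|f\|_{L^1(0,1)}.$$
Since the series
$$\sum_{n=1}^{\infty}\frac{a^n}{n!}$$
is convergent for any $a \in \mathbb{R}$, the limit $\lim_{n\to\infty} x_n = \tilde{x} \in L^1(0,1)$ exists and $\tilde{x}$ is a solution to (2.3.7). In fact $\tilde{x}$ is a unique solution (see Exercise 2.3.18). Moreover, $\tilde{x}$ depends continuously on $f$, which means that $\sigma(A) = \{0\}$.$^{30}$ This result holds also for $k \in C(M)$ in the space $C[0,1]$. The proof is the same.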
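The iteration of Example 2.3.7 converges for every $\lambda$, not only when $|\lambda|\,\|A\| < 1$. As an illustration under assumptions of our own, take $k \equiv 1$ and $f \equiv 1$, so that (2.3.7) reads $x(t) - \lambda\int_0^t x(s)\,ds = 1$ with the solution $x(t) = e^{\lambda t}$. The sketch discretizes $A$ by the cumulative trapezoidal rule and runs $x_n = f + \lambda A x_{n-1}$ with $\lambda = 5$. Since $\|A\| = 1$ in $\mathcal{L}(L^1(0,1))$ for this kernel, we have $|\lambda|\,\|A\| = 5 > 1$ here.

```python
import numpy as np

def volterra_iteration(f: np.ndarray, t: np.ndarray, lam: float, n_iter: int) -> np.ndarray:
    """Iterate x_n = f + lam * A x_{n-1} with (A x)(t) = int_0^t x(s) ds (kernel k = 1)."""
    h = t[1] - t[0]
    x = f.copy()
    for _ in range(n_iter):
        Ax = np.concatenate(([0.0], np.cumsum((x[1:] + x[:-1]) * h / 2)))  # trapezoidal rule
        x = f + lam * Ax
    return x

t = np.linspace(0.0, 1.0, 2001)
lam = 5.0
x = volterra_iteration(np.ones_like(t), t, lam, 80)
rel_err = np.max(np.abs(x - np.exp(lam * t)) / np.exp(lam * t))
print(rel_err)   # small, although |lam| * ||A|| = 5 > 1
```

The factor $1/n!$ in the estimate above is what beats the growth of $|\lambda|^n$.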
Example 2.3.8. Find sufficient conditions for the existence of a classical solution (cf. Example 2.1.31) of the boundary value problem
$$\ddot{x}(t) = \lambda f(t, x(t)), \quad t \in (0, 1), \qquad x(0) = x(1) = 0. \tag{2.3.8}$$
Theorem 2.3.4 suggests the assumption that $f$ is continuous with respect to $t$ and Lipschitz continuous with respect to the $x$-variable on a certain rectangle $[0, 1] \times [-r, r]$. Denote a Lipschitz constant on this rectangle by $L(r)$. We wish to rewrite the problem (2.3.8) as a fixed point problem. To reach this goal suppose that we have a solution $y$ and denote $g(t) := f(t, y(t))$. Then $y$ solves also the equation
$$\ddot{y} = \lambda g$$
and satisfies $y(0) = y(1) = 0$.
It is easy to see that this problem has exactly one solution which is given by
$$y(t) = \lambda\int_0^1 G(t, s)g(s)\,ds := \lambda\int_0^t (t - 1)s\,g(s)\,ds + \lambda\int_t^1 t(s - 1)g(s)\,ds$$
($G$ is the Green function, see Example 2.1.32). Therefore, we are looking for a continuous function $x$ which solves the integral equation
$$x(t) = \lambda\int_0^1 G(t, s)f(s, x(s))\,ds. \tag{2.3.9}$$

30 Actually, we have proved that $\mathbb{C} \setminus \{0\} \subset \varrho(A)$, i.e., $\sigma(A) \subset \{0\}$. Since $A \in \mathcal{L}(L^1(0,1))$, $\sigma(A) \neq \emptyset$, we have $\sigma(A) = \{0\}$.


Denote
$$F(\lambda, x) := \lambda\int_0^1 G(\cdot, s)f(s, x(s))\,ds.$$
We can solve (2.3.9) by applying the Contraction Principle in
$$M := \{x \in C[0, 1] : \|x\| \le r\}$$
for an appropriate choice of $r$.
For $x \in M$ we have
$$|f(s, x(s))| \le |f(s, 0)| + |f(s, x(s)) - f(s, 0)| \le K + L(r)r$$
where $K > 0$ is a constant such that $|f(s, 0)| \le K$, $s \in [0, 1]$, and (since $\int_0^1|G(t, s)|\,ds = \frac{t(1-t)}{2} \le \frac{1}{8}$)
$$\|F(\lambda, x)\| \le \frac{|\lambda|}{8}(K + L(r)r).$$
This estimate shows that $F$ maps $M$ into itself if
$$q := \frac{|\lambda|}{8}L(r) < 1 \quad\text{and}\quad \frac{|\lambda|K}{8}\cdot\frac{1}{1-q} \le r.$$
Then $F$ is also a contraction on $M$ with the constant $q$. We can conclude that for a given $r$ there is $\lambda_0 > 0$ such that for $|\lambda| \le \lambda_0$ both the above conditions$^{31}$ are satisfied and (2.3.9) has a solution.
Now we have to show that a continuous solution $x$ of (2.3.9) is actually a classical solution of the boundary value problem (2.3.8). Since we know the explicit form of the Green function $G$, it is obvious that $x(0) = x(1) = 0$, and it is also easy to differentiate twice the right-hand side of (2.3.9) (taking into account that $x$ is continuous).
We remark that we have not used all properties of the integral operator with the kernel $G$. In particular, such an operator is compact (Example 2.2.5(i)) and this property has not been used. This property will be significant in Section 5.1.
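The iteration for (2.3.9) can be sketched on a grid. As a test case of our own choosing, take $f(t,x) = x + 1$ and $\lambda = 1$. Then $L(r) = 1$ and $q = \tfrac18$, and the problem $\ddot{x} = x + 1$, $x(0) = x(1) = 0$, has the closed-form solution $x(t) = \cosh(t - \tfrac12)/\cosh(\tfrac12) - 1$, which lets us check the result. Replacing the integral by the trapezoidal rule on a uniform grid is an assumption of the sketch, not part of the text.

```python
import numpy as np

def green(t: np.ndarray, s: np.ndarray) -> np.ndarray:
    """G(t, s) = (t - 1) s for s <= t and t (s - 1) for t <= s."""
    return np.where(s <= t, (t - 1) * s, t * (s - 1))

def solve_by_iteration(f, lam: float, n_grid: int = 501, n_iter: int = 40):
    """Iterate x_n = F(lam, x_{n-1}) = lam * int_0^1 G(., s) f(s, x_{n-1}(s)) ds on a grid."""
    t = np.linspace(0.0, 1.0, n_grid)
    w = np.full(n_grid, 1.0 / (n_grid - 1))        # trapezoidal weights
    w[[0, -1]] /= 2
    K = green(t[:, None], t[None, :]) * w[None, :]  # discretized integral operator
    x = np.zeros(n_grid)
    for _ in range(n_iter):
        x = lam * K @ f(t, x)
    return t, x

t, x = solve_by_iteration(lambda s, x: x + 1.0, lam=1.0)
exact = np.cosh(t - 0.5) / np.cosh(0.5) - 1.0
print(np.max(np.abs(x - exact)))
```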
The a posteriori estimate (2.3.3) shows that the convergence of iterations may be rather slow. It is sometimes desirable to have faster convergence at the expense of more restrictive assumptions. The classical Newton Method for solving an equation
$$f(x) = 0, \qquad f : \mathbb{R} \to \mathbb{R},$$
is illustrated in Figure 2.3.1.
In order to generalize this method we need the notion of a derivative of $f : X \to X$. This will be the main subject of the next chapter.
31 Notice that for a fixed $\lambda$ these conditions are antagonistic, namely the first requires small $r$ and the other large $r$. This situation is typical in applications of the Contraction Principle.


Figure 2.3.1. Newton's method: the tangent $y = f'(x_n)(x - x_n) + f(x_n)$ to the graph of $y = f(x)$ at $x_n$ meets the $x$-axis at $x_{n+1}$.
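One Newton step takes $x_{n+1} = x_n - f(x_n)/f'(x_n)$, the zero of the tangent line. The comparison below uses an example of our own choosing, $f(x) = x - \cos x$, whose zero is also the fixed point of the contraction $\cos$ on $[0,1]$. It counts how many steps Newton's method and plain fixed-point iteration need to reach the same tolerance. Convergence of Newton's method from $x_0 = 1$ is assumed here rather than proved; the run only confirms it for this example.

```python
import math

def newton(f, df, x0: float, tol: float = 1e-14, max_iter: int = 50):
    """Newton's method x_{n+1} = x_n - f(x_n) / f'(x_n); returns (approximate root, steps)."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = x - f(x) / df(x)
        if abs(x_new - x) <= tol:
            return x_new, n
        x = x_new
    return x, max_iter

def fixed_point(F, x0: float, tol: float = 1e-14, max_iter: int = 1000):
    """Plain iteration x_{n+1} = F(x_n); returns (approximate fixed point, steps)."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = F(x)
        if abs(x_new - x) <= tol:
            return x_new, n
        x = x_new
    return x, max_iter

root_newton, steps_newton = newton(lambda x: x - math.cos(x), lambda x: 1 + math.sin(x), 1.0)
root_fixed, steps_fixed = fixed_point(math.cos, 1.0)
print(steps_newton, steps_fixed)   # Newton needs far fewer steps here
```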

There are many generalizations of the Contraction Principle. One of them concerns the assumption $q < 1$. A mapping $F : M \to M$ is called non-expansive if
$$\rho(F(x), F(y)) \le \rho(x, y) \quad\text{for all } x, y \in M.$$
A simple example $F(x) = x + 1$, $x \in \mathbb{R}$, shows that $F$ may have no fixed point. This can be caused by the fact that $F$ does not map any bounded set into itself. However, there are non-expansive mappings which map the unit ball into itself and do not possess any fixed point either. See the following example or Exercise 2.3.17.
Example 2.3.9 (Beals). Let $M$ be the space of all sequences with zero limit with the sup norm (this space is usually denoted by $c_0$) and let
$$F(x) = (1, x_1, x_2, \dots) \quad\text{for}\quad x = (x_1, x_2, \dots) \in M.$$
Then $F$ is a non-expansive map of the unit ball into itself without any fixed point (a fixed point would satisfy $x_1 = 1$ and $x_{k+1} = x_k$ for all $k$, i.e., $x = (1, 1, \dots) \notin c_0$).
This example indicates that some special properties of the space are needed. We formulate the following assertion in a Hilbert space and use the Hilbert structure essentially in its proof. The statement is true also in uniformly convex spaces but the proof is more involved (see, e.g., Goebel [60]). Let us note an interesting fact that the validity of Proposition 2.3.10 in a reflexive Banach space is an open problem.
Proposition 2.3.10 (Browder). Let $M$ be a bounded, closed and convex set in a Hilbert space $H$. Let $F$ be a non-expansive mapping from $M$ into itself. Then there is a fixed point of $F$ in $M$. Moreover, if
$$x_0 \in M, \qquad x_n = F(x_{n-1}) \qquad\text{and}\qquad y_n = \frac{1}{n}\sum_{k=0}^{n-1} x_k,$$
then the sequence $\{y_n\}_{n=1}^{\infty}$ is weakly convergent to a fixed point.


Proof. The existence result is not difficult to prove.$^{32}$ So we will prove a more interesting result which yields also a numerical method for finding a fixed point. The proof consists of four steps, the last one is crucial and has a variational character.
Step 1. Since $M$ is bounded, closed and convex, $y_n \in M$ and there is a subsequence $\{y_{n_k}\}_{k=1}^{\infty}$ weakly convergent to an $\tilde{x} \in M$ (Theorem 2.1.25 and Exercise 2.1.39). Fix such a weakly convergent subsequence $\{y_{n_k}\}_{k=1}^{\infty}$ and its weak limit $\tilde{x} \in M$.
Step 2. We have $\lim_{n\to\infty}\|F(y_n) - y_n\| = 0$. Indeed,
$$\|F^k(x_0) - F(y_n) + F(y_n) - y_n\|^2 = \|F^k(x_0) - F(y_n)\|^2 + \|F(y_n) - y_n\|^2 + 2\operatorname{Re}\left(F^k(x_0) - F(y_n), F(y_n) - y_n\right)$$
where
$$F^k(x_0) = F(F^{k-1}(x_0)).$$
Summing up this equality from $k = 0$ to $k = n - 1$ and dividing by $n$ we get
$$\begin{aligned}
\frac{1}{n}\sum_{k=0}^{n-1}\|F^k(x_0) - y_n\|^2 &= \frac{1}{n}\sum_{k=0}^{n-1}\|F^k(x_0) - F(y_n)\|^2 + \|F(y_n) - y_n\|^2 + 2\operatorname{Re}\left(y_n - F(y_n), F(y_n) - y_n\right)\\
&= \frac{1}{n}\sum_{k=0}^{n-1}\|F^k(x_0) - F(y_n)\|^2 - \|F(y_n) - y_n\|^2.
\end{aligned}$$
Since $F$ is non-expansive, we conclude from this equality that
$$\begin{aligned}
\|F(y_n) - y_n\|^2 &\le \frac{1}{n}\sum_{k=1}^{n-1}\|F^{k-1}(x_0) - y_n\|^2 + \frac{1}{n}\|x_0 - F(y_n)\|^2 - \frac{1}{n}\sum_{k=0}^{n-1}\|F^k(x_0) - y_n\|^2\\
&= \frac{1}{n}\|x_0 - F(y_n)\|^2 - \frac{1}{n}\|F^{n-1}(x_0) - y_n\|^2 \to 0
\end{aligned}$$
(all sequences belong to $M$, and hence they are bounded).
Step 3. The element $\tilde{x}$ is a fixed point of $F$. To see this, observe that the inequality
$$\operatorname{Re}\left(z - F(z) - (y_{n_k} - F(y_{n_k})), z - y_{n_k}\right) = \|z - y_{n_k}\|^2 - \operatorname{Re}\left(F(z) - F(y_{n_k}), z - y_{n_k}\right) \ge \|z - y_{n_k}\|^2 - \|z - y_{n_k}\|^2 = 0$$
holds for any $z \in M$. By Exercise 2.1.36 and Step 2, the limit of the left-hand side is $\operatorname{Re}(z - F(z), z - \tilde{x})$, i.e., the inequality
$$\operatorname{Re}(z - F(z), z - \tilde{x}) \ge 0 \tag{2.3.10}$$
is also true. Now take $t \in (0, 1)$ and put
$$z = (1 - t)\tilde{x} + tF(\tilde{x}) \qquad (z \in M).$$
For $t \to 0$, the inequality (2.3.10) divided by $t$ yields
$$-\|\tilde{x} - F(\tilde{x})\|^2 \ge 0.$$
32 It is possible to assume that $o \in M$. For any $t \in (0, 1)$ the mapping $F_t(x) := tF(x)$ is a contraction. Letting $t \to 1$ we obtain a sequence $\{x_n\}_{n=1}^{\infty} \subset M$ for which $x_n - F(x_n) \to o$. Therefore it is sufficient to show that $(I - F)(M)$ is closed. This needs a trick which is typical for monotone operators (Section 5.3). Notice that $I - F$ is monotone provided $F$ is non-expansive.
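Step 4. If $x$ is a fixed point of $F$, then
$$\|x_n - x\|^2 = \|F(x_{n-1}) - F(x)\|^2 \le \|x_{n-1} - x\|^2$$
and, therefore, the limit $\varphi(x) := \lim_{n\to\infty}\|x_n - x\|^2$ exists. By Step 3, $\tilde{x}$ is also a fixed point, and for any $v \in H$ we get
$$\|\tilde{x} - x_k\|^2 = \|\tilde{x} - v\|^2 + \|v - x_k\|^2 + 2\operatorname{Re}(\tilde{x} - v, v - x_k),$$
where the left-hand side tends to $\varphi(\tilde{x})$ as $k \to \infty$. Summing up from $k = 0$ to $k = n - 1$ and dividing by $n$ we arrive at
$$\frac{1}{n}\sum_{k=0}^{n-1}\|\tilde{x} - x_k\|^2 = \|\tilde{x} - v\|^2 + \frac{1}{n}\sum_{k=0}^{n-1}\|v - x_k\|^2 + 2\operatorname{Re}(\tilde{x} - v, v - y_n). \tag{2.3.11}$$
Let $v$ be a weak limit of a subsequence $\{y_{n_l}\}_{l=1}^{\infty} \subset \{y_n\}_{n=1}^{\infty}$, possibly different from $\{y_{n_k}\}_{k=1}^{\infty}$. Then $v$ is a fixed point of $F$ by virtue of the previous steps. Set $n = n_l$ and take the limit for $l \to \infty$ in (2.3.11). We finally obtain$^{33}$
$$\varphi(\tilde{x}) - \varphi(v) = \|\tilde{x} - v\|^2.$$
Interchanging the roles of $\tilde{x}$ and $v$ (both are fixed points of $F$ and weak limits of subsequences of $\{y_n\}_{n=1}^{\infty}$), we also get $\varphi(v) - \varphi(\tilde{x}) = \|v - \tilde{x}\|^2$, and $v = \tilde{x}$ follows. In particular, the limit of any weakly convergent subsequence of $\{y_n\}_{n=1}^{\infty}$ coincides with $\tilde{x}$, and therefore the whole sequence $\{y_n\}_{n=1}^{\infty}$ weakly converges to $\tilde{x}$.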
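A rotation shows why the averages $y_n$ are used in Proposition 2.3.10. As an illustrative choice, let $M$ be the closed unit disc in the Hilbert space $\mathbb{R}^2$ and let $F$ be the rotation by a fixed angle $\theta$. Then $F$ is non-expansive, maps $M$ into itself and has the unique fixed point $o$. The iterates $x_n$ keep moving along the unit circle and do not converge, whereas the means $y_n$ tend to $o$ (in finite dimensions weak and strong convergence coincide). For a rotation one can check directly that $\|y_n\| \le 1/(n\sin(\theta/2))$.

```python
import numpy as np

theta = 1.0                                      # rotation angle in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # non-expansive; its only fixed point is o

n = 10000
x = np.array([1.0, 0.0])                         # x_0 on the unit circle
total = np.zeros(2)
for _ in range(n):
    total += x                                   # accumulate x_0 + ... + x_{n-1}
    x = R @ x                                    # x_k = F(x_{k-1})
y_n = total / n                                  # the Cesaro mean y_n

print(np.linalg.norm(x), np.linalg.norm(y_n))    # |x_n| stays 1, |y_n| is of order 1/n
```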

Remark 2.3.11. We have mentioned in footnote 32 on page 99 that $I - F$ is a monotone operator whenever $F$ is non-expansive. The converse statement is not true even in $\mathbb{R}$. Consider, e.g., $F(x) = -2x$. Proposition 2.3.10 should be compared with Theorem 5.3.4.
In the following three exercises we briefly show other modifications of the Contraction Principle.
33 Observe that $\lim_{l\to\infty}\frac{1}{n_l}\sum_{j=0}^{n_l - 1}\|\tilde{x} - x_j\|^2 = \lim_{n\to\infty}\|\tilde{x} - x_n\|^2$.


Exercise 2.3.12. If $M$ is a complete metric space, $F : M \to M$ and there is a function $V : M \to \mathbb{R}^+ \cup \{+\infty\}$ such that
$$V(F(x)) + \rho(x, F(x)) \le V(x), \quad x \in M, \tag{2.3.12}$$
then for arbitrary $x_0 \in M$ with $V(x_0) < +\infty$ the sequence
$$x_n = F(x_{n-1})$$
is convergent in $M$ to an $\tilde{x}$. Moreover, if the graph of $F$ is closed in $M \times M$, then
$$F(\tilde{x}) = \tilde{x}.$$
Hint. Show that $\{V(x_n)\}_{n=1}^{\infty}$ is a decreasing sequence; this implies that $\{x_n\}_{n=1}^{\infty}$ is a Cauchy sequence.
Remark 2.3.13. The condition (2.3.12) is suitable for a vector-valued mapping
F and plays an important role in game theory. For details see, e.g., Aubin &
Ekeland [10, Chapter VI].
Exercise 2.3.14. Let $M$ be a complete metric space and let $F : M \to M$. If there is $n \in \mathbb{N}$ such that $F^n$ is a contraction, then $F$ has a unique fixed point in $M$.
Hint. Let $\tilde{x}$ be a fixed point of
$$G := F^n, \qquad \tilde{x} = \lim_{k\to\infty} G^k(x_0).$$
Estimate $\rho(F(G^k(x_0)), G^k(x_0))$. It is possible to show that
$$\tilde{x} = \lim_{k\to\infty} F^k(x_0).$$

Remark 2.3.15. The power $n \in \mathbb{N}$ need not be the same for all $x, y \in M$, i.e., if there is $q \in [0, 1)$ and for every $x \in M$ there exists $n(x) \in \mathbb{N}$ such that
$$\rho(F^{n(x)}(x), F^{n(x)}(y)) \le q\,\rho(x, y) \quad\text{for all } y \in M,$$
then $F$ also has a unique fixed point (Sehgal [119]). The proof is similar to the previous one.
Exercise 2.3.16 (Edelstein). Let $M$ be a compact metric space and let $F : M \to M$ satisfy the condition
$$\rho(F(x), F(y)) < \rho(x, y) \quad\text{for all } x, y \in M,\ x \neq y.$$
Then $F$ has a unique fixed point in $M$.
Hint. Only existence has to be proved: By compactness there is a convergent subsequence $F^{n_k}(x_0) \to \tilde{x}$. Now show that the sequence
$$\alpha_n := \rho(F^n(x_0), F^{n+1}(x_0))$$
is decreasing and
$$\lim_{n\to\infty}\alpha_n = \rho(\tilde{x}, F(\tilde{x})) = \rho(F(\tilde{x}), F^2(\tilde{x})),$$
i.e., $F(\tilde{x}) = \tilde{x}$.


Exercise 2.3.17. Let
$$K = \{x \in C[0, 1] : 0 \le x(t) \le 1,\ x(0) = 0,\ x(1) = 1\}, \qquad F : x(t) \mapsto tx(t).$$
Then $F(K) \subset K$, $F$ is non-expansive and there is no fixed point of $F$ in $K$! Prove these facts and explain their relation to Proposition 2.3.10.
Exercise 2.3.18. Let $x \in L^1(0, 1)$ be a solution of
$$x(t) = \lambda\int_0^t k(t, s)x(s)\,ds$$
where $\lambda$ and $k$ are as in Example 2.3.7. Prove that $x = 0$ a.e. in $(0, 1)$.
Hint. First show that $x \in L^\infty(0, 1)$. From the equation we have
$$\|x\|_{L^\infty(0,t)} \le |\lambda|\,t\,\|k\|_{L^\infty(M)}\|x\|_{L^\infty(0,t)}, \qquad t \in (0, 1).$$
Now deduce $x = 0$ a.e. in $(0, 1)$.


Exercise 2.3.19. Prove Corollary 2.1.3 using Theorem 2.3.1.
Exercise 2.3.20. Let $f \in C[0, 1]$. Prove that there exists $\lambda_0 > 0$ such that for any $\lambda \in [0, \lambda_0]$ the boundary value problem

x
(t) x(t) + arctan x(t) = f (t),
t (0, 1),
x(0) = x(1) = 0,
has a unique solution $x \in C^2[0, 1]$.
Exercise 2.3.21. Let $K$ be a continuous real function on $[a, b] \times [a, b] \times \mathbb{R}$ and assume there exists a constant $N > 0$ such that for any $t, \tau \in [a, b]$, $z_1, z_2 \in \mathbb{R}$, we have
$$|K(t, \tau, z_1) - K(t, \tau, z_2)| \le N|z_1 - z_2|.$$
Let $h \in C[a, b]$ be fixed and let $\lambda \in \mathbb{R}$ be such that
$$|\lambda| < \frac{1}{N(b - a)}.$$
Prove that the integral equation
$$x(t) = \lambda\int_a^b K(t, \tau, x(\tau))\,d\tau + h(t)$$
has a unique solution $x \in C[a, b]$.


Exercise 2.3.22. Let $A : (a, b) \to \mathbb{R}^{M\times M}$ be a continuous matrix-valued function and let $\tau \in (a, b)$, $\xi \in \mathbb{R}^M$.
(i) Modify the procedure from Example 2.3.7 to prove that the initial value problem
$$\dot{x}(t) = A(t)x(t), \qquad x(\tau) = \xi,$$
has a unique solution which is defined on $(a, b)$.
(ii) Prove that the equation
$$\dot{x}(t) = A(t)x(t) \tag{2.3.13}$$
has $M$ linearly independent solutions $\varphi_1, \dots, \varphi_M$ on the interval $(a, b)$ and any solution of (2.3.13) is a linear combination of $\varphi_1, \dots, \varphi_M$. The matrix $\Phi = (\varphi_j^i)_{i,j=1,\dots,M}$ is called a fundamental matrix of (2.3.13).
(iii) Let $A$ be continuous on $\mathbb{R}$ and $T$-periodic ($T > 0$). Denote $C = \Phi(T)$ where $\Phi$ is a fundamental matrix, $\Phi(0) = I$. Suppose that $B$ is a solution of the equation
$$e^{TB} = C$$
(see Exercise 1.1.42). Prove that
$$Q(t) := \Phi(t)e^{-tB}$$
is regular for all $t \in \mathbb{R}$ and $T$-periodic. Moreover, $x$ is a solution to (2.3.13) if and only if
$$y(t) := Q^{-1}(t)x(t)$$
is a solution of the equation
$$\dot{y} = By$$
which has constant coefficients. Find a condition in terms of $\sigma(C)$ for the existence of a nontrivial $kT$-periodic solution to (2.3.13) ($k \in \mathbb{N}$).
(iv) Let $f : \mathbb{R} \to \mathbb{R}^M$ be a continuous and $T$-periodic mapping. Is there any relation between the existence of a nontrivial $T$-periodic solution to (2.3.13) and the existence of a $T$-periodic solution to the equation
$$\dot{x}(t) = A(t)x(t) + f(t)?$$
Hint. Use the Variation of Constants Formula and (iii).

Chapter 3

Abstract Integral and Differential Calculus
3.1 Integration of Vector Functions
This short section is devoted to the integration of mappings which take values in a
Banach space X. We will consider two types of domains of such mappings: either
compact intervals or measurable spaces. For scalar functions the former case leads
to the Riemann integral and the latter to the Lebesgue integral with respect to a
measure.
Definition 3.1.1. Let $f : [a, b] \to X$. Let there exist $x \in X$ with the following property: For every $\varepsilon > 0$ there is $\delta > 0$ such that for all divisions
$$D := \{a = t_0 < \cdots < t_n = b\}$$
for which $|D| := \max_{i=1,\dots,n}(t_i - t_{i-1}) < \delta$ and for all choices $\tau_i \in [t_{i-1}, t_i]$, $i = 1, \dots, n$, the inequality
$$\left\|\sum_{i=1}^{n} f(\tau_i)(t_i - t_{i-1}) - x\right\| < \varepsilon \tag{3.1.1}$$
is satisfied. Then $x$ is called the Riemann integral of $f$ over $[a, b]$ and it is denoted by
$$\int_a^b f(t)\,dt.$$

A basic existence theorem is a straightforward generalization of the classical (Riemann's) result.
Theorem 3.1.2 (Graves). Let $X$ be a Banach space and let $f : [a, b] \to X$ be continuous. Then the Riemann integral $\int_a^b f(t)\,dt$ exists.


Proof. Since $f$ is continuous on the compact interval $[a, b]$, $f$ is uniformly continuous on it. Take an equidistant division $D_n = \{a = t_0^n < \cdots < t_n^n = b\}$ of the interval $[a, b]$, i.e.,
$$t_i^n = a + \frac{i}{n}(b - a), \quad i = 0, \dots, n, \qquad\text{and}\qquad |D_n| = \frac{b - a}{n}.$$
Then
$$s_n = \sum_{i=1}^{n} f(t_i^n)(t_i^n - t_{i-1}^n)$$
forms a Cauchy sequence (by the uniform continuity of $f$). Let $x = \lim_{n\to\infty} s_n$. It is easy to see, again by the uniform continuity of $f$, that condition (3.1.1) is satisfied whenever $|D|$ is sufficiently small.
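For $X = \mathbb{R}^2$ with the Euclidean norm, the equidistant Riemann sums $s_n$ from the proof can be computed directly. As an example of our own choosing, $f(t) = (\cos t, \sin t)$ on $[0, \pi/2]$ has the integral $(1, 1)$. Its values have norm $1$, so estimate (3.1.2) below gives $\|\int f\| \le \pi/2$. The sketch prints the errors of $s_n$ for a few $n$.

```python
import numpy as np

def riemann_sum(f, a: float, b: float, n: int) -> np.ndarray:
    """s_n = sum_{i=1}^n f(t_i^n) (t_i^n - t_{i-1}^n) for the equidistant division D_n."""
    t = a + (b - a) * np.arange(n + 1) / n
    return sum(f(t[i]) * (t[i] - t[i - 1]) for i in range(1, n + 1))

def f(t: float) -> np.ndarray:
    """A continuous map [0, pi/2] -> R^2 with values on the unit circle."""
    return np.array([np.cos(t), np.sin(t)])

exact = np.array([1.0, 1.0])
for n in (10, 100, 1000):
    s_n = riemann_sum(f, 0.0, np.pi / 2, n)
    print(n, s_n, np.linalg.norm(s_n - exact))   # the error decreases roughly like 1/n

assert np.linalg.norm(exact) <= np.pi / 2        # consistent with estimate (3.1.2)
```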

Since the Riemann integral is linear and the estimate
$$\left\|\int_a^b f(t)\,dt\right\| \le \int_a^b\|f(t)\|_X\,dt \le \|f\|_{C([a,b],X)}(b - a)^{1} \tag{3.1.2}$$
holds for each $f \in C([a, b], X)$, the integral is a linear continuous operator. Its commutativity with linear operators is important.
Proposition 3.1.3. Let $X$, $Y$ be Banach spaces and let $f : [a, b] \to X$ be Riemann integrable.
(i) If $A \in \mathcal{L}(X, Y)$, then $Af$ is also integrable and
$$A\int_a^b f(t)\,dt = \int_a^b Af(t)\,dt \tag{3.1.3}$$
holds.
(ii) If $A : X \to Y$ is a linear closed operator and $Af$ is Riemann integrable, then
$$\int_a^b f(t)\,dt \in \operatorname{Dom} A$$
and (3.1.3) is true.


Proof. The verication of (i) is straightforward, for (ii) we choose a sequence

{sn }n=1 of Riemann sums such that


 b
 b
lim sn =
f (t) dt
and
lim Asn =
Af (t) dt.
n

The statement follows from the denition of a closed operator.

With the help of this generalization of the Riemann integral we can also prove a basic result on the existence and uniqueness of a solution of a differential equation in a Banach space.
¹ Here $\|f\|_{C([a,b],X)} = \max_{t\in[a,b]}\|f(t)\|_X$.

3.1. Integration of Vector Functions


Assume that
$$f : I \times G \to X,$$
where $I$ is an open interval in $\mathbb{R}$ and $G$ is an open subset of a Banach space $X$. By a solution of a differential equation
$$\dot x = f(t, x) \tag{3.1.4}$$
we mean a mapping $x : J \to X$, where $J$ is an open interval, $J \subset I$, such that $x(J) \subset G$ and for every $t \in J$ the limit
$$\dot x(t) := \lim_{\tau\to 0}\frac{x(t+\tau) - x(t)}{\tau}$$
exists and
$$\dot x(t) = f(t, x(t)).$$
Theorem 3.1.4. Let $I$ be an open interval in $\mathbb{R}$ and let $G$ be an open subset of a Banach space $X$. Assume that $f : I \times G \to X$ is continuous and locally satisfies the Lipschitz condition with respect to the second variable, i.e., for every $s \in I$, $y \in G$ there are $\delta > 0$, $\Delta > 0$, $L > 0$ such that
$$\|f(t, x_1) - f(t, x_2)\| \le L\|x_1 - x_2\|$$
whenever $|t - s| < \delta$ and $\|x_i - y\| < \Delta$, $i = 1, 2$. Then for each $t_0 \in I$, $x_0 \in G$ there exists $h > 0$ such that the equation (3.1.4) has a unique solution on the interval $J = (t_0 - h, t_0 + h)$ which satisfies the initial condition
$$x(t_0) = x_0. \tag{3.1.5}$$
The proof of this theorem is based on the use of the Contraction Principle for the equivalent integral equation (see also the proof of Theorem 2.3.4)
$$x(t) = x_0 + \int_{t_0}^{t} f(s, x(s))\,ds, \quad t \in J, \tag{3.1.6}$$
where the integral is the Riemann integral.² The equivalence of (3.1.4), (3.1.5) and (3.1.6) is established in the following lemma.
Lemma 3.1.5. Suppose that $f$ is continuous on $I \times G$ and $(t_0, x_0) \in I \times G$. Then a continuous function $x : J \to G$ is a solution of (3.1.4) on an interval $J \subset I$ and satisfies the condition (3.1.5) if and only if $t_0 \in J$ and $x$ solves on $J$ the integral equation (3.1.6).

² Recall that $\int_{t_0}^{t} g(s)\,ds = -\int_{t}^{t_0} g(s)\,ds$ for $t < t_0$ (see footnote 28 on page 94).


Proof. Step 1. Assume first that $x$ is a solution of (3.1.4). Then $x$ as well as the mapping $t \in J \mapsto f(t, x(t))$ are continuous on $J$. Choose $\tau \in J$ and integrate both sides of (3.1.4) over the interval $[t_0, \tau]$ (or $[\tau, t_0]$). Notice that both sides are Riemann integrable (Theorem 3.1.2). Moreover,
$$\varphi\Bigl(\int_{t_0}^{\tau}\dot x(t)\,dt\Bigr) = \int_{t_0}^{\tau}\varphi(\dot x(t))\,dt = \int_{t_0}^{\tau}\frac{d}{dt}\varphi(x(t))\,dt = \varphi(x(\tau)) - \varphi(x_0)$$
for all $\varphi \in X^*$. The first equality follows from Proposition 3.1.3(i), and the last one from the so-called Basic Theorem of Calculus. By the Hahn–Banach Theorem, in particular Remark 2.1.17(ii), we have
$$\int_{t_0}^{\tau}\dot x(t)\,dt = x(\tau) - x_0,$$
i.e., $x$ satisfies (3.1.6).
Step 2. Suppose now that $x : J \to X$ is a continuous solution of (3.1.6). Then $x$ satisfies (3.1.5) and it remains to check that
$$\frac{d}{dt}\int_{t_0}^{t} f(s, x(s))\,ds = f(t, x(t)).$$
This can be done by copying the proof for the scalar case. □

Proof of Theorem 3.1.4. Choose $\delta > 0$, $\Delta > 0$ small enough and $K > 0$ such that
$$\|f(s, x_1)\| \le K, \qquad \|f(s, x_1) - f(s, x_2)\| \le L\|x_1 - x_2\|$$
for $s \in [t_0 - \delta, t_0 + \delta]$, $\|x_i - x_0\| \le \Delta$, $i = 1, 2$. Let $0 < h \le \min\bigl\{\delta, \frac{\Delta}{K}, \frac{1}{2L}\bigr\}$, and
$$M_h = \{x \in C([t_0 - h, t_0 + h], X) : \|x(s) - x_0\| \le \Delta \text{ for } s \in [t_0 - h, t_0 + h]\}.$$
Then $M_h$ is a complete metric space (with respect to the induced metric) and the operator
$$F(x) : t \mapsto x_0 + \int_{t_0}^{t} f(s, x(s))\,ds, \quad t \in [t_0 - h, t_0 + h],$$
is well defined on $M_h$, $F(M_h) \subset M_h$ (by (3.1.2)), and
$$\|F(x_1) - F(x_2)\| = \sup_{t\in[t_0-h,\,t_0+h]}\Bigl\|\int_{t_0}^{t}[f(s, x_1(s)) - f(s, x_2(s))]\,ds\Bigr\| \le Lh\|x_1 - x_2\| \le \frac{1}{2}\|x_1 - x_2\|$$
for $x_1, x_2 \in M_h$. By the Contraction Principle (Theorem 2.3.1), there is a unique $x \in M_h$ such that
$$F(x) = x.$$
Using Lemma 3.1.5 we conclude that $x$ is a solution of (3.1.4), (3.1.5) on the interval $J = (t_0 - h, t_0 + h)$.


Let $y$ be another solution of the same problem on the interval $J$. Then $y \in M_k$ for some $k \le h$, by the continuity of $y$ at $t_0$. Because of the uniqueness in the Contraction Principle, $y(t) = x(t)$ for $t \in [t_0 - k, t_0 + k]$. Taking $(t_0 \pm k, x(t_0 \pm k))$ as new initial conditions we can extend the equality $y(t) = x(t)$ to a larger closed interval $[t_0 - \tilde k, t_0 + \tilde k]$, i.e., $y \in M_{\tilde k}$. This argument shows that
$$y(t) = x(t) \quad\text{for } t \in J.$$
□
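The contraction argument can be illustrated numerically. This sketch is not part of the proof, and everything specific in it is an assumption:

- $X = \mathbb{R}$ and $f(t,x) = x$, so $L = 1$; the choice $\Delta = 1$, $K = 2$ gives $h = \min\{\Delta/K, 1/(2L)\} = 1/2$.
- $t_0 = 0$ and $x_0 = 1$.
- Functions in $M_h$ are stored by their values on a uniform grid, and the integral in $F$ is replaced by the trapezoidal rule.

The sup-distances between successive iterates shrink at least by the factor $Lh = 1/2$, and the fixed point approximates $e^t$. The function name `picard` and its parameters are ours.

```python
import math

def picard(f, t0, x0, h, n_grid=2000, n_iter=30):
    """Iterate x_{k+1} = F(x_k), F(x)(t) = x0 + int_{t0}^t f(s, x(s)) ds, on [t0 - h, t0 + h].

    Functions are stored by their values on a uniform grid (n_grid must be even so
    that t0 is a grid point); the integral is approximated by the trapezoidal rule.
    Returns the grid, the last iterate and the sup distances ||x_{k+1} - x_k||.
    """
    ts = [t0 - h + 2 * h * i / n_grid for i in range(n_grid + 1)]
    mid = n_grid // 2                      # index of t0
    dt = 2 * h / n_grid
    x = [x0] * len(ts)                     # start from the constant function x0 (it lies in M_h)
    gaps = []
    for _ in range(n_iter):
        g = [f(t, xi) for t, xi in zip(ts, x)]
        new = [0.0] * len(ts)
        new[mid] = x0
        for i in range(mid + 1, len(ts)):  # integrate forward from t0
            new[i] = new[i - 1] + 0.5 * dt * (g[i - 1] + g[i])
        for i in range(mid - 1, -1, -1):   # integrate backward from t0
            new[i] = new[i + 1] - 0.5 * dt * (g[i] + g[i + 1])
        gaps.append(max(abs(p - q) for p, q in zip(new, x)))
        x = new
    return ts, x, gaps

ts, x, gaps = picard(lambda t, y: y, t0=0.0, x0=1.0, h=0.5)
print(gaps[:4])
print(max(abs(xi - math.exp(t)) for t, xi in zip(ts, x)))
```

The discrete operator is itself a contraction with constant $Lh$, because the trapezoidal weights between $t_0$ and $t$ sum to $|t - t_0| \le h$.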

Corollary 3.1.6. Let $f$ satisfy the assumptions of Theorem 3.1.4 where $I = (\alpha, \beta)$, $G = X$. If, moreover, $f$ is bounded on $I \times X$, then for each $t_0 \in I$ and $x_0 \in X$, (3.1.4) has a unique solution satisfying the initial condition (3.1.5) which is defined on the whole interval $I$.
Proof. The problem (3.1.4), (3.1.5) has a solution $x$ on an interval $(\gamma, \omega) \subset I$ containing $t_0$. Such an interval exists due to Theorem 3.1.4, and the solution $x$ is unique on this interval by an argument similar to that in the proof of uniqueness. Denote
$$\omega_0 = \sup\{\omega > t_0 : \text{there is a solution } x \text{ on } (\gamma, \omega)\}.$$
If $\omega_1 < \omega_2$ and $x_i$ is a corresponding solution on $(\gamma, \omega_i)$, $i = 1, 2$, then
$$x_1(t) = x_2(t) \quad\text{for } t \in (\gamma, \omega_1)$$
(by uniqueness). This allows us to define the solution $x = x(t)$ on the entire interval $(\gamma, \omega_0)$. Since
$$\|x(t) - x(s)\| = \Bigl\|\int_s^t f(\sigma, x(\sigma))\,d\sigma\Bigr\| \le |t - s|\sup_{(\sigma, y)\in I\times X}\|f(\sigma, y)\|$$
for any $\gamma < s < t < \omega_0$, the solution $x$ is uniformly continuous on $(\gamma, \omega_0)$. Therefore it is continuously extendable at $\omega_0$ provided $\omega_0 < \beta$ (see Proposition 1.2.4). The local Theorem 3.1.4 then allows us to continue $x$ as a solution beyond the value $\omega_0$, a contradiction. Hence $\omega_0 = \beta$. Similarly, we prove that the infimum of the admissible left endpoints $\gamma$ equals $\alpha$. □

Remark 3.1.7. Under the assumptions of Corollary 3.1.6 the solution $x$ depends continuously on the initial data. In order to formulate this result we denote by $x = x(\cdot\,; t_0, x_0)$ the solution of (3.1.4), (3.1.5) on the interval $I$. The continuous dependence now reads as follows:
For any compact interval $J \subset I$, $t_0 \in J$, and any $\varepsilon > 0$ there is $\delta > 0$ such that
$$\|x(t; t_0, x_0) - x(t; t_1, x_1)\| < \varepsilon \quad\text{for all } t \in J$$
provided $t_1 \in J$, $|t_0 - t_1| < \delta$ and $\|x_0 - x_1\| < \delta$.
Cf. Remark 2.3.5.
Remark 3.1.8. Another existence theorem for the differential equation (3.1.4) in the finite dimensional case (i.e., $X = \mathbb{R}^N$) is based on the continuity of $f$ only (cf. Proposition 5.1.13).


Warning. A generalization to an infinite dimensional space does not hold, e.g.,
$$f(x) = \Bigl\{|x_n|^{1/2} + \frac{1}{n+1}\Bigr\}_{n=1}^{\infty} \quad\text{for } x = (x_1, x_2, \dots) \in c_0,$$
where $c_0$ is the space of sequences which converge to zero. As a norm on $c_0$ the sup norm is taken. It is not difficult to see that the equation $\dot x = f(x)$ has no solution satisfying the initial condition $x(0) = o$!
We now turn to the integration of vector functions defined on a measurable space $(M, \Sigma, \mu)$, where $\Sigma$ is a $\sigma$-algebra of subsets of $M$ and $\mu$ is a (positive) measure defined on $\Sigma$. A generalization of the abstract Lebesgue integral³ can be done in two different ways:

- by integrating $\varphi(f)$ over $M$ for all $\varphi \in X^*$; this approach leads to a weak integral;
- by approximating $f$ by step functions, for which the integral is naturally defined, and then passing to the limit; this approach leads to the so-called Bochner integral.

An existence theorem for the weak integral, i.e., the existence of $x \in X$ such that $\varphi(x) = \int_M \varphi(f)\,d\mu$ for all $\varphi \in X^*$, is complicated. We therefore only briefly describe the less general Bochner integral.


Definition 3.1.9. Let $(M, \Sigma, \mu)$ be a measurable space and let $X$ be a Banach space.
(1) A function $s : M \to X$ is called a step function if there are pairwise disjoint sets $M_1, \dots, M_n$ in $\Sigma$ with $\mu(M_k) < \infty$, $k = 1, \dots, n$, such that $s$ is constant (say, equal to $x_k$) on each $M_k$ and
$$s(t) = o \quad\text{for } t \in M \setminus \bigcup_{k=1}^{n} M_k.$$
The integral of $s$ is defined by
$$\int_M s\,d\mu = \sum_{k=1}^{n} x_k\,\mu(M_k).$$
(2) A function $f : M \to X$ is said to be strongly measurable if there is a sequence $\{s_m\}_{m=1}^{\infty}$ of step functions such that
$$\lim_{m\to\infty} s_m(t) = f(t) \quad\text{for } \mu\text{-a.a. } t \in M.$$
(3) A strongly measurable function $f : M \to X$ is said to be Bochner integrable if there is a sequence $\{s_m\}_{m=1}^{\infty}$ of step functions which converges to $f$ $\mu$-a.e. and
$$\lim_{m\to\infty}\int_M \|f - s_m\|_X\,d\mu = 0. \tag{3.1.7}$$

³ The reader who is not acquainted with measure theory and the abstract Lebesgue integral can assume that $M$ is an open subset of $\mathbb{R}^N$, $\mu$ is the Lebesgue measure and $\Sigma$ is the collection of all Lebesgue measurable subsets of $M$.

In this case we put
$$\int_M f\,d\mu = \lim_{m\to\infty}\int_M s_m\,d\mu. \tag{3.1.8}$$
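As a hedged illustration of Definition 3.1.9, take $M = [0,1)$ with Lebesgue measure, $X = \mathbb{R}^2$, and $f(t) = (t, t^2)$; all of these are assumptions of this sketch only. The dyadic step functions $s_m$ equal $f(k/2^m)$ on $[k/2^m, (k+1)/2^m)$. They converge to $f$ everywhere and satisfy (3.1.7), since each component of $f - s_m$ is bounded by $2 \cdot 2^{-m}$. Their integrals, computed exactly with rational arithmetic, converge to $\int_M f\,d\mu = (1/2, 1/3)$, as (3.1.8) prescribes.

```python
from fractions import Fraction

def step_integral(pieces):
    """Integral of a step function given as (mu(M_k), x_k) pairs: sum_k x_k * mu(M_k).

    Values x_k are tuples, i.e. points of X = R^d.
    """
    d = len(pieces[0][1])
    total = [Fraction(0)] * d
    for measure, value in pieces:
        for j in range(d):
            total[j] += measure * value[j]
    return tuple(total)

def dyadic_step(f, m):
    """Step function s_m on M = [0, 1): the value f(k / 2^m) on [k / 2^m, (k + 1) / 2^m)."""
    n = 2 ** m
    return [(Fraction(1, n), f(Fraction(k, n))) for k in range(n)]

f = lambda t: (t, t * t)
for m in (1, 4, 8):
    print(m, [float(c) for c in step_integral(dyadic_step(f, m))])
```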

Remark 3.1.10. In order to show that this definition is correct we need to prove that the norm of a strongly measurable function is a measurable function (this is obvious). Therefore the condition (3.1.7) makes sense. From (3.1.7) it also immediately follows that $\{\int_M s_m\,d\mu\}_{m=1}^{\infty}$ is a Cauchy sequence, so the limit in (3.1.8) exists, and that this limit does not depend on any special choice of $\{s_m\}_{m=1}^{\infty}$.
The following statement offers a very useful criterion for Bochner integrability.
Proposition 3.1.11 (Bochner). Let $X$ be a Banach space and let $(M, \Sigma, \mu)$ be a measurable space. A strongly measurable vector function $f : M \to X$ is Bochner integrable if and only if the norm $\|f\|_X$ is Lebesgue integrable. Moreover,
$$\Bigl\|\int_M f\,d\mu\Bigr\| \le \int_M \|f\|_X\,d\mu. \tag{3.1.9}$$

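Proof. Step 1. Let $f$ be Bochner integrable and let $\{s_n\}_{n=1}^{\infty}$ be a sequence of step functions from the definition. Then $\{\int_M \|s_n\|\,d\mu\}_{n=1}^{\infty}$ is a Cauchy sequence (by (3.1.7)). In particular, its limit, say $\kappa \in \mathbb{R}$, exists. Then
$$\Bigl\|\int_M f\,d\mu\Bigr\| = \lim_{n\to\infty}\Bigl\|\int_M s_n\,d\mu\Bigr\| \le \lim_{n\to\infty}\int_M \|s_n\|\,d\mu = \kappa.$$
It is easy to see that $\kappa$ does not depend on any special choice of $\{s_n\}_{n=1}^{\infty}$ from the definition and, moreover,
$$\int_M \|f\|\,d\mu \le \int_M \|f - s_n\|\,d\mu + \int_M \|s_n\|\,d\mu, \quad\text{i.e.,}\quad \int_M \|f\|\,d\mu \le \kappa.$$
Step 2. Suppose now that $\|f\|$ is Lebesgue integrable and that $\{s_n\}_{n=1}^{\infty}$ is a sequence of step functions from the definition of strong measurability. Put
$$\sigma_n(t) = \begin{cases} s_n(t) & \text{if } \|s_n(t)\| \le \bigl(1 + \frac{1}{n}\bigr)\|f(t)\|,\\ o & \text{otherwise.}\end{cases}$$
Then $\sigma_n \to f$ $\mu$-a.e. and, by the Lebesgue Dominated Convergence Theorem,
$$\int_M \|\sigma_n - f\|\,d\mu \to 0 \quad\text{since}\quad \|\sigma_n(t) - f(t)\| \le \Bigl(2 + \frac{1}{n}\Bigr)\|f(t)\|.$$
It follows from the independence of $\kappa$ of the special choice of approximating step functions that $\kappa = \int_M \|f\|\,d\mu$. This proves the inequality (3.1.9). □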


Proposition 3.1.12. Let $X$, $Y$ be Banach spaces, let $(M, \Sigma, \mu)$ be a measurable space and let $f : M \to X$ be Bochner integrable.
(i) If $A \in L(X,Y)$, then $Af$ is also Bochner integrable and
$$\int_M Af\,d\mu = A\int_M f\,d\mu. \tag{3.1.10}$$
(ii) If $A$ is a closed linear operator from $X$ into $Y$ and $Af$ is Bochner integrable, then
$$\int_M f\,d\mu \in \operatorname{Dom} A$$
and (3.1.10) holds.
Proof. The proof of statement (i) is straightforward. To prove (ii) let
$$Z = \{(x, Ax) : x \in \operatorname{Dom} A\}$$
be equipped with the graph norm
$$\|(x, Ax)\|_Z := \|x\|_X + \|Ax\|_Y.$$
Since $A$ is closed and $X$, $Y$ are Banach spaces, $Z$ is a Banach space as well. The crucial point of the proof is to show that
$$g(t) := (f(t), Af(t)), \quad t \in M,$$
is strongly measurable. Once this is achieved,⁴ the rest of the proof is easy: by Proposition 3.1.11, $g$ is Bochner integrable and
$$\int_M g\,d\mu = \Bigl(\int_M f\,d\mu, \int_M Af\,d\mu\Bigr) \in Z$$
(since $g$ maps into $Z$, its integral has to belong to $Z$, too). In particular, $\int_M f\,d\mu \in \operatorname{Dom} A$ and (3.1.10) holds. □

⁴ We sketch the proof of this result. Let $\varphi \in Z^*$. According to the Hahn–Banach Theorem there is an extension $\Phi = (\varphi_1, \varphi_2)$ of $\varphi$ to $(X \times Y)^*$. Since $f$ and $Af$ are strongly measurable, we conclude that $t \mapsto \varphi(g(t)) = \varphi_1(f(t)) + \varphi_2(Af(t))$ is measurable. It can also be shown that there is $N \subset M$, $\mu(M \setminus N) = 0$, such that $g(N)$ is separable. The result now follows from the Pettis Theorem (see, e.g., Dunford & Schwartz [44, Chapter III, §6], Yosida [135]):
A function $g : M \to Z$ ($Z$ a Banach space) is strongly measurable if and only if the following two conditions are satisfied:
(i) For every $\varphi \in Z^*$ the function $t \mapsto \varphi(g(t))$ is a measurable function.
(ii) There is $N \subset M$ such that $\mu(M \setminus N) = 0$ and $g(N)$ is a separable subset of $Z$.


Remark 3.1.13. If $f : M \to X$ is a Bochner integrable function and $\varphi \in X^*$, then, by the previous proposition,
$$\varphi(f) : M \to \mathbb{R} \ (\text{or } \mathbb{C})$$
is integrable (in this case the Bochner and the Lebesgue integrals coincide) and $\varphi\bigl(\int_M f\,d\mu\bigr) = \int_M \varphi(f)\,d\mu$. This shows that the Bochner integral is a restriction of any notion of a weak integral.
We now return to the functional calculus given for matrices (see Theorem 1.1.38). Let $B \in L(X)$ and let $H(\sigma(B))$ be the collection of functions holomorphic on a neighborhood of $\sigma(B)$ (this neighborhood can depend on the function). If $f \in H(\sigma(B))$, then there exists a positively oriented Jordan curve $\Gamma$ such that $\sigma(B) \subset \operatorname{int}\Gamma$ and $f$ is holomorphic on a neighborhood of $\overline{\operatorname{int}\Gamma}$. Hence the integral
$$f(B)x := \frac{1}{2\pi i}\int_{\Gamma} f(w)(wI - B)^{-1}x\,dw, \quad x \in X,$$
exists. Its properties are collected in the following assertions.
Proposition 3.1.14 (Dunford Functional Calculus). Let $X$ be a complex Banach space and let $B \in L(X)$. There exists a unique linear mapping $\Phi : H(\sigma(B)) \to L(X)$, $\Phi(f) = f(B)$, with the following properties:
(i) $\Phi(fg) = \Phi(f)\Phi(g) = \Phi(g)\Phi(f)$ for $f, g \in H(\sigma(B))$;
(ii) if $P(w) = \sum_{j=0}^{n} a_j w^j$, then $P(B) = \sum_{j=0}^{n} a_j B^j$;
(iii) if $f(w) = \frac{1}{\lambda - w}$ for $w \ne \lambda$ and $\lambda \notin \sigma(B)$, then $f(B) = (\lambda I - B)^{-1}$;
(iv) if $\{f_n\}_{n=1}^{\infty} \subset H(\sigma(B))$ and $f_n \to f$ uniformly on a neighborhood of $\overline{\operatorname{int}\Gamma}$, then $f_n(B) \to f(B)$ in the norm topology;
(v) if $f \in H(\sigma(B))$, then $\sigma(f(B)) = f(\sigma(B))$.
Proof. The proof can be found, e.g., in Dunford & Schwartz [44, Section VII.3]. □
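Here is a numerical sketch of the Dunford integral for a matrix. It assumes $X = \mathbb{C}^2$, so $L(X)$ consists of $2 \times 2$ complex matrices. $\Gamma$ is a circle enclosing $\sigma(B)$, and the contour integral is evaluated by the trapezoidal rule in the angle; this quadrature is our choice and converges very fast for a periodic analytic integrand. With $f = \exp$ the sketch reproduces the matrix exponential, and with $f(w) = w^2$ it returns $B^2$, in line with property (ii).

```python
import cmath
import math

def resolvent(w, B):
    """(wI - B)^{-1} for a 2x2 matrix B = ((a, b), (c, d)), via the adjugate formula."""
    (a, b), (c, d) = B
    det = (w - a) * (w - d) - b * c
    return (((w - d) / det, b / det), (c / det, (w - a) / det))

def dunford(f, B, center, radius, n=256):
    """f(B) = (1/(2 pi i)) * integral over Gamma of f(w) (wI - B)^{-1} dw.

    Gamma is the circle |w - center| = radius, assumed to enclose sigma(B) and to
    lie in the domain of f; the trapezoidal rule in the angle is used.
    """
    acc = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        z = cmath.exp(2j * math.pi * k / n)
        w = center + radius * z
        dw = 1j * radius * z * (2 * math.pi / n)   # dw = i r e^{i theta} d theta
        R = resolvent(w, B)
        for i in range(2):
            for j in range(2):
                acc[i][j] += f(w) * R[i][j] * dw
    return [[acc[i][j] / (2j * math.pi) for j in range(2)] for i in range(2)]

B = ((1.0, 1.0), (0.0, 2.0))                       # sigma(B) = {1, 2}
expB = dunford(cmath.exp, B, center=1.5, radius=3.0)
exact = [[math.e, math.e ** 2 - math.e], [0.0, math.e ** 2]]
print(max(abs(expB[i][j] - exact[i][j]) for i in range(2) for j in range(2)))
```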

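Suppose that $\lambda_0 \in \sigma(B)$ is an isolated point of the spectrum of $B$, i.e., there exist disjoint neighborhoods $U_0$ of $\lambda_0$ and $U$ of $\sigma(B) \setminus \{\lambda_0\}$. The function
$$f(w) = \begin{cases} 1 & \text{for } w \in U_0,\\ 0 & \text{for } w \in U,\end{cases}$$
belongs to the collection $H(\sigma(B))$. The operator $P_0 := f(B)$ is a projection of $X$ onto the subspace $X_0 := P_0(X)$, since $P_0^2 = P_0$ (by Proposition 3.1.14(i), because $f^2 = f$). The operator $P_1 := I - P_0$ is a projection onto the complementary subspace $X_1$, i.e., $X = X_0 \oplus X_1$. Denote by $B_0$ and $B_1$ the restrictions of $B$ onto $X_0$ (i.e., $B_0 \in L(X_0)$) and $X_1$, respectively. Proposition 3.1.14(v) implies that
$$\sigma(B_0) = \{\lambda_0\}, \qquad \sigma(B_1) = \sigma(B) \setminus \{\lambda_0\}.$$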
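For the function $f$ above, $f(B)$ reduces to $\frac{1}{2\pi i}\int (wI - B)^{-1}\,dw$ taken over a small circle around $\lambda_0$ that encloses no other point of $\sigma(B)$. The sketch below assumes:

- $X = \mathbb{C}^2$ and $B = \begin{pmatrix}1 & 1\\ 0 & 3\end{pmatrix}$, so $\sigma(B) = \{1, 3\}$;
- $\lambda_0 = 1$ and a circle of radius 1;
- the trapezoidal rule for the contour integral.

The exact answer is $P_0 = \begin{pmatrix}1 & -1/2\\ 0 & 0\end{pmatrix}$. This is the projection onto the eigenvector $(1,0)$ along the eigenvector $(1,2)$ of the eigenvalue 3.

```python
import cmath
import math

def resolvent(w, B):
    """(wI - B)^{-1} for a 2x2 matrix B = ((a, b), (c, d))."""
    (a, b), (c, d) = B
    det = (w - a) * (w - d) - b * c
    return (((w - d) / det, b / det), (c / det, (w - a) / det))

def riesz_projection(B, lam0, r, n=128):
    """P0 = (1/(2 pi i)) * integral over |w - lam0| = r of (wI - B)^{-1} dw.

    r is assumed small enough that the circle encloses no other eigenvalue of B.
    """
    P = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        z = cmath.exp(2j * math.pi * k / n)
        R = resolvent(lam0 + r * z, B)
        dw = 1j * r * z * (2 * math.pi / n)
        for i in range(2):
            for j in range(2):
                P[i][j] += R[i][j] * dw / (2j * math.pi)
    return P

def matmul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

B = ((1.0, 1.0), (0.0, 3.0))          # sigma(B) = {1, 3}
P0 = riesz_projection(B, lam0=1.0, r=1.0)
print(P0)
```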


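Put
$$\Gamma_0 := \{\lambda_0 + re^{i\theta} : \theta \in [0, 2\pi]\}$$
for a (small) positive $r$. Using Proposition 3.1.14 we get (see Exercise 3.1.16)
$$\begin{aligned}
(\lambda I_0 - B_0)^{-1}x_0 &= \frac{1}{2\pi i}\int_{\Gamma_0}\frac{(\mu I_0 - B_0)^{-1}x_0}{\lambda - \mu}\,d\mu
= \frac{1}{2\pi i}\int_{\Gamma_0}\sum_{n=0}^{\infty}\frac{(\mu - \lambda_0)^n}{(\lambda - \lambda_0)^{n+1}}(\mu I_0 - B_0)^{-1}x_0\,d\mu\\
&= \frac{x_0}{\lambda - \lambda_0} + \sum_{n=1}^{\infty}\frac{(-1)^n}{(\lambda - \lambda_0)^{n+1}}(\lambda_0 I_0 - B_0)^n x_0
\end{aligned} \tag{3.1.11}$$
for $x_0 \in X_0$, $|\lambda - \lambda_0| > r$. The Taylor series for the function $\lambda \mapsto (\lambda I_1 - B_1)^{-1}x_1$ has the form (see Exercise 3.1.16)
$$\begin{aligned}
(\lambda I_1 - B_1)^{-1}x_1 &= \sum_{n=0}^{\infty}\frac{(\lambda - \lambda_0)^n}{n!}\,\frac{d^n}{d\lambda^n}(\lambda I_1 - B_1)^{-1}x_1\Big|_{\lambda=\lambda_0}\\
&= \sum_{n=0}^{\infty}(-1)^n(\lambda - \lambda_0)^n(\lambda_0 I_1 - B_1)^{-(n+1)}x_1,
\end{aligned} \tag{3.1.12}$$
$x_1 \in X_1$, $|\lambda - \lambda_0| < r_0 := \|(\lambda_0 I_1 - B_1)^{-1}\|^{-1}$.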
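Proposition 3.1.15. If $\lambda_0$ is an isolated point of the spectrum $\sigma(B)$, $B \in L(X)$, then there exist operators $A_n \in L(X)$, $n \in \mathbb{Z}$, and $r > 0$ such that
$$(\lambda I - B)^{-1}x = \sum_{n=-\infty}^{+\infty}(\lambda - \lambda_0)^n A_n x \tag{3.1.13}$$
for all $x \in X$ and $0 < |\lambda - \lambda_0| < r$. Moreover, if $k \in \mathbb{N}$ is such that
$$A_{-n} = O \quad\text{for every } n > k \qquad\text{and}\qquad z := A_{-k}x \ne o,$$
then
$$Bz = \lambda_0 z.$$
On the other hand, if $\lambda_0$ is a nonzero eigenvalue of a compact operator $B$, then $\lambda_0$ is a pole of the resolvent of $B$, i.e., there is $k \in \mathbb{N}$ such that
$$A_{-n} = O \quad\text{for all } n > k.$$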

Proof. Let $\lambda_0$ be an isolated point of $\sigma(B)$ and $B \in L(X)$. If $P_0$, $P_1$ are the above projections onto $X_0$, $X_1$, then
$$(\lambda I - B)^{-1}x = (\lambda I_0 - B_0)^{-1}x_0 + (\lambda I_1 - B_1)^{-1}x_1, \qquad P_0 x = x_0, \quad P_1 x = x_1,$$


and (3.1.13) follows from (3.1.11) and (3.1.12). Since
$$A_{-(k+1)}x = (B - \lambda_0 I)A_{-k}x,$$
the second statement holds as well.
Suppose now that $B$ is compact and $\lambda_0 \ne 0$ is an eigenvalue of $B$. By Corollary 2.2.13, $\lambda_0$ is an isolated point of $\sigma(B)$. The restriction $B_0$ of $B$ onto the subspace $X_0$ has the spectrum $\sigma(B_0)$ consisting of $\lambda_0 \ne 0$ only. Hence $B_0^{-1}$ exists and is continuous. Therefore, the unit ball $B(x_0; 1) = B_0(B_0^{-1}(B(x_0; 1)))$ is a compact set. Proposition 1.2.15 says that $M = \dim X_0$ is finite. It follows from Lemma 1.1.31 that
$$X_0 = \operatorname{Ker}(\lambda_0 I_0 - B_0)^k \quad\text{for a certain } k \in \mathbb{N}$$
and (see (1.1.20))
$$(\lambda I_0 - B_0)^{-1}x_0 = \sum_{n=0}^{k-1}\frac{(-1)^n}{(\lambda - \lambda_0)^{n+1}}(\lambda_0 I_0 - B_0)^n x_0 = \sum_{n=1}^{k}\frac{A_{-n}x}{(\lambda - \lambda_0)^n},$$
where $P_0 x = x_0$. The proof is complete. □
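The Laurent coefficients in (3.1.13) can be computed with the standard Cauchy formula $A_n = \frac{1}{2\pi i}\int_{|\lambda - \lambda_0| = \rho}(\lambda - \lambda_0)^{-n-1}(\lambda I - B)^{-1}\,d\lambda$. This formula is not stated in the text; it is the usual one for Laurent series. The sketch assumes $X = \mathbb{C}^2$ and a $2 \times 2$ Jordan block $B$ with $\sigma(B) = \{2\}$. Then $\lambda_0 = 2$ is a pole of order $k = 2$, with $A_{-1} = I$ and $A_{-2} = B - 2I$. The vector $z = A_{-2}x$ satisfies $Bz = 2z$, as Proposition 3.1.15 predicts.

```python
import cmath
import math

def resolvent(lam, B):
    """(lam I - B)^{-1} for a 2x2 matrix B = ((a, b), (c, d))."""
    (a, b), (c, d) = B
    det = (lam - a) * (lam - d) - b * c
    return (((lam - d) / det, b / det), (c / det, (lam - a) / det))

def laurent_coefficient(B, lam0, n, rho=0.5, m=64):
    """A_n = (1/(2 pi i)) * integral over |lam - lam0| = rho of (lam - lam0)^(-n-1) (lam I - B)^{-1} d lam.

    rho is assumed smaller than the distance from lam0 to the rest of sigma(B);
    the trapezoidal rule with m nodes is used.
    """
    A = [[0j, 0j], [0j, 0j]]
    for k in range(m):
        u = rho * cmath.exp(2j * math.pi * k / m)          # u = lam - lam0
        R = resolvent(lam0 + u, B)
        weight = u ** (-n - 1) * 1j * u * (2 * math.pi / m) / (2j * math.pi)
        for i in range(2):
            for j in range(2):
                A[i][j] += weight * R[i][j]
    return A

B = ((2.0, 1.0), (0.0, 2.0))          # a single Jordan block, sigma(B) = {2}
coeffs = {n: laurent_coefficient(B, 2.0, n) for n in range(-4, 2)}
x = (0.0, 1.0)
A_m2 = coeffs[-2]
z = [A_m2[0][0] * x[0] + A_m2[0][1] * x[1], A_m2[1][0] * x[0] + A_m2[1][1] * x[1]]
Bz = [B[0][0] * z[0] + B[0][1] * z[1], B[1][0] * z[0] + B[1][1] * z[1]]
print(z, Bz)
```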

Exercise 3.1.16. Give details to confirm the resolvent formulae (3.1.11) and (3.1.12).
Hint. For (3.1.11) interchange the sum and the integral and use Proposition 3.1.14(ii). For (3.1.12) use the resolvent identity
$$(\lambda I - B)^{-1} - (\mu I - B)^{-1} = (\mu - \lambda)(\lambda I - B)^{-1}(\mu I - B)^{-1}$$
and induction.
Exercise 3.1.17. Compare the functional calculus from Proposition 3.1.14 with
that of Remark 2.2.18. More precisely, show that for a compact, self-adjoint operator B the functional calculus given in Remark 2.2.18 is an extension of that of
Proposition 3.1.14.
Exercise 3.1.18. Let $X$ be a Banach space. Assume that $f : [a,b] \to X$ has the Riemann integral over the interval $[a,b]$. Show that then the Bochner integral $\int_a^b f(t)\,dt$ also exists and the two integrals are equal. In particular, Proposition 3.1.3 is a special case of Proposition 3.1.12. However, the proof of Proposition 3.1.3(ii) is much simpler.
Exercise 3.1.19. Let $A : \operatorname{Dom} A \subset H \to H$ be a densely defined linear operator on a Hilbert space $H$. Assume that $A$ has a compact resolvent that is also self-adjoint.
(i) Extend the functional calculus (Remark 2.2.18 and Proposition 3.1.14) to such $A$. In particular, show that the formula for $\Phi(f)x$ still holds provided that
$$\sum_{n=1}^{\infty}|f(\lambda_n)|^2\,|(x, e_n)|^2 < \infty$$


(here $\sigma(A) = \{\lambda_1, \dots\}$). Notice that $\Phi(f)$ need not be bounded if $\dim H = \infty$ since $\sigma(A)$ is unbounded in this case. Also, $\Phi(f)$, $\Phi(g)$ do not commute in general.
(ii) Suppose that $\sigma(A)$ is bounded above. Show that the function
$$u(t) := e^{tA}x_0 \quad\text{with } x_0 \in \operatorname{Dom} A$$
is a continuous solution to the initial value problem
$$\dot x(t) = Ax(t), \qquad x(0) = x_0. \tag{3.1.14}$$
(iii) Prove that
$$\int_0^t e^{sA}x\,ds \in \operatorname{Dom} A \quad\text{for all } x \in H$$
and
$$A\int_0^t e^{sA}x\,ds = e^{tA}x - x.$$
In other words, $e^{tA}x$ is a continuous solution of the integral form of (3.1.14).
(iv) Prove that
$$\int_0^{\infty} e^{-\lambda t}e^{tA}x\,dt = (\lambda I - A)^{-1}x \quad\text{for all } x \in H$$
and sufficiently large $\operatorname{Re}\lambda$ (actually for $\operatorname{Re}\lambda > \sup\{(Ax, x) : x \in \operatorname{Dom} A,\ \|x\| = 1\}$).
(v) Let $g : [0, \infty) \to H$ be a continuous mapping and $u : [0, \infty) \to H$ a solution to the initial value problem
$$\dot x(t) = Ax(t) + g(t), \qquad x(0) = x_0.$$
Show that
$$u(t) = e^{tA}x_0 + \int_0^t e^{(t-s)A}g(s)\,ds \quad\text{for } t \ge 0.$$
(vi) Find conditions on a continuous mapping $h : H \to H$ such that the existence of a continuous solution to the integral equation
$$x(t) = e^{tA}x_0 + \int_0^t e^{(t-s)A}h(x(s))\,ds$$
follows from the Contraction Principle. Such a solution is called a mild solution of the problem
$$\dot x(t) = Ax(t) + h(x(t)), \qquad x(0) = x_0. \tag{3.1.15}$$


If $A$ is as in Exercise 2.2.21(i), (ii), (iv), or, more generally, $A = \Delta$ with suitable boundary conditions, then (3.1.15) is a semilinear partial differential equation of parabolic type.
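Part (v) can be checked numerically in a finite-dimensional model. The sketch is only a hypothetical stand-in for the operators of the exercise. $A$ is taken diagonal in an orthonormal eigenbasis with the three eigenvalues $-1, -4, -9$ (the first modes of $-d^2/dx^2$ on $(0,\pi)$ with Dirichlet conditions, with the sign making $A$ bounded above). The forcing $g$ is arbitrary, and the integral is evaluated with Simpson's rule. A central difference of $u$ then matches $Au + g$ up to discretization error.

```python
import math

# Hypothetical diagonal model of A (eigenvalues in an orthonormal eigenbasis).
LAMBDAS = [-1.0, -4.0, -9.0]

def simpson(func, a, b, steps=2000):
    """Composite Simpson rule on [a, b] (steps must be even)."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def variation_of_constants(t, x0, g):
    """u(t) = e^{tA} x0 + int_0^t e^{(t-s)A} g(s) ds, computed mode by mode."""
    u = []
    for k, lam in enumerate(LAMBDAS):
        forcing = simpson(lambda s: math.exp((t - s) * lam) * g(s)[k], 0.0, t)
        u.append(math.exp(t * lam) * x0[k] + forcing)
    return u

g = lambda s: (math.sin(s), 1.0, s)     # an illustrative continuous forcing term
x0 = (1.0, 1.0, 1.0)
t, eps = 1.0, 1e-4
u_plus = variation_of_constants(t + eps, x0, g)
u_minus = variation_of_constants(t - eps, x0, g)
u_t = variation_of_constants(t, x0, g)
# Compare the central difference quotient of u with A u(t) + g(t).
residual = max(abs((up - um) / (2 * eps) - (lam * ut + gk))
               for up, um, ut, lam, gk in zip(u_plus, u_minus, u_t, LAMBDAS, g(t)))
print(residual)
```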
Exercise 3.1.20. Let $B$ be a compact, self-adjoint operator on a Hilbert space $H$ and let $\lambda_0 \in \sigma(B) \setminus \{0\}$. Compute $A_n$ in the expression (3.1.13).

3.2 Differential Calculus in Normed Linear Spaces


We suppose that the reader is acquainted with partial derivatives and the differential of functions of two real variables. Our goal in this section is to extend these notions to mappings between normed linear spaces. Many infinite dimensional spaces differ from $\mathbb{R}^N$ by the lack of any natural basis. In particular, this means that there is no way of generalizing partial derivatives. We define a directional derivative instead.
Definition 3.2.1. Let $X$, $Y$ be normed linear spaces (both over the same scalar field) and let $f : X \to Y$. If for $a, h \in X$ the limit (in the norm of $Y$)
$$\lim_{\substack{t\to 0\\ t\in\mathbb{R}}}\frac{f(a + th) - f(a)}{t}$$
exists, then its value is called the derivative of $f$ at the point $a$ in the direction $h$ (or directional derivative, or Gâteaux variation) and is denoted by $\delta f(a; h)$. If $\delta f(a; h)$ exists for all $h \in X$ and the mapping $Df(a) : h \mapsto \delta f(a; h)$ is linear and continuous, then $Df(a)$ is called the Gâteaux derivative of $f$ at the point $a$.⁵
Remark 3.2.2. Simple examples of functions of two variables show that the directional derivative need not be linear in $h$. Moreover, not even the existence of $Df(a)$ guarantees the continuity of $f$ at the point $a$.
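Both claims of Remark 3.2.2 can be checked on two standard examples; the choice is ours, since the text does not specify them.

- $g(x,y) = x^2y/(x^2+y^2)$, $g(0,0) = 0$. Its directional derivative $\delta g(o;h) = h_1^2h_2/(h_1^2+h_2^2)$ exists for every $h$ but is not linear in $h$.
- The function $f$ equal to 1 on the parabola $y = x^2$, $x \ne 0$, and 0 elsewhere. For a fixed $h$ the line $\{th\}$ meets the parabola at most at one $t \ne 0$, so $\delta f(o;h) = 0$ and $Df(o) = 0$, although $f$ is not continuous at $o$.

The sketch evaluates the difference quotients of Definition 3.2.1 at a small fixed $t$.

```python
def quotient(f, a, h, t):
    """Difference quotient (f(a + t h) - f(a)) / t from Definition 3.2.1, for f: R^2 -> R."""
    return (f((a[0] + t * h[0], a[1] + t * h[1])) - f(a)) / t

def g(p):
    """g(x, y) = x^2 y / (x^2 + y^2), g(0, 0) = 0: delta g(o; h) exists but is not linear in h."""
    x, y = p
    return 0.0 if x == 0.0 and y == 0.0 else x * x * y / (x * x + y * y)

def f(p):
    """f = 1 on the parabola y = x^2 (x != 0), 0 elsewhere: Df(o) = 0, yet f is discontinuous at o."""
    x, y = p
    return 1.0 if x != 0.0 and y == x * x else 0.0

o = (0.0, 0.0)
t = 1e-6
print([quotient(g, o, h, t) for h in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]])
print([quotient(f, o, h, t) for h in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]])
print([f((s, s * s)) for s in (1e-2, 1e-4, 1e-8)])   # points of the parabola near o
```

For $g$ the quotients along $(1,0)$ and $(0,1)$ are 0, while along $(1,1)$ the quotient is $1/2$, so $h \mapsto \delta g(o;h)$ is not additive.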
Example 3.2.3. Consider the standard bases $e^M_1, \dots, e^M_M$ and $e^N_1, \dots, e^N_N$ of $\mathbb{R}^M$ and $\mathbb{R}^N$, respectively. Then we can write $f : \mathbb{R}^M \to \mathbb{R}^N$ in the form
$$f(x) = \sum_{i=1}^{N} f^i(x)\,e^N_i \qquad (\text{or briefly } f = (f^1, \dots, f^N)).$$
It is easy to see that $\delta f(a; h)$ exists if and only if $\delta f^i(a; h)$ exists for all $i = 1, \dots, N$. In particular, for $h = e^M_j$, the directional derivative $\delta f^i(a; h)$ is nothing else than $\frac{\partial f^i}{\partial x_j}(a)$. This means that the Gâteaux derivative $Df(a)$ has the matrix representation with respect to the standard bases in the form
$$\begin{pmatrix}
\frac{\partial f^1}{\partial x_1}(a) & \dots & \frac{\partial f^1}{\partial x_M}(a)\\
\vdots & & \vdots\\
\frac{\partial f^N}{\partial x_1}(a) & \dots & \frac{\partial f^N}{\partial x_M}(a)
\end{pmatrix}.$$
This matrix is called the Jacobi matrix of $f$ at the point $a$. If $M = N$, then its determinant is denoted by
$$\frac{\partial(f^1, \dots, f^M)}{\partial(x_1, \dots, x_M)} = J_f(a)$$
and is called the Jacobian of $f$ at $a$.
⁵ The terminology concerning Gâteaux differentiability is not fixed. Some authors do not assume linearity of $Df(a)$.
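Example 3.2.3 translates directly into a finite-difference sketch: the $j$-th column of the Jacobi matrix is the directional derivative $\delta f(a; e^M_j)$. The central difference quotient is an approximation chosen for illustration. The test map $f(x_1, x_2) = (x_1^2x_2,\ 5x_1 + \sin x_2)$ is arbitrary; its exact Jacobi matrix at $(1,2)$ is $\begin{pmatrix}4 & 1\\ 5 & \cos 2\end{pmatrix}$.

```python
import math

def directional_derivative(f, a, h, t=1e-6):
    """Central-difference approximation of delta f(a; h) for f: R^M -> R^N."""
    plus = f([ai + t * hi for ai, hi in zip(a, h)])
    minus = f([ai - t * hi for ai, hi in zip(a, h)])
    return [(p - m) / (2 * t) for p, m in zip(plus, minus)]

def jacobi_matrix(f, a):
    """N x M matrix whose j-th column is delta f(a; e_j^M), as in Example 3.2.3."""
    M = len(a)
    columns = [directional_derivative(f, a, [1.0 if i == j else 0.0 for i in range(M)])
               for j in range(M)]
    return [[columns[j][i] for j in range(M)] for i in range(len(columns[0]))]

f = lambda x: [x[0] ** 2 * x[1], 5 * x[0] + math.sin(x[1])]
J = jacobi_matrix(f, [1.0, 2.0])
jacobian = J[0][0] * J[1][1] - J[0][1] * J[1][0]   # M = N = 2, so the Jacobian is a 2x2 determinant
print(J, jacobian)
```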

Example 3.2.4. Suppose that $H$ is a Hilbert space and $f : H \to \mathbb{R}$ (or $\mathbb{C}$) has a Gâteaux derivative $Df(a)$ at $a \in H$. Then, by the Riesz Representation Theorem, there exists a unique point $\nabla f(a) \in H$ such that
$$Df(a)h = (h, \nabla f(a))_H.$$
The element $\nabla f(a)$ is called the gradient of $f$ at $a$. Notice that the gradient $\nabla f$ is a mapping from $H$ into itself.
Remark 3.2.5. One of the most important applications of the notion of derivative is in extremal problems of classical analysis. The well-known theorem (due to Fermat) asserts that the derivative is zero at an extremal point provided this derivative exists. The same result obviously holds for $f : X \to \mathbb{R}$ also in an infinite dimensional space $X$.⁶
The previous remark indicates the use of the notion of derivative for solving the equation
$$F(x) = o \quad\text{for } F : H \to H.$$
Namely, suppose that there is a functional $f : H \to \mathbb{R}$ such that
$$\nabla f = F.$$
Then it is sufficient to show that $f$ has a local maximum or minimum. However, it is a very nontrivial problem to find such $f$ (the so-called potential of $F$) or to find conditions to ensure its existence. See Chapter 6 for more details. A discussion of the finite dimensional case ($H = \mathbb{R}^M$) is given in Appendix 4.3B (Remark 4.3.62 and Theorem 4.3.64).
We postpone examples since various properties of the derivative will be needed to introduce them.
⁶ A simple reason for this observation comes from the fact that the directional derivative $\delta f(a; h)$ describes the behavior of the functional $f$ along the straight line $\{a + th : t \in \mathbb{R}\}$, i.e., the behavior of the real function $t \mapsto f(a + th)$ near zero.
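Here is a minimal finite-dimensional sketch of this variational idea, under assumptions that are not in the text. Take $H = \mathbb{R}^2$ and $F(x) = Ax - b$ with $A$ symmetric positive definite. Then $F = \nabla f$ for the potential $f(x) = \frac12(Ax, x) - (b, x)$. Plain gradient descent with a fixed step (an illustrative choice, not a method from the text) drives $x$ to the minimizer of $f$. That minimizer is exactly the solution of $F(x) = o$, here $x = (1/11, 7/11)$.

```python
def F(x, A, b):
    """F(x) = Ax - b; for symmetric A this is the gradient of f(x) = (Ax, x)/2 - (b, x)."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) - b[i] for i in range(len(x))]

def potential(x, A, b):
    """The potential f(x) = (Ax, x)/2 - (b, x)."""
    Ax = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return 0.5 * sum(p * q for p, q in zip(Ax, x)) - sum(bi * xi for bi, xi in zip(b, x))

def minimize(A, b, step=0.2, iters=200):
    """Gradient descent x <- x - step * grad f(x), starting from o."""
    x = [0.0] * len(b)
    for _ in range(iters):
        x = [xi - step * gi for xi, gi in zip(x, F(x, A, b))]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric positive definite
b = [1.0, 2.0]
x = minimize(A, b)
print(x, F(x, A, b), potential(x, A, b))
```

The step 0.2 is below $2/\lambda_{\max}(A) \approx 0.43$, which is what makes this fixed-step iteration converge for this particular $A$.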


Theorem 3.2.6 (Mean Value Theorem). Let $X$ be a normed linear space and $Y$ a Banach space. Let $f : X \to Y$ have the directional derivative at all points of the segment joining points $a, b \in X$ in the direction of this segment, i.e., $\delta f(a + t(b-a); b-a)$ exists for all $t \in [0,1]$. If the mapping $t \mapsto \delta f(a + t(b-a); b-a)$ is continuous on $[0,1]$, then
$$f(b) - f(a) = \int_0^1 \delta f(a + t(b-a); b-a)\,dt. \tag{3.2.1}$$
Proof. Take $\varphi \in Y^*$ and denote
$$g(t) = \varphi(f(a + t(b-a))), \quad t \in [0,1].$$
By the definition of the directional derivative, we have
$$g'(t) = \varphi[\delta f(a + t(b-a); b-a)]$$
and $g'$ is continuous on $[0,1]$. It follows from the Basic Theorem of Calculus that
$$\int_0^1 \varphi[\delta f(a + t(b-a); b-a)]\,dt = g(1) - g(0) = \varphi(f(b)) - \varphi(f(a)).$$
The Riemann integral $\int_0^1 \delta f(a + t(b-a); b-a)\,dt$ exists (see Theorem 3.1.2) and, by Proposition 3.1.12(i), we get
$$\int_0^1 \varphi[\delta f(a + t(b-a); b-a)]\,dt = \varphi\Bigl(\int_0^1 \delta f(a + t(b-a); b-a)\,dt\Bigr).$$
Since $\varphi \in Y^*$ has been chosen arbitrarily, the Hahn–Banach Theorem (in particular, Remark 2.1.17(ii)) implies the equality (3.2.1). □
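Here is a quick numerical check of (3.2.1). It assumes $X = Y = \mathbb{R}^2$ and an arbitrarily chosen smooth map $f(x,y) = (x^2y,\ \sin x + y)$. The directional derivative $\delta f(a + t(b-a); b-a)$ is computed from the hand-derived Jacobi matrix. The Riemann integral over $[0,1]$ is approximated by Simpson's rule, an illustrative choice.

```python
import math

def f(p):
    """An illustrative C^1 map f: R^2 -> R^2."""
    x, y = p
    return (x * x * y, math.sin(x) + y)

def delta_f(p, v):
    """delta f(p; v): the Jacobi matrix of f at p applied to v (derived by hand)."""
    x, y = p
    return (2 * x * y * v[0] + x * x * v[1], math.cos(x) * v[0] + v[1])

def integral_form(a, b, steps=200):
    """Right-hand side of (3.2.1), by the composite Simpson rule in t (steps even)."""
    v = (b[0] - a[0], b[1] - a[1])
    h = 1.0 / steps
    total = [0.0, 0.0]
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        value = delta_f((a[0] + i * h * v[0], a[1] + i * h * v[1]), v)
        total = [s + weight * c for s, c in zip(total, value)]
    return tuple(s * h / 3 for s in total)

a, b = (0.0, 1.0), (1.5, -0.5)
lhs = tuple(fb - fa for fb, fa in zip(f(b), f(a)))
rhs = integral_form(a, b)
print(lhs, rhs)
```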

The following result offers another possible formulation.
Theorem 3.2.7 (Mean Value Theorem). Let $X$, $Y$ be normed linear spaces and let $f : X \to Y$. If for given $a, b \in X$ the directional derivative $\delta f(a + t(b-a); b-a)$ exists for all $t \in [0,1]$, then
$$\|f(b) - f(a)\|_Y \le \sup_{t\in[0,1]}\|\delta f(a + t(b-a); b-a)\|_Y \tag{3.2.2}$$
and
$$\|f(b) - f(a) - \delta f(a; b-a)\|_Y \le \sup_{t\in[0,1]}\|\delta f(a + t(b-a); b-a) - \delta f(a; b-a)\|_Y. \tag{3.2.3}$$
Moreover, if $Df(a + t(b-a))$ exists for all $t \in [0,1]$, then
$$\|f(b) - f(a)\|_Y \le \sup_{t\in[0,1]}\|Df(a + t(b-a))\|_{L(X,Y)}\,\|b - a\|_X. \tag{3.2.4}$$


Proof. An idea similar to the previous proof is used. By the dual characterization of the norm (Corollary 2.1.16) there is $\varphi \in Y^*$, $\|\varphi\| = 1$, such that
$$\|f(b) - f(a)\| = \varphi(f(b) - f(a)).$$
Define now
$$g(t) = \varphi(f(a + t(b-a))), \quad t \in [0,1].$$
Then
$$g'(t) = \varphi(\delta f(a + t(b-a); b-a))$$
and, therefore, the function $g$ satisfies all assumptions of the classical Mean Value Theorem. Consequently, if $Y$ is a real space, we get
$$\|f(b) - f(a)\| = g(1) - g(0) = g'(\vartheta) = \varphi(\delta f(a + \vartheta(b-a); b-a)) \le \|\delta f(a + \vartheta(b-a); b-a)\|$$
for a $\vartheta \in (0,1)$. If $Y$ is a complex space, we consider $\operatorname{Re} g$ and obtain
$$\|f(b) - f(a)\| \le \sup_{\vartheta\in(0,1)}|g'(\vartheta)|$$
(see the next remark) and the assertion also follows. The proof of (3.2.3) is similar and (3.2.4) is an easy consequence of (3.2.2). □

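Remark 3.2.8. The Mean Value Theorem for functions from $\mathbb{R}$ into $\mathbb{R}$ is often stated in the following form:
There is $\vartheta \in (0,1)$ such that
$$f(b) - f(a) = f'(a + \vartheta(b-a))(b-a)$$
provided $f$ is continuous on the interval $[a,b]$ and $f'(x)$ exists for every $x \in (a,b)$.
Warning. This equality does not hold even for $f : \mathbb{R} \to \mathbb{C}$ ($\cong \mathbb{R}^2$)! For example, take $f(x) = e^{ix}$, $a = 0$, $b = 2\pi$: then $f(b) - f(a) = 0$, whereas $|f'(x)| = 1$ for all $x$.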
Example 3.2.9. Differentiability of the norm is connected with the properties of the corresponding space (see, e.g., Fabian et al. [49, Chapter 5]). As a simple example we will show the relation between the uniqueness of the supporting hyperplane at a given point $a \in X$, $\|a\| = 1$, and the Gâteaux differentiability of the norm at the point $a$. We recall that by Corollary 2.1.16 there is $\varphi \in X^*$, $\|\varphi\| = 1$, such that $\varphi(a) = \|a\| = 1$ and $\operatorname{Re}\varphi(x) \le 1$ for all $x \in X$, $\|x\| \le 1$.⁷
⁷ The hyperplane $M = \{x \in X : \varphi(x) = 1\}$ is then called a supporting hyperplane to the unit ball of $X$ at the point $a$. Such a $\varphi \in X^*$ need not be uniquely determined.

Put $f(x) = \|x\|$. Fix $h \in X$ and let $g(t) = \|a + th\|$, $t \in \mathbb{R}$. The function $g$ is a convex real function. Therefore the right and the left derivatives at zero exist, and $g'_-(0) \le g'_+(0)$. Further, we have
$$\frac{g(t) - g(0)}{t} \ge \frac{\varphi(a + th) - \varphi(a)}{t} = \varphi(h) \quad\text{for } t > 0.$$
In particular, $g'_+(0) \ge \varphi(h)$ and similarly $g'_-(0) \le \varphi(h)$. This means that $\varphi$ is uniquely determined provided the directional derivative of the norm exists at $a$ for all $h \in X$. In particular,
$$\delta f(a; h) = \varphi(h),$$
i.e., the norm is Gâteaux differentiable at $a$. The converse is also true. Indeed, suppose by contradiction that $\delta f(a; h)$ does not exist for an $h$, i.e., $g'_+(0) > g'_-(0)$. Choose $\alpha \in [g'_-(0), g'_+(0)]$ and define $\psi(\xi a + th) = \xi + t\alpha$ for scalars $\xi$, $t$. Then
$$g'_+(0) \le \frac{\|a + th\| - \|a\|}{t} = \frac{g(t) - g(0)}{t} \quad\text{for } t > 0,$$
and therefore
$$\psi(a + th) = 1 + t\alpha \le \|a + th\|.$$
The same inequality holds for $t \le 0$. As an easy consequence we get
$$|\psi(a + th)| \le \|a + th\|, \qquad \psi(a) = 1.$$
This means that
$$\psi \in Y^*, \quad \|\psi\| = 1, \quad\text{where } Y = \operatorname{Lin}\{a, h\}.$$
The Hahn–Banach Theorem yields an extension of $\psi$ which determines a supporting hyperplane. A different $\alpha$ gives a different $\psi$. Hence there is no uniqueness of supporting hyperplanes at $a$, and the duality mapping⁸ is not single-valued at $a$.
Similarly to partial derivatives, the Gâteaux derivative is also unsuitable for the Chain Rule. We recommend that the reader construct examples of $f : \mathbb{R}^2 \to \mathbb{R}$, $g : \mathbb{R} \to \mathbb{R}^2$ such that $f \circ g$ has no derivative at $0$ in spite of the fact that
$$Df(o) = 0, \qquad g(0) = o, \qquad g'(0) = (0, 0).$$
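Here is one possible construction, sketched numerically; the choice of $f$ and $g$ is ours. Take $f$ equal to 1 on the parabola $y = x^2$, $x \ne 0$, and 0 elsewhere; it has $Df(o) = 0$, since every line through $o$ meets the parabola at most once away from $o$. Take $g(s) = (s^2, s^4)$. Then $g(0) = o$ and $g'(0) = (0,0)$. However, $f \circ g$ equals 1 for $s \ne 0$ and 0 at $s = 0$, so its difference quotient at 0 is $1/s$ and $f \circ g$ is not even continuous there.

```python
def f(p):
    """f = 1 on the parabola y = x^2 (x != 0) and 0 elsewhere; Df(o) = 0."""
    x, y = p
    return 1.0 if x != 0.0 and y == x * x else 0.0

def g(s):
    """g(s) = (s^2, s^4): g(0) = o and g'(0) = (0, 0)."""
    x = s * s
    return (x, x * x)

for s in (1e-1, 1e-3, 1e-5):
    g_quotient = tuple(c / s for c in g(s))        # (g(s) - g(0)) / s = (s, s^3) -> (0, 0)
    fg_quotient = (f(g(s)) - f(g(0.0))) / s        # (f(g(s)) - f(g(0))) / s = 1 / s
    print(s, g_quotient, fg_quotient)
```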

For this purpose a stronger notion of differentiability is needed. The following definition is a straightforward generalization of the differential of a function of two variables.
⁸ The map $J : X \to \exp X^*$, $J(x) := \{f \in X^* : \|f\|_{X^*} = \|x\|_X,\ f(x) = \|x\|_X^2\}$, is called the duality mapping. It is a multivalued mapping and belongs to the fundamental concepts of Banach space theory.


Definition 3.2.10. Let $X$, $Y$ be normed linear spaces (both over the same scalar field). A mapping $f : X \to Y$ is said to be Fréchet differentiable at a point $a \in X$ if there exists $A \in L(X,Y)$ such that
$$\lim_{h\to o}\frac{\|f(a + h) - f(a) - Ah\|_Y}{\|h\|_X} = 0. \tag{3.2.5}$$
In this case $A$ is called the Fréchet derivative of $f$ at the point $a$ and is denoted by $f'(a)$.
Remark 3.2.11.
(i) If $f'(a)$ exists, then also $Df(a)$ exists. Moreover,
$$f'(a)h = Df(a)h \quad\text{for all } h \in X.$$
(ii) Suppose that a linear operator $A : X \to Y$ has the property (3.2.5). It is easy to see that $A$ is continuous if and only if $f$ is continuous at $a$.
(iii) A basic analytical approach to the investigation of nonlinear problems involves their approximation by simpler objects. Among them, linear approximations are the most appropriate from the local point of view. The classical notion of the derivative as the best local linear approximation is the most transparent confirmation of this phenomenon (e.g., the Fermat Theorem for local extremal points). The notion of the Fréchet derivative is a genuine generalization to infinite dimensional spaces.
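The difference between (3.2.5) and the Gâteaux derivative shows up in the remainder ratio $\|f(a+h) - f(a) - Ah\|/\|h\|$. In the sketch below, $X = \mathbb{R}^2$ and $Y = \mathbb{R}$ or $\mathbb{R}^2$, all with Euclidean norms; the functions are chosen for illustration. For a smooth map with $A$ its Jacobi matrix, the ratio tends to 0. For the function equal to 1 on the parabola $y = x^2$, $x \ne 0$, and 0 elsewhere, the Gâteaux derivative at $o$ is 0. Yet the ratio along $h = (s, s^2)$ blows up, so no Fréchet derivative exists there, in accordance with Remark 3.2.11(ii).

```python
import math

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(f, A, a, h):
    """||f(a + h) - f(a) - A h|| / ||h||, the quantity in (3.2.5)."""
    fa = f(a)
    fah = f([ai + hi for ai, hi in zip(a, h)])
    Ah = [sum(A[i][j] * h[j] for j in range(len(h))) for i in range(len(A))]
    return norm([p - q - s for p, q, s in zip(fah, fa, Ah)]) / norm(h)

# A smooth map and its Jacobi matrix at a = (1, 2), computed by hand.
f = lambda x: [x[0] ** 2 * x[1], 5 * x[0] + math.sin(x[1])]
A = [[4.0, 1.0], [5.0, math.cos(2.0)]]
smooth = [remainder_ratio(f, A, [1.0, 2.0], [s, -s]) for s in (1e-1, 1e-2, 1e-3)]

# f = 1 on the parabola y = x^2 (x != 0), 0 elsewhere; its Gateaux derivative at o is 0.
p = lambda x: [1.0 if x[0] != 0.0 and x[1] == x[0] * x[0] else 0.0]
Z = [[0.0, 0.0]]
jumpy = [remainder_ratio(p, Z, [0.0, 0.0], [s, s * s]) for s in (1e-1, 1e-2, 1e-3)]
print(smooth, jumpy)
```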
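Theorem 3.2.12 (Chain Rule). Let $X$, $Y$, $Z$ be normed linear spaces and let $\delta g(a; h)$ exist for $g : X \to Y$. If $g(a) = b$ and the Fréchet derivative $f'(b)$ exists for $f : Y \to Z$, then⁹
$$\delta(f \circ g)(a; h) = f'(b)[\delta g(a; h)]. \tag{3.2.6}$$
Proof. Choose $\varepsilon > 0$ and $h \in X$. By (3.2.5) there is $\delta > 0$ such that
$$\|f(b + k) - f(b) - f'(b)k\|_Z \le \varepsilon\|k\|_Y \quad\text{for } \|k\|_Y < \delta.$$
Put
$$\omega(t) := g(a + th) - g(a) - t\,\delta g(a; h).$$
By Definition 3.2.1, there is $\eta > 0$ such that $\|\omega(t)\|_Y \le \varepsilon|t|$ for $|t| < \eta$. For
$$k := g(a + th) - g(a) = g(a + th) - b$$
we have
$$\|k\|_Y \le |t|\,\|\delta g(a; h)\|_Y + \|\omega(t)\|_Y \le |t|\,[\|\delta g(a; h)\|_Y + \varepsilon].$$
We may choose $\eta$ so small that the right-hand side in this inequality is less than $\delta$ whenever $|t| < \eta$. Using all this information and $\delta g(a; h) = \frac{k - \omega(t)}{t}$ we obtain
$$\begin{aligned}
\Bigl\|\frac{f(g(a + th)) - f(g(a))}{t} - f'(b)[\delta g(a; h)]\Bigr\|_Z
&= \Bigl\|\frac{f(b + k) - f(b) - f'(b)k}{t} + f'(b)\frac{\omega(t)}{t}\Bigr\|_Z\\
&\le \varepsilon\frac{\|k\|_Y}{|t|} + \|f'(b)\|_{L(Y,Z)}\frac{\|\omega(t)\|_Y}{|t|}
\le \varepsilon\,[\varepsilon + \|\delta g(a; h)\|_Y + \|f'(b)\|_{L(Y,Z)}]
\end{aligned}$$
for $0 < |t| < \eta$. The formula (3.2.6) follows. □
⁹ For a more transparent notation we will often use the symbol $f \circ g$ instead of $f(g)$ for the composition of $f$ and $g$.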

Corollary 3.2.13. Let the hypotheses of Theorem 3.2.12 be satisfied. If, moreover, $Dg(a)$ exists, then also $D(f \circ g)(a)$ exists and the analogue of (3.2.6) is true. A similar assertion is true for $(f \circ g)'(a)$ provided $g'(a)$ exists.
Proof. The assertion on $D(f \circ g)(a)$ follows from (3.2.6). The proof for $(f \circ g)'(a)$ is similar to that given above. □

Corollary 3.2.14. Let $A \in L(Y,Z)$ and let $\delta f(a; h)$ exist for $f : X \to Y$. Then
$$\delta(Af)(a; h) = A\,\delta f(a; h)$$
and similarly for $D(Af)(a)$ and $(Af)'(a)$.
Proof. It is sufficient to show that
$$A'(y) = A \quad\text{for all } y \in Y,$$
but this follows immediately from the definition. □

The verification of the degree of linear approximation needed in (3.2.5) is not always an easy task. The following condition can be of use in such situations.
Proposition 3.2.15. Let $Df(x)$ exist for all $x$ in a neighborhood of a point $a \in X$. If $x \mapsto Df(x)$ is continuous at $a$ (as a mapping $X \to L(X,Y)$), then $f'(a)$ exists.
Proof. According to the estimate (3.2.3) we have for small $h$,
$$\|f(a + h) - f(a) - Df(a)h\|_Y \le \sup_{t\in[0,1]}\|Df(a + th) - Df(a)\|_{L(X,Y)}\,\|h\|_X,$$
and the continuity of $Df$ yields (3.2.5). □

Definition 3.2.16. Let $G$ be an open set in $X$ and let $f : X \to Y$. If the Gâteaux derivative $Df : G \to L(X,Y)$ is continuous on $G$ (or equivalently, $f'$ is continuous on $G$), then we write $f \in C^1(G)$.
One of the convenient conditions for the existence of the differential (i.e., Fréchet derivative) of $f : \mathbb{R}^2 \to \mathbb{R}$ is the continuity of partial derivatives. These can be interpreted also as derivatives with respect to one-dimensional subspaces. A generalization leads to the following definition.


Definition 3.2.17. Let $f : X \to Y$ where $X = X_1 \times X_2$ and $X_1$, $X_2$, $Y$ are normed linear spaces.¹⁰ Let $a_2 \in X_2$ and let $f_1 : x_1 \mapsto f(x_1, a_2)$. If $f_1$ has the Gâteaux (or Fréchet) derivative at $a_1 \in X_1$, then $Df_1(a_1)$ (or $f_1'(a_1)$) is called the partial Gâteaux (or Fréchet) derivative of $f$ at $(a_1, a_2)$ with respect to the first variable and is denoted by $D_1 f(a_1, a_2)$ (or $f'_1(a_1, a_2)$).
Similarly the partial derivative with respect to the second variable ($D_2 f$ or $f'_2$) is defined.
If $Df(a_1, a_2)$ exists, then also $D_1 f(a_1, a_2)$, $D_2 f(a_1, a_2)$ exist and
$$Df(a_1, a_2)(h_1, h_2) = D_1 f(a_1, a_2)h_1 + D_2 f(a_1, a_2)h_2. \tag{3.2.7}$$
For the converse assertion we need more assumptions:


Proposition 3.2.18. Assume that $D_2 f$ exists on a neighborhood of a point $(a_1, a_2)$ and the mapping
$$D_2 f : X_1 \times X_2 \to L(X_2, Y)$$
is continuous at $(a_1, a_2)$. Assume, moreover, that $D_1 f$ exists at the point $(a_1, a_2)$. Then $Df(a_1, a_2)$ exists and (3.2.7) holds.
Proof. Choose sufficiently small $h_1$, $h_2$. Then, by (3.2.3),
$$\begin{aligned}
&\|f(a_1 + th_1, a_2 + th_2) - f(a_1, a_2) - tD_1 f(a_1, a_2)h_1 - tD_2 f(a_1, a_2)h_2\|\\
&\quad\le \|f(a_1 + th_1, a_2 + th_2) - f(a_1 + th_1, a_2) - tD_2 f(a_1 + th_1, a_2)h_2\|\\
&\qquad + \|D_2 f(a_1 + th_1, a_2) - D_2 f(a_1, a_2)\|\,|t|\,\|h_2\|\\
&\qquad + \|f(a_1 + th_1, a_2) - f(a_1, a_2) - tD_1 f(a_1, a_2)h_1\|\\
&\quad\le \sup_{0\le\vartheta\le 1}\|D_2 f(a_1 + th_1, a_2 + \vartheta th_2) - D_2 f(a_1 + th_1, a_2)\|\,|t|\,\|h_2\|\\
&\qquad + \|D_2 f(a_1 + th_1, a_2) - D_2 f(a_1, a_2)\|\,|t|\,\|h_2\|\\
&\qquad + \Bigl\|\frac{f(a_1 + th_1, a_2) - f(a_1, a_2)}{t} - D_1 f(a_1, a_2)h_1\Bigr\|\,|t| = o(|t|)
\end{aligned}$$
as $t \to 0$, and the result follows. □

Remark 3.2.19. If, in addition to the assumptions of Proposition 3.2.18, $f_1'(a_1, a_2)$ exists, then $f'(a_1, a_2)$ exists, too. The proof follows the same lines as the one above.
Corollary 3.2.20. Let $G$ be an open subset of $X = X_1 \times X_2$ and $f: X \to Y$. Then $f \in C^1(G)$ if and only if both partial derivatives $f_1'$, $f_2'$ exist and are continuous on $G$.
${}^{10}$ Then $X$ is a normed linear space, too. A norm on $X$ is, for example, defined by $\|x\|_X = \|x_1\|_{X_1} + \|x_2\|_{X_2}$, $x = (x_1, x_2) \in X_1 \times X_2$.

3.2. Differential Calculus in Normed Linear Spaces


Example 3.2.21. One of the most important nonlinear mappings is the so-called Nemytski operator, which is sometimes also called the substitution (or superposition) operator. As the latter term indicates, it arises by the substitution of a function $\varphi: G \subset \mathbb{R}^M \to \mathbb{R}$ into the function $f: G \times \mathbb{R} \to \mathbb{R}$. This leads to a new operator
$$F: \varphi \mapsto f(\cdot, \varphi(\cdot))$$
which acts on a space $X$ of functions $\varphi$. We wish to find conditions on $f$ for $F$ to be a mapping from $X$ into $X$ and to have some derivatives. We start with the case $X = C[0,1]$.
It is clear that the continuity of $f$ on $[0,1] \times \mathbb{R}$ is sufficient to guarantee that $F: X \to X$. Since $f$ is uniformly continuous on compact sets of the form
$$\{(x, y) \in [0,1] \times \mathbb{R} : |y - \varphi(x)| \le 1\} \quad \text{for } \varphi \in C[0,1],$$
$F$ is also continuous on $X$.
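Suppose now that the partial derivative $\frac{\partial f}{\partial y}$ is continuous on $[0,1] \times \mathbb{R}$. For $\varphi, h \in X$ we have, by the classical Mean Value Theorem,
$$\frac{f(x, \varphi(x) + th(x)) - f(x, \varphi(x))}{t} = \frac{\partial f}{\partial y}\bigl(x, \varphi(x) + \vartheta(t,x)\,th(x)\bigr)h(x)$$
for a $\vartheta(t,x) \in (0,1)$, and
$$\begin{aligned}
&\sup_{x \in [0,1]} \left| \frac{f(x, \varphi(x) + th(x)) - f(x, \varphi(x))}{t} - \frac{\partial f}{\partial y}(x, \varphi(x))h(x) \right| \\
&\quad \le \sup_{x \in [0,1]} \sup_{0 \le \tau \le 1} \left| \frac{\partial f}{\partial y}(x, \varphi(x) + \tau th(x)) - \frac{\partial f}{\partial y}(x, \varphi(x)) \right| |h(x)| \le \varepsilon \|h\|_{C[0,1]}
\end{aligned}$$
for any given $\varepsilon > 0$ and all sufficiently small $|t|$ (again by the uniform continuity of $\frac{\partial f}{\partial y}$ on compact sets). This means that the Gâteaux derivative $DF(\varphi)$ exists and
$$DF(\varphi)h: x \mapsto \frac{\partial f}{\partial y}(x, \varphi(x))h(x).$$
Moreover, $DF$ is continuous as a mapping $X \to \mathcal{L}(X)$ (again by the uniform continuity of $\frac{\partial f}{\partial y}$) and, by Proposition 3.2.15, $F'(\varphi)$ exists for any $\varphi \in X$.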
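The Gâteaux derivative computed above can be checked numerically. The following Python sketch is only an illustration under stated assumptions: $[0,1]$ is replaced by 201 equally spaced points (so the sup norm of $C[0,1]$ becomes a maximum over the grid), and the integrand $f(x,y) = x\sin y + y^3$, the point $\varphi(x) = \cos 3x$ and the direction $h(x) = 1 + x^2$ are hypothetical choices, not taken from the text. The grid sup-norm error of the difference quotient should decrease roughly linearly in $t$.

```python
import math

# Hypothetical data: [0, 1] sampled at 201 points stands in for C[0, 1].
GRID = [i / 200 for i in range(201)]


def f(x, y):
    # Illustrative integrand with a continuous partial derivative in y.
    return x * math.sin(y) + y ** 3


def f_y(x, y):
    # Partial derivative of f with respect to y.
    return x * math.cos(y) + 3 * y ** 2


def phi(x):
    return math.cos(3 * x)


def h(x):
    return 1 + x ** 2


def gateaux_error(t):
    """Grid maximum of |(f(x, phi + t h) - f(x, phi)) / t - f_y(x, phi) h|."""
    worst = 0.0
    for x in GRID:
        quotient = (f(x, phi(x) + t * h(x)) - f(x, phi(x))) / t
        worst = max(worst, abs(quotient - f_y(x, phi(x)) * h(x)))
    return worst


for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"t = {t:.0e}: sup error = {gateaux_error(t):.2e}")
```

The error is roughly proportional to $t$ here because $\frac{\partial^2 f}{\partial y^2}$ is bounded on the relevant compact set; continuity of $\frac{\partial f}{\partial y}$ alone would only give convergence to zero without a rate.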
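Warning. It is not always true that the existence of $\frac{\partial f}{\partial y}$ implies the existence of $DF$! For example, let
$$X = \{\varphi \in C[0,\infty) : \varphi(x)e^{-x} \text{ is bounded on } [0,\infty)\}$$
with the norm
$$\|\varphi\| := \sup_{x \in [0,\infty)} |\varphi(x)e^{-x}|$$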


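and let $f(y) = \sin y$. Since $f$ is Lipschitz continuous with constant 1, we obtain
$$\|F(\varphi_1) - F(\varphi_2)\| \le \|\varphi_1 - \varphi_2\|.$$
In particular, $F$ is a continuous mapping from $X$ into itself. But
$$\delta F(0; h) \ne \sin'(0)h, \quad h \in X,$$
as could be erroneously supposed by analogy. Namely, for $h(x) = e^x \in X$,
$$\sup_{x \in [0,\infty)} \left| e^{-x} \frac{\sin(te^x) - \sin 0}{t} - e^{-x}e^x \right| = \sup_{y \in [t,\infty)} \left| \frac{\sin y}{y} - 1 \right| \ge 1 \quad \text{for any } t > 0.{}^{11}$$
Similar calculations yield that $\delta F(0; h)$, $h \ne o$, does not exist at all.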
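The failure described in the Warning can also be observed numerically. In the Python sketch below (an illustration only) the half line $[0,\infty)$ is truncated to a grid on $[0, 40]$, which is an assumption of the sketch; since a grid maximum is a lower bound for the true supremum, it shows that the weighted norm of the difference quotient minus the naive candidate $\sin'(0)h$, $h(x) = e^x$, stays above $1$ no matter how small $t$ is.

```python
import math


def weighted_error(t, x_max=40.0, n=40_000):
    """Grid approximation of sup_x |e^{-x} (sin(t e^x) - sin 0) / t - e^{-x} e^x|.

    With y = t e^x the expression equals |sin(y) / y - 1|, so the
    supremum over x >= 0 is the supremum over y >= t.
    """
    worst = 0.0
    for i in range(n + 1):
        x = x_max * i / n          # truncation of [0, infinity) to [0, x_max]
        y = t * math.exp(x)
        worst = max(worst, abs(math.sin(y) / y - 1.0))
    return worst


for t in (1e-1, 1e-3, 1e-6):
    print(f"t = {t:.0e}: weighted error >= {weighted_error(t):.3f}")
```

The maximum is attained near $y = 3\pi/2$, where $\sin y / y < 0$; shrinking $t$ only moves this point further out along the $x$-axis.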

The study of the Nemytski operator in spaces of integrable functions is much more complicated. First, it has to be proved that $F(\varphi)$ is a measurable function on $\Omega$ for $\varphi \in L^p(\Omega)$. The following notion is crucial for this purpose.
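Definition 3.2.22. Let $\Omega$ be an open set in $\mathbb{R}^N$. A function $f: \Omega \times \mathbb{R} \to \mathbb{R}$ is said to have the Carathéodory property (notation: $f \in CAR(\Omega \times \mathbb{R})$) if
(M) for all $y \in \mathbb{R}$ the function $x \mapsto f(x, y)$ is (Lebesgue) measurable on $\Omega$;
(C) for a.a. $x \in \Omega$ the function $y \mapsto f(x, y)$ is continuous on $\mathbb{R}$.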
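Proposition 3.2.23. Let $\Omega$ be an open set in $\mathbb{R}^N$. Then
(i) if $f: \Omega \times \mathbb{R} \to \mathbb{R}$ is continuous on $\Omega \times \mathbb{R}$, then $f \in CAR(\Omega \times \mathbb{R})$;
(ii) if $f \in CAR(\Omega \times \mathbb{R})$ and $\varphi: \Omega \to \mathbb{R}$ is (Lebesgue) measurable on $\Omega$, then
$$F(\varphi): x \mapsto f(x, \varphi(x)), \quad x \in \Omega,$$
is a measurable function on $\Omega$.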
Proof. (i) Since a continuous function $f(\cdot, y)$ is Lebesgue measurable, the assertion is obvious.

(ii) Let $\varphi$ be a measurable function on $\Omega$. Then there is a sequence $\{s_n\}_{n=1}^{\infty}$ of step functions which converges to $\varphi$ a.e. in $\Omega$. If
$$s(x) = \sum_{i=1}^{k} \alpha_i \chi_i(x)$$
is a step function on $\Omega$, i.e., there are pairwise disjoint measurable sets $\Omega_1, \dots, \Omega_k$ such that
$$\bigcup_{i=1}^{k} \Omega_i = \Omega \quad \text{and} \quad \chi_i(x) = \begin{cases} 1, & x \in \Omega_i, \\ 0, & x \notin \Omega_i, \end{cases}$$
then
$$f(x, s(x)) = \sum_{i=1}^{k} f(x, \alpha_i)\chi_i(x)$$
is a measurable function (property (M) in Definition 3.2.22). By property (C),
$$\lim_{n \to \infty} f(x, s_n(x)) = f(x, \varphi(x)) \quad \text{for a.a. } x \in \Omega,$$
i.e., $F(\varphi)(x) = f(x, \varphi(x))$ is measurable.

${}^{11}$ The lack of differentiability of the Nemytski operators in weighted spaces causes big problems in the use of the Implicit Function Theorem.

Having the measurability of $F(\varphi)$, we can ask when $F(\varphi) \in L^q(\Omega)$. It is plausible that a certain growth condition for $f$ is needed.
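Theorem 3.2.24. Let $f \in CAR(\Omega \times \mathbb{R})$ and $p, q \in [1, \infty)$. Let there exist $g \in L^q(\Omega)$ and $c \in \mathbb{R}$ such that
$$|f(x, y)| \le g(x) + c|y|^{\frac{p}{q}} \quad \text{for a.a. } x \in \Omega \text{ and all } y \in \mathbb{R}. \tag{3.2.8}$$
Then
(i) $F(\varphi) \in L^q(\Omega)$ for all $\varphi \in L^p(\Omega)$;${}^{12}$
(ii) $F$ is a continuous mapping from $L^p(\Omega)$ into $L^q(\Omega)$;
(iii) $F$ maps bounded sets in $L^p(\Omega)$ into bounded sets in $L^q(\Omega)$.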
Proof. The proof of (i) is based on Proposition 3.2.23 and the use of the Minkowski
inequality (Example 1.2.16) and it is straightforward.
The proof of (ii) is quite involved and its crucial step consists in the fact that
F maps sequences converging in measure into sequences with the same property.
We omit details (see, e.g., Krasnoselski [78, I.2] or Appell & Zabreiko [8]).
The property (iii) follows from the growth condition (3.2.8).

Remark 3.2.25. The Carathéodory property can be generalized to functions $f: \Omega \times \mathbb{R}^M \to \mathbb{R}$. Proposition 3.2.23 and Theorem 3.2.24 hold similarly for
$$F(\varphi_1, \dots, \varphi_M)(x) := f(x, \varphi_1(x), \dots, \varphi_M(x)).$$
Remark 3.2.26. Let $\Omega \subset \mathbb{R}^N$ be an open set and let $f: \Omega \times \mathbb{R}^{N+1} \to \mathbb{R}$ satisfy the Carathéodory property. Assume, moreover, that there exist $g \in L^q(\Omega)$ and $c \in \mathbb{R}$ such that
$$|f(x, y)| \le g(x) + c \sum_{i=0}^{N} |y_i|^{\frac{p}{q}}$$
for a.a. $x \in \Omega$ and all $y = (y_0, y_1, \dots, y_N) \in \mathbb{R}^{N+1}$. Then $F$ defined by
$$F(u)(x) := f(x, u(x), \nabla u(x))$$
is a continuous mapping from $W^{1,p}(\Omega)$ into $L^q(\Omega)$ and maps bounded sets into bounded sets.
${}^{12}$ Actually, it can be proved that this property implies that (3.2.8) is satisfied for some $g \in L^q(\Omega)$ and $c \in \mathbb{R}$, cf. Appell & Zabreiko [8].


The growth condition with respect to $y_0$ can be relaxed according to the Embedding Theorem for $W^{1,p}(\Omega)$ (cf. Fučík & Kufner [54]).
Now we turn our attention to the directional derivative of the Nemytski operator in the space $L^2(\Omega)$. The exponents $p = q = 2$ are considered only for simplicity.
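In accordance with the computation in Example 3.2.21 we could expect
$$\delta F(\varphi; h)(x) = \frac{\partial f}{\partial y}(x, \varphi(x))h(x) \tag{3.2.9}$$
provided the right-hand side belongs to $L^2(\Omega)$. This is true if $\frac{\partial f}{\partial y}(\cdot, \varphi(\cdot)) \in L^{\infty}(\Omega)$, e.g., whenever $\frac{\partial f}{\partial y}$ is a bounded continuous function on $\Omega \times \mathbb{R}$. But this is not the whole story since we have to show that
$$\int_{\Omega} \left| \frac{f(x, \varphi(x) + th(x)) - f(x, \varphi(x))}{t} - \frac{\partial f}{\partial y}(x, \varphi(x))h(x) \right|^2 dx \to 0 \quad \text{for } t \to 0.$$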
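For a.a. $x \in \Omega$ the function under the integral sign can be estimated by the Mean Value Theorem (the formula (3.2.1)):
$$\left| \frac{f(x, \varphi(x) + th(x)) - f(x, \varphi(x))}{t} - \frac{\partial f}{\partial y}(x, \varphi(x))h(x) \right| \le \frac{1}{|t|} \sup_{0 \le \tau \le 1} \left| \frac{\partial f}{\partial y}(x, \varphi(x) + \tau th(x)) - \frac{\partial f}{\partial y}(x, \varphi(x)) \right| |t|\,|h(x)|.{}^{13}$$
The right-hand side converges to zero for $t \to 0$ for a.a. $x \in \Omega$ (by the continuity of $\frac{\partial f}{\partial y}$). In order to justify the use of the Lebesgue Dominated Convergence Theorem we need a square integrable majorant. In particular, boundedness of $\frac{\partial f}{\partial y}$ on $\Omega \times \mathbb{R}$ is sufficient,${}^{14}$ since the right-hand side is then bounded by $2\sup_{\Omega \times \mathbb{R}} \left|\frac{\partial f}{\partial y}\right| |h(x)|$.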
${}^{13}$ It is worth noticing how the classical Mean Value Theorem is used here: to avoid problems with the measurability of $x \mapsto \frac{\partial f}{\partial y}(x, \varphi(x) + \vartheta(x)th(x))$, the inequality form of the theorem is employed.
${}^{14}$ The reader should notice problems in finding conditions which ensure that (3.2.9) is also the Fréchet derivative. The situation is even much worse than one would expect. The function $f$ has to be linear for $F: L^2(\Omega) \to L^2(\Omega)$ to be Fréchet differentiable (see, e.g., Ambrosetti & Prodi [6, Chapter 1, Proposition 2.8]). See also Exercise 3.2.41 and Remark 3.2.42.

In the case when $F$ depends also on the gradient of $\varphi$, the situation is only technically slightly more complicated. Nemytski operators often appear under the integral (see Chapters 6 and 7). Since the integral is a continuous linear form (in particular, it is Fréchet differentiable), we can use the Chain Rule to get
$$D\Phi(\varphi)h = \int_{\Omega} \left[ \frac{\partial f}{\partial y_0}(x, \varphi(x), \nabla\varphi(x))h(x) + \sum_{i=1}^{N} \frac{\partial f}{\partial y_i}(x, \varphi(x), \nabla\varphi(x)) \frac{\partial h}{\partial x_i}(x) \right] dx \tag{3.2.10}$$
for
$$\Phi(\varphi) := \int_{\Omega} f(x, \varphi(x), \nabla\varphi(x))\,dx,$$
under appropriate assumptions on $f$.
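Formula (3.2.10) can be sanity-checked in the one-dimensional case $\Omega = (0,1)$, $N = 1$, where it reads $D\Phi(\varphi)h = \int_0^1 \bigl[\frac{\partial f}{\partial y_0} h + \frac{\partial f}{\partial y_1} h'\bigr]\,dx$. The Python sketch below compares it with a central difference quotient of $\Phi$. The integrand $f(x, y_0, y_1) = \frac12 y_1^2 + x\cos y_0$, the functions $\varphi(x) = x^2$ and $h(x) = \sin \pi x$, and the Simpson quadrature with 400 subintervals are illustrative assumptions, not taken from the text.

```python
import math


def simpson(g, n=400):
    """Composite Simpson rule for the integral of g over [0, 1] (n even)."""
    step = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * step)
    return total * step / 3


# Hypothetical integrand f(x, y0, y1) and its partial derivatives.
def f(x, y0, y1):
    return 0.5 * y1 ** 2 + x * math.cos(y0)


def f_y0(x, y0, y1):
    return -x * math.sin(y0)


def f_y1(x, y0, y1):
    return y1


def Phi(u, du):
    """Phi(u) = integral over (0, 1) of f(x, u(x), u'(x)); du is u'."""
    return simpson(lambda x: f(x, u(x), du(x)))


def DPhi(u, du, h, dh):
    """Right-hand side of (3.2.10) for N = 1."""
    return simpson(lambda x: f_y0(x, u(x), du(x)) * h(x)
                   + f_y1(x, u(x), du(x)) * dh(x))


def u(x):
    return x ** 2


def du(x):
    return 2 * x


def h(x):
    return math.sin(math.pi * x)


def dh(x):
    return math.pi * math.cos(math.pi * x)


def shifted(s):
    """Return u + s h together with its derivative u' + s h'."""
    return (lambda x: u(x) + s * h(x)), (lambda x: du(x) + s * dh(x))


t = 1e-5
quotient = (Phi(*shifted(t)) - Phi(*shifted(-t))) / (2 * t)
exact = DPhi(u, du, h, dh)
print(quotient, exact)
```

Both sides use the same quadrature rule, so their difference reflects only the $O(t^2)$ error of the central difference, not the quadrature error.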


Now we turn our attention to higher derivatives. We restrict ourselves to the second derivatives and believe that the reader will be able to define the third and higher order derivatives as well. Higher order derivatives of functions are defined by induction. We will do the same for abstract mappings.
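Let $f: X \to Y$ and $a, h, k \in X$. Put $g(t, s) = f(a + th + sk)$. Then
$$\frac{\partial g}{\partial t}(0, s) = \delta f(a + sk; h),$$
which is a mapping from $\mathbb{R}$ (of the variable $s$) into $Y$ and can be differentiated again:
$$\frac{\partial^2 g}{\partial s\,\partial t}(0, 0) := \frac{\partial}{\partial s}\left(\frac{\partial g}{\partial t}(0, s)\right)\Big|_{s=0}.$$
If these derivatives exist, then
$$\delta^2 f(a; h, k) := \frac{\partial}{\partial s}\left(\frac{\partial g}{\partial t}(0, s)\right)\Big|_{s=0}$$
is called the second directional derivative (in the directions $h$, $k$). Notice that, in general, $\delta^2 f(a; h, k)$ need not be equal to $\delta^2 f(a; k, h)$. (Find an example for $f: \mathbb{R}^2 \to \mathbb{R}$!) It is easy to see that for $f: \mathbb{R}^M \to \mathbb{R}$ we have
$$\delta^2 f(a; e_i, e_j) = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(a)$$
if $e_i$, $e_j$ are the unit coordinate vectors in $\mathbb{R}^M$.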


It may occur that the operator
$$(h, k) \in X \times X \mapsto \delta^2 f(a; h, k)$$
is linear in both variables (i.e., it is a so-called bilinear operator) and is continuous on $X \times X$.${}^{15}$ In that case $\delta^2 f(a; \cdot, \cdot)$ is called the second Gâteaux derivative and is denoted by $D^2 f(a)$.
${}^{15}$ Equivalently, it is continuous at the point $(o, o)$; this is the case if and only if there is a constant $c$ such that
$$\|\delta^2 f(a; h, k)\|_Y \le c\|h\|_X\|k\|_X \quad \text{for all } h, k \in X.$$
(See a similar assertion in Proposition 1.2.10 for a linear operator.) Denoting the space of all continuous bilinear operators from $X \times X$ into $Y$ by $\mathcal{B}_2(X, Y)$, we see that the least possible constant $c$ in the above inequality is a norm on $\mathcal{B}_2(X, Y)$. See also the important Proposition 2.1.7.


Proposition 3.2.27 (Taylor Formula). Let $X$ be a normed linear space and $Y$ a Banach space. Assume that $a, h \in X$ and that $\delta^2 f(x; h, h)$ exists for all $x \in M := \{a + th : t \in [0,1]\}$ and is continuous as a mapping from $M$ into $Y$. Then
$$f(a + h) = f(a) + \delta f(a; h) + \int_0^1 (1 - t)\,\delta^2 f(a + th; h, h)\,dt.{}^{16} \tag{3.2.11}$$

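Proof. Put $g(t) = (1 - t)\,\delta f(a + th; h)$. Then we have
$$g'(t) = -\delta f(a + th; h) + (1 - t)\,\delta^2 f(a + th; h, h), \quad t \in [0,1].$$
Since both terms on the right-hand side are continuous, we get
$$g(1) - g(0) = \int_0^1 g'(t)\,dt = -\int_0^1 \delta f(a + th; h)\,dt + \int_0^1 (1 - t)\,\delta^2 f(a + th; h, h)\,dt.$$
Since $g(1) = 0$, $g(0) = \delta f(a; h)$ and, by Theorem 3.2.6, $\int_0^1 \delta f(a + th; h)\,dt = f(a + h) - f(a)$, we obtain (3.2.11).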
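For $X = \mathbb{R}^2$ and $Y = \mathbb{R}$ we have $\delta f(x;h) = \nabla f(x)\cdot h$ and $\delta^2 f(x;h,h) = h^{\mathsf T}\nabla^2 f(x)\,h$, so (3.2.11) can be checked numerically. The Python sketch below does this for the hypothetical function $f(x_1,x_2) = e^{x_1}\sin x_2 + x_1x_2^2$, with the integral in (3.2.11) approximated by the composite Simpson rule; both the function and the quadrature are illustrative choices, not taken from the text.

```python
import math


def f(x):
    # Hypothetical smooth function on R^2.
    return math.exp(x[0]) * math.sin(x[1]) + x[0] * x[1] ** 2


def grad(x):
    e = math.exp(x[0])
    return [e * math.sin(x[1]) + x[1] ** 2, e * math.cos(x[1]) + 2 * x[0] * x[1]]


def hess(x):
    e = math.exp(x[0])
    mixed = e * math.cos(x[1]) + 2 * x[1]
    return [[e * math.sin(x[1]), mixed], [mixed, -e * math.sin(x[1]) + 2 * x[0]]]


def delta1(x, h):
    """delta f(x; h) = grad f(x) . h"""
    return sum(g * v for g, v in zip(grad(x), h))


def delta2(x, h, k):
    """delta^2 f(x; h, k) = h^T Hess f(x) k"""
    H = hess(x)
    return sum(h[i] * H[i][j] * k[j] for i in range(2) for j in range(2))


def simpson(g, n=200):
    """Composite Simpson rule on [0, 1] with n (even) subintervals."""
    step = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * step)
    return total * step / 3


def taylor_rhs(a, h):
    """f(a) + delta f(a; h) + int_0^1 (1 - t) delta^2 f(a + t h; h, h) dt."""
    def point(t):
        return [a[0] + t * h[0], a[1] + t * h[1]]
    remainder = simpson(lambda t: (1 - t) * delta2(point(t), h, h))
    return f(a) + delta1(a, h) + remainder


a, h = [0.3, -0.5], [0.7, 1.2]
lhs = f([a[0] + h[0], a[1] + h[1]])
print(lhs, taylor_rhs(a, h))
```

The two printed numbers agree up to the quadrature error, which is far smaller than the size of the integral remainder itself.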

If we wanted to define the second Fréchet derivative also by induction, we should differentiate $f': X \to \mathcal{L}(X, Y)$ at $a \in X$ to obtain $f''(a) \in \mathcal{L}(X, \mathcal{L}(X, Y))$. But this space seems to be rather strange, and the space $\mathcal{L}(X, \mathcal{L}(X, \mathcal{L}(X, Y)))$ (for $f'''(a)$) really awkward. Because of that we identify $\mathcal{L}(X, \mathcal{L}(X, Y))$ with the space of continuous bilinear operators $\mathcal{B}_2(X, Y)$ (see footnote 15) and define the second Fréchet derivative $f''(a)$ to be an element of $\mathcal{B}_2(X, Y)$ with the approximation property
$$\lim_{h \to o} \frac{\|f'(a + h) - f'(a) - f''(a)(h, \cdot)\|_{\mathcal{L}(X, Y)}}{\|h\|_X} = 0. \tag{3.2.12}$$
The careful reader can ask why we have written $f''(a)(h, \cdot)$ and not $f''(a)(\cdot, h)$ in (3.2.12). The reason is that the mapping $(h, k) \mapsto f''(a)(h, k)$ is actually symmetric.
Proposition 3.2.28. If $f''(a)$ exists, then
$$f''(a)(h, k) = f''(a)(k, h) \quad \text{for all } h, k \in X.$$

Proof. Similarly to the proof of the classical result on mixed partial derivatives, we express the difference
$$f(a + h + k) - f(a + h) - f(a + k) + f(a),$$
which is equal to $g_i(1) - g_i(0)$ for
$$g_1(t) := f(a + th + k) - f(a + th), \quad t \in [0,1],$$
$$g_2(s) := f(a + h + sk) - f(a + sk), \quad s \in [0,1].$$

${}^{16}$ If for $n \in \mathbb{N}$ the $n$th directional derivative $\delta^n f(a; h, \dots, h)$ exists for all $h \in X$, then the mapping
$$h \mapsto f(a) + \sum_{k=1}^{n} \frac{1}{k!}\,\delta^k f(a; \underbrace{h, \dots, h}_{k\text{-times}})$$
is called the Taylor polynomial of degree $n$ of $f$ at the point $a$.


Since $f''(a)$ exists, both the mappings $f$ and $f'$ are defined on a neighborhood $\mathcal{U}$ of $a$. The elements $h$, $k$ are chosen so small that all arguments belong to $\mathcal{U}$. We can express
$$g_i(1) - g_i(0) = A_i + g_i'(0), \quad \text{where } A_i := g_i(1) - g_i(0) - g_i'(0),$$
and
$$g_1'(0) = e_1(h, k) + f''(a)(k, h), \qquad g_2'(0) = e_2(k, h) + f''(a)(h, k),$$
where
$$e_1(h, k) := f'(a + k)h - f'(a)h - f''(a)(k, h), \qquad e_2(k, h) := f'(a + h)k - f'(a)k - f''(a)(h, k).$$
Since $g_1(1) - g_1(0) = g_2(1) - g_2(0)$, we have
$$f''(a)(h, k) - f''(a)(k, h) = A_1 - A_2 + e_1(h, k) - e_2(k, h). \tag{3.2.13}$$

Now we estimate all terms on the right-hand side of this equality. By Theorem 3.2.7,
$$\|A_i\| \le \sup_{t \in [0,1]} \|g_i'(t) - g_i'(0)\|.$$
Since $f''(a)$ is bilinear, we have
$$\begin{aligned}
g_1'(t) - g_1'(0) &= f'(a + th + k)h - f'(a + th)h - f'(a + k)h + f'(a)h \\
&= [f'(a + th + k)h - f'(a)h - f''(a)(th + k, h)] \\
&\quad - [f'(a + th)h - f'(a)h - f''(a)(th, h)] \\
&\quad - [f'(a + k)h - f'(a)h - f''(a)(k, h)].
\end{aligned} \tag{3.2.14}$$

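Choose now $\varepsilon > 0$ and $\delta > 0$ corresponding to the definition of $f''(a)$ such that
$$\|f'(a + u)v - f'(a)v - f''(a)(u, v)\| \le \varepsilon\|u\|\|v\| \quad \text{for } \|u\| < \delta \text{ and any } v \in X.$$
Then every term on the right-hand side of (3.2.14) is bounded by $\varepsilon(\|h\| + \|k\|)^2$ provided $\|h\|, \|k\| < \frac{\delta}{2}$. The same estimate holds for $e_1(h, k)$ and similarly also for $g_2'(t) - g_2'(0)$ and $e_2(k, h)$.
By (3.2.13) we obtain
$$\|f''(a)(h, k) - f''(a)(k, h)\| \le \|A_1\| + \|A_2\| + \|e_1(h, k)\| + \|e_2(k, h)\| \le 8\varepsilon[\|h\| + \|k\|]^2 \tag{3.2.15}$$
provided $\|h\|, \|k\| < \frac{\delta}{2}$. Choose $h_0, k_0 \in X$ and put
$$h = \eta h_0, \qquad k = \eta k_0.$$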

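For a sufficiently small $\eta > 0$ the estimate (3.2.15) holds. Because of the bilinearity of $f''(a)$ we get, after dividing by $\eta^2$,
$$\|f''(a)(h_0, k_0) - f''(a)(k_0, h_0)\| \le 8\varepsilon[\|h_0\| + \|k_0\|]^2 \quad \text{for any } \varepsilon > 0.$$
This completes the proof.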
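Proposition 3.2.28 uses the existence of $f''(a)$. Without it, the order of the directions in $\delta^2 f(a; h, k)$ may matter. A classical example (not taken from the text) is $f(x,y) = xy\,\frac{x^2-y^2}{x^2+y^2}$, $f(0,0) = 0$, for which $\frac{\partial f}{\partial x}(0,s) = -s$ and $\frac{\partial f}{\partial y}(s,0) = s$, hence $\delta^2 f(o; e_1, e_2) = -1$ while $\delta^2 f(o; e_2, e_1) = 1$. The Python sketch below approximates both values by nested central differences that follow the definition of $\delta^2 f$; the step sizes are ad hoc, with the inner step much smaller than the outer one.

```python
def f(x, y):
    # Classical example with unequal mixed second derivatives at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)


def second_directional(a, h, k, inner=1e-7, outer=1e-4):
    """Approximate delta^2 f(a; h, k) = d/ds [d/dt f(a + t h + s k)]_{t=0} at s = 0."""
    def first(s):
        # Central difference in t of g(t, s) = f(a + t h + s k) at t = 0.
        px, py = a[0] + s * k[0], a[1] + s * k[1]
        plus = f(px + inner * h[0], py + inner * h[1])
        minus = f(px - inner * h[0], py - inner * h[1])
        return (plus - minus) / (2 * inner)
    return (first(outer) - first(-outer)) / (2 * outer)


origin, e1, e2 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(second_directional(origin, e1, e2))  # approximately -1
print(second_directional(origin, e2, e1))  # approximately +1
```

Here $f'$ exists everywhere but $f''(o)$ does not, so Proposition 3.2.28 does not apply.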





Remark 3.2.29.
(i) It is not difficult to see that the existence of $f''(a)$ implies the existence of $D^2 f(a)$ and the equality
$$f''(a)(h, k) = D^2 f(a)(h, k).$$
It is also possible to prove that the continuity of $D^2 f$ on an open set $G \subset X$ (as a mapping from $G$ into $\mathcal{B}_2(X, Y)$) is equivalent to the continuity of $f''$ on $G$. In this case we write $f \in C^2(G)$.
(ii) If $X = \mathbb{R}^M$, $Y = \mathbb{R}$ and $D^2 f(a)$ exists for $f: \mathbb{R}^M \to \mathbb{R}$, then it is sufficient to know the values
$$D^2 f(a)(e_i, e_j), \quad i, j = 1, \dots, M,$$
to determine $D^2 f(a)$. This means that $D^2 f(a)$ (and also $f''(a)$) can be represented by the matrix (the so-called Hess matrix)
$$\left( \frac{\partial^2 f}{\partial x_i\,\partial x_j}(a) \right)_{i,j=1}^{M}.$$
Exercise 3.2.30. Let $A \in \mathcal{L}(X, Y)$ and $B \in \mathcal{B}_2(X, Y)$. Compute $A'$, $B'$ and $B''$!
Exercise 3.2.31. Let $f: X \to Y$ be injective on an open set $G \subset X$. Denote $(f|_G)^{-1} = g$. Suppose that $f'(a)$ and $g'(b)$ exist for an $a \in G$, $f(a) = b$. Is it true that
$$g'(b) = [f'(a)]^{-1}?$$
(For conditions which guarantee the existence of $g'(b)$ see Section 4.1.)
Exercise 3.2.32. Put $\Phi(A) = A^{-1}$ for an invertible $A \in \mathcal{L}(X, Y)$ (here $X$, $Y$ are Banach spaces). Show that
$$\Phi'(A)(H) = -A^{-1}HA^{-1}, \quad H \in \mathcal{L}(X, Y), \text{ for all } A \in \operatorname{Dom}\Phi.$$
Hint. Use the same method as in Exercise 2.1.33.
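The formula in Exercise 3.2.32 can be tested (not proved) in the finite-dimensional case $X = Y = \mathbb{R}^2$, where $\mathcal{L}(X,Y)$ consists of $2\times 2$ matrices. The Python sketch below compares the difference quotient $\frac{1}{t}[(A+tH)^{-1} - A^{-1}]$ with $-A^{-1}HA^{-1}$; the matrices $A$ and $H$ are hypothetical, and the maximum-entry norm is used only for convenience. The discrepancy should be of order $t$.

```python
def matmul(P, Q):
    return [[sum(P[i][m] * Q[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(P):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]


def combine(P, Q, c):
    """Entrywise P + c * Q."""
    return [[P[i][j] + c * Q[i][j] for j in range(2)] for i in range(2)]


def max_entry(P):
    return max(abs(v) for row in P for v in row)


A = [[2.0, 1.0], [0.5, 3.0]]     # hypothetical invertible matrix
H = [[0.3, -1.0], [2.0, 0.7]]    # hypothetical direction
A_inv = inverse(A)
ZERO = [[0.0, 0.0], [0.0, 0.0]]
predicted = combine(ZERO, matmul(matmul(A_inv, H), A_inv), -1.0)  # -A^{-1} H A^{-1}


def quotient_error(t):
    """Max-entry norm of [(A + tH)^{-1} - A^{-1}] / t - (-A^{-1} H A^{-1})."""
    diff = combine(inverse(combine(A, H, t)), A_inv, -1.0)
    quotient = [[v / t for v in row] for row in diff]
    return max_entry(combine(quotient, predicted, -1.0))


for t in (1e-2, 1e-4, 1e-6):
    print(f"t = {t:.0e}: error = {quotient_error(t):.2e}")
```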


Exercise 3.2.33. Let $X$ be either $C[0,1]$ or $L^p(0,1)$, $1 \le p < \infty$. Compute $\delta f(x; h)$, $Df(x)$ and $f'(x)$ for $f(x) = \|x\|$, $x \in X$.
Exercise 3.2.34.
(i) Compute the duality mapping for the space $L^p(\Omega)$, $1 \le p < \infty$.
(ii) Show that the duality mapping for the space $C[0,1]$ need not be single-valued.
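Exercise 3.2.35. Let $p > 1$, $\Omega \subset \mathbb{R}^N$ and let
$$f(u) = \frac{1}{p} \int_{\Omega} \|\nabla u(x)\|^p\,dx, \qquad g(u) = \frac{1}{p} \int_{\Omega} |u(x)|^p\,dx$$
be functionals defined on $W^{1,p}(\Omega)$ (here $\|\nabla u(x)\| := \left( \sum_{i=1}^{N} \left( \frac{\partial u}{\partial x_i}(x) \right)^2 \right)^{\frac{1}{2}}$). Prove that $f$ and $g$ are Fréchet differentiable at each $u \in W^{1,p}(\Omega)$, and
$$f'(u)v = \int_{\Omega} \|\nabla u(x)\|^{p-2} (\nabla u(x), \nabla v(x))\,dx, \qquad g'(u)v = \int_{\Omega} |u(x)|^{p-2} u(x) v(x)\,dx.$$
Hint. Let
$$\varphi(t) = |t|^{p-2}t, \quad t \ne 0, \qquad \varphi(0) = 0.$$
Then $\varphi$ is continuous and $\frac{d}{dt}\left( \frac{1}{p}|t|^p \right) = \varphi(t)$, $t \in \mathbb{R}$. Similarly, for $y \in \mathbb{R}^N$, $\|y\| = \left( \sum_{i=1}^{N} y_i^2 \right)^{\frac{1}{2}}$, set
$$\psi(y) = \|y\|^{p-2}y, \quad y \ne o, \qquad \psi(o) = o.$$
Then $\nabla\left( \frac{1}{p}\|y\|^p \right) = \psi(y)$ for all $y \in \mathbb{R}^N$.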

Exercise 3.2.36. Find conditions on $k$ and $f$ for the so-called Hammerstein operator
$$H(\varphi)(t) = \int_a^b k(t, s) f(s, \varphi(s))\,ds$$
to map $L^2(a, b)$ into itself, and then differentiate $H$!


Exercise 3.2.37. Differentiate the following operators:
(i) $F(\varphi) = \int_0^1 \left( \int_0^t |\varphi(s)|^2\,ds \right)^3 dt$, $\varphi \in C[0,1]$ or $L^2(0,1)$;
(ii) $F(\varphi)(t) = \int_0^t \varphi^2(s)\,ds$ as
$F: L^1(0,1) \to L^1(0,1)$,
$F: C[0,1] \to C[0,1]$,
$F: C[0,1] \to C^1[0,1]$.
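Exercise 3.2.38. Let $f: [0,1] \times \mathbb{R} \to \mathbb{R}$ and
$$F(\varphi) = \int_0^1 f(t, \varphi(t))\,dt.$$
Under which conditions on $f$ do there exist $D^2 F(\varphi)$ and $F''(\varphi)$ if we consider
$$F: C[0,1] \to \mathbb{R} \quad \text{and} \quad F: L^2(0,1) \to \mathbb{R}?$$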


Remark 3.2.39. The following assertion is due to I.V. Skrypnik:
If $\frac{\partial^2 f}{\partial y^2}$ is continuous and bounded on $(0,1) \times \mathbb{R}$, then $F \in C^2(L^2(0,1))$ ($F$ is defined as in Exercise 3.2.38) if and only if
$$f(t, y) = a(t) + b(t)y + c(t)y^2 \quad \text{where } a, b, c \in L^{\infty}(0,1).$$
It is not too difficult to prove this by contradiction.


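Exercise 3.2.40. Let $f: [0,1] \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ and
$$F(\varphi) = \int_0^1 f(t, \varphi(t), \varphi'(t))\,dt, \quad \varphi \in C^1[0,1].$$
Under which conditions does $D^2 F(\varphi)$ exist?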


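Exercise 3.2.41. Suppose that $\Omega$ is a bounded open subset of $\mathbb{R}^N$, and that the function $f: \Omega \times \mathbb{R} \to \mathbb{R}$ and its partial derivative $\frac{\partial f}{\partial y}$ are continuous on $\Omega \times \mathbb{R}$ (or both satisfy the Carathéodory property). Let $p > 2$ and let there exist constants $a$, $b$ such that
$$\left| \frac{\partial f}{\partial y}(x, y) \right| \le a + b|y|^{p-2}, \quad x \in \Omega, \ y \in \mathbb{R}.$$
If $f(\cdot, 0) \in L^{p'}(\Omega)$, where $p' = \frac{p}{p-1}$ (the conjugate exponent), and $F(\varphi)(x) = f(x, \varphi(x))$, show the following facts:
(i) $F$ maps $L^p(\Omega)$ into $L^{p'}(\Omega)$.
Hint. Integrate $\frac{\partial f}{\partial y}$ and use the above estimate and Theorem 3.2.24.
(ii) $F'(\varphi)h: x \mapsto \frac{\partial f}{\partial y}(x, \varphi(x))h(x)$ for all $h \in L^p(\Omega)$.
Hint. Proceed similarly to the main text. Use the Hölder inequality to show that $F_y(\varphi)(x) := \frac{\partial f}{\partial y}(x, \varphi(x))$ maps $L^p(\Omega)$ into $L^q(\Omega)$, $q = \frac{p}{p-2}$.
(iii) The Fréchet derivative $F'(\varphi)$ exists for all $\varphi \in L^p(\Omega)$.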


Remark 3.2.42. If differentiability properties of $F$ are also needed for $p \le 2$, one should replace $L^p(\Omega)$ by more sophisticated spaces like Besov or Triebel–Lizorkin spaces. See, e.g., Runst & Sickel [115, Chapter 5].

3.2A Newton Method


The Contraction Principle offers a very effective method for solving nonlinear equations, either to prove the existence of a solution or to find it numerically. Since the speed of convergence is not always satisfactory, various modifications have appeared. One of these modifications is even much older than the Contraction Principle itself and goes back to I. Newton. The idea of this method can be seen from Figure 3.2.1, where the iterations for solving the equation
$$f(x) = o$$
are shown.

Figure 3.2.1. [Plot: the graph of $f(x)$ and the tangent line $f'(a)(x - a) + f(a)$ at the point $a$; marked points $a$, $x_1$, $x_2$, $y_1$.]

Suppose that we have found an approximate solution $a$. We wish to construct a correction $\tilde y$ such that
$$f(a + \tilde y) = o.$$
By the Taylor expansion,
$$f(a) = f(a + \tilde y) - f'(a + \tilde y)\tilde y + r(\tilde y) = -f'(a + \tilde y)\tilde y + r(\tilde y),$$
i.e.,
$$\tilde y = -[f'(a + \tilde y)]^{-1}[f(a) - r(\tilde y)] = -[f'(a + \tilde y)]^{-1}[f(a + \tilde y) - f'(a + \tilde y)\tilde y] =: F(\tilde y), \tag{3.2.16}$$
provided $[f'(a + \tilde y)]^{-1}$ exists.
The idea is to solve the equation
$$\tilde y = F(\tilde y) \tag{3.2.17}$$
in a certain closed ball $B(o; \rho)$ around $o$ by the iterations
$$y_{n+1} = F(y_n), \qquad y_0 = o.$$

Denoting $x_n = a + y_n$, we can rewrite these iterations in the form
$$x_{n+1} - a = -[f'(x_n)]^{-1} f(x_n) + x_n - a, \tag{3.2.18}$$
which are exactly the iterations from Figure 3.2.1. If the sequence of iterations $\{y_n\}_{n=1}^{\infty}$ converges to $\tilde y$, then
$$f(a + \tilde y) = o,$$
as follows from (3.2.16). Our goal is to show:
(A1) There is $\rho > 0$ such that $F$ maps $B(o; \rho)$ into itself and is a contraction on this ball.


(A2) The convergence of $\{x_n\}_{n=1}^{\infty}$ is faster than the convergence of the iterations given by the Contraction Principle (cf. Theorem 2.3.1); actually, there is a constant $c$ such that
$$\|x_{n+1} - x_n\| \le c\|x_n - x_{n-1}\|^2.{}^{17} \tag{3.2.19}$$
We apparently need some assumptions to reach this goal. We assume that $X$ is a Banach space, $f: X \to X$, and, moreover:
(H1) There is a ball $B(a; \tilde\rho)$ such that $f \in C^1(B(a; \tilde\rho))$ and $f'$ satisfies the Lipschitz condition on this ball: there exists $L$ such that
$$\|f'(x) - f'(y)\|_{\mathcal{L}(X)} \le L\|x - y\|_X \quad \text{for } x, y \in B(a; \tilde\rho).$$
(H2) The value $\|f(a)\|$ is sufficiently small.${}^{18}$
(H3) The derivative $f'(a)$ has a continuous inverse $[f'(a)]^{-1} \in \mathcal{L}(X)$.
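The proof of (A1), (A2) will be done in several steps. For the sake of simplicity we denote
$$A(x) := f'(x), \qquad A := f'(a), \qquad A^{-1} = [f'(a)]^{-1},$$
and $A^{-1}(x) := [A(x)]^{-1}$ whenever this inverse exists.
Step 1. If $\rho < \min\left\{ \tilde\rho, \frac{1}{L\|A^{-1}\|} \right\}$, then $A^{-1}(x)$ exists and
$$\|A^{-1}(x)\| \le \frac{\|A^{-1}\|}{1 - \rho L\|A^{-1}\|} \quad \text{for } x \in B(a; \rho).$$
Indeed, we can write
$$A(x) = A[I + A^{-1}(A(x) - A)].$$
Since $\|A(x) - A\| \le L\|x - a\|$, we get
$$\|A^{-1}(A(x) - A)\| \le \|A^{-1}\| L\|x - a\| \le \rho L\|A^{-1}\| < 1,$$
so $A^{-1}(x)$ exists for $x \in B(a; \rho)$ (by Proposition 2.1.2) and
$$A^{-1}(x) = \sum_{n=0}^{\infty} (-1)^n [A^{-1}(A(x) - A)]^n A^{-1}.$$
The estimate for $\|A^{-1}(x)\|$ follows.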


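Step 2. If $w, x \in B(a; \rho)$, then
$$\|A^{-1}(w) - A^{-1}(x)\| \le \left( \frac{\|A^{-1}\|}{1 - \rho L\|A^{-1}\|} \right)^2 L\|w - x\|.$$
This estimate follows from the identity
$$A^{-1}(w) - A^{-1}(x) = A^{-1}(w)[A(x) - A(w)]A^{-1}(x)$$
and Step 1.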
${}^{17}$ Compare this quadratic estimate (which yields an exponential one for $\|\tilde x - x_n\|$) with the estimate from the Contraction Principle, $\|\tilde x - x_n\| \le \frac{q^n}{1 - q}\|x_1 - x_0\|$ for a $0 < q < 1$.
${}^{18}$ This assumption means that we actually need a good approximation of a solution of the equation $f(x) = o$ (see Step 4 for the estimate of $\|f(a)\|$).


Step 3. We have
$$\|r(y)\| \le \frac{L}{2}\|y\|^2 \quad \text{and} \quad \|r(y) - r(z)\| \le 3L\rho\|y - z\| \quad \text{for } y, z \in B(o; \rho),$$
where
$$r(y) := f(a) - f(a + y) + A(a + y)y$$
(see (3.2.16)). Indeed, by Theorem 3.2.6, we get
$$r(y) = \int_0^1 [A(a + y) - A(a + (1 - t)y)]y\,dt$$
and
$$\begin{aligned}
r(y) - r(z) &= f(a + z) - f(a + y) + A(a + y)y - A(a + z)z \\
&= \int_0^1 [A(a + y + t(z - y)) - A(a + y)](z - y)\,dt + [A(a + y) - A(a + z)]z.
\end{aligned}$$
The estimates now follow from (H1) and Step 2.


Step 4. The assertion (A1) holds. Indeed, we have
$$F(y) - F(z) = A^{-1}(a + y)[r(y) - r(z)] + [A^{-1}(a + z) - A^{-1}(a + y)][f(a) - r(z)].$$
From (H1) and Steps 1–3 we get
$$\|F(y) - F(z)\| \le c(\rho + \|f(a)\|L)\|y - z\|$$
with a $c$ which is a bounded function of $\rho \in [0, \rho_0]$ ($\rho_0$ small enough). This means that we can choose $\rho$ and the estimate of $\|f(a)\|$ in (H2) such that
$$\|F(y) - F(z)\| \le q\|y - z\|, \quad y, z \in B(o; \rho), \text{ for a } q \in (0,1).$$
Moreover, since $F(o) = -A^{-1}f(a)$,
$$\|F(y)\| \le \|F(y) - F(o)\| + \|F(o)\| \le q\rho + \|A^{-1}\|\,\|f(a)\| \le \rho,$$
provided $\|f(a)\|$ is sufficiently small.
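Step 5. We can now prove the assertion (A2). By (3.2.18) and Theorem 3.2.6,
$$\begin{aligned}
f(x_n) &= f(x_n) - f(x_{n-1}) - f'(x_{n-1})(x_n - x_{n-1}) \\
&= \int_0^1 [f'(x_{n-1} + t(x_n - x_{n-1})) - f'(x_{n-1})](x_n - x_{n-1})\,dt.
\end{aligned}$$
Hence
$$\|f(x_n)\| \le \frac{L}{2}\|x_n - x_{n-1}\|^2,$$
and also
$$\|x_{n+1} - x_n\| \le \|A^{-1}(x_n)\|\,\|f(x_n)\| \le c\|x_n - x_{n-1}\|^2.$$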
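The quadratic estimate (3.2.19) is easy to observe in practice. The Python sketch below runs the iterations (3.2.18), $x_{n+1} = x_n - [f'(x_n)]^{-1}f(x_n)$, in $X = \mathbb{R}^2$ with the Euclidean norm for the hypothetical system $x_1^2 + x_2^2 = 4$, $x_1x_2 = 1$, starting from $a = (2, 0.5)$, and prints the ratios $\|x_{n+1}-x_n\| / \|x_n - x_{n-1}\|^2$, which should stay bounded. The system, the starting point and the $2\times 2$ Cramer solver are illustrative choices, not taken from the text.

```python
import math


def f(x):
    # Hypothetical system: circle of radius 2 intersected with the hyperbola x1 x2 = 1.
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


def derivative(x):
    # Jacobi matrix f'(x).
    return [[2 * x[0], 2 * x[1]], [x[1], x[0]]]


def solve2(M, r):
    """Solve the 2x2 linear system M z = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ZeroDivisionError("f'(x_n) is not invertible")
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - r[0] * M[1][0]) / det]


def newton(a, steps=6):
    """Newton iterates x_{n+1} = x_n - [f'(x_n)]^{-1} f(x_n), x_0 = a."""
    xs = [list(a)]
    for _ in range(steps):
        x = xs[-1]
        z = solve2(derivative(x), f(x))
        xs.append([x[0] - z[0], x[1] - z[1]])
    return xs


xs = newton([2.0, 0.5])
gaps = [math.dist(p, q) for p, q in zip(xs, xs[1:])]
# Ratios are only formed while the steps are well above rounding level.
ratios = [gaps[n] / gaps[n - 1] ** 2
          for n in range(1, len(gaps)) if gaps[n - 1] > 1e-6]
print("step sizes:", [f"{g:.1e}" for g in gaps])
print("ratios    :", [f"{q:.3f}" for q in ratios])
```

Each step here requires solving a linear system with the current Jacobi matrix, which is exactly the cost discussed in the following remark.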

Remark 3.2.43. The drawback of the iteration procedure (3.2.18) consists in the requirement to compute the inverse of the derivative at each step. This is the price for fast convergence. One may expect to avoid this disadvantage by replacing $[f'(x_n)]^{-1}$ by the fixed inverse $[f'(a)]^{-1}$. This idea is also due to I. Newton. Conditions for the convergence of these iterations were found by Kantorovich (see Kantorovich [72]). Serious problems appear when the derivative $f'(x)$ is injective but not continuously invertible. In applications, e.g., to nonlinear partial differential equations, we have many possibilities for the choice of Banach spaces $\mathcal{X}$, $\mathcal{Y}$ such that $f: \mathcal{X} \to \mathcal{Y}$ (see, e.g., Example 1.2.25 and Example 2.1.29). It can happen that
$$[f'(x)]^{-1} \in \mathcal{L}(\mathcal{Y}, \tilde{\mathcal{X}}) \quad \text{where} \quad \mathcal{X} \subsetneq \tilde{\mathcal{X}}.$$
This means that the equation
$$f'(x_n)(x - x_n) = -f(x_n)$$
(see (3.2.18)), which has to be solved to obtain the $(n+1)$st iteration $x_{n+1}$, has a solution in a larger space $\tilde{\mathcal{X}}$ provided $x_n \in \mathcal{X}$. Therefore the iterations belong to larger and larger spaces and, after a finite number of steps, there is no solution at all. This can also be expressed by the observation that $x_{n+1}$ is less smooth than $x_n$, or that derivatives are lost during the iterations. One way to overcome these difficulties consists in the approximation of $[f'(x)]^{-1}$ by a better operator $L(x)$ in the sense that
$$\|f'(x)L(x) - I\|_{\mathcal{L}(\mathcal{Y})}$$
becomes smaller and smaller as $x$ approaches a solution of
$$f(x) = o.$$
Precise conditions under which the new iterations
$$w_{n+1} = w_n - L(w_n)f(w_n)$$
converge to a solution can be found, e.g., in Moser [97, pp. 265–315 and 499–535]. A similar idea appeared earlier in Nash [98]. See also Example 4.1.6 for a slightly different explanation.
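The trade-off described in this remark can already be seen for a scalar equation. The Python sketch below compares the Newton iterations (3.2.18) with the variant that keeps the fixed inverse $[f'(a)]^{-1}$ (often called the chord or simplified Newton method) for the hypothetical equation $x^3 - 2x - 5 = 0$ with $a = 2$; the stopping rule $|f(x_n)| < 10^{-12}$ is arbitrary. The simplified variant needs no new derivative per step but converges only linearly, so it needs more steps.

```python
def f(x):
    return x ** 3 - 2 * x - 5


def df(x):
    return 3 * x ** 2 - 2


def iterate(step, x0, tol=1e-12, max_iter=100):
    """Apply step until |f(x)| < tol; return (number of steps, x) or (None, x)."""
    x = x0
    for n in range(1, max_iter + 1):
        x = step(x)
        if abs(f(x)) < tol:
            return n, x
    return None, x


a = 2.0
slope = df(a)  # f'(a), frozen for the simplified iteration

newton_steps, newton_root = iterate(lambda x: x - f(x) / df(x), a)
chord_steps, chord_root = iterate(lambda x: x - f(x) / slope, a)
print("Newton:", newton_steps, newton_root)
print("fixed [f'(a)]^{-1}:", chord_steps, chord_root)
```

Near the root the frozen-slope map has derivative $1 - f'(\tilde x)/f'(a) \approx -0.12$, which explains its linear rate.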
Exercise 3.2.44. Let $f \in C^1(\mathbb{R})$ be a convex real function.
(i) Using only the results of elementary calculus, prove the convergence of Newton approximations under appropriate assumptions. Give the recurrence formula for
$$f(x) = x^2 - A, \quad A > 0.$$
(ii) The same as in (i) for the Kantorovich approximations (Remark 3.2.43).

Chapter 4

Local Properties of Differentiable Mappings

4.1 Inverse Function Theorem
In this section we are looking for conditions which allow us to invert a map $f: X \to Y$, especially $f: \mathbb{R}^M \to \mathbb{R}^N$. The simple case of a linear operator $f$ indicates that a reasonable assumption is $M = N$.
Let us start with the simplest case $M = N = 1$. The well-known theorem says that if $f$ is continuous and strictly monotone on an open interval $I$, then $f$ is injective and $f(I)$ is an open interval $J$. Moreover, the inverse function $f^{-1}$ is continuous on $J$.
It is not clear how to generalize the monotonicity assumption to $\mathbb{R}^M$ (cf. Section 5.3), and without it the theorem is not true even in $\mathbb{R}$. Since the monotonicity of a differentiable function $f: \mathbb{R} \to \mathbb{R}$ is a consequence of the sign of the derivative of $f$, we take $f'$ into consideration as well. The example $f(x) = x^2$, where $f$ is not injective in any neighborhood of the origin, shows that we have to assume $f'(x) \ne 0$. In fact, if $f$ is continuous on an open interval $I$, $f'(x)$ exists (possibly infinite) at all points of $I$, and $f'$ does not vanish at any point of $I$, then $f$ is injective (actually strictly monotone, since $f'$ is either strictly positive or strictly negative in $I$), and $f^{-1}$ is continuous and differentiable on the open interval $f(I)$.
Therefore, we are looking for a generalization of the assumption $f'(x) \ne 0$ for maps $f: \mathbb{R}^M \to \mathbb{R}^M$. Since we are interested in a (unique) solution of the equation
$$f(x) = y,$$
the case of a linear function $f: \mathbb{R}^M \to \mathbb{R}^M$ (then $f'(x) = f$) suggests assuming that $f'(x)$ is either an injective or, equivalently because of the finite dimension, a surjective linear map. In both cases, $f'(x)$ is an isomorphism of $\mathbb{R}^M$ onto $\mathbb{R}^M$ (for the case of Banach spaces see Theorem 2.1.8).


However, there is still one more problem. Let
$$f(r, \varphi) = (r\cos\varphi, r\sin\varphi), \quad (r, \varphi) \in (0, \infty) \times \mathbb{R}, \qquad \text{or} \qquad g(z) = e^z, \quad z \in \mathbb{C}.$$
Both functions are infinitely differentiable on their domains,
$$\det f'(r, \varphi) = r \ne 0, \qquad g'(z) \ne 0,$$
and $f(r, \cdot)$ is $2\pi$-periodic and $g$ is $2\pi i$-periodic, i.e., $f$ and $g$ are not injective.
Therefore, we cannot expect more than local invertibility. The philosophy behind this is simple: since the notion of derivative is a local one, we can deduce only local information from it.
After these preliminary considerations we can state the main theorem. Since there is no simplification in the case of finite dimension, we formulate it for general Banach spaces.
Theorem 4.1.1 (Local Inverse Function Theorem). Let $X$, $Y$ be Banach spaces, $G$ an open set in $X$, and $f: X \to Y$ continuously differentiable on $G$. Let the derivative $f'(a)$ be an isomorphism of $X$ onto $Y$ for a point $a \in G$. Then there exist neighborhoods $\mathcal{U}$ of $a$ and $\mathcal{V}$ of $f(a)$ such that $f$ is injective on $\mathcal{U}$ and $f(\mathcal{U}) = \mathcal{V}$. If $g$ denotes the inverse of the restriction $f|_{\mathcal{U}}$, then $g \in C^1(\mathcal{V})$.
Proof. We will solve the equation
$$f(x) = y$$
for a fixed $y$ near the point $b = f(a)$ by an iteration process. To do that we have to rewrite the equation $f(x) = y$ as an equation in $X$. We denote by $A$ the inverse map $[f'(a)]^{-1} \in \mathcal{L}(Y, X)$. Then
$$f(x) = y \iff F_y(x) := x - A[f(x) - y] = x. \tag{4.1.1}$$
The simplest condition for the convergence of iterations is given by the Contraction Principle (see Theorem 2.3.1). We have
$$\begin{aligned}
\|F_y(x_1) - F_y(x_2)\| &= \|x_1 - x_2 - A[f(x_1) - f(x_2)]\| \le \|A\|\,\|f(x_2) - f(x_1) - f'(a)(x_2 - x_1)\| \\
&\le \|A\| \sup_{\xi \in B(a; r)} \|f'(\xi) - f'(a)\|\,\|x_1 - x_2\|,
\end{aligned}$$
$x_1, x_2 \in B(a; r)$. (Here we have used the Mean Value Theorem (see formula (3.2.3)).) In other words, we can choose $r > 0$ so small that
$$\|F_y(x_1) - F_y(x_2)\| \le \frac{1}{2}\|x_1 - x_2\| \tag{4.1.2}$$

for $x_1, x_2 \in B(a; r) \subset G$, $y \in Y$. Further,
$$\|F_y(x) - a\| \le \|F_y(x) - F_y(a)\| + \|F_y(a) - a\| \le \frac{1}{2}\|x - a\| + \|A\|\,\|b - y\|.$$
If $\delta > 0$ is such that $\|A\|\delta \le \frac{r}{2}$, then $F_y(x) \in B(a; r)$ provided $x \in B(a; r)$, $y \in B(b; \delta)$. By the Contraction Principle, the equation (4.1.1) has a unique solution in $B(a; r)$,
$$x =: g(y) \in B(a; r) \quad \text{for any } y \in B(b; \delta).$$

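Moreover, if $g(y_i) = x_i$, $i = 1, 2$, then
$$\begin{aligned}
\|g(y_1) - g(y_2)\| &= \|F_{y_1}(x_1) - F_{y_2}(x_2)\| \le \|F_{y_1}(x_1) - F_{y_1}(x_2)\| + \|F_{y_1}(x_2) - F_{y_2}(x_2)\| \\
&\le \frac{1}{2}\|x_1 - x_2\| + \|A\|\,\|y_1 - y_2\|,
\end{aligned}$$
i.e.,
$$\|g(y_1) - g(y_2)\| \le 2\|A\|\,\|y_1 - y_2\|. \tag{4.1.3}$$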

In particular, $g$ is a Lipschitz continuous map on $B(b; \delta)$. To prove the differentiability of $g$, fix a $y \in B(b; \delta)$ and choose $\eta > 0$ so small that $B(y; \eta) \subset B(b; \delta)$. A candidate for $g'(y)$ is the inverse $C(x) := [f'(x)]^{-1}$ for $x = g(y)$. Since $x \in B(a; r)$, the choice of $r$ yields
$$\|f'(x) - f'(a)\| \le \frac{1}{2\|[f'(a)]^{-1}\|}.$$
This means that $C(x)$ exists and $C(x) \in \mathcal{L}(Y, X)$ (cf. Exercise 2.1.33). So we wish to estimate the expression
$$\omega(k) := g(y + k) - g(y) - C(x)k \quad \text{for } k \in Y, \ \|k\| < \eta.$$

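Put
$$h = g(y + k) - g(y), \quad \text{i.e.,} \quad k = f(x + h) - f(x).$$
We have
$$\omega(k) = h - C(x)k = -C(x)[f(x + h) - f(x) - f'(x)h].$$
By the definition of the Fréchet derivative, for any $\varepsilon > 0$ there is $\tau > 0$ such that
$$\|f(x + h) - f(x) - f'(x)h\| \le \varepsilon\|h\| \quad \text{provided } \|h\| < \tau.$$
But (see (4.1.3)) $\|h\| \le 2\|A\|\,\|k\|$. This means that
$$\|\omega(k)\| = o(\|k\|), \quad \text{i.e.,} \quad g'(y) = C(x) = [f'(x)]^{-1}.$$
This also implies the continuity of $g'$, since $x = g(y)$ depends continuously on $y$ and the inverse $[f'(x)]^{-1}$ depends continuously on $x$ (see Exercise 2.1.33).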
To complete the proof it remains to put
$$\mathcal{V} = B(b; \delta) \quad \text{and} \quad \mathcal{U} = f^{-1}(\mathcal{V}) \cap B(a; r).$$
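The proof is constructive: for $y$ close to $b = f(a)$ the point $g(y)$ is the limit of the iterations $x_{n+1} = F_y(x_n) = x_n - A[f(x_n) - y]$ with the fixed operator $A = [f'(a)]^{-1}$. The Python sketch below runs this iteration for the hypothetical map $f(x_1, x_2) = (x_1 + 0.3\sin x_2,\ x_2 + 0.2x_1^2)$ on $\mathbb{R}^2$ with $a = o$, so that $b = o$ and $f'(a) = \begin{pmatrix}1 & 0.3\\ 0 & 1\end{pmatrix}$; the map, the target $y$ and the stopping tolerance are illustrative choices, not taken from the text.

```python
import math


def f(x):
    # Hypothetical C^1 map of R^2 with f(o) = o.
    return [x[0] + 0.3 * math.sin(x[1]), x[1] + 0.2 * x[0] ** 2]


# A = [f'(o)]^{-1}, where f'(o) = [[1, 0.3], [0, 1]].
A = [[1.0, -0.3], [0.0, 1.0]]


def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def local_inverse(y, x0=(0.0, 0.0), tol=1e-14, max_iter=200):
    """Fixed point of F_y(x) = x - A[f(x) - y], cf. (4.1.1)."""
    x = list(x0)
    for _ in range(max_iter):
        residual = [fx - yi for fx, yi in zip(f(x), y)]
        correction = apply(A, residual)
        x_new = [xi - ci for xi, ci in zip(x, correction)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence; y may be too far from b = f(a)")


y = [0.05, -0.02]
x = local_inverse(y)
print("g(y) =", x, " f(g(y)) =", f(x))
```

Unlike the Newton iterations of Appendix 3.2A, the operator $A$ is never recomputed, so convergence is only linear, with rate governed by $\|A\|\sup\|f'(\xi) - f'(a)\|$ as in (4.1.2).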

Corollary 4.1.2. Let $X$, $Y$ be Banach spaces, $G$ an open subset of $X$, $f \in C^1(G, Y)$. If $f'(x)$ is an isomorphism of $X$ onto $Y$ for all $x \in G$, then $f(G)$ is an open subset of $Y$.
Proof. Use the definition of an open set and Theorem 4.1.1.


Example 4.1.3. If $f \in C^k(G)$, $k \in \mathbb{N}$, then $g \in C^k(\mathcal{V})$. This follows easily from the formula
$$g'(y) = [f'(x)]^{-1}, \quad x = g(y),$$
the Chain Rule and Exercise 3.2.32.
Definition 4.1.4. Let $X$, $Y$ be Banach spaces. Then $f: X \to Y$ is called a diffeomorphism of $G \subset X$ (or a diffeomorphism of $G$ onto $H = f(G)$) if the following conditions are satisfied:
(1) $G$ is an open set in $X$, $f \in C^1(G)$,
(2) $f(G) = H$ is an open set in $Y$,
(3) $f$ is injective on $G$ and the inverse $g = (f|_G)^{-1}$ belongs to $C^1(H)$.
If, moreover, $f \in C^k(G)$ for some $k \in \mathbb{N}$, and (therefore) $g \in C^k(H)$, then $f$ is called a $C^k$-diffeomorphism.
A diffeomorphism in $\mathbb{R}^M$ can be viewed as a nonlinear generalization of a linear invertible operator $A: \mathbb{R}^M \to \mathbb{R}^M$. Such an $A$ yields a linear transformation of coordinates
$$y = Ax.$$
If $\Phi$ is a diffeomorphism of $G$ onto $H$ and $a \in G$, we can suppose without loss of generality that
$$\Phi(a) = o$$
(if this is not true, consider a new diffeomorphism on $G$: $\tilde\Phi(x) = \Phi(x) - \Phi(a)$). Then the Cartesian coordinates $y_1, \dots, y_M$ of $y = \Phi(x)$ can be taken as (generalized or nonlinear or non-Cartesian) coordinates of a point $x$ in the neighborhood $G$ of $a$. Such coordinates play an important role in problems where we have to work on non-flat domains (e.g., on nonlinear manifolds, see Appendix 4.3A).
Notice that we can also interpret Theorem 4.1.1 in the finite-dimensional case as follows:
The Cartesian coordinates $(y_1, \dots, y_M)$ of the point $y = f(x)$ are nonlinear coordinates of the point $x$. In these nonlinear coordinates the diffeomorphism $f$ of $\mathcal{U}$ is equal to the identity map.
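Example 4.1.5. Standard examples of nonlinear coordinates:
(i) Polar coordinates in $\mathbb{R}^2$:
$$x = r\cos\varphi, \qquad y = r\sin\varphi$$
($\Phi(r, \varphi) = (x, y)$ is a diffeomorphism of $(0, \infty) \times (\alpha, \alpha + 2\pi)$ onto $\mathbb{R}^2$ without a half line);
(ii) Spherical coordinates in $\mathbb{R}^3$:
$$x = r\cos\varphi_1\cos\varphi_2, \qquad y = r\sin\varphi_1\cos\varphi_2, \qquad z = r\sin\varphi_2;$$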


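(iii) Spherical coordinates in $\mathbb{R}^M$:
$$\begin{aligned}
x_1 &= r\cos\varphi_1\cos\varphi_2\cdots\cos\varphi_{M-1}, \\
x_2 &= r\sin\varphi_1\cos\varphi_2\cdots\cos\varphi_{M-1}, \\
x_3 &= r\sin\varphi_2\cos\varphi_3\cdots\cos\varphi_{M-1}, \\
&\ \ \vdots \\
x_{M-1} &= r\sin\varphi_{M-2}\cos\varphi_{M-1}, \\
x_M &= r\sin\varphi_{M-1}.
\end{aligned}$$
Before using the Local Inverse Function Theorem we have to check that the functions
$$\Phi_i(r, \varphi_1, \dots, \varphi_{M-1}) = x_i, \quad i = 1, \dots, M,$$
have continuous partial derivatives (obvious) and that their Jacobi matrix is regular. Equivalently, the determinant $J$ of the Jacobi matrix is nonzero at a point $(\tilde r, \tilde\varphi_1, \dots, \tilde\varphi_{M-1})$, $\Phi(\tilde r, \tilde\varphi_1, \dots, \tilde\varphi_{M-1}) = a$. Here
$$J = r^{M-1} \prod_{k=1}^{M-2} \cos^k \varphi_{k+1}, \quad M \ge 2.{}^{1}$$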
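The formula for $J$ can be checked numerically. The Python sketch below implements the map $\Phi$ of (iii), approximates its Jacobi matrix by central differences, computes the determinant by Gaussian elimination with partial pivoting, and compares it with $r^{M-1}\prod_{k=1}^{M-2}\cos^k\varphi_{k+1}$. The step size and the test points are arbitrary illustrative choices.

```python
import math


def spherical(q):
    """Map (r, phi_1, ..., phi_{M-1}) to (x_1, ..., x_M) as in Example 4.1.5 (iii)."""
    r, phis = q[0], q[1:]            # phis[j - 1] stores phi_j
    m = len(phis) + 1
    xs = []
    for i in range(1, m + 1):
        # x_1 starts with cos(phi_1); x_i (i >= 2) starts with sin(phi_{i-1}).
        value = r * (math.cos(phis[0]) if i == 1 else math.sin(phis[i - 2]))
        for j in range(max(i, 2), m):  # then cos(phi_j) for j = max(i, 2), ..., M - 1
            value *= math.cos(phis[j - 1])
        xs.append(value)
    return xs


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for i in range(c + 1, n):
            factor = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= factor * a[c][j]
    return result


def jacobian_det(q, eps=1e-6):
    """Determinant of the central-difference Jacobi matrix d x_i / d q_k."""
    n = len(q)
    cols = []
    for k in range(n):
        plus, minus = list(q), list(q)
        plus[k] += eps
        minus[k] -= eps
        cols.append([(u - v) / (2 * eps)
                     for u, v in zip(spherical(plus), spherical(minus))])
    return det([[cols[k][i] for k in range(n)] for i in range(n)])


def formula(q):
    """J = r^{M-1} * prod_{k=1}^{M-2} cos^k(phi_{k+1})."""
    r, phis = q[0], q[1:]
    m = len(phis) + 1
    value = r ** (m - 1)
    for k in range(1, m - 1):
        value *= math.cos(phis[k]) ** k   # phis[k] stores phi_{k+1}
    return value


POINTS = ([1.3, 0.4], [0.8, 0.4, -0.7], [1.3, 0.4, -0.7, 0.9],
          [2.0, 1.1, 0.3, -0.5, 1.2])
for q in POINTS:
    print(len(q), jacobian_det(q), formula(q))
```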

Example 4.1.6. The following question concerning the assumptions of Theorem 4.1.1 naturally arises: What happens if $f'(a)$ is not an isomorphism?
In the case of finite dimension, $f'(a)$ cannot be an isomorphism for $f: \mathbb{R}^M \to \mathbb{R}^N$ whenever $M \ne N$. If $M > N$, i.e., the number of equations is smaller than the number of variables, then we can expect to be able to solve for some of the variables (we recommend considering the case of a linear $f$). The simplest case is solved in the next Section 4.2 (the Implicit Function Theorem). If $M < N$, then $f(G)$ will probably be a thin subset of $\mathbb{R}^N$. This case leads to the notion of a (differentiable) manifold (see the first part of Section 4.3 (Definition 4.3.4) and Appendix 4.3A).
If both $X$ and $Y$ have infinite dimension, it can occur that $f'(a)$ is injective but $\operatorname{Im} f'(a)$ is a dense subset of $Y$ different from $Y$. In this case, $A = [f'(a)]^{-1}$ exists but it is not continuous into $X$. We can also explain this situation as follows.
If there is a constant c > 0 such that
f (a)hY chX

for all h X,

(4.1.4)

then f (a) is injective, Y1  Im f (a) is a closed subspace of Y . Moreover, if we


know that Y1 is dense in Y , then Y1 = Y and Theorem 4.1.1 can be applied. But
sometimes we are able to prove only a weaker estimate, namely that there is a
constant c > 0 such that
f (a)hY chX
1 We

use the notation

p
0
j=1

0
aj  a1 ap (  1).

for all h X


Chapter 4. Local Properties of Differentiable Mappings

where $|\cdot|_X$ is a weaker norm than $\|\cdot\|_X$. By this we mean that only the estimate
$$|h|_X \le d\|h\|_X \quad \text{holds for all } h \in X$$
(e.g., $X = C^1[0,1]$, $\|h\|_X = \sup_{t\in[0,1]}|h(t)| + \sup_{t\in[0,1]}|h'(t)|$, $|h|_X = \sup_{t\in[0,1]}|h(t)|$). Then $A := [f'(a)]^{-1}$ maps $Y$ continuously into the completion of $X$ with respect to the norm $|\cdot|_X$ (remember that we need complete spaces for the Contraction Principle).
In the above example one derivative is lost in an iteration. One idea for overcoming this problem is to use an approximation of $A$ together with a more rapidly convergent iteration process (e.g., the Newton iteration, see Appendix 3.2A) to compensate for the errors in the approximations of $A$ (results of this type are the so-called Hard Local Inverse/Implicit Function Theorems); see, e.g., Deimling [34], Hamilton [65], Moser [97], Nash [98] or Nirenberg [100].
We now turn to a global version of the Inverse Function Theorem.
Theorem 4.1.7 (Global Inverse Function Theorem). Let $X$, $Y$ be Banach spaces and let $f\colon X \to Y$ be continuously differentiable on $X$. Suppose that $f'(x)$ is continuously invertible for all $x \in X$ and there is a constant $c > 0$ such that
$$\|[f'(x)]^{-1}\|_{L(Y,X)} \le c \quad \text{for all } x \in X.$$
Then $f$ is a diffeomorphism of $X$ onto $Y$.
Proof. It is sufficient to prove that $f$ is injective and surjective. The statement on the diffeomorphism then follows from Theorem 4.1.1. Fix an $a \in X$ and denote $b = f(a)$.
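Step 1. The map $f$ is surjective, i.e., $f(X) = Y$. To see this choose $y \in Y$ and put
$$\psi(t) = (1-t)b + ty, \qquad t \in [0,1].$$
We wish to show that there is a curve $\varphi\colon [0,1] \to X$ such that
$$f(\varphi(t)) = \psi(t), \quad\text{in particular,}\quad y = \psi(1) = f(\varphi(1)).$$
Since $f$ is locally invertible at $a \in X$ (Theorem 4.1.1), there is a neighborhood $U$ of $a$ and $\delta > 0$ such that $\varphi(t) = (f|_U)^{-1}(\psi(t))$ is well defined for $t \in [0,\delta)$ and $\varphi \in C^1([0,\delta), X)$. Let
$$\mathcal A := \{\tau \in [0,1] : \exists\,\varphi \in C^1([0,\tau], X),\ f(\varphi(t)) = \psi(t),\ t \in [0,\tau]\}, \tag{4.1.5}$$
and $\tilde\tau = \sup\mathcal A$. Notice that $\varphi$ is uniquely determined by (4.1.5) (this follows from the local invertibility of $f$), and therefore there is $\varphi \in C^1([0,\tilde\tau), X)$ such that $f(\varphi(t)) = \psi(t)$, $t \in [0,\tilde\tau)$. Differentiating this identity gives $\varphi'(t) = [f'(\varphi(t))]^{-1}(y - b)$, hence $\|\varphi'(t)\| \le c\|y - b\|$ and
$$\|\varphi(t_1) - \varphi(t_2)\| \le \sup_{t\in[t_1,t_2]}\|\varphi'(t)\|\,|t_1 - t_2| \le c\|y - b\|\,|t_1 - t_2|$$
for all $t_1, t_2 \in [0,\tilde\tau)$. Thus the mapping $\varphi$ is uniformly continuous on the interval $[0,\tilde\tau)$, hence
$$\lim_{t\to\tilde\tau^-}\varphi(t) =: \varphi(\tilde\tau)$$
exists ($X$ is a complete space) and the equality $f(\varphi(t)) = \psi(t)$ holds for all $t \in [0,\tilde\tau]$.
Now we are ready to prove that $\tilde\tau = 1$. Indeed, if $\tilde\tau < 1$, then we can apply Theorem 4.1.1 at the point $\varphi(\tilde\tau)$ to obtain a contradiction with the definition of $\tilde\tau$.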
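Step 2. The map $f$ is injective. Suppose by contradiction that there are different $x_1, x_2 \in X$ for which
$$f(x_1) = f(x_2).$$
Put
$$y := f(x_2), \qquad \varphi_i(t) = (1-t)a + tx_i, \qquad \psi_i(t) = f(\varphi_i(t)), \qquad t \in [0,1],\ i = 1,2.$$
By a slight modification of the above procedure it is possible to prove the existence of a mapping $G\colon [0,1]\times[0,1] \to X$ such that
$$f(G(t,s)) = (1-s)\psi_1(t) + s\psi_2(t), \qquad (t,s) \in [0,1]\times[0,1],$$
with $G(t,0) = \varphi_1(t)$ and $G(t,1) = \varphi_2(t)$. Then
$$f(G(1,s)) = (1-s)f(x_1) + sf(x_2) = y \quad \text{for every } s \in [0,1].$$
Thus $f$ is constant on the continuous curve $s \mapsto G(1,s)$ joining $x_1$ and $x_2$. This contradicts the local invertibility of $f$ at $x_1$ ($\ne x_2$).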
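Step 1 of the proof is constructive: it follows the preimage of the segment $\psi(t) = (1-t)b + ty$. The Python sketch below imitates this numerically. It steps $t$ from $0$ to $1$ and corrects each point with a few Newton iterations. This predictor-corrector scheme is our choice and is not the argument used in the proof. The map $f(x_1,x_2) = (x_1 + \frac12\sin x_2,\ x_2 + \frac12\sin x_1)$ is a hypothetical example. For it, $\det f'(x) = 1 - \frac14\cos x_1\cos x_2 \ge \frac34$, so $\|[f'(x)]^{-1}\|$ is bounded on $\mathbb R^2$ and Theorem 4.1.7 applies.

```python
import math

def f(x):
    # hypothetical example map R^2 -> R^2 (not from the text)
    return (x[0] + 0.5 * math.sin(x[1]), x[1] + 0.5 * math.sin(x[0]))

def df(x):
    # Jacobi matrix f'(x); det f'(x) = 1 - cos(x0) cos(x1) / 4 >= 3/4
    return ((1.0, 0.5 * math.cos(x[1])), (0.5 * math.cos(x[0]), 1.0))

def solve2(m, rhs):
    """Solve the 2x2 linear system m z = rhs by Cramer's rule."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return ((s * rhs[0] - q * rhs[1]) / det, (p * rhs[1] - r * rhs[0]) / det)

def continuation(a, y, steps=50, newton_iters=5):
    """Approximate phi(1), where f(phi(t)) = (1 - t) f(a) + t y and phi(0) = a."""
    b = f(a)
    x = a
    for n in range(1, steps + 1):
        t = n / steps
        target = ((1 - t) * b[0] + t * y[0], (1 - t) * b[1] + t * y[1])
        # corrector: Newton's method for f(x) = psi(t), started at the previous point
        for _ in range(newton_iters):
            fx = f(x)
            dx = solve2(df(x), (target[0] - fx[0], target[1] - fx[1]))
            x = (x[0] + dx[0], x[1] + dx[1])
    return x

y = (3.0, -2.0)
x = continuation((0.0, 0.0), y)
print(x, f(x))
```

Starting the continuation from a different point $a$ should lead to the same preimage of $y$. This is a numerical reflection of Step 2 (injectivity).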

Exercise 4.1.8. A complex function $f\colon \mathbb C \to \mathbb C$ is called holomorphic in an open set $G \subset \mathbb C$ if $f'(z)$ exists for every $z \in G$. If $f'(a) \ne 0$ for an $a \in G$, then $f$ is locally invertible (Theorem 4.1.1). Prove that $(f|_U)^{-1}$ is holomorphic and apply this result to $f(z) = e^z$ to obtain a power series expression of a continuous branch of the multivalued function $\log$. (For the "complex function" proof see, e.g., Rudin [113, Theorem 10.30].)
Exercise 4.1.9. Let
$$f(x) = x + 2x^2\sin\frac{1}{x}, \quad x \ne 0, \qquad f(0) = 0.$$
Show that $f$ is not injective on any neighborhood of zero. Which assumption of Theorem 4.1.1 is not satisfied?
Hint. If $U$ is a neighborhood of $0$, show that $f'(x) = 0$ has a solution in $U$ and $f''(x) \ne 0$ at any such solution. Hence $f$ is not injective on $U$. Note also that $f'$ is not continuous at $0$.
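For $x \ne 0$ we have $f'(x) = 1 + 4x\sin\frac1x - 2\cos\frac1x$, while $f'(0) = 1$. At $x = \frac{1}{2k\pi}$ and $x = \frac{1}{(2k+1)\pi}$ the derivative equals $-1$ and $3$, respectively. By the intermediate value theorem, $f'$ therefore vanishes somewhere in every interval $(0, \frac{1}{2k\pi}]$. The short Python check below (illustrative only, with names of our choosing) evaluates $f'$ at these points.

```python
import math

def fprime(x):
    """f'(x) for f(x) = x + 2x^2 sin(1/x), x != 0; note that f'(0) = 1."""
    return 1 + 4 * x * math.sin(1 / x) - 2 * math.cos(1 / x)

def sign_change_points(k):
    """Two points in (0, 1/(2k*pi)] where f' has opposite signs."""
    x_neg = 1 / (2 * k * math.pi)        # cos(1/x) = 1, sin(1/x) = 0: f'(x) = -1 up to rounding
    x_pos = 1 / ((2 * k + 1) * math.pi)  # cos(1/x) = -1, sin(1/x) = 0: f'(x) = 3 up to rounding
    return x_neg, x_pos

for k in (1, 10, 1000):
    x_neg, x_pos = sign_change_points(k)
    print(f"k={k}: f'({x_neg:.2e}) = {fprime(x_neg):+.3f}, f'({x_pos:.2e}) = {fprime(x_pos):+.3f}")
```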


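Exercise 4.1.10. Find the form of the Laplace operator
$$\Delta u := \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$$
in polar coordinates in the set
$$G = \{(x,y) \in \mathbb R^2 : x^2 + y^2 > 0\} \quad \text{for } u \in C^2(G).$$
Hint. If $v(r,\varphi) = u(r\cos\varphi, r\sin\varphi)$, then
$$(\Delta u)\circ\Phi = \frac{\partial^2 v}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2 v}{\partial\varphi^2} + \frac{1}{r}\frac{\partial v}{\partial r},$$
where $\Phi(r,\varphi) = (r\cos\varphi, r\sin\varphi)$ is the transformation. Note that we have
$$\left(\frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right)\circ\Phi = \left(\frac{\partial v}{\partial r}, \frac{\partial v}{\partial\varphi}\right)(\Phi')^{-1}.$$
It is more convenient to use this formula once again to compute $\frac{\partial^2 u}{\partial x^2}$, $\frac{\partial^2 u}{\partial y^2}$.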
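The formula in the hint can be checked numerically before it is derived. The Python sketch below uses a hypothetical test function $u(x,y) = x^3y + e^x\cos y$. For this function $\Delta u = 6xy$ exactly, because $e^x\cos y$ is harmonic. The code compares this value with central finite differences of $v(r,\varphi) = u(r\cos\varphi, r\sin\varphi)$ inserted into the polar form. This is a consistency check, not a proof, and the step size `h` is our choice.

```python
import math

def u(x, y):
    # hypothetical test function; Delta u = 6xy since e^x cos y is harmonic
    return x ** 3 * y + math.exp(x) * math.cos(y)

def v(r, phi):
    return u(r * math.cos(phi), r * math.sin(phi))

def polar_laplacian(r, phi, h=1e-4):
    """v_rr + v_phiphi / r^2 + v_r / r, all derivatives by central differences."""
    v0 = v(r, phi)
    v_rr = (v(r + h, phi) - 2 * v0 + v(r - h, phi)) / h ** 2
    v_pp = (v(r, phi + h) - 2 * v0 + v(r, phi - h)) / h ** 2
    v_r = (v(r + h, phi) - v(r - h, phi)) / (2 * h)
    return v_rr + v_pp / r ** 2 + v_r / r

r, phi = 1.2, 0.7
x, y = r * math.cos(phi), r * math.sin(phi)
print(polar_laplacian(r, phi), 6 * x * y)
```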

Exercise 4.1.11. Show that the estimate
$$\|[f'(x)]^{-1}\|_{L(Y,X)} \le c + d\|x\|_X$$
is sufficient in Theorem 4.1.7.
Hint. Use the Gronwall Lemma (Exercise 5.1.16) to estimate $\|\varphi'(t)\|$.

4.2 Implicit Function Theorem


Let us start with a simple example of $f\colon \mathbb R^2 \to \mathbb R$, e.g.,
$$f(x,y) := x^2 + y^2 - 1.$$
Denote
$$\mathcal M = \{(x,y) \in \mathbb R^2 : f(x,y) = 0\},$$
i.e., $\mathcal M$ is the unit circle in $\mathbb R^2$. We would like to solve the equation
$$f(x,y) = 0$$
for the unknown variable $y$, or to express $\mathcal M$ as the graph of a certain function $y = \varphi(x)$.
We immediately see that for any $x \in (-1,1)$ there is a pair of $y$'s ($y_{1,2} = \pm\sqrt{1-x^2}$) such that $(x,y) \in \mathcal M$. In particular, $\mathcal M$ is not the graph of any function $y = \varphi(x)$. We can only obtain that $\mathcal M$ is locally a graph, i.e., for $(a,b) \in \mathcal M$, $a \in (-1,1)$, there is a neighborhood $U$ of $(a,b)$ such that $\mathcal M \cap U$ is the graph of a function $y = \varphi(x)$. On the other hand, for $x = 1$ there is a unique $y$ ($y = 0$) for which $(x,y) \in \mathcal M$. But there is no neighborhood $U$ of $(1,0)$ such that $\mathcal M \cap U$ is the graph of a function $y = \varphi(x)$. What is the difference between these two cases?
In the former case the tangent line to $\mathcal M \cap U$ exists at the point $(a,b)$ with the slope $\varphi'(a)$. Since
$$f(x,\varphi(x)) = 0 \quad \text{for } x \in (a-\delta, a+\delta),$$
we have (formally by the Chain Rule)
$$\frac{\partial f}{\partial x}(a,b) + \frac{\partial f}{\partial y}(a,b)\,\varphi'(a) = 0, \tag{4.2.1}$$
i.e., $\varphi'(a) = -\frac{a}{b}$, since $\frac{\partial f}{\partial y}(a,b) = 2b \ne 0$.
In the latter case, where $(a,b) = (1,0)$, we have $\frac{\partial f}{\partial y}(1,0) = 0$, and $\varphi'(1)$ cannot be determined from (4.2.1). The tangent line to $\mathcal M$ at the point $(1,0)$ is parallel to the $y$-axis, which indicates some problems with determining a solution, i.e., the (implicit) function $\varphi$. The reader is invited to sketch a figure.
This discussion shows the importance of the assumption
$$\frac{\partial f}{\partial y}(a,b) \ne 0.$$
How can this assumption be generalized to $f\colon \mathbb R^{M+N} \to \mathbb R^N$? A brief inspection of the linear case leads to the observation that we can compute the unknowns $y_{M+1},\dots,y_{M+N}$ from the equations
$$f_i(y_1,\dots,y_{M+N}) = \sum_{j=1}^{M+N} a_{ij}y_j = 0, \qquad i = 1,\dots,N,$$
uniquely as functions of $y_1,\dots,y_M$ if and only if
$$\det\,(a_{ij})_{\substack{i=1,\dots,N\\ j=M+1,\dots,M+N}} \ne 0.$$
Since $a_{ij} = \frac{\partial f_i}{\partial y_j}$, the condition on the regularity of the matrix $(a_{ij})_{i=1,\dots,N;\ j=M+1,\dots,M+N}$ means that the partial (Fréchet) derivative of $f$ (see Definition 3.2.17) with respect to the last $N$ variables is an isomorphism of $\mathbb R^N$.


Theorem 4.2.1 (Implicit Function Theorem). Let $X$, $Y$, $Z$ be Banach spaces, $f\colon X\times Y \to Z$. Let $(a,b) \in X\times Y$ be a point such that
$$f(a,b) = o.$$
Let $G$ be an open set in $X\times Y$ containing the point $(a,b)$. Let $f \in C^1(G)$ and let the partial Fréchet derivative $f'_2(a,b)$ be an isomorphism of $Y$ onto $Z$.
Then there are neighborhoods $U$ of $a$ and $V$ of $b$ such that for any $x \in U$ there exists a unique $y \in V$ for which
$$f(x,y) = o.$$
Denote this $y$ by $\varphi(x)$. Then $\varphi \in C^1(U)$. Moreover, if $f \in C^k(G)$, $k \in \mathbb N$, then $\varphi \in C^k(U)$.
Proof. We denote $A := [f'_2(a,b)]^{-1}$ and define
$$F(x,y) = (x, Af(x,y)), \qquad (x,y) \in G.$$
Then $F\colon X\times Y \to X\times Y$, $F \in C^1(G)$ and
$$F'(a,b)(h,k) = (h, Af'(a,b)(h,k)) = (h, Af'_1(a,b)h + k).$$
Hence $F'(a,b)$ is an isomorphism of $X\times Y$ onto itself, with the inverse $(p,q) \mapsto (p, q - Af'_1(a,b)p)$. We can therefore apply Theorem 4.1.1 to get neighborhoods $U\times V$ of $(a,b)$ and $\tilde U\times\tilde V$ of $(a,o)$ such that for any $\xi \in \tilde U$ and $\eta = o \in \tilde V$ there exists a unique $(x,y) \in U\times V$ such that
$$F(x,y) = (x, Af(x,y)) = (\xi, o),$$
i.e.,
$$x = \xi, \qquad \tilde U = U,$$
and, denoting $y = \varphi(x)$,
$$f(x,\varphi(x)) = o.$$
This means that
$$F^{-1}(x,o) = (x,\varphi(x)).$$
Since the inverse $F^{-1}$ is differentiable, by Theorem 4.1.1 we conclude that $\varphi \in C^1(U)$.

Remark 4.2.2. We can also deduce a formula for $\varphi'(x)$. Indeed, since
$$f(x,\varphi(x)) = o \quad \text{for every } x \in U$$
and both the functions $f$ and $\varphi$ are differentiable, we get from the Chain Rule
$$f'_1(x,\varphi(x)) + f'_2(x,\varphi(x))\,\varphi'(x) = o,$$
and therefore
$$\varphi'(x) = -[f'_2(x,\varphi(x))]^{-1}f'_1(x,\varphi(x)) \quad \text{for } x \in U_1, \tag{4.2.2}$$
where $U_1 \subset U$ may be smaller if necessary in order to guarantee the existence of the inverse
$$[f'_2(x,\varphi(x))]^{-1}$$
(see Exercise 2.1.33).
Remark 4.2.3. The statement of Theorem 4.2.1 is by no means the best possible. If we used the Contraction Principle directly, we would obtain the existence of a solution $y = \varphi(x)$ under weaker assumptions. Namely, $f(x,y) = o$ is equivalent to
$$y = y - Af(x,y),$$
and since $x$ is a parameter here, we do not need to assume differentiability with respect to $x$ if we content ourselves with the existence of $\varphi$ (and give up its differentiability). We recommend that the reader use the Contraction Principle directly to obtain the following statement:

Let $X$ be a normed linear space, $Y$, $Z$ Banach spaces and let $f\colon X\times Y \to Z$ be continuous at the point $(a,b)$ where
$$f(a,b) = o.$$
Assume that the partial Fréchet derivative $f'_2(a,b)$ is an isomorphism of $Y$ onto $Z$ and $f'_2\colon X\times Y \to L(Y,Z)$ is continuous at $(a,b)$. Then there are neighborhoods $U$ of $a$ and $V$ of $b$ such that for any $x \in U$ there is a unique $y = \varphi(x) \in V$ for which
$$f(x,\varphi(x)) = o.$$
Moreover, $\varphi$ is continuous at $a$.

It is also possible to partly avoid the requirement of invertibility of $f'_2(a,b)$ (see Example 4.1.6 and the references given there).
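The fixed-point form $y = y - Af(x,y)$ from Remark 4.2.3 can be tried out on the circle example at the beginning of this section. The Python sketch below is our illustration, and the names in it are ours. It iterates $y \mapsto y - Af(x,y)$ with the frozen operator $A = [\partial f/\partial y(a,b)]^{-1} = 1/(2b)$. It then compares the finite-difference slope of the resulting function with formula (4.2.2), which here reads $\varphi'(x) = -x/\varphi(x)$.

```python
import math

def f(x, y):
    return x * x + y * y - 1.0       # the unit circle from the beginning of Section 4.2

def implicit_y(x, b, iters=60):
    """Solve f(x, y) = 0 for y near b by y <- y - A f(x, y) with the frozen A = 1/(2b)."""
    A = 1.0 / (2.0 * b)              # A = [f_2(a, b)]^{-1}, since f_2(a, b) = 2b
    y = b
    for _ in range(iters):
        y = y - A * f(x, y)
    return y

a, b = 0.6, 0.8                      # a point (a, b) of the circle with b != 0
x = 0.65
y = implicit_y(x, b)
h = 1e-5
slope_fd = (implicit_y(x + h, b) - implicit_y(x - h, b)) / (2 * h)
slope_ift = -x / y                   # (4.2.2): -f_1 / f_2 = -2x / (2y)
print(y, math.sqrt(1 - x * x), slope_fd, slope_ift)
```

Near $(a,b)$ the map $y \mapsto y - Af(x,y)$ has derivative $1 - y/b$, which is small. This is why the iteration contracts, and it converges to the upper branch $y = \sqrt{1-x^2}$.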
There are many examples in Calculus where the Implicit Function Theorem is used. We give one in Exercise 4.2.9; see also the exercises in Dieudonné [35]. Our attention is turned mainly towards more theoretical applications.
Example 4.2.4. Let
$$P(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_0$$
be a polynomial with real or complex coefficients $a_0,\dots,a_{n-1}$. The famous Fundamental Theorem of Algebra says that if $n \ge 1$, then the equation $P(z) = 0$ has at least one solution $z \in \mathbb C$, and actually $n$ solutions if all of them are counted with their multiplicity. This means that $P$ can be factorized as follows:
$$P(z) = (z-z_1)^{k_1}\cdots(z-z_l)^{k_l}, \qquad k_1 + \cdots + k_l = n,$$
where $z_1,\dots,z_l$ are different. A natural question arises: How do these solutions $z_1,\dots,z_l$ depend on the coefficients $a_0,\dots,a_{n-1}$ of $P$? Let
$$F(z,y_0,\dots,y_{n-1}) = z^n + y_{n-1}z^{n-1} + \cdots + y_0\colon\ \mathbb C\times\mathbb C^n \to \mathbb C.$$
Then
$$F(z_1,a_0,\dots,a_{n-1}) = P(z_1) = 0 \quad\text{and}\quad F \in C^\infty(\mathbb C\times\mathbb C^n).$$
If $z_1$ is a simple root, i.e., $k_1 = 1$, then
$$\frac{\partial F}{\partial z}(z_1,a_0,\dots,a_{n-1}) \ne 0,$$
and the Implicit Function Theorem says that $z_1$ depends continuously on $a_0,\dots,a_{n-1}$ (also in the real case). But what happens if $k_1 > 1$? Notice that the cases of real and complex roots are different. In the former case, the real root can disappear ($x^2 + \varepsilon = 0$ for $\varepsilon > 0$), and in the latter case, the uniqueness can be lost. Since the solution $z_1$ ramifies or bifurcates at $(a_0,\dots,a_{n-1})$, this phenomenon is called a bifurcation. We postpone a basic discussion of this very important nonlinear phenomenon till the end of the next section.
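For a simple root, formula (4.2.2) gives the sensitivity explicitly: $\partial z_1/\partial a_0 = -(\partial F/\partial y_0)/(\partial F/\partial z) = -1/P'(z_1)$. The Python sketch below is our illustration, and the polynomial $z^2 - 3z + 2 = (z-1)(z-2)$ is an arbitrary choice. The code tracks the root $z_1 = 1$ with Newton's method while $a_0$ is perturbed, and compares a finite-difference slope with $-1/P'(z_1)$. It also shows that $\partial F/\partial z$ vanishes at the double root of $(z-1)^2$, which is where the theorem stops applying.

```python
def poly(coeffs, z):
    """P(z) = z^n + c_{n-1} z^{n-1} + ... + c_0 with coeffs = [c_0, ..., c_{n-1}]."""
    n = len(coeffs)
    return z ** n + sum(c * z ** k for k, c in enumerate(coeffs))

def dpoly(coeffs, z):
    """dP/dz, i.e. the partial derivative dF/dz at (z, c_0, ..., c_{n-1})."""
    n = len(coeffs)
    return n * z ** (n - 1) + sum(k * c * z ** (k - 1) for k, c in enumerate(coeffs) if k > 0)

def track_root(coeffs, z0, iters=30):
    """Newton's method started at z0; works for real or complex coefficients."""
    z = z0
    for _ in range(iters):
        z = z - poly(coeffs, z) / dpoly(coeffs, z)
    return z

base = [2.0, -3.0]                       # P(z) = z^2 - 3z + 2
z1 = track_root(base, 1.1)               # the simple root z1 = 1
slope_ift = -1.0 / dpoly(base, z1)       # dz1/da0 from (4.2.2)
h = 1e-6
slope_fd = (track_root([2.0 + h, -3.0], z1) - track_root([2.0 - h, -3.0], z1)) / (2 * h)
print(z1, slope_ift, slope_fd)
print(dpoly([1.0, -2.0], 1.0))           # (z - 1)^2: dF/dz = 0 at the double root
```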
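Example 4.2.5 (dependence of solutions on initial conditions). Suppose that $f\colon \mathbb R\times\mathbb R^N \to \mathbb R^N$ is continuous in an open set $G \subset \mathbb R\times\mathbb R^N$ and has continuous partial derivatives with respect to the last $N$ variables in $G$. Denote by $\varphi(\cdot;\tau,\xi)$ the (unique) solution of the initial value problem
$$\dot x = f(t,x), \qquad x(\tau) = \xi$$
(see Theorem 2.3.4). We are now interested in the properties of $\varphi$ with respect to the variables $(\tau,\xi) \in G$, cf. Remark 2.3.5. Let us define
$$\Phi(\tau,\xi,\varphi)(t) := \xi + \int_\tau^t f(s,\varphi(s))\,ds - \varphi(t). \tag{4.2.3}$$
For a fixed $(t_0,x_0) \in G$ the solution $\varphi(\cdot;t_0,x_0)$ of $\Phi(t_0,x_0,\varphi) = o$ is defined on an open interval $J$. Choose a compact interval $I \subset J$ such that $t_0 \in \operatorname{int} I$. Then the mapping $\Phi$ given by (4.2.3) is defined on a certain open subset $H \subset \mathbb R\times\mathbb R^N\times C(I,\mathbb R^N)$ and takes its values in $C(I,\mathbb R^N)$. Further,
$$\Phi(t_0,x_0,\varphi(\cdot;t_0,x_0)) = o$$
and
$$\begin{aligned}
[\Phi'_2(\tau,\xi,\varphi)\eta](t) &= \eta, && \eta \in \mathbb R^N,\\
[\Phi'_1(\tau,\xi,\varphi)](t) &= -f(\tau,\varphi(\tau)),\\
[\Phi'_3(\tau,\xi,\varphi)\psi](t) &= \int_\tau^t f'_2(s,\varphi(s))\psi(s)\,ds - \psi(t), && t \in I,\ \psi \in C(I,\mathbb R^N),
\end{aligned}$$
whenever $(\tau,\xi,\varphi) \in H$. Since these partial Fréchet derivatives are continuous, $\Phi \in C^1(H)$ (see Proposition 3.2.18). The crucial assumption of the Implicit Function Theorem is the continuous invertibility of $\Phi'_3(t_0,x_0,\varphi(\cdot;t_0,x_0))$ in the space $C(I,\mathbb R^N)$. Put
$$(B\psi)(t) = \int_{t_0}^t f'_2(s,\varphi(s;t_0,x_0))\psi(s)\,ds, \qquad \psi \in C(I,\mathbb R^N).$$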

We have proved in Example 2.3.7 that $\sigma(B) = \{0\}$. In particular,
$$B - I = \Phi'_3(t_0,x_0,\varphi(\cdot;t_0,x_0))$$
is continuously invertible. By Theorem 4.2.1, there exist neighborhoods $U$ of $(t_0,x_0)$ and $V$ of $\varphi(\cdot;t_0,x_0)$ such that for any $(\tau,\xi) \in U$ there is a unique $\varphi \in V$ such that
$$\Phi(\tau,\xi,\varphi) = o.$$
Moreover, this $\varphi$ is continuously differentiable with respect to $\tau$ and $\xi$, and for the continuous mappings
$$\alpha(\cdot) := \frac{\partial\varphi}{\partial\tau}(\cdot;t_0,x_0) \quad\text{and}\quad \beta(\cdot) := \frac{\partial\varphi}{\partial\xi}(\cdot;t_0,x_0)$$
we have, by Remark 4.2.2,
$$\begin{aligned}
-f(t_0,x_0) + \int_{t_0}^t f'_2(s,\varphi(s;t_0,x_0))\alpha(s)\,ds - \alpha(t) &= o,\\
\eta + \int_{t_0}^t f'_2(s,\varphi(s;t_0,x_0))\beta(s)\eta\,ds - \beta(t)\eta &= o, \qquad \eta \in \mathbb R^N.
\end{aligned}$$
This means that $\alpha$ and $\beta$ solve the so-called equation in variations
$$\dot y(t) = f'_2(t,\varphi(t;t_0,x_0))\,y(t) \tag{4.2.4}$$
(this is a system of $N$ linear equations for $\alpha$ and a system of $N\times N$ equations for $\beta$) and fulfill the initial conditions
$$\alpha(t_0) = -f(t_0,x_0), \qquad \beta(t_0) = I. \tag{4.2.5}$$
In particular, $\beta(\cdot)$ is a fundamental matrix of (4.2.4).
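The conclusions (4.2.4) and (4.2.5) can be tested numerically in the scalar case $N = 1$. The Python sketch below is our illustration and uses the hypothetical right-hand side $f(t,x) = \sin x + \cos t$. It integrates the solution together with $\alpha$ and $\beta$ by the classical Runge–Kutta method. It then compares $\alpha(t_1)$ and $\beta(t_1)$ with central difference quotients of $\varphi(t_1;\tau,\xi)$ in $\tau$ and in $\xi$.

```python
import math

def f(t, x):
    return math.sin(x) + math.cos(t)        # hypothetical right-hand side, N = 1

def f2(t, x):
    return math.cos(x)                      # partial derivative of f with respect to x

def rk4(rhs, t0, y0, t1, n=2000):
    """Classical Runge-Kutta method for y' = rhs(t, y) on [t0, t1], y a list of floats."""
    h = (t1 - t0) / n
    t, y = t0, list(y0)
    for _ in range(n):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d) for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def augmented(t, y):
    """The solution together with alpha and beta, both solving the equation in variations (4.2.4)."""
    x, alpha, beta = y
    a = f2(t, x)
    return [f(t, x), a * alpha, a * beta]

def solution(tau, xi, t1):
    """phi(t1; tau, xi)."""
    return rk4(lambda t, y: [f(t, y[0])], tau, [xi], t1)[0]

t0, x0, t1 = 0.0, 0.5, 2.0
_, alpha, beta = rk4(augmented, t0, [x0, -f(t0, x0), 1.0], t1)   # initial conditions (4.2.5)
eps = 1e-5
beta_fd = (solution(t0, x0 + eps, t1) - solution(t0, x0 - eps, t1)) / (2 * eps)
alpha_fd = (solution(t0 + eps, x0, t1) - solution(t0 - eps, x0, t1)) / (2 * eps)
print(alpha, alpha_fd, beta, beta_fd)
```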

As an application of differentiability with respect to initial conditions we briefly sketch the approach to orbital stability of periodic solutions.
Example 4.2.6. Assume that we know a non-constant $T$-periodic solution $\varphi_0$ of an autonomous system
$$\dot x = f(x),$$
and that we are interested in the behavior of other solutions which start at time $t = 0$ near $\varphi_0(0) = x_0$. We assume that $f \in C^1(G)$, $G$ is an open set in $\mathbb R^N$, and denote by $\varphi(\cdot,\xi)$ the solution satisfying $\varphi(0,\xi) = \xi$. Let
$$\mathcal M = \{x \in \mathbb R^N : (x - x_0, f(x_0))_{\mathbb R^N} = 0\}.$$
We want to show that a solution $\varphi(\cdot,\xi)$ exists on an interval $[0,t(\xi)]$ such that it meets $\mathcal M \cap U$ ($U$ is a neighborhood of $x_0$) for the first positive time $t(\xi)$ near $T$ ($T$ is the period of $\varphi_0$), see Figure 4.2.1. To this end we can solve the equation
$$\Phi(t,\xi) := (\varphi(t,\xi) - x_0, f(x_0)) = 0$$
in the vicinity of the point $(T,x_0)$. We have
$$\Phi'_1(T,x_0) = (f(x_0), f(x_0)) > 0$$

($f(x_0) \ne o$ since $\varphi_0$ is non-constant) and
$$\Phi'_2(t,x_0)\eta = \left(\frac{\partial\varphi}{\partial\xi}(t,x_0)\eta,\ f(x_0)\right) = (\Psi(t,x_0)\eta, f(x_0))$$
(see the previous example), where $\Psi(t,x_0)$ is a fundamental matrix of the linear $T$-periodic equation $\dot y(t) = f'(\varphi_0(t))y(t)$ (cf. (4.2.4)). So we may use the Implicit Function Theorem to get a function $t(\xi)$ such that
$$t(x_0) = T, \qquad \Phi(t(\xi),\xi) = 0, \qquad \xi \in U(x_0).$$

Figure 4.2.1. [Sketch in $\mathbb R^N$: the periodic orbit $\varphi_0(\cdot)$ through $x_0$, the vector $f(x_0)$, and a solution $\varphi(\cdot,\xi)$ starting at $\xi$ with its return point $\varphi(t(\xi),\xi)$.]

By (4.2.2) we also have
$$\frac{dt}{d\xi}(x_0)\eta = -\frac{1}{\|f(x_0)\|^2}\,(\Psi(T,x_0)\eta, f(x_0)), \qquad \eta \in \mathbb R^N.$$
This allows us to investigate the behavior of the so-called Poincaré mapping
$$P(\xi) := \varphi(t(\xi),\xi), \qquad \xi \in U \cap \mathcal M.$$
The asymptotic orbital stability of $\varphi_0$ can be defined by the requirement
$$\lim_{n\to\infty} P^n(\xi) = x_0.$$
For more detail the interested reader can consult, e.g., Amann [4, Section 23].
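The construction of $t(\xi)$ and of the Poincaré mapping can be imitated numerically. The Python sketch below uses a hypothetical planar system, $\dot x = x - y - x(x^2+y^2)$, $\dot y = x + y - y(x^2+y^2)$, which in polar coordinates reads $\dot r = r - r^3$, $\dot\theta = 1$. Thus $\varphi_0(t) = (\cos t, \sin t)$ is $2\pi$-periodic, $x_0 = (1,0)$, $f(x_0) = (0,1)$, and $\mathcal M$ is the $x$-axis. As in the text, the return time comes from Newton's method for $\Phi(t,\xi) = 0$, started at $t = T$ and using $\Phi'_1(t,\xi) = (f(\varphi(t,\xi)), f(x_0))$. For this system the result can be compared with the exact radius $r(t)^{-2} = 1 + (r(0)^{-2} - 1)e^{-2t}$.

```python
import math

def field(p):
    """Hypothetical planar system; in polar coordinates r' = r - r^3, theta' = 1."""
    x, y = p
    r2 = x * x + y * y
    return [x - y - x * r2, x + y - y * r2]

def flow(p, t, n):
    """RK4 approximation of phi(t, p) with n steps (t may be negative)."""
    h = t / n
    p = list(p)
    for _ in range(n):
        k1 = field(p)
        k2 = field([p[i] + h / 2 * k1[i] for i in range(2)])
        k3 = field([p[i] + h / 2 * k2[i] for i in range(2)])
        k4 = field([p[i] + h * k3[i] for i in range(2)])
        p = [p[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
    return p

X0 = (1.0, 0.0)                 # x0 = phi_0(0) for phi_0(t) = (cos t, sin t)
F0 = field(X0)                  # f(x0) = (0, 1), normal vector of the section M
T = 2 * math.pi                 # period of phi_0

def poincare(xi, newton_steps=4):
    """Return (P(xi), t(xi)), where Phi(t(xi), xi) = 0 and t(xi) is near T."""
    t, p = T, flow(xi, T, 4000)
    for _ in range(newton_steps):
        value = (p[0] - X0[0]) * F0[0] + (p[1] - X0[1]) * F0[1]   # Phi(t, xi)
        fp = field(p)
        slope = fp[0] * F0[0] + fp[1] * F0[1]                     # Phi'_1(t, xi)
        dt = -value / slope
        p, t = flow(p, dt, 20), t + dt
    return p, t

p, t = poincare((1.1, 0.0))
q, _ = poincare(p)
print(t, p, q)                  # P^2(xi) is already very close to x0
```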


We are often interested in the asymptotic behavior of solutions of a system of ordinary differential equations (linear or nonlinear), e.g., boundedness of solutions or their convergence to some special solutions (constant, periodic, etc.). In the following example we briefly sketch a method which can be used.
Example 4.2.7. Consider the equation
$$\dot x = Ax + f \tag{4.2.6}$$
where $A$ is a constant $N\times N$ matrix and $f\colon \mathbb R \to \mathbb R^N$ is bounded and continuous on $\mathbb R$ ($f \in BC(\mathbb R,\mathbb R^N)$). We are interested in bounded solutions of (4.2.6) on $\mathbb R$. Let us assume $\sigma(A) \cap i\mathbb R = \emptyset$. With the help of Functional Calculus (Theorem 1.1.38, in particular, Remark 1.1.39(i), (ii)) we can construct two projections $P^+$, $P^-$ onto complementary subspaces $X^+$, $X^-$ of $\mathbb R^N$ which commute with $\mathcal A \in L(\mathbb R^N)$ ($A$ is the matrix representation of $\mathcal A$ in the standard basis) and such that
$$\sigma(A^-) = \sigma(A) \cap \{z \in \mathbb C : \operatorname{Re} z < 0\}, \qquad \sigma(A^+) = \sigma(A) \cap \{z \in \mathbb C : \operatorname{Re} z > 0\}$$
($A^+$, $A^-$ are the restrictions of $\mathcal A$ to $X^+$, $X^-$, respectively). With the help of the Variation of Constants Formula it can be proved that for any $f \in BC(\mathbb R,\mathbb R^N)$ there is a unique solution $x$ of (4.2.6) in the space $BC(\mathbb R,\mathbb R^N)$, and this solution is given by the formula
$$x(t) = \int_{-\infty}^t e^{(t-s)A}P^-f(s)\,ds - \int_t^{+\infty} e^{(t-s)A}P^+f(s)\,ds.\,{}^{2} \tag{4.2.7}$$
If we are interested in bounded solutions only on $\mathbb R_+ := [0,\infty)$, a similar computation shows that all such solutions for $f \in BC(\mathbb R_+,\mathbb R^N)$ are given by
$$x(t) = e^{tA}x^- + \int_0^t e^{(t-s)A}P^-f(s)\,ds - \int_t^\infty e^{(t-s)A}P^+f(s)\,ds, \tag{4.2.8}$$
where $x^-$ is an arbitrary point in $X^-$.
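Formula (4.2.7) is easy to test in a diagonal example. In the Python sketch below, which is our illustration, we take $A = \operatorname{diag}(-1, 2)$, so that $X^- = \operatorname{span}\{e_1\}$, $X^+ = \operatorname{span}\{e_2\}$ and $P^\mp$ are the coordinate projections, and we take $f(t) = (\cos t, \sin t)$. The integrals in (4.2.7) are truncated at a cutoff of our choosing and evaluated by the composite Simpson rule. The results are compared with the bounded solution $x(t) = \big(\frac12(\cos t + \sin t),\ -\frac15(\cos t + 2\sin t)\big)$, which can be verified directly by substitution into (4.2.6).

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(g(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

def bounded_solution(t, cutoff=40.0):
    """(4.2.7) for A = diag(-1, 2) and f(t) = (cos t, sin t); tails beyond `cutoff` are dropped."""
    # P^- part: int_{-inf}^t e^{-(t-s)} cos s ds, substituting u = t - s
    x1 = simpson(lambda u: math.exp(-u) * math.cos(t - u), 0.0, cutoff)
    # P^+ part: -int_t^{inf} e^{2(t-s)} sin s ds, substituting u = s - t
    x2 = -simpson(lambda u: math.exp(-2 * u) * math.sin(t + u), 0.0, cutoff)
    return x1, x2

for t in (0.0, 1.0, -2.5):
    exact = ((math.cos(t) + math.sin(t)) / 2, -(math.cos(t) + 2 * math.sin(t)) / 5)
    print(t, bounded_solution(t), exact)
```

The stable component is integrated over the past and the unstable one over the future. This split is exactly what makes both integrals converge.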


Both formulae (4.2.7) and (4.2.8) may be used for finding bounded solutions to a semilinear equation
$$\dot x = Ax + f(x) \quad \text{where } f(o) = o,\ f'(o) = o,\ f \in C^1(U) \tag{4.2.9}$$
($U$ is a neighborhood of $o \in \mathbb R^N$). To do that we solve the corresponding nonlinear equations (4.2.7), (4.2.8) where $f(\cdot)$ is replaced by $g(x(\cdot))$, $g$ being bounded and $g(y) = f(y)$ in a neighborhood of $o$. For details see Hale [63, Sections III.6 and IV.3]. A solution in (4.2.8) depends on the parameter $x^-$, so we have the equation
$$\Phi(\xi,\varphi)(t) := \varphi(t) - e^{tA}\xi - \int_0^t e^{(t-s)A}P^-g(\varphi(s))\,ds + \int_t^\infty e^{(t-s)A}P^+g(\varphi(s))\,ds = o$$
with $\Phi\colon X^-\times BC(\mathbb R_+,\mathbb R^N) \to BC(\mathbb R_+,\mathbb R^N)$ (check it: you have to use the estimates given in footnote 2 on page 153). This formulation is suitable for the use of the Implicit Function Theorem. We leave the details to the interested reader. The graph of the mapping
$$\psi\colon X^- \ni \xi \mapsto P^+\varphi(0,\xi)$$
is the so-called local stable manifold${}^{3}$ $W^s_{\mathrm{loc}}(x_0)$ of the equation (4.2.9) ($\varphi(\cdot,\xi)$ is a solution of $\Phi(\xi,\varphi) = o$). It follows from the formula (4.2.2) that
$$\psi'(o) = o,$$
i.e., $W^s_{\mathrm{loc}}(x_0)$ is tangent to the stable manifold $X^-$ of the linear equation $\dot x = Ax$, see Figure 4.2.2.

${}^{2}$ The interested reader can check this formula and also (4.2.8) as an exercise on the use of the Variation of Constants Formula.
Hint. Use the estimates $\|e^{tA}x\| \le ce^{-\gamma t}\|x\|$ for $x \in X^-$, $t > 0$, and $\|e^{tA}x\| \le ce^{\gamma t}\|x\|$ for $x \in X^+$, $t < 0$, where the positive constants $\gamma$, $c$ are independent of $t$ and $x$, $\gamma$ is such that $\sigma(A) \cap \{\lambda \in \mathbb C : |\operatorname{Re}\lambda| \le \gamma\} = \emptyset$, and $c$ depends on $\gamma$ only. These estimates follow from Functional Calculus (see Exercise 1.1.42) and they ensure that the integrals in (4.2.7) exist. Apply $P^+$ to both sides of the Variation of Constants Formula on $[t,\tau]$ and send $\tau \to \infty$ to obtain $P^+x(t) = -\int_t^\infty e^{(t-s)A}P^+f(s)\,ds$ provided $x$ is a bounded solution. Proceed similarly for $P^-x(t)$.

Remark 4.2.8. It is sometimes convenient to define a solution of nonlinear, in particular, partial differential equations more generally, without assuming that a solution has all the classical derivatives which appear in the equation (see Chapters 6 and 7). Actually, we have seen one such possibility in the reformulation of a differential equation as an integral equation
$$x = F(x)$$
where $F$ is given by the formula (2.3.6). Having a more general notion of solution, a natural question arises: Under which conditions is this solution smoother, in particular, is it a classical solution? Such results are known as regularity assertions. The Implicit Function Theorem can occasionally be used to prove such statements. See Theorem 6.1.14.
${}^{3}$ The so-called stable manifold $W^s(x_0)$ of the stationary point $x_0$ of the equation $\dot x = g(x)$ ($g(x_0) = o$) is defined as follows: Let $\varphi(\cdot,\xi)$ be a solution of this differential equation satisfying the initial condition $\varphi(0,\xi) = \xi$. Then the stable manifold is
$$W^s(x_0) = \Big\{\xi : \lim_{t\to\infty}\varphi(t,\xi) = x_0\Big\}$$
and a local stable manifold is defined by
$$W^s_{\mathrm{loc}}(x_0) = \{\xi \in W^s(x_0) : \varphi(t,\xi) \in U \text{ for } t \ge 0\},$$
where $U$ is a neighborhood of $x_0$.
Notice the crucial assumption $\sigma(A) \cap i\mathbb R = \emptyset$ (i.e., $o$ is a so-called hyperbolic stationary point of the equation (4.2.9)) in the above argument. Figure 4.2.2 also shows the distinction between stable and local stable manifolds. It is worth mentioning that a similar approach cannot be used in the case $\sigma(A) \cap i\mathbb R \ne \emptyset$. Since there can exist eigenvalues on the imaginary axis of multiplicity greater than 1, we cannot expect a manifold consisting of bounded solutions. To get the so-called center manifold we are forced to solve a nonlinear version of the equations (4.2.7) in a weighted space instead of $BC(\mathbb R,\mathbb R^N)$. However, this problem is more difficult due to the lack of differentiability of the Nemytski operator (see footnote 11 on page 126). For details see, e.g., Chow, Li & Wang [25, Chapter 1] and the references given there.
Figure 4.2.2. [Sketch in $\mathbb R^N$: the stable manifold $W^s(o)$, the local stable manifold $W^s_{\mathrm{loc}}(o)$, the subspace $X^+$, and the value $\psi(\xi)$.]

Exercise 4.2.9. Let $f\colon \mathbb R^M \to \mathbb R^N$ and let $\Phi$ be a diffeomorphism defined on a neighborhood $U$ of the graph of $f$ onto $V \subset \mathbb R^{M+N}$. Write
$$\Phi^{-1}(\xi,\eta) = (\Psi^1(\xi,\eta), \Psi^2(\xi,\eta)) \quad \text{for } (\xi,\eta) \in V.$$
This means that the graph of $f$ is mapped by $\Phi$ onto
$$\Gamma = \{(\xi,\eta) \in \mathbb R^{M+N} : \Psi^2(\xi,\eta) - f(\Psi^1(\xi,\eta)) = o\}.$$
The Implicit Function Theorem yields conditions for $\Gamma$ to be the graph of a function $\eta = g(\xi)$.
(i) Formulate these conditions!
(ii) Express the derivative of $f$ in terms of the derivative of $g$.
Hint. $f'(\Psi^1) = [(\Psi^2)'_2 g' + (\Psi^2)'_1]\,[(\Psi^1)'_1 + (\Psi^1)'_2 g']^{-1}$.
Control question: Have you checked that the second factor on the right-hand side is an isomorphism of $\mathbb R^M$ onto $\mathbb R^M$?
(iii) Without using the general result from (ii), transform the equation
$$\frac{dy}{dx} = f(x,y)$$
into polar coordinates!
Exercise 4.2.10. Let $M$ be a metric space and $f\colon M\times\mathbb R \to \mathbb R$ a continuous map. Let $c > 0$ be such that for all $x \in M$, $y_1, y_2 \in \mathbb R$, we have
$$(f(x,y_1) - f(x,y_2))(y_1 - y_2) \ge c|y_1 - y_2|^2.$$
Prove that for any $x \in M$ there exists a unique $y(x) \in \mathbb R$ such that
$$f(x,y(x)) = 0$$
and, moreover, $y\colon x \mapsto y(x)$ is a continuous map from $M$ into $\mathbb R$.
Hint. Use the properties of real functions of one real variable.
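The hint can be turned into a simple algorithm. The inequality implies $f(x,y) \ge f(x,0) + cy$ for $y > 0$ and $f(x,y) \le f(x,0) + cy$ for $y < 0$. Hence the unique zero lies in $[-|f(x,0)|/c,\ |f(x,0)|/c]$, where $y \mapsto f(x,y)$ is strictly increasing, and bisection finds it. The Python sketch below is our illustration with $M = \mathbb R$. The test function $f(x,y) = 2y + \sin y + x^2 - \cos x$ is a hypothetical choice for which the inequality holds with $c = 1$.

```python
import math

def implicit_root(f, x, c, iters=200):
    """The unique y with f(x, y) = 0, assuming (f(x,y1) - f(x,y2))(y1 - y2) >= c |y1 - y2|^2."""
    r = abs(f(x, 0.0)) / c
    lo, hi = -r, r                 # f(x, lo) <= 0 <= f(x, hi) by the estimate above
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(x, mid) > 0:
            hi = mid               # y -> f(x, y) is increasing, so the root lies left of mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def g(x, y):
    # hypothetical example with c = 1: dg/dy = 2 + cos y >= 1
    return 2 * y + math.sin(y) + x * x - math.cos(x)

for x in (-1.0, 0.0, 0.5, 3.0):
    y = implicit_root(g, x, 1.0)
    print(x, y, g(x, y))
```

A fixed number of bisection steps is used instead of an absolute tolerance. This avoids an endless loop when floating-point spacing near a large root exceeds the tolerance.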
Exercise 4.2.11. Let $M$ be a normed linear space, let $f$ be as in Exercise 4.2.10 and, moreover, $f \in C^k(M\times\mathbb R)$ with some $k \in \mathbb N$. Then the implicit function $y = y(x)$ from Exercise 4.2.10 is of the class $C^k(M)$. Prove it!
Hint. Use Theorem 4.2.1.
Exercise 4.2.12. Give details which are omitted in Example 4.2.7.
Exercise 4.2.13. Let $A$ be a densely defined linear operator in a Hilbert space. Assume that $A$ has a compact self-adjoint resolvent. Extend the construction of the local stable manifold (Example 4.2.7) to the equation (4.2.6). See Exercise 3.1.19 for the properties of this equation.
Exercise 4.2.14. Assume that
$$f(x,y) = \sum_{j,k=0}^{\infty} a_{jk}(x-x_0)^j(y-y_0)^k, \qquad |x-x_0| < \delta,\ |y-y_0| < \delta.$$
Moreover, let $a_{00} = 0$, $a_{01} \ne 0$. Apply the Implicit Function Theorem and show that the implicit function $y(x)$ is the sum of a power series in a neighborhood of $x_0$.
Note that for complex variables the result follows directly from the properties of holomorphic functions and Theorem 4.2.1. In the real case one has to prove that the formal power series for $y(x)$ has a positive radius of convergence.

4.3 Local Structure of Differentiable Maps, Bifurcations

We now return to the topic of Example 4.1.6, i.e., to the case when the assumptions of the Local Inverse Function Theorem (Theorem 4.1.1) are violated. In particular, it was mentioned there that the assumptions of the Local Inverse Function Theorem are never satisfied for $f\colon \mathbb R^M \to \mathbb R^N$ provided $M \ne N$. In the first part we will study the local behavior of such mappings. In the second part we stress the main idea of the Lyapunov–Schmidt Reduction and the approach to bifurcation phenomena (Crandall–Rabinowitz Bifurcation Theorem).
Definition 4.3.1. Let $f\colon X \to Y$ be a differentiable map in a neighborhood of a point $a \in X$. If $f'(a)$ is neither injective nor surjective, then $a$ is called a singular point of $f$.


The following proposition deals with the first non-singular case for the mapping $f\colon \mathbb R^M \to \mathbb R^N$, $M < N$. For the second one see Proposition 4.3.8.
Proposition 4.3.2. Let $f\colon \mathbb R^M \to \mathbb R^N$ be a differentiable map on an open set $G \subset \mathbb R^M$. Let $a \in G$ and let $f'(a)$ be injective. Let $Q$ be a (linear) projection of $\mathbb R^N$ onto $Y_1 := \operatorname{Im} f'(a)$. Then there exist neighborhoods $U$ of $a$, $V$ of $Qf(a)$ in $Y_1$, a diffeomorphism $\varphi$ of $U$ onto $V$ and a differentiable map $g\colon V \to \mathbb R^N$ such that
$$f = g\circ\varphi$$
(see Figure 4.3.1).
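Figure 4.3.1. [Sketch: a neighborhood $U \subset G \subset \mathbb R^M$ of $a$; $\mathbb R^N = Y_1 \oplus Y_2$ with $Y_1 = \operatorname{Im} f'(a)$; the point $f(a)$, the projection $Q$, and the neighborhood $V$ of $Qf(a)$ in $Y_1$.]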

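Proof. The proof is almost obvious from Figure 4.3.1. Put $\varphi = Q\circ f$. Then
$$\varphi'(a) = Qf'(a)$$
is an isomorphism of $\mathbb R^M$ onto $Y_1$ (indeed, $Q$ is the identity on $Y_1 = \operatorname{Im} f'(a)$, so $Qf'(a) = f'(a)$). Since $\dim Y_1 = M$ is finite, $Y_1$ is a Banach space (as a closed subspace of the Banach space $\mathbb R^N$) and, by Theorem 4.1.1, $\varphi$ is a diffeomorphism of a neighborhood $U$ of $a$ onto a neighborhood (in $Y_1$) $V$ of $Qf(a)$. It suffices to put
$$g = f\circ\varphi^{-1}.$$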
Remark 4.3.3.
(i) We have used the finite dimension of $Y := \mathbb R^N$ to ensure both the existence of a continuous linear projection $Q$ and the closedness of the range $\operatorname{Im} f'(a)$. If $f\colon X \to Y$, where $X$, $Y$ are Banach spaces, then neither of these two conditions has to be satisfied. It follows from the proof that Proposition 4.3.2 holds under these two additional assumptions. We notice that these assumptions are superfluous provided $X$ has finite dimension (see Remark 2.1.19).


(ii) It is also easy to prove that
$$\psi(y) := g(Qy) - (I-Q)y - Qf(a)$$
is a diffeomorphism of a neighborhood $W$ of $b = f(a)$ onto a neighborhood $\tilde W$ of $o$ in $\mathbb R^N$. Indeed,
$$\psi(b) = o, \qquad \psi'(b)k = f'(a)h - (I-Q)k,$$
where $h \in \mathbb R^M$ is such that
$$Qk = \varphi'(a)h.$$
Moreover, $y \in f(G)\cap W$ if and only if there is an $x \in G$ such that
$$y = f(x) \quad\text{and}\quad (I-Q)\psi(f(x)) = o.$$
This means that there exists a local (nonlinear) transformation of coordinates in $W$ (given by $\psi$) such that $f(G)\cap W$ is expressed by
$$z_{M+1} = \cdots = z_N = 0$$
in these new coordinates.
(iii) An interpretation similar to (ii) follows:
$$(I-Q)f = (I-Q)g(\varphi) = (I-Q)g(Qf) = \chi(Qf), \qquad \text{where } \chi(Qf) := (I-Q)g(Qf).$$
This means that after a linear transformation of coordinates the last $N-M$ components of $f$ (i.e., $(I-Q)f$) depend (via $\chi$) on the first $M$ components of $f$ in a neighborhood of $a$. Compare this local nonlinear result to the linear one for the equation
$$Ax = y, \qquad A \in L(\mathbb R^M,\mathbb R^N).$$

Figure 4.3.2. Immersion
Figure 4.3.3. Injective immersion


(iv) A map $f$ which satisfies the assumptions of Proposition 4.3.2 at each point $a \in G$ is often called an immersion of $G$ into $\mathbb R^N$. An injective immersion which is also a homeomorphism of $G$ onto $f(G)$ (in the topology induced from $\mathbb R^N$) is called an embedding. Some examples of immersions which are not embeddings are shown in Figures 4.3.2 and 4.3.3. We note that we have already used the term embedding for an injective continuous linear operator.
Further examination of Proposition 4.3.2 leads to the following definition of a differentiable manifold. This notion is basic for differential geometry and global nonlinear analysis. In this textbook we will mostly use it for purposes of terminology only. Some basic facts on manifolds are given in Appendix 4.3A and will be used for developing the notion of degree (Appendix 4.3D).
Definition 4.3.4. A differentiable manifold of dimension $M$ and of the class $C^k$ is a subset $\mathcal M$ of $\mathbb R^N$ ($N \ge M$) with the following property:
For each $x \in \mathcal M$ there is a neighborhood $W$ of $x$ (in $\mathbb R^N$) and a $C^k$-diffeomorphism $\psi$ of $W$ into $\mathbb R^N$ such that
$$\psi(\mathcal M \cap W) = \{y = (y_1,\dots,y_N) \in \mathbb R^N : y_{M+1} = \cdots = y_N = 0\} \cap \psi(W).$$
A relative neighborhood $W \cap \mathcal M$ together with $\psi$ is called a (local) chart at the point $x \in \mathcal M$. The first $M$ coordinates $(y_1,\dots,y_M)$ are called the local coordinates of $x$ on $\mathcal M$. The collection of all charts of $\mathcal M$ is called an atlas of $\mathcal M$.
Example 4.3.5.
(i) An open subset $G \subset \mathbb R^M$ is an $M$-dimensional differentiable manifold of the class $C^k$ for any $k \in \mathbb N$ (i.e., of the class $C^\infty$).
(ii) The graph of a function $f\colon \mathbb R^M \to \mathbb R$, $f \in C^k(G)$, $G$ an open subset of $\mathbb R^M$, is an $M$-dimensional differentiable manifold of the class $C^k$ in $\mathbb R^N$, $N \ge M+1$.
(iii) Let
$$S^2 = \{(x,y,z) \in \mathbb R^3 : x^2 + y^2 + z^2 = 1\}$$
be the 2-dimensional sphere. Then $S^2$ is a 2-dimensional differentiable manifold of the class $C^\infty$ in $\mathbb R^N$, $N \ge 3$. Indeed, a chart for the upper open half-sphere can be constructed as follows: let
$$\psi(x,y,z) = \left(x,\ y,\ z - \sqrt{1 - x^2 - y^2}\right), \qquad W = \{(x,y,z) \in \mathbb R^3 : x^2 + y^2 < 1,\ z > 0\}.$$
Then $\psi$ is a diffeomorphism of $W$ into $\mathbb R^3$ and
$$\psi(W \cap S^2) = \{(u,v,w) \in \mathbb R^3 : u^2 + v^2 < 1,\ w = 0\}.$$
We will see a more convenient proof in Example 4.3.10.


Definition 4.3.6. Let $X$, $Y$ be Banach spaces and $f\colon X \to Y$ a differentiable map on a neighborhood of a point $a \in X$. If $f'(a)$ is a surjective map onto $Y$, then the point $a$ is called a regular point. If $a$ is not a regular point, then it is called a critical point. A value $b \in Y$ is called a critical value of $f$ provided the set
$$f^{-1}(b) := \{x \in X : f(x) = b\}$$
contains a critical point. Otherwise, $b$ is a regular value.
Remark 4.3.7. There is a difference between the notion of a singular point (Definition 4.3.1) and a critical point. For example, if $f\colon \mathbb R^M \to \mathbb R^N$, $M < N$, then all points in $\mathbb R^M$ are critical (but some of them can be non-singular). The importance of the notion of a critical point will become more apparent in connection with the Sard Theorem (Theorem 5.2.3) and its applications.
Proposition 4.3.8. Let $G$ be an open subset of $\mathbb R^M$, $f\colon G \to \mathbb R^N$, $f \in C^k(G)$. Let $a \in G$ be a regular point of $f$. Then there are neighborhoods $U$ of $o \in \mathbb R^M$, $V$ of $a$, and a diffeomorphism $\varphi \in C^k$ of $U$ onto $V$ such that
$$\{x \in V : f(x) = f(a)\} = \varphi(U \cap \operatorname{Ker} f'(a))$$
(see Figure 4.3.4).
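Figure 4.3.4. [Sketch: $\mathbb R^M = X_1 \oplus X_2$ with $X_1 = \operatorname{Ker} f'(a)$; the neighborhood $U$ of $o$, the neighborhood $V$ of $a$ containing the level set $\{f(x) = f(a)\}$, and the map $A^{-1}$ from $\mathbb R^N$.]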

Proof. By Remark 4.3.7, $M \ge N$. If $M = N$, then Theorem 4.1.1 can be applied. Therefore, we assume that $M > N$. Denote by $P$ a (linear continuous) projection of $X := \mathbb R^M$ onto $X_1 := \operatorname{Ker} f'(a)$ and by $X_2$ the complementary subspace given by $X_2 = \operatorname{Im}(I-P)$. If $A$ is the restriction of $f'(a)$ to $X_2$, then $A$ is an isomorphism of $X_2$ onto $\mathbb R^N$ ($A$ is both injective and surjective). Denote by $A^{-1}$ the inverse isomorphism of $\mathbb R^N$ onto $X_2$ ($A^{-1}$ is also called a right inverse of $f'(a)$). Since $f'(a)A^{-1} = I$ on $\mathbb R^N$ and $f'(a)P = 0$, we can rewrite $f$ in the following way:
$$f(x) = f(a) + f'(a)[A^{-1}(f(x) - f(a)) + P(x-a)].$$
Let us denote
$$\psi(x) = A^{-1}(f(x) - f(a)) + P(x-a).$$
A simple calculation shows that
$$\psi'(a)h = A^{-1}f'(a)h + Ph = (I-P)h + Ph = h \quad \text{for any } h \in X.$$
Since $\psi(a) = o$, $\psi$ is a diffeomorphism of a neighborhood $V \subset G$ of $a$ onto a neighborhood $U$ of $o$ (Theorem 4.1.1). Further, $x \in \{y \in V : f(y) = f(a)\}$ if and only if $x \in V$ and $\psi(x) = P(x-a)$, i.e.,
$$\psi(x) \in U \cap \operatorname{Ker} f'(a).$$
The desired diffeomorphism $\varphi$ is the inverse of $\psi$.
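The map $\psi$ from the proof can be written down explicitly in a simple case. In the Python sketch below, which is our illustration, we take $f(x,y,z) = x^2 + y^2 + z^2 - 1$ (so $N = 1$) and the regular point $a = (0,0,1)$, where $f'(a) = (0,0,2)$. Let $P(h) = (h_1,h_2,0)$ be the projection onto $\operatorname{Ker} f'(a)$ along $X_2 = \operatorname{span}\{e_3\}$. Then $A^{-1}t = (0,0,t/2)$, hence $\psi(x,y,z) = (x, y, f(x,y,z)/2)$, and its inverse near $a$ is $\varphi(u,v,w) = (u, v, \sqrt{1 + 2w - u^2 - v^2})$. The code checks two facts: $\psi$ maps the level set $\{f = f(a)\}$ into $\operatorname{Ker} f'(a) = \{w = 0\}$, and $f(\varphi(y)) = f(a) + f'(a)y$, as stated in Remark 4.3.9(ii).

```python
import math

A_PT = (0.0, 0.0, 1.0)          # the regular point a (north pole); f'(a) = (0, 0, 2)

def f(p):
    x, y, z = p
    return x * x + y * y + z * z - 1.0

def psi(p):
    """psi(x) = A^{-1}(f(x) - f(a)) + P(x - a) for this particular f and a."""
    # P(x - a): projection onto Ker f'(a) = {h_3 = 0} along X_2 = span{e_3}
    proj = (p[0] - A_PT[0], p[1] - A_PT[1], 0.0)
    # A = f'(a) restricted to X_2 maps s e_3 to 2s, hence A^{-1} t = (0, 0, t / 2)
    t = f(p) - f(A_PT)
    return (proj[0], proj[1], proj[2] + t / 2.0)

def phi(u):
    """Inverse of psi near a (valid where 1 + 2w - u1^2 - u2^2 > 0, with z > 0)."""
    u1, u2, w = u
    return (u1, u2, math.sqrt(1.0 + 2.0 * w - u1 * u1 - u2 * u2))

for p in [(0.1, -0.2, math.sqrt(0.95)), (0.3, 0.0, math.sqrt(0.91))]:
    print(p, psi(p))            # third component ~ 0: the level set goes into Ker f'(a)
```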

Remark 4.3.9.
(i) Proposition 4.3.8 together with its proof also holds for $f\colon X \to Y$, $X$, $Y$ Banach spaces, provided there exists a linear continuous projection $P$ of $X$ onto $\operatorname{Ker} f'(a)$. The continuity of $A^{-1}$ follows in this case from the Open Mapping Theorem (Theorem 2.1.8). The existence of such a projection $P$ can be shown in two important cases, namely, when $Y$ has finite dimension (and therefore $\operatorname{Ker} f'(a)$ has finite codimension, see Example 2.1.12) or when $\operatorname{Ker} f'(a)$ has finite dimension (Remark 2.1.19).
(ii) Notice that $\varphi$ can be viewed as a local (nonlinear) transformation of coordinates in which $f$ is a linear map, namely
$$f(\varphi(y)) = f(a) + f'(a)y, \qquad y \in U.$$
This formula also shows that all points in $V$ are regular. Moreover, if $z$ is sufficiently close to $b = f(a)$, then
$$y = A^{-1}(z-b) \in U \quad\text{and}\quad f(\varphi(y)) = z.$$
This shows that $f(G)$ is an open set in $\mathbb R^N$ provided all points of $G$ are regular.
(iii) In terms of differentiable manifolds (Definition 4.3.4) the statement of Proposition 4.3.8 can be formulated as follows:
If $f\colon \mathbb R^M \to \mathbb R^N$ is a differentiable map in an open set $G \subset \mathbb R^M$ and $b \in \mathbb R^N$, then the set
$$\{x \in G : f(x) = b\}$$
is a differentiable manifold (either empty or of dimension $M - N$) provided $b$ is a regular value of $f$.
(iv) Proposition 4.3.8 imposes certain restrictions on the set
$$\{x \in \mathbb R^M : f(x) = f(a)\}.$$
In Figures 4.3.5–4.3.7 there are some cases in which $a$ is not a regular point (i.e., it is a critical point). The value $f(a)$ is critical in all cases.


Figures 4.3.5, 4.3.6, 4.3.7. [Sketches of level sets through a critical point $a$; in one of them $a$ is a cusp.]

Example 4.3.10. The sphere $S^2$ is a $C^\infty$-differentiable manifold. To see this it is sufficient to use Remark 4.3.9(iii) for
$$f(x,y,z) = x^2 + y^2 + z^2 - 1, \qquad b = 0:$$
since $f'(x,y,z) = 2(x,y,z) \ne o$ on $S^2$, the value $0$ is regular.
The assertions of the last two propositions are part of the following more general result.
Theorem 4.3.11 (Rank Theorem). Let $f\colon \mathbb R^M \to \mathbb R^N$ be a differentiable map on an open subset $G \subset \mathbb R^M$ and let the dimension of $\operatorname{Im} f'(x)$ be constant for $x \in G$ (and equal to $L \le N$). Then for any $a \in G$ there exist neighborhoods $U$ of $a$, $W$ of $b = f(a)$, cubes $C$ in $\mathbb R^M$, $D$ in $\mathbb R^N$ and diffeomorphisms $\varphi\colon C \to U$, $\psi\colon W \to D$ such that the map $F$ defined by $F = \psi\circ f\circ\varphi$ has the form
$$F(z_1,\dots,z_M) = (z_1,\dots,z_L,0,\dots,0) \quad \text{for all } z = (z_1,\dots,z_M) \in C$$
(see Figure 4.3.8).


Proof. Denote $X_2 = \operatorname{Ker} f'(a)$, $P$ a (linear) projection in $\mathbb R^M$ onto $X_2$, $X_1 = \operatorname{Ker} P$ and, similarly, $Y_1 = \operatorname{Im} f'(a)$, $Q$ a (linear) projection in $\mathbb R^N$ onto $Y_1$, $Y_2 = \operatorname{Ker} Q$. Then the restriction $A$ of $f'(a)$ to $X_1$ is an isomorphism of $X_1$ onto $Y_1$. Let $A^{-1}$ be the inverse isomorphism, $A^{-1}\colon Y_1 \to X_1$. By the proof of Proposition 4.3.8,
$$\Phi(x) = A^{-1}Q\bigl(f(x) - f(a)\bigr) + P(x - a)$$
is a diffeomorphism of the neighborhood $U$ of $a \in \mathbb R^M$ onto the neighborhood $\tilde U$ of $o \in \mathbb R^M$. Denote by $\varphi$ the inverse to $\Phi$. For $h_1 \in X_1$ we have
$$\Phi'(x)h_1 = A^{-1}Qf'(x)h_1.$$
This implies that $f'(x)$ is injective on $X_1$ ($\Phi'(x)$ has this property). Since
$$\dim X_1 = \dim \operatorname{Im} f'(x) = L,$$

4.3. Local Structure of Differentiable Maps, Bifurcations


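[Figure 4.3.8: schematic of the proof: $\mathbb R^M = X_1 \oplus X_2$ with $X_2 = \operatorname{Ker} f'(a)$ and the projection $P$; $\mathbb R^N = Y_1 \oplus Y_2$ with $Y_1 = \operatorname{Im} f'(a)$; the map $f$, the neighborhoods $U$, $W$, the sets $f(U)$ and $f(U) \cap W$; the cubes $C \subset \mathbb R^M = \mathbb R^L \times \mathbb R^{M-L}$ and $D \subset \mathbb R^N = \mathbb R^L \times \mathbb R^{N-L}$ obtained via $T_C$, $T_D$.]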

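the restriction of $f'(x)$ to $X_1$ is an isomorphism of $X_1$ onto $\operatorname{Im} f'(x)$. We can express this fact in the commutative diagram (Figure 4.3.9).

[Figure 4.3.9: commutative diagram $X_1 \xrightarrow{\;f'(x)\;} \operatorname{Im} f'(x) \xrightarrow{\;A^{-1}Q\;} X_1$ whose composition is $\Phi'(x)|_{X_1}\colon X_1 \to X_1$; here $A^{-1}Q$ is an isomorphism.]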

Using the decomposition $\mathbb R^M = X_1 \oplus X_2$, we write
$$u = u_1 + u_2, \qquad u_i \in X_i,\ i = 1, 2, \qquad\text{for } u \in \tilde U,$$
and define
$$g(u_1, u_2) = f\bigl(\varphi(u_1 + u_2)\bigr).$$
Now, we show that $g$ actually depends on the first variable only. To see this we compute the derivative of $g$ with respect to the second variable:
$$g_2'(u_1, u_2)h_2 = f'\bigl(\varphi(u)\bigr)\varphi'(u)h_2.$$


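For $k := \varphi'(u)h_2$ and $\varphi(u) = x$ we have
$$h_2 = \Phi'(x)k = A^{-1}Qf'(x)k + Pk.$$
Since $h_2 \in X_2$, $Pk \in X_2$ and $A^{-1}Qf'(x)k \in X_1$, the uniqueness of the decomposition $\mathbb R^M = X_1 \oplus X_2$ yields $A^{-1}Qf'(x)k = o$. Since $A^{-1}Q$ is an isomorphism of $\operatorname{Im} f'(x)$ onto $X_1$ (see Figure 4.3.9), we have $f'(x)k = o$, i.e.,
$$g_2'(u_1, u_2)h_2 = o \qquad\text{for any } h_2 \in X_2.$$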

The Mean Value Theorem (Theorem 3.2.7) implies that
$$g(u_1, u_2) = g(u_1, o) \qquad\text{for } (u_1, u_2),\ (u_1, o) \in \tilde U.{}^{4}$$
This result is shown in Figure 4.3.8 by shaded areas. Put $\tilde g(u_1) := g(u_1, o)$. We employ Proposition 4.3.2, in particular Remark 4.3.3(ii), to complete the proof. Replacing $f$ there by $\tilde g$, we obtain a diffeomorphism $\Psi$ of a neighborhood $W$ of $b = f(a)$ onto a neighborhood $\tilde W$ of $o \in \mathbb R^N$ such that
$$(I - Q)\Psi\bigl(f(U) \cap W\bigr) = o$$
(see the right lower corner of Figure 4.3.8). We get cubes $C$ and $D$ by diffeomorphisms $T_C$, $T_D$ in $\mathbb R^M$, $\mathbb R^N$, respectively, which transform non-Cartesian coordinates in $X_1 \oplus X_2$ or in $Y_1 \oplus Y_2$ into Cartesian coordinates in $\mathbb R^M = \mathbb R^L \times \mathbb R^{M-L}$ ($T_C(X_1) = \mathbb R^L$), or in $\mathbb R^N = \mathbb R^L \times \mathbb R^{N-L}$, respectively (see the upper part of Figure 4.3.8 and page 163).

Remark 4.3.12. The assertion of the Rank Theorem can be formulated in a slightly less informative way as follows:
Under the hypotheses of Theorem 4.3.11, every point $a \in G$ has a neighborhood $U$ such that $f(U)$ is a differentiable manifold of dimension $L$.
Definition 4.3.13. Functions $f_1, \dots, f_N\colon \mathbb R^M \to \mathbb R$ are said to be independent in an open set $G \subset \mathbb R^M$ if any point $x \in G$ is regular for $f = (f_1, \dots, f_N)$. Otherwise, the functions are called dependent.
The following assertion explains the notions of dependent and independent functions.
Suppose the assumptions of the Rank Theorem are satisfied for
$$f = (f_1, \dots, f_L, f_{L+1}, \dots, f_N)\colon \mathbb R^M \to \mathbb R^N$$
where the functions $f_1, \dots, f_L$ are independent in a neighborhood of a point $a \in \mathbb R^M$. Then there is a smooth function $G\colon \mathbb R^L \to \mathbb R^{N-L}$ such that
$$\bigl(f_{L+1}(x), \dots, f_N(x)\bigr) = G\bigl(f_1(x), \dots, f_L(x)\bigr)$$
for $x$ in a certain neighborhood of $a$.

⁴ In fact, the use of Theorem 3.2.7 requires the segment joining $(u_1, o)$ to $(u_1, u_2)$ to lie in $\tilde U$. Taking a smaller $\tilde U$ if necessary we can assume that $\tilde U$ is convex. Notice that we obtained a similar result at the end of the proof of Proposition 4.3.8, where we considered only one fiber, namely $\{x : f(x) = f(a)\}$.


To prove this assertion notice first that $\operatorname{Im} f'(a)$ is an $L$-dimensional subspace of $\mathbb R^N$ and can be identified with $\mathbb R^L \times \{0\}$. This means that
$$Qf(x) = H_1(x) := \bigl(f_1(x), \dots, f_L(x)\bigr)$$
and, in the notation of the proof of Theorem 4.3.11,
$$f(x) = \tilde g(u_1) \qquad\text{where}\qquad u_1 = A^{-1}\bigl(H_1(x) - H_1(a)\bigr).$$
In particular, $f_{L+1}, \dots, f_N$ are smooth functions of $f_1, \dots, f_L$.
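A small numerical illustration with a toy map of our own (not from the text): for $f(x, y) = (x + y, (x + y)^2, \sin(x + y))$ the Jacobian has rank $L = 1$ everywhere and $f_1$ alone is independent, so the assertion predicts $(f_2, f_3) = G(f_1)$; here explicitly $G(s) = (s^2, \sin s)$. The sketch estimates the Jacobian by central differences (our choice) and checks that all $2 \times 2$ minors vanish.

```python
import math

def f(x, y):
    s = x + y
    return (s, s * s, math.sin(s))

def jacobian(fun, x, y, h=1e-6):
    """Central-difference Jacobian: one row per component, columns d/dx and d/dy."""
    fxp, fxm = fun(x + h, y), fun(x - h, y)
    fyp, fym = fun(x, y + h), fun(x, y - h)
    return [((a - b) / (2 * h), (c - d) / (2 * h))
            for a, b, c, d in zip(fxp, fxm, fyp, fym)]

def max_2x2_minor(J):
    # the rank is at most 1 iff every 2x2 minor vanishes
    return max(abs(J[i][0] * J[j][1] - J[i][1] * J[j][0])
               for i in range(len(J)) for j in range(i + 1, len(J)))

checks = []
for x, y in [(0.3, -1.2), (2.0, 0.5), (-0.7, 0.1)]:
    J = jacobian(f, x, y)
    f1, f2, f3 = f(x, y)
    # (largest minor, size of grad f1, f2 - G_1(f1), f3 - G_2(f1))
    checks.append((max_2x2_minor(J), abs(J[0][0]) + abs(J[0][1]),
                   f2 - f1 ** 2, f3 - math.sin(f1)))
    print(checks[-1])
```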


The notion of independent functions plays an important role also in the theory of ordinary differential equations. Indeed, let
$$\dot x = v(x)$$
be a system of $M$ differential equations. A smooth non-constant function $f\colon \mathbb R^M \to \mathbb R$ is called a first integral of this system in an open set $G \subset \mathbb R^M$ if for any $a \in G$ there is an interval $I_a$ such that for the solution $\varphi(\cdot, a)$ of the system satisfying $\varphi(0, a) = a$ we have
$$\varphi(t, a) \in G \qquad\text{and}\qquad \frac{d}{dt} f\bigl(\varphi(t, a)\bigr) = 0 \qquad\text{for } t \in I_a.$$

It has been proved in the theory of ordinary differential equations that a system $\dot x = v(x)$ ($v\colon G \subset \mathbb R^M \to \mathbb R^M$ smooth) has $M - 1$ independent first integrals $f_1, \dots, f_{M-1}$ in a neighborhood $U$ of any non-stationary point $a \in G$. A smooth function $g\colon U \to \mathbb R$ is a first integral if and only if $g, f_1, \dots, f_{M-1}$ are dependent on $U$.
We remark that the knowledge of the first integrals reduces the original system. For example, if $f_1, \dots, f_{M-1}$ are independent first integrals in a neighborhood $U$ of a non-stationary point, then the transformation of coordinates
$$y_i = f_i(x), \quad i = 1, \dots, M-1, \qquad y_M = x_M$$
leads to a new system
$$\dot y_i = 0, \quad i = 1, \dots, M-1, \qquad \dot y_M = w(y_M) \quad\text{for a function } w,$$
and, after rescaling in time, to
$$\dot y_i = 0, \quad i = 1, \dots, M-1, \qquad \dot y_M = 1.$$
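For a concrete instance (our own example), take $M = 2$ and $\dot x_1 = x_2$, $\dot x_2 = -x_1$. Then $f_1(x) = x_1^2 + x_2^2$ is a first integral, and in the angular coordinate $\theta = \operatorname{atan2}(x_2, x_1)$ (used here instead of $y_M = x_M$, which is an assumption of this sketch) one has $\dot\theta = -1$, i.e., $\dot y_2 = 1$ after reversing time. The sketch integrates the system with the classical Runge–Kutta method and checks both facts numerically.

```python
import math

def v(x1, x2):
    # vector field of the harmonic oscillator x1' = x2, x2' = -x1
    return x2, -x1

def rk4_step(x1, x2, h):
    k1 = v(x1, x2)
    k2 = v(x1 + h * k1[0] / 2, x2 + h * k1[1] / 2)
    k3 = v(x1 + h * k2[0] / 2, x2 + h * k2[1] / 2)
    k4 = v(x1 + h * k3[0], x2 + h * k3[1])
    return (x1 + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            x2 + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

def first_integral(x1, x2):
    return x1 * x1 + x2 * x2

h, steps = 1e-3, 1000          # integrate over t in [0, 1]
x1, x2 = 1.0, 0.5              # non-stationary initial point a
f0 = first_integral(x1, x2)
theta0 = math.atan2(x2, x1)
for _ in range(steps):
    x1, x2 = rk4_step(x1, x2, h)

drift = abs(first_integral(x1, x2) - f0)    # stays ~ 0 along the solution
dtheta = math.atan2(x2, x1) - theta0        # ~ -1, i.e. theta' = -1
print(drift, dtheta)
```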

For another interpretation and a generalization of the notion of the first integral see Exercise 4.3.26 and the end of Appendix 4.3A.


Remark 4.3.14. A result similar to the Rank Theorem holds also for a differentiable map $f\colon X \to Y$ where $X$, $Y$ are Banach spaces. The delicate question is the existence of continuous linear projections $P$ of $X$ (onto $\operatorname{Ker} f'(a)$) and $Q$ of $Y$ (onto $\operatorname{Im} f'(a)$). Such projections exist provided $f'(a)$ is a Fredholm operator, i.e., $\operatorname{Ker} f'(a)$ has finite dimension and $\operatorname{Im} f'(a)$ is a closed subspace of finite codimension in $Y$ (see page 70). Notice that the equation $f(x) = y$ can be solved by the following procedure which is often called the Lyapunov–Schmidt Reduction:
The equation
$$f(x) = y$$
is equivalent to the pair of equations
$$y_1 := Qy = Qf(x_1 + x_2), \qquad y_2 := (I - Q)y = (I - Q)f(x_1 + x_2)$$
where
$$x = x_1 + x_2, \qquad x_2 = Px.$$
Suppose that the first equation may be solved⁵ for $x_1$ assuming $x_2$ to be fixed (looking at $x_2$ as a parameter). We obtain
$$x_1 = g(y_1, x_2).$$
The second equation is now an equation (it is called the bifurcation equation or the alternative problem) of the form
$$(I - Q)f\bigl(x_2 + g(y_1, x_2)\bigr) = y_2 \qquad\text{for an unknown } x_2.$$

If $f'(a)$ is a Fredholm map, then this equation is an equation in finite dimensional spaces:
$$x_2 \in \operatorname{Ker} f'(a), \quad y_2 \in Y_2, \quad \dim \operatorname{Ker} f'(a) < \infty, \quad \dim Y_2 = \operatorname{codim} \operatorname{Im} f'(a) < \infty.$$
Notice that the Implicit Function Theorem ensures a unique local solution to the first equation for $y$ sufficiently close to $b = f(a)$. In this situation we also obtain
$$g_2'(b_1, a_2) = o,$$
i.e., the point $a_2$ is a critical point for
$$F(x_2) := (I - Q)f\bigl(x_2 + g(b_1, x_2)\bigr) - b_2.$$
The simplest case for the local study of $F$ is that
$$\operatorname{codim} \operatorname{Im} f'(a) = 1, \qquad\text{i.e.,}\qquad F\colon X_2 = \operatorname{Ker} f'(a) \to \mathbb R$$
(see Example 4.3.20). Notice that $\dim X_2$ is finite for $f'(a)$ being a Fredholm map.
⁵ E.g., by the Implicit Function Theorem (Theorem 4.2.1) in the vicinity of a known solution $b = f(a)$ since $f'(a)$ is an isomorphism of $X_1$ onto $Y_1$ or, more generally, by an iteration process.
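The procedure can be tried out on a hypothetical two-dimensional example of our own (not taken from the text): $f(x_1, x_2) = (x_1 + x_1x_2 - x_2^2,\ x_1x_2 + x_2^3)$ with $a = o$. Here $f'(o) = \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}$, so $\operatorname{Ker} f'(o) = \operatorname{Lin}\{e_2\}$, $\operatorname{Im} f'(o) = \operatorname{Lin}\{e_1\}$, $Px = x_2e_2$ and $Qy = y_1e_1$. The sketch solves the first equation for $x_1$ by Newton's method and the scalar bifurcation equation for $x_2$ by bisection; both solver choices are ours.

```python
def f(x1, x2):
    return (x1 + x1 * x2 - x2 ** 2, x1 * x2 + x2 ** 3)

def g(y1, x2, tol=1e-14, max_iter=50):
    """Solve the first equation Q f(x1 + x2) = y1 for x1 (Newton's method)."""
    x1 = 0.0
    for _ in range(max_iter):
        residual = x1 + x1 * x2 - x2 ** 2 - y1
        step = residual / (1.0 + x2)   # derivative of the first component in x1
        x1 -= step
        if abs(step) < tol:
            break
    return x1

def bifurcation_residual(x2, y1, y2):
    """(I - Q) f(g(y1, x2) + x2) - y2, a scalar function of x2 alone."""
    return f(g(y1, x2), x2)[1] - y2

def bisect(h, lo, hi, tol=1e-13):
    h_lo = h(lo)
    assert h_lo * h(hi) <= 0, "no sign change on [lo, hi]"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if h_lo * h_mid <= 0:
            hi = mid
        else:
            lo, h_lo = mid, h_mid
    return 0.5 * (lo + hi)

y1, y2 = 0.01, 0.002            # right-hand side y close to b = f(o) = o
x2 = bisect(lambda s: bifurcation_residual(s, y1, y2), 0.0, 0.5)
x1 = g(y1, x2)
print((x1, x2), f(x1, x2))      # f(x1, x2) reproduces (y1, y2)
```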


Example 4.3.15. As an application we will investigate the existence of a solution of the following boundary value problem for a system of ordinary differential equations
$$\begin{cases} \dot x(t) = f(t, x(t)), & t \in (0, 1),\\ x(0) = x(1). \end{cases} \tag{4.3.1}$$
We suppose (see Theorem 2.3.4) that $f$ together with its partial derivatives with respect to the variables $x = (x_1, \dots, x_N)$ are continuous on $[0, 1] \times \mathbb R^N$. We know that any solution starting at $t = 0$ satisfies the integral equation
$$x(t) - x(0) = \int_0^t f(s, x(s))\,ds$$
for all $t$ from the interval of its existence. This means that $x$ satisfies the boundary value problem (4.3.1) if and only if
$$G(x_0) := \int_0^1 f\bigl(s, x(s, x_0)\bigr)\,ds = o.$$
Here $x(\cdot, x_0)$ denotes the (unique) solution of $\dot x(t) = f(t, x(t))$ such that $x(0, x_0) = x_0$. The problem of solving the equation
$$G(x_0) = o \qquad\text{for } G\colon \mathbb R^N \to \mathbb R^N$$
is a nontrivial topological task which we will deal with in Chapter 5.


Notice that we cannot use the Implicit Function Theorem directly since there is no parameter in (4.3.1). Therefore we modify the problem by adding a multiplicative parameter $\varepsilon$ to (4.3.1), i.e., we investigate the problem
$$\begin{cases} \dot x(t) = \varepsilon f(t, x(t)), & t \in (0, 1),\\ x(0) = x(1). \end{cases} \tag{4.3.2}$$
Notice that for $\varepsilon = 0$ any $N$-dimensional constant $a$ solves (4.3.2). To be able to use the abstract approach described above we rewrite (4.3.2) in an operator form. To do this we define Banach spaces
$$X = \{x \in C([0, 1], \mathbb R^N) : x(0) = x(1)\}, \qquad Y = \{y \in C([0, 1], \mathbb R^N) : y(0) = o\}$$
and operators $L, N\colon X \to Y$:
$$Lx\colon t \mapsto x(t) - x(0), \qquad N(x)\colon t \mapsto \int_0^t f(s, x(s))\,ds, \qquad t \in [0, 1].$$
Then the system (4.3.2) is equivalent to the operator equation
$$G(x, \varepsilon) := Lx - \varepsilon N(x) = o. \tag{4.3.3}$$


The operator $L$ is linear and continuous, therefore differentiable:
$$L'(x)h = Lh \qquad\text{for } h \in X.$$
The operator $N$ is also continuously differentiable and
$$N'(x)h\colon t \mapsto \int_0^t f_2'(s, x(s))h(s)\,ds, \qquad t \in [0, 1],\ h \in X.$$
Check this expression yourself, see also Example 3.2.21. This means that
$$G_1'(a, 0)h = Lh$$
is not injective and $X_2 := \operatorname{Ker} L$ consists of $N$-dimensional constant functions. Moreover,
$$Y_1 := \operatorname{Im} L = \{y \in Y : y(1) = y(0) = o\}.$$
There are continuous linear projections $P$, $Q$ onto the closed subspaces $X_2$ and $Y_1$, respectively, given by
$$Px\colon t \mapsto x(0), \qquad Qy\colon t \mapsto y(t) - ty(1).$$
Having the decompositions
$$X = X_1 \oplus X_2, \qquad Y = Y_1 \oplus Y_2,$$
we can use the Lyapunov–Schmidt Reduction, i.e.,
$$x = x_1 + a, \qquad x_1 \in X_1, \quad a \in X_2,$$
solves (4.3.3) if and only if it solves the pair of equations
$$G_1(x_1, a, \varepsilon) := Lx_1 - \varepsilon QN(x_1 + a) = o, \tag{4.3.4}$$
$$G_2(x_1, a, \varepsilon) := (I - Q)N(x_1 + a) = o. \tag{4.3.5}$$
(Equation (4.3.5) is obtained by applying $I - Q$ to (4.3.3): since $(I - Q)L = 0$, we get $-\varepsilon(I - Q)N(x_1 + a) = o$, and we divide by $\varepsilon \neq 0$.)
Since $G_1(o, a, 0) = o$ and
$$(G_1)_1'(o, a, 0)h = Lh$$
is an isomorphism of $X_1$ onto $Y_1$ (it is both injective and surjective), the inverse is continuous by the Open Mapping Theorem (Theorem 2.1.8). The Implicit Function Theorem yields a solution
$$x_1 := \varphi(b, \varepsilon)$$
of (4.3.4) in a neighborhood of $(a, 0)$ for a given $a \in X_2$. We also have
$$\varphi(a, 0) = o \qquad\text{and}\qquad \varphi_1'(a, 0) = o$$
(check it again). This means that it is sufficient to solve
$$H(b, \varepsilon) := (I - Q)N\bigl(\varphi(b, \varepsilon) + b\bigr) = o$$


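with respect to $b$. Since $\dim X_2 = \dim Y_2 = N < \infty$ and $H\colon X_2 \times \mathbb R \to Y_2$, we can try to use the Implicit Function Theorem once more. To this end we need an $\tilde a \in X_2$ for which
$$(I - Q)N(\tilde a)\colon t \mapsto t\int_0^1 f(s, \tilde a)\,ds = o, \qquad\text{i.e.,}\qquad \int_0^1 f(s, \tilde a)\,ds = o,$$
and the equation
$$(I - Q)N'(\tilde a)d\colon t \mapsto t\left(\int_0^1 f_2'(s, \tilde a)\,ds\right)d = tc$$
has a unique solution $d \in \mathbb R^N$ for every $c \in \mathbb R^N$. The last requirement means that the $N \times N$-matrix $\int_0^1 f_2'(s, \tilde a)\,ds$ has to be regular.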

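To summarize the considerations of the previous example, we get the following conclusion.
Proposition 4.3.16. Let $f = (f^1, \dots, f^N)\colon [0, 1] \times \mathbb R^N \to \mathbb R^N$ be continuous and have continuous partial derivatives $\frac{\partial f^i}{\partial x_j}$ ($i, j = 1, \dots, N$). Let the function $f$ satisfy the conditions
$$\int_0^1 f(s, \tilde a)\,ds = o, \qquad \det\left(\int_0^1 \frac{\partial f^i}{\partial x_j}(s, \tilde a)\,ds\right)_{i,j=1}^N \neq 0$$
for a certain constant $\tilde a \in \mathbb R^N$. Then there exist $\delta > 0$ and a differentiable map $\varepsilon \mapsto x(\cdot, \varepsilon)$, $|\varepsilon| < \delta$, such that $x(\cdot, 0) = \tilde a$ and the functions $x(\cdot, \varepsilon)$ satisfy the boundary value problem (4.3.2).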
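A minimal numerical illustration for $N = 1$, assuming the hypothetical right-hand side $f(t, x) = \sin 2\pi t + x - 1$ (our choice): here $\int_0^1 f(s, a)\,ds = a - 1$ vanishes at $\tilde a = 1$ and $\int_0^1 f_x(s, \tilde a)\,ds = 1 \neq 0$, so Proposition 4.3.16 applies. The sketch does not follow the Lyapunov–Schmidt construction; it simply shoots, i.e., it finds $x_0$ with $x(1; x_0) = x_0$ by bisection, integrating with the classical Runge–Kutta method. Because this particular equation is linear, the periodic solution is also known in closed form, $x(0) = 1 - 2\pi\varepsilon/(4\pi^2 + \varepsilon^2)$, which serves as a check.

```python
import math

EPS = 0.1  # the parameter epsilon in (4.3.2): small but nonzero

def rhs(t, x):
    # epsilon * f(t, x) with the hypothetical f(t, x) = sin(2 pi t) + x - 1
    return EPS * (math.sin(2 * math.pi * t) + x - 1.0)

def x_at_one(x0, n=1000):
    """Classical RK4 approximation of x(1; x0) for x' = eps * f(t, x), x(0) = x0."""
    h, t, x = 1.0 / n, 0.0, x0
    for _ in range(n):
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2, x + h * k1 / 2)
        k3 = rhs(t + h / 2, x + h * k2 / 2)
        k4 = rhs(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def defect(x0):
    # x(1; x0) - x(0): zero exactly when the solution satisfies x(0) = x(1)
    return x_at_one(x0) - x0

lo, hi = 0.5, 1.5               # bracket around the predicted a~ = 1
d_lo = defect(lo)
while hi - lo > 1e-12:
    mid = 0.5 * (lo + hi)
    d_mid = defect(mid)
    if d_lo * d_mid <= 0:
        hi = mid
    else:
        lo, d_lo = mid, d_mid
x0 = 0.5 * (lo + hi)

exact = 1 - 2 * math.pi * EPS / (4 * math.pi ** 2 + EPS ** 2)
print(x0, exact)                # x0 stays close to a~ = 1 for small EPS
```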
Remark 4.3.17. Let us make some remarks on this result. If the function $f$ in (4.3.1) is 1-periodic in the variable $t$, then $x$ is a solution of (4.3.1) if and only if
$$\tilde x(t) = x(t - n), \qquad n = [t],\ t \in \mathbb R,$$
is a 1-periodic solution of $\dot x = f(t, x)$. Only technical difficulties appear when one generalizes the approach just described to a more general equation
$$\dot x(t) = A(t)x + f(t, x)$$
with more general boundary conditions
$$Bx(0) - Cx(1) = o$$
($B$, $C$ are $N \times N$ matrices). Notice also that having a result for a system of differential equations we can investigate boundary value problems for second order equations. For example, we put
$$x = \begin{pmatrix} y\\ \dot y \end{pmatrix}, \qquad f(t, x) = \begin{pmatrix} 0\\ g(t, y, \dot y) \end{pmatrix}, \qquad B = \begin{pmatrix} b_1 & b_2\\ 0 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 0\\ c_1 & c_2 \end{pmatrix}$$


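to rewrite
$$\begin{cases} \ddot y(t) = a(t)y(t) + g(t, y(t), \dot y(t)), & t \in (0, 1),\\ b_1y(0) + b_2\dot y(0) = 0, \quad c_1y(1) + c_2\dot y(1) = 0 \end{cases}$$
into the form
$$\begin{cases} \dot x(t) = \begin{pmatrix} 0 & 1\\ a(t) & 0 \end{pmatrix} x(t) + f(t, x(t)), & t \in (0, 1),\\ Bx(0) + Cx(1) = o. \end{cases}$$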
Many other examples of the use of the Implicit Function Theorem can be found in Vejvoda et al. [130]. We will return to the problem (4.3.1) in Example 5.2.18.
We now turn to the study of the behavior of a differentiable function in the vicinity of a critical point. We recommend that the reader first consider the cases
$$f(x) = x^n, \quad n > 1, \qquad\text{and}\qquad f(x) = \sum_{i,j=1}^n a_{ij}x_ix_j, \quad a_{ij} = a_{ji}.$$
Definition 4.3.18. Let $G$ be an open set in a Banach space $X$, $f\colon X \to \mathbb R$, $f \in C^2(G)$. A critical point $a \in G$ of $f$ is said to be non-degenerate if for any $h \in X$, $h \neq o$, the linear form $f''(a)(h, \cdot)$ does not vanish.
The following basic result holds also in a Hilbert space but its finite dimensional version is more transparent.
Theorem 4.3.19 (Morse). Let $G$ be an open set in $\mathbb R^M$, $f\colon \mathbb R^M \to \mathbb R$, $f \in C^2(G)$. Let $a \in G$ be a non-degenerate critical point of $f$. Then there exists a diffeomorphism $\varphi$ of a neighborhood $U$ of $a$ onto a neighborhood $V$ of $o \in \mathbb R^M$ such that for $x \in U$, $y = \varphi(x)$, the function $f$ can be expressed in the form
$$f(x) = f(a) + \frac12\sum_{i=1}^M \lambda_iy_i^2$$
where $\lambda_1, \dots, \lambda_M$ are the eigenvalues of the symmetric matrix $f''(a)$.
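In practice the theorem is applied by computing the eigenvalues of $f''(a)$. The sketch below does this for the hypothetical function $f(x, y) = x^2 - 3y^2 + x^3y$ (our example) at its critical point $o$, approximating $f''(o)$ by central differences and using the closed-form eigenvalues of a symmetric $2 \times 2$ matrix. Both eigenvalues are nonzero, so $o$ is non-degenerate, and in suitable local coordinates $f = \frac12(2y_1^2 - 6y_2^2)$.

```python
import math

def f(x, y):
    return x ** 2 - 3 * y ** 2 + x ** 3 * y

def hessian_2d(fun, x, y, h=1e-4):
    """Central-difference approximation of the symmetric matrix f''(x, y)."""
    fxx = (fun(x + h, y) - 2 * fun(x, y) + fun(x - h, y)) / h ** 2
    fyy = (fun(x, y + h) - 2 * fun(x, y) + fun(x, y - h)) / h ** 2
    fxy = (fun(x + h, y + h) - fun(x + h, y - h)
           - fun(x - h, y + h) + fun(x - h, y - h)) / (4 * h ** 2)
    return fxx, fxy, fyy

def sym_eigenvalues(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] in increasing order."""
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - rad, mean + rad

lam1, lam2 = sym_eigenvalues(*hessian_2d(f, 0.0, 0.0))
print(lam1, lam2)   # approximately -6 and 2: opposite signs, a saddle
```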


Proof. We identify a bilinear operator with its matrix representation in the standard basis in $\mathbb R^M$ (Remark 3.2.29(ii)) and denote the collection of all $M \times M$ matrices by $\mathcal M$. Then we can write
$$B(x)(x - a, x - a) := \bigl(B(x)(x - a), x - a\bigr)_{\mathbb R^M}.$$
We choose a norm $\|\cdot\|_{M\times M}$ on $\mathcal M$ and keep it fixed throughout the proof. The subset of $\mathcal M$ consisting of symmetric matrices is denoted by $\mathcal S$. We also denote by $\mathcal F$ and


$\mathcal F_S$ the sets of all bounded continuous maps of $G$ into $\mathcal M$ and $\mathcal S$, respectively. The space $\mathcal F$ equipped with the norm
$$\|A\|_{\mathcal F} := \sup_{x \in G}\|A(x)\|_{M\times M}$$
is a Banach space and $\mathcal F_S$ is its closed subspace. Without loss of generality we can assume that $G$ is a convex neighborhood of the point $a$ so small that $f''$ is bounded on $G$.
After these preliminaries we start with the proof. Since $f'(a) = o$, the Taylor Formula (Proposition 3.2.27) gives
$$f(x) = f(a) + \int_0^1 (1 - t)f''\bigl(a + t(x - a)\bigr)(x - a, x - a)\,dt = f(a) + B(x)(x - a, x - a)$$
with
$$B(x) := \int_0^1 (1 - t)f''\bigl(a + t(x - a)\bigr)\,dt$$
(the Riemann integral of a function with values in $\mathbb R^{M\times M}$). Note that we have $B(\cdot) \in \mathcal F_S$. Our aim is to show that we can choose $C(\cdot) \in \mathcal F$ such that
$$B(x) = C^*(x)JC(x)$$
where $J$ is the canonical form of $B(a) = \frac12 f''(a)$, i.e.,⁶
$$J = \begin{pmatrix} \frac12\lambda_1 & & 0\\ & \ddots & \\ 0 & & \frac12\lambda_M \end{pmatrix}.$$
Here $C^*$ stands for the adjoint matrix to $C$, i.e., $C^* = (c_{ji})$ provided $C = (c_{ij})$. The transformation of coordinates
$$y = C(x)(x - a)$$
then yields
$$f(x) = f(a) + (Jy, y)_{\mathbb R^M} = f(a) + \frac12\sum_{i=1}^M \lambda_iy_i^2.$$
To achieve this goal we will use the Implicit Function Theorem (Theorem 4.2.1). We put
$$\Phi(B, C) := C^*(x)JC(x) - B(x)\colon \mathcal F_S \times \mathcal F \to \mathcal F_S.$$
In particular,
$$\Phi(B(a), T) = T^*JT - B(a) = o,$$

⁶ A symmetric matrix has a diagonal canonical form, see Proposition 6.3.8.


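provided $T$ is a unitary matrix which transforms $B(a)$ into its canonical form $J$. Put $A := JT$. The partial differential of $\Phi$ with respect to the second variable has the form
$$\Phi_2'(B, C)M\colon x \mapsto M^*(x)JC(x) + C^*(x)JM(x), \qquad x \in G.$$
Then
$$\operatorname{Ker}\Phi_2'(B(a), T) = \{M \in \mathcal F : M^*(\cdot)A + A^*M(\cdot) = o\}$$
and
$$Q\colon M \mapsto \frac12\bigl(M - (A^*)^{-1}M^*A\bigr)$$
is a continuous linear projection of $\mathcal F$ onto $\operatorname{Ker}\Phi_2'(B(a), T)$. By the assumption on the point $a$, $J$ is injective. Further, $T$ is a unitary matrix, i.e., $T^* = T^{-1}$. This means that $(A^*)^{-1} = J^{-1}T$ exists. It can be seen that $I - Q$ is a projection onto
$$\mathcal F_1 := (A^*)^{-1}(\mathcal F_S).$$
The partial differential $\Phi_2'(B(a), T)$ is an isomorphism of $\mathcal F_1$ onto $\mathcal F_S$. Namely,
$$M := \frac12 J^{-1}TS \in \mathcal F_1 \qquad\text{and}\qquad \Phi_2'(B(a), T)M = S \in \mathcal F_S.$$
We can now apply the Implicit Function Theorem to $\Phi\colon \mathcal F_S \times \mathcal F_1 \to \mathcal F_S$ ($T \in \mathcal F_1$) and obtain positive numbers $\delta$ and $\Delta$ such that for any $B \in \mathcal F_S$, $\|B(\cdot) - B(a)\|_{\mathcal F} < \delta$, there is a unique $C \in \mathcal F_1$, $\|C(\cdot) - T\|_{\mathcal F} < \Delta$, for which
$$\Phi(B, C) = C^*(x)JC(x) - B(x) = o \qquad\text{for all } x \in G.$$
To finish the proof we have to show that there is a neighborhood $U$ of $a$ such that
$$\|B(x) - B(a)\|_{M\times M} < \delta \qquad\text{for all } x \in U.$$
By the definition of $B$,
$$\|B(\cdot) - B(a)\|_{\mathcal F} = \sup_{x \in G}\left\|\int_0^1 (1 - t)\bigl[f''(a + t(x - a)) - f''(a)\bigr]\,dt\right\|_{M\times M} \le \frac12\sup_{x \in G}\|f''(x) - f''(a)\|_{M\times M}.$$
This means that we can find the desired neighborhood $U$.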
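A one-dimensional instance of the construction in the proof (our own example): for $f(x) = x^2 + x^3$ and $a = 0$ we get $B(x) = \int_0^1 (1 - t)(2 + 6tx)\,dt = 1 + x$, $J = B(0) = 1 = \frac12\lambda_1$ with $\lambda_1 = f''(0) = 2$, and $C(x) = \sqrt{1 + x}$ solves $C^*JC = B$ for $x > -1$. The coordinate $y = C(x)x$ turns $f$ into $y^2 = \frac12\lambda_1y^2$. The sketch checks this numerically; the quadrature rule (composite Simpson) is our choice.

```python
import math

def f(x):
    return x ** 2 + x ** 3

def f2(x):
    # f''(x) = 2 + 6x
    return 2 + 6 * x

def B(x, n=20):
    """Composite Simpson rule for B(x) = int_0^1 (1 - t) f''(a + t(x - a)) dt, a = 0.

    The integrand is quadratic in t, so the rule is exact up to rounding."""
    def integrand(t):
        return (1 - t) * f2(t * x)
    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3

def y(x):
    # Morse coordinate y = C(x)(x - a) with C(x) = sqrt(B(x)), since J = B(0) = 1
    return math.sqrt(B(x)) * x

rows = []
for x in (-0.3, -0.05, 0.1, 0.4):
    rows.append((x, B(x) - (1 + x), f(x) - y(x) ** 2))
    print(rows[-1])

dy0 = (y(1e-6) - y(-1e-6)) / 2e-6   # y'(0) = C(0) = 1: y is a local diffeomorphism
print(dy0)
```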

Example 4.3.20. Let $X$, $Y$ be Banach spaces and let $f\colon X \to Y$. Consider the equation
$$f(x) = o \tag{4.3.6}$$


in the vicinity of a known solution $x = a$. Let $f$ be a $C^2$-mapping in a neighborhood of $a$. Suppose that $f'(a)$ is a Fredholm operator (Remark 4.3.14) and, moreover, that the above equation can be reduced to the bifurcation equation
$$(I - Q)f\bigl(g(x_2) + x_2\bigr) = o.$$
Here $Q$ is a projection of $Y$ onto $\operatorname{Im} f'(a)$, $X = X_1 \oplus \operatorname{Ker} f'(a)$ and $g(x_2)$ is the (unique) solution of
$$Qf(x_1 + x_2) = o \qquad\text{for } x_2 \in \operatorname{Ker} f'(a),$$
and $x_2$ is in a neighborhood of $a_2 \in \operatorname{Ker} f'(a)$ ($a = a_1 + a_2$). We also assume that this $g$ is given by the Implicit Function Theorem. In particular, this means that
$$g'(a_2) = o.$$
Suppose now that
$$\operatorname{codim}\operatorname{Im} f'(a) = 1,$$
i.e., $I - Q$ is a projection onto a 1-dimensional subspace $Y_2$ of $Y$. Let
$$Y_2 = \operatorname{Lin}\{y_2\}.$$
By Corollary 2.1.18 and Remark 2.1.19, there is $\psi \in Y^*$, $\psi \in [\operatorname{Im} f'(a)]^\perp$, and we may assume that
$$\psi(y_2) = 1.$$
In other words,
$$(I - Q)y = \psi(y)y_2,$$
and the bifurcation equation has the form
$$F(x_2) := \psi\bigl(f(g(x_2) + x_2)\bigr) = 0.$$
We have
$$F'(a_2)h = \psi\bigl[f'(a)(g'(a_2)h + h)\bigr] = 0, \qquad h \in \operatorname{Ker} f'(a),$$
i.e., $a_2$ is a critical point of $F$. Further,
$$F''(a_2)(h, k) = \psi\bigl[f''(a)(g'(a_2)h + h, g'(a_2)k + k)\bigr] + \psi\bigl[f'(a)(g''(a_2)(h, k))\bigr] = \psi\bigl[f''(a)(h, k)\bigr]$$
since $\psi \circ f'(a) = 0$ and $g'(a_2) = o$. If, for example,
$$\dim\operatorname{Ker} f'(a) = 2$$


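(this can occur for $f\colon \mathbb R^{N+1} \to \mathbb R^N$) and the matrix of $F''(a_2)$ is regular, i.e., $a_2$ is a non-degenerate critical point of $F$, then after a suitable transformation of coordinates we get
$$F(x_2) = \frac12\bigl(\lambda_1\xi_1^2 + \lambda_2\xi_2^2\bigr)$$
(the Morse Theorem) and the following conclusion:
If $\operatorname{sgn}\lambda_1 = \operatorname{sgn}\lambda_2$, then the equation (4.3.6) has an isolated solution $x = a$; if $\operatorname{sgn}\lambda_1 \neq \operatorname{sgn}\lambda_2$, then there are two curves of solutions given by
$$\xi_2 = \pm\sqrt{-\frac{\lambda_1}{\lambda_2}}\,\xi_1.$$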

The previous example can be generalized. The following problem is a standard one in bifurcation theory: a differentiable map $f\colon \mathbb R \times X \to Y$ is given where $X$, $Y$ are Banach spaces.⁷ A smooth curve $x = \xi(\lambda)$, $\lambda \in (-\delta, \delta)$, of solutions of the equation
$$f(\lambda, x) = o \tag{4.3.7}$$
is known. After the transformation $\tilde x = x - \xi(\lambda)$, we can suppose that
$$f(\lambda, o) = o \tag{4.3.8}$$
for $\lambda$ in a neighborhood of (e.g.) $0 \in \mathbb R$.


Definition 4.3.21. Let (4.3.8) be satisfied for the equation (4.3.7). The point $(0, o) \in \mathbb R \times X$ is called a bifurcation point provided in any neighborhood of $(0, o)$ there is a solution $(\lambda_0, x_0)$ of (4.3.7) such that $x_0 \neq o$.
Notice that whenever $f$ is continuously differentiable in a neighborhood $U$ of $(0, o)$ and $f_2'(0, o)$ is an isomorphism, then $(0, o)$ is not a bifurcation point (the Implicit Function Theorem). In order to find a sufficient condition for bifurcation suppose that $f \in C^2(U)$ and $A = f_2'(0, o)$ is not an isomorphism. More precisely, let $\operatorname{Ker} A$ be nontrivial, i.e., let $0$ be an eigenvalue of $A$. The simplest case occurs when $0$ is a simple eigenvalue, i.e.,
$$\operatorname{Ker} A = \operatorname{Lin}\{x_0\}, \qquad x_0 \neq o.$$

The following result is a classical one (see Crandall & Rabinowitz [29]).
Theorem 4.3.22 (Local Bifurcation Theorem). Let $X$, $Y$ be Banach spaces, $f\colon \mathbb R \times X \to Y$ a twice continuously differentiable map on a neighborhood of $(0, o)$. Let $f$ satisfy the assumptions
(i) $f(\lambda, o) = o$ for all $\lambda \in (-\delta, \delta)$ for some $\delta > 0$,
(ii) $\dim\operatorname{Ker} f_2'(0, o) = \operatorname{codim}\operatorname{Im} f_2'(0, o) = 1$,
(iii) if $f_2'(0, o)x_0 = o$, $x_0 \neq o$, then $f_{1,2}''(0, o)(1, x_0) \notin \operatorname{Im} f_2'(0, o)$.

⁷ The set of parameters $\mathbb R$ can be replaced by a normed linear space in general.


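Denote by $X_1$ the topological complement⁸ of $\operatorname{Ker} f_2'(0, o)$ in $X$. Then there is a $C^1$-curve $(\lambda, \psi)\colon (-\eta, \eta) \to \mathbb R \times X_1$ (for some $\eta > 0$) such that
$$\lambda(0) = 0, \qquad \psi(0) = o, \qquad f\bigl(\lambda(t), t(x_0 + \psi(t))\bigr) = o.$$
Moreover, there is a neighborhood $U$ of $(0, o)$ in $\mathbb R \times X$ such that
$$f(\lambda, x) = o \qquad\text{for } (\lambda, x) \in U$$
if and only if either $x = o$ or
$$\lambda = \lambda(t), \qquad x = t\bigl(x_0 + \psi(t)\bigr) \qquad\text{for a certain } t,$$
see Figure 4.3.10. Such a picture is called a bifurcation diagram.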


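[Figure 4.3.10: bifurcation diagram in $\mathbb R \times X$ near $(0, o)$ inside the neighborhood $U$: the trivial branch along the $\mathbb R$-axis and the nontrivial branch $(\lambda(t), t(x_0 + \psi(t)))$ crossing it at $(0, o)$.]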

Proof. We will give two proofs. The first one, for the finite dimensional case $X = Y = \mathbb R^M$, is based on the Morse Theorem. The second one, which is due to M. Crandall and P. Rabinowitz, is based on the Implicit Function Theorem and will be only sketched.
The first proof. We choose $\phi \in Y^* = \mathbb R^M$, $\phi \neq o$, such that
$$y \in \operatorname{Im} f_2'(0, o) \qquad\text{if and only if}\qquad \phi(y) = 0.$$
Using the Lyapunov–Schmidt Reduction (Remark 4.3.14) we obtain a map $g(\lambda, t)\colon \mathbb R^2 \to X_1$ such that the equation $f(\lambda, x) = o$ is locally equivalent to the equation
$$F(\lambda, t) := \phi\bigl[f(\lambda, tx_0 + g(\lambda, t))\bigr] = 0.$$
We now show that $(0, 0) \in \mathbb R^2$ is a non-degenerate critical point of $F$. To do this we need to compute $F'(0, 0)$ and $F''(0, 0)$. Since
$$f_1'(\lambda, o) = o, \qquad f_{1,1}''(\lambda, o) = o \qquad\text{for } \lambda \in (-\delta, \delta)$$

⁸ Let $X = X_1 \oplus X_2$ and let the corresponding projection $P$ of $X$ onto $X_1$ be continuous. Then $X_1$ is called a topological complement of $X_2$ and vice versa.


(assumption (i)) and
$$g(\lambda, 0) = o, \qquad g_2'(0, 0) = o$$
(see Remark 4.3.14), we have
$$F'(0, 0) = 0 \qquad\text{and also}\qquad F_{1,1}''(0, 0) = 0.$$
Further,
$$F_2'(\lambda, 0) = \phi\bigl[f_2'(\lambda, o)(x_0 + g_2'(\lambda, 0))\bigr].$$
Therefore, by (iii),
$$\alpha := F_{1,2}''(0, 0) = F_{2,1}''(0, 0) = \phi\bigl[f_{1,2}''(0, o)(1, x_0) + f_2'(0, o)g_{1,2}''(0, 0)\bigr] = \phi\bigl[f_{1,2}''(0, o)(1, x_0)\bigr] \neq 0$$
since $\phi(z) = 0$ for every $z \in \operatorname{Im} f_2'(0, o)$. If we denote $\beta := F_{2,2}''(0, 0)$, we obtain the matrix representation of $F''(0, 0)$ in the form
$$\begin{pmatrix} 0 & \alpha\\ \alpha & \beta \end{pmatrix}.$$
This matrix has eigenvalues of different signs. The rest of the proof follows by applying the Morse Theorem (see also Example 4.3.20).
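The sign claim can be checked directly: the eigenvalues $\mu_{1,2}$ of the matrix above are the roots of its characteristic polynomial, and their product equals its determinant.
$$\det\begin{pmatrix} -\mu & \alpha\\ \alpha & \beta - \mu \end{pmatrix} = \mu^2 - \beta\mu - \alpha^2 = 0 \quad\Longrightarrow\quad \mu_{1,2} = \frac{\beta \pm \sqrt{\beta^2 + 4\alpha^2}}{2}, \qquad \mu_1\mu_2 = -\alpha^2 < 0.$$
Since $\alpha \neq 0$, one eigenvalue is positive and the other negative, so $(0, 0)$ is a non-degenerate critical point of $F$ with $\operatorname{sgn}\mu_1 \neq \operatorname{sgn}\mu_2$. This is the second case of Example 4.3.20: the zero set of $F$ near $(0, 0)$ consists of two crossing curves, one of them being the trivial branch $t = 0$ (recall that $F(\lambda, 0) = 0$).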
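The second proof proceeds by using the Implicit Function Theorem for the function $\Phi\colon \mathbb R \times \mathbb R \times X_1 \to Y$ defined by
$$\Phi(\lambda, t, x_1) = \begin{cases} \dfrac1t f\bigl(\lambda, t(x_0 + x_1)\bigr) & \text{for } t \neq 0,\\[1ex] f_2'(\lambda, o)(x_0 + x_1) & \text{for } t = 0. \end{cases}$$
Notice that $\Phi(0, 0, o) = o$, and
$$(\mu, h) \mapsto \Phi_1'(0, 0, o)\mu + \Phi_3'(0, 0, o)h$$
is an isomorphism of $\mathbb R \times X_1$ onto $Y$ (assumptions (ii) and (iii)). For details see Crandall & Rabinowitz [29].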

Example 4.3.23. The following two functions offer very simple illustrative examples:
$$f(\lambda, x) = \lambda x - x^2, \qquad g(\lambda, x) = \lambda x - x^3.$$
Their bifurcation diagrams are shown in Figures 4.3.11 and 4.3.12.
We use these functions to point out the typical examples of the change of stability of a differential equation when a so-called non-hyperbolic stationary point is crossed.⁹ In these figures branches of stationary solutions of the equations
$$\dot x = f(\lambda, x), \qquad \dot x = g(\lambda, x)$$
are shown with an indication of their stability (s for stable, u for unstable).

⁹ A stationary point $a \in \mathbb R^M$ is called hyperbolic for the equation $\dot x = f(x)$ provided $f(a) = o$ and $\sigma(f'(a)) \cap i\mathbb R = \emptyset$. See also footnote 3 on page 154.


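[Figure 4.3.11. Transcritical bifurcation: branches of stationary solutions of $\dot x = \lambda x - x^2$ in the $(\lambda, x)$-plane crossing at $(0, 0)$, each marked s or u.]
[Figure 4.3.12. Pitchfork bifurcation: branches of stationary solutions of $\dot x = \lambda x - x^3$ in the $(\lambda, x)$-plane meeting at $(0, 0)$, each marked s or u.]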
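The stability labels can be recomputed from the linearization: a stationary point $x^*$ of the scalar equation $\dot x = h(\lambda, x)$ is hyperbolic if $h_x(\lambda, x^*) \neq 0$, stable if $h_x < 0$ and unstable if $h_x > 0$. The sketch below (helper names are ours) tabulates this for both examples on either side of $\lambda = 0$, using the branches $x = 0$, $x = \lambda$ (transcritical) and $x = 0$, $x = \pm\sqrt\lambda$ for $\lambda > 0$ (pitchfork).

```python
import math

def label(h_x, lam, x_star):
    """Stability of a stationary point x_star of x' = h(lam, x) from the sign of h_x."""
    d = h_x(lam, x_star)
    if d == 0:
        return "non-hyperbolic"
    return "s" if d < 0 else "u"

def fx(lam, x):          # derivative in x of f(lam, x) = lam*x - x**2
    return lam - 2 * x

def gx(lam, x):          # derivative in x of g(lam, x) = lam*x - x**3
    return lam - 3 * x ** 2

table = {}
for lam in (-0.5, 0.5):
    table[("transcritical", lam)] = {"x=0": label(fx, lam, 0.0),
                                     "x=lam": label(fx, lam, lam)}
    branches = {"x=0": label(gx, lam, 0.0)}
    if lam > 0:          # the nontrivial pitchfork branches exist only for lam > 0
        r = math.sqrt(lam)
        branches["x=+sqrt(lam)"] = label(gx, lam, r)
        branches["x=-sqrt(lam)"] = label(gx, lam, -r)
    table[("pitchfork", lam)] = branches

for key, value in table.items():
    print(key, value)
print(label(fx, 0.0, 0.0), label(gx, 0.0, 0.0))  # both non-hyperbolic at (0, 0)
```

The exchange of the labels s and u between the branches as $\lambda$ crosses $0$ is exactly the change of stability at the non-hyperbolic point $(0, 0)$.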

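Example 4.3.24. We wish to find a nontrivial $2\pi$-periodic solution of the nonlinear pendulum equation
$$\ddot x(t) + \lambda\sin x(t) = 0. \tag{4.3.9}$$
We put
$$f(\lambda, x)\colon t \mapsto \ddot x(t) + \lambda\sin x(t),$$
and
$$X = \{x \in C^2(\mathbb R) : x \text{ is } 2\pi\text{-periodic}\}, \qquad \|x\|_X = \max_{t \in [0, 2\pi]}|x(t)| + \max_{t \in [0, 2\pi]}|\dot x(t)| + \max_{t \in [0, 2\pi]}|\ddot x(t)|,$$
$$Y = \{y \in C(\mathbb R) : y \text{ is } 2\pi\text{-periodic}\}, \qquad \|y\|_Y = \max_{t \in [0, 2\pi]}|y(t)|.$$
It is easy to show that
$$f_2'(\lambda, o)h = \ddot h + \lambda h$$
and, therefore, $\operatorname{Ker} f_2'(\lambda, o)$ is nontrivial if and only if $\lambda = n^2$, $n \in \mathbb N \cup \{0\}$, and
$$\operatorname{Ker} f_2'(0, o) = \{\text{constant functions}\}, \qquad \operatorname{Ker} f_2'(n^2, o) = \operatorname{Lin}\{\sin nt, \cos nt\} \quad\text{for } n \in \mathbb N.$$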

In the former case, i.e., $n = 0$, we can apply Theorem 4.3.22. Since
$$\operatorname{Im} f_2'(0, o) = \left\{y \in Y : \int_0^{2\pi} y(s)\,ds = 0\right\} \qquad\text{and}\qquad f_{1,2}''(0, o)(1, c) = c \quad\text{for } c \in \operatorname{Ker} f_2'(0, o),$$
the assumptions of Theorem 4.3.22 are satisfied.
What can we do in the latter case when $n \in \mathbb N$ and the dimension of $\operatorname{Ker} f_2'(n^2, o)$ is equal to 2? In spite of the fact that Theorem 4.3.22 cannot be used we still may proceed with the Lyapunov–Schmidt Reduction: Denote
$$A := f_2'(n^2, o), \qquad Y = \operatorname{Im} A \oplus Z,$$
$$(I - Q)y\colon t \mapsto \frac{\sin nt}{\pi}\int_0^{2\pi} y(s)\sin ns\,ds + \frac{\cos nt}{\pi}\int_0^{2\pi} y(s)\cos ns\,ds, \qquad y \in Y.$$


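Then $I - Q$ is the projection onto $Z$ such that $\operatorname{Ker}(I - Q) = \operatorname{Im} A$. Similarly, let
$$X = \operatorname{Ker} A \oplus V \qquad\text{where } V = \{v \in X : (I - Q)v = o\}.$$
The operator $f$ can be expressed by
$$f(\varepsilon + n^2, u + v) = Av + \varepsilon(u + v) + h(\varepsilon, u, v), \qquad u \in \operatorname{Ker} A,\ v \in V,$$
where
$$h(\varepsilon, u, v) = \varepsilon\bigl[\sin(u + v) - (u + v)\bigr] + n^2\bigl[\sin(u + v) - (u + v)\bigr].$$
Because of this special form of $h$ we will try to find a solution of (4.3.9) in the form $x = \varepsilon(u + v)$. The equality
$$f\bigl(\varepsilon + n^2, \varepsilon(u + v)\bigr) = 0$$
holds (for $\varepsilon \neq 0$) if and only if
$$Av + \varepsilon(u + v) + \varepsilon^2g(\varepsilon, u, v) = 0 \tag{4.3.10}$$
where
$$g(\varepsilon, u, v) = \begin{cases} \dfrac{h(\varepsilon, \varepsilon u, \varepsilon v)}{\varepsilon^3}, & \varepsilon \neq 0,\\[1ex] -\dfrac{n^2}{6}(u + v)^3, & \varepsilon = 0. \end{cases}$$
For solving (4.3.10) we use the Lyapunov–Schmidt Reduction. According to it, the equation (4.3.10) is equivalent to the following pair of equations (the second one after division by $\varepsilon$):
$$0 = Av + \varepsilon Q(u + v) + \varepsilon^2Qg(\varepsilon, u, v) \qquad \bigl(= Av + \varepsilon v + \varepsilon^2Qg(\varepsilon, u, v)\bigr),$$
$$0 = (I - Q)(u + v) + \varepsilon(I - Q)g(\varepsilon, u, v) \qquad \bigl(= u + \varepsilon(I - Q)g(\varepsilon, u, v)\bigr).$$
By the Implicit Function Theorem, the first equation has a unique solution $v = \psi(\varepsilon, u)$ in a neighborhood of the point $(0, u^*, o)$ for any $u^*$. We insert it into the bifurcation equation obtaining
$$\Phi(\varepsilon, u) := u + \varepsilon(I - Q)g\bigl(\varepsilon, u, \psi(\varepsilon, u)\bigr) = 0. \tag{4.3.11}$$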

Since $\Phi(0, u^*) = u^*$, we take $u^* = o$ and solve (4.3.11) in a neighborhood of $(0, o)$. This can be done with the help of the Implicit Function Theorem since $\Phi_2'(0, o)$ is an isomorphism of $\operatorname{Ker} A$ onto $Z$. Denoting this solution by $u = \varphi(\varepsilon)$ we come to the following conclusion:
Any point $(n^2, o)$ is a bifurcation point of the equation (4.3.9) and a nontrivial branch of $2\pi$-periodic solutions of (4.3.9) has the form
$$x = \varepsilon\bigl(\varphi(\varepsilon) + \psi(\varepsilon, \varphi(\varepsilon))\bigr), \qquad \varepsilon \in (-\delta, \delta).$$
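The conclusion can be probed numerically for $n = 1$ (this check is ours and does not use the reduction above): for a small amplitude $a$ we look for $\lambda$ such that the solution of (4.3.9) with $x(0) = a$, $\dot x(0) = 0$ reaches $x = 0$ exactly at $t = \pi/2$; by the symmetry of the pendulum this gives a $2\pi$-periodic solution. The classical small-amplitude expansion of the pendulum period predicts $\lambda = 1 + a^2/8 + O(a^4)$, so nontrivial $2\pi$-periodic solutions accumulate at $(1, o)$. Runge–Kutta integration and bisection in $\lambda$ are assumptions of this sketch.

```python
import math

def x_at_quarter_period(lam, a, n_steps=2000):
    """RK4 for x'' + lam*sin(x) = 0 with x(0) = a, x'(0) = 0; returns x(pi/2)."""
    h = (math.pi / 2) / n_steps
    x, p = a, 0.0                       # p = x'
    for _ in range(n_steps):
        k1x, k1p = p, -lam * math.sin(x)
        k2x, k2p = p + h * k1p / 2, -lam * math.sin(x + h * k1x / 2)
        k3x, k3p = p + h * k2p / 2, -lam * math.sin(x + h * k2x / 2)
        k4x, k4p = p + h * k3p, -lam * math.sin(x + h * k3x)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return x

def lambda_for_amplitude(a, lo=0.5, hi=2.0, tol=1e-12):
    """Bisection in lam for x(pi/2) = 0, i.e. a symmetric oscillation of period 2*pi."""
    f_lo = x_at_quarter_period(lo, a)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = x_at_quarter_period(mid, a)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

results = {}
for a in (0.4, 0.2, 0.1):
    results[a] = lambda_for_amplitude(a)
    print(a, results[a], 1 + a * a / 8)  # lam -> 1 = n^2 as the amplitude a -> 0
```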


The reader is invited to generalize this procedure to obtain sufficient conditions for a bifurcation for the equation
$$f(\lambda, x) = o$$
assuming that
$$f \in C^2(U), \qquad f(\lambda, o) = o \quad\text{for } |\lambda - \lambda^*| < \delta, \qquad \dim\operatorname{Ker} f_2'(\lambda^*, o) = \operatorname{codim}\operatorname{Im} f_2'(\lambda^*, o) = 2$$
where $U$ is a neighborhood of $(\lambda^*, o)$.
We notice that no uniqueness of the bifurcation branch was proved even in our concrete example. Compare this with the assertion given in Theorem 4.3.22. This is due to our special choice of the form of the bifurcation branch, namely
$$x = \varepsilon(u + v).$$
Example 4.3.25 (Application of Theorem 4.3.22). We will study the bifurcation points of the periodic problem
$$\begin{cases} \ddot x(t) + \lambda x(t) + g(\lambda, t, x(t), \dot x(t)) = 0, & t \in (0, 2\pi),\\ x(0) = x(2\pi), \quad \dot x(0) = \dot x(2\pi). \end{cases} \tag{4.3.12}$$
In this example we will concentrate on the point $\lambda = 0$ which is an eigenvalue of multiplicity 1 of the associated eigenvalue problem
$$\begin{cases} \ddot x(t) + \lambda x(t) = 0, & t \in (0, 2\pi),\\ x(0) = x(2\pi), \quad \dot x(0) = \dot x(2\pi). \end{cases} \tag{4.3.13}$$
We consider the same function spaces $X$, $Y$ as in the previous example (Example 4.3.24).
Let us define $F\colon \mathbb R \times X \to Y$ by
$$F(\lambda, x)(t) = \ddot x(t) + \lambda x(t) + g(\lambda, t, x(t), \dot x(t))$$
where the function
$$g = g(\lambda, t, x, y)$$
satisfies the following hypotheses:
(i) $g$ is $2\pi$-periodic in $t$ and continuous with respect to all four variables (as a function from $\mathbb R^4$ into $\mathbb R$);
(ii) the derivatives of $g$ with respect to $x$, $y$, up to the order $p$ ($p \ge 2$), are continuous functions from $\mathbb R^4$ into $\mathbb R$;
(iii) $g(\lambda, t, 0, 0) = 0$ for all $t, \lambda \in \mathbb R$;
(iv) $g_3'(\lambda, t, 0, 0) = g_4'(\lambda, t, 0, 0) = 0$ for all $t, \lambda \in \mathbb R$.


It follows from (iii) that $F(\lambda, o) = o$ for all $\lambda \in \mathbb R$. Moreover, thanks to (iv) we have
$$F_2'(\lambda, o)w = \ddot w + \lambda w,$$
and so we conclude
$$\dim\operatorname{Ker} F_2'(0, o) = 1.$$
It follows from (ii) that $F \in C^p(\mathbb R \times X, Y)$. By Proposition 2.1.27(iv),
$$Y_1 = \operatorname{Im} F_2'(0, o) = \left\{w \in Y : \int_0^{2\pi} w(t)\,dt = 0\right\}$$
is a closed subspace of $Y$ of codimension 1. Set
$$x_0 = 1, \qquad X_1 = \operatorname{Lin}\{1\}, \qquad X_2 = \left\{x \in X : \int_0^{2\pi} x(t)\,dt = 0\right\}.$$
Since
$$F_{1,2}''(0, o)1 = 1 \qquad\text{and}\qquad 1 \notin \operatorname{Im} F_2'(0, o),$$
the condition (iii) in Theorem 4.3.22 is verified, too.
It follows from the Crandall–Rabinowitz Bifurcation Theorem (see Theorem 4.3.22) that $(0, o)$ is a bifurcation point of (4.3.12). In particular, the point $(0, o) \in \mathbb R \times X$ belongs to the branch of trivial solutions $(\lambda, o)$, but also to the branch
$$\Gamma = \bigl\{(\lambda(s), s + \tilde x(s)) : s \in (-\delta, \delta)\bigr\}, \qquad \tilde x(0) = o, \quad \tilde x_s'(0) = o, \quad \lambda(0) = 0.$$
Hence for any $s \in (-\delta, \delta)$, $s \neq 0$, the nontrivial solution $s + \tilde x(s)$ is the sum of a constant function (with respect to $t$) and the perturbed function $\tilde x(s)$ (which depends on $t$) such that $\tilde x(s)$ belongs to $X_2$.
Exercise 4.3.26.
(i) Let $G$ be an open subset of $\mathbb R^M$ and let $v \in C^1(G, \mathbb R^M)$. Assume that $a \in G$ is not a stationary point of $v$, i.e., $v(a) \neq o$. Prove that there exists a diffeomorphism $F$ of a neighborhood $U$ of $a$ onto a neighborhood $V$ of $o$ such that $F$ maps solutions to the equation
$$\dot x(t) = v(x(t)) \tag{4.3.14}$$
which lie in $U$ to solutions in $V$ of the system of equations
$$\dot y_1(t) = 1, \qquad \dot y_i(t) = 0, \quad i = 2, \dots, M.$$
Hint. Choose a subspace $Y$ of $\mathbb R^M$ for which $\mathbb R^M = Y \oplus \operatorname{Lin}\{v(a)\}$. Define
$$G(z) = \varphi(t; a + y) \qquad\text{for } z = y + tv(a),\ y \in Y,$$
where $\varphi(t; \xi)$ is the solution to (4.3.14) such that
$$\varphi(0; \xi) = \xi.$$
Prove that $G$ is a local diffeomorphism and $F = G^{-1}$ has the desired property.

4.3A. Differentiable Manifolds, Tangent Spaces and Vector Fields


(ii) Deduce from (i) that the equation (4.3.14) has $M - 1$ independent first integrals in a neighborhood of a non-stationary point.
(iii) Is there any relation between the first integral of (4.3.14) and the linear partial differential equation
$$v_1(x)\frac{\partial u}{\partial x_1} + \dots + v_M(x)\frac{\partial u}{\partial x_M} = 0?$$

Exercise 4.3.27. Apply Theorem 4.3.22 to the (Dirichlet) boundary value problem
$$\begin{cases} \ddot x(t) + \lambda x(t) + g(\lambda, t, x(t), \dot x(t)) = 0, & t \in (0, \pi),\\ x(0) = x(\pi) = 0, \end{cases}$$
and show that every $(k^2, o)$, $k \in \mathbb N$, is a bifurcation point!
Exercise 4.3.28. Replace the Dirichlet boundary condition in Exercise 4.3.27 by the Neumann boundary condition
$$\dot x(0) = \dot x(\pi) = 0$$
and prove that every $(k^2, o)$, $k \in \mathbb N \cup \{0\}$, is a bifurcation point!
Exercise 4.3.29. Why cannot the approach used in Example 4.3.25 be applied to prove that the points $(k^2, o)$, $k \in \mathbb N$, are bifurcation points of (4.3.12) even if $k^2$ is an eigenvalue of the associated eigenvalue problem (4.3.13)? Can you modify the method from Example 4.3.24?
Exercise 4.3.30. Apply Theorem 4.3.22 to the boundary value problem
$$\begin{cases} \dfrac{d^4x}{dt^4}(t) - \lambda x(t) + g(\lambda, t, x(t), \dot x(t), \ddot x(t), \dddot x(t)) = 0, & t \in (0, \pi),\\ x(0) = \ddot x(0) = x(\pi) = \ddot x(\pi) = 0, \end{cases}$$
and show that, under appropriate assumptions on $g$, every $(k^4, o)$, $k \in \mathbb N$, is a bifurcation point.

4.3A Differentiable Manifolds, Tangent Spaces and Vector Fields


We have defined a differentiable manifold in the basic text (Definition 4.3.4) and have also shown some examples (Examples 4.3.5 and 4.3.10). In the following Appendices 4.3A–4.3D we will provide more information about this object in order to develop a geometric approach to one of the most powerful tools of nonlinear analysis, namely the Brouwer degree (cf. Section 5.2 for an alternative approach).
There is no doubt as to the importance of the notion of the derivative (or differential) for the local study of functions of one or more variables. Therefore the notion of the differential will be a rudiment of analysis on differentiable manifolds, too. We have learned that it is convenient to define the differential of
$$f\colon \mathbb R^M \to \mathbb R \qquad\text{at a point}\qquad a \in \mathbb R^M$$


as a linear form $f'(a)$ on $\mathbb R^M$ which approximates $f$ locally at $a$ in a given precise way. To extend such an approach to functions on differentiable manifolds we have to say what a linear form on a (nonlinear) manifold is. This is done by the important notion of the tangent space
$$T_a\mathcal M$$
of a differentiable manifold $\mathcal M$ at a point $a \in \mathcal M$. Roughly speaking, $T_a\mathcal M$ is the collection of all tangent vectors to $\mathcal M$ at the point $a$. We can imagine a tangent vector $v$ at the point $a \in \mathcal M$ with the help of the following physical interpretation.
Consider a force $F$ which acts on a material point $P$. Suppose that there are certain rigid constraints which make $P$ move along a smooth curve $\gamma$ which lies on a manifold $\mathcal M \subset \mathbb R^N$ (this manifold is determined by these constraints). As a concrete example, imagine that we move on the globe (because of gravitation) which can be approximated by the smooth two-dimensional sphere $S^2$. Let the point $P$ be at a certain instant, say $t = 0$, at the position $a = \gamma(0) \in \mathcal M$, and let the force $F$ and also all constraints stop operating suddenly at this time. What will happen then? According to Newton's First Law the point $P$ will continue to move with the constant speed
$$v = |\dot\gamma(0)| \qquad \Bigl(\dot\gamma(0) := \frac{d\gamma}{dt}(0)\Bigr)$$
along the line with the directional vector $\dot\gamma(0)$. This vector is the tangent vector to the curve $\gamma$ at the point $a$. The collection of all these tangent vectors (which are given by all possible motions through a fixed point $a \in \mathcal M$) forms the tangent space $T_a\mathcal M$. More precisely, we give the following definition.
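Definition 4.3.31. Let $\mathcal{M}$ be an $M$-dimensional differentiable manifold in $\mathbb{R}^N$ and let $a\in\mathcal{M}$. The tangent space $T_a\mathcal{M}$ is the collection of the tangent vectors $\dot\gamma(0)$ to all smooth curves
$$\gamma\in\Gamma_a := \{\gamma:\mathbb{R}\to\mathbb{R}^N:\ \text{there is an open interval}\ I_\gamma\ni0\ \text{such that}\ \gamma\in C^1(I_\gamma),\ \gamma(I_\gamma)\subset\mathcal{M},\ \gamma(0) = a\}.$$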
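As a quick numerical illustration of Definition 4.3.31 (not part of the original text), the following Python sketch takes two hypothetical curves on $S^2$ through the point $a = (\frac12,\frac12,\frac{\sqrt2}{2})$ and approximates $\dot\gamma(0)$ by a central difference. For the sphere one expects $(\dot\gamma(0),a) = 0$, because differentiating $|\gamma(t)|^2 = 1$ gives $2(\gamma(t),\dot\gamma(t)) = 0$. The curve names and the step size `h` are illustrative choices.

```python
import math

# Point a of Example 4.3.32; it lies on the unit sphere S^2.
A = (0.5, 0.5, math.sqrt(2) / 2)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def rotate_z(t):
    """Rotate a about the z-axis by the angle t; the curve stays on S^2."""
    x, y, z = A
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t), z)


def great_circle(t):
    """Great circle through a with unit direction u, where (u, a) = 0."""
    u = (-1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)
    return tuple(math.cos(t) * ai + math.sin(t) * ui for ai, ui in zip(A, u))


def velocity(curve, h=1e-6):
    """Central-difference approximation of the tangent vector curve'(0)."""
    return tuple((p - m) / (2 * h) for p, m in zip(curve(h), curve(-h)))


for curve in (rotate_z, great_circle):
    v = velocity(curve)
    print(curve.__name__, v, "(v, a) =", dot(v, A))
```

Both velocities come out (numerically) orthogonal to $a$, as expected for tangent vectors of the sphere.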
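The method for computing tangent vectors to a parametrized $M$-dimensional manifold $\mathcal{M}\subset\mathbb{R}^N$ is based on the use of local coordinates. Let $a\in\mathcal{M}$ and let $\Phi$ be a diffeomorphism of a neighborhood $W\subset\mathbb{R}^N$ of the point $a$ into $\mathbb{R}^N$ (see Definition 4.3.4). If
$$\mathcal{V} = W\cap\mathcal{M},$$
then $(\mathcal{V},\Phi)$ is a local chart of $\mathcal{M}$ at the point $a\in\mathcal{M}$. Denote by $\varphi$ the inverse of the restriction of $P\Phi$ to $\mathcal{V}$, where $Py = (y_1,\dots,y_M)\in\mathbb{R}^M$ for $y = (y_1,\dots,y_N)\in\mathbb{R}^N$. Then $\varphi$ maps the neighborhood $U = P\Phi(\mathcal{V})$ of the point $b = P\Phi(a)\in\mathbb{R}^M$ into $\mathcal{M}$, and we will also consider $\varphi$ as an embedding (see Remark 4.3.3(iv)) of $U$ into $\mathbb{R}^N$. We call $\varphi$ the local parametrization of $\mathcal{V}\subset\mathcal{M}$. The main reason for introducing $\varphi$ is that $\varphi$ can be differentiated, whereas $P\Phi|_{\mathcal{V}}$ cannot, and the whole $\Phi$ does not describe $\mathcal{M}$. See Figure 4.3.13.

4.3A. Differentiable Manifolds, Tangent Spaces and Vector Fields

[Figure 4.3.13. Manifold (labels: $\mathcal{M}\subset\mathbb{R}^N$, $W$, the projection $P$ onto $\mathbb{R}^M$, and the tangent vectors $\dot\gamma(0)$, $\dot\mu(0)$).]

Consider now a smooth curve $\gamma\in\Gamma_a$ (see Figure 4.3.13). We can choose $I_\gamma$ so small that $\gamma(I_\gamma)\subset\mathcal{V}$. Then
$$\mu(t) = (P\Phi)(\gamma(t))$$
is a smooth curve in $U\subset\mathbb{R}^M$. We have $\gamma = \varphi\circ\mu$ and, consequently,¹⁰
$$\dot\gamma^j(0) = \sum_{i=1}^M\frac{\partial\varphi^j}{\partial y_i}(b)\,\dot\mu^i(0),\qquad j = 1,\dots,N,$$
or, more briefly,
$$\dot\gamma(0) = \varphi'(b)(\dot\mu(0)).\qquad(4.3.15)$$
Since also
$$\dot\mu(0) = P\Phi'(a)(\dot\gamma(0)),$$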

there is a correspondence between the tangent vector $v = (\dot\gamma^1(0),\dots,\dot\gamma^N(0))\in T_a\mathcal{M}$ and the tangent vector $w = (\dot\mu^1(0),\dots,\dot\mu^M(0))$ to the curve
$$\mu = (P\Phi)\circ\gamma\quad\text{at}\quad b = \mu(0) = P\Phi(a).$$
Obviously, for any $w = (w^1,\dots,w^M)\in\mathbb{R}^M$ there is a smooth curve $\mu$ (e.g., $\mu(t) = b + t\sum_{i=1}^Mw^ie_i$, where $e_1,\dots,e_M$ is the standard coordinate basis in $\mathbb{R}^M$) such that $w = \dot\mu(0)$, and then $\gamma = \varphi\circ\mu\in\Gamma_a$. This means that
$$T_a\mathcal{M} = \operatorname{Im}\varphi'(b)$$
and the linear operations in $\mathbb{R}^M$ induce those in $T_a\mathcal{M}$. Therefore, $T_a\mathcal{M}$ is a linear space of dimension $M$ ($M = \dim\mathcal{M}$) and the vectors $\varphi'(b)e_i$, $i = 1,\dots,M$, form a basis of $T_a\mathcal{M}$. Since
$$(y^1,\dots,y^M) := (\Phi^1(x),\dots,\Phi^M(x))$$
can be viewed as local (nonlinear) coordinates of a point $x\in\mathcal{V}$, the vector $\varphi'(b)e_i$ is also denoted by $\frac{\partial}{\partial y^i}$. This means that
$$\dot\gamma(0) = \varphi'(b)(\dot\mu(0)) = \varphi'(b)\Big(\sum_{i=1}^M\dot\mu^i(0)e_i\Big) = \sum_{i=1}^M\dot\mu^i(0)\frac{\partial}{\partial y^i}.\qquad(4.3.16)$$

¹⁰ The Einstein summation convention is often used in differential geometry. According to it, the sum is taken over all indices which appear simultaneously in upper and lower positions. For example, if $e_1,\dots,e_M$ is a basis in $\mathbb{R}^M$, then the coordinates of a point $x\in\mathbb{R}^M$ with respect to this basis should be denoted by $x^1,\dots,x^M$, since $x = \sum_{i=1}^Mx^ie_i\equiv x^ie_i$ by this convention. Similarly, $\varphi:\mathbb{R}^M\to\mathbb{R}^N$ has components $\varphi^1,\dots,\varphi^N$, values $\varphi^j(x^1,\dots,x^M)$ $(=\varphi^j(x))$ and partial derivatives $\frac{\partial\varphi^j}{\partial x^i}$. Moreover, $\varphi'(a)(h^ie_i) = \sum_{j=1}^N\sum_{i=1}^M\frac{\partial\varphi^j}{\partial x^i}(a)h^ie_j$ $\big(=\frac{\partial\varphi^j}{\partial x^i}(a)h^ie_j\big)$. Since this notation is not too common in analysis, we do not use it.
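Example 4.3.32. Let us compute the tangent space to the 2-dimensional sphere
$$S^2 = \{(x,y,z)\in\mathbb{R}^3: x^2+y^2+z^2 = 1\}\quad\text{at the point}\quad a = \Big(\frac12,\frac12,\frac{\sqrt2}{2}\Big).$$
As local coordinates we choose the spherical coordinates
$$x = \cos\vartheta\cos\psi,\qquad y = \sin\vartheta\cos\psi,\qquad z = \sin\psi,$$
i.e., $(x,y,z) = \varphi(\vartheta,\psi)$, $b = \big(\frac\pi4,\frac\pi4\big)$ and $\varphi\big(\frac\pi4,\frac\pi4\big) = a$. Then the vectors
$$\frac{\partial\varphi}{\partial\vartheta}\Big(\frac\pi4,\frac\pi4\Big) = \Big(-\frac12,\frac12,0\Big),\qquad\frac{\partial\varphi}{\partial\psi}\Big(\frac\pi4,\frac\pi4\Big) = \Big(-\frac12,-\frac12,\frac{\sqrt2}{2}\Big)$$
form a basis of $T_aS^2$. Choosing a vector $v$ perpendicular to both of them, e.g., $v = (1,1,\sqrt2)$, we get the following expression for $T_aS^2$:
$$T_aS^2 = \{(x,y,z)\in\mathbb{R}^3: x+y+\sqrt2\,z = 0\}.$$
(If you draw a picture, you will get a slightly better insight.)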
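The computation in Example 4.3.32 can be checked numerically. This sketch is an illustration only. It assumes the spherical parametrization $\varphi(\vartheta,\psi) = (\cos\vartheta\cos\psi,\sin\vartheta\cos\psi,\sin\psi)$ of the example, approximates both partial derivatives of $\varphi$ at $b = (\frac\pi4,\frac\pi4)$ by central differences, and verifies that both are orthogonal to $v = (1,1,\sqrt2)$.

```python
import math


def phi(theta, psi):
    """Spherical parametrization of S^2 from Example 4.3.32."""
    return (math.cos(theta) * math.cos(psi),
            math.sin(theta) * math.cos(psi),
            math.sin(psi))


def partial(f, b, i, h=1e-6):
    """Central-difference approximation of the i-th partial derivative of f at b."""
    bp, bm = list(b), list(b)
    bp[i] += h
    bm[i] -= h
    return tuple((p - m) / (2 * h) for p, m in zip(f(*bp), f(*bm)))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


b = (math.pi / 4, math.pi / 4)
d_theta = partial(phi, b, 0)      # should be close to (-1/2, 1/2, 0)
d_psi = partial(phi, b, 1)        # should be close to (-1/2, -1/2, sqrt(2)/2)
normal = (1.0, 1.0, math.sqrt(2))
print(d_theta, d_psi)
print(dot(d_theta, normal), dot(d_psi, normal))  # both close to 0
```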

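It was shown in Remark 4.3.9(iii) that a manifold can also be given implicitly, i.e., as the set of solutions of the equation
$$f(x) = o.$$
Proposition 4.3.33. Let $f:\mathbb{R}^M\to\mathbb{R}^N$ have continuous partial derivatives in an open set $G\subset\mathbb{R}^M$ and let $o$ be a regular value of $f$ (Definition 4.3.6). Then
$$\mathcal{M} = \{x\in G: f(x) = o\}$$
is an $(M-N)$-dimensional differentiable manifold provided $\mathcal{M}$ is not empty, and for $a\in\mathcal{M}$ the tangent space $T_a\mathcal{M}$ is equal to $\operatorname{Ker}f'(a)$.
Proof. The first part is exactly Proposition 4.3.8 and Remark 4.3.9(iii). If a map $\gamma:I_\gamma\to\mathcal{M}$ is a smooth curve with $\gamma(0) = a$, then
$$f(\gamma(t)) = o\quad\text{for}\quad t\in I_\gamma,$$
and differentiating at $t = 0$ gives
$$f'(a)\dot\gamma(0) = o,\quad\text{i.e.,}\quad\dot\gamma(0)\in\operatorname{Ker}f'(a).$$
Thus $T_a\mathcal{M}\subset\operatorname{Ker}f'(a)$. Both spaces have the same (finite) dimension $M-N$: for $T_a\mathcal{M}$ this follows from the first part, and for $\operatorname{Ker}f'(a)$ from the assumption on regularity of $o$ ($f'(a)$ maps $\mathbb{R}^M$ onto $\mathbb{R}^N$). Hence
$$T_a\mathcal{M} = \operatorname{Ker}f'(a).$$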
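A small numerical check of Proposition 4.3.33 uses an illustrative choice that is not taken from the text: $f(x,y,z) = (x^2+y^2+z^2-1,\ z)$. Then $\mathcal{M} = f^{-1}(o)$ is the unit circle in the plane $z = 0$, and $M-N = 3-2 = 1$. For a $2\times3$ matrix of rank 2 the kernel is spanned by the cross product of its two rows. The sketch compares that vector with the velocity of the curve $t\mapsto(\cos t,\sin t,0)$.

```python
import math


def f(x, y, z):
    """f: R^3 -> R^2; its zero set is the unit circle in the plane z = 0."""
    return (x * x + y * y + z * z - 1.0, z)


def jacobian(F, p, h=1e-6):
    """Jacobi matrix of F at p by central differences; row j is the gradient of F^j."""
    columns = []
    for i in range(len(p)):
        pp, pm = list(p), list(p)
        pp[i] += h
        pm[i] -= h
        columns.append([(u - w) / (2 * h) for u, w in zip(F(*pp), F(*pm))])
    return [list(row) for row in zip(*columns)]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


s = 0.7
a = (math.cos(s), math.sin(s), 0.0)                 # a point of M
J = jacobian(f, a)                                  # f'(a), a 2 x 3 matrix of rank 2
kernel_dir = cross(J[0], J[1])                      # spans Ker f'(a)
curve_velocity = (-math.sin(s), math.cos(s), 0.0)   # tangent of t -> (cos t, sin t, 0) at t = s
print(kernel_dir, curve_velocity)
print(cross(kernel_dir, curve_velocity))            # close to 0: the vectors are parallel
```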


Since the same geometric object $\mathcal{M}$ can be viewed as a manifold with different local parametrizations or as the set of solutions of different equations, we would like to know how the notion of the tangent space (and other notions to be introduced later on) depends on the way $\mathcal{M}$ is introduced. As the implicit definition of a manifold leads to local parametrizations (see the proof of Proposition 4.3.8), we may consider only the definition given by parametrizations. First of all we should say when two atlases of $\mathcal{M}$ define the same structure on $\mathcal{M}$.
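Definition 4.3.34. Two $C^k$-atlases $(\mathcal{V}_\alpha,\Phi_\alpha)_{\alpha\in A}$, $(\tilde{\mathcal{V}}_\beta,\tilde\Phi_\beta)_{\beta\in B}$ of $\mathcal{M}$ are said to be equivalent if for every $a\in\mathcal{M}$ and any $\alpha\in A$, $\beta\in B$ for which $a\in\mathcal{V}_\alpha\cap\tilde{\mathcal{V}}_\beta$ the mapping
$$\kappa = (P\tilde\Phi_\beta)\circ\varphi_\alpha$$
is a $C^k$-diffeomorphism of $\varphi_\alpha^{-1}(\mathcal{V}_\alpha\cap\tilde{\mathcal{V}}_\beta)$ onto $\tilde\varphi_\beta^{-1}(\mathcal{V}_\alpha\cap\tilde{\mathcal{V}}_\beta)$ (see Figure 4.3.14).

[Figure 4.3.14. The transition map $\kappa = (P\tilde\Phi)\circ\varphi$ sends $b$ to $\tilde b$ and maps $U_1 := \varphi^{-1}(\mathcal{V}\cap\tilde{\mathcal{V}})$ onto $\tilde U_1 := \tilde\varphi^{-1}(\mathcal{V}\cap\tilde{\mathcal{V}})$.]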
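Example 4.3.35. Let $(\mathcal{V},\Phi)$, $(\tilde{\mathcal{V}},\tilde\Phi)$ be two local charts at a point $a\in\mathcal{M}$ which belong to equivalent atlases of $\mathcal{M}$. Let $v\in T_a\mathcal{M}$, $v = \dot\gamma(0)$ for a smooth curve $\gamma\in\Gamma_a$. Then
$$\gamma = \varphi\circ\mu = \tilde\varphi\circ\tilde\mu,\quad\text{where}\quad\mu = P\Phi\circ\gamma,\quad\tilde\mu = P\tilde\Phi\circ\gamma = \kappa\circ\mu,$$
and $\kappa$ is defined as in Definition 4.3.34. It follows that
$$\varphi'(b)(\dot\mu(0)) = \tilde\varphi'(\tilde b)(\kappa'(b)\dot\mu(0)),\qquad\tilde b = \kappa(b).$$
In particular, for $e_i = \dot\mu(0)$, denoting $\frac{\partial}{\partial y^i} := \varphi'(b)e_i$ (as above) and $\frac{\partial}{\partial z^j} := \tilde\varphi'(\tilde b)e_j$, we get the transformation rule for tangent vectors
$$\frac{\partial}{\partial y^i} = \tilde\varphi'(\tilde b)(\kappa'(b)e_i) = \sum_{j=1}^M\frac{\partial\kappa^j}{\partial y^i}(b)\,\tilde\varphi'(\tilde b)e_j = \sum_{j=1}^M\frac{\partial\kappa^j}{\partial y^i}(b)\frac{\partial}{\partial z^j},\qquad i = 1,\dots,M.\qquad(4.3.17)$$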
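To see the transformation rule (4.3.17) at work, read $\frac{\partial}{\partial y^i}$ as the operator $f\mapsto\frac{\partial(f\circ\varphi)}{\partial y^i}$ (the point of view of Exercise 4.3.46). Then (4.3.17) is just the chain rule. The sketch below is only an illustration on the plane. Its hypothetical choices are polar coordinates $y = (r,\vartheta)$, Cartesian coordinates $z$, the transition map $\kappa(r,\vartheta) = (r\cos\vartheta,r\sin\vartheta)$ and the test function $f(x,y) = x^2y+\sin y$. All derivatives are central differences.

```python
import math


def kappa(r, theta):
    """Transition map from the y-chart (r, theta) to the z-chart (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))


def f_cart(x, y):
    """Test function written in the z-chart."""
    return x * x * y + math.sin(y)


def f_polar(r, theta):
    """The same function written in the y-chart: f_cart composed with kappa."""
    return f_cart(*kappa(r, theta))


def d(fun, point, i, h=1e-6):
    """Central difference in the i-th argument; works for scalar or tuple values."""
    p, m = list(point), list(point)
    p[i] += h
    m[i] -= h
    fp, fm = fun(*p), fun(*m)
    if isinstance(fp, tuple):
        return tuple((u - v) / (2 * h) for u, v in zip(fp, fm))
    return (fp - fm) / (2 * h)


b = (2.0, 0.6)              # a point in the y-chart
b_tilde = kappa(*b)         # the same point in the z-chart
results = []
for i in range(2):
    lhs = d(f_polar, b, i)                   # (d/dy^i) applied to f
    dk = d(kappa, b, i)                      # (d kappa^j / d y^i), j = 1, 2
    rhs = sum(dk[j] * d(f_cart, b_tilde, j) for j in range(2))  # right side of (4.3.17)
    results.append((lhs, rhs))
print(results)
```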

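We will now examine a more general situation. Let $\mathcal{M}$, $\tilde{\mathcal{M}}$ be differentiable manifolds in $\mathbb{R}^N$ and $\mathbb{R}^{\tilde N}$, respectively. Suppose that $g:\mathcal{M}\to\tilde{\mathcal{M}}$ and let $(\mathcal{V},\Phi)$, $(\tilde{\mathcal{V}},\tilde\Phi)$ be local charts at $a\in\mathcal{M}$ and at $\tilde a = g(a)\in\tilde{\mathcal{M}}$, respectively. Put
$$G = (P\tilde\Phi)\circ g\circ\varphi$$
(see Figure 4.3.15). Then $G$ is called a local realization of $g$.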

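[Figure 4.3.15. The local realization $G = (P\tilde\Phi)\circ g\circ\varphi$ of $g:\mathcal{M}\to\tilde{\mathcal{M}}$. It is defined on $P\Phi(g^{-1}(\tilde{\mathcal{V}})\cap\mathcal{V})\subset U\subset\mathbb{R}^M$ and takes values in $\tilde U\subset\mathbb{R}^{\tilde M}$.]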

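We say that a mapping $g:\mathcal{M}\to\tilde{\mathcal{M}}$ is of the class $C^k$ (or $k$-times continuously differentiable) if $G\in C^k(U,\tilde U)$ for all charts $(\mathcal{V},\Phi)$ and $(\tilde{\mathcal{V}},\tilde\Phi)$. In this case $g$ maps a smooth curve $\gamma\in\Gamma_a$ onto a smooth curve $g\circ\gamma\in\Gamma_{\tilde a}$. Namely,
$$g\circ\gamma = \tilde\varphi\circ G\circ\mu,\quad\text{where}\quad\mu = P\Phi\circ\gamma,$$
and
$$\frac{d}{dt}(g\circ\gamma)(0) = \tilde\varphi'(\tilde b)(G'(b)\dot\mu(0)).\qquad(4.3.18)$$
We say that $g$ pushes forward the tangent vector $\dot\gamma(0)\in T_a\mathcal{M}$ to the tangent vector
$$\frac{d}{dt}(g\circ\gamma)(0)\in T_{g(a)}\tilde{\mathcal{M}},$$
which is denoted by $g_*(\dot\gamma(0))$ and is called a push-forward. In particular, $g$ pushes forward the tangent vector $\frac{\partial}{\partial y^i}$ to
$$g_*\Big(\frac{\partial}{\partial y^i}\Big) = \sum_{j=1}^{\tilde M}\frac{\partial G^j}{\partial y^i}(b)\frac{\partial}{\partial z^j},\quad\text{where}\quad\frac{\partial}{\partial z^j} = \tilde\varphi'(\tilde b)e_j.\qquad(4.3.19)$$
We wish to point out that $g_*$ is the generalization of $g'(a)$ for $g:\mathbb{R}^M\to\mathbb{R}^{\tilde M}$. The transformation rule (4.3.17) is a special case of (4.3.18) where $g = I$.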
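The push-forward (4.3.18) can be checked in the simplest case $\mathcal{M} = \tilde{\mathcal{M}} = \mathbb{R}^2$ with identity charts. There $G = g$, and $g_*$ is just multiplication by the Jacobi matrix $g'(a)$. The map $g(x,y) = (x^2-y^2,2xy)$ and the curve $\gamma$ below are hypothetical choices for illustration. The sketch compares a central difference of $g\circ\gamma$ at $t = 0$ with $g'(a)\dot\gamma(0)$.

```python
def g(x, y):
    """Illustrative smooth map g: R^2 -> R^2 (complex squaring, z -> z^2)."""
    return (x * x - y * y, 2 * x * y)


def jacobian_g(x, y):
    """Jacobi matrix g'(x, y); row j holds the partial derivatives of g^j."""
    return [[2 * x, -2 * y], [2 * y, 2 * x]]


def gamma(t):
    """Illustrative curve with gamma(0) = a = (1, 0.5) and gamma'(0) = (1, 2)."""
    return (1.0 + t, 0.5 + 2 * t + t * t)


def push_forward(a, v):
    """g_* v = g'(a) v in the identity charts."""
    J = jacobian_g(*a)
    return tuple(sum(J[j][i] * v[i] for i in range(2)) for j in range(2))


h = 1e-6
numeric = tuple((p - m) / (2 * h) for p, m in zip(g(*gamma(h)), g(*gamma(-h))))
exact = push_forward(gamma(0.0), (1.0, 2.0))
print(numeric, exact)  # both approximately (0, 5)
```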
An important special case of a smooth mapping is a differentiable function on a manifold. For such a function we define the notion of the differential.

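Definition 4.3.36. A function $f:\mathcal{M}\to\mathbb{R}$ is said to be differentiable at $a\in\mathcal{M}$ if
$$f\circ\gamma\in C^1(I_\gamma)\quad\text{for all}\quad\gamma\in\Gamma_a.$$
The differential $df(a)$ of $f$ at the point $a$ is defined by the relation
$$df(a)(\dot\gamma(0)) := \frac{d}{dt}(f\circ\gamma)(0)\quad\text{for all}\quad\gamma\in\Gamma_a.$$
The (algebraic) dual space to $T_a\mathcal{M}$ will be denoted by $(T_a\mathcal{M})^*$ (it is sometimes called the cotangent space), and the dual basis to $\frac{\partial}{\partial y^1},\dots,\frac{\partial}{\partial y^M}$ is denoted by $dy^1,\dots,dy^M$, i.e.,
$$dy^j\Big(\frac{\partial}{\partial y^i}\Big) = \delta_i^j = \begin{cases}1 & \text{for } i = j,\\ 0 & \text{for } i\neq j.\end{cases}$$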
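Remark 4.3.37.
(i) From the definition of $df(a)$ it is obvious that $df(a)\in(T_a\mathcal{M})^*$, and its values can be expressed in local coordinates as follows. If $(\mathcal{V},\Phi)$ is a local chart at $a\in\mathcal{M}$ and $F = f\circ\varphi:U\subset\mathbb{R}^M\to\mathbb{R}$, then
$$f\circ\gamma = F\circ\mu\quad\text{for}\quad\mu = P\Phi\circ\gamma,\ \gamma\in\Gamma_a,$$
i.e.,
$$\frac{d}{dt}(f\circ\gamma)(0) = \sum_{i=1}^M\frac{\partial F}{\partial y^i}(b)\,\dot\mu^i(0).$$
In other words (see (4.3.16)),
$$df(a) = \sum_{i=1}^M\frac{\partial F}{\partial y^i}(b)\,dy^i.\qquad(4.3.20)$$
In particular, for $f(x) = \Phi^i(x)$ we have
$$d\Phi^i(a) = dy^i,\qquad i = 1,\dots,M.$$
Observe that formula (4.3.20) allows us to define continuity of the mapping
$$x\in\mathcal{M}\mapsto df(x)\in(T_x\mathcal{M})^*$$
by the requirement that all $F$ (corresponding to all charts of $\mathcal{M}$) have continuous partial derivatives.¹¹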
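¹¹ It is possible to define the structure of a differentiable manifold on the collection $T\mathcal{M} := \{(x,v): x\in\mathcal{M},\ v\in T_x\mathcal{M}\}$ of all tangent spaces together with their base points. This set $T\mathcal{M}$ is called the tangent bundle of $\mathcal{M}$. The structure of a differentiable manifold on $T\mathcal{M}$ is given by the local charts which are constructed as follows: If $(\mathcal{V},\Phi)$ is a local chart at $a\in\mathcal{M}$, then $\mathcal{V}_T = \{\{x\}\times T_x\mathcal{M}: x\in\mathcal{V}\}\subset\mathbb{R}^N\times\mathbb{R}^N$ and $\Phi_T(x,w) = (\Phi(x),P\Phi'(x)w)\in\mathbb{R}^{N+M}$. In a similar way the cotangent bundle is the collection $T^*\mathcal{M} := \{(x,\xi): x\in\mathcal{M},\ \xi\in(T_x\mathcal{M})^*\}$. The continuity of $df$ which has just been defined is then the continuity of $df:\mathcal{M}\to T^*\mathcal{M}$.

(ii) Suppose now that there are two local charts $(\mathcal{V},\Phi)$, $(\tilde{\mathcal{V}},\tilde\Phi)$ at a point $a\in\mathcal{M}$. Let $f:\mathcal{M}\to\mathbb{R}$ be differentiable at $a\in\mathcal{M}$. Put
$$F := f\circ\varphi,\qquad\tilde F := f\circ\tilde\varphi,\qquad\text{i.e.,}\qquad F = \tilde F\circ\kappa,$$
where $\kappa$ is defined in Definition 4.3.34. By virtue of (4.3.20) we have
$$\begin{aligned}df(a) &= \sum_{i=1}^M\frac{\partial F}{\partial y^i}(b)\,dy^i = \sum_{i=1}^M\Big(\sum_{j=1}^M\frac{\partial\tilde F}{\partial z^j}(\tilde b)\frac{\partial\kappa^j}{\partial y^i}(b)\Big)dy^i\\ &= \sum_{j=1}^M\frac{\partial\tilde F}{\partial z^j}(\tilde b)\sum_{i=1}^M\frac{\partial\kappa^j}{\partial y^i}(b)\,dy^i = \sum_{j=1}^M\frac{\partial\tilde F}{\partial z^j}(\tilde b)\,dz^j.\end{aligned}\qquad(4.3.21)$$
The equality
$$dz^j = \sum_{i=1}^M\frac{\partial\kappa^j}{\partial y^i}(b)\,dy^i\qquad(4.3.22)$$
follows from this calculation applied to the $j$th coordinate function
$$z^j = \tilde\Phi^j(x),\qquad x\in\mathcal{V}\cap\tilde{\mathcal{V}},$$
and Remark 4.3.37(i).¹²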
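The difference between (4.3.17) and (4.3.22) can be seen numerically. Components of tangent vectors transform with the Jacobi matrix of $\kappa$, and coefficients of $df$ with its transpose. This sketch reuses an illustrative polar/Cartesian setting with $f(x,y) = x^2y+\sin y$. The names `df_y`, `w_y` and so on are hypothetical. It checks that the value $df(a)(w)$ computed in either chart is the same.

```python
import math


def jacobian_kappa(r, theta):
    """K[j][i] = d kappa^j / d y^i for kappa(r, theta) = (r cos theta, r sin theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def grad_f_cart(x, y):
    """Coefficients of df with respect to dz^1, dz^2 for f(x, y) = x^2 y + sin y."""
    return (2 * x * y, x * x + math.cos(y))


r, theta = 2.0, 0.6
K = jacobian_kappa(r, theta)
x, y = r * math.cos(theta), r * math.sin(theta)

df_z = grad_f_cart(x, y)
# Covariant rule (4.3.21)/(4.3.22): coefficients of dy^i.
df_y = tuple(sum(df_z[j] * K[j][i] for j in range(2)) for i in range(2))

w_y = (0.3, -1.7)   # components of a tangent vector with respect to d/dy^i
# Contravariant rule (4.3.17): components with respect to d/dz^j.
w_z = tuple(sum(K[j][i] * w_y[i] for i in range(2)) for j in range(2))

pairing_y = sum(c * w for c, w in zip(df_y, w_y))
pairing_z = sum(c * w for c, w in zip(df_z, w_z))
print(pairing_y, pairing_z)  # equal up to rounding
```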


According to the last remark, the existence of the differential $df(a)$ does not depend on the choice of the local chart. The fact that the differentiability of functions, and similarly other notions, does not depend on a particular choice of local coordinates is crucial for differential geometry and global analysis. Namely, these parts of mathematics study objects in their own geometric nature. The invariance of geometric objects with respect to various groups of transformations (in our case this group is the group of diffeomorphisms) plays a very important role in various applications, mainly in physics (e.g., in general relativity). It is also worth mentioning that the emphasis on invariance is, in a certain sense, a philosophy dual to that of Descartes: analytic geometry and classical differential geometry transform geometric properties into the language of analysis. Local coordinates, which introduce analytic tools into geometry, are used mainly for computations.
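Remark 4.3.38. The transformation rule (4.3.21) can be generalized. Consider a function $f:\tilde{\mathcal{M}}\to\mathbb{R}$ and suppose that a mapping $g:\mathcal{M}\to\tilde{\mathcal{M}}$ is given. We can look at $g$ as a generalized transformation (we do not assume that $g$ is injective). We are interested in the relation between $df(\tilde a)$ and the differential of the transformed function $f\circ g:\mathcal{M}\to\mathbb{R}$. The desired chain rule can again be obtained with the help of local charts $(\mathcal{V},\Phi)$ at $a\in\mathcal{M}$ and $(\tilde{\mathcal{V}},\tilde\Phi)$ at $\tilde a = g(a)\in\tilde{\mathcal{M}}$ (see Figure 4.3.15). This means that we investigate $\tilde F\circ G$ instead of $f\circ g$. Here
$$G = (P\tilde\Phi)\circ g\circ\varphi\quad\text{and}\quad\tilde F = f\circ\tilde\varphi.$$
According to (4.3.19) through (4.3.21) we have
$$d(f\circ g)(a)\Big(\frac{\partial}{\partial y^i}\Big) = \frac{\partial(\tilde F\circ G)}{\partial y^i}(b) = \sum_{j=1}^{\tilde M}\frac{\partial\tilde F}{\partial z^j}(\tilde b)\frac{\partial G^j}{\partial y^i}(b) = df(\tilde a)\Big(g_*\frac{\partial}{\partial y^i}\Big) =: g^*(df(\tilde a))\Big(\frac{\partial}{\partial y^i}\Big).\qquad(4.3.23)$$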

¹² Notice the difference between the transformation rules (4.3.17) and (4.3.22). The reader who is acquainted with tensor analysis will recognize that tangent vectors transform as contravariant tensors and differentials as covariant ones.

The linear form $g^*(df(\tilde a))\in(T_a\mathcal{M})^*$ is called the pull-back of the linear form $df(\tilde a)\in(T_{\tilde a}\tilde{\mathcal{M}})^*$. This operation will play an important role in the definition of the degree (see Proposition 4.3.116). We will return to the pull-back in Exercise 4.3.71.
Remark 4.3.39. The notion of a differentiable manifold can be generalized in such a way that it is not a priori assumed that $\mathcal{M}$ is a subset of $\mathbb{R}^N$. In fact, what we needed in Definition 4.3.4 is that $\mathcal{M}$ has a topological structure (inherited from $\mathbb{R}^N$) such that the neighborhood $\mathcal{V} = W\cap\mathcal{M}$ is homeomorphic via $P\Phi|_{\mathcal{V}}$ (with $(P\Phi|_{\mathcal{V}})^{-1} = \varphi$) to $U\subset\mathbb{R}^M$. A differential structure on $\mathcal{M}$ is introduced with the help of differentiability properties of the mappings $\kappa_{1,2} := \varphi_2^{-1}\circ\varphi_1$ for different neighborhoods $\mathcal{V}_1$, $\mathcal{V}_2$ in $\mathcal{M}$ (cf. Definition 4.3.34). This is sufficient for the definition of a smooth function $f:\mathcal{M}\to\mathbb{R}$ to be correct. (It is smooth provided $f\circ\varphi:U\to\mathbb{R}$ is smooth for all $\varphi$. Namely,
$$f(x) = (f\circ\varphi_1)(y) = (f\circ\varphi_2)[\kappa_{1,2}(y)]\quad\text{for}\quad x = \varphi_1(y)\in\mathcal{V}_1\cap\mathcal{V}_2.)$$
These considerations allow us to say that a topological space $\mathcal{M}$¹³ is called an $M$-dimensional differentiable manifold if $\mathcal{M}$ is locally homeomorphic to open sets of $\mathbb{R}^M$ in such a way that all composite mappings $\kappa_{1,2}$ belong to the class $C^k$.¹⁴
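Remark 4.3.40. We can define an infinite-dimensional differentiable manifold (e.g., of the class $C^k$) by replacing $\mathbb{R}^M$ by a Banach space $X$. As an important example consider a mapping $f\in C^k(X,\mathbb{R})$, $k\ge1$, and define
$$\mathcal{M} = \{x\in X: f(x) = 1\}.$$
If $\mathcal{M}\neq\emptyset$ and all its points are regular (i.e., $f'(x)\neq0$ for all $x\in\mathcal{M}$), then $\mathcal{M}$ is an infinite-dimensional manifold of the class $C^k$ (this follows from Proposition 4.3.8 and Remark 4.3.9(i)). Moreover, the tangent space $T_a\mathcal{M}$ is equal to $\operatorname{Ker}f'(a)$ for $a\in\mathcal{M}$. Indeed, the inclusion $T_a\mathcal{M}\subset\operatorname{Ker}f'(a)$ can be proved as in Proposition 4.3.33. To get the reverse inclusion let $h\in\operatorname{Ker}f'(a)$ and let $\varphi$ be the diffeomorphism from Proposition 4.3.8. For $\gamma(t) := \varphi(th)$ there exists $\delta>0$ such that $\gamma(t)\in\mathcal{M}$ for $|t|<\delta$ and $\dot\gamma(0) = \varphi'(o)h = h$ (cf. the proof of Proposition 4.3.8), i.e., $h\in T_a\mathcal{M}$.

In order to define the differential $df(a)$ for $f:\mathcal{M}\to\mathbb{R}$ we need a generalization of the notion of the tangent space $T_a\mathcal{M}$ to this more general setting. For $a\in\mathcal{V}\subset\mathcal{M}$ we define
$$\Gamma_a := \{\gamma:\mathbb{R}\to\mathcal{M}:\ \exists\ \text{open interval}\ I_\gamma\ni0:\ \gamma(0) = a\in\mathcal{V},\ \varphi^{-1}\circ\gamma\in C^1(I_\gamma),\ \text{where}\ \varphi:U\to\mathcal{V}\ \text{is a local parametrization of}\ \mathcal{V}\}.$$
As above, $T_a\mathcal{M}$ is the collection of all vectors $\frac{d}{dt}(\varphi^{-1}\circ\gamma)\big|_{t=0}$, equipped with the linear structure of $\mathbb{R}^M$. (Actually, $T_a\mathcal{M}$ now coincides with $\mathbb{R}^M$. We remark that previously $T_a\mathcal{M}$ was an $M$-dimensional subspace of $\mathbb{R}^N$.) Further,
$$df(a): T_a\mathcal{M}\to\mathbb{R}$$
is defined as
$$df(a)v := \frac{d}{dt}(f\circ\gamma)\Big|_{t=0} = \frac{d}{dt}(F\circ\mu)\Big|_{t=0},\quad\text{where}\quad v = \dot\mu(0),\ \mu = \varphi^{-1}\circ\gamma,\ F = f\circ\varphi,$$
see Figure 4.3.16.

[Figure 4.3.16. A curve $\gamma$ through $a$ with $\gamma(I_\gamma)\subset\mathcal{V}\subset\mathcal{M}$, the curve $\mu = \varphi^{-1}\circ\gamma$ with $\mu(I_\gamma)\subset U\subset\mathbb{R}^M$, and the function $f$ with values in $\mathbb{R}$.]

¹³ If $\mathcal{M} = \bigcup_\alpha\mathcal{V}_\alpha$ is a set such that there are injective and surjective mappings $\varphi_\alpha:U_\alpha\to\mathcal{V}_\alpha$, where $U_\alpha$ are open subsets of $\mathbb{R}^M$, then the sets $\mathcal{V}_\alpha$ form a subbase of a topology in $\mathcal{M}$.
¹⁴ The deep theorem due to H. Whitney roughly says that a connected $M$-dimensional differentiable manifold can be embedded (Remark 4.3.3(iv)) into $\mathbb{R}^{2M+1}$ (see, e.g., Whitney [133, Chapter IV], Aubin [11, Theorem 1.22] or Sternberg [124, Theorem 2.4.4]). This means that our previous approach was not too restrictive.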
In Appendix 6.3B we consider the level sets of a function $\Psi:X\to\mathbb{R}$, $\Psi\in C^k$, $k\ge1$. If $0$ is a regular value of $\Psi$ and $\mathcal{M} = \{x\in X: \Psi(x) = \Psi(a)\}\neq\emptyset$, then $\mathcal{M}$ is a $C^k$-differentiable manifold (with the parameter space $X_1 = \operatorname{Ker}\Psi'(a)$, see Remark 4.3.9(i) and the proof of Proposition 4.3.8). In this case $T_a\mathcal{M}$ can be identified with $X_1$ (the analogue of Proposition 4.3.33).
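The following notion is also useful in nonlinear analysis.
Definition 4.3.41. A vector field on a differentiable manifold $\mathcal{M}$ is a mapping $v:\mathcal{M}\to T\mathcal{M}$ such that
$$v(x)\in T_x\mathcal{M}\quad\text{for all}\quad x\in\mathcal{M}.$$
A vector field $v$ determines the differential equation
$$\dot x = v(x).\qquad(4.3.24)$$
A solution of (4.3.24) is a curve $\gamma:I_\gamma\to\mathcal{M}$ ($I_\gamma$ is an open interval in $\mathbb{R}$) such that
$$\dot\gamma(t) = v(\gamma(t))\quad\text{for all}\quad t\in I_\gamma.$$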

If we choose a chart $(\mathcal{V},\Phi)$ at a point $a\in\mathcal{M}$, we can try to find a solution of (4.3.24) which passes through the point $a$, i.e., we are looking for a curve $\gamma:I_\gamma\ni0\to\mathcal{M}$ which is a solution of (4.3.24) and satisfies $\gamma(0) = a$. If
$$v(x) = \sum_{i=1}^Mv^i(x)\frac{\partial}{\partial y^i}\quad\text{for}\quad x\in\mathcal{V},$$
then equation (4.3.24) has the form of a system
$$\dot y^i = v^i(\varphi(y)),\qquad i = 1,\dots,M.\qquad(4.3.25)$$

The local existence (and uniqueness) theorem for the system (4.3.25) can be applied provided the vector field $v$ is continuous (and all partial derivatives $\frac{\partial(v^i\circ\varphi)}{\partial y^j}$ are continuous). A standard continuation process then yields a solution which is defined on a maximal time interval $I_a$ ($\gamma(0) = a$). It is well known that even very simple differential equations in $\mathbb{R}$ need not have any solution defined on the whole of $\mathbb{R}$ (e.g., $\dot x = x^2+1$). The situation is better in the case of a compact differentiable manifold $\mathcal{M}$ (i.e., when $\mathcal{M}$ is a compact subset of $\mathbb{R}^N$). If $v$ is continuous on $\mathcal{M}$ and $a\in\mathcal{M}$, then there exists a solution $\gamma$ of (4.3.24), $\gamma(0) = a$, which is defined on the whole of $\mathbb{R}$. Because of the compactness of $\mathcal{M}$ there is a finite number of charts $(\mathcal{V}_i,\Phi_i)$, $i = 1,\dots,k$, such that $\mathcal{M} = \bigcup_{i=1}^k\mathcal{V}_i$. By the continuity of $v$ there is also a constant $K$ for which
$$|v(x)|\le K\quad\text{for all}\quad x\in\mathcal{M}.$$
Any solution $\gamma$ is therefore uniformly continuous on $I_\gamma$ (indeed, $|\gamma(t)-\gamma(s)|\le K|t-s|$). If $I_\gamma\neq\mathbb{R}$, then the limits (in $\mathcal{M}$) of $\gamma$ at the finite endpoint(s) of $I_\gamma$ exist, and $\gamma$ could be continued beyond them, which contradicts the maximality of $I_\gamma$. The reader is invited to fill in all the details of this proof using local coordinates.
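The contrast between $\mathbb{R}$ and a compact manifold can be observed numerically. The sketch below is an illustration, not part of the text. It integrates two equations with the classical Runge-Kutta method. The first is the hypothetical tangent field $v(p) = c\times p$, $c = (0,1,1)$, on $S^2$, whose solutions should stay on the sphere for all times. The second is $\dot x = x^2+1$, whose solution with $x(0) = 0$ is $\tan t$ and ceases to exist at $t = \frac\pi2$. Step counts and end times are arbitrary choices.

```python
import math


def rk4(field, x, t_end, steps):
    """Classical fourth-order Runge-Kutta method for x' = field(x) on [0, t_end]."""
    h = t_end / steps
    for _ in range(steps):
        k1 = field(x)
        k2 = field([xi + h / 2 * ki for xi, ki in zip(x, k1)])
        k3 = field([xi + h / 2 * ki for xi, ki in zip(x, k2)])
        k4 = field([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


def sphere_field(p):
    """Tangent field on S^2: v(p) = c x p with c = (0, 1, 1), so (v(p), p) = 0."""
    x, y, z = p
    cx, cy, cz = 0.0, 1.0, 1.0
    return [cy * z - cz * y, cz * x - cx * z, cx * y - cy * x]


def riccati(p):
    """The scalar equation x' = x^2 + 1 from the text, as a field on R."""
    return [p[0] ** 2 + 1.0]


# On the compact manifold S^2 the solution can be continued for all times.
end = rk4(sphere_field, [0.5, 0.5, math.sqrt(2) / 2], t_end=50.0, steps=20000)
print("|x(50)| =", math.sqrt(sum(c * c for c in end)))   # stays (numerically) equal to 1

# On R the solution with x(0) = 0 is tan t, which blows up as t -> pi/2.
for t in (1.0, 1.5, 1.55):
    print(t, rk4(riccati, [0.0], t_end=t, steps=20000)[0], math.tan(t))
```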
The global existence of solutions means that the map $\theta_v:\mathbb{R}\times\mathcal{M}\to\mathcal{M}$,
$$\theta_v(t,a) = \gamma_a(t),$$
where $\gamma_a$ is a solution of (4.3.24) on $\mathbb{R}$ which satisfies $\gamma_a(0) = a$, is a smooth (provided $v$ is smooth) dynamical system on $\mathcal{M}$. Conversely, with any smooth dynamical system $\theta$ on $\mathcal{M}$ we can associate a smooth vector field $v$ on $\mathcal{M}$, namely $v(x) = \frac{\partial\theta}{\partial t}(0,x)$. The reader who is interested in dynamical systems on manifolds can consult, e.g., Chillingworth [24] or Ruelle [114] for brief information, and Palis & De Melo [101] or Katok & Hasselblatt [74] (this is actually much more than an introduction).
We have mentioned on page 165 the role of first integrals for (autonomous) systems of ordinary differential equations. The notion of a first integral is connected with partial differential equations, since $f:\mathcal{M}\to\mathbb{R}$ is a first integral of (4.3.24) if $f$ satisfies (in local coordinates, with $F = f\circ\varphi$) the linear first-order partial differential equation
$$\frac{d}{dt}f(\gamma(t)) = \frac{d}{dt}F(\mu(t)) = \sum_{i=1}^M\frac{\partial F}{\partial y^i}(\mu(t))\,v^i(\gamma(t)) = df(x)(v(x)) = 0,$$
where
$$x = \gamma(t)\quad\text{and}\quad\mu(t) = P\Phi(\gamma(t)).$$
The solutions of (4.3.24) are called the characteristics of this partial differential equation for the unknown function $F$.
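Here is a minimal check of the first-integral condition $df(x)(v(x)) = 0$ under illustrative assumptions. The field $v(x,y,z) = (-y,x,0)$ is the rotation about the $z$-axis, which is tangent to $S^2$. The function $f$ is taken to be the restriction to $S^2$ of a function on $\mathbb{R}^3$, so $df(x)(v(x))$ can be computed as a directional derivative in $\mathbb{R}^3$. The height $f = z$ is a first integral; $f = x$ is not.

```python
import math
import random


def v(p):
    """Rotation field about the z-axis; v(p) is orthogonal to p, hence tangent to S^2."""
    x, y, z = p
    return (-y, x, 0.0)


def df(f, p, w, h=1e-6):
    """df(p)(w) for f defined on R^3: central difference along p + t w."""
    plus = [pc + h * wc for pc, wc in zip(p, w)]
    minus = [pc - h * wc for pc, wc in zip(p, w)]
    return (f(plus) - f(minus)) / (2 * h)


def height(p):
    return p[2]        # f(x, y, z) = z: a first integral of v


def abscissa(p):
    return p[0]        # f(x, y, z) = x: not a first integral


random.seed(0)
points = []
for _ in range(5):
    q = [random.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in q))
    points.append([c / n for c in q])      # random points on S^2

print([df(height, p, v(p)) for p in points])     # all zero
print([df(abscissa, p, v(p)) for p in points])   # approximately -y at each point
```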
We obtain a system of partial differential equations by considering a family of vector fields. Let $v_1,\dots,v_k$ be vector fields on a manifold $\mathcal{M}$ and denote by
$$V(x) = \operatorname{Lin}\{v_1(x),\dots,v_k(x)\}$$
the subspace of $T_x\mathcal{M}$ spanned by $v_1(x),\dots,v_k(x)$. A first integral of the system $v_1,\dots,v_k$ (or of the collection of subspaces $\{V(x)\}_{x\in\mathcal{M}}$) is a function $f:\mathcal{M}\to\mathbb{R}$ for which
$$df(x)(v_i(x)) = 0,\qquad i = 1,\dots,k,\quad x\in\mathcal{M},\qquad(4.3.26)$$
or equivalently, $df$ annihilates all $V(x)$, $x\in\mathcal{M}$, i.e., $L_wf = 0$ for all vector fields $w$ such that $w(x)\in V(x)$. ($L_wf$ is the so-called Lie derivative, see Exercise 4.3.46.) From this formulation it is clear that we can suppose that the vector fields $v_1,\dots,v_k$ are linearly independent at each $x\in\mathcal{M}$. Contrary to the case of one equation, the system (4.3.26) need not have a nonconstant solution.
The following problem is similar to the preceding one: Let $G$ be an open subset of $\mathbb{R}^M$ and let $g = (g^1,\dots,g^M)$ be a smooth mapping from $G\times\mathbb{R}$ into $\mathbb{R}^M$. Since $\mathbb{R}^M$ can also be interpreted as the dual space to $\mathbb{R}^M$, the mapping $g$ determines a system of partial differential equations
$$u'(x) = g(x,u(x)),\qquad x\in G\subset\mathbb{R}^M.\qquad(4.3.27)$$
Expressing the Fréchet derivative (i.e., the differential) $u'(x)$ in terms of partial derivatives we get the system
$$\frac{\partial u}{\partial x_i}(x) = g^i(x,u(x)),\qquad i = 1,\dots,M,\quad x\in G.$$
If this system has a solution $u$, then $u\in C^2(G)$ (since $g$ is supposed to be smooth). The equality of mixed derivatives, $\frac{\partial^2u}{\partial x_i\partial x_j} = \frac{\partial^2u}{\partial x_j\partial x_i}$ for $i,j = 1,\dots,M$, then gives a necessary condition for the existence of a solution:
$$\frac{\partial g^i}{\partial x_j} + \frac{\partial g^i}{\partial u}\,g^j = \frac{\partial g^j}{\partial x_i} + \frac{\partial g^j}{\partial u}\,g^i,\qquad i,j = 1,\dots,M.\qquad(4.3.28)$$
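Condition (4.3.28) is easy to test numerically. The sketch below ($M = 2$, $i = 1$, $j = 2$) compares two hypothetical right-hand sides. For $g = (u,u)$ the function $u = Ce^{x_1+x_2}$ is a solution, and the defect vanishes. For $g = (x_2,-x_1)$ the system $\frac{\partial u}{\partial x_1} = x_2$, $\frac{\partial u}{\partial x_2} = -x_1$ would force $1 = -1$, and the defect equals $2$. Partial derivatives are central differences.

```python
def defect(g, x1, x2, u, h=1e-6):
    """Left side minus right side of (4.3.28) for M = 2, i = 1, j = 2.

    g(x1, x2, u) returns the pair (g^1, g^2); derivatives are central differences.
    """
    def partial(k, var):
        # d g^k / d(var), where var = 0, 1, 2 stands for x1, x2, u
        plus, minus = [x1, x2, u], [x1, x2, u]
        plus[var] += h
        minus[var] -= h
        return (g(*plus)[k] - g(*minus)[k]) / (2 * h)

    g1, g2 = g(x1, x2, u)
    lhs = partial(0, 1) + partial(0, 2) * g2   # dg^1/dx_2 + dg^1/du * g^2
    rhs = partial(1, 0) + partial(1, 2) * g1   # dg^2/dx_1 + dg^2/du * g^1
    return lhs - rhs


def g_good(x1, x2, u):
    return (u, u)        # solvable: u = C exp(x1 + x2)


def g_bad(x1, x2, u):
    return (x2, -x1)     # u_{x1} = x2, u_{x2} = -x1 has no solution


samples = [(0.1, -0.4, 2.0), (1.3, 0.7, -0.5), (-2.0, 0.2, 0.9)]
print([defect(g_good, *s) for s in samples])   # numerically zero
print([defect(g_bad, *s) for s in samples])    # approximately 2 everywhere
```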

A natural question is how to formulate this integrability condition for the system (4.3.26). The system (4.3.26) is said to be completely integrable in $\mathcal{M}$ if for any $x\in\mathcal{M}$ there is a submanifold $\mathcal{N}(x)$ of $\mathcal{M}$ (the integral manifold) containing $x$ such that
$$i_*(T_y\mathcal{N}(x)) = V(y)\quad\text{for all}\quad y\in\mathcal{N}(x)$$
($i$ is the natural embedding of $\mathcal{N}(x)$ into $\mathcal{M}$). Notice that for one vector field $v\neq o$ (i.e., $\dim V(x) = 1$) the integral manifold is the integral curve of the equation $\dot x = v(x)$ and, in general, it contains the integral curves of all equations
$$\dot x = v_i(x),\qquad i = 1,\dots,k.$$
The gluing of all integral curves need not be a manifold. A possible problem is shown in Remark 4.3.43 below.
The basic result on complete integrability is the next theorem (the Frobenius Theorem), which we state without proof. This theorem is an important tool in differential geometry, and the reader can find its proof in textbooks on this subject, e.g., Aubin [11], Sternberg [124, 3.5]. To formulate the theorem in a compact form we need¹⁵ the notion of Lie brackets (they are sometimes, mainly in applications to Hamiltonian mechanics, called Poisson brackets).

¹⁵ We will give another formulation of this theorem at the end of Appendix 4.3B.

If $v$, $w$ are two smooth vector fields with local representations
$$v = \sum_{i=1}^Mv^i\frac{\partial}{\partial y^i},\qquad w = \sum_{j=1}^Mw^j\frac{\partial}{\partial y^j},$$
then $[v,w]$ is the vector field with the local representation
$$[v,w] = \sum_{i=1}^M\Big(\sum_{j=1}^M\Big(v^j\frac{\partial w^i}{\partial y^j} - w^j\frac{\partial v^i}{\partial y^j}\Big)\Big)\frac{\partial}{\partial y^i}.\qquad(4.3.29)$$
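The local formula (4.3.29) can be evaluated directly. This sketch is an illustration on $\mathbb{R}^2$ with the identity chart, using central differences for the partial derivatives. It computes $[\frac{\partial}{\partial x},x\frac{\partial}{\partial y}] = \frac{\partial}{\partial y}$. It also checks that the rotation field $-y\frac{\partial}{\partial x}+x\frac{\partial}{\partial y}$ commutes with the radial field $x\frac{\partial}{\partial x}+y\frac{\partial}{\partial y}$. The field names are hypothetical.

```python
def bracket(v, w, p, h=1e-6):
    """[v, w](p) by formula (4.3.29); partial derivatives by central differences.

    v and w map a point (sequence of M coordinates) to a tuple of M components.
    """
    m = len(p)

    def partial(field, i, j):
        # d field^i / d y_j at p
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        return (field(plus)[i] - field(minus)[i]) / (2 * h)

    vp, wp = v(p), w(p)
    return tuple(
        sum(vp[j] * partial(w, i, j) - wp[j] * partial(v, i, j) for j in range(m))
        for i in range(m)
    )


def d_dx(p):
    return (1.0, 0.0)            # d/dx


def x_d_dy(p):
    return (0.0, p[0])           # x d/dy


def rotation(p):
    return (-p[1], p[0])         # -y d/dx + x d/dy


def radial(p):
    return (p[0], p[1])          # x d/dx + y d/dy


point = (0.8, -1.1)
print(bracket(d_dx, x_d_dy, point))      # approximately (0, 1), i.e. d/dy
print(bracket(rotation, radial, point))  # approximately (0, 0): the fields commute
```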

For another interpretation of this operation see Remark 4.3.43 below or Exercise 4.3.47.
Theorem 4.3.42 (Frobenius). Let $v_1,\dots,v_k$ be smooth vector fields on a manifold $\mathcal{M}$. Then this system is completely integrable if and only if
$$[v_i,v_j](x)\in V(x)\quad\text{for all}\quad x\in\mathcal{M},\quad i,j = 1,\dots,k.$$
Remark 4.3.43. Suppose that two smooth vector fields $v$, $w$ are given on a (compact) manifold $\mathcal{M}$ and let $\theta_v$, $\theta_w$ be the corresponding dynamical systems on $\mathcal{M}$. There is no reason to expect that these systems commute, i.e., that
$$\theta_v(t,\theta_w(s,x)) = \theta_w(s,\theta_v(t,x)),$$
and it is not difficult to construct a counterexample which confirms this (see Figure 4.3.17).

[Figure 4.3.17. Starting from $x$: $y_1 = \theta_v(t,x)$ followed by $\theta_w(s,y_1)$, versus $y_2 = \theta_w(s,x)$ followed by $\theta_v(t,y_2)$; the two end points need not coincide.]

It can be shown that a necessary and sufficient condition for commutativity is that
$$[v,w] = o$$
(see (4.3.29)).
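Here is a concrete counterexample in the spirit of Remark 4.3.43. It lives on $\mathbb{R}^2$ rather than a compact manifold and is chosen here for illustration. Take $v = \frac{\partial}{\partial x}$ and $w = x\frac{\partial}{\partial y}$. Their flows are explicit: $\theta_v(t,(x,y)) = (x+t,y)$, and $\theta_w(s,(x,y)) = (x,y+sx)$ because $x$ is constant along $w$. The two compositions differ by $(0,st)$, i.e., by $st$ times $[v,w] = \frac{\partial}{\partial y}$.

```python
def flow_v(t, p):
    """Flow of v = d/dx: translation in x."""
    x, y = p
    return (x + t, y)


def flow_w(s, p):
    """Flow of w = x d/dy: x stays constant, so y grows linearly with rate x."""
    x, y = p
    return (x, y + s * x)


p, t, s = (0.3, -0.2), 0.5, 0.4
one_way = flow_v(t, flow_w(s, p))     # first w for time s, then v for time t
other_way = flow_w(s, flow_v(t, p))   # first v for time t, then w for time s
gap = tuple(b - a for a, b in zip(one_way, other_way))
print(one_way, other_way, gap)  # gap[0] is 0 and gap[1] equals s*t up to rounding
```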
Exercise 4.3.44. We say that a function $f:\mathcal{M}\to\mathbb{R}$ is an integral of equation (4.3.24) if $f(\gamma(\cdot))$ is constant for any solution $\gamma$ of (4.3.24). If this is true only locally, $f$ is called a local integral. Suppose that $\dim\mathcal{M} = 2$, $(\mathcal{V},\Phi)$ is a local chart of $\mathcal{M}$, $f$ is an integral of (4.3.24) and $df(x)\neq0$ for all $x\in\mathcal{V}$. Prove the following assertions.
(i) There is no stationary point of (4.3.24) in $\mathcal{V}$ ($a\in\mathcal{V}$ is a stationary point of (4.3.24) if $\gamma(t) = a$, $t\in\mathbb{R}$, is a solution of (4.3.24)).

(ii) Take a function $g:U\to\mathbb{R}$, where $\varphi(U) = \mathcal{V}$, such that
$$z = \chi(y) = (F(y),g(y)),\quad\text{where}\quad F = f\circ\varphi,$$
is a diffeomorphism of $U$ onto $\tilde U$ (why does such a function exist?). Then there exists $h:\tilde U\to\mathbb{R}$ such that the vector field $v$ has the form
$$v(x) = 0\cdot\frac{\partial}{\partial z_1} + h(z)\frac{\partial}{\partial z_2}$$
in these new local coordinates $z = (z_1,z_2)$ (use the transformation rule (4.3.17) and the fact that $f$ is an integral of (4.3.24)). Notice that $h(z)\neq0$ for all $z\in\tilde U$.
(iii) Put
$$H(z_1,z_2) = \int_{z_2^0}^{z_2}\frac{d\tau}{h(z_1,\tau)}$$
(what is the relation of $H$ to a solution of (4.3.24) in the $z$-coordinates?). Consider another transformation of coordinates
$$\xi = (z_1,H(z_1,z_2)).$$
Then the vector field $v$ has the form
$$v(x) = 0\cdot\frac{\partial}{\partial\xi_1} + 1\cdot\frac{\partial}{\partial\xi_2}$$
in the local coordinates $\xi = (\xi_1,\xi_2)$. Cf. Exercise 4.3.26.

Can you formulate the result in terms of ordinary differential equations? Can you generalize this result to higher-dimensional manifolds?
Exercise 4.3.45. Let $v_1,\dots,v_k$ be smooth vector fields on a differentiable manifold $\mathcal{M}$ which are linearly independent on a neighborhood $U$ of $a\in\mathcal{M}$. Assume that
$$[v_i,v_j] = o,\qquad i,j = 1,\dots,k,\quad\text{on}\quad U.$$
Prove that there exist local coordinates $(y_1,\dots,y_M)$ such that
$$v_i = \frac{\partial}{\partial y_i},\qquad i = 1,\dots,k,$$
in a neighborhood of $a$.
Hint. Cf. Exercises 4.3.44 and 4.3.26.
Exercise 4.3.46. Let $\mathcal{M}$ be a differentiable manifold of the class $C^\infty$ and let $X$ be the set of all $C^\infty$ functions on $\mathcal{M}$. Let $D:X\to X$ satisfy
(D1) $D$ is linear,
(D2) $D(fg) = g\,Df + f\,Dg$ for all $f,g\in X$ (pointwise multiplication).
Show that there is a vector field $v$ on $\mathcal{M}$ such that
$$L_vf := df(x)(v(x)) = Df(x),\qquad x\in\mathcal{M},\quad f\in X.$$


Here $L_vf$ is the directional derivative (in the direction of the vector field $v$), which is also called the Lie derivative (cf. page 192).¹⁶
Hint. Put
$$v = \sum_{i=1}^Ma^i\frac{\partial}{\partial y^i}\quad\text{where}\quad a^i = Dy^i.$$
Show that
$$Df - L_vf = 0$$
holds for polynomials of degree at most 1 on $U$. Then use the Taylor polynomial. It remains to extend this result from local charts to the whole manifold; use a partition of unity, see Definition 4.3.74 and Theorem 4.3.76.
The converse statement, i.e., the fact that $L_v$ satisfies (D1), (D2) for smooth $v$, is easy to prove. (Do that.) Is there any difference between the differential and the Lie derivative?
Exercise 4.3.47. Let $v$, $w$ be two smooth vector fields on a differentiable manifold. Define the vector field $[v,w]$, the so-called commutator (or Lie bracket) of $v$, $w$, by the formula
$$L_{[v,w]}f = L_v(L_wf) - L_w(L_vf)\quad\text{for every}\quad f\in X$$
(see (4.3.29) and Exercise 4.3.46). Show that this definition is correct, i.e., $[v,w]$ is a vector field, and show that the Jacobi identity
$$[u,[v,w]] + [v,[w,u]] + [w,[u,v]] = o$$
holds for any smooth vector fields $u$, $v$, $w$.¹⁷

4.3B Differential Forms


Before starting with the notion of a differential form we need to summarize some basic facts from multilinear algebra. Let $X$ be a (real) linear space. A bilinear form is a map $A:X\times X\to\mathbb{R}$ which is linear in both variables. A typical example of a bilinear form is the scalar product in a real Hilbert space.¹⁸ A $p$-linear form $A:\underbrace{X\times\dots\times X}_{p\ \text{times}}\to\mathbb{R}$ is defined in a similar way.


¹⁶ Sophus Lie was one of the promoters of geometric methods in analysis. A topological group (i.e., a group with a topological structure such that the group operations are continuous) with the structure of a differentiable manifold (e.g., $S^1$, $S^3$, groups of regular or orthogonal matrices) is called a Lie group.
¹⁷ Let $A$ be a set with two binary operations $+$, $\circ$ such that
(A1) $A$ with the operations $+$, $\circ$ is a ring,
(A2) $a\circ a = o$ for all $a\in A$,
(A3) $(a\circ b)\circ c + (b\circ c)\circ a + (c\circ a)\circ b = o$ for all $a,b,c\in A$.
Then $A$ is said to be a Lie ring. If $A$ is, moreover, a linear space, then $A$ is called a Lie algebra. (If $A$ is an associative ring and $[a,b] = ab - ba$, then $(A,+,[\cdot,\cdot])$ is a Lie ring.) For more information see, e.g., Adams [1], Bourbaki [15], Bröcker & Dieck [17], Helgason [66].
¹⁸ This is not true for a complex Hilbert space since $(x,\lambda y) = \bar\lambda(x,y)$ for $\lambda\in\mathbb{C}$.

Definition 4.3.48. A $p$-linear form $A$ is said to be skew-symmetric if
$$A(x_{\pi(1)},\dots,x_{\pi(p)}) = \operatorname{sgn}\pi\;A(x_1,\dots,x_p)$$
holds for any permutation $\pi$ of the set $\{1,\dots,p\}$ and all $x_1,\dots,x_p\in X$. Here $\operatorname{sgn}\pi = 1$ if the number of inversions in $\pi$ (an inversion is a pair $i<j$ with $\pi(i)>\pi(j)$) is even, and $\operatorname{sgn}\pi = -1$ if this number is odd. The collection of all skew-symmetric $p$-linear forms is denoted by $\Lambda^p(X)$.
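The sign in Definition 4.3.48 can be computed by counting inversions. The sketch below writes permutations as 0-based tuples, so `(1, 0, 2)` is the transposition of the first two elements. It checks skew-symmetry of the determinant form on $\mathbb{R}^3$ by brute force over all permutations. The function names are illustrative.

```python
from itertools import permutations


def sgn(pi):
    """Sign of a permutation given as a 0-based tuple, via the parity of its inversions."""
    inversions = sum(1 for i in range(len(pi)) for j in range(i + 1, len(pi)) if pi[i] > pi[j])
    return 1 if inversions % 2 == 0 else -1


def is_skew_symmetric(A, xs):
    """Check A(x_pi(1), ..., x_pi(p)) = sgn(pi) A(x_1, ..., x_p) for every permutation pi."""
    base = A(*xs)
    return all(abs(A(*[xs[k] for k in pi]) - sgn(pi) * base) < 1e-12
               for pi in permutations(range(len(xs))))


def det3(x1, x2, x3):
    """The determinant form on R^3, a skew-symmetric 3-linear form."""
    return (x1[0] * (x2[1] * x3[2] - x2[2] * x3[1])
            - x1[1] * (x2[0] * x3[2] - x2[2] * x3[0])
            + x1[2] * (x2[0] * x3[1] - x2[1] * x3[0]))


xs = [(1.0, 2.0, 0.5), (0.0, -1.0, 3.0), (2.0, 1.0, 1.0)]
print(sgn((1, 0, 2)), sgn((1, 2, 0)), is_skew_symmetric(det3, xs))  # -1 1 True
```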
Remark 4.3.49.
(i) Let $e_1,\dots,e_M$ be a basis of $X$ and $f^1,\dots,f^M$ its dual basis, i.e., a basis of the space $X^*$ of all linear forms on $X$ for which
$$f^i(e_j) = \delta_j^i = \begin{cases}1, & i = j,\\ 0, & i\neq j.\end{cases}$$
Then any element $x\in X$ can be expressed in the form
$$x = \sum_{i=1}^Mf^i(x)\,e_i.$$
For $A\in\Lambda^p(X)$, $p\le M = \dim X$, we have
$$\begin{aligned}A(x_1,\dots,x_p) &= \sum_{i_1,\dots,i_p=1}^Mf^{i_1}(x_1)\cdots f^{i_p}(x_p)\,A(e_{i_1},\dots,e_{i_p})\\ &= \sum_{1\le i_1<\dots<i_p\le M}\Big(\sum_{\pi}\operatorname{sgn}\pi\;f^{i_{\pi(1)}}(x_1)\cdots f^{i_{\pi(p)}}(x_p)\Big)A(e_{i_1},\dots,e_{i_p})\\ &= \sum_{1\le i_1<\dots<i_p\le M}\det\big(f^{i_j}(x_k)\big)_{j,k=1,\dots,p}\,A(e_{i_1},\dots,e_{i_p}),\end{aligned}$$
where $\pi$ runs over all permutations of $\{1,\dots,p\}$. In particular, if $p = M$, then
$$A(x_1,\dots,x_M) = \det\big(f^i(x_k)\big)_{i,k=1,\dots,M}\,A(e_1,\dots,e_M),\qquad(4.3.30)$$
i.e., $\dim\Lambda^M(X) = 1$. Notice also that $\Lambda^p(X) = \{o\}$ for $p>M$.
(ii) Elements $x_1,\dots,x_p$ of $X$ are linearly dependent if and only if
$$A(x_1,\dots,x_p) = 0\quad\text{for all}\quad A\in\Lambda^p(X).$$
This follows easily from the formula given above.


The product operation can be defined in the family of skew-symmetric forms.

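Definition 4.3.50. Let $A\in\Lambda^p(X)$, $B\in\Lambda^q(X)$ be skew-symmetric forms. Then their exterior product $A\wedge B$ is the skew-symmetric $(p+q)$-linear form defined by the formula
$$A\wedge B(x_1,\dots,x_{p+q}) = \sum_{\substack{\pi(1)<\dots<\pi(p)\\ \pi(p+1)<\dots<\pi(p+q)}}\operatorname{sgn}\pi\;A(x_{\pi(1)},\dots,x_{\pi(p)})\,B(x_{\pi(p+1)},\dots,x_{\pi(p+q)}),$$
where the sum is taken over all permutations $\pi$ of $\{1,\dots,p+q\}$ satisfying the indicated inequalities.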

Remark 4.3.51.
(i) The exterior product of three or more skew-symmetric forms is defined by induction, and the associative law holds, i.e.,
$$A\wedge B\wedge C := (A\wedge B)\wedge C = A\wedge(B\wedge C).$$
(ii) The exterior product is not commutative. Namely,
$$B\wedge A = (-1)^{pq}A\wedge B\quad\text{for}\quad A\in\Lambda^p(X),\ B\in\Lambda^q(X).$$

Example 4.3.52.
(i) If $A$, $B$ are one-forms (i.e., linear forms), then
$$A\wedge B(x_1,x_2) = A(x_1)B(x_2) - A(x_2)B(x_1).$$
More generally, by induction,
$$A_1\wedge\dots\wedge A_n(x_1,\dots,x_n) = \det\big(A_i(x_j)\big)_{i,j=1,\dots,n}\quad\text{for one-forms}\quad A_1,\dots,A_n.$$
(ii) If $e_1,\dots,e_M$ and $f^1,\dots,f^M$ are mutually dual bases of $X$ and $X^*$, respectively, then
$$A = A(e_1,\dots,e_M)\,f^1\wedge\dots\wedge f^M\quad\text{for any}\quad A\in\Lambda^M(X).$$
In other words, $f^1\wedge\dots\wedge f^M$ generates $\Lambda^M(X)$ ($\dim X = M$). More generally, the products
$$(f^{i_1}\wedge\dots\wedge f^{i_n})_{1\le i_1<\dots<i_n\le M}$$
form a basis of $\Lambda^n(X)$, i.e., for any $A\in\Lambda^n(X)$ there are scalars $a_{\dots}$ such that
$$A = \sum_{1\le i_1<\dots<i_n\le M=\dim X}a_{i_1,\dots,i_n}\,f^{i_1}\wedge\dots\wedge f^{i_n},\qquad(4.3.31)$$
where $a_{i_1,\dots,i_n} = A(e_{i_1},\dots,e_{i_n})$ (see Remark 4.3.49(i)).
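Definition 4.3.50 can be implemented literally by summing over the permutations that are increasing on the first $p$ and on the last $q$ places (the so-called shuffles). This brute-force sketch uses illustrative names: forms are plain Python functions and vectors are tuples in $\mathbb{R}^3$. It checks Example 4.3.52(i), i.e., that $f^1\wedge f^2\wedge f^3(x_1,x_2,x_3)$ is the determinant, and the sign rule of Remark 4.3.51(ii).

```python
from itertools import permutations


def sgn(pi):
    """Sign of a permutation (0-based tuple) via the parity of its inversions."""
    inv = sum(1 for i in range(len(pi)) for j in range(i + 1, len(pi)) if pi[i] > pi[j])
    return -1 if inv % 2 else 1


def wedge(A, p, B, q):
    """Exterior product of a p-form A and a q-form B, following Definition 4.3.50.

    The sum runs over permutations pi of {0, ..., p+q-1} that are increasing
    on the first p and on the last q positions.
    """
    def product(*xs):
        total = 0.0
        for pi in permutations(range(p + q)):
            head, tail = pi[:p], pi[p:]
            if list(head) == sorted(head) and list(tail) == sorted(tail):
                total += sgn(pi) * A(*[xs[k] for k in head]) * B(*[xs[k] for k in tail])
        return total
    return product


def coordinate_form(i):
    """The one-form f^i(x) = x_i (dual basis of the standard basis of R^3)."""
    return lambda x: x[i]


def det3(x1, x2, x3):
    return (x1[0] * (x2[1] * x3[2] - x2[2] * x3[1])
            - x1[1] * (x2[0] * x3[2] - x2[2] * x3[0])
            + x1[2] * (x2[0] * x3[1] - x2[1] * x3[0]))


f1, f2, f3 = (coordinate_form(i) for i in range(3))
f12 = wedge(f1, 1, f2, 1)       # 2-form f^1 ^ f^2
f123 = wedge(f12, 2, f3, 1)     # 3-form (f^1 ^ f^2) ^ f^3
xs = [(1.0, 2.0, 0.5), (0.0, -1.0, 3.0), (2.0, 1.0, 1.0)]
print(f123(*xs), det3(*xs))     # 9.0 9.0
```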

The main goal of this appendix is to investigate skew-symmetric forms on manifolds which are continuous (smooth) with respect to the topological (differential) structure of the manifold. The basic definition is the following:
Definition 4.3.53. Let $\mathcal{M}$ be a differentiable manifold of dimension $M$. A mapping
$$\omega: x\in\mathcal{M}\mapsto\omega(x)\in\Lambda^p(T_x\mathcal{M})$$
is said to be a $p$-differential form on $\mathcal{M}$ if $\omega$ is continuous (or smooth) in the following sense:
Let $(\mathcal{V},\Phi)$ be a local chart of $\mathcal{M}$ and let
$$\omega(x) = \sum_{1\le i_1<\dots<i_p\le M}a_{i_1,\dots,i_p}(x)\,dy^{i_1}\wedge\dots\wedge dy^{i_p}\qquad(4.3.32)$$
be the representation of $\omega$ in this chart (see (4.3.31)). Then all functions $a_{i_1,\dots,i_p}$ are continuous (or smooth) in $\mathcal{V}$.
Remark 4.3.54.
(i) A smooth function $f:\mathcal{M}\to\mathbb{R}$ is sometimes called a differential form of order 0. Its differential $df$ is the one-form with the local representation
$$df(x) = \sum_{i=1}^M\frac{\partial(f\circ\varphi)}{\partial y^i}(y)\,dy^i,\qquad\varphi(y) = x$$
(see (4.3.20)).
(ii) Let $\omega$ be a $p$-differential form in $\mathbb{R}^N$ with the representation
$$\omega(x) = \sum_{1\le i_1<\dots<i_p\le N}a_{i_1,\dots,i_p}(x)\,df^{i_1}\wedge\dots\wedge df^{i_p}$$
($f^1,\dots,f^N$ is the dual basis to the standard basis $e_1,\dots,e_N$ in $\mathbb{R}^N$). In accordance with the notation of coordinates in $\mathbb{R}^N$ we can also write
$$\omega(x) = \sum_{1\le i_1<\dots<i_p\le N}a_{i_1,\dots,i_p}(x)\,dx^{i_1}\wedge\dots\wedge dx^{i_p}.$$
If $\mathcal{M}$ is a differentiable manifold in $\mathbb{R}^N$ of dimension $M\ge p$, then $\omega$ can be restricted to $\mathcal{M}$. Since $T_x\mathcal{M}\subset\mathbb{R}^N$ (see (4.3.16)), we have
$$\omega(x)\Big(\frac{\partial}{\partial y^{j_1}},\dots,\frac{\partial}{\partial y^{j_p}}\Big) = \sum_{1\le i_1<\dots<i_p\le N}a_{i_1,\dots,i_p}(x)\det\Big(f^{i_k}\Big(\frac{\partial}{\partial y^{j_l}}\Big)\Big)_{k,l=1,\dots,p}.$$
(iii) If $(\mathcal{V},\Phi)$ and $(\tilde{\mathcal{V}},\tilde\Phi)$ are local charts at the same point, then we have two representations of a $p$-differential form $\omega$ in $\mathcal{V}\cap\tilde{\mathcal{V}}$:
$$\omega(x) = \sum_{1\le i_1<\dots<i_p\le M}f_{i_1,\dots,i_p}\,dy^{i_1}\wedge\dots\wedge dy^{i_p},\qquad\omega(x) = \sum_{1\le j_1<\dots<j_p\le M}g_{j_1,\dots,j_p}\,dz^{j_1}\wedge\dots\wedge dz^{j_p}.$$
(Here $dy^1,\dots,dy^M$ is the basis of $(T_x\mathcal{M})^*$ with respect to the local chart $(\mathcal{V},\Phi)$, and similarly $dz^1,\dots,dz^M$ for $(\tilde{\mathcal{V}},\tilde\Phi)$.) The transformation rule (4.3.22) yields a relation between the coefficients $f_{\dots}$ and $g_{\dots}$. This relation is simple for $M$-forms ($M = \dim\mathcal{M}$), namely
$$\omega(x) = g(x)\,dz^1\wedge\dots\wedge dz^M = g(x)\det\Big(\frac{\partial\kappa^j}{\partial y^i}(P\Phi(x))\Big)\,dy^1\wedge\dots\wedge dy^M,$$
where $\big(\frac{\partial\kappa^j}{\partial y^i}(y)\big)_{i,j=1,\dots,M}$ is the Jacobi matrix of the transformation $z = \kappa(y)$ of local coordinates (see Figure 4.3.14). The determinant of the Jacobi matrix will be called the Jacobian and denoted by $J_\kappa$ (Example 4.1.5). This transformation rule can be generalized to mappings between manifolds in a way similar to (4.3.19) and (4.3.23). If $g:\mathcal{M}\to\tilde{\mathcal{M}}$ is a smooth map and $\omega$ is a $p$-differential form on $\tilde{\mathcal{M}}$, then the formula
$$g^*\omega(x)(v_1,\dots,v_p) = \omega(g(x))(g_*v_1,\dots,g_*v_p),\qquad(4.3.33)$$
$x\in\mathcal{M}$, $v_1,\dots,v_p\in T_x\mathcal{M}$, where $g_*v_i$ is the push-forward of the tangent vector $v_i$ (see (4.3.19)), defines the pull-back $g^*\omega$ of $\omega$. To obtain a local representation of $g^*\omega$ of the type (4.3.32) we choose local coordinates at $x$, put $v_k = \frac{\partial}{\partial y^{j_k}}$ and use the transformation rule (4.3.19). However, the final formula is rather cumbersome and we will not need it, with the exception of the case when $\dim\mathcal{M} = \dim\tilde{\mathcal{M}} = M$ and $\omega$ is an $M$-form,
$$\omega(z) = f(z)\,dz^1\wedge\dots\wedge dz^M.$$
Then
$$g^*\omega(x) = f(g(x))\,J_G(P\Phi(x))\,dy^1\wedge\dots\wedge dy^M,$$
where $G$ is the local realization of $g$ (see Figure 4.3.15). An important special case is that of a coordinate mapping $\varphi:U\subset\mathbb{R}^M\to\mathcal{V}\subset\mathcal{M}$. The next example shows how to compute the pull-back of $(M-1)$-forms for small $M$. These formulae are often used in vector calculus; see also the special cases of integration in Appendix 4.3C.
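Example 4.3.55.
(i) Let
$$\omega(x,y) = f(x,y)\,dx + g(x,y)\,dy$$
be a 1-form in $\mathbb R^2$ and $\gamma = (\gamma_1,\gamma_2):(a,b)\to\mathbb R^2$ a smooth curve. Then
$$\gamma^*\omega(t) = \omega(\gamma(t))\Big(\gamma_*\frac{\partial}{\partial t}\Big)\,dt = f(\gamma_1(t),\gamma_2(t))\,\gamma_1'(t)\,dt + g(\gamma_1(t),\gamma_2(t))\,\gamma_2'(t)\,dt.$$
(ii) Let
$$\omega(x,y,z) = f(x,y,z)\,dy\wedge dz + g(x,y,z)\,dz\wedge dx + h(x,y,z)\,dx\wedge dy$$
be a 2-form in $\mathbb R^3$ and $\varphi:(u,v)\in U\subset\mathbb R^2\mapsto\varphi(u,v)\in\mathbb R^3$ a smooth parametrization of a surface $S$ in $\mathbb R^3$. Then
$$[\varphi^*\omega(u,v)](e_1,e_2) = \omega(\varphi(u,v))\big(\underbrace{\varphi'(u,v)e_1}_{\partial\varphi/\partial u},\ \underbrace{\varphi'(u,v)e_2}_{\partial\varphi/\partial v}\big)$$
and, if
$$\frac{\partial\varphi}{\partial u} = u_1\frac{\partial}{\partial x} + u_2\frac{\partial}{\partial y} + u_3\frac{\partial}{\partial z},\qquad \frac{\partial\varphi}{\partial v} = v_1\frac{\partial}{\partial x} + v_2\frac{\partial}{\partial y} + v_3\frac{\partial}{\partial z}$$
(here $\frac{\partial}{\partial x}$ is actually the first vector of the standard basis in $\mathbb R^3$; see Remark 4.3.54(ii)), then
$$dy\wedge dz\Big(\frac{\partial\varphi}{\partial u},\frac{\partial\varphi}{\partial v}\Big) = u_2v_3 - u_3v_2,\quad\text{etc.},$$
and eventually
$$\omega(\varphi(u,v))\Big(\frac{\partial\varphi}{\partial u},\frac{\partial\varphi}{\partial v}\Big) = \Big((f,g,h)(\varphi(u,v)),\ \frac{\partial\varphi}{\partial u}\times\frac{\partial\varphi}{\partial v}\Big)_{\mathbb R^3},$$
where the brackets $(\cdot,\cdot)_{\mathbb R^3}$ denote the scalar product in $\mathbb R^3$ and $u\times v$ is the so-called cross (or vector) product of vectors $u = (u_1,u_2,u_3)$, $v = (v_1,v_2,v_3)$ in $\mathbb R^3$, i.e.,
$$u\times v = (u_2v_3 - u_3v_2,\ u_3v_1 - u_1v_3,\ u_1v_2 - u_2v_1).$$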
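The identity in (ii) can be checked numerically. The following Python sketch is our own illustration, not part of the original text: the helper names (`cross`, `partial`, `pullback_coefficient`, `sphere`, `radial`) are hypothetical, and the partial derivatives of $\varphi$ are approximated by central differences. For the field $(f,g,h) = (x,y,z)$ and the parametrization $\varphi(u,v) = (\cos u\cos v,\ \sin u\cos v,\ \sin v)$ of the unit sphere, a hand computation gives $\frac{\partial\varphi}{\partial u}\times\frac{\partial\varphi}{\partial v} = \cos v\,\varphi(u,v)$, so the coefficient $c$ in $\varphi^*\omega = c(u,v)\,du\wedge dv$ should equal $\cos v$.

```python
import math


def cross(a, b):
    """Cross product a x b in R^3, as in Example 4.3.55(ii)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Euclidean scalar product in R^3."""
    return sum(x * y for x, y in zip(a, b))


def partial(phi, u, v, which, h=1e-6):
    """Central-difference approximation of d(phi)/du ('u') or d(phi)/dv ('v')."""
    if which == "u":
        plus, minus = phi(u + h, v), phi(u - h, v)
    else:
        plus, minus = phi(u, v + h), phi(u, v - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))


def pullback_coefficient(field, phi, u, v):
    """c(u, v) with phi^*omega = c(u, v) du ^ dv, where
    omega = f dy^dz + g dz^dx + h dx^dy and field = (f, g, h)."""
    phi_u = partial(phi, u, v, "u")
    phi_v = partial(phi, u, v, "v")
    return dot(field(*phi(u, v)), cross(phi_u, phi_v))


def sphere(u, v):
    """Unit sphere; u is the longitude, v the latitude."""
    return (math.cos(u) * math.cos(v), math.sin(u) * math.cos(v), math.sin(v))


def radial(x, y, z):
    """The field (f, g, h) = (x, y, z)."""
    return (x, y, z)


# The two printed numbers should agree up to the finite-difference error.
print(pullback_coefficient(radial, sphere, 1.0, 0.3), math.cos(0.3))
```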

Remark 4.3.56. The reader can ask why it is necessary (or reasonable) to introduce differential forms even though vectors and vector fields have already been defined. Actually, there is only a technical difference for one-forms, since $T_a\mathcal M$ is isomorphic to its dual $(T_a\mathcal M)^*$. For example, $df(a)\in(T_a\mathcal M)^*$ and therefore it can be represented by a scalar product in $T_a\mathcal M$. Since $T_a\mathcal M$ is a linear subspace of $\mathbb R^N$ (for $\mathcal M\subset\mathbb R^N$), we may define the scalar product in $T_a\mathcal M$ as
$$(v,w)_{T_a\mathcal M} := (v,w)_{\mathbb R^N}\qquad\text{for } v,w\in T_a\mathcal M.^{19}$$
In particular, this means that there is a vector $\nabla f(a)$ (the so-called gradient of $f$) such that
$$df(a)(v) = (v,\nabla f(a))_{T_a\mathcal M}.$$
If
$$df(a) = \sum_{i=1}^M f_i(a)\,dy^i,$$
then
$$f_i(a) = \Big(\frac{\partial}{\partial y_i},\nabla f(a)\Big)_{T_a\mathcal M}.^{20}$$
The reason for distinguishing between differential forms and vector fields lies in the richer structure of the collection of all differential forms: there are operations like the exterior product and the exterior differential (Definition 4.3.57). Moreover, the differential forms
$$\omega_1 = f\,dx + g\,dy + h\,dz\qquad\text{and}\qquad \omega_2 = f\,dy\wedge dz + g\,dz\wedge dx + h\,dx\wedge dy$$
can be attached to the vector field
$$F = f\frac{\partial}{\partial x} + g\frac{\partial}{\partial y} + h\frac{\partial}{\partial z}.$$
We will see in Appendix 4.3C that the integral of $\omega_1$ along a curve $\gamma$ can be interpreted as the work done by the force field $F$ along $\gamma$, and the integral of $\omega_2$ over a surface $S$ has the meaning of the rate at which a fluid flow represented by the velocity field $F$ crosses $S$. Another reason consists in a simplification of various notions and results of classical vector analysis and differential geometry. Examples like orientation, elementary volume and the Stokes Theorem will be shown in Appendix 4.3C.
$^{19}$In this connection see footnote 27 on page 214.
$^{20}$We warn that the vectors $\frac{\partial}{\partial y_1},\dots,\frac{\partial}{\partial y_M}$ need not be orthogonal in $T_a\mathcal M$!

Definition 4.3.57. Let $\mathcal M$ be a differentiable manifold of dimension $M$ and let $\omega$ be a smooth $p$-differential form on $\mathcal M$ which has the local representation (4.3.32). Then the differential of $\omega$ is the $(p+1)$-differential form $d\omega$ with the local representation
$$d\omega(x) = \sum_{1\le i_1<\dots<i_p\le M} da_{i_1,\dots,i_p}(x)\wedge dy^{i_1}\wedge\dots\wedge dy^{i_p}. \tag{4.3.34}$$

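Example 4.3.58.
(i) If $f:\mathcal M\to\mathbb R$ is a differentiable function, i.e., a 0-form, then the differential $df$ given by Remark 4.3.54(i) is the same as that in (4.3.34).
(ii) Let
$$\omega(x) = f_1(x)\,dx^1 + f_2(x)\,dx^2 + f_3(x)\,dx^3$$
be a 1-form on an open set $G$ in $\mathbb R^3$ and $(x_1,x_2,x_3)$ the Cartesian coordinates of a point $x$. If $f_1,f_2,f_3$ are smooth functions on $G$, then
$$\begin{aligned} d\omega(x) &= df_1(x)\wedge dx^1 + df_2(x)\wedge dx^2 + df_3(x)\wedge dx^3\\ &= \underbrace{\frac{\partial f_1}{\partial x_1}\,dx^1\wedge dx^1}_{=0} + \frac{\partial f_1}{\partial x_2}\,dx^2\wedge dx^1 + \frac{\partial f_1}{\partial x_3}\,dx^3\wedge dx^1 + \dots\\ &= \Big(\frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2}\Big)dx^1\wedge dx^2 + \Big(\frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3}\Big)dx^2\wedge dx^3 + \Big(\frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1}\Big)dx^3\wedge dx^1. \end{aligned}$$
If we interpret the components $(f_1,f_2,f_3)$ of the form $\omega$ as those of a vector field $v$ (Remark 4.3.56), then the components of $d\omega$, more precisely
$$\Big(\frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3},\ \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1},\ \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2}\Big),$$
form the so-called curl of $v$ (notation $\operatorname{curl} v$ or $\nabla\times v$; for the cross product see Example 4.3.55(ii)).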
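A quick numerical sketch of the last formula evaluates the coefficients of $d\omega$ at a point. It is our own illustration under stated assumptions: derivatives are replaced by central differences with a fixed step `h`, and the name `curl` is hypothetical. For the rotation field $(-x_2, x_1, 0)$ the result should be $(0,0,2)$, while for the gradient field $(f_1,f_2,f_3) = \nabla(x_1x_2x_3)$ it should vanish, in accordance with $d(df) = 0$ (Proposition 4.3.60(i)).

```python
def curl(f, x, h=1e-5):
    """Approximate the coefficients of d(omega) for omega = f1 dx1 + f2 dx2 + f3 dx3,
    i.e. curl f = (d2 f3 - d3 f2, d3 f1 - d1 f3, d1 f2 - d2 f1), at the point x."""
    def d(i, j):
        # central difference of the component f_(i+1) with respect to x_(j+1)
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        return (f(xp)[i] - f(xm)[i]) / (2 * h)

    return (d(2, 1) - d(1, 2),
            d(0, 2) - d(2, 0),
            d(1, 0) - d(0, 1))


rotation = lambda p: (-p[1], p[0], 0.0)                       # curl should be (0, 0, 2)
gradient = lambda p: (p[1] * p[2], p[0] * p[2], p[0] * p[1])  # grad(x1 x2 x3), curl 0

print(curl(rotation, (0.3, -0.2, 1.1)))
print(curl(gradient, (0.5, 1.5, -0.7)))
```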
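Remark 4.3.59. By computing the differentials $da_{i_1\dots i_p}$ as in the previous example and rearranging the sum in (4.3.34) to get rid of the zero terms $dy^{i_1}\wedge\dots\wedge dy^{i_{p+1}}$ in which two indices coincide, we obtain, e.g., for an $(M-1)$-form
$$\omega(x) = \sum_{i=1}^M f_i(x)\,dy^1\wedge\dots\wedge\widehat{dy^i}\wedge\dots\wedge dy^M$$
(here $\widehat{dy^i}$ means that $dy^i$ is missing),
$$d\omega(x) = \sum_{i=1}^M (-1)^{i-1}\,\frac{\partial(f_i\circ\psi)}{\partial y_i}(\varphi(x))\,dy^1\wedge\dots\wedge dy^M$$
(the sign arises from moving $dy^i$ from the front to its $i$-th position, which takes $i-1$ transpositions). Here $\varphi$, $\psi$ are given by a local chart at $x\in\mathcal M$.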

Example 4.3.58(i) leads to the following question: Are all one-differential forms differentials of smooth functions? In other words, does every (continuous) one-form $\omega$ have a primitive function $f$, i.e., is there $f$ such that $df = \omega$? A short speculation on one-forms in $\mathbb R^2$ suggests obstacles caused by mixed partial derivatives. We investigate this problem in a more general way.
Proposition 4.3.60.
(i) Let $\omega$ be a differential form of the class $C^2$. Then
$$d^2\omega := d(d\omega) = 0.$$
(ii) Let $\eta$ and $\omega$ be a $p$-differential form and a $q$-differential form, respectively. Then
$$d(\omega\wedge\eta) = (d\omega\wedge\eta) + (-1)^q(\omega\wedge d\eta).$$
Proof. An easy proof is left to the reader. Notice, however, that the interchangeability of mixed partial derivatives of $C^2$-functions is the crucial point in statement (i).
Definition 4.3.61. A differential form $\omega$ is said to be
(1) closed if $d\omega = 0$,
(2) exact if there is a differential form $\eta$ such that $\omega = d\eta$.
Remark 4.3.62. The concept of exact differential forms is a generalization of the classical notion of the potential of a mapping $f:\mathbb R^M\to\mathbb R^M$: A function $F:G\to\mathbb R$ is called a potential of $f$ in an open set $G\subset\mathbb R^M$ if
$$F'(x)h = (f(x),h)_{\mathbb R^M},\qquad x\in G,\ h\in\mathbb R^M.$$
In particular, if $F$ is a potential of a $C^1$-function $f$, then
$$\frac{\partial f_i}{\partial x_j}(x) = \frac{\partial f_j}{\partial x_i}(x),\qquad i,j = 1,\dots,M,\ x\in G.$$
The following example shows that this necessary condition is not sufficient.
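Example 4.3.63. Let $G = \mathbb R^2\setminus\{(0,0)\}$ and let
$$\omega(x,y) = -\frac{y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy$$
be a 1-form in $G$. This form is closed in $G$. Suppose now that $\omega$ is exact, i.e., there is a function $f:G\to\mathbb R$ such that $df = \omega$; in particular,
$$\frac{\partial f}{\partial x} = -\frac{y}{x^2+y^2},\qquad \frac{\partial f}{\partial y} = \frac{x}{x^2+y^2}\qquad\text{in } G.$$
Integrating, we obtain
$$f(x,y) = \begin{cases} -\arctan\dfrac{x}{y} + C(y) & \text{for } (x,y)\in G,\ y\ne0,\\[2mm] \arctan\dfrac{y}{x} + D(x) & \text{for } (x,y)\in G,\ x\ne0.\end{cases}$$
Since $\arctan z + \arctan\frac1z = \pm\frac\pi2$ for $z\ne0$ (the sign being that of $z$), we have $C(y) - D(x) = \pm\frac\pi2$, i.e., $C$ and $D$ are constant functions in all quadrants. Taking limits for $x\to0\pm$, $y\to0\pm$ we arrive at a contradiction, i.e., $\omega$ is not exact.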
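The obstruction can also be seen numerically. If $\omega = df$ on $G$, then by Example 4.3.55(i) the integral of $\gamma^*\omega$ over a closed curve $\gamma:[a,b]\to G$ would equal $f(\gamma(b)) - f(\gamma(a)) = 0$. The Python sketch below is our own illustration (the names `omega`, `line_integral` and `circle` are hypothetical, and the midpoint rule is our choice of quadrature); it indicates that along the unit circle the integral equals $2\pi$, while along a circle that does not surround the origin it vanishes.

```python
import math


def omega(x, y):
    """Coefficients (P, Q) of omega = P dx + Q dy from Example 4.3.63."""
    r2 = x * x + y * y
    return -y / r2, x / r2


def line_integral(form, gamma, dgamma, a, b, n=2000):
    """Midpoint-rule approximation of int_a^b [P(gamma) g1' + Q(gamma) g2'] dt,
    i.e. of the integral of gamma^*omega (Example 4.3.55(i))."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        x, y = gamma(t)
        dx, dy = dgamma(t)
        P, Q = form(x, y)
        total += (P * dx + Q * dy) * h
    return total


def circle(cx, cy):
    """Unit circle centered at (cx, cy) together with its velocity."""
    return (lambda t: (cx + math.cos(t), cy + math.sin(t)),
            lambda t: (-math.sin(t), math.cos(t)))


around_origin = line_integral(omega, *circle(0.0, 0.0), 0.0, 2 * math.pi)
away_from_origin = line_integral(omega, *circle(3.0, 0.0), 0.0, 2 * math.pi)
print(around_origin, 2 * math.pi)  # equal up to rounding
print(away_from_origin)            # approximately 0
```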
The reader can ask how we have found this example. The problem is more transparent if $\mathbb R^2$ is identified with the complex plane. If
$$F(z) = \frac1z,\qquad z = x+iy,$$
then
$$\operatorname{Re}F(x+iy) = \frac{x}{x^2+y^2},\qquad \operatorname{Im}F(x+iy) = -\frac{y}{x^2+y^2}.$$
It is well known that there is no (holomorphic) function $\Phi$ such that
$$\Phi'(z) = \frac1z\qquad\text{for all } z\in\mathbb C\setminus\{0\}$$
(consider $\int_{S^1}\frac{dz}{z}$). In the theory of functions of a complex variable a primitive function can be constructed by a curve integral. We will use the same approach in constructing a primitive form to a differential form. This is the main idea of the proof of the following basic result.

Theorem 4.3.64 (H. Poincaré). Any closed differential form on a differentiable manifold is locally exact.
Proof. Let $\omega$ be a closed $p$-form on an $M$-dimensional manifold $\mathcal M$ ($1\le p\le M$). We choose a local chart $(\mathcal V,\varphi)$ such that $P\varphi(\mathcal V) = U$ is an open ball in $\mathbb R^M$ with center at the origin. The pull-back $\psi^*\omega$ (see Remark 4.3.54(iii)), where $\psi = (P\varphi)^{-1}$, is a $p$-form in $U$. We define a $(p-1)$-form $\eta$ on $U$ by the formula
$$\eta(y)(v_1,\dots,v_{p-1}) = \int_0^1 t^{p-1}\,\psi^*\omega(ty)(y,v_1,\dots,v_{p-1})\,dt$$
for $y\in U$, $v_1,\dots,v_{p-1}\in T_yU$.$^{21}$
We have to show that
(i) the integral exists (this fact follows from the continuity of $t\mapsto\psi^*\omega(ty)$);
(ii) $\eta$ is a $(p-1)$-differential form on $U$ (the skew-symmetry of $\eta$ follows from the same property of $\psi^*\omega$);
(iii) $d\eta(y) = \psi^*\omega(y)$ for $y\in U$.
Verification of the last statement is technically complicated. The case $p = 1$ is more transparent, and therefore we will give the computation only for this case. For the induction step the reader can consult, e.g., Sternberg [124, Theorem III.4.1], Cartan [21, Theorem II.3.2.12.1] or Taylor [127, Theorem 1.13.2].
Suppose that $\psi^*\omega$ has in $U$ the form
$$(\psi^*\omega)(y) = \sum_{i=1}^M g_i(y)\,df^i,\qquad y\in U,$$
where $f^1,\dots,f^M$ is the dual basis to the standard one $e_1,\dots,e_M$ in $\mathbb R^M$. We wish to show that the function
$$\eta(y) := \sum_{i=1}^M\int_0^1 g_i(ty)\,y_i\,dt,\qquad y = (y_1,\dots,y_M)\in U, \tag{4.3.35}$$
has the differential
$$d\eta(y) = (\psi^*\omega)(y),\qquad\text{i.e.,}\qquad \frac{\partial\eta}{\partial y_j}(y) = g_j(y),\quad j = 1,\dots,M.$$
By differentiating the integral (4.3.35) with respect to the parameter $y_j$ we obtain
$$\frac{\partial\eta}{\partial y_j}(y) = \int_0^1 g_j(ty)\,dt + \sum_{i=1}^M\int_0^1\frac{\partial g_i}{\partial y_j}(ty)\,t y_i\,dt = \int_0^1 g_j(ty)\,dt + \sum_{i=1}^M\int_0^1\frac{\partial g_j}{\partial y_i}(ty)\,t y_i\,dt.$$
For the last equality we have used the assumption $d\omega = 0$ and Exercise 4.3.71(iv):
$$d(\psi^*\omega) = \psi^*(d\omega) = 0$$
and, consequently,
$$\frac{\partial g_j}{\partial y_i} = \frac{\partial g_i}{\partial y_j},\qquad i,j = 1,\dots,M.$$
Using integration by parts we get
$$\int_0^1 g_j(ty)\,dt = \big[t\,g_j(ty)\big]_{t=0}^{t=1} - \int_0^1 t\,\frac{d}{dt}g_j(ty)\,dt = g_j(y) - \sum_{i=1}^M\int_0^1 t\,\frac{\partial g_j}{\partial y_i}(ty)\,y_i\,dt.$$
Inserting this into the previous identity, the two sums cancel and $\frac{\partial\eta}{\partial y_j}(y) = g_j(y)$. If we put $f(x) = \eta(y)$ for $x = \psi(y)$, then
$$df = \omega.$$
$^{21}$Here we identify the point $y\in U$ with the vector $y\in T_yU = \mathbb R^M$.
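Formula (4.3.35) can be tried out directly. In the Python sketch below (our own illustration; `radial_potential`, `F` and `g` are hypothetical names, and the $t$-integral is approximated by the midpoint rule) the field $g = \nabla F$ with $F(y) = y_2\sin y_1 + y_2^2/2$ satisfies the symmetry conditions $\partial g_j/\partial y_i = \partial g_i/\partial y_j$, so the radial integral should reproduce $F(y) - F(0)$.

```python
import math


def radial_potential(g, y, n=400):
    """eta(y) = sum_i int_0^1 g_i(t y) y_i dt, cf. (4.3.35); midpoint rule in t."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        gt = g([t * yi for yi in y])
        total += sum(gi * yi for gi, yi in zip(gt, y)) / n
    return total


def F(y):
    """Hypothetical sample potential F(y) = y2 sin(y1) + y2^2 / 2."""
    return math.sin(y[0]) * y[1] + y[1] ** 2 / 2


def g(y):
    """g = grad F, so the symmetry conditions dg_i/dy_j = dg_j/dy_i hold."""
    return [math.cos(y[0]) * y[1], math.sin(y[0]) + y[1]]


y = [0.7, -0.4]
print(radial_potential(g, y), F(y) - F([0.0, 0.0]))  # nearly equal
```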

Remark 4.3.65. The proof of the case $p = 1$ shows that there exists a potential of a smooth mapping $g = (g^1,\dots,g^M):U\to\mathbb R^M$ in a ball $U$ provided the symmetry conditions
$$\frac{\partial g^j}{\partial y_i} = \frac{\partial g^i}{\partial y_j},\qquad i,j = 1,\dots,M,$$
hold. Example 4.3.63 suggests that certain topological properties of $U$ are necessary if $U$ is not a ball. In the proof of the previous theorem the potential $\eta$ was defined by the curve integral$^{22}$
$$\eta(y) = \int_0^1 (g(\gamma(t)),\gamma'(t))_{\mathbb R^M}\,dt =: \int_{\gamma_{o,y}}\sum_{i=1}^M g^i\,dy^i$$
along the curve $\gamma_{o,y} = \{ty: t\in[0,1]\}$, i.e., $\gamma(t) = ty$. The crucial point in the direct computation of the Fréchet derivative of $\eta$ is an estimate of the difference $\eta(y+h) - \eta(y)$. If the curve integral depends only on the initial and terminal points and not on the path which joins these points, then
$$\eta(y+h) - \eta(y) = \int_{\gamma_{y,y+h}}\sum_{i=1}^M g^i\,dy^i = \int_0^1 (g(y+th),h)_{\mathbb R^M}\,dt = (g(y),h)_{\mathbb R^M} + o(\|h\|_{\mathbb R^M})$$
provided $g$ is continuously differentiable. Example 4.3.63 can be easily adapted to show that the independence of the curve integral on the path in $U$ implies that $U$ is not punctured. There is another way to express this observation. It consists in considering the obstacles preventing a closed form from being exact.
Assume that $\mathcal M$ is a connected manifold and denote the group (with respect to pointwise addition) of closed $p$-differential forms on $\mathcal M$ by $Z^p(\mathcal M)$ and the subgroup of exact $p$-forms by $B^p(\mathcal M)$. The quotient
$$H^p(\mathcal M) := Z^p(\mathcal M)/B^p(\mathcal M)$$
is called the $p$-th (de Rham) cohomology group of $\mathcal M$. If $H^1(\mathcal M)$ is trivial, i.e., any closed one-form on $\mathcal M$ is exact, then $\mathcal M$ is said to be simply connected. More details on the role of cohomology groups in the study of differentiable manifolds can be found, e.g., in Whitney [133, Chapter IV]. The calculation of cohomology groups is by no means trivial.
$^{22}$The definition of the curve integral and of the integral of a differential form is given in the next Appendix 4.3C.
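Example 4.3.66.
(i) Let $f:\mathcal M\to\mathcal N$ be a smooth map. If $\omega\in Z^p(\mathcal N)$, then
$$d(f^*\omega) = f^*(d\omega) = 0,\qquad\text{i.e.,}\qquad f^*\omega\in Z^p(\mathcal M)$$
(Exercise 4.3.71(iv)). Similarly, if $\omega\in B^p(\mathcal N)$, then $f^*\omega\in B^p(\mathcal M)$. This means that $f$ induces a linear map
$$f^*: H^p(\mathcal N)\to H^p(\mathcal M).$$
In particular, if $f$ is a diffeomorphism of $\mathcal M$ onto $\mathcal N$, then $H^p(\mathcal M)$ is isomorphic to $H^p(\mathcal N)$.
(ii) Using the previous example we can show that $H^1(S^1)$ is isomorphic to $\mathbb R$. Instead of $H^1(S^1)$ it is sufficient to compute $H^1(\mathbb R/\mathbb Z)$: Denote by $i$ the natural projection of $\mathbb R$ onto $\mathbb R/\mathbb Z$ and consider a closed one-form $\omega$ on $\mathbb R/\mathbb Z$. Then
$$f(x)\,dx = i^*\omega(x)$$
where $f$ is a 1-periodic function. Define
$$\Lambda(\omega) = \int_0^1 f(x)\,dx.$$
It is easy to see that $\Lambda(\omega) = 0$ if and only if $\omega$ is exact, and also that $\Lambda$ maps $Z^1(\mathbb R/\mathbb Z)$ onto $\mathbb R$. This shows that $\Lambda$ induces an isomorphism of $H^1(\mathbb R/\mathbb Z)$ onto $\mathbb R$.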
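The role of $\Lambda$ can be illustrated numerically. For a 1-periodic $f$ the primitive $F(x) = \int_0^x f$ satisfies $F(x+1) - F(x) = \int_0^1 f = \Lambda(\omega)$, so $F$ descends to a function on $\mathbb R/\mathbb Z$ (i.e., $\omega$ is exact) precisely when $\Lambda(\omega) = 0$. The sketch below is our own illustration; `Lambda` and `primitive` are hypothetical names and both integrals use the midpoint rule.

```python
import math


def Lambda(f, n=1000):
    """Lambda(omega) = int_0^1 f(x) dx for omega = f dx, via the midpoint rule."""
    return sum(f((k + 0.5) / n) for k in range(n)) / n


def primitive(f, x, n=1000):
    """F(x) = int_0^x f(s) ds, via the midpoint rule."""
    h = x / n
    return sum(f((k + 0.5) * h) for k in range(n)) * h


f_exact = lambda x: math.cos(2 * math.pi * x)       # Lambda = 0: exact form
f_closed = lambda x: 1 + math.cos(2 * math.pi * x)  # Lambda = 1: closed, not exact

for f in (f_exact, f_closed):
    # F(x + 1) - F(x) should agree with Lambda(f dx) for a 1-periodic f
    jump = primitive(f, 1.3) - primitive(f, 0.3)
    print(Lambda(f), jump)
```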

Now we explain the notion of a simply connected domain in another way which will be important in the sequel (e.g., in the degree theory, Appendix 4.3D).
Definition 4.3.67. Let $X$, $Y$ be metric (topological) spaces. Continuous maps $f,g:X\to Y$ are called homotopic if there exists a continuous map $\Phi:X\times[0,1]\to Y$ such that
$$\Phi(\cdot,0) = f(\cdot),\qquad \Phi(\cdot,1) = g(\cdot).$$
Such $\Phi$ is said to be a homotopy between $f$ and $g$.

Remark 4.3.68. The relation of being homotopic between two continuous maps is clearly an equivalence relation. The set of all continuous maps $C(X,Y)$ is therefore divided into disjoint classes of mutually homotopic maps. We denote the class containing $f$ by $[f]$.
Here we are using the homotopy concept mainly for curves. The reader can imagine that curves $\gamma_0,\gamma_1:[0,1]\to\mathcal M$ are homotopic if $\gamma_0$ may be continuously deformed (in $\mathcal M$!) into $\gamma_1$. A curve $\gamma$ is called null-homotopic if $\gamma$ is homotopic to a constant curve (point) $t\in[0,1]\mapsto a\in\mathcal M$. In particular, this is important for closed curves ($\gamma(0) = \gamma(1)$). To see this, choose a fixed point $a\in\mathcal M$ and define
$$H_1(\mathcal M) := \{[\gamma]: \gamma:[0,1]\to\mathcal M \text{ is continuous},\ \gamma(0) = \gamma(1) = a\}.$$
$H_1(\mathcal M)$ forms a group, the so-called fundamental group of $\mathcal M$, under the multiplication $\gamma = \gamma_2\circ\gamma_1$ defined by
$$\gamma(t) = \begin{cases}\gamma_1(2t), & t\in\big[0,\frac12\big],\\[1mm] \gamma_2(2t-1), & t\in\big(\frac12,1\big]\end{cases}$$
(notice that the definition $[\gamma_2]\circ[\gamma_1] := [\gamma_2\circ\gamma_1]$ is correct). If $\mathcal M$ is path-connected, i.e., for any $a,b\in\mathcal M$ there exists a continuous curve $\gamma$ in $\mathcal M$ such that $\gamma(0) = a$, $\gamma(1) = b$, then the fundamental group of $\mathcal M$ does not depend on the choice of the point $a$. Whenever the fundamental group is trivial, i.e., any closed curve can be continuously deformed into a point, then there are no holes in $\mathcal M$ and the integral of a closed one-form along a closed curve is zero, i.e., this integral does not depend on the path (Remark 4.3.65).
Cohomology and homotopy groups belong to the main tools in algebraic topology. The reader who is interested in these techniques can consult the corresponding textbooks, e.g., Adams [1], Dold [36], Greenberg [61], Kosniowski [77] (here, in Chapter 26, you can find applications of fundamental groups to the classification of two-dimensional compact connected surfaces; for example, such a surface is simply connected if and only if it is homeomorphic to $S^2$), Spanier [122].
At the end of this appendix we link differential one-forms and systems of differential equations and continue the discussion from Appendix 4.3A. To simplify it we assume that
$$\omega = \sum_{i=1}^M \omega_i\,dx^i$$
is a non-vanishing smooth one-form in an open set $G\subset\mathbb R^M$. The form $\omega$ is uniquely determined by its kernel up to a multiplicative factor. Let $v^1,\dots,v^{M-1}$ be a basis of this kernel, i.e., $v^1,\dots,v^{M-1}$ are vector fields on $G$ which are linearly independent at each point $x\in G$ and annihilate $\omega$. The equation
$$\omega = 0 \tag{4.3.36}$$
is called the exterior differential equation in $G$ and its solution is a mapping
$$T: x\in G\mapsto \text{a subspace } T(x)\subset T_xG$$
such that
$$\omega(x)(v) = 0\qquad\text{for all } v\in T(x).$$
A submanifold $\mathcal S$ of $G$ is said to be an integral manifold for the equation (4.3.36) if
$$\dim\mathcal S = M-1\qquad\text{and}\qquad \varphi^*\omega(y) = 0\quad\text{for all } y\in\mathcal S,$$
where $\varphi$ is a local parametrization of $\mathcal S$ in a neighborhood of $y$. This integral manifold is the same object as that at the end of Appendix 4.3A, i.e.,
$$i_*(T_y\mathcal S) = \operatorname{Lin}\{v^1(y),\dots,v^{M-1}(y)\}$$
where $i:\mathcal S\to G$ is an embedding.
The notion of the exterior differential equation can be generalized to a system
$$\omega^j = 0,\qquad j = 1,\dots,k, \tag{4.3.37}$$
where $\omega^1,\dots,\omega^k$ are differential forms on a manifold $\mathcal M$, not necessarily of the same order. In the special case when $\omega^1,\dots,\omega^k$ are linearly independent one-forms ((4.3.37) is then the so-called Pfaff system), the intersection of their kernels has a basis formed by $M-k$ vector fields. The Frobenius Theorem for the existence of an integral manifold for (4.3.37) has the following form.

Theorem 4.3.69 (Frobenius, the differential forms version). Let $\omega^1,\dots,\omega^k$ be smooth differential one-forms on a differentiable manifold $\mathcal M$. The necessary and sufficient condition for the existence of an integral manifold in a vicinity of any point of $\mathcal M$ is that
$$d\omega^i\wedge\omega^1\wedge\dots\wedge\omega^k = 0,\qquad i = 1,\dots,k.$$
Proof. The equivalence of Theorem 4.3.42 and Theorem 4.3.69 is not difficult to prove, and the case $k = 1$, $\dim\mathcal M = 3$ is rather instructive. In this case a connection to the Poincaré Theorem 4.3.64 and its proof should also be recognized.

Exercise 4.3.70. Denote by $O(N)$ the set of all regular linear mappings $A:\mathbb R^N\to\mathbb R^N$ for which $A^{-1} = A^*$ and by $SO(N)$ the set
$$\{A\in O(N): \det A = 1\}.$$
(i) As a subset of $\mathbb R^{N\times N}$ the set $O(N)$ is a differentiable manifold. Find its dimension.
Hint. Consider $A\mapsto A^*A$. The dimension is $\frac{N(N-1)}{2}$.
(ii) Show that $SO(N)$ is the component of $O(N)$ containing the identity.
(iii) Show that $A\in O(N)$ induces a mapping of $S^{N-1}$ into itself.
(iv) Let $\omega$ be a one-form on $S^2$ which is invariant under $SO(3)$, i.e.,
$$A^*\omega = \omega\qquad\text{for all } A\in SO(3).$$
Prove that $\omega = 0$.
(v) Does a result analogous to (iv) hold for a two-form on $\mathbb R^3$?
Exercise 4.3.71. Prove the following properties of the pull-back operation:
(i) $g^*(\omega\wedge\eta) = (g^*\omega)\wedge(g^*\eta)$,
(ii) $(h\circ g)^*\omega = g^*(h^*\omega)$,
(iii) $g^*(df) = d(f\circ g)$ for $f:\mathcal M\to\mathbb R$,$^{23}$
(iv) $g^*(d\omega) = d(g^*\omega)$, where $d$ denotes the differential.
$^{23}$If we interpret $f$ as a 0-form, then the notation $g^*f$ instead of $f\circ g$ is more agreeable.

Exercise 4.3.72. Let $\mathcal M$ be the open unit ball in $\mathbb R^2$ without its origin. Show that $H_1(\mathcal M)$ is isomorphic to $\mathbb Z$. Is it also true in $\mathbb R^3$?
Exercise 4.3.73. Show that $H_1(S^1)$ is isomorphic to $\mathbb Z$.
Hint. Use an approach similar to that in Example 4.3.66(ii), and instead of the mapping $\Lambda$ show that there is a lifting of a continuous closed curve $\gamma:[0,1]\to\mathbb R/\mathbb Z$, i.e., a continuous $\tilde\gamma:[0,1]\to\mathbb R$ such that
$$i(\tilde\gamma) = \gamma,\qquad \tilde\gamma(0) = 0.$$
Now consider $\tilde\gamma(1)$ (actually this is the degree of $\gamma$; see Appendix 4.3D). For details see, e.g., Kosniowski [77, Chapter 16].
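The lifting in the hint can be computed for a sampled curve. The Python sketch below is our own illustration: points of $\mathbb R/\mathbb Z$ are represented by numbers in $[0,1)$, the name `lift_endpoint` is hypothetical, and we assume the samples are dense enough that consecutive points differ by less than $\frac12$ in $\mathbb R/\mathbb Z$. Each step of the lift is then the representative of the difference closest to $0$, and the function returns $\tilde\gamma(1) - \tilde\gamma(0)$, which is an integer up to rounding.

```python
import math


def lift_endpoint(gamma, n=10000):
    """Return gamma_tilde(1) - gamma_tilde(0) for a lift gamma_tilde:[0,1] -> R of a
    closed curve gamma:[0,1] -> R/Z whose values are represented in [0, 1).
    Assumption: consecutive samples differ by less than 1/2 in R/Z."""
    total = 0.0
    prev = gamma(0.0)
    for k in range(1, n + 1):
        cur = gamma(k / n)
        # representative of cur - prev in [-1/2, 1/2)
        total += (cur - prev + 0.5) % 1.0 - 0.5
        prev = cur
    return total


print(round(lift_endpoint(lambda t: (3 * t) % 1.0)))    # winds three times forward
print(round(lift_endpoint(lambda t: (-2 * t) % 1.0)))   # winds twice backward
print(round(lift_endpoint(lambda t: (0.2 * math.sin(2 * math.pi * t)) % 1.0)))  # null-homotopic
```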

4.3C Integration on Manifolds


We have met the curve integral in the previous Appendix 4.3B. There are two objects which can be integrated along curves: functions and differential one-forms.
The situation with functions is simple. If $\mathcal M$ is an $M$-dimensional differentiable manifold in $\mathbb R^N$, $f:\mathcal M\to\mathbb R$ is a continuous function and $\gamma:[a,b]\to\mathcal M$ is a smooth curve, then we define
$$\int_\gamma f\,d\gamma := \int_a^b f(\gamma(t))\,\|\gamma'(t)\|\,dt. \tag{4.3.38}$$
The Euclidean norm $\|\cdot\|$ of tangent vectors expresses here a quantity which could be viewed as the infinitesimal length of $\gamma$. Recall in this context the formula for the length of $\gamma$:
$$l(\gamma) := \int_a^b\|\gamma'(t)\|\,dt.$$
We will return to the length and area of a nonlinear object later in this appendix.
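For a concrete feel for (4.3.38), here is a short Python sketch (our own illustration; the function name `curve_integral_of_function` and the midpoint-rule discretization are assumptions, not part of the text). With $f\equiv1$ it approximates the length of the helix $\gamma(t) = (\cos t,\sin t,t)$, $t\in[0,2\pi]$, whose exact value is $l(\gamma) = 2\pi\sqrt2$ because $\|\gamma'(t)\| = \sqrt2$.

```python
import math


def curve_integral_of_function(f, gamma, dgamma, a, b, n=4000):
    """Approximate (4.3.38): int_a^b f(gamma(t)) * ||gamma'(t)|| dt
    with the composite midpoint rule and the Euclidean norm."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        speed = math.sqrt(sum(c * c for c in dgamma(t)))
        total += f(gamma(t)) * speed * h
    return total


helix = lambda t: (math.cos(t), math.sin(t), t)
dhelix = lambda t: (-math.sin(t), math.cos(t), 1.0)

# With f = 1 the curve integral is the length l(gamma).
length = curve_integral_of_function(lambda p: 1.0, helix, dhelix, 0.0, 2 * math.pi)
print(length, 2 * math.pi * math.sqrt(2))
```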
The integral on the right-hand side of (4.3.38) is the Riemann integral and consequently it has reasonable properties. It could be generalized to some noncontinuous functions (via the Lebesgue integral) and/or to certain non-smooth curves (piecewise smooth or with bounded variation, via the Riemann–Stieltjes integral). Since we are not interested in these generalizations we always assume that all objects are as smooth as we need (manifolds at least of the class $C^1$; functions, vector fields and differential forms at least continuous).
The situation with integration of one-forms is different. Namely, differential forms are defined only on manifolds (recall that an open subset of $\mathbb R^M$ is also a manifold) and curves need not be manifolds (see Figure 4.3.2). There are two possibilities to avoid these obstacles: either to assume that $\gamma$ lies on a manifold where the one-form is defined or to restrict integration to curves which are themselves manifolds. We now examine the first possibility and postpone the other one to Definition 4.3.86.
For the definition of the integral of a one-form given in Remark 4.3.54 we have assumed that the whole curve $\gamma$ lies in one chart to get the same representation of the form at all points of $\gamma$. If more charts are needed to cover the curve we have to be careful not to integrate over some parts of the curve several times. To eliminate this risk the following tool is very useful. In order to build it up we need a topological interlude.

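Definition 4.3.74. Let $(\mathcal V_n,\varphi_n)_{n\in\mathbb N}$ be an atlas of a differentiable manifold $\mathcal M$. Let $\{\chi_n\}_{n\in\mathbb N}$ be a collection of smooth (often $C^\infty$) nonnegative functions on $\mathcal M$ which have the following properties:
(1) for all $n\in\mathbb N$ the support of $\chi_n$ defined by
$$\operatorname{supp}\chi_n := \overline{\{x\in\mathcal M: \chi_n(x)\ne0\}}$$
is a compact subset of $\mathcal V_n$;
(2) $\displaystyle\sum_{n=1}^\infty\chi_n(x) = 1$ for all $x\in\mathcal M$.
Then $\{\chi_n\}_{n\in\mathbb N}$ is said to be a partition of unity subordinate to $\{\mathcal V_n\}_{n\in\mathbb N}$.$^{24}$
Since $\mathcal M\subset\mathbb R^N$ is separable in the induced topology, a countable atlas always exists. It is also possible to construct a sequence $\{G_n\}_{n=1}^\infty$ of open subsets of $\mathcal M$ such that
$$\overline{G_n}\subset\operatorname{int}G_{n+1},^{25}\qquad \overline{G_n}\ \text{is compact},\qquad\text{and}\qquad \mathcal M = \bigcup_{n=1}^\infty G_n.$$
For example, $G_n$ can be chosen as the intersection of $\mathcal M$ with the open ball centered at $o$ with radius $n$. For the construction of a partition of unity the following topological device is convenient. We will need various types of balls, so $B(a;r)$ will denote the open ball in $\mathbb R^M$ ($M = \dim\mathcal M$).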
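Lemma 4.3.75. Let $\{\mathcal W_\alpha\}_{\alpha\in A}$ be an open covering of an $M$-dimensional manifold $\mathcal M$ in $\mathbb R^N$. Then there is a countable open covering $\{\mathcal V_m\}_{m\in\mathbb N}$ of $\mathcal M$ with the following properties:
(i) $\{\mathcal V_m\}_{m\in\mathbb N}$ is subordinate to $\{\mathcal W_\alpha\}_{\alpha\in A}$, i.e., for each $m\in\mathbb N$ there is an index $\alpha_m\in A$ such that $\mathcal V_m\subset\mathcal W_{\alpha_m}$;
(ii) there are smooth mappings $\psi_m: B(o;2)\to\mathcal V_m$ such that $\displaystyle\bigcup_{m=1}^\infty\psi_m(B(o;1)) = \mathcal M$;
(iii) the collection $\{\mathcal V_m\}_{m\in\mathbb N}$ forms a locally finite system, i.e., any point $x\in\mathcal M$ has a neighborhood which intersects only a finite number of the sets $\{\mathcal V_m\}_{m\in\mathbb N}$.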
Proof. Choose a sequence $\{G_n\}_{n=1}^\infty$ of open subsets of $\mathcal M$ which has the property stated prior to this lemma. Put in addition $G_0 = G_{-1} = \emptyset$. The main idea behind the forthcoming construction is that the compact sets
$$K_n := \overline{G_n}\setminus G_{n-1},\qquad n\in\mathbb N,$$
cover $\mathcal M$ and the larger open sets
$$H_n := G_{n+1}\setminus\overline{G_{n-2}},\qquad n\in\mathbb N,$$
form a locally finite system. Fix $n\in\mathbb N$. Let $(\mathcal V,\varphi)$ be a local chart at
$$x\in H^\alpha_n := H_n\cap\mathcal W_\alpha.$$
Put $y = P\varphi(x)$. There is a ball $B(y;\rho)\subset P\varphi(\mathcal V\cap H^\alpha_n)$. We now shift the center $y$ to the origin and expand the ball appropriately, namely we put
$$\psi^n_x(z) = (P\varphi)^{-1}\Big(y + \frac\rho2 z\Big)\qquad\text{for } z\in B(o;2).$$
With the help of these smooth maps $\psi^n_x$ we return to the manifold by setting
$$\mathcal V^n_x = \psi^n_x(B(o;1))$$
(see Figure 4.3.18). Notice that $\mathcal V^n_x$ is open in $\mathcal M$. The open sets $\{\mathcal V^n_x\}_{x\in H^\alpha_n,\,\alpha\in A}$ cover the compact set $K_n$. We choose a finite subcovering
$$\mathcal V^n_{x_1},\dots,\mathcal V^n_{x_{k_n}}.$$
The collection $\{\mathcal V^n_{x_j}\}_{j=1,\dots,k_n,\ n\in\mathbb N}$ covers $\mathcal M$, and
$$\{\psi^n_{x_j}(B(o;2))\}_{j=1,\dots,k_n,\ n\in\mathbb N}$$
is the desired locally finite countable system $\{\mathcal V_m\}_{m\in\mathbb N}$.
[Figure 4.3.18: the chart $P\varphi$ and the maps $\psi^n_x$, relating $x\in H_n$ and $\mathcal V^n_x$ to the balls $B(o;1)\subset B(o;2)$ and $B(y;\rho)$.]
$^{24}$A partition of unity is defined in topology in a more general way; see the corresponding textbooks, e.g., Dugundji [43, Chapter VIII].
$^{25}$We wish to point out that topological notions (like interior) are taken here with respect to the topology of $\mathcal M$, i.e., $G\subset\mathcal M$ is open provided there is an open set $H\subset\mathbb R^N$ such that $G = \mathcal M\cap H$.

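Theorem 4.3.76. For any atlas $(\mathcal W_\alpha,\varphi_\alpha)_{\alpha\in A}$ of a manifold $\mathcal M$ there exists a subordinate partition of unity.
Proof. According to the previous Lemma 4.3.75 we choose a locally finite subordinate covering $\{\mathcal V_k\}_{k\in\mathbb N}$ of $\mathcal M$ and the corresponding functions $\{\psi_k\}_{k=1}^\infty$. It is easy to show that the function
$$\theta(y) = \begin{cases} e^{-\frac{1}{1-\|y\|^2}}, & \|y\|<1,\\ 0, & \|y\|\ge1,\end{cases}\qquad y\in\mathbb R^M,$$
is a $C^\infty$-function in $\mathbb R^M$. Put
$$\theta_k(x) = \begin{cases}\theta(y), & x\in\mathcal V_k,\ x = \psi_k(y),\\ 0, & x\in\mathcal M\setminus\mathcal V_k.\end{cases}$$
Then $\theta_k$ is smooth (of the same order as $\mathcal M$) and
$$\theta_k(x) > 0\qquad\text{for}\quad x\in\psi_k(B(o;1)).$$
Since $\{\mathcal V_k\}_{k\in\mathbb N}$ is a locally finite system, the series $\sum_{k=1}^\infty\theta_k(x)$ has only a finite number of nonzero terms. Moreover, $\sum_{k=1}^\infty\theta_k(x) > 0$ for all $x\in\mathcal M$ due to $\bigcup_{k=1}^\infty\psi_k(B(o;1)) = \mathcal M$. It is now sufficient to put
$$\chi_k(x) = \frac{\theta_k(x)}{\sum_{n=1}^\infty\theta_n(x)},\qquad k\in\mathbb N,$$
to obtain the desired partition of unity.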
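The construction can be imitated in the simplest case $\mathcal M = \mathbb R$ ($M = 1$). In the Python sketch below (our own illustration; the charts $\psi_k(y) = c_k + y$ with integer centers $c_k$ and the function names are hypothetical choices) the functions $\theta_k(x) = \theta(x - c_k)$ are normalized by their sum, which is finite at every point because at most the two nearest centers contribute.

```python
import math


def bump(y):
    """theta(y) = exp(-1 / (1 - y^2)) for |y| < 1, and 0 otherwise (case M = 1)."""
    return math.exp(-1.0 / (1.0 - y * y)) if abs(y) < 1.0 else 0.0


def partition_of_unity(x, centers):
    """Values chi_k(x) = theta_k(x) / sum_n theta_n(x) with theta_k(x) = theta(x - c_k).
    Hypothetical charts psi_k(y) = c_k + y; the sets psi_k(B(o;1)) = (c_k - 1, c_k + 1)
    must cover x, otherwise the normalizing sum vanishes."""
    thetas = [bump(x - c) for c in centers]
    total = sum(thetas)
    if total == 0.0:
        raise ValueError("x is not covered by the sets psi_k(B(o;1))")
    return [th / total for th in thetas]


centers = list(range(-5, 6))  # the intervals (c - 1, c + 1) cover (-6, 6)
for x in (-4.2, 0.0, 0.5, 3.99):
    chi = partition_of_unity(x, centers)
    print(x, sum(chi), sum(1 for c in chi if c > 0))
```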

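We can now return to the definition of the integral of a one-form $\omega$. If $\{\chi_n\}_{n\in\mathbb N}$ is a partition of unity which is subordinate to a covering $\{\mathcal V_n\}_{n\in\mathbb N}$ of $\mathcal M$, where $(\mathcal V_n,\varphi_n)_{n\in\mathbb N}$ is an atlas of $\mathcal M$, then
$$\omega(x) = \sum_{n=1}^\infty\chi_n(x)\,\omega(x),\qquad x\in\mathcal M.$$
Notice that $\chi_n\omega$ is a one-form and $\operatorname{supp}\chi_n\omega\subset\mathcal V_n$. This decomposition of $\omega$ allows us to define the integral locally. If $\gamma$ is a smooth curve in $\mathcal M$ and $\gamma(t)\in\mathcal V_n$, then $\gamma'(t)\in T_{\gamma(t)}\mathcal M$ and it can be written in the form
$$\gamma'(t) = \sum_{i=1}^M\xi_i(t)\,\frac{\partial}{\partial y_i}.$$
Definition 4.3.77. Let $\mathcal M$ be an $M$-dimensional differentiable manifold and denote by $(\mathcal V_n,\varphi_n)_{n\in\mathbb N}$ its atlas. Let $\{\chi_n\}_{n\in\mathbb N}$ be a partition of unity subordinate to $\{\mathcal V_n\}_{n\in\mathbb N}$. Let $\gamma:I\to\mathcal M$ be a smooth curve and $\omega$ a one-form on $\mathcal M$. If
$$\omega(x) = \sum_{i=1}^M f_i(x)\,dy^i,\qquad x\in\mathcal V_n,$$
then we define
$$\int_\gamma\omega = \sum_{n=1}^\infty\int_\gamma\chi_n\omega := \sum_{n=1}^\infty\int_I\chi_n(\gamma(t))\sum_{i=1}^M f_i(\gamma(t))\,\xi_i(t)\,dt \tag{4.3.39}$$
provided the integrals on the right-hand side exist and the sum is absolutely convergent.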
Remark 4.3.78.
(i) If $\gamma$ is a smooth curve defined on a compact interval $I = [a,b]$ and $\{\mathcal V_n\}_{n\in\mathbb N}$ is a locally finite covering of $\mathcal M$, then $\gamma$ meets only finitely many of the sets $\{\mathcal V_n\}_{n\in\mathbb N}$. If, moreover, the form $\omega$ is continuous, then the integrals in (4.3.39) exist and the sum is absolutely convergent, since it contains only finitely many nonzero terms. We require absolute convergence of the series because we do not want the value of the integral to depend on the arrangement of the charts $(\mathcal V_n,\varphi_n)_{n\in\mathbb N}$ into a sequence.
(ii) It can be proved (do it as an exercise!) that the formula (4.3.39) does not depend on the choice of the partition of unity. It should also be proved that the right-hand side in (4.3.39) is the same for all equivalent atlases on $\mathcal M$ (see Definition 4.3.34). This follows from the transformation rules for tangent vectors (4.3.17) and for differential forms (4.3.22).
Remark 4.3.79. We can interpret the local coordinates $(f_1,\dots,f_M)$ of a one-form $\omega$ as the local coordinates of a vector field
$$F(x) = \sum_{i=1}^M f_i(x)\,\frac{\partial}{\partial x_i},\qquad x\in\mathcal V_n$$
(and vice versa; see Remark 4.3.56). If we define
$$\int_\gamma F := \int_\gamma\omega,$$
then the integral $\int_\gamma F$ expresses the work done by the vector field $F$ along the curve $\gamma$. The special cases $\mathcal M = \mathbb R^2, \mathbb R^3$ are known from introductory courses in mechanics (see, e.g., Kittel, Knight & Ruderman [76, Chapter 5]).
Remark 4.3.80. Figures 4.3.2 and 4.3.3 show that a smooth curve need not be a differentiable manifold in $\mathbb R^N$. In order to avoid such cases it is sufficient to assume that the curve $\gamma$ has a parametrization which is an embedding (Remark 4.3.3(iv)). If, moreover, $\gamma$ lies on a manifold $\mathcal M$, then it is a so-called submanifold of $\mathcal M$ in the following sense: A subset $\mathcal P$ of a differentiable manifold $\mathcal M$ is said to be a $P$-dimensional submanifold of $\mathcal M$ if there is an atlas $(\mathcal V_n,\varphi_n)_{n\in\mathbb N}$ of $\mathcal M$ such that
$$\varphi_n(x) = (y_1,\dots,y_P,0,\dots,0)\in\mathbb R^N\qquad\text{for all } x\in\mathcal V_n\cap\mathcal P.$$
The proof of Proposition 4.3.2 shows that the image of an embedding is a submanifold.
In order to integrate functions over a surface in $\mathbb R^3$, or more generally over a manifold, we need to generalize the notion of the area of a parallelogram to non-flat domains. Let us recall here the definition of the multiple Riemann integral. The notion of (normalized) area or volume is based on the fact that the unit cube
$$C := \Big\{\sum_{i=1}^M x_ie_i: 0\le x_i\le1\Big\}$$
($e_1,\dots,e_M$ is the standard basis in $\mathbb R^M$) has $M$-dimensional volume (i.e., Lebesgue measure) equal to 1.
Let $A$ be a parallelepiped in $\mathbb R^M$ spanned by vectors $v_1,\dots,v_M$, i.e.,
$$A := \Big\{\sum_{j=1}^M\lambda_jv_j: 0\le\lambda_j\le1\Big\}.$$
Then the volume $V(A)$ of $A$ is defined by
$$V(A) := \int_A 1\,dx.$$
This integral can be calculated with the help of the linear operator $T:\mathbb R^M\to\mathbb R^M$ which sends the vectors $e_1,\dots,e_M$ of the standard basis to $v_1,\dots,v_M$ ($Te_j = v_j = \sum_{i=1}^M t_{ij}e_i$). It is well known that
$$\int_A 1\,dx = \int_C|\det T|\,dy = |\det T| := |\det(t_{ij})_{i,j=1,\dots,M}|.^{26} \tag{4.3.40}$$
There is a problem with the generalization to a manifold since a manifold is bent. Nonetheless, a manifold can be supposed to be locally flat provided it is smooth. This basic principle of analysis allows us to define an infinitesimal area or volume via these notions for flat tangent spaces. Another problem now arises since there is no natural unit cube in $T_a\mathcal M$. To overcome this obstacle we want to express the $M$-volume of the parallelepiped spanned by the coordinate vectors $\frac{\partial}{\partial y_1},\dots,\frac{\partial}{\partial y_M}$ without using the standard basis. This can be done for the parallelepiped $A$ given above with the help of the scalar product (in which the standard basis is an orthonormal basis). If $G(v_1,\dots,v_M)$ is the so-called Gram matrix of vectors $v_1,\dots,v_M$, i.e.,
$$G(v_1,\dots,v_M) = \begin{pmatrix}(v_1,v_1)_{\mathbb R^N} & \cdots & (v_1,v_M)_{\mathbb R^N}\\ \vdots & \ddots & \vdots\\ (v_M,v_1)_{\mathbb R^N} & \cdots & (v_M,v_M)_{\mathbb R^N}\end{pmatrix},$$
then the formula
$$V(A) = [\det G(v_1,\dots,v_M)]^{\frac12} \tag{4.3.41}$$
holds (see Exercise 4.3.103).
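Formula (4.3.41) is easy to test. The Python sketch below is our own illustration: `det` and `gram_volume` are hypothetical helpers, and the determinant is computed by Gaussian elimination. It compares $[\det G]^{1/2}$ with $|\det T|$ for three vectors in $\mathbb R^3$ and, for two vectors in $\mathbb R^3$, with the length of their cross product (Example 4.3.55(ii)), i.e., the area of the parallelogram they span. The second case is exactly the situation of a tangent plane $T_a\mathcal M\subset\mathbb R^3$.

```python
import math


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in A]
    n = len(A)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def gram_volume(vectors):
    """[det G(v_1, ..., v_M)]^(1/2) as in (4.3.41); the v_i may lie in R^N, N >= M."""
    m = len(vectors)
    G = [[sum(a * b for a, b in zip(vectors[i], vectors[j])) for j in range(m)]
         for i in range(m)]
    return math.sqrt(max(det(G), 0.0))  # clip tiny negative rounding errors


# M = N = 3: compare with |det T| (det of the transpose is the same).
vs = [(1, 0, 0), (1, 2, 0), (0, 1, 3)]
print(gram_volume(vs), abs(det(vs)))

# M = 2, N = 3: compare with the length of the cross product.
a, b = (1, 2, 0), (0, 1, 3)
cross = (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
print(gram_volume([a, b]), math.sqrt(sum(c * c for c in cross)))
```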


$^{26}$More generally: If $T$ is a (nonlinear) diffeomorphism which maps $C$ onto $A := T(C)$, then the formula $V(A) := \int_A 1\,dx = \int_C|\det T'(y)|\,dy$ holds for the Lebesgue measure $V(A)$ of $A$. The proof of this nonlinear version is based on (4.3.40).
Since $T_a\mathcal M\subset\mathbb R^N$, the scalar product in $\mathbb R^N$ can be restricted to $T_a\mathcal M$ to get a natural scalar product in $T_a\mathcal M$.$^{27}$ This justifies the next definition. We wish to point out that the Gram matrix $G\big(\frac{\partial}{\partial y_1},\dots,\frac{\partial}{\partial y_M}\big)$ consists of scalar products of vectors $\frac{\partial}{\partial y_i}$, $\frac{\partial}{\partial y_j}$ in $\mathbb R^N$ (the differential structure of $\mathcal M$ is inherited from $\mathbb R^N$).
Definition 4.3.81. Let $\mathcal M$ be a differentiable manifold with an atlas $(\mathcal V_n,\varphi_n)_{n\in\mathbb N}$ and let
$$U_n = P\varphi_n(\mathcal V_n),\qquad \psi_n = (P\varphi_n|_{\mathcal V_n})^{-1}.$$
Let $\{\chi_n\}_{n\in\mathbb N}$ be a partition of unity on $\mathcal M$ subordinate to $\{\mathcal V_n\}_{n\in\mathbb N}$. If $f$ is a continuous function on $\mathcal M$, then we define
$$\int_{\mathcal M}f\,dV := \sum_{n=1}^\infty\int_{U_n}(\chi_nf)(\psi_n(y))\Big[\det G\Big(\frac{\partial}{\partial y_1},\dots,\frac{\partial}{\partial y_M}\Big)\Big]^{\frac12}dy_1\dots dy_M \tag{4.3.42}$$
provided the right-hand side exists and the sum is absolutely convergent.
Remark 4.3.82. It is possible to show that this definition depends neither on the partition of unity nor on the choice of an atlas. The right-hand side in (4.3.42) exists whenever $f$ has compact support or, in particular, if $\mathcal M$ is a compact manifold.
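Example 4.3.83. Compute the surface area $V(S^2)$ of the unit sphere $S^2$ in $\mathbb R^3$.
It is obvious that the two-dimensional surface area of the Greenwich meridian $G$ is zero. The rest $S^2\setminus G$ is covered by one chart with
$$\psi(\varphi,\vartheta) = (\cos\varphi\cos\vartheta,\ \sin\varphi\cos\vartheta,\ \sin\vartheta),\qquad \varphi\in(0,2\pi),\ \vartheta\in\Big(-\frac\pi2,\frac\pi2\Big).$$
Since $n = 1$, $U_1 = (0,2\pi)\times\big(-\frac\pi2,\frac\pi2\big)$, $\chi_1 = 1$ and
$$\frac{\partial\psi}{\partial\varphi} = (-\sin\varphi\cos\vartheta,\ \cos\varphi\cos\vartheta,\ 0),\qquad \frac{\partial\psi}{\partial\vartheta} = (-\cos\varphi\sin\vartheta,\ -\sin\varphi\sin\vartheta,\ \cos\vartheta),$$
so that $\det G\big(\frac{\partial\psi}{\partial\varphi},\frac{\partial\psi}{\partial\vartheta}\big) = \cos^2\vartheta$, we have
$$V(S^2) = \int_{S^2}1\,dS = \int_{(0,2\pi)\times(-\frac\pi2,\frac\pi2)}\cos\vartheta\,d\varphi\,d\vartheta = 4\pi.$$
(It is more common to denote the integration symbol in the two-dimensional case by $dS$ instead of $dV$.)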
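The computation in Example 4.3.83 can be reproduced numerically. The Python sketch below is our own illustration (the name `sphere_area` and the midpoint rule on a uniform grid are assumptions); it evaluates $[\det G]^{1/2}$ from the two tangent vectors at each grid point instead of using the closed form $\cos\vartheta$, so it also checks that closed form.

```python
import math


def sphere_area(n_phi=200, n_theta=200):
    """Approximate (4.3.42) with f = 1 for the single chart of Example 4.3.83:
    integrate sqrt(det G) over (0, 2*pi) x (-pi/2, pi/2) by the midpoint rule."""
    dphi = 2 * math.pi / n_phi
    dtheta = math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * dphi
        for j in range(n_theta):
            theta = -math.pi / 2 + (j + 0.5) * dtheta
            # partial derivatives of psi(phi, theta)
            a = (-math.sin(phi) * math.cos(theta), math.cos(phi) * math.cos(theta), 0.0)
            b = (-math.cos(phi) * math.sin(theta), -math.sin(phi) * math.sin(theta),
                 math.cos(theta))
            g11 = sum(x * x for x in a)
            g12 = sum(x * y for x, y in zip(a, b))
            g22 = sum(y * y for y in b)
            total += math.sqrt(max(g11 * g22 - g12 * g12, 0.0)) * dphi * dtheta
    return total


print(sphere_area(), 4 * math.pi)  # close to each other
```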
It follows from Definition 4.3.77 (see also Exercise 4.3.100) that the integral of a one-form $\omega$ along a curve $\gamma$ depends on the orientation of $\gamma$. Namely, if
$$\tilde\gamma(t) = \gamma(1-t),\qquad t\in(0,1),$$
then
$$\tilde\gamma'(t) = -\gamma'(1-t),$$
and hence
$$\int_{\tilde\gamma}\omega = -\int_\gamma\omega.$$
$^{27}$This scalar product leads to uniformly distributed mass or currents in physical applications, but it is sometimes unrealistic. To cover further applications in a more realistic manner we can consider different scalar products at different points of a manifold. Since any positive definite symmetric bilinear form on $\mathbb R^M\times\mathbb R^M$ determines a scalar product in $\mathbb R^M$, we can introduce a metric structure on a manifold $\mathcal M$ by a (smooth) mapping $g: x\in\mathcal M\mapsto S^2_+(T_x\mathcal M)$ (positive definite symmetric bilinear forms on $T_x\mathcal M$). Such $g$ is called a Riemann metric on $\mathcal M$.

This dependence on an orientation is crucial for the generalization of the curve integral to
an integral of a dierential form over a manifold. What can the orientation of a manifold
be? Let us start with simple examples like R, R2 , R3 . It is the common understanding that
the standard (equivalently positive) orientation on R is from the left to the right, in
R2 anticlockwise and by the right-thumb rule in R3 . These slightly vague formulations
can be made precise by taking xed bases in R, R2 , R3 , e.g., the standard bases. Then
all bases are divided into two disjoint classes according to the sign of the determinant of
the transformation matrix T which sends the xed (e.g., standard) basis into a new one.
We say that e1 , . . . , eM is a positive basis if det T > 0. We want to remind the reader
that
det T = f 1 f M (
e1 , . . . , eM )
if
T ei = ei , i = 1, . . . , M,
and f 1 , . . . , f M is the dual basis to e1 , . . . , eM (Example 4.3.52(i)). This indicates that
the choice of a xed nowhere-vanishing continuous M -form (i.e., (x)
= 0 for all
x M ) on the M -dimensional manifold M makes it possible to introduce an orientation
on M . If (V, ) is a local chart at a point x M , then the basis y 1 , . . . , yM of Tx M
is said to be a positive basis of Tx M provided



(x)
,...,
> 0.
y1
yM
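It can be proved that a continuous non-vanishing form $\omega$ exists on $\mathcal{M}$ if and only if there is an atlas $(\mathcal{V}_n, \varphi_n)$ of $\mathcal{M}$ such that for any two charts $(\mathcal{V}, \varphi)$, $(\tilde{\mathcal{V}}, \tilde\varphi)$ of this atlas the transformation matrix of the change of coordinates (see (4.3.17)) has a positive determinant at all points $y$ corresponding to $\mathcal{V} \cap \tilde{\mathcal{V}}$ (provided $\mathcal{V} \cap \tilde{\mathcal{V}} \neq \emptyset$).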
Definition 4.3.84. A differentiable manifold $\mathcal{M}$ of dimension $M$ is said to be orientable if there exists a continuous nowhere-vanishing $M$-form $\omega$ on $\mathcal{M}$. If such a form $\omega$ is fixed, then $(\mathcal{M}, \omega)$ is called an oriented manifold.
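Example 4.3.85.
(i) Suppose that $\mathcal{M}$ is a two-dimensional orientable connected manifold in $\mathbb{R}^3$ (i.e., a surface) and $\omega$ is a nowhere-vanishing two-form on $\mathcal{M}$. The question is how these orientations of $T_x\mathcal{M}$ cohere with the natural orientation of $\mathbb{R}^3$. To find an answer we choose a point $x \in \mathcal{M}$ and local coordinates at $x$ such that $\frac{\partial}{\partial y_1}, \frac{\partial}{\partial y_2}$ form a positive basis in $T_x\mathcal{M}$. It is obvious that there is a vector $n \in \mathbb{R}^3$ which is perpendicular to $T_x\mathcal{M} \subset \mathbb{R}^3$ and such that
$$n, \frac{\partial}{\partial y_1}, \frac{\partial}{\partial y_2}$$
is a positive basis of $\mathbb{R}^3$. The vector $\frac{n}{\|n\|}$ is called a (unit) outer normal vector to $\mathcal{M}$ at the point $x$. It is easy to prove that
$$\frac{n}{\|n\|} = \frac{\frac{\partial}{\partial y_1} \times \frac{\partial}{\partial y_2}}{\left\|\frac{\partial}{\partial y_1} \times \frac{\partial}{\partial y_2}\right\|}.$$
For the definition of the cross product see Example 4.3.55(ii).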


(ii) The Möbius strip $S \subset \mathbb{R}^3$ is an example of a non-orientable manifold. An argument to prove that can be based on the above consideration. Choose a point $a \in S$, a basis $\frac{\partial}{\partial y_1}, \frac{\partial}{\partial y_2}$ in $T_aS$ and find the outer normal vector $n_a$. Now move the point


Chapter 4. Local Properties of Differentiable Mappings


$a$ together with the basis $n, \frac{\partial}{\partial y_1}, \frac{\partial}{\partial y_2}$ along the whole strip to come back to the initial position. The vector $n_a$ will end at $-n_a$ (see Figure 4.3.19).

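Figure 4.3.19. (The Möbius strip with the transported basis $n_x, \frac{\partial}{\partial y_1}, \frac{\partial}{\partial y_2}$; the normal $n_a$ returns as $-n_a$.)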
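Both parts of Example 4.3.85 can be illustrated numerically. In the Python sketch below (an illustration only; the parametrizations are our own choices, not taken from the text) the normal is computed as the normalized cross product of the coordinate vectors. For the unit sphere with the spherical chart $x = \cos\varphi\cos\vartheta$, $y = \sin\varphi\cos\vartheta$, $z = \sin\vartheta$ it returns the outward radial vector, and $n, \frac{\partial}{\partial\varphi}, \frac{\partial}{\partial\vartheta}$ is a positive basis. For the Möbius strip, assuming the standard parametrization written in the comments, transporting the normal continuously once along the centre line returns it with the opposite sign.

```python
import math


def cross(a, b):
    """Cross product in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def det3(u, v, w):
    """Determinant of the matrix with columns u, v, w, i.e. u . (v x w)."""
    c = cross(v, w)
    return sum(ui * ci for ui, ci in zip(u, c))


def unit_normal(d1, d2):
    """n / |n| with n = d1 x d2, so that n, d1, d2 is a positive basis of R^3."""
    n = cross(d1, d2)
    norm = math.sqrt(sum(c * c for c in n))
    return tuple(c / norm for c in n)


# (i) Unit sphere, chart x = cos(phi)cos(theta), y = sin(phi)cos(theta), z = sin(theta).
def sphere_partials(phi, theta):
    d_phi = (-math.sin(phi) * math.cos(theta), math.cos(phi) * math.cos(theta), 0.0)
    d_theta = (-math.cos(phi) * math.sin(theta),
               -math.sin(phi) * math.sin(theta), math.cos(theta))
    return d_phi, d_theta


d1, d2 = sphere_partials(0.3, 0.7)
n_sphere = unit_normal(d1, d2)  # coincides with the point itself (outward normal)


# (ii) Moebius strip, assumed parametrization
# X(u, v) = ((1 + v/2 cos(u/2)) cos u, (1 + v/2 cos(u/2)) sin u, v/2 sin(u/2)).
def mobius_partials(u, v):
    c, s = math.cos(u / 2), math.sin(u / 2)
    r = 1 + 0.5 * v * c
    d_u = (-0.25 * v * s * math.cos(u) - r * math.sin(u),
           -0.25 * v * s * math.sin(u) + r * math.cos(u),
           0.25 * v * c)
    d_v = (0.5 * c * math.cos(u), 0.5 * c * math.sin(u), 0.5 * s)
    return d_u, d_v


# Transport the normal along the centre line v = 0; u = 0 and u = 2*pi give the same point.
steps = 100
normals = [unit_normal(*mobius_partials(2 * math.pi * k / steps, 0.0))
           for k in range(steps + 1)]
print(sum(a * b for a, b in zip(normals[0], normals[-1])))  # close to -1: the normal flipped
```

Consecutive normals along the path stay almost parallel, so the sign change is not a jump in the computation but a genuine feature of the strip.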
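Definition 4.3.86. Let $(\mathcal{M}, \omega)$ be an oriented $M$-dimensional differentiable manifold. Let $(\mathcal{V}_n, \varphi_n)_{n\in\mathbb{N}}$ be an atlas of $\mathcal{M}$ for which the coordinate vectors $\frac{\partial}{\partial y_1}, \dots, \frac{\partial}{\partial y_M}$ form a positive basis of $T_x\mathcal{M}$ for all $x \in \mathcal{V}_n$ and all $n \in \mathbb{N}$. Let $\{\psi_n\}_{n\in\mathbb{N}}$ be a partition of unity subordinate to this atlas. If $\eta$ is a continuous $M$-form with the local representation
$$\eta(x) = f_n(x)\, dy^1 \wedge \dots \wedge dy^M, \qquad x \in \mathcal{V}_n,$$
then we define the integral of $\eta$ over $\mathcal{M}$ as
$$\int_{\mathcal{M}} \eta = \sum_{n=1}^\infty \int_{\mathcal{M}} \psi_n\eta = \sum_{n=1}^\infty \int_{U_n} \varphi_n^*(\psi_n\eta)(y) = \sum_{n=1}^\infty \int_{U_n} (\psi_n f_n)(\varphi_n(y))\, dy_1 \dots dy_M \tag{4.3.43}$$
provided the right-hand side exists and the sum is absolutely convergent.$^{28}$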
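Remark 4.3.87.
(i) If a form $\eta$ has compact support, in particular, if $\mathcal{M}$ is a compact set in $\mathbb{R}^N$, then the integral $\int_{\mathcal{M}} \eta$ exists.

$^{28}$ For a continuous $M$-form $\eta(y) = g(y)\, df^1 \wedge \dots \wedge df^M$ on a measurable set $U \subset \mathbb{R}^M$ (cf. Remark 4.3.54(ii)) we define
$$\int_U g(y)\, df^1 \wedge \dots \wedge df^M = \int_U g(y)\, dy_1 \dots dy_M.$$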


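(ii) Definition 4.3.86 does not depend on the concrete choice of an atlas and of a partition of unity. If the coordinate vectors $\frac{\partial}{\partial y_1}, \dots, \frac{\partial}{\partial y_M}$ determine a negative basis in $T_x\mathcal{M}$, then we change the order, e.g., to
$$\frac{\partial}{\partial y_2}, \frac{\partial}{\partial y_1}, \frac{\partial}{\partial y_3}, \dots, \frac{\partial}{\partial y_M},$$
to get a positive basis. Notice that
$$\eta(x) = f(x)\, dy^1 \wedge \dots \wedge dy^M = -f(x)\, dy^2 \wedge dy^1 \wedge dy^3 \wedge \dots \wedge dy^M.$$
(iii) Definition 4.3.86 is also independent of a transformation of coordinates in the following sense:
Let $g$ be a diffeomorphism of an oriented manifold $\mathcal{M}$ onto a manifold $\widetilde{\mathcal{M}}$.$^{29}$ Then $g$ induces an orientation on $\widetilde{\mathcal{M}}$$^{30}$ and with respect to this orientation the equality
$$\int_{\widetilde{\mathcal{M}}} \eta = \int_{\mathcal{M}} g^*\eta \tag{4.3.44}$$
holds for any continuous $M$-form $\eta$ on $\widetilde{\mathcal{M}}$.
(iv) If a curve $\gamma$ is itself a differentiable manifold (Remark 4.3.80) with the orientation induced from $\mathbb{R}$, then $\int_\gamma \eta$ defined by (4.3.39) is the same as in (4.3.43).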

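(v) If $\mathcal{M}$ is an oriented manifold and $V$ is the $M$-form given in local coordinates by
$$V = \left(\det G\left(\frac{\partial}{\partial y_1}, \dots, \frac{\partial}{\partial y_M}\right)\right)^{\frac12} dy^1 \wedge \dots \wedge dy^M,$$
where $\frac{\partial}{\partial y_1}, \dots, \frac{\partial}{\partial y_M}$ is a positive basis, then the volume $V(\mathcal{M}) := \int_{\mathcal{M}} dV$ (see (4.3.42)) is given by
$$V(\mathcal{M}) = \int_{\mathcal{M}} V.$$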
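As a numerical sanity check of the volume formula, here is a Python sketch, assuming that $G$ is the Gram matrix of the coordinate vectors and using the spherical chart $x = \cos\varphi\cos\vartheta$, $y = \sin\varphi\cos\vartheta$, $z = \sin\vartheta$ on the unit sphere (for which $\sqrt{\det G} = \cos\vartheta$). A midpoint rule for $\int \sqrt{\det G}\, d\varphi\, d\vartheta$ should reproduce the area $4\pi$ of the whole sphere and the area $2\pi(1 - \tfrac12) = \pi$ of the cap $z > \tfrac12$ (the set used in Example 4.3.88). The function names are illustrative.

```python
import math


def sphere_partials(phi, theta):
    """Coordinate vectors d/dphi, d/dtheta of the spherical chart."""
    d_phi = (-math.sin(phi) * math.cos(theta), math.cos(phi) * math.cos(theta), 0.0)
    d_theta = (-math.cos(phi) * math.sin(theta),
               -math.sin(phi) * math.sin(theta), math.cos(theta))
    return d_phi, d_theta


def gram_det(d1, d2):
    """Determinant of the 2x2 Gram matrix G(d1, d2)."""
    g11 = sum(a * a for a in d1)
    g12 = sum(a * b for a, b in zip(d1, d2))
    g22 = sum(b * b for b in d2)
    return g11 * g22 - g12 * g12


def area(theta_min, theta_max, n=400):
    """Midpoint rule for the integral of sqrt(det G) over (-pi, pi) x (theta_min, theta_max)."""
    h_phi = 2 * math.pi / n
    h_theta = (theta_max - theta_min) / n
    total = 0.0
    for i in range(n):
        phi = -math.pi + (i + 0.5) * h_phi
        for j in range(n):
            theta = theta_min + (j + 0.5) * h_theta
            total += math.sqrt(gram_det(*sphere_partials(phi, theta)))
    return total * h_phi * h_theta


print(area(math.pi / 6, math.pi / 2) / math.pi)          # close to 1
print(area(-math.pi / 2, math.pi / 2) / (4 * math.pi))   # close to 1
```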

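Example 4.3.88. Let $\mathcal{M}$ be the part of the hemisphere given by
$$\mathcal{M} = \left\{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1,\ z > \tfrac12\right\}$$
and
$$\eta(x, y, z) = y^2\, dx \wedge dy + yz\, dx \wedge dz + x^2\, dy \wedge dz.$$
We choose the spherical coordinates $\Phi(\varphi, \vartheta) = (x, y, z)$:
$$x = \cos\varphi\cos\vartheta, \qquad y = \sin\varphi\cos\vartheta, \qquad z = \sin\vartheta, \qquad (\varphi, \vartheta) \in U := (-\pi, \pi) \times \left(\tfrac{\pi}{6}, \tfrac{\pi}{2}\right),$$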

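$^{29}$ We have not defined this notion yet, but it is almost evident how to generalize the well-known case $\mathbb{R}^M \to \mathbb{R}^M$. One has to overcome certain difficulties which are caused by the local definition of a manifold and the global notion of diffeomorphism.
$^{30}$ Let $(\mathcal{V}_n, \varphi_n)_{n\in\mathbb{N}}$ be an atlas of $\mathcal{M}$ such that the coordinate vectors $\frac{\partial}{\partial y_1}, \dots, \frac{\partial}{\partial y_M}$ form a positive basis of $T_x\mathcal{M}$ for $x \in \mathcal{V}_n$. Then $g'(x)\frac{\partial}{\partial y_1}, \dots, g'(x)\frac{\partial}{\partial y_M}$ determine a positive basis of $T_{g(x)}\widetilde{\mathcal{M}}$.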

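and the orientation such that $\frac{\partial}{\partial\varphi}, \frac{\partial}{\partial\vartheta}$ is a positive basis. We wish to compute the integral $\int_{\mathcal{M}} \eta$. The curve $\gamma = \left\{(-\cos\vartheta, 0, \sin\vartheta) : \vartheta \in \left(\frac{\pi}{6}, \frac{\pi}{2}\right)\right\}$ has two-dimensional surface measure equal to zero, and therefore
$$\int_{\mathcal{M}} \eta = \int_U \Phi^*\eta.$$
We have (see Example 4.3.55(ii) or (4.3.33) and (4.3.19))
$$\Phi^*(dx \wedge dy) = \sin\vartheta\cos\vartheta\, d\varphi \wedge d\vartheta, \qquad \Phi^*(dy \wedge dz) = \cos\varphi\cos^2\vartheta\, d\varphi \wedge d\vartheta, \qquad \Phi^*(dz \wedge dx) = \sin\varphi\cos^2\vartheta\, d\varphi \wedge d\vartheta,$$
i.e. (since $dx \wedge dz = -dz \wedge dx$, the contributions of $y^2\, dx \wedge dy$ and $yz\, dx \wedge dz$ cancel),
$$\Phi^*\eta = \cos^3\varphi\cos^4\vartheta\, d\varphi \wedge d\vartheta.$$
An easy computation gives
$$\int_{\mathcal{M}} \eta = \int_{(\varphi, \vartheta) \in (-\pi, \pi) \times (\frac{\pi}{6}, \frac{\pi}{2})} \cos^3\varphi\cos^4\vartheta\, d\varphi\, d\vartheta = 0,$$
since $\int_{-\pi}^{\pi} \cos^3\varphi\, d\varphi = 0$.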
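The computation in Example 4.3.88 can be double-checked numerically. The Python sketch below (names are illustrative) forms the coefficient of $d\varphi \wedge d\vartheta$ in $\Phi^*\eta$ directly from the partial derivatives of the chart, without using the simplification above, compares it with $\cos^3\varphi\cos^4\vartheta$ at a few points, and approximates $\int_U \Phi^*\eta$ by a midpoint rule; the result should vanish up to rounding.

```python
import math


def chart(phi, theta):
    """Point and coordinate vectors d/dphi, d/dtheta of the spherical chart Phi."""
    x = math.cos(phi) * math.cos(theta)
    y = math.sin(phi) * math.cos(theta)
    z = math.sin(theta)
    d_phi = (-y, x, 0.0)
    d_theta = (-math.cos(phi) * math.sin(theta),
               -math.sin(phi) * math.sin(theta),
               math.cos(theta))
    return (x, y, z), d_phi, d_theta


def pullback_coefficient(phi, theta):
    """Coefficient of dphi ^ dtheta in the pull-back of
    eta = y^2 dx^dy + yz dx^dz + x^2 dy^dz."""
    (x, y, z), a, b = chart(phi, theta)
    # The pull-back of du ^ dv is (u_phi v_theta - u_theta v_phi) dphi ^ dtheta.
    dx_dy = a[0] * b[1] - a[1] * b[0]
    dx_dz = a[0] * b[2] - a[2] * b[0]
    dy_dz = a[1] * b[2] - a[2] * b[1]
    return y * y * dx_dy + y * z * dx_dz + x * x * dy_dz


def integral(n=200):
    """Midpoint rule over U = (-pi, pi) x (pi/6, pi/2)."""
    h_phi = 2 * math.pi / n
    h_theta = (math.pi / 2 - math.pi / 6) / n
    total = 0.0
    for i in range(n):
        phi = -math.pi + (i + 0.5) * h_phi
        for j in range(n):
            theta = math.pi / 6 + (j + 0.5) * h_theta
            total += pullback_coefficient(phi, theta)
    return total * h_phi * h_theta


print(abs(integral()) < 1e-9)   # True
```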


It is evident that the computation of $\int_{\mathcal{M}} \eta$ need not be an easy task when several charts have to be used to cover the support of $\eta$. The main reason is that a partition of unity must be constructed and this is technically difficult. Because of that, we would like to have such useful tools as the Fubini Theorem and the Fundamental Theorem of Calculus. The latter theorem, i.e.,
$$\int_a^b f'(x)\, dx = f(b) - f(a),$$
can be interpreted in the manifold language as follows. The closed interval $[a, b]$ is a manifold $\mathcal{M}$ positively oriented from the left to the right. The one-form $f'(x)\, dx$ is the differential of the zero-form (i.e., the function $f$), and the (oriented) boundary $\partial\mathcal{M}$ of $\mathcal{M}$ consists of the points $b$, $a$ (in this order). The Fundamental Theorem of Calculus reduces the integral of the form $df(x) = f'(x)\, dx$ over $\mathcal{M}$ to an integral of $f$ over $\partial\mathcal{M}$. This observation is essential for the generalization to manifolds with boundaries. To do that we have to define the boundary $\partial\mathcal{M}$ of $\mathcal{M}$ first, and then to show how this boundary inherits the orientation of $\mathcal{M}$.
Definition 4.3.89. Let $\mathcal{N}$ be an $M$-dimensional differentiable manifold in $\mathbb{R}^N$. A closed subset $\mathcal{M}$ of $\mathcal{N}$ is said to be an $M$-dimensional differentiable manifold with boundary$^{31}$ if
$$\overline{\operatorname{int}\mathcal{M}} = \mathcal{M}$$
(the interior and closure are taken in the topology of $\mathcal{N}$) and for any point $x \in \mathcal{M}$ there is a chart $(\mathcal{V}, \varphi)$ of an atlas of $\mathcal{N}$ such that either

$^{31}$ This boundary can be an empty set (see, e.g., Remark 4.3.90(i)). If this boundary is nonempty, then the manifold $\mathcal{M}$ is not a differentiable manifold in the sense of Definition 4.3.4. See also Remark 4.3.90(ii).


(i) $\mathcal{V} \subset \mathcal{M}$
or
(ii) $P(x) = (0, y_2, \dots, y_M)$ and $P(\mathcal{V} \cap \mathcal{M}) = \{(y_1, \dots, y_M) \in \mathbb{R}^M : y_1 \le 0\} \cap P(\mathcal{V})$.
A point $x$ is called an interior point of $\mathcal{M}$ in case (i). If $x$ is not an interior point, then $x$ is called a boundary point of $\mathcal{M}$. The collection of all boundary points is called the boundary of $\mathcal{M}$ and is denoted by $\partial\mathcal{M}$ (see Figure 4.3.20).

Figure 4.3.20. (A chart $(\mathcal{V}, \varphi)$ at a boundary point $x$ of $\mathcal{M} \subset \mathbb{R}^N$; the image $P(\mathcal{V})$ is split by the hyperplane $\{(0, y_2, \dots, y_M)\}$ into $\{y_1 < 0\}$ and $\{y_1 > 0\}$.)

Remark 4.3.90.
(i) A manifold can have empty boundary. The sphere $S^2$ in $\mathbb{R}^3$ is an example of this fact. Such a manifold is also called a manifold without boundary.
(ii) If $\mathcal{M}$ is a manifold with nonempty boundary $\partial\mathcal{M}$, then the boundary $\partial\mathcal{M}$ is itself a differentiable manifold of dimension $M - 1$ in $\mathbb{R}^N$. An atlas is given by the restriction of the original atlas. We notice that $\partial\mathcal{M}$ has empty boundary, i.e., $\partial(\partial\mathcal{M}) = \emptyset$.
(iii) Let $\mathcal{M}$ be a manifold with boundary. The tangent space $T_a\mathcal{M}$ for an interior point $a \in \mathcal{M}$ is defined as in Appendix 4.3A and $T_a\mathcal{M} = T_a\mathcal{N}$. If $a \in \partial\mathcal{M}$, then we take all smooth curves in $\mathbb{R}^M_-$ (where $\mathbb{R}^M_- = \{(y_1, \dots, y_M) \in \mathbb{R}^M : y_1 < 0\}$) through the point $b = P(a)$ and transfer them into $\mathbb{R}^N$ by applying $\varphi = P^{-1}$.



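The tangent vectors at the point $a$ of these transferred curves form the tangent space $T_a\mathcal{M}$. If $\gamma : (-1, 1) \to \mathbb{R}^M$ is a smooth curve such that
$$\gamma(0) = b, \qquad \gamma(t) \in \mathbb{R}^M_- \ \text{for } t < 0, \qquad \gamma(t) \in \mathbb{R}^M_+ \ \text{for } t > 0,$$
where $\mathbb{R}^M_+ = \{(y_1, \dots, y_M) \in \mathbb{R}^M : y_1 > 0\}$, then
$$\varphi'(b)[\dot\gamma(0)]$$
is the so-called outer vector to $\mathcal{M}$. The outer normal $n$ to $\mathcal{M}$ at the point $a \in \partial\mathcal{M}$ is the unit outer vector which is perpendicular in $\mathbb{R}^N$ to $T_a(\partial\mathcal{M})$ (see Figure 4.3.21).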

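Figure 4.3.21. (The outer normal $n$ at $a \in \partial\mathcal{M}$, the affine tangent space $\{a\} + T_a(\partial\mathcal{M})$, and a curve $\gamma$ through $P(a)$ passing from $\mathbb{R}^M_-$ into $\mathbb{R}^M_+$.)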
(iv) If $\mathcal{M} \subset \mathcal{N}$ is a differentiable manifold with boundary and $(\mathcal{N}, \omega)$ is an oriented manifold, then $\omega$ induces an orientation on $\partial\mathcal{M}$ as follows. Choose $a \in \partial\mathcal{M}$ and
$$v^1 := \frac{d}{dt}\varphi(t, b_2, \dots, b_M)\Big|_{t=0}$$
(a special case of an outer vector), and define
$$\tilde\omega(a)(v^2, \dots, v^M) = \omega(a)(v^1, v^2, \dots, v^M).$$
The form $\tilde\omega$ defines the induced orientation on $\partial\mathcal{M}$. In other words: $v^2, \dots, v^M$ is a positive basis of $T_a(\partial\mathcal{M})$ provided $v^1, \dots, v^M$ is a positive basis of $T_a\mathcal{N}$.
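Example 4.3.91.
(i) Let $\mathcal{M}$ be the closed ring
$$\{(x, y) \in \mathbb{R}^2 : 1 \le x^2 + y^2 \le 4\}.$$
Then $\mathcal{M} \subset \mathbb{R}^2$ is a two-dimensional manifold with boundary
$$\partial\mathcal{M} = S^1_1 \cup S^1_2$$
($S^1_r$ denotes the circle with radius $r$ and center at the origin). As the local coordinates we take the polar coordinates $(r, \vartheta)$. The (standard) orientation on $\mathcal{M}$ is given either by the standard Euclidean basis $e_1 = (1, 0)$, $e_2 = (0, 1)$ in $T_a\mathcal{M} = \mathbb{R}^2$ or by the form
$$\omega(a) = f^1 \wedge f^2 = dx \wedge dy = r(a)\, dr \wedge d\vartheta.$$
The special outer vector $v^1$ mentioned in Remark 4.3.90(iv) is
$$v^1 = \begin{cases} -\dfrac{\partial}{\partial r} & \text{at the point } a_1 = (1, \vartheta_1), \\[1ex] \dfrac{\partial}{\partial r} & \text{at the point } a_2 = (2, \vartheta_2). \end{cases}$$
The completion to a positive basis in $T_a\mathbb{R}^2$ is shown in Figures 4.3.22 and 4.3.23.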

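Figure 4.3.22. (The ring $\mathcal{M}$ with boundary circles $S^1_1$, $S^1_2$ and the vectors $v^1$, $v^2$ at the points $a_1$ and $a_2$.)
Figure 4.3.23. (The same vectors in polar coordinates: $v^1 = -\frac{\partial}{\partial r}$, $v^2 = -\frac{\partial}{\partial\vartheta}$ at $a_1$; $v^1 = \frac{\partial}{\partial r}$, $v^2 = \frac{\partial}{\partial\vartheta}$ at $a_2$.)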

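The form $\tilde\omega$ on $S^1_1$ is given (in the polar coordinates $r, \vartheta$) by
$$\tilde\omega(a)\left(\frac{\partial}{\partial\vartheta}\right) = r(a)\, dr \wedge d\vartheta\left(-\frac{\partial}{\partial r}, \frac{\partial}{\partial\vartheta}\right) = -r(a),$$
i.e., the induced orientation of the inner circle $S^1_1$ is clockwise.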
(ii) Let $B$ be the closed unit ball in $\mathbb{R}^3$. Then $B$ is a three-dimensional manifold (included in $\mathbb{R}^3$) with boundary $\partial B = S^2$ (the two-dimensional sphere). The standard orientation in $\mathbb{R}^3 = T_aB$ gives the orientation in $\operatorname{int} B$. The induced orientation on $\partial B$ is obtained by Remark 4.3.90(iv). Namely, we take a normal vector $n = \frac{\partial}{\partial r}$ at a point $a \in \partial B$ and independent vectors $v^2, v^3 \in T_a(\partial B)$ in the order given by the right-thumb rule for $(n, v^2, v^3)$ (see Figure 4.3.24).$^{32}$ Notice that the orientation on $B$ is given, e.g., by
$$\omega = dx \wedge dy \wedge dz = (r^2\cos\vartheta)\, dr \wedge d\varphi \wedge d\vartheta$$
in the spherical coordinates. Similarly,
$$\tilde\omega(v^2, v^3) = \omega\left(\frac{\partial}{\partial r}, v^2, v^3\right).$$

The next theorem is a basic result on differential forms and it is the promised generalization of the Fundamental Theorem of Calculus.
Theorem 4.3.92 (Stokes, abstract version). Let $\mathcal{M}$ be an $M$-dimensional oriented manifold with boundary $\partial\mathcal{M}$. Let $\omega$ be a smooth $(M-1)$-differential form on $\mathcal{M}$ with compact support. Then
$$\int_{\mathcal{M}} d\omega = \int_{\partial\mathcal{M}} i^*\omega$$
provided $\partial\mathcal{M}$ has the induced orientation and $i : \partial\mathcal{M} \to \mathcal{M}$ is the canonical injection.$^{33}$

$^{32}$ More precisely, for $n := v^2 \times v^3$ (the cross product is defined in Example 4.3.55(ii)) the vectors $n, v^2, v^3$ form a positive basis in $\mathbb{R}^3$ and $n$ is perpendicular to $v^2, v^3$.
$^{33}$ Here $\int_{\mathcal{M}} d\omega$ is defined as in Definition 4.3.86, where $U_n$ need not be an open set; see footnote 28 on page 216.

Figure 4.3.24. (The normal $n$ at a point $a \in S^2$ and the vectors $v^2, v^3 \in T_aS^2$.)

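Proof. Let $(\mathcal{V}_n, \varphi_n)_{n\in\mathbb{N}}$ be an atlas of $\mathcal{M}$ such that the coordinate vectors $\frac{\partial}{\partial y_1}, \dots, \frac{\partial}{\partial y_M}$ form positive bases in the corresponding tangent spaces. Let $\{\psi_n\}_{n\in\mathbb{N}}$ be a partition of unity subordinate to $\{\mathcal{V}_n\}_{n\in\mathbb{N}}$. Since $\omega$ has compact support, only finitely many $\psi_j$, say $\psi_1, \dots, \psi_k$, do not vanish identically on $\operatorname{supp}\omega$, and we have
$$\int_{\mathcal{M}} d\omega = \sum_{j=1}^k \int_{\mathcal{M}} d(\psi_j\omega), \qquad \int_{\partial\mathcal{M}} i^*\omega = \sum_{j=1}^k \int_{\partial\mathcal{M}} i^*(\psi_j\omega), \qquad \text{where } \operatorname{supp}(\psi_j\omega) \subset \mathcal{V}_j.$$
Therefore, it is sufficient to prove the Stokes Theorem only for the case when $\operatorname{supp}\omega$ is contained in one coordinate neighborhood, say $\mathcal{V}$. Suppose that $\omega$ has the representation
$$\omega(x) = \sum_{i=1}^M (-1)^{i-1} f_i(y)\, dy^1 \wedge \dots \wedge \widehat{dy^i} \wedge \dots \wedge dy^M, \qquad x = \varphi(y) \in \mathcal{V},$$
where the hat denotes that the term $dy^i$ is missing. Then
$$d\omega(x) = \sum_{i=1}^M \frac{\partial f_i}{\partial y_i}(y)\, dy^1 \wedge \dots \wedge dy^M.$$
There are two cases for $\mathcal{V}$:
Case 1 ($\mathcal{V} \cap \partial\mathcal{M} = \emptyset$). Then $\int_{\partial\mathcal{M}} i^*\omega = 0$ and
$$\int_{\mathcal{M}} d\omega = \sum_{i=1}^M \int_U \frac{\partial f_i}{\partial y_i}(y)\, dy_1 \dots dy_M = 0$$
by the Fubini Theorem, since $f_i = 0$ outside of a compact subset of $U$.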


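Case 2 ($\mathcal{V} \cap \partial\mathcal{M} \neq \emptyset$). According to the definition of the boundary we can assume that $\mathcal{V}$ and the corresponding parameter set $U$ have the form as in Figure 4.3.20, i.e., $U \subset \{y_1 \le 0\}$. Then
$$i^*\omega(x) = f_1(y)\, dy^2 \wedge \dots \wedge dy^M \qquad \text{for} \qquad x = \varphi(y) \in \mathcal{V} \cap \partial\mathcal{M}$$
and
$$\int_{\partial\mathcal{M}} i^*\omega = \int_{\partial\mathcal{M}} f_1(y)\, dy^2 \wedge \dots \wedge dy^M = \int_{U \cap \mathbb{R}^{M-1}} f_1(0, y_2, \dots, y_M)\, dy_2 \dots dy_M,$$
where $\mathbb{R}^{M-1}$ is identified with the hyperplane $\{y_1 = 0\}$. On the other hand,
$$\int_{\mathcal{M}} d\omega = \sum_{i=1}^M \int_U \frac{\partial f_i}{\partial y_i}\, dy_1 \dots dy_M = \int_{U \cap \mathbb{R}^{M-1}} \left(\int_{-\infty}^0 \frac{\partial f_1}{\partial y_1}(y)\, dy_1\right) dy_2 \dots dy_M = \int_{U \cap \mathbb{R}^{M-1}} f_1(0, y_2, \dots, y_M)\, dy_2 \dots dy_M,$$
since the integrals for $i = 2, \dots, M$ vanish because of the compact support of the restriction of $f_i$ to $U \cap \mathbb{R}^{M-1}$.
This completes the proof.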

Several special cases of the Stokes Theorem are worth mentioning. We say that a curve $\gamma : [a, b] \to \mathbb{R}^N$ is simple if
$$\gamma(t_1) = \gamma(t_2), \quad t_1 < t_2, \qquad \text{implies} \qquad a = t_1, \quad b = t_2.$$

Corollary 4.3.93 (Green). Let $\Omega$ be a bounded open subset of $\mathbb{R}^2$ the boundary of which is the image of a simple closed smooth curve $\gamma$ which is oriented so that $\Omega$ is on the left-hand side when we move along the curve. Let $F = (f, g) : \mathbb{R}^2 \to \mathbb{R}^2$ be a $C^1$-mapping in a neighborhood of the closure $\overline\Omega$. Then
$$\int_\Omega \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right) dx\, dy = \int_\gamma (f\, dx + g\, dy). \tag{4.3.45}$$
(See Definition 4.3.77 for the integral on the right-hand side.)
Proof. Notice that $\overline\Omega$ is an oriented manifold (the positive orientation of $\mathbb{R}^2$) with boundary (smoothness of $\gamma$) and the above mentioned orientation of $\gamma$ agrees with the induced orientation on $\partial\Omega$. Put
$$\omega = f\, dx + g\, dy$$
and use the Stokes Theorem.


Remark 4.3.94. For $F(x, y) = \frac12(-y, x)$ the formula (4.3.45) gives
$$V(\Omega) = \frac12 \int_\gamma (-y\, dx + x\, dy),$$
which can be used for computation of the area of planar sets.
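The formula of Remark 4.3.94 in a concrete case: the Python sketch below (an illustration; the semi-axes are values we chose) evaluates $\frac12\int_\gamma(-y\, dx + x\, dy)$ by the midpoint rule for the ellipse $\gamma(t) = (A\cos t, B\sin t)$, $t \in [0, 2\pi]$, traversed anticlockwise so that the interior lies on the left, and should reproduce the area $\pi AB$.

```python
import math


def area_by_green(curve, velocity, n=1000):
    """(1/2) * integral over gamma of (-y dx + x dy), for a closed curve
    parametrized on [0, 2*pi] and traversed anticlockwise (Omega on the left)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = curve(t)
        dx, dy = velocity(t)
        total += -y * dx + x * dy
    return 0.5 * total * h


A, B = 3.0, 2.0  # semi-axes of the ellipse (illustrative values)


def ellipse(t):
    return A * math.cos(t), B * math.sin(t)


def ellipse_velocity(t):
    return -A * math.sin(t), B * math.cos(t)


print(area_by_green(ellipse, ellipse_velocity) / (math.pi * A * B))  # close to 1
```

Reversing the direction of the parametrization changes the sign of the result, in line with the orientation requirement in Corollary 4.3.93.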


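Corollary 4.3.95 (Gauss–Ostrogradski). Let $\Omega$ be a bounded open subset of $\mathbb{R}^3$ such that $\overline\Omega$ is a differentiable manifold with boundary $\partial\Omega$. Assume that $F = (f^1, f^2, f^3) : \mathbb{R}^3 \to \mathbb{R}^3$ is a smooth mapping in a neighborhood of $\overline\Omega$. Then
$$\int_\Omega \sum_{i=1}^3 \frac{\partial f^i}{\partial x_i}\, dx_1\, dx_2\, dx_3 = \int_{\partial\Omega} (F, n)\, dS, \tag{4.3.46}$$
where $n$ is the unit outer normal vector to $\partial\Omega$ and $(F, n)$ is the scalar product in $\mathbb{R}^3$. The integral on the right-hand side is defined in Definition 4.3.81.
Proof. Put
$$\omega = f^1\, dy \wedge dz + f^2\, dz \wedge dx + f^3\, dx \wedge dy$$
and choose local coordinates $(u, v)$ on $\partial\Omega$ such that $n, \frac{\partial}{\partial u}, \frac{\partial}{\partial v}$ forms a positive orthonormal basis in $\mathbb{R}^3$. The pull-back $i^*\omega$ has been computed in Example 4.3.55(ii):
$$i^*\omega = \left(F, \frac{\partial}{\partial u} \times \frac{\partial}{\partial v}\right) du \wedge dv.$$
The cross product $\frac{\partial}{\partial u} \times \frac{\partial}{\partial v}$ is the unit vector which is perpendicular in $\mathbb{R}^3$ to $\frac{\partial}{\partial u}, \frac{\partial}{\partial v}$ and for which $\frac{\partial}{\partial u} \times \frac{\partial}{\partial v}, \frac{\partial}{\partial u}, \frac{\partial}{\partial v}$ is a positive basis, i.e.,
$$\frac{\partial}{\partial u} \times \frac{\partial}{\partial v} = n.$$
So, $\int_{\partial\Omega} i^*\omega = \int_{\partial\Omega} (F, n)\, dS$, and since $d\omega = \left(\sum_{i=1}^3 \frac{\partial f^i}{\partial x_i}\right) dx \wedge dy \wedge dz$, formula (4.3.46) follows from the Stokes Theorem.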
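A numerical sanity check of (4.3.46), assuming for illustration that $\Omega$ is the open unit ball and $F = (x^3, y^3, z^3)$, so that $\operatorname{div} F = 3(x^2 + y^2 + z^2)$ and both sides equal $\frac{12\pi}{5}$. The Python sketch uses the spherical coordinates of Example 4.3.91(ii), with volume element $r^2\cos\vartheta\, dr\, d\varphi\, d\vartheta$ and, on the unit sphere, outer normal $n(p) = p$ and $dS = \cos\vartheta\, d\varphi\, d\vartheta$; all names are ours.

```python
import math


def F(x, y, z):
    """Illustrative vector field F = (x^3, y^3, z^3)."""
    return x ** 3, y ** 3, z ** 3


def div_F(x, y, z):
    return 3 * (x * x + y * y + z * z)


def spherical(r, phi, theta):
    return (r * math.cos(phi) * math.cos(theta),
            r * math.sin(phi) * math.cos(theta),
            r * math.sin(theta))


def ball_integral(g, n=60):
    """Midpoint rule for the integral of g over the unit ball
    (volume element r^2 cos(theta) dr dphi dtheta)."""
    hr, hp, ht = 1.0 / n, 2 * math.pi / n, math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            phi = -math.pi + (j + 0.5) * hp
            for k in range(n):
                theta = -math.pi / 2 + (k + 0.5) * ht
                total += g(*spherical(r, phi, theta)) * r * r * math.cos(theta)
    return total * hr * hp * ht


def sphere_flux(field, n=200):
    """Midpoint rule for the integral of (F, n) dS over the unit sphere,
    where the outer normal at a point p of the sphere is p itself."""
    hp, ht = 2 * math.pi / n, math.pi / n
    total = 0.0
    for j in range(n):
        phi = -math.pi + (j + 0.5) * hp
        for k in range(n):
            theta = -math.pi / 2 + (k + 0.5) * ht
            p = spherical(1.0, phi, theta)
            total += sum(f * c for f, c in zip(field(*p), p)) * math.cos(theta)
    return total * hp * ht


expected = 12 * math.pi / 5  # exact value of both sides for this F
print(ball_integral(div_F), sphere_flux(F), expected)
```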


Corollary 4.3.96 (Stokes, classical version). Let $\mathcal{M}$ be a bounded two-dimensional oriented manifold in $\mathbb{R}^3$ (i.e., a surface) with boundary which is described by a simple smooth curve $\gamma$. Let $F$ be a $C^1$-vector field on $\mathcal{M}$. Then
$$\int_{\mathcal{M}} (\operatorname{curl} F, n)\, dS = \int_\gamma F,$$
where $\gamma$ has the orientation induced from $\mathcal{M}$, the vector $\operatorname{curl} F$ is defined in Example 4.3.58(ii) and $n$ is the unit outer normal vector to $\mathcal{M}$ in $\mathbb{R}^3$ (if $\frac{\partial}{\partial y_1}, \frac{\partial}{\partial y_2}$ is a positive basis at $a \in \mathcal{M}$, then $n$ is perpendicular in $\mathbb{R}^3$ to $T_a\mathcal{M}$ and $n, \frac{\partial}{\partial y_1}, \frac{\partial}{\partial y_2}$ is a positive basis in $\mathbb{R}^3$).
Proof (a hint). Rewrite the abstract Stokes theorem for this special case using the corresponding definitions of integrals and Example 4.3.58(ii).

Remark 4.3.97. Considering $F$ in Corollary 4.3.95 as a velocity field of a fluid flow (Remark 4.3.56), we can interpret the right-hand side in (4.3.46) as the amount of the fluid which flows out of a region $\Omega$ per unit time. In particular, if the divergence of $F$ ($\operatorname{div} F := \sum_{i=1}^3 \frac{\partial f^i}{\partial x_i}$) vanishes everywhere in $\Omega$, then this amount is zero for any subregion of $\Omega$. In other words, the fluid is incompressible in this case.


Remark 4.3.98. Using an infinitesimal ball centered at $a$ in Corollary 4.3.95 it is possible to interpret the value of the function $\operatorname{div} F = \sum_{i=1}^3 \frac{\partial f^i}{\partial x_i}$ at the point $a$ physically.
From the mathematical point of view it is more interesting that this is a starting point for the generalization of basic differential operators to non-flat domains. We now briefly describe this procedure. Let $F$ be a vector field on a manifold $\mathcal{M}$ and let $\omega$ be an $M$-form. Define an $(M-1)$-form $F \lrcorner\, \omega$ by
$$(F \lrcorner\, \omega)(v_1, \dots, v_{M-1}) := \omega(F, v_1, \dots, v_{M-1}).$$
If $(\mathcal{M}, \omega)$ is an oriented manifold, then $d(F \lrcorner\, \omega)$ has to be a multiple of $\omega$:
$$d(F \lrcorner\, \omega) =: (\operatorname{div} F)\, \omega.$$
We strongly recommend that the reader compute $\operatorname{div} F$, e.g., on the sphere $S^2$. One of the most important partial differential operators is the Laplacian $\Delta$: if $G$ is an open set in $\mathbb{R}^N$ and $f : G \to \mathbb{R}$ is smooth, then
$$\Delta f := \sum_{i=1}^N \frac{\partial^2 f}{\partial x_i^2},$$
and it is easy to see that
$$\Delta f = \operatorname{div}(\nabla f), \qquad \text{where} \qquad \nabla f := \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_N}\right)$$
is the gradient of $f$. Since the notion of the gradient has been defined for functions on manifolds (Remark 4.3.56), we are able to generalize the Laplacian to functions defined on a manifold $\mathcal{M}$:
$$\Delta_{\mathcal{M}} f = \operatorname{div}(\nabla f).$$
This operator $\Delta_{\mathcal{M}}$ is often called the Laplace–Beltrami operator. For more information on the significance of this operator the reader can consult, e.g., Chavel [23], Davies & Safarov [31], Robinson [108] or Rosenberg [110].
Remark 4.3.99. In weak formulations of boundary value problems for elliptic differential equations (see Chapter 7) the following Green formula is frequently used:
Let a manifold $\mathcal{M}$ satisfy the assumptions of the abstract Stokes Theorem and let $f, g \in C^2(\mathcal{M})$. Then
$$\int_{\mathcal{M}} (\Delta_{\mathcal{M}} f)\, g + \int_{\mathcal{M}} (\nabla f, \nabla g)_{T_x\mathcal{M}} = \int_{\partial\mathcal{M}} g\, (\nabla f, n)_{T_x\mathcal{M}}. \tag{4.3.47}$$
The proof is based on a generalization of (4.3.46):
$$\int_{\mathcal{M}} (\operatorname{div} F) = \int_{\partial\mathcal{M}} (F, n),$$
which follows from the abstract Stokes Theorem. Another ingredient is the formula for the divergence of the product of a function and a vector field,
$$\operatorname{div}(gF) = g \operatorname{div} F + (\nabla g, F).$$
This formula follows from the definition of the divergence by computation.

Exercise 4.3.100. Prove that the definition of $\int_\gamma \omega$ does not depend on the choice of an atlas and a partition of unity. What can be said about the dependence on a parametrization of $\gamma$?
Exercise 4.3.101. Let $\omega$ be an exact one-form and $f$ its primitive function, i.e., $df = \omega$. Compute $\int_\gamma \omega$! What is the result if $\gamma$ is a closed curve?
Exercise 4.3.102. Let $\omega$ be a closed one-form in $\mathcal{M}$. Show that $\omega$ is exact if and only if $\int_\gamma \omega = 0$ for any smooth closed curve $\gamma$ in $\mathcal{M}$.
Hint. Consult the proof of Theorem 4.3.64.


Exercise 4.3.103. Prove (4.3.41)!
Hint. Use induction. To prove the induction step it is convenient to compute the distance $\rho$ of $v_M$ from the span of $v_1, \dots, v_{M-1}$. Show that
$$\rho^2 = \frac{\det G(v_1, \dots, v_M)}{\det G(v_1, \dots, v_{M-1})}$$
provided $v_1, \dots, v_{M-1}$ are linearly independent.
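The identity in the hint can be tested numerically before it is proved. In the Python sketch below (helper names are ours; $G$ is taken to be the Gram matrix $(v_i \cdot v_j)$, as the hint suggests) the distance of $v_M$ from the span of $v_1, \dots, v_{M-1}$ is computed by Gram–Schmidt orthonormalization and compared with the quotient of Gram determinants. This checks one example only; it is not a proof.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def gram_det(vectors):
    """det G(v_1, ..., v_k) with G_ij = v_i . v_j."""
    return det([[dot(u, v) for v in vectors] for u in vectors])


def distance_to_span(v, basis):
    """Distance of v from span(basis), via Gram-Schmidt orthonormalization."""
    ortho = []
    for b in basis:
        w = list(b)
        for e in ortho:
            w = [wi - dot(b, e) * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        ortho.append([wi / norm for wi in w])
    residual = list(v)
    for e in ortho:
        residual = [ri - dot(v, e) * ei for ri, ei in zip(residual, e)]
    return math.sqrt(dot(residual, residual))


basis = [(1.0, 2.0, 0.0, 1.0), (0.0, 1.0, 1.0, 0.0)]
v = (2.0, 0.0, 1.0, 3.0)
rho = distance_to_span(v, basis)
ratio = gram_det(basis + [v]) / gram_det(basis)
print(abs(rho ** 2 - ratio) < 1e-9)   # True
```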


Exercise 4.3.104. Check that the definition (4.3.38) is a special case of (4.3.42)! Express the length of a curve $\gamma \subset \mathbb{R}^3$ in spherical coordinates!
Exercise 4.3.105. Let a surface $S$ be determined by the graph of a smooth function $f : U \subset \mathbb{R}^2 \to \mathbb{R}$. Find a formula for the area of $S$!
Hint. $\displaystyle\int_U \sqrt{1 + \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}\, dx\, dy$.
Exercise 4.3.106. Let $\mathcal{M}$ be the graph of a smooth function $f : \mathbb{R}^M \to \mathbb{R}$ defined on an open set $U \subset \mathbb{R}^M$. Show that $\mathcal{M}$ is an orientable manifold!
Exercise 4.3.107. Let $f : \mathbb{R}^N \to \mathbb{R}$ be a smooth mapping for which $0$ is a regular value. Show that
$$\mathcal{M} = \{x \in \mathbb{R}^N : f(x) = 0\}$$
is an orientable manifold of dimension $N - 1$ (provided $\mathcal{M} \neq \emptyset$)!
Exercise 4.3.108. Deduce a version of the Cauchy Theorem, i.e.,
$$\int_\gamma f\, dz = 0,$$
from the Green Theorem (Corollary 4.3.93)!


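Hint. Interpret $f(z)\, dz$, where $f = g + ih$, as the couple of differential 1-forms
$$(g\, dx - h\, dy,\ h\, dx + g\, dy).$$
Exercise 4.3.109. Find the formula$^{34}$ for $\Delta f$
(i) in polar coordinates;
Hint. You should get
$$\Delta f = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial\vartheta^2};$$
(ii) in spherical coordinates in $\mathbb{R}^3$ ($x = r\cos\varphi\cos\vartheta$, $y = r\sin\varphi\cos\vartheta$, $z = r\sin\vartheta$);
Hint. You should get
$$\Delta f = \frac{\partial^2 f}{\partial r^2} + \frac{2}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2\cos^2\vartheta}\left(\frac{\partial^2 f}{\partial\varphi^2} + \cos^2\vartheta\,\frac{\partial^2 f}{\partial\vartheta^2} - \sin\vartheta\cos\vartheta\,\frac{\partial f}{\partial\vartheta}\right);$$

$^{34}$ Such formulae are convenient if one is looking for a solution with some symmetries, i.e., invariant with respect to some group actions.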

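(iii) on $S^2$;
(iv) on the Riemann manifold (see footnote 27 on page 214).
Exercise 4.3.110. Let $\mathcal{M}$ be a connected differentiable manifold.
(i) Prove that there exists a Riemann metric $g$ on $\mathcal{M}$.
Hint. $\mathcal{M}$ is embedded into $\mathbb{R}^N$.
(ii) Prove that any two points $x, y \in \mathcal{M}$ can be connected by a $C^1$-curve, i.e., there is
$$\gamma : [a, b] \to \mathcal{M}, \qquad \gamma \in C^1, \qquad \gamma(a) = x, \quad \gamma(b) = y.$$
Hint. Use the assumption that $\mathcal{M}$ is connected.
(iii) Define the length of a $C^1$-curve $\gamma : [a, b] \to \mathcal{M}$ by the formula
$$l_g(\gamma) := \int_a^b \sqrt{g(\gamma(t))(\dot\gamma(t), \dot\gamma(t))}\, dt.$$
Put
$$\rho_g(x, y) = \inf\{l_g(\gamma) : \gamma : [a, b] \to \mathcal{M},\ \gamma(a) = x,\ \gamma(b) = y\}$$
and show that $\rho_g$ is a metric on $\mathcal{M}$.
(iv) How is the topology on $\mathcal{M}$ given by $\rho_g$ related to the topology induced from $\mathbb{R}^N$?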

4.3D Brouwer Degree


We will establish the main properties of the degree of a mapping $f : \mathbb{R}^M \to \mathbb{R}^M$ in the basic text (Section 5.2). In this appendix we will present another definition of the Brouwer degree, prove its main properties and give some of its topological applications. This goal can be achieved in different ways, but all of them are lengthy and contain intricate calculations. Here we choose the treatment based on the integration of differential forms, mainly to introduce the interested reader to a geometric world and to relate the notion of the degree to classical results of the theory of functions of a complex variable. Since C.F. Gauss was one of the fathers of this theory, we will start with his approach to the so-called Fundamental Theorem of Algebra.
Theorem 4.3.111 (Fundamental Theorem of Algebra). Let $P$ be a polynomial with complex coefficients of degree at least $1$. Then there exists $z_0 \in \mathbb{C}$ such that $P(z_0) = 0$. Equivalently, $P : \mathbb{C} \to \mathbb{C}$ is a surjective mapping.
We will give two rather different proofs, the ideas of which go back to Gauss. The former is purely geometric with a little analysis for introducing a differentiable structure on the sphere $S^2$. The latter uses the theory of functions of a complex variable and demonstrates the above mentioned connection to the degree.


Proof (see, e.g., Milnor [95]). We will regard the sphere $S^2$ as a compactification of the complex plane and endow it with the structure of a differentiable manifold by two charts (see Example 4.3.5(iii)) given by the stereographic projections $\varphi_+$, $\varphi_-$ of $S^2 \setminus \{N\}$, where $N$ is the north pole, and of $S^2 \setminus \{S\}$, where $S$ is the south pole, respectively, onto $\mathbb{R}^2$. See Figure 4.3.25.

Figure 4.3.25. (A point $x \in S^2$ and its stereographic projections $\varphi_+(x)$ and $\varphi_-(x)$ onto $\mathbb{R}^2$.)

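Let $P$ be a non-constant polynomial. We define $f$ to be
$$f(x) := \begin{cases} \varphi_+^{-1} \circ P \circ \varphi_+(x), & x \in S^2 \setminus \{N\}, \\ N, & x = N. \end{cases}$$
One can prove that $f : S^2 \to S^2$ is continuously differentiable at all points. To prove differentiability at $N$ it is sufficient to verify another formula for $f$, namely
$$f = \varphi_-^{-1} \circ Q \circ \varphi_-, \qquad \text{where} \qquad Q(z) = \frac{1}{\overline{P(\bar z^{-1})}}, \quad Q(0) = 0$$
($\bar z$ is the complex conjugate to $z$). The calculation of $f'$ (see (4.3.19)) shows that $x_0 \in S^2$ is a singular point of $f$ (i.e., $f'(x_0)(T_{x_0}S^2) \neq T_{f(x_0)}S^2$) if and only if
$$P'(z_0) = 0, \qquad z_0 = \varphi_+(x_0).$$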

Since the last equation has only a finite number of solutions, the set $A$ of singular values of $f$ is finite. This is the main point where we have used that $P$ is a polynomial. Consider now a point $y \in S^2 \setminus A$, i.e., $y$ is a regular value of $f$. Then the set $f^{-1}(y)$ is finite (possibly empty) since a polynomial takes any value only finitely many times. Let $\nu(y)$ denote the number of points, i.e., the cardinality, of $f^{-1}(y)$. The main part of the proof is to show that $\nu$ is a constant function on $B := S^2 \setminus A$. To prove that, we consider two kinds of regular values,
$$B_1 = \{y \in B : f^{-1}(y) = \emptyset\}, \qquad B_2 = B \setminus B_1.$$
Since $B_1 = S^2 \setminus f(S^2)$ and $S^2$ is compact, $B_1$ is open in $S^2$. For $y \in B_2$ we have
$$f^{-1}(y) = \{x_1, \dots, x_k\}$$


and there are disjoint open neighborhoods $U(x_1), \dots, U(x_k)$ on which $f$ is a diffeomorphism (see Local Inverse Function Theorem 4.1.1). Put
$$V_i := f(U(x_i)).$$
It is easy to see that $\nu$ is constant on the set
$$\bigcap_{i=1}^k V_i \setminus f\Big(S^2 \setminus \bigcup_{i=1}^k U(x_i)\Big),$$
which is a neighborhood of the point $y$. Since $S^2$ is connected and $A$ is finite, the set $B$ is also connected. The function $\nu$, being locally constant, is constant on the whole of $B$. Moreover, $\nu$ cannot vanish on $B$, i.e., $B_1 = \emptyset$. This shows that $f$ actually maps $S^2$ onto $S^2$, i.e., $P : \mathbb{C} \to \mathbb{C}$ is surjective as well. In particular, there exists $z_0 \in \mathbb{C}$ such that
$$P(z_0) = 0.$$
The following result generalizes the above fact that $f$ is surjective (see, e.g., Sternberg [124, Theorem 3.4.3]).
Proposition 4.3.112. Let $\mathcal{M}_1$, $\mathcal{M}_2$ be two oriented manifolds of the same dimension, and let $\mathcal{M}_2$ be a connected space. Suppose that $f : \mathcal{M}_1 \to \mathcal{M}_2$ is a proper$^{35}$ differentiable mapping such that its realization (see Figure 4.3.15) has a nonnegative Jacobian at any point. Then either the Jacobian vanishes everywhere or $f(\mathcal{M}_1) = \mathcal{M}_2$.
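Remark 4.3.113. Write $f : \mathbb{C} \to \mathbb{C}$ as
$$f(z) = g(x, y) + ih(x, y), \qquad \text{where} \quad z = x + iy \quad \text{and} \quad g, h : \mathbb{R}^2 \to \mathbb{R}.$$
If $f$ is a holomorphic function in an open set $G$, then the Cauchy–Riemann conditions
$$\frac{\partial g}{\partial x} = \frac{\partial h}{\partial y}, \qquad \frac{\partial g}{\partial y} = -\frac{\partial h}{\partial x}$$
hold for $z = x + iy \in G$, and the Jacobian of $(g, h) : \mathbb{R}^2 \to \mathbb{R}^2$ satisfies
$$J_{(g,h)}(x, y) = |f'(z)|^2.$$
In particular, $J_{(g,h)} \ge 0$, and if $f$ is a polynomial, then $f$ is proper (why?), and so Proposition 4.3.112 can be used to get the Fundamental Theorem of Algebra.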
The idea of the latter proof is based on the notion of the index of a point $a$ with respect to a curve. If $\gamma$ is a closed $C^1$-curve in $\mathbb{C}$ and $a \notin \gamma$,$^{36}$ then the index $\operatorname{Ind}_\gamma a$ is defined by the formula
$$\operatorname{Ind}_\gamma a = \frac{1}{2\pi i}\int_\gamma \frac{dz}{z - a}.$$
We say that $\gamma$ is positively oriented if
$$\operatorname{Ind}_\gamma a \ge 0 \qquad \text{for all} \quad a \notin \gamma.$$
$^{37}$

$^{35}$ A mapping $f$ is said to be proper if $f^{-1}(K)$ is compact whenever $K$ is a compact set.
$^{36}$ Not to complicate matters by different notation, we identify the curve, i.e., a mapping $\gamma$ : interval $I \to \mathbb{C}$, with its image.
$^{37}$ This definition, common in the theory of functions of a complex variable, coincides with our definition of an oriented manifold.


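In particular, if
$$a = 0 \qquad \text{and} \qquad \gamma(t) = re^{it}, \quad t \in [0, 2\pi n], \quad n \in \mathbb{Z},$$
then
$$\operatorname{Ind}_\gamma 0 = n,$$
and it can be interpreted as the number of revolutions of $\gamma$ and also as the increment of the argument along $\gamma$ divided by $2\pi$.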
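The index can be approximated directly from its definition. The Python sketch below (an illustration; the radius and the test points are arbitrary choices) applies the midpoint rule in the parameter $t$ to $\frac{1}{2\pi i}\int_\gamma \frac{dz}{z - a} = \frac{1}{2\pi i}\int \frac{\gamma'(t)}{\gamma(t) - a}\, dt$ for $\gamma(t) = re^{it}$. For $t \in [0, 2\pi n]$ and $a = 0$ it returns $n$, for a point inside the circle traversed once it returns $1$, and for a point outside it returns $0$.

```python
import cmath
import math


def index(curve, velocity, a, t0, t1, n=4000):
    """(1/(2*pi*i)) * integral over gamma of dz / (z - a), by the midpoint rule
    in the parameter t in [t0, t1]."""
    h = (t1 - t0) / n
    total = 0j
    for k in range(n):
        t = t0 + (k + 0.5) * h
        total += velocity(t) / (curve(t) - a)
    return total * h / (2j * math.pi)


r = 2.0  # radius of the circle (illustrative)


def gamma(t):
    return r * cmath.exp(1j * t)


def gamma_velocity(t):
    return 1j * r * cmath.exp(1j * t)


# gamma on [0, 2*pi*3] winds three times around the origin
print(round(index(gamma, gamma_velocity, 0, 0.0, 2 * math.pi * 3).real))  # 3
```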
Proposition 4.3.114 (Rouché). Let $\gamma$ be a simple, closed, positively oriented $C^1$-curve in an open set $\Omega$, and let
$$G := \{z \in \Omega \setminus \gamma : \operatorname{Ind}_\gamma z \neq 0\}.$$
If $f$ is a holomorphic function in $\Omega$ for which $0 \notin f(\gamma)$, then the number $N_f(G)$ of solutions of the equation $f(z) = 0$ that belong to $G$ is equal to
$$N_f(G) = \frac{1}{2\pi i}\int_\gamma \frac{f'(z)}{f(z)}\, dz = \operatorname{Ind}_{f\circ\gamma} 0 \tag{4.3.48}$$
(see footnote 38) provided the solutions are counted with their multiplicity.$^{39}$ If $g$ is another holomorphic function in $\Omega$ such that
$$|f(z) - g(z)| < |f(z)|, \qquad z \in \gamma, \tag{4.3.49}$$
then $N_f(G) = N_g(G)$.


Proof. The proof is based on the Residue Theorem, see, e.g., Rudin [113, Theorem 10.43].

We wish to point out that the condition (4.3.49) is a quantitative description of
stability of the number of solutions with respect to perturbations of f .
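Formula (4.3.48) also gives a practical way to count zeros. The Python sketch below (function names are ours) evaluates $\frac{1}{2\pi i}\int_\gamma \frac{P'(z)}{P(z)}\, dz$ over a circle by the midpoint rule, assuming that $P$ is a polynomial with no zeros on the circle. For $P(z) = z^2(z - 3)$ the double zero at the origin is counted twice, in accordance with the convention on multiplicity.

```python
import cmath
import math


def horner(coeffs, z):
    """Evaluate a polynomial given by its coefficients, highest degree first."""
    value = 0j
    for c in coeffs:
        value = value * z + c
    return value


def derivative(coeffs):
    """Coefficients of P' (highest degree first)."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def zero_count(coeffs, center, radius, n=4096):
    """(1/(2*pi*i)) * integral over |z - center| = radius of P'(z)/P(z) dz,
    assuming P has no zeros on that circle."""
    d_coeffs = derivative(coeffs)
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(1j * (k + 0.5) * h)
        z = center + w
        dz = 1j * w  # dz/dt for z(t) = center + radius * exp(i t)
        total += horner(d_coeffs, z) / horner(coeffs, z) * dz
    return total * h / (2j * math.pi)


# P(z) = z^2 (z - 3) = z^3 - 3 z^2: a double zero at 0 and a simple zero at 3
P = [1, -3, 0, 0]
print(round(zero_count(P, 0, 1).real))   # 2 (the double zero is counted twice)
```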
The second proof of Theorem 4.3.111. Our second proof of the Fundamental Theorem of Algebra follows easily from the previous proposition. Suppose that
$$P(z) = z^n + a_1 z^{n-1} + \dots + a_n$$
and put $f(z) = z^n$. Let $R > 0$ be such that
$$|a_1 z^{n-1} + \dots + a_n| < |z^n| = R^n \qquad \text{for} \qquad |z| = R$$
(why does such an $R$ exist?). Proposition 4.3.114, applied to the circle $|z| = R$, implies that
$$n = N_f(G) = N_P(G),$$
where $G$ is the open ball $B(0; R)$.
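One possible answer to the question in parentheses (a sketch; other choices of $R$ work as well): put $A := \max_{1 \le k \le n} |a_k|$. If $A = 0$, then $P(z) = z^n$ and any $R > 0$ will do. If $A > 0$, take $R = 1 + A$; then for $|z| = R$
$$|a_1 z^{n-1} + \dots + a_n| \le A\,(R^{n-1} + \dots + R + 1) = A\,\frac{R^n - 1}{R - 1} = R^n - 1 < R^n,$$
so the strict inequality required above holds.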

The connection between the winding number and the degree $\deg(f, \Omega, p)$, as the latter is defined in Definition 5.2.1 for a regular value $p$, is given by the following result. For a holomorphic function $f$ and its regular value $p$ the degree $\deg(f, \Omega, p)$ is defined as the number of solutions in $\Omega$ of the equation $f(z) = p$. By identification of
$$f : \mathbb{C} \to \mathbb{C}, \quad f(z) = g(x, y) + ih(x, y) \qquad \text{with} \qquad (g, h) : \mathbb{R}^2 \to \mathbb{R}^2,$$
this definition coincides with Definition 5.2.1 (by Remark 4.3.113, $J_{(g,h)}(x, y) = |f'(z)|^2 > 0$ at every solution of $f(z) = p$).

$^{38}$ This quantity is also called the winding number of $f$ at $0$ with respect to $\gamma$.
$^{39}$ A solution $z_0$ has multiplicity $k$ if $f(z_0) = \dots = f^{(k-1)}(z_0) = 0$, $f^{(k)}(z_0) \neq 0$. Notice that $k$ is finite provided $f$ is not identically zero.


Lemma 4.3.115. Let $\Omega$ be an open, bounded set whose boundary is the image of a $C^1$-simple, closed, positively oriented curve $\gamma$. Assume that $f : \mathbb{C} \to \mathbb{C}$ is holomorphic in a certain neighborhood of $\overline\Omega$ and $p \notin f(\gamma)$ is a regular value of $f$ (i.e., $f'(z) \neq 0$ whenever $f(z) = p$). Then
$$\deg(f, \Omega, p) = \frac{1}{2\pi i}\int_\gamma \frac{f'(z)}{f(z) - p}\, dz. \tag{4.3.50}$$
Proof. Denote
$$A = \{z \in \Omega : f(z) = p\}.$$
If $A = \emptyset$, then both sides vanish (use the Cauchy Theorem for the right-hand side). If $A \neq \emptyset$, then $A$ is finite since
$$\overline{A} \cap \gamma = \emptyset$$
and $f$ is holomorphic. Both sides in (4.3.50) are equal to the cardinality of $A$. This follows for the left-hand side from the definition of the degree and for the right-hand side by the Residue Theorem.

We would like to point out here that the formula (4.3.50) indicates a way to remove the assumption on regularity of $p$. Namely, the integral exists for any holomorphic function $f$ provided $p \notin f(\gamma)$. Put
$$d := \inf_{z \in \gamma} |f(z) - p|,$$
and let $A = \{z_1, \dots, z_k\}$ be the same as in the proof of Lemma 4.3.115. Assume that the solution $z_j$ is of multiplicity $m_j$, and denote
$$m := m_1 + \dots + m_k.$$
According to Proposition 4.3.114, the number of solutions (including multiplicity) of the equation $f(z) = q$ is also $m$ provided $|p - q| < d$. In this neighborhood of the point $p$ there exists a regular value of $f$. This follows either from the Sard Theorem (see Theorem 5.2.3) or, in this special case more easily, from the properties of holomorphic functions (see, e.g., Rudin [113, Theorem 10.32]). If the degree has the property of stability with respect to the point (Property (vi) or (viii) in Theorem 5.2.7 or Theorem 4.3.124), the equality (4.3.50) will also hold for singular values.
In order to motivate the next result, we note that the integral in (4.3.50) can be rewritten as an integral over $\Omega$ (by the Green Theorem). We avoid such uninteresting and intricate calculations, and put aside the special case of holomorphic functions. Instead we will consider the general case of mappings $\mathbb{R}^M \to \mathbb{R}^M$. For the rest of this appendix we will suppose that
$$\text{(H)} \qquad \begin{cases} \Omega \text{ is a bounded open set in } \mathbb{R}^M, \\ f : \overline\Omega \to \mathbb{R}^M \text{ is continuous on } \overline\Omega, \\ p \in \mathbb{R}^M \setminus f(\partial\Omega). \end{cases}$$


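Proposition 4.3.116. In addition to (H) let $f \in C^1(\Omega, \mathbb{R}^M)$ and let $p$ be a regular value of $f$. Then there exist a neighborhood $U$ of the point $p$ and a smooth function $\phi : \mathbb{R}^M \to \mathbb{R}$ supported in $U$ with $\int_{\mathbb{R}^M} \phi(y)\, dy = 1$ such that
$$\int_\Omega f^*\omega = \sum_{x \in \Omega,\ f(x) = p} \operatorname{sgn} J_f(x) \qquad (=: \deg(f, \Omega, p)). \tag{4.3.51}$$
Here $f^*\omega$ is the pull-back of the form
$$\omega(y) = \phi(y)\, dy^1 \wedge \dots \wedge dy^M$$
(see Remark 4.3.54(iii)).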
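Proof.$^{40}$ By the definition of the pull-back we have
$$f^*\omega(x) = \phi(f(x))\, J_f(x)\, dx^1 \wedge \dots \wedge dx^M \qquad \text{and} \qquad \int_\Omega f^*\omega = \int_\Omega \phi(f(x))\, J_f(x)\, dx,$$
where $J_f(x)$ is the Jacobian of $f$ at $x$. We consider two complementary cases:
Case 1 ($p \notin f(\overline\Omega)$). Then there is a neighborhood $U$ of $p$ which is a subset of $\mathbb{R}^M \setminus f(\overline\Omega)$. Choose a smooth function $\phi$ on $\mathbb{R}^M$ with its support in $U$ and such that $\int_{\mathbb{R}^M} \phi(y)\, dy = 1$. Then
$$\int_\Omega f^*\omega = 0 \qquad \text{since} \qquad \phi|_{f(\Omega)} = 0.$$
The right-hand side of (4.3.51) also vanishes by definition (the sum over the empty set is $0$).
Case 2 ($p \in f(\Omega)$). Since $p$ is a regular value, the set
$$A := \{x \in \Omega : f(x) = p\}$$
is finite. Let $A = \{x_1, \dots, x_k\}$. Then for any $x_i \in A$ there is a neighborhood $V_i$ of $x_i$ such that $f|_{V_i}$ is a diffeomorphism of $V_i$ onto a neighborhood $U_i$ of $p$. These neighborhoods $V_1, \dots, V_k$ can be chosen mutually disjoint and connected, so that $J_f$ does not change sign on each $V_i$. Take a neighborhood $U \subset \bigcap_{i=1}^k U_i$ of $p$ such that
$$f^{-1}(U) \subset \bigcup_{i=1}^k V_i$$
(why does such a $U$ exist?). Choose a smooth function $\phi$ with its support in $U$ and normalize it so that $\int_U \phi(y)\, dy = 1$. Then
$$\begin{aligned} \int_\Omega f^*\omega &= \sum_{i=1}^k \int_{V_i} f^*\omega = \sum_{i=1}^k \int_{V_i} \phi(f(x))\, J_f(x)\, dx = \sum_{i=1}^k \operatorname{sgn} J_f(x_i) \int_{V_i} \phi(f(x))\, |J_f(x)|\, dx \\ &= \sum_{i=1}^k \operatorname{sgn} J_f(x_i) \int_{f(V_i)} \phi(y)\, dy = \sum_{i=1}^k \operatorname{sgn} J_f(x_i). \end{aligned}$$

$^{40}$ A proof based on a more explicit construction of the form $\omega$ is given by Mawhin [92]. There the reader can also find a different approach to the homotopy invariance property of the degree (see Lemma 4.3.117 below).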


We define the degree $\deg(f, \Omega, p)$ for $f \in C^1(\Omega, \mathbb{R}^M)$ and $p$ a regular value of $f$ by the right-hand side of (4.3.51). This definition coincides with Definition 5.2.1. We also want to point out that Case 2 of the proof contains the essence of the use of differential forms for the definition of the degree. The formula (4.3.51) can hardly be used for computation of the degree, but its advantage consists in the fact that the integral exists also if $p$ is a singular value. However, we should be careful and check that this integral does not depend on the choice of an $M$-form also in the case when $p$ is a singular value.
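A one-dimensional illustration of (4.3.51), under assumptions chosen only for this sketch: $M = 1$, $\Omega = (-3, 3)$, $f(x) = x^3 - 3x$, $p = 0$, and $\phi$ a normalized smooth bump supported in $U = (-\tfrac12, \tfrac12)$. Since $|f(\pm 1)| = 2$ at the critical points, $f^{-1}(U)$ consists of three small intervals around the zeros $-\sqrt3, 0, \sqrt3$, as in Case 2 of the proof. The Python code approximates $\int_\Omega \phi(f(x))f'(x)\, dx$ by the midpoint rule and compares it with $\sum \operatorname{sgn} f'(x_i) = 1 - 1 + 1 = 1$.

```python
import math


def f(x):
    return x ** 3 - 3 * x


def df(x):
    return 3 * x * x - 3


def bump(y, eps):
    """Smooth function supported in (-eps, eps) (not yet normalized)."""
    t = y / eps
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def midpoint(g, a, b, n):
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


EPS = 0.5
NORMALIZER = midpoint(lambda y: bump(y, EPS), -EPS, EPS, 20000)


def phi(y):
    """Bump with integral 1, supported in U = (-EPS, EPS) around p = 0."""
    return bump(y, EPS) / NORMALIZER


# Left-hand side of (4.3.51) on Omega = (-3, 3): integral of phi(f(x)) f'(x) dx
lhs = midpoint(lambda x: phi(f(x)) * df(x), -3.0, 3.0, 200000)
# Right-hand side: sum of sgn f'(x_i) over the solutions of f(x) = 0
roots = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
rhs = sum(1 if df(x) > 0 else -1 for x in roots)
print(rhs)   # 1
```

In one dimension the agreement is also visible by substitution, since $\int_{-3}^{3} \phi(f(x))f'(x)\, dx = \int_{f(-3)}^{f(3)} \phi(y)\, dy = 1$; the interest of (4.3.51) lies in higher dimensions, where no such shortcut is available.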
Lemma 4.3.117. Let $\omega$ be a smooth $M$-form in $\mathbb{R}^M$ with its support in a certain open cube $Q$. If
$$\int_Q \omega = 0,$$
then there exists a smooth $(M-1)$-form $\eta$ with support in $Q$ such that
$$d\eta = \omega,$$
i.e., $\omega$ is exact.
Proof. It is similar to that of the Poincaré Theorem (Theorem 4.3.64).

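Corollary 4.3.118. Suppose (H) and $f \in C^1(\Omega, \mathbb{R}^M)$. Let $\omega$, $\tilde\omega$ be two smooth $M$-forms with supports in a cube $Q \subset \mathbb{R}^M \setminus f(\partial\Omega)$ such that
$$\int_Q \omega = \int_Q \tilde\omega.$$
Then
$$\int_\Omega f^*\omega = \int_\Omega f^*\tilde\omega.$$
Proof. According to Lemma 4.3.117 there exists an $(M-1)$-form $\eta$ for which $d\eta = \omega - \tilde\omega$. Then
$$f^*(\omega - \tilde\omega) = f^*(d\eta) = d(f^*\eta)$$
(see Exercise 4.3.71(iv)). Since the support of the form $f^*\eta$ is a compact subset of $\Omega$, the Stokes Theorem (Theorem 4.3.92) implies that
$$\int_\Omega d(f^*\eta) = 0.$$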


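Let $p$ be a singular value of $f$ and let $B \coloneqq B(p; r)$ be a ball such that
$$B \cap f(\partial\Omega) = \emptyset.$$
According to the Sard Theorem (see Corollary 5.2.4) there is a regular value $q$ of $f$ in $B$. We want to define
$$\deg(f, \Omega, p) \coloneqq \deg(f, \Omega, q) = \int_\Omega f^*\omega,$$
where $\omega$ is a smooth $M$-form with support in a cube $Q$, $q \in Q \subset B$, and $\int_Q \omega = 1$. To be sure that this definition does not depend on the choice of the point $q$, we choose another regular value $\tilde q \in B$. It is obvious that a diffeomorphism $h$ of $B$ onto $B$ can be constructed such that
$$h(q) = \tilde q.$$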

4.3D. Brouwer Degree


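Let $\tilde\omega$ be a differential form constructed in the proof of Proposition 4.3.116 supported on a cube $\tilde U$, $\tilde q \in \tilde U \subset B$. Then, by the definition of the pull-back,
$$\int_{\tilde U} \tilde\omega = \int h^*\tilde\omega.$$
The form $\tilde\omega$ can be chosen in such a way that $h^*\tilde\omega$ is supported by the same cube $Q$ as the form $\omega$ (explain why!). By Corollary 4.3.118,
$$\int_\Omega f^*\omega = \int_\Omega f^*(h^*\tilde\omega).$$
This shows that the assumption on regularity of $p$ can be omitted in the definition of the degree.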
Now we may drop the assumption on smoothness of $f$. However, this will require some further effort.
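Lemma 4.3.119. Let $\Omega$ be a bounded open set in $\mathbb{R}^M$ and let $H: [0,1] \times \overline{\Omega} \to \mathbb{R}^M$ be such that the mapping
$$t \in [0,1] \mapsto H(t, \cdot) \in C^1(\Omega, \mathbb{R}^M) \cap C(\overline{\Omega}, \mathbb{R}^M)$$
is continuous. If $p \in \mathbb{R}^M \setminus H([0,1] \times \partial\Omega)$, then
$$\deg(H(t, \cdot), \Omega, p)$$
is constant on the interval $[0,1]$.
Proof. First note that $H$ is continuous as a mapping considered on $[0,1] \times \overline{\Omega}$, and therefore $H([0,1] \times \partial\Omega)$ is compact and there is an open cube
$$Q \subset \mathbb{R}^M \setminus H([0,1] \times \partial\Omega)$$
which is a neighborhood of $p$. Choose a smooth $M$-form $\omega$ with its support in $Q$ and $\int_Q \omega = 1$. By definition,
$$\deg(H(t, \cdot), \Omega, p) = \int_\Omega (H(t, \cdot))^*\omega.$$
This shows that the function
$$t \mapsto \deg(H(t, \cdot), \Omega, p)$$
is continuous on $[0,1]$. Since it takes only integer values, it has to be constant.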

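Corollary 4.3.120. Suppose (H) and denote $d \coloneqq \operatorname{dist}(p, f(\partial\Omega))$. If $g$, $h$ are mappings from $C^1(\Omega, \mathbb{R}^M) \cap C(\overline{\Omega}, \mathbb{R}^M)$ such that
$$\|f - g\|_{C(\overline{\Omega}, \mathbb{R}^M)} \coloneqq \sup_{x \in \overline{\Omega}} |f(x) - g(x)| < d, \qquad \|f - h\|_{C(\overline{\Omega}, \mathbb{R}^M)} < d,$$
then
$$\deg(g, \Omega, p) = \deg(h, \Omega, p).$$
Proof. Put
$$H(t, x) \coloneqq (1-t)g(x) + t h(x), \qquad t \in [0,1],\ x \in \overline{\Omega}.$$
For $x \in \partial\Omega$ we have $|H(t,x) - f(x)| \le (1-t)|g(x) - f(x)| + t|h(x) - f(x)| < d \le |f(x) - p|$, hence $p \notin H([0,1] \times \partial\Omega)$. The assertion now follows from Lemma 4.3.119.

The last step in our approach to the definition of the degree consists in the approximation of a continuous mapping by smooth ones.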


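Lemma 4.3.121. Let $\Omega$ be a bounded open set in $\mathbb{R}^M$ and let $f: \overline{\Omega} \to \mathbb{R}^M$ be continuous. Then there exist mappings $f_n \in C^1(\Omega, \mathbb{R}^M) \cap C(\overline{\Omega}, \mathbb{R}^M)$ that converge uniformly on $\overline{\Omega}$ to $f$.
Proof. Observe first that it is sufficient to prove the statement for individual components of $f$. So we will assume that $f: \overline{\Omega} \to \mathbb{R}$ is continuous.
There are many ways to prove the density of smooth functions in the space of continuous functions. See the discussion on page 31. We will present the approach based on convolution approximations (see Proposition 1.2.20 and the proof of Proposition 1.2.21). Extend $f$ to a continuous bounded function $g$ on $\mathbb{R}^M$ (such an extension exists by the Tietze Theorem$^{41}$). Choose a nonnegative $C^\infty$-function $\varphi: \mathbb{R}^M \to \mathbb{R}$ with compact support and $\int_{\mathbb{R}^M} \varphi(x)\,dx = 1$. Put
$$\varphi_n(x) = n^M \varphi(nx).$$
Then the convolutions
$$(\varphi_n * g)(x) \coloneqq \int_{\mathbb{R}^M} \varphi_n(x - y)\,g(y)\,dy, \qquad x \in \mathbb{R}^M,$$
are $C^\infty$-functions which converge to $g$ locally uniformly on $\mathbb{R}^M$, in particular, uniformly to $f$ on $\overline{\Omega}$. Proposition 1.2.20(iii) implies the smoothness of $\varphi_n * g$. To see the convergence it is convenient to visualize the form of $\varphi_n$ (see Figure 4.3.26 for $M = 1$).$^{42}$
The convergence is obtained as follows (recall that $\int_{\mathbb{R}^M} \varphi_n(x)\,dx = 1$):
$$|(\varphi_n * g)(x) - g(x)| \le \int_{\mathbb{R}^M} \varphi_n(x - y)\,|g(y) - g(x)|\,dy = \int_{U(x)} \varphi_n(x - y)\,|g(y) - g(x)|\,dy + \int_{\mathbb{R}^M \setminus U(x)} \varphi_n(x - y)\,|g(y) - g(x)|\,dy < \varepsilon + 0.$$
The first integral is arbitrarily small for a sufficiently small neighborhood $U(x)$ of $x$ by the continuity of $g$ at $x$; the second integral is zero for sufficiently large $n$ since $\varphi_n(x - \cdot)$ vanishes on $\mathbb{R}^M \setminus U(x)$ for such $n$. Since $g$ is uniformly continuous on compact sets, $U(x) = B(x; \delta)$ can be chosen with $\delta$ independent of $x$ in a given compact set, which yields the locally uniform convergence.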
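The convergence argument can be observed numerically for $M = 1$. The sketch below is our own illustration (plain Python with trapezoidal quadrature; all names are hypothetical): it mollifies the continuous but nonsmooth function $g(x) = |x|$ with $\varphi_n(x) = n\varphi(nx)$, where $\varphi$ is the standard bump function normalized to have integral 1, and prints the largest error on a grid in $[-1, 1]$. For this $g$ the error is largest at $x = 0$, where it equals $\frac{1}{n}\int |u|\,\varphi(u)\,du$.

```python
import math


def bump(u):
    """Nonnegative C-infinity function with support [-1, 1]."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0


def trapezoid(h, a, b, steps):
    """Composite trapezoidal rule for the integral of h over [a, b]."""
    dx = (b - a) / steps
    inner = sum(h(a + k * dx) for k in range(1, steps))
    return (0.5 * (h(a) + h(b)) + inner) * dx


MASS = trapezoid(bump, -1.0, 1.0, 4000)   # phi = bump / MASS has integral 1


def mollify(g, n, x, steps=400):
    """(phi_n * g)(x) = integral of phi_n(s) g(x - s) ds, phi_n(s) = n phi(n s)."""
    def integrand(s):
        return n * bump(n * s) / MASS * g(x - s)
    return trapezoid(integrand, -1.0 / n, 1.0 / n, steps)


def sup_error(g, n, grid):
    """Largest deviation |(phi_n * g)(x) - g(x)| over the grid points."""
    return max(abs(mollify(g, n, x) - g(x)) for x in grid)


grid = [-1.0 + k / 100 for k in range(201)]   # sample points of [-1, 1]
for n in (5, 50):
    print(n, sup_error(abs, n, grid))         # the error shrinks like 1/n
```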
41 The Tietze Theorem says:
Assume that $g$ is a bounded, continuous, real function on a closed nonvoid subset $A$ of a metric space $X$ equipped with a metric $\rho$. Then there exists a continuous extension $G: X \to \mathbb{R}$ such that
$$\sup_{x \in A} g(x) = \sup_{x \in X} G(x), \qquad \inf_{x \in A} g(x) = \inf_{x \in X} G(x).$$
This theorem permits a (nontrivial) generalization to normal topological spaces (see, e.g., Dugundji [43, Section VII.5]). The proof for metric spaces is quite easy. Indeed, without loss of generality we can suppose that $1 \le g \le 2$. Put
$$G(x) = \frac{\inf_{y \in A} \rho(x, y)\,g(y)}{\operatorname{dist}(x, A)} \qquad \text{for } x \in X \setminus A,$$
and $G(x) = g(x)$ for $x \in A$. It is not difficult to show that $G$ has all the required properties.
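The formula in this footnote can be evaluated directly when $A$ is a finite set of points, which makes the construction easy to experiment with. In the sketch below (our own illustration; the function name is hypothetical) the data are first rescaled affinely to take values in $[1, 2]$, as in the footnote, then $G(x) = \inf_{y \in A} \rho(x, y)\,g(y) / \operatorname{dist}(x, A)$ is evaluated with the Euclidean metric $\rho$, and the result is scaled back. For a finite $A$ the infimum is a minimum.

```python
import math


def tietze_extension(points, values):
    """Extend g: A -> R, with A a finite list of points in R^d, to all of R^d
    by the formula of the footnote, after rescaling g to take values in [1, 2]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lambda x: lo                      # constant data: trivial extension
    scaled = [1.0 + (v - lo) / (hi - lo) for v in values]   # now 1 <= g <= 2

    def G(x):
        dists = [math.dist(x, a) for a in points]
        d = min(dists)                           # dist(x, A)
        if d == 0.0:
            return values[dists.index(0.0)]      # x in A: G(x) = g(x)
        tilde = min(r * s for r, s in zip(dists, scaled)) / d
        return lo + (hi - lo) * (tilde - 1.0)    # undo the affine rescaling

    return G


A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
G = tietze_extension(A, [3.0, -1.0, 5.0])
print(G((0.0, 0.0)), G((0.5, 0.5)), G((10.0, -3.0)))
```

The bounds are visible from the formula: since $\rho(x, y) \ge \operatorname{dist}(x, A)$ the quotient is at least $\min g$, and taking $y$ nearest to $x$ shows it is at most $\max g$.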
42 It is also said that $\varphi_n$ converge to the Dirac measure (this is true in the sense of distributions) or that $\{\varphi_n\}_{n=1}^\infty$ is a so-called approximate unit (the space $L^1(\mathbb{R}^M)$ with the convolution as multiplication is a Banach algebra without a unit, and the convergence $\varphi_n * g \to g$ takes place in the $L^1$-norm for all $g \in L^1(\mathbb{R}^M)$).


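Figure 4.3.26. [Graphs of the functions $\varphi_n$ for $M = 1$ ($\varphi_1 = \varphi$, $\varphi_2$, ...); axis marks at $\pm a$, $\pm a/2$, $\pm a/5$.]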

Corollary 4.3.122. Suppose (H) and let $\{f_n\}_{n=1}^\infty$ be a sequence the existence of which follows from Lemma 4.3.121. Then
$$\lim_{n\to\infty} \deg(f_n, \Omega, p)$$
exists and its value does not depend on $\{f_n\}_{n=1}^\infty$ whenever this sequence possesses the properties from Lemma 4.3.121.
Proof. Since $\operatorname{dist}(p, f(\partial\Omega))$ is positive, we conclude that $p \notin f_n(\partial\Omega)$ for all sufficiently large $n$ and the degrees $\deg(f_n, \Omega, p)$ are defined. Corollary 4.3.120 shows that the sequence
$$\{\deg(f_n, \Omega, p)\}_{n=1}^\infty$$
is eventually constant. This corollary also yields the independence of the limit of the sequence of degrees on the choice of $\{f_n\}_{n=1}^\infty$.
The previous corollary shows how the definition of the degree $\deg(f, \Omega, p)$ is extended to all triples $(f, \Omega, p)$ satisfying (H), namely by $\deg(f, \Omega, p) \coloneqq \lim_{n\to\infty} \deg(f_n, \Omega, p)$.
Remark 4.3.123. The approach which has just been described may be extended to a mapping
$$f: \mathcal{M} \to \tilde{\mathcal{M}}$$
between manifolds of the same dimension. We need to assume that both $\mathcal{M}$ and $\tilde{\mathcal{M}}$ are oriented (in order for the integral of an $M$-form to be defined; cf. Proposition 4.3.116, where instead of $J_f(x)$ we take the Jacobian of a realization of $f$). A set $\Omega \subset \mathcal{M}$ is supposed to be an open set with compact closure in the topology of $\mathcal{M}$. We consider the degree of $f \in C(\overline{\Omega}, \tilde{\mathcal{M}})$ with respect to a point $p \in \tilde{\mathcal{M}} \setminus f(\partial\Omega)$. Here $\partial\Omega$ is again the boundary of $\Omega$ in the topology of $\mathcal{M}$. The $M$-forms have their supports in some coordinate neighborhoods $V$ of $p$ which are disjoint with $f(\partial\Omega)$.
An analogue of Lemma 4.3.117 still holds. There are problems with an analogue of Corollary 4.3.120: we have to use a topology on the space of mappings $\mathcal{M} \to \tilde{\mathcal{M}}$ since we have not defined any metric on a manifold. The existence of an approximating sequence similar to that of Lemma 4.3.121 is not clear, either. These obstacles can be overcome but special tools are required. Since they are beyond the scope of this book, we only refer the interested reader to, e.g., the book Hirsch [67, Chapters 2 and 5].


We are now able to prove the main properties of the degree. See also Proposition 5.2.2 and Theorem 5.2.7. Notice that the following theorem is also true in the case of manifolds.
Theorem 4.3.124. There exists a mapping $\deg$ which sends any triple $(f, \Omega, p)$ satisfying (H) into $\mathbb{Z}$ and has the following properties:
(i) (normalization property) If $f$ is the identity map and $p \in \Omega$, then
$$\deg(f, \Omega, p) = 1.$$
(ii) (additivity property) If $\Omega_1$, $\Omega_2$ are disjoint open subsets of $\Omega$ and the point $p$ is such that $p \notin f(\overline{\Omega} \setminus (\Omega_1 \cup \Omega_2))$, then
$$\deg(f, \Omega, p) = \deg(f, \Omega_1, p) + \deg(f, \Omega_2, p).$$
(iii) (continuity property) Let $(f_n, \Omega, p)$, $n = 0, 1, \dots$, satisfy (H). If the sequence $\{f_n\}_{n=1}^\infty$ converges uniformly to $f_0$ on $\overline{\Omega}$, then
$$\lim_{n\to\infty} \deg(f_n, \Omega, p) = \deg(f_0, \Omega, p).$$
(iv) (translation invariance property) $\deg(f, \Omega, p) = \deg(f - p, \Omega, o)$.
(v) (solution property) If $\deg(f, \Omega, p) \neq 0$, then there exists an $x \in \Omega$ such that $f(x) = p$.
(vi) (homotopy invariance property) If $H: [0,1] \times \overline{\Omega} \to \mathbb{R}^M$ is continuous and (H) is satisfied for all $(H(t, \cdot), \Omega, p)$, $t \in [0,1]$, then
$$\deg(H(t, \cdot), \Omega, p)$$
is constant on $[0,1]$.
(vii) (boundary values dependence property) If $(f, \Omega, p)$, $(g, \Omega, p)$ satisfy (H) and $f$ and $g$ coincide on $\partial\Omega$, then
$$\deg(f, \Omega, p) = \deg(g, \Omega, p).$$
(viii) (point dependence property) The mapping
$$p \mapsto \deg(f, \Omega, p)$$
is constant on every component of $\mathbb{R}^M \setminus f(\partial\Omega)$.
(ix) (multiplication property) Let $\Omega$ be a bounded open set in $\mathbb{R}^M$ and let the mapping $f: \overline{\Omega} \to \mathbb{R}^M$ be continuous. Denote by $U_1, U_2, \dots$ all bounded components of $\mathbb{R}^M \setminus f(\partial\Omega)$. Suppose that the mapping $g: \mathbb{R}^M \to \mathbb{R}^M$ is continuous on $f(\overline{\Omega})$ and $p \notin g(f(\partial\Omega))$. Then$^{43}$
$$\deg(g \circ f, \Omega, p) = \sum_i \deg(f, \Omega, U_i)\,\deg(g, U_i, p) \tag{4.3.52}$$
where the sum contains only a finite number of nonzero terms.


43 Because of property (viii), $\deg(f, \Omega, U) \coloneqq \deg(f, \Omega, q)$ for a $q \in U$ is well defined for any component $U$ of $\mathbb{R}^M \setminus f(\partial\Omega)$.


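Proof. The degree is defined in Definition 5.2.1 for $f \in C(\overline{\Omega}, \mathbb{R}^M) \cap C^1(\Omega, \mathbb{R}^M)$ and a regular value $p \notin f(\partial\Omega)$ and has been extended above by the procedure started with Proposition 4.3.116. It follows from this construction that it is sufficient to prove all parts of Theorem 4.3.124 for $f \in C(\overline{\Omega}, \mathbb{R}^M) \cap C^1(\Omega, \mathbb{R}^M)$. Another proof is given on pages 270–271 (the proof of Proposition 5.2.2).
(i) This follows immediately from the definition.
(ii) This is a consequence of Proposition 4.3.116 since an $M$-form $\omega$ can be chosen in such a way that its support is disjoint with $f(\overline{\Omega} \setminus (\Omega_1 \cup \Omega_2))$. Then
$$\int_\Omega f^*\omega = \int_{\Omega_1} f^*\omega + \int_{\Omega_2} f^*\omega.$$
(iii) It is obtained directly from Corollary 4.3.122.
(iv) It follows from Proposition 4.3.116.
(v) This is slightly tricky. Suppose by contradiction that
$$f^{-1}(p) = \emptyset,$$
and choose four mutually disjoint, nonempty open subsets $\Omega_1, \dots, \Omega_4$ of $\Omega$. Then, by the additivity property, we have
$$\deg(f, \Omega, p) = \deg(f, \Omega_1, p) + \deg(f, \Omega_2, p) = \deg(f, \Omega_3, p) + \deg(f, \Omega_4, p),$$
and also
$$\deg(f, \Omega, p) = \deg(f, \Omega_1 \cup \Omega_2, p) + \deg(f, \Omega_3 \cup \Omega_4, p) = \dots = 2\deg(f, \Omega, p).$$
This contradicts the inequality $\deg(f, \Omega, p) \neq 0$.
(vi) It follows from the construction; see Lemma 4.3.119.
(vii) It is sufficient to apply property (vi) to
$$H(t, x) = t f(x) + (1-t) g(x),$$
since $H(t, x) = f(x) \neq p$ for all $x \in \partial\Omega$ and $t \in [0,1]$.
(viii) Choose an $M$-form $\omega$ supported in an open set $U$ with $U \cap f(\partial\Omega) = \emptyset$ and $\int_U \omega = 1$. Then, by definition,
$$\deg(f, \Omega, p) = \int_\Omega f^*\omega \qquad \text{for all } p \in U.$$
This means that the degree is locally constant on $\mathbb{R}^M \setminus f(\partial\Omega)$, and therefore it is constant on every component of $\mathbb{R}^M \setminus f(\partial\Omega)$.
(ix) If the equation
$$g(f(x)) = p$$
has no solution in $\Omega$, then the left-hand side of (4.3.52) vanishes (by the solution property). For the same reason all products on the right-hand side are also equal to zero. Suppose therefore that
$$f(\overline{\Omega}) \cap g^{-1}(p) \neq \emptyset.$$
There is exactly one unbounded component $U_0$ of $\mathbb{R}^M \setminus f(\partial\Omega)$. Regardless of whether $U_0 \cap g^{-1}(p) \neq \emptyset$ or not,
$$\deg(f, \Omega, U_0) = 0$$
(by (v) and (viii)). Since
$$g^{-1}(p) \cap f(\overline{\Omega})$$
is compact, there is only a finite number of bounded components of $\mathbb{R}^M \setminus f(\partial\Omega)$, say $U_1, \dots, U_k$, which contain some points of $g^{-1}(p)$ (different components are disjoint). According to property (ii) we have
$$\deg(g \circ f, \Omega, p) = \sum_{i=1}^k \deg(g \circ f, \Omega_i, p) \tag{4.3.53}$$
where $\Omega_i = f^{-1}(U_i)$. By definition, there is a neighborhood $V$ of $p$, $V \cap g(f(\partial\Omega)) = \emptyset$, and an $M$-form $\omega$ supported in $V$ with $\int_V \omega = 1$ for which
$$\deg(g \circ f, \Omega, p) = \int_\Omega (g \circ f)^*\omega.$$
First consider a component $U_i$ such that
$$\deg(g, U_i, p) = \int_{U_i} g^*\omega \neq 0.$$
In this case put
$$\omega_i \coloneqq \frac{1}{\deg(g, U_i, p)}\, g^*\omega|_{U_i}.$$
Then $\omega_i$ is an admissible $M$-form for the definition of $\deg(f, \Omega_i, U_i)$, and we have
$$\deg(g \circ f, \Omega_i, p) = \int_{\Omega_i} (g \circ f)^*\omega = \deg(g, U_i, p) \int_{\Omega_i} f^*\omega_i = \deg(g, U_i, p)\,\deg(f, \Omega_i, U_i). \tag{4.3.54}$$
Now consider a component $U_i$ such that
$$\deg(g, U_i, p) = \int_{U_i} g^*\omega = 0.$$
Let
$$g^*\omega(y) = \psi(y)\,dy^1 \wedge \dots \wedge dy^M.$$
If $\psi$ does not vanish on $U_i$, then there are disjoint open sets $U_i^+$ and $U_i^-$ which carry the positive part $\psi^+ = \max(\psi, 0)$ and the negative part $\psi^- = \min(\psi, 0)$, respectively, so that $\psi = \psi^+ + \psi^-$. Let $\Omega_i^\pm = f^{-1}(U_i^\pm)$. Then
$$\deg(f, \Omega_i, U_i) = \deg(f, \Omega_i^+, U_i^+) = \frac{\int_{\Omega_i^+} f^*(g^*\omega)}{\int_{U_i^+} \psi^+(y)\,dy}, \qquad \deg(f, \Omega_i, U_i) = \deg(f, \Omega_i^-, U_i^-) = \frac{\int_{\Omega_i^-} f^*(g^*\omega)}{\int_{U_i^-} \psi^-(y)\,dy}.$$
Notice that both denominators are nonzero since $\int_{U_i} g^*\omega = 0$ and $g^*\omega$ does not vanish everywhere. This means that we have
$$\deg(g \circ f, \Omega_i, p) = \int_{\Omega_i} f^*(g^*\omega) = \int_{\Omega_i^+} f^*(g^*\omega) + \int_{\Omega_i^-} f^*(g^*\omega) = \deg(f, \Omega_i, U_i)\left(\int_{U_i^+} \psi^+(y)\,dy + \int_{U_i^-} \psi^-(y)\,dy\right) = \deg(f, \Omega_i, U_i)\int_{U_i} \psi(y)\,dy = \deg(f, \Omega_i, U_i)\,\deg(g, U_i, p),$$
i.e., (4.3.54) holds in this case as well. Summing up and using (4.3.53) we complete the proof.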



Remark 4.3.125. It can be proved (see Amann & Weiss [5]) that the properties (i)–(iv) determine the degree uniquely.
We will conclude this appendix with some topological applications of the degree. Applications to differential equations are shown in Section 5.2. The well-known and basic Jordan Separation Theorem asserts that a Jordan curve $\gamma$ divides the complex plane into two open components and exactly one of them is bounded (the interior domain of $\gamma$). This theorem has the following generalization.
Theorem 4.3.126 (Generalized Jordan Separation Theorem). Let $K$ be a compact set in $\mathbb{R}^M$ such that $\mathbb{R}^M \setminus K$ has a finite number, say $k$, of components. If $f$ is a continuous injection of $K$ into $\mathbb{R}^M$, then $\mathbb{R}^M \setminus f(K)$ has exactly $k$ components.
Proof (a sketch). Notice first that $f$ is actually a homeomorphism of $K$ onto $f(K)$, i.e., $f^{-1}$ is continuous on the compact set $f(K)$. Applying the Tietze Theorem (see the footnote on page 236) to each coordinate $f^i$ of $f = (f^1, \dots, f^M)$ and of $f^{-1}$, we conclude that there are continuous extensions $g$ and $h$ of $f$ and $f^{-1}$, respectively, which are defined in $\mathbb{R}^M$. Denote by $G_0, \dots, G_{k-1}$ the components of $\mathbb{R}^M \setminus K$ where $G_0$ is the unique unbounded component. Similarly, let $U_0, \dots, U_m$ ($m \in \mathbb{N} \cup \{\infty\}$) denote the components of $\mathbb{R}^M \setminus f(K)$ where $U_0$ is the unique unbounded component. The idea of the proof consists in the application of the multiplication property of the degree (Theorem 4.3.124(ix)): Show that
$$\deg(h \circ g, G_j, p) = \delta_{ij} \quad \text{for } p \in G_i, \qquad \deg(g \circ h, U_i, q) = \delta_{ij} \quad \text{for } q \in U_j,$$
$i = 1, \dots, m$, $j = 1, \dots, k-1$. Define matrices
$$A \coloneqq \bigl(\deg(g, G_j, U_i)\bigr)_{i=1,\dots,m;\ j=1,\dots,k-1}, \qquad B \coloneqq \bigl(\deg(h, U_i, G_j)\bigr)_{j=1,\dots,k-1;\ i=1,\dots,m},$$
and show that
$$AB = I_m.$$
This means that $m \le k-1$. Similarly, $BA = I_{k-1}$. The equality $m = k-1$ follows.
Corollary 4.3.127 (Invariance of domain). Let $G \subset \mathbb{R}^M$ be an open set and $f: G \to \mathbb{R}^M$ a continuous injection. Then $f(G)$ is an open set.
Proof (a sketch). Show first that it is sufficient to prove the assertion for the case when $G$ is an open ball $B$ and $f$ is continuous and injective on $\overline{B}$. According to Theorem 4.3.126, $\mathbb{R}^M \setminus f(\partial B)$ has exactly two components $U_0$, $U_1$. If $U_0$ denotes the unbounded component, then
$$\mathbb{R}^M \setminus f(\overline{B}) \subset U_0$$
(again by Theorem 4.3.126), i.e., $U_1 \subset f(B)$. To show the opposite inclusion recall that $f(B)$ is bounded and connected.

Remark 4.3.128. If $M < N$ and $f: \mathbb{R}^M \to \mathbb{R}^N$ is a continuous injection, then it can be proved (not too easily) that the complement of $f(\mathbb{R}^M)$ is a dense set in $\mathbb{R}^N$. We want to recall the famous Peano curve, i.e., a continuous (but not injective) map from the interval $[0,1]$ onto the square $[0,1] \times [0,1]$.$^{44}$ This map can be used to construct a continuous surjection of $\mathbb{R}^M$ onto $\mathbb{R}^N$ for any $M < N$. The existence of such a surjection for $M \ge N$ is trivial.

44 This example (for the construction see, e.g., Dugundji [43, Section IV.4]) has played an important role in developing the notion of a curve.
As we have mentioned in Section 4.3, our main interest in degree theory lies in applying it to solving equations, i.e., in using the solution property (Theorem 4.3.124(v)). This requires computing the degree, which is by no means an easy task. Fortunately, we do not need to know the exact value of the degree; it is sufficient to show that it is not equal to zero. For this purpose the following mapping property is very important.
Definition 4.3.129.
(1) A nonempty subset $A$ of a linear space $X$ is said to be symmetric if for every $x \in A$ we have $-x \in A$.
(2) A mapping $f: A \subset X \to Y$, $X$, $Y$ linear spaces, is called an odd mapping on a symmetric set $A$ if
$$f(-x) = -f(x) \qquad \text{for each } x \in A.$$

Theorem 4.3.130 (Borsuk Antipodal Theorem). Let $\Omega$ be a bounded, open, symmetric subset of $\mathbb{R}^M$ and $o \in \Omega$. Let $f: \overline{\Omega} \to \mathbb{R}^M$ be a continuous mapping whose restriction to $\partial\Omega$ is odd. If $o \notin f(\partial\Omega)$, then $\deg(f, \Omega, o)$ is an odd integer. In particular,
$$\deg(f, \Omega, o) \neq 0,$$
and there is a solution of the equation
$$f(x) = o$$
in $\Omega$.
There are several proofs of this important topological theorem. For the proof based
on algebraic machinery see, e.g., Dugundji [43, Section XVI.6]. Here we present the main
steps of an analytic proof which is taken from Schwartz [118] (see also Nirenberg [100]
or Rothe [111] or Krawcewicz & Wu [80]).
Proof of Theorem 4.3.130. The assertion is obvious for $M = 1$; therefore, we assume that $M > 1$. The main idea of this proof is quite simple. First observe that $\deg(f, \Omega, o)$ does not depend on a continuous extension of $f$ from $\partial\Omega$ into $\overline{\Omega}$ (see Theorem 4.3.124(vii)). Since $o \in \Omega$, there is a small open ball $B \coloneqq B(o; \delta)$ with $\overline{B} \subset \Omega$, and there is a continuous mapping $g: \overline{\Omega} \to \mathbb{R}^M$ such that
$$g(x) = \begin{cases} f(x) & \text{for } x \in \partial\Omega, \\ x & \text{for } x \in \overline{B} \end{cases}$$
(by the Tietze Theorem). Part (ii) of Theorem 4.3.124 implies that
$$\deg(g, \Omega, o) = \deg(g, \Omega \setminus \overline{B}, o) + \deg(g, B, o) = \deg(g, \Omega \setminus \overline{B}, o) + 1.$$
If $g$ is constructed in such a way that $g \in C^1(\Omega \setminus \overline{B})$ and $o$ is a regular value of $g$, then
$$\deg(g, \Omega \setminus \overline{B}, o) = \sum_{x \in S} \operatorname{sgn} J_g(x) \qquad \text{where } S = \{x \in \Omega \setminus \overline{B} : g(x) = o\}.$$
If, moreover, $g$ is odd, then $S$ is either the empty set or a symmetric set and $g'(-x) = g'(x)$; hence the points $x$ and $-x$ of $S$ contribute equally to the sum, $\deg(g, \Omega \setminus \overline{B}, o)$ is an even integer, and the proof is completed. Unfortunately, it is not clear whether such a $g$ exists. Therefore, we will show that not all of these properties of $g$ are actually needed. The demand for regularity of $o$ can be replaced by the assumption that $g$ does not vanish on a part of a hyperplane, for instance on
$$H \coloneqq \{(x_1, \dots, x_{M-1}, 0) \in \Omega \setminus \overline{B}\}.$$
Indeed, in this case we have
$$\deg(g, \Omega \setminus \overline{B}, o) = \deg(g, H_+, o) + \deg(g, H_-, o)$$
where
$$H_\pm \coloneqq \{(x_1, \dots, x_M) \in \Omega \setminus \overline{B} : \pm x_M > 0\}.$$
Moreover,
$$\deg(g, H_+, o) = \int_{H_+} g^*\omega = \int_{H_-} g^*\omega = \deg(g, H_-, o)$$
since $J_g(-x) = J_g(x)$ and the mapping $x \in H_+ \mapsto -x$ is a diffeomorphism of $H_+$ onto $H_-$. It can be shown, by smooth approximation, that the equality
$$\deg(g, H_+, o) = \deg(g, H_-, o)$$
holds also for continuous $g$. Therefore, the core of the proof is the following substantial strengthening of the Tietze Theorem for odd mappings.

Lemma 4.3.131. Let $D$ be a bounded, open, symmetric subset of $\mathbb{R}^M$ and $o \notin \overline{D}$. Let $f$ be a continuous mapping from $\partial D$ into $\mathbb{R}^M$ which is odd and nowhere zero on $\partial D$. Then there exists a continuous odd extension $g: \overline{D} \to \mathbb{R}^M$ such that
$$g(x) \neq o \qquad \text{for all } x \in H \coloneqq \{(x_1, \dots, x_M) \in \overline{D} : x_M = 0\}.$$
To elucidate the problems of the construction, we note that the requirement of oddness of $g$ does not cause difficulties. The crucial point is that we need $g$ to be nowhere zero on $H$. The following simple example shows that the existence of such an extension is not obvious. Let $D = (-1, 1)$, $f(-1) = -1$, $f(1) = 1$. Then any continuous extension of $f$ has a zero point in $D$.
Proof of Lemma 4.3.131. We assume again $M > 1$. The notation $H_\pm$ is used similarly as above. The key point is an odd, nowhere zero, continuous extension of $f$ from $\partial D \cap H$ to $\tilde f: H \to \mathbb{R}^M$. See the above example and notice the distinction, namely that $\dim H = M - 1$ and $\tilde f$ is a map into $\mathbb{R}^M$. Having such an extension, the rest of the proof is an application of the Tietze Theorem:
$$g(x) = \begin{cases} f(x), & x \in \partial D, \\ \tilde f(x), & x \in H, \\ \tilde g(x), & x \in H_+, \\ -\tilde g(-x), & x \in H_-, \end{cases}$$
where $\tilde g$ is the Tietze extension of $f$ and $\tilde f$.
The existence of an extension $\tilde f$ follows from the following slightly more general assertion:



Let $G$ be a bounded, open, symmetric subset of $\mathbb{R}^N$, $o \notin \overline{G}$, and let the mapping $f: \partial G \to \mathbb{R}^M$ be continuous, odd and nowhere zero. If $N < M$, then there exists a continuous, odd and nowhere zero extension $\tilde f: \overline{G} \to \mathbb{R}^M$.

We will prove this assertion by induction with respect to the dimension $N$.
If $N = 1$, then
$$\overline{G} \subset [-\beta, -\alpha] \cup [\alpha, \beta] \qquad \text{for some } 0 < \alpha < \beta < \infty,$$
and we need to find a continuous extension to $[\alpha, \beta]$ which is nowhere zero, define it on $[-\beta, -\alpha]$ to be odd and restrict it to $\overline{G}$.
The induction step is done similarly: First use the induction hypothesis for an extension to $\overline{G} \cap \mathbb{R}^{N-1}$, and then extend it into the upper half-space. In order to show that such an extension actually exists (also for $N = 1$) we need the following key result:
Let $K$ be a compact subset of $\mathbb{R}^N$ and let $f$ be a continuous nowhere-zero mapping from $K$ into $\mathbb{R}^M$ where $N < M$. Then for any compact set $L \supset K$ there exists a continuous $\tilde f: L \to \mathbb{R}^M$ which extends $f$ and is nowhere zero on $L$.
Recall again the above example to see the obstacles in the proof. Denote
$$c \coloneqq \min_{x \in K} |f(x)|$$
and choose $\varepsilon \in \left(0, \frac{c}{2}\right)$. First we prove the existence of a smooth approximation which is defined on a neighborhood of $L$. By the Tietze Theorem, there is a continuous extension $f_1: L \to \mathbb{R}^M$ of $f$. This $f_1$ can be smoothly approximated on $L$ (see the proof of Lemma 4.3.121): Take $\varphi_1 \in C^1(U)$, where $U$ is a neighborhood of $L$, such that
$$\|f_1 - \varphi_1\|_{C(L)} < \frac{\varepsilon}{2}.$$
Put
$$\Phi_1(x_1, \dots, x_M) = \varphi_1(x_1, \dots, x_N) \qquad \text{for } x = (x_1, \dots, x_N, \dots, x_M) \in U \times \mathbb{R}^{M-N}.$$
In particular, this means that
$$\det \Phi_1'(x) = 0,$$
i.e., all points of $\Phi_1(U \times \mathbb{R}^{M-N}) = \varphi_1(U)$ are critical values of $\Phi_1$. According to the Sard Theorem (Theorem 5.2.3)
$$\operatorname{meas} \varphi_1(U) = 0$$
($\operatorname{meas}$ is the Lebesgue measure in $\mathbb{R}^M$), and $\mathbb{R}^M \setminus \varphi_1(U)$ is dense.$^{45}$ Therefore, there is a point $y_0 \in \mathbb{R}^M \setminus \varphi_1(U)$ such that $\|y_0\| < \frac{\varepsilon}{2}$. Put $\varphi_2 = \varphi_1 - y_0$. Then $\varphi_2(x) \neq o$ for every $x \in L$ and
$$\|f - \varphi_2\|_{C(K)} < \varepsilon.$$
Moreover,
$$\|\varphi_2(x)\| \ge \frac{c}{2} \qquad \text{for } x \in K.$$
We can assume that the last inequality holds for all $x \in L$. Otherwise, we multiply $\varphi_2$ by the function $\frac{c}{2\|\varphi_2(\cdot)\|}$ outside the set where $\|\varphi_2(x)\| \ge \frac{c}{2}$. Since the Tietze extension retains upper and lower bounds, we can extend $f - \varphi_2$ from the set $K$ to a continuous mapping $\psi$ on $L$ for which
$$\|\psi(x)\| < \varepsilon, \qquad x \in L.$$
It remains to put
$$\tilde f(x) = \varphi_2(x) + \psi(x), \qquad x \in L.$$
Then $\tilde f = f$ on $K$ and $\|\tilde f(x)\| \ge \frac{c}{2} - \varepsilon > 0$ for all $x \in L$.

The above proof is cumbersome and seems to be endless. So we recommend that the reader go through the main steps once again, not checking all the technicalities but concentrating on their main ideas.
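Corollary 4.3.132 (Borsuk–Ulam). Let $f$ be a continuous mapping from the $M$-dimensional sphere $S^M \subset \mathbb{R}^{M+1}$ into $\mathbb{R}^M$. Then there is a point $x_0 \in S^M$ such that
$$f(x_0) = f(-x_0).$$
Proof. Extend
$$\varphi(x) = f(x) - f(-x)$$
to a mapping from the closed unit ball $\overline{B(o; 1)} \subset \mathbb{R}^{M+1}$ into $\mathbb{R}^M$, which is viewed as the subset $\mathbb{R}^M \times \{0\}$ of $\mathbb{R}^{M+1}$. If $o \in \varphi(S^M)$, the proof is complete. For the case $o \notin \varphi(S^M)$, the application of the Borsuk Theorem (note that $\varphi$ is odd on $S^M$) yields
$$\deg(\varphi, B(o; 1), o) \neq 0.$$
By Theorem 4.3.124(viii),
$$\deg(\varphi, B(o; 1), o) = \deg(\varphi, B(o; 1), y) \qquad \text{where } y = (0, \dots, 0, 1) \in \mathbb{R}^{M+1},$$
i.e., the equation
$$\varphi(x) = y$$
has a solution in $B(o; 1)$. However, this is impossible since
$$\varphi(B(o; 1)) \subset \mathbb{R}^M \times \{0\}.$$
For more information in this direction see Schwartz [118].

45 In fact we do not need here the whole strength of the Sard Theorem. The following much weaker result is sufficient:
Let $G \subset \mathbb{R}^M$ be an open set and $\varphi: G \to \mathbb{R}^M$ a $C^1$-mapping on $G$. If $A \subset G$ has Lebesgue measure zero, so has $\varphi(A)$.
Indeed, every point of $A$ belongs to a ball $B \subset G$ on which $\|\varphi'\|$ is bounded. By the Mean Value Theorem, there is a constant $K$ such that
$$\|\varphi(x) - \varphi(y)\| \le K\|x - y\|, \qquad x, y \in B. \tag{$*$}$$
Since $\mathbb{R}^M$ is separable, the set $A$ can be covered by countably many such balls $\{B_j\}_{j \in \mathbb{N}}$, i.e., $A = \bigcup_{j=1}^\infty (A \cap B_j)$, and $\varphi(A) = \bigcup_{j=1}^\infty \varphi(A \cap B_j)$. To complete the proof we show that $\operatorname{meas} \varphi(A \cap B_j) = 0$, $j \in \mathbb{N}$. So take $\varepsilon > 0$. Since $\operatorname{meas}(A \cap B_j) = 0$, there are countably many cubes (or balls) $\{Q_k\}_{k \in \mathbb{N}}$ such that $A \cap B_j \subset \bigcup_{k=1}^\infty Q_k$, $\sum_{k=1}^\infty \operatorname{meas} Q_k < \varepsilon$, and the estimate $(*)$ holds for all $Q_k$ with the same constant $K$. This implies
$$\operatorname{meas} \varphi(A \cap B_j) \le \sum_{k=1}^\infty \operatorname{meas} \varphi(Q_k) \le c \sum_{k=1}^\infty \operatorname{meas} Q_k < c\,\varepsilon,$$
where the constant $c$ depends only on the dimension $M$ and on $K$. Since $\varepsilon > 0$ is arbitrary, $\operatorname{meas} \varphi(A \cap B_j) = 0$ for all $j \in \mathbb{N}$. To obtain the result required in the proof above take
$$A = U \times \{\underbrace{(0, \dots, 0)}_{(M-N)\text{-tuple}}\}.$$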
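For $M = 1$ the Borsuk–Ulam Theorem can be made constructive. If $f: S^1 \to \mathbb{R}$ is continuous and points of $S^1$ are parametrized by the angle $\theta$, then $h(\theta) = f(\theta) - f(\theta + \pi)$ satisfies $h(\theta + \pi) = -h(\theta)$, so $h$ changes sign on $[0, \pi]$ and bisection locates an antipodal pair $x_0$, $-x_0$ with equal values. This one-dimensional sketch is our own illustration (the function name is hypothetical); for $M \ge 2$ the degree argument above is needed.

```python
import math


def borsuk_ulam_circle(f, tol=1e-12):
    """Return an angle theta with f(theta) = f(theta + pi) (up to tol), for a
    continuous 2*pi-periodic f, i.e. a continuous function on S^1."""
    def h(t):
        return f(t) - f(t + math.pi)   # h(t + pi) = -h(t)

    a, b = 0.0, math.pi
    ha = h(a)
    if ha == 0.0:
        return a
    # h(a) and h(b) = -h(a) have opposite signs; bisect while keeping that.
    while b - a > tol:
        m = 0.5 * (a + b)
        hm = h(m)
        if hm == 0.0:
            return m
        if (hm > 0) == (ha > 0):
            a, ha = m, hm
        else:
            b = m
    return 0.5 * (a + b)


def f(t):
    return math.cos(t) + 0.3 * math.sin(2 * t) + 0.5 * math.sin(t) + 0.2


theta = borsuk_ulam_circle(f)
x0 = (math.cos(theta), math.sin(theta))   # the antipodal pair is x0, -x0
print(x0, f(theta), f(theta + math.pi))
```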


Exercise 4.3.133. Deduce the classical Jordan Separation Theorem from Theorem 4.3.126!
Hint. A Jordan curve is homeomorphic to $S^1$.
Exercise 4.3.134. Show that there is no continuous injection of $\mathbb{R}^M$ into $\mathbb{R}^N$ whenever $M > N$!
Hint. Assume by contradiction that $\varphi$ is such a mapping and put
$$f(x) = (\varphi(x), 0, \dots, 0).$$
Apply Corollary 4.3.127. For another proof see Exercise 4.3.137.
Exercise 4.3.135. Let $\Omega$ be a ball in $\mathbb{C}$ with sufficiently large radius and let $P$ be a polynomial of degree $n \ge 1$. Show that
$$\deg(P, \Omega, 0) = n.$$
Hint. For $P(x) = \sum_{k=0}^n a_k x^k$, $a_n \neq 0$, use the homotopy
$$H(t, x) = tP(x) + (1-t)a_n x^n \qquad \text{on } [0,1] \times \partial\Omega.$$
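The assertion of the exercise can be checked numerically for particular polynomials. For $M = 2$, identified with $\mathbb{C}$, we take for granted the standard fact (not proved in this section) that $\deg(P, B(0; R), 0)$ equals the winding number of the closed curve $\theta \mapsto P(Re^{i\theta})$ around $0$, provided $P$ does not vanish on $|z| = R$. The sketch below (our own illustration; the function name is hypothetical) sums the increments of the argument along a fine discretization of the circle, which must be fine enough that each increment is smaller than $\pi$ in absolute value.

```python
import cmath
import math


def winding_number(F, radius, samples=4096):
    """Winding number around 0 of the closed curve t -> F(radius * e^{it}).

    Assumes F has no zero on the circle |z| = radius and that the sampling is
    fine enough for consecutive values to differ in argument by less than pi."""
    pts = [F(radius * cmath.exp(2j * math.pi * k / samples)) for k in range(samples)]
    total = 0.0
    for w0, w1 in zip(pts, pts[1:] + pts[:1]):   # last step closes the curve
        total += cmath.phase(w1 / w0)            # argument increment in [-pi, pi]
    return round(total / (2 * math.pi))


def P(z):
    return z**3 - 2 * z + 5                      # polynomial of degree n = 3


print(winding_number(P, 10.0))                   # 3: all zeros lie in |z| < 10
print(winding_number(lambda z: z**2 - 4, 1.0))   # 0: the zeros +-2 lie outside
```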

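Exercise 4.3.136. Let $f$ be a continuous odd mapping from $S^M = \partial B(o; 1) \subset \mathbb{R}^{M+1}$ into $\mathbb{R}^{M+1} \setminus \{o\}$. Show that there is no continuous extension of $f$ to
$$\tilde f: \overline{B(o; 1)} \to \mathbb{R}^{M+1} \setminus \{o\}.$$
(This is Borsuk's original formulation of Theorem 4.3.130.)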
Exercise 4.3.137. Deduce the assertion of Exercise 4.3.134 from the Borsuk–Ulam Theorem (Corollary 4.3.132).
Exercise 4.3.138. Prove the following result due to Lusternik and Schnirelmann:
Let $F_1, \dots, F_{M+1}$ be closed sets which cover $S^M$. Then at least one $F_i$ contains a pair of antipodal points (i.e., $x, -x \in F_i$).
Hint. Let
$$\sigma(x) \coloneqq -x, \qquad x \in S^M,$$
and suppose that
$$\sigma(F_i) \cap F_i = \emptyset \qquad \text{for } i = 1, \dots, M.$$
There are continuous functions $f^i: S^M \to [0,1]$ such that
$$f^i(F_i) = \{0\}, \qquad f^i(\sigma(F_i)) = \{1\}$$
(this consequence of the Tietze Theorem is known in a normal topological space as the Urysohn Lemma). Put $f = (f^1, \dots, f^M)$ and apply the Borsuk–Ulam Theorem to obtain a point $x_0$. Show that
$$x_0 \in F_{M+1} \cap \sigma(F_{M+1}).$$


Exercise 4.3.139. Prove the following complement of the above covering result of Lusternik and Schnirelmann:
There exist closed sets $F_1, \dots, F_{M+2}$ which cover $S^M$ and such that no $F_i$ contains a pair of antipodal points.
Hint. Proceed by induction with respect to $M$. The assertion is obviously true for $M = 1$. Let $M = 2$. Cover the equator of $S^2$ with three closed sets $E_1$, $E_2$ and $E_3$ with $E_j \cap (-E_j) = \emptyset$, $j = 1, 2, 3$. Then choose a latitude $L$ on the southern hemisphere and extend the cover of the equator to a covering $A_1$, $A_2$ and $A_3$ of the set of all points which lie to the north of $L$, including those of latitude $L$ (see Figure 4.3.27).
Figure 4.3.27. [The sphere $S^2$ with the sets $E_1$, $E_2$, $E_3$ on the equator, the sets $A_1$, $A_2$, $A_3$ north of the latitude $L$ and the set $A_4$ south of $L$.]
Here $A_j$ consists of all great circle arcs from latitude $L$ to the north pole which contain a point of $E_j$. Finally, let $A_4$ consist of all points lying to the south of $L$, including those of latitude $L$. Then $A_1$, $A_2$, $A_3$ and $A_4$ form the desired covering of $S^2$. Continue the argument for $M \ge 3$.
Exercise 4.3.140. Prove the following Bread–Ham–Cheese Theorem:
If $B_1, \dots, B_M$ are bounded measurable sets in $\mathbb{R}^M$ with $M \ge 1$, then there is an $(M-1)$-dimensional plane which divides each of the sets $B_j$ into two parts of the same measure.
This assertion can be reformulated in three dimensions as follows:
Suppose we have a sandwich of bread, ham and cheese with ham and cheese piled attractively but irregularly on the bread. Then the sandwich can be cut in two with one straight slash of a knife in such a way that each of two persons gets an identical share of bread, ham and cheese.
Hint. Let $M = 2$ and let $d \in S^1$ determine a direction in $\mathbb{R}^2$. Take a line perpendicular to $d$ and move it from $-\infty$ to $+\infty$ (see Figure 4.3.28).
Figure 4.3.28. [The set $B_1$, the origin $o$ and a line $H_d$ perpendicular to the direction $d$.]
Take the first and the last perpendicular (they need not be distinct) which splits the set $B_1$ into two parts of the same measure. The perpendicular $H_d$ which is half-way between these two has the equation
$$(x, d) = a(d)$$
where $a: S^1 \to \mathbb{R}$ satisfies
$$a(-d) = -a(d).$$
In order to find $d$ for which $H_d$ also splits $B_2$ into two parts of the same measure, we set
$$f(d) \coloneqq \operatorname{meas}\{x \in B_2 : (x, d) > a(d)\}.$$
Then $f: S^1 \to \mathbb{R}$ is continuous and
$$f(d) + f(-d) = \operatorname{meas} B_2.$$
By the Borsuk–Ulam Theorem, there is a point $d$ such that $f(d) = f(-d)$. Thus the corresponding $H_d$ divides $B_2$ into two parts of the same measure.
If $M \ge 3$, then construct functions $f_2, \dots, f_M$ corresponding to $B_2, \dots, B_M$.

Chapter 5

Topological and Monotonicity Methods
5.1 Brouwer and Schauder Fixed Point Theorems
One of the most frequent problems in analysis, especially in its applications, consists in solving the equation
$$F(x) = y$$
where $F$ is a mapping from a Banach space $X$ into a Banach space $Y$.$^1$ Such an equation can be reduced to the equation $F(x) = o$, or, provided $X \subset Y$, to the equation
$$F(x) = x. \tag{5.1.1}$$
In this section we present two basic results on the solvability of (5.1.1) in special cases, namely, for a continuous mapping $F$ and a finite dimensional $X$, and for a compact mapping $F$ in a general Banach space of infinite dimension: the Brouwer and the Schauder Fixed Point Theorems.
We start with the finite dimensional case. A brief inspection of $F: \mathbb{R} \to \mathbb{R}$ indicates that reasonable assumptions on $F$ are continuity on a closed interval $I$ and $F(I) \subset I$. Moreover, the interval $I$ should also be bounded. The Intermediate Value Theorem from Calculus applied to $g(x) = F(x) - x$ says that there is a solution of (5.1.1) in $I$ provided these assumptions are satisfied. Notice that these assumptions are too weak to say anything about the number of solutions. Since there is no appropriate ordering in $\mathbb{R}^2$, standard proofs of the above result fail in $\mathbb{R}^2$ and, therefore, a generalization is far from being simple.
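In one dimension the argument is constructive. A minimal sketch (our own illustration; the function name is hypothetical) applies bisection to $g(x) = F(x) - x$ on $I = [a, b]$: since $F(I) \subset I$, we have $g(a) \ge 0 \ge g(b)$, and halving the interval while keeping this sign pattern converges to a fixed point. Bisection returns one fixed point, in line with the remark that the assumptions say nothing about their number.

```python
import math


def fixed_point_1d(F, a, b, tol=1e-12):
    """Bisection for g(x) = F(x) - x on [a, b], assuming F is continuous with
    F([a, b]) contained in [a, b], so that g(a) >= 0 >= g(b)."""
    def g(x):
        return F(x) - x

    if g(a) == 0.0:
        return a
    if g(b) == 0.0:
        return b
    while b - a > tol:
        m = 0.5 * (a + b)
        if g(m) >= 0.0:
            a = m          # invariant: g(a) >= 0
        else:
            b = m          # invariant: g(b) < 0
    return 0.5 * (a + b)


x = fixed_point_1d(math.cos, 0.0, 1.0)   # cos maps [0, 1] into [cos 1, 1]
print(x, math.cos(x) - x)
```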
1 Spaces $X$, $Y$ are assumed to have a linear and topological structure since we discuss problems of analysis and not only of algebra or topology. Banach space structures are assumed here mainly for simplification, but sometimes they can be crucial.


Chapter 5. Topological and Monotonicity Methods

Instead of an interval we consider the closed unit ball $B \coloneqq B(o; 1)$ in $\mathbb{R}^N$, $N \ge 2$, and a continuous mapping $F: B \to B$. Suppose that the equation (5.1.1) has no solution in $B$. Define a map $G$ as indicated in Figure 5.1.1, i.e.,
$$G(x) = \lambda(x)F(x) + (1 - \lambda(x))x$$
where $\lambda(x) \le 0$ is a solution of the quadratic equation
$$\|\lambda F(x) + (1 - \lambda)x\|^2 = 1.$$
The mapping $G$ is well defined (remember our assumption that $F(x) \neq x$ for $x \in B$), it maps $B$ continuously onto the unit sphere $S^{N-1}$ in $\mathbb{R}^N$ and
$$G(x) = x \qquad \text{for } \|x\| = 1.$$
Figure 5.1.1. [The points $F(x)$ and $x$ in the ball $B$, and the point $G(x)$ on the unit sphere.]
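The value $G(x)$ is explicit. Writing $v = F(x) - x \neq o$, the quadratic equation becomes $\|v\|^2\lambda^2 + 2(x, v)\lambda + \|x\|^2 - 1 = 0$. The product of its roots is $(\|x\|^2 - 1)/\|v\|^2 \le 0$, so for $\|x\| < 1$ there is exactly one root $\lambda(x) < 0$; for $\|x\| = 1$ the roots are $0$ and $-2(x, v)/\|v\|^2 > 0$ (because $(x, F(x)) < 1$), so $\lambda(x) = 0$ and $G(x) = x$. The Python sketch below is our own illustration (the function name is hypothetical) and evaluates $G(x)$ for given vectors $x$ and $F(x)$ only. It is a pointwise computation, since, by the theorem being proved, no continuous fixed-point-free $F: B \to B$ actually exists.

```python
import math


def retraction_point(x, Fx):
    """Return G(x) = lam*F(x) + (1 - lam)*x with lam <= 0 and |G(x)| = 1,
    for a point x of the closed unit ball and a value F(x) != x in the ball."""
    v = [fi - xi for fi, xi in zip(Fx, x)]          # v = F(x) - x
    vv = sum(c * c for c in v)                      # |v|^2 > 0
    xv = sum(a * b for a, b in zip(x, v))           # (x, v)
    xx = sum(c * c for c in x)                      # |x|^2 <= 1
    disc = max(xv * xv - vv * (xx - 1.0), 0.0)      # quarter discriminant, >= 0
    lam = (-xv - math.sqrt(disc)) / vv              # the root lam <= 0
    return [xi + lam * vi for xi, vi in zip(x, v)]  # x + lam * (F(x) - x)


print(retraction_point([0.2, -0.1], [0.5, 0.4]))
print(retraction_point([0.6, 0.8], [0.0, 0.0]))    # |x| = 1, so G(x) = x up to rounding
```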

However, this seems to be impossible, as our experience says that if the ball is continuously deformed (by $G$) onto the sphere, the ball has to be punctured. Nevertheless, a rigorous proof of this fact is far from being obvious. Such a proof could be based on introducing certain topological notions which are preserved under continuous deformation. If we show that some of these notions are different for the ball and its boundary, we obtain a contradiction with the existence of $G$. Algebraic topology is devoted to the study of topological invariants of an algebraic nature (homotopy groups, homology groups, etc.). However, such methods are beyond the scope of this book.
Instead, we will give an analytic proof of the existence of a fixed point of a continuous mapping $F: B \to B$. This proof, which is due to Milnor [95], is based on the idea of approximating a "bad" nonlinear mapping $F$ by a simpler one. A smooth approximation is possible by the Weierstrass Approximation Theorem (see Theorem 1.2.14 and the discussion there).
Suppose, by contradiction, that $F$ has no fixed point in $B$. For any $\varepsilon > 0$ there are polynomials $P_1, P_2, \dots, P_N$ of $N$ variables such that for $P = (P_1, \dots, P_N)$ we have
$$\sup_{\|x\| \le 1} \|F(x) - P(x)\| < \varepsilon$$

5.1. Brouwer and Schauder Fixed Point Theorems


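and also that $\frac{P}{1+\varepsilon}: B \to B$. Moreover, if $\varepsilon$ is small enough, then $\frac{P}{1+\varepsilon}$ has no fixed points in $B$, either. This follows from the estimate
$$\|F(x) - x\| \le \|F(x) - P(x)\| + (1+\varepsilon)\left\|\frac{P(x)}{1+\varepsilon} - x\right\| + \varepsilon\|x\| < 2\varepsilon + (1+\varepsilon)\left\|\frac{P(x)}{1+\varepsilon} - x\right\|$$
and the fact that $\inf_{x \in B} \|F(x) - x\| > 0$, since $B$ is compact. Now we construct the mapping $G: B \to S^{N-1}$ corresponding to $\frac{P}{1+\varepsilon}$ as has been shown in Figure 5.1.1 for $F$. We put
$$H(t, x) \coloneqq (1-t)x + tG(x), \qquad x \in B.$$
The most important properties of $H$ are given in the next lemma.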
Lemma 5.1.1.
(i) $H(t, \cdot)$ maps $B$ into itself for every $t \in [0,1]$.
(ii) $H(t, \cdot)$ maps $\operatorname{int} B$ into itself for every $t \in [0,1)$.
(iii) The partial Fréchet derivative $H_2'(t, x)$ exists on $[0,1] \times \operatorname{int} B$ and is bounded on this set.
(iv) For small $t \ge 0$ the mapping $H(t, \cdot)$ is a diffeomorphism of $\operatorname{int} B$ onto itself.
Proof. The first two statements are obvious; the third follows from differentiation of $\lambda(x)$ (see Exercise 5.1.18). Let us prove the fourth statement. For a small positive $t$ the derivative
$$H_2'(t, x) = (1-t)I + tG'(x)$$
is an isomorphism (Proposition 2.1.2) and, by the Local Inverse Function Theorem, $H(t, \cdot)$ is a local diffeomorphism. Since, by the Mean Value Theorem applied to $G$,
$$\|H(t,x) - H(t,y)\| \ge (1-t)\|x - y\| - t\sup_{\|z\| < 1}\|G'(z)\|\,\|x - y\| \ge (1 - ct)\|x - y\|$$
for a constant $c$ and all $x, y \in \operatorname{int} B$, the mapping $H(t, \cdot)$ is injective for small $t$ and hence it is also a diffeomorphism on the whole $\operatorname{int} B$. It remains to prove that
$$H(t, \cdot)(\operatorname{int} B) = \operatorname{int} B.$$
Notice that $\operatorname{int} B$ is a connected set and thus it is sufficient to show that
$$M \coloneqq H(t, \cdot)(\operatorname{int} B)$$
is open and relatively closed in $\operatorname{int} B$. The former property is a consequence of the local continuous invertibility of $H(t, \cdot)$. To see the latter property we point out that
$$M = H(t, \cdot)(B) \cap \operatorname{int} B$$
($\|H(t,x)\| = 1$ for every $\|x\| = 1$). Since $B$ is compact, $H(t, \cdot)(B)$ is also compact and therefore closed, i.e., $M$ is relatively closed in $\operatorname{int} B$.


Chapter 5. Topological and Monotonicity Methods

Having Lemma 5.1.1, we continue with the proof of our main statement on the existence of a fixed point of $F$. To reach a contradiction with the assumption of non-existence, we use the substitution theorem for the Lebesgue integral. If $\operatorname{meas} A$ denotes the $N$-dimensional Lebesgue measure of $A \subset \mathbb{R}^N$, we have
$$\operatorname{meas}(\operatorname{int} B) = \int_{\operatorname{int} B} dx = \int_{\operatorname{int} B} \det H_2'(t,y)\, dy \tag{5.1.2}$$
for small positive $t$.
Notice that $H_2'(0,y) = I$, and thus $\det H_2'(t,y)$ is positive for small $t > 0$. The second equality follows from the substitution $x = H(t,y)$ (Lemma 5.1.1(iv)). The last integral in (5.1.2) is defined for all $t \in [0,1]$, and it is a polynomial $Q(t)$ in the variable $t \in [0,1]$. By (5.1.2), $Q(t) = \operatorname{meas}(\operatorname{int} B)$ for small $t$; since a polynomial which is constant on an interval is constant everywhere, we also obtain that $Q(1) = Q(0) = \operatorname{meas}(\operatorname{int} B)$.
The substitution² $z = G(y) = H(1,y)$ yields
$$Q(1) = \int_{\operatorname{int} B} \det G'(y)\, dy = \int_{G(\operatorname{int} B)} dz = \operatorname{meas} G(\operatorname{int} B).$$
But $G(B) \subset S^{N-1}$ and $\operatorname{meas} S^{N-1} = 0$. Hence
$$0 = Q(1) = Q(0) = \operatorname{meas}(\operatorname{int} B),$$
a contradiction.
In order to get a fixed point theorem in reasonable generality, we prove the following simple topological result.
Lemma 5.1.2. Let $K$ be a convex, closed and bounded subset of $\mathbb{R}^N$ which contains at least two different points. Then $K$ is homeomorphic to the unit ball $B^M := B(o;1)$ in $\mathbb{R}^M$ for some $M \le N$.
Proof. Choose linearly independent elements $x_1, \dots, x_M$ of $K$ such that
$$X := \operatorname{Lin}\{x_1, \dots, x_M\}$$
contains $K$. The existence of $x_0 \in K$ such that for any $x \in X$ there is $\lambda > 0$ with $x_0 + \frac{1}{\lambda}x \in K$ can be proved by induction with respect to the dimension of $X$. For the sake of simplicity we assume that $x_0 = o$, and define the Minkowski functional $p$ of $K$ and the mapping $\varphi$ by
$$p(x) = \inf\Big\{\lambda > 0 : \frac{x}{\lambda} \in K\Big\}, \qquad \varphi(x) = p(x)\,\frac{x}{\|x\|_{\mathbb{R}^N}}, \quad x \in X \setminus \{o\}, \qquad \varphi(o) = o.$$
It is not difficult to prove that $\varphi$ is a homeomorphism of $K$ onto $B^N \cap X$. Since $X$ with the induced $\mathbb{R}^N$-norm is isomorphic to $\mathbb{R}^M$ (Corollary 1.2.11(i)), $K$ is also homeomorphic to $B^M$.
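To make the map $x \mapsto p(x)\,x/\|x\|$ from the proof of Lemma 5.1.2 concrete, the sketch below assumes the hypothetical set $K = [-1,1]^2 \subset \mathbb{R}^2$ with $x_0 = o$; for this particular $K$ the Minkowski functional is the maximum norm. The names `p`, `phi` and `phi_inverse` are illustrative only. The map keeps every direction and rescales radially, sending the square onto the closed unit disc, and its inverse rescales back.

```python
import math

def p(x):
    """Minkowski functional of the hypothetical set K = [-1, 1]^2 (x0 = o):
    inf{lam > 0 : x / lam in K}, which for this K equals the maximum norm."""
    return max(abs(x[0]), abs(x[1]))

def phi(x):
    """phi(x) = p(x) * x / ||x||_2 and phi(o) = o: maps K onto the closed unit disc."""
    r = math.hypot(x[0], x[1])
    if r == 0.0:
        return (0.0, 0.0)
    s = p(x) / r
    return (s * x[0], s * x[1])

def phi_inverse(y):
    """Inverse of phi on the closed unit disc: keep the direction of y and
    rescale so that the value of p becomes the Euclidean norm of y."""
    r = math.hypot(y[0], y[1])
    if r == 0.0:
        return (0.0, 0.0)
    s = r / p(y)
    return (s * y[0], s * y[1])

print(phi((1.0, 1.0)))                # the corner goes to (1/sqrt(2), 1/sqrt(2))
print(phi_inverse(phi((0.5, -0.2))))  # recovers (0.5, -0.2) up to rounding
```

Since $\|\varphi(x)\| = p(x) \le 1$ exactly when $x \in K$, boundary points of the square land on the unit circle.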
² Notice that for this substitution we do not need $G$ to be a diffeomorphism.


The first main result of this section is the following Brouwer Fixed Point Theorem.
Theorem 5.1.3 (Brouwer Fixed Point Theorem). Let $K$ be a nonempty, convex, closed and bounded subset of $\mathbb{R}^N$. Assume that $F\colon K \to K$ is continuous. Then $F$ has a fixed point in $K$.
Proof. If $K$ has exactly one point, then the statement is obvious. Otherwise, choose a homeomorphism $\psi$ of $B^M = B(o;1) \subset \mathbb{R}^M$ onto $K$ (Lemma 5.1.2). According to the above discussion, the mapping $\psi^{-1} \circ F \circ \psi\colon B^M \to B^M$ has a fixed point $\tilde{x} \in B^M$. Then
$$F(\psi(\tilde{x})) = \psi(\tilde{x}) \in K.$$

The following example shows an interesting application of the Brouwer Fixed Point Theorem in linear algebra.
Example 5.1.4. Let $A = (a_{ij})_{i,j=1,\dots,N}$ be a matrix such that
$$a_{ij} \ge 0 \quad \text{for all } i, j = 1, \dots, N.$$
Then there exists a nonnegative eigenvalue of $A$ with an eigenvector $x = (x_1, \dots, x_N)$ having all its components $x_i \ge 0$, $i = 1, \dots, N$.
Indeed, consider the $l^1$-norm on $\mathbb{R}^N$, i.e.,
$$\|x\|_1 = \sum_{i=1}^N |x_i|,$$
and let $D := \{x \in \mathbb{R}^N : \|x\|_1 = 1,\ x_i \ge 0,\ i = 1, \dots, N\}$. Then $D$ is a nonempty, closed, convex and bounded subset of $\mathbb{R}^N$. Let $\mathcal{A}\colon \mathbb{R}^N \to \mathbb{R}^N$ be the linear operator whose representation in the standard basis is the matrix $A$. If $\mathcal{A}$ vanishes at some $x \in D$, then such an $x$ is an eigenvector for the eigenvalue $\lambda = 0$. If this is not the case, put
$$f(x) = \frac{\mathcal{A}x}{\|\mathcal{A}x\|_1} \quad \text{for } x \in D.$$
Since $\mathcal{A}x$ has nonnegative components for $x \in D$, the map $f$ sends $D$ continuously into $D$, and therefore it has a fixed point $x$ in $D$. Then
$$\mathcal{A}x = \lambda x \quad \text{where } \lambda = \|\mathcal{A}x\|_1.$$
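The existence argument above is non-constructive. As a minimal numerical sketch, assuming in addition that $A$ has strictly positive entries (a stronger hypothesis than in Example 5.1.4, under which the Perron–Frobenius theory makes the iteration converge), one can simply iterate $x \mapsto f(x) = \mathcal{A}x/\|\mathcal{A}x\|_1$ on $D$. The matrix and the helper name `nonnegative_eigenpair` are hypothetical; for merely nonnegative matrices this iteration may oscillate even though a fixed point exists.

```python
import numpy as np

def f(A, x):
    """The map f(x) = Ax / ||Ax||_1 of Example 5.1.4 (requires Ax != 0)."""
    y = A @ x
    return y / np.abs(y).sum()

def nonnegative_eigenpair(A, tol=1e-12, max_iter=10_000):
    """Iterate x <- f(x) from the barycentre of the simplex D.

    Brouwer's theorem only guarantees that f has a fixed point; convergence of
    this iteration is an extra assumption (true, e.g., for positive matrices).
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)            # a point of D
    for _ in range(max_iter):
        x_new = f(A, x)
        done = np.abs(x_new - x).sum() < tol
        x = x_new
        if done:
            break
    lam = np.abs(A @ x).sum()          # lambda = ||Ax||_1 at the fixed point
    return lam, x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])             # hypothetical positive matrix
lam, x = nonnegative_eigenpair(A)
print(lam, x)                          # lam is about (5 + sqrt(5)) / 2
```

This is the classical power iteration written in the $l^1$ normalisation of the example; it is one possible way of locating the fixed point, not the method of the text.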

Let us now mention the standard application of the Brouwer Fixed Point Theorem to the existence of periodic solutions of ordinary differential equations. The basic idea goes back to H. Poincaré: Denote by $x(\cdot;\xi)$ a solution of the initial value problem
$$\dot{x}(t) = f(t,x(t)), \qquad x(0) = \xi. \tag{5.1.3}$$


Assume that $f$ satisfies conditions which ensure the existence and uniqueness of a solution of (5.1.3) (see, e.g., Theorem 2.3.4) and, moreover, that $f(\cdot,x)$ is $T$-periodic. Then $x(\cdot;\xi)$ is a $T$-periodic solution of (5.1.3) if and only if $x(\cdot;\xi)$ is defined on the interval $[0,T]$ and
$$P\xi := x(T;\xi) = \xi$$
($P$ is called the Poincaré mapping). Since $x(\cdot;\xi)$ depends continuously on the initial condition (under reasonable assumptions on $f$, see Remark 2.3.5 and Example 4.2.5), the Poincaré mapping is continuous, and its fixed points can be found with the help of the Brouwer Fixed Point Theorem, as the following example suggests.
Example 5.1.5. Assume in addition that there exists $r > 0$ such that
$$(x, f(t,x))_{\mathbb{R}^N} \le 0 \quad \text{for all } t \in [0,T] \text{ and } \|x\|_{\mathbb{R}^N} \ge r.$$
Then there exists a $T$-periodic solution of (5.1.3).
To be able to apply the Brouwer Fixed Point Theorem to the Poincaré mapping $P$, it is sufficient to show that $P$ maps the closed ball $B(o;r)$ into itself. Since
$$\frac{d}{dt}\,\frac{1}{2}\|x(t)\|^2 = (x(t), f(t,x(t))) \le 0 \quad \text{whenever } \|x(t)\| \ge r,$$
the function $t \mapsto \|x(t)\|$ is nonincreasing as long as $\|x(t)\| \ge r$, so a solution starting in $B(o;r)$ cannot leave $B(o;r)$. In particular, $P$ is well defined (i.e., a solution $x(\cdot;x_0)$ exists on the interval $[0,T]$ provided $\|x(0)\| \le r$) and $P$ maps $B(o;r)$ into itself.
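A minimal numerical sketch of the Poincaré mapping for the hypothetical scalar equation $\dot{x} = -x + \cos t$ with $T = 2\pi$, chosen so that $x f(t,x) \le 0$ for $|x| \ge 1$, i.e., $r = 1$ in Example 5.1.5. The solution operator is approximated by the classical fourth-order Runge–Kutta scheme (an assumption of the sketch, not part of the text), and in dimension one the Brouwer Fixed Point Theorem reduces to the intermediate value theorem, so bisection on $P(\xi) - \xi$ locates a fixed point. For this equation the periodic solution is $(\cos t + \sin t)/2$, so the computed $\xi$ should be close to $0.5$.

```python
import math

def f(t, x):
    # Hypothetical 2*pi-periodic right-hand side. Since x*f(t, x) = -x**2 + x*cos(t)
    # <= |x|*(1 - |x|) <= 0 for |x| >= 1, the condition of Example 5.1.5 holds with r = 1.
    return -x + math.cos(t)

def poincare_map(xi, T=2 * math.pi, steps=2000):
    """Approximate P(xi) = x(T; xi) with the classical fourth-order Runge-Kutta scheme."""
    h = T / steps
    t, x = 0.0, xi
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def fixed_point_by_bisection(r=1.0, tol=1e-10):
    """For N = 1, P maps [-r, r] into itself, so g(xi) = P(xi) - xi satisfies
    g(-r) >= 0 >= g(r); bisection then finds a zero of g (a fixed point of P)."""
    lo, hi = -r, r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poincare_map(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xi = fixed_point_by_bisection()
print(xi)  # close to 0.5 = x(0) for the periodic solution (cos t + sin t)/2
```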
Example 5.1.6. Assume that the right-hand side in (5.1.3) is asymptotically linear in $\mathbb{R}^N$, i.e., there exist a $T$-periodic continuous matrix $A(t)$ and a function $g\colon \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}^N$, continuous, $T$-periodic in $t$ and locally Lipschitz with respect to the $x$-variables, such that for
$$f(t,x) = A(t)x + g(t,x) \tag{5.1.4}$$
the following condition³ is satisfied:
$$\text{(H)} \qquad \forall \varepsilon > 0\ \exists b > 0: \quad \|g(t,x)\| \le b + \varepsilon\|x\| \quad \text{for all } t \in \mathbb{R},\ x \in \mathbb{R}^N.$$
We are again interested in periodic solutions to (5.1.3) with $f$ given by (5.1.4).
First we have to show that the Poincaré mapping is well defined for all $\xi \in \mathbb{R}^N$. Denote
$$\Phi(t,s) := \Phi(t)\Phi^{-1}(s),$$
where $\Phi(t)$ is the fundamental matrix of the linear equation
$$\dot{x} = A(t)x \tag{5.1.5}$$
³ Roughly speaking: $g$ has a uniformly (with respect to $t$) vanishing derivative at infinity.


such that $\Phi(s,s) = I$.⁴ Then the solution of (5.1.3) satisfies the integral equation
$$x(t;\xi) = \Phi(t,0)\xi + \int_0^t \Phi(t,s)\, g(s, x(s;\xi))\, ds \tag{5.1.6}$$
(the Variation of Constants Formula) whenever it exists on the interval $[0,t]$. The fundamental matrix is continuous on $[0,T] \times [0,T]$ and therefore it is bounded:
$$\|\Phi(t,s)\|_{\mathbb{R}^{N \times N}} \le K, \quad (t,s) \in [0,T] \times [0,T].$$
By (H), we get the estimate
$$\|x(t;\xi)\| \le K\|\xi\| + KbT + K\varepsilon\int_0^t \|x(s;\xi)\|\, ds$$
and, with the help of the Gronwall inequality (see Exercise 5.1.16),
$$\|x(t;\xi)\| \le K(bT + \|\xi\|)\, e^{K\varepsilon T} = L_1 + L_2\|\xi\| \tag{5.1.7}$$
whenever $x(\cdot;\xi)$ is defined on the interval $[0,t] \subset [0,T]$. If the maximal interval of existence of the solution $x(\cdot;\xi)$ is $[0,\omega)$ with $\omega \le T$, then the boundedness of $x(\cdot;\xi)$, (5.1.7) and the condition (H) imply that $x(\cdot;\xi)$ is uniformly continuous on $[0,\omega)$, and therefore it can be extended to a larger interval (see Proposition 1.2.4 and cf. a similar idea in Corollary 3.1.6). This implies that $x(\cdot;\xi)$ is defined on $[0,T]$ (actually on $\mathbb{R}$) and the mapping $P$ is defined for all $\xi \in \mathbb{R}^N$.
To apply the Brouwer Fixed Point Theorem (the problem is to show that $P$ maps a ball into itself), we assume that $1$ is not a Floquet multiplier of the linear equation (5.1.5), i.e., $1 \notin \sigma(\Phi(T,0))$, or, equivalently, that the equation (5.1.5) possesses only the trivial $T$-periodic solution. Then the equation $P(\xi) = \xi$ is equivalent to the equation
$$F(\xi) := [I - \Phi(T,0)]^{-1}\,[x(T;\xi) - \Phi(T,0)\xi] = \xi.$$
From (5.1.6) and (5.1.7) we obtain
$$\|F(\xi)\| \le \|[I - \Phi(T,0)]^{-1}\|\,[Kb + K\varepsilon(L_1 + L_2\|\xi\|)]\,T \le c_1(\varepsilon) + c_2\varepsilon\|\xi\|,$$
where $c_2$ can be chosen independently of $\varepsilon \in (0,1]$ (note that $L_2 = Ke^{K\varepsilon T} \le Ke^{KT}$). Choose $\varepsilon$ small enough to satisfy $c_2\varepsilon < 1$. Keeping such $\varepsilon$ fixed, there is $r > 0$ such that $c_1(\varepsilon) + c_2\varepsilon r \le r$. It follows that $F$ maps the ball $B(o;r) \subset \mathbb{R}^N$ of radius $r$ into itself, and the Brouwer Fixed Point Theorem yields a $T$-periodic solution of (5.1.3) with $f$ given by (5.1.4).
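The nonresonance assumption $1 \notin \sigma(\Phi(T,0))$ can be checked numerically. The sketch below uses a hypothetical $2\pi$-periodic upper triangular matrix $A(t)$, chosen so that the Floquet multipliers are known to be $e^{-2\pi}$ and $e^{-4\pi}$; it integrates $\dot{\Phi} = A(t)\Phi$, $\Phi(0) = I$, with the classical Runge–Kutta method and tests whether $I - \Phi(T,0)$ is invertible. The step count and the tolerance $10^{-8}$ are arbitrary choices.

```python
import numpy as np

def A(t):
    # Hypothetical 2*pi-periodic coefficient matrix (upper triangular, so the
    # multipliers can be checked by hand: exp(-2*pi) and exp(-4*pi)).
    return np.array([[-1.0, np.cos(t)],
                     [0.0, -2.0]])

def monodromy(T=2 * np.pi, steps=4000):
    """Approximate Phi(T, 0) by integrating Phi' = A(t) Phi, Phi(0) = I (RK4)."""
    h = T / steps
    t, Phi = 0.0, np.eye(2)
    for _ in range(steps):
        k1 = A(t) @ Phi
        k2 = A(t + h / 2) @ (Phi + h / 2 * k1)
        k3 = A(t + h / 2) @ (Phi + h / 2 * k2)
        k4 = A(t + h) @ (Phi + h * k3)
        Phi = Phi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return Phi

M = monodromy()
multipliers = np.linalg.eigvals(M)
# 1 is not a Floquet multiplier iff I - Phi(T, 0) is invertible.
nonresonant = abs(np.linalg.det(np.eye(2) - M)) > 1e-8
print(multipliers, nonresonant)
```

Both multipliers lie well inside the unit circle here, so the linear part alone has no nontrivial $T$-periodic solution and the argument of Example 5.1.6 applies to any $g$ satisfying (H).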
The Brouwer Fixed Point Theorem is a very strong device for solving finite dimensional nonlinear equations. Unfortunately, it does not hold in infinite dimensions, as the following example shows.
⁴ This means that $x\colon t \mapsto \Phi(t,s)\xi$ is a (unique) solution of the equation (5.1.5) which satisfies $x(s) = \xi$.


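Example 5.1.7 (Kakutani). Let $H$ be a separable Hilbert space with an orthonormal basis $\{e_n\}_{n=1}^\infty$. Denote by $A \in \mathcal{L}(H)$ the right shift given by
$$Ae_n = e_{n+1}, \quad \text{i.e.,} \quad A\Big(\sum_{n=1}^\infty x_n e_n\Big) = \sum_{n=1}^\infty x_n e_{n+1},$$
and
$$F(x) = (1 - \|x\|^2)^{\frac{1}{2}} e_1 + Ax.$$
Then $F$ is continuous and, since $A$ is an isometry and $Ax \perp e_1$,
$$\|F(x)\|^2 = 1 - \|x\|^2 + \|Ax\|^2 = 1 \quad \text{for } \|x\| \le 1.$$
In particular, $F$ maps the closed unit ball into itself. If $x = \sum_{n=1}^\infty x_n e_n$ is a fixed point of $F$, then
$$x_n = x_{n+1} \quad \text{and} \quad x_1 = (1 - \|x\|^2)^{\frac{1}{2}}.$$
The series $\sum_{n=1}^\infty x_n^2$, with $x_n = x_{n+1}$, is convergent only if $x_n = 0$ for all $n$, i.e., $x = o$. Then
$$x_1 = 1,$$
a contradiction.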
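A small sketch of Kakutani's map, restricted (as an assumption made only so that vectors fit in Python lists) to finitely supported sequences $[x_1, x_2, \dots]$. It checks $\|F(x)\| = 1$ and shows that the orbit of $o$ is $e_1, e_2, e_3, \dots$, whose consecutive points stay at distance $\sqrt{2}$: the iterates have no convergent subsequence, which is exactly the missing compactness.

```python
import math

def F(x):
    """Kakutani's map on finitely supported sequences x = [x_1, x_2, ...]:
    F(x) = sqrt(1 - ||x||^2) e_1 + A x, with A the right shift."""
    norm_sq = sum(c * c for c in x)
    assert norm_sq <= 1 + 1e-12, "F is considered on the closed unit ball only"
    head = math.sqrt(max(0.0, 1.0 - norm_sq))
    return [head] + list(x)            # prepending a coordinate realises the right shift

def dist(x, y):
    """Euclidean distance of two finitely supported sequences."""
    n = max(len(x), len(y))
    x = list(x) + [0.0] * (n - len(x))
    y = list(y) + [0.0] * (n - len(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Every image lies on the unit sphere ...
x = [0.3, -0.4, 0.2]
print(sum(c * c for c in F(x)))        # 1.0 up to rounding

# ... and the iterates starting at o are e_1, e_2, e_3, ...
orbit = [[]]                           # [] represents the zero vector o
for _ in range(5):
    orbit.append(F(orbit[-1]))
print([dist(orbit[k], orbit[k + 1]) for k in range(1, 5)])
```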

Notice that in the previous example the apparently simple linear operator $A$ is perturbed by a nonlinear operator with a one-dimensional range. Continuous operators with range in a finite dimensional subspace form an important subclass of the so-called (nonlinear) compact operators.
Definition 5.1.8. Let $X$, $Y$ be normed linear spaces and let $M \subset X$. A mapping $F\colon M \to Y$ is called a compact operator on $M$ into $Y$ if $F$ is continuous on $M$ ($M$ being a metric space with the metric induced by the norm of $X$) and $F(M \cap K)$ is a relatively compact set in $Y$ for any bounded set $K \subset X$.
The set of all compact operators from $M$ into $Y$ is denoted by $\mathcal{C}(M,Y)$. If the range of $F \in \mathcal{C}(M,Y)$ is a subset of a finite dimensional subspace of $Y$, then we say that $F$ is a finite dimensional operator and write $F \in \mathcal{C}_f(M,Y)$.
We recall that linear compact operators have been investigated in Section 2.2.
Warning. In contrast to the linear case, the continuity of a nonlinear operator $F$ is not a consequence of the fact that $F$ maps bounded sets onto relatively compact ones! A simple example can be constructed for $F\colon \mathbb{R} \to \mathbb{R}$: the sign function maps every set into the finite set $\{-1, 0, 1\}$, but it is discontinuous at $0$.
Our interest in compact operators arises from the observation that they are close to finite dimensional ones. The precise formulation follows.
Theorem 5.1.9. Let $X$ be a normed linear space, $Y$ a Banach space and let $M$ be a bounded subset of $X$.
(i) If $F \in \mathcal{C}(M,Y)$, then there is a sequence $\{F_n\}_{n=1}^\infty \subset \mathcal{C}_f(M,Y)$ which converges to $F$ uniformly on $M$.


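(ii) If $\{F_n\}_{n=1}^\infty \subset \mathcal{C}(M,Y)$ and $\lim_{n\to\infty} F_n(x) = F(x)$ uniformly for $x \in M$, then $F \in \mathcal{C}(M,Y)$.
Proof. (i) Since $\overline{F(M)}$ is compact, there is a finite $\frac{1}{n}$-net $y_1, \dots, y_m \in F(M)$ of $F(M)$ (Proposition 1.2.3). The functions
$$\mu_k(x) = \max\Big\{0, \frac{1}{n} - \|F(x) - y_k\|\Big\}$$
are continuous on $M$ and $\sum_{k=1}^m \mu_k(x) > 0$ for every $x \in M$. Therefore the functions
$$\psi_k(x) := \frac{\mu_k(x)}{\sum_{j=1}^m \mu_j(x)}, \quad k = 1, \dots, m,$$
form a continuous partition of unity on $M$. Put
$$F_n(x) = \sum_{k=1}^m \psi_k(x)\, y_k, \quad x \in M.$$
Then $F_n \in \mathcal{C}_f(M,Y)$ and, since $\psi_k(x) > 0$ only if $\|F(x) - y_k\| < \frac{1}{n}$,
$$\|F(x) - F_n(x)\| \le \sum_{k=1}^m \psi_k(x)\,\|F(x) - y_k\| < \frac{1}{n} \quad \text{for every } x \in M.$$
(ii) If we literally translate the classical proof for real functions to vector functions, we see that $F$ is continuous on $M$. Let $n \in \mathbb{N}$ be such that
$$\sup_{x \in M}\|F(x) - F_n(x)\| < \varepsilon,$$
and let $y_1, \dots, y_k$ be an $\varepsilon$-net for $F_n(M)$. Then it is also a $2\varepsilon$-net for $F(M)$. Since $Y$ is a Banach space, Proposition 1.2.3 shows that $\overline{F(M)}$ is compact.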
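The construction in part (i) is explicit enough to be coded. In the sketch below the compact map is a hypothetical $F\colon [0,1] \to \mathbb{R}^2$ (so $Y$ is already finite dimensional and the example only illustrates the mechanism), the $\frac{1}{n}$-net is taken from images of grid points, and $F_n$ is built from the partition of unity $\psi_k = \mu_k / \sum_j \mu_j$. The helper name `schauder_approximation` is illustrative. As the proof guarantees, the sampled error stays below $\frac{1}{n}$.

```python
import numpy as np

def F(x):
    # Hypothetical compact map from M = [0, 1] into Y = R^2 (a stand-in for a
    # Banach space; every continuous map of [0, 1] into R^2 is compact).
    return np.array([np.cos(3 * x), np.sin(3 * x)])

def schauder_approximation(F, n, net):
    """Build F_n(x) = sum_k psi_k(x) y_k from a 1/n-net y_1, ..., y_m of F(M),
    with mu_k(x) = max(0, 1/n - ||F(x) - y_k||) as in the proof of Theorem 5.1.9(i)."""
    net = np.asarray(net)

    def F_n(x):
        mu = np.maximum(0.0, 1.0 / n - np.linalg.norm(F(x) - net, axis=1))
        psi = mu / mu.sum()            # partition of unity; mu.sum() > 0 for a 1/n-net
        return psi @ net               # a convex combination of the net points
    return F_n

n = 10
# F is 3-Lipschitz, so images of grid points with spacing 0.05 form a 1/n-net:
# every x is within 0.025 of a grid point, hence ||F(x) - y_k|| <= 0.075 < 0.1.
grid = np.linspace(0.0, 1.0, 21)
net = [F(t) for t in grid]
F_n = schauder_approximation(F, n, net)

samples = np.linspace(0.0, 1.0, 1001)
error = max(np.linalg.norm(F(x) - F_n(x)) for x in samples)
print(error)                           # strictly below 1/n = 0.1
```

Because $F_n(x)$ is a convex combination of the net points, $F_n$ takes values in their convex hull; this is the property used in the proof of Theorem 5.1.11.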
Remark 5.1.10. The assertion (i) of Theorem 5.1.9 obviously holds for linear compact operators, but in general we cannot guarantee linearity of the approximating sequence $\{F_n\}_{n=1}^\infty$ (see Remark 2.2.7).
The following theorem is a generalization of the Brouwer Fixed Point Theorem to the infinite dimensional setting.
Theorem 5.1.11 (Schauder Fixed Point Theorem). Let $K$ be a nonempty, closed, convex and bounded subset of a normed linear space $X$. Assume that $F \in \mathcal{C}(K,X)$ and $F(K) \subset K$. Then there is a fixed point of $F$ in $K$.
Proof. Let $\{F_n\}_{n=1}^\infty$ be the sequence constructed in the proof of Theorem 5.1.9(i), and denote
$$X_n := \operatorname{Lin}\{y_1, \dots, y_m\}.$$
Since $y_1, \dots, y_m \in F(K) \subset K$ and $K$ is convex, we have
$$F_n(K) \subset K \cap X_n.$$
The restriction of $F_n$ to $K \cap X_n$ satisfies the assumptions of the Brouwer Fixed Point Theorem, and hence there is $x_n \in K \cap X_n$ such that
$$F_n(x_n) = x_n.$$


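By the compactness of $F$ there is a subsequence $\{F(x_{n_k})\}_{k=1}^\infty$ which converges to some $x \in \overline{F(K)} \subset \overline{K} = K$. The estimate
$$\|F(x_{n_k}) - x_{n_k}\| = \|F(x_{n_k}) - F_{n_k}(x_{n_k})\| < \frac{1}{n_k}$$
implies that also $\lim_{k\to\infty} x_{n_k} = x$. Since $F$ is continuous, we conclude that
$$\lim_{k\to\infty} F(x_{n_k}) = F(x) \quad \text{and} \quad F(x) = x.$$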

Remark 5.1.12. The above proof of Theorem 5.1.11 is based on the approximation of $F$ by $F_n \in \mathcal{C}_f(K,X)$. The construction in the proof of Theorem 5.1.9(i) is certainly not unique. We recommend that the reader think about a possible simplification when $F$ acts on a separable Hilbert space.
Another possibility occurs when $K$ is a compact convex set. We obtain a typical situation as soon as $X$ is a reflexive Banach space and $K$ is a closed, convex and bounded subset of $X$. Then $K$ is compact in the weak topology (Theorem 2.1.25),⁵ and the continuity of $F\colon K \to K$ in the weak topology (it sends weakly convergent sequences into weakly convergent ones) is sufficient to justify an application of Theorem 5.1.11.
A slightly more general statement was proved by A.N. Tikhonov (for a proof see, e.g., Dugundji [43, Appendix 1] and Deimling [34, 10.3]).
We now show how the Schauder Fixed Point Theorem can be applied to differential equations. To avoid technical details we restrict ourselves to ordinary differential equations. Their solutions are generally smooth, which suggests a relation to compact operators.
Proposition 5.1.13. Let $G$ be an open subset of $\mathbb{R}^{N+1}$ and let $f\colon G \to \mathbb{R}^N$ be continuous on $G$. Then for any $(t_0, x_0) \in G$ there exists $\delta > 0$ such that the equation
$$\dot{x} = f(t,x)$$
has a solution on the interval $(t_0 - \delta, t_0 + \delta)$ which satisfies the condition $x(t_0) = x_0$.
Proof. It has been shown in Lemma 3.1.5 that the initial value problem is equivalent to the integral equation
$$F(x)(t) := x_0 + \int_{t_0}^t f(s, x(s))\, ds = x(t) \tag{5.1.8}$$
in the space $C[t_0 - \delta, t_0 + \delta]$. Choose $\delta > 0$, $r > 0$ such that
$$\mathcal{M} = [t_0 - \delta, t_0 + \delta] \times \overline{B(x_0;r)} \subset G$$
⁵ We also have to use the fact that a convex set which is closed in the norm topology is also weakly closed (cf. Exercise 2.1.39).


($\overline{B(x_0;r)}$ is the closed ball in $\mathbb{R}^N$ of radius $r$ centered at $x_0$). Then $\mathcal{M}$ is a compact set in $\mathbb{R}^{N+1}$, and therefore $f$ is bounded on $\mathcal{M}$, say
$$\|f(t,x)\|_{\mathbb{R}^N} \le c \quad \text{for } (t,x) \in \mathcal{M}.$$
Then
$$\|F(x) - x_0\|_{C[t_0-\delta,\,t_0+\delta]} \le c\delta \le r$$
for
$$x \in K := \{y \in C[t_0-\delta, t_0+\delta] : \|y - x_0\|_{C[t_0-\delta,\,t_0+\delta]} \le r\}$$
provided $\delta$ is sufficiently small. This proves that $F(K) \subset K$. Since $f$ is also uniformly continuous on $\mathcal{M}$, the operator $F$ is continuous on $K$ (convergence in $K$ is uniform convergence). Further, for $t, s \in [t_0-\delta, t_0+\delta]$, $t < s$, and $x \in K$, we have
$$\|F(x)(t) - F(x)(s)\|_{\mathbb{R}^N} \le \int_t^s \|f(\tau, x(\tau))\|_{\mathbb{R}^N}\, d\tau \le c|s-t|.$$
This means that $F(K)$ is equicontinuous. By Theorem 1.2.13, $F(K)$ is relatively compact in $C[t_0-\delta, t_0+\delta]$. It follows from Theorem 5.1.11 that the equation (5.1.8) has a solution.
Our second example concerns a boundary value problem for an ordinary differential equation.
Example 5.1.14. Let $f$ be a continuous function on $[0,1] \times \mathbb{R}$. We wish to solve the equation
$$\ddot{x}(t) = f(t, x(t)) \tag{5.1.9}$$
with the Dirichlet boundary conditions
$$x(0) = x(1) = 0. \tag{5.1.10}$$
We have dealt with this problem already in Example 2.3.8. It has been proved there that $y$ is a solution of this problem if and only if it is continuous and satisfies the integral equation ($f$ is assumed to be continuous)
$$Fy(t) := \int_0^1 G(t,s)\, f(s, y(s))\, ds = y(t) \tag{5.1.11}$$
where the Green function $G$ is given by
$$G(t,s) = \begin{cases} s(t-1), & 0 \le s \le t \le 1,\\ t(s-1), & 0 \le t \le s \le 1. \end{cases}$$
The operator $F$ maps $C[0,1]$ into itself (actually into $C^2[0,1]$) and is compact.


This can be proved by two types of argument:
(i) For any $R > 0$ there is $c(R)$ such that
$$|f(s,y)| \le c(R) \quad \text{for } s \in [0,1],\ |y| \le R.$$
Since
$$\frac{d^2}{dt^2}F(y)(t) = f(t, y(t)),$$
$F$ maps the ball $B(o;R)$ in $C[0,1]$ into a set of functions which vanish at $0$ and $1$ and have uniformly bounded second derivatives. Thus $F(B(o;R))$ is relatively compact in $C[0,1]$ (see Theorem 1.2.13).
(ii) The operator $F$ is a composition of a linear integral operator and a Nemytski operator (see Example 3.2.21). The Nemytski operator
$$\mathcal{N}\colon y \mapsto f(\cdot, y(\cdot))$$
is continuous from $C[0,1]$ into itself, and the integral operator
$$\mathcal{K}\colon x \mapsto \int_0^1 G(\cdot, s)\, x(s)\, ds$$
is compact from $C[0,1]$ into itself (Example 2.2.5). Therefore $F = \mathcal{K} \circ \mathcal{N}$ is also compact.
It remains to prove that $F$ maps a ball $B(o;R) \subset C[0,1]$ into itself. For this purpose some growth assumptions on $f$ are needed. If
$$|f(s,y)| \le a + b|y| \quad \text{for } s \in [0,1],\ y \in \mathbb{R},$$
then
$$\|F(y)\|_{C[0,1]} \le \big[a + b\|y\|_{C[0,1]}\big] \sup_{t \in [0,1]} \int_0^1 |G(t,s)|\, ds \le \frac{a + b\|y\|_{C[0,1]}}{8}.$$
Whenever $b < 8$ and $R \ge \frac{a}{8-b}$, the operator $F$ maps $B(o;R)$ into itself and (5.1.11) has a solution.
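A minimal discretisation of Example 5.1.14, assuming the trapezoidal rule on a uniform grid (the grid size is an arbitrary choice). It checks the bound $\sup_t \int_0^1 |G(t,s)|\,ds = \frac{1}{8}$ used above and, for the hypothetical right-hand side $f(t,y) = \sin y + 1$ (so $a = b = 1$), computes an approximate solution by plain iteration $y \leftarrow F(y)$. The iteration converges here only because this $f$ is also Lipschitz in $y$ with constant $1 < 8$, which makes the discrete $F$ a contraction; that is more than the Schauder argument needs.

```python
import numpy as np

def green(t, s):
    """Green function of x'' = h, x(0) = x(1) = 0, as in Example 5.1.14."""
    return np.where(s <= t, s * (t - 1.0), t * (s - 1.0))

m = 401                                   # number of grid points (arbitrary)
t = np.linspace(0.0, 1.0, m)
h = 1.0 / (m - 1)
w = np.full(m, h)                         # trapezoidal weights
w[0] = w[-1] = h / 2
G = green(t[:, None], t[None, :])         # G[i, j] = G(t_i, s_j)

# sup_t of int_0^1 |G(t, s)| ds; the exact value is max_t t(1 - t)/2 = 1/8.
bound = np.max(np.abs(G) @ w)
print(bound)

def f(s, y):
    # Hypothetical right-hand side: |f(s, y)| <= 1 + 1*|y|, and f is 1-Lipschitz in y.
    return np.sin(y) + 1.0

# Plain iteration y <- F(y); it converges here because the discrete F is a
# contraction with constant about 1/8 (an extra assumption, see the lead-in).
y = np.zeros(m)
for _ in range(100):
    y = G @ (w * f(t, y))
print(y[m // 2])                          # approximate solution at t = 1/2
```

On a uniform grid the discrete fixed point satisfies the standard three-point difference scheme for $\ddot{x} = f(t,x)$, so its second differences reproduce $f(t,y)$ at the interior nodes.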

Exercise 5.1.15. If $f$ has sublinear growth in $y$, i.e., there is $\alpha \in [0,1)$ such that
$$|f(s,y)| \le a + b|y|^\alpha,$$
then no restriction on $b$ is needed. Prove this fact!
Exercise 5.1.16. Prove the Gronwall inequality:
Let $f$ be a nonnegative continuous function on an interval $[a,b]$ and let $A$, $B$ be nonnegative reals. Assume that
$$f(t) \le A + B\int_a^t f(s)\, ds, \quad t \in [a,b]. \tag{5.1.12}$$
Then
$$f(t) \le Ae^{B(t-a)}, \quad t \in [a,b].$$

5.1A. Fixed Point Theorems for Noncompact Operators


Hint. Denote the right-hand side of (5.1.12) by $g$ and notice that
$$\dot{g}(t) = Bf(t) \le Bg(t).$$
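One way to finish the hint, written out as a short worked derivation (a sketch that uses only the hypotheses of Exercise 5.1.16, with $g(t) = A + B\int_a^t f(s)\,ds$, so that $f \le g$ and $\dot{g} = Bf$):
$$\frac{d}{dt}\Big(e^{-B(t-a)}g(t)\Big) = e^{-B(t-a)}\big(\dot{g}(t) - Bg(t)\big) = Be^{-B(t-a)}\big(f(t) - g(t)\big) \le 0,$$
hence $e^{-B(t-a)}g(t) \le g(a) = A$ for $t \in [a,b]$, and therefore
$$f(t) \le g(t) \le Ae^{B(t-a)}, \quad t \in [a,b].$$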
Remark 5.1.17. More general integral and differential inequalities can be investigated in a similar way. Let us mention $\dot{x}(t) \le f(t, x(t))$ as an example.
Exercise 5.1.18. Let $\lambda$ be as in the proof of Lemma 5.1.1. Prove that
$$x \mapsto \lambda(x), \quad x \in \operatorname{int} B,$$
has a bounded Fréchet derivative.
Hint. Use the Implicit Function Theorem for
$$(\lambda, x) \mapsto \|\lambda P(x) + (1-\lambda)x\|^2 - 1.$$
Exercise 5.1.19. Regard the operator $F$ given by (5.1.11) as an operator on a space of integrable functions. Repeat the argument from Example 5.1.14.
Exercise 5.1.20. Let $f$ in (5.1.9) depend also on $\dot{x}(t)$, i.e., $f = f(t, x(t), \dot{x}(t))$. Formulate assumptions on $f(x,y,z)$ to get the existence of a solution of the boundary value problem (5.1.9), (5.1.10). See also Example 5.2.16.
Exercise 5.1.21. Let $K$ be a bounded continuous real function on $[a,b] \times [a,b] \times \mathbb{R}$ and let $h \in C[a,b]$. Prove that the integral equation
$$x(t) = \int_a^b K(t, \tau, x(\tau))\, d\tau + h(t)$$
has at least one solution $x \in C[a,b]$.

5.1A Fixed Point Theorems for Noncompact Operators


There are many generalizations of the Schauder Fixed Point Theorem. We mention here one which shows that the assumption of compactness of the operator can be relaxed. However, having Example 5.1.7 in mind, this must be done carefully, and more than continuity of the operator must be required. For this purpose we need a tool which measures how far the operator actually is from being compact.
Definition 5.1.22. Let $M$ be a bounded set in a metric space $(X, \rho)$. The Kuratowski measure of noncompactness $\alpha(M)$ is defined to be the infimum of the set of all numbers $d > 0$ with the property that
(KM) $M$ can be covered by finitely many sets, each of whose diameters⁶ is less than or equal to $d$.
If $X$ is complete, then it follows from Proposition 1.2.3 that $M$ is relatively compact if and only if (KM) holds for every $d > 0$. Therefore $\alpha(M) = 0$ is equivalent to the relative compactness of $M$. The larger the value of $\alpha(M)$, the more strongly $M$ deviates (in the sense of condition (KM)) from relatively compact sets.
⁶ The diameter of $M$ is defined as $\operatorname{diam} M := \sup \rho(x,y)$, where the supremum is taken over all $x, y \in M$.


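Proposition 5.1.23 (Properties of the Kuratowski measure of noncompactness). Let $X$ be a (real or complex) Banach space. Then for all bounded subsets $M, M_1, \dots, M_n, N$ of $X$ the following assertions hold:
(i) $\alpha(\emptyset) = 0$;
(ii) $\alpha(M) = 0 \iff M$ is relatively compact;
(iii) $0 \le \alpha(M) \le \operatorname{diam} M$;
(iv) $M \subset N \implies \alpha(M) \le \alpha(N)$;
(v) $\alpha(M + N) \le \alpha(M) + \alpha(N)$;⁷
(vi) $\alpha(\lambda M) = |\lambda|\,\alpha(M)$ for all $\lambda \in \mathbb{R}$ (or $\mathbb{C}$);
(vii) $\alpha(M) = \alpha(\overline{M})$;
(viii) $\alpha\Big(\bigcup_{i=1}^n M_i\Big) = \max\{\alpha(M_1), \dots, \alpha(M_n)\}$;
(ix) $\alpha(M) = \alpha(\operatorname{Co} M)$.
Proof. The properties (i)–(vii) follow directly from Definition 5.1.22, and so the proof is left to the reader. Let us prove (viii). Set
$$M = \bigcup_{i=1}^n M_i \quad \text{and} \quad a = \max\{\alpha(M_1), \dots, \alpha(M_n)\}.$$
Then it follows from $M_i \subset M$ and from (iv) that $\alpha(M_i) \le \alpha(M)$, so $a \le \alpha(M)$. To prove the equality, choose $\varepsilon > 0$ and a covering $\{M_1^i, M_2^i, \dots, M_{m_i}^i\}$ of $M_i$ with
$$\operatorname{diam} M_j^i \le \alpha(M_i) + \varepsilon \le a + \varepsilon.$$
All of these $M_j^i$ form a covering of $M$, so that $\alpha(M) \le a + \varepsilon$, i.e., since $\varepsilon > 0$ is arbitrary,
$$\alpha(M) \le a.$$
Hence $\alpha(M) = a$ and (viii) is proved.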


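Finally, we prove (ix). It follows from $M \subset \operatorname{Co} M$ and (iv) that
$$\alpha(M) \le \alpha(\operatorname{Co} M).$$
Conversely, we show that $\alpha(\operatorname{Co} M) \le \alpha(M)$. This will be done in three steps.
Step 1. We prove inequality (5.1.13) below. For every $\varepsilon > 0$ there exists a covering $M \subset \bigcup_{i=1}^N M_i$ with $\operatorname{diam} M_i \le \alpha(M) + \varepsilon$. Since $\operatorname{diam}(\operatorname{Co} M_i) = \operatorname{diam} M_i$,⁸ we may assume that the sets $M_i$ are all convex. Let
$$\Lambda = \Big\{\lambda = (\lambda_1, \dots, \lambda_N) \in \mathbb{R}^N : \sum_{i=1}^N \lambda_i = 1,\ \lambda_i \ge 0 \text{ for all } i\Big\}$$
and
$$A(\lambda) := \sum_{i=1}^N \lambda_i M_i \quad \text{for all } \lambda = (\lambda_1, \dots, \lambda_N) \in \Lambda.$$
⁷ $M + N := \{z = x + y : x \in M,\ y \in N\}$.
⁸ The reader is invited to prove this equality.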


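Now it follows from (iii)–(vi) that
$$\alpha(A(\lambda)) \le \sum_{i=1}^N \lambda_i\,\alpha(M_i) \le \alpha(M) + \varepsilon. \tag{5.1.13}$$
Step 2. We show that the union $\bigcup_{\lambda \in \Lambda} A(\lambda)$ is a convex set. Indeed, let
$$x = \sum_{i=1}^N \lambda_i x_i, \quad y = \sum_{i=1}^N \mu_i y_i, \quad z = tx + (1-t)y \quad \text{and} \quad t \in [0,1],$$
where $\lambda, \mu \in \Lambda$ and $x_i, y_i \in M_i$ for all $i$. The point $z$ can be represented in the form
$$z = \sum_{i=1}^N \nu_i z_i, \quad \text{where} \quad \nu_i = t\lambda_i + (1-t)\mu_i, \quad z_i = \tau_i x_i + (1-\tau_i)y_i, \quad \tau_i = \begin{cases} \dfrac{t\lambda_i}{\nu_i}, & \nu_i > 0,\\[4pt] 0, & \nu_i = 0. \end{cases}$$
By definition of $\nu_i$ we have $0 \le \tau_i \le 1$. The set $M_i$ is convex, so $z_i \in M_i$. Moreover, $\nu = (\nu_1, \dots, \nu_N) \in \Lambda$ by the convexity of $\Lambda$. Hence $z \in A(\nu)$.
Step 3. We prove that
$$\alpha(\operatorname{Co} M) \le \alpha(M) + 3\varepsilon.$$
Since the set $\Lambda$ is compact, for a given $\varepsilon > 0$ we can find finitely many points $\lambda^{(1)}, \dots, \lambda^{(m)} \in \Lambda$ such that for any $x = \sum_{i=1}^N \lambda_i x_i \in A(\lambda)$ there exists $\lambda^{(j)} = (\lambda_1^{(j)}, \dots, \lambda_N^{(j)})$ for which
$$\Big\|x - \sum_{i=1}^N \lambda_i^{(j)} x_i\Big\| \le \sum_{i=1}^N \big|\lambda_i - \lambda_i^{(j)}\big|\,\|x_i\| \le k\sum_{i=1}^N \big|\lambda_i - \lambda_i^{(j)}\big| \le \varepsilon,$$
where $k > 0$ is a common bound for all sets $M_i$. Therefore,
$$\bigcup_{\lambda \in \Lambda} A(\lambda) \subset \bigcup_{j=1}^m \Big(A\big(\lambda^{(j)}\big) + B(o;\varepsilon)\Big).$$
Since $M \subset \bigcup_{i=1}^N M_i \subset \bigcup_{\lambda \in \Lambda} A(\lambda)$, Step 2 gives $\operatorname{Co} M \subset \bigcup_{\lambda \in \Lambda} A(\lambda)$, and by the other statements and (5.1.13),
$$\alpha(\operatorname{Co} M) \le \alpha\Big(\bigcup_{j=1}^m \Big(A\big(\lambda^{(j)}\big) + B(o;\varepsilon)\Big)\Big) \le \max_{j=1,\dots,m} \alpha\big(A\big(\lambda^{(j)}\big)\big) + 2\varepsilon \le \alpha(M) + 3\varepsilon,$$
i.e., since $\varepsilon > 0$ is arbitrary,
$$\alpha(\operatorname{Co} M) \le \alpha(M).$$
Example 5.1.24. Let $B(o;1) \subset X$ be the open unit ball in a Banach space $X$. If $\dim X < \infty$, then
$$\alpha(B(o;1)) = \alpha\big(\overline{B(o;1)}\big) = \alpha(\partial B(o;1)) = 0$$
(see Proposition 1.2.3). On the other hand, if $\dim X = \infty$, then
$$\alpha(B(o;1)) = \alpha\big(\overline{B(o;1)}\big) = \alpha(\partial B(o;1)) = 2. \tag{5.1.14}$$
The proof of this fact is not trivial. Since the diameter of $B(o;1)$ is equal to 2, we know that $\alpha(B(o;1)) \le 2$. In order to prove (5.1.14) we show that $\alpha(\partial B(o;1)) \ge 2$. Assume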


the contrary. Then there exist sets $M_i$ with
$$\partial B(o;1) = \bigcup_{i=1}^n M_i$$
and the diameter of every $M_i$ is strictly less than 2. We may take all $M_i$'s to be closed. Let $X_n \subset X$ be a subspace of $X$ such that $\dim X_n = n$. Then we have
$$\partial B(o;1) \cap X_n = \bigcup_{i=1}^n (M_i \cap X_n).$$
The sets $M_i \cap X_n$, $i = 1, 2, \dots, n$, cover the unit sphere $\partial B(o;1) \cap X_n$ in $X_n$. By the result of Lusternik and Schnirelmann (see Exercise 4.3.138) there exists $M_j$ such that $M_j \cap X_n$ contains an antipodal pair $\{x, -x\}$. Consequently,
$$2 \le \operatorname{diam}(M_j \cap X_n) \le \operatorname{diam} M_j,$$
which is a contradiction. Finally, by (iv) and (vii) of Proposition 5.1.23 we have (5.1.14).
In the next definition we consider a special class of continuous and bounded operators.
Definition 5.1.25. Let $T\colon \mathcal{M} \subset X \to X$ be a bounded operator⁹ from a Banach space $X$ into itself. The operator $T$ is called a $k$-set contraction if there is a number $k \ge 0$ such that
$$\alpha(T(M)) \le k\,\alpha(M)$$
for all bounded sets $M \subset \mathcal{M}$.
The bounded operator $T$ is called condensing if
$$\alpha(T(M)) < \alpha(M)$$
for all bounded sets $M \subset \mathcal{M}$ with $\alpha(M) > 0$.
Obviously, every $k$-set contraction with $0 \le k < 1$ is condensing. Every compact map $T$ is a $k$-set contraction with $k = 0$. A typical example of a $k$-set contraction with $0 \le k < 1$ is the following one.
Example 5.1.26. Let $K, C\colon D \subset X \to X$ be operators on a Banach space $X$. Let $K$ be $k$-contractive, i.e., there exists $k \in [0,1)$ such that
$$\|K(x) - K(y)\| \le k\|x - y\| \quad \text{for all } x, y \in D, \tag{5.1.15}$$
and let $C$ be compact. Then $K + C$ is a $k$-set contraction. Indeed, let $M \subset D$ be a bounded set. By Definition 5.1.22 it follows from (5.1.15) that $\alpha(K(M)) \le k\,\alpha(M)$. By (ii) of Proposition 5.1.23 we have $\alpha(C(M)) = 0$. Set $T := K + C$. Now (iv) and (v) of Proposition 5.1.23 imply
$$\alpha(T(M)) \le \alpha(K(M) + C(M)) \le \alpha(K(M)) + \alpha(C(M)) \le k\,\alpha(M).$$
The following assertion is a generalization of the Schauder Fixed Point Theorem (note that every compact operator is condensing).
⁹ The operator $T$ is said to be bounded on $\mathcal{M}$ if $T(\mathcal{M} \cap A)$ is a bounded set provided $A$ is a bounded set.


Theorem 5.1.27 (Darbo). Let us suppose that
(i) $\mathcal{M}$ is a nonempty, closed, bounded and convex subset of a Banach space $X$;
(ii) an operator $T\colon \mathcal{M} \subset X \to \mathcal{M}$ is condensing and continuous on $\mathcal{M}$.
Then $T$ has a fixed point in $\mathcal{M}$.
Proof. The idea of the proof is to find a suitable subset $A$ of $\mathcal{M}$ which is mapped into itself by $T$ in such a way that the Schauder Fixed Point Theorem can be applied to the restriction $T\colon A \to A$. The resulting fixed point is then trivially a fixed point of the original mapping $T\colon \mathcal{M} \to \mathcal{M}$. The set $A$ is constructed in the following way. Choose a point $m \in \mathcal{M}$ and let $\Sigma$ denote the system of all closed, convex subsets $K$ of $\mathcal{M}$ for which $m \in K$ and $T(K) \subset K$. Set¹⁰
$$A = \bigcap_{K \in \Sigma} K \quad \text{and} \quad C = \overline{\operatorname{Co}}\,\{T(A) \cup \{m\}\}.$$
Since $m \in A$ and $T(A) \subset A$, it follows that $C \subset A$. This implies $T(C) \subset T(A)$. Obviously $T(A) \subset C$, i.e., $T(C) \subset C$, which means that $C \in \Sigma$. So, $A \subset C$. We have proved that $A = C$. Now, (vii), (viii) and (ix) of Proposition 5.1.23 imply that
$$\alpha(A) = \alpha(C) = \alpha(T(A)). \tag{5.1.16}$$
Since $T$ is condensing, (5.1.16) is possible only if
$$\alpha(A) = 0.$$
Since $A$ is also closed, $A$ is a compact set. The restriction of $T$ to $A$ is thus a compact operator. Consequently, the Schauder Fixed Point Theorem can be applied to the mapping $T\colon A \to A$.
Corollary 5.1.28. Let $K, C\colon \mathcal{M} \subset X \to X$ be operators in a Banach space $X$ such that $(K + C)(\mathcal{M}) \subset \mathcal{M}$, let $\mathcal{M}$ be a nonempty, closed, bounded and convex set in $X$, let $K$ be $k$-contractive ($0 \le k < 1$) and $C$ compact. Then $K + C$ has a fixed point in $\mathcal{M}$.
Proof. The proof follows immediately from Example 5.1.26 and Theorem 5.1.27.
The following assertion generalizes the existence part of Theorem 3.1.4 and follows from Corollary 5.1.28; cf. the statement with the example on page 110. Let us consider the initial value problem
$$\dot{x} = f(t,x) + g(t,x), \qquad x(t_0) = x_0 \tag{5.1.17}$$
in a Banach space $Y$. For fixed positive numbers $a$ and $b$ define
$$R := [t_0 - a, t_0 + a] \times \{x \in Y : \|x - x_0\| \le b\}.$$
Proposition 5.1.29. Let us assume that
(i) the map $f\colon R \to Y$ is continuous and also Lipschitz continuous with respect to the second variable, i.e., there exists $L > 0$ such that
$$\|f(t,x) - f(t,y)\| \le L\|x - y\| \quad \text{for all } (t,x), (t,y) \in R;$$
(ii) the map $g\colon R \to Y$ is compact;
¹⁰ Observe that $\Sigma \neq \emptyset$ because $\mathcal{M} \in \Sigma$, and $A \neq \emptyset$ because $m \in A$.


(iii) the sum $f + g$ is bounded, i.e., there exists $B > 0$ such that
$$\|f(t,y) + g(t,y)\| \le B \quad \text{for all } (t,y) \in R;$$
(iv) the number $c > 0$ is chosen such that
$$c \le a, \qquad cL < 1, \qquad Bc \le b.$$
Then the problem (5.1.17) has a solution $x = x(t)$ defined on $(t_0 - c, t_0 + c)$.
Proof. It follows from Lemma 3.1.5 that the problem (5.1.17) is equivalent to the integral equation
$$x(t) = x_0 + \int_{t_0}^t [f(s, x(s)) + g(s, x(s))]\, ds. \tag{5.1.18}$$
Let $X = C([t_0 - c, t_0 + c], Y)$ and $\mathcal{M} = \{x \in X : \|x - x_0\|_X \le b\}$.¹¹ Then (5.1.18) can be regarded as the operator equation
$$x = K(x) + C(x), \quad x \in \mathcal{M}, \tag{5.1.19}$$
where
$$K(x)(t) = x_0 + \int_{t_0}^t f(s, x(s))\, ds, \qquad C(x)(t) = \int_{t_0}^t g(s, x(s))\, ds.$$
Similarly to the proof of Theorem 3.1.4 we obtain that
$$(K + C)(\mathcal{M}) \subset \mathcal{M}.$$
Furthermore, the operator $K$ is $k$-contractive with $k = Lc$ and the operator $C$ is compact. So, Corollary 5.1.28 yields the existence of a solution of (5.1.19), hence of (5.1.18), and thus, ultimately, of (5.1.17).
A similar approach can also be used for functional differential equations. In the following example we describe a simple situation. For a more general treatment of evolution equations see, e.g., Milota & Petzeltová [96].
Example 5.1.30. Consider a system of ordinary functional differential equations
$$\dot{x}(t) = A(t)x(t) + \int_0^t f(t, s, x(s))\, ds, \qquad x(0) = x_0, \tag{5.1.20}$$
where $A$ is an $N \times N$-matrix with continuous entries on the interval $[0,T]$, and $f\colon \mathcal{M} := [0,T] \times [0,T] \times \mathbb{R}^N \to \mathbb{R}^N$ is continuous, locally $\gamma$-Hölder continuous with respect to the first variable and locally satisfies the Lipschitz condition with respect to the third variable, i.e., for any $(t_0, s_0, x_0) \in \mathcal{M}$ there are a neighborhood $U$ of this point and constants $c > 0$, $L > 0$, $\gamma \in (0,1)$ such that
$$|f(t_1, s, x_1) - f(t_2, s, x_2)| \le c|t_1 - t_2|^\gamma + L|x_1 - x_2| \quad \text{for } (t_i, s, x_i) \in U,\ i = 1, 2.$$
Instead of (5.1.20) we consider the equivalent integral equation
$$x(t) = H(x)(t) := \Phi(t,0)x_0 + \int_0^t \Phi(t,s)\int_0^s f(s, \sigma, x(\sigma))\, d\sigma\, ds,$$
¹¹ Here $x_0 \in X$ is understood to be a constant function defined on $[t_0 - c, t_0 + c]$ with values in $Y$.

5.2. Topological Degree


where $\Phi(t)$ is a fundamental matrix of the equation
$$\dot{x}(t) = A(t)x(t)$$
and $\Phi(t,s) := \Phi(t)\Phi^{-1}(s)$. We put
$$F(s,x) := \int_0^s f(s, \sigma, x(\sigma))\, d\sigma, \quad s \in [0,T],\ x \in C([0,T], \mathbb{R}^N),$$
and
$$G_1(x)(t) = \int_0^t \Phi(t,s)\,[F(s,x) - F(t,x)]\, ds, \qquad G_2(x)(t) = \int_0^t \Phi(t,s)\, ds\; F(t,x).$$
It is not difficult to show that there are $r > 0$ and $\tau > 0$ small enough such that $H$ maps the set
$$Q(r,\tau) := \Big\{y \in C([0,\tau], \mathbb{R}^N) : \sup_{t \in [0,\tau]} |y(t) - x_0| \le r\Big\}$$
into itself, $G_1$ is a compact mapping on $Q(r,\tau)$ (using the Arzelà–Ascoli Theorem) and $G_2$ is a contraction on $Q(r,\tau)$. Since $H(x) = \Phi(\cdot,0)x_0 + G_1(x) + G_2(x)$, the local existence of a solution of (5.1.20) now follows from Corollary 5.1.28. This local solution can be continuously extended. To keep the time step fixed, it is sufficient to assume that $f$ satisfies the global Lipschitz condition with respect to the $x$-variable on the whole domain $\mathcal{M}$.
Exercise 5.1.31. Let $H$ and $Q(r,\tau)$ be as in Example 5.1.30. Prove that $H$ maps the set $Q(r,\tau)$ into itself.
Exercise 5.1.32. Let $G_1$, $G_2$ and $Q(r,\tau)$ be as in Example 5.1.30. Prove that $G_1$ is a compact mapping on $Q(r,\tau)$ and $G_2$ is a contraction on $Q(r,\tau)$.
Exercise 5.1.33. Prove Proposition 5.1.23(i)–(vii).
Exercise 5.1.34. Consider the boundary value problem
$$\ddot{x}(t) = f(t, x(t), \dot{x}(t)), \quad t \in (0,1), \qquad x(0) = x(1) = 0, \tag{5.1.21}$$
where $f\colon \mathbb{R}^3 \to \mathbb{R}$ is a real function. Find conditions on $f$ and apply Corollary 5.1.28 to prove the existence of a solution to (5.1.21).
Hint. Look for conditions which guarantee that the Nemytski operator given by $f$ is a sum of a contraction and a compact operator.

5.2 Topological Degree


In this section we stress the basic properties of the Brouwer degree of a continuous map in finite dimensional spaces and of the Leray–Schauder degree of a compact perturbation of the identity in general Banach spaces. We start with some elementary considerations in one dimension. The reader can find another motivation from the theory of functions of a complex variable in Appendix 4.3D. In the previous section we have dealt with a solution of the operator equation
$$F(x) = 0.$$


Now we ask what happens to its solution if $F\colon \mathbb{R} \to \mathbb{R}$ is slightly perturbed. Figures 5.2.1–5.2.3 show that the situation can change considerably, namely, either a solution may disappear (if a perturbation takes place in the solid arrow direction) or the number of solutions may vary (in the dashed arrow direction).
[Figures 5.2.1, 5.2.2 and 5.2.3: graphs of $F$ near a zero $x_0$ together with perturbed graphs $G$; labelled points $x_0$, $x_1$, $x_2$.]

A closer examination indicates that this can happen since a solution $x_0$ either lies on the boundary (Figure 5.2.1) or the derivative $F'$ vanishes at $x_0$, i.e., $x_0$ is a critical point of $F$ (Figures 5.2.2 and 5.2.3). We expect that a small perturbation of $F$ does not cause any alteration provided the cases just described do not occur.
There is another point which should be mentioned, namely the distinction between perturbations of $F$ in the direction of one or the other arrow in Figures 5.2.2 and 5.2.3. The number of solutions changes by two, being even in Figure 5.2.2 ($0$ is even by definition) and odd in Figure 5.2.3. Is there any way to describe this phenomenon? Look at the dashed curve $G$ in Figure 5.2.2 and assume that $G \in C^1$. We have
$$G'(x_1) < 0, \qquad G'(x_2) > 0.$$
These signs remain the same in some neighborhoods $U_1$, $U_2$ of $x_1$ and $x_2$, respectively. In particular, $G$ is injective (actually a diffeomorphism) on these neighborhoods and can be regarded as a local transformation of the $x$-coordinate. This transformation changes the orientation at $x_1$ and does not change it at $x_2$.
The sum of the signs of $G'$ at the solutions of $G(x) = 0$ is zero (more generally, even) in Figure 5.2.2 and odd for the dashed curve in Figure 5.2.3. This observation can be generalized to higher dimensions: If $\mathcal{A}\colon \mathbb{R}^N \to \mathbb{R}^N$ is a linear transformation of coordinates (i.e., $\mathcal{A}$ is injective and surjective), then we say that $\mathcal{A}$ does not change the orientation in $\mathbb{R}^N$ provided
$$\det A > 0,$$
where $A$ is the matrix representation of $\mathcal{A}$ (this does not depend on the choice of the basis in which the representation is taken). This concept can also be used locally for a nonlinear $C^1$-transformation $G\colon \mathbb{R}^N \to \mathbb{R}^N$ by replacing $\mathcal{A}$ with $G'(a)$. Then

5.2. Topological Degree


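the sign of the determinant of the matrix representation of $G'(a)$ is the sign of its Jacobian $J_G(a)$. This idea leads to the following preparatory definition.
Definition 5.2.1. Let $\Omega$ be an open and bounded subset of $\mathbb{R}^N$ and let $F \in C(\overline{\Omega}, \mathbb{R}^N) \cap C^1(\Omega, \mathbb{R}^N)$. Assume that $y_0 \in \mathbb{R}^N \setminus F(\partial\Omega)$ and $y_0$ is a regular value of $F$.$^{12}$ Then we define the Brouwer degree of $F$ as
$$\deg(F, \Omega, y_0) = \sum_{x \in F^{-1}(y_0)} \operatorname{sgn} J_F(x).^{13}\tag{5.2.1}$$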
We point out that the sum in (5.2.1) is finite. Indeed, otherwise the set
$$F^{-1}(y_0) := \{x \in \Omega : F(x) = y_0\}$$
has an accumulation point $\tilde x \in \overline{\Omega}$. By the continuity of $F$, $F(\tilde x) = y_0$, and since $y_0 \notin F(\partial\Omega)$, $\tilde x \in \Omega$. Since $y_0$ is a regular value, $J_F(\tilde x) \neq 0$, and by the Local Inverse Function Theorem (Theorem 4.1.1), $F$ is injective in a neighborhood $U$ of $\tilde x$. But $U$ contains points of $F^{-1}(y_0)$ different from $\tilde x$, a contradiction. Notice that in this argument we have used all assumptions of Definition 5.2.1.
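Formula (5.2.1) is easy to evaluate once the preimage set is known. The following Python sketch (hypothetical helper names; $N = 2$ only; the preimages are supplied by hand rather than computed) evaluates it for the map $z \mapsto z^2$ written in real coordinates, on the disk $\Omega = B(o; 2)$ with $y_0 = (1, 0)$. Here $y_0 \notin F(\partial\Omega)$ because $\|F\| = 4$ on $\partial\Omega$, the preimages are $(\pm 1, 0)$ and the Jacobian $4(x^2 + y^2)$ is positive, so the degree is $2$; for $z \mapsto \bar z^2$ the Jacobian is negative and the degree is $-2$.

```python
def det2(J):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = J
    return a * d - b * c


def degree_at_regular_value(F, jac, preimages, y0, tol=1e-12):
    """Evaluate (5.2.1): the sum of sgn J_F(x) over the given preimages of y0.

    The caller must pass the complete (finite) set F^{-1}(y0) in Omega;
    this sketch only checks that each point is a preimage and that the
    Jacobian does not vanish there (y0 regular).
    """
    total = 0
    for p in preimages:
        if any(abs(u - v) > tol for u, v in zip(F(p), y0)):
            raise ValueError(f"{p} is not a preimage of {y0}")
        jac_det = det2(jac(p))
        if jac_det == 0:
            raise ValueError("y0 is not a regular value")
        total += 1 if jac_det > 0 else -1
    return total


def square(p):            # z -> z^2 with z = x + iy
    x, y = p
    return (x * x - y * y, 2 * x * y)


def square_jac(p):
    x, y = p
    return ((2 * x, -2 * y), (2 * y, 2 * x))


def conj_square(p):       # z -> conj(z)^2
    x, y = p
    return (x * x - y * y, -2 * x * y)


def conj_square_jac(p):
    x, y = p
    return ((2 * x, -2 * y), (-2 * y, -2 * x))


pre = [(1.0, 0.0), (-1.0, 0.0)]
print(degree_at_regular_value(square, square_jac, pre, (1.0, 0.0)))            # 2
print(degree_at_regular_value(conj_square, conj_square_jac, pre, (1.0, 0.0)))  # -2
```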
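Proposition 5.2.2. Let $\Omega$ be an open bounded subset of $\mathbb{R}^N$. The degree defined in Definition 5.2.1 has the following properties ($I$ is the identity map):
(i) $\deg(I, \Omega, y_0) = \begin{cases} 1 & \text{if } y_0 \in \Omega,\\ 0 & \text{if } y_0 \notin \overline{\Omega}.\end{cases}$
Suppose that $F \in C(\overline{\Omega}, \mathbb{R}^N) \cap C^1(\Omega, \mathbb{R}^N)$ and $y_0 \in \mathbb{R}^N \setminus F(\partial\Omega)$ is a regular value of $F$. Then
(ii) $\deg(F, \Omega, y_0) \in \mathbb{Z}$;
(iii) $\deg(F, \Omega, y_0) = \deg(F - y_0, \Omega, o)$;
(iv) if $\deg(F, \Omega, y_0) \neq 0$, then the equation $F(x) = y_0$ has a solution in $\Omega$;
(v) if $\Omega_1$ is an open subset of $\Omega$ and $y_0 \notin F(\overline{\Omega} \setminus \Omega_1)$, then $\deg(F, \Omega, y_0) = \deg(F, \Omega_1, y_0)$. More generally, if $\Omega_1, \dots, \Omega_k$ are pairwise disjoint open subsets of $\Omega$ and $y_0 \notin F\bigl(\overline{\Omega} \setminus \bigcup_{j=1}^k \Omega_j\bigr)$, then
$$\deg(F, \Omega, y_0) = \sum_{j=1}^k \deg(F, \Omega_j, y_0).\tag{5.2.2}$$

12 The definition of a regular value is given in Definition 4.3.6.
13 Here $\sum_{\emptyset} = 0$ as usual.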


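(vi) For all $y \in \mathbb{R}^N$ which are sufficiently close to $y_0$,
$$\deg(F, \Omega, y) = \deg(F, \Omega, y_0)\tag{5.2.3}$$
holds.
(vii) For all $G \in C(\overline{\Omega}, \mathbb{R}^N) \cap C^1(\Omega, \mathbb{R}^N)$ which are sufficiently close to $F$ in the $C^1$-topology,$^{14}$
$$\deg(G, \Omega, y_0) = \deg(F, \Omega, y_0)\tag{5.2.4}$$
is valid.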

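Proof. The properties (i)–(v) follow immediately from Definition 5.2.1.
To prove (vi) let
$$F^{-1}(y_0) = \{x_1, \dots, x_k\}$$
and let $F$ be a diffeomorphism of an open neighborhood $U_j$ of $x_j$ onto a neighborhood $V_j$ of $y_0$ (the Local Inverse Function Theorem, Theorem 4.1.1). Denote
$$d := \inf\Bigl\{\|F(x) - y_0\| : x \in \overline{\Omega} \setminus \bigcup_{j=1}^k U_j\Bigr\} > 0$$
(the infimum is attained on this compact set, on which $F \neq y_0$). If $\|y - y_0\| < d$, then there is no solution of $F(x) = y$ in $\overline{\Omega} \setminus \bigcup_{j=1}^k U_j$, and if $y \in \bigcap_{j=1}^k V_j$ (a neighborhood of $y_0$), then there is exactly one $\tilde x_j \in U_j$ such that $F(\tilde x_j) = y$. Moreover,
$$\operatorname{sgn} J_F(\tilde x_j) = \operatorname{sgn} J_F(x_j).$$
This completes the proof of (5.2.3).$^{15}$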
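To prove (vii) we use the same notation as above. Let $G$ differ a little from $F$ in the $C^1$-topology, say
$$\|F - G\|_{C^1(\overline{\Omega}, \mathbb{R}^N)} < \delta.$$
The quantity $\delta$ will be specified later. Put
$$H(t, x) = (1 - t)F(x) + tG(x), \qquad x \in \overline{\Omega},\ t \in (-\eta, 1 + \eta) \text{ for } \eta > 0.\tag{5.2.5}$$
Choose a fixed neighborhood $U_j$ as above. Notice that we can take $U_j$ so small that
$$c_1 := \sup_{x \in U_j} \|F'(x)\|_{\mathcal{L}(\mathbb{R}^N)} < \infty \quad\text{and}\quad c_2 := \sup_{x \in U_j} \|[F'(x)]^{-1}\|_{\mathcal{L}(\mathbb{R}^N)} < \infty,$$
and the determinant $\det F'(x)$ has a constant sign in $U_j$. We have
$$\|H(t, x) - y_0\| \ge \|F(x) - y_0\| - |t|\,\|F(x) - G(x)\| \ge d - |t|\delta > 0$$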
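14 I.e., there exists $\delta > 0$ such that
$$\|F - G\|_{C^1(\overline{\Omega}, \mathbb{R}^N)} := \sup_{x \in \Omega} \|F(x) - G(x)\|_{\mathbb{R}^N} + \sup_{x \in \Omega} \|F'(x) - G'(x)\|_{\mathcal{L}(\mathbb{R}^N)} < \delta.$$
15 Notice that this proof is correct also for $F^{-1}(y_0) = \emptyset$.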



for every $x \in \overline{\Omega} \setminus \bigcup_{j=1}^k U_j$ and $t \in (-\eta, 1 + \eta)$, and for sufficiently small $\delta > 0$. In particular, $\deg(H(t, \cdot), U_j, y_0)$ is well defined$^{16}$ and, by (v),


$$\deg(H(t, \cdot), \Omega, y_0) = \sum_{j=1}^k \deg(H(t, \cdot), U_j, y_0).$$

We wish to prove that this degree is constant on the interval $[0, 1]$. We will study the set
$$M := \{(t, x) \in [0, 1] \times U_j : H(t, x) = y_0\}$$
with the help of the Implicit Function Theorem at the point $(0, x_j)$. This is possible since
$$\|H_2'(t, x) - F'(x)\|_{\mathcal{L}(\mathbb{R}^N)} = |t|\,\|F'(x) - G'(x)\|_{\mathcal{L}(\mathbb{R}^N)} \le |t|\delta < \frac{1}{c_2}, \qquad (t, x) \in (-\eta, 1 + \eta) \times U_j,$$
for small $\delta > 0$.

This estimate implies that $[H_2'(t, x)]^{-1}$ exists (Exercise 2.1.33). The Implicit Function Theorem implies that $M$ has the form
$$\{(t, \varphi(t)) : t \in [0, \tau)\}$$
in a certain neighborhood of $(0, x_j)$ (recall $F(x_j) = y_0$) where $\varphi \in C^1([0, \tau), \mathbb{R}^N)$ and
$$\|\varphi'(t)\| = \bigl\|[H_2'(t, \varphi(t))]^{-1} H_1'(t, \varphi(t))\bigr\| \le \frac{c_2\delta}{1 - c_2\delta},$$
see again Exercise 2.1.33. In particular, $\varphi$ is uniformly continuous and, if necessary, it can be continued at least until $t = 1$.$^{17}$ Therefore $M$ is the graph of $\varphi$ on the interval $[0, 1]$, i.e.,
$$\{x \in U_j : G(x) = H(1, x) = y_0\} = \{\varphi(1)\},$$
and, consequently,
$$\deg(G, U_j, y_0) = \deg(F, U_j, y_0).$$
Summing over $j = 1, \dots, k$ and using (v) we obtain (5.2.4).

One of our main goals is to show that the degree is homotopically invariant, i.e., if for $H$ given by (5.2.5) we have
$$y_0 \neq H(t, x) \qquad\text{for all } t \in [0, 1],\ x \in \partial\Omega,$$
then $\deg(H(t, \cdot), \Omega, y_0)$ is constant on $[0, 1]$. In particular,
$$\deg(H(0, \cdot), \Omega, y_0) = \deg(H(1, \cdot), \Omega, y_0)\tag{5.2.6}$$

16 A homotopy $H(t, x)$ for which the degree is well defined is called an admissible homotopy.
17 If $\tau \le 1$, then $\lim_{t \to \tau-} \varphi(t) = \tilde x$ exists and $\tilde x \in U_j$ (see Proposition 1.2.4).


provided at least one side in (5.2.6) is defined. The difficulty in proving this property can be seen from Figures 5.2.2 and 5.2.3. Namely, if the dashed curve $G$ is moving up, then at one instant it coincides with $F$, $o$ is not a regular value of $F$, and so $\deg(F, \Omega, o)$ is not yet defined. To overcome this obstacle we approximate a critical value by a regular one. Such an approximation is based on the so-called Sard Theorem. Its special case stated below will be sufficient for our purposes.
Theorem 5.2.3 (Sard). Let $\Omega$ be an open subset of $\mathbb{R}^N$ and assume that $F \in C^1(\Omega, \mathbb{R}^N)$. Then the Lebesgue measure of the set of critical values of $F$ is zero.
Proof. Since $\mathbb{R}^N$ can be covered by a sequence of bounded open sets and a countable union of sets of measure zero also has measure zero, we can suppose that $\Omega$ is bounded. Choose now an open subset $G$ such that $\overline{G} \subset \Omega$. Let $S$ be the set of critical points of $F$ in $G$. By the same argument as above, it is sufficient to show that
$$\operatorname{meas}_N F(S) = 0$$
where $\operatorname{meas}_N$ is the Lebesgue measure in $\mathbb{R}^N$. Since $\overline{G}$ is compact,
$$d := \operatorname{dist}(\overline{G}, \mathbb{R}^N \setminus \Omega) > 0$$
and $\overline{G}$ can be covered by a finite number of closed cubes $C_1, \dots, C_k$ with sides parallel to the coordinate hyperplanes and edges of length $a$. If $a < d/\sqrt{N}$, then $\bigcup_{i=1}^k C_i \subset \Omega$ and
$$c := \sup_{x \in \bigcup_{i=1}^k C_i} \|F'(x)\|_{\mathcal{L}(\mathbb{R}^N)} < \infty.$$

Again it is sufficient to show that
$$\operatorname{meas}_N F(C_i \cap S) = 0, \qquad i = 1, \dots, k.$$
Choose one of these cubes and denote it by $C$. By the Mean Value Theorem (Theorem 3.2.7),
$$\|F(y) - F(x)\| \le c\|x - y\| \quad\text{and}\quad \|F(y) - L_x y\| \le \omega(\|x - y\|)\,\|x - y\|, \qquad x, y \in C,$$
where $\lim_{r \to 0+} \omega(r) = 0$ (uniform continuity of $F'$ on the compact set $C$) and
$$L_x y := F(x) + F'(x)(y - x).$$


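Divide now the cube $C$ into $m^N$ small cubes with edges $\frac{a}{m}$, and consider a small cube $\tilde C$ which contains a critical point $x$. Denote the diameter of $\tilde C$ by $\tilde r$ ($= \frac{a\sqrt{N}}{m}$). Since $L_x(\tilde C)$ lies in an $(N-1)$-dimensional hyperplane ($x$ is a critical point!), we have, up to factors depending only on $N$,
$$\operatorname{meas}_{N-1} L_x(\tilde C) \le (c\tilde r)^{N-1}$$
and, since
$$\operatorname{dist}(F(y), L_x(\tilde C)) \le \omega(\tilde r)\,\tilde r \qquad\text{for } y \in \tilde C,$$
also
$$\operatorname{meas}_N F(\tilde C) \le c^{N-1}\,\omega(\tilde r)\,\tilde r^N.$$
The number of such small cubes which contain critical points is at most $m^N$. Therefore,
$$\operatorname{meas}_N F(S \cap C) \le c^{N-1}\,\omega(\tilde r)\,\tilde r^N m^N \le c_1\,\omega(\tilde r)$$
with a constant $c_1$ independent of $m$. Since $\omega(\tilde r) \to 0$ for $m \to \infty$,
$$\operatorname{meas}_N F(S \cap C) = 0.$$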

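Corollary 5.2.4. Under the hypotheses of Theorem 5.2.3 the set of regular values of $F\colon \Omega \to \mathbb{R}^N$ is dense in $\mathbb{R}^N$.
Proof. The complement of the set of regular values, i.e., the set $F(S)$ of critical values ($S$ now being the set of all critical points of $F$ in $\Omega$), has zero Lebesgue measure, and therefore it cannot have an interior point.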

Remark 5.2.5. A more general Sard Theorem concerns $F\colon \Omega \subset \mathbb{R}^M \to \mathbb{R}^N$. It is surprising that the assertion $\operatorname{meas}_N F(S \cap C) = 0$ needs more smoothness of $F$, namely
$$F \in C^r(\Omega, \mathbb{R}^N) \qquad\text{where}\qquad r > \max\{0, M - N\}.$$
The proof is more involved (see, e.g., Hirsch [67, Chapter 3, Theorem 1.3] for the $C^\infty$-case and the comments given there, or Sard [117], or Sternberg [124, Theorem II.3.1]). If $r \le \max\{0, M - N\}$, then there exists $F \in C^r(\Omega, \mathbb{R}^N)$ such that
$$\operatorname{int} F(S) \neq \emptyset,$$
see Whitney [132]. The statement on the Lebesgue measure can be strengthened by considering the finer Hausdorff measure or dimension. The following result also holds:
If $F\colon \Omega \subset \mathbb{R}^M \to \mathbb{R}$ is analytic, then $F(S)$ is even countable.
For more detail see, e.g., Fučík et al. [56, Chapter IV and Appendix IV]. There is also a generalization for mappings $F\colon X \to Y$, $X$, $Y$ Banach spaces:
If $F'(x)$ is a Fredholm operator for all $x$ and $F$ is sufficiently smooth, then $F(S)$ is nowhere dense in $Y$.
Sharper results can be proved for functionals (i.e., $Y = \mathbb{R}$) (see the book Fučík et al. [56] cited above).
Now we return to the degree $\deg(F, \Omega, y_0)$ where $y_0 \in \mathbb{R}^N \setminus F(\partial\Omega)$ is a critical value of $F$. According to Corollary 5.2.4 there is a sequence $\{y_n\}_{n=1}^\infty$ of regular values of $F$ such that
$$\lim_{n \to \infty} y_n = y_0, \qquad y_n \notin F(\partial\Omega).$$
In particular, $\deg(F, \Omega, y_n)$ is well defined by Definition 5.2.1. Part (vi) of Proposition 5.2.2 allows us to expect that the sequence $\{\deg(F, \Omega, y_n)\}_{n=1}^\infty$ is eventually constant and does not depend on the choice of the sequence $\{y_n\}_{n=1}^\infty$ of regular values. To see this we need to extend Proposition 5.2.2(vi) to guarantee that $\deg(F, \Omega, y)$ is constant on any open connected set
$$G \subset \mathbb{R}^N \setminus F(\partial\Omega \cup S).$$
This can be done due to the fact that any two different points in an open connected subset of $\mathbb{R}^N$ can be connected by a smooth curve in this subset (see Proposition 1.2.7). We leave the details to the reader, who should also verify that all statements of Proposition 5.2.2 remain valid for this more general definition of the degree.
For the definition of the degree $\deg(F, \Omega, y_0)$ it is not necessary to assume that $F \in C^1(\Omega, \mathbb{R}^N)$ since any $F \in C(\overline{\Omega}, \mathbb{R}^N)$ can be approximated by smooth mappings. This is a consequence of the Stone–Weierstrass Theorem (see Theorem 1.2.14).$^{18}$
To show that $\deg(G, \Omega, y_0)$ is the same for all
$$G \in C(\overline{\Omega}, \mathbb{R}^N) \cap C^1(\Omega, \mathbb{R}^N)$$
which are close to $F$ in the $C(\overline{\Omega}, \mathbb{R}^N)$-norm we need the following extension of Proposition 5.2.2(vii).
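Proposition 5.2.6. Let $\Omega$ be a bounded open subset of $\mathbb{R}^N$ and let $F$, $G$ be mappings from $C(\overline{\Omega}, \mathbb{R}^N) \cap C^1(\Omega, \mathbb{R}^N)$. Put
$$H(t, x) = (1 - t)F(x) + tG(x), \qquad t \in [0, 1],\ x \in \overline{\Omega}.$$
Assume that
$$y_0 \in \mathbb{R}^N \setminus \{H(t, x) : t \in [0, 1],\ x \in \partial\Omega\}.$$
Then
$$\deg(F, \Omega, y_0) = \deg(G, \Omega, y_0).\tag{5.2.7}$$
Proof. As has been stated above, Proposition 5.2.2(vii) holds for an arbitrary $y_0 \in \mathbb{R}^N \setminus F(\partial\Omega)$, in particular for $H(t, \cdot)$ and $t$ small. Put
$$t_0 = \sup\{t \in [0, 1] : \deg(H(t, \cdot), \Omega, y_0) = \deg(F, \Omega, y_0)\}.$$
By the same statement (note that $\deg(H(t, \cdot), \Omega, y_0)$ is defined for all $t \in [0, 1]$),
$$\deg(H(t_0, \cdot), \Omega, y_0) = \deg(H(t, \cdot), \Omega, y_0) \qquad\text{for } t \in (t_0 - \delta, t_0 + \delta) \cap [0, 1],$$
i.e., $t_0 = 1$, and the equality (5.2.7) follows.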

The following result summarizes the previous exposition.


18 For the set $A$ in Theorem 1.2.14 take restrictions of polynomials of $N$ variables to $\overline{\Omega}$ and approximate separately every $F_i$, $i = 1, \dots, N$, where $F = (F_1, \dots, F_N)$. For another proof see Lemma 4.3.121.


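Theorem 5.2.7. Let $\Omega$ be a bounded open set in $\mathbb{R}^N$. There exists a mapping
$$\deg(\cdot, \Omega, \cdot)\colon C(\overline{\Omega}, \mathbb{R}^N) \times \mathbb{R}^N \to \mathbb{Z}$$
defined for all $F \in C(\overline{\Omega}, \mathbb{R}^N)$ and $y_0 \in \mathbb{R}^N \setminus F(\partial\Omega)$ which has the properties (i)–(vii) from Proposition 5.2.2 and from Proposition 5.2.6.$^{19}$ If, moreover, $F \in C^1(\Omega, \mathbb{R}^N)$ and $y_0$ is a regular value of $F$, then the formula (5.2.1) holds.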
Remark 5.2.8. The function $\deg$ from the previous theorem is unique. The reader can consult, e.g., Amann & Weiss [5] or Deimling [34] for more information.
Example 5.2.9 (Brouwer). Let $F$ be a continuous mapping from the closed unit ball $B := B(o; 1)$ into itself. Then $F$ has a fixed point in $B$.
Indeed, if there is $x \in \partial B$ such that
$$x - F(x) = o,$$
then the statement is true. In the other case put
$$H(t, x) = x - tF(x).$$
Then $H(1, x) \neq o$ for $\|x\| = 1$ and
$$\|H(t, x)\| \ge \|x\| - t\|F(x)\| \ge 1 - t > 0 \qquad\text{for } t \in [0, 1),\ \|x\| = 1.$$
By the homotopy invariance property of the degree,
$$1 = \deg(I, \operatorname{int} B, o) = \deg(I - F, \operatorname{int} B, o).$$
By property (iv), the equation $x - F(x) = o$ has a solution in $\operatorname{int} B$.
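In the plane, the degree with respect to $o$ on the open unit disk of a continuous $f$ with $f(z) \neq 0$ on $|z| = 1$ can be computed as the winding number of $f$ along the unit circle; this identification is a standard fact that we assume here, it is not proved in the text. The Python sketch below (hypothetical helper `winding_number`, identifying $\mathbb{R}^2$ with $\mathbb{C}$) uses it to follow the homotopy $H(t, x) = x - tF(x)$ of Example 5.2.9 for one particular self-map of the closed unit disk, $F(z) = 0.5z^2 + 0.3$ (our choice; $|F(z)| \le 0.8$). The degree stays equal to $1$ for every sampled $t$.

```python
import cmath
import math


def winding_number(f, n=4000):
    """Winding number of s -> f(exp(2*pi*i*s)), s in [0, 1], around 0.

    Sums the principal-argument increments between consecutive samples,
    so it assumes f does not vanish on the unit circle and that n is large
    enough for each increment to stay below pi in absolute value.
    """
    total = 0.0
    prev = f(1 + 0j)
    for k in range(1, n + 1):
        cur = f(cmath.exp(2j * math.pi * k / n))
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))


def F(z):
    # a continuous self-map of the closed unit disk: |F(z)| <= 0.8 for |z| <= 1
    return 0.5 * z * z + 0.3


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, winding_number(lambda z, t=t: z - t * F(z)))   # degree 1 each time
```

Because the value is an integer that cannot jump along an admissible homotopy, sampling a few values of $t$ is only a sanity check, not a proof.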

Example 5.2.10. Let $B := B(o; 1)$ be the closed unit ball in $\mathbb{R}^N$ and let $A$ be a linear injective operator from $\mathbb{R}^N$ into $\mathbb{R}^N$.$^{20}$ Then
$$\deg(A, \operatorname{int} B, o) = (-1)^p \qquad\text{where } p = \sum_{\lambda \in \sigma(A),\ \lambda < 0} m(\lambda)$$
and $m(\lambda)$ is the multiplicity$^{21}$ of the eigenvalue $\lambda$ of $A$.
This follows immediately from Definition 5.2.1 and Exercise 1.1.40. Notice that the same result is true when $A$ is replaced by a $C^1$-mapping $F\colon \mathbb{R}^N \to \mathbb{R}^N$ which has an isolated zero at $x_0 \in \mathbb{R}^N$, $B(o; 1)$ is replaced by a sufficiently small ball $B(x_0; r)$ (such that the equation $F(x) = o$ has a unique solution in this closed ball, namely $x_0$) and $F'(x_0)$ is injective. Under these hypotheses we have
$$\deg(F, B(x_0; r), o) = (-1)^p \qquad\text{where } p = \sum_{\lambda \in \sigma(F'(x_0)),\ \lambda < 0} m(\lambda).\tag{5.2.8}$$
The value $\deg(F, B(x_0; r), o)$ is also called the index of the isolated solution $x_0$ of the equation $F(x) = o$.

19 Mappings $F$, $G$ are now supposed to be continuous only.
20 Actually, $A$ maps $\mathbb{R}^N$ onto $\mathbb{R}^N$ and $o \notin A(\partial B)$.
21 See footnote 23 on page 84; $m(\lambda)$ is often called the algebraic multiplicity.
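For a linear injective $A$ the only preimage of $o$ is $o$ and $J_A \equiv \det A$, so by Definition 5.2.1 the degree in Example 5.2.10 is $\operatorname{sgn}\det A$, while the example expresses the same number through the negative eigenvalues. The Python sketch below (hypothetical helper names; exact rational arithmetic via the standard `fractions` module) compares both sides for triangular matrices, whose eigenvalues with algebraic multiplicity can be read off the diagonal, and for a rotation, which has no real eigenvalues at all.

```python
from fractions import Fraction


def sign_det(A):
    """Sign of det A for a square matrix A (list of rows), by exact elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    n = len(M)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return 0                      # singular: A is not injective
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign                  # a row swap flips the determinant
        if M[col][col] < 0:
            sign = -sign                  # det is the product of the pivots
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return sign


def sign_from_negative_eigenvalues(diagonal):
    """(-1)^p, p = number of negative diagonal entries of a triangular matrix."""
    p = sum(1 for lam in diagonal if lam < 0)
    return (-1) ** p


T1 = [[-2, 5, 1], [0, 3, 7], [0, 0, -1]]   # eigenvalues -2, 3, -1: p = 2
T2 = [[-1, 4], [0, 2]]                      # eigenvalues -1, 2:     p = 1
R = [[0, -1], [1, 0]]                       # rotation by pi/2: no real eigenvalues
print(sign_det(T1), sign_from_negative_eigenvalues([-2, 3, -1]))  # 1 1
print(sign_det(T2), sign_from_negative_eigenvalues([-1, 2]))      # -1 -1
print(sign_det(R))                                                # 1
```

Exact fractions avoid any doubt about the sign of a determinant that happens to be close to zero in floating point.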
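Example 5.2.11. Let $\Omega$ be a bounded open subset of $\mathbb{R}^N$ and let $Y$ be a linear subspace of $\mathbb{R}^N$. Suppose that $f \in C(\overline{\Omega}, Y)$ is such that $o \notin f(\partial\Omega)$. Denote by $\pi$ a projection of $\mathbb{R}^N$ onto $Y$, by $\tilde f$ the restriction of $f$ to $\overline{\Omega} \cap Y$ and
$$g(x) := f(x) + (I - \pi)x.$$
Then
$$\deg(g, \Omega, o) = \deg(\tilde f, \Omega \cap Y, o).\tag{5.2.9}$$
To see this notice first that
$$\partial_Y(\Omega \cap Y) \subset \partial\Omega \cap Y \qquad\text{and}\qquad g^{-1}(o) = \tilde f^{-1}(o),$$
where $\partial_Y$ denotes the boundary relative to $Y$. This means that both sides of (5.2.9) are defined. By the construction of the degree, it suffices to prove the equality under the following additional assumptions: $f \in C^1(\Omega)$ and $o$ is a regular value. Since
$$g'(y)(h + k) = \tilde f'(y)h + f'(y)k + k \qquad\text{for } y \in \Omega \cap Y,\ h \in Y,\ k \in \operatorname{Ker}\pi,$$
where $\tilde f'(y)h,\ f'(y)k \in Y$, we get
$$\det g'(y) = \det \tilde f'(y),^{22}$$
and (5.2.9) follows.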
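The determinant identity used in Example 5.2.11, which footnote 22 leaves to the reader, can be checked in block form. Assume, as in the argument above, that $f$ is $C^1$, take $y \in \Omega \cap Y$, and write $\mathbb{R}^N = Y \oplus \operatorname{Ker}\pi$ with a basis of $Y$ followed by a basis of $\operatorname{Ker}\pi$. For $g(x) = f(x) + (I - \pi)x$ with $f$ taking values in $Y$, the derivative $g'(y)$ maps $h \in Y$ to $f'(y)h \in Y$, and $k \in \operatorname{Ker}\pi$ to $f'(y)k + k$ with $f'(y)k \in Y$; hence

$$
g'(y) = \begin{pmatrix} \tilde f'(y) & f'(y)|_{\operatorname{Ker}\pi} \\[2pt] 0 & I_{\operatorname{Ker}\pi} \end{pmatrix},
\qquad
\det g'(y) = \det \tilde f'(y) \cdot \det I_{\operatorname{Ker}\pi} = \det \tilde f'(y).
$$

The upper-right block does not affect the determinant of a block triangular matrix, so the identity holds whether or not $f'(y)$ vanishes on $\operatorname{Ker}\pi$.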

Our next aim is to generalize the notion of the degree to infinite dimensional spaces. Since the Brouwer Theorem is a corollary of the homotopy invariance property of the degree and this theorem does not hold even in an infinite dimensional Hilbert space (Example 5.1.7), we cannot expect a meaningful generalization of the Brouwer degree which would be valid for all continuous mappings. As in the Schauder Fixed Point Theorem we restrict our attention to operators which are well approximated by finite dimensional ones, i.e., to compact operators.
One more remark is desirable. One of the main consequences of the notion of $\deg(F, \Omega, y_0)$ is a sufficient condition for the solvability of the equation $F(x) = y_0$ in the set $\Omega$. If $F$ is a compact operator, then $F(\Omega)$ is rather small in an infinite dimensional space. Therefore it is much better to solve either
$$x - F(x) - y_0 = o$$
22 The reader is asked to check this equality.


(recall the Fredholm theory in Section 2.2) or, more generally,
$$Ax - F(x) = o$$
for a suitable $A$.

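The Leray–Schauder degree concerns operators of the type
$$I - F$$
where $F\colon X \to X$ is a compact operator. Let $\Omega$ be a bounded open set in a Banach space $X$, $o \in X \setminus (I - F)(\partial\Omega)$. Using the compactness of $F$, it is easy to prove that $(I - F)(\partial\Omega)$ is closed, and hence,
$$d := \operatorname{dist}(o, (I - F)(\partial\Omega)) > 0.$$
Let $\{F_n\}_{n=1}^\infty$ be a sequence in $\mathcal{C}_f(\overline{\Omega}, X)$ which converges to $F$ uniformly on $\overline{\Omega}$ (Theorem 5.1.9). Denote
$$X_n = \operatorname{Lin} F_n(\overline{\Omega}), \qquad \Omega_n = X_n \cap \Omega, \qquad\text{and}\qquad G_n(x) = x - F_n(x) \quad\text{for } x \in \overline{\Omega}_n.$$
If $n$ is sufficiently large, then $o \notin G_n(\partial\Omega_n)$ and $\deg(G_n, \Omega_n, o)$ is well defined.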


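Lemma 5.2.12. Under the above stated hypotheses, the sequence of integers $\{\deg(G_n, \Omega_n, o)\}_{n=1}^\infty$ is constant for large $n$, and its limit does not depend on the choice of the approximating sequence $\{G_n\}_{n=1}^\infty$.
Proof. For a given $\varepsilon > 0$ there is $n_0$ such that
$$\sup_{x \in \overline{\Omega}} \|F(x) - F_n(x)\| < \varepsilon \qquad\text{for } n \ge n_0.$$
Choose some $n, m \ge n_0$ and put
$$\tilde X = X_n + X_m, \qquad \tilde G_k(x) = x - F_k(x), \quad x \in \overline{\Omega} \cap \tilde X,\ k = n, m.$$
Consider the homotopy
$$H(t, x) = t\tilde G_m(x) + (1 - t)\tilde G_n(x), \qquad t \in [0, 1],\ x \in \overline{\Omega} \cap \tilde X.$$
For $x \in \partial\tilde\Omega$, where $\tilde\Omega := \Omega \cap \tilde X$, we have
$$\begin{aligned}
\|H(t, x)\| &= \|x - F(x) - t[F_m(x) - F(x)] - (1 - t)[F_n(x) - F(x)]\| \\
&\ge \|x - F(x)\| - t\|F_m(x) - F(x)\| - (1 - t)\|F_n(x) - F(x)\| \\
&\ge d - 2\varepsilon > 0
\end{aligned}$$
for small $\varepsilon > 0$. By Theorem 5.2.7,
$$\deg(\tilde G_m, \tilde\Omega, o) = \deg(\tilde G_n, \tilde\Omega, o).$$
Using Example 5.2.11 we get
$$\deg(\tilde G_k, \tilde\Omega, o) = \deg(G_k, \Omega_k, o), \qquad k = m, n,$$
i.e., $\deg(G_n, \Omega_n, o)$ is constant for $n \ge n_0$.
If $\{\Phi_n\}_{n=1}^\infty$ is another approximating sequence of $F$, then the homotopy joining the restrictions of $I - F_n$ and $I - \Phi_n$ to the span $Z$ of $\operatorname{Im} F_n + \operatorname{Im}\Phi_n$ can be defined. The same procedure as above yields
$$\deg(I - F_n, \Omega \cap Z, o) = \deg(I - \Phi_n, \Omega \cap Z, o).$$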

We are now able to define the Leray–Schauder degree
$$\deg(I - F, \Omega, y_0) := \deg(I - (F + y_0), \Omega, o)$$
as the limit of $\deg(G_n, \Omega_n, o)$ for any approximating sequence $\{G_n, \Omega_n\}_{n=1}^\infty$. This construction also shows that the Leray–Schauder degree inherits its properties from the Brouwer degree.
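Theorem 5.2.13. Let $\Omega$ be a bounded open subset of a Banach space $X$. There exists a mapping $\deg(I - F, \Omega, y_0)$ defined for all compact operators $F \in \mathcal{C}(\overline{\Omega}, X)$ and $y_0 \in X$ such that
$$x - F(x) \neq y_0 \qquad\text{for all } x \in \partial\Omega.$$
This mapping has the following properties:
(i) $\deg(I, \Omega, y_0) = \begin{cases} 1 & \text{if } y_0 \in \Omega,\\ 0 & \text{if } y_0 \notin \overline{\Omega}.\end{cases}$
(ii) $\deg(I - F, \Omega, y_0) = \deg(I - F - y_0, \Omega, o)$.
(iii) If $\deg(I - F, \Omega, y_0) \neq 0$, then the equation $x - F(x) = y_0$ has a solution in $\Omega$.
(iv) If $\Omega_1, \dots, \Omega_k$ are pairwise disjoint open subsets of $\Omega$ and $x - F(x) \neq y_0$ for each $x \in \overline{\Omega} \setminus \bigcup_{j=1}^k \Omega_j$, then
$$\deg(I - F, \Omega, y_0) = \sum_{j=1}^k \deg(I - F, \Omega_j, y_0).$$
(v) If $F, G \in \mathcal{C}(\overline{\Omega}, X)$ and
$$\sup_{x \in \partial\Omega} \|F(x) - G(x)\|_X < \inf_{x \in \partial\Omega} \|x - F(x) - y_0\|_X,$$
then
$$\deg(I - F, \Omega, y_0) = \deg(I - G, \Omega, y_0).$$
(vi) (homotopy invariance property) If $F, G \in \mathcal{C}(\overline{\Omega}, X)$ and
$$H(t, x) = (1 - t)F(x) + tG(x), \qquad t \in [0, 1],\ x \in \overline{\Omega},$$
are such that
$$x - H(t, x) \neq y_0 \qquad\text{for every } x \in \partial\Omega \text{ and } t \in [0, 1],$$
then $\deg(I - H(t, \cdot), \Omega, y_0)$ is constant on $[0, 1]$. In particular,
$$\deg(I - F, \Omega, y_0) = \deg(I - G, \Omega, y_0).$$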
Proof. Finite dimensional approximations and the corresponding properties of the Brouwer degree are used to prove all the statements. We only give details for (iii). We can assume that $y_0 = o$ (by (ii)). Let $\{F_n\}_{n=1}^\infty \subset \mathcal{C}_f(\overline{\Omega}, X)$ be a sequence of finite dimensional approximations that converges to $F$ uniformly on $\overline{\Omega}$. We know, by the construction of the degree, that
$$\deg(I - F, \Omega, o) = \deg(I_n - F_n, \Omega_n, o) \neq 0 \qquad\text{for all } n \text{ large}.$$
By Theorem 5.2.7 there are $x_n \in \Omega_n$ such that
$$F_n(x_n) = x_n.$$
Since $F$ is compact, there exists a subsequence $\{F(x_{n_k})\}_{k=1}^\infty$ converging to some $z \in X$. It follows from the uniform convergence of $F_n$ that $\lim_{k \to \infty} F_{n_k}(x_{n_k}) = z$, too. This means that also $\lim_{k \to \infty} x_{n_k} = z$ and, therefore,
$$F(z) = z.$$
But $z$ cannot belong to $\partial\Omega$, i.e., $z \in \Omega$.

Example 5.2.14 (Rothe's version of the Schauder Fixed Point Theorem). Assume that $F$ is a compact operator from the closed unit ball $B(o; 1)$ of a Banach space $X$ into $X$. If
$$F(\partial B(o; 1)) \subset B(o; 1),$$
then $F$ has a fixed point in $B(o; 1)$.
Indeed, suppose not and consider $H(t, x) := x - tF(x)$. By the homotopy invariance property,
$$\deg(I - F, B(o; 1), o) = \deg(I, B(o; 1), o) = 1,$$
and property (iii) of Theorem 5.2.13 yields a fixed point, a contradiction.


Example 5.2.15 (Schaefer). Let $F \in \mathcal{C}(X, X)$ and let
$$\Sigma := \{x \in X : \exists\, t \in [0, 1] \text{ such that } x - tF(x) = o\}$$
be bounded.$^{23}$ Then $F$ has a fixed point.
To prove this choose an $r > 0$ such that $\Sigma \subset B(o; r)$ and put $\Omega = B(o; r)$ (open ball). The homotopy invariance property of the degree can be applied to
$$H(t, x) = (1 - t)x + t(x - F(x)),$$
and the result follows.

The next example shows that finding an a priori estimate need not be a trivial task.
Example 5.2.16. Consider the boundary value problem
$$\begin{cases} \ddot x(t) = f(t, x(t), \dot x(t)), & t \in (0, 1),\\ x(0) = x(1) = 0, \end{cases}\tag{5.2.10}$$
where $f\colon [0, 1] \times \mathbb{R}^2 \to \mathbb{R}$ is a continuous function. We know (Example 2.3.8 and also Example 5.1.14) that a solution of (5.2.10) is also a solution of the integral equation
$$F(x)(t) := \int_0^1 G(t, s) f(s, x(s), \dot x(s))\,ds = x(t)$$
where the Green function $G(t, s)$ is defined as follows:
$$G(t, s) = \begin{cases} s(t - 1), & 0 \le s \le t \le 1,\\ t(s - 1), & 0 \le t < s \le 1. \end{cases}$$
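As a quick check of the Green function, the Python sketch below (hypothetical helper names; the midpoint quadrature rule is our choice) evaluates $\int_0^1 G(t, s) y(s)\,ds$ numerically and compares it with the exact solutions of $\ddot x = y$, $x(0) = x(1) = 0$ for $y \equiv 1$, namely $x(t) = t(t - 1)/2$, and for $y(s) = s$, namely $x(t) = (t^3 - t)/6$.

```python
def green(t, s):
    """Green function of x'' = y, x(0) = x(1) = 0, as in Example 5.2.16."""
    return s * (t - 1) if s <= t else t * (s - 1)


def apply_green(y, t, n=2000):
    """Approximate x(t) = int_0^1 G(t, s) y(s) ds with the midpoint rule."""
    h = 1.0 / n
    return h * sum(green(t, (k + 0.5) * h) * y((k + 0.5) * h) for k in range(n))


# For y = 1 the solution of x'' = 1, x(0) = x(1) = 0 is x(t) = t(t - 1)/2.
for t in (0.0, 0.25, 0.5, 1.0):
    print(t, apply_green(lambda s: 1.0, t), t * (t - 1) / 2)
```

The kink of $G(t, \cdot)$ at $s = t$ limits the accuracy of the quadrature to $O(n^{-2})$, which is ample for this check.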
Notice the difference between this example and Example 5.1.14. Here the operator $F$ depends also on the derivative $\dot x$, and it is thus defined only on a dense subset of the space $C[0, 1]$. The notion of the degree cannot be used for $F$ in this space. Therefore, we have to work either in $C^1[0, 1]$ or in the space
$$X = \{x \in C^2[0, 1] : x(0) = x(1) = 0\}$$
to which our solution has to belong. In both spaces the problem of an a priori estimate of a possible solution to (5.2.10) occurs; see Step 2 of this example. We will work in $X$.
23 This assumption is often called an a priori estimate. Notice that it is not assumed that the equation $x - tF(x) = o$ has any solution. However, if a solution exists, then it belongs to a certain ball whose radius is independent of $t$. The result given in this example is also called the Leray–Schauder Continuation Method.


Step 1. First we show that $F$ is a compact operator on $X$. To prove this notice that $F$ is the composition $\Psi \circ \Phi$ where $\Phi\colon X \to C[0, 1]$ is a Nemytski type operator
$$\Phi(x)\colon t \mapsto f(t, x(t), \dot x(t))$$
and $\Psi\colon C[0, 1] \to X$ is the linear integral operator
$$\Psi(y)(t) = \int_0^1 G(t, s) y(s)\,ds.$$
The Nemytski operator $\Phi$ is also a composition of a compact embedding of $X$ into $C^1[0, 1]$ (the Arzelà–Ascoli Theorem) and a continuous operator from $C^1[0, 1]$ into $C[0, 1]$. It is sufficient to show that $\Psi$ is a continuous linear operator from $C[0, 1]$ into $X$.
Indeed, since $\Psi(y)$ is a solution of the boundary value problem
$$\begin{cases} \ddot x(t) = y(t), & t \in (0, 1),\\ x(0) = x(1) = 0 \end{cases}$$
(Example 2.3.8), we have $x = \Psi(y) \in X$. Because of the boundary conditions there is $t_0 \in (0, 1)$ such that $\dot x(t_0) = 0$ (the classical theorem due to Rolle). This allows us to write
$$\dot x(t) = \int_{t_0}^t y(s)\,ds, \qquad t \in [0, 1],$$
and to get the estimate $|\dot x(t)| \le \|y\|_{C[0,1]}$. Since $|x(t)| = \bigl|\int_0^t \dot x(s)\,ds\bigr| \le \|y\|_{C[0,1]}$, we have
$$\|x\|_X := \sup_{t \in [0,1]} |x(t)| + \sup_{t \in [0,1]} |\dot x(t)| + \sup_{t \in [0,1]} |\ddot x(t)| \le 3\|y\|_{C[0,1]}.$$

Step 2. In order to establish an a priori estimate we have to require estimates on the behavior of $f(t, x, y)$ for large $x$ and $y$:
(H1) There is $M_0$ such that
$$xf(s, x, 0) > 0 \qquad\text{for } s \in [0, 1] \text{ and } |x| \ge M_0.$$
(H2) There are $c_1, c_2 > 0$ such that
$$|f(s, x, y)| \le c_1 y^2 + c_2 \qquad\text{for } s \in [0, 1],\ |x| \le M_0,\ y \in \mathbb{R}.^{24}$$
Suppose that (H1) and (H2) hold. Let there exist $x \in X$, $x \neq o$, and $\lambda \in (0, 1]$ such that
$$x = \lambda F(x).$$
24 The condition (H1) is sometimes called the sign condition and (H2) the Nagumo-type condition.


First we will estimate $\|x\|_{C[0,1]}$. There is $t_0 \in (0, 1)$ such that $|x(t_0)| = \|x\|_{C[0,1]}$, and we assume $x(t_0) > 0$. Then $\dot x(t_0) = 0$ and $\ddot x(t_0) \le 0$. Since
$$0 \ge x(t_0)\ddot x(t_0) = \lambda x(t_0) f(t_0, x(t_0), 0),$$
we have $x(t_0) \le M_0$ according to (H1). Similarly for $x(t_0) < 0$, i.e., $\|x\|_{C[0,1]} \le M_0$.
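To get an estimate of $\dot x$ we consider the function
$$g(t) := \log\bigl[c_1(\dot x(t))^2 + c_2\bigr].$$
Since
$$\dot g(t) = 2c_1\frac{\dot x(t)\ddot x(t)}{c_1(\dot x(t))^2 + c_2} = 2c_1\lambda\frac{\dot x(t) f(t, x(t), \dot x(t))}{c_1(\dot x(t))^2 + c_2},$$
we obtain, by (H2) and $\lambda \le 1$,
$$|\dot g(t)| \le 2c_1|\dot x(t)|.$$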

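Let
$$G = \{t \in [0, 1] : \dot x(t) \neq 0\}.$$
Then
$$G = \bigcup_j J_j$$
where $J_j$ is an interval such that $\dot x(t) \neq 0$ for $t \in J_j$ and $\dot x$ vanishes at one end point of $J_j$ (say $\tau_j$) at least. Now, for $t \in J_j$ we have
$$0 \le \log\frac{c_1\dot x^2(t) + c_2}{c_1 \cdot 0 + c_2} = g(t) - g(\tau_j) = \int_{\tau_j}^t \dot g(s)\,ds \le \Bigl|\int_{\tau_j}^t |\dot g(s)|\,ds\Bigr| \le \Bigl|\int_{\tau_j}^t 2c_1|\dot x(s)|\,ds\Bigr| = 2c_1|x(t) - x(\tau_j)|,$$
since $\dot x$ does not change its sign in $J_j$. Hence
$$\log\frac{c_1\dot x^2(t) + c_2}{c_2} \le 4c_1M_0.$$
This inequality shows that
$$|\dot x(t)| \le M_1 := \sqrt{\frac{c_2}{c_1}}\,e^{2c_1M_0}, \qquad t \in [0, 1].$$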

If $M_2 := \sup\{|f(t, x, y)| : t \in [0, 1],\ |x| \le M_0,\ |y| \le M_1\}$, then
$$\|\ddot x\|_{C[0,1]} \le M_2.$$
These estimates of $\|x\|_{C[0,1]}$, $\|\dot x\|_{C[0,1]}$, $\|\ddot x\|_{C[0,1]}$ show that the set in Example 5.2.15 is bounded, and therefore the proof of existence of a solution of (5.2.10) under the hypotheses (H1) and (H2) is complete.
The reader can imagine that (H1), (H2) are not the only sufficient conditions for solving (5.2.10). However, the direct use of the Schauder Fixed Point Theorem leads to more restrictive assumptions on $f$. A survey of results until 1980 can be found in the monograph Fučík [53].


It is clear that the above stated procedure can be used for solving a more general equation
$$Ax = F(x)\tag{5.2.11}$$
where $F \in \mathcal{C}(\overline{\Omega}, X)$ and $A$ is a linear operator with a bounded inverse. In that case (5.2.11) is equivalent to
$$x = A^{-1}F(x)$$
with a compact operator $A^{-1}F$. More interesting questions arise for a non-invertible $A$. Since many differential operators (both ordinary and partial) are Fredholm operators, we will suppose that $A$ is a linear closed Fredholm operator$^{25}$ of index zero, and proceed as in Remark 4.3.14 with the exception that $A$ is not assumed to be continuous.
We denote
$$Y_2 := \operatorname{Im} A, \qquad X_1 := \operatorname{Ker} A,$$
and choose topological complements $X_2$, $Y_1$ such that
$$X = X_1 \oplus X_2, \qquad Y = Y_1 \oplus Y_2.$$
These closed complements exist because $X_1$ has a finite dimension and $Y_2$ a finite co-dimension (Example 2.1.12 and Remark 2.1.19). By the assumption on the index of $A$ there is also a homeomorphism $\Lambda$ of $Y_1$ onto $X_1$. Denote by $P$ and $Q$ the linear continuous projections onto $X_1$ and $Y_1$ with kernels $X_2$ and $Y_2$, respectively. Then the restriction of $A$ to $X_2 \cap \operatorname{Dom} A$ is an injective operator with a bounded inverse $B$.$^{26}$
The equation (5.2.11) is equivalent to the pair of equations
$$Ax_2 = (I - Q)F(x_1 + x_2), \qquad o = QF(x_1 + x_2), \qquad x_1 \in X_1,\ x_2 \in X_2 \cap \operatorname{Dom} A$$
(see Figure 5.2.4) or
$$x_2 = B(I - Q)F(x_1 + x_2), \qquad x_1 = x_1 + \Lambda QF(x_1 + x_2).^{27}\tag{5.2.12}$$
This pair of equations is equivalent to the equation
$$G(x) = x \qquad\text{where}\qquad G(x) := Px + \Lambda QF(x) + B(I - Q)F(x).$$

25 A linear closed operator $A$ is said to be Fredholm if $\dim\operatorname{Ker} A < \infty$, $\operatorname{Im} A$ is closed and $\operatorname{codim}\operatorname{Im} A < \infty$. The index of such an operator is defined as $\operatorname{ind} A := \dim\operatorname{Ker} A - \operatorname{codim}\operatorname{Im} A$. See the definition on page 70.
26 Indeed, $B$ is a closed operator (as an inverse to a closed operator) defined on the Banach space $Y_2$. The continuity of $B$ now follows from the Closed Graph Theorem (Corollary 2.1.10). The operator $B(I - Q)$ is called a generalized (or a right) inverse to $A$. It is characterized by the following two properties:
(i) $AB(I - Q) = I - Q$ (the reason for calling it the right inverse);
(ii) $B(I - Q)Ax = (I - P)x$ for $x \in \operatorname{Dom} A$.
27 Since $F$ is nonlinear, $\Lambda$ need not be taken as linear. It is actually only essential that $\Lambda$ maps $Y_1$ into $X_1$ and $\Lambda^{-1}(o) = \{o\}$.


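[Figure 5.2.4: the decompositions $X = X_1 \oplus X_2$ with $X_1 = \operatorname{Ker} A$ and $Y = Y_1 \oplus Y_2$ with $Y_2 = \operatorname{Im} A$, together with the maps $P$, $Q$ and $B$.]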

If $\Omega$ is a bounded open subset of $X$, $Ax - F(x) \neq o$ for each $x \in \partial\Omega \cap \operatorname{Dom} A$ and $G$ is compact on $\overline{\Omega}$, then the Leray–Schauder degree
$$\deg(I - G, \Omega, o)$$
is well defined. It is called the coincidence degree of the couple $(A, F)$. It can be proved that this definition does not depend on the choice of the projections $P$, $Q$ and of $\Lambda$ within the class of mappings which do not change the orientations in $X_1$ and $Y_1$. The coincidence degree was introduced by J. Mawhin (see, e.g., Mawhin [91] or Gaines & Mawhin [57]). J. Mawhin also proved the following theorem which generalizes the statement of Example 5.2.15.
Theorem 5.2.17 (J. Mawhin). Let $A\colon \operatorname{Dom} A \subset X \to X$ be a Fredholm operator of index zero, $\Omega$ a bounded open subset of a Banach space $X$, and let $B(I - Q)F \in \mathcal{C}(\overline{\Omega}, X)$ where $B(I - Q)$ is a generalized inverse to $A$. Assume further that
(i) $Ax \neq \lambda F(x)$ for $x \in \partial\Omega \cap \operatorname{Dom} A$, $\lambda \in (0, 1)$,$^{28}$
(ii) $\deg(QF|_{\operatorname{Ker} A \cap \overline{\Omega}}, \operatorname{Ker} A \cap \Omega, o) \neq 0$.
Then the equation (5.2.11) has a solution in $\overline{\Omega}$.
Proof. The proof is based on the observation that the coincidence degree can be reduced, with the help of the homotopy invariance property and the Product Formula, to the Brouwer degree of the restriction of $QF$ to $\operatorname{Ker} A$; for details see the references given above.

Example 5.2.18. Consider the equation
$$\dot x = f(t, x)\tag{5.2.13}$$
together with the periodic boundary condition
$$x(0) = x(1)\tag{5.2.14}$$
where $f \in C([0, 1] \times \mathbb{R}, \mathbb{R})$. See also Example 4.3.15.


28 Notice that injectivity of $A$ is actually not needed; hence the assumption (i) is not assumed for $\lambda = 0$.


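We denote $\operatorname{Dom} A = \{x \in C^1[0, 1] : x(0) = x(1)\}$ and put
$$Ax = \dot x \qquad\text{for } x \in \operatorname{Dom} A.$$
Further, let $F$ be a Nemytski operator defined by
$$F(x)(t) = f(t, x(t)), \qquad t \in [0, 1],\ x \in X := C[0, 1].$$
Then $A$ is a Fredholm operator of index zero. We choose projections
$$Px = x(0) \quad\text{onto } \operatorname{Ker} A \qquad\text{and}\qquad Qy = \int_0^1 y(s)\,ds \quad\text{onto the complement of } \operatorname{Im} A.$$
Then the generalized inverse to $A$ is given by
$$B(I - Q)y(t) = \int_0^t y(s)\,ds - t\int_0^1 y(s)\,ds$$
and it is an isomorphism of $\operatorname{Im} A = \operatorname{Ker} Q$ onto $X_2 \cap \operatorname{Dom} A$ where $X_2 = \operatorname{Ker} P$.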
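The formula for the generalized inverse can be checked numerically: $x = B(I - Q)y$ should satisfy $x(0) = x(1) = 0$ (so $x \in \operatorname{Dom} A$ and $Px = 0$), and $\dot x = y - Qy$, which is property (i) of footnote 26. The Python sketch below (hypothetical helper name; integrals approximated by the cumulative trapezoidal rule, an assumption of ours) samples $x$ on a uniform grid. For $y(t) = \cos 2\pi t + 1$ one has $Qy = 1$ and $x(t) = \sin(2\pi t)/(2\pi)$.

```python
import math


def generalized_inverse(y, n=4000):
    """Sample x = B(I - Q)y on the grid t_k = k/n, k = 0, ..., n.

    x(t) = int_0^t y(s) ds - t * int_0^1 y(s) ds, with the integrals
    approximated by the cumulative trapezoidal rule.
    """
    ts = [k / n for k in range(n + 1)]
    cumulative = [0.0]
    for k in range(n):
        h = ts[k + 1] - ts[k]
        cumulative.append(cumulative[-1] + 0.5 * h * (y(ts[k]) + y(ts[k + 1])))
    total = cumulative[-1]                       # approximates Qy
    return ts, [c - t * total for c, t in zip(cumulative, ts)]


ts, xs = generalized_inverse(lambda t: math.cos(2 * math.pi * t) + 1.0)
print(xs[0], xs[-1])   # both 0.0: x(0) = x(1), and Px = x(0) = 0
# difference quotients of x approximate y - Qy = cos(2*pi*t)
err = max(abs((xs[k + 1] - xs[k]) / (ts[k + 1] - ts[k])
              - math.cos(math.pi * (ts[k] + ts[k + 1])))
          for k in range(len(ts) - 1))
print(err < 1e-5)      # True
```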


Moreover, B(I Q)F is a compact operator on X. To verify conditions (i), (ii) of
Theorem 5.2.17 we suppose
(H) there are functions f+ , f C[0, 1] such that
lim f (t, x) = f (t),

lim f (t, x) = f+ (t)

x+

uniformly with respect to t [0, 1] and


t [0, 1],

f (t) < f (t, x) < f+ (t),

x R.29

Step 1. Verification of condition (i): For any solution $x$ of (5.2.13)–(5.2.14) we have, by integration,
$$\int_0^1 f_-(t)\,dt < 0 = \int_0^1 f(t, x(t))\,dt < \int_0^1 f_+(t)\,dt.\tag{5.2.15}$$

Take an $\varepsilon > 0$ such that
$$\int_0^1 f_-(t)\,dt < -\varepsilon, \qquad \int_0^1 f_+(t)\,dt > \varepsilon.$$
Then there exists $r > 0$ such that
$$0 < f(t, x) - f_-(t) < \varepsilon \qquad\text{for } t \in [0, 1],\ x < -r,$$
and
$$0 < f_+(t) - f(t, x) < \varepsilon \qquad\text{for } t \in [0, 1],\ x > r.$$

29 Conditions similar to (H) are called conditions of the Landesman–Lazer type. See, e.g., Landesman & Lazer [83], Fučík [53], Mawhin [91] or Drábek [39], and also Section 7.5.


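This implies that
$$\int_0^1 f(t, x(t))\,dt < 0 \quad\text{if } x(t) < -r \text{ on } [0, 1], \qquad \int_0^1 f(t, x(t))\,dt > 0 \quad\text{if } x(t) > r \text{ on } [0, 1].$$
Since the integral vanishes for any solution by (5.2.15) and $x$ is continuous, for any solution $x$ of (5.2.13)–(5.2.14) there exists $t_0 \in [0, 1]$ such that $|x(t_0)| \le r$. Since
$$x(t) = x(t_0) + \int_{t_0}^t f(s, x(s))\,ds,$$
we get
$$\|x\|_{C[0,1]} \le M := r + \max\{\|f_-\|_{C[0,1]}, \|f_+\|_{C[0,1]}\}.$$
If we take $\Omega$ to be a ball $B(o; R)$ of radius $R > M$, then condition (i) from Theorem 5.2.17 is satisfied for $\lambda = 1$. The same is also true for the solutions of
$$Ax = \lambda F(x), \qquad \lambda \in (0, 1).$$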

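Step 2. Verification of condition (ii): For $x \in \operatorname{Ker} A$ (i.e., $x$ a constant function) we have
$$QF(x) = \int_0^1 f(t, x)\,dt$$
provided we have taken $\Lambda$ as the identity map. Moreover,
$$\int_0^1 f(t, R)\,dt > 0 > \int_0^1 f(t, -R)\,dt.\tag{5.2.16}$$
From the construction of the Brouwer degree it follows that we can assume that
$$\varphi(x) := \int_0^1 f(t, x)\,dt$$
is smooth, and $0$ is a regular value of $\varphi$. By (5.2.16), the number of zero points of $\varphi$ in $(-R, R)$ is odd, and the signs of $\varphi'$ at consecutive zeros alternate, beginning and ending with $+1$. This means that
$$\deg(\varphi, (-R, R), 0) = \sum_{\varphi(x) = 0} \operatorname{sgn}\varphi'(x) = 1 = \deg\bigl(QF|_{\operatorname{Ker} A \cap B(o; R)}, \operatorname{Ker} A \cap B(o; R), o\bigr)$$
and, therefore, condition (ii) is also satisfied.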
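The degree computation of Step 2 can be reproduced numerically for a concrete right-hand side. The Python sketch below uses the hypothetical choice $f(t, x) = \arctan x - 3xe^{-x^2} + \cos 2\pi t$, which satisfies (H) with $f_\pm(t) = \pm\pi/2 + \cos 2\pi t$ (one checks that $|\arctan x - 3xe^{-x^2}| < \pi/2$); then $\varphi(x) = \arctan x - 3xe^{-x^2}$ has three zeros, yet the signed count is $1$. The helper names, the midpoint rule in $t$ and the grid-based zero detection (which assumes simple zeros off the grid points) are our assumptions.

```python
import math


def phi(f, x, m=200):
    """phi(x) = int_0^1 f(t, x) dt for the constant function x (midpoint rule)."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h, x) for k in range(m))


def degree_1d(g, a, b, n=1001):
    """Sum of sgn g'(x) over the zeros of g in (a, b).

    Each zero is detected as a strict sign change of g between neighbouring
    grid points and counted +1 if g increases through it, -1 otherwise;
    this assumes simple zeros that do not lie on grid points.
    """
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    vals = [g(x) for x in xs]
    total = 0
    for left, right in zip(vals, vals[1:]):
        if left < 0 < right:
            total += 1
        elif left > 0 > right:
            total -= 1
    return total


def f(t, x):
    # hypothetical right-hand side satisfying (H) with f_pm(t) = +-pi/2 + cos(2*pi*t)
    return math.atan(x) - 3 * x * math.exp(-x * x) + math.cos(2 * math.pi * t)


R = 5.0
print(phi(f, -R) < 0 < phi(f, R))             # True, cf. (5.2.16)
print(degree_1d(lambda x: phi(f, x), -R, R))  # 1, although phi has three zeros
```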


These considerations show that the problem (5.2.13)–(5.2.14) has a solution provided (H) is fulfilled. Notice that we have shown that
$$\int_0^1 f_-(t)\,dt < 0 < \int_0^1 f_+(t)\,dt$$
is also a necessary condition for the solvability of (5.2.13)–(5.2.14) under the assumption (H) (by (5.2.15)).
Remark 5.2.19. It is not necessary to consider only projections $P$, $Q$ onto the small space $\operatorname{Ker} A$ and onto a complement of $\operatorname{Im} A$. For example, suppose that there are projections $\{Q_n\}$ converging to the identity in a certain sense, and
$$Q_nQ = Q$$
(e.g., $Q_n$ can be the partial sums of the Fourier series of the elements of $Y = C[0, 1]$ for the periodic problem). If we can take projections $P_n$ so that
$$A(\operatorname{Im}(I - P_n)) = \operatorname{Im}(I - Q_n),$$
then there is a chance to solve the first equation in (5.2.12) for a fixed $x_1$ by the Contraction Principle even if $F$ is only locally Lipschitz. This idea is due to L. Cesari (see, e.g., his survey in Cesari [22]). Using this approach he proved the existence of a $2\pi$-periodic solution of the equation
$$\ddot x + x^3 = \sin t.$$
Notice the significant difference in the sign of the nonlinear term here and in (H1) in Example 5.2.16, and the fact that the nonlinear term grows faster here than in (H2).
At the end of this section we turn our attention to the bifurcation of solutions. As in Section 4.3 we consider the equation
$$f(\lambda, x) = o$$
where $f\colon \mathbb{R} \times X \to X$ is continuous on $J \times U$, $J$ is an open interval and $U$ is a neighborhood of $o$ in a Banach space $X$. We suppose that
$$f(\lambda, o) = o \qquad\text{for all } \lambda \in J$$
and wish to find conditions under which the point $\lambda_0 \in J$ is a bifurcation point according to Definition 4.3.21. In Section 4.3 we used a method based on the Implicit Function Theorem; now we want to employ the topological approach based on degree theory. Notice that the definition of the index of an isolated solution (Example 5.2.10) can be used verbatim also in an infinite dimensional space.


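Proposition 5.2.20. Let $h(\lambda, \cdot)\colon X \to X$ be a compact operator on the neighborhood $U$ of zero in a Banach space $X$ for all $\lambda \in J$. Let $o$ be an isolated solution of
$$f(\lambda, x) := x - h(\lambda, x) = o \quad\text{in } U \qquad\text{for all } \lambda \in J \setminus \{\lambda_0\}.$$
Put
$$i(\lambda) = \deg(f(\lambda, \cdot), U, o).$$
If
$$\lim_{\lambda \to \lambda_0-} i(\lambda) \neq \lim_{\lambda \to \lambda_0+} i(\lambda),\tag{5.2.17}$$
then $(\lambda_0, o)$ is a bifurcation point of $f$.
Proof. Suppose not. Then there is a neighborhood $V = \tilde J \times \tilde U$ of $(\lambda_0, o)$ such that $(\lambda, o)$ are the only solutions of
$$f(\lambda, x) = o \quad\text{in } V.$$
This means that for any $\lambda \in \tilde J$ the index
$$i(\lambda) = \deg(I - h(\lambda, \cdot), \tilde U, o)$$
is defined and, by the general homotopy invariance property of the degree (Exercise 5.2.29), the index $i(\lambda)$ is constant, a contradiction to (5.2.17).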
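Proposition 5.2.20 can be illustrated with the simplest compact operators, linear ones in finite dimensions. For the hypothetical choice $h(\lambda, x) = \lambda Lx$ with a diagonal matrix $L = \operatorname{diag}(\mu_1, \dots, \mu_N)$, Definition 5.2.1 gives $i(\lambda) = \operatorname{sgn}\det(I - \lambda L) = \prod_j \operatorname{sgn}(1 - \lambda\mu_j)$ whenever $I - \lambda L$ is invertible. The Python sketch below approximates the one-sided limits in (5.2.17) by sampling $\lambda_0 \pm \varepsilon$ (an assumption: $\varepsilon$ must be smaller than the distance to the next value $1/\mu_j$). The index jumps at $\lambda_0 = 1/\mu$ for an eigenvalue $\mu$ of odd multiplicity but not for even multiplicity, where Proposition 5.2.20 gives no information even though, for this linear $h$, nontrivial solutions do occur at $\lambda_0$.

```python
def index_linear(lam, mus):
    """i(lambda) = deg(I - lambda*L, U, o) for L = diag(mus) on R^N.

    For this linear example the index is sgn det(I - lambda*L),
    i.e. the product of the signs of 1 - lambda*mu.
    """
    s = 1
    for mu in mus:
        v = 1 - lam * mu
        if v == 0:
            raise ValueError("o is not an isolated solution for this lambda")
        s *= 1 if v > 0 else -1
    return s


def index_jumps(mus, lam0, eps=1e-6):
    """True if the sampled one-sided limits of i(lambda) at lam0 differ, cf. (5.2.17)."""
    return index_linear(lam0 - eps, mus) != index_linear(lam0 + eps, mus)


mus = [1.0, 0.5, 0.5]          # hypothetical eigenvalues: 1 simple, 0.5 double
print(index_jumps(mus, 1.0))   # True:  lambda0 = 1/mu for a simple eigenvalue
print(index_jumps(mus, 2.0))   # False: lambda0 = 1/mu for a double eigenvalue
```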

The use of Proposition 5.2.20 is limited by the difficulty of computing the index $i(\lambda)$. The following classical result (Theorem 5.2.23) is based on a special form of $f$ which is often met in applications. For the proof we need two prerequisites which are of independent interest.
Proposition 5.2.21. Let $\Omega$ be an open set in a Banach space $X$ and let $F\in\mathcal{C}(\Omega, X)$. If the Fréchet derivative $F'(x_0)$ exists for an $x_0\in\Omega$, then $F'(x_0)$ is a (linear) compact operator.

Proof. If $F'(x_0)$ is not compact, then one can find $\varepsilon_0>0$ and a sequence $\{y_n\}_{n=1}^\infty\subset X$ such that $\|y_n\|\le1$ and
$$\|F'(x_0)y_k - F'(x_0)y_l\|\ge\varepsilon_0 \quad\text{for } k\neq l.$$
By the definition of the Fréchet derivative, there is $\delta>0$ such that
$$\|F(x_0+h) - F(x_0) - F'(x_0)h\|\le\frac{\varepsilon_0}{4}\|h\| \quad\text{provided } \|h\|<\delta.$$
Choose $\tau$, $0<\tau\le1$, such that
$$\|\tau y_k\|<\delta \quad\text{and}\quad x_0+\tau y_k\in\Omega \quad\text{for all } k\in\mathbb{N}.$$

5.2. Topological Degree


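Then
$$\|F(x_0+\tau y_k) - F(x_0+\tau y_l)\| \ge \|F'(x_0)(\tau y_k - \tau y_l)\| - \|F(x_0+\tau y_k) - F(x_0) - F'(x_0)\tau y_k\| - \|F(x_0+\tau y_l) - F(x_0) - F'(x_0)\tau y_l\| \ge \tau\varepsilon_0 - \frac{\tau\varepsilon_0}{4} - \frac{\tau\varepsilon_0}{4} = \frac{\tau\varepsilon_0}{2}$$
for $k\neq l$. Hence the image of the bounded sequence $\{x_0+\tau y_k\}_{k=1}^\infty\subset\Omega$ contains no convergent subsequence. But this means that $F$ is not compact on $\Omega$, a contradiction. □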


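Proposition 5.2.22 (Leray–Schauder Index Formula). Let $\Omega$ be an open bounded set in a Banach space $X$ and let $F\in\mathcal{C}(\overline\Omega, X)$. Let $x_0$ be a unique solution in $\Omega$ of the equation
$$x = F(x).$$
Assume that the Fréchet derivative $F'(x_0)$ exists and $I - F'(x_0)$ is continuously invertible. Then
$$\deg(I - F, \Omega, o) = (-1)^\beta \tag{5.2.18}$$
where
$$\beta = \sum_{\substack{\lambda\in\sigma(F'(x_0))\cap\mathbb{R}\\ \lambda>1}} m(\lambda)$$
and $m(\lambda)$ is the multiplicity$^{30}$ of the eigenvalue $\lambda$ of the operator $F'(x_0)$.

Proof. First we recall that $F'(x_0)$ is a compact operator (Proposition 5.2.21) and, therefore, $\beta$ is a finite number (Corollary 2.2.13). Choose such a small ball $B(o;\delta)$ that $x_0+\overline{B(o;\delta)}\subset\Omega$, and put
$$H(t,y) = \frac{F(x_0+ty) - F(x_0)}{t}, \quad t\in(0,1],\ y\in\overline{B(o;\delta)},$$
and
$$H(0,y) = F'(x_0)y, \quad y\in\overline{B(o;\delta)}.$$
The ball $B(o;\delta)$ can be chosen such that the equation
$$y = H(t,y)$$
has a unique solution in $\overline{B(o;\delta)}$, namely $y = o$. Indeed, let $c>0$ be such that $\|y - F'(x_0)y\|\ge c\|y\|$ for all $y\in X$ (such a $c$ exists since $I - F'(x_0)$ is continuously invertible). For $t\in(0,1]$ we have
$$H(t,y) - y = \frac1t\left[F(x_0+ty) - F(x_0) - F'(x_0)(ty)\right] + F'(x_0)y - y,$$
and hence
$$\|H(t,y) - y\| \ge \|F'(x_0)y - y\| - \frac1t\left\|F(x_0+ty) - F(x_0) - F'(x_0)(ty)\right\| \ge c\|y\| - \frac c2\|y\| > 0 \quad\text{for } y\neq o,$$
provided $\|y\|$ is small enough (by the definition of $F'(x_0)$ and the assumption on $I - F'(x_0)$). For $t = 0$ the estimate $\|H(0,y) - y\|\ge c\|y\|$ is immediate.
30 For the definition of multiplicity see footnote 23 on page 84.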


By the homotopy invariance property (in a more general setting see Exercise 5.2.29), $\deg(I - H(t,\cdot), B(o;\delta), o)$ is constant on the interval $[0,1]$. In particular,
$$\deg(I - F, \Omega, o) = \deg(I - F'(x_0), B(o;\delta), o).$$
Put
$$X_1 := \bigoplus_{\substack{\lambda\in\sigma(F'(x_0))\cap\mathbb{R}\\ \lambda>1}}\ \bigcup_{p=1}^\infty \operatorname{Ker}[\lambda I - F'(x_0)]^p.$$
As we have mentioned above, $\dim X_1 = \beta<\infty$. Moreover, there exists a topological complement $X_2$ to $X_1$ in $X$ which is $F'(x_0)$-invariant (see the decomposition (2.2.4)). This decomposition of $X$ allows us to use the Product Formula for the degree (Exercise 5.2.28) provided balls $B_i\subset X_i$, $i = 1, 2$, are chosen such that $B_1\times B_2\subset B(o;\delta)$. Hence we obtain
$$\deg(I - F'(x_0), B(o;\delta), o) = \deg(F_1, B_1, o)\cdot\deg(F_2, B_2, o)$$
where $F_i$ denotes the restriction of $I - F'(x_0)$ to $X_i$, $i = 1, 2$. To compute $\deg(F_2, B_2, o)$ we introduce the homotopy
$$H_2(t,y) = y - tF'(x_0)y, \quad t\in[0,1],\ y\in X_2.$$
Assume that $H_2(t,y) = o$ for a $y\neq o$. Then $t\in(0,1)$ and $\frac1t\in\sigma(F'(x_0))$ and, by the definition of $X_1$, $y\in X_1$. Since $X_1\cap X_2 = \{o\}$, we arrive at a contradiction. This consideration shows that we may apply the homotopy invariance property to $H_2$ to get
$$\deg(F_2, B_2, o) = \deg(I, B_2, o) = 1.$$
The degree $\deg(F_1, B_1, o)$ is the Brouwer degree of the linear operator $F_1$ in the finite dimensional space $X_1$, and it was computed in Example 5.2.10. Notice that $A$ is here $(I - F'(x_0))|_{X_1}$, and thus
$$\{\mu\in\sigma(A) : \mu<0\} = \{1-\lambda : \lambda\in\sigma(F'(x_0)),\ \lambda>1\},$$
the multiplicities being preserved. This shows that
$$\deg(F_1, B_1, o) = (-1)^\beta. \qquad\square$$
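In finite dimension the formula (5.2.18) can be checked directly: for a matrix $L$ with $I - L$ invertible, the Brouwer degree $\deg(I - L, B(o;\delta), o)$ is the sign of $\det(I - L) = \prod_i(1 - \lambda_i)$. Complex conjugate pairs contribute $|1 - \lambda_i|^2 > 0$, so only the real eigenvalues $\lambda_i > 1$ produce negative factors. The Python sketch below (assuming NumPy is available; the helper names and the random test matrices are illustrative, not taken from the text) compares $(-1)^\beta$ with $\operatorname{sign}\det(I - L)$.

```python
import numpy as np

def index_by_formula(L, tol=1e-9):
    """(-1)**beta, where beta counts the real eigenvalues of L greater than 1
    with algebraic multiplicity (numpy.linalg.eigvals lists each eigenvalue
    as many times as its algebraic multiplicity)."""
    eigs = np.linalg.eigvals(L)
    beta = sum(1 for mu in eigs if abs(mu.imag) < tol and mu.real > 1.0)
    return (-1) ** beta

def index_by_determinant(L):
    """Brouwer degree of the linear isomorphism I - L at o, i.e. sign det(I - L)."""
    n = L.shape[0]
    return int(np.sign(np.linalg.det(np.eye(n) - L)))

rng = np.random.default_rng(0)
checked = 0
for _ in range(200):
    L = 1.5 * rng.standard_normal((5, 5))
    if abs(np.linalg.det(np.eye(5) - L)) < 1e-6:
        continue                      # I - L (numerically) singular: degree undefined
    assert index_by_formula(L) == index_by_determinant(L)
    checked += 1
print(checked, "random matrices checked")
```

For the Jordan block $\begin{pmatrix}2&1\\0&2\end{pmatrix}$ the eigenvalue $2$ has geometric multiplicity 1 but algebraic multiplicity 2, and `index_by_formula` returns $+1$, in agreement with $\det(I - L) = 1$; this is why the algebraic multiplicity enters $\beta$.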

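Theorem 5.2.23 (Krasnoselski Local Bifurcation Theorem). Let $U$ be a neighborhood of $o$ in a Banach space $X$, and let
$$f(\lambda, x) = x - \lambda Ax - G(\lambda, x), \quad \lambda\in J,\ x\in U,$$
where $J$ is an open interval in $\mathbb{R}$, $A$ is a linear compact operator on $X$, $G(\lambda,\cdot)\colon\overline U\to X$ is a compact operator and
$$\lim_{x\to o}\frac{\|G(\lambda,x)\|}{\|x\|} = 0 \quad\text{for all } \lambda\in J.$$
If $\lambda_0\in J$ is such that $\frac{1}{\lambda_0}$ is an eigenvalue of $A$ of odd multiplicity, then $(\lambda_0, o)$ is a bifurcation point of $f$.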


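Proof. Suppose that $(\lambda_0, o)$ is not a bifurcation point. Then there is a neighborhood $\tilde J\times V$ of $(\lambda_0, o)$ such that the equation
$$f(\lambda, x) = o$$
has for every $\lambda\in\tilde J$ a unique solution in $V$, namely $x = o$. We may assume that $0\notin\tilde J$, $\left\{\lambda : \frac1\lambda\in\sigma(A)\right\}\cap\tilde J = \{\lambda_0\}$, and also that $\lambda_0>0$ (for $\lambda_0<0$ consider $\tilde f(\lambda, x) = f(-\lambda, x)$). For $\lambda\in\tilde J\setminus\{\lambda_0\}$ the degree $\deg(f(\lambda,\cdot), V, o)$ is defined and, since the Fréchet derivative of $\lambda A + G(\lambda,\cdot)$ at $o$ is $\lambda A$, it is given by the Leray–Schauder Index Formula (Proposition 5.2.22):
$$\deg(f(\lambda,\cdot), V, o) = (-1)^{\beta(\lambda)} \quad\text{where}\quad \beta(\lambda) = \sum_{\substack{\mu\in\sigma(\lambda A)\cap\mathbb{R}\\ \mu>1}} m(\mu) = \sum_{\substack{\mu\in\sigma(A)\cap\mathbb{R}\\ \mu>\frac1\lambda}} m(\mu).$$
As $\lambda$ increases through $\lambda_0$, the number $\frac1\lambda$ decreases through $\frac1{\lambda_0}$, so $\beta(\lambda)$ jumps by exactly $m\left(\frac1{\lambda_0}\right)$. Since $m\left(\frac1{\lambda_0}\right)$ is odd, the degree $\deg(f(\lambda,\cdot), V, o)$ changes sign at $\lambda_0$. A contradiction now follows from Proposition 5.2.20. □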
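For $G = o$ and $X = \mathbb{R}^N$ with a diagonal matrix $A$, the index $i(\lambda) = \deg(I - \lambda A, B(o;r), o)$ is the sign of $\det(I - \lambda A) = \prod_i(1 - \lambda\mu_i)$, so the mechanism of the proof can be watched directly. The Python sketch below (standard library only; the eigenvalue list and helper names are hypothetical) shows that the index jumps at $\lambda_0$ when $\frac1{\lambda_0}$ is a simple eigenvalue, but not when it is a double one, in which case Proposition 5.2.20 gives no information.

```python
def linear_index(lam, eigenvalues):
    """i(lam) = deg(I - lam*A, B(o; r), o) for A = diag(eigenvalues) in R^N,
    computed as the sign of det(I - lam*A) = prod(1 - lam*mu)."""
    sign = 1
    for mu in eigenvalues:
        factor = 1.0 - lam * mu
        if factor == 0.0:
            raise ValueError("1/lam is an eigenvalue of A: the index is undefined")
        if factor < 0.0:
            sign = -sign
    return sign

def index_jumps(lam0, eigenvalues, eps=1e-3):
    """True if i(lam) differs just to the left and just to the right of lam0."""
    return linear_index(lam0 - eps, eigenvalues) != linear_index(lam0 + eps, eigenvalues)

eigs = [1.0, 0.25, 0.25]        # hypothetical A = diag(1, 1/4, 1/4)
print(index_jumps(1.0, eigs))   # True:  1/lam0 = 1 has multiplicity 1 (odd)
print(index_jumps(4.0, eigs))   # False: 1/lam0 = 1/4 has multiplicity 2 (even)
```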

The Krasnoselski Theorem 5.2.23 is of a local nature and does not say anything about the global behavior of a branch of nontrivial solutions of the equation
$$f(\lambda, x) = o.$$
The so-called global bifurcation theorems describe these branches (see Appendix 5.2A). The interested reader can also consult, e.g., Rabinowitz [103, pages 11–36], Ize [69], Nirenberg [100, Chapter 3], Krasnoselski & Zabreiko [79], Krawcewicz & Wu [80]. There are also methods depending on other topological tools. See, e.g., Alexander [3, pages 457–483] or Fitzpatrick [50] and references given there.
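Remark 5.2.24 (Comparison of Theorems 4.3.22 and 5.2.23). Let $\frac1{\lambda_0}$ be an eigenvalue of $A$ and
$$\dim\operatorname{Ker}\left(\frac1{\lambda_0}I - A\right) = 1.$$
Denote by $x_0$, $\|x_0\| = 1$, an eigenvector of $A$ associated with $\frac1{\lambda_0}$. Let us compare the assumptions of Theorems 4.3.22 and 5.2.23. One of the essential differences consists in the smoothness assumptions: while Theorem 4.3.22 applied to
$$f(\lambda, x) = x - \lambda Ax - G(\lambda, x)$$
requires $G$ to be a $C^2$-mapping, Theorem 5.2.23 demands $G$ compact (and so continuous). The assumption $G(\lambda, x) = o(\|x\|)$, $x\to o$, yields
$$f_2'(\lambda_0, o) = I - \lambda_0A, \qquad f_{1,2}''(\lambda_0, o) = -A. \tag{5.2.19}$$
Theorem 4.3.22 requires
$$\operatorname{Im}(I - \lambda_0A) = \overline{\operatorname{Im}(I - \lambda_0A)}, \tag{5.2.20}$$
$$\operatorname{codim}\operatorname{Im}(I - \lambda_0A) = 1, \tag{5.2.21}$$
$$f_{1,2}''(\lambda_0, o)(1, x_0)\notin\operatorname{Im}(I - \lambda_0A). \tag{5.2.22}$$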


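The compactness of $A$ (Theorem 5.2.23) implies that $I - \lambda_0A$ is a Fredholm operator of index 0; hence (5.2.20) holds and also
$$\operatorname{codim}\operatorname{Im}(I - \lambda_0A) = \dim\operatorname{Ker}(I - \lambda_0A) = 1,$$
i.e., (5.2.21) follows. The last assumption is closely connected with the multiplicity $m\left(\frac1{\lambda_0}\right)$ of $\frac1{\lambda_0}$, as follows from the assertion:
The assumption (5.2.22) is verified if and only if
$$m\left(\frac1{\lambda_0}\right) = \dim\bigcup_{k=1}^\infty\operatorname{Ker}(I - \lambda_0A)^k = 1. \tag{5.2.23}$$
First, let us prove (5.2.23) $\Rightarrow$ (5.2.22). Assume the contrary:
$$f_{1,2}''(\lambda_0, o)(1, x_0)\in\operatorname{Im}(I - \lambda_0A).$$
According to (5.2.19) it means $Ax_0\in\operatorname{Im}(I - \lambda_0A)$. Since $x_0 = \lambda_0Ax_0$, we have $x_0\in\operatorname{Im}(I - \lambda_0A)$ as well. Then there exists $w\in X$ such that $x_0 = w - \lambda_0Aw$. But
$$x_0\in\operatorname{Ker}(I - \lambda_0A), \quad\text{i.e.,}\quad w\in\operatorname{Ker}(I - \lambda_0A)^2.$$
Since $x_0\neq o$, we have $w\notin\operatorname{Ker}(I - \lambda_0A)$, which implies
$$\dim\operatorname{Ker}(I - \lambda_0A)^2 > \dim\operatorname{Ker}(I - \lambda_0A),$$
i.e.,
$$m\left(\frac1{\lambda_0}\right) > 1,$$
a contradiction.
Now, let us prove (5.2.22) $\Rightarrow$ (5.2.23). Take
$$w\in\operatorname{Ker}(I - \lambda_0A)^2 \quad\text{and set}\quad u = (I - \lambda_0A)w.$$
Then $(I - \lambda_0A)u = (I - \lambda_0A)^2w = o$, which implies $u\in\operatorname{Ker}(I - \lambda_0A)$. Since $\operatorname{Ker}(I - \lambda_0A)$ is generated by $x_0$, there exists $a\in\mathbb{R}$ such that $u = ax_0$. Simultaneously,
$$u = (I - \lambda_0A)w\in\operatorname{Im}(I - \lambda_0A).$$
For $a\neq0$ we have
$$Ax_0 = \frac1{\lambda_0}x_0 = \frac{1}{\lambda_0a}u\in\operatorname{Im}(I - \lambda_0A),$$
a contradiction with (5.2.19) and (5.2.22). Hence
$$a = 0 \quad\text{and}\quad u = (I - \lambda_0A)w = o,$$
i.e., $w\in\operatorname{Ker}(I - \lambda_0A)$. This proves
$$\operatorname{Ker}(I - \lambda_0A)^2\subset\operatorname{Ker}(I - \lambda_0A).$$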


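Since the opposite inclusion is evident, we have proved
$$\operatorname{Ker}(I - \lambda_0A)^2 = \operatorname{Ker}(I - \lambda_0A).$$
By induction on the power $n$ we now easily prove that
$$\operatorname{Ker}(I - \lambda_0A)^{n+1} = \operatorname{Ker}(I - \lambda_0A)^n \quad\text{for any } n\in\mathbb{N}$$
(do it in detail!).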
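In finite dimension both sides of the equivalence (5.2.22) $\Leftrightarrow$ (5.2.23) can be tested numerically. The Python sketch below (NumPy assumed; the helper names and the two $2\times2$ matrices are hypothetical examples, not from the text) computes $m\left(\frac1{\lambda_0}\right)$ as the stabilized value of $\dim\operatorname{Ker}(I - \lambda_0A)^k$ and checks $Ax_0\notin\operatorname{Im}(I - \lambda_0A)$ by comparing matrix ranks. For the diagonal matrix both properties hold, for the Jordan block both fail, as the assertion above predicts. Rank decisions rely on NumPy's default floating-point tolerance, which is adequate for these small exact examples.

```python
import numpy as np

def algebraic_multiplicity(A, lam0):
    """Stabilized value of dim Ker (I - lam0*A)^k, k = 1, 2, ..., i.e. the
    dimension of the union of these kernels (the multiplicity m(1/lam0))."""
    n = A.shape[0]
    B = np.eye(n) - lam0 * A
    power = np.eye(n)
    previous = -1
    for _ in range(n):                      # the kernels stabilize after at most n steps
        power = power @ B
        dim_ker = n - np.linalg.matrix_rank(power)
        if dim_ker == previous:
            break
        previous = dim_ker
    return previous

def transversality_holds(A, lam0, x0):
    """Finite dimensional version of (5.2.22): A x0 does not lie in Im(I - lam0*A),
    i.e. appending A x0 as an extra column raises the rank of I - lam0*A."""
    n = A.shape[0]
    B = np.eye(n) - lam0 * A
    return np.linalg.matrix_rank(np.column_stack([B, A @ x0])) > np.linalg.matrix_rank(B)

x0 = np.array([1.0, 0.0])                   # eigenvector for the eigenvalue 1/lam0 = 1
A_diag = np.array([[1.0, 0.0], [0.0, 0.5]])
A_jordan = np.array([[1.0, 1.0], [0.0, 1.0]])
print(algebraic_multiplicity(A_diag, 1.0), transversality_holds(A_diag, 1.0, x0))      # 1 True
print(algebraic_multiplicity(A_jordan, 1.0), transversality_holds(A_jordan, 1.0, x0))  # 2 False
```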
Exercise 5.2.25. Prove the following assertion:
Let $f$, $A$ and $G$ be as in Theorem 5.2.23. Let $\lambda_0\neq0$ be a bifurcation point of $f$. Then $\frac1{\lambda_0}$ is an eigenvalue of $A$.
Hint. If $\lambda_0$ is a bifurcation point of $f$, there are
$$o\neq x_n\to o, \quad \lambda_n\to\lambda_0, \quad f(\lambda_n, x_n) = o.$$
Set $v_n := \frac{x_n}{\|x_n\|}$. Then
$$v_n = \lambda_nAv_n + \frac{G(\lambda_n, x_n)}{\|x_n\|}. \tag{5.2.24}$$
Since $\{v_n\}_{n=1}^\infty$ is bounded and $A$ is compact, passing to a subsequence if necessary we may assume that $v_n\to v$ for a $v\in X$, $v\neq o$. From (5.2.24) we obtain that
$$v = \lambda_0Av.$$
Exercise 5.2.26. Let $F\in\mathcal{C}(\overline\Omega, X)$ where $\Omega\subset X$ is an open, bounded, nonempty set in a Banach space $X$ which is symmetric with respect to $o\in\Omega$, and $x - F(x)\neq o$ for all $x\in\partial\Omega$. Assume that
$$F(-x) = -F(x) \quad\text{for any } x\in\overline\Omega.$$
Then the Leray–Schauder degree
$$\deg(I - F, \Omega, o)$$
is an odd number.
Hint. Use finite dimensional approximations as in the construction of the Leray–Schauder degree (see pages 277–278) and Theorem 4.3.130.
Exercise 5.2.27. Modify the proof of (5.2.9) to obtain the so-called Product Formula:
$$\deg(g, \Omega, y_0) = \deg(f_1, \Omega_1, y_{1,0})\cdot\deg(f_2, \Omega_2, y_{2,0})$$
where $g = (f_1, f_2)\colon\overline\Omega\to\mathbb{R}^{N_1+N_2}$, $\Omega = \Omega_1\times\Omega_2$, $y_0 = (y_{1,0}, y_{2,0})$, $\Omega_i\subset\mathbb{R}^{N_i}$, $f_i\in C(\overline{\Omega_i}, \mathbb{R}^{N_i})$, $y_{i,0}\notin f_i(\partial\Omega_i)$, $i = 1, 2$.
Exercise 5.2.28. By repeating the construction of the Leray–Schauder degree show that the Product Formula and the boundary dependence (Theorem 4.3.124(vii)) of the degree also hold for the Leray–Schauder degree.


Exercise 5.2.29. Prove the following general homotopy invariance property:
Let $\Omega$ be an open bounded set in a Banach space $X$ and assume that $h = h(t, x)\in\mathcal{C}([0,1]\times\overline\Omega, X)$ and
$$x - h(t, x)\neq y_0 \quad\text{for every } x\in\partial\Omega \text{ and } t\in[0,1].$$
Then
$$\deg(I - h(t,\cdot), \Omega, y_0)$$
is constant with respect to $t\in[0,1]$.
The following two exercises use an idea similar to that of Example 5.2.14.
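Exercise 5.2.30. Let $H$ be a Hilbert space and $F$ a compact operator from $\overline\Omega$ into $H$, where $\Omega\subset H$ is a bounded open set. Assume that $o\in\Omega$ and
$$(F(x), x)\le\|x\|^2 \quad\text{for each } x\in\partial\Omega.$$
Prove that $F$ has a fixed point in $\overline\Omega$.
Hint. Suppose not and show that there are $t_0\in[0,1]$, $x_0\in\partial\Omega$ such that
$$x_0 = t_0F(x_0).$$
By assumption, $t_0\neq1$.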
Exercise 5.2.31. Let $F$ be a compact operator from the closed unit ball $\overline{B(o;1)}$ of a Banach space $X$ into $X$ and, moreover, let
$$\|x - F(x)\|^2\ge\|F(x)\|^2 - \|x\|^2 \quad\text{for } x\in\partial B(o;1).$$
Prove that $F$ has a fixed point in $\overline{B(o;1)}$.


Exercise 5.2.32. Let $f$ be continuous and satisfy the following growth condition: there are $K>0$ and $0<\alpha<1$ such that the inequality
$$|f(t, x, y)|\le K(1 + |x|^\alpha + |y|^\alpha)$$
holds for $t\in[0,1]$, $x, y\in\mathbb{R}$. Then the boundary value problem (5.2.10) has a solution. Prove it!
Hint. Proceed similarly to Example 5.2.16. Use the equation to estimate $\ddot x$ and compute $\dot x$ with the help of a special form of the kernel $G$.
Exercise 5.2.33. Apply Theorem 5.2.23 to the Dirichlet boundary value problem
$$\ddot x(t) + \lambda x(t) + g(\lambda, t, x(t)) = 0, \quad t\in(0,\pi), \qquad x(0) = x(\pi) = 0,$$
and show that every point $(k^2, o)$, $k = 1, 2, \dots$, is a bifurcation point.
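The points $(k^2, o)$ come from the linearization $\ddot x + \lambda x = 0$, $x(0) = x(\pi) = 0$, which has the nontrivial solutions $\sin kt$ exactly for $\lambda = k^2$, each with a one-dimensional eigenspace. As a numerical sanity check (not part of the exercise), the Python sketch below, assuming NumPy, approximates these values with the standard three-point finite-difference matrix for $-\ddot x$; the grid size $N$ and the helper name are arbitrary choices.

```python
import numpy as np

def dirichlet_eigenvalues(N=200, count=4):
    """Smallest `count` eigenvalues of the finite-difference matrix for -x''
    on (0, pi) with x(0) = x(pi) = 0 (N interior grid points, step h = pi/(N+1))."""
    h = np.pi / (N + 1)
    main = np.full(N, 2.0 / h**2)
    off = np.full(N - 1, -1.0 / h**2)
    L = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(L)[:count]   # eigvalsh returns eigenvalues in ascending order

print(dirichlet_eigenvalues())  # approximately 1, 4, 9, 16
```

The discrete values slightly underestimate $k^2$ (the error is of order $k^4h^2/12$), which is why the comparison is only approximate.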


5.2A Global Bifurcation Theorem


In this appendix we study the bifurcation equation
$$f(\lambda, x) := x - \lambda Ax - G(\lambda, x) = o. \tag{5.2.25}$$
The following result is due to Rabinowitz [103, pp. 11–36], Rabinowitz [104].
Theorem 5.2.34 (Rabinowitz Global Bifurcation Theorem). Let $X$ be a Banach space, $\Omega$ an open set in $\mathbb{R}\times X$, $(\lambda_0, o)\in\Omega$, $\lambda_0\neq0$. Let us assume:
$$A \text{ is a compact linear operator from } X \text{ into } X, \tag{5.2.26}$$
$$G \text{ is a compact (nonlinear) operator from } \overline\Omega \text{ into } X, \tag{5.2.27}$$
$$\text{for any bounded set } M\subset\{\lambda\in\mathbb{R} : (\lambda, o)\in\Omega\} \text{ we have } G(\lambda, x) = o(\|x\|),\ x\to o, \text{ uniformly for } \lambda\in M, \tag{5.2.28}$$
$$\frac1{\lambda_0} \text{ is an eigenvalue of } A \text{ of odd multiplicity.} \tag{5.2.29}$$
Denote by $S$ the closure of all solutions of (5.2.25) with $x\neq o$, i.e.,
$$S = \overline{\{(\lambda, x)\in\Omega : x\neq o,\ f(\lambda, x) = o\}}.$$
Then $S$ contains the point $(\lambda_0, o)$.$^{31}$ Let $C$ be a component of $S$ which contains $(\lambda_0, o)$. Then at least one of the following assertions holds:
(i) $C$ is not a compact set in $\Omega$.
(ii) $C$ contains an even number of points $(\lambda, o)$ where $\frac1\lambda$ is an eigenvalue of $A$ of odd multiplicity.

Proof. We shall follow the proof of Ize [69]. The idea is the following. We will assume that $C$ is compact, and prove that it contains an even number of points described in (ii). Since $C$ is compact, it contains only a finite number of points $(\lambda, o)$ where $\lambda\neq0$ and $\frac1\lambda$ is an eigenvalue of the compact operator $A$ (see Figure 5.2.5). We shall denote them by
$$(\lambda_0, o), \dots, (\lambda_{k-1}, o).$$
Since $C$ is a component of $S$ in $\Omega$ and $S$ is closed, there exists an open bounded set $\tilde\Omega$ such that $C\subset\tilde\Omega$ and $S\cap\partial\tilde\Omega = \emptyset$. We prove that $\tilde\Omega$ can be chosen in such a way that
$$(\lambda_j, o)\in\tilde\Omega, \quad j = 0, 1, \dots, k-1, \quad\text{but}\quad (\lambda, o)\notin\overline{\tilde\Omega} \quad\text{for } \frac1\lambda\in\sigma(A),\ \lambda\neq\lambda_j,\ j = 0, 1, \dots, k-1$$
(see Figure 5.2.6). Indeed, let $U$ be an $\eta$-neighborhood of $C$ such that $\overline U\setminus C$ does not contain any point $(\lambda, o)$ with $\lambda\neq0$, $\frac1\lambda\in\sigma(A)$. The set $K = \overline U\cap S$ is then compact,$^{32}$ and obviously $C\cap(\partial U\cap S) = \emptyset$. By Deimling [34, Lemma 29.1] there exist compact disjoint sets $K_1, K_2\subset K$ such that
$$K = K_1\cup K_2, \quad C\subset K_1, \quad \partial U\cap S\subset K_2.$$
31 I.e., $(\lambda_0, o)$ is a bifurcation point in the sense of Definition 4.3.21.
32 The reader is invited to prove it using the compactness of $A$ and $G$.


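Figure 5.2.5. (The component $C$ of $S$ containing $(\lambda_0, o)$.)

Figure 5.2.6. (Sketch in $\mathbb{R}\times X$ showing $K_1$, $K_2$, $\partial U\cap S$, the neighborhood $U$, and the sets $U_0, \dots, U_3$ around the points $(\lambda_j, o)$.)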

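Hence $\tilde\Omega$ can be chosen as an $\varepsilon_0$-neighborhood of $K_1$ with
$$0<\varepsilon_0<\min\{\operatorname{dist}(K_1, K_2), \operatorname{dist}(K_1, \partial U), \eta\}.$$
For any $r>0$ define $f_r\colon\overline{\tilde\Omega}\to\mathbb{R}\times X$ as
$$f_r(\lambda, x) = (\|x\|^2 - r^2, f(\lambda, x)). \tag{5.2.30}$$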


Then obviously
$$f_r(\lambda, x) = o \iff f(\lambda, x) = o \ \text{ and } \ \|x\| = r.$$
(In other words, the function $f_r$ picks out the solutions of $f(\lambda, x) = o$ which belong to the sphere $\|x\| = r$.) Then, thanks to the choice of $\tilde\Omega$ and the homotopy invariance property of the degree (Theorem 5.2.13(vi)), we conclude that
$$\deg(f_r, \tilde\Omega, o)$$
is well defined and independent of $r>0$.
The rest of the proof consists in the calculation of this degree for sufficiently large $r$ and for sufficiently small $r$.
Step 1 (sufficiently large $r$). The boundedness of $\tilde\Omega$ implies that there exists $R_0>0$ such that $\|x\|<R_0$ for any $(\lambda, x)\in\tilde\Omega$. Then for $r>R_0$ the equation
$$f_r(\lambda, x) = o$$
has no solution in $\overline{\tilde\Omega}$, and so, according to Theorem 5.2.13(iii), we have
$$\deg(f_r, \tilde\Omega, o) = 0.$$
Step 2 (sufficiently small $r$). For $j = 0, 1, \dots, k-1$ set
$$U_j(\delta, r) := \{(\lambda, x) : \|x\|^2 + |\lambda - \lambda_j|^2 < r^2 + \delta^2\},$$
and choose $\delta>0$ so small that the sets $U_j(\delta, \delta)$ are pairwise disjoint, all belong to $\tilde\Omega$, and do not contain $(0, o)$ (see Figure 5.2.6).
We prove first that there exists $r>0$ ($r\le\delta$) such that
$$x - \lambda Ax - tG(\lambda, x)\neq o \tag{5.2.31}$$
for all $t\in[0,1]$, $(\lambda, x)\in\overline{\tilde\Omega}$, $0<\|x\|\le r$, $|\lambda - \lambda_j|\ge\delta$, $j = 0, 1, \dots, k-1$. Indeed, assume by contradiction that such an $r>0$ does not exist. Then there exist $t_n\in[0,1]$ and $(\lambda_n, x_n)\in\overline{\tilde\Omega}$, $n\in\mathbb{N}$, $o\neq x_n\to o$, $|\lambda_n - \lambda_j|\ge\delta$, $j = 0, 1, \dots, k-1$, not satisfying (5.2.31), i.e.,
$$x_n - \lambda_nAx_n - t_nG(\lambda_n, x_n) = o. \tag{5.2.32}$$
We can assume, without loss of generality, that $\lambda_n\to\bar\lambda$. It follows from the construction of $\tilde\Omega$ that $\frac1{\bar\lambda}\notin\sigma(A)$. On the other hand, it follows from (5.2.32) that (setting $y_n = \frac{x_n}{\|x_n\|}$)
$$y_n - \lambda_nAy_n - t_n\frac{G(\lambda_n, x_n)}{\|x_n\|} = o. \tag{5.2.33}$$
Now, the compactness of $A$ and (5.2.28) imply that for a $y\neq o$ ($y_{n_k}\to y$ for a subsequence) we have
$$y - \bar\lambda Ay = o,$$
a contradiction.
We shall write $U_j = U_j(\delta, r)$ for simplicity. It follows from Theorem 5.2.13(iv) that
$$\deg(f_r, \tilde\Omega, o) = \sum_{j=0}^{k-1}\deg(f_r, U_j, o). \tag{5.2.34}$$


Let $j$ be fixed. It follows from the choice of $\delta>0$ that for $0<|\lambda - \lambda_j|\le\delta$ we have
$$\frac1\lambda\notin\sigma(A).$$
Then for any such $\lambda$ the degree
$$\deg(I - \lambda A, B(o; r), o)$$
is well defined. Moreover, the homotopy invariance property of the degree implies that it is locally constant with respect to $\lambda$. Denote
$$i_j^- = \deg(I - (\lambda_j - \delta)A, B(o; r), o), \qquad i_j^+ = \deg(I - (\lambda_j + \delta)A, B(o; r), o).$$
It follows from Lemma 5.2.35 below that
$$\deg(f_r, U_j, o) = i_j^- - i_j^+. \tag{5.2.35}$$
If $m_j$ is the multiplicity of $\frac1{\lambda_j}$, then Proposition 5.2.22 yields
$$i_j^+ = (-1)^{m_j}i_j^-.$$
Hence for $m_j$ even we obtain
$$\deg(f_r, U_j, o) = i_j^- - i_j^+ = 0, \tag{5.2.36}$$
while for $m_j$ odd we have
$$\deg(f_r, U_j, o) = 2i_j^-. \tag{5.2.37}$$
It follows from (5.2.34)–(5.2.37) that
$$\deg(f_r, \tilde\Omega, o) = 2\sum_{\substack{j=0\\ m_j \text{ odd}}}^{k-1} i_j^-.$$
Since this degree is independent of $r$, it must be equal to zero (see Step 1 of this proof). As each $i_j^-$ equals $\pm1$, there must be an even number of eigenvalues of odd algebraic multiplicity among
$$\frac1{\lambda_0}, \dots, \frac1{\lambda_{k-1}}. \qquad\square$$
Now we prove an analogue of the Leray–Schauder Index Formula (see Proposition 5.2.22).
Lemma 5.2.35. Let $f_r$, $U_j$, $i_j^-$, $i_j^+$ be as above. Then
$$\deg(f_r, U_j, o) = i_j^- - i_j^+.$$
Proof. We will connect $f_r$ with a simpler mapping using a suitable homotopy. Let us define this homotopy in the following way: for $t\in[0,1]$ let
$$f_{t,r}\colon\overline{U_j}\to\mathbb{R}\times X, \quad f_{t,r}(\lambda, x) = (\alpha_t, y_t),$$
where
$$\alpha_t = t(\|x\|^2 - r^2) + (1-t)(\delta^2 - (\lambda - \lambda_j)^2), \qquad y_t = x - \lambda Ax - tG(\lambda, x).$$
We prove that for any $t\in[0,1]$
$$o\notin f_{t,r}(\partial U_j).$$


Assume the contrary, i.e., there exist $t\in[0,1]$ and $(\lambda, x)\in\partial U_j$ such that
$$f_{t,r}(\lambda, x) = o.$$
The fact that $(\lambda, x)\in\partial U_j$ implies
$$\|x\|^2 + (\lambda - \lambda_j)^2 = r^2 + \delta^2.$$
At the same time, from
$$0 = \alpha_t = t\left(\|x\|^2 + (\lambda - \lambda_j)^2\right) - t(r^2 + \delta^2) + \delta^2 - (\lambda - \lambda_j)^2$$
we obtain $|\lambda - \lambda_j| = \delta$, and so $\|x\| = r$. This together with $y_t = o$ contradicts (5.2.31).
The homotopy invariance property of the degree then implies
$$\deg(f_r, U_j, o) = \deg(f_{0,r}, U_j, o).$$
The mapping $f_{0,r}$ is now easier to deal with. Indeed, the point $o$ has two preimages, $(\lambda_j - \delta, o)$ and $(\lambda_j + \delta, o)$, with respect to the mapping
$$f_{0,r}(\lambda, x) = (\delta^2 - (\lambda - \lambda_j)^2, x - \lambda Ax).$$
At both points the Fréchet differential $f_{0,r}'$ is injective:
$$f_{0,r}'(\lambda, o)(\mu, u) = (-2(\lambda - \lambda_j)\mu, u - \lambda Au).$$
Let us choose sufficiently small neighborhoods of the points $(\lambda_j\pm\delta, o)$ in the following way: let $V^\pm$ be small neighborhoods of the points $\lambda_j\pm\delta$ and let $U$ be a small neighborhood of $o$ in $X$ such that
$$U^\pm := V^\pm\times U\subset U_j.$$
We have
$$\deg(f_{0,r}, U_j, o) = \deg(f_{0,r}, U^-, o) + \deg(f_{0,r}, U^+, o)$$
and, by the Product Formula (Exercise 5.2.28) and Proposition 5.2.22,
$$\deg(f_{0,r}, U^-, o) = \deg(I - (\lambda_j - \delta)A, U, o)\cdot\deg(\delta^2 - (\cdot - \lambda_j)^2, V^-, 0) = i_j^-\cdot1.$$
Similarly, we get
$$\deg(f_{0,r}, U^+, o) = i_j^+\cdot(-1).$$
This completes the proof. □
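For $G = o$ and $X = \mathbb{R}^N$ the count in Lemma 5.2.35 reduces to a sum of Jacobian signs. At $(\lambda, o)$ the Jacobian matrix of $f_{0,r}$ is block triangular with determinant $-2(\lambda - \lambda_j)\det(I - \lambda A)$, so the two preimages contribute $i_j^-$ and $-i_j^+$. The Python sketch below (NumPy assumed; the diagonal matrix, the value of $\delta$ and the helper name are hypothetical test data) computes this sum and reproduces (5.2.36) and (5.2.37): $0$ for an eigenvalue of even multiplicity and $\pm2$ for an eigenvalue of odd multiplicity.

```python
import numpy as np

def local_degree_f0r(A, lam_j, delta):
    """deg(f_{0,r}, U_j, o) in R x R^N for f_{0,r}(lam, x) = (delta^2 - (lam - lam_j)^2, x - lam*A x),
    as the sum of the signs of the Jacobian determinants at the zeros (lam_j -+ delta, o).
    Assumes 1/(lam_j -+ delta) is not an eigenvalue of A (delta small enough)."""
    n = A.shape[0]
    total = 0
    for lam in (lam_j - delta, lam_j + delta):
        # d/d lam of the first component is -2(lam - lam_j); the x-block is I - lam*A
        jacobian_det = -2.0 * (lam - lam_j) * np.linalg.det(np.eye(n) - lam * A)
        total += int(np.sign(jacobian_det))
    return total

A = np.diag([1.0, 0.25, 0.25])   # 1/lam = 1 is simple (lam_j = 1), 1/lam = 1/4 is double (lam_j = 4)
print(local_degree_f0r(A, 1.0, 0.1))  # 2  (odd multiplicity, (5.2.37) with i_j^- = 1)
print(local_degree_f0r(A, 4.0, 0.1))  # 0  (even multiplicity, (5.2.36))
```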

Corollary 5.2.36. If $\Omega = \mathbb{R}\times X$ in Theorem 5.2.34, then the first possibility (i) reduces to "$C$ is unbounded in $\mathbb{R}\times X$" and (ii) remains unchanged.
Proof. Let $(\lambda, x)\in C$. Then
$$x = \lambda Ax + G(\lambda, x).$$
This implies that if $C$ is bounded in $\mathbb{R}\times X$, it is also relatively compact because
$$T(\lambda, x) = \lambda Ax + G(\lambda, x)$$
is a compact operator. But $C$ is closed, and so it is compact. We have thus proved that if $C$ is bounded, it is also compact. □

We will now discuss the special case when $\frac1{\lambda_0}$ is an eigenvalue of $A$ the multiplicity of which is equal to 1. If this is the case in Theorem 5.2.34 and $C$ is the component containing the point $(\lambda_0, o)$, then $C$ consists of two connected sets $C^\pm$ which near $(\lambda_0, o)$ meet only in $(\lambda_0, o)$. More precisely, the next assertion holds (see Deimling [34, Corollary 29.1]).
Corollary 5.2.37. Under the hypotheses of Theorem 5.2.34 suppose, in addition, that the multiplicity of $\frac1{\lambda_0}$ is 1. Then the component $C$ containing $(\lambda_0, o)$ consists of two connected sets $C^+$ and $C^-$, $C = C^+\cup C^-$, such that
$$C^+\cap C^-\cap B((\lambda_0, o); \varepsilon) = \{(\lambda_0, o)\} \quad\text{and}\quad C^\pm\cap\partial B((\lambda_0, o); \varepsilon)\neq\emptyset$$
for sufficiently small $\varepsilon>0$.
The meaning of $C^\pm$ is the following. Let us assume that $(\lambda_n, x_n)\in C^\pm$, $\lambda_n\to\lambda_0$ and $o\neq x_n\to o$. Then, similarly to Exercise 5.2.25, we prove that
$$\frac{x_n}{\|x_n\|}\to\pm v_0$$
where $v_0\neq o$ is a normalized eigenvector associated with the eigenvalue $\frac1{\lambda_0}$. In other words, the sets $C^\pm$ describe the branches of nontrivial solutions which bifurcate in the direction of the eigenvectors $\pm v_0$ (see Figure 5.2.7 for the projections of $C^\pm$ into the space $X$).

Figure 5.2.7. (Projections of $C^+$ and $C^-$ into $X$, leaving $o$ in the directions $v_0$ and $-v_0$, respectively.)
The global properties of $C^\pm$ were studied by Dancer [30]. The main result of this paper can be formulated as follows.
Theorem 5.2.38 (Dancer Global Bifurcation Theorem). The sets $C^+$ and $C^-$ are either both unbounded, or
$$C^+\cap C^-\neq\{(\lambda_0, o)\}.$$
Example 5.2.39 (Application of the Dancer Global Bifurcation Theorem). Let us consider the Dirichlet boundary value problem
$$\ddot x(t) + \lambda x(t) = g(\lambda, t, x(t)), \quad t\in(0,\pi), \qquad x(0) = x(\pi) = 0. \tag{5.2.38}$$
We assume that $g = g(\lambda, t, s)$ is a continuous function from $\mathbb{R}\times[0,\pi]\times\mathbb{R}$ into $\mathbb{R}$ and, given any bounded interval $I\subset\mathbb{R}$,
$$\lim_{s\to0}\frac{g(\lambda, t, s)}{s} = 0 \tag{5.2.39}$$


holds uniformly with respect to $t\in[0,\pi]$ and $\lambda\in I$. In particular,
$$g(\lambda, t, 0) = 0, \quad t\in[0,\pi],\ \lambda\in\mathbb{R},$$
and so (5.2.38) has a trivial solution. In this example we discuss the existence and properties of nontrivial weak solutions of (5.2.38). Let $X := W_0^{1,2}(0,\pi)$ (with the scalar product $(x, y) = \int_0^\pi\dot x(t)\dot y(t)\,dt$) and define operators $A, G\colon X\to X$ as follows:
$$(Ax, y) = \int_0^\pi x(t)y(t)\,dt, \qquad (G(\lambda, x), y) = -\int_0^\pi g(\lambda, t, x(t))y(t)\,dt \quad\text{for any } x, y\in X.$$
The existence of a weak solution of (5.2.38) is equivalent to the existence of a solution of the operator equation (5.2.25), i.e.,
$$x - \lambda Ax - G(\lambda, x) = o,$$
cf. Example 5.3.11. Moreover, (5.2.39) implies (5.2.28) (the reader is invited to check it). Set $\lambda_0 = n^2$ where $n\in\mathbb{N}$ is fixed. Then $\frac1{\lambda_0} = \frac1{n^2}$ is an eigenvalue of $A$ of multiplicity 1. It follows from the above results (Theorems 5.2.34 and 5.2.38) that there is a component $C$ of $S$ which contains nontrivial solutions of (5.2.25), and such that
$$C = C^+\cup C^-, \qquad \{(n^2, o)\}\subset C^+\cap C^-,$$
and $C^\pm$ are either both unbounded, or $C^+\cap C^-\neq\{(n^2, o)\}$. We show that the latter case cannot occur if $g = g(\lambda, t, s)$ is locally Lipschitz continuous with respect to the third variable $s$ (cf. page 93). To prove this fact the properties of the initial value problem (with $\lambda$ fixed)
$$\ddot x(t) + \lambda x(t) - g(\lambda, t, x(t)) = 0, \qquad x(t_0) = x_0,\ \dot x(t_0) = x_1, \quad t_0\in[0,\pi], \tag{5.2.40}$$
play a crucial role. In particular, we use the uniqueness of the solution to (5.2.40), which in turn implies that (5.2.40) with $x_0 = x_1 = 0$ has only the trivial solution.
The regularity result (cf. Remark 5.3.10 and Exercise 5.3.26) for weak solutions of (5.2.38) yields that for any $(\lambda, x)\in C$ we have $x\in C^2[0,\pi]$, and the above mentioned uniqueness result for (5.2.40) also implies that any such nontrivial $x$ has only a finite number of nodes in $(0,\pi)$.
Let
$$v_0(t) = \left(\frac2\pi\right)^{\frac12}\frac1n\sin nt$$
be a normalized eigenfunction associated with the eigenvalue $\frac1{n^2}$ of $A$. Consider $(\lambda_k, x_k)\in C^+$ such that $\lambda_k\to n^2$ and $\|x_k\|\to0$. Then
$$x_k - \lambda_kAx_k - G(\lambda_k, x_k) = o.$$
The reflexivity of $X$, (5.2.39) and the compactness of $A$ imply that
$$v_k := \frac{x_k}{\|x_k\|}\to v_0 \quad\text{in } X.$$
The embedding $X = W_0^{1,2}(0,\pi)\hookrightarrow C[0,\pi]$ and the fact that
$$v_k = \lambda_kAv_k + \frac{G(\lambda_k, x_k)}{\|x_k\|} \tag{5.2.41}$$


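then yield that $v_k\to v_0$ even in $C^2[0,\pi]$. In particular, it means that for large enough $k$, the functions $x_k$ share the nodal properties of $v_0$. More precisely, let
$$A^+ := \{(\lambda, x)\in C^+ : x \text{ has exactly } (n-1) \text{ nodes in } (0,\pi) \text{ and } \dot x(0)>0\},$$
$$A^- := \{(\lambda, x)\in C^- : x \text{ has exactly } (n-1) \text{ nodes in } (0,\pi) \text{ and } \dot x(0)<0\}.$$
Then there exists $\varepsilon_0>0$ such that
$$\left(C^\pm\cap B((n^2, o); \varepsilon)\right)\setminus\{(n^2, o)\}\subset A^\pm \quad\text{for any } 0<\varepsilon\le\varepsilon_0.$$
In particular, $A^\pm\neq\emptyset$. We show that $A^\pm\cup\{(n^2, o)\}$ is closed and open in $C^\pm$. Let us consider $C^+$; the case of $C^-$ is similar. Recall that $C^+$ is a connected set with respect to the topology induced by the topology of $\mathbb{R}\times X$. For a given $(\bar\lambda, \bar x)\in C^+$ the convergence $(\lambda_k, x_k)\to(\bar\lambda, \bar x)$ in this topology means that
$$\lambda_k\to\bar\lambda \ \text{in } \mathbb{R} \quad\text{and}\quad x_k\to\bar x \ \text{in } X.$$
The above mentioned regularity result and the embedding $X = W_0^{1,2}(0,\pi)\hookrightarrow C[0,\pi]$ then imply that
$$x_k\to\bar x \quad\text{in } C^2[0,\pi].$$
Let us assume that
$$(\lambda_k, x_k)\in A^+, \quad (\bar\lambda, \bar x)\neq(n^2, o), \quad (\lambda_k, x_k)\to(\bar\lambda, \bar x)\in C^+.$$
The fact $x_k\to\bar x$ in $C^2[0,\pi]$ then yields that $(\bar\lambda, \bar x)\in A^+$, i.e., $A^+\cup\{(n^2, o)\}$ is closed in $C^+$. On the other hand, if $(\bar\lambda, \bar x)\in A^+$, then there exists $\varepsilon>0$ such that
$$C^+\cap B((\bar\lambda, \bar x); \varepsilon)\subset A^+,$$
for otherwise there would be $(\lambda_k, x_k)\in C^+\setminus A^+$, $(\lambda_k, x_k)\to(\bar\lambda, \bar x)$ and $x_k\to\bar x$ in $C^2[0,\pi]$, a contradiction. Together with the choice of $\varepsilon_0$ this shows that $A^+\cup\{(n^2, o)\}$ is open in $C^+$.
We have thus proved that $C^\pm = A^\pm\cup\{(n^2, o)\}$, and so the sets $C^+$ and $C^-$ do not have any common point besides $(n^2, o)$. According to Theorem 5.2.38 both $C^+$ and $C^-$ are unbounded in $\mathbb{R}\times X$.
Let us emphasize that this means that $C^\pm$ are unbounded either with respect to $x$, or with respect to $\lambda$ (or with respect to both $x$ and $\lambda$!).
Some further properties of $g$ might provide more information about the sets $C^\pm$ (e.g., boundedness with respect to $x$ if there are a priori estimates for all solutions, and unboundedness with respect to $\lambda$; or vice versa, boundedness with respect to $\lambda$ and unboundedness with respect to $x$).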
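The normalization of $v_0$ used in Example 5.2.39 can be confirmed numerically: in the norm $\|x\|^2 = \int_0^\pi\dot x(t)^2\,dt$ of $X$ one expects $\|v_0\| = 1$, $(Av_0, v_0) = \int_0^\pi v_0(t)^2\,dt = \frac1{n^2}$, and exactly $n-1$ nodes in $(0,\pi)$. The Python sketch below (standard library only; the helper name and the grid size are illustrative choices) evaluates the integrals by the composite trapezoidal rule and counts sign changes on a grid.

```python
import math

def check_v0(n, m=20000):
    """Return (int (v0')^2, int v0^2, number of sign changes of v0 in (0, pi))
    for v0(t) = sqrt(2/pi) * sin(n t) / n, using m trapezoidal subintervals."""
    c = math.sqrt(2.0 / math.pi) / n
    h = math.pi / m
    ts = [k * h for k in range(m + 1)]

    def trapezoid(f):
        vals = [f(t) for t in ts]
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    norm_sq = trapezoid(lambda t: (c * n * math.cos(n * t)) ** 2)  # ||v0||^2 in X
    a_form = trapezoid(lambda t: (c * math.sin(n * t)) ** 2)       # (A v0, v0)
    # sample at cell midpoints, which avoid the exact zeros k*pi/n for the n used here
    signs = [math.sin(n * (t + 0.5 * h)) > 0 for t in ts[:-1]]
    nodes = sum(1 for s1, s2 in zip(signs, signs[1:]) if s1 != s2)
    return norm_sq, a_form, nodes

for n in (1, 2, 3):
    print(n, check_v0(n))
```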
Exercise 5.2.40. Consider the boundary value problem (5.2.38) and apply Theorem 5.2.34 to get conclusions about the bifurcation branches. Formulate further assumptions on $g$ which will imply unboundedness of the branches with respect to $x$ and $\lambda$, respectively.
Exercise 5.2.41. Consider the Neumann boundary value problem
$$\ddot x(t) + \lambda x(t) = g(\lambda, t, x(t)), \quad t\in(0,\pi), \qquad \dot x(0) = \dot x(\pi) = 0. \tag{5.2.42}$$
Find conditions on $g = g(\lambda, t, s)$ and $\lambda$ making it possible to apply Theorem 5.2.34.


Exercise 5.2.42. Modify assumptions from Exercise 5.2.41 on $g$ so as to make it possible to apply Theorem 5.2.38 to (5.2.42) and to exclude the situation $C^+\cap C^-\neq\{(n^2, o)\}$.
Exercise 5.2.43. Consider the periodic problem
$$\ddot x(t) + \lambda x(t) = g(\lambda, t, x(t)), \quad t\in(0, 2\pi), \qquad x(0) = x(2\pi), \quad \dot x(0) = \dot x(2\pi). \tag{5.2.43}$$
Find conditions on $g = g(\lambda, t, s)$ and $\lambda$ making it possible to apply Theorem 5.2.34, cf. Example 4.3.25.

5.2B Topological Degree for Generalized Monotone Operators


Let $X$ be a reflexive real Banach space and $X^*$ its dual. We will consider the operator
$$T\colon X\to X^*. \tag{5.2.44}$$
The purpose of this appendix is to inform the reader about a possible method for extending the Leray–Schauder degree theory to mappings of the type (5.2.44). The following definition is the key to the theory presented in this appendix.
Definition 5.2.44. The operator $T\colon X\to X^*$ is said to satisfy the $(S_+)$ condition if the assumptions$^{33}$
$$u_n\rightharpoonup u_0 \ \text{(weakly) in } X \quad\text{and}\quad \limsup_{n\to\infty}\langle T(u_n), u_n - u_0\rangle\le0$$
imply
$$u_n\to u_0 \ \text{(strongly) in } X.$$
Remark 5.2.45. The topological degree for generalized monotone operators was independently introduced by Browder [19] and Skrypnik [121]. The notation $(S_+)$ is taken from Browder [19], while the same condition is called (X) in Skrypnik [121]. The $(S_+)$ condition is a kind of compactness condition and plays an essential role in the construction of the degree for $T\colon X\to X^*$. This construction is based on the Brouwer degree and finite dimensional approximations, as is the construction of the Leray–Schauder degree, and mappings satisfying the $(S_+)$ condition play a role similar to that of compact perturbations of the identity. The following assertion illustrates this fact. Its proof is a straightforward consequence of Definition 5.2.44.
Lemma 5.2.46. Let $T\colon X\to X^*$ satisfy the $(S_+)$ condition and let $K\colon X\to X^*$ be a compact operator. Then the sum $T + K\colon X\to X^*$ satisfies the $(S_+)$ condition.
The following assertion is an analogue of Theorem 5.2.13 and of Exercise 5.2.26.
Theorem 5.2.47 (I. V. Skrypnik [121]). Let $T\colon X\to X^*$ be a bounded and demicontinuous$^{34}$ operator satisfying the $(S_+)$ condition. Let $D\subset X$ be an open, bounded and nonempty set with the boundary $\partial D$ such that $T(u)\neq o$ for $u\in\partial D$. Then there exists an integer
$$\deg(T, D, o)$$
(called the degree of the mapping $T$) such that
(i) $\deg(T, D, o)\neq0$ implies that there exists an element $u_0\in D$ such that $T(u_0) = o$.
(ii) If $D$ is symmetric with respect to the origin and $T$ satisfies $T(-u) = -T(u)$ for any $u\in D$, then $\deg(T, D, o)$ is an odd number (and thus different from zero).
(iii) (Homotopy invariance property) Let $T_\tau$ be a family of bounded and demicontinuous mappings which satisfy the $(S_+)$ condition and which depend continuously on a real parameter $\tau\in[0,1]$, and let $T_\tau(u)\neq o$ for any $u\in\partial D$ and $\tau\in[0,1]$. Then $\deg(T_\tau, D, o)$ is constant with respect to $\tau\in[0,1]$. In particular, we have
$$\deg(T_0, D, o) = \deg(T_1, D, o).$$
33 Here and in the sequel we denote by $\langle f, u\rangle := f(u)$ the value of the linear form $f\in X^*$ at an element $u\in X$. If $X$ is a Hilbert space, then according to the Riesz Representation Theorem, $\langle f, x\rangle = (x, f)$.
34 We say that $T\colon X\to X^*$ is demicontinuous if $T$ maps strongly convergent sequences in $X$ to weakly convergent sequences in $X^*$.
The following assertion combined with Theorem 5.2.47(i) is a crucial tool in proving the existence of a solution or the existence of a bifurcation branch (see Appendix 7.5A).
Proposition 5.2.48 (I. V. Skrypnik [121]). Let $T\colon X\to X^*$ be a bounded, demicontinuous mapping satisfying the $(S_+)$ condition, $o\in D\setminus\partial D$, $T(u)\neq o$ for $u\in\partial D$, $D$ being as in Theorem 5.2.47. Let for $u\in\partial D$ the inequality
$$\langle T(u), u\rangle\ge0$$
be valid. Then
$$\deg(T, D, o) = 1.$$
Let $u_0\in X$ be an isolated solution of the equation
$$T(u) = o. \tag{5.2.45}$$
Similarly to the finite dimensional case (and in the case of the Leray–Schauder degree) we define the index of an isolated solution $u_0$ as
$$i(T, u_0) = \lim_{r\to0+}\deg(T, B(u_0; r), o).$$
Then we have the following useful property of the degree.
Proposition 5.2.49 (I. V. Skrypnik [121]). Let $T$ and $D$ be as in Theorem 5.2.47. Let $T(u) = o$ have only isolated solutions in $D$ and let $T(u)\neq o$ for $u\in\partial D$. Then there is only a finite number of solutions of (5.2.45) in $D$, $u_i$, $i = 1, \dots, n$, and the equality
$$\deg(T, D, o) = \sum_{i=1}^n i(T, u_i)$$
holds.
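In the simplest finite dimensional situation $X = X^* = \mathbb{R}$ (where weak and strong convergence coincide, so every continuous map satisfies the $(S_+)$ condition), Proposition 5.2.49 can be checked by hand: the degree of $T$ on an interval $(a, b)$ is $\frac12\left(\operatorname{sign}T(b) - \operatorname{sign}T(a)\right)$, and the index of a zero $u_i$ with $T'(u_i)\neq0$ is $\operatorname{sign}T'(u_i)$. The Python sketch below uses the cubic $(u-1)(u-2)(u-3)$ as a hypothetical example; the helper names are illustrative.

```python
def sign(v):
    """Sign of a real number as an int in {-1, 0, 1}."""
    return (v > 0) - (v < 0)

def degree_on_interval(T, a, b):
    """Brouwer degree deg(T, (a, b), 0) of a continuous T: [a, b] -> R with T(a), T(b) != 0."""
    return (sign(T(b)) - sign(T(a))) // 2

def sum_of_indices(dT, zeros):
    """Sum of the indices i(T, u_i) = sign T'(u_i) over nondegenerate zeros u_i."""
    return sum(sign(dT(u)) for u in zeros)

def T(u):
    return (u - 1.0) * (u - 2.0) * (u - 3.0)

def dT(u):
    return 3.0 * u**2 - 12.0 * u + 11.0

print(degree_on_interval(T, 0.0, 4.0), sum_of_indices(dT, [1.0, 2.0, 3.0]))  # 1 1
print(degree_on_interval(T, 1.5, 4.0), sum_of_indices(dT, [2.0, 3.0]))       # 0 0
```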


The last assertion connects the properties of functionals with the degree of their Fréchet derivatives.
Proposition 5.2.50 (I. V. Skrypnik [121]). Assume that a real functional $F\colon X\to\mathbb{R}$ has a local minimum at $u_0\in X$ and its Fréchet derivative $F'\colon X\to X^*$ is a bounded and demicontinuous mapping which satisfies the $(S_+)$ condition. Let, moreover, $u_0$ be an isolated solution of $F'(u) = o$. Then
$$i(F', u_0) = 1.$$
Example 5.2.51. Let us consider the boundary value problem
$$-\left(|\dot x(t)|^{p-2}\dot x(t)\right)^{\cdot} - g(x(t)) = f(t), \quad t\in(0,1), \qquad x(0) = x(1) = 0, \tag{5.2.46}$$
where $p>1$, $f\in L^{p'}(0,1)$, $p' = \frac{p}{p-1}$, and $g\colon\mathbb{R}\to\mathbb{R}$ is a continuous function. The differential operator$^{35}$
$$x\mapsto\left(|\dot x|^{p-2}\dot x\right)^{\cdot}$$
is the so-called one-dimensional p-Laplacian (or half-linear differential operator of the second order). The parameter $\lambda\in\mathbb{R}$ for which there is a nontrivial weak solution (cf. Remark 5.3.10) $\varphi = \varphi(t)$ (i.e., not identically equal to zero in $(0,1)$) of the problem
$$-\left(|\dot x(t)|^{p-2}\dot x(t)\right)^{\cdot} - \lambda|x(t)|^{p-2}x(t) = 0, \quad t\in(0,1), \qquad x(0) = x(1) = 0, \tag{5.2.47}$$
is called an eigenvalue of the eigenvalue problem (5.2.47) and the function $\varphi$ an eigenfunction associated with the eigenvalue $\lambda$. It is known (see, e.g., Elbert [47]) that the problem (5.2.47) has a countable set of simple eigenvalues $0<\lambda_1<\lambda_2<\cdots$, $\lim_{n\to\infty}\lambda_n = \infty$ (cf. Appendix 6.4B for the case $p>2$), and the values of $\lambda_n$, $n = 1, 2, \dots$, can be explicitly calculated in terms of $p$ and $\pi$. The eigenfunction $\varphi_n$ associated with $\lambda_n$ is continuously differentiable and has exactly $n-1$ zero points in $(0,1)$. In particular, we can choose $\varphi_1(t)>0$, $t\in(0,1)$. (See Elbert [47], Došlý [37], Drábek, Krejčí & Takáč [41] and references given there.) However, the concrete values of $\lambda_n$ are not important in this example.
Let us assume that
$$\lim_{|s|\to\infty}\frac{g(s)}{|s|^{p-2}s} = \lambda \quad\text{where}\quad \lambda_n<\lambda<\lambda_{n+1} \quad\text{for an } n = 1, 2, \dots. \tag{5.2.48}$$
The problem (5.2.46) is then called a nonresonance problem (cf. Remark 7.5.5). Put $X := W_0^{1,p}(0,1)$ with the norm
$$\|x\| = \left(\int_0^1|\dot x(t)|^p\,dt\right)^{\frac1p}.$$
Let us define operators $J, G\colon X\to X^*$ and an element $f^*\in X^*$ by
$$\langle J(x), y\rangle = \int_0^1|\dot x(t)|^{p-2}\dot x(t)\dot y(t)\,dt,{}^{36} \qquad \langle G(x), y\rangle = \int_0^1 g(x(t))y(t)\,dt,$$
$$\langle f^*, y\rangle = \int_0^1 f(t)y(t)\,dt \quad\text{for any } x, y\in X.$$
35 The right-hand side is defined by $(\varphi_p(\dot x))^{\cdot}$ where $\varphi_p(s) = |s|^{p-2}s$ for $s\neq0$, $\varphi_p(0) = 0$, for $p>1$.
36 By the Hölder inequality the integral exists and defines, for a fixed $x\in X$, a continuous linear form on $X$.

If we set
$$T = J - G,$$
then the operator equation
$$T(x) = f^* \tag{5.2.49}$$
is equivalent to the requirement that the integral identity
$$\int_0^1|\dot x(t)|^{p-2}\dot x(t)\dot y(t)\,dt - \int_0^1 g(x(t))y(t)\,dt = \int_0^1 f(t)y(t)\,dt \tag{5.2.50}$$
holds for all $y\in X$. The function $x\in X$ satisfying (5.2.50) is then a weak solution of (5.2.46) (cf. Remark 5.3.10). It follows that the existence of a weak solution of (5.2.46) is equivalent to the existence of a solution of the operator equation (5.2.49).
Our plan is to use the degree argument to prove the existence of a solution of (5.2.49). First we sketch the properties of the operators $J$ and $G$. The operator $J$ satisfies
$$\langle J(x), x\rangle = \|x\|^p. \tag{5.2.51}$$

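Moreover, $J$ is an odd mapping, $(p-1)$-homogeneous,$^{37}$ it is bounded, continuous (and so demicontinuous) and satisfies the $(S_+)$ condition. Indeed, let $x_n\rightharpoonup x_0$ in $X$ and
$$\limsup_{n\to\infty}\langle J(x_n), x_n - x_0\rangle\le0.$$
Then $\lim_{n\to\infty}\langle J(x_0), x_n - x_0\rangle = 0$, and so
$$\begin{aligned}
0 &\ge \limsup_{n\to\infty}\langle J(x_n) - J(x_0), x_n - x_0\rangle\\
&= \limsup_{n\to\infty}\int_0^1\left(|\dot x_n(t)|^{p-2}\dot x_n(t) - |\dot x_0(t)|^{p-2}\dot x_0(t)\right)(\dot x_n(t) - \dot x_0(t))\,dt\\
&\ge \limsup_{n\to\infty}\Bigg[\int_0^1|\dot x_n(t)|^p\,dt - \left(\int_0^1|\dot x_n(t)|^p\,dt\right)^{\frac{p-1}{p}}\left(\int_0^1|\dot x_0(t)|^p\,dt\right)^{\frac1p}\\
&\qquad - \left(\int_0^1|\dot x_0(t)|^p\,dt\right)^{\frac{p-1}{p}}\left(\int_0^1|\dot x_n(t)|^p\,dt\right)^{\frac1p} + \int_0^1|\dot x_0(t)|^p\,dt\Bigg]\\
&= \limsup_{n\to\infty}\left(\|x_n\|^{p-1} - \|x_0\|^{p-1}\right)\left(\|x_n\| - \|x_0\|\right)\ge0,
\end{aligned}$$
where the second inequality follows from the Hölder inequality and the last one from the fact that $s\mapsto s^{p-1}$ is strictly increasing on $(0,\infty)$. Hence $\|x_n\|\to\|x_0\|$, and due to the uniform convexity of $X$ we have $x_n\to x_0$ in $X$.$^{38}$ The operator $J$ is also invertible and its inverse is continuous.$^{39}$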
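The key inequality in the verification of the $(S_+)$ condition for $J$, namely $\langle J(x) - J(y), x - y\rangle\ge(\|x\|^{p-1} - \|y\|^{p-1})(\|x\| - \|y\|)\ge0$, uses only the Hölder inequality, so it also holds for a discretization in which $\dot x$ is replaced by a vector of grid values with weight $h$. The Python sketch below (standard library only; the grid, the random data and the helper names are hypothetical) checks this discrete version on random data, together with the discrete analogue of (5.2.51).

```python
import random

def phi(s, p):
    """phi_p(s) = |s|^(p-2) s, with phi_p(0) = 0 (needed for 1 < p < 2)."""
    return 0.0 if s == 0 else abs(s) ** (p - 2) * s

def norm(v, p, h):
    """Discrete analogue of ||x|| = (int_0^1 |x'(t)|^p dt)^(1/p); v holds grid values of x'."""
    return (h * sum(abs(s) ** p for s in v)) ** (1.0 / p)

def pairing(a, b, p, h):
    """Discrete analogue of <J(x) - J(y), x - y> with a, b the grid values of x', y'."""
    return h * sum((phi(s, p) - phi(t, p)) * (s - t) for s, t in zip(a, b))

def lower_bound(a, b, p, h):
    """(||x||^(p-1) - ||y||^(p-1)) (||x|| - ||y||), the Hoelder lower bound."""
    na, nb = norm(a, p, h), norm(b, p, h)
    return (na ** (p - 1) - nb ** (p - 1)) * (na - nb)

rng = random.Random(0)
m = 50                      # number of grid cells on (0, 1)
h = 1.0 / m
for _ in range(500):
    p = rng.uniform(1.1, 5.0)
    a = [rng.gauss(0.0, 1.0) for _ in range(m)]
    b = [rng.gauss(0.0, 1.0) for _ in range(m)]
    lhs, rhs = pairing(a, b, p, h), lower_bound(a, b, p, h)
    assert rhs >= -1e-12 and lhs >= rhs - 1e-9 * (1.0 + abs(lhs))
```

Choosing $y = o$ (all grid values zero) turns the pairing into $h\sum|a_i|^p$, the discrete counterpart of $\langle J(x), x\rangle = \|x\|^p$.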
The operator $G$ is compact. This follows immediately from the compact embedding $X = W_0^{1,p}(0,1)\hookrightarrow\hookrightarrow C[0,1]$ and from the continuity of $g$ (the reader is invited to prove it in detail). Hence, due to Lemma 5.2.46, the operator $T$ satisfies the $(S_+)$ condition.
37 I.e., $J(tx) = t^{p-1}J(x)$ for any $t>0$, $x\in X$.
38 See Proposition 2.1.22(iv).
39 See Exercise 5.2.53.



Let us define an operator $S: X \to X^*$ by
$$\langle S(x), y\rangle = \int_0^1 |x(t)|^{p-2}x(t)\,y(t)\,dt \quad \text{for any } x, y \in X.$$
Then $S$ is $(p-1)$-homogeneous and compact (use $X = W_0^{1,p}(0,1) \hookrightarrow\hookrightarrow L^p(0,1)$). We define a homotopy
$$T_\tau(x) = J(x) - (1-\tau)G(x) - \tau\lambda S(x) + (\tau - 1)f^* \quad \text{for } \tau \in [0,1],\ x \in X,$$
and show that there exists $R > 0$ (large enough) such that this homotopy is admissible with respect to the ball $B(o;R) \subset X$. The usual way to prove it relies on an indirect argument. Assume by contradiction that for any $k \in \mathbb{N}$ there exist $\tau_k \in [0,1]$ and $x_k \in X$, $\|x_k\| \ge k$, such that $T_{\tau_k}(x_k) = o$, i.e.,
$$J(x_k) - (1-\tau_k)G(x_k) - \tau_k\lambda S(x_k) + (\tau_k - 1)f^* = o. \tag{5.2.52}$$
We divide (5.2.52) by $\|x_k\|^{p-1}$, denote $v_k := x_k/\|x_k\|$ and use that $J$ and $S$ are $(p-1)$-homogeneous to get
$$J(v_k) - (1-\tau_k)\frac{G(x_k)}{\|x_k\|^{p-1}} - \tau_k\lambda S(v_k) + (\tau_k - 1)\frac{f^*}{\|x_k\|^{p-1}} = o. \tag{5.2.53}$$

Due to the reflexivity of $X$ and the compactness of the interval $[0,1]$, we may assume that
$$v_k \rightharpoonup v \ \text{in } X \quad\text{and}\quad \tau_k \to \tau \in [0,1].$$
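Using the compactness of the embedding $X \hookrightarrow\hookrightarrow L^p(0,1)$, the facts that $G$ and $S$ are continuous as operators from $L^p(0,1)$ into $L^{p'}(0,1)$, $p' = \frac{p}{p-1}$, and using the assumption (5.2.48) we obtain
$$(1-\tau_k)\frac{G(x_k)}{\|x_k\|^{p-1}} \to (1-\tau)\lambda S(v) \ \text{in } X^*, \qquad \tau_k\lambda S(v_k) \to \tau\lambda S(v) \ \text{in } X^*, \qquad (\tau_k - 1)\frac{f^*}{\|x_k\|^{p-1}} \to o \ \text{in } X^*$$
as $k \to \infty$ (the reader is invited to justify all in detail!). Passing to the limit in (5.2.53) we thus get
$$J(v_k) \to (1-\tau)\lambda S(v) + \tau\lambda S(v) = \lambda S(v) \quad \text{in } X^* \text{ as } k \to \infty,$$
i.e., by the continuity of $J^{-1}$,
$$v_k \to J^{-1}(\lambda S(v)) \quad \text{in } X.$$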

Since at the same time $v_k \rightharpoonup v$ in $X$, we have
$$v_k \to v \ \text{in } X \quad\text{and}\quad J(v) - \lambda S(v) = o \ \text{in } X^*. \tag{5.2.54}$$


Since $\|v_k\| = 1$ for all $k = 1, 2, \dots$, we have $\|v\| = 1$, and so (5.2.54) contradicts the fact that $\lambda$ is not an eigenvalue of (5.2.47). This proves that the homotopy $T_\tau$ is admissible with respect to the ball $B(o;R)$ if $R$ is large. Applying Theorem 5.2.47(iii) we arrive at
$$\deg(J - G - f^*, B(o;R), o) = \deg(J - \lambda S, B(o;R), o),$$
but the value of the degree on the right-hand side is an odd number according to Theorem 5.2.47(ii). Hence
$$\deg(J - G - f^*, B(o;R), o) \ne 0,$$
and the existence of at least one solution $x \in X$ of (5.2.49) which satisfies $\|x\| < R$ follows from Theorem 5.2.47(i).
Remark 5.2.52. It is possible to solve the problem (5.2.46) by means of the Leray–Schauder degree theory as well. In that case instead of solving the operator equation
$$J(x) - G(x) = f^*$$
one has to deal with
$$x = J^{-1}(f^* + G(x))$$
(cf. Exercise 5.2.54). Due to the properties of $J^{-1}$ (cf. Exercise 5.2.53) this approach is more or less equivalent to that presented in Example 5.2.51. However, in more complicated applications (equations of higher order, partial differential equations, etc., see, e.g., Appendix 7.5A) the use of the degree presented in Theorem 5.2.47 can be of essential advantage!
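Exercise 5.2.53. Let $J$ be the operator from Example 5.2.51. Prove that there exists an inverse operator $J^{-1}$ which is bounded and continuous.
Hint. The strict monotonicity of $s \mapsto |s|^{p-2}s$ implies that
$$\langle J(u) - J(v), u - v\rangle > 0 \quad \text{for } u \ne v.$$
Hence $J$ is injective. Using the Hölder inequality prove that
$$\langle J(x) - J(y), x - y\rangle \ge \big(\|x\|^{p-1} - \|y\|^{p-1}\big)\big(\|x\| - \|y\|\big) \tag{5.2.55}$$
(cf. the proof of the $(S_+)$ condition in Example 5.2.51). The boundedness of $J^{-1}$ then follows.
To prove that $J^{-1}$ is continuous proceed via contradiction. Suppose it is not, i.e., there exist a sequence $\{f_n\}_{n=1}^\infty$, $f_n \to f$ in $X^*$, and an $\varepsilon > 0$ such that
$$\|J^{-1}(f_n) - J^{-1}(f)\| \ge \varepsilon.$$
Let $x_n := J^{-1}(f_n)$, $x = J^{-1}(f)$. It follows that
$$\|f_n\|\,\|x_n\| \ge \langle f_n, x_n\rangle = \langle J(x_n), x_n\rangle = \|x_n\|^p, \quad\text{i.e.,}\quad \|x_n\|^{p-1} \le \|f_n\|.$$
We may then assume $x_n \rightharpoonup \tilde x$ in $X$ due to the reflexivity of $X$. Hence
$$\langle J(x_n) - J(\tilde x), x_n - \tilde x\rangle = \langle J(x_n) - J(x), x_n - \tilde x\rangle + \langle J(x) - J(\tilde x), x_n - \tilde x\rangle \to 0 \tag{5.2.56}$$
since $J(x_n) \to J(x)$ in $X^*$. It follows from (5.2.55) (with $x := x_n$, $y := \tilde x$) and (5.2.56) that $\|x_n\| \to \|\tilde x\|$. Hence $x_n \to \tilde x$ follows due to the fact that $X$ is a uniformly convex Banach space (see page 65 or Adams [2, Theorem 3.5]). Since $J$ is continuous, $J(\tilde x) = \lim_{n\to\infty} J(x_n) = f = J(x)$, and since $J$ is injective, $\tilde x = x$. Thus $x_n \to x$, a contradiction.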
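In one dimension the inverse from Exercise 5.2.53 can be written down almost explicitly, which makes its existence and continuity tangible. If $J(x) = f^*$ with $\langle f^*, y\rangle = \int_0^1 f y\,dt$, then integration by parts in the weak formulation gives $|\dot x|^{p-2}\dot x = C - F$ with $F(t) = \int_0^t f(s)\,ds$. Hence $\dot x = \varphi^{-1}(C - F)$, where $\varphi^{-1}(s) = |s|^{\frac{1}{p-1}-1}s$ is the inverse of $s \mapsto |s|^{p-2}s$. The constant $C$ is fixed by $x(1) = \int_0^1 \varphi^{-1}(C - F(t))\,dt = 0$, and because this integral is strictly increasing in $C$, bisection finds it. The Python sketch below assumes that $f$ is sampled on a uniform grid and uses the trapezoidal and midpoint rules. The function names are ours.

```python
import math


def phi_inv(s, p):
    """Inverse of s -> |s|^(p-2) s, i.e. |s|^(1/(p-1)) sign(s)."""
    return math.copysign(abs(s) ** (1.0 / (p - 1.0)), s)


def j_inverse(f_vals, p):
    """Approximate x = J^{-1}(f*) for -(|x'|^(p-2) x')' = f, x(0) = x(1) = 0.

    f_vals are samples of f at t_i = i/n, i = 0, ..., n; returns samples of x.
    """
    n = len(f_vals) - 1
    h = 1.0 / n
    # F(t_i) = int_0^{t_i} f, trapezoidal rule
    F = [0.0]
    for i in range(n):
        F.append(F[-1] + 0.5 * h * (f_vals[i] + f_vals[i + 1]))
    F_mid = [0.5 * (F[i] + F[i + 1]) for i in range(n)]  # F at the cell midpoints

    def x_at_one(C):
        # x(1) = int_0^1 phi_inv(C - F(t)) dt, strictly increasing in C
        return h * sum(phi_inv(C - v, p) for v in F_mid)

    lo, hi = min(F_mid) - 1.0, max(F_mid) + 1.0  # x_at_one(lo) < 0 < x_at_one(hi)
    for _ in range(200):  # bisection; 200 halvings exhaust double precision
        mid = 0.5 * (lo + hi)
        if x_at_one(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    C = 0.5 * (lo + hi)
    x = [0.0]
    for v in F_mid:
        x.append(x[-1] + h * phi_inv(C - v, p))
    return x


# p = 2 and f = 2: the exact solution is x(t) = t(1 - t)
x = j_inverse([2.0] * 101, 2.0)
print(x[50])
```

Since $C$ depends continuously on the data and $x$ is obtained from $C - F$ by integration, small changes of $f$ produce small changes of $x$, in line with the continuity of $J^{-1}$.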


Exercise 5.2.54. Consider the boundary value problem (5.2.46) with $g$ satisfying (5.2.48). Prove the existence of at least one weak solution of (5.2.46) using the Leray–Schauder degree theory.
Hint. Prove that $J^{-1} \circ G$ is a compact operator from $X$ into itself and then use the homotopy invariance property of the Leray–Schauder degree to prove that
$$x = J^{-1}(f^* + G(x))$$
has at least one solution in $X$. Compare your proof with the method presented in Example 5.2.51.
Exercise 5.2.55. Consider the problem
$$
\begin{cases}
-\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) = h(t, x(t), \dot x(t)), & t \in (0,1),\\
x(0) = x(1) = 0,
\end{cases}
\tag{5.2.57}
$$
where $p > 1$. Formulate conditions on $h = h(t,x,s)$ which guarantee the existence of a weak solution of (5.2.57) (see Remark 5.3.10).

5.3 Theory of Monotone Operators


The motivation for the methods presented in this section can be described by the following simple example of a real function of one real variable $f: \mathbb{R} \to \mathbb{R}$. We would like to find conditions on $f$ which guarantee that for any $y \in \mathbb{R}$ the equation $f(x) = y$ has a (unique) solution $x$. One possible way to solve this first-semester calculus problem is to consider $f$ which is continuous, (strictly) monotone and satisfies $\lim_{|x|\to\infty} |f(x)| = \infty$ (see Figure 5.3.1).

Figure 5.3.1. (Graph of $y = f(x)$.)
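The three conditions on $f$ are exactly what a bracketing method needs. Coercivity lets a bracket $[a,b]$ with $f(a) \le y \le f(b)$ be found by doubling, continuity then yields a root by bisection, and strict monotonicity makes it unique. A minimal Python sketch for increasing $f$ (the function name is ours):

```python
def solve_monotone(f, y, iterations=200):
    """Solve f(x) = y for continuous, increasing f with |f(x)| -> inf as |x| -> inf."""
    a, b = -1.0, 1.0
    while f(a) > y:   # terminates because f(x) -> -inf as x -> -inf
        a *= 2.0
    while f(b) < y:   # terminates because f(x) -> +inf as x -> +inf
        b *= 2.0
    for _ in range(iterations):  # invariant: f(a) <= y <= f(b)
        m = 0.5 * (a + b)
        if f(m) < y:
            a = m
        else:
            b = m
    return 0.5 * (a + b)


print(solve_monotone(lambda x: x ** 3 + x, 10.0))  # the root of x^3 + x = 10 is x = 2
```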

If $f$ is replaced by an operator $T: H \to H$ from a real Hilbert space $H$ (with a scalar product $(\cdot,\cdot)$ and the induced norm $\|\cdot\|$) into itself and the same question is posed, then similar conditions appear to be appropriate to prove that for any $h \in H$ the equation
$$T(u) = h$$


has a unique solution $u \in H$. It is clear how to reformulate the first condition on $f$ in the case of a general operator $T$. The third condition motivates the following definition.
Definition 5.3.1. Let $H$ be a real Hilbert space. An operator $T: H \to H$ satisfying
$$\lim_{\|u\|_H \to \infty} \|T(u)\| = \infty$$
is called weakly coercive.


In order to reformulate the second condition we should first note that a real function of one real variable is increasing (decreasing) if and only if
$$(f(x) - f(y))(x - y) \ge 0 \quad (\le 0) \quad \text{for any } x, y \in \mathbb{R}.$$

Definition 5.3.2. Let $H$ be a real Hilbert space. An operator $T: H \to H$ satisfying
$$(T(u) - T(v), u - v) \ge 0 \quad \text{for any } u, v \in H \tag{5.3.1}$$
is called a monotone operator. An operator $T$ is called strictly monotone if for $u \ne v$ the strict inequality holds in (5.3.1). An operator $T$ is called strongly monotone if there exists $c > 0$ such that
$$(T(u) - T(v), u - v) \ge c\|u - v\|^2 \quad \text{for any } u, v \in H.$$

Remark 5.3.3. It is clear that a strongly monotone operator is strictly monotone and, therefore, monotone. Also, every strongly monotone operator is weakly coercive. Indeed, $T$ being strongly monotone implies
$$(T(u) - T(o), u) \ge c\|u\|^2. \tag{5.3.2}$$
The Schwarz inequality (see Proposition 1.2.30(i)) yields
$$(T(u) - T(o), u) \le \big[\|T(u)\| + \|T(o)\|\big]\|u\|. \tag{5.3.3}$$
Putting (5.3.2) and (5.3.3) together we get
$$\|T(u)\| \ge c\|u\| - \|T(o)\|,$$
and the weak coercivity follows.
The following theorem is the basic assertion of this section.
Theorem 5.3.4. Let $H$ be a real Hilbert space and let $T: H \to H$ be continuous, monotone and weakly coercive. Then
$$T(H) = H.$$
If, moreover, $T$ is strictly monotone, then for any $h \in H$ the equation
$$T(u) = h \tag{5.3.4}$$
has a unique solution.


Proof. The uniqueness of the solution is a direct consequence of the strict monotonicity of $T$. The existence of a solution to (5.3.4) for any $h \in H$ is proved in two steps:
Step 1. Assume for a while that the assertion of the theorem holds if $T$ is continuous and strongly monotone. We prove this fact later, in Proposition 5.3.5. Since $T_n: H \to H$, $n \in \mathbb{N}$, defined by
$$T_n: u \mapsto \frac1n u + T(u)$$
is strongly monotone (prove it!) for any $n \in \mathbb{N}$, we conclude that given $h \in H$ there exists $u_n \in H$ such that
$$T_n(u_n) = h. \tag{5.3.5}$$

Step 2. Let us prove that $\{u_n\}_{n=1}^\infty$ is a bounded sequence in $H$. Assume the contrary, i.e., there exists a subsequence, which will be denoted by $\{u_n\}_{n=1}^\infty$ again, such that
$$\lim_{n\to\infty}\|u_n\| = \infty.$$
It follows from the monotonicity of $T$ that
$$
\begin{aligned}
\|h\| &\ge \Big(h, \frac{u_n}{\|u_n\|}\Big) = \frac1n\|u_n\| + \frac{1}{\|u_n\|}\big(T(u_n) - T(o), u_n\big) + \frac{1}{\|u_n\|}\big(T(o), u_n\big)\\
&\ge \frac1n\|u_n\| - \|T(o)\|,
\end{aligned}
$$
i.e., $\{\frac1n u_n\}_{n=1}^\infty$ is a bounded sequence (and therefore weakly sequentially compact; see Theorem 2.1.25 and note that $H$ is reflexive). Hence there exists a subsequence $\{\frac{1}{n_k}u_{n_k}\}_{k=1}^\infty$ of $\{\frac1n u_n\}_{n=1}^\infty$ which is weakly convergent, i.e.,
$$\frac{1}{n_k}u_{n_k} \rightharpoonup w.$$
According to (5.3.5),
$$T(u_{n_k}) \rightharpoonup h - w.$$
This implies that $\{T(u_{n_k})\}_{k=1}^\infty$ is a bounded sequence (Proposition 2.1.22(iii)), which contradicts the weak coercivity of $T$. This proves the boundedness of $\{u_n\}_{n=1}^\infty$. In particular, $\frac1n u_n \to o$ and $T(u_n) \to h$.
By Theorem 2.1.25, there is a subsequence $\{u_{m_k}\}_{k=1}^\infty \subset \{u_n\}_{n=1}^\infty$ such that
$$u_{m_k} \rightharpoonup u_0.$$
We prove that $T(u_0) = h$. Indeed, for any $v \in H$ and $k \in \mathbb{N}$ we have
$$(T(u_{m_k}) - T(v), u_{m_k} - v) \ge 0.$$


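Passing to the limit for $k \to \infty$ we obtain
$$(h - T(v), u_0 - v) \ge 0 \quad \text{for any } v \in H.^{40}$$
Set $v = u_0 + \varepsilon w$, $\varepsilon > 0$, $w \in H$. Then
$$(h - T(u_0 + \varepsilon w), w) \le 0 \quad \text{holds for any } \varepsilon > 0 \text{ and } w \in H. \tag{5.3.6}$$
Passing to the limit for $\varepsilon \to 0+$ in (5.3.6) and using the continuity of $T$ and of the scalar product in $H$, we get
$$(h - T(u_0), w) \le 0 \quad \text{for any } w \in H. \tag{5.3.7}$$
Since (5.3.7) holds simultaneously for any $w$ and $-w$, we actually have
$$(h - T(u_0), w) = 0 \quad \text{for any } w \in H,$$
i.e.,
$$T(u_0) = h.$$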
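The regularization of Step 1 can be observed in $H = \mathbb{R}^2$ on the rotation $T(u) = (u_2, -u_1)$. This example is our own choice, not taken from the text. The operator $T$ is continuous, monotone because $(T(u), u) = 0$, and weakly coercive because $\|T(u)\| = \|u\|$, but it is not strongly monotone. Each regularized operator $T_n(u) = \frac1n u + T(u)$ is strongly monotone with $c = \frac1n$. The Python sketch below solves $T_n(u_n) = h$ by Cramer's rule and shows that $u_n \to T^{-1}(h) = (-h_2, h_1)$.

```python
import math


def T(u):
    """Rotation by -90 degrees: monotone, weakly coercive, not strongly monotone."""
    return (u[1], -u[0])


def solve_regularized(h, n):
    """Solve T_n(u) = u/n + T(u) = h, i.e. [[1/n, 1], [-1, 1/n]] u = h (Cramer's rule)."""
    e = 1.0 / n
    det = e * e + 1.0
    return ((e * h[0] - h[1]) / det, (h[0] + e * h[1]) / det)


h = (1.0, 2.0)
u_star = (-h[1], h[0])  # T(u_star) = h
for n in (1, 10, 100, 1000):
    u_n = solve_regularized(h, n)
    print(n, math.dist(u_n, u_star), math.dist(T(u_n), h))
```

A direct computation gives $\|u_n - u^*\| = \frac{\|h\|}{n\sqrt{1 + n^{-2}}}$, which shows that the regularization error is of order $\frac1n$.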

Now, it remains to justify the assumption made in Step 1. For this purpose
we prove the following assertion.
Proposition 5.3.5. Let $H$ be a real Hilbert space and $S: H \to H$ a continuous and strongly monotone operator. Then
$$S(H) = H.$$
Proof. The idea of the proof is easy. Since $H$ is a connected metric space, it is enough to prove (see Lemmas 5.3.6 and 5.3.8) that $S(H)$ is both open and closed in $H$. Then $S(H) = H$ because the only nonempty subset of $H$ which is both open and closed is the entire space $H$.

First we prove that S(H) is closed.
Lemma 5.3.6. Let $D$ be a closed set in $H$, let $S: D \to H$ be a continuous and strongly monotone operator. Then $S(D)$ is a closed set in $H$.
Proof. Let $\{u_n\}_{n=1}^\infty \subset D$ be such that
$$S(u_n) \to h.$$
Since $S$ is strongly monotone, we have
$$(S(u_n) - S(u_m), u_n - u_m) \ge c\|u_n - u_m\|^2,$$
and using the Schwarz inequality we obtain
$$\|u_n - u_m\| \le \frac1c\|S(u_n) - S(u_m)\|.$$
40 Here we use that $x_n \rightharpoonup x$ and $y_n \to y$ imply $(x_n, y_n) \to (x, y)$. See Exercise 2.1.36.


Hence $\{u_n\}_{n=1}^\infty$ is a Cauchy sequence, and since $H$ is complete and $D$ is closed, there exists $u_0 \in D$ such that
$$u_n \to u_0.$$
The continuity of $S$ implies that
$$S(u_n) \to S(u_0), \quad\text{i.e.,}\quad S(u_0) = h.$$

Proving that $S(H)$ is an open set is trickier. For this purpose we need an auxiliary assertion about an extension of Lipschitz continuous operators.
Lemma 5.3.7. Let $D$ be a subset of a real Hilbert space $H$, let $V: D \to H$ be an operator satisfying
$$\|V(u) - V(v)\| \le \|u - v\| \quad \text{for any } u, v \in D.$$
Then there exists an operator $W: H \to H$ such that
$$\|W(u) - W(v)\| \le \|u - v\| \quad \text{for any } u, v \in H \tag{5.3.8}$$
and, moreover,
$$W(u) = V(u) \quad \text{for any } u \in D.$$

Proof. It follows from Zorn's Lemma (see Theorem 1.1.4) that there exists a maximal extension $W$ of the operator $V$, the domain of which satisfies
$$\operatorname{Dom} W \subset H, \qquad D \subset \operatorname{Dom} W,$$
and for any $u, v \in \operatorname{Dom} W$ the inequality (5.3.8) holds. Our aim is to prove $\operatorname{Dom} W = H$. Assume the contrary, i.e., there exists $u_0 \in H \setminus \operatorname{Dom} W$. In order to reach a contradiction it is enough to prove the existence of $v_0 \in H$ such that
$$\|v_0 - W(u)\| \le \|u_0 - u\| \quad \text{for any } u \in \operatorname{Dom} W. \tag{5.3.9}$$
Indeed, setting
$$\widetilde W: u \mapsto \begin{cases} v_0, & u = u_0,\\ W(u), & u \in \operatorname{Dom} W,\end{cases}$$
we obtain an operator
$$\widetilde W: \operatorname{Dom} W \cup \{u_0\} \to H$$
satisfying (5.3.8) for any $u, v \in \operatorname{Dom} W \cup \{u_0\}$. This will be a contradiction with the maximality of the extension $W$. So, in the rest of the proof we concentrate on the existence of $v_0$ satisfying (5.3.9).
Let $B$ be a finite subset of $\operatorname{Dom} W$. Denote by $A_B$ the set of all $v_0 \in H$ satisfying (5.3.9) for any $u \in B$. Let $A$ denote the set of all $v_0 \in H$ satisfying (5.3.9) for all $u \in \operatorname{Dom} W$. Let $\mathcal{B}_n$ be the system of all finite subsets $B$ of $\operatorname{Dom} W$ which belong to the closed ball $\{u \in H : \|u\| \le n\}$, $n \in \mathbb{N}$. Set
$$A_n = \bigcap_{B \in \mathcal{B}_n} A_B.$$
Clearly, we have
$$A = \bigcap_{n=1}^\infty A_n, \qquad A_{n+1} \subset A_n \subset \dots \subset A_1.$$
We wish to prove that $A \ne \emptyset$.


Observe first that $A_B$ and $A_n$ are weakly compact sets (they are bounded and weakly closed⁴¹). If $A_B \ne \emptyset$ for any finite subset $B \subset \operatorname{Dom} W$, then $A_n \ne \emptyset$ for any $n \in \mathbb{N}$ by Exercise 1.2.42. Applying this procedure again we finally obtain $A \ne \emptyset$ and the proof will be complete.
Assume now that there exists $B = \{u_1, \dots, u_m\} \subset \operatorname{Dom} W$ such that $A_B = \emptyset$. We want to reach a contradiction which will complete the proof. Denote
$$H_f = \operatorname{Lin}\{u_1 - u_0, \dots, u_m - u_0, W(u_1), \dots, W(u_m)\}.$$
Then $H_f$ is a subspace of $H$ and
$$\dim H_f \le 2m.$$
For any $w \in H_f$ set
$$h(w) = \max_{1 \le j \le m} \frac{\|w - W(u_j)\|}{\|u_0 - u_j\|}.$$
If there exists $v_0 \in H_f$ such that $h(v_0) \le 1$, then $v_0 \in A_B$, a contradiction.


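So, assume that $h(w) > 1$ for any $w \in H_f$. Note that the real function $h$ is continuous on $H_f$ and
$$\lim_{\substack{\|w\|\to\infty\\ w \in H_f}} h(w) = \infty.$$
Hence, there exists $w_0 \in H_f$⁴² such that
$$h(w_0) = \min_{w \in H_f} h(w) = \alpha > 1.$$
Let us re-enumerate $u_1, \dots, u_m$ in such a way that
$$
\begin{aligned}
&\frac{\|w_0 - W(u_j)\|}{\|u_0 - u_j\|} = \alpha > 1, && 1 \le j \le k,\\
&\frac{\|w_0 - W(u_j)\|}{\|u_0 - u_j\|} < \alpha, && k+1 \le j \le m.
\end{aligned}
\tag{5.3.10}
$$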

We prove that $w_0$ belongs to the convex hull $M = \operatorname{Co}\{W(u_1), \dots, W(u_k)\}$ of $\{W(u_1), \dots, W(u_k)\}$.⁴³ Let us assume the contrary. Then we can find $w_1 \in H_f$ such that
$$
\begin{aligned}
&\frac{\|w_1 - W(u_j)\|}{\|u_0 - u_j\|} < \frac{\|w_0 - W(u_j)\|}{\|u_0 - u_j\|} = \alpha, && 1 \le j \le k,\\
&\frac{\|w_1 - W(u_j)\|}{\|u_0 - u_j\|} < \alpha, && k+1 \le j \le m,
\end{aligned}
$$
(see Figure 5.3.2 for $m = 5$ and $k = 2$).⁴⁴ Hence $h(w_1) < h(w_0)$, a contradiction.
41 Since the weak topology is not metrizable, the fact that $A_B$ is weakly closed has to be shown with the help of weak neighborhoods (Remark 2.1.23). But this is simple due to the Dual Characterization of the Norm (Corollary 2.1.16).
42 Recall that bounded sets in a finite dimensional space $H_f$ are relatively compact.
43 The convex hull of the set $A$ is the least convex set containing $A$.

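Figure 5.3.2. The sets $U_j = \big\{w \in H_f : \frac{\|w - W(u_j)\|}{\|u_0 - u_j\|} < \alpha\big\} = B(W(u_j); \alpha\|u_0 - u_j\|) \cap H_f$, $j = 1, \dots, 5$, together with the points $W(u_1), \dots, W(u_5)$, $w_0$, $w_1$, $w_M$ and the set $M = \operatorname{Co}\{W(u_1), W(u_2)\}$.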

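Consequently, there are $c_1, \dots, c_k$ such that
$$w_0 = \sum_{j=1}^k c_j W(u_j), \qquad c_j \ge 0, \qquad \sum_{j=1}^k c_j = 1.$$
Set $z_j = w_0 - W(u_j)$, $\tilde z_j = u_0 - u_j$, $1 \le j \le k$. Then
$$\sum_{j=1}^k c_j z_j = o, \qquad \|\tilde z_j\|^2 < \|z_j\|^2, \quad 1 \le j \le k. \tag{5.3.11}$$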

44 In other words, $\bigcap_{1\le j\le m} U_j \ne \emptyset$ where $U_j = \big\{w \in H_f : \frac{\|w - W(u_j)\|}{\|u_0 - u_j\|} < \alpha\big\}$ (see Figure 5.3.2). Indeed, $\bigcap_{k+1\le j\le m} U_j$ is a nonempty open subset of $H_f$ and the latter $(m-k)$ inequalities in (5.3.10) imply $w_0 \in \bigcap_{k+1\le j\le m} U_j$. Using the convexity and compactness of $M$, the reader is invited to show that $\bigcap_{1\le j\le k} U_j$ contains the segment $\{tw_0 + (1-t)w_M : 0 < t < 1\}$ where $\|w_0 - w_M\| = \operatorname{dist}(w_0, M)$. Consequently, $\bigcap_{1\le j\le k} U_j$ is a nonempty set, too, and $w_0$ belongs to its boundary.

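For $1 \le j, n \le k$ we also have
$$\|z_j - z_n\|^2 = \|W(u_n) - W(u_j)\|^2 \le \|u_n - u_j\|^2 = \|\tilde z_j - \tilde z_n\|^2,$$
i.e.,
$$\|z_j\|^2 + \|z_n\|^2 - 2(z_j, z_n) \le \|\tilde z_j\|^2 + \|\tilde z_n\|^2 - 2(\tilde z_j, \tilde z_n). \tag{5.3.12}$$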

We conclude from (5.3.11) and (5.3.12) that
$$(\tilde z_j, \tilde z_n) < (z_j, z_n), \quad 1 \le j, n \le k,$$
and thus
$$\sum_{j,n=1}^k c_j c_n (\tilde z_j, \tilde z_n) < \sum_{j,n=1}^k c_j c_n (z_j, z_n).$$
However,
$$\sum_{j,n=1}^k c_j c_n (z_j, z_n) = \Big\|\sum_{j=1}^k c_j z_j\Big\|^2 = 0,$$
i.e.,
$$\Big\|\sum_{j=1}^k c_j \tilde z_j\Big\|^2 = \sum_{j,n=1}^k c_j c_n (\tilde z_j, \tilde z_n) < 0,$$
a contradiction. This proves that there exists $v_0 \in H_f$ such that $h(v_0) \le 1$, i.e., $A_B \ne \emptyset$ for any finite set $B \subset \operatorname{Dom} W$, and the proof is complete.
Now, we are ready to prove that S(H) is also an open set in H.
Lemma 5.3.8. Let $D \subset H$ be an open set, let $S: D \to H$ be continuous and strongly monotone. Then $S(D)$ is an open subset of $H$.
Proof. It is enough to prove this lemma for $S$ satisfying the strong monotonicity assumption with $c = 1$ (explain why!). Let us denote $R := S(D)$. We are going to construct a continuous mapping $Z: H \to H$, $\operatorname{Dom} Z = H$, such that $Z^{-1}(D) = R$, and this will imply that $R$ is open. The operator $S$ is injective in $D$ and $S^{-1}$ is continuous on $R$ (see Exercise 5.3.12). So we intend to construct $Z$ as an extension of $S^{-1}$. For this purpose set
$$F(u) = S(u) - u.$$
Then for $u, u_1 \in D$ we have
$$(F(u) - F(u_1), u - u_1) \ge 0,$$
i.e., $F$ is monotone. For $v \in R$ set
$$K(v) = S^{-1}(v) - F(S^{-1}(v)).$$


Let $v, v_1 \in R$ be such that $v = S(u)$, $v_1 = S(u_1)$. Then
$$
\begin{aligned}
\|K(v) - K(v_1)\|^2 &= \|u - u_1\|^2 + \|F(u) - F(u_1)\|^2 - 2(F(u) - F(u_1), u - u_1),\\
\|v - v_1\|^2 &= \|F(u) - F(u_1)\|^2 + \|u - u_1\|^2 + 2(F(u) - F(u_1), u - u_1).
\end{aligned}
$$
The monotonicity of $F$ implies that for any $v, v_1 \in R$,
$$\|K(v) - K(v_1)\| \le \|v - v_1\|.$$
It follows from Lemma 5.3.7 that there exists a continuous extension $K_1$ of $K$ which is defined on the whole $H$ and for any $v, v_1 \in H$ we have
$$\|K_1(v) - K_1(v_1)\| \le \|v - v_1\|.$$
For $v \in H$ set
$$Z(v) = \frac12\big(v + K_1(v)\big).$$
If $v \in R$ and $v = S(u)$, then
$$v + K_1(v) = v + K(v) = 2u, \quad\text{i.e.,}\quad Z|_R = S^{-1} \ \text{and}\ R \subset Z^{-1}(D).$$
The inclusion
$$Z^{-1}(D) \subset R \tag{5.3.13}$$
will imply that
$$Z^{-1}(D) = R$$
and, by the continuity of $Z$, the set $R = S(D)$ is open.
To prove (5.3.13) it is enough to show that for any $v \in Z^{-1}(D)$ we have $v = S(Z(v))$. Assume by contradiction that there is $v \in Z^{-1}(D)$ such that for $u = Z(v)$ we have
$$\|v - S(u)\| > 0. \tag{5.3.14}$$
The continuity of $S$ implies the existence of $d > 0$ such that $B(u;d) \subset D$ and for $u_1 \in B(u;d)$ we have
$$\|S(u) - u - S(u_1) + u_1\| \le \frac12\|v - S(u)\|.$$
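Let us choose $t > 0$ so small that
$$\|t(v - S(u))\| < d.$$
Set
$$u_1 = u + t(v - S(u)), \qquad v_1 = S(u_1).$$
Then $\|u - u_1\| < d$, and so
$$
\begin{aligned}
\big(S(u_1) - u_1 - v + u, t(v - S(u))\big) &= \big(v_1 - Z(v_1) - v + Z(v), Z(v_1) - Z(v)\big)\\
&= \frac14\big(v_1 - K_1(v_1) - v + K_1(v), v_1 + K_1(v_1) - v - K_1(v)\big)\\
&= \frac14\big(\|v - v_1\|^2 - \|K_1(v) - K_1(v_1)\|^2\big) \ge 0.
\end{aligned}
$$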


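Furthermore,
$$
\begin{aligned}
\big(S(u_1) - u_1 - S(u) + u, u_1 - u\big) &= \big(S(u_1) - u_1 - v + u, t(v - S(u))\big) + \big(v - S(u), t(v - S(u))\big)\\
&\ge \big(v - S(u), t(v - S(u))\big) = t\|v - S(u)\|^2,
\end{aligned}
$$
and so
$$
\begin{aligned}
t\|v - S(u)\|^2 &\le \big(S(u_1) - u_1 - S(u) + u, t(v - S(u))\big)\\
&\le t\|S(u_1) - u_1 - S(u) + u\|\,\|v - S(u)\| \le \frac12 t\|v - S(u)\|^2.
\end{aligned}
$$
Since $t > 0$ this contradicts (5.3.14). This proves (5.3.13) and the proof is complete.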
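The construction in the proof can be traced numerically in the simplest case $H = D = \mathbb{R}$. The operator is our own choice: $S(u) = 2u + u^3$ is strongly monotone with $c = 2 \ge 1$, so $F(u) = S(u) - u = u + u^3$ is monotone. Here $R = S(\mathbb{R}) = \mathbb{R}$, hence $K_1 = K$, and $K(v) = S^{-1}(v) - F(S^{-1}(v)) = -(S^{-1}(v))^3$. The Python sketch below checks that $K$ is nonexpansive and that $Z(v) = \frac12(v + K(v))$ reproduces $S^{-1}$. It evaluates $S^{-1}$ by bisection.

```python
import random


def S(u):
    return 2.0 * u + u ** 3


def S_inv(v):
    """S is continuous, increasing and onto R; bisect on [-|v|-1, |v|+1]."""
    a, b = -abs(v) - 1.0, abs(v) + 1.0
    for _ in range(200):
        m = 0.5 * (a + b)
        if S(m) < v:
            a = m
        else:
            b = m
    return 0.5 * (a + b)


def F(u):
    return S(u) - u  # monotone because S is strongly monotone with c >= 1


def K(v):
    u = S_inv(v)
    return u - F(u)


def Z(v):
    return 0.5 * (v + K(v))


rng = random.Random(1)
pairs = [(rng.uniform(-20, 20), rng.uniform(-20, 20)) for _ in range(1000)]
print(max(abs(K(v) - K(w)) / abs(v - w) for v, w in pairs if v != w))  # Lipschitz ratio of K
print(max(abs(Z(v) - S_inv(v)) for v, _ in pairs))                      # Z agrees with S^{-1}
```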

Let us point out that for operator equations with strongly monotone operators we obtain the continuous dependence of the solution on the right-hand side.
Corollary 5.3.9. Let $H$ be a real Hilbert space and $T: H \to H$ a continuous and strongly monotone operator. Then for any $h \in H$ the equation
$$T(u) = h$$
has a unique solution. Let $T(u_1) = h_1$ and $T(u_2) = h_2$. Then
$$\|u_1 - u_2\| \le \frac1c\|h_1 - h_2\|$$
with $c > 0$ from Definition 5.3.2, i.e., $T^{-1}$ is Lipschitz continuous.
Proof. The existence part follows from Proposition 5.3.5. Uniqueness is obvious. For $T(u_1) = h_1$ and $T(u_2) = h_2$ we have (using the Schwarz inequality)
$$c\|u_1 - u_2\|^2 \le (T(u_1) - T(u_2), u_1 - u_2) \le \|u_1 - u_2\|\,\|h_1 - h_2\|,$$
which completes the proof.

Remark 5.3.10. Let $h: [0,1] \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a real function. Consider the boundary value problem
$$
\begin{cases}
-\ddot x(t) = h(t, x(t), \dot x(t)), & t \in (0,1),\\
x(0) = x(1) = 0.
\end{cases}
\tag{5.3.15}
$$
Assume that $h$ is continuous and $x \in C^2[0,1]$ is a solution of (5.3.15). Let us multiply the equation in (5.3.15) by a function $y \in W_0^{1,2}(0,1)$ and then integrate the equation from 0 to 1. Applying the Integration by Parts Formula on the left-hand side, we obtain
$$\int_0^1 \dot x(t)\,\dot y(t)\,dt = \int_0^1 h(t, x(t), \dot x(t))\,y(t)\,dt. \tag{5.3.16}$$


This identity makes sense for a more general $x$ than that from $C^2[0,1]$ and also for a more general function $h$. We discuss this issue in Section 7.3 in detail.
If we assume that $h$ is such that the integral on the right-hand side of (5.3.16) exists for any $x, y \in W_0^{1,2}(0,1)$ (see the following Example 5.3.11), then the function $x \in W_0^{1,2}(0,1)$ is called the weak solution of (5.3.15) if the integral identity (5.3.16) holds for any $y \in W_0^{1,2}(0,1)$.
Once we succeed in finding a weak solution of (5.3.15), a natural question arises whether it belongs to a better space than $W_0^{1,2}(0,1)$, e.g., the continuity of the first and second derivatives of $x$ can be of interest. This is the so-called regularity problem. It is a very delicate issue in the theory of partial differential equations. On the other hand, for an ordinary differential equation, it is not. For instance, if $h$ is a continuous function, independent of $\dot x$, and $x \in W_0^{1,2}(0,1)$ is a weak solution of (5.3.15), then $x \in C^2[0,1]$ is a classical solution of (5.3.15), i.e., the equation in (5.3.15) holds pointwise in $(0,1)$.
Example 5.3.11. Let us consider the boundary value problem
$$
\begin{cases}
-\ddot x(t) + g(x(t)) = f(t), & t \in (0,1),\\
x(0) = x(1) = 0,
\end{cases}
\tag{5.3.17}
$$
where $g: \mathbb{R} \to \mathbb{R}$ is a continuous function and $f \in L^2(0,1)$ is a given function.
Reformulate (5.3.17) as an operator equation. Put $H := W_0^{1,2}(0,1)$ and define operators $J, G: H \to H$ and an element $f^* \in H$ by
$$(J(x), y) = \int_0^1 \dot x(t)\,\dot y(t)\,dt, \qquad (G(x), y) = \int_0^1 g(x(t))\,y(t)\,dt, \qquad (f^*, y) = \int_0^1 f(t)\,y(t)\,dt$$
for any $x, y \in H$. We will work with the scalar product
$$(x, y) = \int_0^1 \dot x(t)\,\dot y(t)\,dt, \quad x, y \in H,$$
and with the norm
$$\|x\| = \Big(\int_0^1 |\dot x(t)|^2\,dt\Big)^{\frac12},$$
cf. Exercise 1.2.46. The reader is invited to prove that the operators $J$ and $G$ as well as the element $f^*$ are well defined and that $J$ is linear. Set
$$S = J + G.$$
Then the operator equation
$$S(x) = f^*$$


is equivalent to the requirement that the integral identity
$$\int_0^1 \dot x(t)\,\dot y(t)\,dt + \int_0^1 g(x(t))\,y(t)\,dt = \int_0^1 f(t)\,y(t)\,dt \tag{5.3.18}$$
holds for all $y \in H$. This is the weak formulation of (5.3.17), and $x \in H$ satisfying (5.3.18) for any $y \in H$ is a weak solution of (5.3.17).
Let us prove that $S$ is a continuous operator. This fact follows from the continuity of $J$ and $G$. By the definition of $J$ and of the scalar product in $H$, $J$ is the identity on $H$, and so it is a continuous operator. Assume now that $x_n \to x$. The embedding of $H = W_0^{1,2}(0,1)$ into $C[0,1]$ (see Theorem 1.2.26) implies that $x_n \to x$ uniformly in $[0,1]$. It follows then from the continuity of $g$ on $\mathbb{R}$ that $g \circ x_n \to g \circ x$ uniformly in $[0,1]$ (justify this statement carefully!). Then using the Dual Characterization of the Norm and the Hölder inequality, we conclude that
$$
\begin{aligned}
\|G(x_n) - G(x)\| &= \sup_{\|w\| \le 1} |(G(x_n) - G(x), w)| = \sup_{\|w\| \le 1}\Big|\int_0^1 [g(x_n(t)) - g(x(t))]\,w(t)\,dt\Big|\\
&\le \Big(\int_0^1 |g(x_n(t)) - g(x(t))|^2\,dt\Big)^{\frac12} \sup_{\|w\| \le 1}\|w\|_{L^2(0,1)}\\
&\le c\Big(\int_0^1 |g(x_n(t)) - g(x(t))|^2\,dt\Big)^{\frac12} \to 0 \quad \text{as } n \to \infty.
\end{aligned}
$$
Hence $G$ is also a continuous operator.


Next we prove that $S$ is a strongly monotone operator provided $g$ is an increasing function. Indeed, for any $x, y \in H$ we have
$$(S(x) - S(y), x - y) = \int_0^1 |\dot x(t) - \dot y(t)|^2\,dt + \int_0^1 [g(x(t)) - g(y(t))][x(t) - y(t)]\,dt \ge \|x - y\|^2.^{45}$$
It follows from Corollary 5.3.9 that the problem (5.3.17) has a unique weak solution for any $f \in L^2(0,1)$.
If $f_n \to f$ in $L^2(0,1)$, then $f_n^* \to f^*$ strongly in $H$ (prove it in detail!), and according to Corollary 5.3.9 the corresponding weak solutions $x_n \in H$ satisfy
$$\|x_n - x\| \to 0.$$
In particular, this means that a weak solution of (5.3.17) depends continuously on the right-hand side $f \in L^2(0,1)$.
45 Here we use that $(g(r) - g(s))(r - s) \ge 0$.
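A finite-difference analogue makes Example 5.3.11 concrete. The Python sketch below is an illustration under our own assumptions and is not taken from the text. It uses second-order differences on a uniform grid and $g(s) = s^3$. The right-hand side $f(t) = \pi^2\sin \pi t + \sin^3 \pi t$ is manufactured so that the solution is $x(t) = \sin \pi t$. Because $g$ is increasing, the discrete operator is strongly monotone just like $S$. Its Jacobian is symmetric positive definite, and for this data Newton's method with a tridiagonal solve converges in a few steps.

```python
import math


def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm; adequate for the symmetric positive definite systems used here."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = sup[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        den = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / den if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / den
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_bvp(f, g, dg, n=200, tol=1e-10, max_iter=50):
    """Newton's method for -x'' + g(x) = f, x(0) = x(1) = 0, on n cells."""
    h = 1.0 / n
    t = [i * h for i in range(1, n)]  # interior nodes
    fv = [f(s) for s in t]
    x = [0.0] * (n - 1)
    for _ in range(max_iter):
        r = []
        for i in range(n - 1):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 2 else 0.0
            r.append((2.0 * x[i] - left - right) / h ** 2 + g(x[i]) - fv[i])
        diag = [2.0 / h ** 2 + dg(xi) for xi in x]  # Jacobian: A/h^2 + diag(g'(x))
        off = [-1.0 / h ** 2] * (n - 1)
        delta = solve_tridiagonal(off, diag, off, [-ri for ri in r])
        x = [xi + di for xi, di in zip(x, delta)]
        if max(abs(di) for di in delta) < tol:
            break
    return t, x


pi = math.pi
t, x = solve_bvp(lambda s: pi ** 2 * math.sin(pi * s) + math.sin(pi * s) ** 3,
                 lambda s: s ** 3, lambda s: 3.0 * s ** 2)
print(max(abs(xi - math.sin(pi * ti)) for ti, xi in zip(t, x)))  # discretization error
```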


Exercise 5.3.12. Let $H$ be a real Hilbert space and let $S: H \to H$ be a strongly monotone operator. Prove that $S$ is injective and $S^{-1}$ is Lipschitz continuous.
Exercise 5.3.13. Let $\delta > 0$ and $T: B(o; R + \delta) \subset \mathbb{R}^N \to \mathbb{R}^N$ be a monotone operator. Prove that $T(B(o;R))$ is a bounded set.
Exercise 5.3.14. Let $B(o;1) \subset \mathbb{R}^N$, $N \ge 2$. Prove that there exists a strongly monotone operator $T: B(o;1) \to \mathbb{R}^N$ for which $T(B(o;1))$ is an unbounded set.
Hint. Let $\{x_n\}_{n=1}^\infty \subset \mathbb{R}^N$, $\|x_n\| = 1$, $x_n \ne x_m$ for $n \ne m$, $x_n \to x_0$. Set
$$T: x \mapsto \begin{cases} x & \text{for } x \in B(o;1),\ x \ne x_m,\ m = 1, 2, \dots,\\ x_n + n x_n & \text{for } x = x_n,\ n = 1, 2, \dots. \end{cases}$$
Prove that $T(B(o;1))$ is unbounded and $T$ is strongly monotone.
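Exercise 5.3.15. Define
$$f_n: t \mapsto \begin{cases} 0 & \text{for } t \le \frac12,\\ nt - \frac n2 & \text{for } t > \frac12. \end{cases}$$
For $x = (x_1, x_2, \dots) \in l^2$ set
$$T x = (f_1(x_1), f_2(x_2), \dots) + (x_1, x_2, \dots).$$
Prove that
$$(T(x) - T(y), x - y)_{l^2} \ge \|x - y\|_{l^2}^2 \quad \text{for any } x, y \in l^2$$
and $T(B(o;1))$ is an unbounded set.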


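Exercise 5.3.16. Let $T: \mathbb{R}^N \to \mathbb{R}^N$ be a monotone operator and $T(\mathbb{R}^N) = \mathbb{R}^N$. Prove that $T$ is weakly coercive.
Hint. Assume the contrary, i.e., there exist $M > 0$ and a sequence $\{u_n\}_{n=1}^\infty$ such that
$$\|u_n\| \to \infty \ \text{as } n \to \infty \quad\text{and}\quad \|T(u_n)\| \le M.$$
Choose a subsequence of $\{u_n/\|u_n\|\}_{n=1}^\infty$ which is convergent to $w$. Since $T(\mathbb{R}^N) = \mathbb{R}^N$ there is $u \in \mathbb{R}^N$ such that $T(u) = (M+1)w$. By the monotonicity of $T$ we have
$$\Big(T(u_n), \frac{u_n - u}{\|u_n\|}\Big) \ge \Big(T(u), \frac{u_n - u}{\|u_n\|}\Big).$$
Taking $\limsup$ on both sides we obtain a contradiction.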
Exercise 5.3.17. Let $f: \mathbb{R} \to \mathbb{R}$ be defined as follows:
$$f: x \mapsto \begin{cases} x & \text{for } x < 0,\\ x + 1 & \text{for } x \ge 0. \end{cases}$$
For any $(x, y) \in \mathbb{R}^2$ set
$$T(x, y) = (-y + f(x), x).$$
Prove that $T$ is an injective monotone operator, $T(\mathbb{R}^2) = \mathbb{R}^2$ and $T$ is not continuous. Can there exist an injective, monotone function $T: \mathbb{R} \to \mathbb{R}$ which is not continuous and $T(\mathbb{R}) = \mathbb{R}$?
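The claims in Exercise 5.3.17 can be sanity-checked numerically in Python (this is an illustration, not a proof). The explicit inverse used below, $T^{-1}(a,b) = (b, f(b) - a)$, is our own computation: from $T(x,y) = (a,b)$ one gets $x = b$ and then $y = f(b) - a$.

```python
import math
import random


def f(x):
    return x if x < 0 else x + 1.0  # jump of size 1 at x = 0


def T(x, y):
    return (-y + f(x), x)


def T_inv(a, b):
    return (b, f(b) - a)


rng = random.Random(0)
points = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(2000)]

# monotonicity: (T(p) - T(q), p - q) = (f(x1) - f(x2)) (x1 - x2) >= 0
worst = min(
    (T(*p)[0] - T(*q)[0]) * (p[0] - q[0]) + (T(*p)[1] - T(*q)[1]) * (p[1] - q[1])
    for p, q in zip(points, points[1:])
)
print("smallest (T(p) - T(q), p - q):", worst)

# surjectivity and injectivity through the explicit inverse
print(all(math.isclose(c, d, abs_tol=1e-12)
          for p in points for c, d in zip(T(*T_inv(*p)), p)))

# discontinuity at the origin: a tiny step in x changes T by about 1
print(T(-1e-12, 0.0), T(0.0, 0.0))
```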
Exercise 5.3.18. Let $H$ be a real Hilbert space and $T: H \to H$ a strongly monotone and Lipschitz continuous operator, i.e., there exist numbers $m > 0$, $M > 0$, $M > m$ such that
$$(T(u) - T(v), u - v) \ge m\|u - v\|^2, \qquad \|T(u) - T(v)\| \le M\|u - v\|$$
hold for all $u, v \in H$. Prove that the equation
$$T(u) = h$$
has precisely one solution for every $h \in H$, and it is possible to construct this solution by iterations.
Hint. Let $h \in H$, $\varepsilon > 0$. Define an operator
$$A_\varepsilon(u) = u - \varepsilon(T(u) - h).$$
Prove that for $u, v \in H$,
$$\|A_\varepsilon(u) - A_\varepsilon(v)\|^2 \le (1 - 2\varepsilon m + \varepsilon^2 M^2)\|u - v\|^2,$$
and show that for $\varepsilon < \frac{2m}{M^2}$ the operator $A_\varepsilon$ is contractive. Apply the Contraction Principle (Theorem 2.3.1).
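The iteration from the hint is easy to try. The Python sketch below works in $H = \mathbb{R}^2$ with the hypothetical operator $T(u) = Au + \tanh(u)$ ($\tanh$ taken componentwise) and $A = \begin{pmatrix} 3 & 1\\ 1 & 2\end{pmatrix}$. Since $\tanh$ is increasing with Lipschitz constant 1, one may take $m = \lambda_{\min}(A) = \frac{5-\sqrt5}{2}$ and $M = \lambda_{\max}(A) + 1 = \frac{5+\sqrt5}{2} + 1$. With $\varepsilon = m/M^2$ the contraction factor is $\sqrt{1 - m^2/M^2} \approx 0.954$. The last lines also check the Lipschitz dependence of Corollary 5.3.9.

```python
import math

A = ((3.0, 1.0), (1.0, 2.0))             # symmetric positive definite
m = (5.0 - math.sqrt(5.0)) / 2.0         # smallest eigenvalue of A: strong monotonicity constant
M = (5.0 + math.sqrt(5.0)) / 2.0 + 1.0   # largest eigenvalue of A + Lip(tanh): Lipschitz constant


def T(u):
    return tuple(A[i][0] * u[0] + A[i][1] * u[1] + math.tanh(u[i]) for i in range(2))


def solve(h, eps=None, tol=1e-13, max_iter=10_000):
    """Banach iteration u <- A_eps(u) = u - eps (T(u) - h); needs 0 < eps < 2m/M^2."""
    eps = m / M ** 2 if eps is None else eps
    u = (0.0, 0.0)
    for k in range(1, max_iter + 1):
        Tu = T(u)
        new = tuple(ui - eps * (ti - hi) for ui, ti, hi in zip(u, Tu, h))
        if math.dist(new, u) < tol:
            return new, k
        u = new
    return u, max_iter


h1, h2 = (1.0, -2.0), (1.5, -1.0)
u1, k1 = solve(h1)
u2, k2 = solve(h2)
print(k1, math.dist(T(u1), h1))                  # iterations used, residual
print(math.dist(u1, u2), math.dist(h1, h2) / m)  # ||u1 - u2|| <= ||h1 - h2|| / m
```

The choice $\varepsilon = m/M^2$ minimizes $1 - 2\varepsilon m + \varepsilon^2 M^2$ over $\varepsilon$, which is why it is used as the default step.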

Exercise 5.3.19. Let $H$ be a Hilbert space and $T: H \to H$ a contraction. Prove that $I - T$ is a monotone operator.
Exercise 5.3.20. A function $x \in W^{1,2}(0,1)$ satisfying (5.3.16) for any $y \in W^{1,2}(0,1)$ is called a weak solution of the Neumann problem
$$
\begin{cases}
-\ddot x(t) = h(t, x(t), \dot x(t)), & t \in (0,1),\\
\dot x(0) = \dot x(1) = 0.
\end{cases}
\tag{5.3.19}
$$
Prove that any weak solution $x$ of (5.3.19) such that $x \in C^2[0,1]$ satisfies the equation in (5.3.19) and $\dot x(0) = \dot x(1) = 0$, i.e., it is a classical solution of (5.3.19).
Hint. Taking $y \in \mathcal{D}(0,1)$ in (5.3.16) show that the equation in (5.3.19) holds pointwise in $(0,1)$. Then take arbitrary $y \in C^2[0,1]$ in (5.3.16) and integrate by parts.
Exercise 5.3.21. Find conditions on $\lambda$, $g = g(t,x)$ and $f$ for the Neumann problem
$$
\begin{cases}
-\ddot x(t) + \lambda x(t) + g(t, x(t)) = f(t), & t \in (0,1),\\
\dot x(0) = \dot x(1) = 0,
\end{cases}
$$
to have a unique weak solution.
Hint. Use Corollary 5.3.9.


5.3A Browder and Leray–Lions Theorem


In this appendix we will discuss generalizations of the previous assertions from the basic text. We will present two assertions: one attributed to F.E. Browder, the other named after J. Leray and J.-L. Lions.
Theorem 5.3.22 (Browder). Let $X$ be a reflexive real Banach space. Moreover, let $T: X \to X^*$ be an operator satisfying the conditions
(i) $T$ is bounded;
(ii) $T$ is demicontinuous;
(iii) $T$ is coercive, i.e.,
$$\lim_{\|u\|\to\infty} \frac{\langle T(u), u\rangle}{\|u\|} = +\infty,$$
cf. Definition 6.2.17 in the Hilbert space setting;
(iv) $T$ is monotone on the space $X$, i.e., for all $u, v \in X$ we have
$$\langle T(u) - T(v), u - v\rangle \ge 0, \tag{5.3.20}$$
cf. Definition 5.3.2 in the Hilbert space setting.
Then the equation
$$T(u) = f \tag{5.3.21}$$
has at least one solution $u \in X$ for every $f \in X^*$. If, moreover, the inequality (5.3.20) is strict for all $u, v \in X$, $u \ne v$, then the equation (5.3.21) has precisely one solution $u \in X$ for every $f \in X^*$.
The second assertion is more general since the monotonicity condition (iv) is replaced by a set of weaker conditions.
Theorem 5.3.23 (Leray–Lions). Let $X$ be a reflexive real Banach space. Let $T: X \to X^*$ be an operator satisfying the conditions
(i) $T$ is bounded;
(ii) $T$ is demicontinuous;
(iii) $T$ is coercive.
Moreover, let there exist a bounded mapping $\Phi: X \times X \to X^*$ such that
(iv) $\Phi(u, u) = T(u)$ for every $u \in X$;
(v) for all $u, w, h \in X$ and any sequence $\{t_n\}_{n=1}^\infty$ of real numbers such that $t_n \to 0$, we have
$$\Phi(u + t_n h, w) \rightharpoonup \Phi(u, w);$$
(vi) for all $u, w \in X$ we have
$$\langle \Phi(u, u) - \Phi(w, u), u - w\rangle \ge 0$$
(the so-called condition of monotonicity in the principal part);
(vii) if $u_n \rightharpoonup u$ and
$$\lim_{n\to\infty}\langle \Phi(u_n, u_n) - \Phi(u, u_n), u_n - u\rangle = 0,$$
then we have
$$\Phi(w, u_n) \rightharpoonup \Phi(w, u) \quad \text{for arbitrary } w \in X;$$
(viii) if $w \in X$, $u_n \rightharpoonup u$, $\Phi(w, u_n) \rightharpoonup z$, then
$$\lim_{n\to\infty}\langle \Phi(w, u_n), u_n\rangle = \langle z, u\rangle.$$
Then the equation
$$T(u) = f$$
has at least one solution $u \in X$ for every $f \in X^*$.


The conditions (iv)–(viii) of Theorem 5.3.23 are somewhat unintuitive at first glance. We will try to clarify these conditions in Appendix 7.6A where an application to boundary value problems for partial differential equations is given.
Next we will discuss the main steps of the proof of Theorem 5.3.22. The proof of Theorem 5.3.23 is similar, nonetheless it is technically more demanding (see, e.g., Leray & Lions [85]).
Proof of Theorem 5.3.22. We divide the proof into eight steps.
Step 1. Observe that the operator
$$T_f(u) := T(u) - f$$
also satisfies all the conditions of Theorem 5.3.22. Hence it suffices to prove that the equation
$$T(u) = o \tag{5.3.22}$$
has at least one solution.
Step 2. We construct an approximation of the infinite dimensional equation (5.3.22) by an equation in a space of finite dimension. More precisely: Let $\Lambda$ be the family of all subspaces of finite dimension in the space $X$. If $F \in \Lambda$, define the operator $j_F: F \to X$ by
$$j_F(u) = u.$$
Obviously $j_F$ is linear and continuous on $F$. Let $j_F^*$ be the adjoint operator to $j_F$ (see Section 2.1). Then
$$j_F^*: X^* \to F^*$$
and for $u \in F$, put
$$T_F(u) := j_F^*(T(u)).$$
This defines a mapping $T_F$ from the space $F$ into the space $F^*$ (see Figure 5.3.3).
Step 3. Since a continuous linear operator maps a weakly convergent sequence to a weakly convergent one (see Proposition 2.1.27(i)) and the weak convergence coincides with the strong convergence on the subspace $F$ of finite dimension (cf. Remark 2.1.23), it follows from (ii) that $T_F$ is continuous. Put
$$c(r) := \inf_{\substack{u \in X\\ \|u\| = r}} \frac{\langle T(u), u\rangle}{\|u\|}.$$


Figure 5.3.3. The diagram $F \xrightarrow{\ j_F\ } X \xrightarrow{\ T\ } X^* \xrightarrow{\ j_F^*\ } F^*$ defining $T_F(u) = j_F^*(T(u))$.

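By condition (iii), we have
$$\lim_{r\to\infty} c(r) = \infty,$$
and by the definition of $c$,
$$\langle T_F(u), u\rangle = \langle T(u), j_F(u)\rangle = \langle T(u), u\rangle \ge c(\|u\|)\|u\| \tag{5.3.23}$$
holds for all $u \in F$.
Step 4. In Exercise 5.3.26 it is proved that the equation
$$T_F(u) = o_{F^*} \tag{5.3.24}$$
has at least one solution $u_F \in F$.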


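Step 5 (a priori estimate). There exists an $r_0 > 0$ such that
$$\|u_F\| \le r_0$$
holds for arbitrary $F \in \Lambda$ and for every solution $u_F \in F$ of the equation (5.3.24). Indeed, if such an $r_0$ did not exist, there would be a sequence $\{u_n\}_{n=1}^\infty$ of solutions of the equation (5.3.24) with $F = F_n$, $n = 1, 2, \dots$, such that
$$\lim_{n\to\infty}\|u_n\| = \infty, \qquad \lim_{n\to\infty} c(\|u_n\|) = \infty.$$
This would lead to a contradiction in view of the inequality (5.3.23):
$$c(\|u_n\|)\|u_n\| \le \langle T_{F_n}(u_n), u_n\rangle = 0.$$
Step 6. Let $F_0 \in \Lambda$ and
$$\Lambda_{F_0} = \{F \in \Lambda : F_0 \subset F\}.$$
We denote by $U_{F_0}$ the set of all elements $u \in X$ which are solutions of the equation (5.3.24) for some $F \in \Lambda_{F_0}$. Furthermore, let $\overline{U_{F_0}}^{w}$ be the weak closure of the set $U_{F_0}$.⁴⁶ Note that $\overline{U_{F_0}}^{w} \subset B(o;r_0)$ for any $F_0 \in \Lambda$ (cf. Exercise 2.1.39 and the fact that $U_{F_0} \subset B(o;r_0)$ for all $F_0 \in \Lambda$). Let $\Lambda_f$ be any finite subset of $\Lambda$. Then
$$\bigcap_{F_0 \in \Lambda_f} \overline{U_{F_0}}^{w} \ne \emptyset.$$
Indeed, let $\Lambda_f = \{F_0^i : \dim F_0^i < \infty,\ i = 1, \dots, n\}$. Then each of the sets $U_{F_0^i}$ contains all solutions $u \in X$ of the equation (5.3.24) in $F = F_0^1 + \dots + F_0^n$, and by Step 4 there is at least one such solution.
Since $B(o;r_0)$ is a compact topological space with respect to the weak topology (notice that $X$ is reflexive), it follows from the result of Exercise 1.2.42 that
$$\bigcap_{F_0 \in \Lambda} \overline{U_{F_0}}^{w} \ne \emptyset.$$
Hence there exists
$$u_0 \in \bigcap_{F_0 \in \Lambda} \overline{U_{F_0}}^{w}.$$
In the next two steps we prove that $u_0$ is the desired solution of (5.3.22).
46 In other words, $\overline{U_{F_0}}^{w}$ is the least weakly closed set which contains $U_{F_0}$.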
In the next two steps we prove that u0 is the desired solution of (5.3.22).
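Step 7. Let $v \in X$. Choose $F_0 \in \Lambda$ such that $v \in F_0$, and let $F \in \Lambda_{F_0}$. If $u_F \in F$ is a solution of the equation (5.3.24), then condition (iv) implies that
$$
\begin{aligned}
0 \le \langle T(v) - T(u_F), v - u_F\rangle &= \langle T(v), v - u_F\rangle - \langle T(u_F), v - u_F\rangle\\
&= \langle T(v), v - u_F\rangle - \langle T(u_F), j_F(v - u_F)\rangle = \langle T(v), v - u_F\rangle - \langle T_F(u_F), v - u_F\rangle\\
&= \langle T(v), v - u_F\rangle.
\end{aligned}
$$
Thus
$$\langle T(v), v - u\rangle \ge 0 \quad \text{holds for arbitrary } u \in U_{F_0}. \tag{5.3.25}$$
By the definition of weak topology (see Remark 2.1.23), (5.3.25) is valid even for arbitrary $u \in \overline{U_{F_0}}^{w}$. In particular, we then have
$$\langle T(v), v - u_0\rangle \ge 0. \tag{5.3.26}$$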

Step 8 (the Minty trick). Choose $w \in X$, $t > 0$, and put $v = u_0 + tw$ in the inequality (5.3.26). Then
$$0 \le \langle T(u_0 + tw), tw\rangle = t\langle T(u_0 + tw), w\rangle, \quad\text{i.e.,}\quad 0 \le \langle T(u_0 + tw), w\rangle.$$
By passing to the limit as $t \to 0+$, we obtain (applying condition (ii), the demicontinuity of $T$) the inequality
$$\langle T(u_0), w\rangle \ge 0 \tag{5.3.27}$$
which is valid for all $w \in X$. Replacing the element $w$ in (5.3.27) by the element $-w$, we have
$$0 \le \langle T(u_0), -w\rangle = -\langle T(u_0), w\rangle, \tag{5.3.28}$$
and thus
$$\langle T(u_0), w\rangle = 0 \quad \text{for every } w \in X, \quad\text{i.e.,}\quad T(u_0) = o.$$
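Steps 2–5 can be imitated in a finite-dimensional model. Such a model only illustrates the mechanics of the proof. In the hypothetical Python sketch below, $X = \mathbb{R}^N$ carries the Euclidean scalar product (so $X^*$ is identified with $X$), $T(u) = Au + u^3 - f$ with $A = \operatorname{tridiag}(-1, 2, -1)$ and the cube taken componentwise, and $F_k = \operatorname{Lin}\{e_1, \dots, e_k\}$. Then $T_{F_k}(u) = j_{F_k}^*(T(j_{F_k} u))$ consists of the first $k$ components of $T(u_1, \dots, u_k, 0, \dots, 0)$. Since $\langle T(u), u\rangle \ge \lambda_{\min}(A)\|u\|^2 - \|f\|\,\|u\|$, the a priori bound of Step 5 reads $\|u_{F_k}\| \le r_0 = \|f\|/\lambda_{\min}(A)$ with $\lambda_{\min}(A) = 4\sin^2\frac{\pi}{2(N+1)}$. The finite-dimensional equations $T_{F_k}(u) = o$ are solved by Newton's method.

```python
import math


def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal system."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = sup[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        den = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / den if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / den
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def T_F(u, f):
    """First k components of T(u, 0, ..., 0) = A(u, 0, ...) + (u, 0, ...)^3 - f, k = len(u)."""
    k = len(u)
    out = []
    for i in range(k):
        Au = 2.0 * u[i] - (u[i - 1] if i > 0 else 0.0) - (u[i + 1] if i < k - 1 else 0.0)
        out.append(Au + u[i] ** 3 - f[i])
    return out


def galerkin_solve(k, f, tol=1e-13, max_iter=100):
    """Newton's method for T_F(u) = o in F_k = Lin{e_1, ..., e_k}."""
    u = [0.0] * k
    for _ in range(max_iter):
        r = T_F(u, f)
        diag = [2.0 + 3.0 * ui ** 2 for ui in u]  # Jacobian A_k + 3 diag(u^2)
        off = [-1.0] * k
        du = solve_tridiagonal(off, diag, off, [-ri for ri in r])
        u = [ui + di for ui, di in zip(u, du)]
        if max(abs(di) for di in du) < tol:
            break
    return u


N = 30
f = [0.1] * N
lam_min = 4.0 * math.sin(math.pi / (2 * (N + 1))) ** 2
r0 = math.sqrt(sum(v * v for v in f)) / lam_min  # a priori bound of Step 5
for k in (1, 5, 10, 20, N):
    u = galerkin_solve(k, f)
    print(k, math.sqrt(sum(v * v for v in u)), max(abs(v) for v in T_F(u, f)))
print("r0 =", r0)
```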


Example 5.3.24. Let us consider the boundary value problem
$$
\begin{cases}
-\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) + g(x(t)) = f(t), & t \in (0,1),\\
x(0) = x(1) = 0,
\end{cases}
\tag{5.3.29}
$$
where $p > 1$, $f \in L^p(0,1)$ and $g$ is as in Example 5.3.11 (continuous and increasing). Put $X := W_0^{1,p}(0,1)$ with the norm
$$\|x\| = \Big(\int_0^1 |\dot x(t)|^p\,dt\Big)^{\frac1p}$$
and define operators $J, G: X \to X^*$ and an element $f^* \in X^*$ as in Example 5.2.51. Set
$$T = J + G.$$
Then the operator equation
$$T(x) = f^* \tag{5.3.30}$$

is equivalent to the requirement that the integral identity


 1
 1
 1
p2
|x(t)|

x(t)
y(t)
dt +
g(x(t))y(t) dt =
f (t)y(t) dt
0

(5.3.31)

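holds for all $y \in X$. So, as in Example 5.2.51, to find a weak solution $x$ of (5.3.29) (i.e., $x$ satisfying (5.3.31)) is equivalent to finding a solution of (5.3.30). We have, by the Hölder inequality,
$$\begin{aligned}
\|J(x_n) - J(x)\|_{X^*} &= \sup_{\|y\| \le 1} |\langle J(x_n) - J(x), y\rangle|\\
&= \sup_{\|y\| \le 1} \left|\int_0^1 \left(|\dot x_n(t)|^{p-2}\dot x_n(t) - |\dot x(t)|^{p-2}\dot x(t)\right)\dot y(t)\,dt\right|\\
&\le \sup_{\|y\| \le 1} \left(\int_0^1 \left||\dot x_n(t)|^{p-2}\dot x_n(t) - |\dot x(t)|^{p-2}\dot x(t)\right|^{p'} dt\right)^{\frac{1}{p'}} \left(\int_0^1 |\dot y(t)|^p\,dt\right)^{\frac1p}\\
&\le \left(\int_0^1 \left||\dot x_n(t)|^{p-2}\dot x_n(t) - |\dot x(t)|^{p-2}\dot x(t)\right|^{p'} dt\right)^{\frac{1}{p'}}.
\end{aligned} \tag{5.3.32}$$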

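The last integral tends to zero as $\|x_n - x\| \to 0$ due to the continuity of the Nemytskii operator $\Phi(x)(t) = \varphi(x(t))$ from $L^p(0,1)$ into $L^{p'}(0,1)$ with $\varphi(s) = |s|^{p-2}s$, $s \neq 0$, $\varphi(0) = 0$, $p > 1$ (see Theorem 3.2.24).
Using the Hölder inequality we also have
$$\begin{aligned}
\|G(x_n) - G(x)\|_{X^*} &= \sup_{\|y\| \le 1} |\langle G(x_n) - G(x), y\rangle| = \sup_{\|y\| \le 1} \left|\int_0^1 [g(x_n(t)) - g(x(t))]\,y(t)\,dt\right|\\
&\le \sup_{\|y\| \le 1} \left(\int_0^1 |g(x_n(t)) - g(x(t))|^{p'} dt\right)^{\frac{1}{p'}} \left(\int_0^1 |y(t)|^p\,dt\right)^{\frac1p}\\
&\le \left(\int_0^1 |g(x_n(t)) - g(x(t))|^{p'} dt\right)^{\frac{1}{p'}} \to 0
\end{aligned} \tag{5.3.33}$$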


Chapter 5. Topological and Monotonicity Methods

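as $\|x_n - x\| \to 0$ (cf. Example 5.3.11 and the continuous embedding $W_0^{1,p}(0,1) \hookrightarrow C[0,1]$ for $p > 1$). It follows from (5.3.32) and (5.3.33) that $T$ is continuous and hence demicontinuous. The boundedness of $T$ follows from estimates similar to (5.3.32), (5.3.33) (the reader is invited to do it in detail!). We also have
$$\begin{aligned}
\langle T(x) - T(y), x - y\rangle &= \int_0^1 \left[|\dot x(t)|^{p-2}\dot x(t) - |\dot y(t)|^{p-2}\dot y(t)\right](\dot x(t) - \dot y(t))\,dt\\
&\quad + \int_0^1 [g(x(t)) - g(y(t))](x(t) - y(t))\,dt\\
&\ge \int_0^1 |\dot x(t)|^p\,dt - \int_0^1 |\dot x(t)|^{p-2}\dot x(t)\dot y(t)\,dt - \int_0^1 |\dot y(t)|^{p-2}\dot y(t)\dot x(t)\,dt + \int_0^1 |\dot y(t)|^p\,dt\\
&\ge \int_0^1 |\dot x(t)|^p\,dt - \left(\int_0^1 |\dot x(t)|^p\,dt\right)^{\frac{1}{p'}}\left(\int_0^1 |\dot y(t)|^p\,dt\right)^{\frac1p}\\
&\quad - \left(\int_0^1 |\dot y(t)|^p\,dt\right)^{\frac{1}{p'}}\left(\int_0^1 |\dot x(t)|^p\,dt\right)^{\frac1p} + \int_0^1 |\dot y(t)|^p\,dt\\
&= \left[\|x\|^{p-1} - \|y\|^{p-1}\right]\left[\|x\| - \|y\|\right] \ge 0,
\end{aligned}$$
with strict inequality for $x \neq y$, since $s \mapsto |s|^{p-1}$ is a strictly increasing function on $(0,\infty)$. Hence the monotonicity of $T$ follows. Finally,
$$\begin{aligned}
\langle T(x), x\rangle &= \int_0^1 |\dot x(t)|^p\,dt + \int_0^1 g(x(t))x(t)\,dt\\
&= \|x\|^p + \int_0^1 [g(x(t)) - g(0)](x(t) - 0)\,dt + \int_0^1 g(0)x(t)\,dt \ge \|x\|^p - |g(0)|\,\|x\|,\ {}^{47}
\end{aligned}$$
i.e., $T$ is coercive. It follows then from Theorem 5.3.22 that there is a unique solution of (5.3.30) (which in turn is a unique weak solution of (5.3.29)).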
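The two inequalities used above, monotonicity of $T$ and the coercivity estimate $\langle T(x), x\rangle \ge \|x\|^p - |g(0)|\,\|x\|_{L^1(0,1)}$, can be observed on a finite-difference analogue of (5.3.31). The following Python sketch is only an illustration under assumptions of our own: a uniform grid, $p = 3$, the hypothetical function $g(s) = s^3 + 1$ (continuous and increasing), and Riemann sums of forward differences in place of the integrals. It is not part of the proof of Theorem 5.3.22.

```python
import numpy as np

def phi(s, p):
    """phi(s) = |s|^(p-2) s, extended by phi(0) = 0 (written as sign(s)|s|^(p-1))."""
    return np.sign(s) * np.abs(s) ** (p - 1)

def g(s):
    """Hypothetical continuous increasing g; here g(0) = 1."""
    return s ** 3 + 1.0

def deriv(x, h):
    """Forward differences of the grid function (0, x_1, ..., x_{n-1}, 0)."""
    return np.diff(np.concatenate(([0.0], x, [0.0]))) / h

def pairing(x, y, p, h):
    """Riemann-sum analogue of <T(x), y> = <J(x) + G(x), y> from (5.3.31)."""
    return (h * np.sum(phi(deriv(x, h), p) * deriv(y, h))
            + h * np.sum(g(x) * y))

def norm(x, p, h):
    """Discrete analogue of ||x|| = (int_0^1 |x'(t)|^p dt)^(1/p)."""
    return (h * np.sum(np.abs(deriv(x, h)) ** p)) ** (1.0 / p)

n, p = 50, 3.0
h = 1.0 / n
rng = np.random.default_rng(0)
x, y = rng.normal(size=n - 1), rng.normal(size=n - 1)

# <T(x) - T(y), x - y>, expected to be nonnegative (monotonicity)
mono = pairing(x, x - y, p, h) - pairing(y, x - y, p, h)
# <T(x), x> - (||x||^p - |g(0)| * ||x||_{L^1}), expected to be nonnegative
gap = pairing(x, x, p, h) - (norm(x, p, h) ** p
                             - abs(g(0.0)) * h * np.sum(np.abs(x)))
print(mono, gap)
```

Termwise, both quantities are sums of nonnegative expressions, because $\varphi$ and $g$ are increasing; this is the discrete counterpart of the estimates in the text.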
The advantage of the Browder Theorem is more transparent in the case of partial differential equations, where the embedding $W_0^{1,p}(\Omega) \hookrightarrow C(\overline\Omega)$ does not hold in general, and so only the demicontinuity of $T$ can be proved.
An application of the more general Theorem 5.3.23 is postponed to the last chapter, Appendix 7.6A.
Exercise 5.3.25. Prove that the unique weak solution $x = x(t)$ of (5.3.29) belongs to $C^1[0,1]$, that $|\dot x|^{p-2}\dot x$ is absolutely continuous and that the equation
$$-\left(|\dot x|^{p-2}\dot x\right)^{\cdot} + g(x(t)) = f(t)$$
holds a.e. in $(0,1)$.
Hint. Integrating by parts in (5.3.31), we obtain that
$$\int_0^1 \left[|\dot x(t)|^{p-2}\dot x(t) - \int_0^t \big(g(x(\tau)) - f(\tau)\big)\,d\tau\right]\dot y(t)\,dt = 0 \tag{5.3.34}$$
for every $y \in \mathcal D(0,1)$.

47 We have used $\|x\|_{L^1(0,1)} \le \|x\|_{W_0^{1,p}(0,1)}$. Prove it!


Set
$$M(t) = |\dot x(t)|^{p-2}\dot x(t) - \int_0^t \big(g(x(\tau)) - f(\tau)\big)\,d\tau.$$
It follows from Lemma 6.1.9 that
$$M(t) = c \quad \text{a.e. in} \quad (0,1). \tag{5.3.35}$$
The assertion now follows from (5.3.35) as in the proof of Theorem 6.1.14.
Exercise 5.3.26. Prove the following assertion:
Let $T$ be a continuous mapping defined on a Banach space $X$ of finite dimension with values in $X^*$. Assume that there exists a real function $c = c(r)$, defined on the interval $(0,\infty)$, such that $\lim_{r\to\infty} c(r) = \infty$ and that $\langle T(u), u\rangle \ge c(\|u\|)\|u\|$ holds for all $u \in X$. Then $T(X) = X^*$, i.e., the equation
$$T(u) = f^*$$
has at least one solution in the space $X$ for arbitrary $f^* \in X^*$.
Hint. Let $f^* \in X^*$. In the case when $X = X^* = \mathbb R^N$ and $\langle u, v\rangle = (u, v)_{\mathbb R^N}$ is the scalar product in $\mathbb R^N$, the operator $F : \mathbb R^N \to \mathbb R^N$ defined by the relation
$$F(u) = T(u) - f^*$$
satisfies the assumption
$$(F(u), u)_{\mathbb R^N} > 0 \quad \text{for} \quad u \in \partial B(o;R) \tag{5.3.36}$$
with $R > 0$ large enough. Apply the homotopy invariance property of the Brouwer degree and show that (5.3.36) implies that there exists $u_0 \in B(o;R)$ such that
$$F(u_0) = o, \quad \text{i.e.,} \quad T(u_0) = f^*.$$
In the general case Remark 1.1.12(i) must be employed.


Exercise 5.3.27. Consider the problem
$$\begin{cases} -\left(|\dot x(t)|^{p-2}\dot x(t)\right)^{\cdot} = h(t, x(t), \dot x(t)), & t \in (0,1),\\ x(0) = x(1) = 0, \end{cases} \tag{5.3.37}$$
where $p > 1$. Formulate conditions on $h = h(t, x, s)$ which guarantee the existence of a weak solution of (5.3.37).
Hint. Apply Theorem 5.3.22.
Exercise 5.3.28. How do the conditions on $h$ change if we replace the homogeneous Dirichlet boundary conditions in (5.3.37) by the Neumann ones?


5.4 Supersolutions, Subsolutions, Monotone Iterations


In this section we deal with another possibility of extending the notion of a monotone function to operators between Banach spaces of infinite dimension. Instead of characterizing an increasing function $f : \mathbb R \to \mathbb R$ in terms of the inequality
$$(f(x) - f(y))(x - y) \ge 0 \quad \text{for any} \quad x, y \in \mathbb R$$
(cf. Section 5.3), we use the usual first-semester calculus definition
$$\text{for any } x, y \in \mathbb R \text{ satisfying } x \le y \text{ we have } f(x) \le f(y). \tag{5.4.1}$$
However, to generalize the implication (5.4.1) to the case of general operators, we have to introduce an inequality relation for Banach spaces which can be used analogously to the inequality relation for the set of real numbers.
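Definition 5.4.1. Let $X$ be a real Banach space and let $K$ be a subset of $X$. Then $K$ is called an order cone if
(1) $K$ is closed, nonempty, and $K \neq \{o\}$;
(2) $a, b \in \mathbb R$, $a, b \ge 0$, $x, y \in K$ implies $ax + by \in K$;
(3) $x \in K$ and $-x \in K$ implies $x = o$.
On this basis we define
$$\begin{aligned}
x \le y &\quad \text{provided} \quad y - x \in K,\\
x < y &\quad \text{provided} \quad x \le y \text{ and } x \neq y,\\
x \ll y &\quad \text{provided} \quad y - x \in \operatorname{int} K.
\end{aligned} \tag{5.4.2}$$
The set
$$[x, y] = \{z \in X : x \le z \le y\}$$
is called an order interval in $X$. Note that
$$x \ge y \quad \text{means} \quad x - y \in K,$$
and similarly for $>$ and $\gg$.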


Remark 5.4.2. Condition (2) is equivalent to saying that $K$ is convex, and if $x \in K$ and $a \ge 0$, then $ax \in K$.
Definition 5.4.3. By an ordered Banach space we mean a real Banach space together with an order cone.
Remark 5.4.4. The reader should notice the difference between order cones and cones. A subset $C$ of the Banach space $X$ is called a cone if $x \in C$ and $a > 0$ implies $ax \in C$. So, every order cone is a cone, but the converse is not true in general.
Example 5.4.5. Let $X = \mathbb R^N$. We set
$$\mathbb R^{N,+} = \{(\xi_1, \dots, \xi_N) \in \mathbb R^N : \xi_i \ge 0 \text{ for all } i = 1, \dots, N\}.$$
Then $K = \mathbb R^{N,+}$ is an order cone (see Figure 5.4.1).


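We have
$$\begin{aligned}
(\xi_1, \dots, \xi_N) \le (\eta_1, \dots, \eta_N) &\quad \text{if and only if} \quad \xi_i \le \eta_i \text{ for all } i = 1, \dots, N,\\
(\xi_1, \dots, \xi_N) \ll (\eta_1, \dots, \eta_N) &\quad \text{if and only if} \quad \xi_i < \eta_i \text{ for all } i = 1, \dots, N.
\end{aligned}$$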
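For $K = \mathbb R^{N,+}$ the three relations in (5.4.2) reduce to componentwise comparisons, because $\operatorname{int}\mathbb R^{N,+}$ consists of the vectors with all components positive. A minimal Python sketch follows; the function names `leq`, `lt`, `ll` are our own. Note that the order is only partial: the last two checks show a pair of vectors that are not comparable.

```python
import numpy as np

def leq(x, y):
    """x <= y  iff  y - x lies in K = R^{N,+} (all components >= 0)."""
    return bool(np.all(np.asarray(y, float) - np.asarray(x, float) >= 0))

def lt(x, y):
    """x < y  iff  x <= y and x != y."""
    return leq(x, y) and not np.array_equal(x, y)

def ll(x, y):
    """x << y  iff  y - x lies in int K (all components > 0)."""
    return bool(np.all(np.asarray(y, float) - np.asarray(x, float) > 0))

print(leq([0, 1], [0, 2]), lt([0, 1], [0, 2]), ll([0, 1], [0, 2]))  # True True False
print(leq([1, 0], [0, 1]), leq([0, 1], [1, 0]))                     # False False
```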

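Example 5.4.6. The set $C$ in Figure 5.4.2 is a cone in $\mathbb R^2$ but it is not an order cone.
[Figure 5.4.1. The order cone $K = \mathbb R^{2,+}$ in $\mathbb R^2$ (axes $x$, $y$).]
[Figure 5.4.2. A cone $C$ in $\mathbb R^2$ (axes $x$, $y$) which is not an order cone.]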

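Example 5.4.7. Let $X = C(\overline\Omega)$ for a bounded set $\Omega \subset \mathbb R^N$. We set
$$C^+(\overline\Omega) = \{f \in C(\overline\Omega) : f(x) \ge 0 \text{ for every } x \in \overline\Omega\}.$$
Then $K = C^+(\overline\Omega)$ is an order cone in $X$, and we have
$$\begin{aligned}
f \le g &\quad \text{if and only if} \quad f(x) \le g(x) \text{ for all } x \in \overline\Omega,\\
f \ll g &\quad \text{if and only if} \quad f(x) < g(x) \text{ for all } x \in \overline\Omega.
\end{aligned}$$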

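The following assertion summarizes the basic properties of ordering in a Banach space $X$.
Proposition 5.4.8. For all $u, x, x_n, y, y_n, z \in X$ and all $a, b \in \mathbb R$, we have
$$x \le x, \qquad x \le y \text{ and } y \le x \text{ imply } x = y, \qquad x \le y \text{ and } y \le z \text{ imply } x \le z.$$
Furthermore, we have
$$\begin{aligned}
o \le x \le y \text{ and } 0 \le a \le b &\quad \text{imply} \quad ax \le by,\\
x \le y \text{ and } u \le z &\quad \text{imply} \quad x + u \le y + z,\\
x_n \le y_n \text{ for all } n &\quad \text{implies} \quad \lim_{n\to\infty} x_n \le \lim_{n\to\infty} y_n\ {}^{48}
\end{aligned}$$

48 The limits are understood in the norm topology of $X$.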


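provided the limits exist. For the symbol $\ll$, the following implications hold:
$$\begin{aligned}
x \ll y \text{ and } y \le z &\quad \text{imply} \quad x \ll z,\\
x \le y \text{ and } y \ll z &\quad \text{imply} \quad x \ll z,\\
x \ll y \text{ and } a > 0 &\quad \text{imply} \quad ax \ll ay.
\end{aligned}$$
Proof. Use (5.4.2) and the properties of $K$. For example, if $x_n \le y_n$ for all $n$, then $y_n - x_n \in K$. Since $K$ is closed and the limits $x = \lim_{n\to\infty} x_n$, $y = \lim_{n\to\infty} y_n$ of $\{x_n\}_{n=1}^\infty$, $\{y_n\}_{n=1}^\infty$ exist, we conclude that
$$y - x \in K, \quad \text{i.e.,} \quad x \le y.$$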

Definition 5.4.9. The order cone $K$ is called normal if there is a number $c > 0$ such that for all $x, y \in X$ with $o \le x \le y$ we have
$$\|x\| \le c\|y\|.$$
Example 5.4.10. For $X = \mathbb R^N$, $K = \mathbb R^{N,+}$ is a normal order cone in $\mathbb R^N$. Similarly, $C^+(\overline\Omega)$ is a normal order cone in $C(\overline\Omega)$.
Lemma 5.4.11. If an order cone is normal, then every order interval $[x, y]$ is bounded in the norm.
Proof. If $x \le w \le y$, then $o \le w - x \le y - x$, and hence
$$\|w\| \le \|w - x\| + \|x\| \le c\|y - x\| + \|x\|.$$

Now we can introduce the definition of a monotone increasing operator between ordered Banach spaces.
Definition 5.4.12. Let $X$ and $Y$ be ordered Banach spaces. An operator $T : \operatorname{Dom} T \subset X \to Y$ is said to be monotone increasing if
$$x < y \quad \text{implies} \quad T(x) \le T(y) \quad \text{for any} \quad x, y \in \operatorname{Dom} T.$$
An operator $T$ is said to be strictly or strongly monotone increasing if the symbol $\le$ is replaced by $<$ or $\ll$, respectively. Similarly we define (strictly, strongly) monotone decreasing operators.
The operator $T$ is said to be positive if $T(o) \ge o$ and for all $x \in \operatorname{Dom} T$,
$$x > o \quad \text{implies} \quad T(x) \ge o.$$
As above, the operator is strictly or strongly positive if the symbol $\ge$ is replaced by $>$ or $\gg$, respectively.
Example 5.4.13. Let $X = Y = \mathbb R$, $K = \mathbb R^+$. Then for a real function $f : \operatorname{Dom} f \subset \mathbb R \to \mathbb R$ the concepts of (strictly) monotone increasing (or decreasing) above coincide with the usual definitions. Because $x \ll y$ and $x < y$ are equivalent on $\mathbb R$, there is no difference here between strongly monotone increasing (decreasing) and strictly monotone increasing (decreasing).


Example 5.4.14. For a linear operator $T$, the concepts of (strictly, strongly) positive are the same as those of (strictly, strongly) monotone increasing. Indeed, let $T$ be positive, for example. Then we have the following sequence of implications:
$$x < y \implies o < y - x \implies o \le T(y - x) = T(y) - T(x) \implies T(x) \le T(y),$$
i.e., $T$ is monotone increasing. The other proofs are similar.

Let $T : X \to X$ be an operator on a Banach space $X$. We consider the operator equation
$$u = T(u) \tag{5.4.3}$$
and apply an iterative method to solve it. For this purpose we consider the iterations (successive approximations)
$$u_{n+1} = T(u_n) \quad \text{and} \quad v_{n+1} = T(v_n), \qquad n = 0, 1, 2, \dots. \tag{5.4.4}$$
We illustrate the idea of approximations in Figure 5.4.3.
[Figure 5.4.3. Fixed points $u$, $v$ of $T$: along an axis representing $X$, the iterates $u_0, u_1, u_2, \dots$ increase towards $u$ and the iterates $v_0, v_1, v_2, \dots$ decrease towards $v$.]
The next definition is a basic definition of the existence theory for operator equations in ordered Banach spaces.


Definition 5.4.15. A point $u$ is called a supersolution of (5.4.3) (or of the operator $T$) if
$$T(u) \le u.$$
The prefix "super" is replaced by "sub" when the respective inequalities are reversed. For example, $u \in X$ satisfying $u \le T(u)$ is a subsolution of (5.4.3).$^{49}$
The following assertions justify the general principle of super- and subsolutions. This principle can be formulated as follows:
If there is a super- and subsolution, then a solution can be obtained by the convergent iterative method (5.4.4).
Namely, we have the following results.
Theorem 5.4.16. Let $T : X \to X$ be a compact monotone increasing operator on a real Banach space $X$ with a normal order cone $X^+$, and let $u_0$ be a subsolution ($v_0$ a supersolution) of (5.4.3). Then $\{u_n\}_{n=1}^\infty$ ($\{v_n\}_{n=1}^\infty$) in (5.4.4) converges if and only if this sequence is bounded above (below).$^{50}$ In the case of convergence, the limit point $u$ is the smallest fixed point $u$ of $T$ with $u_0 \le u$ ($v$ is the greatest fixed point $v$ of $T$ with $v \le v_0$).$^{51}$
Proof. We will consider the case of a subsolution. The case of a supersolution is very similar. Since $T$ is monotone increasing, we have the sequence of implications
$$u_0 \le T(u_0) = u_1 \implies u_1 = T(u_0) \le T(u_1) = u_2 \implies \cdots,$$
i.e., $u_0 \le u_1 \le u_2 \le \cdots$.
Now $u_n \to u$ implies that $u_n \le u$ for all $n$ (indeed, $u_n \le u_m$ for all $m \ge n$, and we can pass to the limit $m \to \infty$ by Proposition 5.4.8). Consequently, the sequence $\{u_n\}_{n=1}^\infty$ is bounded above if it is convergent.
Conversely, the sequence $\{u_n\}_{n=1}^\infty$ is convergent if it is bounded above. Indeed, suppose $u_n \le v$ for all $n$. Since $u_0 \le u_n \le v$, Lemma 5.4.11 implies that the sequence $\{u_n\}_{n=1}^\infty$ is bounded in the norm. Since $u_n = T(u_{n-1})$ and since $T$ is compact, the set of all $u_n$ is relatively compact. Thus there exist a convergent subsequence $\{u_{n_k}\}_{k=1}^\infty$ and $u$ such that $u_{n_k} \to u$. Since the sequence $\{u_n\}_{n=1}^\infty$ is monotone, all convergent subsequences have the same limit point. Therefore, the whole sequence $\{u_n\}_{n=1}^\infty$ converges to $u$ as well. Since $u_{n+1} = T(u_n)$, letting $n \to \infty$ shows that
$$u = T(u).$$
Let $w$ be a solution of (5.4.3) with $u_0 \le w$. Then $u_1 = T(u_0) \le T(w) = w$, etc., so that $u_n \le w$ for all $n$. Hence $u \le w$.

The intuitive meaning of the following assertion is demonstrated in Figure 5.4.3.
49 The terminology is not fixed. Instead of super- and subsolution, the notions upper and lower solutions have also been used.
50 The set $M \subset X$ is bounded above if there is $m \in X$ such that $u \le m$ for any $u \in M$.
51 Note that the concepts of smallest and greatest are used in the usual sense, e.g., a smallest fixed point $u$ in $X$ is characterized by $u \le w$ for all other fixed points $w \ge u_0$.


Corollary 5.4.17 (Monotone Iterative Method). Let $X$ be a real Banach space with a normal order cone and $T : X \to X$. Assume that $u_0$ and $v_0$ are a subsolution and a supersolution of (5.4.3), respectively, and $u_0 \le v_0$. If $T$ is a compact monotone increasing operator on the order interval $[u_0, v_0]$, then both iterative sequences $\{u_n\}_{n=1}^\infty$ and $\{v_n\}_{n=1}^\infty$ from (5.4.4) are defined and converge, and
$$u = \lim_{n\to\infty} u_n \quad \text{and} \quad v = \lim_{n\to\infty} v_n$$
are the smallest fixed point and the largest fixed point, respectively, of $T$ in $[u_0, v_0]$. Furthermore, we have the error estimates
$$u_n \le u \le v \le v_n \quad \text{for all} \quad n = 0, 1, \dots$$
(Figure 5.4.3).
Proof. The inequalities $u_0 \le T(u_0)$, $T(v_0) \le v_0$ and $u_0 \le v_0$ together imply that $u_0 \le u_1 = T(u_0) \le T(v_0) \le v_0$ and similarly that
$$u_n \le v_0 \quad \text{for all } n.$$
Hence $\{u_n\}_{n=1}^\infty$ is bounded above, and the proof follows from Theorem 5.4.16.
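The two iterations (5.4.4) are easy to run in $\mathbb R^3$ with the normal cone $\mathbb R^{3,+}$ (Example 5.4.10). The Python sketch below uses a hypothetical operator of our own, $T_i(x) = 1 + \tfrac12\tanh\big(\sum_j a_{ij}x_j\big)$ with $a_{ij} \ge 0$. It is continuous, hence compact on bounded sets of $\mathbb R^3$, and it is monotone increasing. Moreover, $u_0 = o$ and $v_0 = (2,2,2)$ are a sub- and a supersolution. In this particular example $T$ is also a contraction in the maximum norm, so the bracket $u_n \le u \le v \le v_n$ collapses to a single fixed point. This need not happen in general.

```python
import numpy as np

# Hypothetical data: a_ij >= 0 with row sums 1, so T is monotone increasing
A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

def T(x):
    """T_i(x) = 1 + tanh(sum_j a_ij x_j) / 2."""
    return 1.0 + 0.5 * np.tanh(A @ x)

def monotone_iterations(T, u0, v0, n_steps=60):
    """Return the lists (u_0, ..., u_n) and (v_0, ..., v_n) from (5.4.4)."""
    us, vs = [np.asarray(u0, float)], [np.asarray(v0, float)]
    for _ in range(n_steps):
        us.append(T(us[-1]))
        vs.append(T(vs[-1]))
    return us, vs

u0 = np.zeros(3)       # T(u0) = (1, 1, 1) >= u0: a subsolution
v0 = np.full(3, 2.0)   # T(v0) <= (1.5, 1.5, 1.5) <= v0: a supersolution
us, vs = monotone_iterations(T, u0, v0)

# Error bracket u_n <= v_n from Corollary 5.4.17 (small tolerance for rounding)
bracket_ok = all(np.all(un <= vn + 1e-12) for un, vn in zip(us, vs))
print(bracket_ok, np.max(vs[-1] - us[-1]))
```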

Example 5.4.18 (Integral Equation). Let $\Omega$ be a bounded domain in $\mathbb R^N$, let $G : \overline\Omega \times \overline\Omega \to \mathbb R$ be a continuous and nonnegative function, and let $f : \overline\Omega \times \mathbb R \to \mathbb R$ be a continuous function which is increasing in the second variable. Consider the integral equation
$$u(x) = \int_\Omega G(x,y)f(y, u(y))\,dy \tag{5.4.5}$$
in the space $C(\overline\Omega)$. We write this equation in the form
$$u = T(u), \qquad u \in C(\overline\Omega).$$
Let us consider the normal order cone $C^+(\overline\Omega)$ from Example 5.4.7. Then the operator $T : C(\overline\Omega) \to C(\overline\Omega)$ is compact and monotone increasing.$^{52}$
Considering subsolutions and supersolutions now means that we replace "$=$" by "$\le$" and "$\ge$", respectively, in the integral equation (5.4.5). Corollary 5.4.17 implies that if $u_0 \in C(\overline\Omega)$ is a subsolution and $v_0 \in C(\overline\Omega)$ is a supersolution with $u_0 \le v_0$ on $\overline\Omega$, then for $n \to \infty$ the iterative method
$$u_{n+1}(x) = \int_\Omega G(x,y)f(y, u_n(y))\,dy, \qquad n = 0, 1, 2, \dots,$$
converges uniformly on $\overline\Omega$ to a solution $u \in C(\overline\Omega)$ of the integral equation with $u_0 \le u \le v_0$ on $\overline\Omega$. Here $u$ is the smallest solution with this property. If, instead, the iterative method starts with $v_0$, then we obtain the greatest solution $v$ with $u_0 \le v \le v_0$. The most difficult task in solving (5.4.5) is to find at least one subsolution $u_0$ and/or supersolution.

52 The compactness of $T$ has been proved in Example 5.1.14; $f = f(y, u)$ increasing in $u$ immediately implies that $T$ is monotone increasing.
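A discretized version of the monotone iteration in Example 5.4.18 can be sketched in Python as follows. Everything specific here is an assumption made for illustration: $\Omega = (0,1)$, the kernel $G(x,y) = e^{-|x-y|}$, the nonlinearity $f(y,u) = 1 + \tfrac12\arctan u$ (increasing in $u$), and the trapezoidal rule in place of the integral. The quadrature weights and the kernel are nonnegative, so the discrete operator inherits the monotonicity of $T$. The functions $u_0 = 0$ and $v_0 \equiv 2$ are a sub- and a supersolution.

```python
import numpy as np

m = 101
x = np.linspace(0.0, 1.0, m)                  # grid on Omega = (0, 1)
w = np.full(m, 1.0 / (m - 1))
w[[0, -1]] *= 0.5                              # trapezoidal weights, all >= 0
G = np.exp(-np.abs(x[:, None] - x[None, :]))   # hypothetical kernel, G >= 0

def f(y, u):
    """Hypothetical nonlinearity, continuous and increasing in u (y is unused)."""
    return 1.0 + 0.5 * np.arctan(u)

def T(u):
    """Quadrature version of (T u)(x) = int_Omega G(x, y) f(y, u(y)) dy."""
    return G @ (w * f(x, u))

u = np.zeros(m)        # subsolution: T(0) = G @ w >= 0
v = np.full(m, 2.0)    # supersolution: T(v) <= sum(w) * f(., 2) < 2
for _ in range(60):
    u_next, v_next = T(u), T(v)
    # u_n increases, v_n decreases (up to rounding)
    assert np.all(u <= u_next + 1e-12) and np.all(v_next <= v + 1e-12)
    u, v = u_next, v_next
print(np.max(v - u))   # width of the final bracket u_n <= v_n
```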


Example 5.4.19 (Differential Equation). Let us consider the Dirichlet boundary value problem
$$\begin{cases} -\ddot x(t) = f(t, x(t)), & t \in (0,1),\\ x(0) = x(1) = 0, \end{cases} \tag{5.4.6}$$
where $f : [0,1] \times \mathbb R \to \mathbb R$ is a function continuous in the first variable and continuously differentiable in the second one. Suppose $u_0, v_0 \in C^2[0,1]$ are such that
$$\begin{gathered}
-\ddot u_0(t) \le f(t, u_0(t)), \quad t \in (0,1), \qquad -\ddot v_0(t) \ge f(t, v_0(t)), \quad t \in (0,1),\\
u_0(0) \le 0, \quad u_0(1) \le 0, \quad v_0(0) \ge 0, \quad v_0(1) \ge 0.{}^{53}
\end{gathered} \tag{5.4.7}$$
We will show that $u_0$, $v_0$ are a subsolution and a supersolution, respectively, for the operator $w = T(z)$ defined by the solution of the problem
$$\begin{cases} -\ddot w(t) + cw(t) = f(t, z(t)) + cz(t) =: F(t, z), & t \in (0,1),\\ w(0) = w(1) = 0, \end{cases}$$
where $c > 0$ is chosen in such a way that
$$\frac{\partial f}{\partial s}(t, s) + c > 0 \quad \text{for} \quad t \in [0,1] \quad \text{and} \quad s \in I_0 := \left[\min_{t\in[0,1]} u_0(t), \max_{t\in[0,1]} v_0(t)\right].$$
Notice that $0 \in I_0$. The map $T$ is correctly defined because the Dirichlet problem
$$\begin{cases} -\ddot w(t) + cw(t) = g(t), & t \in (0,1),\\ w(0) = w(1) = 0 \end{cases} \tag{5.4.8}$$
has a unique solution for any fixed $g \in C[0,1]$. Then $T : C[0,1] \to C[0,1]$ is a compact operator. This follows from the fact that $T$ is the composition of the Nemytskii operator
$$N : z(t) \mapsto f(t, z(t)) + cz(t),$$
which is continuous, and a compact linear operator $A^{-1}$, where
$$A(w(t)) = -\ddot w(t) + cw(t), \qquad \operatorname{Dom} A = \{w \in C^2[0,1] : w(0) = w(1) = 0\}$$
(cf. Example 2.2.17), i.e.,
$$T = A^{-1}N.$$
We will prove that $T : C[0,1] \to C[0,1]$ is a monotone increasing operator. Indeed, let $z_1, z_2 \in C[0,1]$, $z_1 \le z_2$. By definition,
$$\begin{cases} -\dfrac{d^2}{dt^2}\,T(z_i)(t) + c\,T(z_i)(t) = f(t, z_i(t)) + cz_i(t), & t \in (0,1),\\ (T(z_i))(0) = (T(z_i))(1) = 0, \end{cases} \qquad \text{for } i = 1, 2.$$
53 The functions $u_0$, $v_0$ are called a subsolution and a supersolution of the boundary value problem (5.4.6), respectively.


Putting
$$w = T(z_2) - T(z_1)$$
we get
$$\begin{cases} -\ddot w(t) + cw(t) = f(t, z_2(t)) - f(t, z_1(t)) + c(z_2(t) - z_1(t)), & t \in (0,1),\\ w(0) = w(1) = 0. \end{cases}$$
However, the function
$$F : (t, s) \mapsto f(t, s) + cs$$
is increasing in $s$ on the interval $I_0$ by the choice of $c$. Hence for $z_1 \le z_2$ with $z_1(t), z_2(t) \in I_0$ for every $t \in [0,1]$, we have
$$0 \le F(t, z_2) - F(t, z_1) = f(t, z_2(t)) - f(t, z_1(t)) + c(z_2(t) - z_1(t)) = -\ddot w(t) + cw(t).$$
Therefore
$$\begin{cases} -\ddot w(t) + cw(t) \ge 0, & t \in (0,1),\\ w(0) = w(1) = 0. \end{cases} \tag{5.4.9}$$
Assume that there is $t \in (0,1)$ such that $w(t) < 0$. Then there is $t_0 \in (0,1)$ such that
$$0 > w(t_0) = \min_{t\in[0,1]} w(t).$$
But then $\ddot w(t_0) \ge 0$, and so $-\ddot w(t_0) + cw(t_0) < 0$, a contradiction with the inequality (5.4.9). Hence $w(t) \ge 0$ in $(0,1)$, i.e., $T(z_1) \le T(z_2)$.$^{54}$
We now prove that $v_0 \ge T(v_0)$, i.e., $v_0$ is a supersolution of $T$. Set $v_1 = T(v_0)$. We get
$$\begin{cases} -\ddot v_1(t) + cv_1(t) = f(t, v_0(t)) + cv_0(t), & t \in (0,1),\\ v_1(0) = v_1(1) = 0, \end{cases}$$
therefore
$$-(\ddot v_1(t) - \ddot v_0(t)) + c(v_1(t) - v_0(t)) = f(t, v_0(t)) + cv_0(t) + \ddot v_0(t) - cv_0(t) \le 0$$
for $t \in (0,1)$. The same argument as above yields that $v_1(t) \le v_0(t)$, $t \in [0,1]$. Analogously we prove that $u_0 \le T(u_0)$, i.e., $u_0$ is a subsolution of $T$. If, moreover, $u_0 \le v_0$, then Corollary 5.4.17 can be used (cf. Exercise 5.4.22).
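The construction $T = A^{-1}N$ of Example 5.4.19 has a direct finite-difference analogue, sketched below in Python. The data are hypothetical. We take $f(t,s) = 1 + \sin s$, so that $\partial f/\partial s \ge -1$ and $c = 2$ is admissible. The functions $u_0 = 0$ and $v_0(t) = t(1-t)$ satisfy (5.4.7) because $-\ddot v_0 = 2 \ge f$. The matrix `L` plays the role of $A$. It is a tridiagonal M-matrix, so its inverse is entrywise positive, which is the discrete counterpart of the maximum principle argument used above.

```python
import numpy as np

n, c = 49, 2.0                      # interior grid points and the shift c
h = 1.0 / (n + 1)
t = np.linspace(0.0, 1.0, n + 2)[1:-1]

# L w approximates A w = -w'' + c w with w(0) = w(1) = 0
L = (np.diag(np.full(n, 2.0 / h**2 + c))
     + np.diag(np.full(n - 1, -1.0 / h**2), 1)
     + np.diag(np.full(n - 1, -1.0 / h**2), -1))

def f(t, s):
    """Hypothetical right-hand side with df/ds = cos(s) >= -1 > -c."""
    return 1.0 + np.sin(s)

def T(z):
    """Discrete T = A^{-1} N with N(z) = f(t, z) + c z."""
    return np.linalg.solve(L, f(t, z) + c * z)

u0 = np.zeros(n)           # -u0'' = 0 <= 1 = f(t, 0): subsolution
v0 = t * (1.0 - t)         # -v0'' = 2 >= f(t, v0): supersolution, u0 <= v0
u, v = u0.copy(), v0.copy()
for _ in range(100):
    u, v = T(u), T(v)
print(np.max(v - u), np.max(np.abs(L @ u - c * u - f(t, u))))
```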
Exercise 5.4.20. Let $T : \mathbb R^N \to \mathbb R^N$. Then the equation $x = T(x)$, $x \in \mathbb R^N$, in (5.4.3) describes a system of nonlinear equations
$$x_i = T_i(x_1, \dots, x_N), \qquad i = 1, \dots, N.$$
Consider the order cone $\mathbb R^{N,+}$ from Example 5.4.5. Translate all the assumptions and conclusions of Theorem 5.4.16 and Corollary 5.4.17 to this system.
54 The argument used to prove $w(t) \ge 0$ in $(0,1)$ is a special version of the more general Maximum Principle (see, e.g., Protter & Weinberger [102]). The monotonicity of $T$ can also be shown by proving that the Green function corresponding to the operator $A$ (Example 2.2.17) is nonnegative.


Exercise 5.4.21. Formulate conditions on a nonlinear function $f : \overline\Omega \times \mathbb R \to \mathbb R$ which guarantee that there exist a subsolution $u_0 \in C(\overline\Omega)$ and a supersolution $v_0 \in C(\overline\Omega)$ of the operator $T$ from Example 5.4.18 such that $u_0 \le v_0$ on $\overline\Omega$. Then formulate the corresponding existence result for the integral equation (5.4.5).
Exercise 5.4.22. Formulate conditions on a function $f : [0,1] \times \mathbb R \to \mathbb R$ which guarantee that there exist a subsolution $u_0 \in C^2[0,1]$ and a supersolution $v_0 \in C^2[0,1]$ of the operator $T$ from Example 5.4.19 such that $u_0 \le v_0$ in $[0,1]$. Then formulate the corresponding existence result for the boundary value problem (5.4.6).
Exercise 5.4.23. Replace in (5.4.6) the homogeneous Dirichlet boundary conditions by the Neumann ones. Modify the definitions of a subsolution and a supersolution in such a way that Corollary 5.4.17 can be applied. Formulate conditions on $f = f(t, x)$ which guarantee the existence of a solution of the corresponding Neumann problem.
Hint. Use Corollary 5.4.17.
Exercise 5.4.24. Consider the Dirichlet boundary value problem
$$\begin{cases} -\ddot x(t) = h(t, x(t), \dot x(t)), & t \in (0,1),\\ x(0) = x(1) = 0. \end{cases} \tag{5.4.10}$$
Formulate conditions on $h = h(t, x, s)$ which guarantee the existence of a solution of (5.4.10).
Hint. Use Corollary 5.4.17.
Exercise 5.4.25. How do the conditions on $h$ change if we replace the homogeneous Dirichlet boundary conditions in (5.4.10) by the Neumann ones?

5.4A Minorant Principle and Krein–Rutman Theorem


In this appendix we study the eigenvalue problem
$$T(x) = \lambda x, \tag{5.4.11}$$
and the corresponding inhomogeneous equation
$$\lambda x - T(x) = y, \qquad y > o, \tag{5.4.12}$$
on a real Banach space $X$ with an order cone $K := X^+$.
Definition 5.4.26. By a positive solution $(x, \lambda)$ of (5.4.11), we mean a solution of $T(x) = \lambda x$ with $x > o$ and $\lambda > 0$. If we replace "$=$" by "$\ge$", then we speak about a positive subsolution.
Although we present mainly statements about linear problems, the following results play a central role in the investigation of nonlinear problems, for example, in bifurcation theory, variational principles, etc. The essential tools for investigating (5.4.11) are the Minorant Principle and the Separation Theorem for convex sets (see Corollary 2.1.18).
Set
$$K_r = \{x \in K : \|x\| \le r\} \quad \text{for a fixed, given} \quad r > 0.$$


The key is to find a suitable minorant $M$ for $T$, so that
$$T(x) \ge M(x) \quad \text{for all} \quad x \in K_r, \tag{5.4.13}$$
and which satisfies appropriate conditions. Furthermore, it is important to know a subsolution $x_0$, i.e.,
$$M(x_0) \ge cx_0, \qquad c > 0, \quad x_0 > o. \tag{5.4.14}$$
The general Minorant Principle:
If we know a subsolution of (5.4.11), then we can obtain a positive eigenvalue with a positive eigenvector of (5.4.11),
is formulated precisely in the following two theorems.
Theorem 5.4.27 (Krasnoselskii). Suppose that
(i) $X$ is a real Banach space with an order cone $K$;
(ii) an operator $T : K_r \subset X \to X$ is compact and (5.4.13) holds;
(iii) a linear operator $M : K \to X$ is positive, and there are an $x_0 > o$ and a positive real number $c$ such that (5.4.14) holds.
Then for every $\rho$ with $0 < \rho \le r$ the problem (5.4.11) has a positive solution $(x, \lambda)$ satisfying
$$\|x\| = \rho.$$
Theorem 5.4.28 (Zeidler). Let us set
$$\gamma(x) = \sup\{t \ge 0 : x \ge tx_0\} \quad \text{for fixed} \quad x_0 > o \quad \text{and all} \quad x \in K.$$
The conclusion of Theorem 5.4.27 still holds if we replace (iii) by the following condition:
(iii′) suppose that $M : K \subset X \to K$ is an operator, not necessarily linear, for which there is an $x_0 > o$ and there are real numbers $s$ with $0 < s \le 1$ and $c > 0$ such that
$$M(x) \ge (\gamma(x))^s c x_0 \quad \text{for all} \quad x \in K_r. \tag{5.4.15}$$
Theorem 5.4.27 is a special case of Theorem 5.4.28. Indeed, since $x \ge \gamma(x)x_0$ for $x \in K_r$, we have
$$M(x) \ge \gamma(x)M(x_0) \ge \gamma(x)cx_0 \quad \text{for all} \quad x \in K_r.$$
Thus, (5.4.14) implies (5.4.15) with $s = 1$.


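Proof of Theorem 5.4.28. We will use a regularization method and the Schauder Fixed Point Theorem (Theorem 5.1.11).
Let us first solve an auxiliary problem
$$\lambda_n x_n = T_n(x_n), \quad \lambda_n > 0, \quad x_n > o, \quad \|x_n\| = \rho, \quad \text{where} \quad T_n(x) := T(x) + \frac{x_0}{n}, \quad n = 1, 2, \dots, \quad 0 < \rho \le r. \tag{5.4.16}$$
Let $n$ be fixed. Set
$$z(x) = \rho\,\frac{x}{\|x\|} \quad \text{for} \quad x \neq o \quad \text{and} \quad z(o) = o.$$
For $x \in K_\rho$ we set
$$S(x) = \|x\|\,T_n(z(x)) + \frac{(\rho - \|x\|)x_0}{n}.$$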


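Then $S$ is compact on $K_\rho$ (explain why!), and by (5.4.13), (5.4.16)
$$S(x) \ge \frac{\|x\|x_0}{n} + \frac{(\rho - \|x\|)x_0}{n} = \frac{\rho x_0}{n} > o \quad \text{for all} \quad x \in K_\rho.$$
So there is an $a_n > 0$ such that
$$\|S(x)\| \ge a_n \quad \text{for all} \quad x \in K_\rho$$
(otherwise $S(x_k) \to o$ for some $x_k \in K_\rho$, and since $S(x_k) - \rho x_0/n \in K$ and $K$ is closed, we would get $-x_0 \in K$, contradicting $x_0 > o$). It follows from the boundedness of $S(K_\rho)$ that there exists a number $b_n > 0$ such that
$$0 < a_n \le \|S(x)\| \le b_n \quad \text{for all} \quad x \in K_\rho. \tag{5.4.17}$$
By (5.4.17),
$$V(x) = \rho\,\frac{S(x)}{\|S(x)\|}$$
is well defined on $K_\rho$. Furthermore, the operator $V : K_\rho \to K_\rho$ is compact on the closed, bounded, and convex set $K_\rho$ (why?). By Theorem 5.1.11 (the Schauder Fixed Point Theorem) there is an $x_n \in K_\rho$ such that
$$x_n = V(x_n).$$
In particular, $\|x_n\| = \|V(x_n)\| = \rho$, so $z(x_n) = x_n$. Therefore $x_n = \rho^2\,\dfrac{T_n(x_n)}{\|S(x_n)\|}$, which means that $x_n$ is a solution of (5.4.16) with $\lambda_n = \dfrac{\|S(x_n)\|}{\rho^2}$.
Before we pass to the limit for $n \to \infty$, we estimate the value of $\lambda_n$. Namely, we will show that there exist numbers $a, b > 0$ such that
$$0 < a \le \lambda_n \le b \quad \text{for all} \quad n \in \mathbb N. \tag{5.4.18}$$
It follows from (5.4.16) that
$$\lambda_n \rho \le \|T(x_n)\| + \|x_0\|, \quad \text{so} \quad b := \sup_{n\in\mathbb N} \lambda_n < \infty.$$
On the other hand, $x_n \ge \gamma(x_n)x_0$ implies that
$$\bar\gamma := \sup_{n\in\mathbb N} \gamma(x_n) < \infty.$$
Indeed, otherwise there would be a subsequence, again denoted by $\{x_n\}_{n=1}^\infty$, with $\gamma(x_n) \to \infty$ as $n \to \infty$. Then
$$x_0 \le \frac{x_n}{\gamma(x_n)} \to o \quad \text{as} \quad n \to \infty,$$
contradicting $o < x_0$. Now (5.4.16), (5.4.13) and (5.4.15) imply that
$$\lambda_n x_n = T(x_n) + \frac{x_0}{n} \ge M(x_n) + \frac{x_0}{n} \ge \frac{x_0}{n}.$$
Therefore, $\gamma(x_n) > 0$, and furthermore
$$\lambda_n x_n \ge M(x_n) \ge (\gamma(x_n))^s c x_0,$$
i.e., the definition of $\gamma(x_n)$ implies
$$\gamma(x_n) \ge \frac{(\gamma(x_n))^s c}{\lambda_n}, \quad \text{so} \quad \lambda_n \ge (\gamma(x_n))^{s-1}c \ge \bar\gamma^{\,s-1}c =: a.$$
This proves (5.4.18).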


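Now, we pass to the limit $n \to \infty$ in (5.4.16). Using (5.4.18) and $\|x_n\| = \rho$, we can find convergent subsequences, again denoted by $\{\lambda_n\}_{n=1}^\infty$ and $\{T(x_n)\}_{n=1}^\infty$, with $\lambda_n \to \lambda$ and $T(x_n) \to y$ strongly in $X$. By (5.4.18), $\lambda > 0$. Then we also have strong convergence in $X$ for
$$x_n = \frac{1}{\lambda_n}\left(T(x_n) + \frac{x_0}{n}\right) \to x.$$
Hence
$$\lambda x = T(x) \quad \text{and} \quad x \in K, \quad \|x\| = \rho.$$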

Example 5.4.29. We will consider the nonlinear system of equations
$$\lambda\xi_i = f_i(\xi_1, \dots, \xi_N), \qquad i = 1, \dots, N, \tag{5.4.19}$$
with $x = (\xi_1, \dots, \xi_N)$ and $x \in K := \mathbb R^{N,+}$, $\|x\| = \rho$, $\lambda > 0$. The following assertion (the Generalized Perron Theorem) is a consequence of Theorem 5.4.27:
Suppose that $f_i : K \to (0,\infty)$ is continuous for $i = 1, \dots, N$ and that there is a fixed $r > 0$ for which
$$f_i(\xi_1, \dots, \xi_N) \ge \sum_{j=1}^N \alpha_{ij}\xi_j \quad \text{holds for all} \quad x \in K \quad \text{with} \quad \|x\| \le r, \tag{5.4.20}$$
and $i = 1, \dots, N$. Assume that all the real numbers $\alpha_{ij}$ are nonnegative, and that
$$\min_{1 \le i \le N} \sum_{j=1}^N \alpha_{ij} > 0.$$
Then (5.4.19) has a positive solution for every $\rho$ with $0 < \rho \le r$.
Indeed, we can write (5.4.19) as $\lambda x = T(x)$ and apply Theorem 5.4.27 with $X = \mathbb R^N$, $X^+ = \mathbb R^{N,+}$, $x_0 = (1, \dots, 1)$, and
$$M(x) := (\eta_1, \dots, \eta_N), \quad \text{where} \quad \eta_i = \sum_{j=1}^N \alpha_{ij}\xi_j.$$
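In the finite-dimensional setting of Example 5.4.29, hypothesis (iii) of Theorem 5.4.27 can be verified mechanically. With $x_0 = (1,\dots,1)$ one has $M(x_0) = \big(\sum_j \alpha_{ij}\big)_i \ge c\,x_0$ for $c = \min_i \sum_j \alpha_{ij}$. The Python check below uses a hypothetical coefficient matrix of our own choosing.

```python
import numpy as np

# Hypothetical nonnegative coefficients alpha_ij in (5.4.20)
alpha = np.array([[0.0, 1.0, 0.5],
                  [0.2, 0.0, 0.0],
                  [0.0, 0.3, 0.1]])

def M(x):
    """Linear minorant M(x) = (sum_j alpha_ij xi_j)_i; positive because alpha >= 0."""
    return alpha @ x

x0 = np.ones(alpha.shape[0])
c = alpha.sum(axis=1).min()          # c = min_i sum_j alpha_ij
print(c > 0, np.all(M(x0) >= c * x0))
```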

Example 5.4.30. We will consider the nonlinear integral equation
$$\lambda x(t) = \int_a^b A(t,s)f(s, x(s))\,ds \tag{5.4.21}$$
on a finite interval $[a,b]$ with $\lambda > 0$. This time, for fixed $\alpha > 0$ and $r > 0$, the key condition (substituting (5.4.20)) is
$$f(s, x) \ge \alpha x \quad \text{for all} \quad (s, x) \in [a,b] \times [0,r]. \tag{5.4.22}$$
Applying Theorem 5.4.27 we have the following assertion (the Generalized Jentzsch Theorem):
Suppose $A : [a,b] \times [a,b] \to \mathbb R$ is continuous, nonnegative, and
$$\min_{t\in[a,b]} \int_a^b A(t,s)\,ds > 0.$$
Let $f : [a,b] \times \mathbb R \to \mathbb R$ be continuous and let (5.4.22) be satisfied. Then for every $\rho$ with $0 < \rho \le r$, (5.4.21) has a positive solution $x \in C^+[a,b]$ with $\|x\| = \rho$.


Indeed, we write (5.4.21) as $\lambda x = T(x)$ and apply Theorem 5.4.27 with $X = C[a,b]$, $X^+ = C^+[a,b]$, $x_0(t) \equiv 1$ and
$$M(x)(t) := \alpha\int_a^b A(t,s)x(s)\,ds.$$
Proposition 5.4.31. Let $T : X \to X$ be a compact linear positive operator on a real ordered Banach space $X$. Then there exists a positive solution of (5.4.11) if and only if (5.4.11) has a positive subsolution.
Proof. This assertion is an immediate consequence of the Minorant Principle (Theorem 5.4.27 with $M = T$).

Our goal is to sharpen this result. Let $T : X \to X$ be a linear operator and let $r(T)$ denote the spectral radius of the complexification of $T$.$^{55}$
We call $\lambda$ a simple eigenvalue of $T$ if its multiplicity $m(\lambda)$ is equal to 1.$^{56}$ Recall that this means
$$\dim \operatorname{Ker}(\lambda I - T) = 1 \quad \text{and} \quad \operatorname{Ker}(\lambda I - T)^2 = \operatorname{Ker}(\lambda I - T).$$
Let $K^* := X^{*,+}$ denote the set of all positive functionals $x^* \in X^*$, i.e.,
$$\langle x^*, x\rangle \ge 0 \quad \text{for all} \quad x \in K := X^+.$$
We write $x^* \ge o$ if $x^*$ is positive. Furthermore, $x^* > o$ means
$$x^* \ge o \quad \text{and} \quad \langle x^*, x\rangle > 0 \quad \text{for a certain} \quad x \in K.$$
We call $x^*$ strictly positive if
$$x > o \quad \text{always implies} \quad \langle x^*, x\rangle > 0.$$
A cone $K := X^+ \subset X$ is called total if $\operatorname{Lin}(K)$ is dense in $X$. Then $K$ being total implies that $K^*$ is an order cone (cf. Exercise 5.4.43). In this case we call $K^*$ the dual order cone of $K$. In particular, $K$ is total if
$$\operatorname{int} K \neq \emptyset$$
(cf. Exercise 5.4.41). For $X = \mathbb R^N$, $K = \mathbb R^{N,+}$ we have $X^* = X$, $K^* = K$ (explain why!).
Proposition 5.4.32 (Krein–Rutman). Let $X$ be a real Banach space with a total order cone $K$. Suppose that $T : X \to X$ is linear, compact, and positive, with $r(T) > 0$. Then $r(T)$ is an eigenvalue of both $T$ and $T^*$ with eigenvectors in $K$ and $K^*$, respectively.
If $T$ is strongly positive, we get a sharper version of the previous assertion.

55 If $X$ is a real Banach space, then by the complexification of $T$ we mean the operator $\tilde T : X_{\mathbb C} \to X_{\mathbb C}$ defined by $\tilde T(x + iy) = T(x) + iT(y)$, $x, y \in X$, where $X_{\mathbb C}$ is the complexification of $X$ in the sense of Example 1.1.6(iii).
56 The significance of simple eigenvalues, roughly speaking, is that their behavior is very stable under perturbations of the operator (cf. Example 4.2.4 and, in more detail, Kato [73]). For this reason, simple eigenvalues also play a special role in bifurcation theory.


Theorem 5.4.33 (Krein–Rutman). Let $X$ be a real Banach space with an order cone $K$ having nonempty interior. Then any linear, compact, and strongly positive operator $T : X \to X$ has the following properties:
(i) $T$ has exactly one eigenvector with $x > o$ and $\|x\| = 1$. The corresponding eigenvalue is $r(T)$ and it is algebraically simple. Furthermore, $x \gg o$.
(ii) If $\lambda \in \sigma(T)$, $\lambda \neq r(T)$, then $|\lambda| < r(T)$.
(iii) The dual operator $T^*$ has $r(T)$ as an algebraically simple eigenvalue with a strictly positive eigenvector $x^*$.
Remark 5.4.34. Recall that by the Riesz–Schauder theory (see Theorem 2.2.9), the spectrum of $T$ consists of at most countably many nonzero eigenvalues of finite multiplicity which can accumulate only at the origin, and $0 \in \sigma(T)$ whenever $\dim X = \infty$. The spectra of $T$ and $T^*$ coincide ($X$ is a real space).
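In $\mathbb R^N$ with $K = \mathbb R^{N,+}$, a matrix with all entries positive is a strongly positive operator. In this case Theorem 5.4.33 reduces to the classical Perron theorem, and its assertions can be observed numerically with the normalized iteration $x_{k+1} = Tx_k/\|Tx_k\|$ started at $x_0 \gg o$. This power iteration is a standard numerical technique, not the method of proof used in the text. The matrix below is a hypothetical example, and the dual operator $T^*$ is represented by the transpose.

```python
import numpy as np

def power_iteration(T, n_steps=500):
    """Normalized iterates x_{k+1} = T x_k / ||T x_k|| from x_0 = (1, ..., 1) >> o."""
    x = np.ones(T.shape[0])
    for _ in range(n_steps):
        y = T @ x
        x = y / np.linalg.norm(y)
    return np.linalg.norm(T @ x), x    # approximations of r(T) and of a unit eigenvector

# Hypothetical strongly positive operator on R^3: all entries positive
T = np.array([[1.0, 2.0, 1.0],
              [0.5, 1.0, 3.0],
              [2.0, 1.0, 1.0]])
r, e = power_iteration(T)
r_dual, e_dual = power_iteration(T.T)   # the dual operator T* is the transpose
print(r, e, r_dual, e_dual)
```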
Now, we will give proofs of Proposition 5.4.32 and Theorem 5.4.33.
Proof of Proposition 5.4.32. Let us consider $T$ on the complexification $X_{\mathbb C} = X + iX$. By the Riesz–Schauder theory (see Theorem 2.2.9), all of the nonzero points of the spectrum of $T$ are eigenvalues of finite multiplicity. The same holds for $T^*$. Note that
$$\sigma(T) \cap \{\lambda : |\lambda| = r(T)\} \neq \emptyset.$$
We consider the eigenvalues $\lambda$ of $T$ satisfying $|\lambda| = r(T)$, and distinguish three cases.
Case 1 ($\lambda_0 = r(T)$ is an eigenvalue). Our goal is to construct an $x > o$ and an $x^* > o$ such that
$$T(x) = \lambda_0 x \quad \text{and} \quad T^*(x^*) = \lambda_0 x^*.$$
From footnote 3 on page 57 we have
$$(\lambda I - T)^{-1}u = \sum_{j=0}^\infty \frac{T^j u}{\lambda^{j+1}} \quad \text{for} \quad \lambda > r(T),$$
and, therefore,
$$(\lambda I - T)^{-1}u \ge o \quad \text{for} \quad \lambda > r(T) \quad \text{and} \quad u \ge o.$$
Since $T$ is compact, $\lambda_0 \neq 0$ is an eigenvalue of finite multiplicity (Remark 2.2.10) and in the Laurent series
$$(\lambda I - T)^{-1} = \sum_{n=-\infty}^{+\infty} (\lambda - \lambda_0)^n A_n \tag{5.4.23}$$
there is an index $k$ such that
$$A_{-n} = O \quad \text{for all} \quad n > k \quad \text{and} \quad A_{-k} \neq O$$
(Proposition 3.1.15). So, there is $u > o$ such that
$$x := A_{-k}u \neq o$$
(otherwise $A_{-k} = O$ since $K$ is total). It follows from Proposition 3.1.15 that
$$Tx = \lambda_0 x.$$


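Moreover, by (5.4.23) and its proof (cf. page 114)
$$x = A_{-k}u = \lim_{\lambda\to\lambda_0+} (\lambda - \lambda_0)^k(\lambda I - T)^{-1}u \ge o, \quad \text{i.e.,} \quad x > o.$$
Let us construct the element $x^*$. By the previous step, $A_{-k}(K) \subset K$. We choose a $u^* \in K^*$ with $\langle u^*, x\rangle > 0$. This is possible by Exercise 5.4.42. We set $x^* = A_{-k}^* u^*$. Then $v \ge o$ implies
$$\langle x^*, v\rangle = \langle u^*, A_{-k}v\rangle \ge 0 \quad \text{and} \quad \langle x^*, u\rangle = \langle u^*, x\rangle > 0.$$
Thus $x^* > o$. Passing to the dual operator in (5.4.23), we obtain
$$\lambda_0 x^* = T^*(x^*)$$
analogously as above.
Case 2 (there is an eigenvalue $\lambda_0 \in \mathbb C$ of $T$ with $|\lambda_0| = r(T)$ and $\lambda_0^n > 0$ for an $n \in \mathbb N$, i.e., $\operatorname{Arg}\lambda_0$ is a rational multiple of $2\pi$).$^{57}$ Now $T^n$ has a positive eigenvalue $\lambda_0^n$ which lies on the spectral circle of $T^n$, so by Case 1 there exists a $u > o$ with
$$T^n(u) = \lambda_0^n u.$$
If we set
$$x = |\lambda_0|^{n-1}u + |\lambda_0|^{n-2}T(u) + \dots + T^{n-1}(u),$$
then $x > o$ and $T(x) = |\lambda_0|x$. Analogously we construct an $x^*$ for $T^*$.
Case 3 (none of the eigenvalues of $T$ with $|\lambda| = r(T)$ has the property from Case 2). We show that this is impossible. So, let $\lambda_0$ be an eigenvalue of $T$ with $|\lambda_0| = r(T)$ and with the greatest possible real part. We set
$$T_\varepsilon = T + \varepsilon T^2 \quad \text{for} \quad \varepsilon > 0.$$
By the Spectral Mapping Theorem (see Proposition 3.1.14(v)), all eigenvalues of $T_\varepsilon$ are of the form $\lambda + \varepsilon\lambda^2$, where $\lambda$ is an eigenvalue of $T$. One can check that $\lambda_1^\varepsilon = \lambda_0 + \varepsilon\lambda_0^2$ and $\overline{\lambda_1^\varepsilon}$ are the eigenvalues of $T_\varepsilon$ of greatest absolute value (the reader is asked to justify it!). There is a sequence $\{\varepsilon_k\}_{k=1}^\infty$, $\varepsilon_k \to 0$, such that $\operatorname{Arg}\lambda_1^{\varepsilon_k}$ is a rational multiple of $2\pi$, where $\lambda_1^{\varepsilon_k} = \lambda_0 + \varepsilon_k\lambda_0^2$ (explain why!). According to Case 2, there is $n \in \mathbb N$ such that $(\lambda_1^{\varepsilon_k})^n > 0$. Since
$$\lim_{k\to\infty} (\lambda_1^{\varepsilon_k})^n = \lambda_0^n > 0,$$
we get a contradiction.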
Before we prove Theorem 5.4.33 we need the following geometrical result.

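Lemma 5.4.35. Let $X$ be a real Banach space with an order cone $K := X^+$ containing an interior point. Let $u \gg o$. Then for every $v \notin K$ there is a uniquely determined number $\sigma_u(v) > 0$ such that
(i) $0 \le \sigma \le \sigma_u(v)$ implies $u + \sigma v \ge o$;
(ii) $\sigma > \sigma_u(v)$ implies $u + \sigma v \notin K$.
In particular,
$$u + \sigma v \gg o \quad \text{and} \quad \sigma > 0 \quad \text{imply} \quad \sigma < \sigma_u(v). \tag{5.4.24}$$

57 Any complex number $\lambda \neq 0$ can be written in the form $\lambda = |\lambda|e^{i\operatorname{Arg}\lambda}$.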


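Proof. Consider the ray
$$\{u + \sigma v : \sigma \ge 0\}.$$
For small $\sigma \ge 0$ we have $u + \sigma v \in \operatorname{int} K$, and for large $\sigma \ge 0$ we have $u + \sigma v \notin K$. Otherwise $u + nv \in K$ for large $n \in \mathbb N$, and hence $n^{-1}u + v \in K$. Passing to the limit for $n \to \infty$, we obtain the contradiction $v \in K$. Set
$$\sigma_u(v) := \sup\{\sigma > 0 : u + \sigma v \in \operatorname{int} K\}.$$
It is easy to show that $\sigma_u(v)$ has the desired properties.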
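For $X = \mathbb R^N$ and $K = \mathbb R^{N,+}$, the number $\sigma_u(v)$ of Lemma 5.4.35 is explicit. If $u \gg o$ and $v \notin K$, then $u + \sigma v \ge o$ holds exactly when $\sigma \le u_i/(-v_i)$ for every index $i$ with $v_i < 0$, so $\sigma_u(v) = \min_{v_i < 0} u_i/(-v_i)$. A short Python sketch follows; the function name `sigma` is our own.

```python
import numpy as np

def sigma(u, v):
    """sigma_u(v) of Lemma 5.4.35 for K = R^{N,+}; requires u >> o and v not in K."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    if not (np.all(u > 0) and np.any(v < 0)):
        raise ValueError("need u >> o and v outside K")
    neg = v < 0
    # u + s v >= o holds exactly when s <= u_i / (-v_i) for every i with v_i < 0
    return float(np.min(u[neg] / -v[neg]))

print(sigma([1.0, 2.0], [1.0, -1.0]))   # 2.0
```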


Proof of Theorem 5.4.33. We proceed in six steps.
Step 1 (existence of a positive solution). We choose an $x > o$. Since $T$ is strongly positive, $T(x) \gg o$, so $T(x) \in \operatorname{int} K$. Thus $T(x) - \varepsilon x \in K$ for small $\varepsilon > 0$, so $T(x) \ge \varepsilon x$. By Proposition 5.4.31, there exists a positive solution $(e, \lambda_0)$:
$$T(e) = \lambda_0 e \quad \text{with} \quad e > o \quad \text{and} \quad \lambda_0 > 0.$$
Since $T(e) \gg o$, we also have $e \gg o$.


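Step 2. We show:
If $T(x) = \lambda x$, $x > o$ and $\lambda \in \mathbb{R}$, then $x = \alpha e$ for a positive $\alpha$ and $\lambda = \lambda_0$.
To begin with, $T(x) \gg o$, so $\lambda > 0$ and $x \gg o$. We consider two identities
$$T(e - \alpha x) = \lambda_0\big(e - \alpha\lambda\lambda_0^{-1}x\big), \tag{5.4.25}$$
$$T(x - \beta e) = \lambda\big(x - \beta\lambda_0\lambda^{-1}e\big), \tag{5.4.26}$$
and choose
$$\alpha = \mu_e(-x) \quad\text{and}\quad \beta = \mu_x(-e).$$
Then $x = \beta e$. Otherwise, $x - \beta e > o$. This implies $T(x - \beta e) \gg o$, and hence $\lambda^{-1}\lambda_0 < 1$ by (5.4.26) and (5.4.24). On the other hand, $e - \alpha x \ge o$ immediately implies $T(e - \alpha x) \ge o$, and (5.4.25) yields $\lambda\lambda_0^{-1} \le 1$, which contradicts $\lambda^{-1}\lambda_0 < 1$. Consequently $x$ is a positive multiple of $e$, and $\lambda = \lambda_0$.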
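Step 3. We show:
If $T(x) = \lambda x$ and $x \neq o$, $\lambda \in \mathbb{R}\setminus\{0\}$, as well as $x \neq \alpha e$ for all $\alpha \in \mathbb{R}$, then $|\lambda| < \lambda_0$.
By Proposition 5.4.32, $\lambda_0 = r(T)$ then follows, and, with respect to Step 2,
$$\dim\operatorname{Ker}(\lambda_0 I - T) = 1.$$
By Step 2, $\pm x \notin K$. We consider
$$T(e - \alpha x) = \lambda_0\big(e - \alpha\lambda\lambda_0^{-1}x\big) \tag{5.4.27}$$
and set $\alpha = \mu_e(-x)$. Since $e - \alpha x \neq o$, we have $e - \alpha x > o$, so $T(e - \alpha x) \gg o$. Then (5.4.27) and (5.4.24) immediately imply $\lambda_0^{-1}|\lambda| < 1$.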


Chapter 5. Topological and Monotonicity Methods

Step 4. We now consider the complexification $X_{\mathbb{C}} = X + iX$ and $T_{\mathbb{C}} : X_{\mathbb{C}} \to X_{\mathbb{C}}$ (see footnote 55 on page 342). In this step we show:
If $\lambda$ is a complex eigenvalue of $T_{\mathbb{C}}$, then $|\lambda| < \lambda_0$.
Let $\lambda = \alpha + i\beta$, $\alpha, \beta \in \mathbb{R}$, be an eigenvalue of $T_{\mathbb{C}}$ and $z = x + iy$, $x, y \in X$, the corresponding eigenvector, i.e., according to the definition of $T_{\mathbb{C}}$, we have
$$T_{\mathbb{C}}(x + iy) = (\alpha + i\beta)(x + iy),$$
which is equivalent to
$$T(x) = \alpha x - \beta y, \qquad T(y) = \beta x + \alpha y. \tag{5.4.28}$$
Our goal is to show that (5.4.28) implies
$$|\lambda| = \big(\alpha^2 + \beta^2\big)^{1/2} < \lambda_0.$$
The reader is invited to prove that if $\lambda$ is not real and (5.4.28) holds, then $x$ and $y$ are linearly independent elements of $X$ (cf. Remark 1.1.35(ii)). In particular, $x \neq o$ and $y \neq o$.
Let $P$ be the two-dimensional plane in $X$ which consists of the elements $\xi x + \eta y$, $\xi, \eta \in \mathbb{R}$. Then $P$ is an invariant subspace of the operator $T$, i.e.,
$$T(P) \subset P.$$
Let $\tilde T$ be the restriction of $T$ onto $P$. Since also $T(K) \subset K$, the cone $\tilde K = K \cap P$ is invariant with respect to $\tilde T$, i.e.,
$$\tilde T(\tilde K) \subset \tilde K.$$
We want to prove that
$$\tilde K = \{o\}. \tag{5.4.29}$$
Assume the opposite; then $\tilde K$ is an order cone in $P$ and $\tilde T : P \to P$ is strongly positive since $T$ is strongly positive. According to Step 1, there exists a positive eigenvector $\tilde e \in P$ of $\tilde T$ (and hence also of $T$) such that $\tilde e \in \tilde K$. According to Step 2 we necessarily have $\tilde e = \gamma e$ for a certain $\gamma \neq 0$, $\gamma \in \mathbb{R}$. But this fact combined with (5.4.28) implies that $x$ and $y$ are linearly dependent, which is a contradiction, i.e., (5.4.29) is proved.
It now follows from (5.4.29) that no elements $\xi x + \eta y$ with $|\xi| + |\eta| > 0$ belong to $K$. In particular, $x \notin K$. Since $\operatorname{int} K \neq \emptyset$ implies that $K$ is total, there exist nonzero elements $x^+, x^- \in \operatorname{int} K$ such that $x = x^+ - x^-$.58
There exists $\kappa > 0$ such that
$$T(x^-) \le \kappa e.$$
Indeed, since $e \in \operatorname{int} K$, we find $\kappa > 0$ large enough to satisfy $e - \kappa^{-1}T(x^-) \in K$.

58 Indeed, if $v_0 \in \operatorname{int} K$, $v_0 \neq o$ and $\varepsilon > 0$ is small enough, then $u = v_0 + \varepsilon x \in \operatorname{int} K$, $u \neq o$. Hence $x = \frac{u}{\varepsilon} - \frac{v_0}{\varepsilon}$.


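So, we have
$$T(x) = T(x^+) - T(x^-) \ge -T(x^-) \ge -\kappa e,$$
i.e.,
$$e + \frac{1}{\kappa}T(x) \in K.$$
It follows from (5.4.28) that $\zeta = e + \frac{1}{\kappa}T(x)$ can be written in the form
$$\zeta = \xi x + \eta y + e, \qquad |\xi| + |\eta| > 0. \tag{5.4.30}$$
Let $A$ be the set of all elements of the form (5.4.30) which belong to $K$. We have just shown that $A \neq \emptyset$. Let us consider the continuous function of two variables $f : A \to \mathbb{R}$ which with every $\zeta \in A$ associates the number $\xi^2 + \eta^2$. Since $x \notin K$ and $y \notin K$, the function $f$ must be bounded. It follows from the Extreme Value Theorem ($K$ is closed) that there is
$$\zeta_0 = \xi_0 x + \eta_0 y + e \in A \quad\text{such that}\quad f(\zeta_0) = \max_{\zeta\in A} f(\zeta) =: M.$$
It follows from the strong positivity of $T$ that there exists $\varepsilon > 0$ such that
$$T(\zeta_0) \ge \varepsilon e.$$
Indeed, $\zeta_0 \in K$, $\zeta_0 \neq o$, implies $T(\zeta_0) \in \operatorname{int} K$. We can then find $\varepsilon > 0$ small enough to satisfy
$$T(\zeta_0) - \varepsilon e \in K.$$
Let us assume without loss of generality that $\varepsilon\lambda_0^{-1} < 1$. Let us rewrite $T(\zeta_0) \ge \varepsilon e$ as
$$(\lambda_0 - \varepsilon)e + (\xi_1 x + \eta_1 y) \ge o, \tag{5.4.31}$$
where
$$\xi_1 x + \eta_1 y = T(\xi_0 x + \eta_0 y).$$
Using (5.4.28), we have
$$T(\xi_0 x + \eta_0 y) = (\alpha\xi_0 + \beta\eta_0)x + (\alpha\eta_0 - \beta\xi_0)y$$
and hence
$$\xi_1 = \alpha\xi_0 + \beta\eta_0, \qquad \eta_1 = \alpha\eta_0 - \beta\xi_0. \tag{5.4.32}$$
Then
$$\xi_1^2 + \eta_1^2 = (\xi_0^2 + \eta_0^2)(\alpha^2 + \beta^2) = M|\lambda|^2.$$
It follows from (5.4.31) that
$$\zeta_1 = e + \frac{\xi_1}{\lambda_0 - \varepsilon}\,x + \frac{\eta_1}{\lambda_0 - \varepsilon}\,y$$
is an element of the form (5.4.30) which belongs to $K$, i.e., $\zeta_1 \in A$. Hence
$$M \ge \Big(\frac{\xi_1}{\lambda_0 - \varepsilon}\Big)^2 + \Big(\frac{\eta_1}{\lambda_0 - \varepsilon}\Big)^2 = \frac{M|\lambda|^2}{(\lambda_0 - \varepsilon)^2},$$
which implies $|\lambda| \le \lambda_0 - \varepsilon < \lambda_0$.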


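Step 5. We show that $\lambda_0$ is simple. Since $\dim\operatorname{Ker}(\lambda_0 I - T) = 1$ (Step 3), it is enough to prove
$$\operatorname{Ker}(\lambda_0 I - T)^2 = \operatorname{Ker}(\lambda_0 I - T).$$
Let
$$(\lambda_0 I - T)^2(x) = o.$$
Then $(\lambda_0 I - T)x \in \operatorname{Ker}(\lambda_0 I - T)$, and by Step 3 this implies
$$(\lambda_0 I - T)x = \gamma\lambda_0 e \quad\text{for some}\quad \gamma \in \mathbb{R}.$$
We want to show that $\gamma = 0$. Suppose $\gamma \neq 0$. We may assume that $\gamma > 0$, for otherwise we pass to $-x$. Set $\mu_0 = \lambda_0^{-1}$. Now $x = \mu_0 T(x + \gamma e)$ implies
$$x + \gamma e = \mu_0 T(x + 2\gamma e) \quad\text{and}\quad x = \mu_0^2 T^2(x + 2\gamma e).$$
It follows by induction that $x = \mu_0^n T^n(x + n\gamma e)$, so
$$\frac{x}{n} = \mu_0^n T^n\Big(\gamma e + \frac{x}{n}\Big) \quad\text{for all}\quad n \in \mathbb{N}. \tag{5.4.33}$$
Since $e \in \operatorname{int} K$, we have $\gamma e + \frac{x}{n} \ge o$ for large $n$. By (5.4.33) and the positivity of $T$, we have $\frac{x}{n} \ge o$. Furthermore, from (5.4.33) and $\mu_0^n T^n(e) = e$ we immediately conclude
$$\gamma e - \frac{x}{n} = -\mu_0^n T^n\Big(\frac{x}{n}\Big) \le o.$$
Passing to the limit for $n \to \infty$ we get $\gamma e \le o$, so $\gamma \le 0$, contradicting $\gamma > 0$.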
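Step 6 (examination of $T^*$). By Proposition 5.4.32 there exists $e^* > o$ such that
$$T^*(e^*) = \lambda_0 e^*.$$
We show that
$$\langle e^*, x\rangle > 0 \quad\text{provided}\quad x > o, \tag{5.4.34}$$
i.e., $e^*$ is strictly positive. Indeed, let $x > o$. Then $T(x) \gg o$ and by Exercise 5.4.44, $\langle e^*, T(x)\rangle > 0$. So
$$\lambda_0\langle e^*, x\rangle = \langle T^*(e^*), x\rangle = \langle e^*, T(x)\rangle > 0.$$
According to the Riesz–Schauder Theory (see Theorem 2.2.9),
$$\dim\operatorname{Ker}(\lambda_0 I - T) = \dim\operatorname{Ker}(\lambda_0 I - T^*),$$
which is equal to 1 by Steps 2 and 3. To prove that $\lambda_0$ is an algebraically simple eigenvalue of $T^*$, choose $x^* \in \operatorname{Ker}(\lambda_0 I - T^*)^2$. Let $y^* = \lambda_0 x^* - T^*x^*$. Then $y^* = \gamma e^*$ for a $\gamma \in \mathbb{R}$. For any $x > o$ we have
$$\gamma\langle e^*, x\rangle = \langle y^*, x\rangle = \langle x^*, \lambda_0 x - Tx\rangle.$$
In particular, taking $x = e$ we obtain $\gamma = 0$, i.e., $y^* = o$ and $x^* \in \operatorname{Ker}(\lambda_0 I - T^*)$. This proves
$$\operatorname{Ker}(\lambda_0 I - T^*)^2 = \operatorname{Ker}(\lambda_0 I - T^*).$$
This completes the proof of Theorem 5.4.33.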

The authors want to point out that another proof of the Krein–Rutman Theorem can be found in, e.g., Takáč [126].


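Corollary 5.4.36. Let $X$ and $T$ be as in Theorem 5.4.33. For every $y > o$, (5.4.12) has exactly one solution $x > o$ if $\lambda > r(T)$, and no such solution if $\lambda \le r(T)$. Moreover,
$$\lambda x - T(x) = \mu y \quad\text{and}\quad x > o,\ y > o \quad\text{imply}\quad \operatorname{sgn}(\mu) = \operatorname{sgn}(\lambda - r(T)).$$
Here $\lambda$ and $\mu$ are real numbers.
Proof. The resolvent $R_\lambda$ exists for $\lambda > r(T)$ and thus the equation
$$\lambda x - T(x) = y \tag{5.4.35}$$
has a unique solution for any $y \in X$. Since $R_\lambda : K \to K$ by the proof of Proposition 5.4.32, $y > o$ implies $x > o$. On the other hand, if $\lambda \le r(T)$ and there is a positive solution $x$ of (5.4.35) for $y > o$, then choosing $e^* \in X^*$ as in Step 6 of the proof of Theorem 5.4.33 we arrive at
$$0 \ge (\lambda - r(T))\langle e^*, x\rangle = \langle e^*, \lambda x - T(x)\rangle = \langle e^*, y\rangle > 0,$$
a contradiction. Finally, let $x > o$, $y > o$ and
$$\lambda x - T(x) = \mu y \quad\text{for a certain}\quad \mu \in \mathbb{R}.$$
Then
$$(\lambda - r(T))\langle e^*, x\rangle = \langle e^*, \lambda x - T(x)\rangle = \mu\langle e^*, y\rangle,$$
i.e., since $\langle e^*, x\rangle > 0$ and $\langle e^*, y\rangle > 0$ by (5.4.34),
$$\operatorname{sgn}(\lambda - r(T)) = \operatorname{sgn}\mu.$$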
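In $\mathbb{R}^N$ with an entrywise positive matrix $T$ (the setting of Example 5.4.38), Corollary 5.4.36 can be observed directly. The sketch below uses an arbitrarily chosen $2\times 2$ matrix with $r(T) = 3$ and an arbitrary right-hand side $y > o$ (both are assumptions made only for illustration). It solves $\lambda x - T(x) = y$ for one $\lambda$ above and one below $r(T)$ and reports whether $x > o$.

```python
import numpy as np

def solve_and_check(T, lam, y):
    """Solve lam*x - T x = y and report whether x > o, i.e. x >= 0 and x != 0."""
    T = np.asarray(T, dtype=float)
    x = np.linalg.solve(lam * np.eye(T.shape[0]) - T, np.asarray(y, dtype=float))
    return x, bool(np.all(x >= 0) and np.any(x > 0))

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # positive entries, eigenvalues 3 and 1
r = float(max(abs(np.linalg.eigvals(T))))
y = np.array([1.0, 0.0])                # y > o

for lam in (4.0, 2.5):                  # lam > r(T) and lam < r(T)
    x, positive = solve_and_check(T, lam, y)
    print(lam, x, positive)
```

For $\lambda = 4$ the solution is positive, while for $\lambda = 2.5 < r(T)$ it is not, as the corollary predicts.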

Corollary 5.4.37 (Comparison Principle). Let $X$ and $T$ be as in Theorem 5.4.33. If $S : X \to X$ is a compact linear operator with
$$S(x) \ge T(x) \quad\text{for all}\quad x \ge o,$$
then
$$r(S) \ge r(T).$$
If $S(x) > T(x)$ for all $x > o$, then $r(S) > r(T)$.
Proof. Let
$$S(x) \ge T(x) \quad\text{for all}\quad x \ge o.$$
Choose $e > o$ such that $T(e) = r(T)e$. Then
$$S(e) \ge T(e) = r(T)e.$$
By Proposition 5.4.31, $r(S) \in \sigma(S)$ and therefore $r(S) \ge r(T)$.
In order to prove the second part of the statement we choose $x > o$ with $S(x) = r(S)x$ (see Proposition 5.4.32). We now set
$$A := S - T$$
and choose $e^*$ as in Step 6 of the proof of Theorem 5.4.33. Then
$$\begin{aligned} r(S)\langle e^*, x\rangle &= \langle e^*, A(x)\rangle + \langle e^*, T(x)\rangle = \langle e^*, A(x)\rangle + \langle T^*(e^*), x\rangle\\ &= \langle e^*, A(x)\rangle + r(T)\langle e^*, x\rangle. \end{aligned}$$
By (5.4.34), we have $\langle e^*, x\rangle > 0$ and also $\langle e^*, A(x)\rangle > 0$, and thus
$$r(S) > r(T).$$
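For matrices, $S(x) \ge T(x)$ for all $x \ge o$ simply means $S \ge T$ entrywise, so the comparison principle can be tested numerically. The sketch below is only an experiment (the random matrices, their size and the seed are arbitrary assumptions): it checks $r(S) \ge r(T)$ on random samples and shows the strict increase when every entry of $T$ is raised.

```python
import numpy as np

def spectral_radius(M):
    """Largest modulus of the eigenvalues of the square matrix M."""
    return float(max(abs(np.linalg.eigvals(np.asarray(M, dtype=float)))))

def comparison_holds(trials=200, n=4, seed=0):
    """Check r(T + E) >= r(T) for random T > 0 and E >= 0 (entrywise)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        T = rng.uniform(0.1, 1.0, size=(n, n))   # strictly positive entries
        E = rng.uniform(0.0, 1.0, size=(n, n))   # nonnegative perturbation
        if spectral_radius(T + E) < spectral_radius(T) - 1e-12:
            return False
    return True

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])
S = T + 0.1          # every entry larger, hence S x > T x for every x > o
print(spectral_radius(T), spectral_radius(S), comparison_holds())
```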


Example 5.4.38. Let $X = \mathbb{R}^N$ and $X_+ = \mathbb{R}^{N,+}$. Further, let $T$ be a real $(N \times N)$ matrix of positive elements only. Then $T : X \to X$ is linear, compact, and strongly positive. The conclusions of Theorem 5.4.33 coincide with those of the classical Perron Theorem.
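For a matrix with positive entries, the conclusions of Theorem 5.4.33 can be observed with the power method, a standard numerical technique that is not part of the proof above. Iterating $x \mapsto T(x)/\|T(x)\|_1$ from a vector in $\operatorname{int} K$ converges to the positive eigenvector belonging to $r(T)$, because every other eigenvalue has strictly smaller modulus. The tolerance, the iteration cap and the sample matrix are arbitrary choices of this sketch.

```python
import numpy as np

def perron_pair(T, tol=1e-12, max_iter=10_000):
    """Power iteration for an entrywise positive square matrix T.

    Returns (r, e): the dominant eigenvalue r = r(T) and its positive
    eigenvector e, normalized so that sum(e) == 1.
    """
    T = np.asarray(T, dtype=float)
    x = np.full(T.shape[0], 1.0 / T.shape[0])   # interior point of the cone, sum(x) = 1
    for _ in range(max_iter):
        y = T @ x
        r = y.sum()            # equals ||T x||_1 because x > 0 and sum(x) = 1
        y /= r
        if np.max(np.abs(y - x)) < tol:
            return float(r), y
        x = y
    raise RuntimeError("power iteration did not converge")

T = np.array([[1.0, 2.0],
              [3.0, 4.0]])
r, e = perron_pair(T)
others = sorted(abs(np.linalg.eigvals(T)))[:-1]   # moduli of the remaining eigenvalues
print(r, e, others)
```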
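Example 5.4.39. Let $\Omega$ be a bounded domain in $\mathbb{R}^N$. We set $X = C(\overline\Omega)$, $X_+ = C_+(\overline\Omega)$ (cf. Example 5.4.7) and consider the integral equation
$$\lambda x(t) = \int_\Omega A(t, s)x(s)\,ds \quad\text{for all}\quad t \in \overline\Omega, \tag{5.4.36}$$
with a positive continuous kernel $A : \overline\Omega \times \overline\Omega \to \mathbb{R}$. If we write (5.4.36) in the form
$$\lambda x = T(x), \qquad x \in X,$$
then Theorem 5.4.33 is the classical Jentzsch Theorem.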

In the next example we use some facts from the forthcoming Chapter 7. The reader
who is not acquainted with the properties of the Laplace operator can skip this example
or consider the one-dimensional case and replace the Laplace operator by the second
derivative.
Example 5.4.40. Let us consider the eigenvalue problem for the Laplace operator subject to the homogeneous Dirichlet boundary conditions
$$-\Delta u(x) = \mu u(x) \quad\text{in}\quad \Omega, \qquad u(x) = 0 \quad\text{on}\quad \partial\Omega, \tag{5.4.37}$$
where $\Omega$ is a bounded domain in $\mathbb{R}^N$ and $\partial\Omega$ is its boundary (cf. Remark 7.2.2). Then (5.4.37) can be written in the form (5.4.36) with $\lambda = \frac{1}{\mu}$, where $A = A(t, s)$ is the Green function associated with the Laplace equation with the homogeneous Dirichlet boundary conditions. Since $A$ is a positive continuous function $A : \overline\Omega \times \overline\Omega \to \mathbb{R}$ (see, e.g., Gilbarg & Trudinger [59]), we can apply the result of Example 5.4.39.
Multiplying the equation in (5.4.37) by $u = u(x)$ ($u$ is a real function) and using the Green Formula (cf. footnote 7 on page 479), we find
$$\int_\Omega |\nabla u(x)|^2\,dx = \mu\int_\Omega u^2(x)\,dx,$$
which shows that (5.4.37) has only positive real eigenvalues. It then follows from Example 5.4.39 (and hence from the Krein–Rutman Theorem) that (5.4.37) has the least eigenvalue $\mu_1 > 0$ which is simple and which is the only eigenvalue of (5.4.37) having a positive eigenfunction $\varphi_1(x) > 0$, $x \in \Omega$.
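A one-dimensional finite-difference sketch reproduces this picture. The interval $\Omega = (0, 1)$, the number of grid points and the three-point discretization are assumptions made only for illustration. The discrete Dirichlet Laplacian has an entrywise positive inverse (a discrete Green function), its least eigenvalue approximates $\mu_1 = \pi^2$, it is simple, and the corresponding eigenvector has one sign.

```python
import numpy as np

def dirichlet_laplacian_1d(n):
    """Three-point matrix for -u'' on (0, 1) with u(0) = u(1) = 0 and n interior nodes."""
    h = 1.0 / (n + 1)
    main = 2.0 * np.ones(n)
    off = -np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2

A = dirichlet_laplacian_1d(200)
green = np.linalg.inv(A)            # discrete analogue of the Green function
mus, vecs = np.linalg.eigh(A)       # eigenvalues in ascending order
mu1, phi1 = mus[0], vecs[:, 0]
phi1 = phi1 * np.sign(phi1.sum())   # fix the sign of the eigenvector
print(mu1, np.pi**2, mus[1])
```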
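Exercise 5.4.41. Show that if $\operatorname{int} K \neq \emptyset$, then $K$ is a total cone, and construct an example of a cone which is not total.
Hint. If $y \in \operatorname{int} K$, then $y \pm \varepsilon x \in K$ for every $x \in X$ with $\varepsilon > 0$ sufficiently small. Thus $X = K - K$ because
$$x = \frac{(y + \varepsilon x) - (y - \varepsilon x)}{2\varepsilon}.$$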

5.4B. Supersolutions, Subsolutions and Topological Degree


Exercise 5.4.42. Show that for every $x \in K \setminus \{o\}$ there exists an $x^* \in K^*$ such that $\langle x^*, x\rangle > 0$.
Hint. Since $-x \notin K$ (because $K \cap (-K) = \{o\}$) and $K$ is closed, $-x$ is an exterior point of $K$. Consequently, there is an open convex neighborhood $U$ of $-x$ which is disjoint from $K$. By the Separation Theorem for convex sets,59 there is an $x^* \in X^*$ with $x^*(K) \ge 0$ and $x^*(U) < 0$. Hence $\langle x^*, x\rangle > 0$.
Exercise 5.4.43. Show that if $K$ is total, then $K^*$ is an order cone on $X^*$.
Hint. $K \neq \{o\}$ implies $K^* \neq \{o\}$ by Exercise 5.4.42. Suppose $\pm x^* \in K^*$. We have to show that $x^* = o$. Indeed, $\langle x^*, x\rangle = 0$ for all $x \in K$ implies $\langle x^*, x\rangle = 0$ for all $x \in X$, because $K$ is total. Hence $x^* = o$.
Exercise 5.4.44. Let $x^* \in X^*$. Show that if $x^* > o$ (i.e., $x^* \ge o$ and $\langle x^*, y\rangle > 0$ for a $y > o$), then $\langle x^*, x\rangle > 0$ for all $x \in \operatorname{int} K$.
Hint. Suppose $\langle x^*, x\rangle = 0$ for an $x \in \operatorname{int} K$. Then $x - \varepsilon y \in K$ for sufficiently small $\varepsilon > 0$. Hence $\langle x^*, x - \varepsilon y\rangle \ge 0$, so $\langle x^*, y\rangle \le 0$. This is a contradiction.
Exercise 5.4.45. Prove that the functional $v \mapsto \mu_u(v)$ from Lemma 5.4.35 is continuous.
Exercise 5.4.46. Apply the Krein–Rutman Theorem to the problems in Examples 2.1.32 and 2.2.17.

5.4B Supersolutions, Subsolutions and Topological Degree


In this appendix we show the connection between the supersolution and subsolution on the one hand and the topological degree on the other. We consider the quasilinear boundary value problem
$$\begin{cases} -\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) = f(t, x(t)), & t \in (0, 1),\\ x(0) = x(1) = 0, \end{cases} \tag{5.4.38}$$
as a model example. A special case of it was studied in Examples 5.2.51 and 5.3.24. However, in this appendix we work in different function spaces. Here $p > 1$ is a real number and $f : [0, 1] \times \mathbb{R} \to \mathbb{R}$ is a function whose properties will be specified later.
By a solution of (5.4.38) we understand a function $x \in C^1[0, 1]$ with $x(0) = x(1) = 0$ such that $|\dot x|^{p-2}\dot x$ is absolutely continuous and the equation in (5.4.38) holds a.e. in $(0, 1)$. Clearly, the problem (5.4.38) formally coincides with (5.4.6) if $p = 2$.
Definition 5.4.47. A function $u_0 \in C^1[0, 1]$ with $|\dot u_0|^{p-2}\dot u_0$ absolutely continuous is called a subsolution of (5.4.38) if
$$u_0(0) \le 0, \qquad u_0(1) \le 0$$
and
$$-\frac{d}{dt}\big(|\dot u_0(t)|^{p-2}\dot u_0(t)\big) \le f(t, u_0(t)) \quad\text{for a.e.}\quad t \in (0, 1).$$
In an analogous way we define a supersolution $v_0$ of (5.4.38).
We write $x \ll y$ if and only if
$$x(t) < y(t), \qquad t \in (0, 1),$$
and either $x(0) < y(0)$, or $x(0) = y(0)$ and $\dot x(0) < \dot y(0)$, and the analogous alternatives hold at $1$ (there, $x(1) = y(1)$ requires $\dot x(1) > \dot y(1)$).60

59 This is a minor supplement of Corollary 2.1.18.


Definition 5.4.48. A subsolution $u_0$ of (5.4.38) is said to be strict if every possible solution $x$ of (5.4.38) such that $u_0 \le x$ on $[0, 1]$ satisfies $u_0 \ll x$. In an analogous way we define a strict supersolution of (5.4.38).
Let us formulate (5.4.38) as a fixed point operator equation. Assume that for any
$$y \in C^1_0[0, 1] := \{x \in C^1[0, 1] : x(0) = x(1) = 0\}$$
we have
$$f(\cdot, y(\cdot)) \in L^\infty(0, 1).$$
We denote by $T : C^1_0[0, 1] \to C^1_0[0, 1]$ the solution operator of
$$\begin{cases} -\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) = f(t, y(t)), & t \in (0, 1),\\ x(0) = x(1) = 0, \end{cases} \tag{5.4.39}$$
i.e., for $x, y \in C^1_0[0, 1]$,
$$x = T(y)$$
if and only if the equation in (5.4.39) holds a.e. in $(0, 1)$. For any fixed $y \in C^1_0[0, 1]$ it follows by integration of (5.4.39) and the injectivity of $\varphi(s) = |s|^{p-2}s$ that the operator $T$ is well defined.
Clearly, the problem (5.4.38) has a solution $x$ if and only if
$$x = T(x),$$
i.e., $x$ is a fixed point of $T$.
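The well-posedness argument can be turned into a numerical sketch of $T$. Integrating (5.4.39) once gives $\varphi(\dot x(t)) = c - \int_0^t h(\tau)\,d\tau$ with $h = f(\cdot, y(\cdot))$ and $\varphi(s) = |s|^{p-2}s$. Since $\varphi^{-1}(s) = |s|^{p'-2}s$ is strictly increasing, the condition $x(1) = 0$ determines the constant $c$ uniquely, and it can be found by bisection. The grid, the trapezoidal rule and the helper names are assumptions of this sketch, not part of the text; the function takes the values of $h$ directly.

```python
import numpy as np

def cumulative_trapezoid(g, t):
    """G(t_i) = integral of g from t_0 to t_i (trapezoidal rule), with G(t_0) = 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(t))))

def solve_operator_T(h, t, p, iterations=100):
    """Approximate x = T(y), i.e. -(|x'|^{p-2} x')' = h on (0, 1), x(0) = x(1) = 0.

    h holds the values f(t_i, y(t_i)) on the grid t with t[0] = 0, t[-1] = 1.
    """
    q = 1.0 / (p - 1.0)                               # p' - 1 = 1/(p - 1)
    phi_inv = lambda s: np.sign(s) * np.abs(s) ** q   # inverse of s -> |s|^{p-2} s
    H = cumulative_trapezoid(h, t)                    # H(t) = int_0^t h

    def x_at_1(c):
        # x(1) when |x'|^{p-2} x' = c - H(t); nondecreasing in c
        return cumulative_trapezoid(phi_inv(c - H), t)[-1]

    lo, hi = H.min(), H.max()                         # x_at_1(lo) <= 0 <= x_at_1(hi)
    for _ in range(iterations):                       # bisection for x(1) = 0
        mid = 0.5 * (lo + hi)
        if x_at_1(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)                               # c = phi(x'(0))
    return cumulative_trapezoid(phi_inv(c - H), t)

t = np.linspace(0.0, 1.0, 2001)
x2 = solve_operator_T(np.ones_like(t), t, p=2.0)      # exact solution: t(1 - t)/2
x3 = solve_operator_T(np.ones_like(t), t, p=3.0)
print(np.max(np.abs(x2 - t * (1 - t) / 2)), x3[1000])
```

For $p = 3$ and $h \equiv 1$ the exact value at $t = \frac12$ is $\frac23\big(\frac12\big)^{3/2} \approx 0.2357$, which the grid approximation reproduces up to discretization error.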
Let $f$ be a Carathéodory function and for any $r > 0$ let there exist a constant $h_r > 0$ such that for a.e. $t \in (0, 1)$ and for all $s \in (-r, r)$,
$$|f(t, s)| < h_r. \tag{5.4.40}$$
This condition is satisfied if, e.g., $f(t, x(t)) = h(t)g(x(t))$ where $h \in L^\infty(0, 1)$ and $g : \mathbb{R} \to \mathbb{R}$ is a continuous function (cf. Examples 5.2.51 and 5.3.24).
We prove that the operator $T$ is compact. To this purpose we express $T$ in integral form. By the Rolle Theorem, for any $x = T(y)$ there exists $t_x \in [0, 1]$ such that $\dot x(t_x) = 0$, i.e.,
$$\dot x(t) = \Big|\int_t^{t_x} f(\tau, y(\tau))\,d\tau\Big|^{p'-2}\int_t^{t_x} f(\tau, y(\tau))\,d\tau \tag{5.4.41}$$
and
$$x(t) = \int_0^t \Big|\int_s^{t_x} f(\tau, y(\tau))\,d\tau\Big|^{p'-2}\int_s^{t_x} f(\tau, y(\tau))\,d\tau\,ds, \tag{5.4.42}$$
where $p' = \frac{p}{p-1}$.
If $y_n \to y_0$ in $C^1_0[0, 1]$, then the continuity of the Nemytski operator
$$y \mapsto f(\cdot, y) \tag{5.4.43}$$

60 Here $\dot x(0)$ and $\dot x(1)$ mean the derivative from the right and from the left, respectively.


from $C[0, 1]$ into $C[0, 1]$, and (5.4.41), (5.4.42) imply that $x_n \to x_0$ in $C^1_0[0, 1]$ where $x_n = T(y_n)$, $x_0 = T(y_0)$, i.e., $T$ is continuous. Let $M \subset C^1_0[0, 1]$ be a bounded set. To prove the compactness of $T$ we have to show that $T(M)$ is relatively compact. Let $\{x_n\}_{n=1}^\infty \subset T(M)$ be an arbitrary sequence, $x_n = T(y_n)$, $y_n \in M$. It follows from the compact embedding $C^1_0[0, 1] \hookrightarrow\hookrightarrow C[0, 1]$ (see Theorem 1.2.13) that there exists a subsequence $\{y_{n_k}\}_{k=1}^\infty \subset \{y_n\}_{n=1}^\infty$ which converges uniformly on $[0, 1]$. But the continuity of the Nemytski operator (5.4.43) and (5.4.41), (5.4.42) imply that $\{x_{n_k}\}_{k=1}^\infty$ converges in $C^1_0[0, 1]$, i.e., $T(M)$ is relatively compact. Hence the compactness of $T$ is proved.
The following assertion is referred to as a well-ordered case of supersolution and subsolution.
Theorem 5.4.49 (well-ordered case). Let $f$ be a Carathéodory function satisfying (5.4.40). Assume that $u_0$ and $v_0$ are a subsolution and a supersolution of (5.4.38), respectively, with $u_0 \le v_0$ (see Figure 5.4.4). Then the problem (5.4.38) has at least one solution $x$ satisfying
$$u_0 \le x \le v_0 \quad\text{in}\quad [0, 1].$$
If, moreover, $u_0$ and $v_0$ are strict and satisfy $u_0 \ll v_0$, then there exists $R_0 > 0$ such that for all $R > R_0$,
$$\deg(I - T, \Omega_1, o) = 1,$$
where
$$\Omega_1 := \{x \in C^1_0[0, 1] : u_0 \ll x \ll v_0\} \cap B(o; R)$$
is an open set in $C^1_0[0, 1]$ (cf. Exercise 5.4.53).
[Figure 5.4.4. Well-ordered case: graphs of $u_0$ and $v_0$.]


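Proof. Set
$$\tilde f(t, y) := \begin{cases} f(t, y) & \text{if } u_0(t) \le y \le v_0(t),\\ f(t, u_0(t)) & \text{if } y \le u_0(t),\\ f(t, v_0(t)) & \text{if } y \ge v_0(t). \end{cases}$$
Every solution of
$$\begin{cases} -\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) = \tilde f(t, x(t)), & t \in (0, 1),\\ x(0) = x(1) = 0, \end{cases} \tag{5.4.44}$$
is a solution of (5.4.38). Indeed, assume that $x$ solves (5.4.44) and $x > v_0$ in an interval $I_+ \subset (0, 1)$ and $x = v_0$ on $\partial I_+$. Then
$$\int_0^1 \Big|\frac{dx(t)}{dt}\Big|^{p-2}\frac{dx(t)}{dt}\,\frac{d}{dt}\big(x(t) - v_0(t)\big)^+\,dt = \int_0^1 f(t, v_0(t))\big(x(t) - v_0(t)\big)^+\,dt, \tag{5.4.45}$$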





where
$$\big(x(t) - v_0(t)\big)^+ = \begin{cases} x(t) - v_0(t) & \text{on } I_+,\\ 0 & \text{on } [0, 1]\setminus I_+. \end{cases}$$
Since $v_0$ is a supersolution, we have
$$\int_0^1 \Big|\frac{dv_0(t)}{dt}\Big|^{p-2}\frac{dv_0(t)}{dt}\,\frac{d}{dt}\big(x(t) - v_0(t)\big)^+\,dt \ge \int_0^1 f(t, v_0(t))\big(x(t) - v_0(t)\big)^+\,dt. \tag{5.4.46}$$
Hence, combining (5.4.45) and (5.4.46), we obtain
$$\int_{I_+}\big(|\dot x(t)|^{p-2}\dot x(t) - |\dot v_0(t)|^{p-2}\dot v_0(t)\big)\big(\dot x(t) - \dot v_0(t)\big)\,dt \le 0.$$
This is a contradiction,61 which proves that
$$x(t) \le v_0(t), \qquad t \in (0, 1).$$
The same argument shows that
$$x(t) \ge u_0(t), \qquad t \in (0, 1).$$
Now, denote by $\tilde T(y)$ the solution of the boundary value problem
$$\begin{cases} -\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) = \tilde f(t, y(t)), & t \in (0, 1),\\ x(0) = x(1) = 0 \end{cases}$$
for $y \in C^1_0[0, 1]$. Then $\tilde T : C^1_0[0, 1] \to C^1_0[0, 1]$ is compact62 and the solutions of (5.4.44) are in a one-to-one correspondence with the fixed points of $\tilde T$. The definition of $\tilde f$ ensures that there exists a constant $R_0 > 0$ such that for any $y \in C^1_0[0, 1]$ we have
$$\|\tilde T(y)\|_{C^1_0[0,1]} < R_0 \tag{5.4.47}$$
(see (5.4.41), (5.4.42)). By the Schauder Fixed Point Theorem, $\tilde T$ has a fixed point $x$ in $B(o; R_0)$, i.e., $x$ is a solution of (5.4.44). It follows from the above considerations that $u_0 \le x \le v_0$, and so $x$ is also a desired solution of (5.4.38).
The proof of the second part follows from the fact that, due to (5.4.47), we can construct an admissible homotopy
$$H(\tau, \cdot) := I - \tau\tilde T, \qquad \tau \in [0, 1],$$
which shows that
$$\deg(I - \tilde T, B(o; R_0), o) = \deg(I, B(o; R_0), o) = 1.$$
Since $u_0$ and $v_0$ are strict and there is no solution $x$ of the equation
$$x - \tilde T(x) = o$$
for which either $x(t) < u_0(t)$ or $x(t) > v_0(t)$ for some $t \in (0, 1)$, it follows from Theorem 5.2.13(iv) that
$$\deg(I - \tilde T, \Omega_1, o) = \deg(I - \tilde T, B(o; R_0), o) = 1.$$
The assertion now follows from the fact that $T$ and $\tilde T$ coincide in $\Omega_1$.

61 Note that $s \mapsto |s|^{p-2}s$ is a strictly increasing function!
62 The proof of this fact is the same as that for $T$.
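The truncation $\tilde f$ used in the proof is simply $f$ evaluated at the projection of $y$ onto the interval $[u_0(t), v_0(t)]$. A short Python sketch makes this explicit; the sample data $f(t, y) = y^3$, $u_0 \equiv -1$, $v_0 \equiv 1$ are hypothetical. It also shows that $\tilde f$ stays bounded whenever $f$ is bounded on the strip between $u_0$ and $v_0$, which is what yields (5.4.47).

```python
import numpy as np

def truncate(f, u0, v0):
    """Return f_tilde(t, y) = f(t, min(max(y, u0(t)), v0(t))); assumes u0 <= v0."""
    def f_tilde(t, y):
        # np.clip projects y onto [u0(t), v0(t)], which reproduces the three cases
        return f(t, np.clip(y, u0(t), v0(t)))
    return f_tilde

# Hypothetical data: f(t, y) = y**3 with the ordered pair u0 = -1 <= v0 = 1.
f = lambda t, y: y**3
f_tilde = truncate(f, lambda t: -1.0, lambda t: 1.0)
print(f_tilde(0.3, 5.0), f_tilde(0.3, -2.0), f_tilde(0.3, 0.5))
```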


The next assertion is referred to as a non-well-ordered case of a supersolution and a subsolution.
Theorem 5.4.50 (non-well-ordered case). Let $f$ be a Carathéodory function which satisfies the following assumption: there are $c_i > 0$, $i = 1, 2$, such that
$$|f(t, s)| \le c_1 + c_2|s|^{p-1} \quad\text{for a.e.}\quad t \in (0, 1) \quad\text{and for all}\quad s \in \mathbb{R} \tag{5.4.48}$$
and, moreover,
$$\lim_{|s|\to\infty}\frac{f(t, s)}{|s|^{p-2}s} = \lambda_1. \tag{5.4.49}$$
Assume that $u_0$ and $v_0$ are a subsolution and a supersolution of (5.4.38), respectively, and there exists $t_0$ such that $u_0(t_0) > v_0(t_0)$ (see Figure 5.4.5).
[Figure 5.4.5. Non-well-ordered case: graphs of $v_0$ and $u_0$, with the point $t_0$.]
Then (5.4.38) has at least one solution in the closure (with respect to the $C^1$-norm) of the set
$$S := \{x \in C^1_0[0, 1] : \exists\, t_1, t_2 \in (0, 1),\ x(t_1) < u_0(t_1),\ x(t_2) > v_0(t_2)\}.$$
Set $\Omega_2 := S \cap B(o; R)$ and assume that there is no solution of (5.4.38) on $\partial\Omega_2$. Then there exists $R_0 > 0$ such that for all $R > R_0$,
$$\deg(I - T, \Omega_2, o) = -1.$$
Proof. If (5.4.38) has a solution on $\partial S$, we are done. Let us assume in the sequel that (5.4.38) does not have a solution on $\partial S$. For $r > 0$ let us define
$$f_r(t, y) = \begin{cases} f(t, y) & \text{if } |y| \le r,\\ (1 + r - |y|)f(t, y) & \text{if } r < |y| < r + 1,\\ 0 & \text{if } |y| \ge r + 1. \end{cases}$$

63 Here $\lambda_1$ is the first eigenvalue of (5.2.47), see Example 5.2.51.
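The cut-off $f_r$ coincides with $f$ for $|y| \le r$, vanishes for $|y| \ge r + 1$, and in between multiplies $f$ by the factor $1 + r - |y|$, which decreases linearly from $1$ to $0$; hence $y \mapsto f_r(t, y)$ stays continuous. A minimal scalar Python sketch, with a hypothetical sample nonlinearity $f(t, y) = y$:

```python
def cutoff(f, r):
    """Return the cut-off f_r(t, y) defined above (scalar y)."""
    def f_r(t, y):
        a = abs(y)
        if a <= r:
            return f(t, y)
        if a < r + 1:
            return (1.0 + r - a) * f(t, y)   # factor decreases linearly from 1 to 0
        return 0.0
    return f_r

f = lambda t, y: y          # hypothetical nonlinearity
f_2 = cutoff(f, 2.0)
print(f_2(0.0, 2.0), f_2(0.0, 2.5), f_2(0.0, -2.5), f_2(0.0, 3.0))
```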


Next we show that there is $K > 0$ such that for any $r > 0$ and for any possible solution $x \in S$ of
$$\begin{cases} -\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) = f_r(t, x(t)), & t \in (0, 1),\\ x(0) = x(1) = 0, \end{cases} \tag{5.4.50}$$
the following a priori estimate holds:
$$\|x\|_{C^1_0[0,1]} \le K. \tag{5.4.51}$$
To prove this fact we argue by contradiction, and thus we assume that for any $k \in \mathbb{N}$ there are $r_k > 0$, $x_k \in S$ solving
$$\begin{cases} -\frac{d}{dt}\big(|\dot x_k(t)|^{p-2}\dot x_k(t)\big) = f_{r_k}(t, x_k(t)), & t \in (0, 1),\\ x_k(0) = x_k(1) = 0, \end{cases} \tag{5.4.52}$$
and satisfying $\|x_k\| \ge k$. Set $y_k := \frac{x_k}{\|x_k\|}$ and divide (5.4.52) by $\|x_k\|^{p-1}$ to obtain
$$\begin{cases} -\frac{d}{dt}\big(|\dot y_k(t)|^{p-2}\dot y_k(t)\big) = \dfrac{f_{r_k}(t, x_k(t))}{\|x_k\|^{p-1}}, & t \in (0, 1),\\ y_k(0) = y_k(1) = 0. \end{cases}$$
By integration we find that $\{y_k\}_{k=1}^\infty$ equivalently satisfies
$$\dot y_k(t) = \varphi_{p'}\Big(\varphi_p(\dot y_k(0)) - \int_0^t \frac{f_{r_k}(\tau, x_k(\tau))}{\|x_k\|^{p-1}}\,d\tau\Big) \tag{5.4.53}$$
and
$$y_k(t) = \int_0^t \varphi_{p'}\Big(\varphi_p(\dot y_k(0)) - \int_0^s \frac{f_{r_k}(\tau, x_k(\tau))}{\|x_k\|^{p-1}}\,d\tau\Big)\,ds, \qquad t \in [0, 1], \tag{5.4.54}$$
where for $s > 1$ we set $\varphi_s(\xi) = |\xi|^{s-2}\xi$ if $\xi \neq 0$ and $\varphi_s(0) = 0$.
Now, since $\|y_k\| = 1$, by passing to a subsequence if necessary, we have
$$y_k \to y \quad\text{in}\quad C_0[0, 1] := \{x \in C[0, 1] : x(0) = x(1) = 0\}$$
for a $y \in C_0[0, 1]$.64 But then (5.4.53) yields
$$y_k \to y \quad\text{in}\quad C^1_0[0, 1]$$
(note that without loss of generality we may also assume that $\{\dot y_k(0)\}_{k=1}^\infty$ forms a convergent sequence!). It follows from (5.4.54), (5.4.48), (5.4.49) and the Lebesgue Dominated Convergence Theorem that $y$ solves the problem
$$\begin{cases} -\frac{d}{dt}\big(|\dot y(t)|^{p-2}\dot y(t)\big) = \lambda_1|y(t)|^{p-2}y(t), & t \in (0, 1),\\ y(0) = y(1) = 0. \end{cases}$$
Since $\|y\| = 1$, it follows that $y$ is a nonzero multiple of the first eigenfunction $\varphi_1(t) > 0$ in $(0, 1)$ (see Example 5.2.51). If $y > 0$ in $(0, 1)$, then we find that $x_k(t) \to \infty$ for any $t \in (0, 1)$, which contradicts $x_k \in S$. Also $y < 0$ in $(0, 1)$ leads to a contradiction. Hence the a priori estimate (5.4.51) is proved.

64 This is a consequence of the Arzelà–Ascoli Theorem.


Now choose
$$R > R_0 = \max\{K, \|u_0\|_{C[0,1]}, \|v_0\|_{C[0,1]}\} + 1$$
and consider (5.4.50) with $r = R$, i.e.,
$$\begin{cases} -\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) = f_R(t, x(t)), & t \in (0, 1),\\ x(0) = x(1) = 0. \end{cases} \tag{5.4.55}$$
Obvious modifications of the definitions of a strict subsolution and supersolution of (5.4.38) lead to the same notions associated with (5.4.55). Then $\alpha = -R - 2$ and $\beta = R + 2$ are a subsolution and a supersolution, respectively, associated with (5.4.55). Both are actually strict. Indeed, assume, e.g., that $x$ is a solution of (5.4.55), $x(t) \ge -R - 2$ and $x(t_0) = -R - 2$ for a certain $t_0 \in (0, 1)$. Then $x(t_0) = \min_{\tau\in(0,1)} x(\tau)$, i.e., $\dot x(t_0) = 0$, and there exists $\delta > 0$ such that $x(t) < -R - 1$ for $t \in [t_0, t_0 + \delta)$. But $f_R(t, x(t)) = 0$ there by definition, so $x(t) \equiv -R - 2$ in $(t_0, t_0 + \delta]$. Since this implies that $x(t) \equiv -R - 2$ in $(t_0, 1]$, we obtain a contradiction with $x(1) = 0$. The same argument applies to $R + 2$. Notice also that $\alpha \ll v_0$ and $u_0 \ll \beta$.
Now, let us define $T_R : C^1_0[0, 1] \to C^1_0[0, 1]$ by
$$x := T_R(y),$$
where $x$ is the solution of the problem
$$\begin{cases} -\frac{d}{dt}\big(|\dot x(t)|^{p-2}\dot x(t)\big) = f_R(t, y(t)), & t \in (0, 1),\\ x(0) = x(1) = 0, \end{cases}$$
and define the sets
$$S_{\alpha\beta} := \{x \in C^1_0[0, 1] : \alpha \ll x \ll \beta\}, \qquad S_{u_0} := \{x \in C^1_0[0, 1] : u_0 \ll x \ll \beta\}$$
and
$$S_{v_0} := \{x \in C^1_0[0, 1] : \alpha \ll x \ll v_0\}$$
(see Figure 5.4.6).
[Figure 5.4.6. The constant functions $\alpha = -R - 2$ and $\beta = R + 2$ together with $u_0$, $v_0$ and the point $t_0$.]


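By definition, $T_R$ and $T$ coincide in the ball $B(o; R)$. Applying Theorem 5.4.49 and Theorem 5.2.13(iv) we obtain
$$\begin{aligned} 1 &= \deg(I - T_R, B(o; R) \cap S_{\alpha\beta}, o)\\ &= \deg(I - T_R, B(o; R) \cap S_{v_0}, o) + \deg(I - T_R, B(o; R) \cap S_{u_0}, o) + \deg(I - T_R, \Omega_2, o)\\ &= 2 + \deg(I - T_R, \Omega_2, o), \end{aligned}$$
i.e., $\deg(I - T, \Omega_2, o) = \deg(I - T_R, \Omega_2, o) = -1$, which completes the proof.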

Remark 5.4.51. There are several applications of Theorems 5.4.49 and 5.4.50. Generalizations of these results to the case of partial differential equations can also be found in the literature, see, e.g., Drábek, Girg & Manásevich [40].
In the next assertion we present one application of Theorems 5.4.49 and 5.4.50 which, under suitable assumptions on $f$, yields the multiplicity of solutions of (5.4.38).
Theorem 5.4.52. Let $f$ be as in Theorem 5.4.50 and let $u_0^i$ and $v_0^i$, $i = 1, 2$, be subsolutions and supersolutions of (5.4.38), respectively, which satisfy
$$u_0^1 \ll v_0^1, \qquad u_0^2 \ll v_0^2,$$
and let there exist $t_0 \in (0, 1)$ such that
$$u_0^2(t_0) > v_0^1(t_0)$$
(see Figure 5.4.7). Then the problem (5.4.38) has at least three distinct solutions.
[Figure 5.4.7. Graphs of $u_0^1$, $v_0^1$, $u_0^2$, $v_0^2$, the point $t_0$ and the three solutions $x_1$, $x_2$, $x_3$.]
Proof. It follows from Theorem 5.4.49 that there are solutions $x_i = x_i(t)$, $i = 1, 2$, of (5.4.38) which satisfy
$$u_0^1 \ll x_1 \ll v_0^1, \qquad u_0^2 \ll x_2 \ll v_0^2.$$
Now, let us apply Theorem 5.4.50 with the subsolution $u_0^2$ and the supersolution $v_0^1$. We get another solution $x_3 = x_3(t)$ of (5.4.38). Clearly, all $x_i$, $i = 1, 2, 3$, are mutually different.



Exercise 5.4.53. Prove that $\Omega_1$ from Theorem 5.4.49 is an open set in $C^1_0[0, 1]$.
Exercise 5.4.54. Formulate conditions on f = f (t, x) which guarantee that the problem
(5.4.38) has a pair of well-ordered supersolution and subsolution.
Exercise 5.4.55. Formulate conditions on f = f (t, x) which guarantee that the problem
(5.4.38) has a pair of non-well-ordered supersolution and subsolution.
Exercise 5.4.56. Formulate conditions on f = f (t, x) which guarantee that the problem
(5.4.38) has two pairs of supersolutions and subsolutions which satisfy the assumptions
from Theorem 5.4.49.

Chapter 6

Variational Methods
6.1 Local Extrema
In this section we present necessary and/or sufficient conditions for local extrema of real functionals. The most famous ones are the Euler and Lagrange necessary conditions and the Lagrange sufficient condition. We also present the brachistochrone problem, one of the oldest problems in the calculus of variations, and we discuss regularity of the point of a local extremum. The methods presented in this section are motivated by the equation
$$f(x) = 0 \tag{6.1.1}$$
where $f$ is a continuous real function defined in $\mathbb{R}$. The solution of this equation can be transformed to the problem of finding a local extremum of the integral $F$ of $f$ (i.e., $F'(x) = f(x)$, $x \in \mathbb{R}$). Indeed, if there exists a point $x_0 \in \mathbb{R}$ at which the function $F$ has a local extremum, then the derivative $F'(x_0)$ necessarily vanishes due to a familiar theorem of first-semester calculus. The problem of finding solutions of (6.1.1) can thus be transformed to the problem of finding local extrema of the function $F$. On the other hand, one should keep in mind that the equation (6.1.1) may have a solution which is not a local extremum of $F$.
In what follows we will deal with real functionals
$$F : X \to \mathbb{R}$$
where $X$ is a normed linear space with the norm $\|\cdot\|$.
Definition 6.1.1. We say that $F$ has a local minimum (maximum) at a point $a \in X$ if there exists a neighborhood $U$ of $a$ such that for all $x \in U \setminus \{a\}$ we have
$$F(x) \ge F(a) \qquad (F(x) \le F(a)).$$


Chapter 6. Variational Methods

If the inequalities are strict, we speak about a strict local minimum (strict local maximum). If the functional $F$ has a (strict) local minimum or (strict) local maximum at $a$, we say that it has a (strict) local extremum at $a$.
In Figure 6.1.1 the critical point $a$ is not a point of extremum of $F$.
[Figure 6.1.1.]
The fundamental assertion is the following Euler (or Fermat) Necessary Condition.
Proposition 6.1.2 (Euler Necessary Condition). Let $F : X \to \mathbb{R}$ have a local extremum at $a \in X$. If for $v \in X$ the derivative $\delta F(a; v)$ exists, then
$$\delta F(a; v) = 0.$$
Proof. Set
$$g(t) = F(a + tv), \qquad t \in \mathbb{R}.$$
Then $g$ attains a local extremum at $t = 0$, thus
$$0 = g'(0) = \delta F(a; v).$$
Definition 6.1.3. If
$$\delta F(a; v) = 0 \quad\text{for all}\quad v \in X,$$
then $a$ is called a critical point of the functional $F$.1
The more precise Lagrange Necessary Condition distinguishes between local minima and maxima, but requires the existence of the second derivative in the given direction.
Proposition 6.1.4 (Lagrange Necessary Condition). Let $F : X \to \mathbb{R}$ have a local minimum (maximum) at $a \in X$. If for $v \in X$ the second derivative $\delta^2 F(a; v, v)$ exists, then
$$\delta^2 F(a; v, v) \ge 0 \qquad (\delta^2 F(a; v, v) \le 0).$$

1 Cf. Definition 4.3.6.

6.1. Local Extrema


Proof. Let $g$ be as in the proof of Proposition 6.1.2. Then
$$g''(0) = \delta^2 F(a; v, v).$$
Now we can apply the Lagrange necessary condition for local extrema of the real function $g$ of one real variable to get the conclusion.
Contrary to Propositions 6.1.2 and 6.1.4, the Lagrange Sufficient Condition provides information on when a critical point of $F$ is a point of its local minimum or local maximum.
Theorem 6.1.5 (Lagrange Sufficient Condition). Let $a \in X$ be a critical point of $F : X \to \mathbb{R}$. Let there exist a neighborhood $U$ of $a$ such that the mapping $x \mapsto D^2F(x)$ is continuous in $U$. If there exists $\alpha > 0$ such that
$$D^2F(a)(v, v) \ge \alpha\|v\|^2 \qquad (D^2F(a)(v, v) \le -\alpha\|v\|^2) \quad\text{for any}\quad v \in X,$$
then $F$ has a strict local minimum (maximum) at $a$.
Proof. Let $v \in X$ be such that $a + v \in U$. Then according to Proposition 3.2.27 we have
$$F(a + v) - F(a) = \int_0^1 (1 - t)D^2F(a + tv)(v, v)\,dt. \tag{6.1.2}$$
On the other hand,
$$\begin{aligned} D^2F(a + tv)(v, v) &\ge D^2F(a)(v, v) - |D^2F(a + tv)(v, v) - D^2F(a)(v, v)|\\ &\ge \big[\alpha - \|D^2F(a + tv) - D^2F(a)\|_{B_2(X,\mathbb{R})}\big]\|v\|^2. \end{aligned}$$
The continuity of $D^2F(x)$ in $U$ implies that there is $\delta > 0$ so small that for $\|v\| < \delta$, $t \in [0, 1]$,
$$\|D^2F(a + tv) - D^2F(a)\|_{B_2(X,\mathbb{R})} < \alpha, \tag{6.1.3}$$
i.e., for $0 < \|v\| < \delta$ we have (due to (6.1.2) and (6.1.3))
$$F(a + v) > F(a).$$
The proof for a strict local maximum is similar.
Let us first illustrate the general statements on a function of several real variables
$$F : \mathbb{R}^N \to \mathbb{R}.$$
Example 6.1.6. Let $F : \mathbb{R}^N \to \mathbb{R}$ have all partial derivatives of the first order at a point $a \in \mathbb{R}^N$ and, moreover, let the function $F$ have a local extremum at $a$. Then Proposition 6.1.2 states that
$$\frac{\partial F}{\partial x_1}(a) = \frac{\partial F}{\partial x_2}(a) = \cdots = \frac{\partial F}{\partial x_N}(a) = 0. \tag{6.1.4}$$

2 We can assume that $U$ is convex. Then $D^2F(a + tv)$ exists and is continuous for all $t \in [0, 1]$.


On the other hand, it is well known that (6.1.4) does not imply that $F$ has a local extremum at the point $a$. To check whether this is the case, we can apply Theorem 6.1.5. If $F$ has continuous second partial derivatives in a neighborhood of $a$, then we should investigate the quadratic form
$$D^2F(a)(v, v) = \sum_{i,j=1}^N \frac{\partial^2 F}{\partial x_i\,\partial x_j}(a)\,v_i v_j. \tag{6.1.5}$$
To prove that $F$ has, e.g., a local minimum at $a$, it is enough to show that there exists $\alpha > 0$ such that for any $v \in \mathbb{R}^N$, $\|v\| = 1$,
$$D^2F(a)(v, v) \ge \alpha. \tag{6.1.6}$$
(Here we have used the fact that the quadratic form is homogeneous.) Since we are in finite dimension, the unit sphere in $\mathbb{R}^N$ is a compact set. Then (6.1.6) holds with an $\alpha > 0$ whenever
$$D^2F(a)(v, v) > 0 \quad\text{for all}\quad \|v\| = 1. \tag{6.1.7}$$
The reader is invited to justify that (6.1.7) implies (6.1.6) and to explain why this is not the case when $\mathbb{R}^N$ is replaced by a space of infinite dimension.
It follows from linear algebra4 that for any quadratic form on $\mathbb{R}^N$ there exist a basis $\{u_1, \dots, u_N\}$ of $\mathbb{R}^N$ and numbers $\lambda_1, \dots, \lambda_N$ such that for any $v$ of the form
$$v = \sum_{i=1}^N \alpha_i u_i$$
we have
$$D^2F(a)(v, v) = \sum_{i=1}^N \lambda_i\alpha_i^2.$$
The inequality (6.1.7) holds if and only if all $\lambda_i$, $i = 1, \dots, N$, are positive, and so according to Theorem 6.1.5 the function $F$ has a strict local minimum at $a$. If there is at least one positive and at least one negative number among $\lambda_i$, $i = 1, \dots, N$, then according to Proposition 6.1.4 the function $F$ does not have a local extremum at $a$.
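In practice the numbers $\lambda_i$ can be taken to be the eigenvalues of the symmetric Hessian matrix $\big(\frac{\partial^2 F}{\partial x_i\,\partial x_j}(a)\big)$, since an orthonormal eigenbasis diagonalizes the form. The following sketch applies the two criteria of Example 6.1.6. The tolerance and the label for the semidefinite case, in which neither Theorem 6.1.5 nor Proposition 6.1.4 decides, are our own choices.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-12):
    """Apply Theorem 6.1.5 / Proposition 6.1.4 to the Hessian at a critical point."""
    lam = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))  # symmetric matrix
    if np.all(lam > tol):
        return "strict local minimum"      # positive definite form
    if np.all(lam < -tol):
        return "strict local maximum"      # negative definite form
    if np.any(lam > tol) and np.any(lam < -tol):
        return "no local extremum"         # indefinite form
    return "undecided"                     # semidefinite: the second-order test fails

# Hessians at a = 0 of x^2 + y^2, -x^2 - y^2, x^2 - y^2 and x^2 + y^4:
for H in ([[2, 0], [0, 2]], [[-2, 0], [0, -2]], [[2, 0], [0, -2]], [[2, 0], [0, 0]]):
    print(H, classify_critical_point(H))
```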
Before we give an application in an infinite dimensional space, we prove the following assertion for convex functionals.

3 Here we use the fact that a positive continuous function on a compact set achieves its minimal value, which has to be positive.
4 See also Corollary 6.3.9. (Remember that $\frac{\partial^2 F}{\partial x_i\,\partial x_j}(a) = \frac{\partial^2 F}{\partial x_j\,\partial x_i}(a)$.)
i


Definition 6.1.7. Let $M \subset X$ be a convex set. A functional $F : X \to \mathbb{R}$ is said to be convex in $M$ if for any $u, v \in M$ and $t \in [0, 1]$ we have
$$F(tu + (1 - t)v) \le tF(u) + (1 - t)F(v).$$
The functional $F$ is said to be strictly convex in $M$ if for any $u, v \in M$, $u \neq v$, and $t \in (0, 1)$ we have
$$F(tu + (1 - t)v) < tF(u) + (1 - t)F(v).$$
Proposition 6.1.8. Let $F : X \to \mathbb{R}$ be a convex functional on a normed linear space $X$. Then every critical point of $F$ in $X$ is a point of minimum of $F$ over $X$.
Proof. Without loss of generality we can assume that
$$F(o) = 0 \quad\text{and}\quad \delta F(o; v) = 0 \quad\text{for any}\quad v \in X$$
(i.e., $o \in X$ is a critical point). Assume that $F$ does not achieve its minimum value over $X$ at $o \in X$. Then there exists $u \in X$ for which $F(u) = -\varepsilon < 0$. The convexity of $F$ implies that
$$F(tu + (1 - t)o) \le -t\varepsilon \quad\text{for any}\quad t \in (0, 1),$$
i.e.,
$$\frac{F(tu) - F(o)}{t} \le -\varepsilon < 0. \tag{6.1.8}$$
But (6.1.8) implies
$$\delta F(o; u) \le -\varepsilon < 0,$$
which is a contradiction.

The following result will be needed several times in the further text.
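Lemma 6.1.9 (Fundamental Lemma in Calculus of Variations). Let $I$ be an open interval and $f \in L^1_{\mathrm{loc}}(I)$. If
$$\int_I f(x)\varphi'(x)\,dx = 0 \quad\text{for any}\quad \varphi \in \mathcal{D}(I), \tag{6.1.9}$$
then $f = \text{const.}$ a.e. in $I$.
Proof. Let $J$ be a compact subinterval of $I$ and $\omega$ a mollifier, $\omega \in \mathcal{D}(\mathbb{R})$, $\operatorname{supp}\omega \subset [-1, 1]$ (see Proposition 1.2.20(iv)). For
$$g(x) = \begin{cases} f(x), & x \in J,\\ 0, & x \in \mathbb{R}\setminus J, \end{cases}$$

5 See page 35 for the definition of $\mathcal{D}(I)$.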


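we have $g \in L^1(\mathbb{R})$, and thus
$$\lim_{n\to\infty} g * \omega_n = g$$
in the $L^1(\mathbb{R})$-norm6 and (passing to a subsequence, cf. Remark 1.2.18) also a.e. in $\mathbb{R}$. Since
$$(g * \omega_n)'(y) = \int_{\mathbb{R}} g(x)\,\omega_n'(y - x)\,dx = \int_J f(x)\,\omega_n'(y - x)\,dx = 0$$
whenever $\big(y - \frac1n, y + \frac1n\big) \subset J$, by the assumption (6.1.9), $(g * \omega_n)(y)$ is constant for all such $y$. The convergence of $g * \omega_n$ to $g$ means that $g$ is constant a.e. in $J$, i.e., $f = \text{const.}$ a.e. in $I$.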

One of the oldest problems in the calculus of variations is studied in detail in the following example.
Example 6.1.10 (Brachistochrone Problem). The problem is formulated as follows:
For two given points $A$ and $B$ in a vertical plane, find a curve connecting $A$ and $B$ which is optimal among all such curves in the following sense: a point $P$ of unit mass which starts from $A$ with zero velocity and moves along this curve only due to the gravitational force reaches the point $B$ in minimal time.⁷
In order to find a suitable mathematical model we shall assume that the points $A = (0, 0)$ and $B = (a, b)$, $b \ge 0$, are situated in a vertical plane with the coordinate system chosen as in Figure 6.1.2. The reader is invited to verify that such a position of $A$ and $B$ can be considered without loss of generality. We shall first concentrate only on curves which are graphs of nonnegative functions $y = u(x)$ belonging to the space $C^1[0, a]$.
The point $P$ moves according to Newton's Second Law. The resulting force is a composition of the gravitational force and the reaction force of the constraint (the point $P$ moves along the given curve); its direction is given by the tangent line of the curve, see Figure 6.1.2.
Newton's Second Law says that for the velocity $v$ of the point $P$ the following identity holds:
$$m\dot v = F = mg\cos\alpha$$
(see Figure 6.1.2). Multiplying this identity by $v$ and taking into account that $\dot x = v\cos\alpha$, we obtain
$$\frac{d}{dt}\Big(\frac12 v^2\Big) = gv\cos\alpha = g\dot x,$$
i.e., since $v = 0$ and $x = 0$ at the starting point $A$,
$$\frac12 v^2 = gx \tag{6.1.10}$$
(the Principle of Conservation of Energy).
⁶ $\omega_n$ is defined in Proposition 1.2.20(iv).
⁷ This problem was posed by Johann Bernoulli (see Berkovitz [12]).

6.1. Local Extrema

Figure 6.1.2. The $x$-axis is oriented in the (downward) direction of the gravitational force. (The figure shows the forces $mg$ and $F$ acting on $P$ and the angle $\alpha$.)

Since the point $P$ moves along the graph of $u = u(x)$, its trajectory $s = s(t)$ is given by⁸
$$s(t) = \int_0^{x(t)} \sqrt{1 + (u'(x))^2}\,dx. \tag{6.1.11}$$
Hence
$$v(t) = \frac{ds(t)}{dt} = \frac{ds(t)}{dx}\,\frac{dx}{dt} = \sqrt{1 + (u'(x(t)))^2}\;\dot x(t).$$
Using (6.1.10) and the strict monotonicity of $x$ we have
$$\frac{dt}{dx} = \frac{\sqrt{1 + (u'(x))^2}}{\sqrt{2gx}}.$$
Therefore the time needed to get from $A$ to $B$ is given by
$$F(u) = \int_0^a \frac{\sqrt{1 + (u'(x))^2}}{\sqrt{2gx}}\,dx. \tag{6.1.12}$$

We wish to apply Proposition 6.1.2 to the functional $F$. However, $F$ is not defined on a linear space ($u(a) = b \ne 0$). To avoid this obstacle we change the variable $u$ for the moment by the substitution
$$w(x) = u(x) - \frac{b}{a}x.$$

⁸ We use the formula for the length of a curve given by the graph of $u = u(x)$: $s = \int_0^{x_0} \sqrt{1 + (u'(x))^2}\,dx$.

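So, we can write (6.1.12) as
$$\tilde F(w) = F\Big(w + \frac{b}{a}x\Big) = \int_0^a \frac{\sqrt{1 + \big(w'(x) + \frac{b}{a}\big)^2}}{\sqrt{2gx}}\,dx,$$
where
$$w \in C_0^1[0,a] := \{w \in C^1[0,a] : w(0) = w(a) = 0\}.$$
We equip $C_0^1[0,a]$ with the norm
$$\|u\|_{C_0^1[0,a]} = \Big(\int_0^a |u'(x)|^2\,dx\Big)^{\frac12}.$$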

For a given $h \in C_0^1[0,a]$ we have (see Corollary 3.2.14 and Example 3.2.21)
$$\delta\tilde F(w; h) = \int_0^a \frac{w'(x) + \frac{b}{a}}{\sqrt{2gx}\,\sqrt{1 + \big(w'(x) + \frac{b}{a}\big)^2}}\,h'(x)\,dx.$$

The Euler Necessary Condition (Proposition 6.1.2) for the original variable $u$ reads
$$\int_0^a \frac{u'(x)}{\sqrt{2gx\,[1 + (u'(x))^2]}}\,h'(x)\,dx = 0 \quad\text{for all } h \in C_0^1[0,a]. \tag{6.1.13}$$
Let us denote
$$M(x) = \frac{u'(x)}{\sqrt{2gx\,[1 + (u'(x))^2]}}, \quad x \in (0,a).$$
Applying Lemma 6.1.9 we obtain that there is a constant $K \in \mathbb{R}$ such that $M(x) = K$ a.e. in $(0,a)$. However, the continuity of $M$ actually implies that
$$\frac{u'(x)}{\sqrt{2gx\,[1 + (u'(x))^2]}} = K \quad\text{for all } x \in (0,a). \tag{6.1.14}$$

We will find a solution of the Euler equation (6.1.14). Note that $K = 0$ implies $b = 0$, and so in this case $u = 0$ is the unique solution of (6.1.14). Assume now that $b > 0$, and write $K$ as $\frac{1}{\sqrt{4gc}}$ with some $c > 0$. The equation (6.1.14) then implies
$$(u'(x))^2 = \frac{\frac{x}{2c}}{1 - \frac{x}{2c}}, \quad x \in [0,a]. \tag{6.1.15}$$
Hence $0 \le \frac{x}{2c} < 1$. After the change of variables $x = c(1 - \cos\theta)$, $\theta \in [0,\theta_0]$ (here $0 < \theta_0 < \pi$ is such that $a = c(1 - \cos\theta_0)$) we obtain
$$\frac{du}{d\theta} = \frac{du}{dx}\,c\sin\theta$$
and (6.1.15) is transformed into
$$\Big(\frac{du}{d\theta}\Big)^2 = c^2(1 - \cos\theta)^2.$$
Hence
$$u(\theta) = c(\theta - \sin\theta), \quad \theta \in [0,\theta_0].$$

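(Notice that the integration constant is zero since $u(0) = 0$, and only the sign plus corresponds to our problem.) Hence the parametric equations of the graph of $u$ are given by
$$x = c(1 - \cos\theta), \quad y = c(\theta - \sin\theta), \quad \theta \in [0,\theta_0].$$
This is a part of a cycloid, and we have to determine the parameters $c$ and $\theta_0$ so that $B$ is the end point of this curve. This means
$$\frac{b}{a} = \frac{\theta_0 - \sin\theta_0}{1 - \cos\theta_0}, \quad \theta_0 \in (0,\pi). \tag{6.1.16}$$
Since the function
$$\theta \mapsto \frac{\theta - \sin\theta}{1 - \cos\theta}, \quad \theta \in (0,\pi),$$
is strictly increasing with the supremum (over $(0,\pi)$) equal to $\frac{\pi}{2}$, we have that for $0 \le \frac{b}{a} < \frac{\pi}{2}$ the functional $\tilde F$ has a unique critical point $v \in C_0^1[0,a]$ such that the graph of the function $u(x) = v(x) + \frac{b}{a}x$ has parametric equations
$$x = a\,\frac{1 - \cos\theta}{1 - \cos\theta_0}, \quad y = a\,\frac{\theta - \sin\theta}{1 - \cos\theta_0}, \quad \theta \in [0,\theta_0], \tag{6.1.17}$$
where $\theta_0$ is given by (6.1.16).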
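Equation (6.1.16) has no closed-form solution for $\theta_0$, but the monotonicity just stated makes bisection on $(0, \pi)$ a natural way to compute it. The following Python sketch is our own illustration, not part of the original text; the helper names `solve_theta0` and `cycloid_points` are hypothetical. It assumes $0 < \frac{b}{a} < \frac{\pi}{2}$, solves (6.1.16) for $\theta_0$, and samples the curve (6.1.17), with $x$ pointing downward as in Figure 6.1.2.

```python
import math


def solve_theta0(ratio, tol=1e-13):
    """Solve (t - sin t) / (1 - cos t) = ratio for t in (0, pi) by bisection.

    Assumes 0 < ratio < pi/2, the range in which the graph-type cycloid exists.
    """
    if not 0.0 < ratio < math.pi / 2:
        raise ValueError("a graph-type cycloid exists only for 0 < b/a < pi/2")

    def phi(t):
        # 1 - cos t is written as 2 sin^2(t/2) to avoid cancellation for small t
        return (t - math.sin(t)) / (2.0 * math.sin(0.5 * t) ** 2)

    lo, hi = 0.0, math.pi  # phi increases from 0 (as t -> 0+) to pi/2 (at t = pi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cycloid_points(a, b, samples=5):
    """Sample the parametric curve (6.1.17) at equally spaced theta in [0, theta0]."""
    theta0 = solve_theta0(b / a)
    c = a / (1.0 - math.cos(theta0))  # from a = c (1 - cos theta0)
    thetas = [theta0 * k / (samples - 1) for k in range(samples)]
    return [(c * (1.0 - math.cos(t)), c * (t - math.sin(t))) for t in thetas]


print(solve_theta0(1.0))             # theta0 for b/a = 1
print(cycloid_points(1.0, 1.0)[-1])  # end point, approximately B = (1, 1)
```

For $a = b = 1$ the computed $\theta_0$ lies between $2.41$ and $2.42$, and the last sampled point reproduces $B$ up to the bisection tolerance.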


On the other hand, for $\frac{b}{a} \ge \frac{\pi}{2}$ the functional $\tilde F$ does not have critical points in $C_0^1[0,a]$. However, this does not mean that the original problem has no solution at all! The restriction we made during the formulation of the mathematical model (considering only curves which are graphs of functions $y = u(x)$) does not fit the real situation if $\frac{b}{a} \ge \frac{\pi}{2}$! In this case one has to parametrize the curves as $x = x(\tau)$, $y = y(\tau)$ and to investigate the functional
$$F(x, y) = \int_0^{\tau_0} \frac{\sqrt{\big(\frac{dx}{d\tau}(\tau)\big)^2 + \big(\frac{dy}{d\tau}(\tau)\big)^2}}{\sqrt{2g\,x(\tau)}}\,d\tau.$$
An analogous procedure leads to the solution of two differential equations for $x$ and $y$, and one can prove the existence of a unique critical point.⁹

⁹ The reader is invited to prove it in detail as an exercise.

Let us return to the case $\frac{b}{a} < \frac{\pi}{2}$. It still remains to show that the solution (6.1.17) is a global minimum of $\tilde F$ over $C_0^1[0,a]$. This follows from Proposition 6.1.8. Indeed, the function
$$z \mapsto \sqrt{1 + z^2}$$
is convex in $\mathbb{R}$. This immediately implies that the functional $\tilde F$ is convex on $C_0^1[0,a]$ (the reader is invited to prove both facts in detail). Hence the unique critical point of $\tilde F$ in $C_0^1[0,a]$ must be the point of its global minimum.
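The global minimality can also be checked numerically. In the following Python sketch (an illustration under our own assumptions, with hypothetical helper names) we fix $\theta_0 = 2$ and $c = 1$, which determines $B = (c(1 - \cos\theta_0), c(\theta_0 - \sin\theta_0))$ with $\frac{b}{a} \approx 0.77 < \frac{\pi}{2}$. Along the cycloid $ds = c\sqrt{2(1 - \cos\theta)}\,d\theta$ and $v = \sqrt{2gx} = \sqrt{2gc(1 - \cos\theta)}$, so the travel time is $\theta_0\sqrt{c/g}$. This is compared with $F(u)$ from (6.1.12) for the competitors $u(x) = b\,(x/a)^\lambda$, $\lambda \ge 1$, which belong to $C^1[0,a]$ and have the right end points; the substitution $x = as^2$ removes the $1/\sqrt{x}$ singularity before Simpson's rule is applied.

```python
import math


def cycloid_end_and_time(theta0, c, g):
    """End point B = (a, b) of x = c(1 - cos t), y = c(t - sin t), t in [0, theta0],
    and the travel time along this arc, using dt = sqrt(c/g) d(theta)."""
    a = c * (1.0 - math.cos(theta0))
    b = c * (theta0 - math.sin(theta0))
    return a, b, theta0 * math.sqrt(c / g)


def power_curve_time(a, b, lam, g, intervals=2000):
    """F(u) from (6.1.12) for the competitor u(x) = b (x/a)**lam with lam >= 1.

    With x = a s^2 the integral becomes sqrt(2a/g) * int_0^1 sqrt(1 + u'(a s^2)^2) ds,
    evaluated here by composite Simpson's rule (intervals must be even).
    """
    def integrand(s):
        du = (b * lam / a) * s ** (2.0 * lam - 2.0)  # u'(a s^2)
        return math.sqrt(1.0 + du * du)

    h = 1.0 / intervals
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return math.sqrt(2.0 * a / g) * total * h / 3.0


g = 9.81
a, b, t_cycloid = cycloid_end_and_time(theta0=2.0, c=1.0, g=g)
for lam in (1.0, 1.5, 2.0, 3.0):
    # each difference should be positive if the cycloid is the global minimizer
    print(lam, power_curve_time(a, b, lam, g) - t_cycloid)
```

All printed differences should be positive; for $\lambda = 1.5$ and $\lambda = 2$ they are only a fraction of a percent of the travel time, so the cycloid beats these curves narrowly.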
Let us now consider a more general situation. Namely, let
$$M = \{u \in C^1[a,b] : u(a) = u_1,\ u(b) = u_2\},$$
and let us introduce the functional
$$F(u) = \int_a^b f(x, u(x), u'(x))\,dx, \quad u \in M,$$
where $f = f(x, y, z)$ is a function defined on $[a,b] \times \mathbb{R}^2$ with continuous second partial derivatives with respect to all its variables. This assumption will hold throughout the rest of this section. Applying the Euler Necessary Condition (Proposition 6.1.2) we get the following assertion.
Proposition 6.1.11. Let $u_0 \in M$ be a local extremum of $F$ with respect to $M$. Then the function
$$x \mapsto \frac{\partial f}{\partial z}(x, u_0(x), u_0'(x)) \tag{6.1.18}$$
is continuously differentiable on $[a,b]$ and
$$\frac{\partial f}{\partial y}(x, u_0(x), u_0'(x)) - \frac{d}{dx}\frac{\partial f}{\partial z}(x, u_0(x), u_0'(x)) = 0 \tag{6.1.19}$$
for all $x \in [a,b]$.
Proof. Let us first assume $u_1 = u_2 = 0$. Let $w \in C_0^1[a,b]$. Since
$$0 = \delta F(u_0; w) = \int_a^b \Big[\frac{\partial f}{\partial y}(x, u_0(x), u_0'(x))\,w(x) + \frac{\partial f}{\partial z}(x, u_0(x), u_0'(x))\,w'(x)\Big]\,dx,$$
we get, by integrating by parts,
$$\int_a^b \Big[\frac{\partial f}{\partial z}(x, u_0(x), u_0'(x)) - \int_a^x \frac{\partial f}{\partial y}(\xi, u_0(\xi), u_0'(\xi))\,d\xi\Big]\,w'(x)\,dx = 0. \tag{6.1.20}$$
Using Lemma 6.1.9 we get from (6.1.20) that there is $c \in \mathbb{R}$ such that
$$\frac{\partial f}{\partial z}(x, u_0(x), u_0'(x)) - \int_a^x \frac{\partial f}{\partial y}(\xi, u_0(\xi), u_0'(\xi))\,d\xi = c \tag{6.1.21}$$
for all $x \in [a,b]$. This equality shows that the function (6.1.18) is continuously differentiable and (6.1.19) holds for all $x \in [a,b]$.
In the general case we can consider $u - \frac{u_2 - u_1}{b - a}(x - a) - u_1$ instead of $u$ and apply the previous result to the transformed functional.

Remark 6.1.12. Equation (6.1.19) is the Euler Equation of the functional $F$. Taking the formal derivative of the second term in (6.1.19) we obtain
$$\frac{d}{dx}\frac{\partial f}{\partial z}(x, u_0(x), u_0'(x)) = \frac{\partial^2 f}{\partial x\,\partial z}(x, u_0(x), u_0'(x)) + \frac{\partial^2 f}{\partial y\,\partial z}(x, u_0(x), u_0'(x))\,u_0'(x) + \frac{\partial^2 f}{\partial z^2}(x, u_0(x), u_0'(x))\,u_0''(x).$$
Hence (6.1.19) indicates that $u_0''(x)$ should exist. This motivates the following assertion.
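Theorem 6.1.13 (Regularity of the classical solution). Let $u_0 \in M$ be a local extremum of $F$ with respect to $M$, and let $x_0 \in (a,b)$ be such that
$$\frac{\partial^2 f}{\partial z^2}(x_0, u_0(x_0), u_0'(x_0)) \ne 0.$$
Then there exists $\delta > 0$ such that $u_0 \in C^2(x_0 - \delta, x_0 + \delta)$.
Proof. For $x \in [a,b]$ and $z \in \mathbb{R}$ define a function $\Phi$ by
$$\Phi(x, z) = \frac{\partial f}{\partial z}(x, u_0(x), z) - \int_a^x \frac{\partial f}{\partial y}(\xi, u_0(\xi), u_0'(\xi))\,d\xi - c,$$
where $c$ is the constant from the proof of Proposition 6.1.11. By (6.1.21) we have $\Phi(x_0, u_0'(x_0)) = 0$, and $\frac{\partial\Phi}{\partial z}(x_0, u_0'(x_0)) = \frac{\partial^2 f}{\partial z^2}(x_0, u_0(x_0), u_0'(x_0)) \ne 0$. The Implicit Function Theorem (see Theorem 4.2.1) therefore implies that there exist $\delta_1 > 0$, $\varepsilon > 0$ with the following properties: for any $x \in (x_0 - \delta_1, x_0 + \delta_1)$ there exists a unique $z(x) \in (u_0'(x_0) - \varepsilon, u_0'(x_0) + \varepsilon)$ such that
$$\Phi(x, z(x)) = 0.$$
Moreover, $z \in C^1(x_0 - \delta_1, x_0 + \delta_1)$. The continuity of $u_0'$ and the uniqueness of $z$ imply the existence of $\delta \in (0, \delta_1)$ such that
$$u_0'(x) = z(x) \quad\text{for } x \in (x_0 - \delta, x_0 + \delta),$$
and hence $u_0 \in C^2(x_0 - \delta, x_0 + \delta)$.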

In several situations it is more convenient to look for critical points of $F$ on sets larger than $M$. As we will see later (Section 6.2), this is mainly connected with the fact that the space of continuously differentiable functions $C^1[a,b]$ is not reflexive and does not possess a Hilbert structure either. For this purpose it is more convenient to work in the Sobolev space $W^{1,2}(a,b)$ and to look for extrema of $F$ on the set
$$N = \{u \in W^{1,2}(a,b) : u(a) = u_1,\ u(b) = u_2\}.$$
Notice that it is not obvious whether the functional $F$ is well defined on the set $N$. We have to assume that $f$ satisfies certain growth conditions (see Theorem 3.2.24 and Remark 3.2.25; the Carathéodory property is guaranteed by the continuity of $f$ and its derivatives).
In this case we have the following result.
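Theorem 6.1.14 (Regularity of the weak solution). Let $h \in L^2(a,b)$, $c_1 \ge 0$ be such that for a.a. $x \in [a,b]$ and for all $(y, z) \in \mathbb{R}^2$,
$$|f(x, y, z)| \le h(x) + c_1(y^2 + z^2), \tag{6.1.22}$$
$$\Big|\frac{\partial f}{\partial y}(x, y, z)\Big| \le h(x) + c_1(|y| + |z|), \tag{6.1.23}$$
$$\Big|\frac{\partial f}{\partial z}(x, y, z)\Big| \le h(x) + c_1(|y| + |z|). \tag{6.1.24}$$
Let $u_0 \in W^{1,2}(a,b)$ be a local minimum of $F$ on $N$. For $x \in [a,b]$ and $z \in \mathbb{R}$ set
$$\Phi(x, z) = \frac{\partial f}{\partial z}(x, u_0(x), z).$$
Assume that $\frac{\partial\Phi}{\partial z} > 0$ on $[a,b] \times \mathbb{R}$ and that for every fixed $x \in [a,b]$ the function $z \mapsto \Phi(x, z)$ maps $\mathbb{R}$ onto $\mathbb{R}$. Then $u_0 \in C^2[a,b]$.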
Proof. First, let us assume that $u_1 = u_2 = 0$. Conditions (6.1.22)–(6.1.24) guarantee that $F$ is well defined on $W_0^{1,2}(a,b)$ and that $\delta F(u_0; v)$ exists for any $v \in W^{1,2}(a,b)$.¹⁰ It follows from Proposition 6.1.2 that for any $w \in W_0^{1,2}(a,b)$,
$$\delta F(u_0; w) = \int_a^b \Big[\frac{\partial f}{\partial z}(x, u_0(x), u_0'(x))\,w'(x) + \frac{\partial f}{\partial y}(x, u_0(x), u_0'(x))\,w(x)\Big]\,dx = 0.$$
If we proceed literally as in the proof of Proposition 6.1.11, we arrive at (6.1.21), which now holds for a.a. $x \in [a,b]$. Since the function
$$g(x, z) = \Phi(x, z) - c - \int_a^x \frac{\partial f}{\partial y}(\xi, u_0(\xi), u_0'(\xi))\,d\xi$$
is continuous on $[a,b] \times \mathbb{R}$, by the assumptions on the function $\Phi$, for any $x \in [a,b]$ the equation
$$g(x, z) = 0$$
has a unique solution $z = z(x)$. Moreover, by the Implicit Function Theorem (see Remark 4.2.3, not Theorem 4.2.1!), the function $x \mapsto z(x)$ is continuous on $(a,b)$. It can be shown (Exercise 6.1.21) that it is continuous also at the end points $a$, $b$. So, it follows from (6.1.21) that
$$u_0'(x) = z(x) \quad\text{for a.a. } x \in [a,b], \quad\text{i.e.,}\quad u_0(x) = \int_a^x z(y)\,dy.$$
Hence $u_0 \in C_0^1[a,b]$ and it is a local minimum of $F$ in the space $C_0^1[a,b]$. The assertion now follows from Theorem 6.1.13.
In the general case, we consider again
$$u - \frac{u_2 - u_1}{b - a}(x - a) - u_1$$
instead of $u$ and apply the previous result to the transformed functional.

¹⁰ The reader is invited to check these facts in detail, see Remark 3.2.25.

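Exercise 6.1.15. Consider a function of two real variables
$$F(x, y) = \sin x + \sin y - \sin(x + y)$$
on the set
$$M = \Big(-\frac{\pi}{2}, \frac{3\pi}{2}\Big) \times \Big(-\frac{\pi}{2}, \frac{3\pi}{2}\Big).$$
Prove that $F$ has a local maximum at $\big(\frac{2\pi}{3}, \frac{2\pi}{3}\big)$, a local minimum at $\big(\frac{4\pi}{3}, \frac{4\pi}{3}\big)$, and there is no extremum at the critical point $(0, 0)$. For the graph of $F$ see Figure 6.1.3.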
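As a numerical companion to this exercise (not a substitute for the proof), the following Python sketch evaluates the gradient and the second-derivative test of $F$ at the three points. At $(0, 0)$ the Hessian vanishes, so the test is inconclusive there; the sketch therefore also evaluates $F$ along the diagonal, where $F(t, t) = 2\sin t - \sin 2t = t^3 + O(t^5)$ changes sign. The helper names are hypothetical.

```python
import math


def F(x, y):
    return math.sin(x) + math.sin(y) - math.sin(x + y)


def gradient(x, y):
    s = math.cos(x + y)
    return (math.cos(x) - s, math.cos(y) - s)


def hessian(x, y):
    s = math.sin(x + y)
    return (-math.sin(x) + s, s, -math.sin(y) + s)  # (F_xx, F_xy, F_yy)


def classify(x, y, eps=1e-12):
    """Second-derivative test at a critical point; 'inconclusive' if det H = 0."""
    fxx, fxy, fyy = hessian(x, y)
    det = fxx * fyy - fxy * fxy
    if det > eps:
        return "local max" if fxx < 0 else "local min"
    if det < -eps:
        return "saddle"
    return "inconclusive"


points = [(2 * math.pi / 3, 2 * math.pi / 3), (4 * math.pi / 3, 4 * math.pi / 3), (0.0, 0.0)]
for p in points:
    print(p, gradient(*p), classify(*p))

# At (0, 0) the Hessian vanishes; along the diagonal F(t, t) changes sign.
print(F(0.01, 0.01), F(-0.01, -0.01))
```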

Exercise 6.1.16. Find local and global extrema¹¹ of the functional
$$F: C[0,1] \to \mathbb{R}: \quad F(u) = \int_0^1 \big[|u(t)|^2 + u(t)v(t) + w(t)\big]\,dt,$$
where $v, w \in C[0,1]$ are given functions.

Exercise 6.1.17. Use Theorem 6.1.5 to prove that the solution of the Euler equation (6.1.14) is a local minimum of $\tilde F$ from Example 6.1.10.
Hint. Show that
$$\delta^2\tilde F(v; h, h) \ge \frac{(2c - a)^{\frac32}}{4c\sqrt{gca}}\int_0^a |h'(x)|^2\,dx.$$
Exercise 6.1.18. Prove that the functional
$$F(u) = \int_0^\pi |u(x)|^2\big[1 - |u'(x)|^2\big]\,dx$$
has in $C_0^1[0,\pi]$ a unique local minimum at $u = 0$.


¹¹ The functional $F: X \to \mathbb{R}$ reaches its global minimum over $M \subset X$ if there exists $u_0 \in M$ such that $F(u) \ge F(u_0)$ for all $u \in M$. A global maximum is defined similarly. See Section 6.2 for more detail on the existence of global extrema.

Figure 6.1.3. Graph of $F$.

Exercise 6.1.19 (Weierstrass Example). Prove that the functional
$$F(u) = \int_{-1}^1 x^2|u'(x)|^2\,dx$$
does not attain its global minimum over the set
$$M = \{u \in C^1[-1,1] : u(-1) = -1,\ u(1) = 1\}.$$
Hint. Set $u_n(x) = \frac{\arctan nx}{\arctan n}$ and prove that $\lim_{n\to\infty} F(u_n) = 0$.
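The hint can be made explicit. Since $u_n'(x) = \frac{n}{(1 + n^2x^2)\arctan n}$, the substitution $s = nx$ together with the antiderivative $\frac12\big(\arctan s - \frac{s}{1+s^2}\big)$ of $\frac{s^2}{(1+s^2)^2}$ gives
$$F(u_n) = \frac{\arctan n - \frac{n}{1+n^2}}{n\arctan^2 n} \le \frac{\pi}{2n\arctan^2 n} \to 0.$$
The Python sketch below (our own check, with hypothetical function names) compares this closed form with composite Simpson quadrature of the original integral.

```python
import math


def F_exact(n):
    """F(u_n) for u_n(x) = arctan(n x) / arctan(n), from the closed form above."""
    at = math.atan(n)
    return (at - n / (1.0 + n * n)) / (n * at * at)


def F_simpson(n, intervals=20000):
    """The same integral int_{-1}^{1} x^2 u_n'(x)^2 dx by composite Simpson's rule."""
    at = math.atan(n)

    def integrand(x):
        du = n / ((1.0 + (n * x) ** 2) * at)  # u_n'(x)
        return x * x * du * du

    h = 2.0 / intervals
    total = integrand(-1.0) + integrand(1.0)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * integrand(-1.0 + k * h)
    return total * h / 3.0


for n in (1, 10, 100, 1000):
    print(n, F_exact(n), F_simpson(n))  # values decrease towards 0
```

Since $F \ge 0$ on $M$, the infimum is $0$; the exercise asks to show that no $u \in M$ attains it.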

Exercise 6.1.20. Prove that the functional
$$F(u) = \int_{-1}^1 x^{\frac25}|u'(x)|^2\,dx$$
does not attain its global minimum over the set $M$ from Exercise 6.1.19.
Hint. The corresponding Euler equation has no solution.
Exercise 6.1.21. Prove the following statement:
Let $g: [a,b] \times \mathbb{R} \to \mathbb{R}$ be a function and assume that for any $x \in [a,b]$ the equation $g(x, z) = 0$ has a solution denoted by $z = z(x)$ (not necessarily unique). If
$$\frac{\partial g}{\partial z}(x, z) > 0 \quad\text{on } [a,b] \times \mathbb{R},$$
then this solution is unique. If, moreover, $g$ and $\frac{\partial g}{\partial z}$ are continuous on $[a,b] \times \mathbb{R}$, then $z = z(x)$ is continuous on $[a,b]$ as well.

6.2. Global Extrema


Hint. For the continuity of $z = z(x)$ use the Implicit Function Theorem in the form of Remark 4.2.3 and notice that the Contraction Principle can also be used at the end points $a$, $b$.

6.2 Global Extrema


In contrast with the previous section, we now focus on points of global extrema. The key assertions deal with weakly coercive and weakly sequentially lower semi-continuous functionals. Let us consider a differentiable function of one real variable,
$$F: \mathbb{R} \to \mathbb{R}.$$
It is not difficult to give an example which shows that local extrema of $F$ need not be its global extrema, see Figures 6.2.1 and 6.2.2.

Figure 6.2.1. $F$ attains neither its maximum nor its minimum on $\mathbb{R}$.

Figure 6.2.2. $F$ attains its extrema on $[a, b]$ at $a$ and $b$, respectively.

It is quite natural to ask:
What properties of $F$ guarantee the existence of a point of global extremum of $F$?
First of all, let us note that we can restrict ourselves to global minima, because global maxima of $F$ are global minima of $-F$ and vice versa.
Let us consider the following very simple model example of a function $F: \mathbb{R} \to \mathbb{R}$ which is continuous in a bounded interval $[a,b]$. Then there exists a point $x_0 \in [a,b]$ such that
$$F(x_0) = \min_{x\in[a,b]} F(x),$$
i.e., the minimum of $F$ over $[a,b]$ is attained at the point $x_0$ (see Figure 6.2.3). The proof of this fact is typical for this section. Assume that $\{x_n\}_{n=1}^\infty$ is a minimizing sequence for $F$ on $[a,b]$,¹² i.e.,
$$F(x_n) \to \inf_{x\in[a,b]} F(x). \tag{6.2.1}$$

¹² Note that, for a general $M$, we set $\inf M = -\infty$ if $M$ is not bounded below.

Figure 6.2.3. (The figure shows the graph of $F$ on $[a,b]$, the values $F(a)$, $F(b)$ and the point of minimum $x_0$.)

The compactness of $[a,b]$ implies that there is a subsequence $\{x_{n_k}\}_{k=1}^\infty \subset \{x_n\}_{n=1}^\infty$ and a point $x_0 \in [a,b]$ such that
$$x_{n_k} \to x_0.$$
The continuity of $F$ then implies that
$$F(x_0) = \inf_{x\in[a,b]} F(x).$$
The reader should notice that a property weaker than the continuity of $F$ is sufficient to get this conclusion, namely
$$F(x_0) \le \liminf_{k\to\infty} F(x_{n_k}) \tag{6.2.2}$$
(cf. Definition 6.2.1 below). It follows now from (6.2.1) and (6.2.2) that
$$F(x_0) = \inf_{x\in[a,b]} F(x)$$
(see Figure 6.2.4). If, moreover,
$$F(a) > \inf_{x\in(a,b)} F(x), \qquad F(b) > \inf_{x\in(a,b)} F(x), \tag{6.2.3}$$
then $x_0$ is also a local minimum of $F$ (see Figures 6.2.3 and 6.2.4).


Assume in the sequel of this section that
$$F: X \to \mathbb{R}$$
is a functional on an (infinite-dimensional) Banach space $X$. It is quite natural to ask whether a result similar to the one above holds if $[a,b]$ is replaced by a closed and bounded set $D \subset X$ and (6.2.3) is replaced by
$$\inf_{u\in\partial D} F(u) > \inf_{u\in\operatorname{int} D} F(u).$$

Figure 6.2.4. (The figure shows the values $F(a)$, $F(b)$, $F(x_0)$ and the point $x_0$.)

Unfortunately, the answer is no in general (see Exercise 6.2.23). The reason is that the compactness of the bounded and closed interval $[a,b]$ is the crucial property which plays the essential role in the proof. In fact, one can imitate the proof above to get the following result:
Let $F$ be a lower semi-continuous real functional on a compact set $K \subset X$. Then $F$ has a minimum in $K$.
However, this assertion has practically no applications because compact subsets of the infinite-dimensional Banach space $X$ are too "thin" (see Proposition 1.2.15). For instance, for any compact set $K \subset X$ we have
$$\operatorname{int} K = \emptyset.$$
Because of this fact we have to look for a different (weaker, why?) topology on $X$ than the one induced by the norm. We would like to find a new topology on $X$ with respect to which any bounded (in the norm) set $D \subset X$ is relatively compact. The lower semi-continuity of a functional $F$ with respect to this topology will then allow us to prove the above assertion with $K$ replaced by a set $D$ which is bounded and closed with respect to this new topology. These problems gave an impulse for the study of weak convergence introduced in Definition 2.1.21. The reader should notice that we will discuss weak sequential continuity of functionals instead of weak continuity (these are different concepts since the weak topology is not metrizable in general). The reason is quite practical: weak sequential (semi-)continuity is easier to prove for a concrete (e.g., integral) functional. In order to make the exposition in this section as clear as possible we will restrict our attention to real Hilbert spaces $H$. The reader should keep in mind that the following notions can also be defined in any Banach space.
Definition 6.2.1. Let $F: H \to \mathbb{R}$ be a functional, $M \subset H$. Then $F$ is said to be weakly sequentially lower semi-continuous at a point $u_0 \in M$ relative to $M$ if for any sequence $\{u_n\}_{n=1}^\infty \subset M$ such that $u_n \rightharpoonup u_0$ we have
$$F(u_0) \le \liminf_{n\to\infty} F(u_n).$$
We say that $F$ is weakly sequentially lower semi-continuous in $M \subset H$ if it is weakly sequentially lower semi-continuous at every point $u \in M$ relative to $M$.
Example 6.2.2. The norm $\|\cdot\|$ on $H$ is a weakly sequentially lower semi-continuous functional in $H$, as follows immediately from Proposition 2.1.22(iii).
Example 6.2.3. Let $L: H \to \mathbb{R}$ be a continuous linear form. Then $L$ is weakly sequentially lower semi-continuous in $H$. Indeed, it follows from the Riesz Representation Theorem (Theorem 1.2.40) that there is $v \in H$ such that
$$L(u) = (u, v) \quad\text{for all } u \in H.$$
Hence $u_n \rightharpoonup u_0$ implies $L(u_n) \to L(u_0)$;¹³ in particular,
$$L(u_0) \le \liminf_{n\to\infty} L(u_n).$$
The following assertion is a counterpart of Proposition 1.2.2, which is known as the Extreme Value Theorem for $H = \mathbb{R}$.
Theorem 6.2.4 (Extreme Value Theorem). Let $M$ be a weakly sequentially compact nonempty subset of $H$ and let $F$ be a weakly sequentially lower semi-continuous functional in $M$. Then $F$ is bounded below in $M$, and there exists $u_0 \in M$ such that
$$F(u_0) = \min_{u\in M} F(u).$$
Proof. Let $\{u_n\}_{n=1}^\infty$ be a minimizing sequence for $F$ relative to $M$, i.e.,
$$\{u_n\}_{n=1}^\infty \subset M \quad\text{and}\quad F(u_n) \to \inf_{u\in M} F(u).$$
Since $M$ is weakly sequentially compact, there exist $u_0 \in M$ and a subsequence $\{u_{n_k}\}_{k=1}^\infty \subset \{u_n\}_{n=1}^\infty$ such that
$$u_{n_k} \rightharpoonup u_0.$$
The assumption on $F$ implies
$$\inf_{u\in M} F(u) \le F(u_0) \le \liminf_{k\to\infty} F(u_{n_k}) = \lim_{n\to\infty} F(u_n) = \inf_{u\in M} F(u),$$
i.e.,
$$F(u_0) = \inf_{u\in M} F(u) > -\infty.$$

Corollary 6.2.5. Let $M \subset H$, $F: H \to \mathbb{R}$, and $u_0$ be as in Theorem 6.2.4. Assume, moreover, that $u_0 \in \operatorname{int} M$. If $\delta F(u_0; v)$ exists for a $v \in H$, then
$$\delta F(u_0; v) = 0.$$
Proof. The assumption $u_0 \in \operatorname{int} M$ implies that $F$ also attains its local minimum at $u_0$. The assertion now follows from Proposition 6.1.2.

¹³ If $u_n \rightharpoonup u_0$ implies $F(u_n) \to F(u_0)$, then the functional $F$ is called weakly sequentially continuous at $u_0$.
Example 6.2.6. Let us consider the boundary value problem for the second order ordinary differential equation
$$\begin{cases} -\ddot x(t) + x^3(t) = f(t), & t \in (0,1),\\ x(0) = x(1) = 0, \end{cases} \tag{6.2.4}$$
where $f \in L^2(0,1)$ is a given function. Put $H := W_0^{1,2}(0,1)$ with the norm
$$\|x\| = \Big(\int_0^1 |\dot x(t)|^2\,dt\Big)^{\frac12}.$$
A weak solution¹⁴ of (6.2.4) is a function $x \in H$ for which the integral identity
$$\int_0^1 \dot x(t)\dot y(t)\,dt + \int_0^1 x^3(t)y(t)\,dt = \int_0^1 f(t)y(t)\,dt$$
holds for any function $y \in H$.

Let us define a functional $F: H \to \mathbb{R}$ by¹⁵
$$F(x) = \frac12\int_0^1 |\dot x(t)|^2\,dt + \frac14\int_0^1 |x(t)|^4\,dt - \int_0^1 f(t)x(t)\,dt, \quad x \in H.$$
Then for $x, y \in H$ we have
$$\delta F(x; y) = \int_0^1 \dot x(t)\dot y(t)\,dt + \int_0^1 x^3(t)y(t)\,dt - \int_0^1 f(t)y(t)\,dt,$$
and any critical point of $F$, i.e., $x \in H$ satisfying
$$\delta F(x; y) = 0 \quad\text{for an arbitrary } y \in H,$$
is a weak solution of (6.2.4) and vice versa.


We will show that Corollary 6.2.5 applies to $F$ and a suitably chosen set $M \subset H$. First let us prove that $F$ is a weakly sequentially lower semi-continuous functional on $H$. Consider an arbitrary $z \in H$ and $\{x_n\}_{n=1}^\infty \subset H$ such that $x_n \rightharpoonup z$ in $H$. Due to the compact embedding $H = W_0^{1,2}(0,1) \hookrightarrow\hookrightarrow C[0,1]$ (Theorem 1.2.28(iii)), we have that
$$x_n \to z \quad\text{in } C[0,1]$$
(Proposition 2.2.4(iii)).

¹⁴ For a detailed discussion of the notion of a weak solution see Remark 5.3.10. Note that this weak solution $x_0$ minimizes the energy functional $F$, i.e., it corresponds to the state with the minimal energy of the system.
¹⁵ This functional can represent the energy of a certain system. For this reason it is often called the energy functional.

This implies
$$\int_0^1 |x_n(t)|^4\,dt \to \int_0^1 |z(t)|^4\,dt, \qquad \int_0^1 f(t)x_n(t)\,dt \to \int_0^1 f(t)z(t)\,dt. \tag{6.2.5}$$
The weak sequential lower semi-continuity of the Hilbert norm $\|\cdot\|$ (Example 6.2.2) implies
$$\liminf_{n\to\infty} \|x_n\|^2 \ge \|z\|^2. \tag{6.2.6}$$
We obtain from (6.2.5) and (6.2.6) that
$$\liminf_{n\to\infty} F(x_n) \ge F(z).$$

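To find a suitable set $M$ we first note the inequality¹⁶
$$\|x\|_{L^2(0,1)} \le \|x\|_{W^{1,2}(0,1)}. \tag{6.2.7}$$
Due to this fact we can estimate $F$ using the Hölder inequality as follows (the term $\frac14\int_0^1 |x(t)|^4\,dt$ is nonnegative):
$$F(x) \ge \frac12\|x\|^2 - \|f\|_{L^2(0,1)}\|x\|_{L^2(0,1)} \ge \frac12\|x\|\big(\|x\| - 2\|f\|_{L^2(0,1)}\big). \tag{6.2.8}$$
It is clear that for $\|x\| > 2\|f\|_{L^2(0,1)}$ we have
$$F(x) > 0,$$
and at the same time
$$F(o) = 0.$$
So, taking
$$M = \{x \in H : \|x\| \le 2\|f\|_{L^2(0,1)} + 1\},$$
the assumptions of Corollary 6.2.5 are fulfilled, since a closed ball in a Hilbert space is weakly sequentially compact (see Theorem 2.1.25 and Proposition 2.1.22(iii)); moreover, $\min_M F \le F(o) = 0 < F(x)$ whenever $\|x\| = 2\|f\|_{L^2(0,1)} + 1$, so the minimizer lies in $\operatorname{int} M$. We then conclude that there exists at least one weak solution $x_0 \in H$ of the boundary value problem (6.2.4).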
From (6.2.8) it is easy to see that the functional $F$ from the previous example satisfies
$$\lim_{\|x\|\to\infty} F(x) = \infty.$$
This motivates the following general definition.
Definition 6.2.7. A functional $F: H \to \mathbb{R}$ is said to be weakly coercive on $H$ if
$$\lim_{\|u\|\to\infty} F(u) = \infty.$$

¹⁶ This follows by a direct calculation using the Hölder inequality for $x(t) = \int_0^t \dot x(s)\,ds$.

This notion together with Corollary 6.2.5 leads to the following global result.
Theorem 6.2.8. Let $F: H \to \mathbb{R}$ be a weakly sequentially lower semi-continuous and weakly coercive functional. Then $F$ is bounded below on $H$, and there exists $u_0 \in H$ such that
$$F(u_0) = \min_{u\in H} F(u).$$
Moreover, if $\delta F(u_0; v)$ exists for a $v \in H$, then
$$\delta F(u_0; v) = 0.$$
Proof. Let $d > \inf_{u\in H} F(u)$. There exists $R > 0$ such that for $u \in H$, $\|u\| \ge R$, we have
$$F(u) \ge d.$$
Hence
$$\inf_{\|u\|\le R} F(u) = \inf_{u\in H} F(u).$$
Now, we apply Theorem 6.2.4 with
$$M = \{u \in H : \|u\| \le R\}.$$
The assertion on a directional derivative follows from Corollary 6.2.5.

From the point of view of applications, it is convenient to have sufficient conditions in the language of the topology on $H$ induced by the norm which guarantee that
• the set $M$ is weakly sequentially compact;
• the functional $F$ is weakly sequentially lower semi-continuous in $M$.¹⁷
We recall the results from Chapter 2 which state that
every closed, convex and bounded set $M \subset H$ is weakly sequentially compact
(see Exercise 2.1.39, Theorem 2.1.25 and Remark 2.1.24). Concerning the desired property of $F$ we need the following auxiliary assertion.
Lemma 6.2.9. Let $M \subset H$. Then $F: H \to \mathbb{R}$ is weakly sequentially lower semi-continuous in $M$ if and only if for every $a \in \mathbb{R}$ the set
$$E(a) = \{u \in M : F(u) \le a\}$$
is weakly sequentially closed in $M$.¹⁸

¹⁷ Not every continuous functional is weakly sequentially lower semi-continuous (cf. Exercise 6.2.31).
¹⁸ The set $E \subset M$ is called weakly sequentially closed in $M$ if for any $\{x_n\}_{n=1}^\infty \subset E$, $x_n \rightharpoonup x \in M$, we have $x \in E$.

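Proof. Let $F$ be a weakly sequentially lower semi-continuous functional in $M$, $a \in \mathbb{R}$, $\{u_n\}_{n=1}^\infty \subset E(a)$, $u_n \rightharpoonup u_0$, $u_0 \in M$. Then
$$F(u_0) \le \liminf_{n\to\infty} F(u_n) \le a, \quad\text{i.e.,}\quad u_0 \in E(a).$$
Hence $E(a)$ is weakly sequentially closed in $M$.
On the other hand, assume that for every $a \in \mathbb{R}$ the set $E(a)$ is weakly sequentially closed in $M$. Let $\{u_n\}_{n=1}^\infty \subset M$, $u_n \rightharpoonup u_0 \in M$, and denote
$$\alpha = \liminf_{n\to\infty} F(u_n).$$
If $\alpha = +\infty$, there is nothing to prove. Otherwise there is a subsequence $\{u_{n_k}\}_{k=1}^\infty$ such that
$$F(u_{n_k}) \to \alpha.$$
For any real $\beta > \alpha$ we have $u_{n_k} \in E(\beta)$ for $k$ sufficiently large. Since $E(\beta)$ is weakly sequentially closed in $M$, $u_0 \in E(\beta)$, i.e., $F(u_0) \le \beta$. Letting $\beta \searrow \alpha$ (this also excludes $\alpha = -\infty$, since $F(u_0)$ is a real number) we obtain
$$F(u_0) \le \liminf_{n\to\infty} F(u_n).$$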
Proposition 6.2.10. Let $F$ be a convex and continuous functional defined in a convex set $M \subset H$. Then $F$ is weakly sequentially lower semi-continuous in $M$.
Proof. It follows from the convexity of $F$ that the set
$$E(a) = \{u \in M : F(u) \le a\}$$
is convex. The continuity of $F$ implies that $E(a)$ is closed in $M$. It follows from Exercise 2.1.39 and Remark 2.1.24 that it is also weakly sequentially closed in $M$. The result now follows from Lemma 6.2.9.

These results combined with Theorems 6.2.4 and 6.2.8 allow us to formulate the following assertions, very often used in applications.
Theorem 6.2.11. Let $M$ be a closed, convex, bounded and nonempty subset of $H$. Let $F: H \to \mathbb{R}$ be a convex and continuous functional on $M$. Then $F$ is bounded below on $M$ and there exists $u_0 \in M$ such that
$$F(u_0) = \inf_{u\in M} F(u).$$
If, moreover, $F$ is strictly convex, then $u_0$ is the unique point with this property.¹⁹
Theorem 6.2.12. Let $F: H \to \mathbb{R}$ be continuous, convex and weakly coercive on $H$. Then $F$ is bounded below on $H$, and there exists $u_0 \in H$ such that
$$F(u_0) = \inf_{u\in H} F(u).$$
If $\delta F(u_0; v)$ exists for a $v \in H$, then
$$\delta F(u_0; v) = 0.$$
If, moreover, $F$ is strictly convex, then $u_0$ is uniquely determined.

¹⁹ The reader is invited to prove the uniqueness of $u_0$!
Example 6.2.13. For any real continuous linear form $L: H \to \mathbb{R}$ there exists $u \in H$ such that $\|u\| = 1$ and
$$\|L\| = L(u).$$
Indeed, the set $M = \{u \in H : \|u\| \le 1\}$ and the functional $F = -L$ satisfy the assumptions of Theorem 6.2.11. Hence there exists $u_0 \in M$ such that
$$-L(u_0) = \inf_{u\in M}\big(-L(u)\big).$$
By the linearity of $L$ and the symmetry of $M$ we have
$$\inf_{u\in M}\big(-L(u)\big) = -\sup_{u\in M} |L(u)|, \quad\text{i.e.,}\quad L(u_0) = \sup_{u\in M} |L(u)| = \|L\|.$$
Assume that $L \ne 0$ and $\|u_0\| < 1$. Then there exists $t > 1$ such that $\|tu_0\| = 1$, i.e., $tu_0 \in M$, and
$$L(tu_0) = tL(u_0) = t\|L\| > \sup_{u\in M} |L(u)|,$$
a contradiction.
Note that this assertion can be proved directly using the Riesz Representation Theorem (Theorem 1.2.40).
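In finite dimensions the statement reduces to the Cauchy–Schwarz inequality: if $L(u) = (u, v)$, then $u_0 = v/\|v\|$ works. The following Python sketch, with a hypothetical vector $v \in \mathbb{R}^3$ standing in for $H$, checks this numerically by sampling random unit vectors. It illustrates only the conclusion; the compactness argument of the example is what matters in infinite dimension.

```python
import math
import random


def norm(u):
    return math.sqrt(sum(x * x for x in u))


v = [3.0, -4.0, 12.0]  # hypothetical Riesz representer: L(u) = (u, v), ||L|| = ||v|| = 13


def L(u):
    return sum(a * b for a, b in zip(u, v))


u0 = [x / norm(v) for x in v]  # unit vector at which L attains ||L||
print(L(u0), norm(v))

random.seed(1)
best = 0.0
for _ in range(10000):
    w = [random.gauss(0.0, 1.0) for _ in v]
    w = [x / norm(w) for x in w]  # random point of the unit sphere
    best = max(best, abs(L(w)))
print(best)  # never exceeds L(u0), by the Cauchy-Schwarz inequality
```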
Example 6.2.14. Let us consider the boundary value problem (6.2.4) and the energy functional
$$F(x) = \frac12\int_0^1 |\dot x(t)|^2\,dt + \frac14\int_0^1 |x(t)|^4\,dt - \int_0^1 f(t)x(t)\,dt, \quad x \in H := W_0^{1,2}(0,1),$$
associated with (6.2.4). We have actually proved in Example 6.2.6 that $F$ is weakly coercive on $H$. The continuity of $F$ on $H$ follows from the continuity of the norm in $H$, the continuity of the embedding $H = W_0^{1,2}(0,1) \hookrightarrow L^4(0,1)$ and from the continuity of the linear form
$$x \mapsto \int_0^1 f(t)x(t)\,dt \quad\text{on } H$$
under the assumption $f \in L^2(0,1)$. The strict convexity of $F$ follows from the strict convexity of the real functions
$$t \mapsto t^2, \qquad t \mapsto t^4,$$
and the convexity of the linear form. We conclude (see Theorem 6.2.12) that there exists a unique $x_0 \in H$ such that
$$F(x_0) = \min_{x\in H} F(x).$$
It then follows from Proposition 6.1.8 that $x_0$ is the unique weak solution of (6.2.4).
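Theorem 6.2.12 guarantees existence and uniqueness of the minimizer but does not say how to compute it. As a hedged illustration (our own discretization, not taken from the text), the Python sketch below replaces $F$ by a finite-difference energy on a uniform grid with $x_0 = x_{N+1} = 0$, which is again strictly convex, and minimizes it by Newton's method with a simple energy-decrease backtracking. The helper names and the manufactured right-hand side $f = \pi^2\sin\pi t + \sin^3\pi t$ are assumptions, chosen so that the exact solution $x(t) = \sin\pi t$ is known.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i] (Thomas algorithm)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def energy(x, f, h):
    """Discrete F(x) = 1/2 int |x'|^2 + 1/4 int x^4 - int f x with x(0) = x(1) = 0."""
    p = [0.0] + x + [0.0]
    return (sum((p[i + 1] - p[i]) ** 2 for i in range(len(p) - 1)) / (2.0 * h)
            + h * sum(v ** 4 for v in x) / 4.0
            - h * sum(fi * xi for fi, xi in zip(f, x)))


def solve_bvp(f_func, n=200, tol=1e-8, max_iter=50):
    """Minimize the discrete energy by Newton's method with energy backtracking."""
    h = 1.0 / (n + 1)
    t = [(i + 1) * h for i in range(n)]
    f = [f_func(ti) for ti in t]
    x = [0.0] * n
    for _ in range(max_iter):
        p = [0.0] + x + [0.0]
        # residual of -x'' + x^3 - f, i.e. the energy gradient divided by h
        r = [(2.0 * p[i + 1] - p[i] - p[i + 2]) / h ** 2 + x[i] ** 3 - f[i] for i in range(n)]
        if max(abs(v) for v in r) < tol:
            break
        diag = [2.0 / h ** 2 + 3.0 * xi ** 2 for xi in x]  # Hessian / h, positive definite
        off = [-1.0 / h ** 2] * n
        step = thomas(off, diag, off, [-v for v in r])
        e0, s = energy(x, f, h), 1.0
        trial = [xi + di for xi, di in zip(x, step)]
        while energy(trial, f, h) > e0 + 1e-12 * (1.0 + abs(e0)) and s > 1e-6:
            s *= 0.5
            trial = [xi + s * di for xi, di in zip(x, step)]
        x = trial
    return t, x


t, x = solve_bvp(lambda s: math.pi ** 2 * math.sin(math.pi * s) + math.sin(math.pi * s) ** 3)
print(max(abs(xi - math.sin(math.pi * ti)) for ti, xi in zip(t, x)))  # O(h^2) discretization error
```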
Remark 6.2.15. The reader should compare Examples 6.2.6 and 6.2.14. In the latter we have used Theorem 6.2.12, which enables us to avoid verifying the weak sequential lower semi-continuity of $F$. This might be a difficult task in general (it cannot always be done as easily as in Example 6.2.6 by means of a compact embedding).
The reader should also notice that the continuity of $F$ without any additional assumptions does not imply the weak sequential lower semi-continuity of $F$ (see Exercise 6.2.31).
In the last part of this section we show another possibility for finding critical points of $F$ under the assumption that $F$ is differentiable. First we need two auxiliary assertions.
Lemma 6.2.16. Let $F$ be a functional defined on $H$ and $\nabla F$ its gradient.²⁰ Let $\nabla F: H \to H$ be a monotone operator. Then $F$ is weakly sequentially lower semi-continuous on $H$.
Proof. Let $u, v \in H$. According to the Mean Value Theorem applied to the real function $\varphi: s \mapsto F(v + s(u - v))$, $s \in [0,1]$, there exists $t \in (0,1)$ such that
$$\begin{aligned} F(u) - F(v) &= (\nabla F(v + t(u - v)), u - v)\\ &= (\nabla F(v), u - v) + (\nabla F(v + t(u - v)) - \nabla F(v), u - v)\\ &\ge (\nabla F(v), u) - (\nabla F(v), v). \end{aligned} \tag{6.2.9}$$
(The inequality holds because the second scalar product equals $\frac1t(\nabla F(w) - \nabla F(v), w - v) \ge 0$ with $w = v + t(u - v)$.)²¹
Let $\{v_n\}_{n=1}^\infty$ be a sequence in $H$ such that $v_n \rightharpoonup v$ in $H$; in particular, $(\nabla F(v), v_n) \to (\nabla F(v), v)$. It follows from (6.2.9) with $u = v_n$ that
$$\liminf_{n\to\infty} F(v_n) \ge F(v) - (\nabla F(v), v) + \lim_{n\to\infty}(\nabla F(v), v_n) = F(v).$$

²⁰ Remember that, according to the Riesz Representation Theorem (Theorem 1.2.40), the Gâteaux derivative $DF(u)$ is identified with an element of $H$ which is denoted by $\nabla F(u)$ and called the gradient of $F$ at $u$. Remember also that $\nabla F$ is a mapping from $H$ into itself. (Cf. Example 3.2.4.)
²¹ Since the monotonicity of $\nabla F$ implies that $\varphi'$ is increasing, $\varphi$ is a convex function, i.e., $F$ is convex.

Definition 6.2.17. Let $T: H \to H$ be an operator from $H$ into itself. We say that $T$ is coercive if
$$\lim_{\|u\|\to\infty} \frac{(T(u), u)}{\|u\|} = \infty.$$
Lemma 6.2.18. Let $F: H \to \mathbb{R}$ be a functional and $\nabla F: H \to H$ its gradient. Let $\nabla F$ be a coercive and bounded operator. Then $F$ is weakly coercive.
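Proof. Since
$$\frac{d}{dt}F(tu) = (\nabla F(tu), u),$$
we obtain by integration
$$F(u) = F(o) + \int_0^1 (\nabla F(tu), tu)\,\frac{dt}{t} = F(o) + \int_0^{\|u\|} \Big(\nabla F\Big(\tau\frac{u}{\|u\|}\Big), \frac{u}{\|u\|}\Big)\,d\tau$$
for any $u \in H$, $u \ne o$. The coercivity of $\nabla F$ implies that there exists $r \ge 0$ such that
$$\Big(\nabla F\Big(\tau\frac{u}{\|u\|}\Big), \frac{u}{\|u\|}\Big) \ge 1 \quad\text{for any } \tau \ge r \text{ and } u \in H,\ u \ne o.$$
The boundedness of $\nabla F$ implies
$$m := \sup_{\substack{\tau\in[0,r]\\ u\in H,\ u\ne o}} \Big\|\nabla F\Big(\tau\frac{u}{\|u\|}\Big)\Big\| < \infty.$$
Consequently, we obtain
$$F(u) = F(o) + \int_0^r \Big(\nabla F\Big(\tau\frac{u}{\|u\|}\Big), \frac{u}{\|u\|}\Big)\,d\tau + \int_r^{\|u\|} \Big(\nabla F\Big(\tau\frac{u}{\|u\|}\Big), \frac{u}{\|u\|}\Big)\,d\tau \ge F(o) - rm + \|u\| - r$$
for any $u \in H$, $\|u\| > r$. The last inequality yields the weak coercivity of $F$.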

Remark 6.2.19. Let $F: H \to \mathbb{R}$, $h \in H$. Assume that the gradient $\nabla F(u)$ of $F$ exists at any point $u \in H$. Then the following equivalence obviously holds true:
There exists $u_0 \in H$ such that $\nabla F(u_0) = h$ if and only if there exists $u_0 \in H$ such that $\nabla G(u_0) = o$, where
$$G: u \mapsto F(u) - (u, h). \tag{6.2.10}$$

Theorem 6.2.20. Let $F: H \to \mathbb{R}$ and let $\nabla F: H \to H$ be the gradient of $F$. Let $\nabla F$ be a monotone, coercive and bounded operator. Then $\nabla F(H) = H$.²²
Proof. It follows from Remark 6.2.19 that it is enough to prove that for any $h \in H$, the functional $G$ defined by (6.2.10) has a critical point. Its gradient $\nabla G(u) = \nabla F(u) - h$ is again monotone, coercive and bounded (note that $\frac{(\nabla G(u), u)}{\|u\|} \ge \frac{(\nabla F(u), u)}{\|u\|} - \|h\|$). Hence Lemmas 6.2.16 and 6.2.18 yield that $G$ is weakly sequentially lower semi-continuous and weakly coercive. The existence of a critical point of $G$ follows from Theorem 6.2.8.

Example 6.2.21. Let us consider again the boundary value problem (6.2.4) and the associated energy functional
$$F(x) = \frac12\int_0^1 |\dot x(t)|^2\,dt + \frac14\int_0^1 |x(t)|^4\,dt - \int_0^1 f(t)x(t)\,dt.$$
Then
$$DF(x)y = \int_0^1 \dot x(t)\dot y(t)\,dt + \int_0^1 x^3(t)y(t)\,dt - \int_0^1 f(t)y(t)\,dt$$
is the Gâteaux derivative of $F$ in $H := W_0^{1,2}(0,1)$. We verify the assumptions of Theorem 6.2.20. Using the continuous embedding $H = W_0^{1,2}(0,1) \hookrightarrow C[0,1]$ (Theorem 1.2.26) we prove the boundedness (and even continuity!) of $\nabla F$ in the space $H$. Since $s \mapsto s^3$ is monotone, we have
$$(\nabla F(x_1) - \nabla F(x_2), x_1 - x_2) = \int_0^1 |\dot x_1(t) - \dot x_2(t)|^2\,dt + \int_0^1 \big(x_1^3(t) - x_2^3(t)\big)\big(x_1(t) - x_2(t)\big)\,dt \ge \|x_1 - x_2\|^2 \tag{6.2.11}$$
for $x_1, x_2 \in H$, and the monotonicity of $\nabla F$ follows.
Finally, we have
$$\frac{(\nabla F(x), x)}{\|x\|} = \|x\| + \frac{\|x\|^4_{L^4(0,1)}}{\|x\|} - \frac{1}{\|x\|}\int_0^1 f(t)x(t)\,dt \ge \|x\| - \|f\|_{L^2(0,1)}\,\frac{\|x\|_{L^2(0,1)}}{\|x\|}.$$
Using the inequality (6.2.7) we get
$$\frac{(\nabla F(x), x)}{\|x\|} \ge \|x\| - \|f\|_{L^2(0,1)},$$
i.e., $\nabla F$ is coercive. We conclude from Theorem 6.2.20 that
$$\nabla F(H) = H;$$
in particular, there exists $x_0 \in H$ such that
$$\nabla F(x_0) = o.$$
Hence $x_0$ is a weak solution of (6.2.4). The estimate (6.2.11) then implies the uniqueness of $x_0$.²³

²² Compare this result with Theorem 5.3.4.
Remark 6.2.22. Most of the previous results hold true when the Hilbert space $H$ is replaced by a real reflexive Banach space $X$ and the scalar product $(\cdot,\cdot)$ is replaced by the duality pairing $\langle\cdot,\cdot\rangle$ between $X^*$ and $X$, i.e., for $f \in X^*$ and $x \in X$ we write
$$\langle f, x\rangle := f(x).$$
However, the proofs are technically more involved and the gradient $\nabla F$ has to be replaced by the Gâteaux derivative $DF$.

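Exercise 6.2.23. Let $\{e_n\}_{n=1}^\infty$ be an orthonormal basis in a Hilbert space $H$. Put
$$D_n = \Big\{x \in H : \|x - e_n\| \le \frac12\Big\}$$
and define a functional
$$f(x) = \begin{cases} \|x\| & \text{for } x \notin \bigcup_{n=1}^\infty D_n,\\ \|x\| + \frac{2(n-1)}{n}\Big(\frac12 - \|x - e_n\|\Big) & \text{for } x \in D_n. \end{cases}$$
Show that $f$ is continuous on $H$,
$$\sup_{\|x\|\le\frac32} f(x) = 2,$$
but $f$ does not attain its maximum on the ball
$$\Big\{x \in H : \|x\| \le \frac32\Big\}.$$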
Exercise 6.2.24. The mapping $U: \mathbb{R}^2 \to \mathbb{R}^2$ is defined by
$$U: (x, y) \mapsto (-y, x).$$
Prove that $U$ is monotone and satisfies
$$\lim_{\|(x,y)\|\to\infty} \|U(x, y)\| = \infty,$$
but is not coercive.
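A quick numerical illustration for this exercise (not a proof; the helper names are hypothetical): the Python sketch below checks that $(U(u) - U(v), u - v) = 0$ for random pairs, so $U$ is monotone, and that along $u = (r, r)$ the norm $\|U(u)\| = \sqrt2\,r$ grows while $\frac{(U(u), u)}{\|u\|}$ stays equal to $0$, so $U$ is not coercive.

```python
import math
import random


def U(p):
    x, y = p
    return (-y, x)


def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]


random.seed(0)
worst = 0.0
for _ in range(1000):
    u = (random.uniform(-10, 10), random.uniform(-10, 10))
    v = (random.uniform(-10, 10), random.uniform(-10, 10))
    du = (u[0] - v[0], u[1] - v[1])
    dU = (U(u)[0] - U(v)[0], U(u)[1] - U(v)[1])
    worst = max(worst, abs(dot(dU, du)))  # monotone with equality: (U(u) - U(v), u - v) = 0
print(worst)

for r in (1.0, 10.0, 100.0):
    u = (r, r)
    # ||U(u)|| grows linearly in r, while (U(u), u) / ||u|| stays 0
    print(r, math.hypot(*U(u)), dot(U(u), u) / math.hypot(*u))
```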


²³ The reader is invited to apply Theorem 5.3.4 to get the same result.

Exercise 6.2.25. Prove that any coercive map $F: H \to H$ satisfies
$$\lim_{\|u\|\to\infty} \|F(u)\| = \infty.$$
Exercise 6.2.26. Prove that the same conclusion as in Example 6.2.14 holds true also if $f \in L^1(0,1)$.
Hint. Use the embedding $W_0^{1,2}(0,1) \hookrightarrow L^\infty(0,1)$.
Exercise 6.2.27. Prove that the norm on $H$ and linear forms on $H$ are convex functionals.
Exercise 6.2.28. Prove that in Theorem 6.2.12 the weak coercivity of $F$ can be replaced by a weaker assumption:
For any $u \in H$ there exists $r > 0$ such that for all $v \in H$, $\|v\| \ge r$, we have $F(v) > F(u)$.
Exercise 6.2.29. Let M be an open convex subset of a real Hilbert space H, let
F : H R be a functional such that for any u M there exists the second
G
ateaux derivative D2 F (u). Prove that
(a)

(b)

(c)

where
(a) D2 F (u)(h, h) 0 for u M, h H;
(b) (F (u) F (v), u v) 0 for u, v M;
(c) F is convex on M.
Hint. Use the Mean Value Theorem (see Theorem 3.2.6) as for real functions.
Exercise 6.2.30. Prove that for any n N and f L2 (0, 1) the boundary value
problem


x(t) + x2n+1 (t) = f (t),


t (0, 1),
x(0) = x(1) = 0
has a unique weak solution.
Exercise 6.2.31. Let f be the functional from Exercise 6.2.23. Prove that f is
not weakly sequentially upper semi-continuous (i.e., f is not weakly sequentially
lower semi-continuous).
Hint. Remember that en  o.

6.2A Ritz Method


In this part of the text we address one fundamental numerical approach to finding the global minimum of a real functional on a real Banach space. In applications such a minimum corresponds to a solution of a certain boundary value problem, and the general method discussed below is a starting point for many numerical methods. Let us mention the Galerkin Method, the Finite Elements Method, the Katchanov–Galerkin Method, etc., which are powerful tools in the numerical solution of differential equations.
Let $X$ be a real Banach space and $F$ a real functional defined on $X$. An element $u_0 \in X$ satisfying
$$F(u_0) = \inf_{u \in X} F(u) \qquad (6.2.12)$$
will be called a solution of the variational problem (6.2.12). We will discuss the Ritz Method, which directly yields an algorithm for finding a solution of the variational problem. The basic idea of the Ritz Method is rather simple:
Instead of looking for the minimum of the functional $F$ on the entire space $X$, we look for its minimum on suitable subspaces of the space $X$ in which we know how to solve the variational problem.
Let us now formulate this idea precisely:
To every $n \in \mathbb{N}$, let a closed subspace $X_n$ of the space $X$ be assigned. The problem of finding an element $u_n \in X_n$ such that
$$F(u_n) = \inf_{u \in X_n} F(u) \qquad (6.2.13)$$
holds is called the Ritz approximation of the problem (6.2.12), and the element $u_n \in X_n$ is called a solution of the problem (6.2.13).
The following two fundamental problems immediately present themselves:
(a) the problem of the existence and uniqueness of a solution of the problem (6.2.13);
(b) the relation between the solutions of the problems (6.2.12) and (6.2.13).
Problem (a) has already been solved by Theorem 6.2.12 in the framework of Hilbert spaces. It follows from Remark 6.2.22 that the same assertion can be proved in a reflexive Banach space $X$. Since a closed subspace $X_n$ of a reflexive Banach space $X$ is also a reflexive Banach space, we have the following assertion, which follows directly from Theorem 6.2.12 and Remark 6.2.22.
Proposition 6.2.32. Let $X$ be a reflexive Banach space, and let a functional $F$ defined on the space $X$ be continuous, strictly convex and weakly coercive on $X$. Then each of the problems (6.2.12) and (6.2.13) has precisely one solution, $u_0$ and $u_n$ respectively.
We now focus our effort on problem (b). We investigate under what condition
$$\lim_{n \to \infty} \|u_0 - u_n\| = 0 \qquad (6.2.14)$$
is true. If (6.2.14) is valid, then we say that the Ritz Method converges for the problem (6.2.12) and that the solutions $u_n$ of the problems (6.2.13) approximate the solution of the problem (6.2.12) in the sense of the norm of the space $X$.
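Proposition 6.2.33. Let $F$ be a continuous functional on a normed linear space $X$, and let $\{X_n\}_{n=1}^\infty$ be a sequence of closed subspaces of $X$ such that for every $v \in X$ there exist elements $v_n \in X_n$, $n \in \mathbb{N}$, such that
$$\lim_{n \to \infty} \|v - v_n\| = 0. \qquad (6.2.15)$$
Let $u_n$ be an element of $X_n$ such that (6.2.13) holds. Then $\{u_n\}_{n=1}^\infty$ is a minimizing sequence for the functional $F$ on $X$, i.e.,
$$\lim_{n \to \infty} F(u_n) = \inf_{u \in X} F(u). \qquad (6.2.16)$$
Proof. Let $\{\alpha_k\}_{k=1}^\infty$ be a sequence such that $\alpha_k > \inf_{u \in X} F(u)$ and
$$\alpha_k \searrow \inf_{u \in X} F(u).$$
Then there exist elements $v^{(k)} \in X$ for which
$$F\bigl(v^{(k)}\bigr) < \alpha_k.$$
By the assumption (6.2.15) we can find $w_n^{(k)} \in X_n$ satisfying $w_n^{(k)} \to v^{(k)}$ for $n \to \infty$. Since $u_n$ minimizes $F$ on $X_n$,
$$\inf_{u \in X} F(u) \le F(u_n) \le F\bigl(w_n^{(k)}\bigr).$$
By the continuity of $F$ we get
$$\limsup_{n \to \infty} F(u_n) \le \lim_{n \to \infty} F\bigl(w_n^{(k)}\bigr) = F\bigl(v^{(k)}\bigr) < \alpha_k.$$
Since $k$ is arbitrary and $\alpha_k \to \inf_{u \in X} F(u)$, this implies that
$$\lim_{n \to \infty} F(u_n) = \inf_{u \in X} F(u).$$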

The assertion on the convergence of the Ritz Method for the problem (6.2.12) is the following proposition.
Proposition 6.2.34 (Ritz Method). Let $H$ be a real Hilbert space,$^{24}$ and let $F$ be a continuous functional on the space $H$ which has, at every $u \in H$, the second Gâteaux derivative$^{25}$ $D^2F(u) \in B_2(H, \mathbb{R})$. Assume, further, that there exists a constant $c > 0$ such that for all $u, v \in H$ we have
$$D^2F(u)(v, v) \ge c\|v\|^2. \qquad (6.2.17)$$
Let subspaces $H_n$ of the space $H$ satisfy condition (6.2.15). Then
(i) there exists precisely one solution $u_0 \in H$ of problem (6.2.12);
(ii) for every $n \in \mathbb{N}$ there exists precisely one solution $u_n \in H_n$ of problem (6.2.13);
(iii) the Ritz Method converges for problem (6.2.12), i.e.,
$$\lim_{n \to \infty} \|u_0 - u_n\| = 0.$$

24 We will state and prove Proposition 6.2.34 in the Hilbert space setting. The generalization to the Banach space setting can be obtained (cf. Remark 6.2.22). The reader can find details in specialized literature (see, e.g., Saaty [116]).
25 See Section 3.2.

Proof. It follows from the Taylor Formula (Proposition 3.2.27) that
$$F(u + v) = F(u) + DF(u)(v) + \int_0^1 (1 - t)\,D^2F(u + tv)(v, v)\,dt. \qquad (6.2.18)$$
Choosing $u = o$, we have, due to (6.2.17),
$$F(v) = F(o) + DF(o)(v) + \int_0^1 (1 - t)\,D^2F(tv)(v, v)\,dt \ge F(o) + \frac{c}{2}\|v\|^2 - \|DF(o)\|_{H^*}\|v\| = F(o) + \|v\|\Bigl(\frac{c}{2}\|v\| - \|DF(o)\|_{H^*}\Bigr)$$
for $v \in H$. This implies that $F$ is weakly coercive on $H$ (and also on $H_n$ for arbitrary $n$).
Choosing now $w := u + v$ in (6.2.18), for $w \ne u$ (i.e., for $v \ne o$) we conclude from (6.2.18) and (6.2.17) that
$$F(w) - F(u) - DF(u)(w - u) \ge \frac{c}{2}\|w - u\|^2 > 0.$$
In particular, for $u = tw_1 + (1 - t)w_2$, $w_1 \ne w_2$, $t \in (0,1)$, we have
$$F(w_1) - F(u) > (1 - t)\,DF(u)(w_1 - w_2), \qquad F(w_2) - F(u) > -t\,DF(u)(w_1 - w_2).$$
Multiplying the first inequality by $t$ and the second by $(1 - t)$ and adding them, we obtain $tF(w_1) + (1 - t)F(w_2) > F(u)$, i.e., $F$ is strictly convex on $H$ (and also on $H_n$ for arbitrary $n$).
The assertions (i) and (ii) now follow from Theorem 6.2.12.
It remains to prove assertion (iii). Let $u_0$ and $u_n$ be the solutions of (6.2.12) and (6.2.13), respectively. Set $u := u_0$ and $v := u_n - u_0$ in (6.2.18). From (6.2.17) and (6.2.18) we obtain
$$F(u_n) \ge F(u_0) + DF(u_0)(u_n - u_0) + \frac{c}{2}\|u_n - u_0\|^2.$$
Since $u_0 \in H$ is the minimum point for $F$ on $H$, it follows from Theorem 6.2.12 that
$$DF(u_0)(u_n - u_0) = 0,$$
i.e.,
$$F(u_n) \ge F(u_0) + \frac{c}{2}\|u_n - u_0\|^2 \qquad (6.2.19)$$
holds for arbitrary $n \in \mathbb{N}$. On the other hand, due to Proposition 6.2.33, the elements $u_n$, $n \in \mathbb{N}$, constitute a minimizing sequence for $F$ on $H$, i.e.,
$$\lim_{n \to \infty} F(u_n) = \inf_{u \in H} F(u) = F(u_0). \qquad (6.2.20)$$
It follows from (6.2.19) and (6.2.20) that
$$\lim_{n \to \infty} \|u_0 - u_n\| = 0,$$
and the proof is complete.

So far, we have answered problems (a) and (b), formulated at the beginning of this appendix, from a theoretical point of view. From the point of view of practical (numerical) calculations, however, the most interesting problems start right now. The most frequent and most important case in practice is when the spaces $H_n$ are of finite dimension, e.g., $\dim H_n = N$.
If $e_1, \dots, e_N$ is a basis of $H_n$ and
$$F_n(c_1, \dots, c_N) := F\Bigl(\sum_{i=1}^N c_i e_i\Bigr),$$
then the problem (6.2.13) means to find $\hat c = (\hat c_1, \dots, \hat c_N) \in \mathbb{R}^N$ such that
$$F_n(\hat c_1, \dots, \hat c_N) = \inf_{(c_1, \dots, c_N) \in \mathbb{R}^N} F_n(c_1, \dots, c_N). \qquad (6.2.21)$$
If the assumptions of Proposition 6.2.34 are satisfied, then the function $F_n$ is continuous and strictly convex on the space $\mathbb{R}^N$, and it satisfies
$$\lim_{\|c\| \to \infty} F_n(c) = \infty.$$
Moreover, the vector $\hat c$ is a solution of problem (6.2.21) if and only if all first-order partial derivatives of the function $F_n$ vanish at $\hat c$ (cf. Theorem 6.2.12). Thus the problem of finding a solution of problem (6.2.21) is equivalent to the problem of finding a solution of the system
$$\frac{\partial F_n}{\partial c_1}(c_1, \dots, c_N) = 0, \quad \dots, \quad \frac{\partial F_n}{\partial c_N}(c_1, \dots, c_N) = 0. \qquad (6.2.22)$$
The system (6.2.22) is a system of $N$ algebraic equations, which are generally nonlinear. Note, however, that if the functional $F$ is quadratic, then the system (6.2.22) is a system of linear algebraic equations.
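For the quadratic case just mentioned, the linearity of (6.2.22) can be seen in two lines. The derivation below is a sketch under an assumption not stated in the text: $F(u) = \frac12 a(u,u) - \ell(u)$, where $a$ is a symmetric continuous bilinear form on $H$ and $\ell$ is a continuous linear functional. The symbols $a$, $\ell$, $\mathbf{A}$, $\mathbf{b}$ and $c_0$ are introduced only for this illustration. Then
$$F_n(c) = \frac12\sum_{i,k=1}^N a(e_i, e_k)\,c_ic_k - \sum_{i=1}^N \ell(e_i)\,c_i, \qquad \frac{\partial F_n}{\partial c_j}(c) = \sum_{k=1}^N a(e_j, e_k)\,c_k - \ell(e_j), \quad j = 1, \dots, N,$$
where the symmetry $a(e_j, e_k) = a(e_k, e_j)$ was used. Hence (6.2.22) is the linear system $\mathbf{A}c = \mathbf{b}$ with $\mathbf{A}_{jk} = a(e_j, e_k)$ and $\mathbf{b}_j = \ell(e_j)$.

Suppose, in addition, that $a(v, v) \ge c_0\|v\|^2$, which is (6.2.17) for this $F$, since $D^2F(u)(v,v) = a(v,v)$. Then $c^{\mathsf T}\mathbf{A}c = a\bigl(\sum_k c_ke_k, \sum_k c_ke_k\bigr) > 0$ for $c \ne 0$, so $\mathbf{A}$ is symmetric positive definite.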
Remark 6.2.35. We have not yet addressed a question that is fundamental from the practical point of view: How do we solve system (6.2.22) numerically? A vast literature on numerical methods deals with this problem. Just for illustration we mention one minimization method. Choose arbitrarily a vector $c^0 = (c^0_1, \dots, c^0_N) \in \mathbb{R}^N$. Let us present an algorithm for the construction of a sequence $\{c^m\}_{m=1}^\infty$ which, under appropriate assumptions on $F$, converges to the solution of system (6.2.22). If we know the vector $c^m = (c^m_1, \dots, c^m_N) \in \mathbb{R}^N$, we calculate the components of the vector $c^{m+1} = (c^{m+1}_1, \dots, c^{m+1}_N) \in \mathbb{R}^N$ as follows, for $i = 1, \dots, N$ in turn. Let the function
$$\xi \mapsto F_n(c^{m+1}_1, \dots, c^{m+1}_{i-1}, \xi, c^m_{i+1}, \dots, c^m_N)$$
of the variable $\xi$ on $\mathbb{R}$ assume its minimum at the point $\bar c^{m+1}_i$. Then put
$$c^{m+1}_i := c^m_i + \omega\bigl(\bar c^{m+1}_i - c^m_i\bigr),$$
where
$$0 < \omega < 2.$$
Here $\omega$ is the so-called relaxation parameter. If we choose $\omega = 1$ and if $F$ is a quadratic functional, we obtain the so-called Gauss–Seidel Iterative Method (see, e.g., Stoer & Bulirsch [125]). Nowadays many packages in Mathematica, Maple, Matlab, etc., offer different solvers for system (6.2.22).
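Below is a minimal Python sketch of the relaxation scheme of Remark 6.2.35. It assumes the quadratic case $F_n(c) = \frac12 c^{\mathsf T}Ac - b^{\mathsf T}c$ with a symmetric positive definite matrix $A$. That assumption is made here so that each one-dimensional minimization has a closed form. The function name `relaxation_minimize` and its parameters are hypothetical. With `omega=1.0` one sweep coincides with a Gauss–Seidel step.

```python
import numpy as np

def relaxation_minimize(A, b, omega=1.0, c0=None, tol=1e-12, max_sweeps=10_000):
    """Coordinate-wise minimization with relaxation for
    F_n(c) = 0.5 * c^T A c - b^T c, A symmetric positive definite.
    Returns the final vector and the number of sweeps performed."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.zeros(len(b)) if c0 is None else np.array(c0, dtype=float)
    for sweep in range(1, max_sweeps + 1):
        c_old = c.copy()
        for i in range(len(b)):
            # c[:i] already holds the new components c^{m+1}_1..c^{m+1}_{i-1};
            # c[i+1:] still holds the old components c^m_{i+1}..c^m_N.
            # The minimizer of xi -> F_n(..., xi, ...) solves (A c)_i = b_i in slot i.
            off_diagonal = A[i] @ c - A[i, i] * c[i]
            c_bar = (b[i] - off_diagonal) / A[i, i]
            c[i] += omega * (c_bar - c[i])        # relaxation step
        if np.max(np.abs(c - c_old)) < tol:
            return c, sweep
    return c, max_sweeps
```

For a general, non-quadratic convex $F_n$, the closed-form line for `c_bar` would have to be replaced by a one-dimensional minimization routine. The sketch does not attempt that.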
From the practical point of view it is important that the system (6.2.22) be as simple as possible. The form of the system (6.2.22) depends in an essential way on the actual choice of the subspaces $H_n$. One special choice is based on the notion and the properties of the Schauder basis.
Let $\{e_i\}_{i=1}^\infty$ be a Schauder basis (see Section 1.2) of a Hilbert space $H$ (not necessarily orthonormal), and define the subspace $H_n$ as the set of all elements $u \in H$ of the form
$$u = c_1e_1 + \dots + c_ne_n.$$
It follows from the definition of the Schauder basis that $\{H_n\}_{n=1}^\infty$ satisfies condition (6.2.15).
Example 6.2.36. Let $H := W_0^{1,2}(0,1)$, $f \in L^1(0,1)$, and
$$F(x) := \frac12\int_0^1 |\dot x(t)|^2\,dt + \frac14\int_0^1 |x(t)|^4\,dt - \int_0^1 f(t)x(t)\,dt, \quad x \in H. \qquad (6.2.23)$$
Then $F$ is the energy functional associated with the Dirichlet problem
$$-\ddot x(t) + x^3(t) = f(t), \quad t \in (0,1), \qquad x(0) = x(1) = 0 \qquad (6.2.24)$$
(cf. Example 6.2.6). We have
$$DF(x)(y) = \int_0^1 \dot x(t)\dot y(t)\,dt + \int_0^1 x^3(t)y(t)\,dt - \int_0^1 f(t)y(t)\,dt,$$
$$D^2F(x)(y, y) = \int_0^1 |\dot y(t)|^2\,dt + 3\int_0^1 |x(t)|^2|y(t)|^2\,dt,$$
and the assumptions of Proposition 6.2.34 are satisfied.$^{26}$ The sequence of functions $e_i$, $i = 1, 2, \dots$, defined by
$$e_i(t) := t^i(1 - t),$$
constitutes a Schauder basis of the space $H$ (see, e.g., Michlin [94]). Thus, if we construct the subspaces $H_n$ as above, condition (6.2.15) is satisfied. Rewriting the system (6.2.22) for this particular case, we obtain the following system of nonlinear equations for the unknowns $c_1, \dots, c_n$:
$$\sum_{k=1}^n c_k\int_0^1 \frac{d}{dt}\bigl[t^k(1 - t)\bigr]\frac{d}{dt}\bigl[t^j(1 - t)\bigr]\,dt + \int_0^1 \Bigl(\sum_{k=1}^n c_k t^k(1 - t)\Bigr)^3 t^j(1 - t)\,dt = \int_0^1 f(t)t^j(1 - t)\,dt, \qquad (6.2.25)$$
$j = 1, \dots, n$. Each equation of system (6.2.25) contains all the unknowns $c_1, \dots, c_n$, which is rather unpleasant from the computational point of view!
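The following Python sketch shows concretely why (6.2.25) is unpleasant. It computes the linear part of (6.2.25), i.e., the matrix with entries $\int_0^1 \frac{d}{dt}[t^k(1-t)]\frac{d}{dt}[t^j(1-t)]\,dt$, in exact rational arithmetic. Every entry turns out to be nonzero, so already the linear part couples all unknowns. The function names are hypothetical.

```python
from fractions import Fraction

def basis_derivative(k):
    """Sparse coefficients {power: coeff} of d/dt [t^k (1 - t)] = k t^(k-1) - (k+1) t^k."""
    return {k - 1: Fraction(k), k: Fraction(-(k + 1))}

def integrate_product(p, q):
    """Exact integral over [0, 1] of the product of two sparse polynomials."""
    total = Fraction(0)
    for m1, a in p.items():
        for m2, b in q.items():
            total += a * b / (m1 + m2 + 1)     # int_0^1 t^m dt = 1/(m+1)
    return total

def stiffness_matrix(n):
    """Matrix of int_0^1 e_k'(t) e_j'(t) dt for e_k(t) = t^k (1 - t), k, j = 1..n."""
    d = [basis_derivative(k) for k in range(1, n + 1)]
    return [[integrate_product(d[k], d[j]) for j in range(n)] for k in range(n)]

K = stiffness_matrix(5)
```

Expanding the product, one finds that the $(k,j)$ entry equals $\frac{2kj}{(k+j-1)(k+j)(k+j+1)} > 0$. Exact arithmetic with `fractions.Fraction` lets the computed matrix be compared with this closed form without rounding.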
The question then arises whether the spaces $H_n$ can be chosen so that each equation of the system (6.2.22) depends on only a small number of unknowns. This is one of the fundamental questions of numerical mathematics. Such a choice of $H_n$ is possible, there are different ways to make it, and each of them leads to a particular numerical method. Below we indicate one possible choice of $H_n$ which differs from the previous one and meets the above-mentioned requirements.

26 Note that we consider $\|x\| = \bigl(\int_0^1 |\dot x(t)|^2\,dt\bigr)^{1/2}$ as the norm on $H$.
Let $n \in \mathbb{N}$, and put $t_i = \frac{i}{n}$ for $i = 0, 1, \dots, n$ and $I_j = [t_j, t_{j+1}]$ for $j = 0, 1, \dots, n - 1$. We define the spaces $H_n$ as follows:
$H_n$ is the set of functions $x = x(t)$ continuous on the interval $[0,1]$ which are linear on every interval $[t_i, t_{i+1}]$ and for which $x(0) = x(1) = 0$.
Let $e_i \in H_n$, $i = 1, \dots, n - 1$, be functions such that
$$e_i(t_j) = \begin{cases} 1 & \text{for } i = j,\\ 0 & \text{for } i \ne j, \end{cases} \qquad j = 0, \dots, n.$$
It is easily established that the set $\{e_i\}_{i=1}^{n-1}$ constitutes a basis of the space $H_n$ and that for all $y \in H_n$ we have
$$y(t) = \sum_{j=1}^{n-1} y(t_j)e_j(t), \quad t \in [0,1].$$
For this basis, the system (6.2.22) becomes a system for the unknown values $x_n(t_j)$ of the solution of problem (6.2.13). The crucial point in this construction is that the functions $e_i(t)$ vanish outside the interval $I_{i-1} \cup I_i = \bigl[\frac{i-1}{n}, \frac{i+1}{n}\bigr]$. We then have
$$e_i(t)e_j(t) = \dot e_i(t)\dot e_j(t) = 0$$
for $i, j = 1, \dots, n - 1$, $|i - j| > 1$, at every point $t \in [0,1]$ (with the obvious exception, for derivatives, of the points $t_1, \dots, t_{n-1}$, which form a set of measure zero). Therefore, in each of the equations
$$\sum_{i=1}^{n-1} c_i\int_0^1 \dot e_i(t)\dot e_j(t)\,dt + \int_0^1 \Bigl(\sum_{i=1}^{n-1} c_i e_i(t)\Bigr)^3 e_j(t)\,dt = \int_0^1 f(t)e_j(t)\,dt, \qquad (6.2.26)$$
$j = 1, \dots, n - 1$, of system (6.2.22), only the unknowns $c_{j-1}$ and $c_{j+1}$ appear apart from $c_j$.
If we compute a solution $c_1, \dots, c_{n-1}$ of these equations and put
$$u_n(t) = c_1e_1(t) + \dots + c_{n-1}e_{n-1}(t), \quad t \in [0,1],$$
we obtain a solution of problem (6.2.13). Now we wish to know whether
$$\lim_{n \to \infty} \|u_n - u_0\| = 0,$$
where $u_0$ is the solution of problem (6.2.12). By Proposition 6.2.34, it suffices to show that the spaces $H_n$ satisfy condition (6.2.15).
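Here is a minimal Python sketch of the piecewise linear Ritz (finite element) approximation (6.2.26) for problem (6.2.24), with $F$ from (6.2.23). The following choices are assumptions, not taken from the text:

- the integrals are evaluated by a 4-point Gauss–Legendre rule on each $I_j$;
- the nonlinear system is solved by Newton's method starting from $c = 0$;
- a dense solver is used for brevity, even though the matrices are tridiagonal.

The function name `solve_p1_fem` is hypothetical.

```python
import numpy as np

def solve_p1_fem(f, n, tol=1e-12, max_iter=50):
    """Ritz approximation of -x'' + x^3 = f, x(0) = x(1) = 0, in the space H_n of
    continuous piecewise linear functions on t_i = i/n (hat basis e_1..e_{n-1}).
    Returns the nodes t_0..t_n and the nodal values of u_n (boundary zeros included)."""
    h = 1.0 / n
    nodes = np.linspace(0.0, 1.0, n + 1)
    gp, gw = np.polynomial.legendre.leggauss(4)     # Gauss rule on [-1, 1]
    m = n - 1                                       # number of interior nodes
    # Linear part int e_i' e_j' dt: 2/h on the diagonal, -1/h on the off-diagonals.
    K = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h

    def element_terms(c):
        """N_j = int (sum_i c_i e_i)^3 e_j dt, its Jacobian, and b_j = int f e_j dt."""
        u = np.concatenate(([0.0], c, [0.0]))
        N, b = np.zeros(n + 1), np.zeros(n + 1)
        J = np.zeros((n + 1, n + 1))
        for e in range(n):                          # element I_e = [t_e, t_{e+1}]
            t = nodes[e] + 0.5 * h * (gp + 1.0)     # quadrature points mapped to I_e
            w = 0.5 * h * gw
            phi = ((nodes[e + 1] - t) / h, (t - nodes[e]) / h)   # e_e, e_{e+1} on I_e
            ue = u[e] * phi[0] + u[e + 1] * phi[1]
            for r, ir in enumerate((e, e + 1)):
                N[ir] += np.sum(w * ue**3 * phi[r])
                b[ir] += np.sum(w * f(t) * phi[r])
                for s, js in enumerate((e, e + 1)):
                    J[ir, js] += np.sum(w * 3.0 * ue**2 * phi[r] * phi[s])
        inner = slice(1, n)                         # drop the boundary nodes t_0, t_n
        return N[inner], J[inner, inner], b[inner]

    c = np.zeros(m)
    for _ in range(max_iter):                       # Newton's method for system (6.2.26)
        N, J, b = element_terms(c)
        delta = np.linalg.solve(K + J, -(K @ c + N - b))
        c += delta
        if np.max(np.abs(delta)) < tol:
            break
    return nodes, np.concatenate(([0.0], c, [0.0]))
```

For instance, with $f(t) = \pi^2\sin\pi t + \sin^3\pi t$ the exact solution of (6.2.24) is $x(t) = \sin\pi t$, and the computed nodal values approach it as $n$ grows.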
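Let $y \in H$ and $\varepsilon > 0$. We shall show that there exist $n \in \mathbb{N}$ and $y_n \in H_n$ such that
$$\|y - y_n\| < \varepsilon. \qquad (6.2.27)$$
Indeed, the set $\mathcal{D}(0,1)$ is dense in $H$ (see Exercise 1.2.46). Hence there exists $w \in \mathcal{D}(0,1)$ such that
$$\|y - w\| < \frac{\varepsilon}{2}. \qquad (6.2.28)$$
Let $n \in \mathbb{N}$ be arbitrary, and let us construct a function $y_n \in H_n$ such that
$$y_n(t_i) = w(t_i) \quad \text{for all } i = 0, \dots, n.$$
By the Mean Value Theorem, $\dot y_n = \dot w(\xi_i)$ on $(t_i, t_{i+1})$ for some $\xi_i \in (t_i, t_{i+1})$. Hence
$$\|w - y_n\|^2 = \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} |\dot w(t) - \dot y_n(t)|^2\,dt \le \sum_{i=0}^{n-1}\Bigl(\max_{t \in [0,1]}|\ddot w(t)|\Bigr)^2\frac{t_{i+1} - t_i}{n^2} = \frac{1}{n^2}\Bigl(\max_{t \in [0,1]}|\ddot w(t)|\Bigr)^2.$$
This implies that for sufficiently large $n \in \mathbb{N}$ we have
$$\|w - y_n\| < \frac{\varepsilon}{2}. \qquad (6.2.29)$$
The desired inequality (6.2.27) now follows from (6.2.28) and (6.2.29).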
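The bound used in the last step can be checked numerically. The Python sketch below computes $\|w - y_n\| = \bigl(\int_0^1 |\dot w - \dot y_n|^2\,dt\bigr)^{1/2}$ for the piecewise linear interpolant $y_n$ of a given $w$, using Gauss–Legendre quadrature on each interval $[t_i, t_{i+1}]$. The function name is hypothetical.

As an illustrative choice, not taken from the text, it uses $w(t) = t^2(1-t)^2$. This function is not in $\mathcal{D}(0,1)$, but it is smooth with $w(0) = w(1) = 0$, which is all the estimate needs. Here $\max_{[0,1]}|\ddot w| = 2$.

```python
import numpy as np

def interpolation_error(w, dw, n, quad_points=5):
    """Return (int_0^1 |w'(t) - y_n'(t)|^2 dt)^(1/2), where y_n is the piecewise
    linear interpolant of w at the nodes t_i = i/n."""
    gp, gw = np.polynomial.legendre.leggauss(quad_points)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        a, b = i * h, (i + 1) * h
        slope = (w(b) - w(a)) / h                 # y_n' on (t_i, t_{i+1})
        t = a + 0.5 * h * (gp + 1.0)              # Gauss points mapped to [t_i, t_{i+1}]
        total += np.sum(0.5 * h * gw * (dw(t) - slope) ** 2)
    return np.sqrt(total)

w = lambda t: t**2 * (1 - t) ** 2
dw = lambda t: 2 * t * (1 - t) * (1 - 2 * t)      # derivative of w
# The estimate in the text gives ||w - y_n|| <= max|w''| / n = 2 / n.
errors = {n: interpolation_error(w, dw, n) for n in (4, 8, 16, 32)}
```

Halving the mesh size roughly halves the error, in line with the first-order bound.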

Remark 6.2.37.
(i) Let us point out that an equidistant division of the interval $[0,1]$ was not essential for obtaining system (6.2.26). Nonetheless, the norm of the division (i.e., the maximal distance between two consecutive points) must approach zero.
(ii) The spaces $H_n$ are the simplest ones that could be chosen for the given example. It is also possible to choose spaces of $C^1$-functions which are polynomials of higher degree on every interval $I_i$. For instance, one can choose
$$H_n = \bigl\{y \in C^1[0,1] : y(0) = y(1) = 0,\ y|_{I_i} \text{ is a polynomial of the third degree for all } i = 0, \dots, n - 1\bigr\}.^{27}$$
This space has dimension $2n$, and it has a basis consisting of the functions $e_1, \dots, e_{n-1}, \varphi_0, \dots, \varphi_n$ such that
$$e_i(t_j) = \begin{cases} 1 & \text{for } i = j,\\ 0 & \text{for } i \ne j, \end{cases} \qquad \dot e_i(t_j) = 0, \qquad i = 1, \dots, n - 1,\ j = 0, \dots, n;$$
$$\varphi_i(t_j) = 0, \qquad \dot\varphi_i(t_j) = \begin{cases} 1 & \text{for } i = j,\\ 0 & \text{for } i \ne j, \end{cases} \qquad i, j = 0, \dots, n,$$
see Figure 6.2.5. Every function $y \in H_n$ can be written in the form
$$y(t) = \sum_{j=1}^{n-1} y(t_j)e_j(t) + \sum_{j=0}^{n} \dot y(t_j)\varphi_j(t), \quad t \in [0,1].$$

(iii) From the computational point of view, the question of how rapidly the solutions $u_n$ of problem (6.2.13) converge to a solution of problem (6.2.12) is very important. This question is closely related to the regularity of solutions of equations. If, e.g., $f \in C^0[0,1]$, then $u_0 \in C^2[0,1]$ (cf. Proposition 6.1.11 and Theorem 6.1.13), and using this one can prove that there exists a constant $c > 0$ such that for all $n \in \mathbb{N}$ we have
$$\|u_0 - u_n\| \le \frac{c}{n}.$$
If, e.g., $u_0 \in C^4[0,1]$, then we even have
$$\|u_0 - u_n\| \le \frac{c}{n^3}.$$

27 These functions are called cubic splines (see, e.g., de Boor [32]).

Figure 6.2.5. The basis functions $e_i$ and $\varphi_i$ on $[0,1]$, with nodes $0 = t_0, \dots, t_{i-1}, t_i, t_{i+1}, \dots, t_n = 1$ on the $t$-axis.

Remark 6.2.38 (Finite Elements Method). Similarly to Example 6.2.36, we could proceed even in the case $H := W_0^{k,2}(\Omega)$, $\Omega \subset \mathbb{R}^N$, $\Omega \in C^{0,1}$. This case corresponds to boundary value problems for partial differential equations (see Chapter 7 for more details). Suppose that we can divide the set $\Omega$ into a finite number of open subsets $\Omega_i$, $i = 1, \dots, k$, such that their diameter satisfies
$$\operatorname{diam}\Omega_i = \sup_{x, y \in \Omega_i}\|x - y\| < \frac1n$$
and such that
$$\overline{\Omega} = \bigcup_{i=1}^k \overline{\Omega_i}, \qquad \Omega_i \cap \Omega_j = \emptyset \ \text{ for } i \ne j.$$
Each of the sets $\Omega_i$ is called a finite element. The space $H_n$ will consist of functions whose restrictions to $\Omega_i$ are smooth functions, for instance polynomials in $N$ variables, and which satisfy certain conditions on the common boundary of the sets $\Omega_i$ and $\Omega_j$ ($i \ne j$). For simplicity and greater intuitive appeal, we will consider $\Omega$ to be a polygon in $\mathbb{R}^2$. For every $n \in \mathbb{N}$ we perform a triangulation $T_n$ of the set $\Omega$, i.e., we put
$$\overline{\Omega} = \bigcup_{i=1}^k \overline{K_i},$$
where $K_i$ are open triangles such that
$$\operatorname{diam}K_i \le \frac1n, \quad i = 1, \dots, k,$$
see Figure 6.2.6. Assume that, for any two triangles $K_i, K_j \in T_n$ ($i \ne j$), precisely one of the following situations arises:
(a) the closures of two distinct triangles have no common point;
(b) the closures of two distinct triangles have only one vertex in common;
(c) the closures of two distinct triangles have an entire side in common.
The spaces $H_n$ will be sets of continuous functions whose restrictions to $K_i$ are polynomials of the $k$th order. Below we give examples of the spaces $H_n$ for the cases $k = 1$ and $k = 3$. The continuity of a function $v \in H_n$ on the set $\Omega$ is ensured by choosing the values of the parameters (used to construct the function) to be equal at the common vertices. The reader will find more details in specialized literature on the Finite Elements Method (see, e.g., Brenner & Scott [16], Křížek & Neittaanmäki [81], Rektorys [107]).

Figure 6.2.6. Triangulation of $\Omega$, with one triangle $K_i$ marked.
Example 6.2.39 ($k = 1$). Let $\Omega$ be a polygon in $\mathbb{R}^2$. Let $K$ be an open triangle with vertices $Q_1, Q_2, Q_3$. Let $P_1(K)$ be the set of all polynomials of the first degree defined on $K$, i.e., $P \in P_1(K)$ if
$$P(x, y) = \alpha_0 + \alpha_1 x + \alpha_2 y, \quad (x, y) \in K.$$
It is easily shown that any function $P(x, y) \in P_1(K)$ is uniquely determined by its values at the vertices $Q_1, Q_2, Q_3$. The values $P(Q_1), P(Q_2), P(Q_3)$ serve as the parameters by means of which the function $P(x, y)$ is constructed.
The function $P \in P_1(K)$ for which
$$P(Q_i) = v(Q_i), \quad i = 1, 2, 3,$$
is called the Lagrange interpolation of the function $v \in C(\overline K)$. The function $P(x, y)$ constructed in this way is denoted by $\Pi_K v$. Clearly, $\Pi_K$ is a linear operator from the space $C(\overline K)$ into $P_1(K)$, and
$$\|v - \Pi_K v\|_{W^{1,2}(K)} \le c\,h_K\|v\|_{W^{2,2}(K)} \qquad (6.2.30)$$
holds for arbitrary functions $v \in W^{2,2}(K)$. Here $h_K = \operatorname{diam}K$, and $c > 0$ is a constant independent of $v$ and $h_K$.$^{28}$ Define the space $H_n$ as follows:
$$H_n := \{v \in C(\overline\Omega) : v|_{K_i} \in P_1(K_i) \text{ for all } K_i \in T_n\}.$$
Obviously,
$$H_n \subset H := W^{1,2}(\Omega).$$
Let $v \in W^{2,2}(\Omega)$. Construct a function $v_n \in H_n$ in the following way:
$$v_n|_{K_i} = \Pi_{K_i}v.$$
Applying inequality (6.2.30), we obtain
$$\|v - v_n\| \le \frac{c}{n}\|v\|_{W^{2,2}(\Omega)}.$$
Thus the function $v_n$ is arbitrarily close to the function $v$, provided $n \in \mathbb{N}$ is sufficiently large. Hence, using the fact that the space $W^{2,2}(\Omega)$ is dense in the space $H$ (explain why!), we conclude that the spaces $H_n$, $n \in \mathbb{N}$, satisfy condition (6.2.15).
We can construct the basis functions $e_1, \dots, e_m$ of $H_n$ just as in Example 6.2.36. If $\{Q_i\}_{i=1}^m$ are all the vertices of all triangles of the triangulation $T_n$, then
$$e_i(Q_j) = \begin{cases} 1 & \text{for } i = j,\\ 0 & \text{for } i \ne j, \end{cases} \qquad j = 1, \dots, m.$$

28 The reader is invited to prove it in detail!
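Here is a minimal Python sketch of the Lagrange interpolation $\Pi_K v$ from Example 6.2.39. The coefficients $\alpha_0, \alpha_1, \alpha_2$ are obtained by solving the $3 \times 3$ linear system $P(Q_i) = v(Q_i)$. This system is uniquely solvable exactly when the vertices are not collinear, since its determinant is twice the signed area of $K$. The function names are hypothetical.

```python
import numpy as np

def p1_coefficients(vertices, values):
    """Coefficients (alpha_0, alpha_1, alpha_2) of P(x, y) = alpha_0 + alpha_1 x + alpha_2 y
    with P(Q_i) = values[i] at the three vertices Q_i of a non-degenerate triangle."""
    V = np.array([[1.0, x, y] for x, y in vertices])
    return np.linalg.solve(V, np.asarray(values, dtype=float))

def lagrange_interpolate(v, vertices):
    """Pi_K v: the P_1 polynomial matching v at the vertices, returned as a callable."""
    a0, a1, a2 = p1_coefficients(vertices, [v(x, y) for x, y in vertices])
    return lambda x, y: a0 + a1 * x + a2 * y
```

Polynomials of the first degree are reproduced exactly, which is the property behind estimates of the type (6.2.30).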
Example 6.2.40 ($k = 3$). Let $K$ be an open triangle with vertices $Q_1, Q_2, Q_3$ and center of gravity $Q_0$. Let $P_3(K)$ be the set of polynomials of the third degree defined on $K$, i.e., $P \in P_3(K)$ if
$$P(x, y) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3 + \alpha_4 xy + \alpha_5 xy^2 + \alpha_6 x^2y + \alpha_7 y + \alpha_8 y^2 + \alpha_9 y^3,$$
$(x, y) \in K$. A function $P(x, y) \in P_3(K)$ is uniquely determined by its values at the vertices and at the center of gravity, together with the values of its first partial derivatives at the vertices of the triangle $K$. A function $\Pi_K v \in P_3(K)$ for which
$$\Pi_K v(Q_i) = v(Q_i), \quad i = 0, 1, 2, 3;$$
$$\frac{\partial\,\Pi_K v}{\partial x}(Q_i) = \frac{\partial v}{\partial x}(Q_i), \qquad \frac{\partial\,\Pi_K v}{\partial y}(Q_i) = \frac{\partial v}{\partial y}(Q_i), \quad i = 1, 2, 3,$$
is called the Hermite interpolation of the function $v \in C^1(\overline K)$. Just as in the preceding example, the inequality
$$\|v - \Pi_K v\|_{W^{3,2}(K)} \le c\,h_K\|v\|_{W^{4,2}(K)}$$
holds for all $v \in W^{4,2}(K)$. If we put
$$H_n := \{v \in C^1(\overline\Omega) : v|_{K_i} \in P_3(K_i) \text{ for every triangle } K_i \in T_n\},$$
then
$$H_n \subset H := W^{3,2}(\Omega),$$
and the spaces $H_n$, $n \in \mathbb{N}$, again satisfy condition (6.2.15), since the set $W^{4,2}(\Omega)$ is dense in the space $H$.
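The unisolvence claim in Example 6.2.40 says that the ten coefficients $\alpha_0, \dots, \alpha_9$ are determined by ten values. It can be checked on a concrete triangle. The Python sketch below builds the $10 \times 10$ matrix of the ten Hermite conditions applied to the monomials of $P(x,y)$, in the order used in the text, and checks that it has full rank. The triangles used are arbitrary choices for illustration, and the function names are hypothetical.

```python
import numpy as np

# Exponent pairs (a, b) for x^a y^b, in the order alpha_0, ..., alpha_9 of the text.
MONOMIALS = [(0, 0), (1, 0), (2, 0), (3, 0), (1, 1), (1, 2), (2, 1), (0, 1), (0, 2), (0, 3)]

def mono(a, b, x, y):
    return x**a * y**b

def mono_dx(a, b, x, y):
    return a * x**(a - 1) * y**b if a > 0 else 0.0

def mono_dy(a, b, x, y):
    return b * x**a * y**(b - 1) if b > 0 else 0.0

def hermite_matrix(Q1, Q2, Q3):
    """Rows: values at Q0 (center of gravity), Q1, Q2, Q3, then d/dx and d/dy at Q1, Q2, Q3."""
    Q0 = tuple((np.array(Q1) + np.array(Q2) + np.array(Q3)) / 3.0)
    rows = [[mono(a, b, *Q) for a, b in MONOMIALS] for Q in (Q0, Q1, Q2, Q3)]
    rows += [[mono_dx(a, b, *Q) for a, b in MONOMIALS] for Q in (Q1, Q2, Q3)]
    rows += [[mono_dy(a, b, *Q) for a, b in MONOMIALS] for Q in (Q1, Q2, Q3)]
    return np.array(rows, dtype=float)

M = hermite_matrix((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

A full-rank matrix means that the Hermite data determine the coefficients $\alpha_0, \dots, \alpha_9$ uniquely, so $\Pi_K v$ is well defined.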
Exercise 6.2.41. Apply the spaces $H_n$ described in Remark 6.2.37(ii) to Example 6.2.36.

6.2B Supersolutions, Subsolutions and Global Extrema


In this appendix we show the connection between supersolutions and subsolutions (see Section 5.4) on the one hand and the existence of global minima (see Section 6.2) on the other. We illustrate it on the Dirichlet boundary value problem
$$\ddot x(t) = f(t, x(t)), \quad t \in (0,1), \qquad x(0) = x(1) = 0, \qquad (6.2.31)$$
where $f$ is a continuous function on $[0,1] \times \mathbb{R}$ (cf. Example 5.4.19). Put $H := W_0^{1,2}(0,1)$. The functional
$$\Phi(x) := \int_0^1 \int_0^{x(t)} f(t, s)\,ds\,dt$$
defined on $H$ is of the class $C^1(H, \mathbb{R})$, and
$$\Phi'(x)(h) = \int_0^1 f(t, x(t))h(t)\,dt, \quad x, h \in H.^{29} \qquad (6.2.32)$$
Then
$$F(x) = \int_0^1 \Bigl[\frac12|\dot x(t)|^2 + \int_0^{x(t)} f(t, s)\,ds\Bigr]dt$$
is of the class $C^1(H, \mathbb{R})$, and its critical points correspond to weak solutions of (6.2.31). A regularity argument applied to (6.2.31) (similar to that of Theorem 6.1.13) implies that every weak solution is a classical solution in the sense that
$$x \in C_0^2[0,1] := \{x \in C^2[0,1] : x(0) = x(1) = 0\}$$
and the equation in (6.2.31) holds at every point $t \in (0,1)$.
The link between the method of supersolutions and subsolutions on the one side and the method of finding the global minimizer on the other side is the following:
the existence of a well-ordered pair of a subsolution $u_0$ and a supersolution $v_0$ implies that the functional $F$ has a minimum on the convex but noncompact set
$$M = \{x \in H : u_0(t) \le x(t) \le v_0(t) \text{ for all } t \in [0,1]\}.$$
This minimum then solves (6.2.31). Namely, we have the following assertion.
Theorem 6.2.42. Let $u_0$ and $v_0$ be a subsolution and a supersolution of (6.2.31) such that
$$u_0(t) \le v_0(t), \quad t \in [0,1],$$
let
$$E := \{(t, x) \in [0,1] \times \mathbb{R} : u_0(t) \le x \le v_0(t)\},$$
and let $f : E \to \mathbb{R}$ be a continuous function. Then the functional $F$ has a global minimum on $M$, i.e., there exists $x_0 \in M$ such that
$$F(x_0) = \min_{\substack{x \in H\\ u_0 \le x \le v_0}} F(x).$$
Moreover, $x_0$ is a solution of (6.2.31).


Proof. Let
$$\gamma(t, x) := \max\{u_0(t), \min\{x, v_0(t)\}\}$$
and consider the modified problem
$$\ddot x(t) = f(t, \gamma(t, x(t))), \quad t \in (0,1), \qquad x(0) = x(1) = 0. \qquad (6.2.33)$$
Define the energy functional associated with this modified problem by
$$\tilde F(x) = \int_0^1 \Bigl[\frac12|\dot x(t)|^2 + \int_0^{x(t)} f(t, \gamma(t, s))\,ds\Bigr]dt.$$

29 Cf. Section 3.2 in order to prove these facts.

Then $\tilde F \in C^1(H, \mathbb{R})$, and its critical points correspond to the solutions of (6.2.33). It is easy to prove (the reader should do it as an exercise) that $\tilde F$ is weakly sequentially lower semicontinuous and weakly coercive. It then follows from Theorem 6.2.8 that $\tilde F$ has a global minimum on $H$ at some $x_0 \in H$, so that $\tilde F'(x_0) = o$. This $x_0$ is a weak solution of (6.2.33), and it is regular, i.e., $x_0 \in C^2[0,1]$, by Theorem 6.1.14.
We shall show that $u_0(t) \le x_0(t) \le v_0(t)$. Indeed, assume by contradiction that
$$\min_{t \in [0,1]}\bigl(x_0(t) - u_0(t)\bigr) < 0$$
and define
$$t_0 := \max\Bigl\{t \in [0,1] : x_0(t) - u_0(t) = \min_{s \in [0,1]}\bigl(x_0(s) - u_0(s)\bigr)\Bigr\}.$$
From the definition of a subsolution $u_0$ (in particular $u_0(0) \le 0$ and $u_0(1) \le 0$) and of $\gamma$, we obtain $0 < t_0 < 1$, so that $\dot x_0(t_0) = \dot u_0(t_0)$. For $t \ge t_0$ close to $t_0$ we have $x_0(t) < u_0(t)$, hence $\gamma(t, x_0(t)) = u_0(t)$ and
$$\dot x_0(t) - \dot u_0(t) = \int_{t_0}^t \bigl[\ddot x_0(s) - \ddot u_0(s)\bigr]ds = \int_{t_0}^t \bigl[f(s, u_0(s)) - \ddot u_0(s)\bigr]ds \le 0.$$
This contradicts the definition of $t_0$. Hence $x_0(t) \ge u_0(t)$, $t \in [0,1]$. Similarly, we prove $x_0(t) \le v_0(t)$, $t \in [0,1]$.
Notice that if $x$ is such that $u_0(t) \le x(t) \le v_0(t)$, then $\gamma(t, x(t)) = x(t)$. Hence $x_0$ is a minimizer for $F$ on $M$ and $F'(x_0) = o$.
Example 6.2.43. Consider the problem
$$\ddot x(t) = \lambda f(t, x(t)), \quad t \in (0,1), \qquad x(0) = x(1) = 0, \qquad (6.2.34)$$
where $f$ is continuous on $[0,1] \times \mathbb{R}$, $f(t, 0) = 0$, $f(t, R) \ge 0$ for some $R > 0$, and there exists $w \in H := W_0^{1,2}(0,1)$, $0 \le w(t) \le R$, $t \in [0,1]$, such that
$$\int_0^1 \int_0^{w(t)} f(t, s)\,ds\,dt < 0.$$
Then there exists $\lambda_0$ such that for all $\lambda \ge \lambda_0$, problem (6.2.34) has, besides the trivial solution, at least one nontrivial nonnegative solution.
Indeed, $u_0 \equiv 0$ is a subsolution and $v_0 \equiv R$ is a supersolution. According to Theorem 6.2.42, there exists
$$x_0 \in M := \{x \in H : 0 \le x(t) \le R\}$$
which solves (6.2.34) and minimizes the energy functional $F$ on $M$. Moreover, taking $\lambda$ large enough, we have
$$F(w) = \int_0^1 \Bigl[\frac12|\dot w(t)|^2 + \lambda\int_0^{w(t)} f(t, s)\,ds\Bigr]dt < 0,$$
and so
$$F(x_0) = \min_{x \in M} F(x) \le F(w) < 0 = F(o).$$
Hence $x_0 \ne o$.
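Here is a small Python computation that illustrates the phrase "taking $\lambda$ large enough" in Example 6.2.43. It uses a hypothetical choice that is not in the text:

- $R = 1$;
- $f(t, s) = s(s - \frac12)$, so that $f(t, 0) = 0$ and $f(t, 1) = \frac12 \ge 0$;
- $w(t) = \frac12\sin\pi t$.

Write $F(w) = A + \lambda B$ with $A = \frac12\int_0^1|\dot w|^2\,dt$ and $B = \int_0^1\int_0^{w(t)} f(t,s)\,ds\,dt$. Since $B < 0$, we get $F(w) < 0$ for every $\lambda > \lambda^* := -A/B$. For this $w$, the argument of the example therefore applies to every such $\lambda$. The function name is hypothetical.

```python
import numpy as np

def energy_split(w, dw, primitive, quad_points=20):
    """Return (A, B) with F(w) = A + lambda * B, where A = 1/2 int_0^1 |w'(t)|^2 dt and
    B = int_0^1 G(w(t)) dt, G(s) = int_0^s f(sigma) d sigma (f independent of t here).
    The integrals are approximated by Gauss-Legendre quadrature on [0, 1]."""
    gp, gw = np.polynomial.legendre.leggauss(quad_points)
    t, wt = 0.5 * (gp + 1.0), 0.5 * gw
    A = 0.5 * np.sum(wt * dw(t) ** 2)
    B = np.sum(wt * primitive(w(t)))
    return A, B

# Hypothetical data: f(t, s) = s (s - 1/2), R = 1, w(t) = sin(pi t) / 2, so 0 <= w <= R.
w = lambda t: 0.5 * np.sin(np.pi * t)
dw = lambda t: 0.5 * np.pi * np.cos(np.pi * t)
G = lambda s: s**3 / 3 - s**2 / 4            # G(s) = int_0^s sigma (sigma - 1/2) d sigma
A, B = energy_split(w, dw, G)
lam_star = -A / B                              # F(w) < 0 for every lambda > lam_star
```

In closed form, $A = \pi^2/16$ and $B = \frac{1}{18\pi} - \frac{1}{32} < 0$, so $\lambda^* \approx 45.5$.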


Remark 6.2.44. The same results as in Theorem 6.2.42 and Example 6.2.43 hold if the continuity of $f$ is relaxed as follows: $f \in CAR([0,1] \times \mathbb{R})$, and for all $r > 0$ there exists $h \in L^1(0,1)$ such that $|f(t, s)| \le h(t)$ for a.e. $t \in (0,1)$ and all $s \in \mathbb{R}$, $|s| \le r$. The reader is invited to verify all the previous steps as an exercise.
The reader who wants to learn more is referred to De Coster & Habets [33], where the relation between non-well-ordered supersolutions and subsolutions on the one hand and the minimax method on the other is also discussed.
Exercise 6.2.45. How does the proof of Theorem 6.2.42 change if the homogeneous Dirichlet boundary conditions in (6.2.31) are replaced by the Neumann ones?
Exercise 6.2.46. Consider the problem
$$\frac{d}{dt}\bigl(|\dot x(t)|^{p-2}\dot x(t)\bigr) = f(t, x(t)), \quad t \in (0,1), \qquad x(0) = x(1) = 0, \qquad (6.2.35)$$
where $p > 1$, and
$$F(x) = \int_0^1 \Bigl[\frac1p|\dot x(t)|^p + \int_0^{x(t)} f(t, s)\,ds\Bigr]dt.$$
Prove the analogue of Theorem 6.2.42 for (6.2.35).


Exercise 6.2.47. Find conditions on a continuous function $f : [0,1] \times \mathbb{R} \to \mathbb{R}$ which guarantee that the problem (6.2.35) has a subsolution $u_0$ and a supersolution $v_0$ satisfying
$$u_0(t) \le v_0(t) \quad \text{for all } t \in [0,1].$$
Hint. Look for $u_0$ and $v_0$ constant on $[0,1]$.

6.3 Relative Extrema and Lagrange Multipliers


In this section we investigate the local minima or maxima of a real function $f$ on a smooth manifold $M$ (in particular, on a surface in $\mathbb{R}^3$). Such a manifold is often determined by constraints given by equations like
$$\Phi(x) = o$$
(cf. Remark 4.3.9). The key assertions of this section are the Lagrange Multiplier Method and the Courant–Fischer and Courant–Weinstein Variational Principles.
Definition 6.3.1. Let $X$ be a metric (or, more generally, topological) space, $M \subset X$. We say that a function $f : M \to \mathbb{R}$ has a local minimum (maximum) at a point $a \in M$ with respect to $M$ (or a constrained minimum on $M$) if there is a neighborhood $U$ of $a$ such that
$$f(x) \ge f(a) \quad (f(x) \le f(a)) \qquad \text{for all } x \in M \cap U.$$
We will suppose that $M$ is given as the zero set of a map $\Phi : X \to Y$, i.e.,
$$M = \{x \in X : \Phi(x) = o\}.$$
The way to investigate the behavior of $f$ in a relative neighborhood $U \cap M$ of a point $a \in M$ is simple and transparent. It consists in expressing $M \cap U$ as the graph of a map $\psi : Z \to X$ and then studying $f \circ \psi$. This is always possible if $M$ is a differentiable manifold in $X = \mathbb{R}^N$ (Definition 4.3.4), or if $M$ is given by $\Phi$ as above and $\Phi$ satisfies certain regularity conditions (Proposition 4.3.8 and Remark 4.3.9(i)).
Theorem 6.3.2 (Lagrange Multiplier Method). Let $X$ be a Banach space, $f : X \to \mathbb{R}$, $\Phi = (\varphi_1, \dots, \varphi_N) : X \to \mathbb{R}^N$. Let $f$ have a local minimum or maximum with respect to
$$M = \{x \in X : \Phi(x) = o\}$$
at a point $a \in M$.
Let there be a neighborhood $\mathcal{U}$ of $a$ in $X$ such that $f, \Phi \in C^1(\mathcal{U})$, and let $a$ be a regular point of $\Phi$ (i.e., $\Phi'(a)$ is a surjective map onto $\mathbb{R}^N$). Then there exist numbers$^{30}$ $\lambda_1, \dots, \lambda_N$ such that
$$f'(a) - \sum_{i=1}^N \lambda_i\varphi_i'(a) = o. \qquad (6.3.1)$$
Proof. Proposition 4.3.8 and Remark 4.3.9(i) yield a diffeomorphism $\Psi$ of a neighborhood $U$ of $o \in X$ onto a neighborhood $V$ of $a$ such that
$$\Psi(U \cap \operatorname{Ker}\Phi'(a)) = M \cap V, \qquad \Psi(o) = a.$$
If $\Psi_1$ denotes the restriction of $\Psi$ to $U \cap \operatorname{Ker}\Phi'(a)$, then $f \circ \Psi_1$ has a local minimum (or maximum) at $o$, and therefore
$$(f \circ \Psi_1)'(o) = o.$$
Since $\Psi_1'(o)h = h$ for any $h \in \operatorname{Ker}\Phi'(a)$ (see the proof of Proposition 4.3.8), it follows that
$$\operatorname{Ker}\Phi'(a) \subset \operatorname{Ker}f'(a).$$
Proposition 1.1.19 now completes the proof.

Remark 6.3.3.
(i) The main significance of Theorem 6.3.2 is that it reduces a (difficult) problem, finding the constrained extremal points, to an easier task: finding the local ones for a function
$$f - \sum_{i=1}^N \lambda_i\varphi_i$$
with unknown coefficients $\lambda_1, \dots, \lambda_N$, which have to be determined in the course of the calculation (see Example 6.3.4).

30 The numbers $\lambda_1, \dots, \lambda_N$ are called Lagrange multipliers.

(ii) For an infinite number of constraints (i.e., $\Phi : X \to Y$, where $Y$ is a Banach space of infinite dimension), the proof of Theorem 6.3.2 still works provided there exists a continuous projection of $X$ onto $\operatorname{Ker}\Phi'(a)$. Interestingly, the statement (now $(f - F \circ \Phi)'(a) = 0$ for a certain $F \in Y^*$) is true without the assumption that such a projection exists (the so-called Lusternik Theorem), but the proof is more difficult (see Lusternik & Sobolev [90]).
Example 6.3.4. Find the minimal and maximal values of
$$f(x, y, z) = x^2y + xy^2 + z^2$$
on the set $M = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$.
Notice first that all points of $M$ are regular. The necessary condition given by Theorem 6.3.2 for extremal points requires solving the following four equations:
$$2xy + y^2 - 2\lambda x = 0, \qquad (6.3.2)$$
$$x^2 + 2xy - 2\lambda y = 0, \qquad (6.3.3)$$
$$2z - 2\lambda z = 0, \qquad (6.3.4)$$
$$x^2 + y^2 + z^2 = 1. \qquad (6.3.5)$$
From the third equation, either $z = 0$ or $\lambda = 1$. Adding $x^2$ to (6.3.2) and $y^2$ to (6.3.3), we obtain
$$x^2 + 2\lambda x = y^2 + 2\lambda y, \quad \text{i.e.,} \quad (x - y)(x + y + 2\lambda) = 0.$$
Case 1 ($z = 0$). If $x = y$, then
$$x = y = \pm\frac{\sqrt2}{2} \quad \text{and} \quad f\Bigl(\pm\frac{\sqrt2}{2}, \pm\frac{\sqrt2}{2}, 0\Bigr) = \pm\frac{\sqrt2}{2}.$$
If $x + y = -2\lambda$, then (6.3.2) and (6.3.5) imply $xy = -\frac13$, and from equation (6.3.5) we find
$$x + y = \pm\frac{\sqrt3}{3} \quad \text{and hence} \quad f(x, y, 0) = xy(x + y) = \mp\frac{\sqrt3}{9}.$$
Case 2 ($\lambda = 1$). Again, either $x = y$ or $x + y = -2$. Substituting $x = y$ into (6.3.2), we find
$$x = y = 0, \ z = \pm1 \quad \text{or} \quad x = y = \frac23, \ z = \pm\frac13,$$
and
$$f(0, 0, \pm1) = 1, \qquad f\Bigl(\frac23, \frac23, \pm\frac13\Bigr) = \frac{19}{27}.$$
If $x + y = -2$, then (6.3.2) and (6.3.3) give
$$x^2 + 2x - 4 = y^2 + 2y - 4 = 0.$$
Summing these equations, we get
$$0 = x^2 + y^2 - 4 - 8,$$
i.e., $x^2 + y^2 = 12$, so there is no $z$ such that
$$x^2 + y^2 + z^2 = 1.$$
We have found several points of $M$ at which the necessary condition is satisfied. Since $M$ is a compact set in $\mathbb{R}^3$ and $f$ is continuous, the maximum and the minimum of $f$ on $M$ exist. Comparing the values of $f$ at the points where the necessary condition is satisfied, we find that
$$\max_M f = f(0, 0, \pm1) = 1, \qquad \min_M f = f\Bigl(-\frac{\sqrt2}{2}, -\frac{\sqrt2}{2}, 0\Bigr) = -\frac{\sqrt2}{2}.$$
If we were interested in local minima/maxima of $f$ with respect to $M$, we would need some sufficient conditions. Since we can reduce the problem of constrained minima/maxima to that of local ones (see the proof of Theorem 6.3.2), we might employ the sufficient condition that uses the second differential (Theorem 6.1.5). Cf. Exercise 6.3.17.
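The candidate points found in Example 6.3.4 can be double-checked numerically. The Python sketch below does the following:

- lists the candidates together with the corresponding multiplier $\lambda$, obtained from (6.3.2);
- evaluates the left-hand sides of (6.3.2)–(6.3.5);
- evaluates $f$ at the candidates.

Sampling random points on the sphere, as in the check that accompanies it, is only a plausibility check, not a proof.

```python
import numpy as np

def f(x, y, z):
    return x**2 * y + x * y**2 + z**2

def lagrange_system(x, y, z, lam):
    """Left-hand sides of (6.3.2)-(6.3.5)."""
    return np.array([2 * x * y + y**2 - 2 * lam * x,
                     x**2 + 2 * x * y - 2 * lam * y,
                     2 * z - 2 * lam * z,
                     x**2 + y**2 + z**2 - 1])

candidates = []                                      # tuples (x, y, z, lambda)
for sgn in (1.0, -1.0):
    x = sgn * np.sqrt(2) / 2                         # Case 1, x = y: lambda = 3x/2
    candidates.append((x, x, 0.0, 1.5 * x))
    p = sgn * np.sqrt(3) / 3                         # Case 1, x + y = p, xy = -1/3
    d = np.sqrt(p**2 + 4.0 / 3.0)                    # x, y are the roots of s^2 - p s - 1/3
    candidates.append(((p + d) / 2, (p - d) / 2, 0.0, -p / 2))
    candidates.append((0.0, 0.0, sgn, 1.0))          # Case 2, x = y = 0
    candidates.append((2 / 3, 2 / 3, sgn / 3, 1.0))  # Case 2, x = y = 2/3

values = [f(x, y, z) for x, y, z, _ in candidates]
```

The largest and smallest entries of `values` are $1$ and $-\frac{\sqrt2}{2}$, matching the comparison made in the text.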
Example 6.3.5 (Existence of the principal eigenvalue). Let $p > 1$ be a real number, $X := W_0^{1,p}(0,1)$.$^{31}$ Consider the eigenvalue problem
$$-\frac{d}{dt}\bigl(|\dot x(t)|^{p-2}\dot x(t)\bigr) = \lambda|x(t)|^{p-2}x(t), \quad t \in (0,1), \qquad x(0) = x(1) = 0, \qquad (6.3.6)$$
with a real parameter $\lambda$. This problem is linear for $p = 2$ and nonlinear for $p \ne 2$.
We say that $\lambda \in \mathbb{R}$ is an eigenvalue of (6.3.6) if there is a weak solution $x \in X$, $x \ne o$, of (6.3.6), i.e.,
$$\int_0^1 |\dot x(t)|^{p-2}\dot x(t)\dot y(t)\,dt = \lambda\int_0^1 |x(t)|^{p-2}x(t)y(t)\,dt \qquad (6.3.7)$$
holds for every $y \in X$. The corresponding $x$ is then called an eigenfunction associated with the eigenvalue $\lambda$.$^{32}$

31 We will work with the norm $\|x\| = \bigl(\int_0^1 |\dot x(t)|^p\,dt\bigr)^{1/p}$.
32 To see the analogy with the linear case, the reader should notice that for $p = 2$ such a function $x$ is an eigenvector (Definition 1.1.27) and $\lambda$ is an eigenvalue of the linear operator $Bx = -\ddot x$, $\operatorname{Dom}B = \{x \in W_0^{1,2}(0,1) : x(0) = x(1) = 0\} \subset L^2(0,1)$. For $p = 2$, the identity (6.3.7) can be interpreted as the operator equation $x = \lambda Ax$, where $A$ is defined by the equality $(Ax, y)_{W_0^{1,2}(0,1)} = (x, y)_{L^2(0,1)}$. The eigenvalues of (6.3.6) are then the reciprocal values of the eigenvalues of $A$.

6.3. Relative Extrema and Lagrange Multipliers

405

Since (6.3.7) must also hold for y = x, we obtain




= 0 1

p
|x(t)|

dt

,
|x(t)|p dt

which implies that > 0 for any eigenvalue .


We will prove that the value

$$
\lambda_1 = \inf_{\substack{x\in X\\ x\neq o}}\frac{\int_0^1|\dot x(t)|^p\,dt}{\int_0^1|x(t)|^p\,dt},
\tag{6.3.8}
$$

i.e.,

$$
\lambda_1 = \inf_{x\in X}\left\{\int_0^1|\dot x(t)|^p\,dt : \int_0^1|x(t)|^p\,dt = 1\right\},
$$

is attained and use the Lagrange Multiplier Method to show that $\lambda_1$ is the least
eigenvalue (principal eigenvalue) of (6.3.6). Let us prove that the infimum in (6.3.8)
is achieved at an $x_1\in X$ with $\int_0^1|x_1(t)|^p\,dt = 1$.

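Indeed, there exists a minimizing sequence $\{x_n\}_{n=1}^\infty\subset X$ such that

$$
\int_0^1|x_n(t)|^p\,dt = 1\quad\text{and}\quad\int_0^1|\dot x_n(t)|^p\,dt\to\lambda_1.
$$

In particular, this means that the sequence $\{x_n\}_{n=1}^\infty$ is bounded in $X$. By the
reflexivity of $X$ and the compact embedding $X = W_0^{1,p}(0,1)\hookrightarrow\hookrightarrow L^p(0,1)$ (see
Theorem 1.2.28 and Exercise 1.2.46(i)) there exist a subsequence $\{x_{n_k}\}_{k=1}^\infty\subset\{x_n\}_{n=1}^\infty$ and a function $x_1\in X$ such that

$$
x_{n_k}\rightharpoonup x_1\ \text{in } X\quad\text{and}\quad x_{n_k}\to x_1\ \text{in } L^p(0,1).
$$

Hence

$$
\int_0^1|x_1(t)|^p\,dt = 1\quad\text{and}\quad\|x_1\|^p\le\liminf_{k\to\infty}\|x_{n_k}\|^p = \lambda_1,
$$

i.e. (the reverse inequality holds by the definition of $\lambda_1$), $\int_0^1|\dot x_1(t)|^p\,dt = \lambda_1$.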

406

Chapter 6. Variational Methods

Now we apply Theorem 6.3.2 with

$$
f(x) = \int_0^1|\dot x(t)|^p\,dt\quad\text{and}\quad g(x) = \int_0^1|x(t)|^p\,dt - 1.
$$

The Fréchet derivatives of $f$ and $g$ at $x_1$ (in the space $X$) are given by

$$
f'(x_1)y = p\int_0^1|\dot x_1(t)|^{p-2}\dot x_1(t)\,\dot y(t)\,dt,\qquad
g'(x_1)y = p\int_0^1|x_1(t)|^{p-2}x_1(t)\,y(t)\,dt\qquad\text{for any } y\in X
$$

(cf. Exercise 3.2.35). Since $x_1\neq o$, we also have $g'(x_1)\neq o$, and so the assumptions
of Theorem 6.3.2 are fulfilled. Hence there exists $\lambda\in\mathbb R$ such that
$f'(x_1) = \lambda g'(x_1)$,
which is equivalent to

$$
\int_0^1|\dot x_1(t)|^{p-2}\dot x_1(t)\,\dot y(t)\,dt = \lambda\int_0^1|x_1(t)|^{p-2}x_1(t)\,y(t)\,dt
\tag{6.3.9}
$$

for any $y\in X$. Setting $y = x_1$ in (6.3.9) we get $\lambda = \lambda_1$.
Now it follows from (6.3.7) and (6.3.8) that $\lambda_1$ is the least eigenvalue of (6.3.6).
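For $p = 2$ the principal eigenvalue of (6.3.6) is $\lambda_1 = \pi^2$, with eigenfunction $\sin\pi t$. The Python sketch below is a minimal illustration and not part of the original argument. It approximates $\lambda_1$ by minimizing a discrete version of the quotient in (6.3.8). Several assumptions are our own: only the linear case $p = 2$ is treated, $(0,1)$ is replaced by a uniform grid with $N$ subintervals and zero boundary values, integrals are replaced by sums, and the minimizer is found by inverse iteration. All function names are hypothetical.

```python
import math


def solve_tridiagonal(diag, off, rhs):
    """Solve T w = rhs, where T is tridiagonal with constant diagonal `diag`
    and constant sub/super-diagonal `off` (Thomas algorithm)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    w = [0.0] * n
    w[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        w[i] = d[i] - c[i] * w[i + 1]
    return w


def rayleigh_quotient(v, h):
    """Discrete (int |x'|^2 dt) / (int |x|^2 dt) for interior values v, x(0) = x(1) = 0."""
    padded = [0.0] + v + [0.0]
    num = sum((padded[j + 1] - padded[j]) ** 2 for j in range(len(padded) - 1)) / h
    den = sum(a * a for a in v) * h
    return num / den


def principal_eigenvalue(N=200, iters=60):
    """Minimize the discrete Rayleigh quotient by inverse iteration (p = 2 only)."""
    h = 1.0 / N
    v = [1.0] * (N - 1)  # values at the interior nodes t_j = j*h
    for _ in range(iters):
        # discrete -x'' has diagonal 2/h^2 and off-diagonal -1/h^2
        w = solve_tridiagonal(2.0 / h ** 2, -1.0 / h ** 2, v)
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return rayleigh_quotient(v, h)
```

For $N = 200$ the result matches the exact discrete value $4N^2\sin^2(\pi/(2N))$ and differs from $\pi^2$ by roughly $2\cdot 10^{-4}$. For $p\neq 2$ the discrete problem is nonlinear, and this linear-algebra shortcut does not apply.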
Remark 6.3.6. Let us emphasize that Theorem 6.3.2 provides a necessary condition
only. It means that not every point $a\in M$ for which

$$
f'(a) - \sum_{i=1}^N\lambda_i\varphi_i'(a) = o\quad\text{with some }\lambda_i\in\mathbb R,\ i = 1,\dots,N,
$$

need be a point of local extremum of $f$ relative to $M$! On the other hand, to
find all local extrema of $f$ relative to $M$ one has to start with finding all $\lambda_i\in\mathbb R$,
$i = 1,\dots,N$, such that the functional $f - \sum_{i=1}^N\lambda_i\varphi_i$ has a critical point $a\in M$. It
is a well-known fact from the calculus of several real variables (when $X = \mathbb R^N$)
that the set of all such $a$'s is almost always finite (see, e.g., Example 6.3.4).
Hence a very natural and deep question arises: How many points $a$ do we have if
$\dim X = \infty$?
Remark 6.3.7. Let us denote by $\Lambda$ the set of all $\lambda\in\mathbb R$ such that $f - \lambda g$
has a critical point $a\in M$. If $X$ is a Hilbert space of infinite dimension, then in
Krasnoselski [78, Chapter 6] the reader can find the proof of the assertion that the
set $\Lambda$ contains a sequence of nonzero numbers $\lambda_n$ such that $\lambda_n\to 0$. The same
assertion for a Banach space $X$ can be found in Citlanadze [26], Browder [18],
Fučík & Nečas [55]. Actually, the whole Chapter 6 of the lecture notes by Fučík
et al. [56] is devoted to this problem. For more recent references the reader can
consult Zeidler [136] and the bibliography therein.
Let us emphasize that in all above results the authors prove that the cardinality of the set $\Lambda$ is equal to infinity. The question "When is $\Lambda$ a countable set?"
is much more involved. Some partial results in this direction can be found in Fučík
et al. [56]. The proofs are based on a stronger version of the Morse Theorem and
go beyond the scope of this book.
Proposition 6.3.8. Let $H$ be an $N$-dimensional Hilbert space and let $A$ be a self-adjoint operator in $H$. Then $A$ has $N$ real eigenvalues $\lambda_1,\dots,\lambda_N$ (if they are
counted with their multiplicities), and the corresponding eigenvectors $e_1,\dots,e_N$
form an orthonormal basis in $H$.
Proof. Consider two functions $f,\varphi_1 : H\to\mathbb R$ defined by

$$
f(x) = (Ax,x),\qquad\varphi_1(x) = (x,x) - 1,\qquad x\in H.
$$

Then the set

$$
M_1 = \{x\in H : \varphi_1(x) = 0\}
$$

(the unit sphere in $H$) is a compact subset of $H$, and the continuous function $f$
assumes its maximum in $M_1$ at a point $e_1\in M_1$. By Theorem 6.3.2, there is a
$\lambda_1\in\mathbb R$ such that

$$
f'(e_1) - \lambda_1\varphi_1'(e_1) = o.
$$

A simple calculation shows that $f'(e_1)h = 2(Ae_1,h)$, $\varphi_1'(e_1)h = 2(e_1,h)$. Therefore

$$
(Ae_1 - \lambda_1e_1,h) = 0\quad\text{for all } h\in H,\qquad\text{i.e.,}\qquad Ae_1 = \lambda_1e_1.
$$

Taking $h = e_1$ we also get

$$
\lambda_1 = (Ae_1,e_1) = \max_{x\in M_1}(Ax,x).
$$

In particular, $\lambda_1$ is the largest (equivalently, first) eigenvalue. To find the second
eigenvalue we add another constraint

$$
\varphi_2(x) := (x,e_1) = 0
$$

(remember that eigenvectors of a symmetric matrix are pairwise orthogonal). The
function $f$ again has a maximum with respect to

$$
M_2 = \{x\in H : \varphi_1(x) = \varphi_2(x) = 0\}
$$

and thus there are $e_2\in M_2$, $\lambda_2,\tilde\lambda_2\in\mathbb R$ such that

$$
f'(e_2)h - \lambda_2\varphi_1'(e_2)h - \tilde\lambda_2\varphi_2'(e_2)h = (2Ae_2 - 2\lambda_2e_2 - \tilde\lambda_2e_1,h) = 0
\tag{6.3.10}
$$

for all $h\in H$. In particular, for $h = e_1$ we get

$$
0 = (2Ae_2,e_1) - \tilde\lambda_2\|e_1\|^2 = 2(e_2,Ae_1) - \tilde\lambda_2 = 2\lambda_1(e_2,e_1) - \tilde\lambda_2 = -\tilde\lambda_2,
$$

and consequently $\tilde\lambda_2 = 0$. The equality (6.3.10) hence yields

$$
Ae_2 = \lambda_2e_2
$$

and, similarly as above,

$$
\lambda_2 = \max_{\substack{\|x\| = 1\\ (x,e_1) = 0}}(Ax,x).
$$

It is obvious that we can proceed by induction to obtain all eigenvalues $\lambda_1,\dots,\lambda_N$
and to show that the corresponding eigenvectors $e_1,\dots,e_N$ are orthonormal and
form a basis of $H$.
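The proof finds $\lambda_1\ge\lambda_2\ge\dots$ by maximizing $(Ax,x)$ on the unit sphere under the extra constraints $(x,e_1) = \dots = (x,e_{k-1}) = 0$. The Python sketch below imitates this numerically. It is an illustration only, not the Lagrange multiplier argument. It assumes that $A$ is a real symmetric positive definite matrix, so that power iteration restricted to the orthogonal complement of the eigenvectors already found converges to the constrained maximizer. It also assumes that the fixed starting vector has a nonzero component in every eigendirection. The function names are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def normalize(x):
    n = math.sqrt(dot(x, x))
    return [a / n for a in x]


def successive_max_eigenpairs(A, iters=500):
    """Eigenpairs (largest eigenvalue first) of a symmetric positive definite matrix A.

    The k-th vector approximates a maximizer of (Ax, x) on the unit sphere subject to
    (x, e_j) = 0 for the eigenvectors e_j found before (power iteration with deflation).
    """
    n = len(A)
    found = []  # list of (eigenvalue, unit eigenvector)
    for _k in range(n):
        x = normalize([float(i + 1) for i in range(n)])  # fixed start, assumed generic
        for _step in range(iters):
            # enforce the constraints (x, e_j) = 0 by orthogonal projection
            for _lam, e in found:
                c = dot(x, e)
                x = [a - c * b for a, b in zip(x, e)]
            # power-iteration step towards the constrained maximizer of (Ax, x)
            x = normalize(matvec(A, x))
        found.append((dot(matvec(A, x), x), x))  # Rayleigh quotient, since |x| = 1
    return found
```

For the matrix with rows $(2,1,0)$, $(1,2,1)$, $(0,1,2)$ it returns the eigenvalues $2+\sqrt2$, $2$, $2-\sqrt2$ in this order, together with mutually orthogonal unit eigenvectors.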

Corollary 6.3.9. Let $A = (a_{ij})_{i,j=1,\dots,N}$ be a symmetric matrix ($a_{ij} = a_{ji}$ for
$i,j = 1,\dots,N$). Then there exist real numbers $\lambda_1,\dots,\lambda_N$ and a basis $e_1,\dots,e_N$
of $\mathbb R^N$ such that

$$
\sum_{i,j=1}^N a_{ij}x_ix_j = \sum_{i=1}^N\lambda_i\xi_i^2,\quad\text{where}\quad x = (x_1,\dots,x_N) = \sum_{i=1}^N\xi_ie_i.
$$
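Corollary 6.3.9 can be checked by hand for $N = 2$, where the eigenpairs of a symmetric matrix $\begin{pmatrix}a & b\\ b & d\end{pmatrix}$ have a closed form. The Python sketch below uses hypothetical helper names. It takes the orthonormal eigenbasis of Proposition 6.3.8, so the coordinates are $\xi_i = (x,e_i)$, and compares both sides of the identity.

```python
import math


def sym2_eigen(a, b, d):
    """Eigenvalues lam1 >= lam2 and orthonormal eigenvectors of [[a, b], [b, d]]."""
    mean = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    lams = [mean + rad, mean - rad]
    if b == 0:
        # already diagonal: the standard basis, ordered like the eigenvalues
        vecs = [[1.0, 0.0], [0.0, 1.0]] if a >= d else [[0.0, 1.0], [1.0, 0.0]]
        return lams, vecs
    vecs = []
    for lam in lams:
        v = [b, lam - a]  # satisfies (a - lam) * v[0] + b * v[1] = 0
        n = math.hypot(v[0], v[1])
        vecs.append([v[0] / n, v[1] / n])
    return lams, vecs


def both_sides(a, b, d, x):
    """Return (sum_ij a_ij x_i x_j, sum_i lam_i xi_i^2) with xi_i = (x, e_i)."""
    lhs = a * x[0] ** 2 + 2 * b * x[0] * x[1] + d * x[1] ** 2
    lams, vecs = sym2_eigen(a, b, d)
    xis = [x[0] * e[0] + x[1] * e[1] for e in vecs]
    rhs = sum(lam * xi ** 2 for lam, xi in zip(lams, xis))
    return lhs, rhs
```

For $a = d = 2$, $b = 1$ and $x = (3,-1)$, both sides equal $14$. Here $\lambda_1 = 3$, $\lambda_2 = 1$, $\xi_1 = \sqrt2$ and $\xi_2 = 2\sqrt2$.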

Remark 6.3.10. The procedure used in the proof of Proposition 6.3.8 has a
disadvantage, namely, to find the $k$th eigenvalue $\lambda_k$ it is necessary to know the
first $k-1$ eigenvectors $e_1,\dots,e_{k-1}$. Because of that it can be convenient to have
another expression for $\lambda_k$. We will now prove that

$$
\lambda_k = \min_{y_1,\dots,y_{k-1}}\max\left\{\frac{(Ax,x)}{\|x\|^2} : (x,y_1) = \dots = (x,y_{k-1}) = 0\text{ and } x\neq o\right\}
\tag{6.3.11}
$$

provided $\dim H\ge k$. Expression (6.3.11) is called the Minimax Principle.
Let $e_1,\dots,e_k$ be eigenvectors corresponding to the first $k$ eigenvalues $\lambda_1\ge\dots\ge
\lambda_k$. Take $y_1,\dots,y_{k-1}\in H$ and let

$$
\mathcal N = \{x\neq o : (x,y_1) = \dots = (x,y_{k-1}) = 0\}.
$$

There is an $\tilde x\in\mathcal N\cap\mathrm{Lin}\{e_1,\dots,e_k\}$, say $\tilde x = \sum_{i=1}^k\xi_ie_i$. A simple argument to
see this consists in the observation that the linear operator $\Phi : \mathbb R^k\to\mathbb R^{k-1}$ (or
$\mathbb C^k\to\mathbb C^{k-1}$) given by

$$
\Phi(\xi_1,\dots,\xi_k) = \left(\sum_{i=1}^k\xi_i(e_i,y_j)\right)_{j=1,\dots,k-1}
$$

must have a nontrivial kernel. For such an $\tilde x$ we have

$$
(A\tilde x,\tilde x) = \left(\sum_{i=1}^k\xi_i\lambda_ie_i,\ \sum_{j=1}^k\xi_je_j\right) = \sum_{i=1}^k\lambda_i|\xi_i|^2\ge\lambda_k\sum_{i=1}^k|\xi_i|^2 = \lambda_k\|\tilde x\|^2.
$$

This shows that the maximum in (6.3.11) (denoted by $m(y_1,\dots,y_{k-1})$) is not less
than $\lambda_k$ and therefore

$$
\inf_{y_1,\dots,y_{k-1}}m(y_1,\dots,y_{k-1})\ge\lambda_k,
$$

too. But the above calculation yields that

$$
m(e_1,\dots,e_{k-1}) = \lambda_k.
$$
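The inequality $m(y_1,\dots,y_{k-1})\ge\lambda_k$ and the equality $m(e_1,\dots,e_{k-1}) = \lambda_k$ can be observed numerically. The Python sketch below illustrates them under assumptions of our own: $H = \mathbb R^3$, $A = \mathrm{diag}(5,3,1)$ (so $\lambda_2 = 3$ and $e_1 = (1,0,0)$), $k = 2$, and hypothetical helper names. For one constraint vector $y$, it computes $m(y)$ exactly. It builds an orthonormal basis $u, v$ of the plane $\{x : (x,y) = 0\}$ by Gram–Schmidt and returns the larger eigenvalue of the $2\times2$ matrix of $(Ax,x)$ restricted to that plane.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def gram_schmidt(vectors, tol=1e-10):
    """Orthonormalize `vectors` in order, discarding (numerically) dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [a - c * b for a, b in zip(w, u)]
        n = math.sqrt(dot(w, w))
        if n > tol:
            basis.append([a / n for a in w])
    return basis


def max_rayleigh_on_complement(A, y):
    """m(y) = max{(Ax, x)/|x|^2 : x != o, (x, y) = 0} for a symmetric 3x3 matrix A."""
    # the first basis vector spans Lin{y}; the other two span its orthogonal complement
    basis = gram_schmidt([y, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    u, v = basis[1], basis[2]
    a = dot(matvec(A, u), u)
    b = dot(matvec(A, u), v)
    d = dot(matvec(A, v), v)
    # largest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]]
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)
```

With $y = e_1$ the function returns $3 = \lambda_2$. For any other $y$ it returns a value of at least $3$, so the minimum over $y$ is attained at $y = e_1$.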
Remark 6.3.11. This method of finding eigenvalues of a self-adjoint continuous
operator $A$ cannot be extended to infinite dimensional Hilbert spaces. The reason is
rather simple: such an operator need not have any eigenvector (Example: $Ax(t) =
tx(t)$, $x\in L^2(0,1)$). On the other hand, if we assume that $A$ is, in addition to
self-adjointness, also compact, then a similar result holds.
Theorem 6.3.12 (Courant–Fischer Principle). Let $A : H\to H$ be a compact, self-adjoint and positive³³ linear operator from an (infinite dimensional) separable real
Hilbert space $H$ into itself. Then all eigenvalues of $A$ are positive reals and there
exists an orthonormal basis of $H$ which consists of eigenvectors of $A$. If, moreover,

$$
\lambda_1\ge\lambda_2\ge\lambda_3\ge\dots > 0,\qquad\lambda_n\to 0\quad(n\to\infty),
$$

denote the eigenvalues of $A$, then

$$
\lambda_1 = \max\{(Au,u) : \|u\| = 1\}
$$

and

$$
\lambda_{k+1} = \min_{v_1,\dots,v_k}\max\{(Au,u) : \|u\| = 1,\ (u,v_1) = \dots = (u,v_k) = 0\},\qquad k = 1,2,\dots.
$$³⁴
Proof. Set

$$
F(u) = (Au,u),\qquad\varphi_1(u) = \|u\|^2 - 1\quad\text{for } u\in H,
$$

and

$$
M_1 = \{u\in H : \varphi_1(u) = 0\}.
$$

³³ A linear self-adjoint operator $A$ is said to be positive if $(Au,u) > 0$ for all $u\neq o$.

³⁴ The reader should compare this assertion and its proof with the Hilbert–Schmidt Theorem
(Theorem 2.2.16).

Let $\{u_n\}_{n=1}^\infty$ be a maximizing sequence for $F$ subject to $M_1$, i.e., $\|u_n\| = 1$,
$n = 1,2,\dots$, and

$$
\lim_{n\to\infty}F(u_n) = \sup\{F(u) : u\in M_1\}.
$$

The boundedness of $M_1$ and the compactness of $A$ imply (Proposition 2.2.4(iii))
that we can pass to a subsequence (denoted again as $\{u_n\}_{n=1}^\infty$) for which

$$
u_n\rightharpoonup e_1\quad\text{and}\quad Au_n\to Ae_1\quad\text{in } H
$$

with an $e_1\in H$. Then

$$
|(Au_n,u_n) - (Ae_1,e_1)|\le|(Au_n - Ae_1,u_n)| + |(Ae_1,u_n - e_1)|\to 0
$$

since both terms on the right-hand side approach zero. So

$$
F(e_1) = \sup\{F(u) : u\in M_1\}.
$$

In particular, we have $F(e_1) > 0$ and $e_1\neq o$. Let us prove that $\|e_1\| = 1$. Indeed, we have

$$
\|e_1\|\le\liminf_{n\to\infty}\|u_n\| = 1.
$$

Assume that $\|e_1\| < 1$. Then there exists $t > 1$ such that for $\tilde e_1 = te_1$ we have
$\|\tilde e_1\| = 1$, i.e., $\tilde e_1\in M_1$. Also

$$
F(\tilde e_1) = (A(te_1),te_1) = t^2(Ae_1,e_1) = t^2F(e_1) > \sup\{F(u) : u\in M_1\},
$$

a contradiction. Hence

$$
\lambda_1 = F(e_1) = \max\{F(u) : u\in M_1\}.
$$

Applying Theorem 6.3.2 we prove exactly as in Proposition 6.3.8 that $\lambda_1$ is an
eigenvalue of $A$ and $e_1$ is the corresponding eigenvector. Now, we proceed by
induction using

$$
M_n = \{u\in H : \|u\| = 1\text{ and }(u,e_1) = \dots = (u,e_{n-1}) = 0\}
$$

as above to get the sequence of eigenvalues

$$
\lambda_1\ge\lambda_2\ge\dots > 0
\tag{6.3.12}
$$

and the sequence of the corresponding eigenvectors $e_1,e_2,\dots$,³⁵ which are pairwise orthogonal. Since $H$ is infinite dimensional, the above
sequences are, in general, infinite.

³⁵ The reader should perform this part of the proof in detail.

Suppose now that there is $w\in H$ such that

$$
\|w\| = 1\quad\text{and}\quad(w,e_n) = 0\quad\text{for all } n\in\mathbb N.
$$

Then

$$
w\in\bigcap_{n=1}^\infty M_n,\quad\text{and thus}\quad(Aw,w)\le\lambda_n\quad\text{for } n = 1,2,\dots.
$$

Since $\lambda_n\to 0$ (Corollary 2.2.13), we have $(Aw,w) = 0$. The assumption on the
positivity of $A$ implies $w = o$, a contradiction. This result shows that $\{e_n\}_{n=1}^\infty$
is an orthonormal basis of $H$ (Corollary 1.2.36). Moreover, the sequence (6.3.12)
contains all eigenvalues of $A$. Indeed, if

$$
Aw = \lambda w\quad\text{for}\quad w = \sum_{n=1}^\infty\alpha_ne_n\neq o,
$$

then

$$
\lambda\alpha_n = \lambda_n\alpha_n\quad\text{for } n = 1,2,\dots.
$$

Therefore $\alpha_n = 0$ provided $\lambda_n\neq\lambda$. Since $w\neq o$, some $\alpha_n$ is nonzero, and so $\lambda = \lambda_n$ for that $n$.
The min max characterization of the $\lambda_n$'s follows as in the finite dimensional
case (Remark 6.3.10).

Remark 6.3.13. It is remarkable that the Minimax Principle holds even without
the assumption on the continuity of $A$ in the sense that

$$
\inf_{y_1,\dots,y_{k-1}}\sup\{(Ax,x) : x\in\mathrm{Dom}\,A,\ \|x\| = 1,\ (x,y_1) = \dots = (x,y_{k-1}) = 0\}
$$

yields either the $k$th eigenvalue or an upper bound of the essential spectrum of a
linear self-adjoint operator $A$ provided $A$ is bounded above. For details see, e.g.,
Reed & Simon [106].
There is also a dual characterization of the eigenvalues of $A$ called the
Courant–Weinstein Variational Principle.
Theorem 6.3.14 (Courant–Weinstein Variational Principle). Let $H$ be a real separable Hilbert space, $A : H\to H$ a positive compact self-adjoint linear operator.
Assume that the eigenvalues $\lambda_n$ of $A$ form a nonincreasing sequence

$$
\lambda_1\ge\lambda_2\ge\lambda_3\ge\dots\ge\lambda_n\ge\dots > 0,\qquad\lambda_n\to 0\quad(n\to\infty)
$$

(cf. Theorem 6.3.12), where the multiplicity of an eigenvalue indicates how many
times it is repeated in the above sequence. Then for any $n\in\mathbb N$,

$$
\lambda_n = \sup_{\substack{X\subset H\\ \dim X = n}}\ \inf_{\substack{u\in X\\ \|u\| = 1}}(Au,u).
$$

(Here $X$ is an arbitrary linear subspace of $H$ of dimension equal to $n$.)

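Proof. Keeping the notation from Theorem 6.3.12, in particular, $Ae_n = \lambda_ne_n$, we
denote for $n\in\mathbb N$ fixed

$$
\tilde\lambda_n = \sup_{\substack{X\subset H\\ \dim X = n}}\ \inf_{\substack{u\in X\\ \|u\| = 1}}(Au,u).
$$

Our aim is to prove $\tilde\lambda_n = \lambda_n$.
Step 1. We prove that $\tilde\lambda_n\ge\lambda_n$. Set

$$
X_0 = \mathrm{Lin}\{e_1,\dots,e_n\}.
$$

Then $X_0$ is a linear subspace of $H$, $\dim X_0 = n$, and clearly

$$
\tilde\lambda_n\ge\min_{\substack{u\in X_0\\ \|u\| = 1}}(Au,u).
$$

However, we can estimate the minimum of the quadratic form on the right-hand
side in terms of $\lambda_n$. For $u\in X_0$, $\|u\| = 1$ we have

$$
u = \sum_{i=1}^n x_ie_i,\qquad\sum_{i=1}^n x_i^2 = 1.
$$

Then

$$
(Au,u) = \left(\sum_{i=1}^n x_i\lambda_ie_i,\ \sum_{j=1}^n x_je_j\right) = \sum_{i=1}^n\lambda_ix_i^2\ge\lambda_n,\qquad\text{i.e.,}\qquad\tilde\lambda_n\ge\lambda_n.
$$

Step 2. We prove $\tilde\lambda_n\le\lambda_n$. Set

$$
Y = \overline{\mathrm{Lin}\{e_i\}_{i=n}^\infty}.
$$

Then $\operatorname{codim}Y = n-1$. Let $X$ be an arbitrary linear subspace of $H$, $\dim X = n$.
Then necessarily

$$
\dim(X\cap Y) > 0,
$$

and the space $X\cap Y$ must contain an element $w\neq o$. We can assume $\|w\| = 1$.
Since $w\in Y$, we have

$$
w = \sum_{i=n}^\infty x_ie_i,\qquad\sum_{i=n}^\infty x_i^2 = 1.
$$

The estimate of the quadratic form $(Au,u)$ on the unit sphere in $X$ yields

$$
\min_{\substack{u\in X\\ \|u\| = 1}}(Au,u)\le(Aw,w) = \sum_{i=n}^\infty\lambda_ix_i^2\le\lambda_n\sum_{i=n}^\infty x_i^2 = \lambda_n.
$$

Since $X$ is arbitrary, the inequality $\tilde\lambda_n\le\lambda_n$ follows.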
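Both steps of the proof can be watched in $\mathbb R^3$. The Python sketch below is our own illustration, with $A = \mathrm{diag}(5,3,1)$, $n = 2$ and hypothetical helper names. The function `min_rayleigh_on_plane` returns $\min\{(Au,u) : u\in X,\ \|u\| = 1\}$ for the plane $X$ spanned by two linearly independent vectors. It does so by taking the smaller eigenvalue of the $2\times2$ matrix of the quadratic form on an orthonormal basis of $X$. Step 1 corresponds to the plane $X_0 = \mathrm{Lin}\{e_1,e_2\}$, which gives exactly $\lambda_2 = 3$.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def orthonormal_pair(p, q):
    """Gram-Schmidt on two linearly independent vectors p, q."""
    u = [a / math.sqrt(dot(p, p)) for a in p]
    c = dot(q, u)
    w = [b - c * a for a, b in zip(u, q)]
    v = [a / math.sqrt(dot(w, w)) for a in w]
    return u, v


def min_rayleigh_on_plane(A, p, q):
    """min{(Ax, x) : x in Lin{p, q}, |x| = 1} for a symmetric matrix A."""
    u, v = orthonormal_pair(p, q)
    a = dot(matvec(A, u), u)
    b = dot(matvec(A, u), v)
    d = dot(matvec(A, v), v)
    # smaller eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]]
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b ** 2)
```

Here the space $Y$ of Step 2 is $\mathrm{Lin}\{e_2,e_3\}$. Every plane in $\mathbb R^3$ meets it in a nonzero vector, so no plane yields a minimum larger than $\lambda_2$.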

Example 6.3.15 (Higher eigenvalues). Let $p = 2$ in (6.3.6), i.e., let us consider the
eigenvalue problem

$$
\begin{cases}
\ddot x(t) + \lambda x(t) = 0, & t\in(0,1),\\
x(0) = x(1) = 0.
\end{cases}
\tag{6.3.13}
$$

The eigenvalues of the linear problem (6.3.13) can be calculated in an elementary
way. On the other hand, if we set $H := W_0^{1,2}(0,1)$ and define a positive and
compact operator $A : H\to H$ by the relation³⁶

$$
(Ax,y)_{W_0^{1,2}(0,1)} = \int_0^1 x(t)y(t)\,dt,
$$

then $\mu\neq 0$ is an eigenvalue of $A$ if and only if $\lambda = 1/\mu$ is an eigenvalue of (6.3.13)
(cf. footnote 32 on page 404). It follows from Theorem 6.3.14 that

$$
\frac{1}{\lambda_n} = \sup_{\substack{X\subset H\\ \dim X = n}}\ \min_{\substack{x\in X\\ \|x\| = 1}}\int_0^1|x(t)|^2\,dt.
$$
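The elementary computation gives $\lambda_n = n^2\pi^2$ for (6.3.13), with eigenfunctions $\sin n\pi t$. A finite difference analogue makes this concrete. The Python sketch below is an illustration under our own assumptions, not part of the text. It replaces $-\ddot x$ by the second difference quotient on a uniform grid with $N$ subintervals and zero boundary values. It then checks that the sampled functions $\sin n\pi t$ are exact eigenvectors of the discrete operator with eigenvalue $\mu_n = (4/h^2)\sin^2(n\pi h/2)$, $h = 1/N$, and that $\mu_n$ is close to $n^2\pi^2$ for small $n$. The function names are hypothetical.

```python
import math


def second_difference(v, h):
    """Discrete -x'': (Tv)_j = (2 v_j - v_{j-1} - v_{j+1}) / h^2, zero boundary values."""
    n = len(v)
    out = []
    for j in range(n):
        left = v[j - 1] if j > 0 else 0.0
        right = v[j + 1] if j < n - 1 else 0.0
        out.append((2 * v[j] - left - right) / h ** 2)
    return out


def discrete_mode(n, N):
    """Sampled sin(n*pi*t) at the interior nodes, its image under T, and mu_n."""
    h = 1.0 / N
    v = [math.sin(n * math.pi * j * h) for j in range(1, N)]
    mu = 4.0 / h ** 2 * math.sin(n * math.pi * h / 2) ** 2
    return v, second_difference(v, h), mu
```

For $N = 100$ the relative error $|\mu_n - n^2\pi^2|/(n^2\pi^2)$ is about $n^2\pi^2h^2/12$, which is below $10^{-3}$ for $n\le 3$.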

The following two exercises show the relation between the local (global) extremum subject to a constraint and the local (global) extremum of the functional
depending on a parameter (without the constraint).
Exercise 6.3.16. Prove the following assertion:
Let $f$, $\varphi$ be two real functionals defined on a real Hilbert space $H$. Let
the functional $f - \lambda\varphi$ ($\lambda\in\mathbb R$) have a local (global) extremum at a point
$x_0\in H$. Then the functional $f$ has a local (global) extremum subject to
the constraint $\{x\in H : \varphi(x) = \varphi(x_0)\}$ at the point $x_0$.
Exercise 6.3.17. Prove the following assertion:
Let $f,\varphi : X\to\mathbb R$ satisfy the assumptions of Theorem 6.3.2 and let
$x_0\in X$, $\lambda\in\mathbb R$ be such that

$$
f'(x_0) - \lambda\varphi'(x_0) = o.
$$

Assume, moreover, that there exist $D^2f(x_0;h,h)$, $D^2\varphi(x_0;h,h)$. Then
$x_0$ is a local minimum of $f - \lambda\varphi$ (without the constraint) provided the
quadratic form

$$
h\mapsto D^2f(x_0;h,h) - \lambda D^2\varphi(x_0;h,h),\qquad h\in X,
$$

is positive definite in $X$.

³⁶ By Example 2.2.17 the operator $A$ is also defined as $(Ax)(t) = \int_0^1 G(t,s)x(s)\,ds$, and the
compactness of $A$ follows.

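Exercise 6.3.18. Show that the first eigenvalue of

$$
\begin{cases}
\ddot x(t) + \lambda x(t) = 0, & t\in(0,\pi),\\
x(0) = x(\pi) = 0
\end{cases}
$$

is simple and equal to $1$, and that given $\lambda > -1$ there exists $c = c(\lambda) > 0$ such
that for any $x\in W_0^{1,2}(0,\pi)$,

$$
\int_0^\pi|\dot x(t)|^2\,dt + \lambda\int_0^\pi|x(t)|^2\,dt\ge c\int_0^\pi|\dot x(t)|^2\,dt.
$$

Exercise 6.3.19. Prove that for all $x\in W_0^{1,2}(0,\pi)$ the inequality

$$
\int_0^\pi|x(t)|^2\,dt\le\int_0^\pi|\dot x(t)|^2\,dt
$$

holds true.

Hint. Use Exercise 6.3.18.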
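The inequality of Exercise 6.3.19 (a form of Wirtinger's inequality) can be tested numerically. The Python sketch below is only a sanity check under our own assumptions and is not a proof. It evaluates the ratio $\int_0^\pi|x|^2\,dt\,/\int_0^\pi|\dot x|^2\,dt$ with the composite Simpson rule for a few functions in $W_0^{1,2}(0,\pi)$, each given together with its derivative. The ratio should never exceed $1$, and it equals $1$ for $x(t) = \sin t$, the eigenfunction of the first eigenvalue $1$ in Exercise 6.3.18. The helper names are hypothetical.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def wirtinger_ratio(x, dx):
    """int_0^pi |x|^2 dt / int_0^pi |x'|^2 dt for x(0) = x(pi) = 0 with derivative dx."""
    num = simpson(lambda t: x(t) ** 2, 0.0, math.pi)
    den = simpson(lambda t: dx(t) ** 2, 0.0, math.pi)
    return num / den
```

For $x(t) = t(\pi - t)$ the exact ratio is $(\pi^5/30)/(\pi^3/3) = \pi^2/10\approx 0.987$. For $x(t) = \sin 2t$ it is $1/4$.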

6.3A Contractible Sets

This appendix has a purely auxiliary character and will be used in the proof of the Krasnoselski Potential Bifurcation Theorem in Appendix 6.3B. The proofs of the assertions
from this appendix rely on the Brouwer Fixed Point Theorem (Theorem 5.1.3).
Definition 6.3.20. Let $A$ and $B$ be subsets of a topological space $Y$. Then by definition
$A$ is contractible into $B$ in the space $Y$, briefly

$$
A\rightsquigarrow B\quad\text{in } Y,
$$

if there exists a homotopy $h\in C([0,1]\times A,Y)$ such that for any $u\in A$,

$$
h(0,u) = u,\qquad h(1,u)\in B.
$$

The next assertion shows that $\rightsquigarrow$ is a transitive relation.

Lemma 6.3.21. Let $A$, $B$ and $C$ be subsets of $Y$. If $A\rightsquigarrow B$ and $B\rightsquigarrow C$ in $Y$, then also
$A\rightsquigarrow C$ in $Y$.
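Proof. Let us assume that $A\rightsquigarrow B$ and $B\rightsquigarrow C$ by means of homotopies $h$ and $g$. Define a
homotopy $f\in C([0,1]\times A,Y)$ by

$$
f(t,u) = \begin{cases}
h(2t,u), & 0\le t\le\frac12,\ u\in A,\\
g\big(2t-1,h(1,u)\big), & \frac12 < t\le 1,\ u\in A.
\end{cases}
$$

Both formulas give $h(1,u)$ at $t = \frac12$, so $f\in C([0,1]\times A,Y)$. Since $f(0,u) = u$ and $f(1,u) = g(1,h(1,u))\in C$, Definition 6.3.20 yields $A\rightsquigarrow C$.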
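Joining two homotopies as in Lemma 6.3.21 is easy to express in code. The Python sketch below is a hypothetical illustration in $Y = \mathbb R^2$, not taken from the text. $A$ is the unit circle. It is contracted by $h$ into $B = \{(0.5,0)\}$, and $B$ is contracted by $g$ into $C = \{(0,0)\}$. The second branch evaluates $g$ at $h(1,u)\in B$, which makes the two formulas agree at $t = 1/2$.

```python
import math


def concatenate(h, g):
    """Return f with f(0, u) = u and f(1, u) = g(1, h(1, u)), the join of h and g.

    h contracts A into B and g contracts B into C; both map (t, point) -> point.
    """
    def f(t, u):
        if t <= 0.5:
            return h(2 * t, u)          # run h at double speed
        return g(2 * t - 1, h(1.0, u))  # then run g, starting from h(1, u) in B
    return f


# Hypothetical example in Y = R^2: A = unit circle, B = {(0.5, 0)}, C = {(0, 0)}.
def h(t, u):
    return ((1 - t) * u[0] + t * 0.5, (1 - t) * u[1])


def g(t, v):
    return ((1 - t) * v[0], (1 - t) * v[1])


f = concatenate(h, g)
```

For every point $u$ of the circle, $f(0,u) = u$ and $f(1,u) = (0,0)$, and $f(\cdot,u)$ has no jump at $t = 1/2$.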
Let $H_1$ and $H_2$ be two closed subspaces of a Hilbert space $H$ such that

$$
H = H_1\oplus H_2.
$$

6.3A. Contractible Sets

415

Let $P_i : H\to H_i$, $i = 1,2$, be projections (cf. Example 1.1.13(i)), and assume that
$\dim H_1 < \infty$. Set

$$
R = \{x\in H : P_1x\neq o\}.
$$

The set $R$ equipped with the metric induced by the norm in $H$ is a metric space.
Lemma 6.3.22. The set $S_{1,r} := \partial B(o;r)\cap H_1$ is not contractible to a point in $R$.³⁷
Proof. It is enough to prove this assertion for the sphere with radius $r = 1$. Let us denote
it by $S_1$. We proceed in two steps. We prove first that if $S_1$ were contractible to a point
in $R$, then it would have to be contractible to a point in $S_1$. In the second step we show
that this fact contradicts the Brouwer Fixed Point Theorem (Theorem 5.1.3).
Step 1. If $S_1$ is contractible to a point in $R$, then there exist a continuous mapping
$f : [0,1]\times S_1\to R$ and $x_0\in R$ such that

$$
f(0,x) = x,\qquad f(1,x) = x_0\qquad\text{for all } x\in S_1.
$$

For $t\in[0,1]$, $x\in S_1$ set

$$
g(t,x) = \frac{P_1f(t,x)}{\|P_1f(t,x)\|}
$$

(this is well defined since $f$ takes values in $R$). Then $g(0,x) = x$ for $x\in S_1$, and $g$ deforms the set $S_1$ continuously to the point $\frac{P_1x_0}{\|P_1x_0\|}$ in $S_1$.
Step 2. Let the unit sphere $S_1\subset H_1$ be contractible to a point in $S_1$, i.e., there exist a
continuous map $g : [0,1]\times S_1\to S_1$ and a point $x_0\in S_1$ such that

$$
g(0,x) = x,\qquad g(1,x) = x_0\qquad\text{for all } x\in S_1.
$$

Now, we define $h : \overline{B(o;1)}\cap H_1\to\overline{B(o;1)}\cap H_1$ by

$$
h : x\mapsto\begin{cases}
-g\left(1 - \|x\|,\dfrac{x}{\|x\|}\right) & \text{for } x\neq o,\\[2mm]
-x_0 & \text{for } x = o.
\end{cases}
$$

Then $h$ is continuous. Since $\dim H_1 < \infty$, the Brouwer Fixed Point Theorem (Theorem 5.1.3) implies that there exists $y\in\overline{B(o;1)}\cap H_1$ such that

$$
h(y) = y.
$$

Since $h$ assumes only values from $S_1$, we have $y\in S_1$, $\|y\| = 1$. On the other hand,

$$
h(y) = -g(0,y) = -y,
$$

which is a contradiction.
Lemma 6.3.23. Let $\mathcal F$ be a subset of $R$. If there exists $x_0\in H_1$, $\|x_0\| = 1$, such that

$$
P_1(\mathcal F)\cap\{y\in H_1 : y = ax_0,\ a\in\mathbb R\} = \emptyset,
$$

then $\mathcal F$ is contractible to a point in $R$.

³⁷ I.e., there is no $x\in R$ such that $S_{1,r}\rightsquigarrow\{x\}$ in $R$.

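Proof. Define $f : [0,1]\times\mathcal F\to R$ as

$$
f(t,x) = \begin{cases}
x + 2tx_0[1 - (x,x_0)] & \text{for } t\in\left[0,\frac12\right],\ x\in\mathcal F,\\[1mm]
x_0 + 2(1-t)[x - (x,x_0)x_0] & \text{for } t\in\left(\frac12,1\right],\ x\in\mathcal F.
\end{cases}
$$

The mapping $f$ is continuous and deforms $\mathcal F$ to the point $x_0\in R$. It is sufficient to
verify that for any $t\in[0,1]$, $x\in\mathcal F$ we have

$$
P_1f(t,x)\neq o.
$$

Indeed, for any $t\in\left[0,\frac12\right]$ we have

$$
P_1f(t,x) = 2t[1 - (x,x_0)]x_0 + P_1x,
$$

and for $t\in\left(\frac12,1\right]$ we have

$$
P_1f(t,x) = [1 - 2(1-t)(x,x_0)]x_0 + 2(1-t)P_1x.
$$

For $t\in[0,1)$ we then have $P_1f(t,x)\neq o$ due to the assumption $P_1(\mathcal F)\cap\mathrm{Lin}\{x_0\} = \emptyset$.
For $t = 1$ we have

$$
P_1f(t,x) = x_0\neq o.
$$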
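The deformation in the proof can be tried out on a small example of our own choosing. Take $H = \mathbb R^3$, $H_1 = \mathrm{Lin}\{(1,0,0),(0,1,0)\}$, $H_2 = \mathrm{Lin}\{(0,0,1)\}$ and $x_0 = (1,0,0)$. Let $\mathcal F$ be a finite set of points whose projections $P_1x$ have a nonzero second coordinate, so that $P_1(\mathcal F)\cap\mathrm{Lin}\{x_0\} = \emptyset$. The Python sketch below (hypothetical function names) implements $f(t,x)$. It checks $f(0,x) = x$, $f(1,x) = x_0$, agreement of the two formulas at $t = 1/2$, and $P_1f(t,x)\neq o$ on a grid of times.

```python
import math

X0 = (1.0, 0.0, 0.0)  # the unit vector x0 in H1 (assumed example)


def P1(x):
    """Orthogonal projection of R^3 onto H1 = {x : x[2] = 0}."""
    return (x[0], x[1], 0.0)


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def deform(t, x):
    """The homotopy f(t, x) from the proof of Lemma 6.3.23."""
    s = inner(x, X0)
    if t <= 0.5:
        c = 2 * t * (1 - s)
        return tuple(a + c * b for a, b in zip(x, X0))
    w = 2 * (1 - t)
    return tuple(b + w * (a - s * b) for a, b in zip(x, X0))
```

In this example the $x_0$-component of $P_1f(t,x)$ equals $1$ for every $t > 1/2$. This is the coefficient $1 - 2(1-t)(x,x_0) + 2(1-t)(P_1x,x_0)$, which collapses to $1$ because $(P_1x,x_0) = (x,x_0)$ here.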

6.3B Krasnoselski Potential Bifurcation Theorem

Let us recall the definition of a potential operator.
Definition 6.3.24. Let $O$ be an open subset of a real Hilbert space $H$, $f : O\to H$. We
say that $f$ has a potential (in $O$) if there exists a functional $F : O\to\mathbb R$ which is Fréchet
differentiable in $O$, and for any $x\in O$ we have

$$
f(x) = F'(x).
\tag{6.3.14}
$$

Remark 6.3.25. Let us recall how to interpret the equality (6.3.14). The Fréchet derivative
$F'(x)$ is a continuous linear operator from $H$ into $\mathbb R$. It follows from the Riesz Representation Theorem (see Theorem 1.2.40) that there is a unique point $z := z(x)\in H$ such
that

$$
F'(x)y = (y,z)\quad\text{for any } y\in H.
$$

In what follows we will identify $F'(x)$ with $z(x)\in H$, i.e., we write $z = F'(x)$, and study bifurcation points
of the equation

$$
\lambda x - F'(x) = o.
\tag{6.3.15}
$$

The main objective of this appendix is to prove that (under the assumptions $F(o) = 0$,
$F'(o) = o$ and some assumptions concerning the smoothness of $F$)
every point $(\lambda,o)$ where $\lambda$ is a nonzero eigenvalue of $F''(o) : H\to H$ is a
bifurcation point of (6.3.15).

6.3B. Krasnoselski Potential Bifurcation Theorem

417

Theorem 6.3.26 (Krasnoselski Potential Bifurcation Theorem). Let $F$ be a (nonlinear)
functional on a Hilbert space $H$. Assume that

$$
F\text{ is twice differentiable in a certain neighborhood }\mathcal U(o)\text{ of } o\in H,
\tag{6.3.16}
$$

$$
F'\text{ is compact on }\mathcal U(o),
\tag{6.3.17}
$$

$$
F'' : \mathcal U(o)\to\mathcal L(H)\text{ is continuous at } o,
\tag{6.3.18}
$$

$$
F(o) = 0,\qquad F'(o) = o.
\tag{6.3.19}
$$

Then $(\lambda_0,o)$ where $\lambda_0\neq 0$ is a bifurcation point of

$$
\lambda x - F'(x) = o
\tag{6.3.20}
$$

if and only if $\lambda_0$ is an eigenvalue of the operator $A := F''(o)$.


Remark 6.3.27. Note that the equation (6.3.20) is a special case of the equation

$$
o = \lambda x - Ax + G(\lambda,x)
$$

from Theorem 5.2.23. Indeed, the left-hand side of (6.3.20) can be written as

$$
\lambda x - F''(o)x + [F''(o)x - F'(x)]
$$

where $F''(o)$ is a compact linear operator (see Proposition 5.2.21), and

$$
F''(o)x - F'(x) = o(\|x\|),\qquad\|x\|\to 0.
$$

Note first that the implication

if $\lambda_0\neq 0$ and $(\lambda_0,o)$ is a bifurcation point of (6.3.20), then $\lambda_0$ is an eigenvalue of $A$,

follows from Exercise 5.2.25.
So we will concentrate on the proof of the reverse implication. Roughly speaking,
we know that the linearization of (6.3.20), i.e., the equation

$$
(\lambda_0I - F''(o))x = o,
$$

has a nontrivial solution, and we want to show that the nearby nonlinear equation (6.3.20) also has a nontrivial solution.
The basic idea of the proof consists in the fact that (6.3.20) is a necessary condition
for $x$ to be a critical point of $F$ subject to the sphere

$$
\partial B(o;r) := \left\{x\in H : J(x) = \frac12r^2\right\},\quad\text{where}\quad J(x) = \frac12\|x\|^2.
$$

Here we use the fact that the identity is the differential of the functional $J$, and the Lagrange
Multiplier Method. Later we will prove the existence of a sufficiently large number of
critical points of $F$ on $\partial B(o;r)$. If we restrict ourselves to spheres with sufficiently small
radii ($\partial B(o;r)\subset\mathcal U(o)$ at least), we get critical points converging to zero. The last part of
the proof consists in showing that the corresponding Lagrange multipliers can be chosen
close to $\lambda_0$.

Let us assume that $\lambda_0\neq 0$ is an eigenvalue of the operator $A$. The assumption
(6.3.18) guarantees that $F''(o)$ is a linear self-adjoint operator (see Proposition 3.2.28).
We can assume, without loss of generality, that $\lambda_0 > 0$.
Let us start with a geometrical interpretation of the points $x\in\partial B(o;r)$ such that

$$
\lambda x = F'(x).
\tag{6.3.21}
$$

In this case the differential $F'(x)$ is perpendicular (recall that $F'(x)\in H$ in our interpretation) to the sphere $\partial B(o;r)$ at $x$. Then $x$ can be looked for as a limit of those points of
the sphere $\partial B(o;r)$ at which the tangent projections (see (6.3.22) below and Figure 6.3.1)
of $F'$ converge to zero. More precisely, we have the following assertion.

[Figure 6.3.1: a point $z$ on the sphere, the tangent hyperplane $\{y : (z,y) = 0\}$, and the tangent projection $D(z) = F'(z) - \frac{(F'(z),z)}{(z,z)}z$ of $F'(z)$.]
Lemma 6.3.28. For $z\in H$, $z\neq o$, set

$$
D(z) = F'(z) - \frac{(F'(z),z)}{(z,z)}\,z
\tag{6.3.22}
$$

($D(z)$ is the orthogonal projection of $F'(z)$ to the tangent space of $\partial B(o;\|z\|)$ at $z$³⁸).
Let $y_n\in\partial B(o;r)$, $y_n\rightharpoonup x_0$, let $F'$ be continuous, and

$$
\lim_{n\to\infty}F'(y_n) = y\neq o,\qquad\lim_{n\to\infty}D(y_n) = o.^{39}
\tag{6.3.23}
$$

Then $y_n\to x_0$, $y = F'(x_0)$, $x_0\neq o$, and

$$
\lambda x_0 - F'(x_0) = o\quad\text{where}\quad\lambda = \frac{1}{r^2}(F'(x_0),x_0).
\tag{6.3.24}
$$

³⁸ This tangent space is equal to $\{x\in H : (x,z) = 0\}$, see Remark 4.3.40.

³⁹ Both limits are considered with respect to the norm in $H$.

Proof. From the weak convergence $y_n\rightharpoonup x_0$ and from (6.3.23) we obtain

$$
(F'(y_n),y_n)\to(y,x_0)\quad\text{and hence}\quad\frac{(F'(y_n),y_n)}{r^2}\,y_n\rightharpoonup\frac{(y,x_0)}{r^2}\,x_0.
$$

At the same time, from the definition of $D(y_n)$ and (6.3.23) we have

$$
\frac{(F'(y_n),y_n)}{r^2}\,y_n = F'(y_n) - D(y_n)\to y.
$$

Hence

$$
y = \frac{1}{r^2}(y,x_0)\,x_0.
$$

Since $y\neq o$, we have $x_0\neq o$ and also $(y,x_0)\neq 0$. The definition of $D(y_n)$ and the fact
that $D(y_n)\to o$ yield

$$
y_n = r^2\,\frac{F'(y_n) - D(y_n)}{(F'(y_n),y_n)}\to\frac{r^2}{(y,x_0)}\,y = x_0.
$$

Continuity of $F'$ at $x_0$ then implies

$$
y = F'(x_0),\qquad\text{i.e.,}\qquad F'(x_0) = \frac{(y,x_0)}{r^2}\,x_0 = \frac{(F'(x_0),x_0)}{r^2}\,x_0.
$$
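Formula (6.3.22) is easy to evaluate. The Python sketch below uses a hypothetical quadratic model functional $F(x) = \frac12(Ax,x)$ on $\mathbb R^2$, so that $F'(x) = Ax$, with $A = \mathrm{diag}(3,1)$. This choice is ours and not the general nonlinear $F$ of the theorem. The sketch computes $D(z)$, confirms that $(D(z),z) = 0$, and illustrates that $D(z) = o$ when $z$ solves $\lambda z = F'(z)$ with $\lambda = (F'(z),z)/\|z\|^2$, as in (6.3.24). The function names are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def tangent_projection(grad, z):
    """D(z) = grad - ((grad, z) / (z, z)) z, where grad stands for F'(z) and z != o."""
    c = dot(grad, z) / dot(z, z)
    return [gi - c * zi for gi, zi in zip(grad, z)]


# Hypothetical model functional F(x) = (Ax, x)/2 on R^2, so that F'(x) = Ax.
A = [[3.0, 0.0], [0.0, 1.0]]


def grad_F(x):
    return [dot(row, x) for row in A]
```

At $z = (1,1)$ this gives $D(z) = (1,-1)$, which is tangent to the circle through $z$. At the eigenvector $z = (0.5,0)$ it gives $D(z) = o$ and $\lambda = 3$.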

We will look for a curve on the sphere $\partial B(o;r)$ which starts at a fixed point $x$,
along which the values of $F$ do not decrease, and on which after a finite (even if large) time
we almost reach a critical point of $F$. In other words, we are looking for a curve
$k = k(t,x)$, $t\in[0,\infty)$, $x\in\partial B(o;r)$ such that

$$
k(0,x) = x,
\tag{6.3.25}
$$

and for all $t\in(0,\infty)$ we require

$$
k(t,x)\in\partial B(o;r),\qquad\text{i.e.,}\qquad\|k(t,x)\|^2 = r^2.
$$

The last relation implies

$$
\frac{d}{dt}\|k(t,x)\|^2 = 0,
$$

which is equivalent to

$$
\left(\frac{d}{dt}k(t,x),\,k(t,x)\right) = 0\quad\text{for all } t\in(0,\infty).
\tag{6.3.26}
$$

The equality (6.3.26) states that for all $t\in(0,\infty)$ the element $\frac{d}{dt}k(t,x)$ is perpendicular to $k(t,x)$. This will be satisfied if we look for a solution of the initial value
problem

$$
\begin{cases}
\dfrac{d}{dt}k(t,x) = D(k(t,x)), & t\in(0,\infty),\\[2mm]
k(0,x) = x.
\end{cases}
\tag{6.3.27}
$$

The assumption (6.3.18) implies that $F'$ is Lipschitz continuous in a neighborhood of
$o$. Hence, for $r > 0$ sufficiently small, $D$ is Lipschitz continuous. Then, by virtue of
Corollary 3.1.6, there exists a unique solution of (6.3.27) which is defined on the whole
interval $(0,\infty)$. It follows from Remark 3.1.7 that this solution depends continuously on
the initial condition $x\in\partial B(o;r)$.

Let $k$ be a solution of the initial value problem (6.3.27). Then it has the following
important properties:
(i) For any $t\in(0,\infty)$ we have $\|k(t,x)\| = \|x\|$.
(ii) For any $t\in(0,\infty)$ we have
$$\frac{d}{dt}F(k(t,x)) = \big(F'(k(t,x)),D(k(t,x))\big) = \|D(k(t,x))\|^2\ge 0.$$
In other words, the values of the functional $F$ do not decrease along $k$ regardless of the
choice of $x\in\partial B(o;r)$.
(iii) For any $t\in(0,\infty)$ we have
$$F(k(t,x)) = F(x) + \int_0^t\|D(k(\tau,x))\|^2\,d\tau.$$
Since $F$ is bounded on $\partial B(o;r)$ (by the Mean Value Theorem and (6.3.19)), there
exists a sequence $\{t_i\}_{i=1}^\infty\subset(0,\infty)$ such that
$$\lim_{i\to\infty}D(k(t_i,x)) = o.$$
(iv) Since $\{k(t_i,x)\}_{i=1}^\infty$ is bounded, we can select a weakly convergent subsequence.⁴⁰

Summarizing, we have

Lemma 6.3.29. For any $x\in\partial B(o;r)$ there exist a sequence $\{t_i\}_{i=1}^\infty\subset(0,\infty)$ and $x_0\in H$
such that

$$
k(t_i,x)\rightharpoonup x_0,
\tag{6.3.28}
$$

$$
D(k(t_i,x))\to o,
\tag{6.3.29}
$$

$$
\{F(k(t_i,x))\}_{i=1}^\infty\text{ is a nondecreasing sequence.}
\tag{6.3.30}
$$

It follows from (6.3.28) and (6.3.17) that, passing to a subsequence if necessary, $F'(k(t_i,x))\to y$ for some $y\in H$. If we prove that $y\neq o$,
then the assumptions of Lemma 6.3.28 are verified with $y_n = k(t_n,x)$, and so the existence
of a solution $x_0$ of (6.3.20) with $\lambda$ described by (6.3.24) will be proved.
By an appropriate choice of the initial condition $x\in\partial B(o;r)$, we show that the
above convergence takes place and that $\lambda$ given by (6.3.24) is sufficiently close to $\lambda_0$.
Recall that $A = F''(o)$ is a compact linear self-adjoint operator in the Hilbert space
$H$ (see Proposition 5.2.21). Its spectrum consists of a countable set of real eigenvalues
with one possible limit point $\lambda = 0$. We split the set of all eigenvalues into the parts $\lambda\ge\lambda_0$
and $\lambda < \lambda_0$, respectively. We denote by $H_1$ and $H_2$, respectively, the corresponding closed
linear subspaces generated by the eigenvectors (see Theorem 2.2.16). Note that $\lambda_0 > 0$
implies that $\dim H_1 < \infty$. The eigenspace associated with $\lambda_0$ will be denoted by $H_0$. Let
$P_1$, $P_2$ be the orthogonal projections of $H$ onto $H_1$, $H_2$, respectively (see Figure 6.3.2).

⁴⁰ The reader is invited to justify (i)–(iv).

[Figure 6.3.2: the decomposition of $H$ into $H_1$ (which contains $H_0$) and $H_2$, with the orthogonal projections $P_1$ and $P_2$.]

Let us denote

$$
S_1 = \{x\in H_1 : \|x\| = r\}.
$$

Lemma 6.3.30. There exists $r_0 > 0$ such that $B(o;r_0)\subset\mathcal U(o)$ (see (6.3.16)), and for all
$0 < r < r_0$ we have
(i) there is no $t\in[0,\infty)$ for which the set $k(t,S_1)$ is contractible to a point (see
Definition 6.3.20) in
$$R = \{x\in H : P_1x\neq o\},$$
(ii) for any $t\in[0,\infty)$ there exists $x_t\in S_1$ such that
$$P_1k(t,x_t)\in H_0,\qquad\text{i.e.,}\qquad k(t,x_t)\in H_0\oplus H_2.$$

Proof. Lemma 6.3.23 and (i) imply (ii) (see Exercise 6.3.33). Hence we prove only (i).
According to Lemma 6.3.21 it is sufficient to prove that for any $t$ the set $S_1$ is contractible
into $k(t,S_1)$ in $R$. Indeed, according to Lemma 6.3.22 the set $S_1$ is not contractible to
a point in $R$. Since $k$ is a continuous function of both variables, it is sufficient to prove
that it assumes only values from $R$: we want to prove that

$$
P_1k(t,x)\neq o\qquad\text{for all } t\in[0,\infty),\ x\in S_1.
$$

We have

$$
F(k(0,x)) = F(x)\ge\frac12(F''(o)x,x) - \varepsilon(\|x\|)\|x\|^2\ge\left(\frac12\lambda_0 - \varepsilon(\|x\|)\right)\|x\|^2
\tag{6.3.31}
$$

where $\varepsilon(r)\to 0$ as $r\to 0$ (see (6.3.19) and Proposition 3.2.27). Note that the last
inequality holds because $x\in H_1$. Since $F(k(t,x))$ is nondecreasing in $t$, we conclude from this
that

$$
F(k(t,x))\ge\left(\frac12\lambda_0 - \varepsilon(r)\right)r^2.
\tag{6.3.32}
$$

On the other hand, we have an estimate from above (we write $k$ instead of $k(t,x)$ for the
sake of brevity):

$$
F(k) = \frac12(F''(o)k,k) + \left[F(k) - \frac12(F''(o)k,k)\right]\le\frac12(F''(o)P_1k,P_1k) + \frac12(F''(o)P_2k,P_2k) + \varepsilon(\|k\|)\|k\|^2
$$

(note that $(F''(o)P_1k,P_2k) = 0$ due to $H_1\perp H_2$).


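Denote

$$
\Lambda = \max\{\lambda : \lambda\in\sigma(F''(o))\},\qquad\mu = \sup\{\lambda\in\sigma(F''(o)) : \lambda < \lambda_0\}.
$$

Then

$$
\frac12(F''(o)P_1k,P_1k) + \frac12(F''(o)P_2k,P_2k) + \varepsilon(\|k\|)\|k\|^2\le\frac\Lambda2\|P_1k\|^2 + \frac\mu2\|P_2k\|^2 + \varepsilon(\|k\|)\|k\|^2 = \frac\mu2\|k\|^2 + \frac{\Lambda - \mu}{2}\|P_1k\|^2 + \varepsilon(\|k\|)\|k\|^2.
$$⁴¹

Hence, due to the fact that $\|k\| = r$, we have

$$
F(k)\le\frac\mu2r^2 + \frac{\Lambda - \mu}{2}\|P_1k\|^2 + \varepsilon(r)r^2.
\tag{6.3.33}
$$

It follows from (6.3.32) and (6.3.33) that

$$
\|P_1k(t,x)\|^2\ge\frac{\lambda_0 - \mu}{\Lambda - \mu}\,r^2 - \frac{4}{\Lambda - \mu}\,\varepsilon(r)r^2.
$$

This implies the existence of $r_0$ such that

$$
\|P_1k(t,x)\|^2\ge ar^2\quad\text{for any } r\le r_0,\quad\text{where } a = a(r_0) > 0.
\tag{6.3.34}
$$

In particular, $P_1k(t,x)\neq o$ for all $t\in[0,\infty)$, $x\in S_1$. This completes the proof of Lemma 6.3.30.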

Proof of Theorem 6.3.26. Step 1. Let $\{t_n\}_{n=1}^\infty$ be an arbitrary sequence of positive numbers. Let $x_n$ be a point from $S_1$ for which

$$
P_1k(t_n,x_n)\in H_0
$$

(its existence follows from (ii) of Lemma 6.3.30). Since $S_1$ is compact, we can select a
strongly convergent subsequence (denoted again by $\{x_n\}_{n=1}^\infty$) such that

$$
\lim_{n\to\infty}x_n = \tilde x.
\tag{6.3.35}
$$

Step 2. It follows from Lemma 6.3.29 that there is a sequence $\{\tau_i\}_{i=1}^\infty$ such that

$$
k(\tau_i,\tilde x) = y_i\rightharpoonup x_0\quad\text{in } H,
$$

and at the same time also

$$
D(y_i)\to o.
$$

Step 3. The compactness of $F'$ implies that (passing again to a subsequence if necessary)
there exists $y\in H$ such that

$$
\lim_{i\to\infty}F'(y_i) = y.
$$

We show that $y\neq o$. Indeed, we have

$$
(F'(y_i),P_1y_i)\to(y,P_1x_0).
$$

Also, for all $i\in\mathbb N$, we have the estimate

$$
(F'(y_i),P_1y_i) = (F''(o)y_i,P_1y_i) + (F'(y_i) - F''(o)y_i,P_1y_i)\ge\lambda_0\|P_1y_i\|^2 - \varepsilon(\|y_i\|)\|y_i\|^2\ge\frac12\lambda_0ar^2
$$

for all $r$ small enough due to (6.3.34). This immediately implies

$$
(y,P_1x_0)\neq 0,\quad\text{and so}\quad y\neq o,\ x_0\neq o.
$$

⁴¹ We use the identity $\|P_1k\|^2 + \|P_2k\|^2 = \|k\|^2$.

Step 4. We have just verified the assumptions of Lemma 6.3.28. Hence $y_i\to x_0$ in $H$,
and $x_0$ solves (6.3.20) with $\lambda$ given by (6.3.24):

$$
\lambda x_0 - F'(x_0) = o,
$$

1
(F  (x0