
ADVANCED CALCULUS

ALLEN DEVINATZ

Northwestern University

HOLT, RINEHART AND WINSTON

New York  Chicago  San Francisco  Atlanta  Dallas
Montreal  Toronto  London

Copyright © 1968 by Holt, Rinehart and Winston, Inc.

All Rights Reserved


Library of Congress Catalog Card Number: 68-18409

Printed in the United States of America


PREFACE
The contents of this book represent a somewhat expanded version of
a one-year course that I have given from time to time since 1961.
Those taking the course have been mainly undergraduate and first-year
graduate students concentrating in mathematics. Occasionally,
students from engineering and the physical sciences have taken the
course and have told me they enjoyed it. I recommend that students
who select a course such as this should generally have a little more
mathematical maturity than that afforded by the usual
freshman-sophomore courses in the calculus. One excellent way to gain
such maturity is through a beginning course in linear algebra,
although the contents of such a course are not a specific prerequisite
for the understanding of this book.

Section 1.1 on logic is to be read by the student. Of course, such a
brief introduction is not intended to teach the student the elements
of logic, but rather to make him aware of the formal processes involved
in mathematical reasoning. It may not be too well understood on the
first reading, but if the student will reread it several times during the
course, it probably will begin to appear more reasonable. The notation
of the propositional calculus is to be viewed as a concise shorthand for
mathematical statements. My experience has been that students learn
to use the notation in a reasonable way in a relatively short time and
with very little trouble.

For those instructors who do not wish to spend time on an extended
treatment of the real number system, I have arranged matters so that
they can begin the discussion of real numbers with Section 1.8. In that
section, I have given what amounts to a set of axioms for the real
number system, the more usual starting point for a beginning course in
analysis.

If, in going through the material, my peers should at times accuse
me of being pedantic, I plead guilty to the charge; my aim in doing
this has been deliberate. All too often students beginning the serious
study of mathematics get the idea that a vague or seemingly trivial
point should be waved away. I have tried to convey the idea to the novice
that he should be sure that he can really prove these seemingly trivial
points.

As far as the differential calculus is concerned, there is probably
not too much choice in the way one can proceed. As for the integral
calculus, I have chosen the more cumbersome and less general method
of Riemann-Darboux integration and Jordan content rather than one
of the more modern theories of measure and integration. Although
I do not feel that a historical approach to a subject is necessarily always
the best, in the case of integration my view is that a student cannot
fully appreciate or even fully understand the more modern theories
until he has seen the gradual and natural evolution of the ideas involved.
I make absolutely no claims to originality. I have no gimmicks or

special pedagogical devices as aids in understanding. Mathematics is


a difficult subject; I have tried to set down a small but important portion
of it in as straightforward, clean, and concise a way as I know how,
consistent with the level of student to whom it is addressed. Only the
readers can ultimately decide whether or not I have succeeded.
I am grateful to several friends for their help in preparing the
manuscript. I am deeply indebted to Sam Lachterman of St. Louis University.


He read the entire manuscript, pointed out a large, but finite, number
of errors, and showed me how to make several proofs in a shorter and
more elegant way. Jacob K. Goldhaber of the University of Maryland
read several of the chapters and gave me some excellent advice. Thanks
are also due to my former colleagues, Sebastian Koh and A. Edward
Nussbaum of Washington University. The former used a preliminary
version of the first five chapters in his class and made several
suggestions for improvement, while I had several helpful conversations with
the latter on the subject matter of the book. Above all I am grateful to
the various classes of students who endured varying versions of the
course.
Evanston, Illinois
March 1968

A. D.

CONTENTS

Preface

CHAPTER 1 | THE REAL NUMBER SYSTEM
  1.1  Some Ideas about Logic
  1.2  Sets
  1.3  Relations and Functions
  1.4  The Natural Numbers
  1.5  The Integers and the Rationals
  1.6  Countability
  1.7  The Reals
  1.8  A Review of the Real Number System and Sequences
  1.9  Properties of the Reals

CHAPTER 2 | LIMITS
  2.1  The Limit Concept and Continuity
  2.2  The Heine-Borel Theorem and Uniform Continuity
  2.3  Monotone Functions
  2.4  Limit Superior and Limit Inferior

CHAPTER 3 | INFINITE SERIES
  3.1  Series of Real Numbers
  3.2  Convergence Tests
  3.3  Decimal Expansions
  3.4  Sequences and Series of Functions
  3.5  Infinite Products

CHAPTER 4 | DIFFERENTIATION
  4.1  The Derivative Concept
  4.2  Differentiation Rules
  4.3  Mean Value Theorems
  4.4  Taylor's Remainder Formulas
  4.5  Power Series
  4.6  The Weierstrass Approximation Theorem

CHAPTER 5 | INTEGRATION
  5.1  Riemann-Darboux Integrals
  5.2  Properties and Existence of Riemann-Darboux Integrals
  5.3  Improper Integrals
  5.4  Riemann-Stieltjes Integrals
  5.5  Functions of Bounded Variation and the Existence of
       Riemann-Stieltjes Integrals

CHAPTER 6 | HIGHER DIMENSIONAL SPACE
  6.1  Real Vector Spaces
  6.2  Euclidean Spaces
  6.3  Topology in E^n
  6.4  Continuous Functions
  6.5  Linear Transformations
  6.6  Determinants
  6.7  Function Spaces

CHAPTER 7 | HIGHER DIMENSIONAL DIFFERENTIATION
  7.1  Motivation
  7.2  Directional Derivatives and Differentials
  7.3  Differentiation Rules
  7.4  Higher-Order Differentials and Taylor's Theorem
  7.5  The Inverse and Implicit Function Theorems
  7.6  Maxima and Minima

CHAPTER 8 | HIGHER DIMENSIONAL INTEGRATION
  8.1  Riemann-Darboux Integrals
  8.2  Jordan Content
  8.3  Existence and Properties of Riemann-Darboux Integrals
  8.4  Iterated Integration
  8.5  The Transformation Theorem for Integrals

CHAPTER 9 | THE INTEGRATION OF DIFFERENTIAL FORMS
  I. LINE INTEGRALS
  9.1  Motivation and Definitions
  9.2  The Length of a Curve
  9.3  A Special Case of Stokes' Theorem
  9.4  Closed and Exact Differentials
  II. SURFACE INTEGRALS
  9.5  Motivation and Definitions
  9.6  The Algebra of Differential Forms
  9.7  Closed and Exact Forms
  9.8  Manifolds
  9.9  Integration on Manifolds
  9.10 Stokes' Theorem

Symbols
Index

THE REAL NUMBER SYSTEM

1.1 SOME IDEAS ABOUT LOGIC

new statements that are called true. In this sense mathematics is a


complicated game and truth has nothing to do with reality (whatever
that elusive thing is!) or various concepts of truth discussed by the
philosophers. Truth, for us, shall be something prescribed by a set of
rules.
To be somewhat more specific, a branch of mathematics is usually
constructed in the following way. A small number of statements are
written down which are called axioms and these are arbitrarily called
true and the letter 't' assigned to them. By means of a given rule,
from each true statement a new statement can be formed which is
called false and the letter 'f' is assigned to these. Then there are
various rules for assigning 't' or 'f' to new statements formed from
collections of statements that already have 't' or 'f' values attached to them.
This enlarges our collection of statements with 't' or 'f' attached to them.
We can then use our rules on this enlarged collection to get a possibly
still larger collection of statements having 't' or 'f' values attached to
them. We can then apply the rules again to get a possibly still larger
collection of statements having 't' or 'f' assigned to them, and so on.
It is always our hope that in starting from the given axioms and
applying our rules we will not get a statement that has both letters
't' and 'f' attached to it. If we get a statement with both 't' and 'f'
attached to it, we say that our axioms are inconsistent. If this is never the
case, we say our axioms are consistent.
For a consistent set of axioms, those statements taking on the value
't' are called lemmas, propositions, theorems, and corollaries. It is not always
clear which true statements should bear which names. However,
current usage seems to suggest the following rules. A theorem is
an important true statement. A lemma is a true statement that is used
in constructing the proof of another true statement and usually does
not have wider applicability. A corollary is a true statement that is an
immediate consequence of a true statement. Finally, a proposition is
a true statement that is not a lemma or corollary but is not important
enough to be called a theorem. Many people also use the word
'scholium' to play the same role as the word 'proposition' or even
possibly to be a true statement that is not as important as a proposition.
We shall now give some rules for forming new statements from given
statements A and B that have values 't' or 'f'. That is to say, we shall
construct new statements containing the statements A or B or both and
give rules for assigning 't' or 'f' to the new statements. We shall do
this by means of a truth table, which will list the symbol 't' or 'f' to be
given to a new statement given the various combinations of 't' and 'f'
values that A and B can take on.
a. Negation -A
   (To be read: not A.)

       A | -A
       --+----
       t |  f
       f |  t

b. Implication A => B
   (To be read: A implies B, or if A then B, or B if A, or A only if
   B, or A is a sufficient condition for B, or B is a necessary
   condition for A.)

       A  B | A => B
       -----+-------
       t  t |   t
       t  f |   f
       f  t |   t
       f  f |   t

c. Conjunction A & B
   (To be read: A and B.)

       A  B | A & B
       -----+------
       t  t |   t
       t  f |   f
       f  t |   f
       f  f |   f

d. Disjunction A ∨ B
   (To be read: A or B.)

       A  B | A ∨ B
       -----+------
       t  t |   t
       t  f |   t
       f  t |   t
       f  f |   f
The preceding truth tables give a prescription for the use of the
symbols '-', '=>', '&', and '∨'. As such, we can loosely think of these
tables as giving a meaning to statements containing these symbols. We
shall try to explain this in more detail.
In our previous discussion we have used the word 'statement' as
if this were a well-known concept to the reader. Actually we think the
reader has a good idea of this concept, but we shall be pedantic and


comment on it further. To form a statement in the written English


language, for example, we begin with an alphabet consisting of 52
Latin letters (lower case and capitals), the various punctuation marks,
and various other symbols such as parentheses, brackets, and so forth.
We may even suppose the alphabet contains a symbol that cannot be
seen-an empty space. A statement in the English language, meaningful
or not, is a string of these symbols usually placed in a horizontal row,
and one has a rule to tell where a statement begins and where it ends.
A string of Latin letters that begins and ends with the empty-space
symbol and has no empty-space symbol in between is called a word.
A string of objects beginning with two empty-space symbols and a
capital Latin letter and ending with a period, and having no period in
between is called a sentence, and so forth.
Statements in mathematics are formed in the same way, that is, by
placing symbols in various positions. However, it is usually the case that
we form these statements from a different collection of symbols than
those that we use for the English language. The symbols '-', '=>', '&',
and '∨' are part of our mathematical alphabet.


Now, what we have described as statements of the written English
language cannot be said to constitute the written English language.
Most of the strings of symbols that would be written down would not
be meaningful. The meaningful statements are those prescribed by
means of lists of words contained in dictionaries and by means of the
rules of grammar. A moment's reflection is enough to convince us that
for someone who does not already know the written English language
it would be impossible to describe the rules of grammar or how to use
a dictionary in terms of the written language. The various rules must
be described in terms of a different language that is understood by
the learner. For a child this is usually done by means of a spoken
language, and for someone who understands a different written
language such as, for example, Hebrew or Sanskrit, the rules of written
English can be described in terms of those languages.
The same situation persists with regard to the mathematical language.
In the mathematical language we say that the meaningful statements
are those which can be assigned a value 't' or 'f'. The rules whereby
we describe which mathematical statements are meaningful must be
prescribed by a language outside the mathematical language. For us,
the describing language is the English language. We are assuming that
a truth table is part of the English language, since if we had a mind to
do it we could describe these tables in terms of the conventional
language. So we see that truth tables are nothing more than rules, written
in a language that we presumably understand, which describe which
statements in our mathematical language are meaningful. Of course,
we have some intuitive ideas of what we want and these prescriptions
of the truth tables are nothing more than formalizations of these
intuitive ideas.
Let us give an example that illustrates how the truth-table method
works. Let us suppose that A, B, and C are statements that can be
given 't' or 'f' values. We wish to show that the statement

    [(A => B) & (B => C)] => [A => C]

always has a 't' value regardless of the values taken on by A, B, and C.
To get the table to fit on one page, let us set 'D' for 'A => B', 'E' for
'B => C', 'F' for '(A => B) & (B => C)', and 'G' for 'A => C'. The table
looks as follows.

    A  B  C | D  E  F  G | F => G
    --------+------------+-------
    t  t  t | t  t  t  t |   t
    t  t  f | t  f  f  f |   t
    t  f  t | f  t  f  t |   t
    t  f  f | f  t  f  f |   t
    f  t  t | t  t  t  t |   t
    f  t  f | t  f  f  t |   t
    f  f  t | t  t  t  t |   t
    f  f  f | t  t  t  t |   t

Since the last column always has the 't' value we have shown what we set
out to show.
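
The truth-table method is mechanical enough to be carried out by a short program. The following is a minimal sketch in Python and is not part of the original text; the names `implies`, `is_tautology`, and `syllogism` are illustrative assumptions, and the values 't' and 'f' are represented by True and False. It runs through all eight assignments of values to A, B, and C and checks that the statement above receives the 't' value in every case.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth table for A => B: 'f' only when A is 't' and B is 'f'."""
    return (not a) or b


def is_tautology(statement, num_vars: int) -> bool:
    """True if `statement` has the 't' value under every assignment."""
    return all(
        statement(*values)
        for values in product([True, False], repeat=num_vars)
    )


def syllogism(a: bool, b: bool, c: bool) -> bool:
    """The statement [(A => B) & (B => C)] => [A => C]."""
    return implies(implies(a, b) and implies(b, c), implies(a, c))


# Check the last column of the table for all eight rows at once.
print(is_tautology(syllogism, 3))
```

A statement such as A => B alone does not pass this check, since it takes the 'f' value when A is 't' and B is 'f'.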
Once we have given the basic symbols of our mathematical language
we can define new symbols in terms of our basic symbols. As a case in
point we can define the equivalence symbol '<=>'. When we write

    A <=> B,

this is to be read: A is equivalent with B. It is sometimes also read:
A if and only if B. The set of symbols 'A <=> B' is defined as another
representation for the statement (A => B) & (B => A), which is in terms of
the basic symbols of our mathematical language. Once 'A <=> B' has
been defined, we can consider it as the name of a statement. It is easily
seen that the statement A <=> B has the value 't' attached to it whenever
A and B both have the 't' value or whenever A and B both have the 'f'
value. Otherwise A <=> B has the 'f' value attached to it.
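
The claim about the values of the equivalence symbol can be checked row by row. The sketch below is an illustration only, with hypothetical helper names `implies` and `equivalent`; it builds the equivalence from the two implications, as in the definition, and compares it with the statement "A and B have the same value".

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth table for A => B."""
    return (not a) or b


def equivalent(a: bool, b: bool) -> bool:
    """A <=> B, defined as shorthand for (A => B) & (B => A)."""
    return implies(a, b) and implies(b, a)


# In each of the four rows the defined symbol agrees with "same value".
rows_agree = all(
    equivalent(a, b) == (a == b)
    for a, b in product([True, False], repeat=2)
)
print(rows_agree)
```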

In the above paragraph we have used a short symbol to replace a
more cumbersome one. This is, in essence, the nature of a definition.
A definition gives a (usually shorter) new name to something that can
be described in terms of known symbols or names. The object, as in
any other language, is efficiency in expression, which leads to
efficiency in thought. The criterion for a definition is that it should only
introduce new symbols or names for groups of known symbols and we
should not be able to obtain any true statements by use of the
definition that could not be obtained without it. In other words, we should
think of definitions as simply introducing a system of shorthand into
the mathematical language.
Suppose now that we have a set of statements H_1, H_2, ..., H_n which
are meaningful in the sense that they have 't' or 'f' values attached to
them. In addition, we suppose that there are statements A_1, ..., A_m
so that each H_k is composed of some of the A_i together with the logical
symbols '-', '=>', etc. It may not, in general, be known whether the A_i
are meaningful. Further, suppose C is a statement that is composed of
some of the A_i and the logical symbols, and we don't know in general
whether C is meaningful. However, suppose that under the
supposition that A_1, ..., A_m can be given 't' or 'f' values we find by the use of
our truth-table rules that

    H_1 & H_2 & ... & H_n => C

always has a 't' value. Then, if all the H_k have the 't' value we shall give
C a 't' value. This is a new rule for giving statements 't' or 'f' values and
is usually called the rule of inference. Our hope here, as with the
truth-table rules, is that starting from our axioms we cannot also give C an
'f' value, that is, -C cannot be given a 't' value by the scheme that has
been outlined.
As an example of this new rule, suppose A_1 => A_2 and A_2 => A_3 are
axioms and therefore have 't' values, even though we don't know
whether A_1, A_2, and A_3 can be given 't' or 'f' values. However, under
the supposition that they are meaningful we have established by a truth
table that

    [(A_1 => A_2) & (A_2 => A_3)] => (A_1 => A_3)

always has a 't' value. Hence we would give A_1 => A_3 a 't' value. Another
example is

    [A_1 & (A_1 => A_2)] => A_2.

If A_1 and A_1 => A_2 have 't' values, we would assign the 't' value to A_2.
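
The rule of inference may only be applied to an implication that always has a 't' value, and this is exactly what a truth table checks. As a hedged illustration (the function names below are assumptions, not notation from the text), the sketch confirms that [A_1 & (A_1 => A_2)] => A_2 always has a 't' value, whereas the superficially similar statement [A_2 & (A_1 => A_2)] => A_1 does not, so the rule of inference cannot be used with the latter.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth table for A => B."""
    return (not a) or b


def always_true(statement, num_vars: int) -> bool:
    """True if `statement` has the 't' value for every assignment."""
    return all(
        statement(*values)
        for values in product([True, False], repeat=num_vars)
    )


def detachment(a1: bool, a2: bool) -> bool:
    """[A_1 & (A_1 => A_2)] => A_2."""
    return implies(a1 and implies(a1, a2), a2)


def converse_error(a1: bool, a2: bool) -> bool:
    """[A_2 & (A_1 => A_2)] => A_1, which fails when A_1 is 'f' and A_2 is 't'."""
    return implies(a2 and implies(a1, a2), a1)


print(always_true(detachment, 2))
print(always_true(converse_error, 2))
```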
In case a statement can be given a 't' value by means of the rules we
have prescribed, then the statement is said to be derived or proved from
the axioms. Suppose a statement C has a 't' value and it is obtained by
our rules through an implication H_1 & H_2 & ... & H_n => C. Each
H_k is in turn either an axiom or obtained through an implication of
a conjunction of other statements, and so forth, until we finally get
back to where all the statements appearing in the conjunction on the left
are axioms. The collection of all such statements is called the derivation
or proof of C. However, it would be quite impractical to list all these
statements beginning with the axioms and, as the reader well knows
from experience, in practice a proof usually consists of just a portion
of this collection. In other words, the proof starts with known true
statements that have been proved elsewhere and proceeds from there.
The set of rules we have given above is usually called the
propositional calculus or sometimes a model for the propositional calculus.
However, the symbols we introduced, and the rules for their use, are
not rich enough to provide an adequate basis for most of the discourse
of mathematics. Hence we shall introduce some new symbols together
with rules for their use which in formal studies of logic is called the
predicate calculus. A good deal of mathematics is described in the
language of the predicate calculus, usually at an informal level, since a
very formal approach gets very cumbersome and may often interfere
with understanding. However, there are some situations in which a
formal approach may very much clarify and facilitate the handling
of complicated situations. We have in mind the precise statements of
complicated definitions and, in particular, the negating of complicated
statements.
Suppose that Q(x) is a statement that depends on a variable 'x'. The
reader may think of a variable as the name of an unspecified object
that can be replaced by any member of a specified set. The counterpart
of a variable in the English language is a pronoun or a common noun.
Since x is not a specified object, in general it would make no sense to
associate a 't' or 'f' value with Q(x). However, by adding certain
qualifying statements to Q(x) it may be possible to do so. One of these
qualifying statements is 'for every x,' which in symbols is '(x)' or '∀x.'
We may then write down a statement:

    (x)(Q(x))    or    ∀x Q(x).

This is to be read: For every x the statement Q(x) is true. Another
qualifying statement is 'there exists an x', which in symbols is '(∃x).'
We may then write down another statement:

    (∃x)(Q(x)).

This is to be read: There exists an x such that Q(x) is true. It may now
be possible to attach 't' or 'f' values to these statements. The symbols
'(x)' and '∀x' are called universal quantifiers and the symbol '(∃x)' is
called an existential quantifier.
It is also possible that we may have a statement Q(x, y) which depends
on two variables and we may write down a statement:

    (x)(y)(Q(x, y)).

This is to be read: For every x and for every y the statement Q(x, y)
is true. We may also write down a statement:

    (x)(∃y)(Q(x, y)).

This statement is to be read: For every x there exists a y such that Q(x, y)
is true. Clearly, the statements

    (x)(∃y)(Q(x, y))    and    (∃y)(x)(Q(x, y))

are different statements simply because the symbols are placed in a
different order. However, even intuitively they cannot be considered
equivalent. In the second case y is independent of which x is chosen,
whereas in the first case y may depend on x. As an example, suppose
x and y may be replaced by real numbers. We may then write the true
statement:

    (x)(∃y)(x < y).

The translation of this statement into the English language reads:
For every real number there exists another real number (depending
on x) that is larger. On the other hand, we may write down the false
statement:

    (∃y)(x)(x < y).

In the English language this says: There exists a (fixed) real number
that is larger than every real number. Many other situations will arise
later on in the text which will demonstrate again and again the
distinction between these two types of statements.
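
The distinction can be seen concretely when the variables range over a small finite set, where both statements can be evaluated by exhaustion. The sketch below is an illustration under stated assumptions: the real numbers of the example above cannot be enumerated, so a hypothetical three-element domain and a hypothetical statement Q(x, y) ("y is the successor of x modulo 3") are used instead. Python's `all` plays the role of a universal quantifier and `any` that of an existential quantifier.

```python
# Hypothetical finite domain standing in for the "specified set".
domain = range(3)


def q(x: int, y: int) -> bool:
    """Hypothetical statement Q(x, y): y is the successor of x modulo 3."""
    return y == (x + 1) % 3


# (x)(∃y)(Q(x, y)): the y that is found may depend on x.
forall_exists = all(any(q(x, y) for y in domain) for x in domain)

# (∃y)(x)(Q(x, y)): a single y must serve for every x.
exists_forall = any(all(q(x, y) for x in domain) for y in domain)

print(forall_exists, exists_forall)
```

Here the first statement holds because each x has its own successor, while the second fails because no single y is the successor of every x.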

We shall now outline some of the rules for operating with the
predicate calculus. These rules will show the connections with the rules we
outlined previously for operating with the propositional calculus. The
reader may very well recognize that he has been using these rules in
an informal way in his previous studies of mathematics.

As in the propositional calculus we shall suppose that statements
involving quantifiers are to be given 't' and 'f' values. We start with a
given set of statements that have 't' or 'f' values attached to them, and
by means of a set of rules we obtain other statements that have 't' or 'f'
values attached to them. The first rules we shall give are those for
negating statements. The rule for negation is the following:

    -(x)(Q(x)) <=> (∃x)(-Q(x)).

This is to be taken to mean that if the left side has a 't' or 'f' value, then
the right side is to be given the 't' or 'f' value, respectively, and vice
versa. From this rule it is easy to establish the rule

    -(∃x)(Q(x)) <=> (x)(-Q(x)).

Using these rules we can negate more complicated expressions. For
example,

    -(x1)(x2)(∃x3)(∃x4)(x5)(Q(x1, ..., x5))

is equivalent to

    (∃x1)(∃x2)(x3)(x4)(∃x5)(-Q(x1, ..., x5)).

This is done by negating one at a time; that is, we consider

    Q1(x1) = (x2)(∃x3)(∃x4)(x5)(Q(x1, ..., x5))

and then -(x1)(Q1(x1)) is equivalent to (∃x1)(-Q1(x1)), and
continue on in this way.
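
On a finite set the two negation rules can be verified by brute force. The following sketch is only an illustration: the four-element domain and the name `negation_rules_hold` are assumptions, and every possible statement Q on the domain is represented by its pattern of truth values. It checks that negating a universal quantifier gives an existential quantifier over the negated statement, and vice versa.

```python
from itertools import product

# Hypothetical finite domain; any finite set would serve.
domain = range(4)


def negation_rules_hold(q) -> bool:
    """Check both negation rules for one statement Q on the domain."""
    not_forall = not all(q(x) for x in domain)   # -(x)(Q(x))
    exists_not = any(not q(x) for x in domain)   # (∃x)(-Q(x))
    not_exists = not any(q(x) for x in domain)   # -(∃x)(Q(x))
    forall_not = all(not q(x) for x in domain)   # (x)(-Q(x))
    return not_forall == exists_not and not_exists == forall_not


# Each tuple of four truth values describes one possible statement Q.
all_hold = all(
    negation_rules_hold(lambda x, pattern=pattern: pattern[x])
    for pattern in product([True, False], repeat=len(domain))
)
print(all_hold)
```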


Now, let us go on to some of the other rules. Usually it is necessary
to start a chain of proof by means of known true statements that
involve quantifiers. A broad description of the rules is as follows. First,
have a consistent way of removing the quantifiers from the known true
statements. Next manipulate the resulting quantifier-free statements
by the rules of the propositional calculus. Finally, have a consistent
way of replacing the quantifiers. The final quantified statement can be
given a 't' value provided we have used all the rules correctly.
We believe that the rules of manipulating with quantifiers are best
explained by means of examples. Let us first look at a simple situation
involving only universal quantifiers. Suppose P(x), Q(x), and R(x) are
statements involving the variable 'x', and it is known that the statements

    (x)(P(x) => Q(x)),
    (x)(Q(x) => R(x)),

are true, that is, have a 't' value attached to them. It would seem natural,
at least from the rules given for the propositional calculus, that we
should be able to conclude that the statement

    (x)(P(x) => R(x))

has a 't' value attached to it. The rule here is to remove the universal
quantifiers to get the statements

    P(x) => Q(x),
    Q(x) => R(x).

Consider each of these two statements to have the 't' value attached to
it even though with the variable 'x' the statements may have no
meaning. However, we are thinking that if we replace 'x' by any member of
a given set, then the statements will be true. By the methods of the
propositional calculus we conclude that

    {[P(x) => Q(x)] & [Q(x) => R(x)]} => {P(x) => R(x)}

is a true statement. Consequently, by the rule of inference we conclude
that

    P(x) => R(x)

is a true statement. The rule now is to add the universal quantifier and
get the statement

    (x)(P(x) => R(x)).


Let us now look at a simple situation that involves an existential
quantifier. Suppose it is known that the statements

    (x)(P(x) => Q(x)),
    (∃x)(P(x))

are true. It would seem reasonable that the rules we give should lead
to the conclusion that the statement

    (∃x)(Q(x))

is true. Now, the statement (∃x)(P(x)) has the intuitive meaning that
P(x) is true when x is replaced by only certain members (possibly only
one) of a specified set. Hence the rule we now formulate is that when
an existential quantifier is removed, the variable 'x' shall be replaced by
a symbol that stands for a definite but unspecified member of some
set. This is in accordance with the conventions used in ordinary
mathematical discourse. For the purpose of this discussion let us use the
beginning letters of the Latin alphabet to stand for these definite but
unspecified symbols. Consequently, remove the existential quantifier
to obtain the statement P(a). If we remove the universal quantifier
from the statement (x)(P(x) => Q(x)) to get the statement P(x) => Q(x),
then there would be no way for us to proceed. For the statement
{P(a) & [P(x) => Q(x)]} => Q(a) is not a true statement according to
truth-table methods. But

    {P(a) & [P(a) => Q(a)]} => Q(a)

is a true statement, and by the rule of inference we conclude that
Q(a) is a true statement. Hence we adopt the rule that whenever a
universal quantifier is removed, we may retain the variable 'x' or replace
it by one of the letters 'a', 'b', and so on, which stand for definite but
unspecified objects. The tactics depend on just what is intended to be
accomplished.
Once we have established that Q(a) is true, we want to reinstate a
quantifier. The rule is that whenever the letters 'a', 'b', and so on, appear
in our statements, they can be quantified by existential quantifiers.
Hence we get that the statement

    (∃x)(Q(x))

is to be given a 't' value.

In removing existential quantifiers, the rule we made is that the
letters usually reserved for variables are replaced by other letters.
This is done to serve as a warning that, when we reach the point where
we want to reinstate quantifiers, we should not add a universal
quantifier where we should have added an existential quantifier. As an
example of the type of difficulty that could arise if we did not follow this
procedure, let us consider the statements:

    (x)(x < 1 => x + 1 < 2),
    (∃x)(x < 1).

In these statements we are supposing that x may be replaced by real
numbers. Hence these are true statements. Remove the existential
quantifier but do not replace the variable 'x' by another letter. We get
the statements:

    x < 1 => x + 1 < 2,
    x < 1,

which are to be considered as being true. Applying the propositional
calculus we are led to the statement:

    x + 1 < 2.

Now, in adding a quantifier to this statement we may forget that x
cannot be replaced by any real number. If we add a universal
quantifier we get the wrong statement:

    (x)(x + 1 < 2).

In the simple example we have considered, it is, of course, not hard
to remember that in adjoining a quantifier we should add an existential
rather than a universal quantifier. However, in a long and complicated
chain of reasoning this may not be so easy to remember. For this
reason we must proceed in the manner outlined.
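
The difficulty just described can be observed numerically. The sketch below is an illustration only: a handful of sample values (an assumption, since the real numbers cannot be listed) stands in for the real numbers. Both starting statements hold on the sample, the statement obtained with an existential quantifier holds, and the wrongly quantified statement (x)(x + 1 < 2) fails.

```python
def implies(a: bool, b: bool) -> bool:
    """Truth table for A => B."""
    return (not a) or b


# Hypothetical sample of real numbers used in place of all reals.
samples = [-2.0, 0.0, 0.5, 1.0, 3.0]

# (x)(x < 1 => x + 1 < 2) and (∃x)(x < 1): the two starting statements.
premise_universal = all(implies(x < 1, x + 1 < 2) for x in samples)
premise_existential = any(x < 1 for x in samples)

# Correct reinstatement: (∃x)(x + 1 < 2).
right_conclusion = any(x + 1 < 2 for x in samples)

# Incorrect reinstatement: (x)(x + 1 < 2), refuted by x = 1.0 or x = 3.0.
wrong_conclusion = all(x + 1 < 2 for x in samples)

print(premise_universal, premise_existential)
print(right_conclusion, wrong_conclusion)
```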
Let us give several more examples which will illustrate several other
points that arise when dealing with quantifiers. In the next three
examples we shall suppose our variables represent real numbers. We
start with the following true statements:

    (∃y)(x)(x + y = x),
    (x)(x < x + 1).

In statements involving both existential and universal quantifiers, the
rule is to remove the existential quantifiers first. Doing this we get the
statements:

    (x)(x + a = x),
    (x)(x < x + 1).

Now remove the universal quantifiers to get the statements:

    x + a = x,
    x < x + 1.

In dealing with the equality sign we shall adopt the rule that the symbols which
stand on either side of the equality may be used interchangeably in any
expression involving these symbols. Hence we get the statement:

    x + a < x + 1.

Now replace the quantifiers. The rule here is the reverse of the rule
in removing the quantifiers. First, replace the universal quantifier to
get the statement:

    (x)(x + a < x + 1).

Now replace the existential quantifier, by a variable other than 'x', to
get the true statement:

    (∃y)(x)(x + y < x + 1).

Note here that we do not try to quantify the symbol '1'. This would lead
to a different statement from the one we have obtained. The general
rule is not to quantify symbols in the final statement of a chain of
argument which are not quantified in the initial statements.
Let us look at a second example closely related to, but not the same
as, the previous example. Again the variable 'x' shall represent real
numbers. Let us start with the true statements:

    (∃y)(x)(x + y = x),
    (∃y)(x)(x < x + y).

In removing the existential quantifiers we should be careful not to
use the same symbol in separate statements involving the existential
quantifiers. If, for example, we used the same symbol upon removing
the existential quantifiers we would get the statements:

    (x)(x + a = x),
    (x)(x < x + a).

After a few more steps this would lead to the incorrect statement:

    (∃y)(x)(x + y < x + y).

Hence, removing the existential quantifiers in the correct way and
then removing the universal quantifiers, we get the statements:

    x + a = x,
    x < x + b.

Using our rule for equality we get the statement:

    x + a < x + b.

Replacing the quantifiers, first the universal and then the existential,
we get the true statement:

    (∃y)(∃z)(x)(x + y < x + z).
We shall consider one final example. The variables are again assumed
to represent real numbers. Let us start with the true statements:

    (x)(∃y)(x < y),
    (x)(y)(x < y => x + y < 2y).

If we proceed according to the previous rules we get the statements:

    x < a,
    x < a => x + a < 2a.

Using the propositional calculus we get the statement:

    x + a < 2a.

Replacing the quantifiers by our previous rules leads to the incorrect
statement:

    (∃y)(x)(x + y < 2y).

The reader has probably already recognized what has gone wrong
here. In removing the quantifier from the first statement the ambiguous
element a does not remain the same for all x but must change as x
changes. In mathematics one usually indicates this by writing 'a_x' or
'a(x)'. If we do this we get the statements:

    x < a_x,
    x < a_x => x + a_x < 2a_x.

By means of the propositional calculus these statements lead to the
statement:

    x + a_x < 2a_x.

We must now decide how to replace the quantifiers. Clearly the rule
in this situation is to replace the existential quantifier first, using a
variable other than 'x', to get the statement:

    (∃y)(x + y < 2y).

This does not contradict any of our previous rules, since we have
introduced a new type of symbol, namely, 'a_x'. Now replace the
universal quantifier to get the correct statement:

    (x)(∃y)(x + y < 2y).
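
The role of the symbol 'a_x' can be made concrete by writing it as a function of x. In the sketch below, which is an illustration under stated assumptions, the choice a_x = x + 1 (the hypothetical function `witness`) and the integer sample standing in for the real numbers are not taken from the text. With a witness that changes as x changes, the statement x + a_x < 2a_x holds for every sample value, whereas no single fixed y in the sample works for every x.

```python
# Hypothetical integer sample used in place of the real numbers.
samples = range(-5, 6)


def witness(x: int) -> int:
    """One possible choice of a_x with x < a_x, namely a_x = x + 1."""
    return x + 1


# (x)(∃y)(x + y < 2y): the witness a_x depends on x.
dependent_ok = all(x + witness(x) < 2 * witness(x) for x in samples)

# (∃y)(x)(x + y < 2y): one fixed y for all x; fails already at x = y.
fixed_ok = any(all(x + y < 2 * y for x in samples) for y in samples)

print(dependent_ok, fixed_ok)
```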
In working with quantified statements the reader should be aware
of the fact that any letters or symbols may be used for the variables.
This is in accordance with accepted mathematical practice. For example,
the following three sets of symbols all have the same meaning:


$\int_a^b f(x)\,dx, \quad \int_a^b f(y)\,dy, \quad \int_a^b f(t)\,dt.$

The verbal description of the rules of the predicate calculus we have
given above can actually be given in a very succinct manner by means
of a set of axioms. That is, given any set of axioms for a mathematical
system, it is always possible to add another short set of axioms so that
together with the propositional calculus the latter set of axioms has the
effect of automatically applying the rules of the predicate calculus
which we have given. For such a set of axioms we refer the reader to
page 57 of the book by Mendelson cited at the end of this section.
One final rule. We shall always make the assumption that every
statement we make has one and only one of the values 't' or 'f'; that is,
every statement we make is either true or false and our system is
consistent. This allows us to use the rule or principle of proof by contradiction.
Suppose we want to find the value 't' or 'f' of a statement A. Suppose
also that we know a statement B has a 't' value and we are able to prove
that $A \Rightarrow \neg B$; that is, the last statement has a 't' value. If A had the 't'
value attached to it, then it would follow that $\neg B$ has the 't' value
attached to it, which would contradict our hypothesis that B can have
only one value attached to it. Hence A must have the 'f' value.
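
The rule just described amounts to saying that $[B \;\&\; (A \Rightarrow \neg B)] \Rightarrow \neg A$ is always true. A minimal sketch of the truth-table method in Python follows; the helper names `implies` and `is_tautology` are illustrative and not part of the text.

```python
from itertools import product

def implies(p, q):
    """Truth value of p => q."""
    return (not p) or q

def is_tautology(formula, n_vars):
    """Evaluate the formula on every row of its truth table."""
    return all(formula(*row) for row in product([True, False], repeat=n_vars))

# [B & (A => not B)] => not A
contradiction_rule = lambda a, b: implies(b and implies(a, not b), not a)

rule_holds = is_tautology(contradiction_rule, 2)
plain_implication_holds = is_tautology(lambda a, b: implies(a, b), 2)
```

The second formula is included only as a contrast: an implication A => B by itself is not always true, so the checker does distinguish the two cases.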
Our discussion of the elements of logic has of necessity been rather
circumscribed, so that at this point there is not a sufficiently adequate
basis for a completely formal development of a branch of mathematics.
On the other hand, a completely formal development usually turns out
to be very cumbersome and, unless one already understands the subject
that is being formalized, is probably undesirable. After all, the
process of understanding is an intuitive psychological phenomenon.
In understanding the proof of a theorem or in discovering a new
theorem, the mind seems to operate in a haphazard manner rather than
in the formal step-by-step manner that a logical proof requires. What
then is the role of logic in mathematics? There are several important
roles it can play: It can provide a mechanical check on our informal
reasoning processes; it can provide a system of automatic symbolic
operations to assist in complicated situations; it can provide a means
of increasing precision and generality; and, last but not least, it can
provide an accurate method of transmitting mathematical information.
When we begin the discussion of the natural numbers we shall
proceed in a somewhat more formal manner than we do later on in
the book. The reason for this is that we feel that most readers will be
very familiar with the basic facts of elementary arithmetic and hence
the formalism will not interfere with their understanding. As the material
becomes less familiar, our approach shall become more informal
so that understanding may not be jeopardized. However, we shall
never completely abandon a certain amount of formalism, since we feel
that in many instances it provides an accurate and efficient method for
handling complicated situations.

References
Kershner, R. B., and L. R. Wilcox, The Anatomy of Mathematics, The Ronald
Press Company, New York, 1950.
Mendelson, Elliott, Introduction to Mathematical Logic, D. Van Nostrand
Company, Inc., Princeton, N. J., 1964.
Suppes, Patrick, Introduction to Logic, D. Van Nostrand Company, Inc.,
Princeton, N. J., 1957.

Exercises

1. By use of a truth table verify the following:
(a) $[\neg(A \;\&\; B)] \Leftrightarrow [\neg A \lor \neg B].$
(b) $[\neg(A \;\&\; B)] \Leftrightarrow [A \Rightarrow \neg B].$
(c) $[\neg(A \Rightarrow B)] \Leftrightarrow [A \;\&\; \neg B].$

2. Use the truth-table method to show that the following statements
are always true:
(a) $[A \;\&\; (A \Rightarrow B)] \Rightarrow B.$
(b) $[(A \lor B) \;\&\; \neg A] \Rightarrow B.$
(c) $[A \;\&\; B] \Rightarrow B.$

3. Use the truth-table method to show that the following statements
are always true:
(a) $[A \Rightarrow B] \Leftrightarrow [\neg B \Rightarrow \neg A].$
(b) $[(A \Rightarrow B) \;\&\; \neg B] \Rightarrow \neg A.$
(c) $[(\neg A) \;\&\; (A \lor B)] \Rightarrow B.$

4. Translate the following into the English language assuming the
variables represent real numbers:
(a) $(x)(y)(z)([x > y \;\&\; y > z] \Rightarrow x > z).$
(b) $(x)(x > 0 \Rightarrow (y)(\exists z)(y < zx)).$
(c) $(\exists z)(x)(x + z = x).$
(d) $(\exists z)(x)(\exists y)(x + y = z).$
(e) $(x)(\exists y)(\exists z)(x + y = z).$
Explain why (d) and (e) are not equivalent statements.

5. Using the results of Exercise 1, negate all the statements in
Exercise 2.

6. Translate the following statements into the symbolism of the
propositional and predicate calculus:
(a) Every nonzero real number is either positive or negative.
(b) It is not the case that there is a real number greater than
every real number.
(c) There is a real number with the property that its multiplication
with any real number x gives x again.
(d) There is a negative real number whose square is 1.

7. Suppose P(x) is a statement involving the variable 'x' and it is
known that the following statement is true:
$(x)(P(x)).$
Use the rules of the propositional and predicate calculus to show that
the following statement is true:
$(\exists x)(P(x)).$

8. In the following list show that if the left statement is true, so is
the right one. The results of Exercise 1 may be helpful.
(a) $(\exists y)(x)(x < 0 \Rightarrow x < y)$        $(\exists y)(x)((x < y) \lor \neg(x < 0)).$
(b) $(x)(\exists y)(y > 0 \;\&\; xy = x)$        $\neg(\exists x)(y)(y > 0 \Rightarrow xy \ne x).$
(c) $(x)(\exists y)(x \ne 0 \Rightarrow xy = 1)$        $(x)(\exists y)(\neg[x \ne 0 \;\&\; xy \ne 1]).$

1.2 SETS


We shall adopt the naive, intuitive point of view that a set is a collection
of objects without questioning what these words mean. The term
'$b \in B$' is to be read: b is an element of the set B. Often a set will be
specified by some descriptive property. For example, suppose that
Q(x) is a sentence containing a variable 'x'. Then we can form the class
of all objects that, when their names are substituted for 'x', make the
sentence Q true. We denote this by means of the term '$\{x : Q(x)\}$',
and this is to be read: The collection of all x such that Q(x) is true. As
in the situation for quantifiers, x is understood to vary over a specified
set. The set consisting of the one element a will be designated by '$\{a\}$',
the set consisting of the two elements a and b will be designated by
'$\{a, b\}$', and so on.
Given two elements $a \in A$ and $b \in B$ we can form the ordered pair
(a, b). The reason we use the word 'ordered' is that in general (a, b)
and (b, a) are not considered the same object. Indeed, the ordered pairs
(a, b) and $(a_1, b_1)$ shall be identified if and only if $a = a_1$ and $b = b_1$.
Recall that the symbol '=' has the meaning that the symbols that stand
to the left and right of it are simply different names for the same object,
and we adopted the rule that different names for the same object may
be used interchangeably in any expression.
From two sets A and B we can form a new set, the Cartesian product
$A \times B$, which is defined by the equality

$A \times B = \{(x, y) : x \in A \;\&\; y \in B\}.$    (1.2.1)

We can also form the intersection of two sets defined by the equality

$A \cap B = \{x : x \in A \;\&\; x \in B\}.$    (1.2.2)

More generally, if $\mathcal{A}$ is a collection of sets we shall define

$\bigcap \{A : A \in \mathcal{A}\} = \{x : (A)(A \in \mathcal{A} \Rightarrow x \in A)\},$    (1.2.2')

that is, the collection of all elements each of which belongs to every set
in $\mathcal{A}$. The union of two sets is defined by the equality

$A \cup B = \{x : x \in A \lor x \in B\}.$    (1.2.3)

More generally, if $\mathcal{A}$ is a collection of sets we shall define

$\bigcup \{A : A \in \mathcal{A}\} = \{x : (\exists A)(A \in \mathcal{A} \;\&\; x \in A)\},$    (1.2.3')

that is, the collection of all elements that belong to at least one set in $\mathcal{A}$.
The complement of a set A is defined by the equality

$A^c = \{x : \neg(x \in A)\}.$    (1.2.4)

We usually use the term '$x \notin A$' for the term '$\neg(x \in A)$' and shall
write '$A \setminus B$' for '$A \cap B^c$'; that is,

$A \setminus B = A \cap B^c.$    (1.2.5)

This latter set can be described as the set of all elements in A that are
not in B; that is,

$A \setminus B = \{x : x \in A \;\&\; x \notin B\}.$    (1.2.5')
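
The definitions (1.2.1) through (1.2.5') can be tried out on small finite sets. The sketch below uses Python's built-in sets; the universe `U` and the particular sets are hypothetical choices made only for illustration, and the complement is taken relative to `U`.

```python
from itertools import product

U = set(range(10))                            # hypothetical universe of discourse
A = {1, 2, 3, 4}
B = {3, 4, 5}
family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]    # a collection of sets

cartesian = set(product(A, B))                        # (1.2.1)
intersection = {x for x in U if x in A and x in B}    # (1.2.2)
union = {x for x in U if x in A or x in B}            # (1.2.3)
complement_A = {x for x in U if x not in A}           # (1.2.4), relative to U
difference = A & (U - B)                              # (1.2.5): A intersected with B^c

# (1.2.2') and (1.2.3'): quantify over the sets in the collection.
big_intersection = {x for x in U if all(x in S for S in family)}
big_union = {x for x in U if any(x in S for S in family)}
```

Writing the sets as comprehensions mirrors the set-builder notation of the text; `difference` computed from (1.2.5) agrees with the description in (1.2.5').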

A very helpful intuitive way of thinking about the sets we have
been forming is to represent them diagrammatically. These diagrams
are usually referred to as Venn diagrams. For example, the set $A \cap B$
can be represented by the cross-sectioned area in Fig. 1.2.1. The set
$A \cup B$ can be represented by the cross-sectioned area in Fig. 1.2.2.

FIGURE 1.2.1

FIGURE 1.2.2

The rectangle in which the sets are enclosed represents the entire
universe of elements of the discourse.
We shall define the inclusion symbol '$\subset$' by means of the equivalence

$A \subset B \Leftrightarrow (x)(x \in A \Rightarrow x \in B).$    (1.2.6)

The term '$A \subset B$' is to be read: A is contained in B, or A is a subset
of B. If $A \subset B$ and $A \ne B$, A is called a proper subset of B. Two sets are
to be identified if and only if they are contained in each other, that is,

$A = B \Leftrightarrow (A \subset B \;\&\; B \subset A).$    (1.2.7)

Since we have taken equality as a primitive logical notion, the
equivalence (1.2.7) is to be viewed as an axiom rather than as a definition of
equality between sets. For, from the rule we have adopted for the
symbol '=', it is a simple matter to prove that

$A = B \Rightarrow (A \subset B \;\&\; B \subset A).$

However, the converse implication cannot be proved and in axiomatic
set theory it is usually adopted as an axiom, provided equality is taken
as a logical notion. Actually in making the definition (1.2.6) and in
taking as an axiom (1.2.7) we should have used the universal quantifiers
'(A)' and '(B)', so that these statements would refer to all pairs of sets
rather than to two particular sets.


In our previous discussion we have introduced a new symbolism,
'$\{x : Q(x)\}$', which has not been defined in terms of the symbolism of
the predicate calculus. Hence we must either define this symbol in
terms of the rules of the predicate calculus or else give new rules for
operating with statements that contain these symbols. The first method
is clearly the preferable one. Hence we take the symbol '$\{x : Q(x)\}$' to
be a name for that set for which the following statement is true:

$(y)(y \in \{x : Q(x)\} \Leftrightarrow Q(y)).$

A moment's reflection is enough to convince us that the intuitive
meaning of this statement is the same as the intuitive meaning we
previously gave to the term '$\{x : Q(x)\}$'.
As an example, $A \cap B$ is to be defined as that set for which the
following statement is true:

$(x)(x \in A \cap B \Leftrightarrow [x \in A \;\&\; x \in B]).$    (1.2.8)

Let us prove that the following statement is true:

$A \cap B \subset A.$    (1.2.9)

First, removing the universal quantifier from (1.2.8) we get the
statement:

$x \in A \cap B \Leftrightarrow [x \in A \;\&\; x \in B],$    (1.2.9')

which for the purpose of applying the rules of the propositional calculus
is assumed to have a 't' value. The truth-table method of the propositional
calculus tells us that the following statement has a 't' value:

$\{x \in A \cap B \Leftrightarrow [x \in A \;\&\; x \in B]\} \Rightarrow \{x \in A \cap B \Rightarrow [x \in A \;\&\; x \in B]\}.$    (1.2.10)

From (1.2.9'), (1.2.10), and the rule of inference we find that the following
statement is true:

$x \in A \cap B \Rightarrow [x \in A \;\&\; x \in B].$    (1.2.11)

The rules of the propositional calculus tell us that the following is
true [Exercise 2(c) of Section 1.1]:

$[x \in A \;\&\; x \in B] \Rightarrow x \in A.$    (1.2.12)

Designating the statement (1.2.11) by 'R(x)' and the statement (1.2.12)
by 'S(x)', we get the following true statement:

$[R(x) \;\&\; S(x)] \Rightarrow [x \in A \cap B \Rightarrow x \in A].$    (1.2.13)

Using the rule of inference the following statement has a 't' value:

$x \in A \cap B \Rightarrow x \in A.$    (1.2.14)

Adding a universal quantifier we get the following true statement:

$(x)(x \in A \cap B \Rightarrow x \in A).$    (1.2.15)

Using the statement (1.2.6) and the rules of the propositional calculus,
we arrive at the true statement

$(x)(x \in A \cap B \Rightarrow x \in A) \Rightarrow A \cap B \subset A.$    (1.2.16)

Using the fact that statements (1.2.15) and (1.2.16) are true, by the rule
of inference we finally arrive at the true statement:

$A \cap B \subset A.$
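
As a sanity check on the statement just proved (and not as a substitute for the proof), one can test the inclusion for every pair of subsets of a small universe, using definition (1.2.6) directly. The universe `{0, 1, 2, 3}` and the helper names are assumptions made for this sketch.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a finite universe."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_subset(a, b, universe):
    """(1.2.6): A ⊂ B ⇔ (x)(x ∈ A ⇒ x ∈ B), with x ranging over the universe."""
    return all((x not in a) or (x in b) for x in universe)

universe = {0, 1, 2, 3}
all_sets = subsets(universe)
inclusion_holds = all(is_subset(a & b, a, universe) for a in all_sets for b in all_sets)
```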
We have presented above a formal proof of the last statement, being
careful to point out at each stage exactly what was being used. Of course,
we could have developed a scheme so that the proof would have been
more mechanical and the amount of space needed to write it down
would have been much less. Nevertheless, we think the reader now
sees how cumbersome a formal proof can be, even of the simplest
statements. For this pragmatic reason most of the discourse of mathematics
is carried on in an informal way.
In an informal proof we do not write down all the steps but only
those considered to be essential. This is analogous to the situation
when in making an arithmetic or algebraic computation we usually
do not take cognizance of the fact that we are using, for example, the
commutative or associative laws, but suppose these are standard facts
which the reader recognizes. For example, the chain of argument
leading from (1.2.9') to (1.2.11) or the chain of argument leading from
(1.2.11) to (1.2.14) is usually considered a standard argument and
would not be mentioned in an informal proof. Of course, just how much
is written down is at the discretion of the writer. Usually enough should
be written down so that it would be clear how to make the formal proof
if any question should arise about the validity of the informal proof.
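As an example of an informal proof let us show the following:

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C).$

We have

$x \in A \cap (B \cup C) \Leftrightarrow [x \in A \;\&\; x \in B \cup C].$

Also,

$x \in B \cup C \Leftrightarrow [x \in B \lor x \in C].$

Now, it is easily checked by a truth table that

$[(x \in A) \;\&\; (x \in B \lor x \in C)] \Leftrightarrow [(x \in A \;\&\; x \in B) \lor (x \in A \;\&\; x \in C)].$

The disjunction on the right is equivalent with

$x \in (A \cap B) \cup (A \cap C).$

Consequently, we have shown that

$x \in A \cap (B \cup C) \Leftrightarrow x \in (A \cap B) \cup (A \cap C),$

which gives the equality we are seeking.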
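
The truth-table step in this informal proof can be carried out mechanically. The sketch below checks the propositional equivalence on all eight rows and then compares the two sides of the set equality on one hypothetical choice of A, B, and C.

```python
from itertools import product

# Truth table for [a & (b or c)] <=> [(a & b) or (a & c)].
truth_table_ok = all(
    (a and (b or c)) == ((a and b) or (a and c))
    for a, b, c in product([True, False], repeat=3)
)

# One concrete instance of the resulting set equality (sets chosen arbitrarily).
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {1, 5, 6}
lhs = A & (B | C)
rhs = (A & B) | (A & C)
```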


Of course, proving such an equality or discovering it may be two dif
ferent matters. Often the way to discover such an equality is by looking
at the Venn diagram. In this case the set in question is shown by the
cross-sectioned area in Fig. l.2.3.

FIGURE 1.2.3

The reader may now object that we have not defined the symbol '$\in$'
in terms of the symbols of the predicate calculus. This is true, and in a
formal development of mathematics it is necessary to give the rules or
axioms that prescribe the use of this symbol. The situation is analogous
to that of Euclidean geometry, where points and lines are taken as
undefined objects and a set of axioms is given that describes the
relationships between points and lines. In axiomatic set theory, sets and the
symbol '$\in$' are taken as undefined things and a set of axioms is given that
will allow us to develop the kind of theory of sets which seems
intuitively reasonable to us. These axioms deal mainly with prescribing the
conditions under which new sets can be formed from given sets. For
example, in axiomatic set theory the facts that $A \cap B$ and $A \cup B$ can
be taken to be sets are usually given by axioms. In connection with the
set $A \times B$, the notion of ordered pair can be defined by use of the
axioms.
To try to give a reasonable axiomatic approach to set theory would
be too difficult at this stage and would delay our study of the calculus
for a long time. Hence, as we mentioned at the beginning of the
discussion on sets, we shall suppose that everyone understands what a
set is and we shall allow operations on sets and the construction of sets
that seem intuitively reasonable. Such a procedure can, on occasion, lead
to serious philosophical difficulties, but we shall pretend that they
don't exist.
Finally, let us remark that it is convenient to consider the set that has
no elements. It is defined by the equality

$\emptyset = \{x : x \ne x\}.$

The set $\emptyset$ is called the null set or the empty set or the void set.

Exercises

1. Draw Venn diagrams for the sets $A \setminus B$, $A^c$, and $A \cap (B \cup C)$.
Give a schematic diagram (not a Venn diagram) for a Cartesian product
set.

2. Give formal proofs of the following statements:
(a) $A \cup (B \cup C) = (A \cup B) \cup C.$
(b) $A \cap (B \cap C) = (A \cap B) \cap C.$
(c) $(A^c)^c = A.$
(d) $A \cap A^c = \emptyset.$

3. Prove the following:
(a) $(A \cup B) \cap C = (A \cap C) \cup (B \cap C).$
(b) $(A \cap B) \cup C = (A \cup C) \cap (B \cup C).$
(c) $A \cap (A \cup B) = A.$
(d) $A \cup (A \cap B) = A.$

4. Prove the following:
(a) $(A \cap B)^c = A^c \cup B^c.$
(b) $(A \cup B)^c = A^c \cap B^c.$

5. Using the results of Exercises 2, 3, and 4, find the complements
of the following sets:
(a) $A \cup B \cup C^c.$
(b) $A \cap (B \cup (C \cup D)^c).$
(c) $(A \cup B^c) \cap (A \cup (B \cap C^c)).$
(d) $\emptyset.$

6. Prove the following by using the results of Exercise 3:
If $A \cap B = A \cap C$ and $A \cup B = A \cup C$, then $B = C$.

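7. Show the following:
(a) AB;,,,_A\(A\B).
(b) $A \cap B = A \setminus (A \setminus B).$

8. If $\mathcal{A}$ is any collection of sets and B is any set, show the following:
(a) $B \cap \bigcup \{A : A \in \mathcal{A}\} = \bigcup \{A \cap B : A \in \mathcal{A}\}.$
(b) $B \cup \bigcap \{A : A \in \mathcal{A}\} = \bigcap \{A \cup B : A \in \mathcal{A}\}.$

9. If $\mathcal{A}$ is any collection of sets and B is any set, show the following:
(a) $B \cup \bigcup \{A : A \in \mathcal{A}\} = \bigcup \{A \cup B : A \in \mathcal{A}\}.$
(b) $B \cap \bigcap \{A : A \in \mathcal{A}\} = \bigcap \{A \cap B : A \in \mathcal{A}\}.$

10. If $\mathcal{A}$ is any collection of sets, show the following:
(a) $(\bigcup \{A : A \in \mathcal{A}\})^c = \bigcap \{A^c : A \in \mathcal{A}\}.$
(b) $(\bigcap \{A : A \in \mathcal{A}\})^c = \bigcup \{A^c : A \in \mathcal{A}\}.$

11. If $\mathcal{A}$ is any collection of sets and B is any set, use the results of
the previous three exercises to show the following:
(a) $B \setminus \bigcup \{A : A \in \mathcal{A}\} = \bigcap \{B \setminus A : A \in \mathcal{A}\}.$
(b) $B \setminus \bigcap \{A : A \in \mathcal{A}\} = \bigcup \{B \setminus A : A \in \mathcal{A}\}.$

1.3 RELATIONS AND FUNCTIONS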

The concept of the Cartesian product of two sets leads to the concept
of a relation. We shall first give a formal definition and then comment
on the meaning.

1.3.1 Definition. A relation is a subset of a Cartesian product set.
If R is a relation, the set $\mathcal{D}(R) = \{x : (\exists y)((x, y) \in R)\}$ is called the
domain of R and the set $\mathcal{R}(R) = \{y : (\exists x)((x, y) \in R)\}$ is called the range
of R.
The relation defined by $R^{-1} = \{(y, x) : (x, y) \in R\}$ is called the inverse
of the relation R. If A is any set, then the set $R^{-1}(A) = \{x : (\exists y)(y \in A \;\&\; (x, y) \in R)\}$
is called the inverse image of A under R.
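
Definition 1.3.1 translates almost word for word into code when a relation is stored as a set of ordered pairs. The sketch below is one such translation; the relation `R` and the function names are hypothetical.

```python
def domain(rel):
    """D(R) = {x : (∃y)((x, y) ∈ R)}."""
    return {x for (x, y) in rel}

def range_of(rel):
    """R(R) = {y : (∃x)((x, y) ∈ R)}."""
    return {y for (x, y) in rel}

def inverse(rel):
    """R^{-1} = {(y, x) : (x, y) ∈ R}."""
    return {(y, x) for (x, y) in rel}

def inverse_image(rel, A):
    """R^{-1}(A) = {x : (∃y)(y ∈ A & (x, y) ∈ R)}."""
    return {x for (x, y) in rel if y in A}

R = {(1, 'a'), (1, 'b'), (2, 'b'), (3, 'c')}   # a small example relation
```

For instance, `inverse_image(R, {'b'})` collects the first members 1 and 2, since both are paired with 'b'.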


An example of a relation is the following. Let A be the set consisting
of all men in the United States and B the set of all people in the United
States. Let R be the set of all $(x, y) \in A \times B$ so that $x \in A$ and y is a
relative of x. Since R is a subset of a Cartesian product it is a relation. Note
that we also have $R \subset B \times B$. The domain of R is the set of elements
which are first members of the ordered pairs that are in R. In this case
this is A.† Suppose C is the subset consisting of those people in B who
have at least one living male relative. It is probably true that $C = B$.
At any rate, C is the set of elements which are the second members of
the ordered pairs that are in R and hence is the range of R. Note that
it is not true that $R = A \times C$, although certainly $R \subset A \times C$.
The situation where to each element in the domain of a relation
there corresponds only one element in the range so that the resulting
ordered pair is in the relation is of special significance . Such relations
are called functions and we now give the formal definition.

1.3.2 Definition. A function F is a relation with the additional property
that

$(x)(y)(z)([(x, y) \in F \;\&\; (x, z) \in F] \Rightarrow y = z).$

For example, if A and B are the sets given above, then the set of all
$(x, y) \in A \times B$ with x a husband and y his legal wife is a function. A
more pertinent example of the distinction between a relation and a
function is perhaps the following:

$\{(x, y) : x^2 + y^2 = 1\}$ is a relation.
$\{(x, y) : x^2 + y^2 = 1 \text{ and } y \ge 0\}$ is a function.

Some people prefer the words multivalued function in place of the word
'relation.'
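
The circle example can be tested on a finite sample. The sketch below keeps only those points of the circle whose coordinates lie in a short, hypothetical list of rationals, and applies the condition of Definition 1.3.2 to the sample and to its upper half.

```python
from fractions import Fraction as Fr

def is_function(rel):
    """Definition 1.3.2: no first member is paired with two different second members."""
    return all(y == z for (x, y) in rel for (w, z) in rel if x == w)

# Hypothetical sample of coordinates; exact rationals avoid rounding issues.
coords = [Fr(-1), Fr(-4, 5), Fr(-3, 5), Fr(0), Fr(3, 5), Fr(4, 5), Fr(1)]

circle = {(x, y) for x in coords for y in coords if x * x + y * y == 1}
upper_half = {(x, y) for (x, y) in circle if y >= 0}

circle_is_function = is_function(circle)
upper_half_is_function = is_function(upper_half)
```

The sampled circle fails the test because, for example, both (0, 1) and (0, -1) belong to it, while the sampled upper half passes.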
If F is a function and $(x, y) \in F$, then the usual convention is to
denote the second member y by F(x). We shall follow this convenient
notation. The element F(x) is called the value of F at x, and we also
often speak of it as the map of x under F. In case F maps distinct elements
of its domain into distinct elements of its range the function is said
to be one to one. The formal definition is the following.

1.3.3 Definition. A function F is said to be one to one $\Leftrightarrow$

$(x)(y)(F(x) = F(y) \Leftrightarrow x = y).$

In the statement above the variables are, of course, understood to
represent elements of $\mathcal{D}(F)$. In case F is a one-to-one function, it is
clear that $F^{-1}$ is also a function. However, we shall state this as a formal
proposition and leave the proof as an exercise.

1.3.4 Proposition. If F is a one-to-one function, then $F^{-1}$ is also a
one-to-one function.
Given two or more functions there may be ways of combining these
functions to get a new function. We shall give one way here, the
composition of two functions. We shall give other ways later.

†We are supposing that every man has a relative.

1.3.5 Definition. If F and G are functions, then $F \circ G$ is that function
having domain $\{x : G(x) \in \mathcal{D}(F)\}$ and $\forall x \in \mathcal{D}(F \circ G)$,

$F \circ G(x) = F(G(x)).$

In very formal terms we can write

$F \circ G = \{(x, y) : (\exists z)((x, z) \in G \;\&\; (z, y) \in F)\}.$
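
The "very formal" description of the composition can be used as it stands to compute with functions stored as sets of ordered pairs. In the sketch below, `compose`, `F`, and `G` are hypothetical names and data chosen to show how the domain of the composition is cut down.

```python
def compose(F, G):
    """F∘G = {(x, y) : (∃z)((x, z) ∈ G & (z, y) ∈ F)}."""
    return {(x, y) for (x, z) in G for (w, y) in F if z == w}

G = {(1, 10), (2, 20), (3, 30)}
F = {(10, 'p'), (20, 'q')}          # 30 is not in the domain of F

FG = compose(F, G)
FG_domain = {x for (x, y) in FG}    # {x : G(x) ∈ D(F)}
```

Here 3 drops out of the domain of the composition because G(3) = 30 does not belong to the domain of F.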


By an abuse of language we shall often designate the range of a
function f by the symbol

$\{f(x) : x \in \mathcal{D}(f)\}.$

If A is a set, we can define a new function g as that subset of f consisting
of those ordered pairs (possibly void) whose first members belong to
A. We shall often write

$g = f|A.$

If $A \subset \mathcal{D}(f)$, when we write

$f(A) = \{f(x) : x \in A\},$

we are referring to the range of g.

A function is an important special type of relation. There is another
special type of relation, an equivalence relation, which plays an extremely
important role in all branches of mathematics.

1.3.6 Definition. A relation R is said to be an equivalence relation if
and only if the following are satisfied:
(a) $(x)(y)((x, y) \in R \Rightarrow (y, x) \in R).$    (symmetric)
(b) $(x)(y)(z)([(x, y) \in R \;\&\; (y, z) \in R] \Rightarrow (x, z) \in R).$    (transitive)

Many authors prefer to talk about an equivalence relation on a set X and
add the condition
(c) $(x)(x \in X \Rightarrow (x, x) \in R).$    (reflexive)

The condition (c) simply assures us that $X \subset \mathcal{D}(R)$. In fact, from
(a) and (b) it is easy to prove the following:

$(x)(x \in \mathcal{D}(R) \Rightarrow (x, x) \in R).$

Indeed, (a) tells us that if $(x, y) \in R$, then $(y, x) \in R$. Hence from
(b) we get that $(x, y) \in R \;\&\; (y, x) \in R \Rightarrow (x, x) \in R$.

We shall usually denote an equivalence relation by the symbol '$\equiv$' and
instead of '$(x, y) \in\ \equiv$' we shall write '$x \equiv y$'. It is not hard to check that
it is possible to use the symbol $\Leftrightarrow$ to define an equivalence relation in
the Cartesian product $X \times X$, where X is the set of all meaningful
statements. We shall soon meet other familiar equivalence relations.
1.3.7 Theorem. Let X be a set and $\equiv$ an equivalence relation having
domain and range the set X. There is a collection $\mathcal{E}$ of subsets of X so that

$X = \bigcup \{E : E \in \mathcal{E}\},$

where $\forall E, F \in \mathcal{E}$, $E \ne F \Rightarrow E \cap F = \emptyset$ and $x, y \in E \Rightarrow x \equiv y$; $x \equiv y \Rightarrow$
$\exists E \in \mathcal{E}$ so that $x, y \in E$. (The sets $E \in \mathcal{E}$ are called equivalence
classes.)
Proof. For every $x \in X$ let $E(x) = \{y : y \equiv x\}$. Since, as we have
shown, $x \in \mathcal{D}(\equiv) \Rightarrow x \equiv x$, it follows that $x \in E(x)$. For any sets
E(x) and E(y) suppose $E(x) \cap E(y) \ne \emptyset$ and let $z \in E(x) \cap E(y)$.
We have $z \equiv x$ and for $w \in E(x)$ we have $w \equiv x$. From the symmetry
condition (a) we get $x \equiv z$, and thus from the transitivity condition (b)
we get $w \equiv x \;\&\; x \equiv z \Rightarrow w \equiv z$. On the other hand, $z \equiv y$, and hence
from (b) $w \equiv z \;\&\; z \equiv y \Rightarrow w \equiv y$. Hence we have shown that $w \in E(x)$
$\Rightarrow w \in E(y)$, which means $E(x) \subset E(y)$. By making the same kind
of argument for the set E(y) we arrive at the conclusion that $E(y) \subset E(x)$.
This shows $E(x) = E(y)$. If we now take

$\mathcal{E} = \{E(x) : x \in X\},$

we see that the theorem is proved.
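
The construction in the proof, forming E(x) for each x and collecting the distinct sets, can be carried out directly on a finite set. The sketch below uses congruence modulo 3 on a range of integers as a hypothetical example of an equivalence relation and then checks the two properties stated in the theorem.

```python
def equivalence_classes(X, related):
    """Form E(x) = {y : y ≡ x} for every x in X and keep the distinct classes."""
    return {frozenset(y for y in X if related(y, x)) for x in X}

X = range(-9, 10)
same_residue = lambda n, m: (n - m) % 3 == 0     # hypothetical example relation

classes = equivalence_classes(X, same_residue)

covers_X = set().union(*classes) == set(X)
pairwise_disjoint = all(E == F or not (E & F) for E in classes for F in classes)
```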
Exercises

1. Let f be a function and $A, B \subset \mathcal{D}(f)$. Show the following:
(a) $A \subset B \Rightarrow f(A) \subset f(B).$
(b) $f(A \cup B) = f(A) \cup f(B).$
(c) $f(A \setminus B) \subset f(A).$
(d) $f(A \cap B) \subset f(A) \cap f(B).$

2. Prove Proposition 1.3.4: The inverse of a one-to-one function is
a function, which is also one to one.

3. If f is a one-to-one function, show that $\forall x \in \mathcal{D}(f)$, $f^{-1} \circ f(x) = x$.

4. Let f be a function and A and B subsets of $\mathcal{R}(f)$. Prove the
following:
(a) $A \subset B \Rightarrow f^{-1}(A) \subset f^{-1}(B).$
(b) $f^{-1}(A \cup B) = f^{-1}(A) \cup f^{-1}(B).$
(c) $f^{-1}(A \setminus B) = f^{-1}(A) \setminus f^{-1}(B).$
(d) $f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B).$

5. Give an example which shows that we may not have equality in
Exercise 1(d). However, show that if f is a one-to-one function we get
equality.

6. Suppose f and g are functions such that $\mathcal{R}(g) \subset \mathcal{D}(f)$ and
$\forall x \in \mathcal{D}(g)$, $f \circ g(x) = x$. Show that g is one to one. If, in addition,
$\mathcal{R}(f) \subset \mathcal{D}(g)$ and $\forall y \in \mathcal{D}(f)$, $g \circ f(y) = y$, show that $f = g^{-1}$.

7. Define a relation $\equiv$ on the set Z of integers by writing $n \equiv m \Leftrightarrow n - m$
is divisible by 5. Show that this is an equivalence relation. How many
equivalence classes are there?

8. For ordered pairs (x, y) and (u, v) of real numbers write $(x, y) \equiv (u, v)$
$\Leftrightarrow$ there exists a real number $t > 0$ so that $(x, y) = (tu, tv)$. Show
that this is an equivalence relation and give a geometric description
of the equivalence classes.

9. Suppose R is a relation with the following properties:
($\alpha$) $y \in \mathcal{R}(R) \Rightarrow (y, y) \in R.$
($\beta$) $(x, y), (z, y) \in R \Rightarrow (z, x) \in R.$
Prove that $(x, y) \in R \Rightarrow (y, x) \in R$.

1.4

THE NATURAL NUMBERS

In this section we shall give a set of axioms for the natural numbers
and derive some of their more important properties. The proofs we
give will be informal, as explained in Section 1.2, and the set theory
we shall use will be intuitive. One may, quite legitimately, ask why
we bother to be so formal about the development of the real number
system when we are being so informal about logic and set theory. One
answer is that the first serious questions about the nature of mathematics
arose in connection with the real numbers, first among the ancient
Greeks and later again among the nineteenth-century mathematicians.
Hence an enormous amount of intellect and energy has been expended
in trying to clarify the nature of these objects. Many people seem to
feel that between certain limits these efforts have been successful and
that a usable system can be obtained from a few psychologically
satisfying and clearly stated principles or axioms. Of course, there may
be sharp disagreement on just where to start and how far one can go
without getting involved in contradictions. We shall start with a set
of axioms that are not as minimal and/or perhaps not as intuitively
satisfactory as others. However, we feel they are reasonably satisfactory
and have the advantage that the development of the real numbers
can proceed quite rapidly from them.
The name 'the natural numbers' is given to any set N together with two
functions $+$ and $\cdot$, each with domain $N \times N$ and range in N, satisfying the
following axioms:

(a) $(x)(y)(x + y = y + x).$
(a') $(x)(y)(x \cdot y = y \cdot x).$    (commutative laws)

(b) $(x)(y)(z)(x + (y + z) = (x + y) + z).$
(b') $(x)(y)(z)(x \cdot (y \cdot z) = (x \cdot y) \cdot z).$    (associative laws)

(c) $(x)(y)(z)(x \cdot (y + z) = x \cdot y + x \cdot z).$    (distributive law)

(d) $1 \in N \;\&\; (x)(x \cdot 1 = x).$

(e) For every x, y in N, one and only one of the following is possible:
(1) $x = y.$
(2) $(\exists z)(x = y + z).$
(3) $(\exists z)(y = x + z).$    (trichotomy law)

(f) For every $M \subset N$, the following is true:
$[1 \in M \;\&\; (x)(x \in M \Rightarrow x + 1 \in M)] \Rightarrow M = N.$    (induction)

Using these axioms it is immediately possible to prove a number of
results about the natural numbers N. [We are using 'N' to designate the
natural numbers, although strictly speaking we should use the triple
'$(N, +, \cdot)$'.] However, let us first make some comments about the
previous axioms. The axioms (a) through (c) are of course the familiar
ones from arithmetic. The first part of the conjunction of axiom (d)
says that $N \ne \emptyset$ and names a particular element. The second part of
the axiom states a property for this element. Axiom (e) has been stated
rather informally for the sake of clarity. It simply says that we can have
one and only one possibility; either x and y are the same, x is greater
than y, or x is less than y. More formally, we could have given this axiom
in terms of two axioms:

(e') $(x)(y)(x \ne y \Rightarrow (\exists z)(y = x + z \lor x = y + z)).$
(e'') $(x)(y)((\exists z)(y = x + z) \Rightarrow \neg(\exists z)(x = y + z)).$

The last axiom (f) is often stated in the following way: If P(x) is a
statement depending on x, then

$[P(1) \;\&\; (x)(P(x) \Rightarrow P(x + 1))] \Rightarrow (x)(P(x)).$    (f')

This can be translated to our statement (f) by the following device. Set

$M = \{x : P(x)\};$

then if (f') is true, (f) is true for M, and vice versa.
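
To see what axiom (e) asserts in a familiar model, one can check it on an initial segment of the positive integers. This is only an illustration in one model, under the assumption that `range(1, 31)` stands in for N; it proves nothing about the axioms themselves.

```python
N = range(1, 31)     # a finite initial segment standing in for the natural numbers

def exists_z(x, y):
    """(∃z)(x = y + z), with z ranging over the natural numbers (so z >= 1)."""
    return any(x == y + z for z in range(1, x + 1))

def cases(x, y):
    """Truth values of the three alternatives (1), (2), (3) of axiom (e)."""
    return [x == y, exists_z(x, y), exists_z(y, x)]

# Exactly one alternative should hold for every pair.
trichotomy_ok = all(sum(cases(x, y)) == 1 for x in N for y in N)
```

For example, `cases(3, 5)` singles out alternative (3), since 5 = 3 + 2.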

Let us now give some examples that show how these axioms may be
used to obtain other true statements about N. Our first statement says
that N has no zero element; actually it says more.

1.4.1 Proposition. There is no x and no y in N, so that $x + y = x$;
that is,

$\neg(\exists x)(\exists y)(x + y = x).$

Proof. We shall prove this by contradiction. Suppose $(\exists x)(\exists y)(x + y = x)$.
This implies

$(\exists x)(\exists y)((x = x) \;\&\; (x + y = x)),$

which contradicts the trichotomy axiom (e).
Our next statement is to the effect that we have a cancellation law
in N with respect to multiplication.

1.4.2 Proposition. If $x \cdot z = y \cdot z$, then $x = y$, and vice versa, that is,

$(x)(y)(z)(x \cdot z = y \cdot z \Leftrightarrow x = y).$

Proof. The fact that $x = y \Rightarrow x \cdot z = y \cdot z$ follows from our rules
that x and y are different names for the same thing and hence we may
use them interchangeably in any expression. Hence we must prove the
implication $(x)(y)(z)(x \cdot z = y \cdot z \Rightarrow x = y)$. Suppose this is not true.
Then using our rule for negating statements we have

$(\exists x)(\exists y)(\exists z)(x \cdot z = y \cdot z \;\&\; x \ne y).$    (1.4.1)

By the trichotomy axiom (e) [or (e')]

$x \ne y \Rightarrow (\exists w)(x = y + w \lor y = x + w),$

and by the distributive law the latter statement implies

$x \ne y \Rightarrow (\exists w)(x \cdot z = y \cdot z + w \cdot z \lor y \cdot z = x \cdot z + w \cdot z).$    (1.4.2)

Hence from (1.4.1) and (1.4.2) we get

$[(\exists x)(\exists y)(\exists z)(x \cdot z = y \cdot z \;\&\; x \ne y)]$
$\Rightarrow [(\exists x)(\exists y)(\exists z)(x \cdot z = y \cdot z \;\&\; (\exists w)(x \cdot z = y \cdot z + w \cdot z \lor y \cdot z = x \cdot z + w \cdot z))],$

which contradicts the trichotomy axiom.

1.4.3 Definition

$(x)(y)(x < y \Leftrightarrow (\exists z)(y = x + z)),$
$(x)(y)(x \le y \Leftrightarrow x < y \lor x = y).$

1.4.4 Proposition. If $x \le y$ and $z \le w$, then $x + z \le y + w$; that is,

$(x)(y)(z)(w)([x \le y \;\&\; z \le w] \Rightarrow x + z \le y + w).$

Proof. Exercise.

1.4.5 Proposition

$(x)(1 \le x).$

Proof. Let us set

$M = \{x : 1 \le x\}.$

Clearly $1 \in M$ and $(x)(x + 1 \in M)$. The latter statement follows from
Definition 1.4.3 by putting $y = x + 1$ and $z = 1$. Hence $(x)(x \in M$
$\Rightarrow x + 1 \in M)$, and by the principle of induction it follows that $M = N$.
The next statement is to the effect that there is no natural number
between two successive natural numbers.

1.4.6 Proposition

$\neg(\exists x)(\exists y)(x < y < x + 1).$

Proof. By the definition of the relation < it follows that

$n < m < n + 1 \Rightarrow (\exists x)(m = n + x < n + 1).$    (1.4.3)

Using Proposition 1.4.4 we have

$\forall x \in N, \quad 1 \le x \Rightarrow n + 1 \le n + x.$

Combining this with Proposition 1.4.5 we see that

$\forall x \in N, \quad n + 1 \le n + x.$    (1.4.4)

If we assume that Proposition 1.4.6 is not true, then from (1.4.3)
there are natural numbers m, n, and k so that

$m = n + k < n + 1.$    (1.4.5)

Replace x by k in (1.4.4) and we see that (1.4.4) and (1.4.5) contradict
the trichotomy axiom.

1.4.7 Theorem (Well Ordering of N). Every nonvoid subset of N
has a unique smallest element; that is, for every nonvoid $S \subset N$ there is a unique
$m \in S$ so that $\forall n \in S$, $m \le n$.

Proof. Let S be a given nonempty set in N and set

$R = \{x : (y)(y \in S \Rightarrow x < y)\}.$

We have two cases to consider.
(a) $1 \notin R$. In this case $1 \in S$. For in case $1 \notin S$, by trichotomy and
Proposition 1.4.5 we get $(x)(x \in S \Rightarrow 1 < x)$, and hence $1 \in R$. This
is a contradiction. Since $1 \in S$, the result follows by Proposition 1.4.5.
(b) $1 \in R$. In this case we claim $(\exists x)(x \in R \;\&\; (x + 1) \notin R)$. For
if we suppose the contrary is true, we have (see Exercise 1 of Section 1.1)

$\neg(\exists x)(x \in R \;\&\; (x + 1) \notin R) \Rightarrow (x)(x \in R \Rightarrow (x + 1) \in R).$

But since $1 \in R$, the principle of induction tells us that $R = N$. By the
trichotomy axiom we must have $R \cap S = \emptyset$. Hence we get

$\emptyset = R \cap S = N \cap S = S,$

which contradicts the hypothesis that $S \ne \emptyset$.
This being the case we must have that $\exists k \in R$, so that $(k + 1) \notin R$
and $k + 1 \in S$. For, if $k + 1 \notin S$, then by the trichotomy axiom we
get

$(x)(x \in S \Rightarrow x < k + 1 \lor k + 1 < x).$

But for $x \in S$ we cannot have $x < k + 1$, since otherwise $k \in R \Rightarrow k < x < k + 1$,
contradicting Proposition 1.4.6. Therefore, $(x)(x \in S \Rightarrow k + 1 < x)$,
which contradicts the fact that $k + 1 \notin R$.
We now claim that $(x)(x \in S \Rightarrow k + 1 \le x)$. Indeed, in the contrary
case, $(\exists x)(x \in S \;\&\; x < k + 1)$, which we have seen contradicts
Proposition 1.4.6. This shows that $k + 1$ is the smallest element of S. The
uniqueness is an immediate consequence of trichotomy.
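
The two cases of the proof can be followed step by step for a finite nonempty set of positive integers. In the sketch below the set R is built from its definition (searching only up to the largest element of S, which suffices because every member of R lies below every member of S); the function name is hypothetical.

```python
def smallest_via_R(S):
    """Follow the proof of Theorem 1.4.7 for a finite nonempty set S of natural numbers."""
    bound = max(S)
    # R = {x : (y)(y ∈ S ⇒ x < y)}, searched over 1, ..., bound.
    R = {x for x in range(1, bound + 1) if all(x < y for y in S)}
    if 1 not in R:
        return 1                      # case (a): 1 must belong to S
    # case (b): find k in R with k + 1 not in R; then k + 1 is the smallest element.
    k = next(x for x in sorted(R) if x + 1 not in R)
    return k + 1

examples = [{7, 12, 9}, {1, 4}, {2}, {30, 15, 22, 16}]
results = [smallest_via_R(S) for S in examples]
```

In each example the value returned agrees with the ordinary minimum of the set.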

1.4.8 Theorem (Archimedian Ordering of N)

$(x)(y)(\exists z)(x \le z \cdot y).$

Proof. By Proposition 1.4.5 we know that

$(y)(y \ne 1 \Rightarrow (\exists z)(y = 1 + z)).$

Therefore, using the distributive axiom we have

$(x)(y)(y \ne 1 \Rightarrow (\exists z)(x \cdot y = x + x \cdot z)),$

which means $(x)(y)(y \ne 1 \Rightarrow x < x \cdot y)$. Since $(x)(x \le x \cdot 1)$, it follows
that $(x)(y)(x \le x \cdot y)$. This says even more than we set out to prove.

NOTE: Let us agree from this point on that for $n, m \in N$ we shall
write 'nm' instead of '$n \cdot m$' unless we have a special reason for
emphasizing the multiplication function. This is in accordance with the usual
procedure.

Exercises

1. Prove the following:

$$(x)(y)(z)(x + z = y + z \Rightarrow x = y).$$

2. Prove the following:

$$(x)(y)(z)(w)(x \le y \;\&\; z \le w \Rightarrow xz \le yw).$$

Use this in conjunction with Proposition 1.4.5 to get the Archimedian ordering of $N$.

3. If $mn < mp$ or $m + n < m + p$, show that $n < p$.

4. If $n^2 = 1$, show that $n = 1$. In formal terms,

$$(x)(x^2 = 1 \Rightarrow x = 1).$$

5. If we define $2 = 1 + 1$, prove that $\forall m \in N$, there exists exactly one $k \in N$ so that $m < k < m + 2$.

6. Prove there is no number $n$ so that $n^2 = 2$. In formal terms prove

$$\sim(\exists x)(x^2 = 2).$$

7. If we define $3 = 2 + 1$, show that there is no number $n$ so that $2n = 3$.

8. Replace the principle of induction by the principle of well ordering in the axioms for the positive integers and prove $(x)(1 \le x)$.

9. Replace the principle of induction by the principle of well ordering in the axioms for the positive integers and prove the principle of induction as a theorem.

10. Show that the axiom (e') implies the following:

$$(x)(y)(x = y \;\vee\; (\exists z)(y = x + z \;\vee\; x = y + z)).$$

1.5 THE INTEGERS AND THE RATIONALS

If m and n are natural numbers with m > n, then it is not true that there

is a natural number p so that m + p = n. For if it were true, it would


contradict the axiom of trichotomy. It would be very useful if we could
embed $N$ into a larger set $Z$ so that $(x)(y)(\exists z)(y = x + z)$. (The variables

in the last expression, of course, represent elements of Z.) The object


of this section is to construct such a larger set Z called the integers.
We shall construct the integers using ordered pairs of natural numbers. We define an equivalence relation in $(N \times N) \times (N \times N)$ by writing

$$(n,m) \equiv (p,q) \Leftrightarrow n + q = m + p.$$

[We are, of course, thinking of the pair $(n,m)$ as $n - m$, from which the construction of the equivalence is natural.] To check that this is an equivalence relation we must check that

(a) $(n,m) \equiv (p,q) \Rightarrow (p,q) \equiv (n,m)$,
(b) $(n,m) \equiv (p,q)$ and $(p,q) \equiv (r,s) \Rightarrow (n,m) \equiv (r,s)$.

We think that (a) is immediate and we leave the verification to the


reader. To prove (b) we must show that from the equalities $n + q = m + p$ and $p + s = q + r$ we can get $n + s = m + r$. From the first equality we get $n + q + r = m + p + r$. The second equality allows us to substitute $p + s$ for $q + r$, and we get

$$(n + s) + p = (m + r) + p.$$

The result of Exercise 1 of Section 1.4 shows that n+s = m+ r, which

is the result we wished to obtain.
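
The relation on pairs can be spot-checked by machine. The following is a minimal sketch, not part of the text's argument: it tests symmetry and transitivity on a small finite sample of $N \times N$, using only addition as in the definition. The names `equiv` and `pairs` are illustrative assumptions, and a finite check of course proves nothing about all of $N$.

```python
from itertools import product

def equiv(a, b):
    """(n, m) ~ (p, q) iff n + q == m + p, using only addition in N."""
    (n, m), (p, q) = a, b
    return n + q == m + p

# Hypothetical finite sample of N x N (N starts at 1 in the text).
pairs = list(product(range(1, 7), repeat=2))

# (a) symmetry on the sample
symmetric = all(equiv(b, a) for a in pairs for b in pairs if equiv(a, b))

# (b) transitivity on the sample
transitive = all(
    equiv(a, c)
    for a in pairs
    for b in pairs if equiv(a, b)
    for c in pairs if equiv(b, c)
)

print(symmetric, transitive)
```

Here the pair `(3, 1)` and the pair `(5, 3)` are equivalent, both standing for the difference that will later be called 2.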

Denote the equivalence class of (n ,m) by 'Z(n,m)' and the collection

of equivalence classes by 'Z'. We want to define two functions $+$ and $\cdot$ with domain $Z \times Z$ and range in $Z$ for which the rules (a) through (d) of Section 1.3 are valid and moreover $(x)(y)(\exists z)(x = y + z)$. We shall define these functions as follows:

$$Z(n,m) + Z(p,q) = Z(n+p,\; m+q),$$
$$Z(n,m) \cdot Z(p,q) = Z(np+mq,\; mp+nq).$$


For these definitions to make sense, they must be independent of the


representatives we choose from the equivalence classes. In other words, if $(n,m) \equiv (n_1,m_1)$ and $(p,q) \equiv (p_1,q_1)$, then we must make sure that

$$(n_1 + p_1,\; m_1 + q_1) \equiv (n + p,\; m + q)$$

and

$$(np + mq,\; mp + nq) \equiv (n_1p_1 + m_1q_1,\; m_1p_1 + n_1q_1).$$

Otherwise we would not be defining functions. Let us prove the first one. The equivalences $(n,m) \equiv (n_1,m_1)$ and $(p,q) \equiv (p_1,q_1)$ give

$$n + m_1 = m + n_1 \quad \text{and} \quad p + q_1 = q + p_1.$$

If we add corresponding sides of both equations we get

$$n + p + m_1 + q_1 = m + q + n_1 + p_1.$$

This is precisely the condition that $(n + p,\; m + q) \equiv (n_1 + p_1,\; m_1 + q_1)$.

We shall leave as an exercise the proof of the second statement about


equivalence.
The equivalence class Z(n, n) shall be denoted by a special symbol,

$$0 = Z(n,n),$$

and it is given the name zero. It is almost immediate that for every $x \in Z$, $x + 0 = x$. Also,

$$(x)(y)(\exists z)(x + z = y).$$

To prove this suppose $x = Z(n,m)$, $y = Z(p,q)$; then $z = Z(m + p,\; n + q)$ is an equivalence class so that $x + z = y$.
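
As an illustration only, the sketch below models an integer by one representative pair of natural numbers and checks, on hand-picked examples, that the sum and product do not depend on the representative and that $z = Z(m+p, n+q)$ solves $x + z = y$. The function names (`equiv`, `add`, `mul`, `solve`) and the sample pairs are assumptions made for this example.

```python
def equiv(a, b):
    """(n, m) ~ (p, q) iff n + q == m + p."""
    (n, m), (p, q) = a, b
    return n + q == m + p

def add(a, b):
    """Z(n, m) + Z(p, q) = Z(n + p, m + q), computed on representatives."""
    (n, m), (p, q) = a, b
    return (n + p, m + q)

def mul(a, b):
    """Z(n, m) . Z(p, q) = Z(np + mq, mp + nq), computed on representatives."""
    (n, m), (p, q) = a, b
    return (n * p + m * q, m * p + n * q)

def solve(x, y):
    """Return a representative z with x + z ~ y, namely (m + p, n + q)."""
    (n, m), (p, q) = x, y
    return (m + p, n + q)

x, x1 = (5, 2), (7, 4)   # two representatives of the same class
y, y1 = (1, 4), (3, 6)   # two representatives of another class

sum_ok = equiv(add(x, y), add(x1, y1))    # sum independent of representatives
prod_ok = equiv(mul(x, y), mul(x1, y1))   # product independent of representatives
z = solve(x, y)
solved = equiv(add(x, z), y)              # x + z = y as equivalence classes

print(sum_ok, prod_ok, z, solved)
```

Working with representatives and comparing results by `equiv`, rather than by tuple equality, mirrors the way the text defines the operations on classes.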

For the construction of Z to be meaningful, we must show that it


contains $N$ in some sense. Define a function $\iota$ with domain $N$ and range in $Z$ by the equality

$$\iota(n) = Z(n + 1,\; 1).$$

If $\iota(n) = \iota(m)$, then $(n + 1, 1) \equiv (m + 1, 1)$, which by definition means $n + 2 = m + 2$ and hence by Exercise 1 of Section 1.4 we get $n = m$. As we noted in Definition 1.3.3, a function with this property is said to be one to one.

The function $\iota$ also satisfies the equalities

$$\iota(n + m) = \iota(n) + \iota(m),$$
$$\iota(n \cdot m) = \iota(n) \cdot \iota(m).$$

Indeed, the first equality is nothing more than the fact that $(n + m + 1,\; 1) \equiv (n + m + 2,\; 2)$ and the second equality is the fact that $(nm + 1,\; 1) \equiv (nm + n + m + 2,\; n + m + 2)$. A one-to-one function that preserves the operations $+$ and $\cdot$ is called an isomorphism. Hence $N$ has an image in $Z$ that reflects in a faithful manner all its properties. Therefore, it is usual to say that $N \subset Z$ and to give the equivalence classes in $Z$ that correspond to elements of $N$ the same names as the elements of $N$, although strictly speaking they are different entities. The triple $(Z, +, \cdot)$ is given the name 'the integers', although usually we shall speak only of $Z$ as being the integers. The set $N$ in $Z$ will be called the natural numbers or the positive integers. As in the case of the natural numbers, the symbol for multiplication in $Z \times Z$ will usually be dropped and we shall write 'nm' instead of 'n · m'.

It is very convenient to define a function with domain and range $Z$ and which is designated by the symbol '$-$'. This symbol represents a collection of ordered pairs in $Z \times Z$, and very properly it is defined as that function satisfying

$$(x)(y)\big((x,y) \in - \;\Leftrightarrow\; (m)(n)(m,n \in N \;\&\; x = Z(m,n) \Rightarrow y = Z(n,m))\big).$$

The function $-$ is read: minus. In line with our usual convention, if $(x,y) \in -$, then we write $y = -x$. This way of making a definition is rather formidable and is usually valuable only when one already understands what is being defined. Thus we shall not, in general, make definitions in such a formal manner, but may introduce the formalism after the informal statement has been given.

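1.5.1 Definition

$(x)(-x = Z(m,n) \Leftrightarrow x = Z(n,m))$.
$(x)(y)(x - y = x + (-y))$.
$-N = \{x : -x \in N\}$.

1.5.2 Definition

$(x)(y)(x < y \Leftrightarrow y - x \in N)$.
$(x)(y)(x \le y \Leftrightarrow x < y \;\vee\; x = y)$.
$(x)(y)(y > x \Leftrightarrow x < y)$.
$(x)(y)(y \ge x \Leftrightarrow x \le y)$.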

It is a simple matter to show that the properties (a) through (d) of


Section 1.4 hold for the integers. The trichotomy rule reads that for every $x, y \in Z$ one and only one of the following hold: $x < y$, $x = y$, $y < x$. The trichotomy rule is equivalent to the statement of the following proposition:

1.5.3 Proposition

(a) $-N \cap N = \emptyset$.
(b) $-N \cup N \cup \{0\} = Z$.
(c) $0 \notin -N \cup N$.


Proof. (a) If $-N \cap N \ne \emptyset$ and $x \in -N \cap N$, then it follows that $-x \in -N \cap N$. Hence $x - x = 0 \in N$, which contradicts Proposition 1.4.1.

(b) Suppose now that $x \in Z$ and $x \notin N \cup \{0\}$. Let us write $x = Z(m,n)$. If $n < m$, then $\exists p \in N$, so that $m = n + p$. Since $Z(p + n,\; n) = Z(p + 1,\; 1)$, it follows that $x \in N$, which is a contradiction. Also $m \ne n$, since otherwise $x = 0$. Hence $m < n$ and $-x = Z(n,m) \in N$, which implies $x \in -N$. Thus $Z \subset -N \cup N \cup \{0\}$. Since the converse inclusion is obvious, the proof is finished.

(c) Clear.

1.5.4 Proposition

(a) $(x)(x0 = 0)$.
(b) $(x)(y)((-x)y = x(-y) = -(xy))$.

Proof. (a) Since $0 = 0 + 0$, if we multiply both sides by $x$ we get $x0 = x0 + x0$. Add $-(x0)$ to both sides and we get

$$x0 - x0 = x0 + (x0 - x0).$$

Now, for every $x \in Z$, $x - x = 0$ and therefore we get

$$0 = x0 + 0 = x0.$$

(b) Using the distributive rule and part (a) of this proposition we have

$$0 = 0y = (x - x)y = xy + (-x)y.$$

Add $-(xy)$ to both ends of the equality to get

$$-(xy) = (-x)y.$$

If we use the commutative rule and interchange the symbols 'x' and 'y' in this formula we get

$$-(xy) = (x)(-y).$$

This completes the proof.


If $n, m \in Z$ and $m \ne 0$, it is not always true that

$$(\exists x)(x \in Z \;\&\; mx = n).$$

Hence it would be very convenient to embed $Z$ into a larger system that will make this true.

We shall consider the class

$$P = \{(x,y) : x \in Z \;\&\; y \in Z \;\&\; y \ne 0\}.$$

[We are, of course, thinking of the ordered pair $(x,y)$ as $x/y$ in the usual notation.] We shall say

$$(m,n) \equiv (p,q) \Leftrightarrow mq = np.$$


To prove this is an equivalence relation we must show that the symmetric


and transitive properties hold. The first property is automatic and we
leave this for the reader. Hence we must prove the transitivity property.
To do this we first prove the following.

1.5.5 Lemma. If $x$ and $y$ are in $Z$ and $xy = 0$, then $x = 0$ or $y = 0$. In more formal terms:

$$(x)(y)(xy = 0 \Rightarrow [x = 0 \;\vee\; y = 0]).$$

Proof. Since $-N \cup N \cup \{0\} = Z$, if we can show that not both $x$ and $y$ belong to $-N \cup N$, we will be done. If $x, y \in N$, then by Proposition 1.5.3(c), $xy \ne 0$, and this contradicts the fact that $xy = 0$.

If $x \in N$, $y \in -N$, or if $x \in -N$, $y \in N$, or if $x, y \in -N$, then since $-0 = 0$, we get from Proposition 1.5.4,

$$(x)(-y) = (-x)(y) = (-x)(-y) = 0.$$

Apply the reasoning of the first paragraph to get a contradiction in each case.

Hence we have $xy = 0 \Rightarrow \sim(x \ne 0 \;\&\; y \ne 0)$. But $\sim(x \ne 0 \;\&\; y \ne 0) \Leftrightarrow (x = 0 \;\vee\; y = 0)$, which gives the desired conclusion.

It is now a simple matter to prove the transitivity for the equivalence


relation given above. Suppose $(m,n) \equiv (p,q)$ and $(p,q) \equiv (r,s)$. This means

$$mq = np \quad \text{and} \quad ps = qr.$$

Multiply the first equation by $s$ to get

$$mqs = nps.$$

Then use the fact that $ps = qr$ to get

$$(ms - nr)q = 0.$$

From Lemma 1.5.5 we get $ms - nr = 0 \;\vee\; q = 0$. But $q \ne 0$ and thus $ms = nr$.
We shall denote the equivalence class of an ordered pair $(m,n)$, $n \ne 0$, by 'Q(m,n)'. The collection of equivalence classes shall be denoted by 'Q'. We shall also define two functions $+$ and $\cdot$ with domain $Q \times Q$ and range $Q$ by means of the following equations:

$$Q(m,n) + Q(p,q) = Q(mq + np,\; nq),$$
$$Q(m,n) \cdot Q(p,q) = Q(mp,\; nq).$$

These are well-defined functions that satisfy the commutative, associative, and distributive laws. The triple $(Q, +, \cdot)$ will be called the rational numbers, and by an abuse of language we shall refer to $Q$ as the rationals.
We shall write


$$(r)(-r = Q(m,n) \Leftrightarrow r = Q(-m,n) = Q(m,-n)),$$

and we shall set

$$Q^+ = \{Q(m,n) : (m > 0) \;\&\; (n > 0)\}.$$

$Q^+$ will be called the positive rational numbers. As before, we shall take $r - s = r + (-s)$.

The relation $<$ shall be defined as before,

$$(r)(s)(r < s \Leftrightarrow s - r \in Q^+),$$
$$(r)(s)(r \le s \Leftrightarrow r < s \;\vee\; r = s),$$

and we shall also define

$$(r)(s)(s > r \Leftrightarrow r < s),$$
$$(r)(s)(s \ge r \Leftrightarrow r \le s).$$

The integers $Z$ can be embedded into $Q$ by means of an order-preserving isomorphism defined by the equality

$$\eta(n) = Q(n, 1).$$

By $\eta$ being order-preserving we mean that $\forall m, n \in Z$, if $m < n$ we have $\eta(m) < \eta(n)$.
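
The construction of $Q$ can be imitated with pairs of Python integers. This is a sketch under stated assumptions rather than the text's formal development: a rational is represented by a single pair $(m, n)$ with $n \ne 0$, equality of classes is tested by `q_equiv`, and the names `q_equiv`, `q_add`, `q_mul`, and `eta` are introduced here for illustration.

```python
def q_equiv(a, b):
    """(m, n) ~ (p, q) iff mq == np (second components nonzero)."""
    (m, n), (p, q) = a, b
    return m * q == n * p

def q_add(a, b):
    """Q(m, n) + Q(p, q) = Q(mq + np, nq), computed on representatives."""
    (m, n), (p, q) = a, b
    return (m * q + n * p, n * q)

def q_mul(a, b):
    """Q(m, n) . Q(p, q) = Q(mp, nq), computed on representatives."""
    (m, n), (p, q) = a, b
    return (m * p, n * q)

def eta(n):
    """Embedding of Z into Q: eta(n) = Q(n, 1)."""
    return (n, 1)

half, half_alt = (1, 2), (-3, -6)    # two representatives of one class
third, third_alt = (1, 3), (2, 6)    # two representatives of another

add_ok = q_equiv(q_add(half, third), q_add(half_alt, third_alt))
mul_ok = q_equiv(q_mul(half, third), q_mul(half_alt, third_alt))
eta_add = q_equiv(eta(2 + 3), q_add(eta(2), eta(3)))
eta_mul = q_equiv(eta(2 * 3), q_mul(eta(2), eta(3)))

print(add_ok, mul_ok, eta_add, eta_mul)
```

The four checks correspond to well-definedness of the two operations and to $\eta$ preserving them, each on a single example only.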

We shall now introduce the concept of absolute value on Q. This will


be needed to describe the construction of the real number system.

1.5.6 Definition. The absolute value is that function with domain $Q$ and range in $Q^+ \cup \{0\}$ defined by the following:

$$|r| = r \Leftrightarrow r \ge 0,$$
$$|r| = -r \Leftrightarrow r < 0.$$

1.5.7 Proposition

$(x)(-x \le |x| \;\&\; x \le |x|)$.
$(x)(|x| = |-x|)$.
$(x)(y)(|xy| = |x|\,|y|)$.

Proof. Exercise 18 of Section 1.5.

1.5.8 Proposition (Triangle Inequality)

$$(x)(y)\big(\,\big|\,|x| - |y|\,\big| \le |x + y| \le |x| + |y|\,\big).$$

Proof. Since $x \le |x|$ and $y \le |y|$, it follows that $x + y \le |x| + |y|$. In the same way it follows that $-(x + y) \le |x| + |y|$. Thus from the definition of the absolute value we have

$$|x + y| \le |x| + |y|.$$

In this inequality, replace $x$ by $-x$ and $y$ by $y + x$ to get

$$|y| - |x| \le |x + y|.$$

In the same way, if we replace $y$ by $-y$ and $x$ by $y + x$ we get

$$|x| - |y| \le |x + y|.$$

Taken together these say that

$$\big|\,|x| - |y|\,\big| \le |x + y| \le |x| + |y|.$$

Exercises

1. Show that the properties (a) through (d) given in Section 1.4 as part of the axioms for $N$ hold for $Z$.

2. Show the following:

$$(x)(y)(x, y \in Z \;\&\; x < y \Rightarrow (\exists w)(w \in N \;\&\; y = x + w)).$$

3. Trichotomy in $Z$ is taken to mean that $\forall x, y \in Z$, one and only one of the following is true: $x < y$, $x = y$, $x > y$. Show that trichotomy in $Z$ is equivalent to the statement of Proposition 1.5.3.

4. Show that the isomorphisms $\iota$ and $\eta$ are order-preserving.

5. Show that $Z$ is Archimedian-ordered in the following sense:

$$(x)(y)(y > 0 \Rightarrow (\exists z)(z \in N \;\&\; x \le zy)).$$

6. Prove the following:

(a) $x, y \in -N \Rightarrow xy \in N$.
(b) $x \in N,\; y \in -N \Rightarrow xy \in -N$.

Using these facts and the known fact that the product of elements in $N$ is again an element of $N$, show that

(c) $x < y \;\&\; z > 0 \Rightarrow xz < yz$.
(d) $x < y \;\&\; z < 0 \Rightarrow yz < xz$.

7. Show that multiplication in $Z$ is well defined, that is, independent of the representatives chosen from the equivalence classes.

8. Show the following:

$$(x)(y)(x, y \in Z \Rightarrow (-x)(-y) = xy).$$

9. Show that the commutative, associative, and distributive laws are valid for the rational numbers.

10. (a) Show the following:

(1) $(r)(s)(r, s \in Q \;\&\; r \ne 0 \Rightarrow (\exists x)(x \in Q \;\&\; rx = s))$.
(2) $(r)(s)(r, s \in Q \Rightarrow (\exists x)(x \in Q \;\&\; r + x = s))$.

(b) For given $r$ and $s$ show that $x$ obtained in (1) and (2) is unique.
(b)



11. In defining the equivalence classes for the rationals we allowed only ordered pairs of integers $(m,n)$ for which $n \ne 0$. If we had allowed ordered pairs of the form $(m, 0)$, describe at least one unfavorable circumstance that would have occurred.

12. Show that $Q^+$ is not well ordered under the order relation $\le$. That is, it is not true that every subset of $Q^+$ has a least element.

13. Prove that $Q^+$ is Archimedian ordered; that is, for every $r$ and $s$ in $Q^+$, $\exists n \in N$ such that $r \le ns$.

14. Let us define

$$Q^- = \{x : x \in Q \;\&\; -x \in Q^+\}.$$

Show that $Q^- \cap Q^+ = \emptyset$ and $Q^- \cup Q^+ \cup \{0\} = Q$.

15. Do Exercise 6 with $-N$ and $N$ replaced by $Q^-$ and $Q^+$, respectively.

16. If $m \in Z$, let

$$Z_m = \{k : k \in Z \;\&\; k \ge m\}.$$

Show that the principle of induction implies the following for every $M \subset Z_m$:

$$[m \in M \;\&\; (x)(x \in M \Rightarrow x + 1 \in M)] \Rightarrow M = Z_m.$$

17. Prove that the set $Z_m$ of Exercise 16 is well ordered.

18. Prove Proposition 1.5.7.

1.6 COUNTABILITY

It is interesting, as well as important, to know that there are the "same


number" of rational numbers as there are positive integers. By this we
mean that there is a one-to-one function with domain

N and range Q.

In this section we shall prove this fact.

1.6.1 Definition. Any set that is the range of a one-to-one function with
domain N will be called denumerable.
1.6.2 Lemma. The set $\{(m,n) : m \in N \;\&\; n \in N\}$ is denumerable.

We shall first give an informal construction so as to make the formal


proof understandable. Let us look at a picture of the lattice points

(m, n), Fig. 1.6.1. Now follow along the paths of the arrows and attach
successive positive integers to the successive points on the paths. We
shall write this out as follows, calling the point $(1,1)$ the first arrow.

1st arrow: $1 \to (1,1)$
2nd arrow: $2 \to (1,2)$, $3 \to (2,1)$
3rd arrow: $4 \to (1,3)$, $5 \to (2,2)$, $6 \to (3,1)$
...
nth arrow: $m_n \to (1,n)$, $m_n + 1 \to (2, n-1)$, ..., $m_n + n - 1 \to (n,1)$.

FIGURE 1.6.1 [lattice points $(m,n)$ joined by arrows along successive diagonals]

We have taken $m_n$ to be the integer that begins the nth arrow. Since the nth row contains $n$ elements, it is clear that

$$m_{n+1} = m_n + n.$$

Hence

$$m_n = m_1 + (m_2 - m_1) + (m_3 - m_2) + \cdots + (m_n - m_{n-1}) = 1 + 1 + 2 + \cdots + (n-1) = 1 + \frac{n(n-1)}{2}.$$

Consequently, we have the general correspondence

$$\frac{n(n-1)}{2} + k + 1 \;\leftrightarrow\; (k+1,\; n-k), \qquad 0 \le k \le n-1.$$
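
The correspondence just described can be traced in a few lines of code. The sketch below is illustrative only; the names `phi` and `phi_inverse` are assumptions of this example. It computes the lattice point attached to a positive integer $m$ by locating the arrow $n$ with $n(n-1)/2 < m \le n(n+1)/2$, and it recovers $m$ from a lattice point by using $n = a + b - 1$ and $k = a - 1$.

```python
def phi(m):
    """Return the lattice point (k + 1, n - k) with m = n(n-1)/2 + k + 1."""
    n = 1
    while n * (n + 1) // 2 < m:      # arrow n ends at the integer n(n+1)/2
        n += 1
    k = m - n * (n - 1) // 2 - 1     # position along arrow n, 0 <= k <= n - 1
    return (k + 1, n - k)

def phi_inverse(a, b):
    """Return the integer attached to the lattice point (a, b)."""
    n = a + b - 1                    # the point lies on arrow n
    k = a - 1
    return n * (n - 1) // 2 + k + 1

first_six = [phi(m) for m in range(1, 7)]
print(first_six)   # [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```

The first six values reproduce the three arrows written out above.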

We are now in a position to prove Lemma 1.6.2. We will have proved this if we can show that the relation $\Phi$ which is the set of ordered pairs of the form

$$\left(\frac{n(n-1)}{2} + k + 1,\; (k+1,\; n-k)\right), \qquad n \in N, \quad 0 \le k \le n-1,$$

is a function with domain $N$, range $N \times N$, and is one to one.

It is convenient for us to use the term '$\langle m, n\rangle$' to stand for all integers between $m$ and $n$; that is,

$$\langle m, n\rangle = \{k : k \in Z \;\&\; m \le k \le n\}. \qquad (1.6.1)$$

Let us first show that the domain of $\Phi$ is $N$. We shall show the following is true:

$$(m)\left(m \in N \Rightarrow (\exists n)(\exists k)\left(n \in N \;\&\; k \in \langle 0, n-1\rangle \;\&\; \frac{n(n-1)}{2} + k + 1 = m\right)\right).$$

Let $P(m)$ be the statement:

$$(\exists n)(\exists k)\left(n \in N \;\&\; k \in \langle 0, n-1\rangle \;\&\; \frac{n(n-1)}{2} + k + 1 = m\right).$$

$P(1)$ is true. Indeed, for $m = 1$ we take $n = 1$, $k = 0$. Next we shall show that $\forall m$, $P(m) \Rightarrow P(m+1)$. Let $n$ and $k$ be integers of the required kind, so that

$$m = \frac{n(n-1)}{2} + k + 1. \qquad (1.6.2)$$

Then

$$m + 1 = \frac{n(n-1)}{2} + (k+1) + 1.$$

If $0 \le k < n-1$, then $0 \le k+1 \le n-1$, and hence $n$ and $k+1$ will make $P(m+1)$ true. If $k = n-1$, take $n_1 = n+1$, $k_1 = 0$, and we see that

$$m + 1 = \frac{n(n-1)}{2} + n + 1 = \frac{n_1(n_1-1)}{2} + k_1 + 1.$$

Therefore, also in this case $P(m+1)$ is true. By the principle of induction $P(m)$ is true for all $m$ in $N$.
We next show that $\Phi$ is a function. This will be accomplished if we can show that every integer $m$ has a unique representation in the form given by (1.6.2). Suppose that $n$, $k$, $n_1$, and $k_1$ are integers with $0 \le k \le n-1$, $0 \le k_1 \le n_1 - 1$, and

$$\frac{n(n-1)}{2} + k + 1 = \frac{n_1(n_1-1)}{2} + k_1 + 1.$$

If $n < n_1$, then

$$\frac{n(n-1)}{2} + k \;\le\; \frac{n(n-1)}{2} + n - 1 \;<\; \frac{n(n+1)}{2} \;\le\; \frac{n_1(n_1-1)}{2} + k_1,$$

and conversely if $n_1 < n$ we get the reverse inequality. Hence we can have equality if and only if $n = n_1$, and this implies that $k = k_1$. This shows that for a given $m$ there is one and only one ordered pair $(k+1,\; n-k)$ corresponding to it. This shows that $\Phi$ is a function.

Finally, to show $\Phi$ is one to one we note that if

$$\Phi\left(\frac{n(n-1)}{2} + k + 1\right) = (k+1,\; n-k) = \Phi\left(\frac{n_1(n_1-1)}{2} + k_1 + 1\right) = (k_1+1,\; n_1-k_1),$$

then $k = k_1$, $n = n_1$, and therefore

$$\frac{n(n-1)}{2} + k + 1 = \frac{n_1(n_1-1)}{2} + k_1 + 1.$$

The fact that $\mathcal{R}(\Phi) = N \times N$ is almost obvious.

1.6.3 Definition. A set $A$ is said to be finite $\Leftrightarrow$ $A$ is void or there exists an $n \in N$ and a one-to-one function $\Phi$ with domain $\langle 1, n\rangle = \{k : k \in N \;\&\; 1 \le k \le n\}$ such that $A = \mathcal{R}(\Phi)$. In these cases $A$ is said to have zero elements or $n$ elements, respectively. Otherwise $A$ is said to be infinite. A set that is finite or denumerable is called countable.

1.6.4 Theorem. An infinite subset of a denumerable set is denumerable.

Proof. Suppose $A$ is denumerable, $B \subset A$, and $B$ is infinite. Since $A$ is denumerable, there is a one-to-one function $\Phi$ with domain $N$ and range $A$. Let

$$C = \{n : n \in N \;\&\; \Phi(n) \in B\};$$

then $C \subset N$, $\Phi$ takes $C$ onto $B$, and $C$ is infinite. Indeed, if $C$ is finite there is an integer $m$ and a one-to-one function $\Psi$ with domain the set $\langle 1, m\rangle$ and range $C$. But then $\Phi \circ \Psi$ is a one-to-one function with domain $\langle 1, m\rangle$ and range $B$, which contradicts the fact that $B$ is infinite.

If we can show that $C$ is denumerable, then it will follow that $B$ is denumerable. For, if $\pi$ is a one-to-one function with domain $N$ and range $C$, then $\Phi \circ \pi$ is a one-to-one function with domain $N$ and range $B$.

The method of constructing $\pi$ proceeds informally as follows:

$\pi(1)$ = smallest element of $C$
$\pi(2)$ = smallest element of $C \setminus \{\pi(1)\}$
...
$\pi(n)$ = smallest element of $C \setminus \{\pi(1), \ldots, \pi(n-1)\}$

To proceed more formally we let $P(n)$ be the statement: There exists a unique function $\pi_n$ with domain $\langle 1, n\rangle$ and range in $C$ such that $j < k \le n$ implies $\pi_n(j) < \pi_n(k)$, and $l \in C$ and $l \le \pi_n(n)$ implies $l \in \mathcal{R}(\pi_n)$. The statement $P(1)$ is true; simply take $\pi_1(1)$ as the first element of $C$. If $P(n)$ is true, define $\pi_{n+1}(k) = \pi_n(k)$ for $k < n+1$ and $\pi_{n+1}(n+1)$ to be the smallest element in $C \setminus \mathcal{R}(\pi_n)$. We leave to the reader the easy task of verifying that all the conditions of the statement $P(n+1)$ are satisfied. Hence, by the principle of induction, $(n)(P(n))$.

Define $\pi(n) = \pi_n(n)$. This defines a function with domain $N$ and $\mathcal{R}(\pi) \subset C$. $\pi$ has the property that $j < k \Rightarrow \pi(j) < \pi(k)$ and if $l \in C$ and $l \le \pi(n)$ for some $n \in N$, then $l \in \mathcal{R}(\pi)$. To prove these statements we first note that if $k \le n$, then

$$\pi_n(k) = \pi_k(k) = \pi(k).$$

This follows from the uniqueness of the function $\pi_k$. Indeed, if we restrict $\pi_n$ to the set $\langle 1, k\rangle$, we get a function that has all the properties of $\pi_k$ and hence by the uniqueness of $\pi_k$ this restriction of $\pi_n$ must be $\pi_k$. Therefore, if $j < k$ we have

$$\pi(j) = \pi_j(j) = \pi_k(j) < \pi_k(k) = \pi(k),$$

and if $l \le \pi(n)$, $l \in C$, then $l \le \pi_n(n)$ and hence by $P(n)$ there is a $k \le n$ so that $l = \pi_n(k) = \pi_k(k) = \pi(k)$.

The above proof shows that $\pi$ is a one-to-one function. Indeed, if $\pi(m) = \pi(n)$ we must have $m = n$. Otherwise, $m < n$ or $n < m$. In the first case we get $\pi(m) < \pi(n)$ and in the second case $\pi(n) < \pi(m)$. Each contradicts the fact that $\pi(m) = \pi(n)$.

The function $\pi$ has all of $C$ as its range. In the contrary case, $C \setminus \mathcal{R}(\pi) \ne \emptyset$. Let $n$ be the first element of $C \setminus \mathcal{R}(\pi)$. Now, the collection $\{k : \pi(k) < n\}$ is nonempty and finite and hence has a maximum element $m$. But $\pi(m+1) \ge n$, which means $n \in \mathcal{R}(\pi)$, which is a contradiction. This completes the proof of the theorem.
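
The informal recipe for $\pi$ (repeatedly take the smallest element of $C$ not yet used) can be sketched as a generator. This is an illustration under assumptions, not the formal induction of the proof: the set $C$ is given by a membership test `in_c`, the example set of perfect squares is hypothetical, and each step only terminates because $C$ is assumed infinite.

```python
from itertools import count, islice
from math import isqrt

def enumerate_subset(in_c):
    """Yield pi(1), pi(2), ... for C = {n in N : in_c(n)}.

    Scanning N upward and yielding the members of C produces, at each
    step, the smallest element of C not produced so far.
    """
    for n in count(1):
        if in_c(n):
            yield n

def is_square(n):
    """Hypothetical example of an infinite subset C of N: the perfect squares."""
    return isqrt(n) ** 2 == n

first_terms = list(islice(enumerate_subset(is_square), 6))
print(first_terms)   # [1, 4, 9, 16, 25, 36]
```

The output is strictly increasing, which is the property $j < k \Rightarrow \pi(j) < \pi(k)$ used in the proof to show that $\pi$ is one to one.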

1.6.5 Theorem. $Q^+$ is denumerable.

Proof. Suppose $r = Q(m,n) \in Q^+$ and let $\Phi$ be the one-to-one function obtained in Lemma 1.6.2 that maps $N$ onto $N \times N$. Set

$$P(r) = \{x : x \in N \;\&\; \Phi(x) \in r\},$$
$$\omega(r) = \Phi(k), \qquad k = \text{least element of } P(r).$$

This defines $\omega$ as a function with domain $Q^+$ and range in $N \times N$. It is a one-to-one function, since if

$$\omega(r) = \Phi(k) = \Phi(j) = \omega(q),$$

then $j = k$; this implies $j \in P(r)$, which means $q = r$.

The set $\mathcal{R}(\omega)$ is infinite, since the one-to-one character of $\omega$ implies that $\mathcal{R}(\omega \mid N)$ is already infinite. By Lemma 1.6.2 $N \times N$ is denumerable and since $\mathcal{R}(\omega)$ is infinite, by Theorem 1.6.4 it is also denumerable. Hence there is a one-to-one function $\pi$ that takes $N$ onto $\mathcal{R}(\omega)$. The function $\omega^{-1} \circ \pi$ is a one-to-one function with domain $N$ and range $Q^+$.
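
A concrete listing of $Q^+$ along these lines may help. The sketch below is an illustration under assumptions: it walks the lattice points in the diagonal order of Lemma 1.6.2 and keeps a positive rational only at its first appearance, which plays the role of $\omega(r)$. Python's `fractions.Fraction` stands in for the equivalence class $Q(m,n)$, and the function names are introduced here.

```python
from fractions import Fraction
from itertools import count, islice

def lattice_points():
    """Diagonal walk of N x N as in Lemma 1.6.2: (1,1), (1,2), (2,1), ..."""
    for n in count(1):
        for k in range(n):
            yield (k + 1, n - k)

def positive_rationals():
    """Yield each positive rational once, at its first lattice point."""
    seen = set()
    for m, n in lattice_points():
        r = Fraction(m, n)           # the class of the pair (m, n)
        if r not in seen:
            seen.add(r)
            yield r

first_ten = list(islice(positive_rationals(), 10))
print([str(r) for r in first_ten])
# ['1', '1/2', '2', '1/3', '3', '1/4', '2/3', '3/2', '4', '1/5']
```

The point $(2,2)$ is skipped because its class already appeared at $(1,1)$.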

Exercises

1. Prove that every subset of a finite set is finite.

2. Provide the details of the proof that $P(n) \Rightarrow P(n+1)$, where $P(n)$ is the statement given in the proof of Theorem 1.6.4.

3. Justify all the statements in the last paragraph of the proof of Theorem 1.6.4. Prove:

(a) $\{k : \pi(k) < n\}$ is nonempty and finite.
(b) Every nonempty finite subset of $N$ has a maximum element.
(c) $\pi(m+1) \ge n$.

4. Show that the function $\pi$ constructed in the proof of Theorem 1.6.4 is unique in that it is the only function from $N$ onto $C$ that has its properties.

5. (a) Show that a denumerable set is infinite.
(b) Show that if $B \subset A$ and $B$ is infinite, then $A$ is infinite.

These two facts will completely justify the proof of Theorem 1.6.5.

6. Show that the rationals $Q$ are denumerable and hence the integers $Z$ are denumerable.

7. Show that Definition 1.6.3 is meaningful in the sense that a finite set cannot have both $n$ elements and $m$ elements, where $n \ne m$.

8. If $A$ is a finite set with $n$ elements, show that there are $n!$ one-to-one functions with domain $\langle 1, n\rangle$ and range $A$.

9. Let $\{A_1, \ldots, A_n\}$ be a collection of $n$ sets. Define $A_1 \times A_2 \times \cdots \times A_n$ "inductively" by setting

$$A_1 \times A_2 \times A_3 = (A_1 \times A_2) \times A_3,$$
$$A_1 \times \cdots \times A_k = (A_1 \times \cdots \times A_{k-1}) \times A_k,$$
$$A_1 \times \cdots \times A_n = (A_1 \times \cdots \times A_{n-1}) \times A_n.$$

The elements of $A_1 \times \cdots \times A_n$ are written $(a_1, \ldots, a_n)$. (We shall not make the meaning of "definition by induction" precise at this time.) If each set $A_k$ is countable, show that $A_1 \times \cdots \times A_n$ is countable.

10. Show that the collection of finite subsets of $N$ is denumerable by proceeding in the following way: If $A$ is a finite subset of $N$, set

$$\chi_A(k) = \begin{cases} 0 \Leftrightarrow k \notin A, \\ 1 \Leftrightarrow k \in A. \end{cases}$$

Since $A$ is finite, there is a smallest $n(A) \in N$ so that $\chi_A(k) = 0$ if $k > n(A)$. Now set

Show that $\Phi$ is a one-to-one map with range an infinite subset of $Q^+$. The results of Exercise 5 may be helpful.
From this result it might be natural to guess that the collection of all
subsets of N is denumerable. Disturbingly enough, this is not true, as
we shall show in Section 3.3.

1.7

THE REALS

The equation $x^2 = 2$ has no rational solution. Indeed, suppose there is a rational solution $p/q$ (in the standard terminology), where $p$ and $q$ are not both even. Then $p^2 = 2q^2$, which means $p^2$ and therefore $p$ is even. If we write $p = 2k$, then $p^2 = 4k^2 = 2q^2$, which means $2k^2 = q^2$ and hence $q^2$ and $q$ are even. This contradicts the fact that $p$ or $q$ is not even.

An equation of the form we have just considered arises quite naturally when we try to compute the length of the hypotenuse of a right triangle with legs each of length one unit. The "geometric proof" of the Pythagorean theorem is interesting. We consider a square of side length $c$ and decompose it into right triangles with legs having lengths $a$ and $b$, as shown in Fig. 1.7.1. The square has been decomposed into four congruent right triangles and a square of side length $a - b$. Hence we have

$$\text{area} = c^2 = 2ab + (a - b)^2 = a^2 + b^2.$$

FIGURE 1.7.1

There are various methods for "completing" the rational numbers,


so that, for example, an equation such as the one we have been considering above has a solution. One method is due to R. Dedekind, and
another method is due to G. Cantor. We shall follow the latter's method
here.


To describe Cantor's method for completing the rationals it will be


necessary for us first to make some remarks about rational sequences.

1.7.1 Definition. A rational sequence is a function with domain the nonnegative integers $N_0 = N \cup \{0\}$ and range a subset of the rational numbers.

There is an operation of addition and multiplication for rational sequences given formally by the following.

1.7.2 Definition. If $r$ and $s$ are rational sequences, then $r + s$ and $r \cdot s$ are rational sequences defined by the following equalities for all $n \in N_0$:

$$(r + s)(n) = r(n) + s(n),$$
$$(r \cdot s)(n) = r(n)s(n).$$

We now come to a special type of rational sequence that bears the


name of one of the great nineteenth-century mathematicians, Augustin
Cauchy. In his Cours d'analyse Cauchy stated that irrational numbers are to be regarded as limits of sequences of rational numbers. He tried to prove that if $r$ is a rational sequence so that $|r(n) - r(m)|$ tends to zero as $n$ and $m$ get larger, then $r(n)$ tends to a real number. Of course, since he did not really have a definition of an irrational number, it was not possible to prove this.

1.7.3 Definition. A rational sequence $r$ is said to be a rational Cauchy sequence $\Leftrightarrow$ $\forall \varepsilon \in Q^+$, $\exists M$ so that if $n, m \in N$ and $n \ge M$, $m \ge M$, then $|r(m) - r(n)| < \varepsilon$.

In the notation of the predicate calculus the last statement of the definition would be as follows:

$$(\varepsilon)\big(\varepsilon \in Q^+ \Rightarrow (\exists M)(m)(n)(m, n \in N \;\&\; m, n \ge M \Rightarrow |r(m) - r(n)| < \varepsilon)\big).$$

This may actually look rather formidable, but after the definition of a rational Cauchy sequence is understood, this notation will seem quite reasonable. Actually, this method of writing the definition of a Cauchy sequence is very convenient if one should want to negate the statement that a sequence is Cauchy.
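
To make the quantifiers concrete, here is a small sketch that evaluates the Cauchy condition on a finite window of indices. Everything in it is an assumption made for illustration: the sequence `r` is Newton's iteration $x \mapsto (x + 2/x)/2$ started at 1 (its terms are rational and approach a number whose square is 2), and `cauchy_on_window` only tests finitely many pairs, so it can refute a proposed $M$ but cannot prove that a sequence is Cauchy.

```python
from fractions import Fraction

def r(n):
    """n-th term of a rational sequence, r(0) = 1, r(n+1) = (r(n) + 2/r(n)) / 2."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def cauchy_on_window(seq, eps, M, upto):
    """Check |seq(m) - seq(n)| < eps for all M <= m, n <= upto (finite test only)."""
    terms = [seq(n) for n in range(M, upto + 1)]
    return all(abs(a - b) < eps for a in terms for b in terms)

eps = Fraction(1, 10**6)
ok_from_4 = cauchy_on_window(r, eps, M=4, upto=8)   # M = 4 works on this window
ok_from_3 = cauchy_on_window(r, eps, M=3, upto=8)   # M = 3 is too small for this eps

print(ok_from_4, ok_from_3)
```

Exact rational arithmetic is used so that the comparison with $\varepsilon \in Q^+$ is carried out in $Q$, as in the definition.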
To obtain some properties of rational Cauchy sequences we make a few more definitions.

1.7.4 Definition. The maximum of two rationals, $r, s \in Q$, is the value at $(r,s)$ of that function with domain $Q \times Q$ defined as follows:

$$\max(r,s) = \begin{cases} r \Leftrightarrow s \le r, \\ s \Leftrightarrow r \le s. \end{cases}$$

In an analogous fashion we could define the minimum function on $Q \times Q$ with values $\min(r,s)$.

In Exercise 2 at the end of this section the reader is asked to show that every finite set of real numbers has a unique maximum and a unique minimum element. Hence if $\{r(k) : k \in \langle 1, n\rangle\}$ is a finite set of numbers we can define, in an obvious way, a maximum function and a minimum function on these finite sets. We shall designate the values of these functions, respectively, by

$$\max\{r(k) : k \in \langle 1, n\rangle\},$$
$$\min\{r(k) : k \in \langle 1, n\rangle\}.$$
1.7.5 Definition. A set $A \subset Q$ is said to be bounded $\Leftrightarrow$ $\exists m$ so that $\forall r \in A$, $|r| \le m$.

In the symbolism of the predicate calculus the last part of this statement is written as follows:

$$(\exists y)(x)(x \in A \Rightarrow |x| \le y).$$

Note, at this stage, unless stated to the contrary, our variables are operative on the universe of the rationals.
1.7.6 Lemma. Every finite set in $Q$ is bounded.

Proof. For $n \in N_0 = N \cup \{0\}$ let $P(n)$ be the statement: Every set in $Q$ with $n$ elements is bounded. $P(0)$ is true, since the null set is clearly bounded. Assume $P(n)$ is true. Suppose $A \subset Q$ has $n + 1$ elements; that is, there exists a one-to-one function $\Phi$ with domain $\langle 1, n+1\rangle$ and range $A$. Let $B = \{\Phi(k) : k \in \langle 1, n\rangle\}$; then $B$ has $n$ elements and by $P(n)$, $\exists l$ so that $\forall r \in B$, $|r| \le l$. Let $m = \max(l, |\Phi(n+1)|)$; then it is clear that $\forall r \in A$, $|r| \le m$. Thus $P(n) \Rightarrow P(n+1)$ and by the principle of induction $(x)(P(x))$.

REMARK: In the above proof we have used the principle of induction as applied to $N_0 = N \cup \{0\}$ rather than to $N$. That it may be so applied follows from Exercise 16 of Section 1.5.

1.7.7 Proposition. Every rational Cauchy sequence is bounded.

Proof. By the definition of a Cauchy sequence $r$, $\exists M$ so that $n \ge M \Rightarrow$

$$|r(M) - r(n)| < 1.$$

From the triangle inequality we get, for $n \ge M$,

$$|r(n)| < 1 + |r(M)|.$$

The set $\{r(n) : n \in \langle 0, M-1\rangle\}$ is finite and hence by the previous lemma is bounded, say by $k$. Set $m = \max(k,\; 1 + |r(M)|)$ and we see that

$$(n)(|r(n)| \le m).$$

1.7.8 Proposition. If $r$ and $s$ are rational Cauchy sequences, then $r + s$ and $r \cdot s$ are rational Cauchy sequences.

Proof. For every $\varepsilon \in Q^+$, $\exists M$ such that $n, m \ge M$ implies

$$|r(n) - r(m)| < \varepsilon/2 \quad \text{and} \quad |s(n) - s(m)| < \varepsilon/2.$$

Hence, if $n, m \ge M$,

$$|(r+s)(n) - (r+s)(m)| \le |r(n) - r(m)| + |s(n) - s(m)| < \varepsilon.$$

Since $r$ and $s$ are Cauchy sequences they are bounded and hence $(\exists k)(n)(|r(n)| < k \;\&\; |s(n)| < k)$. There exists an $M$ such that $n, m \ge M$ implies

$$|r(n) - r(m)| < \varepsilon/2k \quad \text{and} \quad |s(n) - s(m)| < \varepsilon/2k.$$

Hence, if $n, m \ge M$,

$$|(r \cdot s)(n) - (r \cdot s)(m)| = |r(n)s(n) - r(n)s(m) + r(n)s(m) - r(m)s(m)| \le |r(n)|\,|s(n) - s(m)| + |s(m)|\,|r(n) - r(m)| < \varepsilon.$$

As we have mentioned before, it was impossible for Cauchy to prove that every rational Cauchy sequence converged to a real number, since he had no formulation of an irrational number. This gap was filled in the second part of the nineteenth century by people such as Cantor, Dedekind, Heine, and Weierstrass. From intuitive geometric considerations it seemed reasonable that a rational Cauchy sequence should converge to some entity. It was Cantor's view that this entity could be identified with the rational Cauchy sequence itself. However, again from an intuitive geometric point of view, there are many Cauchy sequences that converge to the same entity so, more properly, Cantor's view was that this entity could be identified with an equivalence class of rational Cauchy sequences.

1.7.9 Definition. Two rational Cauchy sequences $r$ and $s$ are said to be equivalent $\Leftrightarrow$ for every $\varepsilon \in Q^+$, $\exists M$ such that for every $n \in N_0$ with $n \ge M$ we have $|r(n) - s(n)| < \varepsilon$.
We leave as an exercise the proof of the fact that what we have
defined is indeed an equivalence relation.
We shall designate the collection of equivalence classes of rational Cauchy sequences by 'R' and the equivalence class of a rational Cauchy sequence $r$ by 'R(r)'. We shall define two functions $+$ and $\cdot$ with domain $R \times R$ and range $R$ by means of the equations

$$R(r) + R(s) = R(r + s),$$
$$R(r) \cdot R(s) = R(r \cdot s).$$

It is, of course, necessary to show that these are well-defined functions. That is, the definitions are independent of the particular representatives chosen from each class. We shall leave the proofs of these facts as exercises. We shall call the ordered triple $(R, +, \cdot)$ the real number system, but by an abuse of language we shall simply say that $R$ is the real number system. Also, in accordance with standard practice, we shall usually drop the multiplication dot when multiplying real numbers.

We shall leave as an exercise the facts that $R$ obeys the commutative, associative, and distributive laws. The rationals $Q$ are embedded into $R$ by means of the isomorphism defined by

$$\rho(s) = R(r_s),$$

where $r_s(n) = s$ for all $n \in N_0$. It is clear that $\rho$ is order-preserving (see Definition 1.7.13 and the notion of order preserving before Definition 1.5.6). In most instances no confusion will result if we label the elements in $R$ which are in the range of $\rho$ by the same names used in $Q$. In fact, we shall suppose that $N \subset Z \subset Q \subset R$.

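1.7.10 Definition. A rational sequence $r$ is said to be positive $\iff$ $\exists \delta \in \mathbb{Q}^+$ & $\exists M$ so that $\forall n \in \mathbb{N}_0$ with $n \ge M$ we have $r(n) \ge \delta$.

In the formalism of the predicate calculus the last part of the above statement would read as follows:

\[ (\exists \delta)(\delta \in \mathbb{Q}^+ \,\&\, (\exists M)(n)(n \in \mathbb{N}_0 \,\&\, n \ge M \Rightarrow r(n) \ge \delta)). \]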


1.7.11 Definition. R+ is the set of all equivalence classes R(r), where
r is a positive rational Cauchy sequence. We also define -R(r)= R(-r),
where (-r)(n) =-r(n), and R-= {R(r): -R(r) ER+}. The set R+ is
called the positive real numbers and the set R - is called the negative real num
bers. As usual we write R(r) - R(s)for R(r)+ (-R(s)).
The next theorem is the statement of trichotomy for R.

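1.7.12 Theorem

\[ 0 \notin \mathbb{R}^+ \cup \mathbb{R}^-. \]
\[ \mathbb{R}^+ \cap \mathbb{R}^- = \emptyset. \]
\[ \mathbb{R}^+ \cup \mathbb{R}^- \cup \{0\} = \mathbb{R}. \]

Proof. The first statement is clear. Next, if $R(r) \in \mathbb{R}^+ \cap \mathbb{R}^-$, then $-R(r) \in \mathbb{R}^+ \cap \mathbb{R}^-$, which implies $R(r) - R(r) = 0 \in \mathbb{R}^+ \cap \mathbb{R}^-$, which is a contradiction.

Suppose $R(r) \ne 0$. This means that $r$ is not equivalent to the zero sequence; that is,

\[ \neg(\varepsilon)(\varepsilon \in \mathbb{Q}^+ \Rightarrow (\exists M)(n)(n \in \mathbb{N}_0 \,\&\, n \ge M \Rightarrow |r(n)| < \varepsilon)) \]

is a true statement. Using our rule for negating statements we find that this statement becomes

\[ (\exists \varepsilon)(\varepsilon \in \mathbb{Q}^+ \,\&\, (M)(\exists n)(n \in \mathbb{N}_0 \,\&\, n \ge M \,\&\, |r(n)| \ge \varepsilon)). \]

Since $r$ is a Cauchy sequence, $\exists N$ so that $n, m \ge N \Rightarrow |r(n) - r(m)| < \varepsilon/2$. Choose $n_0 > N$ so that $|r(n_0)| \ge \varepsilon$. Now, we know that trichotomy holds in $\mathbb{Q}$ and hence since $r(n_0) \ne 0$, we have $r(n_0) > 0 \lor r(n_0) < 0$. In the first case, since $r(n_0) - r(m) \le |r(n_0) - r(m)| < \varepsilon/2$ for $m \ge N$, it follows that $r(m) > r(n_0) - \varepsilon/2 \ge \varepsilon/2$. In the second case, since $r(m) - r(n_0) \le |r(n_0) - r(m)| < \varepsilon/2$ it follows that for $m \ge N$, $-r(m) > -r(n_0) - \varepsilon/2 = |r(n_0)| - \varepsilon/2 \ge \varepsilon/2$. Hence $r$ or $-r$ is a positive sequence.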

1.7.13 Definition

\[ (x)(y)(x < y \iff y - x \in \mathbb{R}^+). \]
\[ (x)(y)(x \le y \iff x < y \lor x = y). \]
\[ (x)(y)(x > y \iff y < x). \]
\[ (x)(y)(x \ge y \iff y \le x). \]

Note that in terms of the relation $<$, Theorem 1.7.12 is the statement of trichotomy for $\mathbb{R}$ in the form: If $x, y \in \mathbb{R}$, then one and only one of the statements $x < y$, $x = y$, $y < x$ is true. Indeed, the three conditions in the theorem are equivalent to the statement that $\forall x, y \in \mathbb{R}$ one and only one of the following possibilities holds: $y - x \in \mathbb{R}^+$, $y - x = 0$, $y - x \in \mathbb{R}^-$.

1.7.14 Lemma. If $x < y$, then $\exists r \in \mathbb{Q}$ such that $x < r < y$.

Proof. Let $x = R(\xi)$, $y = R(\eta)$; since $y - x > 0$, $\exists \delta \in \mathbb{Q}^+$ and $\exists L$ such that $n \ge L \Rightarrow \eta(n) - \xi(n) \ge \delta > 0$. Further, since $\xi$ and $\eta$ are rational Cauchy sequences, $\exists M$ such that $n, m \ge M \Rightarrow |\eta(n) - \eta(m)| < \delta/4$ and $|\xi(n) - \xi(m)| < \delta/4$. Pick $n_0 \ge N = \max(L, M)$ and take $r = [\eta(n_0) + \xi(n_0)]/2$. Then for $n \ge N$ we have

\[ \eta(n) - r = \eta(n) - \eta(n_0) + [\eta(n_0) - \xi(n_0)]/2 > \delta/4, \]
\[ r - \xi(n) = [\eta(n_0) - \xi(n_0)]/2 + \xi(n_0) - \xi(n) > \delta/4. \]

If we now call the isomorphic image of $r$ in $\mathbb{R}$ by the same name, we have proved the lemma.

1.7.15 Theorem. $\mathbb{R}^+$ is Archimedian-ordered in the sense that

\[ (x)(y)(x, y \in \mathbb{R}^+ \Rightarrow (\exists n)(n \in \mathbb{N} \,\&\, x \le ny)). \]

Proof. By the previous lemma $\exists r, s \in \mathbb{Q}^+$, so that $0 < r < y$ and $x < s < x + 1$. Since $\mathbb{Q}^+$ is Archimedian-ordered (Exercise 13 of Section 1.5), $\exists n \in \mathbb{N}$, so that $x < s \le nr < ny$.

1.7.16 Definition. The absolute value is that function with domain $\mathbb{R}$ and range $\mathbb{R}^+ \cup \{0\}$ defined by the following:

\[ |x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x \le 0. \end{cases} \]

1.7.17 Theorem. For every $x$ and $y$ in $\mathbb{R}$,

\[ x \le |x|, \qquad -x \le |x|, \qquad |x| = |-x|, \]
\[ \big| |x| - |y| \big| \le |x + y| \le |x| + |y|. \]

Proof. See Propositions 1.5.7 and 1.5.8.

The important question now arises as to what happens if we repeat the process for real Cauchy sequences that we have just gone through for rational Cauchy sequences. Theorem 1.7.20 below shows that we get nothing new.

1.7.18 Definition. A real sequence is a function with domain $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ and range in $\mathbb{R}$. A real Cauchy sequence $x$ is a sequence such that $\forall \varepsilon > 0$, $\exists N$ so that if $n, m \in \mathbb{N}_0$ and $n, m \ge N$, then $|x(n) - x(m)| < \varepsilon$.

In the formalism of the predicate calculus the definition of a real Cauchy sequence would read as follows: A real sequence $x$ is a real Cauchy sequence $\iff$

\[ (\varepsilon)(\varepsilon > 0 \Rightarrow (\exists N)(n)(m)(n, m \in \mathbb{N}_0 \,\&\, n, m \ge N \Rightarrow |x(n) - x(m)| < \varepsilon)). \]

Our variables, of course, are now assumed to take values in $\mathbb{R}$.

1.7.19 Definition. $a \in \mathbb{R}$ is said to be a limit of the real sequence $x$ $\iff$ $\forall \varepsilon > 0$, $\exists N$ so that $\forall n \in \mathbb{N}_0$ with $n \ge N$, $|x(n) - a| < \varepsilon$. If the real sequence $x$ has a limit $a$ we say that $x$ is convergent, and also $x$ converges to $a$.

In the formalism of the predicate calculus the definition of a limit would be as follows: $a \in \mathbb{R}$ is a limit of the real sequence $x$ $\iff$

\[ (\varepsilon)(\varepsilon > 0 \Rightarrow (\exists N)(n)(n \in \mathbb{N}_0 \,\&\, n \ge N \Rightarrow |x(n) - a| < \varepsilon)). \]

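1.7.20 Theorem. Every real Cauchy sequence has a unique limit in $\mathbb{R}$.

Proof. To make the proof clear, it is necessary to distinguish carefully between elements in $\mathbb{Q}$ and elements in $\mathbb{R}$ which are the isomorphic images of elements in $\mathbb{Q}$. For a rational $\hat{s} \in \mathbb{R}$, let us write $\hat{s} = \rho(s)$, where $s \in \mathbb{Q}$ and $\rho$ is the isomorphism taking $\mathbb{Q}$ into $\mathbb{R}$.

Let $\hat{r}$ be a rational Cauchy sequence with range in $\mathbb{R}$. For every $n \in \mathbb{N}_0$ we may write

\[ \hat{r}(n) = \rho(r(n)) = R(r_n), \]

where $r_n$ is a rational Cauchy sequence with range in $\mathbb{Q}$ and $r_n(p) = r(n)$.

Suppose $\hat{\varepsilon}$ is a positive rational in $\mathbb{R}$. Then $\exists N$ so that $\forall p \in \mathbb{N}_0$, $n, p \ge N \Rightarrow |\hat{r}(n) - \hat{r}(p)| < \hat{\varepsilon}/2$, or, what is equivalent,

\[ -\hat{\varepsilon}/2 < \hat{r}(n) - \hat{r}(p) \quad \text{and} \quad \hat{r}(n) - \hat{r}(p) < \hat{\varepsilon}/2. \]

Since the isomorphism $\rho$ is order-preserving, this gives

\[ -(\varepsilon/2) < r(n) - r(p) \quad \text{and} \quad r(n) - r(p) < \varepsilon/2, \]

or what is equivalent,

\[ -(\varepsilon/2) < r_n(p) - r(p) \quad \text{and} \quad r_n(p) - r(p) < \varepsilon/2. \]

This leads to the set of inequalities

\[ \varepsilon + [r_n(p) - r(p)] > \varepsilon/2, \qquad \varepsilon - [r_n(p) - r(p)] > \varepsilon/2, \]

for all $n, p \ge N$. If we now identify $\varepsilon$ with the constant Cauchy sequence defined by $\varepsilon(p) = \varepsilon$, we have shown that for $\forall n \ge N$ the Cauchy sequences

\[ \varepsilon + [r_n - r] \quad \text{and} \quad \varepsilon - [r_n - r] \]

are positive, where $r$ is the rational Cauchy sequence that evaluated at $p$ is $r(p)$. If we take the equivalence classes of these sequences we get

\[ R(\varepsilon) + R(r_n - r) > 0 \quad \text{and} \quad R(\varepsilon) - R(r_n - r) > 0. \]

If we now set $a = R(r)$, then from the facts that $R(\varepsilon) = \hat{\varepsilon}$ and $\hat{r}(n) = R(r_n)$, we have arrived at the conclusion that $\forall n \ge N$, $|\hat{r}(n) - a| < \hat{\varepsilon}$.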
Suppose now that $x$ is a real Cauchy sequence. Using the Archimedian ordering of $\mathbb{R}^+$, the set $U_n = \{m : m \in \mathbb{Z} \,\&\, x(n) \le m/n\}$ is nonvoid and hence, by the well ordering of $\mathbb{N}$ (see Exercise 17 of Section 1.5), $U_n$ has a minimal element $m_n$. If we set $r(n) = m_n/n$, then from the fact that $(m_n - 1)/n < x(n)$ we get $0 \le r(n) - x(n) < 1/n$. Since $x$ is a Cauchy sequence, the sequence $r$ defined by the numbers $r(n)$ is Cauchy. Indeed, $\forall \varepsilon > 0$, $\exists \varepsilon' \in \mathbb{Q}^+$ with $\varepsilon' < \varepsilon$ and $\exists M$ so that $n, m \ge M \Rightarrow |x(n) - x(m)| < \varepsilon'/2$. Hence, if $n, m \ge \max\{M, 4/\varepsilon'\}$, we have

\[ |r(n) - r(m)| \le |r(n) - x(n)| + |x(n) - x(m)| + |x(m) - r(m)| < \frac{1}{n} + \frac{1}{m} + \frac{\varepsilon'}{2} \le \varepsilon'. \]

From the first part of the proof, $\exists a \in \mathbb{R}$ and $\exists L$ such that $n \ge L \Rightarrow |r(n) - a| < \varepsilon'/2$. Consequently, for $n \ge \max(L, 2/\varepsilon')$ we have

\[ |x(n) - a| \le |x(n) - r(n)| + |r(n) - a| < \varepsilon. \]


This is what we set out to prove.
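
The approximation step in the second part of the proof, replacing $x(n)$ by the least fraction $m_n/n$ lying at or above it, can be sketched numerically. The helper name `grid_approx` is hypothetical, and a `Fraction` stands in for a real number purely for illustration; the sketch assumes $n \ge 1$.

```python
from fractions import Fraction
import math

def grid_approx(x_n, n):
    """Return m_n / n, where m_n is the least integer m with x_n <= m / n (n >= 1)."""
    m_n = math.ceil(x_n * n)   # least integer m with n * x_n <= m
    return Fraction(m_n, n)

x_value = Fraction(22, 7)      # stand-in for a value x(n) of the sequence
for n in (1, 5, 50):
    r_n = grid_approx(x_value, n)
    # The proof uses exactly this estimate: 0 <= r(n) - x(n) < 1/n.
    print(n, r_n, 0 <= r_n - x_value < Fraction(1, n))
```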
To show that $a$ is unique, suppose $\exists b$, so that for every $\varepsilon > 0$, $\exists N$ such that $n \ge N$ implies $|x(n) - b| < \varepsilon$. Then

\[ |a - b| \le |x(n) - a| + |x(n) - b| < 2\varepsilon. \]

This implies $a = b$; for in the contrary case the use of the trichotomy theorem for $\mathbb{R}$ shows that $|a - b| > 0$. Choose $0 < \varepsilon < |a - b|/2$ and we get a contradiction.

This concludes the proof of the theorem.
REMARKS: (a) Let us emphasize once more the meaning of this last theorem. If we form an equivalence class of real Cauchy sequences and define functions of addition and multiplication, as we did for the rationals, then Theorem 1.7.20 tells us that, up to isomorphism, we shall get nothing new. For this reason $\mathbb{R}$ is said to be complete. Since $\sqrt{2} \notin \mathbb{Q}$, we know that $\mathbb{Q}$ is not complete.

(b) The uniqueness part of the proof of the last theorem shows that we are justified in calling a limit of a sequence, the limit of the sequence. As is usual, if $a$ is the limit of the real sequence $x$, we shall write

\[ \lim_{n \to \infty} x(n) = a, \]

and, as we noted before, we say that the sequence converges to $a$. We shall also write $x(n) \to a$ as $n \to \infty$.

NOTE: To avoid notational confusion we shall usually use a term such as '$(x(n))$' or '$(x_n)$' to denote a real sequence. The second notation is the more standard one in the mathematical literature and, although we shall eventually revert to it, for the remainder of this chapter we shall use the functional notation to remind the reader that a real sequence is a function on $\mathbb{N}_0$. In the future when speaking about sequences we shall usually drop the word 'real', since it will be understood that $\mathbb{R}$ is the range of the function that is the sequence.
Let us now prove the converse to Theorem 1.7.20.

1.7.20' Theorem. Every convergent sequence is a Cauchy sequence.

Proof. Suppose $(x(n))$ is a real sequence and

\[ \lim_{n \to \infty} x(n) = a. \]

This means that $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N$ implies $|x(n) - a| < \varepsilon/2$. Therefore, if $n, m \ge N$,

\[ |x(n) - x(m)| \le |x(n) - a| + |a - x(m)| < \varepsilon. \]

The sum and product of real sequences are defined in the same way
as the sum and product of rational sequences; see Definition 1.7.2.
The analogues of Propositions 1.7.7 and 1.7.8 are valid for real se
quences and we leave it to the reader to satisfy himself of these things.
There is an important concept that we have not noted yet and that
is the concept of a subsequence.

1.7.21 Definition. A sequence $y$ is said to be a subsequence of a sequence $x$ $\iff$ there is a function $\Phi$ with domain $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ and range in $\mathbb{N}_0$ so that $j < k \Rightarrow \Phi(j) < \Phi(k)$ and

\[ y = x \circ \Phi. \]

In other words, $y(n) = x(\Phi(n))$ and we shall write $(y(n)) = (x(\Phi(n)))$.

The condition $j < k \Rightarrow \Phi(j) < \Phi(k)$ means that the range of $\Phi$ is denumerable. Hence, speaking loosely, $\Phi$ "picks out" an infinite number of the ordered pairs $(n, x(n))$ to form the sequence $y$. As a simple example, suppose we take $(x(n))$ to be the sequence given by $x(n) = n^2 + 2$. Take $\Phi(n) = 2n + 1$; then

\[ y(n) = x \circ \Phi(n) = (2n + 1)^2 + 2 = 4n^2 + 4n + 3. \]
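
The example above can be checked mechanically. In the short sketch below the function names `x`, `phi`, and `y` mirror the symbols of the example and are otherwise arbitrary; the check covers only the first hundred indices.

```python
def x(n):
    """The sequence x(n) = n^2 + 2."""
    return n * n + 2

def phi(n):
    """The strictly increasing index map Phi(n) = 2n + 1."""
    return 2 * n + 1

def y(n):
    """The subsequence y = x o Phi."""
    return x(phi(n))

# Phi is strictly increasing and y(n) agrees with 4n^2 + 4n + 3 on the tested range.
print(all(phi(j) < phi(j + 1) for j in range(100)))
print(all(y(n) == 4 * n * n + 4 * n + 3 for n in range(100)))
```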

Exercises

1. Use the principle of induction to show the following inequalities:


(a)

;;,.

&

N0 =>

n(
(1+h) n ;;,. 1+nh+
0

(b)

(I - h) n
h

(c)

N0 =>
n (n
1 - nh+

h2

0, n. E N, n

;;,.

(I+h) n
2.

&

n; I)

;;,. 2 =>

;;,. 1+nh +

; I) h2

h2

Show that every finite set in R has a unique maximum element

and a unique minimum element. Use this fact to give another proof
of Lemma 1.7.7.

3. Show that there is always an irrational (not rational) number between any two real numbers. Recall that there exists an irrational number: $\sqrt{2}$.

If

lxl <

4.

1, show that

xn - 0
lxl

> 1, show that Vk E

5.

If

6.

For every

as

n-oo.

E R, show that

xn
,-o
n.

as

n-oo.

7. If $x(n) \to a$, $y(n) \to b$ as $n \to \infty$, show that $x(n) + y(n) \to a + b$ and $x(n)y(n) \to ab$ as $n \to \infty$.

8. If $(x(n))$ is a convergent sequence, show that every subsequence converges to the same limit as $(x(n))$. Conversely, if every "proper" subsequence of $(x(n))$ converges, then $(x(n))$ converges. By a "proper" subsequence we mean a subsequence $x \circ \Phi$, where $(m)(\exists n)(n \ge m \,\&\, \Phi(n + 1) > \Phi(n) + 1)$.

9. Without using Theorem 1.7.20, show that if a subsequence of a Cauchy sequence converges, then the Cauchy sequence itself converges.

10. Suppose $x(n) \to a$ as $n \to \infty$ and $\exists N$ such that $n \ge N \Rightarrow |x(n) - b| < c$. What can be said about $|a - b|$?

11. If $x(n) \to a$ as $n \to \infty$, show that $|x(n)| \to |a|$ as $n \to \infty$. Give an example that shows that the converse is not always true. For what value(s) of $a$, if any, is it always true that $|x(n)| \to |a| \Rightarrow x(n) \to a$?

12. If $(x(n))$ is a sequence and $x(2n) \to a$, $x(2n + 1) \to a$ as $n \to \infty$, show that $x(n) \to a$ as $n \to \infty$.

13. Let $(s(n))$ be a sequence and set

\[ \sigma(n) = \frac{s(0) + s(1) + \cdots + s(n)}{n + 1}. \]

If $s(n) \to s$ as $n \to \infty$, show that $\sigma(n) \to s$ as $n \to \infty$. The sequence $(\sigma(n))$ is called the Cesaro mean of the sequence $(s(n))$.

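14. If $A \subset \mathbb{R}$ is a finite set with $n$ elements, show that there is a unique one-to-one function $\Phi$ with domain $\langle 1, n \rangle$ and range $A$ so that $j < k \Rightarrow \Phi(j) < \Phi(k)$.

15. For every $n \in \mathbb{N}$, suppose $A_n$ is a finite set in $\mathbb{R}$. Show that $A = \bigcup \{A_n : n \in \mathbb{N}\}$ is countable. [Hint: If $A_n$ has $m_n$ elements, let $\Phi_n$ be the unique function of Exercise 14 having domain $\langle 1, m_n \rangle$ and range $A_n$. Let $\Psi$ be that function with domain $\mathbb{N} \times \mathbb{N}$ defined by

\[ \Psi(k, n) = \begin{cases} \Phi_n(k) & k \in \langle 1, m_n \rangle, \\ 0 & \text{otherwise.} \end{cases} \]

Let $\pi$ be a one-to-one function with domain $\mathbb{N}$ and range $\mathbb{N} \times \mathbb{N}$. Then $\Psi \circ \pi$ is a function with domain $\mathbb{N}$ and $A$ is contained in the range of $\Psi \circ \pi$.]

1.8 A REVIEW OF THE REAL NUMBER SYSTEM AND SEQUENCES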

In the previous sections we constructed the real number system starting


from a set of axioms for the natural numbers and derived a number
of relevant properties. In developing further properties of the real
numbers, it is usually more efficient to list a certain number of these
relevant properties and then proceed from these properties without
regard as to how they arose. In this section we shall give this list.
Consequently, this section can be read in either of two ways: (1) as a review of the previous material, or (2) taking statements (a) through (j) below as an axiom system for the real numbers. What is a theorem under reading (1) may be an axiom under reading (2) and vice versa.


The numbered statements given below have in the previous sections


either been taken as definitions or axioms, or have been proved. Under
the reading (2) they are either definitions or can be proved using state
ments (a) through (j) as axioms, and we have indicated how to do this
in some instances. For example, if the current section is being read as
introducing an axiom system for the reals, then it is of course neces
sary to identify the natural numbers in the set R, since several axioms
require the use of the natural numbers. This is what has been done in
the statements labeled 1.8.1 and 1.8.2. Note then that the principle of
induction can be proved as a theorem, 1.8.3. On the other hand, if
the current section is being read as a review, then it is not necessary to
identify the natural numbers, since they form the basis for the con
struction of the reals. However, then the statement 1.8.2 must be proved.
Because of the dual nature in which this section may be viewed, we have
not labeled statements as definitions or theorems but have left it to the
reader to decide in which way he wants to read it. In writing it our
tendency has been to consider statements (a) through (j) as a full-blown
axiom system for the reals.
One of the properties of the real number system that we wish to list
is stated in the language of sequences. Hence this section will serve as
a good opportunity to look once again at the salient definitions and facts
about sequences.

We shall describe the real number system as a sextuple consisting of a set $\mathbb{R}$, two functions $+$ and $\cdot$ with domain $\mathbb{R} \times \mathbb{R}$ and range $\mathbb{R}$, a relation $<$ with domain and range $\mathbb{R}$, a function with domain and range $\mathbb{R}$ whose value at $x \in \mathbb{R}$ is designated by $-x$, and a function with domain and range $\mathbb{R} \setminus \{0\}$ whose value at $x \in \mathbb{R} \setminus \{0\}$ is designated by $x^{-1}$ or $1/x$, and satisfying the properties (a) through (j) listed below.

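(a) For every $x$ and $y$ in $\mathbb{R}$:

\[ x + y = y + x, \qquad x \cdot y = y \cdot x. \qquad \text{(commutative laws)} \]

(b) For every $x$, $y$, and $z$ in $\mathbb{R}$:

\[ (x + y) + z = x + (y + z), \qquad (x \cdot y) \cdot z = x \cdot (y \cdot z). \qquad \text{(associative laws)} \]

(c) For every $x$, $y$, and $z$ in $\mathbb{R}$:

\[ x \cdot (y + z) = x \cdot y + x \cdot z. \qquad \text{(distributive law)} \]

(d) $1 \in \mathbb{R}$, $0 \in \mathbb{R}$, $1 \ne 0$, and for every $x$ in $\mathbb{R}$:

\[ x + 0 = x, \qquad x \cdot 1 = x. \]

(e) For every $x$ in $\mathbb{R}$:

\[ x + (-x) = 0, \qquad x \ne 0 \Rightarrow x \cdot x^{-1} = 1. \]

(f) For every $x$ and $y$ in $\mathbb{R}$, one and only one of the following is valid:

\[ x < y, \qquad x = y, \qquad y < x. \qquad \text{(trichotomy law)} \]

(g) For every $x$, $y$, and $z$ in $\mathbb{R}$:

\[ (x < y \,\&\, y < z) \Rightarrow x < z. \]

(h) For every $x$, $y$, and $z$ in $\mathbb{R}$:

\[ x < y \Rightarrow x + z < y + z, \qquad (x < y \,\&\, 0 < z) \Rightarrow x \cdot z < y \cdot z. \]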

To state the Archimedian ordering property for R it is necessary to


identify the natural numbers in R. For this purpose we introduce the
following definitions:

1.8.1 A set $A \subset \mathbb{R}$ is said to be inductive $\iff$

(a) $1 \in A$.

(b) $x \in A \Rightarrow x + 1 \in A$.

1.8.2 The natural numbers $\mathbb{N}$ is the intersection of all inductive sets in $\mathbb{R}$, that is, is the smallest inductive set in $\mathbb{R}$.

Note that N is not the null set 0, since R itself is an inductive set and
1 belongs to every inductive set. Note also it is a simple matter to prove

that the principle of induction is valid for N:

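1.8.3 For every $M \subset \mathbb{N}$, if $1 \in M$ and $x \in M \Rightarrow x + 1 \in M$, then $M = \mathbb{N}$.

Indeed, $M$ is an inductive set and hence $\mathbb{N} \subset M$. Since $M \subset \mathbb{N}$ we must have $M = \mathbb{N}$.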
Now that we have N we can state the next property of the real number
system.
(i) For every $x$ and $y$ in $\mathbb{R}$ with $x > 0$ and $y > 0$, there is an $n$ in $\mathbb{N}$ so that $x < ny$ (Archimedian ordering).

To state our last and rather crucial property for the real number
system, it is necessary to consider the concepts of a sequence, a Cauchy
sequence, and a limit of a sequence. For this purpose we have the follow
ing definitions:

1.8.4 A (real) sequence is a function with domain $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$ and range in $\mathbb{R}$. A sequence will usually be denoted by the term '$(x(n))$' or '$(x_n)$', and this is to indicate that $x(n)$ or $x_n$ is the value of the function at $n \in \mathbb{N}_0$. A sequence $(y(n))$ is said to be a subsequence of $(x(n))$ $\iff$ there is a function $\Phi$ with domain $\mathbb{N}_0$ and range in $\mathbb{N}_0$ so that $j < k \Rightarrow \Phi(j) < \Phi(k)$ and $y(n) = x(\Phi(n))$.
1.8.5 The absolute value is the function defined by

\[ |x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases} \]

1.8.6 The following properties hold for the absolute value:

\[ -x \le |x|, \qquad x \le |x|, \qquad |x| = |-x|, \]
\[ \big| |x| - |y| \big| \le |x + y| \le |x| + |y|. \]

We are, incidentally, using $a \le b$ to mean $a < b$ or $a = b$.

1.8.7 A sequence $(x(n))$ is said to be a Cauchy sequence $\iff$ $\forall \varepsilon > 0$, $\exists N$ so that if $m, n \ge N$, then $|x(n) - x(m)| < \varepsilon$.

1.8.8 A sequence $(x(n))$ is said to have a limit $\iff$ $\exists a$ such that $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \Rightarrow |x(n) - a| < \varepsilon$. The number $a$ is said to be the limit of $(x(n))$. If a sequence has a limit, it is said to be convergent.

We can now state a crucial property of the real number system:

(j) Every Cauchy sequence has a limit.

Let us note that if a sequence has a limit, then the limit must be unique. Indeed, suppose that $a$ and $b$ are limits of $(x(n))$. Then $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \Rightarrow |x(n) - a| < \varepsilon/2$ and $|x(n) - b| < \varepsilon/2$. Hence we have $\forall \varepsilon > 0$,

\[ |a - b| \le |a - x(n)| + |x(n) - b| < \varepsilon, \qquad n \ge N. \]

If $\delta = |a - b| > 0$, choose $\varepsilon = \delta$ and we get the following contradiction to the trichotomy law:

\[ \delta = |a - b| < \delta. \]

Hence, if a sequence $(x(n))$ has a limit $a$, we are justified in speaking of the limit $a$. We usually write

\[ \lim_{n \to \infty} x(n) = a, \quad \text{or} \quad x(n) \to a \text{ as } n \to \infty. \]

The statement converse to (j) is true and very easy to prove:

1.8.9 Every convergent sequence is Cauchy.

Indeed, suppose $x(n) \to a$. This means that $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \Rightarrow |x(n) - a| < \varepsilon/2$. Hence, if $m, n \ge N$, we get, using the triangle inequality,

\[ |x(n) - x(m)| \le |x(n) - a| + |x(m) - a| < \varepsilon. \]

This, of course, proves that $(x(n))$ is Cauchy.

1.8.10 Every convergent sequence is bounded; that is, $\exists M$ so that $\forall n \in \mathbb{N}_0$, $|x(n)| \le M$.

Suppose $x(n) \to a$; then $\exists N$ so that $n \ge N \Rightarrow |x(n) - a| < 1$. Hence using the triangle inequality, we get that $n \ge N \Rightarrow |x(n)| \le 1 + |a|$. Let $L = \max\{|x(n)| : n \in \langle 0, N \rangle\}$ and $M = \max(L, 1 + |a|)$. Clearly $\forall n \in \mathbb{N}_0$, $|x(n)| \le M$.

1.8.11 The sum and product of two sequences $(x(n))$ and $(y(n))$ are defined as follows:

\[ (x + y)(n) = x(n) + y(n), \]
\[ (xy)(n) = x(n)y(n). \]

Note we have reverted to the custom of dropping the symbol '$\cdot$' for multiplication.

1.8.12 If $x(n) \to a$ and $y(n) \to b$, then

\[ (x + y)(n) \to a + b, \]
\[ (xy)(n) \to ab. \]

The facts that $x(n) \to a$ and $y(n) \to b$ mean first of all that $\exists M > 1$, so that $\forall n \in \mathbb{N}_0$, $|x(n)| \le M$, and $|y(n)| \le M$. Second, $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \Rightarrow |x(n) - a| < \varepsilon/(2M)$, $|y(n) - b| < \varepsilon/(2M)$. Hence, if $n \ge N$,

\[ |(x + y)(n) - (a + b)| \le |x(n) - a| + |y(n) - b| < \varepsilon, \]
\[ |(xy)(n) - ab| \le |y(n)|\,|x(n) - a| + |a|\,|y(n) - b| \le M\{|x(n) - a| + |y(n) - b|\} < \varepsilon. \]

In the last part of the above proof we have used the fact that $|x(n)| \le M$ for all $n \in \mathbb{N}_0 \Rightarrow |a| \le M$. We shall leave the verification of this simple fact to the reader.


In 1.8.2 we defined the natural numbers. Now take

\[ -\mathbb{N} = \{x : -x \in \mathbb{N}\}, \]
\[ \mathbb{Z} = -\mathbb{N} \cup \mathbb{N} \cup \{0\}. \]

The set $\mathbb{Z}$ is, of course, called the integers. The rational numbers is the set

\[ \mathbb{Q} = \{m/n : m, n \in \mathbb{Z},\ n \ne 0\}. \]

We are, of course, writing $m/n$ for $m(1/n)$.

1.8.13 The rationals are dense in the reals in the sense that $\forall \varepsilon > 0$ and $\forall x \in \mathbb{R}$, $\exists r \in \mathbb{Q}$, so that $|x - r| < \varepsilon$.

Indeed, by the Archimedian ordering of $\mathbb{R}$, $\exists n \in \mathbb{N}$ so that $|x| < n$, or, equivalently, $-n < x < n$. Again, using Archimedian ordering, $\exists m \in \mathbb{N}$ so that $1/m < \varepsilon$. Let $k \in \mathbb{N}$ be the smallest natural number so that $x \le -n + k/m$. Then $-n + (k - 1)/m < x$ and if we set $r = -n + k/m$, we have $|x - r| = -n + k/m - x < k/m - (k - 1)/m = 1/m < \varepsilon$.
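
The construction in the proof of 1.8.13 translates almost line for line into a short program. In this sketch the function name `rational_near` is hypothetical, and the inputs are taken to be `Fraction` values so that all arithmetic is exact; the variable names `n`, `m`, `k`, `r` follow the proof.

```python
from fractions import Fraction
import math

def rational_near(x, eps):
    """Follow the proof of 1.8.13: return r = -n + k/m with 0 <= r - x < eps."""
    n = math.floor(abs(x)) + 1          # a natural number with |x| < n
    m = math.floor(1 / eps) + 1         # a natural number with 1/m < eps
    k = math.ceil((x + n) * m)          # smallest natural k with x <= -n + k/m
    return Fraction(-n) + Fraction(k, m)

x_value = Fraction(355, 113)
eps = Fraction(1, 100)
r = rational_near(x_value, eps)
print(r, abs(x_value - r) < eps)
```

Since $-n < x$, the quantity $(x + n)m$ is positive, so the ceiling is at least 1 and $k$ is indeed a natural number.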

In the previous proof we have made use of some facts about the relation $<$ without explicitly mentioning them. For example, we said that $|x| < n$ is equivalent to the fact that $-n < x$ and $x < n$. Indeed, from $x \le |x|$ we get from (g) that $x < n$, and from $-x \le |x|$ we get $-x < n$, and from (e) and (h), $n + x > 0$. Now, using (d), (e), and (h) we get $x = -n + n + x > -n + 0 = -n$, which is what we set out to prove. The reverse implication follows in a similarly easy way. Note we are using the usual convention that $x - y = x + (-y)$. We are sure the reader can fill in the details of proofs of other facts.


Aside from the fact that the rationals are dense in the reals, they also have the property that they have the "same number" of elements as the positive integers. More formally this can be written as follows:

1.8.14 There exists a one-to-one function with domain $\mathbb{N}$ and range $\mathbb{Q}$.

The proof of this statement can be found in Section 1.6. More gen
erally we can make the following definition:

1.8.15 A set $A$ is said to be denumerable $\iff$ there exists a one-to-one function with domain $\mathbb{N}$ and range $A$.

Another way of phrasing 1.8.14 is to say that the rationals are denumerable. We can also talk about finite sets.

1.8.16 A set $A$ is said to be finite $\iff$ $A$ is the null set, in which case we say that $A$ has zero elements, or else there is a one-to-one function with domain the set $\langle 1, n \rangle = \{k : k \in \mathbb{N} \,\&\, 1 \le k \le n\}$ and range $A$, in which case we say $A$ has $n$ elements. A set that is either finite or denumerable is called countable.

Exercises

Use only statements (a) through (j) as axioms in proving the following:

1. Show that $-(x + y) = -x + (-y)$ and $(xy)^{-1} = x^{-1} y^{-1}$.

2. If $n, m \in \mathbb{N}$, then $n + m \in \mathbb{N}$ and $n \cdot m \in \mathbb{N}$.

3. Show that $0 < 1$.

4. For every $n \in \mathbb{N}$, $1 \le n$.

5. It is not true that there is an $n \in \mathbb{N}$ and an $m \in \mathbb{N}$ so that $n < m < n + 1$.

6. Show that $\mathbb{N}$ is well ordered; that is, every nonvoid subset of $\mathbb{N}$ has a unique smallest element.

7. If $n, m \in \mathbb{Z}$, then $n + m \in \mathbb{Z}$ and $n \cdot m \in \mathbb{Z}$.

8. Show that every nonvoid subset of $\mathbb{Z}$ that is bounded below has a first element. A set $A \subset \mathbb{Z}$ is said to be bounded below $\iff$ $\exists m$ so that $\forall n \in A$, $m \le n$.

9. If $x(n) \to a$ and $b, c \in \mathbb{R}$ so that $\forall n$, $x(n) - b < c$, show that $a - b \le c$. Show that if $\forall n$, $|x(n) - b| < c$, then also $|a - b| \le c$. Give an example which shows that in general we cannot conclude that $a - b < c$ or $|a - b| < c$.

10.

If Vn E N0, x(n)

11.

Set

O!

l/(n +I), prove that x(n)

= 1 and Vn E N, n! = 1

._

0.

n. If k E N and

_(k!)R
,
n!

x( n) show that x(n) .- 0 as n .-

12.

If x(n)

._

oo.

and y(n)

._

b - 0, show that

x(n)
a
-----
y(n)
b

13.

Let

that

p(x)= 2x4 +x2+3x+ 1,

p(n)
q(n)

14.

q(x)= 3x4 +x3 +2x+3.

._

_g_ .
3

Prove the following:


2xy :;:;; x2+y 2, Vx,y E R.
(b ) x+ l/x ;::;,: 2 , x > 0.

(a)

[Hint: (x - y)2;::;,: O.]

15.

Use Exercise 14 to prove the Cauchy-Schwarz inequality


(X1Y1 +

XnYn>2:;:;; (X12+

+Xn2HY12 +

[Hint: Use Exercise l 4(a) to get for each k,

+Y n 2)

Prove


(X12 +

xk2

+
+ Xn2)
(Y12

Yk2
+

+ Yn2 )

Then add both sides of all the inequalities as k varies from I to n.]

1.9 PROPERTIES OF THE REALS

A very crucial property of $\mathbb{R}$ is that every Cauchy sequence of reals converges. One reason why this property is so crucial was pointed out in Section 1.7. There are, however, a number of other properties which may be used in place of this property and which may be more convenient to use in some circumstances. The purpose of this section is to derive these other properties.

1.9.1 Definition. A set $S \subset \mathbb{R}$ is said to be bounded above [below] $\iff$ $(\exists M)(x)(x \in S \Rightarrow x \le M\ [x \ge M])$. Any such $M$ is called an upper bound [lower bound] for $S$. A set is said to be bounded $\iff$ it is bounded above and bounded below. A sequence is said to be bounded above [below] $\iff$ the range of the sequence is bounded above [below]; it is said to be bounded $\iff$ its range is bounded.

1.9.2 Definition. A real sequence $(x(n))$ is said to be monotone increasing [nondecreasing] $\iff$ $(n)(m)(n < m \Rightarrow x(n) < x(m)\ [x(n) \le x(m)])$, and monotone decreasing [nonincreasing] $\iff$ $(n)(m)(n < m \Rightarrow x(m) < x(n)\ [x(m) \le x(n)])$.

1.9.3 Theorem. Every monotone nondecreasing [nonincreasing] sequence that is bounded above [below] has a limit.
Proof. Suppose $(x(n))$ is a monotone nondecreasing sequence bounded above by $M$. We claim that $(x(n))$ is Cauchy. For if $(x(n))$ is not Cauchy, then we must negate the statement

\[ (\varepsilon)(\varepsilon > 0 \Rightarrow (\exists L)(n)(m)(n, m \ge L \Rightarrow |x(n) - x(m)| < \varepsilon)). \]

The negation of this statement is the statement

\[ (\exists \varepsilon)(\varepsilon > 0 \,\&\, (L)(\exists n)(\exists m)(n, m \ge L \,\&\, |x(n) - x(m)| \ge \varepsilon)). \]

Take $L = 2$ and fix $n_1 > m_1 > 1$ so that $x(n_1) - x(m_1) \ge \varepsilon$. The absolute value is not needed, since by monotonicity $x(n_1) \ge x(m_1)$. Next take $L = n_1 + 1$ and $n_2 > m_2 > n_1$ so that $x(n_2) - x(m_2) \ge \varepsilon$. Assuming we have chosen $n_1, m_1, \ldots, n_k, m_k$, take $L = n_k + 1$ and $n_{k+1} > m_{k+1} > n_k$ so that $x(n_{k+1}) - x(m_{k+1}) \ge \varepsilon$. Hence, by the principle of induction, for every $k \ge 2$ there exists a finite set of pairs $\{(n_j, m_j) : j \in \langle 2, k \rangle\}$ so that $m_1 < n_1 < m_2 < n_2 < \cdots < m_k < n_k$ and $x(n_j) - x(m_j) \ge \varepsilon$.

Since $\mathbb{R}$ is Archimedian ordered, $\exists k_0 \in \mathbb{N}$ so that $M - x(m_1) \le k_0 \varepsilon$. Take $k > k_0$ and for this $k$ a finite set of the type given in the first paragraph. Then

\[ x(n_k) - x(m_1) = x(n_k) - x(m_k) + x(m_k) - x(n_{k-1}) + x(n_{k-1}) - x(m_{k-1}) + x(m_{k-1}) - x(n_{k-2}) + \cdots + x(n_2) - x(m_2) + x(m_2) - x(n_1) + x(n_1) - x(m_1). \]

Now $x(n_j) - x(m_j) \ge \varepsilon$ and $x(m_{j+1}) - x(n_j) \ge 0$. Hence

\[ k\varepsilon \le x(n_k) - x(m_1) \le M - x(m_1) \le k_0 \varepsilon. \]

This is a contradiction.

Since $(x(n))$ is a real Cauchy sequence it has a limit. If $(x(n))$ is a monotone nonincreasing sequence, $(-x(n))$ is a monotone nondecreasing sequence and hence there is an $a$ so that

\[ \lim_{n \to \infty} -x(n) = a. \]

But this is equivalent with the statement

\[ \lim_{n \to \infty} x(n) = -a. \]

This concludes the proof of the theorem.
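
A concrete sequence may help to fix ideas. The sketch below uses the partial sums of $1/k!$, an example chosen here for illustration and not taken from the text; the name `partial_sum` is hypothetical. The sequence is nondecreasing and bounded above by 3, and the program checks on a finite range that its terms cluster together, as the Cauchy property obtained in the proof requires.

```python
from fractions import Fraction

def partial_sum(n):
    """x(n) = 1/0! + 1/1! + ... + 1/n!, computed exactly with rationals."""
    total, term = Fraction(0), Fraction(1)
    for k in range(n + 1):
        total += term
        term /= (k + 1)      # turns 1/k! into 1/(k+1)!
    return total

xs = [partial_sum(n) for n in range(25)]
print(all(xs[i] <= xs[i + 1] for i in range(24)))   # monotone nondecreasing
print(all(v < 3 for v in xs))                        # bounded above by M = 3
print(xs[20] - xs[10] < Fraction(1, 10**7))          # late terms are close together
```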


1.9.4 Definition. A least upper bound [greatest lower bound] for a set $A \subset \mathbb{R}$ is a number $\xi$ that is

(a) an upper [lower] bound for $A$, and

(b) if $\eta$ is any upper [lower] bound for $A$, then $\xi \le \eta$ [$\eta \le \xi$].

A least upper bound [greatest lower bound] for $A$ is denoted by 'l.u.b. $A$ [g.l.b. $A$]' and is often called the supremum [infimum] of $A$ and denoted by 'sup $A$ [inf $A$]'.
1.9.5 Theorem. Every nonempty set $A \subset \mathbb{R}$ that is bounded above [below] has a unique least upper bound [greatest lower bound].

Proof. Let $U$ be the set of upper bounds for $A$, which, by hypothesis, is nonvoid. For every $n \in \mathbb{N}$ set

\[ I_n = \{m : m \in \mathbb{Z} \,\&\, m/2^n \in U\}, \qquad U_n = \{m/2^n : m \in I_n\}. \]

By Archimedian ordering, $U_n$ and hence $I_n$ is nonempty. The set of integers $I_n$ is bounded below (by any element of $A$ times $2^n$) and therefore has a least element $m_n$. (Why?) Hence $m_n/2^n$ is the smallest element of $U_n$.

If $m/2^k \in U_k$, then since $m/2^k = 2m/2^{k+1}$, it follows that $m/2^k \in U_{k+1}$. Hence $U_k \subset U_{k+1}$ and consequently

\[ \frac{m_{k+1}}{2^{k+1}} \le \frac{m_k}{2^k}. \]

If we set $r(n) = m_n/2^n$, then the above inequality shows that $(r(n))$ is a monotone nonincreasing sequence bounded below by any element of $A$. It follows from Theorem 1.9.3 that $(r(n))$ has a limit $\xi$. We claim that $\xi = $ l.u.b. $A$.

The first thing to show is that $\xi$ is an upper bound for $A$. For every $x \in A$ we have $x \le m_n/2^n$ for all $n$, since the number on the right is always in $U$. For every $\varepsilon > 0$, $\exists N$ so that $n \ge N$ implies

\[ 0 \le \frac{m_n}{2^n} - \xi < \varepsilon. \]

This follows from the fact that $m_n/2^n \ge \xi$. Therefore, $\forall \varepsilon > 0$, $\exists N$ so that

\[ n \ge N \Rightarrow x \le \frac{m_n}{2^n} < \xi + \varepsilon. \]

This means $\xi - x \ge 0$; for if $\xi - x < 0$, choose $\varepsilon = (x - \xi)/2$ and we get the contradiction that $(\xi - x)/2 > 0$. We have consequently shown that $\xi$ is an upper bound for $A$.

To show it is a l.u.b., let $\eta$ be any upper bound for $A$. If we suppose that $\xi - \eta = \delta > 0$, then we shall show that there is a rational number of the form $p/2^n$ so that $\eta \le p/2^n < \xi$. First note that it is a consequence of the Archimedian ordering of the reals that there exist $m$ and $n$ in $\mathbb{N}$ such that $m/2^n < \delta$. Indeed, for every $m$, $\exists n$ such that $m < n\delta$, and since it is easy to show by induction that $n < 2^n$ for all $n \in \mathbb{N}$, the assertion of the last sentence follows. Let us fix $m$ and $n$ so that $m/2^n < \delta$ and let

\[ S = \{k : k \in \mathbb{Z} \,\&\, \eta \le km/2^n\}. \]

The set $S$ is nonvoid by the Archimedian ordering of $\mathbb{R}$, and it is bounded below by $\eta$. Since $\mathbb{N}$ is well ordered, it follows that $S$ has a minimum element $k_0$ (see Exercise 17 of Section 1.5). Hence we get

\[ (k_0 - 1)\frac{m}{2^n} < \eta \le k_0 \frac{m}{2^n}, \]

and this implies that

\[ \eta \le k_0 \frac{m}{2^n} < \eta + \frac{m}{2^n} < \eta + \delta = \xi. \]

Take $p = k_0 m$ and we have shown that

\[ \eta \le p/2^n < \xi. \]

This is a contradiction, since $\eta$ is an upper bound for $A$ and hence $p/2^n$ is an upper bound for $A$, and thus $\xi \le m_n/2^n \le p/2^n$. Hence we must have $\xi \le \eta$.

If $A$ is bounded below, then

\[ -A = \{x : -x \in A\} \]

is bounded above. If $\xi$ is the supremum of $-A$, $-\xi$ is the infimum of $A$.

Since the unicity of a l.u.b. or g.l.b. is trivial, we have concluded the proof of the theorem.
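
The sequence $r(n) = m_n/2^n$ of least dyadic upper bounds used in the proof can be computed explicitly for a simple set. The sketch below takes the illustrative set $A = \{x \in \mathbb{Q} : x^2 < 2\}$, a choice made here and not found in the text; the names `is_upper_bound` and `least_dyadic_upper_bounds` are hypothetical.

```python
from fractions import Fraction

def is_upper_bound(q):
    """True when the rational q is an upper bound of A = {x in Q : x*x < 2}."""
    return q > 0 and q * q >= 2

def least_dyadic_upper_bounds(steps, start=2):
    """Return [r(0), ..., r(steps-1)], where r(n) = m_n / 2**n and m_n is the
    least integer m such that m / 2**n is an upper bound of A."""
    m = start                      # an integer assumed to be an upper bound of A
    out = []
    for n in range(steps):
        # Walk down while the next smaller numerator still gives an upper bound.
        while is_upper_bound(Fraction(m - 1, 2 ** n)):
            m -= 1
        out.append(Fraction(m, 2 ** n))
        m *= 2                     # m_n/2**n = 2*m_n/2**(n+1) belongs to U_(n+1)
    return out

for n, r in enumerate(least_dyadic_upper_bounds(6)):
    print(n, r)
```

The doubling step mirrors the inclusion $U_k \subset U_{k+1}$ from the proof, which is why the computed values never increase.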

1.9.6 Definition. A real number $a$ is said to be an accumulation point of a set $A \subset \mathbb{R}$ $\iff$ $\forall \varepsilon > 0$, $\exists x \in A$ such that $x \ne a$ and $|x - a| < \varepsilon$.

1.9.7 Theorem (Bolzano-Weierstrass). Every bounded infinite set in $\mathbb{R}$ has an accumulation point.
Proof. (a) Suppose, at first, that $A$ is a bounded denumerable set. Hence we may suppose that $A$ is the range of a sequence $(x(n))$ with the property that $m \ne n \Rightarrow x(n) \ne x(m)$. Since $(x(n))$ is bounded below, we may use Theorem 1.9.5 to obtain a sequence $(y(n))$, where

\[ y(0) = \text{g.l.b.}\{x(n) : n \ge 0\}, \]
\[ y(1) = \text{g.l.b.}\{x(n) : n \ge 1\}, \]
\[ \vdots \]
\[ y(k) = \text{g.l.b.}\{x(n) : n \ge k\}. \]

Since $\{x(n) : n \ge k + 1\} \subset \{x(n) : n \ge k\}$ it follows that $y(k) \le y(k + 1)$. The sequence $(y(n))$ is bounded above, and hence we may again use Theorem 1.9.5 and set

\[ a = \text{l.u.b.}\{y(n) : n \ge 0\}. \]

We claim that $a = \lim_{n \to \infty} y(n)$. Indeed, if this is not true, then $\exists \varepsilon > 0$ such that for $\forall N$, $\exists n \ge N$ so that

\[ a - y(n) \ge \varepsilon. \]

The absolute-value sign is not necessary in the previous inequality, since $y(n) \le a$. Now if $m \le N < n$, $y(m) \le y(n)$, and therefore

\[ a - y(m) \ge a - y(n) \ge \varepsilon. \]

If we successively choose $N = 1, 2, \ldots$, then by what we have just proved, we see that for every $m$,

\[ y(m) \le a - \varepsilon < a. \]

This contradicts the fact that $a$ is the l.u.b. of the range of $y$.

It remains to prove that $a$ is an accumulation point of the range of $(x(n))$. To do this we must show that $\forall \varepsilon > 0$, $\exists x(n)$ such that $0 < |x(n) - a| < \varepsilon$. Now, $\forall \varepsilon > 0$, $\exists N$ such that $n \ge N \Rightarrow |y(n) - a| < \varepsilon/2$. Also, $\forall n \ge N$, $\exists n_1 \ge n$ so that $|x(n_1) - y(n)| < \varepsilon/2$ and $\forall m > n_1$, $\exists n_2 \ge m$ so that $|x(n_2) - y(m)| < \varepsilon/2$. Hence $\exists n_1$ and $\exists n_2 > n_1$ so that

\[ |x(n_1) - a| \le |x(n_1) - y(n)| + |y(n) - a| < \varepsilon, \]
\[ |x(n_2) - a| \le |x(n_2) - y(m)| + |y(m) - a| < \varepsilon. \]

Since $x(n_1) \ne x(n_2)$, it follows that either $x(n_1) \ne a$ or $x(n_2) \ne a$.

(b) Suppose now that $A$ is any bounded infinite set in $\mathbb{R}$. One way of trying to reduce this case to the previous case is to try to choose a denumerable set in $A$. However, since it is not clear how to do this using only the axiom of induction, we shall proceed in a slightly different way. Let us set

\[ m = \inf A, \qquad M = \sup A. \]

These numbers exist by virtue of the Theorem 1.9.5. Next, let us put

\[ P = \{x : x \in \mathbb{Q} \,\&\, m < x < M\}. \]

The set $P$ contains an infinite number of elements (why?) and hence is denumerable. Let $(r(n))$ be a one-to-one sequence with range $P$ and $\forall n \in \mathbb{N}_0$ set

\[ A_n = \{x : x \in A \,\&\, r(n) \le x\}. \]

Each set $A_n$ is nonvoid and we put

\[ x(n) = \inf A_n. \]

We claim that the range of the sequence $(x(n))$ is infinite, and therefore by Theorem 1.6.4 is denumerable. Indeed, suppose the range of this sequence has $k$ elements $y_1, \ldots, y_k$. By the use of the principle of induction we may suppose these elements are labeled so that $y_1 < y_2 < \cdots < y_k$. If we put $y_0 = m$, $y_{k+1} = M$, then the sets

\[ B_j = \{x : x \in A \,\&\, y_j < x < y_{j+1}\} \]

must be void for $0 \le j \le k$. But this means that $A$ can have at most $k + 2$ elements, which is a contradiction.

If $\exists n$ so that $x(n) \notin A$, then $x(n)$ is an accumulation point of $A$. On the other hand, if $\forall n \in \mathbb{N}_0$, $x(n) \in A$, then $A$ contains a denumerable set and we get the existence of an accumulation point from the first part of the proof.
If $\mathcal{A}$ is the conjunction of the statements (a) through (i) given in the description for $R$ in Section 1.8, then an examination of the proofs of this section shows that we have the following chain of implications:

$$\mathcal{A} \ \&\ (j) \Rightarrow \mathcal{A} \ \&\ \text{Theorem 1.9.3} \Rightarrow \mathcal{A} \ \&\ \text{Theorem 1.9.5} \Rightarrow \mathcal{A} \ \&\ \text{Theorem 1.9.7}.$$

[Incidentally, the reason we did not use Theorem 1.9.3 directly in the proof of Theorem 1.9.7(a) in taking $a$ as the limit of the monotone sequence $(y(n))$ is that we wanted to establish the above chain of implications.] If we can show that $\mathcal{A}$ & Theorem 1.9.7 $\Rightarrow$ $\mathcal{A}$ & (j), then all these statements are equivalent and any one of the statements of Theorems 1.9.3, 1.9.5, or 1.9.7 can be used in place of (j).
1.9.8 Theorem. $\mathcal{A}$ & Theorem 1.9.7 $\Rightarrow$ $\mathcal{A}$ & (j).

Proof. Let $(x(n))$ be a real Cauchy sequence. We distinguish two cases.
(a) The range of $(x(n))$ is finite. In this case $\exists N$ so that $n, m \geq N \Rightarrow x(n) = x(m)$. Indeed, if this is not the case, $\forall k$, $\exists n_k \geq k$ & $\exists m_k \geq k$ such that $|x(n_k) - x(m_k)| = \delta_k > 0$. By hypothesis, it is immediate that the collection of numbers $\{\delta_k\}$ is finite. Let $\delta$ be the minimum of these numbers and $\varepsilon = \delta/2$. Since $(x(n))$ is Cauchy, $\exists L$ such that $k \geq L \Rightarrow |x(n_k) - x(m_k)| < \varepsilon$, which is a contradiction. Take $a = x(n)$ for $n \geq N$, and this is clearly the limit of $(x(n))$.
(b) The range of $(x(n))$ is infinite. Let $a$ be an accumulation point of the range of $(x(n))$, which exists by Theorem 1.9.7. Let $\varepsilon > 0$. Since $(x(n))$ is Cauchy, $\exists N$ so that $n, m \geq N \Rightarrow |x(n) - x(m)| < \varepsilon/2$. Now $a$ is an accumulation point of the set $\{x(n) : n \geq N\}$. Therefore, $\exists n_1 \geq N$ so that $|x(n_1) - a| < \varepsilon/2$. Hence, for any $n \geq N$,

$$|x(n) - a| \leq |x(n_1) - a| + |x(n) - x(n_1)| < \varepsilon.$$

This completes the proof.


References

Cohen, Leon W., and Gertrude Ehrlich, The Structure of the Real Number System, D. Van Nostrand Company, Inc., Princeton, N.J., 1963.

Hamilton, Norman, and Joseph Landin, Set Theory and the Structure of Arithmetic, Allyn and Bacon, Inc., Boston, 1961.



Exercises

1. If $A$ and $B$ are bounded subsets of $R$ and $A \subset B$, show that

$$\text{l.u.b. } A \leq \text{l.u.b. } B, \qquad \text{g.l.b. } B \leq \text{g.l.b. } A.$$

2. If $A$ and $B$ are bounded subsets of $R$, show that

$$\sup A \cup B = \max(\sup A, \sup B), \qquad \inf A \cup B = \min(\inf A, \inf B).$$

3. If $A \subset R$ and $A$ has an upper bound that belongs to $A$, show that this upper bound must be $\sup A$.

4. If a set $A$ has a l.u.b. that does not belong to $A$, show that this l.u.b. is an accumulation point of $A$.

5. If $A$ is a denumerable set in $R$ and $a$ is an accumulation point of $A$, show that there is a sequence in $A$ which converges to $a$. If you use the axiom of induction, be sure you use it carefully and correctly.

6. (a) Show that the l.u.b. of the range of a bounded monotone nondecreasing sequence is the limit of the sequence.
(b) Show that if a subsequence of a monotone nondecreasing sequence is bounded, then the sequence is bounded.

7. Show that the sequence defined by the following expression is monotone increasing and bounded:

$$a(n) = \frac{3}{2} - \frac{2n+3}{2(n+1)}, \qquad n \in N_0.$$

8. Assuming the binomial theorem as known, show that the sequence defined as follows is monotone increasing and bounded:

$$a(n) = \left(1 + \frac{1}{n+1}\right)^{n+1}, \qquad n \in N_0.$$

The reader may recognize that the limit of this sequence is designated by 'e'.

9. Show that $\forall a \geq 0$ and $\forall n \in N$, there is a unique $y \geq 0$ so that $y^n = a$. Designate this unique $y$ by '$a^{1/n}$' and show that if $0 \leq a < b$, then $a^{1/n} < b^{1/n}$, and conversely.

10. Prove that $\lim_{n \to \infty} n^{1/n} = 1$. [Hint: Set $n^{1/n} = 1 + a_n$ and hence $n = (1 + a_n)^n \geq [n(n-1)/2!]\, a_n^2$ for $n \geq 2$.]

11. Show that $\lim_{n \to \infty} n!/n^n = 0$.

12. What is the set of all accumulation points of the subset of the rationals of the form

$n, p, q \in N$?

CHAPTER 2

LIMITS

We have already discussed the concept of function in Section 1.3. In


this and in the next few chapters we shall be exclusively interested in
those functions which have their domains and ranges in the real number
system. A real sequence is an example of such a function.
To discuss the properties of real-valued functions, it is convenient
to introduce some notation and terminology for certain sets of real
numbers. We shall define

$$]a,b[ \, = \{x : a < x < b\}, \qquad [a,b] = \{x : a \leq x \leq b\},$$
$$[a,b[ \, = \{x : a \leq x < b\}, \qquad ]a,b] = \{x : a < x \leq b\}.$$

The sets $[a,b]$ and $]a,b[$ will be called closed and open intervals, respectively. The sets $]a,b]$ and $[a,b[$ will be called half-open intervals. Note that any of these could be the null set.
We shall also define

$$[a,\infty[ \, = \{x : a \leq x\}, \qquad ]a,\infty[ \, = \{x : a < x\},$$
$$]-\infty,a] = \{x : x \leq a\}, \qquad ]-\infty,a[ \, = \{x : x < a\}.$$

An open interval containing the point $x$ will often be denoted by '$I(x)$' and it may sometimes be referred to as a neighborhood of $x$. When we write '$I(\infty)$' or '$I(-\infty)$' we shall mean a set of the form $]a,\infty[$ or $]-\infty,a[$, respectively.

2.1

THE LIMIT CONCEPT AND CONTINUITY

In the last chapter we introduced the idea of the limit of a sequence. We shall now generalize this concept to the situation where we are dealing with arbitrary real-valued functions.

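2.1.1 Definition. The number $l$ is said to be the limit of the function $f$ at $a$ $\Leftrightarrow$ $a$ is an accumulation point of $\mathcal{D}(f)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $x \in \mathcal{D}(f) \setminus \{a\}$ and $|x - a| < \delta \Rightarrow |f(x) - l| < \varepsilon$.
If $l$ is the limit of $f$ at $a$, we write

$$\lim_{x \to a} f(x) = l \qquad \text{or} \qquad f(x) \to l \text{ as } x \to a.$$

In case $\mathcal{D}(f)$ is not bounded above, we write $\lim_{x \to \infty} f(x) = l$ $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists M$ so that $x \geq M$ and $x \in \mathcal{D}(f)$ implies $|f(x) - l| < \varepsilon$. Note that the latter statement includes the definition of the limit of a sequence. If $\mathcal{D}(f)$ is not bounded below, we have a corresponding definition for $\lim_{x \to -\infty} f(x) = l$.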
An equivalent way of phrasing the definition of the limit of a function at a point is as follows: The number $l$ is a limit of the function $f$ at $a$ $\Leftrightarrow$ $a$ is an accumulation point of $\mathcal{D}(f)$ and $\forall I(l)$, $\exists I(a)$ such that $x \in I(a) \cap \mathcal{D}(f) \setminus \{a\} \Rightarrow f(x) \in I(l)$. We leave for the reader the formulation when $a = \infty$ or $a = -\infty$.
In dealing with the limit of a function $f$ at $a$ it is very natural and often very convenient to want to conclude that $\lim_{x \to a} f(x) = l$ $\Leftrightarrow$ $a$ is an accumulation point of $\mathcal{D}(f)$ and for every sequence $(x_n)$ for which $x_n \to a$, $f(x_n) \to l$. One implication is very easy to prove: If $f(x) \to l$ as $x \to a$ and $x_n \to a$, then $f(x_n) \to l$. Indeed, $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $|x - a| < \delta$ and $x \in \mathcal{D}(f) \setminus \{a\} \Rightarrow |f(x) - l| < \varepsilon$. On the other hand, $\exists N$ so that $n \geq N \Rightarrow |x_n - a| < \delta$. Hence $n \geq N \Rightarrow |f(x_n) - l| < \varepsilon$. The last statement is precisely the fact that $f(x_n) \to l$.
However, it is not always so clear how to prove the converse implication. We have the hypothesis that $a$ is an accumulation point of $\mathcal{D}(f)$ and for every $(x_n)$ so that $x_n \to a$, $f(x_n) \to l$. We want to conclude that $f(x) \to l$ as $x \to a$. Let us see how we could proceed. Suppose it is not true that $f(x) \to l$ as $x \to a$. Then we must negate the statement:

$$(\varepsilon > 0)(\exists \delta > 0)(x)(0 < |x - a| < \delta \ \&\ x \in \mathcal{D}(f) \Rightarrow |f(x) - l| < \varepsilon).$$

Negating this statement leads to the statement:

$$(\exists \varepsilon > 0)(\delta > 0)(\exists x)(0 < |x - a| < \delta \ \&\ x \in \mathcal{D}(f) \ \&\ |f(x) - l| \geq \varepsilon).$$

(Note that, for the sake of brevity, we have shortened the symbolism of the predicate calculus by putting restrictions on the quantified variables together with the quantification symbols.) Let $\varepsilon_0$ be a number for which the last statement is true. For $n \in N_0$, let $\delta_n = 1/(n + 1)$ and let

$$A_n = \{x : 0 < |x - a| < \delta_n \ \&\ x \in \mathcal{D}(f) \ \&\ |f(x) - l| \geq \varepsilon_0\}. \qquad (2.1.1)$$

The set $A_n$ is not void. Now $\forall n \in N_0$, choose $x_n \in A_n$. Clearly $x_n \to a$, but $|f(x_n) - l| \geq \varepsilon_0$ for all $n \in N_0$. This last statement contradicts the hypothesis.
The sticking point in the last argument is the question of whether it is possible to prove that there exists a sequence $(x_n)$ with $x_n \in A_n$ using only the axioms that prescribe the use of $\in$ and the axioms we gave for $N$. Now, it is a consequence of the axioms that prescribe the use of the symbol '$\in$' that for every nonvoid set $A$ there is a function $f$ so that $f(A) \in A$. Using this result and the axiom of induction, it is not hard to show that $\forall n \in N_0$ there exists a function $\Phi_n$ with domain $(0, n)$ so that $\Phi_n(k) \in A_k$. (See page 199, Exercise 2, in the reference to Mendelson cited at the end of Section 1.1.) However, it is highly unlikely that there is a canonical (unique) way of prescribing the $\Phi_n$, and hence there is no way of patching them together as we did in the proof of Theorem 1.6.4 to get a function $\Phi$, with domain $N_0$, so that $\Phi(n) \in A_n$.
We know that many readers may possibly be quite chagrined by our suggestion that we don't know how to construct a sequence of the type needed above. After all, it is "intuitively obvious" that this can be done. However, strictly speaking, we have shown the existence of a function only if we can proceed from the axioms by means of the predicate calculus, and prove its existence. At this point the reader must take our word for it that the axioms of set theory that can be used to lead to the real number system do not include the justification for picking one element from each set of an infinite number of nonvoid sets. Indeed, this "picking" process, even for a denumerable number of nonvoid sets, is quite independent of the axiom of induction, unless the sets have some special property.
The way out of the dilemma is simply to institute another axiom so that we can construct the sequence we needed in the previous discussion and for which the possibility of construction seems so intuitively reasonable. This axiom is called the axiom of choice.
(AC) For every collection $\mathcal{A}$ of nonvoid sets there exists a function $f$ with domain $\mathcal{A}$ so that for every $A \in \mathcal{A}$, $f(A) \in A$.
Whether or not we are justified in assuming such an axiom may be discussed along the same lines as to whether we are justified in assuming any set of axioms. However, the axiom (AC) seems to have a special role, since the real number system can be obtained without its use and one naturally wonders why it is needed to develop a calculus based on the real number system. Actually we should point out that we could develop most of the topics of this book without ever mentioning (AC). The difficulty usually arises when one tries to prove the equivalence of a "sequential" type of criterion with another type of criterion, as in the case of the limit of a function at a point. Indeed the use of (AC) will arise at only a very few points which if avoided would not really affect the usefulness of the calculus. However, since the "sequential" criteria seem so embedded into classical analysis, we felt it was best if we pointed them out at the appropriate places. When we write '(AC)' before a statement it means we are using the axiom of choice in its proof. It does not necessarily mean that the statement cannot be proved without this axiom but only that the author is not aware of how to do so.
During the course of the preceding discussion we have proved the following.

2.1.2 Proposition. If $f(x) \to l$ as $x \to a$, then for every sequence $(x_n)$ with range in $\mathcal{D}(f)$ such that $x_n \to a$, it follows that $f(x_n) \to l$. (AC) Conversely, if $a$ is an accumulation point of $\mathcal{D}(f)$, and for every $(x_n)$ with range in $\mathcal{D}(f)$, if $x_n \to a$ implies $f(x_n) \to l$, then $f(x) \to l$ as $x \to a$.
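The sequential criterion is often used in the negative direction: if two sequences tending to $a$ give different limiting values of $f$, then $f$ has no limit at $a$. The following is a minimal numerical sketch of that use, under our own assumptions: the function $s(x) = \sin(1/x)$ and the two sequences `u` and `v` are chosen here purely for illustration, and floating-point values only approximate the exact ones.

```python
import math

def s(x: float) -> float:
    """Illustrative function s(x) = sin(1/x), defined for x != 0."""
    return math.sin(1.0 / x)

def u(n: int) -> float:
    """First illustrative sequence: u(n) = 1/(n*pi), which tends to 0."""
    return 1.0 / (n * math.pi)

def v(n: int) -> float:
    """Second illustrative sequence: v(n) = 1/(pi/2 + 2*n*pi), which tends to 0."""
    return 1.0 / (math.pi / 2 + 2 * n * math.pi)

# Along u the values of s stay near 0; along v they stay near 1.
values_u = [s(u(n)) for n in range(1, 51)]
values_v = [s(v(n)) for n in range(1, 51)]
```

Since both sequences tend to 0 while the values of $s$ along them approach different numbers, the first half of the proposition shows that $s$ cannot have a limit at 0.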


As in the case of sequences, if each of two functions has a limit at a


given point so do their sum and product. For the purpose of proving
this result, we first introduce a formal definition.

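2.1.3 Definition. If $f$ and $g$ are (real-valued) functions, then
(a) $f + g$ is that function with domain $\mathcal{D}(f + g) = \{x : x \in \mathcal{D}(f) \cap \mathcal{D}(g)\}$ and so that $\forall x \in \mathcal{D}(f + g)$, $(f + g)(x) = f(x) + g(x)$.
(b) $fg$ is that function with domain $\mathcal{D}(fg) = \{x : x \in \mathcal{D}(f) \cap \mathcal{D}(g)\}$ and so that $\forall x \in \mathcal{D}(fg)$, $(fg)(x) = f(x)g(x)$.
(c) $f/g$ is that function with domain $\mathcal{D}(f/g) = \{x : x \in \mathcal{D}(f) \cap \mathcal{D}(g) \ \&\ g(x) \neq 0\}$ and so that $\forall x \in \mathcal{D}(f/g)$, $(f/g)(x) = f(x)/g(x)$.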

2.1.4 Proposition. If $f$ and $g$ are functions, $a$ is an accumulation point of $\mathcal{D}(f) \cap \mathcal{D}(g)$, and $\lim_{x \to a} f(x)$ and $\lim_{x \to a} g(x)$ exist, then

(a) $\lim_{x \to a} (f + g)(x) = \lim_{x \to a} f(x) + \lim_{x \to a} g(x)$;

(b) $\lim_{x \to a} (fg)(x) = \lim_{x \to a} f(x) \, \lim_{x \to a} g(x)$;

(c) and if $\lim_{x \to a} g(x) \neq 0$, then $\lim_{x \to a} (f/g)(x) = \lim_{x \to a} f(x) / \lim_{x \to a} g(x)$.

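Proof. (a) Let us set $l = \lim_{x \to a} f(x)$, $m = \lim_{x \to a} g(x)$. Then $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $|x - a| < \delta$ and $x \in \mathcal{D}(f) \cap \mathcal{D}(g) \setminus \{a\} \Rightarrow$

$$|f(x) - l| < \varepsilon/2 \qquad \text{and} \qquad |g(x) - m| < \varepsilon/2.$$

Hence for these same $x$ we have

$$|(f + g)(x) - (l + m)| \leq |f(x) - l| + |g(x) - m| < \varepsilon.$$

(b) Take $l$ and $m$ as in part (a). Then $\exists \delta_1 > 0$ so that $|x - a| < \delta_1$ and $x \in \mathcal{D}(f) \setminus \{a\} \Rightarrow$

$$|f(x)| - |l| \leq |f(x) - l| < 1;$$

that is,

$$|f(x)| < 1 + |l|.$$

Also, $\forall \varepsilon > 0$, $\exists \delta_2 > 0$ so that $|x - a| < \delta_2$ and $x \in \mathcal{D}(f) \cap \mathcal{D}(g) \setminus \{a\} \Rightarrow$

$$|f(x) - l| < \frac{\varepsilon}{2(1 + |m|)} \qquad \text{and} \qquad |g(x) - m| < \frac{\varepsilon}{2(1 + |l|)}.$$

Take $\delta = \min(\delta_1, \delta_2)$; then for $|x - a| < \delta$ and $x \in \mathcal{D}(f) \cap \mathcal{D}(g) \setminus \{a\}$ we get

$$|(fg)(x) - lm| = |f(x)g(x) - f(x)m + f(x)m - lm| \leq |f(x)|\,|g(x) - m| + |m|\,|f(x) - l| < \varepsilon.$$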

(c) Again, take $m$ as in part (a). Then $\exists \delta_1 > 0$ so that $x \in \mathcal{D}(g)$ and $0 < |x - a| < \delta_1 \Rightarrow$

$$|m| - |g(x)| \leq |g(x) - m| < |m|/2.$$

Thus $|g(x)| > |m|/2$, and hence $]a - \delta_1, a + \delta_1[ \, \cap \mathcal{D}(g) \subset \mathcal{D}(1/g)$. Now, $\forall \varepsilon > 0$, $\exists \delta$ so that $0 < \delta < \delta_1$ and $x \in \mathcal{D}(g)$ with $0 < |x - a| < \delta \Rightarrow$

$$|g(x) - m| < \varepsilon m^2/2.$$

Thus $x \in \mathcal{D}(1/g)$ and $0 < |x - a| < \delta \Rightarrow$

$$\left|\frac{1}{g(x)} - \frac{1}{m}\right| = \frac{|g(x) - m|}{|m\,g(x)|} \leq \frac{2}{m^2}\,|g(x) - m| < \varepsilon.$$

This shows that

$$\lim_{x \to a} (1/g)(x) = 1/m.$$

The proof of part (c) is completed by applying part (b) to the product function $f(1/g)$.
It is by no means always obvious whether or not a function has a limit at a point; and even if we know that it has a limit at a point, the value of the limit may not be too easy to establish. A standard example that shows this is the function defined by

$$s(x) = \frac{\sin x}{x}, \qquad x \neq 0.$$

We shall see later that

$$\lim_{x \to 0} s(x) = 1.$$
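A numerical table does not prove anything about a limit, but it can make the claim plausible. The sketch below, which is our own illustration and not part of the later proof, evaluates $s$ at the points $x = 10^{-k}$ and records how far each value is from 1; the name `gaps` is ours.

```python
import math

def s(x: float) -> float:
    """s(x) = sin(x)/x for x != 0."""
    return math.sin(x) / x

# Distance of s(x) from 1 at x = 10^-1, 10^-2, ..., 10^-5.
gaps = [abs(s(10.0 ** -k) - 1.0) for k in range(1, 6)]
```

The recorded distances shrink roughly like $x^2/6$, which is consistent with the value 1 for the limit.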

Closely connected with the concept of the limit of a function at a point is the concept of continuity of a function at a point.

2.1.5 Definition. A function $f$ is said to be continuous at the point $a$ $\Leftrightarrow$ $a \in \mathcal{D}(f)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ such that $|x - a| < \delta$ and $x \in \mathcal{D}(f) \Rightarrow |f(x) - f(a)| < \varepsilon$. If $f$ is continuous at every point of its domain, we say $f$ is a continuous function.

Note that according to this definition, any $a \in \mathcal{D}(f)$ that is not an accumulation point of this set is a point of continuity of $f$. In case $a$ is an accumulation point of $\mathcal{D}(f)$, then for $f$ to be continuous at $a$, it is necessary and sufficient that

$$\lim_{x \to a} f(x) = f(a).$$

2.1.6 Proposition. If a function $f$ is continuous at $a$, then $a \in \mathcal{D}(f)$ and for every sequence $(x_n)$ with range in $\mathcal{D}(f)$ and $x_n \to a$, we have $f(x_n) \to f(a)$. (AC) Conversely, if $a \in \mathcal{D}(f)$, and for every sequence $(x_n)$ with range in $\mathcal{D}(f)$, if $x_n \to a$ implies $f(x_n) \to f(a)$, then $f$ is continuous at $a$.
,

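Proof. If $f$ is continuous at $a$, then $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $|x - a| < \delta$ and $x \in \mathcal{D}(f) \Rightarrow |f(x) - f(a)| < \varepsilon$. Also $\exists N$ so that $n \geq N \Rightarrow |x_n - a| < \delta$. Hence $n \geq N \Rightarrow |f(x_n) - f(a)| < \varepsilon$, which is the proof of the first sentence.
To prove the second statement, let us assume to the contrary that it is not true. Then $\exists \varepsilon_0 > 0$ so that $\forall \delta > 0$ there exists an $x \in \mathcal{D}(f)$ so that $|x - a| < \delta$ and $|f(x) - f(a)| \geq \varepsilon_0$. For $n \in N_0$, let $\delta_n = 1/(n + 1)$ and $A_n = \{x : x \in \mathcal{D}(f) \ \&\ |x - a| < \delta_n \ \&\ |f(x) - f(a)| \geq \varepsilon_0\}$. Each set $A_n$ is nonvoid and hence by (AC) there exists a sequence $(x_n)$ so that $x_n \in A_n$. Clearly $x_n \to a$, but since $\forall n \in N_0$, $|f(x_n) - f(a)| \geq \varepsilon_0$, we get a contradiction.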

2.1.7 Theorem. If $f$ and $g$ are continuous at $a \in \mathcal{D}(f) \cap \mathcal{D}(g)$, then $f + g$ and $fg$ are continuous at $a$, and if $g(a) \neq 0$, $f/g$ is also continuous at $a$.

Proof. In case $a$ is an accumulation point of the domain common to $f$ and $g$, then the theorem is an immediate consequence of Proposition 2.1.4 and the remark made prior to Proposition 2.1.6. In case $a$ is not an accumulation point of the common domain, all the functions listed are automatically continuous.

2.1.8 Theorem. If $f$ and $g$ are functions, if $g$ is continuous at $a$ and $f$ is continuous at $b = g(a)$, then $f \circ g$ is continuous at $a$. (See Definition 1.3.5 for $f \circ g$.)

Proof. Since $f$ is continuous at $b$, $\forall \varepsilon > 0$, $\exists \eta > 0$ so that $|y - b| < \eta$ and $y \in \mathcal{D}(f) \Rightarrow |f(y) - f(b)| < \varepsilon$. Also, since $g$ is continuous at $a$, $\exists \delta > 0$ so that $|x - a| < \delta$ and $x \in \mathcal{D}(g) \Rightarrow |g(x) - g(a)| < \eta$. Hence

$$|f \circ g(x) - f \circ g(a)| < \varepsilon$$

provided $|x - a| < \delta$ and $x \in \mathcal{D}(f \circ g)$. This is precisely the meaning of continuity of $f \circ g$ at $a$.

We shall now give a simple example to show that, in general, the $\delta$ of Definition 2.1.5 depends in an essential way on both $\varepsilon$ and $a$. Let $f$ be the function with domain $]0, 1]$ given by the equation

$$f(x) = \frac{1}{x}.$$

Since it is almost trivial to prove that the function defined by $g(x) = x$ is continuous, the fact that $f$ is continuous at every point of $]0, 1]$ follows from Theorem 2.1.7.
Let us take $\varepsilon > 0$ and $a \in \, ]0, 1]$. We know that $\exists \delta(\varepsilon, a)$, depending on $\varepsilon$ and $a$, so that $|x - a| < \delta$ and $x \in \, ]0, 1] \Rightarrow |f(x) - f(a)| < \varepsilon$. Therefore,

$$\left|\frac{1}{x} - \frac{1}{a}\right| = \frac{|x - a|}{xa} < \varepsilon,$$

and this in turn implies that

$$|x - a| < \varepsilon x a \leq \varepsilon a,$$

since $x \leq 1$. Hence for fixed $\varepsilon$, as $a$ goes to zero we see that $\delta(\varepsilon, a)$ must go to zero for $f(x)$ to be within $\varepsilon$ of $f(a)$. Indeed, this also shows that if $a$ is fixed and $\varepsilon$ goes to zero, then $\delta(\varepsilon, a)$ must also go to zero.
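For this particular function the dependence of $\delta$ on $\varepsilon$ and $a$ can be made explicit. Solving $|1/x - 1/a| < \varepsilon$ for $x < a$ gives $x > a/(1 + \varepsilon a)$, and the constraint for $x > a$ is weaker, so the largest admissible $\delta$ is $\varepsilon a^2/(1 + \varepsilon a)$. The sketch below simply evaluates this formula; the function name `largest_delta` is ours and the computation is only an illustration of the remark above.

```python
def largest_delta(eps: float, a: float) -> float:
    """Largest delta such that |x - a| < delta and x in ]0, 1]
    imply |1/x - 1/a| < eps, for the function f(x) = 1/x.

    The binding constraint is on the left of a, where
    |1/x - 1/a| = eps exactly at x = a / (1 + eps * a).
    """
    if not (0.0 < a <= 1.0) or eps <= 0.0:
        raise ValueError("need 0 < a <= 1 and eps > 0")
    return eps * a * a / (1.0 + eps * a)

# For fixed eps the admissible delta shrinks as a approaches 0.
deltas = [largest_delta(0.1, a) for a in (1.0, 0.1, 0.01)]
```

Note that the value returned is always smaller than $\varepsilon a$, in agreement with the inequality obtained in the text.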

Exercises

1. State and prove a result like Proposition 2.1.4 when $\mathcal{D}(f)$ and $\mathcal{D}(g)$ are unbounded above and $a = \infty$.

2. If $\lim_{x \to a} f(x) = l$, show that we are justified in calling $l$ the limit of $f(x)$ at $a$; that is, show that if $l$ exists it must be unique.

3. Let $s$ be the function with domain $R \setminus \{0\}$ defined by

$$s(x) = \sin(1/x).$$

Show that $s$ does not have a limit at $x = 0$.

4. A polynomial function of degree $n$ is a function with domain $R$ given by

$$f(x) = a_0 + a_1 x + \cdots + a_n x^n, \qquad a_n \neq 0.$$

Show that every polynomial function is continuous.

5. Suppose $f$ is a function with domain $R$ so that

$$f(x + y) = f(x)f(y), \qquad \forall x, y \in R.$$

If $f$ is continuous at $a$ show that $f$ is continuous. Further, if $0 \in \mathcal{R}(f)$, then $\{0\} = \mathcal{R}(f)$.

6. Suppose $f$ is a function with $\mathcal{D}(f) = R$ and $\exists a \in R$ so that $f$ is continuous at $a$. Show that if for every $x$ and $y$ in $R$,

$$f(x + y) = f(x) + f(y),$$

then $\exists m \in R$ so that $f(x) = mx$.

7. If $f$ is a function, then $|f|$ shall be that function with $\mathcal{D}(|f|) = \mathcal{D}(f)$ and $\forall x \in \mathcal{D}(f)$, $|f|(x) = |f(x)|$. If $f$ is continuous at $a$, use Theorem 2.1.8 to show that $|f|$ is continuous at $a$. Is the converse true?

8. Let $f$ be a function with domain $R$ defined by

$$f(x) = \begin{cases} 1 & (x \text{ is rational}), \\ 0 & (x \text{ is irrational}). \end{cases}$$

At what points is $f$ continuous and at what points does the limit of $f$ exist?
9. If $a \geq 0$, then the results of Exercise 9 of Section 1.9 show that $\forall n \in N$ there is a unique nonnegative number $a^{1/n}$ so that $(a^{1/n})^n = a$. If $f$ is a function having domain the nonnegative reals and whose value at each $x$ of its domain is $x^{1/2}$, show that $f$ is continuous. Do the same for that function whose value at each nonnegative $x$ is $x^{1/n}$, $n \in N$.

10. Suppose that $f_1$ and $f_2$ are functions with a common domain and $f$ is that function having the same domain and defined by

$$f(x) = \max(f_1(x), f_2(x)).$$

If $f_1$ and $f_2$ are continuous show that $f$ is continuous. Extend this result to the case of $n$ functions.

11. Give an example which shows that $fg$ may be continuous but $f$ and $g$ are not continuous. Do the same for $f \circ g$.

12. Assume the following properties for the sine and cosine functions:

$$|\sin(x)| \leq |x|, \qquad |\cos(x)| \leq 1, \qquad \forall x \in R;$$
$$\sin(x) - \sin(y) = 2 \sin\left(\frac{x - y}{2}\right) \cos\left(\frac{x + y}{2}\right), \qquad \forall x, y \in R.$$

Prove that the sine function is continuous.

13. If $f$ is a function continuous at $a$ and $f(a) > 0$, show that $\exists h > 0$ so that for $x \in \{x : |x - a| < h\} \cap \mathcal{D}(f)$, $f(x) > 0$.

14. A positive rational number $p/q$ is said to be in lowest terms if there is no equal [equivalent in the terminology of 1.5] $p_1/q_1$ so that $0 < p_1 < p$, $0 < q_1 < q$. Let $f$ be that function with domain $[0, 1]$ defined as follows:

$$f(x) = \begin{cases} 1/q & (x = p/q \text{ in lowest terms}), \\ 0 & (x \text{ is irrational}). \end{cases}$$

Show that $f$ is continuous only at the irrational numbers.


2.2

THE HEINE-BOREL THEOREM AND UNIFORM CONTINUITY

At the end of Section 2.1 we gave a simple example which showed that for a continuous function the $\delta$ of Definition 2.1.5 depended in an essential way on both $\varepsilon$ and the given point of the domain of the function. In 1872 E. Heine showed that if the domain of the function was a closed bounded interval, then $\delta$ depended only on $\varepsilon$ and not on the point chosen in the interval. In 1895 E. Borel distilled out the essence of Heine's proof and stated a certain property about closed bounded intervals from which Heine's theorem followed as an immediate corollary. The purpose of this section is to prove the Heine-Borel theorem and use it to prove Heine's theorem on continuous functions. We begin with a number of definitions.

2.2.1 Definition. A set in $R$ is said to be closed $\Leftrightarrow$ it contains all its accumulation points. The closure $\bar{A}$ of any set $A$ is $A$ together with all its accumulation points.
A set in $R$ is said to be compact $\Leftrightarrow$ it is closed and bounded.

2.2.2 Definition. A set $U \subset R$ is said to be open $\Leftrightarrow$ $\forall x \in U$ there is an open interval $I(x) \subset U$.

Recall that $x \in I(x)$ so that if $U$ is open we have $U = \bigcup \{I(x) : x \in U\}$.

In terms of open sets the definition of continuity at a point is as follows: A function $f$ is continuous at $a$ $\Leftrightarrow$ $a \in \mathcal{D}(f)$ and for every open $U$ containing $f(a)$ there exists an open $V$ containing $a$ so that $f(V \cap \mathcal{D}(f)) \subset U$.
2.2.3 Proposition. The complement of a closed set is open and the complement of an open set is closed.

Proof. Suppose $A$ is closed. Then $\forall x \in A^c$, the complement of $A$, $\exists I(x) \subset A^c$. Otherwise $x$ is an accumulation point of $A$ and would belong to $A$.
Conversely, if $A$ is open, then $\forall x$ which is an accumulation point of $A^c$ and $\forall I(x)$, $I(x) \cap A^c \neq \emptyset$. Hence $x \in A^c$, since otherwise $x \in A$, and $\exists I(x)$ so that $I(x) \cap A^c = \emptyset$.

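2.2.4 Definition. A collection of sets $\mathcal{U}$ is said to be a covering for a set $A \subset R$ $\Leftrightarrow$

$$A \subset \bigcup \{U : U \in \mathcal{U}\}.$$

The collection $\mathcal{U}$ is said to be an open covering for $A$ $\Leftrightarrow$ every $U \in \mathcal{U}$ is open, and $\mathcal{U}$ is a covering for $A$.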
2.2.5 Theorem (Heine-Borel). A set $A \subset R$ is compact $\Leftrightarrow$ in every open covering for $A$ there exists a finite number of sets that cover $A$.

Proof. Suppose first that $A$ is a closed bounded interval $[a, b]$. Let $\mathcal{U}$ be an open covering for $A$ and let $E$ be the collection of all $x$ in $A$ so that $[a, x]$ can be covered by a finite number of elements of $\mathcal{U}$. Set $\xi = \sup E$ and suppose $U_0 \in \mathcal{U}$ so that $\xi \in U_0$, and $I(\xi)$ is an open interval about $\xi$ so that $I(\xi) \subset U_0$. By the definition of $\sup E$ we must have $I(\xi) \cap E \neq \emptyset$. Let $x \in I(\xi) \cap E$ and $\{U_k : k \in (1, n)\} \subset \mathcal{U}$ be an open covering for $[a, x]$; then $\{U_k : k \in (0, n)\}$ is an open covering for $[a, \xi]$. Hence $\xi \in E$ and if $\xi \neq b$ we get a contradiction. This shows that $[a, b]$ can be covered by a finite number of elements from $\mathcal{U}$.
Now let us suppose that $A$ is any closed and bounded set in $R$. Let $I$ be a closed (and bounded) interval so that $A \subset I$ and set $U_0 = A^c$. Then $U_0$ is open and $\mathcal{U} \cup \{U_0\}$ is an open covering for $I$. This reduces to a finite subcovering $\{U_k : k \in (0, n)\}$ of $I$, and since $U_0 \cap A = \emptyset$, $\{U_k : k \in (1, n)\}$ must cover $A$.
To prove the converse implication we suppose that every open covering of $A$ reduces to a finite subcovering. For every $n \in N$, let $I_n = \, ]-n, n[$. The collection $\{I_n : n \in N\}$ of open intervals covers $R$ and hence $A$. Hence there exists a finite set $\{I_{n_j} : j \in (1, k)\}$ that covers $A$. Let $m = \max\{n_j : j \in (1, k)\}$; then clearly $A \subset I_m$, which means it is bounded.
To show $A$ is closed, let $a \in A^c$. Then $\forall x \in A$, $|x - a| = \delta(x) > 0$. Hence, if $\forall n \in N$, $U_n = \{x : |x - a| > 1/n\}$, it follows that $\mathcal{U} = \{U_n : n \in N\}$ is an open covering for $A$, since $\forall \delta(x)$, $\exists n \in N$ so that $1/n < \delta(x)$. By hypothesis, this reduces to a finite subcovering $\{U_{n_j} : j \in (1, k)\}$. Let $m = \max\{n_j : j \in (1, k)\}$; then, since $\forall x \in A$, $\exists n_j$ so that $x \in U_{n_j}$, it follows that $\{x : |x - a| < 1/m\}$ is completely in $A^c$. This means $A^c$ is open and therefore $A$ is closed.

Many people like to use a "sequential" version of compactness. The equivalence with the definition of compactness given in Definition 2.2.1 seems to involve the use of (AC).

2.2.6 Proposition. If a set $A \subset R$ is compact, then every sequence with range in $A$ has a subsequence that converges to a point in $A$. (AC) Conversely, if every sequence with range in $A$ has a subsequence that converges to a point in $A$, then $A$ is compact.

Proof. Suppose $A$ is compact and $(x_n)$ is a sequence with range in $A$. If the range of $(x_n)$ is finite, then there is a subsequence whose range consists of only one element (proof?) and hence is convergent to an element of $A$. If the range of $(x_n)$ is infinite, then the Bolzano-Weierstrass theorem tells us that the range has an accumulation point $a$ which belongs to $A$, since $A$ is closed. Apply the result of Exercise 5 of Section 1.9, which shows the existence of a subsequence of $(x_n)$ that converges to $a$.
Conversely, suppose that every sequence with range in $A$ contains a subsequence which converges to a point in $A$. If $a$ is an accumulation point of $A$, then by use of (AC) we conclude that there is a sequence that converges to $a$. Indeed, for $n \in N_0$ let $A_n = \{x : 0 < |x - a| < 1/(n + 1)\} \cap A$. The sets $A_n$ are nonvoid and by (AC) we get a sequence $(x_n)$ with $x_n \in A_n$; clearly $x_n \to a$. By hypothesis there is a subsequence $(y_n)$ that converges to an element of $A$. But any subsequence of a convergent sequence converges to the same limit as the original sequence. Hence $a \in A$, which shows $A$ is closed.
To show $A$ is bounded let us assume the contrary. Suppose it is not bounded above. Now, $\forall n \in N_0$ let

$$A_n = A \cap [n, n + 1[.$$

There are an infinite number of the sets $A_n$ which are nonvoid. Otherwise $A$ is bounded above. Hence the collection of nonvoid $A_n$ is denumerable and there is a one-to-one function $\Phi$ that maps $N_0$ into itself so that the set $A_{\Phi(n)}$ is nonvoid. Clearly it is possible to choose $\Phi$ so that $m < n \Rightarrow \Phi(m) < \Phi(n)$ (proof?). Let us put $x_n = \inf A_{\Phi(n)}$. From the definition of $A_k$ and the fact that $\Phi$ is increasing, we get

$$\Phi(n) \leq x_n < \Phi(n) + 1 \leq \Phi(n + 1).$$

Since we have already shown $A$ is closed, $\forall n \in N_0$, $x_n \in A$. Now set

$$y_n = x_{2n+1}.$$

Then we have

$$y_{n+1} - y_n = x_{2n+3} - x_{2n+1} \geq \Phi(2n + 3) - \Phi(2n + 2) \geq 1,$$

and by induction it follows that if $m > n$,

$$y_m - y_n \geq 1.$$

Thus no subsequence of $(y_n)$ can be Cauchy and hence no subsequence can converge. This contradicts the original hypothesis, and shows that $A$ must be bounded above. In the same way, the assumption that $A$ is unbounded below would lead to a contradiction.
An immediate corollary of the Heine-Borel theorem is the following, which is often called the Cantor intersection theorem.

2.2.7 Theorem (Cantor). If $(A_n)$ is a sequence of nonvoid compact sets so that $\forall n \in N_0$, $A_{n+1} \subset A_n$, then

$$\bigcap \{A_n : n \in N_0\} \neq \emptyset.$$

Proof. Suppose to the contrary that this intersection is void. Then [see Exercise 10(b) of Section 1.2]

$$R = \left(\bigcap \{A_n : n \in N_0\}\right)^c = \bigcup \{A_n^c : n \in N_0\},$$

and since $\forall n \in N_0$, $A_n^c$ is open, the collection $\{A_n^c : n \in N_0\}$ is an open covering for $A_0$. Consequently, there is a finite set $\{A_{n_j}^c : j \in (1, k)\}$ that covers $A_0$. Let $m = \max\{n_j : j \in (1, k)\}$; then since $\forall n$, $A_n^c \subset A_{n+1}^c$, it follows that $A_m^c$ covers $A_0$. But this is impossible, since $A_m \subset A_0$, $A_m \neq \emptyset$, and $A_m \cap A_m^c = \emptyset$.


Let us now go on to establish Heine's theorem about continuous
functions.

2.2.8 Definition. A function $f$ is said to be uniformly continuous $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists \delta > 0$ so that if $|x - y| < \delta$ and $x, y \in \mathcal{D}(f)$, then $|f(x) - f(y)| < \varepsilon$.

In the very formal symbolism of the predicate calculus this would be written

$$(\varepsilon)\bigl(\varepsilon > 0 \Rightarrow (\exists \delta)\bigl(\delta > 0 \ \&\ (x)(y)(x, y \in \mathcal{D}(f) \ \&\ |x - y| < \delta \Rightarrow |f(x) - f(y)| < \varepsilon)\bigr)\bigr).$$

It is not true that every continuous function is uniformly continuous. We have already seen in Section 2.1 that the function defined by $f(x) = 1/x$ for $x \in \, ]0, 1]$ is not uniformly continuous. Another example is given by $f(x) = \sin(1/x)$, $x \in \, ]0, 1]$, and this function is even bounded.
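The failure of uniform continuity for $f(x) = 1/x$ on $]0, 1]$ can be exhibited by explicit pairs of points. The following sketch uses exact rational arithmetic and is our own illustration; the pair $x = 1/n$, $y = 1/(n+1)$ and the name `witness` are choices made here, not taken from the text.

```python
from fractions import Fraction

def f(x: Fraction) -> Fraction:
    """f(x) = 1/x on ]0, 1], computed exactly."""
    return 1 / x

def witness(n: int) -> tuple[Fraction, Fraction]:
    """Return (|x - y|, |f(x) - f(y)|) for x = 1/n and y = 1/(n + 1)."""
    x, y = Fraction(1, n), Fraction(1, n + 1)
    return abs(x - y), abs(f(x) - f(y))

# The points get arbitrarily close, yet the values of f stay 1 apart.
pairs = [witness(n) for n in range(1, 100)]
```

Since $|x - y| = 1/(n(n+1))$ can be made smaller than any $\delta$ while $|f(x) - f(y)| = 1$, no single $\delta$ can serve for $\varepsilon = 1$.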

2.2.9 Theorem (Heine). If $f$ is a continuous function and $\mathcal{D}(f)$ is compact, then $f$ is uniformly continuous.

Proof. $\forall x \in \mathcal{D}(f)$ and $\forall \varepsilon > 0$, $\exists \delta(\varepsilon, x) > 0$ so that if $|y - x| < \delta(\varepsilon, x)$ and $y \in \mathcal{D}(f)$, then $|f(y) - f(x)| < \varepsilon/2$. For every $x \in \mathcal{D}(f)$ set $I(x) = \{y : |y - x| < \delta(\varepsilon, x)/2\}$. The collection

$$\mathcal{U} = \{I(x) : x \in \mathcal{D}(f)\}$$

is an open covering for $\mathcal{D}(f)$ and by the compactness of $\mathcal{D}(f)$ reduces to a finite subcover. That is, there is a finite set of points $\{x_k : k \in (1, n)\} \subset \mathcal{D}(f)$ so that

$$\mathcal{D}(f) \subset \bigcup \{I(x_k) : k \in (1, n)\}.$$

Let us set $\delta = \min\{\delta(\varepsilon, x_k)/2 : k \in (1, n)\}$. Suppose $x, y \in \mathcal{D}(f)$ and $|x - y| < \delta$. Now, $\exists k \in (1, n)$ so that $x \in I(x_k)$. This implies that

$$|y - x_k| \leq |y - x| + |x - x_k| < \delta + \delta(\varepsilon, x_k)/2 \leq \delta(\varepsilon, x_k).$$

Hence

$$|f(y) - f(x)| \leq |f(y) - f(x_k)| + |f(x_k) - f(x)| < \varepsilon,$$

which proves that $f$ is uniformly continuous.

REMARK: In the proof of the last theorem, it might seem that it is necessary to use the axiom of choice in order to pick a $\delta(\varepsilon, x)$ for every $x \in \mathcal{D}(f)$. However, there are many explicit ways of picking $\delta(\varepsilon, x)$. For example, for a given $\varepsilon$ and $x$, one might consider the collection of all $\delta$'s less than one for which the first statement in the proof holds, and then take the supremum of this set. There are also a number of other ways to construct a suitable covering for $\mathcal{D}(f)$ and we are sure that the reader can think of several alternate ways. There are a number of places throughout the text where a point like this will arise, but we shall not comment about it in the future.
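Heine's theorem only asserts that a uniform $\delta$ exists; for a specific function one can often name it. As an illustration of our own, take the square-root function on the compact interval $[0, 1]$: since $|\sqrt{x} - \sqrt{y}| \leq \sqrt{|x - y|}$ for nonnegative $x$ and $y$, the choice $\delta = \varepsilon^2$ works at every point. The sketch below checks this choice on a finite grid, which is evidence rather than proof; the name `worst_gap` and the grid size are assumptions made here.

```python
import math

def worst_gap(eps: float, steps: int = 250) -> float:
    """Largest |sqrt(x) - sqrt(y)| over grid points x, y in [0, 1]
    with |x - y| < delta, where delta = eps**2 is the candidate
    uniform delta for the square-root function."""
    delta = eps * eps
    grid = [k / steps for k in range(steps + 1)]
    worst = 0.0
    for x in grid:
        for y in grid:
            if abs(x - y) < delta:
                worst = max(worst, abs(math.sqrt(x) - math.sqrt(y)))
    return worst
```

In contrast with the function $1/x$ on $]0, 1]$, the same $\delta$ serves near 0 as well as near 1, as the theorem predicts for a compact domain.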
There are several other important results about continuous functions
with compact domains that follow from the Heine-Borel theorem. We
shall give these below.

2.2.10 Theorem. A continuous function with a compact domain has a compact range.

Proof. If $f$ is a continuous function with a compact domain, we shall show that its range is closed and bounded. We shall consider only the case where $\mathcal{R}(f) \neq \emptyset$, since otherwise the theorem is trivial. First, if $b$ is an accumulation point of $\mathcal{R}(f)$, $\forall n \in N$ let

$$A_n = \{x : x \in \mathcal{D}(f) \ \&\ |f(x) - b| \leq 1/n\}.$$

Each set $A_n$ is nonvoid, closed (Exercise 16 of this section), and hence compact. Further, it is clear that $\forall n \in N$, $A_{n+1} \subset A_n$. Thus by the Cantor intersection theorem, $\exists a \in \mathcal{D}(f)$, which belongs to $A_n$ for every $n \in N$. Thus $f(a) = b$, which shows $b \in \mathcal{R}(f)$, and hence $\mathcal{R}(f)$ is closed.

To show that $\mathcal{R}(f)$ is bounded, we take $a \in \mathcal{D}(f)$. Since $f$ is continuous, $\forall n \in N$ there is an open interval $I_n$ so that $x \in I_n \cap \mathcal{D}(f) \Rightarrow$

$$|f(x)| - |f(a)| \leq |f(x) - f(a)| < n.$$

Thus $x \in I_n \cap \mathcal{D}(f) \Rightarrow |f(x)| \in J_n = [0, |f(a)| + n]$. Since the collection $\{J_n : n \in N\}$ covers $R^+ \cup \{0\}$, it follows that $\{I_n : n \in N\}$ is an open covering for $\mathcal{D}(f)$. By the compactness of $\mathcal{D}(f)$ this reduces to a finite subcovering $\{I_{n_j} : j \in (1, k)\}$. Let $m = \max\{n_j : j \in (1, k)\}$; since $\forall n \in N$, $J_n \subset J_{n+1}$, it follows that $|f(x)| \in J_m$ for every $x \in \mathcal{D}(f)$, which proves that $\mathcal{R}(f)$ is bounded.

In Chapter 6 we shall give a "global" characterization of continuous


functions from which the above theorem will follow as an immediate
corollary of the Heine-Borel theorem.
We shall now show that continuous functions with compact domains
have a maximum and a minimum. We first give the formal definition.

2.2.11 Definition. If $f$ is a function and $a \in \mathcal{D}(f)$, then $a$ is said to be at the maximum [minimum] for $f$ $\Leftrightarrow$ $\forall x \in \mathcal{D}(f)$, $f(x) \le f(a)$ [$f(a) \le f(x)$]. The number $f(a)$ is called the maximum [minimum] of $f$.

The point $a \in \mathcal{D}(f)$ is said to be at a local maximum [minimum] for $f$ $\Leftrightarrow$ there exists an open interval $I(a)$ so that $\forall x \in I(a) \cap \mathcal{D}(f)$, $f(x) \le f(a)$ [$f(a) \le f(x)$]. The number $f(a)$ is called a local maximum [minimum] of $f$.


2.2.12 Theorem. Every continuous function with a compact domain has a maximum and a minimum.

Proof. If $f$ is continuous on a compact domain, it follows from the last theorem that $\mathcal{R}(f)$ is compact. Since $\mathcal{R}(f)$ is bounded, the numbers

$$m = \inf \mathcal{R}(f), \qquad M = \sup \mathcal{R}(f),$$

exist, and since $\mathcal{R}(f)$ is closed, they belong to $\mathcal{R}(f)$. This establishes the theorem.
The next theorem gives more specific information about functions
that are defined on compact intervals. We first prove a lemma.

2.2.13 Lemma. If $f$ is continuous, $\mathcal{D}(f) = [a, b]$ and $f(a)f(b) \le 0$, then $\exists c \in [a, b]$ so that $f(c) = 0$.

Proof. If $f(a) = 0$ or $f(b) = 0$ we are done. Hence for the sake of argument suppose $f(a) < 0$ and $f(b) > 0$. Let $A = \{x : x \in [a, b]\ \&\ f(x) < 0\}$; since $f(a) < 0$, $A \neq \emptyset$ and hence $c = \text{l.u.b. } A$ is well defined. If $f(c) < 0$, then, if we use the continuity of $f$ at $c$, there is an open interval $I(c)$ so that $x \in I(c) \cap \mathcal{D}(f) \Rightarrow f(x) < 0$ (see Exercise 13 of Section 2.1). This contradicts the definition of $c$. By the same argument we cannot have $f(c) > 0$. Consequently, we must have $f(c) = 0$.

If $f(a) > 0$ and $f(b) < 0$, apply the above proof to the function $-f$ and we are done.
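The proof of Lemma 2.2.13 locates the zero as a least upper bound and is not constructive. As a hedged illustration only, the sketch below uses bisection, a different and standard technique that rests on the same sign-change hypothesis $f(a)f(b) \le 0$. The function name `bisect_zero`, the tolerance `tol`, and the test function $x^3 - 2$ are assumptions made for this example; they do not come from the text.

```python
def bisect_zero(f, a, b, tol=1e-12):
    """Approximate a zero of f in [a, b], assuming f is continuous and f(a)*f(b) <= 0."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("need f(a) * f(b) <= 0")
    if fa == 0:
        return a
    if fb == 0:
        return b
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid)
        if fm == 0:
            return mid
        # Keep the half-interval on which the sign change persists.
        if fa * fm < 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return (a + b) / 2


# Hypothetical example: f(x) = x**3 - 2 changes sign on [0, 2].
root = bisect_zero(lambda x: x**3 - 2, 0.0, 2.0)
print(root)
```

Each step halves an interval on which the hypothesis of the lemma still holds, so the lemma guarantees that a zero remains inside the interval at every stage.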

2.2.14 Theorem. If $f$ is continuous, $\mathcal{D}(f) = [a, b]$, $m$ is the minimum of $f$ and $M$ is the maximum of $f$, then $\mathcal{R}(f) = [m, M]$. That is, $f$ takes on all values between its maximum and its minimum.

Proof. Let $y \in [m, M]$, $f(c) = m$, $f(d) = M$. If we set $g(x) = f(x) - y$, then $g$ is continuous and $g(c)g(d) \le 0$. By the last lemma, $\exists e$ between $c$ and $d$ so that $g(e) = f(e) - y = 0$. This shows $[m, M] \subset \mathcal{R}(f)$. On the other hand, since $\forall x \in [a, b]$, $m \le f(x) \le M$, we see that $\mathcal{R}(f) \subset [m, M]$. These two inclusions establish the theorem.

Exercises

1. Show that a closed interval is a closed set and an open interval is an open set.

2. If $A \subset R$, designate by $A'$ the set of accumulation points of $A$. Show the following:
(a) $A'$ is closed.
(b) $A \subset B \Rightarrow A' \subset B'$.
(c) $(A \cup B)' = A' \cup B'$.
(d) $\bar{A} = A \cup A'$ is closed.
(e) $(\bar{A})' = A'$.
(f) $A$ is closed $\Leftrightarrow$ $A = \bar{A}$.

3. For $n \in N$, let $J_n = \,]1/n, 2/n[$. Show that the collection $\{J_n : n \in N\}$ is an open covering for $J = \,]0, 1[$ but that no finite subset of these intervals covers $J$.


4.

Show that a closed subset of a compact set is compact.

5.

Show that the union of any number of open sets is open and the

intersection of a finite number of open sets is open. Give an example


which shows that the intersection of a collection of open sets may not
be open.

6.

Using the results of Exercise 5, show that the intersection of any

number of closed sets is closed and the union of any finite number of
closed sets is closed.

7. If $A \subset R$ and $x \in R$, define the distance between $x$ and $A$ as

$$d(x, A) = \inf\{|x - y| : y \in A\}.$$

If $A$ is closed and $x \notin A$, show that $\exists y \in A$ so that $d(x, A) = |x - y|$. Is this nearest point $y$ unique? (Hint: Properties of continuous functions may be useful here.)

8. Generalizing the notion in Exercise 7, if $A$ and $B$ are subsets of $R$, define the distance between $A$ and $B$ as

$$d(A, B) = \inf\{|x - y| : x \in A\ \&\ y \in B\}.$$

If $A$ is compact and $B$ is closed, show that $\exists x \in A$ and $\exists y \in B$ so that $d(A, B) = |x - y|$. Is this result true if $A$ and $B$ are merely closed?

9. Show that every closed subset of $R$ is the intersection of a countable number of open sets.

10. If $A \subset R$ and $x \in R$, define

$$x + A = \{x + y : y \in A\}.$$

Show the following:
(a) $A$ is open $\Rightarrow$ $x + A$ is open.
(b) $A$ is closed $\Rightarrow$ $x + A$ is closed.
(c) $A$ is compact $\Rightarrow$ $x + A$ is compact.

11. Show that the complement of any closed set is the union of a countable number of pairwise disjoint open intervals.

12. Prove the Cantor intersection theorem in the following form: Let $S \subset R$ and $\forall x \in S$ let $A_x$ be a nonempty compact subset of $R$. Suppose further that $x, y \in S$ and $x < y \Rightarrow A_y \subset A_x$. Show that

$$\bigcap \{A_x : x \in S\}$$

is nonvoid.

13. Assuming the Cantor intersection theorem, 2.2.7, and the axiom of induction, prove the Bolzano-Weierstrass theorem. [Hint: If $A$ is an infinite bounded set, use the axiom of induction to establish the existence of a sequence $(I_n)$ of compact intervals, each of which contains a point of $A$ so that $\forall k \in N_0$, $I_{k+1} \subset I_k$, and the length of $I_n$ goes to zero as $n \to \infty$.]
14. Show that the Heine-Borel theorem implies the Bolzano-Weierstrass theorem.
15. If f and g are uniformly continuous functions, show thatf+ g,
g,
J and f g are also uniformly continuous. Under what condition(s)
is l/g uniformly continuous?
0

16. If $f$ is a continuous function with a closed domain and $m$ is a fixed number, show that $\{x : x \in \mathcal{D}(f)\ \&\ |f(x)| \le m\}$ is a closed subset of $\mathcal{D}(f)$.
17. Let A be a set with the property that every continuous function
with domain A is uniformly continuous. Is A necessarily compact?
18. Let $f$ be a continuous function with domain $[a, b]$ and $g$ be that function with the same domain defined by

$$g(x) = \max\,(f \mid [a, x]);$$

that is, $g(x)$ is the maximum of $f$ restricted to $[a, x]$. Show that $g$ is continuous.
2.3 MONOTONE FUNCTIONS

In this section we shall treat a special class of functions that is of importance in the theory of Riemann-Stieltjes integration. Although the functions in this class are not necessarily continuous, they are continuous at "almost all" the points in their domains.

2.3.1 Definition. A function $f$ is called monotone increasing [decreasing] $\Leftrightarrow$ $\forall x, y \in \mathcal{D}(f)$ with $x < y$ we have $f(x) < f(y)$ [$f(y) < f(x)$]. A function $f$ is called monotone nondecreasing [nonincreasing] $\Leftrightarrow$ $\forall x, y \in \mathcal{D}(f)$ with $x < y$ we have $f(x) \le f(y)$ [$f(y) \le f(x)$].

For monotone functions it is convenient to talk about right and left limits, and for this purpose we need the concept of right and left accumulation points.

2.3.2 Definition. We say that $a$ is a right [left] accumulation point of a set $A$ $\Leftrightarrow$ $\forall \varepsilon > 0$,

$$\{x : 0 < x - a < \varepsilon\} \cap A \neq \emptyset \qquad [\{x : 0 < a - x < \varepsilon\} \cap A \neq \emptyset].$$

2.3.3 Definition. We say that $l$ is a right [left] limit of $f$ at $a$ $\Leftrightarrow$ $a$ is a right [left] accumulation point of $\mathcal{D}(f)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ so that

$$x \in \{x : 0 < x - a < \delta\} \cap \mathcal{D}(f) \quad [\{x : 0 < a - x < \delta\} \cap \mathcal{D}(f)] \ \Rightarrow\ |f(x) - l| < \varepsilon.$$

A right [left] limit at $a$ will be designated by $f(a+)$ [$f(a-)$], and we shall write

$$f(a+) = \lim_{x \searrow a} f(x), \qquad f(a-) = \lim_{x \nearrow a} f(x).$$

We leave as an exercise the simple fact that if the right and left limits of $f$ exist at $a$, then $f$ has a limit at $a$ if and only if the right and left limits are equal.

2.3.4 Definition. If $f$ is a function, $a \in \mathcal{D}(f)$, and $f(a+)$ and $f(a-)$ exist, but $f(a+) \neq f(a)$ or $f(a-) \neq f(a)$, then we say that $f$ has a jump discontinuity at $a$.

If $f$ is a monotone nondecreasing function and $a$ is a left and right accumulation point of $\mathcal{D}(f)$, then it is clear

$$f(a+) = \text{g.l.b. } \{f(x) : x \in \mathcal{D}(f)\ \&\ x > a\},$$
$$f(a-) = \text{l.u.b. } \{f(x) : x \in \mathcal{D}(f)\ \&\ x < a\}.$$

If $a \in \mathcal{D}(f)$ and is a right accumulation point but not a left accumulation point set $f(a-) = f(a)$, and if $a$ is a left accumulation point but not a right accumulation point, set $f(a+) = f(a)$. With this convention we see that the only type of discontinuity that a monotone nondecreasing function may have is a jump discontinuity. If $f$ is monotone nonincreasing, then $-f$ is monotone nondecreasing and hence the result holds as well in this case. Indeed, we can say more.

2.3.5 Proposition. The discontinuities of a monotone function are jump discontinuities and there are a countable number (possibly zero) of them.

Proof. We have already noted above that a monotone function can have only jump discontinuities. Hence it remains to show that there are a countable number of them. For every $n \in N$, let us set

$$\mathcal{D}_n = \{x : x \in \mathcal{D}(f)\ \&\ |f(x)| \le n\},$$

and let us set

$$D_n = \{x : x \in \mathcal{D}_n\ \&\ |f(x+) - f(x-)| \ge 1/n\}.$$

We claim that $D_n$ has at most $2n^2$ points. Indeed, let us first note that $\forall x, y \in \mathcal{D}_n$ we have

$$|f(x) - f(y)| \le 2n.$$

Next, suppose that $\{a_j : j \in (1, k)\}$ is any finite set of points in $D_n$. Without loss of generality we may suppose that these points are labeled so that $a_1 < a_2 < \cdots < a_k$. Let us adopt the convention that if $a \in \mathcal{D}(f)$ is not a right accumulation point of $\mathcal{D}(f)$, we set $f(a+) = f(a)$, and if $a$ is not a left accumulation point of $\mathcal{D}(f)$, we set $f(a-) = f(a)$. Then, clearly, we may write $f(a_k+) - f(a_1-)$ as a "telescoping sum" in the following way:

$$f(a_k+) - f(a_1-) = \sum_{j=1}^{k} [f(a_j+) - f(a_j-)] + \sum_{j=2}^{k} [f(a_j-) - f(a_{j-1}+)].$$

Since $f$ is monotone, the terms that appear under the summation signs are always nonnegative or always nonpositive. Thus

$$2n \ge |f(a_k+) - f(a_1-)| \ge \sum_{j=1}^{k} |f(a_j+) - f(a_j-)| \ge k/n.$$

This shows that $k \le 2n^2$.

The set of discontinuities of $f$ is exactly the set

$$D = \bigcup \{D_n : n \in N\}.$$

It follows from Exercise 15 of Section 1.7 that $D$ is countable.
The next theorem tells us that if the domain of a strictly monotone
function satisfies certain conditions, in particular if it is an interval,
then the inverse of the function is always continuous. This is true
regardless of whether the given function is continuous or not.
2.3.6 Theorem. If $f$ is a monotone increasing or monotone decreasing function, then $f^{-1}$ is a monotone increasing or decreasing function, respectively. If $\mathcal{D}(f)$ has the property that $\forall a, b \in \mathcal{D}(f)$ with $a \le b$, $[a, b] \cap \mathcal{D}(f)$ is closed (hence compact), then $f^{-1}$ is continuous.

Proof. If we suppose that $f$ is monotone increasing, it is clearly a one-to-one function and hence $f^{-1}$ is a function. For every $y_1$ and $y_2$ in $\mathcal{D}(f^{-1})$ let $x_1$ and $x_2$ in $\mathcal{D}(f)$ be those elements for which $y_1 = f(x_1)$ and $y_2 = f(x_2)$. If $y_1 < y_2$, then we must have $f^{-1}(y_2) - f^{-1}(y_1) = x_2 - x_1 > 0$. For otherwise, if $x_2 - x_1 \le 0$, we would get $y_2 = f(x_2) \le f(x_1) = y_1$, which is a contradiction. Thus $f^{-1}$ is increasing. If $f$ is monotone decreasing, a similar proof shows that $f^{-1}$ is monotone decreasing.
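Suppose now that $b \in \mathcal{D}(f^{-1})$ and $b = f(a)$. If $a$ is a right accumulation point of $\mathcal{D}(f)$, then $\forall \varepsilon > 0$, $\exists x_\varepsilon \in \mathcal{D}(f)$ so that $0 < x_\varepsilon - a < \varepsilon$. Suppose, for the sake of being definite, that $f$ is monotone increasing and set $y_\varepsilon = f(x_\varepsilon)$. If $y \in \mathcal{D}(f^{-1})$ and $b \le y \le y_\varepsilon$, then since $f^{-1}$ is monotone increasing we get

$$0 \le f^{-1}(y) - f^{-1}(b) \le f^{-1}(y_\varepsilon) - f^{-1}(b) \le x_\varepsilon - a < \varepsilon.$$

If $\exists \varepsilon > 0$ so that $\mathcal{D}(f) \cap\, ]a, a + \varepsilon[\, = \emptyset$, then we claim $\exists \delta > 0$ so that $\mathcal{D}(f^{-1}) \cap\, ]b, b + \delta[\, = \emptyset$. Indeed, if this is not true, $b$ is a right accumulation point of $\mathcal{D}(f^{-1})$. Let $y_1 \in \mathcal{D}(f^{-1})$ and $y_1 > b$, and set $x_1 = f^{-1}(y_1)$. Then $x_1 > a$, and indeed $x_1 \ge a + \varepsilon$ since $\mathcal{D}(f) \cap\, ]a, a + \varepsilon[\, = \emptyset$. Now, by hypothesis, $[a + \varepsilon, x_1] \cap \mathcal{D}(f)$ is closed. Further, $y \in\, ]b, y_1] \cap \mathcal{D}(f^{-1}) \Rightarrow f^{-1}(y) \in [a + \varepsilon, x_1]$. Thus

$$c = f^{-1}(b+) = \inf\{f^{-1}(y) : y > b\}$$

belongs to $[a + \varepsilon, x_1] \cap \mathcal{D}(f)$. Now, $f(c) > b$ since $c \ge a + \varepsilon$. On the other hand, since $b$ is a right accumulation point, $\exists y \in \mathcal{D}(f^{-1})$ so that $b < y < f(c)$, which means that $f^{-1}(y) < f^{-1}(f(c)) = c$. The latter inequality contradicts the definition of $c$.

We have established in the last two paragraphs that $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $0 \le y - b < \delta$ and $y \in \mathcal{D}(f^{-1}) \Rightarrow 0 \le f^{-1}(y) - f^{-1}(b) < \varepsilon$. Arguing in an exactly analogous fashion we can prove that $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $0 \le b - y < \delta$ and $y \in \mathcal{D}(f^{-1}) \Rightarrow 0 \le f^{-1}(b) - f^{-1}(y) < \varepsilon$. This means, of course, that $f^{-1}$ is continuous at $b$.

If $f$ is monotone decreasing, then $-f$ is monotone increasing and hence the continuity of $f^{-1}$ follows from the continuity of $(-f)^{-1}$.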

EXPONENTIAL FUNCTIONS

As an example of a monotone function we shall construct a generalized exponential function. Let $a$ be a fixed number. If $n \in N$, then $a^n$ can be defined inductively by the equation

$$a^{n+1} = a^n a.$$

The meaning of this is that there exists a unique function $f_a$ with domain $N$ such that $f_a(1) = a$ and $\forall n \in N$,

$$f_a(n + 1) = f_a(n) f_a(1).$$

This statement can be proved by induction in a manner very similar to the proof of Theorem 1.6.4. We also define $a^0 = 1$, and if $n \in Z$, $n < 0$, and $a \neq 0$, we define $a^n = 1/a^{-n}$.


From the results of Exercise 9 of Section 1.9 we know that for every $n \in N$ there exists a unique solution in $R^+$ to the equation

$$x^n = a, \qquad a > 0,$$

which is labeled $a^{1/n}$. If $m \in N_0$, define (Exercise 9 of Section 2.3)

$$a^{m/n} = (a^{1/n})^m.$$

If $m/n < 0$, we define

$$a^{m/n} = 1/a^{-m/n}.$$

For every $m, n \in Z$ and $\forall b, c \in R \setminus \{0\}$, it is easily established by induction that

Using these facts, and the uniqueness of the solutions of any equation $x^n = y$, $y > 0$, if $b, c \in R^+$ we can establish that for every $r, s \in Q$,

We shall leave the verification of these simple facts as an exercise.


The function defined by

$$e_a(r) = a^r, \qquad a > 0,$$

is a function with domain $Q$ and range in $R^+$. If $0 < a < 1$, $e_a$ is monotone decreasing; if $a = 1$, then $\forall r \in Q$, $e_a(r) = 1$; if $a > 1$, then $e_a$ is monotone increasing. Let us prove the last statement. We must show that if $r, s \in Q$ and $s < r$, then $a^r - a^s > 0$. Now

$$a^r - a^s = a^s (a^{r-s} - 1).$$

Since $a^s > 0$, if we can show $a^{r-s} - 1 > 0$ we are done. Suppose $r - s = p/q$; then we get

$$(a^{r-s})^q = a^p.$$

Since $a > 1$, a simple inductive proof shows that $a^p > 1$. Now, if $a^{r-s} \le 1$, then $(a^{r-s})^q \le 1$, which would contradict the above equality.
Since $e_a$ is monotone on $Q$ it is bounded on every bounded set in $Q$. Further, it is continuous at $r = 0$. To prove this let us first suppose that $a \ge 1$. For any $a \ge 1$, it is always true that $\forall n \in N$,

$$1 \le a < (1 + a/n)^n.$$

For we can either expand the right side by the binomial theorem (which we suppose known) or use the result of Exercise 1 of Section 1.7 to get

$$(1 + a/n)^n \ge 1 + a > a.$$

Therefore, we have (why?)

$$1 \le a^{1/n} < 1 + a/n.$$

Consequently, since $e_a$ is monotone, if $0 \le r \le 1/n$ we have

$$1 \le a^r \le a^{1/n} < 1 + a/n.$$

If $\varepsilon > 0$, $\exists n \in N$ so that $a/n < \varepsilon$. Therefore, $\forall \varepsilon > 0$, $\exists \delta > 0$ so that

$$0 \le r < \delta \ \Rightarrow\ 0 \le a^r - 1 < \varepsilon. \tag{2.3.1}$$

Since $\forall r \ge 0$, $a^r \ge 1$, and $\forall r$, $a^r a^{-r} = 1$, we get $\forall r \le 0$, $a^r \le 1$. Hence, if $-\delta < r \le 0$, we get from (2.3.1)

$$0 \le 1 - a^r = a^r (a^{-r} - 1) < \varepsilon. \tag{2.3.2}$$

The inequalities (2.3.1) and (2.3.2) show the continuity at zero. If $0 < a < 1$, we work with $b = 1/a$. We have already proved the continuity for $e_b$. Since $e_a = 1/e_b$, and $e_b(r) \neq 0$, we have continuity for $e_a$.
Now, for every $x \in R$, if $a > 1$, then since $e_a$ is monotone on $Q$, the left and right limits exist:

$$e_a(x+) = \text{g.l.b. } \{e_a(r) : r > x\},$$
$$e_a(x-) = \text{l.u.b. } \{e_a(r) : r < x\}.$$

If $r < x < s$, $r, s \in Q$, then

$$e_a(r) < e_a(x-) \le e_a(x+) < e_a(s),$$

from which it follows that

$$0 \le e_a(x+) - e_a(x-) < e_a(s) - e_a(r) = e_a(r)\,[e_a(s - r) - 1].$$

If now we use the fact that $e_a$, defined on $Q$, is continuous at zero, and is bounded on any bounded set, we see that we must have

$$e_a(x-) = e_a(x+).$$

We denote this common value by $e_a(x)$ and this extends $e_a$ to a continuous monotone increasing function defined on all of $R$, which we denote by the same symbol $e_a$. We usually write $e_a(x) = a^x$.

If $a \neq 1$, $e_a$ is monotone increasing or decreasing and hence its inverse $e_a^{-1}$ exists and is continuous. As is well known, we usually write

$$e_a^{-1}(x) = \log_a x, \qquad x > 0,$$

and this is a monotone increasing function if $a > 1$, monotone decreasing if $a < 1$.
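The extension of $e_a$ from $Q$ to $R$ can be watched numerically. The sketch below is only an illustration under stated assumptions: the helper names `rational_power` and `bracket` are invented for this example, floating-point arithmetic stands in for exact roots, and the choices $a = 2$, $x = \sqrt{2}$ are arbitrary. It brackets $a^x$ between $e_a(r)$ and $e_a(s)$ for rationals $r \le x < s$ with a common denominator $q$, and shows the gap $e_a(s) - e_a(r)$ shrinking as $q$ grows.

```python
from fractions import Fraction
import math


def rational_power(a, r):
    """a**r for a > 0 and rational r = p/q, computed as (a**(1/q))**p in floating point."""
    return (a ** (1.0 / r.denominator)) ** r.numerator


def bracket(a, x, q):
    """Values e_a(r), e_a(s) for the rationals r <= x < s = r + 1/q."""
    r = Fraction(math.floor(x * q), q)
    s = r + Fraction(1, q)
    return rational_power(a, r), rational_power(a, s)


a, x = 2.0, math.sqrt(2)
for q in (10, 1000, 100000):
    lo, hi = bracket(a, x, q)
    print(q, lo, hi, hi - lo)
```

For $a > 1$ the lower values increase toward $e_a(x-)$ and the upper values decrease toward $e_a(x+)$; the shrinking gap is the numerical counterpart of $e_a(x-) = e_a(x+)$.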
Exercises

1. Let $e_a$ be the function with domain $R$ which was constructed above. Prove the following:
(a) If $a > 1$, $e_a$ is a monotone increasing continuous function.
(b) If $a \neq 1$, $\mathcal{R}(e_a) = \,]0, \infty[$.
(c) $e_a(x + y) = e_a(x) e_a(y)$.
(d) $(a^x)^y = a^{xy}$.

2. Show that the logarithm function $e_a^{-1}$ satisfies the following:
(a) $\mathcal{R}(e_a^{-1}) = \,]{-\infty}, \infty[$.
(b) $\log_a x^y = y \log_a x$.
(c) $\log_a xy = \log_a x + \log_a y$.

3. Suppose $f$ is a continuous function on $R$ such that

$$f(x + y) = f(x) f(y).$$

Show that there is a unique $a \in R$ so that $f(x) = e_a(x)$.

4. If $a > 1$, show that $x^\alpha a^{-x} \to 0$ as $x \to \infty$ for any real $\alpha$. Also, if $a > 1$ and $\alpha > 0$, show that $(\log_a x)/x^\alpha \to 0$ as $x \to \infty$.

5. Show that if f is a continuous function with domain [a, b] and


does not have a local maximum or minimum in ] a, b [, then f must be
monotone decreasing or monotone increasing.
6. If f is a continuous one-to-one function with an interval domain,
show that f must be monotone increasing or monotone decreasing.
7. Give an example which shows that Theorem 2.3.6 may not be true if no restriction is put on $\mathcal{D}(f)$.
8. For every $n \in N$ show that the function $f_n$ with domain $[0, \infty[$ and defined by

$$f_n(x) = x^n$$

is a monotone continuous function with range $[0, \infty[$. Hence it has a continuous inverse and consequently deduce the result of Exercise 9 of Section 1.9: $\forall a \in [0, \infty[$ there exists a unique $b \in [0, \infty[$ so that $b^n = a$.
9. If $m \in N_0$ and $n \in N$, we defined

$$a^{m/n} = (a^{1/n})^m.$$

Show that this is independent of the representation of $m/n$; that is, if $p \in N_0$ and $q \in N$ so that $p/q = m/n$ show that

$$(a^{1/q})^p = (a^{1/n})^m.$$

10. If $b, c \in R^+$ and $r, s \in Q$, show that

2.4 LIMIT SUPERIOR AND LIMIT INFERIOR

There is a concept which is more general than the concept of the limit of a function at a point and is very useful in many instances. In this section we shall introduce this concept and obtain several facts about it.

Let us suppose that $f$ is a bounded function with $\mathcal{D}(f) \neq \emptyset$ and $a$ is an accumulation point of $\mathcal{D}(f)$. For every $r \in R^+$ let us define

$$\overline{\varphi}_f(r) = \text{l.u.b. } \{f(x) : x \in \mathcal{D}(f)\ \&\ 0 < |x - a| < r\},$$
$$\underline{\varphi}_f(r) = \text{g.l.b. } \{f(x) : x \in \mathcal{D}(f)\ \&\ 0 < |x - a| < r\}.$$

It is almost immediate that $\overline{\varphi}_f$ is a nondecreasing function and $\underline{\varphi}_f$ is a nonincreasing function.

2.4.1 Definition. If $f$ is a bounded function and $a$ is an accumulation point of $\mathcal{D}(f)$, we shall set

$$\overline{\lim_{x \to a}}\, f(x) = \lim_{r \to 0} \overline{\varphi}_f(r) = \overline{\varphi}_f(0+),$$
$$\underline{\lim_{x \to a}}\, f(x) = \lim_{r \to 0} \underline{\varphi}_f(r) = \underline{\varphi}_f(0+).$$

These numbers are called the limit superior and the limit inferior of $f$ at $a$, respectively.
The limit superior and the limit inferior of $f$ at $a$ are sometimes designated by

$$\limsup_{x \to a} f(x) \qquad \text{and} \qquad \liminf_{x \to a} f(x),$$

respectively. Clearly it is not necessary that $f$ be bounded, but only bounded in a neighborhood of $a$ to define the concepts of limit superior and limit inferior.
In case $\mathcal{D}(f)$ is not bounded above, we can set

$$\overline{\varphi}_f(r) = \sup\{f(x) : x \in \mathcal{D}(f)\ \&\ x > r\},$$
$$\underline{\varphi}_f(r) = \inf\{f(x) : x \in \mathcal{D}(f)\ \&\ x > r\}.$$

These are monotone nonincreasing and nondecreasing functions, respectively, and we define

$$\overline{\lim_{x \to \infty}}\, f(x) = \lim_{r \to \infty} \overline{\varphi}_f(r), \qquad \underline{\lim_{x \to \infty}}\, f(x) = \lim_{r \to \infty} \underline{\varphi}_f(r).$$

We can consider $\infty$ as an accumulation point of $\mathcal{D}(f)$ since every open interval $I(\infty)$ contains a point of $\mathcal{D}(f)$. If $f$ is a bounded sequence, the latter quantities are called the limit superior and the limit inferior of the sequence, since the domain of a sequence has no finite accumulation point. We shall leave to the reader the easy task of formulating these notions at $-\infty$.
It is possible to get a geometric meaning of the previous definition which may make it more understandable. The number $\overline{\varphi}_f(r)$ may be thought of as measuring the size of the largest peak of $f$ as $x$ varies over the deleted interval $\{x : 0 < |x - a| < r\}$, and $\underline{\varphi}_f(r)$ measures the depth of the deepest valley. As $r$ decreases to zero, the size of the largest peak decreases to $\overline{\varphi}_f(0+)$ and the size of the deepest valley shortens to $\underline{\varphi}_f(0+)$.

As an example, let us consider the function given by $f(x) = \sin(1/x)$ for $x \neq 0$. A sketch of the graph of this function is shown in Fig. 2.4.1.

Figure 2.4.1

As $x \to 0$, we get an infinite number of peaks and valleys of this function. It is clear that $\forall r > 0$, $\overline{\varphi}_f(r) = 1$, $\underline{\varphi}_f(r) = -1$. Hence

$$\overline{\lim_{x \to 0}}\, \sin(1/x) = 1, \qquad \underline{\lim_{x \to 0}}\, \sin(1/x) = -1.$$

Note that this function does not have a limit at $x = 0$.

2.4.2 Proposition. The function $f$ has the limit $l$ at $a$ $\Leftrightarrow$ $a$ is an accumulation point of $\mathcal{D}(f)$ and

$$\underline{\lim_{x \to a}}\, f(x) = l = \overline{\lim_{x \to a}}\, f(x).$$

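Proof. Suppose $f(x) \to l$ as $x \to a$. This means $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $x \in \mathcal{D}(f)$ and $0 < |x - a| < \delta \Rightarrow |f(x) - l| < \varepsilon$. It follows that if $0 < r \le \delta$, then $|\overline{\varphi}_f(r) - l| \le \varepsilon$ and $|\underline{\varphi}_f(r) - l| \le \varepsilon$. This means, by definition, that

$$\underline{\varphi}_f(0+) = l = \overline{\varphi}_f(0+).$$

Conversely, suppose this last equation is satisfied. Then $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $|\overline{\varphi}_f(\delta) - l| < \varepsilon$ and $|\underline{\varphi}_f(\delta) - l| < \varepsilon$. However, $\forall x \in \mathcal{D}(f)$ for which $0 < |x - a| < \delta$ we have $\underline{\varphi}_f(\delta) \le f(x) \le \overline{\varphi}_f(\delta)$. Hence

$$-\varepsilon < \underline{\varphi}_f(\delta) - l \le f(x) - l \le \overline{\varphi}_f(\delta) - l < \varepsilon.$$

This shows that $f(x) \to l$ as $x \to a$.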
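For the example $f(x) = \sin(1/x)$ the two one-sided envelopes can be estimated numerically. The following is a rough sketch under stated assumptions: `phi_bounds` is a name invented here, and a finite sample can only bound the true g.l.b. and l.u.b. from the inside, so the printed values approximate rather than compute $\underline{\varphi}_f(r)$ and $\overline{\varphi}_f(r)$.

```python
import math


def phi_bounds(f, a, r, samples=20001):
    """Sampled (min, max) of f over points of the deleted interval 0 < |x - a| < r."""
    values = []
    for i in range(1, samples):
        t = r * i / samples  # 0 < t < r
        values.append(f(a + t))
        values.append(f(a - t))
    return min(values), max(values)


f = lambda x: math.sin(1.0 / x)
for r in (1.0, 0.1, 0.01):
    lo, hi = phi_bounds(f, 0.0, r)
    print(r, lo, hi)
```

The sampled extremes stay close to $-1$ and $1$ however small $r$ is taken, so the limit inferior and limit superior at $0$ differ, and by Proposition 2.4.2 the function has no limit there.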


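2.4.3 Theorem. If $f$ and $g$ are bounded functions and $a$ is an accumulation point of $\mathcal{D}(f) \cap \mathcal{D}(g)$, then

(a)
$$\overline{\lim_{x \to a}}\, (f + g)(x) \le \overline{\lim_{x \to a}}\, f(x) + \overline{\lim_{x \to a}}\, g(x),$$
$$\underline{\lim_{x \to a}}\, f(x) + \underline{\lim_{x \to a}}\, g(x) \le \underline{\lim_{x \to a}}\, (f + g)(x);$$

(b) if $f$ and $g$ are nonnegative, then
$$\overline{\lim_{x \to a}}\, (fg)(x) \le \overline{\lim_{x \to a}}\, f(x)\ \overline{\lim_{x \to a}}\, g(x),$$
$$\underline{\lim_{x \to a}}\, f(x)\ \underline{\lim_{x \to a}}\, g(x) \le \underline{\lim_{x \to a}}\, (fg)(x);$$

(c) if $\forall x \in \mathcal{D}(g)$, $g(x) \neq 0$ and does not change its sign, and $1/g$ is bounded, then
$$\overline{\lim_{x \to a}}\, (1/g)(x) = 1 \Big/ \underline{\lim_{x \to a}}\, g(x), \qquad \underline{\lim_{x \to a}}\, (1/g)(x) = 1 \Big/ \overline{\lim_{x \to a}}\, g(x).$$

Proof. (a) For every $\varepsilon > 0$ and $\forall r > 0$, $\exists b \in \mathcal{D}(f + g)$ so that $0 < |b - a| < r$ and

$$\overline{\varphi}_{f+g}(r) < (f + g)(b) + \varepsilon \le \overline{\varphi}_f(r) + \overline{\varphi}_g(r) + \varepsilon.$$

Letting $r \to 0$ and taking account of the fact that $\varepsilon$ is arbitrary, we get the first inequality in (a). The second one follows by similar reasoning.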
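(b) For every $\varepsilon > 0$ and $\forall r > 0$, $\exists b \in \mathcal{D}(fg)$ so that $0 < |b - a| < r$ and

$$\overline{\varphi}_{fg}(r) < (fg)(b) + \varepsilon \le \overline{\varphi}_f(r)\, \overline{\varphi}_g(r) + \varepsilon.$$

The last inequality makes use of the facts that $0 \le f(b) \le \overline{\varphi}_f(r)$ and $0 \le g(b) \le \overline{\varphi}_g(r)$. Letting $r \to 0$ we get the first inequality in (b). The second one follows by similar reasoning.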
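(c) For every $\varepsilon > 0$ and $\forall r > 0$, $\exists b \in \mathcal{D}(g)$ so that $0 < |b - a| < r$ and

$$\overline{\varphi}_{1/g}(r) < (1/g)(b) + \varepsilon.$$

Since $g$ does not change its sign and $1/g$ is bounded, $\exists m > 0$ so that $\forall x$, $g(x) \ge m$ or $g(x) \le -m$. Consequently, $\underline{\varphi}_g(r) \ge m$ or $\overline{\varphi}_g(r) \le -m$, respectively. Since $\forall x \in \mathcal{D}(g)$ for which $0 < |x - a| < r$ we have $g(x) \ge \underline{\varphi}_g(r)$, we have in either case,

$$\overline{\varphi}_{1/g}(r) < (1/g)(b) + \varepsilon \le 1/\underline{\varphi}_g(r) + \varepsilon.$$

Letting $r \to 0$ we get the inequality

$$\overline{\lim_{x \to a}}\, (1/g)(x) \le 1 \Big/ \underline{\lim_{x \to a}}\, g(x).$$

On the other hand, $\forall \varepsilon > 0$ and $\forall r > 0$, $\exists b \in \mathcal{D}(g)$ so that $0 < |b - a| < r$ and

$$\underline{\varphi}_g(r) \le g(b) < \underline{\varphi}_g(r) + \varepsilon.$$

Since $1/g$ is bounded, $\underline{\varphi}_g(r)$ is never zero. Suppose $\varepsilon$ is so small that $\underline{\varphi}_g(r)$ and $\underline{\varphi}_g(r) + \varepsilon$ have the same sign. Then $g(b)$ also has the same sign and

$$\frac{1}{\underline{\varphi}_g(r) + \varepsilon} < \frac{1}{g(b)} \le \overline{\varphi}_{1/g}(r).$$

Letting $r \to 0$, and using the fact that $\varepsilon$ is arbitrary, we see that we have the inequality

$$1 \Big/ \underline{\lim_{x \to a}}\, g(x) \le \overline{\lim_{x \to a}}\, (1/g)(x).$$

This inequality when taken together with the previous one gives equality. The second equality follows by similar reasoning.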
NOTE: The results of this theorem are completed in Exercises 4 and 5 at the end of the chapter. Also, it is clear that the above theorem will hold if the various hypotheses put on the functions hold in a neighborhood of $a$. Indeed, if $I(a)$ is an open neighborhood of $a$ and $f_1$ is the restriction of $f$ to $I(a) \cap \mathcal{D}(f)$, then clearly

$$\overline{\lim_{x \to a}}\, f_1(x) = \overline{\lim_{x \to a}}\, f(x), \qquad \underline{\lim_{x \to a}}\, f_1(x) = \underline{\lim_{x \to a}}\, f(x).$$

We cannot expect equality in (c) if $g$ changes sign in every neighborhood of $a$. For then the limit inferior of $g$ or $1/g$ at $a$ will be negative while the corresponding limit superior will be positive.
We shall now give several examples which show that the hypotheses of Theorem 2.4.3(b) cannot be weakened in any essential way. That is to say, if we remove the conditions of nonnegativity, the conclusions may not follow. For the first example take

$$f(x) = \sin(1/x), \qquad g(x) = \cos(1/x), \qquad x \neq 0.$$

These functions change their signs infinitely many times in any neighborhood of the origin. Now, by a well-known trigonometric identity we have

$$f(x)g(x) = \sin(1/x)\cos(1/x) = \tfrac{1}{2}\sin(2/x).$$

Therefore,

$$\underline{\lim_{x \to 0}}\, f(x) = \underline{\lim_{x \to 0}}\, g(x) = -1, \qquad \underline{\lim_{x \to 0}}\, (fg)(x) = -1/2,$$

and hence

$$\underline{\lim_{x \to 0}}\, f(x)\ \underline{\lim_{x \to 0}}\, g(x) > \underline{\lim_{x \to 0}}\, (fg)(x).$$

For the second example take

$$f(x) = 1 + \sin(1/x), \qquad g(x) = \cos(1/x), \qquad x \neq 0.$$

Clearly $f(x) \ge 0$ for all $x$ in its domain, and $g$ changes sign infinitely often in any neighborhood of zero. If we take $x_k$ so that $1/x_k = (2k + 1)\pi$, $k = 0, 1, \ldots$, we get

$$f(x_k)g(x_k) = -1.$$

Hence

$$\underline{\lim_{x \to 0}}\, (fg)(x) \le -1.$$

On the other hand,

$$\underline{\lim_{x \to 0}}\, f(x) = 0, \qquad \underline{\lim_{x \to 0}}\, g(x) = -1,$$

which leads to the inequality

$$\underline{\lim_{x \to 0}}\, f(x)\ \underline{\lim_{x \to 0}}\, g(x) > \underline{\lim_{x \to 0}}\, (fg)(x).$$

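Finally, as a third example we consider

$$f(x) = -1 + \sin(1/x), \qquad g(x) = -1 + \cos(1/x).$$

If we take $x_k$ so that $1/x_k = (2k + 1)\pi$, $k = 0, 1, 2, \ldots$, we get

$$f(x_k)g(x_k) = 2$$

and therefore

$$\overline{\lim_{x \to 0}}\, (fg)(x) \ge 2.$$

But $\overline{\lim}_{x \to 0} f(x) = \overline{\lim}_{x \to 0} g(x) = 0$ and consequently we get

$$\overline{\lim_{x \to 0}}\, f(x)\ \overline{\lim_{x \to 0}}\, g(x) < \overline{\lim_{x \to 0}}\, (fg)(x).$$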
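The three examples can be checked numerically with a small sketch. It rests on an observation that is not spelled out in the text and is assumed here: each function has the form $h(1/x)$ with $h$ continuous and $2\pi$-periodic, and every deleted neighborhood of $0$ is mapped by $x \mapsto 1/x$ onto a set containing full periods, so the limit inferior and limit superior at $0$ are the minimum and maximum of $h$ over one period. The grid size `N` and the helper `liminf_limsup` are illustrative names, and the grid gives approximations, not exact extrema.

```python
import math

N = 3600  # grid on one period of t = 1/x; divisible by 8 so multiples of pi/4 lie on the grid
ts = [2 * math.pi * i / N for i in range(N)]


def liminf_limsup(h):
    """Grid estimate of (liminf, limsup) as x -> 0 of h(1/x), for h continuous and 2*pi-periodic."""
    vals = [h(t) for t in ts]
    return min(vals), max(vals)


examples = {
    "first": (lambda t: math.sin(t), lambda t: math.cos(t)),
    "second": (lambda t: 1 + math.sin(t), lambda t: math.cos(t)),
    "third": (lambda t: -1 + math.sin(t), lambda t: -1 + math.cos(t)),
}

results = {}
for name, (f, g) in examples.items():
    fg = lambda t, f=f, g=g: f(t) * g(t)
    # Each entry holds the (liminf, limsup) pairs for f, g and fg, in that order.
    results[name] = (liminf_limsup(f), liminf_limsup(g), liminf_limsup(fg))
    print(name, results[name])
```

In the first two examples the product of the limits inferior exceeds the limit inferior of $fg$, and in the third the product of the limits superior is smaller than the limit superior of $fg$, in agreement with the inequalities displayed above.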

Let us finish this section by giving an application of the use of the concept of limit superior and limit inferior. This involves an extension of the idea of a Cauchy sequence.

2.4.4 Proposition. Suppose $f$ is a function, $a$ is an accumulation point of $\mathcal{D}(f)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $0 < |x - a| < \delta$, $0 < |y - a| < \delta$ and $x, y \in \mathcal{D}(f) \Rightarrow |f(x) - f(y)| < \varepsilon$. Then $\lim_{x \to a} f(x)$ exists.

Proof. Clearly $f$ is bounded in a neighborhood of $a$ and hence we may set

$$l = \overline{\lim_{x \to a}}\, f(x).$$

We claim that $l = \lim_{x \to a} f(x)$. Indeed, $\forall \varepsilon > 0$, $\exists \eta > 0$ so that $\forall \delta$ with $0 < \delta < \eta$, $\exists y \in \mathcal{D}(f)$ so that $0 < |y - a| < \delta$ and

$$|f(y) - l| < \varepsilon/2.$$

Suppose we have taken $\delta$ small enough so that $x, y \in \mathcal{D}(f)$ and $0 < |x - a| < \delta$ and $0 < |y - a| < \delta \Rightarrow |f(x) - f(y)| < \varepsilon/2$. Then we get

$$|f(x) - l| \le |f(x) - f(y)| + |f(y) - l| < \varepsilon.$$

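Exercises

1. If $f$ is bounded and $a$ is an accumulation point of $\mathcal{D}(f)$, show that

$$\overline{\lim_{x \to a}}\, (-f(x)) = -\underline{\lim_{x \to a}}\, f(x).$$

2. Show that the inequalities in Theorem 2.4.3(b) are reversed if $f$ and $g$ are nonpositive.
3. If $f$ is a bounded nonnegative function and $a$ is an accumulation point of $\mathcal{D}(f)$, show that $\forall \alpha \ge 0$,

$$\overline{\lim_{x \to a}}\, f^\alpha(x) = \Bigl(\overline{\lim_{x \to a}}\, f(x)\Bigr)^\alpha.$$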

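4. Under the hypotheses of Theorem 2.4.3(a) prove that

$$\underline{\lim_{x \to a}}\, f(x) + \overline{\lim_{x \to a}}\, g(x) \le \overline{\lim_{x \to a}}\, (f + g)(x),$$
$$\underline{\lim_{x \to a}}\, (f + g)(x) \le \underline{\lim_{x \to a}}\, f(x) + \overline{\lim_{x \to a}}\, g(x).$$

Using this in conjunction with Theorem 2.4.3(a) show that if $\lim_{x \to a} f(x)$ exists, then

$$\overline{\lim_{x \to a}}\, (f + g)(x) = \lim_{x \to a} f(x) + \overline{\lim_{x \to a}}\, g(x),$$
$$\underline{\lim_{x \to a}}\, (f + g)(x) = \lim_{x \to a} f(x) + \underline{\lim_{x \to a}}\, g(x).$$

5. Under the hypotheses of Theorem 2.4.3(b) prove that

$$\underline{\lim_{x \to a}}\, f(x)\ \overline{\lim_{x \to a}}\, g(x) \le \overline{\lim_{x \to a}}\, (fg)(x),$$
$$\underline{\lim_{x \to a}}\, (fg)(x) \le \underline{\lim_{x \to a}}\, f(x)\ \overline{\lim_{x \to a}}\, g(x).$$

Using this in conjunction with Theorem 2.4.3(b), show that if $\lim_{x \to a} f(x)$ exists, then

$$\overline{\lim_{x \to a}}\, (fg)(x) = \lim_{x \to a} f(x)\ \overline{\lim_{x \to a}}\, g(x),$$
$$\underline{\lim_{x \to a}}\, (fg)(x) = \lim_{x \to a} f(x)\ \underline{\lim_{x \to a}}\, g(x).$$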

6. If $x \in [0, 1]$ and is rational of the form $p/q$ (in lowest terms), define $f(x) = 1/q$. Also set $f(0) = 1$ and if $x$ is irrational set $f(x) = 0$. Compute $\overline{\lim}_{x \to a} f(x)$ and $\underline{\lim}_{x \to a} f(x)$ for every $a$ in $[0, 1]$.

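7. If $(a_n)$ is a bounded sequence and

$$A = \overline{\lim_{n \to \infty}}\, a_n,$$

show that $\forall \varepsilon > 0$ there are only a finite number of $n \in N_0$ with $a_n > A + \varepsilon$ and an infinite number of $n$ so that $A - \varepsilon < a_n$. State and prove an analogous statement for $\underline{\lim}_{n \to \infty} a_n$.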

8. Compute the limit inferior and the limit superior for the
sequences whose terms are the following:
(a) sin (mr/2).
n
(b) (l + (-1) ) COS ( n1T) .
)
(c) (l + l/n (l +sin (mr/8))1'n.
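9. Let $(r_n)$ be a sequence, whose range consists of all rationals in $]0, 1[$. If $r_n = p_n/q_n$, $p_n, q_n \in N$, set $s_n = (p_n/q_n)^{1/q_n}$. Show that

$$\overline{\lim_{n \to \infty}}\, s_n = \underline{\lim_{n \to \infty}}\, s_n = 1.$$

10. A function $f$ is said to be upper semicontinuous at $a$ $\Leftrightarrow$ $a \in \mathcal{D}(f)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $|x - a| < \delta$ and $x \in \mathcal{D}(f) \Rightarrow f(x) < f(a) + \varepsilon$. If $a \in \mathcal{D}(f)$ and is an accumulation point of $\mathcal{D}(f)$, show that $f$ is upper semicontinuous at $a$ if and only if $\overline{\lim}_{x \to a} f(x) \le f(a)$. Give an analogous definition for lower semicontinuity and prove a similar result.

11. Suppose $f$ has domain $[0, 1]$ and is defined by

$$f(x) = \begin{cases} 1 & \Leftrightarrow x \text{ is rational}, \\ 0 & \Leftrightarrow x \text{ is irrational}. \end{cases}$$

Where is $f$ upper semicontinuous and where is $f$ lower semicontinuous?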

Where is f upper semicontinuous and where is f lower semicontinuous?


12. Show that the sum and product of two upper [lower] semicontinuous functions are upper [lower] semicontinuous.
13. If $f$ is upper [lower] semicontinuous at every point of a compact domain, show that $f$ is "uniformly" upper [lower] semicontinuous.
14. If $f$ is upper [lower] semicontinuous with a compact domain, show that it is bounded above [below] and attains its maximum [minimum].
15. If $f$ is a uniformly continuous function, show that it may be extended to a function $\bar{f}$ which is uniformly continuous on $\overline{\mathcal{D}(f)}$, the closure of $\mathcal{D}(f)$. Do this by the theme of Proposition 2.4.4.
16. Use the results of Exercise 15 to extend the function $e_a$ of Section 2.3 from $Q$ to $R$.
17. If $f$ is a continuous one-to-one function with a compact domain, show that $f^{-1}$ is continuous.

CHAPTER 3 INFINITE SERIES

We have already discussed the concept of real sequences, real Cauchy


sequences, and real convergent sequences in Chapter 1. In Chapter 2
we discussed corresponding questions for real functions. In this chapter
we shall discuss the concept of infinite series, which actually is based
on the concept of sequence.
Before we launch into the definitions and theorems about infinite
series, we shall digress for a moment and comment about the meaning
of the sum of a finite set of real numbers. Although we have used finite
sums several times in Chapter 2 and have assumed that the meaning
is intuitively clear (which it is!), it is nevertheless of some value to
comment about the formal meaning. It is possible to prove the following
statement by induction.

There exists a unique function $\sigma$ with domain and range the collection of all real sequences so that if $a = (a_n)$ is a sequence, then

$$\sigma(a)_0 = a_0, \qquad \sigma(a)_{n+1} = \sigma(a)_n + a_{n+1}, \quad \forall n \in N_0.$$

Although we shall not bother to carry out the proof, the general idea how to use induction in this context is like that used in the proof of Theorem 1.6.4. It is not hard to prove, using the commutative, associative, and distributive laws for $R$, and the axiom of induction, that $\forall \alpha, \beta \in R$ and for all sequences $a$ and $b$,

$$\sigma(\alpha a + \beta b) = \alpha \sigma(a) + \beta \sigma(b).$$

Of course, by $\alpha a$ we mean that sequence so that $\forall n \in N_0$, $(\alpha a)_n = \alpha a_n$.

We now set

$$\sum_{k=0}^{n} a_k = \sigma(a)_n = a_0 + a_1 + \cdots + a_n.$$

If we have given a finite set of numbers $\{a_k : k \in (0, n)\}$, it is of course necessary to extend the function with values $a_k$ and domain $(0, n)$ to all of $N_0$ to be able to use the summation symbol. This of course gives rise to the question whether if we extend in two different ways, say to sequences $a$ and $b$, we get $\sigma(a)_k = \sigma(b)_k$, $\forall k \in (0, n)$. However, it is a very easy matter to show that if $c = (c_n)$ is a sequence and $c_k = 0$, $\forall k \in (0, n)$, then $\sigma(c)_n = 0$. Hence, if $a_k = b_k$, $\forall k \in (0, n)$, and $c = a - b$, we have $\sigma(c)_n = \sigma(a - b)_n = \sigma(a)_n - \sigma(b)_n = 0$. It is also not hard to show that if $a$ and $b$ are sequences, and $\Phi$ is a one-to-one function with domain and range $(0, n)$ so that $\forall k \in (0, n)$, $a_k = b_{\Phi(k)}$, then $\sigma(a)_n = \sigma(b)_n$.

Very often we shall use the symbol

$$\sum_{k=m}^{n} a_k.$$

If $n < m$, this is to be given the value zero. If $n \ge m$ and $a = (a_n)$ is any sequence that is an extension of the function with domain $(m, n)$ and values $a_k$, we set $b_k = a_{m+k}$ and

$$\sum_{k=m}^{n} a_k = \sigma(b)_{n-m}.$$

More generally, suppose $A$ is any finite set and $a$ is a function with domain $A$ and range in $R$. If $A$ has $n + 1$ elements, let $\Phi$ be a one-to-one function with domain $(0, n)$ and range $A$. Then define

$$\sum_{\alpha \in A} a_\alpha = \sum_{k=0}^{n} a_{\Phi(k)}.$$

If $A = \emptyset$ we define the symbol on the left to be zero.
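The $\sigma$ function is simply the map from a sequence to its sequence of partial sums, which the following sketch models on finite prefixes. The helper name `sigma`, the sample sequences, and the coefficients are assumptions chosen for illustration; integer sequences are used so that the comparisons are exact.

```python
from itertools import accumulate


def sigma(a, n):
    """Partial sums sigma(a)_0, ..., sigma(a)_n of a sequence given as a function k -> a_k."""
    return list(accumulate(a(k) for k in range(n + 1)))


a = lambda k: k * k
b = lambda k: (-1) ** k
alpha, beta, n = 3, -2, 10

# Linearity: sigma(alpha*a + beta*b) = alpha*sigma(a) + beta*sigma(b), term by term.
lhs = sigma(lambda k: alpha * a(k) + beta * b(k), n)
rhs = [alpha * s + beta * t for s, t in zip(sigma(a, n), sigma(b, n))]
print(lhs == rhs)

# Rearranging the indices 0..n by a one-to-one map leaves the n-th partial sum unchanged.
perm = list(reversed(range(n + 1)))
c = lambda k: a(perm[k])
print(sigma(a, n)[-1], sigma(c, n)[-1])
```

Only the last entry $\sigma(\cdot)_n$ is invariant under the rearrangement; the earlier partial sums generally differ, which is why the statement in the text concerns $\sigma(a)_n$ alone.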

3.1 SERIES OF REAL NUMBERS

In the introduction to this chapter we have discussed the meaning of a finite sum of real numbers by means of the $\sigma$ function. We now wish to discuss the meaning of "infinite sums" of real numbers. We think that it should now be quite clear that the most natural way to extend the meaning of a finite sum is to say that the sum of the sequence $(a_n)$ is the limit of the sequence $\sigma(a)$, provided the limit exists. Let us write down some formal definitions.

3.1.1 Definition. An infinite series is an element of the $\sigma$ function, that is, is an ordered pair of sequences $(a, \sigma(a))$.

An infinite series $(a, \sigma(a))$ is said to be convergent $\Leftrightarrow$ the sequence $\sigma(a)$ is convergent. Otherwise the series is said to be divergent.

An infinite series $(a, \sigma(a))$ will often be designated by

$$\Bigl(\sum_{k=0}^{\infty} a_k\Bigr),$$

and in case it is convergent we shall remove the parentheses and designate the limit by

$$\sum_{k=0}^{\infty} a_k.$$

This is in keeping with standard practice and the notations are very convenient. Very often we shall use a notation such as

$$\Bigl(\sum_{k=n}^{\infty} a_k\Bigr).$$

Let $b_k = a_{k+n}$; then we define the above symbol as another name for

$$\Bigl(\sum_{k=0}^{\infty} b_k\Bigr).$$

The following fact is often very useful in deciding whether a series diverges, but it tells us nothing about the convergence of a series.

3.1.2

If (a, <T(a)) is a convergent series, then

Proposition.

lim a = 0.
n-co n

Proof.

Since

<T(a)

is a convergent sequence, it must be Cauchy.

Hence VE> 0, 3N so that n:;;;:,: N ==::}

Ian I = l<T(a)n - CT(a)n-11


This is exactly the meaning of

an ---"' 0

as n

< E.

---"' oo.

As an example of how this may be used consider the series

$$\sum_{k=0}^{\infty} \left( (-1)^k \frac{k}{1+k} \right).$$

It is not immediately evident whether this series is convergent or divergent unless we note that

$$\lim_{k\to\infty} \frac{k}{1+k} = 1,$$

so that by the last proposition the series is divergent. However, even if $a_n \to 0$ it does not mean that the series is convergent. The standard example is the harmonic series

$$\sum_{k=1}^{\infty} \left(\frac{1}{k}\right).$$

Clearly, $1/k \to 0$ as $k \to \infty$, but the sequence of partial sums given by

$$s_n = \sum_{k=1}^{n} \frac{1}{k}$$

is not Cauchy. Indeed, we first note that if $k \in N$ and $k \le n$, then $n + k \le 2n$. Hence $\forall n \in N$,

$$s_{2n} - s_n = \sum_{k=1}^{n} \frac{1}{n+k} \ge \frac{1}{2}.$$

This means that the sequence of partial sums is not Cauchy and hence cannot be convergent.
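The estimate $s_{2n} - s_n \ge 1/2$ for the harmonic series can be checked with exact rational arithmetic. The following is only a minimal sketch; the function names `harmonic_partial_sum` and `block_difference` are illustrative and do not come from the text.

```python
from fractions import Fraction


def harmonic_partial_sum(n):
    """Return s_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def block_difference(n):
    """Return s_{2n} - s_n, that is, the sum of 1/(n + k) for k = 1, ..., n."""
    return harmonic_partial_sum(2 * n) - harmonic_partial_sum(n)
```

Each block contains $n$ terms, every one of which is at least $1/(2n)$, which is why the difference never drops below $1/2$ however large $n$ is taken.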
Let us now record a very simple but very useful fact: The sum of two
convergent series is again convergent.

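3.1.3 Theorem. If $(a, \sigma(a))$ and $(b, \sigma(b))$ are convergent series and $\alpha, \beta \in R$, then $(\alpha a + \beta b, \sigma(\alpha a + \beta b))$ is convergent and

$$\sum_{k=0}^{\infty} [\alpha a_k + \beta b_k] = \alpha \sum_{k=0}^{\infty} a_k + \beta \sum_{k=0}^{\infty} b_k.$$

Proof. The proof is simply a consequence of the facts that

$$\sigma(\alpha a + \beta b) = \alpha\sigma(a) + \beta\sigma(b),$$

$$\lim_{n\to\infty} [\alpha\sigma(a)_n + \beta\sigma(b)_n] = \alpha \lim_{n\to\infty} \sigma(a)_n + \beta \lim_{n\to\infty} \sigma(b)_n.$$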

As we pointed out in our discussion of finite sums, it is not hard to prove, making use of the commutativity law for $R$, that if $\forall k \in \langle 0, n\rangle$, $a_k = b_{\Phi(k)}$, where $\Phi$ is a one-to-one function with domain and range $\langle 0, n\rangle$, then

$$\sigma(a)_n = \sigma(b)_n.$$

In other words, rearranging the indices on a finite set of numbers does not change its sum. This may no longer be true for a convergent infinite series, as we shall eventually show. However, we must first know what it means to rearrange an infinite series.

3.1.4 Definition. A rearrangement of the infinite series $(a, \sigma(a))$ is an infinite series $(a \circ \Phi, \sigma(a \circ \Phi))$, where $\Phi$ is a one-to-one function with domain and range $N_0$.

It turns out that every rearrangement of an infinite series is convergent if and only if the series whose terms are the absolute values of the terms of the original series is convergent. We first give a formal definition.

3.1.5 Definition. An infinite series $(a, \sigma(a))$ is said to be absolutely convergent $\iff$ the sequence $\sigma(|a|)$ is convergent, where $\forall n \in N_0$, $|a|_n = |a_n|$. If an infinite series is convergent, but not absolutely convergent, it is called conditionally convergent.

A natural question is whether or not conditionally convergent series exist. The answer is in the affirmative and an example is given by the series

$$\sum_{k=0}^{\infty} \left( \frac{(-1)^k}{k+1} \right).$$

This series is certainly not absolutely convergent, since the series of absolute values is the harmonic series. The fact that the series is convergent is a consequence of Leibnitz' criterion, which will be established in Section 3.2.

3.1.6 Lemma. If $\forall k \in N_0$, $a_k \ge 0$, $\sigma(a)$ is convergent, and $A$ is any finite subset of $N_0$, then

$$\sum_{k \in A} a_k \le \sum_{k=0}^{\infty} a_k.$$

Proof. Let $n = \max A$ and $B = \langle 0, n\rangle \setminus A$. Then

$$\sum_{k \in A} a_k \le \sum_{k \in A} a_k + \sum_{k \in B} a_k = \sigma(a)_n.$$

Now $\sigma(a)$ is monotone nondecreasing and convergent. Hence

$$\sigma(a)_n \le \sum_{k=0}^{\infty} a_k,$$

which proves the lemma.

3.1.7 Theorem. If an infinite series is absolutely convergent, then it is convergent, and any rearrangement converges to the same limit and is absolutely convergent.

Proof. Suppose, at first, that $\forall n \in N_0$, $a_n \ge 0$. Let $(a \circ \Phi, \sigma(a \circ \Phi))$ be a rearrangement and $A = \{j: (\exists k)(k \in \langle 0, n\rangle \ \&\ j = \Phi(k))\}$. Then by the last lemma we have

$$\sum_{k=0}^{n} a \circ \Phi(k) = \sum_{k \in A} a_k \le \sum_{k=0}^{\infty} a_k.$$

Hence $\sigma(a \circ \Phi)$ is bounded, and since it is monotone nondecreasing it converges and

$$\sum_{k=0}^{\infty} a \circ \Phi(k) \le \sum_{k=0}^{\infty} a_k.$$

Actually, since $(a, \sigma(a))$ is a rearrangement of $(a \circ \Phi, \sigma(a \circ \Phi))$, we get the reverse inequality, so that the two numbers are equal.

For a general absolutely convergent sequence $(a_n)$ let us set

$$a_n^+ = \frac{|a_n| + a_n}{2}, \qquad a_n^- = \frac{|a_n| - a_n}{2}.$$

Thus $0 \le a_n^+ \le |a_n|$, $0 \le a_n^- \le |a_n|$, and $a_n = a_n^+ - a_n^-$. Hence

$$\sum_{k=0}^{n} a_k^{\pm} \le \sum_{k=0}^{n} |a_k| \le \sum_{k=0}^{\infty} |a_k|,$$

and since $\sigma(a^+)$ and $\sigma(a^-)$ are monotone nondecreasing sequences they are convergent. From the previous paragraph and Theorem 3.1.3 we get

$$\sum_{k=0}^{\infty} a \circ \Phi(k) = \sum_{k=0}^{\infty} a^+ \circ \Phi(k) - \sum_{k=0}^{\infty} a^- \circ \Phi(k) = \sum_{k=0}^{\infty} a_k^+ - \sum_{k=0}^{\infty} a_k^- = \sum_{k=0}^{\infty} a_k.$$

Note that the first equality and the last equality come from Theorem 3.1.3 and establish the fact that $\sigma(a)$ is convergent. Also, $\sigma(|a \circ \Phi|)$ is clearly convergent, since

$$|a \circ \Phi(k)| = a^+ \circ \Phi(k) + a^- \circ \Phi(k).$$

The converse of the last theorem is due to B. Riemann. Although the idea behind the proof is relatively simple, the technical details of a formal proof are slightly complicated. Hence before we proceed with the statement and proof of this theorem, we shall note that a form of the associative law is true for convergent infinite series. We make this precise by means of the following two statements.

3.1.8 Definition. The series $(b, \sigma(b))$ is said to be a grouping of the series $(a, \sigma(a))$ $\iff$ there exists a monotone increasing function $\Phi$ from $N_0$ into $N_0$ so that $\Phi(0) = 0$ and

$$b_n = \sum_{k=\Phi(n)}^{\Phi(n+1)-1} a_k.$$

3.1.9 Proposition. If $(a, \sigma(a))$ is [absolutely] convergent, then any grouping is [absolutely] convergent to the same limit.

The proof is so simple that we shall leave it as an exercise. We should point out that it is sometimes possible to group a divergent series and obtain a convergent series. For example, the series

$$\sum_{k=0}^{\infty} ((-1)^k)$$

is clearly divergent, since $((-1)^k)$ does not converge to zero. However, if we take $\Phi(k) = 2k$, then $b_k = (-1)^{2k} + (-1)^{2k+1} = 0$, and consequently $\sigma(b)$ is convergent.

3.1.10 Theorem (Riemann). If a series $(a, \sigma(a))$ is conditionally convergent, then for any two real numbers $\alpha \le \beta$, there is a rearrangement $(a \circ \Phi, \sigma(a \circ \Phi))$ so that

$$\liminf_{n\to\infty} \sigma(a \circ \Phi)_n = \alpha, \qquad \limsup_{n\to\infty} \sigma(a \circ \Phi)_n = \beta.$$

Proof. The intuitive idea of the proof is quite simple. Let $(a^+_n)$ be that subsequence of $(a_n)$ which consists of the nonnegative terms and $(-a^-_n)$ that subsequence which consists of the negative terms. The sequences $\sigma(a^+)$ and $\sigma(a^-)$ must both be unbounded. For since both sequences are monotone nondecreasing, if one is bounded it converges, and, since $\sigma(a)$ converges, the other sequence will converge also. Hence $\sigma(|a|)$ will converge, contrary to hypothesis. Further, note that since $\sigma(a)$ is convergent, $a^+_n \to 0$ and $a^-_n \to 0$ as $n \to \infty$.

Let $r_0$ be the smallest integer so that $\sigma(a^+)_{r_0} > \beta$. This is possible since $\sigma(a^+)$ is unbounded. Hence the difference between $\sigma(a^+)_{r_0}$ and $\beta$ is at most $a^+_{r_0}$. Next let $s_0$ be the smallest integer so that $\sigma(a^+)_{r_0} - \sigma(a^-)_{s_0} < \alpha$. Again this is possible, since $\sigma(a^-)$ is unbounded. The difference between $\alpha$ and $\sigma(a^+)_{r_0} - \sigma(a^-)_{s_0}$ is at most $a^-_{s_0}$. Now let $r_1$ be the smallest integer so that $\sigma(a^+)_{r_1} - \sigma(a^-)_{s_0} > \beta$; that is,

$$\sum_{k=0}^{r_0} a^+_k - \sum_{k=0}^{s_0} a^-_k + \sum_{k=r_0+1}^{r_1} a^+_k > \beta.$$

In other words, we have added just enough positive terms to just bring the sum up past $\beta$. The difference between this sum and $\beta$ is at most $a^+_{r_1}$. Proceeding in this way we see we get a rearrangement of the original series having the specified properties.

Let us now proceed to write all this down in a formal manner following the informal proof as an outline. We shall break the proof into a number of parts.
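The informal construction can be imitated numerically. The sketch below is an illustration only, not the formal proof that follows: it uses floating-point arithmetic, the name `rearranged_partial_sums` is hypothetical, and it assumes that `term(k)` describes a conditionally convergent series with infinitely many nonnegative and infinitely many negative terms.

```python
import itertools


def rearranged_partial_sums(term, alpha, beta, count):
    """Greedy rearrangement in the spirit of the informal proof.

    term(k) is the k-th term a_k for k = 0, 1, 2, ...
    Nonnegative terms are added until the running sum exceeds beta,
    then negative terms are added until it drops below alpha, and so on.
    Returns the first `count` partial sums of the rearranged series.
    """
    # Lazily enumerate the nonnegative and the negative terms in their original order.
    nonneg = (term(k) for k in itertools.count() if term(k) >= 0)
    neg = (term(k) for k in itertools.count() if term(k) < 0)
    sums, total, going_up = [], 0.0, True
    while len(sums) < count:
        if going_up:
            total += next(nonneg)
            if total > beta:
                going_up = False
        else:
            total += next(neg)
            if total < alpha:
                going_up = True
        sums.append(total)
    return sums
```

Applied to the terms $(-1)^k/(k+1)$ with, say, $\alpha = 0.5$ and $\beta = 1.5$, the partial sums swing back and forth across the two targets, and each overshoot is no larger than the last term added, which is the observation the formal proof makes precise.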
(a) The first step is to get the subsequences $(a^+_n)$ and $(a^-_n)$. For this purpose consider the sets

$$A^+ = \{n: a_n \ge 0\}, \qquad A^- = \{n: a_n < 0\}.$$

The sequence $(a^+_k)$ is obtained by relabeling the elements in the set $\{a_n: n \in A^+\}$ and $(-a^-_n)$ is obtained by relabeling the elements in the set $\{a_n: n \in A^-\}$. The sets $A^+$ and $A^-$ are both denumerable subsets of $N_0$. Indeed, if either $A^+$ or $A^-$ is finite, all but a finite number of elements of the range of $(a_n)$ are negative or nonnegative, respectively. In either event $\sigma(|a|)$ is convergent, contradicting the original hypothesis. Clearly $A^+ \cap A^- = \emptyset$ and $N_0 = A^+ \cup A^-$.

The functions that relabel the elements of $(a_n)$ to give $(a^+_k)$ and $(a^-_k)$ are the functions $\Phi^+$ and $\Phi^-$ having domains $N_0$ and $N$ and ranges $A^+$ and $A^-$, respectively, so that $\Phi^+(0) = \min A^+$, $\Phi^-(1) = \min A^-$, and

$$\Phi^{\pm}(k+1) = \min \{n: n \in A^{\pm} \ \&\ n > \Phi^{\pm}(k)\}.$$

The existence and uniqueness of such functions can be proved by induction in exactly the same manner as done in the proof of Theorem 1.6.4. They clearly are monotone increasing functions. Now, define

$$a^+_k = a \circ \Phi^+(k), \qquad a^-_k = -a \circ \Phi^-(k), \qquad a^-_0 = 0.$$

Clearly $a^{\pm}_k \ge 0$ and hence $\sigma(a^+)$ and $\sigma(a^-)$ are monotone nondecreasing.

Actually, both of these sequences are unbounded. For if, for example, $\sigma(a^+)$ is bounded it is convergent. Now, if $n$ is sufficiently large, the set

$$\{k: \Phi^+(k) \in \langle 0, \Phi^-(n)\rangle\}$$

is not void, and hence has a maximum which we label $m_n$. Clearly $m_n \to \infty$ as $n \to \infty$, since otherwise $\mathcal{R}(\Phi^+)$ is finite. Now, $\mathcal{R}(\Phi^+|\langle 0, m_n\rangle) = A^+ \cap \langle 0, \Phi^-(n)\rangle$, and $\mathcal{R}(\Phi^-|\langle 1, n\rangle) = A^- \cap \langle 0, \Phi^-(n)\rangle$. These sets, which we shall label $B_n^+$ and $B_n^-$, respectively, are disjoint and together give $\langle 0, \Phi^-(n)\rangle$. Thus

$$\sigma(a)_{\Phi^-(n)} = \sum_{k \in B_n^+} a_k + \sum_{k \in B_n^-} a_k = \sigma(a^+)_{m_n} - \sigma(a^-)_n.$$

Since $\sigma(a)$ and $\sigma(a^+)$ converge, it follows that $\sigma(a^-)$ is convergent. But since

$$\sigma(|a|)_{\Phi^-(n)} = \sigma(a^+)_{m_n} + \sigma(a^-)_n,$$

it follows that $\sigma(|a|)$ is convergent, which is a contradiction.

(b) We shall now formalize the process of adding blocks of the sequences $(a^+_n)$ and $(-a^-_n)$. Let $\Psi$ be the function with domain $N_0$ and range in $N_0$ which satisfies the following:

$$\begin{aligned}
\Psi(0) &= \min \{k: \sigma(a^+)_k > \beta\},\\
\Psi(1) &= \min \{k: \sigma(a^+)_{\Psi(0)} - \sigma(a^-)_k < \alpha\},\\
\Psi(2) &= \min \{k: \sigma(a^+)_k - \sigma(a^-)_{\Psi(1)} > \beta\},\\
&\ \ \vdots\\
\Psi(2n) &= \min \{k: \sigma(a^+)_k - \sigma(a^-)_{\Psi(2n-1)} > \beta\},\\
\Psi(2n+1) &= \min \{k: \sigma(a^+)_{\Psi(2n)} - \sigma(a^-)_k < \alpha\}.
\end{aligned}$$

The existence and uniqueness of the function $\Psi$ can easily be proved using the axiom of induction and the fact that $\sigma(a^+)$ and $\sigma(a^-)$ are unbounded. It is clear that the sequences $(\Psi(2n))$ and $(\Psi(2n+1))$ are monotone increasing.

For the sake of notational convenience let us set, for $n \in N_0$,

$$r_n = \Psi(2n), \qquad s_n = \Psi(2n+1).$$

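From the definition of $\Psi$ we see that $\forall n \in N_0$,

$$\begin{aligned}
\sigma(a^+)_{r_{n+1}-1} - \sigma(a^-)_{s_n} &\le \beta < \sigma(a^+)_{r_{n+1}} - \sigma(a^-)_{s_n},\\
\sigma(a^+)_{r_n} - \sigma(a^-)_{s_n} &< \alpha \le \sigma(a^+)_{r_n} - \sigma(a^-)_{s_n - 1}.
\end{aligned} \tag{3.1.1}$$

Now $\sigma(a^+)_{r_{n+1}-1} = \sigma(a^+)_{r_{n+1}} - a^+_{r_{n+1}}$ and $\sigma(a^-)_{s_n-1} = \sigma(a^-)_{s_n} - a^-_{s_n}$. Since $\sigma(a)$ converges, $a_n \to 0$ as $n \to \infty$, and hence $a^+_n \to 0$ and $a^-_n \to 0$ as $n \to \infty$. But $r_n \to \infty$ and $s_n \to \infty$ as $n \to \infty$. Hence $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \implies a^+_{r_{n+1}} < \varepsilon$ and $a^-_{s_n} < \varepsilon$. Thus from (3.1.1) we see that $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \implies$

$$\begin{aligned}
\beta &< \sigma(a^+)_{r_{n+1}} - \sigma(a^-)_{s_n} < \beta + \varepsilon,\\
\alpha - \varepsilon &< \sigma(a^+)_{r_n} - \sigma(a^-)_{s_n} < \alpha.
\end{aligned} \tag{3.1.2}$$

(c) We shall now define the one-to-one function that gives the proper rearrangement of the series $(a, \sigma(a))$. Let us put, for $j \in N_0$,

$$n_{-1} = r_0, \qquad n_{2j} = s_j + r_j, \qquad n_{2j+1} = s_j + r_{j+1}. \tag{3.1.3}$$

Since $(s_j)$ and $(r_j)$ are monotone increasing, the sequence $(n_m)$ is monotone increasing. Let us put, for $m \in N_0$,

$$B_{-1} = \{k: 0 \le k \le r_0\}, \qquad B_m = \{k: n_{m-1} < k \le n_m\}.$$

These sets are certainly pairwise disjoint and

$$N_0 = \bigcup \{B_k: k \in N_0 \cup \{-1\}\}.$$

Let $\Phi$ be that function, with domain $N_0$, defined as follows:

$$\Phi(k) = \begin{cases} \Phi^+(k - s_{j-1}) & \Longleftarrow\ k \in B_{2j-1},\\ \Phi^-(k - r_j) & \Longleftarrow\ k \in B_{2j}, \end{cases} \tag{3.1.4}$$

where we set $s_{-1} = 0$. Let us rewrite this in a slightly different form which will be useful later. We have

$$\begin{aligned}
\Phi(s_{j-1} + k) &= \Phi^+(k) \quad \text{for } r_{j-1} < k \le r_j,\\
\Phi(r_j + k) &= \Phi^-(k) \quad \text{for } s_{j-1} < k \le s_j,
\end{aligned} \tag{3.1.4'}$$

where we set $r_{-1} = -1$, and recall that $s_{-1} = 0$.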


Let us put

$$B^+ = \bigcup \{B_{2j-1}: j \in N_0\}, \qquad B^- = \bigcup \{B_{2j}: j \in N_0\}.$$

Then $B^+ \cap B^- = \emptyset$ and $B^+ \cup B^- = N_0$. The range of $\Phi|B^+$ is $A^+$ and the range of $\Phi|B^-$ is $A^-$. This is immediately clear from (3.1.4') and the fact that $\mathcal{R}(\Phi^+) = A^+$ and $\mathcal{R}(\Phi^-) = A^-$. Thus $\mathcal{R}(\Phi) = A^+ \cup A^- = N_0$.

Further, we see that $\Phi|B^+$ and $\Phi|B^-$ are monotone functions. Indeed, suppose $p \in B_{2i-1}$, $q \in B_{2j-1}$ and $p < q$. We have $n_{2(i-1)} < p < q \le n_{2j-1}$, and since the sequence $(n_m)$ is monotone increasing we have that $2(i - 1) < 2j - 1$, which implies that $i \le j$. Hence, since the sequence $(r_k)$ is monotone increasing we have $p - s_{i-1} \le r_i \le r_{j-1} < q - s_{j-1}$ if $i < j$, and $p - s_{i-1} < q - s_{i-1}$ if $i = j$. Thus

$$\Phi(p) = \Phi^+(p - s_{i-1}) < \Phi^+(q - s_{j-1}) = \Phi(q).$$

In the same way we show that $\Phi|B^-$ is monotone. These facts, together with the fact that $\mathcal{R}(\Phi|B^+) \cap \mathcal{R}(\Phi|B^-) = \emptyset$, show that $\Phi$ is one to one. Hence, we have shown that $\Phi$ is a one-to-one function with domain and range $N_0$.
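(d) Let us now bring all these things together and show that

$$\liminf_{n\to\infty} \sigma(a \circ \Phi)_n = \alpha, \qquad \limsup_{n\to\infty} \sigma(a \circ \Phi)_n = \beta.$$

We shall prove only the second one, since the first one will follow by similar reasoning.

Suppose we take $m$ odd, $m = 2j + 1$. Then we can write $\sigma(a \circ \Phi)_{n_m}$ as a "telescoping sum" in the following way:

$$\sigma(a \circ \Phi)_{n_m} = \sigma(a \circ \Phi)_{r_0} + \sum_{k=0}^{j} [\sigma(a \circ \Phi)_{n_{2k}} - \sigma(a \circ \Phi)_{n_{2k-1}}] + \sum_{k=0}^{j} [\sigma(a \circ \Phi)_{n_{2k+1}} - \sigma(a \circ \Phi)_{n_{2k}}].$$

Now,

$$\sigma(a \circ \Phi)_{n_{2k}} - \sigma(a \circ \Phi)_{n_{2k-1}} = \sum_{i = s_{k-1} + r_k + 1}^{s_k + r_k} a \circ \Phi(i).$$

By (3.1.4') we have $a \circ \Phi(r_k + i) = a \circ \Phi^-(i) = -a^-_i$ for $s_{k-1} < i \le s_k$. Thus we get

$$\sigma(a \circ \Phi)_{n_{2k}} - \sigma(a \circ \Phi)_{n_{2k-1}} = -\sum_{i = s_{k-1}+1}^{s_k} a^-_i = -[\sigma(a^-)_{s_k} - \sigma(a^-)_{s_{k-1}}].$$

Hence

$$\sum_{k=0}^{j} [\sigma(a \circ \Phi)_{n_{2k}} - \sigma(a \circ \Phi)_{n_{2k-1}}] = -\sum_{k=0}^{j} [\sigma(a^-)_{s_k} - \sigma(a^-)_{s_{k-1}}] = -\sigma(a^-)_{s_j}.$$

(Recall, we have taken $a^-_0 = 0$.) By exactly similar reasoning we find that

$$\sigma(a \circ \Phi)_{n_{2k+1}} - \sigma(a \circ \Phi)_{n_{2k}} = \sigma(a^+)_{r_{k+1}} - \sigma(a^+)_{r_k},$$

and hence

$$\sum_{k=0}^{j} [\sigma(a \circ \Phi)_{n_{2k+1}} - \sigma(a \circ \Phi)_{n_{2k}}] = \sigma(a^+)_{r_{j+1}} - \sigma(a^+)_{r_0}.$$

Consequently, since $\sigma(a \circ \Phi)_{r_0} = \sigma(a^+)_{r_0}$, we get

$$\sigma(a \circ \Phi)_{n_{2j+1}} = \sigma(a^+)_{r_{j+1}} - \sigma(a^-)_{s_j}. \tag{3.1.5}$$

If $n_{m-1} < n \le n_{m+1}$ we have

$$\sum_{k=0}^{n} a \circ \Phi(k) = \sum_{k=0}^{n_{m-1}} a \circ \Phi(k) + \sum_{k = n_{m-1}+1}^{n} a \circ \Phi(k).$$

Now, $n_{m-1} = n_{2j}$, and for $n_{2j} < k \le n_{2j+1}$ we have $a \circ \Phi(k) = a \circ \Phi^+(k - s_j) = a^+_{k - s_j} \ge 0$. Also, for $n_{2j+1} < k \le n_{2j+2}$ we have $a \circ \Phi(k) = a \circ \Phi^-(k - r_{j+1}) = -a^-_{k - r_{j+1}} \le 0$. Hence, in either case, $n < n_m$ or $n_m \le n$, we get

$$\sum_{k=0}^{n} a \circ \Phi(k) \le \sum_{k=0}^{n_m} a \circ \Phi(k) = \sigma(a \circ \Phi)_{n_{2j+1}}. \tag{3.1.6}$$

If we combine (3.1.2) with (3.1.5) and (3.1.6) we have $\forall \varepsilon > 0$, $\exists N$ so that $n \ge n_{2N} \implies$

$$\sum_{k=0}^{n} a \circ \Phi(k) < \beta + \varepsilon.$$

This means that $\limsup_{n\to\infty} \sigma(a \circ \Phi)_n \le \beta$. On the other hand, from (3.1.2), $\forall j \in N_0$, $\beta < \sigma(a \circ \Phi)_{n_{2j+1}}$. Thus we see that

$$\limsup_{n\to\infty} \sigma(a \circ \Phi)_n \ge \beta.$$

REMARK: It is clear from the proof of the last theorem that we could take $\beta = \infty$ or $\alpha = -\infty$. That is, we could find a rearrangement so that $\sigma(a \circ \Phi)$ is unbounded above or unbounded below.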

Exercises

1. If $(a, \sigma(a))$ is absolutely convergent, and $b$ is a subsequence of $a$, show that $(b, \sigma(b))$ is convergent.

2. If $r \ne 1$, show that $\forall n \in N_0$,

$$\sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r}.$$

Hence give the values of $r$ for which the series $\sum_{k=0}^{\infty} (r^k)$ will converge and the values for which it will diverge.

3. Let $(b_n)$ be the subsequence of $(a_n)$ consisting of the nonzero terms of the latter. Prove that the series $(a, \sigma(a))$ and $(b, \sigma(b))$ converge or diverge simultaneously.

4. If $\sum_{k=0}^{\infty} (a_k)$ is convergent, show that $\forall n \in N_0$, $\sum_{k=n}^{\infty} (a_k)$ is convergent.

5. If $(a, \sigma(a))$ is convergent and $b$ is a subsequence of $a$, is it necessarily true that $(b, \sigma(b))$ is convergent?

6. Show that

$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1.$$

[Hint: $\dfrac{1}{n(n+1)} = \dfrac{1}{n} - \dfrac{1}{n+1}$.]

7. Show that

$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)(n+2)} = \frac{1}{4}.$$
8.

Determine whether the series

is convergent or divergent.

9.

If $\lim_{n\to\infty} a_n$ exists, show that the sequence defined by

$$b_n = \frac{1}{n+1} \sum_{k=0}^{n} a_k$$

converges to the same limit. This is called Cesàro summability.

10. If $(a_n)$ is a monotone nonincreasing sequence of nonnegative terms and $\sigma(a)$ converges, show that $n a_n \to 0$. This result is often called Pringsheim's theorem.

I I.

Show that the following two series converge or diverge simul

taneously, provided

Vk

00

N0, ak 0.

L ((k+ l)(a2k+a 2k+1)),


k=O
I2.

If

00

( 2k )

o ; a1

$(a_k)$ is a sequence with nonnegative terms and $(a, \sigma(a))$ is divergent, show that there is a monotone nonincreasing sequence $(\varepsilon_n)$ that converges to zero and

$$\sum_{k=0}^{\infty} (\varepsilon_k a_k)$$

is divergent.

3.2 CONVERGENCE TESTS

We shall first give a number of tests for the absolute convergence of series.

3.2.1 Comparison Test. Suppose $(a_n)$ and $(b_n)$ are sequences for which there is an $N$ so that $n \ge N \implies |a_n| \le |b_n|$. Then if $\sigma(|b|)$ is convergent, so is $\sigma(|a|)$, and consequently if $\sigma(|a|)$ is divergent, so is $\sigma(|b|)$.

Proof. Fix $n_0 \ge N$ and for $n > n_0$ we have

$$0 \le \sigma(|a|)_n - \sigma(|a|)_{n_0} = \sum_{k=n_0+1}^{n} |a_k| \le \sum_{k=n_0+1}^{n} |b_k| = \sigma(|b|)_n - \sigma(|b|)_{n_0}.$$

Hence, if $\sigma(|b|)$ is convergent it is bounded, and thus $\sigma(|a|)$ is bounded and convergent.
As an example of the comparison test, let us first examine the convergence of the geometric series

$$\sum_{k=0}^{\infty} (r^k).$$

We know (see Exercise 2 of Section 3.1) that for $r \ne 1$,

$$\sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r}.$$

Hence, if $|r| < 1$, $r^n \to 0$ and the geometric series converges to $1/(1 - r)$. If $|r| \ge 1$, the series diverges since clearly $\lim_{n\to\infty} r^n \ne 0$.

Now, since $n! = 2 \cdot 3 \cdots n \ge 2^{n-1}$, it follows that $1/n! \le (1/2)^{n-1}$. Consequently,

$$\sum_{n=1}^{\infty} \left(\frac{1}{n!}\right)$$

converges by the comparison test.
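Both facts used in this example, the closed form for the geometric partial sums and the bound $n! \ge 2^{n-1}$, are easy to confirm for small cases. The sketch below is illustrative only; the function names are not from the text, and exact fractions are used so that the closed form can be compared for equality.

```python
from fractions import Fraction
from math import factorial


def geometric_partial_sum(r, n):
    """Return r^0 + r^1 + ... + r^n."""
    return sum(r ** k for k in range(n + 1))


def geometric_closed_form(r, n):
    """Return (1 - r^(n+1)) / (1 - r), valid for r != 1."""
    return (1 - r ** (n + 1)) / (1 - r)


def factorial_dominated(n):
    """Check the comparison 1/n! <= (1/2)^(n-1) for a single n >= 1."""
    return Fraction(1, factorial(n)) <= Fraction(1, 2) ** (n - 1)
```

A finite check of this kind proves nothing about the limit, but it does guard against sign or index slips when the formula is copied.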

3.2.2 Cauchy's Root Test. Let $(a_n)$ be a sequence and†

$$\rho = \limsup_{n\to\infty} |a_n|^{1/n}.$$

Then $\sigma(|a|)$ converges if $\rho < 1$ and diverges if $\rho > 1$.

† If $|a_n|^{1/n}$ is unbounded, say that $\rho = \infty$ and consider $\rho > 1$.

Proof. If $\rho < 1$, $\exists \varepsilon > 0$ so that $\rho + \varepsilon < 1$. By the definition of limit superior, $\exists N$ so that $n \ge N \implies$

$$|a_n|^{1/n} < \rho + \varepsilon,$$

or, what is the same thing,

$$|a_n| < (\rho + \varepsilon)^n.$$

By the comparison test (with the geometric series) $\sigma(|a|)$ converges.

If $1 < \rho < \infty$, $\exists \varepsilon > 0$ so that $\rho - \varepsilon > 1$. By the definition of limit superior, $\forall N$, $\exists n \ge N$ so that

$$|a_n|^{1/n} > \rho - \varepsilon,$$

or, what is the same thing,

$$|a_n| > (\rho - \varepsilon)^n > 1.$$

Consequently, it is not true that $|a_n| \to 0$ and thus $\sigma(|a|)$ cannot converge. If $\rho = \infty$, the sequence $|a_n|^{1/n}$ is unbounded and clearly we get the same result.

Let us prove the convergence of $\sum_{k=1}^{\infty} (1/k!)$ by means of the root test. First, if $k \in N$ and $n > k$, we have

$$n! > k^{n-k+1}.$$

Consequently, for all sufficiently large $n$,

$$(n!)^{1/n} > k \cdot k^{(1-k)/n} > k/2.$$

This shows that

$$\lim_{n\to\infty} \left(\frac{1}{n!}\right)^{1/n} = 0,$$

and the series converges by the root test.

3.2.3 D'Alembert's Ratio Test. Suppose $(a_n)$ is a sequence for which $\exists N$ so that $n \ge N \implies a_n \ne 0$. Define

$$R = \limsup_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|, \qquad r = \liminf_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|.$$

If $R < 1$, $\sigma(|a|)$ is convergent and if $r > 1$, $\sigma(|a|)$ is divergent.

Proof. This is an immediate consequence of the root test and the chain of inequalities,

$$\liminf_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| \le \liminf_{n\to\infty} |a_n|^{1/n} \le \limsup_{n\to\infty} |a_n|^{1/n} \le \limsup_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|.$$

We shall leave the proof of these inequalities to Exercise 1 at the end of the section.

The convergence of the series of factorials that we have considered previously is also an immediate consequence of the ratio test since

$$\frac{n!}{(n+1)!} = \frac{1}{n+1} \to 0.$$

The ratio test, when it works, is usually very easy to apply and hence
is very useful. When it fails it may be possible to apply a slightly sharper
variant of it.

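3.2.4 Raabe's Test. Suppose $(a_n)$ is a sequence for which $\exists N$ so that $n \ge N \implies a_n \ne 0$. Let us put

$$\alpha = \liminf_{n\to\infty} n\left(1 - \left|\frac{a_{n+1}}{a_n}\right|\right), \qquad \beta = \limsup_{n\to\infty} n\left(1 - \left|\frac{a_{n+1}}{a_n}\right|\right).$$

If $\alpha > 1$, $\sigma(|a|)$ is convergent and if $\beta < 1$, $\sigma(|a|)$ is divergent.

Proof. If $\alpha > 1$, $\forall \rho$ so that $\alpha > \rho > 1$, $\exists K$ so that $k \ge K \implies$

$$\left|\frac{a_{k+1}}{a_k}\right| < 1 - \frac{\rho}{k}.$$

Thus $k|a_{k+1}| < (k - 1)|a_k| + (1 - \rho)|a_k|$, or, rearranging terms, we get

$$(\rho - 1)|a_k| < (k - 1)|a_k| - k|a_{k+1}|.$$

Hence $k|a_{k+1}|$ is monotone decreasing as $k$ increases and consequently it is bounded. If we sum the above inequality we get, for $\nu \ge K$,

$$(\rho - 1) \sum_{k=\nu}^{n} |a_k| < (\nu - 1)|a_\nu| - n|a_{n+1}|.$$

Therefore, $\sigma(|a|)$ is bounded and thus convergent.

Suppose now that $\beta < 1$. Then $\forall \rho$ so that $\beta < \rho < 1$, $\exists K$ so that $k \ge K \implies$

$$\left|\frac{a_{k+1}}{a_k}\right| > 1 - \frac{\rho}{k}.$$

Therefore,

$$k|a_{k+1}| > (k - \rho)|a_k| > (k - 1)|a_k|,$$

so that $k|a_{k+1}|$ is monotone increasing as $k$ increases. For fixed $\nu \ge K$ and $k \ge \nu$ we get

$$|a_{k+1}| > (\nu - 1)|a_\nu|/k.$$

Consequently, $\sigma(|a|)$ diverges by comparison with the harmonic series.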

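Both the root test and the ratio test fail for the series

$$\sum_{k=1}^{\infty} \left(\frac{1}{k^2}\right).$$

However, Raabe's test will show this is convergent. Indeed,

$$\left(\frac{k+1}{k}\right)^2 = 1 + \frac{2}{k} + \frac{1}{k^2} > 1 + \frac{2}{k}.$$

Thus

$$1 - \left(\frac{k}{k+1}\right)^2 > 1 - \frac{1}{1 + 2/k} = \frac{2}{k+2},$$

and hence

$$\lim_{k\to\infty} (k + 2)\left[1 - \left(\frac{k}{k+1}\right)^2\right] = \lim_{k\to\infty} k\left[1 - \left(\frac{k}{k+1}\right)^2\right] \ge 2.$$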
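The quantity that Raabe's test examines for the terms $a_k = 1/k^2$ can be tabulated exactly. This is only an illustrative sketch; `raabe_quantity` is a hypothetical name, and the function is specific to this one series.

```python
from fractions import Fraction


def raabe_quantity(k):
    """Return k * (1 - a_{k+1}/a_k) for a_k = 1/k^2, as an exact fraction."""
    ratio = Fraction(k, k + 1) ** 2  # a_{k+1} / a_k = (k / (k + 1))^2
    return k * (1 - ratio)
```

The values increase toward 2 (for instance, `raabe_quantity(1)` is 3/4), so the limit inferior exceeds 1 and the test applies, while the ratio $(k/(k+1))^2$ itself tends to 1, which is why the ratio test gives no information.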

3.2.5 Cauchy's Condensation Test. If $(a_n)$ is a monotone nonincreasing sequence with nonnegative terms, then the following series converge or diverge simultaneously:

$$\sum_{k=0}^{\infty} (a_k), \qquad \sum_{k=0}^{\infty} (2^k a_{2^k}).$$

Proof. The sequences of partial sums of these series are monotone nondecreasing. If a monotone nondecreasing sequence has a subsequence that is bounded, then the sequence is bounded and thus convergent. If the sequence is divergent, then every subsequence is unbounded.

Using the monotone nonincreasing character of $(a_n)$ we get

$$\sum_{j=2^k}^{2^{k+1}-1} a_j \le 2^k a_{2^k} \le \sum_{j=2^{k-1}}^{2^k-1} 2a_j,$$

where the sum on the right is taken to be $2a_0$ if $k = 0$. Summing from $k = 0$ to $n$ we get

$$\sigma(a)_{2^{n+1}-1} \le a_0 + \sum_{k=0}^{n} 2^k a_{2^k} \le 2\sigma(a)_{2^n-1}.$$

This set of inequalities when taken together with the comments in the first paragraph constitute the proof.
As an example of the use of the condensation test, let us consider the convergence properties of the $p$ series,

$$\sum_{n=1}^{\infty} \left(\frac{1}{n^p}\right).$$

Recall that for $p = 1$ we called this the harmonic series and showed it diverged. Hence the comparison test shows that the $p$ series diverges for $p < 1$. We can apply the condensation test only for $p \ge 0$, since otherwise the sequence $(1/n^p)$ is increasing. For $p \ge 0$ we must examine the convergence of the series

$$\sum_{k=0}^{\infty} \left(\frac{1}{2^{(p-1)k}}\right).$$

This is a geometric series with $r = 1/2^{p-1}$. Hence, if $p > 1$, $0 \le r < 1$, and the series converges. If $p \le 1$, $r \ge 1$, and the series diverges. Thus the $p$ series converges for $p > 1$ and diverges for $p \le 1$.
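The two-sided estimate behind the condensation test, $\sigma(a)_{2^{n+1}-1} \le a_0 + \sum_{k=0}^{n} 2^k a_{2^k} \le 2\sigma(a)_{2^n-1}$, can be checked on a concrete sequence. The sketch below is an illustration under an assumption not made in the text: the sequence $a_k = 1/(k+1)^2$ is used (a shifted $p$ series with $p = 2$) so that $a_0$ is defined. The function names are hypothetical.

```python
from fractions import Fraction


def partial_sum(a, n):
    """Return sigma(a)_n = a(0) + a(1) + ... + a(n)."""
    return sum(a(k) for k in range(n + 1))


def condensed_sum(a, n):
    """Return the sum of 2^k * a(2^k) for k = 0, ..., n."""
    return sum(2 ** k * a(2 ** k) for k in range(n + 1))


def shifted_p_series_term(k):
    """Assumed example sequence: a_k = 1/(k + 1)^2, nonincreasing and nonnegative."""
    return Fraction(1, (k + 1) ** 2)
```

Because both outer sums are partial sums of the same series, boundedness of one side forces boundedness of the condensed sums and conversely, which is all the proof requires.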
There is another very useful test for absolute convergence called the integral test. We shall take this up in Chapter 5. For now let us turn to some other convergence tests which are not specifically tests for absolute convergence. Let us first prove a result called the Abel summation formula. In Chapter 5 we shall recognize this formula as a special case of integration by parts for Riemann-Stieltjes integrals.
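3.2.6 Lemma (Abel). For every pair of sequences $(a_n)$ and $(b_n)$,

$$\sum_{k=0}^{n} a_k b_k = b_{n+1}\sigma(a)_n - \sum_{k=0}^{n} (b_{k+1} - b_k)\sigma(a)_k. \tag{3.2.1}$$

Proof. We have, upon setting $\sigma(a)_{-1} = 0$,

$$a_k = \sigma(a)_k - \sigma(a)_{k-1}.$$

Therefore,

$$\sum_{k=0}^{n} a_k b_k = \sum_{k=0}^{n} b_k \sigma(a)_k - \sum_{k=1}^{n} b_k \sigma(a)_{k-1}. \tag{3.2.2}$$

The last sum begins at $k = 1$, since $\sigma(a)_{-1} = 0$. This last sum can also be written

$$\sum_{k=0}^{n} b_{k+1}\sigma(a)_k - b_{n+1}\sigma(a)_n.$$

If we put this into (3.2.2), we get (3.2.1).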
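Since the Abel summation formula is a finite identity, it can be verified directly on sample data. The following is a minimal sketch with illustrative function names; the sequences are passed as Python lists, and `b` must have at least `n + 2` entries because the formula uses $b_{n+1}$.

```python
from fractions import Fraction


def partial_sums(a):
    """Return the list of partial sums sigma(a)_0, sigma(a)_1, ..."""
    out, total = [], 0
    for x in a:
        total += x
        out.append(total)
    return out


def abel_left(a, b, n):
    """Left side of the formula: sum of a_k * b_k for k = 0, ..., n."""
    return sum(a[k] * b[k] for k in range(n + 1))


def abel_right(a, b, n):
    """Right side: b_{n+1} * sigma(a)_n - sum of (b_{k+1} - b_k) * sigma(a)_k."""
    s = partial_sums(a)
    return b[n + 1] * s[n] - sum((b[k + 1] - b[k]) * s[k] for k in range(n + 1))
```

The formula trades the terms $a_k$ for their partial sums and the terms $b_k$ for their differences, which is exactly the exchange exploited in the tests that follow.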

3.2.7 Abel's Test. If $(a, \sigma(a))$ is a convergent series and $(b_n)$ is a monotonic convergent sequence, then

$$\sum_{k=0}^{\infty} (a_k b_k)$$

is convergent.

Proof. Since $\sigma(a)$ is convergent and $(b_n)$ is convergent, it follows that $(\sigma(a)_n b_{n+1})$ is a convergent sequence. Also $\sigma(a)$ is bounded, say, by $M$, and hence, since $(b_n)$ is monotone,

$$\sum_{k=0}^{n} |b_{k+1} - b_k|\,|\sigma(a)_k| \le M \sum_{k=0}^{n} |b_{k+1} - b_k| = M\,|b_{n+1} - b_0|.$$

Since $(b_n)$ is convergent, the sums on the left are bounded, and since they are monotone nondecreasing they are convergent. Thus by using (3.2.1) we have completed the proof.


3.2.8 Dirichlet's Test. If $\sigma(a)$ is bounded and $(b_n)$ is nonincreasing to zero, then

$$\sum_{k=0}^{\infty} (a_k b_k)$$

is convergent.

Proof. The fact that

$$\sum_{k=0}^{\infty} ((b_{k+1} - b_k)\sigma(a)_k)$$

is convergent follows in exactly the same way as the previous proof. If $M$ is a bound for $\sigma(a)$, then

$$|\sigma(a)_n b_{n+1}| \le M|b_{n+1}|,$$

and since $b_n \to 0$, $\sigma(a)_n b_{n+1} \to 0$. These facts in conjunction with Abel's summation formula complete the proof.


As we saw in Proposition 3.1.2, an infinite series $(a, \sigma(a))$ is convergent only if $a_n \to 0$. We also gave an example, the harmonic series, which showed that in general this is not a sufficient condition for convergence. However, under certain circumstances it is a sufficient condition, as the next result shows.

3.2.9 Leibnitz' Test. If $(a_n)$ is nonincreasing to zero, then the series

$$\sum_{k=0}^{\infty} ((-1)^k a_k)$$

is convergent.

Proof. This is an immediate consequence of Dirichlet's test. Indeed, $(a_n)$ is nonincreasing to zero and the sequence with terms

$$\sum_{k=0}^{n} (-1)^k$$

is bounded by 1.

As an example of the use of Dirichlet's test we consider the series

$$\sum_{k=1}^{\infty} \left(\frac{\sin kx}{k}\right), \qquad x \in\ ]-\infty, \infty[.$$

Recall the trigonometric identity

$$2 \sin (x/2) \sin kx = \cos (k - 1/2)x - \cos (k + 1/2)x.$$

Summing both sides we get

$$2 \sin (x/2) \sum_{k=1}^{n} \sin kx = \cos (x/2) - \cos (n + 1/2)x.$$

Thus, if $x \ne 2\pi m$, $\left(\sum_{k=1}^{n} \sin kx\right)$ is a bounded sequence. Since $1/k \to 0$ we may apply Dirichlet's test and we see the original series converges. If $x = 2\pi m$, the series converges to zero.
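The closed form for the sums of $\sin kx$, and the bound $1/|\sin(x/2)|$ that it implies, can be checked numerically. This is a floating-point sketch with hypothetical function names; it assumes that $x$ is not a multiple of $2\pi$, so that the division is legitimate.

```python
import math


def sine_partial_sum(x, n):
    """Return sin(x) + sin(2x) + ... + sin(nx)."""
    return sum(math.sin(k * x) for k in range(1, n + 1))


def sine_sum_closed_form(x, n):
    """Return (cos(x/2) - cos((n + 1/2) x)) / (2 sin(x/2)); requires sin(x/2) != 0."""
    return (math.cos(x / 2) - math.cos((n + 0.5) * x)) / (2 * math.sin(x / 2))
```

The numerator never exceeds 2 in absolute value, so the partial sums stay within $1/|\sin(x/2)|$ of zero for every $n$, which is the boundedness hypothesis of Dirichlet's test.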

O Exercises
1.

Prove the chain of inequalities in the proof of the D'Alembert

ratio test,

2.

Test the following series for convergence:


(a)

(b)

3.

3.2.3.

o (i : )
2 (
)
k2

k3
1 + k! .

00

k=O

Find all values of a for which the following series converge:


00

(a)

(b)

(c)
4.

k =l

(akka).

i( )
i C10 )
k( lo k)a
k)a

Test the following series for convergence:


00

(a)

(b)

(c)

k=I

( (-l)k21tk).

0 (
(

).
)

(-l)k [v'k+T -VkJ


(-l)k [211k

211<k+i>J .

5.

Show that the following series is convergent by using the Abel

summation formula.

fG
k =I

6.

log

(I+ l/k )

If $(a, \sigma(a))$ is an absolutely convergent series, show that it is always possible to find an unbounded monotone nondecreasing sequence $(w_k)$ so that

$$\sum_{k=0}^{\infty} (w_k a_k)$$

is absolutely convergent.

7. Show that there is a divergent series $(a, \sigma(a))$ so that for every monotone nonincreasing sequence $(\varepsilon_n)$ which converges to zero,

$$\sum_{k=0}^{\infty} (\varepsilon_k a_k)$$

is convergent. Compare with Exercise 12 of Section 3.1.

8. Give another proof of Leibnitz' test by taking $b_k = (-1)^k a_k$ and showing that the sequence $(\sigma(b)_{2n+1})$ is monotone nondecreasing, the sequence $(\sigma(b)_{2n})$ is monotone nonincreasing, and noting that

$$\sigma(b)_{2n+1} = \sigma(b)_{2n} - a_{2n+1}.$$


9.

Test the following series for convergence:


(a)
(b)

IO.

(
:i (
k =O

(-l )k 2-1 1k /(k + 1) .


(- l )k

(Vk+i - vk) .

Test the following series for convergence:


co

(a)

L
k =l

<3-11k [211k

2 11< k + l ) J) .

11. Prove the limit comparison test: If $(a_n)$ and $(b_n)$ are sequences with positive terms and $\lim_{n\to\infty} (a_n/b_n)$ exists and is different from zero, then $\sigma(a)$ and $\sigma(b)$ converge or diverge simultaneously.

12. Suppose $\limsup_{n\to\infty} |a_n|^{1/n} = \rho < 1$. If $s = \lim_{n\to\infty} \sigma(a)_n$, show that $\forall r$ for which $\rho < r < 1$, there is an $N$ so that $n \ge N \implies$

$$|s - \sigma(a)_n| < \frac{r^{n+1}}{1 - r}.$$


13.
that

Suppose limn-

Vr for which

p <

co

lan+ifanl = p
<

1,

Is - <T(a)nl
14.

Suppose limn-co

Show that

Vr for which

<

lanl

n(l - lan+1/anl ) = p > 1 and s


limn-co <T(a)n.
p > r > 1, 3N so that n:;;. N
=

Is - <T(a)nl
15.

l, and s = limn-co <T(a)n Show


n:;;. N

<

3N so that

<

n
r-

--

lan+1I

{an) and ( bn) satisfy the hypotheses of


s =limn-co <T(ab)n, find an error estimate for

Suppose the series

Dirichlet's

test.

Is - <T(ab)nl .
16.

Suppose

If

(an) satisfies the conditions of Leibnitz' test. Show that

(-l)kak

(-l)kak an+1.

3.3 DECIMAL EXPANSIONS

If $m$ is a positive integer greater than 1, then the series

$$\sum_{k=1}^{\infty} (a_k/m^k), \qquad a_k \in \langle 0, m - 1\rangle,$$

is convergent to a number in $[0, 1]$. Indeed, since $0 \le a_k \le m - 1$, the convergence follows by comparison with the geometric series, and, moreover,

$$\sum_{k=1}^{\infty} a_k/m^k \le (m - 1) \sum_{k=1}^{\infty} 1/m^k = 1.$$

The object of this section is to show that every number in $[0, 1]$ may be represented by such a series.

3.3.1 Theorem. If $m$ is a positive integer greater than 1, then $\forall a \in [0, 1]$, there exists a sequence $(a_k)$ with range in $\langle 0, m - 1\rangle$ so that

$$a = \sum_{k=1}^{\infty} a_k/m^k.$$

Proof. Let $a_1$ be the largest integer in $\langle 0, m - 1\rangle$ so that $a_1/m \le a$, $a_2$ the largest integer in $\langle 0, m - 1\rangle$ so that $a_1/m + a_2/m^2 \le a$, and if $a_1, \ldots, a_{n-1}$ have been determined we take $a_n$ to be the largest integer so that

$$\sum_{k=1}^{n} a_k/m^k \le a. \tag{3.3.1}$$

In this way we can prove by induction that there exists a unique sequence $(a_k)$ so that $\forall n \in N$, (3.3.1) is satisfied and $\forall j \in \langle 1, n\rangle$ if $c \in \langle 0, m - 1\rangle$ & $c > a_j$, then

$$\sum_{k \in A_j} a_k/m^k + c/m^j > a,$$

where $A_j = \langle 1, n\rangle \setminus \{j\}$.

The sequence with terms given by (3.3.1) is monotone nondecreasing and bounded above by $a$, hence convergent. Of course, the limit is also bounded above by $a$. Hence $\forall n \in N$, we get

$$\sum_{k=1}^{n} a_k/m^k \le \sum_{k=1}^{\infty} a_k/m^k = b \le a.$$

Since $m > 1$, $1/m^n \to 0$ as $n \to \infty$. Thus, if $b < a$, there is a smallest positive integer $p$ so that

$$1/m^{p-1} \le a - b.$$

Now if $n \ge p$ we have $1/m^{n-1} \le a - b$ and consequently

$$\sum_{k=1}^{n-1} a_k/m^k + (m - 1)/m^n \le b + 1/m^{n-1} \le a.$$

Thus, because of the way $(a_k)$ was defined, we must have $a_n = m - 1$. Let $q$ be the smallest integer so that $n \ge q \implies a_n = m - 1$. Clearly $q > 1$, for otherwise, using the formula for the sum of a geometric series, we get

$$b = \sum_{k=1}^{\infty} (m - 1)/m^k = 1 \ge a,$$

which contradicts the assumption that $b < a$. Since

$$\sum_{k=q}^{\infty} (m - 1)/m^k = 1/m^{q-1},$$

it follows that

$$b = \sum_{k=1}^{q-1} a_k/m^k + 1/m^{q-1} = \sum_{k=1}^{q-2} a_k/m^k + [a_{q-1} + 1]/m^{q-1} < a.$$

Since $a_{q-1} < m - 1$, it follows that $a_{q-1} + 1 \le m - 1$, which is a contradiction, since the definition of $(a_k)$ does not allow such an inequality as the last one. Therefore, we must have $b = a$ and the proof is complete.
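The digits in the proof are chosen greedily: at each step one takes the largest admissible digit that keeps the partial sum at or below $a$. A minimal sketch of that rule follows, using exact fractions; the name `greedy_digits` and the decision to return only the first `n` digits are illustrative choices, not part of the text.

```python
from fractions import Fraction


def greedy_digits(a, m, n):
    """Return the first n base-m digits a_1, ..., a_n of a number a in [0, 1].

    Each digit is the largest d in {0, ..., m - 1} for which the partial
    sum a_1/m + ... + d/m^k does not exceed a.
    """
    digits, partial = [], Fraction(0)
    for k in range(1, n + 1):
        place = Fraction(1, m ** k)
        # Largest admissible digit: floor((a - partial) / place), capped at m - 1.
        d = min(m - 1, int((a - partial) / place))
        digits.append(d)
        partial += d * place
    return digits
```

For example, with $m = 10$ the call `greedy_digits(Fraction(1, 7), 10, 6)` gives the familiar digits 1, 4, 2, 8, 5, 7, while for $a = 1$ every digit comes out as $m - 1$, in agreement with the geometric series sum used in the proof.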

As the reader well knows, the decimal expansion of a number in $[0, 1]$ is obtained by writing down the elements of the sequence $(a_k)$ in succession to get

$$.a_1 a_2 a_3 \cdots.$$

Of course, the base $m$ must be specified.

It is not true that a real number has a unique representing sequence. Indeed, suppose

$$a = \sum_{k=1}^{n} a_k/m^k, \qquad a_n \ne 0.$$

Then we also have

$$a = \sum_{k=1}^{n-1} a_k/m^k + [a_n - 1]/m^n + \sum_{k=n+1}^{\infty} (m - 1)/m^k.$$

Nonzero rational numbers such as these, which have a representation with $a_k = 0$ for all $k$ sufficiently large, are called terminating decimals. These rationals are the only reals that do not have unique series representations, and as a matter of fact they have exactly two representations. We shall now prove these statements.

3.3.2 Theorem. Every real number in $]0, 1]$, except a terminating decimal, has a unique representing series with respect to a fixed $m \in N$, $m > 1$. Each terminating decimal may be represented in exactly two ways, either by a finite sum or by a series for which $\exists n$ so that $\forall k \ge n$, $a_k = m - 1$.

Proof. Suppose we have

$$a = \sum_{k=1}^{\infty} a_k/m^k = \sum_{k=1}^{\infty} b_k/m^k.$$

If we set $c_k = a_k - b_k$, then $|c_k| \le m - 1$ and

$$\sum_{k=1}^{\infty} c_k/m^k = 0.$$

Let $n$ be the smallest integer in the set $\{k: c_k \ne 0\}$, provided this set is not empty. We then get

$$|c_n|/m^n = \left|\sum_{k=n+1}^{\infty} c_k/m^k\right| \le \sum_{k=n+1}^{\infty} |c_k|/m^k \le (m - 1) \sum_{k=n+1}^{\infty} 1/m^k = 1/m^n.$$

Thus we must have $|c_n| = 1$, which in turn implies that the above inequalities are equalities. Hence $\forall k > n$, $c_k = m - 1$, or $\forall k > n$, $c_k = 1 - m$. Since $a_k$ and $b_k$ are in $\langle 0, m - 1\rangle$, this means that $\forall k > n$, $a_k = m - 1$ and $b_k = 0$, or $\forall k > n$, $a_k = 0$ and $b_k = m - 1$. These cases are the cases of a terminating decimal. If $a$ is not a terminating decimal, the set $\{k: c_k \ne 0\}$ must be void. This concludes the proof.

We can use the series or decimal representation to show that the set
of numbers in

[O, I]

is not countable. The set of terminating decimals,

being an infinite subset of the rationals, is denumerable. Hence the set


of numbers in

[O, I] is not finite and the set of sequences each having


( 0, m 1) is denumerable or not denumerable if and only
of numbers in [O, 1] is denumerable or not denumerable,

its range in
if the set

respectively.
Let us suppose that the set of sequences each having range in $\langle 0, m-1 \rangle$ is denumerable. Then there is a one-to-one function $\Phi$ mapping $N$ onto this set of sequences. Let us set $a^k = \Phi(k)$, where $a^k = (a^k_j)$. Let $(a_k)$ be that sequence defined as follows:

\[ a^k_k = 0 \Rightarrow a_k = m - 1, \qquad a^k_k \ne 0 \Rightarrow a_k = 0. \]

This sequence exists since $m \ge 2$. Now the sequence $(a_k)$ has range in $\langle 0, m-1 \rangle$ and therefore there is a $q \in N$ so that $\Phi(q) = (a_k)$. But $a_q \ne a^q_q$, which is a contradiction. Hence the original assumption that $[0, 1]$ is denumerable is untenable. Thus $[0, 1]$ is an example of an infinite set that is not denumerable. Such a set is called uncountable or nondenumerable.

The fact that $[0, 1]$ is uncountable is due to G. Cantor. The process by which this was shown to be true is, for obvious reasons, called Cantor's diagonal process. We have proved the following theorem.

3.3.3 Theorem (Cantor). The set $[0, 1]$ is uncountable.
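
The diagonal rule used in the proof ($a^k_k = 0 \Rightarrow a_k = m-1$, $a^k_k \ne 0 \Rightarrow a_k = 0$) can be tried out on a finite list of digit sequences. The following is only a sketch: the function name `diagonal_sequence` and the sample rows are hypothetical, and a finite list can illustrate the rule but of course cannot replace the proof.

```python
def diagonal_sequence(rows, m):
    """Build digits a_k that differ from rows[k][k] for every k.

    rows: list of digit lists; rows[k] must hold at least k + 1 digits
          from {0, ..., m - 1} (a finite stand-in for the sequences Phi(k)).
    """
    if m < 2:
        raise ValueError("need m >= 2 so that 0 and m - 1 are distinct digits")
    # a_k = m - 1 when the diagonal digit is 0, and a_k = 0 otherwise.
    return [m - 1 if rows[k][k] == 0 else 0 for k in range(len(rows))]


# Hypothetical sample data for m = 3.
rows = [[0, 1, 2], [2, 2, 0], [1, 0, 0]]
diag = diagonal_sequence(rows, 3)
# diag differs from the k-th row in position k, so it equals none of the rows.
differs_everywhere = all(diag[k] != rows[k][k] for k in range(len(rows)))
```

The check `differs_everywhere` mirrors the step $a_q \ne a^q_q$ of the proof.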

CANTOR'S SET

Let us take $m = 3$ in the previous considerations, that is, let us consider the ternary expansions of the numbers in $[0, 1]$. The set $C$ of numbers in $[0, 1]$ that can be written in a ternary expansion given by a sequence $(a_k)$ so that $\forall k \in N$, $a_k \ne 1$, is called Cantor's set. This set has many interesting properties that can be used to advantage for giving examples and counterexamples.
First, the set $C$ is closed. We will show this by showing that its complement is open. Let $b$ be a number in the complement of $C$ and suppose $j$ is the smallest element in the set $\{k : b_k = 1\}$. Now, $\exists k > j$ so that $b_k \ne 2$. Otherwise, since

\[ \sum_{k=j+1}^{\infty} 2/3^k = 1/3^j, \tag{3.3.2} \]

we see that $b$ would have an expansion of the form

\[ b = \sum_{k=1}^{j-1} b_k/3^k + 2/3^j, \]

which contradicts the fact that $b$ is in the complement of $C$. Also, $\exists k > j$ so that $b_k \ne 0$; otherwise we see from (3.3.2) that

\[ b = \sum_{k=1}^{j-1} b_k/3^k + \sum_{k=j+1}^{\infty} 2/3^k, \]

which would contradict the fact that $b \in C^c$. Let us set

\[ q/3^{j-1} = \sum_{k=1}^{j-1} b_k/3^k. \]

Since $\forall k \in \langle 1, j-1 \rangle$, $b_k$ is even, we see that $q$ is even and moreover $q < 3^{j-1}$. By what we have just proved,

\[ q/3^{j-1} + 1/3^j < b < q/3^{j-1} + 2/3^j. \tag{3.3.3} \]

The inequalities (3.3.3) show that every $b \in C^c$ is in an open interval of the form $]p/3^j, (p+1)/3^j[$, where $p$ is odd and $p < 3^j$. We claim that every open interval of this form is contained in $C^c$. Indeed, let us write

\[ p/3^j = \sum_{k=1}^{j} p_k/3^k, \qquad p_k \in \langle 0, 2 \rangle. \]

Since $p$ is odd, $\exists k \in \langle 1, j \rangle$ so that $p_k = 1$. Suppose that $p/3^j < a < (p+1)/3^j$ and

\[ a = \sum_{k=1}^{\infty} a_k/3^k; \]

then $\forall k \in \langle 1, j \rangle$, $a_k = p_k$. For if not, let $r$ be the smallest integer in $\{k : k \in \langle 1, j \rangle \ \&\ a_k \ne p_k\}$. If $a_r \ge p_r + 1$, then

\[ a \ge \sum_{k=1}^{r} p_k/3^k + 1/3^r = \sum_{k=1}^{r} p_k/3^k + \sum_{k=r+1}^{\infty} 2/3^k \ge \sum_{k=1}^{j} p_k/3^k + \sum_{k=j+1}^{\infty} 2/3^k = (p+1)/3^j, \]

which is a contradiction. If $a_r \le p_r - 1$, then

\[ a \le \sum_{k=1}^{r} p_k/3^k - 1/3^r + \sum_{k=r+1}^{\infty} 2/3^k = \sum_{k=1}^{r} p_k/3^k \le p/3^j, \]

which is again a contradiction. Note that if $a$ is a terminating decimal, the previous proof is valid regardless of which representation is used for $a$. Thus $a \in C^c$, since, as we have noted, $\exists k \in \langle 1, j \rangle$ so that $a_k = p_k = 1$. Hence $C^c$ is open and $C$ is closed.

Actually, the last two paragraphs prove more than we have stated. Namely, the inequalities (3.3.3) and the last paragraph show that $C^c$ is the union of all intervals of the form

\[ I_{q,n} = \,]q/3^{n-1} + 1/3^n,\ q/3^{n-1} + 2/3^n[, \]

where

\[ q/3^{n-1} = \sum_{k=1}^{n-1} q_k/3^k, \qquad q_k \in \{0, 2\}. \]

Further, the proof of the last paragraph shows that these intervals are pairwise disjoint. For suppose $a \in I_{p,k} \cap I_{q,n}$, where $(p, k) \ne (q, n)$. Then, from the last paragraph we have

\[ a = \sum_{j=1}^{n-1} q_j/3^j + 1/3^n + \sum_{j=n+1}^{\infty} a_j/3^j = \sum_{j=1}^{k-1} p_j/3^j + 1/3^k + \sum_{j=k+1}^{\infty} a_j/3^j. \]

If $k < n$, the proof in the last paragraph leads to the fact that $q_k = 1$, which is a contradiction. If $k = n$, then $p_j = q_j$ for $j \in \langle 1, n-1 \rangle$, which contradicts the fact that $p \ne q$. If $k > n$, we again get a contradiction.

For fixed $n$ there are exactly $2^{n-1}$ intervals of the form $I_{q,n}$ that we have described. Since each such interval has length $1/3^n$, their lengths add up to $2^{n-1}/3^n$. Hence the sum of the lengths of the pairwise disjoint intervals which make up $C^c$ is

\[ \sum_{n=1}^{\infty} 2^{n-1}/3^n = 1. \]
Every point of the Cantor set is an accumulation point of $C$. A closed set with this property is called a perfect set. The proof is very easy. Suppose $a \in C$ and

\[ a = \sum_{k=1}^{\infty} a_k/3^k, \qquad a_k \in \{0, 2\}, \]

where $a_k \ne 0$ for an infinite number of $k$. Then the partial sums of this series are all different from $a$, belong to $C$, and converge to $a$. If, on the other hand,

\[ a = \sum_{k=1}^{j} a_k/3^k, \qquad a_k \in \{0, 2\}, \]

then for $n > j$, the numbers $a + 2/3^n$ are in $C$, are different from $a$, and converge to $a$.

The Cantor set is uncountable. The proof is also rather easy, being simply an application of the Cantor diagonal process. Suppose $C$ is countable and $\Phi$ is a one-to-one function with domain $N$ and range $C$. Set $a^k = \Phi(k)$ and define

\[ a^k_k = 0 \Rightarrow a_k = 2, \qquad a^k_k = 2 \Rightarrow a_k = 0. \]

Clearly, the number determined by the sequence $(a_k)$ in its ternary expansion belongs to $C$ but is not in the range of $\Phi$.


Collecting all the previous results we have proved the following
theorem.

3.3.4 Theorem. The Cantor set is an uncountable perfect set in $[0, 1]$ whose complement consists of the union of pairwise disjoint open intervals, the sum of whose lengths is 1.
It is interesting and instructive to look at the geometric positions of the intervals that comprise $C^c$ (see Fig. 3.3.1). The interval $I_{0,1}$ is $]1/3, 2/3[$, that is, the "middle third" of the interval $[0, 1]$. The interval $I_{0,2}$ is $]1/9, 2/9[$, and the interval $I_{2,2}$ is $]7/9, 8/9[$. These intervals represent the "middle thirds" of the intervals that remain after $I_{0,1}$ is removed. Proceeding in this way, by removing the "middle thirds" of the intervals that remain after any given stage, after a denumerable number of steps we are left with Cantor's set.

[Figure 3.3.1: the interval $[0, 1]$ with the points 1/9, 2/9, 1/3, 2/3, 7/9, 8/9 marked.]
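
The removed "middle thirds" can be listed exactly with rational arithmetic. The sketch below is an illustration only; the helper names `removed_intervals` and `total_removed_length` are hypothetical. It builds the intervals $]q/3^{n-1} + 1/3^n,\ q/3^{n-1} + 2/3^n[$ with $q/3^{n-1} = \sum_{k=1}^{n-1} q_k/3^k$, $q_k \in \{0, 2\}$, and adds up their lengths for finitely many stages.

```python
from fractions import Fraction
from itertools import product


def removed_intervals(n):
    """Open intervals removed at stage n, as (left, right) pairs of Fractions."""
    intervals = []
    for digits in product((0, 2), repeat=n - 1):
        # base = q / 3^(n-1) = sum of q_k / 3^k for k = 1, ..., n - 1
        base = sum(Fraction(d, 3 ** (k + 1)) for k, d in enumerate(digits))
        intervals.append((base + Fraction(1, 3 ** n), base + Fraction(2, 3 ** n)))
    return sorted(intervals)


def total_removed_length(stages):
    """Sum of the lengths of all intervals removed in stages 1, ..., stages."""
    return sum(
        right - left
        for n in range(1, stages + 1)
        for left, right in removed_intervals(n)
    )
```

For a finite number of stages the total is $\sum_{n=1}^{N} 2^{n-1}/3^n = 1 - (2/3)^N$, which tends to 1 as $N$ grows, in agreement with the sum of lengths computed in the text.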

Exercises

1. If $m, p, n \in N$, $m > 1$, and $p < m^n$, show that it is possible to write

\[ p/m^n = \sum_{k=1}^{n} p_k/m^k, \qquad p_k \in \langle 0, m-1 \rangle. \]

2. If $p, m \in N$, $m > 1$, show that $p$ has a unique representation in the form

\[ p = \sum_{k=0}^{n} p_k m^k, \qquad p_k \in \langle 0, m-1 \rangle, \quad p_n \ne 0. \]

(Hint: Exercise 1 may be helpful.)

3. A sequence $(a_k)$ is said to be eventually periodic $\Leftrightarrow$ $\exists N$ and $\exists p \in N$ so that $k \ge N \Rightarrow a_{k+p} = a_k$. If $m \in N$ and $m > 1$, we know that every $a \in [0, 1]$ can be written

\[ a = \sum_{k=1}^{\infty} a_k/m^k, \qquad a_k \in \langle 0, m-1 \rangle. \]

Show that $a$ is rational $\Leftrightarrow$ $(a_k)$ is eventually periodic.

4. Suppose that $m \in N$, $m > 1$, and suppose we make the decimal expansion of any number in $[0, 1]$ unique by taking the representing series of a terminating decimal as a finite sum. Show that

\[ \sum_{k=1}^{\infty} a_k/m^k < \sum_{k=1}^{\infty} b_k/m^k \]

if and only if $A = \{k : a_k \ne b_k\} \ne \emptyset$ and $a_n < b_n$, where $n = \min A$.

5. If the range of a function is uncountable, show that its domain is also uncountable.

6. Prove that the set of irrational numbers in $[0, 1]$ is uncountable.

7. If $x \in C$ and

\[ x = \sum_{k=1}^{\infty} x_k/3^k, \qquad x_k \in \{0, 2\}, \]

then define $f$ to be that function with domain $C$ given by

\[ f(x) = \sum_{k=1}^{\infty} x_k/2^{k+1}. \]

Show that $\mathcal{R}(f) = [0, 1]$ and that $f$ is continuous. Note that in conjunction with Exercise 5 this provides another proof of the fact that $C$ is uncountable.

8. Show that the function $f$ of Exercise 7 is not a one-to-one function. Describe a subset $A \subset C$ so that $f|C \setminus A$ is a one-to-one function with range $[0, 1]$.

9. Show that the function $f$ defined in Exercise 7 is monotone nondecreasing. Show that $f$ may be extended to a function $F$ which has domain $[0, 1]$, is monotone nondecreasing, continuous, and is constant on each interval $I_{q,n}$ of $C^c$. The function $F$ is called Cantor's function. A way of describing $F$ is as follows. If $x \in [0, 1]$ has the ternary expansion $x = \sum_{k=1}^{\infty} x_k/3^k$, let us set

\[ n = n(x) = \min\{k : x_k = 1\}, \]

and $n = \infty$ if $x \in C$. Then

\[ F(x) = \sum_{k=1}^{n-1} x_k/2^{k+1} + 1/2^n. \]

3.4 SEQUENCES AND SERIES OF FUNCTIONS

In this section we shall generalize our definition of sequences and series to include the situation where the elements are functions.

3.4.1 Definition. Let $\mathcal{F}$ be the collection of all functions each having domain and range in $R$. A function sequence is a function with domain $N_0$ and range in $\mathcal{F}$.

As in the situation for a real sequence, we shall designate a function sequence by $(f_n)$. The definition of convergence of a function sequence at a point reduces to the definition of convergence of a real sequence. We shall state this formally.

3.4.2 Definition. A function sequence $(f_n)$ is said to be convergent at a point $x$ $\Leftrightarrow$ $\exists N$ so that $\forall n \in N_0$ with $n \ge N$, $x \in \mathcal{D}(f_n)$, and the real sequence defined by the set of numbers $\{f_n(x) : n \ge N\}$ is convergent.

In dealing with function sequences, a new concept may be introduced in considering convergence, namely, uniform convergence. When speaking of uniform convergence, for simplicity we shall suppose that all elements of the range of the function sequence have the same domain.

3.4.3 Definition. Suppose $(f_n)$ is a function sequence so that $\forall n \in N_0$, $f_n$ has the same domain $\mathcal{D}$. The sequence $(f_n)$ is said to be uniformly convergent $\Leftrightarrow$ $\exists g \in \mathcal{F}$ with $\mathcal{D}(g) = \mathcal{D}$, so that $\forall \varepsilon > 0$, $\exists N$ such that $\forall x \in \mathcal{D}$ and $\forall n \ge N$,

\[ |f_n(x) - g(x)| < \varepsilon. \]

Loosely speaking, for a uniformly convergent sequence the rate of convergence of each real sequence $(f_n(x))$ is independent of $x$. The idea of uniform convergence leads to the idea of a uniformly Cauchy function sequence.

3.4.4 Definition. Suppose $(f_n)$ is a function sequence so that $\forall n \in N_0$, $f_n$ has the same domain $\mathcal{D}$. The sequence $(f_n)$ is said to be uniformly Cauchy $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists N$ so that $\forall x \in \mathcal{D}$ and $\forall n, m \ge N$,

\[ |f_n(x) - f_m(x)| < \varepsilon. \]

As one would expect, a function sequence is uniformly Cauchy if and only if it is uniformly convergent. This is proved in the next theorem.

3.4.5 Theorem. A function sequence is uniformly convergent $\Leftrightarrow$ it is uniformly Cauchy.

Proof. Suppose $(f_n)$ is uniformly convergent, each $f_n$ having the common domain $\mathcal{D}$. Then $\exists g \in \mathcal{F}$, $\mathcal{D}(g) = \mathcal{D}$, so that $\forall \varepsilon > 0$, $\exists N$ such that $\forall x \in \mathcal{D}$ and $\forall n \ge N$,

\[ |f_n(x) - g(x)| < \varepsilon/2. \]

Hence $\forall x \in \mathcal{D}$ and $\forall n, m \ge N$,

\[ |f_n(x) - f_m(x)| \le |f_n(x) - g(x)| + |g(x) - f_m(x)| < \varepsilon. \]

Conversely, suppose $(f_n)$ is uniformly Cauchy. Then $\forall x \in \mathcal{D}$ the real sequence $(f_n(x))$ is Cauchy, and hence converges to a real number which we designate '$g(x)$'. The collection of ordered pairs $(x, g(x))$ is a function with domain $\mathcal{D}$. Now, $\forall \varepsilon > 0$, $\exists N$ so that $\forall x \in \mathcal{D}$ and $\forall n, m \ge N$,

\[ |f_n(x) - f_m(x)| < \varepsilon/2. \]

For each $x$, $f_m(x) \to g(x)$, and since the absolute value is a continuous function with domain $R$, we have

\[ \lim_{m \to \infty} |f_n(x) - f_m(x)| = |f_n(x) - g(x)| \le \varepsilon/2 < \varepsilon. \]

This shows that $f_n \to g$ uniformly.

3.4.6 Theorem. If $(f_n)$ is a function sequence uniformly convergent to $g$ and each $f_n$ is continuous at $a$, then $g$ is continuous at $a$.

Proof. Since $(f_n)$ is uniformly convergent, $\forall \varepsilon > 0$, $\exists N$ so that $\forall x$ in the common domain $\mathcal{D}$ and $\forall n \ge N$,

\[ |f_n(x) - g(x)| < \varepsilon/3. \]

Fix $n_0 \ge N$; by the continuity of $f_{n_0}$ at $a$ we have that $\exists \delta > 0$ so that

\[ |x - a| < \delta \text{ and } x \in \mathcal{D} \Rightarrow |f_{n_0}(x) - f_{n_0}(a)| < \varepsilon/3. \]

Hence, if $|x - a| < \delta$ and $x \in \mathcal{D}$,

\[ |g(x) - g(a)| \le |g(x) - f_{n_0}(x)| + |f_{n_0}(x) - f_{n_0}(a)| + |f_{n_0}(a) - g(a)| < \varepsilon. \]

This proves the continuity of $g$ at $a$.

The condition that the function sequence $(f_n)$ is uniformly convergent cannot be removed from the hypothesis of the last theorem. Indeed suppose $\forall n \in N_0$, $f_n$ has domain $[0, 1]$ and is defined by

\[ f_n(x) = x^n. \]

Then for $x \in [0, 1[$,

\[ \lim_{n \to \infty} f_n(x) = 0, \qquad \lim_{n \to \infty} f_n(1) = 1. \]

Hence the limit function takes the value zero on $[0, 1[$ and the value 1 at 1. Thus it is not continuous.
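
A small numerical experiment makes the failure of uniform convergence for $f_n(x) = x^n$ visible. This is a sketch under stated assumptions: the helper `sup_xn` is hypothetical and only approximates the supremum of $|x^n - 0|$ over $[0, \text{right}[$ on a finite grid.

```python
def sup_xn(n, right, samples=10_000):
    """Largest value of x**n over the grid points of [0, right[ (right endpoint excluded)."""
    return max((right * i / samples) ** n for i in range(samples))


# On [0, 1[ the grid supremum stays close to 1 even for large n,
# while on [0, 1/2[ it shrinks rapidly toward 0.
near_one = sup_xn(50, 1.0)
near_zero = sup_xn(50, 0.5)
```

On $[0, 1[$ the distance to the limit function does not become small for all $x$ at once, whereas on $[0, 1/2[$ it does; this matches the remark that the rate of convergence must be independent of $x$ for uniform convergence.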

As in the case of real sequences, a $\sigma$ function can be defined from function sequences to function sequences. For the sake of simplicity we shall restrict the domain of $\sigma$ to those function sequences $f = (f_n)$ so that $\forall n \in N_0$, $\mathcal{D}(f_n) = \mathcal{D}$. Then

\[ \sigma(f)_n(x) = \sum_{k=0}^{n} f_k(x) \]

is well defined on $\mathcal{D}$.

3.4.7 Definition. A function series is an ordered pair $(f, \sigma(f))$. The function series is said to be [absolutely] convergent at $x$ $\Leftrightarrow$ [$\sigma(|f|)$] $\sigma(f)$ is convergent at $x$. The function series is said to be [absolutely] uniformly convergent $\Leftrightarrow$ [$\sigma(|f|)$] $\sigma(f)$ is uniformly convergent.

Analogous to our previous notation, we shall denote a function series by

\[ \sum_{k \ge 0} (f_k), \]

and if the limit exists at every point of the common domain of all the $f_k$, we shall denote the limit function by

\[ \sum_{k=0}^{\infty} f_k. \]

We have the following immediate corollary to Theorem 3.4.6, the proof of which we leave to the reader.

3.4.8 Corollary. If each $f_n$ of the sequence $f = (f_n)$ is continuous at $a$ and $\sigma(f)$ is uniformly convergent, then $\sum_{k=0}^{\infty} f_k$ is continuous at $a$.

There is a very simple but very useful test for uniform convergence of function series.

3.4.9 Weierstrass M Test. Suppose $f = (f_n)$ is a function sequence so that each $f_n$ has the same domain $\mathcal{D}$ and there is a constant sequence $M = (M_n)$ so that $\forall n \in N_0$ and $\forall x \in \mathcal{D}$, $|f_n(x)| \le M_n$, and $(M, \sigma(M))$ is convergent. Then $(|f|, \sigma(|f|))$, and hence $(f, \sigma(f))$, are uniformly convergent.

Proof. Since for $n > m$,

\[ |\sigma(f)_n(x) - \sigma(f)_m(x)| \le |\sigma(|f|)_n(x) - \sigma(|f|)_m(x)| = \sum_{k=m+1}^{n} |f_k(x)| \le \sigma(M)_n - \sigma(M)_m, \]

it follows from the convergence of $\sigma(M)$ that $\sigma(|f|)$ and $\sigma(f)$ are uniformly Cauchy and hence uniformly convergent.


As an example, let us consider the series

\[ \sum_{n \ge 0} \Bigl( \frac{x^n}{1 + x^{2n}} \Bigr) \]

and ask for the values of $x$ where this series converges. From the Cauchy root test we know the series will converge for those $x$ for which

\[ \overline{\lim_{n \to \infty}} \Bigl| \frac{x^n}{1 + x^{2n}} \Bigr|^{1/n} < 1, \]

and diverge for those $x$ for which this limit superior is greater than 1. Since $1 + x^{2n} > x^{2n}$, we have

\[ \overline{\lim_{n \to \infty}} \Bigl| \frac{x^n}{1 + x^{2n}} \Bigr|^{1/n} \le \frac{1}{|x|}, \]

and hence we certainly get convergence for $|x| > 1$. On the other hand, since

\[ \frac{x^n}{1 + x^{2n}} = \frac{1/x^n}{1 + 1/x^{2n}}, \]

it follows that the series converges as well for $|x| < 1$.

Looking at this another way, if $|x| \le \rho < 1$ we see that

\[ \Bigl| \frac{x^n}{1 + x^{2n}} \Bigr| \le |x|^n \le \rho^n, \]

and thus the series converges uniformly by the Weierstrass M test. On the other hand, since

\[ \frac{1/x^n}{1 + 1/x^{2n}} = \frac{x^n}{1 + x^{2n}}, \]

it follows that the series converges uniformly for $|x| \ge 1/\rho$. Thus the series certainly converges for all $x$ so that $|x| \ne 1$. If $|x| = 1$, it is not true that

\[ \frac{x^n}{1 + x^{2n}} \to 0, \]

so that the series cannot converge for these values of $x$.
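
The Weierstrass M test bound for this example can be checked numerically. The following sketch assumes the series starts at $n = 0$ and uses the majorant $M_n = \rho^n$ on $|x| \le \rho < 1$; the function names are hypothetical, and a finite grid check illustrates the estimate rather than proving it.

```python
def term(x, n):
    """n-th term x^n / (1 + x^(2n)) of the example series."""
    return x ** n / (1 + x ** (2 * n))


def partial_sum(x, n):
    """Partial sum of the terms with index 0, ..., n."""
    return sum(term(x, k) for k in range(n + 1))


def tail_bound(rho, n):
    """Geometric bound rho^(n+1) / (1 - rho) for the sum of M_k = rho^k over k > n."""
    return rho ** (n + 1) / (1 - rho)


def max_gap(rho, m, n, samples=201):
    """Largest |S_n(x) - S_m(x)| over a grid of points x with |x| <= rho."""
    grid = [-rho + 2 * rho * i / (samples - 1) for i in range(samples)]
    return max(abs(partial_sum(x, n) - partial_sum(x, m)) for x in grid)
```

For instance, `max_gap(0.5, 20, 60)` stays below `tail_bound(0.5, 20)` at every grid point simultaneously, which is the uniform Cauchy estimate used in the proof of the M test. The identity behind the case $|x| \ge 1/\rho$ can be seen from `term(2.0, 5)` and `term(0.5, 5)`, which agree up to rounding.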

Exercises

1. Show that the series given in the last example does not converge uniformly on $R \setminus \{1, -1\}$.

2. Discuss the uniform convergence of the following sequences:

(a ) fn(x) = 1

[Hint:

(b )

fn(x)

(c)

fn(x)

nx

' x

JO, l[.

, x E [O, oo[.
xn
nx(I - x)n, x E [O, l].

1 +

Use the binomial theorem to show that

n-1

(1-; ) n
3.

3 ! n2

Discuss the uniform convergence of the following series:

( a)
(b )
4.

(n- l)(n-2)

( ! x2 ) , x ]-00,oo[.
(0 :x2)k ) x ]-oo,oo[.
k2

Discuss the uniform convergence of the following series:

J-oo,oo[ .

[Hint: Use the methods of elementary calculus to compute the maximum of each function in the series. Another possibility is the use of the Cauchy-Schwarz inequality (Exercise 15 of Section 1.8).]

5. Suppose that $\forall n \in N_0$, the function $f_n$ is defined and continuous on a compact set $K$, and $\forall x \in K$ and $\forall n \in N_0$, $f_n(x) \ge f_{n+1}(x)$. If $\forall x \in K$, $f_n(x) \to 0$, show that $f_n \to 0$ uniformly on $K$. This fact is often called Dini's theorem.

6. Suppose that the function sequence $(f_n)$ is uniformly convergent and $\forall n \in N_0$, $f_n$ is bounded. Show that there is a uniform bound for all the $f_n$.

7. Suppose $f_n \to f$ and $g_n \to g$ uniformly on a set $K$ and each $f_n$ and $g_n$ is bounded. Then $f_n g_n \to fg$ uniformly on $K$. Show that this may not be true if we remove the boundedness condition.

8. Derive an analogue to Dirichlet's test that will give sufficient conditions for the uniform convergence of a function series.

9. Derive an analogue to Abel's test for function series.

10. Let $(\varphi_n)$ be a sequence of continuous functions with common domain $R$ and defined as follows: $\forall x \in R$, $n \in N$,

\[ \varphi_{n-1}(x) = \begin{cases} 0 & \Leftrightarrow |x| \ge 1/n, \\ 1 - n|x| & \Leftrightarrow |x| \le 1/n. \end{cases} \]

Let $(r_k)$ be a sequence whose range is the rational numbers $Q$. Set

\[ f_n(x) = \sum_{k=0}^{\infty} \varphi_n(x - r_k)/2^k. \]

Show that $\forall n \in N_0$, $f_n$ is continuous, and $\forall x \in R$, $(f_n(x))$ is convergent. However, the sequence $(f_n)$ does not converge uniformly on any interval in $R$.

3.5 INFINITE PRODUCTS

3.5

INFINITE PRODUCTS

Just as it is possible to define an infinite sum in terms of a sequence, it is possible also to define an infinite product in terms of a sequence. Before we do this, it would be well to take a few moments for a discussion of finite products of real numbers. As in the situation for finite sums, it is possible to prove the following statement using the axiom of induction.

There exists a unique function $\pi$ with domain the collection of all real sequences and range in the collection of real sequences so that if $a = (a_n)$ is a sequence, then

\[ \pi(a)_0 = a_0, \qquad \pi(a)_{n+1} = \pi(a)_n a_{n+1}, \quad \forall n \in N_0. \]

We shall not carry out the proof here, but shall note only that the use of induction for this result is very much like that used in the proof of Theorem 1.6.4. Actually, the existence of the $\sigma$ function discussed at the beginning of this chapter, the existence of a power function as discussed in Chapter 2, and more generally the $\pi$ function discussed here are all special cases of a very general logical theorem on the possibility of recursive or inductive definition. The interested reader should consult page 163 of the book by Kershner and Wilcox cited at the end of Section 1.1.

We shall use the notation

\[ \prod_{k=0}^{n} a_k = \pi(a)_n = a_0 a_1 \cdots a_n. \]

If we are given a finite set of numbers $\{a_k : k \in \langle 0, n \rangle\}$, it is of course necessary to extend the function with values $a_k$ and domain $\langle 0, n \rangle$ to a function defined on all of $N_0$ to be able to use the product symbol. However, if we extend in two different ways, say to sequences $a$ and $b$, it is a rather simple matter to show that $\pi(a)_k = \pi(b)_k$, $\forall k \in \langle 0, n \rangle$.

We may very often use the symbol

\[ \prod_{k=m}^{n} a_k \]

for $n \ge m$. If $a = (a_n)$ is any sequence that is an extension of the function with domain $\langle m, n \rangle$ and values $a_k$, we set $b_k = a_{m+k}$ and we define

\[ \prod_{k=m}^{n} a_k = \pi(b)_{n-m}. \]

More generally, if $A$ is any finite set with $n + 1$ elements, let $\Phi$ be a one-to-one function with domain $\langle 0, n \rangle$ and range $A$. Then define

\[ \prod_{q \in A} a_q = \prod_{k=0}^{n} a_{\Phi(k)}. \]

The last definition, of course, raises the question whether or not we have given a definition that is independent of the function $\Phi$. We think that the reader can easily establish, by use of the commutativity property of the product of two real numbers and the axiom of induction, that for every one-to-one function $\Phi$ with domain and range $\langle 0, n \rangle$, we have

\[ \prod_{k=0}^{n} a_k = \prod_{k=0}^{n} a_{\Phi(k)}. \]

The following associativity property for finite products is also easily


established by using the associativity property for the product of real
numbers and the axiom of induction:

3.5.1 Definition. An infinite product is an element of the $\pi$ function, that is, is an ordered pair of sequences $(a, \pi(a))$. If $a = (a_k)$ and $\forall k \in N_0$, $a_k \ne 0$, then the infinite product $(a, \pi(a))$ is said to be convergent $\Leftrightarrow$ the sequence $\pi(a)$ is convergent to a nonzero number. Otherwise the infinite product is said to be divergent.

The reason for excluding zero factors and a zero limit when speaking about the convergence of products is that it leads to neater criteria for deciding when a product converges. This will be borne out by the following results. An infinite product will often be designated by

\[ \prod_{k \ge 0} (a_k) \]

and, if it is convergent, its limit will be denoted by

\[ \prod_{k=0}^{\infty} a_k. \]

3.5.2 Theorem. An infinite product $\prod_{k \ge 0} (a_k)$, $a_k \ne 0$, $\forall k \in N_0$, is convergent $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists N$ so that $n \ge m \ge N \Rightarrow$

\[ \Bigl| \prod_{k=m}^{n} a_k - 1 \Bigr| < \varepsilon. \tag{3.5.1} \]
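Proof. Suppose the infinite product is convergent; that is, $\pi(a)_n \to p \ne 0$. Thus $\forall \varepsilon > 0$, $\exists N \ge 1$ so that $n \ge m \ge N \Rightarrow$

\[ |\pi(a)_{m-1}| > |p|/2 \quad \text{and} \quad |\pi(a)_n - \pi(a)_{m-1}| < \varepsilon |p|/2. \]

Thus we have

\[ \frac{|\pi(a)_n - \pi(a)_{m-1}|}{|\pi(a)_{m-1}|} < \varepsilon. \]

But

\[ \frac{|\pi(a)_n - \pi(a)_{m-1}|}{|\pi(a)_{m-1}|} = \Bigl| \frac{\pi(a)_n}{\pi(a)_{m-1}} - 1 \Bigr| = \Bigl| \prod_{k=m}^{n} a_k - 1 \Bigr| < \varepsilon, \tag{3.5.2} \]

and hence the necessity is established.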


Conversely, suppose
If we fix m

;.:;.

(3.5. 1 ) is satisfied. This is the same as (3.5.2).


7T (a) is a bounded sequence. Set

N, we see that

Pi = lim 7T ( a ) n ,
n-oo

From

(3.5.2)

we see that Vm;.:;. N,

-E

<

( i
- 1
7T a ) m-1

j= 1, 2.

E,

<

Thus we see that Pt #- 0, since otherwise, if we had originally taken


E

<

1,

we would get a contradiction. Let us take j=

inequalities. Using Theorem


.
lIm

2.4.3(c)

Pt

m=oo 7T ( a ) m-1

lim

m-oo

we get

I in

the last set of

P1
= P1
7T(a ) m-1 p/

and thus

-E
Since this is true, VE > 0.

Pt
- 1
P2

<

p1 = p2

=;i=

<

E.

0, and the sufficiency is established.

From the last theorem it follows that a necessary condition that the infinite product $\prod_{k \ge 0} (a_k)$ converges is that $a_k \to 1$ as $k \to \infty$. Thus we shall write $a_k = 1 + \alpha_k$, and the last condition is converted into $\alpha_k \to 0$ as $k \to \infty$.

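3.5.3 Theorem. A sufficient condition that the infinite product $\prod_{k \ge 0} (1 + \alpha_k)$ is convergent is that the infinite product $\prod_{k \ge 0} (1 + |\alpha_k|)$ is convergent. If $\forall k \in N_0$, $\alpha_k \ge 0$, or if $\forall k \in N_0$, $\alpha_k \le 0$, then the condition is also necessary.

Proof. We claim that for every sequence $(\beta_k)$ and $\forall n \in N_0$,

\[ \Bigl| \prod_{k=0}^{n} \{1 + \beta_k\} - 1 \Bigr| \le \prod_{k=0}^{n} \{1 + |\beta_k|\} - 1. \]

The proof is by induction. It is certainly true for $n = 0$. If it is true for $n = j$, then since

\[ \prod_{k=0}^{j+1} \{1 + \beta_k\} - 1 = \prod_{k=0}^{j} \{1 + \beta_k\} - 1 + \beta_{j+1} \prod_{k=0}^{j} \{1 + \beta_k\}, \]

by taking absolute values on both sides, using the inductive hypothesis and the triangle inequality, we get

\[ \Bigl| \prod_{k=0}^{j+1} \{1 + \beta_k\} - 1 \Bigr| \le \prod_{k=0}^{j} \{1 + |\beta_k|\} - 1 + |\beta_{j+1}| \prod_{k=0}^{j} \{1 + |\beta_k|\} = \prod_{k=0}^{j+1} \{1 + |\beta_k|\} - 1. \]

Consequently, we have

\[ \Bigl| \prod_{k=m}^{n} \{1 + \alpha_k\} - 1 \Bigr| \le \prod_{k=m}^{n} \{1 + |\alpha_k|\} - 1. \]

For every $\varepsilon > 0$, $\exists N$ so that $n \ge m \ge N$ implies the right side is less than $\varepsilon$. Hence the sufficiency follows from Theorem 3.5.2.

Now, if $\forall k \in N_0$, $\alpha_k \ge 0$, then the necessity is clear. On the other hand, if $\forall k \in N_0$, $\alpha_k \le 0$, let us write

\[ \frac{1}{\prod_{k=0}^{n} \{1 + \alpha_k\}} = \prod_{k=0}^{n} \Bigl\{ 1 + \frac{|\alpha_k|}{1 + \alpha_k} \Bigr\}. \]

Because the left side converges as $n \to \infty$, so does the right side. Since $\alpha_k \to 0$, $\exists K$ so that $k \ge K \Rightarrow 0 < 1 + \alpha_k \le 1$. Thus, for $n \ge m \ge K$,

\[ \prod_{k=m}^{n} \{1 + |\alpha_k|\} - 1 \le \prod_{k=m}^{n} \Bigl\{ 1 + \frac{|\alpha_k|}{1 + \alpha_k} \Bigr\} - 1. \]

The right side is small provided $m$ and $n$ are sufficiently large. Apply Theorem 3.5.2 and we see that $\prod_{k \ge 0} (1 + |\alpha_k|)$ is convergent.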

3.5.4 Definition. An infinite product $\prod_{k \ge 0} (1 + \alpha_k)$ is said to be absolutely convergent $\Leftrightarrow$ the infinite product $\prod_{k \ge 0} (1 + |\alpha_k|)$ is convergent. A convergent infinite product that is not absolutely convergent is called conditionally convergent.

For emphasis we shall remark that if the $\alpha_k$ maintain their signs, then the infinite product $\prod_{k \ge 0} \{1 + \alpha_k\}$ is convergent if and only if it is absolutely convergent. This is, of course, a consequence of the last theorem.

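3.5.5 Theorem. The infinite product $\prod_{k \ge 0} \{1 + \alpha_k\}$ is absolutely convergent $\Leftrightarrow$ the infinite series $\sum_{k \ge 0} (|\alpha_k|)$ is convergent.

Proof. Using the Mean Value Theorem (see Chapter 4) we can establish that $\forall k \in N_0$, $\exists \beta_k$ with $1/(1 + |\alpha_k|) \le \beta_k \le 1$, so that

\[ \log(1 + |\alpha_k|) = \beta_k |\alpha_k|. \tag{3.5.3} \]

Thus, if $\sum_{k \ge 0} (|\alpha_k|)$ is convergent, then by the comparison test,

\[ \lim_{n \to \infty} \sum_{k=0}^{n} \log(1 + |\alpha_k|) = \lim_{n \to \infty} \sum_{k=0}^{n} \beta_k |\alpha_k| = \lim_{n \to \infty} \log \prod_{k=0}^{n} \{1 + |\alpha_k|\} \]

exists. Since the exponential function is continuous,

\[ \lim_{n \to \infty} \prod_{k=0}^{n} \{1 + |\alpha_k|\} = \lim_{n \to \infty} \exp\Bigl( \log \prod_{k=0}^{n} \{1 + |\alpha_k|\} \Bigr) \]

exists and clearly is not zero.

On the other hand, if the infinite product is convergent, $\exists K$ so that $\forall k \ge K \Rightarrow |\alpha_k| \le 1$. Thus $\forall k \ge K$, $\beta_k \ge 1/2$, and from (3.5.3) we see that

\[ |\alpha_k| \le 2 \log(1 + |\alpha_k|), \qquad \forall k \ge K. \]

Since the infinite product converges absolutely, the series

\[ \sum_{k \ge 0} (\log(1 + |\alpha_k|)) \]

is convergent. Hence by the comparison test the series $\sum_{k \ge 0} (|\alpha_k|)$ is convergent, and the theorem is proved.

3.5.6 Corollary. If $\forall k \in N_0$, $\alpha_k \ge 0$, or $\forall k \in N_0$, $\alpha_k \le 0$, then $\prod_{k \ge 0} (1 + \alpha_k)$ is convergent $\Leftrightarrow$ the series $\sum_{k \ge 0} (\alpha_k)$ is convergent.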
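
The link between an infinite product $\prod (1 + \alpha_k)$ and the series $\sum \alpha_k$ can be watched numerically. The sketch below is an illustration under assumptions: `partial_product` is a hypothetical helper, and the two sample sequences are chosen here for the demonstration. With $\alpha_k = -1/(k+2)^2$ the series of absolute values converges and the partial products equal $(n+3)/(2(n+2))$, which tends to $1/2$; with $\alpha_k = 1/(k+1)$ the harmonic series diverges and the partial products equal $n + 2$.

```python
def partial_product(alpha, n):
    """Product of (1 + alpha(k)) for k = 0, ..., n."""
    p = 1.0
    for k in range(n + 1):
        p *= 1 + alpha(k)
    return p


def convergent_case(n):
    # alpha_k = -1/(k+2)^2: the partial products telescope to (n+3) / (2(n+2)).
    return partial_product(lambda k: -1 / (k + 2) ** 2, n)


def divergent_case(n):
    # alpha_k = 1/(k+1): the partial products telescope to n + 2.
    return partial_product(lambda k: 1 / (k + 1), n)
```

Both closed forms follow by telescoping, since $1 - 1/j^2 = (j-1)(j+1)/j^2$ and $1 + 1/(k+1) = (k+2)/(k+1)$.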

Exercises

1. If $\alpha_{2k} = 1/(k + 2)$, $\alpha_{2k+1} = -1/(k + 2)$, show that

\[ \prod_{k \ge 0} (1 + \alpha_k) \]

is conditionally convergent. Show that this product has the same value as the absolutely convergent product

\[ \prod_{k=2}^{\infty} \Bigl( 1 - \frac{1}{k^2} \Bigr). \]

2. Generalizing Exercise 1, let $\alpha_{2k}(x) = x/(k + 1)$, $\alpha_{2k+1}(x) = -x/(k + 1)$. Show that the sequence defined by

\[ \prod_{k=0}^{n} \{1 + \alpha_k(x)\} \]

is uniformly convergent for $x$ in a bounded set in $R$, and the limit is the same as the limit defined by the sequence

\[ \prod_{k=1}^{n} \Bigl\{ 1 - \frac{x^2}{k^2} \Bigr\}. \]

3. If the infinite product $\prod_{k \ge 0} (|1 + \alpha_k|)$ is convergent, is it necessarily true that the infinite product $\prod_{k \ge 0} (1 + \alpha_k)$ is absolutely convergent, or simply convergent?

4. Show that the terms of an absolutely convergent infinite product may be rearranged in any way to obtain another absolutely convergent infinite product that converges to the same limit as the original product.

5. Show that the infinite product $\prod_{k \ge 0} (1 + \alpha_k)$, $\forall k \in N_0$, $1 + \alpha_k > 0$, is conditionally convergent $\Leftrightarrow$ the infinite series

\[ \sum_{k \ge 0} (\log(1 + \alpha_k)) \]

is conditionally convergent.

6. Suppose the infinite product $\prod_{k \ge 0} (1 + \alpha_k)$ is conditionally convergent. Use the results of Exercise 5 to show that $\forall a \in R$, $a \ne 0$, there is a rearrangement of the product which converges to $a$.

CHAPTER 4

DIFFERENTIATION

4.1 THE DERIVATIVE CONCEPT

When speaking about the derivative of a function at a point, it is usual to take the domain of the function to be an open interval or an open set, and, of course, the point to be in this open set. However, situations often arise where functions are defined on closed intervals, and it is convenient to talk about the derivatives of the functions at the end points of the intervals. It is for this reason that we give the following slightly more general definition.

4.1.1 Definition. A function $f$ is said to have a derivative at $a$ $\Leftrightarrow$ $a \in \mathcal{D}(f)$, $a$ is an accumulation point of $\mathcal{D}(f)$, and

\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]

exists. In case this limit exists, it is called the derivative of $f$ at $a$ and is denoted by any one of the symbols '$f'(a)$', '$df(a)/dx$', or '$Df(a)$'. If $f$ is differentiable at every point of its domain it is called differentiable.

The derivative of a function $f$ is that function $f'$ (or $df/dx$ or $Df$) with domain $\mathcal{D}(f') = \{x : f'(x) \text{ exists}\}$ (possibly void) whose value at the point $a \in \mathcal{D}(f')$ is the derivative of $f$ at $a$.
From a logical point of view the notation $Df$, for the derivative of a function, is the most suitable. For we can consider $D$ as a function with domain the collection $\mathcal{F}$ of all real-valued functions each having domain in $R$ (including that function whose domain is the null set), and the range of $D$ is also in $\mathcal{F}$. Then we can define

\[ D^2 = D \circ D, \qquad D^3 = D^2 \circ D, \]

and so on. This "definition by induction" is made precise in the following way. It can be proved by induction that there exists a unique function $F$ with domain $N$ and range in the collection of functions each having domain $\mathcal{F}$ and range in $\mathcal{F}$ so that

\[ F(1) = D, \qquad F(n+1) = F(n) \circ D. \]

If we set $D^n = F(n)$, then $\forall f \in \mathcal{F}$ we call $D^n f$ the $n$th derivative of $f$. Other notations are '$f^{(n)}$' and '$d^n f/dx^n$'. We also set $D^0 f = f$.

4.1.2 Definition. We shall say that a function $f$ is $n$-times differentiable at $a$ $\Leftrightarrow$ $a \in \mathcal{D}(f^{(n)})$, and that $f$ is $n$ times differentiable $\Leftrightarrow$ $\mathcal{D}(f) = \mathcal{D}(f^{(n)})$. A function $f$ will be said to be infinitely differentiable at $a$ $\Leftrightarrow$ $\forall n \in N_0$, $a \in \mathcal{D}(f^{(n)})$, and $f$ is said to be infinitely differentiable $\Leftrightarrow$ $\forall n \in N_0$, $\mathcal{D}(f) = \mathcal{D}(f^{(n)})$.

4.1.3 Theorem. If $f'(a)$ exists, then $f$ is continuous at $a$.

Proof. By the definition of $f'(a)$, $\exists \delta' > 0$ so that $0 < |x - a| < \delta'$ & $x \in \mathcal{D}(f) \Rightarrow$

\[ \Bigl| \frac{f(x) - f(a)}{x - a} - f'(a) \Bigr| < 1. \]

Multiply both sides of this inequality by $|x - a|$ and then use the triangle inequality. This gives

\[ |f(x) - f(a)| < (1 + |f'(a)|)\,|x - a|. \]

Now, for a given $\varepsilon > 0$, take $\delta < \min(\delta', \varepsilon/(1 + |f'(a)|))$. Then the preceding inequality implies that

\[ |f(x) - f(a)| < \varepsilon, \]

provided $|x - a| < \delta$ & $x \in \mathcal{D}(f)$.

The converse of this theorem is not true. Namely, it is not true that if a function is continuous at a point, then it is differentiable at the point. The simplest example that illustrates this is given by the function

\[ f(x) = |x|. \]

This is continuous at 0, but

\[ \lim_{x \to 0} \frac{|x|}{x} \]

does not exist. By adding together a finite number of translations of the above function we can get a continuous function whose derivative does not exist at a finite number of points. By modifying this process and using what essentially amounts to an infinite number of translations of the absolute value function, we can construct a continuous function with domain $R$ that does not have a derivative at any point. We shall carry out this construction after we give an example on differentiation.

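THE DERIVATIVE OF THE LOGARITHM FUNCTION

As an example of the process of differentiation, let us show that $\log_a$ is differentiable and compute its derivative at any point $x \in R^+$. We shall suppose that all the standard additive and multiplicative properties have been proved, as, for example,

\[ \log_a xy = \log_a x + \log_a y, \qquad -\log_a x = \log_a x^{-1}, \qquad y \log_a x = \log_a x^y. \]

Indeed, these properties are almost immediate consequences of the definition of $\log_a$ as the inverse of the generalized exponential function and the additive-multiplicative properties of that function (see the exercises of Section 2.3).

We form the differential quotient of $\log_a x$ and attempt to pass to the limit. We have

\[ \frac{\log_a(x + h) - \log_a x}{h} = \log_a \Bigl( 1 + \frac{h}{x} \Bigr)^{1/h}. \tag{4.1.1} \]

Let $v = x/h$; then as $h \searrow 0$, $v \nearrow \infty$. We shall show that

\[ \lim_{v \to \infty} \Bigl( 1 + \frac{1}{v} \Bigr)^v \tag{4.1.2} \]

exists and shall designate this limit by '$e$'. From this we will be able to show immediately that if $h \nearrow 0$, then also

\[ \lim_{v \to -\infty} \Bigl( 1 + \frac{1}{v} \Bigr)^v = e. \]

Assuming, for the moment, that we have proved these facts, we have from (4.1.1) and the fact that $\log_a$ is continuous,

\[ D \log_a x = \log_a e^{1/x} = \frac{1}{x} \log_a e. \]

To show that the limit in (4.1.2) exists, we shall first show that the sequence with values

\[ a_0 = 1, \qquad a_n = \Bigl( 1 + \frac{1}{n} \Bigr)^n, \quad n \in N, \]

is monotone increasing and bounded above (see Exercise 8 of Section 1.9). Expanding by the binomial theorem we get

\[ \Bigl( 1 + \frac{1}{n} \Bigr)^n = 1 + \frac{n}{n} + \frac{n(n-1)}{2!\,n^2} + \cdots + \frac{n(n-1) \cdots 1}{n!\,n^n} \]
\[ = 1 + 1 + \frac{1}{2!} \Bigl( 1 - \frac{1}{n} \Bigr) + \frac{1}{3!} \Bigl( 1 - \frac{1}{n} \Bigr) \Bigl( 1 - \frac{2}{n} \Bigr) + \cdots + \frac{1}{n!} \Bigl( 1 - \frac{1}{n} \Bigr) \cdots \Bigl( 1 - \frac{n-1}{n} \Bigr). \tag{4.1.3} \]

If we use $n + 1$ instead of $n$ in (4.1.3), we see first that for a fixed $k \le n$,

\[ \frac{1}{k!} \Bigl( 1 - \frac{1}{n} \Bigr) \cdots \Bigl( 1 - \frac{k-1}{n} \Bigr) \le \frac{1}{k!} \Bigl( 1 - \frac{1}{n+1} \Bigr) \cdots \Bigl( 1 - \frac{k-1}{n+1} \Bigr), \]

and second, the binomial expansion for

\[ \Bigl( 1 + \frac{1}{n+1} \Bigr)^{n+1} \]

has one more positive term than the corresponding expansion for $n$. Hence

\[ \Bigl( 1 + \frac{1}{n} \Bigr)^n < \Bigl( 1 + \frac{1}{n+1} \Bigr)^{n+1}. \]

Next, we see from (4.1.3) that

\[ \Bigl( 1 + \frac{1}{n} \Bigr)^n < 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!}. \tag{4.1.4} \]

Since

\[ k! = 2 \cdot 3 \cdots k \ge 2^{k-1}, \]

if we use this inequality in (4.1.4) we get

\[ \Bigl( 1 + \frac{1}{n} \Bigr)^n < 1 + 1 + \frac{1}{2} + \cdots + \frac{1}{2^{n-1}} = 1 + \frac{1 - (1/2)^n}{1 - (1/2)} < 3. \]

This establishes the fact that this sequence is bounded. Consequently, $(a_n)$ is monotone and bounded and its limit exists. We denote the limit by '$e$'; that is,

\[ e = \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n} \Bigr)^n. \]

Suppose now that $n \le v \le n + 1$; then

\[ \Bigl( 1 + \frac{1}{n+1} \Bigr)^n \le \Bigl( 1 + \frac{1}{v} \Bigr)^v \le \Bigl( 1 + \frac{1}{n} \Bigr)^{n+1}. \]

(Why?) Now,

\[ \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n} \Bigr)^{n+1} = \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n} \Bigr)^n \cdot \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n} \Bigr) = e, \]
\[ \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n+1} \Bigr)^n = \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n+1} \Bigr)^{n+1} \cdot \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n+1} \Bigr)^{-1} = e. \]

Therefore,

\[ \lim_{v \to \infty} \Bigl( 1 + \frac{1}{v} \Bigr)^v = \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n} \Bigr)^n = e. \]

If $h \nearrow 0$, then we have

\[ \lim_{v \to -\infty} \Bigl( 1 + \frac{1}{v} \Bigr)^v = \lim_{v \to \infty} \Bigl( 1 - \frac{1}{v} \Bigr)^{-v} = \lim_{u \to \infty} \Bigl( 1 + \frac{1}{u} \Bigr)^u \Bigl( 1 + \frac{1}{u} \Bigr) = e, \]

where $u = v - 1$.

The function whose values are $e^x$ is usually called the exponential function, and $\log_e$ is called the natural logarithm and usually is simply denoted by '$\log x$'.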
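
The monotonicity and the bound 3 for $(1 + 1/n)^n$, and the value of the difference quotient of the natural logarithm, can be checked numerically. This is a floating-point sketch with hypothetical helper names, not part of the argument; rounding limits how far the monotonicity check can be pushed.

```python
import math


def a(n):
    """n-th value (1 + 1/n)^n of the sequence defining e."""
    return (1 + 1 / n) ** n


def is_increasing_and_bounded(limit_n):
    """Check a(1) < a(2) < ... < a(limit_n) and a(n) < 3 in floating point."""
    values = [a(n) for n in range(1, limit_n + 1)]
    increasing = all(x < y for x, y in zip(values, values[1:]))
    return increasing and all(v < 3 for v in values)


def log_difference_quotient(x, h):
    """(log(x + h) - log(x)) / h, which should be close to 1/x for small h."""
    return (math.log(x + h) - math.log(x)) / h
```

For example, `a(10**6)` agrees with `math.e` to about five decimal places, and `log_difference_quotient(2.0, 1e-6)` is close to $1/2$, in line with $D \log x = 1/x$.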

A NOWHERE-DIFFERENTIABLE CONTINUOUS FUNCTION

We shall now give an example of a continuous function defined on R


that does not have a derivative at any point of R. The idea is to con
struct a sequence of "sawtooth" functions where the spacings of the
"teeth" get finer and finer. The construction is due essentially to B. L.
van der Waerden. The variation we use is taken from the book, A

of Real Functions

Primer

by Ralph P. Boas, Jr., The Mathematical Association of

America, 1960, where it is attributed to M. Mikola.s. The reader is


referred to this book for other references on this problem.
We first obtain the following necessary condition for a function to be
differentiable at a point in its domain:

If f is differentiable at a E J(J), then Ve > 0, 38 > 0 so that Vt1,t2,si.s2


J(f) with t1 a< t2, s1 a< s2, and lt1 - t21 < 8 and ls1-s2I< 8,
we have
E

(4.1.5)
To prove this we know that
with 0<

Ix - aJ < 8

Ve

> 0,

38 > 0 so that Vx E J(J)

we have

[L(x)-f(a) - f' (a) <


I x-a

e/2.

In other words, we can write

f(x) - f(a)
x-a

f' (a)+ 'Y)(x)'

THE DERIVATIVE CONCEPT I 14!1

4.1

where

l11(x) I

e/2.

<

Thus, if t1

a,

we have

Hence we get

l[(t2) - f(t1)
- f' ( a)
I t2 - ti

<

e/2 .

t1 = a, this inequality is automatic. We get the same inequality when


t1 and t2 are replaced by s1 and s2 and thus an application of the triangle
inequality gives (41
. 5
. ).
If

Let us now set

$$g(x) = |x|, \qquad x \in \,]-1/2, 1/2],$$

and extend $g$ periodically to $\mathbb{R}$. That is, we set

$$f_0(x) = g(x - k) \iff x \in [k - 1/2, k + 1/2], \qquad k \in \mathbb{Z}.$$

The function $f_0$ has domain $\mathbb{R}$, is continuous, and is periodic of period 1; that is, $\forall x \in \mathbb{R}$,

$$f_0(x + 1) = f_0(x).$$

Now, $\forall k \in \mathbb{N}_0$ and $\forall x \in \mathbb{R}$ let us set

$$f_k(x) = f_0(2^k x).$$

Each function $f_k$ is continuous and periodic of period $1/2^k$; that is, $\forall x \in \mathbb{R}$,

$$f_k(x + 1/2^k) = f_k(x).$$

Figure 4.1.1

The graphs of $f_0$ and $f_1$ restricted to $[0, 1]$ are shown in Fig. 4.1.1. Let us note from the definition of $f_k$ that if $x \in [p/2^k,\ p/2^k + 1/2^{k+1}]$, then

$$f_k(x) = 2^k x - p,$$

and if $x \in [p/2^k + 1/2^{k+1},\ (p + 1)/2^k]$, then

$$f_k(x) = -2^k x + p + 1.$$

We now set

$$f(x) = \sum_{k=0}^{\infty} f_k(x)/2^k.$$
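By the Weierstrass M Test this is a uniformly convergent series of continuous functions and hence $f$ is continuous. Let $a \in \mathbb{R}$, $\delta > 0$, $1/2^n < \delta$, and $m$ be that integer so that $m/2^n \le a < (m + 1)/2^n$. Set $b_1 = m/2^n$, $b_2 = (m + 1)/2^n$, and $b = (b_1 + b_2)/2$. If $k > n$, then since $f_k$ has period $1/2^k$, we have $f_k(b_1) = f_k(b) = f_k(b_2)$. If $k < n$, let $p$ be that integer so that $p/2^{k+1} \le b_1 < (p + 1)/2^{k+1}$. Then, of course, $p/2^{k+1} < b < (p + 1)/2^{k+1}$ and $p/2^{k+1} < b_2 \le (p + 1)/2^{k+1}$, so that $b_1$, $b$, and $b_2$ all lie in one interval on which $f_k$ is linear. If $p$ is even we have, for $j \in \{1, 2\}$,

$$\frac{f_k(b_2) - f_k(b_1)}{b_2 - b_1} = \frac{f_k(b_j) - f_k(b)}{b_j - b} = 2^k,$$

and, if $p$ is odd,

$$\frac{f_k(b_2) - f_k(b_1)}{b_2 - b_1} = \frac{f_k(b_j) - f_k(b)}{b_j - b} = -2^k.$$

In either event, if $k > n$ or $k < n$, we have

$$\frac{f_k(b_2) - f_k(b_1)}{b_2 - b_1} - \frac{f_k(b_j) - f_k(b)}{b_j - b} = 0.$$

Hence we have

$$\frac{f(b_2) - f(b_1)}{b_2 - b_1} - \frac{f(b_j) - f(b)}{b_j - b} = \frac{1}{2^n}\left[\frac{f_n(b_2) - f_n(b_1)}{b_2 - b_1} - \frac{f_n(b_j) - f_n(b)}{b_j - b}\right] = \pm 1.$$

Now, either $b_1 \le a < b$ or $b \le a < b_2$. In either event the above equality shows that we cannot have (4.1.5) for $\varepsilon = 1/2$, say, regardless of how small $\delta$ is taken. Thus $f$ cannot be differentiable at $a$.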
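
The key identity of the argument can be checked exactly on a computer. The following is a minimal sketch, not part of the original text: it assumes $f_0$ is the distance to the nearest integer, uses Python's exact `Fraction` arithmetic, and relies on the fact that at a dyadic point with denominator dividing $2^{n+1}$ every term $f_k$ with $k > n$ vanishes, so a finite sum gives $f$ exactly. The names `f0`, `f_dyadic`, and `quotient_gaps` are illustrative only.

```python
from fractions import Fraction
from math import floor

def f0(x):
    """Distance from x to the nearest integer (the periodic sawtooth)."""
    frac = x - floor(x)
    return min(frac, 1 - frac)

def f_dyadic(x, n):
    """Exact f(x) = sum_k f0(2^k x)/2^k when the denominator of x divides 2^(n+1).

    For such x all terms with k > n are zero, so the sum over k <= n is exact.
    """
    return sum(f0(2**k * x) / 2**k for k in range(n + 1))

def quotient_gaps(a, n):
    """Gaps between the difference quotients used in the proof, for j = 1, 2."""
    m = floor(a * 2**n)
    b1, b2 = Fraction(m, 2**n), Fraction(m + 1, 2**n)
    b = (b1 + b2) / 2
    f = lambda x: f_dyadic(x, n)
    full = (f(b2) - f(b1)) / (b2 - b1)
    return [full - (f(bj) - f(b)) / (bj - b) for bj in (b1, b2)]

# The gap stays at -1 (j = 1) and +1 (j = 2) however fine the spacing 1/2^n.
for n in (1, 5, 10):
    print(n, quotient_gaps(Fraction(1, 3), n))
```

Because the gap never shrinks as $n$ grows, condition (4.1.5) fails at every point $a$, which is exactly what the proof asserts.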
Exercises

1. Suppose $f$ is a monotone increasing [decreasing] differentiable function with an interval domain whose derivative does not vanish at any point. Show that $f^{-1}$ is also differentiable and compute its derivative in terms of the derivative of $f$.

2. Use the results of Exercise 1 to show that the functions given in the following are differentiable and compute their derivatives:

(a) $f(x) = a^x$, $a > 0$, $x \in \mathbb{R}$.

(b) $f(x) = x^{1/n}$, $x > 0$, $n \in \mathbb{N}$. You may assume as known that if $g(x) = x^n$, $n \in \mathbb{N}$, then $g'(x) = n x^{n-1}$.

3. Suppose $f$ is a differentiable function with domain $[a, b]$ and $\forall x \in [a, b]$, $f'(x) > 0$. Show that $f$ is monotone increasing. [Hint: Suppose that $\exists \alpha, \beta \in [a, b]$ so that $\alpha < \beta$ and $f(\alpha) > f(\beta)$. Let $A = \{x : x > \alpha \ \&\ f(\alpha) > f(x)\}$ and set $\gamma = \text{g.l.b. } A$. Show that $f'(\gamma) \le 0$.]

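4. Suppose $(b_n)$ is a sequence whose value at $n \in \mathbb{N}_0$ is given by

$$b_n = 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} \qquad (0! = 1).$$

Show that $b_n \to e$. Hence obtain an estimate for the value of $e$ valid to three decimal places. [Hint: For $m \ge n$, set $\left(1 + \frac{1}{m}\right)^m = b_{nm} + r_{nm}$, where $b_{nm}$ is the sum of the first $n + 1$ terms of the binomial expansion of the left side and $r_{nm}$ is the sum of the remaining terms. Then $b_{nm} \to b_n$ and $r_{nm} \to r_n \ge 0$ as $m \to \infty$.]

5.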

For what values of a will the sequence whose terms are

converge? Compute the limit in each case where it converges.

6. Let us set

$$f(x) = \begin{cases} e^{-1/x^2}, & x \ne 0, \\ 0, & x = 0. \end{cases}$$

Show that $f^{(n)}(0)$ exists and is zero $\forall n \in \mathbb{N}$. You may assume as known the results of Exercise 4 of Section 2.3.

7. Define two functions $f$ and $g$ with domain $\mathbb{R}$ as follows: $f(0) = g(0) = 0$ and, if $x \ne 0$, $f(x) = x \sin(1/x)$, $g(x) = x^2 \sin(1/x)$. Show that $f'(0)$ does not exist, $g'(0) = 0$ but $\lim_{x\to 0} g'(x)$ does not exist. You may assume as known all trigonometric identities and all differentiation formulas for trigonometric functions.

4.2

DIFFERENTIATION RULES

In this section we shall obtain the rules for differentiating various combinations of functions. Presumably most of these rules are well known to the reader from his previous mathematical studies. However, some of the discussion we present here may clarify some points in the elementary calculus.

4.2.1 Theorem. If $f$ and $g$ are both differentiable at $a$, and $a$ is an accumulation point of $\mathcal{D}(f) \cap \mathcal{D}(g)$, then $f + g$, $fg$, and, if $g(a) \ne 0$, $1/g$ are differentiable at $a$. Moreover,

(a) $(f + g)'(a) = f'(a) + g'(a)$.

(b) $(fg)'(a) = f(a)g'(a) + f'(a)g(a)$.

(c) $(1/g)'(a) = -g'(a)/g(a)^2$.

Proof. Formula (a) is an immediate consequence of the equation

$$\frac{(f+g)(a+h) - (f+g)(a)}{h} = \frac{f(a+h) - f(a)}{h} + \frac{g(a+h) - g(a)}{h},$$

and the fact that a limit of a sum is a sum of the limits. Clearly, $h$ is chosen so that $a + h \in \mathcal{D}(f) \cap \mathcal{D}(g)$.

Formula (b) is a consequence of the equation

$$\frac{(fg)(a+h) - (fg)(a)}{h} = f(a+h)\,\frac{g(a+h) - g(a)}{h} + g(a)\,\frac{f(a+h) - f(a)}{h},$$

the fact that $f$ is continuous at $a$, and the fact that a limit of a sum and product is the sum of the limits and the product of the limits, respectively.

Formula (c) is a consequence of the equation

$$\frac{(1/g)(a+h) - (1/g)(a)}{h} = \frac{-1}{g(a+h)\,g(a)} \cdot \frac{g(a+h) - g(a)}{h},$$

the fact that $g$ is continuous at $a$, and the fact that the limit of a product is the product of the limits. Note that since $g$ is continuous and $g(a) \ne 0$, $g(a+h) \ne 0$ for all sufficiently small $h$ for which $a + h \in \mathcal{D}(g)$.

4.2.2 Theorem (Chain Rule). If $f$ and $g$ are functions with $\mathcal{R}(g) \subset \mathcal{D}(f)$, and $g'(a)$ and $f'(g(a))$ exist, then $(f \circ g)'(a)$ exists and

$$(f \circ g)'(a) = f'(g(a))\,g'(a).$$
Proof. The most natural way to begin to construct a proof is to consider the equality

$$\frac{(f \circ g)(a+h) - (f \circ g)(a)}{h} = \frac{f(g(a+h)) - f(g(a))}{g(a+h) - g(a)} \cdot \frac{g(a+h) - g(a)}{h}.$$

Since $g$ has a derivative at $a$, it is continuous at $a$ and hence as $h \to 0$, $g(a+h) - g(a) \to 0$. Consequently, taking limits on both sides of the above equality we should get the formula stated in the theorem. The one possible difficulty with this method is that $g(a+h) - g(a)$ could be zero for an infinite number of values of $h$ in every neighborhood of zero. Hence it would not always be possible to divide by the quantity $g(a+h) - g(a)$. Consequently, we must proceed in a somewhat different way.

Since $f$ is differentiable at $g(a)$, $\forall \varepsilon_1 > 0$, $\exists \delta_1 > 0$ so that $|y - g(a)| < \delta_1$ & $y \in \mathcal{D}(f) \implies$

$$|f(y) - f(g(a)) - f'(g(a))(y - g(a))| \le \varepsilon_1 |y - g(a)|. \tag{4.2.1}$$

Also, since $g$ is differentiable at $a$, $\forall \varepsilon_1 > 0$, $\exists \delta > 0$ so that $|x - a| < \delta$ & $x \in \mathcal{D}(g) \implies |g(x) - g(a)| < \delta_1$ and

$$|g(x) - g(a) - g'(a)(x - a)| \le \varepsilon_1 |x - a|. \tag{4.2.2}$$

Hence, if $|x - a| < \delta$ & $x \in \mathcal{D}(g)$, we get from (4.2.1) and (4.2.2),

$$\begin{aligned} |f(g(x)) - f(g(a)) - f'(g(a))g'(a)(x - a)| &\le |f(g(x)) - f(g(a)) - f'(g(a))(g(x) - g(a))| \\ &\quad + |f'(g(a))|\,|g(x) - g(a) - g'(a)(x - a)| \\ &\le \varepsilon_1 |g(x) - g(a)| + \varepsilon_1 |f'(g(a))|\,|x - a|. \end{aligned} \tag{4.2.3}$$

From (4.2.2) it follows from the triangle inequality that

$$|g(x) - g(a)| \le (\varepsilon_1 + |g'(a)|)\,|x - a|.$$

Using this in the last line of (4.2.3) the latter becomes less than or equal to

$$\varepsilon_1 (\varepsilon_1 + |g'(a)| + |f'(g(a))|)\,|x - a|.$$

For every $\varepsilon > 0$, take $\varepsilon_1 < \min\bigl(1,\ \varepsilon/(1 + |g'(a)| + |f'(g(a))|)\bigr)$ and a corresponding $\delta$ so that $|x - a| < \delta$ & $x \in \mathcal{D}(g)$ implies

$$|f \circ g(x) - f \circ g(a) - f'(g(a))g'(a)(x - a)| \le \varepsilon |x - a|. \tag{4.2.4}$$

If $x \ne a$, we can divide both sides of (4.2.4) by $|x - a|$, which shows that $f \circ g$ has a derivative at $a$ and indeed the derivative at that point is given by the formula in the theorem.
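
The difficulty that motivates this proof can be seen numerically. The sketch below is an illustration under assumptions that are not in the text: the inner function is taken to be $g(x) = x^2\,\mathrm{tri}(1/x)$ with $g(0) = 0$, where `tri` is the distance to the nearest integer, so that $g'(0) = 0$ and $g(2^{-k}) = 0$ exactly for every $k$; the outer function is $f = \exp$. The function names are hypothetical.

```python
import math

def tri(t):
    """Distance from t to the nearest integer."""
    return abs(t - round(t))

def g(x):
    """g(x) = x^2 * tri(1/x) for x != 0 and g(0) = 0, so g(2**-k) == g(0)."""
    return 0.0 if x == 0 else x * x * tri(1.0 / x)

f = math.exp  # outer function; f'(g(0)) = exp(0) = 1

def naive_quotient(h):
    """Difference quotient of f o g at 0, factored through g(h) - g(0)."""
    dg = g(h) - g(0.0)
    return (f(g(h)) - f(g(0.0))) / dg * (dg / h)  # breaks down when dg == 0

def direct_quotient(h):
    """Difference quotient of f o g at 0 with no division by g(h) - g(0)."""
    return (f(g(h)) - f(g(0.0))) / h

try:
    naive_quotient(2.0 ** -10)
except ZeroDivisionError:
    print("naive factorization fails at h = 2**-10")

# The unfactored quotient tends to f'(g(0)) * g'(0) = 0.
print([direct_quotient(0.3 * 2.0 ** -k) for k in (2, 6, 10)])
```

The factored quotient is undefined at $h = 2^{-k}$ for every $k$, that is, in every neighborhood of zero, while the estimate (4.2.4) never divides by $g(x) - g(a)$ and so is unaffected.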


There is another theorem, very closely related to the chain rule,
which is sometimes confused with the chain rule in elementary calculus.
For this reason we feel it is worthwhile to point it out.

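4.2.3 Theorem. If $\mathcal{R}(g) \subset \mathcal{D}(f)$, $(f \circ g)'(a)$ and $f'(g(a))$ exist, $f'(g(a)) \ne 0$, and $g$ is continuous at $a$, then $g'(a)$ exists and

$$g'(a) = (f \circ g)'(a)/f'(g(a)).$$

Proof. Since $(f \circ g)'(a)$ exists, $\forall \varepsilon_1 > 0$, $\exists \delta_1 > 0$ so that $|x - a| < \delta_1$ and $x \in \mathcal{D}(g) \implies$

$$|f \circ g(x) - f \circ g(a) - (f \circ g)'(a)(x - a)| \le \varepsilon_1 |x - a|. \tag{4.2.5}$$

Also, since $f'(g(a))$ exists, $\exists \delta_2 > 0$ so that $|y - g(a)| < \delta_2$ and $y \in \mathcal{D}(f) \implies$

$$|f(y) - f(g(a)) - f'(g(a))(y - g(a))| \le \varepsilon_1 |y - g(a)|. \tag{4.2.6}$$

Since $g$ is continuous at $a$, $\exists \delta_3 > 0$ so that $|x - a| < \delta_3$ and $x \in \mathcal{D}(g) \implies |g(x) - g(a)| < \delta_2$. Hence if $|x - a| < \delta_3$ and $x \in \mathcal{D}(g)$, we have from (4.2.6)

$$|f(g(x)) - f(g(a)) - f'(g(a))(g(x) - g(a))| \le \varepsilon_1 |g(x) - g(a)|. \tag{4.2.6'}$$

Let us put $\delta = \min(\delta_1, \delta_3)$ and $m = 1/|f'(g(a))|$. If we use (4.2.5) and (4.2.6'), then $\forall x \in \mathcal{D}(g)$ with $|x - a| < \delta$, we get

$$\left| g(x) - g(a) - \frac{(f \circ g)'(a)}{f'(g(a))}(x - a) \right| \le \varepsilon_1 m \{ |g(x) - g(a)| + |x - a| \}. \tag{4.2.7}$$

Now set $M = |(f \circ g)'(a)|$ and use the triangle inequality on (4.2.7) to get

$$(1 - m\varepsilon_1)\,|g(x) - g(a)| \le m(\varepsilon_1 + M)\,|x - a|.$$

Take $\varepsilon_1 < 1/2m$, and we get

$$|g(x) - g(a)| \le (1 + 2Mm)\,|x - a|.$$

If we use this inequality in the right side of (4.2.7) and set $A = 2m(1 + mM)$, we get

$$\left| g(x) - g(a) - \frac{(f \circ g)'(a)}{f'(g(a))}(x - a) \right| \le \varepsilon_1 A\,|x - a|. \tag{4.2.8}$$

Thus $\forall \varepsilon > 0$, take $\varepsilon_1 < \min(\varepsilon/A,\ 1/2m)$, and we see that $\exists \delta > 0$ so that $0 < |x - a| < \delta$ and $x \in \mathcal{D}(g) \implies$

$$\left| \frac{g(x) - g(a)}{x - a} - \frac{(f \circ g)'(a)}{f'(g(a))} \right| < \varepsilon. \tag{4.2.9}$$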

REMARKS. In Theorem 4.2.2 we demanded that $\mathcal{R}(g) \subset \mathcal{D}(f)$ to ensure that $a$ is an accumulation point of $\mathcal{D}(f \circ g)$ so that we could talk about $(f \circ g)'(a)$. In Theorem 4.2.3 we needed $\mathcal{R}(g) \subset \mathcal{D}(f)$, since an examination of the proof reveals that otherwise (4.2.9) would not necessarily be valid for all $x \in \mathcal{D}(g)$ for which $0 < |x - a| < \delta$.
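
As a small illustration of how Theorem 4.2.3 is used (a sketch under assumptions, not part of the text), take $f = \exp$ and $g = \log$, so that $f \circ g$ is the identity and $(f \circ g)'(a) = 1$. The theorem then gives $g'(a) = 1/f'(g(a)) = 1/a$. The helper names below are hypothetical, and the central difference is only a numerical cross-check.

```python
import math

def derivative_from_composite(fog_prime_at_a, f_prime, g, a):
    """g'(a) = (f o g)'(a) / f'(g(a)), assuming f'(g(a)) != 0 and g continuous at a."""
    return fog_prime_at_a / f_prime(g(a))

def central_difference(func, a, h=1e-6):
    """Numerical estimate of func'(a), used here only as a cross-check."""
    return (func(a + h) - func(a - h)) / (2 * h)

a = 2.0
# f = exp, g = log: the composite is the identity, whose derivative is 1.
print(derivative_from_composite(1.0, math.exp, math.log, a))
print(central_difference(math.log, a))

# f(y) = y^2, g = sqrt: again the composite is the identity on x > 0.
print(derivative_from_composite(1.0, lambda y: 2 * y, math.sqrt, 9.0))
```

Note that the theorem, unlike the chain rule, does not assume in advance that $g$ is differentiable; continuity of $g$ at $a$ and $f'(g(a)) \ne 0$ are what make the division legitimate.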

Exercises

1. Use the rules for differentiation to establish the following:

(a) If $f(x) = x^n$, $\forall x \in \mathbb{R}$ and fixed $n \in \mathbb{N}_0$, then $f$ is differentiable.

(b) If $c$ is a constant and $f$ is differentiable at $a$, then $cf$ is differentiable at $a$.

(c) Any polynomial function of degree $n \in \mathbb{N}_0$,

$$p(x) = \sum_{k=0}^{n} a_k x^k,$$

is differentiable.

2. Use Theorem 4.2.3 to solve Exercise 1 of Section 4.1.

3. Assume the results of Exercise 1 of Section 4.1 as known. Suppose $f$ and $g$ are functions with $\mathcal{R}(g) \subset \mathcal{D}(f)$, $f'(y)$ exists and is continuous in an open interval around $g(a)$, $f'(g(a)) \ne 0$, and $(f \circ g)'(a)$ exists. Use the chain rule (Theorem 4.2.2) and the results of Exercise 1 of Section 4.1 to show that $g'(a)$ exists and

$$g'(a) = (f \circ g)'(a)/f'(g(a)).$$

4. Compute the derivative of $e(x) = e^x$ and, assuming the derivative of the logarithm function is known, use the chain rule to show that the following functions are differentiable and compute their derivatives:

(a) $f(x) = a^x = e^{x \log a}$, $a > 0$, $\forall x \in \mathbb{R}$.

(b) $f(x) = x^a = e^{a \log x}$, $\forall x > 0$, $a \in \mathbb{R}$.

5. Assuming that the logarithm is a differentiable function, use Theorem 4.2.3 (and not the chain rule) to show that the functions in (a) and (b) of Exercise 4 are differentiable.

6. Suppose that $f$ and $g$ are $n$ times differentiable functions on $]a, b[$. If $h$ is the product of $f$ and $g$, prove Leibnitz's formula for the $n$th derivative of $h$,

$$h^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(n-k)}(x)\, g^{(k)}(x),$$

where $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$.

4.3

MEAN VALUE THEOREMS

All the mean value theorems of the differential calculus are based on two principles: (a) a continuous function on a compact set assumes a maximum and a minimum, and (b) if a differentiable function is defined in an open interval about a point where it has a local maximum or minimum, then the derivative must be zero at this local maximum or minimum.

4.3.1 Theorem. If $f$ is a function with $\mathcal{D}(f) = \,]a, b[$, $a < b$, and if $f$ has a local maximum or local minimum at $c$, and if moreover $f'(c)$ exists, then $f'(c) = 0$.

Proof. Let us suppose that $f$ has a local minimum at $c$. Then, for all $h$ sufficiently close to zero,

$$\frac{f(c + h) - f(c)}{h} \ \begin{cases} \ge 0 & \text{if } h > 0, \\ \le 0 & \text{if } h < 0. \end{cases}$$

From this we see that the limit of the difference quotient, as $h \to 0$, must be zero.

4.3.2 Rolle's Theorem. If $f$ is a continuous function with $\mathcal{D}(f) = [a, b]$, $a < b$, $f(a) = f(b) = 0$, and $f$ is differentiable on $]a, b[$, then $\exists c \in \,]a, b[$ so that $f'(c) = 0$.

Proof. Since $f$ is continuous on the compact set $[a, b]$, it has a maximum and a minimum on this interval. If the maximum and minimum are taken on at the end points, we have $f(x) = 0$ for every $x$ and hence the theorem is true. If $f$ has a local maximum or minimum at $c \in \,]a, b[$, then by Theorem 4.3.1, $f'(c) = 0$.

4.3.3 Mean Value Theorem. If $f$ is a continuous function with $\mathcal{D}(f) = [a, b]$, $a < b$, and $f$ is differentiable on $]a, b[$, then $\exists c \in \,]a, b[$ such that

$$\frac{f(b) - f(a)}{b - a} = f'(c).$$

Proof. From a geometric point of view, the Mean Value Theorem says that there is a point on the graph of $f$ where the tangent line to $f$ is parallel to the line joining $(a, f(a))$ to $(b, f(b))$, Fig. 4.3.1. The equation of the straight line through the points $(a, f(a))$ and $(b, f(b))$ is

$$y = f(a) + \frac{f(b) - f(a)}{b - a}(x - a).$$

Figure 4.3.1

The difference between $y$ and $f(x)$ is

$$F(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a) - f(x).$$

Now, $F$ is a function which is continuous on $[a, b]$, is differentiable on $]a, b[$, and $F(a) = F(b) = 0$. Hence we may apply Rolle's theorem to $F$ and find a $c \in \,]a, b[$ such that

$$F'(c) = \frac{f(b) - f(a)}{b - a} - f'(c) = 0.$$

4.3.4 Generalized Mean Value Theorem. If $f$ and $g$ are continuous functions defined on $[a, b]$, $a < b$, and have derivatives on $]a, b[$, then $\exists c \in \,]a, b[$ such that

$$g'(c)[f(b) - f(a)] = f'(c)[g(b) - g(a)].$$

Proof. Set

$$F(x) = [g(b) - g(a)][f(a) - f(x)] + [g(x) - g(a)][f(b) - f(a)].$$

This is analogous to the formula we wrote down in the previous theorem, where $x - a$ has been replaced by $g(x) - g(a)$, $b - a$ has been replaced by $g(b) - g(a)$, and we may think about it as if we had multiplied through by the latter factor. Now $F$ is continuous on $[a, b]$, differentiable on $]a, b[$, and $F(a) = F(b) = 0$. Hence we apply Rolle's theorem and find a $c \in \,]a, b[$ such that

$$F'(c) = g'(c)[f(b) - f(a)] - f'(c)[g(b) - g(a)] = 0,$$

which concludes the proof.
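
The Mean Value Theorem only asserts that a point $c$ exists; it gives no recipe for finding it. The following sketch locates one such point by bisection on $F'(c) = \frac{f(b) - f(a)}{b - a} - f'(c)$. It rests on assumptions beyond the theorem: $f'$ is supplied by the caller, is continuous, and $F'$ has opposite signs at the two end points. The function name `mean_value_point` is hypothetical.

```python
def mean_value_point(f, f_prime, a, b, tol=1e-12):
    """Find c in [a, b] with f'(c) = (f(b) - f(a)) / (b - a) by bisection.

    Assumes f' is continuous and slope - f' changes sign between a and b;
    the Mean Value Theorem itself requires neither assumption.
    """
    slope = (f(b) - f(a)) / (b - a)
    F_prime = lambda c: slope - f_prime(c)
    lo, hi = a, b
    if F_prime(lo) * F_prime(hi) > 0:
        raise ValueError("slope - f' does not change sign; bisection does not apply")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F_prime(lo) * F_prime(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# f(x) = x^3 on [0, 2]: the chord has slope 4, so 3 c^2 = 4.
c = mean_value_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
print(c, (4 / 3) ** 0.5)
```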


REMARKS: If one uses the theory of determinants (which we shall develop in Section 6.6), there is an interesting way of looking at the proof of the Mean Value Theorem which leads to more general results. If we refer to Fig. 4.3.2, then the area of the triangle with vertices at the points $(a, f(a))$, $(b, f(b))$ and $(x, f(x))$ is one half the absolute value of the determinant

$$F(x) = \begin{vmatrix} f(x) & x & 1 \\ f(a) & a & 1 \\ f(b) & b & 1 \end{vmatrix}.$$

Figure 4.3.2

Now, if $f$ is continuous on $[a, b]$ and differentiable on $]a, b[$, the same is true for the function $F$. Clearly, $F(a) = F(b) = 0$, so that by Rolle's theorem $\exists c \in \,]a, b[$ so that $F'(c) = 0$. The derivative $F'(x)$ is obtained by differentiating the top row of the determinant. If we do this at $c$ and set the resulting determinant to zero, we get the Mean Value Theorem.

This procedure can be generalized in the following way. Suppose $f$, $g$, and $h$ are continuous on $[a, b]$ and differentiable on $]a, b[$. Set

$$F(x) = \begin{vmatrix} f(x) & g(x) & h(x) \\ f(a) & g(a) & h(a) \\ f(b) & g(b) & h(b) \end{vmatrix}.$$

Then $F$ is continuous on $[a, b]$, differentiable on $]a, b[$, and $F(a) = F(b) = 0$. By Rolle's theorem, $\exists c \in \,]a, b[$ so that $F'(c) = 0$. But $F'(x)$ is obtained by differentiating the top row of this determinant. Thus we get

$$F'(c) = \begin{vmatrix} f'(c) & g'(c) & h'(c) \\ f(a) & g(a) & h(a) \\ f(b) & g(b) & h(b) \end{vmatrix} = 0.$$

Note that if $h(x) = 1$, this is the Generalized Mean Value Theorem.

Let us now turn to the problem of obtaining a mean value theorem for higher-order differences. Suppose that $f$ is a function defined on $[a, b]$ and for $x$ and $x + h$ in $[a, b]$ we define

$$\Delta_h f(x) = f(x + h) - f(x).$$

If $x + 2h$ is also in $[a, b]$, then $\Delta_h(\Delta_h f(x))$ is defined. We call this $\Delta_h^2 f(x)$ and we have

$$\Delta_h^2 f(x) = f(x + 2h) - 2f(x + h) + f(x).$$

If $n$ is a positive integer so that $x + nh \in [a, b]$, we define $\Delta_h^n f(x)$ inductively by means of the formula

$$\Delta_h^n f(x) = \Delta_h(\Delta_h^{n-1} f(x)).$$

4.3.5 Theorem. Suppose $f$ is continuous on $[a, b]$, and $n$ times differentiable on $]a, b[$. Then $\forall x \in [a, b]$ and $\forall h$ for which $x + nh \in [a, b]$, $\exists \theta$, $0 < \theta < 1$, so that

$$\Delta_h^n f(x) = f^{(n)}(x + n\theta h)\,h^n.$$

Proof. Let $P(n)$ be the statement of the theorem. More precisely $P(n)$ should be stated in the following way:

For every nondegenerate interval $[a, b]$ and for every continuous function on $[a, b]$ which is $n$ times differentiable on $]a, b[$ and $\forall x \in [a, b]$ and $\forall h$, if $x + nh \in [a, b]$, then $\exists \theta$ so that $0 < \theta < 1$ and

$$\Delta_h^n f(x) = f^{(n)}(x + n\theta h)\,h^n.$$

If $h = 0$, the theorem is clearly true and therefore we shall suppose $h \ne 0$. Indeed, for the sake of argument we shall suppose that $h > 0$.

The statement $P(1)$ is a statement of the Mean Value Theorem and hence is true. Assume $P(n - 1)$ is true for $n > 1$ and set

$$g(x) = \frac{\Delta_h f(x)}{h}.$$

The function $g$ is defined and continuous on $[a, b - h]$ and is $n - 1$ times differentiable on $]a, b - h[$. If $x + nh \in [a, b]$, then $x + (n - 1)h \in [a, b - h]$. Consequently, we may apply $P(n - 1)$ to $g$ and find a $\theta_1$ so that

$$\Delta_h^{n-1} g(x) = g^{(n-1)}(x + (n - 1)\theta_1 h)\,h^{n-1}, \qquad 0 < \theta_1 < 1. \tag{4.3.1}$$

Now,

$$g^{(n-1)}(x + (n - 1)\theta_1 h) = \frac{1}{h}\left\{ f^{(n-1)}(x + (n - 1)\theta_1 h + h) - f^{(n-1)}(x + (n - 1)\theta_1 h) \right\}.$$

Apply the Mean Value Theorem to the right side and we find a $\theta_2$ so that

$$g^{(n-1)}(x + (n - 1)\theta_1 h) = f^{(n)}(x + (n - 1)\theta_1 h + \theta_2 h), \qquad 0 < \theta_2 < 1. \tag{4.3.2}$$

If we set $\theta = [(n - 1)\theta_1 + \theta_2]/n$, then it is clear that $0 < \theta < 1$. Further, since

$$\Delta_h^{n-1} g(x) = \Delta_h^{n-1}(\Delta_h f(x)/h) = \Delta_h^n f(x)/h,$$

it follows from (4.3.1) and (4.3.2) that

$$\Delta_h^n f(x) = f^{(n)}(x + n\theta h)\,h^n. \tag{4.3.3}$$

If $h < 0$, a similar argument will lead to the same conclusion (4.3.3). Consequently, we have shown that $P(n - 1) \implies P(n)$, and using the axiom of induction completes the proof.
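
For a concrete feel for this theorem, the sketch below (an illustration under assumptions, with hypothetical function names) computes the $n$th difference by the same recursion used in the definition and, for $f = \exp$, solves the identity $\Delta_h^n f(x) = f^{(n)}(x + n\theta h)h^n$ for $\theta$. The exponential is chosen only because every derivative is again $\exp$, so $\theta$ can be recovered with a logarithm.

```python
import math

def nth_difference(f, x, h, n):
    """Delta_h^n f(x), computed from Delta_h^n f = Delta_h(Delta_h^(n-1) f)."""
    if n == 0:
        return f(x)
    return nth_difference(f, x + h, h, n - 1) - nth_difference(f, x, h, n - 1)

def theta_for_exp(x, h, n):
    """Solve Delta_h^n exp(x) = exp(x + n*theta*h) * h**n for theta (h != 0)."""
    ratio = nth_difference(math.exp, x, h, n) / h**n
    return (math.log(ratio) - x) / (n * h)

# For a cubic the third difference is exactly f'''(.) * h^3 = 6 h^3.
print(nth_difference(lambda t: t**3, 1, 2, 3))

# For exp the intermediate point always has 0 < theta < 1.
for h in (0.5, -0.5, 0.1):
    print(h, theta_for_exp(0.0, h, 3))
```

The recursion makes $2^n$ function evaluations, which is fine for the small $n$ used here but would be replaced by the binomial sum in any serious computation.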


As an application of the Generalized Mean Value Theorem, 4.3.4, we shall obtain a result that is an aid in computing limits of quotients of functions. We have seen in Chapter 2 that

$$\lim_{x\to a}(f/g)(x) = \lim_{x\to a} f(x) \Big/ \lim_{x\to a} g(x),$$

provided the limits in the numerator and denominator on the right exist and $\lim_{x\to a} g(x) \ne 0$. Now, it may happen that $\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) = 0$. In this case it may still be possible to compute the limit of $f/g$ as $x \to a$. Or it may happen that $\lim_{x\to a} g(x) = \infty$ and it may be possible to compute the limit of the quotient. By $\lim_{x\to a} g(x) = \infty$ we mean that $\forall M$, $\exists \delta > 0$ so that $0 < |x - a| < \delta$ & $x \in \mathcal{D}(g) \implies g(x) \ge M$. By $\lim_{x\to a} g(x) = -\infty$ we mean that $\forall M$, $\exists \delta > 0$ so that $0 < |x - a| < \delta$ & $x \in \mathcal{D}(g) \implies g(x) \le M$.

4.3.6 L'Hospital's Rule. Suppose $f$ and $g$ are differentiable on $[a, b[$, $a < b$, and $g'(x) \ne 0$ in this interval. If

(a) $\lim_{x \nearrow b} f(x) = 0 = \lim_{x \nearrow b} g(x)$,

or if

(b) $\lim_{x \nearrow b} g(x) = \infty$,

and if

$$\lim_{x \nearrow b} (f'/g')(x) = l,$$

then

$$\lim_{x \nearrow b} (f/g)(x) = l.$$

Proof. (a) Let us first give the proofs under the supposition that the left limits of $f$ and $g$ are zero at $b$. This being the case we may set $f(b) = g(b) = 0$ and thus extend $f$ and $g$ so as to be continuous on $[a, b]$. From the Mean Value Theorem, $\forall x \in [a, b[$ we get $g(x) - g(b) = g'(\xi)(x - b)$, $x < \xi < b$. Since, by hypothesis, $g'(\xi) \ne 0$, it follows that $g(x) - g(b) \ne 0$ and we may divide by it. Using the Generalized Mean Value Theorem 4.3.4, we get

$$\frac{f(x)}{g(x)} = \frac{f(x) - f(b)}{g(x) - g(b)} = \frac{f'(\xi)}{g'(\xi)}, \qquad x < \xi < b.$$

Now, $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $0 < b - x < \delta \implies 0 < b - \xi < \delta$ and

$$\left| \frac{f(x)}{g(x)} - l \right| = \left| \frac{f'(\xi)}{g'(\xi)} - l \right| < \varepsilon.$$

(b) Let us now look at the situation where $\lim_{x \nearrow b} g(x) = \infty$. Let us set $h(x) = f(x) - l\,g(x)$ so that, by hypothesis,

$$\lim_{x \nearrow b} (h'/g')(x) = 0.$$

Now, since $g(x) \to \infty$ as $x \nearrow b$, $\exists c_1 \in [a, b[$ so that $c_1 < x < b \implies g(x) > 0$, and $\exists c \ge c_1$ so that $c < x < b \implies$

$$\left| \frac{h(x) - h(c)}{g(x) - g(c)} \right| = \left| \frac{h'(\xi)}{g'(\xi)} \right| < \frac{\varepsilon}{2}, \qquad c < \xi < x.$$

Also, since $g(x) \to \infty$ as $x \nearrow b$, $\exists d$ so that $c < d < b$ and if $d < x < b$, then $0 < g(x) - g(c) < g(x)$ and $|h(c)/g(x)| < \varepsilon/2$. Thus, for $x \in \,]d, b[$ we have

$$\frac{|h(x)|}{g(x)} - \frac{|h(c)|}{g(x)} \le \frac{|h(x) - h(c)|}{g(x)} < \frac{|h(x) - h(c)|}{g(x) - g(c)}.$$

Since $|h(c)/g(x)| < \varepsilon/2$ for $d < x < b$, for these $x$ we get

$$\left| \frac{h(x)}{g(x)} \right| = \left| \frac{f(x)}{g(x)} - l \right| < \varepsilon.$$

The previous theorem is clearly true for $b$ finite or infinite. There is also a corresponding theorem involving right limits and also a theorem for the case where $g(x) \to -\infty$ as $x \nearrow b$. We leave the details for the reader.

As an example of the use of L'Hospital's rule, let us consider

$$h(x) = \frac{1 - \cos x}{\sin x}, \qquad x \ne 0.$$

Let us suppose it is known that $\lim_{x\to 0} \cos x = 1$, $\lim_{x\to 0} \sin x = 0$, $D \sin x = \cos x$, and $D \cos x = -\sin x$. All the conditions of L'Hospital's rule are satisfied for the above quotient and

$$\lim_{x\to 0} h(x) = \lim_{x\to 0} \frac{\sin x}{\cos x} = 0.$$

Sometimes it may be possible, and necessary, to apply L'Hospital's rule more than once in a given situation. For example, consider the function given by

$$\frac{f(x)}{g(x)} = \frac{\sin x - x}{x^2}, \qquad x \ne 0.$$

Then we have

$$\frac{f'(x)}{g'(x)} = \frac{\cos x - 1}{2x}.$$

Now, $f'(x) \to 0$ and $g'(x) \to 0$ as $x \to 0$ and $D^2 g(x) \ne 0$. Hence we may apply the rule to $f'/g'$ to get

$$\lim_{x\to 0} \frac{f(x)}{g(x)} = \lim_{x\to 0} \frac{-\sin x}{2} = 0.$$
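
A quick numerical look at these two limits may be reassuring. The sketch below assumes the two quotients of the examples are $(1 - \cos x)/\sin x$ and $(\sin x - x)/x^2$ (the first numerator is a reading of a damaged formula, so treat it as an assumption) and simply evaluates them near zero. The function names are hypothetical.

```python
import math

def first_quotient(x):
    """(1 - cos x) / sin x, which L'Hospital's rule sends to 0 as x -> 0."""
    return (1 - math.cos(x)) / math.sin(x)

def second_quotient(x):
    """(sin x - x) / x^2, which needs the rule twice and also tends to 0."""
    return (math.sin(x) - x) / x**2

for x in (1e-1, 1e-2, 1e-3):
    # Both columns shrink roughly in proportion to x (like x/2 and -x/6).
    print(x, first_quotient(x), second_quotient(x))
```

Much smaller values of $x$ are avoided on purpose: in floating point the subtractions $1 - \cos x$ and $\sin x - x$ lose accuracy through cancellation, which is a limitation of the numerical check, not of the rule.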

Finally, let us give an important application of the use of the Mean Value Theorem 4.3.3. In Theorem 4.2.1 we showed that if two functions are differentiable at a point, then the sum of the two functions is also differentiable at that point and the derivative of the sum is the sum of the derivatives. By the principle of induction this result can be stated for any finite number of functions. Under suitable hypotheses the result will extend for an infinite sum of functions. This is what we now wish to establish.

4.3.7 Theorem. Suppose that $(f_n)$ is a function sequence for which each $f_n$ has the domain $]a, b[$ and is differentiable there. Suppose further that the sequence $(f_n')$ is uniformly convergent to a function $g$ and $\exists c \in \,]a, b[$ so that the real sequence $(f_n(c))$ is convergent. Then $(f_n)$ is uniformly convergent to a function $f$ and $f' = g$.

Proof. Using the Mean Value Theorem, $\forall x \in \,]a, b[$ & $\forall n, m \in \mathbb{N}_0$ we get

$$[f_n(x) - f_m(x)] - [f_n(c) - f_m(c)] = (x - c)[f_n'(\xi) - f_m'(\xi)],$$

where $\xi$ is between $x$ and $c$. Hence we get

$$|f_n(x) - f_m(x)| \le |f_n(c) - f_m(c)| + (b - a)\,|f_n'(\xi) - f_m'(\xi)|.$$

Since $(f_n')$ is uniformly convergent it is uniformly Cauchy, and since $(f_n(c))$ is Cauchy, it follows from the above inequality that $(f_n)$ is uniformly Cauchy. Thus $(f_n)$ is uniformly convergent to a function $f$.

Now fix $x \in \,]a, b[$ and use the Mean Value Theorem again to get $\forall z \in \,]a, b[$ & $\forall n, m \in \mathbb{N}_0$,

$$[f_n(z) - f_m(z)] - [f_n(x) - f_m(x)] = (z - x)[f_n'(\zeta) - f_m'(\zeta)],$$

where $\zeta$ is between $z$ and $x$. If $z - x \ne 0$, we may divide by this quantity and get

$$\left| \frac{f_n(z) - f_n(x)}{z - x} - \frac{f_m(z) - f_m(x)}{z - x} \right| = |f_n'(\zeta) - f_m'(\zeta)|.$$

Using the uniform convergence of $(f_n')$ we arrive at the conclusion that $\forall \varepsilon > 0$, $\exists N$ so that $\forall z \in \,]a, b[$, $z \ne x$, and $\forall n, m \ge N$,

$$\left| \frac{f_n(z) - f_n(x)}{z - x} - \frac{f_m(z) - f_m(x)}{z - x} \right| < \frac{\varepsilon}{3}.$$

If we let $n \to \infty$ in this last inequality and use the fact that the absolute value function is continuous, we get

$$\left| \frac{f(z) - f(x)}{z - x} - \frac{f_m(z) - f_m(x)}{z - x} \right| \le \frac{\varepsilon}{3}.$$

Choose a fixed $m > N$ so that

$$|f_m'(x) - g(x)| < \frac{\varepsilon}{3}.$$

For this fixed $m$, $\exists \delta > 0$ so that $0 < |z - x| < \delta$ & $z \in \,]a, b[ \implies$

$$\left| \frac{f_m(z) - f_m(x)}{z - x} - f_m'(x) \right| < \frac{\varepsilon}{3}.$$

From the last three inequalities and the use of the triangle inequality we find that if $z \in \,]a, b[$ & $0 < |z - x| < \delta$, then

$$\left| \frac{f(z) - f(x)}{z - x} - g(x) \right| < \varepsilon.$$

Of course, this says that $f'(x)$ exists and $f'(x) = g(x)$.

4.3.8 Corollary. If $\forall n \in \mathbb{N}_0$, $f_n$ is defined and differentiable on $]a, b[$, if $\sum_{n=0}^{\infty} f_n'$ is uniformly convergent on $]a, b[$ and $\exists c \in \,]a, b[$ so that $\sum_{n=0}^{\infty} f_n(c)$ is convergent, then $\sum_{n=0}^{\infty} f_n$ is uniformly convergent and moreover

$$\Bigl(\sum_{n=0}^{\infty} f_n\Bigr)' = \sum_{n=0}^{\infty} f_n'.$$

We shall leave the proof to the reader.

Exercises

1. A function $f$ is said to be Lipschitz if and only if $\exists M > 0$ so that $\forall x, y \in \mathcal{D}(f)$, $|f(x) - f(y)| \le M|x - y|$. If $f$ has domain $[a, b]$ and $f'$ has the same domain and is continuous, show that $f$ is Lipschitz.

2. If $f$ is differentiable on $[a, b]$ and $\forall x \in [a, b]$, $f'(x) = 0$, show that $f$ is a constant function.

3. Do Exercise 3 of Section 4.1 by using the Mean Value Theorem.

4. Assume that $f$ has a continuous derivative at every point of its domain $]a, b[$, $g$ is differentiable on its domain $]c, d[$, and $\mathcal{R}(g) \subset \mathcal{D}(f)$. Use the Mean Value Theorem to derive the chain rule for differentiating the composite function $f \circ g$.

5. Suppose $f$ is a differentiable function on $]a, b[$ such that $f'$ is bounded. If $(x_n)$ is a sequence with range in $]a, b[$ so that $x_n \to a$, show that $\lim_{n\to\infty} f(x_n)$ exists.

6. Suppose that $f$ is a differentiable function defined on $]a, b[$ and for $c \in \,]a, b[$ suppose that $\lim_{x\to c} f'(x)$ exists. Show that this limit must be $f'(c)$.

7. Suppose that $f$ has a continuous derivative on $]a, b[$ and that $f'$ can be extended to a continuous function $F$ defined on $[a, b]$. Show that $f$ itself can be extended to a continuous function defined on $[a, b]$ and that

$$F(a) = \lim_{x \searrow a} \frac{f(x) - f(a)}{x - a}, \qquad F(b) = \lim_{x \nearrow b} \frac{f(x) - f(b)}{x - b}.$$

8. Use L'Hospital's rule to establish the following:

(a) $x^a e^{-x} \to 0$ as $x \to \infty$, $a \ge 0$.

(b) $x \log x \to 0$ as $x \to 0$.

9. Compute the following limits:

(a) $\displaystyle \lim_{x\to 0} \frac{e^x - 1 - x}{x^2}$.

(b) $\displaystyle \lim_{x\to 0} \frac{\log(1 - x) + x}{x^2}$.

(c) $\displaystyle \lim_{x\to 0} \frac{\cos x - 1 + x^2/2}{x^4}$.

10. If $f^{(2)}(x)$ exists on $]a, b[$, show that

$$f^{(2)}(x) = \lim_{h\to 0} \frac{f(x + h) - 2f(x) + f(x - h)}{h^2}.$$

(Hint: Use L'Hospital's rule.)

11. Suppose $f$ is differentiable on $[a, b]$ and $f'(a) = \alpha$, $f'(b) = \beta$. Show that $f'$ takes on all values between $\alpha$ and $\beta$. [Hint: If $\gamma$ is strictly between $\alpha$ and $\beta$, show that $g(x) = f(x) - \gamma(x - a)$ takes on its maximum or its minimum in the open interval $]a, b[$. This result is often called Darboux's theorem or property.]

12. Let $(r_n)$ be a sequence whose range consists of all rationals in $]0, 1[$. Show that

4.4

TAYLOR'S REMAINDER FORMULAS

If $f$ is a polynomial of degree $n - 1$ and $a \in \mathbb{R}$, then we may write

$$f(x) = a_0 + a_1(x - a) + \cdots + a_{n-1}(x - a)^{n-1}.$$

By successive differentiation of both sides of this equation at $a$ we find that

$$a_k = \frac{f^{(k)}(a)}{k!}, \qquad k \in \{0, \ldots, n - 1\}.$$

For a general function defined on an interval $[a, b]$, and which is $n - 1$ times differentiable there, we write

$$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x, a),$$

where this formula serves to define $R_n$. The problem is to find a convenient form for the remainder $R_n$. Any such formula is called a Taylor's remainder formula, although none of the expressions for $R_n$ is due to Taylor himself. The method for obtaining expressions for the remainder is an application of Rolle's theorem.

4.4.1 Theorem.† Suppose $f$ has $n - 1$ continuous derivatives on $[a, b]$, and is $n$ times differentiable on $]a, b[$. Further suppose $\Phi$ and $\Psi$ are continuous on $[a, b]$, differentiable on $]a, b[$ and $\forall x, y \in \,]a, b[$ with $a < y < x$, the determinant

$$\begin{vmatrix} \Phi'(y) & \Psi'(y) \\ \Phi(x) & \Psi(x) \end{vmatrix} \ne 0.$$

Then $\forall x \in \,]a, b]$, $\exists c \in \,]a, x[$, so that

$$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x, a),$$

where

$$R_n(x, a) = -\,\frac{\begin{vmatrix} \Phi(a) & \Psi(a) \\ \Phi(x) & \Psi(x) \end{vmatrix}}{\begin{vmatrix} \Phi'(c) & \Psi'(c) \\ \Phi(x) & \Psi(x) \end{vmatrix}}\ \frac{f^{(n)}(c)}{(n - 1)!}\,(x - c)^{n-1}. \tag{4.4.1}$$

Proof. If $F$ is continuous on $[a, x]$, $a < x \le b$, and differentiable on $]a, x[$, then by Rolle's theorem $\exists c \in \,]a, x[$ so that the determinant‡

$$\begin{vmatrix} F'(c) & \Phi'(c) & \Psi'(c) \\ F(a) & \Phi(a) & \Psi(a) \\ F(x) & \Phi(x) & \Psi(x) \end{vmatrix} = 0. \tag{4.4.2}$$

(See the remarks after Theorem 4.3.4.) Let us take, for $a \le t \le b$,

$$F(t) = f(x) - \sum_{k=0}^{n-1} \frac{f^{(k)}(t)}{k!}(x - t)^k.$$

Clearly $F$ is continuous on $[a, b]$ and its derivative in $]a, b[$ is given by

$$F'(t) = -\frac{(x - t)^{n-1}}{(n - 1)!}\,f^{(n)}(t). \tag{4.4.3}$$

Further,

$$F(a) = R_n(x, a), \qquad F(x) = 0. \tag{4.4.4}$$

Using (4.4.3) and (4.4.4) in (4.4.2) and solving the resulting equation for $R_n(x, a)$, we get exactly the form (4.4.1).

† This theorem is due to L. M. Blumenthal, Am. Math. Monthly, 33 (1926), 424-426. It was pointed out to me by Professor Sam Lachterman.

‡ A development of the theory of determinants is given in Chapter 6.

By taking $\Phi$ and $\Psi$ special functions, it is possible to get a number of special useful expressions for $R_n$ which have appeared in the literature under different names.

4.4.2 Corollary. Under the hypothesis on $f$ given in Theorem 4.4.1, we have the following special cases:

(a) If $\forall t \in \,]a, b[$, $\Phi'(t) \ne 0$ we get the Schlömilch form of the remainder:

$$R_n(x, a) = \frac{\Phi(x) - \Phi(a)}{\Phi'(c)}\ \frac{f^{(n)}(c)}{(n - 1)!}\,(x - c)^{n-1}, \qquad a < c < x.$$

(b) The Roche form of the remainder:

$$R_n(x, a) = \frac{f^{(n)}(c)}{p\,(n - 1)!}\,(x - a)^p (x - c)^{n-p}, \qquad a < c < x.$$

(c) The Lagrange form of the remainder:

$$R_n(x, a) = \frac{f^{(n)}(c)}{n!}\,(x - a)^n, \qquad a < c < x.$$

(d) The Cauchy form of the remainder:

$$R_n(x, a) = \frac{f^{(n)}(c)}{(n - 1)!}\,(x - a)(x - c)^{n-1}, \qquad a < c < x.$$

Proof. To prove (a) choose $\Psi$ any nonzero constant and apply Theorem 4.4.1. To prove (b) set $\Phi(t) = (x - t)^p$, $1 \le p \le n$, in part (a). To prove (c) and (d), set $p = n$ and $p = 1$ in part (b), respectively.

REMARK: In Theorem 4.4.1 and Corollary 4.4.2 for the sake of convenience in the statements and proofs we have expanded about the left end point $a$. Clearly, everything will be equally valid if we expand around the right end point $b$. In the future we shall use this fact without comment even though we refer to Theorem 4.4.1 or Corollary 4.4.2.
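
The intermediate point $c$ is not the same in the different forms of the remainder, which a small computation makes visible. The sketch below is an illustration under stated assumptions: it takes $f = \exp$, $a = 0$, $x = 1$, $n = 3$, solves the Lagrange form for $c$ in closed form, and finds the $c$ of the Cauchy form by bisection, which is valid here because the Cauchy expression is decreasing in $c$ on $[0, x]$ when $0 < x \le n - 1$. All function names are hypothetical.

```python
import math

def remainder_exp(x, n):
    """R_n(x, 0) for exp: e^x minus the Taylor polynomial of degree n - 1."""
    return math.exp(x) - sum(x**k / math.factorial(k) for k in range(n))

def lagrange_c(x, n):
    """Solve e^c x^n / n! = R_n(x, 0) for c (needs x > 0)."""
    return math.log(remainder_exp(x, n) * math.factorial(n) / x**n)

def cauchy_form(c, x, n):
    """Cauchy expression e^c x (x - c)^(n-1) / (n-1)! with a = 0."""
    return math.exp(c) * x * (x - c) ** (n - 1) / math.factorial(n - 1)

def cauchy_c(x, n, tol=1e-12):
    """Bisection for the c of the Cauchy form; assumes 0 < x <= n - 1."""
    target = remainder_exp(x, n)
    lo, hi = 0.0, x
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cauchy_form(mid, x, n) > target:
            lo = mid  # expression is decreasing, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

print(lagrange_c(1.0, 3), cauchy_c(1.0, 3))
```

Both values lie strictly between $a = 0$ and $x = 1$, as the corollary requires, but they are different points.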

The Taylor remainder formulas give a very convenient method for deciding when a given infinitely differentiable function is the sum of a convergent infinite series. For example, $\forall n \in \mathbb{N}_0$ the $n$th derivative of the exponential function is again the exponential function. If we use the Lagrange form of the remainder we may write

$$e^x = \sum_{k=0}^{n-1} \frac{x^k}{k!} + \frac{e^c x^n}{n!},$$

where $c$ is a number between $x$ and 0 and depends on both $n$ and $x$. If $|x| \le b$, then

$$\left| \frac{e^c x^n}{n!} \right| \le \frac{e^b b^n}{n!}.$$

Since $b^n/n! \to 0$ (Exercise 6 of Section 1.7), we see that the remainder goes uniformly to zero in $[-b, b]$. Thus

$$e^x = \sum_{k=0}^{\infty} x^k/k!,$$

and the convergence of the series on the right is uniform on every compact set in $\mathbb{R}$.
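
The uniform bound on the remainder is easy to test numerically. The following sketch (hypothetical function names, floating-point arithmetic, so a tiny slack is allowed in the comparison) compares the actual error of the partial sums on $[-b, b]$ with the Lagrange bound $e^b b^n/n!$.

```python
import math

def taylor_exp(x, n):
    """Partial sum of the exponential series with terms k = 0, ..., n - 1."""
    return sum(x**k / math.factorial(k) for k in range(n))

def lagrange_bound(b, n):
    """Bound e^b b^n / n! on |R_n(x, 0)| valid for every |x| <= b."""
    return math.exp(b) * b**n / math.factorial(n)

b = 2.0
for n in (2, 5, 10, 15):
    worst = max(abs(math.exp(x) - taylor_exp(x, n)) for x in (-2.0, -1.0, 0.5, 2.0))
    # The observed error stays below the bound, and both shrink as n grows.
    print(n, worst, lagrange_bound(b, n))
```

The bound does not depend on $x$, which is precisely what makes the convergence uniform on $[-b, b]$.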

The proof we have given above to show that the exponential function is the sum of an infinite series leads immediately to a more general result: If a function $f$ is defined and infinitely differentiable in an open interval $I(a)$ about the point $a$, and if $(f^{(k)})$ when restricted to every compact subinterval of $I(a)$ is uniformly bounded, then

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x - a)^k,$$

and the convergence is uniform on every compact set in $I(a)$. Indeed, suppose $J = \{x : |x - a| \le b\} \subset I(a)$ and let $M = \sup\{|f^{(k)}(x)| : x \in J \ \&\ k \in \mathbb{N}_0\}$. Using the Lagrange form of the remainder again, we get for $x \in J$,

$$|R_n(x, a)| = \left| \frac{f^{(n)}(c)}{n!}(x - a)^n \right| \le M\,\frac{b^n}{n!}.$$

This estimate on $R_n$ gives the result.

Actually, it is a rather interesting fact that the conclusions we have obtained in the previous paragraph will follow from the considerably milder assumption that all the derivatives are uniformly bounded below (or above). This fact is due to Serge Bernstein. Before we prove Bernstein's theorem we shall prove a lemma. The development we give is taken from the book by W. Maak, An Introduction to Modern Calculus, Holt, Rinehart and Winston, Inc., New York, 1963.

4.4.3 Lemma. Suppose $f$ has domain $[a, b]$, is infinitely differentiable, and $\exists m \ge 0$ so that $\forall x$ in $[a, b]$ and $\forall n \in \mathbb{N}_0$,

$$-m \le f^{(n)}(x).$$

Then $\exists M$ so that $\forall x \in [a, b[$ and $\forall n \in \mathbb{N}_0$,

$$f^{(n)}(x) \le \frac{n!\,M}{(b - x)^n}.$$

Proof. Suppose, at first, that $m = 0$. Using the Lagrange form of the remainder, $\forall x \in [a, b[$, we get

$$f(b) = \sum_{k=0}^{n} \frac{f^{(k)}(x)}{k!}(b - x)^k + \frac{f^{(n+1)}(c)}{(n + 1)!}(b - x)^{n+1},$$

where $x < c < b$. Since all the terms on the right are nonnegative we must have

$$\frac{f^{(n)}(x)}{n!}(b - x)^n \le f(b).$$

In case $m > 0$, consider the auxiliary function

$$g(x) = f(x) + m e^{x - a}, \qquad x \in [a, b].$$

Then we have

$$g^{(n)}(x) = f^{(n)}(x) + m e^{x - a} \ge -m + m = 0.$$

Thus $\forall x \in [a, b[$,

$$f^{(n)}(x) \le g^{(n)}(x) \le n!\,g(b)/(b - x)^n.$$

4.4.4 Theorem (S. Bernstein). Suppose $f$ has domain $[a, b]$, is infinitely differentiable, and $\exists m$ so that $\forall x$ in $[a, b]$ and $\forall n \in \mathbb{N}_0$,

$$-m \le f^{(n)}(x).$$

If $\forall c \in [a, b[$, we set $I(c) = \{x : |x - c| < b - c\}$, then $\forall x \in I(c) \cap [a, b[$,

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(c)}{k!}(x - c)^k.$$

Indeed, the infinite series on the right converges uniformly in every compact set in $I(c)$.

Proof. By hypothesis $f^{(n)}(x)$ is uniformly bounded below, and by the last lemma $\exists M$ so that it is bounded above by $n!\,M/(b - x)^n$. Since $n!/(b - a)^n \to \infty$ as $n \to \infty$, and $\forall x \in [a, b[$, $n!/(b - x)^n \ge n!/(b - a)^n$, it follows that we may take $M$ so large that $\forall n \in \mathbb{N}_0$ and $\forall x \in [a, b[$, $m \le n!\,M/(b - x)^n$. Thus $\forall x \in [a, b[$ we have

$$|f^{(n)}(x)| \le \frac{n!\,M}{(b - x)^n}.$$

Let us use the Cauchy form of the remainder in Taylor's formula:

$$R_n(x, c) = \frac{f^{(n)}(\xi)}{(n - 1)!}(x - c)(x - \xi)^{n-1}.$$

Using the above estimate for $|f^{(n)}(\xi)|$, we get

$$|R_n(x, c)| \le nM\,\frac{|x - c|}{|b - \xi|}\,\left| \frac{x - \xi}{b - \xi} \right|^{n-1}.$$

If $c < x < b$, then $c < \xi < x < b$ and a simple computation shows that the derivative of the function of $\xi$ with values $(x - \xi)/(b - \xi)$ cannot vanish in the interval $]c, x[$. Hence this function of $\xi$, when restricted to $[c, x]$, must take its minimum and maximum value at the end points of this interval. Clearly the minimum is taken at $x$ and consequently the maximum is taken at $c$. We get the same result if $x < \xi < c$; that is, the maximum of $|x - \xi|/|b - \xi|$ is taken at $\xi = c$. If $c < d < b$, let us set

$$\rho = \frac{d - c}{b - c} < 1;$$

then for $|x - c| \le d - c$ we have

$$\frac{|x - \xi|}{|b - \xi|} \le \frac{|x - c|}{|b - c|} \le \rho.$$

Thus for $|x - c| \le d - c$, $x \in [a, b[$, and $\forall n \in \mathbb{N}$,

$$|R_n(x, c)| \le nM\,\frac{d - c}{b - d}\,\rho^{n-1}.$$

Since $\rho < 1$, $n\rho^{n-1} \to 0$ as $n \to \infty$, and hence $R_n(x, c) \to 0$ uniformly for $|x - c| \le d - c$. This proves the theorem, since the last statement of the theorem is obvious.

The last theorem gives a sufficient condition in order that a function

may be represented by a special kind of infinite series. Such functions

have a special name that we emphasize by a formal definition.

4.4.5 Definition. If $f$ is an infinitely differentiable function having an open domain whose values in some open interval about the point $a$ can be represented as

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k,$$

then $f$ is said to be analytic at $a$. The series is called the Taylor expansion of $f$ at $a$. If $f$ is analytic at every point of its domain it is called analytic.
We should point out that if a function is infinitely differentiable in the neighborhood of a point it does not necessarily mean that the function is analytic at the point. For example, the function given by

$$f(x) = \begin{cases} e^{-1/x^2}, & x \ne 0, \\ 0, & x = 0, \end{cases}$$

is infinitely differentiable and $\forall n \in \mathbf{N}_0$, $f^{(n)}(0) = 0$ (see Exercise 6 of Section 4.1). Hence, if $f$ were analytic at zero, it would of necessity have to have zero values at every point of some neighborhood of the origin, which of course it doesn't.
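A small numerical sketch may make the example concrete. It assumes the function is $e^{-1/x^2}$ away from zero (as read from the damaged display) and uses the hypothetical helper names `f` and `ratio`. It only illustrates that $f(x)/x^n$ becomes tiny as $x \to 0$ for a fixed $n$, which is consistent with every Taylor coefficient at zero vanishing while $f(x) > 0$ for $x \ne 0$; it is not a proof.

```python
import math

def f(x: float) -> float:
    """The flat function: exp(-1/x^2) for x != 0, and 0 at x = 0."""
    return 0.0 if x == 0.0 else math.exp(-1.0 / (x * x))

def ratio(x: float, n: int) -> float:
    """f(x) / x^n; for fixed n this tends to 0 as x -> 0."""
    return f(x) / x ** n

# f is strictly positive away from 0, yet it decays faster than any power of x.
for x in (0.5, 0.2, 0.1):
    print(x, f(x), ratio(x, 10))
```

Since the Taylor expansion of $f$ at zero is the zero series, it agrees with $f$ only at the origin.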


Exercises

1. Do Exercise 10 of Section 4.3 by using an appropriate form of Taylor's remainder formula and assuming that $f^{(2)}$ is continuous.

2. Let $f$ be the function with domain $]-1, 1[$ defined by

$$f(x) = \frac{1}{1-x}.$$

Show that $f$ is analytic at zero and that its Taylor expansion at zero converges uniformly on every compact subset of $]-1, 1[$.
3. Generalizing the considerations of Exercise 2, let $f_a$ be the function with domain $]-1, 1[$ defined by

$a \in \mathbf{R}$.

Show that $f_a$ is analytic at zero and its Taylor expansion at zero converges uniformly on every compact subset of $]-1, 1[$.
4. Show that the function $f$ with domain $]-1, 1[$ defined by

$$f(x) = \log(1 - x)$$

is analytic at zero and its Taylor expansion at zero converges uniformly on every compact subset of $]-1, 1[$.
5. We have shown in Section 4.3 that the exponential function with values $e^x$ is analytic at zero and its Taylor expansion converges at every point of $\mathbf{R}$. Hence we may write

$$e = \sum_{k=0}^{\infty} \frac{1}{k!} = \sum_{k=0}^{n} \frac{1}{k!} + R_{n+1}.$$

Show that $R_{n+1} < 1/(n!\,n)$. This estimate for $R_{n+1}$ shows that $e$ is irrational. Indeed, if $e$ is rational it can be written $e = p/q$, $p, q \in \mathbf{N}$ and

$$\sum_{k=0}^{q} \frac{1}{k!} < \frac{p}{q} < \sum_{k=0}^{q} \frac{1}{k!} + \frac{1}{q!\,q}.$$

Show that this leads to a contradiction.


6. Suppose that $f$ is $n$ times differentiable in $]a, b[$ and $\exists k < n$ and $c \in\, ]a, b[$ so that $f^{(j)}(c) = 0$ for $j \in (1, k)$. Using Taylor's formula state sufficient conditions so that $f$ will have either a relative maximum at $c$, or a relative minimum at $c$, or neither.
7. Suppose $f$ and $g$ have $(n - 1)$ continuous derivatives on $[a, b]$ and are $n$ times differentiable on $]a, b[$. Show that $\forall x \in\, ]a,b]$, $\exists c \in\, ]a, x[$ so that

$$\left[ f(x) - \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x-a)^k \right] g^{(n)}(c) = \left[ g(x) - \sum_{k=0}^{n-1} \frac{g^{(k)}(a)}{k!}(x-a)^k \right] f^{(n)}(c).$$

By choosing $g$ to be a suitable function, obtain the Taylor formula with Lagrange remainder.
8. Give an example of a function $f$ that is defined and infinitely differentiable in a neighborhood of zero so that in some interval $[0, a[$,

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!}\,x^k,$$

but the power series does not represent $f$ for $x < 0$.

9. Suppose $f$ is defined and infinitely differentiable in $]a, b[$. Show that a necessary and sufficient condition that $f$ is analytic at $c \in\, ]a, b[$ is that $\exists \rho > 0$, $\exists I(c)$, and $\exists M$ so that $\forall k \in \mathbf{N}_0$ and $\forall x \in I(c)$,

$$|f^{(k)}(x)| \le k!\,M/\rho^k.$$

10. Suppose $f$ is defined in an open interval around $a$ and is analytic at $a$. Show that $f$ is analytic in some open interval around $a$. (Hint: Use the result of Exercise 9.)

4.5

POWER SERIES

We have seen in the previous section that some (but not all) infinitely
differentiable functions may be analytic at a given point; that is, they
may be represented in a neighborhood of the given point by their Taylor

expansions. It therefore behooves us to study function series of the form

$$\sum_{k=0}^{\infty} \bigl(c_k (x-a)^k\bigr).$$

Such series are called power series about $a$ or Taylor series about $a$, and if $a = 0$ they are sometimes called Maclaurin series. Every power series about $a$ converges at $x = a$, but clearly it is of interest to ask for an effective criterion, in terms of the sequence $(c_k)$, which will give the values of $x$ for which the power series converges. A second natural question is the following: If a power series converges in an open interval about the point $a$, is it always the Taylor expansion, about $a$, of an analytic function?

An effective criterion to determine the values of $x$ for which a power series converges is given by the Cauchy root test. For every sequence $(c_k)$ let us set

$$r = \frac{1}{\limsup_{k \to \infty} |c_k|^{1/k}}. \tag{4.5.1}$$

We shall adopt the usual convention that $r = 0$ if the sequence $(|c_k|^{1/k})$ is unbounded, and shall take $r = \infty$ if the limit superior in (4.5.1) is zero.

4.5.1 Theorem (Cauchy-Hadamard). The power series

$$\sum_{k=0}^{\infty} \bigl(c_k (x-a)^k\bigr)$$

is uniformly absolutely convergent on any compact subset of the interval $]a - r, a + r[$ and diverges on the complement of $[a - r, a + r]$, where $r$ is given by (4.5.1).
Proof. From the Cauchy root test the power series will converge absolutely for every $x$ for which

$$\limsup_{k \to \infty} |c_k|^{1/k}\,|x - a| < 1$$

and will diverge for every $x$ for which

$$\limsup_{k \to \infty} |c_k|^{1/k}\,|x - a| > 1.$$

Indeed, the proof of the Cauchy root test shows that the absolute convergence is uniform on every compact subinterval of $]a - r, a + r[$. However, the uniform convergence is also easily established by means of the Weierstrass M Test. For, if $0 < s < r$, then $\sum_{k \ge 0} |c_k| s^k$ converges and hence $\sum_{k \ge 0} |c_k|\,|x - a|^k$ converges uniformly in the interval $[a - s, a + s]$.

The number $r$ of (4.5.1) is called the radius of convergence of the given power series and $]a - r, a + r[$ is called the interval of convergence.
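The root-test formula for the radius of convergence can be explored numerically. The sketch below is only a crude illustration: the helper name `root_test_radius` and the window `[start, stop)` are assumptions made here, and replacing the limit superior by a maximum over a finite tail gives an estimate, not the exact value.

```python
def root_test_radius(coeff, start=200, stop=400):
    """Crude estimate of r = 1 / limsup |c_k|^(1/k), using only the tail k in [start, stop)."""
    tail_sup = max(abs(coeff(k)) ** (1.0 / k) for k in range(start, stop))
    return float("inf") if tail_sup == 0.0 else 1.0 / tail_sup

# c_k = 2^k: the series is sum (2x)^k, whose true radius is 1/2.
print(root_test_radius(lambda k: 2.0 ** k))
# c_k = 1/k: true radius 1 (the finite-tail estimate is slightly above 1).
print(root_test_radius(lambda k: 1.0 / k))
# Coefficients that oscillate: the limit does not exist, but the limsup is 3, so the radius is 1/3.
print(root_test_radius(lambda k: 3.0 ** k if k % 2 == 0 else 1.0))
```

The third call shows why the limit superior, rather than a limit, appears in the formula: the sequence $|c_k|^{1/k}$ alternates between 3 and 1.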

The question as to whether a convergent power series is always the


Taylor expansion of an analytic function is answered by the next two
results.

4.5.2 Theorem. If the power series $\sum_{k \ge 0} (c_k(x - a)^k)$ has a nonzero radius of convergence $r$, then the function $f$ defined on $]a - r, a + r[$ by

$$f(x) = \sum_{k=0}^{\infty} c_k (x - a)^k$$

is infinitely differentiable and

$$f'(x) = \sum_{k=1}^{\infty} k c_k (x - a)^{k-1},$$

where the latter series also has the radius of convergence $r$.
Proof. Since (Exercise 10 of Section 1.9 and Exercise 5 of Section 2.4)

$$\limsup_{k \to \infty} |k c_k|^{1/k} = \limsup_{k \to \infty} |c_k|^{1/k},$$

it follows that both of the series above have the same radius of convergence.

If we set $f_k(x) = c_k (x-a)^k$, then $f_k$ is differentiable and by Theorem 4.5.1 and the first paragraph of the proof of this theorem it follows that $\sum_{k \ge 0} (f_k')$ is uniformly convergent on any compact subinterval of the interval of convergence. Thus we may apply Corollary 4.2.5 on termwise differentiation of a series to arrive at the differentiation formula of this theorem. The fact that the limit of the power series is infinitely differentiable follows by use of the axiom of induction.

4.5.3 Corollary. If the power series $\sum_{k \ge 0} (c_k(x - a)^k)$ has a nonzero radius of convergence $r$, and if

$$f(x) = \sum_{k=0}^{\infty} c_k (x-a)^k, \qquad x \in\, ]a - r, a + r[,$$

then $\forall n \in \mathbf{N}_0$,

$$c_n = f^{(n)}(a)/n!.$$

Proof. By the use of the previous theorem and the axiom of induction it is easy to establish that $\forall n \in \mathbf{N}_0$,

$$f^{(n)}(x) = \sum_{k=n}^{\infty} k(k-1) \cdots (k-n+1)\, c_k (x-a)^{k-n}.$$

Setting $x = a$ in both sides gives the formula for $c_n$.
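As a hedged numerical check of the termwise differentiation formula, the sketch below takes the geometric series (all $c_k = 1$, $a = 0$), for which $f(x) = 1/(1-x)$ and $f^{(n)}(x) = n!/(1-x)^{n+1}$. The function name `nth_derivative_series` and the truncation at 400 terms are choices made for this illustration only.

```python
from math import factorial, perm

def nth_derivative_series(x: float, n: int, terms: int = 400) -> float:
    """Partial sum of sum_{k>=n} k(k-1)...(k-n+1) * c_k * x^(k-n), with every c_k = 1."""
    # perm(k, n) equals k(k-1)...(k-n+1), the factor produced by n differentiations.
    return sum(perm(k, n) * x ** (k - n) for k in range(n, n + terms))

x = 0.3
for n in range(4):
    closed_form = factorial(n) / (1.0 - x) ** (n + 1)
    print(n, nth_derivative_series(x, n), closed_form)
```

Evaluating at $x = 0$ leaves only the term $k = n$, which is $n!\,c_n$; this is the content of the corollary.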

If we formally multiply two Maclaurin series together and collect terms all having the same powers of $x$, we get

$$\sum_{n=0}^{\infty} (a_n x^n) \cdot \sum_{n=0}^{\infty} (b_n x^n) = \sum_{n=0}^{\infty} (c_n x^n),$$

where

$$c_n = \sum_{k=0}^{n} a_k b_{n-k}.$$

The power series on the right is called the Cauchy product of the series on the left. The natural question to ask concerns the value of the radius of convergence of the Cauchy product in relation to the radii of convergence of the series that make up the product. We shall prove a theorem of this nature for series of constant terms that will immediately answer this question for power series.
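The coefficient formula for the Cauchy product is easy to try on truncated coefficient lists. The following sketch uses exact rational arithmetic; `cauchy_product` is a hypothetical helper name, and the choice of the exponential series as test data is an assumption made for illustration.

```python
from fractions import Fraction
from math import factorial

def cauchy_product(a, b):
    """Coefficients c_n = sum_{k=0}^{n} a_k * b_{n-k}, for n < min(len(a), len(b))."""
    size = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(size)]

# Maclaurin coefficients of e^x; the product with itself should give those of e^(2x).
exp_coeffs = [Fraction(1, factorial(n)) for n in range(8)]
product = cauchy_product(exp_coeffs, exp_coeffs)
expected = [Fraction(2 ** n, factorial(n)) for n in range(8)]
print(product == expected)
```

Only the first `min(len(a), len(b))` coefficients are returned, since later ones would need coefficients that were truncated away.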


4.5.4 Definition. If $(a, \sigma(a))$ and $(b, \sigma(b))$ are infinite series, then the Cauchy product of these series is the series $(c, \sigma(c))$, where

$$c_n = \sum_{k=0}^{n} a_k b_{n-k}. \tag{4.5.2}$$

The following theorem about Cauchy products is somewhat more


general than is needed to establish the facts about Cauchy products
of power series. The proof would be somewhat easier if we demanded
that both series be absolutely convergent.

4.5.5 Theorem (Mertens). If $(a, \sigma(a))$ is absolutely convergent and $(b, \sigma(b))$ is convergent, then their Cauchy product is convergent and its sum is

$$\left( \sum_{k=0}^{\infty} a_k \right) \left( \sum_{k=0}^{\infty} b_k \right).$$

Proof. Let us set $f(n,k) = a_k b_{n-k}$ for $k \in (0, n)$ and $n \in \mathbf{N}_0$. The domain of $f$ may be pictured as an infinite triangle of lattice points, a finite portion of which is shown in Fig. 4.5.1. If $c_n$ is given by (4.5.2), then we may write

$$\sum_{n=0}^{m} c_n = \sum_{n=0}^{m} \left[ \sum_{k=0}^{n} f(n,k) \right].$$

[Figure 4.5.1: the triangle of lattice points on which $f$ is defined.]

What we are doing is adding together all the values that $f$ takes on the triangle of lattice points shown in the figure. Note we are first adding the terms along the columns and then adding the resulting numbers. We get the same result if we first add the terms along the rows and then add the resulting numbers. Thus we get

$$\sum_{n=0}^{m} c_n = \sum_{k=0}^{m} \left[ \sum_{n=k}^{m} f(n,k) \right].$$

Of course, a formal proof of this interchange of summations can be easily carried out by induction, but we shall not do so. Now,

$$\sum_{n=k}^{m} f(n,k) = a_k \sum_{n=k}^{m} b_{n-k} = a_k\, \sigma(b)_{m-k},$$

and thus

$$\sum_{n=0}^{m} c_n = \sum_{k=0}^{m} a_k\, \sigma(b)_{m-k}.$$

Let us set $A = \lim_{n \to \infty} \sigma(a)_n$ and $B = \lim_{n \to \infty} \sigma(b)_n$. Then we may write

$$\sum_{n=0}^{m} c_n - AB = \sum_{k=0}^{m} a_k\, \sigma(b)_{m-k} - B \sum_{k=0}^{\infty} a_k = \sum_{k=0}^{m} a_k \bigl[ \sigma(b)_{m-k} - B \bigr] - B \sum_{k=m+1}^{\infty} a_k. \tag{4.5.3}$$

Let $M$ be an upper bound for both $\sigma(|a|)$ and $|\sigma(b)|$, which exists by virtue of the convergence of these sequences. Now, $\forall \varepsilon > 0$, $\exists m_0 \in \mathbf{N}$ so that the following hold:

$$m - k \ge m_0 \implies |\sigma(b)_{m-k} - B| < \varepsilon/(2M), \qquad \sum_{k=m_0+1}^{\infty} |a_k| < \varepsilon/(6M).$$

If we take $m \ge 2m_0$ and use the above estimates in the following inequality for the right side of (4.5.3) we have completed the proof:

$$\left| \sum_{k=0}^{m} a_k \bigl[ \sigma(b)_{m-k} - B \bigr] - B \sum_{k=m+1}^{\infty} a_k \right| \le \sum_{k=0}^{m_0} |a_k|\,|\sigma(b)_{m-k} - B| + \sum_{k=m_0+1}^{m} |a_k|\,|\sigma(b)_{m-k} - B| + |B| \sum_{k=m+1}^{\infty} |a_k|.$$

Note that $k \le m_0$ and $m \ge 2m_0 \implies m - k \ge m_0$. The three terms on the right are then at most $\varepsilon/2$, $2M \cdot \varepsilon/(6M) = \varepsilon/3$, and $M \cdot \varepsilon/(6M) = \varepsilon/6$, respectively.
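Mertens' theorem can be watched at work numerically. In the sketch below the two series are choices made for illustration, not taken from the text: $a_k = 2^{-k}$ is absolutely convergent with sum 2, and $b_k = (-1)^k/(k+1)$ is only conditionally convergent with sum $\log 2$. The helper name `cauchy_partial_sum` is hypothetical.

```python
import math

def cauchy_partial_sum(a, b, m):
    """Sum of c_0, ..., c_m, where c_n = sum_{k<=n} a(k) * b(n - k)."""
    return sum(a(k) * b(n - k) for n in range(m + 1) for k in range(n + 1))

a = lambda k: 0.5 ** k                 # absolutely convergent, sum A = 2
b = lambda k: (-1.0) ** k / (k + 1)    # conditionally convergent, sum B = log 2

target = 2.0 * math.log(2.0)           # the product A * B predicted by the theorem
for m in (10, 100, 400):
    print(m, cauchy_partial_sum(a, b, m), target)
```

The partial sums approach $AB$ slowly, which reflects the slow convergence of the conditionally convergent factor.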

4.5.6 Corollary. The radius of convergence of the Cauchy product of two power series about $a$ is at least as large as the smaller of the radii of the two component series.

Proof. Since a power series is absolutely convergent inside its interval of convergence, the corollary is an immediate result of the last theorem.
As we have shown, a power series converges inside its interval of convergence and if that open interval is not the null set, the power series converges to a function that is analytic. Now, the reader can easily show that nothing can be said about the convergence of a power series at the end points of its interval of convergence. In other words, examples of power series can be given which converge at both end points of the interval of convergence, at neither end point, or at only one end point.
There is another question about the end points of the interval of
convergence of a power series which is well illustrated by the following
example. It is very easy to show that for $|x| < 1$,

$$\log(1 + x) = \sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{x^n}{n},$$

and the interval of convergence of the series on the right is $]-1, 1[$. For $x = 1$ the series converges and the function defined by the quantity on the left has the value $\log 2$. The question is whether these two quantities are equal. The answer is given by the following theorem due to N. Abel.

4.5.7 Theorem (Abel). If $(a, \sigma(a))$ is a convergent series and

$$f(x) = \sum_{k=0}^{\infty} a_k x^k, \qquad |x| < 1,$$

then $f(1-)$ exists and

$$f(1-) = \sum_{k=0}^{\infty} a_k.$$

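Proof. Using Abel's summation formula (Lemma 3.2.5) we get

$$\sum_{k=0}^{n} a_k x^k = \sigma(a)_n x^{n+1} - \sum_{k=0}^{n} \sigma(a)_k \bigl[ x^{k+1} - x^k \bigr] = \sigma(a)_n x^{n+1} + (1-x) \sum_{k=0}^{n} \sigma(a)_k x^k.$$

Noting that $|x| < 1$ and $\sigma(a)$ is convergent, by letting $n \to \infty$ we get

$$f(x) = (1-x) \sum_{k=0}^{\infty} \sigma(a)_k x^k.$$

Let $A$ be the limit of $\sigma(a)$. Then, since for $|x| < 1$,

$$\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k,$$

we get

$$f(x) - A = (1-x) \sum_{k=0}^{\infty} \bigl[ \sigma(a)_k - A \bigr] x^k.$$

Now, $\forall \varepsilon > 0$, $\exists N$ so that $k \ge N \implies |\sigma(a)_k - A| < \varepsilon/2$. Thus for $n \ge N$ and $0 \le x < 1$,

$$(1-x) \sum_{k=n}^{\infty} |\sigma(a)_k - A|\, x^k \le \varepsilon/2.$$

Also, for fixed $n \ge N$,

$$(1-x) \sum_{k=0}^{n-1} \bigl[ \sigma(a)_k - A \bigr] x^k$$

is less than $\varepsilon/2$ in absolute value provided $x$ is close enough to 1. This constitutes the proof of Abel's theorem.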
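For the logarithm example that motivated the theorem, a short numerical sketch shows the values of the series for $x$ near 1 approaching the value of the series at $x = 1$. The function name `log_series` and the number of terms are assumptions of this illustration; at $x = 1$ the partial sum is only accurate to roughly the size of the first omitted term.

```python
import math

def log_series(x: float, terms: int = 200_000) -> float:
    """Partial sum of sum_{n>=1} (-1)^(n+1) * x^n / n."""
    total, power = 0.0, 1.0
    for n in range(1, terms + 1):
        power *= x                                  # power == x^n
        total += power / n if n % 2 == 1 else -power / n
    return total

for x in (0.9, 0.99, 0.999, 1.0):
    print(x, log_series(x), math.log(1.0 + x))
```

Both columns agree as $x \to 1-$, consistent with $\sum_{n \ge 1} (-1)^{n+1}/n = \log 2$.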
The converse of Abel's theorem is, in general, not true. That is to say, suppose

$$f(x) = \sum_{k=0}^{\infty} a_k x^k, \qquad |x| < 1,$$

where we are supposing the series is convergent for $|x| < 1$. If $f(1-)$ exists, it is not necessarily true that $\sum_{k \ge 0} (a_k)$ is convergent. For example,

$$\frac{1}{1+x} = \sum_{k=0}^{\infty} (-1)^k x^k, \qquad |x| < 1.$$

The left side goes to $1/2$ as $x \to 1$, but $\sum_{k \ge 0} ((-1)^k)$ is not convergent.

There are a number of "corrected" converses to Abel's theorem, and these are called Tauberian theorems. We shall prove the original theorem obtained by Tauber. We first need a lemma (see Exercise 13 of Section 1.7).

4.5.8 Lemma (Cesaro). If $(s_n)$ converges to $S$, then

$$c_n = \frac{1}{n+1} \sum_{k=0}^{n} s_k \to S.$$

Proof. We may write

$$c_n - S = \frac{1}{n+1} \sum_{k=0}^{n} [s_k - S].$$

Now $\forall \varepsilon > 0$, $\exists K$ so that $k \ge K \implies |s_k - S| < \varepsilon/2$. Fix $m \ge K$ and we get for $n \ge m$,

$$|c_n - S| \le \frac{1}{n+1} \sum_{k=0}^{m-1} |s_k - S| + \frac{1}{n+1} \sum_{k=m}^{n} |s_k - S|.$$

The last term on the right is less than $\varepsilon/2$ and the first term on the right is also less than $\varepsilon/2$ for all sufficiently large $n$. This proves the lemma.

4.5.9 Theorem (Tauber). If

$$f(x) = \sum_{k=0}^{\infty} a_k x^k, \qquad |x| < 1,$$

and if $k a_k \to 0$ as $k \to \infty$ and $f(1-)$ exists, then $\sum_{k \ge 0} (a_k)$ is convergent and, of course, $f(1-) = \sum_{k \ge 0} a_k$.

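Proof. If $|x| < 1$, we can write

$$\sum_{k=0}^{n} a_k - f(1-) = f(x) - f(1-) + \sum_{k=0}^{n} a_k (1 - x^k) - \sum_{k=n+1}^{\infty} a_k x^k. \tag{4.5.4}$$

Now, for $k \ge 1$,

$$1 - x^k = (1-x) \sum_{j=0}^{k-1} x^j,$$

and since $|x| < 1$ we get

$$1 - x^k \le k(1-x).$$

Using this in (4.5.4) we get, for $0 \le x < 1$,

$$\left| \sum_{k=0}^{n} a_k - f(1-) \right| \le |f(x) - f(1-)| + (1-x) \sum_{k=0}^{n} k|a_k| + \sum_{k=n+1}^{\infty} |a_k| x^k. \tag{4.5.5}$$

Let us set $x_n = 1 - 1/(n+1)$; then, by hypothesis,

$$|f(x_n) - f(1-)| \to 0 \quad \text{as } n \to \infty. \tag{4.5.6}$$

By Lemma 4.5.8 and the fact that $k|a_k| \to 0$, we get

$$(1 - x_n) \sum_{k=0}^{n} k|a_k| = \frac{1}{n+1} \sum_{k=0}^{n} k|a_k| \to 0. \tag{4.5.7}$$

Also, since $k a_k \to 0$, $\forall \varepsilon > 0$, $\exists N$ so that $k \ge N \implies |a_k| \le \varepsilon/k$ and thus for $n \ge N$,

$$\sum_{k=n+1}^{\infty} |a_k|\, x_n^k \le \frac{\varepsilon}{n+1} \sum_{k=n+1}^{\infty} x_n^k \le \frac{\varepsilon}{n+1} \cdot \frac{1}{1 - x_n} = \varepsilon. \tag{4.5.8}$$

If we use (4.5.6), (4.5.7), and (4.5.8) in (4.5.5) with $x_n$ replacing $x$, we have completed the proof.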


THE TRIGONOMETRIC FUNCTIONS

It is certainly true that the best approach to the understanding of the properties and uses of the trigonometric functions is through the intuition of geometry. However, once we understand what we are looking for, the demands of mathematical rigor require that we give precise definitions and proofs. One of the easiest ways to do this for the trigonometric functions is through the use of power series.


Let us pose the problem of finding two differentiable functions $s$ and $c$, each with domain $\mathbf{R}$ and which satisfy the equations

$$s'(x) = c(x), \qquad c'(x) = -s(x). \tag{4.5.9}$$

If there exist such functions, it is clear from these equations that they must be infinitely differentiable and moreover $\forall k \in \mathbf{N}_0$,

$$s^{(2k)}(x) = (-1)^k s(x), \qquad s^{(2k+1)}(x) = (-1)^k c(x),$$
$$c^{(2k)}(x) = (-1)^k c(x), \qquad c^{(2k+1)}(x) = (-1)^{k+1} s(x). \tag{4.5.10}$$

Since $s$ and $c$ are differentiable they are continuous and thus, from (4.5.10), for every compact set in $\mathbf{R}$ all the derivatives are uniformly bounded. It follows from Bernstein's theorem, 4.4.4, that these functions are analytic at every point of $\mathbf{R}$ and moreover they are represented on all of $\mathbf{R}$ by their Taylor expansions.

If we consider the Taylor expansions around the origin, from (4.5.10) we get

$$s(x) = s(0) \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k}}{(2k)!} + c(0) \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!},$$
$$c(x) = c(0) \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k}}{(2k)!} - s(0) \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!}. \tag{4.5.11}$$

Hence, if there exist functions that satisfy (4.5.9), they must be of the
form (4.5.11). From Theorem 4.5.2 we may differentiate the series on
the right termwise, and it is a simple exercise to establish that these
functions actually satisfy the differential equations (4.5.9).
The equations (4.5.11) show that the solutions to (4.5.9) are not
uniquely determined. However, once s(O) and c(O) are specified, they
are uniquely determined. By taking s(O) = 0 and c(O) =1, we obtain
the trigonometric functions sine and cosine:

$$\sin x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!}, \qquad \cos x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k}}{(2k)!}. \tag{4.5.12}$$
Let us now obtain the main properties of these trigonometric functions. Let us first get the addition formulas, from which it is then relatively easy to get the other essential properties. If for fixed $y \in \mathbf{R}$ we set $s_y(x) = \sin(x + y)$, $c_y(x) = \cos(x + y)$, then $s_y$ and $c_y$ satisfy the differential equations (4.5.9) and moreover $s_y(0) = \sin y$, and $c_y(0) = \cos y$. Hence from (4.5.11) and (4.5.12) we get

$$\sin(x + y) = \sin x \cos y + \cos x \sin y,$$
$$\cos(x + y) = \cos x \cos y - \sin x \sin y. \tag{4.5.13}$$

From (4.5.12) it is clear that sine is an odd function and cosine is an even function; that is, $\forall x \in \mathbf{R}$, $\sin(-x) = -\sin x$, and $\cos(-x) = \cos x$. If we use these facts, and in the second equation of (4.5.13) we take $y = -x$, we get the usual formula

$$\cos^2 x + \sin^2 x = 1, \qquad \forall x \in \mathbf{R}. \tag{4.5.14}$$

From this it is immediately clear that

$$|\cos x| \le 1, \qquad |\sin x| \le 1, \qquad \forall x \in \mathbf{R}. \tag{4.5.14'}$$

Also from (4.5.13) we get the usual double-angle formulas:

$$\sin 2x = 2 \sin x \cos x,$$
$$\cos 2x = \cos^2 x - \sin^2 x = 2\cos^2 x - 1 = 1 - 2\sin^2 x. \tag{4.5.15}$$

From the facts that cosine is continuous, $\cos 0 = 1$, and $D \sin x = \cos x$ it follows that sine is increasing in some neighborhood of the origin and indeed it is increasing in that interval around the origin for which $\cos x > 0$. Now,

$$\cos 2 = 1 - \frac{2^2}{2!} + \frac{2^4}{4!} - \sum_{k=0}^{\infty} \left[ \frac{2^{6+4k}}{(6+4k)!} - \frac{2^{8+4k}}{(8+4k)!} \right]$$

and since

$$\frac{2^{6+4k}}{(6+4k)!} > \frac{2^{8+4k}}{(8+4k)!}, \qquad k \ge 0,$$

it follows that $\cos 2 < 0$. Since a continuous real-valued function with an interval domain must take on all values between any two points of its range, there is a number in $]0, 2[$ at which the cosine takes on the value zero. Since the cosine is continuous, the set of points at which it takes on the zero value is closed and hence there is a smallest positive number at which it takes on the zero value. This number is clearly not zero. Tradition demands that the smallest positive number that makes cosine zero be labeled $\pi/2$.
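The definition of $\pi/2$ as a zero of the cosine series in $]0, 2[$ suggests a simple computation: bisect on the sign change between $\cos 0 = 1 > 0$ and $\cos 2 < 0$. This is a minimal sketch with hypothetical helper names; bisection only locates a zero in the interval, and the fact that it is the smallest positive zero rests on the argument in the text, not on the code.

```python
def cos_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_k (-1)^k x^(2k) / (2k)!."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

def zero_by_bisection(lo: float = 0.0, hi: float = 2.0, steps: int = 60) -> float:
    """Bisection on [lo, hi], assuming cos_series(lo) > 0 > cos_series(hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if cos_series(mid) > 0.0:
            lo = mid          # the sign change lies to the right of mid
        else:
            hi = mid          # the sign change lies at or to the left of mid
    return (lo + hi) / 2.0

half_pi = zero_by_bisection()
print(half_pi, 2.0 * half_pi)
```

Doubling the computed zero gives a floating-point approximation of $\pi$ obtained from the series alone.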
Because cosine is an even function, the restriction of sine to $[-\pi/2, \pi/2]$ is monotone increasing. From (4.5.14) and the fact that $\cos \pi/2 = \cos(-\pi/2) = 0$ we get that $\sin \pi/2 = 1$ and $\sin(-\pi/2) = -1$. Thus, from (4.5.14') and the continuity of sine, we know that its range is $[-1, 1]$.

From the first formula of (4.5.15) we get $\sin \pi = 0$ and from the second formula of the same number we get $\cos \pi = -\sin^2(\pi/2) = -1$. Repeating this process with $2\pi$ in place of $\pi$, we find that $\cos 2\pi = 1$, $\sin 2\pi = 0$. If we use these facts in conjunction with the addition formulas (4.5.13), we find that

$$\cos(x + 2\pi) = \cos x, \qquad \sin(x + 2\pi) = \sin x, \qquad \forall x \in \mathbf{R}. \tag{4.5.16}$$


If $f$ is a function with domain $\mathbf{R}$ and $p$ is a nonzero number so that $\forall x \in \mathbf{R}$, $f(x + p) = f(x)$, then $p$ is called a period for $f$, and $f$ is called periodic. If there is a smallest positive period for $f$, then it is called the period for $f$. We have just shown in (4.5.16) that $2\pi$ is a period for both sine and cosine. Thus both of these functions are periodic and we shall show that $2\pi$ is the period for both. Let us first show this for cosine. Since cosine is continuous, the set of its periods is either closed or else zero is an accumulation point. Since any integer times a period is again a period, in the latter case it would follow that the set of periods is dense in $\mathbf{R}$, and using the continuity of the cosine we find that $\forall x \in \mathbf{R}$, $\cos x = 1$. We have seen in a previous computation that this is not true. Thus let $p$ be the period of cosine. From (4.5.15) we get $1 = \cos p = 2\cos^2(p/2) - 1$, from which it follows that $\cos^2(p/2) = 1$ and hence $\sin(p/2) = 0$. Now, $\cos(p/2) = -1$, since otherwise from (4.5.13) we would find that $p/2$ is a period for cosine, contradicting the fact that $p$ is the period. Consequently, $-1 = \cos(p/2) = 2\cos^2(p/4) - 1$, or $\cos(p/4) = 0$. Since $\pi/2$ is the smallest positive number at which cosine is zero, it follows that $p \ge 2\pi$, and therefore, since $2\pi$ is a period, $p = 2\pi$. From (4.5.13) and the facts that $\cos(\pi/2) = 0$, $\sin(\pi/2) = 1$, it follows that

$$\cos x = \sin(x + \pi/2).$$

If $p$ is any positive period for sine, it follows that

$$\cos(x + p) = \sin(x + \pi/2 + p) = \sin(x + \pi/2) = \cos x.$$

Hence $p \ge 2\pi$, and we see that $2\pi$ is also the period for sine.

Since $\sin(x + \pi) = -\sin x$, it follows that sine, when restricted to $[\pi/2, 3\pi/2]$, is decreasing, and has only one zero in that interval at $\pi$. Since $\sin x > 0$ for $x \in\, ]0, \pi[$ and $D \cos x = -\sin x$, we conclude that cosine is decreasing in $[0, \pi]$, and, since it is even, is increasing in $[-\pi, 0]$. Using the periodicity of sine we see that $\forall k \in \mathbf{Z}$, the restriction of sine to $[(k - 1/2)\pi, (k + 1/2)\pi]$ has an inverse called the $k$th branch of the arc sine and whose values we shall denote by '$\operatorname{arc\,sin}_k x$.' For $k = 0$, the inverse is called the principal arc sine and its values are usually denoted by 'Arc sin $x$.' Note that the domain of each branch of arc sine is $[-1, 1]$ and the range of the $k$th branch is $[(k - 1/2)\pi, (k + 1/2)\pi]$. In a similar way, $\forall k \in \mathbf{Z}$ the restriction of cosine to $[k\pi, (k + 1)\pi]$ has an inverse which is called the $k$th branch of the arc cosine and its values are denoted by '$\operatorname{arc\,cos}_k x$.' Each branch of arc cosine has domain $[-1, 1]$ and the branch for $k = 0$ is called the principal branch of arc cosine and its values are denoted by 'Arc cos $x$.'

Let us compute $D \operatorname{arc\,sin}_k x$, wherever the derivative exists. Since sine restricted to $[(k - 1/2)\pi, (k + 1/2)\pi]$ is monotone, Theorem 2.3.6 tells us that the inverse is continuous. Hence, since $\forall x \in [-1, 1]$,

$$\sin(\operatorname{arc\,sin}_k x) = x,$$

Theorem 4.2.3 tells us that $D \operatorname{arc\,sin}_k x$ exists wherever $\cos(\operatorname{arc\,sin}_k x) \ne 0$ and moreover

$$D(\operatorname{arc\,sin}_k x) = \frac{1}{\cos(\operatorname{arc\,sin}_k x)}.$$

Now,

$$\cos^2(\operatorname{arc\,sin}_k x) + \sin^2(\operatorname{arc\,sin}_k x) = 1,$$

so that

$$\cos^2(\operatorname{arc\,sin}_k x) = 1 - x^2.$$

For $k$ odd, $\cos x$ is nonpositive for $x \in [(k - 1/2)\pi, (k + 1/2)\pi]$, and for $k$ even, $\cos x$ is nonnegative on this interval. Hence

$$D \operatorname{arc\,sin}_k x = \frac{(-1)^k}{\sqrt{1 - x^2}}, \qquad |x| < 1.$$

In a similar way we can show that

$$D \operatorname{arc\,cos}_k x = \frac{(-1)^{k+1}}{\sqrt{1 - x^2}}, \qquad |x| < 1.$$
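For the principal branches ($k = 0$) the two derivative formulas can be compared with a finite-difference quotient. The sketch assumes that Python's `math.asin` and `math.acos` return the principal branches, and the helper name `numerical_derivative` and step size are choices made here; a central difference is only an approximation of the derivative.

```python
import math

def numerical_derivative(g, x: float, h: float = 1e-6) -> float:
    """Central difference quotient (g(x+h) - g(x-h)) / (2h)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

for x in (-0.9, -0.3, 0.0, 0.5, 0.8):
    formula = 1.0 / math.sqrt(1.0 - x * x)      # (-1)^k / sqrt(1 - x^2) with k = 0
    print(x,
          numerical_derivative(math.asin, x) - formula,    # D Arc sin x
          numerical_derivative(math.acos, x) + formula)    # D Arc cos x has the opposite sign
```

Both printed differences should be close to zero for $|x| < 1$; near $x = \pm 1$ the derivative blows up and the difference quotient becomes unreliable.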

Once the sine and cosine have been defined, the other trigonometric
functions can be defined in terms of these. The most important other
trigonometric function is the tangent defined by the equation
$$\tan x = \frac{\sin x}{\cos x}, \qquad x \ne (2k + 1)\pi/2.$$

The tangent is monotone increasing in the open intervals $](2k - 1)\pi/2, (2k + 1)\pi/2[$, $k \in \mathbf{Z}$, and its range when restricted to any one of these intervals is $\mathbf{R}$. The inverse of the tangent restricted to the interval $](2k - 1)\pi/2, (2k + 1)\pi/2[$ is designated by '$\operatorname{arc\,tan}_k$,' and $\operatorname{arc\,tan}_0$ is called the principal branch of the arc tangent and is designated by 'Arc tan.'

Since

$$\tan(\operatorname{arc\,tan}_k x) = x,$$

and $D \tan x$ is never zero, it follows from Theorem 4.2.3 that $\operatorname{arc\,tan}_k x$ has a derivative

$$D \operatorname{arc\,tan}_k x = \frac{1}{D \tan(\operatorname{arc\,tan}_k x)}.$$

Now, from the definition of tangent,

$$D \tan y = \frac{1}{\cos^2 y}, \qquad y \ne (2k + 1)\pi/2.$$

Further, since $\cos^2 y + \sin^2 y = 1$, it follows that $1 + \tan^2 y = 1/\cos^2 y$ and thus

$$\frac{1}{\cos^2(\operatorname{arc\,tan}_k x)} = 1 + \tan^2(\operatorname{arc\,tan}_k x) = 1 + x^2.$$

Thus

$$D \operatorname{arc\,tan}_k x = \frac{1}{1 + x^2}.$$

Exercises

1. Suppose that $\forall x$ in $[a - \delta, a + \delta]$, $\delta > 0$,

$$\sum_{k=0}^{\infty} c_k (x-a)^k = 0.$$

Show that $\forall k \in \mathbf{N}_0$, $c_k = 0$.

2. Compute the radius of convergence of each of the following power series and test for convergence at the end points:


(a)

f (i ! k x
k=O

(b)

k .

oo

k=O
00

(c)
3.

k=l

x2k+1
2k+ I

(kkxk).

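Suppose that $(c_k)$ is a sequence with the property that $\exists \rho$ and $\exists M$ so that $\forall k \in \mathbf{N}_0$,

$$|c_k| \le M\,k!\,\rho^k.$$

Show that $\forall a \in \mathbf{R}$ there exists an infinitely differentiable function $f$ so that $a \in \mathcal{D}(f)$ and $\forall k \in \mathbf{N}_0$,

$$f^{(k)}(a) = c_k.$$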
4. Suppose that

$$f(x) = \sum_{k=0}^{\infty} c_k x^k,$$

where the power series has a nonzero radius of convergence. What is the function

$$g(x) = \sum_{k=0}^{\infty} k^3 c_k x^k?$$

5. Suppose a power series has the interval of convergence $]a - r, a + r[$, $r > 0$, and the series converges at $a + r$. Show that the series is uniformly convergent in $[a, a + r]$. (Hint: Take a close look at the proof of Abel's theorem, 4.5.7.)

6. Use Abel's theorem to prove the following: If $\sum_{k \ge 0} (a_k)$ and $\sum_{k \ge 0} (b_k)$ are convergent and their Cauchy product $\sum_{k \ge 0} (c_k)$ is convergent, then

$$\sum_{k=0}^{\infty} c_k = \left( \sum_{k=0}^{\infty} a_k \right) \left( \sum_{k=0}^{\infty} b_k \right).$$

7. Suppose that

$$f(x) = \sum_{k=0}^{\infty} a_k x^k, \qquad |x| < 1,$$

and $\forall k \in \mathbf{N}_0$, $a_k \ge 0$. If $\sum_{k \ge 0} (a_k)$ is divergent, show that $f$ is unbounded.

8. Prove the addition formula for cosine by means of Cauchy multiplication for power series.

4.6

THE WEIERSTRASS APPROXIMATION THEOREM

We gave an example at the end of Section 4.4 which showed that it is


not true that every infinitely differentiable function is analytic, that is,
can be represented by a convergent power series in the neighborhood
of a point. On the other hand, if a function is analytic at a point, then
on any compact interval inside the interval of convergence of the
power series there is a sequence of polynomials that converge uniformly
to the function, namely, the finite sections of the power series. Surprisingly, this latter property persists not only for infinitely differentiable functions but for merely continuous functions as well. Of course, for general continuous functions it will no longer be true that the approximating polynomials will be sections of the same power series.

The fact that a continuous function on a closed bounded interval


can be uniformly approximated by polynomials was discovered by
K. Weierstrass. The proof we shall give here is due to S. Bernstein. It

is probably one of the best constructive proofs of this theorem in the


sense that the approximating polynomials can actually be constructed
and their rate of convergence estimated. Another famous proof of
this theorem is associated with the name L. Fejér. In Section 6.7 we shall
give another proof due to M. H. Stone. That proof is a pure existence
proof and does not give a ready method for constructing the approximating
polynomials or of estimating their rate of convergence. On the
other hand, it has the advantage that it is a proof which can be adapted
to very general situations.


We shall begin by proving a theorem of a somewhat more general


nature than the Weierstrass theorem. It very often happens that by
generalizing a problem some of the obscuring details of the special
case are removed, and thus we can see much more clearly what is involved.
We feel that this is the case here, although we shall not give as
general a theorem as is possible, since this might have the effect of again
obscuring the problem.

4.6.1 Theorem. Suppose $(A_n)$ is a sequence of functions each having domain $\mathbf{N}_0 \times [0, 1]$ and satisfying the following conditions:

(a) $A_n(k,x) \ge 0$ and $\forall k > n$, $A_n(k,x) = 0$.

(b) $\forall \delta > 0$, $\sum_{k \in K_\delta(x)} A_n(k,x) \to 0$, uniformly on $[0, 1]$, as $n \to \infty$, where $K_\delta(x) = \{k : |x - k/n| \ge \delta\}$.

(c) $\forall x \in [0, 1]$, $\sum_{k=0}^{n} A_n(k,x) = 1$.

If $f$ is a continuous function with domain $[0, 1]$ and if

$$B_n(f,x) = \sum_{k=0}^{n} f(k/n) A_n(k,x),$$

then $B_n(f,x) \to f(x)$ uniformly in $x$ as $n \to \infty$.

Proof. From condition (c) it follows that

$$f(x) = \sum_{k=0}^{n} f(x) A_n(k,x).$$

Hence we may write

$$B_n(f,x) - f(x) = \sum_{k=0}^{n} [f(k/n) - f(x)] A_n(k,x).$$

If we use the fact that $A_n(k,x) \ge 0$ we find that

$$|B_n(f,x) - f(x)| \le \sum_{k=0}^{n} |f(k/n) - f(x)|\, A_n(k,x).$$

Since $f$ is uniformly continuous on $[0, 1]$, it is bounded, say by $M$, and moreover $\forall \varepsilon > 0$, $\exists \delta > 0$, so that $|x - k/n| < \delta \implies |f(k/n) - f(x)| < \varepsilon/2$. Thus we write

$$|B_n(f,x) - f(x)| \le \sum_{|x - k/n| < \delta} |f(k/n) - f(x)|\, A_n(k,x) + \sum_{|x - k/n| \ge \delta} |f(k/n) - f(x)|\, A_n(k,x). \tag{4.6.1}$$

The first term on the right is less than

$$\frac{\varepsilon}{2} \sum_{|x - k/n| < \delta} A_n(k,x) \le \frac{\varepsilon}{2} \sum_{k=0}^{n} A_n(k,x) = \frac{\varepsilon}{2}.$$

The second term on the right in (4.6.1) is less than or equal to

$$2M \sum_{|x - k/n| \ge \delta} A_n(k,x).$$

By condition (b) $\forall \varepsilon$, $\exists N$ so that $n \ge N$ implies this last number is less than $\varepsilon/2$, uniformly in $x$. Thus using these facts in (4.6.1) we find that $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \implies$

$$|B_n(f,x) - f(x)| < \varepsilon.$$

This completes the proof.


The reader who continues his studies in analysis will find that sequences of functions $(A_n)$ which satisfy hypotheses like those in the previous theorem arise again and again. Such sequences fit under the generic name "approximate identity." The reason for this name will become clearer to those readers who investigate the theory of Banach algebras.

The Bernstein proof of the Weierstrass theorem is obtained by choosing the sequence $(A_n)$ in a special way. Our next lemma is devoted to proving that the special sequence we choose satisfies the hypotheses of the previous theorem.

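4.6.2 Lemma. For every $n \in \mathbf{N}_0$ let $A_n$ be the function on $\mathbf{N}_0 \times [0, 1]$ defined by

$$A_n(k,x) = \binom{n}{k} x^k (1-x)^{n-k},$$

where $\binom{n}{k}$ is the binomial coefficient given by

$$\binom{n}{k} = \begin{cases} \dfrac{n!}{(n-k)!\,k!} & \Longleftrightarrow\ k \le n, \\[2mm] 0 & \Longleftrightarrow\ k > n. \end{cases}$$

The sequence $(A_n)$ satisfies the hypotheses of Theorem 4.6.1.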

Proof. Condition (a) is clearly satisfied and requires no further comment. Condition (c) follows from the binomial theorem:

$$1 = (x + 1 - x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k (1-x)^{n-k}.$$

It therefore remains to prove the crucial condition (b). This will require some computations.

From the binomial theorem, $\forall n \in \mathbf{N}_0$ and $\forall x, y \in \mathbf{R}$, we have

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.$$

If we fix $y$ and differentiate both sides with respect to $x$ we get

$$n(x + y)^{n-1} = \sum_{k=1}^{n} k \binom{n}{k} x^{k-1} y^{n-k}. \tag{4.6.2}$$

Multiply both sides by $x$ and set $y = 1 - x$ to get

$$nx = \sum_{k=0}^{n} k \binom{n}{k} x^k (1-x)^{n-k}. \tag{4.6.3}$$

Now keeping $y$ fixed again and differentiating both sides of (4.6.2) with respect to $x$ we get

$$n(n-1)(x + y)^{n-2} = \sum_{k=2}^{n} k(k-1) \binom{n}{k} x^{k-2} y^{n-k}.$$

Multiply both sides by $x^2$ and set $y = 1 - x$ to get

$$n(n-1)x^2 = \sum_{k=0}^{n} k(k-1) \binom{n}{k} x^k (1-x)^{n-k} = \sum_{k=0}^{n} k^2 \binom{n}{k} x^k (1-x)^{n-k} - nx. \tag{4.6.4}$$

From the fact that $(k - nx)^2 = k^2 - 2knx + n^2x^2$, we get from (4.6.3) and (4.6.4)

$$\sum_{k=0}^{n} (nx - k)^2 \binom{n}{k} x^k (1-x)^{n-k} = nx(1 - x).$$

Supposing $n > 0$, and dividing both sides by $n^2$ we arrive at the formula

$$\sum_{k=0}^{n} \left( x - \frac{k}{n} \right)^2 \binom{n}{k} x^k (1-x)^{n-k} = \frac{x(1-x)}{n}. \tag{4.6.5}$$

Using (4.6.5) we get

$$\delta^2 \sum_{|x - k/n| \ge \delta} \binom{n}{k} x^k (1-x)^{n-k} \le \sum_{|x - k/n| \ge \delta} \binom{n}{k} \left( x - \frac{k}{n} \right)^2 x^k (1-x)^{n-k} \le \frac{x(1-x)}{n} \le \frac{1}{4n}.$$

This shows that $\forall \delta > 0$, the sum on the left goes to zero uniformly in $x$ as $n \to \infty$, which completes the proof of the lemma.
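The key identity (4.6.5) can be checked exactly at rational sample points. The sketch below uses Python's `fractions` module so that no rounding is involved; the function name `second_moment` and the sample values are assumptions made for this illustration, and a check at finitely many points is of course not a proof.

```python
from fractions import Fraction
from math import comb

def second_moment(n: int, x: Fraction) -> Fraction:
    """sum_{k=0}^{n} (x - k/n)^2 * C(n,k) * x^k * (1-x)^(n-k), computed exactly."""
    return sum(
        (x - Fraction(k, n)) ** 2 * comb(n, k) * x ** k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

# Compare with the closed form x(1-x)/n from (4.6.5) and with the uniform bound 1/(4n).
for n in (1, 5, 20):
    for x in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
        value = second_moment(n, x)
        print(n, x, value, value == x * (1 - x) / n, value <= Fraction(1, 4 * n))
```

The bound $1/(4n)$ is attained at $x = 1/2$, which is why the estimate in condition (b) is uniform in $x$.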

4.6.3 Theorem (Weierstrass). If $f$ is continuous with compact domain $[a, b]$, then $\forall \varepsilon > 0$ there exists a polynomial $p$ so that $\forall x \in [a,b]$, $|f(x) - p(x)| < \varepsilon$.

Proof. Suppose at first that $[a, b] = [0, 1]$. Then by Lemma 4.6.2 and Theorem 4.6.1, the sequence of polynomials given by

$$B_n(f, x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1-x)^{n-k}$$

converges uniformly to $f$.

In the general case, set $g(y) = f((b - a)y + a)$ for $y \in [0, 1]$. As $y$ ranges over $[0, 1]$, $x = (b - a)y + a$ ranges over $[a, b]$. Now,

$$B_n(g, y) = \sum_{k=0}^{n} g(k/n) \binom{n}{k} \left( \frac{x - a}{b - a} \right)^k \left( \frac{b - x}{b - a} \right)^{n-k}.$$

The right side is a polynomial in $x$ and thus $f$ is approached uniformly on $[a, b]$ by polynomials. This concludes the proof.

REMARK: For continuous functions $f$ on $[0, 1]$, the polynomials with values $B_n(f, x)$ formed by using the special functions of Lemma 4.6.2 are called the Bernstein polynomials for $f$.
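Because the Bernstein polynomials are given by an explicit formula, the approximation can be tried directly. The following is a minimal sketch: the test function $|x - 1/2|$, the grid used as a stand-in for the supremum, and the names `bernstein` and `sup_error` are all choices made here for illustration.

```python
from math import comb

def bernstein(f, n: int, x: float) -> float:
    """B_n(f, x) = sum_{k=0}^{n} f(k/n) * C(n,k) * x^k * (1-x)^(n-k), for x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(n + 1))

def sup_error(f, n: int, grid: int = 200) -> float:
    """Largest |B_n(f, x) - f(x)| over an evenly spaced grid (a proxy for the sup norm)."""
    return max(abs(bernstein(f, n, j / grid) - f(j / grid)) for j in range(grid + 1))

f = lambda x: abs(x - 0.5)   # continuous on [0, 1] but not differentiable at 1/2
for n in (10, 40, 160):
    print(n, sup_error(f, n))
```

The error decreases as $n$ grows, though slowly for this function; the convergence is uniform but Bernstein polynomials are not fast approximants near the corner at $1/2$.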

CHAPTER 5 | INTEGRATION

5.1

RIEMANN-DARBOUX INTEGRALS

We now come to the operation of integration, which is, broadly speaking, the inverse of the operation of differentiation. We shall suppose that the reader has already obtained, in his studies of the elementary calculus, the intuitive geometric conception of a Riemann integral as an area. Hence we shall forego a discussion of this aspect of the subject and proceed immediately to the formal aspects.

5.1.1 Definition. A decomposition $\Delta$ of a closed interval $[a, b]$ is a finite set $\{I_k : k \in (1, n)\}$ of closed nonvoid intervals such that any two intervals of this set have at most one point in common and

$$[a, b] = \bigcup \{I_k : k \in (1, n)\}.$$

If $I_k = [a_k, b_k]$, we shall put $|I_k| = b_k - a_k$, and

$$|\Delta| = \max \{|I_k| : k \in (1, n)\}.$$


5.1.2 Definition. A decomposition $\Delta^*$ is called a refinement of the decomposition $\Delta$ (in symbols $\Delta^* \succ \Delta$) if each interval in $\Delta^*$ is contained in an interval in $\Delta$.
If $\Delta_1$ and $\Delta_2$ are decompositions of the same interval, the common refinement of $\Delta_1$ and $\Delta_2$ is the set of all nonvoid intervals each of which is the intersection of an interval of $\Delta_1$ with an interval of $\Delta_2$.

We shall leave as an exercise the fact that if $\Delta^*$ is a refinement of $\Delta$, then each interval in $\Delta$ is a union of intervals in $\Delta^*$. We also think it is clear that the common refinement of two decompositions is actually a refinement of each.

Let us now turn to the problem of defining Riemann and Darboux sums and integrals.

5.1.3 Definition. Let $f$ be a real-valued function with domain the interval $[a, b]$. Let $R_f$ be the real-valued function with domain the collection of ordered pairs $(\Delta, \{x_k\})$, where $\Delta = \{I_k : k \in (1, n)\}$ is a decomposition of $[a, b]$ and $x_k \in I_k$, and defined by

$$R_f(\Delta, \{x_k\}) = \sum_{k=1}^{n} f(x_k)\,|I_k|.$$

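The function $R_f$ is called the Riemann sum function for $f$ and any number in its range is called a Riemann sum for $f$.
The function $R_f$ is said to have the limit $R(f)$ $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon \Rightarrow |R_f(\Delta, \{x_k\}) - R(f)| < \varepsilon$. In case $R_f$ has a limit, we say that $f$ is Riemann integrable and the limit $R(f)$ is the Riemann integral of $f$.

Note that in case $R_f$ has a limit $R(f)$, we are justified in calling $R(f)$ the limit of $R_f$ since it is unique. Indeed, suppose $R_1(f)$ is also a limit of $R_f$. Then $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ and $\exists \Delta_{1\varepsilon}$ so that if $\Delta \succ \Delta_\varepsilon$ and $\Delta \succ \Delta_{1\varepsilon}$, then

$$|R_f(\Delta, \{x_k\}) - R(f)| < \varepsilon/2, \qquad |R_f(\Delta, \{x_k\}) - R_1(f)| < \varepsilon/2.$$

Let $\Delta^*_\varepsilon$ be the common refinement of $\Delta_\varepsilon$ and $\Delta_{1\varepsilon}$. Then $\Delta \succ \Delta^*_\varepsilon \Rightarrow \Delta \succ \Delta_\varepsilon$ and $\Delta \succ \Delta_{1\varepsilon}$. Hence $\Delta \succ \Delta^*_\varepsilon \Rightarrow$

$$|R(f) - R_1(f)| \le |R(f) - R_f(\Delta, \{x_k\})| + |R_f(\Delta, \{x_k\}) - R_1(f)| < \varepsilon.$$

Since this is true $\forall \varepsilon > 0$, we have $R(f) = R_1(f)$.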

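5.1.4 Definition. Let $f$ be a real-valued bounded function with domain $[a, b]$ and let $\overline{D}_f$ and $\underline{D}_f$ be real-valued functions with domain the set of decompositions of $[a, b]$ so that if $\Delta = \{I_k : k \in (1, n)\}$,

$$\overline{D}_f(\Delta) = \sum_{k=1}^{n} M_k |I_k|, \qquad M_k = \sup \{f(x) : x \in I_k\},$$

$$\underline{D}_f(\Delta) = \sum_{k=1}^{n} m_k |I_k|, \qquad m_k = \inf \{f(x) : x \in I_k\}.$$

The functions $\overline{D}_f$ and $\underline{D}_f$ are called the upper and lower Darboux sum functions for $f$, respectively. The numbers $\overline{D}_f(\Delta)$ and $\underline{D}_f(\Delta)$ are called upper Darboux and lower Darboux sums for $f$, respectively.
Set

$$\overline{D}(f) = \inf \{\overline{D}_f(\Delta)\}, \qquad \underline{D}(f) = \sup \{\underline{D}_f(\Delta)\},$$

where $\Delta$ ranges over all decompositions of $[a, b]$, and call these numbers the upper and lower Darboux integrals of $f$, respectively. In case $\overline{D}(f) = \underline{D}(f) = D(f)$, we say that $f$ is Darboux integrable and call $D(f)$ the Darboux integral of $f$.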

The fundamental theorem of this section is the following.

5.1.5 Theorem. The Riemann integral of $f$ exists if and only if the Darboux integral of $f$ exists, and if they exist, then $R(f) = D(f)$.

From this theorem we are justified in denoting the common value of $R(f)$ and $D(f)$, if they exist, by one symbol. The standard symbol is

$$\int_a^b f(x)\,dx,$$

and we shall call this the Riemann-Darboux integral. Before we prove Theorem 5.1.5 it is necessary to establish the following lemma.
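5.1.6 Lemma. If $f$ is a bounded function defined on $[a, b]$ and $\Delta^*$ is a refinement of $\Delta$, then

$$\underline{D}_f(\Delta) \le \underline{D}_f(\Delta^*) \le \overline{D}_f(\Delta^*) \le \overline{D}_f(\Delta).$$

Proof. By definition

$$\underline{D}_f(\Delta) = \sum_{k=1}^{n} m_k |I_k|, \qquad m_k = \inf \{f(x) : x \in I_k\}.$$

For each $I_k \in \Delta$, let

$$\Delta_k = \{I^* : I^* \in \Delta^* \ \&\ I^* \subset I_k\}.$$

Since $\Delta^* \succ \Delta$, it follows that

$$I_k = \bigcup \{I^* : I^* \in \Delta_k\}$$

and hence (Exercise 2 at the end of this section)

$$|I_k| = \sum_{I^* \in \Delta_k} |I^*|.$$

If $m^*(I^*) = \inf \{f(x) : x \in I^*\}$ and $I^* \subset I_k$, it is clear that $m_k \le m^*(I^*)$. Thus we get

$$\sum_{k=1}^{n} m_k |I_k| \le \sum_{k=1}^{n} \sum_{I^* \in \Delta_k} m^*(I^*)\,|I^*|.$$

Since $\Delta^* = \{I^* : I^* \in \Delta_k \ \&\ k \in (1, n)\}$, it follows that the right side of the above inequality is precisely $\underline{D}_f(\Delta^*)$. This proves the left-hand inequality of Lemma 5.1.6. The right-hand inequality follows by similar reasoning, and the middle inequality is obvious.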
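The chain of inequalities in Lemma 5.1.6 can be observed on a small example. This is a sketch under a simplifying assumption that is not part of the lemma: the function is taken nondecreasing, so that the infimum and supremum on each interval are the values at its end points. The name `darboux_sums` and the two decompositions are illustrative.

```python
def darboux_sums(f, points):
    """Lower and upper Darboux sums for the decomposition whose intervals
    are [points[k], points[k+1]].
    Assumption: f is nondecreasing, so inf and sup over each interval are
    attained at the left and right end points."""
    pairs = list(zip(points, points[1:]))
    lower = sum(f(p) * (q - p) for p, q in pairs)
    upper = sum(f(q) * (q - p) for p, q in pairs)
    return lower, upper

f = lambda x: x * x                     # nondecreasing on [0, 1]
coarse = [0.0, 0.5, 1.0]                # a decomposition of [0, 1]
fine = [0.0, 0.25, 0.5, 0.75, 1.0]      # a refinement of it

lo1, up1 = darboux_sums(f, coarse)
lo2, up2 = darboux_sums(f, fine)
print(lo1, lo2, up2, up1)               # 0.125 0.21875 0.46875 0.625
```

Passing to the refinement raises the lower sum and lowers the upper sum, and both pairs bracket the value $1/3$ of the integral.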

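Proof of Theorem 5.1.5. Suppose that the Riemann integral of $f$ exists. $f$ must be bounded (Exercise 4 of Section 5.1) and $\forall \varepsilon > 0$ there is a decomposition $\Delta$ of $[a, b]$ such that

$$-\varepsilon/2 < R_f(\Delta, \{x_k\}) - R(f) < \varepsilon/2. \tag{5.1.1}$$

We shall suppose that $a < b$, since otherwise the fact that $D(f)$ exists and is equal to $R(f)$ is trivial. If we choose $x_k \in I_k$ so that $M_k - f(x_k) < \varepsilon/2(b - a)$, then

$$\overline{D}_f(\Delta) - R_f(\Delta, \{x_k\}) = \sum_{k=1}^{n} (M_k - f(x_k))\,|I_k| < \varepsilon/2. \tag{5.1.2}$$

On the other hand, if we choose $x_k \in I_k$ so that $f(x_k) - m_k < \varepsilon/2(b - a)$, we get

$$R_f(\Delta, \{x_k\}) - \underline{D}_f(\Delta) = \sum_{k=1}^{n} (f(x_k) - m_k)\,|I_k| < \varepsilon/2. \tag{5.1.3}$$

From (5.1.1) and (5.1.2) and the definition of $\overline{D}(f)$ we get

$$\overline{D}(f) - R(f) \le \overline{D}_f(\Delta) - R(f) < \varepsilon, \tag{5.1.4}$$

and from (5.1.1) and (5.1.3) we get

$$R(f) - \underline{D}(f) \le R(f) - \underline{D}_f(\Delta) < \varepsilon, \tag{5.1.5}$$

which implies $\underline{D}(f) = \overline{D}(f) = D(f)$. Replacing $\overline{D}(f)$ and $\underline{D}(f)$ by $D(f)$ in (5.1.4) and (5.1.5), respectively, leads to the conclusion that $D(f) = R(f)$.

Conversely, suppose that the Darboux integral of $f$ exists. For every $\varepsilon > 0$ there are decompositions $\Delta_1$ and $\Delta_2$ so that $D(f) - \underline{D}_f(\Delta_1) < \varepsilon$ and $\overline{D}_f(\Delta_2) - D(f) < \varepsilon$. If $\Delta_\varepsilon$ is the common refinement of $\Delta_1$ and $\Delta_2$, then Lemma 5.1.6 gives

$$D(f) - \underline{D}_f(\Delta_\varepsilon) < \varepsilon, \qquad \overline{D}_f(\Delta_\varepsilon) - D(f) < \varepsilon. \tag{5.1.6}$$

If $\Delta \succ \Delta_\varepsilon$, then by Lemma 5.1.6 and (5.1.6),

$$-\varepsilon < \underline{D}_f(\Delta_\varepsilon) - D(f) \le \underline{D}_f(\Delta) - D(f) \le R_f(\Delta, \{x_k\}) - D(f) \le \overline{D}_f(\Delta) - D(f) \le \overline{D}_f(\Delta_\varepsilon) - D(f) < \varepsilon.$$

This shows that the Riemann integral of $f$ exists and $R(f) = D(f)$, which completes the proof of the theorem.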
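5.1.7 Theorem. The Riemann-Darboux integral of the function $f$ exists $\Leftrightarrow$ the function $R_f$ is Cauchy in the sense that $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon$ and $\Delta' \succ \Delta_\varepsilon \Rightarrow$

$$|R_f(\Delta, \{x_k\}) - R_f(\Delta', \{x'_k\})| < \varepsilon.$$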

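Proof. If the Riemann-Darboux integral of $f$ exists, then $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon$ and $\Delta' \succ \Delta_\varepsilon \Rightarrow |R_f(\Delta, \{x_k\}) - R(f)| < \varepsilon/2$, $|R_f(\Delta', \{x'_k\}) - R(f)| < \varepsilon/2$. Hence, by the triangle inequality,

$$|R_f(\Delta, \{x_k\}) - R_f(\Delta', \{x'_k\})| < \varepsilon.$$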


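Conversely, suppose $R_f$ is Cauchy. The method of obtaining a limit is a variation on the theme of the proof of Proposition 2.4.4. There exists a decomposition $\Delta_0$ so that $\Delta \succ \Delta_0 \Rightarrow$

$$|R_f(\Delta, \{x_k\}) - R_f(\Delta_0, \{x_{0k}\})| < 1.$$

Hence $\Delta \succ \Delta_0 \Rightarrow |R_f(\Delta, \{x_k\})|$ is bounded by $1 + |R_f(\Delta_0, \{x_{0k}\})|$. For every $\Delta \succ \Delta_0$ let us set

$$\overline{\omega}(\Delta) = \sup \{R_f(\Delta', \{x'_k\}) : \Delta' \succ \Delta\}, \qquad \overline{\lim}\, R_f = \inf \{\overline{\omega}(\Delta) : \Delta \succ \Delta_0\}.$$

We claim that $\overline{\lim}\, R_f$ is the Riemann integral of $f$. First, note that $\Delta^* \succ \Delta \Rightarrow \overline{\omega}(\Delta^*) \le \overline{\omega}(\Delta)$. Next, $\forall \varepsilon > 0$, $\exists \Delta_1 \succ \Delta_0$ so that

$$0 \le \overline{\omega}(\Delta_1) - \overline{\lim}\, R_f < \varepsilon. \tag{5.1.7}$$

Also, $\exists \Delta_2 \succ \Delta_0$ so that $\Delta, \Delta^* \succ \Delta_2 \Rightarrow$

$$|R_f(\Delta, \{x_k\}) - R_f(\Delta^*, \{x^*_k\})| < \varepsilon/2,$$

and $\exists \Delta' \succ \Delta_2$ so that

$$0 \le \overline{\omega}(\Delta_2) - R_f(\Delta', \{x'_k\}) < \varepsilon/2. \tag{5.1.8}$$

Since $\Delta' \succ \Delta_2$, it follows that $\forall \Delta \succ \Delta_2$ we get

$$|R_f(\Delta, \{x_k\}) - R_f(\Delta', \{x'_k\})| < \varepsilon/2. \tag{5.1.9}$$

The inequalities (5.1.8) and (5.1.9) show that $\forall \Delta \succ \Delta_2$,

$$0 \le \overline{\omega}(\Delta_2) - R_f(\Delta, \{x_k\}) < \varepsilon. \tag{5.1.10}$$

Let $\Delta_\varepsilon$ be the common refinement of $\Delta_1$ and $\Delta_2$. Then $\forall \Delta \succ \Delta_\varepsilon$ we get from (5.1.7) and (5.1.10) and the monotone character of $\overline{\omega}$ that

$$0 \le \overline{\omega}(\Delta) - \overline{\lim}\, R_f < \varepsilon, \qquad 0 \le \overline{\omega}(\Delta) - R_f(\Delta, \{x_k\}) < \varepsilon.$$

From these inequalities it is immediate that $\Delta \succ \Delta_\varepsilon \Rightarrow$

$$|R_f(\Delta, \{x_k\}) - \overline{\lim}\, R_f| < \varepsilon.$$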

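5.1.8 Corollary. Suppose $[c, d] \subset [a, b]$, $f$ is defined on $[a, b]$, and $g$ is the restriction of $f$ to $[c, d]$. If $f$ is Riemann-Darboux integrable, then so is $g$.

Proof. Let $\Delta_0$ be the decomposition of $[a, b]$ that consists of the intervals $[a, c]$, $[c, d]$, and $[d, b]$. For every $\varepsilon > 0$, let $\Delta_\varepsilon$ be a refinement of $\Delta_0$ so that $\Delta \succ \Delta_\varepsilon$ and $\Delta' \succ \Delta_\varepsilon \Rightarrow$

$$|R_f(\Delta, \{x_k\}) - R_f(\Delta', \{x'_k\})| < \varepsilon.$$

Let $\Delta_{1\varepsilon}$, $\Delta_{2\varepsilon}$, and $\Delta_{3\varepsilon}$ be the subsets of $\Delta_\varepsilon$ which are decompositions of $[a, c]$, $[c, d]$, and $[d, b]$, respectively. If $\Delta_2$ is a decomposition of $[c, d]$ and $\Delta_2 \succ \Delta_{2\varepsilon}$, then $\Delta = \Delta_{1\varepsilon} \cup \Delta_2 \cup \Delta_{3\varepsilon}$ is a decomposition of $[a, b]$ which is a refinement of $\Delta_\varepsilon$. If $\Delta'_2 \succ \Delta_{2\varepsilon}$ and $\Delta' = \Delta_{1\varepsilon} \cup \Delta'_2 \cup \Delta_{3\varepsilon}$, then, choosing the same points in the intervals of $\Delta_{1\varepsilon}$ and $\Delta_{3\varepsilon}$ for both sums,

$$|R_g(\Delta_2, \{x_k\}) - R_g(\Delta'_2, \{x'_k\})| = |R_f(\Delta, \{x_k\}) - R_f(\Delta', \{x'_k\})| < \varepsilon.$$

Hence $R_g$ is Cauchy and the corollary follows from Theorem 5.1.7.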


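5.1.9 Theorem. If $f$ and $g$ are Riemann-Darboux integrable functions defined on $[a, b]$, then
(a) for all real numbers $\alpha$ and $\beta$, $\alpha f + \beta g$ is integrable,
(b) $fg$ is integrable, and
(c) $|f|$ is integrable.

Proof. For every $\varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon \Rightarrow$

$$\left| \sum_{k=1}^{n} \alpha f(x_k)\,|I_k| - \alpha \int_a^b f(x)\,dx \right| < \varepsilon/2, \qquad \left| \sum_{k=1}^{n} \beta g(x_k)\,|I_k| - \beta \int_a^b g(x)\,dx \right| < \varepsilon/2.$$

Thus

$$\left| \sum_{k=1}^{n} [\alpha f(x_k) + \beta g(x_k)]\,|I_k| - \alpha \int_a^b f(x)\,dx - \beta \int_a^b g(x)\,dx \right| < \varepsilon.$$

This proves part (a).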


To prove part (b) we first note that it is enough to prove that the square of any integrable function is integrable. Indeed, if this is the case, then from the formula

$$fg = \tfrac{1}{4}\left[(f + g)^2 - (f - g)^2\right],$$

and the fact that $f + g$ and $f - g$ are integrable, it follows that $fg$ is integrable.

Let us suppose at first that $f \ge 0$ and $M > 0$ is an upper bound for $f$. For every $\varepsilon > 0$, $\exists \Delta$ so that

$$\overline{D}_f(\Delta) - \underline{D}_f(\Delta) < \varepsilon/2M.$$

If $I_k \in \Delta$, $m_k = \inf \{f(x) : x \in I_k\}$, $M_k = \sup \{f(x) : x \in I_k\}$, then since $f \ge 0$ we get $m_k^2 = \inf \{f(x)^2 : x \in I_k\}$, $M_k^2 = \sup \{f(x)^2 : x \in I_k\}$. Hence

$$\overline{D}_{f^2}(\Delta) - \underline{D}_{f^2}(\Delta) = \sum_{k=1}^{n} (M_k^2 - m_k^2)\,|I_k| = \sum_{k=1}^{n} (M_k + m_k)(M_k - m_k)\,|I_k| \le 2M\left[\overline{D}_f(\Delta) - \underline{D}_f(\Delta)\right] < \varepsilon.$$

This shows that $\overline{D}(f^2) = \underline{D}(f^2)$.

In the general case let $m = \inf \{f(x) : x \in [a, b]\}$. Then $f - m$ is a nonnegative integrable function. Hence $(f - m)^2$ is integrable. But

$$(f - m)^2 = f^2 - 2mf + m^2,$$

and, since $2mf$ and $m^2$ are integrable, it follows that $f^2$ is integrable. This completes the proof of part (b).
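To prove part (c), since $|f| = (f^2)^{1/2}$, it is enough to prove that the positive square root of any nonnegative integrable function is integrable. Suppose $g \ge 0$ and integrable. For any $\varepsilon > 0$, $\exists \Delta$ so that

$$\overline{D}_g(\Delta) - \underline{D}_g(\Delta) < \varepsilon^2.$$

Let $I_k \in \Delta$, $m_k = \inf \{g(x)^{1/2} : x \in I_k\}$, $M_k = \sup \{g(x)^{1/2} : x \in I_k\}$, $A = \{k : M_k + m_k < \varepsilon\}$ and $B = \{k : M_k + m_k \ge \varepsilon\}$. Then

$$\sum_{k \in A} (M_k - m_k)\,|I_k| < \varepsilon(b - a),$$

$$\varepsilon \sum_{k \in B} (M_k - m_k)\,|I_k| \le \sum_{k \in B} (M_k + m_k)(M_k - m_k)\,|I_k| \le \overline{D}_g(\Delta) - \underline{D}_g(\Delta) < \varepsilon^2.$$

Hence

$$\sum_{k=1}^{n} (M_k - m_k)\,|I_k| < \varepsilon(b - a + 1),$$

which completes the proof of (c).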

Exercises

1. If $\Delta^*$ is a refinement of $\Delta$, show that every interval in $\Delta$ is a union of intervals in $\Delta^*$.

2. If $\Delta = \{I_k : k \in (1, n)\}$ is a decomposition of $[a, b]$, $a \le b$, show that

$$b - a = \sum_{k=1}^{n} |I_k|.$$

3. If $f$ is a bounded function defined on $[a, b]$, establish the fact that $\underline{D}(f) \le \overline{D}(f)$.

4. Show that a Riemann integrable function must be bounded.

5. Using the definition of a Riemann-Darboux integral, show the following:

(a) $\int_0^1 x\,dx = 1/2$.
(b) $\int_{-1}^{1} x^2\,dx = 2/3$.
(c) $\int_0^1 e^x\,dx = e - 1$.

6. If $\Delta$ is a decomposition of $[a, b]$, we have defined $|\Delta| = \max \{|I| : I \in \Delta\}$. Show that a bounded function $f$, with domain $[a, b]$, is Darboux integrable $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists \delta > 0$ so that

$$|\Delta| < \delta \Rightarrow 0 \le \overline{D}_f(\Delta) - \underline{D}_f(\Delta) < \varepsilon.$$

7. Show that a function $f$, with domain $[a, b]$, is Riemann integrable $\Leftrightarrow$ there is a number $R(f)$ so that $\forall \varepsilon > 0$, $\exists \delta > 0$ so that for every decomposition $\Delta$ of $[a, b]$ with $|\Delta| < \delta$,

$$|R_f(\Delta, \{x_k\}) - R(f)| < \varepsilon.$$

8. If $f$ is Riemann-Darboux integrable on $[0, 1]$ and if

$$S_n = \frac{1}{n} \sum_{k=1}^{n} f(k/n),$$

show that

$$S_n \to \int_0^1 f(x)\,dx.$$

(Hint: The results of Exercises 6 or 7 may be useful.)

9. Evaluate the following limits:

(a) $\displaystyle \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \left(\frac{k}{n}\right)^2$.
(b) $\displaystyle \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} e^{k/n}$.

(Hint: See Exercises 5 and 8.)

10. If $f$ is defined on $[a, b]$ and integrable, and $g$ differs from $f$ at only a finite set of points, show that $g$ is integrable and has the same integral as $f$.

5.2 PROPERTIES AND EXISTENCE OF RIEMANN-DARBOUX INTEGRALS

In this section we bring together the relevant properties of Riemann-Darboux integrals and prove some theorems about their existence. Most of the results we prove here should be known to the reader, at least in form, from the elementary calculus.

5.2.1 Theorem. If $f$ and $g$ are functions defined on $[a, b]$ and are Riemann-Darboux integrable, then

(a) for all real numbers $\alpha$ and $\beta$,
$$\int_a^b [\alpha f(x) + \beta g(x)]\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx,$$

(b) $f \ge 0$ implies $\int_a^b f(x)\,dx \ge 0$,

(c) $\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$, and

(d) $a \le c \le b$ implies
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$

Proof. The proof of part (a) is the same as the proof of part (a) of Theorem 5.1.9.

Part (b) follows from the fact that all the Riemann sums are nonnegative, and hence the limit must also be nonnegative.

To prove part (c) we note that it is always true that $|f| - f \ge 0$ and $|f| + f \ge 0$. By combining the additive property proved in (a) with the positivity property of (b) we get

$$\pm \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx.$$

This is the content of part (c).

To prove (d) we first remark that by Corollary 5.1.8 the restrictions of $f$ to $[a, c]$ and $[c, b]$ are integrable. Now, $\forall \varepsilon > 0$, there exist decompositions $\Delta_{1\varepsilon}$ and $\Delta_{2\varepsilon}$ of $[a, c]$ and $[c, b]$, respectively, so that if $\Delta_1 \succ \Delta_{1\varepsilon}$ and $\Delta_2 \succ \Delta_{2\varepsilon}$, then

$$\left| R_{f_1}(\Delta_1, \{x_{1k}\}) - \int_a^c f_1(x)\,dx \right| < \varepsilon/2, \qquad \left| R_{f_2}(\Delta_2, \{x_{2k}\}) - \int_c^b f_2(x)\,dx \right| < \varepsilon/2,$$

where $f_1$ and $f_2$ are the restrictions of $f$ to $[a, c]$ and $[c, b]$, respectively. If $\Delta_\varepsilon = \Delta_{1\varepsilon} \cup \Delta_{2\varepsilon}$ and $\Delta = \Delta_1 \cup \Delta_2$, it follows that $\Delta$ is a decomposition of $[a, b]$ with $\Delta \succ \Delta_\varepsilon$, and if $\{x_k\} = \{x_{1k}\} \cup \{x_{2k}\}$, then by the triangle inequality

$$\left| R_f(\Delta, \{x_k\}) - \int_a^c f(x)\,dx - \int_c^b f(x)\,dx \right| < \varepsilon.$$

Since every refinement of $\Delta_\varepsilon$ is clearly of the form $\Delta_1 \cup \Delta_2$, we are done.

5.2.2 Theorem. If $f$ is defined and integrable on $[a, b]$, and $g$ is defined and continuous on $[a, b]$, differentiable on $]a, b[$, and $\forall x \in\ ]a, b[$ we have $g'(x) = f(x)$, then

$$\int_a^b f(x)\,dx = g(b) - g(a).$$

Proof. Let $\Delta = \{I_k : k \in (1, n)\}$ be any decomposition of $[a, b]$ with $I_k = [a_k, b_k]$ and $a_k < b_k$. Then

$$g(b) - g(a) = \sum_{k=1}^{n} [g(b_k) - g(a_k)] = \sum_{k=1}^{n} g'(x_k)[b_k - a_k] = \sum_{k=1}^{n} f(x_k)\,|I_k|, \qquad x_k \in\ ]a_k, b_k[.$$

We have, of course, used the Mean Value Theorem, and this is where we require that $g$ be continuous at the end points.

For any $\varepsilon > 0$, if we choose $\Delta$ so that

$$\left| R_f(\Delta, \{x_k\}) - \int_a^b f(x)\,dx \right| < \varepsilon,$$

it follows that for the particular choice of $\{x_k\}$ as given above by the use of the Mean Value Theorem,

$$\left| \int_a^b f(x)\,dx - [g(b) - g(a)] \right| = \left| \int_a^b f(x)\,dx - R_f(\Delta, \{x_k\}) \right| < \varepsilon.$$

This completes the proof of the theorem.

5.2.3 Corollary (Integration by Parts). If $f$ and $g$ are defined and continuous on $[a, b]$ and differentiable on $]a, b[$, and $fg'$ and $f'g$ are integrable (by defining $f'$ and $g'$ in any way at $a$ and $b$), then

$$\int_a^b f(x)g'(x)\,dx + \int_a^b f'(x)g(x)\,dx = f(b)g(b) - f(a)g(a).$$

Proof. We have $f(x)g'(x) + f'(x)g(x) = [f(x)g(x)]'$. Since the left side is integrable, the conclusion is an immediate consequence of the previous theorem (see Exercise 10 of Section 5.1).

As an application of Theorem 5.2.2 we can obtain an integral remainder formula in Taylor's formula. We shall suppose that $f$ is defined and has $n - 1$ continuous derivatives on $[a, b]$, is $n$ times differentiable on $]a, b[$, and $f^{(n)}$ (when defined in any way at $a$ and $b$) is integrable. Let us write

$$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(t)}{k!} (x - t)^k + R_n(x, t), \qquad x, t \in [a, b].$$

Clearly, $R_n(t, t) = 0$. If we differentiate with respect to $t$, holding $x$ fixed, we get

$$R'_n(x, t) = -\frac{(x - t)^{n-1}}{(n - 1)!} f^{(n)}(t), \qquad t \in\ ]a, b[.$$

Hence, if $a \le c \le x \le b$, we have from Theorem 5.2.2,

$$R_n(x, c) = R_n(x, c) - R_n(x, x) = -\int_c^x R'_n(x, t)\,dt = \int_c^x \frac{(x - t)^{n-1}}{(n - 1)!} f^{(n)}(t)\,dt.$$

If $\beta \ge \alpha$ and $g$ is integrable on $[\alpha, \beta]$, define

$$\int_\beta^\alpha g(t)\,dt = -\int_\alpha^\beta g(t)\,dt.$$

With this definition, the formula for $R_n$ is valid as well for $a \le x \le c \le b$. Hence

$$R_n(x, c) = \frac{1}{(n - 1)!} \int_c^x (x - t)^{n-1} f^{(n)}(t)\,dt. \tag{5.2.1}$$
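Formula (5.2.1) can be checked numerically for a concrete function. The sketch below assumes $f = \exp$ (so that every derivative is again $\exp$) and approximates the integral by a midpoint Riemann sum; the helper names and the number of steps are illustrative choices, not part of the text. A negative step width handles the case $x < c$, in agreement with the convention for reversed limits.

```python
from math import exp, factorial

def taylor_remainder(x, c, n):
    """R_n(x, c) = f(x) - sum_{k<n} f^(k)(c)/k! * (x-c)^k for f = exp."""
    return exp(x) - sum(exp(c) / factorial(k) * (x - c)**k for k in range(n))

def integral_remainder(x, c, n, steps=20_000):
    """Midpoint-sum approximation of
    (1/(n-1)!) * integral from c to x of (x-t)^(n-1) * exp(t) dt."""
    width = (x - c) / steps              # negative when x < c
    total = 0.0
    for j in range(steps):
        t = c + (j + 0.5) * width
        total += (x - t)**(n - 1) * exp(t)
    return total * width / factorial(n - 1)

direct = taylor_remainder(1.0, 0.0, 4)
via_integral = integral_remainder(1.0, 0.0, 4)
print(direct, via_integral)              # the two values agree closely
```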

5.2.4 Theorem. If $g$ is a continuous increasing function on $[a, b]$ and differentiable on $]a, b[$, $f$ is defined on $\mathcal{R}(g)$ and integrable, and $(f \circ g)g'$ is integrable, then

$$\int_{g(a)}^{g(b)} f(x)\,dx = \int_a^b f \circ g(x)\,g'(x)\,dx.$$

Proof. If $\Delta = \{[a_k, b_k] : k \in (1, n)\}$ is a decomposition of $[a, b]$, then $\Delta' = \{[a'_k, b'_k] : k \in (1, n)\}$, where $a'_k = g(a_k)$, $b'_k = g(b_k)$, is a decomposition of $[g(a), g(b)]$. Conversely, since $g$ is continuous and increasing, it is one to one and every decomposition of $[g(a), g(b)]$ comes from a decomposition of $[a, b]$ in this way. Further, $x_k \in [a_k, b_k] \Leftrightarrow x'_k = g(x_k) \in [a'_k, b'_k]$ and $\Delta_1 \succ \Delta \Leftrightarrow \Delta'_1 \succ \Delta'$.

Since $f$ is integrable, for every $\varepsilon > 0$ there exists a decomposition $\Delta'_\varepsilon$ of $[g(a), g(b)]$ so that if $\Delta' \succ \Delta'_\varepsilon$, then

$$\left| R_f(\Delta', \{x'_k\}) - \int_{g(a)}^{g(b)} f(x)\,dx \right| < \varepsilon. \tag{5.2.2}$$

Let $\Delta$ and $\Delta_\varepsilon$ be the corresponding decompositions of $[a, b]$. If $I'_k \in \Delta'$, and $I_k$ is the corresponding interval in $\Delta$, use the Mean Value Theorem to get

$$|I'_k| = g(b_k) - g(a_k) = g'(x_k)\,|I_k|, \qquad x_k \in I_k. \tag{5.2.3}$$

If we take $x'_k = g(x_k)$ in (5.2.2), then from (5.2.2) and (5.2.3) we get $\forall \Delta \succ \Delta_\varepsilon$,

$$\left| \sum_{k=1}^{n} f \circ g(x_k)\,g'(x_k)\,|I_k| - \int_{g(a)}^{g(b)} f(x)\,dx \right| < \varepsilon.$$

Since $f \circ g(x)\,g'(x)$ is integrable, this shows the equality of the two integrals (see Exercise 10 of Section 5.1).

REMARK: Because the last inequality in the previous proof holds $\forall \Delta \succ \Delta_\varepsilon$, we might be tempted to conclude that $(f \circ g)g'$ is integrable and that we can eliminate this hypothesis from the theorem. However, note that for each $\Delta \succ \Delta_\varepsilon$ we have this inequality true for only a special set $\{x_k\}$ and not for all such sets.

The previous theorem justifies, in many instances, the process used in elementary calculus for changing variables under an integral sign. For example, suppose we wish to compute

$$\int_0^1 \sqrt{1 - x^2}\,dx.$$

In this case $f(x) = \sqrt{1 - x^2}$. Let us take $g(x) = \sin x$ for $0 \le x \le \pi/2$. Then $g$ is monotone increasing, $g(0) = 0$, and $g(\pi/2) = 1$. Thus the hypotheses of the last theorem are satisfied. We have

$$f \circ g(x) = \cos x, \qquad g'(x) = \cos x,$$

and thus

$$\int_0^1 \sqrt{1 - x^2}\,dx = \int_0^{\pi/2} \cos^2 x\,dx.$$
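As a numerical sanity check of this change of variable, both sides can be approximated by Riemann sums with midpoint tags. This is only a sketch: the helper `midpoint_sum` and the number of subintervals are illustrative, and the common value $\pi/4$ used for comparison comes from the elementary evaluation of the integral of $\cos^2 x$ over $[0, \pi/2]$.

```python
from math import sqrt, sin, cos, pi

def midpoint_sum(h, a, b, n=20_000):
    """Riemann sum of h over n equal subintervals of [a, b], tagged at midpoints."""
    width = (b - a) / n
    return sum(h(a + (k + 0.5) * width) for k in range(n)) * width

f = lambda x: sqrt(1.0 - x * x)
g = sin

left = midpoint_sum(f, 0.0, 1.0)                                # integral of f over [g(0), g(pi/2)]
right = midpoint_sum(lambda x: f(g(x)) * cos(x), 0.0, pi / 2)   # integral of (f o g) g' over [0, pi/2]
print(left, right, pi / 4)
```

The left-hand sum converges more slowly because $f$ has an unbounded derivative near $1$; the substitution removes that difficulty.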
5.2.5 Theorem. If $f$ is defined on $[a, b]$ and integrable, and if $F$ is also defined on $[a, b]$ by means of the equation

$$F(x) = \int_a^x f(t)\,dt,$$

then $F$ is differentiable at every point $x \in [a, b]$ which is a point of continuity of $f$, and $F'(x) = f(x)$.

Proof. Suppose $x$ is a point of continuity of $f$. Then if $x + h \in [a, b]$, we have

$$\frac{F(x + h) - F(x)}{h} - f(x) = \frac{1}{h} \left\{ \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \right\} - f(x).$$

If $h > 0$, we have

$$\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt = \int_x^{x+h} f(t)\,dt, \qquad f(x) = \frac{1}{h} \int_x^{x+h} f(x)\,dt.$$

Hence

$$\left| \frac{F(x + h) - F(x)}{h} - f(x) \right| \le \frac{1}{h} \int_x^{x+h} |f(t) - f(x)|\,dt.$$

Since $f$ is continuous at $x$, $\forall \varepsilon > 0$, $\exists \delta$ so that $|t - x| < \delta\ \&\ t \in \mathcal{D}(f) \Rightarrow |f(t) - f(x)| < \varepsilon$. If we take $h < \delta$, we are led to the conclusion

$$\lim_{h \to 0+} \frac{F(x + h) - F(x)}{h} = f(x).$$

If $h < 0$, a similar argument shows that

$$\lim_{h \to 0-} \frac{F(x + h) - F(x)}{h} = f(x),$$

which concludes the proof of the theorem.

5.2.6 First Mean Value Theorem. If $f$ and $g$ are defined on $[a, b]$, $g \ge 0$, $fg$ and $g$ are integrable, and $f$ is bounded, then there exists a number $c$ such that $\inf f \le c \le \sup f$ and

$$\int_a^b f(x)g(x)\,dx = c \int_a^b g(x)\,dx.$$

Proof. Let $m = \inf f$, $M = \sup f$. Since

$$m g(x) \le f(x)g(x) \le M g(x),$$

it follows that

$$m \int_a^b g(x)\,dx \le \int_a^b f(x)g(x)\,dx \le M \int_a^b g(x)\,dx.$$

Now, if $\int_a^b g(x)\,dx = 0$, then the theorem is clearly true. If $\int_a^b g(x)\,dx > 0$, set

$$c = \int_a^b f(x)g(x)\,dx \Big/ \int_a^b g(x)\,dx,$$

and it is immediate that $m \le c \le M$.

5.2.7 Second Mean Value Theorem. If $f$ and $g$ are defined on $[a, b]$, $g \ge 0$, $fg$ and $g$ are integrable, $f$ is bounded, and $m \le \inf f \le \sup f \le M$, then $\exists c \in [a, b]$ such that

$$\int_a^b f(x)g(x)\,dx = m \int_a^c g(x)\,dx + M \int_c^b g(x)\,dx.$$

Proof. Define the function $G$ on $[a, b]$ by the equation

$$G(x) = m \int_a^x g(t)\,dt + M \int_x^b g(t)\,dt.$$

$G$ is a continuous function (Exercise 4 of Section 5.2) and

$$\min G \le G(b) = m \int_a^b g(t)\,dt \le \int_a^b f(t)g(t)\,dt \le M \int_a^b g(t)\,dt = G(a) \le \max G.$$

Since $G$ takes on all values between $\min G$ and $\max G$, and since the above inequality shows that $\int_a^b f(t)g(t)\,dt$ is a number between this maximum and minimum, $\exists c \in [a, b]$ such that

$$G(c) = \int_a^b f(t)g(t)\,dt.$$

This proves the theorem.


We shall now present a theorem concerning the integration of a uniformly convergent sequence of integrable functions. The theorem is useful for a wide variety of purposes.

5.2.8 Theorem. If $(f_n)$ is a sequence of functions each of which is defined on $[a, b]$ and integrable, and if $f_n \to f$ uniformly on $[a, b]$, then $f$ is integrable and

$$\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx.$$

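Proof. For every $\varepsilon > 0$, $\exists N$ so that $n \ge N$ and $x \in [a, b] \Rightarrow |f_n(x) - f(x)| < \varepsilon/3(b - a)$. In particular, this means that for every set $A \subset [a, b]$, and $\forall n \ge N$,

$$|\sup \{f_n(x) : x \in A\} - \sup \{f(x) : x \in A\}| \le \varepsilon/3(b - a),$$
$$|\inf \{f_n(x) : x \in A\} - \inf \{f(x) : x \in A\}| \le \varepsilon/3(b - a).$$

Fix $m \ge N$ and let $\Delta_\varepsilon$ be a decomposition of $[a, b]$ so that $\Delta \succ \Delta_\varepsilon \Rightarrow$

$$|\overline{D}_{f_m}(\Delta) - \underline{D}_{f_m}(\Delta)| < \varepsilon/3.$$

Hence

$$|\overline{D}_f(\Delta) - \underline{D}_f(\Delta)| \le |\overline{D}_f(\Delta) - \overline{D}_{f_m}(\Delta)| + |\overline{D}_{f_m}(\Delta) - \underline{D}_{f_m}(\Delta)| + |\underline{D}_{f_m}(\Delta) - \underline{D}_f(\Delta)| < \varepsilon.$$

Thus $f$ is integrable. Further, $n \ge N \Rightarrow$

$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_a^b |f_n(x) - f(x)|\,dx \le \varepsilon/3,$$

which concludes the proof.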

5.2.9 Corollary. If $(f_n)$ is a sequence of functions each of which is defined on $[a, b]$ and integrable, and if $\sum_{k \ge 0} (f_k)$ is uniformly convergent to $f$ on $[a, b]$, then $f$ is integrable and

$$\int_a^b f(x)\,dx = \sum_{k=0}^{\infty} \int_a^b f_k(x)\,dx.$$

We shall leave the obvious proof of this corollary for the reader. We should remark that it is possible to relax the hypothesis about the uniform convergence of the sequence $(f_n)$ and still obtain the conclusion of Theorem 5.2.8. However, the hypothesis cannot be relaxed all the way to pointwise convergence, as the following simple example shows.

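Suppose $(f_n)$ is the sequence of functions, each having domain $[0, 1]$ and defined in the following way:

$$f_n(x) = \begin{cases} n + 1 & \Leftrightarrow\ x \in\ ]0, 1/(n + 1)], \\ 0, & \text{otherwise.} \end{cases}$$

It is not hard to check that each $f_n$ is integrable and $\forall x \in [0, 1]$, $f_n(x) \to 0$. But $\forall n \in N_0$,

$$\int_0^1 f_n(x)\,dx = 1.$$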
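The behavior of this sequence is easy to tabulate with exact rational arithmetic. The sketch below uses illustrative helper names; the integral is computed as height times base length, which is the value of the integral of a step function taking one nonzero value on a single interval.

```python
from fractions import Fraction

def f_n(n, x):
    """f_n(x) = n + 1 on ]0, 1/(n+1)] and 0 elsewhere on [0, 1]."""
    return n + 1 if 0 < x <= Fraction(1, n + 1) else 0

def integral_f_n(n):
    """Exact integral of the step function f_n over [0, 1]:
    height (n + 1) times base length 1/(n + 1)."""
    return (n + 1) * Fraction(1, n + 1)

x = Fraction(1, 10)
values = [f_n(n, x) for n in range(15)]            # 0 as soon as 1/(n+1) < x
integrals = [integral_f_n(n) for n in range(15)]   # always 1
print(values)
print(integrals)
```

At the fixed point $x = 1/10$ the values are $1, 2, \ldots, 10$ and then $0$ from $n = 10$ on, while every integral equals $1$, so the integrals do not converge to the integral of the pointwise limit.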

If a sequence $(f_n)$ converges pointwise to an integrable function $f$ and if the sequence is uniformly bounded, then the conclusion of Theorem 5.2.8 remains valid. The proof of this result belongs more properly to the circle of ideas connected with the Lebesgue theory of integration and we shall not prove it in this book. Note that in our previous example the sequence of functions does not remain uniformly bounded.

We shall now finish this section by giving two different sufficient conditions that a Riemann-Darboux integral of a function exists. Although the conditions we shall give are not necessary conditions, they nevertheless are broad enough to be very useful in a wide variety of circumstances.

5.2.10 Theorem. If $f$ is defined on $[a, b]$ and is monotone nondecreasing [nonincreasing], then $f$ has a Riemann-Darboux integral.

Proof. Let $\Delta = \{I_k : k \in (1, n)\}$ be any decomposition of $[a, b]$. Suppose that the intervals $I_k = [a_k, b_k]$ are named so that $a_1 = a$, $b_n = b$, and $a_k = b_{k-1}$ for $1 < k \le n$. Since $f$ is nondecreasing, the minimum and maximum of $f$ restricted to $I_k$ are taken on at $a_k$ and $b_k$, respectively. Hence

$$\overline{D}_f(\Delta) = f(a_2)(a_2 - a_1) + f(a_3)(a_3 - a_2) + \cdots + f(b)(b - a_n),$$
$$\underline{D}_f(\Delta) = f(a_1)(a_2 - a_1) + f(a_2)(a_3 - a_2) + \cdots + f(a_n)(b - a_n).$$

Thus

$$\overline{D}_f(\Delta) - \underline{D}_f(\Delta) = \sum_{k=1}^{n} [f(a_{k+1}) - f(a_k)][a_{k+1} - a_k], \qquad a_{n+1} = b.$$

Now, for $\varepsilon > 0$, choose $\Delta$ so that $|\Delta| < \varepsilon$, and we get

$$\overline{D}_f(\Delta) - \underline{D}_f(\Delta) \le \varepsilon [f(b) - f(a)].$$

Since $\overline{D}(f) \le \overline{D}_f(\Delta)$ and $\underline{D}_f(\Delta) \le \underline{D}(f)$, and (Exercise 3 of Section 5.1) $\underline{D}(f) \le \overline{D}(f)$, it follows that

$$0 \le \overline{D}(f) - \underline{D}(f) \le \varepsilon [f(b) - f(a)].$$

Since this is true for every $\varepsilon > 0$, the upper and lower integrals are equal, which proves the theorem.
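The key estimate of this proof, that the gap between the upper and lower Darboux sums is at most $|\Delta|\,[f(b) - f(a)]$, can be tried on a randomly chosen decomposition. The sketch below is illustrative only: the step function built from `math.floor`, the interval $[0, 2]$, and the random seed are assumptions, chosen to show that no continuity is needed.

```python
import math
import random

def darboux_gap(f, points):
    """Upper minus lower Darboux sum for a nondecreasing f: on [p, q]
    the sup is f(q) and the inf is f(p)."""
    return sum((f(q) - f(p)) * (q - p) for p, q in zip(points, points[1:]))

random.seed(0)
a, b = 0.0, 2.0
f = lambda x: math.floor(4 * x)        # nondecreasing, with jump discontinuities

# A decomposition of [a, b] given by its sorted end points.
points = sorted({a, b, *(random.uniform(a, b) for _ in range(50))})
mesh = max(q - p for p, q in zip(points, points[1:]))

gap = darboux_gap(f, points)
bound = mesh * (f(b) - f(a))
print(gap, bound)                       # gap does not exceed bound
```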

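5.2.11 Theorem. If $f$ is a function with domain $[a, b]$, is bounded, and is continuous except possibly at a finite set of points, then $f$ has a Riemann-Darboux integral.

Proof. Let $m$ be the number of discontinuities of $f$ and $M = \sup |f|$. For every $\varepsilon > 0$, let $\delta_\varepsilon$ be a positive number so that $4Mm\delta_\varepsilon < \varepsilon$ and let $\Delta_0 = \{J_k : k \in (1, n)\}$ be a decomposition with $|\Delta_0| \le \delta_\varepsilon$. Let us take $A_0$ as the set of $k$ in $(1, n)$ so that $f$ is continuous on $J_k$, and $B_0 = (1, n) \setminus A_0$. Clearly $B_0$ has at most $2m$ elements, and

$$\sum_{k \in B_0} |J_k| \le 2m\delta_\varepsilon.$$

If $k \in A_0$, $f|J_k$ is uniformly continuous. Hence $\forall \varepsilon > 0$, $\exists \delta$ so that $\forall k \in A_0$ and $\forall x, y \in J_k$ with $|x - y| < \delta$ we have $|f(x) - f(y)| < \varepsilon$. Let $\Delta_\varepsilon \succ \Delta_0$, $|\Delta_\varepsilon| < \delta$, and let $\Delta = \{I_j : j \in (1, p)\}$ be a decomposition of $[a, b]$ with $\Delta \succ \Delta_\varepsilon$. Let $A = \{j : \exists k \in A_0 \text{ so that } I_j \subset J_k\}$, and $B = (1, p) \setminus A$. Clearly, if $j \in A$ we have $M_j - m_j < \varepsilon$, where $M_j = \sup f|I_j$ and $m_j = \inf f|I_j$. Further,

$$\sum_{j \in B} |I_j| \le \sum_{k \in B_0} |J_k| \le 2m\delta_\varepsilon.$$

Hence we have

$$\overline{D}_f(\Delta) - \underline{D}_f(\Delta) = \sum_{j \in A} (M_j - m_j)\,|I_j| + \sum_{j \in B} (M_j - m_j)\,|I_j| < \varepsilon(b - a) + 4Mm\delta_\varepsilon < \varepsilon(b - a + 1).$$

We can now complete the proof in the same way as we completed the proof of the previous theorem.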
The last two theorems are special cases of a much more general theorem concerning the existence of Riemann-Darboux integrals. Indeed, a necessary and sufficient condition is known. To describe this result it is necessary to define a set of measure zero. A set of real numbers is said to be of measure zero if and only if $\forall \varepsilon > 0$ there is a covering by a set of open intervals the sum of whose lengths is less than $\varepsilon$. It turns out that a bounded function is Riemann-Darboux integrable if and only if it is continuous except possibly on a set of measure zero. Clearly it follows from this result that many of the hypotheses of the theorems of this section are redundant. We shall delay a proof of this general result until Chapter 8.

5.2.12 Fundamental Theorem of the Calculus. If $f$ is a continuous function with domain $[a, b]$, then there exists a function $F$, defined and differentiable on $[a, b]$, so that $F' = f$.

Proof. Since $f$ is continuous, its Riemann-Darboux integral exists for every interval $[a, x]$, $a \le x \le b$. Setting

$$F(x) = \int_a^x f(t)\,dt,$$

the result is an immediate consequence of Theorem 5.2.5.


The Leibnitz-Newton conception of an integral was that it was an antiderivative, that is, the inverse of the differentiation operator. Thus a function $f$ has an integral in their sense if and only if there is a function $F$ such that $F' = f$. The function $F$ is called a primitive of $f$ and the collection of primitives of $f$ is called the indefinite integral of $f$. The fundamental theorem of the calculus is so named because it shows that every continuous function, with domain an interval, has a primitive, and therefore an integral in the Leibnitz-Newton sense.

Exercises

1. If $f$ is defined and integrable on $[a, b]$, define

$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx.$$

If $a$, $b$, and $c$ are any real numbers, prove that the conclusion of Theorem 5.2.1(d) is valid.

2. In Theorem 5.2.1 replace the hypothesis that $f$ and $g$ are Riemann-Darboux integrable by the hypothesis that they are bounded. Either show that the conclusions of the theorem are valid for both the upper and lower Darboux integrals, or obtain more general statements involving the upper and lower Darboux integrals from which Theorem 5.2.1 will follow as a corollary.

3. In Theorem 5.2.2 replace the hypothesis that $f$ is integrable by the hypothesis that $f$ is bounded. Obtain a generalization of Theorem 5.2.2 involving upper and lower Darboux integrals.

4. If $f$ is defined and integrable on $[a, b]$, and $F$ is defined on $[a, b]$ by

$$F(x) = \int_a^x f(t)\,dt,$$

show that $F$ is continuous.

5. Using the convention of Exercise 1, show that Theorem 5.2.4 remains valid if $g$ is monotone decreasing.

6. Suppose $g$ and $g'$ are defined and continuous on $[a, b]$, and $f$ is defined and continuous on $\mathcal{R}(g)$. If $F$ is a function defined on $\mathcal{R}(g)$ by the equation

$$F(y) = \int_{g(a)}^{y} f(t)\,dt,$$

show that for every $x \in [a, b]$,

$$\int_{g(a)}^{g(x)} f(t)\,dt = \int_a^x f \circ g(t)\,g'(t)\,dt.$$

[Note: If $y < g(a)$, $\int_{g(a)}^{y} f(t)\,dt$ is defined in Exercise 1. If we set $x = b$, then we get the formula of Theorem 5.2.4. Hence for a certain class of functions this result is a generalization of Theorem 5.2.4.]

7. Under the hypotheses of Theorem 5.2.6 and the additional hypothesis that $f$ is continuous, show that $\exists c \in [a, b]$ so that

$$\int_a^b f(x)g(x)\,dx = f(c) \int_a^b g(x)\,dx.$$

8. Show that any two primitives of a given continuous function differ by an additive constant.

9. Let $f$ be a function defined on $[0, 1]$ as follows: $f(x)$ takes one value if $x$ is irrational and another value if $x$ is rational. Does the Riemann-Darboux integral of $f$ exist?

10. If $g$ is continuous on $[a, b]$, $g \ge 0$ and $\int_a^b g(x)\,dx = 0$, show that $g(x) = 0$ for every $x$.

11. Suppose $(f_n)$ is a sequence of continuous functions each defined on $[a, b]$, and $\forall n \in N_0$ and $\forall x \in [a, b]$, $f_n(x) \le f_{n+1}(x)$. Further, suppose that $\forall x \in [a, b]$, $f_n(x) \to f(x)$, where $f$ is continuous. Show that

$$\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx.$$

(Hint: See Exercise 5 of Section 3.4.)

12. Suppose $(f_n)$ is a sequence of continuous functions each defined on $[a, b]$, and $\forall x \in [a, b]$, $f_n(x) \to f(x)$, where $f$ is continuous. Is it necessarily true that

$$\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx\,?$$

13. Give a proof for the integrability of $f$ in Theorem 5.2.8 using the Cauchy criteria for the existence of an integral.


14. From the expansion

$$\frac{1}{1 - x} = \sum_{k=0}^{\infty} x^k, \qquad |x| < 1,$$

obtain the Taylor expansion of $\log(1 - x)$ around $0$.

15. Expand $1/\sqrt{1 - x^2}$ in a Taylor series around the origin and use this to get the Taylor expansion for $\operatorname{Arc\,sin} x$ around the origin.

16. Prove Bernstein's theorem by using the integral form of the remainder in Taylor's formula: If $f$ and all its derivatives are nonnegative in an interval $[a, b]$, then $f$ is analytic in $]a, b[$.
For $c \in [a, b[$, write the remainder as

$$R_n(x, c) = \frac{(x - c)^n}{(n - 1)!} \int_0^1 (1 - s)^{n-1} f^{(n)}((x - c)s + c)\,ds$$

by making the change of variable $t = (x - c)s + c$ in (5.2.1). Now, since $f^{(n+1)} \ge 0$, $f^{(n)}$ is nondecreasing and thus for $x > c$,

$$R_n(x, c) \le \left(\frac{x - c}{b - c}\right)^n R_n(b, c).$$

But $R_n(b, c) \le f(b)$, and this implies

$$0 \le R_n(x, c) \le \left(\frac{x - c}{b - c}\right)^n f(b).$$

This implies $R_n(x, c) \to 0$.

5.3 IMPROPER INTEGRALS

It may often happen that it is possible to define an integral for an unbounded function whose domain is a finite interval or for a function that is defined on an unbounded interval such as $[a, \infty[$. One natural way to do this is by means of a limiting process. Since the definitions and results given here parallel very closely the definitions and results given for infinite series, we shall not be as thorough here as in Chapter 3.

5.3.1 Definition. Suppose $f$ is a function with domain $[a, b[$ ($b \in R$ or $b = \infty$) and $\forall x \in [a, b[$, $f|[a, x]$ has a Riemann-Darboux integral. Let $I(f)$ be that function with domain $[a, b[$ defined by

$$I(f)(x) = \int_a^x f(t)\,dt.$$

The ordered pair $(f, I(f))$ is called an improper integral of the first kind $\Leftrightarrow$ $b = \infty$, and it is called an improper integral of the second kind $\Leftrightarrow$ $b \in R$.
The improper integral $(f, I(f))$ is called convergent $\Leftrightarrow$ $\lim_{x \to b} I(f)(x)$ exists. An improper integral is called divergent $\Leftrightarrow$ it is not convergent. The improper integral $(f, I(f))$ is called absolutely convergent $\Leftrightarrow$ $(|f|, I(|f|))$ is convergent. In case $(f, I(f))$ is convergent, then $\lim_{x \to b} I(f)(x)$ is called the limit of the improper integral and it is denoted by

$$\int_a^b f(t)\,dt.$$

As in the case of infinite series, it is often convenient to denote an improper integral by the symbol

$$\int_a^b (f(t)\,dt).$$
As an example of a convergent improper integral of the first kind consider the function defined by $f(t) = e^{-t}$. Then

$$\lim_{x \to \infty} \int_0^x e^{-t}\,dt = 1,$$

and hence we write

$$\int_0^\infty e^{-t}\,dt = 1.$$

As an example of a convergent improper integral of the second kind, we have

$$\lim_{x \to 1} \int_0^x \frac{dt}{\sqrt{1 - t^2}} = \frac{\pi}{2}.$$

Clearly Definition 5.3.1 does not exhaust all the possibilities for defining improper integrals. For example, suppose $f$ is a function with domain $]a, b]$ ($a \in R$ or $a = -\infty$) and $\forall x \in\ ]a, b]$, $f|[x, b]$ is Riemann-Darboux integrable. Then it is quite reasonable to call the ordered pair $(f, I(f))$ an improper integral, where

$$I(f)(x) = \int_x^b f(t)\,dt.$$

The type of theorems that one can prove about improper integrals

of the first and second kinds resemble very closely analogous theorems
for infinite series. As examples we prove the following results.

5.3.2 Theorem. The improper integral $(f, I(f))$ is convergent $\Leftrightarrow$ the function $I(f)$ is Cauchy in the sense that $\forall \varepsilon > 0$, $\exists c \in [a, b[$ so that $\forall x, y \in [a, b[$ with $x, y > c$ we have

$$|I(f)(x) - I(f)(y)| < \varepsilon.$$

Proof. If $\lim_{x \to b} I(f)(x)$ exists, then the Cauchy condition is obvious. On the other hand, if $I(f)$ is Cauchy, the theorem follows from Proposition 2.4.4.

5.3.3 Theorem. Suppose $(g, I(g))$ is an absolutely convergent improper integral and that $(f, I(f))$ is an improper integral so that $\mathcal{D}(f) = \mathcal{D}(g)$ and $\forall t \in \mathcal{D}(f)$, $|f(t)| \le |g(t)|$. Then $(f, I(f))$ is absolutely convergent and thus convergent.

Proof. Suppose $\mathcal{D}(f) = \mathcal{D}(g) = [a, b[$ and $x, y \in [a, b[$ with $y > x$. Then

$$|I(f)(y) - I(f)(x)| = \left| \int_x^y f(t)\,dt \right| \le \int_x^y |f(t)|\,dt \le \int_x^y |g(t)|\,dt = |I(|g|)(y) - I(|g|)(x)|.$$

Since $(|g|, I(|g|))$ is convergent, it follows from Theorem 5.3.2 that $I(|g|)$ is Cauchy. Thus the previous inequality shows that $I(|f|)$ and hence $I(f)$ is Cauchy. Thus, using Theorem 5.3.2 again, we have completed the proof of the theorem.

5.3.4 Integral Test. Suppose $f$ is a nonnegative nonincreasing function with domain $[0, \infty[$ and integrable on every finite interval $[0, x]$, $x > 0$. Then the series $\sum_{k\ge 0} (f(k))$ and the function $I(f)$ converge or diverge simultaneously.

Proof. Since $f$ is nonincreasing, we have $f(k) \le f(t) \le f(k-1)$ for $t \in [k-1, k]$. Thus
\[ f(k) \le \int_{k-1}^{k} f(t)\,dt \le f(k-1). \tag{5.3.1} \]
However,
\[ I(f)(n) = \int_0^n f(t)\,dt = \sum_{k=1}^{n} \int_{k-1}^{k} f(t)\,dt. \]
Hence we see that if $\lim_{x\to\infty} I(f)(x)$ exists, the monotone nondecreasing sequence of partial sums defined by the left side of (5.3.1) is bounded and hence convergent. On the other hand, if the series $\sum_{k\ge 0} (f(k))$ is convergent, the right-hand inequality of (5.3.1) shows that the monotone nondecreasing sequence $(I(f)(n))$ is bounded and hence, since $I(f)$ is monotone nondecreasing, $\lim_{x\to\infty} I(f)(x)$ exists.

Since the statement about divergence is an immediate consequence of the statement about convergence, we shall consider the theorem as proved.
As an example of the use of the integral test, we come back to the $p$ series of Section 3.2. Let $f$ be that function on $[0, \infty[$ defined by
\[ f(t) = \frac{1}{(t+1)^p}, \qquad p > 0. \]
Then clearly $f \ge 0$ and is monotone nonincreasing, and
\[ f(k) = \frac{1}{(k+1)^p}. \]
Thus the series $\sum_{k\ge 1} (1/k^p)$ and the integral $\int_1^\infty (1/t^p\,dt)$ converge or diverge simultaneously. Now,
\[ \int_1^x \frac{dt}{t^p} = \begin{cases} \dfrac{1}{1-p}\,[x^{1-p} - 1], & \text{for } p \ne 1, \\[1ex] \log x, & \text{for } p = 1. \end{cases} \]
Thus we see that the given integral of the first kind converges $\Leftrightarrow$ $p > 1$. Thus the $p$ series converges $\Leftrightarrow$ $p > 1$.
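The two-sided comparison behind the integral test can be checked numerically for the $p$ series. The following is a minimal sketch, not part of the original text: summing (5.3.1) for $f(t) = t^{-p}$ over $k = 2, \dots, n$ gives $\sum_{k=2}^{n} k^{-p} \le \int_1^n t^{-p}\,dt \le \sum_{k=1}^{n-1} k^{-p}$, and the helper names `partial_sum`, `integral_one_over_t_p`, and `check_bounds` are illustrative choices.

```python
import math


def partial_sum(f, n):
    """Return f(1) + f(2) + ... + f(n)."""
    return sum(f(k) for k in range(1, n + 1))


def integral_one_over_t_p(x, p):
    """Closed form of the integral of t**(-p) over [1, x], as in the text."""
    if p == 1:
        return math.log(x)
    return (x ** (1 - p) - 1) / (1 - p)


def check_bounds(p, n):
    """Check sum_{k=2}^{n} k^-p <= int_1^n t^-p dt <= sum_{k=1}^{n-1} k^-p."""
    f = lambda k: k ** (-p)
    lower = partial_sum(f, n) - f(1)
    upper = partial_sum(f, n - 1)
    middle = integral_one_over_t_p(n, p)
    return lower <= middle <= upper


if __name__ == "__main__":
    for p in (0.5, 1, 2):
        print(p, check_bounds(p, 1000), integral_one_over_t_p(1000, p))
```

For $p > 1$ the closed form stays below $1/(p-1)$ for every $x$, while for $p \le 1$ it grows without bound, which is the dichotomy stated above.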

Let us give another example that shows how the techniques used in the proof of the integral test can be used to obtain rather refined estimates for certain finite sums. Let us show that
\[ \sum_{k=2}^{n} \frac{1}{k \log k} = \log\log n + a + b_n, \]
where $a$ is a constant that satisfies
\[ 0 < a < \frac{1}{2\log 2} - \log\log 2, \qquad 0 < b_n < \frac{1}{n\log n}. \]
Let us draw the graph of the function with values $1/(t \log t)$, $t > 1$ (Fig. 5.3.1). Let $e_k$ be the area as shown in the figure; that is,
\[ e_k = \frac{1}{k\log k} - \int_k^{k+1} \frac{dt}{t\log t}, \qquad k \ge 2. \]

FIGURE 5.3.1 (graph of $1/(t \log t)$ over $[k, k+1]$, showing the area $e_k$)

Then we may write
\[ \sum_{k=2}^{n} \frac{1}{k\log k} = \sum_{k=2}^{n-1} \int_k^{k+1} \frac{dt}{t\log t} + \sum_{k=2}^{n-1} e_k + \frac{1}{n\log n}. \tag{5.3.2} \]
Note now that
\[ \sum_{k=2}^{n-1} \int_k^{k+1} \frac{dt}{t\log t} = \int_2^n \frac{dt}{t\log t} = \log\log n - \log\log 2. \tag{5.3.3} \]
Further,
\[ 0 < e_k < \frac{1}{k\log k} - \frac{1}{(k+1)\log(k+1)}, \]
and thus
\[ 0 < \sum_{k=2}^{n-1} e_k < \frac{1}{2\log 2} - \frac{1}{n\log n}. \tag{5.3.4} \]
Thus the series $\sum_{k\ge 2} (e_k)$ is convergent to a number which is less than $1/(2\log 2)$. Further, by the same type of reasoning we find that
\[ 0 < \sum_{k=n}^{\infty} e_k < \frac{1}{n\log n}. \tag{5.3.5} \]
Let us set
\[ a = \sum_{k=2}^{\infty} e_k - \log\log 2, \tag{5.3.6} \]
\[ b_n = -\sum_{k=n}^{\infty} e_k + \frac{1}{n\log n}. \tag{5.3.6'} \]
Then from (5.3.4) and (5.3.6) we get the estimates for $a$ and from (5.3.5) and (5.3.6') we get the estimates for $b_n$. Also, from (5.3.2) and (5.3.3) we get
\[ \sum_{k=2}^{n} \frac{1}{k\log k} = \log\log n + a + b_n. \]
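The estimate just obtained can be observed numerically. The sketch below is an illustration added here, not a computation from the text: it evaluates $\sum_{k=2}^{n} 1/(k\log k) - \log\log n$, which equals $a + b_n$, and the function name `a_plus_b` is a hypothetical choice. Since $0 < b_n < 1/(n\log n)$, the computed values should settle quickly and stay inside the stated bounds for $a$.

```python
import math


def a_plus_b(n):
    """Return sum_{k=2}^{n} 1/(k log k) - log log n, i.e. a + b_n in the text."""
    s = sum(1.0 / (k * math.log(k)) for k in range(2, n + 1))
    return s - math.log(math.log(n))


if __name__ == "__main__":
    upper = 1.0 / (2.0 * math.log(2.0)) - math.log(math.log(2.0))
    for n in (10, 100, 1000, 10000):
        # Each value lies strictly between 0 and the upper bound for a
        # once the small correction b_n is taken into account.
        print(n, a_plus_b(n), upper)
```

Two successive values $a + b_m$ and $a + b_n$ with $m < n$ differ by $b_m - b_n$, which is less than $1/(m \log m)$; this gives a simple a posteriori error bar on the constant $a$.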

Now, on to other matters. An integral such as
\[ \int_0^\infty \frac{\sin t}{t}\,dt \tag{5.3.7} \]
is not absolutely convergent; that is, the function with values $|\sin x|/x$ does not have a convergent integral of the first kind. To see this we simply note that $\forall n \in N$,
\[ \int_0^{n\pi} \frac{|\sin t|}{t}\,dt = \sum_{k=0}^{n-1} \int_{k\pi}^{(k+1)\pi} \frac{|\sin t|}{t}\,dt. \]
Now, $\forall t \in [k\pi, (k+1)\pi]$, $1/t \ge 1/(k+1)\pi$. Further, $\forall k \in N_0$,
\[ \int_{k\pi}^{(k+1)\pi} |\sin t|\,dt = \int_0^{\pi} \sin t\,dt = 2. \]
Hence
\[ \int_0^{n\pi} \frac{|\sin t|}{t}\,dt \ge \frac{2}{\pi} \sum_{k=0}^{n-1} \frac{1}{k+1}. \]
Since the sequence defined by the sum on the right is divergent, we see that the integral in (5.3.7) is not absolutely convergent.

On the other hand, the integral in (5.3.7) is convergent and the proof is very much like the proof for Abel's test, 3.2.6, or Dirichlet's test, 3.2.7: integrate by parts. Let us first write, for $x \ge \pi/2$,
\[ \int_0^x \frac{\sin t}{t}\,dt = \int_0^{\pi/2} \frac{\sin t}{t}\,dt + \int_{\pi/2}^{x} \frac{\sin t}{t}\,dt. \]
We are supposing here, as we also did previously without mentioning it, that we have extended the function, with values $\sin t/t$, continuously to $t = 0$ by giving it the value 1 there. Hence the first integral on the right exists as an ordinary Riemann-Darboux integral. Let us use integration by parts on the second integral. We have
\[ \int_{\pi/2}^{x} \frac{\sin t}{t}\,dt = -\frac{\cos x}{x} - \int_{\pi/2}^{x} \frac{\cos t}{t^2}\,dt. \]
Since $\forall t$, $|\cos t| \le 1$, it follows from Theorem 5.3.3 that the integral on the right converges. Hence
\[ \int_0^\infty \frac{\sin t}{t}\,dt = \int_0^{\pi/2} \frac{\sin t}{t}\,dt - \int_{\pi/2}^{\infty} \frac{\cos t}{t^2}\,dt. \]

Let us now go on to discuss another type of improper integral. For $\varepsilon > 0$ it is clear from the formula
\[ -\log \varepsilon = \int_{\varepsilon}^{1} \frac{dt}{t} \]
that the function with domain $]0, 1]$ and values $1/t$ does not have a convergent improper integral of the second kind. On the other hand, for $\varepsilon > 0$,
\[ \lim_{\varepsilon\to 0} \Bigl[ \int_{-1}^{-\varepsilon} \frac{dt}{t} + \int_{\varepsilon}^{1} \frac{dt}{t} \Bigr] = 0. \]
If an integral of a function $f$ exists in this sense, we say that $f$ has a convergent Cauchy principal value integral. We give the formal definition below.

5.3.5 Definition. Suppose $f$ has domain $[a, b] \setminus \{c\}$, $c \in\, ]a, b[$, and $\forall \varepsilon$ with $0 < \varepsilon \le \min(c - a, b - c)$, $f|[a, c - \varepsilon]$ and $f|[c + \varepsilon, b]$ are integrable. Let $I(f)$ be that function with domain $]0, \min(c - a, b - c)]$ given by
\[ I(f)(\varepsilon) = \int_a^{c-\varepsilon} f(t)\,dt + \int_{c+\varepsilon}^{b} f(t)\,dt. \]
Then the ordered pair $(f, I(f))$ is called a Cauchy principal value integral. Similarly, if $f$ has domain $]-\infty, \infty[$ and $\forall x \ge 0$, $f|[-x, x]$ is integrable, and
\[ I(f)(x) = \int_{-x}^{x} f(t)\,dt, \]
then the ordered pair $(f, I(f))$ is also called a Cauchy principal value integral. The Cauchy principal value integral $(f, I(f))$ is said to be convergent $\Leftrightarrow$ $\lim_{\varepsilon\to 0} I(f)(\varepsilon)$ or $\lim_{x\to\infty} I(f)(x)$ exists, and the latter numbers are called Cauchy principal values.
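We shall now give some examples which show that the symmetry used in the definition of the Cauchy principal value may be very important. Since $t^2 \sin t/(1 + t^2)$ is an odd function, its integral over $[-x, x]$ is zero. Also the integral of $1/(1 + t^2)$ over $[-x, x]$ is $2 \operatorname{Arc\,tan} x$. Thus
\[ \lim_{x\to\infty} \int_{-x}^{x} \frac{1 + t^2 \sin t}{1 + t^2}\,dt = \pi. \]
On the other hand,
\[ \int_{-x}^{x+\pi} \frac{1 + t^2 \sin t}{1 + t^2}\,dt = 2 \operatorname{Arc\,tan} x + 2\cos x + \int_{x}^{x+\pi} \frac{1 - \sin t}{1 + t^2}\,dt, \]
and thus the limit does not exist as $x \to \infty$.

By the same type of reasoning we find that
\[ \lim_{x\to\infty} \int_{-x}^{x} \frac{1 + t}{1 + t^2}\,dt = \pi. \]
On the other hand,
\[ \lim_{x\to\infty} \int_{-x}^{2x} \frac{1 + t}{1 + t^2}\,dt = \lim_{x\to\infty} \int_{-x}^{x} \frac{1 + t}{1 + t^2}\,dt + \lim_{x\to\infty} \int_{x}^{2x} \frac{1 + t}{1 + t^2}\,dt. \]
Now,
\[ \lim_{x\to\infty} \int_{x}^{2x} \frac{dt}{1 + t^2} = 0, \qquad \lim_{x\to\infty} \int_{x}^{2x} \frac{t}{1 + t^2}\,dt = \lim_{x\to\infty} \frac{1}{2} \log\Bigl(\frac{1 + 4x^2}{1 + x^2}\Bigr) = \log 2. \]
Thus
\[ \lim_{x\to\infty} \int_{-x}^{2x} \frac{1 + t}{1 + t^2}\,dt = \pi + \log 2. \]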
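The dependence of the limit on how the two end points go to infinity can be seen directly from the antiderivative $\operatorname{Arc\,tan} t + \tfrac12 \log(1 + t^2)$ of $(1 + t)/(1 + t^2)$. The following is a small illustrative sketch under that assumption; the function name `integral_closed_form` is hypothetical and the cutoff $x = 10^6$ is an arbitrary choice.

```python
import math


def integral_closed_form(lo, hi):
    """Integral of (1 + t)/(1 + t^2) over [lo, hi], computed from the
    antiderivative atan(t) + (1/2) log(1 + t^2)."""
    F = lambda t: math.atan(t) + 0.5 * math.log1p(t * t)
    return F(hi) - F(lo)


if __name__ == "__main__":
    x = 1.0e6
    symmetric = integral_closed_form(-x, x)        # tends to pi
    lopsided = integral_closed_form(-x, 2.0 * x)   # tends to pi + log 2
    print(symmetric, math.pi)
    print(lopsided, math.pi + math.log(2.0))
```

Replacing the upper end point $2x$ by $cx$ with any fixed $c > 0$ changes the second limit to $\pi + \log c$, so the symmetric choice is what singles out the principal value.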

The definitions we have written down do not exhaust all the possi
bilities for defining improper integrals and the reader can undoubtedly
think of cases we have not discussed. However, in most instances a
suitable definition of an improper integral will be either a variant or a
combination of the definitions we have discussed. Some of the follow
ing exercises are designed to exhibit the various possibilities.

Exercises

1. State and prove results analogous to Theorems 5.3.2 and 5.3.3 for Cauchy principal value integrals.

2.

Discuss the convergence of the following improper integrals:


(a)

(b)

(c)

3.

Discuss the convergence of the following improper integrals:


(a)

(b)

(c)
4.

!100 (/ 1).
L"" C2 )
J:""C3 )

L1 (t t dt).
J: c- t dt) .
in/2 ( )
0
log

in

cost

For what values of


(a)

f000 (t"'e-1 dt).

will the following integrals converge?


(b)
(c)

5.


J: (;!).
Loo (;!).

Discuss the convergence of the following integrals:


(a )
(b)

L (k
)
J:oo C( (2 )
J-oooo ( t2 t I tltl1/2 ) dt
00

(c)

6.

1112

For what values of


( a)
(b)
(c)

7.

00

f C t)a )
J: ( J
L"' Ca g t)
1

( lo

will the following integrals converge?

r I g

Show the following:

i"' --t dt J"' --t dt.


t2
t
sin 2

sin

8.

State and prove an analogue of Abel's test for improper integrals.

9.

State and prove an analogue of Dirichlet's test for improper

integrals.

IO.

Discuss the convergence of the following integrals:

(a )
(b)
(c)

I I.

L"' ( t dt) .
"'
L ( I t ti dt ) .
oo C! t dt .
)
L(
i
5

si

Use the integral test to establish convergence or divergence of

the following series:


(a)

""

k=I

(k3e-k).

00

(b)
(c)

k=2

((log k)P/k).

( k (log og k)")

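12. Let $p$ and $q$ be fixed integers, $p \ge q \ge 1$, and let
\[ a_n = \sum_{k=qn+1}^{pn} \frac{1}{k}. \]
Use the ideas involved in the proof of the integral test to establish that
\[ a_n \to \log\Bigl(\frac{p}{q}\Bigr) \quad \text{as } n \to \infty. \]
(Hint:
\[ \sum_{k=1}^{n} \frac{1}{k} = \int_1^n \frac{dt}{t} + a + b_n, \]
where $a$ is constant and $0 < b_n < c/n$, $c$ constant.)

13. In the equation
\[ \sum_{k=2}^{n} \frac{1}{k\log k} = \log\log n + a + b_n, \]
show that
\[ \frac{1}{4\log 2} - \log\log 2 < a < \frac{1}{2\log 2} - \log\log 2 \]
and
\[ 0 < b_n < \frac{1}{2n\log n}. \]
[Hint: Extend the techniques used in the text and show that $\forall k \ge 2$,
\[ \frac{1}{2}\Bigl[\frac{1}{k\log k} - \frac{1}{(k+1)\log(k+1)}\Bigr] < e_k < \frac{1}{k\log k} - \frac{1}{(k+1)\log(k+1)}.] \]

5.4 RIEMANN-STIELTJES INTEGRALS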

In this section we shall discuss a generalized integral that is very valuable


in that it brings together under one form such seemingly diverse topics
as absolutely convergent infinite series and Riemann-Darboux integrals.
We think it is also true that without the concept of such a generalized
integral a very important part of modern functional analysis would not
be available to us.
5.4.1 Definition. Let $f$ and $g$ be real-valued functions each having the domain $[a, b]$. Let $S_{f,g}$ be that real-valued function with domain the collection of ordered pairs $(\Delta, \{x_k\})$, where $\Delta = \{[a_k, b_k]: k \in (1, n)\}$ is a decomposition of $[a, b]$ and $x_k \in [a_k, b_k]$, and defined by
\[ S_{f,g}(\Delta, \{x_k\}) = \sum_{k=1}^{n} f(x_k)[g(b_k) - g(a_k)]. \]
The function $S_{f,g}$ is called the Riemann-Stieltjes sum function for $f$ with respect to $g$ and any element in its range is called a Riemann-Stieltjes sum. The function $S_{f,g}$ is said to converge to the limit $S(f, g)$ $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon \Rightarrow$
\[ |S_{f,g}(\Delta, \{x_k\}) - S(f, g)| < \varepsilon. \]
In case the limit of $S_{f,g}$ exists, the number $S(f, g)$ is called the Riemann-Stieltjes integral of $f$ with respect to $g$ and it is denoted by
\[ S(f, g) = \int_a^b f(x)\,dg(x). \]
In this case $f$ is said to be integrable with respect to $g$.


As in the case of an ordinary Riemann-Darboux integral, if $S_{f,g}$ has a limit we are justified in calling $S(f, g)$ the limit of $S_{f,g}$, since it is unique. The proof is the same as before.
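A Riemann-Stieltjes sum is straightforward to compute for a given decomposition and choice of points. The sketch below is only an illustration of Definition 5.4.1; the names `rs_sum` and `uniform_rs_sum`, the uniform decomposition, and the midpoint choice of $x_k$ are assumptions made here for concreteness, not constructions from the text.

```python
def rs_sum(f, g, points, tags):
    """Riemann-Stieltjes sum  sum_k f(x_k) [g(b_k) - g(a_k)]  for the
    decomposition with end points points[0] < ... < points[n] and
    tags[k] chosen in [points[k], points[k + 1]]."""
    return sum(
        f(x) * (g(b) - g(a))
        for a, b, x in zip(points, points[1:], tags)
    )


def uniform_rs_sum(f, g, a, b, n):
    """Sum over n equal subintervals of [a, b], each tagged at its midpoint."""
    points = [a + (b - a) * k / n for k in range(n + 1)]
    tags = [(p + q) / 2 for p, q in zip(points, points[1:])]
    return rs_sum(f, g, points, tags)


if __name__ == "__main__":
    # f(x) = x and g(x) = x^2 on [0, 1]; the sums approach 2/3.
    for n in (10, 100, 1000):
        print(n, uniform_rs_sum(lambda x: x, lambda x: x * x, 0.0, 1.0, n))
```

With $g(x) = x$ the function reduces to an ordinary Riemann sum, which is a quick sanity check on any implementation of this kind.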

5.4.2 Definition. Let $f$ be a bounded function and $g$ a monotone nondecreasing function, each having domain $[a, b]$. Let $\overline{D}_{f,g}$ and $\underline{D}_{f,g}$ be those real-valued functions with domain the set of decompositions of $[a, b]$ so that if $\Delta = \{[a_k, b_k]: k \in (1, n)\}$,
\[ \overline{D}_{f,g}(\Delta) = \sum_{k=1}^{n} M_k[g(b_k) - g(a_k)], \qquad M_k = \sup\{f(x): x \in [a_k, b_k]\}, \]
\[ \underline{D}_{f,g}(\Delta) = \sum_{k=1}^{n} m_k[g(b_k) - g(a_k)], \qquad m_k = \inf\{f(x): x \in [a_k, b_k]\}. \]
The functions $\overline{D}_{f,g}$ and $\underline{D}_{f,g}$ are called the upper and lower Darboux-Stieltjes sum functions for $f$ with respect to $g$, respectively. Any element in the range of $\overline{D}_{f,g}$ and any element in the range of $\underline{D}_{f,g}$ is called an upper and lower Darboux-Stieltjes sum for $f$ with respect to $g$, respectively. Set
\[ \overline{D}(f, g) = \inf\{\overline{D}_{f,g}(\Delta): \Delta \in \mathcal{D}(\overline{D}_{f,g})\} = \overline{\int_a^b} f(x)\,dg(x), \]
\[ \underline{D}(f, g) = \sup\{\underline{D}_{f,g}(\Delta): \Delta \in \mathcal{D}(\underline{D}_{f,g})\} = \underline{\int_a^b} f(x)\,dg(x), \]
and call these numbers the upper and lower Darboux-Stieltjes integrals of $f$ with respect to $g$, respectively. In case $\overline{D}(f, g) = \underline{D}(f, g) = D(f, g)$, we say that $f$ is Darboux-Stieltjes integrable with respect to $g$ and call $D(f, g)$ the Darboux-Stieltjes integral of $f$ with respect to $g$.

The fundamental theorem here, as in the case of Riemann-Darboux


integrals, is the following:

5.4.3 Theorem. If f is bounded and g is monotone nondecreasing, then


the Riemann-Stieltjes integral of f with respect to g exists if and only if the
Darboux-Stieltjes integral exists and if they exist they are equal.
The proof of this theorem follows the details of the proof of Theorem

5.1.5, and we shall not reproduce it here.


5.4.4 Theorem. The Riemann-Stieltjes integral of the function $f$ with respect to $g$ exists $\Leftrightarrow$ the function $S_{f,g}$ is Cauchy in the sense that $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon$ and $\Delta' \succ \Delta_\varepsilon \Rightarrow$
\[ |S_{f,g}(\Delta, \{x_k\}) - S_{f,g}(\Delta', \{x'_k\})| < \varepsilon. \]

5.4.5 Corollary. Suppose $[a_1, b_1] \subset [a, b]$, $f$ and $g$ are defined on $[a, b]$, and $f_1$ and $g_1$ are the restrictions of $f$ and $g$ to $[a_1, b_1]$, respectively. If $f$ is Riemann-Stieltjes integrable with respect to $g$, then $f_1$ is Riemann-Stieltjes integrable with respect to $g_1$.

The proofs of Theorem 5.4.4 and Corollary 5.4.5 follow mutatis mutandis the proofs of Theorem 5.1.7 and Corollary 5.1.8.

5.4.6 Theorem. Suppose $f$, $g$, and $h$ are defined on $[a, b]$.

(a) If $f$ and $g$ are Riemann-Stieltjes integrable with respect to $h$, then for all real numbers $\alpha$ and $\beta$, $\alpha f + \beta g$ is Riemann-Stieltjes integrable with respect to $h$ and
\[ \int_a^b [\alpha f(x) + \beta g(x)]\,dh(x) = \alpha \int_a^b f(x)\,dh(x) + \beta \int_a^b g(x)\,dh(x). \]

(b) If $f$ is Riemann-Stieltjes integrable with respect to $g$ and $h$, then for all real numbers $\alpha$ and $\beta$, $f$ is Riemann-Stieltjes integrable with respect to $\alpha g + \beta h$ and
\[ \int_a^b f(x)\,d[\alpha g(x) + \beta h(x)] = \alpha \int_a^b f(x)\,dg(x) + \beta \int_a^b f(x)\,dh(x). \]

(c) If $f \ge 0$ and Riemann-Stieltjes integrable with respect to $g$ and $g$ is monotone nondecreasing, then
\[ \int_a^b f(x)\,dg(x) \ge 0. \]

(d) If $f$ is bounded and Riemann-Stieltjes integrable with respect to $g$, and if $g$ is monotone nondecreasing, then $|f|$ is integrable with respect to $g$ and
\[ \Bigl|\int_a^b f(x)\,dg(x)\Bigr| \le \int_a^b |f(x)|\,dg(x). \]

(e) If $f$ is Riemann-Stieltjes integrable with respect to $g$ and if $a \le c \le b$, then $f|[a, c]$ and $f|[c, b]$ are Riemann-Stieltjes integrable with respect to $g|[a, c]$ and $g|[c, b]$, respectively, and
\[ \int_a^b f(x)\,dg(x) = \int_a^c f(x)\,dg(x) + \int_c^b f(x)\,dg(x). \]

The proofs of (a), (c), (d), and (e) follow the proofs of Theorems 5.1.9 and 5.2.1 and we leave this as an exercise. The proof of (b) is very simple and we also leave it as an exercise. The proof of (b) also follows from part (a) and Theorem 5.4.8.

5.4.7 Theorem. If $f$ and $g$ are defined on $[a, b]$, $g$ is continuous on $[a, b]$ and differentiable on $]a, b[$ and if the Riemann-Stieltjes integral of $f$ with respect to $g$ exists and the Riemann-Darboux integral of $f(x)g'(x)$ exists (when $g'$ is defined in any way at the end points), then
\[ \int_a^b f(x)\,dg(x) = \int_a^b f(x)g'(x)\,dx. \]

Proof. Let $\Delta = \{I_k: k \in (1, n)\}$ be any decomposition of $[a, b]$ with $I_k = [a_k, b_k]$. By the Mean Value Theorem, $\exists x_k \in [a_k, b_k]$, so that
\[ g(b_k) - g(a_k) = g'(x_k)(b_k - a_k). \]
Consequently,
\[ S_{f,g}(\Delta, \{x_k\}) = \sum_{k=1}^{n} f(x_k)[g(b_k) - g(a_k)] = \sum_{k=1}^{n} f(x_k)g'(x_k)(b_k - a_k) = R_{fg'}(\Delta, \{x_k\}). \]
Since the limits on the left side and on the right side exist, they must be equal. This establishes the theorem.

An immediate corollary of Theorem 5.4.7 is Theorem 5.2.2. Indeed, if we take $f(x) = 1$, and note that
\[ \int_a^b dg(x) = g(b) - g(a), \]
we immediately have Theorem 5.2.2.

The next theorem is integration by parts for Riemann-Stieltjes integrals. When taken in conjunction with the previous theorem, it is seen to yield Corollary 5.2.3.

5.4.8 Theorem. If $f$ and $g$ are defined on $[a, b]$ and if the integral of $f$ with respect to $g$ exists, then the integral of $g$ with respect to $f$ exists and
\[ \int_a^b f(x)\,dg(x) + \int_a^b g(x)\,df(x) = f(b)g(b) - f(a)g(a) = \int_a^b d[f(x)g(x)]. \]

Proof. Let $\Delta = \{I_k: k \in (1, n)\}$ be any decomposition of $[a, b]$, where $I_k = [a_k, b_k]$. We may write
\[ S_{g,f}(\Delta, \{x_k\}) = \sum_{k=1}^{n} g(x_k)[f(b_k) - f(a_k)]. \]
On the other hand,
\[ f(b)g(b) - f(a)g(a) = \sum_{k=1}^{n} [f(b_k)g(b_k) - f(a_k)g(a_k)]. \]
Hence
\[ f(b)g(b) - f(a)g(a) - S_{g,f}(\Delta, \{x_k\}) = \sum_{k=1}^{n} f(b_k)[g(b_k) - g(x_k)] + \sum_{k=1}^{n} f(a_k)[g(x_k) - g(a_k)]. \]
Now $[a_k, x_k] \cup [x_k, b_k] = [a_k, b_k]$ and hence
\[ \Delta' = \{[a_k, x_k]: k \in (1, n)\} \cup \{[x_k, b_k]: k \in (1, n)\} \]
is a decomposition of $[a, b]$ and indeed is a refinement of $\Delta$. We may therefore write
\[ S_{g,f}(\Delta, \{x_k\}) = f(b)g(b) - f(a)g(a) - S_{f,g}(\Delta', \{x'_k\}), \]
where $x'_k = a_k$ for the interval $[a_k, x_k]$ and $x'_k = b_k$ for the interval $[x_k, b_k]$. Since, by hypothesis, $S(f, g)$ exists, the conclusion of the theorem is an immediate consequence of the last equality.
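The last equality in the proof is a purely algebraic identity between two Riemann-Stieltjes sums, so it can be checked for any concrete decomposition. The following is a minimal sketch; the helper names `rs_sum` and `parts_identity_gap` and the sample functions are illustrative assumptions, not part of the text.

```python
import math


def rs_sum(f, g, intervals, tags):
    """Sum of f(x_k) [g(b_k) - g(a_k)] over intervals (a_k, b_k) with tags x_k."""
    return sum(f(x) * (g(b) - g(a)) for (a, b), x in zip(intervals, tags))


def parts_identity_gap(f, g, points, tags):
    """Return S_{g,f}(D, {x_k}) - [f(b)g(b) - f(a)g(a) - S_{f,g}(D', {x'_k})],
    where D' splits each [a_k, b_k] at x_k and tags the two pieces
    at a_k and b_k, as in the proof. The result should be zero up to rounding."""
    intervals = list(zip(points, points[1:]))
    left = rs_sum(g, f, intervals, tags)
    refined, refined_tags = [], []
    for (a, b), x in zip(intervals, tags):
        refined += [(a, x), (x, b)]
        refined_tags += [a, b]
    a0, b0 = points[0], points[-1]
    right = f(b0) * g(b0) - f(a0) * g(a0) - rs_sum(f, g, refined, refined_tags)
    return left - right


if __name__ == "__main__":
    pts = [0.0, 0.3, 0.7, 1.5, 2.0]
    tgs = [0.1, 0.5, 1.0, 2.0]
    print(parts_identity_gap(math.exp, math.sin, pts, tgs))
```

Because the identity holds for every decomposition and every choice of points, convergence of one family of sums forces convergence of the other, which is exactly how the theorem is concluded.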


The generalization of Theorem 5.2.4 is the following:

5.4.9 Theorem. If $g$ is defined, increasing, and continuous on $[a, b]$, $f$ and $h$ are defined on $[g(a), g(b)]$, and the Riemann-Stieltjes integral of $f$ with respect to $h$ exists, then the integral of $f \circ g$ with respect to $h \circ g$ exists, and
\[ \int_{g(a)}^{g(b)} f(x)\,dh(x) = \int_a^b f \circ g(x)\,dh \circ g(x). \]

Proof. As in the proof of Theorem 5.2.4 if $\Delta = \{[a_k, b_k]: k \in (1, n)\}$ is a decomposition of $[a, b]$, then $\Delta' = \{[a'_k, b'_k]: k \in (1, n)\}$, where $a'_k = g(a_k)$, $b'_k = g(b_k)$, is a decomposition of $[g(a), g(b)]$, and conversely. Further, if $x_k \in [a_k, b_k]$, then $x'_k = g(x_k) \in [a'_k, b'_k]$, and conversely. Hence
\[ S_{f,h}(\Delta', \{x'_k\}) = S_{f\circ g,\,h\circ g}(\Delta, \{x_k\}). \]
Since $\Delta_1 \succ \Delta_2 \Leftrightarrow \Delta'_1 \succ \Delta'_2$, and since the limit on the left exists, the limit on the right exists and we have proved the theorem.

Note that this theorem in conjunction with Theorem 5.4.7 gives Theorem 5.2.4 by taking $h(x) = x$.

The Riemann-Stieltjes integral and Theorem 5.4.9 can be used to justify some of the formal operations used in elementary calculus. For example, we may write
\[ \int_0^{\pi/2} f(\sin x)\,d\sin x = \int_0^1 f(x)\,dx \]
for any continuous function $f$. Assuming that the integral on the left exists (which we shall prove in Section 5.5), set $g(x) = \operatorname{Arc\,sin} x$. Since $g$ is monotone increasing and $g(0) = 0$, $g(1) = \pi/2$, if we use the formula of Theorem 5.4.9 we get
\[ \int_0^{\pi/2} f(\sin x)\,d\sin x = \int_0^1 f(\sin g(x))\,d\sin g(x). \]
But since $g$ is the inverse of the sine function, we have $\sin g(x) = x$. Of course, in this case Theorem 5.2.4 in conjunction with Theorem 5.4.7 will also justify this change of variable. (See also Exercise 6 of Section 5.2.)

The next two theorems are generalizations of the mean value theorems, 5.2.6 and 5.2.7.

5.4.10 First Mean Value Theorem. If $f$ is defined and bounded on $[a, b]$ and $g$ is monotone nondecreasing on the same interval, and if $f$ is integrable with respect to $g$, then $\exists c$ such that $\inf f \le c \le \sup f$ and
\[ \int_a^b f(x)\,dg(x) = c \int_a^b dg(x) = c[g(b) - g(a)]. \]

The proof of this theorem is essentially the same as the proof of Theorem 5.2.6, and we leave it as an exercise.
5.4.11 Second Mean Value Theorem. If $f$ and $g$ are defined on $[a, b]$, $f$ is monotone nondecreasing, and $g$ is continuous and integrable with respect to $f$, then $\exists c \in [a, b]$ such that
\[ \int_a^b f(x)\,dg(x) = f(a) \int_a^c dg(x) + f(b) \int_c^b dg(x). \]

Proof. By Theorem 5.4.8 we may write
\[ \int_a^b f(x)\,dg(x) = f(b)g(b) - f(a)g(a) - \int_a^b g(x)\,df(x). \]
Since $g$ is continuous, it takes on all values between its maximum and its minimum. Hence if we use this fact and apply Theorem 5.4.10, we see that $\exists c \in [a, b]$ so that
\[ \int_a^b g(x)\,df(x) = g(c) \int_a^b df(x) = g(c)[f(b) - f(a)]. \]
Hence
\[ \int_a^b f(x)\,dg(x) = f(b)[g(b) - g(c)] + f(a)[g(c) - g(a)] = f(a) \int_a^c dg(x) + f(b) \int_c^b dg(x). \]

NOTE: Since $g$ is continuous and $f$ is monotone nondecreasing, it will follow from Theorem 5.5.2 that $g$ is integrable with respect to $f$, and hence this part of the hypothesis is superfluous. We should also note that if $f$ is monotone nonincreasing we get exactly the same theorem.

Exercises

1. If $f$ is defined and bounded on $[a, b]$ and $g$ is monotone nondecreasing on the same interval and $f$ is integrable with respect to $g$, is it necessarily true that
\[ F(x) = \int_a^x f(t)\,dg(t) \]
is continuous? If so, prove; if not, give an example and give sufficient conditions under which it is continuous.

2. If $h$ is nondecreasing on $[a, b]$, and $f$ and $g$ are bounded on $[a, b]$, show that
(a) $\overline{\int_a^b} f(x)\,dh(x) = \overline{\int_a^c} f(x)\,dh(x) + \overline{\int_c^b} f(x)\,dh(x)$ $(a \le c \le b)$.
(b) $\overline{\int_a^b} [f(x) + g(x)]\,dh(x) \le \overline{\int_a^b} f(x)\,dh(x) + \overline{\int_a^b} g(x)\,dh(x)$.
(c) $\underline{\int_a^b} [f(x) + g(x)]\,dh(x) \ge \underline{\int_a^b} f(x)\,dh(x) + \underline{\int_a^b} g(x)\,dh(x)$.

3. Let $f$ be a function defined on $[0, 1]$ in the following way:
\[ f(x) = \begin{cases} 1, & \text{for } x \in [0, 1/2], \\ 0, & \text{for } x \in\, ]1/2, 1]. \end{cases} \]
Does $\int_0^1 f(x)\,df(x)$ exist?

4. Suppose $\Delta = \{[a_k, b_k]: k \in (1, n)\}$ is a decomposition of $[a, b]$ with $a_1 = a$, $b_n = b$, and $a_{k+1} = b_k$, $k \in (1, n - 1)$. Suppose $g$ is constant on each interval $[a, b_1[$, $]a_n, b]$, and $]a_k, b_k[$, is defined in any way for $x = a_k$, and $g(a_k+) - g(a_k-) = \delta_k$ for $k \in (2, n - 1)$. If $f$ is continuous on $[a, b]$, show that $S(f, g)$ exists and
\[ \int_a^b f(x)\,dg(x) = \sum_k f(a_k)\delta_k. \]

5. Suppose $g$ is defined on $[0, 1]$ in the following way:
\[ g(x) = \begin{cases} x, & \text{for } x \in [0, 1/3], \\ x + 3/4, & \text{for } x \in\, ]1/3, 2/3[, \\ x + 9/8, & \text{for } x \in [2/3, 1]. \end{cases} \]
If $f(x) = x$ for $x \in [0, 1]$, compute $\int_0^1 f(x)\,dg(x)$.

6. If $f$, $g$, and $h$ are defined on $[a, b]$, $h$ is monotone nondecreasing and $f$ and $g$ are bounded and integrable with respect to $h$, show that $fg$ is integrable with respect to $h$.

7. Prove Theorem 5.4.6(d).

8. Suppose $(f_n)$ is a sequence of functions and $g$ is a monotone nondecreasing function, all functions being defined on $[a, b]$. Suppose $\forall n \in N_0$, $f_n$ is integrable with respect to $g$ and $f_n \to f$ uniformly on $[a, b]$. Show that $f$ is integrable with respect to $g$ and
\[ \int_a^b f_n(x)\,dg(x) \to \int_a^b f(x)\,dg(x). \]

9. Show that Abel's lemma, 3.2.5, is a special case of Theorem 5.4.8.
10. Suppose $f$, $g$, and $h$ are defined on $[a, b]$ and $f$ and $g$ are integrable with respect to $h$. Prove the integral form of the Lagrange identity:
\[ \frac{1}{2} \int_a^b \Bigl[ \int_a^b \{f(x)g(y) - f(y)g(x)\}^2\,dh(x) \Bigr] dh(y) = \Bigl( \int_a^b f(x)^2\,dh(x) \Bigr) \Bigl( \int_a^b g(x)^2\,dh(x) \Bigr) - \Bigl( \int_a^b f(x)g(x)\,dh(x) \Bigr)^2. \]
If $h$ is monotone nondecreasing, use this to obtain the Cauchy-Schwarz inequality:
\[ \Bigl( \int_a^b f(x)g(x)\,dh(x) \Bigr)^2 \le \Bigl( \int_a^b f(x)^2\,dh(x) \Bigr) \Bigl( \int_a^b g(x)^2\,dh(x) \Bigr). \]

5.5 FUNCTIONS OF BOUNDED VARIATION AND THE EXISTENCE OF RIEMANN-STIELTJES INTEGRALS

To get an idea of the kinds of functions that will guarantee the existence of the Riemann-Stieltjes integral, we shall start with a continuous function $f$ and write down the Riemann-Stieltjes sums of $f$ with respect to a function $g$.

Let $\Delta_1$ and $\Delta_2$ be decompositions of an interval $[a, b]$ and $\Delta$ the common refinement of $\Delta_1$ and $\Delta_2$. Suppose $\Delta_j = \{I_{jk}: k \in (1, n_j)\}$, $j = 1, 2$, and $\Delta = \{I_k: 1 \le k \le n\}$. Now, suppose we have labeled the elements of $\Delta$ so that
\[ I_{1k} = \bigcup \{I_j: l_k < j \le l_{k+1}\}. \]
Consequently, we may write
\[ g(b_{1k}) - g(a_{1k}) = \sum_{j=l_k+1}^{l_{k+1}} [g(b_j) - g(a_j)], \]
and hence if $x_j \in I_j$ and $x_{1k} \in I_{1k}$ we have
\[ f(x_{1k})[g(b_{1k}) - g(a_{1k})] - \sum_{j=l_k+1}^{l_{k+1}} f(x_j)[g(b_j) - g(a_j)] = \sum_{j=l_k+1}^{l_{k+1}} [f(x_{1k}) - f(x_j)][g(b_j) - g(a_j)]. \]
If we sum both sides of this over $k$ we get
\[ S_{f,g}(\Delta_1, \{x_{1k}\}) - S_{f,g}(\Delta, \{x_k\}) = \sum_{k=1}^{n_1} \sum_{j=l_k+1}^{l_{k+1}} [f(x_{1k}) - f(x_j)][g(b_j) - g(a_j)]. \tag{5.5.1} \]
We get a similar formula for $S_{f,g}(\Delta_2, \{x_{2k}\})$.

If $f$ is continuous on $[a, b]$, it is uniformly continuous and hence $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $|x - y| < \delta$ and $x, y \in [a, b]$ implies $|f(x) - f(y)| < \varepsilon$. Consequently, if $\Delta_\varepsilon$ is a decomposition with $|\Delta_\varepsilon| < \delta$, and $\Delta_1 \succ \Delta_\varepsilon$ and $\Delta_2 \succ \Delta_\varepsilon$ it follows that $\Delta \succ \Delta_\varepsilon$ and $|\Delta| < \delta$. Consequently, the absolute value of the left side of (5.5.1) is at most
\[ \varepsilon \sum_{j=1}^{n} |g(b_j) - g(a_j)|. \]
If we suppose, for the moment, that $\exists M$ so that for any decomposition
\[ \sum_{j=1}^{n} |g(b_j) - g(a_j)| \le M, \]
then we have
\[ |S_{f,g}(\Delta_1, \{x_{1k}\}) - S_{f,g}(\Delta_2, \{x_{2k}\})| \le 2M\varepsilon. \tag{5.5.2} \]
That is, $S_{f,g}$ is Cauchy. It follows from Theorem 5.4.4 and (5.5.2) that the Riemann-Stieltjes integral of $f$ with respect to $g$ exists.
We are now ready to write down a formal definition.

5.5.1 Definition. A function $g$ defined on the interval $[a, b]$ is said to be of bounded variation $\Leftrightarrow$ $\exists M$ such that for every decomposition $\Delta = \{[a_k, b_k]: k \in (1, n)\}$ of $[a, b]$,
\[ \sum_{k=1}^{n} |g(b_k) - g(a_k)| \le M. \]

We have actually proved the following theorem.

5.5.2 Theorem. If $f$ and $g$ are defined on the same interval, $f$ is continuous and $g$ is of bounded variation, then the Riemann-Stieltjes integral of $f$ with respect to $g$ exists.
Note that this theorem when used in conjunction with Theorem

5.4.8 on integration by parts contains Theorem 5.2.10 as a special case.


Note also that a slight adjustment in the proof of Theorem 5.5.2 will
lead to a theorem that contains Theorem 5.2.11 as a special case (see
Exercise 5 at the end of this section).
It is a simple consequence of the triangle inequality that the sum of
two functions of bounded variation is again a function of bounded varia
tion. Since it is clear that a monotone function on a finite interval is of
bounded variation, it follows that the sum of any two monotone func
tions on a finite interval is a function of bounded variation. Indeed,
the last statement is a description of any function of bounded variation.

5.5.3 Theorem. A function $g$ with domain $[a, b]$ is of bounded variation if and only if it is the difference of two monotone nondecreasing functions. Further, there are unique monotone nondecreasing functions $g^+$ and $g^-$ so that $g^+(a) = g^-(a) = 0$,
\[ g(x) - g(a) = g^+(x) - g^-(x), \tag{5.5.3} \]
and so that for every pair of monotone nondecreasing functions $f^+$ and $f^-$ for which
\[ g(x) - g(a) = f^+(x) - f^-(x), \]
we have
\[ g^+(x) \le f^+(x) - f^+(a), \qquad g^-(x) \le f^-(x) - f^-(a). \tag{5.5.4} \]

Proof. We shall prove only the necessity of the first statement, since the sufficiency is obvious. If $x \in [a, b]$, let $\Delta(x)$ be any decomposition of $[a, x]$ and $\mathcal{A}(x)$ the collection of all such decompositions. Set
\[ S_g^+(\Delta(x)) = \frac{1}{2} \sum_{k=1}^{n} \{|g(b_k) - g(a_k)| + [g(b_k) - g(a_k)]\}, \]
\[ S_g^-(\Delta(x)) = \frac{1}{2} \sum_{k=1}^{n} \{|g(b_k) - g(a_k)| - [g(b_k) - g(a_k)]\}, \]
where $[a_k, b_k] \in \Delta(x)$. It is clear that
\[ S_g^+(\Delta(x)) - S_g^-(\Delta(x)) = g(x) - g(a). \tag{5.5.5} \]
Since $g$ is of bounded variation, the sets
\[ \{S_g^+(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\}, \qquad \{S_g^-(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\} \]
are bounded and hence
\[ g^+(x) = \sup\{S_g^+(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\}, \qquad g^-(x) = \sup\{S_g^-(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\}, \]
are well defined. The functions $g^+$ and $g^-$ are monotone nondecreasing functions. We shall show this for $g^+$, the proof for $g^-$ being similar. We first note that every term in the sum $S_g^+(\Delta(x))$ is nonnegative. This means that if $\Delta(x) \subset \Delta(y)$, then
\[ S_g^+(\Delta(x)) \le S_g^+(\Delta(y)). \tag{5.5.6} \]
Now, if $x < y$, then $\forall \Delta(x)$, $\exists \Delta(y)$ such that $\Delta(x) \subset \Delta(y)$. Indeed, if $\Delta(x, y)$ is any decomposition of $[x, y]$, then $\Delta(x) \subset \Delta(x) \cup \Delta(x, y) = \Delta(y)$. Consequently, it follows from this fact and (5.5.6) that
\[ g^+(x) = \sup\{S_g^+(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\} \le g^+(y), \]
which shows that $g^+$ is monotone nondecreasing.

Let us now prove (5.5.3); that is, $g(x) - g(a) = g^+(x) - g^-(x)$. To show this we first note that
\[ S_g^{\pm}(\Delta(x)) = \frac{1}{2} S_g(\Delta(x)) \pm \frac{1}{2}[g(x) - g(a)], \]
where we have set
\[ S_g(\Delta(x)) = S_g^+(\Delta(x)) + S_g^-(\Delta(x)) = \sum_{k=1}^{n} |g(b_k) - g(a_k)|. \]
Hence
\[ g^{\pm}(x) = \sup\{S_g^{\pm}(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\} = \frac{1}{2} \sup\{S_g(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\} \pm \frac{1}{2}[g(x) - g(a)]. \]
If we subtract $g^-(x)$ from $g^+(x)$ and use the above equality, we get (5.5.3).

To complete the proof, let us first remark that if $A \ge 0$ and $B \ge 0$, we always have
\[ \frac{1}{2}\{|A - B| + (A - B)\} \le A. \]
Now, if $g(x) - g(a) = f^+(x) - f^-(x)$, where $f^+$ and $f^-$ are nondecreasing, then if we apply the previous remark with $A = f^+(b_k) - f^+(a_k)$ and $B = f^-(b_k) - f^-(a_k)$, we get
\[ \frac{1}{2}\{|g(b_k) - g(a_k)| + [g(b_k) - g(a_k)]\} \le f^+(b_k) - f^+(a_k). \]
Thus
\[ S_g^+(\Delta(x)) \le \sum_{k=1}^{n} [f^+(b_k) - f^+(a_k)] = f^+(x) - f^+(a). \]
It follows that
\[ g^+(x) \le f^+(x) - f^+(a). \]
Now, $f^+(a) = f^-(a)$, and
\[ f^-(x) - g^-(x) = f^+(x) - g^+(x). \]
Thus
\[ f^-(x) - f^-(a) - g^-(x) = f^+(x) - f^+(a) - g^+(x) \ge 0, \]
which shows that $g^-(x) \le f^-(x) - f^-(a)$. The uniqueness of the decomposition of $g - g(a)$ into functions having the properties of $g^+$ and $g^-$ is an immediate consequence of the normalization conditions at $a$ and the minimum conditions (5.5.4). This completes the proof of the theorem.


The decomposition of a function of bounded variation into the difference of two monotone nondecreasing functions is clearly not unique. The unique pair of functions $g^+$ and $g^-$ described in the last theorem is called the canonical decomposition of $g - g(a)$.

Since every $g$ of bounded variation can be decomposed into the difference of two monotone nondecreasing functions, it follows that $\forall x \in [a, b]$, $g(x+)$ and $g(x-)$ exist. We are using the same convention here as in Section 2.3: $g(a-) = g(a)$ and $g(b+) = g(b)$.

5.5.4 Theorem. If $g$ is defined on $[a, b]$ and is of bounded variation, and if the pair $g^+$ and $g^-$ is the canonical decomposition of $g - g(a)$, then $\forall x \in [a, b]$,
\[ g^{\pm}(x+) - g^{\pm}(x) = \frac{1}{2}\{|g(x+) - g(x)| \pm [g(x+) - g(x)]\}, \]
\[ g^{\pm}(x) - g^{\pm}(x-) = \frac{1}{2}\{|g(x) - g(x-)| \pm [g(x) - g(x-)]\}. \tag{5.5.7} \]
In particular, $g$ is continuous at $x$ $\Leftrightarrow$ $g^+$ and $g^-$ are continuous at $x$.

Proof. For every $\varepsilon > 0$ there exists a decomposition $\Delta_\varepsilon$ of $[a, b]$ so that $g^+(b) < S_g^+(\Delta_\varepsilon) + \varepsilon$. If $\Delta \succ \Delta_\varepsilon$, then $S_g^+(\Delta_\varepsilon) \le S_g^+(\Delta)$, so that $g^+(b) < S_g^+(\Delta) + \varepsilon$.

Let $x \in\, ]a, b[$, $\Delta \succ \Delta_\varepsilon$, so that $[x, y] \in \Delta$. Further, let $\Delta(x) = \{I: I \in \Delta \ \&\ I \subset [a, x]\}$, $\Delta(x, y) = \{[x, y]\}$, and $\Delta(y, b) = \{I: I \in \Delta \ \&\ I \subset [y, b]\}$. Then it is clear that
\[ S_g^+(\Delta) = S_g^+(\Delta(x)) + S_g^+(\Delta(x, y)) + S_g^+(\Delta(y, b)), \]
and hence
\[ 0 \le g^+(b) - S_g^+(\Delta) = [g^+(x) - S_g^+(\Delta(x))] + [g^+(y) - g^+(x) - S_g^+(\Delta(x, y))] + [g^+(b) - g^+(y) - S_g^+(\Delta(y, b))] < \varepsilon. \tag{5.5.8} \]
Now, by the method used in the last paragraph of the proof of Theorem 5.5.3, we can easily establish that all the terms in the sum on the right in (5.5.8) are nonnegative. Thus we get
\[ 0 \le g^+(y) - g^+(x) - S_g^+(\Delta(x, y)) < \varepsilon, \]
or, what is the same thing,
\[ 0 \le g^+(y) - g^+(x) - \frac{1}{2}\{|g(y) - g(x)| + [g(y) - g(x)]\} < \varepsilon. \]
If now we allow $y \to x$ we get
\[ g^+(x+) - g^+(x) = \frac{1}{2}\{|g(x+) - g(x)| + [g(x+) - g(x)]\}. \]
The proofs of the other formulas in (5.5.7) follow in a similar way. The proof of the statement about continuity is almost obvious and we leave it for the reader.

5.5.5 Definition. If $g$ is of bounded variation and $g^+$ and $g^-$ form the canonical decomposition of $g - g(a)$, then
\[ v_g = g^+ + g^- \tag{5.5.9} \]
is called the variation of $g$ and $v(g) = v_g(b)$ is called the total variation of $g$.

The next proposition gives another way of computing the variation of a function.

5.5.6 Proposition.
\[ v_g(x) = \sup\{S_g(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\}, \tag{5.5.10} \]
where
\[ S_g(\Delta(x)) = S_g^+(\Delta(x)) + S_g^-(\Delta(x)) = \sum_{k=1}^{n} |g(b_k) - g(a_k)|. \]
Moreover,
\[ v_g(x+) - v_g(x) = |g(x+) - g(x)|, \qquad v_g(x) - v_g(x-) = |g(x) - g(x-)|. \tag{5.5.11} \]

Proof. As we have already noted in the proof of Theorem 5.5.3,
\[ g^{\pm}(x) = \frac{1}{2} \sup\{S_g(\Delta(x)): \Delta(x) \in \mathcal{A}(x)\} \pm \frac{1}{2}[g(x) - g(a)]. \]
Thus, adding $g^+$ and $g^-$, we get the formula (5.5.10). The formulas in (5.5.11) follow immediately from the formulas (5.5.7).
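The sums $S_g^+$, $S_g^-$, and $S_g$ that appear in the canonical decomposition and in Proposition 5.5.6 are easy to evaluate for a specific decomposition. The sketch below is an illustration only; the function name `variation_sums`, the choice $g = \sin$ on $[0, 2\pi]$, and the uniform decomposition are assumptions made here. Because $\sin$ is monotone between the decomposition points $0, \pi/2, 3\pi/2, 2\pi$, the sum $S_g$ over any decomposition containing these points equals the total variation, which is 4.

```python
import math


def variation_sums(g, points):
    """Return (S_g^+, S_g^-, S_g) for the decomposition whose end points are
    points[0] < points[1] < ... < points[n]."""
    plus = 0.0
    minus = 0.0
    for a, b in zip(points, points[1:]):
        d = g(b) - g(a)
        plus += 0.5 * (abs(d) + d)    # positive part of the increment
        minus += 0.5 * (abs(d) - d)   # negative part of the increment
    return plus, minus, plus + minus


if __name__ == "__main__":
    pts = [2.0 * math.pi * k / 400 for k in range(401)]
    plus, minus, total = variation_sums(math.sin, pts)
    # plus - minus reproduces g(x) - g(a), as in (5.5.5); total approximates v(g).
    print(plus, minus, total)
```

For a function that is not piecewise monotone the sums only increase toward the supremum in (5.5.10) under refinement, so a single decomposition gives a lower bound for $v_g$ rather than its value.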
As we have already noted, a function $g$ of bounded variation can have only jump discontinuities. From Theorem 5.5.4 there are only a countable number of jump discontinuities and moreover from Proposition 5.5.6,
\[ |g(x+) - g(x)| + |g(x) - g(x-)| = v_g(x+) - v_g(x-). \]
Let $J$ be the set of jump discontinuities of $g$ and $\{x_k: k \in (1, n)\}$ any finite set in $J$. Suppose these are labeled so that $x_1 < x_2 < \cdots < x_n$. Then
\[ \sum_{k=1}^{n} [\,|g(x_k+) - g(x_k)| + |g(x_k) - g(x_k-)|\,] = \sum_{k=1}^{n} [v_g(x_k+) - v_g(x_k-)] \]
\[ \le \sum_{k=1}^{n-1} [v_g(x_k+) - v_g(x_k-) + v_g(x_{k+1}-) - v_g(x_k+)] + v_g(x_n+) - v_g(x_n-) \]
\[ \le v_g(x_n+) - v_g(x_1-) \le v_g(b). \]
The last three inequalities follow from the fact that $v_g$ is a monotone nondecreasing function. Now, if $J$ is denumerable, let $(x_k)$ be a sequence with range $J$. The last inequality shows that
\[ \sum_{k=0}^{\infty} |g(x_k+) - g(x_k-)| \le v(g), \]
and thus the series on the left is absolutely convergent and consequently independent of the order of summation of the terms. Therefore, if $J$ is finite or denumerable, the number
\[ \sum_{x \in J} [g(x+) - g(x-)] \]
has a meaning and value independent of how the elements of $J$ are labeled.
Let us set $J_x = J \cap [a, x[$, and $\forall x \in [a, b]$,
\[ g_s(x) = \sum_{y \in J_x} [g(y+) - g(y-)] + g(x) - g(x-). \]
We are using the convention that $g(a-) = g(a)$, so that $g_s(a) = 0$. The function $g_s$ is called the saltus function for $g$. The proofs of the following facts about the saltus function are easily established and we have left the proofs for an exercise at the end.

(a)
\[ g_s(x+) - g_s(x) = g(x+) - g(x), \qquad g_s(x) - g_s(x-) = g(x) - g(x-), \qquad \forall x \in [a, b]. \]

(b)

If g+, g- is the canonical decomposition of g-g(a), then


g+ s< g+,

and

The function g8is of bounded variation and

(c)

Vx E [a,b ].
If we set
\[ g_c = g - g_s \]
it is clear from condition (a) that $g_c$ is a continuous function and moreover from condition (c) it follows that $g_c$ is of bounded variation.

It is interesting to compute the Riemann-Stieltjes integral of a continuous function $f$ with respect to the saltus function $g_s$. We know that this integral exists by virtue of Theorem 5.5.2. By the proof of the latter theorem $\forall \varepsilon > 0$, $\exists \eta > 0$ so that $0 < \delta < \eta$ and $|\sigma| < \delta$ $\Rightarrow$
$$\Bigl|\, \sum_{k} f(t_k)[g_s(b_k) - g_s(a_k)] - \int_a^b f(x)\, dg_s(x) \Bigr| < \varepsilon/2.$$

If $a_k \ne a$ and $b_k \ne b$, then clearly there is no loss in generality in taking $a_k$ and $b_k$ as points of continuity of $g_s$. In the latter case we get from the definition of $g_s$ that
$$g_s(b_k) - g_s(a_k) = \sum_{x \in J_k} [g(x+) - g(x-)],$$
where $J_k = J \cap [a_k, b_k[$. Indeed, the last formula is also valid if $a_k = a$. In case $b_k = b_n = b$, we must add the term $g(b) - g(b-)$ to the sum on the right.


Let us suppose that $\delta$ is small enough so that $x, y \in [a_k, b_k] \Rightarrow |f(x) - f(y)| < \varepsilon/2M$, where $M$ is a fixed number that satisfies the inequality
$$\sum_{x \in J} |g(x+) - g(x-)| < M.$$
Thus if $b_k \ne b$, we get
$$f(t_k)[g_s(b_k) - g_s(a_k)] = \sum_{x \in J_k} f(x)[g(x+) - g(x-)] + \varepsilon_k$$
and, if $k = n$, we must add the term $f(b)[g(b) - g(b-)]$ on the right, where
$$\sum_{k=1}^{n} |\varepsilon_k| < \varepsilon/2.$$
Hence, summing over $k$, we see that
$$\Bigl|\, \sum_{x \in J} f(x)[g(x+) - g(x-)] - \int_a^b f(x)\, dg_s(x) \Bigr| < \varepsilon.$$
We have proved the following result.

5.5.7 Theorem. If $f$ is continuous and $g$ is of bounded variation on $[a, b]$, then taking $g(a-) = g(a)$, $g(b+) = g(b)$ we get
$$\int_a^b f(x)\, dg(x) = \sum_{x \in J} f(x)[g(x+) - g(x-)] + \int_a^b f(x)\, dg_c(x),$$
where $J$ is the set of discontinuities of $g$, $g_c(x) = g(x) - g_s(x)$, and $g_s$ is the saltus function for $g$.


The last theorem shows, for example, that the sum of an absolutely convergent infinite series may be written as a Riemann-Stieltjes integral. Indeed, suppose that $\sum_{k \ge 0} a_k$ is an absolutely convergent series. Set $g(0) = 0$ and for $x \in\, ]0, 1]$,
$$g(x) = a_0 + \sum_{kx \ge 1} a_k.$$
It is clear that $\forall k \in N$ with $k \ge 1$,
$$g((1/k)+) - g((1/k)-) = a_k, \qquad g(0+) - g(0) = a_0,$$
and these are the only discontinuities of $g$. Since the series is absolutely convergent, $g$ is of bounded variation. The function $g$ is a saltus function and indeed $g_s = g$. Further,
$$\sum_{k=0}^{\infty} a_k = \int_0^1 dg(x).$$
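The construction above can be checked numerically on a truncated series. The following is only a sketch under stated assumptions: the series is cut off after finitely many terms, the names `make_g`, `a`, and `eps` are illustrative and not from the text, and the integral of the constant function 1 with respect to $g$ is taken to be $g(1) - g(0)$.

```python
from fractions import Fraction

def make_g(a):
    """Step function g on [0, 1] built from the finitely many terms a[0..N].

    g(0) = 0 and, for 0 < x <= 1, g(x) = a[0] + sum of a[k] over k >= 1 with k*x >= 1.
    """
    def g(x):
        if x == 0:
            return Fraction(0)
        return a[0] + sum(a[k] for k in range(1, len(a)) if k * x >= 1)
    return g

# Hypothetical example terms: a_k = (-1)^k / 2^k, truncated after k = 7.
a = [Fraction((-1) ** k, 2 ** k) for k in range(8)]
g = make_g(a)

# A small exact offset used to sample g just to the left of 1/k.
eps = Fraction(1, 10 ** 6)
jumps = [g(Fraction(1, k)) - g(Fraction(1, k) - eps) for k in range(1, len(a))]

# The jump at 1/k is a_k, the jump at 0 is a_0, and the total rise is the sum.
total_rise = g(Fraction(1)) - g(Fraction(0))
```

Exact rational arithmetic is used so that the comparison of jumps with the terms $a_k$ does not depend on floating-point rounding.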


□ Exercises

1. If $f$ and $g$ are functions of bounded variation on $[a, b]$, show that $f + g$ and $fg$ are functions of bounded variation. If $|g| \ge m > 0$, show that $1/g$ is also of bounded variation.

2. Suppose that $g$ is of bounded variation on $[a, b]$ and $c \in [a, b]$. If $g_1 = g\,|\,[a, c]$, $g_2 = g\,|\,[c, b]$, show that $g_1$ and $g_2$ are of bounded variation and
$$v(g) = v(g_1) + v(g_2).$$

3. Suppose $g$ is defined on $[a, b]$, is of bounded variation, and $g^+$, $g^-$ is the canonical decomposition for $g - g(a)$. If $g - g(a) = f^+ - f^-$, where $f^+$ and $f^-$ are monotone nondecreasing, show that $h = f^+ - g^+ = f^- - g^-$ is a monotone nondecreasing function.

4. Let $g$ be that function defined on $[0, 1]$ as follows:
$$g(x) = x \sin(\pi/x) \ \text{ for } x \ne 0, \qquad g(x) = 0 \ \text{ for } x = 0.$$
Decide whether or not $g$ is of bounded variation.

5. A function $f$ defined on $[a, b]$ is said to be piecewise continuous $\Leftrightarrow$ it is continuous except at a finite number of points $\{a_k : k \in (1, n)\}$ and $\forall k \in (1, n)$ $f(a_k+)$ and $f(a_k-)$ exist. If $f$ is piecewise continuous and $g$ is continuous and of bounded variation on $[a, b]$, show that the Riemann-Stieltjes integral of $f$ with respect to $g$ exists. Can the hypotheses on $f$ be weakened further?

6. Let the function $f$ be as in Exercise 5. Weaken the continuity requirements on $g$ in that exercise so that the Riemann-Stieltjes integral of $f$ with respect to $g$ exists. Can the hypotheses on $f$ be weakened also?

7. If $f$ is defined and differentiable on $[a, b]$ and $f'$ is bounded, show that $f$ is of bounded variation.

8. Suppose $f$ is defined on $[a, b]$ and $\exists M > 0$ so that $\forall x, y \in [a, b]$,
$$|f(x) - f(y)| \le M |x - y|^{1/2}.$$
Is $f$ necessarily of bounded variation?

9. Suppose $f$ is defined on $[a, b]$ and $\exists M > 0$ so that $\forall x, y \in [a, b]$,
$$|f(x) - f(y)| \le M |x - y|.$$
If $g$ is any continuous function, show that the Riemann-Stieltjes integral of $f$ with respect to $g$ exists.


10. Suppose $f$ is continuous and $g$ is of bounded variation on $[a, b]$. Show that
$$\Bigl| \int_a^b f(x)\, dg(x) \Bigr| \le \int_a^b |f(x)|\, |dg(x)|,$$
where $|dg(x)|$ is another symbol for $dv_g(x)$.

CHAPTER 6

HIGHER-DIMENSIONAL SPACE

In the previous chapters we have developed the essentials of the calculus for real-valued functions with domains in the real number system. The topics we have developed include (1) the real number system, (2) sequences and series, (3) real-valued continuous functions, (4) differentiable functions, and (5) integration theory. The object of the remaining chapters will be to extend some of these theories to higher-dimensional situations. Some results extend in a straightforward manner. On the other hand, there are other results that are very simple and easy to prove in the one-dimensional case which become rather complicated and far-reaching theories in the higher-dimensional case. For example, the simple fact that a differentiable monotone function has an inverse that is also a differentiable monotone function generalizes to the considerably more difficult inverse function theorem. The rather elementary formula for the integration of a derivative becomes a much more complicated theorem that requires considerably more algebraic and analytic machinery for its proof. This theorem is generally called Stokes' theorem, but the names of Gauss and Green may also be associated with it.

In this chapter we shall lay the basic foundation for the subsequent chapters. We shall discuss real vector spaces with a certain distance function acting on pairs of points and shall also discuss general properties of continuous functions as well as special continuous functions called linear transformations.

6.1 REAL VECTOR SPACES

We shall begin our discussion with the n-fold Cartesian product $R^n$ of the real numbers. The set $R^n$ shall be defined as the set of all n-tuples $(x^1, \ldots, x^n)$, where $x^k \in R$ for $k \in (1, n)$. A much more formal definition of $R^n$ can be given as the collection of all functions with domain the set $(1, n)$ and range in $R$. Indeed, a little reflection will convince the reader that one reasonable way to define an n-tuple is as a function with domain $(1, n)$ and range in $R$. We shall continue to use the very suggestive notation that we have indicated above.

We shall now introduce two functions $+$ and $\cdot$, the first one having domain $R^n \times R^n$ and range $R^n$ and the second one having domain $R \times R^n$ and range $R^n$. These functions are defined by the equalities
$$(x^1, \ldots, x^n) + (y^1, \ldots, y^n) = (x^1 + y^1, \ldots, x^n + y^n),$$
$$\alpha \cdot (x^1, \ldots, x^n) = (\alpha x^1, \ldots, \alpha x^n).$$

The triple $(R^n, +, \cdot)$ is a prototype example of what we shall call a real n-dimensional vector space and we shall designate it by $V^n$. By an abuse of language we shall write $x \in V^n$ rather than $x \in R^n$ when we want to emphasize that we are working with a vector space, and shall speak of the elements of $V^n$ rather than the elements of $R^n$. The elements of $V^n$ shall be called points or vectors and we shall designate them by letters without superscripts; for example,
$$x = (x^1, \ldots, x^n).$$
However, in two and three dimensions we shall often revert to the standard notations $(x, y)$ and $(x, y, z)$. As is usual, when we "multiply" a vector by a scalar (an element of $R$) we shall drop the dot. The numbers $x^k$ will be called the components of the vector $x$. The zero vector, $0$, is that one for which every component is the zero element of $R$. We shall also set $-x = (-1)x$.
6.1.1 Definition. A finite set $\{x_k : k \in (1, m)\} \subset V^n$, where if $j \ne k$, then $x_j \ne x_k$, is said to be linearly independent $\Leftrightarrow$ for every finite set $\{\alpha_k : k \in (1, m)\} \subset R$:
$$\sum_{k=1}^{m} \alpha_k x_k = 0 \Rightarrow \alpha_k = 0, \quad \forall k \in (1, m).$$
A set of vectors in $V^n$ that is not linearly independent is called linearly dependent.
REMARKS: In using the notation $\{x_k : k \in (1, m)\}$ we are already specifying a function $\Phi$ with domain $(1, m)$ and range in $V^n$ by means of the equality $\Phi(k) = x_k$. The set $\{x_k : k \in (1, m)\}$ is the range of $\Phi$, and if this set contains more than one element there is more than one function with domain $(1, m)$ and with range $\{x_k : k \in (1, m)\}$. In some instances, for example when we talk about matrices, it is important to know exactly which function we are using. If this is the case, we shall call the function an ordered m-tuple of vectors and denote it by $(x_1, \ldots, x_m)$. Ordinarily it is only the range of the function that is important. For example, the first sentence in Definition 6.1.1 should perhaps more properly be stated as follows:

A finite nonvoid set $A \subset V^n$ is said to be linearly independent $\Leftrightarrow$ for every function $\alpha$ with domain $A$ and range in $R$ we have
$$\sum_{x \in A} \alpha(x)x = 0 \Rightarrow \alpha(x) = 0, \quad \forall x \in A.$$


Suppose $A$ has $m$ elements and $\Phi$ is any one-to-one function with domain $(1, m)$ and range $A$. If we put $x_k = \Phi(k)$ and $\alpha_k = \alpha(x_k)$, then (referring to the introduction to Chapter 3) we get
$$\sum_{x \in A} \alpha(x)x = \sum_{k=1}^{m} \alpha(\Phi(k))\Phi(k) = \sum_{k=1}^{m} \alpha_k x_k.$$
Hence, we have shown that Definition 6.1.1 is independent of $\Phi$.

In using the notation $\{x_k : k \in (1, m)\}$ we usually mean that if $j \ne k$, then $x_j \ne x_k$, although this is not generally the case when we use the notation $(x_1, \ldots, x_m)$.

6.1.2 Definition. A nonempty subset $L \subset V^n$ is said to be a linear subspace of $V^n$ or a linear manifold $\Leftrightarrow$ $\forall x, y \in L$ and $\forall \alpha, \beta \in R$, $\alpha x + \beta y \in L$.

6.1.3 Definition. A vector $x \in V^n$ is said to be a linear combination of the vectors in the set $\{x_k : k \in (1, m)\} \subset V^n$ $\Leftrightarrow$ there exist numbers $\alpha_1, \ldots, \alpha_m \in R$ so that
$$x = \sum_{k=1}^{m} \alpha_k x_k.$$

The last definition should perhaps more properly be stated as follows:

A vector $x \in V^n$ is said to be a linear combination of the vectors in the finite set $A \subset V^n$ $\Leftrightarrow$ there exists a function $\alpha$ with domain $A$ and range in $R$ so that
$$x = \sum_{y \in A} \alpha(y)y.$$

As in the discussion of Definition 6.1.1, the sum on the right is quite independent of any function $\Phi$ with domain $(1, m)$ and range $A$. Thus Definition 6.1.3 really makes sense quite independent of which function from $(1, m)$ onto $\{x_k : k \in (1, m)\}$ we are using to define the sum. The terminology and notations we have adopted in the formal definitions 6.1.1 and 6.1.3 are more classical and standard than those we have used in the rewritten versions of these definitions. Hence, if the reader will keep in mind what is involved, there seems to be no real reason to change from the classical terminology and notations, and we shall use them in the future without further comment.

6.1.4 Definition. If $L$ is a linear subspace of $V^n$, then a set $\{x_k : k \in (1, m)\} \subset L$ is said to generate $L$ $\Leftrightarrow$ every $x \in L$ is a linear combination of the vectors in $\{x_k : k \in (1, m)\}$. A set of vectors that generates $L$ is called a basis for $L$ $\Leftrightarrow$ the set is linearly independent.

It is clear that every finite set $\{x_k : k \in (1, n)\} \subset V^n$ generates a linear subspace of $V^n$, namely, the set of linear combinations of the vectors of the given set.


As an example of a basis for $V^n$ consider the vectors
$$e_k = (e_k^1, \ldots, e_k^n), \quad k \in (1, n), \tag{6.1.1}$$
where $e_k^j = 0$ if $j \ne k$ and $e_k^k = 1$. Of course, a vector space may have many different bases, and for example a basis of $V^3$ is given by the three vectors
$$x_1 = (1, 0, 0), \qquad x_2 = (1, 1, 0), \qquad x_3 = (1, 1, 1).$$
We shall leave the verification to the reader.
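A verification of this kind can also be carried out mechanically. The sketch below is one possible approach, not the method of the text: it row-reduces the vectors with exact rational arithmetic and uses the fact that $n$ vectors form a basis for $V^n$ exactly when their rank is $n$. The function names `rank` and `is_basis` are illustrative assumptions.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gauss-Jordan elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def is_basis(vectors, n):
    """True when the vectors are n linearly independent elements of V^n."""
    return len(vectors) == n and rank(vectors) == n

example = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
example_is_basis = is_basis(example, 3)
```

The same helper applies to any candidate set, for instance the vectors $e_k$ of (6.1.1).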

6.1.5 Theorem. If $\{y_k : k \in (1, s)\}$ is a linearly independent set in the linear subspace of $V^n$ generated by the set $\{x_k : k \in (1, r)\} \subset V^n$, then $s \le r$.

Proof. Since $y_1 \ne 0$, we may write
$$y_1 = \sum_{k=1}^{r} \alpha_{1k} x_k,$$
where not all the $\alpha_{1k}$ are zero. Let $k_1$ be an integer so that $\alpha_{1k_1} \ne 0$. Then $x_{k_1}$ is a linear combination of $\{y_1\} \cup \{x_k : k \in (1, r) \ \&\ k \ne k_1\}$. Since $y_2$ is a linear combination of the $x_k$ we find
$$y_2 = \beta y_1 + \sum_{k \ne k_1} \alpha_{2k} x_k.$$
Not all the numbers $\alpha_{2k}$ are zero, for otherwise $y_1$ and $y_2$ would be linearly dependent. Let $k_2$ be an integer so that $\alpha_{2k_2} \ne 0$. Then $x_{k_2}$ is a linear combination of $\{y_1, y_2\} \cup \{x_k : k \in (1, r) \ \&\ k \ne k_1, k_2\}$. If $r < s$, then proceeding in this way we see that the set $\{x_k : k \in (1, r)\}$ is contained in the linear subspace generated by the set $\{y_k : k \in (1, r)\}$. Hence the latter set generates the same linear subspace generated by $\{x_k : k \in (1, r)\}$, and thus
$$y_{r+1} = \sum_{k=1}^{r} \beta_k y_k.$$
This contradicts the linear independence.
The formal way to do this is as follows. For $j \in (1, s)$ let $P(j)$ be the statement: There exist distinct integers $k_1, \ldots, k_j$ in $(1, r)$ so that each vector in the set $\{x_{k_i} : i \in (1, j)\}$ is a linear combination of the vectors in the set
$$\{y_i : i \in (1, j)\} \cup \{x_k : k \in (1, r) \ \&\ k \ne k_i, \ i \in (1, j)\}.$$
For $j > s$, let $P(j)$ be any true statement, for example, $1 = 1$. Now, we have shown $P(1)$ is true. Further, suppose that $j < s$ and $P(j)$ is true. The set $\{x_k : k \in (1, r) \ \&\ k \ne k_i, \ i \in (1, j)\}$ is nonvoid, since otherwise $y_{j+1}$ is a linear combination of the elements in the set $\{y_i : i \in (1, j)\}$, which is impossible. Now, the process we have carried out for $y_2$ in the previous paragraph can be carried out for $y_{j+1}$, and we see that $P(j + 1)$ is true. If $j \ge s$, $P(j) \Rightarrow P(j + 1)$ automatically. Hence we have the statement $(j)(P(j))$, and taking $j = s$ we see that $s \le r$.
wise YJ+i is a linear combination of the elements in the set

6.1.6 Corollary. If $L$ is a linear subspace of $V^n$, then any two bases for $L$ have the same number of elements.
6.1.7 Theorem. If $L$ is a linear subspace of $V^n$, then every linearly independent set in $L$ can be extended to a basis for $L$.

Proof. Suppose $\{x_k : k \in (1, r)\} \subset L$ is the given linearly independent set. Since $L \subset V^n$ and $V^n$ is generated by $n$ elements, it follows from Theorem 6.1.5 that any linearly independent set contained in $L$ and containing $\{x_k : k \in (1, r)\}$ has at most $n$ elements. Hence, among the collection of such linearly independent sets, there is one with a maximal number of elements, say $m$. If this set is $\{x_k : k \in (1, m)\}$ and $L_1$ is the linear space generated by this set, then $L_1 = L$. Otherwise, $L - L_1 \ne \emptyset$ and there is an $x_{m+1} \in L - L_1$, so that $\{x_k : k \in (1, m + 1)\}$ is linearly independent and contains $\{x_k : k \in (1, r)\}$. But this is a contradiction.

6.1.8 Corollary. If $L$ is a linear subspace of $V^n$, $L \ne \{0\}$, then $L$ has a basis.

Proof. Since $L \ne \{0\}$, there is a nonzero element in $L$. The set formed from this one element is linearly independent and hence can be extended to a basis for $L$.

Corollaries 6.1.6 and 6.1.8 show that the following definition makes sense.

6.1.9 Definition. If $L$ is a linear subspace of $V^n$ and $L \ne \{0\}$, the number of elements in any basis for $L$ is called the dimension of $L$. If $L = \{0\}$, its dimension is said to be zero.

6.1.10 Corollary. If $L$ is a linear subspace of $V^n$ of dimension $m$, then any linearly independent set contained in $L$ and having $m$ elements is a basis for $L$.

Proof. If $\{y_k : k \in (1, m)\} \subset L$ is a linearly independent set, by Theorem 6.1.7 we can extend it to a basis for $L$, which must have $m$ elements. Hence the given set is a basis.


6.1.11 Corollary. If $L$ is an m-dimensional linear subspace of $V^n$, then any set contained in $L$ with $m + 1$ elements is linearly dependent.

Proof. Otherwise, $L$ would have dimension larger than $m$.

As we had mentioned before, the space $V^n$ is a prototype example of a real vector space. There are many other sets which arise quite naturally in mathematics that have enough of the properties of $V^n$ that all the theorems and corollaries we have given above will be valid for these other sets. For this reason we abstract the essential properties of $V^n$ and give the following definition.
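6.1.12 Definition. A real vector space (or real linear space) is a triple consisting of a set $V$, a function $+$ with domain $V \times V$ and range $V$, and a function $\cdot$ with domain $R \times V$ and range $V$ which satisfy the following conditions:

(a) $x + y = y + x$, $\forall x, y \in V$.
(b) $x + (y + z) = (x + y) + z$, $\forall x, y, z \in V$.
(c) There is a vector $0 \in V$ so that $\forall x \in V$, $x + 0 = x$.
(d) $(\alpha\beta) \cdot x = \alpha \cdot (\beta \cdot x)$, $\forall \alpha, \beta \in R$ & $\forall x \in V$.
(e) $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y$, $\forall \alpha \in R$ & $\forall x, y \in V$.
(f) $(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$, $\forall \alpha, \beta \in R$ & $\forall x \in V$.
(g) $1 \cdot x = x$, $0 \cdot x = 0$, $\forall x \in V$.

A real vector space is called finite-dimensional $\Leftrightarrow$ it is generated by a finite set of its elements.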

We shall usually abuse the language and designate a real vector space by '$V$' instead of the triple '$(V, +, \cdot)$.' Also we shall follow the usual custom and drop the dot when we "multiply" an element of $R$ by an element of $V$; that is, we shall write '$\alpha x$' instead of '$\alpha \cdot x$.' We shall also define $-x$ to be the vector $(-1)x$.

All the definitions, theorems, and corollaries that we have given for the vector space $V^n$ will carry over mutatis mutandis to a finite-dimensional vector space $V$. We shall leave the verification of this as an exercise.

A simple example of a finite-dimensional vector space is the space $R_n[x]$ consisting of all polynomial functions of degree at most $n$, that is, all functions given by
$$p(x) = \sum_{k=0}^{n} a_k x^k.$$
Two such polynomial functions are added by the rule $(p + q)(x) = p(x) + q(x)$, and multiplication by an element of $R$ is defined by $(\alpha p)(x) = \alpha p(x)$.

We shall soon see other examples of a real vector space, and indeed we shall even see an important example of one that is not finite-dimensional.
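The two operations on polynomial functions can be mirrored on coefficient lists. This is a minimal sketch under the assumption that a polynomial is stored as the list of its coefficients, with index $k$ holding the coefficient of $x^k$; the names `poly_add`, `poly_scale`, and `poly_eval` are illustrative and not from the text.

```python
from itertools import zip_longest

def poly_add(p, q):
    """Coefficients of p + q; shorter lists are padded with zeros."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_scale(alpha, p):
    """Coefficients of the scalar multiple alpha * p."""
    return [alpha * a for a in p]

def poly_eval(p, x):
    """Value p(x), computed by Horner's rule."""
    result = 0
    for a in reversed(p):
        result = result * x + a
    return result

p = [1, 0, 2]   # the polynomial 1 + 2x^2
q = [0, 3]      # the polynomial 3x

# (p + q)(x) = p(x) + q(x) and (alpha p)(x) = alpha p(x) at a sample point.
sum_at_5 = poly_eval(poly_add(p, q), 5)
scaled_at_2 = poly_eval(poly_scale(4, p), 2)
```

Under this representation the polynomials of degree at most $n$ correspond to lists of length $n + 1$, which is consistent with the dimension count asked for in the exercises.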


□ Exercises

1. Let $V$ be a real vector space of dimension $n$. Show that $V$ is isomorphic to $V^n$; that is, there exists a one-to-one function $\phi$ with domain $V$ and range $V^n$ so that $\forall x, y \in V$ and $\forall \alpha, \beta \in R$,
$$\phi(\alpha x + \beta y) = \alpha\phi(x) + \beta\phi(y).$$

2. If $\{x_k : k \in (1, n)\}$ is a basis for a vector space $V$, show that $\forall x \in V$, there exist unique numbers $\alpha_1, \ldots, \alpha_n$ in $R$ so that
$$x = \sum_{k=1}^{n} \alpha_k x_k.$$

3. Decide which of the following sets of vectors form a basis for $V^4$:
(a) $\{(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 0, 4), (0, 0, 0, 2)\}$.
(b) $\{(1, 0, 3, 1), (1, 1, 0, 2), (0, 1, 2, 1), (2, 2, 5, 4)\}$.
(c) $\{(1, 0, 1, 0), (0, 1, 0, 0), (0, 0, 1, 0), (2, 1, 3, 0)\}$.
(d) $\{(a, b, 0, 0), (1, 0, c, 0), (0, 1, 1, a), (1, 0, 1, b)\}$, where $ab < 0$.

4. Extend the set $\{(1, 1, 0, 2), (0, 2, 1, 0)\}$ to be a basis for $V^4$.

5. Decide which of the following sets of vectors are linear subspaces of $V^n$.

{ k=l
xk=O}
(b) { x: (xk)2= }
k=I
(c) { x: i kxk 1} .
k=I
( a)

x:

(d) {x: x1xn O }.


(e) {x: x1x2 O}.
=

6. Is the vector $(1, -3, 2, -2)$ in the linear subspace generated by the vectors in the set $\{(2, 3, -1, 4), (3, 0, 1, 2), (1, 6, -3, 6)\}$?

7. Show that the vector space of polynomial functions of degree at most $n$, given as an example at the end of Section 6.1, has dimension $n + 1$.

8. For any real vector space $V$ show the following:
(a) The zero vector is unique.
(b) $\forall x \in V$, $0x = 0$.
(c) $\forall x \in V$, $x + (-x) = 0$, and $x + y = 0 \Rightarrow y = -x$.

9. If $L$ and $M$ are the linear subspaces of $V^4$ generated by the sets of vectors $\{(1, 0, 1, 0), (0, 2, 1, 2), (2, 3, 1, 1)\}$ and $\{(0, 2, 0, -3), (2, 3, 1, 1)\}$, respectively, find the dimensions of the linear spaces $L + M$ and $L \cap M$. By definition,
$$L + M = \{z : (\exists x)(\exists y)(x \in L, \ y \in M, \ \&\ z = x + y)\}.$$

10. If $V$ is a finite-dimensional real vector space, designate its dimension by $d(V)$. If $L$ and $M$ are linear subspaces of $V^n$ (or any finite-dimensional real vector space), show that
$$d(L) + d(M) = d(L + M) + d(L \cap M).$$
The subspace $L + M$ is defined in Exercise 9.

11. Suppose $L$ and $M$ are linear subspaces of a given vector space $V$. Give necessary and sufficient conditions on $L$ and $M$ that $L \cup M$ is a linear subspace of $V$.

12. Suppose $L$ and $M$ are variable linear subspaces of $V^n$ but with fixed dimensions $l$ and $m$, respectively. What is the maximum possible dimension of $L + M$ (see Exercise 9) and minimum possible dimension of $L \cap M$?

6.2 EUCLIDEAN SPACES

To proceed further in the direction that we wish to go, it is necessary to define a distance between two points in $V^n$. The way in which we choose to do this is based on the Pythagorean theorem. Recall that we use the symbol '$|x|$' to denote the absolute value of a number and we can intuitively think of this as being the distance from $x$ to $0$. We shall use the same notation for elements of $V^n$.

6.2.1 Definition. The norm or length function on $V^n$ is a function from $V^n$ to $R$ defined by
$$|x| = \Bigl\{ \sum_{k=1}^{n} (x^k)^2 \Bigr\}^{1/2}.$$

The pair $(V^n, |\ |)$, consisting of $V^n$ and the length function $|\ |$, is a prototype example of what is called a Euclidean space of dimension $n$ and we shall designate it '$E^n$.' The number $|x - y|$ will often be called the distance between $x$ and $y$. By an abuse of language we shall say $x \in E^n$ if we want to emphasize that it is in the domain of the given length function.

The reader will perhaps recall from the elementary theory of analytic geometry that the cosine of the angle between two vectors $x$ and $y$ in the plane (Fig. 6.2.1) is given by
$$\cos\theta = \frac{x^1 y^1 + x^2 y^2}{|x|\,|y|}.$$

FIGURE 6.2.1

The number $x^1 y^1 + x^2 y^2$ is called the inner product or the dot product of the vector $x$ with the vector $y$.

6.2.2 Definition. The function with domain $V^n \times V^n$ and range $R$ defined by
$$x \cdot y = \sum_{k=1}^{n} x^k y^k$$
is called the inner product or dot product of $x$ and $y$.

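6.2.3 Proposition. The inner product function satisfies the following conditions:

(a) $x \cdot y = y \cdot x$, $\forall x, y \in V^n$.
(b) $\alpha(x \cdot y) = (\alpha x) \cdot y = x \cdot (\alpha y)$, $\forall \alpha \in R$ & $\forall x, y \in V^n$.
(c) $(x + y) \cdot z = x \cdot z + y \cdot z$, $\forall x, y, z \in V^n$.
(d) $x \cdot x \ge 0$, $\forall x \in V^n$, and $x \cdot x = 0 \Rightarrow x = 0$.
(e) $x \cdot x = |x|^2$, $\forall x \in V^n$.

We shall leave the proof of this proposition as an exercise.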

6.2.4 Theorem (Cauchy-Bunjakovsky-Schwarz). For every $x, y \in E^n$,
$$|x \cdot y| \le |x|\,|y|,$$
with equality holding $\Leftrightarrow$ $y = \alpha x$ or $x = 0$.

A proof of this theorem has already been indicated in Exercise 15 of Section 1.8. Another proof is to be found in Exercise 10 of Section 5.4 (see also Exercise 1 at the end of this section). The proof we shall give is possibly not as straightforward or elementary as any of the proofs we have mentioned, but it does have the advantage that it will carry over to real vector spaces that are not necessarily finite-dimensional, that is, to real spaces $V$ for which there is a function defined on $V \times V$ that satisfies the conditions of Proposition 6.2.3.

Note that in $V^2$ this theorem says nothing more than $|\cos\theta| \le 1$.

Before we prove this theorem it will be convenient to prove a lemma.

6.2.5 Lemma. If $a \ne 0$ and if $\forall \lambda \in R$, $a\lambda^2 + b\lambda + c \ge 0$, then $b^2 - 4ac \le 0$; $b^2 - 4ac = 0 \Leftrightarrow a\lambda^2 + b\lambda + c$ has a double root (which must be at $-b/2a$).

Proof. By the technique of completing the square, we have
$$a\lambda^2 + b\lambda + c = a\Bigl(\lambda + \frac{b}{2a}\Bigr)^2 - \frac{1}{4a}(b^2 - 4ac) \ge 0, \quad \forall \lambda \in R. \tag{6.2.1}$$
Since $a \ne 0$, it must be positive; otherwise the right side of (6.2.1) shows that $\forall \lambda$,
$$4a^2\Bigl(\lambda + \frac{b}{2a}\Bigr)^2 \le (b^2 - 4ac),$$
which is impossible if $\lambda$ is sufficiently large. Therefore, we have
$$(b^2 - 4ac) \le 4a^2\Bigl(\lambda + \frac{b}{2a}\Bigr)^2$$
and, taking $\lambda = -b/2a$, we get that $b^2 - 4ac \le 0$.

If $b^2 - 4ac = 0$, then (6.2.1) shows that our polynomial has a double root at $-b/2a$. Conversely, if $a\lambda^2 + b\lambda + c$ has a double real root, $r$, then
$$a\lambda^2 + b\lambda + c = a(\lambda - r)^2 = a\lambda^2 - 2ar\lambda + ar^2, \quad \forall \lambda,$$
and hence $r = -b/2a$. From (6.2.1) it follows that $b^2 - 4ac = 0$.

Proof of Theorem 6.2.4. For every $\lambda \in R$,
$$(y + \lambda x) \cdot (y + \lambda x) = |x|^2\lambda^2 + 2(x \cdot y)\lambda + |y|^2 \ge 0. \tag{6.2.2}$$
If $x = 0$ the inequality of the theorem is trivial, so we may suppose $|x|^2 \ne 0$. If we apply Lemma 6.2.5, we get
$$(x \cdot y)^2 \le |x|^2\,|y|^2,$$
from which it immediately follows that
$$|x \cdot y| \le |x|\,|y|.$$
If $y = \alpha x$ or $x = 0$, we clearly have equality. Conversely, if we have equality, then either $x = 0$ or, taking $\alpha = x \cdot y / |x|^2$, it follows from Lemma 6.2.5 that $-\alpha$ is a double root of (6.2.2) and hence
$$|y - \alpha x| = 0.$$
From this it follows that $y = \alpha x$.

6.2.6 Theorem (Triangle Inequality). For every $x, y \in E^n$,
$$\bigl|\, |x| - |y| \,\bigr| \le |x + y| \le |x| + |y|,$$
with equality on the right holding $\Leftrightarrow$ $x = 0$ or $y = \alpha x$, $\alpha \ge 0$, and equality on the left holding $\Leftrightarrow$ $x = 0$ or $y = \beta x$, $\beta \le 0$.


Proof. Using Theorem 6.2.4 we get
$$|x + y|^2 = |x|^2 + 2x \cdot y + |y|^2 \le |x|^2 + 2|x|\,|y| + |y|^2 = (|x| + |y|)^2.$$
Thus $|x + y| \le |x| + |y|$. In this inequality replace $y$ by $-y$ to get $|x - y| \le |x| + |y|$, and then $x$ by $x + y$ to get $|x| \le |x + y| + |y|$, from which it follows that $|x + y| \ge |x| - |y|$. By a similar argument we find that $|x + y| \ge |y| - |x|$ and this inequality with the previous one completes the proof of the inequalities. The statements about equality are immediate consequences of Theorem 6.2.4 and the above computations.

The reason that the previous theorem is called the triangle inequality is that the length of any side of a triangle is less than or equal to the sum of the lengths of the other two sides (Fig. 6.2.2).
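Both inequalities are easy to test numerically on sample vectors. The sketch below is only an illustration: the vectors `x` and `y` are arbitrary example data, the helper names are not from the text, and floating-point arithmetic is used, so the equality case is compared with a tolerance.

```python
import math

def dot(x, y):
    """Inner product: the sum of x^k * y^k over all components."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Length |x| = (x . x)^(1/2)."""
    return math.sqrt(dot(x, x))

def add(x, y):
    """Componentwise sum of two vectors."""
    return [a + b for a, b in zip(x, y)]

x = [1.0, -2.0, 3.0]
y = [4.0, 0.5, 1.0]

# Cauchy-Bunjakovsky-Schwarz: |x . y| <= |x| |y|.
cbs_holds = abs(dot(x, y)) <= norm(x) * norm(y)

# Triangle inequality: | |x| - |y| | <= |x + y| <= |x| + |y|.
triangle_holds = abs(norm(x) - norm(y)) <= norm(add(x, y)) <= norm(x) + norm(y)

# Equality in C-B-S when y is a multiple of x (here y = 2x).
y_parallel = [2.0 * c for c in x]
cbs_equality = math.isclose(abs(dot(x, y_parallel)), norm(x) * norm(y_parallel))
```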

FIGURE 6.2.2

6.2.7 Definition. Two vectors $x, y \in E^n$ are said to be orthogonal $\Leftrightarrow$ $x \cdot y = 0$. A set of vectors $\{x_k : k \in (1, m)\} \subset E^n$ is said to be orthonormal $\Leftrightarrow$ $\forall j, k \in (1, m)$, $j \ne k \Rightarrow x_j \cdot x_k = 0$, and $\forall k \in (1, m)$, $|x_k| = 1$.

A simple example of an orthonormal set for $E^n$ is the set $\{e_k : k \in (1, n)\}$ that we displayed in Section 6.1.
6.2.8 Theorem (Gram-Schmidt). Every linear subspace $L \subset E^n$, $L \ne \{0\}$, has an orthonormal basis.

Proof. Let $\{x_k : k \in (1, m)\}$ be a basis for $L$, which exists by virtue of Corollary 6.1.8. Let us set
$$y_1 = x_1 / |x_1|,$$
and then project $x_2$ onto the one-dimensional linear subspace generated by $y_1$ to obtain a vector $w_2$. To see the form of $w_2$, we shall consider the plane determined by $y_1$ and $x_2$ and use elementary trigonometry to compute it. Considering Fig. 6.2.3, and recalling that $\cos\theta = x_2 \cdot y_1 / |x_2|$, we see that the projection of $x_2$ onto the linear space generated by $y_1$ is the vector
$$w_2 = (x_2 \cdot y_1)y_1.$$

FIGURE 6.2.3

Now, let $z_2$ be the vector perpendicular to $w_2$ so that $z_2 + w_2 = x_2$. Hence
$$z_2 = x_2 - (x_2 \cdot y_1)y_1.$$
The vector $z_2 \ne 0$, for otherwise $y_1$ and $x_2$ are linearly dependent. Let us set
$$y_2 = z_2 / |z_2|.$$
Formally, by a direct computation, we see that $y_1 \cdot z_2 = 0$. Therefore $\{y_1, y_2\}$ is an orthonormal set, and we think it is clear that the linear subspace generated by this set and the set $\{x_1, x_2\}$ is the same.

Let us proceed by induction. Suppose that, for some $k < m$, $\{y_j : j \in (1, k)\}$ is an orthonormal set that generates the same linear subspace as the set $\{x_j : j \in (1, k)\}$. Let us set
$$z_{k+1} = x_{k+1} - \sum_{j=1}^{k} (x_{k+1} \cdot y_j)y_j.$$
The vector $z_{k+1} \ne 0$, for otherwise $x_{k+1}$ is in the linear space generated by $\{x_j : j \in (1, k)\}$. If we set
$$y_{k+1} = z_{k+1} / |z_{k+1}|,$$
it follows that $\{y_j : j \in (1, k + 1)\}$ is an orthonormal set. It is an easy exercise to show that this set generates the same linear space as the set $\{x_j : j \in (1, k + 1)\}$. The principle of induction shows that there is an orthonormal set, having the same number of elements as the dimension of $L$, which generates $L$. This proves the theorem.
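The inductive step of the proof translates directly into a procedure. The following sketch follows the formula $z_{k+1} = x_{k+1} - \sum_j (x_{k+1} \cdot y_j)y_j$ from the proof; the function name `gram_schmidt`, the tolerance `tol`, and the use of floating-point arithmetic are assumptions made for illustration.

```python
import math

def dot(x, y):
    """Inner product of two vectors given as sequences of numbers."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal list spanning the same subspace as the given independent vectors."""
    ortho = []
    for x in vectors:
        # z = x minus its projections onto the y_j already constructed.
        z = list(x)
        for y in ortho:
            c = dot(x, y)
            z = [zi - c * yi for zi, yi in zip(z, y)]
        length = math.sqrt(dot(z, z))
        if length <= tol:
            # z = 0 means x lies in the span of the earlier vectors.
            raise ValueError("input vectors are linearly dependent")
        ortho.append([zi / length for zi in z])
    return ortho

# The basis of V^3 given in Section 6.1.
basis = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
orthonormal = gram_schmidt(basis)
```

For inputs that are nearly dependent, a numerically more careful variant (subtracting projections of the running remainder rather than of the original vector) may behave better in floating point; the version above is kept close to the proof.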
6.2.9 Definition. If $L$ is a linear subspace of a linear space $M \subset E^n$, then the orthogonal complement $L^\perp$ of $L$ in $M$ is the set of all elements in $M$ which are orthogonal to every element of $L$; that is,
$$L^\perp = \{x : x \in M \ \&\ (y)(y \in L \Rightarrow x \cdot y = 0)\}.$$


6.2.10 Proposition. If $L$ is a linear subspace of the linear space $M \subset E^n$, then the following hold:

(a) $L^\perp$ is a linear subspace of $M$ and $L \cap L^\perp = \{0\}$.
(b) $M = L \oplus L^\perp$, where
$$L \oplus L^\perp = \{z : (\exists x)(\exists y)(x \in L, \ y \in L^\perp, \ \&\ z = x + y)\}.$$
Further, every element of $M$ is the unique sum of an element in $L$ and an element in $L^\perp$.
(c) $L^{\perp\perp} = L$.

Proof. The proof of (a) is almost trivial and we leave it as an exercise. To prove (b) we think it is immediately clear that $L \oplus L^\perp \subset M$. On the other hand, since any orthonormal basis for $L$ can be extended to an orthonormal basis for $M$ (Exercise 4), it follows that $M \subset L \oplus L^\perp$. If $x, x_1 \in L$ and $y, y_1 \in L^\perp$ and $x + y = x_1 + y_1$, then $x - x_1 = y_1 - y$. The left side is in $L$ and the right side is in $L^\perp$, so that, since $L \cap L^\perp = \{0\}$, $x = x_1$ and $y = y_1$. Statement (c) is an immediate consequence of (b).
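The unique decomposition in part (b) can be computed explicitly when a basis of mutually orthogonal vectors for $L$ is known. The sketch below assumes such a basis is supplied (not necessarily normalized, which keeps the arithmetic exact); the function name `decompose` and the sample data are illustrative and not from the text.

```python
from fractions import Fraction

def dot(x, y):
    """Inner product of two vectors given as sequences of numbers."""
    return sum(a * b for a, b in zip(x, y))

def decompose(z, orthogonal_basis):
    """Split z as x + y with x in L and y orthogonal to L.

    orthogonal_basis is assumed to consist of nonzero, mutually orthogonal
    vectors that generate L.
    """
    x = [Fraction(0)] * len(z)
    for u in orthogonal_basis:
        # Coefficient of the projection of z onto the line through u.
        c = Fraction(dot(z, u), dot(u, u))
        x = [xi + c * ui for xi, ui in zip(x, u)]
    y = [zi - xi for zi, xi in zip(z, x)]
    return x, y

# L is the subspace of E^3 generated by two orthogonal vectors in the plane x^3 = 0.
L_basis = [(1, 1, 0), (1, -1, 0)]
x_part, y_part = decompose((3, 4, 5), L_basis)
```

Here `x_part` is the component in $L$ and `y_part` is the component in $L^\perp$; their sum recovers the original vector.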

If $V$ is a real finite-dimensional vector space and there is a function with domain $V \times V$ and range $R$ which satisfies conditions (a) through (d) of Proposition 6.2.3, then $V$ together with this bilinear function is called a Euclidean vector space. By defining the norm or length function by condition (e) of that proposition it is possible to prove the Cauchy-Bunjakovsky-Schwarz inequality, the triangle inequality, and the Gram-Schmidt theorem in exactly the same way as we proved them for $E^n$.

As an example of a finite-dimensional Euclidean space that is not $E^n$, we consider the space $R_n[x]$ of polynomial functions of degree at most $n$ which we considered at the end of Section 6.1.

product on Rn [x] X Rn [x] by means of the formula

q=

J:p(x)q(x)dx.

We put an inner

(6.2.3)

We shall leave as an exercise the fact that this function satisfies conditions (a) through (d) of Proposition 6.2.3.

□ Exercises

1. Prove the Lagrange identity,

and use it to deduce the C-B-S inequality.

2. Find an orthonormal basis for the subspace of $E^4$ generated by the vectors in the following sets:
(a) $\{(0, 1, 1, 2), (1, 0, 2, 3), (2, 1, 1, 2)\}$.
(b) $\{(1, -2, 3, -2), (0, -1, 0, 3), (2, 3, 6, -7)\}$.
(c) $\{(1, 0, -1, 2), (0, 1, 2, 3), (2, 1, 0, 7), (1, -1, -3, -1)\}$.

3. Show that any orthonormal set in $E^n$ is linearly independent.

4. Show that any orthonormal set in $E^n$ can be extended to an orthonormal basis for $E^n$.

5. Show that $\forall x, y \in E^n$,

Interpret this result geometrically in $E^2$.

6. If $x, y \in E^n$ and $|x| = |y|$ show that $x - y$ is orthogonal to $x + y$. Interpret this geometrically in $E^2$.

7. Suppose that $x$ and $y$ are fixed vectors in $E^n$. Find the vector of the form $x + \alpha y$ which has the smallest length as $\alpha$ varies over $R$. Show that this vector is orthogonal to $y$.

8. Verify that the inner product defined by (6.2.3) makes $R_n[x]$ a Euclidean vector space. Apply the Gram-Schmidt process to the basis $\{1, x, x^2, x^3, x^4\}$ for $R_4[x]$.

9. Let $\{x_k : k \in (1, n)\}$ be an orthonormal basis for a Euclidean vector space $E$. For any $x \in E$ we may write
$$x = \sum_{k=1}^{n} \alpha_k x_k.$$
Compute the numbers $\alpha_k$. If $y \in E$, show that
$$x \cdot y = \sum_{k=1}^{n} (x \cdot x_k)(y \cdot x_k).$$

10. Suppose that $\{x_k : k \in (1, m)\}$ is an orthonormal set in $E^n$. For a given $x \in E^n$ find the numbers $\alpha_k$ so that
$$\Bigl|\, x - \sum_{k=1}^{m} \alpha_k x_k \Bigr| = \text{minimum}.$$

6.3 TOPOLOGY IN $E^n$

Let us recall that for $E^1 = R$, questions about limits and continuity of functions are discussed in terms of the absolute value, which is the same as the length function. An alternative, but equivalent, way of discussing these matters is through the use of open intervals or more generally open sets. In the same way, we can discuss limit and continuity questions in $E^n$ by means of the length function or alternatively by means of open sets.

In $E^1$ open sets were defined by means of open intervals. We could do the same thing in $E^n$, replacing the open intervals by open cubes or more generally by higher-dimensional rectangles. Another way is to replace


the open intervals in $E^1$ by open balls in $E^n$ defined by
$$B(a, \rho) = \{x : x \in E^n \text{ and } |x - a| < \rho\}.$$
The set $B(a, \rho)$ is called an open ball with center at $a$ and radius $\rho$. However, it makes very little difference which sets we take as basic, since every open cube about $a$ contains an open ball with center at $a$, and, conversely, every open ball with center at $a$ contains an open cube about $a$. We shall use the open balls as our basic sets.
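The mutual containment of balls and cubes can be made quantitative. As a sketch under the assumption that an "open cube about $a$" means the set of $x$ with $|x^k - a^k| < h$ for every $k$ (with $h$ the half side length), the ball $B(a, \rho)$ lies inside the cube with $h = \rho$, and the cube with $h = \rho/\sqrt{n}$ lies inside $B(a, \rho)$. The function names below are illustrative and not from the text.

```python
import itertools
import math

def in_open_ball(x, a, rho):
    """True when |x - a| < rho."""
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a))) < rho

def in_open_cube(x, a, half_side):
    """True when every coordinate of x is within half_side of that of a."""
    return all(abs(xi - ai) < half_side for xi, ai in zip(x, a))

def inscribed_cube_half_side(rho, n):
    """Half side of an open cube about a that is contained in B(a, rho) in E^n."""
    return rho / math.sqrt(n)

n = 3
a = (1.0, -1.0, 0.5)
rho = 2.0
h = inscribed_cube_half_side(rho, n)

# Points near the corners of the inscribed cube are the farthest from a,
# so checking them gives evidence that the whole cube lies in the ball.
near_corners = [
    tuple(ai + s * 0.99 * h for ai, s in zip(a, signs))
    for signs in itertools.product((-1, 1), repeat=n)
]
corners_in_ball = all(in_open_ball(c, a, rho) for c in near_corners)
```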

6.3.1 Definition. A set $U \subset E^n$ is said to be open $\Leftrightarrow$ $\forall x \in U$, $\exists \rho > 0$, so that $B(x, \rho) \subset U$.

The concept of accumulation point and closure are the same as for the one-dimensional case, but we shall repeat the definitions.

6.3.2 Definition. A point $a \in E^n$ is said to be an accumulation point of the set $A \subset E^n$ $\Leftrightarrow$ $\forall \rho > 0$, $[B(a, \rho) \setminus \{a\}] \cap A \ne \emptyset$. The set of all accumulation points of $A$ is called the derived set of $A$ and is denoted by $A'$. The closure of a set $A$ is the set $\bar{A} = A \cup A'$, and $A$ is said to be closed $\Leftrightarrow$ $A = \bar{A}$.

In other words, an accumulation point $a$ of a set $A$ is a point with the property that every open ball with center at $a$ contains a point in $A$ which is different from $a$. This means we can get arbitrarily close to $a$ by elements in $A$. The point $a$ may or may not belong to $A$.

The concept of compactness for $E^n$ is the same as the concept of compactness for $E^1 = R$. For the sake of completeness we shall repeat the relevant definitions.

6.3.3 Definition. A set $A \subset E^n$ is said to be bounded $\Leftrightarrow$ $\exists M$ so that $\forall x \in A$, $|x| \le M$.

6.3.4 Definition. A set in $E^n$ is said to be compact $\Leftrightarrow$ it is closed and bounded.

To discuss the Heine-Borel theorem in $E^n$ we need the concept of covering. We shall give this formally in the next definition.

6.3.5 Definition. A collection of sets $\mathcal{U}$ is said to be a covering for a set $A \subset E^n$ $\Leftrightarrow$
$$A \subset \bigcup \{U : U \in \mathcal{U}\}.$$
The collection is said to be an open covering $\Leftrightarrow$ each $U \in \mathcal{U}$ is open.

6.3.6 Theorem (Heine-Borel). A set $A \subset E^n$ is compact $\Leftrightarrow$ in every open covering for $A$ there are a finite number of sets which form a covering for $A$.

Proof. We shall prove the necessity part of the theorem by induction on the dimension $n$ of $E^n$. We shall suppose that $P(n)$ is the statement: An open covering for every closed and bounded set in $E^n$ reduces to a finite subcovering. We have shown in Theorem 2.2.5 that $P(1)$ is true. Let us suppose that $P(k)$ is true and that

$$A = B \times \{t\},$$

where $B$ is a closed and bounded set in $E^k$ and $t \in E^1$. Suppose $\mathcal{U}$ is an open covering for $A$ and for every $U \in \mathcal{U}$ we set $U_t = \{x : x \in E^k \ \&\ (x,t) \in U\}$. It is almost immediate that $U_t$ is an open set in $E^k$ and $\mathcal{U}_t = \{U_t : U \in \mathcal{U}\}$ is an open covering for $B$. Hence, by $P(k)$, this reduces to a finite subcovering for $B$ which in turn implies that $\mathcal{U}$ reduces to a finite subcovering for $A$.

Next suppose that

$$A = B \times I,$$

where $B$ is as before and $I = [a,b] \subset E^1$. Let $\mathcal{U}$ be an open covering for $A$ and let $J$ be the collection of $t$ in $[a,b]$ so that $B \times [a,t]$ is covered by a finite number of sets in $\mathcal{U}$. $J$ is nonvoid, since, by the first paragraph, $a \in J$. Let us set $t_0 = \sup J$. By the first paragraph of the proof, $B \times \{t_0\}$ is covered by a finite number of sets $\{U_j : j \in (1,n)\} \subset \mathcal{U}$. At each point of $B \times \{t_0\}$ place an open cube with center at the point and whose closure is in one of the $U_j$. This is an open covering for $B \times \{t_0\}$ and hence reduces to a finite subcovering. Let $2l$ be the minimum of the side lengths of this finite number of cubes. Since $B \times [a, t_0 - l]$ is covered by a finite number of sets in $\mathcal{U}$ and since $B \times [t_0 - l, t_0 + l]$ is covered by $\{U_j : j \in (1,n)\}$ it follows that $B \times [a, t_0 + l]$ is covered by a finite number of sets in $\mathcal{U}$. Hence we must have $t_0 = b$.

Finally, if $A$ is closed and bounded, it is contained in a large enough cube,

$$A \subset I^k \times I,$$

where $I = [a,b] \subset E^1$ and $I^k$ is the direct product of $I$ taken $k$ times. If $\mathcal{U}$ is an open covering for $A$ and $A^c$ is the complement of $A$, then $\mathcal{U} \cup \{A^c\}$ covers $I^k \times I$ and hence reduces to a finite subcover. Consequently, $\mathcal{U}$ reduces to a finite subcover, and we have proved that $P(k) \Rightarrow P(k+1)$.

The sufficiency part of the theorem follows exactly the same reasoning as the sufficiency part of the proof of Theorem 2.2.5, and we shall leave it for the reader.

The Bolzano-Weierstrass theorem is also true in $E^n$, and as we shall see its proof is an immediate consequence of the Heine-Borel theorem.

6.3.7 Theorem (Bolzano-Weierstrass). Every bounded infinite set in $E^n$ has an accumulation point.

Proof. Suppose $A \subset E^n$ is a bounded infinite set. If $A$ has no accumulation points it is closed and hence compact. Also, since it has no accumulation points, $\forall x \in A$, $\exists \rho(x) > 0$ so that $B(x,\rho(x)) \cap A = \{x\}$. The collection $\{B(x,\rho(x)) : x \in A\}$ is an open covering for $A$ and thus reduces to a finite subcovering. But this means that $A$ has only a finite number of points, contrary to the original hypothesis.

CONNECTED SETS

The concept of a connected set is rather important in many problems in analysis. The easiest way to introduce this concept is first to introduce the concept of relatively open and relatively closed sets.

6.3.8 Definition. A set $U \subset E^n$ is said to be relatively open in a set $A \subset E^n$ $\iff$ there exists an open set $V \subset E^n$ so that $U = V \cap A$. A set $C \subset E^n$ is said to be relatively closed in a set $A \subset E^n$ $\iff$ there exists a closed set $D \subset E^n$ so that $C = D \cap A$.

Using the concepts of relatively open sets and relatively closed sets, it is possible to describe the concept of a connected set.

6.3.9 Definition. A set $A \subset E^n$ is said to be connected $\iff$ $A$ cannot be written as the union of two disjoint nonvoid sets that are relatively open in $A$.

It is an easy matter to see, and we leave it as an exercise, that a set is connected $\iff$ it cannot be written as the union of two disjoint nonvoid sets that are relatively closed with respect to $A$.
It is not always a trivial matter to decide whether or not a set in $E^n$, $n > 1$, is connected. Roughly speaking, a set is not connected if it consists of two or more disjoint pieces. For example, the set that is the union of the open balls $B((2,2),1)$ and $B((-2,0),1)$ in $E^2$ is a disconnected set.
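The example of the two balls can be checked numerically. The following Python sketch is only an illustration: the helper name `balls_are_disjoint` is our own, and it relies on the elementary fact that two open balls in Euclidean space are disjoint exactly when the distance between their centers is at least the sum of their radii. Each ball is open, hence relatively open in the union, so disjointness gives the required splitting.

```python
import math

def balls_are_disjoint(c1, r1, c2, r2):
    """Open balls B(c1, r1), B(c2, r2) are disjoint iff |c1 - c2| >= r1 + r2."""
    return math.dist(c1, c2) >= r1 + r2

center_a, center_b = (2.0, 2.0), (-2.0, 0.0)

# Distance between the centers minus the two radii: the width of the gap.
gap = math.dist(center_a, center_b) - 2.0
print(balls_are_disjoint(center_a, 1.0, center_b, 1.0), gap)
```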

On the other hand, connected sets in $E^1$ are easy to describe, and this is often very useful in deciding whether or not a set in higher dimensions is connected.

6.3.10 Theorem. A set in $E^1$ is connected $\iff$ it is an interval.

Proof. A set $I \subset E^1$ is an interval $\iff$ $\forall x,y \in I$, $\{z : x \le z \le y\} \subset I$. If a set $A$ is connected and not an interval, then $\exists x,y \in A$ and $\exists z \in A^c$ so that $x < z < y$. The intervals $]-\infty, z[$ and $]z, \infty[$ are open in $E^1$ and if we set

$$B = \,]-\infty, z[\, \cap A, \qquad C = \,]z, \infty[\, \cap A,$$

then $B$ and $C$ are nonvoid and relatively open in $A$, $A = B \cup C$ and $B \cap C = \emptyset$. This contradicts the assumption that $A$ is connected.

Conversely, suppose $I$ is an interval and $I = A \cup B$, where $A \cap B = \emptyset$, and $A$ and $B$ are nonvoid and relatively open in $I$. Let $x \in A$, $y \in B$ and suppose $x < y$. Since $I$ is an interval, $[x,y] \subset I$. Since $A$ is relatively open in $I$, there is an open $U$ so that $A = U \cap I$. Now $\exists \xi$ so that $x < \xi < y$ and $[x,\xi[\, \subset U$. Thus $[x,\xi[\, \subset A$. Let $E = \{\xi : [x,\xi[\, \subset A\}$; the set $E$ is nonvoid and bounded above and hence $z = \text{l.u.b. } E$ exists and $z \in [x,y]$. Now $z \notin A$; otherwise, since $A$ is relatively open in $I$, there is a $\xi > z$ so that $[z,\xi[\, \subset A$ and thus $[x,\xi[\, \subset A$, which means that $\xi \in E$ and $z \neq \text{l.u.b. } E$. This means that $z \in B$. But since $B$ is relatively open, $\exists \xi < z$ so that $]\xi, z] \subset B$, which again contradicts the fact that $z$ is the least upper bound of $E$.

Using Theorem 6.3.10 it is possible to prove the connectedness of a large number of sets in $E^n$. We first define these sets.

6.3.11 Definition. A set $C \subset E^n$ is said to be convex $\iff$ $\forall x,y \in C$, the straight-line segment

$$L(x,y) = \{z : (\exists t)(t \in [0,1] \ \&\ z = (1-t)x + ty)\}$$

is contained in $C$.

6.3.12 Corollary. Any convex set in $E^n$ is connected.

Proof. Suppose $C$ is convex and $C = A \cup B$, where $A$, $B$ are disjoint, relatively open sets in $C$. If $A$ and $B$ are nonvoid, let $x \in A$ and $y \in B$. Let us set

$$A_1 = \{t : t \in [0,1] \ \&\ (1-t)x + ty \in A\},$$
$$B_1 = \{t : t \in [0,1] \ \&\ (1-t)x + ty \in B\}.$$

Since $C$ is convex, $[0,1] = A_1 \cup B_1$. From the fact that $A$ and $B$ are relatively open in $C$, it is a simple matter to show that $A_1$ and $B_1$ are relatively open in $[0,1]$. But this means $[0,1]$ is not connected, contradicting the last theorem.

After we discuss continuous functions in Section 6.4 we shall be able to generalize this last corollary. We shall finish the discussion of connectedness by discussing maximal connected sets.

6.3.13 Definition. A component in $A \subset E^n$ is a maximal connected subset of $A$ (in the sense that it is not contained in any larger connected subset of $A$).

6.3.14 Theorem. Any set in $E^n$ is a union of pairwise disjoint components.

Proof. If $A \subset E^n$, let us define an equivalence relation on $A$ by setting $x \sim y$ $\iff$ $x$ and $y$ belong to a subset of $A$ which is connected. For every $x \in A$, it is clear that $x \sim x$. Also, $\forall x,y \in A$ it is also clear that $x \sim y \Rightarrow y \sim x$. Suppose now that $x,y,z \in A$, and $x \sim y$ and $y \sim z$. Let $B$ and $C$ be connected subsets of $A$ so that $x,y \in B$ and $y,z \in C$. We claim that $B \cup C$ is a connected subset of $A$. For suppose $B \cup C = E \cup F$, where $E$ and $F$ are relatively open in $B \cup C$ and $E \cap F = \emptyset$. Now, $B = (B \cap E) \cup (B \cap F)$ and the latter sets are disjoint and relatively open in $B$. Since $B$ is connected, one of these sets must be void. If $B \cap F = \emptyset$ then $y \in E$. In the same way, $C = (C \cap E) \cup (C \cap F)$ so that one of the latter two sets must be void. Since $y \in C \cap E$, we must have $C \cap F = \emptyset$ and thus $(C \cup B) \cap F = \emptyset$, which implies $F = \emptyset$. Thus $B \cup C$ is connected, and since $x,z \in B \cup C$, we see that $x \sim y$ and $y \sim z \Rightarrow x \sim z$. Consequently, we have an equivalence relation.

We can write $A$ as the union of pairwise disjoint equivalence classes. We claim that each equivalence class is a component of $A$. If $E$ is an equivalence class of $A$, it is connected. For suppose $E = B \cup C$, $B \cap C = \emptyset$, and $B$ and $C$ are nonvoid and relatively open in $E$. If $x \in B$ and $y \in C$, since $x \sim y$ there is a connected subset $D$ of $A$ so that $x,y \in D$. Clearly $D \subset E$. But then the sets $D \cap B$ and $D \cap C$ are nonvoid, disjoint, relatively open sets whose union is $D$. This is a contradiction.

If $E$ is not a maximal connected subset of $A$, there is a point of $A$ outside of $E$ that is equivalent with every point of $E$. This contradicts the fact that $E$ is an equivalence class. The proof is finished.

6.3.15 Corollary. If $A \subset E^n$ is open, then the components of $A$ are open and there are a countable number of them.

Proof. Let $E$ be a component of $A$ and $x \in E$. Let $\rho > 0$ so that $B(x,\rho) \subset A$. Now, $B(x,\rho)$ is convex and thus, by Corollary 6.3.12, is connected. Hence every element in $B(x,\rho)$ is equivalent to $x$ and thus is in $E$. Hence $E$ is open.

Suppose $Q^n$ is the Cartesian product of the rationals $Q$ with itself $n$ times, and $Q_A = Q^n \cap A$; that is, $Q_A$ is the set of points in $A$ with rational components. Since $A$ is open, $Q_A$ is denumerable. (Proof?) Let $\Phi$ be a one-to-one function with domain $N$ and range $Q_A$. If $E$ is a component of $A$, let

$$N_E = \{n : n \in N \ \&\ \Phi(n) \in E\}.$$

Since $E$ is open, $N_E$ is nonvoid and thus by the well-ordering principle it has a minimal element $n_E$. Let $\Psi$ be that function whose domain consists of the equivalence classes $E$ and defined by

$$\Psi(E) = n_E.$$

The one-to-one function $\Phi \circ \Psi$ has range a subset of $Q_A$, and thus its range is countable. This means the collection of components of $A$ is countable.

6.3.16 Corollary. If $A \subset E^1$ is open, then $A$ is the countable union of open intervals.

Proof. From the last corollary, $A$ is the countable union of open connected sets. Since by Theorem 6.3.10, every connected set in $E^1$ is an interval, the proof is complete.

INTERIORS AND BOUNDARIES

The concepts of the interior and boundary of a set in $E^n$ will be of some importance in Chapter 8 when we discuss Jordan content and the theory of integration in higher dimensions. In somewhat loose language the interior of a set $A$ is the largest open set that is contained in $A$. The boundary of $A$ is the set of points that do not belong to the interior of $A$ or the interior of $A^c$. The formal definition is as follows.

6.3.17 Definition. If $A \subset E^n$, then the interior of $A$, denoted by $A^0$, is the union of all open sets contained in $A$. The boundary of $A$, denoted $\beta A$, is the set $\bar{A} \setminus A^0$.

It is clear that $x \in A^0 \iff \exists \delta > 0$, so that $B(x,\delta) \subset A$. Also we think it is clear that $x \in \beta A \iff \forall \delta > 0$, $B(x,\delta) \cap A \neq \emptyset$ and $B(x,\delta) \cap A^c \neq \emptyset$. The boundary of a set is clearly a closed set and, if $A \subset B$, then $A^0 \subset B^0$.

It is quite possible that a set may be rather "thin" and yet its boundary may be rather "thick." For example, the rationals in $[0,1]$ are in some sense rather "thin" but the boundary of this set consists of the whole interval $[0,1]$. Note that this set of rationals has no interior. A single point in $E^n$ is an example of a closed set with no interior. The reader may easily construct an example of a denumerable closed set, which consequently has no interior. The Cantor set is an example of a closed set with no interior which nevertheless has an uncountable number of points.

Exercises

In the following exercises all sets are to be taken in $E^n$, unless otherwise specified.

1. Show that the intersection of any finite number of relatively open sets is relatively open and the union of any number of relatively open sets is relatively open.

2. Show that the union of any finite number of relatively closed sets is relatively closed and the intersection of any number of relatively closed sets is relatively closed.

3. Suppose $U$ is a relatively open set in $A$. Show that $A \setminus U$ is relatively closed. If $C$ is a relatively closed set in $A$, show that $A \setminus C$ is relatively open.
4. Let $C$ be relatively compact in $A$. If $a \in A$, is it true that if we set $d = \inf \{|x - a| : x \in C\}$, then $\exists c \in C$, so that $d = |c - a|$?

5. Prove that a set $A \subset E^n$ is connected $\iff$ the only subsets of $A$ which are both open and closed relative to $A$ are $A$ itself and the null set.

6. If $A$ and $B$ are connected and $A \cap B \neq \emptyset$, show that $A \cup B$ is connected.

7. Let $A$ be a connected set in $E^n$, and $\bar{A}$ its closure. If $A \subset B \subset \bar{A}$, show that $B$ is connected.

8. If $A \subset E^n$ and $B \subset E^m$, show that $A \times B$ is open in $E^{n+m}$ $\iff$ $A$ and $B$ are open.

9. If $A \subset E^n$ and $B \subset E^m$, show that $A \times B$ is closed in $E^{n+m}$ $\iff$ $A$ and $B$ are closed.

10. If $A \subset E^n$ and $B \subset E^m$, show that $A \times B$ is compact $\iff$ $A$ and $B$ are compact.

11. Show that a sequence with range in $E^n$ is convergent if and only if it is a Cauchy sequence.

12. If $A \subset E^n$ and $B \subset E^m$, show that $A \times B$ is connected $\iff$ $A$ and $B$ are connected.

13. Show the following:
(a) $\beta A = [A^0 \cup (A^c)^0]^c$.
(b) $\beta A = (A^0)^c \setminus (A^c)^0$.

14. Show the following:
(a) $A$ is closed $\iff$ $\beta A \subset A$.
(b) $A$ is open $\iff$ $\beta A \subset A^c$.

6.4 CONTINUOUS FUNCTIONS

In Chapter 2 we discussed the limit and continuity concept for functions having their domains and ranges in $R$. We shall now do the same for functions having their domains in $E^n$, $n \ge 1$, and ranges in $E^m$, $m \ge 1$. The definitions are essentially identical and the theorems and their proofs are, by and large, the same. Hence we shall not present as detailed a study here as we did in Chapter 2, but shall limit ourselves to those matters for which a slightly different formulation than given in Chapter 2 may lead to deeper insights.

6.4.1 Definition. If $f$ is a function with $\mathfrak{D}(f) \subset E^n$ and $\mathfrak{R}(f) \subset E^m$ and $a$ is an accumulation point of $\mathfrak{D}(f)$, then $f$ is said to have the limit $l$ at $a$ $\iff$ $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $x \in [B(a,\delta) \setminus \{a\}] \cap \mathfrak{D}(f) \Rightarrow f(x) \in B(l,\varepsilon)$.

In case $f$ has a limit at $a$ we write, as usual,

$$\lim_{x \to a} f(x) = l \quad \text{or} \quad f(x) \to l \ \text{ as } x \to a.$$

6.4.2 Definition. If $f$ is a function with $\mathfrak{D}(f) \subset E^n$ and $\mathfrak{R}(f) \subset E^m$, then $f$ is said to be continuous at $a$ $\iff$ $a \in \mathfrak{D}(f)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $x \in B(a,\delta) \cap \mathfrak{D}(f) \Rightarrow f(x) \in B(f(a),\varepsilon)$. The function $f$ is said to be continuous $\iff$ $f$ is continuous at every point of its domain.

The concept of continuity at a point is a local concept, that is, depends only on the values that the given function takes on in some neighborhood of the given point. However, a continuous function has a "global" characterization that is important and useful.

6.4.3 Theorem. A function $f$ with $\mathfrak{D}(f) \subset E^n$ and $\mathfrak{R}(f) \subset E^m$ is continuous $\iff$ for every open $U \subset E^m$, $f^{-1}(U)$ is relatively open in $\mathfrak{D}(f)$.

Proof. Suppose $f$ is continuous and $U \subset E^m$ is open. If $f^{-1}(U)$ is void, it is open relative to $\mathfrak{D}(f)$ and we are done. If $f^{-1}(U) \neq \emptyset$, let $x \in f^{-1}(U)$; since $U$ is open, $\exists \rho > 0$ so that $B(f(x),\rho) \subset U$. From the continuity of $f$, $\exists \delta(x,\rho)$, so that $y \in B(x,\delta) \cap \mathfrak{D}(f) \Rightarrow f(y) \in B(f(x),\rho)$, which in turn implies that $B(x,\delta) \cap \mathfrak{D}(f) \subset f^{-1}(U)$. If we set

$$V = \bigcup \{B(x,\delta) : x \in f^{-1}(U)\},$$

then $V$ is open and $f^{-1}(U) = V \cap \mathfrak{D}(f)$.

Conversely, suppose $f^{-1}(U)$ is relatively open in $\mathfrak{D}(f)$ for every open $U \subset E^m$. Let $a \in \mathfrak{D}(f)$ and take $U = B(f(a),\varepsilon)$; then there is an open $V \subset E^n$ so that $f^{-1}(U) = V \cap \mathfrak{D}(f)$. Since $V$ is open, $\exists \delta > 0$, so that $B(a,\delta) \subset V$. Hence $f$ takes $B(a,\delta) \cap \mathfrak{D}(f)$ into $B(f(a),\varepsilon)$, which is the definition of continuity at $a$. This completes the proof.

6.4.4 Definition. A function $f$ is said to be an open function or an open map $\iff$ for every relatively open $A \subset \mathfrak{D}(f)$, $f(A)$ is relatively open in $\mathfrak{R}(f)$.

As we shall see in Chapter 7, open maps play a relatively important role in many considerations. Right now, Theorem 6.4.3 leads immediately to the following corollary.

6.4.5 Corollary. If $f$ is a one-to-one open map, then $f^{-1}$ is continuous.

Now, an open one-to-one map need not itself be continuous as the reader may easily verify (for example, refer to Theorem 2.3.6). A function $f$ that is one-to-one and such that $f$ and $f^{-1}$ are continuous is called a homeomorphism or topological map. So, a continuous one-to-one open map is topological. Exercise 4 at the end of this section will give another condition for a continuous one-to-one function to be topological.

6.4.6 Theorem. If $f$ is a continuous function with $\mathfrak{D}(f) \subset E^n$ and $\mathfrak{R}(f) \subset E^m$ and if $\mathfrak{D}(f)$ is compact, then $\mathfrak{R}(f)$ is compact.

Proof. Let $\mathcal{U}$ be an open covering of $\mathfrak{R}(f)$. For every $U \in \mathcal{U}$, there exists an open $V \subset E^n$ so that $f^{-1}(U) = V \cap \mathfrak{D}(f)$. The collection of such $V$ is an open covering for $\mathfrak{D}(f)$, and hence there are a finite number that cover $\mathfrak{D}(f)$. Hence there are a finite number of elements of $\mathcal{U}$ that cover $\mathfrak{R}(f)$.

6.4.7 Corollary. If $f$ is continuous and $\mathfrak{D}(f)$ is compact, then $f$ is bounded; that is, $\mathfrak{R}(f)$ is bounded.

Proof. $\mathfrak{R}(f)$ is compact and therefore is bounded.

6.4.8 Corollary. If $f$ is real-valued and continuous and $\mathfrak{D}(f) \subset E^n$ is compact, then $f$ assumes its maximum and its minimum on $\mathfrak{D}(f)$.

Proof. Since $\mathfrak{R}(f) \subset E^1$ is compact, it is closed and bounded. Hence $\mathfrak{R}(f)$ must contain its supremum and infimum, which are the maximum and minimum of $f$, respectively.

6.4.9 Definition. A function $f$ with $\mathfrak{D}(f) \subset E^n$ and $\mathfrak{R}(f) \subset E^m$ is said to be uniformly continuous $\iff$ $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $x, y \in \mathfrak{D}(f)$ and $x - y \in B(0,\delta) \Rightarrow f(x) - f(y) \in B(0,\varepsilon)$.

6.4.10 Theorem. If $f$ is continuous and $\mathfrak{D}(f)$ is compact, then $f$ is uniformly continuous.

Proof. See the proof of Theorem 2.2.9.

In the one-dimensional case we showed (Theorem 2.2.14) that every continuous function with a closed interval domain takes on all values between its maximum and its minimum. This fact generalizes in a very interesting way for functions with domains and ranges in higher-dimensional space.

6.4.11 Theorem. If $f$ is a continuous function with $\mathfrak{D}(f) \subset E^n$ and $\mathfrak{R}(f) \subset E^m$ and if $\mathfrak{D}(f)$ is connected, then $\mathfrak{R}(f)$ is connected.

Proof. Suppose $U$ and $V$ are open in $E^m$, $\mathfrak{R}(f) \subset U \cup V$, and $U \cap \mathfrak{R}(f)$ and $V \cap \mathfrak{R}(f)$ are disjoint. Then $f^{-1}(U) = f^{-1}(U \cap \mathfrak{R}(f))$ and $f^{-1}(V) = f^{-1}(V \cap \mathfrak{R}(f))$ are disjoint and open relative to $\mathfrak{D}(f)$, and $\mathfrak{D}(f) = f^{-1}(U) \cup f^{-1}(V)$. Since $\mathfrak{D}(f)$ is connected, one of the sets $f^{-1}(U)$ or $f^{-1}(V)$ must be the null set and hence the corresponding set $U \cap \mathfrak{R}(f)$ or $V \cap \mathfrak{R}(f)$ must be the null set. Thus $\mathfrak{R}(f)$ must be connected.

Theorem 6.4.11 is an important aid in deciding whether or not a set in higher dimensions is connected. For example, the proof of Corollary 6.3.12 can be made in the following way. Suppose $C$ is a convex set in $E^n$ and $C = A \cup B$, where $A$ and $B$ are nonvoid disjoint relatively open sets in $C$. Let $x \in A$ and $y \in B$, and set

$$f(t) = (1-t)x + ty, \quad \text{for } t \in [0,1].$$

The function $f$ is continuous and since its domain is connected its range is connected. But $\mathfrak{R}(f) = [\mathfrak{R}(f) \cap A] \cup [\mathfrak{R}(f) \cap B]$, where clearly, $\mathfrak{R}(f) \cap A$ and $\mathfrak{R}(f) \cap B$ are disjoint, nonvoid, relatively open sets in $\mathfrak{R}(f)$. This is a contradiction.

We think it is clear that the procedure we have just given can be generalized in a very simple way.

6.4.12 Definition. A set $A \subset E^n$ is said to be arcwise connected $\iff$ $\forall x, y \in A$, there exists a continuous function $f$, with domain $[0,1]$ and range in $A$, so that $f(0) = x$ and $f(1) = y$.

6.4.13 Proposition. Every arcwise connected set is connected.

The details of the proof are essentially the same as the proof we have just given above for convex sets, and we shall leave it as an exercise.
As an example of how Proposition 6.4.13 can be used, let us consider the ring in $E^2$, which is the set

$$A = \{x : 0 < r_1 \le |x| \le r_2\}.$$

Let $x, y \in A$ and let $\theta$ and $\varphi$ be numbers so that

$$x^1 = |x| \cos\theta, \quad x^2 = |x| \sin\theta,$$
$$y^1 = |y| \cos\varphi, \quad y^2 = |y| \sin\varphi.$$

Let $f = (f^1, f^2)$ be the continuous function with domain $[0,1]$ and range in $A$ defined as follows:

$$f^1(t) = |x| \cos[(1-2t)\theta + 2t\varphi], \quad f^2(t) = |x| \sin[(1-2t)\theta + 2t\varphi], \quad 0 \le t \le 1/2,$$
$$f(t) = 2(1-t)\,|x|\,\frac{y}{|y|} + (2t-1)\,y, \quad 1/2 \le t \le 1.$$

FIGURE 6.4.1

This is a continuous function with $f(0) = x$, $f(1) = y$. Hence $A$ is arcwise connected. The diagram of $A$ and the range of $f$ is given in Fig. 6.4.1.
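The path in the ring can be traced numerically. The Python sketch below is an illustration only, under the assumption that the arc is the one described above: a rotation along the circle of radius $|x|$ for $0 \le t \le 1/2$, followed by a radial segment to $y$ for $1/2 \le t \le 1$. The function name `ring_path` and the choice of `atan2` to obtain the angles $\theta$ and $\varphi$ are our own.

```python
import math

def ring_path(x, y, t):
    """Point f(t) on the arc from x to y inside the ring, for 0 <= t <= 1."""
    rx, ry = math.hypot(*x), math.hypot(*y)          # |x| and |y|
    theta = math.atan2(x[1], x[0])                   # angle of x
    phi = math.atan2(y[1], y[0])                     # angle of y
    if t <= 0.5:
        # Rotate along the circle of radius |x| from angle theta to angle phi.
        angle = (1 - 2 * t) * theta + 2 * t * phi
        return (rx * math.cos(angle), rx * math.sin(angle))
    # Radial part: f(t) = 2(1 - t)|x| y/|y| + (2t - 1) y, a multiple of y.
    scale = 2 * (1 - t) * rx / ry + (2 * t - 1)
    return (scale * y[0], scale * y[1])

# Sample the path between two points of the ring 1 <= |x| <= 3.
samples = [ring_path((1.0, 0.0), (0.0, -3.0), i / 10) for i in range(11)]
print(samples[0], samples[-1])
```

On the radial part the norm of $f(t)$ is $2(1-t)|x| + (2t-1)|y|$, a convex combination of $|x|$ and $|y|$, which is why the path never leaves the ring.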
There is a partial converse to Proposition 6.4.13, the following.

6.4.14 Proposition. Every open connected set is arcwise connected.

Proof. Exercise 9 at the end of this section.


In general, it is not true that a connected set is arcwise connected. As an example, let us consider the set $A = A_1 \cup A_2 \subset E^2$, where

$$A_1 = \{(0,y) : |y| \le 1\},$$
$$A_2 = \{(x,y) : y = \sin(1/x),\ 0 < x \le 1\}.$$

In other words, $A_1$ is a line segment along the $y$ axis and $A_2$ is the graph of the function given by $\sin(1/x)$ for $0 < x \le 1$. The set $A$ is not arcwise connected. For suppose $g = (g^1, g^2)$ is a continuous function defined on $[0,1]$ and whose range is in $A$, $g(0) = (0,y)$ and $g(1) = (x, \sin(1/x))$. Let

$$B = \{t : g(t) \in A_1\}, \quad t_0 = \sup B.$$

Since $A_1$ is closed and $g$ is continuous, $B$ is closed. Hence $t_0 \in B$, and since $A_1 \cap A_2 = \emptyset$ and $g(1) \in A_2$, it follows that $t_0 \neq 1$.

Since $g^1$ is continuous and $]t_0, 1]$ is connected, it follows that the range of $g^1$ restricted to $]t_0, 1]$ is an interval whose closure is, say, $[a,b]$. It is clear that $a = 0$, since otherwise by continuity $g^1(t_0) > 0$. Now as $t \searrow t_0$, we have that $x = g^1(t)$ ranges over an interval to the right of zero and moreover $g(t) = (g^1(t), \sin(1/g^1(t))) \to g(t_0)$. But this is impossible, since $\sin(1/x)$ has no limit as $x \to 0$.
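The failure of $\sin(1/x)$ to have a limit at $0$ can be seen concretely. The Python sketch below is only an illustration: the helper name `peaks_and_troughs` is our own, and it simply lists points tending to $0$ at which $\sin(1/x)$ equals $+1$ and points tending to $0$ at which it equals $-1$.

```python
import math

def peaks_and_troughs(k):
    """Return (x_plus, x_minus) with sin(1/x_plus) = 1 and sin(1/x_minus) = -1.

    Both points lie in ]0, 1] and tend to 0 as k grows.
    """
    x_plus = 1.0 / (math.pi / 2 + 2 * math.pi * k)
    x_minus = 1.0 / (3 * math.pi / 2 + 2 * math.pi * k)
    return x_plus, x_minus

for k in (1, 10, 100):
    x_plus, x_minus = peaks_and_troughs(k)
    print(x_plus, math.sin(1 / x_plus), x_minus, math.sin(1 / x_minus))
```

Since both sequences of points converge to $0$ while the function values stay at $+1$ and $-1$, no single limit is possible.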
On the other hand, the set $A$ is connected. For suppose $A = U \cup V$, where $U$ and $V$ are disjoint, nonvoid, relatively closed (and open) sets in $A$. If $A_1 \cap U \neq \emptyset$, then $A_1 \subset U$. Otherwise the sets $U_1 = U \cap A_1$ and $V_1 = V \cap A_1$ would disconnect $A_1$. By the same token, since $A_2$ is the range of a continuous function with domain a connected set, it is connected and hence must be entirely in $U$ or $V$. If $A_1 \cup A_2 \subset U$, we contradict the fact that $V$ is nonvoid. The same kind of statement is true if $U$ and $V$ are interchanged. If $A_1 = U$ and $A_2 = V$, or vice versa, since $A_2$ is not relatively closed in $A$, we again get a contradiction.

PEANO CURVES

As another example of the strange behavior that some continuous functions may exhibit, we shall give a continuous function whose domain is $[0,1]$ and whose range is the entire interval $[0,1] \times [0,1]$. Any continuous function with this property is called a Peano curve.

The function we shall give is a slight modification of the function given in Exercise 7 of Section 3.3. Let $C$ be Cantor's set (Section 3.3). Recall that $C$ is the set of all $x \in [0,1]$ that can be written in the form

$$x = \sum_{k=1}^{\infty} \frac{x_k}{3^k},$$

where $\forall k \in N$, $x_k = 0$ or $x_k = 2$. Set

$$f^1(x) = \sum_{k=1}^{\infty} \frac{x_{2k-1}}{2^{k+1}}, \qquad f^2(x) = \sum_{k=1}^{\infty} \frac{x_{2k}}{2^{k+1}}, \qquad f(x) = (f^1(x), f^2(x)).$$

We think it is clear that $\mathfrak{R}(f^1) = \mathfrak{R}(f^2) = [0,1]$ (see Theorem 3.3.2) and that $\mathfrak{R}(f) = [0,1] \times [0,1]$. To show that $f$ is continuous, it is enough to show that $f^1$ and $f^2$ are continuous.

Let $\varepsilon_n = 1/2^n$ and $\delta_n = 1/3^{2(n+1)}$. Suppose that $x$ and $a$ are in Cantor's set and

$$|x - a| < \delta_n.$$

Clearly, we must have $x_k = a_k$ for $k \in (1, 2(n+1))$ in order that this inequality be satisfied. Hence

$$|f^1(x) - f^1(a)| = \left| \sum_{k=n+2}^{\infty} \frac{x_{2k-1} - a_{2k-1}}{2^{k+1}} \right| \le \sum_{k=n+2}^{\infty} \frac{1}{2^k} = \frac{1}{2^{n+1}} < \varepsilon_n.$$

Similarly, we get

$$|f^2(x) - f^2(a)| = \left| \sum_{k=n+2}^{\infty} \frac{x_{2k} - a_{2k}}{2^{k+1}} \right| \le \frac{1}{2^{n+1}} < \varepsilon_n.$$

These inequalities mean that if $x, a \in C$, then

$$|x - a| < \delta_n \Rightarrow |f^1(x) - f^1(a)| < \varepsilon_n \ \text{ and } \ |f^2(x) - f^2(a)| < \varepsilon_n.$$

Since $\forall \varepsilon > 0$, $\exists n$ so that $1/2^n < \varepsilon$, we have established the continuity of $f$.

Clearly $f$ cannot be a one-to-one function. For if it were, $f^{-1}$ would be continuous (Exercise 4) and thus since $[0,1] \times [0,1]$ is connected, it would follow that $C$ is connected, which is clearly false.
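The map on Cantor's set can be evaluated exactly for points with finitely many nonzero ternary digits. The Python sketch below is an illustration only. It assumes the digit convention described above (digits $x_k \in \{0, 2\}$, odd-indexed digits feeding the first coordinate and even-indexed digits feeding the second, each with weight $1/2^{k+1}$), and the names `cantor_point` and `peano_on_cantor` are our own. Finite digit lists stand for points whose remaining digits are all zero.

```python
from fractions import Fraction

def cantor_point(digits):
    """x = sum of digits[k-1] / 3**k for ternary digits in {0, 2}."""
    return sum(Fraction(d, 3 ** k) for k, d in enumerate(digits, start=1))

def peano_on_cantor(digits):
    """Return (f1, f2) for the Cantor point with the given ternary digits."""
    if any(d not in (0, 2) for d in digits):
        raise ValueError("Cantor set digits must be 0 or 2")
    odd = digits[0::2]    # x_1, x_3, x_5, ...
    even = digits[1::2]   # x_2, x_4, x_6, ...
    f1 = sum(Fraction(d, 2 ** (k + 1)) for k, d in enumerate(odd, start=1))
    f2 = sum(Fraction(d, 2 ** (k + 1)) for k, d in enumerate(even, start=1))
    return f1, f2

a_digits = [2, 0, 0, 2]
x_digits = [2, 0, 0, 2, 2, 2, 2, 2]   # agrees with a_digits in the first 4 digits
print(cantor_point(a_digits), peano_on_cantor(a_digits))
print(cantor_point(x_digits), peano_on_cantor(x_digits))
```

With $n = 1$ the two sample points agree in their first $2(n+1) = 4$ digits, and each coordinate of their images differs by at most $1/2^{n+1}$, in line with the estimate used in the continuity argument.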

Now, it is a very easy matter to extend $f$ to a continuous function $g$ with domain $[0,1]$ and range the same as the range of $f$. Indeed, since $C^c$ is open, it is the countable union of pairwise disjoint open intervals (Corollary 6.3.16). If $]a,b[$ is one of these intervals, then $a, b \in C$. Consequently, on $[a,b]$ we define $g$ so that its range is the line segment joining $f(a)$ and $f(b)$:

$$g(x) = \frac{1}{b-a}\,[(x-a)f(b) + (b-x)f(a)].$$

Of course, $\forall x \in C$ we take $g(x) = f(x)$. Since $[0,1] \times [0,1]$ is a convex set, it follows that $\mathfrak{R}(g)$ is contained in this square, and indeed is all of this square.

Clearly, $g$ is continuous at every point of $C^c$. If $a \in C$, and is a left end point of an interval in $C^c$, it is clear that

$$\lim_{x \searrow a} g(x) = g(a).$$

If $a$ is not a left end point of an interval in $C^c$ and $a \neq 1$, then it is a left accumulation point of $C$. Now, $\forall \varepsilon > 0$, $\exists \delta$ so that $x \in C$ and $0 \le x - a < \delta \Rightarrow |f(x) - f(a)| < \varepsilon$. Suppose $b, c \in C$, $0 < b - a < \delta$, $0 < c - a < \delta$, and $]b,c[\, \subset C^c$. Then $\forall x \in \,]b,c[$,

$$g(x) - g(a) = \frac{1}{c-b}\,[(x-b)(f(c) - f(a)) + (c-x)(f(b) - f(a))],$$

and consequently

$$|g(x) - g(a)| < \frac{1}{c-b}\,[(x-b) + (c-x)]\,\varepsilon = \varepsilon.$$

Hence, we have in this case also

$$\lim_{x \searrow a} g(x) = g(a).$$

A similar argument shows that we also always have

$$\lim_{x \nearrow a} g(x) = g(a).$$

This shows $g$ is continuous.

Exercises

1. Suppose $f$ is a continuous function with domain the cube $C = \{x : |x^k| < 1,\ k \in (1,n)\}$ and range in $E^m$. Let $a \in C$ and define a function $f_k$ on $]-1, 1[$ by means of the equality

$$f_k(t) = f(a + (t - a^k)e_k).$$

Show that $f_k$ is continuous.

2. Let $f$ be a real-valued function defined on $E^2$ as follows:

$$f(x,y) = \begin{cases} \dfrac{xy}{x^2 + y^2}, & (x,y) \neq (0,0), \\ 0, & (x,y) = (0,0). \end{cases}$$

Let $a$ be fixed and define $f_1$ and $f_2$ as in Exercise 1, and show that these functions are continuous. Show that $f$ is not continuous at $(0,0)$. (Hint: Look at the set of points which satisfy the equation $x = y$.) This problem gives an example of a function of two variables that is continuous in each variable separately but is not continuous in both variables simultaneously.

3. Discuss the functions prescribed in the following with respect to continuity in each variable separately and continuity in both variables (see Exercise 2):

(a)

(b)

(c)

(d)

{
{
r
{

f(x,y)=

f(x,y)=

f(x,y) =

f(x,y)

x2 + y

4'

0,

x'+y'
lxl+IYI'

0,

-y
lxl + IYI'

0,

lxl

IYI'
0,

(x,y)#(O,O),
(x,y) = (O,O).
(x,y)#(O,O),
(x,y)=(O,O).
(x,y) #(O,O),
(x,y)=(O,O).
(x,y) # (0,0),
(x,y)= (0,0).

4. If $f$ is a one-to-one continuous function and $\mathfrak{D}(f)$ is compact, show that $f^{-1}$ is continuous.

5. (a) If $g$ is continuous at $a$ and $f$ is continuous at $g(a)$, show that $f \circ g$ (see Definition 1.3.5) is continuous at $a$.
(b) If $f$ is continuous, show that $|f|$ is continuous. Is the converse true?

6. If $f$ is uniformly continuous, show that $f$ may be extended to a continuous function having domain $\overline{\mathfrak{D}(f)}$, the closure of $\mathfrak{D}(f)$.

7. If $f$ is uniformly continuous and $\mathfrak{D}(f)$ is bounded, show that $\mathfrak{R}(f)$ is bounded. Give examples which show that this may not remain true if the hypothesis of uniform continuity or the hypothesis of bounded domain is removed.

8. Suppose $f$ is a function with domain all of $E^n$ and $\exists p \in E^n$, $p^k \neq 0$, $\forall k \in (1,n)$, so that $\forall x \in E^n$, $f(x + p) = f(x)$. If $f$ is continuous, show that it is uniformly continuous.

9. Show that every open connected set in $E^n$ is arcwise connected.

10. Let $C$ be the cube in $E^2$ given by $C = \{x : |x^k| \le 1,\ k = 1, 2\}$ and $A \subset C$, the set of points with rational components. Is $C \setminus A$ connected or disconnected?

11. Show the existence of a Peano curve with range the unit cube in $E^n$.

12. Show that no Peano curve (range $[0,1] \times [0,1]$) can be one-to-one. (Hint: Consider how two intersecting perpendicular line segments in $[0,1] \times [0,1]$ would map under $f^{-1}$.)

13. Designate the elements of $E^{n+m}$ by $(x,y)$, where $x \in E^n$ and $y \in E^m$. Define a function $P$ with domain $E^{n+m}$ and range $E^n$ by

$$P(x,y) = (x,0).$$

Show that $P$ is a continuous open map. Does $P$ necessarily map closed sets onto closed sets?

14. Use the results of Exercise 13 (as far as possible) to obtain the results of Exercises 8, 9, and 10 of Section 6.3.

15. If $A \subset E^n$ and $B \subset E^m$ and $A \times B$ is connected, use the results of Exercise 13 to show that $A$ and $B$ are connected.

6.5 LINEAR TRANSFORMATIONS

Many important questions in the theory of functions defined on domains in higher-dimensional space reduce to questions about a special class of uniformly continuous functions, the linear transformations.

6.5.1 Definition. A function $T$ with domain a linear subspace $L \subset V^n$ and range in $V^m$ is said to be a linear transformation $\iff$ $\forall \alpha, \beta \in R$ and $\forall x, y \in L$,

$$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y).$$

It is an immediate consequence of this definition that $T(0) = 0$. Indeed, if $\alpha \in R$, $T(0) = T(\alpha 0) = \alpha T(0)$; choosing $\alpha = 0$ we have the result. It is also an immediate consequence of the definition that the range of a linear transformation is a linear subspace of $V^m$.

6.5.2 Theorem. A linear transformation $T$ is one-to-one (nonsingular) $\iff$ the range of $T$ has the same dimension as the domain of $T$.

Proof. Suppose $\{u_k : k \in (1,r)\}$ is a basis for $\mathfrak{D}(T)$. Since $T$ is linear, every element in $\mathfrak{R}(T)$ is a linear combination of the vectors in the set $\{T(u_k) : k \in (1,r)\}$. Hence, if under the assumption that $T$ is one to one we can show that this latter set is linearly independent, we will have proved the necessity.

If $T$ is one to one it follows that $T(u_k) \neq 0$ $\forall k \in (1,r)$, since $0$ is the only vector taken into $0$. Suppose that

$$\sum_{k=1}^{r} \alpha_k T(u_k) = T\left( \sum_{k=1}^{r} \alpha_k u_k \right) = 0.$$

Since $T$ is one to one, we get

$$\sum_{k=1}^{r} \alpha_k u_k = 0,$$

and since $\{u_k : k \in (1,r)\}$ is a linearly independent set we have $\alpha_k = 0$, $\forall k \in (1,r)$.

Conversely, suppose the range of $T$ has the same dimension as its domain. We must show that $T(x) = T(y) \Rightarrow x = y$. Using the linearity of $T$, this is equivalent with showing that $T(x) = 0 \Rightarrow x = 0$. Let us write

$$x = \sum_{k=1}^{r} \alpha_k u_k;$$

then

$$T(x) = 0 \Rightarrow \sum_{k=1}^{r} \alpha_k T(u_k) = 0.$$

Now, by hypothesis, the dimension of $\mathfrak{R}(T)$ is $r$, and, as we have already noted, $\mathfrak{R}(T)$ is generated by the set $\{T(u_k) : k \in (1,r)\}$. Consequently, this latter set is linearly independent and $\forall k \in (1,r)$, $\alpha_k = 0$. Hence $x = 0$ and the proof of the theorem is complete.

6.5.3 Definition. The dimension of the range of a linear transformation is called the rank of the linear transformation.

Suppose $L$ and $M$ are linear subspaces in $V^n$ and $V^m$, respectively, and $S$ and $T$ are linear transformations each having domain $L$ and range in $M$. If for $\alpha \in R$ we define

$$(\alpha T)(x) = \alpha T(x),$$
$$(S + T)(x) = S(x) + T(x),$$

then these two operations will make the set of all linear transformations with domain $L$ and range in $M$ into a vector space. Indeed, this vector space is finite-dimensional, and if $p$ is the dimension of $L$ and $q$ is the dimension of $M$, then its dimension is $pq$. We have asked the reader to verify these facts in Exercise 4 at the end of this section.

There is a very useful representation for linear transformations by means of matrices. It is at this point that it becomes important to consider a basis of a vector space as a function rather than as a set (see the discussion in Section 6.1). We shall use the terminology ordered basis for such a function.


Suppose

A is a linear transformation with domain L and range in


(u1,
,up ) and (v 1,
,Vq) be ordered bases

the linear space M. Let

for L and M, respectively. Then we may write

A(uk)

2:

j=l

ajkVj,

kE(l,p).

The array of numbers

a21

a12
a22

a,.
a2P

aql

aq2

aqp

[""

matrix. More specifically it is called a matrix repre


sentation for A with respect to the ordered pair of ordered bases
((u1, ,up) , (v1, , vq)) . From a very formal point of view, a
is called a

qXp

q Xp

matrix is a function with domain (I, q)

Suppose
N. Let

( w1,

X (I,p)

and range in R.

B is a linear transformation with domain M and range in

, wr

be an ordered basis for N and let us compute the

r X p matrix representation of B A with respect to the ordered pair


((ui.
,up), ( w1, , wr)) . Let the matrix entries of B with respect
to the ordered pair of ordered bases ((v1,
Vq) , (w1,
, Wr) ) be
denoted by b;k Then we have
0

A (uk)

2:

j=l

a;kB(v;)

kE(l,p).

6.5

LINEAR TRANSFORMATIONS I 259

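It follows that if $c_{lk}$ is in the lth row and kth column of the matrix representation of $B \circ A$ with respect to the given bases, then

$$c_{lk} = \sum_{j=1}^{q} b_{lj} a_{jk}.$$

This serves to define a multiplication of matrices,

$$\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1q} \\ b_{21} & b_{22} & \cdots & b_{2q} \\ \vdots & \vdots & & \vdots \\ b_{r1} & b_{r2} & \cdots & b_{rq} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{q1} & a_{q2} & \cdots & a_{qp} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1p} \\ c_{21} & c_{22} & \cdots & c_{2p} \\ \vdots & \vdots & & \vdots \\ c_{r1} & c_{r2} & \cdots & c_{rp} \end{bmatrix}.$$

Thus we see that we get $c_{lk}$ by "multiplying" the lth row of the matrix of B with the kth column of the matrix of A; that is, $b_{lj}$ is multiplied by $a_{jk}$ and then the result is summed over j to get $c_{lk}$.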
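The row-by-column rule can be checked numerically. The following is a minimal sketch, assuming the standard bases so that a matrix acts on coordinate lists; the helper names `apply` and `matmul` and the sample matrices are illustrative choices, not taken from the text.

```python
def apply(matrix, x):
    """Apply the linear map whose matrix is `matrix` (a list of rows) to the coordinates x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in matrix]


def matmul(b, a):
    """Matrix product: c[l][k] is the sum over j of b[l][j] * a[j][k]."""
    q = len(a)
    assert all(len(row) == q for row in b), "columns of b must match rows of a"
    p = len(a[0])
    return [[sum(b[l][j] * a[j][k] for j in range(q)) for k in range(p)]
            for l in range(len(b))]


A = [[1, 2], [0, 1], [3, -1]]   # q = 3, p = 2: a map from a 2-dim space to a 3-dim space
B = [[2, 0, 1], [1, 1, 0]]      # r = 2, q = 3
C = matmul(B, A)                # matrix of the composite B o A
x = [5, -2]

# Applying the product matrix agrees with applying A first and then B.
assert apply(C, x) == apply(B, apply(A, x))
```

The final assertion is the statement that the matrix of the composite is the product of the matrices, checked on one sample vector.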
Suppose now that A is a linear transformation with domain a linear subspace $L \subset V^n$ and range in L. Let $\mathcal{U} = (u_1, \dots, u_p)$ be an ordered basis for L and let us designate the matrix representation of A with respect to the pair $(\mathcal{U}, \mathcal{U})$ by $[a_{jk}]$. We want to investigate the question of how the matrix representation changes when we pick a new ordered basis $\mathcal{U}' = (u'_1, \dots, u'_p)$ for L and get a matrix representation $[a'_{jk}]$ with respect to the pair $(\mathcal{U}', \mathcal{U}')$. From this point on, for the sake of simplicity, let us agree that if $[a_{jk}]$ is a matrix representation of A with respect to a pair of ordered bases $(\mathcal{U}, \mathcal{U})$ we shall say that A has the matrix representation $[a_{jk}]$ with respect to the basis $\mathcal{U}$.
Let Q be the linear transformation with domain and range L that is defined by

$$Q\Bigl(\sum_{k=1}^{p} \xi^k u_k\Bigr) = \sum_{k=1}^{p} \xi^k u'_k.$$

This means $Q(u_k) = u'_k$, and we may write

$$u'_k = Q(u_k) = \sum_{j=1}^{p} q_{jk} u_j.$$

Hence the matrix representation of Q with respect to the basis $\mathcal{U}$ is $[q_{jk}]$. Clearly, Q is nonsingular, since the dimension of $\mathcal{R}(Q)$ is the same as the dimension of $\mathcal{D}(Q)$. Let $Q^{-1}$ be the inverse of Q and suppose its matrix representation with respect to the basis $\mathcal{U}$ is $[r_{jk}]$.
Let us now compute the matrix representation of A with respect to the basis $\mathcal{U}'$. We have

$$A(u'_k) = A \circ Q(u_k) = \sum_{j=1}^{p} s_{jk} u_j,$$

where

$$s_{jk} = \sum_{l=1}^{p} a_{jl} q_{lk}.$$

Now,

$$Q^{-1}(u_j) = \sum_{l=1}^{p} r_{lj} u_l,$$

and if we compose both sides with Q, and note that $Q(u_l) = u'_l$, we get

$$u_j = \sum_{l=1}^{p} r_{lj} u'_l.$$

Using this in the expansion for $A(u'_k)$ we get

$$A(u'_k) = \sum_{l=1}^{p} \Bigl[\sum_{j=1}^{p} r_{lj} s_{jk}\Bigr] u'_l.$$

Hence, if we write $[q_{jk}]^{-1}$ for the matrix $[r_{jk}]$, we have

$$[a'_{jk}] = [q_{jk}]^{-1} [a_{jk}] [q_{jk}]. \tag{6.5.1}$$

In Section 6.6 we shall show how to compute the numbers $r_{jk}$ in terms of the entries of $[q_{jk}]$.
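As an illustration of formula (6.5.1), the sketch below changes basis for a 2 x 2 example using exact rational arithmetic. The matrices and the helper names `matmul` and `inverse_2x2` are hypothetical, chosen so that the new basis happens to consist of eigenvectors and the new representation comes out diagonal.

```python
from fractions import Fraction as F


def matmul(b, a):
    """Matrix product: c[l][k] is the sum over j of b[l][j] * a[j][k]."""
    return [[sum(b[l][j] * a[j][k] for j in range(len(a))) for k in range(len(a[0]))]
            for l in range(len(b))]


def inverse_2x2(m):
    """Inverse of a nonsingular 2 x 2 matrix by the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Representation of A with respect to the old basis (u_1, u_2).
a = [[F(2), F(1)], [F(0), F(3)]]
# Column k of q holds the old-basis coordinates of u'_k:
# here u'_1 = u_1 and u'_2 = u_1 + u_2.
q = [[F(1), F(1)], [F(0), F(1)]]

# Formula (6.5.1): [a'] = [q]^{-1} [a] [q].
a_new = matmul(inverse_2x2(q), matmul(a, q))
```

In this example `a_new` is the diagonal matrix with entries 2 and 3, which reflects that A(u'_1) = 2u'_1 and A(u'_2) = 3u'_2.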

Let us now turn our attention to properties of a linear transformation that are connected with the inner product, or, what is the same thing, with the length function on $V^n \times V^n$.

6.5.4 Theorem. Let T be a linear transformation whose domain is a linear subspace $L \subset E^n$ and range is in $E^m$. There exists an $M > 0$ so that $\forall x \in L$,

$$|T(x)| \le M|x|. \tag{6.5.2}$$

In particular this means that T is uniformly continuous.


Proof. Let $\{u_k : k \in (1, p)\}$ be any orthonormal basis for L. If $x \in L$, we may write

$$x = \sum_{k=1}^{p} \xi^k u_k, \qquad |x|^2 = \sum_{k=1}^{p} |\xi^k|^2.$$

If we apply the transform T we get

$$T(x) = \sum_{k=1}^{p} \xi^k T(u_k),$$

and taking the norm of both sides and using the triangle inequality we get

$$|T(x)| \le \sum_{k=1}^{p} |\xi^k|\,|T(u_k)|.$$

Now use the Cauchy-Bunjakovsky-Schwarz inequality on the right to give

$$|T(x)| \le \Bigl\{\sum_{k=1}^{p} |T(u_k)|^2\Bigr\}^{1/2} \Bigl\{\sum_{k=1}^{p} |\xi^k|^2\Bigr\}^{1/2} = \Bigl\{\sum_{k=1}^{p} |T(u_k)|^2\Bigr\}^{1/2} |x|.$$

This is the inequality (6.5.2).

Now, replace x by $x - y$ and use the linearity of T to give

$$|T(x) - T(y)| \le M|x - y|.$$

This proves the uniform continuity.
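The constant produced by the proof can be computed directly. The sketch below assumes L is all of E^3 with the standard orthonormal basis, so that T(e_k) is the kth column of the matrix of T; the matrix, the random sample, and the helper names are illustrative assumptions.

```python
import math
import random


def apply(matrix, x):
    """Apply the linear map with the given matrix (list of rows) to x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in matrix]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


T = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]   # a map from E^3 into E^2

# T(e_k) is the k-th column; M is the square root of the sum of |T(e_k)|^2.
columns = [[row[k] for row in T] for k in range(3)]
M = math.sqrt(sum(length(col) ** 2 for col in columns))

random.seed(0)
samples = [[random.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(1000)]

# The bound |T(x)| <= M |x| holds on every sample (a small tolerance absorbs rounding).
bound_holds = all(length(apply(T, x)) <= M * length(x) + 1e-12 for x in samples)
```

This M is generally not the smallest admissible constant; the smallest one is the norm of T defined in (6.5.3).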

6.5.5 Corollary. If T is a nonsingular linear transformation, then $\exists m > 0$, so that $\forall x \in \mathcal{D}(T)$,

$$m|x| \le |T(x)|.$$

Proof. Since $T^{-1}$ is a linear transformation, it follows by the previous theorem that $\exists M > 0$ so that $\forall y \in \mathcal{D}(T^{-1})$, $|T^{-1}(y)| \le M|y|$. But $\forall x \in \mathcal{D}(T)$, $\exists y \in \mathcal{D}(T^{-1})$, so that $T(x) = y$. Hence $|x| \le M|T(x)|$, and, taking $m = 1/M$, we are finished.

For any linear transformation T let us set

$$\|T\| = \inf\{M : \forall x \in \mathcal{D}(T),\ |T(x)| \le M|x|\}. \tag{6.5.3}$$

Clearly, $\forall x \in \mathcal{D}(T)$ we have

$$|T(x)| \le \|T\|\,|x|.$$


The real number defined by (6.5.3) is called the norm of T and it defines a distance function on the vector space of all linear transformations with domain a fixed space L and range in a fixed space M. For this distance function to be useful, the triangle inequality should be satisfied and this is seen as follows. Let S be another linear transformation with domain L and range in M. Then $\forall x \in L$ we have

$$|(T + S)(x)| \le |T(x)| + |S(x)| \le \{\|T\| + \|S\|\}\,|x|.$$

This shows immediately that

$$\|T + S\| \le \|T\| + \|S\|.$$

It is also a very simple matter to check that $\forall a \in R$,

$$\|aT\| = |a|\,\|T\|,$$


and that

We shall leave the proofs of these simple facts as an exercise.
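The two norm properties just stated can be observed numerically. The sketch below is only an estimate under stated assumptions: it works in the plane, and it approximates the norm by the largest value of |T(x)| over finitely many unit vectors (the text shows next that the norm equals the supremum of |T(x)| over all unit vectors). The matrices and helper names are hypothetical.

```python
import math


def apply(m, x):
    """Apply a 2 x 2 matrix to a vector in the plane."""
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]


def norm_estimate(m, steps=3600):
    """Largest |m(x)| over `steps` equally spaced unit vectors (a lower estimate of the norm)."""
    best = 0.0
    for i in range(steps):
        angle = 2.0 * math.pi * i / steps
        image = apply(m, [math.cos(angle), math.sin(angle)])
        best = max(best, math.hypot(image[0], image[1]))
    return best


def add(s, t):
    """Entrywise sum of two 2 x 2 matrices, the matrix of S + T."""
    return [[s[i][j] + t[i][j] for j in range(2)] for i in range(2)]


def scale(a, t):
    """Matrix of the transformation aT."""
    return [[a * t[i][j] for j in range(2)] for i in range(2)]


T = [[3.0, 0.0], [0.0, 1.0]]
S = [[0.0, -2.0], [2.0, 0.0]]

n_T, n_S = norm_estimate(T), norm_estimate(S)
n_sum = norm_estimate(add(T, S))
n_scaled = norm_estimate(scale(-2.0, T))
```

Because the same finite set of unit vectors is used for every matrix, the estimate itself satisfies the triangle inequality and the homogeneity relation up to rounding.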


There are other expressions for $\|T\|$ that are often very useful. For example,

$$\|T\| = \sup\{|T(x)| : |x| = 1\}. \tag{6.5.3'}$$

The proof of this is very simple. Indeed, if $|x| = 1$, $|T(x)| \le \|T\|$, and hence the right side of (6.5.3') is dominated by $\|T\|$. On the other hand, let $M_0$ be the right side of (6.5.3'). Then, $\forall x \ne 0$,

$$|T(x/|x|)| \le M_0.$$

From this it follows that $|T(x)| \le M_0|x|$ and hence $\|T\| \le M_0$. This shows the equality.

Another useful expression is

$$\|T\| = \sup\{|T(x) \cdot y| : |x| = |y| = 1\ \&\ y \in \mathcal{R}(T)\}. \tag{6.5.3''}$$

To prove this, let us first note that if L is a linear subspace of $E^n$, then $\forall z \in L$,

$$|z| = \sup\{|z \cdot y| : |y| = 1\ \&\ y \in L\}. \tag{6.5.4}$$

Indeed, if $M_1$ is the right side of (6.5.4) the Cauchy-Bunjakovsky-Schwarz inequality shows that $M_1 \le |z|$. On the other hand, if $z \ne 0$, take $y = z/|z|$. Then $|z| = z \cdot z/|z| \le M_1$, which shows equality. If $z = 0$, the equality (6.5.4) is obvious.


To prove (6.5.3''), let $M_0$ be the right side of that equation. From the Cauchy-Bunjakovsky-Schwarz inequality we get

$$|T(x) \cdot y| \le |T(x)| \le \|T\|,$$

for $|x| = |y| = 1$. Hence $M_0 \le \|T\|$. On the other hand, from (6.5.4) we get $\forall x$, so that $|x| = 1$,

$$|T(x)| = \sup\{|T(x) \cdot y| : |y| = 1\ \&\ y \in \mathcal{R}(T)\} \le M_0.$$

Hence, from (6.5.3'),

$$\|T\| = \sup\{|T(x)| : |x| = 1\} \le M_0.$$

6.5.6 Definition. A linear transformation with range in R is called a linear functional.

Linear functionals have very interesting and useful representations, as the next theorem will show.

6.5.7 Theorem. If A is a linear functional with domain $L \subset E^n$, then there exists a unique $y \in L$ so that $\forall x \in L$,

$$A(x) = x \cdot y.$$

Proof. Let $\{u_k : k \in (1, p)\}$ be an orthonormal basis for L. If $x = \sum_{k=1}^{p} \xi^k u_k \in L$, we may write

$$A(x) = \sum_{k=1}^{p} \xi^k A(u_k).$$

If we set

$$y = \sum_{k=1}^{p} A(u_k)\, u_k,$$

it is clear that $A(x) = x \cdot y$. Suppose $z \in L$, so that $\forall x \in L$, $A(x) = x \cdot z$. Then $\forall x \in L$, $x \cdot (y - z) = 0$. In particular, take $x = y - z$ and we get $|y - z| = 0$, which implies $y = z$.
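The construction of the representing vector in the proof of Theorem 6.5.7 is easy to carry out explicitly. The following is a minimal sketch, assuming L is all of E^3 with the standard orthonormal basis; the particular functional is a hypothetical example.

```python
def functional(x):
    """A sample linear functional on E^3 (hypothetical, chosen only for illustration)."""
    return 2 * x[0] - x[1] + 4 * x[2]


def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # an orthonormal basis of E^3

# y is the sum over k of A(u_k) u_k, exactly as in the proof.
y = [sum(functional(u) * u[i] for u in basis) for i in range(3)]

x = [3, 5, -2]
value_direct = functional(x)
value_via_y = dot(x, y)
```

Both values agree, which is the statement A(x) = x . y for this sample x.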

6.5.8 Theorem. Let A be a linear transformation with domain $L \subset E^n$ and range in $M \subset E^m$. Then there exists a unique linear transformation $A^t$ with domain M and range in L so that $\forall x \in L$ and $\forall y \in M$

$$A(x) \cdot y = x \cdot A^t(y). \tag{6.5.5}$$

Proof. Fix $y \in M$ and set $A_y(x) = A(x) \cdot y$. Since $A_y$ is a linear functional with domain L, it follows from the last theorem that $\exists$ a unique $y^t \in L$ so that $\forall x \in L$

$$A(x) \cdot y = x \cdot y^t. \tag{6.5.5'}$$

Since $A_{\alpha y + \beta z}(x) = \alpha A_y(x) + \beta A_z(x)$ it follows that

$$(\alpha y + \beta z)^t = \alpha y^t + \beta z^t.$$

Hence, if we set $A^t(y) = y^t$, it follows that $A^t$ is a linear transformation with domain M and range in L. Since the $y^t$ for which (6.5.5') holds is unique, it follows that there is only one linear transformation $A^t$ for which (6.5.5) holds.

6.5.9 Theorem. If A is a linear transformation with domain L and range in M, if $N(A) = \{x : A(x) = 0\}$ is the null space of A, and if $N(A)^{\perp}$ is the orthogonal complement of $N(A)$ in L, then

$$N(A)^{\perp} = \mathcal{R}(A^t).$$

Proof. For every $x \in N(A)$ and $\forall y \in M$, we have

$$A(x) \cdot y = x \cdot A^t(y) = 0.$$

Thus $\mathcal{R}(A^t) \subset N(A)^{\perp}$. On the other hand, suppose $\exists z \in N(A)^{\perp} \setminus \mathcal{R}(A^t)$. Because $\mathcal{R}(A^t) \subset N(A)^{\perp}$, without loss of generality we may suppose that $z \in \mathcal{R}(A^t)^{\perp}$. Thus we have $\forall y \in M$,

$$A(z) \cdot y = z \cdot A^t(y) = 0,$$

from which it follows, upon setting $y = A(z)$, that $A(z) = 0$, and hence $z \in N(A) \cap N(A)^{\perp}$. It follows that $z = 0$, which is a contradiction. Hence

$$N(A)^{\perp} = \mathcal{R}(A^t).$$

6.5.10 Corollary. For every linear transformation A,

$$\operatorname{rank} A = \operatorname{rank} A^t.$$

Proof. The range of A is clearly the same as the range of $A|N(A)^{\perp}$. Since $A|N(A)^{\perp}$ is a nonsingular linear transformation, we have

$$\operatorname{rank} A = \dim N(A)^{\perp} = \dim \mathcal{R}(A^t) = \operatorname{rank} A^t.$$

The linear transformation $A^t$ is called the transpose of A with respect to the space M. If A has the matrix representation $[a_{jk}]$ with respect to the ordered orthonormal bases $((u_1, \dots, u_p), (v_1, \dots, v_q))$, it is interesting and useful to compute the matrix representation of $A^t$ with respect to the bases $((v_1, \dots, v_q), (u_1, \dots, u_p))$. We have

$$A(u_k) = \sum_{j=1}^{q} a_{jk} v_j,$$

and thus

$$A(u_k) \cdot v_j = a_{jk}.$$

On the other hand,

$$A^t(v_j) = \sum_{k=1}^{p} a^t_{kj} u_k$$

and hence

$$A^t(v_j) \cdot u_k = a^t_{kj}.$$

Thus

$$a^t_{kj} = a_{jk}.$$

The matrix $[a^t_{kj}]$ is called the transpose of the matrix $[a_{jk}]$ and is a $p \times q$ matrix. It is usually denoted by $[a_{jk}]^t$.
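The defining relation (6.5.5) can be tested on coordinates once the matrix of the transpose is formed by interchanging rows and columns. This is a minimal sketch with respect to the standard orthonormal bases; the sample matrix, vectors, and helper names are illustrative.

```python
def apply(matrix, x):
    """Apply the linear map with the given matrix (list of rows) to x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in matrix]


def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def transpose(matrix):
    """Interchange rows and columns: the (k, j) entry becomes the (j, k) entry."""
    return [list(column) for column in zip(*matrix)]


A = [[1, 2], [0, 1], [3, -1]]   # q = 3 rows, p = 2 columns: a map from E^2 into E^3
At = transpose(A)               # p x q matrix of the transpose

x = [5, -2]        # a vector in the domain of A
y = [1, 4, -3]     # a vector in the space containing the range of A

lhs = dot(apply(A, x), y)    # A(x) . y
rhs = dot(x, apply(At, y))   # x . A^t(y)
```

The two inner products coincide, as (6.5.5) requires.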

The rank of a linear transformation can be computed from its matrix representation with respect to any ordered pair of ordered bases. The precise facts are given in the following proposition.

6.5.11 Proposition. If $[a_{jk}]$ is the $q \times p$ matrix representation of a linear transformation A with respect to any ordered pair of ordered bases, then the rank of A is the dimension of the linear subspace in $V^q$ generated by the column vectors $\{a_k = (a_{1k}, \dots, a_{qk}) : k \in (1, p)\}$, which is the same as the dimension of the linear subspace of $V^p$ generated by the row vectors $\{b_k = (a_{k1}, \dots, a_{kp}) : k \in (1, q)\}$.

Proof. Suppose $(u_1, \dots, u_p)$ is an ordered basis for $\mathcal{D}(A)$ and $(v_1, \dots, v_q)$ is an ordered basis for a linear space that contains $\mathcal{R}(A)$. If r is the rank of A, then there is a set $\{A(u_{k_i}) : i \in (1, r)\}$ of r linearly independent vectors that generate $\mathcal{R}(A)$. Suppose that $\{\alpha_i : i \in (1, r)\} \subset R$ and

$$\sum_{i=1}^{r} \alpha_i a_{k_i} = 0.$$

Since

$$A(u_{k_i}) = \sum_{j=1}^{q} a_{jk_i} v_j, \qquad \sum_{i=1}^{r} \alpha_i A(u_{k_i}) = \sum_{j=1}^{q} \Bigl(\sum_{i=1}^{r} \alpha_i a_{jk_i}\Bigr) v_j = 0,$$

it follows that $\forall i \in (1, r)$, $\alpha_i = 0$. Thus the vectors $\{a_{k_i} : i \in (1, r)\}$ are linearly independent in $V^q$. To show that these vectors generate the same space as the vectors $\{a_k : k \in (1, p)\}$, we first note that $\forall u_k$ there exist numbers $\{\beta_{ik} : i \in (1, r)\}$ so that

$$A(u_k) = \sum_{i=1}^{r} \beta_{ik} A(u_{k_i}).$$

Hence

$$a_{jk} = \sum_{i=1}^{r} \beta_{ik} a_{jk_i}, \qquad j \in (1, q).$$

But this says that

$$a_k = \sum_{i=1}^{r} \beta_{ik} a_{k_i},$$

which proves the assertion about the vectors $\{a_k : k \in (1, p)\}$.

To prove the assertion about the vectors $\{b_k : k \in (1, q)\}$, let $(e_1, \dots, e_p)$ be an ordered orthonormal basis for $E^p$ and $(f_1, \dots, f_q)$ be an ordered orthonormal basis for $E^q$. Let B be the linear transformation with domain $E^p$, range in $E^q$, and whose matrix representation with respect to $((e_1, \dots, e_p), (f_1, \dots, f_q))$ is $[a_{jk}]$. By what we have proved in the previous paragraph rank B = rank A. Also, we know from Corollary 6.5.10 that rank $B^t$ = rank B. The matrix representation of $B^t$ with respect to $((f_1, \dots, f_q), (e_1, \dots, e_p))$ is a $p \times q$ matrix with column vectors the set $\{b_k : k \in (1, q)\}$. Thus from the first paragraph of the proof, the dimension of the linear space generated by these vectors in $V^p$ is rank $B^t$ = rank A. This completes the proof.
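The equality of the dimensions of the row space and the column space can be observed on a small example. The sketch below is one possible way to compute a rank, using Gaussian elimination over exact rationals (a technique not used in the text); the function names and the sample matrix are assumptions for illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Dimension of the space generated by the rows, found by Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def transpose(matrix):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*matrix)]


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]   # the second row is twice the first

row_rank = rank(A)                 # dimension generated by the row vectors
column_rank = rank(transpose(A))   # dimension generated by the column vectors
```

The column rank is obtained by applying the same row procedure to the transposed matrix, which mirrors the use of Corollary 6.5.10 in the proof.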



SPECIAL LINEAR TRANSFORMATIONS

Projections. Suppose M is a linear subspace of $E^n$ and L is a linear subspace of M. If $L^{\perp}$ is the orthogonal complement of L in M, then $\forall x \in M$ there is a unique $y \in L$ and a unique $y^{\perp} \in L^{\perp}$ so that

$$x = y + y^{\perp}.$$

The last statement is just Proposition 6.2.10(b). Let us define a linear transformation P with domain M and range L by the equation

$$P(x) = y.$$

The linear transformation P is called the projection of M onto L. It has the following properties:

(a) $x \in L \Rightarrow P(x) = x$.
(b) $P^2 = P \circ P = P$.
(c) $P = P^t$.

We shall leave these simple facts as an exercise for the reader.


If $\{u_k : k \in (1, r)\}$ is an orthonormal basis for the linear space L, it is a simple matter to compute $P(x)$ in terms of this basis. Indeed, let us write

$$P(x) = \sum_{k=1}^{r} \alpha_k u_k.$$

If we take the dot product of both sides with respect to $u_k$, we get

$$\alpha_k = P(x) \cdot u_k = x \cdot u_k.$$

The last equality follows from the facts that $P^t = P$ and $P(u_k) = u_k$. Hence

$$P(x) = \sum_{k=1}^{r} (x \cdot u_k)\, u_k.$$
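The formula for P(x) in terms of an orthonormal basis translates directly into a computation. The following is a minimal sketch for a two-dimensional subspace of E^3; the basis vectors, the sample point, and the helper names are hypothetical choices.

```python
import math


def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def project(x, orthonormal_basis):
    """P(x) = sum over k of (x . u_k) u_k for an orthonormal basis (u_1, ..., u_r) of L."""
    coefficients = [dot(x, u) for u in orthonormal_basis]
    return [sum(c * u[i] for c, u in zip(coefficients, orthonormal_basis))
            for i in range(len(x))]


s = 1.0 / math.sqrt(2.0)
basis = [[s, s, 0.0], [0.0, 0.0, 1.0]]   # orthonormal basis of a plane L in E^3

x = [3.0, 1.0, 5.0]
px = project(x, basis)                       # approximately [2, 2, 5]
ppx = project(px, basis)                     # projecting again changes nothing (P o P = P)
residual = [a - b for a, b in zip(x, px)]    # x - P(x) lies in the orthogonal complement
```

The residual is orthogonal to every basis vector of L, which is the decomposition of x into a part in L and a part in the orthogonal complement.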

Let us use the idea of projection to obtain the spherical representation of a vector in $E^n$. Let $P_1$ be the projection of $E^n$ onto the linear subspace $\{x : x \in E^n\ \&\ x^1 = 0\}$. $P_1$ may also be described as the projection of $E^n$ onto the space generated by the vectors $\{e_k : k \in (2, n)\}$. Here we are taking $e_k = (e_k^1, \dots, e_k^n)$ where $e_k^j = 0$ if $j \ne k$, and $e_k^k = 1$. Let $P_2$ be the projection of $E^n$ onto the linear subspace $\{x : x \in E^n\ \&\ x^1 = x^2 = 0\}$. The projection $P_2$ may also be described as the projection of $E^n$ onto the space generated by $\{e_k : k \in (3, n)\}$. Note that $\mathcal{R}(P_2) = \mathcal{R}(P_2|\mathcal{R}(P_1))$, so that $P_2$ restricted to $\mathcal{R}(P_1)$ is the projection of the latter subspace onto the subspace $\mathcal{R}(P_2)$. In general, let $P_j$, $j \in (1, n-1)$, be the projection of $E^n$ onto the subspace $\{x : x \in E^n\ \&\ x^1 = \cdots = x^j = 0\}$. Clearly, the last subspace is the space generated by $\{e_k : k \in (j+1, n)\}$, and $\mathcal{R}(P_j) = \mathcal{R}(P_j|\mathcal{R}(P_{j-1}))$, $j \in (2, n-1)$.

The vector $(t^1, 0, \dots, 0)$ is the projection of t onto the subspace generated by $e_1$, and we have

$$(t^1, 0, \dots, 0) = (t \cdot e_1)\, e_1.$$

Now, there exists a unique $\theta_1 \in [0, \pi]$ so that $\cos\theta_1 = (t \cdot e_1)/|t|$, provided $|t| \ne 0$. Hence we get

$$t^1 = |t| \cos\theta_1.$$

The number $\theta_1$ may be considered as the measure of the angle between the vectors t and $e_1$. Since $t = (t \cdot e_1)e_1 + P_1(t)$ and $e_1$ and $P_1(t)$ are orthogonal, we have

$$|t|^2 = |t|^2 \cos^2\theta_1 + |P_1(t)|^2.$$

But, we also have

$$|t|^2 \cos^2\theta_1 + |t|^2 \sin^2\theta_1 = |t|^2,$$

so that

$$|P_1(t)|^2 = |t|^2 \sin^2\theta_1.$$

Since $\theta_1 \in [0, \pi]$, $\sin\theta_1 \ge 0$, so that

$$|P_1(t)| = |t| \sin\theta_1.$$

Now, let us repeat this process with the vector $P_1(t)$ playing the role of t and $P_2$ playing the role of $P_1$. We find that

$$(0, t^2, 0, \dots, 0) = (P_1(t) \cdot e_2)\, e_2,$$

and there exists a unique $\theta_2 \in [0, \pi]$ so that $\cos\theta_2 = (P_1(t) \cdot e_2)/|P_1(t)|$, provided, of course, that $P_1(t) \ne 0$. We then get

$$t^2 = |P_1(t)| \cos\theta_2 = |t| \sin\theta_1 \cos\theta_2,$$
$$|P_2(t)| = |P_1(t)| \sin\theta_2 = |t| \sin\theta_1 \sin\theta_2.$$

The number $\theta_2$ may be considered the measure of the angle between $P_1(t)$ and $e_2$ (Fig. 6.5.1).

FIGURE 6.5.1

If $P_{n-2}(t) \ne 0$, then none of the vectors $t$, $P_1(t)$, $P_2 \circ P_1(t)$, ..., $P_{n-2} \circ P_{n-3} \circ \cdots \circ P_1(t)$ is zero, and we can proceed by induction and find that there exist unique $\theta_k \in [0, \pi]$, $k \in (1, n-1)$, so that $\forall k \in (1, n-1)$,

$$t^k = |t| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{k-1} \cos\theta_k,$$

and moreover

$$|t^n| = |t| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-2} \sin\theta_{n-1}.$$

The last equation comes from the fact that

$$(0, \dots, 0, t^n) = P_{n-1} \circ P_{n-2} \circ \cdots \circ P_1(t) = P_{n-1}(t).$$

If $t = 0$, these equations still hold, but the numbers $\theta_1, \dots, \theta_{n-1}$ are no longer uniquely determined. Indeed, any numbers will do. If $t \ne 0$, but $P_1(t) = 0$, then from the equation $|P_1(t)| = |t| \sin\theta_1$, it follows that $\theta_1 = 0$ or $\theta_1 = \pi$. Again the equations above hold, but now $\theta_2, \dots, \theta_{n-1}$ are no longer uniquely determined and again any numbers will work. Proceeding in this way, we see that if $t \ne 0$ and $\forall k \in (1, n-2)$, $\theta_k \in\ ]0, \pi[$, then the vector t uniquely determines the numbers $\theta_k$, $\forall k \in (1, n-1)$.

Unfortunately the last equation is an equation for $|t^n|$ rather than $t^n$. If we wish to remove the absolute value sign, it may no longer be true that we can take $\theta_{n-1} \in [0, \pi]$. However, if $\forall k \in (1, n-2)$, $\theta_k \in\ ]0, \pi[$, then $\sin\theta_1 \cdots \sin\theta_{n-2} \ne 0$, and there exists a unique $\theta_{n-1} \in [0, 2\pi[$ so that

$$t^{n-1} = |t| \sin\theta_1 \cdots \sin\theta_{n-2} \cos\theta_{n-1},$$
$$t^n = |t| \sin\theta_1 \cdots \sin\theta_{n-2} \sin\theta_{n-1}.$$

Let S be the collection of n-tuples $(\rho, \theta_1, \dots, \theta_{n-1})$ where $\rho \ge 0$, $\theta_k \in [0, \pi]$ for $k \in (1, n-2)$ and $\theta_{n-1} \in [0, 2\pi]$. Let $S_0$ be this set of n-tuples with $\rho > 0$, $\theta_k \in\ ]0, \pi[$ for $k \in (1, n-2)$ and $\theta_{n-1} \in [0, 2\pi[$.

We have proved the following result.

The function with domain S defined, with $|t| = \rho$, by

$$\begin{aligned} t^1 &= |t| \cos\theta_1 \\ t^2 &= |t| \sin\theta_1 \cos\theta_2 \\ &\ \ \vdots \\ t^k &= |t| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{k-1} \cos\theta_k \\ &\ \ \vdots \\ t^{n-1} &= |t| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-2} \cos\theta_{n-1} \\ t^n &= |t| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-2} \sin\theta_{n-1} \end{aligned} \tag{6.5.6}$$

has range all of $E^n$. If this function is restricted to $S_0$, then it is one to one and its range is all of $E^n$ with the exception of the subspace generated by the set $\{e_j : j \in (1, n-2)\} \cup \{0\}$.

Note that if $n = 2$, the formulas (6.5.6) give the ordinary transformation from "polar coordinates" to "rectangular coordinates," and the exceptional set is $\{0\}$.
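The formulas (6.5.6) and their inversion can be sketched in code. The version below is an illustration under stated assumptions: the function names are hypothetical, the inverse is only meant for vectors off the exceptional set, and the last angle is recovered with the two-argument arctangent, which is one convenient way of choosing the unique value in [0, 2 pi[.

```python
import math


def to_rectangular(rho, thetas):
    """Formulas (6.5.6): each coordinate is rho times a product of sines times one cosine."""
    coords = []
    sin_product = rho
    for theta in thetas:
        coords.append(sin_product * math.cos(theta))
        sin_product *= math.sin(theta)
    coords.append(sin_product)   # last coordinate: rho times all the sines
    return coords


def to_spherical(t):
    """Recover (rho, theta_1, ..., theta_{n-1}) for a vector off the exceptional set."""
    n = len(t)
    rho = math.sqrt(sum(c * c for c in t))
    thetas = []
    for k in range(n - 2):
        # Length of the projection that discards the first k coordinates.
        tail = math.sqrt(sum(c * c for c in t[k:]))
        cosine = max(-1.0, min(1.0, t[k] / tail))   # clamp against rounding
        thetas.append(math.acos(cosine))            # angle in [0, pi]
    thetas.append(math.atan2(t[n - 1], t[n - 2]) % (2.0 * math.pi))
    return rho, thetas


t = [1.0, 2.0, -2.0, 0.5]
rho, thetas = to_spherical(t)
back = to_rectangular(rho, thetas)
polar = to_rectangular(2.0, [math.pi / 2.0])   # n = 2: ordinary polar coordinates
```

For n = 2 the loop in `to_spherical` is empty and only the polar angle is computed, in agreement with the remark about polar coordinates.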

Symmetric Transformations. Suppose L is a linear subspace of $E^n$ and A is a linear transformation with $\mathcal{D}(A) = L$ and $\mathcal{R}(A) \subset L$. The linear transformation A is said to be symmetric $\Leftrightarrow A = A^t$.

If $(u_1, \dots, u_r)$ is an ordered orthonormal basis for L and $[a_{ij}]$ is the matrix representation of A with respect to this basis, it follows from our discussion about the matrix representation of $A^t$ that

$$a^t_{ij} = a_{ji}.$$

But since $A = A^t$, we get that

$$a^t_{ij} = a_{ij} = a_{ji}.$$

Any matrix whose entries satisfy the last relation is called symmetric.

There is a method of computing the norm of a symmetric transformation that is usually more convenient than the methods we have indicated previously. Let us set

$$M = \sup\{|A(x) \cdot x| : |x| = 1\},$$

so that $\forall x \in L$,

$$|A(x) \cdot x| \le M|x|^2.$$

It is clear that

$$M \le \sup\{|A(x) \cdot y| : |x| = |y| = 1\} = \|A\|.$$

On the other hand, a direct computation shows that

$$A(x) \cdot y = \tfrac{1}{4}\bigl[A(x + y) \cdot (x + y) - A(x - y) \cdot (x - y)\bigr].$$

From the facts that

$$\bigl|\bigl[A(x + y) \cdot (x + y) - A(x - y) \cdot (x - y)\bigr]\bigr| \le M\bigl[|x + y|^2 + |x - y|^2\bigr]$$

and

$$|x + y|^2 + |x - y|^2 = 2\bigl[|x|^2 + |y|^2\bigr],$$

we get

$$\|A\| = \sup\{|A(x) \cdot y| : |x| = |y| = 1\} \le M.$$

Consequently, we have the following equality for symmetric transformations:

$$\sup\{|A(x) \cdot x| : |x| = 1\} = \|A\|. \tag{6.5.7}$$

Since $|A(x) \cdot x|$ is a continuous real-valued function and the unit sphere in L, $S = \{x : |x| = 1\}$, is compact it follows that $\exists x_0 \in S$,

$$\|A\| = \max\{|A(x) \cdot x| : x \in S\} = |A(x_0) \cdot x_0|.$$

Let us set

$$\mu_0 = A(x_0) \cdot x_0.$$

A direct computation shows that

$$0 \le |A(x_0) - \mu_0 x_0|^2 = |A(x_0)|^2 - \mu_0^2,$$

so that

$$\mu_0^2 \le |A(x_0)|^2.$$

Since $|A(x_0)| \le \|A\| = |\mu_0|$, it follows that

$$|A(x_0) - \mu_0 x_0|^2 = 0.$$

This means that

$$A(x_0) = \mu_0 x_0. \tag{6.5.8}$$

Any number $\mu_0$ for which there exists a nonzero vector $x_0 \in \mathcal{D}(A)$ for which (6.5.8) is satisfied is called an eigenvalue or proper value of the linear transformation A. Any nonzero vector that satisfies (6.5.8) is called an eigenvector or proper vector for A. Our previous discussion shows that a symmetric transformation has an eigenvalue. The corresponding eigenvectors are not unique, since clearly any element in the linear space generated by an eigenvector satisfies the relation (6.5.8).

For a given eigenvalue $\lambda$, let $M_\lambda$ be the linear subspace of all vectors in L that satisfy the relation $A(x) = \lambda x$. It is clear that A takes every element in $M_\lambda$ into an element in $M_\lambda$. If A is symmetric, then it is also true that A takes every element in $M_\lambda^{\perp}$ into $M_\lambda^{\perp}$. To prove this last statement, let $x \in M_\lambda^{\perp}$; then $\forall y \in M_\lambda$ we have $x \cdot y = 0$. Now, $\forall y \in M_\lambda$, since $A(y) \in M_\lambda$ and $x \in M_\lambda^{\perp}$ we get

$$A(x) \cdot y = x \cdot A(y) = \lambda (x \cdot y) = 0.$$

This means $A(x) \in M_\lambda^{\perp}$.

Let $\mu_0$ be the eigenvalue whose existence we established several paragraphs back. Let $A_1$ be the restriction of A to $M_{\mu_0}^{\perp}$; that is, $A_1 = A|M_{\mu_0}^{\perp}$. As we have shown in the last paragraph, $A_1$ is a symmetric linear transformation with domain $M_{\mu_0}^{\perp}$ and range in the same linear subspace. Hence, by the same existence proof as before, there exists a $\mu_1$ and an $x_1 \in M_{\mu_0}^{\perp}$, so that $|x_1| = 1$ and

$$A(x_1) = \mu_1 x_1.$$

Proceeding in this way (formally, by induction!) we find that there is an ordered orthonormal basis $(v_1, \dots, v_r)$ for L and eigenvalues $\{\lambda_1, \dots, \lambda_r\}$ so that

$$A v_k = \lambda_k v_k.$$

We are not excluding the possibility that some of the $\lambda_k$ are the same. This can happen, for example, if the dimension of $M_{\mu_0}$ is more than 1.

With respect to the ordered basis $(v_1, \dots, v_r)$ the matrix representation of A is

$$\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_r \end{bmatrix},$$

where the entries off of the main diagonal are zero. This means that the matrix $[a_{ij}]$, which is the matrix representation of A with respect to the ordered basis $(u_1, \dots, u_r)$, is similar to a diagonal matrix in the sense that there is an invertible matrix $[b_{ij}]$ so that

$$[b_{ij}]^{-1} [a_{ij}] [b_{ij}]$$

is the given diagonal matrix. Because of this, we usually say that a symmetric matrix is diagonalizable.
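Let us suppose that we have numbered the eigenvalues so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$. For $x \in L$ let us write

$$x = \sum_{k=1}^{r} x^k v_k.$$

Hence we get

$$A(x) = \sum_{k=1}^{r} x^k \lambda_k v_k,$$
$$A(x) \cdot x = \sum_{k=1}^{r} \lambda_k (x^k)^2.$$

This shows that $\forall x \in L$,

$$\lambda_r |x|^2 \le A(x) \cdot x \le \lambda_1 |x|^2.$$

Indeed, as the reader may easily verify,

$$\lambda_1 = \sup\{A(x) \cdot x : |x| = 1\},$$
$$\lambda_r = \inf\{A(x) \cdot x : |x| = 1\}.$$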
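The description of the extreme eigenvalues as the supremum and infimum of A(x) . x over the unit sphere can be illustrated in the plane. This is a sketch under stated assumptions: it uses the closed-form eigenvalues of a symmetric 2 x 2 matrix, samples the unit circle at whole degrees, and the matrix entries and function names are hypothetical.

```python
import math


def eigenvalues_symmetric_2x2(a, b, d):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius


def quadratic_form(a, b, d, x):
    """A(x) . x for the symmetric matrix [[a, b], [b, d]]."""
    return a * x[0] * x[0] + 2.0 * b * x[0] * x[1] + d * x[1] * x[1]


a, b, d = 2.0, 1.0, 2.0
lam_1, lam_2 = eigenvalues_symmetric_2x2(a, b, d)

# Sample A(x) . x over unit vectors at whole-degree angles.
values = []
for degree in range(360):
    angle = math.radians(degree)
    values.append(quadratic_form(a, b, d, [math.cos(angle), math.sin(angle)]))

largest, smallest = max(values), min(values)
norm_estimate = max(abs(v) for v in values)   # compare with (6.5.7)
```

For this matrix the extreme values occur at 45 and 135 degrees, which are among the sampled angles, so the sampled maximum and minimum match the eigenvalues up to rounding.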

Orthogonal Transformations. Let L be a linear subspace of $E^n$ and A a linear transformation with $\mathcal{D}(A) = L$ and $\mathcal{R}(A) \subset L$. The linear transformation A is said to be orthogonal $\Leftrightarrow \forall x \in L$, $|A(x)| = |x|$.

A necessary and sufficient condition that a linear transformation is orthogonal is that $\forall x, y \in L$,

$$A(x) \cdot A(y) = x \cdot y. \tag{6.5.9}$$

We believe that the sufficiency of this condition is clear without further comment. The necessity follows from the relation

$$A(x) \cdot A(y) = \tfrac{1}{4}\bigl[|A(x + y)|^2 - |A(x - y)|^2\bigr] = \tfrac{1}{4}\bigl[|x + y|^2 - |x - y|^2\bigr] = x \cdot y.$$

From the condition (6.5.9) it is almost immediate that a necessary and sufficient condition that a linear transformation is orthogonal is that

$$A^t \circ A = I, \tag{6.5.10}$$

where I is the identity transformation; that is, $\forall x \in L$, $I(x) = x$. Indeed, if A is orthogonal, then $\forall x, y \in L$

$$A^t \circ A(x) \cdot y = A(x) \cdot A(y) = x \cdot y.$$

Hence for any fixed x and for every y, $[A^t \circ A(x) - x] \cdot y = 0$, which implies that $A^t \circ A(x) = x$. Conversely, if $A^t \circ A = I$, then $\forall x \in L$, $A^t \circ A(x) \cdot x = |A(x)|^2 = |x|^2$. Note that an equivalent way of stating (6.5.10) is that

$$A^t = A^{-1}.$$
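A rotation of the plane gives a concrete orthogonal transformation on which conditions (6.5.9) and (6.5.10) can be checked. The sketch below is illustrative only; the angle, the sample vectors, and the helper names are assumptions, and comparisons are made up to rounding error.

```python
import math


def apply(m, x):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]


def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def transpose(m):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*m)]


def matmul(b, a):
    """Matrix product: c[l][k] is the sum over j of b[l][j] * a[j][k]."""
    return [[sum(b[l][j] * a[j][k] for j in range(len(a))) for k in range(len(a[0]))]
            for l in range(len(b))]


theta = 0.7
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]   # rotation of the plane through the angle theta

gram = matmul(transpose(A), A)   # matrix of A^t o A, expected to be the identity

x, y = [3.0, -1.0], [0.5, 2.0]
inner_before = dot(x, y)
inner_after = dot(apply(A, x), apply(A, y))
length_before = math.sqrt(dot(x, x))
length_after = math.sqrt(dot(apply(A, x), apply(A, x)))
```

Lengths and inner products are unchanged by A, and the product of the transposed matrix with the matrix is the identity matrix.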

Exercises

1. If T is a linear transformation, show that its range is a linear space. If T is one to one, show that $T^{-1}$ is also a linear transformation.

2. Decide which of the following functions from $V^3$ into itself is a linear transformation.
(a) $T(x) = (x^1 - x^3,\ x^1 + x^2,\ x^1 + 3x^2 - x^3)$.
(b) $T(x) = (|x^1|,\ x^2 + x^3,\ x^2)$.
(c) $T(x) = (3x^2 - x^3,\ \sqrt{(x^1)^2},\ x^2 + x^3)$.
(d) $T(x) = (x^1 x^3,\ x^2 + 3x^3,\ 0)$.
(e) $T(x) = (3x^1 + 2x^2 + x^3,\ x^1 - x^2,\ 4x^1 + x^2 + 2x^3)$.
For those transformations that are linear, give the matrix representations with respect to the ordered basis $(e_1, e_2, e_3)$.

3. Let T be a linear transformation with domain the linear subspace $V \subset V^n$ and range in V. Let us set

$$N = \{x : x \in V\ \&\ T(x) = 0\},$$

the null space of T. Show that N is a linear subspace of V and moreover

$$\operatorname{rank}(T) + \dim(N) = \dim(V).$$

4. Verify the facts stated in the paragraph after Definition 6.5.3.

5. Suppose that A and B are linear transformations each having domain a real vector space V and ranges in V. Show that

$$A \circ B = I \Leftrightarrow B \circ A = I,$$

where I is the identity transformation; that is, $\forall x \in V$, $I(x) = x$.

6. Suppose A and B are linear transformations each having domain a linear subspace $L \subset E^n$ and range $M \subset E^m$. Show the following:
(a) $A^{tt} = A$.
(b) $(\alpha A + \beta B)^t = \alpha A^t + \beta B^t$, $\forall \alpha, \beta \in R$.
(c) $(A \circ B)^t = B^t \circ A^t$.

7. Let C be a closed set in $E^n$ with the property that $x \in C \Rightarrow ax \in C$ for all $a \in R$ with $a \ge 0$. Suppose f is a continuous function with $\mathcal{D}(f) = C$, $\mathcal{R}(f) \subset E^n$ and so that $\forall x \in C$ and $\forall a \in R$ with $a \ge 0$, $f(ax) = af(x)$. Show that $\exists M > 0$, so that $\forall x \in C$,

$$|f(x)| \le M|x|.$$

8. Let f be a function with domain $E^n$ and range in $E^m$ which is additive; that is, $\forall x, y \in E^n$

$$f(x + y) = f(x) + f(y).$$

Show that if f is continuous at $x = 0$, then it is continuous at every point of $E^n$.

9. Let $A_y$ be the linear functional defined on $E^n$ by the equation

$$A_y(x) = x \cdot y.$$

Show that

10. Let A be an orthogonal linear transformation with domain and range a linear subspace $L \subset E^n$. Suppose $(u_1, \dots, u_r)$ is any ordered orthonormal basis of L and $[a_{ij}]$ the matrix representation of A with respect to this basis. Show that

$$\sum_{j=1}^{r} a_{ij} a_{kj} = \begin{cases} 1 & \text{if } i = k, \\ 0 & \text{if } i \ne k. \end{cases}$$

11. Let A be a symmetric transformation with domain a linear subspace $L \subset E^n$ and range in $E^n$. Let $[a_{ij}]$ be the matrix representation of A with respect to the ordered orthonormal basis $(u_1, \dots, u_r)$. Show that there is a matrix $[b_{ij}]$ that is the matrix representation of an orthogonal transformation from L onto L so that

$$[b_{ij}]^{-1} [a_{ij}] [b_{ij}]$$

is a diagonal matrix.

12. If L and M are linear subspaces of $E^n$ that have the same dimension, show that there is a linear transformation A with domain L and range M so that $\forall x \in L$, $|A(x)| = |x|$.

13. Show that if A is a linear transformation with domain $L \subset E^n$ and range $M \subset E^n$ so that $\forall x \in L$, $|A(x)| = |x|$, then A can be extended to be an orthogonal transformation with domain and range $E^n$.

14. Let P be a symmetric linear transformation with the property that $P^2 = P$. Show that P is the projection of its domain onto its range. Give an example that shows that $P^2 = P$ does not imply that P is a symmetric linear transformation.

15. Let A be a symmetric linear transformation, M a linear subspace of $\mathcal{D}(A)$, and P the projection of $\mathcal{D}(A)$ onto M. Show that A takes M into itself $\Leftrightarrow P \circ A = A \circ P$.

16. If P is a nonzero projection show that $\|P\| = 1$. If y is a nonzero vector in $E^n$ and x is any vector in $E^n$, use the formula for the projection of x onto the linear space generated by y and the result of the first sentence of this exercise to give another proof of the Cauchy-Bunjakovsky-Schwarz inequality.

17. Show that a linear transformation is an open map.

6.6 DETERMINANTS

In his study of elementary algebra the reader has undoubtedly come


across the notion of determinants and has learned enough of their
properties to be able to use them for solving systems of linear equations.
Our purpose in this section is to derive a number of properties of
determinants in a rigorous way since they are very important quantities
in the higher-dimensional calculus.
Before we discuss determinants, it is necessary to discuss a certain class of functions, called permutations, which take finite sets onto themselves.

6.6.1 Definition. Let S be a finite set. A one-to-one function $\sigma$ having S as its domain and range is called a permutation.

For example, if S is the set of integers $(1, n)$, then a permutation is simply a function that takes these integers into each other. It is not difficult to show, although we shall not do it, that each permutation is a composite of permutations that permute or interchange only two elements in the original set and leave the remaining elements fixed. This leads to the important concept of an even or odd permutation: An even [odd] permutation is one that is a composition of an even [odd] number of elementary permutations which interchange only two elements.

Of course, in trying to define an even or an odd permutation as we did in the last paragraph, it is necessary to show that a permutation cannot be written in two ways as a composite of both an even number and an odd number of elementary permutations. To avoid proving this, we shall define the concept of an even or an odd permutation in a different way.
Let $S_n$ be that function whose domain is the set of permutations of $(1, n)$ onto itself which is given by

$$S_n(\sigma) = \prod_{1 \le j < k \le n} (\sigma(k) - \sigma(j)). \tag{6.6.1}$$

The definition of the product on the right has already been indicated in Section 3.5. To make this precise we first remark that the set $A_n = \{(j, k) : j, k \in N,\ 1 \le j < k \le n\}$ has $n(n-1)/2$ elements. Let $f_\sigma$ be that function with domain $A_n$ defined by

$$f_\sigma(j, k) = \sigma(k) - \sigma(j),$$

and let $\Phi$ be any one-to-one function with domain $(1, n(n-1)/2)$ and range $A_n$. Then, as we noted in Section 3.5, the right side of (6.6.1) is defined as

$$S_n(\sigma) = \prod_{k=1}^{n(n-1)/2} f_\sigma \circ \Phi(k),$$

and this definition is quite independent of the one-to-one function $\Phi$ that is used.


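Let us now define a function on the set $A_n$ as follows:

$$\sigma_{jk} = \begin{cases} 1 & \sigma(k) - \sigma(j) > 0, \\ -1 & \sigma(k) - \sigma(j) < 0. \end{cases}$$

We claim that

$$\prod_{1 \le j < k \le n} \sigma_{jk}[\sigma(k) - \sigma(j)] = \prod_{1 \le j < k \le n} (k - j),$$

where these products are defined in a manner indicated in the last paragraph. Indeed, for each permutation $\sigma$ let $F_\sigma$ be that function having domain $A_n$ and range in R which is given by

$$F_\sigma(j, k) = \sigma_{jk}[\sigma(k) - \sigma(j)].$$

If $\iota$ is the identity permutation, then clearly

$$F_\iota(j, k) = k - j = f_\iota(j, k).$$

Now, define the one-to-one function $\Psi$ with domain and range $A_n$ by

$$\Psi(j, k) = \begin{cases} (\sigma^{-1}(j), \sigma^{-1}(k)) & \text{if } \sigma^{-1}(j) < \sigma^{-1}(k), \\ (\sigma^{-1}(k), \sigma^{-1}(j)) & \text{if } \sigma^{-1}(k) < \sigma^{-1}(j). \end{cases}$$

It is clear that $F_\sigma \circ \Psi = F_\iota$. If $\Phi$ is any one-to-one function with domain $(1, n(n-1)/2)$ and range $A_n$, then $\Psi \circ \Phi$ is the same type of a function and thus

$$\prod_{k=1}^{n(n-1)/2} F_\sigma \circ \Phi(k) = \prod_{k=1}^{n(n-1)/2} F_\sigma \circ \Psi \circ \Phi(k) = \prod_{k=1}^{n(n-1)/2} F_\iota \circ \Phi(k),$$

which proves our assertion. From the associativity property of finite products we get

$$\prod_{1 \le j < k \le n} \sigma_{jk}(\sigma(k) - \sigma(j)) = \prod_{1 \le j < k \le n} \sigma_{jk} \prod_{1 \le j < k \le n} (\sigma(k) - \sigma(j)).$$

We shall set

$$\operatorname{sgn} \sigma = \prod_{1 \le j < k \le n} \sigma_{jk}, \tag{6.6.2}$$

and note that sgn $\sigma$ is either 1 or -1.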


We have established part of the following theorem.

6.6.2 Theorem. There exists a function, with domain the collection of permutations of $(1, n)$ onto itself and range the set $\{1, -1\}$, whose values are denoted by 'sgn $\sigma$,' so that

$$S_n(\sigma) = \operatorname{sgn} \sigma\ S_n(\iota), \tag{6.6.3}$$

where $\iota$ is the identity permutation. Further, for every pair of permutations $\sigma$ and $\tau$,

$$\operatorname{sgn} \tau \circ \sigma = \operatorname{sgn} \tau\ \operatorname{sgn} \sigma. \tag{6.6.4}$$

Proof. Since we have already proved (6.6.3), it remains to prove (6.6.4). Let f be any function whose domain is $(1, n)$ and range is in $(1, n)$. We set

$$S_n(f) = \prod_{1 \le j < k \le n} (f(k) - f(j)).$$

We claim that

$$S_n(f \circ \sigma) = \operatorname{sgn} \sigma\ S_n(f).$$

Indeed, the proof of this carries over mutatis mutandis from the proof of (6.6.3) given just previous to the statement of this theorem, where in that case f was taken as the identity permutation $\iota$. If in the last formula we replace f by $\tau$ we get (6.6.4).
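Both assertions of Theorem 6.6.2 can be checked exhaustively for a small n. The sketch below represents a permutation of (1, n) as a tuple whose ith entry is the image of i; this representation and the function names `s_n`, `sgn`, and `compose` are illustrative choices.

```python
from itertools import permutations


def s_n(f):
    """Product over 1 <= j < k <= n of f(k) - f(j); f[i - 1] holds the value f(i)."""
    result = 1
    for j in range(len(f)):
        for k in range(j + 1, len(f)):
            result *= f[k] - f[j]
    return result


def sgn(sigma):
    """Product of the signs of sigma(k) - sigma(j) over all pairs j < k."""
    sign = 1
    for j in range(len(sigma)):
        for k in range(j + 1, len(sigma)):
            if sigma[k] < sigma[j]:
                sign = -sign
    return sign


def compose(tau, sigma):
    """The composite tau o sigma, sending i to tau(sigma(i))."""
    return tuple(tau[s - 1] for s in sigma)


n = 4
identity = tuple(range(1, n + 1))
perms = list(permutations(identity))

formula_663 = all(s_n(p) == sgn(p) * s_n(identity) for p in perms)
formula_664 = all(sgn(compose(t, s)) == sgn(t) * sgn(s) for t in perms for s in perms)
```

For n = 4 the check runs over all 24 permutations for (6.6.3) and all 576 ordered pairs for (6.6.4).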
Suppose now that $[a_{ij}]$ is a square $n \times n$ matrix. One way of defining a determinant of a matrix, which the reader has probably already seen in his previous studies, is as follows:

$$\det [a_{ij}] = \sum_{\sigma} (\operatorname{sgn} \sigma)\, a_{1\sigma_1} a_{2\sigma_2} \cdots a_{n\sigma_n}, \tag{6.6.5}$$

where the symbol $\sum_{\sigma}$ indicates we are summing over all permutations of $(1, n)$ onto itself and for notational convenience we have set $\sigma(j) = \sigma_j$. We leave it to the reader as an exercise to establish that there are $n!$ permutations with domain $(1, n)$.

Although the equality in (6.6.5) is a perfectly good definition of the determinant of a matrix, it is usually easier to obtain the principal properties of the determinant by proceeding in a way that, at first glance, may seem slightly different. We shall do this by means of alternating multilinear functionals.
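The permutation sum can be evaluated literally for small matrices. The following is a minimal sketch, assuming zero-based indices (so permutations act on 0, ..., n-1 rather than 1, ..., n); the function names are illustrative, and the method is only practical for small n because the sum has n! terms.

```python
from itertools import permutations


def sgn(sigma):
    """Sign of a permutation given as a tuple: -1 raised to the number of inversions."""
    inversions = sum(1 for j in range(len(sigma))
                     for k in range(j + 1, len(sigma)) if sigma[k] < sigma[j])
    return -1 if inversions % 2 else 1


def det(matrix):
    """Sum over all permutations sigma of sgn(sigma) * a[0][sigma(0)] * ... * a[n-1][sigma(n-1)]."""
    n = len(matrix)
    total = 0
    for sigma in permutations(range(n)):
        term = sgn(sigma)
        for row in range(n):
            term *= matrix[row][sigma[row]]
        total += term
    return total


d2 = det([[1, 2], [3, 4]])
d3 = det([[2, 0, 1], [1, 3, 2], [1, 1, 4]])
number_of_permutations = len(list(permutations(range(4))))
```

The count of permutations of a four-element set illustrates the exercise on n! mentioned in the text.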

6.6.3 Definition. A multilinear functional is a real-valued function defined on an m-fold Cartesian product of $V^n$ which has the property that when any $m - 1$ variables are held fixed the function is a linear functional of the remaining variable.

A multilinear functional D defined on the m-fold Cartesian product of $V^n$ which has the further property that for every permutation $\sigma$ acting on $(1, m)$ onto itself,

$$D(a_{\sigma_1}, a_{\sigma_2}, \dots, a_{\sigma_m}) = \operatorname{sgn} \sigma\ D(a_1, a_2, \dots, a_m)$$

is called an alternating multilinear functional. If $m = n$ and D has the additional property that

$$D(e_1, \dots, e_n) = 1,$$

then D is called a determinant function, and the number $D(a_1, \dots, a_n)$ is called the determinant of the $n \times n$ matrix whose kth row consists of the components of the vector $a_k$.

It is not very difficult to verify that the function defined by (6.6.5)


is a determinant function. However, since it is an important point, we
shall state and prove this formally.

6.6.4

Theorem.

The function defined by the right side of

(6.6.5) is a

determinant function.

Proof.

Let us set
D(a1, ,am )

(sgn <r ) a1u1

anun.

Now, Va,{3 ER
D(a1, ,aa k + f3bk>,a,.)
=

(sgn <r) alul

f3

(aakuk +

f3bkud

a11Un

(sgn <T) alul akuk a,.un

(sgn <T) alul

bkuk anun

= aD(a1,-,a,.) +{3D(a1, ,bk. -,an).

This shows that Dis multilinear. Further,


D(a,1, ,a,m) =

If T(j) = k, then <r(j) = <r

(sgn <r) a,1u1 arnun

1
T- (k). Thus, since Tis a permutation and

a finite product is ind>pendent of the order of the products, we get


n

TI

j=l

aTjuj =

Now use the fact that <r = (<r

L
u

(sgn <r)

TI

i=l

TI

k=l

1
T- )

a Tiui = sgn T

akUT -lk.

Tand formula (6.6.4), and we get


11

(sgn <r

'T-1

TI

k=l

aku ,-1k .

278 I HIGHER-DIMENSIONAL SPACE

Now, as <T goes over the set of permutations of ( 1, n) , <T 0 T-1 also goes
over this set. Thus the function <l>A<r)= <T 0 T-1 is a permutation of the
collection of permutations of ( 1, n). Because a finite sum is independent
of the order of summation, we have

2:

(sgn

<T 0

1
T- )

<T

IJ ak<N-1k

k=I

2:

(sgn

<T

akrrk.
TI
k
=I

Thus we see that


D(a71, ,

a m )= sgn

TD ( a 1 , ,an),

so that D is alternating.
Finally,

$$D(e_1, \ldots, e_n) = \sum_{\sigma} (\operatorname{sgn}\sigma)\, e_{1\sigma 1} \cdots e_{n\sigma n}.$$

Since $e_{k\sigma k} = 0$ unless $\sigma k = k$, we see that the only nonzero term on the right appears when $\sigma$ is the identity permutation. Thus

$$D(e_1, \ldots, e_n) = 1,$$

and the theorem is established.
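The three properties just proved can be checked numerically on a small example. The following is a minimal sketch, assuming the right side of (6.6.5) is the usual sum over permutations of signed products; the names `sign`, `det`, and the sample matrix `a` are illustrative and not part of the text, and indices are 0-based.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    n = len(perm)
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1


def det(rows):
    """Permutation sum: sgn(s) * a[0][s(0)] * ... * a[n-1][s(n-1)], summed over s."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for k in range(n):
            term *= rows[k][perm[k]]
        total += term
    return total


# Hypothetical sample data for the check.
a = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
b = [1, 1, 1]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Replace the second row by 2*a[1] + 3*b to test multilinearity in that row.
mixed = [a[0], [2 * x + 3 * y for x, y in zip(a[1], b)], a[2]]
with_b = [a[0], b, a[2]]
# Interchange the first two rows to test the alternating property.
swapped = [a[1], a[0], a[2]]

print(det(identity) == 1)
print(det(mixed) == 2 * det(a) + 3 * det(with_b))
print(det(swapped) == -det(a))
```

The sum has $n!$ terms, so this form is useful for checking identities on small matrices rather than for computation.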


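6.6.5 Theorem. An alternating multilinear functional $D$ satisfies the following:

(a) $x_k = 0 \Rightarrow D(x_1, \ldots, x_k, \ldots, x_m) = 0$.

(b) $x_j = x_k$, $j \neq k$ $\Rightarrow D(x_1, \ldots, x_j, \ldots, x_k, \ldots, x_m) = 0$.

(c) $\forall \alpha \in R$ & $\forall (x_1, \ldots, x_m)$, and $j \neq k$,

$$D(x_1, \ldots, x_j + \alpha x_k, \ldots, x_k, \ldots, x_m) = D(x_1, \ldots, x_j, \ldots, x_k, \ldots, x_m).$$

Proof. (a) Since $D$ is multilinear we have

$$D(x_1, \ldots, 0, \ldots, x_m) = D(x_1, \ldots, 0 + 0, \ldots, x_m) = 2D(x_1, \ldots, 0, \ldots, x_m).$$

This proves (a).

(b) Let $\sigma$ be the permutation with domain $(1, m)$ so that $\sigma(i) = i$ if $i \neq j, k$, $\sigma(j) = k$, and $\sigma(k) = j$. Then since $\operatorname{sgn}\sigma = -1$, we have

$$D(x_1, \ldots, x_k, \ldots, x_j, \ldots, x_m) = -D(x_1, \ldots, x_j, \ldots, x_k, \ldots, x_m).$$

Since $x_j = x_k$ we have proved (b).

(c) Using the multilinear property we have

$$D(x_1, \ldots, x_j + \alpha x_k, \ldots, x_k, \ldots, x_m) = D(x_1, \ldots, x_j, \ldots, x_k, \ldots, x_m) + \alpha D(x_1, \ldots, x_k, \ldots, x_k, \ldots, x_m).$$

By part (b) the second term on the right is zero, which completes the proof.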

6.6.6 Theorem. Suppose $m \le n$, and $\alpha$ is a real-valued function with domain the set $\{(i_1, \ldots, i_m) : 1 \le i_1 < \cdots < i_m \le n\}$. Then there exists at most one alternating multilinear functional $D$ defined on the $m$-fold Cartesian product of $V^n$ so that $\forall (j_1, \ldots, j_m)$ with $1 \le j_1 < \cdots < j_m \le n$, we have

$$D(e_{j_1}, \ldots, e_{j_m}) = \alpha(j_1, \ldots, j_m).$$

Proof. Suppose $E$ is another alternating multilinear functional on the $m$-fold Cartesian product of $V^n$, so that $E(e_{j_1}, \ldots, e_{j_m}) = \alpha(j_1, \ldots, j_m)$. If we set

$$F = D - E,$$

then $F$ is an alternating multilinear functional with the property that

$$F(e_{j_1}, \ldots, e_{j_m}) = 0$$

for all indices for which $1 \le j_1 < \cdots < j_m \le n$. Now, let $x_1, \ldots, x_m$ be vectors in $V^n$ and set

$$x_k = \sum_{j=1}^{n} x_k^{\,j} e_j.$$

Using the multilinearity of $F$ we find that

$$F(x_1, \ldots, x_m) = \sum_{j_1=1}^{n} \cdots \sum_{j_m=1}^{n} x_1^{\,j_1} \cdots x_m^{\,j_m}\, F(e_{j_1}, \ldots, e_{j_m}).$$

If $j_r = j_s$ for some $r \neq s$, it follows from part (b) of the last theorem that $F(e_{j_1}, \ldots, e_{j_m}) = 0$. If the $j_k$ are all different, let $i_1, \ldots, i_m$ be a renumbering of the $j_k$ so that $i_1 < i_2 < \cdots < i_m$. Let $\sigma$ be the permutation given by $i_{\sigma k} = j_k$. Then

$$F(e_{j_1}, \ldots, e_{j_m}) = (\operatorname{sgn}\sigma)\, F(e_{i_1}, \ldots, e_{i_m}) = 0.$$

Hence $F = 0$, which proves the theorem.

6.6.7 Corollary. There exists one and only one determinant function on the $n$-fold Cartesian product of $V^n$ with itself.

Proof. The existence of such a determinant function is given by Theorem 6.6.4. The uniqueness is given by the last theorem.

The determinant function on the $n$-fold Cartesian product of $V^n$ may also be interpreted as a real-valued function acting on the collection of $n \times n$ matrices. If $[a_{ij}]$ is an $n \times n$ matrix, we shall often denote the determinant of this matrix as on the left side of (6.6.5): $\det[a_{ij}]$. Another standard notation for the determinant of a matrix is

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$

One of the most useful properties of the determinant function is that the determinant of a product of two $n \times n$ matrices is the product of the determinants of the matrices. We shall prove a slightly generalized version of this result which will be useful to us when we discuss surface integrals.

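6.6.8 Theorem (Binet-Cauchy). Let $m \le n$, $[a_{ij}]$ an $m \times n$ matrix, and $[b_{ij}]$ an $n \times m$ matrix. Then

$$\det\bigl([a_{ij}][b_{ij}]\bigr) = \sum_{1 \le k_1 < \cdots < k_m \le n}
\begin{vmatrix} a_{1k_1} & \cdots & a_{1k_m} \\ \vdots & & \vdots \\ a_{mk_1} & \cdots & a_{mk_m} \end{vmatrix}
\begin{vmatrix} b_{k_1 1} & \cdots & b_{k_1 m} \\ \vdots & & \vdots \\ b_{k_m 1} & \cdots & b_{k_m m} \end{vmatrix}. \tag{6.6.6}$$

Proof. Let $[b_{ij}]$ be fixed and set $a_k = (a_{k1}, \ldots, a_{kn})$. The $k$th row of the matrix $[a_{ij}][b_{ij}]$ is $a_k[b_{ij}]$, where we are considering this as matrix multiplication of the $1 \times n$ matrix $a_k$ with the $n \times m$ matrix $[b_{ij}]$ to give a $1 \times m$ matrix which can be considered as a vector in $V^m$. Thus, since $[b_{ij}]$ is fixed, we see that the left side of (6.6.6) is an alternating multilinear functional of the $m$-tuples $(a_1, \ldots, a_m)$. Also, for fixed $[b_{ij}]$, the function with values

$$\begin{vmatrix} a_{1k_1} & \cdots & a_{1k_m} \\ \vdots & & \vdots \\ a_{mk_1} & \cdots & a_{mk_m} \end{vmatrix}$$

is an alternating multilinear functional of the vectors $(a_{ik_1}, \ldots, a_{ik_m})$, $i \in (1, m)$. Hence a fortiori it is an alternating multilinear functional of the vectors $a_i$, $i \in (1, m)$. Thus both sides of (6.6.6) are alternating multilinear functionals of the $m$-tuples $(a_1, \ldots, a_m)$.

Denote the left side of (6.6.6) by $D(a_1, \ldots, a_m)$ and the right side by $E(a_1, \ldots, a_m)$. Let $\{e_k : k \in (1, n)\}$ be the standard unit vectors in $V^n$; that is, $e_k^{\,j} = 0$ for $j \neq k$, $e_k^{\,k} = 1$. Now,

$$e_{k_i}[b_{ij}] = (b_{k_i 1}, b_{k_i 2}, \ldots, b_{k_i m}).$$

Thus we see that

$$D(e_{k_1}, \ldots, e_{k_m}) = \begin{vmatrix} b_{k_1 1} & \cdots & b_{k_1 m} \\ \vdots & & \vdots \\ b_{k_m 1} & \cdots & b_{k_m m} \end{vmatrix}.$$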

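On the other hand,

$$\begin{vmatrix} e_{k_1}^{\,k_1} & \cdots & e_{k_1}^{\,k_m} \\ \vdots & & \vdots \\ e_{k_m}^{\,k_1} & \cdots & e_{k_m}^{\,k_m} \end{vmatrix} = 1.$$

Further, if $j_1 < \cdots < j_m$, $k_1 < \cdots < k_m$, and $(j_1, \ldots, j_m) \neq (k_1, \ldots, k_m)$, then $\exists j \in \{j_i : i \in (1, m)\}$ so $\forall i \in (1, m)$, $j \neq k_i$. The proof of this is a simple induction argument on $m$ and we leave it for the reader. Since for $j \neq k_i$, $e_{k_i}^{\,j} = 0$, it follows that if $(j_1, \ldots, j_m) \neq (k_1, \ldots, k_m)$ and $1 \le j_1 < \cdots < j_m \le n$ and $1 \le k_1 < \cdots < k_m \le n$, then

$$\begin{vmatrix} e_{k_1}^{\,j_1} & \cdots & e_{k_1}^{\,j_m} \\ \vdots & & \vdots \\ e_{k_m}^{\,j_1} & \cdots & e_{k_m}^{\,j_m} \end{vmatrix} = 0.$$

Hence we see that for $1 \le k_1 < \cdots < k_m \le n$,

$$E(e_{k_1}, \ldots, e_{k_m}) = \begin{vmatrix} b_{k_1 1} & \cdots & b_{k_1 m} \\ \vdots & & \vdots \\ b_{k_m 1} & \cdots & b_{k_m m} \end{vmatrix}.$$

Therefore, we see that for $1 \le k_1 < \cdots < k_m \le n$ we have

$$D(e_{k_1}, \ldots, e_{k_m}) = E(e_{k_1}, \ldots, e_{k_m}).$$

It follows from Theorem 6.6.6 that $D = E$, which proves the theorem.

6.6.9 Corollary. If $[a_{ij}]$ and $[b_{ij}]$ are $n \times n$ matrices, then

$$\det\bigl([a_{ij}][b_{ij}]\bigr) = \det[a_{ij}] \det[b_{ij}].$$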
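A small numerical check of the Binet-Cauchy identity may help fix the indexing. The sketch below assumes (6.6.6) has the standard form: the sum, over increasing index sets $k_1 < \cdots < k_m$, of the minor of $[a_{ij}]$ on those columns times the minor of $[b_{ij}]$ on those rows. The helper names and the sample matrices are illustrative only.

```python
from itertools import combinations, permutations


def det(m):
    """Determinant as a signed sum over permutations (exact for integers)."""
    n = len(m)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for k in range(n):
            term *= m[k][p[k]]
        total += term
    return total


def matmul(a, b):
    """Product of an m x n matrix with an n x p matrix, as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def binet_cauchy_sum(a, b):
    """Sum over k_1 < ... < k_m of (minor of a on columns k) * (minor of b on rows k)."""
    m, n = len(a), len(b)
    total = 0
    for ks in combinations(range(n), m):
        a_minor = [[a[i][k] for k in ks] for i in range(m)]
        b_minor = [[b[k][j] for j in range(m)] for k in ks]
        total += det(a_minor) * det(b_minor)
    return total


# Hypothetical sample data: a is 2 x 3, b is 3 x 2.
a = [[1, 2, 0], [3, -1, 4]]
b = [[2, 1], [0, 5], [-3, 2]]

lhs = det(matmul(a, b))
rhs = binet_cauchy_sum(a, b)
print(lhs, rhs)
```

When $m = n$ the only index set is $(1, \ldots, n)$ and the sum reduces to the product rule of Corollary 6.6.9.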

From the formula (6.6.5) it is very easy to see that the determinant of a square matrix may also be considered as an alternating multilinear functional of the column vectors of the matrix. Indeed, as the permutation $\sigma$ ranges over all permutations of $(1, n)$, $\sigma^{-1}$ also ranges over all permutations. Moreover, if $\sigma(j) = k$, then $\sigma^{-1}(k) = j$, and hence

$$a_{j\sigma j} = a_{\sigma^{-1}k\, k}$$

and

$$a_{1\sigma 1} a_{2\sigma 2} \cdots a_{n\sigma n} = a_{\sigma^{-1}1\, 1}\, a_{\sigma^{-1}2\, 2} \cdots a_{\sigma^{-1}n\, n}.$$

Since $\operatorname{sgn}\sigma = \operatorname{sgn}\sigma^{-1}$, it follows that

$$\det[a_{ij}] = \sum_{\sigma} (\operatorname{sgn}\sigma)\, a_{\sigma 1\, 1}\, a_{\sigma 2\, 2} \cdots a_{\sigma n\, n},$$

which shows that $\det[a_{ij}]$ is an alternating multilinear functional of the columns of $[a_{ij}]$.

Let $[a_{ij}]^t$ designate the transpose of $[a_{ij}]$; that is, if $a^t_{ij}$ are the entries of the transpose of $[a_{ij}]$, then $a^t_{ij} = a_{ji}$. Hence the rows of $[a_{ij}]^t$ are the columns of $[a_{ij}]$. Our previous remarks constitute a proof of the following.

6.6.10 Theorem. If $[a_{ij}]^t$ is a square matrix with entries $a^t_{ij}$ that satisfy $a^t_{ij} = a_{ji}$, then

$$\det[a_{ij}]^t = \det[a_{ij}]. \tag{6.6.7}$$

The actual computing of a determinant is probably most effectively carried out by the method of expansion by cofactors or minors of the $j$th row (or column). This is a method that reduces the computing of the determinant of an $n \times n$ matrix to the computation of determinants of $(n - 1) \times (n - 1)$ matrices. Let $[a_{ij}]$ be an $n \times n$ matrix and as usual set

$$a_j = (a_{j1}, \ldots, a_{jn}) = \sum_{k=1}^{n} a_{jk} e_k.$$

If we set $D(a_1, \ldots, a_n) = \det[a_{ij}]$, then using the multilinearity of $D$ we have

$$\det[a_{ij}] = \sum_{k=1}^{n} a_{jk}\, D(a_1, \ldots, a_{j-1}, e_k, a_{j+1}, \ldots, a_n), \tag{6.6.8}$$

where, as we have indicated, each $e_k$ occurs in the $j$th position in the $n$-tuple.
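Let us, at first, compute the determinant $D(e_1, a_2, \ldots, a_n)$. Set $b_i = a_i - a_{i1} e_1$, $\forall i \in (2, n)$, and

$$a'_i = (a_{i2}, \ldots, a_{in}).$$

Clearly $b_i$ is determined as soon as $a'_i$ is given. Now, from Theorem 6.6.5(c),

$$D(e_1, a_2, a_3, \ldots, a_n) = D(e_1, b_2, a_3, \ldots, a_n).$$

A simple induction argument shows that

$$D(e_1, a_2, \ldots, a_n) = D(e_1, b_2, \ldots, b_n).$$

The number on the right and hence the number on the left side depends only on $a'_2, \ldots, a'_n$. Let us set

$$E(a'_2, \ldots, a'_n) = D(e_1, a_2, \ldots, a_n).$$

This defines $E$ as an alternating multilinear functional on the $(n - 1)$-fold Cartesian product of $V^{n-1}$. Further,

$$E(e'_2, \ldots, e'_n) = D(e_1, e_2, \ldots, e_n) = 1,$$

so that $E$ is a determinant function. Thus

$$D(e_1, a_2, \ldots, a_n) = \begin{vmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n2} & \cdots & a_{nn} \end{vmatrix}. \tag{6.6.9}$$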

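Suppose now that $1 \le j \le n$ and $e_1$ is in the $j$th position. Let $\sigma$ be that permutation of $(1, n)$ onto itself defined by

$$\sigma(i) = \begin{cases} j, & i = 1, \\ i - 1, & i \in (2, j), \\ i, & i \in (j + 1, n). \end{cases}$$

A very simple induction argument establishes that

$$\operatorname{sgn}\sigma = (-1)^{j-1}.$$

Thus, from (6.6.9),

$$D(a_1, \ldots, a_{j-1}, e_1, a_{j+1}, \ldots, a_n) = (-1)^{j-1}
\begin{vmatrix}
a_{12} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{(j-1)2} & \cdots & a_{(j-1)n} \\
a_{(j+1)2} & \cdots & a_{(j+1)n} \\
\vdots & & \vdots \\
a_{n2} & \cdots & a_{nn}
\end{vmatrix}.$$

Let us now compute $D(e_k, a_2, \ldots, a_n)$, where $k \in (1, n)$. From the equality (6.6.7) we know that if $(b_1, \ldots, b_n)$ are the rows of the transpose of the matrix with rows $e_k, a_2, \ldots, a_n$, then

$$D(e_k, a_2, \ldots, a_n) = D(b_1, \ldots, b_n).$$

Let $\sigma$ be the same type of permutation as described above,

$$\sigma(i) = \begin{cases} k, & i = 1, \\ i - 1, & i \in (2, k), \\ i, & i \in (k + 1, n). \end{cases}$$

Then

$$D(b_{\sigma 1}, \ldots, b_{\sigma n}) = (-1)^{k-1} D(b_1, \ldots, b_n).$$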

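Finally, let us take the transpose of the matrix having rows $(b_{\sigma 1}, \ldots, b_{\sigma n})$. The new matrix has $e_1$ in the first row and using (6.6.7) and (6.6.9) we get

$$D(e_k, a_2, \ldots, a_n) = (-1)^{k-1}
\begin{vmatrix}
a_{21} & \cdots & a_{2(k-1)} & a_{2(k+1)} & \cdots & a_{2n} \\
a_{31} & \cdots & a_{3(k-1)} & a_{3(k+1)} & \cdots & a_{3n} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{n(k-1)} & a_{n(k+1)} & \cdots & a_{nn}
\end{vmatrix}.$$

Finally, if $e_k$ is in the $j$th position, by the use of a permutation in exactly the same way as used in the case where $e_1$ was in the $j$th position, we find that

$$D(a_1, \ldots, a_{j-1}, e_k, a_{j+1}, \ldots, a_n) = (-1)^{j+k}
\begin{vmatrix}
a_{11} & \cdots & a_{1(k-1)} & a_{1(k+1)} & \cdots & a_{1n} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{(j-1)1} & \cdots & a_{(j-1)(k-1)} & a_{(j-1)(k+1)} & \cdots & a_{(j-1)n} \\
a_{(j+1)1} & \cdots & a_{(j+1)(k-1)} & a_{(j+1)(k+1)} & \cdots & a_{(j+1)n} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{n(k-1)} & a_{n(k+1)} & \cdots & a_{nn}
\end{vmatrix}. \tag{6.6.10}$$

The number on the right of (6.6.10) is called the cofactor of $a_{jk}$ and we shall denote it by $\operatorname{Co}(a_{jk})$. Note that $\operatorname{Co}(a_{jk})$ is a suitably signed determinant of an $(n - 1) \times (n - 1)$ matrix. Also note that this $(n - 1) \times (n - 1)$ matrix is obtained by deleting the $j$th row and the $k$th column from the original matrix.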

The arguments we have given in the last several paragraphs constitute a proof of the following theorem.

6.6.11 Theorem. If $[a_{ij}]$ is any $n \times n$ matrix, then $\forall j \in (1, n)$,

$$\det[a_{ij}] = \sum_{k=1}^{n} a_{jk} \operatorname{Co}(a_{jk}) = \sum_{k=1}^{n} a_{kj} \operatorname{Co}(a_{kj}). \tag{6.6.11}$$
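The cofactor expansion lends itself to a short recursive routine. The following is a minimal sketch with 0-based indices (the parity of $j + k$ is the same as with 1-based indices); the function names and the sample matrix are illustrative and not from the text.

```python
def minor(m, j, k):
    """Matrix obtained by deleting row j and column k (0-based)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(m) if i != j]


def det(m):
    """Determinant by expansion along the first row; the empty matrix has determinant 1."""
    if not m:
        return 1
    return sum(m[0][k] * cofactor(m, 0, k) for k in range(len(m)))


def cofactor(m, j, k):
    """Signed determinant of the minor, as in the definition of Co(a_jk)."""
    return (-1) ** (j + k) * det(minor(m, j, k))


def expand_by_row(m, j):
    """Sum over k of a_jk * Co(a_jk)."""
    return sum(m[j][k] * cofactor(m, j, k) for k in range(len(m)))


def expand_by_column(m, j):
    """Sum over k of a_kj * Co(a_kj)."""
    return sum(m[k][j] * cofactor(m, k, j) for k in range(len(m)))


# Hypothetical sample matrix.
a = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]

print([expand_by_row(a, j) for j in range(3)])
print([expand_by_column(a, j) for j in range(3)])
```

Every row and every column gives the same value, which is the content of (6.6.11).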

In computing the determinant of a matrix, whether by the formula (6.6.5) or the cofactor formula (6.6.11), it is often very useful to use Theorem 6.6.5, especially part (c). This says we can multiply any row by any number and add it to any other row without changing the value of the determinant. By use of Theorem 6.6.10 it follows that the same procedure is valid if we operate on the columns of the matrix. This procedure allows us to compute the determinant of a matrix by computing the determinant of another matrix which may have many more zeros than the original matrix.
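As an illustration of this procedure, the sketch below reduces a matrix to upper triangular form using only two operations: adding a multiple of one row to another (which leaves the determinant unchanged, by Theorem 6.6.5(c)) and interchanging two rows (which changes the sign, since the determinant is alternating). It then multiplies the diagonal entries, using the fact (Exercise 5 below) that the determinant of a triangular matrix is the product of its diagonal entries. The function name is illustrative, and exact rational arithmetic is an implementation choice made here, not something the text prescribes.

```python
from fractions import Fraction


def det_by_row_operations(matrix):
    """Determinant via reduction to upper triangular form with exact fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    for c in range(n):
        # Find a row at or below the diagonal with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # the column is zero below the diagonal
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]  # a row interchange flips the sign
            sign = -sign
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            # Subtract a multiple of row c from row r; the determinant is unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    result = Fraction(sign)
    for k in range(n):
        result *= a[k][k]
    return result


print(det_by_row_operations([[2, 0, 1], [1, 3, -1], [0, 5, 4]]))
```

This takes on the order of $n^3$ arithmetic operations, compared with $n!$ terms in the permutation sum.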

Suppose now that $[a_{ij}]$ is an $n \times n$ matrix. Fix the integers $i$ and $j$ with $i \neq j$, and let $[b_{ij}]$ be obtained from $[a_{ij}]$ by replacing the $i$th row of $[a_{ij}]$ by its $j$th row. Then $[b_{ij}]$ has two rows that are equal and hence has a zero determinant. Use formula (6.6.11) to expand by the $i$th row of $[b_{ij}]$ to get (remember that $b_{ik} = a_{jk}$!)

$$\det[b_{ij}] = \sum_{k=1}^{n} a_{jk} \operatorname{Co}(a_{ik}) = 0.$$

Thus we see that $\forall i, j \in (1, n)$ we may write

$$\sum_{k=1}^{n} a_{jk} \operatorname{Co}(a_{ik}) = \delta_{ij} \det[a_{ij}], \tag{6.6.12}$$

where $\delta_{ij} = 0$ if $i \neq j$ & $\delta_{ii} = 1$.

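Let $\operatorname{adj}[a_{ij}]$ be a matrix whose $(i, j)$ entry is the number $\operatorname{Co}(a_{ji})$. This matrix has classically been called the adjoint of $[a_{ij}]$, although in recent times the word 'adjoint' has been used for other purposes. The formula (6.6.12) shows that

$$[a_{ij}]\,(\operatorname{adj}[a_{ij}]) = (\det[a_{ij}])\,[\delta_{ij}]. \tag{6.6.13}$$

By working with the columns of $[a_{ij}]$ instead of the rows, the same kind of reasoning leads to the fact that

$$(\operatorname{adj}[a_{ij}])\,[a_{ij}] = (\det[a_{ij}])\,[\delta_{ij}]. \tag{6.6.13'}$$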
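The two adjoint identities can be verified directly on a small integer matrix. The following is a minimal sketch; the helper names are illustrative, indices are 0-based, and the $(i, j)$ entry of `adjugate(m)` is the cofactor of the $(j, i)$ entry of `m`, as in the definition above.

```python
def minor(m, j, k):
    """Matrix obtained by deleting row j and column k (0-based)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(m) if i != j]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    return sum((-1) ** k * m[0][k] * det(minor(m, 0, k)) for k in range(len(m)))


def adjugate(m):
    """Matrix whose (i, j) entry is the cofactor of the (j, i) entry of m."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Hypothetical sample matrix.
a = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
d = det(a)
scaled_identity = [[d if i == j else 0 for j in range(3)] for i in range(3)]

print(matmul(a, adjugate(a)) == scaled_identity)
print(matmul(adjugate(a), a) == scaled_identity)
```

When the determinant is nonzero, dividing the adjoint by the determinant gives the inverse matrix.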


The equation (6.6.13') leads to Cramer's rule for solving a system of $n$ linear equations in $n$ unknowns. Let $\{y_1, \ldots, y_n\}$ be a given set of real numbers and suppose $\{x_1, \ldots, x_n\}$ is another set of real numbers so that the matrix equation

$$[a_{ij}] \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \tag{6.6.14}$$

is satisfied. If we perform the matrix multiplication and equate corresponding entries on each side, we see that this is the same as a system of $n$ linear equations in $n$ "unknowns."

If we multiply both sides of the matrix equation (6.6.13') on the right by the column matrix consisting of the $x_k$, we get

$$(\operatorname{adj}[a_{ij}]) \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = (\det[a_{ij}]) \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}. \tag{6.6.15}$$

If we multiply out the left side and equate entries we get

$$x_k \det[a_{ij}] = \sum_{j=1}^{n} y_j \operatorname{Co}(a_{jk}). \tag{6.6.15'}$$

By the formula for the expansion of the determinant of a matrix by cofactors of a column, the right side of (6.6.15') is nothing more than the determinant of the matrix $[b_{ij}]_k$ which is obtained from the matrix $[a_{ij}]$ by replacing the $k$th column of the latter matrix by the column consisting of the $y_j$. Now, if $\det[a_{ij}] = 0$, there is nothing more that can be said at this point. However, if $\det[a_{ij}] \neq 0$, from (6.6.15') it follows that

$$x_k = \frac{\det[b_{ij}]_k}{\det[a_{ij}]}. \tag{6.6.16}$$

This shows that if $\det[a_{ij}] \neq 0$ and the equation (6.6.14) has a solution, the solution must be of the form (6.6.16), which implies the solution is unique.
Conversely, if we suppose $\det[a_{ij}] \neq 0$, then the numbers $x_k$ given by (6.6.16) satisfy the equation (6.6.15') or, what is the same thing, the equation (6.6.15). Multiply both sides of equation (6.6.15) on the left by the matrix $[a_{ij}]$ and then use the relation (6.6.13). This shows that (6.6.14) is satisfied. The formula (6.6.16) is known as Cramer's rule for solving (6.6.14). We can summarize these facts as follows:

6.6.12 Theorem. If $[a_{ij}]$ is an $n \times n$ matrix with $\det[a_{ij}] \neq 0$, and $\{y_1, \ldots, y_n\}$ is any set of $n$ real numbers, then the equation (6.6.14) has a unique solution given by (6.6.16) (Cramer's rule), where $[b_{ij}]_k$ is the matrix obtained from $[a_{ij}]$ by replacing the $k$th column in $[a_{ij}]$ by the column vector $(y_1, \ldots, y_n)$.
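Cramer's rule translates almost word for word into code. The following is a minimal sketch using exact rational arithmetic; the function names and the sample system are illustrative, not taken from the text, and the routine refuses to proceed when the determinant is zero, since the theorem makes no claim in that case.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    return sum(
        (-1) ** k * m[0][k] * det([row[:k] + row[k + 1:] for row in m[1:]])
        for k in range(len(m))
    )


def replace_column(m, k, column):
    """Copy of m with its kth column (0-based) replaced by the given column."""
    return [row[:k] + [column[i]] + row[k + 1:] for i, row in enumerate(m)]


def cramer(a, y):
    """Solve a x = y by x_k = det(a with column k replaced by y) / det(a)."""
    d = det(a)
    if d == 0:
        raise ValueError("determinant is zero; Cramer's rule does not apply")
    return [Fraction(det(replace_column(a, k, y)), d) for k in range(len(a))]


# Hypothetical sample system with integer coefficients.
a = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
y = [3, 3, 9]
x = cramer(a, y)
print(x)
```

For large systems this is far slower than elimination; its value lies in giving an explicit formula for each unknown.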

Suppose now that $I$ is the identity transformation of a linear space onto itself. The matrix representation of $I$ with respect to every ordered basis is $[\delta_{ij}]$, where $\delta_{ij} = 0$ if $i \neq j$, $\delta_{ii} = 1$. If $A$ is a linear transformation with an inverse, $[a_{ij}]$ a matrix representation of $A$ with respect to an ordered basis, and $[a_{ij}]^{-1}$ denotes the matrix representation of $A^{-1}$ with respect to the same ordered basis, then

$$[a_{ij}]^{-1}[a_{ij}] = [\delta_{ij}].$$

Since by Corollary 6.6.9, the determinant of a product of square matrices is the product of the determinants of the matrices, it follows that

$$\det[a_{ij}]^{-1} = (\det[a_{ij}])^{-1}.$$

Let $T$ be a linear transformation of a vector space into itself and $[t_{ij}]$ a matrix representation of $T$ with respect to a given ordered basis. If $[t'_{ij}]$ is the representation of $T$ with respect to another ordered basis, then according to formula (6.5.1) there exists an invertible linear operator $Q$ so that if $[q_{ij}]$ is the matrix representation of $Q$ with respect to the original ordered basis, then

$$[t'_{ij}] = [q_{ij}]^{-1}[t_{ij}][q_{ij}].$$

Hence

$$\det[t'_{ij}] = \det[t_{ij}].$$

This shows that the determinant of any matrix representation of $T$ has the same value. This fact allows us to define the determinant of a linear transformation (of a linear space into itself) as the determinant of any matrix representation of the linear transformation with respect to an ordered basis.

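6.6.13 Theorem. A linear transformation $T$ is nonsingular $\Leftrightarrow \det T \neq 0$.

Proof. If $T$ has an inverse $T^{-1}$, then $T^{-1}T = I$ and hence

$$\det T^{-1} \det T = 1.$$

This shows $\det T \neq 0$.

Conversely, suppose $\det T \neq 0$ and $T(x) = 0$. Suppose $(u_1, \ldots, u_n)$ is an ordered basis for $\mathfrak{D}(T)$ and

$$T(u_k) = \sum_{j=1}^{n} t_{jk} u_j, \qquad x = \sum_{k=1}^{n} x^k u_k.$$

Then

$$T(x) = \sum_{k=1}^{n} x^k T(u_k) = \sum_{j=1}^{n} \Bigl( \sum_{k=1}^{n} t_{jk} x^k \Bigr) u_j = 0.$$

Since the $u_j$ form a linearly independent set, we have a system of linear equations

$$\sum_{k=1}^{n} t_{jk} x^k = 0, \qquad j \in (1, n).$$

Since $\det T \neq 0$, Cramer's rule tells us that $\forall k \in (1, n)$, $x^k = 0$. Hence $x = 0$, which shows that $T$ is one to one.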

We shall close this section by proving a theorem that we shall need


at one point in Chapter 7.

6.6.14 Theorem. Suppose $T$ is a symmetric linear transformation defined on a linear subspace of dimension $n$ in $E^q$, and $[t_{ij}]$ a matrix representation with respect to an ordered orthonormal basis. Let us set

$$\Delta_k = \begin{vmatrix} t_{11} & t_{12} & \cdots & t_{1k} \\ t_{21} & t_{22} & \cdots & t_{2k} \\ \vdots & \vdots & & \vdots \\ t_{k1} & t_{k2} & \cdots & t_{kk} \end{vmatrix}. \tag{6.6.17}$$

Necessary and sufficient conditions that $\exists m > 0$ so that $\forall x \in \mathfrak{D}(T)$

$$T(x) \cdot x \ge m |x|^2 \tag{6.6.18}$$

are

$$\Delta_k > 0, \qquad \forall k \in (1, n). \tag{6.6.19}$$

Proof. Let us first prove the necessity. Suppose that $[t_{ij}]$ is the matrix representation with respect to the ordered orthonormal basis $(u_1, \ldots, u_n)$. Then the matrix $[t_{ij}]$ itself is symmetric; that is, $t_{ij} = t_{ji}$. The condition (6.6.18) means certainly that all the eigenvalues of $T$ are bounded below by $m$. If $\{\lambda_k : k \in (1, n)\}$ are the eigenvalues of $T$, then since $\det T$ is independent of any matrix representation, we have

$$\Delta_n = \det T = \prod_{k=1}^{n} \lambda_k \ge m^n > 0.$$

Now, $\forall k \in (1, n)$, let $T_k$ be the symmetric linear transformation acting on the space generated by $(u_1, \ldots, u_k)$ and given by

$$T_k(u_j) = \sum_{i=1}^{k} t_{ij} u_i, \qquad j \in (1, k).$$

The symmetry of $T_k$ comes from the symmetry of $[t_{ij}]$ or from the easily established fact that $\forall x, y \in \mathfrak{D}(T_k)$,

$$T_k(x) \cdot y = T(x) \cdot y = x \cdot T(y) = x \cdot T_k(y).$$

Replacing $y$ by $x$ in the above equalities we see that the inequality (6.6.18) is satisfied for $T_k$. Thus arguing in exactly the same way as before, we get that

$$\Delta_k = \det T_k \ge m^k > 0.$$
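The proof of the sufficiency will proceed by induction. If $n = 1$, it is clear that the inequality (6.6.19) implies the inequality (6.6.18). Assume now that the sufficiency is true for $n = p - 1 < q$. Let $T$ be a symmetric transformation on a linear subspace of dimension $p$ in $E^q$, and suppose the inequalities (6.6.19) are satisfied $\forall k \in (1, p)$. By repeated application of Theorem 6.6.5(c) on the addition of multiples of a row of a determinant to other rows, and the use of Theorem 6.6.10, which allows us to do the same thing for columns, we get

$$\Delta_p = \begin{vmatrix} t_{11} & \cdots & t_{1p} \\ \vdots & & \vdots \\ t_{p1} & \cdots & t_{pp} \end{vmatrix}
= \begin{vmatrix} t_{11} & 0 & \cdots & 0 \\ 0 & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ 0 & s_{p2} & \cdots & s_{pp} \end{vmatrix},$$

where, since $t_{rs} = t_{sr}$, $\forall r, s \in (2, p)$,

$$s_{rs} = t_{rs} - \frac{t_{r1} t_{1s}}{t_{11}} = s_{sr}.$$

Note that, since $\Delta_1 = t_{11} > 0$, we can divide by $t_{11}$. Further, if $\forall k \in (2, p)$,

$$\Gamma_{k-1} = \begin{vmatrix} s_{22} & \cdots & s_{2k} \\ \vdots & & \vdots \\ s_{k2} & \cdots & s_{kk} \end{vmatrix},$$

it follows that

$$\Delta_k = t_{11} \Gamma_{k-1}, \qquad \forall k \in (2, p).$$

Thus

$$\Gamma_k > 0, \qquad \forall k \in (1, p - 1). \tag{6.6.20}$$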

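Let $S$ be the symmetric transformation, with domain the linear space generated by $(u_2, \ldots, u_p)$, and defined by

$$S u_j = \sum_{i=2}^{p} s_{ij} u_i, \qquad j \in (2, p).$$

From (6.6.20) and the inductive hypothesis it follows that $\forall x \in \mathfrak{D}(S)$, $x \neq 0$, we have

$$S(x) \cdot x > 0.$$

Let us now set

$$u'_k = u_k - \frac{t_{1k}}{t_{11}} u_1, \qquad k \in (2, p).$$

It is an easy matter to check that $(u_1, u'_2, \ldots, u'_p)$ is a basis for $\mathfrak{D}(T)$, and we leave this for the reader. Now, $\forall j, k \in (2, p)$,

$$T(u'_j) \cdot u'_k = t_{kj} - \frac{t_{1j} t_{k1}}{t_{11}} = S(u_j) \cdot u_k.$$

Further, $\forall k \in (2, p)$,

$$T(u_1) \cdot u'_k = u_1 \cdot T(u'_k) = 0.$$

Suppose now that

$$x = x^1 u_1 + \sum_{k=2}^{p} x^k u'_k, \qquad x' = \sum_{k=2}^{p} x^k u_k.$$

Then, since $u_1 \cdot T(u'_k) = 0$, we have, if $x \neq 0$,

$$\begin{aligned}
T(x) \cdot x &= t_{11}(x^1)^2 + \sum_{k=2}^{p} \sum_{j=2}^{p} x^j x^k\, T(u'_j) \cdot u'_k \\
&= t_{11}(x^1)^2 + \sum_{k=2}^{p} \sum_{j=2}^{p} x^j x^k\, S(u_j) \cdot u_k \\
&= t_{11}(x^1)^2 + S(x') \cdot x' > 0.
\end{aligned}$$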

In other words, $\forall x \in \mathfrak{D}(T)$, $x \neq 0$, we have $T(x) \cdot x > 0$.

The function with values $T(x) \cdot x$ is continuous. If we restrict it to the compact set $K = \{x : |x| = 1 \ \&\ x \in \mathfrak{D}(T)\}$, then since $T(x) \cdot x > 0$, $\exists m > 0$, so that $x \in K \Rightarrow T(x) \cdot x \ge m$. If $x \neq 0$, $x \in \mathfrak{D}(T)$, then $x/|x| \in K$ and thus

$$T(x) \cdot x \ge m |x|^2.$$

If $x = 0$, this inequality is obviously fulfilled. Thus the induction is established and the theorem is proved.

6.6.15 Corollary. Suppose $T$, $[t_{ij}]$, and $\Delta_k$ are as in the last theorem. Then necessary and sufficient conditions that $\exists m > 0$ so that $\forall x \in \mathfrak{D}(T)$

$$T(x) \cdot x \le -m |x|^2 \tag{6.6.21}$$

are

$$(-1)^k \Delta_k > 0, \qquad \forall k \in (1, n). \tag{6.6.22}$$

In case $\det T = \Delta_n \neq 0$, but the conditions (6.6.19) do not hold and the conditions (6.6.22) do not hold, then $\exists x$ so that $T(x) \cdot x > 0$ and $\exists y$ so that $T(y) \cdot y < 0$.

Proof. If we set $S = -T$ and $\Delta'_k$ the determinant in (6.6.17) for the transformation $S$, then

$$\Delta'_k = (-1)^k \Delta_k.$$

The first statement in the corollary is now an immediate consequence of the previous theorem.

To prove the second statement we first note that since $\Delta_n \neq 0$, zero is not an eigenvalue of $T$. If the conditions (6.6.19) do not hold and the conditions (6.6.22) do not hold, then $T$ must have at least one positive and one negative eigenvalue. Taking $x$ a nonzero eigenvector for the positive eigenvalue and $y$ a nonzero eigenvector for the negative eigenvalue completes the proof.
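The sign conditions (6.6.19) and (6.6.22) are easy to test mechanically. The sketch below computes the determinants $\Delta_k$ of a symmetric matrix and reports which of the two sign patterns, if either, holds. The function names, the return labels, and the sample matrices are illustrative; the quadratic form is evaluated in coordinates with respect to an orthonormal basis, where $T(x) \cdot x = \sum_{i,j} t_{ij} x^i x^j$.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    return sum(
        (-1) ** k * m[0][k] * det([row[:k] + row[k + 1:] for row in m[1:]])
        for k in range(len(m))
    )


def leading_minors(t):
    """The determinants Delta_1, ..., Delta_n of the upper-left k x k blocks."""
    return [det([row[:k] for row in t[:k]]) for k in range(1, len(t) + 1)]


def quadratic_form(t, x):
    """Sum over i, j of t[i][j] * x[i] * x[j]."""
    n = len(t)
    return sum(t[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def sign_pattern(t):
    """'positive' if all Delta_k > 0, 'negative' if all (-1)^k Delta_k > 0, else 'neither'."""
    minors = leading_minors(t)
    if all(d > 0 for d in minors):
        return "positive"
    if all((-1) ** k * d > 0 for k, d in enumerate(minors, start=1)):
        return "negative"
    return "neither"


# Hypothetical sample matrices.
t = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
neg_t = [[-entry for entry in row] for row in t]
indefinite = [[1, 2], [2, 1]]

print(leading_minors(t), sign_pattern(t))
print(leading_minors(neg_t), sign_pattern(neg_t))
print(leading_minors(indefinite), sign_pattern(indefinite))
print(quadratic_form(indefinite, (1, 0)), quadratic_form(indefinite, (1, -1)))
```

The last matrix has nonzero determinant but satisfies neither sign pattern, and the two sample vectors give quadratic-form values of opposite sign, as the second statement of the corollary predicts.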


D Exercises
1.

Compute the following determinants:


(a)

(c)

-1
I 1
-2

-2

-1

-1
1
1
3

(b)

( d)

-3

-1

a
a

1
c
2
c

2. Use Cramer's rule to obtain solutions (if possible) to the following systems of linear equations:

(a)
$2x_1 + 3x_2 - 5x_3 = 3$
$x_1 - 2x_2 + x_3 = 0$
$3x_1 + x_2 + 3x_3 = 0.$

(b)
$2x_1 + 4x_2 - 3x_3 = 3$
$3x_1 - 8x_2 + 6x_3 =$
$4x_1 + 8x_2 - 6x_3 = 2.$

3. Let $D$ be an alternating multilinear functional on the $n$-fold Cartesian product of $V^n$. Show that

$$D(a_1, \ldots, a_n) = \det[a_{ij}]\, D(e_1, \ldots, e_n).$$

4. Let $\sigma$ be a permutation of $(1, n)$ that interchanges only two elements; that is, $\sigma(j) = k$, $\sigma(k) = j$, and $\forall i$, $i \neq j, k$, $\sigma(i) = i$. Assuming $j \neq k$, give all the details of the proof that $\operatorname{sgn}\sigma = -1$.

5. Let $[a_{ij}]$ be an $n \times n$ matrix with $a_{ij} = 0$ if $i > j$. Prove that

$$\det[a_{ij}] = \prod_{k=1}^{n} a_{kk}.$$

6. If $U$ is an orthogonal linear transformation from an $n$-dimensional real vector space onto itself show that $|\det U| = 1$. Give an example of an orthogonal linear transformation for which the determinant is $-1$ and an example for which the determinant is 1.

7. Suppose that $A$ is a linear transformation of $V^3$ onto itself whose matrix representation with respect to a given ordered basis is

-3
1
-2

-n

Compute the matrix representation of $A^{-1}$ with respect to the same ordered basis.

8. A linear transformation $A$ is called skew-symmetric $\Leftrightarrow A^t = -A$. If $A$ acts on an $n$-dimensional space into itself, and $n$ is odd, prove that $\det A = 0$.

9. Suppose that we have an $(r + s) \times (r + s)$ matrix of the form

$$\begin{bmatrix} A & B \\ 0 & C \end{bmatrix},$$

where $A$ is an $r \times r$ matrix, $B$ is an $r \times s$ matrix, $C$ is an $s \times s$ matrix, and the lower left-hand corner is an $s \times r$ zero matrix. Show that

$$\det \begin{bmatrix} A & B \\ 0 & C \end{bmatrix} = (\det A)(\det C).$$

(Hint: Fix $A$ and $B$ and set

$$D(A, B, C) = \det \begin{bmatrix} A & B \\ 0 & C \end{bmatrix}.$$

Show that this is an alternating multilinear functional on the $s$-fold Cartesian product of $V^s$. Hence by Exercise 3

$$D(A, B, C) = (\det C)\, D(A, B, I),$$

where $I$ is the identity matrix $[\delta_{ij}]$. Next show that by subtracting multiples of the rows of $I$ from the rows of $B$ we get

$$D(A, B, I) = D(A, 0, I),$$

and the right side is an alternating multilinear functional of the rows of $A$.)
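10. Show that the matrix equation

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

has a nonzero solution $(x_1, \ldots, x_n) \Leftrightarrow \det[a_{ij}] = 0$.

11. Define a submatrix of a given matrix as one obtained by striking out rows and/or columns from the original matrix. Let $T$ be a linear transformation and $[t_{ij}]$ a matrix representation with respect to any ordered pair of ordered bases. Show that rank $T = p \Leftrightarrow$ there exists a $p \times p$ submatrix of $[t_{ij}]$ with nonvanishing determinant, and any submatrix with more rows and columns has a vanishing determinant.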

6.7 FUNCTION SPACES

Let $K$ be any set in $E^n$ and designate by $C_m(K)$ the collection of all bounded continuous functions with domain $K$ and range in $E^m$. As usual, we shall define the sum of two functions in $C_m(K)$ by the equality

$$(f + g)(x) = f(x) + g(x), \qquad \forall x \in K,$$

and $\forall \alpha \in R$ we define

$$(\alpha f)(x) = \alpha f(x), \qquad \forall x \in K.$$

With these definitions $C_m(K)$ becomes a real vector space as defined in Section 6.1. However, this vector space is finite-dimensional if and only if $K$ has a finite number of points (Exercise 1).

We can put a norm on the elements of $C_m(K)$ by defining

$$\|f\| = \sup\{|f(x)| : x \in K\}.$$

It is clear that

$$\|f\| = 0 \Leftrightarrow f = 0.$$

Also, it is almost immediate that $\forall \alpha \in R$ and $\forall f \in C_m(K)$,

$$\|\alpha f\| = |\alpha| \, \|f\|,$$

and $\forall f, g \in C_m(K)$ the triangle inequality

$$\|f + g\| \le \|f\| + \|g\|$$

is valid. The distance between $f$ and $g$ is taken as the number $\|f - g\|$. Using this definition of distance it is clear how to define a Cauchy sequence in $C_m(K)$. In Theorem 3.4.5 we proved that a uniformly Cauchy function sequence is uniformly convergent. This was done for function sequences whose elements were real-valued functions with a common domain in $R$. However, the proof carries over mutatis mutandis to Cauchy sequences in $C_m(K)$. In Theorem 3.4.6 we proved that a uniformly convergent sequence of continuous functions converges to a continuous function. This also carries over to $C_m(K)$. Thus we have the following fact.

6.7.1 Proposition. The space $C_m(K)$ is complete in the sense that every Cauchy sequence with range in $C_m(K)$ converges to an element of this space.

An open ball in $C_m(K)$ can be defined in the same way as for Euclidean space:

$$B(g, \rho) = \{f : f \in C_m(K) \ \&\ \|f - g\| < \rho\}$$

is said to be the open ball in $C_m(K)$ with center $g$ and radius $\rho$. Using the concept of open ball, an open set in $C_m(K)$ is a set for which every point is the center of an open ball contained completely in the set. Thus we are led to the idea of an open covering for a set in $C_m(K)$. Also, the idea of an open ball or an open set can be used to formulate the concepts of accumulation point and closure in the obvious way.

It is probably clear from the discussion of the last paragraph on open sets and open coverings that we are interested in the concept of compactness for subsets of $C_m(K)$. We saw in Chapter 2 and again in Section 6.3 of this chapter the importance of the Heine-Borel theorem. Unfortunately, it is no longer true that a closed and bounded set in $C_m(K)$ has the Heine-Borel property. Since the Heine-Borel property seems to be the more important property we define compactness in terms of it.

6.7.2 Definition. A set $A \subset C_m(K)$ is said to be compact $\Leftrightarrow$ in every open covering for $A$ there are a finite number of sets which cover $A$.

There is a covering concept in terms of open balls which as we shall see is essentially the same as the Heine-Borel property, but is much easier to use in some contexts. We first give a definition.

6.7.3 Definition. A set $A \subset C_m(K)$ is said to be totally bounded $\Leftrightarrow$ $\forall \varepsilon > 0$, there exists a finite set $\{g_k : k \in (1, p)\} \subset A$ so that

$$A \subset \bigcup \{B(g_k, \varepsilon) : k \in (1, p)\}.$$


It is very easy to show that a totally bounded set is bounded and we
leave this for the reader (Exercise 2). A more important fact is the
connection between total boundedness and compactness. To establish
the connection it seems to be necessary to use the axiom of choice that
we enunciated in Section 2.1. Recall that we agreed to use the symbol
(AC) in front of a statement that required this axiom.
6.7.4 Theorem. A compact set in $C_m(K)$ is closed and totally bounded. (AC) Conversely every closed and totally bounded set in $C_m(K)$ is compact.

Proof. Let us first prove that every compact set is totally bounded. Suppose $A$ is compact; then $\forall \varepsilon > 0$, $\{B(g, \varepsilon) : g \in A\}$ is an open covering for $A$. By the definition of compactness, this open covering reduces to a finite subcover, which is just the definition of total boundedness.

To show $A$ is closed, let $g \in A^c$. Since every element in $A$ has a positive distance from $g$, the collection of open sets $\{\overline{B}(g, 1/k)^c : k \in N\}$, where $\overline{B}$ denotes a closed ball, is a covering for $A$. By compactness there are a finite number of these sets $\{\overline{B}(g, 1/k_j)^c : j \in (1, p)\}$ which cover $A$. Let $\rho = \min\{1/k_j : j \in (1, p)\}$; then $\overline{B}(g, \rho)^c$ covers $A$ and thus $B(g, \rho) \subset A^c$. This shows $A^c$ is open and hence $A$ is closed.
To prove the converse statement we shall first proceed in a way in which it may not be completely clear just how the axiom of choice is being used. We are doing this so that the idea of the proof does not become obscured with formalities; we shall take care of the formalities later on.

It will make technical matters slightly simpler if we restrict the distance function on $C_m(K) \times C_m(K)$ to $A \times A$ and forget about the ambient space $C_m(K)$ altogether. Thus when we speak about a ball or an open set we shall mean a ball in $A$ or an open set in $A$ with respect to the given distance function. Since, by hypothesis, $A$ is closed and $C_m(K)$ is complete, it follows that $A$ is complete; that is, every Cauchy sequence with elements in $A$ converges to an element in $A$.
Suppose now that $\mathcal{U}$ is an open covering for $A$ but no finite subset of $\mathcal{U}$ covers $A$. Since $A$ is totally bounded, a finite number of balls of radius 1 covers $A$. Hence there must be one of these balls, say $B_0$, so that no finite subset of $\mathcal{U}$ covers $B_0$. Now, $B_0$ (being a subset of $A$) is totally bounded. Indeed, $\forall \varepsilon > 0$ we may cover $A$ by a finite number of balls of radius $\varepsilon/2$ and hence a finite number of these balls covers $B_0$. The only question is whether these balls have their centers in $B_0$. However, if a ball of radius $\varepsilon/2$ has a nonvoid intersection with $B_0$ we can put a ball of radius $\varepsilon$ around a point of this intersection and so get a finite set of balls with centers in $B_0$ and radius $\varepsilon$ which covers it.

Since $B_0$ is totally bounded and covered by $\mathcal{U}$ but no finite subset of $\mathcal{U}$ covers $B_0$, we may repeat the argument given above and find a ball $B_1$ in $A$ of radius $1/2$ with center in $B_0$ so that no finite subset of $\mathcal{U}$ covers $B_1$. Proceeding in this way (a rather vague statement!) we get a sequence $(B_n)$ of balls with the radius of $B_n$ being $1/2^n$, the center of $B_n$ is in $B_{n-1}$ and no finite subset of $\mathcal{U}$ covers $B_n$. Let $g_n$ be the center of $B_n$. Then for $n \ge m$ we have

$$\|g_n - g_m\| \le \sum_{k=m}^{n-1} \|g_{k+1} - g_k\| < \frac{1}{2^{m-1}}.$$

Thus the sequence $(g_k)$ is Cauchy, and since $A$ is complete, it follows that $\exists g \in A$, so that $g_k \to g$. Since $\mathcal{U}$ is an open covering for $A$, $\exists U \in \mathcal{U}$ so that $g \in U$. However, $\exists N$ so that $n \ge N \Rightarrow g_n \in U$ and $B_n \subset U$. This is a contradiction, since no finite subset of $\mathcal{U}$ covers $B_n$.

The part of the proof that requires the axiom of choice is the rigorous establishment of the existence of the sequence $(B_n)$. Let $R$ be the relation consisting of all ordered pairs $(B(g, 1/2^n), B(h, 1/2^{n+1}))$, where $n$ ranges over $N_0$, $h \in B(g, 1/2^n)$ and each of the balls $B(g, 1/2^n)$ and $B(h, 1/2^{n+1})$ cannot be covered by any finite subset of $\mathcal{U}$. If we suppose that $A$ is totally bounded but cannot be covered by any finite subset of $\mathcal{U}$, then $R$ is nonvoid and indeed $\forall n \in N_0$, $R$ contains an ordered pair $(B(g, 1/2^n), B(h, 1/2^{n+1}))$. Now, $\forall \beta \in \mathfrak{D}(R)$, put

$$R_\beta = \{\gamma : (\beta, \gamma) \in R\}.$$

The set $R_\beta$ is nonvoid. Thus, using the axiom of choice, there exists a function $F$ with $\mathfrak{D}(F) = \mathfrak{D}(R)$ and $F(\beta) \in R_\beta$.

Using the function $F$ we can establish the existence of the sequence $(B_n)$ by means of the axiom of induction. Let $B(g_0, 1)$ be a ball in $A$ which cannot be covered by any finite set from $\mathcal{U}$. We leave to the reader the simple induction argument which establishes that $\forall n \in N_0$, there exists a unique function $G_n$ with domain $(0, n)$ so that

$$G_n(0) = B(g_0, 1), \qquad G_n(k + 1) = F(G_n(k)), \quad k \in (0, n - 1), \ n \ge 1.$$

Now, $\forall n \in N_0$, take $G(n) = G_n(n)$ and it is the function $G$ that is the sequence $(B_n)$.

Another useful equivalence with compactness in $C_m(K)$ is given by the following theorem.

6.7.5 Theorem. If $A \subset C_m(K)$ is compact, then every sequence with range in $A$ has a subsequence that converges to an element of $A$. (AC) Conversely, if every sequence with range in $A$ has a subsequence that converges to an element of $A$, then $A$ is compact.

Proof. Suppose $A$ is compact and $(g_n)$ is a sequence with range in $A$. If no subsequence of $(g_n)$ converges to a point of $A$, then $\forall g \in A$, $\exists \rho_g$ so that the set $\{k : g_k \in B(g, \rho_g)\}$ is finite. To prove this we assume to the contrary that $\exists g \in A$ so that $\forall \rho > 0$, $\{k : g_k \in B(g, \rho)\}$ is infinite. Let $A_{n-1} = \{k : g_k \in B(g, 1/n)\}$, $n \in N$. Using the axiom of induction, it is easy to establish that $\forall n \in N_0$, there exists a unique function $\varphi_n$ with domain $(0, n)$ so that $\varphi_n(0) = \min A_0$ and $\forall k \in (1, n)$

$$\varphi_n(k) = \min \{A_k \setminus \{\varphi_n(0), \ldots, \varphi_n(k - 1)\}\}.$$

Note that $A_k \setminus \{\varphi_n(0), \ldots, \varphi_n(k - 1)\}$ is nonvoid, since $A_k$ is infinite. Further, since $\forall n \in N_0$, $A_{n+1} \subset A_n$, it follows that $\varphi_n(n) < \varphi_{n+1}(n + 1)$. Now let $\varphi$ be that function with domain $N_0$ so that $\varphi(n) = \varphi_n(n)$. Then $(g_{\varphi(n)})$ is a subsequence of $(g_n)$ that converges to $g$. This is a contradiction.

The collection $\{B(g, \rho_g) : g \in A\}$ is an open covering for $A$ and thus reduces to a finite subcover. By what we have proved above, this means that $\{k : g_k \in A\}$ is a finite set, which is a contradiction.
Conversely, suppose every sequence with range in $A$ has a subsequence that converges to a point of $A$. The first conclusion that can be drawn from this is that $A$ is closed. For if $g$ is an accumulation point of $A$, using the axiom of choice we can pick a sequence from the collection of balls $\{B(g, 1/n) : n \in N\}$ which converges to $g$. But since this sequence contains a subsequence that converges to an element of $A$, we must have $g \in A$.

If the set $A$ is not totally bounded, then $\exists \varepsilon > 0$ so that no finite set of balls of radius $\varepsilon$ covers $A$. Pick $g_0 \in A$, and since $B(g_0, \varepsilon)$ does not cover $A$ we may pick $g_1 \in A \cap B(g_0, \varepsilon)^c$. Since $B(g_0, \varepsilon)$ and $B(g_1, \varepsilon)$ do not cover $A$, we may pick $g_2$ in $A$ and outside of $B(g_0, \varepsilon) \cup B(g_1, \varepsilon)$. Proceeding in this way we get a sequence $(g_k)$ with range in $A$ so that $\forall n, m \in N_0$, $n \neq m$, $\|g_n - g_m\| \ge \varepsilon$. Hence no subsequence of $(g_n)$ is Cauchy and hence no subsequence can converge. This is a contradiction.

Of course, the proof of the existence of the sequence $(g_k)$ of the last paragraph requires the axiom of choice. The method of procedure is similar to that used in the proof of Theorem 6.7.4. We are going to leave it to the reader as an exercise to make this precise.
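For a finite set of points the selection procedure of the last proof terminates, and it then produces exactly the kind of finite covering by balls that total boundedness asks for. The following is a minimal sketch of that greedy selection, with functions on a finite set $K$ represented as tuples of values and the distance taken to be the supremum norm of the difference; the names and the sample data are illustrative and not from the text.

```python
def sup_distance(f, g):
    """Supremum-norm distance between two functions given as tuples of values."""
    return max(abs(u - v) for u, v in zip(f, g))


def greedy_separated(points, eps, dist):
    """Pick points one at a time, each at distance >= eps from all earlier picks."""
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


# Hypothetical sample: five functions on a two-point set K.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.0, 0.9), (0.5, 0.5)]
eps = 0.3
centers = greedy_separated(points, eps, sup_distance)
print(centers)
```

Every point that was not selected lies within distance less than `eps` of some selected point, so the open balls of radius `eps` about the selected points cover the whole set. In an infinite set that is not totally bounded the selection never stops, which is how the proof obtains a sequence with no Cauchy subsequence.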
The last two theorems we have proved can be carried over to very much more general situations than what we have considered here: they can be carried over to metric spaces. A metric space is a set $X$ together with a distance function $d$ defined on $X \times X$ which satisfies the following:

(a) $d(x, y) \ge 0$, $\forall x, y \in X$.
(b) $d(x, y) = 0 \Leftrightarrow x = y$.
(c) $d(x, y) = d(y, x)$, $\forall x, y \in X$.
(d) $d(x, y) \le d(x, z) + d(z, y)$, $\forall x, y, z \in X$.

We leave it to the reader to formulate the relevant concepts in this context and to convince himself that the last two theorems can be stated and proved in essentially the same way as for $C_m(K)$.

THE ARZELA-ASCOLI THEOREM

The compactness of a set in $C_m(K)$ is not equivalent with the set being
closed and bounded. However, if $K$ is compact in $E^n$ we get equivalence
(using the axiom of choice) if we add a third condition to "closed and
bounded." The third condition is equicontinuity, which we now formulate.

6.7.6 Definition. A set $A \subset C_m(K)$ is said to be equicontinuous $\iff$
$\forall\epsilon > 0$, $\exists\delta > 0$ so that $\forall x, y \in K$ with $|x - y| < \delta$ and $\forall f \in A$,

$$|f(x) - f(y)| < \epsilon.$$

Note that every element of an equicontinuous set is uniformly continuous.
A simple example which shows that there may be closed and
bounded sets in $C_m(K)$ which are not compact is provided by the following:
Let $K = [0, 1]$ and

$$A = \{f: f \in C_1(K) \ \&\ \|f\| \le 1\}.$$

The set $A$ is clearly closed and bounded. However, the functions defined by
$f_n(x) = x^n$ are in $A$, but no subsequence converges to an element of $A$.
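The failure of compactness in this example can be checked numerically. The sketch below is only an illustration, not part of the text's argument: the helper name `sup_distance` and the grid size are assumptions made here. It estimates the uniform distance between $x^n$ and $x^m$ on $[0, 1]$. For $m = 2n$ the exact value is $1/4$ (put $t = x^n$ and maximize $t - t^2$), and for fixed $n$ the distance tends to 1 as $m$ grows, so no subsequence of $(f_n)$ can be Cauchy in the uniform norm.

```python
def sup_distance(n, m, samples=100001):
    """Approximate the sup over [0, 1] of |x**n - x**m| on a uniform grid."""
    worst = 0.0
    for i in range(samples):
        x = i / (samples - 1)
        worst = max(worst, abs(x ** n - x ** m))
    return worst


if __name__ == "__main__":
    for n in (1, 5, 50):
        # The distance to a much later term of the sequence stays large.
        print(n, sup_distance(n, 2 * n), sup_distance(n, 100 * n))
```

Because the grid only samples $[0, 1]$, the returned value is a lower estimate of the true supremum; that is enough to see that the distances do not become small.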

6.7.7 Theorem. Suppose $K$ is a compact set in $E^n$ and $A \subset C_m(K)$.
If $A$ is compact, then $A$ is closed, bounded, and equicontinuous. (AC) Conversely,
if $A$ is closed, bounded, and equicontinuous, then $A$ is compact.


Proof. If $A$ is compact, then by Theorem 6.7.4 it follows that $A$
is closed and totally bounded, and hence, a fortiori, $A$ is bounded. Since
$A$ is totally bounded, $\forall\epsilon > 0$, there exists a finite set of open balls
$\{B(g_k, \epsilon/3): k \in (1, p)\}$ which covers $A$. Each $g_k$ is in $A$, and since $K$
is compact each $g_k$ is uniformly continuous. Thus $\exists\delta > 0$, so that
$\forall x, y \in K$ with $|x - y| < \delta$ and $\forall k \in (1, p)$,

$$|g_k(x) - g_k(y)| < \epsilon/3.$$

Now, $\forall f \in A$, $\exists k \in (1, p)$ so that $f \in B(g_k, \epsilon/3)$. Thus $\forall x, y \in K$
with $|x - y| < \delta$, we have

$$|f(x) - f(y)| \le |f(x) - g_k(x)| + |g_k(x) - g_k(y)| + |g_k(y) - f(y)| < \epsilon.$$

Hence $A$ is equicontinuous.
To prove the converse statement we shall prove that A is totally
bounded. Since A is closed it will follow by Theorem 6.7.4 that A is
compact. Note that the axiom of choice is needed, since it is needed
in part of the proof of Theorem 6.7.4.
Since $A$ is equicontinuous, $\forall\epsilon > 0$, $\exists\delta > 0$ so that $\forall x, y \in K$ with
$|x - y| < \delta$ and $\forall f \in A$ we have $|f(x) - f(y)| < \epsilon/4$. Since $K$ is compact,
it is totally bounded and hence there is a finite set of balls $\{B(x_k, \delta):
k \in (1, p)\}$ which covers $K$. Since $A$ is bounded, each set

$$A_k = \{f(x_k): f \in A\}, \qquad k \in (1, p),$$

is bounded, and thus totally bounded. Hence there is a finite set of
balls $\{B(f_i(x_k), \epsilon/4): i \in (1, p_k)\}$ which covers $A_k$.
Suppose, for the moment, that we are able to find a set $\{h_k: k
\in (1, p)\}$ of real-valued nonnegative continuous functions on $K$ so
that $\forall x \in B(x_k, \delta)^c \cap K$, $h_k(x) = 0$, and $\forall x \in K$,

$$\sum_{k=1}^{p} h_k(x) = 1.$$

Let $j = (j_1, \ldots, j_p)$, where $j_k \in (1, p_k)$, and $\forall x \in K$ set

$$g_j(x) = \sum_{k=1}^{p} h_k(x) f_{j_k}(x_k).$$

As $j$ ranges over the set of vectors we have indicated above, we get a
finite set of functions each of which belongs to $C_m(K)$.
Now, $\forall f \in A$, $\exists j = (j_1, \ldots, j_p)$ so that $\forall k \in (1, p)$, $|f(x_k)
- f_{j_k}(x_k)| < \epsilon/4$. This simply follows from the fact that $\{B(f_i(x_k),
\epsilon/4): i \in (1, p_k)\}$ covers $A_k$. Further, by the equicontinuity of $A$,
$\forall x \in B(x_k, \delta)$, $|f(x) - f(x_k)| < \epsilon/4$, and thus $|f(x) - f_{j_k}(x_k)| < \epsilon/2$.
Since $\forall x \in K$,

$$f(x) = \sum_{k=1}^{p} h_k(x) f(x),$$

we get

$$|g_j(x) - f(x)| \le \sum_{k=1}^{p} h_k(x)\, |f_{j_k}(x_k) - f(x)|.$$

For every $x \in K$, let us set $N(x) = \{k: x \in B(x_k, \delta)\}$. For every $k \notin
N(x)$, $h_k(x) = 0$, and $\forall k \in N(x)$, $|f_{j_k}(x_k) - f(x)| < \epsilon/2$. Thus $\forall x
\in K$,

$$|g_j(x) - f(x)| < (\epsilon/2) \sum_{k \in N(x)} h_k(x) = \epsilon/2.$$

This shows that $A$ can be covered by a finite set of balls of radius $\epsilon/2$.
From this we conclude that $A$ is totally bounded. The only small point
to be clarified is that the centers of the balls may not lie in $A$. However,
as in the proof of Theorem 6.7.4, for each of these balls which has a
nonvoid intersection with $A$ we choose a point in this intersection and
take the ball with the chosen point as center and radius $\epsilon$. This gives
a finite set of balls with centers in $A$ and radius $\epsilon$ which cover $A$.

To complete the proof we must show the existence of the set $\{h_k:
k \in (1, p)\}$. Let $\varphi$ be that function on $E^n$ defined by

$$\varphi(x) = \begin{cases} 1 - |x| & \Longleftrightarrow\ |x| < 1, \\ 0 & \Longleftrightarrow\ |x| \ge 1. \end{cases}$$

The function $\varphi$ is clearly continuous. Now define $\varphi_k$ with domain $E^n$, by setting

$$\varphi_k(x) = \varphi((x - x_k)/\delta).$$

The function $\varphi_k$ is clearly continuous, $\forall x \in B(x_k, \delta)$ we have $\varphi_k(x) > 0$,
and $\forall x \in B(x_k, \delta)^c$, $\varphi_k(x) = 0$. If we set

$$B = \cup \{B(x_k, \delta): k \in (1, p)\},$$

then $\forall x \in B$,

$$\Phi(x) = \sum_{k=1}^{p} \varphi_k(x) \ne 0.$$

Let us define $h_k$ on $K$ by putting $\forall x \in K$

$$h_k(x) = \varphi_k(x)/\Phi(x).$$

We see immediately that this set of functions has the required properties.
This completes the proof.
REMARK: The set of functions $\{h_k: k \in (1, p)\}$ that we used in the
proof of the previous theorem is called a partition of unity for $K$
subordinate to the covering $\{B(x_k, \delta): k \in (1, p)\}$. We shall meet
these objects again later on, especially when we study integration on
manifolds.
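The construction of the functions $h_k = \varphi_k/\Phi$ can be tried out on a simple case. The sketch below is an illustration under assumptions chosen here, not taken from the text: $K = [0, 1]$, five centers spaced $0.25$ apart, and $\delta = 0.3$, so that the balls $B(x_k, \delta)$ cover $K$. The names `tent` and `partition_of_unity` are hypothetical.

```python
def tent(x):
    """phi(x) = 1 - |x| for |x| < 1 and 0 otherwise (one-dimensional case)."""
    return max(0.0, 1.0 - abs(x))


def partition_of_unity(centers, delta):
    """Return the functions h_k = phi_k / Phi for the balls B(x_k, delta).

    Assumes the balls cover the set on which the h_k are evaluated,
    so that Phi(x) > 0 there.
    """
    def phi_k(k, x):
        return tent((x - centers[k]) / delta)

    def total(x):
        return sum(phi_k(k, x) for k in range(len(centers)))

    def make_h(k):
        return lambda x: phi_k(k, x) / total(x)

    return [make_h(k) for k in range(len(centers))]


# Assumed example data: K = [0, 1] covered by five balls of radius 0.3.
centers = [0.0, 0.25, 0.5, 0.75, 1.0]
delta = 0.3
hs = partition_of_unity(centers, delta)

if __name__ == "__main__":
    for x in (0.0, 0.1, 0.5, 0.9):
        values = [h(x) for h in hs]
        print(x, values, sum(values))
```

Each $h_k$ is nonnegative, vanishes outside $B(x_k, \delta)$, and the values at any point of $K$ add up to 1, which are exactly the three properties used in the proof.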



THE STONE-WEIERSTRASS THEOREM

We now come to an important generalization of the Weierstrass
approximation theorem which was proved in Section 4.6. This generalized
theorem was proved by M. H. Stone in a context that is more general
than we shall present it, although the proof is the same. The reason we
do not present the theorem in as general a context as originally given is
that a discussion of the relevant concepts would take us too far afield.
The theorem we shall state is valid only for $C_1(K)$, where $K$ is a compact
set in $E^n$. For the sake of simplicity we shall simply write $C(K)$ in place
of $C_1(K)$.

6.7.8 Definition. A set $A \subset C(K)$ is said to be an algebra $\iff$

(a) $A$ is a linear subspace of $C(K)$.
(b) $\forall f, g \in A$, $fg \in A$.

An algebra $A$ is said to be a separating algebra $\iff$ $\forall x, y \in K$ with $x \ne y$, $\exists f
\in A$ so that $f(x) \ne f(y)$. An algebra $A$ is said to be a closed algebra $\iff$ $A$ is
closed as a point set in $C(K)$.

For an example of a separating algebra take $K = [0, 1]$ and $A$ the
collection of all polynomials restricted to $[0, 1]$. If $K$ is a compact set in $E^2$,
take $A$ as the set of all polynomials in two variables restricted to $K$; that
is, the elements of $A$ are functions of the form

$$P(x, y) = \sum_{k=1}^{n} \sum_{j=1}^{m} a_{j,k}\, x^{j} y^{k}, \qquad (x, y) \in K.$$

Clearly this is a separating algebra. A third example of a separating
algebra in $C(K)$ is the set of all functions in $C(K)$ that vanish at a fixed
point of $K$. Indeed the last algebra is even closed.

6.7.9 Theorem. Suppose $K$ is a compact set in $E^n$ and $A$ is a closed
separating algebra in $C(K)$. Then either $A = C(K)$, or $\exists a \in K$ so that $A = \{f: f
\in C(K) \ \&\ f(a) = 0\}$.

Proof. We shall make the proof in a number of steps.

(a) If $g \in A$, then $|g| \in A$.

To make the proof of this theorem independent of the classical
Weierstrass approximation theorem, 4.6.3, we shall proceed in this part
in a way that is slightly longer than absolutely necessary. If $a > 0$, the
function with domain $]-a, \infty[$ and values $(t + a)^{1/2}$ is infinitely
differentiable. By using, for example, the Lagrange form of the remainder, the
reader may easily convince himself that the Taylor expansion of this
function around $t = 1/2$ converges uniformly in the interval $[0, 1]$.
If we replace $t$ by $y^2$ and $a$ by $\epsilon^2$, $\epsilon > 0$, we see then that there is a
polynomial $p$ so that $|p(0)| < 2\epsilon$ and $\forall y \in [0, 1]$,

$$|(y^2 + \epsilon^2)^{1/2} - p(y^2)| < \epsilon.$$

Now,

$$\epsilon\,[(y^2 + \epsilon^2)^{1/2} - (y^2)^{1/2}] \le [(y^2 + \epsilon^2)^{1/2} + (y^2)^{1/2}]\,[(y^2 + \epsilon^2)^{1/2} - (y^2)^{1/2}] = \epsilon^2.$$

Thus, since $|y| = (y^2)^{1/2}$, we have

$$|(y^2 + \epsilon^2)^{1/2} - |y|| \le \epsilon.$$

Consequently,

$$|p(y^2) - |y|| < 2\epsilon.$$

If $\|g\| \le 1$, this gives $\forall x \in K$,

$$|p(g^2(x)) - |g(x)|| < 2\epsilon.$$

Since $A$ is an algebra, $p \circ g^2 - p(0) \in A$. Thus we see that $|g|$ can be
approximated by elements of $A$, and since $A$ is closed this means that
$|g| \in A$. If $\|g\| > 1$, then $g/\|g\|$ has norm 1 and it is in $A$. Thus by this
device we see that it is always true that $g \in A \Rightarrow |g| \in A$.

(b) If $g, h \in A$, then $g \vee h = \max(g, h)$ and $g \wedge h = \min(g, h)$
also belong to $A$.

The proof of this statement is an immediate consequence of the
formulas

$$g \vee h = \tfrac{1}{2}\,[g + h + |g - h|], \qquad g \wedge h = \tfrac{1}{2}\,[g + h - |g - h|],$$

and the facts that $g + h \in A$ and $|g - h| \in A$.

(c) If $\forall x \in K$, $\exists g \in A$ so that $g(x) \ne 0$, then $\forall x, y \in K$ so that $x \ne y$
and $\forall a, b \in R$, $\exists f \in A$ so that $f(x) = a$ and $f(y) = b$.

By hypothesis $\exists g, h \in A$ so that $g(x) = 1$ and $h(y) = 1$. Also, by
hypothesis of the theorem, $\exists p \in A$ so that $p(x) \ne p(y)$. For $\delta, \epsilon \in R$,
let us set

$$\varphi(\delta, \epsilon) = p(x) + \delta + \epsilon h(x),$$
$$\psi(\delta, \epsilon) = p(y) + \delta g(y) + \epsilon.$$

Then, since

$$|\varphi(\delta, \epsilon) - \psi(\delta, \epsilon)| \ge |p(x) - p(y)| - |\delta(1 - g(y)) + \epsilon(h(x) - 1)|,$$

it is clear that $\exists\eta > 0$ so that if $|\delta| < \eta$ and $|\epsilon| < \eta$ then $\varphi(\delta, \epsilon)
\ne \psi(\delta, \epsilon)$. Now, if $p(x) = 0$, then $p(y) \ne 0$ and hence we think it is
clear that it is possible to choose $\delta$ and $\epsilon$, so that $|\delta| < \eta$ and $|\epsilon| < \eta$,
and $\varphi(\delta, \epsilon) \ne 0$ and $\psi(\delta, \epsilon) \ne 0$. If $p(y) = 0$, then $p(x) \ne 0$ and the
same result holds. If $p(x) \ne 0$ and $p(y) \ne 0$ we clearly get the same
result. Thus we see it is possible to choose $\delta$ and $\epsilon$ so that if we set

$$q = p + \delta g + \epsilon h,$$

then $q(x) \ne 0$, $q(y) \ne 0$ and $q(x) \ne q(y)$. Since $A$ is an algebra and
$p, g, h \in A$, it follows that $q \in A$.
Now, let us set

$$f = \alpha q + \beta q^2,$$

where $\alpha$ and $\beta$ are taken so that

$$a = \alpha q(x) + \beta q^2(x),$$
$$b = \alpha q(y) + \beta q^2(y).$$

This is possible since

$$\begin{vmatrix} q(x) & q^2(x) \\ q(y) & q^2(y) \end{vmatrix} = q(x)q(y)[q(y) - q(x)] \ne 0.$$
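The last step of (c) is a two-by-two linear system, and it may help to see it solved explicitly. The following sketch applies Cramer's rule; the function name `interpolating_coefficients` and the sample numbers are assumptions made for illustration only. The inputs `qx` and `qy` stand for the values $q(x)$ and $q(y)$, which must be nonzero and distinct.

```python
def interpolating_coefficients(qx, qy, a, b):
    """Solve  alpha*qx + beta*qx**2 = a,  alpha*qy + beta*qy**2 = b.

    The determinant of the system is qx*qy*(qy - qx), which is nonzero
    exactly when qx != 0, qy != 0 and qx != qy.
    """
    det = qx * qy * (qy - qx)
    if det == 0:
        raise ValueError("need q(x) != 0, q(y) != 0 and q(x) != q(y)")
    alpha = (a * qy * qy - b * qx * qx) / det
    beta = (b * qx - a * qy) / det
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical values q(x) = 2, q(y) = 3 and targets a = 5, b = -1.
    alpha, beta = interpolating_coefficients(2.0, 3.0, 5.0, -1.0)
    print(alpha * 2.0 + beta * 4.0, alpha * 3.0 + beta * 9.0)
```

The point of using $q$ and $q^2$ rather than $q$ and a constant is that $A$ need not contain the constant functions, whereas both $q$ and $q^2$ belong to any algebra containing $q$.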


(d) Under the hypotheses of the theorem and the additional hypotheses of
(c) it follows that $A = C(K)$.

Let $f \in C(K)$; under the additional hypotheses of (c) it follows that
$\forall x, y \in K$, $\exists p_{xy} \in A$ so that

$$p_{xy}(x) = f(x) \quad\text{and}\quad p_{xy}(y) = f(y).$$

Fix $y$ and $\epsilon > 0$ and consider the set

$$U_x = \{z: f(z) - \epsilon < p_{xy}(z)\}.$$

Since $p_{xy}$ and $f$ are continuous, $U_x$ is a relatively open set in $K$ which
contains $x$. The collection $\{U_x: x \in K\}$ is a relatively open covering
of $K$ and therefore the Heine-Borel theorem tells us that there is a
finite set $\{U_{x_k}: k \in (1, m)\}$ which covers $K$. Let

$$q_y = p_{x_1 y} \vee \cdots \vee p_{x_m y};$$

then by (b) we know that $q_y \in A$. Moreover, $\forall z \in K$,

$$f(z) - \epsilon < q_y(z).$$

Next, let us set

$$V_y = \{z: q_y(z) < f(z) + \epsilon\}.$$

Since $q_y$ and $f$ are continuous, it follows that $V_y$ is a relatively open set
in $K$ which contains $y$, because $q_y(y) = f(y)$. It follows as before, by use of the
Heine-Borel theorem, that there is a finite set $\{V_{y_k}: k \in (1, r)\}$ which covers $K$.
If we set $h = q_{y_1} \wedge \cdots \wedge q_{y_r}$ then by (b), $h \in A$, and moreover $\forall z \in K$

$$f(z) - \epsilon < h(z) < f(z) + \epsilon,$$

or, what is the same thing,

$$|f(z) - h(z)| < \epsilon.$$

This means that $f$ can be approximated by elements of $A$, and, since
$A$ is closed, $f \in A$.
(e) If $\exists a \in K$ so that $\forall f \in A$, $f(a) = 0$, then $A = \{g: g \in C(K)
\ \&\ g(a) = 0\}$.

Let $A_1$ be the closed algebra generated by $A$ and the function $e$,
which has the value $e(x) = 1$ for every $x \in K$. Then $A_1$ is a closed,
separating algebra which satisfies the additional hypotheses of (c).
Thus $A_1 = C(K)$ which means that $\forall f \in C(K)$ so that $f(a) = 0$ and
$\forall\epsilon > 0$, $\exists g \in A$ and $c \in R$ so that

$$\|f - (g + c)\| < \epsilon/2.$$

In particular, this means that if we evaluate the function $f - (g + c)$
at $a$ we get $|c| < \epsilon/2$. Thus

$$\|f - g\| < \epsilon,$$

and this shows that $f \in A$. This completes the proof of the theorem.
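Step (a) of the proof rests on approximating $|y|$ by a polynomial in $y^2$ obtained from the Taylor expansion of $(t + \epsilon^2)^{1/2}$ around $t = 1/2$. The sketch below is a numerical illustration of that step, under assumptions made here: the partial sum is computed from the binomial series, and the number of terms (200) and the value $\epsilon = 0.1$ are arbitrary choices. The function names are hypothetical.

```python
import math


def sqrt_taylor(t, eps, terms=200):
    """Partial sum of the Taylor series of (t + eps**2) ** 0.5 about t = 1/2.

    Writing t + eps**2 = c * (1 + r) with c = 1/2 + eps**2 and
    r = (t - 1/2) / c, the series is sqrt(c) * sum_k C(1/2, k) * r**k,
    and |r| < 1 for every t in [0, 1].
    """
    c = 0.5 + eps ** 2
    r = (t - 0.5) / c
    coeff, power, total = 1.0, 1.0, 0.0
    for k in range(terms):
        total += coeff * power
        coeff *= (0.5 - k) / (k + 1)  # next binomial coefficient C(1/2, k+1)
        power *= r
    return math.sqrt(c) * total


def max_error(eps, samples=1001, terms=200):
    """Largest sampled value of |p(y**2) - |y|| for y in [-1, 1]."""
    worst = 0.0
    for i in range(samples):
        y = -1.0 + 2.0 * i / (samples - 1)
        worst = max(worst, abs(sqrt_taylor(y * y, eps, terms) - abs(y)))
    return worst


if __name__ == "__main__":
    eps = 0.1
    print(max_error(eps), "should be below", 2 * eps)
```

For each fixed number of terms, `sqrt_taylor(t, eps)` is a polynomial in `t`, so `sqrt_taylor(y * y, eps)` plays the role of $p(y^2)$ in the proof. The largest error occurs near $y = 0$, where $(y^2 + \epsilon^2)^{1/2}$ equals $\epsilon$ while $|y|$ is 0.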

Exercises

1. Show that $C_m(K)$ is finite-dimensional $\iff$ $K$ has a finite number
of points.

2. Show that every totally bounded set in $C_m(K)$ is bounded.

3. Give a proof, using the axiom of choice, of the existence of the
sequence $(g_k)$ needed in the proof of the second statement of Theorem
6.7.5. The method will be similar to that used in the proof of Theorem
6.7.4.

4. Let $A$ be a bounded equicontinuous set of functions in $C(K)$.
Show that

$$f^*(x) = \sup\{f(x): f \in A\}$$

defines a uniformly continuous function on $K$. Show that the conclusion
may not necessarily hold if $A$ is not equicontinuous.

5. Suppose $K \subset E^n$ is compact and $A \subset C(K)$. Suppose $A$ has the
property that $\forall\epsilon > 0$ and $\forall x \in K$, $\exists\delta(x, \epsilon)$ so that $\forall f \in A$ and
$\forall y \in K$ for which $|x - y| < \delta(x, \epsilon)$ we have $|f(x) - f(y)| < \epsilon$. Show
that $A$ is an equicontinuous set.

6. If $K$ is not compact, show that there may be a compact set
$A \subset C(K)$ which is not equicontinuous.

7. Let $K = [1, \infty[$, and $\forall x \in K$ and $\forall n \in N$ set

$$f_n(x) = x^{-1/n}$$

and $A = \{f_n: n \in N\}$. Show that $A \subset C(K)$ is closed, bounded, and
equicontinuous, but is not compact. This shows that if $K$ is not a bounded
set, the last statement of Theorem 6.7.7 may not hold.

8. If $K$ is a bounded set in $E^n$ and $A \subset C(K)$ is closed, bounded,
and equicontinuous, show that the last statement of Theorem 6.7.7
is valid.

9. Let $T$ be the unit circle in $E^2$; that is, $T = \{x: x \in E^2 \ \&\ |x| = 1\}$.
The function $G$ with domain $[0, 2\pi[$ and defined by

$$G(\theta) = (\cos\theta, \sin\theta)$$

is a topological map with range $T$. Let $A \subset C(T)$ be the collection of
functions in $C(T)$ so that

$$f \in A \iff f(G(\theta)) = \sum_{k=0}^{n} a_k \cos k\theta + \sum_{k=1}^{n} b_k \sin k\theta.$$

Show that the closure of $A$ is $C(T)$. This result is a slightly less refined
version of a result called Fejér's theorem.

10. If $I = [0, 1]$ and $J = I \times I$, show that any continuous function
in $C(J)$ can be uniformly approximated by functions of the form

$$F(x, y) = \sum_{k=1}^{n} f_k(x) g_k(y),$$

where $f_k, g_k \in C(I)$.

11. Let $A$ be that subset of $C(R)$ consisting of all functions of the
form

$$f(x) = p(x)e^{-|x|},$$

where $p$ is a polynomial. Show that the closure of $A$ is the set $\{f: f \in
C(R) \ \&\ \lim_{x \to \infty} f(x) = 0\}$. (Hint: Find a topological map that maps
$R$ onto the unit circle minus one point.)

CHAPTER 7

HIGHER-DIMENSIONAL DIFFERENTIATION

7.1 MOTIVATION

In Chapter 4 we considered the concept of the derivative for real-valued
functions having domains in $R$. At that time we did not give any
motivational reasons for considering such a concept since we felt that such
reasons are usually adequately and clearly discussed in elementary
calculus. Now, in carrying over a theory from one dimension to higher
dimensions, one tries to do this in a way in which the essential features
of the one-dimensional theory are preserved. Very often this can be
done by formal analogy, and this will lead to a perfectly satisfactory
theory. At the other extreme, a higher-dimensional situation may
present such a rich diversity of paths that it is only after studying many
examples from geometry and physics, and after a considerable amount
of experimentation and inspiration, that the essential features of a useful
theory are isolated. It often happens that only after the higher-dimensional
theory has been developed is it realized that it is a very rich
generalization of a very simple one-dimensional situation.
The adoption of a definition of differentiation in higher dimensions
presents a problem somewhere between the two extremes mentioned
in the previous paragraph. If $f$ is a function with domain in $R$ and
range in $E^n$, $n \ge 1$, then clearly the derivative of $f$ can be defined in
the same way as for real-valued functions. However, if the domain of $f$
is in $E^m$, $m > 1$, then a motivational discussion of a geometric nature
may help to clarify the nature of the problem.


Let us first look at a real-valued, differentiable function $f$ defined
on an interval $I$. The graph of this function is by definition the set
$\{(x, f(x)): x \in I\}$, considered as a subset of $E^2$. Of course, the graph
of a function is the same as the function $f$. The name 'graph' is used
only to emphasize that the collection of ordered pairs which is $f$ is
being considered as a subset of a space having a distance function. The
graph of a real-valued function with domain an interval $I$ is often called
the trace of a path, and the path itself is defined as the function $F$ with
values

$$F(x) = (x, f(x)).$$

The function $F$ has domain $I$ and range in $E^2$. By definition, the tangent
vector to $F$ at $c \in I^0$ (interior of $I$) is

$$F'(c) = (1, f'(c)).$$

The tangent line to $F$ at $c$ is, by definition, the subset of $E^2$ given by

$$\mathcal{T}_{F(c)} = \{tF'(c) + F(c): t \in R\}.$$

Note that the set

$$\mathcal{L}_{F(c)} = \{tF'(c): t \in R\}$$

is a one-dimensional subspace of $E^2$ and

$$\mathcal{T}_{F(c)} = \mathcal{L}_{F(c)} + F(c).$$

Thus we see that $\mathcal{T}_{F(c)}$ is the "translation" of a linear space $\mathcal{L}_{F(c)}$ by a
constant vector $F(c)$. Such a point set is usually called an affine space.
The pictorial representation is shown in Fig. 7.1.1.

FIGURE 7.1.1

Let us now suppose that $g$ is a real-valued function with domain in
$E^2$. The graph of $g$ is the collection of ordered pairs $\{(x, g(x)): x \in
\mathcal{D}(g)\}$, considered as a subset of $E^3$. Such a graph is often called the
trace of a surface or trace of a surface element, the surface or surface element
itself being considered as the function with values

$$G(x) = (x, g(x)).$$

The function $G$ has domain $\mathcal{D}(g) \subset E^2$. In analogy with the concept
of the tangent line in $E^2$, we want to define the concept of a tangent
plane to $G$ at a point $c$ interior to $\mathcal{D}(g)$. Probably, the most natural
thing to do is to consider the intersection of the graph of $g$ with all
planes through $c$ which are parallel to the $x_3$ axis. This will give a
collection of paths in the graph of $g$, and if all these paths have tangent
lines, the collection of tangent lines, if they form an affine space, could
possibly be called the tangent plane to $G$ at $c$ (Fig. 7.1.2).

FIGURE 7.1.2

Let us make a little more precise the things we have said in the
previous paragraph. A two-dimensional linear space which contains
the vector $(u, 0) = (u_1, u_2, 0)$ and the set of all vectors $(0, 0, x_3)$ is the set

$$P_u = \{t(u, 0) + (0, 0, x_3): t, x_3 \in R\}.$$

A plane through $c \in \mathcal{D}(g)$ and parallel to $P_u$ is the affine space

$$P_u(c) = P_u + c.$$

If we intersect this space with the graph of $g$ we get the set of points

$$\{(c + tu, g(c + tu)): c + tu \in \mathcal{D}(g)\}.$$

We can consider this set of points as the graph of the path $G_u$ defined by

$$G_u(t) = (c + tu, g(c + tu)),$$

where $t$ ranges over an interval $I$ for which $0 \in I^0$ and $c + tu \in \mathcal{D}(g)$.
If we assume that $G_u'(0)$ exists, then it follows that

$$D_u g(c) = \lim_{h \to 0} \frac{g(c + hu) - g(c)}{h} \tag{7.1.1}$$

exists. This is usually called the directional derivative of $g$ at $c$ in the
direction $u$. We shall set

$$D_u G(c) = G_u'(0) = (u, D_u g(c)) = \lim_{h \to 0} \frac{G(c + hu) - G(c)}{h}. \tag{7.1.1'}$$

This is called the directional derivative of $G$ at $c$ in the direction $u$.


Let us suppose that $\forall u \in E^2$, $D_u G(c)$ exists. If the set of vectors

$$\mathcal{L}_{G(c)} = \{D_u G(c): u \in E^2\} \tag{7.1.2}$$

is a linear subspace of $E^3$, then there is some temptation to say that the
surface $G$ has a tangent plane at $c$. However, from the point of view
of analysis, there are compelling reasons why people prefer to apply
the term 'tangent plane' in a situation which is slightly more restrictive
than the one we have given here. We shall say more about this a little
later on. For now, let us show that if $\mathcal{L}_{G(c)}$ is linear, then the function
with values $D_u G(c)$ is a linear transformation with domain $E^2$ and
range in $E^3$. Indeed, $\forall u, v \in E^2$ and $\forall\alpha, \beta \in R$, $\exists w \in E^2$ so that

$$\alpha D_u G(c) + \beta D_v G(c) = D_w G(c).$$

If we write this out in component form we get

$$(\alpha u + \beta v,\ \alpha D_u g(c) + \beta D_v g(c)) = (w, D_w g(c)).$$

From this it follows that $w = \alpha u + \beta v$ and hence

$$D_{\alpha u + \beta v} G(c) = \alpha D_u G(c) + \beta D_v G(c). \tag{7.1.3}$$

The equality (7.1.3) means that the function $L_{G(c)}$ with domain $E^2$ and
defined by

$$L_{G(c)}(u) = D_u G(c) \tag{7.1.4}$$

is a linear transformation with range in $E^3$. Indeed, the range of $L_{G(c)}$
is the linear space $\mathcal{L}_{G(c)}$ defined by (7.1.2).

In the case where the domain of $g$ is in $R$, the fact that $g'(c)$ exists
implies that $g$ is continuous. We would like this situation to persist in
higher dimensions. Let us see what is involved. We shall suppose as
before that $\forall u \in E^2$, $D_u G(c)$ exists. Then for every $u \in E^2$ for which
$|u| = 1$, and $\forall\epsilon > 0$, $\exists\delta > 0$, depending on $u$ and $\epsilon$, so that $\forall t \in R$ for
which $|t| < \delta$ we have

$$|G(c + tu) - G(c) - tD_u G(c)| \le \epsilon |t|. \tag{7.1.5}$$

This simply follows from the definition of $D_u G(c)$. Now, the last
inequality is enough to prove the continuity of $G$ at $c$ if the domain of $G$
is in $R$. Unfortunately, it is not quite enough in higher dimensions.
However, if we can say that $\forall\epsilon > 0$, $\exists\delta > 0$ so that for all $u \in E^2$ with
$|u| = 1$, the inequality (7.1.5) is satisfied, and if in addition $\mathcal{L}_{G(c)}$ is a
linear manifold, then this will be enough to show that $G$ is continuous
at $c$. Indeed, we can then say that $\forall\epsilon > 0$, $\exists\delta > 0$ so that $\forall v \in E^2$ with
$|v| < \delta$ we have

$$|G(c + v) - G(c) - L_{G(c)}(v)| \le \epsilon |v|. \tag{7.1.6}$$

Since $L_{G(c)}$ is a linear transformation, $\exists M$ so that $\forall v \in E^2$ we have
$|L_{G(c)}(v)| \le M|v|$. If we take $\epsilon = 1$ in (7.1.6), then $\exists\delta > 0$ so that
$\forall v \in E^2$ with $|v| < \delta$ we have

$$|G(c + v) - G(c)| \le |v| + |L_{G(c)}(v)| \le (M + 1)|v|.$$

This shows the continuity of $G$ at $c$.

In case (7.1.6) is satisfied, where $L_{G(c)}$ is a linear transformation, we
say that $G$ has a tangent plane at $c$. We take this tangent plane to be the
affine space

$$\mathcal{T}_{G(c)} = \mathcal{L}_{G(c)} + G(c),$$

which is the range of the affine transformation

$$T_{G(c)} = L_{G(c)} + G(c).$$
There are any number of examples which show that $D_u G(c)$ exists
$\forall u \in E^2$, but that $G$ is not continuous at $c$. From our previous remarks
it follows that such a $G$ does not have a tangent plane at $c$. The example
we shall give is a very standard one. We feel sure that the reader will
be able to construct many more without difficulty.
Let us write the points in $E^2$ as $(x, y)$ and set

$$g(x, y) = \begin{cases} xy^2/[x^2 + y^4], & \text{if } x \ne 0, \\ 0, & \text{if } x = 0. \end{cases}$$

The result of Exercise 3(a) of Section 6.4 is that the function $g$ is not
continuous at $(0, 0)$. However, let us show that $D_{(u,v)} g(0, 0)$ exists
$\forall (u, v) \in E^2$. Since $g(0, 0) = 0$ we have, for $u \ne 0$,

$$\frac{g(hu, hv) - g(0, 0)}{h} = \frac{uv^2}{u^2 + h^2 v^4}.$$

Hence, if $u \ne 0$, we get $D_{(u,v)} g(0, 0) = v^2/u$ and, if $u = 0$, $D_{(u,v)} g(0, 0) = 0$.

7.2 DIRECTIONAL DERIVATIVES AND DIFFERENTIALS

The purpose of this section is to write down in a formal manner the
things that were discussed informally in the last section. We shall not,
however, adopt the geometric terminology. For the sake of simplicity
of presentation we shall not make our definitions as general as in
Chapter 4 but will define derivatives only at interior points (see Section
6.3) of the domains of functions.

7.2.1 Definition. Suppose $f$ is a function with domain in $E^n$ and range
in $E^m$. The function $f$ is said to have a directional derivative at $a$ in the direction
$u$ $\iff$ $a$ is an interior point of $\mathcal{D}(f)$, $u \in E^n$, and for $h \in R$ the limit

$$D_u f(a) = \lim_{h \to 0} \frac{f(a + hu) - f(a)}{h}$$

exists. If $u = e_k$, and $D_k f(a) = D_{e_k} f(a)$ exists, then it is called the partial
derivative of $f$ at $a$ with respect to the variable $x_k$.
The directional derivative of a function $f$ in the direction $u$ is that function
$D_u f$ with domain $\mathcal{D}(D_u f) = \{x: D_u f(x) \text{ exists}\}$ (possibly void) and whose
value at $a \in \mathcal{D}(D_u f)$ is $D_u f(a)$.
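The difference quotient in Definition 7.2.1 is easy to evaluate numerically for a concrete function. The sketch below is only an illustration: the helper `difference_quotient`, the sample map `sample_map` with values $(x_1 x_2,\ x_1 + x_2)$, and the step $h = 10^{-6}$ are all assumptions made here. A small $h$ gives an approximation to the limit, not the limit itself.

```python
def difference_quotient(f, a, u, h):
    """Return [f(a + h*u) - f(a)] / h componentwise, for tuple-valued f."""
    shifted = tuple(ai + h * ui for ai, ui in zip(a, u))
    return tuple((p - q) / h for p, q in zip(f(shifted), f(a)))


def sample_map(x):
    """An assumed example from E^2 into E^2: (x1 * x2, x1 + x2)."""
    return (x[0] * x[1], x[0] + x[1])


if __name__ == "__main__":
    a, u = (1.0, 2.0), (3.0, 4.0)
    for h in (1e-2, 1e-4, 1e-6):
        # The quotients approach (10, 7) as h tends to 0.
        print(h, difference_quotient(sample_map, a, u, h))
```

For this map the first component of the quotient is exactly $10 + 12h$ and the second is exactly 7, so the directional derivative at $a = (1, 2)$ in the direction $u = (3, 4)$ is $(10, 7)$. Note that $u$ is not a unit vector, in keeping with the definition.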

Many people speak about the directional derivative of a function $f$
in the direction $u$ only if $|u| = 1$. However, this is just a matter of
taste and it doesn't seem worthwhile to make such a distinction. Also,
a more classical and possibly more popular notation for a partial
derivative at a point is given by

$$\frac{\partial f(a)}{\partial x_k} = D_k f(a).$$

This notation does have considerable appeal in some situations since
it does lead to an easy way to remember some complicated formulas. We
have in mind, for example, the formula for the chain rule which we
shall develop in Section 7.3. Other notations, such as $f_{x_k}$, are also in use.
The notation that we have adopted in the formal definition 7.2.1 has
become more popular with the advent of the modern theory of partial
differential equations since $D_k$ can be considered to be a function whose
domain is a space of functions and whose range is also a space of
functions. We shall, in most instances, prefer the notations we have
introduced in the formal definition for the same reasons as discussed after
Definition 4.1.1. Let $\mathcal{F}$ be the collection of all functions each of which
has its domain in some Euclidean space and range in some Euclidean
space. The set $\mathcal{F}$ shall include that function which has the null set as
domain. Then for every vector $u$ and $\forall f \in \mathcal{F}$, $D_u f$ is well defined;
more likely than not it will be the function whose domain is the null
set. However, regardless of this, the composite function $D_v \circ D_u$ can
be defined and its domain is also $\mathcal{F}$. This way of looking at things
will be convenient when we discuss higher-order derivatives and we
shall consider the matter again in Section 7.4.

7.2.2 Definition. If $f$ is a function with domain in $E^n$ and range in
$E^m$, then $f$ is said to be differentiable at $a \in \mathcal{D}(f)$ $\iff$ $a \in \mathcal{D}(f)^0$ and
there exists a linear transformation $L$, with domain $E^n$ and range in $E^m$, so
that $\forall\epsilon > 0$, $\exists\delta > 0$ so that $\forall x \in \mathcal{D}(f)$ with $|x - a| < \delta$, we have

$$|f(x) - f(a) - L(x - a)| \le \epsilon |x - a|. \tag{7.2.1}$$

If $f$ is differentiable at $a$, then the linear transformation $L$ is called the
differential of $f$ at $a$ and will be denoted by '$df(a)$.'
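Inequality (7.2.1) says that the remainder $f(x) - f(a) - L(x - a)$ is small compared with $|x - a|$. The sketch below computes that ratio for one assumed example: the map with values $(x_1 x_2,\ x_1 + x_2)$ at $a = (1, 2)$, together with the candidate linear transformation $L(u) = (2u_1 + u_2,\ u_1 + u_2)$. All names in the code are hypothetical, and a numerical check at a few points suggests, but does not prove, differentiability.

```python
import math


def remainder_ratio(f, L, a, u):
    """Return |f(a + u) - f(a) - L(u)| / |u| for a nonzero increment u."""
    x = tuple(ai + ui for ai, ui in zip(a, u))
    diff = [p - q - r for p, q, r in zip(f(x), f(a), L(u))]
    norm_diff = math.sqrt(sum(d * d for d in diff))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    return norm_diff / norm_u


def example_map(x):
    """Assumed example map from E^2 into E^2."""
    return (x[0] * x[1], x[0] + x[1])


def example_differential(u):
    """Candidate differential of example_map at a = (1, 2)."""
    return (2.0 * u[0] + u[1], u[0] + u[1])


if __name__ == "__main__":
    a = (1.0, 2.0)
    for size in (1e-1, 1e-2, 1e-3):
        u = (size, size)
        print(size, remainder_ratio(example_map, example_differential, a, u))
```

Here the remainder is exactly $(u_1 u_2, 0)$, so the ratio is at most $|u|/2$ and tends to 0 with $|u|$, which is what (7.2.1) requires.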

The differential of $f$ can be defined as that function $df$ with domain
$\mathcal{D}(df) = \{x: df(x) \text{ exists}\}$, and whose value $\forall x \in \mathcal{D}(df)$ is $df(x)$.
Some authors prefer to use the word 'derivative' for $df$ and reserve
the word 'differential' for the function defined on $\mathcal{D}(df) \times E^n$ with
values $df(x)(u)$. Since from a strictly logical point of view $df(a)$ is not
the derivative of $f$ at $a$ when $n = 1$, we shall not use this terminology.
However, it is really just a matter of taste.


There is one small point that must be justified in Definition 7.2.2.
We have called the linear transformation $L$ the differential of $f$ at $a$.
The implication is that $L$ is unique, and this can be settled in short
order. Indeed, suppose that $L_1$ is a linear transformation that satisfies
the conditions of Definition 7.2.2. Then since $a \in \mathcal{D}(f)^0$, $\forall\epsilon > 0$,
$\exists\delta > 0$ so that $\forall u \in E^n$ with $|u| < \delta$ we have $a + u \in \mathcal{D}(f)$ and

$$|L(u) - L_1(u)| \le |f(a + u) - f(a) - L(u)| + |f(a + u) - f(a) - L_1(u)| \le 2\epsilon |u| < 2\epsilon\delta.$$

Now, $\forall v \in E^n$, if $v \ne 0$, then $u = \delta v/2|v|$ has the property that $|u| < \delta$.
Thus using the fact that $\forall v \in E^n$ and $\forall\alpha \in R$, $L(\alpha v) = \alpha L(v)$ and
$L_1(\alpha v) = \alpha L_1(v)$ we get

$$(\delta/2|v|)\, |L(v) - L_1(v)| = |L(u) - L_1(u)| < 2\epsilon\delta.$$

Hence

$$|L(v) - L_1(v)| < 4\epsilon |v|.$$

Since for fixed $v \ne 0$ this is true $\forall\epsilon > 0$, we must have $L(v) = L_1(v)$.
Further, since $L(0) = L_1(0) = 0$, we have shown that $L = L_1$.

7.2.3 Proposition. If a function has a differential at a point $a$, then the
function must be continuous at $a$.

The simple proof of this fact was given in Section 7.1 and we shall
not repeat it here. However, we should call attention to the fact that
the proof of Proposition 7.2.3 very definitely requires the use of the
linearity of the differential. On the other hand, the proof given above
of the unicity of the function $L$ which satisfies the conditions of
Definition 7.2.2 requires only the homogeneity of $L$; that is, $L(\alpha u) = \alpha L(u)$.

7.2.4 Proposition. If $f$ is a function with domain in $E^n$ and range in
$E^m$ and has a differential at $a$, then $\forall u \in E^n$, $D_u f(a)$ exists and

$$D_u f(a) = df(a)(u).$$

Proof. Suppose $u \in E^n$ and $|u| = 1$. Now, $a \in \mathcal{D}(f)^0$ and
$\forall\epsilon > 0$, $\exists\delta > 0$ so that $\forall h \in R$ with $0 < |h| < \delta$ we have $a + hu
\in \mathcal{D}(f)$ and

$$|f(a + hu) - f(a) - df(a)(hu)| \le \epsilon |h|.$$

If we divide by $|h|$ and note that $df(a)$ is linear, we see that the
proposition is proved in this case.
If $v \in E^n$ and $v \ne 0$, then upon setting $u = v/|v|$ we see that $|u| = 1$.
Thus

$$D_{v/|v|} f(a) = \frac{1}{|v|}\, df(a)(v).$$

But $\forall\alpha \in R$, $\alpha \ne 0$,

$$\frac{f(a + h\alpha u) - f(a)}{h} = \alpha\, \frac{f(a + h\alpha u) - f(a)}{h\alpha},$$

so that if $D_u f(a)$ exists, then $D_{\alpha u} f(a)$ exists and is equal to $\alpha D_u f(a)$.
Thus $|v| D_{v/|v|} f(a) = D_v f(a)$. Finally, since $D_0 f(a) = df(a)(0) = 0$, we
have completed the proof of the proposition.


REMARK: For future reference, let us call attention to the fact that
during the proof of the last theorem we have shown that if $D_u f(a)$
exists then $\forall\alpha \in R$, $D_{\alpha u} f(a)$ exists and

$$D_{\alpha u} f(a) = \alpha D_u f(a).$$
As we saw at the end of Section 7.1, the converse of the last
proposition is not true. That is to say, if $\forall u \in E^n$, $D_u f(a)$ exists, it is not
necessarily true that $f$ has a differential at $a$. Indeed, the example we
gave showed the existence of a function all of whose directional
derivatives exist at the origin but the function itself is not continuous at the
origin. However, if in addition to the existence of the directional
derivatives of a function at a point, the directional derivatives are
continuous, then the function has a differential at the point. This is
shown by the next theorem, which actually proves somewhat more.
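The example from the end of Section 7.1, $g(x, y) = xy^2/[x^2 + y^4]$ for $x \ne 0$ and $g(x, y) = 0$ for $x = 0$, can be explored numerically. The sketch below is an illustration only, with hypothetical helper names and arbitrarily chosen step sizes: it evaluates the difference quotient at the origin and the values of $g$ along the parabola $x = y^2$, on which $g$ is identically $1/2$ away from the origin.

```python
def g(x, y):
    """The function x*y**2 / (x**2 + y**4) for x != 0, and 0 for x == 0."""
    if x == 0:
        return 0.0
    return x * y * y / (x * x + y ** 4)


def quotient_at_origin(u, v, h):
    """Difference quotient [g(h*u, h*v) - g(0, 0)] / h."""
    return (g(h * u, h * v) - g(0.0, 0.0)) / h


if __name__ == "__main__":
    # Directional derivative in the direction (2, 3): the limit is v**2/u = 4.5.
    for h in (1e-2, 1e-4, 1e-6):
        print(h, quotient_at_origin(2.0, 3.0, h))
    # Along x = y**2 the values stay at 1/2 although (x, y) tends to (0, 0).
    for t in (1e-1, 1e-2, 1e-3):
        print(t, g(t * t, t))
```

The first loop shows the quotients settling at $v^2/u$, while the second shows that $g$ does not tend to $g(0, 0) = 0$ along the parabola, so $g$ is not continuous at the origin and therefore has no differential there.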

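7.2.5 Theorem. Suppose $f$ is a function with domain in $E^n$ and range
in $E^m$, $\{\varepsilon_k: k \in (1, n)\}$ is a basis for $E^n$, $\exists i \in (1, n)$ so that $D_{\varepsilon_i} f(a)$
exists, and there is a ball $B(a, \rho) \subset \mathcal{D}(f)$ so that $\forall j \in (1, n) \setminus \{i\}$, $B(a, \rho)
\subset \mathcal{D}(D_{\varepsilon_j} f)$ and $D_{\varepsilon_j} f$ is continuous at $a$. Then $df(a)$ exists.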
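Proof. We shall break the proof into several steps.

(a) Suppose $u$ and $v$ are linearly independent vectors in $E^n$ so that $D_v f(a)$
exists, $B(a, \rho) \subset \mathcal{D}(D_u f)$ and $D_u f$ is continuous at $a$. Then $\forall\epsilon > 0$,
$\exists\delta > 0$ so that $\forall\alpha, \beta \in R$ with $|\alpha u + \beta v| < \delta$ we have

$$|f(a + \alpha u + \beta v) - f(a) - \alpha D_u f(a) - \beta D_v f(a)| \le \epsilon |\alpha u + \beta v|. \tag{7.2.2}$$

From the definition of the directional derivative, $\forall\epsilon > 0$, $\exists\delta_1 > 0$,
so that if $|\beta v| < \delta_1$, then

$$|f(a + \beta v) - f(a) - \beta D_v f(a)| \le \epsilon |\beta v|.$$

Next, let us set

$$F(\alpha, \beta) = f(a + \alpha u + \beta v),$$

where we suppose that $a + \alpha u + \beta v \in \mathcal{D}(f)$; this is certainly true if
$|\alpha u| + |\beta v| < \rho$. We have

$$D_1 F(\alpha, \beta) = \lim_{h \to 0} \frac{1}{h}\,[F(\alpha + h, \beta) - F(\alpha, \beta)] = \lim_{h \to 0} \frac{1}{h}\,[f(a + (\alpha + h)u + \beta v) - f(a + \alpha u + \beta v)] = D_u f(a + \alpha u + \beta v).$$

Since $\mathcal{R}(F) \subset E^m$ we may write

$$F(\alpha, \beta) = \sum_{k=1}^{m} F_k(\alpha, \beta) e_k,$$

where $\mathcal{R}(F_k) \subset R$. Using the one-dimensional mean value theorem
we get

$$f_k(a + \alpha u + \beta v) - f_k(a + \beta v) = F_k(\alpha, \beta) - F_k(0, \beta) = \alpha D_u f_k(a + \theta_k u + \beta v),$$

where $0 \le |\theta_k| \le |\alpha|$. By hypothesis, $D_u f$ is continuous at $a$ and thus
$\exists\delta_2$ with $0 < \delta_2 < \rho$, so that if $|\alpha u| + |\beta v| < \delta_2$, then

$$|f(a + \alpha u + \beta v) - f(a + \beta v) - \alpha D_u f(a)| \le \epsilon |\alpha u|.$$

If we now write

$$f(a + \alpha u + \beta v) - f(a) = f(a + \alpha u + \beta v) - f(a + \beta v) + f(a + \beta v) - f(a),$$

and take $\delta_3 = \min(\delta_1, \delta_2)$, then $\forall\alpha, \beta \in R$ with $|\alpha u| + |\beta v| < \delta_3$
we have

$$|f(a + \alpha u + \beta v) - f(a) - \alpha D_u f(a) - \beta D_v f(a)| \le \epsilon\,[|\alpha u| + |\beta v|]. \tag{7.2.2'}$$

Since $u$ and $v$ are linearly independent, it follows from Theorem 6.2.4
that $\exists\eta$ so that $0 < \eta < 1$ and $2|\alpha u \cdot \beta v| \le 2(1 - \eta^2)|\alpha u||\beta v| \le
(1 - \eta^2)[|\alpha u|^2 + |\beta v|^2]$. Thus

$$|\alpha u + \beta v|^2 = |\alpha u|^2 + 2\alpha u \cdot \beta v + |\beta v|^2 \ge \eta^2[|\alpha u|^2 + |\beta v|^2] \ge \eta^2[|\alpha u| + |\beta v|]^2/2.$$

If we take $\delta = \eta\delta_3/\sqrt{2}$, then $|\alpha u + \beta v| < \delta \Rightarrow |\alpha u| + |\beta v| < \delta_3$. Thus
from (7.2.2') and the above inequality we get (7.2.2), up to the
unimportant factor $\sqrt{2}/\eta$.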

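(b) Under the hypotheses of the theorem, $\forall u \in E^n$, $D_u f(a)$ exists, and if
$u = \sum_{k=1}^{n} u^k \varepsilon_k$, then

$$D_u f(a) = \sum_{k=1}^{n} u^k D_{\varepsilon_k} f(a). \tag{7.2.3}$$

Further, $\forall\epsilon > 0$, $\exists\delta > 0$ so that $|u| < \delta \Rightarrow$

$$|f(a + u) - f(a) - D_u f(a)| \le \epsilon |u|. \tag{7.2.4}$$

Note that (7.2.2) shows that $D_{\alpha u + \beta v} f(a)$ exists and moreover

$$D_{\alpha u + \beta v} f(a) = \alpha D_u f(a) + \beta D_v f(a).$$

Let us now prove the first statement in (b) by induction. For every
$k \in (0, n - 1)$ let $P(k)$ be the following statement: For every $u$ which
is a linear combination of $k$ vectors in the set $A = \{\varepsilon_j: j \in (1, n) \setminus \{i\}\}$,
and $\forall\alpha \in R$, $D_{u + \alpha\varepsilon_i} f(a)$ exists and

$$D_{u + \alpha\varepsilon_i} f(a) = \sum_{j \ne i} u^j D_{\varepsilon_j} f(a) + \alpha D_{\varepsilon_i} f(a), \tag{7.2.3'}$$

where $u = \sum_{j \ne i} u^j \varepsilon_j$. For every $k \in N$ and $k \ge n$, let $P(k)$ be the
statement: $1 = 1$. Clearly, $\forall k \in N_0$ and $k \ge n$, $P(k) \Rightarrow P(k + 1)$.
Now, $P(0)$ is certainly true, since a zero-dimensional subspace consists
only of $\{0\}$, and thus (7.2.3') holds. If $n = 1$, there is nothing more to
prove. Hence, suppose $n > 1$, $k < n - 1$ and $P(k)$ is true. Suppose $u$ is
a nonzero linear combination of $(k + 1)$ vectors in $A$. Then $\exists l \in
(1, n) \setminus \{i\}$, so that $u_1 = u - u^l \varepsilon_l$ is a linear combination of $k$ vectors in
$A$. From the hypothesis $P(k)$, $\forall\alpha \in R$, $D_{u_1 + \alpha\varepsilon_i} f(a)$ exists and

$$D_{u_1 + \alpha\varepsilon_i} f(a) = \sum_{j \ne i,\, l} u^j D_{\varepsilon_j} f(a) + \alpha D_{\varepsilon_i} f(a).$$

Now, by part (a) of the proof,

$$D_{u + \alpha\varepsilon_i} f(a) = D_{u_1 + \alpha\varepsilon_i} f(a) + u^l D_{\varepsilon_l} f(a), \tag{7.2.3''}$$

since by hypothesis $D_{\varepsilon_l} f$ is defined on $B(a, \rho)$ and is continuous at $a$.
Thus (7.2.3') holds. Hence $P(k) \Rightarrow P(k + 1)$ and the induction is
complete. This shows that $\forall u \in E^n$, $D_u f(a)$ exists and (7.2.3) holds.
The second statement of part (b) is an immediate consequence of the
fact that $u^l \varepsilon_l$ and $u_1 + \alpha\varepsilon_i$ are linearly independent, the fact that $D_{\varepsilon_l} f$
is defined on $B(a, \rho)$ and continuous at $a$, formula (7.2.3''), and the
inequality (7.2.2).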
(c) The function $f$ has a differential at $a$. From part (b) we know that $\forall u \in E^n$, $D_u f(a)$ exists. If we set $L(u) = D_u f(a)$, then (7.2.3) shows that $L$ is a linear transformation. The fact that (7.2.1) is satisfied for $L$ is simply the inequality (7.2.4). This completes the proof.

REMARK: The last theorem is usually stated in terms of the basis $\{e_j : j \in (1, n)\}$ since in practical situations the partial derivatives of a function are usually the easiest to compute.

The fact that we stated the theorem in the form that allowed one directional derivative merely to exist and not necessarily be continuous at $a$ was not done for reasons of sophistry. This was done to include the case $n = 1$, where a function has a differential at a point if it has a derivative at the point, regardless of whether or not the function has a derivative in the neighborhood of the point. Indeed, if $f$ has domain in $R$ and is differentiable at $a$, then $\forall u \in R$,

\[ df(a)(u) = u f'(a). \]

Let us now note a particularly useful matrix form of the differential of a function at a point. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$ and $df(a)$ exists. Let us find the matrix representation of this differential with respect to the ordered pair of ordered bases $((e_1, \dots, e_n), (e_1, \dots, e_m))$. If $m \ne n$, we are using the same symbol $e_k$ to stand for different vectors in different dimensional spaces. However, we think that no confusion will result. We may write

\[ df(a)(e_k) = D_k f(a) = \sum_{j=1}^{m} D_k f^j(a)\, e_j. \]

Thus the matrix representation of $df(a)$ with respect to this ordered pair of ordered bases is

\[ \begin{bmatrix} D_1 f^1(a) & \cdots & D_n f^1(a) \\ \vdots & & \vdots \\ D_1 f^m(a) & \cdots & D_n f^m(a) \end{bmatrix}. \tag{7.2.5} \]

This matrix is called the Jacobian matrix of $f$ at $a$, and, if $n = m$, the determinant of this matrix is called simply the Jacobian of $f$ at $a$ and is denoted by $J_f(a)$. Of course, the reader should be aware of the fact that the Jacobian matrix of a function can exist at a point without the function having a differential at the point in question.
If $f$ has a differential at $a$, then it is easy to compute $df(a)(u)$ in terms of the partial derivatives of $f$ and the components of $u$ with respect to the ordered basis $(e_1, \dots, e_n)$. We already noted such a form in formula (7.2.3). In general let us write

\[ u = \sum_{k=1}^{n} u^k e_k \]

and apply the linear transformation $df(a)$ to both sides. Noting that $df(a)(e_k) = D_k f(a)$, we get

\[ df(a)(u) = \sum_{k=1}^{n} u^k D_k f(a). \tag{7.2.6} \]
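Formula (7.2.6) can be checked numerically. The following is a minimal sketch, not part of the original text: the sample function `f`, the point `a`, the direction `u`, and the helper names are illustrative assumptions, and derivatives are only approximated by central differences.

```python
import math

def f(x):
    # Illustrative smooth real-valued function on E^3 (an assumption, not from the text).
    return x[0] ** 2 + 3.0 * x[0] * x[1] - x[1] ** 3 + math.sin(x[2])

def directional_derivative(func, a, u, h=1e-6):
    # Central-difference estimate of D_u f(a), the derivative of t -> f(a + t u) at t = 0.
    plus = [a_i + h * u_i for a_i, u_i in zip(a, u)]
    minus = [a_i - h * u_i for a_i, u_i in zip(a, u)]
    return (func(plus) - func(minus)) / (2.0 * h)

def partials(func, a, h=1e-6):
    # D_k f(a) is the directional derivative along the k-th basis vector e_k.
    n = len(a)
    basis = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    return [directional_derivative(func, a, e_k, h) for e_k in basis]

a = [1.0, 2.0, 0.5]
u = [1.0, -1.0, 2.0]

lhs = directional_derivative(f, a, u)                        # D_u f(a) = df(a)(u)
rhs = sum(u_k * d_k for u_k, d_k in zip(u, partials(f, a)))  # sum of u^k D_k f(a)
print(lhs, rhs)
```

The two printed numbers should agree up to the finite-difference error, which is what (7.2.6) predicts for a function that has a differential at $a$.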

The last formula can be put into a form that we feel sure the reader has
seen in elementary calculus, even though the meaning may not have
been clear at that time.
For every $k \in (1, n)$, let $x^k$ be that function with domain $E^n$ and range $E^1$ defined by

\[ x^k(u) = u^k, \qquad u = (u^1, \dots, u^n) \in E^n. \tag{7.2.7} \]

Note that we have also used the same symbol '$x^k$' as a variable. We think it will always be clear from the context in which way we are using this symbol, and when we use it as a function it will be clear on which space it is acting. The function $x^k$ is clearly a projection and an almost trivial calculation shows that $\forall x, u \in E^n$,

\[ D_u x^k(x) = u^k. \]

Since $\forall u \in E^n$, this is a continuous function of $x$, $dx^k(x)$ exists and

\[ dx^k(x)(u) = u^k. \tag{7.2.8} \]

Note that $dx^k(a)$ is independent of $a$ so that we will usually write simply '$dx^k$' instead of '$dx^k(a)$.' Also note that $dx^k(a)(u) = x^k(u)$.

Let us suppose, for the moment, that $f$ is a real-valued function. Then putting (7.2.8) into (7.2.6) we get

\[ df(a) = \sum_{k=1}^{n} D_k f(a)\, dx^k(a). \tag{7.2.9} \]

Since $D_k f(a)$ is a real number, $D_k f(a)\, dx^k(a)$ is a linear transformation and (7.2.9) simply means that the linear transformations on both sides of the equality are equal. However, even if $f$ is not real-valued the right side of (7.2.9) can be interpreted in a way that will make the equality valid: $\forall u \in E^n$ we interpret $D_k f(a)\, dx^k(a)(u)$ as $u^k D_k f(a)$, and thus both sides of (7.2.9) are the same when evaluated at every $u \in E^n$. The reader may have seen (7.2.9) written in the form

\[ df = \sum_{k=1}^{n} \frac{\partial f}{\partial x^k}\, dx^k. \]

From the above discussion he should be able to supply the interpretation of this formula.

Exercises

1. Consider the real-valued functions with domain $E^2$ defined as follows:

(a) $f(x^1, x^2) = (x^1 x^2)^{1/3}$.
(b) $f(x^1, x^2) = |x^1 x^2|^{1/2}$.

Show that $D_1 f(0)$ and $D_2 f(0)$ exist but that $D_u f(0)$ does not exist $\forall u = (u^1, u^2)$ so that $u^1 u^2 \ne 0$.

2. If $f$ has domain $E^3$ and is defined by

\[ f(x^1, x^2, x^3) = (x^1)^2 + 3x^1 x^2 - (x^2)^3 + x^3, \]

calculate $D_u f(a)$ for $u = (1, 0, 3)$, and $u = (1, -1, 2)$.
3. In each of the following exercises, $f$ has domain $E^2$. Compute $df(a)$.

(a) $f(x^1, x^2) = (x^1)^4 + (x^2)^4 + (x^2)^4 + 3x^1 x^2$.
(b) $f(x) = |x|^3$.
(c) $f(x^1, x^2) = x^1 e^{x^2}$.

4. Let $f$ be a real-valued function with domain $E^2$ defined as follows:

\[ f(x) = \begin{cases} |x|^2 \sin \dfrac{1}{|x|^2} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \]

Show that the partial derivatives of $f$ exist at every point of $E^2$ but are not continuous at the origin. Nevertheless, $df(0)$ exists.

5. Suppose $f$ and $g$ are real-valued functions and each has a differential at $a \in E^n$. Show that $fg$ has a differential at $a$ and

\[ d(fg)(a) = f(a)\, dg(a) + g(a)\, df(a). \]

If $g(a) \ne 0$, show that

\[ d\frac{f}{g}(a) = \frac{g(a)\, df(a) - f(a)\, dg(a)}{g(a)^2}. \]
6. Suppose $f$ is a real-valued function, all of whose partial derivatives exist at $a \in E^n$. Define

\[ \nabla f(a) = \sum_{k=1}^{n} D_k f(a)\, e_k, \]

and call this the gradient of $f$ at $a$. If $df(a)$ exists show that $\forall u \in E^n$,

\[ df(a)(u) = \nabla f(a) \cdot u. \]

7. Suppose $f$ is a real-valued function with domain in $E^n$ and that $df(a)$ exists. Define the real-valued function $g$ on the unit sphere in $E^n$, $\{u : u \in E^n \ \& \ |u| = 1\}$, by

\[ g(u) = df(a)(u) = D_u f(a). \]

If $df(a)$ is not the zero transformation, show that $g$ has a maximum when $u = \nabla f(a)/|\nabla f(a)|$ and $g$ does not take on its maximum at any other vector. In other words the maximum of the directional derivatives of $f$ at $a$ is taken on in the direction of the gradient of $f$ at $a$.

8. Suppose that $f$ is a linear transformation with domain $E^n$ and range in $E^m$. Show that $\forall a \in E^n$, $df(a) = f$.

9. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$. Write $f$ componentwise as

\[ f = (f^1, \dots, f^m), \]

where $\forall k \in (1, m)$, $f^k$ is a real-valued function with $\mathcal{D}(f^k) = \mathcal{D}(f)$. Show that $df(a)$ exists $\Leftrightarrow$ $\forall k \in (1, m)$, $df^k(a)$ exists and moreover $\forall u \in E^n$,

\[ df(a)(u) = (df^1(a)(u), \dots, df^m(a)(u)). \]

10. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$ and $\forall t \in R^+$ so that $tx \in \mathcal{D}(f)$ we have

\[ f(tx) = t^p f(x). \]

Such a function is called homogeneous of degree $p$. If $df(a)$ exists, show that

\[ df(a)(a) = p f(a). \]

This is known as Euler's relation.

11. Let $A$ be a symmetric linear transformation from $E^n$ to $E^n$, and set $f(x) = A(x) \cdot x$.

(a) Show that $\forall x \in E^n$, $df(x)$ exists and moreover $\forall x, y \in E^n$,

\[ df(x)(y) = df(y)(x) = 2A(x) \cdot y. \]

(b) If $[b_{ij}]$ is any $n \times n$ matrix, then $\forall x \in E^n$ set

\[ g(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij}\, x^i x^j. \]

Use the results of part (a) to show that $dg(x)$ exists $\forall x \in E^n$.

12. Suppose that $f$ and $g$ have a common domain in $E^n$ and both have range in $E^m$. If $df(a)$ and $dg(a)$ exist and if $h(x) = f(x) \cdot g(x)$, show that $dh(a)$ exists and $\forall u \in E^n$,

\[ dh(a)(u) = f(a) \cdot dg(a)(u) + g(a) \cdot df(a)(u). \]

13. Suppose that $f$ is a function with domain in $E^n$ and range in $E^m$ so that $\forall x \in \mathcal{D}(f)$, $|f(x)| = 1$. If $df(a)$ exists, show that $\forall u \in E^n$,

\[ f(a) \cdot df(a)(u) = 0. \]

If we think of $f$ as parameterizing the unit sphere in $E^m$, then from a geometric point of view this says that the tangent plane to the unit sphere at a point is perpendicular to the line from the origin to the point.

7.3 DIFFERENTIATION RULES

In this section we shall obtain the higher-dimensional differentiation rules, analogous to those we derived in Section 4.2. The one slight difference here is that the rules will be stated in the language of differentials.

7.3.1 Theorem. (a) If $f$ and $g$ each have their domain in $E^n$ and range in $E^m$, and if $df(a)$ and $dg(a)$ exist, then $d(f + g)(a)$ exists and

\[ d(f + g)(a) = df(a) + dg(a). \]

(b) If $f$ is a real-valued function with domain in $E^n$ and $g$ has domain in $E^n$ and range in $E^m$, and if $df(a)$ and $dg(a)$ exist, then $d(fg)(a)$ exists and

\[ d(fg)(a) = f(a)\, dg(a) + g(a)\, df(a). \]

[The function $fg$ has domain $\mathcal{D}(f) \cap \mathcal{D}(g)$ and $\forall x \in \mathcal{D}(fg)$ we have $fg(x) = f(x)g(x)$. The symbol $g(a)\, df(a)$ is interpreted to mean that $\forall u \in E^n$, $g(a)\, df(a)(u) = df(a)(u)\, g(a)$ and this has meaning since $df(a)(u) \in R$.]

(c) If $g$ is real-valued with domain in $E^n$, $dg(a)$ exists, and $g(a) \ne 0$, then $d(1/g)(a)$ exists and

\[ d\frac{1}{g}(a) = -\frac{dg(a)}{g(a)^2}. \]

Proof. (a) Since $a$ is an interior point of $\mathcal{D}(f)$ and $\mathcal{D}(g)$, it is an interior point of $\mathcal{D}(f + g)$. The function $df(a) + dg(a)$ is certainly a linear transformation with domain $E^n$ and range in $E^m$. Now, $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\forall x \in \mathcal{D}(f + g)$ with $|x - a| < \delta$ we have

\[ |f(x) - f(a) - df(a)(x - a)| \le \epsilon\,|x - a|/2, \]
\[ |g(x) - g(a) - dg(a)(x - a)| \le \epsilon\,|x - a|/2. \]

Thus the triangle inequality gives

\[ |(f + g)(x) - (f + g)(a) - [df(a) + dg(a)](x - a)| \le \epsilon\,|x - a|. \]

This constitutes the proof of (a).


(b) The point $a$ is an interior point of $\mathcal{D}(fg)$. Under the given interpretation of $g(a)\, df(a)$, it follows that $f(a)\, dg(a) + g(a)\, df(a)$ is a linear transformation with domain $E^n$ and range in $E^m$. Since $df(a)$ exists, $f$ is continuous at $a$ and thus $\exists M > 0$ and $\exists \eta > 0$ so that $x \in \mathcal{D}(f)$ and $|x - a| < \eta \Rightarrow |f(x)| \le M$. We may suppose that $M$ is large enough so that $|g(a)| \le M$ and $\forall u \in E^n$, $|dg(a)(u)| \le M|u|$. Further, $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\delta \le \eta$, and $x \in \mathcal{D}(fg)$ and $|x - a| < \delta \Rightarrow$

\[ |f(x) - f(a)| < \epsilon/3M, \]
\[ |f(x) - f(a) - df(a)(x - a)| \le (\epsilon/3M)\,|x - a|, \]
\[ |g(x) - g(a) - dg(a)(x - a)| \le (\epsilon/3M)\,|x - a|. \]

Thus we have

\[ \begin{aligned} |fg(x) - fg(a) - [f(a)\, dg(a) + g(a)\, df(a)](x - a)| &\le |f(x)|\,|g(x) - g(a) - dg(a)(x - a)| \\ &\quad + |g(a)|\,|f(x) - f(a) - df(a)(x - a)| \\ &\quad + |f(x) - f(a)|\,|dg(a)(x - a)| \\ &\le \epsilon\,|x - a|. \end{aligned} \]

This completes the proof of (b).


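(c) Since $dg(a)$ exists, it follows that $g$ is continuous at $a$. Thus, since $g(a) \ne 0$, $\exists M > 0$ and $\exists \eta > 0$ so that $x \in \mathcal{D}(g)$ and $|x - a| < \eta \Rightarrow g(x) \ne 0$ and $|1/g(x)g(a)| \le M$. We may suppose $M$ is large enough so that $\forall u \in E^n$, $|dg(a)(u)| \le M|u|$. Further, $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\delta \le \eta$, and $x \in \mathcal{D}(g)$ and $|x - a| < \delta \Rightarrow$

\[ |g(x) - g(a) - dg(a)(x - a)| \le \epsilon\,|x - a|/2M, \]
\[ \left| \frac{g(x)}{g(a)} - 1 \right| < \epsilon/2M^2. \]

Thus

\[ \begin{aligned} \left| \frac{1}{g}(x) - \frac{1}{g}(a) + \frac{dg(a)(x - a)}{g(a)^2} \right| &\le \left| \frac{1}{g(x)g(a)} \right| |g(x) - g(a) - dg(a)(x - a)| \\ &\quad + \left| \frac{1}{g(x)g(a)} \right| \left| \frac{g(x)}{g(a)} - 1 \right| |dg(a)(x - a)| \\ &\le \epsilon\,|x - a|. \end{aligned} \]

This completes the proof of (c).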


In the higher-dimensional case the easiest and neatest way to state
the theorem about the differentiation of composite functions is in terms
of differentials. The proof is the same as for the one-dimensional case,
but we shall repeat it.

7.3.2 Theorem (Chain Rule). Suppose $g$ is a function with domain in $E^r$ and range in $E^q$, $f$ is a function with domain in $E^q$ and range in $E^p$, and $\mathcal{R}(g) \subset \mathcal{D}(f)$. If $dg(a)$ and $df(g(a))$ exist, then $df \circ g(a)$ exists and

\[ df \circ g(a) = df(g(a)) \circ dg(a). \tag{7.3.1} \]

Proof. For simplicity of notation let us set $b = g(a)$. First, note that since $df(b)$ is a linear transformation $\exists N > 0$ so that $\forall u \in E^q$, $|df(b)(u)| \le N|u|$. Hence

\[ |df(b)(g(x) - g(a)) - df(b) \circ dg(a)(x - a)| \le N\,|g(x) - g(a) - dg(a)(x - a)|. \tag{7.3.2} \]

Next, $\forall \epsilon > 0$, $\exists \delta' > 0$ so that $\forall x \in \mathcal{D}(g)$ with $|x - a| < \delta'$ we have

\[ |g(x) - g(a) - dg(a)(x - a)| \le \epsilon\,|x - a|/2N. \tag{7.3.3} \]

If we apply the triangle inequality to (7.3.3) and use the fact that $\exists M > 0$ so that $\forall u \in E^r$, $|dg(a)(u)| \le M|u|$, we get that $\forall x \in \mathcal{D}(g)$ with $|x - a| < \delta'$,

\[ |g(x) - g(a)| \le \epsilon\,|x - a|/2N + |dg(a)(x - a)| \le L\,|x - a|, \tag{7.3.4} \]

where $L = M + \epsilon/2N$. Finally, $\forall \epsilon > 0$, $\exists \delta'' > 0$ so that $\forall x \in \mathcal{D}(f \circ g)$ with $|g(x) - g(a)| < \delta''$ we have

\[ |f \circ g(x) - f(b) - df(b)(g(x) - b)| \le \epsilon\,|g(x) - b|/2L. \tag{7.3.5} \]

Let us set $\delta = \min(\delta', \delta''/L)$; then $\forall x \in \mathcal{D}(f \circ g)$ with $|x - a| < \delta$ we have from (7.3.2) through (7.3.5),

\[ \begin{aligned} |f \circ g(x) - f \circ g(a) - df(g(a)) \circ dg(a)(x - a)| &\le |f \circ g(x) - f(b) - df(b)(g(x) - b)| \\ &\quad + |df(b)(g(x) - b) - df(b) \circ dg(a)(x - a)| \\ &\le \epsilon\,|g(x) - g(a)|/2L + N\,|g(x) - g(a) - dg(a)(x - a)| \\ &\le \epsilon\,|x - a|. \end{aligned} \]

This shows that $df \circ g(a)$ exists and moreover has the value given by (7.3.1).

WARNING: The linear transformation $df(g(a))$ is the differential of $f$ at the point $b = g(a)$ and should not be confused with $df \circ g(a)$, which is the differential of $f \circ g$ at the point $a$.

The formula for the differential of a composite function is in very abbreviated form, and for practical computational purposes it is useful to compute the entries of the Jacobian matrix of $df \circ g(a)$. From the formula (7.3.1) it follows that the Jacobian matrix of $df \circ g(a)$ is the product of the Jacobian matrix of $df(g(a))$ with the Jacobian matrix of $dg(a)$. If we perform the multiplication of these two Jacobian matrices we get

\[ D_k (f \circ g)^j(a) = \sum_{i=1}^{q} D_i f^j(g(a))\, D_k g^i(a). \tag{7.3.6} \]

This formula may be rather difficult to remember, although with some practice one can fix in one's mind the positions of the $i$, $j$, and $k$. However, if we revert to the notations

\[ \frac{\partial g^i(a)}{\partial x^k} = D_k g^i(a), \qquad \frac{\partial f^j(g(a))}{\partial g^i} = D_i f^j(g(a)), \]

then (7.3.6) becomes

\[ \frac{\partial (f \circ g)^j(a)}{\partial x^k} = \sum_{i=1}^{q} \frac{\partial f^j(g(a))}{\partial g^i}\, \frac{\partial g^i(a)}{\partial x^k}. \tag{7.3.6'} \]

This formula may be easier to remember because of the temptation to cancel $\partial g^i(a)$ with $\partial g^i$.
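The statement that the Jacobian matrix of the composite is the product of the two Jacobian matrices, formula (7.3.6), can be illustrated numerically. This is only a sketch under assumptions: the maps `g` and `f`, the point `a`, and the helpers `jacobian` and `matmul` are hypothetical choices made for the illustration, and the partial derivatives are approximated by central differences.

```python
import math

def g(x):
    # Illustrative map with domain E^2 and range in E^3 (an assumption).
    return [x[0] * math.cos(x[1]), x[0] * math.sin(x[1]), x[0] * x[1]]

def f(y):
    # Illustrative map with domain E^3 and range in E^2 (an assumption).
    return [y[0] * y[1] + y[2], math.exp(y[0]) - y[1] * y[2]]

def jacobian(func, a, h=1e-6):
    # Entry [j][k] approximates D_k func^j(a) by a central difference.
    m, n = len(func(a)), len(a)
    J = [[0.0] * n for _ in range(m)]
    for k in range(n):
        plus, minus = list(a), list(a)
        plus[k] += h
        minus[k] -= h
        f_plus, f_minus = func(plus), func(minus)
        for j in range(m):
            J[j][k] = (f_plus[j] - f_minus[j]) / (2.0 * h)
    return J

def matmul(A, B):
    # Plain matrix product: entry (j, k) is the sum over i of A[j][i] * B[i][k].
    return [[sum(A[j][i] * B[i][k] for i in range(len(B)))
             for k in range(len(B[0]))] for j in range(len(A))]

a = [1.2, 0.7]
composite = jacobian(lambda x: f(g(x)), a)            # Jacobian matrix of f o g at a
product = matmul(jacobian(f, g(a)), jacobian(g, a))   # right side of (7.3.6)
max_gap = max(abs(composite[j][k] - product[j][k])
              for j in range(2) for k in range(2))
print(max_gap)
```

The printed gap should be at the level of the finite-difference error, consistent with (7.3.6).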

An analogue to Theorem 4.2.3 is also valid in higher dimensions.


The statement is as follows.

7.3.3 Theorem. Suppose $g$ is a function with domain in $E^r$ and range in $E^q$, $f$ is a function with domain in $E^q$ and range in $E^p$, and $\mathcal{R}(g) \subset \mathcal{D}(f)$. If $df \circ g(a)$ and $df(g(a))$ exist, the latter is nonsingular, and $g$ is continuous at $a$, then $dg(a)$ exists and

\[ dg(a) = df(g(a))^{-1} \circ df \circ g(a). \]

The proof of this theorem follows mutatis mutandis the proof of Theorem 4.2.3 and we shall not reproduce it here. However, the reader may find it rewarding to trace through the proof. Note that in Theorem 7.3.2 the condition $\mathcal{R}(g) \subset \mathcal{D}(f)$ assures us that $a$ is an interior point of $\mathcal{D}(f \circ g)$. In Theorem 7.3.3 the condition $\mathcal{R}(g) \subset \mathcal{D}(f)$ is required in the proof.

7.3.4 Mean Value Theorem. Suppose $f$ is a real-valued function with $\mathcal{D}(f) \subset E^n$ and the line segment $A = \{x : x = tb + (1 - t)a \ \& \ t \in [0, 1]\}$ is contained in $\mathcal{D}(f)^0$. If $\forall x \in A$, $df(x)$ exists, then $\exists c = (1 - \gamma)a + \gamma b$, $\gamma \in \,]0, 1[$, so that

\[ f(b) - f(a) = df(c)(b - a). \]

Proof. For $t \in [0, 1]$ set

\[ F(t) = f(tb + (1 - t)a). \]

Either by direct computation or the use of the chain rule we find that

\[ F'(t) = df(tb + (1 - t)a)(b - a). \]

By the use of the one-dimensional mean value theorem, $\exists \gamma \in \,]0, 1[$ so that

\[ F(1) - F(0) = F'(\gamma). \]

But $F(1) = f(b)$ and $F(0) = f(a)$, which proves the result.

7.3.5 Corollary. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$ and the line segment $A$ is in $\mathcal{D}(f)^0$. If $\forall x \in A$, $df(x)$ exists, then there exists a linear transformation $L$ with domain $E^n$ and range in $E^m$ so that

\[ f(b) - f(a) = L(b - a). \]

Proof. Write

\[ f = \sum_{k=1}^{m} f^k e_k. \]

Now, $\forall k \in (1, m)$, $\exists c_k \in A$ so that

\[ f^k(b) - f^k(a) = df^k(c_k)(b - a). \]

For every $u \in E^n$ define

\[ L(u) = \sum_{k=1}^{m} df^k(c_k)(u)\, e_k. \]

It is clear that $L$ has all the properties described in the corollary.
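The reason the corollary produces a transformation $L$ built from several points $c_k$, rather than a single $df(c)$, can be seen on a simple curve. The sketch below is an illustration under assumptions: the curve `f` on the unit circle, the interval endpoints, and the names `df`, `c1`, `c2` are chosen for the example and do not come from the text.

```python
import math

def f(t):
    # Curve on the unit circle: domain in E^1, range in E^2 (illustrative choice).
    return (math.cos(t), math.sin(t))

def df(c, u):
    # For a function of one real variable, df(c)(u) = u f'(c).
    return (-math.sin(c) * u, math.cos(c) * u)

a, b = 0.0, 2.0 * math.pi
increment = tuple(fb - fa for fb, fa in zip(f(b), f(a)))  # f(b) - f(a), zero up to rounding

# |df(c)(b - a)| equals 2*pi at every sampled c, so no single c can give
# f(b) - f(a) = df(c)(b - a) for the vector-valued f.
norms = [math.hypot(*df(a + (b - a) * i / 100.0, b - a)) for i in range(101)]

# Componentwise the scalar mean value theorem still applies, with different points:
# -sin(c1) = 0 at c1 = pi for the first component, cos(c2) = 0 at c2 = pi/2 for the second.
c1, c2 = math.pi, math.pi / 2.0
L_of_b_minus_a = (df(c1, b - a)[0], df(c2, b - a)[1])
print(increment, min(norms), L_of_b_minus_a)
```

Here $L(b - a)$, assembled from the two component points, matches $f(b) - f(a)$ up to rounding, while every single-point value $df(c)(b - a)$ has length $2\pi$.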

Exercises

1. Suppose $f$ is a real-valued function with domain $E^3$ defined by

\[ f(x) = (x^1)^2 - x^2 x^3 - \sin x^1 x^3. \]

Let $g$ be the function with domain and range $E^3$ whose components are

\[ g^1(x) = x^1 \cos x^2, \qquad g^2(x) = x^1 \sin x^2 \cos x^3, \qquad g^3(x) = x^1 \sin x^2 \sin x^3. \]

Compute the three partial derivatives of $f \circ g$.

2. Let $g$ be a function with domain $E^n$ and range in $E^n$. If $g^k$ is the $k$th component of $g$, suppose that

\[ g^k(x) = g_k(x^k), \]

a function depending only on $x^k$. Suppose that $f$ is a function with $\mathcal{R}(g) \subset \mathcal{D}(f)$. Assuming appropriate differentiability properties for $f$ and $g$, show that

\[ J_{f \circ g}(x) = J_f(g(x)) \prod_{k=1}^{n} g_k'(x^k). \]

3. Suppose $x \in E^9$ and the components of $x$ are labeled in some fixed order as $x_{ij}$, $i, j \in (1, 3)$. Suppose we set

\[ \Delta(x) = \det[x_{ij}]. \]

Show that $\forall a \in E^9$,

\[ \frac{\partial \Delta(a)}{\partial x_{ij}} = \operatorname{Co}(a_{ij}). \]

If $\forall i, j \in (1, 3)$, $g_{ij}$ is a differentiable function with domain $[0, 1]$, show by the chain rule that

\[ D(\Delta \circ g) = \begin{vmatrix} g_{11}' & g_{12}' & g_{13}' \\ g_{21} & g_{22} & g_{23} \\ g_{31} & g_{32} & g_{33} \end{vmatrix} + \begin{vmatrix} g_{11} & g_{12} & g_{13} \\ g_{21}' & g_{22}' & g_{23}' \\ g_{31} & g_{32} & g_{33} \end{vmatrix} + \begin{vmatrix} g_{11} & g_{12} & g_{13} \\ g_{21} & g_{22} & g_{23} \\ g_{31}' & g_{32}' & g_{33}' \end{vmatrix}. \]

4. Generalize the results of Exercise 3 to $n \times n$ determinants.

5. Suppose that $f$ is a one-to-one function and $df(a)$ and $df^{-1}(f(a))$ exist. Show that $df(a)$ is nonsingular and that $df(a)^{-1} = df^{-1}(f(a))$.

6. Suppose $f$ and $g$ are functions which have a common domain $\mathcal{D} \subset E^n$ which is open and connected. Suppose further that $\forall x \in \mathcal{D}$ the differentials of $f$ and $g$ at $x$ exist and $df(x) = dg(x)$. Show that there is a vector $c$ so that $\forall x \in \mathcal{D}$, $f(x) = g(x) + c$.

7. Compute the matrix representation of the linear transformation $L$ obtained in Corollary 7.3.5 with respect to the ordered pair of ordered bases $((e_1, \dots, e_n), (e_1, \dots, e_m))$.

8. Suppose the conditions of Corollary 7.3.5 are satisfied. Show that $\forall u \in E^m$, $\exists c \in A$ so that

\[ [f(b) - f(a)] \cdot u = df(c)(b - a) \cdot u. \]

9. Under the hypotheses of Exercise 8 and the additional hypothesis that $\forall c \in A$, $\|df(c)\| \le M$, use the results of Exercise 8 to show that $\|L\| \le M$, where $L$ is any linear transformation satisfying the conclusion of Corollary 7.3.5. Hence, deduce that

\[ |f(b) - f(a)| \le M\,|b - a|. \]

[For definition of the symbol $\|\ \|$ see (6.5.3).]

7.4 HIGHER-ORDER DIFFERENTIALS AND TAYLOR'S THEOREM

In Section 7.2 we discussed the possibility of defining the function $D_u$ on the collection $\mathcal{M}$ of all functions each of which has its domain in some Euclidean space and range in some Euclidean space. ($\mathcal{M}$ includes the function whose domain is the null space.) More specifically, $\forall f \in \mathcal{M}$, $D_u f$ is that function whose domain consists of all $x \in \mathcal{D}(f)$ so that $D_u f(x)$ exists and takes on that value at $x$. Clearly, it may very well happen that $\mathcal{D}(D_u f) = \emptyset$. Nevertheless, this approach allows us to form the composition $D_v \circ D_u$ and this is a well-defined function with domain $\mathcal{M}$ and range in $\mathcal{M}$.


It is, of course, "intuitively clear" how to define any finite number
of compositions of such functions. However, the precise way to handle
the situation is to use the idea of recursive definition or definition by
induction which we have mentioned several times before. The process
is entirely analogous to the use of the sum function or the product func
tion as done in Chapter 3. We again refer the reader to the book by
Kershner and Wilcox cited at the end of Section

I. I

for an easily accessi

ble treatment of the idea of inductive definition.


Let $\mathcal{S}$ be the collection of all function sequences $F = (F_k)$ so that $\forall k \in N_0$, $\mathcal{D}(F_k) = \mathcal{M}$ and $\mathcal{R}(F_k) \subset \mathcal{M}$. It is not hard to prove, using the axiom of induction, that there is a unique function $\Gamma$ with $\mathcal{D}(\Gamma) = \mathcal{S}$, $\mathcal{R}(\Gamma) \subset \mathcal{S}$ so that $\forall F = (F_k) \in \mathcal{S}$,

\[ \Gamma(F)(0) = F_0, \qquad \Gamma(F)(n + 1) = F_{n+1} \circ \Gamma(F)(n). \]

Let us remark that according to Definition 1.3.5 the composition of two functions is always defined, even though the resulting function may be that one whose domain is the null set. This, of course, allows us the possibility of establishing the existence of $\Gamma$. If $F = (F_k) \in \mathcal{S}$ we shall write

\[ \prod_{k=0}^{n}{}^{\circ} F_k = \Gamma(F)(n). \]

The usual conventions hold when we write $\prod_{k=m}^{n}{}^{\circ} F_k$, and so on. In case the $F_k$ are partial derivatives, we shall often use a special notation. If $j = (j_1, \dots, j_n)$, where $\forall k \in (1, n)$, $j_k \in N$, then we shall set

\[ D_j = \prod_{k=1}^{n}{}^{\circ} D_{j_k} = D_{j_n} \circ D_{j_{n-1}} \circ \cdots \circ D_{j_1}. \]

7.4.1 Definition. If $f$ is a function with domain in $E^n$ and range in $E^m$, and if $j = (j_1, \dots, j_q)$, where $\forall k \in (1, q)$, $j_k \in (1, n)$, then we call $D_j f(a)$ a $q$th-order partial derivative of $f$ at $a$.

Other, more standard, notations for a $q$th-order partial derivative of a function $f$ include

\[ \frac{\partial^q f(a)}{\partial x^{j_q} \cdots \partial x^{j_1}}. \]

We shall prefer to remain with the symbols we have chosen in the previous paragraph.

Now, if $j, k \in N$ and $j \ne k$, it is not true that $D_j \circ D_k = D_k \circ D_j$. However, for a fixed $f$ it may often happen that $D_j \circ D_k f = D_k \circ D_j f$, and for a fixed $f$ and $a \in \mathcal{D}(f)$ it is more likely to happen that $D_j \circ D_k f(a) = D_k \circ D_j f(a)$. The next theorem gives sufficient conditions under which the last commutativity relation holds.

7.4.2 Theorem. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$. Suppose further that $j, k \in (1, n)$ and there is a ball $B(a, \rho)$ contained in the domains of $D_j f$, $D_k f$, and $D_j \circ D_k f$, and the latter is continuous at $a$. Then $D_k \circ D_j f(a)$ exists and moreover

\[ D_k \circ D_j f(a) = D_j \circ D_k f(a). \tag{7.4.1} \]

Proof. Without loss of generality we may suppose that $f$ is real-valued. Otherwise we could prove the theorem for each real-valued component of $f$, and the theorem itself would follow from this.

We must show that

\[ \lim_{h \to 0} \frac{1}{h}\,[D_j f(a + h e_k) - D_j f(a)] \tag{7.4.2} \]

exists and its value is $D_j \circ D_k f(a)$. Now, $\forall x \in B(a, \rho)$ we know that $D_j f(x)$ exists and, of course,

\[ D_j f(x) = \lim_{t \to 0} \frac{1}{t}\,[f(x + t e_j) - f(x)]. \]

If $x \in B(a, \rho)$ and we set

\[ \Delta_t f(x) = \frac{1}{t}\,[f(x + t e_j) - f(x)], \]

then for all sufficiently small $t$, $x + t e_j \in B(a, \rho)$, and thus $D_k \Delta_t f(x)$ exists. Further, $D_j f(x) = \lim_{t \to 0} \Delta_t f(x)$. Consequently, we can write

\[ \begin{aligned} \frac{1}{h}\,[D_j f(a + h e_k) - D_j f(a)] &= \frac{1}{h} \lim_{t \to 0}\,[\Delta_t f(a + h e_k) - \Delta_t f(a)] \\ &= \lim_{t \to 0} D_k \Delta_t f(a + \theta h e_k), \end{aligned} \tag{7.4.3} \]

where $\theta$, $0 < \theta < 1$, depends on $a$, $h$, $t$, $j$, and $k$, and of course is obtained by use of the Mean Value Theorem.

Let us now write $D_k \Delta_t f(a + \theta h e_k)$ in terms of $f$ to get

\[ \begin{aligned} D_k \Delta_t f(a + \theta h e_k) &= \frac{1}{t}\,[D_k f(a + \theta h e_k + t e_j) - D_k f(a + \theta h e_k)] \\ &= D_j \circ D_k f(a + \theta h e_k + \theta' t e_j), \end{aligned} \tag{7.4.4} \]

where $\theta'$, $0 < \theta' < 1$, depends on $a$, $\theta$, $h$, $j$, $k$, and $t$, and is also obtained by the use of the Mean Value Theorem. From the continuity of $D_j \circ D_k f$ at $a$, $\forall \epsilon > 0$, $\exists \delta > 0$ so that $|\theta h e_k + \theta' t e_j| < \delta \Rightarrow$

\[ |D_j \circ D_k f(a + \theta h e_k + \theta' t e_j) - D_j \circ D_k f(a)| < \epsilon, \]

or using (7.4.4) we have

\[ |D_k \Delta_t f(a + \theta h e_k) - D_j \circ D_k f(a)| < \epsilon. \]

If we use the fact that the absolute value is a continuous function, then by allowing $t \to 0$ in the left side of the last inequality, and using (7.4.3) we get $\forall h$ for which $|h| < \delta$,

\[ \left| \frac{1}{h}\,[D_j f(a + h e_k) - D_j f(a)] - D_j \circ D_k f(a) \right| \le \epsilon. \]

But, this simply says that $D_k \circ D_j f(a)$ exists and is equal to $D_j \circ D_k f(a)$, which completes the proof.
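The role of the continuity hypothesis in Theorem 7.4.2 can be illustrated numerically. The sketch below is not from the text: it uses nested central differences, an arbitrarily chosen smooth function `smooth`, and the well-known example $x^1 x^2 ((x^1)^2 - (x^2)^2)/((x^1)^2 + (x^2)^2)$ (called `classic` here), whose two mixed second partials at the origin are $+1$ and $-1$. All helper names and step sizes are assumptions.

```python
import math

def d1(f, x, y, t=1e-6):
    # Central-difference estimate of D_1 f at (x, y).
    return (f(x + t, y) - f(x - t, y)) / (2.0 * t)

def d2(f, x, y, t=1e-6):
    # Central-difference estimate of D_2 f at (x, y).
    return (f(x, y + t) - f(x, y - t)) / (2.0 * t)

def d2_d1(f, x, y, h=1e-3):
    # D_2 o D_1 f: differentiate along e_1 first, then along e_2.
    return (d1(f, x, y + h) - d1(f, x, y - h)) / (2.0 * h)

def d1_d2(f, x, y, h=1e-3):
    # D_1 o D_2 f: differentiate along e_2 first, then along e_1.
    return (d2(f, x + h, y) - d2(f, x - h, y)) / (2.0 * h)

def smooth(x, y):
    # Illustrative function with continuous second partials everywhere.
    return x ** 2 * y ** 3 + math.sin(x * y)

def classic(x, y):
    # Standard example whose mixed partials exist at 0 but are not continuous there.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

smooth_gap = abs(d2_d1(smooth, 0.5, -0.3) - d1_d2(smooth, 0.5, -0.3))
classic_21 = d2_d1(classic, 0.0, 0.0)
classic_12 = d1_d2(classic, 0.0, 0.0)
print(smooth_gap, classic_21, classic_12)
```

For `smooth` the two orders of differentiation agree up to discretization error, as (7.4.1) predicts. For `classic` the estimates come out near $-1$ and $+1$, which does not contradict the theorem because $D_1 \circ D_2 f$ is not continuous at the origin.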

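7.4.3 Corollary. Let $\mathcal{O}$ be an open set in $E^n$, $i = (i_1, \dots, i_q)$, where $\forall j \in (1, q)$, $i_j \in (1, n)$, $\sigma$ any permutation of $(1, q)$, $\sigma^* i = (i_{\sigma 1}, \dots, i_{\sigma q})$, and $C^q(\mathcal{O})$ the collection of all functions with domain $\mathcal{O}$ and range in some $E^m$ so that all partial derivatives of $f$ of order $q$ have $\mathcal{O}$ as their domain and are continuous. Then $\forall f \in C^q(\mathcal{O})$,

\[ D_i f = D_{\sigma^* i} f. \]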

P(q)
P(l) is true and P(2) is
true by the previous theorem. Suppose q ;;;:-: 2 and assume P ( q) is true;
we shall prove that P(q + 1) is true. Suppose cr(l)=r. If r
1 we may
Proof.

We shall prove this by induction on the order q. Let

be the statement of the corollary. Clearly,

write

Of course, the fact that

(g
follows from

P(q).

If

D;k (D;J) =

q, we

11 D;kf=(ft

(g

D;"K (D;J)

have from

P(q),

D; k (D;J)

kr

Since the domain of both sides is. we may apply


Hence, if we do this, then by the use of

P(q)

D;q+I

we get

to both sides.


DIf

(D
= (Ii
=

lq+l

k=2

q 0D- (D f)
fl
I,
'k

k=l
k,,.r

D;uk

) (D;ulf) =Du ;f.


r

Finally, a similar argument gives the case when


P ( q) ==> P ( q +

1) ,

q+

1. Hence

and the corollary is proved.

7.4.4 Definition. A function $f$ is said to belong to the class $C^k$ $\Leftrightarrow$ all of the partial derivatives of $f$ of order $k$ have domain $\mathcal{D}(f)$ and are continuous. The function $f$ is said to belong to the class $C^\infty$ $\Leftrightarrow$ all of the partial derivatives of $f$ (of all orders) have domain $\mathcal{D}(f)$ and are continuous.

If $f \in C^k$, then from Corollary 7.4.3 it follows that any partial derivative of $f$ of order $k$ is independent of the manner in which the partial derivatives are composed. Hence the operator $D^\alpha$, which we shall define in an informal way below, becomes rather useful. Suppose that $\mathcal{D}(f) \subset E^n$ and $\mathcal{R}(f) \subset E^m$ and $\alpha = (\alpha_1, \dots, \alpha_n)$, where $\alpha_k \in N_0$. We shall set

\[ |\alpha| = \sum_{k=1}^{n} \alpha_k. \tag{7.4.5} \]

If $f \in C^k$ and $|\alpha| \le k$, we shall set

\[ D^\alpha f = \Bigl( \prod_{k=1}^{n}{}^{\circ} D_k^{\alpha_k} \Bigr)(f), \tag{7.4.6} \]

where $D_k^{\alpha_k}$ is $\alpha_k$ compositions of $D_k$ and $D_k^0 f = f$. The notation $D^\alpha f$ has become popular with the advent of the modern theory of partial differential equations.

7.4.5 Definition. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$ which has a differential at the point $a$ and $\forall j \in (1, k - 1)$ and $\forall (u_1, \dots, u_j)$, where $u_i \in E^n$, the function $\prod_{i=1}^{j}{}^{\circ} D_{u_i} f$ has a differential at $a$. Then $\forall (u_1, \dots, u_k)$ we set

\[ d^k f(a)(u_1, \dots, u_k) = \prod_{i=1}^{k}{}^{\circ} D_{u_i} f(a), \tag{7.4.7} \]

and call $d^k f(a)$ the $k$th-order differential of $f$ at $a$. If $u_1 = u_2 = \cdots = u_k = u$, we shall set

\[ d^k f(a)(u)^k = d^k f(a)(u, \dots, u), \tag{7.4.8} \]

and $d^0 f(a)(u)^0 = f(a)$.


A special case of Theorem 7.2.5 says that if a function has continuous first partials at a point, then the function has a differential at the point. The same type of result holds for $k$th-order differentials, although for the sake of simplicity we shall state it in a slightly less general form than is possible.

7.4.6 Theorem. Suppose $f$ has domain in $E^n$ and range in $E^m$, and there is a ball $B(a, \rho) \subset \mathcal{D}(f)$, so that all the partials of $f$ of order $k$, $(k \ge 1)$ have $B(a, \rho)$ in their domains and are continuous there. Then $f$ has a $k$th-order differential at every $x \in B(a, \rho)$ and

\[ d^k f(x)(u_1, \dots, u_k) = \sum_{i_1=1}^{n} \cdots \sum_{i_k=1}^{n} u_1^{i_1} \cdots u_k^{i_k} \prod_{j=1}^{k}{}^{\circ} D_{i_j} f(x). \tag{7.4.9} \]

Proof. Let $P(k)$ be the statement of the theorem. The statement $P(1)$ is true by Theorem 7.2.5. Assume $P(k)$ is true and we shall try to prove $P(k + 1)$. Since the hypotheses of $P(k + 1)$ imply the hypotheses of $P(k)$, it follows from $P(k)$ that $f$ has a $k$th-order differential at every point of $B(a, \rho)$ and (7.4.9) holds.

Each function on the right side of (7.4.9) has, by the hypotheses of $P(k + 1)$, continuous first partials on $B(a, \rho)$, and thus by Theorem 7.2.5 has a differential at each point of this ball. Further, $\forall u_{k+1} \in E^n$ and $\forall x \in B(a, \rho)$,

\[ D_{u_{k+1}} \Bigl( \prod_{j=1}^{k}{}^{\circ} D_{i_j} f \Bigr)(x) = \sum_{i_{k+1}=1}^{n} u_{k+1}^{i_{k+1}}\, D_{i_{k+1}} \Bigl( \prod_{j=1}^{k}{}^{\circ} D_{i_j} f \Bigr)(x). \]

Hence, if we apply $D_{u_{k+1}}$ to both sides of (7.4.9) and use the last equality we see that the induction is complete.

REMARK: Clearly there is nothing special about partial derivatives and we could have stated the previous theorem in terms of the directional derivatives of any basis for $E^n$. Note that the right side of (7.4.9) shows that $d^k f(x)$ is multilinear.

As an example of the formula (7.4.9) let us write down $d^3 f(x)(u)^3$ in terms of the more classical terminology for partial derivatives. We have

\[ d^3 f(x)(u)^3 = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} u^{i_1} u^{i_2} u^{i_3}\, \frac{\partial^3 f(x)}{\partial x^{i_3}\, \partial x^{i_2}\, \partial x^{i_1}}, \]

where of course $u^{i_j}$ is the $i_j$ component of the vector $u$.

7.4.7 Theorem. Suppose $f$ is a real-valued function with $\mathcal{D}(f) \subset E^n$, and the line segment $L = \{x : x = tb + (1 - t)a \ \& \ t \in [0, 1]\}$ is contained in $\mathcal{D}(f)$. If $\forall k \in (1, m)$ and $\forall x \in L$, $f$ has a $k$th-order differential $d^k f(x)$, then $\forall x \in L$, $\exists c \in L$ so that $c = \gamma x + (1 - \gamma)a$, $\gamma \in \,]0, 1[$, and

\[ f(x) = \sum_{k=0}^{m-1} \frac{1}{k!}\, d^k f(a)(x - a)^k + \frac{1}{m!}\, d^m f(c)(x - a)^m. \tag{7.4.10} \]

Proof. For $\forall t \in [0, 1]$ let us set

\[ F(t) = f(tx + (1 - t)a). \]

Since $x \in L$ we may write $x = \tau b + (1 - \tau)a$ with $\tau \in [0, 1]$. Hence $tx + (1 - t)a = t\tau b + (1 - t\tau)a$ with $t\tau \in [0, 1]$. Thus $tx + (1 - t)a$ is in $L$ and hence by the chain rule, $\forall t \in [0, 1]$,

\[ F'(t) = df(tx + (1 - t)a)(x - a). \]

It follows by an easy induction argument that $\forall k \in (1, m)$ and $\forall t \in [0, 1]$ we have

\[ F^{(k)}(t) = d^k f(tx + (1 - t)a)(x - a)^k. \]

If we apply the one-dimensional Taylor formula with the Lagrange form of the remainder, Corollary 4.4.2(c), we get

\[ F(1) = \sum_{k=0}^{m-1} \frac{1}{k!}\, F^{(k)}(0) + \frac{1}{m!}\, F^{(m)}(\gamma), \qquad \gamma \in \,]0, 1[. \]

If we substitute in for $F(1)$, $F^{(k)}(0)$, and $F^{(m)}(\gamma)$ we have completed the proof.


The above theorem is valid only for real-valued functions. However, by applying it componentwise to a function whose range is in $E^m$, $m > 1$, we do get a Taylor remainder formula. However, the reader should be cautioned that the same point $c$ will not work for all components and will in general change with the different components.

There is an integral formula for the Taylor remainder that looks like the integral remainder formula for functions defined in $R$. The integral formula does not require that $f$ be real-valued. We only note that if $f(t) = \sum_{k=1}^{m} f^k(t)\, e_k$ is a continuous function with domain $[a, b]$ we define

\[ \int_a^b f(t)\, dt = \sum_{k=1}^{m} \Bigl( \int_a^b f^k(t)\, dt \Bigr) e_k. \]

7.4.7' Theorem. Suppose the hypotheses of Theorem 7.4.7 are satisfied and in addition $\forall x \in L$, $d^m f(tx + (1 - t)a)(x - a)^m$ is a continuous function of $t$. Then $\forall x \in L$,

\[ f(x) = \sum_{k=0}^{m-1} \frac{1}{k!}\, d^k f(a)(x - a)^k + \int_0^1 \frac{(1 - t)^{m-1}}{(m - 1)!}\, d^m f(tx + (1 - t)a)(x - a)^m\, dt. \tag{7.4.10'} \]

Proof. As in the proof of Theorem 7.4.7 we set

\[ F(t) = f(tx + (1 - t)a). \]

Then by the formula (5.2.1) we get

\[ F(1) = \sum_{k=0}^{m-1} \frac{1}{k!}\, F^{(k)}(0) + \int_0^1 \frac{(1 - t)^{m-1}}{(m - 1)!}\, F^{(m)}(t)\, dt. \]

If we substitute in for $F(1)$, $F^{(k)}(0)$, and $F^{(m)}(t)$ we get formula (7.4.10'), which concludes the proof.

If we assume that $f \in C^m$, then we can write formulas (7.4.10) or (7.4.10') in the terminology of the operators $D^\alpha$. In this case Corollary 7.4.3 tells us that it doesn't matter in which order the $D_{i_j}$ are applied and we can write, for $k \le m$,

\[ \frac{1}{k!}\, d^k f(x)(u)^k = \sum_{|\alpha| = k} u^\alpha c_\alpha D^\alpha f(x), \]

where $u^\alpha = \prod_{j=1}^{n} (u^j)^{\alpha_j}$, and $c_\alpha$ is a constant independent of $f$ and $x$. This can be proved by an easy induction argument on $k$. Suppose $p$ is a polynomial of $n$ variables of degree $m$; that is,

\[ p(x) = \sum_{|\alpha| \le m} p_\alpha (x - a)^\alpha, \]

where $(x - a)^\alpha = \prod_{j=1}^{n} (x^j - a^j)^{\alpha_j}$. From Theorem 7.4.7 we can write

\[ p(x) = \sum_{|\alpha| \le m} (x - a)^\alpha c_\alpha D^\alpha p(a), \]

since the $(m + 1)$st remainder vanishes. If $|\alpha| \le m$ and we apply $D^\alpha$ to both sides of this equation, we get

\[ \alpha!\, p_\alpha = D^\alpha p(a) = \alpha!\, c_\alpha D^\alpha p(a), \]

where $\alpha! = \prod_{j=1}^{n} \alpha_j!$. If we choose $p$ so that $p_\alpha \ne 0$, we see that $c_\alpha = 1/\alpha!$. Thus we can write the formula (7.4.10) in the form

\[ f(x) = \sum_{|\alpha| \le m-1} \frac{1}{\alpha!}\, D^\alpha f(a)(x - a)^\alpha + \sum_{|\alpha| = m} \frac{1}{\alpha!}\, D^\alpha f(c)(x - a)^\alpha. \]
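The multi-index form of Taylor's formula, with $c_\alpha = 1/\alpha!$, can be tried on a function whose derivatives are known in closed form. The sketch below is an illustration under assumptions: it takes $f(x) = \exp(x^1 + 2x^2)$ and $a = 0$, for which $D^\alpha f(0) = 2^{\alpha_2}$ and $d^k f(0)(u)^k = (u^1 + 2u^2)^k$. The helper names are hypothetical.

```python
import math
from itertools import product

def multi_indices(n, k):
    # All alpha = (alpha_1, ..., alpha_n) with entries in N_0 and |alpha| = k.
    return [alpha for alpha in product(range(k + 1), repeat=n) if sum(alpha) == k]

def alpha_factorial(alpha):
    # alpha! is the product of the factorials of the entries of alpha.
    return math.prod(math.factorial(a_j) for a_j in alpha)

def power(u, alpha):
    # u^alpha is the product of (u^j) ** alpha_j.
    return math.prod(u_j ** a_j for u_j, a_j in zip(u, alpha))

def D_alpha_f_at_0(alpha):
    # For f(x) = exp(x^1 + 2 x^2): D^alpha f(0) = 2 ** alpha_2 (assumed sample function).
    return 2.0 ** alpha[1]

def taylor_polynomial(x, m):
    # Sum over |alpha| <= m - 1 of D^alpha f(0) x^alpha / alpha!.
    return sum(D_alpha_f_at_0(alpha) * power(x, alpha) / alpha_factorial(alpha)
               for k in range(m) for alpha in multi_indices(2, k))

# Check (1/k!) d^k f(0)(u)^k = sum over |alpha| = k of u^alpha D^alpha f(0) / alpha!.
u, k = (0.3, -0.7), 3
lhs = (u[0] + 2.0 * u[1]) ** k / math.factorial(k)
rhs = sum(power(u, alpha) * D_alpha_f_at_0(alpha) / alpha_factorial(alpha)
          for alpha in multi_indices(2, k))

# Compare f with its Taylor polynomial of degree 3 at a nearby point.
x = (0.1, 0.05)
remainder = math.exp(x[0] + 2.0 * x[1]) - taylor_polynomial(x, 4)
print(lhs, rhs, remainder)
```

The first two numbers should coincide up to rounding, and the remainder should be no larger than $e^{0.2}(0.2)^4/4!$, in line with the Lagrange form of the last term.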
D Exercises
1.

Compute d3f(a)(x - a)3 for the following functions at the given

point a:

(a)
(b)
(c)

2.

f(x1, x2, x3)


(x2)2 + 2x1 (x2)2 + (x3)2, a= (I, 0, -1).
f(x1, x2, x3) =sin (x1 + x2 + x3), a= 0.
f(x1' x2) =exx, a= 0.
=

Write Taylor's formula about (0, 0) for

f(x1, x2) =sin (x1 + ex)


form=3.


3.

Suppose

p(x1' x2)

x1 (x2)2 + 3(x1)2 x2

x1 + 2.

Write this polynomial as a polynomial in powers of


4.

f E C2 and Dif(a) Dd(a)


JFJ(f), show 3M > 0 so that Vb E

Suppose

convex set in

IJ(b) - f(a)I

,,s; Mlb

(x1 - 1)

and

(x2

1).

0. If B is a compact

- a l2

C"' and JFJ(f) is an


JFJ(f) is a convex set containing a E JFJ(f),
and 3M so that Va andVx E B, ID"f(x)I :s; M. Show that Vx E B, the
remainder in Taylor's formula goes to zero as m -'> oo.
5.

Suppose

open set in E

6.

n
.

is a real-valued function of class

Suppose B C

State and prove an analogue of Bernstein's theorem, 4.4.4, for functions with domain in $E^n$, $n > 1$.

7.5 THE INVERSE AND IMPLICIT FUNCTION THEOREMS

Suppose $f$ is a function with domain in $E^n$, range in $E^m$ and $df(a)$ exists. Since $df(a)(x - a) + f(a)$ approximates $f(x)$ very closely in a neighborhood of $a$ we might hope that if $df(a)$ is nonsingular, then $f$ itself, restricted to a neighborhood of $a$, is a one-to-one function. It turns out that this is essentially the case and our first object in this section is to prove this, and indeed somewhat more.

7.5.1 Proposition. Suppose $f$ is of class $C^1$ with an open domain in $E^n$ and range in $E^m$. For every compact set $K \subset \mathcal{D}(f)$ and $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\forall x, y \in K$ with $|x - y| < \delta$ and $\forall u \in E^n$ we have

\[ |df(x)(u) - df(y)(u)| \le \epsilon\,|u|, \tag{7.5.1} \]
\[ |f(x) - f(y) - df(x)(x - y)| \le \epsilon\,|x - y|. \tag{7.5.2} \]

Proof. Let $S = \{u : u \in E^n \ \& \ |u| = 1\}$ be the unit sphere in $E^n$. From the expansion

\[ df(x)(u) = \sum_{j=1}^{n} u^j D_j f(x), \]

it is clear that $df(x)(u)$ is continuous on the Cartesian product $\mathcal{D}(f) \times S$. If we restrict $df(x)(u)$ to $K \times S$, the restricted function is uniformly continuous. Thus $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\forall x, y \in K$ with $|x - y| < \delta$, and $\forall u \ne 0$ we have

\[ |df(x)(u/|u|) - df(y)(u/|u|)| < \epsilon. \]

Using the homogeneity of $df(x)$ and $df(y)$, if we multiply the last inequality by $|u|$ we get (7.5.1). If $u = 0$, (7.5.1) is clearly true.

To prove (7.5.2), for every $a \in K$ we can find a ball $B(a, \delta(a)) \subset \mathcal{D}(f)$ so that $\forall x, y \in B(a, \delta(a))$ we have (7.5.1). Using the mean value theorem, $\forall u \in E^m$ there is a $c$ on the straight line joining $x$ and $y$ so that

\[ [f(x) - f(y) - df(x)(x - y)] \cdot u = [df(c)(x - y) - df(x)(x - y)] \cdot u. \]

Replace $u$ by $[f(x) - f(y) - df(x)(x - y)]$ and for the corresponding $c$ we get, using the C-B-S inequality,

\[ |f(x) - f(y) - df(x)(x - y)| \le |df(c)(x - y) - df(x)(x - y)|. \]

If we now use the estimate (7.5.1) on the right, we have (7.5.2) in $B(a, \delta(a))$.

The collection $\{B(a, \delta(a)/2) : a \in K\}$ is an open covering for $K$ and thus reduces to a finite subcovering $\{B(a_j, \delta(a_j)/2) : j \in (1, q)\}$. Let $\delta = \min\{\delta(a_j)/2 : j \in (1, q)\}$ and suppose $x, y \in K$ with $|x - y| < \delta$. Now $\exists j \in (1, q)$ so that $|x - a_j| < \delta(a_j)/2$. Thus $|y - a_j| \le |y - x| + |x - a_j| < \delta(a_j)$. Hence $x, y \in B(a_j, \delta(a_j))$, and since we have the estimate (7.5.2) in this ball we have concluded the proof.
7.5.2 Corollary. If $f$ satisfies the hypotheses of Proposition 7.5.1 and $\exists a \in \mathcal{D}(f)$ so that $df(a)$ is nonsingular, then there exists a ball $B(a, \delta) \subset \mathcal{D}(f)$ and $\exists m > 0$, so that $\forall x \in B(a, \delta)$ and $\forall u \in E^n$ we have

$$|df(x)(u)| \ge m|u|. \tag{7.5.3}$$

Moreover, if $df(x)$ is nonsingular for every $x$ in a compact set $K \subset \mathcal{D}(f)$, then $\exists m > 0$ so that (7.5.3) holds $\forall x \in K$.

Proof. Since $df(a)$ is nonsingular, it follows from Corollary 6.5.5 that $\exists m > 0$ so that $\forall u \in E^n$, $|df(a)(u)| \ge 2m|u|$. Now, from Proposition 7.5.1, $\exists \delta > 0$ so that $\forall x \in B(a, \delta)$ and $\forall u \in E^n$ we have

$$|df(a)(u)| - |df(x)(u)| \le m|u|.$$

Thus

$$|df(x)(u)| \ge |df(a)(u)| - m|u| \ge m|u|.$$

To prove the second statement, it follows from what we have just proved, that $\forall a \in K$, $\exists \delta(a) > 0$ and $\exists m(a) > 0$ so that (7.5.3) holds $\forall x \in B(a, \delta(a))$, provided $m$ is replaced by $m(a)$. The collection $\{B(a, \delta(a)) : a \in K\}$ is an open covering for $K$ and thus reduces to a finite subcovering $\{B(a_j, \delta(a_j)) : j \in (1, q)\}$. If we now take $m = \min\{m(a_j) : j \in (1, q)\}$ we have completed the proof.

The next two propositions constitute essentially the proof of the Inverse Function Theorem.

7.5.3 Proposition. Suppose $f \in C^1$ has (an open) domain in $E^n$, range in $E^m$, and $df(a)$ is nonsingular. Then there exists a ball $B(a, \delta) \subset \mathcal{D}(f)$ and $\exists m > 0$ so that $\forall x \in B(a, \delta)$, $df(x)$ is nonsingular and $\forall x, y \in B(a, \delta)$,

$$|f(x) - f(y)| \ge m|x - y|.$$

In particular this means that $f|B(a, \delta)$ is a one-to-one function.

Proof. From Corollary 7.5.2 there exists a ball $B(a, 2\delta_1) \subset \mathcal{D}(f)$ and $\exists m > 0$ so that $\forall x \in B(a, 2\delta_1)$ and $\forall u \in E^n$ we have

$$|df(x)(u)| \ge 2m|u|. \tag{7.5.3'}$$

If we take $K$ as the closure of $B(a, \delta_1)$, then it follows from Proposition 7.5.1 that $\exists \delta$, $0 < \delta < \delta_1$, so that $\forall x, y \in K$ with $|x - y| < 2\delta$ we have

$$|f(x) - f(y) - df(x)(x - y)| \le m|x - y|. \tag{7.5.4}$$

Thus $\forall x, y \in B(a, \delta)$, we get from (7.5.3') and (7.5.4),

$$|f(x) - f(y)| \ge |df(x)(x - y)| - m|x - y| \ge m|x - y|.$$

This concludes the proof.

7.5.4 Proposition. Suppose $f$ is a one-to-one function with (an open) domain in $E^n$ and range in $E^n$. If $\forall x \in \mathcal{D}(f)$, $df(x)$ exists and is nonsingular, then $\mathcal{R}(f)$ is open. In particular, $f$ is an open map.

Proof. If $\bar{B}(a, \rho) \subset \mathcal{D}(f)$, we shall show that $f(B(a, \rho))$ contains an open ball in $E^n$ with center at $f(a)$. Let $S = \bar{B}(a, \rho) \setminus B(a, \rho) = \{x : x \in E^n \ \&\ |x - a| = \rho\}$. $S$ is a compact set and since $f$ is continuous, $f(S)$ is compact. Further, since $f$ is one to one and $a \notin S$, it follows that $f(a) \notin f(S)$. Thus the distance from $f(a)$ to $f(S)$ is a positive number, $m$. Indeed, $m$ is nothing more than the minimum of that continuous function with domain $S$ and values $|f(x) - f(a)|$ (Fig. 7.5.1).

FIGURE 7.5.1

We claim that $B(f(a), m/2) \subset f(B(a, \rho))$. To see this let $y \in B(f(a), m/2)$ and let $h_y$ be that function with domain $\bar{B}(a, \rho)$ and defined by

$$h_y(x) = |f(x) - y|.$$

Since $h_y$ is a continuous real-valued function with a compact domain, it takes on a minimum. This minimum must be in the open set $B(a, \rho)$. Indeed, on the one hand,

$$\min h_y \le h_y(a) = |f(a) - y| < m/2,$$

and, on the other hand, if $x \in S$, then

$$h_y(x) = |f(x) - y| \ge |f(x) - f(a)| - |f(a) - y| > m - m/2 = m/2.$$

If $b$ is at a minimum point for $h_y$, it is also at a minimum point for $h_y^2$. But

$$h_y^2(x) = \sum_{j=1}^{n} [f^j(x) - y^j]^2,$$

and hence all the partial derivatives of $h_y^2$ exist. If we set

$$g_k(t) = h_y^2(b + te_k),$$

then $g_k$ is defined in some open interval around $t = 0$, and $g_k(0)$ is a local minimum for $g_k$. Thus $\forall k \in (1, n)$,

$$\frac{dg_k(0)}{dt} = D_k h_y^2(b) = 0,$$

and hence $\forall u \in E^n$,

$$dh_y^2(b)(u) = \sum_{j=1}^{n} u^j D_j h_y^2(b) = 0.$$

Now, $h_y^2(x) = [f(x) - y] \cdot [f(x) - y]$ and thus $\forall u \in E^n$

$$dh_y^2(b)(u) = 2\, df(b)(u) \cdot [f(b) - y] = 0.$$

Since $df(b)$ is nonsingular, its range is all of $E^n$, and thus we must have $y = f(b)$.
7.5.5 Inverse Function Theorem. Suppose $f$ is of class $C^1$ with (an open) domain in $E^n$ and range in $E^n$. If $df(a)$ is nonsingular, then there exist open sets $U$ and $V$ in $E^n$ so that $a \in U$, $f(a) \in V$, $f|U$ is a one-to-one function with range $V$, and the inverse function is also of class $C^1$.

Proof. Since $df(a)$ is nonsingular, all the hypotheses of Proposition 7.5.3 are satisfied. Thus there is a ball $B(a, \rho) \subset \mathcal{D}(f)$ so that $\forall x \in B(a, \rho)$, $df(x)$ is nonsingular, and $f_1 = f|B(a, \rho)$ is a one-to-one function. Thus $f_1$ satisfies the hypotheses of Proposition 7.5.4 and $\mathcal{R}(f_1)$ is an open set. We take $U = B(a, \rho)$ and $V = \mathcal{R}(f_1)$.

Let $g = f_1^{-1}$. Since $f_1$ is an open map, it follows from Corollary 6.4.5 that $g$ is continuous. Now, $f_1 \circ g$ is the identity function on $V$ and thus $\forall y \in V$, $d(f_1 \circ g)(y) = \iota$, where $\iota$ is the identity linear transformation on $E^n$. Further, $\forall x \in U$, $df(g(f(x))) = df(x)$ exists and is nonsingular. Thus the hypotheses of Theorem 7.3.3 are satisfied, and it follows that $\forall y \in V$, $dg(y)$ exists and

$$dg(y) = df(g(y))^{-1} \circ \iota = df(g(y))^{-1}. \tag{7.5.5}$$

The proof of the theorem will be concluded if we can show that $\forall u \in E^n$, the function with domain $V$ defined by $dg(y)(u)$ is continuous. For the sake of simplicity let us set $L(y) = df(g(y))$. Since $g$ is continuous and $f \in C^1$, it follows that $\forall u \in E^n$, the function on $V$ with values $L(y)(u)$ is continuous. Let $b \in V$ and $b = f(a)$. By Corollary 7.5.2 $\exists \eta$ and $\exists M > 0$, so that $\forall x \in B(a, \eta)$ and $\forall u \in E^n$

$$|df(x)(u)| \ge |u|/M.$$

This is the same as saying that $\forall y \in f(B(a, \eta))$, and $\forall u \in E^n$,

$$|L(y)(u)| \ge |u|/M.$$

Next, let us note that

$$L(y)[L(y)^{-1} - L(b)^{-1}](u) = [L(b) - L(y)]\, L(b)^{-1}(u).$$

Using the above inequality for $|L(y)(u)|$ we get

$$|[L(y)^{-1} - L(b)^{-1}](u)| \le M\, |[L(b) - L(y)]\, L(b)^{-1}(u)|.$$

Now, $dg(y) = L(y)^{-1}$ and $dg(b) = L(b)^{-1}$. If we use the fact that the function with values $L(y)(L(b)^{-1}(u))$ is continuous at $b$, then $\forall \varepsilon > 0$, $\exists \theta > 0$ so that $\forall y \in V$ with $|y - b| < \theta$ we get that the right side of the last inequality is less than $\varepsilon$. This completes the proof of continuity and of the theorem.
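As a numerical illustration of formula (7.5.5), the following sketch computes a local inverse by Newton's iteration and compares a finite-difference approximation of $dg(b)$ with $df(a)^{-1}$. The map `f`, the helper names, and the step size are all hypothetical choices made for this illustration; they do not come from the text, and Newton's iteration is used only as a convenient way to evaluate the local inverse near a known point.

```python
import math

def f(p):
    """Illustrative C^1 map of the plane (an assumption, not from the text).
    Its Jacobian determinant is 1 - 0.25*cos(x)*cos(y), which is never zero."""
    x, y = p
    return (x + 0.5 * math.sin(y), y + 0.5 * math.sin(x))

def df(p):
    """Jacobian matrix of f at p, stored as two rows."""
    x, y = p
    return ((1.0, 0.5 * math.cos(y)),
            (0.5 * math.cos(x), 1.0))

def inv2(m):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(m, v):
    """Apply a 2x2 matrix to a vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def local_inverse(q, start, steps=50):
    """Evaluate the local inverse g at q by solving f(p) = q with Newton's
    iteration, starting from a point `start` assumed to be close to g(q)."""
    p = start
    for _ in range(steps):
        fp = f(p)
        residual = (fp[0] - q[0], fp[1] - q[1])
        step = matvec(inv2(df(p)), residual)
        p = (p[0] - step[0], p[1] - step[1])
    return p

def numeric_dg(q, start, h=1e-5):
    """Central-difference approximation of the Jacobian matrix of g at q."""
    cols = []
    for j in range(2):
        plus, minus = list(q), list(q)
        plus[j] += h
        minus[j] -= h
        gp = local_inverse(tuple(plus), start)
        gm = local_inverse(tuple(minus), start)
        cols.append(((gp[0] - gm[0]) / (2 * h), (gp[1] - gm[1]) / (2 * h)))
    # Entry [i][j] is the derivative of the i-th component of g in direction e_j.
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))
```

With `a = (0.3, -0.2)` and `b = f(a)`, the matrix `numeric_dg(b, a)` should agree with `inv2(df(a))` up to the finite-difference error, which is what (7.5.5) predicts.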

7.5.6 Corollary. If $f$ satisfies the hypotheses of Theorem 7.5.5 with the exception that $f \in C^q$, $q \ge 1$, then the local inverse function $g$ is also of class $C^q$.

Proof. Probably the easiest way to prove this is to use formula (7.5.5), which is

$$df(g(y)) \circ dg(y) = \iota.$$

If we evaluate both sides at $e_j$, we get

$$df(g(y))(D_j g(y)) = e_j.$$

From Cramer's rule it follows that

$$D_j g^k(y) = \frac{\Delta_k(g(y))}{J_f(g(y))}, \tag{7.5.6}$$

where $\Delta_k(g(y))$ is obtained from $J_f(g(y))$ by replacing its $k$th column by $e_j$. Both $\Delta_k(g(y))$ and $J_f(g(y))$ are polynomials in the functions $(D_r f^s) \circ g$ evaluated at $y$. Now, $g \in C^1$, and if $f \in C^2$, then by the chain rule it follows that each of the functions $(D_r f^s) \circ g$ is in $C^1$. Thus from (7.5.6) it follows that $D_j g^k \in C^1$, and moreover

$$D_i D_j g^k(y) = \frac{\tilde{\Delta}_k(g(y))}{[J_f(g(y))]^2}, \tag{7.5.7}$$

where $\tilde{\Delta}_k(g(y))$ is a polynomial in the functions $(D_r f^s) \circ g$, $(D_i D_r f^s) \circ g$, and $D_i g$, evaluated at $y$. If $f \in C^3$, we see from (7.5.7) that $D_i D_j g^k \in C^1$ and we may proceed to differentiate again. Proceeding in this way we get the proof of the corollary. Of course, the precise way to do this is by induction, but we shall not bother to write this out in a formal way.
REMARKS: In most instances the easiest way to decide whether or not $df(a)$ is nonsingular is to check the rank of the Jacobian matrix. If the Jacobian matrix is square, then, of course, $df(a)$ is nonsingular $\Leftrightarrow J_f(a) \ne 0$. The reader should be warned that the invertibility theorem, 7.5.5, is only a "local" theorem. In other words, $\forall x \in \mathcal{D}(f)$ it could be possible that $df(x)$ is nonsingular and yet $f$ is not "globally" a one-to-one function. This is already seen in the following very simple example of the polar coordinate transformation. For $r \in\ ]1, 2[$ and $\theta \in\ ]0, 4\pi[$ let us set

$$f^1(r, \theta) = r\cos\theta, \qquad f^2(r, \theta) = r\sin\theta,$$

and, of course, $f(r, \theta) = (f^1(r, \theta), f^2(r, \theta))$. Then

$$J_f(r, \theta) = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$$

Hence $J_f$ never vanishes and yet $f$ is not a one-to-one function.

The conditions given in Theorem 7.5.5 are only sufficient conditions for local invertibility and are by no means necessary. Indeed, let us consider the function

$$f(x^1, x^2) = ((x^1)^3, (x^2)^3).$$

It is clear that $df(0, 0) = 0$, and yet $f$ maps an open neighborhood of the origin onto an open neighborhood of the origin in a one-to-one fashion. Of course, we cannot expect in a case like this that the inverse function will be differentiable at $(0, 0)$, since it would contradict the fact that $df(0, 0)$ is singular.

On the other hand, there are any number of examples which show that if $df(a)$ is singular we may not get local invertibility at $a$. A specific example is given by the function

$$f(x^1, x^2) = ((x^1)^2, (x^2)^2).$$

We feel that the reader can construct many more.
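The polar coordinate example can be checked numerically. The short sketch below (function names are illustrative, not from the text) evaluates the Jacobian determinant and exhibits two distinct points of the domain $]1, 2[\ \times\ ]0, 4\pi[$ with the same image.

```python
import math

def polar(r, theta):
    """The polar coordinate transformation f(r, theta)."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det(r, theta):
    """Determinant of [[cos t, -r sin t], [sin t, r cos t]]; it simplifies to r."""
    return math.cos(theta) * (r * math.cos(theta)) + (r * math.sin(theta)) * math.sin(theta)

# Both (1.5, 1.0) and (1.5, 1.0 + 2*pi) lie in ]1, 2[ x ]0, 4*pi[.
p = polar(1.5, 1.0)
q = polar(1.5, 1.0 + 2 * math.pi)
```

Here `jacobian_det(1.5, 1.0)` equals `1.5` up to rounding, so the differential is nonsingular, while `p` and `q` coincide up to rounding, so the map is not globally one-to-one.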
We now turn to the important situation of a function whose domain is in $E^{m+n}$ and range in $E^n$. The prototype example we have in mind is that of a linear function where the $j$th component is given by

$$f^j(x, y) = b_{j1}x^1 + \cdots + b_{jm}x^m + a_{j1}y^1 + \cdots + a_{jn}y^n.$$

If $\det[a_{jk}] \ne 0$, then by Cramer's rule we can "solve" the equation $f(x, y) = 0$ for $y$ in terms of $x$, so that $f(x, y(x)) = 0$. Note that $\det[a_{jk}]$ is the Jacobian of $f$ taken with respect to the $y$ variable.
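For the linear prototype with $n = 2$, the solution by Cramer's rule can be written out directly. This is a minimal sketch; the function name and the sample matrices are made up for illustration, with `B` holding the coefficients $b_{ji}$ and `A` the coefficients $a_{jk}$.

```python
def solve_linear_for_y(B, A, x):
    """Solve B x + A y = 0 for y by Cramer's rule, where A is 2x2.

    B is a list of two rows of length m, A is a list of two rows of length 2,
    and x is a list of length m.
    """
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("det[a_jk] = 0: cannot solve for y")
    # Right-hand side of A y = -B x.
    rhs = [-sum(B[j][i] * x[i] for i in range(len(x))) for j in range(2)]
    # Cramer's rule: replace one column of A by rhs and divide by det A.
    y1 = (rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det
    y2 = (A[0][0] * rhs[1] - rhs[0] * A[1][0]) / det
    return [y1, y2]

B = [[1, 2, 3], [0, 1, -1]]
A = [[2, 1], [1, 3]]
y = solve_linear_for_y(B, A, [1, 1, 1])
```

Substituting `y` back into $Bx + Ay$ gives the zero vector up to rounding, which is the statement $f(x, y(x)) = 0$ for this linear $f$.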

Suppose now that $f$ is any function of class $C^1$ with domain in $E^{m+n}$ and range in $E^n$. Let us identify $E^{m+n}$ with $E^m \times E^n$ and designate the points in the latter space by $(x, y)$. Suppose that $f(a, b) = 0$ and that $d_y f(a, b)$ is nonsingular, where by $d_y f(a, b)$ we mean the differential of the function with values $f(a, y)$. Keeping in mind the example of the linear function we discussed in the last paragraph and the fact that $d_y f(a, b)(y - b) + f(a, b)$ approximates $f(a, y)$ very closely in a neighborhood of $(a, b)$, one might hope that there is a neighborhood about $a \in E^m$ so that we can "solve" the equation $f(x, y) = 0$ for $y$ in terms of $x$ in this neighborhood; that is, $f(x, y(x)) = 0$. This is actually the case and the precise statement is given by the next theorem.

However, before we state and prove the Implicit Function Theorem, it will be illuminating to look at several very simple examples. These should help to clarify the nature of the problem.
Suppose that $x$ and $y$ are variables on $R$ and

$$f(x, y) = x^2 + y^2 - 1.$$

Now, for $|x| < 1$ we can "solve" the equation $f(x, y) = 0$ for $y$ in terms of $x$. Indeed, we get two $C^\infty$ functions

$$g_1(x) = \sqrt{1 - x^2}, \qquad g_2(x) = -\sqrt{1 - x^2},$$

so that $f(x, g_j(x)) = 0$ for $j \in (1, 2)$ and $|x| < 1$. This simple example already illustrates several points. First, we can only "solve" the equation $f(x, y) = 0$ "locally" and in general cannot find $y$ as a function of $x$ for all $x$ for which $f$ is defined. Second, there may be more than one "solution" to the equation. However, note that if $f(a, b) = 0$ with $b > 0$, then there is only one of these functions, namely, $g_1$, whose graph contains $(a, b)$. In the same way if $b < 0$, then $g_2$ is the only one of these functions whose graph contains $(a, b)$. Indeed, we can say more. If $f(x, y) = 0$ and $y \in\ ]0, 1]$ we must have $y = g_1(x)$, and if $y \in [-1, 0[$ we must have $y = g_2(x)$. That is, there is a unique function with range in $]0, 1]$ and domain $]-1, 1[$ and a unique function with the same domain and range in $[-1, 0[$ so that $f(x, g(x)) = 0$. From this we can conclude that $\forall (a, b)$ with $b \ne 0$ and $f(a, b) = 0$, there is an open neighborhood $V \subset E^2$ about $(a, b)$ and a function $g$ so that if $(x, y) \in V$, and $f(x, y) = 0$, then $y = g(x)$.

For fixed $a$, the Jacobian at $b$ of the function with values $f(a, y)$ is $D_2 f(a, b) = 2b$. Thus $d_y f(a, b)$ is nonsingular $\Leftrightarrow b \ne 0$. Now, $f(1, 0) = 0$, and we can still find solutions whose graph contains $(1, 0)$, but several interesting things happen. First, the solutions cannot be defined in an open neighborhood of $x = 1$, and are not differentiable at this point. Second, in every open set in $E^2$ which contains $(1, 0)$ and for all $x$ which are sufficiently close to 1 and with $x < 1$, there are always two numbers, $y$ and $-y$, so that $f(x, y) = f(x, -y) = 0$. The same state of affairs exists at $x = -1$. Thus we lose the unicity discussed in the last paragraph.
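The circle example can be explored with a few lines of code. The sketch below (names chosen for illustration) encodes $f$, the two branches, and the partial derivative $D_2 f$, and shows how the slope of the upper branch blows up as $x$ approaches 1, where $D_2 f$ vanishes.

```python
import math

def f(x, y):
    """f(x, y) = x^2 + y^2 - 1."""
    return x * x + y * y - 1.0

def g1(x):
    """Upper branch, defined for |x| <= 1."""
    return math.sqrt(1.0 - x * x)

def g2(x):
    """Lower branch, defined for |x| <= 1."""
    return -math.sqrt(1.0 - x * x)

def d2f(x, y):
    """Partial derivative of f with respect to y."""
    return 2.0 * y

def g1_prime(x):
    """Derivative of the upper branch, defined only for |x| < 1."""
    return -x / math.sqrt(1.0 - x * x)
```

For instance, `f(0.6, g1(0.6))` and `f(0.6, g2(0.6))` both vanish up to rounding, `d2f(1.0, 0.0)` is zero, and `abs(g1_prime(0.999999))` is already in the hundreds.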
As a second example we shall consider a function with domain in $E^3$ and range in $E^2$. We suppose that $x$, $y$, and $z$ are variables on $R$,

$$f^1(x, y, z) = x^2 + y^2 + z^2 - 1,$$
$$f^2(x, y, z) = xy - z,$$

and $f(x, y, z) = (f^1(x, y, z), f^2(x, y, z))$. If we "solve" the equation $f(x, y, z) = 0$ by elementary techniques, for $y$ and $z$ in terms of $x$, we get the following two $C^\infty$ "solutions" for $|x| < 1$:

$$g_1(x) = \sqrt{\frac{1 - x^2}{1 + x^2}}\,(1, x), \qquad g_2(x) = -\sqrt{\frac{1 - x^2}{1 + x^2}}\,(1, x).$$

Now, $\forall (a, b, c) \in E^3$ so that $|a| < 1$ and $f(a, b, c) = 0$, it is not difficult to check that only one of these solutions has $(a, b, c)$ in its graph. However, $f(1, 0, 0) = 0$ and both solutions, extended to $x = 1$, have $(1, 0, 0)$ in their graphs. As before, at $x = 1$ we lose the differentiability as well as the uniqueness of the solutions in the neighborhood of a point.

Let us compute the Jacobian of the function with domain $E^2$ and having values $f(a, y, z)$, where $a$ is fixed. We calculate this Jacobian as

$$\begin{vmatrix} 2y & 2z \\ a & -1 \end{vmatrix} = -2(y + az).$$

Hence, this Jacobian can vanish only for $y + az = 0$, and the only points $(a, b, c)$ where both the Jacobian and $f(a, b, c)$ vanish are $(\pm 1, 0, 0)$. As in the previous example, the points where we lose differentiability and unicity of the solutions are the points where $d_{(y,z)} f$ is singular.
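The closed-form solutions of the second example, $(y, z) = \pm\sqrt{(1 - x^2)/(1 + x^2)}\,(1, x)$, can be verified by substitution. The sketch below does this numerically and also evaluates the Jacobian with respect to $(y, z)$; the function names are illustrative only.

```python
import math

def f(x, y, z):
    """f(x, y, z) = (x^2 + y^2 + z^2 - 1, x*y - z)."""
    return (x * x + y * y + z * z - 1.0, x * y - z)

def solution(x, sign=1.0):
    """The pair (y, z) = sign * sqrt((1 - x^2)/(1 + x^2)) * (1, x), for |x| <= 1."""
    s = sign * math.sqrt((1.0 - x * x) / (1.0 + x * x))
    return (s, s * x)

def jacobian_yz(a, y, z):
    """Determinant of [[2y, 2z], [a, -1]], which equals -2(y + a z)."""
    return (2.0 * y) * (-1.0) - (2.0 * z) * a
```

Substituting `solution(x, 1.0)` or `solution(x, -1.0)` into `f` gives residuals at rounding level for any `|x| < 1`, while `jacobian_yz(1.0, 0.0, 0.0)` is zero, matching the loss of uniqueness at $(1, 0, 0)$.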


7.5.7 Implicit Function Theorem. Suppose $f$ is of class $C^q$, $q \ge 1$, $\mathcal{D}(f) \subset E^{m+n}$, and $\mathcal{R}(f) \subset E^n$. Suppose further that $\exists a \in \mathcal{D}(f)$ so that $f(a) = 0$ and $df(a)$ is of rank $n$. Then there is an open set $U \subset E^m$ containing 0, an open set $V \subset \mathcal{D}(f)$ containing $a$, a function $g$ with $\mathcal{D}(g) = U$ and $\mathcal{R}(g) \subset V$ which satisfies the following:
(a) $g(0) = a$.
(b) $f \circ g(t) = 0$, $\forall t \in \mathcal{D}(g)$.
(c) $g \in C^q$ and $\forall t \in U$, rank $dg(t) = m$.
(d) If $x \in V$ and $f(x) = 0$, then $x \in \mathcal{R}(g)$.

Proof. Let us identify $E^{m+n}$ with $E^m \times E^n$ in the obvious way, and identify $E^m$ with the subspace $E^m \times \{0\}$ of $E^{m+n}$ and $E^n$ with the subspace $\{0\} \times E^n$ of $E^{m+n}$. Let $M$ be any linear subspace of $E^{m+n}$ of dimension $n$ so that the range of $df(a)|M$ is $E^n$. Let $P$ be the projection of $E^{m+n}$ onto $M^\perp$ and $A$ any linear transformation of $E^{m+n}$ into itself which takes $M^\perp$ onto $E^m$. This is possible, since $\dim M = n$ and thus $\dim M^\perp = m$. (See Exercises 12 and 13 of Section 6.5.)

If $\forall x \in \mathcal{D}(f)$ we set

$$F(x) = (A \circ P(x), f(x)),$$

then $F$ is a function of class $C^q$ with domain $\mathcal{D}(f)$ and range in $E^{m+n}$. Further, $\forall u \in E^{m+n}$ we have

$$dF(a)(u) = (d(A \circ P)(a)(u), df(a)(u)) = (A \circ P(u), df(a)(u)).$$

The range of $dF(a)$ is $E^{m+n}$. Indeed, let $(v_1, v_2) \in E^{m+n}$ and $u_1 \in M^\perp$ so that $A \circ P(u_1) = v_1$, $u_2 \in M$ so that $df(a)(u_2) = v_2 - df(a)(u_1)$, and $u = u_1 + u_2$. Then $A \circ P(u) = v_1$, $df(a)(u) = v_2$, and we see that $dF(a)(u) = (v_1, v_2)$. Consequently $dF(a)$ has rank $m + n$, which means that it is nonsingular.

If we apply the Inverse Function Theorem to $F$, we find that there is an open set $V \subset \mathcal{D}(f)$ containing $a$ and an open set $W \subset E^{m+n}$ containing $F(a)$ so that $F|V$ is a one-to-one function with range $W$ and having an inverse function $G$ of class $C^q$. Set

$$W_1 = \{\tau : \tau \in E^m \ \&\ (\tau, 0) \in W\}.$$

It is clear that $W_1$ is open in $E^m$ and is nonvoid, since $A \circ P(a) \in W_1$. For every $\tau \in W_1$ let us set

$$h(\tau) = G(\tau, 0).$$

Then $h \in C^q$ and since $dG(\tau, 0)$ is nonsingular it follows that $dG(\tau, 0)|E^m$ has rank $m$. But $\forall u \in E^m$ we have

$$dG(\tau, 0)(u, 0) = \sum_{j=1}^{m} u^j D_j G(\tau, 0) = dh(\tau)(u).$$

Hence rank $dh(\tau) = m$.

Now,

$$h(A \circ P(a)) = G(A \circ P(a), 0) = G(A \circ P(a), f(a)) = G \circ F(a) = a. \tag{7.5.8}$$

Further, $\forall \tau \in W_1$

$$(A \circ P \circ h(\tau), f \circ h(\tau)) = F \circ h(\tau) = F \circ G((\tau, 0)) = (\tau, 0).$$

Thus, we get

$$A \circ P \circ h(\tau) = \tau, \tag{7.5.9}$$

$$f \circ h(\tau) = 0. \tag{7.5.10}$$

Note also, if $x \in V$ and $f(x) = 0$, then $F(x) \in W$ and

$$x = G \circ F(x) = G(A \circ P(x), 0) = h(A \circ P(x)). \tag{7.5.11}$$

Finally, let us set $U = W_1 - A \circ P(a) = \{t : t \in E^m \ \&\ t = \tau - A \circ P(a) \ \&\ \tau \in W_1\}$. Then $\forall t \in U$ let us set

$$g(t) = h(\tau), \qquad t = \tau - A \circ P(a).$$

Clearly $g$ satisfies the conclusions of Theorem 7.5.7, condition (d) coming from (7.5.11). The proof is complete.


Condition (d) is a uniqueness condition on $\mathcal{R}(g)$ rather than on $g$ itself. We can get any number of other functions that satisfy the conclusions of the theorem by composing $g$ with a function of class $C^q$ that takes $U$ onto itself, leaves the origin fixed, and is of rank $m$ at every point of $U$. To pin down the uniqueness of $g$, the Implicit Function Theorem is usually stated in a special form. We state this as a corollary, although it is really a corollary of the proof.

7.5.8 Corollary. Suppose $f$ is of class $C^q$, $q \ge 1$, $\mathcal{D}(f) \subset E^m \times E^n$, and $\mathcal{R}(f) \subset E^n$. Suppose further that $(a, b) \in \mathcal{D}(f)$ so that $f(a, b) = 0$ and $d_y f(a, b)$ is nonsingular, where $d_y f(a, b)$ is the differential of the function with domain in $E^n$ and values $f(a, y)$. Then there is an open set $U \subset E^m$ containing $a$ and an open set $Y \subset E^n$ containing $b$, so that $U \times Y \subset \mathcal{D}(f)$ and a function $g$ with $\mathcal{D}(g) = U$ and $\mathcal{R}(g) \subset Y$ that satisfies the following:
(a') $g(a) = b$.
(b') $f(x, g(x)) = 0$, $\forall x \in \mathcal{D}(g)$.
(c') $g \in C^q$.
(d') If $(x, y) \in U \times Y$ and $f(x, y) = 0$, then $y = g(x)$.

Proof. We shall use the notations of the proof of the last theorem, except that we shall designate the elements of $E^m \times E^n$ by $(x, y)$. Let $u = (0, u_2) \in E^m \times E^n$. From the formula

$$df(a, b)(u) = \sum_{j=1}^{n} u_2^j D_{m+j} f(a, b) = d_y f(a, b)(u_2),$$

and the fact that $d_y f(a, b)$ is nonsingular and $\mathcal{R}(df(a, b)) \subset E^n$, we see that $df(a, b)$ has rank $n$. Let $M = E^n$; then, of course, the orthogonal complement of $M$ in $E^{m+n}$ is $E^m$. As in the proof of Theorem 7.5.7 we let $P$ be the projection of $E^{m+n}$ onto $E^m$ so that $\forall (x, y) \in E^{m+n}$ we have $P(x, y) = x$. We take $A$ to be the identity transformation of $E^{m+n}$ onto itself. Hence the function $F$ of the last theorem becomes

$$F(x, y) = (x, f(x, y)).$$

If we apply the proof of the last theorem, we find that there is an open neighborhood $U \subset E^m$ containing $a$ and an open neighborhood $U \times Y \subset \mathcal{D}(f)$ containing $(a, b)$ and a function $h(x) = (h_1(x), h_2(x))$ of class $C^q$ with domain $U$ and range in $U \times Y$ so that from (7.5.8) we have

$$h(P(a, b)) = h(a) = (a, b).$$

Further, from (7.5.9) we have

$$h_1(x) = P \circ h(x) = x,$$

and thus from (7.5.10) we get

$$f \circ h(x) = f(x, h_2(x)) = 0.$$

If we take $g = h_2$, then conditions (a'), (b') and (c') are satisfied. To prove the unicity condition (d') let us suppose $(x, y) \in U \times Y$ and $f(x, y) = 0$; then $(x, y) \in \mathcal{D}(F)$ and $F(x, y) = (x, 0)$. Applying the inverse function $G$ and recalling that $G(x, 0) = h(x)$ we get

$$(x, y) = G \circ F(x, y) = G(x, 0) = (x, g(x)),$$

from which it follows that $y = g(x)$. This completes the proof.

Of course, the Jacobian matrix of $d_y f(a, b)$ is the matrix

$$\begin{bmatrix} D_{m+1} f^1(a, b) & \cdots & D_{m+n} f^1(a, b) \\ \vdots & & \vdots \\ D_{m+1} f^n(a, b) & \cdots & D_{m+n} f^n(a, b) \end{bmatrix}.$$

As we remarked after the proof of the Inverse Function Theorem, the easiest way to check that $d_y f(a, b)$ is nonsingular is to check that the Jacobian, that is, the determinant of the above matrix, does not vanish.

The reader may find it instructive to go back and review the examples given before Theorem 7.5.7 in the light of that theorem and its corollary.
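To see the corollary at work on the circle example $f(x, y) = x^2 + y^2 - 1$ near $(a, b) = (0.6, 0.8)$, one can compute the implicit function numerically. The sketch below uses Newton's iteration in the $y$ variable alone; this is a convenient numerical device assumed for the illustration, not the construction used in the proof, and the names are hypothetical.

```python
import math

def f(x, y):
    """f(x, y) = x^2 + y^2 - 1."""
    return x * x + y * y - 1.0

def d2f(x, y):
    """Partial derivative of f with respect to y; nonzero near (0.6, 0.8)."""
    return 2.0 * y

def implicit_y(x, y0, steps=30):
    """Approximate g(x) by Newton's iteration on y -> f(x, y), starting at y0."""
    y = y0
    for _ in range(steps):
        y -= f(x, y) / d2f(x, y)
    return y
```

Starting from `y0 = 0.8`, the value `implicit_y(x, 0.8)` for `x` near `0.6` agrees with the upper branch `math.sqrt(1 - x*x)`, which is the function singled out by condition (d') when $Y$ is a small interval about $0.8$.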

Exercises

1. Define a function $f$ on $E^2$ by means of the equations

$$f^1(x, y) = x^2 - y^2, \qquad f^2(x, y) = 2xy.$$

Show that $f$ has a nonsingular differential at every point except the origin and thus at every point of $E^2 \setminus \{(0, 0)\}$ is locally a one-to-one function. Show that $f$ is not a one-to-one function. Is the restriction of $f$ to some neighborhood of $(0, 0)$ a one-to-one function? [Note: From the point of view of complex variables, if we set $z = x + iy$, then $f^1(x, y)$ is the real part of $z^2$ and $f^2(x, y)$ is its imaginary part.]

2. Let $f$ be that function on $E^2$ defined by

$$f^1(x, y) = \begin{cases} x + x^2 \sin(1/x) & \Leftrightarrow x \ne 0, \\ 0 & \text{if } x = 0, \end{cases} \qquad f^2(x, y) = y.$$

Show that $df(0, 0)$ is nonsingular but that $f$ is not a one-to-one function on any neighborhood of the origin. Is $df(x, y)$ nonsingular for every $(x, y)$ in some neighborhood of the origin?


3. (a) Suppose that $f$ is a real-valued function defined on $E^2$ by

$$f(x, y) = x - y^2.$$

Does there exist a real-valued function $g$ defined in a neighborhood of $x = 0$ so that $f(x, g(x)) = 0$?

(b) Suppose that $f$ is the same function as in part (a). Show that there is a unique function $g$ defined on a suitable neighborhood of $x = 1$ so that $f(x, g(x)) = 0$ and $g(x) > 0$.

4. Suppose that $f$ is a real-valued function on $E^2$ defined by

$$f(x, y) = x^2 - y^2.$$

How many continuous functions $g$ do there exist, defined on a neighborhood of $x = 0$ so that $f(x, g(x)) = 0$? Are there more functions for which this is true if we remove the requirement of continuity on $g$?

5. Suppose $f$ has domain $E^2$ and is defined by the equations

$$f^1(x, y) = e^x \cos y, \qquad f^2(x, y) = e^x \sin y.$$

Show that $\mathcal{R}(f) = E^2 \setminus \{0\}$. Is $f$ a one-to-one function? Is $f$ locally one-to-one? Note that in terms of the complex variable $z = x + iy$, $f^1(x, y)$ is the real part of $e^z$, and $f^2(x, y)$ is its imaginary part.

6. If the open set $U$ of Corollary 7.5.8 is connected and $h$ is a continuous function with domain $U$ which satisfies (a') and (b'), show that $h = g$.

7. Suppose $f$ is a real-valued function with $\mathcal{D}(f) \subset E^2$ and satisfies the hypotheses of Corollary 7.5.8. Compute the derivative of $g$ in terms of the partial derivatives of $f$.

8. Extend the results of Exercise 7 to higher dimensions, that is, where the domain and range of $f$ are in higher dimensions. In fact, show that

$$dg(x) = -d_y f(x, g(x))^{-1} \circ d_x f(x, g(x)).$$

[Hint: Note that $\forall u \in E^m$ and $\forall v \in E^n$

$$d_x f(x, y)(u) = df(x, y)(u, 0), \qquad d_y f(x, y)(v) = df(x, y)(0, v).]$$
9.

Suppose

C1

with domain in En and range in Em and

is nonsingular. Show that there is a ball B (a,

=JIB(a, p),

then

Va

E R so that

a= 1, 3m

lg(x)-g(y) I
lx-yl"

Suppose

lim

lg( x) - g(y)I
Ix - YI

C1

0,

>

Vu

Vx E J0(f ) ,
[Hint: Use the
[J(x)-J(y)] when u = x - y.]
E E",

=ft 0, and

=ft 0. Show thatfis a one-to-one function.

mean value theorem on

7 .6

df(a)
g

so that if

and its domain is a convex set in E" and its

range is in En. Suppose further that

u df(x)(u)

J0(J)

> 0

lx-yl-O
10.

1 we have

Jim

lx-yl-O
and if

a<

p)

MAXIMA AND MINIMA

For a real-valued function whose domain is in E",

1, it is, of course,

possible to talk about the maximum and minimum of the function. The
purpose of this section is to develop some of the criteria that will tell
when a multivariate function has a (local) maximum or minimum. We
are sure that the reader knows the definition of the maximum and minimum of a function, but for the sake of completeness we shall repeat it.

7.6.1 Definition. If $f$ is a real-valued function with domain in $E^n$, then $f$ is said to have a maximum [minimum] at $a \Leftrightarrow a \in \mathcal{D}(f)$ and $\forall x \in \mathcal{D}(f)$, $f(x) \le f(a)$ [$f(a) \le f(x)$]. The function $f$ is said to have a local maximum [minimum] at $a \Leftrightarrow a \in \mathcal{D}(f)$ and there is a ball $B(a, \rho) \subset E^n$ so that $\forall x \in B(a, \rho) \cap \mathcal{D}(f)$, $f(x) \le f(a)$ [$f(a) \le f(x)$].

The analogue of Theorem 4.3.1 for a function with domain in $E^n$ is given below. Actually it is not so much a generalization of that theorem as a consequence of it.

7.6.2 Theorem. Suppose $f$ is a real-valued function with domain in $E^n$ and all the first partial derivatives of $f$ exist at $a \in \mathcal{D}(f)$. A necessary condition that $f$ have a local maximum or local minimum at $a$ is that $\forall k \in (1, n)$

$$D_k f(a) = 0. \tag{7.6.1}$$

Proof. Let us set

$$g_k(t) = f(a + te_k).$$

This defines a function $g_k$ with $\mathcal{D}(g_k) = \{t : a + te_k \in \mathcal{D}(f)\}$. Clearly 0 is an interior point of $\mathcal{D}(g_k)$. If $f$ has a local maximum or minimum at $a$, then $g_k$ has a local maximum or minimum at 0. Thus by Theorem 4.3.1 we have

$$\frac{dg_k(0)}{dt} = D_k f(a) = 0.$$

In case $f$ has a differential at $a$, then the last theorem tells us that a necessary condition that $f$ has a local extremum at $a$ is that $df(a) = 0$. Now, it is not necessarily true that if $df(a) = 0$, then $a$ is at a local extremum for $f$. For example, let $f$ be that function on $E^2$ defined by

$$f(x, y) = x^3 + y^3.$$

Clearly $df(0) = 0$, but 0 is not at a local maximum or minimum for $f$. A point for which $df(a) = 0$ but is not at a local extremum for $f$ is called a saddle point.

7.6.3 Definition. If $f$ has domain in $E^n$ and range in $E^m$, then $a$ is said to be a critical point for $f \Leftrightarrow df(a)$ exists and $df(a) = 0$.

The next theorem gives a sufficient condition under which a real-valued function has a local maximum or a local minimum at a point. It is based on Theorem 6.6.14.

7.6.4 Theorem. Suppose $f$ is a real-valued function of class $C^2$ with an open domain in $E^n$ and $a \in \mathcal{D}(f)$ is a critical point for $f$. For every $k \in (1, n)$ let us set

$$\Delta_k(x) = \det \begin{bmatrix} D_1 D_1 f(x) & \cdots & D_1 D_k f(x) \\ \vdots & & \vdots \\ D_k D_1 f(x) & \cdots & D_k D_k f(x) \end{bmatrix} \tag{7.6.2}$$

[$\Delta_n(x)$ is called the Hessian of $f$ at $x$]. If $\forall k \in (1, n)$, $\Delta_k(a) > 0$, then $f$ has a local minimum at $a$, and if $\forall k \in (1, n)$, $(-1)^k \Delta_k(a) > 0$, then $f$ has a local maximum at $a$. If the Hessian $\Delta_n(a) \ne 0$, but neither of the previous conditions holds, then $f$ has a saddle point at $a$, that is, has neither a local maximum nor minimum at $a$.
Proof. Since $\mathcal{D}(f)$ is open, there is a ball $B(a, \rho) \subset \mathcal{D}(f)$. Since $df(a) = 0$, from Taylor's Theorem 7.4.7, $\forall x \in B(a, \rho)$, $\exists c$ on the straight line joining $a$ to $x$ so that

$$f(x) - f(a) = \tfrac{1}{2}\, d^2 f(c)(x - a)^2. \tag{7.6.3}$$

For every $x \in B(a, \rho)$, let $T(x)$ be that linear transformation from $E^n$ into $E^n$ whose matrix with respect to the basis $\{e_j : j \in (1, n)\}$ is the Hessian matrix $[D_j \circ D_k f(x)]$. Since $f$ is of class $C^2$, it follows from Theorem 7.4.2 that $\forall x \in B(a, \rho)$, $D_j \circ D_k f(x) = D_k \circ D_j f(x)$. Hence $T(x)$ is a symmetric linear transformation. Further, we have from Theorem 7.4.6,

$$d^2 f(c)(x - a)^2 = \sum_{k=1}^{n} \sum_{j=1}^{n} (x^j - a^j)(x^k - a^k)\, D_j \circ D_k f(c) = T(c)(x - a) \cdot (x - a). \tag{7.6.4}$$

If $\forall k \in (1, n)$, $\Delta_k(a) > 0$, then since $\Delta_k$ is a continuous function, $\exists \sigma$, $0 < \sigma \le \rho$, so that $\forall c \in B(a, \sigma)$ and $\forall k \in (1, n)$, $\Delta_k(c) > 0$. Thus from (7.6.3), (7.6.4), and Theorem 6.6.14 it follows that $\forall x \in B(a, \sigma)$, $x \ne a$, $f(x) - f(a) > 0$, and hence $f(a)$ is a local minimum for $f$.

If $\forall k \in (1, n)$, $(-1)^k \Delta_k(a) > 0$, then arguing the same way as above and using Corollary 6.6.15, we find that $f(a)$ is a local maximum for $f$.

To prove the last statement of the theorem we use the last statement in Corollary 6.6.15. This tells us that $\exists u, v \in E^n$, so that $T(a)(u) \cdot u > 0$, and $T(a)(v) \cdot v < 0$. Since the functions with values $T(x)(u) \cdot u$ and $T(x)(v) \cdot v$ are continuous, there is a ball $B(a, \eta) \subset \mathcal{D}(f)$ so that $\forall c \in B(a, \eta)$, $T(c)(u) \cdot u > 0$ and $T(c)(v) \cdot v < 0$. Now $\forall \alpha \in R$, $\alpha \ne 0$, and $\forall c \in B(a, \eta)$,

$$T(c)(\alpha u) \cdot \alpha u = |\alpha|^2\, T(c)(u) \cdot u > 0,$$

$$T(c)(\alpha v) \cdot \alpha v = |\alpha|^2\, T(c)(v) \cdot v < 0.$$

For every $\varepsilon > 0$, $\exists \alpha \in R$, $\alpha \ne 0$, so that $|\alpha u| < \varepsilon$ and $|\alpha v| < \varepsilon$. Suppose $0 < \varepsilon < \eta$, and we set $y = \alpha u + a$ and $z = \alpha v + a$. Then $y, z \in B(a, \varepsilon)$ and from (7.6.3) and (7.6.4),

$$f(y) - f(a) = \tfrac{1}{2}\, T(c)(y - a) \cdot (y - a) = \tfrac{1}{2}\, T(c)(\alpha u) \cdot \alpha u > 0,$$

$$f(z) - f(a) = \tfrac{1}{2}\, T(c')(z - a) \cdot (z - a) = \tfrac{1}{2}\, T(c')(\alpha v) \cdot \alpha v < 0.$$

Thus the function with values $f(x) - f(a)$ changes sign in every neighborhood of $a$ so that $a$ is at a saddle point for $f$.
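The test of Theorem 7.6.4 is easy to mechanize for small matrices. The following sketch classifies a critical point from the Hessian matrix at that point by computing the leading principal minors; the function names and the string labels are choices made for this illustration, and the determinant is computed by cofactor expansion, which is only sensible for small $n$.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def leading_minors(hessian):
    """The determinants of the upper-left k x k blocks, for k = 1, ..., n."""
    n = len(hessian)
    return [det([row[:k] for row in hessian[:k]]) for k in range(1, n + 1)]

def classify(hessian):
    """Apply the sign conditions of the theorem to the Hessian at a critical point."""
    d = leading_minors(hessian)
    if all(x > 0 for x in d):
        return "local minimum"
    if all((-1) ** k * x > 0 for k, x in enumerate(d, start=1)):
        return "local maximum"
    if d[-1] != 0:
        return "saddle point"
    return "inconclusive"  # Hessian determinant is zero: the theorem says nothing
```

For example, `classify([[2, 0], [0, 2]])` corresponds to $x^2 + y^2$ at the origin and returns `"local minimum"`, `classify([[2, 0], [0, -2]])` corresponds to $x^2 - y^2$ and returns `"saddle point"`, and the zero Hessian of $x^3 + y^3$ at the origin gives `"inconclusive"`.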

LAGRANGE MULTIPLIERS

If $f$ is a real-valued function with $\mathcal{D}(f) \subset E^n$, it very often happens that we are not interested in the local extrema of $f$ but rather in the local extrema of a new function $g$ that is $f$ restricted to a subset of $\mathcal{D}(f)$. Usually the subset of $\mathcal{D}(f)$ we are interested in is given by a set $\{x : h(x) = 0\} \cap \mathcal{D}(f)$, where $h$ is a function with domain in $E^n$ and range in $E^m$, $m \le n$. This is a standard type of problem that arises, for example, in classical analytical mechanics. It is usually called an extremal problem for $f$ under the constraint $h(x) = 0$.

The method of Lagrange multipliers gives a necessary condition that a point should be at a local extremum of $f$ under the constraint $h(x) = 0$. Actually, it is based on Theorem 7.6.2, being an elaboration on that theme.

7.6.5 Theorem. Suppose $f$ and $h$ are of class $C^1$ with (open) domains in $E^n$. Suppose also that $f$ is real-valued, $\mathcal{R}(h) \subset E^m$, $m \le n$, and $\forall x \in \mathcal{D}(h)$, $dh(x)$ has rank $m$. A necessary condition that $a$ be at a local extremum for $f$ restricted to the set $\{x : h(x) = 0\} \cap \mathcal{D}(f)$ is that $\exists \lambda \in E^m$, so that the function $F$ with domain $[\mathcal{D}(f) \cap \mathcal{D}(h)] \times E^m$ and defined by

$$F(x, y) = f(x) + h(x) \cdot y \tag{7.6.5}$$

has a critical point at $(a, \lambda)$; that is,

$$dF(a, \lambda) = df(a) + \sum_{k=1}^{m} \lambda^k\, dh^k(a) = 0. \tag{7.6.6}$$

Proof. We shall suppose that $m < n$, since otherwise, as we shall show later, the theorem is trivial. Since $h(a) = 0$, and rank $dh(a) = m$, according to the Implicit Function Theorem 7.5.7, there exists an open set $U \subset E^{n-m}$ containing the origin, and a function $g$ of class $C^1$ with domain $U$, so that $\forall t \in U$, rank $dg(t) = n - m$, $h \circ g(t) = 0$, and $g(0) = a$.

Since $g(0) \in \mathcal{D}(f)$, $g$ is continuous, and $\mathcal{D}(f)$ is open, there is a ball $B(0, \rho) \subset U$ so that $t \in B(0, \rho) \Rightarrow g(t) \in \mathcal{D}(f)$. Consequently, since $a$ is at a local extremum for $f$ restricted to $\{x : h(x) = 0\} \cap \mathcal{D}(f)$, it follows that $t = 0$ is at a local extremum for the function $f \circ g$, and is an interior point of the domain of this function. We may apply Theorem 7.6.2 to $f \circ g$, and also use the fact that $h \circ g$ is the zero function, to get the following two equations:

$$d(h \circ g)(0) = dh(a) \circ dg(0) = 0, \qquad d(f \circ g)(0) = df(a) \circ dg(0) = 0. \tag{7.6.7}$$

Let $N$ be the null space of $dh(a)$ and $N^\perp$ its orthogonal complement in $E^n$. Now, $\mathcal{R}(dh(a)) = \mathcal{R}(dh(a)|N^\perp)$, and $dh(a)|N^\perp$ is a one-to-one function. Hence, since rank $dh(a) = m$, we must have $\dim N^\perp = m$, and since $\dim N + \dim N^\perp = n$, we must have $\dim N = n - m$. Since rank $dg(0) = n - m$, it follows from the first equality of (7.6.7) that $N = \mathcal{R}(dg(0))$.

Since $df(a)$ is a linear functional on $E^n$, it follows from Theorem 6.5.7 that $\exists b \in E^n$ so that $\forall u \in E^n$,

$$df(a)(u) = u \cdot b. \tag{7.6.8}$$

From the second equality of (7.6.7) we get $\forall u \in E^{n-m}$,

$$df(a) \circ dg(0)(u) = dg(0)(u) \cdot b = 0.$$

Thus $b \in N^\perp$. Now, from Theorem 6.5.9 we know that $\mathcal{R}(dh(a)^t) = N^\perp$. Thus $\exists \lambda \in E^m$, so that

$$b = -dh(a)^t(\lambda).$$

If we use this in (7.6.8) we get $\forall u \in E^n$,

$$df(a)(u) = -dh(a)(u) \cdot \lambda. \tag{7.6.9}$$

Now, $\forall u \in E^n$ and $\forall v \in E^m$,

$$dF(a, \lambda)(u, v) = df(a)(u) + dh(a)(u) \cdot \lambda + h(a) \cdot dy(v), \tag{7.6.10}$$

where, of course, by $dy$ we mean the differential of that function defined on $E^n \times E^m$ whose value at $(x, y)$ is $y$. Since $h(a) = 0$, it follows from (7.6.9) and (7.6.10) that

$$dF(a, \lambda) = df(a) + \sum_{k=1}^{m} \lambda^k\, dh^k(a) = 0.$$

If $n = m$, then we cannot use the preceding technique since $g$ does not exist. However, in this case $dh(a)$ has an inverse and thus $dh(a)^t$ has an inverse. So again $\exists \lambda \in E^n$ so that $b = -dh(a)^t(\lambda)$. We can then proceed exactly as before. However, this situation is really trivial, since $h$ is one to one in a neighborhood of $a$ and thus $a$ is the only point in the neighborhood where $h(x) = 0$. Consequently, $a$ is an isolated point of $\{x : h(x) = 0\} \cap \mathcal{D}(f)$. Of course, $f$ restricted to this set still has a relative maximum and minimum at $a$. The proof is concluded.
If we write (7.6.6) in terms of partial derivatives we get $n$ equations:

$$D_k f(a) + \sum_{j=1}^{m} \lambda^j D_k h^j(a) = 0, \qquad \forall k \in (1, n). \tag{7.6.11}$$

From the fact that $h(a) = 0$ we get $m$ more equations

$$h^j(a) = 0, \qquad \forall j \in (1, m). \tag{7.6.12}$$

If in the set of equations (7.6.11) and (7.6.12) we replace $(a, \lambda)$ by $(x, y)$, then these equations can be viewed as a system of $m + n$ equations in $m + n$ unknowns, $x^1, \cdots, x^n$ and $y^1, \cdots, y^m$. The points that are at the relative extrema of $f$ under the constraint $h(x) = 0$ must be among the solutions of this system of $m + n$ equations. The auxiliary solutions $\lambda^1, \cdots, \lambda^m$ are called Lagrange multipliers.

Unless the functions $f$ and $h$ are relatively simple, the method of Lagrange multipliers is difficult to apply. However, we shall now give an example which shows that it can lead to nice results. We shall obtain the so-called geometric-arithmetic means inequality. Other examples of its uses are given in the exercises at the end of the chapter.
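We shall prove the following statement:
If $\forall k \in (1, n)$, $a_k \ge 0$, then

$$\Big( \prod_{k=1}^{n} a_k \Big)^{1/n} \le \frac{1}{n} \sum_{k=1}^{n} a_k. \tag{7.6.13}$$

To prove this, we shall find the maximum of the function

$$f(x) = \Big( \prod_{j=1}^{n} x^j \Big)^2, \quad x^j > 0, \quad \forall j \in (1, n),$$

under the constraint

$$h(x) = \sum_{j=1}^{n} (x^j)^2 - 1 = 0.$$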

By use of the method of Lagrange multipliers, the local extrema are contained among the solutions to the $n + 1$ equations

$$\frac{f(x)}{x^k} + \lambda x^k = 0, \quad k \in (1, n), \qquad \sum_{j=1}^{n} (x^j)^2 = 1. \tag{7.6.14}$$

Suppose $(b, \lambda)$ is a solution of the above system. If we multiply the $k$th equation in (7.6.14) by $b^k$ we get

$$f(b) + \lambda (b^k)^2 = 0. \tag{7.6.15}$$

If we sum up over $k$ and use the last equation of (7.6.14) we get

$$n f(b) + \lambda = 0.$$

Putting this value of $\lambda$ into (7.6.15) we get, $\forall k \in (1, n)$,

$$(b^k)^2 = 1/n, \tag{7.6.16}$$

and thus

$$\lambda = -n \cdot n^{-n}. \tag{7.6.16'}$$

It is not difficult to check that the numbers given by (7.6.16) and (7.6.16') constitute a solution of the system (7.6.14) and thus this system has a unique solution in the domain where $f$ is defined. To see whether this solution leads to an extremum for $f$, let us extend $f$, in the obvious way, to a continuous function $F$ defined on the set $D = \{x : x \in E^n \ \&\ \forall j \in (1, n),\ x^j \ge 0\}$. If $S$ is the unit sphere in $E^n$, then since $F$ is continuous, $F \mid (D \cap S)$ must take on a maximum and minimum. Clearly, the minimum is taken on when $\exists j \in (1, n)$ so that $x^j = 0$, and the minimum is 0. Thus the maximum of $F \mid (D \cap S)$ is taken on when $\forall j \in (1, n)$, $x^j > 0$, and by Theorem 7.6.5 the point where the maximum is taken on must satisfy the system (7.6.14). If we specify that $\forall j \in (1, n)$, $x^j > 0$, this system has a unique solution. Hence it follows that the maximum is taken on at the point whose components are given by (7.6.16).
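
The stationarity conditions above can be checked numerically. The following is only a sketch: it assumes the Lagrange system in the form $D_k f(x) + \lambda D_k h(x) = 0$ together with $h(x) = 0$, and the function names and the choice $n = 4$ are illustrative, not taken from the text.

```python
import math

def f(x):
    """f(x) = (x^1 * ... * x^n)^2, the function maximized in the text."""
    return math.prod(x) ** 2

def lagrange_residuals(x, lam):
    """Residuals of D_k f(x) + lam * D_k h(x) = 0 for each k, followed by
    the residual of the constraint h(x) = sum_j (x^j)^2 - 1 = 0."""
    fx = f(x)
    # D_k f(x) = 2 f(x) / x^k  and  D_k h(x) = 2 x^k  (valid for x^k > 0).
    gradient_part = [2.0 * fx / xk + lam * 2.0 * xk for xk in x]
    constraint = sum(xk * xk for xk in x) - 1.0
    return gradient_part + [constraint]

n = 4                                  # illustrative dimension (assumption)
b = [1.0 / math.sqrt(n)] * n           # candidate point from (7.6.16)
lam = -float(n) ** (1 - n)             # candidate multiplier, -n * n^(-n)
max_residual = max(abs(r) for r in lagrange_residuals(b, lam))
```

With these values every residual vanishes up to rounding, which is consistent with the claim that (7.6.16) and (7.6.16') solve the system.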

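Let $\{a_k : k \in (1, n)\} \subset R^+$, $y^k = (a_k)^{1/2}$, $y = (y^1, \ldots, y^n)$, $x^k = y^k / |y|$, and $x = (x^1, \ldots, x^n)$. Then $|x| = 1$ and by what we have proved previously we have

$$f(x) = \Big( \prod_{k=1}^{n} x^k \Big)^2 \le n^{-n}.$$

But

$$\Big( \prod_{k=1}^{n} x^k \Big)^2 = \frac{\prod_{k=1}^{n} a_k}{|y|^{2n}} = \frac{\prod_{k=1}^{n} a_k}{\big( \sum_{k=1}^{n} a_k \big)^n}, \quad \text{so that} \quad \Big( \prod_{k=1}^{n} a_k \Big)^{1/n} \le \frac{1}{n} \sum_{k=1}^{n} a_k.$$

The last inequality is precisely (7.6.13). In case $\exists j \in (1, n)$ so that $a_j = 0$, the inequality (7.6.13) is obviously true.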
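
As a quick sanity check of (7.6.13), the inequality can be tested on random nonnegative data. This is a sketch only; the helper names, the sample size, the value range, and the rounding tolerance are assumptions made for illustration, and a numerical test of this kind is of course no substitute for the proof.

```python
import math
import random

def geometric_mean(values):
    """(a_1 * ... * a_n)^(1/n) for nonnegative values."""
    return math.prod(values) ** (1.0 / len(values))

def arithmetic_mean(values):
    """(a_1 + ... + a_n) / n."""
    return sum(values) / len(values)

random.seed(0)  # fixed seed so the experiment is repeatable
samples = [[random.uniform(0.0, 10.0) for _ in range(5)] for _ in range(1000)]

# Collect any sample that violates the inequality beyond rounding error.
violations = [a for a in samples
              if geometric_mean(a) > arithmetic_mean(a) + 1e-12]
```

Equality is approached when all the $a_k$ coincide, which matches the fact that the maximizing point in (7.6.16) has equal components.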


O Exercises
I.

Suppose A is a compact set in

E"

with a nonvoid interior A0,

and f is a real, continuous function with domain A which has a differen


tial at every point of A0 If Vx E {3A =A\A0, f(x)
so that df(a) =

2.

0.

Let f be a real-valued function defined on

f(x,y) = ax2
If

-,!:. 0

and b2 - 4ac =

0,

Let

bxy

J has
= (O, 0).

show that

or a relative minimum at (x,y)

3.

0,

show 3 a E A0

This generalizes Rolle's theorem.

E2

by the equation

cy2
either a relative maximum

f be that real-valued function defined on E2 by the


f(x,y)

1
x2 + xy + 2 y3.
1

equation


Find all the relative maxima and minima for $f$ restricted to the triangle and its interior which has vertices at the points $(-1, 2)$, $(-2, 4)$, and $(-1, 6)$. What is the maximum and minimum of $f$ restricted to this triangle and its interior?


4. Let $f$ be that real-valued function defined by the equation

$$f(x, y) = \frac{2}{x} + \frac{2}{y} + 2xy, \quad x \ne 0, \ y \ne 0.$$

Find all the critical points of the function and decide whether each is a relative minimum, a relative maximum, or a saddle point.
5. Let $P$ be that plane in $E^3$ whose equation is

$$3x + y - 2z = 5.$$

Find that point on $P$ whose distance from the point $(12, 1, 5)$ is a minimum.

6. Find the shortest distance from the point $(3, 3, -1)$ to the surface in $E^3$ whose equation is

$$xy - z = 0.$$

7. Let $f$ be a real-valued function with domain $E^2$ given by the equation

$$f(x, y) = ax^2 + bxy^2 + cy^4.$$

Show that if $b^2 - 4ac > 0$, then $f$ does not have a relative extremum at 0. However, if $\alpha^2 + \beta^2 \ne 0$, and $t \in R$, the function defined by $g_{\alpha, \beta}(t) = f(\alpha t, \beta t)$ has a relative minimum at $t = 0$ if $a > 0$ and a relative maximum at $t = 0$ if $a < 0$.

8. Suppose $f$ is that real-valued function with $\mathcal{D}(f) = \{x : x \in E^n \ \&\ \forall j \in (1, n),\ x^j > 0\}$, and defined by

$$f(x) = \frac{1}{n} \sum_{i=1}^{n} x^i.$$

Use the method of Lagrange multipliers to find the minimum of this function under the constraint

$$h(x) = \prod_{i=1}^{n} x^i - 1 = 0.$$

Deduce the geometric-arithmetic means inequality (7.6.13).

9. (a) For fixed positive $p$ and $q$, let $f$ be that function defined on the open first quadrant of $E^2$ by the equation

$$f(x, y) = \frac{x^p}{p} + \frac{y^q}{q}.$$

Show that the minimum of $f$ under the constraint

$$h(x, y) = xy - 1 = 0$$

is $(1/p) + (1/q)$.

(b) Use the result of part (a) to show that if $a \ge 0$ and $b \ge 0$, $p > 1$ and $q > 1$, and $(1/p) + (1/q) = 1$, then

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$

This is a generalization of the result that $2ab \le a^2 + b^2$.

10. Suppose $f$ and $g$ are nonnegative continuous functions on $[a, b]$ and $h$ is nondecreasing on the same interval. Use Exercise 9(b) to show that if $p > 1$ and $q > 1$, and $(1/p) + (1/q) = 1$, then

$$\int_a^b f(x) g(x) \, dh(x) \le \Big[ \int_a^b f(x)^p \, dh(x) \Big]^{1/p} \Big[ \int_a^b g(x)^q \, dh(x) \Big]^{1/q}.$$

This is known as Hölder's inequality. If $p = 1$ and $q = \infty$, define

$$\Big[ \int_a^b g(x)^q \, dh(x) \Big]^{1/q} = \sup g.$$

Show that Hölder's inequality is true in this case also.

11. Use the results of Exercise 10 to show that if $p > 1$, $(1/p) + (1/q) = 1$, and $\forall k \in (1, n)$, $a_k \ge 0$ and $b_k \ge 0$, then

$$\sum_{k=1}^{n} a_k b_k \le \Big[ \sum_{k=1}^{n} a_k^p \Big]^{1/p} \Big[ \sum_{k=1}^{n} b_k^q \Big]^{1/q}.$$

12. Use Hölder's inequality of Exercise 10 to obtain Minkowski's inequality

$$\Big[ \int_a^b |f(x) + g(x)|^p \, dh(x) \Big]^{1/p} \le \Big[ \int_a^b |f(x)|^p \, dh(x) \Big]^{1/p} + \Big[ \int_a^b |g(x)|^p \, dh(x) \Big]^{1/p}$$

for $1 \le p < \infty$. [Hint:

$$|f(x) + g(x)|^p = |f(x) + g(x)|^{p-1} |f(x) + g(x)| \le |f(x) + g(x)|^{p-1} |f(x)| + |f(x) + g(x)|^{p-1} |g(x)|.$$

Integrate both sides of the inequality and apply Hölder's inequality on the right.]

CHAPTER 8

HIGHER-DIMENSIONAL INTEGRATION

8.1 RIEMANN-DARBOUX INTEGRALS

The definitions and elementary results about Riemann and Darboux


sums and integrals for higher dimensions are essentially the same as
for the one-dimensional case given in Section 5.1. However, for the
sake of completeness we shall repeat these things in this section but
shall not dwell at any length upon them, and refer for the proofs to
Chapter 5.
As we have done at times in previous chapters, if $m < n$ we shall often identify $E^m$ with that subspace of $E^n$ consisting of all vectors of the form $(x, 0)$ (or $(0, x)$), where $x \in E^m$ and $0 \in E^{n-m}$. Also $E^p \times E^q$ shall usually be identified with $E^{p+q}$ in the obvious way.

8.1.1 Definition. An interval in $E^n$ is an $n$-fold Cartesian product of $n$ nonvoid intervals in $E^1$;

$$I = I^1 \times \cdots \times I^n, \quad I^k \subset E^1, \quad \forall k \in (1, n).$$

The volume or content of $I$ is defined by

$$|I| = \prod_{k=1}^{n} |I^k|,$$

and the diameter of $I$ is defined by

$$d(I) = \Big[ \sum_{k=1}^{n} |I^k|^2 \Big]^{1/2}.$$

The interval $I$ is said to be $m$-dimensional, $m \le n$ $\Leftrightarrow$ exactly $n - m$ of the component intervals $I^k$ are degenerate, that is, consist of a single point.
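
The three quantities in Definition 8.1.1 are easy to compute for a concrete interval. The sketch below represents a closed interval in $E^n$ as a list of `(lo, hi)` endpoint pairs; that representation and the function names are assumptions made for illustration.

```python
import math

def volume(interval):
    """|I| = product of the lengths of the component intervals."""
    return math.prod(hi - lo for lo, hi in interval)

def diameter(interval):
    """d(I) = square root of the sum of squared component lengths."""
    return math.sqrt(sum((hi - lo) ** 2 for lo, hi in interval))

def dimension(interval):
    """Number of nondegenerate component intervals."""
    return sum(1 for lo, hi in interval if hi > lo)

# An interval in E^3 whose third component is the single point {2}.
box = [(0.0, 3.0), (1.0, 5.0), (2.0, 2.0)]
box_volume = volume(box)        # 3 * 4 * 0
box_diameter = diameter(box)    # sqrt(9 + 16 + 0)
box_dimension = dimension(box)  # two nondegenerate components
```

The example is a 2-dimensional interval in $E^3$, so its 3-dimensional volume is zero while its diameter is not.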

The next thing to do is to define the decomposition of a closed interval $I$ in $E^n$. The most natural thing to do in carrying over Definition 5.1.1 is to say that $\Delta$ is a decomposition of $I$ if and only if $\Delta$ is a finite set of closed intervals in $E^n$, any two of which intersect in at most an interval of dimension less than $n$, and whose set theoretic union is $I$. However, with such a definition it becomes somewhat complicated to prove, at least with the tools we have at present, that the sum of the volumes of the intervals in $\Delta$ is the volume of $I$. Thus we give a somewhat more restrictive definition.

8.1.2 Definition. A finite set $\Delta$ of intervals in $E^n$ is called a decomposition of the closed interval $I = I^1 \times \cdots \times I^n$ $\Leftrightarrow$ $\forall k \in (1, n)$ there is a decomposition $\Delta^k$ of $I^k$ so that every interval

$$J = J^1 \times J^2 \times \cdots \times J^n, \quad J^k \in \Delta^k,$$

is in $\Delta$ and every interval in $\Delta$ is of this form. The norm of a decomposition $\Delta$ is defined as

$$|\Delta| = \max \{ d(J) : J \in \Delta \}.$$

With this definition of the decomposition of an interval it is now not hard to prove the fact about volumes that we mentioned above. We shall do this formally in the next proposition.

8.1.3 Proposition. If $I$ is any closed interval in $E^n$ and $\Delta$ is any decomposition of $I$, then

$$|I| = \sum_{J \in \Delta} |J|.$$

Proof. The proof will be by induction on $n$. Let $P(n)$ be the statement of the proposition. The statement $P(1)$ is true (Section 5.1, Exercise 2). Assume that $P(n)$ is true and let $I$ be an interval in $E^{n+1}$, $I = I^1 \times \cdots \times I^{n+1}$, and let $\Delta$ be a decomposition of $I$. Let $I' = I^1 \times \cdots \times I^n$ and let $\Delta'$ be the set of all intervals $J' = J^1 \times \cdots \times J^n$, where $\forall k \in (1, n)$, $J^k \in \Delta^k$. By definition, $\Delta'$ is a decomposition of $I'$ and thus, by $P(n)$,

$$|I'| = \sum_{J' \in \Delta'} |J'|.$$

Now, $|I| = |I'| \, |I^{n+1}|$ and, by $P(1)$,

$$|I^{n+1}| = \sum_{J^{n+1} \in \Delta^{n+1}} |J^{n+1}|.$$

Hence

$$|I| = \sum_{(J', J^{n+1}) \in \Delta' \times \Delta^{n+1}} |J'| \, |J^{n+1}|,$$

where the notation on the right means we are summing over all ordered pairs in $\Delta' \times \Delta^{n+1}$, as is usual when two finite sums are multiplied. It is almost trivial to prove that there is a one-to-one map from $\Delta' \times \Delta^{n+1}$ onto $\Delta$, and if $(J', J^{n+1}) \in \Delta' \times \Delta^{n+1}$ corresponds to $J \in \Delta$, then $|J'| \, |J^{n+1}| = |J|$. This establishes that $P(n + 1)$ is true and the proof is complete.
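
Proposition 8.1.3 can be illustrated on a small product decomposition. In this sketch each one-dimensional decomposition $\Delta^k$ is encoded by an increasing list of cut points; that encoding, the function names, and the particular cut points are assumptions chosen for the example.

```python
import itertools
import math

def volume(interval):
    """Volume of an interval given as a sequence of (lo, hi) pairs."""
    return math.prod(hi - lo for lo, hi in interval)

def product_decomposition(cuts):
    """Build the decomposition of Definition 8.1.2 from per-axis cut points.

    cuts[k] is an increasing list of points; consecutive points give the
    closed subintervals that decompose the k-th component interval."""
    pieces_per_axis = [list(zip(c[:-1], c[1:])) for c in cuts]
    return [list(piece) for piece in itertools.product(*pieces_per_axis)]

cuts = [[0.0, 0.5, 2.0], [1.0, 1.5, 2.0, 4.0]]   # illustrative cut points
whole = [(c[0], c[-1]) for c in cuts]            # I = [0, 2] x [1, 4]
pieces = product_decomposition(cuts)             # 2 * 3 subintervals
total = sum(volume(piece) for piece in pieces)
```

Here `total` and `volume(whole)` agree, as the proposition asserts; the number of pieces is the product of the numbers of subintervals on each axis, which mirrors the one-to-one map from $\Delta' \times \Delta^{n+1}$ onto $\Delta$ used in the proof.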

8.1.4 Definition. A decomposition $\Delta^*$ is called a refinement of a decomposition $\Delta$ (in symbols $\Delta^* \succ \Delta$) $\Leftrightarrow$ each interval in $\Delta^*$ is contained in an interval in $\Delta$.

If $\Delta_1$ and $\Delta_2$ are two decompositions of the same interval, their common refinement is that decomposition which consists of every interval that is an intersection of an interval of $\Delta_1$ with an interval of $\Delta_2$.

We leave as an exercise for the reader the rather simple proof that the second sentence in the previous definition makes sense, namely, what we have defined as a refinement is actually a decomposition. We now proceed to define a Riemann sum and integral.

8.1.5 Definition. Let $f$ be a real-valued function with domain the closed interval $I \subset E^n$. Let $\mathcal{P}$ be the collection of all pairs $(\Delta, \{x_k\})$, where $\Delta = \{I_k : k \in (1, m)\}$ is a decomposition of $I$ and $\forall k \in (1, m)$, $x_k \in I_k$. Let $R_f$ be that function with domain $\mathcal{P}$ defined by

$$R_f(\Delta, \{x_k\}) = \sum_{k=1}^{m} f(x_k) |I_k|.$$

The function $R_f$ is called the Riemann sum function for $f$, and any number in its range is called a Riemann sum for $f$.

The function $R_f$ is said to have the limit $R(f)$ $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon \Rightarrow |R_f(\Delta, \{x_k\}) - R(f)| < \varepsilon$. In case $R_f$ has a limit, we say that $f$ is Riemann integrable and the limit $R(f)$ is called the Riemann integral of $f$.

In case $R_f$ has a limit, we are justified in calling $R(f)$ the limit since it is unique. The proof is extremely simple and is found after Definition 5.1.3 when $f$ has domain in $R$.

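8.1.6 Theorem. The function $R_f$ has a limit $\Leftrightarrow$ $R_f$ is Cauchy in the sense that $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon$ and $\Delta' \succ \Delta_\varepsilon$ $\Rightarrow$

$$|R_f(\Delta, \{x_k\}) - R_f(\Delta', \{x_k'\})| < \varepsilon.$$

The proof of this is exactly the same as the proof of Theorem 5.1.7 and we shall not repeat it. We go on to define upper and lower Darboux sums and integrals.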

8.1.7 Definition. Let $f$ be a real-valued bounded function with domain the closed interval $I$ and let $\overline{D}_f$ and $\underline{D}_f$ be those real-valued functions each having domain the set of all decompositions of $I$ and defined by

$$\overline{D}_f(\Delta) = \sum_{J \in \Delta} M(J) |J|, \quad M(J) = \sup \{ f(x) : x \in J \},$$

$$\underline{D}_f(\Delta) = \sum_{J \in \Delta} m(J) |J|, \quad m(J) = \inf \{ f(x) : x \in J \}.$$

The functions $\overline{D}_f$ and $\underline{D}_f$ are called the upper and lower Darboux sum functions for $f$, respectively. The numbers $\overline{D}_f(\Delta)$ and $\underline{D}_f(\Delta)$ are called upper and lower Darboux sums for $f$, respectively. Set

$$\overline{D}(f) = \inf \{ \overline{D}_f(\Delta) : \Delta \in \mathcal{D}(\overline{D}_f) \} = \overline{\int_I} f(x) \, dx,$$

$$\underline{D}(f) = \sup \{ \underline{D}_f(\Delta) : \Delta \in \mathcal{D}(\underline{D}_f) \} = \underline{\int_I} f(x) \, dx,$$

and call these numbers the upper and lower Darboux integrals of $f$, respectively. In case $\overline{D}(f) = \underline{D}(f) = D(f)$, we say that $f$ is Darboux integrable and call $D(f)$ the Darboux integral of $f$.

As in Chapter 5, to show the connection between Riemann integrals and Darboux integrals it is necessary to prove the following lemma.

8.1.8 Lemma. If $f$ is a real-valued bounded function on the closed interval $I$ and $\Delta^* \succ \Delta$, then

$$\underline{D}_f(\Delta) \le \underline{D}_f(\Delta^*) \le \overline{D}_f(\Delta^*) \le \overline{D}_f(\Delta).$$

The proof of this, using Proposition 8.1.3, is exactly the same as the proof of Lemma 5.1.6, and we shall not repeat it. We now state the main connection between Riemann and Darboux integrals. Again, the proof is the same as the proof of Theorem 5.1.5.
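
The chain of inequalities in Lemma 8.1.8 can be observed on a simple example. The sketch below uses the illustrative function $f(x, y) = xy$ on $[0, 1] \times [0, 1]$ with uniform grids; both the function and the grids are assumptions chosen so that the supremum and infimum on each cell are attained at corners.

```python
def darboux_sums(n):
    """Lower and upper Darboux sums of f(x, y) = x * y on the unit square
    for the uniform n-by-n decomposition.

    f is nondecreasing in each variable on this square, so on each cell the
    infimum is at the lower-left corner and the supremum at the upper-right."""
    h = 1.0 / n
    lower = 0.0
    upper = 0.0
    for i in range(n):
        for j in range(n):
            cell_volume = h * h
            lower += (i * h) * (j * h) * cell_volume
            upper += ((i + 1) * h) * ((j + 1) * h) * cell_volume
    return lower, upper

# The 8-by-8 grid is a refinement of the 4-by-4 grid.
lower4, upper4 = darboux_sums(4)
lower8, upper8 = darboux_sums(8)
```

Passing to the refinement raises the lower sum and lowers the upper sum, and both bracket the value $1/4$ of the integral of $xy$ over the unit square.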

8.1.9 Theorem. The Riemann integral of $f$ exists if and only if the Darboux integral of $f$ exists, and if they exist then $R(f) = D(f)$.

In case the integral of $f$ exists we call it the Riemann-Darboux integral and denote it by

$$\int_I f(x) \, dx.$$

GENERALIZED LIMITS

It is possible that by this time the reader may have begun to wonder whether or not the various concepts of limit we have used, for example, the limit of a sequence, the limit of a function at a point in $E^n$, or the limit of the function $R_f$, can all be brought under some general definition. The answer is yes, and we shall briefly describe this general concept and show how our previous definitions fit under it.

A set $\mathcal{N}$ is called a directed set if and only if there is a relation $\succ$ on $\mathcal{N}$ with the following properties:

(a) $(\forall n)(n \succ n)$.
(b) $(\forall m)(\forall n)(\forall p)(p \succ n \ \&\ n \succ m \Rightarrow p \succ m)$.
(c) $(\forall m)(\forall n)(\exists p)(p \succ m \ \&\ p \succ n)$.


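A real net is a real-valued function with domain a directed set $\mathcal{N}$. The real net $S$ is said to have a limit $s$ $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists N \in \mathcal{N}$ so that $n \succ N \Rightarrow |S(n) - s| < \varepsilon$. If a net has a limit, the limit is unique. Indeed, suppose $t$ is another limit. Then $\exists M \in \mathcal{N}$ so that $n \succ M \Rightarrow |S(n) - t| < \varepsilon$. Now, by condition (c) for a directed set, $\exists P \in \mathcal{N}$ so that $P \succ M$ and $P \succ N$. Hence if $n \succ P$ we have

$$|s - t| \le |S(n) - s| + |S(n) - t| < 2\varepsilon,$$

and thus $s = t$. Hence we are justified in calling a limit of a real net the limit of the real net.

A real net $S$ is said to be Cauchy $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists N \in \mathcal{N}$ so that $n \succ N \ \&\ m \succ N \Rightarrow |S(n) - S(m)| < \varepsilon$. A real net has a limit $\Leftrightarrow$ it is Cauchy. It is clear that if a real net has a limit it is Cauchy. On the other hand, suppose the net $S$ is Cauchy. Then $\exists M \in \mathcal{N}$ so that $n \succ M \Rightarrow |S(n) - S(M)| < 1$. Thus $n \succ M \Rightarrow |S(n)|$ is bounded by $1 + |S(M)|$. For every $n \succ M$ set $\varphi(n) = \sup \{ S(m) : m \succ n \}$ and $\overline{\lim}\, S = \inf \{ \varphi(n) : n \succ M \}$. We claim that $s = \overline{\lim}\, S$ is the limit of $S$. First, note that $n \succ m \Rightarrow \varphi(n) \le \varphi(m)$. Next, $\forall \varepsilon > 0$, $\exists n_1 \succ M$ so that

$$0 \le \varphi(n_1) - s < \varepsilon. \tag{8.1.1}$$

Also, $\exists n_2 \succ M$ so that $m, n \succ n_2 \Rightarrow |S(m) - S(n)| < \varepsilon/2$, and $\exists n' \succ n_2$ so that

$$0 \le \varphi(n_2) - S(n') < \varepsilon/2. \tag{8.1.2}$$

Since $n' \succ n_2$, it follows that $\forall n \succ n_2$ we get

$$|S(n) - S(n')| < \varepsilon/2. \tag{8.1.3}$$

From the inequalities (8.1.2) and (8.1.3) we get that $n \succ n_2 \Rightarrow$

$$0 \le \varphi(n_2) - S(n) < \varepsilon. \tag{8.1.4}$$

From condition (c) $\exists N \in \mathcal{N}$ so that $N \succ n_1$ and $N \succ n_2$. Hence if $n \succ N$ we get from (8.1.1) and (8.1.4) and the monotone character of $\varphi$ that

$$0 \le \varphi(n) - s < \varepsilon, \qquad 0 \le \varphi(n) - S(n) < \varepsilon.$$

From these two inequalities it is immediately clear that $n \succ N \Rightarrow |S(n) - s| < \varepsilon$.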
Let us see how we can apply this concept to the various definitions of limit that we have given. In case $\mathcal{N} = N_0$, a net is just a sequence and the relation $\succ$ is taken as the relation $\ge$. Suppose $f$ is a real-valued function with $\mathcal{D}(f) \subset E^n$ and $a$ is an accumulation point of $\mathcal{D}(f)$. If $x, y \in \mathcal{D}(f) \setminus \{a\}$ set $x \succ y \Leftrightarrow |x - a| \le |y - a|$. It is easily checked that $\mathcal{D}(f) \setminus \{a\}$ is a directed set under this relation. The function $f$ restricted to the directed set $\mathcal{D}(f) \setminus \{a\}$ is a net and the function $f$ has the limit $l$ at $a$ $\Leftrightarrow$ the net $f$ has the limit $l$.

In the case of the function $R_f$ we take $\mathcal{N}$ as the domain of $R_f$ and define $(\Delta^*, \{x_k^*\}) \succ (\Delta, \{x_k\}) \Leftrightarrow \Delta^*$ is a refinement of $\Delta$. Note that with this definition our discussion of Cauchy nets provides a general proof for Theorems 5.1.7 and 8.1.6. For the functions $\overline{D}_f$ and $\underline{D}_f$ we can take $\mathcal{N}$ as the set of all decompositions of the given interval $I$ and take $\Delta^* \succ \Delta \Leftrightarrow \Delta^*$ is a refinement of $\Delta$. Then $\overline{D}_f$ becomes a monotone nonincreasing net and $\underline{D}_f$ a monotone nondecreasing net. The numbers $\overline{D}(f)$ and $\underline{D}(f)$ are the limits of these nets, respectively.

Of course, everything we have said for real nets will work as well for vector-valued nets with ranges in $E^n$, $n \ge 1$.

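Exercises

1. Suppose that $f$ and $g$ are real-valued bounded functions each having as domain the closed interval $I \subset E^n$. Show that

$$\overline{\int_I} [f(x) + g(x)] \, dx \le \overline{\int_I} f(x) \, dx + \overline{\int_I} g(x) \, dx,$$

$$\underline{\int_I} f(x) \, dx + \underline{\int_I} g(x) \, dx \le \underline{\int_I} [f(x) + g(x)] \, dx.$$

2. Suppose $f$ and $g$ are bounded real-valued functions having as common domain the closed interval $I \subset E^n$. If $f \le g$ show that

$$\overline{\int_I} f(x) \, dx \le \overline{\int_I} g(x) \, dx, \qquad \underline{\int_I} f(x) \, dx \le \underline{\int_I} g(x) \, dx.$$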

3. Suppose f is a bounded integrable function with domain the


closed interval I C E" and J is a closed subinterval of /. Is it always
true that

4. Suppose that $I$ and $J$ are closed intervals in $E^n$ so that $I \cup J$ is an interval and $I \cap J$ is at most an $(n - 1)$-dimensional interval. If $f$ is a real bounded function with domain $I \cup J$, show that

$$\overline{\int_{I \cup J}} f(x) \, dx = \overline{\int_I} f(x) \, dx + \overline{\int_J} f(x) \, dx,$$

$$\underline{\int_{I \cup J}} f(x) \, dx = \underline{\int_I} f(x) \, dx + \underline{\int_J} f(x) \, dx.$$

5. Let $\mathcal{N}$ be a directed set, and $\forall n \in \mathcal{N}$ let $K_n$ be a nonvoid compact subset of $E^p$ so that if $n \succ m$, then $K_n \subset K_m$. Show that the following version of the Cantor intersection theorem is valid; that is,

$$\bigcap \{ K_n : n \in \mathcal{N} \}$$

is nonvoid.

8.2 JORDAN CONTENT

In the case of higher dimensions it is very apparent that there are simple geometric shapes, other than intervals, over which we wish to define integrals of functions. We shall first make a definition and then show that the definition makes sense.

8.2.1 Definition. Let $A$ be any bounded set in $E^n$ and $I$ any closed interval in $E^n$ so that $A \subset I$. If $f$ is a real-valued, bounded function with $\mathcal{D}(f) = A$, extend $f$ to $I$ by taking

$$f_A(x) = \begin{cases} f(x) & \text{if } x \in A, \\ 0 & \text{if } x \in I \setminus A. \end{cases}$$

Then define

$$\overline{\int_A} f(x) \, dx = \overline{\int_I} f_A(x) \, dx \quad \text{and} \quad \underline{\int_A} f(x) \, dx = \underline{\int_I} f_A(x) \, dx.$$

In case these numbers are equal, we define the common value as the integral of $f$ over $A$ and denote it by

$$\int_A f(x) \, dx.$$

In order that the previous definition make sense, it is clear that the numbers we have defined should be independent of the interval $I$ that contains $A$. Although this seems almost obvious, we shall state and prove it formally.

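8.2.2 Proposition. If $A \subset E^n$ is bounded and $I$ is any closed interval with $A \subset I$, then the numbers

$$\overline{\int_I} f_A(x) \, dx \quad \text{and} \quad \underline{\int_I} f_A(x) \, dx$$

are independent of $I$.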
Proof.

We shall make the proof only for the upper integral, the proof

for the lower integral being similar. Suppose, at first that, A

C ] C I,
I= /1 X
XI",]=J1 X
X]". Since ] C I, it
follows that Vk E (I, n),Jk C Ik, and hence there is a decomposition

where recall that


!!,.k Jk
l2\l3k,

consisting of three intervals (some possibly degenerate) 1/,


where
Corresponding to the set
k E (I,
of
decompositions, there is a decomposition
of I as in Definition 8.1.2.
Now, Ve> 0 there is a decomposition l!..1 of] so that
of

l2k =lk.

!!..

{t:..k:

n)}

(8.2.1)

n)

(!!,.k\{Jk})U t:..1k C
!!.'. \l!..i.
=
DrA(t:..') = L M(/')JI'J+ L M(/')JI'I =DrA(t:..1).

Jk;

For every k E (I,


the set
is a decomposition of
these give a decomposition !!..' of /. Since A
1, it follows that if
then M(I')
sup{JA(x): x E /'} = 0. Thus
I' E
/'EL>.1

(8.2.2)

/'El>.'\L>.1

From (8.2.1) and (8.2.2) we get

!A(x) dx-

IfA(x)

dx.:;;

DrA

(!!..1) -

IfA(x)

dx < e.

(8.2.3)

On the other hand, let !!..2 be a decomposition of I so that


(8.2.4)

If !!..* is the common refinement of !!..2 and !!.., we get

< e.
f
D1A
l!..1* = {/*: /* !!..* I* C j};
C !!..*\l!..1*,
*
M(/ )
DrA(!!.. *) = DrA *).
(!!..*) -

0.:;;

Now,take
that

!A(x) dx

(8.2.4')

E
&
then if/*
it follows
0. Thus we again have a formula like (8.2.2), that is,

(!!..1

(8.2.2')

Hence from (8.2.2') and (8.2.4') we get

IfA

(x) dx-

IfA(x)

dx.:;;

i51A(l!..1*) - IfA(x)

dx < e.

(8.2.5)

The inequalities (8.2.3) and (8.2.5) show that the upper integral of
over j is the same as its upper integral over /.
In case whereA
/,but it is not true thatA
1, let us proceed
as follows. If
and e> 0, set I/=
e], and
- e,
I,= 1.1 X
XI/. In the same way we construct the interval]. We
leave for the reader the very easy, but slightly tedious,task of verifying
that 3M> 0, depending on and/, so that

fA

Cl C
lk = [ak, bk]

C
[ak bk+

II.fA(x)
I I/

dx-

A(x) dx-

I
l <Me,
IfA(x) l <Me.
fA(x) dx
dx


The analysis is very similar to that carried out in the first part of the
proof.
Since

Ve>

A Cj Cj, CI.,

we have, from the first part of the proof,

0,

Using this fact and the two inequalities above, we see that the upper integral of $f_A$ over $J$ is the same as its upper integral over $I$. Finally, if $A \subset I$ and $A \subset K$, then $A$ is contained in the closed interval $I \cap K$, and $I \cap K$ is contained in both $I$ and $K$, so the upper integrals of $f_A$ over $I$ and over $K$ both agree with its upper integral over $I \cap K$.

8.2.3 Definition. If $A$ is a bounded set in $E^n$, let

$$\overline{\kappa}(A) = \overline{\int_A} dx \quad \text{and} \quad \underline{\kappa}(A) = \underline{\int_A} dx,$$

and call these the outer and inner Jordan content of $A$, respectively. In the case where $\overline{\kappa}(A) = \underline{\kappa}(A)$, we say that $A$ is Jordan measurable and designate this common value by $|A|$, calling the latter number the Jordan content of $A$.

The next theorem, although of a rather simple nature, is nevertheless relatively important in the development of the theory of Jordan content. In loose language it says that the outer content is a subadditive monotone function.

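8.2.4 Theorem. Suppose $A$ and $B$ are bounded sets in $E^n$.

(a) $\overline{\kappa}(A \cup B) \le \overline{\kappa}(A) + \overline{\kappa}(B)$, and if the distance between $A$ and $B$, $d = \inf \{ |x - y| : x \in A \ \&\ y \in B \}$, is positive, then equality holds.

(b) If $A \subset B$, then $\overline{\kappa}(A) \le \overline{\kappa}(B)$.

Proof. For any set $S$, let $\chi_S$ be that function which has the value 1 on $S$, and has the value 0 on $S^c$, the complement of $S$. It is called the characteristic function of $S$.

(a) Suppose $I$ is a closed interval in $E^n$ and $A \cup B \subset I$. Since $\forall x \in E^n$ we have $\chi_{A \cup B}(x) \le \chi_A(x) + \chi_B(x)$ we have

$$\overline{\kappa}(A \cup B) = \overline{\int_I} \chi_{A \cup B}(x) \, dx \le \overline{\int_I} \chi_A(x) \, dx + \overline{\int_I} \chi_B(x) \, dx = \overline{\kappa}(A) + \overline{\kappa}(B).$$

If the distance $d$ between $A$ and $B$ is positive, let $\Delta$ be any decomposition of $I$ so that $|\Delta| < d/2$. Then an interval in $\Delta$ cannot intersect both $A$ and $B$ in nonvoid sets. Let $\Delta_A = \{ J : J \in \Delta \ \&\ J \cap A \ne \emptyset \}$ and $\Delta_B = \{ J : J \in \Delta \ \&\ J \cap B \ne \emptyset \}$. Then $\Delta_A \cap \Delta_B = \emptyset$ and if $J \in \Delta \setminus (\Delta_A \cup \Delta_B)$ it follows that $M(J) = \sup \{ \chi_{A \cup B}(x) : x \in J \} = 0$. Thus

$$\overline{D}_{\chi_{A \cup B}}(\Delta) = \sum_{J \in \Delta_A} M(J) |J| + \sum_{J \in \Delta_B} M(J) |J| = \overline{D}_{\chi_A}(\Delta) + \overline{D}_{\chi_B}(\Delta),$$

and consequently

$$\overline{\int_I} \chi_A(x) \, dx + \overline{\int_I} \chi_B(x) \, dx \le \overline{D}_{\chi_A}(\Delta) + \overline{D}_{\chi_B}(\Delta) = \overline{D}_{\chi_{A \cup B}}(\Delta).$$

Now, $\forall \varepsilon > 0$ it is certainly true that $\exists \Delta$ so that $|\Delta| < d/2$ and

$$\overline{D}_{\chi_{A \cup B}}(\Delta) < \overline{\int_I} \chi_{A \cup B}(x) \, dx + \varepsilon.$$

Consequently,

$$\overline{\kappa}(A) + \overline{\kappa}(B) \le \overline{\kappa}(A \cup B).$$

This combined with the inequality in (a) gives equality.

(b) The inequality in part (b) is simply a consequence of the fact that $A \subset B \Rightarrow \chi_A \le \chi_B$ and thus

$$\overline{\kappa}(A) = \overline{\int_I} \chi_A(x) \, dx \le \overline{\int_I} \chi_B(x) \, dx = \overline{\kappa}(B).$$

REMARKS: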

From the last proposition and the definition of outer Jordan content, it follows that for every bounded set $A \subset E^n$,

$$\overline{\kappa}(A) = \text{g.l.b.} \Big\{ \sum_{J \in \Phi} |J| : \Phi \in \Gamma \Big\}, \tag{8.2.6}$$

where $\Gamma$ is the collection of all coverings of $A$ containing a finite number of closed intervals. Indeed, if $\Phi \in \Gamma$, then from the last proposition we have

$$\overline{\kappa}(A) \le \overline{\kappa} \Big( \bigcup \{ J : J \in \Phi \} \Big) \le \sum_{J \in \Phi} |J|.$$

On the other hand, the definition of $\overline{\kappa}(A)$ says that it is the g.l.b. over that subset of $\Gamma$ consisting of decompositions of closed intervals that contain $A$. Thus (8.2.6) is valid.
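
Formula (8.2.6) suggests a simple numerical experiment: cover a set by the squares of a fine grid that meet it, and compare with the squares that lie inside it. The sketch below does this for the closed unit disk in $E^2$; the choice of set, the grid over $[-1, 1] \times [-1, 1]$, and the function name are assumptions for illustration, and the two numbers are only bounds for the content, not its exact value.

```python
import math

def disk_content_bounds(n):
    """Bounds for the content of the closed unit disk from an n-by-n grid
    of closed squares decomposing [-1, 1] x [-1, 1].

    Returns (inner, outer): the total area of squares contained in the disk
    and the total area of squares that intersect the disk."""
    h = 2.0 / n
    inner_count = 0
    outer_count = 0
    for i in range(n):
        for j in range(n):
            x0, x1 = -1.0 + i * h, -1.0 + (i + 1) * h
            y0, y1 = -1.0 + j * h, -1.0 + (j + 1) * h
            # Point of the square nearest the origin (clamp 0 into each side).
            near_x = min(max(0.0, x0), x1)
            near_y = min(max(0.0, y0), y1)
            # Coordinates of the corner farthest from the origin.
            far_x = max(abs(x0), abs(x1))
            far_y = max(abs(y0), abs(y1))
            if near_x * near_x + near_y * near_y <= 1.0:
                outer_count += 1      # square meets the disk
            if far_x * far_x + far_y * far_y <= 1.0:
                inner_count += 1      # square lies inside the disk
    return inner_count * h * h, outer_count * h * h

inner, outer = disk_content_bounds(100)
```

The squares counted in `outer` form a finite covering of the disk by closed intervals, so `outer` is one of the sums appearing in (8.2.6); as the grid is refined the gap between `inner` and `outer` shrinks, which is what one expects for a Jordan measurable set.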

It is not hard to give examples of sets that are not Jordan measurable. The easiest example is obtained by considering the set $A$ as all the rationals in $[0, 1]$; that is, $A = Q \cap [0, 1]$. It is a very easy matter to check that $\overline{\kappa}(A) = 1$, while $\underline{\kappa}(A) = 0$. More generally, it is true that if $A$ is a bounded set, and if $A^0 = \emptyset$, then $\underline{\kappa}(A) = 0$; hence, if $\overline{\kappa}(A) > 0$, $A$ cannot be Jordan measurable. Indeed, if we embed $A$ into a closed interval $I$ and $\Delta$ is any decomposition of $I$, then

$$\underline{D}_{\chi_A}(\Delta) = \sum_{\substack{J \in \Delta \\ J \subset A}} |J|,$$

where the right side is considered to be zero if there is no $J$ in $\Delta$ that is contained in $A$.

We now wish to state a necessary and sufficient condition that a


bounded set in En be Jordan measurable. This is done in terms of the
boundary of the set, and it would be well for the reader to review the
pertinent material at the end of Section 6.3. The condition we shall
give will also provide another viewpoint on the facts discussed in the
last paragraph.

8.2.5 Theorem. A bounded set $A \subset E^n$ is Jordan measurable $\Leftrightarrow$ $|\beta A| = 0$.

Although it is not difficult to give a direct proof of this theorem, we


prefer to defer it until after we have proved Theorem 8.3.2, since it
is really a very easy corollary of that theorem. We shall leave it to the
reader in Exercise 8 at the end of this section to establish a result that
will give Theorem 8.2.5 as an immediate consequence. Nevertheless,
we want to use Theorem 8.2.5 to investigate under what conditions the
map of a Jordan measurable set remains Jordan measurable. We shall
need this type of result when we prove the transformation theorem for
multiple integrals in Section 8.5.

8.2.6 Definition. A function $g$ with domain in $E^n$ and range in $E^m$ is said to be locally Lipschitz $\Leftrightarrow$ for every compact set $K \subset \mathcal{D}(g)$, $\exists M > 0$ and $\exists \delta > 0$ so that $\forall x, y \in K$ with $|x - y| < \delta$ we have $|g(x) - g(y)| \le M |x - y|$.

For example, every function of class $C^1$ is a locally Lipschitz function. The proof of this statement is essentially contained in the statement of Proposition 7.5.1. We shall leave it to the reader to fill in the elementary details in Exercise 9 at the end of this section.

8.2.7 Theorem. Suppose $g$ is a locally Lipschitz function with domain and range in $E^n$. For every bounded open set $U$ with $\overline{U} \subset \mathcal{D}(g)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ so that if $\overline{A} \subset U$ and $\overline{\kappa}(A) < \delta$, then $\overline{\kappa}(g(A)) < \varepsilon$. In particular, if $|A| = 0$, then $|g(A)| = 0$.

Proof. Since $\overline{U}$ is compact and $g$ is locally Lipschitz, $\exists M > 0$ and $\exists \eta > 0$ so that $\forall x, y \in \overline{U}$ with $|x - y| < \eta$ we have $|g(x) - g(y)| \le M |x - y|$. Now, given $\varepsilon > 0$, cover $A$ by a finite number of closed intervals $\{ I_k : k \in (1, m) \}$ so that

$$\sum_{k=1}^{m} |I_k| < \overline{\kappa}(A) + \frac{\varepsilon}{2 (M \sqrt{n})^n}.$$

Since it is clear that each $I_k$ may be covered by cubes whose total volume is as close to the volume of $I_k$ as we may wish, we may as well suppose at the outset that each $I_k$ is a cube. Let $\rho$ be the positive distance from $\overline{A}$ to $U^c$, the complement of $U$, and $\zeta = \min(\eta, \rho)$. Considering each $I_k$ as a cube, we can decompose it into cubes, so that the diameter of each is less than $\zeta$. Thus we may as well suppose from the very beginning that $d(I_k) < \zeta$. Hence we can suppose that $\bigcup I_k \subset U$. Thus, since $x, y \in I_k \Rightarrow |x - y| < \eta$, we get

$$|g(x) - g(y)| \le M |x - y| \le M d(I_k).$$

This means that $g(I_k)$ is contained in a cube $J_k$ whose side length is at most $M d(I_k)$. Now $d(I_k)$ is $\sqrt{n}$ times the side length of $I_k$ so that

$$|J_k| \le (M \sqrt{n})^n |I_k|.$$

Since $g(A) \subset g(\bigcup I_k) \subset \bigcup J_k$, it follows from Proposition 8.2.4 that

$$\overline{\kappa}(g(A)) \le \overline{\kappa} \Big( \bigcup J_k \Big) \le \sum_{k=1}^{m} |J_k| \le (M \sqrt{n})^n \sum_{k=1}^{m} |I_k| < (M \sqrt{n})^n \, \overline{\kappa}(A) + \varepsilon/2.$$

If we take $\delta = \varepsilon / 2 (M \sqrt{n})^n$, then $\overline{\kappa}(A) < \delta \Rightarrow \overline{\kappa}(g(A)) < \varepsilon$.

8.2.8 Theorem. Suppose $g$ is a locally Lipschitz map with an open domain in $E^n$ and range also in $E^n$. If $A$ is a bounded Jordan measurable set with $\overline{A} \subset \mathcal{D}(g)$ and $g(A^0)$ is open, then $g(A)$ is Jordan measurable.

Proof. The theorem will be proved if we can show that $|\beta g(A)| = 0$. Now, since $\overline{A}$ is compact and $\mathcal{D}(g)$ is open, there is a bounded open set $U$ with $\overline{A} \subset U \subset \overline{U} \subset \mathcal{D}(g)$. [Just take $U$ as the union of all open balls with centers in $\overline{A}$ and radii equal to half the positive distance from $\overline{A}$ to $\mathcal{D}(g)^c$.] Hence, since $\beta A = \overline{A} \setminus A^0 \subset U$, we can apply the last theorem and get $|g(\beta A)| = 0$. Thus, if we can show that $\beta g(A) \subset g(\beta A)$, we shall be done.

To prove that $\beta g(A) \subset g(\beta A)$, we note several facts. First, since $\overline{A}$ is compact and $g$ is continuous, it follows that $g(\overline{A})$ is compact and hence closed. It follows that $\overline{g(A)} \subset g(\overline{A})$. Next, since $g(A^0)$ is open we have $g(A^0) \subset g(A)^0$. Consequently,

$$\beta g(A) = \overline{g(A)} \setminus g(A)^0 \subset g(\overline{A}) \setminus g(A^0) \subset g(\overline{A} \setminus A^0) = g(\beta A).$$

This completes the proof.


Exercises

1. Show that an $(n - 1)$-dimensional interval in $E^n$ has zero $n$-dimensional Jordan content.

2. Let $f$ be a continuous real-valued function with $\mathcal{D}(f)$ a compact set in $E^1$. The graph of $f$ is the set of ordered pairs in $f$ when considered as a subset of $E^2$. Show that the two-dimensional Jordan content of the graph of $f$ is zero. Extend this result to the situation where $f$ is a real-valued continuous function with compact domain in $E^n$.

3. Show that the sphere $\{ (x, y, z) : x^2 + y^2 + z^2 = 1 \}$ has zero three-dimensional Jordan content.

4. A Jordan arc or path in $E^n$ is a one-to-one continuous function $f$ with domain $[0, 1]$ and range in $E^n$. It is piecewise differentiable $\Leftrightarrow$ the domain of $f'$ is all of $[0, 1]$ except for a finite number of points. If $f$ is a piecewise differentiable Jordan arc in $E^2$ and $f'$ is bounded, show that the two-dimensional Jordan content of $\mathcal{R}(f)$ is zero.

5. Let $A$ be a bounded set in $E^n$ having a finite number (possibly zero!) of accumulation points. Show that the Jordan content of $A$ is zero.

6. Give an example of a bounded open set that is not Jordan measurable. (Hint: Let $\{ r_k : k \in N \}$ be the rationals in $[0, 1]$ and $\forall k \in N$ let $I_k$ be an open interval contained in $[0, 1]$ so that $r_k \in I_k$ and

$$\sum_{k=1}^{\infty} |I_k| \le 1/2.)$$

7. Give an example of a bounded connected set in $E^n$, $n \ge 2$, which is not Jordan measurable.

8. If $A$ is a bounded set in $E^n$, show that

$$\overline{\kappa}(\beta A) = \overline{\kappa}(A) - \underline{\kappa}(A).$$

Deduce Theorem 8.2.5 as a corollary.

9. Show that every function of class $C^1$ is a locally Lipschitz function.

10. Let $g = (x, y)$ be the polar coordinate function with domain and range $E^2$ defined by

$$x(r, \theta) = r \cos \theta, \qquad y(r, \theta) = r \sin \theta.$$

Let $A$ be the open interval in $E^2$ given by $A = \,]0, 1[\, \times \,]0, 3\pi[$. Show that $\beta g(A) \subset g(\beta A)$ but that $\beta g(A) \ne g(\beta A)$.

11. If in addition to the hypotheses of Theorem 8.2.8 we assume that $g$ is one to one, show that $\beta g(A) = g(\beta A)$.

8.3 EXISTENCE AND PROPERTIES OF RIEMANN-DARBOUX INTEGRALS

In Section 5 we showed that a sufficient condition that a function with domain $I \subset E^1$ have a Riemann-Darboux integral is that it be continuous. Although this condition is clearly not necessary, it is nevertheless not far off the mark. The purpose of this section is to give a necessary and sufficient condition that a real-valued function defined on a closed and bounded interval in $E^n$ has a Riemann-Darboux integral.

Let us slightly change the definition of the limit superior and limit inferior of a real-valued bounded function at a point of its domain so as not to exclude the point itself from consideration, as is done in the usual definition. Let us set

$$\overline{\operatorname{Lim}}_{x \to a} f(x) = \lim_{r \to 0} \big[ \sup \{ f(x) : x \in B(a, r) \cap \mathcal{D}(f) \} \big],$$

$$\underline{\operatorname{Lim}}_{x \to a} f(x) = \lim_{r \to 0} \big[ \inf \{ f(x) : x \in B(a, r) \cap \mathcal{D}(f) \} \big].$$

The function $f$ is continuous at the point $a \in \mathcal{D}(f)$ $\Leftrightarrow$ $\overline{\operatorname{Lim}}_{x \to a} f(x) = \underline{\operatorname{Lim}}_{x \to a} f(x)$. If $f$ is not continuous at $a$, the difference of the latter two quantities gives a measure of the discontinuity at $a$. The number

$$\omega(f, a) = \overline{\operatorname{Lim}}_{x \to a} f(x) - \underline{\operatorname{Lim}}_{x \to a} f(x) \tag{8.3.1}$$

is called the oscillation of $f$ at $a$. In terms of this quantity, the function $f$ is continuous at the point $a \in \mathcal{D}(f)$ $\Leftrightarrow$ $\omega(f, a) = 0$.
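
The oscillation (8.3.1) can be estimated numerically for a function of one variable. The sketch below is an approximation under stated assumptions: it samples $f$ on $B(a, r) \cap \mathcal{D}(f)$ for a single small radius $r$ instead of taking the limit $r \to 0$, and the function name, the default radius, and the sample count are all illustrative choices.

```python
def oscillation_estimate(f, a, domain, radius=1e-6, samples=2001):
    """Estimate w(f, a) by sup f - inf f over sample points of
    B(a, radius) intersected with the closed interval `domain`.

    This only approximates the limit as radius -> 0 in (8.3.1)."""
    lo, hi = domain
    left = max(lo, a - radius)
    right = min(hi, a + radius)
    points = [left + (right - left) * i / (samples - 1) for i in range(samples)]
    values = [f(x) for x in points]
    values.append(f(a))  # the point a itself is not excluded
    return max(values) - min(values)

def step(x):
    """A bounded function on [-1, 1] with a single jump at 0."""
    return 1.0 if x >= 0.0 else 0.0

jump_at_zero = oscillation_estimate(step, 0.0, (-1.0, 1.0))
jump_elsewhere = oscillation_estimate(step, 0.5, (-1.0, 1.0))
```

For this step function the estimate is 1 at the jump and 0 at a point of continuity, in line with the statement that $f$ is continuous at $a$ exactly when $\omega(f, a) = 0$.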
Before we give a necessary and sufficient condition for the existence of a Riemann-Darboux integral it will be convenient to prove a lemma. The lemma may be viewed as a generalization of the theorem that a continuous function on a compact set is uniformly continuous, and indeed the proof of the lemma is essentially the same as the proof of the latter fact.
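8.3.1 Lemma. If $f$ is a bounded real-valued function with a compact domain in $E^n$, and if at every point $x \in \mathcal{D}(f)$, $\omega(f, x) < \varepsilon/2$, then $\exists \delta > 0$ so that $\forall x, y \in \mathcal{D}(f)$ with $|x - y| < \delta$, we have $|f(x) - f(y)| < \varepsilon$.

Proof. Suppose $x \in \mathcal{D}(f)$. By the definition of $\overline{\operatorname{Lim}}$ and $\underline{\operatorname{Lim}}$, $\exists B(x, \delta(x))$ so that $\forall y \in B(x, \delta(x)) \cap \mathcal{D}(f)$ we have

$$\underline{\operatorname{Lim}}_{t \to x} f(t) - \varepsilon/4 < f(y) < \overline{\operatorname{Lim}}_{t \to x} f(t) + \varepsilon/4.$$

Thus $\forall y, z \in B(x, \delta(x)) \cap \mathcal{D}(f)$ we have

$$-\omega(f, x) - \varepsilon/2 < f(y) - f(z) < \omega(f, x) + \varepsilon/2$$

or, in other words, $\forall y, z \in B(x, \delta(x)) \cap \mathcal{D}(f)$,

$$|f(y) - f(z)| < \omega(f, x) + \varepsilon/2 < \varepsilon. \tag{8.3.2}$$

The collection $\{ B(x, \delta(x)/2) : x \in \mathcal{D}(f) \}$ is an open covering for $\mathcal{D}(f)$ and hence by the Heine-Borel theorem there is a finite subset $\{ B(x_k, \delta(x_k)/2) : k \in (1, m) \}$ that covers $\mathcal{D}(f)$. Set $\delta = \min \{ \delta(x_k)/2 : k \in (1, m) \}$. If $x, y \in \mathcal{D}(f)$ with $|x - y| < \delta$, and $x \in B(x_k, \delta(x_k)/2)$, then $y \in B(x_k, \delta(x_k))$. Thus it follows from (8.3.2) that $|f(x) - f(y)| < \varepsilon$.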

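8.3.2 Theorem. Let f be a bounded real-valued function defined on a closed interval I in $E^n$. The function f has a Riemann-Darboux integral $\iff \forall \varepsilon > 0$ the compact set $\Omega(f, \varepsilon) = \{x : \omega(f, x) \ge \varepsilon\}$ has zero Jordan content.

Proof. Suppose f has a Riemann-Darboux integral. Then $\forall \varepsilon > 0$ and $\forall \eta > 0$, there exists a decomposition $\Delta$ of I so that

$$0 \le \overline{D}_f(\Delta) - \underline{D}_f(\Delta) = \sum_{J \in \Delta} [M(J) - m(J)]\,|J| < \varepsilon\eta.$$

Let $\Delta' = \{J : J \in \Delta \ \&\ J \cap \Omega(f, \varepsilon) \ne \emptyset\}$; if $J \in \Delta'$, it follows that $M(J) - m(J) \ge \varepsilon$. Also, since $\Omega(f, \varepsilon) \subset \cup\{J : J \in \Delta'\}$ it follows from Theorem 8.2.4 that

$$\overline{\chi}(\Omega(f, \varepsilon)) \le \overline{\chi}(\cup\{J : J \in \Delta'\}) \le \sum_{J \in \Delta'} |J|.$$

Consequently,

$$\varepsilon\,\overline{\chi}(\Omega(f, \varepsilon)) \le \sum_{J \in \Delta'} [M(J) - m(J)]\,|J| < \varepsilon\eta.$$

Hence $\forall \eta > 0$ we have $\overline{\chi}(\Omega(f, \varepsilon)) < \eta$, which means that $|\Omega(f, \varepsilon)| = 0$.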


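Conversely, suppose that $\forall \varepsilon > 0$, $|\Omega(f, \varepsilon)| = 0$. This being the case, there is a decomposition $\Delta$ of I so that

$$\overline{D}_{\chi_{\Omega(f,\varepsilon)}}(\Delta) < \varepsilon. \tag{8.3.3}$$

Let $\Delta^* = \{J : J \in \Delta \ \&\ J \cap \Omega(f, \varepsilon) = \emptyset\}$. The set $K = \cup\{J : J \in \Delta^*\}$ is a compact subset of $\mathscr{D}(f)$ and $\forall x \in K$, $\omega(f, x) < \varepsilon$. Thus by Lemma 8.3.1, $\exists \delta > 0$ so that $\forall x, y \in K$ with $|x - y| < \delta$ we have $|f(x) - f(y)| < 2\varepsilon$. Let $\Delta_1$ be a refinement of $\Delta$ so that $|\Delta_1| < \delta$. Let $\Delta_1^*$ be the set of all $J_1$ in $\Delta_1$ so that $\exists J \in \Delta^*$ with $J_1 \subset J$. If $L \in \Delta_1^*$, then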

368 I HIGHER-DIMENSIONAL INTEGRATION

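clearly $M(L) - m(L) \le 2\varepsilon$. Consequently, if M is an upper bound for $|f|$, we have

$$0 \le \overline{D}_f(\Delta_1) - \underline{D}_f(\Delta_1) = \sum_{J \in \Delta_1^*} [M(J) - m(J)]\,|J| + \sum_{J \in \Delta_1 \setminus \Delta_1^*} [M(J) - m(J)]\,|J| \le 2\varepsilon\,|I| + 2M\,\overline{D}_{\chi_{\Omega(f,\varepsilon)}}(\Delta) \le 2\varepsilon\,[\,|I| + M\,].$$

This shows that the integral of f exists.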


There is one small point we must address ourselves to before the proof can be considered complete. We should prove the statement that $\Omega(f, \varepsilon)$ is compact. Although it is not needed in the proof of this theorem, it will be needed later on. Since $\Omega(f, \varepsilon)$ is contained in the closed bounded set I, it is enough to show that $\Omega(f, \varepsilon)$ is closed in I, or what is the same thing, that $I \setminus \Omega(f, \varepsilon)$ is relatively open in I. Suppose $a \in I \setminus \Omega(f, \varepsilon)$; then $\omega(f, a) < \varepsilon$. Now, $\forall \eta > 0$, $\exists \rho > 0$ so that $\forall x \in B(a, \rho) \cap I$ we have

$$\underline{\lim}_{t\to a} f(t) - \eta/2 < f(x) < \overline{\lim}_{t\to a} f(t) + \eta/2.$$

In particular this means that $\forall x \in B(a, \rho) \cap I$, we have

$$\underline{\lim}_{t\to a} f(t) - \eta/2 \le \underline{\lim}_{t\to x} f(t) \le \overline{\lim}_{t\to x} f(t) \le \overline{\lim}_{t\to a} f(t) + \eta/2.$$

Hence, $\forall x \in B(a, \rho) \cap I$,

$$\omega(f, x) = \overline{\lim}_{t\to x} f(t) - \underline{\lim}_{t\to x} f(t) \le \omega(f, a) + \eta.$$

If we fix $\eta < \varepsilon - \omega(f, a)$, then $\exists \rho > 0$ so that $\forall x \in B(a, \rho) \cap I$, $\omega(f, x) < \varepsilon$. Consequently, $I \setminus \Omega(f, \varepsilon)$ is relatively open in I.

Proof of Theorem 8.2.5. Suppose A is a bounded set in $E^n$. Then the characteristic function $\chi_A$ of A is continuous except at the points of $\beta A$. For every $x \in \beta A$, $\omega(\chi_A, x) = 1$ and thus $\forall \varepsilon$ so that $0 < \varepsilon \le 1$, $\Omega(\chi_A, \varepsilon) = \beta A$. Embed A into an interval I and the last theorem tells us that

$$\overline{\int_I} \chi_A(x)\,dx = \underline{\int_I} \chi_A(x)\,dx$$

if and only if $|\Omega(\chi_A, \varepsilon)| = |\beta A| = 0$.
An immediate corollary of Theorems 8.3.2 and 8.2.5 is the following.

8.3.3 Corollary. Suppose A is a Jordan-measurable set and f is a bounded continuous real-valued function with $\mathscr{D}(f) = A$. Then f has a Riemann-Darboux integral.

Proof. Embed A into I; then $f_A$ is continuous except possibly on $\beta A$. Thus $\forall \varepsilon > 0$, $\Omega(f_A, \varepsilon) \subset \beta A$, so that applying Theorem 8.2.4 and Theorem 8.2.5 we get $|\Omega(f_A, \varepsilon)| = 0$. The proof is completed by an application of Theorem 8.3.2.


To put the result of Theorem 8.3.2 into a more usable form, it is
necessary to introduce the concept of an outer Lebesgue measure.
The outer Lebesgue measure can be defined in a manner analogous to
formula (8.2.6). The basic difference is that in defining outer Lebesgue
measure we allow a countable number of intervals in the covering,
rather than only a finite number, as in the case of outer Jordan content.
Although this does not seem to be much of a difference, actually it
turns out to be quite profound, and ultimately leads to a theory of
integration that is much more flexible and useful than the theory of
Riemann-Darboux integration.

8.3.4 Definition. If $A \subset E^n$, the outer Lebesgue measure of A is defined as

$$\overline{\lambda}(A) = \text{g.l.b.}\Big\{ \sum_{I \in \mathscr{U}} |I| : \mathscr{U} \in \Gamma \Big\}, \tag{8.3.4}$$

where $\Gamma$ is the collection of all coverings of A that consist of a countable number of open intervals.

If $A = \emptyset$, then A can be covered by a zero number of open intervals; that is, there is a void covering $\mathscr{U}$ of $\emptyset$. In such a case we consider the sum over $\mathscr{U}$ to be zero, and thus $\overline{\lambda}(\emptyset) = 0$. However, even if void coverings are not allowed, it is still clear that $\overline{\lambda}(\emptyset) = 0$. Note also that we have not taken account of the fact that the series $\sum_{I \in \mathscr{U}} |I|$ may not be convergent. In the study of measure theory, one usually extends R to a set $R \cup \{\infty\}$, where $\infty \notin R$, and extends the order relation on R to $R \cup \{\infty\}$ by taking $x < \infty$, $\forall x \in R$. If a series of nonnegative terms does not converge in the ordinary sense we say it converges to $\infty$. Hence in the new system every series of nonnegative terms is convergent, and we set $\overline{\lambda}(A) = \infty$ if every series on the right side of (8.3.4) converges to $\infty$.

The new element $\infty$ is introduced to be able to make the statements in measure theory in a neater way. Of course, the addition and multiplication functions cannot be extended to the new system so as to maintain all their relevant properties, and care must be exercised on that score. However, addition and multiplication can be extended to some extent and many of the usual properties of the order relation continue to hold.

If $\overline{\lambda}(A) = 0$, we shall usually say that A has zero Lebesgue measure rather than zero outer Lebesgue measure. Of course, as we already have noted, the null set always has zero Lebesgue measure. Also note that every countable number of points in $E^n$ has zero Lebesgue measure. Indeed, since each point can be covered by an open interval of arbitrarily small volume, we see that a countable set can be covered by open intervals for which the sum of the volumes is arbitrarily small. This fact is actually a case of the following proposition.
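The covering argument for a countable set can be checked with exact arithmetic. The following is a minimal sketch, not part of the text: the function name `cover_lengths` and the particular choice of interval lengths $\varepsilon/2^{k+1}$ are assumptions made for illustration. It assigns to the k-th point of an enumeration an interval of that length and confirms that the lengths of the first n intervals sum to less than $\varepsilon$.

```python
from fractions import Fraction

def cover_lengths(eps, n):
    """Lengths of open intervals covering the first n points of a countable set.

    Hypothetical helper: the k-th point (k = 0, 1, ...) receives an interval
    of length eps / 2**(k + 1), so every partial sum stays strictly below eps.
    """
    return [eps / 2 ** (k + 1) for k in range(n)]

eps = Fraction(1, 1000)
lengths = cover_lengths(eps, 50)
total = sum(lengths)  # equals eps * (1 - 2**-50), which is less than eps
```

Since the bound does not depend on n, the same choice of lengths works for the whole countable set, which is the content of the remark above.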

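8.3.5 Proposition. (a) If $\mathscr{A}$ is a countable collection of subsets of $E^n$ and $B = \cup\{A : A \in \mathscr{A}\}$, then

$$\overline{\lambda}(B) \le \sum_{A \in \mathscr{A}} \overline{\lambda}(A). \tag{8.3.5}$$

(b) If $A \subset B$, then $\overline{\lambda}(A) \le \overline{\lambda}(B)$.

Proof. Let $\Phi$ be a function with domain $N_0$ and range $\mathscr{A} \cup \{\emptyset\}$, where if $\mathscr{A}$ is finite with m elements, then $\forall k > m$, $\Phi(k) = \emptyset$. Let us suppose that $\sum_{k} \overline{\lambda}(\Phi(k)) < \infty$, since otherwise (8.3.5) is clearly true. According to the definition of outer measure, $\forall \varepsilon > 0$ and $\forall k \in N_0$, there exists a covering $\mathscr{U}_k$ of $\Phi(k)$ so that

$$\sum_{I \in \mathscr{U}_k} |I| \le \overline{\lambda}(\Phi(k)) + \varepsilon/2^k.$$

Now, $\mathscr{U} = \cup\{\mathscr{U}_k : k \in N_0\}$ is an open covering for B and hence

$$\overline{\lambda}(B) \le \sum_{I \in \mathscr{U}} |I| \le \sum_{k} \Big( \sum_{I \in \mathscr{U}_k} |I| \Big) \le \sum_{k} \overline{\lambda}(\Phi(k)) + \varepsilon.$$

Since this is true $\forall \varepsilon > 0$ we have proved (a). Part (b) is an immediate consequence of the fact that every covering for B is a covering for A.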

The last proposition says, in particular, that the union of a countable


number of sets of zero Lebesgue measure is again a set of zero Lebesgue
measure. The reader should not come to the conclusion that sets of zero
Lebesgue measure consist only of a countable number of points. Indeed,
Cantor's set has zero Lebesgue measure and we have asked the reader
to verify this in Exercise 10 at the end of this section. We now give a
connection between outer Lebesgue measure and outer Jordan content.

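8.3.6 Proposition. If A is a bounded set in $E^n$, then

$$\overline{\lambda}(A) \le \overline{\chi}(A), \tag{8.3.6}$$

and equality holds if A is compact.

Proof. For every $\varepsilon > 0$, there is a covering of A by a finite number of closed intervals $\{I_k : k \in (1, m)\}$ so that

$$\sum_{k=1}^{m} |I_k| < \overline{\chi}(A) + \frac{\varepsilon}{2}.$$

Clearly, $\forall k \in (1, m)$ there exists an open interval $J_k$ so that $I_k \subset J_k$ and $|J_k| < |I_k| + \varepsilon/2m$. Hence

$$\overline{\lambda}(A) \le \sum_{k=1}^{m} |J_k| < \sum_{k=1}^{m} |I_k| + \frac{\varepsilon}{2} < \overline{\chi}(A) + \varepsilon.$$

Since this is true $\forall \varepsilon > 0$ we have (8.3.6).

Suppose now that A is compact. For every $\varepsilon > 0$ there is a covering $\mathscr{U}$ of A by a countable number of open intervals so that

$$\sum_{I \in \mathscr{U}} |I| < \overline{\lambda}(A) + \varepsilon.$$

Since A is compact there exists a finite number $\{I_k : k \in (1, m)\}$ from this set which cover A. From Theorem 8.2.4 we get

$$\overline{\chi}(A) \le \overline{\chi}\Big(\bigcup_{k=1}^{m} I_k\Big) \le \sum_{k=1}^{m} |I_k| < \overline{\lambda}(A) + \varepsilon.$$

Since this is true $\forall \varepsilon > 0$ we get equality in (8.3.6).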

8.3.7 Theorem (Lebesgue). Suppose f is a bounded real-valued function with domain the closed interval I. The function f is Riemann-Darboux integrable $\iff$ the set of discontinuities of f has zero Lebesgue measure.

Proof. Suppose f is Riemann-Darboux integrable. The set of points where f is not continuous is exactly the set $\Omega(f) = \cup\{\Omega(f, 1/n) : n \in N\}$. By Theorem 8.3.2 each set $\Omega(f, 1/n)$ has zero Jordan content, and hence by Proposition 8.3.6 has zero Lebesgue measure. By Proposition 8.3.5 it follows that $\Omega(f)$ has zero Lebesgue measure.

Conversely, suppose $\Omega(f)$ has zero Lebesgue measure. Then $\forall \varepsilon > 0$, since $\Omega(f, \varepsilon) \subset \Omega(f)$ it follows that $\Omega(f, \varepsilon)$ has zero Lebesgue measure. Since $\Omega(f, \varepsilon)$ is compact, it follows from Proposition 8.3.6 that it has zero Jordan content. An application of Theorem 8.3.2 completes the proof.
The theorem we have just proved makes it clear that many of the hypotheses we made in the theorems of Section 5.2 were redundant. The redundancies occurred in having to assume that various combinations of integrable functions were integrable. We first state the generalization to higher dimensions of Theorem 5.1.9 and Theorem 5.2.1 (a), (b), and (c).

8.3.8 Theorem. If f and g are real-valued functions defined on a closed interval I and are Riemann-Darboux integrable, then

(a) $\forall a, b \in R$, $af + bg$ is integrable and

$$\int_I [a f(x) + b g(x)]\,dx = a \int_I f(x)\,dx + b \int_I g(x)\,dx.$$

(b) $f \ge 0$ implies $\int_I f(x)\,dx \ge 0$.

(c) $|f|$ is integrable and $\Big|\int_I f(x)\,dx\Big| \le \int_I |f(x)|\,dx$.

(d) $fg$ is integrable.

Proof. The integrability property of $af + bg$, $|f|$, and $fg$ is an immediate consequence of Theorem 8.3.7. The remainder of the proof is exactly the same as the proof of Theorem 5.2.1 and we shall not repeat it.

The generalization of Theorem 5.2.1(d) is the following.

8.3.9 Theorem. If A and B are bounded Jordan measurable subsets of $E^n$, and f is a bounded real-valued function with domain $A \cup B$ which is continuous except for a set of zero Lebesgue measure, then the Riemann-Darboux integrals of f over $A \cup B$, $A \cap B$, A, and B exist and

$$\int_{A \cup B} f(x)\,dx + \int_{A \cap B} f(x)\,dx = \int_A f(x)\,dx + \int_B f(x)\,dx. \tag{8.3.7}$$

Proof. Since A and B are Jordan measurable, $|\beta A| = |\beta B| = 0$. Further, since $A^0 \cup B^0 \subset (A \cup B)^0$ and $\overline{A \cup B} = \bar{A} \cup \bar{B}$ we have $\beta(A \cup B) = \overline{A \cup B} \setminus (A \cup B)^0 \subset (\bar{A} \cup \bar{B}) \setminus (A^0 \cup B^0)$. But $(\bar{A} \cup \bar{B}) \setminus (A^0 \cup B^0) \subset (\bar{A} \setminus A^0) \cup (\bar{B} \setminus B^0) = \beta A \cup \beta B$. Thus $\beta(A \cup B) \subset \beta A \cup \beta B$, and hence $|\beta(A \cup B)| = 0$. This means that $\beta(A \cup B)$ has zero Lebesgue measure and consequently $\chi_{A \cup B}$ is continuous except for a set of zero Lebesgue measure. Since $\chi_{A \cup B} = \chi_A + \chi_B - \chi_{A \cap B}$, it follows that $\chi_{A \cap B}$ is continuous except on a set of zero Lebesgue measure, which means, of course, that $\overline{\chi}(\beta(A \cap B)) = |\beta(A \cap B)| = 0$.

Embed $A \cup B$ into a closed interval I and we get

$$f_{A \cup B} + f_{A \cap B} = f_A + f_B.$$

Since the boundaries of all the sets in question have zero Lebesgue measure, it follows that all the above functions are continuous except on sets of zero Lebesgue measure. Hence by Theorem 8.3.7 the integrals of all these functions exist. The formula (8.3.7) is then a consequence of Theorem 8.3.8(a).

8.3.10 Corollary. If A and B are bounded Jordan measurable sets, then $A \cup B$ and $A \cap B$ are Jordan measurable and

$$|A \cup B| + |A \cap B| = |A| + |B|.$$

The first mean value theorem, 5.2.6, has a natural analogue in higher
dimensions and we leave it for the reader to formulate. The analogues
of some of the other theorems in Section 5.2 become very much more
complicated in the higher-dimensional situation. For example, Theorem
5.2.4 must be interpreted in a somewhat more complicated way and has
a considerably more complicated proof. We shall carry this out in Section
8.5. Theorem 5.2.2 and its corollary on integration by parts become
Stokes' theorem in higher dimensions and we shall delay this until
Chapter 9.

Exercises

1. The set of rational numbers in [0, 1] has zero Lebesgue measure. Since the characteristic function of this set of rationals is not Riemann-Darboux integrable, why does this not contradict Theorem 8.3.7?

2.

For every bounded set A

C En

(A)

show that

"X(A).

Give an example of a set for which equality does not hold. At any rate,
if A is Jordan measurable, this inequality shows that IA I = X(A).
3.

Give an example of a compact set that is not Jordan measurable.


6 of

[Hint: Let A be the complement of the set suggested in Exercise


Section 8.2. Show that (A)= 0, but X(A) 1/2.]

4. Give an example of a function with domain [0, 1] which is discontinuous at every rational number in [0, 1] and yet is Riemann-Darboux integrable.

5. Suppose f is a real-valued function defined on a bounded set $A \subset E^n$ and f is integrable. Suppose g is another real-valued bounded function defined on A and $f(x) = g(x)$ except on a set of zero Jordan content. Show that g is integrable and

$$\int_A f(x)\,dx = \int_A g(x)\,dx.$$

6. Show that every function of bounded variation with domain [a, b] has a Riemann-Darboux integral.


7. If f and g are defined on an interval I and are Riemann-Darboux integrable, show that the functions

$$f \vee g = \max(f, g), \qquad f \wedge g = \min(f, g)$$

are Riemann-Darboux integrable.

8. Suppose f is defined and integrable on an interval $I \subset E^m$ and g is defined and integrable on an interval $J \subset E^n$. Show that the function h defined by

$$h(x, y) = f(x) g(y)$$

is integrable on $I \times J$.

9. Suppose that f is a real-valued integrable function with domain the interval $I \subset E^n$. Show that the graph of f has (n + 1)-dimensional Jordan content zero. The graph of f is the set of ordered pairs $\{(x, f(x)) : x \in I\}$ considered as a subset of $E^{n+1}$.

10. Show that Cantor's set has zero Jordan content and hence zero Lebesgue measure. (Hint: The result of Exercise 8 of Section 8.2 may be very useful.)

8.4 ITERATED INTEGRATION

To evaluate higher-dimensional integrals, usually the most practical thing to do is to integrate with respect to one variable at a time. Thus the evaluation of higher-dimensional integrals is reduced to the problem of evaluating several one-dimensional integrals. The purpose of this section is to show that this can be done. As is usual, we shall identify $E^m \times E^n$ with $E^{m+n}$. If f is a bounded real-valued function with domain the interval $I \times J \subset E^m \times E^n$, then

$$\overline{\int_J} f(x, y)\,dy \qquad \text{and} \qquad \underline{\int_J} f(x, y)\,dy$$

define real-valued bounded functions on I, and hence these functions have upper and lower Darboux integrals.

8.4.1 Theorem. If f is a real-valued bounded function with domain the interval $I \times J \subset E^{m+n}$, then

$$\underline{\int_{I \times J}} f(z)\,dz \le \underline{\int_I} \Big[ \underline{\int_J} f(x, y)\,dy \Big] dx \le \overline{\int_I} \Big[ \overline{\int_J} f(x, y)\,dy \Big] dx \le \overline{\int_{I \times J}} f(z)\,dz. \tag{8.4.1}$$

Proof. Let us set

$$L(f) = \underline{\int_I} \Big[ \underline{\int_J} f(x, y)\,dy \Big] dx.$$

Then the following properties hold (see Exercises 1 and 2 of Section 8.1):

(a) $f \le g \Rightarrow L(f) \le L(g)$.
(b) $L(f) + L(g) \le L(f + g)$.
(c) $L(af) = aL(f)$, $\forall a \in R$.
(d) If $K \subset I \times J$ is an interval, then $L(\chi_{K^0}) = L(\chi_K) = |K|$.

Let $\Delta$ be any decomposition of $I \times J$ and $\forall K \in \Delta$ set $m(K) = \inf\{f(z) : z \in K\}$. Also set $K^* = K^0$ if $m(K) \ge 0$ and $K^* = K$ if $m(K) < 0$. If we put

$$f_\Delta = \sum_{K \in \Delta} m(K)\,\chi_{K^*},$$

then $f_\Delta \le f$. Hence from property (a) we get

$$L(f_\Delta) \le L(f).$$

From the properties (b), (c), and (d) we get

$$\underline{D}_f(\Delta) = \sum_{K \in \Delta} m(K)\,|K| = \sum_{K \in \Delta} m(K)\,L(\chi_{K^*}) \le L(f_\Delta).$$

Thus

$$\underline{D}_f(\Delta) \le L(f),$$

and taking the supremum over $\Delta$ on the left we have the left-hand inequality of the theorem.

The right-hand inequality of the theorem follows by similar reasoning. The middle inequality is of course a well-known inequality for upper and lower Darboux integrals.

8.4.2 Corollary. Suppose f is a Riemann-Darboux integrable function with domain the interval $I \times J \subset E^{m+n}$. Then $\exists A \subset I$ so that $\overline{\lambda}(I \cap A^c) = 0$ and $\forall x \in A$, the function $f_x$ with domain J and defined by

$$f_x(y) = f(x, y)$$

is Riemann-Darboux integrable, the functions g and h given by

$$g(x) = \underline{\int_J} f(x, y)\,dy, \qquad h(x) = \overline{\int_J} f(x, y)\,dy,$$

are integrable on I, and

$$\int_{I \times J} f(z)\,dz = \int_I \Big[ \underline{\int_J} f(x, y)\,dy \Big] dx = \int_I \Big[ \overline{\int_J} f(x, y)\,dy \Big] dx.$$

Proof. Clearly $g \le h$ and from Theorem 8.4.1 we get

$$\int_{I \times J} f(z)\,dz \le \underline{\int_I} g(x)\,dx \le \overline{\int_I} g(x)\,dx \le \overline{\int_I} h(x)\,dx \le \int_{I \times J} f(z)\,dz. \tag{8.4.2}$$

This shows that the upper and lower integrals of g are equal, and thus g is integrable on I. In a similar way, we show that h is integrable. Further, the above chain of inequalities shows that

$$\int_I [h(x) - g(x)]\,dx = 0.$$

Now, the function $F = h - g$ is nonnegative and integrable and hence is continuous except on a set of zero Lebesgue measure. Further at every point a of continuity of F we must have $F(a) = 0$. For, if we suppose $F(a) > 0$, then there is an m-dimensional interval $M \subset I$, with center at a, so that $\forall x \in M$, $F(x) \ge F(a)/2$. Hence

$$\int_I F(x)\,dx \ge \frac{F(a)}{2}\,|M| > 0,$$

which is a contradiction.

We have proved that

$$g(x) = \underline{\int_J} f(x, y)\,dy = \overline{\int_J} f(x, y)\,dy = h(x)$$

except on a set of zero Lebesgue measure. This taken in conjunction with (8.4.2) concludes the proof of the corollary.

We should point out that it is possible that iterated integrals may exist without the integral existing. For example, in $E^2$ let $I = [0, 1] \times [0, 1]$ and let A be the subset of I consisting of all rationals of the form $(p/2^n, q/2^n)$, where p and q are odd integers. It is, we think, clear that A is dense in I. Also, every line parallel to the x axis and every line parallel to the y axis contains only a finite number (possibly zero) of points in A. Let us put $f = 1 - \chi_A$; then f has no points of continuity in I and consequently it is not Riemann-Darboux integrable over I. On the other hand, for every fixed y the function defined on [0, 1] by f(x, y) is integrable, and a similar result is valid for every fixed x. Further, we have

$$\int_0^1 f(x, y)\,dy = 1 \qquad \text{and} \qquad \int_0^1 f(x, y)\,dx = 1,$$

and thus

$$\int_0^1 \Big[ \int_0^1 f(x, y)\,dy \Big] dx = \int_0^1 \Big[ \int_0^1 f(x, y)\,dx \Big] dy = 1.$$

Let us now work out a simple example that makes use of iterated integrals. This is the type of exercise that is usually done in the elementary calculus. However, the reader may possibly find it instructive to look at it from the slightly more sophisticated point of view that we have developed in this section. The problem is to find the volume enclosed by the ellipsoid whose equation is

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1, \qquad a, b, c \in R_+.$$

In precise terms this means that we wish to find the integral of the constant function, with value 1, whose domain is the set

$$A = \Big\{ (x, y, z) : \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1 \Big\}.$$

Let us embed A into the interval $I \times J \times K$, where $I = [-a, a]$, $J = [-b, b]$, and $K = [-c, c]$. By definition the volume V is

$$V = \int_{I \times J \times K} \chi_A(x, y, z)\,d(x, y, z).$$

Now, for fixed (y, z) so that $(y/b)^2 + (z/c)^2 \le 1$, we have

$$\chi_A(x, y, z) = \begin{cases} 1 & |x| \le a[1 - (y/b)^2 - (z/c)^2]^{1/2}, \\ 0 & \text{otherwise,} \end{cases}$$

and, of course, $\chi_A(x, y, z) = 0$ if $(y/b)^2 + (z/c)^2 > 1$. Let us set

$$\Phi(y, z) = \begin{cases} a[1 - (y/b)^2 - (z/c)^2]^{1/2} & (y/b)^2 + (z/c)^2 \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Then we get

$$\int_I \chi_A(x, y, z)\,dx = \int_{-\Phi(y,z)}^{\Phi(y,z)} dx = 2\Phi(y, z).$$

Thus

$$V = 2\int_{J \times K} \Phi(y, z)\,d(y, z) = 2\int_K \Big[ \int_J \Phi(y, z)\,dy \Big] dz.$$

Let us now set

$$\Psi(z) = \begin{cases} b[1 - (z/c)^2]^{1/2} & (z/c)^2 \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Then, for fixed z,

$$\int_J \Phi(y, z)\,dy = \int_{-\Psi(z)}^{\Psi(z)} \Phi(y, z)\,dy.$$

Hence

$$V = 2\int_K \Big[ \int_{-\Psi(z)}^{\Psi(z)} \Phi(y, z)\,dy \Big] dz = \frac{4}{3}\pi abc.$$
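The reduction of the volume to two one-dimensional integrals can be tried numerically. The following is a rough sketch under stated assumptions: the helper `midpoint` (a plain midpoint rule), the function name `ellipsoid_volume`, and the subdivision count `n` are illustrative choices that do not come from the text. The inner integral runs over $[-\Psi(z), \Psi(z)]$ and the outer one over $[-c, c]$, exactly as in the last display.

```python
import math

def midpoint(func, lo, hi, n=200):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def ellipsoid_volume(a, b, c, n=200):
    """Approximate V = 2 * int_K [ int_{-Psi(z)}^{Psi(z)} Phi(y, z) dy ] dz."""
    def phi(y, z):
        s = 1.0 - (y / b) ** 2 - (z / c) ** 2
        return a * math.sqrt(s) if s > 0.0 else 0.0

    def psi(z):
        s = 1.0 - (z / c) ** 2
        return b * math.sqrt(s) if s > 0.0 else 0.0

    def inner(z):
        w = psi(z)
        return midpoint(lambda y: phi(y, z), -w, w, n)

    return 2.0 * midpoint(inner, -c, c, n)

approx = ellipsoid_volume(1.0, 2.0, 3.0)
exact = 4.0 / 3.0 * math.pi * 1.0 * 2.0 * 3.0
```

The midpoint rule is used only because it avoids evaluating the integrand at the endpoints, where the square root has an unbounded derivative; any other quadrature rule would serve as well.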

DIFFERENTIATION OF INTEGRALS

In the previous example we have come across integrals of the form

$$g(y) = \int_{\varphi(y)}^{\Phi(y)} f(x, y)\,dx.$$

It is frequently desirable to have sufficient conditions under which g is differentiable.

8.4.3 Theorem. Suppose f is defined in the rectangle $[a, b] \times [c, d] \subset E^2$, f(x, y) is integrable on [a, b] as a function of x for each $y \in [c, d]$, and $D_2 f$ exists and is continuous on $[a, b] \times [c, d]$. If we set

$$g(y) = \int_a^b f(x, y)\,dx,$$

then $g'$ is defined and continuous on [c, d], and moreover

$$g'(y) = \int_a^b D_2 f(x, y)\,dx. \tag{8.4.3}$$

Proof. By an application of the Mean Value Theorem we get

$$\frac{g(y + h) - g(y)}{h} = \frac{1}{h} \int_a^b [f(x, y + h) - f(x, y)]\,dx = \int_a^b D_2 f(x, y + \theta h)\,dx,$$

where $0 < \theta < 1$. Hence

$$\Big| \frac{g(y + h) - g(y)}{h} - \int_a^b D_2 f(x, y)\,dx \Big| = \Big| \int_a^b [D_2 f(x, y + \theta h) - D_2 f(x, y)]\,dx \Big| \le \int_a^b |D_2 f(x, y + \theta h) - D_2 f(x, y)|\,dx.$$

Now, h is always chosen so that $y + h \in [c, d]$. Hence $(x, y + \theta h)$ and $(x, y)$ are in the compact set $[a, b] \times [c, d]$. Since $D_2 f(x, y)$ is uniformly continuous, $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $y + h \in [c, d]$ and $|h| < \delta$, and thus $|(x, y + \theta h) - (x, y)| = |\theta h| < \delta$, imply

$$|D_2 f(x, y + \theta h) - D_2 f(x, y)| < \varepsilon/(b - a).$$

Consequently, $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $y + h \in [c, d]$ and $|h| < \delta$ imply

$$\Big| \frac{g(y + h) - g(y)}{h} - \int_a^b D_2 f(x, y)\,dx \Big| < \varepsilon.$$

This gives (8.4.3). Since the continuity of $g'$ is immediate, the proof is complete.

8.4.4 Corollary. Under the hypotheses of Theorem 8.4.3 and the additional hypotheses that $a \le \varphi(y) \le \Phi(y) \le b$, where $\varphi$ and $\Phi$ are defined and differentiable on [c, d], and $\forall y \in [c, d]$ the function $f_y$ on [a, b] with values $f_y(x) = f(x, y)$ is continuous, then

$$g(y) = \int_{\varphi(y)}^{\Phi(y)} f(x, y)\,dx$$

is differentiable, and

$$g'(y) = \int_{\varphi(y)}^{\Phi(y)} D_2 f(x, y)\,dx + f(\Phi(y), y)\,\Phi'(y) - f(\varphi(y), y)\,\varphi'(y).$$

Proof. Set

$$G(y, u, v) = \int_u^v f(x, y)\,dx.$$

By the use of the chain rule we get

$$g'(y) = D_1 G(y, \varphi(y), \Phi(y)) + D_2 G(y, \varphi(y), \Phi(y))\,\varphi'(y) + D_3 G(y, \varphi(y), \Phi(y))\,\Phi'(y).$$

Using the last theorem we get

$$D_1 G(y, \varphi(y), \Phi(y)) = \int_{\varphi(y)}^{\Phi(y)} D_2 f(x, y)\,dx,$$

and the other terms are obtained by the use of Theorem 5.2.5.
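The formula of Corollary 8.4.4 can be compared against a difference quotient. The sketch below is only an illustration under assumptions that are not in the text: the integrand $f(x, y) = \sin(xy)$, the limits $\varphi(y) = y/2$ and $\Phi(y) = y$, the midpoint-rule helper `midpoint`, and the step size `h` are all chosen for the example.

```python
import math

def midpoint(func, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def f(x, y):
    return math.sin(x * y)

def d2f(x, y):
    # Partial derivative of f with respect to its second argument.
    return x * math.cos(x * y)

def phi(y):       # lower limit, with derivative 1/2
    return 0.5 * y

def Phi(y):       # upper limit, with derivative 1
    return y

def g(y):
    return midpoint(lambda x: f(x, y), phi(y), Phi(y))

def g_prime_formula(y):
    # Integral of D2 f plus the two boundary terms from the corollary.
    return (midpoint(lambda x: d2f(x, y), phi(y), Phi(y))
            + f(Phi(y), y) * 1.0
            - f(phi(y), y) * 0.5)

def g_prime_numeric(y, h=1e-5):
    # Central difference quotient of g.
    return (g(y + h) - g(y - h)) / (2.0 * h)
```

Evaluating `g_prime_formula(1.0)` and `g_prime_numeric(1.0)` should give values that agree to several decimal places; taking constant limits reduces the check to formula (8.4.3).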

Exercises

1. Suppose f is defined and continuous on the rectangle $[a, b] \times [c, d]$. Suppose further that $a \le \varphi(y) \le \Phi(y) \le b$, where $\varphi$ and $\Phi$ are continuous real-valued functions defined on [c, d]. Show that

$$g(y) = \int_{\varphi(y)}^{\Phi(y)} f(x, y)\,dx$$

is continuous.
2. Locate the relative maxima and minima of

$$g(x) = \int_0^x \cos[(y - x)^2]\,dy$$

in the interval $]0, \infty[$.

3.

Suppose f is a continuous real-valued function defined on an interval $I \subset E^2$ containing 0 as an interior point. If $(x, y) \in I$, define

$$F(x, y) = \int_0^x \Big[ \int_0^y f(t, r)\,dr \Big] dt.$$

Show that at every interior point $(x, y) \in I$,

$$f(x, y) = D_1 D_2 F(x, y) = D_2 D_1 F(x, y).$$

4. Suppose f is real-valued and defined on $[a, b] \times [c, d]$, and f together with $D_2 f$ is continuous on this interval. Use Theorem 8.4.1 or its Corollary 8.4.2 to obtain a proof of Theorem 8.4.3.

5. Suppose f, $D_1 D_2 f$, and $D_2 D_1 f$ are defined and continuous on a rectangle in $E^2$. Use the results on iterated integration to show that

$$D_1 D_2 f = D_2 D_1 f.$$

6. Suppose g is defined on [0, 1] and integrable. Show that

$$\int_0^1 \Big[ \int_x^1 g(t)\,dt \Big] dx = \int_0^1 t\,g(t)\,dt.$$

7. Suppose f is a real-valued, nonnegative integrable function with domain $A \subset E^n$. Show that the volume of the set $\{(x, y) : x \in A \ \&\ y \in [0, f(x)]\}$ is $\int_A f(x)\,dx$. (Hint: See Exercise 9 of Section 8.3.)

8. If $V_n(r)$ is the volume of the ball B(0, r) in $E^n$ show that

$$V_{2k}(r) = r^{2k}\pi^k / k!, \qquad V_{2k-1}(r) = r^{2k-1}\pi^{k-1} 4^k k! / (2k)!.$$

Do this by making use of induction and showing that

$$V_{n+1}(r) = 2V_n(1) \int_0^r (r^2 - t^2)^{n/2}\,dt.$$

8.5 THE TRANSFORMATION THEOREM FOR INTEGRALS

The purpose of this entire section is to generalize Theorem 5.2.4 to higher dimensions. We shall do this first for linear transformations, which leads immediately to a corresponding theorem for affine transformations. Since under an affine transformation an interval goes over into a generalized parallelepiped, it is comparatively easy to compute how the volume of an interval changes under an affine transformation. Now, functions of class $C^1$ can be closely approximated in small neighborhoods by affine transformations, namely, translations of their differentials. Since an integral may be approximated by weighted sums of the volumes of small intervals, and since the volumes of the maps of these small intervals under a map of class $C^1$ can be very closely approximated by the volumes of maps of these intervals under affine transformations, it seems quite reasonable that a general transformation formula for integrals can be obtained in this way. This is actually the way it works and we shall develop this in the subsequent pages.

We shall begin by considering how the volume of an interval changes under three very simple types of linear transformations of $E^n$ into $E^n$.

We list them as follows:

I. $g(x) = x + (\lambda - 1) x_k e_k$, $\lambda \in R$.
II. $g(x) = (x_{\sigma 1}, \dots, x_{\sigma n})$, $\sigma$ a permutation of (1, n).
III. $g(x) = x + x_2 e_1$.

If we write out these transformations componentwise we see that we get a transformation of type I simply by multiplying the kth component of x by $\lambda$. We get a linear transformation of type II by permuting the components and we get a linear transformation of type III by adding the second component to the first component.

8.5.1 Proposition. If I is a closed interval in $E^n$ and g is one of the linear transformations listed above, then

I. $|g(I)| = |\lambda|\,|I|$.
II, III. $|g(I)| = |I|$.

Proof. Case I is obvious. To prove case II we simply note that if $I = I^1 \times \cdots \times I^n$, then $g(I) = I^{\sigma 1} \times \cdots \times I^{\sigma n}$ and hence the result is immediate. To prove case III we set

$$J = \{(x_1 + x_2, x_2) : x_1 \in I^1 \ \&\ x_2 \in I^2\}, \qquad K = I^3 \times \cdots \times I^n.$$

It is clear that $g(I) = J \times K$. The picture of how $I^1 \times I^2$ transforms onto J is given in Fig. 8.5.1. If $I^1 = [a_1, b_1]$ and $I^2 = [a_2, b_2]$, let $L = [a_1 + a_2, b_1 + b_2]$ and $M = L \times I^2 \times K$. Clearly, M is a closed interval for which $g(I) \subset M$.

FIGURE 8.5.1

Now, in case III it is clear that g is a nonsingular linear transformation. It follows from the Inverse Function Theorem that g is an open map with an open range. Since a linear transformation is always a locally Lipschitz map it follows from Theorem 8.2.8 that g(I) is Jordan measurable.

Let us apply Corollary 8.4.2 on iterated integration. Let us first note that if we designate the elements of $E^2 \times E^{n-2}$ by (x, y), then

$$\chi_{g(I)}(x, y) = \chi_J(x)\,\chi_K(y).$$

Now, since the transformation that takes x into $x + x_2 e_1$ is nonsingular, it follows as in the argument of the previous paragraph that $\chi_J$ is integrable. It is also clear that $\chi_K$ is integrable. (Hence from Exercise 8 of Section 8.3 we get another proof that $\chi_{g(I)}$ is integrable.) Thus we have

$$|g(I)| = \int_M \chi_{g(I)}(x, y)\,dx\,dy = \int_K \Big[ \int_{L \times I^2} \chi_{g(I)}(x, y)\,dx \Big] dy = \int_{L \times I^2} \chi_J(x)\,dx \int_K \chi_K(y)\,dy.$$

Now, clearly

$$\int_K \chi_K(y)\,dy = |K| = \prod_{k=3}^{n} |I^k|.$$

Further, using the theorem on iterated integrals again, we get

$$\int_{L \times I^2} \chi_J(x)\,dx = \int_{I^2} \Big[ \int_L \chi_J(x)\,dx_1 \Big] dx_2.$$

For fixed $x_2 \in I^2$ we have

$$\chi_J(x_1, x_2) = \begin{cases} 1 & a_1 + x_2 \le x_1 \le b_1 + x_2, \\ 0 & \text{otherwise.} \end{cases}$$

Thus we get

$$\int_{L \times I^2} \chi_J(x)\,dx = |I^1|\,|I^2|,$$

and thus $|g(I)| = |I|$.

8.5.2 Corollary. If A is a bounded Jordan measurable set in $E^n$ and g is any one of the three linear transformations of type I, II, or III, then

I. $|g(A)| = |\lambda|\,|A|$.
II, III. $|g(A)| = |A|$.

Proof. In case I, if $\lambda = 0$, then the corollary is clearly true, since g(A) is a bounded set in an (n - 1)-dimensional subspace of $E^n$ and thus has n-dimensional Jordan content zero.

If $\lambda \ne 0$, then g is nonsingular and by the same reasoning as in the proof of the last proposition it follows that g(A) is Jordan measurable. Thus $\forall \varepsilon > 0$ we may cover A by a finite set of intervals $\{I_j : j \in (1, k)\}$ so that

$$\sum_{j=1}^{k} |I_j| < |A| + \varepsilon/|\lambda|.$$

The set $\{g(I_j) : j \in (1, k)\}$ covers g(A) and hence

$$|g(A)| \le \sum_{j=1}^{k} |g(I_j)| = |\lambda| \sum_{j=1}^{k} |I_j| \le |\lambda|\,|A| + \varepsilon.$$

On the other hand, $g^{-1}$ is a linear transformation of the same form as g with $\lambda$ replaced by $1/\lambda$. Thus, using the same argument as before, we get

$$|A| = |g^{-1} \circ g(A)| \le \frac{1}{|\lambda|}\,|g(A)|.$$

This gives the equality in case I.

Cases II and III are proved in essentially the same way, and we shall leave this as an exercise for the reader.
Suppose now that h is any linear transformation from $E^n$ into itself which has the matrix representation $[h_{jk}]$ with respect to the ordered basis $(e_1, \dots, e_n)$. Let $h_j$ be the jth row of $[h_{jk}]$; that is, $h_j = (h_{j1}, h_{j2}, \dots, h_{jn})$. Let A be a fixed bounded Jordan measurable set in $E^n$. Set

$$V_A(h_1, \dots, h_n) = |h(A)|. \tag{8.5.1}$$

This clearly defines a function $V_A$ with domain the n-fold Cartesian product of $E^n$, and range in R. In Exercise 8 at the end of the chapter we have asked the reader to show that h(A) is Jordan measurable.

8.5.3 Proposition. The function $V_A$ defined by (8.5.1) satisfies the following properties:

(a) $V_A(h_1, \dots, \lambda h_k, \dots, h_n) = |\lambda|\,V_A(h_1, \dots, h_n)$.
(b) $V_A(h_1, \dots, h_n) = V_A(h_{\sigma 1}, \dots, h_{\sigma n})$, for every permutation $\sigma$ of (1, n).
(c) $V_A(h_1, \dots, h_n) = V_A(h_1, \dots, h_p + \lambda h_q, \dots, h_q, \dots, h_n)$, $\forall p, q \in (1, n)$ so that $p \ne q$, and $\forall \lambda \in R$.
(d) $V_A(e_1, \dots, e_n) = |A|$.

Proof. To prove (a) we suppose that h is a linear transformation from $E^n$ into itself so that $h_1, \dots, h_n$ are the rows of the matrix representation of h with respect to the ordered basis $(e_1, \dots, e_n)$. Let g be a linear transformation of type I. Then $\forall j \ne k$, $g(e_j) = e_j$, and $g(e_k) = \lambda e_k$. Thus, since

$$h(e_j) = \sum_{i \ne k} h_{ij} e_i + h_{kj} e_k,$$

we get

$$g \circ h(e_j) = \sum_{i \ne k} h_{ij} e_i + \lambda h_{kj} e_k.$$

Thus the rows of the matrix representation of $g \circ h$ with respect to the ordered basis $(e_1, \dots, e_n)$ are $h_1, \dots, \lambda h_k, \dots, h_n$. Hence using Corollary 8.5.2 we get

$$V_A(h_1, \dots, \lambda h_k, \dots, h_n) = |g \circ h(A)| = |\lambda|\,|h(A)| = |\lambda|\,V_A(h_1, \dots, h_n).$$

To prove (b) we simply note that if g is a linear transformation of type II, then the rows of the matrix representation of $g \circ h$ are $h_{\sigma 1}, \dots, h_{\sigma n}$. This follows from the fact that $\forall j \in (1, n)$, $g(e_j) = e_{\sigma j}$.

To prove (c) we first note that if g is a transformation of type III, then $\forall j \ne 2$, $g(e_j) = e_j$ and $g(e_2) = e_1 + e_2$. Thus

$$g \circ h(e_j) = (h_{1j} + h_{2j}) e_1 + \sum_{i=2}^{n} h_{ij} e_i,$$

and hence the rows of $g \circ h$ are $h_1 + h_2, h_2, \dots, h_n$. Thus

$$V_A(h_1 + h_2, h_2, \dots, h_n) = |g \circ h(A)| = |h(A)| = V_A(h_1, \dots, h_n).$$

Hence if we apply a transformation of type II that interchanges $h_1$ and $h_p$, $h_2$ and $h_q$, and leaves the other rows fixed, use the result (b) and the equality we have just obtained for a transformation of type III, and then use the result (b) again, we have arrived at the fact that

$$V_A(h_1, \dots, h_p + h_q, \dots, h_n) = V_A(h_1, \dots, h_n).$$

If we use this fact and (a) we get

$$|\lambda|\,V_A(h_1, \dots, h_n) = V_A(h_1, \dots, \lambda h_q, \dots, h_n) = V_A(h_1, \dots, h_p + \lambda h_q, \dots, \lambda h_q, \dots, h_n) = |\lambda|\,V_A(h_1, \dots, h_p + \lambda h_q, \dots, h_q, \dots, h_n).$$

From this, if $\lambda \ne 0$ we get (c). If $\lambda = 0$, (c) is obvious. Since the equality (d) is obvious, we have completed the proof.

8.5.4 Proposition. There exists only one function defined on the n-fold Cartesian product of $E^n$ that satisfies the conditions (a) through (d) of the preceding proposition.

Proof. Suppose U is a function that satisfies all the conditions (a) through (d) of the previous proposition and let

$$W = V_A - U.$$

Then W satisfies the conditions (a) through (c) and moreover

$$W(e_1, \dots, e_n) = 0.$$

If for some k, $h_k = 0$, then by taking $\lambda = 0$ in (a) it follows that $W(h_1, \dots, h_n) = 0$. Hence if $\{h_k : k \in (1, n)\}$ is linearly dependent, there is a linear combination of the form

$$h_k + \sum_{j \ne k} \lambda_j h_j$$

which is zero. In this case a repeated application of (c) (rigorously by induction!) gives

$$W(h_1, \dots, h_n) = W\Big(h_1, \dots, h_k + \sum_{j \ne k} \lambda_j h_j, \dots, h_n\Big) = 0.$$

If the set $\{h_k : k \in (1, n)\}$ is linearly independent, then $\exists k_1 \in (1, n)$ so that the set $\{h_{k_1}, e_2, \dots, e_n\}$ is linearly independent. Otherwise, the set $\{h_k : k \in (1, n)\}$ is in an (n - 1)-dimensional space. Also, for the same reason, $\exists k_2$ so that $\{h_{k_1}, h_{k_2}, e_3, \dots, e_n\}$ is linearly independent. Proceeding by induction, there exist numbers $k_1, \dots, k_n$ so that all the different sets $\{h_{k_1}, \dots, h_{k_j}, e_{j+1}, \dots, e_n\}$, $j \in (1, n)$, are linearly independent. Now, we may write

$$h_{k_1} = \sum_{j=1}^{n} \lambda_{j k_1} e_j,$$

and by a repeated application of (c) and (a) we get

$$W(h_{k_1}, e_2, \dots, e_n) = 0.$$

Also,

$$h_{k_2} = \lambda_{1 k_2} h_{k_1} + \sum_{j=2}^{n} \lambda_{j k_2} e_j.$$

Using the previous equality and a repeated application of (c) and (a) we get

$$W(h_{k_1}, h_{k_2}, e_3, \dots, e_n) = 0.$$

Proceeding by induction we find that

$$W(h_{k_1}, h_{k_2}, \dots, h_{k_n}) = 0.$$

Now let $\sigma$ be that permutation of (1, n) so that $\sigma(j) = k_j$. Then, applying (b) we get

$$W(h_1, \dots, h_n) = 0.$$

8.5.5 Theorem. If g is a linear transformation of $E^n$ into itself and A is a bounded Jordan measurable set in $E^n$, then

$$|g(A)| = |\det g|\,|A|. \tag{8.5.2}$$

Proof. If g is any linear transformation of $E^n$ into itself, let $g_1, \dots, g_n$ be the row vectors of the matrix representation of g with respect to the ordered basis $(e_1, \dots, e_n)$. The function

$$U(g_1, \dots, g_n) = |\det g|\,|A|$$

satisfies the conditions (a) through (d) of Proposition 8.5.3, and thus by the last proposition $U = V_A$. Now, g(A) is a Jordan-measurable set (see Exercise 8) and thus

$$|g(A)| = V_A(g_1, \dots, g_n) = |\det g|\,|A|.$$
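Formula (8.5.2) is easy to test in $E^2$, where the image of a rectangle under a linear transformation is a parallelogram whose area can be computed from its vertices. The following is a small sketch under assumptions of its own: the helper names `det2`, `apply`, and `polygon_area` (the shoelace formula) and the sample matrices are illustrative and do not appear in the text.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def apply(m, p):
    """Image of the point p under the linear map with matrix m."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)

def polygon_area(pts):
    """Area of a simple polygon by the shoelace formula."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

# The interval [0, 3] x [0, 2], listed by its vertices in order.
rect = [(0.0, 0.0), (3.0, 0.0), (3.0, 2.0), (0.0, 2.0)]

g = [[2.0, 1.0], [-1.0, 4.0]]      # a general nonsingular map, det = 9
shear = [[1.0, 1.0], [0.0, 1.0]]   # type III: adds x2 to x1, det = 1

image_g = [apply(g, p) for p in rect]
image_shear = [apply(shear, p) for p in rect]
```

Comparing `polygon_area(image_g)` with `abs(det2(g)) * polygon_area(rect)` illustrates (8.5.2), and the shear shows that a type III transformation leaves the area unchanged, as in Proposition 8.5.1.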

From the preceding theorem it would be immediately possible to get a formula for transforming integrals under linear or affine transformations. We shall not bother to carry this through but shall instead proceed directly to the general case. We shall first establish two propositions that essentially constitute the heart of the matter.

8.5.6 Proposition. Suppose $g$ is of class $C^1$ with (an open) domain in $E^n$ and range in $E^n$ and whose differential is nonsingular at every point of its domain, or in other words, $g$ has a nonvanishing Jacobian at every point of its domain. For every compact set $K \subset \mathcal{D}(g)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ so that for every interval $I \subset K$ with $d(I) < \delta$ and $\forall x \in I$ we have

$$|g(I)| \le (1 + \varepsilon)\,|J_g(x)|\,|I|. \tag{8.5.3}$$

Proof. Let us first note that from the Inverse Function Theorem it follows that $g$ is an open map with an open range and from Proposition 7.5.1, formula (7.5.2), it follows that $g$ is locally Lipschitz. Thus from Theorem 8.2.8, $g$ takes Jordan measurable sets onto Jordan measurable sets. Hence the left side of (8.5.3) makes sense. We shall now break the remainder of the proof into several parts.


(a) From Proposition 7.5.1 it follows that $\forall \varepsilon' > 0$, $\exists \delta' > 0$, so that $\forall x, a \in K$ with $|x - a| < \delta'$ we have

$$|g(x) - g(a) - dg(a)(x - a)| \le \varepsilon'\,|x - a|. \tag{8.5.4}$$

Let us suppose, at first, that $I$ is a cube in $K$ with center at $a$ and of side length $2l$, where $2l < \delta'/\sqrt{n}$. This of course means that $d(I) < \delta'$, and vice versa. Also let us suppose that $dg(a)$ is the identity transformation. Then from (8.5.4) we get, $\forall k \in (1, n)$ and $\forall x \in I$,

$$|g_k(x) - g_k(a) - (x_k - a_k)| \le \varepsilon'\,|x - a|. \tag{8.5.4'}$$

Thus, since $\forall x \in I$, $|x - a| \le l\sqrt{n}$, and $|x_k - a_k| \le l$, we get

$$|g_k(x) - g_k(a)| \le (1 + \varepsilon'\sqrt{n})\,l. \tag{8.5.5}$$

This means that $g(I)$ is contained in a cube with center at $g(a)$ and side length $2(1 + \varepsilon'\sqrt{n})\,l$. Consequently,

$$|g(I)| \le (1 + \varepsilon'\sqrt{n})^n\,|I|.$$


If for a given $\varepsilon > 0$ we choose $\varepsilon'$ so that $(1 + \varepsilon'\sqrt{n})^n < (1 + \varepsilon)$, we have established (8.5.3) in this case, at least for $x = a$, since $J_g(a) = \det dg(a) = 1$.

(b) Let us now go to the case where it is not necessarily true that $dg(a)$ is the identity, but $I$ is still a cube as in part (a). Let $M$ be a positive number so that $\forall x \in K$ and $\forall u \in E^n$

$$|dg(x)^{-1}(u)| \le M\,|u|.$$

This is a consequence of Corollary 7.5.2. Hence from (8.5.4) and the linearity of $dg(a)^{-1}$ we get, $\forall x, a \in K$ with $|x - a| < \delta'$,

$$|dg(a)^{-1}(g(x)) - dg(a)^{-1}(g(a)) - (x - a)| \le M\,|g(x) - g(a) - dg(a)(x - a)| \le M\varepsilon'\,|x - a|. \tag{8.5.6}$$

Let us set

$$h = dg(a)^{-1} \circ g.$$

From the chain rule it follows that $h$ is of class $C^1$ in $\mathcal{D}(g)$ and further $dh(a)$ is the identity transformation. The inequality (8.5.6) leads to the inequality (8.5.5) for $h$ with the exception that $\varepsilon'$ has been replaced by $M\varepsilon'$. Consequently, for a given $\varepsilon > 0$, choose $\varepsilon'$ so that $(1 + M\varepsilon'\sqrt{n})^n < 1 + \varepsilon/2$. Applying the result of part (a) to $h$ we find that

$$|h(I)| < (1 + \varepsilon/2)\,|I|.$$

From Theorem 8.5.5 and the definition of $h$ we get

$$|h(I)| = |dg(a)^{-1}(g(I))| = |\det dg(a)^{-1}|\,|g(I)|.$$

But since $|\det dg(a)^{-1}| = |J_g(a)|^{-1}$, we get

$$|g(I)| < (1 + \varepsilon/2)\,|J_g(a)|\,|I|. \tag{8.5.7}$$

(c) Let us now finish the proof. From the compactness of $K$, and the continuity and nonvanishing of $J_g(x)$, $\exists m > 0$ so that $\forall x \in K$ we have $|J_g(x)| \ge m$. Further, from the uniform continuity of $J_g \mid K$, $\forall \eta > 0$, $\exists \delta$ with $0 < \delta < \delta'$ so that $\forall x, y \in K$ with $|x - y| < \delta$ we have

$$-\eta < |J_g(y)| - |J_g(x)| < \eta.$$

Hence we find that

$$|J_g(y)| < |J_g(x)| + \eta \le (1 + \eta/m)\,|J_g(x)|. \tag{8.5.8}$$

Take $\eta$ so that

$$(1 + \varepsilon/2)(1 + \eta/m) < (1 + \varepsilon).$$

Now, if $I$ is any $n$-dimensional interval in $K$ with $d(I) < \delta$, from Theorem 8.2.7 and Corollary 8.3.10 it follows that $\forall \zeta > 0$ there is a finite set $\{I_k : k \in (1, m)\}$ of cubes in $I$ so that

$$\sum_{k=1}^{m} |I_k| \le |I| \quad\text{and}\quad \Bigl|g\Bigl(I \setminus \bigcup_{k} I_k\Bigr)\Bigr| < \zeta.$$

Let $a_k$ be the center of $I_k$. Then using (8.5.7) we get

$$|g(I)| < \zeta + \Bigl|g\Bigl(\bigcup_{k} I_k\Bigr)\Bigr| \le \zeta + \sum_{k=1}^{m} |g(I_k)| < \zeta + (1 + \varepsilon/2) \sum_{k=1}^{m} |J_g(a_k)|\,|I_k|.$$

From (8.5.8) we have $\forall x \in I$, $|J_g(a_k)| < (1 + \eta/m)\,|J_g(x)|$. Thus recalling how $\eta$ was taken we get, $\forall \zeta > 0$,

$$|g(I)| < \zeta + (1 + \varepsilon)\,|J_g(x)|\,|I|.$$

Since this is true $\forall \zeta > 0$, we have completed the proof. Note that if $|I| = 0$, then (8.5.3) is automatically satisfied.

8.5.7 Proposition. Suppose $g$ is of class $C^1$ with an open domain in $E^n$ and range in $E^n$. If $A$ is a bounded set with $\bar{A} \subset \mathcal{D}(g)$ and $\forall x \in A$, $J_g(x) = 0$, then $|g(A)| = 0$.
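The content of this proposition can be made plausible numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: the map `g` is a hypothetical example whose two components depend only on $x + y$, so its Jacobian vanishes identically, and `covered_area` merely estimates, from finitely many sample points, the total area of the grid cells of side `h` that meet the image of the unit square. The estimate shrinks as the grid is refined, as one expects for a set of zero content.

```python
import math

def g(x, y):
    # Hypothetical C^1 map with J_g = 0 everywhere:
    # both components are functions of s = x + y only.
    s = x + y
    return (s, s * s)

def covered_area(h, samples=300):
    """Total area of the cells of side h that contain a sampled image point.

    This is a sampling estimate, not a rigorous upper bound for the content.
    """
    cells = set()
    for i in range(samples):
        for j in range(samples):
            u, v = g(i / (samples - 1), j / (samples - 1))
            cells.add((math.floor(u / h), math.floor(v / h)))
    return len(cells) * h * h

coarse = covered_area(0.1)    # estimate with cells of side 0.1
fine = covered_area(0.01)     # estimate with cells of side 0.01
print(coarse, fine)
```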

Proof. Let $B$ be a bounded open set in $\mathcal{D}(g)$ so that $\bar{B} \subset \mathcal{D}(g)$ and $\bar{A} \subset B$. Since $\bar{B}$ is compact, from Proposition 7.5.1 it follows that $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $x, y \in \bar{B}$ and $|x - y| < \delta \Rightarrow$

$$|g(x) - g(y) - dg(x)(x - y)| \le \varepsilon\,|x - y|. \tag{8.5.9}$$

Let $\{I_k : k \in (1, m)\}$ be a covering for $A$ by cubes so that $\bigcup \{I_k : k \in (1, m)\} \subset B$, the center of $I_k$ is in $A$, $d(I_k) < \delta$ and

$$\sum_{k=1}^{m} |I_k| \le 2^n \chi(B).$$

The factor $2^n$ is needed to make sure we can get the center of $I_k$ in $A$. Let $J$ be any one of these cubes and $a$ its center. Then from (8.5.9) we get, for every $x \in J$,

$$|g(x) - g(a) - dg(a)(x - a)| \le \varepsilon l \sqrt{n}, \tag{8.5.9'}$$

where $2l$ is the side length of $J$. Since $J_g(a) = 0$, the rank of $dg(a)$ is $r < n$; that is, $dg(a)$ maps $E^n$ into a subspace of dimension $r$. For every $x \in J$ let us write

$$g(x) - g(a) = h(x) + k(x),$$

where $k(x) \in \mathcal{R}(dg(a))$ and $h(x) \in \mathcal{R}(dg(a))^{\perp}$. Hence we get

$$|g(x) - g(a) - dg(a)(x - a)|^2 = |h(x)|^2 + |k(x) - dg(a)(x - a)|^2.$$

It follows from (8.5.9') that

$$|h(x)| \le \varepsilon l \sqrt{n}, \qquad |k(x) - dg(a)(x - a)| \le \varepsilon l \sqrt{n}. \tag{8.5.10}$$

Since $g \in C^1$ and $\bar{B}$ is compact, $\exists M > 0$ so that $\forall x \in \bar{B}$ and $\forall u \in E^n$ we have

$$|dg(x)(u)| \le M\,|u|.$$

Thus from (8.5.10) we get

$$|k(x)| \le (M + \varepsilon)\, l \sqrt{n}. \tag{8.5.10'}$$

Let $U$ be any orthogonal transformation from $E^n$ onto itself so that $U$ takes $\mathcal{R}(dg(a))$ onto $E^r$. Of course, $U$ takes the orthogonal complement of $\mathcal{R}(dg(a))$ onto $E^{n-r}$. From (8.5.10) and (8.5.10') we have

$$|Uh(x)| \le \varepsilon l \sqrt{n}, \qquad |Uk(x)| \le (M + \varepsilon)\, l \sqrt{n}.$$

Thus $Uh(x)$ is contained in a cube $I_1 \subset E^{n-r}$ of side length $2\varepsilon l \sqrt{n}$ and $Uk(x)$ is contained in a cube $I_2 \subset E^r$ of side length $2(M + \varepsilon) l \sqrt{n}$. Hence $Ug(x) - Ug(a)$ is contained in $I_1 \times I_2$ and $Ug(x)$ is contained in the cube $I_1 \times I_2 + Ug(a)$, whose content is, of course, the same as the content of $I_1 \times I_2$. Thus

$$\chi(Ug(J)) \le |I_1 \times I_2| = \varepsilon^{\,n-r} (M + \varepsilon)^r\, n^{n/2}\, |J|.$$

Set $N = (M + 1)^n n^{n/2}$; then $\forall \varepsilon$ for which $0 < \varepsilon < 1$ we have

$$\chi(Ug(J)) \le \varepsilon N\,|J|.$$

From this we get

$$\chi(Ug(A)) \le \sum_{k=1}^{m} \chi(Ug(I_k)) \le \varepsilon N \sum_{k=1}^{m} |I_k| \le \varepsilon N\, 2^n \chi(B).$$

Since $\varepsilon$ is arbitrary we get $|Ug(A)| = 0$. But since $|\det U^{-1}| = 1$, we get from Theorem 8.5.5 that

$$|g(A)| = |U^{-1} U g(A)| = |\det U^{-1}|\,|U(g(A))| = 0. \qquad \square$$

We are now in a position to prove a transformation theorem for


higher-dimensional integrals.


8.5.8 Theorem. Let $g$ be a function of class $C^1$ with open domain in $E^n$ and range in $E^n$. Let $A$ be a bounded set with $\bar{A} \subset \mathcal{D}(g)$ and suppose that $B = \{x : J_g(x) \ne 0\} \cap A$ is a Jordan measurable set, and $g \mid B^0$ is one to one. Then $g(A)$ is Jordan measurable, and if $f$ is a real-valued bounded function with domain $g(A)$, which is continuous except on a set of zero Lebesgue measure, then

$$\int_{g(A)} f(x)\,dx = \int_{A} f \circ g(x)\,|J_g(x)|\,dx. \tag{8.5.11}$$
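Formula (8.5.11) can be checked numerically in a familiar special case. The sketch below is an illustration only, under assumptions that are not in the original text: $g$ is taken to be the polar coordinate map $g(r, \theta) = (r\cos\theta, r\sin\theta)$ on $A = [0,1] \times [0, 2\pi]$, whose Jacobian is $r$, and $f(x, y) = x^2 + y^2$. Both sides are approximated by midpoint sums; the function names and grid sizes are hypothetical.

```python
import math

def f(x, y):
    return x * x + y * y

def left_side(n=400):
    """Midpoint sum of f over the unit disc g(A), on an n-by-n grid of [-1, 1]^2."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:      # keep only cells whose midpoint lies in the disc
                total += f(x, y)
    return total * h * h

def right_side(n=200):
    """Midpoint sum of f(g(r, theta)) |J_g(r, theta)| over A = [0, 1] x [0, 2*pi]."""
    dr = 1.0 / n
    dth = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            x, y = r * math.cos(th), r * math.sin(th)
            total += f(x, y) * r          # |J_g(r, theta)| = r for polar coordinates
    return total * dr * dth

lhs = left_side()
rhs = right_side()
print(lhs, rhs, math.pi / 2)
```

Here the Jacobian vanishes on the edge $r = 0$ of $A$ and $g$ fails to be one to one on the boundary, which is exactly the situation the hypotheses on $B$ and $g \mid B^0$ are designed to allow.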

Proof. (a) Let us suppose, at first, that $A = B$; that is, $J_g(x) \ne 0$ for every $x \in A$. Since $g \in C^1$, it is locally Lipschitz, and since $J_g(x) \ne 0$, $\forall x \in A^0$, it follows from the Inverse Function Theorem that $g \mid A^0$ is an open map. It follows from Theorem 8.2.8 that $g(A)$ is Jordan measurable. From Theorem 8.3.7 it follows that the left-hand integral in (8.5.11) exists. On the other hand, since the inverse of $g \mid A^0$ takes sets of zero Lebesgue measure into sets of zero Lebesgue measure (the proof is almost exactly the same as the proof of Lemma 8.2.7), it follows that $(f \circ g)\,|J_g|$ is continuous on $A$ except on a set of zero Lebesgue measure. Hence the integral on the right side of (8.5.11) exists.


Let us first prove (8.5.11) in the case where $f \ge 0$. Let $K$ be a compact Jordan measurable set in $A^0$ and $V$ an open set in $A^0$ so that $K \subset V \subset \bar{V} \subset A^0$. Let $\eta$ be the positive distance from $K$ to $V^c$, and $\forall \varepsilon > 0$ let $\delta$ be a number so that $0 < \delta < \eta$, and which is small enough so that the conclusions of Proposition 8.5.6 hold with $\bar{V}$ the compact set of that proposition. Embed $K$ into an interval $I$ and let $\Delta$ be a decomposition of $I$ so that $|\Delta| < \delta$ and

$$\overline{D}_{(f \circ g)|J_g|_K}(\Delta) - \int_K f \circ g(x)\,|J_g(x)|\,dx < \varepsilon.$$

By $(f \circ g)|J_g|_K$ we mean that function whose value is $(f \circ g)|J_g|$ on $K$ and zero elsewhere. Let us set $\Delta^* = \{J : J \in \Delta \ \&\ J \cap K \ne \emptyset\}$. Clearly $\forall J \in \Delta^*$, $J \subset V$, and thus from Proposition 8.5.6 it follows that $\forall x \in J$,

$$f \circ g(x)\,|g(J)| \le (1 + \varepsilon)\, f \circ g(x)\,|J_g(x)|\,|J|.$$

If we put

$$M(J) = \sup\{f \circ g(x) : x \in J \cap K\}$$

and

$$M'(J) = \sup\{f \circ g(x)\,|J_g(x)| : x \in J \cap K\},$$

then we get

$$M(J)\,|g(J)| \le (1 + \varepsilon)\, M'(J)\,|J|.$$

Let us now set

$$f_\Delta = \sum_{J \in \Delta^*} M(J)\, 1_{g(J)}.$$

Clearly, $f_{g(K)} \le f_\Delta$, and, since from Theorem 8.2.8 the compact set $g(K)$ is Jordan measurable, we get

$$\int_{g(K)} f(x)\,dx \le \int_{E^n} f_\Delta(x)\,dx = \sum_{J \in \Delta^*} M(J)\,|g(J)| \le (1 + \varepsilon)\, \overline{D}_{(f \circ g)|J_g|_K}(\Delta) \le (1 + \varepsilon) \Bigl[ \int_K f \circ g(x)\,|J_g(x)|\,dx + \varepsilon \Bigr].$$

Since this is true $\forall \varepsilon > 0$, we have

$$\int_{g(K)} f(x)\,dx \le \int_K f \circ g(x)\,|J_g(x)|\,dx. \tag{8.5.12}$$

Since $A$ is Jordan measurable, there is a Jordan measurable compact set $K \subset A$ so that $|A \setminus K|$ is as small as we please. By Theorem 8.2.7, $|g(A \setminus K)|$ is as small as we please provided $|A \setminus K|$ is small enough. But since $g(A) \setminus g(K) \subset g(A \setminus K)$, it follows that $|g(A) \setminus g(K)|$ can be made as small as we please provided $|A \setminus K|$ is small enough. Thus we see that (8.5.12) holds true when $K$ is replaced by $A$.

Now, let $g^{-1}$ be the inverse of $g \mid A^0$. The function $g^{-1}$ has domain $g(A^0)$, is a one-to-one function of class $C^1$ and has a nonvanishing Jacobian. Thus if $L$ is any Jordan measurable compact set in $g(A^0)$, we can apply the inequality (8.5.12) when $K$ is replaced by $L$, $g(K)$ is replaced by $g^{-1}(L)$ and $f$ is replaced by $f \circ g(x)\,|J_g(x)|$. Thus we get

$$\int_{g^{-1}(L)} f \circ g(x)\,|J_g(x)|\,dx \le \int_L f \circ g \circ g^{-1}(x)\,|J_g(g^{-1}(x))|\,|J_{g^{-1}}(x)|\,dx.$$

Now, of course, $f \circ g \circ g^{-1} = f$ and from the chain rule

$$|J_g(g^{-1}(x))|\,|J_{g^{-1}}(x)| = 1.$$

Now, let $K$ be a compact measurable set in $A^0$ and put $L = g(K)$. Then we get

$$\int_K f \circ g(x)\,|J_g(x)|\,dx \le \int_{g(K)} f(x)\,dx. \tag{8.5.12'}$$

Now argue exactly the same as in the previous paragraph and we see (8.5.12') is satisfied with $K$ replaced by $A$. Thus (8.5.11) is satisfied.

For an $f$ that takes on both positive and negative values, let us set

$$f^+(x) = \tfrac{1}{2}\,[\,|f(x)| + f(x)\,], \qquad f^-(x) = \tfrac{1}{2}\,[\,|f(x)| - f(x)\,].$$

Then $f^+$ and $f^-$ are nonnegative and integrable and $f = f^+ - f^-$. We can apply (8.5.11) to each function $f^+$ and $f^-$ and then use the addition theorem for integrals to get (8.5.11) for $f$.

(b) We shall now complete the proof in the general case. Since $A$ is a bounded set with $\bar{A} \subset \mathcal{D}(g)$ it follows that $A \setminus B$ is bounded and its closure is in $\mathcal{D}(g)$. Since $J_g(x) = 0$ on $A \setminus B$, it follows from Proposition 8.5.7 that $|g(A \setminus B)| = 0$. But since $g(A) \setminus g(B) \subset g(A \setminus B)$, we see that $|g(A) \setminus g(B)| = 0$. Now, since $g \mid B^0$ is certainly an open map, it follows from Theorem 8.2.8 that $g(B)$ is Jordan measurable. Hence it follows from Corollary 8.3.10 that $g(A) = g(B) \cup (g(A) \setminus g(B))$ is Jordan measurable. From the hypothesis on $f$ it follows that the left side of (8.5.11) certainly exists. On the other hand, as we noted in part (a), $(f \circ g)\,|J_g|$ is continuous on $B$ except for a set of Lebesgue measure zero. Further, $\forall x \in A \setminus B$, $f \circ g(x)\,|J_g(x)| = 0$ and thus, since $B$ is measurable, $(f \circ g)\,|J_g|$ is integrable over $A$ regardless of whether or not $A$ is Jordan measurable. Hence the right-hand integral in (8.5.11) always exists.

We can write

$$\int_{g(A)} f(x)\,dx = \int_{g(B)} f(x)\,dx + \int_{g(A) \setminus g(B)} f(x)\,dx = \int_{g(B)} f(x)\,dx,$$

and also, since the integrand vanishes on $A \setminus B$,

$$\int_{A} f \circ g(x)\,|J_g(x)|\,dx = \int_{B} f \circ g(x)\,|J_g(x)|\,dx.$$

If we apply part (a) of the proof we get

$$\int_{g(B)} f(x)\,dx = \int_{B} f \circ g(x)\,|J_g(x)|\,dx.$$

If we combine this with the previous two equalities we get (8.5.11) and the proof is complete.
REMARK: The statement of Theorem 8.5.8 could be made to appear somewhat simpler if the concept of the Lebesgue integral had been used instead of the concept of the Riemann-Darboux integral. For example, the hypothesis that $\bar{A} \subset \mathcal{D}(g)$ and the hypothesis that the set $B$ is measurable could be removed, provided we supposed that $A$ was Lebesgue measurable. Even if we had assumed that $A$ was Jordan measurable in Theorem 8.5.8, we could not have eliminated the condition on $B$. Also the theory of Lebesgue integration would allow us to assume less about the functions that are involved.


We now want to show that the hypotheses which have been made in Theorem 8.5.8 are not unnecessarily refined for easy and carefree application. This will already be seen for the most standard and most often used transformation: the transformation from rectangular to polar or spherical coordinates. We shall do this in three dimensions. Let $g$ be that function with domain and range $E^3$ whose components are given by (see Section 6.5)

$$g_1(r, \theta, \varphi) = r \cos\theta,$$
$$g_2(r, \theta, \varphi) = r \sin\theta \cos\varphi,$$
$$g_3(r, \theta, \varphi) = r \sin\theta \sin\varphi.$$

The function $g$ is of class $C^1$ and an elementary computation shows that

$$J_g(r, \theta, \varphi) = r^2 \sin\theta.$$
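The "elementary computation" can be cross-checked numerically. The following sketch, which is an illustration and not part of the original text, approximates the Jacobian of the map $g$ above by central differences and compares it with $r^2 \sin\theta$; it then approximates the integral of $r^2 \sin\theta$ over $[0,1] \times [0,\pi] \times [0,2\pi]$ by a midpoint sum, which should be close to $4\pi/3$, the volume of the unit ball. The function names, the step `h`, and the grid size `n` are hypothetical choices.

```python
import math

def g(r, theta, phi):
    """Spherical coordinate map with the component ordering used in the text."""
    return (r * math.cos(theta),
            r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi))

def jacobian_det(r, theta, phi, h=1e-6):
    """Central-difference approximation of J_g at (r, theta, phi)."""
    p = (r, theta, phi)
    rows = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        gp, gm = g(*plus), g(*minus)
        # rows[i][k] approximates d g_k / d p_i (the transposed Jacobian matrix;
        # transposition does not change the determinant)
        rows.append([(gp[k] - gm[k]) / (2.0 * h) for k in range(3)])
    a, b, c = rows
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def unit_ball_volume(n=200):
    """Midpoint sum of r^2 sin(theta) over [0,1] x [0,pi]; the phi-integral gives 2*pi."""
    dr = 1.0 / n
    dth = math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += r * r * math.sin(th)
    return total * dr * dth * 2.0 * math.pi

print(jacobian_det(0.7, 1.1, 2.3), 0.7 ** 2 * math.sin(1.1))
print(unit_ball_volume(), 4.0 * math.pi / 3.0)
```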


Let $S$ be the set of points whose coordinates satisfy the following:

$$r \ge 0, \qquad 0 \le \theta \le \pi, \qquad \beta \le \varphi \le \beta + 2\pi, \qquad \beta \text{ arbitrary}.$$

If $A \subset S$, it is not necessarily true that $g \mid A$ is one to one with a nonvanishing Jacobian. However, it is always true that $g \mid A^0$ has these properties and hence we may apply the transformation theorem and get

$$\int_{g(A)} f(x, y, z)\,dx\,dy\,dz = \int_{A} f \circ g(r, \theta, \varphi)\, r^2 \sin\theta\,dr\,d\theta\,d\varphi.$$

For example, if $f(x, y, z) = 1$ and

$$A = \{(r, \theta, \varphi) : r \in [0, 1],\ \theta \in [0, \pi],\ \varphi \in [0, 2\pi]\},$$

then $g(A)$ is the closure of $B(0, 1)$ and we can thus compute the volume of the unit ball in $E^3$. Actually, if we use the full strength of Theorem 8.5.8

we can take S to be the set of points whose coordinates satisfy the

following:

0,

a,

aOa+1T,

{3 arbitrary.

Loosely speaking, the second-order differential $\sin\theta\,d\theta\,d\varphi$ is an element of surface area on the unit sphere in $E^3$. On a sphere of radius $r$ this element of surface area is magnified (or shrunk) to $r^2 \sin\theta\,d\theta\,d\varphi$ and hence $r^2 \sin\theta\,d\theta\,d\varphi\,dr$ is an element of volume in the $(r, \theta, \varphi)$ space.
There is a rather amusing way to remember the transformation formula (8.5.11) of Theorem 8.5.8. Let us suppose that $\forall x \in \mathcal{D}(g)$, $J_g(x) > 0$ and let us write

$$J_g(x) = \frac{\partial(g_1, \ldots, g_n)}{\partial(x_1, \ldots, x_n)}(x).$$

Then the transformation formula becomes

$$\int_{g(A)} f(x)\,dx = \int_{A} f \circ g(x)\, \frac{\partial(g_1, \ldots, g_n)}{\partial(x_1, \ldots, x_n)}(x)\,dx_1 \cdots dx_n. \tag{8.5.11'}$$

This reminds us more nearly of the form of Theorem 5.2.4. Also, if we rather loosely think of $dx_1 \cdots dx_n$ as canceling $\partial(x_1, \ldots, x_n)$ and think of $\partial(g_1, \ldots, g_n)$ as $dg_1 \cdots dg_n$, then the right-hand integral goes over into

$$\int_{g(A)} f(g_1, \ldots, g_n)\,dg_1 \cdots dg_n,$$

and this is the correct domain of integration since as the $x$ variables vary over $A$, the $g$ variables vary over $g(A)$. This last integral is the form of the left side of (8.5.11').

□ Exercises

1. Suppose

$$f(x, y) = 2\pi (x^2 - y^2) \sin \pi (x - y)^2.$$

Compute the integral of $f$ over the interior of the square having vertices at $(1, 0)$, $(0, 1)$, $(-1, 0)$ and $(0, -1)$. (Hint: Make the transformation $u = x + y$, $v = x - y$.)

2. Suppose $f$ is a real-valued $C^1$ function with domain $[a, b]$. Compute the area of the set

$$\{(x, y) : x \in [a, b] \ \&\ y \in [f(x), f(x) + 1]\}.$$

Do this by making the change of variable

$$g(x, y) = (x, y - f(x)).$$

(See Exercise 9 of Section 8.3.)

3. Let $g$ be the function on $E^2 \setminus \{0\}$ defined by

$$g(x, y) = \Bigl( \frac{x}{x^2 + y^2}, \frac{y}{x^2 + y^2} \Bigr).$$

Using this transformation, compute the integral

$$\iint_A \frac{dx\,dy}{(x^2 + y^2)^2},$$

where $A$ is the region common to the four regions $\{(x, y) : (x - 1)^2 + y^2 < 1\}$, $\{(x, y) : x^2 + (y - 1)^2 < 1\}$, $\{(x, y) : (x - 1/2)^2 + y^2 > 1/4\}$, $\{(x, y) : x^2 + (y - 1/2)^2 > 1/4\}$.

4. This exercise generalizes Exercise 3. Suppose $f$ is a real-valued function of class $C^2$ with domain an open connected set in $E^n$. Suppose that the gradient of $f$,

$$\nabla f(x) = \sum_{k=1}^{n} D_k f(x)\, e_k,$$

is a one-to-one function, and the Hessian of $f$,

$$H_f(x) = \det[D_j D_k f(x)] \ne 0, \qquad \forall x \in \mathcal{D}(f).$$

Let $A$ be a bounded Jordan-measurable set with $\bar{A} \subset \mathcal{D}(f)$. Show that

IL Hx) I = l(Vf)-1 (A)I.


5. Show that

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$

Do this as follows: First note that

Change to polar coordinates and use the transformation theorem on


the integral on the right. Do all this carefully, justifying each step.
6. Compute the volume of the unit ball $B(0, 1)$ in $E^n$ by changing to spherical coordinates (see Section 6.5 and Exercise 8 of Section 8.4). [Hint: Use induction to show that the Jacobian of the spherical coordinate transformation in $E^n$ is

$$\rho^{\,n-1} (\sin\theta_1)^{n-2} (\sin\theta_2)^{n-3} \cdots (\sin\theta_{n-2}).]$$

7. Suppose $g$ is a function of class $C^1$ with an open domain in $E^2$ so that $\forall x \in \mathcal{D}(g)$, $J_g(x) \ne 0$. Give an example which shows that the transformation theorem may not necessarily be valid for this type of $g$.

8. Suppose $h$ is a linear transformation with domain $E^n$ and range in $E^n$. If $A \subset E^n$ is Jordan measurable show that $h(A)$ is Jordan measurable. [Hint: If $h$ is singular, then $\dim \mathcal{R}(h) = m < n$. Let $U$ be an orthogonal transformation of $E^n$ onto itself so that $U(\mathcal{R}(h)) = E^m$. Then $|U h(A)| = 0$ so that $U h(A)$ is Jordan measurable. Now apply $U^{-1}$ to $U h(A)$.]
0

CHAPTER 9 | THE INTEGRATION OF DIFFERENTIAL FORMS

I. LINE INTEGRALS

9.1 MOTIVATION AND DEFINITIONS

The concept of a line integral arises quite naturally in physics in connection with the motion of a particle. Suppose that $\gamma(t) = (\gamma_1(t), \gamma_2(t), \gamma_3(t))$ defines a function of the time $t$ and represents the motion under the action of a force field $\omega(x) = (\omega_1(x), \omega_2(x), \omega_3(x))$, which depends only on the position vector $x = (x_1, x_2, x_3)$. Let $\Delta$ be a decomposition of the time interval $[a, b]$. If $J \in \Delta$ and $\tau \in J$, then

$$\frac{d\gamma_k(\tau)}{dt}\,|J|$$

is the approximate displacement of the particle in the $x_k$ direction during the time interval $J$, and

$$\sum_{k=1}^{3} \omega_k \circ \gamma(\tau)\, \frac{d\gamma_k(\tau)}{dt}\,|J|$$

is the approximate work done on the particle in the time interval $J$. If we add up all these quantities, as $J$ varies over $\Delta$, and then allow $|\Delta| \to 0$, we get the Riemann-Darboux integral

$$\int_a^b \sum_{k=1}^{3} \omega_k \circ \gamma(t)\, \frac{d\gamma_k(t)}{dt}\,dt. \tag{9.1.1}$$

By definition, this is the work done on the particle in the given time interval $[a, b]$. The integral in (9.1.1) is called a line integral.

By definition, this is the work done on the particle in the given time
interval

We can describe a line integral in the language of differentials in the following way. A first-order differential form $\omega$ is defined as a function with domain in $E^n$ and range in the set of linear transformations (linear functionals) acting from $E^n$ to $E^1$. If $x \in \mathcal{D}(\omega)$ and $u \in E^n$, then

$$\omega(x)(u) = \sum_{k=1}^{n} u_k\, \omega(x)(e_k).$$

If we set $\omega_k(x) = \omega(x)(e_k)$, and recall that in Section 7.2 we found that $dx_k(x)(u) = u_k$, then we have

$$\omega(x) = \sum_{k=1}^{n} \omega_k(x)\,dx_k(x). \tag{9.1.2}$$

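Suppose $\gamma$ is a function with domain the closed interval $[a, b]$ and range in $\mathcal{D}(\omega)$. If $\forall k \in (1, n)$ and $\forall t \in [a, b]$, $d\gamma_k(t)$ exists [that is, $d\gamma_k(t)/dt$ exists] we define the composition of $\omega$ with $\gamma$ by the equation

$$\omega \circ \gamma(t) = \sum_{k=1}^{n} \omega_k \circ \gamma(t)\; dx_k \circ \gamma(t). \tag{9.1.3}$$

Now, from the chain rule we get

$$d(x_k \circ \gamma)(t) = dx_k(\gamma(t)) \circ d\gamma(t) = d\gamma_k(t).$$

Hence (9.1.3) can be written

$$\omega \circ \gamma(t) = \sum_{k=1}^{n} \omega_k \circ \gamma(t)\,d\gamma_k(t). \tag{9.1.3'}$$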

We define the line integral of $\omega$ over the line $\gamma$ by the equation

$$\int_\gamma \omega = \sum_{k=1}^{n} \int_a^b \omega_k \circ \gamma(t)\,d\gamma_k(t), \tag{9.1.4}$$

provided the integrals on the right exist in some sense. They will certainly exist as Riemann-Stieltjes integrals if each $\gamma_k$ is a continuous function of bounded variation and each $\omega_k$ is continuous. Note that we may have such a situation even though $d\gamma_k(t)$ does not exist for every $t \in [a, b]$. In case $d\gamma_k(t)/dt$ exists $\forall t \in [a, b]$, is continuous, and each $\omega_k$ is continuous, then of course $d\gamma_k(t)$ may be replaced by $(d\gamma_k(t)/dt)\,dt$ in (9.1.4). For $n = 3$, the right side of (9.1.4) is the same as (9.1.1).

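It is quite possible that the line integral of a differential form over a curve $\gamma$ may be independent of the "parameterization" of the curve. In more precise language this means the following. Suppose $t(\tau)$ defines a continuous monotone increasing function from an interval $[c, d]$ onto the interval $[a, b]$. If we set $\alpha(\tau) = \gamma \circ t(\tau)$, then from Theorem 5.4.9 we have

$$\int_c^d \omega_k \circ \alpha(\tau)\,d\alpha_k(\tau) = \int_a^b \omega_k \circ \gamma(t)\,d\gamma_k(t),$$

and thus

$$\int_\alpha \omega = \int_\gamma \omega. \tag{9.1.5}$$

On the other hand, if $t(\tau)$ defines a monotone decreasing function and we set $\beta(\tau) = \gamma \circ t(\tau)$, then we get (by a slight modification of Theorem 5.4.9)

$$\int_\beta \omega = -\int_\gamma \omega. \tag{9.1.6}$$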
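The effect of a change of parameter in (9.1.5) and (9.1.6) can be observed numerically. The sketch below is an illustration under assumptions that are not in the original text: the form is the hypothetical example $\omega = -y\,dx + x\,dy$, the curve $\gamma$ is a quarter of the unit circle on $[0, 1]$, and the two parameter changes are $t(\tau) = \tau^2$ (increasing) and $t(\tau) = 1 - \tau$ (decreasing). The line integral is approximated by a Riemann-Stieltjes sum in the spirit of (9.1.4).

```python
import math

def omega(x, y):
    """Coefficients (omega_1, omega_2) of the example form -y dx + x dy."""
    return (-y, x)

def gamma(t):
    """Quarter of the unit circle, traversed from (1, 0) to (0, 1) for t in [0, 1]."""
    return (math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t))

def line_integral(curve, c, d, n=20000):
    """Riemann-Stieltjes sum: coefficients at the midpoint times increments of the curve."""
    h = (d - c) / n
    total = 0.0
    for i in range(n):
        t0 = c + i * h
        x0, y0 = curve(t0)
        x1, y1 = curve(t0 + h)
        w1, w2 = omega(*curve(t0 + 0.5 * h))
        total += w1 * (x1 - x0) + w2 * (y1 - y0)
    return total

def alpha(tau):
    return gamma(tau * tau)      # increasing change of parameter

def beta(tau):
    return gamma(1.0 - tau)      # decreasing change of parameter

over_gamma = line_integral(gamma, 0.0, 1.0)
over_alpha = line_integral(alpha, 0.0, 1.0)
over_beta = line_integral(beta, 0.0, 1.0)
print(over_gamma, over_alpha, over_beta)
```

For this particular form the integral over $\gamma$ is $\pi/2$, so the three printed values should be close to $\pi/2$, $\pi/2$, and $-\pi/2$.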

598 I THE INTEGRATION OF DIFFERENTIAL FORMS

The functions $\alpha$, $\beta$, and $\gamma$ all have the same range, but the line integral of $\omega$ with respect to these various paths may not be the same. In other words, the line integral of $\omega$ depends on more than just the point set that is the common range of these functions. Intuitively speaking, the line integral also depends on the orientation that these functions give to the range. As $\tau$ goes from $c$ to $d$, $\alpha(\tau)$ goes from $\gamma(a)$ to $\gamma(b)$, while $\beta(\tau)$ goes from $\gamma(b)$ to $\gamma(a)$, thus reversing the direction of travel of the range.
The idea of an oriented curve explains, in some sense, the definition

$$\int_a^b f(t)\,dt = -\int_b^a f(t)\,dt. \tag{9.1.7}$$

We can think of $[a, b]$ as the path given by the function defined by $\gamma(t) = t$, $\forall t \in [a, b]$. On the other hand, define $t(\tau) = a + b - \tau$, $\forall \tau \in [a, b]$. If we write $\beta(\tau) = \gamma \circ t(\tau)$ and $\omega(x) = f(x)\,dx$, then $\omega \circ \gamma(t) = f(t)\,dt$ and $\omega \circ \beta(\tau) = -f(a + b - \tau)\,d\tau$. Hence the line integral of $\omega$ over $\gamma$ is exactly the left side of (9.1.7), while

$$\int_\beta \omega = -\int_a^b f(a + b - \tau)\,d\tau = -\int_\gamma \omega.$$

The right-hand equality follows from (9.1.6) or the transformation theorem, 8.5.8. Thus we may think of the integral on the right side of (9.1.7) as another name for the line integral of $\omega$ over $\beta$.

On the basis of all the previous motivation we are now ready to give
some formal definitions.

9.1.1 Definition. A function $\alpha$ with domain the closed interval $I \subset E^1$ and range in $E^n$ is said to be equivalent to a function $\beta$ with domain the closed interval $J \subset E^1$ $\Leftrightarrow$ there exists a strictly monotone function $t$, with domain $J$ and range $I$ so that $\beta(\tau) = \alpha \circ t(\tau)$. The function $\alpha$ is said to be orientably equivalent to $\beta$ $\Leftrightarrow$ the function $t$ is monotone increasing.

It is not difficult to check that the relations we have defined are indeed equivalence relations. Note that since the function $t$ is strictly monotone and maps a closed interval onto a closed interval, it follows from Theorem 2.3.6 that it must be continuous. The idea of equivalence among functions leads to the following definition.

9.1.2 Definition. A curve in $E^n$ is an equivalence class of functions (as defined in Definition 9.1.1) with ranges in $E^n$. An oriented curve in $E^n$ is an equivalence class of functions with ranges in $E^n$ under the orientable equivalence relation.

Since our interest in curves is mainly in connection with taking line integrals, we shall, in general, be only interested in those curves which display certain smoothness properties. Line integrals can be taken along (oriented) continuous curves of bounded variation, but for our purposes it will usually be enough to consider even smoother curves.

9.1.3 Definition. A function $f$ with domain the interval $I \subset E^1$ is said to be piecewise smooth $\Leftrightarrow$ $f$ is continuous and there is a decomposition $\Delta$ of $I$ so that $\forall J \in \Delta$, $f'$ is defined and continuous at every point of $J^0$ and $f' \mid J^0$ can be extended continuously to $J$. The function $f$ is said to be piecewise regular $\Leftrightarrow$ it is piecewise smooth and $f'$ vanishes only at a finite (possibly zero) number of points.

A curve consists of a whole equivalence class of functions and it may not be true that two equivalent functions always have the same smoothness properties. This may come about, for example, if the monotone function that gives the passage from one function to the other is not differentiable at a denumerable set of points. However, this is a relatively minor matter and we give the following definition.

9.1.4 Definition. A [oriented] curve is said to be continuous, of bounded variation, piecewise smooth, etc. $\Leftrightarrow$ the equivalence class (which is the curve!) contains a function with the indicated property, respectively. A continuous [oriented] curve is said to be closed $\Leftrightarrow$ there is a representative $\gamma$ with domain $[a, b]$ so that $\gamma(a) = \gamma(b)$.

Note that if we would change the equivalence relation given in Definition 9.1.1 so that we consider only continuously differentiable functions, with nowhere vanishing derivatives, then if one function in an equivalence class is piecewise smooth, or piecewise regular, so are all the other functions in the same class. Also, if a function has range in $E^n$, it is said to be of bounded variation $\Leftrightarrow$ each real-valued component of the function is of bounded variation.

9.1.5 Definition. A first-order differential form $\omega$ is a function with domain in $E^n$ and range in the set of linear transformations (linear functionals) acting from $E^n$ to $E^1$. A first-order differential form is said to be continuous, of class $C^1$, etc. $\Leftrightarrow$ $\forall k \in (1, n)$ the function defined by $\omega_k(x) = \omega(x)(e_k)$ has the indicated property, respectively.

NOTE: We have already noted that a first-order differential form can be written in the form (9.1.2). To give a formal definition of a line integral we shall make the usual convention that a curve may be designated by any suitable function in its equivalence class. When we do this in the future, we shall not comment about it and assume that the reader is aware of this convention.


9.1.6 Definition. If $\omega$ is a continuous first-order differential form and $\gamma$ is a continuous oriented curve of bounded variation with range in $\mathcal{D}(\omega)$, then the line integral of $\omega$ over $\gamma$ is defined by

$$\int_\gamma \omega = \sum_{k=1}^{n} \int_a^b \omega_k \circ \gamma(t)\,d\gamma_k(t), \tag{9.1.4}$$

where $\mathcal{D}(\gamma) = [a, b]$.

As we have already noted, this integral depends only on the oriented curve and not on any particular representative from the equivalence class, so that the integral over the oriented curve is always well defined. It would be possible to define the integral of a differential form over unoriented curves by using the variation $|d\gamma_k(t)|$ in (9.1.4) instead of $d\gamma_k(t)$. This would make the integral independent of any particular representative of the curve. However, we shall not pursue this particular line of thought. We would like to emphasize that if $\gamma$ is piecewise smooth, then (9.1.4) can be written

$$\int_\gamma \omega = \sum_{k=1}^{n} \int_a^b \omega_k \circ \gamma(t)\, \frac{d\gamma_k(t)}{dt}\,dt. \tag{9.1.4'}$$

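Let us take a very simple example of the computation of a line integral. Suppose $\omega$ is the differential form given by

$$\omega(x, y) = (x^2 - y^2)\,dx - 2xy\,dy,$$

and $\gamma$ is the oriented curve given by

$$\gamma(t) = \Bigl( \cos\frac{\pi t}{2}, \sin\frac{\pi t}{2} \Bigr), \qquad t \in [0, 1].$$

Note that the range of $\gamma$ is that part of the unit circle which lies in the first quadrant. Using the definition of a line integral we get

$$\int_\gamma \omega = \int_0^1 \Bigl[ \Bigl( \cos^2\frac{\pi t}{2} - \sin^2\frac{\pi t}{2} \Bigr)\,d\gamma_1(t) - 2 \cos\frac{\pi t}{2} \sin\frac{\pi t}{2}\,d\gamma_2(t) \Bigr]$$

$$= -\frac{\pi}{2} \int_0^1 \Bigl[ \Bigl( \cos^2\frac{\pi t}{2} - \sin^2\frac{\pi t}{2} \Bigr) \sin\frac{\pi t}{2} + 2 \cos^2\frac{\pi t}{2} \sin\frac{\pi t}{2} \Bigr]\,dt = -\frac{1}{3}.$$

On the other hand, let us take $\alpha$ as the oriented curve given by

$$\alpha(t) = \begin{cases} (1 - t, 0), & t \in [0, 1], \\ (0, t - 1), & t \in [1, 2]. \end{cases}$$

The range of this curve consists of portions of two straight lines, one proceeding along the $x$ axis from $(1, 0)$ to $(0, 0)$ and the other proceeding along the $y$ axis from $(0, 0)$ to $(0, 1)$. Hence this curve has the same initial and final points as $\gamma$. Now,

$$\frac{d\alpha_1(t)}{dt} = \begin{cases} -1, & t \in [0, 1[, \\ 0, & t \in\, ]1, 2], \end{cases} \qquad \frac{d\alpha_2(t)}{dt} = \begin{cases} 0, & t \in [0, 1[, \\ 1, & t \in\, ]1, 2]. \end{cases}$$

Hence we get

$$\int_\alpha \omega = -\int_0^1 (1 - t)^2\,dt = -\frac{1}{3}.$$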
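The two computations above can be checked numerically. The following sketch is an illustration, not part of the original text: it approximates each line integral by a Riemann-Stieltjes sum modeled on (9.1.4), evaluating the coefficients of $\omega = (x^2 - y^2)\,dx - 2xy\,dy$ at the midpoint of each parameter subinterval. The function names and the number of subintervals `n` are hypothetical choices.

```python
import math

def omega(x, y):
    """Coefficients (omega_1, omega_2) of (x^2 - y^2) dx - 2xy dy."""
    return (x * x - y * y, -2.0 * x * y)

def gamma(t):
    """Quarter circle from (1, 0) to (0, 1), t in [0, 1]."""
    return (math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t))

def alpha(t):
    """Two straight segments: (1, 0) -> (0, 0) -> (0, 1), t in [0, 2]."""
    if t <= 1.0:
        return (1.0 - t, 0.0)
    return (0.0, t - 1.0)

def line_integral(curve, a, b, n=20000):
    """Sum of omega_k(curve(midpoint)) times the increment of the k-th component."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t0 = a + i * h
        x0, y0 = curve(t0)
        x1, y1 = curve(t0 + h)
        w1, w2 = omega(*curve(t0 + 0.5 * h))
        total += w1 * (x1 - x0) + w2 * (y1 - y0)
    return total

over_gamma = line_integral(gamma, 0.0, 1.0)
over_alpha = line_integral(alpha, 0.0, 2.0)
print(over_gamma, over_alpha)
```

Both sums should be close to $-1/3$, in agreement with the values obtained by hand.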

We see from the two computations that we have just made that

$$\int_\gamma \omega = \int_\alpha \omega.$$

This fact is not an accident and indeed we would get the same value of the integral of $\omega$ over every piecewise smooth oriented curve which proceeds from $(1, 0)$ to $(0, 1)$. We shall see a little later just why this is so.

If $\{\gamma_k : k \in (1, m)\}$ is a set of continuous oriented curves of bounded variation, let us formally form the sum

$$\sum_{k=1}^{m} a_k \gamma_k, \tag{9.1.8}$$

where $\forall k \in (1, m)$, $a_k \in R$. If $\omega$ is a first-order continuous differential form and the range of each $\gamma_k$ is in $\mathcal{D}(\omega)$, then we shall define

$$\int_{\sum a_k \gamma_k} \omega = \sum_{k=1}^{m} a_k \int_{\gamma_k} \omega. \tag{9.1.9}$$

To be slightly more precise about things, the formal sum in (9.1.8) may be considered as another name for a real-valued function whose domain is the collection of continuous oriented curves of bounded variation. This function takes the value $a_k$ at $\gamma_k$, $\forall k \in (1, m)$, and takes the value zero at every other curve. Such functions, defined on curves, are called chains, and the integral of a first-order differential form over a chain is defined by (9.1.9).

Intuitively speaking, if $a_k$ is a positive integer, then $a_k \gamma_k$ may be considered as the curve $\gamma_k$ taken $a_k$ times. If $a_k$ is negative, say $-1$, then we shall write $a_k \gamma_k$ as $-\gamma_k$ and think of this as a curve having the same range as $\gamma_k$ but with the opposite orientation. Let us briefly comment about the meaning of "opposite orientation" of an oriented curve. From an intuitive geometric point of view, any curve should have two different orientations or two different directions of travel. From a more precise point of view, this means that if the orientable equivalence relation is applied to a curve, it should break up the curve into exactly two nonvoid classes that are oriented curves. One oriented curve is then said to have the negative or opposite orientation of the other.
Because of the generality with which we have been discussing curves, it is not always true that every curve breaks into two oriented curves. This is already seen to be true in the extreme case where the range of the function is one point. Clearly every function in such a curve belongs to the same equivalence class under the orientable equivalence relation. However, this is not really a very good example, since such a curve plays essentially the same role for curves as zero does for the real number system. For a more pertinent example, let $\gamma$ be the unoriented curve defined by the function with values

$$\alpha(t) = (\cos 2\pi t, \cos 2\pi t), \qquad t \in [0, 1].$$

In other words, $\gamma$ is the collection of functions equivalent with the function $\alpha$. The range of this curve is the straight-line segment lying between $(-1, -1)$ and $(1, 1)$. Suppose the curve $\gamma$ breaks into more than one class under the orientable equivalence relation. Suppose $\beta$ is a function in $\gamma$ but not orientably equivalent to $\alpha$. But since $\beta$ is equivalent to $\alpha$, there is a continuous monotone decreasing function $t$ with range $[0, 1]$ so that $\beta(\tau) = \alpha \circ t(\tau)$. Let us set $\delta(\sigma) = \alpha(1 - \sigma)$, $\forall \sigma \in [0, 1]$; then $\delta(1 - t(\tau)) = \alpha \circ t(\tau) = \beta(\tau)$. Since $1 - t(\tau)$ is a monotone increasing function, it follows that $\delta$ and $\beta$ are orientably equivalent. But $\delta(\sigma) = \alpha(1 - \sigma) = \alpha(\sigma)$ and thus $\delta$ and $\alpha$ are orientably equivalent. But this means that $\alpha$ and $\beta$ are orientably equivalent, which is a contradiction.

To avoid these types of pathologies, as far as the orientation question is concerned, it is necessary to take a more restrictive class of curves and somewhat more restrictive equivalence relations. We shall not develop this point of view since it will not be important for the problems we shall discuss. However, we have asked the reader to investigate the situation in Exercise 5 below.
0

□ Exercises

1. Compute the value of the line integral of the first-order differential form with domain $E^2$,

$$\omega(x, y) = x\,dx - y\,dy,$$

over the following curves:
(a) $\gamma(t) = (\cos \pi t, \sin \pi t)$, $t \in [0, 1]$.
(b) $\gamma(t) = (1 - t, 0)$, $t \in [0, 2]$.
(c) $\gamma(t) = (1 - t, 1 - |1 - t|)$, $t \in [0, 2]$.

2. Compute the value of the line integral of the first-order differential form with domain $E^3$,

$$\omega(x, y, z) = yz\,dx + xz\,dy + xy\,dz,$$

over the following curves:
(a) $\gamma(t) = (\cos 2\pi t, \sin 2\pi t, 2t)$, $t \in [0, 3]$.
(b) $\gamma(t) = (1, 0, t)$, $t \in [0, 6]$.

3. Compute the value of the line integral of the first-order differential form with domain $E^2 \setminus \{0\}$,

$$\omega(x, y) = -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy,$$

over the following curves:
(a) $\gamma(t) = (\cos \pi t, \sin \pi t)$, $t \in [0, 1]$.
(b) $\gamma(t) = (\cos \pi t, -\sin \pi t)$, $t \in [0, 1]$.

Note that these line integrals are not the same, even though the two curves have the same initial and final points.

4. An ellipse is the set of all vectors $(x, y)$ in $E^2$ so that

Find a representative of a closed oriented curve whose range is this ellipse and traverses it once in the "counterclockwise" direction. If this oriented curve is $\gamma$, compute the line integral

5. A function with an interval domain in $E^1$ and range in $E^n$ is said to be regular $\Leftrightarrow$ it has a continuous nonzero derivative at each point of its domain (see Definition 9.1.3). Let us say that two regular functions $\alpha$ and $\beta$ are equivalent $\Leftrightarrow$ there is a continuously differentiable strictly monotone function $t(\tau)$ (increasing or decreasing) so that $\beta(\tau) = \alpha \circ t(\tau)$, and orientably equivalent $\Leftrightarrow$ $t'(\tau) > 0$. A regular curve is an equivalence class of regular functions, and an oriented regular curve is an equivalence class under the orientable equivalence relation. Show that every regular curve contains exactly two regular oriented curves.

9.2 THE LENGTH OF A CURVE

Let $\gamma$ be a curve in $E^n$. Using our convention that a curve can be named
by one of its representatives, we shall suppose that $\gamma$ is a function with
domain $[a,b]$ and range in $E^n$. Let $\Delta = \{[a_k, b_k]: k \in (1,m)\}$ be a
decomposition of $[a,b]$, and set

$$l_\gamma(\Delta) = \sum_{k=1}^{m} |\gamma(b_k) - \gamma(a_k)|. \tag{9.2.1}$$



This defines a real-valued function $l_\gamma$ on the set of all decompositions
of $[a,b]$. Following the terminology of Section 5.5 we say that $\gamma$ is of
bounded variation $\Longleftrightarrow$ $l_\gamma$ is bounded. We now formally define the length
of a curve as follows.

9.2.1 Definition. If $\gamma$ is a curve of bounded variation in $E^n$, then the
length of the curve is the number

$$l(\gamma) = \operatorname{l.u.b.}\{l_\gamma(\Delta): \Delta \text{ a decomposition of } [a,b]\}. \tag{9.2.2}$$

In line with the terminology of Section 5.5, it would also be reasonable
to say that $l(\gamma)$ is the total variation of $\gamma$.

Suppose now that $t \in [a,b]$, we restrict $\gamma$ to $[a,t]$, and we define
$l(\gamma,t)$ by (9.2.2), where $\Delta$ is now to be taken as a decomposition of $[a,t]$.
The function of $t$ defined by $l(\gamma,t)$ is monotone, nondecreasing, and
hence the Riemann-Stieltjes integral

$$\int_a^b g(t)\,dl(\gamma,t)$$

exists for every real-valued continuous function $g$. In particular, if
$l(\gamma,t)$ defines a continuous function and $f$ is a real-valued continuous
function whose domain contains the range of $\gamma$, then

$$\int_a^b f \circ \gamma(t)\,dl(\gamma,t) \tag{9.2.3}$$

exists. This last integral may be viewed as another type of line integral
where the function $f$ is being integrated along $\gamma$ with respect to the
length along $\gamma$. In particular, if $f$ is the constant function 1, then (9.2.3)
is $l(\gamma)$.

If $\gamma$ is a smooth curve, $l(\gamma)$ can be computed in a particularly simple
way.

9.2.2 Theorem. If $\gamma$ is of class $C^1$, then

$$l(\gamma) = \int_a^b |\gamma'(t)|\,dt.$$

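Proof. Suppose $\Delta = \{[a_k, b_k]: k \in (1,m)\}$ is any decomposition
of $[a,b]$, and write $I_k = [a_k, b_k]$. By the mean value theorem we get, for
each component $\gamma^j$, a point $t_k^j \in I_k$ so that

$$\gamma^j(b_k) - \gamma^j(a_k) = \frac{d\gamma^j(t_k^j)}{dt}\,|I_k|.$$

Now, since $\gamma'$ is continuous, $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that for every refinement
$\Delta$ of $\Delta_\varepsilon$ and $\forall t_k \in I_k$ we have

$$|\gamma(b_k) - \gamma(a_k)| = \Big\{ \sum_{j=1}^{n} \Big[ \frac{d\gamma^j(t_k)}{dt} \Big]^2 \Big\}^{1/2} |I_k| + \eta_k |I_k|/(b - a), \tag{9.2.4}$$

where $|\eta_k| < \varepsilon$; moreover, the upper sum of $|\gamma'|$ over $\Delta$ satisfies

$$\overline{S}_{|\gamma'|}(\Delta) \le \int_a^b |\gamma'(t)|\,dt + \varepsilon. \tag{9.2.5}$$

If $m_k = \inf\{|\gamma'(t)|: t \in I_k\}$, then using (9.2.4) we get

$$m_k |I_k| \le \Big\{ \sum_{j=1}^{n} \Big[ \frac{d\gamma^j(t_k)}{dt} \Big]^2 \Big\}^{1/2} |I_k| \le |\gamma(b_k) - \gamma(a_k)| + \varepsilon |I_k|/(b - a).$$

Thus, summing over $k$, we get for the lower sum

$$\underline{S}_{|\gamma'|}(\Delta) \le \sum_{k=1}^{m} |\gamma(b_k) - \gamma(a_k)| + \varepsilon.$$

Taking the l.u.b. as $\Delta$ varies over all refinements of $\Delta_\varepsilon$, we get

$$\int_a^b |\gamma'(t)|\,dt \le l(\gamma) + \varepsilon.$$

In a similar way, and using (9.2.5), we get

$$\sum_{k=1}^{m} |\gamma(b_k) - \gamma(a_k)| \le \overline{S}_{|\gamma'|}(\Delta) + \varepsilon \le \int_a^b |\gamma'(t)|\,dt + 2\varepsilon.$$

Thus

$$l(\gamma) \le \int_a^b |\gamma'(t)|\,dt + 2\varepsilon.$$

Since $\varepsilon$ is arbitrary we have completed the proof.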
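
As an informal numerical illustration of Theorem 9.2.2 (a sketch only, not part of the text's argument), the polygonal sums $l_\gamma(\Delta)$ of (9.2.1) can be compared with a midpoint-rule approximation of $\int_a^b |\gamma'(t)|\,dt$. The helix, the uniform decompositions, and the function names below are assumptions chosen for the example.

```python
import math

def polygon_length(gamma, a, b, m):
    """l_gamma(Delta) for the uniform decomposition of [a, b] into m intervals."""
    total = 0.0
    prev = gamma(a)
    for k in range(1, m + 1):
        cur = gamma(a + (b - a) * k / m)
        total += math.dist(cur, prev)  # |gamma(b_k) - gamma(a_k)|
        prev = cur
    return total

def speed_integral(dgamma, a, b, m):
    """Midpoint-rule approximation of the integral of |gamma'(t)| over [a, b]."""
    h = (b - a) / m
    return h * sum(math.hypot(*dgamma(a + (k + 0.5) * h)) for k in range(m))

# Hypothetical example: the helix gamma(t) = (cos t, sin t, t) on [0, 2*pi],
# for which |gamma'(t)| = sqrt(2) at every t.
def helix(t):
    return (math.cos(t), math.sin(t), t)

def helix_prime(t):
    return (-math.sin(t), math.cos(t), 1.0)

for m in (4, 16, 64, 256):
    # Each decomposition refines the previous one, so the sums increase.
    print(m, polygon_length(helix, 0.0, 2 * math.pi, m))
print("integral of |gamma'|:", speed_integral(helix_prime, 0.0, 2 * math.pi, 256))
```

The printed polygonal sums increase toward the integral, in line with the definition of $l(\gamma)$ as a least upper bound.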

It is possible to connect the line integral of Section 9.1 with the line
integral (9.2.3). Let us use the same physical example we used in Section
9.1, namely, we shall compute the work done on a particle acted on
by a force field $\omega(x) = (\omega_1(x), \omega_2(x), \omega_3(x))$. Let the position of the
particle at any time $t$ be given by $\gamma(t) = (\gamma^1(t), \gamma^2(t), \gamma^3(t))$. Since we
have a physical system we may assume that the particle has an acceleration
at every time $t$ and hence the velocity $\gamma'(t)$ is continuous. Let us
make the simplifying assumption that in the time interval $[a,b]$ the
velocity does not vanish. The vector $\gamma'(t)$ is, by definition, a tangent
vector to $\gamma$ at $t$. The vector

$$\theta(t) = \gamma'(t)/|\gamma'(t)| \tag{9.2.6}$$

is a tangent vector of unit length and is often called the orientation of
the oriented curve $\gamma$.
If we project the force vector $\omega \circ \gamma(t)$ onto the one-dimensional space
generated by $\theta(t)$, the projection is of course given by

$$[\omega \circ \gamma(t) \cdot \theta(t)]\,\theta(t).$$

This is the component of the force that lies in the space generated by
$\theta(t)$. Let $\Delta = \{[a_k, b_k]: k \in (1,m)\}$ be a decomposition of $[a,b]$ and
$t_k \in [a_k, b_k]$. Then

$$\sum_{k=1}^{m} [\omega \circ \gamma(t_k) \cdot \theta(t_k)]\,[l(\gamma, b_k) - l(\gamma, a_k)]$$

should be approximately the work done by the force field $\omega$ acting on
the particle in the time interval $[a,b]$. If we pass to the limit we get the
integral

$$\int_a^b \omega \circ \gamma(t) \cdot \theta(t)\,dl(\gamma,t). \tag{9.2.7}$$

To show that this actually coincides with the integral (9.1.1), we note that

$$\omega \circ \gamma(t) \cdot \theta(t) = \frac{1}{|\gamma'(t)|} \sum_{k=1}^{3} \omega_k \circ \gamma(t)\,\frac{d\gamma^k(t)}{dt},$$

and that

$$l(\gamma,t) = \int_a^t |\gamma'(\tau)|\,d\tau.$$

Thus

$$\frac{dl(\gamma,t)}{dt} = |\gamma'(t)|.$$

From Theorem 5.4.7 it follows that the integral (9.2.7) is the same as
(9.1.1).
Let us close this section by showing how a continuous oriented curve
of bounded variation may be "parameterized by its arc length." Suppose
$\gamma$ is the oriented curve. Using our convention about the naming of
curves, we shall suppose that $\gamma$ is a continuous function of bounded
variation with domain $[a,b]$ and range in $E^n$. Let us set

$$s(t) = \int_a^t dl(\gamma,\tau) = l(\gamma,t).$$

Since $\gamma$ is continuous it follows that $s$ is continuous. Further, if $t_1 < t_2$,
then it is certainly clear that $s(t_1) \le s(t_2)$. If we suppose $\gamma$ is one to one,
then $t_1 < t_2 \Longrightarrow s(t_1) < s(t_2)$. For, if $s(t_1) = s(t_2)$, it follows that $\gamma|[t_1, t_2]$
is a constant function, which contradicts the fact that it is one to one.
Thus, if $\gamma$ is one to one, $s$ is a monotone increasing continuous function
with domain $[a,b]$ and range $[0, l(\gamma)]$. If we denote the values of the
inverse function by $t(s)$ and set $\alpha(s) = \gamma \circ t(s)$, then $\alpha$ is another
representation for the oriented curve $\gamma$. This is what we mean by
parameterizing a curve by its arc length. Notice also that if $\gamma$ is of class $C^1$, then

$$s(t) = \int_a^t |\gamma'(\tau)|\,d\tau;$$

and if $\gamma'$ never vanishes, $s$ is monotone increasing and thus we can
parameterize $\gamma$ by its arc length, even if $\gamma$ is not one to one.
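
The construction of $t(s)$ can be sketched numerically. The following is only an illustration under stated assumptions: the parabola $\gamma(t) = (t, t^2)$, the midpoint rule for $s(t)$, and inversion by bisection are choices made for the example, and the helper names are hypothetical.

```python
import math

def arc_length(dgamma, a, t, m=2000):
    """s(t): midpoint-rule approximation of the integral of |gamma'| over [a, t]."""
    h = (t - a) / m
    return h * sum(math.hypot(*dgamma(a + (k + 0.5) * h)) for k in range(m))

def inverse_arc_length(dgamma, a, b, s, tol=1e-10):
    """t(s): invert the increasing function s(t) on [a, b] by bisection."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if arc_length(dgamma, a, mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: gamma(t) = (t, t^2) on [0, 1]; gamma' = (1, 2t) never vanishes.
def gamma(t):
    return (t, t * t)

def dgamma(t):
    return (1.0, 2.0 * t)

total = arc_length(dgamma, 0.0, 1.0)  # approximates l(gamma)

def alpha(s):
    """The arc-length representative alpha(s) = gamma(t(s)), s in [0, l(gamma)]."""
    return gamma(inverse_arc_length(dgamma, 0.0, 1.0, s))

print("l(gamma) ~", total)
print("alpha at half the length:", alpha(total / 2))
```

Bisection is used here only because $s$ is monotone increasing; any root finder for monotone functions would serve equally well.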

D Exercises
1. Justify the remarks made in the last paragraph of this section;
that is, show that $\gamma$ continuous implies $s$ is continuous and $s(t_1) = s(t_2)$
implies $\gamma|[t_1, t_2]$ is a constant function.

2. If $\gamma$ is a curve of class $C^1$ that is regular ($\gamma'$ does not vanish!)
and is "parameterized by its arc length $s$", show that

$$\Big| \frac{d\gamma(s)}{ds} \Big| = 1.$$

3. Prove the following converse of Exercise 2. If $\gamma$ is a function of
class $C^1$ defined on $[a,b]$, and $\forall t \in [a,b]$,

$$\Big| \frac{d\gamma(t)}{dt} \Big| = 1,$$

then $l(\gamma,t) = t - a$.

4. Suppose $\alpha$ and $\beta$ are curves with the same range. Without loss
of generality, we may suppose that $\alpha$ and $\beta$ are functions having the
same interval domain. If $\alpha$ is of bounded variation, is it necessarily
true that $\beta$ is of bounded variation, and $l(\alpha) = l(\beta)$?

9.3 A SPECIAL CASE OF STOKES' THEOREM

Let $I = [0,1] \times [0,1]$ and let $\gamma$ be the piecewise smooth closed curve
given as follows:

$$\gamma(t) = \begin{cases} (t, 0), & \forall t \in [0,1], \\ (1, t-1), & \forall t \in [1,2], \\ (3-t, 1), & \forall t \in [2,3], \\ (0, 4-t), & \forall t \in [3,4]. \end{cases} \tag{9.3.1}$$

Geometrically speaking, $\gamma$ proceeds around the boundary of $I$ in a
"counterclockwise direction."

Let $\omega$ be a first-order differential form of class $C^1$ whose domain is
an open set containing $I$. The integral of $\omega$ over $\gamma$ is, by definition,

$$\int_\gamma \omega = \int \omega_1 \circ \gamma(t)\,d\gamma^1(t) + \omega_2 \circ \gamma(t)\,d\gamma^2(t) = \int_0^4 \sum_{k=1}^{2} \omega_k \circ \gamma(t)\,\frac{d\gamma^k(t)}{dt}\,dt.$$

Note that since $\gamma^k$ is piecewise smooth the last integral exists. If we note
that, except at four points where it is undefined, $d\gamma^k/dt$ takes on the value
$-1$, $0$, or $1$, we get

$$\int_\gamma \omega = \int_0^1 [\omega_2(1,\tau) - \omega_2(0,\tau)]\,d\tau - \int_0^1 [\omega_1(t,1) - \omega_1(t,0)]\,dt. \tag{9.3.2}$$

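On the other hand,

$$\int_I D_2\omega_1(x)\,dx = \int_0^1 \Big[ \int_0^1 D_2\omega_1(t,\tau)\,d\tau \Big] dt = \int_0^1 [\omega_1(t,1) - \omega_1(t,0)]\,dt,$$
$$\int_I D_1\omega_2(x)\,dx = \int_0^1 \Big[ \int_0^1 D_1\omega_2(t,\tau)\,dt \Big] d\tau = \int_0^1 [\omega_2(1,\tau) - \omega_2(0,\tau)]\,d\tau. \tag{9.3.3}$$

If we compare the formulas in (9.3.2) and (9.3.3) we see that

$$\int_I [D_1\omega_2(x) - D_2\omega_1(x)]\,dx = \int_\gamma \omega. \tag{9.3.4}$$

This simple formula is an analogue of the formula in Theorem 5.2.2.
Indeed, note that we used Theorem 5.2.2 in its proof.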
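
Formula (9.3.4) can be checked numerically on the unit square. The sketch below is only an illustration: the form $\omega = xy^2\,dx + (\sin x + x^2 y)\,dy$ is a hypothetical test case, and the midpoint rule stands in for the integrals.

```python
import math

def integrate(f, a, b, m=400):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

# Hypothetical test form: w1(x, y) = x*y^2, w2(x, y) = sin(x) + x^2*y.
def w1(x, y):
    return x * y ** 2

def w2(x, y):
    return math.sin(x) + x ** 2 * y

def D2w1(x, y):
    return 2 * x * y

def D1w2(x, y):
    return math.cos(x) + 2 * x * y

# Line integral over the boundary curve, written as in (9.3.2).
line = (integrate(lambda tau: w2(1.0, tau) - w2(0.0, tau), 0.0, 1.0)
        - integrate(lambda t: w1(t, 1.0) - w1(t, 0.0), 0.0, 1.0))

# Double integral of D1 w2 - D2 w1 over I, as an iterated integral.
area = integrate(
    lambda x: integrate(lambda y: D1w2(x, y) - D2w1(x, y), 0.0, 1.0),
    0.0, 1.0)

print(line, area)
```

For this particular form $D_1\omega_2 - D_2\omega_1 = \cos x$, so both numbers should be close to $\sin 1$.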

The curve $\gamma$ may be identified with a chain in the following way.
For $t \in [0,1]$ let us set

$$I_0^2(t) = (t, 0), \quad I_1^2(t) = (t, 1), \quad I_0^1(t) = (0, t), \quad I_1^1(t) = (1, t). \tag{9.3.1'}$$

Let us then form the chain

$$\partial I = \sum_{j=0}^{1} \sum_{k=1}^{2} (-1)^{j+k} I_j^k.$$

Using the definition we gave in Section 9.1 for integration of a first-order
differential form over a chain, we get

$$\int_{\partial I} \omega = \sum_{j=0}^{1} \sum_{k=1}^{2} (-1)^{j+k} \int_{I_j^k} \omega = \int_\gamma \omega. \tag{9.3.5}$$

In higher dimensions it is sometimes much more convenient to consider
integrals over higher-dimensional chains rather than over certain
surfaces. It is for this reason that we have noted the chain form of the
integration of $\omega$ over $\gamma$.
In applications it is usually the case that we want to apply the formula
(9.3.4) over more general regions than the unit cube $I$. Although this
can be carried out for rather general situations, we shall confine ourselves
at present to showing that the formula holds for regions that are
smooth mappings of $I$.

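Let us suppose that $\varphi$ is of class $C^2$ with $I \subset \mathcal{D}(\varphi)$ and $\mathcal{R}(\varphi) \subset E^2$.
Further, let us suppose that $\forall x \in I^0$, $J_\varphi(x) > 0$ and $\varphi|I^0$ is one to one.
Then we may apply the transformation theorem 8.5.8 to get

$$\int_{\varphi(I)} [D_1\omega_2(x) - D_2\omega_1(x)]\,dx = \int_I [D_1\omega_2(\varphi(x)) - D_2\omega_1(\varphi(x))]\,J_\varphi(x)\,dx. \tag{9.3.6}$$

On the other hand, $\zeta = \varphi \circ \gamma$ defines an oriented piecewise smooth
curve and by definition

$$\int_\zeta \omega = \int_0^4 \sum_{k=1}^{2} \omega_k \circ \zeta(t)\,\frac{d\zeta^k(t)}{dt}\,dt, \tag{9.3.7}$$

where, of course, the derivative of $\zeta^k$ does not exist at four points.
Using the chain rule we get

$$\frac{d\zeta^k(t)}{dt} = \sum_{j=1}^{2} D_j\varphi^k(\gamma(t))\,\frac{d\gamma^j(t)}{dt},$$

so that

$$\sum_{k=1}^{2} \omega_k \circ \zeta(t)\,\frac{d\zeta^k(t)}{dt} = \sum_{j=1}^{2} \Big[ \sum_{k=1}^{2} \omega_k \circ \zeta(t)\,D_j\varphi^k(\gamma(t)) \Big] \frac{d\gamma^j(t)}{dt}.$$

Let us set

$$\omega_j^*(x) = \sum_{k=1}^{2} \omega_k \circ \varphi(x)\,D_j\varphi^k(x), \tag{9.3.8}$$
$$\omega^*(x) = \sum_{j=1}^{2} \omega_j^*(x)\,dx^j. \tag{9.3.8'}$$

From (9.3.7) and the definition of $\omega^*$ we get

$$\int_{\varphi \circ \gamma} \omega = \int_\gamma \omega^*. \tag{9.3.9}$$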

Now, we have already proved [formula (9.3.4)] that

$$\int_I [D_1\omega_2^*(x) - D_2\omega_1^*(x)]\,dx = \int_\gamma \omega^*. \tag{9.3.10}$$

To connect the left side of (9.3.10) with the right side of (9.3.6), we must
compute the quantity $D_1\omega_2^*(x) - D_2\omega_1^*(x)$ in terms of the differential
form $\omega$. Since $\varphi$ is of class $C^2$ we can use (9.3.8) to compute this quantity
and a straightforward calculation gives

$$D_1\omega_2^*(x) - D_2\omega_1^*(x) = [D_1\omega_2(\varphi(x)) - D_2\omega_1(\varphi(x))]\,J_\varphi(x). \tag{9.3.11}$$

Thus if we use (9.3.11) in (9.3.10), and make use of (9.3.6) and (9.3.9)
we get

$$\int_{\varphi(I)} [D_1\omega_2(x) - D_2\omega_1(x)]\,dx = \int_{\varphi \circ \gamma} \omega. \tag{9.3.12}$$
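Parenthetically, let us introduce another notation for $\omega^*$ which is
more or less standard and will arise again in higher dimensions. From
(9.3.8) and (9.3.8') we get

$$\omega^*(x) = \sum_{k=1}^{2} \omega_k \circ \varphi(x) \Big[ \sum_{j=1}^{2} D_j\varphi^k(x)\,dx^j \Big].$$

Since $d\varphi^k(x) = \sum_{j=1}^{2} D_j\varphi^k(x)\,dx^j$ and $dx^k \circ \varphi(x) = d\varphi^k(x)$, we have

$$\omega^*(x) = \sum_{j=1}^{2} \omega_j^*(x)\,dx^j = \sum_{k=1}^{2} \omega_k \circ \varphi(x)\,dx^k \circ \varphi(x).$$

Some standard notation for $\omega^*$ is [see (9.1.3)]

$$\omega^*(x) = \varphi^*\omega(x) = \omega \circ \varphi(x).$$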

We have proved the following version of Stokes' theorem, to which
the names "Green" and "Gauss" may also be attached.

9.3.1 Theorem. Suppose $\varphi$ is a function of class $C^2$ with an open domain
in $E^2$ and range also in $E^2$. Further suppose that $I \subset \mathcal{D}(\varphi)$, $\forall x \in I^0$,
$J_\varphi(x) > 0$, and $\varphi|I^0$ is one to one. If $\omega$ is a first-order differential form of
class $C^1$ with an open domain containing $\varphi(I)$, then

$$\int_{\varphi(I)} [D_1\omega_2(x) - D_2\omega_1(x)]\,dx = \int_{\varphi \circ \gamma} \omega, \tag{9.3.12}$$

where $\gamma$ is given by (9.3.1).

We can also write the formula (9.3.12) in terms of a chain. Indeed, set

$$\partial\varphi(I) = \sum_{j=0}^{1} \sum_{k=1}^{2} (-1)^{j+k}\,\varphi \circ I_j^k,$$

where $I_j^k$ is given by (9.3.1'). It is indeed a simple matter to verify that

$$\int_{\partial\varphi(I)} \omega = \sum_{j=0}^{1} \sum_{k=1}^{2} (-1)^{j+k} \int_{\varphi \circ I_j^k} \omega = \int_{\varphi \circ \gamma} \omega.$$

Let us give a few examples that show how Theorem 9.3.1 can be used.
First, it is clear that if $\varphi$ is any affine transformation with a positive
(constant) Jacobian, then the hypotheses of the theorem are satisfied
and (9.3.12) is satisfied. Thus, by composing any function $\varphi$, which
satisfies the hypotheses of Theorem 9.3.1, with a suitable affine
transformation, the formula (9.3.12) is satisfied for a rectangle $J$ in any
position and $\gamma$ its bounding curve proceeding in a "counterclockwise"
direction.
Now let $\varphi$ be the polar coordinate transformation in $E^2$:

$$\varphi(r, \theta) = (r \cos\theta, r \sin\theta).$$

FIGURE 9.3.1

Let $J$ be the rectangle $[r_1, r_2] \times [0, 2\pi]$, where $r_1 \ge 0$. The function
$\varphi$ is of class $C^2$, the Jacobian of $\varphi$ is positive in the interior of $J$ and
$\varphi|J^0$ is one to one. Hence formula (9.3.12) will hold when $I$ is replaced
by $J$ and $\gamma$ is the boundary of $J$ proceeding in a "counterclockwise
direction." The interval $J$ maps under $\varphi$ onto the annulus shown in
Fig. 9.3.1. Note that the two vertical edges of $J$ map onto the circular
boundaries of the annulus, while the horizontal edges map onto the
portion of the positive $x$ axis that lies between the two circular
boundaries. However, note that the two different horizontal boundaries map
in opposite directions so that when we integrate over $\varphi \circ \gamma$ this portion
of the line integral will cancel out. Hence Theorem 9.3.1 takes the form

$$\iint_R [D_1\omega_2 - D_2\omega_1]\,dx\,dy = \int_{C_2} \omega - \int_{C_1} \omega,$$

where $R$ is the annulus $\{(x,y): r_1 \le |(x,y)| \le r_2\}$, $C_2$ is the outer
circle, and $C_1$ is the inner circle, both proceeding in a "counterclockwise
direction."
As a second example, let us show that Stokes' theorem is valid for a
triangle and its interior. Let us set

$$\varphi(x,y) = (x, (1 - x)y).$$

This clearly defines a function of class $C^2$ on $E^2$, and its Jacobian is
given by

$$J_\varphi(x,y) = \begin{vmatrix} 1 & 0 \\ -y & (1 - x) \end{vmatrix} = 1 - x.$$

Hence, if $(x,y)$ is in the interior of the unit cube $I$, then $J_\varphi(x,y) > 0$.
Further, $\varphi|I^0$ is a one-to-one function so that we may apply Theorem
9.3.1. The map of $I$ under $\varphi$ is shown in Fig. 9.3.2.

FIGURE 9.3.2

Suppose now that $u_1$ and $u_2$ are linearly independent vectors in $E^2$
and set

$$\psi(1,0) = u_1, \qquad \psi(0,1) = u_2.$$

Then extend $\psi$ by linearity to all of $E^2$. Suppose that $u_1$ and $u_2$ are
chosen so that

$$J_\psi(x,y) = \det(u_1, u_2) > 0.$$

Then $\psi \circ \varphi$ satisfies all the hypotheses of Theorem 9.3.1 and we have
the map of $I$ as shown in Fig. 9.3.3. Finally, by an affine transformation

FIGURE 9.3.3

we may move the triangle on the right away from the origin so that
Stokes' theorem is valid for any triangle.
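
The triangle example can be checked numerically. In the sketch below (an illustration only), the form $\omega = -y^2\,dx + xy\,dy$ is a hypothetical test case; the area side is computed over $I$ through the change of variables $\varphi(x,y) = (x,(1-x)y)$ with Jacobian $1 - x$, and the line side is computed along $\varphi \circ \gamma$ with $\gamma$ as in (9.3.1).

```python
def integrate(f, a, b, m=200):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

def phi(x, y):
    return (x, (1.0 - x) * y)

def jacobian_phi(x, y):
    return 1.0 - x

def gamma(t):
    """Boundary of the unit square, as in (9.3.1)."""
    if t <= 1.0:
        return (t, 0.0)
    if t <= 2.0:
        return (1.0, t - 1.0)
    if t <= 3.0:
        return (3.0 - t, 1.0)
    return (0.0, 4.0 - t)

def zeta(t):
    return phi(*gamma(t))

# Hypothetical test form: w1 = -y^2, w2 = x*y, so D1 w2 - D2 w1 = 3y.
def w1(x, y):
    return -y * y

def w2(x, y):
    return x * y

def curl(x, y):
    return 3.0 * y

def line_integral(curve, a, b, m=4000):
    """Sum of w(curve(midpoint)) . (increment of curve) over a uniform decomposition."""
    h = (b - a) / m
    total = 0.0
    for k in range(m):
        t = a + (k + 0.5) * h
        x0, y0 = curve(t - 0.5 * h)
        x1, y1 = curve(t + 0.5 * h)
        x, y = curve(t)
        total += w1(x, y) * (x1 - x0) + w2(x, y) * (y1 - y0)
    return total

area_side = integrate(
    lambda x: integrate(lambda y: curl(*phi(x, y)) * jacobian_phi(x, y), 0.0, 1.0),
    0.0, 1.0)
line_side = line_integral(zeta, 0.0, 4.0)
print(area_side, line_side)
```

Note that the edge $t \in [1,2]$ of the square collapses to the single point $(1,0)$ under $\varphi$, so it contributes nothing to the line integral.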
As a third example we shall show how to obtain a somewhat more

exotic region for which Stokes' theorem will hold. Let us set

cp(x, y)

(x,y(x5 sin

;: + 1)),

(0, y)'

0,

x= 0.

It is not difficult to check that $\varphi$ is of class $C^2$, and $\varphi|I^0$ is one to one and
has a positive Jacobian. Hence we may apply the formula (9.3.12). The
map of $I$ under $\varphi$ is shown in Fig. 9.3.4.

FIGURE 9.3.4

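As a final example, let us suppose that $\varphi$ satisfies the conditions of
Theorem 9.3.1 and $\omega$ is a first-order differential form of class $C^1$ of
the form

$$\omega(x,y) = -\frac{\partial u(x,y)}{\partial y}\,dx + \frac{\partial u(x,y)}{\partial x}\,dy.$$

Let us set $R = \varphi(I)$ and $\zeta = \varphi \circ \gamma$ and let us parameterize the oriented
curve $\zeta$ by its arc length $s$. As we have seen in Section 9.2, a tangent
vector to $\zeta$ of unit length is $d\zeta(s)/ds$. The vector

$$n = \Big( \frac{d\zeta^2(s)}{ds}, -\frac{d\zeta^1(s)}{ds} \Big)$$

is called the outward normal to $\zeta$ at $s$. It is clearly orthogonal to the tangent
vector $d\zeta(s)/ds$. If we compute the directional derivative of $u$ with
respect to $n$ we get

$$D_n u(x,y)\,ds = \Big[ \frac{\partial u(x,y)}{\partial x}\,n^1 + \frac{\partial u(x,y)}{\partial y}\,n^2 \Big] ds = \Big[ -\frac{\partial u(x,y)}{\partial y}\,\frac{d\zeta^1(s)}{ds} + \frac{\partial u(x,y)}{\partial x}\,\frac{d\zeta^2(s)}{ds} \Big] ds.$$

Thus

$$D_n u(\zeta(s))\,ds = \omega \circ \zeta(s).$$

If we use the symbol '$\partial u/\partial n$' for $D_n u$, from Stokes' theorem we get

$$\iint_R \Big[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \Big] dx\,dy = \int_0^{l(\zeta)} \frac{\partial u(\zeta(s))}{\partial n}\,ds.$$

The last integral is also very often simply written as

$$\int_\zeta \frac{\partial u}{\partial n}\,ds.$$

The operator

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$$

is called the Laplacean operator, and hence we may rewrite Stokes'
theorem for the particular $\omega$ we are using here as

$$\iint_R \Delta u(x,y)\,dx\,dy = \int_{\partial R} \frac{\partial u}{\partial n}\,ds, \tag{9.3.13}$$

where we are using '$\partial R$' as another name for $\zeta$.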

D Exercises
1. Use the Stokes-Green-Gauss theorem to evaluate the following
line integrals:
(a) f. x2y dx x dy, where proceed "clockwise" around the
boundary of $J = \{(x,y): 3 \le x \le 5,\ 1 \le y \le 3\}$.
(b) $\int_\zeta x\,dy - y\,dx$, where $\zeta$ proceeds "counterclockwise" around
the boundary of the circle whose equation is $(x - 2)^2 + y^2 = 4$.
(c) $\int_\zeta e^x \sin y\,dx + e^x \cos y\,dy$, where $\zeta$ proceeds "clockwise"
around the boundary of any region where we may apply the Stokes-Green-Gauss
theorem.

2. Show that the Stokes-Green-Gauss theorem is valid for an


ellipse and its boundary.
3. Generalize Exercise 2 and show that the Stokes-Green-Gauss
theorem is valid for an "elliptical ring."
4.
Use the Stokes-Green-Gauss theorem to find the area enclosed
by an ellipse by evaluating a line integral.

5. Let $f$ be a real-valued function of class $C^2$ with domain $]-\varepsilon, 1+\varepsilon[$,
$\varepsilon > 0$. Let $D \subset E^2$ be the set $\{(x,y): x \in [0,1]\ \&\ y \in [0, f(x)]\}$.
Show that the Stokes-Green-Gauss theorem is valid for $D$ and its suitably
oriented boundary. Note that this generalizes some of the constructions
given at the end of Section 9.3.
6. Generalize Exercise 5 in the following way. Let $f$ and $g$ be real-valued
functions of class $C^2$ defined on $]-\varepsilon, 1+\varepsilon[$ so that $g \le f$. Let
$D \subset E^2$ be the set $\{(x,y): x \in [0,1]\ \&\ y \in [g(x), f(x)]\}$. Show that
the Stokes-Green-Gauss theorem is valid for $D$ and its suitably oriented
boundary.

7. Let $\omega$ be defined on $E^2 \setminus \{0\}$ by

$$\omega(x,y) = -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy.$$

If $J$ is any interval containing the origin in its interior, show that

$$\iint_J [D_1\omega_2 - D_2\omega_1]\,dx\,dy = 0,$$

but that

$$\int_{\partial J} \omega = 2\pi,$$

where $\partial J$ is the suitably oriented boundary of $J$. Why does this not
contradict Theorem 9.3.1?
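8. Under suitable conditions on $u$, $v$, and $R \subset E^2$, show that

$$\iint_R \Big[ u\,\Delta v + \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} \Big] dx\,dy = \int_{\partial R} u\,\frac{\partial v}{\partial n}\,ds,$$

where $s$ is the arc length along $\partial R$.

9. Under suitable conditions on $u$, $v$, and $R \subset E^2$, show that

$$\iint_R (u\,\Delta v - v\,\Delta u)\,dx\,dy = \int_{\partial R} \Big( u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n} \Big) ds,$$

where $s$ is the arc length along $\partial R$.

10. Under suitable conditions on $u$, and $R \subset E^2$, show that if $\Delta u = 0$,
then

$$\iint_R \Big[ \Big( \frac{\partial u}{\partial x} \Big)^2 + \Big( \frac{\partial u}{\partial y} \Big)^2 \Big] dx\,dy = \int_{\partial R} u\,\frac{\partial u}{\partial n}\,ds,$$

where $s$ is the arc length along $\partial R$.

9.4 CLOSED AND EXACT DIFFERENTIALS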

Suppose $\omega$ is a first-order differential form with an open domain in
$E^n$ and moreover $\forall x \in \mathcal{D}(\omega)$, $\omega(x) = df(x)$, where $f$ is a real-valued
function of class $C^1$. Suppose further that $\gamma$ is an oriented curve in
$\mathcal{D}(\omega)$ of class $C^1$. Using the chain rule we may write

$$\omega \circ \gamma(t) = \sum_{k=1}^{n} D_k f(\gamma(t))\,\frac{d\gamma^k(t)}{dt}\,dt = \frac{d\,f \circ \gamma(t)}{dt}\,dt.$$

Hence, if the domain of $\gamma$ is $[a,b]$, and we set $\alpha = \gamma(a)$, $\beta = \gamma(b)$, then

$$\int_\gamma \omega = \int_\gamma df = \int_a^b \frac{d\,f \circ \gamma(t)}{dt}\,dt = f(\beta) - f(\alpha). \tag{9.4.1}$$

This says that the integral of $\omega$ over any smooth oriented curve in its
domain, which goes from $\alpha$ to $\beta$, is independent of the curve. We shall
soon show that (9.4.1) is also true for piecewise smooth oriented curves
so that the integral of $\omega$ over any piecewise smooth closed curve in its
domain is zero.
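
Formula (9.4.1) can be illustrated numerically. In the sketch below, the function $f(x,y,z) = xy + z^2$ and the two curves from $(0,0,0)$ to $(1,1,1)$ are hypothetical choices; the line integral of $\omega = df$ is approximated by a midpoint sum and compared with $f(\beta) - f(\alpha)$.

```python
def f(p):
    x, y, z = p
    return x * y + z * z

def omega(p):
    """Components of df for the f above: (D1 f, D2 f, D3 f)."""
    x, y, z = p
    return (y, x, 2.0 * z)

def line_integral(w, curve, a, b, m=2000):
    """Sum of w(curve(midpoint)) . (increment of curve) over a uniform decomposition."""
    h = (b - a) / m
    total = 0.0
    for k in range(m):
        p0 = curve(a + k * h)
        p1 = curve(a + (k + 1) * h)
        pm = curve(a + (k + 0.5) * h)
        total += sum(wk * (q1 - q0) for wk, q0, q1 in zip(w(pm), p0, p1))
    return total

def straight(t):
    return (t, t, t)

def twisted(t):
    return (t, t ** 2, t ** 3)

expected = f((1.0, 1.0, 1.0)) - f((0.0, 0.0, 0.0))
along_straight = line_integral(omega, straight, 0.0, 1.0)
along_twisted = line_integral(omega, twisted, 0.0, 1.0)
print(expected, along_straight, along_twisted)
```

Both approximations should agree with $f(\beta) - f(\alpha)$ up to discretization error, regardless of which curve joins the two points.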

9.4.1 Definition. A first-order differential form $\omega$ is called exact $\Longleftrightarrow$ there
exists a real-valued function $f$ defined on $\mathcal{D}(\omega)$ so that $\forall x \in \mathcal{D}(\omega)$, $\omega(x)
= df(x)$.

The next theorem gives necessary and sufficient conditions that a
first-order differential form be exact.

9.4.2 Theorem. If $\omega$ is a first-order continuous differential form defined
on an open set in $E^n$, then $\omega$ is exact $\Longleftrightarrow$ for every piecewise smooth closed oriented
curve $\gamma$ in $\mathcal{D}(\omega)$ we have

$$\int_\gamma \omega = 0.$$

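Proof. Let us first prove the necessity; that is, we assume $\omega = df$. Let
$\gamma$ be a piecewise smooth closed oriented curve in $\mathcal{D}(\omega)$ and let
$\{[a_k, a_{k+1}]: k \in (1,m)\}$ be a decomposition of the domain of $\gamma$ so
that $\gamma'$ exists on $]a_k, a_{k+1}[$. Using Theorems 5.2.1 (d) and 5.2.2 we get

$$\int_\gamma \omega = \int_\gamma df = \sum_{k=1}^{m} \int_{a_k}^{a_{k+1}} \frac{d\,f \circ \gamma(t)}{dt}\,dt = \sum_{k=1}^{m} [f \circ \gamma(a_{k+1}) - f \circ \gamma(a_k)] = f \circ \gamma(a_{m+1}) - f \circ \gamma(a_1).$$

Since $\gamma$ is closed, $\gamma(a_1) = \gamma(a_{m+1})$, and it follows that the integral of
$\omega$ over $\gamma$ is zero.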

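To prove the converse we may assume, without loss of generality,
that $\mathcal{D}(\omega)$ is arcwise connected. Otherwise we can work with each open
component of $\mathcal{D}(\omega)$. Fix a point $x_0 \in \mathcal{D}(\omega)$ and $\forall x \in \mathcal{D}(\omega)$ let $\gamma_x$
be a piecewise smooth oriented curve in $\mathcal{D}(\omega)$ with $x_0$ the initial point
and $x$ the final point of $\gamma_x$. Let us set

$$F(\gamma_x) = \int_{\gamma_x} \omega.$$

This defines a function of $\gamma_x$. We claim that for fixed $x_0$ and $x$ it is
independent of the choice of $\gamma_x$. Indeed, suppose $\alpha_x$ is another piecewise
smooth oriented curve which proceeds from $x_0$ to $x$. If $\gamma_x$ has
domain $[a,b]$ suppose $\alpha_x$ is parameterized so that its domain is $[b,c]$.
Define

$$\beta(t) = \begin{cases} \gamma_x(t), & \forall t \in [a,b], \\ \alpha_x(b + c - t), & \forall t \in [b,c]. \end{cases}$$

Then $\beta$ defines a piecewise smooth closed oriented curve in $\mathcal{D}(\omega)$.
Hence

$$0 = \int_\beta \omega = \int_{\gamma_x} \omega - \int_{\alpha_x} \omega.$$

Thus for fixed $x_0$ we may set

$$f(x) = F(\gamma_x),$$

and this defines a real-valued function on $\mathcal{D}(\omega)$. We shall show that
$\omega = df$. Since $\mathcal{D}(\omega)$ is open $\exists \delta > 0$ so that $B(x, \delta) \subset \mathcal{D}(\omega)$. Now, let
$u \in E^n$ and $h \in R$ so that $h \ne 0$ and $x + hu \in B(x, \delta)$. Let us set

$$\gamma_{x+hu}(t) = \begin{cases} \gamma_x(t), & \forall t \in [a,b], \\ x + h(t - b)u, & \forall t \in [b, b+1]. \end{cases}$$

Hence we get

$$\frac{f(x + hu) - f(x)}{h} = \frac{1}{h} \Big\{ \int_{\gamma_{x+hu}} \omega - \int_{\gamma_x} \omega \Big\} = \frac{1}{h} \int_b^{b+1} \sum_{k=1}^{n} \omega_k \circ \gamma_{x+hu}(t)\,\frac{d\gamma_{x+hu}^k(t)}{dt}\,dt = \int_b^{b+1} \sum_{k=1}^{n} \omega_k(x + h(t - b)u)\,u^k\,dt.$$

As $h \to 0$ we get

$$D_u f(x) = \sum_{k=1}^{n} \omega_k(x)\,u^k = \omega(x)(u).$$

Since $\forall u \in E^n$ the right side is a continuous function of $x$, it follows
that $df(x)$ exists, and hence $\omega(x) = df(x)$.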
In case $\omega$ is of class $C^1$ it is possible to give conditions on the partials
of $\omega_k$ so that $\omega$ is "locally exact." For the moment we shall restrict
ourselves to the case where $\mathcal{D}(\omega)$ is an open set in $E^2$. Later on we shall
consider the case where $\mathcal{D}(\omega) \subset E^n$. Let $B$ be an open ball in $\mathcal{D}(\omega)$
and $J$ an interval in $B$. From Stokes' theorem we get

$$\iint_J [D_1\omega_2 - D_2\omega_1]\,dx\,dy = \int_{\partial J} \omega.$$

Now, if $\forall (x,y) \in B$,

$$D_1\omega_2(x,y) = D_2\omega_1(x,y),$$

then we get that the line integral of $\omega$ along every oriented rectangle
in $B$ is zero.
Now, if $(a,b)$ is the center of $B$ and $(x,y) \in B$, set

$$f_1(x,y) = \int_a^x \omega_1(t,b)\,dt + \int_b^y \omega_2(x,t)\,dt,$$
$$f_2(x,y) = \int_b^y \omega_2(a,t)\,dt + \int_a^x \omega_1(t,y)\,dt.$$

The number $f_1(x,y)$ is the line integral of $\omega$ along the curve consisting
of the horizontal straight line proceeding from $(a,b)$ to $(x,b)$ and then
the vertical straight line from $(x,b)$ to $(x,y)$. The number $f_2(x,y)$ is
the line integral consisting of the vertical line proceeding from $(a,b)$
to $(a,y)$ and then the horizontal straight line proceeding from $(a,y)$
to $(x,y)$. Since the line integral of $\omega$ around every oriented rectangle
in $B$ is zero, it follows that $\forall (x,y) \in B$, $f_1(x,y) = f_2(x,y)$. Now, a simple
calculation shows that

$$D_2 f_1(x,y) = \omega_2(x,y), \qquad D_1 f_2(x,y) = \omega_1(x,y).$$

Since $\omega_1$ and $\omega_2$ are continuous, if we set $f = f_1 = f_2$, then from Theorem
7.2.5 it follows that $df(x,y)$ exists and of course $\omega(x,y) = df(x,y)$. Thus
$\omega|B$ is exact.

The discussion of the last paragraph prompts us to make the following
definition.

9.4.3 Definition. A differential form $\omega$ with domain an open set in $E^n$
is said to be closed $\Longleftrightarrow$ $\omega$ is of class $C^1$ and $\forall j, k \in (1,n)$ and $\forall x \in \mathcal{D}(\omega)$,

$$D_j\omega_k(x) = D_k\omega_j(x).$$

A little later on we shall present a definition of a closed differential
form in a much more compact and more easily remembered notation.
For now, let us remark that we have proved above that every closed
differential form with domain in $E^2$ is "locally exact" in the sense that
the restriction of the closed form to any ball in its domain is exact.
However, it is not necessarily true that a closed form is "globally exact."
For example, the form

$$\omega(x,y) = -\frac{y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy$$

is defined on $E^2 \setminus \{0\}$ and is closed. However, $\omega$ is not an exact form.
Indeed, if $J$ is any interval in $E^2$ containing the origin in its interior,
then (see Exercise 7 of Section 9.3)

$$\int_{\partial J} \omega \ne 0.$$

It follows from Theorem 9.4.2 that $\omega$ is not exact, even though it is
locally exact.
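
The behavior of this form can be probed numerically. The sketch below is an illustration under stated assumptions: $J = [-1,1] \times [-1,1]$ is a hypothetical choice of interval, the closedness condition $D_1\omega_2 = D_2\omega_1$ is tested by central differences at one sample point, and the boundary integral is approximated by a midpoint sum.

```python
import math

def omega(p):
    x, y = p
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def curl(p, step=1e-5):
    """Central-difference approximation of D1 w2 - D2 w1 at p."""
    x, y = p
    d1w2 = (omega((x + step, y))[1] - omega((x - step, y))[1]) / (2 * step)
    d2w1 = (omega((x, y + step))[0] - omega((x, y - step))[0]) / (2 * step)
    return d1w2 - d2w1

def square_boundary(t):
    """Boundary of [-1, 1] x [-1, 1], counterclockwise, t in [0, 4]."""
    if t <= 1.0:
        return (-1.0 + 2.0 * t, -1.0)
    if t <= 2.0:
        return (1.0, -1.0 + 2.0 * (t - 1.0))
    if t <= 3.0:
        return (1.0 - 2.0 * (t - 2.0), 1.0)
    return (-1.0, 1.0 - 2.0 * (t - 3.0))

def line_integral(w, curve, a, b, m=2000):
    """Sum of w(curve(midpoint)) . (increment of curve) over a uniform decomposition."""
    h = (b - a) / m
    total = 0.0
    for k in range(m):
        p0 = curve(a + k * h)
        p1 = curve(a + (k + 1) * h)
        pm = curve(a + (k + 0.5) * h)
        total += sum(wk * (q1 - q0) for wk, q0, q1 in zip(w(pm), p0, p1))
    return total

curl_sample = curl((0.3, -0.7))
boundary_integral = line_integral(omega, square_boundary, 0.0, 4.0)
print(curl_sample, boundary_integral, 2 * math.pi)
```

The sample value of $D_1\omega_2 - D_2\omega_1$ is zero up to rounding, while the boundary integral is close to $2\pi$, the value stated in Exercise 7 of Section 9.3.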
From the previous example it would seem that for a closed form to
be exact, there would need to be additional conditions on its domain.
This is actually the case, and for the purpose of obtaining these additional
conditions we introduce the following definition. To make the
notation easier we shall suppose, for the remainder of this section, that
we shall only work with representatives from a given curve that have
domain $[0,1]$.

9.4.4 Definition. Two closed, piecewise smooth, oriented curves $\gamma_0$ and
$\gamma_1$ in a set $E \subset E^n$ are said to be homotopic in $E$ $\Longleftrightarrow$ there exists a continuous
function $\Gamma$ with domain $[0,1] \times [0,1]$ that is piecewise smooth in each
variable, has range in $E$, $\forall \tau \in [0,1]$, $\Gamma(\tau, 0) = \Gamma(\tau, 1)$, and $\forall t \in [0,1]$,
$\Gamma(0,t) = \gamma_0(t)$ and $\Gamma(1,t) = \gamma_1(t)$.
A piecewise smooth oriented curve $\gamma$ in $E$ is said to be homotopic to zero in
$E$ $\Longleftrightarrow$ $\gamma$ is homotopic in $E$ to a constant curve; that is, a curve $\gamma_0$ so that $\forall t
\in [0,1]$, $\gamma_0(t) = \gamma_0(0)$.

It is not difficult to establish the fact that the homotopy relation is an
equivalence relation, so that the piecewise smooth oriented curves in
a given region break up into pairwise disjoint homotopy classes. It is
also not difficult to show (Exercise 7 of Section 9.4) that the homotopy
relation, 9.4.4, is independent of the piecewise smooth representatives
we pick from each curve.
If every closed, piecewise smooth oriented curve in an arcwise connected
region is homotopic to zero, then we say that the region is simply
connected. From the point of view of the homotopy relation, the last
statement says that a region is simply connected if all the closed,
oriented, piecewise smooth curves belong to the same homotopy class.

9.4.5 Definition. An open set $\mathcal{D} \subset E^n$ is said to be simply connected
$\Longleftrightarrow$ $\mathcal{D}$ is connected and every piecewise smooth, closed oriented curve in $\mathcal{D}$ is
homotopic to zero.
Roughly speaking, a simply connected set in $E^2$ is one that has no
holes in it. Of course in higher dimensions we no longer have such a
simple interpretation. An example of an arcwise connected set in $E^3$
that is not simply connected is an anchor ring.


For the purpose of giving an example, let us note that if an open set
in $E^n$ can be contracted to a point by means of straight lines, then the
set is simply connected. To be more precise, let us say that the set
$S \subset E^n$ is star-shaped with respect to the point $a \in S$ $\Longleftrightarrow$ $\forall x \in S$, the straight
line $L = \{y: y = (1 - t)a + tx\ \&\ t \in [0,1]\}$ belongs to $S$. Of course,
every convex set is star-shaped with respect to every point in the set.
Suppose $S$ is open and is star-shaped with respect to $a \in S$. If $\gamma$ is
a piecewise smooth, oriented closed curve in $S$, set

$$\Gamma(\tau, t) = (1 - \tau)a + \tau\gamma(t), \quad \forall \tau \in [0,1].$$

Clearly, $\Gamma$ has range in $S$, is continuous, is piecewise smooth in each
variable, $\Gamma(\tau, 0) = \Gamma(\tau, 1)$, $\Gamma(0,t) = a$, and $\Gamma(1,t) = \gamma(t)$. Thus $\gamma$ is
homotopic to zero.
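
The straight-line homotopy can be sketched concretely. In the example below, the set $S$ (the open unit disk), the point $a = (0.3, 0)$, and the circle $\gamma$ of radius $0.6$ are hypothetical choices; the code samples $\Gamma$ on a grid and tests the properties listed in the text.

```python
import math

a = (0.3, 0.0)  # hypothetical base point; the open unit disk is star-shaped about it

def gamma(t):
    """A closed curve in the open unit disk, t in [0, 1]."""
    return (0.6 * math.cos(2 * math.pi * t), 0.6 * math.sin(2 * math.pi * t))

def Gamma(tau, t):
    """Gamma(tau, t) = (1 - tau) a + tau gamma(t)."""
    g = gamma(t)
    return tuple((1.0 - tau) * ak + tau * gk for ak, gk in zip(a, g))

def in_unit_disk(p):
    return math.hypot(*p) < 1.0

grid = [k / 50 for k in range(51)]

# Gamma has range in S (checked on a grid only, not a proof).
range_ok = all(in_unit_disk(Gamma(tau, t)) for tau in grid for t in grid)
# Gamma(tau, 0) = Gamma(tau, 1), up to rounding in the trigonometric functions.
closed_ok = all(math.dist(Gamma(tau, 0.0), Gamma(tau, 1.0)) < 1e-12 for tau in grid)
# Gamma(0, t) = a and Gamma(1, t) = gamma(t).
ends_ok = all(math.dist(Gamma(0.0, t), a) < 1e-12
              and math.dist(Gamma(1.0, t), gamma(t)) < 1e-12 for t in grid)
print(range_ok, closed_ok, ends_ok)
```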

We want ultimately to prove that every closed first-order differential
form defined on a simply connected domain is exact. First, we shall
prove that every closed form on an open domain is locally exact. We
have proved this previously for closed forms having domains in $E^2$,
making use of the Stokes-Green-Gauss theorem. Although we could
still make use of this theorem in higher dimensions, it is much simpler
to proceed by a more direct method. Of course, since we know what we
are looking for, it is easy to discover a direct method of proof.

9.4.6 Theorem. Every closed first-order differential form with domain
an open set in $E^n$ is locally exact; that is, its restriction to every open ball in
its domain is exact.

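Proof. Let $\omega$ be a first-order closed differential form with domain an
open set in $E^n$ and suppose $B(a,r) \subset \mathcal{D}(\omega)$. Define the function $s$ on
$B(a,r) \times [0,1]$ by

$$s(x,t) = (1 - t)a + tx.$$

Clearly, for fixed $x$, the range of $s$ is the straight line joining $x$ and $a$.
Let us set

$$f(x) = \int_0^1 \sum_{k=1}^{n} \omega_k \circ s(x,t)\,\frac{\partial s^k(x,t)}{\partial t}\,dt.$$

If we apply the operator $D_j$ to both sides we may use Theorem 8.4.3
to move this operator from the outside to the inside of the integral. Now,

$$D_j \Big[ \omega_k \circ s(x,t)\,\frac{\partial s^k(x,t)}{\partial t} \Big] = [D_j\,\omega_k \circ s(x,t)]\,\frac{\partial s^k(x,t)}{\partial t} + \omega_k \circ s(x,t)\,D_j \frac{\partial s^k(x,t)}{\partial t}.$$

Further, using the chain rule, and noting the form of $s$, we get

$$D_j\,\omega_k \circ s(x,t) = t\,D_j\omega_k(s(x,t)).$$

Also,

$$D_j \frac{\partial s^k(x,t)}{\partial t} = \delta_{jk} = \begin{cases} 1 & \Longleftrightarrow j = k, \\ 0 & \Longleftrightarrow j \ne k. \end{cases}$$

Now use the fact that $\omega$ is closed so that $D_j\omega_k = D_k\omega_j$. Thus we have

$$\sum_{k=1}^{n} D_j \Big[ \omega_k \circ s(x,t)\,\frac{\partial s^k(x,t)}{\partial t} \Big] = t\,\frac{\partial\,\omega_j \circ s(x,t)}{\partial t} + \omega_j \circ s(x,t).$$

Consequently,

$$D_j f(x) = \int_0^1 t\,\frac{\partial\,\omega_j \circ s(x,t)}{\partial t}\,dt + \int_0^1 \omega_j \circ s(x,t)\,dt.$$

If we integrate the first integral by parts we get

$$D_j f(x) = \omega_j(x).$$

Since $\omega_j$ is continuous, it follows that $df(x)$ exists and, of course, is
equal to $\omega(x)$.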
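
The construction used in this proof, integrating $\omega$ along the straight line from $a$ to $x$, can be tried numerically. The sketch below rests on assumptions made for the example: the closed form $\omega = (2xy + z)\,dx + x^2\,dy + x\,dz$ on $E^3$, the base point $a = (1, 2, -1)$, a midpoint rule for the integral, and central differences for $D_j f$.

```python
def omega(p):
    """A hypothetical closed form on E^3: D_j w_k = D_k w_j for all j, k."""
    x, y, z = p
    return (2.0 * x * y + z, x * x, x)

def potential(p, a, m=1000):
    """f(p): integral over [0, 1] of sum_k w_k(s(p, t)) * ds^k/dt, s(p, t) = (1 - t) a + t p."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * h
        s = tuple((1.0 - t) * ak + t * pk for ak, pk in zip(a, p))
        # ds^k/dt = p^k - a^k for the straight line s.
        total += sum(wk * (pk - ak) for wk, pk, ak in zip(omega(s), p, a))
    return h * total

def gradient(func, p, step=1e-4):
    """Central-difference approximation of (D_1 func, ..., D_n func) at p."""
    grads = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += step
        minus[j] -= step
        grads.append((func(tuple(plus)) - func(tuple(minus))) / (2.0 * step))
    return tuple(grads)

a = (1.0, 2.0, -1.0)
p = (0.5, -1.5, 2.0)

def f(q):
    return potential(q, a)

print("f(p)      =", f(p))
print("grad f(p) =", gradient(f, p))
print("omega(p)  =", omega(p))
```

For this particular form a potential is $x^2 y + xz$, so $f(p)$ should be close to its value at $p$ minus its value at $a$, and the approximate gradient should be close to $\omega(p)$.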

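The theorem we have just proved is one of the crucial steps in showing
that every closed first-order differential form in a simply connected
region is exact. A second crucial step is the following lemma. For the
purposes of this lemma we shall say a curve $\alpha$ is in a $\delta$-neighborhood of the
curve $\gamma$ if there are representatives of each curve so that

$$\sup\{|\alpha(t) - \gamma(t)|: t \in [0,1]\} < \delta.$$

9.4.7 Lemma. Suppose $\omega$ is a closed first-order differential form with
domain an open set in $E^n$. For every piecewise smooth, oriented curve $\gamma$ in
$\mathcal{D}(\omega)$, $\exists \delta > 0$ so that for every piecewise smooth, oriented curve $\alpha$ in $\mathcal{D}(\omega)$
with $\alpha(0) = \gamma(0)$, $\alpha(1) = \gamma(1)$ and $\alpha$ in a $\delta$-neighborhood of $\gamma$ we have

$$\int_\alpha \omega = \int_\gamma \omega.$$

Proof. Let $2\delta$ be the distance from $\mathcal{R}(\gamma)$ to $\mathcal{D}(\omega)^c$. Since $\mathcal{R}(\gamma)$ is
compact and $\mathcal{D}(\omega)^c$ is closed, it follows that $\delta > 0$. For every $\tau \in [0,1]$
and $\forall x \in B(\gamma(\tau), 2\delta)$ set

$$f_\tau(x) = \int_{\gamma_\tau} \omega + \int_{s_{\tau,x}} \omega,$$

where $\gamma_\tau$ is that oriented curve which has a representative defined on
$[0,\tau]$ by $\gamma_\tau(t) = \gamma(t)$, and $s_{\tau,x}(t) = (1 - t)\gamma(\tau) + tx$, $t \in [0,1]$. As the
proof of the last theorem shows, $\forall x \in B(\gamma(\tau), 2\delta)$, $df_\tau(x) = \omega(x)$.
Next, $\forall x \in B(\alpha(\tau), \delta)$ let us set

$$g_\tau(x) = \int_{\alpha_\tau} \omega + \int_{r_{\tau,x}} \omega,$$

where $\alpha_\tau$ is defined in a manner similar to $\gamma_\tau$, and, of course, $r_{\tau,x}(t)
= (1 - t)\alpha(\tau) + tx$. We also get that $\forall x \in B(\alpha(\tau), \delta)$, $dg_\tau(x) = \omega(x)$.
Since $B(\alpha(\tau), \delta) \subset B(\gamma(\tau), 2\delta)$ it follows that $\forall x \in B(\alpha(\tau), \delta)$

$$d[f_\tau(x) - g_\tau(x)] = 0,$$

from which it follows that there is a number $c(\tau)$ so that $\forall x \in B(\alpha(\tau), \delta)$,

$$f_\tau(x) - g_\tau(x) = c(\tau).$$

We think it is clear that $c$ is a continuous function of $\tau$. Let us put

$$\tau_0 = \sup\{\tau: c(\tau) = 0\}.$$

The set on the right is nonvoid since $0$ certainly belongs to it. Thus $\tau_0$
is well defined and we claim $\tau_0 = 1$. For suppose to the contrary that
$\tau_0 < 1$. Take $\tau_0 < \tau_1 \le 1$ so that $t \in [\tau_0, \tau_1] \Longrightarrow \gamma(t) \in B(\gamma(\tau_0), 2\delta)$
and $\alpha(t) \in B(\alpha(\tau_0), \delta)$. It is possible to do this because of the continuity
of $\gamma$ and $\alpha$. Now, $\forall x \in B(\gamma(\tau_0), 2\delta)$ let us define

$$\beta_x(t) = \begin{cases} \gamma(t), & \forall t \in [\tau_0, \tau_1], \\ s_{\tau_1,x}(t - \tau_1), & \forall t \in [\tau_1, \tau_1 + 1], \\ s_{\tau_0,x}(\tau_1 + 2 - t), & \forall t \in [\tau_1 + 1, \tau_1 + 2]. \end{cases}$$

The function $\beta_x$ is the representative of a piecewise smooth, closed,
oriented curve in $B(\gamma(\tau_0), 2\delta)$ (Fig. 9.4.1). From Theorems 9.4.6 and

FIGURE 9.4.1

9.4.2 it follows that

$$\int_{\beta_x} \omega = \int_{\gamma_{\tau_1}} \omega - \int_{\gamma_{\tau_0}} \omega + \int_{s_{\tau_1,x}} \omega - \int_{s_{\tau_0,x}} \omega = 0.$$

This shows that $\forall x \in B(\gamma(\tau_0), 2\delta)$,

$$f_{\tau_0}(x) = f_{\tau_1}(x).$$

In exactly the same way we get that $\forall x \in B(\alpha(\tau_0), \delta)$

$$g_{\tau_0}(x) = g_{\tau_1}(x).$$

Thus $\forall x \in B(\alpha(\tau_0), \delta)$ we get

$$c(\tau_1) = f_{\tau_1}(x) - g_{\tau_1}(x) = f_{\tau_0}(x) - g_{\tau_0}(x) = c(\tau_0) = 0,$$

which, of course, is a contradiction.
Consequently, since $\gamma(1) = \alpha(1)$, we get

$$f_1(\gamma(1)) = g_1(\alpha(1)).$$

But this says nothing more than

$$\int_\gamma \omega = \int_\alpha \omega.$$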

9.4.8 Theorem. Suppose $\omega$ is a closed first-order differential form with
domain an open set in $E^n$. If $\alpha$ and $\beta$ are oriented, piecewise smooth closed
curves in $\mathcal{D}(\omega)$, and are homotopic in $\mathcal{D}(\omega)$, then

$$\int_\alpha \omega = \int_\beta \omega.$$

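Proof. We shall divide the proof into two parts.

(a) To begin with, we shall suppose that $\alpha(0) = \alpha(1) = \beta(0) = \beta(1)$.
Let $\Gamma$ be a function on $[0,1] \times [0,1]$ which gives the homotopy between
$\alpha$ and $\beta$, and let us suppose that $\forall \tau \in [0,1]$, $\Gamma(\tau, 0) = \Gamma(\tau, 1) = \alpha(0)$.
Also, suppose that $\Gamma(0,t) = \alpha(t)$, $\Gamma(1,t) = \beta(t)$.
Let $E$ be the set of all points $\sigma \in [0,1]$ with the property that $\forall \tau
\in [0,\sigma]$

$$\int_\alpha \omega = \int_{\Gamma_\tau} \omega,$$

where $\Gamma_\tau(t) = \Gamma(\tau, t)$. The set $E$ is a nonvoid set, since $0 \in E$. Further,
it is clearly bounded. Hence, $\tau_0 = \sup E$ is well defined. We claim
$\tau_0 = 1$. Indeed, by Lemma 9.4.7, $\exists \varepsilon > 0$ so that if $\gamma$ is in an $\varepsilon$-neighborhood
of $\Gamma_{\tau_0}$, then

$$\int_{\Gamma_{\tau_0}} \omega = \int_\gamma \omega.$$

Now, since $\Gamma$ is uniformly continuous, $\exists \delta > 0$ so that $|\tau_0 - \tau| < \delta
\Longrightarrow \forall t \in [0,1]$,

$$|\Gamma(\tau, t) - \Gamma(\tau_0, t)| < \varepsilon.$$

Hence $|\tau_0 - \tau| < \delta$ implies

$$\int_{\Gamma_{\tau_0}} \omega = \int_{\Gamma_\tau} \omega.$$

If $\tau_0 = 0$, then $\Gamma_{\tau_0} = \alpha$ and we are led to a contradiction. If $\tau_0 > 0$, then
$\exists \sigma < \tau_0$ so that $|\tau_0 - \sigma| < \delta$, $\sigma \in E$, and hence for $|\tau_0 - \tau| < \delta$

$$\int_\alpha \omega = \int_{\Gamma_\sigma} \omega = \int_{\Gamma_{\tau_0}} \omega = \int_{\Gamma_\tau} \omega.$$

Again we get a contradiction. Thus $\tau_0 = 1$, and the last argument also
shows that

$$\int_\alpha \omega = \int_{\Gamma_1} \omega = \int_\beta \omega.$$

(b)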

We shall now reduce the general case to the previous case. Let

and f3 be piecewise smooth, closed homotopic curves in . (w) . Let r

be a function on
and

/3

with

f(O,t)

[O, I] X [O, I] which defines a homotopy between a


a(t),f(l,t) = /3(t) (Fig. 9.4.2). Let us set
=

9.4

A(t)
and

VT

[O, l],

{ r(

CLOSED AND EXACT DIFFERENTIALS [ 425

f(3t,O),
{3(3t- 1),
f(3 - 3t,1)'

Vt
Vt
Vt

A(t),

A(7,t)

t- 7/3
,
T,
1 - 2T/3
A(t),

E
E
E

[O, 1/3],
[1/3,2/3],
[2/3,I],

Vt

[O,7/3],

Vt

[7/3,I - 7/3]'

vt E

[I - T/3' 1] .

FIGURE 9.4.2

A defines a piecewise smooth, closed oriented curve in "(w)


A ( 0)
A(I) a ( 0)
a(I), and A defines a homotopy between
and A so that VT E [O, l] we have A(T,0)
a(O) A(7,1).

Now,
with
a

From part (a) of the proof we have

But we think it is clear that

w over that part of A for which t E [O, 1/3] cancels


w over that part of A for which t E [2/3, l]. Thus the

since the integral of


the integral of

proof is complete.
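The construction in part (b) can be checked on a concrete case. The following is a minimal sketch in Python, not part of the text: the homotopy `Gamma` below is a hypothetical example (circles whose radius and center drift with the parameter), chosen only so that the base point moves. The functions `Lambda_curve` and `Lambda` follow the two piecewise formulas of part (b), and the helper `close` is an illustrative name.

```python
import math

def Gamma(tau, t):
    # Hypothetical homotopy (an assumption for illustration):
    # circles of radius 1 + tau whose center drifts to (0.3, 0).
    r = 1.0 + tau
    angle = 2.0 * math.pi * t
    return (0.3 * tau + r * math.cos(angle), r * math.sin(angle))

def alpha(t):
    return Gamma(0.0, t)

def beta(t):
    return Gamma(1.0, t)

def Lambda_curve(t):
    # The closed curve Lambda(t): out along Gamma(., 0), around beta, back along Gamma(., 1).
    if t <= 1.0 / 3.0:
        return Gamma(3.0 * t, 0.0)
    if t <= 2.0 / 3.0:
        return beta(3.0 * t - 1.0)
    return Gamma(3.0 - 3.0 * t, 1.0)

def Lambda(tau, t):
    # The homotopy Lambda(tau, t) with fixed end points.
    if t <= tau / 3.0 or t >= 1.0 - tau / 3.0:
        return Lambda_curve(t)
    return Gamma(tau, (t - tau / 3.0) / (1.0 - 2.0 * tau / 3.0))

def close(p, q, tol=1e-9):
    # True when two points of the plane agree up to rounding.
    return math.hypot(p[0] - q[0], p[1] - q[1]) < tol

if __name__ == "__main__":
    for tau in (0.0, 0.25, 0.5, 1.0):
        print(tau, Lambda(tau, 0.0), Lambda(tau, 1.0))
```

The sketch only confirms the boundary behavior used in the proof: the end points stay at $\alpha(0)$ for every $\tau$, the stage $\tau = 0$ is $\alpha$, and the stage $\tau = 1$ is the curve $\Lambda$.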

9.4.9 Corollary. If $\mathfrak{B}$ is an open, simply connected set in $E^n$, then the collection of closed first-order differential forms with domain $\mathfrak{B}$ is identical with the collection of exact first-order differential forms which are of class $C^1$ and have domain $\mathfrak{B}$.

Proof. If $\omega$ is exact and of class $C^1$, then there exists a function $f$ of class $C^2$ so that

\[ \omega(x) = \sum_{k=1}^{n} D_k f(x)\, dx^k. \]

Since $D_j D_k f = D_k D_j f$, it follows that $\omega$ is closed.

Conversely, suppose $\omega$ is closed. Since $\mathfrak{B}$ is simply connected, every piecewise smooth, closed oriented curve $\gamma$ in $\mathfrak{B}$ is homotopic to zero. From Theorem 9.4.8 it follows that

\[ \int_\gamma \omega = 0. \]

Thus from Theorem 9.4.2 we see that $\omega$ is exact.


Let us take a simple example that shows an application of Theorem 9.4.8. Let $B_0 = B(0,1) \setminus \{0\}$ in $E^2$ and

\[ \omega(x,y) = \frac{-y}{x^2 + y^2}\, dx + \frac{x}{x^2 + y^2}\, dy, \]

which we have considered before. Let $\psi$ be any closed differential form defined on $B_0$ and let $J$ be any closed interval in $B(0,1)$ with $0 \in J^0$. Let $\alpha$ be that real number so that

\[ \int_{\partial J} \psi = \alpha \int_{\partial J} \omega = 2\pi\alpha, \]

where $\partial J$ is the boundary of $J$ oriented in the "counterclockwise" direction. Now the oriented boundaries of any two closed intervals that have zero as an interior point are homotopic in $B_0$ (Exercise 8 of Section 9.4). Thus, by Theorem 9.4.8, the number $\alpha$ computed above is independent of $J$.

If $J$ is a closed interval in $B_0$, then $\partial J$ is homotopic to zero, and since $\psi - \alpha\omega$ is closed in $B_0$, it follows again from Theorem 9.4.8 that

\[ \int_{\partial J} [\psi - \alpha\omega] = 0. \]

Thus the integral of $\psi - \alpha\omega$ around every rectangle in $B_0$ is zero.
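The two facts about $\omega$ used in this example can be observed numerically. The sketch below is an illustration only: the function names, the particular rectangles, and the midpoint-rule step count are assumptions. It integrates $\omega$ counterclockwise around rectangles lying in $B(0,1)$.

```python
import math

def omega(x, y):
    # Components of omega = (-y dx + x dy) / (x^2 + y^2).
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def integrate_rectangle(x0, x1, y0, y1, n=4000):
    # Midpoint-rule approximation of the integral of omega over the
    # counterclockwise boundary of [x0, x1] x [y0, y1].
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
    total = 0.0
    for (px, py), (qx, qy) in zip(corners, corners[1:]):
        dx, dy = (qx - px) / n, (qy - py) / n
        for k in range(n):
            s = (k + 0.5) / n
            w1, w2 = omega(px + s * (qx - px), py + s * (qy - py))
            total += w1 * dx + w2 * dy
    return total

if __name__ == "__main__":
    # Two rectangles with 0 in the interior, one rectangle avoiding 0.
    print(integrate_rectangle(-0.5, 0.5, -0.3, 0.4))
    print(integrate_rectangle(-0.2, 0.6, -0.3, 0.2))
    print(integrate_rectangle(0.1, 0.5, 0.1, 0.4))
```

Under these assumptions the first two values should agree with $2\pi$ up to discretization error, and the third should be close to zero, in line with the homotopy argument above.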


Let $(a,b) \in B_0$ so that $a < 0$ and $b < 0$. For every $(x,y) \in B_0$ which is not of the form $(0,y)$ with $y > 0$ set

\[ g(x,y) = \int_a^x [\psi_1(t,b) - \alpha\omega_1(t,b)]\, dt + \int_b^y [\psi_2(x,t) - \alpha\omega_2(x,t)]\, dt. \]

This is a line integral along a horizontal line from $(a,b)$ to $(x,b)$, and then along a vertical line from $(x,b)$ to $(x,y)$. It is always well defined and clearly $D_2 g(x,y) = \psi_2(x,y) - \alpha\omega_2(x,y)$. Also, $\forall (x,y) \in B_0$, which is not of the form $(x,0)$ with $x > 0$, set

\[ h(x,y) = \int_b^y [\psi_2(a,t) - \alpha\omega_2(a,t)]\, dt + \int_a^x [\psi_1(t,y) - \alpha\omega_1(t,y)]\, dt. \]

This is a line integral first along a vertical line from $(a,b)$ to $(a,y)$ and then along a horizontal line from $(a,y)$ to $(x,y)$. It is also well defined, and $D_1 h(x,y) = \psi_1(x,y) - \alpha\omega_1(x,y)$.

Since the integral of $\psi - \alpha\omega$ is zero around every closed rectangle in $B_0$, it follows that on the common domain of $g$ and $h$ we have $g(x,y) = h(x,y)$. Now,

\[ D_2 h(0,y) = \lim_{t \to 0} \frac{h(0,y+t) - h(0,y)}{t}. \]

If we use the fact that the integral of $\psi - \alpha\omega$ around any closed rectangle in $B_0$ is zero, we get

\[ h(0,y+t) = h(0,y) + \int_y^{y+t} [\psi_2(0,\tau) - \alpha\omega_2(0,\tau)]\, d\tau. \]

Thus

\[ D_2 h(0,y) = \psi_2(0,y) - \alpha\omega_2(0,y). \]

In the same way we get

\[ D_1 g(x,0) = \psi_1(x,0) - \alpha\omega_1(x,0). \]

Thus if we extend $g$ to all of $B_0$ by defining $g(0,y) = h(0,y)$ and we extend $h$ to all of $B_0$ by defining $h(x,0) = g(x,0)$, the above computations show that by taking $f = g = h$ we have

\[ df = \psi - \alpha\omega. \]
The result we have just proved can be interpreted in the following way. Suppose we say that two closed forms on $B_0$ are equivalent if they differ by an exact form. It is not hard to verify that this is really an equivalence relation. The result we have just obtained tells us that the disjoint equivalence classes are determined by all the forms $\{\alpha\omega : \alpha \in R\}$. Another way of putting it is that the real vector space of equivalence classes is one-dimensional.

Exercises

1. Determine which of the following forms are exact, and if exact find a function $f$ for which $\omega = df$.

(a) $\omega(x,y) = (x^2 - y^2)\, dx - 2xy\, dy$.

(b) $\omega(x,y) = 2xy\, dx + (x^2 - y^2)\, dy$.

(c) $\omega(x,y,z) = xy\, dx + \tfrac{1}{2} x^2\, dy + z^3\, dz$.

(d) $\omega(x,y,z) = (2xyz^3 + z)\, dx + x^2 z^3\, dy + (3x^2 y z^2 + x)\, dz$.

2. Let

\[ \omega(x,y) = \frac{y}{x^2 + y^2}\, dx + \frac{x}{x^2 + y^2}\, dy, \]

which has domain $E^2 \setminus \{0\}$. Is $\omega$ exact?

3. Let $\omega$ be defined on $[a,b] \times R$ by

\[ \omega(x,y) = p(x)\, y\, dx + dy, \]

where $p$ is a continuous function on $[a,b]$. If we set

\[ q(x) = \exp\left( \int_a^x p(t)\, dt \right), \]

verify that the first-order differential form $q\omega$ is exact.

4. Compute the line integral of

\[ \omega(x,y) = x\, dx + y^3\, dy \]

along the oriented curve defined by

\[ \alpha(t) = (e^t \log \cos t, \ \sin t \operatorname{Arctan}(t - 1)), \qquad t \in [0,1]. \]

5. Show that

\[ \omega(x,y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}\, dx + \frac{2xy}{(x^2 + y^2)^2}\, dy \]

is exact on $E^2 \setminus \{0\}$.

6. Show that if $\alpha_0$ and $\alpha_1$ are two different piecewise smooth parametric representations for the same piecewise smooth, oriented closed curve, then there is a function $\Gamma$, as in Definition 9.4.4, so that $\Gamma(0,t) = \alpha_0(t)$, $\Gamma(1,t) = \alpha_1(t)$.

7. If $\alpha$ and $\beta$ are piecewise smooth, closed oriented curves which are homotopic, show that the homotopy relation is independent of the piecewise smooth representatives which are picked from $\alpha$ and $\beta$.

8. Show that the suitably oriented boundaries of any two closed intervals in $E^2$, which have zero as an interior point, are homotopic in $E^2 \setminus \{0\}$.

9. Let $\mathfrak{B} \subset E^2$ be an open simply connected set, $a \in \mathfrak{B}$ and $\mathfrak{B}(a) = \mathfrak{B} \setminus \{a\}$. Show that there is a closed form $\omega$ defined on $\mathfrak{B}(a)$ so that for every closed form $\psi$ defined on $\mathfrak{B}(a)$, $\exists \alpha \in R$ so that $\psi - \alpha\omega$ is exact. (Hint: Combine the example at the end of Section 9.4 with the technique of proof of Theorem 9.4.8.)

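10. Generalize Exercise 9 in the following way. Let $\mathfrak{B}$ be a simply connected open set in $E^2$, $\{a_k : k \in (1,n)\} \subset \mathfrak{B}$ and let $\mathfrak{B}(a_1, \ldots, a_n) = \mathfrak{B} \setminus \{a_k : k \in (1,n)\}$. Show that there exists a set $\{\omega_k : k \in (1,n)\}$ of linearly independent closed forms in $\mathfrak{B}(a_1, \ldots, a_n)$ so that for every closed form $\psi$ in $\mathfrak{B}(a_1, \ldots, a_n)$ there is a set $\{\alpha_k : k \in (1,n)\} \subset R$ so that

\[ \psi - \sum_{k=1}^{n} \alpha_k \omega_k \]

is exact. The set $\{\omega_k : k \in (1,n)\}$ is said to be linearly independent $\Leftrightarrow$ $\forall \{\alpha_k : k \in (1,n)\} \subset R$,

\[ \sum_{k=1}^{n} \alpha_k \omega_k = 0 \Rightarrow \alpha_k = 0. \]

11. Let $u$ be a real-valued harmonic function with domain $B(0,1) \subset E^2$; that is, $u$ is of class $C^2$ and

\[ \Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0. \]

Show that there exists a harmonic function $v$ with domain $B(0,1)$ so that the Cauchy-Riemann equations are satisfied:

\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \]

12. Extend Exercise 11 to the situation where $B(0,1)$ is replaced by any simply connected domain in $E^2$.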

II. SURFACE INTEGRALS


9.5

MOTIVATION AND DEFINITIONS

Our purpose in the next few sections will be to extend to higher dimensions some of the facts we have proved for line integrals. In this connection it will be necessary to decide the meaning of an integral taken over a surface. This section will be devoted to making definitions together with the motivation for making these definitions.

The first thing we want to do is define a surface and the area of a surface. We think it will be best if in the beginning we give a rather discursive motivational discussion. Let $T$ be a linear transformation with domain $E^m$ and range in $E^n$, where $n \ge m$. If $n = m$, and $A \subset E^m$ is a Jordan measurable set, then the considerations of Section 8.5 tell us that $T(A)$ is Jordan measurable and

\[ |T(A)| = |\det T|\, |A|. \tag{9.5.1} \]

If $n > m$, it is still true that $\mathfrak{R}(T)$ is in an $m$-dimensional subspace of $E^n$, which is identifiable with $E^m$ through an orthogonal transformation $U$ of $E^n$ onto itself. Hence we now define the $m$-dimensional content of $T(A)$ by

\[ |T(A)| = |U \circ T(A)|. \tag{9.5.2} \]

The content on the right is computable by (9.5.1). Of course, we must make sure that the definition (9.5.2) is independent of the orthogonal transformation $U$ which takes $\mathfrak{R}(T)$ into $E^m$. Indeed, suppose $U_1$ is another such orthogonal transformation. Now, if $T$ is singular, the dimension of $\mathfrak{R}(U \circ T)$ and $\mathfrak{R}(U_1 \circ T)$ is less than $m$ and thus

\[ |U \circ T(A)| = |U_1 \circ T(A)| = 0. \]

If $T$ is nonsingular, then $W = U_1 \circ U^{-1} | E^m$ is an orthogonal transformation of $E^m$ onto itself. Thus

\[ |U \circ T(A)| = |\det U \circ T|\, |A| = |\det W \circ U \circ T|\, |A| = |\det U_1 \circ T|\, |A| = |U_1 \circ T(A)|. \]

Let us now give an effective way of computing the right side of (9.5.2) so that the operator $U$ does not intervene. Let us first note that $T^t \circ T$ is a nonnegative symmetric linear transformation from $E^m$ into itself, that is, $\forall u \in E^m$, $T^t \circ T(u) \cdot u \ge 0$. Thus this linear transformation has a matrix representation consisting of nonnegative eigenvalues down the main diagonal (see Section 6.5). Suppose we arrange them in nondecreasing order down the main diagonal. By taking the nonnegative square roots of these nonnegative eigenvalues and arranging these square roots in nondecreasing order down the main diagonal we get the matrix representation of a nonnegative symmetric linear transformation $B$ so that $B^2 = T^t \circ T$. We leave to the reader the easy task of showing that there is only one nonnegative symmetric linear transformation $B$ with this property (Exercise 1 of Section 9.5). Since $B$ is symmetric, the null space of $B$ is orthogonal to $\mathfrak{R}(B)$ and moreover $B|\mathfrak{R}(B)$ is one to one and takes $\mathfrak{R}(B)$ onto $\mathfrak{R}(B)$. For simplicity's sake, designate the inverse of $B|\mathfrak{R}(B)$ by $B^{-1}$. Let $V$ be that linear transformation with domain $\mathfrak{R}(B)$ and range $\mathfrak{R}(T)$ and defined by

\[ V = T \circ B^{-1}. \]

Now, $\forall u \in \mathfrak{R}(B)$ we have

\[ |V(u)|^2 = V^t \circ V(u) \cdot u = |u|^2, \]

since

\[ V^t \circ V(u) = B^{-1} \circ T^t \circ T \circ B^{-1}(u) = B^{-1} \circ B^2 \circ B^{-1}(u) = u. \]

Thus $V$ preserves length and we may write

\[ T = V \circ \sqrt{T^t \circ T}, \tag{9.5.3} \]

where we have taken $\sqrt{T^t \circ T} = B$. If $T$ is nonsingular so is $B$, and $V$ is a length-preserving linear transformation from $E^m$ onto $\mathfrak{R}(T)$. If $T$ is nonsingular, then $U \circ V$ is an orthogonal map from $E^m$ onto itself. It follows from (9.5.1), (9.5.2), and (9.5.3) that

\[ |T(A)| = |U \circ V \circ \sqrt{T^t \circ T}\,(A)| = [\det \sqrt{T^t \circ T}\,]\, |A| = [\det T^t \circ T]^{1/2}\, |A|. \tag{9.5.4} \]

In case $T$ is singular we have already noted that $|T(A)| = 0$. Since in that case $T^t \circ T$ is also singular, we have $\det T^t \circ T = 0$. Thus (9.5.4) is still valid. Hence we have succeeded in computing $|T(A)|$ only in terms of $T$ and $A$ and have eliminated $U$. Incidentally, this is another proof of the fact that $|T(A)|$ is independent of $U$.
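Formula (9.5.4) can be compared with a familiar special case. For $m = 2$, $n = 3$ the image of the unit square under $T$ is the parallelogram spanned by the columns of $T$, whose area is the length of their cross product. The sketch below is an illustration under that assumption; the sample matrix and the function names are hypothetical. It checks that $\det T^t \circ T$ equals the squared length of the cross product, which is the sum of the squares of the $2 \times 2$ minors of $T$.

```python
import math

def gram_det(T):
    # det(T^t T) for a 3x2 matrix T given as a list of three rows.
    c1 = [row[0] for row in T]
    c2 = [row[1] for row in T]
    g11 = sum(a * a for a in c1)
    g22 = sum(b * b for b in c2)
    g12 = sum(a * b for a, b in zip(c1, c2))
    return g11 * g22 - g12 * g12

def sum_of_squared_minors(T):
    # Sum of the squares of the 2x2 minors of T; this is the squared
    # length of the cross product of the two columns.
    total = 0
    for i in range(3):
        for j in range(i + 1, 3):
            minor = T[i][0] * T[j][1] - T[i][1] * T[j][0]
            total += minor * minor
    return total

def content_of_unit_square_image(T):
    # |T(A)| for A the unit square, computed by (9.5.4) with |A| = 1.
    return math.sqrt(gram_det(T))

if __name__ == "__main__":
    T = [[1, 2], [0, 1], [3, -1]]  # sample matrix, chosen arbitrarily
    print(gram_det(T), sum_of_squared_minors(T), content_of_unit_square_image(T))
```

For the sample matrix both quantities are 59, so the content of the image of the unit square is $\sqrt{59}$.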
Suppose now that $A$ is Jordan measurable in $E^m$ and $\varphi$ is a function with domain $A$ and range in $E^n$, where $n \ge m$. We may think of the range of $\varphi$ as a surface in $E^n$. Let us further assume that $\varphi|A^0$ is of class $C^1$. If $c \in A^0$, then $\varphi(c) + d\varphi(c)$ is a very close approximation to $\varphi$ in a small neighborhood of $c$. Thus if $C$ is a small neighborhood of $c$ in $A^0$, it is reasonable to say that $|d\varphi(c)(C)|$ is a close approximation to the surface area of that portion of the surface that is obtained by restricting $\varphi$ to $C$. Now, the Jacobian matrix of $d\varphi(c)$ is the matrix

\[ \begin{pmatrix} D_1\varphi^1(c) & \cdots & D_m\varphi^1(c) \\ \vdots & & \vdots \\ D_1\varphi^n(c) & \cdots & D_m\varphi^n(c) \end{pmatrix}. \]

Thus, if we use the Binet-Cauchy formula (Theorem 6.6.8), we get that

\[ \det d\varphi(c)^t \circ d\varphi(c) = \sum_{1 \le j_1 < \cdots < j_m \le n} \left[ \frac{\partial(\varphi^{j_1}, \ldots, \varphi^{j_m})}{\partial(x^1, \ldots, x^m)}(c) \right]^2, \]

where

\[ \frac{\partial(\varphi^{j_1}, \ldots, \varphi^{j_m})}{\partial(x^1, \ldots, x^m)}(c) \]

is the Jacobian at $c$ of the function $(\varphi^{j_1}, \ldots, \varphi^{j_m})$.

Suppose $\{A_k : k \in (1,q)\}$ is a decomposition of the Jordan measurable set $A$ into Jordan measurable sets that intersect at most on their boundaries. If $a_k \in A_k^0$, then it is not unreasonable to consider the sum

\[ \sum_{k=1}^{q} [\det d\varphi(a_k)^t \circ d\varphi(a_k)]^{1/2}\, |A_k| \]

as an approximation to the surface area of the surface determined by $\varphi$. Thus in allowing these sums to pass to a limit, it is not unreasonable to define the surface area as

\[ \int_A [\det d\varphi(x)^t \circ d\varphi(x)]^{1/2}\, dx. \tag{9.5.5} \]

A little later we shall write this again as a formal definition and show that this definition is independent of the parameterization of the surface, but to do so we must first define an equivalence relation among functions.

9.5.1 Definition. The function $\alpha$ with domain in $E^m$ and range in $E^n$ is said to be equivalent to the function $\beta$ with domain in $E^m$ $\Leftrightarrow$ there exists a $C^1$ function with values $\chi(y)$ so that $\mathfrak{D}(\beta) \subset \mathfrak{D}(\chi)$, $\chi|\mathfrak{D}(\beta)^0$ is one to one with range $\mathfrak{D}(\alpha)^0$, and $\beta(y) = \alpha \circ \chi(y)$, $\forall y \in \mathfrak{D}(\beta)^0$. The functions $\alpha$ and $\beta$ are said to be orientably equivalent $\Leftrightarrow$ the Jacobian of $\chi|\mathfrak{D}(\beta)^0$ is always positive.

It is not difficult to check that what we have defined above are actually equivalence relations. We also think that the analogy with Definition 9.1.1 is quite clear.

9.5.2 Definition. A smooth surface in $E^n$ is an equivalence class of $C^1$ functions† with domains Jordan measurable sets in $E^m$, $m \le n$, where equivalence is taken in the sense of Definition 9.5.1, and the functions have the same range in $E^n$.

A smooth, oriented surface in $E^n$ is defined in the same way, with the difference that the equivalence relation is replaced by the orientable equivalence relation. The common range of the equivalence class of functions that constitute a (oriented) surface is called the trace of the surface.
As we have done with curves, a (oriented) surface will usually be designated by any function in the equivalence class that constitutes the oriented surface. From the point of view of surface area this makes no difference, as we shall now show. Suppose $\varphi$ and $\psi$ are two functions in the same smooth surface and having domains $A$ and $B$, respectively. Then there is a one-to-one $C^1$ map $\chi$ from $B^0$ onto $A^0$ so that $\psi(y) = \varphi \circ \chi(y)$. Thus using the chain rule we get

\[ d\psi(y)^t \circ d\psi(y) = d\chi(y)^t \circ d\varphi(\chi(y))^t \circ d\varphi(\chi(y)) \circ d\chi(y). \]

Using the fact that $\det d\chi(y)^t = \det d\chi(y)$ we get

\[ [\det d\psi(y)^t \circ d\psi(y)]^{1/2} = [\det d\varphi(\chi(y))^t \circ d\varphi(\chi(y))]^{1/2}\, |J_\chi(y)|. \]

Thus using the transformation theorem for multiple integrals we get

\[ \int_B [\det d\psi(y)^t \circ d\psi(y)]^{1/2}\, dy = \int_A [\det d\varphi(x)^t \circ d\varphi(x)]^{1/2}\, dx. \]

Thus we see that the following formal definition makes sense.

9.5.3 Definition. If $\varphi$ is a smooth surface in $E^n$ and the partials of $\varphi$ are bounded, then the surface area of $\varphi$ is defined as

\[ \mathfrak{A}(\varphi) = \int_{\mathfrak{D}(\varphi)} [\det d\varphi(x)^t \circ d\varphi(x)]^{1/2}\, dx = \int_{\mathfrak{D}(\varphi)} \left[ \sum_{1 \le j_1 < \cdots < j_m \le n} \left( \frac{\partial(\varphi^{j_1}, \ldots, \varphi^{j_m})}{\partial(x^1, \ldots, x^m)}(x) \right)^2 \right]^{1/2} dx. \tag{9.5.6} \]

[We are supposing that $\mathfrak{D}(\varphi) \subset E^m$, $m \le n$.]

† For this definition we shall say a function is $C^1$ if the restriction of the function to the interior of its domain is of class $C^1$.
As an example of the use of formula (9.5.6) let us compute the area of a well-known surface: a sphere in $E^3$. The parametric equations of a sphere of radius $r$ in $E^3$ are given by

\[ L^1(x,y) = r \cos x, \qquad L^2(x,y) = r \sin x \cos y, \qquad L^3(x,y) = r \sin x \sin y, \]

where $(x,y)$ ranges over the interval $[0,\pi] \times [0,2\pi]$. If we compute the various Jacobians we get

\[ \frac{\partial(L^1, L^2)}{\partial(x,y)}(x,y) = r^2 \sin^2 x \sin y, \]

\[ \frac{\partial(L^1, L^3)}{\partial(x,y)}(x,y) = -r^2 \sin^2 x \cos y, \]

\[ \frac{\partial(L^2, L^3)}{\partial(x,y)}(x,y) = r^2 \sin x \cos x. \]

Squaring and adding, taking the square root, and integrating we get

\[ \mathfrak{A}(L) = r^2 \int_0^{2\pi} \int_0^{\pi} \sin x\, dx\, dy = 4\pi r^2. \]
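The sphere computation can be reproduced numerically from formula (9.5.6) without evaluating the Jacobians by hand. The following sketch is an illustration only: it approximates the partial derivatives by central differences and the integral by a midpoint rule, and the function names, step sizes, and the sample radius are assumptions.

```python
import math

def sphere(x, y, r=2.0):
    # Parametric representation of the sphere of radius r used in the text.
    return (r * math.cos(x),
            r * math.sin(x) * math.cos(y),
            r * math.sin(x) * math.sin(y))

def partials(phi, x, y, h=1e-5):
    # Central-difference approximations to D_1 phi and D_2 phi.
    d1 = [(a - b) / (2 * h) for a, b in zip(phi(x + h, y), phi(x - h, y))]
    d2 = [(a - b) / (2 * h) for a, b in zip(phi(x, y + h), phi(x, y - h))]
    return d1, d2

def area_density(phi, x, y):
    # Square root of the sum of the squared 2x2 Jacobians, as in (9.5.6).
    d1, d2 = partials(phi, x, y)
    total = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            jac = d1[i] * d2[j] - d1[j] * d2[i]
            total += jac * jac
    return math.sqrt(total)

def surface_area(phi, x0, x1, y0, y1, n=200):
    # Midpoint-rule approximation of the integral in (9.5.6).
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += area_density(phi, x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy)
    return total * hx * hy

if __name__ == "__main__":
    print(surface_area(sphere, 0.0, math.pi, 0.0, 2.0 * math.pi))
    print(4.0 * math.pi * 2.0 ** 2)
```

With the sample radius $r = 2$ the two printed numbers should agree to within the discretization error of the quadrature.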

Let us now turn to the problem of defining a surface integral which will be analogous to the definition of a line integral. For motivation let us again turn to a physical example. Let $\varphi$ be a smooth oriented surface with trace in $E^3$ and domain in $E^2$. Let $v$ be the velocity of a fluid flowing in $E^3$ which depends only on the position of the particle of fluid; that is, $v$ is a function with domain and range in $E^3$. We shall suppose that $v$ is continuous and its domain contains the trace of $\varphi$. We suppose that the trace of $\varphi$ is of such a nature that the fluid can flow through the surface without altering its velocity.
If we suppose that $d\varphi(a)$ is nonsingular, then we can think of the range of the affine transformation $d\varphi(a) + \varphi(a)$ as being the tangent plane to $\varphi$ at $\varphi(a)$. Suppose $n(a)$ is a vector emanating from $\varphi(a)$ which is orthogonal to this tangent plane. Then $\forall u \in E^2$ we have

\[ d\varphi(a)(u) \cdot n(a) = u \cdot d\varphi(a)^t(n(a)) = 0. \]

Thus $d\varphi(a)^t(n(a)) = 0$, and if we write this equation in component form we get

\[ D_1\varphi^1(a)\, n^1(a) + D_1\varphi^2(a)\, n^2(a) + D_1\varphi^3(a)\, n^3(a) = 0, \]

\[ D_2\varphi^1(a)\, n^1(a) + D_2\varphi^2(a)\, n^2(a) + D_2\varphi^3(a)\, n^3(a) = 0. \]

Assume that

\[ \frac{\partial(\varphi^1, \varphi^2)}{\partial(x,y)}(a) \ne 0 \]

and take $n^3(a)$ to be this Jacobian. Then we can use Cramer's rule on the previous two equations to solve for $n^1(a)$ and $n^2(a)$ to get

\[ n(a) = \left( \frac{\partial(\varphi^2, \varphi^3)}{\partial(x,y)}(a), \ \frac{\partial(\varphi^3, \varphi^1)}{\partial(x,y)}(a), \ \frac{\partial(\varphi^1, \varphi^2)}{\partial(x,y)}(a) \right). \tag{9.5.7} \]

If

\[ \frac{\partial(\varphi^1, \varphi^2)}{\partial(x,y)}(a) = 0, \]

then one of the other Jacobians does not vanish, since $d\varphi(a)$ has rank 2. Thus we could proceed in a similar way to get (9.5.7). The vector

\[ n(a)/|n(a)| \tag{9.5.8} \]

is called the outward normal to the surface $\varphi$ at $\varphi(a)$. Please note that if the partials of $\varphi$ are bounded, then

\[ \mathfrak{A}(\varphi) = \int_{\mathfrak{D}(\varphi)} |n(x)|\, dx. \tag{9.5.9} \]

The component of the velocity that is in the direction of the outward normal to $\varphi$ at $x$ is given by

\[ v \circ \varphi(x) \cdot \frac{n(x)}{|n(x)|}, \]

provided, of course, that $d\varphi(x)$ is nonsingular. Hence the amount of fluid that flows in a unit time through a small area of surface about $\varphi(x)$ in an outward direction is approximately

\[ v \circ \varphi(x) \cdot \frac{n(x)}{|n(x)|}\, \mathfrak{A}(V_x), \]

where $V_x$ is a small neighborhood about $\varphi(x)$ on $\mathfrak{R}(\varphi)$. We may suppose $V_x = \varphi(U_x)$, where $U_x$ is a small neighborhood about $x$ in $\mathfrak{D}(\varphi)$. But as we pointed out in (9.5.9), we have that $\mathfrak{A}(V_x)$ is approximately $|n(x)|\, |U_x|$. Thus we are justified in saying that the amount of fluid that flows in a unit time through the surface $\varphi$ in an outward direction is the integral

\[ \int_{\mathfrak{D}(\varphi)} v \circ \varphi(x) \cdot n(x)\, dx. \tag{9.5.10} \]
where we have taken

The integral we have just written down is called a

surface integral.

Because of the physical interpretation we have given to this integral, it should be true that it is independent of the parameterization of the oriented surface $\varphi$. This is actually true, but before we prove it let us try to make an analogy between surface integrals and line integrals.
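As a numerical illustration of the integral (9.5.10), the sketch below computes the flow of the velocity field $v(p) = p$ through a sphere of radius $r$, using the normal vector (9.5.7) and the parametrization of the sphere given earlier. This is only a sketch: the field $v$, the sample radius, the finite-difference step, and all function names are assumptions made for the example. For this field one expects the value $4\pi r^3$.

```python
import math

R = 1.5  # sample radius, chosen arbitrarily

def sphere(x, y):
    # Parametrization of the sphere of radius R used earlier in the text.
    return (R * math.cos(x),
            R * math.sin(x) * math.cos(y),
            R * math.sin(x) * math.sin(y))

def velocity(p):
    # Hypothetical velocity field v(p) = p.
    return p

def partials(phi, x, y, h=1e-5):
    # Central-difference approximations to D_1 phi and D_2 phi.
    d1 = [(a - b) / (2 * h) for a, b in zip(phi(x + h, y), phi(x - h, y))]
    d2 = [(a - b) / (2 * h) for a, b in zip(phi(x, y + h), phi(x, y - h))]
    return d1, d2

def normal(phi, x, y):
    # The vector n of (9.5.7), built from the three 2x2 Jacobians.
    d1, d2 = partials(phi, x, y)
    return (d1[1] * d2[2] - d1[2] * d2[1],
            d1[2] * d2[0] - d1[0] * d2[2],
            d1[0] * d2[1] - d1[1] * d2[0])

def flux(phi, v, x0, x1, y0, y1, n=200):
    # Midpoint-rule approximation of the integral (9.5.10).
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x, y = x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy
            total += sum(a * b for a, b in zip(v(phi(x, y)), normal(phi, x, y)))
    return total * hx * hy

if __name__ == "__main__":
    print(flux(sphere, velocity, 0.0, math.pi, 0.0, 2.0 * math.pi))
    print(4.0 * math.pi * R ** 3)
```

For this parametrization the vector of (9.5.7) works out to $R \sin x$ times the position vector, so it points away from the center, and the integrand reduces to $R^3 \sin x$.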


In a very informal manner let us write down an expression

\[ \omega(x) = \sum_{k=1}^{n} \sum_{j=1}^{n} \omega_{jk}(x)\, dx^j \wedge dx^k, \]

where $x$ is assumed to range over some set $\mathfrak{D} \subset E^n$, $n \ge 2$, and $\omega_{jk}$ is a real-valued function defined on $\mathfrak{D}$. Even though this has no meaning for us as yet, we shall call this a second-order differential form. Actually, it is clear that $\omega$ can be given a meaning in such a way that it is uniquely determined by the functions in the set $\{\omega_{jk} : j,k \in (1,n)\}$. However, since we shall discuss this in very great detail in Section 9.6 we shall not bother to do this here. We are using the symbolism $dx^j \wedge dx^k$ simply to indicate that we are not thinking of this as the composite linear transformation $dx^j \circ dx^k$ which would make $\omega$ a first-order differential form.

In an analogous manner we can write down an expression of an $m$th-order differential form as

\[ \omega(x) = \sum_{j_1=1}^{n} \cdots \sum_{j_m=1}^{n} \omega_{j_1 \cdots j_m}(x)\, dx^{j_1} \wedge \cdots \wedge dx^{j_m}, \tag{9.5.11} \]

where $x$ runs over a set $\mathfrak{D} \subset E^n$, $n \ge m$, and $\omega_{j_1 \cdots j_m}$ is a real-valued function defined on $\mathfrak{D}$. We shall say $\omega$ is continuous if and only if each one of these component functions is continuous.

Even though we have not given a formal definition of an $m$th-order differential form, we shall nevertheless give a formal definition of a surface integral of a continuous $m$th-order form. This is not really too serious a breach in the logical development since the functions $\omega_{j_1 \cdots j_m}$ are really the important things. Keeping formula (9.5.10) in mind, we have the following.

9.5.4 Definition. If $\omega$ is an $m$th-order bounded continuous differential form with $\mathfrak{D}(\omega) \subset E^n$, $n \ge m$, and $\varphi$ is a smooth oriented surface with trace in $\mathfrak{D}(\omega)$ and $\mathfrak{D}(\varphi) \subset E^m$, and $\varphi$ has bounded partials, then the surface integral of $\omega$ over $\varphi$ is defined as

\[ \int_\varphi \omega = \int_{\mathfrak{D}(\varphi)} \sum_{j_1=1}^{n} \cdots \sum_{j_m=1}^{n} \omega_{j_1 \cdots j_m} \circ \varphi(x)\, \frac{\partial(\varphi^{j_1}, \ldots, \varphi^{j_m})}{\partial(x^1, \ldots, x^m)}(x)\, dx. \tag{9.5.12} \]

In order for this really to be an integral over an oriented surface, it should be independent of the parameterization of the surface that is used. Suppose $\psi$ is orientably equivalent with $\varphi$. This means that there is a $C^1$ function $y$ taking $\mathfrak{D}(\varphi)^0$ one-to-one onto $\mathfrak{D}(\psi)^0$, having a positive Jacobian and so that $\forall x \in \mathfrak{D}(\varphi)^0$ we have

\[ \varphi(x) = \psi \circ y(x). \]

From the chain rule it follows that

\[ \frac{\partial(\varphi^{j_1}, \ldots, \varphi^{j_m})}{\partial(x^1, \ldots, x^m)}(x) = \frac{\partial(\psi^{j_1}, \ldots, \psi^{j_m})}{\partial(y^1, \ldots, y^m)}(y(x))\, J_y(x). \]

If we use these last two formulas in (9.5.12), and note that $y(\mathfrak{D}(\varphi)^0) = \mathfrak{D}(\psi)^0$ and that $\mathfrak{D}(\varphi) \setminus \mathfrak{D}(\varphi)^0$ and $\mathfrak{D}(\psi) \setminus \mathfrak{D}(\psi)^0$ have Jordan content zero, then it follows from the transformation theorem for integrals that we could use $\psi$ in place of $\varphi$ in (9.5.12).

Exercises

1. We have shown that if $A$ is a nonnegative symmetric linear transformation of $E^n$ into $E^n$, then there exists a nonnegative symmetric linear transformation $B$ of $E^n$ into $E^n$ so that $B^2 = A$. Show that $B$ is unique in the sense that it is the only nonnegative symmetric linear transformation with this property.

2. A portion of a cone in $E^3$ is a surface having a parametric representation

\[ \varphi(x,y) = (x \cos y, \ x \sin y, \ x), \]

where $x \in [0,1]$, $y \in [0,2\pi]$. Show that

\[ \psi(x,y) = (x, \ y, \ \sqrt{x^2 + y^2}), \]

where $\mathfrak{D}(\psi) = B(0,1) \subset E^2$, is another parametric representation for this surface.

3. Find the surface area of the surface given in Exercise 2.

4. A torus is a surface in $E^3$ having a parametric representation

\[ \varphi^1(x,y) = (R + r \cos y) \cos x, \qquad \varphi^2(x,y) = (R + r \cos y) \sin x, \qquad \varphi^3(x,y) = r \sin y, \]

where $R > r$ and $(x,y) \in [-\pi,\pi] \times [-\pi,\pi]$. The boundary of a bagel or doughnut (which is really a soft bagel!) is the trace of a torus. Find its surface area.

5. In the text we gave a parametric representation of a 2-sphere of radius $r$ in $E^3$. Give a corresponding parametric representation for a 3-sphere of radius $r$ in $E^4$ and determine its surface area. Do the same for $E^n$. Compare with the corresponding exercise of Section 8.5.

6. Give a parametric representation for a smooth surface whose trace is the ellipsoid in $E^3$ given by the set of all points $(x, y, z)$ that satisfy the equation

Find its surface area.

7. If $\varphi$ is a smooth surface with $\mathfrak{D}(\varphi) \subset E^2$ and $\varphi$ has bounded partials, prove that

\[ \mathfrak{A}(\varphi) = \int_{\mathfrak{D}(\varphi)} \left[ |D_1\varphi(x)|^2\, |D_2\varphi(x)|^2 - (D_1\varphi(x) \cdot D_2\varphi(x))^2 \right]^{1/2} dx. \]

[Hint: Use Lagrange's identity (Exercise 1 of Section 6.2).]

8. Suppose a smooth surface in $E^3$ is given by

\[ \varphi(x,y) = (x, \ y, \ f(x,y)). \]

Show that

\[ \mathfrak{A}(\varphi) = \iint_{\mathfrak{D}(\varphi)} \left[ 1 + (D_1 f(x,y))^2 + (D_2 f(x,y))^2 \right]^{1/2} dx\, dy. \]

9. Suppose $\varphi$ is the same as in Exercise 8 and $F$ is a real-valued function of class $C^1$ with domain an open set of $E^3$ so that $\mathfrak{R}(\varphi) \subset \mathfrak{D}(F)$, $F \circ \varphi = 0$, and $dF(\varphi(x,y)) \ne 0$, $\forall (x,y) \in \mathfrak{D}(\varphi)$. Compute $\mathfrak{A}(\varphi)$ in terms of the function $F$.

10. Suppose $\varphi$ is a smooth, oriented surface in $E^3$ with $\mathfrak{D}(\varphi) \subset E^2$. If $d\varphi(x)$ is nonsingular, show that the outward normal to this surface at $\varphi(x)$ is independent of the parameterization of the surface.

11. Let $I$ be the cube in $E^n$ that is the Cartesian product of $[0,1]$ with itself $n$ times. Show that $I$ is the trace of exactly two smooth oriented surfaces.

9.6

THE ALGEBRA OF DIFFERENTIAL FORMS

From the definition of a surface integral given by formula (9.5.12), it is clear that

\[ \int_\varphi dx^j \wedge dx^k = -\int_\varphi dx^k \wedge dx^j. \]

Hence it is reasonable to want to define higher-dimensional forms in such a way so that

\[ dx^j \wedge dx^k = -dx^k \wedge dx^j. \]

Indeed, if this is the case, and $\sigma$ is a permutation of $(1,m)$, then we must have the alternating commutation relation,

\[ dx^{j_1} \wedge \cdots \wedge dx^{j_m} = (\operatorname{sgn} \sigma)\, dx^{j_{\sigma 1}} \wedge \cdots \wedge dx^{j_{\sigma m}}. \]

We have already run into this type of behavior in our study of determinants in Section 6.6. Actually this should not be a very surprising development for us in view of the fact that formula (9.5.12) defines a surface integral in terms of determinants. If we want to demand also that $\forall \alpha, \beta \in R$,

\[ dx^{j_1} \wedge \cdots \wedge (\alpha\, dx^{j_r} + \beta\, dx^{j_s}) \wedge \cdots \wedge dx^{j_m} = \alpha [dx^{j_1} \wedge \cdots \wedge dx^{j_r} \wedge \cdots \wedge dx^{j_m}] + \beta [dx^{j_1} \wedge \cdots \wedge dx^{j_s} \wedge \cdots \wedge dx^{j_m}], \]

then we shall have a complete analogy with the alternating multilinear functionals used in the study of determinants.

Let us now make some of these things precise. The first thing we shall do is define an algebra. We have already come across the concept of a function algebra in Section 6.7.

9.6.1 Definition. A real associative algebra is a quadruple $(A, +, \wedge, \cdot)$, where the triple $(A, +, \cdot)$ is a real vector space and $\wedge$ is a function from $A \times A$ into $A$ that satisfies the conditions:

(a) $(x \wedge y) \wedge z = x \wedge (y \wedge z)$, $\forall x, y, z \in A$.

(b) $z \wedge (x + y) = (z \wedge x) + (z \wedge y)$ and $(x + y) \wedge z = (x \wedge z) + (y \wedge z)$, $\forall x, y, z \in A$.

(c) $\alpha \cdot (x \wedge y) = (\alpha \cdot x) \wedge y = x \wedge (\alpha \cdot y)$, $\forall \alpha \in R$ & $\forall x, y \in A$.

The algebra is said to be finite-dimensional $\Leftrightarrow$ the corresponding vector space is finite-dimensional.

As usual, we shall shorten the terminology and simply say that $A$ is an algebra. Also, we shall suppress the dot symbol and write '$\alpha x$' instead of '$\alpha \cdot x$.'

The question now arises as to whether it is possible to embed a real vector space into an algebra for which an alternating commutation relation holds. If this can be done, then by embedding the first-order differential forms into such an algebra we could give a meaning to higher-order differential forms. There is a standard way of doing this, and in view of our discussion before Definition 9.6.1, the definitions and constructions that follow should not be too surprising.

Let $V$ be a finite-dimensional vector space and let $M^k(V)$ be the set of all multilinear functionals with domain the $k$-fold Cartesian product of $V$ with itself. Under the usual definitions of addition of functions and multiplication by elements in $R$, $M^k(V)$ becomes a vector space. If $\lambda \in M^p(V)$ and $\mu \in M^q(V)$, define

\[ \lambda \otimes \mu\, (v_1, \ldots, v_p, u_1, \ldots, u_q) = \lambda(v_1, \ldots, v_p)\, \mu(u_1, \ldots, u_q). \tag{9.6.1} \]

Also, we set $M^0(V) = R$, and if $\lambda \in M^0(V)$ we set $\lambda \otimes \mu = \lambda\mu$. It is quite clear that $\lambda \otimes \mu \in M^{p+q}(V)$. Note that it is not usually true that $\lambda \otimes \mu = \mu \otimes \lambda$. For historical reasons the elements of $M^k(V)$ are sometimes called covariant tensors of $V$ of order $k$.

We shall designate by $M(V)$ the collection of sequences $\lambda = (\lambda_n)$ so that $\forall k \in N_0$, $\lambda_k \in M^k(V)$. For every $\lambda$ and $\mu$ in $M(V)$, and $\forall \alpha \in R$, define

(a) $(\alpha\lambda)_n = \alpha\lambda_n$.

(b) $(\lambda + \mu)_n = \lambda_n + \mu_n$.

(c) $(\lambda \otimes \mu)_n = \sum_{k=0}^{n} \lambda_k \otimes \mu_{n-k}$.

With these operations it is a very simple matter to check that $M(V)$ becomes a real associative algebra. We shall leave the proofs of these simple facts for the exercises. The algebra $M(V)$ is usually called the covariant tensor algebra of $V$.

For our purposes the algebra $M(V)$ is much too large, since there are already too many elements in each $M^k(V)$ in the sense that they contain multilinear functionals that are not alternating. If $k \ge 1$ we designate by $A^k(V)$ that subspace of $M^k(V)$ which consists of all the alternating multilinear functionals in $M^k(V)$, and we set $A^0(V) = R$.

Now, if $\lambda \in A^p(V)$ and $\mu \in A^q(V)$, it is not usually true that $\lambda \otimes \mu$ is an alternating multilinear functional. Thus we define a linear transformation $\mathcal{A}_k$ with domain $M^k(V)$ and range $A^k(V)$ by means of the equations

\[ \mathcal{A}_k(\lambda) = \frac{1}{k!} \sum_{\sigma \in \pi_k} (\operatorname{sgn} \sigma)\, \sigma^*(\lambda), \quad k \ge 1, \qquad \mathcal{A}_0(\lambda) = \lambda, \tag{9.6.2} \]

where $\sigma^*(\lambda)(v_1, \ldots, v_k) = \lambda(v_{\sigma 1}, \ldots, v_{\sigma k})$, and where $\pi_k$ is the set of permutations of $(1,k)$ onto itself. If $k \ge 1$ and $\lambda \in A^k(V)$, then $\forall \sigma \in \pi_k$, $(\operatorname{sgn} \sigma)\, \sigma^*(\lambda) = \lambda$. Hence, since $\pi_k$ has $k!$ elements, it follows that $\lambda \in A^k(V) \Rightarrow \mathcal{A}_k(\lambda) = \lambda$. This is one reason for the normalizing factor $1/k!$.
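The operator of (9.6.2) is easy to experiment with. The sketch below is an illustration only: multilinear functionals are modeled as Python functions of $k$ vector arguments, the names `sign` and `alternate` are hypothetical, and exact rational arithmetic is used so that the normalizing factor $1/k!$ introduces no rounding.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    # Sign of a permutation given as a tuple of 0-based indices (by counting inversions).
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alternate(lam, k):
    # The functional A_k(lam) of (9.6.2), for a functional lam of k arguments.
    def alternated(*vs):
        total = sum(sign(p) * lam(*(vs[i] for i in p))
                    for p in permutations(range(k)))
        return Fraction(total, factorial(k))
    return alternated

if __name__ == "__main__":
    # A bilinear functional on the plane that is not alternating.
    lam = lambda u, v: u[0] * v[1]
    alt = alternate(lam, 2)
    u, v = (1, 2), (3, 5)
    print(alt(u, v), alt(v, u))        # opposite signs
    print(alternate(alt, 2)(u, v))     # applying A_2 again changes nothing
```

The last line illustrates the remark about the normalizing factor: once a functional is alternating, applying the operator again returns the same functional.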

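The formula (9.6.2) for $k \ge 1$ is analogous to the formula (6.6.5) and indeed is modeled on the latter formula. To show that $\mathcal{A}_k$ has range in $A^k(V)$, it is necessary to show that $\mathcal{A}_k(\lambda)$ is alternating. The proof is essentially the same as the proof of Theorem 6.6.4. However, we shall repeat it. If $\tau \in \pi_k$, then

\[ \tau^*(\mathcal{A}_k(\lambda)) = \frac{1}{k!} \sum_{\sigma \in \pi_k} (\operatorname{sgn} \sigma)\, (\tau \circ \sigma)^*(\lambda). \]

Let us set $\rho = \tau \circ \sigma \in \pi_k$. As $\sigma$ ranges over $\pi_k$, so does $\rho$. Also $\sigma = \tau^{-1} \circ \rho$ and thus $\operatorname{sgn} \sigma = \operatorname{sgn} \tau^{-1} \operatorname{sgn} \rho = \operatorname{sgn} \tau \operatorname{sgn} \rho$. Hence

\[ \tau^*(\mathcal{A}_k(\lambda)) = (\operatorname{sgn} \tau)\, \frac{1}{k!} \sum_{\rho \in \pi_k} (\operatorname{sgn} \rho)\, \rho^*(\lambda) = (\operatorname{sgn} \tau)\, \mathcal{A}_k(\lambda), \]

which proves our assertion that $\mathcal{A}_k(\lambda)$ is alternating.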
We now define a linear transformation $\mathcal{A}$ on $M(V)$ by setting

\[ (\mathcal{A}(\lambda))_n = \mathcal{A}_n(\lambda_n). \tag{9.6.3} \]

The range of $\mathcal{A}$ lies, of course, in $M(V)$.

Let $A(V)$ be the collection of sequences $\lambda = (\lambda_n)$ so that $\forall k \in N_0$, $\lambda_k \in A^k(V)$. For every $\lambda, \mu \in A(V)$, and $\forall \alpha \in R$, define

(a) $(\alpha\lambda)_n = \alpha\lambda_n$.

(b) $(\lambda + \mu)_n = \lambda_n + \mu_n$.

(c) $\lambda \wedge \mu = \mathcal{A}(\lambda \otimes \mu)$.

With these definitions it is not difficult to show that $A(V)$ becomes a real associative algebra. This algebra is sometimes called the covariant alternating tensor algebra of $V$, or the covariant Grassmann algebra of $V$, or the covariant exterior algebra of $V$. The element $\lambda \wedge \mu$ is called the exterior product of $\lambda$ and $\mu$.
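The exterior product just defined can be tried out on coordinate functionals. The following sketch is an illustration under stated assumptions: functionals are modeled as Python functions, the helper names (`sign`, `alternate`, `tensor`, `wedge`, `coord`) are hypothetical, and the order of each functional is passed explicitly. It builds the product as the alternation of the tensor product, as in item (c), using the normalizing factor $1/k!$ of (9.6.2).

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    # Sign of a permutation given as a tuple of 0-based indices.
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alternate(lam, k):
    # The alternation operator of (9.6.2) applied to a functional of k arguments.
    def alternated(*vs):
        total = sum(sign(p) * lam(*(vs[i] for i in p))
                    for p in permutations(range(k)))
        return Fraction(total, factorial(k))
    return alternated

def tensor(lam, p, mu, q):
    # The product of (9.6.1): lam eats the first p vectors, mu the last q.
    return lambda *vs: lam(*vs[:p]) * mu(*vs[p:p + q])

def wedge(lam, p, mu, q):
    # Exterior product: alternation of the tensor product.
    return alternate(tensor(lam, p, mu, q), p + q)

def coord(j):
    # The j-th coordinate functional on vectors given as tuples.
    return lambda v: v[j]

if __name__ == "__main__":
    l1, l2, l3 = coord(0), coord(1), coord(2)
    u, v = (1, 2, 3), (4, 5, 7)
    print(wedge(l1, 1, l2, 1)(u, v), wedge(l2, 1, l1, 1)(u, v))
    e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    left = wedge(wedge(l1, 1, l2, 1), 2, l3, 1)
    right = wedge(l1, 1, wedge(l2, 1, l3, 1), 2)
    print(left(*e), right(*e))
```

For two first-order functionals the product changes sign when the factors are exchanged, and for the three coordinate functionals the two ways of bracketing give the same value on the standard basis.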

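As a step in showing that the operation $\wedge$ is associative, it is first convenient to prove a lemma.

9.6.2 Lemma. For every $\lambda$ and $\mu$ in $M(V)$,

\[ \mathcal{A}(\mathcal{A}(\lambda) \otimes \mu) = \mathcal{A}(\lambda \otimes \mu) = \mathcal{A}(\lambda \otimes \mathcal{A}(\mu)). \]

Proof. From the definition of $\mathcal{A}$ given by (9.6.3) and the definition of $\lambda \otimes \mu$, it is enough to prove that $\forall p, q \in N_0$, $\forall \lambda \in M^p(V)$, and $\forall \mu \in M^q(V)$ we have

\[ \mathcal{A}_{p+q}(\mathcal{A}_p(\lambda) \otimes \mu) = \mathcal{A}_{p+q}(\lambda \otimes \mu) = \mathcal{A}_{p+q}(\lambda \otimes \mathcal{A}_q(\mu)). \]

Let us consider $\pi_p$ as a subset of $\pi_{p+q}$ in the sense that every $\sigma \in \pi_p$ is identified with that element of $\pi_{p+q}$ which leaves invariant all elements in $(1, p+q) \setminus (1, p)$ and whose restriction to $(1, p)$ is $\sigma$. Then we may write

\[ \mathcal{A}_{p+q}(\mathcal{A}_p(\lambda) \otimes \mu) = \frac{1}{(p+q)!} \sum_{\tau \in \pi_{p+q}} (\operatorname{sgn} \tau)\, \frac{1}{p!} \sum_{\sigma \in \pi_p} (\operatorname{sgn} \sigma)\, (\tau \circ \sigma)^*(\lambda \otimes \mu) \]

\[ = \frac{1}{(p+q)!}\, \frac{1}{p!} \sum_{\sigma \in \pi_p} (\operatorname{sgn} \sigma) \sum_{\tau \in \pi_{p+q}} (\operatorname{sgn} \tau)\, (\tau \circ \sigma)^*(\lambda \otimes \mu). \]

Now for fixed $\sigma \in \pi_p$, as $\tau$ runs over $\pi_{p+q}$, $\tau \circ \sigma$ runs over $\pi_{p+q}$. Hence

\[ \sum_{\tau \in \pi_{p+q}} (\operatorname{sgn} \tau)\, (\tau \circ \sigma)^*(\lambda \otimes \mu) = \operatorname{sgn} \sigma \sum_{\rho \in \pi_{p+q}} (\operatorname{sgn} \rho)\, \rho^*(\lambda \otimes \mu). \]

Since there are $p!$ elements in $\pi_p$, it follows that

\[ \mathcal{A}_{p+q}(\mathcal{A}_p(\lambda) \otimes \mu) = \frac{1}{(p+q)!} \sum_{\rho \in \pi_{p+q}} (\operatorname{sgn} \rho)\, \rho^*(\lambda \otimes \mu) = \mathcal{A}_{p+q}(\lambda \otimes \mu). \]

A similar computation shows the second equality.

REMARK. Notice that the normalizing factor $1/k!$ used in the definition of $\mathcal{A}_k$ is essential for the validity of the last lemma.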

9.6.3 Theorem. The tensor algebra $A(V)$ is a real associative algebra which contains $A^k(V)$ (isomorphically) $\forall k \in N_0$. If $V$ has dimension $n$, then $A(V)$ has dimension $2^n$ and $A^k(V) = \{0\}$, $\forall k > n$. Further, $\forall \alpha \in R$, and $\forall \lambda \in A(V)$,

\[ \alpha \wedge \lambda = \alpha\lambda = \lambda \wedge \alpha \tag{9.6.4} \]

and $\forall \lambda \in A^p(V)$ and $\forall \mu \in A^q(V)$,

\[ \lambda \wedge \mu = (-1)^{pq}\, \mu \wedge \lambda. \tag{9.6.5} \]

Proof. The fact that $A(V)$ satisfies all the conditions for a real associative algebra can be very easily checked. Actually the only thing that may cause some difficulty is the proof of associativity. However, this is an immediate consequence of the last lemma. Indeed, using the fact that $M(V)$ is an associative algebra we get

\[ (\lambda \wedge \mu) \wedge \eta = \mathcal{A}((\lambda \wedge \mu) \otimes \eta) = \mathcal{A}(\mathcal{A}(\lambda \otimes \mu) \otimes \eta) = \mathcal{A}(\lambda \otimes \mu \otimes \eta) = \mathcal{A}(\lambda \otimes \mathcal{A}(\mu \otimes \eta)) = \lambda \wedge (\mu \wedge \eta). \]

We shall leave as an exercise for the reader the proofs of the other facts needed to establish that $A(V)$ is an algebra.

The statement that $A^k(V)$ is contained isomorphically in $A(V)$ means that there is a nonsingular linear transformation with domain $A^k(V)$ and range in $A(V)$. Since this fact is almost obvious, we shall leave it for the reader.

It remains to prove the second and third statements of the theorem. The formula (9.6.4) is so easy to prove that we leave it to the reader. In proving the other things we shall proceed in a somewhat informal manner, because the formal proofs of many of the things we say would require an induction argument, and this would get rather tedious. However, the induction arguments are not difficult and the reader wishing to do so can easily fill in the formal details.

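Let $\{e_k : k \in (1,n)\}$ be a basis for $V$. For every $j \in (1,n)$, let $\lambda^j \in \Lambda^1(V)$ be that linear functional so that $\forall k \in (1,n)$,

\[
\lambda^j(e_k) = \begin{cases} 0, & j \neq k, \\ 1, & j = k. \end{cases}
\]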

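We shall first prove that $\forall k \in (1,n)$ the set of multilinear functionals $\{\lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k} : 1 \le j_1 < \cdots < j_k \le n\}$ generates $\Lambda^k(V)$. First, writing $v_i^{\,j} = \lambda^j(v_i)$ for the $j$th coordinate of $v_i$, we have

\[
\lambda^{j_1} \cdots \lambda^{j_k}(v_1, \dots, v_k) = \lambda^{j_1}(v_1) \cdots \lambda^{j_k}(v_k) = v_1^{\,j_1} \cdots v_k^{\,j_k}.
\]

Thus if $\sigma \in \pi_k$ we have

\[
\sigma^*(\lambda^{j_1} \cdots \lambda^{j_k})(v_1, \dots, v_k) = v_{\sigma 1}^{\,j_1} \cdots v_{\sigma k}^{\,j_k}.
\]

Hence, by use of Lemma 9.6.2,

\[
\lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k}(v_1, \dots, v_k) = \mathcal{A}_k(\lambda^{j_1} \cdots \lambda^{j_k})(v_1, \dots, v_k) = \frac{1}{k!} \sum_{\sigma \in \pi_k} (\operatorname{sgn}\sigma)\, v_{\sigma 1}^{\,j_1} \cdots v_{\sigma k}^{\,j_k}.
\]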

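Now, if $\lambda \in \Lambda^k(V)$, then since $\lambda$ is multilinear,

\[
\begin{aligned}
\lambda(v_1, \dots, v_k)
&= \sum_{j_1=1}^{n} \cdots \sum_{j_k=1}^{n} v_1^{\,j_1} \cdots v_k^{\,j_k}\, \lambda(e_{j_1}, \dots, e_{j_k}) \\
&= \sum_{[j]} k!\, \lambda(e_{j_1}, \dots, e_{j_k})\, \frac{1}{k!} \sum_{\sigma \in \pi_k} (\operatorname{sgn}\sigma)\, v_{\sigma 1}^{\,j_1} \cdots v_{\sigma k}^{\,j_k} \\
&= \sum_{[j]} k!\, \lambda(e_{j_1}, \dots, e_{j_k})\, \lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k}(v_1, \dots, v_k),
\end{aligned}
\tag{9.6.6}
\]

where $[j]$ means we are summing over all ordered $k$-tuples $(j_1, \dots, j_k)$ for which $1 \le j_1 < \cdots < j_k \le n$. Consequently, we have shown what we set out to prove: The set $\{\lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k} : 1 \le j_1 < \cdots < j_k \le n\}$ generates $\Lambda^k(V)$.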

Next let us prove that this is a linearly independent set. Indeed suppose

\[
\sum_{[j]} \alpha_{j_1, \dots, j_k}\, \lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k} = 0.
\]

Then evaluating both sides at the $k$-tuple $(e_{i_1}, \dots, e_{i_k})$ we see that $\alpha_{i_1, \dots, i_k} = 0$. Thus the given set is a basis for $\Lambda^k(V)$, and consequently this vector space has dimension $n!/k!\,(n-k)!$.

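Note now that by the very definition of $\Lambda(V)$, if

\[
\sum_{k=1}^{n} \sum_{[j]} \alpha_{j_1, \dots, j_k}\, \lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k} = 0,
\]

then $\forall k \in (1,n)$,

\[
\sum_{[j]} \alpha_{j_1, \dots, j_k}\, \lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k} = 0.
\]

Thus, if it is true that $\forall k > n$, $\Lambda^k(V) = \{0\}$, then $\Lambda(V)$ has the dimension

\[
\sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!} = (1+1)^n = 2^n.
\]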

The fact that $\forall k > n$, $\Lambda^k(V) = \{0\}$ is very easy to prove. Indeed, if $k > n$, then in every set $\{e_{j_i} : i \in (1,k)\ \&\ j_i \in (1,n)\}$ at least two of the elements must be the same, since there are only $n$ elements in a basis for $V$. Suppose $e_{j_r} = e_{j_s}$; then since every $\lambda \in \Lambda^k(V)$ is alternating, if we permute $j_r$ and $j_s$ and leave the other indices fixed we get

\[
\lambda(e_{j_1}, \dots, e_{j_k}) = -\lambda(e_{j_1}, \dots, e_{j_k}).
\]

From this it follows that $\lambda(e_{j_1}, \dots, e_{j_k}) = 0$. If we now use the expansion for $\lambda$ given by the first equality in (9.6.6), we get $\lambda(v_1, \dots, v_k) = 0$.

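To finish the proof of the theorem it remains to prove the relation (9.6.5). We may as well assume that $p + q \le n$; otherwise the relation is trivial. From the formula (9.6.6) we get

\[
\lambda = \sum_{[j]} \alpha_{j_1, \dots, j_p}\, \lambda^{j_1} \wedge \cdots \wedge \lambda^{j_p}, \qquad
\zeta = \sum_{[i]} \beta_{i_1, \dots, i_q}\, \lambda^{i_1} \wedge \cdots \wedge \lambda^{i_q},
\]

where recall that by the symbol $[j]$ we mean we are summing over all $p$-tuples of integers $(j_1, \dots, j_p)$, where $1 \le j_1 < \cdots < j_p \le n$, and by $[i]$ we mean we are summing over all $q$-tuples with $1 \le i_1 < \cdots < i_q \le n$. Hence we get

\[
\lambda \wedge \zeta = \sum_{[j]} \sum_{[i]} \alpha_{j_1, \dots, j_p}\, \beta_{i_1, \dots, i_q}\, \lambda^{j_1} \wedge \cdots \wedge \lambda^{j_p} \wedge \lambda^{i_1} \wedge \cdots \wedge \lambda^{i_q}.
\]

Because of the alternating property, $\lambda^j \wedge \lambda^k = -\lambda^k \wedge \lambda^j$, we have

\[
\lambda^{j_1} \wedge \cdots \wedge \lambda^{j_p} \wedge \lambda^{i_1} \wedge \cdots \wedge \lambda^{i_q} = (-1)^{pq}\, \lambda^{i_1} \wedge \cdots \wedge \lambda^{i_q} \wedge \lambda^{j_1} \wedge \cdots \wedge \lambda^{j_p},
\]

since each of the $q$ factors $\lambda^{i_r}$ must be moved past $p$ factors. This gives (9.6.5).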
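The algebraic rules just proved can be spot-checked on a small example. The sketch below is not from the text: it assumes a $k$-linear functional on $E^n$ is stored as a dictionary of its values on basis vectors, and the helper names `tensor`, `alt`, and `wedge` are hypothetical. It uses the same $1/k!$ normalization as the alternation operator above and exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def sign(perm):
    """Sign of a permutation of (0, ..., k-1), computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def tensor(s, t):
    """Tensor product: (st)(v_1..v_{p+q}) = s(v_1..v_p) * t(v_{p+1}..v_{p+q})."""
    return {i + j: a * b for i, a in s.items() for j, b in t.items()}


def alt(t, k, n):
    """Alternation of a k-linear functional, with the 1/k! normalizing factor."""
    out = {}
    for idx in product(range(n), repeat=k):
        total = Fraction(0)
        for perm in permutations(range(k)):
            permuted = tuple(idx[p] for p in perm)
            total += sign(perm) * t.get(permuted, 0)
        if total:
            out[idx] = total / factorial(k)
    return out


def wedge(s, p, t, q, n):
    """Exterior product of an alternating p-form s and q-form t on E^n."""
    return alt(tensor(s, t), p + q, n)


def negate(t):
    return {idx: -value for idx, value in t.items()}


# Three one-forms on E^3, given by their values on e_1, e_2, e_3 (0-based keys).
a = {(0,): 1, (1,): 2}
b = {(1,): 1, (2,): -1}
c = {(0,): 3, (2,): 1}
ab = wedge(a, 1, b, 1, 3)
```

With these values, `wedge(a, 1, b, 1, 3) == negate(wedge(b, 1, a, 1, 3))` illustrates (9.6.5) for $p = q = 1$, and `wedge(ab, 2, c, 1, 3) == wedge(a, 1, wedge(b, 1, c, 1, 3), 2, 3)` illustrates associativity.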


We shall single out the following statement, which is actually a
corollary of the previous proof. It will be of some importance when we
discuss differential forms.


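9.6.4 Corollary. Let $V$ be a real vector space of dimension $n$, $\{e_k : k \in (1,n)\}$ a basis for $V$, and $\lambda^j$ that linear functional with domain $V$ so that

\[
\lambda^j(e_k) = \begin{cases} 0, & j \neq k, \\ 1, & j = k. \end{cases}
\]

Then the set $\{\lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k} : 1 \le j_1 < \cdots < j_k \le n\}$ is a basis for $\Lambda^k(V)$, $k > 0$, which in turn implies that the set consisting of the number 1 together with

\[
\{\lambda^{j_1} \wedge \cdots \wedge \lambda^{j_k} : 1 \le j_1 < \cdots < j_k \le n,\ k \in (1,n)\}
\]

is a basis for $\Lambda(V)$.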
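The counting behind the corollary is easy to reproduce. The following sketch (an illustration only; the function names are hypothetical) enumerates the increasing index tuples that label the basis elements and confirms the dimensions $n!/k!\,(n-k)!$ and $2^n$ for a given $n$.

```python
from itertools import combinations
from math import comb


def basis_indices(n, k):
    """All increasing k-tuples (j_1, ..., j_k) with 1 <= j_1 < ... < j_k <= n."""
    return list(combinations(range(1, n + 1), k))


def binomial_dimension(n, k):
    """The closed form n! / (k! (n - k)!) for the dimension of the k-th piece."""
    return comb(n, k)


def exterior_algebra_dimension(n):
    """Total number of basis elements; k = 0 contributes the single element 1."""
    return sum(len(basis_indices(n, k)) for k in range(n + 1))
```

For $n = 3$ and $k = 2$, `basis_indices(3, 2)` lists the tuples `(1, 2)`, `(1, 3)`, `(2, 3)`, and for $k > n$ the list is empty, in agreement with $\Lambda^k(V) = \{0\}$.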

We are now in a position to give a definition of a $k$th-order differential form. If the reader will review Definition 9.1.5 he will see that the following definition is a direct generalization of the concept of a first-order differential form.

9.6.5 Definition. A $k$th-order differential form $\omega$ is a function with domain in $E^n$, $n \ge k$, and whose range is in $\Lambda^k(E^n)$.

The linear functionals in the set $\{dx^j : j \in (1,n)\}$ have the properties of the functionals in the set $\{\lambda^j : j \in (1,n)\}$ of the previous corollary with respect to the basis $\{e_j : j \in (1,n)\}$ for $E^n$, where, of course, $e_j$ is that vector whose $j$th component is 1 and all of whose other components are zero. Thus, from formula (9.6.6), $\forall x \in \mathcal{D}(\omega)$ we may write

\[
\omega(x) = \sum_{[j]} \omega_{j_1, \dots, j_k}(x)\, dx^{j_1} \wedge \cdots \wedge dx^{j_k},
\tag{9.6.7}
\]

where we recall again that $[j]$ means we are summing over all ordered $k$-tuples $(j_1, \dots, j_k)$ for which $1 \le j_1 < \cdots < j_k \le n$. To see how the right side of (9.6.7) changes if we choose a different basis for $E^n$, we first prove the following proposition.

9.6.6 Proposition. Suppose $f$ is a function of class $C^1$ with domain (an open set) in $E^n$ and range in $E^q$, $q \le n$. Suppose further that $g$ is a function of class $C^1$ with domain in $E^n$ and range in $\mathcal{D}(f)$. Then

\[
df^1 \circ g(x) \wedge \cdots \wedge df^q \circ g(x) = \sum_{[j]} \frac{\partial(f^1, \dots, f^q)}{\partial(t^{j_1}, \dots, t^{j_q})}(g(x))\, dg^{j_1}(x) \wedge \cdots \wedge dg^{j_q}(x),
\tag{9.6.8}
\]

where

\[
\frac{\partial(f^1, \dots, f^q)}{\partial(t^{j_1}, \dots, t^{j_q})}(g(x)) = \det[D_{j_i} f^k(g(x))].
\]

Proof. Using the chain rule we may write

\[ df^k \circ g(x) = df^k(g(x)) \circ dg(x). \]

Also, we have

\[ df^k(g(x)) = \sum_{j=1}^{n} D_j f^k(g(x))\, dt^j. \]

Hence, since $dt^j \circ dg(x) = dg^j(x)$,

\[ df^k(g(x)) \circ dg(x) = \sum_{j=1}^{n} D_j f^k(g(x))\, dg^j(x). \]

Thus

\[
df^1 \circ g(x) \wedge \cdots \wedge df^q \circ g(x) = \sum_{j_1=1}^{n} \cdots \sum_{j_q=1}^{n} \prod_{k=1}^{q} D_{j_k} f^k(g(x))\, dg^{j_1}(x) \wedge \cdots \wedge dg^{j_q}(x).
\]

If $(j_1, \dots, j_q)$ is a $q$-tuple of integers any two of which are the same, then we have

\[ dg^{j_1}(x) \wedge \cdots \wedge dg^{j_q}(x) = 0. \]

On the other hand, if $(j_1, \dots, j_q)$ is a $q$-tuple of integers so that $1 \le j_1 < \cdots < j_q \le n$, let $\pi[j]$ be the set of permutations of the set $\{j_k : k \in (1,q)\}$. Any two different sets in the collection of sets of $q$-tuples

\[ \{(\sigma j_1, \dots, \sigma j_q) : \sigma \in \pi[j]\} \]

are disjoint. Moreover, from the alternating property of multiplication in $\Lambda(E^n)$, if $\sigma \in \pi[j]$ we have

\[
dg^{\sigma j_1}(x) \wedge \cdots \wedge dg^{\sigma j_q}(x) = \operatorname{sgn}\sigma\; dg^{j_1}(x) \wedge \cdots \wedge dg^{j_q}(x).
\]

Thus we get

\[
df^1 \circ g(x) \wedge \cdots \wedge df^q \circ g(x) = \sum_{[j]} \Big[ \sum_{\sigma \in \pi[j]} \operatorname{sgn}\sigma \prod_{k=1}^{q} D_{\sigma j_k} f^k(g(x)) \Big]\, dg^{j_1}(x) \wedge \cdots \wedge dg^{j_q}(x).
\]

However, the summation in brackets is precisely

\[ \frac{\partial(f^1, \dots, f^q)}{\partial(t^{j_1}, \dots, t^{j_q})}(g(x)). \]

This establishes the formula (9.6.8).


Let us use Proposition 9.6.6 to show how the right side of (9.6.7) changes when we change the basis of $E^n$. Suppose that $\{e'_k : k \in (1,n)\}$ is any basis for $E^n$, and $\forall a \in E^n$ let us write

\[
a = \sum_{k=1}^{n} a^k e_k = \sum_{k=1}^{n} \alpha^k e'_k.
\]


For every $j \in (1,n)$ let $y^j$ be that function defined on $E^n$ by the equation

\[ y^j(a) = \alpha^j. \]

Now, $\forall x \in E^n$, a simple calculation shows that

\[ dy^j(x)(a) = D_a y^j(x) = \alpha^j. \]

Hence $dy^j(x)$ is independent of $x$ and we denote it simply by $dy^j$. Now the set $\{dy^j : j \in (1,n)\}$ has the properties of the set $\{\lambda^j : j \in (1,n)\}$ of Corollary 9.6.4 with respect to the basis $\{e'_j : j \in (1,n)\}$. Thus from formula (9.6.6) we may write

\[
\omega(x) = \sum_{[j]} \omega'_{j_1, \dots, j_k}(x)\, dy^{j_1} \wedge \cdots \wedge dy^{j_k}.
\tag{9.6.7'}
\]

Let $g$ be that linear transformation acting from $E^n$ onto itself so that $\forall j \in (1,n)$, $g(e_j) = e'_j$. Let us define the function $y$ by means of the equation

\[ y(a) = \sum_{j=1}^{n} y^j(a)\, e_j. \]

Then we have

\[ g \circ y(a) = \sum_{j=1}^{n} y^j(a)\, e'_j = a. \]

Thus we see that $g$ is the inverse of $y$ and

\[ g^j \circ y(a) = a^j = x^j(a). \]

Thus we may apply formula (9.6.8) of Proposition 9.6.6 and we see that

\[
dx^{j_1} \wedge \cdots \wedge dx^{j_q} = \sum_{[i]} \frac{\partial(g^{j_1}, \dots, g^{j_q})}{\partial(t^{i_1}, \dots, t^{i_q})}\, dy^{i_1} \wedge \cdots \wedge dy^{i_q}.
\tag{9.6.9}
\]

Since $g$ is a linear transformation, the Jacobian

\[ \frac{\partial(g^{j_1}, \dots, g^{j_q})}{\partial(t^{i_1}, \dots, t^{i_q})} \]

is independent of the points in $E^n$, and thus we have deemed it unnecessary to evaluate it at any point.
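If we use the formula (9.6.9) in (9.6.7) we get

\[
\omega(x) = \sum_{[i]} \Big[ \sum_{[j]} \omega_{j_1, \dots, j_q}(x)\, \frac{\partial(g^{j_1}, \dots, g^{j_q})}{\partial(t^{i_1}, \dots, t^{i_q})} \Big]\, dy^{i_1} \wedge \cdots \wedge dy^{i_q}.
\]

Now, by Corollary 9.6.4, the set $\{dy^{i_1} \wedge \cdots \wedge dy^{i_q} : 1 \le i_1 < \cdots < i_q \le n\}$ is linearly independent. Hence if we compare the equation above with equation (9.6.7') we get

\[
\omega'_{i_1, \dots, i_q}(x) = \sum_{[j]} \omega_{j_1, \dots, j_q}(x)\, \frac{\partial(g^{j_1}, \dots, g^{j_q})}{\partial(t^{i_1}, \dots, t^{i_q})}.
\tag{9.6.10}
\]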

The transformation formula (9.6.10) is in itself not of great importance. However, it does help to show that the following definition is quite independent of the coordinate system we choose.

9.6.7 Definition. If $\omega$ is a $k$th-order differential form with domain (an open set) in $E^n$, $n \ge k$, then $\omega$ is said to be of class $C^m$ $\Leftrightarrow$ each function $\omega_{j_1, \dots, j_k}$ which appears in (9.6.7) is of class $C^m$. The differential of a $k$th-order differential form of class $C^1$ is defined by

\[
d\omega(x) = \sum_{[j]} d\omega_{j_1, \dots, j_k}(x) \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_k}.
\tag{9.6.11}
\]

Let us now establish what we mentioned before the last definition:


The class to which a differential form belongs and the definition of
dw(x) are quite independent of the coordinate system used.

9.6.8 Proposition. Suppose $\omega$ is a $q$th-order differential form with an expansion given by (9.6.7'). Then $\omega$ is of class $C^m$ $\Leftrightarrow$ each function $\omega'_{j_1, \dots, j_q}$ is of class $C^m$, and moreover if $\omega$ is of class $C^1$, then

\[
d\omega(x) = \sum_{[j]} d\omega'_{j_1, \dots, j_q}(x) \wedge dy^{j_1} \wedge \cdots \wedge dy^{j_q}.
\]

Proof. If $\omega_{j_1, \dots, j_q}$ is of class $C^m$, it follows from formula (9.6.10) that $\omega'_{j_1, \dots, j_q}$ is of class $C^m$. Conversely, since there is clearly a formula similar to (9.6.10) in which $\omega_{j_1, \dots, j_q}$ can be written in terms of $\omega'_{j_1, \dots, j_q}$, it follows that if the latter functions are of class $C^m$, so are the former.

From formula (9.6.10) we get

\[
d\omega'_{i_1, \dots, i_q}(x) = \sum_{[j]} \frac{\partial(g^{j_1}, \dots, g^{j_q})}{\partial(t^{i_1}, \dots, t^{i_q})}\, d\omega_{j_1, \dots, j_q}(x).
\]

Note that we are not using Definition 9.6.7 here, since we are only dealing with the differentials of real-valued functions as developed in Chapter 7. Hence, by use of formula (9.6.9) we get

\[
\begin{aligned}
\sum_{[i]} d\omega'_{i_1, \dots, i_q}(x) \wedge dy^{i_1} \wedge \cdots \wedge dy^{i_q}
&= \sum_{[j]} d\omega_{j_1, \dots, j_q}(x) \wedge \sum_{[i]} \frac{\partial(g^{j_1}, \dots, g^{j_q})}{\partial(t^{i_1}, \dots, t^{i_q})}\, dy^{i_1} \wedge \cdots \wedge dy^{i_q} \\
&= \sum_{[j]} d\omega_{j_1, \dots, j_q}(x) \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_q} \\
&= d\omega(x).
\end{aligned}
\]

This completes the proof.


Before we proceed, let us pause for just a moment and look at some
of the things we discussed about line integrals in terms of the language
of differentials of differential forms. First, let us look at a first-order
differential form of class $C^1$ whose domain is in $E^2$. We shall write this as

\[ \omega(x) = \omega_1(x)\, dx^1 + \omega_2(x)\, dx^2. \]

By definition, the differential of $\omega$ is

\[ d\omega(x) = d\omega_1(x) \wedge dx^1 + d\omega_2(x) \wedge dx^2. \]

Since $d\omega_k(x) = D_1\omega_k(x)\, dx^1 + D_2\omega_k(x)\, dx^2$, if we use the alternating property for the exterior product of differentials, we get

\[ d\omega(x) = [D_1\omega_2(x) - D_2\omega_1(x)]\, dx^1 \wedge dx^2. \]

We have already seen the coefficient of the right side in Section 9.3.
Suppose $\psi$ is a function of class $C^2$ whose open domain contains the interval $I = [0,1] \times [0,1]$, and let $\varphi = \psi|I$. Suppose that $\mathcal{R}(\varphi)$ is contained in the domain of $\omega$. According to Definition 9.5.4, we have

\[
\int_{\varphi} d\omega = \int_{I} [D_1\omega_2(\varphi(x)) - D_2\omega_1(\varphi(x))]\, J_\varphi(x)\, dx.
\]

If $\forall x \in I^0$, $J_\varphi(x) > 0$ and if $\varphi|I^0$ is one to one, then by the transformation theorem for integrals the integral on the right is

\[
\int_{\varphi(I)} [D_1\omega_2(x) - D_2\omega_1(x)]\, dx.
\]

Hence, Stokes' theorem, 9.3.1, takes the form

\[ \int_{\varphi} d\omega = \int_{\partial\varphi} \omega, \]

where $\partial\varphi$ is another symbol for $\varphi \circ \gamma$ in formula (9.3.12).

If $\omega$ is a first-order differential form whose domain is in $E^n$, then

\[ \omega(x) = \sum_{k=1}^{n} \omega_k(x)\, dx^k. \]

If $\omega$ has an open domain and is of class $C^1$, then

\[
d\omega(x) = \sum_{j<k} [D_j\omega_k(x) - D_k\omega_j(x)]\, dx^j \wedge dx^k.
\]

Recall that a first-order differential form is called closed $\Leftrightarrow$ $\forall x \in \mathcal{D}(\omega)$ & $\forall j, k \in (1,n)$, $D_j\omega_k(x) = D_k\omega_j(x)$. From the previous formula we see that a first-order differential form is closed $\Leftrightarrow$ $\forall x \in \mathcal{D}(\omega)$, $d\omega(x) = 0$.
9.6.9 Proposition. (a) If $\omega$ and $\nu$ are $k$th-order differential forms of class $C^1$, then on their common domain

\[ d(\omega + \nu)(x) = d\omega(x) + d\nu(x). \tag{9.6.12} \]

(b) If $\omega$ is a $k$th-order differential form of class $C^2$, then

\[ d^2\omega(x) = 0. \tag{9.6.13} \]

(c) If $\omega$ and $\nu$ are differential forms of class $C^1$ of orders $p$ and $q$, respectively, then on their common domain

\[ d(\omega \wedge \nu)(x) = d\omega(x) \wedge \nu(x) + (-1)^p\, \omega(x) \wedge d\nu(x). \tag{9.6.14} \]

Proof. The statement (a) is an immediate consequence of the definition of the differential operator and there is no need to comment further.
To prove (b) we write

\[
d\omega(x) = \sum_{[j]} \sum_{r=1}^{n} D_r\omega_{j_1, \dots, j_k}(x)\, dx^r \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_k}.
\]

Hence

\[
d^2\omega(x) = \sum_{[j]} \sum_{r=1}^{n} \sum_{s=1}^{n} D_s D_r\omega_{j_1, \dots, j_k}(x)\, dx^s \wedge dx^r \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_k}.
\]

Since $dx^s \wedge dx^r = -dx^r \wedge dx^s$ and $D_s D_r\omega_{j_1, \dots, j_k}(x) = D_r D_s\omega_{j_1, \dots, j_k}(x)$, the conclusion of part (b) is immediate.

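To prove part (c), because of the linearity property (9.6.12) it is enough to suppose that

\[
\omega(x) = f(x)\, dx^{j_1} \wedge \cdots \wedge dx^{j_p}, \qquad
\nu(x) = g(x)\, dx^{i_1} \wedge \cdots \wedge dx^{i_q},
\]

where $f$ and $g$ are real-valued functions of class $C^1$. Hence

\[
\begin{aligned}
d(\omega \wedge \nu)(x)
&= d(fg)(x) \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_p} \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_q} \\
&= g(x) \sum_{k=1}^{n} D_k f(x)\, dx^k \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_p} \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_q} \\
&\quad + f(x) \sum_{k=1}^{n} D_k g(x)\, dx^k \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_p} \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_q} \\
&= d\omega(x) \wedge \nu(x) + f(x) \sum_{k=1}^{n} D_k g(x)\, dx^k \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_p} \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_q}.
\end{aligned}
\]

Now, from the alternating commutation relation

\[ dx^j \wedge dx^k = -dx^k \wedge dx^j, \]

we get

\[
dx^k \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_p} \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_q} = (-1)^p\, dx^{j_1} \wedge \cdots \wedge dx^{j_p} \wedge dx^k \wedge dx^{i_1} \wedge \cdots \wedge dx^{i_q}.
\]

Hence

\[ d(\omega \wedge \nu)(x) = d\omega(x) \wedge \nu(x) + (-1)^p\, \omega(x) \wedge d\nu(x). \]

This completes the proof.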
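For forms with polynomial coefficients, the definition (9.6.11) can be carried out mechanically. The sketch below is an illustration under stated assumptions, not part of the text: a polynomial is stored as a dictionary from exponent tuples to coefficients, a form as a dictionary from increasing (0-based) index tuples to polynomials, and the names `pdiff`, `padd`, `pscale`, and `d` are hypothetical.

```python
def pdiff(poly, r):
    """Partial derivative D_r of a polynomial stored as {exponents: coefficient}."""
    out = {}
    for exps, c in poly.items():
        if exps[r] > 0:
            lowered = list(exps)
            lowered[r] -= 1
            out[tuple(lowered)] = c * exps[r]
    return out


def padd(p, q):
    """Sum of two polynomials, dropping terms whose coefficient cancels to zero."""
    out = dict(p)
    for exps, c in q.items():
        out[exps] = out.get(exps, 0) + c
    return {exps: c for exps, c in out.items() if c != 0}


def pscale(p, s):
    return {exps: s * c for exps, c in p.items() if s * c != 0}


def d(form, n):
    """Exterior derivative of {increasing index tuple: polynomial} on E^n."""
    out = {}
    for idx, poly in form.items():
        for r in range(n):
            if r in idx:
                continue  # dx^r ^ dx^r = 0
            # Sorting dx^r into place costs one sign change per smaller index.
            pos = sum(1 for j in idx if j < r)
            new_idx = tuple(sorted(idx + (r,)))
            term = pscale(pdiff(poly, r), (-1) ** pos)
            merged = padd(out.get(new_idx, {}), term)
            if merged:
                out[new_idx] = merged
            else:
                out.pop(new_idx, None)
    return out


# omega = x*y dx^1 + x^3 dx^2 on E^2 (exponents are listed as (power of x, power of y)).
omega = {(0,): {(1, 1): 1}, (1,): {(3, 0): 1}}
# f = x^2*y*z + y^3, a zeroth-order form on E^3.
f = {(): {(2, 1, 1): 1, (0, 3, 0): 1}}
```

Here `d(omega, 2)` has the single coefficient $3x^2 - x = D_1\omega_2 - D_2\omega_1$ on $dx^1 \wedge dx^2$, and `d(d(f, 3), 3)` is the empty dictionary, an instance of (9.6.13).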


NOTE: In the previous proposition we have used the following definitions. If $\omega$ and $\nu$ are differential forms, then on their common domain

\[
(\omega + \nu)(x) = \omega(x) + \nu(x), \qquad
(\omega \wedge \nu)(x) = \omega(x) \wedge \nu(x).
\]


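Suppose now that $\varphi$ is a function of class $C^1$. We shall set

\[
\varphi^*\omega(x) = \omega \circ \varphi(x) = \sum_{[j]} \omega_{j_1, \dots, j_q} \circ \varphi(x)\; dx^{j_1} \circ \varphi(x) \wedge \cdots \wedge dx^{j_q} \circ \varphi(x).
\tag{9.6.15}
\]

We shall leave it as an exercise for the reader to show that $\varphi^*\omega$ is independent of the particular representation we use for $\omega$.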

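9.6.10 Proposition. Suppose $\omega$ and $\nu$ are differential forms and $\varphi$ and $\psi$ are of class $C^1$. Then
(a) $\varphi^*(\omega + \nu) = \varphi^*\omega + \varphi^*\nu$.
(b) $\varphi^*(\omega \wedge \nu) = \varphi^*\omega \wedge \varphi^*\nu$.
(c) $(\psi \circ \varphi)^*\omega = \varphi^* \circ \psi^*\omega$.
(d) If $\omega$ is of class $C^1$, $\mathcal{R}(\varphi) \subset \mathcal{D}(\omega)$ and $\varphi$ is of class $C^2$, then $d(\varphi^*\omega) = \varphi^* d\omega$.

Proof.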

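The proofs of the first three statements are rather easy and we leave them as exercises. To prove (d) we first write

\[
\varphi^*\omega(x) = \sum_{[j]} \omega_{j_1, \dots, j_q} \circ \varphi(x)\; dx^{j_1} \circ \varphi(x) \wedge \cdots \wedge dx^{j_q} \circ \varphi(x).
\]

Thus from formulas (9.6.12), (9.6.13), and (9.6.14) we get

\[
d\varphi^*\omega(x) = \sum_{[j]} d\omega_{j_1, \dots, j_q} \circ \varphi(x) \wedge dx^{j_1} \circ \varphi(x) \wedge \cdots \wedge dx^{j_q} \circ \varphi(x).
\]

On the other hand, let us note that

\[
d\omega_{j_1, \dots, j_q}(x) = \sum_{k=1}^{n} D_k\omega_{j_1, \dots, j_q}(x)\, dx^k,
\]

and thus

\[
\begin{aligned}
\varphi^* d\omega_{j_1, \dots, j_q}(x)
&= \sum_{k=1}^{n} D_k\omega_{j_1, \dots, j_q}(\varphi(x))\; dx^k \circ \varphi(x) \\
&= \sum_{k=1}^{n} D_k\omega_{j_1, \dots, j_q}(\varphi(x))\; d\varphi^k(x) \\
&= d\omega_{j_1, \dots, j_q} \circ \varphi(x).
\end{aligned}
\]

Hence from (a) and (b) and the previous equality we get

\[
\begin{aligned}
\varphi^* d\omega(x)
&= \sum_{[j]} \varphi^* d\omega_{j_1, \dots, j_q}(x) \wedge \varphi^*[dx^{j_1} \wedge \cdots \wedge dx^{j_q}] \\
&= \sum_{[j]} d\omega_{j_1, \dots, j_q} \circ \varphi(x) \wedge dx^{j_1} \circ \varphi(x) \wedge \cdots \wedge dx^{j_q} \circ \varphi(x) \\
&= d\varphi^*\omega(x).
\end{aligned}
\]

This completes the proof.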


Exercises

1. Prove that $\mathcal{M}(V)$ is a real associative algebra.

2. Complete the proof of Theorem 9.6.3; that is, show that $\Lambda(V)$ is a real algebra and that $\Lambda^k(V)$ can be isomorphically embedded into $\Lambda(V)$. Can we consider $V$ as embedded in $\Lambda(V)$?

3. If $V$ is a real finite-dimensional vector space and $\{e_k : k \in (1,n)\}$ is a basis for $V$, show that there exists a linear functional $\lambda^j$, acting on $V$, so that

\[
\lambda^j(e_k) = \begin{cases} 0, & j \neq k, \\ 1, & j = k. \end{cases}
\]

4. Prove (a), (b) and (c) of Proposition 9.6.10.

5. If $\lambda, \zeta, \nu \in \Lambda^3(V)$, show that

\[ (\lambda - \zeta) \wedge (\zeta - \nu) = (\lambda \wedge \zeta) + (\zeta \wedge \nu) + (\nu \wedge \lambda). \]

6. Let $V$, $\{e_k : k \in (1,n)\}$, and $\{\lambda^k : k \in (1,n)\}$ be as in Exercise 3.
Show that

7. Suppose that

\[ \lambda^j = \sum_{i=1}^{n} a_{ij}\, \zeta^i, \qquad j \in (1,n), \]

where $\forall j \in (1,n)$, $\lambda^j$ and $\zeta^j$ are in $\Lambda^1(V)$, and $\forall i, j \in (1,n)$, $a_{ij} \in R$. Show that

\[ \lambda^1 \wedge \cdots \wedge \lambda^n = \det(a_{ij})\; \zeta^1 \wedge \cdots \wedge \zeta^n. \]

8. Find the differential of the following differential forms:
(a) $x^2 y\, dx + x\, dy$.
(b) $x\, dx + y\, dy$.
(c) $P\, dx^1 \wedge dx^2 + Q\, dx^2 \wedge dx^3 + R\, dx^3 \wedge dx^1$, where $P$, $Q$, and $R$ are functions of class $C^1$ defined on an open set in $E^3$.

9. If $\omega$ is a differential form of order $p$ and $\eta$ is a differential form of order $q$, both of class $C^2$ and having a common domain in $E^n$, find the differential of

\[ [(d\omega) \wedge \eta] - [\omega \wedge d\eta]. \]

10. Suppose $f^1, \dots, f^k$ are real-valued functions of class $C^1$ having a common domain in $E^n$. Show that the Jacobian matrix with entries $D_j f^i(x)$ has rank $k$ $\Leftrightarrow$

\[ df^1(x) \wedge \cdots \wedge df^k(x) \neq 0. \]


9.7 CLOSED AND EXACT FORMS

Our object in this section is to prove the analogue of Theorem 9.4.6 for higher-order differential forms. We shall begin with a definition.

9.7.1 Definition. A $k$th-order differential form $\omega$ of class $C^1$ is said to be closed $\Leftrightarrow$ $d\omega = 0$. A $k$th ($k \ge 1$)-order differential form $\omega$ is said to be exact $\Leftrightarrow$ there exists a $(k-1)$st-order differential form $\eta$ so that $\omega = d\eta$.

It is quite clear that if an exact form is of class $C^1$, then it is closed, since $d\omega = d^2\eta = 0$. We shall prove the analogue of Theorem 9.4.6 for slightly more general regions than open balls, called star-shaped regions.

9.7.2 Definition. A set $S \subset E^n$ is called star-shaped $\Leftrightarrow$ $\exists a \in S$ so that $\forall x \in S$, the set $\{y : y = (1-t)a + tx,\ t \in [0,1]\}$ is contained in $S$.

It is clear that every convex set is star-shaped, but of course not every star-shaped set is convex. We shall now state and prove the analogue of Theorem 9.4.6. The proof of the higher-dimensional theorem is a direct generalization of the proof for first-order differential forms.

9.7.3 Theorem. If $\omega$ is a closed differential form, of order greater than zero, defined on an open star-shaped set in $E^n$, then $\omega$ is exact.
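Proof. Since $\mathcal{D}(\omega)$ is star-shaped, $\exists a \in \mathcal{D}(\omega)$ so that the range of the function $s$ defined on $\mathcal{D}(\omega) \times [0,1]$ by $s(x,t) = (1-t)a + tx$ is in $\mathcal{D}(\omega)$. Let us suppose, for the moment, that $a = 0$, so that $s(x,t) = tx$. By the chain rule, and the fact that $\omega$ is closed, we get

\[ d\omega \circ s(x,t) = d\omega(s(x,t)) \circ ds(x,t) = 0. \tag{9.7.1} \]

Supposing that $\omega$ is a $k$th-order form we also have

\[
d\omega \circ s(x,t) = \sum_{[j]} d\omega_{[j]} \circ s(x,t) \wedge dx^{j_1} \circ s(x,t) \wedge \cdots \wedge dx^{j_k} \circ s(x,t).
\tag{9.7.2}
\]

Now, $dx^j \circ s(x,t) = x^j\, dt + t\, dx^j$, and thus

\[
\begin{aligned}
dx^{j_1} \circ s(x,t) \wedge \cdots \wedge dx^{j_k} \circ s(x,t)
&= [x^{j_1}\, dt + t\, dx^{j_1}] \wedge \cdots \wedge [x^{j_k}\, dt + t\, dx^{j_k}] \\
&= t^k\, dx^{j_1} \wedge \cdots \wedge dx^{j_k} + t^{k-1} \sum_{i=1}^{k} (-1)^{i-1} x^{j_i}\, dt \wedge dx^{j_1} \wedge \cdots \wedge \widehat{dx^{j_i}} \wedge \cdots \wedge dx^{j_k},
\end{aligned}
\tag{9.7.3}
\]

where the caret over $dx^{j_i}$ indicates that this term does not appear in the exterior product. Further, we have

\[
d\omega_{[j]} \circ s(x,t) = \frac{\partial \omega_{[j]}(tx)}{\partial t}\, dt + t\, d\omega_{[j]}(tx).
\tag{9.7.4}
\]

If we use (9.7.3) and (9.7.4) in (9.7.2), and then use (9.7.1) we get

\[
\begin{aligned}
& t^k \sum_{[j]} \sum_{i=1}^{k} (-1)^{i-1} x^{j_i}\, d\omega_{[j]}(tx) \wedge dt \wedge dx^{j_1} \wedge \cdots \wedge \widehat{dx^{j_i}} \wedge \cdots \wedge dx^{j_k} \\
&\quad + t^k \sum_{[j]} \frac{\partial \omega_{[j]}(tx)}{\partial t}\, dt \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_k}
+ t^{k+1} \sum_{[j]} d\omega_{[j]}(tx) \wedge dx^{j_1} \wedge \cdots \wedge dx^{j_k} = 0.
\end{aligned}
\]

Now, the last term on the left is $t^{k+1}\, d\omega(tx)$, which is zero since $d\omega(tx) = 0$. Thus moving $dt$ through to the right on the remaining two terms we get

\[
\sum_{[j]} \sum_{i=1}^{k} (-1)^{i-1} x^{j_i}\, d\omega_{[j]}(tx) \wedge dx^{j_1} \wedge \cdots \wedge \widehat{dx^{j_i}} \wedge \cdots \wedge dx^{j_k} \wedge dt
= \sum_{[j]} \frac{\partial \omega_{[j]}(tx)}{\partial t}\, dx^{j_1} \wedge \cdots \wedge dx^{j_k} \wedge dt.
\tag{9.7.5}
\]

Let us set

\[
\eta(x) = \sum_{[j]} \sum_{i=1}^{k} (-1)^{i-1} \Big[ \int_0^1 t^{k-1} \omega_{[j]}(tx)\, x^{j_i}\, dt \Big]\, dx^{j_1} \wedge \cdots \wedge \widehat{dx^{j_i}} \wedge \cdots \wedge dx^{j_k}.
\]

In order to compute $d\eta(x)$ we must compute the differentials of the coefficients. Using Theorem 8.4.3, which allows us to differentiate under the integral sign, we get

\[
d\Big[ \int_0^1 t^{k-1} \omega_{[j]}(tx)\, x^{j_i}\, dt \Big] = \Big[ \int_0^1 t^{k-1} \omega_{[j]}(tx)\, dt \Big]\, dx^{j_i} + \int_0^1 t^k x^{j_i}\, d\omega_{[j]}(tx)\, dt,
\]

where the last integral is defined as the sum

\[
\sum_{m=1}^{n} \Big[ \int_0^1 t^k x^{j_i}\, D_m\omega_{[j]}(tx)\, dt \Big]\, dx^m.
\]

Consequently, we get

\[
d\eta(x) = k \sum_{[j]} \Big[ \int_0^1 t^{k-1} \omega_{[j]}(tx)\, dt \Big]\, dx^{j_1} \wedge \cdots \wedge dx^{j_k}
+ \int_0^1 t^k \Big[ \sum_{[j]} \sum_{i=1}^{k} (-1)^{i-1} x^{j_i}\, d\omega_{[j]}(tx) \wedge dx^{j_1} \wedge \cdots \wedge \widehat{dx^{j_i}} \wedge \cdots \wedge dx^{j_k} \Big]\, dt,
\]

where the last integral must be given the obvious interpretation. Now, integrating by parts, we get

\[
k \int_0^1 t^{k-1} \omega_{[j]}(tx)\, dt = \omega_{[j]}(x) - \int_0^1 t^k\, \frac{\partial \omega_{[j]}(tx)}{\partial t}\, dt.
\]

Putting this into the expression for $d\eta$ and using (9.7.5) we get

\[ d\eta(x) = \omega(x). \]

We have done all of this under the assumption that $a = 0$, that is, $s(x,t) = tx$. In case $a \neq 0$, set

\[ \nu(x) = \omega(x + a), \qquad x \in \mathcal{D}(\omega) - a. \]

Then $\nu$ is a closed differential form defined on a star-shaped set to which we may apply the previous considerations. Hence there exists a differential form $\zeta$ so that $d\zeta(x) = \nu(x)$. But if we set $\eta(x) = \zeta(x - a)$, we get $d\eta(x) = \omega(x)$, and the proof is complete.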
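When the coefficients of $\omega$ are polynomials and $a = 0$, the form $\eta$ of the proof can be computed exactly, because for a monomial coefficient of total degree $m$ the integral $\int_0^1 t^{k-1} t^m\, dt$ equals $1/(k+m)$. The sketch below is an illustration only, under the same assumed dictionary representation of polynomial forms used for illustration elsewhere (polynomials as exponent-tuple dictionaries, forms keyed by increasing 0-based index tuples); the names `d` and `homotopy` are hypothetical.

```python
from fractions import Fraction


def padd(p, q):
    """Sum of two polynomials {exponents: coefficient}, dropping zero terms."""
    out = dict(p)
    for exps, c in q.items():
        out[exps] = out.get(exps, 0) + c
    return {exps: c for exps, c in out.items() if c != 0}


def accumulate(out, idx, term):
    """Add the polynomial `term` to the coefficient of dx^idx inside `out`."""
    merged = padd(out.get(idx, {}), term)
    if merged:
        out[idx] = merged
    else:
        out.pop(idx, None)


def d(form, n):
    """Exterior derivative of a polynomial form on E^n."""
    out = {}
    for idx, poly in form.items():
        for r in range(n):
            if r in idx:
                continue  # dx^r ^ dx^r = 0
            sign = (-1) ** sum(1 for j in idx if j < r)
            term = {}
            for exps, c in poly.items():
                if exps[r] > 0:
                    lowered = list(exps)
                    lowered[r] -= 1
                    term[tuple(lowered)] = sign * c * exps[r]
            accumulate(out, tuple(sorted(idx + (r,))), term)
    return out


def homotopy(form):
    """The form eta of the proof, for polynomial coefficients and a = 0."""
    out = {}
    for idx, poly in form.items():
        k = len(idx)
        for i, ji in enumerate(idx):  # i is 0-based, so the sign is (-1)**i
            rest = idx[:i] + idx[i + 1:]
            term = {}
            for exps, c in poly.items():
                raised = list(exps)
                raised[ji] += 1  # multiply the coefficient by x^{j_i}
                # integral of t^(k-1) * t^m over [0, 1] is 1 / (k + m)
                term[tuple(raised)] = (-1) ** i * Fraction(c, k + sum(exps))
            accumulate(out, rest, term)
    return out


# omega = y dx^1 ^ dx^3 + x dy ^ dz on E^3, which equals d(x*y dz) and is closed.
omega = {(0, 2): {(0, 1, 0): 1}, (1, 2): {(1, 0, 0): 1}}
eta = homotopy(omega)
```

For this closed $\omega$ the sketch gives $\eta = -\tfrac13 yz\, dx - \tfrac13 xz\, dy + \tfrac23 xy\, dz$, and `d(eta, 3) == omega` holds exactly.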

The theorem we have just proved is usually referred to as Poincaré's lemma, although the result in question seems to have been first proved by V. Volterra. To connect the concept of a closed form with the concept of an exact form for domains that are more general than star-shaped sets, it is necessary to discuss certain "topological" properties of sets in $E^n$. This is done in terms of the cohomology groups of a set. The relevant theorem is due to G. de Rham. We shall forego a discussion of this matter, because it would take us too far afield.

Exercises

1. Let $\omega$ be a second-order differential form of class $C^1$ defined on an open set in $E^3$;

\[ \omega(x) = \sum_{[j]} \omega_{j_1, j_2}(x)\, dx^{j_1} \wedge dx^{j_2}. \]

Show that $\omega$ is closed $\Leftrightarrow$ $\forall x \in \mathcal{D}(\omega)$,

\[ D_3\omega_{1,2}(x) - D_2\omega_{1,3}(x) + D_1\omega_{2,3}(x) = 0. \]

2. Let $\omega(x,y,z)$ be the second-order differential form of class $C^1$ defined on $E^3$ by

\[ \omega(x,y,z) = x\, dy \wedge dz + y\, dx \wedge dz + xy\, dx \wedge dy. \]

Show that $\omega$ is closed and find a first-order form $\eta$ on $E^3$ so that $\omega = d\eta$.

3. Justify the use of formula (9.7.5) in the proof of Theorem 9.7.3.

4. Suppose $S$ is an open star-shaped region in $E^n$ and $\varphi$ is a one-to-one function of class $C^2$ with domain $S$, range in $E^n$, and having a nowhere-vanishing Jacobian. If $\omega$ is a closed form on $\varphi(S)$, show that $\omega$ is exact.

5. Suppose that $\omega$ is an odd-order differential form of class $C^1$ defined on an open set in $E^n$ and $g$ is a nowhere-vanishing real-valued function of class $C^1$ with domain $\mathcal{D}(\omega)$ so that $g\omega$ is closed. Show that $\omega \wedge d\omega = 0$.
9.8 MANIFOLDS

We would like to prove a version of Stokes' theorem in higher dimensions for objects that are more general than most surfaces. Thus we wish to talk about certain geometric objects called manifolds, which are essentially nothing more than surface elements that have been patched together in a consistent way. We begin with some definitions.

9.8.1 Definition. Suppose $M$ is a set in $E^n$. An $m$-dimensional chart on $M$ is a homeomorphism with an open domain in $E^m$ and range a (relatively open) set in $M$. A collection $\Psi$ of $m$-dimensional charts on $M$ is called an atlas for $M$ $\Leftrightarrow$

\[ M = \bigcup \{\mathcal{R}(\varphi) : \varphi \in \Psi\}. \tag{9.8.1} \]

If the collection $\Phi$ of all $m$-dimensional charts on $M$ is an atlas for $M$, then the ordered pair $(M, \Phi)$ is called an $m$-dimensional topological manifold. In the latter case $\Phi$ is called the continuous structure for $M$, and $M$ is called the trace of the manifold.

For the sake of simplicity we shall usually designate a manifold by its trace $M$ rather than by the ordered pair consisting of $M$ and the structure for $M$. Generally speaking, topological manifolds are not "smooth enough" to be able to carry out very much analysis on them. Thus it is necessary to describe classes of differentiable manifolds.

9.8.2 Definition. Let $(M, \Phi)$ be an $m$-dimensional topological manifold and $\Phi^k$ the collection of all charts in the structure $\Phi$ which are of class $C^k$, $k \ge 1$, and each of which has a nonsingular differential at each point of its domain. The ordered pair $(M, \Phi^k)$ is called an $m$-dimensional, regular $C^k$ manifold $\Leftrightarrow$ $\Phi^k$ is an atlas for $M$.
Any subset of $\Phi^k$ that is an atlas for $M$ is called a regular $C^k$ atlas for $M$. If $(M, \Phi^k)$ is an $m$-dimensional, regular $C^k$ manifold, then $\Phi^k$ is called the $C^k$ structure for $M$.
As before, we shall often abuse the language and simply say that $M$ is an $m$-dimensional, regular $C^k$ manifold. In the previous definition $k$ can take on the value $\infty$ also. If we replace $C^k$ by 'analytic,' then we call $M$ a regular analytic manifold. For the sake of rounding out the terminology we can designate a topological manifold as a $C^0$ manifold.

If $M$ is a $C^k$ manifold, and $\varphi, \psi \in \Phi^k$, then

\[ \zeta = \varphi^{-1} \circ \psi \]

is a $C^k$ function. The proof is very similar to the proof of Corollary 7.5.6. Indeed, from Theorem 7.3.3 we know that $\zeta$ has a differential at every point of its domain and hence

\[ d\varphi(\zeta(x)) \circ d\zeta(x) = d\psi(x). \]

Now, at every point $\zeta(x)$, $d\varphi(\zeta(x))$ has rank $m$ and hence by Cramer's rule we can solve for the entries of the Jacobian matrix of $\zeta$ in a neighborhood of $x$ as a quotient of determinants that involve only the function $\zeta$ and the partials of $\varphi$ and $\psi$. We can then proceed as in Corollary 7.5.6. We shall ask the reader to give all the precise details in an exercise.
Let us look at some simple examples. Let us take $M$ as the unit circle in $E^2$, $M = \{x : |x| = 1\}$. The functions

\[
\begin{aligned}
\varphi(t) &= (\cos t, \sin t), & t &\in\ ]-\pi, \pi[, \\
\psi(t) &= (\cos t, \sin t), & t &\in\ ]0, 2\pi[,
\end{aligned}
\]

are $C^\infty$ homeomorphisms with nonsingular differentials at each point of their respective domains, and their ranges together cover $M$. Thus $M$ is a one-dimensional, regular $C^\infty$ manifold. Indeed, $M$ is even analytic.


Suppose now that $M$ is the unit sphere in $E^3$. We could again use polar coordinates to get a parametric representation of the two-dimensional sphere in $E^3$. However, for the sake of diversity, and also because it is somewhat easier, we shall proceed in a different way. Let $S^2$ be the unit sphere in $E^3$; that is, $S^2 = \{x : x \in E^3\ \&\ |x| = 1\}$. On $S^2$ we shall consider the following relatively open sets:

\[
V_i^+ = \{x : x \in S^2\ \&\ x^i > 0\}, \qquad
V_i^- = \{x : x \in S^2\ \&\ x^i < 0\}.
\]

We shall consider the functions $\pi_i$ taking $V_i^{\pm}$ onto open sets $U_i$ in $E^2$ as follows:

\[
\pi_1(x) = (x^2, x^3), \qquad
\pi_2(x) = (x^1, x^3), \qquad
\pi_3(x) = (x^1, x^2).
\]

It is a very easy matter to check that these are one-to-one functions and the inverse functions are of class $C^\infty$ with nonsingular differentials at each point of their respective domains. Thus $S^2$ is a regular, two-dimensional $C^\infty$ manifold. Indeed it is even an analytic manifold. We shall leave the proofs of these simple facts for the exercises at the end.
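The hemisphere charts can be made concrete. In the following sketch (an illustration only; the function names are hypothetical, and the inverse is written out under the assumption that the missing coordinate is recovered from $|x| = 1$), `chart_inverse` is the inverse of $\pi_i$ on $V_i^+$ or $V_i^-$, and `charts_containing` lists the sets $V_i^{\pm}$ that contain a given point.

```python
import math


def project(i, x):
    """pi_i: drop the i-th coordinate (i in {1, 2, 3}) of a point x of E^3."""
    return tuple(c for k, c in enumerate(x, start=1) if k != i)


def chart_inverse(i, sign, u, v):
    """Inverse of pi_i on V_i^+ (sign = +1) or V_i^- (sign = -1).

    Requires u*u + v*v < 1; the dropped coordinate is recovered from |x| = 1.
    """
    w = sign * math.sqrt(1.0 - u * u - v * v)
    point = [u, v]
    point.insert(i - 1, w)  # put the recovered coordinate back in slot i
    return tuple(point)


def charts_containing(x):
    """All pairs (i, sign) such that x lies in V_i^+ (sign +1) or V_i^- (sign -1)."""
    return [(i, 1 if c > 0 else -1) for i, c in enumerate(x, start=1) if c != 0]
```

Since a point of $S^2$ has at least one nonzero coordinate, `charts_containing` never returns an empty list on the sphere, which is the covering condition (9.8.1) for these six charts.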
To have an integration theory on a manifold, it is necessary to limit
the differentiable structure for the manifold. We have already seen this
to be the case when we integrated over surface elements. We saw in
Section 9.5 that it was possible to define an integral of a differential
form over a surface, provided the surface was oriented. The same
situation persists for manifolds. We are thus led to the definition of
an orientable manifold.


9.8.3 Definition. A regular $C^k$ manifold $(M, \Phi^k)$, $k \ge 1$, is said to be orientable $\Leftrightarrow$ there is an atlas $\Psi \subset \Phi^k$ so that $\forall \varphi, \psi \in \Psi$ and $\forall x \in \mathcal{D}(\varphi^{-1} \circ \psi)$, $J_{\varphi^{-1} \circ \psi}(x) > 0$. Such an atlas is called an oriented atlas for $M$.
In defining an oriented manifold it is convenient to be able to specify a maximal oriented atlas. If $M$ is an orientable manifold, then we put an equivalence relation on the class of oriented atlases for $M$ by saying that two oriented atlases $\Psi_1$ and $\Psi_2$ are equivalent if and only if $\Psi_1 \cup \Psi_2$ is an oriented atlas for $M$. We shall leave to the reader the easy task of verifying that what we have said constitutes the definition of an equivalence relation.
If $S$ is an equivalence class under the previous equivalence relation, then we shall call

\[ \Psi^k = \bigcup \{\Psi : \Psi \in S\} \]

an oriented structure for $M$. Clearly $\Psi^k$ is a maximal oriented atlas for $M$.

9.8.4 Definition. A regular, oriented, $m$-dimensional $C^k$ manifold is a pair $(M, \Psi^k)$, where $\Psi^k$ is an oriented structure for $M$. An oriented structure for $M$ is also called an orientation for $M$.

It is not always true that a regular $C^k$ manifold is orientable. As an example we shall consider the manifold called the Möbius band. We shall first describe the trace of this manifold geometrically, which will enable us to get a parametric representation for it.
Let us take a circle of radius 2 in the $(x, y)$ plane as represented by the dashed line in Fig. 9.8.1. Take a line segment of length 2 and keep its center on this circle. Starting with the line segment on the $x$-axis in a vertical position, move it continuously around the circle and rotate the line continuously around its fixed midpoint so that it is always in the plane perpendicular to the circle and will have completed a rotation of 180 degrees when the circle has been completely traversed. The point set in $E^3$ swept out by this moving line is the trace of the Möbius band.

FIGURE 9.8.1

To get a parametric representation for the trace of the Möbius band, we first write down the parametric representation of the circle in $E^3$. This is the function from $[0, 2\pi]$ to $E^3$ given by

\[ c(\theta) = 2(\cos\theta, \sin\theta, 0). \]

The tangent vector to this curve at the point $c(\theta)$ is, by definition,

\[ c'(\theta)/|c'(\theta)| = (-\sin\theta, \cos\theta, 0). \]

By definition, the plane perpendicular to the curve at the point $c(\theta)$ is the plane perpendicular to the tangent vector at $c(\theta)$. It is given by

\[ \{(x, y, z) : -x \sin\theta + y \cos\theta = 0\}. \]

A ball of radius 1 with center at the point $c(\theta)$ is the collection of all $(x, y, z)$ in $E^3$ given by

\[
\begin{aligned}
x(\tau, \nu, \xi) &= 2\cos\theta + \tau \sin\nu \cos\xi, \\
y(\tau, \nu, \xi) &= 2\sin\theta + \tau \sin\nu \sin\xi, \\
z(\tau, \nu, \xi) &= \tau \cos\nu,
\end{aligned}
\tag{9.8.2}
\]

where $\tau \in\ ]-1, 1[$, $\nu \in [0, \pi]$, and $\xi \in [0, 2\pi]$.

If we fix, say, $\xi = \theta$, then it can be checked immediately that the set

\[ \{(x(\tau, \nu, \theta), y(\tau, \nu, \theta), z(\tau, \nu, \theta)) : \tau \in\ ]-1, 1[,\ \nu \in [0, \pi]\} \]

is the open disk of radius 1 with center at $c(\theta)$ that lies in the plane perpendicular to $c$ at $c(\theta)$. If we fix $\nu$ and allow $\tau$ to vary, we get a line segment in this disk which goes through $c(\theta)$. If we make $\nu$ some continuous function of $\theta$ so that $\nu(0) = 0$ and $\nu(2\pi) = \pi$, then we have a parametric description of the trace of the Möbius band. The easiest function to take for $\nu$ is, of course, the one given by $\nu(\theta) = \theta/2$.

If we set $\xi = \theta$ and $\nu = \theta/2$ in (9.8.2), then the resulting function is defined on the rectangle $\{(\tau, \theta) : \tau \in\ ]-1, 1[,\ \theta \in [0, 2\pi]\}$, and the range of this function is the trace of the Möbius band. However, just using this one function will not justify the fact that this point set in $E^3$ is the trace of a $C^\infty$ manifold, since the function does not have an open domain in $E^2$. Actually we need two functions, defined on open sets in $E^2$, to show that this set is the trace of a manifold. Let us consider the function $\varphi$ with domain $E^2$ and range in $E^3$ whose components are given as follows:

\[
\begin{aligned}
\varphi^1(\tau, \theta) &= \Big(2 + \tau \sin\frac{\theta}{2}\Big) \cos\theta, \\
\varphi^2(\tau, \theta) &= \Big(2 + \tau \sin\frac{\theta}{2}\Big) \sin\theta, \\
\varphi^3(\tau, \theta) &= \tau \cos\frac{\theta}{2}.
\end{aligned}
\tag{9.8.3}
\]

9.8

ip2 be ip
]-7T/2, 7T/2 [. The functions ip1
and ip2 are C"' homeomorphisms and (ip1) U (ip2) is all the trace of
the Mobius band. Further, it is a rather easy matter to check that dip1
and dcp2 are nonsingular at every point of '(ip1) and '(ip2), respec
Let

ip1

be

ip

MANIFOLDS I 459

restricted to the open set ]-1, 1 [ x J O,27T[ and

restricted to the open set J-1, 1 [ X

tively. Thus we have all the ingredients for a manifold.

The function $\varphi_1$ restricted to the open set $U_1 = ]-1,1[\ \times\ ]3\pi/2,2\pi[$ and the function $\varphi_2$ restricted to the open set $U_2 = ]-1,1[\ \times\ ]-\pi/2,0[$ have the same range. The function $\varphi_2^{-1}\circ\varphi_1$, restricted to $U_1$, takes $U_1$ onto $U_2$ and

\[ \varphi_2^{-1}\circ\varphi_1(\tau,\theta) = (-\tau,\ \theta-2\pi), \qquad (\tau,\theta)\in U_1. \]

The Jacobian of this transformation is $-1$. Note also that $\varphi_1$ and $\varphi_2$ restricted to $]-1,1[\ \times\ ]0,\pi/2[$ coincide.

Suppose the Möbius band is orientable and $\Psi^1$ is an oriented structure. Let $P$ be the set of all points $(\tau,\theta)$ for which $\exists\psi\in\Psi^1$ so that $(\tau,\theta)\in\mathcal{D}(\psi^{-1}\circ\varphi_1)$ and $J_{\psi^{-1}\circ\varphi_1}(\tau,\theta)>0$. Also let $Q$ be the set of all points $(\tau,\theta)$ for which $\exists\psi\in\Psi^1$ so that $(\tau,\theta)\in\mathcal{D}(\psi^{-1}\circ\varphi_1)$ and $J_{\psi^{-1}\circ\varphi_1}(\tau,\theta)<0$. It is clear that $P$ and $Q$ are open. Since $d\varphi_1(x)$ is always nonsingular and since $\forall\psi\in\Psi^1$, $d\psi(x)$ is nonsingular, it follows that the Jacobian of $\psi^{-1}\circ\varphi_1$ never vanishes. Hence the union of $P$ and $Q$ is $\mathcal{D}(\varphi_1)$.

Since $\Psi^1$ is an oriented structure, $P\cap Q=\emptyset$. For, if $\psi,\psi_1\in\Psi^1$ and $\psi^{-1}\circ\varphi_1$ has a positive [negative] Jacobian at $(\tau,\theta)$, it follows that $\psi_1^{-1}\circ\varphi_1 = \psi_1^{-1}\circ\psi\circ\psi^{-1}\circ\varphi_1$ has a positive [negative] Jacobian at $(\tau,\theta)$. Finally, $P$ and $Q$ are both nonvoid. For suppose that $V$ is a connected open neighborhood of $(0,0)$ and $\varphi_2(V)$ is in the domain of $\psi^{-1}$, where $\psi\in\Psi^1$. Clearly, there is a neighborhood $W\subset U_1 = ]-1,1[\ \times\ ]3\pi/2,2\pi[$ so that $\varphi_1(W)\subset\varphi_2(V)$. Now, let $(\tau_1,\theta_1)\in V\cap\mathcal{D}(\varphi_2)$ with $\theta_1>0$. Then the Jacobian of $\psi^{-1}\circ\varphi_1$ at $(\tau_1,\theta_1)$ is the same as the Jacobian of $\psi^{-1}\circ\varphi_2$ at this point, since $\varphi_1=\varphi_2$ in a neighborhood of $(\tau_1,\theta_1)$. On the other hand, let $(\tau_2,\theta_2)\in W$. Now, $\psi^{-1}\circ\varphi_1 = \psi^{-1}\circ\varphi_2\circ\varphi_2^{-1}\circ\varphi_1$ and it follows that the Jacobian of $\psi^{-1}\circ\varphi_1$ at $(\tau_2,\theta_2)$ is the negative of the Jacobian of $\psi^{-1}\circ\varphi_2$ at $\varphi_2^{-1}\circ\varphi_1(\tau_2,\theta_2)$, since we have shown that the Jacobian of $\varphi_2^{-1}\circ\varphi_1$ is $-1$ at $(\tau_2,\theta_2)$. Since $V$ is connected, $\varphi_2^{-1}\circ\varphi_1(\tau_2,\theta_2)\in V$ and the Jacobian of $\psi^{-1}\circ\varphi_2$ never vanishes, it follows that the Jacobian of $\psi^{-1}\circ\varphi_2$ at $(\tau_1,\theta_1)$ has the same sign as the Jacobian of this function at $\varphi_2^{-1}\circ\varphi_1(\tau_2,\theta_2)$. Thus the Jacobian of $\psi^{-1}\circ\varphi_1$ at $(\tau_1,\theta_1)$ has the sign opposite to that of the Jacobian of this function at $(\tau_2,\theta_2)$. This shows that $P$ and $Q$ are not void.

We have shown, under the hypothesis that the Möbius band is orientable, that $\mathcal{D}(\varphi_1)$ is the union of disjoint nonvoid, open sets. This contradicts the fact that $\mathcal{D}(\varphi_1)$ is connected. Hence the Möbius band is not orientable.
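The change of parameters used in this argument can be checked numerically. The sketch below assumes the familiar parametrization of the Möbius band, $\varphi(\tau,\theta) = ((2-\tau\sin\tfrac{\theta}{2})\cos\theta,\ (2-\tau\sin\tfrac{\theta}{2})\sin\theta,\ \tau\cos\tfrac{\theta}{2})$; that formula, the helper names, and the sample points are assumptions made for illustration and are not quoted from (9.8.3). Under that assumption it confirms that $(\tau,\theta)\mapsto(-\tau,\theta-2\pi)$ carries points of $U_1$ to parameters of the same point of the band, and that its Jacobian is $-1$.

```python
import numpy as np

def phi(tau, theta):
    # Assumed parametrization of the Mobius band (illustrative only).
    rho = 2.0 - tau * np.sin(theta / 2.0)
    return np.array([rho * np.cos(theta), rho * np.sin(theta), tau * np.cos(theta / 2.0)])

def transition(p):
    # Candidate for phi_2^{-1} o phi_1 on U1: (tau, theta) -> (-tau, theta - 2*pi).
    tau, theta = p
    return np.array([-tau, theta - 2.0 * np.pi])

def jacobian_det(f, p, h=1e-6):
    # Central-difference approximation of the Jacobian determinant of f at p.
    p = np.asarray(p, dtype=float)
    n = p.size
    cols = []
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        cols.append((f(p + e) - f(p - e)) / (2.0 * h))
    return np.linalg.det(np.column_stack(cols))

rng = np.random.default_rng(0)
# Sample points of U1 = ]-1,1[ x ]3*pi/2, 2*pi[.
samples = np.column_stack([rng.uniform(-1.0, 1.0, 50),
                           rng.uniform(1.5 * np.pi, 2.0 * np.pi, 50)])

# phi at p and phi at transition(p) should be the same point of the band.
max_gap = max(np.linalg.norm(phi(*p) - phi(*transition(p))) for p in samples)
dets = np.array([jacobian_det(transition, p) for p in samples])

print("largest mismatch:", max_gap)
print("Jacobian range:", dets.min(), dets.max())
```

The sign of the $\tau\sin(\theta/2)$ term does not affect either check, so the conclusion does not depend on which of the two common conventions is used.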

460 I THE

INTEGRATION OF DIFFERENTIAL FORMS

OPPOSITE ORIENTATIONS
If $\Psi^k$ is an orientation for $M$, then there is another orientation for $M$ that is canonically associated with $\Psi^k$ and which we label $-\Psi^k$. It can be called the orientation for $M$ that is opposite or negative to the orientation $\Psi^k$. Let us describe $-\Psi^k$. Suppose that $M$ is an $n$-dimensional manifold, so the elements of $\Psi^k$ are defined on open subsets of $E^n$. There is an atlas $\Psi\subset\Psi^k$ so that every element of $\Psi$ has as its domain the open unit ball. Indeed, $\forall\psi\in\Psi^k$ and $\forall x\in\mathcal{D}(\psi)$ let $B(x,\rho_x)$ be a ball contained in $\mathcal{D}(\psi)$. Let $T_x$ be the function with domain $B(0,1)$ defined by

\[ T_x(t) = \rho_x t + x. \]

It is clear that $T_x$ is a $C^\infty$ homeomorphism that takes $B(0,1)$ onto $B(x,\rho_x)$. Let us set

\[ \psi_x = \psi\circ T_x. \]

Since $T_x$ has a positive Jacobian, it is clear that $\psi_x\in\Psi^k$. Now take

\[ \Psi = \{\psi_x : \psi\in\Psi^k,\ x\in\mathcal{D}(\psi)\}. \]

It is clear that $\Psi$ is an atlas with the prescribed properties: all elements in $\Psi$ have domain $B(0,1)$.

For every $\psi\in\Psi$ let us set

\[ \psi_-(x) = \psi(-x^1,x^2,\dots,x^n). \]

Then since

\[ \psi^{-1}\circ\psi_-(x) = (-x^1,x^2,\dots,x^n), \]

it follows that $J_{\psi^{-1}\circ\psi_-}(x)<0$, so that $\psi_-\notin\Psi$. On the other hand, if $\varphi\in\Psi$ and we set

\[ \nu = \varphi_-^{-1}\circ\psi_- = (\varphi_-^{-1}\circ\varphi)\circ(\varphi^{-1}\circ\psi)\circ(\psi^{-1}\circ\psi_-), \]

it follows that $\forall x\in\mathcal{D}(\nu)$, $J_\nu(x)>0$. Thus it follows that the set

\[ -\Psi = \{\psi_- : \psi\in\Psi\} \]

is an oriented $C^k$ atlas for $M$. We shall designate the orientation for $M$ that contains the atlas $-\Psi$ by $-\Psi^k$. We have asked the reader to investigate the situation somewhat further in Exercises 5 and 6 below.
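As a small illustration of the sign bookkeeping in this construction, the following sketch takes two hypothetical charts on the unit ball of $E^2$ (the identity and a rotation, both chosen only for the example) and compares the Jacobians of $\psi^{-1}\circ\psi_-$ and of $\varphi_-^{-1}\circ\psi_-$. The chart choices and helper names are assumptions; the point is only that reflecting one chart reverses the sign, while reflecting both restores it.

```python
import numpy as np

def reflect(x):
    # x -> (-x^1, x^2, ..., x^n)
    y = np.array(x, dtype=float)
    y[0] = -y[0]
    return y

angle = 0.7
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])

# Two hypothetical charts with domain B(0,1) belonging to the same orientation.
psi = lambda x: np.asarray(x, dtype=float)        # identity chart
psi_inv = lambda y: np.asarray(y, dtype=float)
phi = lambda x: R @ np.asarray(x, dtype=float)    # rotated chart
phi_inv = lambda y: R.T @ np.asarray(y, dtype=float)

psi_minus = lambda x: psi(reflect(x))             # psi_-(x) = psi(-x^1, x^2)
phi_minus_inv = lambda y: reflect(phi_inv(y))     # inverse of phi_- = phi o reflect

def jacobian_det(f, p, h=1e-6):
    # Central-difference approximation of the Jacobian determinant of f at p.
    p = np.asarray(p, dtype=float)
    cols = []
    for i in range(p.size):
        e = np.zeros(p.size)
        e[i] = h
        cols.append((f(p + e) - f(p - e)) / (2.0 * h))
    return np.linalg.det(np.column_stack(cols))

x0 = np.array([0.2, -0.3])
det_mixed = jacobian_det(lambda x: psi_inv(psi_minus(x)), x0)        # psi^{-1} o psi_-
det_both = jacobian_det(lambda x: phi_minus_inv(psi_minus(x)), x0)   # phi_-^{-1} o psi_-
print(det_mixed, det_both)
```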

□ Exercises

1. Give all the details of the fact that if $(M,\Phi^k)$ is a $C^k$ manifold, then $\forall\varphi,\psi\in\Phi^k$, the transition function $\varphi^{-1}\circ\psi$ is a $C^k$ function.

2. Show that the six projection functions $\{\pi_k : k\in(1,3)\}$ defined in Section 9.7 can be used to define a regular $C^\infty$ atlas for $S^2$, and indeed even a regular, analytic atlas for $S^2$.

3. Show that $S^2$ is the trace of an orientable $C^\infty$ manifold.

4. Show that the torus of Exercise 4 of Section 9.5 is the trace of a regular, orientable $C^\infty$ manifold.

5. If $M$ is a connected, regular, orientable $C^1$ manifold, show that there exist exactly two oriented structures for $M$. In other words, there exist exactly two orientations for $M$.

6. Show that the results of Exercise 5 are not valid if $M$ is not connected. Indeed, compute the number of orientations for $M$ in terms of the number of components in $M$.

9.9

INTEGRATION ON MANIFOLDS

In Definition 9.5.4 we defined the integral of a differential form over an oriented surface element. To define the integral of a differential form over a bounded subset of an oriented manifold, it is necessary to patch together, in a coherent way, integrals over surface elements. The easiest way to do this is through a device called a partition of unity. We have already run across this concept in Section 6.7 in connection with the Arzelà-Ascoli theorem.

Let $K$ be a bounded set in $E^n$ and $\{B(x_k,\rho_k) : k\in(1,p)\}$ a finite covering of $K$ by open balls. Let $e_a$ be that real-valued function with domain $E^1$ which is defined by

\[ e_a(r) = \begin{cases} e^{1/(r^2-a^2)}, & |r|<a, \\ 0, & |r|\ge a. \end{cases} \]

The function $e_a$ is of class $C^\infty$, and vanishes only for $|r|\ge a$. Let $\psi_k$ be that real-valued function with domain $E^n$ defined by

\[ \psi_k(x) = e_{\rho_k}(|x-x_k|). \]

The function $\psi_k$ is of class $C^\infty$, vanishes only for $|x-x_k|\ge\rho_k$, and therefore, if $x\in K$,

\[ \sum_{k=1}^{p}\psi_k(x)\ne 0. \]

Let us set $B = \bigcup\{B(x_k,\rho_k) : k\in(1,p)\}$, and $\forall x\in B$ let us set

\[ \pi_k(x) = \psi_k(x)\Big/\sum_{j=1}^{p}\psi_j(x). \]

Then $\pi_k$ is a $C^\infty$ function with domain $B$ and $\forall x\in K$,

\[ \sum_{k=1}^{p}\pi_k(x) = 1. \]

Such a collection of functions is a partition of unity for $K$. The formal definition is given below.
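The construction just described is easy to try numerically. The following is a minimal sketch, assuming a hypothetical set $K$ (a segment in $E^2$) and a hypothetical covering by three balls; the function names `bump` and `partition_of_unity` are ours. It builds the functions $e_a$, $\psi_k$ and $\pi_k$ exactly as above and checks that the $\pi_k$ are nonnegative and sum to 1 on $K$.

```python
import numpy as np

def bump(r, a):
    # e_a(r) = exp(1/(r^2 - a^2)) for |r| < a, and 0 for |r| >= a.
    r = np.abs(np.asarray(r, dtype=float))
    d = r ** 2 - a ** 2
    out = np.zeros_like(r)
    inside = d < 0
    out[inside] = np.exp(1.0 / d[inside])
    return out

def partition_of_unity(points, centers, radii):
    # Row k of the result holds pi_k evaluated at each of the given points.
    points = np.atleast_2d(points)
    psi = np.array([bump(np.linalg.norm(points - c, axis=1), rho)
                    for c, rho in zip(centers, radii)])
    total = psi.sum(axis=0)
    if np.any(total == 0.0):
        raise ValueError("some point lies outside every ball of the covering")
    return psi / total

# Hypothetical data: K is the segment [0,1] x {0} in E^2, covered by three balls.
centers = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
radii = np.array([0.375, 0.375, 0.375])
K = np.column_stack([np.linspace(0.0, 1.0, 101), np.zeros(101)])

weights = partition_of_unity(K, centers, radii)
print(weights.shape, weights.sum(axis=0).min(), weights.sum(axis=0).max())
```

Each $\pi_k$ vanishes outside its ball, so the partition obtained this way is automatically subordinate to the covering by the three balls.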

9.9.1 Definition. Let $K$ be a bounded set in $E^n$. A $C^k$ partition of unity for $K$ is a finite collection $\{\pi_j : j\in(1,p)\}$ of real-valued $C^k$ functions each having as domain an open set in $E^n$ containing $K$ and having the following properties:

(a) $\pi_j\ge 0$, $\forall j\in(1,p)$.

(b) $\sum_{j=1}^{p}\pi_j(x) = 1$, $\forall x\in K$.

For the theory of integration on a manifold it is necessary to have the notion of a partition of unity for $K$ which is subordinate to an open covering for $K$. In defining this concept it is convenient to talk about the support of a real-valued function, which is defined as the closure of the set of points where the function does not vanish.

9.9.2 Definition. Let $K$ be a bounded set in $E^n$, $\mathcal{U}$ an open covering for $K$, and $\{\pi_j : j\in(1,p)\}$ a partition of unity for $K$. The partition of unity is said to be subordinate to the open covering $\mathcal{U}$ $\Longleftrightarrow$ $\forall j\in(1,p)$, $\exists U\in\mathcal{U}$ so that the support of $\pi_j$ is contained in $U$.

9.9.3 Proposition. If $K$ is a compact set in $E^n$ and $\mathcal{U}$ is an open covering for $K$, then there exists a $C^\infty$ partition of unity for $K$ which is subordinate to $\mathcal{U}$.

Proof. For every $x\in K$ there exists an open ball with center at $x$ whose closure is contained in an element of $\mathcal{U}$. Since $K$ is compact, there is a finite set $\{B(x_k,\rho_k) : k\in(1,p)\}$ that covers $K$. We have previously constructed a $C^\infty$ partition of unity for $K$ that is subordinate to this finite set of open balls. Thus the proposition is proved.


We are now in a position to define the integral of a differential form over suitable sets in oriented manifolds. Let us first remark that if $M$ is a manifold in $E^n$ and $K\subset M$, then when we speak of the boundary of $K$ we shall mean those points $x\in M$ so that every relatively open set in $M$ that contains $x$ contains points of $K$ as well as points in $M\setminus K$. We shall designate this boundary of $K$ by '$\partial K$'. In general it will not coincide with the boundary $\beta K$, which is taken with respect to $E^n$.

9.9.4 Definition. If $(M,\Phi^k)$ is an $m$-dimensional, regular $C^k$ manifold in $E^n$ and $K\subset M$, then $K$ is called a Jordan domain in $M$ $\Longleftrightarrow$ the closure of $K$ in $M$ is compact and $\forall\varphi\in\Phi^k$ the set $\varphi^{-1}(\partial K)$ has $m$-dimensional Lebesgue measure zero.

We should remark that if $M$ is a regular $C^1$ manifold and the elements of any atlas for $M$ satisfy the conditions of the previous definition with regard to $\partial K$, then the same is true for the structure $\Phi^1$ on $M$. This is essentially a consequence of Theorem 8.2.7 and we shall leave the details for the reader.

If $(M,\Phi^k)$ is a regular, $m$-dimensional $C^k$ manifold in $E^n$, then there is a covering $\mathcal{U}$ of $M$ by open sets in $E^n$ so that the intersection of every open set in $\mathcal{U}$ with $M$ is the range of an element in $\Phi^k$. This is simply a consequence of the fact that the range of each element of $\Phi^k$ is a relatively open set in $M$. If $K$ is a Jordan domain in $M$, then it can be covered by a finite number of elements in $\mathcal{U}$ and thus there is a partition of unity for $K$ that is subordinate to $\mathcal{U}$.

9.9.5 Definition. If $(M,\Phi^k)$ is a regular, $m$-dimensional, $C^k$ manifold in $E^n$, $\mathcal{U}$ is a covering of $M$ by open sets in $E^n$ so that the intersection of every element in $\mathcal{U}$ with $M$ is the range of an element in $\Phi^k$ which has a bounded domain, $K$ is a bounded set in $M$, and $\{\pi_j : j\in(1,p)\}$ is a partition of unity for $K$ which is subordinate to $\mathcal{U}$, then this partition of unity is called a partition of unity for $K$ subordinate to $\Phi^k$.

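We are now in a position to give the definition of the surface integral of a differential form.

9.9.6 Definition. Let $(M,\Psi^1)$ be a regular, oriented, $m$-dimensional $C^1$ manifold in $E^n$, $K$ a Jordan domain in $M$, and $\omega$ a continuous $m$th-order differential form with domain an open set in $E^n$ which contains $K$. Then if $\{\pi_j : j\in(1,p)\}$ is a continuous partition of unity for $K$ subordinate to $\Psi^1$, we define

\[ \int_K \pi_j\omega = \int_{\varphi_j^{-1}(K)} \pi_j\circ\varphi_j(t)\sum_{[i]}\omega_{[i]}\circ\varphi_j(t)\,\frac{\partial(\varphi_j^{i_1},\dots,\varphi_j^{i_m})}{\partial(t^1,\dots,t^m)}(t)\,dt, \qquad \int_K\omega = \sum_{j=1}^{p}\int_K \pi_j\omega, \]

where $\varphi_j\in\Psi^1$ and $\pi_j|M$ has support in $\mathcal{R}(\varphi_j)$.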

Let us make some remarks about this definition. First, the Riemann integral used to define the integral of the differential form $\pi_j\omega$ over $K$ exists. This is one point where we make use of the fact that $\{\pi_j : j\in(1,p)\}$ is subordinate to $\Psi^1$. Indeed, if the function under the integral sign is defined to be zero outside $\varphi_j^{-1}(K)$, then the extended function is continuous except possibly at the points in the set $\varphi_j^{-1}(\partial K)$. Since, by hypothesis, this set has Lebesgue measure zero, the result follows from Theorem 8.3.7. Note that the definition of the integral of the differential form $\pi_j\omega$ coincides with the Definition 9.5.4, although in the latter definition we did not demand that the function $\varphi$ have rank $m$.

The definition of the integral of $\omega$ over $K$ given in Definition 9.9.6 seems to be dependent upon the particular choice of a partition of unity that is chosen. This would, of course, be an intolerable state of affairs, and we want to prove that the definition of the integral is independent of the particular choice of partition of unity for $K$; from that it will be clear that the definition of the integral of $\pi_j\omega$ is independent of the chart $\varphi_j$ provided, of course, that $\pi_j|M$ has support in $\mathcal{R}(\varphi_j)$. It is precisely at this point that we use the fact that we are integrating on an oriented manifold.
9.9.7 Proposition. The definition of the integral given in Definition 9.9.6 is independent of the choice of the partition of unity for $K$ subordinate to $\Psi^1$.

Proof. Suppose $\{\pi_j : j\in(1,p)\}$ and $\{\sigma_i : i\in(1,q)\}$ are partitions of unity for $K$, both subordinate to $\Psi^1$. Let us suppose that $\pi_j|M$ has support in $\mathcal{R}(\varphi_j)$ and $\sigma_i|M$ has support in $\mathcal{R}(\psi_i)$. For the sake of simplicity let us set

\[ f_j(t) = \pi_j\circ\varphi_j(t)\sum_{[i]}\omega_{[i]}\circ\varphi_j(t)\,\frac{\partial(\varphi_j^{i_1},\dots,\varphi_j^{i_m})}{\partial(t^1,\dots,t^m)}(t). \]

Then we have

\[ \int_{\varphi_j^{-1}(K)} f_j(t)\,dt = \sum_{i=1}^{q}\int_{\varphi_j^{-1}(K)} \sigma_i\circ\varphi_j(t)\,f_j(t)\,dt = \sum_{i=1}^{q}\int_K \pi_j\sigma_i\omega. \tag{9.9.1} \]

The last equality follows from the first formula in Definition 9.9.6, since we can think of $\pi_j$ as multiplying the form $\sigma_i\omega$. Let us set

\[ g_{ij} = \psi_i^{-1}\circ\varphi_j. \]

This is a one-to-one function of class $C^1$ with domain

\[ \varphi_j^{-1}\big(\mathcal{R}(\varphi_j)\cap\mathcal{R}(\psi_i)\big). \]

Since we are working with an oriented manifold, it has a positive Jacobian at each point of its domain. We shall apply the transformation theorem for integrals in the form

\[ \int_{g(S)} f\circ g^{-1}(t)\,J_{g^{-1}}(t)\,dt = \int_S f(t)\,dt, \tag{9.9.2} \]

which is possible if $g$ is a one-to-one $C^1$ function with $S\subset\mathcal{D}(g)$ and has a positive Jacobian. Let $K_{ij}$ be the support of $\sigma_i\pi_j$. Then if we take $S = \varphi_j^{-1}(K\cap K_{ij})$, $g = g_{ij}$, and $f(t) = \sigma_i\circ\varphi_j(t)\,f_j(t)$, and apply (9.9.2), the right side of (9.9.1) becomes

\[ \sum_{i=1}^{q}\int_{\psi_i^{-1}(K)} \sigma_i\circ\psi_i(t)\,f_j\circ g_{ij}^{-1}(t)\,J_{g_{ij}^{-1}}(t)\,dt. \tag{9.9.3} \]

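Note that we can use $\psi_i^{-1}(K)$ in place of $\psi_i^{-1}(K\cap K_{ij})$, since the support of the integrand is $\psi_i^{-1}(K_{ij})$. It is at this point that we make essential use of the fact that our partitions of unity are subordinate to $\Psi^1$. Now,

\[ f_j\circ g_{ij}^{-1}(t)\,J_{g_{ij}^{-1}}(t) = \pi_j\circ\psi_i(t)\sum_{[k]}\omega_{[k]}\circ\psi_i(t)\,\frac{\partial(\varphi_j^{k_1},\dots,\varphi_j^{k_m})}{\partial(t^1,\dots,t^m)}\big(g_{ij}^{-1}(t)\big)\,J_{g_{ij}^{-1}}(t). \tag{9.9.4} \]

Further, since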

it follows from the chain rule that

If we use the fact that

then it follows from (9.9.4) and (9.9.5) that

If we use this in (9.9.3), then we get from (9.9.1),

\[ \sum_{j=1}^{p}\int_K \pi_j\omega = \sum_{j=1}^{p}\sum_{i=1}^{q}\int_K \sigma_i\pi_j\omega = \sum_{i=1}^{q}\sum_{j=1}^{p}\int_K \pi_j\sigma_i\omega = \sum_{i=1}^{q}\int_K \sigma_i\omega. \]

The first equality comes from the above computations. The first and third sums are equal by formula (9.9.1). The last equality is, of course, obtained by interchanging the roles of $\{\pi_j : j\in(1,p)\}$ and $\{\sigma_i : i\in(1,q)\}$ in the previous proof. This completes the proof.
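A numerical experiment makes the independence statement concrete. The sketch below is only an illustration under stated assumptions: $M$ is taken to be the unit circle in $E^2$ with the two charts $t\mapsto(\cos t,\sin t)$ on $]0,2\pi[$ and on $]-\pi,\pi[$, the form is the hypothetical choice $\omega = -y\,dx + (x+xy)\,dy$, and the two partitions of unity are built from a smooth step with different cutoffs `c`. All names are ours.

```python
import numpy as np

def g(u):
    # Smooth function: exp(-1/u) for u > 0, and 0 otherwise.
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    pos = u > 0
    out[pos] = np.exp(-1.0 / u[pos])
    return out

def make_partition(c):
    # pi1 vanishes for x >= c (near (1,0)); pi2 = 1 - pi1 vanishes for x <= -c (near (-1,0)).
    def pi1(x, y):
        return g(c - x) / (g(c - x) + g(x + c))
    def pi2(x, y):
        return 1.0 - pi1(x, y)
    return pi1, pi2

def pulled_back_coefficient(t):
    # Coefficient of dt when omega = P dx + Q dy is pulled back by t -> (cos t, sin t).
    x, y = np.cos(t), np.sin(t)
    P, Q = -y, x + x * y
    return P * (-np.sin(t)) + Q * np.cos(t)

def chart_integral(weight, a, b, n=20000):
    # Midpoint rule for the integral over ]a,b[ of weight(phi(t)) times the pulled-back coefficient.
    h = (b - a) / n
    t = a + h * (np.arange(n) + 0.5)
    return h * np.sum(weight(np.cos(t), np.sin(t)) * pulled_back_coefficient(t))

def integral_with_partition(c):
    pi1, pi2 = make_partition(c)
    # pi1 has support inside the range of the chart on ]0, 2*pi[,
    # pi2 has support inside the range of the chart on ]-pi, pi[.
    return chart_integral(pi1, 0.0, 2.0 * np.pi) + chart_integral(pi2, -np.pi, np.pi)

value_a = integral_with_partition(0.3)
value_b = integral_with_partition(0.8)
print(value_a, value_b, 2.0 * np.pi)
```

Both partitions give the same number, which for this particular form is $2\pi$, since the pulled-back coefficient is $1+\cos^2 t\,\sin t$.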


9.9.8 Proposition. Suppose $\omega_1$ and $\omega_2$ are continuous differential forms so that the conditions of Definition 9.9.6 are satisfied. Then $\forall\alpha,\beta\in R$, we have

\[ \int_K(\alpha\omega_1+\beta\omega_2) = \alpha\int_K\omega_1 + \beta\int_K\omega_2. \tag{9.9.6} \]

The proof is an immediate consequence of Definition 9.9.6 and we shall leave it as an exercise for the reader.

9.9.9 Proposition. Suppose the conditions of Definition 9.9.6 are satisfied and $K_1$ and $K_2$ are Jordan domains in $M$ with $K_1\cap K_2 = \partial K_1\cap\partial K_2$; that is, $K_1$ and $K_2$ have at most boundary points in $M$ in common. Then

\[ \int_{K_1\cup K_2}\omega = \int_{K_1}\omega + \int_{K_2}\omega. \tag{9.9.7} \]

The proof of this is also an almost immediate consequence of Definition 9.9.6 and we again shall leave it for the reader.

Finally, let us establish the following proposition, which will be useful in Section 9.10.

9.9.10 Proposition. Suppose $\varphi$ is a $C^1$ homeomorphism with $\mathcal{D}(\varphi)$ (an open set) in $E^m$, $\mathcal{R}(\varphi)\subset E^n$, $m\le n$, and $d\varphi(x)$ is of rank $m$, $\forall x\in\mathcal{D}(\varphi)$ (see the Remark below). Further, suppose that $A$ is a Jordan measurable set with $A\subset\mathcal{D}(\varphi)$, and $\omega$ is a continuous $m$th-order differential form with domain in $E^n$ and $\mathcal{R}(\varphi)\subset\mathcal{D}(\omega)$. Let us consider $\varphi(A)$ as a Jordan domain in the regular oriented $m$-dimensional manifold $(\mathcal{R}(\varphi),\Psi^1)$ where $\varphi\in\Psi^1$, and let us consider $A$ as a Jordan domain in the regular $m$-dimensional oriented manifold $(E^m,\Phi^1)$, where $\Phi^1$ contains the identity transformation of $E^m$ onto itself. Then

\[ \int_{\varphi(A)}\omega = \int_A\varphi^*\omega, \tag{9.9.8} \]

where $\varphi^*\omega$ is defined by formula (9.6.15).


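Proof. By definition

\[ \int_{\varphi(A)}\omega = \int_A\sum_{[j]}\omega_{[j]}\circ\varphi(t)\,\frac{\partial(\varphi^{j_1},\dots,\varphi^{j_m})}{\partial(t^1,\dots,t^m)}(t)\,dt. \tag{9.9.9} \]

On the other hand,

\[ \varphi^*\omega(t) = \sum_{[j]}\omega_{[j]}\circ\varphi(t)\,d\varphi^{j_1}(t)\wedge\cdots\wedge d\varphi^{j_m}(t). \]

If in Proposition 9.6.6 we take $f = (\varphi^{j_1},\dots,\varphi^{j_m})$ and $g$ to be the identity transformation of $E^m$ onto itself we get

\[ d\varphi^{j_1}(t)\wedge\cdots\wedge d\varphi^{j_m}(t) = \frac{\partial(\varphi^{j_1},\dots,\varphi^{j_m})}{\partial(t^1,\dots,t^m)}(t)\,dt^1\wedge\cdots\wedge dt^m. \]

Thus

\[ \varphi^*\omega(t) = \sum_{[j]}\omega_{[j]}\circ\varphi(t)\,\frac{\partial(\varphi^{j_1},\dots,\varphi^{j_m})}{\partial(t^1,\dots,t^m)}(t)\,dt^1\wedge\cdots\wedge dt^m. \]

Consequently, by definition, the integral of $\varphi^*\omega$ over $A$ is exactly the right side of (9.9.9). This concludes the proof.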
REMARK: The formula (9.9.8) is valid without the assumption that $\varphi$ is a homeomorphism and $d\varphi(x)$ is of rank $m$. Indeed, the proof only uses these things to remain in the context of integration on a manifold. If we had used the concept of integration over a surface, as given in Section 9.5, we would have only had to make the assumption that $\varphi$ is a $C^1$ function.

□ Exercises

1. Prove Proposition 9.9.8.

2. Prove Proposition 9.9.9.

3. Suppose $(M,\Phi^1)$ is a regular, $m$-dimensional $C^1$ manifold in $E^n$ and $K$ is a Jordan domain in $M$. Give a definition of the surface area of $K$ that is independent of any partition of unity for $K$.

9.10

STOKES' THEOREM

It is the purpose of this section to generalize the results of Section 9.3. To do this the first thing to do is to prove a version of Stokes' theorem for an $n$-dimensional interval.

Let $I$ be the unit interval in $E^n$ given by

\[ I = \{x : x^k\in[0,1],\ \forall k\in(1,n)\}. \]

Let $I_0^k$ be the $(n-1)$-dimensional face of $I$ given by

\[ I_0^k = \{x : x\in I\ \&\ x^k = 0\}, \]

and let $I_1^k$ be the $(n-1)$-dimensional face of $I$ given by

\[ I_1^k = \{x : x\in I\ \&\ x^k = 1\}. \]

The space $E^n$ may be considered as a regular, $n$-dimensional $C^1$ manifold, and the interval $I$ is certainly a Jordan domain in this manifold. Actually, $E^n$ is an orientable manifold. Indeed, if $\Psi^1$ is that oriented structure for $E^n$ which contains the identity transformation, then $(E^n,\Psi^1)$ is an oriented manifold. When we integrate a differential form over $I$ we shall always consider $I$ as a Jordan domain in this latter oriented manifold. Perhaps we should also note that according to Exercise 5 of Section 9.8, there is only one other oriented structure of class $C^1$ for $E^n$.

Each $(n-1)$-dimensional interval $I_0^k$ and $I_1^k$ is a Jordan domain in an $(n-1)$-dimensional, regular, oriented $C^1$ manifold. Indeed, let $M_k$ be the range of the function $\varphi_k$ that has domain $E^{n-1}$, and is given by

\[ \varphi_k(t) = \sum_{j=1}^{k-1} t^j e_j + \sum_{j=k+1}^{n} t^{j-1} e_j, \tag{9.10.1} \]

where the $e_j$ are the standard unit vectors in $E^n$. If $\Phi^1$ is that oriented structure on $M_k$ which contains the function $\varphi_k$, then $(M_k,\Phi^1)$ is an oriented manifold, and we shall always consider $I_0^k$ as a Jordan domain in this oriented manifold. In a similar manner, let $N_k$ be the range of the function $\psi_k$ with domain $E^{n-1}$ given by

\[ \psi_k(t) = \varphi_k(t) + e_k. \tag{9.10.2} \]

If $\Psi^1$ is the oriented structure on $N_k$ that contains $\psi_k$, then $(N_k,\Psi^1)$ is an oriented manifold and we shall always consider $I_1^k$ as a Jordan domain in this oriented manifold.

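Suppose now that $\omega$ is a $C^1$ differential form of order $n-1$ whose (open) domain contains $I$. If we write

\[ \omega(x) = \sum_{[j]}\omega_{[j]}(x)\,dx^{j_1}\wedge\cdots\wedge dx^{j_{n-1}}, \]

then, by definition,

\[ d\omega(x) = \sum_{[j]} d\omega_{[j]}(x)\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_{n-1}} = \sum_{[j]}\sum_{k=1}^{n} D_k\omega_{[j]}(x)\,dx^k\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_{n-1}}. \]

Now, since $1\le j_1<\cdots<j_{n-1}\le n$, it follows that

\[ dx^k\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_{n-1}} = \begin{cases} (-1)^{k+1}\,dx^1\wedge\cdots\wedge dx^n, & k\ne j_i,\ \forall i\in(1,n-1), \\ 0, & \exists j_i \text{ so that } k = j_i. \end{cases} \]

Hence

\[ d\omega(x) = \sum_{k=1}^{n}(-1)^{k+1} D_k\omega_k(x)\,dx^1\wedge\cdots\wedge dx^n, \]

where we have set

\[ \omega_k(x) = \omega_{1,\dots,k-1,k+1,\dots,n}(x). \]

If $\iota$ is the identity transformation of $E^n$ onto itself, it follows that $\forall x\in E^n$, $J_\iota(x) = 1$. Thus, from the definition of a surface integral given in Section 9.9, we get

\[ \int_I d\omega = \sum_{k=1}^{n}(-1)^{k+1}\int_I D_k\omega_k(x)\,dx. \]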

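If we use iterated integration on the integrals on the right we get

\[ \int_I D_k\omega_k(x)\,dx = \int_{I_k}\left[\int_0^1 D_k\omega_k(x)\,dx^k\right]dx_k, \]

where $x_k = (x^1,\dots,x^{k-1},x^{k+1},\dots,x^n)$ and $I_k = \{x_k : x\in I\}$. Thus

\[ \int_I D_k\omega_k(x)\,dx = \int_{I_k}\big[\omega_k\circ\psi_k(x_k) - \omega_k\circ\varphi_k(x_k)\big]\,dx_k. \]

From the definition of the functions $\psi_k$ and $\varphi_k$, and the definition of a surface integral given in Section 9.9, we get

\[ \int_{I_1^k}\omega = \int_J\sum_{[j]}\omega_{[j]}\circ\psi_k(t)\,\frac{\partial(\psi_k^{j_1},\dots,\psi_k^{j_{n-1}})}{\partial(t^1,\dots,t^{n-1})}(t)\,dt, \]

where $J = \{t : t\in E^{n-1}\ \&\ 0\le t^j\le 1,\ \forall j\in(1,n-1)\}$. Now, if $(j_1,\dots,j_{n-1})$ contains the integer $k$, then

\[ \frac{\partial(\psi_k^{j_1},\dots,\psi_k^{j_{n-1}})}{\partial(t^1,\dots,t^{n-1})}(t) = 0; \]

otherwise this Jacobian is identically 1. Thus

\[ \int_{I_1^k}\omega = \int_J\omega_k\circ\psi_k(t)\,dt = \int_{I_k}\omega_k\circ\psi_k(x_k)\,dx_k. \]

In the same way we find

\[ \int_{I_0^k}\omega = \int_{I_k}\omega_k\circ\varphi_k(x_k)\,dx_k. \]

Consequently,

\[ \int_I D_k\omega_k(x)\,dx = \int_{I_1^k}\omega - \int_{I_0^k}\omega, \]

and thus

\[ \int_I d\omega = \sum_{k=1}^{n}(-1)^{k+1}\left[\int_{I_1^k}\omega - \int_{I_0^k}\omega\right] = \sum_{j=0}^{1}\sum_{k=1}^{n}(-1)^{j+k}\int_{I_j^k}\omega. \tag{9.10.3} \]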

In Section 9.1 we formed formal sums of curves that we called chains and explained there how the meaning could be made precise. In the same way we shall form the formal sum

\[ \partial I = \sum_{j=0}^{1}\sum_{k=1}^{n}(-1)^{j+k} I_j^k, \tag{9.10.4} \]

and call this chain the oriented boundary of $I$. We shall leave to the reader the simple task of giving a precise definition of a chain in this context. We shall define the integral of $\omega$ over $\partial I$ as the last sum in (9.10.3). Thus we have proved the following.

9.10.1 Theorem. If $\omega$ is a $C^1$ differential form of order $n-1$ with (an open) domain in $E^n$ that contains the $n$-dimensional unit interval $I$, then

\[ \int_I d\omega = \int_{\partial I}\omega, \tag{9.10.5} \]

where $\partial I$ is the chain given by (9.10.4) and the integral on the right is defined as the last sum in (9.10.3).
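Formula (9.10.5) can be tested numerically for a particular form. The sketch below does this for $n = 3$, assuming a hypothetical form of order 2 with polynomial components $\omega_1 = (x^1)^2x^2$, $\omega_2 = x^2(x^3)^2$, $\omega_3 = x^1(x^3)^3$ (in the notation $\omega_k$ used above) whose partial derivatives $D_k\omega_k$ were computed by hand. The helper names are ours; the face maps are $\varphi_k$ and $\psi_k$ of (9.10.1) and (9.10.2), realized by inserting 0 or 1 as the $k$th coordinate.

```python
import itertools
import numpy as np

def gauss_nodes(m):
    # Gauss-Legendre nodes and weights rescaled from [-1,1] to [0,1].
    x, w = np.polynomial.legendre.leggauss(m)
    return 0.5 * (x + 1.0), 0.5 * w

def integrate_cube(f, dim, m=6):
    # Integrate f over [0,1]^dim with a tensor-product Gauss rule.
    x, w = gauss_nodes(m)
    total = 0.0
    for idx in itertools.product(range(m), repeat=dim):
        point = np.array([x[i] for i in idx])
        weight = np.prod([w[i] for i in idx])
        total += weight * f(point)
    return total

def face_point(t, k, value):
    # Insert `value` as the k-th coordinate (1-based): value=0 gives phi_k(t), value=1 gives psi_k(t).
    return np.insert(t, k - 1, value)

n = 3
# Hypothetical components omega_k and their hand-computed derivatives D_k omega_k.
omega = {1: lambda x: x[0] ** 2 * x[1],
         2: lambda x: x[1] * x[2] ** 2,
         3: lambda x: x[0] * x[2] ** 3}
d_omega = {1: lambda x: 2.0 * x[0] * x[1],
           2: lambda x: x[2] ** 2,
           3: lambda x: 3.0 * x[0] * x[2] ** 2}

# Left side of (9.10.5): sum over k of (-1)^(k+1) times the integral of D_k omega_k over I.
lhs = sum((-1) ** (k + 1) * integrate_cube(d_omega[k], n) for k in range(1, n + 1))

# Right side: sum over j, k of (-1)^(j+k) times the integral of omega over the face I_j^k.
rhs = sum((-1) ** (j + k)
          * integrate_cube(lambda t, j=j, k=k: omega[k](face_point(t, k, j)), n - 1)
          for j in (0, 1) for k in range(1, n + 1))

print(lhs, rhs)
```

For this choice both sides equal $2/3$, and the Gauss rule is exact because every integrand is a polynomial of low degree.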
To be able to obtain a formula like (9.10.5) for certain types of Jordan domains in manifolds, it is first necessary to obtain a version of (9.10.5) when $I$ is replaced by a homeomorphic image of $I$. This is described in the next theorem.

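9.10.2 Theorem. Suppose that $\varphi$ is a $C^2$ homeomorphism with domain (an open set) in $E^n$ which contains $I$, has range in $E^p$, $n\le p$, and $d\varphi(x)$ has rank $n$, $\forall x\in\mathcal{D}(\varphi)$ (see the Remark below). Further, suppose that $\omega$ is a $C^1$ differential form of order $n-1$ with domain (an open set) in $E^p$ that contains $\varphi(I)$. Then

\[ \int_{\varphi(I)} d\omega = \int_{\partial\varphi(I)}\omega, \tag{9.10.6} \]

where

\[ \partial\varphi(I) = \sum_{j=0}^{1}\sum_{k=1}^{n}(-1)^{j+k}\varphi(I_j^k), \tag{9.10.7} \]

\[ \int_{\partial\varphi(I)}\omega = \sum_{j=0}^{1}\sum_{k=1}^{n}(-1)^{j+k}\int_{\varphi(I_j^k)}\omega. \tag{9.10.8} \]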

Proof. The differential form $\varphi^*\omega = \omega\circ\varphi$ is a $C^1$ differential form of order $n-1$ whose open domain in $E^n$ contains the interval $I$. [See (9.6.15) for a definition of $\varphi^*\omega$.] From Theorem 9.10.1 we have

\[ \int_I d\varphi^*\omega = \int_{\partial I}\varphi^*\omega = \sum_{j=0}^{1}\sum_{k=1}^{n}(-1)^{j+k}\int_{I_j^k}\varphi^*\omega. \tag{9.10.9} \]

We claim that

\[ \int_{I_j^k}\varphi^*\omega = \int_{\varphi(I_j^k)}\omega. \tag{9.10.10} \]

Let us show this for $j = 1$, the situation for $j = 0$ being similar. As we have noted before, $I_1^k$ may be considered as a Jordan domain in $(N_k,\Psi^1)$, where $\Psi^1$ is the oriented structure on $N_k$ that contains the function $\psi_k$ given by (9.10.2). Let $J$ be the unit interval in $E^{n-1}$; then $\varphi(I_1^k) = \varphi\circ\psi_k(J)$. By Proposition 9.9.10 we get

\[ \int_{\varphi(I_1^k)}\omega = \int_{\varphi\circ\psi_k(J)}\omega = \int_J(\varphi\circ\psi_k)^*\omega. \]

On the other hand, using Proposition 9.6.10(c) we get

\[ \int_{I_1^k}\varphi^*\omega = \int_{\psi_k(J)}\varphi^*\omega = \int_J\psi_k^*\varphi^*\omega = \int_J(\varphi\circ\psi_k)^*\omega. \]

Thus the two integrals in (9.10.10) are the same. Consequently, the right side of (9.10.8) is the same as the right side of (9.10.9).

On the other hand, from Proposition 9.6.10(d) we get

\[ d\varphi^*\omega = \varphi^*d\omega. \]

From Proposition 9.9.10 we get

\[ \int_I d\varphi^*\omega = \int_I\varphi^*d\omega = \int_{\varphi(I)} d\omega. \]

Thus using the last equality, (9.10.8), (9.10.9), and (9.10.10) we see that (9.10.6) is valid.
REMARK: The conditions that $\varphi$ is a homeomorphism and $d\varphi(x)$ has rank $n$ are used only to remain within the context of integration on a manifold. Theorem 9.10.2 remains valid if we only assume that $\varphi$ is of class $C^2$ provided the concept of integration over a surface is used as given in Section 9.5. See the remark after the proof of Proposition 9.9.10.

We can now get a more general version of Stokes' theorem 9.10.2 by "patching together" formula (9.10.6). This more general version of Stokes' theorem is obtained for special types of Jordan domains. To describe these objects we shall introduce the half-space,

\[ H^n = \{x : x\in E^n\ \&\ x^n\ge 0\}. \]

In what follows we shall identify $E^{n-1}$ with the subspace of $E^n$ given by $\{x : x\in E^n\ \&\ x^n = 0\}$.

9.10.3 Definition. Suppose $(M,\Phi^k)$ is a regular, oriented, $n$-dimensional $C^k$ manifold in $E^p$ and $K$ is a bounded subset of $M$. The set $K$ is said to be a regular Jordan domain in $(M,\Phi^k)$ $\Longleftrightarrow$ $\forall x\in\partial K$, $\exists\varphi\in\Phi^k$ so that

(a) $x\in\mathcal{R}(\varphi)$.

(b) $\mathcal{D}(\varphi)\cap H^n = \varphi^{-1}(K)$.

(b') $\mathcal{D}(\varphi)\cap E^{n-1} = \varphi^{-1}(\partial K)$.

(See Exercise 10.)

REMARK: We should note here that, as in the case of defining a Jordan domain in an oriented manifold, we have clearly abused the tenets of notational precision in that $K$ itself is a point set and thus could be a regular Jordan domain in different oriented manifolds having the same trace $M$. However, we think that no confusion will result.

9.10.4 Proposition. Suppose $K$ is a regular Jordan domain in the oriented manifold $(M,\Phi^k)$, and $\Phi$ is that subset of all $\varphi\in\Phi^k$ so that

\[ \mathcal{R}(\varphi)\cap\partial K\ne\emptyset \]

and conditions (b) and (b') of Definition 9.10.3 are satisfied. Let $\Psi$ be that collection of all functions $\psi$ with the property that $\exists\varphi\in\Phi$ so that

\[ \psi = \varphi\,|\,\mathcal{D}(\varphi)\cap E^{n-1}. \]

Then $\Psi$ is a regular, oriented $C^k$ atlas for $\partial K$.


Proof. Let $\varphi\in\Phi$ and $t\in\mathcal{D}(\varphi)\cap E^{n-1}$. Then, of course, $(t,0) = (t^1,\dots,t^{n-1},0)\in\mathcal{D}(\varphi)$ and we set

\[ \psi(t) = \varphi(t,0). \]

For every $u\in E^{n-1}$ we have

\[ d\psi(t)(u) = \sum_{k=1}^{n-1} D_k\varphi(t,0)\,u^k = d\varphi(t,0)(u,0). \]

Now, $d\varphi(t,0)$ has rank $n$, and thus $d\varphi(t,0)\,|\,E^{n-1}$ must have rank $n-1$. This means that $d\psi(t)$ has rank $n-1$. By conditions (a) and (b') of Definition 9.10.3,

\[ \partial K = \bigcup\{\mathcal{R}(\psi) : \psi\in\Psi\}. \]

Thus $\Psi$ is certainly a $C^k$ atlas for $\partial K$.

To show that $\Psi$ is an oriented atlas, let $\varphi_1,\varphi_2\in\Phi$, and

\[ \psi_j = \varphi_j\,|\,\mathcal{D}(\varphi_j)\cap E^{n-1}, \qquad j = 1,2. \]

Let us set

\[ \mu(t) = \psi_2^{-1}\circ\psi_1(t) = \varphi_2^{-1}\circ\varphi_1(t,0), \qquad \nu(t,t^n) = \varphi_2^{-1}\circ\varphi_1(t,t^n). \]

Since $\mu(t) = \nu(t,0)$, it follows that

\[ D_j\mu^k(t) = D_j\nu^k(t,0), \qquad \forall j,k\in(1,n-1). \]

On the other hand, $\nu^n(t,0) = 0$, so that

\[ D_j\nu^n(t,0) = 0, \qquad \forall j\in(1,n-1). \]

Thus it follows that

\[ J_\nu(t,0) = D_n\nu^n(t,0)\,J_\mu(t). \]

Now, since $(M,\Phi^k)$ is an oriented manifold, it follows that $J_\nu(t,0)>0$. On the other hand, since $\nu^n(t,t^n)>0$ when $t^n>0$, it follows that $D_n\nu^n(t,0)\ge 0$. Thus we must have $J_\mu(t)>0$. Hence, $\Psi$ is an oriented atlas and the proposition is proved.


What we have just proved shows that there is an oriented $C^k$ structure $\Psi^k$ containing the atlas $\Psi$ of Proposition 9.10.4, so that $(\partial K,\Psi^k)$ is a regular, $(n-1)$-dimensional, oriented $C^k$ manifold. For reasons that will soon become apparent, we want to consider the oriented manifold $(\partial K,(-1)^n\Psi^k)$ rather than $(\partial K,\Psi^k)$. The reader is advised to reread the last two paragraphs of Section 9.8 to refresh his memory on the notation $(-1)^n\Psi^k$.

9.10.5 Definition. Suppose $K$ is a regular Jordan domain in the regular, oriented, $n$-dimensional $C^k$ manifold $(M,\Phi^k)$. Suppose further that $\Psi$ is that oriented $C^k$ atlas for $\partial K$ given in Proposition 9.10.4 and $\Psi^k$ is the oriented structure for $\partial K$ that contains $\Psi$. The oriented, $(n-1)$-dimensional, regular $C^k$ manifold $(\partial K,(-1)^n\Psi^k)$ is called the regular oriented boundary of $K$ with orientation induced by the orientation $\Phi^k$. (See the last two paragraphs of Section 9.8 for a definition of $-\Psi^k$.) We shall usually abuse the notation and designate $(\partial K,(-1)^n\Psi^k)$ by '$\partial K$.'
Examples. As a first example of a regular Jordan domain, let us consider the closed unit ball in $E^3$. We shall take $M = E^3$ and $\Phi^k$ to be the structure on $E^3$ that contains the identity transformation. The set $K$ is the closure of $B(0,1)$ and $\partial K$ is the unit sphere $\{x : |x| = 1\}$. Let $\varphi_\alpha$ be that function with domain $]0,\pi[\ \times\ ]\alpha,\alpha+2\pi[\ \times\ R^+$ given by

\[ \varphi_\alpha(t) = (t^3\cos t^1,\ t^3\sin t^1\cos t^2,\ t^3\sin t^1\sin t^2). \]

The Jacobian of $\varphi_\alpha$ at $t$ is

\[ J_{\varphi_\alpha}(t) = (t^3)^2\sin t^1 > 0, \]

and thus $\forall\alpha\in R$, $\varphi_\alpha\in\Phi^k$. If $\pi(x) = (x^2,x^3,x^1)$, we also get $J_{\pi\circ\varphi_\alpha}(t)>0$. Let $T(t) = (t^1,-t^2,-t^3)+e_3$ and set

\[ \psi_\alpha = \varphi_\alpha\circ T. \]

Then $\mathcal{D}(\psi_\alpha) = ]0,\pi[\ \times\ ]-\alpha-2\pi,-\alpha[\ \times\ ]-\infty,1[$, and

\[ \psi_\alpha^{-1}(K) = \mathcal{D}(\psi_\alpha)\cap H^3, \qquad \psi_\alpha^{-1}(\partial K) = \mathcal{D}(\psi_\alpha)\cap E^2. \]

We get the same results if we work with $\pi\circ\psi_\alpha$ instead of $\psi_\alpha$. Thus the functions in the set

\[ \{\psi_\alpha : \alpha\in R\}\cup\{\pi\circ\psi_\alpha : \alpha\in R\} \]

satisfy the conditions of Definition 9.10.3. If we restrict each $\psi_\alpha$ and $\pi\circ\psi_\alpha$ to $\mathcal{D}(\psi_\alpha)\cap E^2$, then we get an oriented atlas for $\partial K$ and by adding the identity transformation restricted to $B(0,1/2)$ we get that $K$ is a regular Jordan domain in $(E^3,\Phi^k)$.

As a second example let us consider the case of an anchor ring or bagel in $E^3$. We shall again take $M = E^3$ and $\Phi^k$ the structure on $M$ that contains the identity transformation of $E^3$ onto itself. Let $I_\alpha = ]\alpha,\alpha+2\pi[$, $I_\beta = ]\beta,\beta+2\pi[$ and $I_R = ]0,R[$. Let $\varphi_{\alpha,\beta}$ be that function with domain $I_\alpha\times I_\beta\times I_R$ given by

\[ \begin{aligned} \varphi_{\alpha,\beta}^1(t) &= (R+t^3\cos t^2)\cos t^1, \\ \varphi_{\alpha,\beta}^2(t) &= (R+t^3\cos t^2)\sin t^1, \\ \varphi_{\alpha,\beta}^3(t) &= t^3\sin t^2. \end{aligned} \tag{9.10.11} \]

The functions $\varphi_{\alpha,\beta}$ are $C^\infty$ homeomorphisms and moreover

\[ J_{\varphi_{\alpha,\beta}}(t) = t^3(R+t^3\cos t^2) > 0. \]

This shows that the collection of functions $\{\varphi_{\alpha,\beta} : \alpha,\beta\in R\}$ belong to $\Phi^k$. Now let $r\in\ ]0,R[$, let $\varphi$ be defined on $[0,2\pi]\times[0,2\pi]\times[0,r]$ by the right side of (9.10.11), and set $K = \mathcal{R}(\varphi)$. $K$ is an anchor ring in $E^3$ and $\partial K$ is a torus (Fig. 9.10.1) (see Exercise 4 of Section 9.5).

FIGURE 9.10.1

Let us set $T(t) = (t^1,-t^2,r-t^3)$ and

\[ \psi_{\alpha,\beta} = \varphi_{\alpha,\beta}\circ T. \]

The domain of $\psi_{\alpha,\beta}$ is $I_\alpha\times J_\beta\times\ ]r-R,r[$, where $J_\beta = ]-\beta-2\pi,-\beta[$. Clearly,

\[ \psi_{\alpha,\beta}^{-1}(K) = \mathcal{D}(\psi_{\alpha,\beta})\cap H^3, \qquad \psi_{\alpha,\beta}^{-1}(\partial K) = \mathcal{D}(\psi_{\alpha,\beta})\cap E^2. \]

Since $T$ has a positive Jacobian at each point, it follows that the same is true for $\psi_{\alpha,\beta}$. Thus we see that $K$ is a regular Jordan domain in $(E^3,\Phi^k)$.
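The two Jacobian formulas used in these examples, $(t^3)^2\sin t^1$ for the ball and $t^3(R+t^3\cos t^2)$ for the anchor ring, can be confirmed by finite differences. The sketch below is an illustration only: the sample points, the values $R = 2$ and $r = 0.5$, and the helper names are assumptions. It also checks that composing with $T$ keeps the Jacobian positive.

```python
import numpy as np

def jacobian_det(f, t, h=1e-5):
    # Central-difference approximation of the Jacobian determinant of f at t.
    t = np.asarray(t, dtype=float)
    cols = []
    for i in range(t.size):
        e = np.zeros(t.size)
        e[i] = h
        cols.append((f(t + e) - f(t - e)) / (2.0 * h))
    return np.linalg.det(np.column_stack(cols))

def phi_ball(t):
    t1, t2, t3 = t
    return np.array([t3 * np.cos(t1), t3 * np.sin(t1) * np.cos(t2), t3 * np.sin(t1) * np.sin(t2)])

R_BIG, R_SMALL = 2.0, 0.5   # hypothetical radii with 0 < r < R

def phi_ring(t):
    t1, t2, t3 = t
    rho = R_BIG + t3 * np.cos(t2)
    return np.array([rho * np.cos(t1), rho * np.sin(t1), t3 * np.sin(t2)])

def T_ball(t):
    return np.array([t[0], -t[1], 1.0 - t[2]])       # (t1, -t2, -t3) + e_3

def T_ring(t):
    return np.array([t[0], -t[1], R_SMALL - t[2]])   # (t1, -t2, r - t3)

p = np.array([1.0, 2.0, 0.7])
ball_err = abs(jacobian_det(phi_ball, p) - p[2] ** 2 * np.sin(p[0]))
ring_err = abs(jacobian_det(phi_ring, p) - p[2] * (R_BIG + p[2] * np.cos(p[1])))

q = np.array([1.0, -2.0, 0.3])   # a point of the domains of psi = phi o T (alpha = beta = 0)
psi_ball_det = jacobian_det(lambda t: phi_ball(T_ball(t)), q)
psi_ring_det = jacobian_det(lambda t: phi_ring(T_ring(t)), q)
print(ball_err, ring_err, psi_ball_det, psi_ring_det)
```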

9.10.6 Theorem. Suppose that $(M,\Phi^2)$ is a regular, oriented, $n$-dimensional $C^2$ manifold in $E^p$, and $K$ is a regular Jordan domain in $(M,\Phi^2)$. Suppose also that $\omega$ is a $C^1$ differential form of order $n-1$ with domain (an open set) in $E^p$ so that $K\subset\mathcal{D}(\omega)$. Then

\[ \int_K d\omega = \int_{\partial K}\omega. \tag{9.10.12} \]

Proof. Let $\Phi$ be the subset of $\Phi^2$ described in Proposition 9.10.4. For each $x\in\partial K$, $\exists\psi_x\in\Phi$ so that $x\in\mathcal{R}(\psi_x)$ and $I\subset\mathcal{D}(\psi_x)$, where $I$ is the unit interval in $E^n$. Let $B_x$ be an open ball about $\psi_x^{-1}(x)$ so that $B_x\cap H^n\subset I$, and let $\varphi_x = \psi_x|B_x$. The collection of the ranges of the latter elements is a relatively open covering for the compact set $\partial K$. Thus there is a finite set $\{\varphi_j : j\in(1,l)\}$ of these elements whose ranges cover $\partial K$. Now, $\forall x\in K_1 = K\setminus\bigcup\{\mathcal{R}(\varphi_j) : j\in(1,l)\}$, $\exists\psi_x\in\Phi^2$ so that $x\in\mathcal{R}(\psi_x)$, $I\subset\mathcal{D}(\psi_x)$ and $\mathcal{R}(\psi_x)\cap\partial K = \emptyset$. Let $B_x$ be an open ball about $\psi_x^{-1}(x)$ so that $B_x\subset I$ and let $\varphi_x = \psi_x|B_x$. Since $K_1$ is compact there is a finite set $\{\varphi_j : j\in(l+1,m)\}$ of these latter elements whose ranges cover $K_1$. Thus $\{\mathcal{R}(\varphi_j) : j\in(1,m)\}$ covers $K$.

Let $\{\pi_j : j\in(1,m)\}$ be a $C^1$ partition of unity for $K$ so that $\pi_j|M$ has support in $\mathcal{R}(\varphi_j)$. If $\psi_j$ corresponds to $\varphi_j$, as described above, then $\pi_j\circ\psi_j(I_i^k) = 0$, $\forall k\in(1,n-1)$, $i = 0,1$, and also $\pi_j\circ\psi_j(I_1^n) = 0$. From the fact that $\forall x\in K\setminus\psi_j(I)$, $\pi_j\omega(x) = 0$, we get, from Theorem 9.10.2,

\[ \int_K d\pi_j\omega = \int_{\psi_j(I)} d\pi_j\omega = \int_{\partial\psi_j(I)}\pi_j\omega = (-1)^n\int_{\psi_j(I_0^n)}\pi_j\omega. \]

Note that if $j\in(l+1,m)$, then $\pi_j\circ\psi_j(I_0^n) = 0$, so that in this case all the above integrals are zero.

Recall that we are supposing that $\partial K$ has the orientation induced by $\Phi^2$, that is, $(-1)^n\Psi^2$. Thus the last equality is nothing more than

\[ \int_K d\pi_j\omega = \int_{\partial K}\pi_j\omega. \]

Since $\{\pi_j : j\in(1,m)\}$ contains a partition of unity for $\partial K$ that is subordinate to $(-1)^n\Psi^2$, it follows that

\[ \int_{\partial K}\omega = \sum_{j=1}^{m}\int_{\partial K}\pi_j\omega. \]

On the other hand,

\[ \omega(x) = \sum_{j=1}^{m}\pi_j(x)\,\omega(x) \]

for all $x$ in some open set in $E^p$ that contains $K$. Thus, for the $x$ in this open set,

\[ d\omega(x) = \sum_{j=1}^{m} d\pi_j\omega(x). \]

Hence from Proposition 9.9.8 we get

\[ \int_K d\omega = \sum_{j=1}^{m}\int_K d\pi_j\omega. \]

Thus we see that formula (9.10.12) holds and the proof is complete.
The theorem we have just proved does not include Theorem 9.10.1 or 9.10.2 as special cases. Aside from the relatively minor fact that $\partial I$ and $\partial\varphi(I)$, when considered as chains, are logically different from manifolds, there is the more important fact that $\partial I$ and $\partial\varphi(I)$, when considered as boundaries of $I$ and $\varphi(I)$, respectively, cannot support structures that will make them into regular manifolds. The trouble, of course, lies in the fact that these boundaries have sharp corners. These things can be rectified by considering "piecewise $C^k$ manifolds," but we shall not go into these matters.
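For a domain with a smooth boundary, formula (9.10.12) can also be checked numerically. The sketch below takes $K$ to be the closed unit disk in $E^2$ and the hypothetical form $\omega = -x^2y\,dx + xy^2\,dy$, for which $d\omega = (x^2+y^2)\,dx\wedge dy$ (computed by hand). It assumes that the induced orientation of $\partial K$ is the one given by the counterclockwise parametrization $t\mapsto(\cos t,\sin t)$; that identification, like the helper names, is an assumption of the example.

```python
import numpy as np

def P(x, y):
    return -x ** 2 * y

def Q(x, y):
    return x * y ** 2

def d_omega_coefficient(x, y):
    # D_1 Q - D_2 P, computed by hand for the form above.
    return x ** 2 + y ** 2

def integral_over_disk(f, nr=20, nt=200):
    # Polar coordinates: Gauss-Legendre in the radius, midpoint rule in the angle.
    r, w = np.polynomial.legendre.leggauss(nr)
    r, w = 0.5 * (r + 1.0), 0.5 * w
    theta = 2.0 * np.pi * (np.arange(nt) + 0.5) / nt
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    values = f(rr * np.cos(tt), rr * np.sin(tt)) * rr   # area element r dr dtheta
    return (w[:, None] * values).sum() * (2.0 * np.pi / nt)

def integral_over_circle(nt=200):
    # Pull omega back by t -> (cos t, sin t) and integrate over one period.
    t = 2.0 * np.pi * (np.arange(nt) + 0.5) / nt
    x, y = np.cos(t), np.sin(t)
    return np.sum(P(x, y) * (-np.sin(t)) + Q(x, y) * np.cos(t)) * (2.0 * np.pi / nt)

lhs = integral_over_disk(d_omega_coefficient)
rhs = integral_over_circle()
print(lhs, rhs, np.pi / 2.0)
```

Both integrals come out as $\pi/2$ for this form, in agreement with (9.10.12).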

□ Exercises

1. Suppose $M$ is a compact, regular, oriented, $m$-dimensional $C^2$ manifold in $E^n$ and $\omega$ is a $C^1$ differential form of order $m-1$ with domain an open set in $E^n$ which contains $M$. Show that

\[ \int_M d\omega = 0. \]

2. Let $\omega$ be a $C^1$ differential form of order $m-1$ with domain (an open set) in $E^n$, $m\le n$. Suppose that for every $(m-1)$-dimensional sphere $S\subset E^n$ we have

\[ \int_S\omega = 0. \]

Show that $\omega$ is closed.

3. A fluid flows in $E^3$ with a velocity $v$ which depends only on the position vectors $(x,y,z)$. The flow is said to be irrotational if and only if $v$ is of class $C^1$ and

\[ \int_\gamma(v^1\,dx + v^2\,dy + v^3\,dz) = 0, \]

for every smooth, regular, oriented, closed curve $\gamma$ in $E^3$. Show that the flow is irrotational if and only if there exists a real-valued function $\varphi$ of class $C^2$ so that

\[ v = -\nabla\varphi. \]

9.10
4.

Let

STOKES' THEOREM I 477

K be a regular Jordan domain in the regular, oriented,


ci manifold (E", qri), where qri is the oriented structure

n-dimensional

for E" which contains the identity transformation. Show that the Jordan

content of K is given by

IKI

_! J (-1) Hxi dxi /\


n

/\

d;J" /\

/\ dx".

iJK j=l

K1 and K2 are regular Jordan domains in the regular,


2
2
(M 1, <l>i ) and (M 2, <1>2 ), respec
2
tively. If (aK1, (-1)"'1'1 )
(aK2, (-l)"'l'l), and w is a continuous,
5.

Suppose

oriented, n-dimensional C2 manifolds


=

exact, nth-order differential form, show that

Ki and K2 are regular Jordan domains in a regular,


2
C manifold, and K2 C Ki. Let w be a closed
1
C differential form of order n - 1 so that Ki \K2 C J0(w). Show that
6.

Suppose

oriented, n-dimensional

7. Generalize formula (9.3.13) to higher dimensions.

8. Generalize Exercises 8, 9, and 10 of Section 9.3 to higher dimensions.

9. Suppose a fluid flows in an open subset of $E^3$ with a velocity $v$ of class $C^1$ which depends only on the position vectors $(x, y, z)$. Suppose also that $\rho(x, y, z, t)$ is the density of the fluid at the position $(x, y, z)$ and at the time $t$. We shall also suppose that the flow has no sinks or sources; that is, the rate of increase of mass of the fluid which lies inside any fixed, closed surface equals the rate at which the mass flows through the surface. Assuming that $\rho$ is of class $C^1$, obtain the hydrodynamic equation of continuity,

\[ \frac{\partial \rho}{\partial t} + \frac{\partial (\rho v_1)}{\partial x} + \frac{\partial (\rho v_2)}{\partial y} + \frac{\partial (\rho v_3)}{\partial z} = 0. \]

10. Show that condition (b') of Definition 9.10.3 is implied by condition (b).

SYMBOLS

PAGE
2

negation

=::::}

implication, if ..., then ...

&

conjunction, and

{=}

for every

(x)(Q(x))

for every

there exists

16, 20

element of

16, 18

intersection

17

{x: Q(x)}
AXB
A\B
ACB
An B
n {A:A E v(..}
AUB
U {A:A E vt}
Ac

17

not an element of

21

0
JFJ(R)
5ll(R)
R -
R-1(A)

null set

24

F0G

composition of the functions F and G

24

!IA
f(A)

the range of f restricted to

16
17
17
17
17
17
17

22
22
22
22
24

24

disjunction, either ... or ...


equivalence, if and only if

x, Q(x) is true

the set of all x such that Q(x) is true

Cartesian product

elements in A and not in B
inclusion, A is contained in B

the intersection of all sets in .,(,


union
the union of all sets in vt
complement of the set

domain of the relation R

range of the relation R

inverse of the relation R

inverse image of A under R

equivalence relation
the function f restricted to the set

26

the natural numbers

31

the integers

35

the rationals

39

Q
Q+
(m, n)

45

No

the nonnegative integers

47

R
R+

the reals

48
52, 69

lim

limit

53

(x(n)), (xn)

a sequence

36

the positive rationals


the set of all integers between m and n

the positive reals


54, 235

lxl

absolute value, the length of the

63

g.l.b., inf

greatest lower bound, infimum

63

l.u.b., sup

least upper bound, supremum

69

[a,b], ]a,b[
[a,b[, ]a,b]
I(x)

half-open intervals

vector

69
69
71

(AC)

closed, open intervals


an open interval containing

77

closure of the set

91

lim,lim

limit superior, limit inferior

98

CT

98
99
132
133
133
138
138
183, 354
183, 354
183, 354
184, 355
184, 355

184

(a)n

L ak

k=O
(a, u(a))

2: <ak)
k=O
n
II ak
k=O
(a, Il(a))

II (ak)
k=O
f', df/dx, Df
f^(n), d^n f/dx^n, D^n f
A
IAI
A* A
R1(A, {xk})
i5,(A)
Q,(A)

J:
r
f

proof requires axiom of choice

finite sum
infinite series
infinite series
finite product
infinite product
infinite product
derivative of f
nth order derivative of f
decomposition of an interval
norm of a decomposition

A* is a refinement of A
Riemann sum for f

upper Darboux sum for f

lower Darboux sum for f

J(x) dx

upper Darboux integral of f

f(x) dx

lower Darboux integral of f

f(x) dx

Riemann-Darboux integral of f

_a

185
201

( f, /(J ))

improper integral

211

Riemann-Stieltjes integral

J(x) dg(x)

236

xy

dot or inner product

239
240

Ll.
L EBLl.

direct sum of L and

242

B(a,r)

open ball with center at

247

boundary of the set A

247

{3A
Ao

259

[ai!]

matrix

orthogonal complement of

interior of the set

LJ.
a

and radius r

263

transpose of T

T'

sign of

275

sgn

276

det

309

D.J

310

af
Dk,-k

310
315
316
325
325
328
353
356

356

[a;;]

determinant of

[a;;]

directional derivative
partial derivative

ax

differential of f at

df(a)
],(a)
dxk
Du Dvf

Jacobian of f at

differential of projection

akf
ax' ... axk

Ck, C""
d(I)

I dx
I f(x) dx
fA
f(x)

) <ix

higher-order partial
classes of differentiable functions
diameter of

upper Darboux integral of f


lower Darboux integral of f
Riemann-Darboux integral of f

361

'X( ),K(A)
IAI

Jordan content of

361

XA

characteristic function of

393,431

a(J1, ..., r) (a)


a(x1, ... ' xn)

Jacobian of f at

361

399,444

400

Jy

410
440
444

>.. /\ ,

(j]

outer and inner Jordan content of

A
A

usually a differential form


line integral of a differential form w

ip *w, w

xk

composition of directional derivatives

/C x

'P

composition of w and function 'P


exterior multiplication
special n-tuple of integers

447

dw(x )

differential of differential form w at

463

integral of a differential form over a

462,473

aK

Jordan domain
relative boundary; also oriented
boundary of a Jordan domain

INDEX

Abel's sum formula, 114


test, 114
theorem, 170

Bolzano-Weierstrass theorem, 65,


243
Borel, E., 77

Absolute value, 36,50,58

Boundary, 247

Accumulation point, 65, 242

Bounded set, 46

left, right, 85

Bounded variation, 218

Affine space, 306


transformation, 309
Algebra, 300,438

Cantor, G., 44

alternating tensor, 440

diagonal process, 121

closed, 300

function, 126

of differential forms, 437

intersection theorem, 79

exterior, 440

set, 121

finite dimensional, 438

Cartesian product, 16

Grassman, 440

Cauchy, A., 45

separating, 300
tensor, 439
Alternating multilinear functional,
277,439

Bunjakovsky-Schwarz inequality,
217, 236
condensation test, 113
net, 357

Analytic function, 163

product of series, 167,168

Arc cos, arc sin, arc tan, 175,176

remainder formula, 160

Archimedian ordering, 30,50,57

root test, 110

Arcwise connected, 251

sequence, 50

Area of a surface, 432

Cesaro summability, 109,171

Arzela-Ascoli theorem, 297

Chain, 401,469

Associative laws, 27,56

Chain rule, 145,321

Atlas, 455

Characteristic functions, 361

oriented, 457
Axioms, 2
for the natural numbers, 27
Axiom of choice, 71

Closed set, 77, 242


Cofactor, 284
Commutative laws, 27,56
Compact set, 77,242,294
Comparison test, 110
Complement of a set, 17

Ball, 242

Component, 245

Basis, 230

Composition of functions, 24

ordered, 258
Bernstein, S., 178,180

Conjunction, 3
Connected set, 244

polynomials, 182

arcwise, 251

theorem, 162

simply, 420

Binet-Cauchy theorem, 280

Content,

Blumenthal, L. M., 159

Jordan, inner and outer, 361

Boas, R. P., Jr., 142

of an interval, 353

Continuity, 73, 249
uniform, 80, 250
Convex sets, 245
Corollaries, 2
Cosine, 173
addition formula, 173
Countability, 38, 41
Covering, 77, 242
open, 77, 242

Differential of a function, 310


of a differential form, 447
higher order, 325, 328
Differentiation, 138, 305
of integrals, 377
rules, 145, 319
Dini's theorem, 131
Directed set, 356
Directional derivative, 307, 309

Cramer's rule, 286

Disjunction, 3

Critical point, 345

Distributive law, 27, 56

Curves, 398

Domain

closed, 399

of a function, 23

homotopic, 419

of a relation, 22

length of, 403

Dot product, 236

oriented, 398
piecewise regular and smooth, 399
Eigenvalue, eigenvector, 270
Equality, 12, 16
D'Alembert's ratio test, 111

Equicontinuity, 297

Darboux lower and upper sums and

Equivalence, 5

integrals, 184
integrable functions, 184

relation, 24
Euclidean spaces, 235

Darboux's theorem, 158

Euler's relation, 318

Darboux-Stieltjes integrals, 211

Existential quantifier, 7

Decimal expansions, 118

Exponential function, 142

terminating, 120
Decompositions of intervals, 183

generalized, 87
Exterior product, 441

refinements of, 183, 354


Dedekind, R., 44
Definition, 5
inductive, 87

Fejer, L., 178


theorem, 304

Denumerable, 38

Finite set, 41

DeRham, G., 454

Functionals,

Derivative, 138
higher order, 138
of the logarithm, 139
of the trigonometric and inverse

alternating multilinear, 277, 439


linear, 262
multilinear, 276, 438
Functions, 23

trigonometric functions, 173,

analytic, 163

176

arc cos, arc sin, arc tan, 175, 176

Derived set, 242

bounded variation, 218

Determinants, 274, 277

Cantor, 126

expansion by cofactors, 282

characteristic, 361

of linear transformations, 287

class C^k, C^∞, 328

Differential forms, 399, 444


algebra of, 437
closed, 419, 452

continuous, 73, 249


nowhere differentiable, 142
cosine, 174

differential of, 447

Darboux integrable, 184, 356

exact, 416, 452

differentiable, 138, 310

Functions (cont.)
equicontinuous, 297

Geometric-arithmetic means
inequality, 349, 351

exponential, 87, 142

Geometric series, 110

gradient of, 316

Gradient, 316

graph of, 305

Gram-Schmidt process, 283

harmonic, 429

Graph, 305

higher-order derivatives, 138

Green's theorem, 410, 414, 415

infinitely differentiable, 138


integrable, 184, 356, 371
integration of a sequence and
series of, 196
limits of, 69, 249

Hadamard, J., 166


Harmonic function, 429
Harmonic series, 100

linear, 257

Heine-Borel theorem, 77, 242

Lipschitz, 363

Heine, E., 47, 77

logarithm, 139
maximum and local maximum of,

81, 345
minimum and local minimum of,

81, 345
monotone, 84

theorem, 80
Hessian, 346
Homeomorphism, 250
Higher-order differences, 152
differentials, 325, 328
partial derivatives, 325

multivalued, 23

Hölder's inequality, 352

one-to-one, 23

Homotopic curves, 419

open, 249
oscillation of, 366
periodic, 175

Implication, 3

permutations, 274

Implicit Function Theorem, 340

piecewise regular and smooth, 399

Improper integrals, 201

polynomial, 75

absolutely convergent, 202

product of, 72

Cauchy, principal value, 207

quotient of, 72

convergent, 202

Riemann-Darboux integrable,

divergent, 202

184, 371

of the first and second kind, 201

saltus, 224

Inclusion of sets, 17

sequences and series of, 126

Index, 482

differentiable, 156, 157


sine, 173
spaces of, 293
sum of, 72
support of, 462
tangent, 176
trigonometric, 172
uniformly continuous, 80, 250
variation of, 222
Fundamental theorem of the
calculus, 199

Induction, 27
Inductive
definition, 87
set, 57
Inequalities,
Cauchy-Bunjakovsky-Schwarz,

217, 236
geometric-arithmetic means, 349,

351
Hölder, 352
Minkowski, 352
triangle, 36, 237
Inference, 6

Gauss's theorem, 410, 414, 415


Generalized Mean Value Theorem,

151

Infinite products, 133


absolutely and conditionally
convergent, 135

Infinite series, 98
Abel's test, 114

Intervals (cont.)
higher dimensional, 353

absolutely convergent, 101

Interior of a set, 247

comparison test, 110

Intersection of sets, 17

condensation test, 113

Inverse of a function and a

conditionally convergent, 103

relation, 22, 23

convergent, 99

Inverse Function Theorem, 335

decimal expansions, 118

Isomorphism, 32

of differentiable functions, 157

Iterated integration, 374

Dirichlet's test, 115


divergent, 99
of functions, 126
geometric, 110

Jacobian, 315
matrix, 315

grouping of, 103

Jordan content, 359, 361

harmonic, 100

Jordan domain, 462

of integrable functions, 196


integral test, 203

inner, outer, 361


regular, 471

Leibnitz' test, 115

Jordan measurable set, 361

limit comparison test, 117

Jump discontinuity, 85

p-series, 113
power, 165
Pringsheim's theorem, 109

Lachterman, S., 159

product of, 168

Lagrange,

Raabe's test, 112

identity, 217, 240

ratio test, 111

multipliers, 347, 349

rearrangement, 101
Riemann's theorem, 104

remainder formula, 160


Laplace operator, 414

root test, 110

Least upper bound, 63

sum of, 101

Lebesgue outer measure, 369

Infimum, 63
Infinite set, 41

Lebesgue's theorem on Riemann


integrable functions, 371

Inner product, 236

Leibnitz' differentiation formula, 149

Integers, 31, 60

Lemmas, 2

Integral mean value theorems, 195,

Length of a curve, 403

215

L'Hospital's rule, 154

Integral test for series, 203

Limit comparison test, 117

Integrals,

Limit,

Darboux-Stieltjes, 211

of a function, 69, 249

line, 400

generalized, 356

Riemann-Darboux, 184, 185, 353,

inferior and superior, 90

371
Riemann-Stieltjes, 211
Integration, 183, 353
of differential forms, 396

left and right, 85


of a sequence, 50
Line integrals, 396, 400
Linear,

iterated, 374

combination of vectors, 230

on manifolds, 461

functional, 262

by parts, 192

manifold, 230

Intervals, 69, 353


content or volume of, 353
decompositions of, 183, 354

space, 233
subspace, 230
Linear transformations, 257

Linear transformations (cont.)
eigenvalues and eigenvectors of,
270
matrix representation of, 258
nonsingular, 257
norm of, 261
orthogonal, 271
projections, 266
proper values and proper vectors

Natural numbers, 26, 57


Negation, 2, 8
Net, 357
Cauchy, 357
limit of, 357
Nondenumerable, 121
Nowhere differentiable continuous
function, 142
Null set, 21

of, 270
rank of, 257
symmetric, 269
transpose of, 263
Linearly dependent and independent vectors, 229
Lipschitz functions, 363
Logarithm, 139
natural, 142

Open,
ball, 242
covering, 77, 242
function or map, 249
set, 77, 242
Order preserving, 36
Ordered,
basis, 258
n-tuple, 228

Maak, W., 161


Maclaurin series, 165
Manifolds, 455

pair, 16
Oriented,
curves, 398

C^k structure for, 455

manifolds, 457

integration on, 461

surfaces, 432

orientable, oriented, orientation,


457
regular C^k, 455
topological, 455
trace of, 455
Map, 23, 249
Matrix, 258
adjoint, 285

Orientation,
of a curve, 405
induced, 473
of a manifold, 457
Orthogonal complement, 239
Orthogonal transformations, 271
Oscillation of a function, 366
Outward normal, 414, 434

Jacobian, 315
skew-symmetric, 292
symmetric, 269
transpose of, 264
Maximum, 45, 81, 345
local, 81, 150, 345
Mean Value Theorem, 150, 332
Generalized, 151
Mertens' theorem, 168
Mikolas, M., 142
Minimum, 46, 81, 345
local, 81, 150, 345
Minkowski's inequality, 352

Partial derivatives, 310


higher order, 325
Partition of unity, 299, 462
subordinate to an open cover,
299, 462
Path, 305
Peano curves, 253
Perfect set, 123
Period, 175
Permutation, 274
Piecewise regular and smooth, 399

Möbius band, 457

Poincaré's lemma, 454

Monotone functions, 85

Power series, 165

Multilinear functionals, 276, 438


alternating, 277, 439

Abel's theorem, 170


Cauchy-Hadamard's theorem, 166

Power series (cont.)

interval of convergence, 166

Sequence, 45, 50, 58


bounded,62

product of, 167

Cauchy,50,58

radius of convergence, 166

convergent,50, 58

Tauberian theorems, 171

of differentiable functions, 156

Predicate calculus, 7

of functions, 126

Pringsheim's theorem, 109

limit of, 50, 58

Products, 131

monotone, 62

infinite, 133

subsequence, 53, 58

Projections, 266

Series (see Infinite series)

Proof, 6

Sets, 16

by contradiction, 14

arcwise connected, 251

Proper value and proper vector,270

boundary of, 247

Propositional calculus, 7

bounded, 62, 242

Propositions, 2

Cantor, 122

Pythagorean theorem, 44

closed, closure of, 77, 242


compact, 77, 242, 294
connected, 244

Raabe's test, 112


Range of a function and a relation,
22, 23

convex, 245
countable, 60
covering for, 77, 242

Rank of a linear transformation, 257

dense, 60

Ratio test, 111

denumerable, 38, 60

Rationals, 35, 60

derived, 242

Reals, 44, 56

directed, 356

Rearrangement of infinite series,

finite, 41, 60

101, 104

greatest lower bound of, 63

Recursive definition, 132

infimum of, 63

Relation, 22

interior of, 247

equivalence, 24

least upper bound of, 63

reflexive, 24

open, 77

symmetric, 24

perfect, 123

transitive, 24

relatively open and closed, 244

Relatively open and closed sets, 244

simply connected, 420

Riemann-Darboux integrals, 183,

star-shaped, 420, 452

353
existence of, 197, 371
properties of, 190, 371
Riemann-Stieltjes integrals, 210
existence of, 219
properties of, 212

supremum of, 63
totally bounded, 294
Simply connected sets, 420
Sine, 173
addition formula, 173
Spherical coordinates, 268

Roche remainder formula, 160

Star-shaped sets, 420, 452

Rolle's theorem, 150

Statement, 3

Root test, 110

Stieltjes-Riemann integral, 210


Stokes' theorem, 410, 470, 475
Stone-Weierstrass theorem, 300

Saddle point, 345


Saltus function, 224

Structure, 455
oriented, 457

Schlömilch remainder formula, 160

Subsequence, 53, 58

Semicontinuity, lower and upper, 97

Subset, 18

Support of a function, 462

Trace (cont.)

Surface, 306

of a manifold, 455

area, 432

of a surface, 306, 432

integrals, 435

Transformation theorem for

oriented, 432

integrals, 390

Symbols, 479

Truth table, 2, 3

Symmetric,
matrix, 270
transformation, 269

Uncountable, 121
Uniform continuity, 80, 250
Uniform convergence, 126

Tangent,

Union of sets, 17

function, 176

Universal quantifier, 7

plane, 309
vector, 306
Tauberian theorems, 171

van der Waerden, B. L., 142

Taylor's,

Variable, 7

expansion, 163

Variation of a function, 222

series, 165

Vectors,

theorem, 330

components of, 229

Taylor's remainder formulas, 159

linearly dependent and

Cauchy form, 160

independent, 229

integral form, 192

orthogonal and orthonormal, 238

Lagrange form, 160

Vector space, 228, 233

Roche form, 160


Schlömilch form, 160

basis for, 230


finite dimensional, 233

Tensor algebra, 439

linear subspace of, 230

alternating, 440

Venn diagram, 17

Ternary expansions, 121

Volterra, V., 454

Theorems, 2

Volume of an interval, 353

Triangle inequality, 36, 237


Trichotomy, 27, 48, 57
Trigonometric functions, 172
inverses of, 175
Topological map, 250
Totally bounded set, 294
Trace,
of a curve or path, 305

Weierstrass, K., 47
approximation theorem, 129
M Test, 129
Well ordering, 29
Weyl, H., 1
