
Lecture Notes for MATH201

Revised, December 2013

Rodney Nillsen
Contents

1 Introduction
1.1 Historical background
1.2 Where does this course fit in?
1.3 How to use these notes
1.4 Books and references
1.5 Comments
1.6 Acknowledgments

2 Sets, functions, vectors and open sets
2.1 Sets
2.2 Functions
2.3 One-to-one functions and the range of a function
2.4 Composition of functions
2.5 Inverse functions
2.6 The space Rn of n-dimensional vectors
2.7 Position vectors and direction
2.8 Inner products and length
2.9 Spheres and open sets
2.10 Matrices
2.11 Determinants
2.12 The cross product of vectors in R3
2.13 Linear functions
2.14 Exercises

3 Lines, planes and surfaces
3.1 Lines
3.2 Planes
3.3 Surfaces
3.4 Exercises

4 Differentiation
4.1 The derivative of a function
4.2 Partial derivatives
4.3 The matrix representation of the derivative
4.4 The Chain Rule
4.5 The Chain Rule in classical notation
4.6 Higher derivatives
4.7 Polar coordinates
4.8 The Jacobian and the Inverse Function Theorem
4.9 Implicit Functions
4.10 Exercises

5 Integration in two dimensions
5.1 Integration over a rectangle
5.2 Integration over subsets of R2
5.3 Repeated integration
5.4 Integration by substitution
5.5 Integration using polar coordinates
5.6 Areas as integrals
5.7 Exercises

6 Curves and vector fields
6.1 Curves and paths
6.2 Directional derivatives
6.3 Vector fields
6.4 The gradient
6.5 The divergence of a vector field
6.6 The curl of a vector field
6.7 Exercises

7 Integration on curves and surfaces
7.1 Integration of vector fields along curves in Rn
7.2 Integration along curves in R2
7.3 Green's Theorem in R2
7.4 Integrals that are independent of the path
7.5 Surfaces and normals
7.6 Integration over a surface
7.7 Integration of a vector field over a surface in R3
7.8 Volume integrals
7.9 The divergence theorem
7.10 Stokes' Theorem
7.11 Exercises

8 Optimization
8.1 Taylor series
8.2 Maxima and Minima
8.3 Constrained maxima and minima and Lagrange multipliers
8.4 Exercises

9 Appendices
9.1 Open, closed and bounded sets
9.2 Limits, continuity and the existence of maxima and minima
9.3 The equality of mixed partial derivatives
9.4 Proof of the Inverse Function Theorem

Chapter 1

Introduction

1.1 Historical background

Sir Isaac Newton (1643-1727) and Gottfried Wilhelm Leibniz (1646-1716) discovered the
calculus in the seventeenth century. This is the calculus as taught to high school pupils and,
at a higher level, to first year university students. In his book, The Principia, Newton used
the calculus to show that a universal law of gravitation explained the observed motion of the
planets around the sun, the moon’s motion about the earth, the tides and other phenomena.
The calculus as developed by Newton and Leibniz was a calculus involving functions of a single
real variable.
The corresponding calculus for functions of several variables was slow to develop but, by the
mid nineteenth century the mathematicians Green, Gauss and Stokes had developed fundamen-
tal results in the area. In this they had been motivated, in part, by new discoveries in physics,
in particular the notions of electromagnetic and electrostatic fields which, like gravitation, were
fields of force, but arising from magnetism and electricity rather than gravity. By the end of
the 19th century, mathematicians like Cantor, Weierstrass and Dedekind had put elementary
calculus on an intellectually rigorous basis, equivalent to the basis which the Greeks had established for Euclidean geometry some two thousand years before. In the 20th century, Einstein’s theories
of special and general relativity gave further direction for work and clarification involving the
calculus of several variables.
In the 20th century David Hilbert emphasised the strengths and power of placing all of
mathematics on the axiomatic basis the Greeks had created for Euclidean geometry. Gradually,
the calculus of several variables came to have the conceptual clarity and rigour of one variable
calculus. Two notions here are central: the recognition of the derivative of a function as a linear
transformation or linear function describing the approximate behaviour of a function near the
point of differentiation, and the notion of n-dimensional manifold — that is, a surface in an
m-dimensional space which, locally, looks like an n-dimensional sphere. Thus, the surface of an
ordinary sphere in 3-dimensional space R3 is a 2-dimensional manifold because the region near
each point on the sphere looks like the interior of a circle — that is, it looks like a 2-dimensional
sphere. In general relativity, optimisation problems, statistics, economics, mathematical mod-
elling and dynamical systems, functions of several variables now play an important role, even one
which is indispensable.


1.2 Where does this course fit in?


The course will reflect the 20th century changes in thinking about multivariate calculus. We consider functions given by a formula f (x1 , x2 , . . . , xn ), where each xj is a single real variable, instead of functions given by an expression involving a single real variable — that is, we consider functions of n real variables rather than just one real variable, which is why the
course is called multivariate calculus. The course assumes a knowledge of first year university
calculus and aims to integrate the first year approach to calculus into multivariate calculus.
Thus, whereas in first year (Df )(a) or Df (a) denoted the derivative of the function f at the
point a, the same notation for the same concept will be used for functions of several variables.
We shall see that because Df (a) is a linear transformation, it can be interpreted as a matrix, denoted by f ′(a). Thus we have

matrix of Df (a) = f ′(a).

In first year, f ′(a) is a 1 × 1 matrix, but in the calculus of several variables, f ′(a) is an m × n matrix, where n is the number of variables and m is the number of components of f . The entries in the matrix f ′(a) are found by
calculating derivatives of f , as in first year.
The Chain Rule for differentiation in first year reads:

(f ◦ g)′(x) = f ′(g(x)) g ′(x),

and in multivariable calculus the rule is identical, except that it is interpreted as a matrix
multiplication.
As the course treats the derivative as a linear transformation with a matrix representation,
there is a link between MATH201 and the Linear Algebra course MATH203. However, there is no
need to do MATH203 to understand MATH201. Students doing MATH203 as well as MATH201
will be able to see links between apparently different areas, and 20th century mathematics
strongly emphasised the links and connections between different areas. These links provide
mathematics with a strong sense of conceptual unity and coherence.
The course will try to point out specific connections to other subject areas such as physics
and economics.

1.3 How to use these notes


The notes are not, in any way, a substitute for coming to lectures. The notes have few examples
and they are not complete, nor are they intended to be complete as a set of notes. The notes
describe concepts, establish notations to be used, make definitions and give precise statements
of the results. To this extent, they aim to be complete and self contained. The notes aim to
give a complete, coherent description of the sequence of ideas and to present the “intellectual
backbone” of the course. However, an average student reading just these notes would find
them “heavy going”, especially as there are few examples at this stage. Understanding of the notes will
come more fully from lectures, tutorials, problem solving and discussing the ideas and concepts.
Mathematical ideas are not generally obvious immediately, and a longer reflection upon them is
usually required before the learner feels they are “internalized” and “belong” to him or to her.
The notes are intended to form a coherent whole, not a series of isolated sections, each
one of which has a mere calculating technique or has to be memorized for the exam. While
calculations remain a very important part of the course, understanding at a deeper level comes
from understanding the underlying concepts and how they piece together throughout so as to
form a coherent and comprehensible pattern of ideas.
The notes have few proofs. However, students should note that the function of proof is
to provide intellectual justification for the truth of statements. As such, proofs are essential to
mathematics, although they are sometimes presented implicitly, rather than explicitly, especially
in applications. Some further proofs may be presented in lectures. The end of a proof is denoted
by the symbol □.
There are many different notations used in multivariate analysis, mainly due to historical
reasons. Here, the number of notations is kept to a minimum. But students should be aware
of some differences in what they might see in looking at these notes and books on advanced
calculus or multivariate analysis. There are notations that are sometimes used to distinguish
between vectors and scalars — for example in some books, a vector may be indicated by putting
it in bold type, or a vector might be indicated by underlining it. These notations are not used
here. Essentially, we use only two notations for vectors: (i) just a plain letter, or a letter with
a subscript if dealing with several vectors at once, (ii) a position or direction vector is indicated
by $\overrightarrow{0x}$ or $\overrightarrow{xy}$ to denote respectively the direction associated with the vector x relative to 0, or
the direction associated with the vector y relative to x (see the discussion in the text). Only
one notation is used to denote the length of a vector, whereas many different notations may be
found by looking at other sources. The viewpoint here is that there is no adequate mathematical
reason to distinguish between vectors and scalars. Thus, a number or scalar x is a vector as much
as (x, y) is a vector. Even physically, there is no real reason to distinguish between vectors and numbers, for a real number may be positive or negative – if positive it is “pointing” in one
direction and if negative it is “pointing” in the opposite direction. Thus, real numbers have both
a “magnitude” and a “direction”, just like vectors. Having said all this, in some circumstances
different notations are used for the same thing, in particular the different notations for partial
derivatives are used (see discussion in the text).
There are various figures in the text. The figures are not necessarily to scale, and are there
to clarify concepts and comprehension, rather than to be visually accurate.
Section 2 of the notes is mainly revision of ideas from first year, and only some parts will
be discussed in any detail. You should study the remaining part of Section 2 for yourself, as
necessary.

1.4 Books and references


There is no text book for the course. The following are reference works only and they will be
placed in the closed reserve in the library. Not all of the material in the following works is
necessarily suitable for all of the course. Also, there are many books on advanced calculus in the
library that have useful and relevant material, but not all such books reflect the 20th century
changes in multivariate analysis and advanced calculus.

• W. Kaplan, Advanced Calculus, Addison Wesley, 2002. Library call number 515/27.

• E. Kreyszig, Advanced Engineering Mathematics, Wiley, New York 1988 and subsequent
editions. Library call number 510.265/5.

• Lynn H. Loomis, and Shlomo Sternberg, Advanced Calculus, Addison-Wesley, 1968. Li-
brary call number 515/71.

• M. Spivak, Calculus on Manifolds, W.A. Benjamin, New York, 1965. Library call number
515.1/4.

• J. Stewart, Multivariable Calculus: concepts and contexts, Thomson, Belmont CA, 2005.
Library call number 515/288.

1.5 Comments
These notes are still “work in progress” and have been prepared while the author has been doing
many other things. Anyone wishing to make comments, suggestions or draw attention to errors
is invited to do so. I can be emailed at nillsen@uow.edu.au. In the future, there will be a bit
more detail on integration, and more exercises reflecting the changes in the notes.
The illustration on the front cover is of a hyperboloid in 3-dimensional space given by the equations $z = \pm\sqrt{x^2 + y^2 - 1}$. The illustration on the title page on the inside of the front cover is of a parametric curve (x(t), y(t)) in 2-dimensional space given by
\[ x(t) = \sum_{n=1}^{500} \frac{\cos(n^2 \pi t)}{n^2} \quad \text{and} \quad y(t) = \sum_{n=1}^{500} \frac{\sin(n^2 \pi t)}{n^2}. \]

1.6 Acknowledgments
I wish to acknowledge the help of the office staff in the School, especially that of Sue Denny
who typed up my early drafts for most of the notes. I wish to thank Tania Barrett and Paul
Wakenshaw for reading an earlier draft of the notes and correcting some errors. My thanks go to
James McCoy, who has also pointed out some errors. I would also like to thank Des Clarke for
access to his earlier notes for Math201. I also thank those students who have drawn attention
to errors and made comments on the notes.

Rod Nillsen
School of Mathematics and Applied Statistics
University of Wollongong
February 2010, some revisions made December 2013
email: nillsen@uow.edu.au
web site: www.uow.edu.au/∼nillsen
Chapter 2

Sets, functions, vectors and open sets

This Chapter introduces some of the basic concepts and notation used in the course. The main
concepts are those of a set and a function. A set of particular importance is the set Rn of
n-dimensional vectors.

2.1 Sets
A set is a collection of objects called elements or points. We write

x∈A

to mean x is an element of the set A. A set may be specified in various ways. If

A = {x1 , x2 , . . . , xn },

this would mean that A is a set with a finite number n of distinct elements x1 , x2 , . . . , xn .
If
A = {z1 , z2 , z3 , z4 , . . .},
this would mean that A is a set with a (presumably) infinite number of elements z1 , z2 , . . .. Note
that, in describing a set, the order of the elements is not important. Thus, it would also be true
to say that

A = {z2 , z1 , z3 , z4 , z6 , z5 , . . .}
The other way a set may be specified is that its elements are those having a given property,
P say. In this case we write

A = {x : x satisfies property P }

or
A = {x : x satisfies P },
and we say: “A is the set of all objects x that have the property P”.
For example,
[0, ∞) = {x : x is a real number and x ≥ 0}
Some sets appear often and have special symbols. Thus,


N = {1, 2, 3, . . .}
Z = {. . . , −2, −1, 0, 1, 2, 3, . . .}
R is the set of all real numbers
(a, b) = {x : x ∈ R and a < x < b}
[a, b) = {x : x ∈ R and a ≤ x < b}
(0, ∞) = {x : x ∈ R and x > 0},
R+ = [0, ∞) = {x : x ∈ R and x ≥ 0}.

Given n ∈ N we define
\[ R^n = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} : x_i \in R \text{ for all } 1 \le i \le n \right\}. \]
The elements of Rn are called n-dimensional vectors. If x ∈ Rn with
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \]
xj is called the j th coordinate, or coordinate j, of x for all 1 ≤ j ≤ n. Sometimes, a vector x ∈ Rn may be written as x = (x1 , x2 , . . . , xn ). The set Rn is the set of all n × 1 real matrices or, if we think of Rn in the alternative way, it is the set of all 1 × n matrices.
If A, B are two sets, we say A is a subset of B if all elements of A are elements of B, and we
write A ⊆ B. Thus

A ⊆ B ⇐⇒ (x ∈ A =⇒ x ∈ B).

Here, note that the statement x ∈ A =⇒ x ∈ B reads as “if x is in A then x is in B” or as


“x ∈ A implies x ∈ B”. The double arrow ⇐⇒ we read as “if and only if”, in which case we
might say that the statements on each side of the double arrow are equivalent.
If A, B are two sets, we define the union A ∪ B and the intersection A ∩ B of A and B by
A ∪ B = {x : x ∈ A or x ∈ B} and A ∩ B = {x : x ∈ A and x ∈ B}.

If A, B are sets, the Cartesian product or simply the product of A and B is the set A × B
consisting of all pairs (a, b) with a ∈ A and b ∈ B. Thus,
A × B = {(a, b) : a ∈ A and b ∈ B}.

More generally, the (Cartesian) product of n sets A1 , . . . , An is the set A1 × A2 × · · · × An given by

A1 × A2 × · · · × An = {(a1 , a2 , . . . , an ) : aj ∈ Aj for all 1 ≤ j ≤ n}.

In particular, we have that


Rn = R × R × · · · × R,
where the product is taken n times. If A = [a1 , a2 ] and B = [b1 , b2 ] are closed intervals,
A × B = {(x, y) : a1 ≤ x ≤ a2 and b1 ≤ y ≤ b2 },

so we see in this case that A × B can be thought of as a rectangle in R2 (the rectangle here
includes both the interior and the boundary). More generally, if we have intervals
[a11 , a12 ], . . . , [aj1 , aj2 ], . . . , [an1 , an2 ], the set

[a11 , a12 ] × · · · × [aj1 , aj2 ] × · · · × [an1 , an2 ]

is called a closed n-dimensional rectangle in Rn .
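
As a brief aside (not part of the notes themselves), these set operations are easy to experiment with on a computer. The following Python sketch, which uses only the standard library and sets chosen here purely as an example, forms the union, the intersection and the Cartesian product of two small finite sets.

    from itertools import product

    A = {1, 2, 3}
    B = {2, 3, 4}

    print(A | B)                 # union: {1, 2, 3, 4}
    print(A & B)                 # intersection: {2, 3}
    pairs = set(product(A, B))   # Cartesian product: all pairs (a, b) with a in A, b in B
    print(len(pairs))            # 9 pairs, since A and B each have 3 elements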

2.2 Functions
If A and B are sets, a function f from A into B is a procedure that assigns, to each element
x ∈ A, a unique element f (x) ∈ B. We then write

f : A −→ B

to describe this situation. The set A is called the domain of the function, and the set B is called
the codomain of the function. We say f maps A into B. If x ∈ A, the element f (x) of B is
called the value of f at x.
Important comment: We distinguish between the function f and the value f (x) of the
function at x. The symbol f refers to the total procedure by which the function assigns a unique
value f (x) ∈ B to each x ∈ A. But if x ∈ A, f (x) is an element of B, namely the value of the
function at x. This conceptual distinction is vital to understanding many aspects of this course.
In our course, function values are generally specified by a (multivariate) formula. So, for
example, the sine function is the function

f : R −→ R

given by
f (x) = sin x.
That is, the sine function assigns to each point x ∈ R the value sin x ∈ R. Or we might consider
a function g : R2 −→ R3 given by

g(x, y) = (x2 − y 2 , 2xy, x2 + y 2 ),

using the row notation for vectors.


If A, B are subsets of R and f : A −→ B, where A and B are taken as given, or are implicit,
the function f may be described by saying it’s the function or mapping given by

x 7−→ f (x),

where f (x) is a formula specifying the value f (x). Note the different arrow used in this context.
EXAMPLE. The function

f : (−∞, −4) ∪ (−4, 0) ∪ (0, ∞) −→ R



given by
\[ f(x) = \frac{(x - 1)(x - 2)}{x(x + 4)}, \]
might more informally be described as the function
\[ x \longmapsto \frac{(x - 1)(x - 2)}{x(x + 4)}, \]

where the domain and the codomain are implicit.


If we have a function f : A → B, and if C is a subset of A, we may restrict f to the subset
C . Thus, the function mapping C → B given by

x 7−→ f (x), for x ∈ C,

is called the restriction of f to C, and is denoted by f |C. Thus,

f |C : C −→ B.

If A is a set, the identity function on A is the function

ι : A −→ A

given by
ι(x) = x, for all x ∈ A.

2.3 One-to-one functions and the range of a function


Suppose that A, B are given sets and that f is a function with domain A and codomain B.
Thus, f : A −→ B. Let y ∈ B be given, and consider the equation

f (x) = y, (2.1)

to be solved for x ∈ A. There are 2 possibilities. These are:


(1) equation (2.1) has no solution for x ∈ A, or
(2) equation (2.1) has at least one solution for x ∈ A.
If (2) occurs, we say that y is in the range of f . Thus,

range of f = {y : y ∈ B and f (x) = y for some x ∈ A}.

If
range of f = codomain of f,
we say that the function f is onto, or that it maps its domain onto its codomain.
Now, when case (2) occurs, we know that the equation (2.1) has at least one solution, and
we will have either
(3) equation (2.1) has exactly one solution for x ∈ A,
or
(4) equation (2.1) has more than one solution for x ∈ A.
If, for each y ∈ B, either case (1) or case (3) occurs, then the function f is said to be one-
to-one. Thus, if f : A −→ B is a function, f is one-to-one if for each y ∈ B, either the equation
f (x) = y for x ∈ A has no solution for x or has exactly one solution for x.

DEFINITIONS. Let f : A −→ B be a given function. Then, if C ⊆ A, we define the set f (C) by putting

f (C) = {f (x) : x ∈ C}.

That is, f (C) is the subset of the codomain B obtained by applying the function f to all points of C. Note that y ∈ f (A) is equivalent to saying that f (x) = y for some x ∈ A. Thus, f (A) equals the range of f . We say that f is onto, or that f maps A onto B, if f (A) = B. That is, f is onto when the range of f equals the codomain B.
EXAMPLE. Let f : (−∞, 2) ∪ (2, ∞) −→ R be given by

\[ f(x) = \frac{x - 3}{x - 2}, \]

and consider the equation

f (x) = y,

to be solved for x in the domain of f — that is, we want x ≠ 2. We have

f (x) = y ⇐⇒ x ≠ 2 and (x − 3)/(x − 2) = y
⇐⇒ x ≠ 2 and x − 3 = xy − 2y
⇐⇒ x ≠ 2 and x − xy = −2y + 3
⇐⇒ x ≠ 2 and x(1 − y) = −2y + 3.

Thus, if y = 1 there is no solution for x. If y ≠ 1, we have
\[ f(x) = y \iff x \ne 2 \text{ and } x = \frac{-2y + 3}{1 - y}. \qquad (2.2) \]
But, if y ≠ 1, (−2y + 3)/(1 − y) ≠ 2. So we see that the equation

f (x) = y

has a solution x with x ≠ 2 if and only if y ≠ 1. Thus,

Range of f = {y : y ∈ R and y ≠ 1} = (−∞, 1) ∪ (1, ∞).

Also, we see from (2.2) that when y ≠ 1 the equation f (x) = y has a unique solution for x given by
\[ x = \frac{-2y + 3}{1 - y}. \]

This shows that f is a one-to-one function mapping its domain onto (−∞, 1) ∪ (1, ∞). It is
useful to sketch the graph of f to illustrate what is going on with this function.

2.4 Composition of functions


If we have functions f, g where
f : A −→ B and g : B −→ C,
the composition of g with f is the function
g ◦ f : A −→ C
given by
(g ◦ f )(x) = g(f (x)), for all x ∈ A.
Note that g ◦ f “makes sense” or is defined only when
codomain of f ⊆ domain of g.
The function g ◦ f is sometimes called the function obtained by substituting f in g. The idea
is, for example, that the function x 7−→ sin(x2 ) is the function obtained by substituting x2 in
the formula sin x.

2.5 Inverse functions


Let f : A −→ B be a one-to-one function. This means that if we consider the equation
f (x) = y, where y ∈ B is given, (2.3)
either there is no solution for x ∈ A, or there is exactly one solution for x ∈ A. The “no solution”
case occurs when y is not in the range of f ; there is a solution when y is in the range of f .
When y is in the range of f , let us denote the unique solution x of equation (2.3) by f −1 (y).
This defines a function f −1 where
f −1 : Range of f −→ A.
This function is called the inverse of f . Note that because of the definition of f −1 (y),
f (f −1 (y)) = y, for all y in the range of f.
That is,
(f ◦ f −1 )(y) = y, for all y in the range of f,
or
f ◦ f −1 = ι,
where ι denotes the identity function on the range of f .
The above discussion shows that if a function is one-to-one, then it has an inverse.
Now, consider the equation f (x) = y again, where y is in the range of f . As y is in the range
of f , y = f (v) for some v ∈ A. Thus, the solution of the equation f (x) = y for x is in this case
x = v. That is,
f −1 (f (v)) = f −1 (y) = v, for all v ∈ A.
Thus,
f −1 ◦ f = ι,
where this time ι is the identity function on A. Thus, we have seen that if f is one-to-one, then
f ◦ f −1 is the identity function on the range of f , and f −1 ◦ f is the identity function on the
domain of f .
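
To tie this to the example of Section 2.3, here is a small Python sketch (illustrative only, not part of the notes) that defines f (x) = (x − 3)/(x − 2) and the formula found there for its inverse, and checks numerically that f ∘ f⁻¹ and f⁻¹ ∘ f act as identity functions on a few sample points.

    def f(x):
        # the function of the example in Section 2.3; defined for x != 2
        return (x - 3) / (x - 2)

    def f_inv(y):
        # the inverse found by solving f(x) = y; defined for y != 1
        return (-2 * y + 3) / (1 - y)

    for x in [0.0, 1.5, 3.0, 10.0]:
        assert abs(f_inv(f(x)) - x) < 1e-9   # f_inv(f(x)) = x on the domain
    for y in [-1.0, 0.5, 2.0, 7.0]:
        assert abs(f(f_inv(y)) - y) < 1e-9   # f(f_inv(y)) = y on the range
    print("checks passed")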

2.6 The space Rn of n-dimensional vectors


The elements of Rn are called n-dimensional vectors; or simply vectors. If x is a vector, xj is
called coordinate j of the vector x. That is, if x ∈ Rn ,
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad \text{or } x = (x_1, x_2, \ldots, x_n), \]
where xj ∈ R for all 1 ≤ j ≤ n. Now it may happen that we need to consider n vectors x1 , x2 , . . . , xn ∈ Rn . In this case coordinate j of xi is (xi )j , but we usually write this as xij . Thus, if xi ∈ Rn ,
\[ x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{in} \end{pmatrix}, \quad \text{or } x_i = (x_{i1}, x_{i2}, \ldots, x_{in}). \]
It appears that xi may sometimes denote a real number, that is when xi ∈ R; but at other
times xi may denote a vector in Rn . Which case is intended becomes obvious from the context
— that is, if x ∈ Rn then xj ∈ R, but if xi ∈ Rn then xij ∈ R.
Given two vectors x, y ∈ Rn we can add them by the rule

(x + y)i = xi + yi , for 1 ≤ i ≤ n.

If in addition α ∈ R, we can multiply x by α using the rule

(αx)i = αxi , for 1 ≤ i ≤ n.

Thus,
\[ x + y = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}, \quad \text{or } x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n). \]
Also,
\[ \alpha x = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{pmatrix}, \quad \text{or } \alpha x = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n). \]
In Rn , the vectors
\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}, \]

are called the standard basis vectors for Rn . In MATH203, other “basis vectors” for Rn are dis-
cussed, but only the standard basis vectors are used in MATH201. So, the vectors e1 , e2 , . . . , en

are usually simply called the basis vectors for Rn . Note that in R3 , the vectors e1 , e2 , e3 are
sometimes denoted by i, j, k. Thus,

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).


Figure 2.1. The top picture illustrates the idea that the function f takes a point
x in the domain A to the unique point f (x) in the codomain B.
The middle picture illustrates the situation where the function f takes the distinct
points x, y ∈ A and maps them to the same point z ∈ B. Thus, in this case the
function f is not one-to-one.
In a case where f is one-to-one, the bottom picture illustrates the idea that whereas
f maps a point x into the point f (x) = y, the inverse function maps back, from
the point y ∈ B to the point x ∈ A. If f is one-to-one, the inverse function exists
and maps the range of f into A.


Figure 2.2. The figure illustrates the behaviour of the function g : R2 −→ R2 given by

g(x, y) = (x + y, x + 2y), for all (x, y) ∈ R2 .

Let (u, v) ∈ R2 and consider the equation g(x, y) = (u, v), to be solved for (x, y). We
have

g(x, y) = (u, v) ⇐⇒ (x + y, x + 2y) = (u, v)


⇐⇒ x + y = u and x + 2y = v
⇐⇒ x = 2u − v and y = −u + v
⇐⇒ (x, y) = (2u − v, −u + v).

Thus, for each (u, v) ∈ R2 , the equation g(x, y) = (u, v) has a solution for (x, y) ∈ R2 ,
and this solution is unique. Thus, g is one-to-one and onto as a function from R2 into
R2 . Also, the inverse g −1 of g maps R2 into R2 and is given by

g −1 (u, v) = (2u − v, −u + v), for all (u, v) ∈ R2 .

The set S on the left is the unit square [0, 1] × [0, 1]. As depicted by the arrow, the
function g maps S onto g(S), the parallelogram depicted in the picture on the right.
How do we know that g(S) is the parallelogram depicted? Well the function g can be
expressed by using x and y for the coordinates of points in the domain and u and v for
coordinates in the codomain. We then have u = x + y and v = x + 2y. The edge of the
square S joining (0, 0) to (1, 0) is given by {(x, 0) : 0 ≤ x ≤ 1}. But g(x, 0) = (x, x), so
g maps points on this edge of S to the line segment {(u, u) : 0 ≤ u ≤ 1} joining (0, 0) to
(1, 1), as depicted in the picture on the right. The edge of S joining (1, 0) to (1, 1) maps
to the line segment joining (1, 1) to (2, 3), the edge of S joining (1, 1) to (0, 1) maps to
the line segment joining (2, 3) to (1, 2), and the edge of S joining (0, 1) to (0, 0) maps to
the line segment joining (1, 2) to (0, 0). So, we see that all the edges of S map
to corresponding edges of the parallelogram. It can be checked that interior points of S
map to interior points of the parallelogram, and using the fact that g is one-to-one and
onto it follows that g maps S onto the parallelogram as indicated. Interested students
may care to think about the details of this argument as an exercise.

In what sense are e1 , e2 , . . . , en “basis” vectors for Rn ? Well if x ∈ Rn , we have

x = (x1 , x2 , . . . , xn )
= (x1 , 0, . . . , 0) + (0, x2 , 0, . . . , 0) + · · · + (0, . . . , 0, xn )
= x1 (1, 0, . . . , 0) + x2 (0, 1, 0, . . . , 0) + · · · + xn (0, 0, . . . , 0, 1)
= x1 e1 + x2 e2 + . . . + xn en . (2.4)

The last line of (2.4) is called a (linear) combination of e1 , e2 , . . . , en . Thus, it follows from
(2.4) that if x ∈ Rn , x is a unique combination of the basis vectors e1 , e2 , . . . , en , so that each
vector in Rn can be “built up” from the basis vectors.
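
As a quick numerical illustration of (2.4) (not part of the notes; Python and the numpy library, and the vectors below, are used here purely as an example), vectors in Rn can be represented as arrays, addition and multiplication by a scalar act coordinatewise, and any vector is the combination x1 e1 + · · · + xn en of the standard basis vectors.

    import numpy as np

    x = np.array([2.0, -1.0, 5.0])   # a vector in R^3
    y = np.array([1.0, 4.0, 0.5])

    print(x + y)     # coordinatewise addition
    print(3 * x)     # multiplication by the scalar 3

    e = np.eye(3)    # columns are the standard basis vectors e_1, e_2, e_3
    combination = x[0] * e[:, 0] + x[1] * e[:, 1] + x[2] * e[:, 2]
    print(np.allclose(combination, x))   # True: x is built up from the basis vectors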

2.7 Position vectors and direction

Mathematical concepts may often be thought of or interpreted in various ways. The most useful
way of thinking about a concept may depend upon the context in which the concept occurs.
In the case of vectors, in the two and three dimensional cases, vectors can be thought of in
geometrical terms, and this can give insight into n-dimensional vectors where it is harder to
think geometrically.
In R2 and R3 , we sometimes think of vectors as “arrows” rather than points in R2 and R3 .
This comes about by identifying the point x in R2 (or R3 ) by the “arrow” joining 0 to x, starting
at 0 and finishing at x (see Figures 2.3 and 2.4). This arrow is sometimes called the position
vector of x. It may be denoted by $\overrightarrow{0x}$.

Figure 2.3. The figure illustrates the “position vector” associated with a vector x in R2 .
The position vector of x is thought of as an arrow starting at 0 and ending at x. The
idea of a position vector is used sometimes when we want to think geometrically about
vectors.

Figure 2.4. The figure illustrates the position vector of a vector x in R3 .

Figure 2.5. Given two vectors x, y we can associate with them the arrow going
from x to y. This arrow has the same length and direction as the vector y − x.
So, if we want the direction of the line segment going from x to y, it is given by
the vector y − x.

Given two vectors x, y ∈ Rn , the direction from x to y is given by the vector y − x, or by any
positive multiple of this vector. We can think of y − x geometrically as an arrow that goes from
x to y, and this arrow is sometimes denoted by $\overrightarrow{xy}$. If the vectors are denoted by P, Q as they may be in some cases, the “vector” or “arrow” going from P to Q is denoted by $\overrightarrow{PQ}$. These ideas are illustrated in Figure 2.5.

The addition and scalar multiplication of vectors can be interpreted in terms of position
vectors – that is, in terms of arrows. For example, adding vectors in R2 or R3 can be interpreted
geometrically as addition according to the parallelogram law. This is illustrated in Figure 2.6,
and the subtraction law is similarly illustrated in Figure 2.7.

2.8 Inner products and length


If x, y ∈ Rn the inner product of x and y is ⟨x, y⟩, where
\[ \langle x, y \rangle = \sum_{j=1}^{n} x_j y_j. \]
Also, the length of x is defined to be |x|, where
\[ |x| = \sqrt{\langle x, x \rangle} = \sqrt{\sum_{j=1}^{n} x_j^2}. \]
Note that
⟨x, y⟩ ∈ R and |x| ∈ [0, ∞).
A vector x is called a unit vector if |x| = 1.

Theorem 1 (the Cauchy-Schwarz inequality). If x, y ∈ Rn ,

|⟨x, y⟩| ≤ |x| · |y|.

Also, if it is the case that

|⟨x, y⟩| = |x| · |y|,

then there is α ∈ R such that either x = αy or y = αx. That is, equality holds in the Cauchy-Schwarz inequality if and only if one vector is a multiple of the other.

The Cauchy-Schwarz inequality means that if x, y ∈ Rn with x ≠ 0 and y ≠ 0, we can define the angle between the vectors x, y as θ, where
\[ \cos\theta = \frac{\langle x, y \rangle}{|x| \cdot |y|}. \]
That is,
\[ \theta = \cos^{-1}\left( \frac{\langle x, y \rangle}{|x| \cdot |y|} \right). \]
This is possible because, by the Cauchy-Schwarz inequality,
\[ -1 \le \frac{\langle x, y \rangle}{|x| \cdot |y|} \le 1. \]
Note that θ = π/2 corresponds to the case ⟨x, y⟩ = 0, and then we say that x and y are orthogonal.
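
The following numpy sketch (illustrative only; the vectors are chosen here just as an example) computes an inner product, the two lengths and the angle θ defined above, and checks the Cauchy-Schwarz inequality for that pair of vectors.

    import numpy as np

    x = np.array([3.0, 0.0, 4.0])
    y = np.array([2.0, 1.0, -2.0])

    inner = np.dot(x, y)            # <x, y>
    len_x = np.sqrt(np.dot(x, x))   # |x| = 5
    len_y = np.sqrt(np.dot(y, y))   # |y| = 3

    print(abs(inner) <= len_x * len_y)           # True, as Cauchy-Schwarz guarantees
    theta = np.arccos(inner / (len_x * len_y))   # the angle between x and y
    print(theta)                                 # about 1.705 radians here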

Figure 2.6. The figure illustrates the “parallelogram law” for the addition of the vectors
x and y in R2 .


Figure 2.7. The figure illustrates the “subtraction law” for the subtraction of vector y
from vector x in R2 .

These definitions of the inner product and the angle between vectors fit in with first
year, for there we saw that if we had x = (x1 , x2 ), y = (y1 , y2 ) ∈ R2 then the dot product of x
and y was defined to be x · y and was given by

x · y = |x| · |y| · cos θ = x1 y1 + x2 y2 .

That is, in R2 , the dot product is the same as the inner product.
The length function for vectors has the following properties. If x, y ∈ Rn and α ∈ R:

(i) |x| = 0 ⇐⇒ x = 0,

(ii) |αx| = |α| · |x|, and

(iii) |x + y| ≤ |x| + |y|.

The value |x − y| represents the distance between x, y. This is clear in R2 , for Pythagoras’
Theorem gives the distance between x, y ∈ R2 as
\[ \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} = \sqrt{\langle x - y, x - y \rangle} = |x - y|. \]
In Rn ,
\[ |x - y| = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}. \]

2.9 Spheres and open sets


Let x ∈ Rn and r > 0. Then the (open) sphere of centre x and radius r is the subset of Rn given
by
S(x, r) = {y : y ∈ Rn and |x − y| < r}.
In R3 , S(x, r) is an “ordinary” sphere in 3-dimensional space R3 , but it does not include the
surface of the sphere. In R2 , S(x, r) is the interior of the circle of centre x and radius r. In R,
S(x, r) is the open interval (x − r, x + r).
Let Ω ⊆ Rn . Then Ω is an open set or is called open if, for any point x ∈ Ω, there is r > 0
such that
S(x, r) ⊆ Ω.
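
As a small illustration (not from the notes), deciding whether a point lies in the open sphere S(x, r) is just a distance computation, as in the following numpy sketch with points chosen here as examples.

    import numpy as np

    def in_open_sphere(y, x, r):
        # True exactly when |x - y| < r, i.e. y belongs to S(x, r)
        return float(np.linalg.norm(np.asarray(x) - np.asarray(y))) < r

    centre = (0.0, 0.0, 0.0)
    print(in_open_sphere((0.5, 0.5, 0.5), centre, 1.0))   # True: distance is about 0.87
    print(in_open_sphere((1.0, 0.0, 0.0), centre, 1.0))   # False: the surface is not included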

2.10 Matrices
A matrix is an array of real numbers. A matrix has a certain number of rows and a certain number of columns. An m × n matrix A is an array A = (aij ), 1 ≤ i ≤ m, 1 ≤ j ≤ n, of m rows and n columns, where the notation means that aij appears in row i and column j. Thus, if A = (aij ) is an m × n matrix, we write
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]
The numbers aij , 1 ≤ i ≤ m, 1 ≤ j ≤ n are called the entries in the matrix, and aij is called
the entry (i, j) in the matrix. If the number of rows and columns in the matrix A is known or
understood, we may simply write the matrix as A = (aij ). If we have a number x, then x can
be regarded as the 1 × 1 matrix (x). Thus, a number can be thought of as a 1 × 1 matrix. A
matrix is called square if it has the same number of rows and columns – that is, it is an m × m
matrix, for some m.

Let A = (aij ) and B = (bij ), 1 ≤ i ≤ m, 1 ≤ j ≤ n, be two m × n matrices. Then the matrices may be added to get an m × n matrix A + B, and A + B is the matrix obtained by adding the corresponding entries. Thus, with A and B as above, we have
\[ A + B = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{pmatrix} = (a_{ij} + b_{ij}). \]


If A = (aij ) is a given m × n matrix and a ∈ R, we may multiply A by the number a to get the m × n matrix aA given by
\[ aA = \begin{pmatrix} aa_{11} & aa_{12} & \cdots & aa_{1n} \\ aa_{21} & aa_{22} & \cdots & aa_{2n} \\ \vdots & \vdots & & \vdots \\ aa_{m1} & aa_{m2} & \cdots & aa_{mn} \end{pmatrix} = (aa_{ij}). \]


Thus, to obtain the matrix aA, multiply every entry in A by a.
Let A = (aij ) be an m × r matrix and let B = (bkℓ ) be an r × n matrix. Thus,
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mr} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ b_{r1} & b_{r2} & \cdots & b_{rn} \end{pmatrix}. \]
In this case,
the number of columns in A = the number of rows in B.
Then A and B may be multiplied to get the m × n matrix AB given by
\[ AB = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} + \cdots + a_{1r}b_{r1} & \cdots & a_{11}b_{1n} + a_{12}b_{2n} + \cdots + a_{1r}b_{rn} \\ \vdots & & \vdots \\ a_{m1}b_{11} + a_{m2}b_{21} + \cdots + a_{mr}b_{r1} & \cdots & a_{m1}b_{1n} + a_{m2}b_{2n} + \cdots + a_{mr}b_{rn} \end{pmatrix}. \]
Thus, AB is the matrix (ciℓ ), where for each 1 ≤ i ≤ m and 1 ≤ ℓ ≤ n,
\[ c_{i\ell} = \sum_{j=1}^{r} a_{ij} b_{j\ell}. \]

Note that the matrices A, B can be multiplied only when

the number of columns in A = the number of rows in B.

EXAMPLES. Let
\[ A = \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 3 & 0 \\ 7 & 1 \end{pmatrix}. \]
Then,
\[ A + B = \begin{pmatrix} 3 & 0 \\ 3 & 4 \end{pmatrix}, \quad B + C = \begin{pmatrix} 5 & -1 \\ 8 & 2 \end{pmatrix}, \quad 2A = \begin{pmatrix} 2 & 2 \\ 4 & 6 \end{pmatrix} \quad \text{and} \quad BC = \begin{pmatrix} -1 & -1 \\ 10 & 1 \end{pmatrix}. \]
Also, if
\[ X = \begin{pmatrix} 1 & 2 \\ -1 & 1 \\ 2 & -1 \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} 1 & 2 & -1 \\ -1 & 1 & 4 \end{pmatrix}, \]
then
\[ XY = \begin{pmatrix} -1 & 4 & 7 \\ -2 & -1 & 5 \\ 3 & 3 & -6 \end{pmatrix}. \]
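
These calculations are easy to verify on a computer; the following numpy sketch (illustrative only, not part of the notes) reproduces A + B, 2A, BC and XY from the examples above.

    import numpy as np

    A = np.array([[1, 1], [2, 3]])
    B = np.array([[2, -1], [1, 1]])
    C = np.array([[3, 0], [7, 1]])

    print(A + B)     # [[3 0] [3 4]]
    print(2 * A)     # [[2 2] [4 6]]
    print(B @ C)     # the matrix product BC = [[-1 -1] [10 1]]

    X = np.array([[1, 2], [-1, 1], [2, -1]])     # a 3 x 2 matrix
    Y = np.array([[1, 2, -1], [-1, 1, 4]])       # a 2 x 3 matrix
    print(X @ Y)     # the 3 x 3 product computed above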

2.11 Determinants
Every square matrix A has a determinant, which is a real number associated with A. The
determinant of A is denoted by det(A) or |A|. In these notes we are mostly restricted to
considering only determinants of 1 × 1, 2 × 2 and 3 × 3 square matrices. If A is a 1 × 1 matrix,
A = (x) say, then det(A) = x. If A is a 2 × 2 matrix with
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \]

then
det(A) = |A| = a11 a22 − a12 a21 .
If
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \]
then

det(A) = |A| = a11 (a22 a33 − a23 a32 ) − a12 (a21 a33 − a23 a31 ) + a13 (a21 a32 − a22 a31 ). (2.5)

In general, if A = (aij ) is an n × n matrix with n > 1, for each i, j ∈ {1, 2, . . . , n} let Aij denote
the (n − 1) × (n − 1) matrix obtained by removing row i and column j from A. Then, for any
i ∈ {1, 2, . . . , n},
\[ \det(A) = |A| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}). \qquad (2.6) \]

Then, if in (2.6) we take n = 3 and i = 1, we get formula (2.5) for the determinant of a 3 × 3
matrix. It is a remarkable fact that in (2.6), the summation has the same value for each choice
of i ∈ {1, 2, . . . , n}. It is also a remarkable fact that if A, B are both n × n matrices, then

det(AB) = det(A)det(B). (2.7)

EXAMPLES. Let, as above,
\[ A = \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 3 & 0 \\ 7 & 1 \end{pmatrix}. \]
Then we have, using the formula for the determinant of a 2 × 2 matrix,

det(A) = 3 − 2 = 1, det(B) = 2 − (−1) = 3 and det(C) = 3 − 0 = 3.

Note that C = AB, so that the fact that 3 = 1 × 3 confirms the identity (2.7) in this case of 2 × 2 matrices.
As another example, using (2.5) we have
\[ \det \begin{pmatrix} 1 & 2 & -1 \\ 2 & 0 & 1 \\ -1 & 1 & 1 \end{pmatrix} = 1(0 - 1) - 2(2 + 1) + (-1)(2 - 0) = -1 - 6 - 2 = -9. \]
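
A short numpy check of these values and of the identity (2.7) (illustrative only; numpy computes determinants in floating point, so the values are rounded before printing):

    import numpy as np

    A = np.array([[1.0, 1.0], [2.0, 3.0]])
    B = np.array([[2.0, -1.0], [1.0, 1.0]])

    print(round(float(np.linalg.det(A))), round(float(np.linalg.det(B))))   # 1 and 3
    print(round(float(np.linalg.det(A @ B))))            # 3 = det(A) det(B), as in (2.7)

    M = np.array([[1.0, 2.0, -1.0], [2.0, 0.0, 1.0], [-1.0, 1.0, 1.0]])
    print(round(float(np.linalg.det(M))))                # -9, agreeing with (2.5)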

2.12 The cross product of vectors in R3


Let x, y be vectors in R3 , x = (x1 , x2 , x3 ), y = (y1 , y2 , y3 ). Then the cross product of x and y is
the vector x × y given by
\[ x \times y = \begin{vmatrix} e_1 & e_2 & e_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} = \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix} e_1 - \begin{vmatrix} x_1 & x_3 \\ y_1 & y_3 \end{vmatrix} e_2 + \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} e_3 = (x_2 y_3 - x_3 y_2, \; -x_1 y_3 + x_3 y_1, \; x_1 y_2 - x_2 y_1). \qquad (2.8) \]
Now x × y is orthogonal to x and to y, for by (2.8),
\[ \langle x, x \times y \rangle = x_1 (x_2 y_3 - x_3 y_2) - x_2 (x_1 y_3 - x_3 y_1) + x_3 (x_1 y_2 - x_2 y_1) = \begin{vmatrix} x_1 & x_2 & x_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} = 0. \]
Similarly
⟨y, x × y⟩ = 0.
The direction of x × y is given by the ‘right hand rule’. Note that for all x, y ∈ R3 ,
x × y = −y × x,
and
x × x = 0.
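
A brief numpy illustration of (2.8) and of the orthogonality and antisymmetry of the cross product (the vectors below are chosen purely as an example, and are not from the notes):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])

    cross = np.cross(x, y)    # (x2 y3 - x3 y2, -x1 y3 + x3 y1, x1 y2 - x2 y1)
    print(cross)                                  # [-3.  6. -3.]
    print(np.dot(x, cross), np.dot(y, cross))     # 0.0 0.0: orthogonal to x and to y
    print(np.allclose(np.cross(y, x), -cross))    # True: y x x = -(x x y)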

2.13 Linear functions


A function f : Rn → Rm is called linear if, for all x, y ∈ Rn and α ∈ R,
f (x + y) = f (x) + f (y), and
f (αx) = αf (x).
Sometimes a linear function is called a linear transformation. Geometrically, the idea is that a
linear transformation or function changes straight lines into straight lines. We shall adopt the
convention that when elements x, y ∈ Rn are written as column vectors they may be denoted
by X, Y etc.

Theorem 2 Let f : Rn → Rm be a given function. Then the following are equivalent.


(i) f is linear.
(ii) There is an m × n matrix A such that

f (X) = AX, for all X ∈ Rn .

In the case when f is linear, the matrix A is called the matrix of the linear function f .
PROOF. Let (ii) hold. Then if X, Y ∈ Rn ,

f (X + Y ) = A(X + Y )
= AX + AY
= f (X) + f (Y ).

Also, if α ∈ R,
f (αX) = A(αX) = αAX = αf (X).
Thus, if (ii) holds, f is linear so that (i) holds also.
Conversely, let (i) hold. Recall that
\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}. \]
As f : Rn → Rm we have f (ej ) ∈ Rm for all j = 1, 2, . . . , n. Put
\[ f(e_j) = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} \in R^m, \quad \text{for all } 1 \le j \le n. \]
We obtain an m × n matrix A by putting
\[ A = \big( f(e_1) \;\; f(e_2) \;\; \ldots \;\; f(e_n) \big) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]
That is, A = (aij ), 1 ≤ i ≤ m, 1 ≤ j ≤ n.

We claim that A has the required property in (ii). For, if
\[ X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in R^n, \]

we have

\begin{align*}
f(X) &= f(x_1 e_1 + x_2 e_2 + \cdots + x_n e_n) \\
&= x_1 f(e_1) + x_2 f(e_2) + \cdots + x_n f(e_n), \text{ as } f \text{ is linear,} \\
&= x_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix} + \cdots + x_n \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix} \\
&= \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix} \\
&= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \\
&= AX.
\end{align*}

So, (i) implies (ii). □


The Theorem means that linear transformations correspond to matrices. So, if we carry
out operations on linear functions, does this correspond to carrying out familiar operations on
matrices? For example, if a linear function has an inverse, is the inverse function linear and is
the matrix of the inverse function the inverse of the matrix of the original function? The general
answer to such questions is “yes”.

Theorem 3 The following hold.


(i) Let f, g : Rn −→ Rm be two linear functions, and let their matrices respectively be A, B.
Then, f + g : Rn −→ Rm is linear and the matrix of f + g is A + B.
(ii) Let f : Rn −→ Rn be an invertible linear function whose matrix is A. Then the inverse
function f −1 : Rn −→ Rn is linear, the matrix A is invertible, and the matrix of f −1 is A−1 .
(iii) Let f : Rn −→ Rm and g : Rm −→ R` be two linear functions whose matrices are
respectively A, B. Then g ◦ f : Rn −→ R` is linear and the matrix of g ◦ f is BA.

The proof of this result is basically routine, just using the definitions.
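
The correspondence in Theorems 2 and 3 can also be seen numerically. In the sketch below (illustrative only; the functions f and g are examples chosen here, f being the function of Figure 2.2), the matrix of a linear function is assembled column by column from the images f (e1 ), f (e2 ) of the basis vectors, exactly as in the proof of Theorem 2, and the matrix of the composition g ∘ f is checked to be the product BA.

    import numpy as np

    def f(v):                      # f(x, y) = (x + y, x + 2y), the linear function of Figure 2.2
        x, y = v
        return np.array([x + y, x + 2 * y])

    def g(v):                      # g(x, y) = (2x, x - y), another linear function on R^2
        x, y = v
        return np.array([2 * x, x - y])

    e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    A = np.column_stack([f(e1), f(e2)])    # matrix of f: columns are f(e_1), f(e_2)
    B = np.column_stack([g(e1), g(e2)])    # matrix of g

    v = np.array([3.0, -2.0])
    print(np.allclose(f(v), A @ v))            # True: f(X) = AX
    print(np.allclose(g(f(v)), (B @ A) @ v))   # True: the matrix of g o f is BA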

2.14 Exercises
Note that exercises marked with an asterisk (∗) are optional and are not examinable.

1.* Let X, Y, Z be sets and let f : X → Y, g : Y → Z, h : Z → X be three functions, as indicated.


If two of the functions h ◦ g ◦ f, f ◦ h ◦ g, g ◦ f ◦ h are one-to-one and the other is onto, prove
that f, g, h are all one-to-one and onto. Also, if two of the functions h ◦ g ◦ f, f ◦ h ◦ g, g ◦ f ◦ h
are onto and the other is one-to-one, prove that f, g, h are one-to-one and onto.

2.* Let A, B be sets. Put

A∆B = (A ∩ B c ) ∪ (Ac ∩ B).


Draw a picture to illustrate the set A∆B. Now, let A, B, C be three sets. Prove that

A∆B = B∆A and that A∆B = ∅ ⇐⇒ A = B.

Prove also that


A∆(B∆C) = (A∆B)∆C.

3.* If A, B, C are sets, prove that

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
(A ∪ B)c = Ac ∩ B c , and
(A ∩ B)c = Ac ∪ B c .

4.* Let f : {x : x ∈ R and x ≠ 0} −→ R be given by f (x) = (x − 1)/x. Prove that f is


one-to-one, find the range of f , and calculate an expression for f −1 , where f −1 is the inverse of
f . Identify the domain and codomain of f −1 .

5. Let f : R2 −→ R2 be the function given by

f (x, y) = (x + y, x − y).

(i) Prove that f is linear as a function from R2 to R2 .


(ii) Calculate the matrix of f .
(iii) Prove that f is a one-to-one function whose range is R2 . Deduce that f has an inverse
function and calculate it.
(iv) If C is the square in R2 given by C = [0, 1] × [0, 1], find the set f (C), illustrating what
happens by means of a picture. [Hint: to find f (C) it may help to find what f does to the edges
of C. Then guess the answer and verify it.]

6. Let f : R2 −→ R2 be the function given by

f (x, y) = (x2 + y, x − y).

Investigate whether f is linear as a function from R2 to R2 , proving any conclusions you come
to.

7. If x = (1, −2, 1), y = (1, 1, 2) and z = (3, 1, −1) calculate (i) |x|, (ii) |y|, (iii) ⟨x, y⟩, (iv) ⟨x, y⟩/(|x| · |y|), (v) the angle between x and y, and (vi) the angle between x and z.

8. Let f : R2 −→ R2 be the function given by

f (x, y) = (x2 + y, x + y).



Investigate whether f is linear as a function from R2 to R2 , proving any conclusions you come
to.

9.* Prove the Mean Value Theorem from first year calculus. Let f be a continuous real valued
function on [a, b] that is differentiable on (a, b) and such that f (a) ≠ f (b). Let
\[ F(x) = \frac{x - a}{b - a} - \frac{f(x) - f(a)}{f(b) - f(a)}, \quad \text{for } a \le x \le b. \]
Check that F is continuous on [a, b] and differentiable on (a, b). Show that

F (a) = F (b) = 0.

Deduce that F ′(c) = 0 for some a < c < b. Then show that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
This conclusion is known as the Mean Value Theorem because it produces the point c which is between a and b. Deduce that if f ′(x) ≠ 0 for all a < x < b, then f is one-to-one on (a, b)
and so has an inverse on (a, b). Interpret this result and the ideas in terms of the graph of the
function. Also, discuss what happens if f (a) = f (b).

10. Calculate a vector in R2 that is orthogonal to (3, −4). Then, calculate all vectors in R2
that are orthogonal to (3, −4).

11. Let a, b, c, d be vectors in R2 that, in the given order, form the vertices of a parallelogram
abcd. Let x be the midpoint of the line segment ab and let y be the point of intersection of the
diagonals of the parallelogram.
(i) Prove that a + c = b + d.
(ii) Prove that x = (a + b)/2 and that y = (a + c)/2.
(iii) Write down $\overrightarrow{ac}$ in terms of a and c.
(iv) Calculate $\overrightarrow{ay}$ in terms of a and c.
(v) Calculate $\overrightarrow{bd}$ in terms of a, b, c.
(vi) Calculate $\overrightarrow{by}$ in terms of a, b, c.
(vii) Calculate $\overrightarrow{ax}$ in terms of a, b.
(viii) Calculate $\overrightarrow{dx}$ in terms of a, b, c.
(ix) Calculate $\overrightarrow{xy}$ in terms of a, b, c.

12. In R3 let x = (3, 4, 0), y = (2, 2, −1) and z = (3, 0, −4). Calculate the following.
(i) |x|, |y| and |z|.
(ii) |x − z|.
(iii) A unit vector in the direction of x.
(iv) The angle between x and z.

13. Let x, y ∈ Rn . Prove that

|x + y|2 + |x − y|2 = 2(|x|2 + |y|2 ).




Write down a geometrical interpretation of this result in R2 .



14. We saw that a result of the Cauchy-Schwarz inequality is that for vectors x, y ∈ Rn ,
|x + y| ≤ |x| + |y|. Interpret this result with a geometric picture in R2 and explain why it is
often called the triangle inequality. Prove also that

|x| − |y| ≤ |x − y|,

and interpret this result geometrically.

15. Let x, y ∈ R2 and put z = x + y. Show that 0, x, z, y form the vertices of a parallelogram.

16. This problem is related to the previous one. Let x, y ∈ R2 and let s ∈ [0, 1]. Put u =
sx + (1 − s)y. Calculate |u − x| in terms of |x − y| and calculate |u − y| in terms of |x − y|. In
the case when s = 2/3, deduce that u lies 1/3 of the way along the line segment joining x to y
and going from x to y.

17. Calculate the general form of the vectors in R3 which are orthogonal to the vector (−2, 1, −1).

18. If a = (2, 2, 0), b = (3, −1, 1) and c = (8, 0, 0), calculate


(i) (a × b) × c and a × (b × c).
(ii) ⟨a × b, c⟩ and ⟨a, b × c⟩.

19. Calculate the cross product of the vectors (3, 1, −2) and (2, 1, −1) in R3 . Then check that
it is orthogonal to the vector (1, 0, −1). Explain this result geometrically, in words.

20. Let x, y, z ∈ R2 , and let u be the midpoint of x, y, let v be the midpoint of y, z, and let w
be the midpoint of z, x (that is, u = (x + y)/2 etc). Calculate the point a that is 2/3 of the way
along the line segment joining x to v, the point b that is 2/3 of the way along the line segment
joining y to w, and the point c that is 2/3 of the way along the line segment joining z to u. (All
of these points a, b, c should be calculated in terms of x, y, z.) What do you conclude? Express
this conclusion as a geometrical statement about triangles in R2 . Does the result still hold if
x, y, z ∈ Rn instead of belonging to R2 ?

21. Prove that the line segments joining the midpoints of the consecutive sides of a quadrilateral in R2 form
a parallelogram. Is the result true in R3 ?

22. The notes define the (open) sphere S(x, r) as the set {y : y ∈ Rn and |x − y| < r}. Write down a corresponding description for the surface of the sphere S(x, r), and for the closed sphere, which is defined to consist of S(x, r) together with the surface of S(x, r).

23. Let e1 , e2 , e3 denote respectively the vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) in R3 . Calculate
the vectors
e1 × (e2 − e3 ), e2 × (e3 − e1 ) and e3 × (e1 − e2 ).

24. Identify which of the following sets are open sets, giving clear written reasons for your
answer in each case.
(i) [1, 2] as a subset of R.
(ii) (1, 2) as a subset of R.
(iii) {(x, y) : x2 + 2y 2 < 1} as a subset of R2 . Also, sketch a picture of this set.
(iv) {(x, y) : |x| ≤ 1 and |y| < 1}. Also, sketch a picture of this set.

(v) {(x, y) : |x| < 1 and |y| < 1}. Also, sketch a picture of this set.
(vi) {(x, y, z) : x + y + z ≠ 0} as a subset of R3 . Describe the set geometrically.
(vii) {(x, y, z) : 2x + 3y + 4z = 2} as a subset of R3 . Describe the set geometrically.
Chapter 3

Lines, planes and surfaces

3.1 Lines
Let u, v ∈ Rn be given. Then the vector giving the direction from u to v is v − u. So, we define
the line through u and v to be the set ℓ of points in Rn given by

ℓ = {u + t(v − u) : t ∈ R} = {(1 − t)u + tv : t ∈ R}. (3.1)

Note that if t = 0,
(1 − t)u + tv = u;
while if t = 1,
(1 − t)u + tv = v.
Thus u, v both lie on the line ℓ. Equation (3.1) gives the description, as a set, of the collection of all points on the line joining u to v. Of course, when u, v ∈ R2 , or u, v ∈ R3 , this definition of the line ℓ joining u and v has its usual geometric meaning.
Consider the case when n = 2. Then u = (u1 , u2 ), v = (v1 , v2 ) and let ℓ be the line through u and v. Then (x, y) ∈ ℓ if and only if

(x, y) = tu + (1 − t)v, for some t ∈ R
⇐⇒ (x, y) = t(u1 , u2 ) + (1 − t)(v1 , v2 ) for some t ∈ R
⇐⇒ (x, y) = (u1 , u2 ) + (1 − t)(v1 − u1 , v2 − u2 ) for some t ∈ R
⇐⇒ x = u1 + (1 − t)(v1 − u1 ) and y = u2 + (1 − t)(v2 − u2 ) for some t ∈ R.

Thus, if v1 − u1 ≠ 0 and v2 − u2 ≠ 0,
\[ \frac{x - u_1}{v_1 - u_1} = \frac{y - u_2}{v_2 - u_2} = 1 - t = s, \text{ say.} \]
Thus, if the direction of the line is (a, b), we have a = v1 − u1 and b = v2 − u2 and if a ≠ 0 and b ≠ 0, the equation of ℓ may be written alternatively as
\[ \frac{x - u_1}{a} = \frac{y - u_2}{b} = s, \quad \text{for } s \in R. \qquad (3.2) \]
In the case when n = 3, if the direction of the line joining u = (u1 , u2 , u3 ) to v = (v1 , v2 , v3 ) is (a, b, c), then a = v1 − u1 , b = v2 − u2 , c = v3 − u3 and if a ≠ 0, b ≠ 0, c ≠ 0, the equation

of ℓ may be written alternatively as
\[ \frac{x - u_1}{a} = \frac{y - u_2}{b} = \frac{z - u_3}{c} = s, \quad \text{for } s \in R. \qquad (3.3) \]

Equations (3.1), (3.2) and (3.3) are called parametric forms of the line joining the two points in
their respective spaces.
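
As a concrete illustration (the points are chosen here only as an example, and numpy is used purely for convenience), the following sketch parametrises the line through u = (1, 2, 3) and v = (4, 0, 5) in the form (3.1); t = 0 gives u, t = 1 gives v, and intermediate values of t give the points in between.

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 0.0, 5.0])

    def point_on_line(t):
        # the parametric form (3.1): u + t(v - u)
        return u + t * (v - u)

    for t in [0.0, 0.5, 1.0]:
        print(t, point_on_line(t))   # t = 0 gives u, t = 0.5 the midpoint, t = 1 gives v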

Figure 3.1. The figure illustrates the derivation of the equation of the line going
through the two points u, v in R3 . If w is on the line joining u,v, w should equal
u plus some multiple of v − u, which gives the direction of the line.

3.2 Planes
The discussion in this section is restricted to R3 .
Let n be a given non-zero vector in R3 , n = (a, b, c), say. Let r0 = (u1 , u2 , u3 ) be a given
vector in R3 . Then the plane P through r0 and orthogonal to n is the set of all vectors r in R3
whose direction from r0 is orthogonal to n. That is, the vector r − r0 must be orthogonal to n.
Thus, if we put
d = −⟨r0 , n⟩ = −au1 − bu2 − cu3 ,
and put
r = (x, y, z),
we have

P = {r : r ∈ R3 and ⟨r − r0 , n⟩ = 0}
= {r : r ∈ R3 and ⟨r, n⟩ − ⟨r0 , n⟩ = 0}
= {(x, y, z) : ax + by + cz + d = 0}. (3.4)

Motivated by (3.4), we sometimes say that the plane P has the equation

ax + by + cz + d = 0, (3.5)

and note that the vector (a, b, c) is orthogonal to the plane. The vector n is called a normal
vector to the plane. Equation (3.5) is called the Cartesian equation of the plane.
The equation of a plane is given by three numbers (the fourth number is illusory as if we
multiply the equation by a non-zero constant, the new equation still describes the same plane).
This means that, in general, three points in R3 will determine a unique plane. The derivation
of the equation of a plane in R3 is illustrated in Figure 3.2.
Consider three given vectors u, v, w ∈ R3 . Then, v − u is the direction from u to v, and w − u
is the direction from u to w. Geometrically, it appears that the plane P determined by u, v, w
is given by all vectors of the form

u + s(v − u) + t(w − u), for s, t ∈ R. (3.6)

This is called the parametric equation of the plane. Then a normal vector to the plane is n, where

n = (v − u) × (w − u),

because (v − u) × (w − u) is orthogonal to v − u and w − u, the vectors that give the directions


of the plane.
Note that the above discussion breaks down when v − u is a multiple of w − u or vice-versa,
because in this case u, v, w lie on a line and there is a family of planes, each one of which goes
through u, v and w. This can be expressed in terms of the scalar triple product of three vectors
in R3 . A specific plane is illustrated in Figure 3.3.
DEFINITION. If x, y, z ∈ R3 , the scalar triple product of x, y, z (in that order) is

hx, y × zi.

The scalar triple product is sometimes denoted by

[x, y, z].
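As an optional illustration of the discussion above, the normal n = (v − u) × (w − u), the Cartesian equation (3.5), and the scalar triple product can all be computed in a few lines of Python; the three sample points below are an arbitrary choice (they also appear in Exercise 7).

    import numpy as np

    u = np.array([1.0, 1.0, 2.0])
    v = np.array([-1.0, 1.0, 2.0])
    w = np.array([1.0, 2.0, 3.0])

    n = np.cross(v - u, w - u)       # normal vector (a, b, c) to the plane through u, v, w
    a, b, c = n
    d = -np.dot(n, u)                # d = -<r0, n>, taking r0 = u
    print("plane:", a, "x +", b, "y +", c, "z +", d, "= 0")

    # Scalar triple product [x, y, z] = <x, y x z>; it vanishes exactly when
    # the three vectors lie in a common plane through the origin.
    def triple(x, y, z):
        return np.dot(x, np.cross(y, z))

    print(triple(u, v, w))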

Figure 3.2. The figure illustrates the derivation of the equation of a plane. r0 is a point in the plane, and n is a vector normal to the plane. The vector r − r0 lies in the plane and so must be orthogonal to n. That is,
   ⟨r − r0, n⟩ = 0
gives the equation of the plane. If the normal n to the plane is n = (a, b, c) and if the point r0 in the plane is r0 = (a1, b1, c1), the equation of the plane becomes
   ax + by + cz + d = 0,
where d = −aa1 − bb1 − cc1.



Theorem 4 Let x, y, z ∈ R3 . Then the following hold.


(i) Letting x = (x1, x2, x3), y = (y1, y2, y3) and z = (z1, z2, z3), the scalar triple product equals the determinant
      | x1  x2  x3 |
      | y1  y2  y3 |
      | z1  z2  z3 | .
(ii) x, y, z lie in a common plane through the origin if and only if their scalar triple product
is 0.

Figure 3.3. The figure illustrates part of the plane surface in R³ whose equation is
   z = 3 − x − 3y/2.
The illustrated part of the surface is the one obtained by restricting the x values to be in [0, 3] and the y values to be in [0, 2].

3.3 Surfaces
The discussion here is restricted to R3 . In general terms, a surface in R3 is a set
   { (x, y, z) : (x, y, z) ∈ R³ and g(x, y, z) = 0 },
where g : R³ −→ R is a given function. The equation g(x, y, z) = 0, or one that is equivalent to it, is called the equation of the surface.

Figure 3.4. The figure illustrates the surface in R³ given by the equation
   z = exp( −(x² + y² − 1)² ).

Figure 3.5. The figure illustrates the surface in R³ given by the equation
   z = exp( −x² − y² ).
EXAMPLE. The equation
x2 + y 2 + z 2 − 1 = 0
may be written as
x2 + y 2 + z 2 = 1,
or as
|(x, y, z) − (0, 0, 0)|2 = 1.
So, we see that the surface whose equation is x2 + y 2 + z 2 − 1 = 0 is the surface of a sphere of
centre 0 and radius 1, because the surface consists of all points in R3 whose distance from the
origin is 1.
Spheres and ellipsoids. In general, if a ∈ R3 and r > 0, the surface of the sphere of centre
a and radius r is described by the equation

|w − a| = r,

as this describes all the points of R3 whose distance from a is precisely r. In coordinate form,
if a = (x0 , y0 , z0 ) ∈ R3 , this sphere has the equation

(x − x0 )2 + (y − y0 )2 + (z − z0 )2 = r2 ,

where (x, y, z) denotes the general point on the surface of the sphere.
An ellipsoid is a surface like a sphere, except that the distance from the centre of an ellipsoid
is not constant (unless it is a sphere), but varies with the direction. A general ellipsoid with
centre at (x0 , y0 , z0 ) has an equation

   (x − x0)²/a² + (y − y0)²/b² + (z − z0)²/c² = 1,
where a, b, c are constants, which we may take to be positive. When the centre is the origin the
equation becomes
   x²/a² + y²/b² + z²/c² = 1.   (3.7)
An ellipsoid is bounded – that is, it does not have points that can be arbitrarily far removed
from the origin in R3 . The intersection of the ellipsoid given by (3.7) with the xy-plane consists
of points satisfying (3.7) such that z = 0. That is, the intersection is the ellipse whose equation
is
   x²/a² + y²/b² = 1.
The intersection of the ellipsoid given by (3.7) with the plane z = h is given by

   x²/a² + y²/b² = 1 − h²/c².
Hyperboloids. The equation

   x²/a² + y²/b² − z²/c² = 1,   (3.8)

describes a hyperboloid of one sheet. Its intersection with the yz-plane is given by x = 0. That
is, the equation of the intersection is
   y²/b² − z²/c² = 1,
and it describes a hyperbola in the yz-plane. Similarly, its intersection with the xz-plane is
given by y = 0. That is, by the equation
   x²/a² − z²/c² = 1,
which is a hyperbola in the xz-plane. However, the intersection of the hyperboloid with the xy-plane is given by putting z = 0, and we get
   x²/a² + y²/b² = 1,
which is an ellipse in the xy-plane.
The intersection of the hyperboloid in (3.8) with the plane z = h is given by
   x²/a² + y²/b² = 1 + h²/c²,
which is an ellipse with semi-axes
   a√(1 + h²/c²)   and   b√(1 + h²/c²).
Note that as h increases so do the semi-axes.
The above illustrates some of the techniques for trying to see what a surface looks like from its equation – we may take the intersection of the surface with planes and, by varying h in the equation of the plane z = h, try to understand how the intersection varies as the parameter h is varied. A small computational illustration of this slicing technique is sketched below.
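A quick way to carry out the slicing is with a computer algebra system. The sketch below (optional, using the sympy library) substitutes z = h into the hyperboloid equation (3.8) and the cone equation (3.9); taking a = b = c = 1 is an assumption made only to keep the example short.

    import sympy as sp

    x, y, z, h = sp.symbols('x y z h')

    hyperboloid = x**2 + y**2 - z**2 - 1   # = 0 is equation (3.8) with a = b = c = 1
    cone        = x**2 + y**2 - z**2       # = 0 is equation (3.9) with a = b = c = 1

    # Intersection with the horizontal plane z = h: set z = h in each expression.
    print(sp.simplify(hyperboloid.subs(z, h)))  # x**2 + y**2 - h**2 - 1, so x**2 + y**2 = 1 + h**2
    print(sp.simplify(cone.subs(z, h)))         # x**2 + y**2 - h**2, so x**2 + y**2 = h**2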
Cones. A cone with vertex at the origin and whose axis is the z-axis has an equation
   x²/a² + y²/b² − z²/c² = 0.   (3.9)
The intersection with the plane z = h is given by
   x²/a² + y²/b² = h²/c²,
which gives an ellipse with semi-axes a|h|/c and b|h|/c. The simplest case is when a = b, and then this intersection is given by
   x²/a² + y²/a² = h²/c²,
which gives a circle whose centre is at 0 and whose radius is a|h|/c.
Note also that the cone given by (3.9) extends below the xy-plane; it does not lie only above the xy-plane.
Elliptic paraboloids. An elliptic paraboloid with vertex at the origin is given by an equation of the form
   z = x²/a² + y²/b²,
or
   z = −( x²/a² + y²/b² ).

Figure 3.6. The figure illustrates a hyperboloid in R³ whose equation is
   z = ±√(x² + y² − 1);
that is, the equation is
   x² + y² − z² = 1.
The paraboloid's intersection with the xz-plane is given by putting y = 0, which gives
   x² = a²z   or   x² = −a²z.
Each of these is a parabola in the xz-plane with vertex at the origin.
The intersection with the yz-plane is given by x = 0, which gives
   y² = b²z   or   y² = −b²z,

Figure 3.7. The figure illustrates the surface of a cone in R³ whose equation is
   z² = x² + y², z ≥ 0;
that is,
   z = √(x² + y²).
The intersection of the cone with a plane z = h is a circle, as indicated in the picture.

each of which is again a parabola, in the yz-plane this time, with vertex at the origin.
The intersection with the plane z = h is given by

   x²/a² + y²/b² = ±h,
which, when ±h is positive, gives an ellipse with semi-axes
   a√|h|   and   b√|h|.
The semi-axes increase with |h|.


The most common paraboloid is the circular one, which occurs when a = b, so it has an equation that is either
   z = (x² + y²)/a²,
or
   z = −(x² + y²)/a².

Cylinders. A general cylinder in the z-direction has an equation

f (x, y) = 0. (3.10)

Similarly a general cylinder in the y-direction has an equation

g(x, z) = 0,

and one in the x-direction has an equation

h(y, z) = 0.

The equation (3.10) does not involve z, so it is telling us that the surface it represents has the same appearance for each value of z – it consists of copies of the curve in the xy-plane given
by f (x, y) = 0 “stacked up” in the z-direction. Similar remarks apply to the other equations
above for cylinders. The variable that is not present in the equation tells us the direction of the
cylinder.
Generally we shall be concerned with when the cylinders are circular, elliptic, parabolic or
hyperbolic. These occur respectively when the equation f (x, y) = 0, or g(x, z) = 0 or h(y, z) = 0
above determine a circle, an ellipse, a parabola or a hyperbola.
Thus, the equation

x2 + y 2 = a2

in R3 is a circular cylinder of radius a in the z-direction. The equation

   x²/a² + y²/b² = 1,
where a, b > 0, is an elliptical cylinder whose semi-axes are a and b.


The surface given by z = y² is a parabolic cylinder in the x-direction. The intersection of the surface with each plane x = h is the parabola in that plane whose equation is
   z = y².

Figure 3.8. The figure illustrates a paraboloid in R³ whose equation is
   z = x² + y².

3.4 Exercises
Exercises marked with an asterisk (∗) are optional and are not examinable

1. Calculate the equation of the line through the vectors (−1, 1, 2) and (1, 2, 1) in R3 , in para-
metric form. Calculate the direction of this line and find a vector orthogonal to the line.

2. Calculate the equation of the line going through (1, −4, −1) in the direction of the vector
(−1, 1, 2). Then, find the equation of the line parallel to this line going through the point (3, 4, 0).

3. Let x, y ∈ Rn and put z = (x + y)/2. Prove that

|x − z| = |z − y|.

That is, z is the midpoint of the points x, y. Prove that if x 6= y, then z lies on the line going
through x and y.

4. Let x, y ∈ R3 with x 6= y, and let s, t ∈ R. If w = sx + ty, does w necessarily belong to the


line through x and y? If not, find a condition on s, t that does ensure that w lies on the line
going through x, y.

5. Find a unit vector in R3 that satisfies in turn each of the following conditions.
(i) The unit vector is parallel to the vector from x to y, where x = (4, 1, −1) and y = (3, 2, 1).
(ii) The unit vector is orthogonal (that is normal) to the plane going through the points
x, y, z in R3 , where x, y are as in (i) and z = (3, 4, 6).
(iii) The unit vector lies in the xy-plane and points in the direction of the tangent to the curve given by y = 2x − x² in that plane, at the point (2, 0) on the curve.

6. Let a = e1 + e2 + e3, b = e1, and c = αe1 + βe2 + γe3, where α, β, γ ∈ R.


(i) Write down the vectors a, b, c in coordinate form.
(ii) If α = 1 and β = 2, find γ such that a, b, c are coplanar with the origin – that is, a, b, c
lie in a common plane through the origin.
(iii) If β = −1 and γ = 1 show that there is no value of α such that a, b, c are coplanar.
Explain this conclusion geometrically.

7. Calculate the equation of the plane going through the points (1, 1, 2), (−1, 1, 2) and (1, 2, 3)
in R3 . Show that the point (3, 2, 3) lies in this plane. Calculate a vector normal to this plane,
and find the equation of the line going through (3, 2, 3) that is normal to the plane.

8. A line is given by the intersection of the two planes with equations

x + y − z = 1 and 2x − y + 3z = 4.

Calculate the equation of the line in parametric form.

9. Sketch a picture of the cone C in R3 given by


   z = √(x² + 4y²).

Also, sketch a picture of the plane P in R3 given by

x + y = 1.

Describe by means of an equation or equations the set C ∩ P – that is, the set of points that lie
on both C and P , and sketch a picture of the set C ∩ P .

10. Show that the vectors x, y, z in R3 lie in a common plane through the origin if and only if

hx, y × zi = 0.

11. Write down what type of surface in R3 each of the following equations represents.

z = x2

z = x2 + y 2

z 2 = x2 + y 2

z 2 = x2 + y 2 − 1

z 2 = 1 − x2 − y 2 .

12. Sketch the surfaces corresponding to the following equations, identifying their type where
possible.
(i) z = 3x² + y², (ii) 2x² + y² + z² = 4, (iii) x² + y² − z² = 18, (iv) x² + z² = 4, (v) z = 4 − y², (vi) z = 4 − x², (vii) z = 4 − x, (viii) x² + y² + z² = 2az, where a is some constant, (ix) x² + y² − z² = 0, (x) x² + y² − z² = 1, (xi) z = y², (xii) x = 2, (xiii) y² + z² = 9, (xiv) x + y + z = 1.
Chapter 4

Differentiation

4.1 The derivative of a function


Let Ω be an open set in Rn , let f : Ω → Rm be a given function, and let x ∈ Ω. Then f is
differentiable at x if there is a linear transformation

T : Rn → Rm

such that
   lim_{h→0} |f(x + h) − f(x) − T(h)| / |h| = 0.
If such a transformation T exists, it is unique. Then T is called the derivative of f at x, and
is denoted by (Df )(x) or Df (x). The matrix of (Df )(x) is denoted by f 0 (x). Thus, f 0 (x) is an
m × n matrix.
Note that (Df )(x) is a linear transformation from Rn into Rm . As x varies in Ω, (Df )(x)
is generally going to vary. The above definition fits in with the one-dimensional case – in the
one dimensional case (when f : R 7−→ R), f 0 (x) is a 1 × 1 matrix, that is, a number. So when
f : R → R, f 0 (x) is a number (or a 1 × 1 matrix), but in general f 0 (x) is an m × n matrix.
EXAMPLE. Let f : R2 → R be given by
 
   f(x1, x2) = x1 − 2x2 + 1.

Let T : R² → R be the linear transformation given by

   T(x1, x2) = x1 − 2x2.

The matrix of T is ( 1  −2 ). Now, putting h = (h1, h2) and x = (x1, x2),
we have
   lim_{h→0} |f(x + h) − f(x) − T(h)| / |h|
      = lim_{h→0} |f(x1 + h1, x2 + h2) − f(x1, x2) − T(h1, h2)| / |h|
      = lim_{h→0} |x1 + h1 − 2(x2 + h2) + 1 − x1 + 2x2 − 1 − h1 + 2h2| / |h|
      = lim_{h→0} |0| / |h|
      = 0.

So f is differentiable at x and (Df )(x) = T . Also

f 0 (x) = matrix of T = ( 1 −2 ) .

Note in this case that T does not depend on x, so (Df )(x) is the same for all x ∈ R2 .
If a function is differentiable at a point x, then it is continuous at that point in the following
sense: limh→0 f (x + h) = f (x).
In general, how do we calculate the matrix f 0 (x)? It turns out that f 0 (x) is expressed in
terms of what are called the “partial derivatives” associated with the function f . These “partial
derivatives” are calculated by using the differentiation techniques from ordinary calculus.
SUMMARY OF IDEAS ABOUT THE DERIVATIVE. Given a function mapping Rn into
Rm , at a point x it may have a derivative Df (x). Df (x) is a linear function, and the function
changes from point to point, in general. Df (x) is the (unique) linear function that approximates
changes in the function f at the point x (see Figure 4.1). Because Df (x) is a linear function, it
has a (standard) m × n matrix representation, f′(x). The approach is different in concept
and depends upon thinking of the derivative at x as a linear function that approximates changes
in the function itself near x – it takes the idea of linear approximation as the basic idea in
the differential calculus (that is what the tangent at a point of the graph does, in effect, in
the 1 dimensional case). That is, Df (x)(h) is a linear approximation to f (x + h) − f (x).
Consequently, you may recognize that the derivative is the differential, if you have come across
this latter concept before. We shall see that the chain rule then takes the matrix form of

(g ◦ f )0 (x) = g 0 (f (x))f 0 (x),

the same as in first year, and the matrix product makes sense. Similarly, the (largely meaning-
less) first year formula

dx 1
= ,
dy dy/dx

is saying, from this viewpoint, something like this: “the derivative of the inverse function f −1
at the point f (x) is the inverse of the derivative of f at the point x”.
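The defining limit can also be checked numerically. The sketch below (an optional illustration) takes a simple map from R² to R², a candidate derivative matrix, and verifies that |f(x + h) − f(x) − f′(x)h| / |h| shrinks as |h| does; the particular f used here is an arbitrary choice, not one from the notes.

    import numpy as np

    def f(x):
        # f(x1, x2) = (x1**2 - x2, 2*x1*x2), a map from R^2 to R^2
        return np.array([x[0]**2 - x[1], 2.0 * x[0] * x[1]])

    def fprime(x):
        # matrix of partial derivatives (the claimed derivative at x)
        return np.array([[2.0 * x[0], -1.0],
                         [2.0 * x[1], 2.0 * x[0]]])

    x = np.array([1.0, 2.0])
    h = np.array([1.0, -0.5])
    for scale in [1e-1, 1e-2, 1e-3, 1e-4]:
        hh = scale * h
        err = np.linalg.norm(f(x + hh) - f(x) - fprime(x) @ hh) / np.linalg.norm(hh)
        print(scale, err)   # the ratio tends to 0 as |h| -> 0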


Figure 4.1. Given a function f : R −→ R, and given x0 ∈ R, the derivative


f 0 (x0 ) gives the slope of the tangent t to f at x0 . The tangent is given by
t(x) = f (x0 ) + f 0 (x0 )(x − x0 ). Because of the geometric meaning of the
tangent, for a point x near x0 , t(x) can be seen to be an approximation to
f (x) – this approximation gets better and better as x gets closer to x0 . The
error in the approximation is the difference between f (x), the true value,
and t(x) the approximating value. The magnitude of this error is indicated
in the Figure. As t(x) = f (x0 ) + f 0 (x0 )(x − x0 ), we see that f (x) − f (x0 )
can be approximated by the expression

t(x) − t(x0 ) = t(x) − f (x0 ) = f 0 (x0 )(x − x0 ).

That is, near x0 , f (x) − f (x0 ) may be approximated by the values of the
linear function x − x0 7−→ f 0 (x0 )(x − x0 ) in x − x0 , and it is this function
that is the derivative of f at x0 (note that in first year this function may
have been called a “differential”, but it is the same function!). From this
viewpoint, the basic idea of elementary calculus is that the behaviour of a
function near a given point may be approximated by a linear function.

4.2 Partial derivatives


Let Ω be an open set in Rn , and let f : Ω → R be a function. Then f has a j th partial derivative
Dj f at x ∈ Ω if the limit
   lim_{h→0} [ f(x1, …, xj−1, xj + h, xj+1, …, xn) − f(x1, …, xn) ] / h
exists, and we write
   Dj f(x) = lim_{h→0} [ f(x1, …, xj−1, xj + h, xj+1, …, xn) − f(x1, …, xn) ] / h.

Note that the definition of Dj f (x) means that it is the ordinary derivative of a function of a
single variable, just as in first year. Namely, if we keep x1 , . . . , xj−1 , xj+1 , . . . xn fixed, and treat
f as though it’s a function of the variable xj only, then the derivative of this (new) function at
xj gives us the partial derivative Dj f (x). That is, Dj f (x) is the derivative of the function of a
single real variable given by

x 7−→ f (x1 , . . . , xj−1 , x, xj+1 , . . . , xn ),

evaluated at xj .
Sometimes we may write (Dj f )(x) in place of Dj f (x). Note that Dj f (x) is sometimes
written as
   ∂f/∂xj   or   (∂f/∂xj)(x1, …, xn)   or   fxj.
On this, see the discussion later.
If Dj f (x) exists for all 1 ≤ j ≤ n, we say that all partial derivatives of f exist at x ∈ Ω. If
all partial derivatives of f exist at all points in Ω, then

Dj f : Ω → R

for all j = 1, 2, . . . , n.
In the case where f : R² → R and f is given by an expression f(x, y), D1f or D1f(x, y) may be written as
   ∂f/∂x   or   (∂f/∂x)(x, y)   or   fx   or   fx(x, y).
Also, D2f or D2f(x, y) may be written as
   ∂f/∂y   or   (∂f/∂y)(x, y)   or   fy   or   fy(x, y).
In the case where g : R³ → R and g is given by an expression g(x, y, z), D1g or D1g(x, y, z) may be written as
   ∂g/∂x   or   (∂g/∂x)(x, y, z)   or   gx   or   gx(x, y, z).
Also, D2g or D2g(x, y, z) may be written as
   ∂g/∂y   or   (∂g/∂y)(x, y, z)   or   gy   or   gy(x, y, z).
Also, D3g or D3g(x, y, z) may be written as
   ∂g/∂z   or   (∂g/∂z)(x, y, z)   or   gz   or   gz(x, y, z).
EXAMPLE. Let f : R3 −→ R be given by

f (x, y, z) = x2 − 3y 3 z 2 + xyz 2 .

Then,
   ∂f/∂x = 2x + yz²,   ∂f/∂y = −9y²z² + xz²,   ∂f/∂z = −6y³z + 2xyz.
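Partial derivatives such as these are easy to check with a computer algebra system; a small optional sketch (using sympy) for the example just given:

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    f = x**2 - 3*y**3*z**2 + x*y*z**2

    print(sp.diff(f, x))   # 2*x + y*z**2
    print(sp.diff(f, y))   # -9*y**2*z**2 + x*z**2
    print(sp.diff(f, z))   # -6*y**3*z + 2*x*y*z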

4.3 The matrix representation of the derivative


Consider a function f : Ω → Rm , where Ω is an open set in Rn . Then there are functions

fj : Ω → R

for all 1 ≤ j ≤ m such that


f = (f1 , f2 , . . . , fm ).

That is,
   f(x) = (f1(x), f2(x), …, fm(x)),

for all x ∈ Ω. The functions f1 , f2 , . . . , fm are called the coordinate functions of f .


If all the partial derivatives of f1 , f2 , . . . , fm exist at a point x ∈ Rn , we may form an m × n
matrix from them — this matrix is

   [ D1f1(x)   D2f1(x)   …   Dnf1(x) ]
   [ D1f2(x)   D2f2(x)   …   Dnf2(x) ]
   [    ⋮          ⋮              ⋮   ]   =  ( Dj fi(x) ),  1 ≤ i ≤ m, 1 ≤ j ≤ n,
   [ D1fm(x)   D2fm(x)   …   Dnfm(x) ]

and it is denoted by f 0 (x).


In this case, and provided that all partial derivatives are continuous in an open sphere about
x, the function f is differentiable at x and the matrix of Df (x) is f 0 (x). Thus, under the stated
conditions,

             [ D1f1(x)   D2f1(x)   …   Dnf1(x) ]
   f′(x) =   [ D1f2(x)   D2f2(x)   …   Dnf2(x) ]   =  ( Dj fi(x) ),  1 ≤ i ≤ m, 1 ≤ j ≤ n.
             [    ⋮          ⋮              ⋮   ]
             [ D1fm(x)   D2fm(x)   …   Dnfm(x) ]

Also, if f is differentiable at x, then all the partial derivatives exist at x.


We state this formally as a Theorem.

Theorem 5 Let Ω be an open subset of Rn , let x ∈ Ω, let f : Ω −→ Rm be continuous and let


f = (f1, f2, …, fm), where each fj : Ω −→ R. Assume that there is an open sphere S(x, r) of centre x such that Dj fi(y) exists for all y ∈ S(x, r) and Dj fi is continuous on S(x, r), for all 1 ≤ i ≤ m and 1 ≤ j ≤ n. Then f is differentiable at x and
             [ D1f1(x)   D2f1(x)   …   Dnf1(x) ]
   f′(x) =   [ D1f2(x)   D2f2(x)   …   Dnf2(x) ]   =  ( Dj fi(x) ),  1 ≤ i ≤ m, 1 ≤ j ≤ n.   (4.1)
             [    ⋮          ⋮              ⋮   ]
             [ D1fm(x)   D2fm(x)   …   Dnfm(x) ]

Also, if f is differentiable at x, f is continuous at x and the partial derivatives Dj fi (x) exist for
all 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Note that if f is a constant function, Df (x) = 0 for all x. Also, if f : Ω −→ R, that is if


m = 1, then f 0 (x) is a 1 × n matrix and

   f′(x) = (D1f(x), D2f(x), …, Dnf(x)) = ( ∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn ).

Note that if f is a linear function,

Df (x) = f, Df = f, and f 0 (x) = the matrix of f.

Thus, when f is linear, f 0 (x) is constant and is equal to the matrix of f .


EXAMPLE. Consider the function f : R2 −→ R given by
 
   f(x1, x2) = x1 − 2x2 + 1,

so m = 1 and f1 = f . We have

D1 f1 (x) = D1 f (x) = 1
D2 f1 (x) = D2 f (x) = −2.

So, using (4.1),


f 0 (x) = ((D1 f )(x), (D2 f )(x)) = ( 1 −2 ) ,
as before.

EXAMPLE. Let f : R2 → R3 be given by

   f(x1, x2) = ( x1² − x2,  x1² + x2²,  2x1³ − 3x1x2² ).

We have
   f1(x1, x2) = x1² − x2,
   f2(x1, x2) = x1² + x2²,
   f3(x1, x2) = 2x1³ − 3x1x2².

All partial derivatives of f1 , f2 , f3 exist at all points of R2 . So f is differentiable at every point


x ∈ R2 and using (4.1) we have
    
   f′(x1, x2) = ( Dj fi(x1, x2) ),  1 ≤ i ≤ 3, 1 ≤ j ≤ 2
              [ 2x1             −1      ]
            = [ 2x1             2x2     ] .
              [ 6x1² − 3x2²   −6x1x2    ]

In particular,
               [  2    −1  ]
   f′(1, 2) =  [  2     4  ] .
               [ −6   −12  ]
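The matrix of partial derivatives can be produced mechanically. A short optional check of the example above, a sketch using sympy's Matrix.jacobian:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    F = sp.Matrix([x1**2 - x2, x1**2 + x2**2, 2*x1**3 - 3*x1*x2**2])

    J = F.jacobian([x1, x2])        # the 3 x 2 matrix f'(x1, x2)
    print(J)
    print(J.subs({x1: 1, x2: 2}))   # Matrix([[2, -1], [2, 4], [-6, -12]])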

EXAMPLE. Let f : Rn → Rn be given by

   f(x1, x2, …, xn) = (x1, x2, …, xn),   for all (x1, x2, …, xn) ∈ Rⁿ.

Then f is the identity function on Rn and fi (x) = xi , for all x ∈ Rn .


So, using (4.1),

                             [ 1  0  …  0 ]
   f′(x) = ( Dj fi(x) )  =   [ 0  1  …  0 ] ,   1 ≤ i ≤ n, 1 ≤ j ≤ n.
                             [ ⋮  ⋮  ⋱  ⋮ ]
                             [ 0  0  …  1 ]

So, if f is the identity function on Rn , f 0 (x) is the identity n × n matrix.

4.4 The Chain Rule


Now let
f : Rn → Rm and g : Rm → R` .
We can form the function
g ◦ f : Rn → R`
given by
(g ◦ f )(x) = g(f (x)), for all x ∈ Rn .
The Chain Rule tells us how to calculate (g◦f )0 (x) in terms of f 0 and g 0 . For this reason, it should
probably be called the composition rule. The Chain Rule is what I would call “heuristically
obvious”, but it requires some serious reflection first, to understand why. In elementary calculus,
the Chain Rule is sometimes called the “function of a function” rule and is sometimes written,
more or less unintelligibly, as
   dy/dx = (dy/du)(du/dx).
However, it is also written as
(g ◦ f )0 (x) = g 0 (f (x))f 0 (x).
One of the advantages of our approach to multivariable calculus is that, because of the systematic
notations that we use, the new results take the same form as those of elementary calculus, when
the latter are properly stated. The Chain Rule is just one example of this.

Theorem 6 (The Chain Rule). If f is differentiable at x and g is differentiable at f (x), then


g ◦ f is differentiable at x and

(g ◦ f )0 (x) = g 0 (f (x))f 0 (x). (4.2)



Note that in (4.2), f 0 (x) is an m × n matrix, g 0 (f (x)) is an ` × m matrix, so the matrix


multiplication g 0 (f (x))f 0 (x) gives an ` × n matrix, which is consistent because (g ◦ f )0 (x) is an
` × n matrix.
Note that the Chain Rule in (4.2) looks like the Chain Rule (also known as the “function
of a function” rule) from single variable calculus. Now, recall that a matrix has an associated
linear transformation, and that matrix multiplication corresponds to the composition of linear
transformations. Also, recall that the derivative was defined as a linear transformation. So, the
Chain Rule as stated above may be interpreted conceptually as a statement about derivatives
and the composition of derivatives, as follows: “the derivative at x of the composition of g
with f is the composition of the derivative of g at f (x) with the derivative of f at x”. Or,
more informally, it says: “the derivative of a composition of functions is the composition of the
derivatives”.
Now in (4.2), the matrices f 0 (x) and g 0 (f (x)) can be expressed in terms of the partial
derivatives of the respective coordinate functions. Specifically,
             [ D1f1(x)    …   Dnf1(x) ]
   f′(x) =   [ D1f2(x)    …   Dnf2(x) ] ,
             [    ⋮               ⋮    ]
             [ D1fm(x)    …   Dnfm(x) ]

                 [ D1g1(f(x))    …   Dmg1(f(x)) ]
   g′(f(x)) =    [ D1g2(f(x))    …   Dmg2(f(x)) ] .
                 [      ⋮                  ⋮     ]
                 [ D1gℓ(f(x))    …   Dmgℓ(f(x)) ]

So
   (g ∘ f)′(x) = g′(f(x)) f′(x)
               = ( Dj gi(f(x)) ) ( Dk fj(x) ),   1 ≤ i ≤ ℓ, 1 ≤ j ≤ m, 1 ≤ k ≤ n.   (4.3)

The most common case is when f : Rn → Rm and g : Rm → R. In this case, we have


` = 1, g1 = g, g ◦ f : Rn −→ R and
 
   (g ∘ f)′(x) = ( Σ_{j=1}^{m} Dj g(f(x)) (Dk fj)(x) ),   1 ≤ k ≤ n.

Also, as g ◦ f : Rn → R, (g ◦ f )0 (x) is a 1 × n matrix and


(g ◦ f )0 (x) = ( D1 (g ◦ f )(x), ... , Dn (g ◦ f )(x) ) .
Using (4.3) we have
   Dk(g ∘ f)(x) = Σ_{j=1}^{m} (Dj g)(f(x)) (Dk fj)(x),   (4.4)

for all x ∈ Rn and k = 1, 2, . . . , n. That is,


m
X
Dk (g ◦ f ) = ((Dj g) ◦ f )Dk fj , (4.5)
j=1

for k = 1, 2, . . . , n.
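Formula (4.2) can be checked symbolically on a small example. The optional sketch below compares the Jacobian matrix of g ∘ f with the matrix product g′(f(x)) f′(x), using the same f and g as in the worked example of Section 4.5 below:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    u, v = sp.symbols('u v')

    F = sp.Matrix([x1**2 - x2**2, 2*x1*x2])   # f : R^2 -> R^2
    G = sp.Matrix([u**2 - v**2])              # g : R^2 -> R

    Fp = F.jacobian([x1, x2])                 # f'(x), a 2 x 2 matrix
    Gp = G.jacobian([u, v])                   # g'(u, v), a 1 x 2 matrix
    GpF = Gp.subs({u: F[0], v: F[1]})         # g'(f(x))

    composed = G.subs({u: F[0], v: F[1]})     # (g o f)(x)
    lhs = composed.jacobian([x1, x2])         # (g o f)'(x)
    rhs = GpF * Fp                            # g'(f(x)) f'(x)
    print(sp.simplify(lhs - rhs))             # the zero matrix, confirming (4.2)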

4.5 The Chain Rule in classical notation


The formulas (4.4) and (4.5) above distinguish between the functions g ◦ f and g, and make
clear also where each function should be evaluated. In the older ‘classical’ approach, these
distinctions are obscured. The formulas above do require more thought and care, for they give
greater precision and clarity. However, if one is only interested in calculation, the classical
notations may be convenient — but care may be needed owing to the confusion that may result
from using the same symbol for different functions, a point of special relevance in calculating
higher derivatives repeatedly using the Chain Rule.
We illustrate these points by interpreting the formulas (4.4) and (4.5) in classical notation.
Consider when f : R2 → R2 and g : R2 → R. Then g ◦ f : R2 → R. Put h = g ◦ f . Let f be
given by
   f(x, y) = ( u(x, y), v(x, y) ),  where u, v : R² → R.
Thus,
   h(x, y) = g( f(x, y) ) = g( u(x, y), v(x, y) ).
Now, by putting k = 1, 2 in (4.5), and by regarding h = g ◦ f and g as the same function even
though they are not the same function, we get
   ∂g/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂v)(∂v/∂x),  and
   ∂g/∂y = (∂g/∂u)(∂u/∂y) + (∂g/∂v)(∂v/∂y).   (4.6)

This is a common way the Chain rule is stated. Note that in (4.6), g can be replaced by h since
we are treating g and h as though they are the same.

EXAMPLE. Let f : R2 −→ R2 and g : R2 −→ R be given by


   f(x, y) = ( x² − y²,  2xy ),    g(u, v) = u² − v².

We have
   f(x, y) = ( u(x, y), v(x, y) ),  where  u(x, y) = x² − y²  and  v(x, y) = 2xy.
Then by (4.6) we have

   ∂g/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂v)(∂v/∂x)
= 2u · 2x + (−2v) · 2y
= 4x(x2 − y 2 ) + (−4y)(2xy)
= 4x3 − 4xy 2 − 8xy 2
= 4x3 − 12xy 2 .

Similarly,
   ∂g/∂y = (∂g/∂u)(∂u/∂y) + (∂g/∂v)(∂v/∂y)

   = 2u(−2y) − 2v(2x)
   = −4y(x² − y²) − 4x(2xy)
   = −4x²y + 4y³ − 8x²y
   = 4y³ − 12x²y.

We can check this by observing


 
   (g ∘ f)(x, y) = (x² − y²)² − 4x²y²
                 = x⁴ − 2x²y² + y⁴ − 4x²y²
                 = x⁴ − 6x²y² + y⁴.

So,

   ∂g/∂x = 4x³ − 12xy²   and
   ∂g/∂y = −12x²y + 4y³,

giving the same answers.

4.6 Higher derivatives


Let Ω be an open subset of Rn and let f : Ω 7−→ R be a given function, assumed to be suitably
differentiable. Then the partial derivatives D1 f, D2 f, . . . , Dn f exist and Dj f : Ω −→ R for all
j = 1, 2, . . . , n. Thus, we may take the partial derivatives

D1 (D1 f ), D2 (D1 f ), . . . Dn (D1 f )

of D1 f . More generally, we may take the j th partial derivative Dj (Di f ) of Di f .


DEFINITIONS. Dj (Di f ) is denoted by Dij f . More generally, if j1 , j2 , . . . , jk ∈ N, then
Djk (· · · (Dj2 (Dj1 f )) · · ·) is denoted by
Dj1 j2 ...jk f,
and this is called a higher (partial) derivative of f of order k. If jr 6= js for some r, s, this higher
derivative may be called a mixed partial derivative. In classical notation
 
   (∂/∂xj1)(∂f/∂xj2)
is written as
   ∂²f / ∂xj1 ∂xj2 .
Also,
   (∂/∂xj1)(∂/∂xj2)( ⋯ (∂f/∂xjk) ⋯ )
is written as
   ∂^k f / ∂xj1 ∂xj2 ⋯ ∂xjk .

So, in the case of a function of two real variables denoted by x, y, we have that
   (∂/∂x)(∂f/∂y)
is written as
   ∂²f / ∂x∂y .
In the subscript notation for partial derivatives, (fx)y is written as fxy, ((fx)y)z is written as fxyz, ((fz)z)x is written as fzzx, and (⋯((fxj1)xj2)⋯)xjk is written as fxj1xj2⋯xjk, and so on.
If all the above seems a bit confusing the next Theorem comes to the rescue! For, the
notations above indicate precisely the order in which the partial differentiations are carried out
on the function. The Theorem below tells us that when the function is suitably differentiable, and this nearly always includes the cases in which we are interested, it is only which partial differentiations are carried out that matters, and not the order in which they are done. That is, if the same operations of partial differentiation are carried out, but possibly in a different order, then the result is not affected.

Theorem 7 Let Ω be an open set in Rn and let f : Ω −→ R be a function whose partial


derivatives all exist up to order k and are continuous on Ω. Then, if i1 , i2 , . . . , ir ∈ {1, 2, . . . , n}
are given and j1 , j2 , . . . , jr consist of i1 , i2 , . . . , ir but possibly written in a different order, and if
r ≤ k, then
Di1 i2 ···ir f (x) = Dj1 j2 ···jr f (x), for all x ∈ Ω.

Thus, for a twice continuously differentiable function f mapping R2 −→ R, say, we have

   ∂²f/∂x∂y = ∂²f/∂y∂x,

on R2 .
EXAMPLE. Let z(x, y) = 3x2 − xy + y 2 . Then,

zx = 6x − y and zy = −x + 2y.

Thus,
zxy = −1 and zyx = −1.
So, zxy = zyx , a result we could also express by writing

   ∂²z/∂y∂x = ∂²z/∂x∂y   or   D12 z = D21 z.
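An optional sympy sketch confirming the equality of the mixed partial derivatives for this z:

    import sympy as sp

    x, y = sp.symbols('x y')
    z = 3*x**2 - x*y + y**2

    zxy = sp.diff(z, x, y)   # differentiate with respect to x, then y
    zyx = sp.diff(z, y, x)   # differentiate with respect to y, then x
    print(zxy, zyx)          # both are -1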

The following example uses the Chain Rule repeatedly, and the equality of mixed partial
derivatives, to calculate higher partial derivatives.
EXAMPLE. Let
z = z(x, y), x = u2 − v 2 , y = u2 + v 2 .
Then,

zu = zx xu + zy yu
= 2uzx + 2uzy .

Hence,
   zuu = (zu)u
       = 2zx + 2u(zx)u + 2zy + 2u(zy)u
       = 2zx + 2u[ (zx)x xu + (zx)y yu ] + 2zy + 2u[ (zy)x xu + (zy)y yu ]
       = 2zx + 2zy + 2u[ 2u zxx + 2u zxy ] + 2u[ 2u zyx + 2u zyy ]
       = 2zx + 2zy + 4u²[ zxx + zyy ] + 8u² zxy
       = 2zx + 2zy + 2(x + y)[ zxx + zyy ] + 4(x + y) zxy,
as zxy = zyx.
Similarly,
   zv = zx xv + zy yv = −2v zx + 2v zy.
Hence,
   zvv = (zv)v
       = −2zx − 2v(zx)v + 2zy + 2v(zy)v
       = −2zx − 2v[ (zx)x xv + (zx)y yv ] + 2zy + 2v[ (zy)x xv + (zy)y yv ]
       = −2zx − 2v[ −2v zxx + 2v zxy ] + 2zy + 2v[ −2v zyx + 2v zyy ]
       = −2zx + 2zy + 4v²[ zxx + zyy ] − 8v² zxy
       = −2zx + 2zy + 2(−x + y)[ zxx + zyy ] − 4(−x + y) zxy,
as zxy = zyx.

4.7 Polar coordinates


An important case where the Chain Rule arises is in polar coordinates. Here, we consider the
transformation mapping from R2 into R2 given by
(r, θ) 7−→ (r cos θ, r sin θ).
If a > 0, this transformation maps the rectangle
[0, a] × [0, 2π)

onto the set


{(x, y) : x2 + y 2 ≤ a2 },
in a one-to-one fashion. The case a = 1 is illustrated in Figure 4.2.
Let
x = r cos θ and y = r sin θ.
Then,
xr = cos θ, xθ = −r sin θ, yr = sin θ and yθ = r cos θ.
Also, r = √(x² + y²) and θ = tan⁻¹(y/x), and
   rx = x/r,   ry = y/r,   θx = −y/r²,   θy = x/r².
EXAMPLE: Laplace’s equation and its polar form
The function z : R2 −→ R satisfies Laplace’s equation when
zxx + zyy = 0.
We will calculate the form of this equation in polar coordinates—that is, where we use r and θ in place of x, y. By the Chain Rule we have
   zx = zr rx + zθ θx = (x/r) zr − (y/r²) zθ.

Now,
   zxx = (zx)x
       = ( (x/r) zr )x − ( (y/r²) zθ )x
       = (y²/r³) zr + (x/r)(zr)x − (y/r²)(zθ)x + (2xy/r⁴) zθ
       = (y²/r³) zr + (x/r)( zrr rx + zrθ θx ) − (y/r²)( zθθ θx + zθr rx ) + (2xy/r⁴) zθ
       = (y²/r³) zr + (x/r)[ (x/r) zrr − (y/r²) zrθ ] − (y/r²)[ −(y/r²) zθθ + (x/r) zθr ] + (2xy/r⁴) zθ
       = (y²/r³) zr + (x²/r²) zrr − (xy/r³) zrθ + (y²/r⁴) zθθ − (xy/r³) zθr + (2xy/r⁴) zθ.   (4.7)
r r r r r r
Similarly,
   zy = zr ry + zθ θy = (y/r) zr + (x/r²) zθ.

Hence,

   zyy = (zy)y
       = ( (y/r) zr )y + ( (x/r²) zθ )y
       = (x²/r³) zr + (y/r)(zr)y + (x/r²)(zθ)y − (2xy/r⁴) zθ
       = (x²/r³) zr + (y/r)( zrr ry + zrθ θy ) + (x/r²)( zθr ry + zθθ θy ) − (2xy/r⁴) zθ
       = (x²/r³) zr + (y/r)[ (y/r) zrr + (x/r²) zrθ ] + (x/r²)[ (y/r) zθr + (x/r²) zθθ ] − (2xy/r⁴) zθ
       = (x²/r³) zr + (y²/r²) zrr + (xy/r³) zrθ + (xy/r³) zθr + (x²/r⁴) zθθ − (2xy/r⁴) zθ.   (4.8)
Adding (4.7) and (4.8) gives
   zxx + zyy = ( (x² + y²)/r³ ) zr + ( (x² + y²)/r² ) zrr + ( (x² + y²)/r⁴ ) zθθ
             = zrr + (1/r) zr + (1/r²) zθθ.

Thus, in polar coordinate form, Laplace's equation is
   zrr + (1/r) zr + (1/r²) zθθ = 0.
If z is such a function, and is a function of r only (that is, it is rotationally invariant), the equation becomes
   zrr + (1/r) zr = 0.
But the differential equation y′ + (1/x)y = 0 has solutions of the form y(x) = C/x, where C is some constant, so we see that
   zr = C/r,
and hence
   z(r) = C ln(r) + D,
where C, D are constants. So a solution of Laplace's equation which is rotationally invariant must have this form.
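As an optional check (a sketch only), sympy confirms that z = C ln r + D satisfies the rotationally invariant equation, and that the same function written in Cartesian coordinates satisfies the original Laplace equation:

    import sympy as sp

    r, C, D = sp.symbols('r C D', positive=True)

    z = C * sp.log(r) + D
    print(sp.simplify(sp.diff(z, r, 2) + sp.diff(z, r) / r))   # 0, so z_rr + z_r/r = 0

    # The same function in Cartesian coordinates, via r = sqrt(x**2 + y**2):
    x, y = sp.symbols('x y')
    zc = C * sp.log(sp.sqrt(x**2 + y**2)) + D
    print(sp.simplify(sp.diff(zc, x, 2) + sp.diff(zc, y, 2)))  # 0, so z_xx + z_yy = 0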

4.8 The Jacobian and the Inverse Function Theorem


Let Ω be an open set in Rⁿ and let f : Ω → Rⁿ be a differentiable function. Note that as f : Ω → Rⁿ, f′(x) is an n × n matrix, and so it has a determinant.
DEFINITION. The Jacobian of f is the function J(f) : Ω −→ R given by

   J(f)(x) = det(f′(x)) = |f′(x)|,

for x ∈ Ω.

Figure 4.2 The transformation arising from polar coordinates is the func-
tion g : R2 −→ R2 given by

g(r, θ) = (r cos θ, r sin θ).

This function is not one-to-one on R2 although its range is R2 . However,


if g is restricted to the rectangle S = [0, 1] × [0, 2π), then the restricted
function becomes one-to-one and its range is the circle C given by
   C = g(S) = { (x, y) : x² + y² ≤ 1 }.
Each vertical line segment in S, given by a fixed value of r, as indicated by the vertical dotted line in S, maps under g to the circumference of a circle of radius r and centre at 0, as indicated by the dotted circle in C = g(S). In general, g maps rectangles like S onto discs. Put u = r cos θ and v = r sin θ, so that g(r, θ) = (u(r, θ), v(r, θ)). Then we also have
   g′(r, θ) = [ ur   uθ ]  =  [ cos θ   −r sin θ ]
              [ vr   vθ ]     [ sin θ    r cos θ ]

Hence the determinant of g 0 (r, θ) is

   det( g′(r, θ) ) = r cos²θ + r sin²θ = r,

and this is called the Jacobian of g. It is important later in calculating


integrals using polar coordinates.

EXAMPLE. Let f : R2 −→ R2 be given by f (x, y) = (x + y, x2 − y 2 ). Then, writing


u(x, y) = x + y and v(x, y) = x2 − y 2 we have

   J(f)(x, y) = | ∂u/∂x   ∂u/∂y |  =  | 1      1  |  =  −2(x + y).
                | ∂v/∂x   ∂v/∂y |     | 2x   −2y  |
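A short optional sympy sketch reproducing this Jacobian:

    import sympy as sp

    x, y = sp.symbols('x y')
    F = sp.Matrix([x + y, x**2 - y**2])   # the map f(x, y) = (x + y, x**2 - y**2)

    J = F.jacobian([x, y]).det()          # the Jacobian determinant J(f)(x, y)
    print(sp.factor(J))                   # -2*(x + y), i.e. -2x - 2y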

The behaviour of the Jacobian is related to the existence of inverse functions. The Jacobian
also “measures” how much the transformation expands or contracts areas or volumes in the
space from point to point. The following result shows that at a point where the Jacobian is
non-zero, the function has an inverse near the point. Also, the derivative of the inverse function
is the inverse of the derivative of the function (when evaluated at appropriate points), a fact
expressed in terms of elementary calculus by the formula

   “ dx/dy = 1/(dy/dx) ”.

Theorem 8 (The Inverse Function Theorem). Let x ∈ Rn and let f : Rn −→ Rn be a


function that is continuously differentiable in some open set containing x, and assume that f′(x) is an invertible n × n matrix; that is, we assume that det(f′(x)) ≠ 0. Then there is an open set V containing x and an open set W containing f(x) such that f maps V onto W in a one-to-one manner and the inverse f⁻¹ of f is continuously differentiable as a map from W to V. In this situation we have
   (f⁻¹)′(w) = ( f′(f⁻¹(w)) )⁻¹,   for all w ∈ W.
Alternatively, this can be written as
   (f⁻¹)′(f(v)) = ( f′(v) )⁻¹,   for all v ∈ V.   (4.9)

PROOF. This is not presented. See Michael Spivak’s book Calculus on Manifolds, but the
proof is not examinable. 
Thus, the inverse function theorem says: if f is differentiable at x and det(f′(x)) ≠ 0, that is
if the Jacobian J(f ) of f at x is not 0, then f has an inverse f −1 near f (x), and (f −1 )0 (f (x)) is
the inverse of f 0 (x). That is, the derivative of the inverse function is the inverse of the derivative,
when evaluated at the appropriate points.
When n = 1, formula (4.9) was written in elementary calculus as
   (f⁻¹)′(x) = 1 / f′(f⁻¹(x)),
or as
   dx/dy = 1 / (dy/dx).
Theorem 9 Let Ω, Ω′ be open sets and let f : Ω −→ Ω′ be a continuously differentiable function with a (continuously differentiable) inverse f⁻¹ : Ω′ −→ Ω. Then,
   J(f⁻¹)(f(x)) = 1 / J(f)(x),
for all x ∈ Ω. That is,
   J(f⁻¹) ∘ f = 1 / J(f).

Figure 4.3. In the case of a function f : R −→ R, the figure illustrates why there may not be an inverse function of f near a point x0 where f′(x0) = 0. The graph is of a (presumably) quadratic function that has a maximum at x0; we have f(x0) = b and f′(x0) = 0. J is an open interval about x0, and the range of f on J is the interval (a, b]. The function f restricted to J does not have an inverse, for if y ∈ (a, b), the equation f(x) = y has two distinct solutions x1, x2, as indicated in the figure.

Figure 4.4. This figure depicts the same function as in Figure 4.3, but this time the point x0 is not where the maximum occurs, and f′(x0) ≠ 0. On the open interval J, the range of the function is (a, c), as shown. Now if y is any point in (a, c), the equation f(x) = y has exactly one solution for x ∈ J – the figure illustrates this for the case where a < y < b. Thus f : J −→ (a, c), f is one-to-one and onto, and f has an inverse f⁻¹ : (a, c) −→ J. All this is possible because f′(x0) ≠ 0.

PROOF. We have (f⁻¹ ∘ f)(x) = x, for all x ∈ Ω. So, by the Chain Rule, and denoting the n × n identity matrix by In, we have
   (f⁻¹)′(f(x)) f′(x) = In,
so that
   det( (f⁻¹)′(f(x)) ) det( f′(x) ) = det(In) = 1.
That is,
   J(f⁻¹)(f(x)) J(f)(x) = 1.

In classical notation in R², the transformation f : Ω → R² may be given by f(x, y) = (u, v), where u = u(x, y) and v = v(x, y). In this notation we have
   J(f) = det(f′) = det [ ux   uy ]  =  ux vy − uy vx.
                        [ vx   vy ]
Also,
   J(f⁻¹) = det [ xu   xv ]  =  xu yv − xv yu.
                [ yu   yv ]
So, in this notation in R²,
   ux vy − uy vx = 1 / ( xu yv − xv yu ).
Note that in this situation the Jacobian ux vy − uy vx may be denoted by
   ∂(u, v)/∂(x, y),
and the Jacobian J(f⁻¹) = xu yv − xv yu may be denoted by
   ∂(x, y)/∂(u, v).
Thus, we have seen that
   ∂(u, v)/∂(x, y) = 1 / ( ∂(x, y)/∂(u, v) ).
EXAMPLE (polar coordinates). Changing to polar coordinates is associated with the function g : (r, θ) ↦ (x, y), where x = r cos θ and y = r sin θ. So, the Jacobian in changing from r, θ to x, y is
   J = ∂(x, y)/∂(r, θ) = | xr   xθ |  =  | cos θ   −r sin θ |  =  r.
                         | yr   yθ |     | sin θ    r cos θ |
So, the Jacobian for changing from x, y to r, θ is the inverse of the one above. Checking directly, we obtain that the Jacobian for the inverse transformation is
   | rx   ry |  =  | x/r      y/r  |  =  x²/r³ + y²/r³ = r²/r³ = 1/r.
   | θx   θy |     | −y/r²   x/r²  |

4.9 Implicit Functions


Suppose that f : Rn × Rm −→ Rm is a given function. Then we can consider the equation
f (x1 , . . . , xn , xn+1 , . . . , xm+n ) = 0, (4.10)
and ask: given (x1 , x2 , . . . , xn ) ∈ Rn , is there a unique (xn+1 , xn+2 , . . . , xm+n ) ∈ Rm such that
(4.10) holds? Putting this differently, we could ask: given x ∈ Rⁿ, is there a unique g(x) ∈ Rᵐ such that f(x, g(x)) = 0? If so, these values g(x) define a function g, given by x ↦ g(x). The
function g is then said to be defined implicitly by the equation (4.10).
More generally, we might have open sets Ω ⊆ Rn and Ω0 ⊆ Rm and a function f : Ω × Ω0 −→
Rm . Then if, for each x ∈ Ω there is a unique g(x) ∈ Ω0 such that f (x, g(x)) = 0, we say the
function g : Ω −→ Ω0 given by x 7−→ g(x) is defined implicitly on Ω by the equation
f (x, y) = 0, x ∈ Ω, y ∈ Ω0 .
EXAMPLE. Let f : R2 −→ R be given by
f (x, y) = x2 + y 2 − 1.
Consider the equation
f (x, y) = 0. (4.11)
If x is given, can we solve (4.11) to obtain a unique y? Well,
   f(x, y) = 0 ⇐⇒ x² + y² − 1 = 0 ⇐⇒ y = ±√(1 − x²).
Thus, for (4.11) to have a solution, we must have x ∈ [−1, 1] (for we cannot take the square root of
a negative number); but even then, there will be two solutions for y except for x = 1 or x = −1.
So, in this case, the function g described above has domain {−1, 1} and g : {−1, 1} −→ R is
given by g(w) = 0. This is not a very interesting case! In fact, when we are solving an equation
such as (4.10) or (4.11) we restrict the acceptable solutions, so as to obtain a unique solution.
In this way we have a function x ↦ g(x) such that f(x, g(x)) = 0 but g has a larger domain. In the case of this example, we might choose to take the positive square root √(1 − x²), and reject the negative one −√(1 − x²). This means that we are interested in the existence of an implicit function whose graph is near a given point on the graph of the equation. See Figure 4.5, and the short numerical illustration below, for more details on this example.
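A small numerical illustration of this choice (a sketch only; g here is the positive square root branch described above):

    import numpy as np

    def f(x, y):
        return x**2 + y**2 - 1

    def g(x):
        # the implicit function near a point with y0 > 0: the positive square root
        return np.sqrt(1.0 - x**2)

    for x in [-0.9, 0.0, 0.5, 0.9]:
        print(x, g(x), f(x, g(x)))   # f(x, g(x)) = 0 (up to rounding) for x in (-1, 1)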

Theorem 10 (The Implicit Function Theorem) Let a ∈ Rn and b ∈ Rm . Let


f : Rn × Rm −→ Rm be a function that is continuously differentiable in some open set containing
(a, b) and such that f(a, b) = 0. Let A denote the m × m matrix
   A = ( Dn+j fi(a) ),  for 1 ≤ i, j ≤ m.
That is,
        [ Dn+1 f1(a)   Dn+2 f1(a)   ⋯   Dn+m f1(a) ]
   A =  [ Dn+1 f2(a)   Dn+2 f2(a)   ⋯   Dn+m f2(a) ] .
        [      ⋮             ⋮                ⋮     ]
        [ Dn+1 fm(a)   Dn+2 fm(a)   ⋯   Dn+m fm(a) ]
Then, if |A| ≠ 0, there is an open set V ⊆ Rⁿ with a ∈ V, and an open set W ⊆ Rᵐ with
b ∈ W such that the following holds: for each x ∈ V , there is a unique g(x) ∈ W such that
f (x, g(x)) = 0. This defines a function g : V −→ W and g is differentiable.

PROOF. This is not given, but see Michael Spivak’s Calculus on Manifolds, for example. 

Figure 4.5. The figure depicts a situation for the Implicit Function Theorem where f : R × R −→ R is given by f(x, y) = x² + y² − 1, and we consider the equation f(x, y) = 0 (the circumference of the circle x² + y² = 1). The matrix A at (x0, y0) is in this case the 1 × 1 matrix A = D2 f(x0, y0) = 2y0. Thus, |A| ≠ 0 when y0 ≠ 0. In the picture, y0 ≠ 0 and the graph of the implicit function g over J is indicated by the shaded arc of the circle. In this case, there is an explicit form for g, namely g(x) = √(1 − x²) for x ∈ J.

4.10 Exercises
Exercises marked with an asterisk (∗) are optional and are not examinable
Note that in some problems the domain of the function is referred to explicitly, but
in others it is only implicit. Even where the domain is not referred to explicitly, the
various equations or properties depend upon the domains of the function(s) and any
associated partial derivatives, as required by the circumstances of each problem.

1. Let f : Rn −→ Rm be linear and let A be the matrix of f . Prove that f 0 (x) = A, for all
x ∈ Rn . Thus, when f is linear Df and f 0 are constant.

∂z ∂z
2. Find and given
∂x ∂y
(i) z = x2 − 2xy 2 ;
(ii) z = xy + ln xy;
2 2
(iii) z = e−(x +y ) .

∂f ∂f
3. If f (x, y) = 2x2 − xy + y 2 , find and .
∂x ∂y
2
4. If f : R2 −→ R is given by f (x1 , x2 ) = x31 x2 + ex1 x2 , find D1 f (x1 , x2 ) and D2 f (x1 , x2 ) for all
(x1 , x2 ) ∈ R2 .

5. If f : R2 −→ R is given by f (x, y) = 3x2 − 2xy 2 , find fx (2, −1) and fy (2, −1).
6. Let u : R³ −→ R be given by u(x, y, z) = x/y + y/z + z/x, for all (x, y, z) ∈ R³ with x ≠ 0, y ≠ 0 and z ≠ 0. Note that this means that the domain of u is the set
   A = { (x, y, z) : x ≠ 0, y ≠ 0 and z ≠ 0 } ⊆ R³.

Show that xux + yuy + zuz = 0 at all points of the domain A of u.


 
7. Let V : R³ −→ R be given by V(x, y, z) = arctan( x/(y + z) ). Write down the domain of V and prove that xVx + yVy + zVz = 0 at all points of the domain of V.

8. Let u(x, y) = arcsin( y/(2x) ), for all (x, y) ∈ R² with x ≠ 0. Verify that xux + yuy = 0 for all (x, y) ∈ R² with x ≠ 0.

9. Given that
   φ′ = ( ∂φ/∂x, ∂φ/∂y, ∂φ/∂z ),
find φ′(x, y, z) in the following cases.
(i) φ = x²yz³;   (ii) φ = 1/√(x² + y² + z²).

10. Find all the second order partial derivatives of the function z, where z is given by
(i) z = sin(x3 − 3xy), (ii) z = xyexy .

11. If f(x, y) = ln(x² + y²) − arctan(y/x), find ∂f/∂x and ∂f/∂y. Show that ∂²f/∂x∂y = ∂²f/∂y∂x.

12. If u : R² −→ R is given by u(x1, x2) = x1³ − 3x1x2², show that D11u + D22u = 0. Then, write this conclusion using two other systems of notation for partial derivatives.

13. If V = 1/√(x² + y² + z²), show that Vxx + Vyy + Vzz = 0. Write this conclusion using the Dij notation for partial derivatives.

14. Let f : R2 −→ R be given by f (x1 , x2 ) = x31 − 4x21 x32 + 7x1 x22 − 8x32 . Calculate D12 f (x1 , x2 ),
D21 f (x1 , x2 ) and D22 f (x1 , x2 ).

15. If z : R² −→ R is given by z(x, y) = sin(2y + x) + cos(2y − x), prove that z satisfies the equation ∂²z/∂y² = 4 ∂²z/∂x².

16. If V = f(x + cy) + g(x − cy), where c is a constant, show that
   ∂²V/∂y² = c² ∂²V/∂x².

17. If f : R −→ R is a given twice differentiable function, let V : R2 −→ R be the function given


by V (x, y) = f (x2 + y 2 ). Show that
(i) yVx − xVy = 0,
(ii) y 2 Vxx − 2xyVxy + x2 Vyy = xVx + yVy .

18. Given that f is a differentiable function of one variable and that u : R2 −→ R is defined by
 
x+y
u(x, y) = xyf ,
xy

show that there is a function G : R2 −→ R such that u satisfies the equation

x2 ux (x, y) − y 2 uy (x, y) = G(x, y)u(x, y),

for all (x, y) in the domain of u, and find G(x, y).

19. If u : R3 −→ R is given by u(x, y, z) = exp(xyz), calculate D321 u and find uzyx .

20. If u = (x3 − y 3 )/(x3 + y 3 ), verify that uxy = uyx .

21. Let (a1 , a2 , . . . , an ) ∈ Rn . Then, let f : R −→ Rn be given by

f (t) = (ta1 , ta2 , . . . , tan ).

Let g : Rn −→ R be a given function and let h = g ◦ f. Thus,

h = g ◦ f : R −→ R, and

h(t) = g(ta1 , ta2 , . . . , tan ), for all t ∈ R.


Assuming the functions involved are suitably differentiable, use the Chain Rule to prove that
   h′(t) = (g ∘ f)′(t) = Σ_{j=1}^{n} aj Dj g(ta1, ta2, …, tan) = Σ_{j=1}^{n} aj (∂g/∂xj)(ta1, ta2, …, tan),

for all t ∈ R.
22. If z = e^{xy²}, where x = θ cos θ and y = θ sin θ, find the value of dz/dθ at θ = π/2.
dV
23. If V = x2 + y 2 , where x = t2 + 1, y = t − 1, find as a function of t.
dt

∂z
24. If z = 2x2 + xy − y 2 + 2x − 3y + 5, x = 2s − t, y = s + t, find .
∂t
∂u ∂u
25. If u = x2 + y 2 z + xyz 2 and x = est , y = s2 + t2 , z = st, find and in terms of x, y, z,
∂s ∂t
s and t.
∂z ∂z
26. Find and given z = x2 + 3xy + y 2 , x = sin r + cos s, y = sin r − cos s.
∂r ∂s
27. Given z = f(x, y), x = eˢ cos t, y = eˢ sin t, show that
   (∂z/∂s)² + (∂z/∂t)² = e^{2s} { (∂z/∂x)² + (∂z/∂y)² }.

dy du dv
28. If y = uv , where u and v both functions of x, find in terms of u, v, and .
dx dx dx
29. Given V = V(x, y), with x = r cos φ and y = r sin φ, show that
   (∂V/∂x)² + (∂V/∂y)² = (∂V/∂r)² + (1/r²)(∂V/∂φ)².

∂2V
30. If x = 2r − s, y = r + 2s and V = f (x, y), find in terms of derivatives of V with
∂y∂x
respect to r and s.

31. If x = u2 − v 2 , y = 2uv, and z = F (x, y) = G(u, v), show that


(i) zuu + zvv = 4(u2 + v 2 )(zxx + zyy ),
(ii) vzuu + uzuv = 2vzx + 2uzy + 4(u2 + v 2 )(uzxy + vzyy ).

32. If V = V (x, y) and u = αx2 + βy 2 , v = αx2 − βy 2 , show that


(i) xVx + yVy = 2{uVu + vVv },
(ii) xVx − yVy = 2{vVu + uVv }, and
(iii) x2 Vxx − 2xyVxy + y 2 Vyy = 2{uVu + vVv } + 4{v 2 Vuu + 2uvVuv + u2 Vvv }.

33. Find xu , xv , yu , yv , ux , uy , vx and vy for the following cases.


(i) x = u3 + v 2 , y = u/v
(ii) u = x2 + y 2 , v = x2 + y 2 − 2y
(iii) u = y ex , v = y e−x
1 1
(iv) y = (uv) 2 , x = ln(u/v) 2 .

∂z ∂z ∂2z
34. If x = u + v, y = uv, z = u2 + v 2 define z as a function of x and y, find , and .
∂x ∂y ∂x∂y

35. Given z = f (x, y) with u = y ex and v = y e−x , show that


(i) zxx + y zxy = 2u(zu + u zuu − v zuv )
(ii)zxx − y 2 zyy = u zu + v zv − 4uv zuv .

36. Given z = f (x, y) with u = x − y and v = xy, show that zxy + zyy = zv + (x + y)(x zvv − zuv ).

37.∗ Let Ω be an open subset of Rⁿ and let f : Ω −→ Rⁿ be such that J(f)(x) ≠ 0 for all x ∈ Ω. Prove that the range f(Ω) of f is an open subset of Rⁿ.

38. Let f : R2 −→ R2 be given by f (x, y) = (x2 + y 2 , x2 − y 2 ), for all (x, y) ∈ R2 . Calculate


J(f )(x, y) and describe the set A = {(x, y) : J(f )(x, y) = 0}, illustrating by means of a pic-
ture. If (x0, y0) ∉ A, deduce from the Inverse Function Theorem that f has an inverse near (x0, y0), calculate explicitly what it is, and calculate (f⁻¹)′(f(x0, y0)). Show that (1, −1) ∉ A and calculate (f⁻¹)′(2, 0).

39. Let g : R −→ R be a given differentiable function. Let f : R2 −→ R be given by f (x, y) =


g(x) − y. If x0 ∈ R and g′(x0) ≠ 0, observe that f(x0, g(x0)) = 0 and use the Implicit Function
Theorem, or the Inverse Function Theorem, to prove that there is an open interval J about x0
such that g has an inverse on J.

40. Let f : R3 −→ R2 be given by f (x, y, z) = (x − xy, x + 2y + z 2 ). Assume that (a, b, c) ∈ R3 is


such that f (a, b, c) = 0. Find a condition on (a, b, c) that ensures that there is an open interval
V containing a and an open set W ⊆ R2 containing (b, c) such that there is a differentiable
function g mapping V onto W and f (x, g(x)) = 0 for all x ∈ V . Can you calculate a specific
formula for g? If so, do it.

41. Let f : R3 −→ R2 be given by f (x, y, z) = (x−xyz, x+2y +z 2 ). Assume that (a, b, c) ∈ R3 is


such that f (a, b, c) = 0. Find a condition on (a, b, c) that ensures that there is an open interval
V containing a and an open set W ⊆ R2 containing (b, c) such that there is a differentiable
function g mapping V onto W with f (x, g(x)) = 0 for all x ∈ V .
42.∗ Let Ω be an open subset of Rn and let f : Ω −→ Rm be differentiable at the point x ∈ Ω.
Prove that
lim f (x + h) = f (x).
h→0

That is, prove that f is continuous at x.


Chapter 5

Integration in two dimensions

5.1 Integration over a rectangle


The discussion here is moderately complete, but some details are omitted. Let
J = [a1 , b1 ] × [a2 , b2 ] be a given rectangle in R2 . Let

a1 = x0 < x1 < x2 < · · · < xr = b1

be a partition P1 of [a1 , b1 ]. Also, let

a2 = y0 < y1 < · · · < ys = b2

be a partition P2 of [a2 , b2 ]. Put

Jkl = [xk−1 , xk ] × [y`−1 , y` ],

for 1 ≤ k ≤ r, 1 ≤ ` ≤ s. J is the union of the rs subrectangles Jk` .


Let f : J −→ R be a given continuous function. We put

   mkℓ = min{ f(x) : x ∈ Jkℓ },
   Mkℓ = max{ f(x) : x ∈ Jkℓ }.

Note that these maximum and minimum values exist because f is a continuous function on the
closed and bounded set Jk` . Note also that mk` ≤ Mk` .
Let
   S(f, P1, P2) = Σ_{k=1}^{r} Σ_{ℓ=1}^{s} mkℓ (area of Jkℓ),   and
   S̄(f, P1, P2) = Σ_{k=1}^{r} Σ_{ℓ=1}^{s} Mkℓ (area of Jkℓ).   (5.1)

Put
   ε = max{ Mkℓ − mkℓ : 1 ≤ k ≤ r, 1 ≤ ℓ ≤ s }.   (5.2)

Then from (5.1) and (5.2) we have

   0 ≤ S̄(f, P1, P2) − S(f, P1, P2)
     = Σ_{k=1}^{r} Σ_{ℓ=1}^{s} (Mkℓ − mkℓ)(area of Jkℓ)
     ≤ ε Σ_{k=1}^{r} Σ_{ℓ=1}^{s} (area of Jkℓ)
     = ε (area of J).   (5.3)

Let
   δ(P1) = max{ |xk − xk−1| : 1 ≤ k ≤ r },   and
   δ(P2) = max{ |yℓ − yℓ−1| : 1 ≤ ℓ ≤ s }.

Then, as f is continuous, it is uniformly continuous on J, which means that with ε as given in


(5.2),
   lim_{δ(P1), δ(P2) → 0} ε = 0.   (5.4)

So, we see from (5.3) that
   lim_{δ(P1), δ(P2) → 0} ( S̄(f, P1, P2) − S(f, P1, P2) ) = 0.   (5.5)
It can be shown from (5.3), (5.4) and (5.5) that there is a unique number, denoted by ∫_J f, with the following property: for any ε > 0 there is δ > 0 such that if P1 is a partition of [a1, b1] with δ(P1) < δ and P2 is a partition of [a2, b2] with δ(P2) < δ, then
   ∫_J f − ε ≤ S(f, P1, P2) ≤ ∫_J f ≤ S̄(f, P1, P2) ≤ ∫_J f + ε.

DEFINITION. The number ∫_J f is called the integral of f over J, and is sometimes denoted by ∫∫_J f. It is usual for the integral ∫_J f to be written as
   ∫_J f(x, y) dxdy,   or as   ∫∫_J f,   or as   ∫_{[a1,b1]×[a2,b2]} f(x, y) dxdy.
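The definition suggests an obvious numerical scheme: partition J into small subrectangles and sum (value of f) × (area). A sketch in Python (illustrative only; the sample integrand is the one used in the worked example of Section 5.3 below, where the exact value −7/3 is obtained):

    import numpy as np

    def double_integral(f, a1, b1, a2, b2, r=400, s=400):
        # Riemann sum over an r x s grid of subrectangles of J = [a1,b1] x [a2,b2],
        # evaluating f at the centre of each subrectangle.
        xs = np.linspace(a1, b1, r + 1)
        ys = np.linspace(a2, b2, s + 1)
        xm = 0.5 * (xs[:-1] + xs[1:])
        ym = 0.5 * (ys[:-1] + ys[1:])
        X, Y = np.meshgrid(xm, ym, indexing='ij')
        area = (xs[1] - xs[0]) * (ys[1] - ys[0])
        return np.sum(f(X, Y)) * area

    print(double_integral(lambda x, y: x**2 - y, -1.0, 1.0, 1.0, 2.0))   # about -7/3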

5.2 Integration over subsets of R²
The discussion here is rather sketchy; it is not meant to be complete. Let S be a closed and
bounded set in R2 and let U be an open set containing S. Let f : U −→ R+ be a continuous
function on U .
Then, S may be approximated from the inside by a grid of rectangles, where each rectangle is
contained in S. Let these rectangles be denoted by U1 , U2 , . . . , Ur . Also, S may be approximated
from the outside by a grid of rectangles whose union contains S and is contained in U . Let these
rectangles be denoted by V1, V2, …, Vs. Then, as f(x) ≥ 0 for x ∈ U, we have
   Σ_{j=1}^{r} ∫_{Uj} f(x, y) dxdy ≤ Σ_{k=1}^{s} ∫_{Vk} f(x, y) dxdy.

The function f is integrable over S if there is a number ∫_S f with the following property: for any ε > 0 there is δ > 0 such that there is a grid U1, …, Ur of rectangles approximating S from the inside, and a grid V1, …, Vs of rectangles approximating S from the outside, such that
   ∫_S f − ε ≤ Σ_{j=1}^{r} ∫_{Uj} f(x, y) dxdy ≤ ∫_S f ≤ Σ_{k=1}^{s} ∫_{Vk} f(x, y) dxdy ≤ ∫_S f + ε.   (5.6)
Note that the existence of ∫_S f apparently depends upon the set S, not simply the function f. However, we only consider sets S for which the integral exists for any continuous function, in the above context. The number ∫_S f, when it exists, is usually denoted by
   ∫_S f(x, y) dxdy,   or by   ∫∫_S f(x, y) dxdy.

The idea in (5.6) is that the integral over S can be approximated simultaneously, to greater and greater degrees of accuracy, by using grids of rectangles from the inside and from the outside. When the function is not necessarily non-negative, the integral ∫_S f(x, y) dxdy may be defined by expressing the function as a difference of non-negative functions and then integrating each non-negative function separately.
The following result summarizes some of the usual properties of the integral.

Theorem 11 Let S be a closed bounded set and let f, g be continuous functions such that ∫∫_S f(x, y) dxdy and ∫∫_S g(x, y) dxdy both exist. Then the following hold:
(1) The integral ∫∫_S (f + g)(x, y) dxdy exists and
   ∫∫_S (f + g)(x, y) dxdy = ∫∫_S f(x, y) dxdy + ∫∫_S g(x, y) dxdy.
(2) If a ∈ R, then the integral ∫∫_S af(x, y) dxdy exists and
   ∫∫_S af(x, y) dxdy = a ( ∫∫_S f(x, y) dxdy ).

5.3 Repeated integration


The definitions above tell us what the integral ∫_S f of the function f over the rectangle or set S is, but they do not give us any idea how to calculate specific integrals. Typically, these two-dimensional integrals are calculated by reducing the problem to calculating one or more one-dimensional integrals such as appeared in first year calculus.
Now, let S = [a1, b1] × [a2, b2] be a rectangle. We saw earlier that ∫_S f(x, y) dxdy was
approximated by S̄(f, P1 , P2 ), where
   S̄(f, P1, P2) = Σ_{k=1}^{r} Σ_{ℓ=1}^{s} Mkℓ (area of Jkℓ)
                 = Σ_{k=1}^{r} ( Σ_{ℓ=1}^{s} Mkℓ (area of Jkℓ) )
                 = Σ_{k=1}^{r} ( Σ_{ℓ=1}^{s} Mkℓ (yℓ − yℓ−1) ) (xk − xk−1)
                 = Σ_{k=1}^{r} ( Σ_{ℓ=1}^{s} f(x′k, y′ℓ) (yℓ − yℓ−1) ) (xk − xk−1),   (5.7)

where (x′k, y′ℓ) is a point in Jkℓ such that f(x′k, y′ℓ) = Mkℓ. Now, the expression inside the brackets in (5.7) is an approximation to
   ∫_{a2}^{b2} f(xk, y) dy,

so it follows from (5.7) that S̄(f, P1 , P2 ) can be approximated by


   Σ_{k=1}^{r} ( ∫_{a2}^{b2} f(xk, y) dy ) (xk − xk−1),   (5.8)

and this in its turn is an approximation to


   ∫_{a1}^{b1} ( ∫_{a2}^{b2} f(x, y) dy ) dx.   (5.9)
In summary, we see that ∫_S f(x, y) dxdy can be approximated by (5.7), then (5.7) can be approximated by (5.8), and (5.8) can be approximated by (5.9). The point about all these approximations is that in the limit, as the rectangular grids get finer and finer, the approximations all become arbitrarily good. This means we can deduce that
   ∫_S f(x, y) dxdy = ∫_{a1}^{b1} ( ∫_{a2}^{b2} f(x, y) dy ) dx.   (5.10)

Similarly, by reversing the roles of x and y in the above argument, we deduce that
   ∫_S f(x, y) dxdy = ∫_{a2}^{b2} ( ∫_{a1}^{b1} f(x, y) dx ) dy.   (5.11)

Thus, from (5.10) and (5.11), for a rectangle S = [a1, b1] × [a2, b2], we have
   ∫_S f(x, y) dxdy = ∫_{a2}^{b2} ( ∫_{a1}^{b1} f(x, y) dx ) dy = ∫_{a1}^{b1} ( ∫_{a2}^{b2} f(x, y) dy ) dx.   (5.12)

Integrals like (5.10) and (5.11) are called repeated or iterated integrals.
Equation (5.12) gives us a method for calculating integrals over rectangles—we can integrate
with respect to x between the appropriate limits, and then integrate with respect to y and the
appropriate limits, or vice versa.
EXAMPLE. Let $S = [-1, 1] \times [1, 2]$. Evaluate the integral $\iint_S (x^2 - y)\, dxdy$. We have
$$\begin{aligned}
\iint_S (x^2 - y)\, dxdy &= \int_{-1}^{1}\left(\int_{1}^{2} (x^2 - y)\, dy\right)dx\\
&= \int_{-1}^{1}\left[x^2 y - \frac{y^2}{2}\right]_{1}^{2} dx\\
&= \int_{-1}^{1}\left(x^2(2 - 1) - \left(\frac{4}{2} - \frac{1}{2}\right)\right)dx\\
&= \int_{-1}^{1}\left(x^2 - \frac{3}{2}\right)dx\\
&= \left[\frac{x^3}{3} - \frac{3x}{2}\right]_{-1}^{1}\\
&= \left(\frac{1}{3} - \frac{3}{2}\right) - \left(-\frac{1}{3} + \frac{3}{2}\right)\\
&= -\frac{7}{6} - \frac{7}{6} = -\frac{14}{6} = -\frac{7}{3}. \qquad (5.13)
\end{aligned}$$
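Calculations such as (5.13) can be checked numerically. The short Python sketch below, which assumes the scipy library is available, evaluates the same repeated integral with scipy's dblquad routine; it is only an illustration of the idea, not a substitute for the working above.

# Numerical check of the repeated-integration example (5.13), assuming scipy is available.
from scipy import integrate

# dblquad integrates f(y, x) with y as the inner variable:
# here S = [-1, 1] x [1, 2], so x runs over [-1, 1] and y over [1, 2].
value, error = integrate.dblquad(lambda y, x: x**2 - y, -1, 1, lambda x: 1, lambda x: 2)

print(value)        # approximately -2.333333..., i.e. -7/3
print(-7/3)         # exact value from (5.13), for comparison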

Figure 5.1. If x is a given value in [a, b], the y coordinate of points in S goes between $\psi_1(x)$ and $\psi_2(x)$. This is how we calculate the y limits of integration when integrating first with respect to y and then with respect to x. The x limits in this case are more obvious, for as we consider all points in S, the x coordinates go from a to b.

When it comes to general regions, similar considerations apply. As a suitable set can be approximated by rectangles, under suitable conditions equation (5.12) is true where the rectangle S is replaced by a more general set.
For example, if the set S is the elliptically shaped set in Figure 5.1, the function $\psi_1$ on the interval [a, b] gives the lower part of the boundary of S, while the function $\psi_2$ gives the upper part of the boundary of S. Now, if we integrate first with respect to y and then with respect to x, (5.12) gives
$$\int_S f = \int_{a}^{b}\left(\int_{\psi_1(x)}^{\psi_2(x)} f(x, y)\, dy\right)dx. \qquad (5.14)$$

In the figure the x values of points in S go from a to b, hence the limits of a and b in the integral with respect to x. But the y limits generally vary according to the different values of x. By keeping x fixed, and examining the y limits of points in S for this value of x, we get the corresponding y limits of integration: in the picture we see they are $\psi_1(x)$ and $\psi_2(x)$. Then (5.14) gives us a way of calculating $\iint_S f$, by integrating first with respect to y and then with respect to x.

Figure 5.2. If y is a given value in [c, d], the x coordinate of points in S goes between $\phi_1(y)$ and $\phi_2(y)$. This is how we calculate the x limits of integration when integrating first with respect to x and then with respect to y. The y limits in this case are more obvious, for as we consider all points in S, the y coordinates go from c to d.

Now, instead of integrating with respect to y and then x, it may be easier to do it by integrating first with respect to x and then with respect to y. This situation is illustrated by Figure 5.2. The function $\phi_1$ is defined on the interval [c, d] on the y-axis, and it gives the left hand part of the boundary of S. The function $\phi_2$ is defined on the interval [c, d] on the y-axis, and it gives the right hand part of the boundary of S. In this case, we find that
$$\int_S f = \int_{c}^{d}\left(\int_{\phi_1(y)}^{\phi_2(y)} f(x, y)\, dx\right)dy. \qquad (5.15)$$

It follows from (5.14) and (5.15) that for a set S as in Figures 5.1 and 5.2 we have, under suitable conditions,
$$\int_S f = \int_{a}^{b}\left(\int_{\psi_1(x)}^{\psi_2(x)} f(x, y)\, dy\right)dx = \int_{c}^{d}\left(\int_{\phi_1(y)}^{\phi_2(y)} f(x, y)\, dx\right)dy. \qquad (5.16)$$

5.4 Integration by substitution


In first year calculus we came across the formula for integration by substitution. It read something like
$$\int_{a}^{b} f(g(x))\, dx = \int_{g(a)}^{g(b)} \frac{f(u)}{|g'(g^{-1}(u))|}\, du.$$
It can also be written as
$$\int_{g(a)}^{g(b)} f(x)\, dx = \int_{a}^{b} f(g(u))\,|g'(u)|\, du.$$

In these formulas, the substitution arises from the function g that maps the interval (a, b) onto the interval (g(a), g(b)), assuming for example that $g'(\xi) \ne 0$ for all $\xi \in (a, b)$. There is a corresponding formula for functions of 2 variables, where g maps an open subset Ω of $\mathbb{R}^2$ onto an open subset g(Ω) of $\mathbb{R}^2$.
Theorem 12 Let Ω be an open subset of $\mathbb{R}^2$ and let $g : \Omega \longrightarrow \mathbb{R}^2$ be a one-to-one continuously differentiable function such that $\det(g'(x)) \ne 0$ for all $x \in \Omega$. Then g(Ω) is an open set. Let $f : g(\Omega) \longrightarrow \mathbb{R}$ be a given function with domain g(Ω). Observe that the composed function $f \circ g$ is defined and that $f \circ g : \Omega \longrightarrow \mathbb{R}$. We assume that $\int_{g(\Omega)} f$ exists. Then $\int_{\Omega} (f \circ g)\,|\det(g')|$ exists and
$$\int_{g(\Omega)} f = \int_{\Omega} (f \circ g)\, |\det(g')|. \qquad (5.17)$$

An equivalent form of the equation (5.17) is
$$\int_{\Omega} h \circ g = \int_{g(\Omega)} \frac{h}{|\det(g' \circ g^{-1})|}, \qquad (5.18)$$
where in this formula $h : g(\Omega) \longrightarrow \mathbb{R}$. Formulas (5.17) and (5.18) are usually put into use by putting say
$$u = u(x, y) \quad\text{and}\quad v = v(x, y).$$
Then, if T is the region in xy-coordinates, and S is the transformed region in uv-coordinates,
$$\iint_S f(u, v)\, dudv = \iint_T f(u(x, y), v(x, y))\,\left|\frac{\partial(u, v)}{\partial(x, y)}\right| dxdy = \iint_T f(u(x, y), v(x, y))\,\left|\det\begin{pmatrix}\dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y}\\[4pt] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y}\end{pmatrix}\right| dxdy. \qquad (5.19)$$
Alternatively, we may have a situation where we use the formulas in the form
$$\iint_A f(x, y)\, dxdy = \iint_B f(x(u, v), y(u, v))\,\left|\frac{\partial(x, y)}{\partial(u, v)}\right| dudv = \iint_B f(x(u, v), y(u, v))\,\left|\det\begin{pmatrix}\dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v}\\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v}\end{pmatrix}\right| dudv, \qquad (5.20)$$
where A is the region in xy-coordinates and B is the transformed region in uv-coordinates.

Figure 5.3. The figure illustrates the function $g : \mathbb{R}^2 \longrightarrow \mathbb{R}^2$ given by
$$g(x, y) = (x - y,\; x - 2y),$$
that is, the transformation u = x − y, v = x − 2y from the x-y plane to the u-v plane, under which the parallelogram S maps onto the unit square T. In the figure, the edge of the parallelogram S given by the equation x − y = 0 is transformed by g into the edge of the unit square T that is {(0, z) : 0 ≤ z ≤ 1}, the edge of S given by x − y = 1 is transformed into the edge of T that is {(1, z) : 0 ≤ z ≤ 1}, the edge of S given by x − 2y = 0 is transformed into the edge of T that is {(z, 0) : 0 ≤ z ≤ 1}, and the edge of S given by x − 2y = 1 is transformed into the edge of T that is {(z, 1) : 0 ≤ z ≤ 1}.


The main problems with applying (5.19) and (5.20) lie in finding a suitable substitution for
u, v in terms of x, y or vice versa, and in working out the regions S, T in the one case and A, B
in the other. Typically, one of S, T is given, or one of A, B is given. Then, (5.19) and (5.20)
essentially change the problem of integrating over one set to that of integrating over another,
with the hope that the integral over the new set is easier than the integral over the old set.
EXAMPLE. This example is illustrated in Figure 5.3. Calculate $\iint_S (x^2 + y)\, dxdy$, where S is the parallelogram determined by the lines
$$x - y = 0, \quad x - y = 1, \quad x - 2y = 0, \quad x - 2y = 1.$$
Put
$$u = x - y \quad\text{and}\quad v = x - 2y.$$
Then, under this substitution, S changes into the set
$$T = \{(u, v) : 0 \le u \le 1 \text{ and } 0 \le v \le 1\}.$$
Notice that
$$y = u - v \quad\text{and}\quad x = 2u - v,$$
so that
$$\frac{\partial(x, y)}{\partial(u, v)} = \det\begin{pmatrix} x_u & x_v\\ y_u & y_v\end{pmatrix} = \det\begin{pmatrix} 2 & -1\\ 1 & -1\end{pmatrix} = -1.$$
Thus we have
$$\begin{aligned}
\iint_S (x^2 + y)\, dxdy &= \int_0^1\int_0^1 \left((2u - v)^2 + u - v\right)\left|\frac{\partial(x, y)}{\partial(u, v)}\right| dudv\\
&= \int_0^1\int_0^1 (4u^2 - 4uv + v^2 + u - v)\, dudv\\
&= \int_0^1\left(\int_0^1 (4u^2 - 4uv + v^2 + u - v)\, du\right)dv\\
&= \int_0^1\left(\frac{4}{3} - 2v + v^2 + \frac{1}{2} - v\right)dv\\
&= \frac{4}{3} - 1 + \frac{1}{3} + \frac{1}{2} - \frac{1}{2}\\
&= \frac{2}{3}.
\end{aligned}$$
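The change of variables can also be checked numerically. The sketch below, which assumes the scipy library is available, evaluates the transformed integral over the unit square T and recovers the value 2/3.

# Numerical check of the change-of-variables example above, assuming scipy is available.
# With u = x - y, v = x - 2y we have x = 2u - v, y = u - v and |d(x,y)/d(u,v)| = |-1| = 1,
# so the integral over the parallelogram S equals the integral below over the unit square T.
from scipy import integrate

def integrand(v, u):                 # dblquad expects f(inner, outer) = f(v, u)
    x, y = 2*u - v, u - v
    return (x**2 + y) * 1.0          # the factor 1.0 is |det| of the Jacobian

value, error = integrate.dblquad(integrand, 0, 1, lambda u: 0, lambda u: 1)
print(value)   # approximately 0.6666..., i.e. 2/3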

5.5 Integration using polar coordinates


This is a special case of integration by substitution. It is potentially useful in cases where the function f(x, y) to be integrated can be expressed as a function of the form $h(\sqrt{x^2 + y^2})$.
As mentioned before, the change from polar coordinates (r, θ) to rectangular coordinates (x, y) is given by the function g given by
$$g(r, \theta) = (x(r, \theta), y(r, \theta)),$$
where
$$x(r, \theta) = r\cos\theta, \qquad y(r, \theta) = r\sin\theta. \qquad (5.21)$$
The Jacobian of this transformation is
$$\frac{\partial(x, y)}{\partial(r, \theta)} = \det\begin{pmatrix}\dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta}\\[4pt] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta}\end{pmatrix} = r.$$

So, under the polar substitution (5.21), the formula (5.19) becomes
$$\iint_S f(x, y)\, dxdy = \iint_T f(r\cos\theta, r\sin\theta)\, r\, drd\theta, \qquad (5.22)$$
where S is the set of integration for f, and T is the corresponding set in polar coordinates. Note that when f(x, y) can be expressed in the form $h(\sqrt{x^2 + y^2})$, because $r = \sqrt{x^2 + y^2}$, (5.22) takes the simple form
$$\iint_S f(x, y)\, dxdy = \iint_T h(r)\, r\, drd\theta. \qquad (5.23)$$

EXAMPLE. Let S be the set $\{(x, y) : x^2 + y^2 \le a^2\}$. That is, S is the closed circle with centre 0 and circumference $\{(x, y) : x^2 + y^2 = a^2\}$. The function
$$(r, \theta) \longmapsto (r\cos\theta, r\sin\theta)$$
maps $T = [0, a] \times (0, 2\pi)$ to S, so that in this case we have from (5.23) that
$$\iint_S 1\, dxdy = \iint_T r\, drd\theta = \int_0^{2\pi}\int_0^a r\, drd\theta = \pi a^2. \qquad (5.24)$$
In fact, as we see in the next section, $\iint_S 1\, dxdy$ equals the area of S, so (5.24) shows that the area of a circle of radius a is $\pi a^2$.
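As a simple illustration, the polar formula (5.24) can be confirmed numerically; the following sketch assumes numpy and scipy are available and takes a = 2 for concreteness.

# Numerical illustration of (5.24), assuming scipy is available: the area of the disc of
# radius a computed in polar coordinates, where the extra factor r is the Jacobian.
import numpy as np
from scipy import integrate

a = 2.0
area, error = integrate.dblquad(lambda r, theta: r, 0, 2*np.pi, lambda t: 0, lambda t: a)
print(area, np.pi * a**2)   # both approximately 12.566...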

5.6 Areas as integrals


If we have a suitable set S in $\mathbb{R}^2$, the integral
$$\iint_S 1$$
will exist. The value of this integral is the area of S.
For example, in the case of a rectangle $S = [a_1, b_1] \times [a_2, b_2]$, we have from (5.12) that
$$\iint_S 1 = \int_{a_1}^{b_1}\left(\int_{a_2}^{b_2} 1\, dy\right)dx = (b_1 - a_1)(b_2 - a_2) = \text{the usual area of } S,$$
confirming the idea that the area of S is the integral of the constant function 1 over S.
EXAMPLE. Let S be the region between the x-axis and the graph of y = x², for 0 ≤ x ≤ 1. Then,
$$\text{area of } S = \iint_S 1 = \int_{0}^{1}\left(\int_{0}^{x^2} dy\right)dx = \int_{0}^{1} x^2\, dx = \frac{1}{3}.$$
Note that this is almost the same (in this case) as taking the function $x \longmapsto x^2$ and integrating it over the interval [0, 1] to obtain the area under the graph of the function, as was carried out in first year. However, the point is that in some cases we can calculate areas using double integrals that we could not do by first year techniques, or that may be more difficult by first year techniques. One of these is calculating the area of a circle; another is calculating the area under the graph of the function $x \longmapsto e^{-x^2}$.

5.7 Exercises

1. Let R be the rectangle [−1, 1] × [1, 2] in $\mathbb{R}^2$. Calculate the integrals
$$\iint_R (x - y)\, dxdy, \qquad \iint_R 1\, dxdy \qquad\text{and}\qquad \iint_R (2x - y^2)\, dxdy.$$
2. Evaluate each of the following double integrals, sketching the region of integration.
(i) $\displaystyle\int_{0}^{1}\int_{0}^{x^2} xy\, dy\, dx$

Z 1Z y
(ii) (x + y)dx dy
0 y

(iii) $\displaystyle\int_{0}^{2}\int_{-\sqrt{2y}}^{\sqrt{2y}} (3x + 2y)\, dx\, dy$
(iv) $\displaystyle\int_{0}^{1}\int_{0}^{x^3} e^{y/x}\, dy\, dx$
(v) $\displaystyle\int_{0}^{\pi}\int_{0}^{x} x \sin y\, dy\, dx$
(vi) $\displaystyle\int_{0}^{\pi}\int_{0}^{\sin x} y\, dy\, dx$
(vii) $\displaystyle\int_{1}^{2}\int_{y}^{y^2} dx\, dy$.
3. Use a double integral to find the area of the following regions.
(i) The region enclosed by y = x + 2 and y = x².
(ii) The segment of y = cos x cut off by the axes Ox and Oy.
(iii) The sector from the origin to the portion of the curve y = 1/x between x = 1 and x = 2.
4. In the following, change the order of integration to evaluate the integrals, identifying the region of integration.
(i) $\displaystyle\int_{0}^{1}\int_{y}^{1} e^{-x^2}\, dx\, dy$.
(ii) $\displaystyle\int_{0}^{\pi}\int_{x}^{\pi} \frac{\sin y}{y}\, dy\, dx$.
(iii) $\displaystyle\int_{0}^{\sqrt{\pi}}\int_{y}^{\sqrt{\pi}} \sin x^2\, dx\, dy$.
(iv) $\displaystyle\int_{0}^{1}\int_{\tan^{-1} y}^{\pi/4} \sec x\, dx\, dy$.
(v) $\displaystyle\int_{0}^{1}\int_{e^x}^{e^{2x}} \ln y\, dy\, dx$.
5. Evaluate and describe the regions of integration for
(i) $\displaystyle\int_{0}^{a}\int_{a - x}^{\sqrt{a^2 - x^2}} y\, dy\, dx$,
(ii) $\displaystyle\int_{0}^{a}\int_{-\sqrt{a^2 - x^2}}^{a^2 - x^2} dy\, dx$,

Z 1Z x
(iii) (1 + x2 + y 2 )dy dx
0 x
(iv) $\displaystyle\int_{-1}^{2}\int_{x}^{x + 2} dy\, dx$
(v) $\displaystyle\int_{1}^{2}\int_{2}^{5} xy\, dx\, dy$.

6. Evaluate the integral $\displaystyle\int_{-3}^{3}\int_{0}^{\sqrt{9 - x^2}} y\, dy\, dx$ by transforming to polar co-ordinates.
7. Evaluate the integral $\displaystyle\iint_R e^{-x^2 - y^2}\, dx\, dy$ where R is the annulus bounded by the concentric circles $x^2 + y^2 = 1$ and $x^2 + y^2 = 4$.
8. Evaluate $\displaystyle\iint_R (x + y)^2 e^{x - y}\, dx\, dy$ where R is the region bounded by
x + y = 1, x + y = 4, x − y = −1, and x − y = 1.
9. Find $\displaystyle\iint_R (x^2 + y^2)\, dx\, dy$ where R is the parallelogram bounded by x + y = 1, x + y = 2, 3x + 4y = 5, 3x + 4y = 6.
10. Find the area in the first quadrant which is bounded by xy = 1, xy = 4, y = x and y = 2x.
11. Evaluate $\displaystyle\iint_R (x^2 + y^2)\, dx\, dy$ where R is the region in the first quadrant bounded by y = 0, y = x, xy = 1 and $x^2 - y^2 = 1$.
12. R is the region $x^2 + y^2 \le 1$, $x^2 + y^2 - 2y \ge 0$, $x \ge 0$, $y \ge 0$. Sketch the region. Compute $\displaystyle\iint_R x e^y\, dx\, dy$ using the change of variable $u = x^2 + y^2$, $v = x^2 + y^2 - 2y$.
13. Evaluate $\displaystyle\iint_R \exp\left(\frac{y - x}{y + x}\right) dx\, dy$ where R is the region inside the triangle with vertices (0, 0), (1, 0) and (0, 1), by using the change of variable u = y − x, v = y + x.
14. Calculate the area of the ellipse
$$\left\{(x, y) : \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1\right\},$$
by means of a double integral. Then, deduce the area of the circle
$$\{(x, y) : x^2 + y^2 \le r^2\}.$$

15. Using repeated integration prove that
$$\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}.$$
[Hint: Evaluate $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\, dxdy$ using polar coordinates.]
16. The equation
$$\int_{g(\Omega)} f = \int_{\Omega} (f \circ g)\, |\det(g')|$$
in (5.17) leads to the equation in (5.18), namely
$$\int_{\Omega} h \circ g = \int_{g(\Omega)} \frac{h}{|\det(g' \circ g^{-1})|}.$$
Explain how this occurs. [Hint: in (5.17) replace f by $h/|\det(g' \circ g^{-1})|$, and show how (5.18) follows.]
Chapter 6

Curves and vector fields

6.1 Curves and paths


Let $n \in \mathbb{N}$. Then an n-dimensional curve or path is a continuous function $C : J \longrightarrow \mathbb{R}^n$, where J is an interval. Thus, for each $t \in J$ we have $C(t) \in \mathbb{R}^n$, so that there are coordinate functions $x_1, x_2, \ldots, x_n$, where each $x_j : J \longrightarrow \mathbb{R}$, such that
$$C(t) = \begin{pmatrix} x_1(t)\\ x_2(t)\\ \vdots\\ x_n(t)\end{pmatrix}, \qquad \text{for all } t \in J.$$
That is, in row notation,

C(t) = (x1 (t), x2 (t), . . . , xn (t)), for all t ∈ J.

We shall generally use the term curve rather than path, although the latter is quite common
especially in the integration context. If J = [a, b], the curve is said to be closed if C(a) = C(b). Note that if C is a curve given by the function $C : [a, b] \longrightarrow \mathbb{R}^n$, then as
t increases from a to b, C(t) will traverse the points on the curve starting at C(a) and ending
at C(b). Thus the curve has a certain “direction” or orientation deriving from the fact that it
commences at C(a) and ends at C(b). In two dimensions, the usual orientation of a closed curve
is in the anti-clockwise direction. Note that when thinking of the curve in geometric terms,
we may sometimes informally refer to the range C(J) of C as the curve and this range may
sometimes be denoted by C. This is brought out in an example below.
The tangent vector to a differentiable curve C at the point $C(t_0)$ on the curve is the matrix of the derivative of C at $t_0$. That is, the tangent vector at $C(t_0)$ is the vector given by
$$C'(t_0) = \begin{pmatrix} x_1'(t_0)\\ x_2'(t_0)\\ \vdots\\ x_n'(t_0)\end{pmatrix}.$$
Alternatively, in row vector form,
$$C'(t_0) = (x_1'(t_0), x_2'(t_0), \ldots, x_n'(t_0)).$$

The tangent vector points in the direction of the curve at the point.


EXAMPLE. Let a > 0 and let C : [0, 2π) −→ R2 be the curve given by
C(t) = (a cos t, a sin t), for t ∈ [0, 2π).
Note that C(t) = (x1 (t), x2 (t)) where x1 (t) = a cos t and x2 (t) = a sin t. Then C is one-to-one
and the range of C is the set
$$\{(a\cos t, a\sin t) : t \in [0, 2\pi)\},$$

which is the circle of centre 0 and radius a whose Cartesian equation is


x2 + y 2 = a2 .
Note that
$$C'(t) = (x_1'(t), x_2'(t)) = (-a\sin t, a\cos t) = (-x_2(t), x_1(t)).$$

Figure 6.1. The figure illustrates the curve whose range is the circle whose centre is the origin and which has radius a. Note that the parameter t, which is in the interval [0, 2π), has a geometric interpretation as the angle between the line segment joining the origin to the point (a cos t, a sin t) and the horizontal axis. Note that sometimes, although a curve has been defined as a function, it may sometimes be loosely identified with the range of the curve, with its orientation. The anti-clockwise orientation of the curve is indicated in the figure by the arrowheads on the curve. The tangent vector to the curve at the point (a cos t, a sin t) is the vector (−a sin t, a cos t), as indicated.
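The parametrization shown in Figure 6.1 can be explored numerically. The sketch below, which assumes only the numpy library, compares the tangent vector (−a sin t, a cos t) with a finite-difference approximation to C′(t), and checks that the tangent is orthogonal to the position vector.

# A small sketch, assuming numpy, of the curve C(t) = (a cos t, a sin t) and its tangent vector.
import numpy as np

a = 3.0
C = lambda t: np.array([a*np.cos(t), a*np.sin(t)])
C_prime = lambda t: np.array([-a*np.sin(t), a*np.cos(t)])   # the tangent vector

t0, h = 0.7, 1e-6
finite_difference = (C(t0 + h) - C(t0 - h)) / (2*h)
print(C_prime(t0))                    # exact tangent vector at C(t0)
print(finite_difference)              # numerically close to the exact tangent vector
print(np.dot(C(t0), C_prime(t0)))     # approximately 0: the tangent is orthogonal to the position vector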

EXAMPLE. Let a > 0 and let $C : [0, 2\pi) \longrightarrow \mathbb{R}^3$ be the curve given by
$$C(t) = (a\cos^2 t,\; a\cos t\sin t,\; a\sin t), \qquad \text{for } t \in [0, 2\pi).$$
Observe that
$$\begin{aligned}
(a\cos^2 t)^2 + (a\cos t\sin t)^2 + a^2\sin^2 t &= a^2\cos^4 t + a^2\cos^2 t\sin^2 t + a^2\sin^2 t\\
&= a^2\cos^2 t(\cos^2 t + \sin^2 t) + a^2\sin^2 t\\
&= a^2(\cos^2 t + \sin^2 t)\\
&= a^2,
\end{aligned}$$
where we have used the identity $\cos^2 t + \sin^2 t = 1$. Thus the point C(t) on the curve lies on the sphere in $\mathbb{R}^3$ whose centre is the origin and whose radius is a, given by the equation
$$x_1^2 + x_2^2 + x_3^2 = a^2.$$

Note that historically, there has not been a clear distinction between the range of the curve
and the definition given here of the curve as a function. There is a subtle point in considering
curves as the range, in that two curves may be different as functions, but describe essentially the
same curve, in the sense that they have the same range and the same orientation. More specif-
ically, let C : [a, b] −→ Rn be a curve and let φ : [c, d] −→ [a, b] be an increasing differentiable
function that maps [c, d] onto [a, b] in a one-to-one fashion. Then D = C ◦ φ : [c, d] −→ Rn , so
that D is also a curve and has the same range and orientation as C. In this case, we say that
the curves C, D are equivalent. The idea is that C, D are the “same” curve, although they have
different but equivalent parametrizations. The following comments indicate the terminology we
use to deal with this problem of history. As indicated above, we may sometimes identify a
curve with its range and a given orientation, and there may be many functions that effectively
describe the curve. So, we may say that we have a curve C that is parametrized by the function
t 7−→ r(t), instead of identifying the curve with the function r.
Finally, note that if C : [a, b] −→ Rn is a curve, then the function mapping [a, b] into Rn
given by
t 7−→ C (a + b − t)
is a curve D say, that has the same range as C but the opposite orientation — that is, as t
increases from a to b, D(t) maps out the same points as C(t) but in the opposite direction. This
is because, as t increases from a to b, C(t) proceeds from C(a) to C(b), but D(t) proceeds from
C(b) to C(a).

6.2 Directional derivatives


Let u be a unit vector in Rn . That is, u ∈ Rn and |u| = 1. We think of u as determining a
direction in Rn . Let Ω be an open subset of Rn , let

f : Ω −→ R

be a given function, and let x ∈ Ω be given. Then the derivative of f at x in the direction u is
the number Du f (x), if it exists, given by

$$D_u f(x) = \lim_{h \to 0}\frac{f(x + hu) - f(x)}{h}. \qquad (6.1)$$
Then $D_u f(x)$ in equation (6.1) is called the directional derivative of f at x in the direction u, or
the derivative of f at x in the direction u. Note that the partial derivatives D1 f (x), . . . , Dn f (x)
of f at x are special cases of directional derivatives, for it follows from (6.1) and the definition of

Dj f (x), that Dj f (x) is the derivative of f in the direction ej , where ej = (0, . . . , 0, 1, 0, . . . , 0) ∈


Rn . That is,
Dj f (x) = Dej f (x), for x ∈ Ω.
Thus, each partial derivative of f is a directional derivative.
Given a unit vector u ∈ Rn , we now show that the directional derivative Du f may be
expressed in terms of the partial derivatives of f .

Theorem 13 Let u = (u1 , u2 , . . . , un ) be a unit vector in Rn , let Ω be an open subset of Rn ,


and let
f : Ω −→ R
be a given differentiable function. Then, for all $x \in \Omega$, the directional derivative $D_u f(x)$ exists and
$$D_u f(x) = \langle u, f'(x)\rangle = \sum_{j=1}^{n} u_j D_j f(x). \qquad (6.2)$$
That is, on Ω,
$$D_u f = \langle u, f'\rangle = \sum_{j=1}^{n} u_j D_j f.$$

PROOF. By (6.1),
$$D_u f(x) = \lim_{h \to 0}\frac{f(x_1 + hu_1, x_2 + hu_2, \ldots, x_n + hu_n) - f(x_1, x_2, \ldots, x_n)}{h}.$$
That is, $D_u f(x)$ is the derivative at 0 of the function mapping $\mathbb{R}$ into $\mathbb{R}$ given by
$$t \longmapsto f(x_1 + tu_1, \ldots, x_n + tu_n).$$
But this function is the composition of the function $f : \mathbb{R}^n \longrightarrow \mathbb{R}$ with the function θ given by
$$t \longmapsto (x_1 + tu_1, x_2 + tu_2, \ldots, x_n + tu_n),$$
mapping $\mathbb{R}$ into $\mathbb{R}^n$. Note that θ(0) = x. Thus, by the Chain Rule we see that $D_u f(x)$ exists and that
$$D_u f(x) = (f \circ \theta)'(0) = f'(\theta(0))\,\theta'(0) = \begin{pmatrix} D_1 f(x) & D_2 f(x) & \cdots & D_n f(x)\end{pmatrix}\begin{pmatrix} u_1\\ u_2\\ \vdots\\ u_n\end{pmatrix} = \sum_{j=1}^{n} u_j D_j f(x).$$


This result can be compared with Exercise 21 in Section 4.10.
EXAMPLE. Let $\Omega = \{(x, y) : (x, y) \in \mathbb{R}^2 \text{ and } x \ne 0 \text{ and } y \ne 0\}$. Then, let $f : \Omega \longrightarrow \mathbb{R}$ be given by
$$f(x, y) = \frac{x}{y} + \frac{y}{x}.$$
Then,
$$f'(x, y) = \left(\frac{1}{y} - \frac{y}{x^2},\; -\frac{x}{y^2} + \frac{1}{x}\right).$$
Taking the point (−1, 2), for example, we have
$$f'(-1, 2) = \left(\frac{1}{2} - \frac{2}{1},\; -\frac{-1}{4} + \frac{1}{-1}\right) = \left(-\frac{3}{2}, -\frac{3}{4}\right).$$
Thus the directional derivative of f at (−1, 2) in the direction of the unit vector u = (3/5, 4/5), say, is
$$D_u f(-1, 2) = \langle u, f'(-1, 2)\rangle = \langle (3/5, 4/5), (-3/2, -3/4)\rangle = -\frac{9}{10} - \frac{3}{5} = -\frac{3}{2}.$$
Note that
$$|f'(-1, 2)| = \sqrt{\frac{9}{4} + \frac{9}{16}} = \sqrt{\frac{45}{16}} = \frac{3\sqrt{5}}{4} > \frac{3}{2} = |D_u f(-1, 2)|.$$
(We see below that f′(x, y) gives the direction of greatest change of the function at the point (x, y).)
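The numbers in this example are easy to confirm numerically. The following sketch, which assumes numpy is available, approximates $D_u f(-1, 2)$ directly from the limit (6.1) and compares it with $\langle u, f'(-1, 2)\rangle$.

# Numerical check of the directional-derivative example above (a sketch assuming numpy only).
import numpy as np

f = lambda p: p[0]/p[1] + p[1]/p[0]
grad_f = lambda p: np.array([1/p[1] - p[1]/p[0]**2, -p[0]/p[1]**2 + 1/p[0]])

x = np.array([-1.0, 2.0])
u = np.array([3/5, 4/5])                       # a unit vector
h = 1e-6
D_u = (f(x + h*u) - f(x)) / h                  # definition (6.1), approximately
print(D_u, np.dot(u, grad_f(x)))               # both approximately -1.5, i.e. -3/2
print(np.linalg.norm(grad_f(x)))               # 3*sqrt(5)/4, which is at least |D_u f|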

6.3 Vector fields


Let Ω be an open subset of Rn . Then a vector field on Ω is a continuously differentiable function

F : Ω −→ Rn .

Thus, if F is a vector field on Ω ⊆ Rn , F (x) is a vector in Rn for all x ∈ Ω. So, F assigns an


n-dimensional vector to each point in Ω, so we can write for each x ∈ Ω,

F (x) = (F1 (x), F2 (x), . . . , Fn (x)).

Each function Fj maps Ω into R, and F1 , F2 , . . . , Fn are called the coordinate functions of F , as
before. A vector field is a special type of function mapping a subset of Rn into Rm , namely it
is the case that occurs when m = n.
Vector fields arise in physics. For example, according to Newton’s Law of Gravitation, a
mass M exerts a gravitational force F of attraction on another mass m whose magnitude |F | is,
according to Newton’s inverse square law,
GM m
|F | = ,
r2
where G is the gravitational constant and r is the distance between the two masses. However,
the gravitational force F is a vector. Let us take the mass M as being concentrated at the
origin. Let us take the mass m as being at the point x ∈ R3 , a point that may vary, and denote
the corresponding vector, the force of attraction between M and m, by F (x). Then,

x 7−→ F (x)

is a vector field on R3 . The actual formula for F (x) in this context is


GM m
F (x) = − x, for all x ∈ R3 . (6.3)
|x|3

Figure 6.2 . The figure illustrates the vector field arising from a mass
situated at the origin, due to the gravitational force. The length of
the arrows indicates (very roughly) the magnitude of the force and the
direction of the arrows indicates the direction of the force at the point.

Figure 6.3. The figure illustrates the velocity vector field arising in two
dimensions from the velocity at a point on a disc rotating at a constant
angular velocity around the origin.

Note that this gives the correct value for |F (x)|, and that the force of attraction is in the
direction from x backwards towards the origin – see Figure 6.2. More complicated vector fields
would arise if there were more than two masses.
Vector fields arise in a similar way in electrostatics for example, where we have Coulomb’s
Law in place of Newton’s, and in other areas of physics. According to Einstein’s General Theory
of Relativity, the gravitational force even affects light, and a confirmation of the theory occurred when the sun was observed to bend starlight during the total solar eclipse of 1919.

6.4 The gradient


The gradient is no more than the matrix of the derivative, but in a special context. Let Ω be an
open subset of Rn and let f : Ω −→ R be a differentiable function. Then, if x ∈ Ω, the gradient
gradf (x) of f at x is given by

gradf (x) = (D1 f (x), D2 f (x), . . . , Dn f (x)).

That is,
$$\operatorname{grad} f(x) = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right).$$
Another notation for gradf (x) is ∇f (x). Thus, in fact,

gradf (x) = ∇f (x) = f 0 (x),

the matrix of the derivative of f at x. This proliferation of notations is due to historical reasons.
Here, the notation ∇f (x) for the gradient will be used sparingly. Note that the gradient of f is
obtained from a real valued function f on a subset of Rn .
Since Ω ⊆ Rn and gradf (x) ∈ Rn , we see that the function

x 7−→ gradf (x)

is a vector field on Ω.
The direction of grad f(x) is the direction of the greatest rate of change of the function f at x. What does this statement mean? Well, we saw in (6.2) that given the unit vector u,
$$D_u(f)(x) = \langle u, f'(x)\rangle.$$
We assume that $f'(x) \ne 0$. Then, by the Cauchy-Schwarz inequality and as |u| = 1,
$$|D_u(f)(x)| = |\langle u, f'(x)\rangle| \le |u|\cdot|f'(x)| = |f'(x)|, \qquad (6.4)$$
and equality holds here if and only if u is a multiple of f′(x). As u is a unit vector, equality in (6.4) therefore is equivalent to having
$$u = \frac{f'(x)}{|f'(x)|} \qquad\text{or}\qquad u = -\frac{f'(x)}{|f'(x)|}.$$
Observing that when
$$v = \frac{f'(x)}{|f'(x)|}$$
we have
$$D_v(f)(x) = \langle v, f'(x)\rangle = \left\langle \frac{f'(x)}{|f'(x)|},\, f'(x)\right\rangle = \frac{|f'(x)|^2}{|f'(x)|} = |f'(x)| > 0,$$

we see that Du (f )(x) attains its maximum value when u = v – that is when

f 0 (x) grad(f )(x)


u=± =± ,
|f 0 (x)| |grad(f )(x)|
which is when u is in the in the unit vector direction of the gradient of f at x.
EXAMPLE. Let f : R3 −→ R be given by

f (x, y, z) = x2 + y 2 + z 2 + (x + y + z)2 .

Then
$$\begin{aligned}
\operatorname{grad} f(x, y, z) &= \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)\\
&= (2x + 2(x + y + z),\; 2y + 2(x + y + z),\; 2z + 2(x + y + z))\\
&= (4x + 2y + 2z,\; 2x + 4y + 2z,\; 2x + 2y + 4z).
\end{aligned}$$
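Gradients such as this can be checked with a computer algebra system. The sketch below assumes the sympy library is available and recomputes grad f symbolically.

# A sketch, assuming sympy, checking the gradient computed in the example above.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2 + (x + y + z)**2
grad_f = [sp.expand(sp.diff(f, var)) for var in (x, y, z)]
print(grad_f)   # [4*x + 2*y + 2*z, 2*x + 4*y + 2*z, 2*x + 2*y + 4*z]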


EXAMPLE. Let
$$f : \{w : w \in \mathbb{R}^3 \text{ and } w \ne 0\} \longrightarrow \mathbb{R}$$
be given by
$$f(x, y, z) = \frac{GMm}{x^2 + y^2 + z^2} = \frac{GMm}{r^2},$$
where $r = \sqrt{x^2 + y^2 + z^2}$ is the distance of (x, y, z) from the origin. Thus, according to Newton's Law of Gravitation, f(x, y, z) represents the magnitude of the gravitational attractive force acting on a mass m at (x, y, z), if a mass M is situated at the origin. We have
$$\operatorname{grad} f(x, y, z) = \left(-\frac{2GMmx}{(x^2 + y^2 + z^2)^2},\; -\frac{2GMmy}{(x^2 + y^2 + z^2)^2},\; -\frac{2GMmz}{(x^2 + y^2 + z^2)^2}\right) = -\frac{2GMm}{(x^2 + y^2 + z^2)^2}\,(x, y, z).$$
This reflects the observation that in this physical situation, grad f(x, y, z) is a vector pointing back from (x, y, z) towards the origin, which is also the direction in which the gravitational force of attraction is exerted as far as the mass m is concerned.
DEFINITIONS. A vector field F defined on an open subset Ω of $\mathbb{R}^n$ is called conservative or exact if there is a scalar function $\phi : \Omega \longrightarrow \mathbb{R}$ such that
$$F = \operatorname{grad}\phi.$$
That is, F is conservative means that there is $\phi : \Omega \longrightarrow \mathbb{R}$ such that
$$F = (D_1\phi, D_2\phi, \ldots, D_n\phi) = \left(\frac{\partial \phi}{\partial x_1}, \frac{\partial \phi}{\partial x_2}, \ldots, \frac{\partial \phi}{\partial x_n}\right).$$
In this case, φ is called a potential function for the field F.


If we have a conservative vector field $F = (F_1, F_2)$ in a 2-dimensional open set Ω, there is a potential function $\phi : \Omega \longrightarrow \mathbb{R}$ such that
$$F_1(x, y) = \frac{\partial \phi}{\partial x} \qquad\text{and}\qquad F_2(x, y) = \frac{\partial \phi}{\partial y}.$$
Thus,
$$\frac{\partial F_1}{\partial y} = \frac{\partial^2 \phi}{\partial y\,\partial x} = \frac{\partial^2 \phi}{\partial x\,\partial y} = \frac{\partial F_2}{\partial x}.$$
This proves the following result.

Theorem 14 If $F = (F_1, F_2)$ is a conservative vector field in some open subset Ω of $\mathbb{R}^2$, then
$$\frac{\partial F_1}{\partial y} = \frac{\partial F_2}{\partial x} \qquad (6.5)$$
in Ω.

We shall see later that the converse of this statement holds. That is, if (6.5) holds, then the
vector field F = (F1 , F2 ) is conservative.

6.5 The divergence of a vector field


Let Ω be an open subset of $\mathbb{R}^n$ and let $F : \Omega \longrightarrow \mathbb{R}^n$ be a differentiable vector field. Put $F = (F_1, F_2, \ldots, F_n)$. Then the divergence of the vector field F at a point $x \in \Omega$ is given by
$$\operatorname{div} F(x) = \sum_{j=1}^{n} D_j F_j(x) = \sum_{j=1}^{n} \frac{\partial F_j}{\partial x_j}. \qquad (6.6)$$
Note that the divergence of the vector field F at x is a number. Thus,
$$x \longmapsto \operatorname{div} F(x)$$
is a function mapping Ω to $\mathbb{R}$.
The most common case is when n = 3, so that $\Omega \subseteq \mathbb{R}^3$ and $F : \Omega \longrightarrow \mathbb{R}^3$. In this case,
$$\operatorname{div} F(x, y, z) = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}.$$
Now the function $f \longmapsto \partial f/\partial x$ may be denoted by $\partial/\partial x$, with similar definitions for $\partial/\partial y$ and $\partial/\partial z$. Then $\partial/\partial x$, $\partial/\partial y$, $\partial/\partial z$ are sometimes called operators. If we use the notation
$$\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right),$$
then (6.6) may be written as
$$\operatorname{div} F = \langle \nabla, F\rangle$$
or even as $\nabla \cdot F$ (but remember we use ⟨ , ⟩ to denote the inner product, not the dot notation).
DEFINITION. A vector field F in an open set Ω is called solenoidal if divF = 0.
EXAMPLE. Let F be the vector field on R3 given by
F (x, y, z) = (x2 + yz, −2y(x + z), xy + z 2 ).
Then,
divF (x, y, z) = 2x − 2(x + z) + 2z = 0.
Thus, F is solenoidal.
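A symbolic check of this calculation, assuming the sympy library is available, is sketched below.

# A sketch, assuming sympy, verifying that the vector field in the example is solenoidal.
import sympy as sp

x, y, z = sp.symbols('x y z')
F = (x**2 + y*z, -2*y*(x + z), x*y + z**2)
div_F = sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)
print(sp.simplify(div_F))   # 0, so F is solenoidal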
The divergence of a vector field at a point x has the physical meaning that it measures the
rate at which the field “spreads away from” or “diverges from” x. The physical meaning of the
divergence will become clearer when we have discussed the divergence theorem.

6.6 The curl of a vector field


Let Ω be an open subset of $\mathbb{R}^3$ and let $F : \Omega \longrightarrow \mathbb{R}^3$ be a differentiable vector field. Put $F = (F_1, F_2, F_3)$. Then the curl of the vector field F at a point $(x, y, z) \in \Omega$ is given by
$$\operatorname{curl} F(x, y, z) = \det\begin{pmatrix} e_1 & e_2 & e_3\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ F_1 & F_2 & F_3\end{pmatrix} = \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\; \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\; \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right). \qquad (6.7)$$

Note that the curl of F at x is a vector. Thus, the function

(x, y, z) 7−→ curlF (x, y, z)

is a function mapping Ω into R3 , and we see that curlF is a vector field on Ω. The equation
(6.7) is sometimes written in terms of ∇ and the cross product notation as

curlF = ∇ × F.

The curl of a vector field at a point x has the physical meaning that it measures the rate at
which the field ”swirls around” or “curls around” or “rotates around” x. Note that the curl of
a vector field is always solenoidal because
$$\begin{aligned}
&\frac{\partial}{\partial x}\left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right) + \frac{\partial}{\partial y}\left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right) + \frac{\partial}{\partial z}\left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\\
&\qquad = \frac{\partial^2 F_3}{\partial x\partial y} - \frac{\partial^2 F_2}{\partial x\partial z} + \frac{\partial^2 F_1}{\partial y\partial z} - \frac{\partial^2 F_3}{\partial y\partial x} + \frac{\partial^2 F_2}{\partial z\partial x} - \frac{\partial^2 F_1}{\partial z\partial y}\\
&\qquad = 0.
\end{aligned}$$

EXAMPLE. Consider a solid rotating with a uniform angular speed ω anti-clockwise about
the Z-axis. The velocity at a point (x, y, z) in the solid defines a vector field F given by

F (x, y, z) = (−ωy, ωx, 0).

Then,
F1 (x, y, z) = −ωy, F2 (x, y, z) = ωx, and F3 (x, y, z) = 0,
so we have
curlF = (0, 0, ω + ω) = (0, 0, 2ω).
Thus, curlF is constant.
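The same curl can be computed directly from (6.7) with a computer algebra system; the sketch below assumes the sympy library is available.

# A sketch, assuming sympy, computing the curl (6.7) of the rotation field F = (-w*y, w*x, 0).
import sympy as sp

x, y, z, w = sp.symbols('x y z omega')
F1, F2, F3 = -w*y, w*x, sp.Integer(0)
curl_F = (sp.diff(F3, y) - sp.diff(F2, z),
          sp.diff(F1, z) - sp.diff(F3, x),
          sp.diff(F2, x) - sp.diff(F1, y))
print(curl_F)   # (0, 0, 2*omega), a constant vector field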
DEFINITION. A vector field F is called irrotational if curlF = 0.
EXAMPLE. Consider the vector field of gravitation in $\mathbb{R}^3$, as before in (6.3). We have that the force F(x) at a point $x = (x_1, x_2, x_3) \in \mathbb{R}^3$, $x \ne 0$, is
$$F(x) = -\frac{GmM}{|x|^3}\, x = -\frac{GmM}{(x_1^2 + x_2^2 + x_3^2)^{3/2}}\,(x_1, x_2, x_3).$$

Note that this gives
$$|F(x)| = GmM\,\frac{|x|}{|x|^3} = GmM\,|x|^{-2},$$
which shows that the magnitude of the gravitational force is governed by an inverse square law, owing to the presence of the term $|x|^{-2}$ – this is Newton's Inverse Square Law for gravitational attraction. Now from (6.3) we have
$$F_1(x_1, x_2, x_3) = -\frac{GmM x_1}{(x_1^2 + x_2^2 + x_3^2)^{3/2}}, \quad F_2(x_1, x_2, x_3) = -\frac{GmM x_2}{(x_1^2 + x_2^2 + x_3^2)^{3/2}}, \quad\text{and}\quad F_3(x_1, x_2, x_3) = -\frac{GmM x_3}{(x_1^2 + x_2^2 + x_3^2)^{3/2}}.$$
Thus, if we put $r = \sqrt{x_1^2 + x_2^2 + x_3^2}$ we have
$$\frac{\partial F_2}{\partial x_3} = \frac{3GmM x_2 x_3}{r^5} \qquad\text{and}\qquad \frac{\partial F_3}{\partial x_2} = \frac{3GmM x_2 x_3}{r^5}.$$
Thus,
$$\frac{\partial F_2}{\partial x_3} - \frac{\partial F_3}{\partial x_2} = 0,$$
and similarly,
$$\frac{\partial F_1}{\partial x_2} - \frac{\partial F_2}{\partial x_1} = 0 \qquad\text{and}\qquad \frac{\partial F_1}{\partial x_3} - \frac{\partial F_3}{\partial x_1} = 0.$$
It follows that curl F = 0, so that the vector field F of gravitational force is irrotational in this case. This is exactly what we would expect from Figure 6.2.

Theorem 15 Any conservative vector field is irrotational.

PROOF. For, if F is conservative there is a real valued function φ such that
$$F = \left(\frac{\partial \phi}{\partial x}, \frac{\partial \phi}{\partial y}, \frac{\partial \phi}{\partial z}\right).$$
Then,
$$\operatorname{curl} F = \left(\frac{\partial^2 \phi}{\partial y\,\partial z} - \frac{\partial^2 \phi}{\partial z\,\partial y},\; \frac{\partial^2 \phi}{\partial z\,\partial x} - \frac{\partial^2 \phi}{\partial x\,\partial z},\; \frac{\partial^2 \phi}{\partial x\,\partial y} - \frac{\partial^2 \phi}{\partial y\,\partial x}\right) = (0, 0, 0).$$

Under certain conditions, the converse of this theorem is valid – that is, that an irrotational
vector field is conservative.

6.7 Exercises

1. Let a, b > 0. Sketch a picture of the curve C given by

r(t) = (a cos t, b sin t), for 0 ≤ t ≤ 2π.


Calculate the tangent vector to the curve at the point $(a/\sqrt{2},\, b/\sqrt{2})$ on the curve, and at a general point on the curve.

2. Let u = (1/3, 2/3, −2/3) ∈ R3 and let f : R3 −→ R be given by

f (x, y, z) = x2 − 3y + z 4 .

Calculate D1 f (x, y, z), D2 f (x, y, z), D3 f (x, y, z), f 0 (x, y, z), Du f (x, y, z) and Du f (1, 1, −2).

3. Let $f : \mathbb{R}^2 \longrightarrow \mathbb{R}$ be given by
$$f(x, y) = \frac{x}{y^2 + 1},$$
and let u = (3/5, 4/5). Verify that u is a unit vector and calculate $D_u f(1, -1)$.

4. This is a type of converse to the identity (6.2). Let u1 , u2 , u3 be 3 unit vectors in R3 that are
also orthogonal. Let uj = (uj1 , uj2 , uj3 ) for j = 1, 2, 3 and let f : Ω −→ R where Ω is a given
open subset of R3 . Prove that
$$D_k f(x) = \sum_{j=1}^{3} u_{jk}\, D_{u_j} f(x), \qquad \text{for } k = 1, 2, 3 \text{ and for all } x \in \Omega.$$

This expresses the partial derivatives of f in terms of the directional derivatives Duj f for j =
1, 2, 3.

5. Find a function φ : R2 −→ R such that

gradφ(x, y) = (2xy − y 2 , −2xy + x2 ),

for all (x, y) ∈ R2 . Explain why the vector field F

(x, y) 7−→ (2xy − y 2 , −2xy + x2 )

is conservative. Also, calculate divF .

6. Let φ : Rn −→ R be a given twice differentiable function. Calculate div(gradφ) in terms of


the second order partial derivatives of φ. If div(gradφ) = 0, write down this equation in terms
of the partial derivatives of φ, and show that in the case when n = 2, the functions

(x, y) 7−→ x3 − 3xy 2

and
(x, y) 7−→ y 3 − 3x2 y
satisfy this equation.

7. Calculate divF and curlF for the vector field F on R3 given by

F (x, y, z) = (xy 2 , −yz 2 , zx2 ).



8. Let G be the vector field on R3 given by

G(x, y, z) = (xe2z , ye2z , −e2z ).

Prove that G is solenoidal.


Chapter 7

Integration on curves and surfaces

7.1 Integration of vector fields along curves in Rn


Let C be a curve in Rn parametrized by the function r : [a, b] −→ Rn and let r1 , r2 , . . . , rn be
the respective coordinate functions, so that

r(t) = (r1 (t), r2 (t), . . . , rn (t)), for a ≤ t ≤ b.

As mentioned earlier, the curve C is closed if

r(a) = r(b).

This means that C “joins up” with itself at the beginning and end of the curve. Now, let f be
a continuous function defined at the points of the curve. That is, the domain of f is the range
of C, which is to say that the domain of f is
$$\{r(t) : a \le t \le b\}.$$

Let F be a vector field in $\mathbb{R}^n$. Then, the integral of F along the curve C is denoted by $\int_C F \cdot dr$ or $\int_C \langle F(r(t)), r'(t)\rangle\, dt$ and it is defined by
$$\int_C F \cdot dr = \int_C \langle F(r(t)), r'(t)\rangle\, dt = \int_a^b \langle F(r(t)), r'(t)\rangle\, dt. \qquad (7.1)$$
If we write $r = (r_1, r_2, \ldots, r_n)$ and $F = (F_1, F_2, \ldots, F_n)$, where $r_1, r_2, \ldots, r_n$ are the coordinate functions of r and $F_1, F_2, \ldots, F_n$ are the coordinate functions of F, this may be written as
$$\int_C F \cdot dr = \int_C \langle F(r(t)), r'(t)\rangle\, dt = \int_a^b \left(\sum_{j=1}^{n} F_j(r(t))\, r_j'(t)\right) dt = \int_C \left(\sum_{j=1}^{n} F_j\, dx_j\right). \qquad (7.2)$$
An integral of the form $\int_C F \cdot dr$ is called an integral along a curve or, sometimes, a line integral. The line integral $\int_C F \cdot dr$ in (7.1) has a physical interpretation. If we think of the vector field as a field of force, with F(x) being the force vector at the point x of $\mathbb{R}^n$, then $\int_C F \cdot dr$ represents the work required to drag a particle of unit mass (say) along the curve against the "resistance" created by the vector field. This is because the expression $\langle F(r(t)), r'(t)\rangle$ physically represents the component of the vector field in the tangential direction to the curve at the point r(t), so that integrating this over t, as we proceed along the curve, gives the total "force" times "distance" along the curve, that is, the total work required as we proceed along the curve from the start to the finish.
Note that the integral in (7.2) might also be written in the form
$$\int_C F \cdot dr = \int_C \sum_{j=1}^{n} F_j\, dr_j, \qquad\text{or}\qquad \int_C F \cdot dr = \int_C \sum_{j=1}^{n} F_j\, dx_j.$$

This notation is common in the case of R2 , as we now see.

7.2 Integration along curves in R2


Let C be a curve in $\mathbb{R}^2$. We assume that $C : [a, b] \longrightarrow \mathbb{R}^2$ and that
$$C(t) = (x(t), y(t)), \qquad \text{for } a \le t \le b.$$
Now, let M, N : range of C $\longrightarrow \mathbb{R}$ be continuous functions. Then the integral $\int_C M\,dx + N\,dy$ is a special case of (7.2) with n = 2, F = (M, N) and C(t) = (x(t), y(t)), and it becomes
$$\int_C M\,dx + N\,dy = \int_C \langle F(C(t)), C'(t)\rangle\, dt = \int_a^b \left(M(x(t), y(t))\,x'(t) + N(x(t), y(t))\,y'(t)\right) dt.$$

Note that the right hand side in this definition is a standard integral involving functions of a single variable only, and so can be evaluated by standard techniques. If the curve C is closed, $\int_C M\,dx + N\,dy$ may be denoted by $\oint_C M\,dx + N\,dy$.
EXAMPLE. Let C(t) = (cos t, sin t), for 0 ≤ t < 2π. Note that C(0) = C(2π) = (1, 0), so that the curve C is closed. Let M(x, y) = x and N(x, y) = y. Then,
$$\int_C M\,dx + N\,dy = \oint_C M\,dx + N\,dy = \oint_C x\,dx + y\,dy = \int_0^{2\pi}\left[\cos t(-\sin t) + \sin t(\cos t)\right]dt = 0.$$
Also, if
$$P(x, y) = x \qquad\text{and}\qquad Q(x, y) = x + y,$$
we have
$$\begin{aligned}
\oint_C P\,dx + Q\,dy &= \oint_C x\,dx + (x + y)\,dy\\
&= \int_0^{2\pi}\left[\cos t(-\sin t) + (\cos t + \sin t)(\cos t)\right]dt\\
&= \int_0^{2\pi}\cos^2 t\,dt = \frac{1}{2}\int_0^{2\pi} 2\cos^2 t\,dt = \frac{1}{2}\int_0^{2\pi}(\cos 2t + 1)\,dt\\
&= \pi.
\end{aligned}$$
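Both of these line integrals reduce to ordinary integrals over [0, 2π], so they can be checked numerically. The sketch below assumes numpy and scipy are available.

# Numerical check of the two line integrals in the example above, assuming numpy and scipy.
import numpy as np
from scipy import integrate

def integrand_1(t):   # for M = x, N = y along C(t) = (cos t, sin t)
    return np.cos(t)*(-np.sin(t)) + np.sin(t)*np.cos(t)

def integrand_2(t):   # for P = x, Q = x + y
    return np.cos(t)*(-np.sin(t)) + (np.cos(t) + np.sin(t))*np.cos(t)

I1, _ = integrate.quad(integrand_1, 0, 2*np.pi)
I2, _ = integrate.quad(integrand_2, 0, 2*np.pi)
print(I1, I2)   # approximately 0 and pi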

7.3 Green’s Theorem in R2

Green’s theorem in R2 connects a line integral along a curve with a double integral over the
open set inside the curve.

Theorem 16 (Green’s Theorem). Let C be a continuously differentiable closed curve in R2


and R be the closed set in R2 bounded by the closed curve C. Assume that the curve C is
parametrized by the function t 7−→ r(t). Let M and N be continuous functions on D having
continuous partial derivatives in R. Then,

I I Z Z  
∂N ∂M
(M, N ) · dr = M dx + N dy = − dx dy.
C C R ∂x ∂y

Note that it is unlikely that the integrand of a double integral will have the form ∂N /∂x −
∂M /∂y. Hence the theorem is usually used going from the left to the right. The usefulness of
the result then depends upon whether we can integrate the function ∂N /∂x − ∂M /∂y over R.

PROOF OF GREEN'S THEOREM. Consider when R is the rectangle
$$R = [a, b] \times [c, d].$$
Then C is the boundary of the rectangle and is parametrized by
$$r(t) = \begin{cases} (a + t(b - a),\, c), & \text{if } 0 \le t \le 1;\\ (b,\, c + t(d - c)), & \text{if } 0 \le t \le 1;\\ (b + t(a - b),\, d), & \text{if } 0 \le t \le 1;\\ (a,\, d + t(c - d)), & \text{if } 0 \le t \le 1.\end{cases}$$
The curve takes us around the boundary of the rectangle in the anti-clockwise direction (see
Figure 7.1).
Figure 7.1. The curve C goes anticlockwise around the boundary of the rectangle R.

The integral along the part of the curve going from (a, c) to (b, c) is
$$\int_0^1 M(a + t(b - a), c)\,(b - a)\,dt + 0 = \int_a^b M(u, c)\,du. \qquad (7.3)$$
Similarly we have: the integral along the part of the curve going from (b, c) to (b, d) is
$$0 + \int_0^1 N(b, c + t(d - c))\,(d - c)\,dt = \int_c^d N(b, u)\,du. \qquad (7.4)$$
The integral along the part of the curve going from (b, d) to (a, d) is
$$\int_0^1 M(b - t(b - a), d)\,(a - b)\,dt + 0 = -\int_a^b M(u, d)\,du. \qquad (7.5)$$
The integral along the part of the curve going from (a, d) to (a, c) is
$$0 + \int_0^1 N(a, d + t(c - d))\,(c - d)\,dt = -\int_c^d N(a, u)\,du. \qquad (7.6)$$

From equations (7.3), (7.4), (7.5) and (7.6) we have
$$\int_C M\,dx + N\,dy = \int_a^b \left(M(u, c) - M(u, d)\right)du + \int_c^d \left(N(b, u) - N(a, u)\right)du. \qquad (7.7)$$

On the other hand, changing the order of integration in the double integral over R will give
$$\iint_R \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)dx\,dy = \int_c^d\left(\int_a^b \frac{\partial N}{\partial x}\,dx\right)dy - \int_a^b\left(\int_c^d \frac{\partial M}{\partial y}\,dy\right)dx = \int_c^d (N(b, y) - N(a, y))\,dy - \int_a^b (M(x, d) - M(x, c))\,dx. \qquad (7.8)$$

It follows from (7.7) and (7.8) that
$$\int_C M\,dx + N\,dy = \iint_R \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)dx\,dy,$$
as required.
This proves Green’s Theorem for a rectangle, and the case for general sets can be proved
by approximating a set by a finite union of arbitrarily small disjoint rectangles. The idea is
illustrated in Figure 7.2, where the open set Ω inside the curve C may be approximated by the
union of the smaller disjoint rectangles contained entirely within the curve C.

Figure 7.2. The approximation of a set by rectangles.

The idea is that because Green’s Theorem holds for each rectangle, as we have seen, it
must hold for the union of these rectangles. The reason for this is illustrated in Figure 7.3,
where we see that when we integrate about the boundaries of the four rectangles in the Figure,
the integrals along the overlapping sides cancel out, because they are the same integral but
going in opposite directions in each case. This means that the only integrals that do not cancel

out are those along the boundary of the set which is the finite union of the rectangles. Since
Green’s Theorem now holds for the finite union of the approximating rectangles in Figure 7.2,
an approximation argument enables us to deduce (under certain conditions), that it is true for
the set Ω that is the inside the given curve C. 
EXAMPLE. Let F = (P, Q), where
$$P(x, y) = x^2 - y \qquad\text{and}\qquad Q(x, y) = x + y^2 - 1.$$
(a) Let C be the line segment joining (1, 1) to (2, 3). Then, we may parametrize C by
$$r(t) = (r_1(t), r_2(t)) = (1 + t,\, 1 + 2t), \qquad \text{for } 0 \le t \le 1.$$
Then,
$$\begin{aligned}
\int_C (P, Q)\cdot dr &= \int_C P\,dx + Q\,dy\\
&= \int_0^1 \left((1 + t)^2 - (1 + 2t)\right)\cdot 1\,dt + \int_0^1 \left((1 + t) + (1 + 2t)^2 - 1\right)\cdot 2\,dt\\
&= \int_0^1 t^2\,dt + 2\int_0^1 (4t^2 + 5t + 1)\,dt\\
&= \frac{1}{3} + \frac{8}{3} + 5 + 2\\
&= 10.
\end{aligned}$$
(b) Let D be the curve parametrized by
$$D(t) = (1 + t,\, 2t^2 + 1), \qquad \text{for } 0 \le t \le 1.$$
Like C, the curve D starts at (1, 1) and ends at (2, 3). We have
$$\begin{aligned}
\int_D (P, Q)\cdot dr &= \int_D P\,dx + Q\,dy\\
&= \int_0^1 \left((1 + t)^2 - (2t^2 + 1)\right)dt + \int_0^1 \left((1 + t) + (2t^2 + 1)^2 - 1\right)4t\,dt\\
&= \int_0^1 (-t^2 + 2t)\,dt + 4\int_0^1 (t + 1 + 4t^2 + 4t^4)\,t\,dt\\
&= \int_0^1 (-t^2 + 2t)\,dt + 4\int_0^1 (t + t^2 + 4t^3 + 4t^5)\,dt\\
&= \left(-\frac{1}{3} + 1\right) + 4\left(\frac{1}{2} + \frac{1}{3} + 1 + \frac{2}{3}\right)\\
&= \frac{2}{3} + 10 = \frac{32}{3}.
\end{aligned}$$
Thus, although the curves C, D begin and end at the same points, we have
$$\int_C (P, Q)\cdot dr \ne \int_D (P, Q)\cdot dr.$$
We shall see that we do not generally expect equality in such a case unless
$$\frac{\partial Q}{\partial x} = \frac{\partial P}{\partial y}.$$
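The two values 10 and 32/3 can be confirmed numerically, since each line integral reduces to an ordinary integral over [0, 1]. The sketch below assumes numpy and scipy are available.

# Numerical check, assuming numpy and scipy, that the line integrals along the two curves
# C(t) = (1+t, 1+2t) and D(t) = (1+t, 2t^2+1) from (1,1) to (2,3) differ (10 versus 32/3).
import numpy as np
from scipy import integrate

P = lambda x, y: x**2 - y
Q = lambda x, y: x + y**2 - 1

def along_C(t):
    x, y = 1 + t, 1 + 2*t
    return P(x, y)*1 + Q(x, y)*2            # dx/dt = 1, dy/dt = 2

def along_D(t):
    x, y = 1 + t, 2*t**2 + 1
    return P(x, y)*1 + Q(x, y)*(4*t)        # dx/dt = 1, dy/dt = 4t

I_C, _ = integrate.quad(along_C, 0, 1)
I_D, _ = integrate.quad(along_D, 0, 1)
print(I_C, I_D, 32/3)   # approximately 10.0 and 10.666...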

7.4 Integrals that are independent of the path


Green’s Theorem says that under certain conditions on the closed curve C, and on the functions
M, N , I Z Z  
∂N ∂M
M dx + N dy = − dx dy.
C R ∂x ∂y
Now if it happens that
∂N ∂M
= ,
∂x ∂y
it therefore follows that I
M dx + N dy = 0. (7.9)
C

Figure 7.3. Integrals along the common sides of the rectangles cancel out as they are in the opposite direction. For example, the integral along AB in the rectangle ABCD cancels out with the integral along BA in the rectangle EFBA, and so on for the other rectangles. Thus, if we integrate around all the boundaries of the rectangles ABCD, EFBA, FGHB, HICB, the cancellations mean we are left with the integral around the boundary of the large rectangle EGID.

Theorem 17 If
$$\frac{\partial N}{\partial x} = \frac{\partial M}{\partial y},$$
and if $C_1$ and $C_2$ are curves in $\mathbb{R}^2$ that both have a starting point $(a_1, a_2)$ and an ending point $(b_1, b_2)$, then
$$\int_{C_1} M\,dx + N\,dy = \int_{C_2} M\,dx + N\,dy. \qquad (7.10)$$
Thus, in this case, the integral $\int_C M\,dx + N\,dy$ along a curve C depends only on the starting and ending points of the curve C, so that $\int_C M\,dx + N\,dy$ is "independent of the path", in the sense that it depends only on the points at the beginning and the end of the curve.
PROOF. Since $\partial N/\partial x = \partial M/\partial y$, (7.9) gives
$$\oint_C M\,dx + N\,dy = 0,$$
for any closed curve C. Now let $C_1$ and $C_2$ be two curves each of which has a starting point $(a_1, a_2)$ and an ending point $(b_1, b_2)$. Now, let C denote the closed curve that starts at $(a_1, a_2)$ and proceeds along $C_1$ to $(b_1, b_2)$, and then proceeds from $(b_1, b_2)$ along $C_2$ in the reverse direction back to $(a_1, a_2)$. Then, by (7.9), we have
$$0 = \int_C M\,dx + N\,dy = \int_{C_1} M\,dx + N\,dy - \int_{C_2} M\,dx + N\,dy,$$
so it follows that
$$\int_{C_1} M\,dx + N\,dy = \int_{C_2} M\,dx + N\,dy.$$


Theorem 18 The vector field (M, N) on $\mathbb{R}^2$ is conservative (or exact) if and only if
$$\frac{\partial N}{\partial x} = \frac{\partial M}{\partial y}.$$
PROOF. If (M, N) is conservative we have seen that
$$\frac{\partial N}{\partial x} = \frac{\partial M}{\partial y},$$
by Theorem 14. Conversely, if $\partial N/\partial x = \partial M/\partial y$, let $(a, b) \in \mathbb{R}^2$ be given, and let C(u, v) denote any curve that starts at (a, b) and ends at (u, v). Then, define a function $\phi : \mathbb{R}^2 \longrightarrow \mathbb{R}$ by putting
$$\phi(u, v) = \int_{C(u,v)} M\,dx + N\,dy.$$

By virtue of Theorem 17, the value of φ(u, v) does not depend on the particular curve C(u, v)
used to go from (a, b) to (u, v). So, let us take C(u, v) to be the curve that goes from (a, b) to
(u, v) as follows. Let
C1 (t) = (a + t(u − a), b), for 0 ≤ t ≤ 1.
Thus, C1 is the line segment joining (a, b) to (u, b). Also, let

C2 (t) = (u, b + (t − 1)(v − b)), for 1 ≤ t ≤ 2.

Thus, C2 is the line segment joining (u, b) to (u, v). We take the curve C(u, v) to be the one
obtained by proceeding along C1 and then along C2 . Thus, C(u, v) starts at (a, b) and ends at
(u, v).

Now,
$$\int_{C_1} M\,dx + N\,dy = \int_0^1 M(a + t(u - a), b)\,(u - a)\,dt = \int_a^u M(x, b)\,dx,$$
and
$$\int_{C_2} M\,dx + N\,dy = \int_1^2 N(u, b + (t - 1)(v - b))\,(v - b)\,dt = \int_b^v N(u, y)\,dy.$$
Thus,
$$\int_{C(u,v)} M\,dx + N\,dy = \int_a^u M(x, b)\,dx + \int_b^v N(u, y)\,dy.$$
Hence,
$$\frac{\partial \phi}{\partial v} = \frac{\partial}{\partial v}\left(\int_{C(u,v)} M\,dx + N\,dy\right) = \frac{\partial}{\partial v}\left(\int_b^v N(u, y)\,dy\right) = N(u, v).$$
A similar argument, using a different curve from C(u, v), shows that
$$\frac{\partial \phi}{\partial u} = M(u, v).$$
Putting x, y in place of u, v gives
$$\operatorname{grad}\phi = \left(\frac{\partial \phi}{\partial x}, \frac{\partial \phi}{\partial y}\right) = (M, N),$$
showing that the vector field (M, N) is conservative.

7.5 Surfaces and normals


A set S is called a surface in Rn if there is a continuously differentiable function f : Rn −→ R
such that
S = {x : x ∈ Rn and f (x) = 0};
that is
S = {(x1 , x2 , . . . , xn ) : (x1 , x2 , . . . , xn ) ∈ Rn and f (x1 , x2 , . . . , xn ) = 0}. (7.11)
In the case n = 2, a surface S in R2 would take the form

S = {(x, y) : (x, y) ∈ R2 and f (x, y) = 0}.

In the case n = 3, a surface S in R3 would take the form

S = {(x, y, z) : (x, y, z) ∈ R3 and f (x, y, z) = 0}.

Let S be a surface as given by (7.11). Let x0 ∈ S and let C be a smooth curve that lies in S
and which goes through x0 . That is, there is some open interval J, a continuously differentiable
function C : J −→ S and a point t0 ∈ J such that C(t0 ) = x0 . Then, a vector z ∈ Rn is normal
to S at the point x0 ∈ S if for every such curve C

hC 0 (t0 ), zi = 0.

That is, z is normal to the surface S at the point x0 ∈ S if, for every smooth curve C that lies
in S and goes through x0 , z is orthogonal to the tangent vector of C at x0 .
Now it is possible that under the definition, there are many independent vectors normal to
a surface at a point. The following Theorem clarifies the situation.

Theorem 19 Let S be a surface as given by (7.11) with n ≥ 2, and assume that for all x ∈ S,
Df (x) ≠ 0. Thus, for each x ∈ S, there is j ∈ {1, 2, . . . , n} such that Dj f (x) ≠ 0. Let x ∈ S.
Then the following hold.
(1) There is a set of n − 1 linearly independent vectors in Rn such that each vector is the
tangent vector at x of a differentiable curve lying in S.
(2) There is no set of n linearly independent vectors in Rn such that each vector is the
tangent vector at x of a differentiable curve lying in S.
(3) Let y, z be two non-zero vectors in Rn that are normal to S at x0 . Then there is α ∈ R
such that y = αz.

PROOF. Put x = (x1 , x2 , . . . , xn ) ∈ S. Then, Dj f (x) 6= 0 for some j, and by renumbering


the variables, if necessary, we may take it that Dn f (x) 6= 0. But now the Implicit Function
Theorem (Theorem 10, Section 4.9) applies, and we deduce that there is an open set V in
Rn−1 and an open set W in R such that (x1 , x2 , . . . , xn−1 ) ∈ V and there is a one-to-one
differentiable function h : V −→ W such that f (y, h(y)) = 0 for all y ∈ V . Thus, (y, h(y)) ∈ S
for all y ∈ V . Now, for each j = 1, 2, . . . , n − 1, let Cj be the curve in S given by

Cj (t) = (x1 , . . . , xj−1 , xj + t, xj+1 , . . . , xn−1 , h(x1 , . . . , xj−1 , xj + t, xj+1 , . . . , xn−1 )) ∈ S

for all t in some suitable open interval about 0. Notice that Cj (0) = x. Thus, Cj is a differentiable
curve in S and the tangent vector to Cj at x is

Cj0 (0) = (0, . . . , 0, 1, 0 . . . , 0, Dj h(x)),

where the definition of the partial derivative of the function

t 7−→ h(x1 , . . . , xj−1 , xj + t, xj+1 , . . . , xn−1 )

has been used. Note that the vectors

(1, 0 . . . , 0, D1 h(x)), (0, 1 . . . , 0, D2 h(x)), (0, . . . , 0, 1, Dn−1 h(x)) (7.12)

are linearly independent in Rn . This proves that C1′(0), C2′(0), . . . , Cn−1′(0) are linearly independent, and this proves (1).
Now let $C = (C_1, \ldots, C_n)$ be a differentiable curve in S passing through x, and suppose $C(t_0) = x$. Then, $C_n(t) = h(C_1(t), \ldots, C_{n-1}(t))$ and using the Chain Rule we have
$$\begin{aligned}
C'(t_0) &= \left(C_1'(t_0), \ldots, C_{n-1}'(t_0), C_n'(t_0)\right)\\
&= \left(C_1'(t_0), \ldots, C_{n-1}'(t_0), \sum_{j=1}^{n-1} D_j h(x)\, C_j'(t_0)\right)\\
&= C_1'(t_0)\,(1, 0, \ldots, 0, D_1 h(x)) + \cdots + C_{n-1}'(t_0)\,(0, \ldots, 0, 1, D_{n-1} h(x)).
\end{aligned}$$

This proves that the tangent vector at x of any differentiable curve in S that goes through x is
a linear combination of the n − 1 linearly independent vectors in (7.12), so that (2) follows.
To prove (3), observe that if y, z are two non-zero vectors normal to S at x, then y, z are
both orthogonal to the n − 1 independent vectors in (7.12). But then, it is a standard result in
the theory of linear equations that y must be a multiple of z. [In a non-redundant system of

n − 1 homogeneous equations in n unknowns, there is one arbitrary parameter only that appears
in the general solution.] This proves (3). 
Now, let S be a surface in $\mathbb{R}^n$ as given in (7.11), and let $x = (x_1, x_2, \ldots, x_n)$ be a given point on the surface. That is, f(x) = 0. Let (a, b) be an open interval and let $C : (a, b) \longrightarrow S$ be a curve lying on the surface going through the point x. That is, letting

C(t) = (y1 (t), y2 (t), . . . , yn (t)), for t ∈ (a, b),

there is t0 ∈ (a, b) such that C(t0 ) = x and

f (y1 (t), y2 (t), . . . , yn (t)) = 0,

for all t ∈ (a, b). Differentiating with respect to t using the Chain Rule gives
n
d X
0 = (f (y1 (t), y2 (t), . . . , yn (t)) = Dj f (C(t))yj0 (t)
dt
j=1

Putting t = t0 gives

hf 0 (x), C 0 (t0 )i = h(D1 f (x), D2 f (x), . . . , Dn f (x)), (y10 (t0 ), y20 (t0 ), . . . , yn0 (t0 )i
Xn
= Dj f (x)yj0 (t0 )
j=1
n
X
= Dj f (C(t0 ))yj0 (t0 )
j=1
= 0.

That is, f 0 (x) and C 0 (t0 ) are orthogonal. But C 0 (t0 ) is the tangent vector to the curve C at the
point x. Thus, every curve C going through the point x on the surface S has a tangent vector
that is orthogonal to f 0 (x). Thus, f 0 (x) is normal to the surface at the point x on the surface.
We summarize this in the following.

Theorem 20 Let Ω be an open set in Rn , let f : Ω −→ R be a continuously differentiable


function and let S be the surface given by

S = {x : x ∈ Ω and f (x) = 0}.

Assume that for each x ∈ S, f 0 (x) 6= 0. Then, for each point x on the surface S, f 0 (x) is normal
to the surface at the point x, and any vector in Rn that is normal to S at x is a multiple of
f 0 (x).

PROOF. This is immediate from the above reasoning and (3) of the preceding theorem. 
EXAMPLE. Let f (x, y, z) = x2 + y 2 + z 2 − 1. Then, if

S = {(x, y, z) : (x, y, z) ∈ R3 and f (x, y, z) = 0},

then S is a sphere of centre 0 and radius 1. We have

f 0 (x, y, z) = (2x, 2y, 2z),



so a unit vector n normal to the surface S at a point (x, y, z) on the surface is
$$n = \frac{1}{\sqrt{x^2 + y^2 + z^2}}\,(x, y, z) = (x, y, z),$$
since $x^2 + y^2 + z^2 = 1$ on S.


Let Ω be a bounded open set in $\mathbb{R}^2$ and put D = Ω ∪ ∂Ω. Suppose we have a differentiable function $h : D \longrightarrow \mathbb{R}$. Then the set
$$\{(x, y, h(x, y)) : (x, y) \in D\}$$
is a surface S in $\mathbb{R}^3$. The surface can be described by the equation
$$h(x, y) - z = 0,$$
as considered before. A normal to the surface at a point (x, y, z) on the surface is
$$\left(\frac{\partial h}{\partial x}, \frac{\partial h}{\partial y}, -1\right),$$
and a unit normal to the surface at a point (x, y, z) is
$$n(x, y, z) = \frac{1}{\sqrt{1 + \left(\dfrac{\partial h}{\partial x}\right)^2 + \left(\dfrac{\partial h}{\partial y}\right)^2}}\left(\frac{\partial h}{\partial x}, \frac{\partial h}{\partial y}, -1\right). \qquad (7.13)$$
That is, if a surface in $\mathbb{R}^3$ is given by an equation z = h(x, y), (7.13) tells us a unit normal vector on the surface at the point (x, y, z) on the surface.
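As an illustration of (7.13), the sketch below (assuming numpy) computes the unit normal for the particular surface z = h(x, y) = x² + y², a choice made purely for illustration, and checks that the resulting vector has length 1.

# A sketch, assuming numpy, of the unit normal (7.13) for the example surface z = x^2 + y^2.
import numpy as np

def unit_normal(x, y):
    hx, hy = 2*x, 2*y                      # partial derivatives of h(x, y) = x^2 + y^2
    n = np.array([hx, hy, -1.0])
    return n / np.sqrt(1 + hx**2 + hy**2)

n = unit_normal(1.0, 2.0)
print(n, np.linalg.norm(n))                # a vector of length 1, normal to the surface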

7.6 Integration over a surface

Let S be a surface in $\mathbb{R}^3$ and let $\phi : S \longrightarrow \mathbb{R}$ be a continuous function on the surface. The integral of φ over S, denoted by $\int_S \phi\,dS$ or sometimes by $\iint_S \phi\,dS$, is defined similarly to the integral of a real valued function over a subset of $\mathbb{R}^2$. The problem of calculating $\int_S \phi\,dS$ can be reduced to calculating a corresponding integral over a subset D of $\mathbb{R}^2$, which can then be worked out using the earlier techniques for calculating double integrals.
One way to understand what is going on is to consider a simplified situation, as depicted in Figures 7.4 and 7.5. Then Figure 7.6 illustrates the actual 3-dimensional case.

Figure 7.4. This figure illustrates a line segment L in $\mathbb{R}^2$ that makes an angle θ with the horizontal axis. The arrow orthogonal to L makes the same angle θ with the vertical unit vector $e_2 = (0, 1)$, as indicated in the figure. This observation is used in Figure 7.6.

Figure 7.6 illustrates a rectangular set D in the XY plane, with a surface S lying above D. We let $h : D \longrightarrow \mathbb{R}$ be the function such that
$$S = \{(x, y, h(x, y)) : (x, y) \in D\}.$$
δD denotes a small area of D, and δS denotes the corresponding small area of S that lies above δD. $e_3$ denotes the vector (0, 0, 1) and n denotes the normal unit vector to S at the indicated point. Let θ be the angle between $e_3$ and n, so that $\cos\theta = \langle n, e_3\rangle$. It appears that δS cos θ approximately equals δD (compare Figures X and Y). That is, δS approximately equals δD/cos θ. Now, if $\phi : S \longrightarrow \mathbb{R}$ is a continuous function on S, the integral of φ over S will be $\int_S \phi\,dS$, where $\int_S \phi\,dS$ is apparently approximated by sums of the form
$$\sum \phi(z_{\delta S})\,\delta S,$$
where $z_{\delta S} \in \delta S$. As $z_{\delta S} \in \delta S$, there is $(x_{\delta D}, y_{\delta D}) \in \delta D$ such that
$$h(x_{\delta D}, y_{\delta D}) = z_{\delta S}.$$
Thus, $\int_S \phi\,dS$ is apparently approximated by sums of the form
$$\sum \phi(x_{\delta D}, y_{\delta D}, h(x_{\delta D}, y_{\delta D}))\,\frac{\delta D}{\langle n(x_{\delta D}, y_{\delta D}, h(x_{\delta D}, y_{\delta D})), e_3\rangle}.$$
Thus it is reasonable to make the definition that
$$\int_S \phi\,dS = \iint_D \frac{\phi(x, y, h(x, y))}{|\langle n(x, y, h(x, y)), e_3\rangle|}\, dx\, dy.$$

Figure 7.5. The figure illustrates how the integral of a function $f : L \longrightarrow \mathbb{R}$ can be expressed as the integral of a related function over the interval [a, b] on the X-axis. The dotted curve indicates the graph of f over the line segment L. Let L be given by the equation $x \longmapsto sx + t$ for $a \le x \le b$, and let $g : [a, b] \longrightarrow L$ be given by
$$g(x) = (x, sx + t), \qquad \text{for } a \le x \le b.$$
The integral $\int_L f\,dL$ of f over L is approximated by the expression
$$\sum_{j=1}^{n} f(u_{j-1})\,|u_j - u_{j-1}|,$$
where the points $u_0, u_1, \ldots, u_n$ subdivide the line segment L as indicated in the picture. However, as L makes an angle θ with the X-axis, we see that $\cos\theta = (x_j - x_{j-1})/|u_j - u_{j-1}|$, for $j = 1, 2, \ldots, n$. Thus,
$$\sum_{j=1}^{n} f(u_{j-1})\,|u_j - u_{j-1}| = \sum_{j=1}^{n} \frac{f(g(x_{j-1}))}{\cos\theta}\,(x_j - x_{j-1}).$$
In the limit, as we take finer subdivisions of L, we get
$$\int_L f = \int_a^b \frac{f(g(x))}{\cos\theta}\, dx.$$
But if n denotes the direction of the unit normal to L, Figure 7.4 shows that
$$\cos\theta = \cos(\text{angle between } n \text{ and } e_2) = \langle n, e_2\rangle.$$
Thus,
$$\int_L f = \int_a^b \frac{f(g(x))}{\langle n(x), e_2\rangle}\, dx. \qquad (*)$$
Here the integral $\int_L f\,dL$ of f over the line segment L is like the integral of a function φ over a surface S, and the formula for $\int_S f\,dS$ takes a similar form to (*).
Figure 7.6. See comments in the main text.

We may write this more informally as
$$\int_S \phi\,dS = \iint_D \frac{\phi(x, y)}{|\langle n(x, y), e_3\rangle|}\, dx\, dy,$$
where φ(x, y) denotes the value of φ at the point (x, y, h(x, y)) on the surface, and n(x, y) denotes the outward-pointing unit normal at the point (x, y, h(x, y)) on the surface.
Note that the absolute value appears here to take account of the fact that the normal may point in two possible directions, which could introduce a minus sign. The point is that if $\phi \ge 0$ on the surface, then $\int_S \phi\,dS \ge 0$. If we use formula (7.13) for the unit normal, and observe that
$$|\langle n(x, y), e_3\rangle| = \frac{1}{\sqrt{1 + \left(\dfrac{\partial h}{\partial x}\right)^2 + \left(\dfrac{\partial h}{\partial y}\right)^2}},$$
this becomes
$$\int_S \phi\,dS = \iint_D \phi(x, y, h(x, y))\,\sqrt{1 + \left(\frac{\partial h}{\partial x}\right)^2 + \left(\frac{\partial h}{\partial y}\right)^2}\; dxdy. \qquad (7.14)$$
Then (7.14) is the formula commonly used to calculate a surface integral. However, note that in (7.14), the role of x, y may sometimes be replaced by y, z or by z, x, depending on the form of the surface.
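Formula (7.14) is well suited to numerical work. As an illustration, the sketch below (assuming numpy and scipy) takes φ = 1 and h(x, y) = x² + y² over the unit disc, so that the surface integral is the area of that piece of the paraboloid; the choice of h is for illustration only, and the numerical value is compared with the closed form obtained using polar coordinates.

# A sketch, assuming numpy and scipy, using (7.14) with phi = 1 to compute the area of the
# surface z = h(x, y) = x^2 + y^2 lying over the unit disc D (h is chosen only for illustration).
import numpy as np
from scipy import integrate

def integrand(y, x):
    hx, hy = 2*x, 2*y
    return np.sqrt(1 + hx**2 + hy**2)      # phi = 1 times the factor in (7.14)

area, error = integrate.dblquad(integrand, -1, 1,
                                lambda x: -np.sqrt(1 - x**2),
                                lambda x: np.sqrt(1 - x**2))
exact = np.pi*(5*np.sqrt(5) - 1)/6         # closed form, obtained via polar coordinates
print(area, exact)                          # both approximately 5.3304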

If the surface S is closed, that is, it has an empty boundary, the integral $\iint_S \phi\,dS$ may be denoted by
$$\oiint_S \phi\,dS.$$

7.7 Integration of a vector field over a surface in R3


In some cases, the function defined on the surface may arise from a vector field. Using the notation of the previous section, assume further that F is a given vector field on the surface S. If n(x, y, z) denotes a unit vector normal to the surface at the point (x, y, z) on the surface, then $(x, y, z) \longmapsto \langle F(x, y, z), n(x, y, z)\rangle$ defines a function on S mapping S to $\mathbb{R}$. The corresponding surface integral is then
$$\iint_S \langle F, n\rangle\, dS,$$
and may be evaluated as indicated above.

7.8 Volume integrals


In Chapter 5 we described the integral of a function over a two-dimensional set, and we learned how to calculate such integrals by reducing them to repeated integration of single-variable integrals. The same considerations apply when we consider integration of a function over a 3-dimensional set.
Specifically, consider when Ω is a 3-dimensional “rectangle” of the form

Ω = [a, b] × [c, d] × [e, f ],

and let $\phi : \Omega \longrightarrow \mathbb{R}$ be a continuous function. Then Ω may be partitioned into finer and finer partitions of 3-dimensional rectangles, analogous to the 2-dimensional case. This means that $\int_\Omega \phi$ may be defined in an entirely analogous way to the 2-dimensional case. This integral may be denoted by
$$\int_\Omega \phi, \qquad\text{or by}\qquad \int_\Omega \phi(x, y, z)\,dxdydz \qquad\text{or by}\qquad \int_\Omega \phi\,dV.$$
In the case when φ = 1, $\int_\Omega 1$ or $\int_\Omega dxdydz$ or $\int_\Omega dV$ are all equal to the volume of Ω. Similar considerations apply to calculating 3-dimensional integrals as applied in the case of 2-dimensional integrals. These are indicated in the examples, and the Theorem below.
EXAMPLE (volume of a sphere). Let a > 0 and let S = S(0, a) = {(x, y, z) : x² + y² + z² ≤ a²}. Then S is the inside of the sphere whose centre is at 0 and whose radius is a, together with its boundary. The volume of S is
\[
\int_S dV = \int_{x^2+y^2+z^2 \le a^2} dx\, dy\, dz
= \int_{-a}^{a} \left( \int_{x^2+y^2 \le a^2 - z^2} dx\, dy \right) dz
= \int_{-a}^{a} \pi (a^2 - z^2)\, dz
= \pi \left[ a^2 z - \frac{z^3}{3} \right]_{-a}^{a}
= \frac{4\pi a^3}{3}.
\]
EXAMPLE (volume of a cone). Consider a cone of height h whose circular base has radius a. The equation of the cone is then
\[
\frac{x^2}{a^2} + \frac{y^2}{a^2} - \frac{z^2}{h^2} = 0, \quad \text{or} \quad z = \frac{h}{a}\sqrt{x^2 + y^2},
\]
see Figure 7.7.

Figure 7.7. The cone with equation x²/a² + y²/a² − z²/h² = 0. The circle at height z = h has radius a; the circle at a general height z has radius az/h.

Find the volume of the cone that lies between z = 0 and z = h. If V denotes the set inside the cone, the required volume is
\[
\int_V dx\, dy\, dz = \int_0^h \left( \int_{x^2+y^2 \le (az/h)^2} dx\, dy \right) dz
= \int_0^h \pi \frac{a^2}{h^2} z^2\, dz
= \pi \frac{a^2}{h^2} \left[ \frac{z^3}{3} \right]_0^h
= \frac{\pi a^2 h}{3}.
\]
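The same slicing idea can be checked numerically (illustrative only, assuming Python with SciPy; the base radius a = 3 and height h = 5 are arbitrary choices).

```python
import numpy as np
from scipy import integrate

a, h = 3.0, 5.0                                  # base radius and height
slice_area = lambda z: np.pi*(a*z/h)**2          # disk of radius az/h at height z
vol, _ = integrate.quad(slice_area, 0.0, h)
print(vol, np.pi*a**2*h/3.0)                     # both approximately 47.124
```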

EXAMPLE (volume of a tetrahedron). Let a > 0. Find the volume of the set B = {(x, y, z) : x, y, z ≥ 0 and x + y + z ≤ a}.
If z ∈ [0, a], the plane through z and parallel to the xy-plane cuts the set B in a triangle T_z (see Figure 7.8). The vertices of T_z are at (0, 0, z), (a − z, 0, z) and (0, a − z, z). This is a right-angled triangle whose area is
\[
\frac{1}{2}(a - z)^2.
\]

Figure 7.8. The tetrahedron bounded by the plane x + y + z = a and which lies in the first octant. The plane through (0, 0, z) parallel to the XY-plane cuts the tetrahedron in the triangle T_z with vertices (0, 0, z), (a − z, 0, z) and (0, a − z, z).

We now have that the volume of B is
\[
\int_B dV = \int_B dx\, dy\, dz
= \int_0^a \left( \int_{T_z} dx\, dy \right) dz
= \int_0^a (\text{area of } T_z)\, dz
= \int_0^a \frac{1}{2}(a - z)^2\, dz
= \left[ -\frac{(a - z)^3}{6} \right]_0^a
= \frac{a^3}{6}.
\]

It should be noted that the change of variables formula for integration in two dimensions
carries over to the 3-dimensional case. Here is a formal statement that corresponds to the one
in Chapter 5 for 2 dimensional integrals.

Theorem 21 Let Ω be an open subset of R³ and let g : Ω −→ R³ be a one-to-one continuously differentiable function such that det(g′(x)) ≠ 0 for all x ∈ Ω. Then g(Ω) is an open set in R³. Let f : g(Ω) −→ R be a given function with domain g(Ω). Observe that the composed function f ◦ g is defined and that f ◦ g : Ω −→ R. We assume that ∫_{g(Ω)} f exists. Then ∫_Ω f ◦ g |det(g′)| exists and
\[
\int_{g(\Omega)} f = \int_\Omega (f \circ g)\, |\det(g')|. \tag{7.15}
\]
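A familiar instance of (7.15) is the change to spherical coordinates. The following sketch (an aside, assuming Python with SymPy) computes the Jacobian determinant of the spherical-coordinate map and then the volume of a ball of radius a via (7.15) with f = 1.

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
# Spherical coordinates g(r, theta, phi) as a change of variables.
g = sp.Matrix([r*sp.sin(ph)*sp.cos(th),
               r*sp.sin(ph)*sp.sin(th),
               r*sp.cos(ph)])
J = g.jacobian([r, th, ph])
jac = sp.simplify(J.det())
print(jac)          # -r**2*sin(phi); the absolute value r**2*sin(phi) is |det(g')|

# Volume of the ball of radius a via (7.15) with f = 1:
a = sp.symbols('a', positive=True)
vol = sp.integrate(r**2*sp.sin(ph), (r, 0, a), (ph, 0, sp.pi), (th, 0, 2*sp.pi))
print(sp.simplify(vol))                      # 4*pi*a**3/3
```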

7.9 The divergence theorem


The divergence theorem is a type of 3-dimensional analogue of Green’s Theorem (arguably,
Stokes’ Theorem is a more precise analogy). It relates an integral of a vector field over a surface
in R3 to a volume integral of the divergence of the vector field — the divergence of the vector
field is integrated over the volume inside the surface in 3-dimensional space whereas, in Green’s
theorem, the corresponding integral was over the area inside a curve in 2-dimensional space.

Theorem 22 (the divergence theorem, or Gauss' Theorem). Let Ω be a bounded open subset of R³ and let the boundary of Ω be a surface S without boundary. We assume that Ω lies on one side only of the surface S. Let F be a differentiable vector field defined on Ω ∪ S. Then, if n denotes the unit outward normal function on S,
\[
\int\!\!\int_S \langle F, n\rangle\, dS = \int\!\!\!\int\!\!\!\int_\Omega \operatorname{div} F(x, y, z)\, dx\, dy\, dz.
\]
PROOF. This is indicated in a special case, typical examples of which are the sphere and
the ellipsoid. Assume that there is an open set R in the XY -plane with one-to-one differentiable
functions
f : R ∪ ∂R −→ R and g : R ∪ ∂R −→ R
such that
f (x, y) < g(x, y) for all (x, y) ∈ R ∪ ∂R.
Let S₁ and S₂ be the surfaces given by
\[
S_1 = \{(x, y, g(x, y)) : (x, y) \in R\} \quad \text{and} \quad S_2 = \{(x, y, f(x, y)) : (x, y) \in R\}.
\]



We assume that S = S₁ ∪ S₂. (In the case of a sphere, S₁ corresponds to the surface of the upper half of the sphere and S₂ corresponds to the surface of the lower half of the sphere.) Let V denote the volume between the surfaces S₁ and S₂. The situation is illustrated in Figure 7.9. Let F = (F₁, F₂, F₃). Now,
\[
\int\!\!\!\int\!\!\!\int_V \frac{\partial F_3}{\partial z}\, dx\, dy\, dz
= \int\!\!\int_R \left( \int_{f(x,y)}^{g(x,y)} \frac{\partial F_3}{\partial z}(x, y, z)\, dz \right) dx\, dy
= \int\!\!\int_R \bigl( F_3(x, y, g(x, y)) - F_3(x, y, f(x, y)) \bigr)\, dx\, dy. \tag{7.16}
\]

On the other hand,
\[
\int\!\!\int_{S_1} \langle (0, 0, F_3), n_1\rangle\, dS_1
= \int\!\!\int_R F_3(x, y, g(x, y))\, \langle e_3, n_1(x, y, g(x, y))\rangle\, \frac{dx\, dy}{\langle e_3, n_1(x, y, g(x, y))\rangle}
= \int\!\!\int_R F_3(x, y, g(x, y))\, dx\, dy. \tag{7.17}
\]

Figure 7.9. The figure illustrates a surface where a vertical line through a point (x, y) of R cuts the surface in two places, z = f(x, y) and z = g(x, y), producing a "top half" S₁ of the surface and a "bottom half" S₂ of the surface, with the volume V between them.

Similarly,
\[
\int\!\!\int_{S_2} \langle (0, 0, F_3), n_2\rangle\, dS_2 = -\int\!\!\int_R F_3(x, y, f(x, y))\, dx\, dy. \tag{7.18}
\]
But S = S₁ ∪ S₂, so we have from (7.16), (7.17) and (7.18) that
\[
\int\!\!\int_S \langle (0, 0, F_3), n\rangle\, dS = \int\!\!\!\int\!\!\!\int_V \frac{\partial F_3}{\partial z}\, dx\, dy\, dz. \tag{7.19}
\]
Similarly we find that
\[
\int\!\!\int_S \langle (0, F_2, 0), n\rangle\, dS = \int\!\!\!\int\!\!\!\int_V \frac{\partial F_2}{\partial y}\, dx\, dy\, dz, \tag{7.20}
\]
and
\[
\int\!\!\int_S \langle (F_1, 0, 0), n\rangle\, dS = \int\!\!\!\int\!\!\!\int_V \frac{\partial F_1}{\partial x}\, dx\, dy\, dz. \tag{7.21}
\]
Then, adding (7.19), (7.20) and (7.21) gives
\[
\int\!\!\int_S \langle (F_1, F_2, F_3), n\rangle\, dS = \int\!\!\!\int\!\!\!\int_V \left( \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} \right) dx\, dy\, dz.
\]
That is,
\[
\int\!\!\int_S \langle F, n\rangle\, dS = \int\!\!\!\int\!\!\!\int_V \operatorname{div} F\, dx\, dy\, dz.
\]
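The theorem can be checked numerically in a particular case (an aside, assuming Python with NumPy and SciPy). For F(x, y, z) = (x³, y³, z³) on the unit ball, div F = 3(x² + y² + z²), and both sides of the theorem equal 12π/5.

```python
import numpy as np
from scipy import integrate

# Volume side, in spherical coordinates (volume element r^2 sin(phi)):
lhs, _ = integrate.tplquad(lambda r, ph, th: 3.0*r**2 * r**2*np.sin(ph),
                           0.0, 2.0*np.pi,                         # theta
                           lambda th: 0.0, lambda th: np.pi,        # phi
                           lambda th, ph: 0.0, lambda th, ph: 1.0)  # r

# Surface side: on the unit sphere n = (x, y, z) and dS = sin(phi) dphi dtheta.
def f_dot_n(ph, th):
    x, y, z = np.sin(ph)*np.cos(th), np.sin(ph)*np.sin(th), np.cos(ph)
    return (x**4 + y**4 + z**4) * np.sin(ph)

rhs, _ = integrate.dblquad(f_dot_n, 0.0, 2.0*np.pi,
                           lambda th: 0.0, lambda th: np.pi)

print(lhs, rhs, 12.0*np.pi/5.0)   # all approximately 7.5398
```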


7.10 Stokes’ Theorem


Theorem 23 (Stokes' Theorem).
Let S be a surface that is piecewise continuously differentiable and has a piecewise continuously differentiable curve C as its boundary. Let F be a continuously differentiable vector field on R³ whose domain is an open set containing S. Assume that the curve C is parametrized by the function r. Then,
\[
\int_C \langle F, r'\rangle\, dC = \int\!\!\int_S \langle \operatorname{curl} F, n\rangle\, dS.
\]

PROOF. The Theorem is proved in a special case. The idea of the proof is to reduce the calculation to one that is in 2 dimensions instead of 3.
We assume that there is a set A in the XY-plane as indicated in the Figure, and that the surface S is given by
\[
S = \{(x, y, z) : z = f(x, y) \text{ for some } (x, y) \in A\},
\]
where f : A −→ R is a continuously differentiable function. That is, S is the graph of the function f.
The curve C is given by

C(t) = (r1 (t), r2 (t), r3 (t)) for a ≤ t ≤ b,

where [a, b] is some suitable interval in R. Then, the curve D lies “underneath” C in the xy-plane,
as in the Figure, and is given by

D(t) = (r1 (t), r2 (t)), for a ≤ t ≤ b.



The curve C is the boundary of S and the curve D is the boundary of the region A that lies
“underneath” S in the xy-plane.
As
C(t) = (r1 (t), r2 (t), r3 (t)) ∈ boundary of S,
we have
r3 (t) = f (r1 (t), r2 (t)), for all a ≤ t ≤ b.
By the Chain Rule,
\[
r_3'(t) = D_1 f(r_1(t), r_2(t))\, r_1'(t) + D_2 f(r_1(t), r_2(t))\, r_2'(t).
\]

Let F = (F₁, F₂, F₃). Then it follows that
\begin{align*}
\int_C F \cdot dr
 &= \int_a^b \bigl( F_1(r(t))\, r_1'(t) + F_2(r(t))\, r_2'(t) + F_3(r(t))\, r_3'(t) \bigr)\, dt \\
 &= \int_a^b \Bigl( F_1(r(t))\, r_1'(t) + F_2(r(t))\, r_2'(t)
      + F_3(r(t)) \bigl[ D_1 f(r_1(t), r_2(t))\, r_1'(t) + D_2 f(r_1(t), r_2(t))\, r_2'(t) \bigr] \Bigr)\, dt \\
 &= \int_a^b \Bigl[ F_1(r(t)) + F_3(r(t))\, D_1 f(r_1(t), r_2(t)) \Bigr] r_1'(t)\, dt
      + \int_a^b \Bigl[ F_2(r(t)) + F_3(r(t))\, D_2 f(r_1(t), r_2(t)) \Bigr] r_2'(t)\, dt. \tag{7.22}
\end{align*}

Now, we can define the vector field G = (G₁, G₂) in R² by putting
\[
G_1(x, y) = F_1(x, y, f(x, y)) + F_3(x, y, f(x, y))\, D_1 f(x, y),
\]
and
\[
G_2(x, y) = F_2(x, y, f(x, y)) + F_3(x, y, f(x, y))\, D_2 f(x, y).
\]
Then, (7.22) becomes
\begin{align*}
\int_C F \cdot dr &= \int_D G \cdot dr = \int_D G_1\, dx + G_2\, dy \\
                  &= \int\!\!\int_A \left( \frac{\partial G_2}{\partial x} - \frac{\partial G_1}{\partial y} \right) dx\, dy, \tag{7.23}
\end{align*}

where Green’s Theorem at the last step has been used in the set A. However, ∂G ∂G1
∂x and ∂y may
2

be worked out using the Product Rule and the Chain Rule. We get, putting z = f (x, y),

\[
\frac{\partial G_2}{\partial x} = \frac{\partial F_2}{\partial x} + \frac{\partial F_2}{\partial z}\frac{\partial z}{\partial x}
  + \frac{\partial F_3}{\partial x}\frac{\partial z}{\partial y} + \frac{\partial F_3}{\partial z}\frac{\partial z}{\partial x}\frac{\partial z}{\partial y}
  + F_3 \frac{\partial^2 z}{\partial x \partial y}. \tag{7.24}
\]
Similarly,
\[
\frac{\partial G_1}{\partial y} = \frac{\partial F_1}{\partial y} + \frac{\partial F_1}{\partial z}\frac{\partial z}{\partial y}
  + \frac{\partial F_3}{\partial y}\frac{\partial z}{\partial x} + \frac{\partial F_3}{\partial z}\frac{\partial z}{\partial y}\frac{\partial z}{\partial x}
  + F_3 \frac{\partial^2 z}{\partial y \partial x}. \tag{7.25}
\]
It follows from (7.24) and (7.25) that
\begin{align*}
\frac{\partial G_2}{\partial x} - \frac{\partial G_1}{\partial y}
 &= \frac{\partial F_2}{\partial x} + \frac{\partial F_2}{\partial z}\frac{\partial z}{\partial x} + \frac{\partial F_3}{\partial x}\frac{\partial z}{\partial y}
    - \frac{\partial F_1}{\partial y} - \frac{\partial F_1}{\partial z}\frac{\partial z}{\partial y} - \frac{\partial F_3}{\partial y}\frac{\partial z}{\partial x} \\
 &= \left( \frac{\partial F_2}{\partial z} - \frac{\partial F_3}{\partial y} \right)\frac{\partial z}{\partial x}
    - \left( \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x} \right)\frac{\partial z}{\partial y}
    + \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) \\
 &= \langle \operatorname{curl}(F_1, F_2, F_3),\, n(x, y, z)\rangle\, \sqrt{ \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2 + 1 }. \tag{7.26}
\end{align*}

Figure 7.10. The arrows pointing vertically downwards indicate that the surface integral over S becomes a double integral over the region A in the XY-plane, and that the integral along the curve C becomes a corresponding integral along the curve D in the XY-plane. The idea of the proof is to apply Green's Theorem to the integral along the curve D around the region A, and deduce that it equals the integral over A that corresponds to the surface integral over S. That is: for a suitable vector field G in R²,
\[
\int_C \langle F, r'\rangle\, dC = \int_D \langle G, r'\rangle\, dD.
\]
Applying Green's Theorem to the integral ∫_D G · dr we get
\[
\int_D \langle G, r'\rangle\, dD = \int\!\!\int_A \left( \frac{\partial G_2}{\partial x} - \frac{\partial G_1}{\partial y} \right) dx\, dy,
\]
and we observe from the calculation that the latter integral equals ∫∫_S ⟨curl F, n⟩ dS.

Now, if we integrate (7.26) over the region A in the XY-plane, using the formula for a surface integral, on the right hand side we get
\[
\int\!\!\int_S \langle \operatorname{curl} F, n\rangle\, dS.
\]
However, if we integrate the left hand side of (7.26) over the region A in the XY-plane and use (7.23), we get
\[
\int_C F \cdot dr.
\]
Thus,
\[
\int\!\!\int_S \langle \operatorname{curl} F, n\rangle\, dS = \int_C F \cdot dr,
\]
as required. □
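A concrete check (illustrative only, assuming Python with NumPy and SciPy): for F = (−y, x, z) and S the paraboloid z = x² + y² over the unit disc, the boundary C is the circle x² + y² = 1 at height z = 1, curl F = (0, 0, 2), and both sides of Stokes' Theorem equal 2π.

```python
import numpy as np
from scipy import integrate

# Line integral around C, parametrized by r(t) = (cos t, sin t, 1):
def line_integrand(t):
    x, y = np.cos(t), np.sin(t)
    Fx, Fy, Fz = -y, x, 1.0
    dx, dy, dz = -np.sin(t), np.cos(t), 0.0
    return Fx*dx + Fy*dy + Fz*dz

line, _ = integrate.quad(line_integrand, 0.0, 2.0*np.pi)

# Surface integral: <curl F, n> dS = 2 dx dy with the upward normal,
# so it is twice the area of the unit disk.
surf, _ = integrate.dblquad(lambda y, x: 2.0, -1.0, 1.0,
                            lambda x: -np.sqrt(1.0 - x**2),
                            lambda x:  np.sqrt(1.0 - x**2))
print(line, surf)                       # both approximately 6.2832
```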

7.11 Exercises

1. Evaluate
\[
\int_C (x^2 - 2xy)\, dx + (y^2 - 2xy)\, dy,
\]
where C is the path going from (−2, 4) to (1, 1) along y = x².

2. Calculate
\[
\oint_C \frac{x - y}{x^2 + y^2}\, dx - \frac{x + y}{x^2 + y^2}\, dy,
\]
where C is the circle whose equation is x² + y² = a², traversed in the anticlockwise direction.

3. Verify Green's Theorem in the plane for
\[
\oint_C (xy + y^2)\, dx + x^2\, dy,
\]
where C is the closed curve bounding the region between y = x and y = x².

4. Let F be the vector field in R2 given by


F (x, y) = (3x2 + 5y 2 , 10xy + 3y 2 ).
Let C1 and C2 be the curves joining (0, 0) to (1, 1) given by
C1 (t) = (t, t) and C2 (t) = (t, t2 ), for 0 ≤ t ≤ 1.
Calculate
\[
\int_{C_1} F \cdot dr \quad \text{and} \quad \int_{C_2} F \cdot dr.
\]
Show that F is a conservative vector field and find a potential function φ for F .

5. Let F be a conservative vector field on Rn , and let φ be a potential function for F . Thus,
F = gradφ. Let C be a curve in Rn that starts at the point u0 ∈ Rn and ends at the point
u₁ ∈ Rⁿ. Prove that
\[
\int_C F \cdot dr = \phi(u_1) - \phi(u_0).
\]

This shows that when F is conservative, the work done along a curve C, namely ∫_C F · dr, depends only on the end points of the curve C. [Hint: let C(t) = (r₁(t), r₂(t), . . . , rₙ(t)), for a ≤ t ≤ b. Then, observe that the function
\[
t \longmapsto \sum_{j=1}^{n} F_j(C(t))\, r_j'(t)
\]
is the derivative of another function, because of the Chain Rule.]

6. Let A be the vector field given in R3 by

A(x, y, z) = (18z, −12, 3y).

Let S be that part of the surface 2x + 3y + 6z = 12 which is located in the first octant. Then,
evaluate
\[
\int_S \langle A, n\rangle\, dS.
\]

7. Let φ : R3 −→ R be the function given by

\[
\phi(x, y, z) = \tfrac{3}{8}\, xyz.
\]
Then, evaluate
\[
\int_S \phi\, dS,
\]

where S is the surface of the cylinder x2 + y 2 = 16 included in the first octant between z = 0
and z = 5.

8. Evaluate
\[
\int_S \langle r, n\rangle\, dS,
\]

where S is the curved surface of the cylinder y 2 + z 2 = 9 bounded by x = 1 and x = 3.

9. Let F be the vector field on R3 given by

F (x, y, z) = (0, y, z).

Evaluate
\[
\int_S \langle F, n\rangle\, dS,
\]
where S is the curved surface of the cone 0 ≤ z ≤ 1 − √(x² + y²).

10. Let A be the vector field on R3 given by

A(x, y, z) = (6z, 2x + y, −x).


Calculate
\[
\int_S \langle A, n\rangle\, dS
\]
over the entire surface S of the set bounded by the cylinder x² + z² = 9, x = 0, y = 0, z = 0 and y = 8.

11. Let F be the vector field on R3 given by

F (x, y, z) = (4xz, xyz 2 , 3z).

Let S be the surface which is the boundary of the open set above the xy-plane and is bounded
by the cone z 2 = x2 + y 2 and the plane z = 4. Then, calculate
\[
\int_S \langle F, n\rangle\, dS.
\]

12. Sketch the three-dimensional region, R, bounded by the planes

4x + 2y + z = 8, x = 0, y = 0 and z = 0.

Evaluate the volume of R by a double integral, and using the intersection of R with planes (i)
orthogonal to the y-direction and (ii) orthogonal to the x-direction.

13. Let x 7−→ r(x) denote the function on R3 mapping x to x. That is, r is the identity
function on R3 , or we can think of r(x) as the “position vector” of x. Then, evaluate
\[
\int_S \langle r, n\rangle\, dS
\]

over the following surfaces S.


(a) the surface S of the unit cube bounded by the co-ordinate planes and the planes x =
1, y = 1, z = 1.
(b) the surface S of a sphere of radius a with centre at 0.

14. Sketch the three-dimensional region, R, bounded by the planes x + y + z = a, (a > 0),
x = 0, y = 0, and z = 0. Then, evaluate
\[
\int_R (x^2 + y^2)\, dx\, dy\, dz.
\]

15. Let φ : R³ −→ R be the function given by φ(x, y, z) = 45x²y, and let V be the set bounded by the planes
4x + 2y + z = 8, x = 0, y = 0 and z = 0.
Then, evaluate
\[
\int_V \phi\, dV.
\]

16. Find the volume contained between the paraboloid x = 4 − 2y 2 − 2z 2 , and the plane
x = 2.

17. Find the volume in the first octant bounded by the cylinder z = y 2 and the planes
x = 0, y = 0 and z = 9 − x.

18. Find
\[
\int\!\!\!\int\!\!\!\int_S f(x, y, z)\, dx\, dy\, dz,
\]
where f(x, y, z) = 2xyz and S is the region bounded by the parabolic cylinder z = 2 − ½x² and the planes z = 0, y = x, and y = 0, in the first octant.

19. Let F be the vector field on R3 given by


F (x, y, z) = (2xz, −x, y 2 ).
Evaluate
\[
\int\!\!\!\int\!\!\!\int_V F\, dV = \left( \int\!\!\!\int\!\!\!\int_V 2xz\, dV,\; -\int\!\!\!\int\!\!\!\int_V x\, dV,\; \int\!\!\!\int\!\!\!\int_V y^2\, dV \right),
\]
where V is the region bounded by the surfaces x = 0, y = 0, y = 6, z = x² and z = 4.

20. Find the volume bounded above by the elliptic paraboloid z = 6 − 3x2 − 2y 2 and below
by the plane z = 2.

21. Find the volume in the first octant below the paraboloid
\[
z = 1 - \frac{x^2}{a^2} - \frac{y^2}{b^2}.
\]

22. Let F be the vector field given by


F (x, y, z) = (x + 2y, −3z, x),
and let φ : R3 −→ R be the function given by
φ(x, y, z) = 4x + 3y − 2z.
Let S be the closed surface bounded by x = 0, x = 1, y = 0, y = 2, 2x + y + 2z = 6 and z = 0.
Now, calculate the following.
(a) ∫∫_S ⟨∇ × F, n⟩ dS.

(b) ∫∫_S φ dS.

23. If H = ∇ × A, prove that
\[
\int\!\!\int_S H \cdot dS = 0,
\]
where S is a closed surface.

24. Prove that
\[
\frac{1}{3}\int\!\!\int_S \langle r, n\rangle\, dS = \text{volume of } V,
\]
where S is the surface which encloses the volume V in R³ and r denotes the position vector, as in Exercise 13.
Chapter 8

Optimization

8.1 Taylor series


In elementary calculus, if f : R −→ R is an infinitely differentiable function, the Taylor series
associated with f about the point a ∈ R is

\[
f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \cdots.
\]
The idea is that for some values of x, the series converges and that the sum of the series is f (x)
— or at least we hope it is. Note that when x = a, the series always converges and has sum
f (a), so the interest is in what happens when x 6= a. In general, the series converges for some
values of x and not for others. Note also that even when the series converges for a given value
of x, the sum of the series need not be f (x). However, for most functions arising in elementary
calculus, the Taylor series converges in an interval of R, called the interval of convergence, and
for any x in this interval the sum of the series is f (x).
Similar considerations arise in the case of functions of several variables. First we consider
the case of a real valued function whose domain Ω is an open set in R2 . Thus, Ω ⊆ R2 and
f : Ω −→ R. We assume that all the partial derivatives of f , of all orders, exist in Ω, in which
case we say that f is infinitely differentiable. Let (a, b) ∈ Ω and let (h, k) ∈ R2 . We consider the
function g : (−1, 1) −→ R given by

g(t) = f (a + th, b + tk).

Then g may be differentiated by the Chain Rule to get
\[
g'(t) = h D_1 f(a + th, b + tk) + k D_2 f(a + th, b + tk).
\]
Thus,
\[
g'(0) = h D_1 f(a, b) + k D_2 f(a, b) = (h D_1 + k D_2) f(a, b). \tag{8.1}
\]

Also, using the Chain Rule again we have
\[
g''(t) = h^2 D_{11} f(a + th, b + tk) + hk D_{12} f(a + th, b + tk) + kh D_{21} f(a + th, b + tk) + k^2 D_{22} f(a + th, b + tk).
\]
So,
\[
g''(0) = h^2 D_{11} f(a, b) + 2hk D_{12} f(a, b) + k^2 D_{22} f(a, b)
       = (h^2 D_{11} + 2hk D_{12} + k^2 D_{22}) f(a, b)
       = (h D_1 + k D_2)^2 f(a, b), \tag{8.2}
\]

and so on. For the general case, recall some notation. We put 0! = 1 and for n ∈ N we put

n! = n(n − 1)(n − 2) · · · 3 · 2 · 1.

Then, for n ∈ N and j ∈ {0, 1, 2, . . . , n} we put


 
\[
\binom{n}{j} = \frac{n!}{j!\,(n-j)!}.
\]

Then the formula (8.2) becomes, in general,
\[
g^{(n)}(0) = \sum_{j=0}^{n} \binom{n}{j} h^j k^{n-j} D_1^j D_2^{n-j} f(a, b) = (h D_1 + k D_2)^n f(a, b). \tag{8.3}
\]
Thus, using (8.1), (8.2) and (8.3), the Taylor series of g about 0 is
\[
\sum_{n=0}^{\infty} g^{(n)}(0)\, \frac{t^n}{n!} = \sum_{n=0}^{\infty} (h D_1 + k D_2)^n f(a, b)\, \frac{t^n}{n!}.
\]

We assume that when t = 1 this Taylor series for g about the origin converges with sum g(1).
Now, put x = a + h and y = b + k and note that f (a + h, b + k) = g(1) and that h = x − a
and k = y − b. Then,

\begin{align*}
f(x, y) &= f(a + h, b + k) \\
        &= g(1), \text{ by the definition of } g, \\
        &= \sum_{n=0}^{\infty} \frac{1}{n!} (h D_1 + k D_2)^n f(a, b) \\
        &= \sum_{n=0}^{\infty} \frac{1}{n!} \bigl( (x - a) D_1 + (y - b) D_2 \bigr)^n f(a, b) \tag{8.4} \\
        &= \sum_{n=0}^{\infty} \frac{1}{n!} \left( \sum_{j=0}^{n} \binom{n}{j} (x - a)^j (y - b)^{n-j} D_1^j D_2^{n-j} f(a, b) \right). \tag{8.5}
\end{align*}

Then (8.4) and (8.5) are two equivalent forms of the Taylor series for the function f about the
point (a, b) ∈ R2 . The first few terms are

\[
f(a, b) + D_1 f(a, b)(x - a) + D_2 f(a, b)(y - b)
+ \frac{1}{2}\bigl( D_{11} f(a, b)(x - a)^2 + 2 D_{12} f(a, b)(x - a)(y - b) + D_{22} f(a, b)(y - b)^2 \bigr) + \cdots.
\]

In terms of the classical notation for partial derivatives this becomes


∂f ∂f
f (a, b) + (a, b)(x − a) + (a, b)(y − b)
∂x ∂y
!
1 ∂2f 2 ∂2f ∂2f
+ (a, b)(x − a) + 2 (x − a)(y − b) + (a, b)(y − b)2 + ···. (8.6)
2 ∂x2 ∂x∂y ∂y 2

Thus, to calculate the Taylor series about the point (a, b) in Ω, we simply need to calculate the partial derivatives of f at (a, b) and put them in (8.6). The first few terms of the Taylor expansion may then be taken as an approximation to the value of f(x, y) when (x, y) is near the point (a, b).
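The second-order terms in (8.6) are easy to generate with a computer algebra system. The following sketch (an aside, assuming Python with SymPy; the function eˣ sin y and the point (0, π/2) are arbitrary choices) builds the quadratic Taylor polynomial from (8.6) and compares it with the true value at a nearby point.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x)*sp.sin(y)
a, b = 0, sp.pi/2                       # expansion point (a, b)

# Second-order Taylor polynomial built from (8.6):
h, k = x - a, y - b
taylor2 = (f.subs({x: a, y: b})
           + sp.diff(f, x).subs({x: a, y: b})*h
           + sp.diff(f, y).subs({x: a, y: b})*k
           + sp.Rational(1, 2)*(sp.diff(f, x, 2).subs({x: a, y: b})*h**2
                                + 2*sp.diff(f, x, y).subs({x: a, y: b})*h*k
                                + sp.diff(f, y, 2).subs({x: a, y: b})*k**2))
print(sp.expand(taylor2))
# Compare the approximation with the true value near (a, b):
print(float(f.subs({x: 0.1, y: sp.pi/2 + 0.1})),
      float(taylor2.subs({x: 0.1, y: sp.pi/2 + 0.1})))
```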
In the case of n variables, where Ω is an open set in Rⁿ and f : Ω −→ R is infinitely differentiable, the Taylor series of f about the point a = (a₁, a₂, . . . , aₙ) can be written in the form
\[
\sum_{r=0}^{\infty} \frac{1}{r!} \left( \sum_{j=1}^{n} (x_j - a_j) D_j \right)^{r} f(a),
\]
or equivalently,
\[
\sum_{r=0}^{\infty} \frac{1}{r!} \left( \sum_{j_1 + \cdots + j_n = r} \frac{r!}{j_1! \cdots j_n!}\, D_1^{j_1} \cdots D_n^{j_n} f(a)\, (x_1 - a_1)^{j_1} \cdots (x_n - a_n)^{j_n} \right).
\]

8.2 Maxima and Minima


Recall that S(x₀, r) is the sphere of centre x₀ and radius r. Let Ω be an open set in Rⁿ and let
f : Ω → R be a function. Let x₀ ∈ Ω be a given point. Then f has a relative maximum at x₀ if
there is r > 0 such that
(i) S(x0 , r) ⊆ Ω, and
(ii) f (x) ≤ f (x0 ), for all x ∈ S(x0 , r).
In this case, f (x0 ) is called the relative maximum at x0 or the value of the relative maximum at
x0 .
Also, f has a relative minimum at x0 if there is r > 0 such that
(iii) S(x0 , r) ⊆ Ω, and
(iv) f (x0 ) ≤ f (x), for all x ∈ S(x0 , r).
In this case, f (x0 ) is called the relative minimum at x0 or the value of the relative minimum at
x0 .
Thus, for a relative maximum at x0 , the idea is that f (x0 ) ≥ f (x) for all x in Ω sufficiently
close to x0 . For a relative minimum at x0 , the idea is that f (x0 ) ≤ f (x) for all x in Ω sufficiently
close to x0 . These ideas are illustrated in Figures 4.2 and 4.3.

Theorem 24 Let Ω be an open set in Rⁿ and let f : Ω → R be a differentiable function. If x₀ ∈ Ω and f has either a relative maximum or a relative minimum at x₀, then

Dj f (x0 ) = 0, for all 1 ≤ j ≤ n.

That is,
∂f
(x0 ) = 0,
∂xj
for all 1 ≤ j ≤ n.

DEFINITION Let Ω be a region in Rⁿ and let f : Ω → R be a differentiable function. Then a point x₀ ∈ Ω is a critical point if

(D1 f )(x0 ) = (D2 f )(x0 ) = · · · = (Dn f )(x0 ) = 0.

That is, x0 is a critical point if


\[
\frac{\partial f}{\partial x_1}(x_0) = \frac{\partial f}{\partial x_2}(x_0) = \cdots = \frac{\partial f}{\partial x_n}(x_0) = 0.
\]

Figure 8.1. It is important to distinguish between the point where the


relative minimum (or maximum) of a function occurs, and the value of the
relative minimum (or maximum) of the function. The picture illustrates
the graph of the function f on [0, 7] given by

f (x) = x2 − 7x + 6.

The function has zeros at 1 and 6 and has a unique point where a relative
minimum occurs. The point where the relative minimum occurs is x = 3.5,
and the value of the relative minimum at this point is f (3.5) = −8.75. This
relative minimum of f is also an absolute minimum of f .

Theorem 24 shows that a point where there is a relative maximum or relative minimum must be a critical point. However, note that a point may be a critical point while there is neither a relative maximum nor a relative minimum at the point.

Example Let f : R² → R be given by f(x, y) = x² − y². Then
\[
\frac{\partial f}{\partial x} = 2x \quad \text{and} \quad \frac{\partial f}{\partial y} = -2y.
\]
So, for a critical point we would have
\[
0 = \frac{\partial f}{\partial x} = 2x \quad \text{and} \quad 0 = \frac{\partial f}{\partial y} = -2y.
\]
So there is exactly one critical point for f, occurring when x = y = 0. That is, the unique critical point of f is (0, 0). However, there is neither a maximum nor a minimum for f at (0, 0). To see this, let ε > 0 and observe that
\[
f(\varepsilon x, x) = \varepsilon^2 x^2 - x^2 = (\varepsilon^2 - 1)x^2.
\]
So, if ε > 1 and (x, y) approaches (0, 0) along the line x = εy, we see that f(εx, x) = (ε² − 1)x² ≥ 0 = f(0, 0), so f has a relative minimum at (0, 0) in the direction of the line x = εy.
However, if 0 < ε < 1 and (x, y) approaches (0, 0) along the line x = εy, we see that f(εx, x) = (ε² − 1)x² ≤ 0 = f(0, 0), so f has a relative maximum at (0, 0) in the direction of the line x = εy.
So, we see that f has a relative maximum at (0, 0) in some directions, but a relative minimum at (0, 0) in some other directions. This means that f has neither a relative maximum nor a relative minimum at the critical point (0, 0).

Now, in first year calculus we saw that if a function f : R → R has a maximum or minimum
at x0 , then f 0 (x0 ) = (Df )(x0 ) = 0. This is a special case of the Theorem. We also saw in
first year that if f 0 (x0 ) = 0 and f 00 (x0 ) < 0 then f has a relative maximum at x0 , while if
f 0 (x0 ) = 0 and f 00 (x0 ) > 0, f has a relative minimum at x0 . What is the corresponding situation
for functions of several variables? The following result helps us to tell when there is a relative
maximum or a relative minimum at a critical point.


Figure 8.2. The figure illustrates the graph of a differentiable real valued func-
tion on the interval [0, 1]. The function has a relative minimum f (b) at b, and
relative maxima f (a) at a and f (c) at c. f (b) is the absolute minimum for f , and
f (c) is the absolute maximum for f . f (a) is a relative maximum of f , but it is
not an absolute maximum.

Theorem 25 Let Ω be an open set in R² and f : Ω → R be a twice continuously differentiable function. Let x₀ ∈ Ω and assume that x₀ is a critical point of f. Put
\[
A = (D_{11} f)(x_0) = \frac{\partial^2 f}{\partial x^2}(x_0), \qquad
B = (D_{12} f)(x_0) = \frac{\partial^2 f}{\partial y \partial x}(x_0), \qquad
C = (D_{22} f)(x_0) = \frac{\partial^2 f}{\partial y^2}(x_0).
\]
Then, if AC − B² > 0 there is a relative maximum or a relative minimum at x₀. In this case,
if A > 0 there is a relative minimum at x₀,
if A < 0 there is a relative maximum at x₀.
Note that if AC − B² < 0, there is a saddle point at x₀, which means that the function has a maximum in one direction near x₀ but has a minimum in another direction at x₀.
Note also that if AC − B² = 0, the test is inconclusive as to the behaviour of the function near x₀.

PROOF (a few ideas only). As Ω is an open set there is r > 0 such that S(x, r) ⊆ Ω. Put
x = (x1 , x2 ), let (h, k) ∈ S(0, r) and consider the function g : (−1, 1) −→ R given by

g(t) = f (x1 + th, x2 + tk).

The Chain Rule gives
\[
g'(t) = h \frac{\partial f}{\partial x}(x_1 + th, x_2 + tk) + k \frac{\partial f}{\partial y}(x_1 + th, x_2 + tk),
\]
and so
\[
g'(0) = h \frac{\partial f}{\partial x}(x_1, x_2) + k \frac{\partial f}{\partial y}(x_1, x_2) = 0,
\]
as x = (x₁, x₂) is assumed to be a critical point. Using the Chain Rule again, we get
\[
g''(0) = A h^2 + 2 B h k + C k^2.
\]

Assuming that A ≠ 0, we have
\[
g''(0) = A \left[ \left( h + \frac{B}{A} k \right)^2 + \frac{AC - B^2}{A^2} k^2 \right].
\]
If A > 0 and AC − B² > 0, we see that whatever we choose for (h, k) ≠ (0, 0), g″(0) > 0, and we see that f has a relative minimum at x = (x₁, x₂). And so on. □
DEFINITIONS. Let Ω be an open set in Rⁿ. Then the boundary of Ω is the set ∂Ω where

∂Ω = {x : x ∈ Rn and S(x, r) intersects Ω and Ωc for all r > 0 }

The closure of the set Ω is the set Ω given by

Ω = Ω ∪ ∂Ω.

The set Ω has the property that its complement in Rn is open. Any set whose complement in
Rn is open is called a closed set.

The idea is that the boundary of Ω represents the “edge” or “border” or “boundary” of Ω in
Rn . For example if Ω is the open interval (a, b) in R,

∂Ω = {a, b} and Ω ∪ ∂Ω = [a, b].

In R3 , the boundary of the sphere

S(x, r) = {y : y ∈ R3 and |x − y| < r}

is
∂S(x, r) = {y : y ∈ R³ and |x − y| = r}.

Thus, ∂S(x, r) is the surface of the sphere S(x, r) and S(x, r), the closure of S(x, r), is the solid
sphere of radius r including the surface of the sphere.
DEFINITION. Let X be a set and let f : X → R be a function. Let x0 ∈ X be a given point.
Then f has an absolute maximum at x₀ if

f (x0 ) ≥ f (x), for all x ∈ X,

and then f (x0 ) is called the (value of) the absolute maximum of f over X.
Also, f has an absolute minimum at x0 if

f(x₀) ≤ f(x), for all x ∈ X,

and then f (x0 ) is called the (value of) the absolute minimum of f over X.

Note that f need have neither an absolute maximum nor an absolute minimum. Also, if there
is an absolute maximum for f , the absolute maximum may occur at more than one point —
remember to distinguish between the absolute maximum f (x0 ), and the point x0 where f takes
on the absolute maximum. Similar remarks apply to the absolute minimum.

Theorem 26 Let Ω be a bounded open set in Rⁿ and let Ω̄ denote the closure of Ω. Let
f : Ω̄ → R
be a continuous function. Then f has an absolute maximum and an absolute minimum over Ω̄.

Now, suppose we have an open set Ω and a continuous function f : Ω → R. The procedure
for finding relative and absolute maxima and minima for f as a function on Ω is as follows:
(i) Find all critical points in Ω and check to see which of these give relative maxima or
minima.
(ii) Find the maximum and minimum values of f on ∂Ω.
(iii) Consider all maximum and minimum values found, and by comparing them determine
the absolute maximum and minimum values, if any.
EXAMPLE. Let A denote the closed set

{(x, y) : −1 ≤ x, y ≤ 1}.

Calculate the maximum and minimum values of the function f over A, where

f (x, y) = xy.

We have
\[
\frac{\partial f}{\partial x} = y \quad \text{and} \quad \frac{\partial f}{\partial y} = x.
\]
Thus the only critical point is when x = y = 0 and this gives the point (0, 0) in the interior of
A. That is, the origin (0, 0) is the only critical point. Note that f (0, 0) = 0. If y = αx, then
f (x, αx) = αx2 , so that if α > 0 then f has a minimum at (0, 0) on the curve y = αx, but if
α < 0 then f has a maximum at (0, 0) on the curve y = αx. So, f has neither a local maximum
nor a local minimum at (0, 0) so (0, 0) is a saddle point of f . We now have to check what f does
on the boundary of A. There are 4 cases, corresponding to the 4 sides of the square A.
We have f (x, 1) = x so on {(x, 1) : −1 ≤ x ≤ 1}, f has a maximum of 1 at (1, 1) and a
minimum of −1 at (−1, 1).

We have f (x, −1) = −x so on {(x, −1) : −1 ≤ x ≤ 1}, f has a maximum of 1 at (−1, −1)
and a minimum of −1 at (1, −1).
We have f (1, y) = y so on {(1, y) : −1 ≤ y ≤ 1}, f has a maximum of 1 at (1, 1) and a
minimum of −1 at (1, −1).
We have f(−1, y) = −y so on {(−1, y) : −1 ≤ y ≤ 1}, f has a maximum of 1 at (−1, −1) and a minimum of −1 at (−1, 1).
So, f has no local or relative maxima or minima inside A, but it has an absolute maximum of
1 at each of (1, 1) and (−1, −1), and an absolute minimum of −1 at each of (−1, 1) and (1, −1).
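A crude numerical check of this example (illustrative only, assuming Python with NumPy) evaluates f(x, y) = xy on a fine grid over the square and locates the extreme values.

```python
import numpy as np

# Brute-force check of f(x, y) = xy on the closed square [-1, 1]^2.
xs = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(xs, xs)
F = X*Y
i_max = np.unravel_index(np.argmax(F), F.shape)
i_min = np.unravel_index(np.argmin(F), F.shape)
print(F.max(), (X[i_max], Y[i_max]))   # 1.0, attained at a corner (1, 1) or (-1, -1)
print(F.min(), (X[i_min], Y[i_min]))   # -1.0, attained at a corner (-1, 1) or (1, -1)
```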

8.3 Constrained maxima and minima and Lagrange multipliers


Let Ω be an open subset of Rⁿ and let g : Ω −→ R be a differentiable function with the property
\[
x \in \Omega \text{ and } g(x) = 0 \;\Longrightarrow\; Dg(x) \ne 0. \tag{8.7}
\]
That is, if x ∈ Ω and g(x) = 0, then Dⱼg(x) ≠ 0 for some j ∈ {1, 2, . . . , n}. Put
\[
S = \{x : x \in \Omega \text{ and } g(x) = 0\}.
\]

The surface S is closed in Rn because the function g, being differentiable, is also continuous.
Now, let f : Ω −→ R be a differentiable function and consider the following problem. Find
the maximum and minimum values, if they exist, of the function f when f is restricted to S.
Also, find the points at which any such maxima or minima occur. Thus, we are aiming to find
\[
\max\{f(x) : x \in \Omega \text{ and } g(x) = 0\} \quad \text{and} \quad \min\{f(x) : x \in \Omega \text{ and } g(x) = 0\}, \tag{8.8}
\]

assuming that these values exist. The idea is that the equation g(x) = 0 imposes a constraint
on the values in the domain of f over which we are trying to find the maximum and minimum
values of f .
Note as above that f is continuous in Ω because it is differentiable there. Consequently, if
S is bounded as well as closed, the maximum and minimum values in (8.8) will exist (see the
Appendix). That is, there are u, v ∈ S such that
\[
f(u) = \max\{f(x) : x \in \Omega \text{ and } g(x) = 0\} \quad \text{and} \quad f(v) = \min\{f(x) : x \in \Omega \text{ and } g(x) = 0\}.
\]

Now, let’s assume that u ∈ S and that f has a maximum or minimum value at u. Let J be
an open interval and let C : J −→ S be a differentiable curve in S that goes through u. Let
C(t0 ) = u and put C = (C1 , C2 , . . . , Cn ). Then as f restricted to S has a local maximum or a
minimum at u, the function
t 7−→ f (C(t))
has a maximum or a minimum at t0 , since C(t0 ) = u. Consequently, the derivative of t 7−→
f (C(t)) must be 0 at t0 . Calculating this derivative using the Chain Rule we get
\[
\sum_{j=1}^{n} D_j f(u)\, C_j'(t_0) = 0.
\]

That is,
\[
\langle f'(u), C'(t_0)\rangle = 0.
\]

Thus, f′(u) is orthogonal to the tangent vector at u of every differentiable curve in S going through u. That is, f′(u) is normal to the surface at the point u where the maximum or minimum occurs. However, g′(u) is also normal to the surface S at the point u. Now, we assume that at each point x ∈ S, g′(x) ≠ 0. Then, g′(u) ≠ 0 and, by our earlier result on normal vectors to a surface, under these assumptions f′(u) is a multiple of g′(u). That is, there is λ ∈ R such that f′(u) = −λg′(u). That is,
\[
f'(u) + \lambda g'(u) = 0 \quad \text{and} \quad g(u) = 0. \tag{8.9}
\]
The number λ in (8.9) is called a Lagrange multiplier. Putting u = (u₁, u₂, . . . , uₙ), the equations
\[
f'(u) + \lambda g'(u) = 0 \quad \text{and} \quad g(u) = 0
\]
become a system of n + 1 equations in n + 1 unknowns, to be solved for u₁, u₂, . . . , uₙ and λ. If these can be found, we have a solution of the problem.
EXAMPLE. Let b, c ∈ R and let f : R3 −→ R be given by

f (x, y, z) = x2 + by 2 + cz 2 .

We assume that 1, b, c are all distinct. We find the maxima and minima of f subject to the
constraint
x2 + y 2 + z 2 = 1.
Thus, the constraint is that the domain of f is restricted to the surface of the sphere S(0, 1) in
R³. Putting g(x, y, z) = x² + y² + z² − 1, observe that on S(0, 1),
\[
g'(x, y, z) = \left( \frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}, \frac{\partial g}{\partial z} \right) = (2x, 2y, 2z) \ne (0, 0, 0).
\]

So by the result above, there is λ ∈ R and u ∈ S such that
\[
f'(u) + \lambda g'(u) = 0 \quad \text{and} \quad g(u) = 0.
\]
That is, if u = (u₁, u₂, u₃),
\[
(2u_1, 2b u_2, 2c u_3) + \lambda (2u_1, 2u_2, 2u_3) = 0 \quad \text{and} \quad u_1^2 + u_2^2 + u_3^2 = 1.
\]

Thus,
\begin{align}
2u_1 &= -2\lambda u_1, \tag{8.10} \\
2b u_2 &= -2\lambda u_2, \tag{8.11} \\
2c u_3 &= -2\lambda u_3, \text{ and} \tag{8.12} \\
u_1^2 + u_2^2 + u_3^2 &= 1. \tag{8.13}
\end{align}

From (8.10), (8.11) and (8.12) we see that u₁ = 0 or λ = −1, that u₂ = 0 or λ = −b, and that
u3 = 0 or λ = −c. Using (8.13) we get the following.
If λ = −1, u1 = ±1 and u2 = u3 = 0.
If λ = −b, u2 = ±1 and u1 = u3 = 0.
If λ = −c, u3 = ±1 and u1 = u2 = 0.
Thus the possible points where the maxima and minima occur are

(1, 0, 0), (−1, 0, 0), (0, 1, 0), (0, −1, 0), (0, 0, 1), (0, 0, −1).

The values of f at these points are respectively

1, 1, b, b, c, c.

So, if it is the case that b < 1 < c, for example, f has a minimum value on S of b at the points
(0, 1, 0) and (0, −1, 0), and maximum value on S of c at the points (0, 0, 1) and (0, 0, −1).
Similarly in other cases such as 1 < b < c or 1 < c < b, the location and value of the maxima
and minima may be described.

8.4 Exercises
1.∗ Let f : R −→ R be the function given by
\[
f(x) = \begin{cases} 0, & \text{if } x = 0; \\ e^{-1/x^2}, & \text{if } x \ne 0. \end{cases}
\]

(i) Prove that f is infinitely differentiable on R.


(ii) Prove that Dn f (0) = 0 for n = 0, 1, 2, . . . .
(iii) Prove that the Taylor series of f about 0 converges for all values of x and that its sum
equals 0.
(iv) If x ≠ 0, deduce that the Taylor series of f about 0, evaluated at x, does not converge to f(x). Thus f is an example of a function where the Taylor series converges but does not converge to the value of the function.

2. Calculate the Taylor series up to the second order terms for the following cases.
(i) f (x, y) = x2 + xy − y 2 , about (1, −2)
(ii) f (x, y) = x2 y + sin y + ex , about (1, π)
(iii) f(x, y) = e^{−x²−y²}, about (0, 0)
(iv) f(x, y) = 1/(xy), about (−1, 1).

3. Find all critical points and the nature of the points for the following function f on R2 ,
given as follows.
f(x, y) = x(x − y)e^{−x²−y²}.
If possible, find all local and absolute maximum and minimum values.

4. Find all critical points and the nature of the points for the following function f on R2 ,
given as follows.
f (x, y) = (x2 + y 2 )2 (x2 + y 2 − 4)2 .
If possible, find all local and absolute maximum and minimum values.

5. Calculate all critical points of the function z on R2 given by


z(x, y) = x e^{−x³+y³},

and determine any local maxima and minima.

6. Calculate the maximum and minimum values of the function f given by f (x, y) = xy − y 2
on the closed circle {(x, y) : x2 + y 2 ≤ 1}.

7. Calculate the maximum and minimum values of the function f given by f (x, y) =
x2 ye−(x+y) on the closed triangular region x ≥ 0, y ≥ 0, x + y ≤ 4.
8. Let g : R2 −→ R be given by

g(x, y) = 2x3 − 6xy + 3y 2 .

Find all critical points of g and determine whether these points give a relative minimum, a
relative maximum or a saddle point. Determine whether g has an absolute minimum or an
absolute maximum on R2 .
Chapter 9

Appendices

9.1 Open, closed and bounded sets


Let Ω denote a subset of Rⁿ. Then, Ω was defined to be open if, for every x ∈ Ω there is r > 0 such that the sphere S(x, r) = {y : |x − y| < r} has the property that S(x, r) ⊆ Ω. Ω is closed if its complement Ωᶜ = {y : y ∈ Rⁿ and y ∉ Ω} is open. The boundary of Ω is the set consisting of all points x ∈ Rⁿ such that for each r > 0, the sphere S(x, r) has the property that S(x, r) ∩ Ω ≠ ∅ and S(x, r) ∩ Ωᶜ ≠ ∅. The boundary of Ω is a closed set, and it may be denoted by ∂Ω. The set Ω is bounded if there is M > 0 such that

|x| ≤ M, for all x ∈ Ω.

9.2 Limits, continuity and the existence of maxima and minima


Let Ω be a subset of Rn and let f : Ω −→ Rm be a given function. Then f has a limit ` at the
point x0 ∈ Ω if, for any number ε > 0, there is some number δ > 0 such that

x ∈ Ω and 0 < |x − x0 | < δ =⇒ |f (x) − `| < ε.

In this case we write
\[
\lim_{x \to x_0} f(x) = \ell.
\]
The function f is continuous at x₀ ∈ Ω if
\[
\lim_{x \to x_0} f(x) = f(x_0).
\]
The function f is continuous on Ω if it is continuous at each point of Ω.

Theorem 27 Let Ω be a closed and bounded subset of Rⁿ and let f : Ω −→ R be a continuous function. Then, there are x₀, y₀ ∈ Ω such that
f(x) ≤ f(x₀) for all x ∈ Ω, and f(y₀) ≤ f(y) for all y ∈ Ω.

We say that f attains an absolute maximum at x0 and that f attains an absolute minimum at
y0 . Then f (x0 ) is called the absolute maximum of f over Ω and f (y0 ) is called the absolute
minimum of f over Ω.


9.3 The equality of mixed partial derivatives


The equality of mixed partial derivatives has been used frequently in this main text. The
following result proves this equality for second order derivatives. The result for the general case
is then proved from this.

Theorem 28 Let Ω be an open subset of R2 and let f : Ω −→ R be a given twice continuously


differentiable function. Then for each (a, b) ∈ Ω,

D12 f (a, b) = D21 f (a, b).

PROOF. Let (a, b) ∈ Ω, let δ > 0 be such that (a + h, b + k) ∈ Ω for all |h|, |k| < δ. Let
|h| < δ and put
g(x) = f (x, b + h) − f (x, b), for all |x − a| < δ. (9.1)
Then, by the Mean Value Theorem, there is c ∈ [a − h, a + h] such that

g(a + h) − g(a) = hg′(c) = h[D₁f(c, b + h) − D₁f(c, b)]. (9.2)

Now, in this expression we apply the Mean Value Theorem again, this time to D1 f (c, b + h) −
D1 f (c, b). We see that there is a point d ∈ [b − h, b + h] such that

D1 f (c, b + h) − D1 f (c, b) = hD2 D1 f (c, d). (9.3)

Then (9.2) and (9.3) give
\[
\lim_{h \to 0} \frac{g(a + h) - g(a)}{h^2} = \lim_{(c,d) \to (a,b)} D_2 D_1 f(c, d) = D_2 D_1 f(a, b) = D_{12} f(a, b), \tag{9.4}
\]
by the continuity of D₁₂f.


Similarly, put
w(x) = f (a + h, x) − f (a, x), for all |x − b| < δ. (9.5)
An argument along the same lines as above gives

\[
\lim_{h \to 0} \frac{w(b + h) - w(b)}{h^2} = D_1 D_2 f(a, b) = D_{21} f(a, b), \tag{9.6}
\]

by the continuity of D21 f .


But, using (9.1) and (9.5),

\begin{align*}
g(a + h) - g(a) &= [f(a + h, b + h) - f(a + h, b)] - [f(a, b + h) - f(a, b)] \\
                &= [f(a + h, b + h) - f(a, b + h)] - [f(a + h, b) - f(a, b)] \\
                &= w(b + h) - w(b). \tag{9.7}
\end{align*}

We deduce from (9.7) that the left hand limits in (9.4) and (9.6) are equal. Then, (9.4) and (9.6) give
D₁₂f(a, b) = D₂₁f(a, b). □


9.4 Proof of the Inverse Function Theorem


Lemma Let g : Rn −→ R be differentiable and let Ω be an open sphere of Rn . Suppose that
M > 0 is such that
|Dⱼg(x)| ≤ M, for all x ∈ Ω and all j = 1, 2, . . . , n.
Then, for all x, y ∈ Ω we have

|g(x) − g(y)| ≤ nM |x − y|.

PROOF. Let d₀ = g(y₁, . . . , yₙ) and
\[
d_j = g(x_1, \ldots, x_j, y_{j+1}, \ldots, y_n), \quad \text{for } j = 1, 2, \ldots, n.
\]
Then
\begin{align*}
g(x) - g(y) &= d_n - d_0 = \sum_{j=1}^{n} (d_j - d_{j-1}) \\
            &= \sum_{j=1}^{n} \bigl[ g(x_1, \ldots, x_j, y_{j+1}, \ldots, y_n) - g(x_1, \ldots, x_{j-1}, y_j, \ldots, y_n) \bigr]. \tag{9.8}
\end{align*}

But, by the Mean Value Theorem, there is a point zj between xj and yj such that

\[
|g(x_1, \ldots, x_j, y_{j+1}, \ldots, y_n) - g(x_1, \ldots, x_{j-1}, y_j, \ldots, y_n)|
= |x_j - y_j| \cdot |D_j g(z_j)| \le M |x_j - y_j|. \tag{9.9}
\]

It now follows from (9.8) and (9.9) that

\begin{align*}
|g(x) - g(y)| &\le \sum_{j=1}^{n} \bigl| g(x_1, \ldots, x_j, y_{j+1}, \ldots, y_n) - g(x_1, \ldots, x_{j-1}, y_j, \ldots, y_n) \bigr| \\
              &\le \sum_{j=1}^{n} M |x_j - y_j| \le \sum_{j=1}^{n} M |x - y| = n M |x - y|. \qquad \square
\end{align*}



Theorem 29 (The Inverse Function Theorem). Let a ∈ Rn and let f : Rn −→ Rn be a


function that is continuously differentiable in some open set containing a and assume that f 0 (a)
is an invertible n × n matrix. That is, we assume that det(f 0 (a)) 6= 0. [That is Da f is an
isomorphism.]
Then there is an open set V containing a and an open set W containing f (a) such that f
maps V onto W in a one-to-one manner and the inverse f −1 of f is continuously differentiable
as a map from W to V .
Also, we have:
0 −1
f −1 (w) = f 0 f −1 (w)

, for all w ∈ W.
Alternatively, this can be written as

(f −1 )0 (f (x)) = (f 0 (x))−1 , for all x ∈ V. (9.10)

Also, if f is k times continuously differentiable, so is f −1 . [So, if f is a diffeomorphism, so


is f −1 .]

Thus, the inverse function theorem says: if f is differentiable at a and det(f′(a)) ≠ 0, that is, if the Jacobian J(f) of f at a is not 0, then f has an inverse f⁻¹ near f(a), and (f⁻¹)′(f(a)) is the inverse of f′(a). That is, the derivative of the inverse function is the inverse of the derivative, when evaluated at the appropriate points.
PROOF.
I. We can assume Df(a) is the identity
Put T = Df (a)[= Da (f )], and put g = T −1 ◦ f . By the Chain Rule,

\begin{align*}
Dg(a) &= D(T^{-1} \circ f)(a) = T^{-1} \circ Df(a) \\
      &= \text{the identity transformation on } \mathbb{R}^n.
\end{align*}

Now it is clear that the result holds for f if and only if it holds for g, as T −1 is an invertible
linear transformation and g = T −1 ◦ f . So, without loss of generality, we may assume in proving
the Theorem that T = Df (a) is the identity transformation on Rn . That is

\[
D_j f_i(a) = \begin{cases} 0, & \text{if } i \ne j; \\ 1, & \text{if } i = j. \end{cases} \tag{9.11}
\]
II. f is one-to-one near a
Observe that if f(a + h) = f(a),
\[
\frac{|f(a + h) - f(a) - T(h)|}{|h|} = \frac{|h|}{|h|} = 1.
\]
But by the definition of the derivative,
\[
\lim_{h \to 0} \frac{|f(a + h) - f(a) - T(h)|}{|h|} = 0.
\]
Hence, for all sufficiently small h ≠ 0,
\[
f(a + h) \ne f(a).
\]
It follows that there is a closed rectangle U with a in the interior of U such that
\[
f(x) \ne f(a), \quad \text{for all } x \in U \text{ with } x \ne a. \tag{9.12}
\]



As we are assuming that f is continuously differentiable in an open set containing a, we can take U to have also the following properties:
\[
\det(f'(x)) \ne 0, \quad \text{for all } x \in U, \tag{9.13}
\]
and
\[
|D_j f_i(x) - D_j f_i(a)| < \frac{1}{2n^2}, \quad \text{for all } 1 \le i, j \le n \text{ and } x \in U. \tag{9.14}
\]


Note that if we put g = f − T, then
\[
D_j g_i = \begin{cases} D_j f_i, & \text{for } j \ne i, \\ D_j f_i - 1, & \text{if } j = i. \end{cases} \tag{9.15}
\]
Thus (9.11), (9.14) and (9.15) give
\[
|D_j g_i(x)| \le \frac{1}{2n^2} \tag{9.16}
\]
for all 1 ≤ i, j ≤ n and x ∈ U.
Now, for x, y ∈ U we have by applying the Lemma to each function gi , and using the estimate
in (9.16),
\begin{align*}
|(f(x) - x) - (f(y) - y)| = |g(x) - g(y)| &\le \sum_{i=1}^{n} |g_i(x) - g_i(y)| \\
 &\le \sum_{i=1}^{n} \frac{n}{2n^2}\, |x - y| = \frac{1}{2} |x - y|. \tag{9.17}
\end{align*}
Now,
\[
|x - y| - |f(x) - f(y)| \le |(f(x) - x) - (f(y) - y)| \le \frac{1}{2} |x - y|.
\]
Thus,
\[
|x - y| \le 2 |f(x) - f(y)|, \tag{9.18}
\]
for all x, y ∈ U . This implies that f is one-to-one on U .
III. The function maps onto an open set about f(a)
The boundary B of U is compact, so f(B) is also compact because f is continuous. As f(a) ∉ f(B), this means that there is a number d > 0 such that
\[
|f(x) - f(a)| \ge d, \quad \text{for all } x \in B.
\]
Put
\[
W = \left\{ y : |y - f(a)| < \frac{d}{2} \right\}.
\]
W is open.


Figure 9.1. It is assumed that det(f 0 (a)) 6= 0. The closed rectangle U is chosen
so that det(f 0 (x)) 6= 0 for all x ∈ U , together with the property that the partial
derivatives of f do not vary much over U . It is shown that f is one-to-one on U .
This means that f (a) is not on the boundary of f (U ).

Figure 9.2. The closest distance from f (a) to the boundary is d. The open
set W consists of all points whose distance from f (a) is less than d/2. That is
W = S(f (a), d/2). Put V = f −1 (W ). V is open. It is shown that f restricted to
V has a continuous inverse f −1 : W −→ V . It is also shown that f −1 : W −→ V
is differentiable.

Then, if y ∈ W and x ∈ B, observe that
\[
d \le |f(x) - f(a)| = |(f(x) - y) + (y - f(a))| \le |f(x) - y| + |y - f(a)|.
\]

But as y ∈ W, |y − f(a)| < d/2, so that
\[
|f(x) - y| > \frac{d}{2} > |y - f(a)|, \tag{9.19}
\]
for all y ∈ W and x ∈ B.
Now let y ∈ W be given. We show there is a unique point x in the interior of U such that
f (x) = y. That is, we show that f restricted to a suitable subset of the interior of U maps
one-to-one and onto W .
Consider the function h : U −→ R given by
\[
h(x) = |y - f(x)|^2 = \sum_{i=1}^{n} (f_i(x) - y_i)^2. \tag{9.20}
\]

The function is continuous on U and so has a minimum value on U . We have by (9.19), that
this minimum value does not occur at a point on the boundary of U . So, it occurs at a point x0
of the interior of U . At this point, all partial derivatives of h equal 0. So, by (9.20),
\[
\sum_{i=1}^{n} 2 (f_i(x_0) - y_i) \cdot D_j f_i(x_0) = 0, \quad \text{for all } j.
\]

But by (9.13), the matrix (Dⱼfᵢ(x₀)) is invertible. Hence,

fi (x0 ) − yi = 0, for all i = 1, 2, . . . , n.

That is, x0 is in the interior of U and f (x0 ) = y.


Let V = (the interior of U ) ∩ f −1 (W ). Then, V is open. We have seen that f : V −→ W
is an onto mapping and it is also one-to-one by (9.18). Thus, f has an inverse f −1 : W −→ V .
This means that (9.18) can be rewritten in the form

|f −1 (v) − f −1 (w)| ≤ 2|v − w|, for all v, w ∈ W.

Hence f −1 is continuous.

IV. The inverse of f is differentiable


Let x ∈ V and let T = Df (x). We have to show that f −1 is differentiable at y = f (x) and
that Df −1 (y) = T −1 .
Let
ψ(h) = f (x + h) − f (x) − T (h), (9.21)
for all h such that x + h ∈ V . Because f is differentiable at x,
\[
\lim_{h \to 0} \frac{|\psi(h)|}{|h|} = 0. \tag{9.22}
\]

It follows from (9.21) and (9.22) that

f (u) − f (x) = T (u − x) + ψ(u − x), for u ∈ V close to x.

Thus,
T −1 (f (u) − f (x)) = u − x + T −1 (ψ(u − x)), and

u = x + T −1 (f (u) − f (x)) − T −1 (ψ(u − x)).


But as every point w of W is of the form f(u) for some u ∈ V, we may write this as
\[
f^{-1}(w) = f^{-1}(y) + T^{-1}(w - y) - T^{-1}\bigl( \psi(f^{-1}(w) - f^{-1}(y)) \bigr),
\]
for all w ∈ W. That is,
\[
f^{-1}(w) - f^{-1}(y) - T^{-1}(w - y) = -T^{-1}\bigl( \psi(f^{-1}(w) - f^{-1}(y)) \bigr). \tag{9.23}
\]

We prove that
\[
\lim_{w \to y} \frac{|T^{-1}(\psi(f^{-1}(w) - f^{-1}(y)))|}{|w - y|} = 0. \tag{9.24}
\]
Then, (9.23) gives
\[
\lim_{w \to y} \frac{|f^{-1}(w) - f^{-1}(y) - T^{-1}(w - y)|}{|w - y|} = 0,
\]
and the result follows by the definition of the derivative of f⁻¹.
and the result follows by the definition of the derivative of f −1 .
To prove (9.24), it suffices to prove
\[
\lim_{w \to y} \frac{|\psi(f^{-1}(w) - f^{-1}(y))|}{|w - y|} = 0. \tag{9.25}
\]

But,
\begin{align*}
\frac{|\psi(f^{-1}(w) - f^{-1}(y))|}{|w - y|}
 &= \frac{|\psi(f^{-1}(w) - f^{-1}(y))|}{|f^{-1}(w) - f^{-1}(y)|} \cdot \frac{|f^{-1}(w) - f^{-1}(y)|}{|w - y|} \\
 &\le 2\, \frac{|\psi(f^{-1}(w) - f^{-1}(y))|}{|f^{-1}(w) - f^{-1}(y)|}, \quad \text{by (9.18)}, \\
 &\to 0, \quad \text{as } w \to y,
\end{align*}

because f −1 is continuous and because of (9.22). This proves (9.25) and shows that f −1 is
differentiable at f (x) and that its derivative at f (x) is Df −1 (f (x)) and

Df −1 (f (x)) = (Df (x))−1 .

V. Order of differentiability of the inverse function


Once we know that f⁻¹ is differentiable in W and that its derivative matrix at y ∈ W is [f′(f⁻¹(y))]⁻¹ we have, putting x = f⁻¹(y) as before,
\begin{align*}
(f^{-1})'(y) &= \bigl( (D_j f_i(x))_{1 \le i, j \le n} \bigr)^{-1} \\
             &= (a_{ij}(x))_{1 \le i, j \le n} \\
             &= \bigl( a_{ij}(f^{-1}(y)) \bigr)_{1 \le i, j \le n}. \tag{9.26}
\end{align*}

Here, for each i, j, aᵢⱼ : V −→ R is a rational function of the Dⱼfᵢ(x), by Cramer's Rule. Thus, if f is k times differentiable, each aᵢⱼ is k − 1 times differentiable. It follows from (9.26) that

Dj (f −1 )i = aij ◦ f −1 .

Then, from the following Lemma, we deduce that if f is k times differentiable, f −1 is k times
differentiable.
Lemma. Assume that f is k times differentiable and that h : V −→ R is k times differen-
tiable. Then, h ◦ f −1 : W −→ R is differentiable and its derivative is of the form ` ◦ f −1 , where
` : V −→ R and ` is k − 1 times differentiable.
PROOF.
\begin{align*}
(h \circ f^{-1})'(y) &= h'(f^{-1}(y))\,(f^{-1})'(y) \\
                     &= h'(f^{-1}(y))\,\bigl( f'(f^{-1}(y)) \bigr)^{-1}.
\end{align*}
That is,
\[
(h \circ f^{-1})' = \bigl[ h'(f')^{-1} \bigr] \circ f^{-1} = \ell \circ f^{-1},
\]
where ℓ = h′(f′)⁻¹. But, as f, h are k times differentiable, ℓ is k − 1 times differentiable. □
