
MATHEMATICAL METHODS I
Contents
I Linear Algebra: by Gordon Royle and Alice Devillers 7
1 Systems of Linear Equations 9
1.1 Systems of Linear Equations 9
1.2 Solving Linear Equations 12
1.3 Gaussian Elimination 16
1.4 Back Substitution 19
2 Vector Spaces and Subspaces 25
2.1 The vector space R^n 25
2.2 Subspaces 27
2.3 Spans and Spanning Sets 31
2.4 Linear Independence 37
2.5 Bases 40
3 Matrices and Determinants 47
3.1 Matrix Algebra 47
3.2 Subspaces from matrices 51
3.3 The null space 56
3.4 Solving systems of linear equations 59
3.5 Matrix Inversion 59
3.6 Determinants 68
4 Linear transformations 77
4.1 Introduction 77
II Differential Calculus: by Luchezar Stoyanov, Jennifer Hopwood and Michael
Giudici. Some pictures courtesy of Kevin Judd. 85
5 Vector functions and functions of several variables 87
5.1 Vector valued functions 87
5.2 Functions of two variables 89
5.3 Functions of three or more variables 91
5.4 Summary 92
6 Limits and continuity 95
6.1 Scalar-valued functions of one variable 95
6.2 Limits of vector functions 100
6.3 Limits of functions of two variables 100
6.4 Continuity of Functions 103
7 Differentiation 107
7.1 Derivatives of functions of one variable 107
7.2 Differentiation of vector functions 109
7.3 Partial Derivatives 113
7.4 Tangent Planes and differentiability 116
7.5 The Jacobian matrix and the Chain Rule 119
7.6 Directional derivatives and gradients 124
8 Maxima and Minima 129
8.1 Functions of a single variable 129
8.2 Functions of several variables 131
8.3 Identifying local maxima and minima 134
9 Taylor Polynomials 139
9.1 Taylor polynomials for functions of one variable 139
9.2 Taylor polynomials for functions of two variables 141
III Differential equations and eigenstructure: by Des Hill 143
10 Differential Equations 145
10.1 Introduction 145
10.2 Mathematical modelling with ordinary differential equations 148
10.3 First order ordinary differential equations 149
10.4 Initial conditions 155
10.5 Linear constant-coefficient ordinary differential equations 156
10.6 Linear nonhomogeneous constant-coefficient differential equations 158
10.7 Initial and boundary conditions 162
10.8 Summary of method 164
10.9 Partial differential equations 164
11 Eigenvalues and eigenvectors 171
11.1 Introduction 171
11.2 Finding eigenvalues and eigenvectors 172
11.3 Some properties of eigenvalues and eigenvectors 175
11.4 Diagonalisation 177
11.5 Linear Systems 179
11.6 Solutions of homogeneous linear systems of differential equations 179
11.7 A change of variables approach to finding the solutions 181
12 Change of Basis 183
12.1 Introduction 183
12.2 Change of basis 185
12.3 Linear transformations and change of bases 186
IV Sequences and Series: by Luchezar Stoyanov, Jennifer Hopwood and
Michael Giudici 191
13 Sequences and Series 193
13.1 Sequences 193
13.2 Infinite Series 198
13.3 Absolute Convergence and the Ratio Test 204
13.4 Power Series 205
13.5 Taylor and MacLaurin Series 207
14 Index 211
Part I
Linear Algebra: by Gordon Royle and Alice Devillers

1 Systems of Linear Equations
This chapter covers the systematic solution of systems of linear equations using Gaussian elimination
and back-substitution and the description, both algebraic and geometric, of their solution space.
Before commencing this chapter, students should be able to:
Plot linear equations in 2 variables, and
Add and multiply matrices.
After completing this chapter, students will be able to:
Systematically solve systems of linear equations with many variables, and
Identify when a system of linear equations has 0, 1 or infinitely many solutions, and
Give the solution set of a system of linear equations in parametric form.
1.1 Systems of Linear Equations
A linear equation is an equation of the form

x + 2y = 4

where each term in the equation is either a number (i.e. 6) or a numerical multiple of a variable (i.e. 2x, 4y). If an equation involves powers or products of variables (x^2, xy, etc.) or any other functions (sin x, e^x, etc.) then it is not linear. (You will often see the word scalar used to refer to a number, and scalar multiple to describe a numerical multiple of a variable.)
A system of linear equations is a set of one or more linear equations considered together, such as

x + 2y = 4
x - y = 1

which is a system of two equations in the two variables x and y. (Often the variables are called unknowns, emphasising that solving a system of linear equations is a process of finding the unknowns.)
A solution to a system of linear equations is an assignment of values to the variables such that all of the equations in the system are satisfied. For example, there is a unique solution to the system given above, which is x = 2, y = 1. Particularly when there are more variables, it will often be useful to give the solutions as vectors like

(x, y) = (2, 1).

(In this case we could also just give the solution as (2, 1) where, by convention, the first component of the vector is the x-coordinate. Usually the variables will have names like x, y, z or x_1, x_2, . . ., x_n, and so we can just specify the vector alone, in this case (2, 1), and it will be clear which component corresponds to which variable.)
If the system of linear equations involves just two variables, then we can visualise the system geometrically by plotting the solutions to each equation separately on the xy-plane. The solution to the system of linear equations is the point where the two plots intersect, which in this case is the point (2, 1).

[Figure 1.1: The two linear equations plotted as intersecting lines in the xy-plane.]
It is easy to visualise systems of linear equations in two variables, but it is more difficult in three variables, where we need 3-dimensional plots. In three dimensions the solutions to a single linear equation such as

x + 2y - z = 4

form a plane in 3-dimensional space. (This particular equation describes the plane with normal vector (1, 2, -1) containing the point (4, 0, 0).) While computer algebra systems can produce somewhat reasonable plots of surfaces in three dimensions, it is hard to interpret plots showing two or more intersecting surfaces.

With four or more variables any sort of visualisation is essentially impossible, and so to reason about systems of linear equations with many variables we need to develop algebraic tools rather than geometric ones. (It is still very useful to use geometric intuition to think about systems of linear equations with many variables, provided you are careful about where it no longer applies.)
1.1.1 Solutions to systems of linear equations
The system of linear equations shown in Figure 1.1 has a unique solution. In other words there is just one (x, y) pair that satisfies both equations, and this is represented by the unique point of intersection of the two lines. Some systems of linear equations have no solutions at all. (If a system of linear equations has at least one solution, then it is called consistent, and otherwise it is called inconsistent.) For example, there are no possible values for (x, y) that satisfy both of the following equations:

x + 2y = 4
2x + 4y = 3

Geometrically, the two equations determine parallel but different lines, and so they do not meet.
[Figure 1.2: An inconsistent system plotted as lines in the xy-plane.]
There is another possibility for the number of solutions to a system of linear equations, which is that a system may have infinitely many solutions. For example, consider the system

x + 2y + z = 4
        y + z = 1        (1.1)

Each of the two equations determines a plane in three dimensions, and as the two planes are not parallel, they meet in a line, and so every point on the line is a solution to this system of linear equations. (The two planes are not parallel because the normal vectors to the two planes, that is n_1 = (1, 2, 1) and n_2 = (0, 1, 1), are not parallel.)
How can we describe the solution set to a system of linear equations with infinitely many solutions?

One way of describing an infinite solution set is in terms of free parameters, where one (or more) of the variables is left unspecified, with the values assigned to the other variables being expressed as formulas that depend on the free variables.

Let's see how this works with the system of linear equations given by (1.1): here we can choose z to be the free variable, but then to satisfy the second equation it will be necessary to have y = 1 - z. Then the first equation can only be satisfied by taking

x = 4 - 2y - z
  = 4 - 2(1 - z) - z   (using y = 1 - z)
  = 2 + z.

Thus the complete solution set S of system (1.1) is

S = {(2 + z, 1 - z, z) | z ∈ R}.
To find a particular solution to the linear system, you can pick any desired value for z, and then the values for x and y are determined. For example, if we take z = 1 then we get (3, 0, 1) as a solution, and if we take z = 0 then we get (2, 1, 0), and so on. For this particular system of linear equations it would have been possible to choose one of the other variables to be the free variable, and we would then get a different expression for the same solution set.

Example 1.1. (Different free variable) To rewrite the solution set S = {(2 + z, 1 - z, z) | z ∈ R} so that the y-coordinate is the free variable, just notice that as y = 1 - z, this implies that z = 1 - y and so the solution set becomes S = {(3 - y, y, 1 - y) | y ∈ R}.
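If you have Python available, a few lines are enough to sanity-check that both parametrisations really do satisfy system (1.1). This is only an illustrative check, not part of the original notes:

    # Check that both parametrisations solve x + 2y + z = 4 and y + z = 1.
    def satisfies_system(x, y, z):
        return x + 2*y + z == 4 and y + z == 1

    # z as the free parameter: (2 + z, 1 - z, z)
    print(all(satisfies_system(2 + z, 1 - z, z) for z in range(-5, 6)))   # True
    # y as the free parameter: (3 - y, y, 1 - y)
    print(all(satisfies_system(3 - y, y, 1 - y) for y in range(-5, 6)))   # True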
A system of linear equations can also be expressed as a single matrix equation involving the product of a matrix and a vector of variables. So the system of linear equations

x + 2y + z = 5
      y - z = -1
2x + 3y - z = 3

can equally well be expressed as

[ 1  2  1 ] [ x ]   [  5 ]
[ 0  1 -1 ] [ y ] = [ -1 ]
[ 2  3 -1 ] [ z ]   [  3 ]

just using the usual rules for multiplying matrices. (Matrix algebra is discussed in detail in Chapter 3, but for this representation as a system of linear equations, just the definition of the product of two matrices is needed.) In general, a system of linear equations with m equations in n variables has the form

Ax = b

where A is an m × n coefficient matrix, x is an n × 1 vector of variables, and b is an m × 1 vector of scalars.
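As an illustrative sketch (assuming NumPy is installed), the system above can be stored and solved directly as Ax = b; np.linalg.solve is a library routine that applies only to square, invertible coefficient matrices, not the general method developed in this chapter:

    import numpy as np

    A = np.array([[1, 2,  1],
                  [0, 1, -1],
                  [2, 3, -1]], dtype=float)
    b = np.array([5, -1, 3], dtype=float)

    x = np.linalg.solve(A, b)       # solve Ax = b for square, invertible A
    print(x)                        # [1. 1. 2.]
    print(np.allclose(A @ x, b))    # True: the product A x reproduces b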
1.2 Solving Linear Equations
In this section, we consider a systematic method of solving systems of linear equations. The method consists of two steps, first using Gaussian elimination to reduce the system to a simpler system of linear equations, and then back-substitution to find the solutions to the simpler system. (In high school, systems of linear equations are often solved with ad hoc methods that are quite suitable for small systems, but which are not sufficiently systematic to tackle larger systems. It is very important to thoroughly learn the systematic method, as solving systems of linear equations is a fundamental part of many of the questions that arise in linear algebra. In fact, almost every question in linear algebra ultimately depends on setting up and solving a suitable system of linear equations!)
1.2.1 Elementary Row Operations
An elementary row operation is an operation that transforms a system of linear equations into a different, but equivalent, system of linear equations, where "equivalent" means that the two systems have identical solutions. The answer to the obvious question "Why bother transforming one system of linear equations into another?" is that the new system might be simpler to solve than the original system. In fact there are some systems of linear equations that are extremely simple to solve, and it turns out that by systematically applying a sequence of elementary row operations, we can transform any system of linear equations into an equivalent system whose solutions are very simple to find.
Definition 1.2. (Elementary Row Operations)
An elementary row operation is one of the following three types of
transformation applied to a system of linear equations:
Type 1 Interchanging two equations.
Type 2 Multiplying an equation by a non-zero scalar.
Type 3 Adding a multiple of one equation to another equation.
In a system of linear equations, we let R_i denote the i-th equation, and so we can express an elementary row operation symbolically as follows:

R_i ↔ R_j           Exchange equations R_i and R_j
R_i ← λR_i          Multiply equation R_i by λ
R_i ← R_i + λR_j    Add λ times R_j to R_i
We will illustrate elementary row operations on a simple system of linear equations:

x + 2y + z = 5
      y - z = -1
2x + 3y - z = 3        (1.2)
Example 1.3. (Type 1 Elementary Row Operation) Applying the Type 1 elementary row operation R_1 ↔ R_2 (in words, "interchange equations 1 and 2") to the original system (1.2) yields the system of linear equations:

      y - z = -1
x + 2y + z = 5
2x + 3y - z = 3

It is obvious that this new system of linear equations has exactly the same solutions as the original system, because each individual equation is unchanged, and listing them in a different order does not alter which vectors satisfy them all.
Example 1.4. (Type 2 Elementary Row Operation) Applying the Type 2 elementary row operation R_2 ← 3R_2 (in words, "multiply the second equation by 3") to the original system (1.2) gives a new system of linear equations:

x + 2y + z = 5
    3y - 3z = -3
2x + 3y - z = 3

Again it is obvious that this system of linear equations has exactly the same solutions as the original system. While the second equation is changed, the solutions to this individual equation are not changed. (This relies on the equation being multiplied by a non-zero scalar.)
Example 1.5. (Type 3 Elementary Row Operation) Applying the Type 3 elementary row operation R_3 ← R_3 - 2R_1 (in words, "add -2 times the first equation to the third equation") to the original system (1.2) gives a new system of linear equations:

x + 2y + z = 5
      y - z = -1
    -y - 3z = -7

In this case, it is not obvious that the system of linear equations has the same solutions as the original. In fact, the system is actually different from the original, but it happens to have the exact same set of solutions. This is so important that it needs to be proved.
As foreshadowed in the last example, in order to use elementary row operations with confidence, we must be sure that the set of solutions to a system of linear equations is not changed when the system is altered by an elementary row operation. To convince ourselves of this, we need to prove that applying an elementary row operation to a system of linear equations neither destroys existing solutions nor creates new ones. (A proof in mathematics is a careful explanation of why some mathematical fact is true. A proof normally consists of a sequence of statements, each following logically from the previous statements, where each individual logical step is sufficiently simple that it can easily be checked. The word "proof" often alarms students, but really it is nothing more than a very simple line-by-line explanation of a mathematical statement. Creating proofs of interesting or useful new facts is the raison d'etre of a professional mathematician.)
Theorem 1.6. Suppose that S is a system of linear equations, and that
T is the system of linear equations that results by applying an elementary
row operation to S. Then the set of solutions to S is equal to the set of
solutions to T.
Proof. As discussed in the examples, this is obvious for Type 1 and Type 2 elementary row operations. So suppose that T arises from S by performing the Type 3 elementary row operation R_i ← R_i + λR_j. Then S consists of m equations

S = {R_1, R_2, . . . , R_m}

while T only differs in the i-th equation

T = {R_1, R_2, . . . , R_{i-1}, R_i + λR_j, R_{i+1}, . . . , R_m}.

It is easy to check that if a vector satisfies the two equations R_i and R_j, then it also satisfies R_i + λR_j, and so any solution to S is also a solution to T. What remains to be checked is that any solution to T is a solution to S. However, if a vector satisfies all the equations in T, then it satisfies R_i + λR_j and R_j, and so it satisfies the equation

(R_i + λR_j) + (-λ)(R_j)

which is just R_i. Thus any solution to T also satisfies S.
Now let's consider applying an entire sequence of elementary row operations to our example system of linear equations (1.2) to reduce it to a much simpler form.

So, starting with

x + 2y + z = 5
      y - z = -1
2x + 3y - z = 3
apply the Type 3 elementary row operation R_3 ← R_3 - 2R_1 to get

x + 2y + z = 5
      y - z = -1
    -y - 3z = -7

followed by the Type 3 elementary row operation R_3 ← R_3 + R_2, obtaining

x + 2y + z = 5
      y - z = -1
        -4z = -8

Now notice that the third equation only involves the variable z, and so it can now be solved, obtaining z = 2. The second equation involves just y, z, and as z is now known, it really only involves y, and we get y = 1. Finally, with both y and z known, the first equation only involves x, and by substituting the values that we know into this equation we discover that x = 1. Therefore, this system of linear equations has the unique solution (x, y, z) = (1, 1, 2).
Notice that the final system was essentially trivial to solve, and so the elementary row operations converted the original system into one whose solution was trivial.
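A minimal sketch (assuming NumPy) of the same two row operations, applied to the coefficients and right-hand sides of system (1.2) stored as rows of an array; row indices are 0-based in the code:

    import numpy as np

    M = np.array([[1, 2,  1,  5],
                  [0, 1, -1, -1],
                  [2, 3, -1,  3]], dtype=float)

    M[2] = M[2] - 2*M[0]   # R3 <- R3 - 2R1
    M[2] = M[2] + M[1]     # R3 <- R3 + R2
    print(M)
    # last row becomes [ 0.  0. -4. -8.], i.e. -4z = -8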
1.2.2 The Augmented Matrix
The names of the variables in a system of linear equations are essentially irrelevant: whether we call three variables x, y and z or x_1, x_2 and x_3 makes no fundamental difference to the equations or their solution. Thus writing out each equation in full when writing down a sequence of systems of linear equations related by elementary row operations involves a lot of unnecessary repetition of the variable names. Provided each equation has the variables in the same order, all the information is contained solely in the coefficients, and so these are all we need. Therefore we normally represent a system of linear equations by a matrix known as the augmented matrix of the system of linear equations; each row of the matrix represents a single equation, with the coefficients of the variables to the left of the bar, and the constant term to the right of the bar. Each column to the left of the bar contains all of the coefficients for a single variable. For our example system (1.2), we have the following:

x + 2y + z = 5
      y - z = -1
2x + 3y - z = 3

[ 1  2  1 |  5 ]
[ 0  1 -1 | -1 ]
[ 2  3 -1 |  3 ]
Example 1.7. (From matrix to linear system) What system of linear equations has the following augmented matrix?

[ 0 -1  2 | 3 ]
[ 1  0 -2 | 4 ]
[ 3  4  1 | 0 ]

The form of the matrix tells us that there are three variables, which we can name arbitrarily, say x_1, x_2 and x_3. Then the first row of the matrix corresponds to the equation 0x_1 - 1x_2 + 2x_3 = 3, and interpreting the other two rows analogously, the entire system is

     -x_2 + 2x_3 = 3
x_1       - 2x_3 = 4
3x_1 + 4x_2 + x_3 = 0

We could also have chosen any other three names for the variables.
In other words, the augmented matrix for the system of linear equations Ax = b is just the matrix [A | b].
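A quick illustration (assuming NumPy; not part of the original notes) of forming [A | b] for the system (1.2) used above:

    import numpy as np

    A = np.array([[1, 2,  1],
                  [0, 1, -1],
                  [2, 3, -1]], dtype=float)
    b = np.array([5, -1, 3], dtype=float)

    augmented = np.column_stack([A, b])   # glue b on as an extra column
    print(augmented)
    # [[ 1.  2.  1.  5.]
    #  [ 0.  1. -1. -1.]
    #  [ 2.  3. -1.  3.]]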
1.3 Gaussian Elimination
When solving a system of linear equations using the augmented matrix, the elementary row operations are performed directly on the augmented matrix. (This is why they are called elementary row operations, rather than elementary equation operations: they are always viewed as operating on the rows of the augmented matrix.)

As explained earlier, the aim of the elementary row operations is to put the matrix into a simple form from which it is easy to read off the solutions; to be precise, we need to define exactly the simple form that we are trying to achieve.
Definition 1.8. (Row Echelon Form)
A matrix is in row echelon form if
1. Any rows of the matrix consisting entirely of zeros occur as the last rows of the matrix, and
2. The first non-zero entry of each row is in a column strictly to the right of the first non-zero entry in any of the earlier rows.
This definition is slightly awkward to read, but very easy to grasp by example. Consider the two matrices

[ 1 1 1 2 0 ]      [ 1 1 1 2 0 ]
[ 0 0 2 1 3 ]      [ 0 0 2 1 3 ]
[ 0 0 0 1 0 ]      [ 0 2 0 1 0 ]
[ 0 0 0 0 1 ]      [ 0 0 1 2 1 ]

Neither matrix has any all-zero rows, so the first condition is automatically satisfied. To check the second condition we need to identify the first non-zero entry in each row; this is called the leading entry.

In the first matrix, the leading entries in rows 1, 2, 3 and 4 occur in columns 1, 3, 4 and 5 respectively, and so the leading entry for each row always occurs strictly further to the right than the leading entry in any earlier row. So this first matrix is in row-echelon form. However, for the second matrix, the leading entries in rows 2 and 3 occur in columns 3 and 2 respectively, and so the leading entry in row 3 actually occurs to the left of the leading entry in row 2; hence this matrix is not in row-echelon form.
Example 1.9. (Row-echelon form) The following matrices are all in row-echelon form:

[ 1 0 2 1 ]    [ 1 1 2 3 ]    [ 1 0 0 0 ]
[ 0 0 1 1 ]    [ 0 2 1 1 ]    [ 0 2 1 1 ]
[ 0 0 0 0 ]    [ 0 0 3 0 ]    [ 0 0 0 1 ]
Example 1.10. (Not row-echelon form) None of the following matrices are in row-echelon form:

[ 1 0 2 1 ]    [ 1 1 2 3 ]    [ 1 0 0 0 ]
[ 0 0 0 0 ]    [ 0 2 1 1 ]    [ 0 0 1 1 ]
[ 0 0 1 1 ]    [ 0 1 3 0 ]    [ 0 0 2 1 ]
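The two conditions of Definition 1.8 are easy to test mechanically. The following small Python checker is an illustrative sketch (not part of the notes); it works on a matrix given as a list of rows:

    def is_row_echelon(matrix):
        def leading_col(row):
            for j, entry in enumerate(row):
                if entry != 0:
                    return j
            return None                      # an all-zero row

        last_leading = -1
        seen_zero_row = False
        for row in matrix:
            col = leading_col(row)
            if col is None:
                seen_zero_row = True         # zero rows must all come last
            elif seen_zero_row or col <= last_leading:
                return False                 # a condition of Definition 1.8 fails
            else:
                last_leading = col
        return True

    print(is_row_echelon([[1, 0, 2, 1], [0, 0, 1, 1], [0, 0, 0, 0]]))   # True
    print(is_row_echelon([[1, 0, 2, 1], [0, 0, 0, 0], [0, 0, 1, 1]]))   # False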
Gaussian Elimination (sometimes called row-reduction) is a systematic method for applying elementary row operations to a matrix until it is in row-echelon form. We'll see in the next section that a technique called back substitution, which involves processing the equations in reverse order, can easily determine the set of solutions to a system of linear equations whose augmented matrix is in row-echelon form.

Without further ado, here is the algorithm for Gaussian Elimination, first informally in words, and then more formally in symbols. The algorithm is defined for any matrices, not just the augmented matrices arising from a system of linear equations, because it has many applications.
Definition 1.11. (Gaussian Elimination in words)
Let A be an m × n matrix. At each stage in the algorithm, a particular position in the matrix, called the pivot position, is being processed. Initially the pivot position is at the top-left of the matrix. What happens at each stage depends on whether the pivot entry (that is, the number in the pivot position) is zero or not.
1. If the pivot entry is zero then, if possible, interchange the pivot row with one of the rows below it, in order to ensure that the pivot entry is non-zero. This will be possible unless the pivot entry and every entry below it are zero, in which case simply move the pivot position one column to the right.
2. If the pivot entry is non-zero then, by adding a suitable multiple of the pivot row to every row below the pivot row, ensure that every entry below the pivot entry is zero. Then move the pivot position one column to the right and one row down.
When the pivot position is moved off the matrix, then the process finishes and the matrix will be in row-echelon form.

The process of adding a multiple of the pivot row to every row below it in order to zero out the column below the pivot entry is called pivoting on the pivot entry for short.
Example 1.12. (Gaussian Elimination) Consider the following matrix, with the initial pivot position at the top-left:

A = [ 2 1 2 4 ]
    [ 2 1 1 0 ]
    [ 4 3 2 4 ]

The initial pivot position is the (1, 1) position in the matrix, and the pivot entry is therefore 2. Pivoting on the (1, 1)-entry is accomplished by performing the two elementary row operations R_2 ← R_2 - R_1 and R_3 ← R_3 - 2R_1, leaving the matrix:

[ 2 1  2  4 ]
[ 0 0 -1 -4 ]   R_2 ← R_2 - R_1
[ 0 1 -2 -4 ]   R_3 ← R_3 - 2R_1

(The elementary row operations used are noted down next to the relevant rows to indicate how the row reduction is proceeding.) The new pivot entry is 0, but as the entry immediately under the pivot position is non-zero, interchanging the two rows moves a non-zero entry to the pivot position.

[ 2 1  2  4 ]
[ 0 1 -2 -4 ]   R_2 ↔ R_3
[ 0 0 -1 -4 ]

The next step is to pivot on this entry in order to zero out all the entries below it and then move the pivot position. As the only entry below the pivot is already zero, no elementary row operations need be performed, and the only action required is to move the pivot:

[ 2 1  2  4 ]
[ 0 1 -2 -4 ]
[ 0 0 -1 -4 ]

Once the pivot position reaches the bottom row, there are no further operations to be performed (regardless of whether the pivot entry is zero or not) and so the process terminates, leaving the matrix in row-echelon form

[ 2 1  2  4 ]
[ 0 1 -2 -4 ]
[ 0 0 -1 -4 ]

as required.
For completeness, and to provide a description more suitable for implementing Gaussian elimination on a computer, we give the same algorithm more formally in a sort of pseudo-code. (Pseudo-code is a way of expressing a computer program precisely, but without using the syntax of any particular programming language. In pseudo-code, assignments, loops, conditionals and other features that vary from language to language are expressed in natural language.)
Definition 1.13. (Gaussian Elimination in symbols)
Let A = (a_ij) be an m × n matrix and set two variables r ← 1, c ← 1. (Here r stands for "row" and c for "column", and they store the pivot position.) Then repeatedly perform whichever one of the following operations is possible (only one will be possible at each stage) until either r > m or c > n, at which point the algorithm terminates.
1. If a_rc = 0 and there exists x > r such that a_xc ≠ 0, then perform the elementary row operation R_r ↔ R_x.
2. If a_rc = 0 and a_xc = 0 for all x > r, then set c ← c + 1.
3. If a_rc ≠ 0 then, for each x > r, perform the elementary row operation

R_x ← R_x - (a_xc/a_rc)R_r,

and then set r ← r + 1 and c ← c + 1.
When this algorithm terminates, the matrix will be in row-echelon form.
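As a sketch of how Definition 1.13 might be translated into real code (assuming NumPy; indices are 0-based here, whereas the definition uses 1-based indices), the following function reduces a copy of its input to row-echelon form:

    import numpy as np

    def gaussian_elimination(A):
        A = np.array(A, dtype=float)
        m, n = A.shape
        r, c = 0, 0                            # the pivot position
        while r < m and c < n:
            if A[r, c] == 0:
                below = np.nonzero(A[r:, c])[0]
                if len(below) == 0:            # pivot entry and everything below it are zero
                    c += 1
                    continue
                x = r + below[0]
                A[[r, x]] = A[[x, r]]          # interchange rows R_r and R_x
            for x in range(r + 1, m):          # zero out the entries below the pivot
                A[x] = A[x] - (A[x, c] / A[r, c]) * A[r]
            r += 1
            c += 1
        return A

    M = [[2, 1, 2, 4], [2, 1, 1, 0], [4, 3, 2, 4]]
    print(gaussian_elimination(M))
    # reproduces the row-echelon form found in Example 1.12:
    # [[ 2.  1.  2.  4.]
    #  [ 0.  1. -2. -4.]
    #  [ 0.  0. -1. -4.]]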
1.4 Back Substitution
Recall that the whole point of elementary row operations is to transform a system of linear equations into a simpler system with the same solutions; in other words, to change the problem to an easier problem with the same answer. So after reducing the augmented matrix of a system of linear equations to row-echelon form, we now need a way to read off the solution set.

The first step is to determine whether the system is consistent or otherwise, and this involves identifying the leading entries in each row of the augmented matrix in row-echelon form; these are the first non-zero entries in each row. In other words, run a finger along each row of the matrix, stopping at the first non-zero entry and noting which column it is in.
Example 1.14. (Leading entries) The following two augmented matrices are in row-echelon form; the leading entry of each row is its first non-zero entry.

[ 1 0 1 2 | 3 ]      [ 1 2 -1 2 |  3 ]
[ 0 0 2 1 | 0 ]      [ 0 0  2 1 |  0 ]
[ 0 0 0 0 | 2 ]      [ 0 0  0 1 | -2 ]
[ 0 0 0 0 | 0 ]

In the left-hand matrix the leading entries lie in columns 1, 3 and 5, while in the right-hand matrix they lie in columns 1, 3 and 4.
The left-hand matrix of Example 1.14 has the property that one of the leading entries is on the right-hand side of the augmenting bar. (You may wonder why we keep saying "to the right of the augmenting bar" rather than "in the last column". The answer is that if we have more than one linear system with the same coefficient matrix, say Ax = b_1, Ax = b_2, then we can form a super-augmented matrix [A | b_1 b_2] and solve both systems with one application of Gaussian elimination. So there may be more than one column to the right of the augmenting bar.)
If we unpack what this means for the system of linear equations, then we see that the third row corresponds to the linear equation

0x_1 + 0x_2 + 0x_3 + 0x_4 = 2,

which can never be satisfied. Therefore this system of linear equations has no solutions, or in other words, is inconsistent. This is in fact a defining feature of an inconsistent system of linear equations, a fact that is important enough to warrant stating separately.

Theorem 1.15. A system of linear equations is inconsistent if and only if one of the leading entries in the row-echelon form of the augmented matrix is to the right of the augmenting bar.

Proof. Left to the reader.
The right-hand matrix of Example 1.14 has no such problem, and so we immediately conclude that the system is consistent: it has at least one solution. Every column to the left of the augmenting bar corresponds to one of the variables in the system of linear equations

  x_1 x_2 x_3 x_4
[  1   2  -1   2 |  3 ]
[  0   0   2   1 |  0 ]        (1.3)
[  0   0   0   1 | -2 ]

and so the leading entries identify some of the variables. In this case, the leading entries are in columns 1, 3 and 4, and so the identified variables are x_1, x_3 and x_4. The variables identified in this fashion are called the basic variables (also known as leading variables) of the system of linear equations. The following sentence is the key to understanding solving systems of linear equations by back substitution:

Every non-basic variable of a system of linear equations is a free variable or free parameter of the system of linear equations, while every basic variable can be expressed uniquely as a combination of the free parameters and/or constants.
The process of back-substitution refers to examining the equations in reverse order, and for each equation finding the unique expression for the basic variable corresponding to the leading entry of that row. Let's continue our examination of the right-hand matrix of Example 1.14, also shown with the columns identified in (1.3).

The third row of the matrix, when written out as an equation, says that 1x_4 = -2, and so x_4 = -2, which is an expression for the basic variable x_4 as a constant. The second row of the matrix corresponds to the equation 2x_3 + x_4 = 0, but as we know now that x_4 = -2, this can be substituted in to give 2x_3 - 2 = 0 or x_3 = 1. The first row of this matrix corresponds to the equation

x_1 + 2x_2 - x_3 + 2x_4 = 3

and after substituting in x_3 = 1 and x_4 = -2 this reduces to

x_1 + 2x_2 = 8.        (1.4)
This equation involves one basic variable (that is, x_1) together with a non-basic variable (that is, x_2) and a constant (that is, 8). The rules of back-substitution say that this should be manipulated to give an expression for the basic variable in terms of the other things. So we get

x_1 = 8 - 2x_2

and the entire solution set for this system of linear equations is given by

S = {(8 - 2x_2, x_2, 1, -2) | x_2 ∈ R}.

Therefore we conclude that this system of linear equations has infinitely many solutions that can be described by one free parameter.
The astute reader will notice that (1.4) could equally well be written x_2 = 4 - x_1/2, and so we could use x_1 as the free parameter, rather than x_2; so why does back-substitution need to specify which variable should be chosen as the free parameter? The answer is that there is always an expression for the solution set that uses the non-basic variables as the free parameters. In other words, the process as described will always work.
Example 1.16. (Back substitution) Find the solutions to the system of linear equations whose augmented matrix in row-echelon form is

[ 0 2 -1 0  2 3 | -1 ]
[ 0 0  1 3 -1 0 | -2 ]
[ 0 0  0 0  1 1 |  0 ]
[ 0 0  0 0  0 1 | -5 ]

First identify the leading entries in the matrix, and therefore the basic and non-basic variables. The leading entries of the four rows lie in columns 2, 3, 5 and 6. Therefore the basic variables are x_2, x_3, x_5, x_6 while the free variables are x_1, x_4, and so this system has infinitely many solutions that can be described with two free parameters. Now back-substitute starting from the last equation. The fourth equation is simply that x_6 = -5, while the third equation gives x_5 + x_6 = 0, which after substituting the known value for x_6 gives us x_5 = 5. The second equation is

x_3 + 3x_4 - x_5 = -2

and so it involves the basic variable x_3 along with the free variable x_4 and the already-determined variable x_5. Substituting the known value for x_5 and rearranging to give an expression for x_3, we get

x_3 = 3 - 3x_4        (1.5)

Finally the first equation is

2x_2 - x_3 + 2x_5 + 3x_6 = -1

and so substituting all that we have already determined we get

2x_2 - (3 - 3x_4) + 2(5) + 3(-5) = -1

which simplifies to

x_2 = (7 - 3x_4)/2

What about x_1? It is a variable in the system of linear equations, but it did not actually occur in any of the equations. So if it does not appear in any of the equations, then there are no restrictions on its values and so it can take any value; therefore it is a free variable. Fortunately, the rules for back-substitution have already identified it as a non-basic variable, as it should be. Therefore the final solution set for this system of linear equations is

S = {(x_1, (7 - 3x_4)/2, 3 - 3x_4, x_4, 5, -5) | x_1, x_4 ∈ R}

and therefore we have found an expression with two free parameters, as expected.
Key Concept 1.17. (Solving Systems of Linear Equations)
To solve a system of linear equations of the form Ax = b, perform the following steps:
1. Form the augmented matrix [A | b].
2. Use Gaussian elimination to put the augmented matrix into row-echelon form.
3. Use back-substitution to express each of the basic variables as a combination of the free variables and constants.
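The following is an illustrative sketch (assuming NumPy, 0-based column indices, and an input that is already in row-echelon form) of step 3: it detects inconsistency, reports which variables are basic, and returns the particular solution obtained by setting every free variable to zero. It is not the full parametric description, just a way to see back-substitution as code; the test matrix is the one from Example 1.16.

    import numpy as np

    def back_substitution(aug):
        aug = np.array(aug, dtype=float)
        m, n = aug.shape
        nvars = n - 1
        leading = []                              # (row, column) of each leading entry
        for i in range(m):
            nz = np.nonzero(aug[i])[0]
            if len(nz) == 0:
                continue                          # an all-zero row imposes no condition
            if nz[0] == nvars:                    # leading entry right of the bar
                return None                       # inconsistent (Theorem 1.15)
            leading.append((i, nz[0]))

        basic = [c for _, c in leading]
        x = np.zeros(nvars)                       # free variables stay at 0
        for i, c in reversed(leading):            # work upwards through the rows
            x[c] = (aug[i, -1] - aug[i, c+1:nvars] @ x[c+1:]) / aug[i, c]
        return basic, x

    aug = [[0, 2, -1, 0,  2, 3, -1],
           [0, 0,  1, 3, -1, 0, -2],
           [0, 0,  0, 0,  1, 1,  0],
           [0, 0,  0, 0,  0, 1, -5]]
    print(back_substitution(aug))
    # basic columns [1, 2, 4, 5] and x = [0, 3.5, 3, 0, 5, -5],
    # matching Example 1.16 with x_1 = x_4 = 0.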
1.4.1 Reasoning about systems of linear equations
Understanding the process of Gaussian elimination and back-substitution also allows us to reason about systems of linear equations, even if they are not explicitly defined, and make general statements about the number of solutions to systems of linear equations. One of the most important is the following result, which says that any consistent system of linear equations with more unknowns than equations has infinitely many solutions.

Theorem 1.18. Suppose that Ax = b is a system of m linear equations in n variables. If m < n, then the system is either inconsistent or has infinitely many solutions.
Proof. Consider the row-echelon form of the augmented matrix [A | b]. If the last column contains the leading entry of some row, then the system is inconsistent. Otherwise, each leading entry is in a column corresponding to a variable, and so there are at most m basic variables. As there are n variables altogether, this leaves at least n - m > 0 free parameters in the solution set, and so there are infinitely many solutions.
A homogeneous system of linear equations is one of the form Ax = 0, and these systems are always consistent. (Why is this true?) Thus Theorem 1.18 has the important corollary that every homogeneous system of linear equations with more unknowns than equations has infinitely many solutions.
A second example of reasoning about a system of linear equations rather than just solving an explicit system is when the system is not fully determined. For example, suppose that a and b are unknown values. What can be said about the number of solutions of the following system of linear equations?

[ 1 2 a ] [ x ]   [ 3 ]
[ 0 1 2 ] [ y ] = [ b ]
[ 1 3 3 ] [ z ]   [ 0 ]

In particular, for which values of a and b will this system have 0, 1 or infinitely many solutions?

To answer this, start performing Gaussian elimination as usual, treating a and b symbolically as their values are not known. (Of course, it is necessary to make sure that you never compute anything that might be undefined, such as 1/a. If you need to use 1/a during the Gaussian elimination, then you need to separate out the cases a = 0 and a ≠ 0 and do them separately.) The row reduction proceeds in the following steps: the initial augmented matrix is

[ 1 2 a | 3 ]
[ 0 1 2 | b ]
[ 1 3 3 | 0 ]

and so after pivoting on the top-left position we get

[ 1 2   a  |  3 ]
[ 0 1   2  |  b ]
[ 0 1 3 - a | -3 ]    R_3 ← R_3 - R_1

and then

[ 1 2   a   |    3    ]
[ 0 1   2   |    b    ]
[ 0 0 1 - a | -3 - b ]    R_3 ← R_3 - R_2

From this matrix, we can immediately see that if a ≠ 1 then 1 - a ≠ 0 and every variable is basic, which means that the system has a unique solution (regardless of the value of b). On the other hand, if a = 1 then either b ≠ -3, in which case the system is inconsistent, or b = -3, in which case there are infinitely many solutions. We can summarise this outcome:

a ≠ 1                 Unique solution
a = 1 and b ≠ -3      No solutions
a = 1 and b = -3      Infinitely many solutions
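If SymPy is available, the same symbolic reduction can be reproduced in a few lines; this is only an illustration of the calculation above, with a and b left as symbols:

    import sympy as sp

    a, b = sp.symbols('a b')
    M = sp.Matrix([[1, 2, a, 3],
                   [0, 1, 2, b],
                   [1, 3, 3, 0]])

    M[2, :] = M[2, :] - M[0, :]   # R3 <- R3 - R1
    M[2, :] = M[2, :] - M[1, :]   # R3 <- R3 - R2
    print(M)
    # the third row is now (0, 0, 1 - a, -b - 3), exactly as in the text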
2 Vector Spaces and Subspaces
This chapter takes the first steps away from the geometric interpretation of vectors in familiar 2- or 3-dimensional space by introducing n-dimensional vectors and the vector space R^n, which must necessarily be described and manipulated algebraically.

Before commencing this chapter, students should be able to:
Solve systems of linear equations.

After completing this chapter, students will be able to:
Determine when a set of vectors is a subspace, and
Determine when a set of vectors is linearly independent, and
Find a basis for a subspace, and hence determine its dimension.
2.1 The vector space R^n

The vector space R^n consists of all the n-tuples of real numbers, which henceforth we call vectors; formally we say that

R^n = {(x_1, x_2, . . . , x_n) | x_1, x_2, . . . , x_n ∈ R}.

Thus R^2 is just the familiar collection of pairs of real numbers that we usually visualise by identifying each pair (x, y) with the point (x, y) on the Cartesian plane, and R^3 the collection of triples of real numbers that we usually identify with 3-space.
A vector u = (u_1, . . . , u_n) ∈ R^n may have different meanings:

when n = 2 or 3 it could represent a geometric vector in R^n that has both a magnitude and a direction;
when n = 2 or 3 it could represent the coordinates of a point in the Cartesian plane or in 3-space;
it could represent certain quantities, e.g. u_1 apples, u_2 pears, u_3 oranges, u_4 bananas, . . .
it may simply represent a string of real numbers.

The vector space R^n also has two operations that can be performed on vectors, namely vector addition and scalar multiplication.
Although their definitions are intuitively obvious, we give them anyway:

Definition 2.1. (Vector Addition)
If u = (u_1, u_2, . . . , u_n) and v = (v_1, v_2, . . . , v_n) are vectors in R^n then their sum u + v is defined by

u + v = (u_1 + v_1, u_2 + v_2, . . . , u_n + v_n).

In other words, two vectors are added coordinate-by-coordinate.

Definition 2.2. (Scalar Multiplication)
If v = (v_1, v_2, . . . , v_n) and λ ∈ R then the product λv is defined by

λv = (λv_1, λv_2, . . . , λv_n).

In other words, each coordinate of the vector is multiplied by the scalar.

Example 2.3. Here are some vector operations:

(1, 2, 1, 3) + (4, 0, -1, 2) = (5, 2, 0, 5)
(3, 1, -2) + (6, -1, 4) = (9, 0, 2)
5(1, 0, 1, 2) = (5, 0, 5, 10)
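The same operations can be carried out with NumPy 1-D arrays (an illustrative sketch, assuming NumPy is installed), where + and scalar multiplication act coordinate-by-coordinate exactly as in Definitions 2.1 and 2.2:

    import numpy as np

    print(np.array([1, 2, 1, 3]) + np.array([4, 0, -1, 2]))   # [5 2 0 5]
    print(np.array([3, 1, -2]) + np.array([6, -1, 4]))        # [9 0 2]
    print(5 * np.array([1, 0, 1, 2]))                         # [ 5  0  5 10]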
Row or column vectors?

A vector in R^n is simply an ordered n-tuple of real numbers, and for many purposes all that matters is that we write it down in such a way that it is clear which is the first coordinate, the second coordinate and so on.

However a vector can also be viewed as a matrix, which is very useful when we use matrix algebra (the subject of Chapter 3) to manipulate equations involving vectors, and then a choice has to be made whether to use a 1 × n matrix (i.e. a matrix with one row and n columns) or an n × 1 matrix (i.e. a matrix with n rows and 1 column) to represent the vector.

Thus a vector in R^4 can be represented either as a row vector such as

[1, 2, 3, 4]

or as a column vector such as

[ 1 ]
[ 2 ]
[ 3 ]
[ 4 ]

For various reasons, mostly to do with the conventional notation we use for functions (that is, we usually write f(x) rather than (x)f), it is more convenient mathematically to assume that vectors are represented as column vectors most of the time. Unfortunately, in writing about mathematics, trying to typeset a row vector such as [1, 2, 3, 4] is much more convenient than typesetting a column vector such as

[ 1 ]
[ 2 ]
[ 3 ]
[ 4 ]

which, as you can see, leads to ugly and difficult to read paragraphs.

Some authors try to be very formal and use the notation for a matrix transpose (see Chapter 3) to allow them to elegantly typeset a column vector: so their text would read something like "Let v = (1, 2, 3, 4)^T", in which case everyone is clear that the vector v is really a column vector.

In practice however, either the distinction between a row- and column-vector is not important (e.g. adding two vectors together) or it is obvious from the context; in either case there is never any actual confusion caused by the difference. So to reduce the notational overload of adding a slew of transpose symbols that are almost never needed, in these notes we've decided to write all vectors just as rows, but with the understanding that when it matters (in matrix equations), they are really to be viewed as column vectors. In this latter case, it will always be obvious from the context that the vectors must be column vectors anyway!
One vector plays a special role in linear algebra; the vector in R^n with all components equal to zero is called the zero-vector and denoted

0 = (0, 0, . . . , 0).

It has the obvious properties that v + 0 = 0 + v = v; this means that it is an additive identity. (This is just formal mathematical terminology for saying that you can add it to any vector without altering that vector.)
2.2 Subspaces
In the study of 2- and 3-dimensional geometry, figures such as lines and planes play a particularly important role and occur in many different contexts. In higher-dimensional and more general vector spaces, a similar role is played by a vector subspace, or just subspace, which is a set of vectors that has three special additional properties.

Definition 2.4. (Vector Subspace)
Let S ⊆ R^n be a set of vectors. Then S is called a subspace of R^n if
(S1) 0 ∈ S, and
(S2) u + v ∈ S for all vectors u, v ∈ S, and
(S3) λv ∈ S for all scalars λ ∈ R and vectors v ∈ S.
First we'll go through these three conditions in turn and see what they are saying. The first condition, (S1), simply says that a subspace must contain the zero vector 0; when this condition does not hold, it is an easy way to show that a given set of vectors is not a subspace.

Example 2.5. The set of vectors S = {(x, y) | x + y = 1} is not a subspace of R^2 because the vector 0 = (0, 0) does not belong to S.
The second condition, (S2), says that in order to be a subspace, a set S of vectors must be closed under vector addition. This means that if two vectors that are both in S are added together, then their sum must remain in S. (Condition (S2) does not restrict what happens to the sum of two vectors that are not in S, or the sum of a vector in S and one not in S. It is only concerned with the sum of two vectors that are both in S.)

Example 2.6. (Subspace) In R^3, the xy-plane is the set of all vectors of the form (x, y, 0) (where x and y can be anything). The xy-plane is closed under vector addition because if we add any two vectors in the xy-plane together, then the resulting vector also lies in the xy-plane. (In the next section, we'll see how to present a formal proof that a set of vectors is closed under vector addition, but for these examples, geometric intuition is enough to see that what is being claimed is true.)
Example 2.7. (Not a subspace) In R^2, the unit disk is the set of vectors

{(x, y) | x^2 + y^2 ≤ 1}.

This set of vectors is not closed under vector addition because if we take u = (1, 0) and v = (0, 1), then both u and v are in the unit disk, but their sum u + v = (1, 1) is not in the unit disk.
The third condition, (S3), says that in order to qualify as a subspace, a set S of vectors must be closed under scalar multiplication, meaning that if a vector is contained in S, then all of its scalar multiples must also be contained in S.

Example 2.8. (Closed under scalar multiplication) In R^2, the set of vectors on the two axes, namely

S = {(x, y) | xy = 0},

is closed under scalar multiplication, because it is clear that any multiple of a vector on the x-axis remains on the x-axis, and any multiple of a vector on the y-axis remains on the y-axis.

Example 2.9. (Not closed under scalar multiplication) In R^2, the unit disk, which was defined in Example 2.7, is not closed under scalar multiplication because if we take u = (1, 0) and λ = 2, then λu = (2, 0), which is not in the unit disk.
One of the fundamental skills needed in linear algebra is the ability to identify whether a given set of vectors in R^n forms a subspace or not. Usually a set of vectors will be described in some way, and you will need to be able to tell whether this set of vectors is a subspace. To prove that a given set of vectors is a subspace, it is necessary to show that all three conditions (S1), (S2) and (S3) are satisfied, while to show that a set of vectors is not a subspace, it is only necessary to show that one of the three conditions is not satisfied.
2.2.1 Subspace proofs
In this subsection, we consider in more detail how to show whether or not a given set of vectors is a subspace. It is much easier to show that a set of vectors is not a subspace than to show that a set of vectors is a subspace. The reason for this is that conditions (S2) and (S3) apply to every pair of vectors in the given set. To show that either condition fails, we only need to give a single example where the condition does not hold, but to show that they are true, we need to find a general argument that applies to every pair of vectors. This asymmetry is so important that we give it a name. (The "black swan" name comes from the famous notion that in order to prove or disprove the logical statement "all swans are white", it would only be necessary to find one single black swan to disprove it, but it would be necessary to check every possible swan in order to prove it. Subspaces are the same: if a set of vectors is not a subspace, then it is only necessary to find one "black swan" showing that one of the conditions does not hold, but if it is a subspace then it is necessary to "check every swan" by proving that the conditions hold for every pair of vectors and scalars.)
Key Concept 2.10. (The Black Swan concept)
To show that a set of vectors S is not closed under vector addition, it is sufficient to find a single explicit example of two vectors u, v that are contained in S, but whose sum u + v is not contained in S. However, to show that a set of vectors S is closed under vector addition, it is necessary to give a formal symbolic proof that applies to every pair of vectors in S.
Example 2.11. (Not a subspace) The set S = {(w, x, y, z) | wx = yz} in R^4 is not a subspace because if u = (1, 0, 2, 0) and v = (0, 1, 0, 2), then u, v ∈ S but u + v = (1, 1, 2, 2) is not in S, so condition (S2) does not hold.
One of the hardest techniques for first-time students of linear algebra is to understand how to structure a proof that a set of vectors is a subspace, so we'll go slowly. Suppose that

S = {(x, y, z) | x - y = 2z}

is a set of vectors in R^3, and we need to check whether or not it is a subspace. Here is a model proof, interleaved with some discussion about the proof. (When you do your proofs it may help to structure them like this model proof, but don't include the discussion; this is to help you understand why the model proof looks like it does, but it is not part of the proof itself.)

(S1) It is obvious that

0 - 0 = 2(0)

and so 0 ∈ S.

Discussion: To check that 0 is in S, it is necessary to verify that the zero vector satisfies the defining condition that determines S. In this case, the defining condition is that the difference of the first two coordinates (that is, x - y) is equal to twice the third coordinate (that is, 2z). For the vector 0 = (0, 0, 0) we have all coordinates equal to 0, and so the condition is true.
(S2) Let u = (u_1, u_2, u_3) ∈ S and v = (v_1, v_2, v_3) ∈ S. Then

u_1 - u_2 = 2u_3        (2.1)
v_1 - v_2 = 2v_3        (2.2)

Discussion: To prove that S is closed under addition, we need to check every possible pair of vectors, which can only be done symbolically. We give symbolic names u and v to two vectors in S and write down the only facts that we currently know, namely that they satisfy the defining condition for S. These equations are given labels, in this case (2.1) and (2.2), because the proof must refer to these equations later.

Now consider the sum

u + v = (u_1 + v_1, u_2 + v_2, u_3 + v_3)

and test it for membership in S. As

(u_1 + v_1) - (u_2 + v_2) = u_1 + v_1 - u_2 - v_2        (rearranging)
                          = (u_1 - u_2) + (v_1 - v_2)    (rearranging)
                          = 2u_3 + 2v_3                  (by (2.1) and (2.2))
                          = 2(u_3 + v_3)                 (rearranging terms)

it follows that u + v ∈ S.

Discussion: To show that u + v is in S, we need to show that the difference of its first two coordinates is equal to twice its third coordinate. So the sequence of calculations starts with the difference of the first two coordinates and then carefully manipulates this expression in order to show that it is equal to twice the third coordinate. Every stage of the manipulation is justified, either just as a rearrangement of the terms or by reference to some previously known fact. At some stage in the manipulation, the proof must use the two equations (2.1) and (2.2), because the result must depend on the two original vectors being vectors in S.
(S3) Let u = (u_1, u_2, u_3) ∈ S and λ ∈ R. Then

u_1 - u_2 = 2u_3        (2.3)

Discussion: To prove that S is closed under scalar multiplication, we need to check every vector in S and scalar in R. We give the symbolic name u to the vector in S and λ to the scalar, and note down the only fact that we currently know, namely that u satisfies the defining condition for S. We'll need this fact later, and so give it a name, in this case (2.3).

Now consider the vector

λu = (λu_1, λu_2, λu_3)

and test it for membership in S. As

λu_1 - λu_2 = λ(u_1 - u_2)        (rearranging)
            = λ(2u_3)             (by (2.3))
            = 2(λu_3)

it follows that λu ∈ S.

Discussion: To show that λu is in S, we need to show that the difference of its first two coordinates is equal to twice its third coordinate. So the sequence of calculations starts with the difference of the first two coordinates and then carefully manipulates it in order to show that it is equal to twice the third coordinate. At some stage in the manipulation, the proof must use equation (2.3), because the result must depend on the original vector being a member of S.
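A numerical spot-check can complement, but never replace, the symbolic proof above: random trials can expose a set that is not closed, but they cannot prove closure for every vector. The following sketch (plain Python, illustrative only) tests the conditions for the set S = {(x, y, z) | x - y = 2z} on randomly generated members of S:

    import random

    def in_S(v):                       # the defining condition x - y = 2z
        x, y, z = v
        return abs((x - y) - 2*z) < 1e-9

    random.seed(0)
    for _ in range(1000):
        u = [random.uniform(-10, 10), random.uniform(-10, 10), 0.0]
        v = [random.uniform(-10, 10), random.uniform(-10, 10), 0.0]
        u[2] = (u[0] - u[1]) / 2       # choose z so that u lies in S
        v[2] = (v[0] - v[1]) / 2       # likewise for v
        s = [u[i] + v[i] for i in range(3)]
        lam = random.uniform(-10, 10)
        assert in_S(s) and in_S([lam * x for x in u])
    print("no counterexample found in 1000 random trials")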
It will take quite a bit of practice to be able to write this sort of proof correctly, so do not get discouraged if you find it difficult at first. Here are some examples to try out.
Example 2.12. These sets of vectors are subspaces:

1. The set of vectors {(w, x, y, z) | w + x + y + z = 0} in R^4.
2. The xy-plane in R^3.
3. The line x = y in R^2.
4. The set of vectors {(x_1, x_2, . . . , x_n) | x_1 + x_2 + . . . + x_{n-1} = x_n} in R^n.

while these sets of vectors are not subspaces:

1. The set of vectors {(w, x, y, z) | w + x + y + z = 1} in R^4.
2. The plane normal to n = (1, 2, 1) passing through the point (1, 1, 1).
3. The line x = y - 1 in R^2.
4. The set of vectors {(x_1, x_2, . . . , x_n) | x_1 + x_2 + . . . + x_{n-1} ≤ x_n} in R^n for n ≥ 2.
2.2.2 Exercises

1. Show that a line in R^2 is a subspace if and only if it passes through the origin (0, 0).
2. Find a set of vectors in R^2 that is closed under vector addition, but not closed under scalar multiplication.
3. Find a set of vectors in R^2 that is closed under scalar multiplication, but not closed under vector addition.
2.3 Spans and Spanning Sets

We start this section by considering a simple question:

What is the smallest subspace S of R^2 containing the vector v = (1, 2)?

If S is a subspace, then by condition (S1) of Definition 2.4 it must also contain the zero vector, so it follows that S must contain at least the two vectors 0, v. But by condition (S2), the subspace S must also contain the sum of any two vectors in S, and so therefore S must also contain all the vectors

(2, 4), (3, 6), (4, 8), (5, 10), . . .

But then, by condition (S3), it follows that S must also contain all the multiples of v, such as

(-1, -2), (1/2, 1), (1/4, 1/2), . . .

Therefore S must contain at least the set of vectors

{(λ, 2λ) | λ ∈ R}

and in fact this set contains enough vectors to satisfy the three conditions (S1), (S2) and (S3), and so it is the smallest subspace containing v. (So if a subspace contains a vector v then it must contain every scalar multiple of v. In R^2 and R^3, this means that if a subspace contains a point, then it contains the line containing the origin and that point.)

Now let's extend this result by considering the same question but with a bigger starting set of vectors: suppose that A = {v_1, v_2, . . . , v_k} is a set of vectors in R^n; what is the smallest subspace of R^n that contains A?

To answer this, we need a couple more definitions:
Definition 2.13. (Linear Combination)
Let A = {v_1, v_2, . . . , v_k} be a set of vectors in R^n. Then a linear combination of the vectors in A is any vector of the form

v = α_1 v_1 + α_2 v_2 + ... + α_k v_k

where α_1, α_2, . . ., α_k ∈ R are arbitrary scalars.
By slightly modifying the argument of the last paragraph, it should be clear that if a subspace contains the vectors v_1, v_2, . . ., v_k, then it also contains every linear combination of those vectors. As we will frequently need to refer to the set of all possible linear combinations of a set of vectors, we should give it a name:
Definition 2.14. (Span)
The span of A = {v_1, v_2, . . . , v_k} is the set of all possible linear combinations of the vectors in A, and is denoted span(A). In symbols,

span(A) = {α_1 v_1 + α_2 v_2 + ... + α_k v_k | α_i ∈ R, 1 ≤ i ≤ k}.
Therefore, if a subspace S contains a subset A ⊆ S then it also contains the span of A. To answer the original question (what is the smallest subspace containing A?) it is enough to notice that the span of A is always a subspace itself, and so no further vectors need to be added. This is sufficiently important to write out formally as a theorem and to give a formal proof.

Theorem 2.15. (Span of anything is a subspace) Let A = {v_1, v_2, . . . , v_k} be a set of vectors in R^n. Then span(A) is a subspace of R^n, and is the smallest subspace of R^n containing A.
Proof. We must show that the three conditions of Definition 2.4 hold.

(S1) It is clear that

0 = 0v_1 + 0v_2 + ... + 0v_k

and so 0 is a linear combination of the vectors in A, and thus 0 ∈ span(A).

(S2) Let u, v ∈ span(A). Then there are scalars α_i, β_i (1 ≤ i ≤ k) such that

u = α_1 v_1 + α_2 v_2 + ... + α_k v_k        (2.4)
v = β_1 v_1 + β_2 v_2 + ... + β_k v_k        (2.5)

Now consider the sum u + v:

u + v = (α_1 v_1 + α_2 v_2 + ... + α_k v_k)
        + (β_1 v_1 + β_2 v_2 + ... + β_k v_k)                  (by (2.4), (2.5))
      = (α_1 + β_1) v_1 + (α_2 + β_2) v_2 + ... + (α_k + β_k) v_k

and so u + v ∈ span(A). (This seems like a lot of work just to say something that is almost obvious: if you take two linear combinations of a set of vectors and add them together, then the resulting vector is also a linear combination of the original set of vectors!)

(S3) Let v ∈ span(A) and λ ∈ R. Then there are scalars α_i (1 ≤ i ≤ k) such that

v = α_1 v_1 + α_2 v_2 + ... + α_k v_k        (2.6)

It is clear that

λv = λ(α_1 v_1 + α_2 v_2 + ... + α_k v_k)          (by (2.6))
   = (λα_1)v_1 + (λα_2)v_2 + ... + (λα_k)v_k       (rearranging)

and so λv ∈ span(A).

The arguments earlier in this section showed that any subspace containing A also contains span(A), and as span(A) is a subspace itself, it must be the smallest subspace containing A.
The span of a set of vectors gives us an easy way to find subspaces: start with any old set of vectors, take their span, and we get a subspace. If we do have a subspace given in this way, then what can we say about it? Is this a useful way to create, or work with, a subspace?

For example, suppose we start with

A = {(1, 0, 1), (3, 2, 3)}

as a set of vectors in R^3. Then span(A) is a subspace of R^3; what can we say about this subspace? The first thing to notice is that we can easily check whether or not a particular vector is contained in span(A), because it simply involves solving a system of linear equations. (This shows that having a subspace of the form span(A) is a good representation of the subspace, because we can easily test membership of the subspace; that is, we know which vectors are contained in the subspace.)
Continuing our example, to decide whether a vector v is contained in span(A), we try to solve the vector equation

v = α_1(1, 0, 1) + α_2(3, 2, 3)

for the two unknowns α_1, α_2; this is a system of 3 linear equations in two unknowns and, as discussed in Chapter 1, can easily be solved.
Example 2.16. (Vector not in span) If A = {(1, 0, 1), (3, 2, 3)}, then v = (2, 4, 5) is not in span(A). This follows because the equation

(2, 4, 5) = α_1(1, 0, 1) + α_2(3, 2, 3)

yields the system of linear equations

α_1 + 3α_2 = 2
      2α_2 = 4
α_1 + 3α_2 = 5

which is obviously inconsistent. Thus there is no linear combination of the vectors in A that is equal to (2, 4, 5).
Example 2.17. (Vector in span) If A = {(1, 0, 1), (3, 2, 3)}, then v = (5, -2, 5) is in span(A). This follows because the equation

(5, -2, 5) = α_1(1, 0, 1) + α_2(3, 2, 3)

yields the system of linear equations

α_1 + 3α_2 = 5
      2α_2 = -2
α_1 + 3α_2 = 5

which has the unique solution α_2 = -1 and α_1 = 8. Therefore v ∈ span(A) because we have now found the particular linear combination required.
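Membership in span(A) can also be tested numerically. The sketch below (assuming NumPy; it is not the Gaussian-elimination method of Chapter 1, but a library least-squares routine) looks for coefficients and then checks whether they actually reproduce v:

    import numpy as np

    def in_span(vectors, v):
        A = np.column_stack(vectors)              # each spanning vector is a column
        coeffs, *_ = np.linalg.lstsq(A, np.array(v, dtype=float), rcond=None)
        return np.allclose(A @ coeffs, v), coeffs

    a1, a2 = (1, 0, 1), (3, 2, 3)
    print(in_span([a1, a2], (2, 4, 5)))     # False: Example 2.16
    print(in_span([a1, a2], (5, -2, 5)))    # True, coefficients about 8 and -1: Example 2.17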
Continuing with A = {(1, 0, 1), (3, 2, 3)}, is there another description of the subspace span(A)? It is easy to see that every vector in span(A) must have its first and third coordinates equal, and by trying a few examples, it seems likely that every vector with first and third coordinates equal is in span(A). To prove this, we would need to demonstrate that a suitable linear combination of the two vectors can be found for any such vector. In other words, we need to show that the equation

(x, y, x) = α_1(1, 0, 1) + α_2(3, 2, 3)

has a solution for all values of x and y. Fortunately this system of linear equations can easily be solved symbolically, with the result that the system is always consistent with solution

α_1 = x - 3y/2,    α_2 = y/2.

Therefore we have the fact that

span({(1, 0, 1), (3, 2, 3)}) = {(x, y, x) | x, y ∈ R}
2.3.1 Spanning Sets
So far in this section, we have started with a small collection of vectors (that is, the set A), and then built a subspace (that is, span(A)) from that set of vectors. Why do we want to find a spanning set of vectors for a subspace? The answer is that a spanning set is an effective way of describing a subspace. Every subspace has a spanning set, and so it is also a universal way of describing a subspace. Basically, once you know a spanning set for a subspace, you can easily calculate everything about that subspace.
Now we consider the situation where we start with an arbitrary subspace V and try to find a set, hopefully a small set, of vectors A such that V = span(A). This concept is sufficiently important to warrant a formal definition:
Definition 2.18. (Spanning Set)
Let V ⊆ R^n be a subspace. Then a set A = {v_1, v_2, ..., v_k} of vectors, each contained in V, is called a spanning set for V if
V = span(A).
Example 2.19. (Spanning Set) If V = {(x, y, 0) | x, y ∈ R}, then V is a subspace of R^3. The set A = {(1, 0, 0), (0, 1, 0)} is a spanning set for V, because every vector in V is a linear combination of the vectors in A, and every linear combination of the vectors in A is in V. There are other spanning sets for V; for example, the set B = {(1, 1, 0), (1, -1, 0)} is another spanning set for V.
Example 2.20. (Spanning Set) If V = {(w, x, y, z) | w + x + y + z = 0}, then V is a subspace of R^4. The set
A = {(1, -1, 0, 0), (1, 0, -1, 0), (1, 0, 0, -1)}
is a spanning set for V because every vector in V is a linear combination of the vectors in A, and every linear combination of the vectors in A is in V. There are many other spanning sets for V; for example, the set
{(2, -1, -1, 0), (1, 0, -1, 0), (1, 0, 0, -1)}
is a different spanning set for the same subspace.
One critical point that often causes difficulty for students beginning linear algebra is understanding the difference between "span" and "spanning set"; the similarity in the phrases seems to cause confusion. To help overcome this, we emphasise the difference. [8]
[8] Another way to think of it is that a spanning set is like a list of LEGO shapes that you can use to build a model, while the span is the completed model (the subspace). Finding a spanning set for a subspace is like starting with the completed model and asking "What shapes do I need to build this model?".
Key Concept 2.21. (Difference between span and spanning set) To remember the difference between span and spanning set, make sure you understand that:
The span of a set A of vectors is the entire subspace that can be built from the vectors in A by taking linear combinations in all possible ways.
A spanning set of a subspace V is a set of vectors that are needed in order to build V.
In the previous examples, the spanning sets were just given with no explanation of how they were found, and no proof that they were the correct spanning sets. In order to show that a particular set A actually is a spanning set for a subspace V, it is necessary to check two things:
1. Check that the vectors in A are actually contained in V; this guarantees that span(A) ⊆ V.
2. Check that every vector in V can be made as a linear combination of the vectors in A; this shows that span(A) = V.
The first of these steps is easy and, by now, you will not be surprised to discover that the second step can be accomplished by solving a system of linear equations [9].
[9] In fact, almost everything in linear algebra ultimately involves nothing more than solving a system of linear equations!
Example 2.22. (Spanning Set With Proof) We show that the set A = {(1, 1, -1), (2, 1, 1)} is a spanning set for the subspace
V = {(x, y, z) | z = 2x - 3y} ⊆ R^3.
First notice that both (1, 1, -1) and (2, 1, 1) satisfy the condition that z = 2x - 3y and so are actually in V. Now we need to show that every vector in V is a linear combination of these two vectors. Any vector in V has the form (x, y, 2x - 3y) and so we need to show that the vector equation
(x, y, 2x - 3y) = λ_1 (1, 1, -1) + λ_2 (2, 1, 1)
in the two unknowns λ_1 and λ_2 is consistent, regardless of the values of x and y. Writing this out as a system of linear equations we get
λ_1 + 2λ_2 = x
λ_1 + λ_2 = y
-λ_1 + λ_2 = 2x - 3y
Solving this system of linear equations using the techniques of the previous chapter shows that this system always has a unique solution, namely
λ_1 = 2y - x
λ_2 = x - y.
Hence every vector in V can be expressed as a linear combination of the two vectors, showing that these two vectors are a spanning set for V.
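The symbolic solution quoted in Example 2.22 can be checked with a computer algebra system. The following minimal sketch uses Python with SymPy (an illustrative choice of tool, not part of the course notes) to solve the same system with x and y left as parameters.

    import sympy as sp

    x, y, l1, l2 = sp.symbols('x y lambda1 lambda2')

    # The three equations of Example 2.22, written as expressions equal to zero.
    system = [l1 + 2*l2 - x,
              l1 + l2 - y,
              -l1 + l2 - (2*x - 3*y)]

    print(sp.linsolve(system, l1, l2))   # {(2*y - x, x - y)}, consistent for all x, y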
Actually finding a spanning set for a subspace is not so difficult, because it can just be built up vector-by-vector. Suppose that a subspace V is given in some form (perhaps by a formula) and you need to find a spanning set for V. Start by just picking any non-zero vector v_1 ∈ V, and examine span(v_1); if this is equal to V, then you have finished, otherwise there are some vectors in V that cannot yet be built just from v_1. Choose one of these unreachable vectors, say v_2, add it to the set you are creating, and then examine span(v_1, v_2) to see if this is equal to V. If it is, then you are finished, and otherwise there is another unreachable vector, which you call v_3 and add to the set, and so on. After some finite number of steps (say k steps), this process will eventually terminate [10] when there are no more unreachable vectors in V, in which case
V = span(v_1, v_2, ..., v_k)
and you have found a spanning set for V.
[10] The reason that this process must terminate (i.e. not go on for ever) will become clear over the next few sections.
Example 2.23. (Finding Spanning Set) Let
V = {(x, y, z) | z = 2x - 3y},
which is a subspace of R^3. To find a spanning set for V, we start by choosing any non-zero vector that lies in V, say v_1 = (1, 0, 2). It is clear that span(v_1) is strictly smaller than V, because every vector in span(v_1) has a zero second coordinate, whereas there are vectors in V that do not have this property. So we choose any one of these, say v_2 = (0, 1, -3), and now consider span(v_1, v_2), which we can now prove is actually equal to V. Thus, a suitable spanning set for V is the set
A = {(1, 0, 2), (0, 1, -3)}.
2.4 Linear Independence
In the last section, we learned how a subspace can always be described by giving a spanning set for that subspace. There are many spanning sets for any given subspace, and in this section we consider when a spanning set is efficient, in the sense that it is as small as it can be. For example, here are three different spanning sets for the xy-plane in R^3 (remember that the xy-plane is the set of vectors {(x, y, 0) | x, y ∈ R}).
A_1 = {(1, 0, 0), (0, 1, 0)}
A_2 = {(1, 1, 0), (1, -1, 0), (1, 3, 0)}
A_3 = {(2, 2, 0), (1, 2, 0)}
Which of these is the best spanning set to use? There is perhaps nothing much to choose between A_1 and A_3, because each of them contains two vectors [11], but it is clear that A_2 is unnecessarily big: if we throw out any of the three vectors in A_2, then the remaining two vectors still span the same subspace. On the other hand, both of the spanning sets A_1 and A_3 are minimal spanning sets for the xy-plane, in that if we discard any of the vectors, then the remaining set no longer spans the whole xy-plane.
[11] But maybe A_1 looks more natural because the vectors have such a simple form; later we will see that for many subspaces there is a natural spanning set, although this is not always the case.
The reason that A_2 is not a smallest-possible spanning set for the xy-plane is that the third vector is redundant: it is already a linear combination of the first two vectors, (1, 3, 0) = 2(1, 1, 0) - (1, -1, 0), and therefore any vector in span(A_2) can be produced as a linear combination only of the first two vectors. More precisely, any linear combination
λ_1 (1, 1, 0) + λ_2 (1, -1, 0) + λ_3 (1, 3, 0)
of all three of the vectors in A_2 can be rewritten as
λ_1 (1, 1, 0) + λ_2 (1, -1, 0) + λ_3 (2(1, 1, 0) - (1, -1, 0))
which is equal to
(λ_1 + 2λ_3)(1, 1, 0) + (λ_2 - λ_3)(1, -1, 0)
which is just a linear combination of the first two vectors with altered scalars.
Therefore a spanning set for a subspace is an efficient way to represent a subspace if none of the vectors in the spanning set is a linear combination of the other vectors. While this condition is easy to state, it is hard to work with directly, and so we use a condition that means exactly the same thing but is easier to use.
Definition 2.24. (Linear Independence)
Let A = {v_1, v_2, ..., v_k} be a set of vectors in R^n. Then A is called linearly independent (or just independent) if the only solution to the vector equation
λ_1 v_1 + λ_2 v_2 + ... + λ_k v_k = 0
in the unknowns λ_1, λ_2, ..., λ_k is the trivial solution λ_1 = λ_2 = ... = λ_k = 0.
Before seeing why this somewhat strange definition means exactly the same as having no one of the vectors being a linear combination of the others, we'll see a couple of examples. The astute reader (indeed, even the somewhat dull and apathetic reader) will not be surprised to learn that testing a set of vectors for linear independence involves solving a system of linear equations.
Example 2.25. (Independent Set) In order to decide whether the set of vectors A = {(1, 1, 2, 2), (1, 0, 1, 2), (2, 1, 3, 1)} in R^4 is linearly independent, we need to solve the vector equation
λ_1 (1, 1, 2, 2) + λ_2 (1, 0, 1, 2) + λ_3 (2, 1, 3, 1) = (0, 0, 0, 0).
This definitely has at least one solution, namely the trivial solution λ_1 = 0, λ_2 = 0, λ_3 = 0, and so the only question is whether it has more solutions. The vector equation is equivalent to the system of linear equations
λ_1 + λ_2 + 2λ_3 = 0
λ_1 + λ_3 = 0
2λ_1 + λ_2 + 3λ_3 = 0
2λ_1 + 2λ_2 + λ_3 = 0
which can easily be shown, by the techniques of Chapter 1, to have a unique solution.
A set of vectors that is not linearly independent is called dependent. To show that a set of vectors is dependent, it is only necessary to find an explicit non-trivial linear combination [12] of the vectors equal to 0.
[12] There is an asymmetry here similar to the asymmetry in subspace proofs. To show that a set of vectors is dependent only requires one non-trivial linear combination, whereas to show that a set of vectors is independent it is necessary in principle to show that every non-trivial linear combination of the vectors is non-zero. Of course in practice this is done by solving the relevant system of linear equations and showing that it has a unique solution, which must therefore be the trivial solution.
Example 2.26. (Dependent Set) Is the set
A = {(1, 3, -1), (2, 1, 2), (4, 7, 0)}
in R^3 linearly independent? To decide this, set up the vector equation
λ_1 (1, 3, -1) + λ_2 (2, 1, 2) + λ_3 (4, 7, 0) = (0, 0, 0)
and check how many solutions it has. This is equivalent to the system of linear equations
λ_1 + 2λ_2 + 4λ_3 = 0
3λ_1 + λ_2 + 7λ_3 = 0
-λ_1 + 2λ_2 = 0
After Gaussian elimination, the augmented matrix for this system of linear equations is
[ 1  2  4 | 0 ]
[ 0 -5 -5 | 0 ]
[ 0  0  0 | 0 ]
and so it has infinitely many solutions, because λ_3 is a free parameter. While this is already enough to prove that A is dependent, it is always useful to find an explicit solution which can then be used to double-check the conclusion. As λ_3 is free, we can find a solution by putting λ_3 = 1, in which case the second row gives λ_2 = -1 and the first row λ_1 = -2. And indeed we can check that
-2(1, 3, -1) - 1(2, 1, 2) + 1(4, 7, 0) = (0, 0, 0)
as required to prove dependence.
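The dependency found in Example 2.26 can be recovered mechanically from the null space of the matrix whose columns are the given vectors. Here is a minimal sketch in Python using SymPy (the library is an illustrative choice, not part of the course notes).

    import sympy as sp

    # The vectors of Example 2.26 placed as the columns of a matrix.
    M = sp.Matrix([[ 1, 2, 4],
                   [ 3, 1, 7],
                   [-1, 2, 0]])

    # A non-zero null space vector gives the coefficients of a dependency.
    print(M.nullspace())               # [Matrix([[-2], [-1], [1]])]
    print(M * sp.Matrix([-2, -1, 1]))  # the zero vector, confirming
                                       # -2(1,3,-1) - (2,1,2) + (4,7,0) = 0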
Now we'll give a rigorous proof of the earlier claim that the definition of linear independence (Definition 2.24) is just a way of saying that none of the vectors is a linear combination of the others.
Theorem 2.27. Let A = {v_1, v_2, ..., v_k} be a set of vectors in R^n. Then A is linearly independent if and only if none of the vectors in A is a linear combination of the others.
Proof. We will actually prove the contrapositive [13] statement, namely that A is linearly dependent if and only if one of the vectors is a linear combination of the others. First suppose that one of the vectors, say v_i, is a linear combination of the others: then there are scalars λ_1, λ_2, ..., λ_{i-1}, λ_{i+1}, ..., λ_k such that
v_i = λ_1 v_1 + λ_2 v_2 + ... + λ_{i-1} v_{i-1} + λ_{i+1} v_{i+1} + ... + λ_k v_k
and so there is a non-trivial linear combination of the vectors of A equal to 0, namely:
λ_1 v_1 + λ_2 v_2 + ... + λ_{i-1} v_{i-1} - 1 v_i + λ_{i+1} v_{i+1} + ... + λ_k v_k = 0
(This linear combination is not all-zero because the coefficient of v_i is equal to -1, which is definitely non-zero.)
[13] Given the statement "A implies B", recall that the contrapositive statement is "not B implies not A". If a statement is true then so is its contrapositive, and vice versa.
Next suppose that the set A is linearly dependent. Then there is some non-trivial linear combination of the vectors equal to 0:
λ_1 v_1 + λ_2 v_2 + ... + λ_k v_k = 0.
Because this linear combination is non-trivial, not all of the coefficients are equal to 0, and so we can pick one of them, say λ_i, that is non-zero. But then
λ_i v_i = -λ_1 v_1 - λ_2 v_2 - ... - λ_{i-1} v_{i-1} - λ_{i+1} v_{i+1} - ... - λ_k v_k,
and so because λ_i ≠ 0 we can divide by λ_i and get
v_i = -(λ_1/λ_i) v_1 - (λ_2/λ_i) v_2 - ... - (λ_{i-1}/λ_i) v_{i-1} - (λ_{i+1}/λ_i) v_{i+1} - ... - (λ_k/λ_i) v_k
so one of the vectors is a linear combination of the others.
There are two key facts about dependency that are intuitively clear, but useful enough to state formally:
1. If A is a linearly independent set of vectors in R^n, then any subset of A is also linearly independent.
2. If B is a linearly dependent set of vectors in R^n, then any superset of B is also linearly dependent.
In other words, you can remove vectors from an independent set and it remains independent, and you can add vectors to a dependent set and it remains dependent.
2.5 Bases
In the last few sections, we have learned that giving a spanning set for a subspace is an effective way of describing a subspace and that a spanning set is efficient if it is linearly independent. Therefore an excellent way to describe or transmit, for example by computer, a subspace is to give a linearly independent spanning set for the subspace. This concept is so important that it has a special name:
Definition 2.28. (Basis)
Let V be a subspace of R^n. Then a basis for V is a linearly independent spanning set for V. In other words, a basis is a set of vectors A ⊆ R^n such that
V = span(A), and
A is linearly independent.
Example 2.29. (Basis) The set A = {(1, 0, 0), (0, 1, 0)} is a basis for the xy-plane in R^3, because it is a linearly independent set of vectors and any vector in the xy-plane can be expressed as a linear combination of the vectors of A.
Example 2.30. (Basis Proof) Let V be the subspace of R^3 defined by V = {(x, y, z) | x - y + 2z = 0}. Then we shall show that A = {(2, 0, -1), (1, 1, 0)} is a basis for V. We check three separate things: that the vectors are actually in V, that they are linearly independent, and that they are a spanning set for V.
1. Check that both vectors are actually in V.
This is true because
2 - 0 + 2(-1) = 0
1 - 1 + 2(0) = 0
2. Check that they are linearly independent.
Theorem 2.27 shows that two vectors are linearly dependent only if one of them is a multiple of the other. As this is not the case here, we conclude that the set is linearly independent.
3. Check that they are a spanning set for V.
Every vector in V can be expressed as a linear combination of the two vectors in A, because all the vectors in V are of the form (x, y, (y - x)/2) with x, y ∈ R, and using the techniques from Chapter 1 we see that
(x, y, (y - x)/2) = ((x - y)/2) (2, 0, -1) + y (1, 1, 0).
Therefore A is a basis for V.
A subspace of R^n can have more than one basis; in fact, a subspace usually has infinitely many different bases. [14]
[14] The word "bases" is the plural of "basis".
Example 2.31. (Two different bases) Let V be the subspace of R^3 defined by V = {(x, y, z) | x - y + 2z = 0}. Then
A = {(2, 0, -1), (1, 1, 0)}
B = {(1, 3, 1), (3, 1, -1)}
are both bases for V. Proving this is left as an exercise.
The vector space R^3 is itself a subspace and so has a basis. In this case, there is one basis that stands out as being particularly natural. It is called the standard basis and contains the three vectors
e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1),
where the standard basis vectors are given the special names e_1, e_2 and e_3. [15] More generally, the vector space R^n has a basis consisting of the n vectors e_1, e_2, ..., e_n, where the i-th basis vector e_i is all-zero except for a single 1 in the i-th position.
[15] In Engineering, the standard basis vectors for R^3 are also known as i, j and k respectively.
Finding a basis from scratch is straightforward, because the technique described before Example 2.23 (and illustrated in the example) for finding spanning sets by adding vectors one-by-one to an independent set will automatically find a linearly independent spanning set, in other words, a basis. In fact, the same argument shows that you can start with any linearly independent set and augment it vector-by-vector to obtain a basis containing the original linearly independent set of vectors [16].
[16] We still have not yet shown that this process will actually terminate, but will do so in the next section.
Another approach to finding a basis of a subspace is to start with a spanning set that is linearly dependent and to remove vectors from it one-by-one. If the set is linearly dependent then one of the vectors is a linear combination of the others, and so it can be removed from the set without altering the span of the set of vectors. This process can be repeated until the remaining vectors are linearly independent, in which case they form a basis.
Example 2.32. (Basis from a spanning set) Let
A = {(1, 1, 2), (2, 2, 4), (1, 2, 3), (5, -5, 0)}
be a set of vectors in R^3, and let V = span(A). What is a basis for V? We start by testing whether A is linearly independent by solving the system of linear equations
λ_1 (1, 1, 2) + λ_2 (2, 2, 4) + λ_3 (1, 2, 3) + λ_4 (5, -5, 0) = (0, 0, 0)
to see if it has any non-trivial solutions. If so, then one of the vectors can be expressed as a linear combination of the others and discarded. In this case, we discover that (2, 2, 4) = 2(1, 1, 2) and so we can throw out (2, 2, 4). Now we are left with
{(1, 1, 2), (1, 2, 3), (5, -5, 0)}
and test whether this set is linearly independent. By solving
λ_1 (1, 1, 2) + λ_2 (1, 2, 3) + λ_3 (5, -5, 0) = (0, 0, 0)
we discover that (5, -5, 0) = 15(1, 1, 2) - 10(1, 2, 3) and so we can discard (5, -5, 0). Finally the remaining two vectors are linearly independent and so the set
{(1, 1, 2), (1, 2, 3)}
is a basis for V.
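The whittling-down process of Example 2.32 can be reproduced in one step by a computer algebra system, which reports the pivot columns of the matrix whose columns are the spanning vectors. The following sketch uses Python with SymPy; the tool and layout are our own illustrative choices.

    import sympy as sp

    # The spanning set of Example 2.32, one vector per column.
    M = sp.Matrix([[1,  2, 1,  5],
                   [1,  2, 2, -5],
                   [2,  4, 3,  0]])

    # columnspace() returns the pivot columns, i.e. a basis chosen
    # from among the original vectors.
    for b in M.columnspace():
        print(b.T)        # (1, 1, 2) and (1, 2, 3), as found by hand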
The most important property of a basis for a vector space V is that every vector in V can be expressed as a linear combination of the basis vectors in exactly one way; we prove this in the next result:
Theorem 2.33. Let B = {v_1, v_2, ..., v_k} be a basis for the subspace V. Then for any vector v ∈ V, there is a unique choice of scalars λ_1, λ_2, ..., λ_k such that
v = λ_1 v_1 + λ_2 v_2 + ... + λ_k v_k.
We call the scalars λ_1, ..., λ_k the coordinates of v in the basis B.
Proof. As B is a spanning set for V, there is at least one way of expressing v as a linear combination of the basis vectors. So we just need to show that there cannot be two different linear combinations each equal to v. So suppose that there are scalars λ_1, λ_2, ..., λ_k and μ_1, μ_2, ..., μ_k such that
v = λ_1 v_1 + λ_2 v_2 + ... + λ_k v_k
v = μ_1 v_1 + μ_2 v_2 + ... + μ_k v_k
Subtracting these expressions and rearranging, we discover that
0 = (λ_1 - μ_1)v_1 + (λ_2 - μ_2)v_2 + ... + (λ_k - μ_k)v_k.
As B is a linearly independent set of vectors, the only linear combination equal to 0 is the trivial linear combination with all coefficients equal to 0, and so λ_1 - μ_1 = 0, λ_2 - μ_2 = 0, ..., λ_k - μ_k = 0, and so λ_i = μ_i for all i. Therefore the two linear combinations for v are actually the same.
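Finding the coordinates promised by Theorem 2.33 is, once again, just solving a linear system: place the basis vectors as the columns of a matrix and solve for the coefficient vector. Here is a minimal numerical sketch in Python with NumPy; the basis is the one from Example 2.30, and the test vector is our own illustrative choice.

    import numpy as np

    # The basis A = {(2, 0, -1), (1, 1, 0)} from Example 2.30, as columns.
    B = np.array([[ 2.0, 1.0],
                  [ 0.0, 1.0],
                  [-1.0, 0.0]])
    v = np.array([3.0, 1.0, -1.0])     # a vector satisfying x - y + 2z = 0

    # The coordinates of v in the basis are the unique solution of B c = v.
    coords, *_ = np.linalg.lstsq(B, v, rcond=None)
    print(coords)                      # [1. 1.]
    print(np.allclose(B @ coords, v))  # True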
2.5.1 Dimension
As previously mentioned, a subspace of R^n will usually have infinitely many bases. However, these bases will all share one feature: every basis for a subspace V contains the same number of vectors. This fact is not at all obvious, and so we will give a proof for it. Actually we will prove a slightly more technical result that has the result about bases, along with a number of other useful results, as simple consequences. [17] While this is a result of fundamental importance, it uses some fiddly notation with lots of subscripts, so do not feel alarmed if you do not understand it first time through.
[17] A result that is a straightforward consequence of a theorem is called a corollary.
Theorem 2.34. Let A = {v_1, v_2, ..., v_k} be a linearly independent set of vectors. Then any set of ℓ > k vectors contained in V = span(A) is dependent.
Proof. Let {w_1, w_2, ..., w_ℓ} be a set of ℓ > k vectors in V. Then each of these can be expressed as a linear combination of the vectors in A, and so there are scalars α_{ij} ∈ R, where 1 ≤ i ≤ ℓ and 1 ≤ j ≤ k, such that:
w_1 = α_{11} v_1 + α_{12} v_2 + ... + α_{1k} v_k
w_2 = α_{21} v_1 + α_{22} v_2 + ... + α_{2k} v_k
...
w_ℓ = α_{ℓ1} v_1 + α_{ℓ2} v_2 + ... + α_{ℓk} v_k
Now consider what happens when we test {w_1, w_2, ..., w_ℓ} for linear dependence. We try to solve the system of linear equations
λ_1 w_1 + λ_2 w_2 + ... + λ_ℓ w_ℓ = 0    (2.7)
and determine if there are any non-trivial solutions to this system. By replacing each w_i in (2.7) with the corresponding expression as a linear combination of the vectors in A, we get a huge equation:
0 = λ_1 (α_{11} v_1 + α_{12} v_2 + ... + α_{1k} v_k)
  + λ_2 (α_{21} v_1 + α_{22} v_2 + ... + α_{2k} v_k)
  + ...
  + λ_ℓ (α_{ℓ1} v_1 + α_{ℓ2} v_2 + ... + α_{ℓk} v_k).    (2.8)
However, this is a linear combination of the vectors in A that is equal to the zero vector. Because A is a linearly independent set of vectors, this happens if and only if the coefficients of the vectors v_i in (2.8) are all zero. In other words, the scalars λ_1, λ_2, ..., λ_ℓ must satisfy the following system of linear equations:
α_{11} λ_1 + α_{21} λ_2 + ... + α_{ℓ1} λ_ℓ = 0
α_{12} λ_1 + α_{22} λ_2 + ... + α_{ℓ2} λ_ℓ = 0
...
α_{1k} λ_1 + α_{2k} λ_2 + ... + α_{ℓk} λ_ℓ = 0.
This is a homogeneous [18] system of linear equations, and so it is consistent. However, there are more variables than equations and so there is at least one non-basic variable, and hence there is at least one non-trivial choice of scalars λ_1, λ_2, ..., λ_ℓ satisfying (2.7), thereby showing that {w_1, w_2, ..., w_ℓ} is linearly dependent.
[18] The constant term in each equation is zero.
Corollary 2.35. Every basis for a subspace V of R^n contains the same number of vectors; this number is then called the dimension of the subspace V and is denoted by dim(V).
Proof. Suppose that A = {v_1, v_2, ..., v_k} and B = {w_1, w_2, ..., w_ℓ} are two bases for V. Then both A and B are linearly independent sets of vectors and both have the same span. So by Theorem 2.34 it follows that ℓ ≤ k and k ≤ ℓ, and so ℓ = k.
Example 2.36. (Dimension of R^n) The standard basis for R^2 contains two vectors, the standard basis for R^3 contains three vectors and the standard basis for R^n contains n vectors, so we conclude that the dimension of R^n is equal to n.
Example 2.37. (Dimension of a line) A line through the origin in R^n is a subspace consisting of all the multiples of a given non-zero vector:
L = {λv | λ ∈ R}
The set {v} containing the single vector v is a basis for L and so a line is 1-dimensional.
This shows that the formal algebraic definition of dimension coincides with our intuitive geometric understanding of the word dimension, which is reassuring. [19]
[19] Of course, we would expect this to be the case, because the algebraic notion of dimension was developed as an extension of the familiar geometric concept.
Another important corollary of Theorem 2.34 is that we are finally in a position to show that the process of finding a basis by extending [20] a linearly independent set will definitely finish.
[20] "Extending" just means adding vectors to.
Corollary 2.38. If V is a subspace of R^n, then any linearly independent set A of vectors in V is contained in a basis for V.
Proof. If span(A) ≠ V, then adding a vector in V \ span(A) to A creates a strictly larger independent set of vectors. As no set of n + 1 vectors in R^n is linearly independent, this process must terminate in at most n steps, and when it terminates, the set is a basis for V.
The next result seems intuitively obvious, because it just says that dimension behaves as you would expect, in that a subspace can only properly contain other subspaces if they have lower dimension.
Theorem 2.39. Suppose that S, T are subspaces of R^n and that S ⊆ T with S ≠ T. Then dim(S) < dim(T).
Proof. Let B_S be a basis for S. Then B_S is a linearly independent set of vectors contained in T, and so it can be extended to a basis for T. As span(B_S) = S ≠ T, the basis for T is strictly larger than the basis for S, and so dim(S) < dim(T).
Corollary 2.35 has a number of important consequences. If A is a set of vectors contained in a subspace V, then normally there is no particular relationship between the properties "A is linearly independent" and "A is a spanning set for V", in that A can have none, either or both of these properties. However, if A has the right size to be a basis, then it must have either none or both of the properties.
Corollary 2.40. Let V be a k-dimensional subspace of R^n. Then
1. Any linearly independent set of k vectors of V is a basis for V.
2. Any spanning set of k vectors of V is a basis for V.
Proof. To prove the first statement, let A be a linearly independent set of k vectors in V. By Corollary 2.38, A can be extended to a basis, but by Corollary 2.35 this basis contains k vectors and so no vectors can be added to A. Therefore A is already a basis for V.
For the second statement, let B be a spanning set of k vectors, and let B' be the largest independent set of vectors contained in B. Then span(B') = span(B) = V and so B' is a basis for V. By Corollary 2.35 this basis contains k vectors and so B' = B, showing that B was already linearly independent.
Example 2.41. (Dimension of a specific plane) The subspace V = {(x, y, z) | x + y + z = 0} is a plane through the origin in R^3. What is its dimension? We can quickly find two vectors in V, namely (1, -1, 0) and (1, 0, -1), and as they are not multiples of each other, the set
A = {(1, -1, 0), (1, 0, -1)}
is linearly independent. As R^3 is 3-dimensional, any proper subspace of R^3 has dimension at most 2, and so this set of two vectors is a basis for V; in particular dim(V) = 2.
3  Matrices and Determinants
This chapter introduces matrix algebra and explains the fundamental relationships between matrices and their properties, and the various subspaces associated with a matrix.
Before commencing this chapter, students should be able to:
Solve systems of linear equations,
Confidently identify and manipulate subspaces, including rapidly determining spanning sets and bases for subspaces, and
Add and multiply matrices.
After completing this chapter, students will be able to:
Understand the operations of matrix algebra and identify the similarities and differences between matrix algebra and the algebra of real numbers,
Describe, and find bases for, the row space, column space and null space of a matrix,
Find the rank and nullity of a matrix and understand how they are related by the rank-nullity theorem, and
Compute determinants and understand the relationship between determinants, rank and invertibility of matrices.
An m × n matrix is a rectangular array of numbers with m rows and n columns.
3.1 Matrix Algebra
In this section we consider the algebra of matrices, that is, the system of mathematical operations such as addition, multiplication, inverses and so on, where the operands [1] are matrices, rather than numbers. In isolation, the basic operations are all familiar from high school; in other words, adding two matrices or multiplying two matrices should be familiar to everyone, but matrix algebra is primarily concerned with the relationships between the operations.
[1] This is mathematical terminology for the objects being operated on.
3.1.1 Basic Operations
The basic operations for matrix algebra are matrix addition, matrix multiplication, matrix transposition and scalar multiplication. For completeness, we give the formal definitions of these operations:
Definition 3.1. (Matrix Operations)
The basic matrix operations are matrix addition, matrix multiplication, matrix transposition and scalar multiplication, which are defined as follows:
Matrix Addition: Let A = (a_ij) and B = (b_ij) be two m × n matrices. Then their sum C = A + B is the m × n matrix defined by
c_ij = a_ij + b_ij.
Matrix Multiplication: Let A = (a_ij) be an m × p matrix, and B = (b_ij) be a p × n matrix. Then their product C = AB is the m × n matrix defined by
c_ij = Σ_{k=1}^{p} a_ik b_kj.
Matrix Transposition: Let A = (a_ij) be an m × n matrix. Then the transpose C = A^T of A is the n × m matrix defined by
c_ij = a_ji.
Scalar Multiplication: Let A = (a_ij) be an m × n matrix, and λ ∈ R be a scalar. Then the scalar multiple C = λA is the m × n matrix defined by
c_ij = λ a_ij.
The properties of these operations are mostly obvious but, again for completeness, we list them all and give them their formal names.
Theorem 3.2. If A, B and C are matrices and λ, μ are scalars then, whenever the relevant operations are defined, the following properties hold:
1. A + B = B + A (matrix addition is commutative)
2. (A + B) + C = A + (B + C) (matrix addition is associative)
3. λ(A + B) = λA + λB
4. (λ + μ)A = λA + μA
5. (λμ)A = λ(μA)
6. A(BC) = (AB)C (matrix multiplication is associative)
7. (λA)B = λ(AB) and A(λB) = λ(AB)
8. A(B + C) = AB + AC (multiplication is left-distributive over addition)
9. (A + B)C = AC + BC (multiplication is right-distributive over addition)
10. (A^T)^T = A
11. (A + B)^T = A^T + B^T
12. (AB)^T = B^T A^T
Proof. All of these can be proved by elementary algebraic manipulation of the expressions for the (i, j)-entry of the matrices on both sides of each equation. We omit the proofs because they are slightly tedious and not very illuminating. [2]
[2] However it may be worth your while working through one of them, say the proof that matrix multiplication is associative, to convince yourself that you can do it.
Almost all of the properties in Theorem 3.2 are unsurprising and essentially mirror the properties of the algebra of real numbers. [3]
[3] A 1 × 1 matrix can be viewed as essentially identical to a real number, and so this is also not surprising.
Probably the only property in the list that is not immediately obvious is property (12), stating that the transpose of a matrix product is the product of the matrix transposes in reverse order:
(AB)^T = B^T A^T
This property extends to longer products of matrices; for example, we can find the transpose of ABC as follows: [4]
(ABC)^T = ((AB)C)^T = C^T (AB)^T = C^T (B^T A^T) = C^T B^T A^T
[4] A cautious reader might correctly object that "ABC" is not a legitimate expression in matrix algebra, because we have only defined matrix multiplication to be a product of two matrices. To make this a legal expression in matrix algebra, it really needs to be parenthesised, and so we should use either A(BC) or (AB)C. However, because matrix multiplication is associative, these evaluate to the same matrix and so, by convention, as it does not matter which way the product is parenthesised, we omit the parentheses altogether.
It is a nice test of your ability to structure a proof by induction to prove formally that
(A_1 A_2 ... A_{n-1} A_n)^T = A_n^T A_{n-1}^T ... A_2^T A_1^T.
However, rather than considering the obvious properties that are in the list, it is more instructive to consider the most obvious omission from the list; in other words, an important property that matrix algebra does not share with the algebra of real numbers. This is the property of commutativity of multiplication because, while the multiplication of real numbers is commutative, it is easy to check by example that matrix multiplication is not commutative. For example,
[ 1  1 ] [ 3 -1 ]   [ 3  3 ]
[ 2  1 ] [ 0  4 ] = [ 6  2 ]
but
[ 3 -1 ] [ 1  1 ]   [ 1  2 ]
[ 0  4 ] [ 2  1 ] = [ 8  4 ].
If A and B are two specific matrices, then it might be the case that AB = BA, in which case the two matrices are said to commute, but usually it will be the case that AB ≠ BA. This is a key difference between matrix algebra and the algebra of real numbers.
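Both the non-commutativity of multiplication and the reversal rule for transposes are easy to see experimentally. Here is a minimal sketch in Python with NumPy (the library and the particular matrices, which are the ones used above, are illustrative choices):

    import numpy as np

    A = np.array([[1, 1],
                  [2, 1]])
    B = np.array([[3, -1],
                  [0,  4]])

    print(A @ B)                                  # [[3 3] [6 2]]
    print(B @ A)                                  # [[1 2] [8 4]], different, so AB != BA
    print(np.array_equal((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T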
There are some other key differences worth delving into: in real algebra, the numbers 0 and 1 play special roles, being the additive identity and multiplicative identity respectively. In other words, for any real number x ∈ R we have
x + 0 = 0 + x = x and 1·x = x·1 = x
and
0·x = x·0 = 0.    (3.1)
In the algebra of square matrices (that is, n × n matrices for some n) we can analogously find an additive identity and a multiplicative identity. The additive identity is the matrix O_n with every entry equal to zero, and it is obvious that for any n × n matrix A,
A + O_n = O_n + A = A.
The multiplicative identity is the matrix I_n where every entry on the main diagonal [5] is equal to one, and every entry off the main diagonal is equal to zero. Then for any n × n matrix A,
A I_n = I_n A = A.
[5] The main diagonal consists of the (1, 1), (2, 2), ..., (n, n) positions. As an example,
I_3 = [ 1 0 0 ]
      [ 0 1 0 ]
      [ 0 0 1 ].
When the size of the matrices is unspecified, or irrelevant, we will often drop the subscript and just use O and I respectively. As the terms "additive/multiplicative identity" are rather cumbersome, the matrix O is usually called the zero matrix and the matrix I is usually called the identity matrix or just the identity.
The property (3.1) relating multiplication and zero also holds in matrix algebra, because it is clear that
AO = OA = O
for any square matrix A. However, there are other important properties of real algebra that are not shared by matrix algebra. In particular, in real algebra there are no non-zero zero divisors, so that if xy = 0 then at least one of x and y is equal to zero. However this is not true for matrices: there are products equal to the zero matrix even if neither matrix is zero. For example,
[ 1  1 ] [  2 -1 ]   [ 0 0 ]
[ 2  2 ] [ -2  1 ] = [ 0 0 ].
Definition 3.3. (A menagerie of square matrices)
Suppose that A = (a_ij) is an n × n matrix. Then
A is the zero matrix if a_ij = 0 for all i, j.
A is the identity matrix if a_ij = 1 if i = j and 0 otherwise.
A is a symmetric matrix if A^T = A.
A is a skew-symmetric matrix if A^T = -A.
A is a diagonal matrix if a_ij = 0 for all i ≠ j.
A is an upper-triangular matrix if a_ij = 0 for all i > j.
A is a lower-triangular matrix if a_ij = 0 for all i < j.
A is an idempotent matrix if A^2 = A, where A^2 = AA is the product of A with itself.
A is a nilpotent matrix if A^k = O for some k, where A^k = AA···A (k times).
Example 3.4. (Matrices of various types) Consider the following 3 × 3 matrices:
A = [ 1 0 1 ]    B = [ 1 0 0 ]    C = [ 1 0 1 ]
    [ 0 2 3 ]        [ 0 2 0 ]        [ 0 2 1 ]
    [ 0 0 1 ]        [ 0 0 0 ]        [ 1 1 3 ]
Then A is an upper-triangular matrix, B is a diagonal matrix and C is a symmetric matrix.
Example 3.5. (Nilpotent matrix) The matrix
A = [  2  4 ]
    [ -1 -2 ]
is nilpotent because
A^2 = [  2  4 ] [  2  4 ]   [ 0 0 ]
      [ -1 -2 ] [ -1 -2 ] = [ 0 0 ]
Exercise 3.1.1. Show that an upper-triangular matrix with zero diagonal is nilpotent.
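Claims like the ones in Example 3.5 and Exercise 3.1.1 are easy to test numerically. The following sketch uses Python with NumPy; the 3 × 3 strictly upper-triangular matrix N is our own illustrative choice, not one from the text.

    import numpy as np

    A = np.array([[ 2,  4],
                  [-1, -2]])
    print(A @ A)                           # the zero matrix, so A is nilpotent

    # An instance of Exercise 3.1.1: upper-triangular with zero diagonal.
    N = np.array([[0, 1, 2],
                  [0, 0, 3],
                  [0, 0, 0]])
    print(np.linalg.matrix_power(N, 3))    # the zero matrix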
3.2 Subspaces from matrices
There are various vector subspaces associated with a matrix, and there are useful relationships between the properties of the matrices and the subspaces. The three principal subspaces associated with a matrix are the row space, the column space and the null space of the matrix. The first two of these are defined analogously to each other, while the third is somewhat different. If A is an m × n matrix, then each of the rows of the matrix can be viewed as a vector in R^n, while each of the columns of the matrix can be viewed as a vector in R^m.
Definition 3.6. (Row and Column Space)
Let A be an m × n matrix. Then the row space and column space are defined as follows:
Row Space: The row space of A is the subspace of R^n that is spanned by the rows of A. In other words, the row space is the subspace consisting of all the linear combinations of the rows of A. We denote it by rowsp(A).
Column Space: The column space of A is the subspace of R^m that is spanned by the columns of A. In other words, the column space is the subspace consisting of all the linear combinations of the columns of A. We denote it by colsp(A).
Example 3.7. (Row space and Column Space) Let A be the 3 × 4 matrix
A = [ 1  0 1 -1 ]
    [ 2 -1 2  0 ]    (3.2)
    [ 1  0 2  1 ]
Then the row space of A is the subspace of R^4 defined by
rowsp(A) = span((1, 0, 1, -1), (2, -1, 2, 0), (1, 0, 2, 1)),
while the column space of A is the subspace of R^3 defined by
colsp(A) = span((1, 2, 1), (0, -1, 0), (1, 2, 2), (-1, 0, 1)).
(Remember the conventions regarding row and column vectors described in Chapter 1.)
As described in Chapter 2, it is easy to answer any particular
question about a subspace if you know a spanning set for that
subspace. In particular, it is easy to determine whether a given
vector is in the row or column space of a matrix just by setting up
the appropriate system of linear equations.
Example 3.8. (Vector in row space) Is the vector (2, 1, -1, 3) in the row space of the matrix A shown in (3.2) above? This question is equivalent to asking whether there are scalars λ_1, λ_2 and λ_3 such that
λ_1 (1, 0, 1, -1) + λ_2 (2, -1, 2, 0) + λ_3 (1, 0, 2, 1) = (2, 1, -1, 3).
By considering each of the four coordinates in turn, this corresponds to the following system of four linear equations in the three variables:
λ_1 + 2λ_2 + λ_3 = 2
-λ_2 = 1
λ_1 + 2λ_2 + 2λ_3 = -1
-λ_1 + λ_3 = 3
The augmented matrix for this system is
[  1  2 1 |  2 ]
[  0 -1 0 |  1 ]
[  1  2 2 | -1 ]
[ -1  0 1 |  3 ]
which, after row-reduction, becomes
[ 1  2 1 |  2 ]
[ 0 -1 0 |  1 ]
[ 0  0 1 | -3 ]
[ 0  0 0 | 13 ]
and so the system is inconsistent and we conclude that (2, 1, -1, 3) ∉ rowsp(A). Notice that the coefficient matrix of the system of linear equations is the transpose of the original matrix.
Example 3.9. (Vector in column space) Is the vector (1, -1, 2) in the column space of the matrix A shown in (3.2) above? This question is equivalent to asking whether there are scalars λ_1, λ_2, λ_3 and λ_4 such that
λ_1 (1, 2, 1) + λ_2 (0, -1, 0) + λ_3 (1, 2, 2) + λ_4 (-1, 0, 1) = (1, -1, 2).
By considering each of the three coordinate positions in turn, this corresponds to a system of three equations in the four variables:
λ_1 + λ_3 - λ_4 = 1
2λ_1 - λ_2 + 2λ_3 = -1
λ_1 + 2λ_3 + λ_4 = 2
The augmented matrix for this system is
[ 1  0 1 -1 |  1 ]
[ 2 -1 2  0 | -1 ]
[ 1  0 2  1 |  2 ]
which, after row reduction, becomes
[ 1  0 1 -1 |  1 ]
[ 0 -1 0  2 | -3 ]
[ 0  0 1  2 |  1 ]
and so this system of linear equations has three basic variables, one free parameter and therefore infinitely many solutions. So we conclude that (1, -1, 2) ∈ colsp(A). We could, if necessary, or just to check, find a particular solution to this system of equations. For example, if we set the free parameter λ_4 = 1 then the corresponding solution is λ_1 = 3, λ_2 = 5, λ_3 = -1 and λ_4 = 1, and we can check that
3(1, 2, 1) + 5(0, -1, 0) - (1, 2, 2) + (-1, 0, 1) = (1, -1, 2).
Notice that in this case, the coefficient matrix of the system of linear equations is just the original matrix itself.
In the previous two examples, the original question led to systems of linear equations whose coefficient matrix was either the original matrix or the transpose of the original matrix.
In addition to being able to identify whether particular vectors are in the row space or column space of a matrix, we would also like to be able to find a basis for the subspace and thereby determine its dimension. In Chapter 2 we described a technique where any spanning set for a subspace can be reduced to a basis by successively throwing out vectors that are linear combinations of the others. While this technique works perfectly well for determining the dimension of the row space or column space of a matrix, there is an alternative approach based on two simple observations:
1. Performing elementary row operations on a matrix does not change its row space. [6]
2. The non-zero rows of a matrix in row-echelon form are linearly independent, and therefore form a basis for the row space of that matrix.
[6] However, elementary row operations do change the column space!
The consequence of these two facts is that it is very easy to find a basis for the row space of a matrix: simply put it into row-echelon form and then write down the non-zero rows that are found. However the basis that is found by this process will not usually be a subset of the original rows of the matrix. If it is necessary to find a basis for the row space whose vectors are all original rows of the matrix, then either of the two earlier techniques can be used.
Example 3.10. (Basis of row space) Consider the problem of finding a basis for the row space of the 4 × 5 matrix
A = [ 1  2  1 1 4 ]
    [ 0  0  0 0 0 ]
    [ 1 -1 -1 0 1 ]
    [ 3  0 -1 1 6 ].
After performing the elementary row operations R_3 ← R_3 - 1R_1, R_4 ← R_4 - 3R_1, then R_2 ↔ R_3 and finally R_4 ← R_4 - 2R_2, we end up with the following matrix, which we denote A', which is in row-echelon form.
A' = [ 1  2  1  1  4 ]
     [ 0 -3 -2 -1 -3 ]
     [ 0  0  0  0  0 ]
     [ 0  0  0  0  0 ]
The key point is that the elementary row operations have not changed the row space of the matrix in any way, and so rowsp(A) = rowsp(A'). However it is obvious that the two non-zero rows
(1, 2, 1, 1, 4), (0, -3, -2, -1, -3)
are a basis for the row space of A', and so they are also a basis for the row space of A.
To find a basis for the column space of the matrix A, we cannot do elementary row operations because they alter the column space. However it is clear that colsp(A) = rowsp(A^T), and so just transposing the matrix and then performing the same procedure will find a basis for the column space of A.
Example 3.11. (Basis of column space) What is a basis for the column space of the matrix A of Example 3.10? We first transpose the matrix, getting
A^T = [ 1 0  1  3 ]
      [ 2 0 -1  0 ]
      [ 1 0 -1 -1 ]
      [ 1 0  0  1 ]
      [ 4 0  1  6 ]
and then perform Gaussian elimination to obtain the row-echelon matrix
[ 1 0  1  3 ]
[ 0 0 -3 -6 ]
[ 0 0  0  0 ]
[ 0 0  0  0 ]
[ 0 0  0  0 ]
whose row space has basis {(1, 0, 1, 3), (0, 0, -3, -6)}. Therefore these two vectors are a basis for the column space of A.
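Computer algebra systems carry out this row reduction directly. The sketch below uses Python with SymPy (an illustrative choice). Note that rref() produces the reduced row-echelon form, which goes one step further than the row-echelon form used in Examples 3.10 and 3.11, but its non-zero rows span exactly the same row space.

    import sympy as sp

    A = sp.Matrix([[1,  2,  1, 1, 4],
                   [0,  0,  0, 0, 0],
                   [1, -1, -1, 0, 1],
                   [3,  0, -1, 1, 6]])

    R, pivots = A.rref()
    print(R)              # non-zero rows are a basis for rowsp(A)
    print(A.T.rref()[0])  # same procedure on A^T: a basis for colsp(A)
    print(A.rank())       # 2, the common dimension of both subspaces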
What can be said about the dimension of the row space and column space of a matrix? In the previous two examples, we found that the row space of the matrix A is a 2-dimensional subspace of R^5, and the column space of A is a 2-dimensional subspace of R^4. In particular, even though they are subspaces of different ambient vector spaces, the dimensions of the row space and column space turn out to be equal. This is not an accident, and in fact we have the following surprising result:
Theorem 3.12. Let A be an m × n matrix. Then the dimension of its row space is equal to the dimension of its column space.
Proof. Suppose that {v_1, v_2, ..., v_k} is a basis for the column space of A. Then each column of A can be expressed as a linear combination of these vectors; suppose that the i-th column c_i is given by
c_i = α_{1i} v_1 + α_{2i} v_2 + ... + α_{ki} v_k
Now form two matrices as follows: B is an m × k matrix whose columns are the basis vectors v_i, while C = (α_{ij}) is a k × n matrix whose i-th column contains the coefficients α_{1i}, α_{2i}, ..., α_{ki}. It then follows [7] that A = BC.
[7] You may have to try this out with a few small matrices first to see why this is true. It is not difficult when you see a small situation, but it is not immediately obvious either.
However we can also view the product A = BC as expressing the rows of A as linear combinations of the rows of C, with the i-th row of B giving the coefficients for the linear combination that determines the i-th row of A. Therefore the rows of C are a spanning set for the row space of A, and so the dimension of the row space of A is at most k. We conclude that
dim(rowsp(A)) ≤ dim(colsp(A))
Applying the same argument to A^T we also conclude that
dim(colsp(A)) ≤ dim(rowsp(A))
and hence these values are equal.
This number, the common dimension of the row space and column space of a matrix, is an important property of a matrix and has a special name:
Definition 3.13. (Matrix Rank)
The dimension of the row space (and column space) of a matrix is called the rank of the matrix. If an n × n matrix has rank n, then the matrix is said to be full rank.
There is a useful characterisation of the row and column spaces of a matrix that is sufficiently important to state separately.
Theorem 3.14. If A is an m × n matrix, then the set of vectors
{Ax | x ∈ R^n}
is equal to the column space of A, while the set of vectors
{yA | y ∈ R^m}
is equal to the row space of A.
Proof. Suppose that c_1, c_2, ..., c_n ∈ R^m are the n columns of A. Then if x = (x_1, x_2, ..., x_n), it is easy to see that Ax = x_1 c_1 + x_2 c_2 + ... + x_n c_n. Therefore every vector of the form Ax is a linear combination of the columns of A, and every linear combination of the columns of A can be obtained by multiplying A by a suitable vector. A similar argument applies for the row space of A.
3.3 The null space
Suppose that A is an m × n matrix. If two vectors v_1, v_2 have the property that Av_1 = 0 and Av_2 = 0, then simple manipulation shows that
A(v_1 + v_2) = Av_1 + Av_2 = 0 + 0 = 0
and for any λ ∈ R,
A(λ v_1) = λ Av_1 = λ 0 = 0
Therefore the set of vectors v with the property that Av = 0 is closed under vector addition and scalar multiplication, and therefore it satisfies the requirements to be a subspace.
Definition 3.15. (Null space)
Let A be an m × n matrix. The set of vectors
{v ∈ R^n | Av = 0}
is a subspace of R^n called the null space of A and denoted by nullsp(A).
Example 3.16. (Null space) Is the vector v = (0, 1, -1, 2) in the null space of the matrix
A = [ 1 2 2 0 ]
    [ 3 0 2 1 ] ?
All that is needed is to compute Av and see what arises. As
[ 1 2 2 0 ]   [  0 ]   [ 0 ]
[ 3 0 2 1 ] × [  1 ] = [ 0 ]
              [ -1 ]
              [  2 ]
it follows that v ∈ nullsp(A).
This shows that testing membership of the null space of a matrix is a very easy task. What about finding a basis for the null space of a matrix? This turns out [8] to be intimately related to the techniques we used in Chapter 1 to solve systems of linear equations.
[8] No surprise here!
So, suppose we wish to find a basis for the null space of the matrix
A = [ 1 2 2 0 ]
    [ 3 0 2 1 ]
from Example 3.16. The matrix equation Ax = 0 yields the following system of linear equations
x_1 + 2x_2 + 2x_3 = 0
3x_1 + 2x_3 + x_4 = 0
which has augmented matrix
[ 1 2 2 0 | 0 ]
[ 3 0 2 1 | 0 ]
After the single elementary row operation R_2 ← R_2 - 3R_1, the matrix is in row echelon form:
[ 1  2  2 0 | 0 ]
[ 0 -6 -4 1 | 0 ]
Therefore x_3 and x_4 are free parameters; the second equation shows that x_2 = (x_4 - 4x_3)/6 and substituting this into the first equation yields x_1 = -(2x_3 + x_4)/3. Thus, following the techniques of Chapter 1 we can describe the solution set as
S = {(-(2x_3 + x_4)/3, (x_4 - 4x_3)/6, x_3, x_4) | x_3, x_4 ∈ R}
In order to find a basis for S, notice that we can rewrite the solution as a linear combination of vectors by separating out the terms involving x_3 from the terms involving x_4:
(-(2x_3 + x_4)/3, (x_4 - 4x_3)/6, x_3, x_4)
= (-2x_3/3, -4x_3/6, x_3, 0) + (-x_4/3, x_4/6, 0, x_4)
= x_3 (-2/3, -2/3, 1, 0) + x_4 (-1/3, 1/6, 0, 1)
Therefore we can express the solution set S as follows:
S = {x_3 (-2/3, -2/3, 1, 0) + x_4 (-1/3, 1/6, 0, 1) | x_3, x_4 ∈ R}
However this immediately tells us that S just consists of all the linear combinations of the two vectors (-2/3, -2/3, 1, 0) and (-1/3, 1/6, 0, 1), and therefore we have found, almost by accident, a spanning set for the subspace S. It is immediate that these two vectors are linearly independent and therefore they form a basis for the null space of A.
In general, this process will always find a basis for the null space of a matrix. If the set of solutions to the system of linear equations has s free parameters, then it can be expressed as a linear combination of s vectors. These s vectors will always be linearly independent because in each of the s coordinate positions corresponding to the free parameters, just one of the s vectors will have a non-zero entry.
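The null space basis just constructed can be verified with a computer algebra system, which builds its basis vectors from the free parameters in exactly the same way. A minimal sketch in Python with SymPy (the tool is our own illustrative choice):

    import sympy as sp

    A = sp.Matrix([[1, 2, 2, 0],
                   [3, 0, 2, 1]])

    for b in A.nullspace():
        print(b.T)         # (-2/3, -2/3, 1, 0) and (-1/3, 1/6, 0, 1)
        print((A * b).T)   # (0, 0): each basis vector really lies in the null space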
Definition 3.17. (Nullity)
The dimension of the null space of a matrix A is called the nullity of A.
We close this section with one of the most important results in elementary linear algebra, which is universally called the Rank-Nullity Theorem, and which has a surprisingly simple proof.
Theorem 3.18. Suppose that A is an m × n matrix. Then
rank(A) + nullity(A) = n.
Proof. Consider the system of linear equations Ax = 0, which is a system of m equations in n unknowns. This system is solved by applying Gaussian elimination to the augmented matrix [A | 0], thereby obtaining the matrix [A' | 0] in row-echelon form. The rank of A is equal to the number of non-zero rows of A', which is equal to the number of basic variables in the system of linear equations. The nullity of A is the number of free parameters in the solution set to the system of linear equations and so it is equal to the number of non-basic variables. So the rank of A plus the nullity of A is equal to the number of basic variables plus the number of non-basic variables. As each of the n variables is either basic or non-basic, the result follows.
Given a matrix, it is important to be able to put all the techniques together and to determine the rank, the nullity, a basis for the null space and a basis for the row space of a given matrix. This is demonstrated in the next example.
Example 3.19. (Rank and nullity) Find the rank, nullity and bases for the row space and null space for the following 4 × 4 matrix:
A = [ 1 0 2 1 ]
    [ 3 1 3 3 ]
    [ 2 1 1 0 ]
    [ 2 1 1 2 ].
All of the questions can be answered once the matrix is in row-echelon form, and so the first task is to apply Gaussian elimination, which will result in the following matrix:
A' = [ 1 0  2  1 ]
     [ 0 1 -3  0 ]
     [ 0 0  0 -2 ]
     [ 0 0  0  0 ].
The matrix has 3 non-zero rows and so the rank of A is equal to 3. These non-zero rows form a basis for the row space of A, and so a basis for the row space of A is
{(1, 0, 2, 1), (0, 1, -3, 0), (0, 0, 0, -2)}.
The null space is the set of solutions to the matrix equation Ax = 0, and solving this equation by performing Gaussian elimination on the augmented matrix [A | 0] would yield the augmented matrix [A' | 0]. [9]
[9] In other words, the Gaussian elimination part only needs to be done once, because everything depends only on the form of the matrix A'. However it is important to remember that although it is the same matrix, we are using it in two quite distinct ways. This distinction is often missed by students studying linear algebra for the first time.
So given the augmented matrix
[A' | 0] = [ 1 0  2  1 | 0 ]
           [ 0 1 -3  0 | 0 ]
           [ 0 0  0 -2 | 0 ]
           [ 0 0  0  0 | 0 ],
we see that x_1, x_2 and x_4 are basic variables, while the solution space has x_3 as its only free parameter. Expressing the basic variables in terms of the free parameter by back-substitution we determine that the solution set is
S = {(-2x_3, 3x_3, x_3, 0) | x_3 ∈ R}
and it is clear that a basis for this is {(-2, 3, 1, 0)}, and so the nullity of A is equal to 1.
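All of the quantities computed in Example 3.19 can be cross-checked in a few lines. This sketch uses Python with SymPy (an illustrative tool choice) and also confirms the Rank-Nullity Theorem for this matrix.

    import sympy as sp

    A = sp.Matrix([[1, 0, 2, 1],
                   [3, 1, 3, 3],
                   [2, 1, 1, 0],
                   [2, 1, 1, 2]])

    rank = A.rank()
    nullity = len(A.nullspace())
    print(rank, nullity)               # 3 1
    print(rank + nullity == A.cols)    # True: rank(A) + nullity(A) = n
    print(A.nullspace()[0].T)          # (-2, 3, 1, 0), the basis found above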
3.4 Solving systems of linear equations
We have seen that the set of all solutions to the system of linear equations Ax = 0 is the nullspace of A. What can we say about the set of solutions of
Ax = b    (3.3)
when b ≠ 0?
Suppose that we know one solution x_1 and that v lies in the nullspace of A. Then
A(x_1 + v) = Ax_1 + Av = b + 0 = b
Hence if we are given one solution we can create many more by simply adding elements of the nullspace of A.
Moreover, given any two solutions x_1 and x_2 of (3.3) we have that
A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0
and so x_1 - x_2 lies in the nullspace of A. In particular, every solution of Ax = b is of the form x_1 + v for some v ∈ nullsp(A). This is so important that we state it as a theorem.
Theorem 3.20. Let Ax = b be a system of linear equations and let x_1 be one solution. Then the set of all solutions is
{x_1 + v | v ∈ nullsp(A)}
Corollary 3.21. The number of free parameters required for the set of solutions of Ax = b is the dimension of the nullspace of A.
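Theorem 3.20 and Corollary 3.21 can be illustrated concretely with the matrix from Example 3.16. The right-hand side b = (5, 6) below is our own illustrative choice (it does not appear in the text), and the tool is Python with SymPy.

    import sympy as sp

    A = sp.Matrix([[1, 2, 2, 0],
                   [3, 0, 2, 1]])
    b = sp.Matrix([5, 6])

    # General solution: a particular solution plus free parameters.
    sol, params = A.gauss_jordan_solve(b)
    x1 = sol.subs({p: 0 for p in params})   # pick one particular solution
    print((A * x1).T)                        # (5, 6), so x1 solves Ax = b

    # Theorem 3.20: adding any nullspace vector gives another solution.
    v = A.nullspace()[0]
    print((A * (x1 + v)).T)                  # still (5, 6)
    print(len(A.nullspace()))                # 2 free parameters (Corollary 3.21)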
3.5 Matrix Inversion
In this section we consider the inverse of a matrix. The theory of inverses in matrix algebra is far more subtle and interesting than that of inverses in real algebra, and it is intimately related to the ideas of independence and rank that we have explored so far.
In real algebra, every non-zero element has a multiplicative inverse; in other words, for every x ≠ 0 we can find another number, x', such that
x x' = x' x = 1
Rather than calling it x', we use the notation x^{-1} to mean "the number that you need to multiply x by in order to get 1", thus getting the familiar
2^{-1} = 1/2
In matrix algebra, the concept of a multiplicative identity, and hence an inverse, only makes sense for square matrices. However, even then, there are some complicating factors. In particular, because multiplication is not commutative, it is conceivable that a product of two square matrices might be equal to the identity only if they are multiplied in a particular order. Fortunately, this does not actually happen:
Theorem 3.22. Suppose that A and B are square matrices such that AB = I_n. Then BA = I_n.
Proof. First we observe that B has rank equal to n. This follows because if Bv = 0 for any non-zero vector v, then ABv = 0, which contradicts the fact that AB = I_n. So the null space of B contains only the zero vector, so B has nullity 0 and therefore, by the Rank-Nullity theorem, B has rank n.
Secondly we do some simple manipulation using the properties of matrix algebra that were outlined in Theorem 3.2:
O_n = AB - I_n            (because AB = I_n)
    = B(AB - I_n)         (because B O_n = O_n)
    = BAB - B             (distributivity)
    = (BA - I_n)B         (distributivity)
This manipulation shows that the matrix (BA - I_n)B = O_n.
However because the rank of B is n, it follows that the column space of B is the whole of R^n, and so any vector v ∈ R^n can be expressed in the form Bx for some x. Therefore
(BA - I_n)v = (BA - I_n)Bx    (because B has rank n)
            = O_n x           (because (BA - I_n)B = O_n)
            = 0               (properties of zero matrix)
The only matrix whose product with every vector v is equal to 0 is the zero matrix itself, so BA - I_n = O_n, or BA = I_n as required.
This theorem shows that when defining the inverse of a matrix, we don't need to worry about the order in which the multiplication occurs. [10]
[10] In some textbooks, the authors introduce the idea that B is the left-inverse of A if BA = I and the right-inverse of A if AB = I, and then immediately prove Theorem 3.22, showing that a left-inverse is a right-inverse and vice versa.
Another property of inverses in the algebra of real numbers is that a non-zero real number has a unique inverse. Fortunately, this property also holds for matrices:
Theorem 3.23. If A, B and C are square matrices such that
AB = I_n and AC = I_n
then B = C.
Proof. The proof just proceeds by manipulation using the properties of matrix algebra outlined in Theorem 3.2.
B = B I_n      (identity matrix property)
  = B(AC)      (hypothesis of theorem)
  = (BA)C      (associativity)
  = I_n C      (by Theorem 3.22)
  = C          (identity matrix property)
Therefore a matrix has at most one inverse.
Definition 3.24. (Matrix Inverse)
Let A be an n × n matrix. If there is a matrix B such that
AB = I_n
then B is called the inverse of A, and is denoted A^{-1}. From Theorems 3.22 and 3.23, it follows that B is uniquely determined, that BA = I_n, and that B^{-1} = A.
A matrix is called invertible if it has an inverse, and non-invertible otherwise.
Example 3.25. (Matrix Inverse) Suppose that
A = [ -1 0 -1 ]
    [  2 1  1 ]
    [ -2 0 -1 ]
Then if we take
B = [  1 0 -1 ]
    [  0 1  1 ]
    [ -2 0  1 ]
then it is easy to check that
AB = BA = I_3
Therefore we conclude that A^{-1} exists and is equal to B and, naturally, B^{-1} exists and is equal to A.
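The check in Example 3.25 takes only a few keystrokes to reproduce numerically. Here is a minimal sketch in Python with NumPy (an illustrative choice of tool):

    import numpy as np

    A = np.array([[-1, 0, -1],
                  [ 2, 1,  1],
                  [-2, 0, -1]])
    B = np.array([[ 1, 0, -1],
                  [ 0, 1,  1],
                  [-2, 0,  1]])

    print(np.allclose(A @ B, np.eye(3)))       # True
    print(np.allclose(B @ A, np.eye(3)))       # True, as Theorem 3.22 guarantees
    print(np.allclose(np.linalg.inv(B), A))    # True: A is indeed B^{-1}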
In real algebra, every non-zero number has an inverse, but this is
not the case for matrices:
Example 3.26. (Non-zero matrix with no inverse) Suppose that
A = [ 1 1 ]
    [ 2 2 ].
Then there is no possible matrix B such that AB = I_2. Why is this? If the matrix B existed, then it would necessarily satisfy
[ 1 1 ] [ b_11 b_12 ]   [ 1 0 ]
[ 2 2 ] [ b_21 b_22 ] = [ 0 1 ].
In order to satisfy this matrix equation, b_11 + b_21 must equal 1, while 2b_11 + 2b_21 = 2(b_11 + b_21) must equal 0; clearly this is impossible. So the matrix A has no inverse.
One of the common mistakes made by students of elementary linear algebra is to assume that every matrix has an inverse. If an argument or proof about a generic matrix A ever uses A^{-1} as part of the manipulation, then it is necessary to first demonstrate that A is actually invertible. Alternatively, the proof can be broken down into two separate cases, one covering the situation where A is assumed to be invertible and a separate one for where it is assumed to be non-invertible.
3.5.1 Finding inverses

This last example of the previous section (Example 3.26) essentially shows us how to find the inverse of a matrix because, as usual, it all boils down to solving systems of linear equations. If A is an $n \times n$ matrix then finding its inverse, if it exists, is just a matter of finding a matrix B such that $AB = I_n$. To find the first column of B, it is sufficient to solve the equation $Ax = e_1$, then the second column is the solution to $Ax = e_2$, and so on. If any of these equations has no solutions then A does not have an inverse.¹¹ Therefore, finding the inverse of an $n \times n$ matrix involves solving n separate systems of linear equations. However, because each of the n systems has the same coefficient matrix (that is, A), there are shortcuts that make this procedure easier.

¹¹ In fact, if any of them have infinitely many solutions, then it also follows that A has no inverse because if the inverse exists it must be unique. Thus if one of the equations has infinitely many solutions, then one of the other equations must have no solutions.

To illustrate this, we do a full example for a $3 \times 3$ matrix, although the principle is the same for any matrix. Suppose we want to find the inverse of the matrix
\[ A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -2 & 0 & 1 \end{pmatrix}. \]
The results above show that we just need to find a matrix $B = (b_{ij})$ such that
\[ \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -2 & 0 & 1 \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
This can be done by solving three separate systems of linear equations, one to determine each column of B:
\[ A\begin{pmatrix} b_{11} \\ b_{21} \\ b_{31} \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad A\begin{pmatrix} b_{12} \\ b_{22} \\ b_{32} \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad A\begin{pmatrix} b_{13} \\ b_{23} \\ b_{33} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \]
Then the matrix A has an inverse if and only if all three of these systems of linear equations have a solution, and in fact, each of them must have a unique solution. If any one of the three equations is inconsistent, then A is one of the matrices that just doesn't have an inverse.

Consider how the solution of these systems of linear equations will proceed: for the first column, the Gaussian elimination proceeds as follows. We start with the augmented matrix
\[ \left(\begin{array}{ccc|c} 1 & 0 & -1 & 1 \\ 0 & 1 & 1 & 0 \\ -2 & 0 & 1 & 0 \end{array}\right), \]
and after pivoting on the (1, 1)-entry we get
\[ \left(\begin{array}{ccc|c} 1 & 0 & -1 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & -1 & 2 \end{array}\right) \qquad R_3 \leftarrow R_3 + 2R_1 \]
which is now in row-echelon form. As every variable is basic, this system of linear equations has a unique solution, which can easily be determined by back-substitution, giving $b_{31} = -2$, $b_{21} = 2$ and $b_{11} = -1$.
Now we solve for the second column of B; this time the augmented matrix is
\[ \left(\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 1 \\ -2 & 0 & 1 & 0 \end{array}\right), \]
and after pivoting on the (1, 1)-entry we get
\[ \left(\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & -1 & 0 \end{array}\right) \qquad R_3 \leftarrow R_3 + 2R_1 \]
It is immediately apparent that we used the exact same elementary row operations as we did for the previous system of linear equations because, naturally enough, the coefficient matrix is the same matrix. And obviously, we'll do the same elementary row operations again when we solve the third system of linear equations! So to avoid repeating work unnecessarily, it is better to solve all three systems simultaneously. This is done by using a sort of super-augmented matrix that has three columns to the right of the augmenting bar, representing the right-hand sides of the three separate equations:
\[ \left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ -2 & 0 & 1 & 0 & 0 & 1 \end{array}\right) \]
Then performing an elementary row operation on this bigger matrix has exactly the same effect as doing it on each of the three systems separately.
\[ \left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & -1 & 2 & 0 & 1 \end{array}\right) \qquad R_3 \leftarrow R_3 + 2R_1 \]
We could now break this apart again into three separate systems of linear equations and solve each of them by back-substitution, but again this involves a little bit more work than necessary.

Suppose instead that we do some more elementary row operations in order to make the system(s) of linear equations even simpler than before. Usually, when we pivot on an entry in the matrix, we use the pivot row in order to zero-out the rest of the column below the pivot entry. However there is nothing stopping us from zeroing out the rest of the column above the pivot entry as well. Let's do this on the (3, 3)-entry in the matrix and see how useful it is:
\[ \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1 & 0 & -1 \\ 0 & 1 & 0 & 2 & 1 & 1 \\ 0 & 0 & -1 & 2 & 0 & 1 \end{array}\right) \qquad \begin{aligned} R_1 &\leftarrow R_1 - R_3 \\ R_2 &\leftarrow R_2 + R_3 \end{aligned} \]
One final elementary row operation puts the matrix into an especially nice form.
\[ \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1 & 0 & -1 \\ 0 & 1 & 0 & 2 & 1 & 1 \\ 0 & 0 & 1 & -2 & 0 & -1 \end{array}\right) \qquad R_3 \leftarrow (-1)R_3 \]
In this form not even any back substitution is needed to find the solution; the first system of equations has solution $b_{11} = -1$, $b_{21} = 2$ and $b_{31} = -2$, while the second has solution $b_{12} = 0$, $b_{22} = 1$ and $b_{32} = 0$, while the final system has solution $b_{13} = -1$, $b_{23} = 1$ and $b_{33} = -1$. Thus the inverse of the matrix A is given by
\[ A^{-1} = \begin{pmatrix} -1 & 0 & -1 \\ 2 & 1 & 1 \\ -2 & 0 & -1 \end{pmatrix} \]
which is exactly the matrix that was found to the right of the augmenting bar!

In this example, we've jumped ahead without using the formal terminology or precisely defining the especially nice form of the final matrix. We remedy this immediately.
Definition 3.27. (Reduced row-echelon form)
A matrix is in reduced row-echelon form if it is in row echelon form,
and
1. The leading entry of each row is equal to one, and
2. The leading entry of each row is the only non-zero entry in its column.
Example 3.28. (Reduced row-echelon form) The following matrices are both in reduced row echelon form:
\[ \begin{pmatrix} 1 & 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \]
Example 3.29. (Not in reduced row-echelon form) The following matrices are NOT in reduced row echelon form:
\[ \begin{pmatrix} 1 & 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}. \]
A simple modification to the algorithm for Gaussian elimination yields an algorithm for reducing a matrix to reduced row echelon form; this algorithm is known as Gauss-Jordan elimination. It is presented below, with the differences between Gaussian elimination and Gauss-Jordan elimination highlighted in boldface.

Definition 3.30. (Gauss-Jordan Elimination in words)
Let A be an $m \times n$ matrix. At each stage in the algorithm, a particular position in the matrix, called the pivot position, is being processed. Initially the pivot position is at the top-left of the matrix. What happens at each stage depends on whether the pivot entry (that is, the number in the pivot position) is zero or not.

1. If the pivot entry is zero then, if possible, interchange the pivot row with one of the rows below it, in order to ensure that the pivot entry is non-zero. This will be possible unless the pivot entry and every entry below it are zero, in which case simply move the pivot position one column to the right.

2. If the pivot entry is non-zero, multiply the pivot row to ensure that the pivot entry is 1 and then, by adding a suitable multiple of the pivot row to every row above and below the pivot row, ensure that every entry above and below the pivot entry is zero. Then move the pivot position one column to the right and one row down.

When the pivot position is moved off the matrix, then the process finishes and the matrix will be in reduced row echelon form.

Now we can formally summarise the procedure for finding the inverse of a matrix. Remember, however, that this is simply a way of organising the calculations efficiently, and that there is nothing more sophisticated occurring than solving systems of linear equations.
Key Concept 3.31. (Finding the inverse of a matrix) In order to find the inverse of an $n \times n$ matrix A, proceed as follows:

1. Form the super-augmented matrix
\[ [\, A \mid I_n \,] \]
2. Apply Gauss-Jordan elimination to this matrix to place it into reduced row-echelon form.
3. If the resulting reduced row echelon matrix has an identity matrix to the left of the augmenting bar, then it must have the form
\[ [\, I_n \mid A^{-1} \,] \]
and so $A^{-1}$ will be the matrix on the right of the augmenting bar.
4. If the reduced row echelon matrix does not have an identity matrix to the left of the augmenting bar, then the matrix A is not invertible.
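As an illustration only (not part of the original notes), the procedure in Key Concept 3.31 translates almost line-for-line into a short program. The following Python sketch, whose function name and tolerance are our own choices, forms the super-augmented matrix $[A \mid I_n]$, applies Gauss-Jordan elimination, and returns the right-hand block, or None when no inverse exists.

```python
def invert(A):
    """Invert a square matrix via Gauss-Jordan elimination on [A | I_n].

    A is a list of row-lists of numbers.  Returns the inverse as a list of
    rows, or None if A is not invertible.  (Minimal sketch: no attention is
    paid to numerical stability.)
    """
    n = len(A)
    # Build the super-augmented matrix [A | I_n].
    M = [list(map(float, A[i])) + [1.0 if j == i else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        # Find a row at or below the pivot position with a non-zero pivot entry.
        pivot = next((r for r in range(col, n) if abs(M[r][col]) > 1e-12), None)
        if pivot is None:
            return None                      # no pivot in this column: not invertible
        M[col], M[pivot] = M[pivot], M[col]  # Type 1: interchange rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]     # Type 2: scale so the pivot entry is 1
        for r in range(n):                   # Type 3: clear the column above and below
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]            # the block right of the bar is A^{-1}

# The matrix from the worked example above.
A = [[1, 0, -1], [0, 1, 1], [-2, 0, 1]]
print(invert(A))   # expected: [[-1, 0, -1], [2, 1, 1], [-2, 0, -1]]
```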
It is interesting to note that, while it is important to understand what a matrix inverse is and how to calculate a matrix inverse, it is almost never necessary to actually find an explicit matrix inverse in practice. An explicit problem for which a matrix inverse might be useful can almost always be solved directly (by some form of Gaussian elimination) without actually computing the inverse.

However, as we shall see in the next section, understanding the procedure for calculating an inverse is useful in developing theoretical results.
3.5.2 Characterising invertible matrices

In the last two subsections we have defined the inverse of a matrix, demonstrated that some matrices have inverses and others don't, and given a procedure that will either find the inverse of a matrix or demonstrate that it does not exist. In this subsection, we consider some of the special properties of invertible matrices, focussing on what makes them invertible, and what particular properties are enjoyed by invertible matrices.

Theorem 3.32. An $n \times n$ matrix is invertible if and only if it has rank equal to n.

Proof. This is so important that we give a couple of proofs in slightly different language, though the fundamental concept is the same in both proofs.

Proof 1: Applying elementary row operations to a matrix does not alter its row space, and hence its rank. If a matrix A is invertible, then Gauss-Jordan elimination applied to A will yield the identity matrix, which has rank n. If A is not invertible, then applying Gauss-Jordan elimination to A yields a matrix with at least one row of zeros, and so it does not have rank n.
Proof 2: If a matrix A is invertible then there is always a solution to the matrix equation
\[ Ax = v \]
for every v, and so the column space of A is equal to the whole of $\mathbb{R}^n$. Conversely, if A is not invertible, then at least one of the standard basis vectors $e_1, e_2, \ldots, e_n$ is not in the column space of A and so the rank of A is strictly less than n.
There are some other characterisations of invertible matrices that may be useful, but they are all really just elementary restatements of Theorem 3.32.

Theorem 3.33. Let A be an $n \times n$ matrix. Then
1. A is invertible if and only if its rows are linearly independent.
2. A is invertible if and only if its columns are linearly independent.
3. A is invertible if and only if its row space is $\mathbb{R}^n$.
4. A is invertible if and only if its column space is $\mathbb{R}^n$.

Proof. These are all ways of saying the rank of A is n.
Example 3.34. (Non-invertible matrix) The matrix
\[ A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & -1 \\ 1 & 3 & 1 \end{pmatrix} \]
is not invertible because
\[ (0, 1, 2) + (1, 2, -1) = (1, 3, 1) \]
is a dependency among the rows, and so the rows are not linearly independent.
Now let's consider some of the properties of invertible matrices.

Theorem 3.35. Suppose that A and B are invertible $n \times n$ matrices, and k is a positive integer. Then

1. The matrix AB is invertible, and
\[ (AB)^{-1} = B^{-1}A^{-1}. \]
2. The matrix $A^k$ is invertible, and
\[ \left(A^k\right)^{-1} = \left(A^{-1}\right)^k. \]
3. The matrix $A^T$ is invertible, and
\[ \left(A^T\right)^{-1} = \left(A^{-1}\right)^T. \]
Proof. To show that a matrix is invertible, it is sufficient to demonstrate the existence of some matrix whose product with the given matrix is the identity. Thus to show that AB is invertible, we must find something that we can multiply AB by in order to end up with the identity.
\[
\begin{aligned}
(AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} && \text{(associativity)}\\
&= AI_nA^{-1} && \text{(properties of inverses)}\\
&= AA^{-1} && \text{(properties of identity)}\\
&= I_n && \text{(properties of inverses)}
\end{aligned}
\]
This shows that AB is invertible, and that its inverse is $B^{-1}A^{-1}$ as required. The remaining two statements are straightforward to prove.
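As a quick sanity check (our own addition, using NumPy rather than anything prescribed in the notes), the three identities of Theorem 3.35 can be confirmed numerically for a pair of matrices that happen to be invertible:

```python
import numpy as np

A = np.array([[1., 0., -1.], [0., 1., 1.], [-2., 0., 1.]])   # invertible (det = -1)
B = np.array([[2., 1., 0.], [0., 1., 3.], [1., 0., 1.]])      # invertible (det = 5)

# (AB)^-1 = B^-1 A^-1
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))
# (A^k)^-1 = (A^-1)^k, here with k = 3
print(np.allclose(np.linalg.inv(A @ A @ A),
                  np.linalg.matrix_power(np.linalg.inv(A), 3)))
# (A^T)^-1 = (A^-1)^T
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))
```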
This theorem shows that the collection of invertible $n \times n$ matrices is closed under matrix multiplication. In addition, there is a multiplicative identity (the matrix $I_n$) and every matrix has an inverse (obviously!). These turn out to be the conditions that define an algebraic structure called a group. The group of invertible $n \times n$ matrices plays a fundamental role in the mathematical subject of group theory, which is an important topic in higher-level Pure Mathematics.
3.6 Determinants

From high-school we are all familiar with the formula for the inverse of a $2 \times 2$ matrix:
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \quad \text{if } ad - bc \neq 0 \]
where the inverse does not exist if $ad - bc = 0$. In other words, a $2 \times 2$ matrix has an inverse if and only if $ad - bc \neq 0$. This number is called the determinant of the matrix, and it is either denoted $\det(A)$ or just $|A|$.

Example 3.36. (Determinant Notation) If
\[ A = \begin{pmatrix} 3 & 5 \\ 2 & 4 \end{pmatrix} \]
then we say either
\[ \det(A) = 2 \quad \text{or} \quad \begin{vmatrix} 3 & 5 \\ 2 & 4 \end{vmatrix} = 2 \]
because $3 \cdot 4 - 2 \cdot 5 = 2$.

In this section, we'll extend the concept of determinant to $n \times n$ matrices and show that it characterises invertible matrices in the same way: a matrix is invertible if and only if its determinant is non-zero.
The determinant of a square matrix is a scalar value (i.e. a number) associated with that matrix that can be recursively defined as follows:

Definition 3.37. (Determinant)
If $A = (a_{ij})$ is an $n \times n$ matrix, then the determinant of A is a real number, denoted $\det(A)$ or $|A|$, that is defined as follows:

1. If $n = 1$, then $|A| = a_{11}$.
2. If $n > 1$, then
\[ |A| = \sum_{j=1}^{n} (-1)^{1+j}\, a_{1j}\, |A[1, j]| \tag{3.4} \]
where $A[i, j]$ is the $(n-1) \times (n-1)$ matrix obtained from A by deleting the i-th row and the j-th column.

Notice that when $n > 1$, this expresses an $n \times n$ determinant as an alternating sum of n terms, each of which is an $(n-1) \times (n-1)$ determinant.
Example 3.38. (A $3 \times 3$ determinant) What is the determinant of the matrix
\[ A = \begin{pmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{pmatrix}? \]
First let's identify the matrices $A[1, 1]$, $A[1, 2]$ and $A[1, 3]$; recall these are obtained by deleting one row and column from A. For example, $A[1, 2]$ is obtained by deleting the first row and second column from A, thus
\[ A[1, 2] = \begin{pmatrix} 4 & 6 \\ 1 & 2 \end{pmatrix} \]
The term $(-1)^{1+j}$ simply alternates between $+1$ and $-1$, and so the first term is added because $(-1)^2 = 1$, the second subtracted because $(-1)^3 = -1$, the third added, and so on. Using the formula we get
\[ |A| = 2\begin{vmatrix} 3 & 6 \\ 0 & 2 \end{vmatrix} - 5\begin{vmatrix} 4 & 6 \\ 1 & 2 \end{vmatrix} + 3\begin{vmatrix} 4 & 3 \\ 1 & 0 \end{vmatrix} = 2 \cdot 6 - 5 \cdot 2 + 3 \cdot (-3) = -7 \]
where the three $2 \times 2$ determinants have just been calculated using the usual rule.
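Because Definition 3.37 is recursive, it can be coded directly. The following Python function is our own illustrative sketch (the notes do not prescribe any implementation); it expands along the first row exactly as in (3.4).

```python
def det(A):
    """Determinant by cofactor expansion along the first row (formula (3.4)).

    Fine for small matrices; hopelessly slow for large ones, as discussed
    later in this chapter.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # A[1, j]: delete row 1 and column j+1 (row 0 and column j in 0-indexing).
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)   # (-1)**j matches (-1)^{1+j}
    return total

print(det([[2, 5, 3], [4, 3, 6], [1, 0, 2]]))   # -7, as in Example 3.38
```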
This procedure for calculating the determinant is called expanding along the first row, because each of the terms $a_{1j}|A[1, j]|$ is associated with an entry in the first row. However it turns out, although we shall not prove it,¹² that it is possible to do the expansion along any row or indeed, any column. So in fact we have the following result:

¹² Proving this is not difficult but it involves a lot of manipulation of subscripts and nested sums, which is probably not the best use of your time.
Theorem 3.39. Let $A = (a_{ij})$ be an $n \times n$ matrix. Then for any fixed row index i we have
\[ |A| = \sum_{j=1}^{n} (-1)^{i+j}\, a_{ij}\, |A[i, j]| \]
and for any fixed column index j, we have
\[ |A| = \sum_{i=1}^{n} (-1)^{i+j}\, a_{ij}\, |A[i, j]|. \]
(Notice that the first of these two sums involves terms obtained from the i-th row of the matrix, while the second involves terms from the j-th column of the matrix.)

Proof. Omitted.
Example 3.40. (Expanding down the second column) Determine the determinant of
\[ A = \begin{pmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{pmatrix} \]
by expanding down the second column. Notice that because we are using the second column, the signs given by the $(-1)^{i+j}$ terms alternate $-1$, $+1$, $-1$, starting with a negative, not a positive. So the calculation gives
\[ |A| = (-1)\cdot 5\begin{vmatrix} 4 & 6 \\ 1 & 2 \end{vmatrix} + 3\begin{vmatrix} 2 & 3 \\ 1 & 2 \end{vmatrix} + (-1)\cdot 0\cdot(\text{don't care}) = -5 \cdot 2 + 3 \cdot 1 + 0 = -7 \]
Also notice that because $a_{32} = 0$, the term $(-1)^{3+2}a_{32}|A[3, 2]|$ is forced to be zero, and so there is no need to actually calculate $|A[3, 2]|$.
In general, you should choose the row or column of the matrix
that has lots of zeros in it, in order to make the calculation as easy
as possible!
Example 3.41. (Easy if you choose right) To determine the determinant of
\[ A = \begin{pmatrix} 3 & 1 & 0 & 2 \\ 4 & 1 & 0 & 1 \\ 2 & 1 & 0 & 1 \\ 1 & 1 & 3 & 2 \end{pmatrix} \]
use the third column, which has only one non-zero entry, and get
\[ |A| = (+1)\cdot 0 + (-1)\cdot 0 + (+1)\cdot 0 + (-1)\cdot 3\begin{vmatrix} 3 & 1 & 2 \\ 4 & 1 & 1 \\ 2 & 1 & 1 \end{vmatrix} = -6 \]
rather than getting an expression with three or four $3 \times 3$ determinants to evaluate!
From this we can immediately deduce some theoretical results:

Theorem 3.42. Let A be an $n \times n$ matrix. Then
1. $|A^T| = |A|$,
2. $|\lambda A| = \lambda^n |A|$, and
3. If A has a row of zeros, then $|A| = 0$.
4. If A is an upper (or lower) triangular matrix, then $|A|$ is the product of the entries on the diagonal of the matrix.

Proof. To prove the first statement, we use induction on n. Certainly the statement is true for $1 \times 1$ matrices. So now suppose that it is true for all matrices of size up to $n - 1$. Notice that expanding along the first row of A gives a sum with the same coefficients as expanding down the first column of $A^T$, the signs of each term are the same because $(-1)^{i+j} = (-1)^{j+i}$, and all the $(n-1) \times (n-1)$ determinants in the first sum are just the transposes of those in the second sum, and so are equal by the inductive hypothesis.

For the second statement we again use induction. Certainly the statement is true for $1 \times 1$ matrices. So now suppose that it is true for all matrices of size up to $n - 1$. Then
\[
\begin{aligned}
|\lambda A| &= \sum_{j=1}^{n} (-1)^{i+j}(\lambda a_{ij})\, |\lambda A[i, j]| \\
&= \sum_{j=1}^{n} (-1)^{i+j}(\lambda a_{ij})\, \lambda^{n-1} |A[i, j]| && \text{(inductive hypothesis)}\\
&= \lambda^n \sum_{j=1}^{n} (-1)^{i+j}\, a_{ij}\, |A[i, j]| && \text{(rearranging)}\\
&= \lambda^n |A|.
\end{aligned}
\]
The third statement is immediate because if we expand along the row of zeros, then every term in the sum is zero.

The fourth statement again follows from an easy induction argument.
Example 3.43. (Determinant of matrix in row echelon form) A matrix in row echelon form is necessarily upper triangular, and so its determinant can easily be calculated. For example, the matrix
\[ A = \begin{pmatrix} 2 & 0 & 1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & -3 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
which is in row echelon form has determinant equal to $2 \cdot 1 \cdot (-3) \cdot 1 = -6$ because this is the product of the diagonal entries. We can verify this easily from the formula by repeatedly expanding down the first column. So
\[ \begin{vmatrix} 2 & 0 & 1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & -3 & 2 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 2\begin{vmatrix} 1 & 2 & 1 \\ 0 & -3 & 2 \\ 0 & 0 & 1 \end{vmatrix} = 2 \cdot 1\begin{vmatrix} -3 & 2 \\ 0 & 1 \end{vmatrix} = 2 \cdot 1 \cdot (-3) = -6. \]
3.6.1 Calculating determinants

The recursive definition of a determinant expresses an $n \times n$ determinant as a linear combination of n terms each involving an $(n-1) \times (n-1)$ determinant. So to find a $10 \times 10$ determinant like this involves computing ten $9 \times 9$ determinants, each of which involves nine $8 \times 8$ determinants, each of which involves eight $7 \times 7$ determinants, each of which involves seven $6 \times 6$ determinants, each of which involves six $5 \times 5$ determinants, each of which involves five $4 \times 4$ determinants, each of which involves four $3 \times 3$ determinants, each of which involves three $2 \times 2$ determinants. While this is possible (by computer) for a $10 \times 10$ matrix, even the fastest supercomputer would not complete a $100 \times 100$ matrix in the lifetime of the universe.

However, in practice, a computer can easily find a $100 \times 100$ determinant, so there must be another more efficient way. Once again, this way is based on elementary row operations.
Theorem 3.44. Suppose A is an $n \times n$ matrix, and that $A'$ is obtained from A by performing a single elementary row operation.

1. If the elementary row operation is Type 1, then
\[ |A'| = -|A|. \]
In other words, a Type 1 elementary row operation multiplies the determinant by $-1$.

2. If the elementary row operation is Type 2, say $R_i \leftarrow \lambda R_i$, then
\[ |A'| = \lambda|A|. \]
In other words, multiplying a row by the scalar $\lambda$ multiplies the determinant by $\lambda$.

3. If the elementary row operation is of Type 3, then
\[ |A| = |A'|. \]
In other words, adding a multiple of one row to another does not change the determinant.

Proof. Omitted for now.
Previously we have used elementary row operations to find solutions to systems of linear equations, and to find the basis and dimension of the row space of a matrix. In both these applications, the elementary row operations did not change the answer that was being sought. For finding determinants however, elementary row operations do change the determinant of the matrix, but they change it in a controlled fashion and so the process is still useful.¹³

¹³ Another common mistake for beginning students of linear algebra is to assume that row-reduction preserves every interesting property of a matrix. Instead, row-reduction preserves some properties, alters others in a controlled fashion, and destroys others. It is important to always know why the row-reduction is being done.
Example 3.45. (Finding determinant by row-reduction) We return to an earlier example, of finding the determinant of
\[ A = \begin{pmatrix} 2 & 5 & 3 \\ 4 & 3 & 6 \\ 1 & 0 & 2 \end{pmatrix} \]
Suppose that this unknown value is denoted d. Then after doing the Type 1 elementary row operation $R_1 \leftrightarrow R_3$ we get the matrix
\[ \begin{pmatrix} 1 & 0 & 2 \\ 4 & 3 & 6 \\ 2 & 5 & 3 \end{pmatrix} \]
which has determinant $-d$, because Type 1 elementary row operations multiply the determinant by $-1$. If we now perform the Type 3 elementary row operations $R_2 \leftarrow R_2 - 4R_1$ and $R_3 \leftarrow R_3 - 2R_1$, then the resulting matrix
\[ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & -2 \\ 0 & 5 & -1 \end{pmatrix} \]
still has determinant $-d$ because Type 3 elementary row operations do not alter the determinant. Finally, the elementary row operation $R_3 \leftarrow R_3 - (5/3)R_2$ yields the matrix
\[ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 3 & -2 \\ 0 & 0 & 7/3 \end{pmatrix} \]
which still has determinant $-d$. However, it is easy to see that the determinant of this final matrix is 7, and so $-d = 7$, which immediately tells us that $d = -7$, confirming the results of Examples 3.38 and 3.40.
Thinking about this process in another way shows us that if a matrix A has determinant d and the matrix $A'$ is the row-echelon matrix obtained by performing Gaussian elimination on A, then
\[ |A'| = \alpha|A| \quad \text{for some } \alpha \neq 0. \]
Combining this with the fourth property of Theorem 3.42 allows us to state the single most important property of determinants:

Theorem 3.46. A matrix A is invertible if and only if its determinant is non-zero.

Proof. Consider the row echelon matrix $A'$ obtained by applying Gaussian elimination to A. If A is invertible, then $A'$ has no zero rows and so every diagonal entry is non-zero and thus $|A'| \neq 0$, while if A is not invertible, $A'$ has at least one zero row and thus $|A'| = 0$. As the determinant of A is a non-zero multiple of the determinant of $A'$, it follows that A has non-zero determinant if and only if it is invertible.
3.6.2 Properties of Determinants

We finish this chapter with some of the properties of determinants, most of which follow immediately from the following theorem, which shows that the determinant function is multiplicative.

Theorem 3.47. If A and B are two $n \times n$ matrices, then
\[ |AB| = |A|\,|B|. \]

Proof. There are several proofs of this result, none of which are very nice. We give a sketch outline¹⁴ of the most illuminating proof.

¹⁴ This is a very brief outline of the proof so do not worry if you cannot follow it without some guidance on how to fill in the gaps.

First note that if either A or B (or both) is not invertible, then AB is not invertible and so the result is true if any of the determinants is zero.

Then proceed in the following steps:

1. Define an elementary matrix to be a matrix obtained by performing a single elementary row operation on the identity matrix.
2. Note that premultiplying a matrix A by an elementary matrix E, thereby forming the matrix EA, is exactly the same as performing the same elementary row operation on A.
3. Show that elementary matrices of Type 1, 2 and 3 have determinant $-1$, $\lambda$ and $1$ respectively (where $\lambda$ is the non-zero scalar associated with an elementary row operation of Type 2).
4. Conclude that the result is true if A is an elementary matrix or a product of elementary matrices.
5. Finish by proving that a matrix is invertible if and only if it is the product of elementary matrices, because Gauss-Jordan elimination will reduce any invertible matrix to the identity matrix.

The other proofs of this result use a different description of the determinant as the weighted sum of $n!$ products of matrix entries, together with extensive algebraic manipulation.
Just for fun, we'll demonstrate this result for $2 \times 2$ matrices purely algebraically, in order to give a flavour of the alternative proofs. Suppose that
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \qquad B = \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix} \]
Then
\[ AB = \begin{pmatrix} aa' + bc' & ab' + bd' \\ a'c + c'd & b'c + dd' \end{pmatrix} \]
Therefore
\[
\begin{aligned}
|AB| &= (aa' + bc')(b'c + dd') - (ab' + bd')(a'c + c'd) \\
&= aa'b'c + aa'dd' + bc'b'c + bc'dd' - ab'a'c - ab'c'd - bd'a'c - bd'c'd \\
&= (aa'b'c - ab'a'c) + aa'dd' + bc'b'c + (bc'dd' - bd'c'd) - ab'c'd - bd'a'c \\
&= 0 + (ad)(a'd') + (bc)(b'c') + 0 - (ad)(b'c') - (a'd')(bc) \\
&= (ad - bc)(a'd' - b'c') \\
&= |A|\,|B|.
\end{aligned}
\]
The multiplicativity of the determinant immediately gives the main properties.

Theorem 3.48. Suppose that A and B are $n \times n$ matrices and k is a positive integer. Then
1. $|AB| = |BA|$
2. $|A^k| = |A|^k$
3. If A is invertible, then $|A^{-1}| = 1/|A|$

Proof. The first two are immediate, and the third follows from the fact that $AA^{-1} = I_n$ and so $|A|\,|A^{-1}| = 1$.
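The properties in Theorems 3.47 and 3.48 are easy to check numerically. A possible NumPy sketch (our own, not from the notes):

```python
import numpy as np

A = np.array([[2., 5., 3.], [4., 3., 6.], [1., 0., 2.]])    # det(A) = -7 (Example 3.38)
B = np.array([[1., 0., -1.], [0., 1., 1.], [-2., 0., 1.]])  # det(B) = -1

# |AB| = |A||B|
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))
# |AB| = |BA|
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A)))
# |A^k| = |A|^k, here with k = 3
print(np.isclose(np.linalg.det(np.linalg.matrix_power(A, 3)), np.linalg.det(A) ** 3))
# |A^{-1}| = 1/|A|
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))
```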
4
Linear transformations
4.1 Introduction

First we will give the general definition of a function.

Definition 4.1. (Function)
Given two sets A and B, a function $f : A \to B$ is a rule that assigns to each element of A a unique element of B. We often write
\[ f : A \to B, \quad a \mapsto f(a) \]
where $f(a)$ is the element of B assigned to a, called the image of a under f. The set A is called the domain of f and is sometimes denoted $\mathrm{dom}(f)$. The set B is called the codomain of f. The range of f (sometimes denoted $\mathrm{range}(f)$) is the set of all elements of B that are the image of some element of A.
Often $f(a)$ is defined by some equation involving a (or whatever variable is being used to represent elements of A), for example $f(a) = a^2$. However, sometimes you may see f defined by listing $f(a)$ for each $a \in A$. For example, if $A = \{1, 2, 3\}$ we could define f by
\[ f : A \to \mathbb{R}, \quad 1 \mapsto 10, \quad 2 \mapsto 10, \quad 3 \mapsto 102 \]
If $f(x)$ is defined by some rule and the domain of f is not explicitly given, then we assume that the domain of f is the set of all values on which $f(x)$ is defined.

Note that the range of f need not be all of the codomain. For example, if $f : \mathbb{R} \to \mathbb{R}$ is the function defined by $f(x) = x^2$ then the codomain of f is $\mathbb{R}$ while the range of f is the set $\{x \in \mathbb{R} \mid x \geq 0\}$.

A linear transformation is a function from one vector space to another preserving the structure of vector spaces, that is, it preserves vector addition and scalar multiplication.

More precisely:
Definition 4.2. (Linear transformation)
A function f from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a linear transformation if:
1. $f(u + v) = f(u) + f(v)$ for all u, v in $\mathbb{R}^n$;
2. $f(\lambda v) = \lambda f(v)$ for all v in $\mathbb{R}^n$ and all $\lambda$ in $\mathbb{R}$.

An interesting case is when $n = m$, in which case the domain and codomain are the same vector space.

Example 4.3. (Linear transformation) In $\mathbb{R}^3$, the orthogonal projection to the xy-plane is a linear transformation. This maps the vector $(x, y, z)$ to $(x, y, 0)$. Check the two conditions to convince yourself.
Example 4.4. (Not a linear transformation) The function from $\mathbb{R}^3$ to $\mathbb{R}$ given by $f(x, y, z) = x^2 + y^2 + z^2$ is not a linear transformation. Indeed, for $v = (1, 0, 0)$ and $\lambda = 2$, $f(\lambda v) = f(2, 0, 0) = 4$ while $\lambda f(v) = 2 \cdot 1 = 2$.

Example 4.5. (Not a linear transformation) Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = ax + b$. Note that $f(1) = a + b$ while $f(2) = 2a + b \neq 2(a + b)$ when $b \neq 0$. Thus when $b \neq 0$, the function f is not a linear transformation of the vector space $\mathbb{R}$. We call f an affine function.
Example 4.6. Let A be an $m \times n$ matrix. Then the function f from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that $f(x) = Ax$ is a linear transformation (where we see x as an $n \times 1$ column vector, as described in Chapter 1). Indeed
\[ f(u + v) = A(u + v) = Au + Av = f(u) + f(v) \]
(using matrix property (8) of Theorem 3.2), and
\[ f(\lambda v) = A(\lambda v) = \lambda Av = \lambda f(v) \]
(using matrix property (7) of Theorem 3.2).
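To see the two conditions of Definition 4.2 in action for the map $x \mapsto Ax$, here is a small numerical check (an illustration of ours, not part of the notes); the particular matrix and vectors are arbitrary:

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [3., -1., 1.]])       # any 2x3 matrix gives a map R^3 -> R^2
f = lambda x: A @ x

u = np.array([1., 2., -1.])
v = np.array([0.5, 0., 4.])
lam = 3.0

print(np.allclose(f(u + v), f(u) + f(v)))    # condition 1: additivity
print(np.allclose(f(lam * v), lam * f(v)))   # condition 2: scalar multiplication
```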
Theorem 4.7. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation.
(i) $f(\mathbf{0}) = \mathbf{0}$.
(ii) The range of f is a subspace of $\mathbb{R}^m$.

Proof. (i) This follows from an easy calculation and the fact that $\mathbf{0} + \mathbf{0} = \mathbf{0}$: $f(\mathbf{0}) = f(\mathbf{0} + \mathbf{0}) = f(\mathbf{0}) + f(\mathbf{0})$. The result follows from subtracting $f(\mathbf{0})$ on each side.

(ii) We need to prove the three subspace conditions for $\mathrm{range}(f)$ (see Definition 2.4).

(S1) $\mathbf{0} \in \mathrm{range}(f)$ since $\mathbf{0} = f(\mathbf{0})$ by Part (i).

(S2) Let $u, v \in \mathrm{range}(f)$, that is, $u = f(u^*)$, $v = f(v^*)$ for some $u^*, v^* \in \mathbb{R}^n$. Then $u + v = f(u^*) + f(v^*) = f(u^* + v^*)$ and so $u + v \in \mathrm{range}(f)$.

(S3) Let $\lambda \in \mathbb{R}$ and $v \in \mathrm{range}(f)$, that is, $v = f(v^*)$ for some $v^* \in \mathbb{R}^n$. Then $\lambda v = \lambda f(v^*) = f(\lambda v^*)$ and so $\lambda v \in \mathrm{range}(f)$ for all scalars $\lambda$.
4.1.1 Linear transformations and bases

A linear transformation can be given by a formula, but there are other ways to describe it. In fact, if we know the images under a linear transformation of each of the vectors in a basis, then the rest of the linear transformation is completely determined.

Theorem 4.8. Let $u_1, u_2, \ldots, u_n$ be a basis for $\mathbb{R}^n$ and let $t_1, t_2, \ldots, t_n$ be n vectors of $\mathbb{R}^m$. Then there exists a unique linear transformation f from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that $f(u_1) = t_1$, $f(u_2) = t_2$, \ldots, $f(u_n) = t_n$. (For instance, the basis of $\mathbb{R}^n$ can be the standard basis.)
Proof. We know by Theorem 2.33 that any vector $v \in \mathbb{R}^n$ can be written in a unique way as $v = \lambda_1 u_1 + \lambda_2 u_2 + \ldots + \lambda_n u_n$ (where the $\lambda_i$'s are real numbers). Define $f(v) = \lambda_1 t_1 + \lambda_2 t_2 + \ldots + \lambda_n t_n$. Then f satisfies $f(u_i) = t_i$ for all i between 1 and n and we can easily check that f is linear. So we have that f exists.

Now suppose g is also a linear transformation satisfying $g(u_i) = t_i$ for all i between 1 and n. Then
\[
\begin{aligned}
g(v) &= g(\lambda_1 u_1 + \lambda_2 u_2 + \ldots + \lambda_n u_n) \\
&= g(\lambda_1 u_1) + g(\lambda_2 u_2) + \ldots + g(\lambda_n u_n) && \text{(by the first condition for a linear function)}\\
&= \lambda_1 g(u_1) + \lambda_2 g(u_2) + \ldots + \lambda_n g(u_n) && \text{(by the second condition for a linear function)}\\
&= \lambda_1 t_1 + \lambda_2 t_2 + \ldots + \lambda_n t_n \\
&= f(v)
\end{aligned}
\]
Thus $g(v) = f(v)$ for all $v \in \mathbb{R}^n$, so they are the same linear transformation, that is, f is unique.
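Theorem 4.8 also suggests how to compute with such an f: once the images of the basis vectors are known, every other image is a linear combination of them. The sketch below is our own illustration, with made-up image vectors and the standard basis of $\mathbb{R}^2$; it anticipates the next subsection by storing those images as the columns of a matrix.

```python
import numpy as np

# Images of the standard basis vectors of R^2 under some linear map f : R^2 -> R^3.
# (These particular vectors are invented purely for illustration.)
t1 = np.array([1., 0., 2.])   # f(e1)
t2 = np.array([3., 1., 0.])   # f(e2)

# The images of the basis vectors become the columns of a matrix.
A = np.column_stack([t1, t2])

def f(x):
    # f(x, y) = x*f(e1) + y*f(e2), which is exactly A @ (x, y).
    return A @ x

print(f(np.array([2., -1.])))   # 2*t1 - t2 = [-1, -1, 4]
```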
Exercise 4.1.1. Say f is a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^3$ with $f(1, 0) = (1, 2, 3)$ and $f(0, 1) = (0, 1, 2)$. Determine $f(x, y)$.
4.1.2 Linear transformations and matrices

We have seen that it is useful to choose a basis of the domain, say $u_1, u_2, \ldots, u_n$. Now we will also take a basis for the codomain: $v_1, v_2, \ldots, v_m$. For now we can think of both the bases as the standard ones, but later we will need the general case.

By Theorem 2.33, each vector $f(u_j)$ ($1 \leq j \leq n$) has unique coordinates in the basis $v_1, v_2, \ldots, v_m$ of $\mathbb{R}^m$. More precisely:
\[ f(u_j) = a_{1j}v_1 + a_{2j}v_2 + \ldots + a_{mj}v_m \]
where the symbols $a_{ij}$ represent real numbers, for $1 \leq i \leq m$, $1 \leq j \leq n$. For short we write
\[ f(u_j) = \sum_{i=1}^{m} a_{ij}v_i. \]
Now we can determine the image of any vector x in $\mathbb{R}^n$. If $x = x_1u_1 + x_2u_2 + \ldots + x_nu_n$, then we have:
\[
\begin{aligned}
f(x) &= f(x_1u_1 + x_2u_2 + \ldots + x_nu_n) \\
&= f\Big(\sum_{j=1}^{n} x_ju_j\Big) \\
&= \sum_{j=1}^{n} x_j f(u_j) && \text{(by linearity)}\\
&= \sum_{j=1}^{n} x_j \Big(\sum_{i=1}^{m} a_{ij}v_i\Big) \\
&= \sum_{i=1}^{m} \Big(\sum_{j=1}^{n} a_{ij}x_j\Big) v_i
\end{aligned}
\]
Notice that $\sum_{j=1}^{n} a_{ij}x_j$ is exactly the i-th element of the $m \times 1$ matrix Ax, where x is the $n \times 1$ vector $(x_1, x_2, \ldots, x_n)^T$. Now let us see vectors as column vectors (coordinates in the chosen basis), that is, vectors in $\mathbb{R}^m$ are $m \times 1$ matrices and vectors in $\mathbb{R}^n$ are $n \times 1$ matrices. Then $f(x) = Ax$, where $A = (a_{ij})$ is the $m \times n$ matrix defined by $f(u_j) = \sum_{i=1}^{m} a_{ij}v_i$. Since A has size $m \times n$ and x has size $n \times 1$, Ax has size $m \times 1$ and so is a vector of $\mathbb{R}^m$. (Together with Example 4.6, this tells us that linear transformations are essentially the same as matrices, after you have chosen a basis of the domain and a basis of the codomain.)

This gives us a very convenient way to express a linear transformation (as the matrix A) and to calculate the image of any vector.
Definition 4.9. (Matrix of a linear transformation)
The matrix of a linear transformation f, with respect to the basis B of the domain and the basis C of the codomain, is the matrix whose j-th column contains the coordinates in the basis C of the image under f of the j-th basis vector of B.

When both B and C are the standard bases then we refer to the matrix as the standard matrix of f.

Whenever $m = n$ we usually take $B = C$.
Example 4.10. (identity) The identity matrix $I_n$ corresponds to the linear transformation that fixes every basis vector, and hence fixes every vector in $\mathbb{R}^n$.

Example 4.11. (dilation) The linear transformation with matrix
\[ \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \]
(with respect to the standard basis, for both the domain and codomain) maps $e_1$ to $2e_1$ and $e_2$ to $2e_2$: it is a dilation of ratio 2 in $\mathbb{R}^2$. (A dilation is a function that maps every vector to a fixed multiple of itself: $x \mapsto \lambda x$, where $\lambda$ is called the ratio of the dilation.)
Example 4.12. (rotation) The linear transformation with matrix
\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \]
(with respect to the standard basis) maps $e_1$ to $e_2$ and $e_2$ to $-e_1$: it is the anticlockwise rotation of the plane by an angle of 90 degrees (or $\pi/2$) around the origin.
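A quick numerical check of Example 4.12 (our own illustration, not part of the notes):

```python
import numpy as np

R = np.array([[0., -1.],
              [1.,  0.]])          # the matrix from Example 4.12

e1 = np.array([1., 0.])
e2 = np.array([0., 1.])
print(R @ e1)                      # [0, 1]  = e2
print(R @ e2)                      # [-1, 0] = -e1
print(R @ np.array([3., 1.]))      # [-1, 3]: (3, 1) rotated anticlockwise by 90 degrees
```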
Exercise 4.1.2. In $\mathbb{R}^2$, an anticlockwise rotation of angle $\theta$ around the origin is a linear transformation. What is its matrix with respect to the standard basis?
4.1.3 Composition

Whenever you have two linear transformations such that the codomain of the first one is the same vector space as the domain of the second one, we can apply the first one followed by the second one.

Definition 4.13. (Composition)
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ be functions. Then the function $g \circ f : \mathbb{R}^n \to \mathbb{R}^p$ defined by
\[ (g \circ f)(x) = g(f(x)) \]
is the composition of f by g. (Notice we read composition from right to left.)
Theorem 4.14. If $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ are linear transformations, then $g \circ f$ is also a linear transformation.

Proof. We need to prove the two conditions for a linear transformation.

For all u, v in $\mathbb{R}^n$:
\[
\begin{aligned}
(g \circ f)(u + v) &= g(f(u + v)) && \text{(definition of composition)}\\
&= g(f(u) + f(v)) && \text{($f$ is a linear transformation)}\\
&= g(f(u)) + g(f(v)) && \text{($g$ is a linear transformation)}\\
&= (g \circ f)(u) + (g \circ f)(v) && \text{(definition of composition)}
\end{aligned}
\]
For all v in $\mathbb{R}^n$ and all $\lambda$ in $\mathbb{R}$:
\[
\begin{aligned}
(g \circ f)(\lambda v) &= g(f(\lambda v)) && \text{(definition of composition)}\\
&= g(\lambda f(v)) && \text{($f$ is a linear transformation)}\\
&= \lambda g(f(v)) && \text{($g$ is a linear transformation)}\\
&= \lambda (g \circ f)(v) && \text{(definition of composition)}
\end{aligned}
\]
Let $A = (a_{ij})$ be the matrix corresponding to f with respect to the basis $\{u_1, u_2, \ldots, u_n\}$ of the domain and the basis $\{v_1, v_2, \ldots, v_m\}$ of the codomain. Let $B = (b_{ij})$ be the matrix corresponding to g with respect to the basis $\{v_1, v_2, \ldots, v_m\}$ of the domain and the basis $\{w_1, w_2, \ldots, w_p\}$ of the codomain. So A is an $m \times n$ matrix and B is a $p \times m$ matrix. Let us look at the image of $u_1$ under $g \circ f$. We first apply f, so the image $f(u_1)$ corresponds to the first column of A:
\[ f(u_1) = a_{11}v_1 + a_{21}v_2 + \ldots + a_{m1}v_m = \sum_{i=1}^{m} a_{i1}v_i. \]
Then we apply g to $f(u_1)$:
\[
\begin{aligned}
(g \circ f)(u_1) &= g\Big(\sum_{i=1}^{m} a_{i1}v_i\Big) \\
&= \sum_{i=1}^{m} a_{i1}\, g(v_i) && \text{($g$ is a linear transformation)}\\
&= \sum_{i=1}^{m} a_{i1}\Big(\sum_{j=1}^{p} b_{ji}w_j\Big) \\
&= \sum_{j=1}^{p} \Big(\sum_{i=1}^{m} a_{i1}b_{ji}\Big) w_j \\
&= \sum_{j=1}^{p} \Big(\sum_{i=1}^{m} b_{ji}a_{i1}\Big) w_j \\
&= \sum_{j=1}^{p} (BA)_{j1}\, w_j
\end{aligned}
\]
Notice this is exactly the first column of the matrix BA! We can do the same calculation with any $u_j$ ($1 \leq j \leq n$) to see that the image $(g \circ f)(u_j)$ corresponds exactly to the j-th column of the matrix BA. Hence the matrix corresponding to $g \circ f$ with respect to the basis $u_1, u_2, \ldots, u_n$ of the domain and the basis $w_1, w_2, \ldots, w_p$ of the codomain is BA.
Key Concept 4.15. Composition of linear transformations is the
same thing as multiplication of the corresponding matrices.
You may have thought that matrix multiplication was defined in a strange way: it was defined precisely so that it corresponds with composition of linear transformations.
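Key Concept 4.15 is easy to verify numerically. In the sketch below (our own illustration) we compose the rotation of Example 4.12 with the dilation of Example 4.11 and compare applying the maps one after the other with applying the single matrix BA:

```python
import numpy as np

A = np.array([[0., -1.], [1., 0.]])   # f: anticlockwise rotation by 90 degrees (Example 4.12)
B = np.array([[2., 0.], [0., 2.]])    # g: dilation of ratio 2 (Example 4.11)

x = np.array([3., 1.])
print(B @ (A @ x))                    # g(f(x)), applied step by step
print((B @ A) @ x)                    # the single matrix BA applied once: same vector
print(np.allclose(B @ (A @ x), (B @ A) @ x))
```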
4.1.4 Inverses

An inverse function is a function that undoes another function: if $f(x) = y$, the inverse function g maps y to x. More directly, $g(f(x)) = x$, meaning that $g \circ f$ is the identity function, that is, it fixes every vector. A function f that has an inverse is called invertible and the inverse function is then uniquely determined by f and is denoted by $f^{-1}$.

In the case where f is a linear transformation, it is invertible if and only if its domain equals its codomain and its range, say $\mathbb{R}^n$. In this case, $f^{-1} \circ f$ is the identity function.

Let A be the matrix corresponding to f and B be the matrix corresponding to $f^{-1}$ (all with respect to the standard basis, say); both are $n \times n$ matrices. We have seen in Example 4.10 that the matrix corresponding to the identity function is the identity matrix $I_n$. By Key Concept 4.15, we have that $BA = I_n$. In other words, B is the inverse matrix of A.

Hence we have:
Theorem 4.16. The matrix corresponding to the inverse of an invertible linear transformation f is the inverse of the matrix corresponding to f (with respect to a chosen basis).
4.1.5 Rank-Nullity Theorem revisited

Remember the Rank-Nullity Theorem (Theorem 3.18): $\mathrm{rank}(A) + \mathrm{nullity}(A) = n$ for an $m \times n$ matrix A. We now know that A represents a linear transformation $f : x \mapsto Ax$, so we are going to interpret what the rank and the nullity are in terms of f.

The rank of A is the dimension of the column space of A. We have seen that the columns represent the images $f(u_j)$ for each basis vector $u_j$ of $\mathbb{R}^n$. So the column space corresponds exactly to the range of f. By Theorem 4.7, we know that the range of f is a subspace, and so has a dimension: the rank of A corresponds to the dimension of the range of f.

The nullity of A is the dimension of the null space of A. Recall that the null space of A is the set of vectors x of $\mathbb{R}^n$ such that $Ax = \mathbf{0}$. In terms of f, it corresponds to the vectors x of $\mathbb{R}^n$ such that $f(x) = \mathbf{0}$. This set is called the kernel of f.
Definition 4.17. (Kernel)
The kernel of a linear transformation $f : \mathbb{R}^n \to \mathbb{R}^m$ is the set
\[ \mathrm{Ker}(f) = \{x \in \mathbb{R}^n \mid f(x) = \mathbf{0}\}. \]
The kernel of f is a subspace¹ of $\mathbb{R}^n$ and so has a dimension: the nullity of A corresponds to the dimension of the kernel of f.

¹ Try proving it!
We can now rewrite the Rank-Nullity Theorem as follows:

Theorem 4.18. Let f be a linear transformation. Then
\[ \dim(\mathrm{range}(f)) + \dim(\mathrm{Ker}(f)) = \dim(\mathrm{dom}(f)). \]
We immediately get:

Corollary 4.19. The dimension of the range of a linear transformation is at most the dimension of its domain.
Part II
Differential Calculus: by Luchezar Stoyanov, Jennifer Hopwood and Michael Giudici. Some pictures courtesy of Kevin Judd.
5
Vector functions and functions of several variables
A great deal can be done with scalar functions of one real variable; that is, functions that are rules for assigning to each real number in some subset of the real line (the independent variable) another real number (the dependent variable). An example of such a function is
\[ f(x) = x^2 \]
where x is a real number. However, sometimes this type of function is not enough to deal with the problems that arise in science, engineering, economics or other fields. For instance, a meteorologist might want to deal with air pressure, which varies with latitude, longitude and time, so would need a function of three variables. Other types of problems might involve just one independent variable (for instance, time) which affects several dependent variables (e.g. the components in three directions of an electromagnetic field); then we would have a set of three functions of one variable. We could deal with all three at once by regarding them as the components of a vector.

Just as for scalar functions of one real variable, it is important to know how these functions change as the variables change. This is where calculus is required. Luckily the ideas of differential calculus from high school can be extended to these new functions and that is the focus of this part of the course.
5.1 Vector valued functions

Vector functions are functions that take a real number and assign a vector in $\mathbb{R}^n$. We will usually be dealing with vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$. More precisely, given a subset I of $\mathbb{R}$, a vector function with values in $\mathbb{R}^3$ is a function
\[ r : I \to \mathbb{R}^3, \quad t \mapsto r(t) = (f(t), g(t), h(t)) \]
The functions $f(t)$, $g(t)$ and $h(t)$ are called coordinate functions of $r(t)$.

For instance, t could represent time and $r(t)$ the position of some object at that time. As t varies in I, $r(t)$ draws a curve C in $\mathbb{R}^3$ which is the path of the object in space.
An example of such a function is
\[ r : \mathbb{R} \to \mathbb{R}^3, \quad t \mapsto \left(\sin t,\; 2\cos t,\; \frac{t}{10}\right) \]
that plots the curve given in Figure 5.1.

Figure 5.1: The curve defined by $r(t) = \left(\sin(t), 2\cos(t), \frac{t}{10}\right)$ for $t \in [0, 20]$.

In mathematical language, the curve C is the set described by
\[ C = \{(f(t), g(t), h(t)) \mid t \in I\}. \]
We say that
\[ C : x = f(t), \; y = g(t), \; z = h(t), \quad \text{for } t \in I \]
are parametric equations of C with parameter t, or equivalently, that $(f(t), g(t), h(t))$ is a parameterisation of C.
Remark 5.1. Any curve C (except if it consists of a single point) has infinitely many different parameterisations, that is, can be described by infinitely many vector functions. See Example 5.2(2) and (3) below.

A vector function with values in $\mathbb{R}^2$ has the form
\[ r : I \to \mathbb{R}^2, \quad t \mapsto r(t) = (f(t), g(t)) \]
with coordinate functions $f(t)$ and $g(t)$. As above, when t varies in I, $r(t)$ draws a curve C in $\mathbb{R}^2$.
Example 5.2. 1. For $t \in \mathbb{R}$, let $r(t) = (x_0 + at, y_0 + bt, z_0 + ct)$, where $x_0, y_0, z_0, a, b, c$ are real constants. The corresponding curve C defined by
\[ x = x_0 + at, \quad y = y_0 + bt, \quad z = z_0 + ct, \quad \text{for } t \in \mathbb{R}, \]
is a straight line in $\mathbb{R}^3$: this is the line through the point $P = (x_0, y_0, z_0)$ parallel to the vector $v = (a, b, c)$. Briefly, $r(t) = P + tv$.

2. For $t \in [0, 2\pi]$ define $r(t) = (\cos(t), \sin(t))$. The corresponding curve is the unit circle with centre at $(0, 0)$ and with radius one.

3. Another parameterisation of the unit circle is given by $r_1(t) = (\cos(2t), \sin(2t))$ for $t \in [0, \pi]$.
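To see Remark 5.1 and Example 5.2(2) and (3) concretely, the following short sketch (our own illustration, not part of the notes) evaluates both parameterisations of the unit circle and checks that they produce points on the same curve:

```python
import numpy as np

# Two parameterisations of the unit circle, as in Example 5.2(2) and (3).
r  = lambda t: np.array([np.cos(t),   np.sin(t)])     # t in [0, 2*pi]
r1 = lambda t: np.array([np.cos(2*t), np.sin(2*t)])   # t in [0, pi]

# r1 traces the same set of points, just twice as fast: r1(t) = r(2t).
for t in np.linspace(0, np.pi, 5):
    print(r(2*t), r1(t), np.linalg.norm(r1(t)))       # same point, distance 1 from origin
```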
5.2 Functions of two variables

We can think of two variables x and y as the vector $(x, y) \in \mathbb{R}^2$. A function of two variables can then be thought of as a function whose variable is a vector in $\mathbb{R}^2$. More formally, let D be a subset of $\mathbb{R}^2$. A function of two variables defined on D is a rule that assigns to each $(x, y) \in D$ a real number $f(x, y)$, that is,
\[ f : D \to \mathbb{R}, \quad (x, y) \mapsto f(x, y) \]
An example of a function of two variables is the height above sea level at a location on a map defined by its x and y coordinates.

Given a function $f : A \to B$, the graph of f is the set
\[ \mathrm{graph}(f) = \{(a, f(a)) \mid a \in A\} \]
When f is a scalar-valued function of one real variable, then the elements of $\mathrm{graph}(f)$ are points in the plane and so this concept corresponds to the graph in the plane that you are familiar with drawing.

When f is a function of two variables then
\[ \mathrm{graph}(f) = \{(x, y, f(x, y)) \mid (x, y) \in D\} \]
This defines a surface S in $\mathbb{R}^3$. We sometimes write
\[ S : z = f(x, y), \quad \text{for } (x, y) \in D, \]
which can be regarded as a parametric equation for S. So if we think of $\mathbb{R}^2$ as the plane in $\mathbb{R}^3$ defined by $z = 0$, then $z = f(x, y)$ gives the distance of the surface above or below the plane at the point $(x, y)$, just as $y = f(x)$ gives the distance of a curve above or below the x-axis for functions of one variable.

Figure 5.2 shows the surface defined by the graph of the function $f(x, y) = y^2 - x^2$.

Figure 5.2: The surface defined by $f(x, y) = y^2 - x^2$.

Example 5.3. 1. Let
\[ f : \mathbb{R}^2 \to \mathbb{R}, \quad (x, y) \mapsto x^2 + y^2 \]
Then
\[ \mathrm{graph}(f) = \{(x, y, x^2 + y^2) \mid (x, y) \in \mathbb{R}^2\} \]
and defines a surface that is called a paraboloid and is shown in Figure 5.3. The domain of f is $\mathbb{R}^2$ and the range is $[0, \infty)$.

Figure 5.3: The surface defined by $f(x, y) = x^2 + y^2$.

2. Let $D = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \leq 1\}$ and define
\[ f : D \to \mathbb{R}, \quad (x, y) \mapsto \sqrt{1 - x^2 - y^2} \]
The graph of f is the upper hemisphere in $\mathbb{R}^3$ of radius 1 and centre $(0, 0, 0)$ as displayed in Figure 5.4. The range of f is the interval $[0, 1]$.

Figure 5.4: The surface defined by $f(x, y) = \sqrt{1 - x^2 - y^2}$.
5.2.1 Level Curves

Another way of visually representing functions of two variables is by using level curves.

Suppose $f : D \to \mathbb{R}$ is a function of two variables. Then the set of points $(x, y)$ in D satisfying the equation
\[ f(x, y) = k \]
where k is some real constant defines a curve in D. It is called a level curve of f because it consists of all the points in D for which the corresponding points on the surface defined by f are at the same level, that is, at height $|k|$ above or below the plane. If k is not in the range of f, then the level curve does not exist.

Examples of level curves with which you might be familiar are:
1. contour lines on maps, which are lines of constant height above sea-level, and
2. the isobars on a weather chart, which are lines of constant atmospheric pressure.

Level curves are a way of giving visual information about a surface in $\mathbb{R}^3$ in a 2-dimensional form.

Example 5.4. 1. Let $f(x, y) = x^2 + y^2$. If $k > 0$, the level curve defined by $f(x, y) = k$ is the circle $x^2 + y^2 = k$. If $k = 0$ then the level curve is the single point $\{\mathbf{0}\}$ and if $k < 0$, then the level curve does not exist. Figure 5.5 displays some of the level curves.

Figure 5.5: Some level curves for $f(x, y) = x^2 + y^2$.

2. Let $f(x, y) = 2x^2 + 3y^2$. If $k > 0$, the level curve defined by $f(x, y) = k$ is the ellipse $2x^2 + 3y^2 = k$. If $k = 0$ the level curve is the single point $\{\mathbf{0}\}$ and if $k < 0$, then the level curve does not exist. Some of the level curves are displayed in Figure 5.6.

Figure 5.6: Some level curves for $f(x, y) = 2x^2 + 3y^2$.
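Level curves like those in Figures 5.5 and 5.6 are easy to produce with a contour plot. A minimal sketch (our own illustration, assuming matplotlib is available):

```python
import numpy as np
import matplotlib.pyplot as plt

# Level curves of f(x, y) = x^2 + y^2 (Example 5.4(1)): circles of radius sqrt(k).
x = np.linspace(-2, 2, 200)
y = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

cs = plt.contour(X, Y, Z, levels=[0.5, 1, 2, 3])   # each value of k gives one curve
plt.clabel(cs)
plt.gca().set_aspect("equal")
plt.title("Level curves of $x^2 + y^2$")
plt.show()
```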
5.3 Functions of three or more variables

Functions of three or more variables may also be defined but it is not possible to visualise their graphs as curves or surfaces. For example, if D is a subset of $\mathbb{R}^3$, a function of three variables defined in D is a rule that assigns to each $(x, y, z) \in D$ a real number $f(x, y, z)$. In a similar way, given a subset D of $\mathbb{R}^n$, we can consider functions $f(x_1, x_2, \ldots, x_n)$ defined for $(x_1, x_2, \ldots, x_n) \in D$.
For a general function f of n variables,
\[ \mathrm{graph}(f) = \{(x_1, x_2, \ldots, x_n, f(x_1, x_2, \ldots, x_n)) \mid (x_1, x_2, \ldots, x_n) \in D\} \]
(this is an n-dimensional surface in $\mathbb{R}^{n+1}$, which is a concept which makes algebraic but not visual sense). For any $k \in \mathbb{R}$ the set
\[ S = \{x \in D \mid f(x) = k\} \]
is called a level surface of f (level curve when $n = 2$).
Example 5.5. 1. $f(x, y, z) = ax + by + cz = c \cdot x$ is a linear function of three variables for each constant vector $c = (a, b, c)$. Here $x = (x, y, z) \in \mathbb{R}^3$. If $c \neq \mathbf{0}$, then for each real number k the equation $f(x) = k$ determines a plane in $\mathbb{R}^3$ with normal vector c. Changing k gives different planes parallel to each other. That is, the level surfaces defined by f are all planes perpendicular to c.

2. $f(x) = c \cdot x = c_1x_1 + \ldots + c_nx_n$. Then (see part (1) above) the level surfaces $\{x \mid f(x) = k\}$ are mutually parallel $(n-1)$-dimensional subspaces.

3. $f(x, y, z) = x^2 + y^2 + z^2$. Then, for each $k > 0$, the level surface defined by $f(x, y, z) = k$ is a sphere with radius $\sqrt{k}$ and centre $\mathbf{0}$.
5.4 Summary

Summing up, we have discussed three types of functions:

1. Scalar-valued functions of one real variable, that is, functions of the form
\[ f : A \to \mathbb{R} \]
for some subset A of $\mathbb{R}$.

2. Vector functions, that is, functions of the form
\[ f : A \to \mathbb{R}^n \]
for some subset A of $\mathbb{R}$ with $n \geq 2$.

3. Functions of several variables, that is, functions of the form
\[ f : A \to \mathbb{R} \]
for some subset A of $\mathbb{R}^n$ with $n \geq 2$.

High school calculus considered functions of type (1). In these notes we will briefly refresh this and study the calculus of functions of types (2) and (3).
We can of course have even more general functions, that is, functions of the form
\[ f : A \to \mathbb{R}^m \]
for some subset A of $\mathbb{R}^n$ with $n, m \geq 2$. Such functions are sometimes called vector fields or vector transformations.

We have already seen linear versions of such functions in Chapter 4: an $m \times n$ matrix B defines the function
\[ f : \mathbb{R}^n \to \mathbb{R}^m, \quad x \mapsto Bx \]
6
Limits and continuity
6.1 Scalar-valued functions of one variable

An important concept for the calculus of scalar-valued functions of one variable is the notion of a limit. This should mostly be familiar to you from school. Recall that an open interval in $\mathbb{R}$ is a set denoted by $(a, b)$ and consisting of all real numbers x such that $a < x < b$. A closed interval $[a, b]$ consists of all real numbers x such that $a \leq x \leq b$.

Definition 6.1. (Intuitive definition of a limit)
Let $f(x)$ be defined on an open interval containing a, except possibly at a, and let $L \in \mathbb{R}$. We say that f converges to L as x approaches a if we can make the values of $f(x)$ arbitrarily close to L by taking x to be sufficiently close to a, but not equal to a. We write $\lim_{x \to a} f(x) = L$ or $f(x) \to L$ as $x \to a$.

Essentially what the definition is saying is that as x gets closer and closer to a, the value $f(x)$ gets closer and closer to L.
Sometimes we can calculate the limit of $f(x)$ as x approaches a by simply evaluating $f(a)$. For example, if $f(x) = x^2$ then
\[ \lim_{x \to 1} f(x) = f(1) = 1. \]
However, it may be the case that $\lim_{x \to a} f(x) \neq f(a)$. For example, consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by
\[ f(x) = \begin{cases} x & \text{if } x \neq 0 \\ 2 & \text{if } x = 0 \end{cases} \tag{6.1} \]
The graph of this function is given in Figure 6.1. Then we clearly have that $\lim_{x \to 0} f(x) = 0$ while $f(0) = 2$.
It is also possible to define one-sided limits. We write
\[ \lim_{x \to a^-} f(x) = L \]
if $f(x)$ gets arbitrarily close to L as x gets sufficiently close to a with $x < a$. This definition only requires f to be defined on an open interval $(b, a)$ for some $b < a$. Similarly, $\lim_{x \to a^+} f(x) = L$ if $f(x)$ gets arbitrarily close to L as x gets sufficiently close to a with $x > a$.

Figure 6.1: The graph of the function f defined in (6.1).

Note that $\lim_{x \to a} f(x)$ exists if and only if
\[ \lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) \]
Example 6.2. 1. For any constant $\alpha > 0$, $\lim_{x \to 0} x^{\alpha} = 0$.

2. $\lim_{x \to 0} \dfrac{\sin x}{x} = 1$.

Figure 6.2: The graph of the function $y = \sin(x)/x$ for $x \in [-5, 5]$.

3. Consider the function
\[ f(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases} \]
Then $\lim_{x \to 0} f(x)$ does not exist. However, $\lim_{x \to 0^-} f(x) = 0$ and $\lim_{x \to 0^+} f(x) = 1$.
To make it easier to calculate limits we have the following rules
that allow us to combine limits of known functions to determine
limits of new functions.
Theorem 6.3 (Limit Laws). Let I be an open interval, $a \in I$, and let $f(x)$ and $g(x)$ be defined for $x \in I \setminus \{a\}$ such that $\lim_{x \to a} f(x)$ and $\lim_{x \to a} g(x)$ exist. Then:

1.
\[ \lim_{x \to a} (f(x) + g(x)) = \lim_{x \to a} f(x) + \lim_{x \to a} g(x) \]
\[ \lim_{x \to a} (f(x) - g(x)) = \lim_{x \to a} f(x) - \lim_{x \to a} g(x) \]
2. For any constant $c \in \mathbb{R}$,
\[ \lim_{x \to a} (c f(x)) = c \lim_{x \to a} f(x) \]
3.
\[ \lim_{x \to a} (f(x) \cdot g(x)) = \Big(\lim_{x \to a} f(x)\Big) \cdot \Big(\lim_{x \to a} g(x)\Big) \]
4. If $\lim_{x \to a} g(x) \neq 0$, then
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)} \]
for suitably dened functions.
The intuitive denition of a limit can be easily extended to the
notion of the limit of f (x) as x approaches innity. In particular
we say that f (x) tends to L as x if we can make f (x) arbi-
trarily close to L by taking x sufciently large. We denote this by
lim
x
f (x) = L. We can dene lim
x
f (x) in a similar fashion.
The statements in Theorem 6.3 still hold if we replace x a by
x or x .
Example 6.4. 1. For any constant $\alpha > 0$, $\lim_{x \to \infty} \dfrac{1}{x^{\alpha}} = 0$.

2. Evaluate $\lim_{x \to \infty} \dfrac{x^2 + 2x + 1}{2x^2 + x + 7}$.

Solution: We have
\[
\begin{aligned}
\lim_{x \to \infty} \frac{x^2 + 2x + 1}{2x^2 + x + 7}
&= \lim_{x \to \infty} \frac{x^2\left(1 + \frac{2}{x} + \frac{1}{x^2}\right)}{x^2\left(2 + \frac{1}{x} + \frac{7}{x^2}\right)} \\
&= \lim_{x \to \infty} \frac{1 + \frac{2}{x} + \frac{1}{x^2}}{2 + \frac{1}{x} + \frac{7}{x^2}} \\
&= \frac{\lim_{x \to \infty} 1 + \lim_{x \to \infty} \frac{2}{x} + \lim_{x \to \infty} \frac{1}{x^2}}{\lim_{x \to \infty} 2 + \lim_{x \to \infty} \frac{1}{x} + \lim_{x \to \infty} \frac{7}{x^2}} && \text{by Theorem 6.3} \\
&= \frac{1 + 0 + 0}{2 + 0 + 0} = \frac{1}{2}
\end{aligned}
\]
Another useful theorem for calculating limits is the following.

Theorem 6.5 (The Squeeze Theorem, or The Sandwich Theorem). Let I be an open interval, $a \in I$, and let $f(x)$, $g(x)$ and $h(x)$ be defined for $x \in I \setminus \{a\}$ with $g(x) \leq f(x) \leq h(x)$ for all $x \in I \setminus \{a\}$. If
\[ \lim_{x \to a} g(x) = \lim_{x \to a} h(x) = L \]
then $\lim_{x \to a} f(x) = L$.
The Squeeze Theorem still holds if $x \to a$ is replaced by any of $x \to a^-$, $x \to a^+$, $x \to \infty$ or $x \to -\infty$ with the appropriate changes to the statement about the domain of the function.

Example 6.6. Determine $\lim_{x \to \infty} \dfrac{\sin x}{x}$.

We have
\[ -\frac{1}{x} \leq \frac{\sin x}{x} \leq \frac{1}{x} \]
for any $x > 0$, and $\lim_{x \to \infty} \left(-\frac{1}{x}\right) = \lim_{x \to \infty} \frac{1}{x} = 0$. So by the Squeeze Theorem, $\lim_{x \to \infty} \dfrac{\sin x}{x} = 0$.
We saw in Example 6.2(3) one way in which a limit may not exist. Another way for which a limit may fail to exist is if $f(x)$ diverges to $\infty$.

Definition 6.7. (Diverging to $\infty$)
Let $f(x)$ be defined on an open interval $(b, a)$. We say that f diverges to $\infty$ as $x \to a$ from below if $f(x)$ gets arbitrarily large as x gets sufficiently close to a with $x < a$. We denote this by $\lim_{x \to a^-} f(x) = \infty$.

In a similar way one defines $\lim_{x \to a^-} f(x) = -\infty$, $\lim_{x \to a^+} f(x) = \infty$ and $\lim_{x \to a^+} f(x) = -\infty$.

Note that by writing
\[ \lim_{x \to a^-} f(x) = \infty \]
we are not saying that the limit exists; we are simply using the notation to denote that f diverges to infinity from below.
Finally, if $\lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = \infty$, we write $\lim_{x \to a} f(x) = \infty$. Similarly, $\lim_{x \to a} f(x) = -\infty$ means that $\lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = -\infty$.

Example 6.8. For $f(x) = \dfrac{1}{x - 1}$, $x \neq 1$, we have
\[ \lim_{x \to 1^-} f(x) = -\infty, \qquad \lim_{x \to 1^+} f(x) = \infty. \]
6.1.1 The precise definition of a limit

The intuitive definition of a limit given in Definition 6.1 is a bit imprecise; what exactly do "approach" or "sufficiently close" mean? This is particularly unclear when dealing with functions of several variables: how do you approach 0? From which direction? It is also unclear when you start to deal with more complicated functions than the ones you will see in this course. Moreover, we need a more precise definition if we wish to give rigorous proofs of Theorems 6.3 and 6.5.

For these reasons, mathematicians require a more formal and precise definition. We outline such a definition in this subsection for the interested student. These ideas will be explored in tutorials.
Definition 6.9. (Limits at finite points)
Let $f(x)$ be defined on an open interval I around a, except possibly at a, and let $L \in \mathbb{R}$. We say that f converges to L as $x \to a$ if for all $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ |f(x) - L| < \varepsilon \quad \text{for all } x \in I \setminus \{a\} \text{ with } |x - a| < \delta \]
This definition needs quite a bit of unpacking. Essentially what it is saying is that no matter what $\varepsilon$ you are given, then it is possible to find a $\delta > 0$ (possibly depending on $\varepsilon$) such that if x has distance at most $\delta$ from a then $f(x)$ is at most distance $\varepsilon$ from L. The quantity $\varepsilon$ is the measure of closeness of $f(x)$ to L and $\delta$ is the measure of closeness of x to a. Another way of thinking about it is that $\delta$ is the control on the input error that you need to guarantee an output error of $\varepsilon$.

Visualising this geometrically, what we have is the following: since $|f(x) - L| < \varepsilon$ is equivalent to $L - \varepsilon < f(x) < L + \varepsilon$, then $\lim_{x \to a} f(x) = L$ means that for any $\varepsilon > 0$ we can find $\delta > 0$ small enough that the part of the graph of f corresponding to the interval $(a - \delta, a + \delta)$ is entirely in the horizontal strip $L - \varepsilon < y < L + \varepsilon$. In other words, when x approaches a the value $f(x)$ of the function approaches L.
We will demonstrate this with a simple example.

Example 6.10. Let $f(x) = 2x + 1$ and $a = 0$. Intuitively we expect the limit as x tends to 0 to be 1. More formally, suppose that $\varepsilon > 0$. To show that $\lim_{x \to 0} f(x) = 1$ we need to find a $\delta > 0$ so that if x is at distance at most $\delta$ from 0 then $f(x)$ is at distance at most $\varepsilon$ from 1. A good guess would be $\delta = \varepsilon/2$. Indeed, if $|x - 0| < \delta$ then
\[ |f(x) - 1| = |2x + 1 - 1| = |2x| = 2|x| < 2\delta = \varepsilon \]
as required.
We can also give a precise definition for limits as x approaches $\pm\infty$.

Definition 6.11. (Limits at $\infty$ and $-\infty$)
Let $f(x)$ be defined on an interval $(c, \infty)$ for some $c \in \mathbb{R}$. We say that $f(x)$ tends to L as $x \to \infty$ if for all $\varepsilon > 0$ there exists $b > c$ such that $|f(x) - L| < \varepsilon$ for all $x \geq b$. In a similar way we define $\lim_{x \to -\infty} f(x) = L$.

Geometrically, $\lim_{x \to \infty} f(x) = L$ means that for any $\varepsilon > 0$ we can find some b (possibly depending on $\varepsilon$) so that the part of the graph of f corresponding to the interval $[b, \infty)$ is entirely in the horizontal strip $L - \varepsilon < y < L + \varepsilon$. In other words, as the variable x becomes larger and larger the value $f(x)$ of the function at x approaches L.

We can also give a precise definition for f to diverge to $\infty$.

Definition 6.12. (Diverging to $\infty$)
Let $f(x)$ be defined on an interval of the form $(b, a)$ for some $b < a$. We say that f diverges to $\infty$ as $x \to a$ with $x < a$ if for any $M > 0$ there exists $\delta > 0$ such that $f(x) > M$ for all $x < a$ with $|x - a| < \delta$.

This is saying that no matter what positive number M you choose, you can always find some margin $\delta$ so that if x is at most $\delta$ away from a then $f(x)$ will be greater than M.
6.2 Limits of vector functions

We can consider limits of vector functions in a similar way to how we viewed limits for scalar functions.

Let $r(t) = (f(t), g(t), h(t))$ be a vector function defined on some open interval around a, except possibly at a. Define
\[ \lim_{t \to a} r(t) = \Big(\lim_{t \to a} f(t),\; \lim_{t \to a} g(t),\; \lim_{t \to a} h(t)\Big), \]
if the limits of the coordinate functions exist. Limits can then be calculated using the techniques for scalar-valued functions.

We can also define $\lim_{t \to a^-} r(t)$ and $\lim_{t \to a^+} r(t)$ in a similar fashion.
6.3 Limits of functions of two variables

We can also extend the notion of a limit to functions of two variables but this requires a few modifications.

First, for functions of one real variable we used the notion of an open interval of the real line. The appropriate notion in the two variable case is that of an open disc: given $a = (a_1, a_2) \in \mathbb{R}^2$ and $\delta > 0$, the open disc of radius $\delta$ and centre a is the set
\[ B(a, \delta) = \{x \in \mathbb{R}^2 \mid d(x, a) < \delta\} \]
Here
\[ d(x, a) = \sqrt{(x_1 - a_1)^2 + (x_2 - a_2)^2} \]
is the distance between the points $x = (x_1, x_2)$ and a.

Given a subset D of $\mathbb{R}^2$, we say that a is an interior point of D if D contains some open disc centred at a.

The intuitive definition of a limit would then be:
mathematical methods i 101
Definition 6.13. (Intuitive denition of a limit)
Let f (x) be dened on an open disc centred at a, except possibly at a, and
let L R. We say that f converges to L as x a if we can make the
values of f (x) arbitrarily close to L by taking x to be sufciently close to
a, but not equal to a. We write lim
xa
f (x) = L.
Example 6.14. 1. Let $f(x, y) = x$. Then
$$\lim_{(x,y) \to (a_1, a_2)} f(x, y) = a_1.$$

2. Determine $\lim_{(x,y) \to (0,0)} \dfrac{x^2 + x + xy + y}{x + y}$.

Solution: Here it helps if we factorise the numerator. We have
$$\lim_{(x,y) \to (0,0)} \frac{x^2 + x + xy + y}{x + y} = \lim_{(x,y) \to (0,0)} \frac{(x + y)(x + 1)}{x + y} = \lim_{(x,y) \to (0,0)} (x + 1) = 1.$$

For the one variable case there were only two paths along which to approach a real number $a$: from the left or from the right. For the limit to exist we require that the limit as we approach $a$ from either side is the same. However, there are infinitely many paths approaching a point $\mathbf{a} \in \mathbb{R}^2$ and they are not all straight lines! For the limit to exist the value of $f(\mathbf{x})$ must approach the same value $L$ no matter which path we choose.

More formally, for the limit to exist and to be equal to $L$ we require that for every curve $\mathbf{r}(t)$ with $\lim_{t \to t_0} \mathbf{r}(t) = \mathbf{a}$ we have
$$\lim_{t \to t_0} f(\mathbf{r}(t)) = L.$$
If the latter holds for only some curves but not all curves, then $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x})$ does not exist. This gives a simple method for showing that a limit does not exist.
Example 6.15. Let $D = \mathbb{R}^2 \setminus \{(0, 0)\}$ and define
$$f : D \to \mathbb{R}, \qquad (x, y) \mapsto \frac{xy}{x^2 + y^2}.$$
Let $\mathbf{a} = (0, 0)$.

First consider the curve $\mathbf{r}_1(t) = (t, t)$ passing through $\mathbf{a}$. Then $\lim_{t \to 0} \mathbf{r}_1(t) = (0, 0)$, so
$$\lim_{t \to 0} f(\mathbf{r}_1(t)) = \lim_{t \to 0} \frac{t \cdot t}{t^2 + t^2} = \frac{1}{2}.$$

Next consider the curve $\mathbf{r}_2(t) = (0, t)$ defined for all $t > 0$. Then $\lim_{t \to 0^+} \mathbf{r}_2(t) = \mathbf{a}$, and along this curve $f(\mathbf{r}_2(t)) = f(0, t) = 0$ and so
$$\lim_{t \to 0^+} f(\mathbf{r}_2(t)) = \lim_{t \to 0^+} 0 = 0 \neq \frac{1}{2}.$$

Since we have found two different paths with two different limits as they approach $(0, 0)$ it follows that $\lim_{(x,y) \to (0,0)} f(x, y)$ does not exist.
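The two-path argument is easy to reproduce numerically: evaluate $f$ along each path as the parameter shrinks and watch the values settle on different numbers. A small sketch of our own in Python:

```python
import numpy as np

def f(x, y):
    return x * y / (x**2 + y**2)

ts = 10.0 ** -np.arange(1, 8)              # t -> 0 through positive values
along_diagonal = f(ts, ts)                 # path r1(t) = (t, t)
along_y_axis = f(np.zeros_like(ts), ts)    # path r2(t) = (0, t)
print(along_diagonal)   # all 0.5
print(along_y_axis)     # all 0.0  ->  the limit at (0, 0) cannot exist
```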
To see what is actually happening with the function we can plot its graph, and we see in Figure 6.3 that the resulting surface seems to have some kind of pinch in it at $(0, 0)$. The curve $(\mathbf{r}_1(t), f(\mathbf{r}_1(t)))$ has gone along the top ridge while the curve $(\mathbf{r}_2(t), f(\mathbf{r}_2(t)))$ has gone along the bottom ridge.

Figure 6.3: The surface defined by $f(x, y) = \dfrac{xy}{x^2 + y^2}$.

Sometimes it is easier to find a limit, if it exists, by first changing the co-ordinate system; so we shall now consider another commonly used system of co-ordinates in $\mathbb{R}^2$.

Definition 6.16 (Polar coordinates in $\mathbb{R}^2$). Let $x = r\cos(\theta)$ and $y = r\sin(\theta)$ for $r \geq 0$ and $\theta \in [0, 2\pi)$. Then $r, \theta$ are called the polar coordinates of the point $(x, y)$. Notice that $r^2 = x^2 + y^2$, so $r = \sqrt{x^2 + y^2}$ and is the distance of the point $(x, y)$ from the origin.
Exercise 6.3.1. Find $\lim_{(x,y) \to (0,0)} \dfrac{xy}{\sqrt{x^2 + y^2}}$ if it exists.

Solution. Introduce polar coordinates: $x = r\cos(\theta)$ and $y = r\sin(\theta)$. Since $r = \sqrt{x^2 + y^2}$, we have $r \to 0$ as $(x, y) \to (0, 0)$. Now
$$\left| \frac{xy}{\sqrt{x^2 + y^2}} \right| = \frac{|r^2 \cos(\theta)\sin(\theta)|}{r} = r\,|\cos(\theta)|\,|\sin(\theta)| \leq r$$
and so
$$-r \leq \frac{xy}{\sqrt{x^2 + y^2}} \leq r.$$
Letting $(x, y) \to (0, 0)$, we have $r \to 0$, so by the Squeeze Theorem,
$$\lim_{(x,y) \to (0,0)} \frac{xy}{\sqrt{x^2 + y^2}} = 0.$$
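The same change of coordinates can be carried out symbolically: substitute $x = r\cos\theta$, $y = r\sin\theta$ and let $r \to 0$. A brief SymPy sketch of our own, echoing the bound $|f| \leq r$ used in the squeeze argument:

```python
import sympy as sp

x, y, theta = sp.symbols('x y theta', real=True)
r = sp.symbols('r', positive=True)

f = x * y / sp.sqrt(x**2 + y**2)
f_polar = sp.simplify(f.subs({x: r * sp.cos(theta), y: r * sp.sin(theta)}))
print(f_polar)                        # r*sin(2*theta)/2, equivalently r*sin(theta)*cos(theta)
print(sp.limit(f_polar, r, 0, '+'))   # 0, independently of theta
```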
We can define limits for functions of three or more variables in a similar way. We define the open ball of radius $\delta$ and centre $\mathbf{a}$ to be the set
$$B(\mathbf{a}, \delta) = \{\mathbf{x} \in \mathbb{R}^n \mid d(\mathbf{x}, \mathbf{a}) < \delta\}$$
where $d(\mathbf{x}, \mathbf{a})$ is the distance between $\mathbf{x}$ and $\mathbf{a}$. Sometimes we refer to open discs in $\mathbb{R}^2$ and open intervals in $\mathbb{R}$ as open balls so that we do not need to interchange terminology for each case. Then Definition 6.13 extends to the general setting by replacing "open disc" with "open ball".

We end this section by giving the precise definition of a limit for the interested student.

Definition 6.17 (Precise definition of limit). Let $f(\mathbf{x})$ be defined on an open ball around $\mathbf{a}$, except possibly at $\mathbf{a}$, and let $L \in \mathbb{R}$. We write
$$L = \lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x})$$
and say that the limit of $f(\mathbf{x})$ as $\mathbf{x}$ approaches $\mathbf{a}$ is equal to $L$ if for every $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(\mathbf{x}) - L| < \varepsilon$ holds for each $\mathbf{x} \in B(\mathbf{a}, \delta)$ with $\mathbf{x} \neq \mathbf{a}$ (that is, for each $\mathbf{x}$ with $0 < d(\mathbf{x}, \mathbf{a}) < \delta$).
6.4 Continuity of Functions
Continuity for scalar-valued functions of one variable essentially means that the graph of the function has no gaps in it, that is, it can be drawn without taking your pen off the page. We make this definition more precise and state it in a manner that also applies to vector functions and functions of several variables as follows:

Definition 6.18 (Continuity). Let $\mathbf{a} \in \mathbb{R}^m$ for some $m \geq 1$ and let $D \subseteq \mathbb{R}^m$ be such that $D$ contains an open ball centred at $\mathbf{a}$. Let $f : D \to \mathbb{R}^n$ for some $n \geq 1$. We say that $f$ is continuous at $\mathbf{a}$ if
$$\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = f(\mathbf{a}).$$
We say that $f$ is continuous if it is continuous at all points in its domain.

It may be the case that $f$ is not defined on an open ball around each point in its domain. For example, if $A = [0, \infty)$ and $f : A \to \mathbb{R}$ is defined by $f(x) = \sqrt{x}$, then the point $0$ is in the domain of $f$ but $f$ is not defined on an open interval around $0$. In this case, to say that $f$ is continuous at $0$ means that $\lim_{x \to 0^+} f(x) = f(0)$. We can similarly define $f$ to be continuous at the right end point of an interval.

One consequence of continuity is that a continuous function cannot skip values; that is, small changes in the variable result in small changes in the value of the function.

It is easy to see that if $\mathbf{r}(t)$ is a vector-valued function then it is continuous at $t_0$ if and only if each of the coordinate functions $f(t)$, $g(t)$ and $h(t)$ is continuous at $t_0$.
We give some examples of continuous functions.
Example 6.19. 1. Given real numbers $a_0, a_1, \ldots, a_n$, the polynomial
$$P(x) = a_0 x^n + a_1 x^{n-1} + \ldots + a_{n-1} x + a_n, \quad x \in \mathbb{R},$$
is a continuous function, and therefore every rational function
$$R(x) = \frac{P(x)}{Q(x)},$$
where $P$ and $Q$ are polynomials, is continuous on its domain.

2. $\sin(x)$ and $\cos(x)$ are continuous on $\mathbb{R}$.

3. $e^x$ is continuous on $\mathbb{R}$, while $\ln(x)$ is continuous on its domain, which is $(0, \infty)$.

4. For any constant $\alpha \in \mathbb{R}$, the function $f(x) = x^\alpha$ is continuous for all $x > 0$.

5. The function $\mathbf{r}(t) = (t^3, \ln(3 - t), \sqrt{t})$ with domain $[0, 3)$ is continuous since each of its coordinate functions is continuous on this interval.

6. It follows from part (1) that $f(x, y) = x$ is a continuous function on $\mathbb{R}^2$. Similarly, $g(x, y) = y$ is continuous.

7. The function $f$ defined in Equation (6.1) is not continuous at $x = 0$.
The next result tells us various ways of combining continuous
functions to get new continuous functions.
Theorem 6.20. Let $n \geq 1$, $D \subseteq \mathbb{R}^n$ and let $f, g : D \to \mathbb{R}$ be continuous at $\mathbf{c} \in D$. Then:

1. The functions $a f(\mathbf{x}) + b g(\mathbf{x})$ (for $a, b \in \mathbb{R}$) and $f(\mathbf{x}) g(\mathbf{x})$ are continuous at $\mathbf{c}$.

2. If $g(\mathbf{c}) \neq 0$, then $\dfrac{f(\mathbf{x})}{g(\mathbf{x})}$ is continuous at $\mathbf{c}$.

3. If $h(t)$ is a function of a single variable $t \in \mathbb{R}$, defined on the range of $f$ and continuous at $f(\mathbf{c})$, then $h(f(\mathbf{x}))$ is continuous at $\mathbf{c}$.

4. If $t \in I \subseteq \mathbb{R}$, and $\mathbf{r}(t)$ is a vector function such that $\mathbf{r}(t) \in D$ for $t \in I$ and $\mathbf{r}(t_0) = \mathbf{c}$, and if $\mathbf{r}$ is continuous at $t_0$, then the function $f(\mathbf{r}(t))$ is continuous at $t_0$.
Example 6.21. Since $f(x, y) = x$ and $g(x, y) = y$ are continuous functions (Example 6.19(6)), Theorem 6.20(1) implies that $h(x, y) = x^p y^q$ is continuous for all integers $p, q \geq 0$. Using Theorem 6.20(1) again, one gets that every polynomial
$$P(x, y) = a_1 x^{p_1} y^{q_1} + a_2 x^{p_2} y^{q_2} + \ldots + a_k x^{p_k} y^{q_k}$$
is continuous. Hence by Theorem 6.20(2) rational functions $R(x, y) = \dfrac{P(x, y)}{Q(x, y)}$, where $P$ and $Q$ are polynomials, are continuous on their domains, that is, everywhere except where $Q(x, y) = 0$.

Then, applying Theorem 6.20(3), it follows that $e^{R(x,y)}$, $\sin(R(x, y))$ and $\cos(R(x, y))$ are continuous functions (on their domains).

We note that if we know that a function $f$ is continuous at a point $\mathbf{a}$ then it is easy to calculate $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x})$: it is simply $f(\mathbf{a})$.
Example 6.22. Determine $\lim_{(x,y) \to (0,0)} \dfrac{x^2 + y^2}{2x + 2}$.

Solution: Note that the denominator is nonzero at $(x, y) = (0, 0)$. Thus, since both the numerator and the denominator are polynomials, Theorem 6.20(2) implies that the function $f(x, y) = \dfrac{x^2 + y^2}{2x + 2}$ is continuous at $(0, 0)$. Thus
$$\lim_{(x,y) \to (0,0)} \frac{x^2 + y^2}{2x + 2} = \frac{0^2 + 0^2}{2(0) + 2} = 0.$$
7 Differentiation

You should be familiar with differentiating functions of one variable from high school. We will first revise this and then move on to the appropriate analogues for vector functions and functions of several variables.
7.1 Derivatives of functions of one variable
Definition 7.1 (Derivative). Let $f(x)$ be a function defined on some interval $I$ and let $a$ be an interior point of $I$. We say that $f$ is differentiable at $a$ if the limit
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$
exists, and we define $f'(a)$ to be the value of the limit. Then $f'(a)$ is called the (first) derivative of $f$ at $a$.

In a similar way one defines the left derivative
$$f'(a^-) = \lim_{x \to a^-} \frac{f(x) - f(a)}{x - a}$$
of $f$ at $a$, and the right derivative
$$f'(a^+) = \lim_{x \to a^+} \frac{f(x) - f(a)}{x - a}$$
of $f$ at $a$, whenever the corresponding limits exist.

Letting $x = a + h$ we have that $x \to a$ is the same as $h \to 0$, and so an equivalent definition of the derivative is that
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
Recall that another notation for the derivative of $f$ at $a$ is $\dfrac{df}{dx}(a)$.

Notice that the left and right derivatives can be considered not only for interior points $a$ of $I$ but also when $a$ is a right or left endpoint of $I$ respectively.
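The $h \to 0$ form of the definition also suggests a quick numerical check: the difference quotient should approach the derivative as $h$ shrinks. A small illustrative sketch of our own (using $f(x) = \sin x$ at $a = 1$, where $f'(1) = \cos 1$):

```python
import math

def difference_quotient(f, a, h):
    return (f(a + h) - f(a)) / h

a = 1.0
for h in [1e-1, 1e-3, 1e-5]:
    approx = difference_quotient(math.sin, a, h)
    print(h, approx, abs(approx - math.cos(a)))  # the error shrinks as h does
```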
We emphasise the following key concept as it will guide our later definitions of the derivative of other types of functions.

Key Concept 7.2. The derivative of $f$ at $a$ is the slope of the tangent line to the curve defined by $y = f(x)$ at the point $(a, f(a))$. Moreover, close to $x = a$ the tangent line is the best approximation to the curve by a straight line.

You should be familiar with finding the derivative of various functions from high school, such as polynomials, trigonometric functions, exponential functions and logarithmic functions. Moreover, for a function defined in a piecewise manner you often need to use the formal definition to determine the derivative. You should also be familiar with the following results.
Theorem 7.3. Let $f(x)$ and $g(x)$ be differentiable on an interval $I$. Then:

1. $[f(x) + g(x)]' = f'(x) + g'(x)$ and $[f(x) - g(x)]' = f'(x) - g'(x)$.

2. For any constant $c \in \mathbb{R}$, $[c f(x)]' = c f'(x)$.

3. (Product Rule) $[f(x) g(x)]' = f'(x) g(x) + f(x) g'(x)$.

4. (Quotient Rule) If $g(x) \neq 0$ on $I$,
$$\left( \frac{f(x)}{g(x)} \right)' = \frac{f'(x) g(x) - f(x) g'(x)}{(g(x))^2}.$$

5. (Chain Rule) Assume that $h(x)$ is a differentiable function on some interval $J$ and $h(x) \in I$ for all $x \in J$. Then the function $f(h(x))$ is differentiable on $J$ and
$$\frac{d}{dx}[f(h(x))] = f'(h(x))\, h'(x).$$

Theorem 7.4. If $f$ is differentiable at some $a \in I$, then $f$ is continuous at $a$.
Remark 7.5. The converse statement of Theorem 7.4 is not true, that is, not every continuous function is differentiable. For example, $f(x) = |x|$ is continuous everywhere on $\mathbb{R}$. However
$$f'(0^-) = \lim_{x \to 0^-} \frac{|x| - 0}{x - 0} = \lim_{x \to 0^-} \frac{-x}{x} = -1,$$
while
$$f'(0^+) = \lim_{x \to 0^+} \frac{|x| - 0}{x - 0} = \lim_{x \to 0^+} \frac{x}{x} = 1 \neq f'(0^-),$$
so $f$ is not differentiable at $0$.

(There are continuous functions on $\mathbb{R}$ which are not differentiable at any point, but such examples are beyond the scope of this unit.)
Note that even if $f$ is differentiable at a point $a$ it is not necessarily true that $f'(a) = \lim_{x \to a} f'(x)$. This happens to be true only if $f'(x)$ is continuous at $a$, and this is not necessarily the case. We say that $f$ is continuously differentiable on an interval $I$ if $f'(x)$ exists and is continuous everywhere in $I$.

Definition 7.6 ($n$th derivative). If $f$ is differentiable at any $x \in I$, we get a function $x \mapsto f'(x)$ defined on $I$ which is called the (first) derivative of $f$ on $I$. If $f'(x)$ is also differentiable on $I$ we define the second derivative of $f$ on $I$ by $f''(x) = (f')'(x)$ for any $x \in I$. Repeating this process we can define the $n$th derivative by $f^{(n)}(x) = (f^{(n-1)})'(x)$ for all $x \in I$.
7.2 Differentiation of vector functions
We can define the derivative of a vector function in an analogous way to the derivative of a scalar function of one variable.

Definition 7.7 (Derivative of a vector function). Let $\mathbf{r}(t)$ be a vector-valued function defined on an interval $I$ containing the point $t_0$. Define
$$\mathbf{r}'(t_0) = \lim_{s \to 0} \frac{\mathbf{r}(t_0 + s) - \mathbf{r}(t_0)}{s},$$
if the limit exists. In this case $\mathbf{r}'(t_0)$ is called the derivative of $\mathbf{r}$ at $t_0$, and $\mathbf{r}$ is called differentiable at $t_0$.
The vector function $\mathbf{r}(t)$ is called continuously differentiable on $I$ if $\mathbf{r}'(t)$ exists and is continuous everywhere in $I$. Just as for scalar functions of one variable, it is possible for a function to be differentiable on an interval but for the derivative not to be continuous.

The following result shows us that to differentiate a vector function we simply need to differentiate each of the coordinate functions. We state it for vector functions whose image lies in $\mathbb{R}^2$ but it clearly extends to any $\mathbb{R}^n$.

Theorem 7.8. The vector function $\mathbf{r}(t) = (f(t), g(t))$ is differentiable at $t_0$ if and only if the coordinate functions $f$ and $g$ are differentiable at $t_0$. When $\mathbf{r}(t)$ is differentiable at $t_0$, its derivative is given by
$$\mathbf{r}'(t_0) = (f'(t_0), g'(t_0)).$$
Proof. We have
$$\begin{aligned}
\mathbf{r}'(t_0) &= \lim_{s \to 0} \frac{\mathbf{r}(t_0 + s) - \mathbf{r}(t_0)}{s} \\
&= \lim_{s \to 0} \frac{1}{s}\left( f(t_0 + s) - f(t_0),\ g(t_0 + s) - g(t_0) \right) \\
&= \left( \lim_{s \to 0} \frac{f(t_0 + s) - f(t_0)}{s},\ \lim_{s \to 0} \frac{g(t_0 + s) - g(t_0)}{s} \right) \\
&= (f'(t_0), g'(t_0)).
\end{aligned}$$
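This componentwise rule is exactly how a computer algebra system differentiates a vector function. A brief SymPy sketch of our own, using an example curve of our choosing:

```python
import sympy as sp

t = sp.symbols('t')
r = sp.Matrix([sp.cos(t), sp.sin(t), t])   # a helix, differentiated one coordinate at a time
r_prime = r.diff(t)
print(r_prime.T)               # Matrix([[-sin(t), cos(t), 1]])
print(r_prime.subs(t, 0).T)    # the tangent vector at t = 0
```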
Recall that given a vector $\mathbf{v} = (x, y) \in \mathbb{R}^2$, the length or magnitude of $\mathbf{v}$ is denoted by $|\mathbf{v}|$ and is given by $\sqrt{x^2 + y^2}$.

What is the geometric interpretation of the derivative of a vector function? Recall that the vector function $\mathbf{r} : I \to \mathbb{R}^2$ describes a curve $C$ in $\mathbb{R}^2$. If $t_0 \in I$ is such that $\mathbf{r}'(t_0)$ exists and is non-zero, then $\mathbf{r}'(t_0)$ is a tangent vector to the curve $C$ at $\mathbf{r}(t_0)$. It will sometimes be useful to use the corresponding unit tangent vector given by
$$\mathbf{T}(t_0) = \frac{\mathbf{r}'(t_0)}{|\mathbf{r}'(t_0)|}.$$

The line $\ell$ passing through the point $\mathbf{r}(t_0)$ and parallel to the vector $\mathbf{r}'(t_0)$ is called the tangent line to the curve $C$ at $\mathbf{r}(t_0)$. The tangent line is called the (affine) linear approximation of $C$ at $\mathbf{r}(t_0)$. There is a unique linear parameterisation of $\ell$ by a vector function $\mathbf{v}(t)$ such that $\mathbf{v}(t_0) = \mathbf{r}(t_0)$ and $\mathbf{v}'(t_0) = \mathbf{r}'(t_0)$, namely
$$\mathbf{v}(t) = \mathbf{r}(t_0) + (t - t_0)\,\mathbf{r}'(t_0).$$
(However there are many other linear parameterisations of $\ell$.) Near $\mathbf{r}(t_0)$, it is the best approximation of $C$ by a line.

We have a similar geometric interpretation for functions $\mathbf{r} : I \to \mathbb{R}^3$.
The derivative of a vector function $\mathbf{r}(t)$ encodes more than just the slope of the tangent to the curve traced out by $\mathbf{r}(t)$. For example, consider the two vector functions $\mathbf{r}_1(t) = (\cos(t), \sin(t))$ and $\mathbf{r}_2(t) = (\cos(-2t), \sin(-2t))$. Both functions trace out the same curve in $\mathbb{R}^2$, namely the unit circle. Now $\mathbf{r}_1'(t) = (-\sin(t), \cos(t))$ while $\mathbf{r}_2'(t) = (-2\sin(2t), -2\cos(2t))$. The first is a vector of magnitude $1$ while the second has magnitude $2$ and at time $t = 0$ points in the opposite direction to the first. This reflects the fact that the first function describes an anticlockwise trajectory around the circle while the second describes a clockwise trajectory that is travelling at twice the speed.

In particular, for a particle with position vector at time $t$ given by $\mathbf{r}(t)$, the velocity vector is given by $\mathbf{r}'(t)$ and the speed is given by $|\mathbf{r}'(t)|$. The velocity vector describes the rate of change of position (and so has a direction) while the speed is the rate of change of distance travelled (and so is only a scalar). The acceleration vector is given by $\mathbf{r}''(t)$.
Figure 7.1: Tangent line $\ell$ of a curve $C$ at $\mathbf{r}(t_0)$.
Example 7.9. 1. Sketch the curve $C$ defined by
$$\mathbf{r}(t) = (\cos(t), \sin(t), t) \quad \text{for } t \in \mathbb{R},$$
find the tangent vector $\mathbf{r}'(t)$ and write down the parametric equations of the tangent line to $C$ at $\mathbf{r}(t_0)$. (This curve is called a helix.)

Solution. The curve $C$ lies on the cylinder $x^2 + y^2 = 1$. The orthogonal projection of $\mathbf{r}(t)$ onto the $xy$-plane is $\mathbf{v}(t) = (\cos(t), \sin(t))$, while $z = t$ is the height of the point $\mathbf{r}(t)$. The curve $C$ is shown in Figure 7.2.

Figure 7.2: The curve defined by $\mathbf{r}(t) = (\cos(t), \sin(t), t)$.

We have
$$\mathbf{r}'(t) = (-\sin(t), \cos(t), 1).$$
Given $t_0$, the tangent line $\ell$ to $C$ at $\mathbf{r}(t_0)$ is the line through $\mathbf{r}(t_0) = (\cos(t_0), \sin(t_0), t_0)$ parallel to $\mathbf{r}'(t_0) = (-\sin(t_0), \cos(t_0), 1)$. Thus the parametric equations of $\ell$ are:
$$\ell : \quad x = \cos(t_0) - (t - t_0)\sin(t_0), \quad y = \sin(t_0) + (t - t_0)\cos(t_0), \quad z = t, \quad t \in \mathbb{R}.$$

2. Let $\mathbf{r}(t) = (t, f(t))$, where $f(t)$ is a differentiable function. Then the curve defined by
$$x = t, \quad y = f(t)$$
coincides with the graph of the function $f$, and $\mathbf{r}'(t) = (1, f'(t))$, which is a vector in the direction of the tangent to the graph.
Just as with scalar functions of one variable, we need some rules for finding the derivatives of combinations of vector functions. So we have the following.

Theorem 7.10. Let $\mathbf{u}(t)$ and $\mathbf{v}(t)$ be differentiable vector functions (both with values in $\mathbb{R}^n$) and let $f(t)$ be a differentiable real-valued function. Then:

1. $[\mathbf{u}(t) + \mathbf{v}(t)]' = \mathbf{u}'(t) + \mathbf{v}'(t)$;

2. $[c\,\mathbf{u}(t)]' = c\,\mathbf{u}'(t)$ for any constant $c \in \mathbb{R}$;

3. $[f(t)\,\mathbf{u}(t)]' = f'(t)\,\mathbf{u}(t) + f(t)\,\mathbf{u}'(t)$;

4. $[\mathbf{u}(t) \cdot \mathbf{v}(t)]' = \mathbf{u}'(t) \cdot \mathbf{v}(t) + \mathbf{u}(t) \cdot \mathbf{v}'(t)$;

5. (for vector functions in $\mathbb{R}^3$) $[\mathbf{u}(t) \times \mathbf{v}(t)]' = \mathbf{u}'(t) \times \mathbf{v}(t) + \mathbf{u}(t) \times \mathbf{v}'(t)$;

6. (Chain Rule) $[\mathbf{u}(f(t))]' = f'(t)\,\mathbf{u}'(f(t))$.
Proof. We will only prove parts (3) and (6), and only in the case of $\mathbb{R}^2$. Let $\mathbf{u}(t) = (u_1(t), u_2(t))$.

(3) Using the Product Rule for real-valued functions we have
$$\begin{aligned}
[f(t)\,\mathbf{u}(t)]' &= (f(t)u_1(t), f(t)u_2(t))' = ((f(t)u_1(t))', (f(t)u_2(t))') \\
&= (f'(t)u_1(t) + f(t)u_1'(t),\ f'(t)u_2(t) + f(t)u_2'(t)) \\
&= f'(t)(u_1(t), u_2(t)) + f(t)(u_1'(t), u_2'(t)) \\
&= f'(t)\,\mathbf{u}(t) + f(t)\,\mathbf{u}'(t).
\end{aligned}$$

(6) Using the Chain Rule for real-valued functions, we get
$$\begin{aligned}
[\mathbf{u}(f(t))]' &= (u_1(f(t)), u_2(f(t)))' = ([u_1(f(t))]', [u_2(f(t))]') \\
&= (f'(t)u_1'(f(t)),\ f'(t)u_2'(f(t))) = f'(t)\,\mathbf{u}'(f(t)).
\end{aligned}$$
Example 7.11. Show that if the vector function $\mathbf{r}(t)$ is continuously differentiable for $t$ in an interval $I$ and $|\mathbf{r}(t)| = c$, a constant, for all $t \in I$, then $\mathbf{r}'(t) \perp \mathbf{r}(t)$ for all $t \in I$.

Question: What would the curve described by $\mathbf{r}(t)$ look like?

Solution. Note that $|\mathbf{r}(t)|^2 = \mathbf{r}(t) \cdot \mathbf{r}(t)$. Then differentiating
$$\mathbf{r}(t) \cdot \mathbf{r}(t) = c^2,$$
by using Theorem 7.10(4), we get
$$\mathbf{r}'(t) \cdot \mathbf{r}(t) + \mathbf{r}(t) \cdot \mathbf{r}'(t) = 0.$$
That is, $2\,\mathbf{r}'(t) \cdot \mathbf{r}(t) = 0$. Hence $\mathbf{r}'(t) \perp \mathbf{r}(t)$ for all $t$.

Question: So which well-known geometric fact have we just proved?
7.3 Partial Derivatives
For scalar-valued functions of one variable, the derivative gave us the rate of change of the function as the single variable changed. A function of several variables may change at different rates with respect to different variables. To accommodate this we need to introduce the notion of a partial derivative.
Definition 7.12 (Partial Derivative). Let $f(x, y)$ be a function defined on a subset $D$ of $\mathbb{R}^2$, and let $(a, b)$ be an interior point of $D$. For $x$ near $a$, define $g(x) = f(x, b)$, a function of $x$. We define $f_x(a, b) = g'(a)$, if this exists.

We call $f_x(a, b)$ the partial derivative of $f$ with respect to $x$ at $(a, b)$. From the definition of the derivative for a function of a single variable we have:
$$f_x(a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h}.$$
The partial derivative of $f$ with respect to $y$ at $(a, b)$ is defined similarly:
$$f_y(a, b) = \lim_{h \to 0} \frac{f(a, b + h) - f(a, b)}{h}.$$

For $f_x(a, b)$ we have fixed the value of $y$ and let $x$ vary, so we are back to dealing with a function of just one variable, a familiar situation.

In general, for any value of $x$ and $y$:
$$f_x(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} \quad \text{and} \quad f_y(x, y) = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h},$$
whenever the limits exist. So $f_x(x, y)$ gives us the rate of change of $f$ with respect to $x$ with $y$ held constant, and $f_y(x, y)$ the rate of change of $f$ with respect to $y$ with $x$ held constant.
What is the geometrical interpretation? Imagine standing at a point on the surface given by the graph of the function $f$. The value of $f_x(x, y)$ at the point will be the slope of the tangent line to the surface that is parallel to the $xz$-plane, that is, the slope you see when looking in the direction of the $x$-axis. This is demonstrated in Figure 7.3. This slope is likely to be different from the slope of the tangent line to the surface that is parallel to the $yz$-plane, which is given by $f_y(x, y)$. See Figure 7.4.

Figure 7.3: Tangent line parallel to the $xz$-plane.
Figure 7.4: Tangent lines parallel to the $xz$-plane and $yz$-plane.

There are many different notations used to denote partial derivatives. (Note that we use $\partial/\partial x$ instead of $d/dx$ to distinguish between partial differentiation and ordinary differentiation.) Some are:
$$f_x(x, y) = f'_x(x, y) = \frac{\partial}{\partial x} f(x, y) = \frac{\partial f}{\partial x}(x, y), \qquad f_y(x, y) = f'_y(x, y) = \frac{\partial}{\partial y} f(x, y) = \frac{\partial f}{\partial y}(x, y).$$

To find the partial derivative of $f(x, y)$ with respect to $x$ we consider $y$ as a constant and proceed as if $f$ were a function of a single variable. This means that we can use the familiar rules for differentiation of functions of a single variable.
Example 7.13. Let $f(x, y) = x^2 y + y\sin(x)$. Then
$$f_x(x, y) = 2xy + y\cos(x), \qquad f_y(x, y) = x^2 + \sin(x).$$
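Partial derivatives like these are computed by holding the other variable fixed, which is exactly what `sympy.diff` does when you name the variable of differentiation. A short sketch of our own reproducing Example 7.13:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y * sp.sin(x)
print(sp.diff(f, x))   # 2*x*y + y*cos(x)   (y treated as a constant)
print(sp.diff(f, y))   # x**2 + sin(x)      (x treated as a constant)
```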
We can extend the notion of a partial derivative to define the partial derivatives $f_x$, $f_y$ and $f_z$ of a function $f(x, y, z)$ of three variables, and more generally the partial derivatives $f_{x_i}$ of a function $f(x_1, x_2, \ldots, x_n)$ of $n$ variables. To calculate the partial derivative with respect to the variable $x_i$, we consider all $x_j$ with $j \neq i$ as constants and proceed as though $f$ were a function of the single variable $x_i$. Again we can use the familiar rules for differentiation of functions of a single variable.
Example 7.14. 1. Let $f(x, y, z) = \dfrac{x + y + z}{x^2 + y^2 + z^2}$. Then
$$f_x(x, y, z) = \frac{(x^2 + y^2 + z^2) - (x + y + z)(2x)}{(x^2 + y^2 + z^2)^2} = \frac{y^2 + z^2 - x^2 - 2x(y + z)}{(x^2 + y^2 + z^2)^2}.$$
Similarly, or just using the symmetry with respect to $x$, $y$ and $z$, one gets
$$f_y(x, y, z) = \frac{x^2 + z^2 - y^2 - 2y(x + z)}{(x^2 + y^2 + z^2)^2}, \qquad f_z(x, y, z) = \frac{x^2 + y^2 - z^2 - 2z(x + y)}{(x^2 + y^2 + z^2)^2}.$$

2. Let $f(x, y, z) = e^{xyz} + y\sin(xz)$. Then
$$f_x(x, y, z) = yz\,e^{xyz} + yz\cos(xz), \quad f_y(x, y, z) = xz\,e^{xyz} + \sin(xz), \quad f_z(x, y, z) = xy\,e^{xyz} + xy\cos(xz).$$

3. Consider the linear function $f(\mathbf{x}) = \mathbf{c} \cdot \mathbf{x}$ defined for $\mathbf{x} \in \mathbb{R}^n$, where $\mathbf{c}$ is a constant vector; that is,
$$f(x_1, x_2, \ldots, x_n) = c_1 x_1 + c_2 x_2 + \ldots + c_n x_n.$$
Then $f_{x_i}(\mathbf{x}) = c_i$ for each $i$ and each $\mathbf{x} \in \mathbb{R}^n$.
7.3.1 Higher Derivatives
With functions of one variable there is only one second derivative, but as the number of variables increases the number of possible second order derivatives increases quite rapidly, as we shall see in what follows.
Definition 7.15 (Second partial derivative). Let $D$ be an open disc in $\mathbb{R}^2$ and let $f : D \to \mathbb{R}$. Suppose the partial derivative $f_x(x, y)$ exists for all $(x, y) \in D$. This defines a function $g = f_x : D \to \mathbb{R}$. Suppose this new function $g$ has a partial derivative with respect to $x$ at some $(a, b) \in D$. We denote
$$f_{xx}(a, b) = g_x(a, b) = \frac{\partial}{\partial x} f_x(a, b)$$
and call it the second partial derivative of $f$ with respect to $x$ at $(a, b)$. Similarly we define
$$f_{xy}(a, b) = g_y(a, b) = \frac{\partial}{\partial y} f_x(a, b).$$
Some alternative notation is:
$$f_{xx} = f''_{xx} = \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial x^2}, \qquad f_{xy} = f''_{xy} = \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial y\,\partial x}.$$

We can similarly define $f_{yx}$ and $f_{yy}$. Thus we have a total of four second-order partial derivatives, although for many of the functions with which we shall be concerned it turns out that the two mixed derivatives, $f_{yx}$ and $f_{xy}$, are identical.
We can also define third order derivatives:
$$f_{xxy} = \frac{\partial}{\partial y}\left( \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial x} \right) \right),$$
and similarly one defines $f_{xxx}$, $f_{xyx}$, etc.
Example 7.16. Let $f(x, y) = x^2 y + xy - y^3$. Then
$$f_x(x, y) = 2xy + y \quad \text{and} \quad f_y(x, y) = x^2 + x - 3y^2.$$
Also
$$f_{xx}(x, y) = \frac{\partial}{\partial x}(2xy + y) = 2y, \qquad f_{xy}(x, y) = \frac{\partial}{\partial y}(2xy + y) = 2x + 1,$$
$$f_{yx}(x, y) = \frac{\partial}{\partial x}(x^2 + x - 3y^2) = 2x + 1, \qquad f_{yy}(x, y) = \frac{\partial}{\partial y}(x^2 + x - 3y^2) = -6y.$$
Notice that $f_{xy}(x, y) = f_{yx}(x, y)$ for this function. Also
$$f_{xxy}(x, y) = \frac{\partial}{\partial y}(2y) = 2, \qquad f_{xyx}(x, y) = \frac{\partial}{\partial x}(2x + 1) = 2, \qquad f_{yxx}(x, y) = \frac{\partial}{\partial x}(2x + 1) = 2.$$
Thus $f_{xxy}(x, y) = f_{xyx}(x, y) = f_{yxx}(x, y)$.
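The equality of the mixed partials in Example 7.16 can also be confirmed symbolically; a short SymPy sketch of our own:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + x * y - y**3
f_xy = sp.diff(f, x, y)   # differentiate with respect to x, then y
f_yx = sp.diff(f, y, x)   # differentiate with respect to y, then x
print(f_xy, f_yx, sp.simplify(f_xy - f_yx) == 0)   # 2*x + 1, 2*x + 1, True
```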
Example 7.16 is an illustration of the following theorem.
Theorem 7.17 (Clairaut's Theorem). Let $f(x, y)$ be defined on an open disc $D$ of $\mathbb{R}^2$. If the functions $f_{xy}$ and $f_{yx}$ are both defined and continuous on $D$, then $f_{xy} = f_{yx}$ on $D$.

An analogue of Theorem 7.17 holds for functions of three or more variables. For example,
$$f_{xyzx}(x, y, z) = f_{xxyz}(x, y, z) = f_{yzxx}(x, y, z) = f_{yxzx}(x, y, z), \ \text{etc.},$$
provided the partial derivatives are continuous.
7.4 Tangent Planes and differentiability
When dealing with functions of one variable we often want to find the equation of a line tangent to the graph of the function at some point. One reason for doing this is that sometimes it is convenient to use the equation of the tangent line as an approximation to the more complicated equation of the actual function. Indeed, if $f(x)$ is a function that is differentiable at $x = a$ then the tangent line to the curve defined by $y = f(x)$ at $x = a$ has equation
$$y = f'(a)(x - a) + f(a). \qquad (7.1)$$
The derivative of $f$ at $x = a$ is the scalar multiple of $x$ given in the equation of the line. This is demonstrated in Figure 7.5 for the function $f(x) = x^2$ at the point $x = 3$.

Figure 7.5: The tangent line to $y = x^2$ at $x = 3$.

The function $L(x) = f'(a)(x - a) + f(a)$ is called the best linear approximation to $f$ at $x = a$. (Note that $L : \mathbb{R} \to \mathbb{R}$ is not necessarily a linear function in the sense of Chapter 4; in general $L(x) = f'(a)(x - a) + f(a)$ is not of the form $cx$. The function $L(x)$ is what we would call an affine function.)
Similar ideas occur in multi-variable calculus, but the linear approximations are of higher dimension than one-dimensional tangent lines. For instance, for functions of two variables the analogue of a tangent line is a tangent plane.

Consider a function $f : D \to \mathbb{R}$ where the domain $D$ is a subset of $\mathbb{R}^2$, and let $(a, b)$ be an interior point of $D$. Let $S$ be the surface defined by
$$\{(x, y, f(x, y)) \mid (x, y) \in D\}.$$
Suppose the partial derivatives $f_x(a, b)$ and $f_y(a, b)$ exist. Fix $y = b$ and consider the curve $C_1$ on $S$ defined by
$$\mathbf{r}_1(x) = (x, b, f(x, b))$$
for all $(x, b)$ such that $f(x, b)$ is defined. Then $C_1$ is the intersection curve of the surface $S$ with the plane $y = b$. The vector function $\mathbf{r}_1(x)$ is differentiable at $a$ and $\mathbf{r}_1'(a) = (1, 0, f_x(a, b))$ is a tangent vector to $C_1$ at $P = \mathbf{r}_1(a) = (a, b, f(a, b))$.

Similarly, the curve $C_2$ defined by
$$\mathbf{r}_2(y) = (a, y, f(a, y))$$
for all $(a, y)$ such that $f(a, y)$ exists, is the intersection of $S$ with the plane $x = a$ and has a tangent vector $\mathbf{r}_2'(b) = (0, 1, f_y(a, b))$ at $\mathbf{r}_2(b) = P$.

In conclusion, the vectors $\mathbf{r}_1'(a) = (1, 0, f_x(a, b))$ and $\mathbf{r}_2'(b) = (0, 1, f_y(a, b))$ are tangent to the surface $S$ at the point $P = (a, b, f(a, b))$. These define tangent lines $\ell_1$ and $\ell_2$ to $S$ parallel to the corresponding coordinate planes. This was depicted in Figure 7.4.
Since $\ell_1$ is the best linear approximation to $C_1$ near $P$ and $\ell_2$ is the best linear approximation to $C_2$ near $P$, if there is a plane that is the best linear approximation to $S$ near $P$ it must contain both $\ell_1$ and $\ell_2$. There is a unique plane $\Pi$ containing the point $P = (a, b, f(a, b))$ and parallel to the vectors $\mathbf{r}_1'(a)$ and $\mathbf{r}_2'(b)$, and $\Pi$ is called the tangent plane to $S$ at $P$. See Figure 7.6.
Figure 7.6: Tangent plane.

Any non-zero vector orthogonal to $\Pi$ is called a normal vector to the surface $S$ at the point $P$. Thus
$$\mathbf{n} = \mathbf{r}_1'(a) \times \mathbf{r}_2'(b) = (-f_x(a, b), -f_y(a, b), 1)$$
is a normal vector to $S$ at $P$. (Recall from the week 4 tutorial sheet that the cross product of two vectors is perpendicular to both of the original two vectors.)

Since $\mathbf{n}$ is perpendicular to $\Pi$, an equation for the tangent plane has the form
$$-f_x(a, b)\,x - f_y(a, b)\,y + z = d$$
for some constant $d$, which can be determined by using the fact that $P \in \Pi$.
Example 7.18. Determine the equation of the tangent plane $\Pi$ to the graph $S$ of $f(x, y) = x^2 + 3xy - y^2$ at the point $P = (1, -1, -3)$, and an upward normal to this plane.

Solution: Calculating the partial derivatives we have $f_x(x, y) = 2x + 3y$ and $f_y(x, y) = 3x - 2y$. Thus $f_x(1, -1) = -1$ and $f_y(1, -1) = 5$, and therefore $\mathbf{n} = (1, -5, 1)$ is a normal vector to the plane $\Pi$.

Thus an equation for $\Pi$ has the form $x - 5y + z = d$. Since $P = (1, -1, -3) \in \Pi$ it follows that $1 + 5 - 3 = d$, that is, $d = 3$. Hence the equation of the tangent plane is
$$x - 5y + z = 3.$$
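The same computation, forming the normal $(-f_x, -f_y, 1)$ and then the constant $d$, is easy to script. A short SymPy sketch of Example 7.18 (our own code):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 3*x*y - y**2
a, b = 1, -1

fx, fy = sp.diff(f, x), sp.diff(f, y)
n = sp.Matrix([-fx, -fy, 1]).subs({x: a, y: b})       # normal vector (1, -5, 1)
d = n.dot(sp.Matrix([a, b, f.subs({x: a, y: b})]))    # d = n . P
print(n.T, d)   # Matrix([[1, -5, 1]]), 3  ->  plane x - 5y + z = 3
```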
It is possible to write the equation of the tangent plane in a different manner. Since the point $P$ is given by $(a, b, f(a, b))$, the equation of $\Pi$ can also be written as
$$z - f(a, b) = f_x(a, b)(x - a) + f_y(a, b)(y - b)$$
or
$$z = \begin{pmatrix} f_x(a, b) & f_y(a, b) \end{pmatrix} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + f(a, b). \qquad (7.2)$$
This looks similar to the equation of the tangent line given in (7.1). The tangent plane to $S$ at $P$ may or may not be a good linear approximation to $S$ near $P$. If it is, then we define our function $f$ to be differentiable.

(We can make this definition more precise, but this is beyond the scope of the course: we would define $f$ to be differentiable at the point $\mathbf{c} = (a, b)$ if there is a linear function $A : \mathbb{R}^2 \to \mathbb{R}$ such that
$$\lim_{\mathbf{x} \to \mathbf{c}} \frac{f(\mathbf{x}) - f(\mathbf{c}) - A(\mathbf{x} - \mathbf{c})}{d(\mathbf{x}, \mathbf{c})} = 0,$$
where $d(\mathbf{x}, \mathbf{c})$ is the distance between $\mathbf{x}$ and $\mathbf{c}$. The linear map $A$ is called the derivative of $f$ at $(a, b)$. As we saw in Chapter 4, any linear transformation can be represented by a matrix, and with respect to the standard basis the matrix for $A$ will be $Df(a, b)$.)
Definition 7.19 (Derivative). We say that $f$ is differentiable at $(a, b)$ if the tangent plane given by (7.2) is the best linear approximation to $f$ near $(a, b)$. The matrix
$$Df(a, b) = \begin{pmatrix} f_x(a, b) & f_y(a, b) \end{pmatrix}$$
is called the derivative of $f$ at $(a, b)$. We say that $f$ is differentiable if $f$ is differentiable at every point in its domain.

We should note that the existence of $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ at $(a, b)$ does not guarantee that $f$ is differentiable at $(a, b)$, as demonstrated in the following example.
Example 7.20. Consider the function
$$f : \mathbb{R}^2 \to \mathbb{R}, \qquad (x, y) \mapsto \begin{cases} 0 & \text{if } xy \neq 0 \\ 1 & \text{if } xy = 0 \end{cases}$$
The graph of $f$ is given in Figure 7.7. When $x = 0$ or $y = 0$, the value of $f(x, y)$ is $1$, and so $f$ is constant in both the $x$-direction and the $y$-direction. Hence $f_x(0, 0) = 0$ and $f_y(0, 0) = 0$. However, the tangent plane
$$z = \begin{pmatrix} f_x(0, 0) & f_y(0, 0) \end{pmatrix} \begin{pmatrix} x - 0 \\ y - 0 \end{pmatrix} + f(0, 0) = 1$$
is clearly not a good approximation for $f$ near $(0, 0)$, as $f(a, b) = 0$ for points $(a, b)$ arbitrarily close to $(0, 0)$ with $ab \neq 0$.
A simple test for a function to be differentiable is the following theorem.

Theorem 7.21. If $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ exist on an open disc centred at $(a, b)$ and are continuous at $(a, b)$, then $f$ is differentiable at $(a, b)$.

Example 7.22. Recall the function $f(x, y) = x^2 + 3xy - y^2$ from Example 7.18. Its partial derivatives are $f_x(x, y) = 2x + 3y$ and $f_y(x, y) = 3x - 2y$, which exist and are continuous for all $(x, y) \in \mathbb{R}^2$ as they are polynomials. Hence $f$ is differentiable and its derivative is
$$Df(x, y) = \begin{pmatrix} 2x + 3y & 3x - 2y \end{pmatrix}.$$

We also have the following analogue of Theorem 7.4.

Theorem 7.23. If $f$ is differentiable at $(a, b)$ then $f$ is continuous at $(a, b)$.
7.5 The Jacobian matrix and the Chain Rule
We saw in the previous section that if a function $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable, then the derivative of $f$ at $\mathbf{c} = (a, b)$ is the $1 \times 2$ matrix
$$\begin{pmatrix} \dfrac{\partial f}{\partial x}(\mathbf{c}) & \dfrac{\partial f}{\partial y}(\mathbf{c}) \end{pmatrix}.$$

Figure 7.7: The graph of the function $f$ defined in Example 7.20.

We also saw in Section 7.2 that if $f(t) = (f_1(t), f_2(t))$ is a differentiable vector function then its derivative is the vector function $f'(t) = \left( \dfrac{df_1}{dt}(t), \dfrac{df_2}{dt}(t) \right)$. It is convenient to write both $f(t)$ and $f'(t)$ as column vectors (this is consistent with our convention that all vectors are actually column vectors), that is
$$f(t) = \begin{pmatrix} f_1(t) \\ f_2(t) \end{pmatrix} \quad \text{and} \quad f'(t) = \begin{pmatrix} \dfrac{df_1}{dt}(t) \\ \dfrac{df_2}{dt}(t) \end{pmatrix},$$
that is, as $2 \times 1$ matrices.
What happens if we have a function $F : \mathbb{R}^2 \to \mathbb{R}^2$, that is, a function of the form
$$(u, v) = F(x, y)?$$
Then really we have two coordinate functions of $x$ and $y$ and it is best to think of our function in terms of column vectors, that is
$$\begin{pmatrix} u \\ v \end{pmatrix} = F(x, y) = \begin{pmatrix} f_1(x, y) \\ f_2(x, y) \end{pmatrix}.$$
If our function $F$ is differentiable (we haven't said what it actually means for such a function to be differentiable, but essentially it is just that $F$ can be approximated by an (affine) linear function), then the derivative at $\mathbf{c} = (a, b)$ is the $2 \times 2$ matrix
$$DF(\mathbf{c}) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x}(\mathbf{c}) & \dfrac{\partial f_1}{\partial y}(\mathbf{c}) \\ \dfrac{\partial f_2}{\partial x}(\mathbf{c}) & \dfrac{\partial f_2}{\partial y}(\mathbf{c}) \end{pmatrix} \qquad (7.3)$$
consisting of all the first order partial derivatives.
Definition 7.24 (Jacobian matrix). The matrix $DF(\mathbf{c})$ is called the Jacobian matrix of $F$ at $\mathbf{c}$ and its determinant is called the Jacobian.

In general, if $F : \mathbb{R}^n \to \mathbb{R}^m$ then $F$ has $m$ coordinate functions and the Jacobian matrix of $F$ is an $m \times n$ matrix of partial derivatives.
Example 7.25. 1. Polar coordinates transformation: Let $F(r, \theta) = (r\cos(\theta), r\sin(\theta))$ for $r \geq 0$ and $\theta \in [0, 2\pi)$. That is, $F(r, \theta) = (f_1(r, \theta), f_2(r, \theta))$ where $f_1(r, \theta) = r\cos(\theta)$ and $f_2(r, \theta) = r\sin(\theta)$. Now if
$$D = \{(r, \theta) \mid r \in [0, a],\ \theta \in [0, 2\pi)\}$$
for some $a > 0$, then
$$\mathrm{range}(F) = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \leq a^2\},$$
that is, the disc with centre at $\mathbf{0}$ and radius $a$.

Then for every $\mathbf{c} = (r, \theta)$ we have
$$DF(\mathbf{c}) = \begin{pmatrix} \dfrac{\partial f_1}{\partial r}(\mathbf{c}) & \dfrac{\partial f_1}{\partial \theta}(\mathbf{c}) \\ \dfrac{\partial f_2}{\partial r}(\mathbf{c}) & \dfrac{\partial f_2}{\partial \theta}(\mathbf{c}) \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta) \end{pmatrix}.$$
Notice that the Jacobian of $F$ is equal to $r$.
2. Consider $F : \mathbb{R}^2 \to \mathbb{R}^2$ given by $F(x, y) = (3x + 4y, 2x + y)$. Then (writing vectors as columns) we have
$$F(x, y) = \begin{pmatrix} 3x + 4y \\ 2x + y \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix},$$
so $F$ is the linear transformation defined by the matrix
$$A = \begin{pmatrix} 3 & 4 \\ 2 & 1 \end{pmatrix}.$$
Now
$$DF(x, y) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y} \\ \dfrac{\partial f_2}{\partial x} & \dfrac{\partial f_2}{\partial y} \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 2 & 1 \end{pmatrix} = A.$$
This should not be surprising: the derivative of a function gives the best linear approximation to the function near a point. If the function is already linear then the best linear approximation is the function itself. In the case of functions of one variable, the derivative of $f(x) = ax$ is equal to $a$ for all $x \in \mathbb{R}$. For a linear function $F(\mathbf{x}) = A\mathbf{x}$ given by a matrix $A$, we have $DF(\mathbf{x}) = A$ for every $\mathbf{x} \in \mathbb{R}^2$ (Exercise).
7.5.1 The Chain Rule
If $f$ is a scalar-valued function of a variable $x$, and $x$ is in turn a scalar-valued function of $t$, then recall that the Chain Rule allows us to determine the derivative of $f$ with respect to $t$ without having to explicitly find $f$ as a function of $t$:
$$\frac{df}{dt} = \frac{df}{dx}\,\frac{dx}{dt}.$$
This naturally extends to functions of several variables.
This naturally extends to functions of several variables.
Suppose rst that f : R
2
R is a function of the variables x and
y and both x and y are functions of t. Then f can be rewritten as a
function of t and the derivative with respect to t will be:
Chain Rule I:
d f
dt
=
f
x
dx
dt
+
f
y
dy
dt
(7.4)
Note that both f , x and y are functions of the single variable t
and so we use
d f
dt
,
dx
dt
and
dy
dt
instead of the partial derivative nota-
tion.
Example 7.26. Let $f(x, y) = x^2 + y^2$ with $x = t^2$ and $y = e^t$. Then
$$\frac{dx}{dt} = 2t \quad \text{and} \quad \frac{dy}{dt} = e^t, \quad \text{while} \quad \frac{\partial f}{\partial x} = 2x \quad \text{and} \quad \frac{\partial f}{\partial y} = 2y.$$
Thus by the Chain Rule we have
$$\begin{aligned}
\frac{df}{dt} &= \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = 2x(2t) + 2y(e^t) \\
&= 2(t^2)2t + 2e^t e^t \qquad \text{(substituting for $x$ and $y$)} \\
&= 4t^3 + 2e^{2t}.
\end{aligned}$$
Alternatively, suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is a function of the variables $x$ and $y$, and both $x$ and $y$ are functions of the variables $s$ and $t$. Then $f$ is a function of $s$ and $t$ and its partial derivatives will be given by:

Chain Rule II:
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}, \qquad \frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}. \qquad (7.5)$$
Example 7.27. Let $f(x, y) = x^2 + y^2$ with $x = 2s + t$ and $y = s^2 + 4$. Then
$$\frac{\partial f}{\partial x} = 2x, \quad \frac{\partial f}{\partial y} = 2y, \quad \frac{\partial x}{\partial t} = 1, \quad \frac{\partial x}{\partial s} = 2, \quad \frac{\partial y}{\partial t} = 0, \quad \frac{\partial y}{\partial s} = 2s.$$
Then by the Chain Rule we have
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} = (2x)2 + (2y)(2s) = 2(2s + t)\cdot 2 + 2(s^2 + 4)\cdot 2s = 4s^3 + 24s + 4t$$
and
$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} = (2x)1 + (2y)0 = 2(2s + t) = 4s + 2t.$$
Recall that the Chain Rule for scalar-valued functions of one variable also provides a way of determining the derivative of the composition of two functions. In this interpretation we have
$$\frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x).$$
For example, if $f(x) = \sin(x)$ and $g(x) = 2x^2$ then $f(g(x)) = \sin(2x^2)$ and
$$\frac{d}{dx} f(g(x)) = \cos(2x^2)\,4x.$$
(Indeed, doing such differentiation has probably become second nature without even realising that you are using the Chain Rule.)

This interpretation naturally extends to vector functions of an arbitrary number of variables, in a manner that does not require more and more complicated expressions along the lines of our previous versions of the Chain Rule in (7.4) and (7.5).
Theorem 7.28 (Chain Rule). Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^s \to \mathbb{R}^n$ be differentiable functions and let $F = f \circ g : \mathbb{R}^s \to \mathbb{R}^m$. Then $F$ is differentiable and
$$DF(\mathbf{x}) = Df(g(\mathbf{x}))\,Dg(\mathbf{x}).$$
Note that $Df(g(\mathbf{x}))$ is an $m \times n$ matrix and $Dg(\mathbf{x})$ is an $n \times s$ matrix, so their product is an $m \times s$ matrix as required.
Is this the same as Chain Rules I and II? Suppose that $f$ is a function of $x$ and $y$, and both $x$ and $y$ are in turn functions of $t$. Define $g(t) = (x(t), y(t))$, the change of variables function, and let $F = f \circ g$. Then $F$ can be viewed as the function given by $f$ when written as a function of $t$. Then
$$Dg(t) = \begin{pmatrix} \dfrac{dx}{dt} \\ \dfrac{dy}{dt} \end{pmatrix} \quad \text{and} \quad Df(x, y) = \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix}.$$
Now Theorem 7.28 gives
$$DF(t) = \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix} \begin{pmatrix} \dfrac{dx}{dt} \\ \dfrac{dy}{dt} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f}{\partial x}\dfrac{dx}{dt} + \dfrac{\partial f}{\partial y}\dfrac{dy}{dt} \end{pmatrix},$$
that is,
$$\frac{df}{dt} = \frac{dF}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt},$$
as we saw in Chain Rule I (7.4). Chain Rule II can be recovered in a similar manner.
Example 7.29. Let $g(x, y) = (x^2 + 3y, y^2 + 3)$ and $f(u, v) = (e^u - v, \ln(v))$. Find the Jacobian matrix and the Jacobian of $F(x, y) = f(g(x, y))$ at $\mathbf{c} = (0, 1)$.

We have $g(\mathbf{c}) = (3, 4)$ and
$$Dg(x, y) = \begin{pmatrix} 2x & 3 \\ 0 & 2y \end{pmatrix}, \qquad Df(u, v) = \begin{pmatrix} e^u & -1 \\ 0 & 1/v \end{pmatrix}.$$
The Chain Rule implies
$$DF(\mathbf{c}) = Df(g(\mathbf{c}))\,Dg(\mathbf{c}) = \begin{pmatrix} e^3 & -1 \\ 0 & 1/4 \end{pmatrix} \begin{pmatrix} 0 & 3 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 3e^3 - 2 \\ 0 & 1/2 \end{pmatrix}.$$
Hence the Jacobian, $\det(DF(\mathbf{c}))$, is $0$.
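A computation like Example 7.29 can be double-checked by letting SymPy form both Jacobians and multiply them. A short sketch of our own:

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v')
g = sp.Matrix([x**2 + 3*y, y**2 + 3])
f = sp.Matrix([sp.exp(u) - v, sp.log(v)])

Dg = g.jacobian([x, y])
Df = f.jacobian([u, v])
c = {x: 0, y: 1}
gc = g.subs(c)                                    # g(c) = (3, 4)
DF = Df.subs({u: gc[0], v: gc[1]}) * Dg.subs(c)   # Df(g(c)) * Dg(c)
print(DF, DF.det())   # Matrix([[0, 3*exp(3) - 2], [0, 1/2]]), 0
```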
7.6 Directional derivatives and gradients
We saw in Section 7.3 that $\dfrac{\partial f}{\partial x}$ gives the rate of change of $f$ in the direction of the $x$-axis and $\dfrac{\partial f}{\partial y}$ gives the rate of change of $f$ in the direction of the $y$-axis. However, we may also be interested in the rate of change of $f$ in some other direction. This is provided by the directional derivative.
Definition 7.30 (Directional Derivative). Let $f(x, y)$ be defined on some subset $D$ of $\mathbb{R}^2$ and let $\mathbf{c} = (a, b)$ be an interior point of $D$. Given a non-zero vector $\mathbf{v} \in \mathbb{R}^2$, define $\mathbf{u} = \dfrac{\mathbf{v}}{|\mathbf{v}|}$, that is, the unit vector in the direction of $\mathbf{v}$.

The directional derivative of $f$ at $\mathbf{c}$ in the direction $\mathbf{v}$ is defined by
$$D_{\mathbf{v}} f(\mathbf{c}) = \lim_{h \to 0} \frac{f(\mathbf{c} + h\mathbf{u}) - f(\mathbf{c})}{h}$$
whenever the limit exists.

One can define directional derivatives for functions of three or more variables in a similar fashion.
Geometrically, $D_{\mathbf{v}} f(\mathbf{c})$ is the rate of change of $f$ at $\mathbf{c}$ in the direction of $\mathbf{v}$. So if $\mathbf{v}$ is parallel to the $x$-axis then the directional derivative is just the partial derivative with respect to $x$. Similarly, if $\mathbf{v}$ is parallel to the $y$-axis then the directional derivative is the partial derivative with respect to $y$.

To enable us to calculate directional derivatives we introduce the notion of the gradient vector. (You will have noticed that the gradient vector $\nabla f$ looks very similar to the derivative $Df$ and may wonder why we have given it a different name. There is a subtle difference, and so many mathematicians usually write $\nabla f$ as a column vector to distinguish the two. The derivative $Df(\mathbf{x})$ is a matrix and so defines a linear function from $\mathbb{R}^n$ to $\mathbb{R}$, while the gradient vector $\nabla f(\mathbf{x})$ is a vector in $\mathbb{R}^n$ associated with the point $\mathbf{x}$.)
Definition 7.31 (Gradient vector). Let $f : D \to \mathbb{R}$ with $D \subseteq \mathbb{R}^2$ and let $(x, y)$ be an interior point of $D$. The gradient vector of $f$ at $(x, y)$ is the vector
$$\nabla f(x, y) = \mathrm{grad}\, f(x, y) = (f_x(x, y), f_y(x, y))$$
if the partial derivatives $f_x(x, y)$ and $f_y(x, y)$ exist.

In a similar way one defines the gradient vector of a function of three or more variables. That is, if $f(x, y, z)$ is a function of three variables, its gradient vector at $(x, y, z)$ is defined by
$$\nabla f(x, y, z) = (f_x(x, y, z), f_y(x, y, z), f_z(x, y, z))$$
whenever the partial derivatives of $f$ at $(x, y, z)$ exist.
Example 7.32. 1. If $f(x, y) = x^2 y - \ln(x)$ for all $x > 0$ and $y \in \mathbb{R}$, then the gradient vector of $f$ is
$$\nabla f(x, y) = \left( 2xy - \frac{1}{x},\ x^2 \right)$$
for each $(x, y)$ in the domain of $f$.

2. Let $f(x, y, z) = \mathbf{c} \cdot \mathbf{x} = ax + by + cz$, where $\mathbf{c} = (a, b, c) \in \mathbb{R}^3$. Then $\nabla f(x, y, z) = (a, b, c) = \mathbf{c}$.
The next theorem gives a method for calculating directional derivatives.

Theorem 7.33. If $f$ is a differentiable function at $\mathbf{c}$, then for every non-zero vector $\mathbf{v}$, the directional derivative $D_{\mathbf{v}} f(\mathbf{c})$ exists and
$$D_{\mathbf{v}} f(\mathbf{c}) = \nabla f(\mathbf{c}) \cdot \mathbf{u}, \quad \text{where } \mathbf{u} = \frac{\mathbf{v}}{|\mathbf{v}|}.$$
Example 7.34. Find the directional derivative of $f(x, y) = xe^y$ at $\mathbf{c} = (3, 0)$ in the direction of the vector $\mathbf{v} = (-1, 2)$.

Solution. Now $\nabla f(x, y) = (e^y, xe^y)$ and the partial derivatives $f_x$ and $f_y$ are continuous, so by Theorem 7.21, $f$ is differentiable. Now $\nabla f(\mathbf{c}) = (1, 3)$ and $\mathbf{u} = \dfrac{\mathbf{v}}{|\mathbf{v}|} = \dfrac{(-1, 2)}{\sqrt{5}} = \left( -\dfrac{1}{\sqrt{5}}, \dfrac{2}{\sqrt{5}} \right)$. Hence
$$D_{\mathbf{v}} f(\mathbf{c}) = \nabla f(\mathbf{c}) \cdot \mathbf{u} = (1, 3) \cdot \left( -\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}} \right) = \frac{5}{\sqrt{5}} = \sqrt{5}.$$
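The formula $D_{\mathbf{v}}f(\mathbf{c}) = \nabla f(\mathbf{c}) \cdot \mathbf{v}/|\mathbf{v}|$ translates directly into code. A short SymPy sketch of Example 7.34 (our own code, using the direction vector assumed above):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x * sp.exp(y)
grad_f = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])

c = {x: 3, y: 0}
v = sp.Matrix([-1, 2])
u = v / v.norm()                      # unit vector in the direction of v
D_v = (grad_f.subs(c).T * u)[0]       # gradient dotted with the unit vector
print(sp.simplify(D_v))               # sqrt(5)
```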
7.6.1 Maximum rate of change
We would like to determine the direction in which $f$ has the maximum rate of change. As we saw in Theorem 7.33, the directional derivative of $f$ in the direction of $\mathbf{v}$ is
$$D_{\mathbf{v}} f(\mathbf{c}) = \nabla f(\mathbf{c}) \cdot \mathbf{u},$$
where $\mathbf{u}$ is the unit vector in the direction of $\mathbf{v}$. Recall that we can also calculate the dot product of two vectors by
$$\nabla f(\mathbf{c}) \cdot \mathbf{u} = |\nabla f(\mathbf{c})|\,|\mathbf{u}| \cos(\theta),$$
where $\theta$ is the angle between $\nabla f(\mathbf{c})$ and $\mathbf{u}$. Since $|\mathbf{u}| = 1$ and $-1 \leq \cos(\theta) \leq 1$, it follows that the maximum value for $D_{\mathbf{v}} f(\mathbf{c})$ will be $|\nabla f(\mathbf{c})|$ and will occur in the direction of $\nabla f(\mathbf{c})$.

Thus we have proved the following theorem.

Theorem 7.35. Let $f(\mathbf{x})$, for $\mathbf{x} \in D$, be a differentiable function of two or more variables and let $\mathbf{c}$ be an interior point of $D$ such that $\nabla f(\mathbf{c}) \neq \mathbf{0}$. Then
$$\max_{|\mathbf{u}| = 1} D_{\mathbf{u}} f(\mathbf{c}) = |\nabla f(\mathbf{c})|$$
and the maximum is achieved only for $\mathbf{u} = \dfrac{\nabla f(\mathbf{c})}{|\nabla f(\mathbf{c})|}$.

Theorem 7.35 is saying that the gradient vector at a given point points in the direction in which the function is increasing most rapidly.

The same argument shows that the function is decreasing most rapidly when the angle between $\mathbf{v}$ and $\nabla f(\mathbf{c})$ is $\pi$, that is, when $\mathbf{v}$ is in the direction of $-\nabla f(\mathbf{c})$.
Example 7.36. The temperature at each point of a metal plate is given by the function $T(x, y) = e^x\cos(y) + e^y\cos(x)$. In what direction does the temperature increase most rapidly at the point $(0, 0)$? What is this rate of increase?

Solution. $\nabla T = (e^x\cos(y) - e^y\sin(x),\ -e^x\sin(y) + e^y\cos(x))$, so $\nabla T(0, 0) = (1, 1)$. This is the direction of fastest increase of $T$ at $(0, 0)$. The rate of increase in the direction of $\nabla T(0, 0)$ is $|\nabla T(0, 0)| = \sqrt{2}$.
If you consider the contour curves on a map, the contour curves indicate the directions in which the altitude is constant, and the direction in which the ground is steepest is the direction perpendicular to the contour curve. As we mentioned in Section 5.2.1, the contour lines on a map correspond to the level curves of a function of two variables. The following theorem tells us that the gradient vector is perpendicular to the level curves; since the gradient vector is the direction of the greatest rate of increase, this agrees with our experience.

Theorem 7.37. If $f : D \to \mathbb{R}$ with $D \subseteq \mathbb{R}^2$ is differentiable at $(a, b)$ and $\nabla f(a, b) \neq \mathbf{0}$, then $\nabla f(a, b)$ is perpendicular to the tangent line to the level curve of $f$ at $(a, b)$.
Proof. Let $C$ be the level curve $f(x, y) = k$ passing through $\mathbf{c} = (a, b)$ and let $\mathbf{r}(t)$ be a continuously differentiable parametrisation of $C$ near $\mathbf{c}$ with $\mathbf{r}(t_0) = \mathbf{c}$ for some $t_0$. Since $\mathbf{r}(t) \in C$ for all $t$, we have $f(\mathbf{r}(t)) = k$ for all $t$. Differentiating this equality with respect to $t$ and using the Chain Rule gives
$$0 = Df(\mathbf{r}(t))\,D\mathbf{r}(t) = \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t).$$
For $t = t_0$ this gives $\nabla f(\mathbf{c}) \cdot \mathbf{r}'(t_0) = 0$. Since $\mathbf{r}'(t_0)$ is the direction of the tangent line of $C$ at $\mathbf{c}$, it follows that $\nabla f(\mathbf{c})$ is orthogonal to $C$ at $\mathbf{c}$.
Theorem 7.37 is illustrated in Figure 7.8.
Figure 7.8: The gradient vector related to the level curves.
Similarly, let $f(\mathbf{x}) : D \to \mathbb{R}$, where $D \subseteq \mathbb{R}^n$, be a differentiable function of three or more variables. Consider the level surface
$$S : f(\mathbf{x}) = k$$
and assume that $f(\mathbf{c}) = k$ and $\nabla f(\mathbf{c}) \neq \mathbf{0}$. Then $\nabla f(\mathbf{c})$ is a normal vector to $S$ at $\mathbf{c}$, and the $(n - 1)$-dimensional plane containing $\mathbf{c}$ and orthogonal to $\nabla f(\mathbf{c})$ is the tangent plane to $S$ at $\mathbf{c}$.
Example 7.38. 1. Find a normal vector to the surface
$$S : z - xe^y\cos(z) = -1$$
at $\mathbf{c} = (1, 0, 0)$ and an equation for the tangent plane to $S$ at $\mathbf{c}$.

Solution. $S$ is a level surface of $f(x, y, z) = z - xe^y\cos z$. We have
$$\nabla f(x, y, z) = (-e^y\cos z,\ -xe^y\cos z,\ 1 + xe^y\sin z),$$
so $\nabla f(\mathbf{c}) = (-1, -1, 1)$ is a normal vector to $S$ at $\mathbf{c}$. Since the tangent plane is perpendicular to $\nabla f(\mathbf{c})$, it has an equation $-x - y + z = d$ for some constant $d$. Since $\mathbf{c}$ lies in the tangent plane, we get $-1 = d$, so an equation of the plane is $x + y - z = 1$.

2. Let $S$ be the graph of $f(x, y) : D \to \mathbb{R}$. Notice that $S$ is a level surface of the function $g(x, y, z) = z - f(x, y)$, so assuming $f$ is differentiable at $(a, b) \in D$, the vector $\nabla g(\mathbf{c}) = (-f_x(a, b), -f_y(a, b), 1)$ is a normal vector to $S$ at $\mathbf{c} = (a, b, f(a, b))$. This agrees with the definition of a normal vector given in Section 7.4.
8 Maxima and Minima

In this chapter we look at maxima and minima of functions of several variables. First we revise the single variable case, which should be familiar from school, although our treatment may be different.
8.1 Functions of a single variable
Definition 8.1 (Maxima and minima). Let $D \subseteq \mathbb{R}$ and let $f : D \to \mathbb{R}$. Given $c \in D$, we say that $f$ has an absolute maximum at $c$ if $f(x) \leq f(c)$ for all $x \in D$. Similarly we say that $f$ has an absolute minimum at $c$ if $f(x) \geq f(c)$ for all $x \in D$.

We say that $f$ has a local maximum at $c$ if there exists an open interval $I$ centred at $c$ such that $f(x) \leq f(c)$ for all $x \in I \cap D$. We define a local minimum similarly.

A maximum or minimum of $f$ is also called an extremum of $f$.

So a local maximum (or minimum) occurs at a point where the value of the function is at least as great (or at least as small) as at any point in the vicinity, but which may be exceeded at some other point in the domain, whereas an absolute maximum (minimum) is the largest (least) value taken by the function anywhere in its domain.

Example 8.2. The function $f$ defined on the interval $[-1, 3]$ whose graph is given in Figure 8.1 has an absolute maximum at $x = 3$ and an absolute minimum at $x = -1$, as well as a local maximum at $x = 2/3$ and a local minimum at $x = 2$.
Theorem 8.3 (The Extreme Value Theorem). Let $f$ be a continuous scalar-valued function on a finite and closed interval $[a, b]$. Then $f$ is bounded and has an absolute maximum and an absolute minimum in $[a, b]$; that is, there exist $c_1, c_2 \in [a, b]$ such that $f(c_1) \leq f(x) \leq f(c_2)$ for all $x \in [a, b]$.

Remark 8.4. 1. The points $c_1, c_2$ do not have to be unique. (They can be the same point if $f$ is a constant function.)
130
1 1 2 3
20
10
10
Figure 8.1: Examples of maxima and
minima
2. The statement of Theorem 8.3 is not true (in general) for other types of
intervals. For example, the function f (x) = 1/x dened on x [1, ),
is a continuous function that has no absolute minimum. Although the
range of the function is bounded below by 0, it never actually achieves
this value.
Definition 8.5 (Critical points). Let $f$ be a scalar-valued function defined on the subset $D$ of $\mathbb{R}$. For $c \in D$, we say that $c$ is a critical point of $f$ if

1. $c$ is an interior point of $D$, and

2. either $f'(c)$ does not exist or $f'(c) = 0$.

Theorem 8.6 (Fermat's Theorem). If the function $f$ has a local maximum or minimum at some interior point $c$ of its domain, then $c$ is a critical point of $f$.

So we have the following relations:
$$\{\text{points of absolute extrema}\} \subseteq \{\text{points of local extrema}\} \subseteq \{\text{critical points}\} \cup \{\text{boundary points}\}.$$
In general, these sets are not equal. In particular, not all critical points give rise to local maxima or minima. For example, if $f(x) = x^3$ then $f'(x) = 3x^2$, so $x = 0$ is a critical point yet it is neither a local maximum nor a local minimum.
The above suggests the following process.

Key Concept 8.7. Procedure for finding the absolute extrema of a continuous function $f$ over a finite closed interval.

Step 1. Find the critical points of $f$ and the values of $f$ at these.

Step 2. Find the values of $f$ at the ends of the interval.

Step 3. Compare the values from Steps 1 and 2; the largest of them gives the absolute maximum, while the smallest gives the absolute minimum.

The existence of the absolute maximum and minimum follows from the Extreme Value Theorem.
Example 8.8. Find the absolute maximum and the absolute minimum of the function $f(x) = \sqrt{x}\,(5 - x^2)$ on the interval $[0, 4]$.

Step 1. For $x > 0$ we have
$$f'(x) = \frac{1}{2\sqrt{x}}(5 - x^2) + \sqrt{x}\,(-2x) = \frac{(5 - x^2) - 4x^2}{2\sqrt{x}} = \frac{5(1 - x^2)}{2\sqrt{x}}.$$
So, for $x \in (0, 4)$ we have $f'(x) = 0$ only when $x = 1$; that is, $x = 1$ is the only critical point of $f$. We have $f(1) = 4$.

Step 2. $f(0) = 0$ and $f(4) = -22$.

Step 3. Comparing $f(1) = 4$, $f(0) = 0$ and $f(4) = -22$, we conclude that $f(1) = 4$ is the absolute maximum of the function, while $f(4) = -22$ is its absolute minimum.
8.2 Functions of several variables
The ideas of Section 8.1 easily extend to functions of two variables.
Definition 8.9 (Maxima and minima). Let $D \subseteq \mathbb{R}^2$ and let $f : D \to \mathbb{R}$. Given $\mathbf{c} \in D$, we say that $f$ has an absolute maximum at $\mathbf{c}$ if $f(\mathbf{x}) \leq f(\mathbf{c})$ for all $\mathbf{x} \in D$. Similarly we say that $f$ has an absolute minimum at $\mathbf{c}$ if $f(\mathbf{x}) \geq f(\mathbf{c})$ for all $\mathbf{x} \in D$.

We say that $f$ has a local maximum at $\mathbf{c}$ if there exists an open ball $B$ centred at $\mathbf{c}$ such that $f(\mathbf{x}) \leq f(\mathbf{c})$ for all $\mathbf{x} \in B \cap D$. We define a local minimum similarly.

A maximum or minimum of $f$ is also called an extremum of $f$.

Example 8.10. The function $f(x, y) = 2x^2 + y^2$ has an absolute minimum at $(0, 0)$ because $f(x, y) \geq 0 = f(0, 0)$ for all $(x, y)$.
We have the natural analogue for critical points for functions of
two variables.
Definition 8.11 (Critical points). Let $f$ be a scalar-valued function defined on a subset $D$ of $\mathbb{R}^2$. For $\mathbf{c} \in D$ we say that $\mathbf{c}$ is a critical point of $f$ if

1. $\mathbf{c}$ is an interior point of $D$, and

2. $\nabla f(\mathbf{c})$ does not exist or $\nabla f(\mathbf{c}) = \mathbf{0}$.
We have the following analogue of Theorem 8.6.
Theorem 8.12. Let $D \subseteq \mathbb{R}^2$ and let $f : D \to \mathbb{R}$. If $\mathbf{c}$ is an interior point of $D$ and $f$ has a local maximum or minimum at $\mathbf{c}$, then $\mathbf{c}$ is a critical point of $f$.

Again note that not every critical point is a local maximum or minimum. For example, if $f(x, y) = y^2 - x^2$, we have $\nabla f(0, 0) = \mathbf{0}$, but $(0, 0)$ is neither a local maximum nor a local minimum. As can be seen in Figure 8.2, the graph of the function near $(0, 0)$ looks like a saddle: it is increasing in some directions and decreasing in others. This leads to the following definition.

Figure 8.2: The surface defined by $f(x, y) = y^2 - x^2$.
Definition 8.13 (Saddle point). Let $f : D \to \mathbb{R}$, with $D \subseteq \mathbb{R}^2$. A critical point of $f$ is called a saddle point of $f$ if $f$ has neither a local minimum nor a local maximum at the point.

The Extreme Value Theorem also has an appropriate analogue for functions of two variables, but first we need some definitions.
Definition 8.14 (Closed, open and bounded subsets). The boundary $\partial D$ of a subset $D$ of $\mathbb{R}^2$ is the set of all points $\mathbf{c} \in \mathbb{R}^2$ such that every open disc centred at $\mathbf{c}$ contains points of $D$ and points not in $D$.

A subset $D$ of $\mathbb{R}^2$ is called closed if all its boundary points are in the subset, that is, if $\partial D \subseteq D$.

A subset $D$ of $\mathbb{R}^2$ is called open if none of its boundary points are in the subset, that is, if $\partial D \cap D = \emptyset$.

A set $D$ is called bounded if it is contained in a disc of finite radius.

For example, the boundary of the set
$$D = \{(x, y) \mid -2 \leq x \leq 3,\ 0 \leq y \leq 10\}$$
is
$$\partial D = \{(x, y) \in D \mid x \in \{-2, 3\} \text{ or } y \in \{0, 10\}\}.$$
Thus $D$ is closed and it is also bounded. The set $\{(x, y) \mid -2 \leq x \leq 3,\ y > 0\}$ is unbounded.
Theorem 8.15 (The Extreme Value Theorem). Let $D$ be a non-empty closed and bounded subset of $\mathbb{R}^2$ and let $f : D \to \mathbb{R}$ be a continuous function. Then $f$ is bounded and there exist $\mathbf{a} \in D$ and $\mathbf{b} \in D$ such that $f(\mathbf{a}) \leq f(\mathbf{x}) \leq f(\mathbf{b})$ for all $\mathbf{x} \in D$.
Again we have a strategy for finding the absolute maxima and minima of a function.
Key Concept 8.16. Procedure for finding the absolute extrema of a continuous function $f$ over a closed and bounded set $D$.

Step 1. Find the critical points of $f$ and the values of $f$ at these.

Step 2. Find the maximum and the minimum of the restriction of $f$ to the boundary $\partial D$.

Step 3. Compare the values from Steps 1 and 2; the largest of them gives the absolute maximum, while the smallest gives the absolute minimum.

The existence of the absolute maximum and minimum follows from the Extreme Value Theorem.
Example 8.17. Find the maximum and minimum of the function $f(x, y) = 2x^2 + x + y^2 - 2$ on $D = \{(x, y) \mid x^2 + y^2 \leq 4\}$.

Solution:

Step 1. To find the critical points of $f$, solve the system
$$f_x(x, y) = 4x + 1 = 0, \qquad f_y(x, y) = 2y = 0.$$
There are no points in $D$ where $\nabla f(x, y)$ fails to exist, so the only critical point is $P = (-1/4, 0)$, which is in the interior of $D$. We have
$$f(P) = \frac{1}{8} - \frac{1}{4} - 2 = -\frac{17}{8}.$$

Step 2. The boundary $\partial D$ of $D$ is the circle $x^2 + y^2 = 4$. On this set of points we have $y^2 = 4 - x^2$, so
$$f(x, y) = 2x^2 + x + (4 - x^2) - 2 = x^2 + x + 2$$
on $\partial D$. Notice that on $\partial D$ we have $x \in [-2, 2]$. So we have to find the extreme values of $g(x) = x^2 + x + 2$ on the interval $[-2, 2]$.

From $g'(x) = 2x + 1$, the only critical point of $g(x)$ on $[-2, 2]$ is $x = -1/2$. Since $g(-1/2) = \frac{7}{4}$, and on the end points of the interval we have $g(-2) = 4$ and $g(2) = 8$, it follows that the absolute maximum of $g(x)$ is $g(2) = 8$ and its absolute minimum is $g(-1/2) = \frac{7}{4}$. These are the maximum and the minimum values of $f$ on the circle $\partial D$.

Step 3. Comparing the maximum and minimum values of $f$ on $\partial D$ with the value of $f$ at the only critical point $P$, we see that the absolute minimum of $f$ is $-\frac{17}{8} = f(-1/4, 0)$ and its absolute maximum is $8 = g(2) = f(2, 0)$.
If $f$ is a function of three or more variables defined on some region $D$ of $\mathbb{R}^n$, one can define local and absolute extrema, critical points and saddle points in the same way. The analogues of Theorems 8.12 and 8.15 remain true.
8.3 Identifying local maxima and minima
If $f(x)$ is a real-valued function of a single variable, one way to identify whether a critical point $c$ is a local maximum or minimum is to examine the value of the derivative around $c$. For example, if on an interval $I$ centred at $c$ the function $f(x)$ is increasing for $x < c$ (that is, if $f'(x) > 0$) and decreasing for $x > c$ (that is, $f'(x) < 0$), then $f$ has a local maximum at $x = c$. We can identify a local minimum in a similar way.
Another way is to identify the concavity of the function.
Definition 8.18 (Inflection point). A function $f$ is called concave up on an interval $I$ if $f'(x)$ is increasing on $I$. Similarly, $f$ is called concave down if $f'(x)$ is decreasing on $I$.

We say that $c \in I$ is an inflection point for $f$ if $c$ is an interior point of $I$ and the type of concavity of $f$ changes at $c$.
Remark 8.19. Of course, one way to identify whether $f'(x)$ is increasing or decreasing is to examine the second derivative $f''(x)$. That is, if $f''(x)$ exists and $f''(x) > 0$ for all $x \in I$ then $f$ is concave up on $I$. Similarly, if $f''(x) < 0$ for all $x \in I$ then $f$ is concave down on $I$.

Moreover, if $c$ is a point of inflection then $f''(c) = 0$.
Example 8.20. For $g(x) = x^3$, whose graph is given in Figure 8.3, we have $g''(x) = 6x$, so $g''(x) < 0$ for $x < 0$ and $g''(x) > 0$ for $x > 0$. Thus $g$ is concave down on $(-\infty, 0)$ and concave up on $(0, \infty)$. The point $x = 0$ is an inflection point. There are no other inflection points.

Figure 8.3: The graph of the function $f(x) = x^3$.

This provides the Second Derivative Test.
Theorem 8.21 (The Second Derivative Test). Let $f(x)$ be a continuous function defined on an interval $I$ with continuous second derivative, and let $c \in I$ be a critical point of $f$.

1. If $f''(c) > 0$ then $f$ has a local minimum at $c$.

2. If $f''(c) < 0$ then $f$ has a local maximum at $c$.

If $f''(c) = 0$, then $c$ may or may not be an inflection point.
Example 8.22. The function $f(x) = x^4$ for $x \in \mathbb{R}$, whose graph is given in Figure 8.4, is concave up on the whole of $\mathbb{R}$, so $x = 0$ is not an inflection point for $f$ (although $f''(0) = 0$).

Figure 8.4: The graph of the function $f(x) = x^4$.
Next consider a scalar-valued function $f(x, y)$ of two variables defined on a subset $D$ of $\mathbb{R}^2$. There are now four second order partial derivatives $f_{xx}$, $f_{xy}$, $f_{yx}$ and $f_{yy}$ to consider. We already saw in Theorem 7.17 that if $f_{xy}$ and $f_{yx}$ are defined and continuous on $D$ then they are equal. For $\mathbf{c} \in D$, we define the matrix
$$\begin{pmatrix} f_{xx}(\mathbf{c}) & f_{xy}(\mathbf{c}) \\ f_{yx}(\mathbf{c}) & f_{yy}(\mathbf{c}) \end{pmatrix}.$$
This matrix is sometimes called the Hessian matrix and denoted by $D^2 f(\mathbf{c})$.
The Hessian matrix defines a quadratic form
$$Q_{\mathbf{c}}(x, y) = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} f_{xx}(\mathbf{c}) & f_{xy}(\mathbf{c}) \\ f_{yx}(\mathbf{c}) & f_{yy}(\mathbf{c}) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = f_{xx}(\mathbf{c})x^2 + (f_{xy}(\mathbf{c}) + f_{yx}(\mathbf{c}))xy + f_{yy}(\mathbf{c})y^2.$$
The quadratic form $Q_{\mathbf{c}}$ is called positive definite if $Q_{\mathbf{c}}(x, y) > 0$ for all $(x, y) \neq (0, 0)$, and called negative definite if $Q_{\mathbf{c}}(x, y) < 0$ for all $(x, y) \neq (0, 0)$.
We will assume from now on that $f_{xy}$ and $f_{yx}$ are continuous, so that $D^2 f(\mathbf{c})$ is a symmetric matrix. Since $D^2 f(\mathbf{c})$ is symmetric, the quadratic form $Q_{\mathbf{c}}$ will be positive definite if and only if the determinant $D_{\mathbf{c}}$ of its associated Hessian matrix is strictly positive and $f_{xx}(\mathbf{c}) > 0$. The form $Q_{\mathbf{c}}$ will be negative definite if and only if $D_{\mathbf{c}} > 0$ and $f_{xx}(\mathbf{c}) < 0$. When $D_{\mathbf{c}} < 0$, the quadratic form takes both positive and negative values. These three statements are perhaps best justified by looking at the eigenvalues of the Hessian matrix, but eigenvalues won't be seen until later in the course.
Theorem 8.23 (Second Derivatives Test). Suppose f has continuous second partial derivatives in an open set U in R^2, c ∈ U and ∇f(c) = 0 (that is, c is a critical point of f).

1. If Q_c(x, y) is positive definite (that is, D_c > 0 and f_xx(c) > 0), then f has a local minimum at c.

2. If Q_c(x, y) is negative definite (that is, D_c > 0 and f_xx(c) < 0), then f has a local maximum at c.

3. If Q_c(x, y) assumes both positive and negative values (that is, D_c < 0), then c is a saddle point of f.
Remark 8.24. 1. When D_c = 0, the Second Derivatives Test gives no information.

2. Notice that the three cases in Theorem 8.23 do not cover all possible cases for the quadratic form Q_c(x, y). For example, it may happen that Q_c(x, y) ≥ 0 for all x, y (we then say that Q_c is positive semi-definite).
Example 8.25. Find all critical points of the function f(x, y) = x^3 − 3xy + y^3 and determine their type (if possible).

Solution: There are no points in the domain where the partial derivatives of f do not exist, so the only critical points of f are solutions of the system

f_x(x, y) = 3x^2 − 3y = 0,
f_y(x, y) = −3x + 3y^2 = 0

From the first equation y = x^2. Substituting this into the second equation we get −x + x^4 = 0. Thus, x = 0 or x = 1. Consequently, the critical points of f are P_1 = (0, 0) and P_2 = (1, 1).
Next, f_xx = 6x, f_xy = −3 and f_yy = 6y. At P_1 the Hessian matrix is

[  0  −3 ]
[ −3   0 ]

which has determinant −9, and so P_1 is a saddle point. At P_2 the Hessian matrix is

[  6  −3 ]
[ −3   6 ]

which has determinant 27. Since f_xx(1, 1) = 6 > 0 the quadratic form is positive definite and so f has a local minimum at P_2 = (1, 1).
The surface defined by the graph of f is given in Figure 8.5.

Figure 8.5: The surface defined by f(x, y) = x^3 − 3xy + y^3.
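The classification carried out in Example 8.25 is easy to automate. The following sketch (assuming the Python library SymPy is available; it is not part of these notes) finds the critical points, builds the Hessian matrix and applies the Second Derivatives Test of Theorem 8.23.

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**3 - 3*x*y + y**3

    # Critical points: solve f_x = 0 and f_y = 0 simultaneously.
    critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)

    for pt in critical_points:
        if not (pt[x].is_real and pt[y].is_real):
            continue                      # ignore complex solutions of the system
        H = sp.hessian(f, (x, y)).subs(pt)
        D = H.det()
        if D < 0:
            kind = 'saddle point'
        elif D > 0:
            kind = 'local minimum' if H[0, 0] > 0 else 'local maximum'
        else:
            kind = 'no information from the test'
        print(pt, kind)

Running it reports a saddle point at (0, 0) and a local minimum at (1, 1), in agreement with the working above.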
9
Taylor Polynomials
We saw in Chapter 7 that we could use the derivative of a function
to get a good linear approximation of the function. Sometimes,
an even better approximation is required. In particular, it is often
convenient to be able to approximate a function by a polynomial: it
is easy to calculate its value and easy to differentiate and integrate.
9.1 Taylor polynomials for functions of one variable
Definition 9.1. (Taylor polynomial)
Let f(x) be a real-valued function of one variable defined on some interval I and having continuous derivatives f'(x), f''(x), . . . , f^(n)(x) on I for some integer n ≥ 1. Let a be an interior point of I. The nth Taylor polynomial of f about a is defined by

T_{n,a}(x) = f(a) + (f'(a)/1!)(x − a) + (f''(a)/2!)(x − a)^2 + . . . + (f^(n)(a)/n!)(x − a)^n

This is always a polynomial of degree at most n (however the degree might be less than n, since it may happen that f^(n)(a) = 0). Note that T_{1,a}(x) is the equation of the tangent line to the curve y = f(x) when x = a.
Remark 9.2. 1. Clearly T_{n,a}(a) = f(a). Next,

T'_{n,a}(x) = f'(a) + (f''(a)/1!)(x − a) + (f'''(a)/2!)(x − a)^2 + . . . + (f^(n)(a)/(n − 1)!)(x − a)^(n−1)

so T'_{n,a}(a) = f'(a). Continuing in this way, one sees that

T^(k)_{n,a}(a) = f^(k)(a)   for all k = 0, 1, 2, . . . , n.

2. Using properties of polynomials one can actually prove that if Q(x) is any polynomial of degree at most n such that Q^(k)(a) = f^(k)(a) for all k = 0, 1, 2, . . . , n, then Q(x) = T_{n,a}(x).
Example 9.3. 1. Let f(x) = e^x and a = 0. Since f'(x) = e^x = f^(k)(x) for all k ≥ 0 we have f^(k)(0) = 1 for all k ≥ 0, and so

T_{n,0}(x) = 1 + x/1! + x^2/2! + . . . + x^n/n!

for any positive integer n.

2. Let f(x) = sin(x). Notice that

f'(x) = cos(x),  f''(x) = −sin(x),  f'''(x) = −cos(x),  f^(4)(x) = sin(x)

and the sequence of the derivatives is periodic with period 4. Consider a = π/3 and n = 3. We have

f(π/3) = √3/2,  f'(π/3) = 1/2,  f''(π/3) = −√3/2,  f'''(π/3) = −1/2

therefore

T_{3,π/3}(x) = √3/2 + (1/2)(x − π/3) − (√3/4)(x − π/3)^2 − (1/12)(x − π/3)^3
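For a numerical feel for how well T_{3,π/3} approximates sin(x), the following short sketch (plain Python, standard library only) evaluates the polynomial from Example 9.3 and compares it with sin(x) at a few arbitrarily chosen points near a = π/3.

    from math import sin, cos, pi, factorial

    a = pi / 3
    # Values f(a), f'(a), f''(a), f'''(a) for f(x) = sin(x).
    derivatives_at_a = [sin(a), cos(a), -sin(a), -cos(a)]

    def T3(x):
        # Third Taylor polynomial of sin about a = pi/3, as in Example 9.3.
        return sum(derivatives_at_a[k] * (x - a) ** k / factorial(k) for k in range(4))

    for x in (0.9, 1.0472, 1.3):
        print(x, sin(x), T3(x), abs(sin(x) - T3(x)))

The error grows as x moves away from a, which is exactly what the remainder term in the next theorem quantifies.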
Taylor polynomials are used to approximate the original function f. That is, given f, n and a as before, we consider the approximation

f(x) ≈ T_{n,a}(x)   for x ∈ I    (9.1)

This can be written as

f(x) = T_{n,a}(x) + R_{n,a}(x),    (9.2)

where R_{n,a}(x) is the error term (the so-called remainder). Naturally, the approximation (9.1) will not be of much use if we do not know how small the error is. The following theorem gives some information about the remainder R_{n,a}(x) which can be used to get practical estimates of the size of the error.

Theorem 9.4 (Taylor's Formula). Assume that f has continuous derivatives up to order n + 1 on some interval I and a is an interior point of I. Then for any x ∈ I there exists z between x and a such that

R_{n,a}(x) = (f^(n+1)(z)/(n + 1)!)(x − a)^(n+1)

That is, for any x ∈ I,

f(x) = f(a) + (f'(a)/1!)(x − a) + (f''(a)/2!)(x − a)^2 + . . . + (f^(n)(a)/n!)(x − a)^n + (f^(n+1)(z)/(n + 1)!)(x − a)^(n+1)

where z depends on x.

Despite the fact that we don't know what z is, knowing an interval in which it lies allows us to bound the error, as we see in the next example.
Example 9.5. Find T_{6,0}(x) for f(x) = sin(x) and estimate the maximal possible error in the approximation

sin(x) ≈ T_{6,0}(x)    (9.3)

for x ∈ [−0.3, 0.3]. Use this approximation to find sin(12°) correct to 6 decimal places.

Solution: Recall the derivatives of sin(x):

f'(x) = cos(x),  f''(x) = −sin(x),  f'''(x) = −cos(x),  f^(4)(x) = sin(x),
f^(5)(x) = cos(x),  f^(6)(x) = −sin(x),  f^(7)(x) = −cos(x)

Thus,

T_{6,0}(x) = 0 + (1/1!)x + 0 − (1/3!)x^3 + 0 + (1/5!)x^5 = x − x^3/3! + x^5/5!

To estimate the error, recall that

sin(x) = T_{6,0}(x) + R_{6,0}(x),

where by Taylor's formula, for any x the error term has the form

R_{6,0}(x) = (f^(7)(z)/7!) x^7

for some z between 0 and x. For x ∈ [−0.3, 0.3] we have |x| ≤ 0.3, so

|R_{6,0}(x)| = (|cos(z)|/5040) |x|^7 ≤ (1/5040)(0.3)^7 = 0.4339285 × 10^(−7) < 4.4 × 10^(−8).

Hence the maximum error in the approximation (9.3) for x ∈ [−0.3, 0.3] is at most 4.4 × 10^(−8).

Since 12° is π/15 in radians and π/15 < 0.3, the previous argument shows that

sin(12°) = sin(π/15) ≈ π/15 − (π/15)^3/3! + (π/15)^5/5! = 0.20791169...

is correct to at least 6 decimals.
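The bound can be confirmed numerically. The sketch below (plain Python, nothing beyond the standard library) evaluates T_{6,0}(π/15) and the actual error; the error is comfortably below the bound 4.4 × 10^(−8) obtained above.

    from math import sin, pi, factorial

    def T6(x):
        # T_{6,0}(x) = x - x^3/3! + x^5/5!
        return x - x ** 3 / factorial(3) + x ** 5 / factorial(5)

    x = pi / 15                       # 12 degrees in radians
    print(T6(x))                      # approximately 0.2079116908...
    print(abs(sin(x) - T6(x)))        # actual error, well below 4.4e-8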
9.2 Taylor polynomials for functions of two variables
We can also define Taylor polynomials for functions of two variables. We will only state the second degree approximation.
Theorem 9.6 (Taylor's Theorem). Let D be an open disc in R^2, let f : D → R, and let c = (a, b) ∈ D. If f has continuous and bounded partial derivatives up to third order in D, then there exists a constant C > 0 such that for each (x, y) ∈ D we have

f(x, y) = f(a, b) + [ f_x(a, b)  f_y(a, b) ] [ x − a ]
                                             [ y − b ]

          + (1/2) [ x − a  y − b ] [ f_xx(a, b)  f_xy(a, b) ] [ x − a ]
                                   [ f_yx(a, b)  f_yy(a, b) ] [ y − b ]

          + R_2(a, b; x, y)

where |R_2(a, b; x, y)| ≤ C |(x − a, y − b)|^3 = C [(x − a)^2 + (y − b)^2]^(3/2).
Note that the first two terms in the approximation for f(x, y) give the equation of the tangent plane to the surface z = f(x, y) at the point (a, b, f(a, b)).
Remark 9.7. If (a, b) is a critical point of f then f_x(a, b) = f_y(a, b) = 0 and so the second degree Taylor polynomial is simply

f(a, b) + (1/2) [ x − a  y − b ] [ f_xx(a, b)  f_xy(a, b) ] [ x − a ]
                                 [ f_yx(a, b)  f_yy(a, b) ] [ y − b ]

Note that the quadratic term involves the Hessian matrix for f and the quadratic form used in the Second Derivative Test in Theorem 8.23. Clearly, if the quadratic form only takes positive values near (a, b) then f will have a local minimum at (a, b), and if the quadratic form only takes negative values near (a, b) then f will have a local maximum at (a, b). This justifies Theorem 8.23.
Example 9.8. Let f(x, y) = √(1 + x^2 + y^2) and let c = (0, 0). Then f_x(x, y) = x/√(1 + x^2 + y^2), so f_x(0, 0) = 0, and similarly f_y(0, 0) = 0. Thus, the first degree approximation of f about c = (0, 0) is

f(x, y) ≈ f(0, 0) + [ f_x(0, 0)  f_y(0, 0) ] [ x − 0 ]   =   1
                                              [ y − 0 ]

This amounts to approximating the function by the equation of its tangent plane. Next,

f_xx(x, y) = 1/√(1 + x^2 + y^2) − x^2/(1 + x^2 + y^2)^(3/2)

so f_xx(0, 0) = 1. Similarly, f_xy(0, 0) = 0 and f_yy(0, 0) = 1, so the second degree approximation of f about c = (0, 0) is

f(x, y) ≈ 1 + (1/2) [ x − 0  y − 0 ] [ f_xx(0, 0)  f_xy(0, 0) ] [ x − 0 ]   =   1 + (1/2)(x^2 + y^2)
                                     [ f_yx(0, 0)  f_yy(0, 0) ] [ y − 0 ]

Now we are approximating the function by a second-order polynomial, so by the equation of a curved, rather than flat, surface.
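The following short sketch (plain Python) compares f(x, y) = √(1 + x^2 + y^2) with the second degree approximation 1 + (x^2 + y^2)/2 found above; the evaluation points are chosen arbitrarily for illustration.

    from math import sqrt

    def f(x, y):
        return sqrt(1 + x ** 2 + y ** 2)

    def second_degree(x, y):
        # The approximation 1 + (x^2 + y^2)/2 from Example 9.8.
        return 1 + (x ** 2 + y ** 2) / 2

    for point in [(0.1, 0.1), (0.2, -0.1), (0.5, 0.5)]:
        print(point, f(*point), second_degree(*point))

Close to the origin the two values agree to several decimal places; further away the quadratic approximation drifts from the true surface.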
Part III
Differential equations and
eigenstructure: by Des Hill
10
Differential Equations
10.1 Introduction
In a vast number of situations a mathematical model of a system
or process will result in an equation (or set of equations) involving
not only functions of the dependent variables but also derivatives
of some or all of those functions with respect to one or more of the
variables. Such equations are called differential equations.
The simplest situation is that of a single function of a single
independent variable, in which case the equation is referred to
as an ordinary differential equation. A situation in which there is
more than one independent variable will involve a function of
those variables and an equation involving partial derivatives of that
function is called a partial differential equation.
Notationally, it is easy to tell the difference. For example, the equation

∂f/∂x + ∂f/∂y = f^2    (10.1)

is a partial differential equation to be solved for f(x, y), whereas

d^2 f/dx^2 + 3 df/dx + 2f = x^4    (10.2)

is an ordinary differential equation to be solved for f(x).
The order of a differential equation is the order of the highest derivative that occurs in it. Partial differential equation (10.1) is first order and ordinary differential equation (10.2) is second order. For partial differential equations the order of a mixed derivative is the total number of derivatives taken. For example, the following partial differential equation for f(x, t) has order five:

∂^5 f/∂x^3 ∂t^2 + ∂^2 f/∂x^2 + ∂f/∂t = 0    (10.3)
An important class of differential equations are those referred to
as linear. Roughly speaking linear differential equations are those
in which neither the function nor its derivatives occur in products,
powers or nonlinear functions. Differential equations that are not
linear are referred to as nonlinear. Equation (10.1) is nonlinear,
whereas equations (10.2) and (10.3) are both linear.
Example 10.1. Classify the following differential equations with respect to (i) their nature (ordinary or partial), (ii) their order and (iii) linear or nonlinear:

(a) ∂f/∂x − ∂f/∂y = 1
(b) ∂^2 g/∂t^2 + g = sin(t)
(c) d^3 y/dx^3 + 8y = x sin x
(d) (∂u/∂x)(∂u/∂t) = x + t
(e) P^2 d^2 P/dx^2 = x^5 + 1
(f) ∂^4 F/∂x ∂y^3 = t^2 F

Solution:
(i) Equations (a), (b), (d) and (f) involve partial derivatives and are hence partial differential equations, whereas equations (c) and (e) involve ordinary derivatives and are hence ordinary differential equations.

(ii) Recall that the order of a differential equation is the order of the highest derivative that occurs in it. The orders of the differential equations are as follows:

(a) first   (b) second   (c) third   (d) first   (e) second   (f) fourth

(iii) Recall that linear differential equations are those in which neither the function nor its derivatives occur in products, powers or nonlinear functions. It doesn't matter how the independent variables appear. We observe that equations (a), (b), (c) and (f) are linear whereas equations (d) and (e) are nonlinear.
10.1.1 Solutions of differential equations

When asked to solve an algebraic equation, for example x^2 − 3x + 2 = 0, we expect the answers to be numbers. The situation as regards differential equations is much more difficult because we are being asked to find functions that will satisfy the given equation. For example, in Example 10.1(a) we are asked for a function f(x, y) that will satisfy the partial differential equation ∂f/∂x − ∂f/∂y = 1, and in Example 10.1(c) we are asked to find a function y(x) that will satisfy d^3 y/dx^3 + 8y = x sin x.

Unlike algebraic equations, which only have a discrete set of solutions (for example x^2 − 3x + 2 = 0 only has the solutions x = 1 or 2), differential equations can have whole families of solutions. For example, y = Ae^{3t} satisfies the ordinary differential equation dy/dt = 3y for any value of A.

If a differential equation is linear then there is a well-established procedure for finding solutions and we shall cover this in detail for ordinary differential equations. If an ordinary differential equation is nonlinear but is of first order then we may be able to find solutions. The theory of partial differential equations is outside the scope of this unit but we shall touch upon it at the end of this section.
10.1.2 Verification of solutions of differential equations

To get a feel for things (and to practice our algebra) we'll have a quick look at the relatively simple procedure of verifying solutions of differential equations, by way of a few examples.

Example 10.2. Verify that

y(x) = Ae^{2x} + Be^{−2x} − 2cos(x) − 5x sin(x)    (10.4)

is a solution of the ordinary differential equation

d^2 y/dx^2 − 4y = 25x sin(x)    (10.5)

for any value of the constants A and B.

Solution: We need to calculate d^2 y/dx^2. In order to do this we need the product rule to differentiate x sin x. It gives

d/dx (x sin(x)) = sin(x) + x cos(x)

and

d^2/dx^2 (x sin x) = 2cos(x) − x sin(x)

Hence

d^2 y/dx^2 = 4Ae^{2x} + 4Be^{−2x} − 8cos(x) + 5x sin(x)

and substitution of this and the expression (10.4) into equation (10.5) quickly yields the required verification.
Example 10.3. Verify that both

(i) f(x, y) = xy − y^2/2   and   (ii) f(x, y) = sin(y − x) + x^2/2

are solutions of the partial differential equation

∂f/∂x + ∂f/∂y = x

Solution: In each case we need to calculate ∂f/∂x and ∂f/∂y. We have

(i) ∂f/∂x = y and ∂f/∂y = x − y. Thus

∂f/∂x + ∂f/∂y = y + x − y = x

(ii) ∂f/∂x = −cos(y − x) + x and ∂f/∂y = cos(y − x). Hence

∂f/∂x + ∂f/∂y = −cos(y − x) + x + cos(y − x) = x

In both cases we have verified the solution of the partial differential equation.
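Verification of this kind is routine to automate. The sketch below (assuming the SymPy library is available; it is not part of these notes) substitutes both candidate solutions from Example 10.3 into the partial differential equation and simplifies the residual.

    import sympy as sp

    x, y = sp.symbols('x y')
    candidates = [x*y - y**2/2, sp.sin(y - x) + x**2/2]

    for f in candidates:
        residual = sp.simplify(sp.diff(f, x) + sp.diff(f, y) - x)
        print(f, residual)    # a residual of 0 means f_x + f_y = x holds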
10.2 Mathematical modelling with ordinary differential equations

For real-world systems changing continuously in time we can use derivatives to model the rates of change of quantities. Our mathematical models are thus differential equations.

Example 10.4. Modelling population growth.
The simplest model of population growth is to assume that the rate of change of population is proportional to the population at that time. Let P(t) represent the population at time t. Then the mathematical model is

dP/dt = rP   for some constant r > 0

It can be shown (using a method called separation of variables, which we shall learn shortly) that the function P(t) that satisfies this differential equation is

P(t) = P_0 e^{rt}   where P_0 is the population at t = 0

This model is clearly inadequate in that it predicts that the population will increase without bound if r > 0. A more realistic model is the logistic growth model

dP/dt = rP(C − P)   where r > 0 and C > 0 are constants

The method of separation of variables can be used to show that the solution of this differential equation is

P(t) = C P_0 / (P_0 + (C − P_0) e^{−rCt})   where P_0 = P(0)

This model predicts that as time goes on, the population will tend towards the constant value C, called the carrying capacity.
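To see the approach to the carrying capacity numerically, the following sketch (plain Python; the values of r, C and P_0 are made up purely for illustration) evaluates the logistic solution at a few times.

    from math import exp

    def logistic(t, P0=10.0, C=100.0, r=0.02):
        # P(t) = C P0 / (P0 + (C - P0) e^(-rCt)); P0, C and r are illustrative only.
        return C * P0 / (P0 + (C - P0) * exp(-r * C * t))

    for t in (0, 1, 2, 5, 10):
        print(t, logistic(t))    # the values increase towards C = 100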
Example 10.5. Newton's law of cooling
The rate at which heat is lost from an object is proportional to the difference between the temperature of the object and the ambient temperature. Let H(t) be the temperature of the object (in °C) at time t and suppose the fixed ambient temperature is A °C. Newton's law of cooling says that

dH/dt = k(A − H)   for some constant k > 0

The method of separation of variables can be used to show that the solution of this differential equation is

H(t) = A + (H_0 − A)e^{−kt}   where H_0 = H(0)

This model predicts that as time goes on, the temperature of the object will approach that of its surroundings, which agrees with our intuition.
Example 10.6. One tank mixing process
Suppose we have a tank of salt water and we allow fresh water into the tank at a rate of F m^3/sec, and allow salt water out at the same rate.
Figure 10.1: A tank.
Note that this is a volume rate, and that the volume V of the tank is maintained constant. We assume instantaneous mixing so that the tank has a uniform concentration.

Let x(t) represent the salt concentration of the water (kg/m^3) in the tank at time t and a(t) represent the amount of salt (kg). We have x(t) = a(t)/V. The tank starts with an amount of salt a_0 kg. The rate at which salt is being removed from the tank at time t is given by

da/dt = −x(t) × flow rate = −Fx(t) = −(F/V)a(t) = −λa(t)

where λ = F/V is a positive constant. This equation has the solution a(t) = a_0 e^{−λt}. This converges to zero as t → ∞.

Consider the same tank which is now filled with fresh water. Water polluted with q kg/m^3 of some chemical enters the tank at a rate of F m^3/sec, and polluted water exits the tank at the same rate. We again assume instantaneous mixing so that the tank has a uniform concentration.

Let x(t) represent the concentration of pollutant (kg/m^3) in the water in the tank at time t and a(t) represent the amount of pollutant (kg). We have x(t) = a(t)/V. The rate at which pollutant is being added to the tank at time t is given by

da/dt = amount of pollutant added per sec − amount of pollutant removed per sec

That is,

da/dt = qF − Fx(t) = qF − (F/V)a(t)

Alternatively, we can obtain a differential equation for the concentration x(t) by dividing through the above equation by V to give

dx/dt = (F/V)(q − x) = λ(q − x)

where λ = F/V is a positive constant. Notice that this is essentially the same as the differential equation that we obtained for Newton's law of cooling.
10.3 First order ordinary differential equations

Most first order ordinary differential equations can be expressed (by algebraic re-arrangement if necessary) in the form

dx/dt = f(t, x)    (10.6)

where the function f(t, x) is known, and we are asked to find the solution x(t).
10.3.1 Direction fields

Expression (10.6) means that for any point in the tx-plane (for which f is defined) we can evaluate the gradient dx/dt and represent this graphically by means of a small arrow. If we do this for a whole grid of points in the tx-plane and place all of the arrows on the same plot we produce what is called a direction field. Figure 10.2 displays the direction field in the case where f(t, x) = x^2 − t^2.

Figure 10.2: The direction field of dx/dt = x^2 − t^2.
A solution of equation (10.6) is a function relating x and t, which geometrically is a curve in the tx-plane. Since this solution satisfies the differential equation, the curve is such that its gradient is the same as the direction field vector at any point on the curve. That is, the direction field is a collection of arrows that are tangential to the solution curves. This observation enables us to roughly sketch solution curves without actually solving the differential equation, as long as we have a device to plot the direction field. We can indeed sketch many such curves (called a family of solution curves) superimposed on the same direction field.
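A direction field such as Figure 10.2 can be produced with a few lines of code. The sketch below (assuming the NumPy and Matplotlib libraries are available) plots the field for dx/dt = x^2 − t^2.

    import numpy as np
    import matplotlib.pyplot as plt

    t, x = np.meshgrid(np.linspace(-3, 3, 25), np.linspace(-3, 3, 25))
    slope = x**2 - t**2                       # dx/dt = f(t, x)

    # Each arrow points in the direction (1, slope), normalised to unit length.
    length = np.sqrt(1 + slope**2)
    plt.quiver(t, x, 1 / length, slope / length, angles='xy')
    plt.xlabel('t')
    plt.ylabel('x')
    plt.title('Direction field of dx/dt = x^2 - t^2')
    plt.show()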
Example 10.7. The direction field of dx/dt = x^2 − t^2 along with three solution curves is given in Figure 10.3. The top curve is the solution that goes through (t, x) = (0, 1). The middle curve is the solution that goes through (t, x) = (0, 0) and the bottom curve is the solution that goes through (t, x) = (0, −2).

Example 10.8. The direction field of dx/dt = 3x + e^t along with three solution curves is given in Figure 10.4. The top curve is the solution that goes through (t, x) = (0, 1). The middle curve is the solution that goes through (t, x) = (0, 0) and the bottom curve is the solution that goes through (t, x) = (0, −1).
Figure 10.3: Three solution curves for dx/dt = x^2 − t^2.

Figure 10.4: Three solution curves for dx/dt = 3x + e^t.
10.3.2 Separation of variables

If it is the case that a first order differential equation can be manipulated into the following form, called separable form:

dx/dt = h(t)/g(x)   for some g(x) ≠ 0    (10.7)

(and the equation is called a separable differential equation) then there is a simple way of determining its solution. Informally, we treat dx/dt as a fraction and re-arrange equation (10.7) into the form

g(x) dx = h(t) dt

We now (symbolically) integrate both sides to get

∫ g(x) dx = ∫ h(t) dt

If we can integrate g(x) and h(t) with respect to their respective arguments then we will have an expression that we can (in principle) re-arrange into a solution of the form x(t).

This informal approach can be made rigorous by appeal to the Chain Rule. We write equation (10.7) as

g(x) dx/dt = h(t)

and integrate both sides with respect to t to get

∫ g(x) (dx/dt) dt = ∫ h(t) dt   which implies   ∫ g(x) dx = ∫ h(t) dt
Example 10.9. Solve the first order differential equation

dx/dt = x^2 sin(t)

Solution: The differential equation is separable. The solution is given by

∫ dx/x^2 = ∫ sin(t) dt   which implies   −1/x = −cos(t) + C    (10.8)

where C is a constant of integration. We can re-arrange equation (10.8) to get

x(t) = 1/(cos(t) − C)    (10.9)

Note that we could have added integration constants, say C_1 and C_2, to each side of equation (10.8), but these can easily be combined into one integration constant: C = C_2 − C_1.

Note also that equation (10.8) does not hold if x = 0. In such situations we have to investigate the original differential equation dx/dt = x^2 sin(t). In this case it turns out that x(t) = 0 is in fact a solution, but not of the form (10.9). Special situations like this are something that we should be aware of.
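A computer algebra system will carry out the same separation of variables. The following sketch (assuming the SymPy library is available) solves Example 10.9 symbolically; the answer it prints is equivalent to the form (10.9) up to the naming of the integration constant.

    import sympy as sp

    t = sp.symbols('t')
    x = sp.Function('x')

    ode = sp.Eq(x(t).diff(t), x(t)**2 * sp.sin(t))
    print(sp.dsolve(ode, x(t)))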
10.3.3 The integrating factor method

If a first order ordinary differential equation is linear then it can be solved by the following process. The most general form for a first order linear ordinary differential equation is

dx/dt + p(t)x = r(t)    (10.10)

The trick is to multiply this equation throughout by an as yet undetermined function g(t) and then appeal to the product rule of differentiation to guide us in the correct choice of g(t). Multiplying equation (10.10) throughout by g(t) gives

g(t) dx/dt + g(t)p(t)x = g(t)r(t)    (10.11)

We would like to be able to write the left-hand side of this equation in the form:

d/dt [g(t)x]

Applying the product rule to this expression and comparing it to the left-hand side of equation (10.11) we have

g(t) dx/dt + (dg/dt)x = g(t) dx/dt + g(t)p(t)x

which implies that

dg/dt = g(t)p(t)    (10.12)

This equation is separable. The solution is

∫ dg/g(t) = ∫ p(t) dt

which implies

ln g(t) = ∫ p(t) dt    (10.13)

We can set the constant of integration to zero; any function that satisfies equation (10.12) will do for our purposes. Exponentiating both sides of equation (10.13) gives

g(t) = e^{∫ p(t) dt}

and g(t) is referred to as an integrating factor. With g(t) having this special form, equation (10.11) is equivalent to

d/dt [g(t)x(t)] = g(t)r(t)

Integrating this equation with respect to t gives

g(t)x(t) = ∫ g(t)r(t) dt + C

and hence the solution of the ordinary differential equation is

x(t) = (1/g(t)) [ ∫ g(t)r(t) dt + C ]   where g(t) = e^{∫ p(t) dt}    (10.14)

Note that we must include the integration constant C because we want the most general solution possible.
Example 10.10. Solve the first order linear differential equation

dx/dt − 3x = e^t    (10.15)

As a guide to the shape of the solutions, the direction field for this differential equation along with a number of solution curves appears in Figure 10.4.

Solution: The integrating factor is

g(t) = e^{∫ p(t) dt} = e^{∫ −3 dt} = e^{−3t}

Multiplying the differential equation through by g(t) gives

e^{−3t} dx/dt − 3e^{−3t}x = e^{−2t}

We know from our recent work that the left-hand side of this can be rewritten in product form, and so we obtain:

d/dt [ e^{−3t} x ] = e^{−2t}

Now integrate both sides with respect to t:

e^{−3t} x = ∫ e^{−2t} dt = −(1/2)e^{−2t} + C

and tidy up:

x(t) = −e^t/2 + Ce^{3t}    (10.16)

Note that we could have written down the solution immediately by appealing to equation (10.14), but when learning the method it is instructive to follow through each step in the process in order to gain a better understanding of how it works.
Example 10.11. Solve the first order linear differential equation

dx/dt + (2/t)x = sin(t)/t^2

Solution: The integrating factor is

g(t) = e^{∫ p(t) dt} = e^{∫ (2/t) dt} = e^{2 ln(t)} = e^{ln(t^2)} = t^2

Multiplying the differential equation through by this gives

t^2 dx/dt + 2tx = sin(t)

We know from our recent work that the left-hand side of this can be rewritten in product form:

d/dt [ t^2 x ] = sin(t)

Now integrate with respect to t:

t^2 x = ∫ sin(t) dt = −cos(t) + C

and tidy up:

x(t) = (−cos(t) + C)/t^2
10.4 Initial conditions

The values of constants of integration that arise when we solve differential equations can be determined by making use of other conditions (or restrictions) placed on the problem.

Example 10.12. Solve dx/dt − 3x = e^t subject to x(0) = 1, that is, x = 1 when t = 0.

Solution: We have already seen this differential equation. It is equation (10.15) and we have determined that its (most general) solution is given by (10.16):

x = −e^t/2 + Ce^{3t}

All we have to do is substitute x = 1 and t = 0 and solve the resulting algebraic equation for C. We have

1 = −e^0/2 + Ce^0   ⟹   1 = −1/2 + C   ⟹   C = 3/2

so the required solution is

x(t) = (3e^{3t} − e^t)/2    (10.17)

In applications, the variable t is usually used to represent time. The extra condition (for example x(0) = 1) is called an initial value and the combined differential equation plus initial condition is called an initial value problem. The solution curve of (10.17) appears in Figure 10.4.
Occasionally there are situations when the initial condition is actually specified at a nonzero time value.

Example 10.13. Solve the initial value problem

dx/dt = t^2/x   subject to x(1) = 4

Solution: We observe that the differential equation is separable. The solution is:

∫ x dx = ∫ t^2 dt   ⟹   x^2/2 = t^3/3 + C

which implies

x = ±√(2t^3/3 + 2C)

Notice that we have two different solutions to the differential equation, one positive and one negative. The initial condition x(1) = 4 allows us to eliminate the negative solution, so we are left with

x = √(2t^3/3 + 2C)

and substituting into this x = 4 and t = 1 gives

4 = √(2/3 + 2C)   ⟹   C = 23/3   ⟹   x = √((2t^3 + 46)/3)
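The exact solution can be checked against a purely numerical one. The sketch below (assuming the SciPy and NumPy libraries are available) integrates the initial value problem of Example 10.13 and compares the result with √((2t^3 + 46)/3) at a few times.

    import numpy as np
    from scipy.integrate import solve_ivp

    def rhs(t, x):
        return t**2 / x                       # dx/dt = t^2 / x

    solution = solve_ivp(rhs, (1.0, 3.0), [4.0], dense_output=True)

    for t in (1.0, 2.0, 3.0):
        exact = np.sqrt((2 * t**3 + 46) / 3)
        print(t, solution.sol(t)[0], exact)

The numerical and exact values agree to several decimal places, which is a useful sanity check on the algebra.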
10.5 Linear constant-coefficient ordinary differential equations

The general theory of higher order (that is, greater than one) differential equations is beyond the scope of this unit, and in any case many applications lead to differential equations that have an especially simple structure, and we shall concentrate on these.

Recall that in a linear differential equation neither the function nor its derivatives occur in products, powers or nonlinear functions. We shall focus on second order linear differential equations. Extension to higher orders is straightforward, and first order ones can be handled by the methods already discussed.

The most general second order linear ordinary differential equation is

a(t) d^2x/dt^2 + b(t) dx/dt + c(t)x = f(t)

but we shall simplify this even further by assuming that the coefficient functions on the left-hand side are constants (that is, not functions of t). In this case we have the general second order linear constant-coefficient ordinary differential equation

a d^2x/dt^2 + b dx/dt + cx = f(t),   a ≠ 0    (10.18)

If f(t) = 0 then the equation is referred to as being homogeneous, otherwise it is called nonhomogeneous. We shall first of all concentrate on the homogeneous case, that is, we wish to solve

a d^2x/dt^2 + b dx/dt + cx = 0,   a ≠ 0    (10.19)

where a, b and c are given constants.
In seeking a solution technique we consider the first order equation

b dx/dt + cx = 0,   b ≠ 0

We can solve this by using either the separation approach or the integrating factor method. Using separation we have

dx/dt = −cx/b   ⟹   ∫ dx/x = ∫ −(c/b) dt   ⟹   ln(x) = −(c/b)t + C   ⟹   x = Ae^{mt}

where m = −c/b, C is the constant of integration and A = e^C.

By analogy we attempt to find a solution to the second order equation (10.19) in the form x = e^{mt}. Substituting x = e^{mt} into equation (10.19) yields

am^2 e^{mt} + bme^{mt} + ce^{mt} = 0   ⟹   (am^2 + bm + c)e^{mt} = 0

and since e^{mt} can never be zero we can divide through by it to arrive at a quadratic equation for m:

am^2 + bm + c = 0
This equation is usually referred to as the characteristic equation (or auxiliary equation) of the differential equation. Label the roots of this equation m_1 and m_2. Then both e^{m_1 t} and e^{m_2 t} are solutions of the differential equation.

Homogeneous linear differential equations have the linearity property, that is, if both x_1(t) and x_2(t) are solutions then so is any linear combination of these (that is, Ax_1(t) + Bx_2(t) where A and B are constants).
We can easily show this for equation (10.19). Let x(t) = Ax_1(t) + Bx_2(t). Then

a d^2x/dt^2 + b dx/dt + cx
  = a [ A d^2x_1/dt^2 + B d^2x_2/dt^2 ] + b [ A dx_1/dt + B dx_2/dt ] + c [ Ax_1 + Bx_2 ]
  = A [ a d^2x_1/dt^2 + b dx_1/dt + cx_1 ] + B [ a d^2x_2/dt^2 + b dx_2/dt + cx_2 ]
  = 0·A + 0·B = 0
Making use of this linearity result allows us to write down the general solution of the second order homogeneous linear constant-coefficient ordinary differential equation:

x(t) = Ae^{m_1 t} + Be^{m_2 t}    (10.20)

Notice that in this case we have two undetermined constants in the solution. The general situation is that an nth order linear differential equation (even one with variable coefficients) will have n undetermined constants in its general solution. If the differential equation is nonlinear the situation is much less clear.

Recall that the roots of a quadratic equation can be either (i) both real; (ii) complex conjugates; or (iii) a repeated root. The actual solution form is different in each of these three cases.
Example 10.14. Solve the differential equation
d
2
x
dt
2
5
dx
dt
+4x = 0
Solution: The characteristic equation is m
2
5m + 4 = 0 which
factorises into (m 1)(m 4) = 0 and hence the required solutions are
m
1
= 1 and m
2
= 4 and the solution is
x(t) = Ae
t
+ Be
4t
Case 2. Complex conjugate roots. The differential equation is in terms of real variables so we would like to have a solution in terms of functions of real numbers. To achieve this we recall Euler's formula

e^{iβt} = cos(βt) + i sin(βt)

If the roots are m_1 = α + iβ and m_2 = α − iβ then the general solution (including both real and complex functions) is

x = Ae^{(α+iβ)t} + Be^{(α−iβ)t} = e^{αt} [ Ae^{iβt} + Be^{−iβt} ]

where A and B are complex numbers. Applying Euler's formula gives

x(t) = e^{αt} [ A(cos(βt) + i sin(βt)) + B(cos(βt) − i sin(βt)) ]

which we can rewrite as

x(t) = C_1 e^{αt} cos(βt) + C_2 e^{αt} sin(βt)

where C_1 = A + B and C_2 = i(A − B). Since we are only interested in real-valued functions we are only interested in the cases where C_1 and C_2 are real.
Example 10.15. Solve the differential equation

d^2x/dt^2 − 4 dx/dt + 13x = 0

Solution: The characteristic equation is m^2 − 4m + 13 = 0. The quadratic formula gives the roots as m_{1,2} = 2 ± 3i and hence the solution is

x(t) = C_1 e^{2t} cos(3t) + C_2 e^{2t} sin(3t)

where C_1 and C_2 are any real numbers.
Case 3. Equal roots. In this case we have only one solution of the form x_1 = e^{mt}. The other solution will have the form x_2 = te^{mt}. This can be deduced using a process called reduction of order, but this is outside the scope of this material. The required solution is

x(t) = Ae^{mt} + Bte^{mt}

Example 10.16. Solve the differential equation

d^2x/dt^2 + 6 dx/dt + 9x = 0

Solution: The characteristic equation is m^2 + 6m + 9 = 0, which factorises into (m + 3)^2 = 0, and hence the required component solutions are e^{−3t} and te^{−3t} and the solution of the differential equation is

x(t) = Ae^{−3t} + Bte^{−3t}
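The three cases can be told apart mechanically from the discriminant of the characteristic equation. The following sketch (plain Python, standard library only) reports the appropriate solution form for Examples 10.14, 10.15 and 10.16.

    import cmath

    def solution_form(a, b, c):
        # Roots of the characteristic equation a m^2 + b m + c = 0.
        disc = b * b - 4 * a * c
        m1 = (-b + cmath.sqrt(disc)) / (2 * a)
        m2 = (-b - cmath.sqrt(disc)) / (2 * a)
        if disc > 0:
            return f"x(t) = A e^({m1.real:g}t) + B e^({m2.real:g}t)"
        if disc == 0:
            return f"x(t) = A e^({m1.real:g}t) + B t e^({m1.real:g}t)"
        alpha, beta = m1.real, abs(m1.imag)
        return f"x(t) = e^({alpha:g}t) (C1 cos({beta:g}t) + C2 sin({beta:g}t))"

    print(solution_form(1, -5, 4))     # Example 10.14: roots 1 and 4
    print(solution_form(1, -4, 13))    # Example 10.15: roots 2 +/- 3i
    print(solution_form(1, 6, 9))      # Example 10.16: repeated root -3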
10.6 Linear nonhomogeneous constant-coefficient differential equations

We now consider equations of the form (10.18), reproduced here:

a d^2x/dt^2 + b dx/dt + cx = f(t),   a ≠ 0    (10.21)

where a, b and c are given constants. In the previous section we learnt how to solve such a differential equation if f(t) = 0 (that is, the equation is homogeneous). If f(t) ≠ 0 the equation is called nonhomogeneous.
In order to solve equation (10.21) we first need to solve the equivalent homogeneous equation (that is, when f(t) = 0). The solution so obtained is referred to as the homogeneous solution or the complementary function. It is actually a whole family of solutions because it has two free parameters (C_1 and C_2, or A and B).

The solution of equation (10.21) is the sum of this homogeneous solution and a particular solution (that is, any solution that we can find) to the nonhomogeneous differential equation (10.21), and is referred to as the general solution of (10.21). It is also actually a whole family of solutions for the same reasons.
We can show that such a combination will be a solution to equation (10.21). Let x_c(t) be a solution of the homogeneous equation and x_p(t) be a solution of equation (10.21). We need to show that x(t) = x_c(t) + x_p(t) is also a solution of (10.21):

a d^2x/dt^2 + b dx/dt + cx
  = a [ d^2x_c/dt^2 + d^2x_p/dt^2 ] + b [ dx_c/dt + dx_p/dt ] + c [ x_c + x_p ]
  = [ a d^2x_c/dt^2 + b dx_c/dt + cx_c ] + [ a d^2x_p/dt^2 + b dx_p/dt + cx_p ]
  = 0 + f(t) = f(t)
and our claim is proved. We can also show that there are in fact no other possible solutions of equation (10.21) by showing that the difference between any two solutions (say x_1(t) and x_2(t)) is a solution of the homogeneous equation:

a d^2/dt^2 (x_1 − x_2) + b d/dt (x_1 − x_2) + c(x_1 − x_2)
  = [ a d^2x_1/dt^2 + b dx_1/dt + cx_1 ] − [ a d^2x_2/dt^2 + b dx_2/dt + cx_2 ]
  = f(t) − f(t) = 0

So the set of solutions of a linear differential equation behaves in the same way as what we saw for the set of solutions of a system of linear equations in Section 3.4.
In any given problem our tasks will be to solve the homogeneous equation (using the methods of the previous section) and to find a particular solution. There is a general method (called variation of parameters) for achieving the latter, but it often leads to difficult, if not impossible, integrals. We shall restrict ourselves to a class of right-hand side functions f(t) for which there is an easily applicable method.

This class is the collection of polynomial functions

α_n t^n + α_{n−1} t^{n−1} + · · · + α_1 t + α_0,

exponential functions e^{γt} and the trigonometric functions cos(ωt) and sin(ωt), or any linear combination of these, for example

f(t) = t^2 + 7t + 4 + e^{5t} + 6cos(3t)

The method of finding a particular solution for such a class of right-hand side functions is called the method of undetermined coefficients.
10.6.1 Method of undetermined coefficients

Table 10.1 is a summary of the appropriate form of solution to seek. The values of α_n, . . . , α_0, γ and ω will be given and we are required to determine the correct values of P_n, . . . , P_0, P and/or Q that will give a form that will satisfy the differential equation.

f(t)                                               Solution form
α_n t^n + α_{n−1} t^{n−1} + · · · + α_1 t + α_0    P_n t^n + P_{n−1} t^{n−1} + · · · + P_1 t + P_0
e^{γt}                                             Pe^{γt}
cos(ωt)                                            Pcos(ωt) + Qsin(ωt)
sin(ωt)                                            Pcos(ωt) + Qsin(ωt)

Table 10.1: Solution forms for the method of undetermined coefficients.
Example 10.17. Solve the differential equation

d^2x/dt^2 + 6 dx/dt + 9x = 4t^2 + 5.

Solution: In Example 10.16 we found the complementary solution (that is, the solution of the homogeneous problem). It is

x_c(t) = Ae^{−3t} + Bte^{−3t}

We now need to determine a particular solution. Based on Table 10.1 we try

x_p(t) = P_2 t^2 + P_1 t + P_0    (10.22)

Note that we must include the term P_1 t. Substitution of expression (10.22) into the differential equation gives

2P_2 + 6(2P_2 t + P_1) + 9(P_2 t^2 + P_1 t + P_0) = 4t^2 + 5
⟹   9P_2 t^2 + (9P_1 + 12P_2)t + (9P_0 + 6P_1 + 2P_2) = 4t^2 + 5

and equating powers of t on each side of this equation leads to a set of algebraic equations to solve for the unknowns P_0, P_1 and P_2:

9P_2 = 4,   9P_1 + 12P_2 = 0,   9P_0 + 6P_1 + 2P_2 = 5

The solution of this set of equations is

P_0 = 23/27,   P_1 = −16/27,   P_2 = 4/9

Hence the particular solution is

x_p(t) = (4/9)t^2 − (16/27)t + 23/27

and finally, the general solution of the nonhomogeneous differential equation is

x(t) = Ae^{−3t} + Bte^{−3t} + (4/9)t^2 − (16/27)t + 23/27    (10.23)
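The general solution (10.23) can be verified by direct substitution. The sketch below (assuming the SymPy library is available) forms the left-hand side minus the right-hand side and simplifies it; the residual is zero for every choice of A and B.

    import sympy as sp

    t, A, B = sp.symbols('t A B')
    x = (A * sp.exp(-3*t) + B * t * sp.exp(-3*t)
         + sp.Rational(4, 9) * t**2 - sp.Rational(16, 27) * t + sp.Rational(23, 27))

    residual = sp.simplify(x.diff(t, 2) + 6 * x.diff(t) + 9 * x - (4 * t**2 + 5))
    print(residual)    # prints 0, for every value of A and B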
Example 10.18. Solve the differential equation

d^2x/dt^2 − 5 dx/dt + 4x = 7cos(3t)
Solution: In Example 10.14 we found the complementary solution (that is, the solution of the homogeneous problem). It is

x_c(t) = Ae^t + Be^{4t}

We now need to determine a particular solution. Based on Table 10.1 we try

x_p(t) = Pcos(3t) + Qsin(3t)    (10.24)

Note that we must include both cos and sin terms. Substitution of expression (10.24) into the differential equation gives

−9Pcos(3t) − 9Qsin(3t) − 5(−3Psin(3t) + 3Qcos(3t)) + 4(Pcos(3t) + Qsin(3t)) = 7cos(3t)

and equating coefficients of cos(3t) and sin(3t) on both sides of this equation leads to a set of algebraic equations to solve for the unknowns P and Q:

−9P − 15Q + 4P = 7,   −9Q + 15P + 4Q = 0

The solution of this set of equations is

P = −7/50   and   Q = −21/50

and so

x_p(t) = −(7/50)cos(3t) − (21/50)sin(3t)

and finally, the general solution of the nonhomogeneous differential equation is

x(t) = Ae^t + Be^{4t} − (7/50)cos(3t) − (21/50)sin(3t)    (10.25)
Example 10.19. Solve the differential equation

d^2x/dt^2 − 4 dx/dt + 13x = 8e^{−3t}

Solution: In Example 10.15 we found the complementary solution (that is, the solution of the homogeneous problem). It is

x_c(t) = C_1 e^{2t} cos(3t) + C_2 e^{2t} sin(3t)

We now need to determine a particular solution. Based on Table 10.1 we try

x_p(t) = Pe^{−3t}    (10.26)

Substitution of expression (10.26) into the differential equation gives

9Pe^{−3t} + 12Pe^{−3t} + 13Pe^{−3t} = 8e^{−3t}

and cancelling throughout by e^{−3t} (which is valid because it can never be zero) gives

34P = 8   ⟹   P = 4/17   ⟹   x_p(t) = (4/17)e^{−3t}

and hence the general solution of the nonhomogeneous differential equation is

x(t) = C_1 e^{2t} cos(3t) + C_2 e^{2t} sin(3t) + (4/17)e^{−3t}    (10.27)
10.7 Initial and boundary conditions

As mentioned earlier, the values of constants that arise when we solve differential equations can be determined by making use of other conditions (or restrictions) placed on the problem. If there are n unknown constants then we will need n extra conditions.

If all of the extra conditions are given at one value of the independent variable then the extra conditions are called initial conditions, and the combined differential equation plus initial conditions is called an initial value problem. This is so even if the independent variable doesn't represent time.

Formally, if the extra conditions are given at different values of the independent variable then they are called boundary conditions, and the combined differential equation plus boundary conditions is called a boundary value problem.

In applications, the term initial condition is used if the independent variable represents time and the term boundary condition is used if the independent variable represents space. Usually, the two different uses of the terminology coincide.
Example 10.20. Solve the initial value problem

d^2x/dt^2 − 5 dx/dt + 4x = 7cos(3t)   subject to x(0) = 1 and dx/dt(0) = 2

Solution: We have already seen this differential equation in Example 10.18 and determined that its general solution is given by (10.25):

x(t) = Ae^t + Be^{4t} − (7/50)cos(3t) − (21/50)sin(3t)

The initial conditions will give two equations to solve for the unknowns, A and B. Firstly,

x(0) = 1   ⟹   1 = A + B − 7/50    (10.28)

The second initial condition involves the derivative so first we need to find it:

dx/dt = Ae^t + 4Be^{4t} + (21/50)sin(3t) − (63/50)cos(3t)

and the second initial condition then gives

dx/dt(0) = 2   ⟹   2 = A + 4B − 63/50    (10.29)

Solving the pair of algebraic equations (10.28) and (10.29) gives

A = 13/30   and   B = 53/75

so the required solution is

x(t) = (13/30)e^t + (53/75)e^{4t} − (7/50)cos(3t) − (21/50)sin(3t)
Example 10.21. Solve the boundary value problem

d^2x/dt^2 + 6 dx/dt + 9x = 4t^2 + 5   subject to x(0) = 7 and x(1) = −3

Solution: We have already seen this differential equation and determined that its general solution is given by (10.23):

x(t) = Ae^{−3t} + Bte^{−3t} + (4/9)t^2 − (16/27)t + 23/27

The boundary conditions give two equations to solve for the unknowns, A and B:

x(0) = 7   ⟹   7 = A + 23/27

x(1) = −3   ⟹   −3 = Ae^{−3} + Be^{−3} + 4/9 − 16/27 + 23/27

Solving this pair of algebraic equations gives

A = 166/27   and   B = −(166 + 100e^3)/27

so the required solution is

x(t) = (166/27)e^{−3t} − ((166 + 100e^3)/27) te^{−3t} + (4/9)t^2 − (16/27)t + 23/27
Example 10.22. Solve the boundary value problem

d^2x/dt^2 − 4 dx/dt + 13x = 8e^{−3t}   subject to x(π) = 2 and dx/dt(π) = 0

Solution: We have already seen this differential equation and determined that its general solution is given by (10.27):

x(t) = C_1 e^{2t} cos(3t) + C_2 e^{2t} sin(3t) + (4/17)e^{−3t}

The boundary conditions give two equations to solve for the unknowns, C_1 and C_2:

x(π) = 2   ⟹   2 = −C_1 e^{2π} + (4/17)e^{−3π}

The derivative of the solution is

dx/dt = C_1 [ 2e^{2t} cos(3t) − 3e^{2t} sin(3t) ] + C_2 [ 2e^{2t} sin(3t) + 3e^{2t} cos(3t) ] − (12/17)e^{−3t}

and hence

dx/dt(π) = 0   ⟹   0 = −2C_1 e^{2π} − 3C_2 e^{2π} − (12/17)e^{−3π}

Solving this pair of algebraic equations gives

C_1 = −2e^{−2π} + (4/17)e^{−5π}   and   C_2 = (4/3)e^{−2π} − (20/51)e^{−5π}

so the required solution is

x(t) = [ −2e^{−2π} + (4/17)e^{−5π} ] e^{2t} cos(3t) + [ (4/3)e^{−2π} − (20/51)e^{−5π} ] e^{2t} sin(3t) + (4/17)e^{−3t}
10.8 Summary of method

We now summarize the method for solving a linear constant-coefficient second order ordinary differential equation with initial or boundary conditions.

Key Concept 10.23. Solve

a d^2x/dt^2 + b dx/dt + cx = f(t)   with a ≠ 0

given some initial or boundary conditions.

Step 1. Solve the corresponding characteristic (or auxiliary) equation

am^2 + bm + c = 0

Step 2. Use your solution to the characteristic equation to find the general solution x_c(t) of the corresponding homogeneous equation

a d^2x/dt^2 + b dx/dt + cx = 0

by using one of the three general forms given in Section 10.5, depending on whether the characteristic equation has two distinct real roots, complex conjugate roots or equal roots. Your solution will have two unknown constants.

Step 3. Use the Method of Undetermined Coefficients outlined in Section 10.6.1 to find a particular solution x_p(t).

Step 4. The general solution is now x(t) = x_c(t) + x_p(t).

Step 5. Use the initial or boundary conditions to determine the two unknown constants in x(t) coming from x_c(t).
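For comparison, a computer algebra system can carry out Steps 1 to 5 in one call. The sketch below (assuming the SymPy library is available) solves the initial value problem of Example 10.20; the printed solution agrees with the one found by hand.

    import sympy as sp

    t = sp.symbols('t')
    x = sp.Function('x')

    ode = sp.Eq(x(t).diff(t, 2) - 5 * x(t).diff(t) + 4 * x(t), 7 * sp.cos(3 * t))
    solution = sp.dsolve(ode, x(t), ics={x(0): 1, x(t).diff(t).subs(t, 0): 2})
    print(solution)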
10.9 Partial differential equations

A partial differential equation (PDE) is one that involves an unknown function of more than one variable and any number of its partial derivatives. For example,

∂u/∂x + ∂u/∂y = 0

is a partial differential equation for the unknown function u(x, y).

There may be any number of independent variables. For example, an analysis of the vibrations of a solid structure (due to an earthquake perhaps) would involve the three spatial variables x, y and z, and time t. If there were only one independent variable then the equation would be an ordinary differential equation.

Partial differential equations are important in a large number of very important areas, for example, hydrodynamics and aerodynamics (the Navier-Stokes equations), groundwater flow, acoustics, the spread of pollution, combustion, electrodynamics (Maxwell's equations), quantum mechanics (the Schrödinger equation), meteorology and climate science.
Example 10.24. Maxwell's Equations
These are 4 coupled vector PDEs for the vector unknowns for the electric and magnetic fields E = (E_x, E_y, E_z) and B = (B_x, B_y, B_z). They are, in vector derivative notation,

∇ · E = 4πρ_e        ∇ · B = 0

∇ × E = −(1/c) ∂B/∂t        ∇ × B = (1/c) ∂E/∂t + (4π/c) j_e

where ∇ is the so-called vector differential operator

∇ = (∂/∂x, ∂/∂y, ∂/∂z)

Maxwell's Equations constitute an (overspecified) set of 8 scalar equations. The charge density ρ_e and current density j_e are specified.
Example 10.25. Three important partial differential equations are the following.

The Wave equation

∂^2 u/∂t^2 = c^2 ∂^2 u/∂x^2   for u(x, t)

where c is the wave speed.

The heat conduction (or diffusion) equation

∂T/∂t = κ ∂^2 T/∂x^2   for T(x, t)

where κ is the thermal diffusivity.

The Laplace equation

∂^2 φ/∂x^2 + ∂^2 φ/∂y^2 = 0   for φ(x, y)
10.9.1 Finding solutions of partial differential equations

There are a number of ways to find solutions of PDEs but they can involve a large amount of work and can be very subtle. A general discussion of how to find solutions to PDEs is beyond the scope of this unit. We shall instead focus on verification of a given solution and analysis of its behaviour and what this tells us about the situation being modelled.

Recall that when solving ODEs there would arise arbitrary constants, for example,

d^2 y/dx^2 + y = 0   ⟹   y(x) = Acos x + Bsin x

and these would be determined from given initial data, e.g.

y(0) = 1 and y'(0) = 0   ⟹   A = 1 and B = 0

In the case of PDEs, arbitrary functions can arise. For example, the solution of

∂u/∂x + ∂u/∂y = 0   is   u(x, y) = f(x − y)

where f is any differentiable function of one variable. We verify this using the Chain Rule as follows:

∂u/∂x + ∂u/∂y = f'(x − y)(1) + f'(x − y)(−1) = 0

as required.
10.9.2 Verification of given solutions of partial differential equations

This is simply a matter of substituting the given solution into the partial differential equation and evaluating the derivatives involved.

Example 10.26. Verify that u(x, t) = √(x/(1 + t)) is a solution of the PDE

∂u/∂t + u^2 ∂u/∂x = 0

Solution. We will need the Chain Rule and the quotient rule:

∂u/∂x = (1/2) · 1/√(x(1 + t))        ∂u/∂t = −(1/2) √(x/(1 + t)^3)

and hence

∂u/∂t + u^2 ∂u/∂x = −(1/2) √(x/(1 + t)^3) + (x/(1 + t)) · (1/2) · 1/√(x(1 + t)) = 0

as required.
Example 10.27. Verify that u(x, y) = e^x f(y − x) is a solution of the PDE

∂u/∂x + ∂u/∂y = u

for any function f of one variable.

Solution. We will need the chain rule and the product rule:

∂u/∂x = e^x f(y − x) − e^x f'(y − x)        ∂u/∂y = e^x f'(y − x)

and hence

∂u/∂x + ∂u/∂y = e^x f(y − x) − e^x f'(y − x) + e^x f'(y − x) = e^x f(y − x) = u

as required.
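Because f is arbitrary, it is convenient to check this with a symbolic, unspecified function. The sketch below (assuming the SymPy library is available) does exactly that.

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.Function('f')               # an arbitrary differentiable function

    u = sp.exp(x) * f(y - x)
    residual = sp.simplify(sp.diff(u, x) + sp.diff(u, y) - u)
    print(residual)                    # prints 0, so the PDE is satisfied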
Remark 10.28. Note that the PDE is symmetric in x and y but the solution doesn't appear to be. What happened? We can write a symmetric partner to this solution:

u(x, y) = e^x f(y − x) = e^y e^{x−y} f(y − x)

which implies that

u(x, y) = e^y g(x − y)   where g(z) = e^z f(−z)
A given partial differential equation can have different solution forms. For example, all of the functions

φ(x, y) = x^2 − y^2        φ(x, y) = e^x cos(y)        φ(x, y) = ln(x^2 + y^2)

are solutions of the Laplace equation

∂^2 φ/∂x^2 + ∂^2 φ/∂y^2 = 0

Verification of these solutions is a simple task.

If φ(x, y) = x^2 − y^2 then

∂^2 φ/∂x^2 + ∂^2 φ/∂y^2 = 2 + (−2) = 0

If φ(x, y) = e^x cos y then

∂^2 φ/∂x^2 + ∂^2 φ/∂y^2 = e^x cos y + e^x (−cos y) = 0

If φ(x, y) = ln(x^2 + y^2) then

∂φ/∂x = 2x/(x^2 + y^2)   and   ∂φ/∂y = 2y/(x^2 + y^2)

and hence

∂^2 φ/∂x^2 + ∂^2 φ/∂y^2 = 2(y^2 − x^2)/(x^2 + y^2)^2 + 2(x^2 − y^2)/(x^2 + y^2)^2 = 0
10.9.3 Graphical description of solutions of partial differential equations

There are three ways of representing solutions of a PDE graphically. Consider the function u(x, y) = e^x cos(y). If we have access to a sophisticated graphing package we can plot the solution surface in three-dimensional space as in Figure 10.5.

Figure 10.5: Surface plot of u(x, y) = e^x cos(y).
Alternatively, we could produce a plot of the level curves of the solution in two dimensions as in Figure 10.6.

Figure 10.6: Level-curve plot of u(x, y) = e^x cos(y).

We could also graph u against x for chosen values of y, as in Figure 10.7.

Figure 10.7: Section plots of u(x, y) = e^x cos(y) for y = 0, π/8, π/4.
10.9.4 Initial and Boundary Conditions

Recall that to fully determine the solution of an nth order ODE we needed to specify n initial or boundary conditions. To fully determine the solution of a PDE we need sufficiently many initial and/or boundary conditions. These will appear in the form of functions. An example of an initial condition is

u(x, 0) = sin(x)   for all x

and an example of a boundary condition is

u(0, t) = e^{−t}   for all t > 0

How many initial and/or boundary conditions do we need? If a PDE is nth order in time and mth order in space then we will require n initial conditions and m boundary conditions. An example of a fully specified problem is:

Solve

∂^2 u/∂t^2 = ∂^3 u/∂x^3

subject to the initial conditions

u(x, 0) = cos(x)   and   ∂u/∂t(x, 0) = 0   for 0 < x < 1

and the boundary conditions

u(0, t) = sin(t),   ∂u/∂x(0, t) = 1,   ∂^2 u/∂x^2(0, t) = t   for t > 0

An initial-boundary value problem (IBVP) is a PDE for an unknown function of time and some spatial coordinates plus enough initial and boundary information to determine any unknown functions or constants that arise. An example of an IBVP for the Wave equation in the region 0 < x < π/2, 0 < t < ∞ is:

Solve

∂^2 u/∂t^2 = c^2 ∂^2 u/∂x^2

subject to the initial conditions

u(x, 0) = cos(x)   and   ∂u/∂t(x, 0) = 0

and the boundary conditions

u(0, t) = 1   and   u(π/2, t) = 0
Example 10.29. Verify that the function

u(x, t) = xe^{−x} sin(t)

satisfies the initial-boundary value problem

∂^2 u/∂t^2 = ∂^2 u/∂x^2 + 2 ∂u/∂x

u(x, 0) = 0,   ∂u/∂t(x, π/2) = 0

u(0, t) = 0,   lim_{x→∞} u(x, t) = 0

Solution step 1. Verify that the function satisfies the partial differential equation:

∂^2 u/∂t^2 = −xe^{−x} sin(t)

and

∂^2 u/∂x^2 + 2 ∂u/∂x = −2e^{−x} sin(t) + xe^{−x} sin(t) + 2e^{−x} sin(t) − 2xe^{−x} sin(t) = −xe^{−x} sin(t)

and hence the partial differential equation is satisfied.

Solution step 2. Verify the auxiliary (that is, initial and boundary) conditions:

sin(0) = 0   ⟹   u(x, 0) = 0

∂u/∂t = xe^{−x} cos(t) and cos(π/2) = 0   ⟹   ∂u/∂t(x, π/2) = 0

u(0, t) = 0

lim_{x→∞} u(x, t) = ( lim_{x→∞} xe^{−x} ) sin(t) = 0 · sin(t) = 0

and hence all of the auxiliary conditions are satisfied. A plot of the solution surface in three-dimensional space is given in Figure 10.8.
Figure 10.8: Solution surface for Example 10.29.
11
Eigenvalues and eigenvectors
11.1 Introduction
Matrix multiplication usually results in a change of direction, for example,

[ 2  0 ] [ 1 ]   =   [  2 ]
[ 1  3 ] [ 4 ]       [ 13 ]

which is not parallel to

[ 1 ]
[ 4 ]

The eigenvectors of a given (square) matrix A are those special non-zero vectors v that map to multiples of themselves under multiplication by the matrix A. That is,

Av = λv

The eigenvalues of A are the corresponding scale factors λ. Geometrically, the eigenvectors of a matrix A are stretched (or shrunk) on multiplication by A, whereas any other vector is rotated as well as being stretched or shrunk:
[Figure: in the x_1 x_2-plane an eigenvector v is mapped to Av = λv, a multiple of v, whereas a general vector x is mapped to Ax, which is not a multiple of x.]

Note that by definition 0 is not an eigenvector, but we do allow 0 to be an eigenvalue.
Example 11.1. Observe that

[ 2  6 ] [ 1 ]   =   [ 8 ]   =   8 [ 1 ]
[ 3  5 ] [ 1 ]       [ 8 ]         [ 1 ]

[ 2  6 ] [ −2 ]   =   [  2 ]   =   −1 [ −2 ]
[ 3  5 ] [  1 ]       [ −1 ]          [  1 ]

Hence 8 and −1 are eigenvalues of the matrix [ 2 6 ; 3 5 ], with λ = 8 having corresponding eigenvector (1, 1) and λ = −1 having corresponding eigenvector (−2, 1).

Some observations: Any multiple of an eigenvector is also an eigenvector:

A(cv) = c(Av) = c(λv) = λ(cv)

so cv is an eigenvector of A if v is, and with the same corresponding eigenvalue. The set of all eigenvectors of a given eigenvalue together with the zero vector is called the eigenspace corresponding to the eigenvalue and we write

E_λ = { v | Av = λv }

The eigenspaces for Example 11.1 are shown in Figure 11.1.
Figure 11.1: Eigenspaces for Example 11.1.
Eigenspaces for nonzero eigenvalues are subspaces of the column space of the matrix, while the 0-eigenspace is the nullspace of the matrix. Geometrically, they are lines or planes through the origin in R^3 if A is a 3 × 3 matrix (other than the identity matrix).
11.2 Finding eigenvalues and eigenvectors

Let A be a given n × n matrix. Recall that the algebraic definition of an eigenvalue-eigenvector pair is

Av = λv

where λ is a scalar and v is a nonzero column vector of length n. We begin by rearranging and regrouping, and noting that v = Iv for any vector v, where I is the n × n identity matrix, as follows:

Av − λv = 0
Av − λIv = 0
(A − λI)v = 0

This is a homogeneous set of linear equations for the components of v. If the matrix A − λI were invertible then the solution would simply be v = 0, but this is not allowed by definition and in any case would be of no practical use in applications. We hence require that A − λI be not invertible, and this will be the case if

det(A − λI) = 0    (11.1)

When we evaluate this determinant we will have a polynomial equation of degree n in the unknown λ. The solutions of this equation will be the required eigenvalues. Equation (11.1) is called the characteristic equation of the matrix A. The polynomial det(A − λI) is called the characteristic polynomial of A.
Example 11.2. Example 11.1 revisited. We start by forming

A − λI = [ 2  6 ]  −  λ [ 1  0 ]   =   [ 2 − λ    6   ]
         [ 3  5 ]       [ 0  1 ]       [   3    5 − λ ]

The determinant of this matrix is

det(A − λI) = (2 − λ)(5 − λ) − 18 = λ^2 − 7λ − 8

and hence the characteristic equation is

λ^2 − 7λ − 8 = 0   ⟹   (λ − 8)(λ + 1) = 0   ⟹   λ = 8, −1

That is, the eigenvalues of the given matrix A are λ = 8 and λ = −1. For each eigenvalue we must solve the equation

(A − λI)v = 0

When λ = 8 we have

[ −6   6 ] [ v_1 ]   =   [ 0 ]   ⟹   E_8 = span{ (1, 1) }
[  3  −3 ] [ v_2 ]       [ 0 ]

When λ = −1 we have

[ 3  6 ] [ v_1 ]   =   [ 0 ]   ⟹   E_{−1} = span{ (−2, 1) }
[ 3  6 ] [ v_2 ]       [ 0 ]

In solving these sets of equations we end up with the complete eigenspace in each case, but we usually select just one member of each.

It is important to note that the reduced row echelon form of (A − λI) will always have at least one row of zeros. The number of zero rows equals the dimension of the eigenspace.
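In practice eigenvalues and eigenvectors are usually computed numerically. The sketch below (assuming the NumPy library is available) recomputes Example 11.2; note that a numerical routine returns eigenvectors scaled to unit length, so they are multiples of the ones found above.

    import numpy as np

    A = np.array([[2.0, 6.0],
                  [3.0, 5.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)                 # 8 and -1, in some order
    print(eigenvectors)                # columns are (normalised) eigenvectors

    # Check Av = lambda v for the first eigenvalue/eigenvector pair.
    v = eigenvectors[:, 0]
    print(A @ v, eigenvalues[0] * v)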
Example 11.3. Consider the upper triangular matrix

A = [ 1  2  6 ]
    [ 0  3  5 ]
    [ 0  0  4 ]

Since A is upper triangular, so is A − λI, and hence its determinant is just the product of the diagonal entries. Thus the characteristic equation is

(1 − λ)(3 − λ)(4 − λ) = 0   ⟹   λ = 1, 3, 4

Note that these are in fact the diagonal elements of A. The respective eigenspaces are

E_1 = span{ (1, 0, 0) },   E_3 = span{ (1, 1, 0) },   E_4 = span{ (16, 15, 3) }
The eigenvalues, and corresponding eigenvectors, could be complex-valued. If the matrix A is real-valued then the eigenvalues, that is, the roots of the characteristic polynomial, will occur in complex conjugate pairs.

Example 11.4.

A = [  2  1 ]
    [ −5  4 ]

The characteristic equation is λ^2 − 6λ + 13 = 0. The eigenvalues are λ = 3 + 2i and λ = 3 − 2i. When we solve (A − λI)v = 0 we will get solutions containing complex numbers. Although we can't interpret them as vectors in R^2, there are many applications (particularly in Engineering) in which there is a natural interpretation in terms of the problem under investigation. The corresponding eigenvectors are

v = [ 1 − 2i ]   and   v = [ 1 + 2i ]
    [   5    ]             [   5    ]
The characteristic polynomial may have repeated roots. If it factors into the form

(λ − λ_1)^{m_1} · · · (λ − λ_j)^{m_j} · · · (λ − λ_p)^{m_p}

we say that the algebraic multiplicity of λ = λ_j is m_j. For example, if the characteristic polynomial were (λ + 3)(λ − 2)^4 (λ − 5) then the algebraic multiplicity of λ = 2 would be 4.
Example 11.5.

A = [ −2   2  −3 ]
    [  2   1  −6 ]
    [ −1  −2   0 ]

The characteristic equation is

−λ^3 − λ^2 + 21λ + 45 = 0

and so

(3 + λ)^2 (5 − λ) = 0   ⟹   λ = −3, −3, 5

Repeating the root −3 reflects the fact that λ = −3 has algebraic multiplicity 2.

To find the eigenvectors corresponding to λ = 5 we solve (A − 5I)v = 0:

[ −7   2  −3 ] [ v_1 ]   =   [ 0 ]
[  2  −4  −6 ] [ v_2 ]       [ 0 ]
[ −1  −2  −5 ] [ v_3 ]       [ 0 ]

After some work we arrive at the reduced row echelon form

[ 1  2  5  0 ]
[ 0  1  2  0 ]
[ 0  0  0  0 ]

Note that we have one row of zeros. The solution will therefore involve one free parameter. Let v_3 = t. The second row gives v_2 = −2t and the first row then gives v_1 = −t, and hence the eigenspace corresponding to λ = 5 is

E_5 = span{ (−1, −2, 1) }

Similarly, to find the eigenvectors corresponding to λ = −3 we solve (A + 3I)v = 0:

[  1   2  −3 ] [ v_1 ]   =   [ 0 ]
[  2   4  −6 ] [ v_2 ]       [ 0 ]
[ −1  −2   3 ] [ v_3 ]       [ 0 ]

It turns out that the reduced row echelon form has two rows of zeros and hence the solution will involve two free parameters. Let v_2 = s and v_3 = t. Calculation gives v_1 = −2s + 3t, and hence the eigenspace corresponding to λ = −3 is

E_{−3} = span{ (−2, 1, 0), (3, 0, 1) }
The dimension of the eigenspace of an eigenvalue is called its
geometric multiplicity. In the above example the geometric multiplic-
ity of = 3 is 2 and that of = 5 is 1. Note that the eigenspace
corresponding to = 3 is a plane through the origin and the
eigenspace corresponding to = 5 is a line through the origin, as
displayed in Figure 11.2
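The eigenstructure of Example 11.5 is easy to confirm numerically. The following sketch (Python/NumPy, assumed available) recovers the eigenvalues $5, -3, -3$ and the geometric multiplicity of $\lambda = -3$:

```python
import numpy as np

A = np.array([[-2.0,  2.0, -3.0],
              [ 2.0,  1.0, -6.0],
              [-1.0, -2.0,  0.0]])

print(np.round(np.linalg.eigvals(A), 6))        # 5 and -3 (twice), in some order

# Geometric multiplicity of lambda = -3 is 3 - rank(A + 3I)
print(3 - np.linalg.matrix_rank(A + 3 * np.eye(3)))   # 2

# Verify the two basis vectors of E_{-3} found above
for v in ([-2.0, 1.0, 0.0], [3.0, 0.0, 1.0]):
    v = np.array(v)
    print(np.allclose(A @ v, -3 * v))           # True
```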
11.3 Some properties of eigenvalues and eigenvectors
Let $A$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. (We include all complex and repeated eigenvalues.) Then:
- The determinant of the matrix $A$ equals the product of the eigenvalues:
$$\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n$$
- The trace of a square matrix is the sum of its diagonal entries. The trace of the matrix $A$ equals the sum of the eigenvalues:
$$\operatorname{trace}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$
Note that in both of these formulae all $n$ eigenvalues must be counted.
- The eigenvalues of $A^{-1}$ (if it exists) are $\dfrac{1}{\lambda_1}, \dfrac{1}{\lambda_2}, \ldots, \dfrac{1}{\lambda_n}$.
- The eigenvalues of $A^{T}$ (that is, the transpose of $A$) are the same as for the matrix $A$: $\lambda_1, \lambda_2, \ldots, \lambda_n$.
- If $k$ is a scalar then the eigenvalues of the matrix $kA$ are $k\lambda_1, k\lambda_2, \ldots, k\lambda_n$.
- If $k$ is a scalar and $I$ the identity matrix then the eigenvalues of the matrix $A + kI$ are $\lambda_1 + k, \lambda_2 + k, \ldots, \lambda_n + k$.
- If $k$ is a positive integer then the eigenvalues of $A^k$ are $\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k$.
- Any matrix polynomial in $A$,
$$A^n + \alpha_{n-1}A^{n-1} + \cdots + \alpha_1 A + \alpha_0 I,$$
has eigenvalues
$$\lambda^n + \alpha_{n-1}\lambda^{n-1} + \cdots + \alpha_1\lambda + \alpha_0 \quad\text{for } \lambda = \lambda_1, \lambda_2, \ldots, \lambda_n$$
- The Cayley–Hamilton Theorem: a matrix $A$ satisfies its own characteristic equation; that is, if the characteristic equation is
$$\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0 = 0$$
where $c_0, c_1, \ldots, c_{n-1}$ are constants, then
$$A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = 0$$
It can be shown that any set of vectors from different eigenspaces, that is, corresponding to different eigenvalues, is a linearly independent set.
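These properties are straightforward to check numerically for the matrix of Example 11.5. A sketch (Python/NumPy, assumed available) comparing the determinant and trace with the product and sum of the eigenvalues, and the eigenvalues of $A^{-1}$ with the reciprocals:

```python
import numpy as np

A = np.array([[-2.0,  2.0, -3.0],
              [ 2.0,  1.0, -6.0],
              [-1.0, -2.0,  0.0]])
lams = np.linalg.eigvals(A)                          # 5, -3, -3 in some order

print(np.isclose(np.linalg.det(A), np.prod(lams)))   # True: det = product of eigenvalues
print(np.isclose(np.trace(A), np.sum(lams)))         # True: trace = sum of eigenvalues

inv_lams = np.linalg.eigvals(np.linalg.inv(A))
print(np.allclose(np.sort(inv_lams), np.sort(1 / lams)))   # True: eigenvalues of A^{-1}
```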
11.4 Diagonalisation
The eigenstructure (that is, the eigenvalue and eigenvector information) is often collected into matrix form. The eigenvalues (including multiplicities) are placed as the entries in a diagonal matrix $D$ and one eigenvector for each eigenvalue is taken to form a matrix $P$.
Example 11.6. Example 11.5 revisited.
$$A = \begin{bmatrix} -2 & 2 & -3 \\ 2 & 1 & -6 \\ -1 & -2 & 0 \end{bmatrix} \qquad P = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \qquad D = \begin{bmatrix} 5 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & -3 \end{bmatrix}$$
Note that the columns of $D$ and $P$ must correspond, e.g. $\lambda = 5$ is in column 1 of $D$ so the corresponding eigenvector must be in column 1 of $P$.
The following relationship between the matrices $A$, $P$ and $D$ will always hold:
$$AP = PD \qquad (11.2)$$
We can see this by noting that the $i$th column of the equation
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} \cdots & v_{i1} & \cdots \\ \vdots & \vdots & \vdots \\ \cdots & v_{in} & \cdots \end{bmatrix} = \begin{bmatrix} \cdots & v_{i1} & \cdots \\ \vdots & \vdots & \vdots \\ \cdots & v_{in} & \cdots \end{bmatrix} \begin{bmatrix} \ddots & 0 & 0 \\ 0 & \lambda_i & 0 \\ 0 & 0 & \ddots \end{bmatrix}$$
is just the eigenvalue equation for $\lambda_i$, that is,
$$A\mathbf{v}_i = \lambda_i \mathbf{v}_i$$
For Example 11.6:
$$AP = \begin{bmatrix} 5 & 6 & -9 \\ 10 & -3 & 0 \\ -5 & 0 & -3 \end{bmatrix} = PD$$
If the matrix $P$ is invertible then equation (11.2) can be re-written as
$$A = PDP^{-1} \qquad (11.3)$$
or
$$D = P^{-1}AP \qquad (11.4)$$
Two matrices $M$ and $N$ are called similar matrices if there exists an invertible matrix $Q$ such that
$$N = Q^{-1}MQ$$
Clearly $A$ and the diagonal matrix $D$ constructed from the eigenvalues of $A$ are similar.
Diagonalisation is the process of determining a matrix $P$ such that
$$D = P^{-1}AP$$
All we need to do is to find the eigenvalues and eigenvectors of $A$ and form $P$ as described above.
Note, however, that not every matrix is diagonalisable.
Example 11.7.
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{bmatrix}$$
The eigenvalues of $A$ are $\lambda = 2, 1, 1$ (that is, $\lambda = 1$ has algebraic multiplicity 2).
The eigenspace corresponding to $\lambda = 2$ is
$$E_2 = \operatorname{span}\left\{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\right\}$$
The eigenspace corresponding to $\lambda = 1$ is
$$E_1 = \operatorname{span}\left\{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right\}$$
Both $\lambda = 2$ and $\lambda = 1$ have geometric multiplicity 1. In order for $P$ to be invertible its columns must be linearly independent. Unfortunately, there are not enough linearly independent eigenvectors to enable us to build the matrix $P$. Hence the matrix $A$ is not diagonalisable.
If an $n \times n$ matrix has $n$ distinct eigenvalues then it will be diagonalisable, because each eigenvalue will give a representative eigenvector and these will be linearly independent because they correspond to different eigenvalues.
A matrix $A$ is called a symmetric matrix if it equals its transpose, that is,
$$A = A^{T}$$
It can be shown that the eigenvalues of a real symmetric $n \times n$ matrix $A$ are all real and that we can always find enough linearly independent eigenvectors to form the matrix $P$, even if there are fewer than $n$ distinct eigenvalues. That is, a real symmetric matrix can always be diagonalised.
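The diagonalisability check can be mirrored numerically. A minimal sketch (Python/NumPy, assumed available): for the matrix of Example 11.6 the eigenvector matrix $P$ is invertible and $P^{-1}AP = D$, while for the matrix of Example 11.7 the repeated eigenvalue has geometric multiplicity 1:

```python
import numpy as np

# Example 11.6: diagonalisable
A = np.array([[-2.0,  2.0, -3.0],
              [ 2.0,  1.0, -6.0],
              [-1.0, -2.0,  0.0]])
P = np.array([[ 1.0, -2.0, 3.0],
              [ 2.0,  1.0, 0.0],
              [-1.0,  0.0, 1.0]])
D = np.diag([5.0, -3.0, -3.0])
print(np.allclose(np.linalg.inv(P) @ A @ P, D))        # True

# Example 11.7: lambda = 1 has algebraic multiplicity 2 but geometric multiplicity 1
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])
geo_mult = 3 - np.linalg.matrix_rank(B - np.eye(3))
print(geo_mult)                                        # 1, so B cannot be diagonalised
```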
11.5 Linear Systems
A homogeneous system of linear differential equations is a system of the form
$$\mathbf{x}' = A\mathbf{x}$$
where
$$\mathbf{x} = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}, \text{ a constant matrix.}$$
We'll see later that the general solution is a linear combination of certain vector solutions (which we'll label with a superscript bracketed number), $\mathbf{x}^{(1)}(t)$, $\mathbf{x}^{(2)}(t)$, etc.:
$$\mathbf{x}(t) = c_1\mathbf{x}^{(1)}(t) + \cdots + c_n\mathbf{x}^{(n)}(t)$$
where the constants $c_1, \ldots, c_n$ are called combination coefficients.
There is a connection between the solution of the system of differential equations and the eigenstructure of the associated matrix $A$.
Example 11.8. Consider the second-order differential equation
$$x'' + 2x' - 3x = 0$$
Let $x_1 = x$ and $x_2 = x'$. The derivatives of the new variables are
$$x_1' = x' = x_2 \qquad\qquad x_2' = x'' = 3x - 2x' = 3x_1 - 2x_2$$
In matrix–vector form this is
$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 3 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
The (eigenvalue) characteristic equation for the matrix is $\lambda^2 + 2\lambda - 3 = 0$. Note the similarity between this and the original differential equation. In order to solve the original problem we would try for a solution of the form $x = e^{mt}$. Then $x' = me^{mt}$ and $x'' = m^2 e^{mt}$ and substitution gives
$$m^2 e^{mt} + 2me^{mt} - 3e^{mt} = 0 \;\Longrightarrow\; (m^2 + 2m - 3)e^{mt} = 0$$
Divide throughout by $e^{mt}$ (which is never zero) to get $m^2 + 2m - 3 = 0$. This is the characteristic equation in $m$ instead of $\lambda$.
We shall use a method based on eigenstructure to solve systems of differential equations.
11.6 Solutions of homogeneous linear systems of differential equations
When $n = 1$ the system is simply $x' = ax$ where $a = a_{11}$. The solution is $x(t) = ce^{at}$ where $c$ is a constant. To solve a homogeneous linear system, we seek solutions of the form
$$\mathbf{x}(t) = \mathbf{v}e^{\lambda t} \qquad (11.5)$$
where $\lambda$ is a constant and $\mathbf{v}$ is a constant vector. Substitution into the system gives
$$\mathbf{x}' = A\mathbf{x} \;\Longrightarrow\; \lambda\mathbf{v}e^{\lambda t} = A\mathbf{v}e^{\lambda t} \;\Longrightarrow\; \lambda\mathbf{v} = A\mathbf{v}$$
This is the eigenvalue–eigenvector problem, so as soon as we know the eigenvalues and eigenvectors we can write down the component solutions just by substituting into equation (11.5).
Example 11.9. Example 11.8 revisited.
$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 3 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
The eigenstructure of the matrix is
$$\mathbf{v}_{1} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad \mathbf{v}_{-3} = \begin{bmatrix} 1 \\ -3 \end{bmatrix}$$
The two component solutions are
$$\mathbf{x}^{(1)}(t) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}e^{t} \qquad \mathbf{x}^{(2)}(t) = \begin{bmatrix} 1 \\ -3 \end{bmatrix}e^{-3t}$$
and the general solution is
$$\mathbf{x} = c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix}e^{t} + c_2\begin{bmatrix} 1 \\ -3 \end{bmatrix}e^{-3t}$$
In scalar form, the solutions are
$$x \equiv x_1 = c_1 e^{t} + c_2 e^{-3t} \qquad\qquad x' \equiv x_2 = c_1 e^{t} - 3c_2 e^{-3t}$$
Note that both $x_1$ (position) and $x_2$ (velocity) satisfy the original differential equation. This will be so if the system was derived from one higher order differential equation.
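We can sanity-check the general solution of Example 11.9 by integrating the system numerically and comparing with the eigenvector formula. A sketch using SciPy's solve_ivp (SciPy and NumPy assumed available; the initial condition $\mathbf{x}(0) = (1, 1)$ is chosen here purely for illustration, and gives $c_1 = 1$, $c_2 = 0$):

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [3.0, -2.0]])

# For x(0) = (1, 1): c1 = 1, c2 = 0, so the exact solution is x(t) = e^t * (1, 1)
def exact(t):
    return np.exp(t) * np.array([1.0, 1.0])

sol = solve_ivp(lambda t, x: A @ x, (0.0, 1.0), [1.0, 1.0], rtol=1e-10, atol=1e-12)
print(np.allclose(sol.y[:, -1], exact(sol.t[-1]), rtol=1e-6))   # True
```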
Example 11.10. The second order system
$$\mathbf{x}'(t) = A\mathbf{x}(t) \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad A = \begin{bmatrix} 1 & 2 \\ 5 & 4 \end{bmatrix}$$
The eigensolution is
$$\mathbf{v}_{6} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} \qquad \mathbf{v}_{-1} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Hence the two component solutions are
$$\mathbf{x}^{(1)}(t) = \begin{bmatrix} 2 \\ 5 \end{bmatrix}e^{6t} \qquad \mathbf{x}^{(2)}(t) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}e^{-t}$$
The general solution is
$$\mathbf{x}(t) = c_1\begin{bmatrix} 2 \\ 5 \end{bmatrix}e^{6t} + c_2\begin{bmatrix} 1 \\ -1 \end{bmatrix}e^{-t}$$
Example 11.11. The third order system
$$\mathbf{x}'(t) = A\mathbf{x}(t) \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$$
The eigensolution is
$$\mathbf{v}_{1} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \qquad \mathbf{v}_{4} = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix} \qquad \mathbf{v}_{6} = \begin{bmatrix} 16 \\ 25 \\ 10 \end{bmatrix}$$
The general solution is
$$\mathbf{x}(t) = c_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}e^{t} + c_2\begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}e^{4t} + c_3\begin{bmatrix} 16 \\ 25 \\ 10 \end{bmatrix}e^{6t}$$
Example 11.12. Repeated eigenvalue example. The third order system
$$\mathbf{x}'(t) = A\mathbf{x}(t) \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \qquad A = \begin{bmatrix} -2 & 2 & -3 \\ 2 & 1 & -6 \\ -1 & -2 & 0 \end{bmatrix}$$
The eigensolution is
$$\mathbf{v}_{5} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} \qquad \mathbf{v}_{-3} = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$$
The general solution is
$$\mathbf{x}(t) = c_1\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}e^{5t} + \left( c_2\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix} \right)e^{-3t}$$
11.7 A change of variables approach to finding the solutions
Equations (11.3) and (11.4) can be used to simplify difficult problems:
$$\text{original problem (in terms of } A\text{)} \;\xrightarrow{\;P^{-1}\;}\; \text{easier problem (in terms of } D\text{)} \;\longrightarrow\; \text{easy solution} \;\xrightarrow{\;P\;}\; \text{original solution}$$
Introduce a vector variable $\mathbf{y}(t)$ through the relationship
$$\mathbf{x}(t) = P\mathbf{y}(t) \qquad (11.6)$$
where the columns of the matrix $P$ are the eigenvectors of $A$. Substitute this change of variables into the system of differential equations:
$$\mathbf{x}' = A\mathbf{x} \;\Longrightarrow\; P\mathbf{y}' = AP\mathbf{y} \;\Longrightarrow\; \mathbf{y}' = P^{-1}AP\mathbf{y}$$
We know that $P^{-1}AP = D$, so the new system can be written as
$$\mathbf{y}'(t) = D\mathbf{y}(t)$$
In expanded notation this new system of differential equations is
$$\begin{bmatrix} y_1' \\ \vdots \\ y_n' \end{bmatrix} = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$
Each row of this matrix–vector equation is a simple scalar differential equation for the unknown functions $y_1(t), \ldots, y_n(t)$. They are
$$y_1' = \lambda_1 y_1, \quad y_2' = \lambda_2 y_2, \quad \ldots, \quad y_n' = \lambda_n y_n$$
The solutions of these are easily obtained:
$$y_1(t) = c_1 e^{\lambda_1 t}, \quad y_2(t) = c_2 e^{\lambda_2 t}, \quad \ldots, \quad y_n(t) = c_n e^{\lambda_n t}$$
and substituting these into equation (11.6) gives the required solutions.
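The change-of-variables recipe translates directly into code. A sketch (Python/NumPy, assumed available) for the matrix of Example 11.10, with an illustrative initial condition: solve the easy diagonal system for $\mathbf{y}$, map back with $\mathbf{x} = P\mathbf{y}$, and check that the result really satisfies $\mathbf{x}' = A\mathbf{x}$:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [5.0, 4.0]])
lams, P = np.linalg.eig(A)            # columns of P are eigenvectors
x0 = np.array([1.0, 0.0])             # an illustrative initial condition

# y' = D y decouples: y_i(t) = c_i e^{lambda_i t}, with c = P^{-1} x(0)
c = np.linalg.solve(P, x0)

def x(t):
    y = c * np.exp(lams * t)          # solve the easy diagonal system
    return P @ y                      # map back to the original variables

# Finite-difference check that x'(t) = A x(t)
h, t = 1e-6, 0.7
print(np.allclose((x(t + h) - x(t - h)) / (2 * h), A @ x(t), rtol=1e-4))   # True
```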
12 Change of Basis
12.1 Introduction
Recall that the standard basis in $\mathbb{R}^2$ is
$$S = \left\{ \begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix} \right\}$$
Any vector in $\mathbb{R}^2$ can be written as a linear combination of these, for example,
$$\begin{bmatrix}2\\3\end{bmatrix} = 2\begin{bmatrix}1\\0\end{bmatrix} + 3\begin{bmatrix}0\\1\end{bmatrix}$$
Recall also that the standard basis in $\mathbb{R}^3$ is
$$S = \left\{ \begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix} \right\}$$
Any vector in $\mathbb{R}^3$ can be written as a linear combination of these, for example,
$$\begin{bmatrix}4\\2\\-7\end{bmatrix} = 4\begin{bmatrix}1\\0\\0\end{bmatrix} + 2\begin{bmatrix}0\\1\\0\end{bmatrix} - 7\begin{bmatrix}0\\0\\1\end{bmatrix}$$
In many areas of mathematics, both pure and applied, there is good reason to opt to use a different basis than the standard one. Judicious choice of a different basis will greatly simplify the problem. We saw this idea in Section 11.7, where we used a specially chosen (based on eigenstructure) change of variables to greatly simplify a system of differential equations. This plan of attack (that is, a well chosen change of basis) has many uses.
Recall that a set of vectors $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is a basis for a subspace $W$ of $\mathbb{R}^m$ if $B$ is linearly independent and spans $W$. Note that $W$ may be all of $\mathbb{R}^m$.
Example 12.1. Let $W_P$ be the plane $5x + 11y - 9z = 0$ through the origin in $\mathbb{R}^3$. A basis for this subspace is
$$B = \left\{ \begin{bmatrix}1\\2\\3\end{bmatrix}, \begin{bmatrix}4\\-1\\1\end{bmatrix} \right\}$$
We can verify this by observing that the basis vectors are clearly linearly independent (there are only two of them and one is not a multiple of the other) and noting that
$$\begin{bmatrix}1\\2\\3\end{bmatrix} \times \begin{bmatrix}4\\-1\\1\end{bmatrix} = \begin{bmatrix}5\\11\\-9\end{bmatrix}, \text{ a vector normal to } W_P$$
Any vector $\mathbf{v}$ in $W_P$ can be written as a linear combination of the basis vectors, for example,
$$\mathbf{v} = \begin{bmatrix}-5\\8\\7\end{bmatrix} = 3\begin{bmatrix}1\\2\\3\end{bmatrix} - 2\begin{bmatrix}4\\-1\\1\end{bmatrix} \qquad \text{Note that } 5(-5) + 11(8) - 9(7) = 0.$$
In general, the task of finding the required linear combination of basis vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ that produces a given target vector $\mathbf{v}$ is that of solving a set of simultaneous linear equations:
Find $\alpha_1, \alpha_2, \ldots, \alpha_n$ such that $\mathbf{v} = \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \cdots + \alpha_n\mathbf{u}_n$.
Example 12.2. Example continued. Express $\mathbf{w} = (16, -13, -7)$ in terms of the basis $B$.
Solution. We must solve:
$$\begin{bmatrix}16\\-13\\-7\end{bmatrix} = \alpha_1\begin{bmatrix}1\\2\\3\end{bmatrix} + \alpha_2\begin{bmatrix}4\\-1\\1\end{bmatrix} \quad\Longleftrightarrow\quad \begin{cases} \alpha_1 + 4\alpha_2 = 16 \\ 2\alpha_1 - \alpha_2 = -13 \\ 3\alpha_1 + \alpha_2 = -7 \end{cases}$$
and so $\alpha_1 = -4$ and $\alpha_2 = 5$.
The fact that we have found a solution means that $\mathbf{w} = (16, -13, -7)$ lies in the plane $W_P$. If it did not then the system of equations for $\alpha_1, \alpha_2$ would be inconsistent.
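Finding coordinates in a basis is exactly a linear-solve, so it is easy to automate. A sketch (Python/NumPy, assumed available) reproducing Example 12.2: because $B$ has only two vectors in $\mathbb{R}^3$, we solve the $3 \times 2$ system with lstsq and confirm the residual is zero, i.e. $\mathbf{w}$ really lies in the plane:

```python
import numpy as np

u1 = np.array([1.0, 2.0, 3.0])
u2 = np.array([4.0, -1.0, 1.0])
w  = np.array([16.0, -13.0, -7.0])

U = np.column_stack([u1, u2])          # 3x2 matrix whose columns are the basis vectors
coords, residual, rank, _ = np.linalg.lstsq(U, w, rcond=None)
print(np.round(coords, 6))             # [-4.  5.]  ->  w = -4*u1 + 5*u2
print(np.allclose(U @ coords, w))      # True: w lies in the plane W_P
```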
The coefficients $\alpha_1, \alpha_2, \ldots, \alpha_n$ are called the coordinates of $\mathbf{v}$ in the basis $B$ and we write
$$[\mathbf{v}]_B = \begin{bmatrix}\alpha_1\\ \vdots \\ \alpha_n\end{bmatrix}$$
Example 12.3. Example continued.
The coordinates of $\mathbf{v}$ and $\mathbf{w}$ in the basis $B$ are
$$[\mathbf{v}]_B = \begin{bmatrix}3\\-2\end{bmatrix} \qquad [\mathbf{w}]_B = \begin{bmatrix}-4\\5\end{bmatrix}$$
As mentioned before, it is often useful to change the basis that one is working with. Suppose that instead of the basis $B$ in the above example we chose to use another basis, say
$$C = \left\{ \begin{bmatrix}5\\1\\4\end{bmatrix}, \begin{bmatrix}-3\\3\\2\end{bmatrix} \right\}$$
(As was the case for basis $B$, it is easy to verify that this is a basis for $W_P$.) After some calculation we can show that
$$[\mathbf{v}]_C = \begin{bmatrix}1/2\\5/2\end{bmatrix} \qquad [\mathbf{w}]_C = \begin{bmatrix}1/2\\-9/2\end{bmatrix}$$
In order for us to exploit a judiciously chosen basis we would like to have a simple way of converting from coordinates in one basis to coordinates in another. For example, given $[\mathbf{v}]_B$ can we find $[\mathbf{v}]_C$ without having to work out what $\mathbf{v}$ itself is?
12.2 Change of basis
A change of coordinates from one basis to another can be achieved by multiplication of the given coordinate vector by a so-called change of coordinates matrix. Consider a subspace $W$ of $\mathbb{R}^m$ and two different bases for $W$:
$$B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\} \quad\text{and}\quad C = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$$
A given vector $\mathbf{v} \in W$ will have coordinates in each of these bases:
$$[\mathbf{v}]_B = \begin{bmatrix}\alpha_1\\ \vdots \\ \alpha_n\end{bmatrix} \qquad [\mathbf{v}]_C = \begin{bmatrix}\beta_1\\ \vdots \\ \beta_n\end{bmatrix}$$
Our task is to find an invertible matrix $P_{CB}$ for which
$$[\mathbf{v}]_C = P_{CB}[\mathbf{v}]_B \quad\text{and}\quad [\mathbf{v}]_B = P_{BC}[\mathbf{v}]_C$$
where $P_{BC} = P_{CB}^{-1}$. That is, pre-multiplication by $P_{CB}$ will convert coordinates in basis $B$ to coordinates in basis $C$, and pre-multiplication by $P_{BC}$ will convert those in $C$ to those in $B$.
To deduce the matrix $P_{CB}$ we need to write the elements of basis $B$ in terms of those of basis $C$. That is, we need to find
$$[\mathbf{u}_1]_C = \begin{bmatrix}p_{11}\\ \vdots \\ p_{n1}\end{bmatrix} \qquad [\mathbf{u}_2]_C = \begin{bmatrix}p_{12}\\ \vdots \\ p_{n2}\end{bmatrix} \qquad \cdots \qquad [\mathbf{u}_n]_C = \begin{bmatrix}p_{1n}\\ \vdots \\ p_{nn}\end{bmatrix}$$
The coordinates found form the columns of the change of coordinates matrix:
$$P_{CB} = \begin{bmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & & \vdots \\ p_{n1} & \cdots & p_{nn} \end{bmatrix}$$
Note that $P_{CB}$ will be invertible because the elements of basis $B$ are linearly independent and the elements of basis $C$ are linearly independent.
Example 12.4. Example 12.1 revisited again. The required coordinates are
$$\left[\begin{bmatrix}1\\2\\3\end{bmatrix}\right]_C = \begin{bmatrix}1/2\\1/2\end{bmatrix} \qquad \left[\begin{bmatrix}4\\-1\\1\end{bmatrix}\right]_C = \begin{bmatrix}1/2\\-1/2\end{bmatrix}$$
and hence
$$P_{CB} = \begin{bmatrix}1/2 & 1/2\\ 1/2 & -1/2\end{bmatrix} \quad\text{and}\quad P_{BC} = P_{CB}^{-1} = \begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}$$
We can verify these. Recall that $[\mathbf{v}]_B = [3, -2]$ and $[\mathbf{w}]_C = [1/2, -9/2]$. Then
$$[\mathbf{v}]_C = P_{CB}[\mathbf{v}]_B = \begin{bmatrix}1/2 & 1/2\\ 1/2 & -1/2\end{bmatrix}\begin{bmatrix}3\\-2\end{bmatrix} = \begin{bmatrix}1/2\\5/2\end{bmatrix}$$
which agrees with what we got before, and
$$[\mathbf{w}]_B = P_{BC}[\mathbf{w}]_C = \begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}\begin{bmatrix}1/2\\-9/2\end{bmatrix} = \begin{bmatrix}-4\\5\end{bmatrix}$$
which also agrees with what we got before.
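The change-of-coordinates matrix of Example 12.4 can also be computed mechanically: each column of $P_{CB}$ comes from expressing a $B$-vector in terms of the $C$-vectors. A sketch (Python/NumPy, assumed available):

```python
import numpy as np

B = np.column_stack([[1.0, 2.0, 3.0], [4.0, -1.0, 1.0]])   # basis B as columns
C = np.column_stack([[5.0, 1.0, 4.0], [-3.0, 3.0, 2.0]])   # basis C as columns

# Solve C @ P_CB = B (exact here, since both bases span the same plane W_P)
P_CB, _, _, _ = np.linalg.lstsq(C, B, rcond=None)
print(np.round(P_CB, 6))                   # [[ 0.5  0.5]
                                           #  [ 0.5 -0.5]]
print(np.round(np.linalg.inv(P_CB), 6))    # P_BC = [[1, 1], [1, -1]]

v_B = np.array([3.0, -2.0])
print(P_CB @ v_B)                          # [0.5 2.5] = [v]_C, agreeing with the text
```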
12.3 Linear transformations and change of bases
Recall that linear transformations can be represented using matrices; for example, counterclockwise rotation through an angle $\theta$ in $\mathbb{R}^2$, using the standard basis for the original and transformed vectors, has transformation matrix
$$A_{SS} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
We have used the subscript $SS$ to indicate that the standard basis is being used to represent the original vectors and also the rotated vectors. For example, the vector $\mathbf{e}_2 = [0, 1]$ under a rotation of $\pi/2$ becomes
$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \end{bmatrix} = -\mathbf{e}_1, \text{ as expected}$$
If we desire to use different bases to represent the vectors, say basis $B$ for the original vectors $\mathbf{v}$ and basis $C$ for the transformed vectors $\mathbf{v}'$, then we shall label the transformation matrix $A_{CB}$ and the linear transformation will be
$$[\mathbf{v}']_C = A_{CB}[\mathbf{v}]_B \qquad (12.1)$$
We need a way of deducing $A_{CB}$. This can be achieved by employing the change of basis matrices $P_{BS}$ and $P_{CS}$, where
$$[\mathbf{v}]_B = P_{BS}[\mathbf{v}]_S \quad\text{and}\quad [\mathbf{v}']_C = P_{CS}[\mathbf{v}']_S$$
Substitution of these formulae in equation (12.1) gives
$$P_{CS}[\mathbf{v}']_S = A_{CB}P_{BS}[\mathbf{v}]_S \;\Longrightarrow\; [\mathbf{v}']_S = P_{SC}A_{CB}P_{BS}[\mathbf{v}]_S$$
(recalling that $P_{CS} = P_{SC}^{-1}$). Using the standard basis for the transformation would be $[\mathbf{v}']_S = A_{SS}[\mathbf{v}]_S$, and hence we must have $A_{SS} = P_{SC}A_{CB}P_{BS}$, which we can rearrange to get the linear transformation change of basis formula
$$A_{CB} = P_{CS}A_{SS}P_{SB}$$
Note that if we use the same basis $B$ for both the original and transformed vectors then we have
$$A_{BB} = P_{BS}A_{SS}P_{SB} \qquad (12.2)$$
Recall that two matrices $M$ and $N$ are similar if there exists an invertible matrix $Q$ such that $N = Q^{-1}MQ$. Equation (12.2) tells us that all linear transformation matrices in which the same basis is used for the original and for the transformed vectors are similar.
Diagonalisation is an example of this situation when matrix $A$ represents some linear transformation in the standard bases. If we switch to bases composed of the eigenvectors of $A$ then the matrix representation of the linear transformation takes on the much simpler form $D$, a diagonal matrix.
Example 12.5. We determine the change of basis matrix from the standard basis $S$ to a basis
$$B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$$
We note that $[\mathbf{u}_i]_S = \mathbf{u}_i$ for all $i = 1, \ldots, n$ and hence we can immediately write
$$P_{SB} = [\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_n]$$
Example 12.6. We determine the transformation matrix for counterclockwise rotation through an angle $\theta$ in $\mathbb{R}^2$ using the basis
$$B = \left\{ \begin{bmatrix}1\\-1\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix} \right\}$$
to represent both the original vectors and the transformed vectors.
We need to calculate $P_{SB}$ and $P_{BS}$. We can write down immediately that
$$P_{SB} = \begin{bmatrix}1 & 1\\ -1 & 1\end{bmatrix} \quad\text{and that}\quad P_{BS} = P_{SB}^{-1} = \frac{1}{2}\begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}$$
and so the desired transformation matrix is
$$A_{BB} = P_{BS}A_{SS}P_{SB} = \frac{1}{2}\begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}1 & 1\\ -1 & 1\end{bmatrix}$$
$$= \frac{1}{2}\begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}\begin{bmatrix}\cos\theta + \sin\theta & \cos\theta - \sin\theta\\ \sin\theta - \cos\theta & \sin\theta + \cos\theta\end{bmatrix} = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}$$
That is, $A_{BB} = A_{SS}$, but if we think about the geometry then this makes sense: the vectors of $B$ are just the standard basis vectors rotated and uniformly scaled, and such a change of basis leaves the matrix of a rotation unchanged.
Example 12.7. We determine the transformation matrix for counterclockwise rotation through an angle $\theta$ in $\mathbb{R}^2$ using the basis
$$C = \left\{ \begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix} \right\}$$
to represent both the original vectors and the transformed vectors.
We need to calculate $P_{SC}$ and $P_{CS}$. We can write down immediately that
$$P_{SC} = \begin{bmatrix}1 & 1\\ 0 & 1\end{bmatrix} \quad\text{and that}\quad P_{CS} = P_{SC}^{-1} = \begin{bmatrix}1 & -1\\ 0 & 1\end{bmatrix}$$
and so the desired transformation matrix is
$$A_{CC} = P_{CS}A_{SS}P_{SC} = \begin{bmatrix}1 & -1\\ 0 & 1\end{bmatrix}\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}1 & 1\\ 0 & 1\end{bmatrix}$$
$$= \begin{bmatrix}1 & -1\\ 0 & 1\end{bmatrix}\begin{bmatrix}\cos\theta & \cos\theta - \sin\theta\\ \sin\theta & \sin\theta + \cos\theta\end{bmatrix} = \begin{bmatrix}\cos\theta - \sin\theta & -2\sin\theta\\ \sin\theta & \sin\theta + \cos\theta\end{bmatrix}$$
Example 12.8. The matrix of a particular linear transformation in $\mathbb{R}^3$ in which the standard basis has been used for both the original and transformed vectors is
$$A_{SS} = \begin{bmatrix} 1 & 3 & 4 \\ 2 & -1 & 1 \\ -3 & 5 & 1 \end{bmatrix}$$
Determine the matrix of the linear transformation if basis $B$ is used for the original vectors and basis $C$ is used for the transformed vectors, where
$$B = \left\{ \begin{bmatrix}0\\0\\1\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}1\\0\\0\end{bmatrix} \right\} \qquad C = \left\{ \begin{bmatrix}1\\1\\1\end{bmatrix}, \begin{bmatrix}0\\1\\1\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix} \right\}$$
Solution. First we need to calculate $P_{CS}$ and $P_{SB}$. We can immediately write down $P_{SB}$ and $P_{SC}$, and use the fact that $P_{CS} = P_{SC}^{-1}$. We have
$$P_{SB} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad P_{SC} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \;\Longrightarrow\; P_{CS} = P_{SC}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$$
and hence
$$A_{CB} = P_{CS}A_{SS}P_{SB} = \begin{bmatrix} 4 & 3 & 1 \\ -3 & -4 & 1 \\ 0 & 6 & -5 \end{bmatrix}$$
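The matrix arithmetic in Example 12.8 is a one-liner to verify. A sketch (Python/NumPy, assumed available), using the sign convention adopted for $A_{SS}$ above:

```python
import numpy as np

A_SS = np.array([[ 1.0,  3.0, 4.0],
                 [ 2.0, -1.0, 1.0],
                 [-3.0,  5.0, 1.0]])
P_SB = np.array([[0.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0]])
P_SC = np.array([[1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0]])

A_CB = np.linalg.inv(P_SC) @ A_SS @ P_SB
print(A_CB)
# [[ 4.  3.  1.]
#  [-3. -4.  1.]
#  [ 0.  6. -5.]]
```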
Example 12.9. Recall that a square matrix $A$ is diagonalisable if there exists an invertible matrix $P$ such that $D = P^{-1}AP$ where $D$ is a diagonal matrix. This is in fact a change of basis formula. If $A$ represents a linear transformation using the standard basis for both the original and transformed vectors, that is, $A = A_{SS}$, then we can interpret $P$ as representing a change of basis, from a new basis $B$ to the standard basis $S$ (yes, it has to be this way around). That is, $P = P_{SB}$. Then the diagonalisation formula reads
$$D = P_{BS}A_{SS}P_{SB} \;\Longrightarrow\; D = A_{BB}$$
That is, solving the eigenvalue–eigenvector problem is equivalent to finding a basis in which the linear transformation has a particularly simple (in this case, diagonal) matrix representation.
Example 12.10. Engineering example. In a number of branches of engineering one encounters stress (and strain) tensors. These are in fact matrix representations of (linearised) mechanical considerations. For example, the stress tensor is used to calculate the stress in a given direction at any point of interest ($T^{(\mathbf{n})} = \sigma\mathbf{n}$ in one of the standard notations). The eigenvalues are referred to as principal stresses and the eigenvectors as principal directions. The well-known (to these branches of engineering) transformation rule for the stress tensor is essentially a change of basis formula.
Part IV
Sequences and Series: by
Luchezar Stoyanov,
Jennifer Hopwood and
Michael Giudici
13 Sequences and Series
13.1 Sequences
By a sequence we mean an infinite sequence of real numbers:
$$a_1, a_2, a_3, \ldots, a_n, \ldots$$
We denote such a sequence by $\{a_n\}$ or $\{a_n\}_{n=1}^{\infty}$. Sometimes our sequences will start with $a_m$ for some $m \neq 1$.
Example 13.1.
1. $1, \dfrac{1}{2}, \dfrac{1}{3}, \ldots, \dfrac{1}{n}, \ldots$ Here $a_n = \dfrac{1}{n}$ for all integers $n \geq 1$.
2. $b_n = (-1)^n n^3$ for $n \geq 1$ defines the sequence $-1, 2^3, -3^3, 4^3, -5^3, 6^3, \ldots$
3. For any integer $n \geq 1$, define $a_n = 1$ when $n$ is odd, and $a_n = 0$ when $n$ is even. This gives the sequence $1, 0, 1, 0, 1, 0, 1, 0, \ldots$
4. The sequence of the so-called Fibonacci numbers is defined recursively as follows: $a_1 = a_2 = 1$, $a_{n+2} = a_n + a_{n+1}$ for $n \geq 1$. This is then the sequence $1, 1, 2, 3, 5, 8, 13, \ldots$
In the same way that we could define the limit as $x \to \infty$ of a function $f(x)$ we can also define the limit of a sequence.¹ This is not surprising since a sequence can be regarded as a function with the domain being the set of positive integers.
¹ Again this definition can be made more precise along the lines of Section 6.1.1 in the following manner: We say that $\{a_n\}$ has a limit $L$ if for every $\varepsilon > 0$ there exists a positive integer $N$ such that $|a_n - L| < \varepsilon$ for all $n \geq N$.
Definition 13.2. (Intuitive definition of the limit of a sequence)
Let $\{a_n\}$ be a sequence and $L$ be a real number. We say that $\{a_n\}$ has a limit $L$ if we can make $a_n$ arbitrarily close to $L$ by taking $n$ to be sufficiently large. We denote this situation by
$$\lim_{n\to\infty} a_n = L$$
We say that $\{a_n\}$ is convergent if $\lim_{n\to\infty} a_n$ exists; otherwise we say that $\{a_n\}$ is divergent.
Example 13.3.
1. Let $a_n = b$ for all $n \geq 1$. Then $\lim_{n\to\infty} a_n = b$.
2. Consider the sequence $a_n = \dfrac{1}{n}$ ($n \geq 1$). Then $\lim_{n\to\infty} a_n = 0$.
3. If $\alpha > 0$ is a constant (that is, does not depend on $n$) and $a_n = \dfrac{1}{n^{\alpha}}$ for any $n \geq 1$, then $\lim_{n\to\infty} a_n = 0$. For instance, taking $\alpha = 1/2$ gives
$$1, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{3}}, \ldots, \frac{1}{\sqrt{n}}, \ldots \to 0$$
4. Consider the sequence $\{a_n\}$ from Example 13.1(3) above, that is,
$$1, 0, 1, 0, 1, 0, 1, 0, \ldots$$
This sequence is divergent.
Just as for limits of functions of a real variable, we have Limit Laws and a Squeeze Theorem:
Theorem 13.4 (Limit Laws). Let $\{a_n\}$ and $\{b_n\}$ be convergent sequences with $\lim_{n\to\infty} a_n = a$ and $\lim_{n\to\infty} b_n = b$. Then:
1. $\lim_{n\to\infty}(a_n \pm b_n) = a \pm b$.
2. $\lim_{n\to\infty}(c\,a_n) = c\,a$ for any constant $c \in \mathbb{R}$.
3. $\lim_{n\to\infty}(a_n b_n) = ab$.
4. If $b \neq 0$ and $b_n \neq 0$ for all $n$, then $\lim_{n\to\infty}\dfrac{a_n}{b_n} = \dfrac{a}{b}$.
Theorem 13.5 (The Squeeze Theorem or The Sandwich Theorem). Let $\{a_n\}$, $\{b_n\}$ and $\{c_n\}$ be sequences such that $\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n = a$ and
$$a_n \leq b_n \leq c_n$$
for all $n \geq 1$. Then the sequence $\{b_n\}$ is also convergent and $\lim_{n\to\infty} b_n = a$.
We can use Theorems 13.4 and 13.5 to calculate limits of various sequences.
Example 13.6. Using Theorem 13.4 several times,
$$\lim_{n\to\infty}\frac{n^2 - n + 1}{3n^2 + 2n - 1} = \lim_{n\to\infty}\frac{n^2(1 - 1/n + 1/n^2)}{n^2(3 + 2/n - 1/n^2)} = \frac{\displaystyle\lim_{n\to\infty}(1 - 1/n + 1/n^2)}{\displaystyle\lim_{n\to\infty}(3 + 2/n - 1/n^2)} = \frac{1 - 0 + 0}{3 + 0 - 0} = \frac{1}{3}$$
Here we used the fact that $\lim_{n\to\infty}\dfrac{1}{n^2} = 0$ (see Example 13.3(3)).
Example 13.7. Find $\lim_{n\to\infty}\dfrac{\cos(n)}{n}$ if it exists.
Solution. (Note: Theorem 13.4 is not applicable here, since $\lim_{n\to\infty}\cos(n)$ does not exist.) Since $-1 \leq \cos(n) \leq 1$ for all $n$, we have
$$-\frac{1}{n} \leq \frac{\cos(n)}{n} \leq \frac{1}{n}$$
for all $n \geq 1$. Using the Squeeze Theorem and the fact that
$$\lim_{n\to\infty}\left(-\frac{1}{n}\right) = \lim_{n\to\infty}\frac{1}{n} = 0$$
it follows that $\lim_{n\to\infty}\dfrac{\cos(n)}{n} = 0$.
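A numerical look at Example 13.7 (a sketch in Python/NumPy, assumed available) shows $\cos(n)/n$ squeezed between $\pm 1/n$ and tending to 0:

```python
import numpy as np

for n in (10, 100, 1000, 10000):
    a_n = np.cos(n) / n
    print(n, a_n, abs(a_n) <= 1.0 / n)   # the bound |a_n| <= 1/n always holds
```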
A particular way for a sequence to be divergent is if it diverges to $\infty$ or $-\infty$.
Definition 13.8. (Diverging to infinity)
We say that the sequence $\{a_n\}_{n=1}^{\infty}$ diverges to $\infty$ if given any positive number $M$ we can always find a point in the sequence after which all terms are greater than $M$. We denote this by $\lim_{n\to\infty} a_n = \infty$ or $a_n \to \infty$.
Similarly, we say that $\{a_n\}_{n=1}^{\infty}$ diverges to $-\infty$ if given any negative number $M$ we can always find a point in the sequence after which all terms are less than $M$. We denote this by $\lim_{n\to\infty} a_n = -\infty$ or $a_n \to -\infty$.
Note, as in the limit of a function case, when we write $\lim_{n\to\infty} a_n = \infty$ we do not mean that the limit exists and is equal to some special number $\infty$. We are just using a combination of symbols which it has been agreed will be taken to mean "the limit does not exist and the reason it does not exist is that the terms in the sequence increase without bound". In particular, $\infty$ IS NOT a number, so do not try to do arithmetic with it. If you do you will almost certainly end up with incorrect results, sooner or later, and you will always be writing nonsense.
Note that it follows from the definitions that if $a_n \to \pm\infty$, then $|a_n| \to \infty$.
Example 13.9.
1. Let $a_n = n$ and $b_n = -n^2$ for all $n \geq 1$. Then $a_n \to \infty$, while $b_n \to -\infty$, and $|b_n| = n^2 \to \infty$.
2. $a_n = (-1)^n n$ does not diverge to $\infty$ and $\{a_n\}$ does not diverge to $-\infty$, either. However, $|a_n| = n \to \infty$.
3. Let $r > 1$ be a constant. Then $r^n \to \infty$. If $r < -1$, then $r^n$ does not diverge to $\infty$ or $-\infty$. However, $|r^n| = |r|^n \to \infty$.
4. If $a_n \neq 0$ for all $n$, then $a_n \to 0$ if and only if $\dfrac{1}{|a_n|} \to \infty$. Similarly, if $a_n > 0$ for all $n \geq 1$, then $a_n \to \infty$ if and only if $\dfrac{1}{a_n} \to 0$. (These properties can be easily derived from the definitions.)
13.1.1 Bounded sequences
Definition 13.10. (Bounded)
A sequence $\{a_n\}_{n=1}^{\infty}$ is called bounded above if $a_n$ is less than or equal to some number (an upper bound) for all $n$. Similarly, it is called bounded below if $a_n$ is greater than or equal to some number (a lower bound) for all $n$.
We say that a sequence is bounded if it has both an upper and a lower bound.
In Example 13.1, the sequences in parts (1) and (3) are bounded, the one in part (2) is bounded neither above nor below, while the sequence in part (4) is bounded below but not above.
Theorem 13.11. Every convergent sequence is bounded.
It is important to note that the converse statement of Theorem 13.11 is not true; there exist bounded sequences that are divergent. See Example 13.1(3) above. So this theorem is not used to prove that sequences are convergent but to prove that they are not, as in the example below.
Example 13.12. The sequence $a_n = (-1)^n n^3$ is not bounded, so by Theorem 13.11 it is divergent.
An upper bound of a sequence is not unique. For example, both 1 and 2 are upper bounds for the sequence $a_n = \frac{1}{n}$. This motivates the following definition.
Definition 13.13. (Supremum and infimum)
Let $A \subseteq \mathbb{R}$. The least upper bound of $A$ (whenever $A$ is bounded above) is called the supremum of $A$ and is denoted $\sup A$.
Similarly, the greatest lower bound of $A$ (whenever $A$ is bounded below) is called the infimum of $A$ and is denoted $\inf A$.
Notice that if the set $A$ has a maximum, then $\sup A$ is the largest element of $A$. Similarly, if $A$ has a minimum, then $\inf A$ is the smallest element of $A$. However, $\sup A$ always exists when $A$ is bounded above, even when $A$ has no maximal element. For example, if $A = (0, 1)$ then $A$ does not have a maximal element (for any $a \in A$ there is always $b \in A$ such that $a < b$). However, it has a supremum and $\sup A = 1$. Similarly, $\inf A$ always exists when $A$ is bounded below, regardless of whether $A$ has a minimum or not. For example, if $A = (0, 1)$ then $\inf A = 0$.
Definition 13.14. (Monotone)
A sequence $\{a_n\}$ is called monotone if it is either non-decreasing or non-increasing, that is, if either $a_n \leq a_{n+1}$ for all $n$ or $a_n \geq a_{n+1}$ for all $n$.
We can now state one important property of sequences.
Theorem 13.15 (The Monotone Sequences Theorem). If the sequence $\{a_n\}_{n=1}^{\infty}$ is non-decreasing and bounded above, then the sequence is convergent and
$$\lim_{n\to\infty} a_n = \sup\{a_n\}$$
If $\{a_n\}_{n=1}^{\infty}$ is non-increasing and bounded below, then $\{a_n\}$ is convergent and
$$\lim_{n\to\infty} a_n = \inf\{a_n\}$$
That is, every monotone bounded sequence is convergent.
Theorem 13.15 and the definition of divergence to $\infty$ imply the following:
Key Concept 13.16. For any non-decreasing sequence
$$a_1 \leq a_2 \leq a_3 \leq \ldots \leq a_n \leq a_{n+1} \leq \ldots$$
there are only two options:
Option 1. The sequence is bounded above. Then by the Monotone Sequences Theorem the sequence is convergent, that is, $\lim_{n\to\infty} a_n$ exists.
Option 2. The sequence is not bounded above. It then follows from the definition of divergence to $\infty$ that $\lim_{n\to\infty} a_n = \infty$.
Similarly, for any non-increasing sequence
$$b_1 \geq b_2 \geq b_3 \geq \ldots \geq b_n \geq b_{n+1} \geq \ldots$$
either the sequence is bounded below and then $\lim_{n\to\infty} b_n$ exists, or the sequence is not bounded below and then $\lim_{n\to\infty} b_n = -\infty$.
Example 13.17. Let $a_n = \frac{1}{n}$ for all $n \geq 1$. As we know,
$$\lim_{n\to\infty} a_n = 0 = \inf\left\{ \frac{1}{n} \;\middle|\; n = 1, 2, \ldots \right\}$$
This just confirms Theorem 13.15.
Theorem 13.18.
1. For every real constant $\alpha > 0$,
$$\lim_{n\to\infty} \frac{\ln n}{n^{\alpha}} = 0 .$$
2.
$$\lim_{n\to\infty} \sqrt[n]{n} = \lim_{n\to\infty} n^{1/n} = 1 .$$
3. For every constant $a \in \mathbb{R}$,
$$\lim_{n\to\infty} \frac{a^n}{n!} = 0 .$$
4. For every constant $a \in \mathbb{R}$,
$$\lim_{n\to\infty} \left(1 + \frac{a}{n}\right)^n = e^a .$$
13.2 Infinite Series
Definition 13.19. (Infinite series)
An infinite series is by definition an expression of the form
$$\sum_{n=1}^{\infty} a_n = a_1 + a_2 + \ldots + a_n + \ldots \qquad (13.1)$$
where $a_1, a_2, \ldots, a_n, \ldots$ is a sequence of real numbers.
Sometimes we will label the series² by $\displaystyle\sum_{n=0}^{\infty} a_n = a_0 + a_1 + a_2 + \ldots$
² It is important to be clear about the difference between a sequence and a series. These terms are often used interchangeably in ordinary English but in mathematics they have distinctly different and precise meanings.
Since an infinite series involves the sum of infinitely many terms, this raises the issue of what the sum actually is. We can deal with this in a precise manner by the following definition.
Definition 13.20. (Convergent and divergent series)
For every $n \geq 1$,
$$s_n = a_1 + a_2 + \ldots + a_n$$
is called the $n$th partial sum of the series in (13.1). If $\lim_{n\to\infty} s_n = s$ we say that the infinite series is convergent and write
$$\sum_{n=1}^{\infty} a_n = s$$
The number $s$ is then called the sum of the series.
If $\lim_{n\to\infty} s_n$ does not exist, we say that the infinite series (13.1) is divergent.
Note that when a series is convergent we have
$$\sum_{n=1}^{\infty} a_n = \lim_{n\to\infty}(a_1 + a_2 + \ldots + a_n)$$
Example 13.21.
1. The interval $(0, 1]$ can be covered by subintervals as follows: first take the subinterval $[1/2, 1]$ (of length $1/2$), then the subinterval $[1/4, 1/2]$ (of length $1/4$), then the subinterval $[1/8, 1/4]$ (of length $1/8$), etc. The total length of these subintervals is 1, which implies
$$1 = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots + \frac{1}{2^n} + \ldots .$$
2. Given a constant $r \in \mathbb{R}$, consider the geometric series
$$\sum_{n=0}^{\infty} r^n = 1 + r + r^2 + \ldots + r^{n-1} + \ldots$$
For the $n$th partial sum of the above series we have
$$s_n = 1 + r + r^2 + \ldots + r^{n-1} = \frac{1 - r^n}{1 - r} \quad\text{when } r \neq 1.$$
(a) Let $|r| < 1$. Then $\lim_{n\to\infty} s_n = \lim_{n\to\infty}\dfrac{1 - r^n}{1 - r} = \dfrac{1}{1 - r}$, since $\lim r^n = 0$ whenever $|r| < 1$. Thus the geometric series is convergent in this case and we write
$$\frac{1}{1 - r} = 1 + r + r^2 + \ldots + r^{n-1} + \ldots \qquad (13.2)$$
Going back to part (1), now rigorously we have
$$\sum_{n=1}^{\infty} \frac{1}{2^n} = -1 + \sum_{n=0}^{\infty} \frac{1}{2^n} = -1 + \frac{1}{1 - 1/2} = 1$$
(b) If $|r| > 1$, then $\lim_{n\to\infty} r^n$ does not exist (in fact $|r^n| \to \infty$ as $n \to \infty$), so $s_n = \frac{1 - r^n}{1 - r}$ does not have a limit; that is, the geometric series is divergent.
(c) If $r = 1$, then $s_n = n \to \infty$ as $n \to \infty$, so the geometric series is again divergent.
(d) Let $r = -1$. Then $s_n = 1$ for odd $n$ and $s_n = 0$ for even $n$. Thus the sequence of partial sums is $1, 0, 1, 0, 1, 0, \ldots$, which is a divergent sequence. Hence the geometric series is again divergent.
Conclusion: the geometric series is convergent if and only if $|r| < 1$.
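The behaviour of the geometric series for different values of $r$ is easy to see from the partial sums. A sketch (Python, assumed available):

```python
def partial_sum(r, n):
    """n-th partial sum 1 + r + ... + r^(n-1) of the geometric series."""
    return sum(r**k for k in range(n))

for r in (0.5, -0.5, 2.0, -1.0):
    print(r, [round(partial_sum(r, n), 4) for n in (5, 10, 20)])
# r = 0.5  -> partial sums approach 1/(1 - 0.5) = 2
# r = -0.5 -> partial sums approach 1/(1 + 0.5) = 2/3
# r = 2    -> partial sums blow up (divergent)
# r = -1   -> partial sums oscillate between 1 and 0 (divergent)
```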
Theorem 13.22. If the series $\sum_{n=1}^{\infty} a_n$ is convergent, then $\lim_{n\to\infty} a_n = 0$.
The above theorem says that $\lim_{n\to\infty} a_n = 0$ is necessary for convergence. The theorem is most useful in the following equivalent form.
Theorem 13.23 (Test for divergence). If the sequence $\{a_n\}$ does not converge to 0, then the series $\sum_{n=1}^{\infty} a_n$ is divergent.
Example 13.24. The series $\displaystyle\sum_{n=1}^{\infty} \frac{n^2 + 2n + 1}{3n^2 + 4}$ is divergent by the Test for Divergence, since
$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} \frac{n^2 + 2n + 1}{3n^2 + 4} = \lim_{n\to\infty} \frac{1 + 2/n + 1/n^2}{3 + 4/n^2} = \frac{1}{3} \neq 0 .$$
Note that Theorem 13.22 does not say that $\lim_{n\to\infty} a_n = 0$ implies that the series is convergent. In fact, the following example shows that this is not the case.
Example 13.25. The harmonic series
$$1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} + \ldots$$
is divergent, even though $\lim_{n\to\infty} a_n = 0$.
Proof for the interested reader: First notice that $s_n < s_{n+1}$ for every $n \geq 1$, so the sequence $\{s_n\}$ of partial sums is increasing. Therefore, to show that $\lim_{n\to\infty} s_n = \infty$ it is enough to observe that the sequence $\{s_n\}$ is not bounded. To do this we will demonstrate that the subsequence
$$s_1, s_2, s_{2^2}, s_{2^3}, \ldots, s_{2^n}, \ldots \to \infty$$
as $n \to \infty$.
We have $s_1 = 1$ and $s_2 = 1 + \frac{1}{2}$. Next,
$$s_{2^2} = \left(1 + \frac{1}{2}\right) + \left(\frac{1}{3} + \frac{1}{4}\right) > s_2 + \left(\frac{1}{4} + \frac{1}{4}\right) = 1 + \frac{2}{2}$$
and
$$s_{2^3} = \left(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) > s_{2^2} + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) > 1 + \frac{2}{2} + \frac{1}{2} = 1 + \frac{3}{2}$$
Assume that for some $k \geq 1$ we have $s_{2^k} > 1 + \frac{k}{2}$. Then
$$s_{2^{k+1}} = \left(1 + \frac{1}{2} + \ldots + \frac{1}{2^k}\right) + \left(\frac{1}{2^k + 1} + \frac{1}{2^k + 2} + \ldots + \frac{1}{2^{k+1}}\right) > s_{2^k} + 2^k \cdot \frac{1}{2^{k+1}} > 1 + \frac{k}{2} + \frac{1}{2} = 1 + \frac{k+1}{2}$$
It now follows by the Principle of Mathematical Induction that
$$s_{2^n} > 1 + \frac{n}{2}$$
for all positive integers $n$. This obviously implies $\lim_{n\to\infty} s_{2^n} = \infty$ and therefore the sequence $\{s_n\}$ is unbounded. Since this sequence is increasing (as mentioned earlier), it follows that $\lim_{n\to\infty} s_n = \infty$, so the harmonic series is divergent.
Indeed, it can be shown that if $s_n$ is the $n$th partial sum of the harmonic series then
$$\frac{s_n}{\ln n} = \frac{1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n}}{\ln n} \to 1$$
as $n \to \infty$. That is, $s_n \to \infty$ at essentially the same rate as $\ln n \to \infty$.
Definition 13.26. (p-series)
One important series is the p-series given by
$$\sum_{n=1}^{\infty} \frac{1}{n^p}$$
Here $p \in \mathbb{R}$ is an arbitrary constant.
Note that the case $p = 1$ is the harmonic series, which we have already seen is divergent.
Theorem 13.27. The p-series $\displaystyle\sum_{n=1}^{\infty} \frac{1}{n^p}$ is convergent if and only if $p > 1$.
There are several results and tests that allow you to determine if a series is convergent or not. We outline some of them in the theorems below.
Theorem 13.28. If the infinite series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent, then the series $\sum_{n=1}^{\infty} (a_n \pm b_n)$ and $\sum_{n=1}^{\infty} c\,a_n$ (for any constant $c \in \mathbb{R}$) are also convergent, with
$$\sum_{n=1}^{\infty} (a_n \pm b_n) = \sum_{n=1}^{\infty} a_n \pm \sum_{n=1}^{\infty} b_n \quad\text{and}\quad \sum_{n=1}^{\infty} c\,a_n = c\sum_{n=1}^{\infty} a_n$$
Example 13.29. Using Theorem 13.28 and the formula for the sum of a (convergent) geometric series, we get
$$\sum_{n=0}^{\infty} \left[ \left(-\frac{2}{3}\right)^n + \frac{3}{4^{n+1}} \right] = \sum_{n=0}^{\infty} \left(-\frac{2}{3}\right)^n + \frac{3}{4}\sum_{n=0}^{\infty} \frac{1}{4^n} = \frac{1}{1 + 2/3} + \frac{3}{4}\cdot\frac{1}{1 - 1/4} = \frac{8}{5}$$
Another way to determine if a series converges is to compare it to a series whose behaviour we know. The first way to do this is in an analogous way to the Squeeze Theorem.
Theorem 13.30 (The Comparison Test). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be infinite series such that
$$0 \leq a_n \leq b_n$$
holds for all sufficiently large $n$.
1. If $\sum_{n=1}^{\infty} b_n$ is convergent then $\sum_{n=1}^{\infty} a_n$ is convergent.
2. If $\sum_{n=1}^{\infty} a_n$ is divergent then $\sum_{n=1}^{\infty} b_n$ is divergent.
Since we know the behaviour of a p-series, we usually use one in our comparison.
Example 13.31.
1. Consider the series
$$\sum_{n=1}^{\infty} \frac{1 + \sin(n)}{n^2}$$
Since $-1 \leq \sin(n) \leq 1$, we have $0 \leq 1 + \sin(n) \leq 2$ for all $n$. Thus
$$0 \leq \frac{1 + \sin(n)}{n^2} \leq \frac{2}{n^2} \qquad (13.3)$$
for all integers $n \geq 1$.
The series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is convergent, since it is a p-series with $p = 2 > 1$. Thus $\sum_{n=1}^{\infty} \frac{2}{n^2}$ is convergent, and now (13.3) and the Comparison Test show that the series $\sum_{n=1}^{\infty} \frac{1 + \sin(n)}{n^2}$ is also convergent.
2. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{\ln(n)}{n}$. Since
$$0 < \frac{1}{n} \leq \frac{\ln(n)}{n}$$
for all integers $n \geq 3$, and the series $\sum_{n=1}^{\infty} \frac{1}{n}$ is divergent (it is the harmonic series), the Comparison Test implies that the series $\sum_{n=1}^{\infty} \frac{\ln(n)}{n}$ is also divergent.
Another way to compare two series is to take the limit of the ratio of terms from the corresponding sequences. This allows us to compare the rates at which the two sequences go to 0.
Theorem 13.32 (The Limit Comparison Test). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be infinite series such that $a_n \geq 0$ and $b_n > 0$ for sufficiently large $n$, and let
$$c = \lim_{n\to\infty} \frac{a_n}{b_n} \geq 0$$
(a) If $0 < c < \infty$, then $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $\sum_{n=1}^{\infty} b_n$ is convergent.
(b) If $c = 0$ and $\sum_{n=1}^{\infty} b_n$ is convergent then $\sum_{n=1}^{\infty} a_n$ is convergent.
(c) If $\lim_{n\to\infty} \dfrac{a_n}{b_n} = \infty$ and $\sum_{n=1}^{\infty} a_n$ is convergent then $\sum_{n=1}^{\infty} b_n$ is convergent.
Remark 13.33.
1. Clearly in case (a) above we have that $\sum_{n=1}^{\infty} a_n$ is divergent whenever $\sum_{n=1}^{\infty} b_n$ is divergent. In case (b), if $\sum_{n=1}^{\infty} a_n$ is divergent, then $\sum_{n=1}^{\infty} b_n$ must also be divergent. And in case (c), if $\sum_{n=1}^{\infty} b_n$ is divergent, then $\sum_{n=1}^{\infty} a_n$ is divergent.
2. Notice that in cases (b) and (c) we have implications (not equivalences). For example, in case (b) if we know that $\sum_{n=1}^{\infty} a_n$ is convergent, we cannot claim the same for $\sum_{n=1}^{\infty} b_n$. Similarly, in case (c) if $\sum_{n=1}^{\infty} b_n$ is convergent, we cannot conclude the same about $\sum_{n=1}^{\infty} a_n$.
Example 13.34.
1. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{\sin^2(n) + n}{2n^2 - 1}$. To check whether the series is convergent or divergent we will compare it with the series $\sum_{n=1}^{\infty} b_n$, where $b_n = \frac{1}{n}$. For $a_n = \dfrac{\sin^2(n) + n}{2n^2 - 1}$, we have
$$\frac{a_n}{b_n} = \frac{n(\sin^2(n) + n)}{2n^2 - 1} = \frac{\dfrac{\sin^2(n)}{n} + 1}{2 - \dfrac{1}{n^2}} \;\to\; \frac{1}{2}$$
as $n \to \infty$, since $0 \leq \dfrac{\sin^2(n)}{n} \leq \dfrac{1}{n}$, so by the Squeeze Theorem $\lim_{n\to\infty} \dfrac{\sin^2(n)}{n} = 0$.
Now the series $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} \frac{1}{n}$ is divergent (it is the harmonic series), so part (a) of the Limit Comparison Test implies that $\sum_{n=1}^{\infty} a_n$ is also divergent.
2. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{2\sqrt{n} + 3}{3n^2 - 1}$. To check whether the series is convergent or divergent we will compare it with the series $\sum_{n=1}^{\infty} b_n$, where $b_n = \frac{1}{n^{3/2}}$. For $a_n = \dfrac{2\sqrt{n} + 3}{3n^2 - 1}$, we have
$$\frac{a_n}{b_n} = \frac{n^{3/2}(2\sqrt{n} + 3)}{3n^2 - 1} = \frac{2 + 3/\sqrt{n}}{3 - \dfrac{1}{n^2}} \;\to\; \frac{2}{3}$$
as $n \to \infty$.
Since the series $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} \frac{1}{n^{3/2}}$ is convergent (it is a p-series with $p = 3/2 > 1$), part (a) of the Limit Comparison Test shows that $\sum_{n=1}^{\infty} a_n$ is also convergent.
Alternating series are infinite series of the form
$$\sum_{n=1}^{\infty} (-1)^{n-1} a_n = a_1 - a_2 + a_3 - a_4 + a_5 - a_6 + \ldots$$
where $a_1, a_2, \ldots$ is a sequence with $a_n \geq 0$ for all $n$.
Theorem 13.35 (The Alternating Series Test). Let $\{a_n\}$ be a sequence such that $a_n \geq 0$ and $a_n \geq a_{n+1}$ for all $n \geq 1$, and such that $\lim_{n\to\infty} a_n = 0$. Then the alternating series
$$\sum_{n=1}^{\infty} (-1)^{n-1} a_n$$
is convergent. Moreover, if $s$ is the sum of the alternating series and $s_n$ its $n$th partial sum, then
$$|s - s_n| \leq a_{n+1}$$
for all $n \geq 1$.
Remark 13.36. The conclusion of Theorem 13.35 about the convergence of an alternating series remains true if we assume that $a_n \geq 0$ and $a_n \geq a_{n+1}$ for all sufficiently large $n$ and $\lim_{n\to\infty} a_n = 0$.
Example 13.37. The series
$$\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \ldots$$
is convergent according to the Alternating Series Test, since $a_n = \frac{1}{n} \geq 0$ for all $n \geq 1$, and the sequence $\{a_n\}$ is decreasing and converges to 0.
13.3 Absolute Convergence and the Ratio Test
We begin with a theorem.
Theorem 13.38. If the infinite series $\sum_{n=1}^{\infty} |a_n|$ is convergent, then the series $\sum_{n=1}^{\infty} a_n$ is also convergent.
This motivates the following definition.
Definition 13.39. (Absolute convergence)
An infinite series $\sum_{n=1}^{\infty} a_n$ is called absolutely convergent if the series $\sum_{n=1}^{\infty} |a_n|$ is convergent. If $\sum_{n=1}^{\infty} a_n$ is convergent but $\sum_{n=1}^{\infty} |a_n|$ is divergent then we say that $\sum_{n=1}^{\infty} a_n$ is conditionally convergent.
As Theorem 13.38 shows, every absolutely convergent series is convergent. The next example shows that the converse is not true.
Example 13.40.
1. The series $\displaystyle\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}$ is convergent (by the Alternating Series Test). However,
$$\sum_{n=1}^{\infty} \left| \frac{(-1)^{n-1}}{n} \right| = \sum_{n=1}^{\infty} \frac{1}{n}$$
is divergent (it is the harmonic series). Thus $\displaystyle\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}$ is conditionally convergent.
2. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{\sin(n)}{n^2}$. Since
$$0 \leq \left| \frac{\sin(n)}{n^2} \right| \leq \frac{1}{n^2}$$
and the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ is convergent (a p-series with $p = 2 > 1$), the Comparison Test implies that $\sum_{n=1}^{\infty} \left| \frac{\sin(n)}{n^2} \right|$ is convergent. Hence $\sum_{n=1}^{\infty} \frac{\sin(n)}{n^2}$ is absolutely convergent and therefore convergent.
Theorem 13.41 (The Ratio Test). Let $\sum_{n=1}^{\infty} a_n$ be such that
$$\lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = L$$
1. If $L < 1$, then $\sum_{n=1}^{\infty} a_n$ is absolutely convergent.
2. If $L > 1$, then $\sum_{n=1}^{\infty} a_n$ is divergent.
Note that when $L = 1$ the Ratio Test gives no information.
Example 13.42. Consider the series $\displaystyle\sum_{n=1}^{\infty} \frac{n^2}{2^n}$. We have
$$\left| \frac{a_{n+1}}{a_n} \right| = \frac{a_{n+1}}{a_n} = \frac{(n+1)^2/2^{n+1}}{n^2/2^n} = \frac{1}{2}\left(\frac{n+1}{n}\right)^2 \;\to\; \frac{1}{2}$$
as $n \to \infty$. Since $L = \frac{1}{2} < 1$, the Ratio Test implies that $\displaystyle\sum_{n=1}^{\infty} \frac{n^2}{2^n}$ is absolutely convergent.
Example 13.43. For any constant $b \in \mathbb{R}$ the infinite series $\displaystyle\sum_{n=0}^{\infty} \frac{b^n}{n!}$ is absolutely convergent. We have
$$\left| \frac{a_{n+1}}{a_n} \right| = \left| \frac{b^{n+1}/(n+1)!}{b^n/n!} \right| = \frac{|b|}{n+1} \;\to\; 0$$
as $n \to \infty$. So, by the Ratio Test, the series $\displaystyle\sum_{n=0}^{\infty} \frac{b^n}{n!}$ is absolutely convergent. In particular (by the Test for Divergence), $\lim_{n\to\infty} \dfrac{b^n}{n!} = 0$.
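The ratios in Examples 13.42 and 13.43 can be inspected directly. A sketch (Python, assumed available) showing $|a_{n+1}/a_n|$ approaching $1/2$ for $a_n = n^2/2^n$ and $0$ for $a_n = b^n/n!$ (with $b = 5$ and modest $n$ to avoid overflow):

```python
from math import factorial

def a_sq(n):                      # a_n = n^2 / 2^n
    return n ** 2 / 2 ** n

def a_exp(b, n):                  # a_n = b^n / n!
    return b ** n / factorial(n)

for n in (5, 20, 80):
    print(n,
          round(a_sq(n + 1) / a_sq(n), 4),                 # tends to 1/2
          round(abs(a_exp(5, n + 1) / a_exp(5, n)), 4))    # tends to 0
# Both limits are less than 1, so both series converge absolutely (Ratio Test).
```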
13.4 Power Series
Definition 13.44. (Power series)
Let $a \in \mathbb{R}$ be a given number, $\{a_n\}_{n=0}^{\infty}$ a given sequence of real numbers and $x \in \mathbb{R}$ a variable (parameter).
A series of the form
$$\sum_{n=0}^{\infty} a_n (x - a)^n = a_0 + a_1(x - a) + a_2(x - a)^2 + \ldots + a_n(x - a)^n + \ldots$$
is called a power series centred at $a$. When $a = 0$, the series is simply called a power series.
Clearly a power series can be regarded as a function of $x$ defined for all $x \in \mathbb{R}$ for which the infinite series is convergent.
Theorem 13.45. For a power series $\sum_{n=0}^{\infty} a_n(x - a)^n$ one of the following three possibilities occurs:
(a) The series is absolutely convergent for $x = a$ and divergent for all $x \neq a$.
(b) The series is absolutely convergent for all $x \in \mathbb{R}$.
(c) There exists $R > 0$ such that the series is absolutely convergent for $|x - a| < R$ and divergent for $|x - a| > R$.
In other words, a power series is convergent at only one point, or convergent everywhere, or convergent only on an interval centred at $a$. It is not possible for it to be convergent only at several separated points or on several separate intervals.
The number $R$ in part (c) of Theorem 13.45 is called the radius of convergence of the power series. In case (a) we define $R = 0$, while in (b) we set $R = \infty$. When $|x - a| = R$ the series may or may not be convergent. It is even possible for a series to be convergent for one value of $x$ for which $|x - a| = R$ and divergent for another.
Example 13.46. Find all $x \in \mathbb{R}$ for which the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}} (x + 3)^n \qquad (13.4)$$
is absolutely convergent, conditionally convergent or divergent.
Solution. When $x = -3$ the series is absolutely convergent.
Let $x \neq -3$. Then
$$\left| \frac{a_{n+1}}{a_n} \right| = \left| \frac{(-1)^{n+1}(x+3)^{n+1}/\sqrt{n+1}}{(-1)^n(x+3)^n/\sqrt{n}} \right| = |x + 3|\sqrt{\frac{n}{n+1}} \;\to\; |x + 3|$$
as $n \to \infty$. Thus, by the Ratio Test, the series is absolutely convergent for $|x + 3| < 1$, that is, for $-4 < x < -2$, and divergent for $|x + 3| > 1$, that is, for $x < -4$ and $x > -2$.
It remains to check the points $x = -4$ and $x = -2$.
Substituting $x = -4$ in (13.4) gives the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}(-1)^n = \sum_{n=1}^{\infty} \frac{1}{n^{1/2}}$$
which is divergent (it is a p-series with $p = 1/2 < 1$).
When $x = -2$, the series (13.4) becomes $\displaystyle\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}$, which is convergent by the Alternating Series Test. However,
$$\sum_{n=1}^{\infty} \left| \frac{(-1)^n}{\sqrt{n}} \right| = \sum_{n=1}^{\infty} \frac{1}{\sqrt{n}}$$
is divergent (as we mentioned above), so $\displaystyle\sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}}$ is conditionally convergent.
Conclusion: The series (13.4) is
1. absolutely convergent for $-4 < x < -2$;
2. conditionally convergent for $x = -2$;
3. divergent for $x \leq -4$ and $x > -2$.
Clearly (13.4) is a power series centred at $a = -3$. We have just shown that its radius of convergence is $R = 1$.
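The endpoint behaviour found in Example 13.46 can be explored numerically by evaluating partial sums at a few values of $x$. A sketch (Python, assumed available):

```python
from math import sqrt

def partial_sum(x, N):
    """Partial sum of sum_{n=1}^{N} (-1)^n / sqrt(n) * (x + 3)^n."""
    return sum((-1) ** n / sqrt(n) * (x + 3) ** n for n in range(1, N + 1))

for x in (-3.5, -2.5, -2.0, -1.5):
    print(x, [round(partial_sum(x, N), 4) for N in (10, 100, 1000)])
# Inside the interval (-4, -2) the partial sums settle down quickly;
# at x = -2 they converge slowly (conditional convergence);
# at x = -1.5 the terms grow and the partial sums blow up (divergence).
```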
Power series have the useful property that they can be differentiated term-by-term.
Theorem 13.47 (Term-by-term differentiation of a power series). Assume that the power series $\sum_{n=0}^{\infty} a_n(x - a)^n$ has a radius of convergence $R > 0$ and let $f(x)$ be defined by
$$f(x) = a_0 + a_1(x - a) + a_2(x - a)^2 + \ldots + a_n(x - a)^n + \ldots$$
for $|x - a| < R$. Then $f(x)$ is differentiable (and so continuous) for $|x - a| < R$ and
$$f'(x) = a_1 + 2a_2(x - a) + 3a_3(x - a)^2 + \ldots + n\,a_n(x - a)^{n-1} + \ldots$$
for $|x - a| < R$. Moreover, the radius of convergence of the power series representation for $f'(x)$ is $R$.
13.5 Taylor and MacLaurin Series
Definition 13.48. (Power series representation)
If for a function $f(x)$ we have
$$f(x) = a_0 + a_1(x - a) + a_2(x - a)^2 + \ldots + a_n(x - a)^n + \ldots$$
for all $x$ in some interval $I$ containing $a$, we say that the above is a power series representation for $f$ about $a$ on $I$. When $a = 0$ this is simply called a power series representation for $f$ on $I$.
For example, the formula for the sum of a geometric series gives
$$\frac{1}{1 - x} = 1 + x + x^2 + \ldots + x^n + \ldots \quad\text{for all } |x| < 1$$
which provides a power series representation for $f(x) = \dfrac{1}{1 - x}$ on $(-1, 1)$.
Suppose that a function $f(x)$ has a power series representation
$$f(x) = a_0 + a_1(x - a) + a_2(x - a)^2 + \ldots + a_n(x - a)^n + \ldots \qquad (13.5)$$
for those $x$ such that $|x - a| < R$ for some positive real number $R$. Substituting $x = a$ in (13.5) implies that $f(a) = a_0$. Next, differentiating (13.5) using Theorem 13.47 implies
$$f'(x) = a_1 + 2a_2(x - a) + 3a_3(x - a)^2 + \ldots + n\,a_n(x - a)^{n-1} + \ldots \qquad (13.6)$$
for all $|x - a| < R$. Then substituting $x = a$ in (13.6) gives $f'(a) = a_1$.
Similarly, differentiating (13.6) yields
$$f''(x) = 2a_2 + 6a_3(x - a) + \ldots + n(n-1)\,a_n(x - a)^{n-2} + \ldots$$
for all $|x - a| < R$, and substituting $x = a$ in this equality gives $f''(a) = 2a_2$, that is, $a_2 = \dfrac{f''(a)}{2!}$.
Continuing in this fashion we must have
$$a_n = \frac{f^{(n)}(a)}{n!}$$
for each $n$.
Definition 13.49. (Taylor Series)
Assume that $f(x)$ has derivatives of all orders on some interval $I$ containing the point $a$ in its interior. Then the power series
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \ldots$$
is called the Taylor series of $f$ about $a$. When $a = 0$ this series is called the MacLaurin Series of $f$.
Recall from Chapter 9 that
$$T_{n,a}(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \ldots + \frac{f^{(n)}(a)}{n!}(x - a)^n$$
is the $n$th Taylor polynomial of $f$ at $a$. Clearly this is the $(n + 1)$st partial sum of the Taylor series of $f$ at $a$. For any given $x \in I$ we have
$$f(x) = T_{n,a}(x) + R_{n,a}(x)$$
where $R_{n,a}(x)$ is the remainder (or error term). So, if for some $x \in I$ we have $R_{n,a}(x) \to 0$ as $n \to \infty$, then
$$f(x) = \lim_{n\to\infty} T_{n,a}(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n$$
That is, when $R_{n,a}(x) \to 0$ as $n \to \infty$, the Taylor series is convergent at $x$ and its sum is equal to $f(x)$.
Example 13.50.
1. Consider the function $f(x) = e^x$. Then $f^{(n)}(x) = e^x$ for all integers $n \geq 1$, so $f^{(n)}(0) = 1$ for all $n$. Thus, the Taylor series for $e^x$ about 0 has the form
$$\sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots + \frac{x^n}{n!} + \ldots \qquad (13.7)$$
We will now show that the sum of this series is equal to $e^x$ for all $x \in \mathbb{R}$.
Fix a constant $b > 0$. For any $x \in [-b, b]$ we have
$$f(x) = T_{n,0}(x) + R_{n,0}(x)$$
where $T_{n,0}(x)$ is the $n$th Taylor polynomial of $f$ about 0 and $R_{n,0}(x)$ is the corresponding remainder. By Taylor's formula,
$$R_{n,0}(x) = \frac{f^{(n+1)}(z)}{(n+1)!}x^{n+1}$$
for some $z$ between 0 and $x$. Thus $|z| \leq b$, and therefore
$$0 \leq |R_{n,0}(x)| = \left| \frac{f^{(n+1)}(z)}{(n+1)!}x^{n+1} \right| \leq \frac{e^z}{(n+1)!}\,b^{n+1} \leq \frac{e^b\, b^{n+1}}{(n+1)!} \to 0 ,$$
as $n \to \infty$ by Example 13.43. By the Squeeze Theorem, (for the given $x$) $\lim_{n\to\infty} |R_{n,0}(x)| = 0$. Thus the Taylor series (13.7) is convergent and its sum is $e^x$. Since $b > 0$ can be taken arbitrarily large, the above conclusion can be made about any $x \in \mathbb{R}$. Thus
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} \quad\text{for all } x \in \mathbb{R}.$$
2. Using the above argument (with slight modifications), one shows that
$$\sin(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!} = \frac{x}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} + \ldots$$
for all $x \in \mathbb{R}$. Here the right hand side is the Taylor series of $\sin(x)$ about 0.
3. Similarly,
$$\cos(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots + (-1)^n \frac{x^{2n}}{(2n)!} + \ldots$$
for all $x \in \mathbb{R}$. Here the right hand side is the Taylor series of $\cos(x)$ about 0.
Combining parts (1), (2) and (3) and extending the notion of a power series to the complex numbers gives a way of showing that $e^{ix} = \cos(x) + i\sin(x)$.
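The convergence of the Taylor series for $e^x$ (and the shrinking remainder used above) can be observed directly. A sketch (Python, assumed available) comparing Taylor partial sums with math.exp:

```python
from math import exp, factorial

def taylor_exp(x, n):
    """n-th Taylor polynomial of e^x about 0, evaluated at x."""
    return sum(x ** k / factorial(k) for k in range(n + 1))

x = 2.0
for n in (2, 5, 10, 20):
    approx = taylor_exp(x, n)
    print(n, approx, abs(exp(x) - approx))   # the error |R_{n,0}(x)| shrinks towards 0
```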
14 Index
p-series, 200
divergent, 194
absolute maximum, 129, 131
absolute minimum, 129, 131
absolutely convergent, 204
additive identity, 49
Alternating series, 203
associative, 48
augmented matrix, 15
auxiliary equation, 157
back substitution, 17
basic variables, 20
basis, 40
boundary, 132
boundary conditions, 162
boundary value problem, 162
bounded, 133
change of coordinates matrix, 185
characteristic equation, 157
closed, 132
codomain, 77
coefficient matrix, 12
column space, 51
column vector, 26
commutative, 48
commute, 49
conditionally convergent, 204
consistent, 10
continuous, 103
continuously differentiable, 109
contrapositive, 39
convergent, 194, 198
critical point, 130, 131
derivative, 109, 119
determinant, 69
diagonal matrix, 50
differentiable, 107, 119
differential equations, 145
dilation, 80
dimension, 44
direction field, 150
directional derivative, 124
divergent, 198
diverges to , 195
domain, 77
eigenvalues, 171
eigenvectors, 171
elementary matrix, 74
elementary row operation, 12
extremum, 129, 131
free parameter, 20
free variable, 20
full rank, 55
function, 77
Gaussian Elimination, 17
gradient vector, 125
group, 68
harmonic series, 200
Hessian matrix, 135
homogeneous, 23, 156
idempotent matrix, 50
identity, 50
identity matrix, 50, 80
image, 77
inconsistent., 10
independent, 38
infimum, 196
infinite series, 198
inflection point, 134
initial value problem, 155
inverse, 59, 61
inverse function, 82
invertible, 61, 82
Jacobian, 121
Jacobian matrix, 121
kernel, 83
leading entries, 19
leading entry, 16
leading variables, 20
left-distributive, 48
linear combination, 32
linear transformation, 77
linearly independent, 38
local maximum, 129, 131
local minimum, 129, 131
lower-triangular matrix, 50
MacLaurin Series, 208
main diagonal, 50
matrix, 47
matrix addition, 47
matrix multiplication, 47
matrix transposition, 47
monotone, 197
multiplicative identity, 49
negative definite, 136
nilpotent matrix, 50
non-basic variable, 20
non-invertible, 61
nonhomogeneous, 156
normal vector, 118
null space, 56
nullity, 58
open, 132
ordinary differential equation, 145
orthogonal projection, 78
partial derivative, 113
partial differential equation, 145
pivot entry, 17
pivot position, 17
polar coordinates, 102
positive definite, 136
power series, 207
product, 26
proof, 14
quadratic form, 136
rank, 55
Rank-Nullity Theorem, 58
ratio, 80
reduced row-echelon form, 64
right-distributive, 48
row echelon form, 16
row space, 51
row vector, 26
row-reduction, 17
saddle point, 132
scalar, 9
scalar multiple, 9
scalar multiplication, 25, 47
separable differential equation, 152
sequence, 193
similar, 187
skew-symmetric matrix, 50
span, 32
spanning set, 35
standard basis, 41
standard basis vectors, 41
standard matrix, 80
subspace, 27
sum, 26
supremum, 196
symmetric matrix, 50
systems of linear equations, 9
Taylor polynomial, 139
Taylor series, 208
transpose, 48
upper-triangular matrix, 50
vector addition, 25
vector space, 25
vector subspace, 27
vectors, 25
zero divisors, 50
zero matrix, 50
zero-vector, 27