
A Second Semester of Linear Algebra

S. E. Payne
19 January 09
Contents
1 Preliminaries
1.1 Fields
1.2 Groups
1.3 Matrix Algebra
1.4 Linear Equations Solved with Matrix Algebra
1.5 Exercises
2 Vector Spaces
2.1 Definition of Vector Space
2.1.1 Prototypical Example
2.1.2 A Second Example
2.2 Basic Properties of Vector Spaces
2.3 Subspaces
2.4 Sums and Direct Sums of Subspaces
2.5 Exercises
3 Dimensional Vector Spaces
3.1 The Span of a List
3.2 Linear Independence and the Concept of Basis
3.3 Exercises
4 Linear Transformations
4.1 Definitions and Examples
4.2 Kernels and Images
4.3 Rank and Nullity Applied to Matrices
4.4 Projections
4.5 Bases and Coordinate Matrices
4.6 Matrices as Linear Transformations
4.7 Change of Basis
4.8 Exercises
5 Polynomials
5.1 Algebras
5.2 The Algebra of Polynomials
5.3 Lagrange Interpolation
5.4 Polynomial Ideals
5.5 Exercises
6 Determinants
6.1 Determinant Functions
6.1.3 n-Linear Alternating Functions
6.1.5 A Determinant Function - The Laplace Expansion
6.2 Permutations & Uniqueness of Determinants
6.2.1 A Formula for the Determinant
6.3 Additional Properties of Determinants
6.3.1 If A is a unit in $M_n(K)$, then det(A) is a unit in K
6.3.2 Triangular Matrices
6.3.3 Transposes
6.3.4 Elementary Row Operations
6.3.5 Triangular Block Form
6.3.6 The Classical Adjoint
6.3.8 Characteristic Polynomial of a Linear Map
6.3.10 The Cayley-Hamilton Theorem
6.3.11 The Companion Matrix of a Polynomial
6.3.13 The Cayley-Hamilton Theorem: A Second Proof
6.3.14 Cramer's Rule
6.3.15 Comparing ST with TS
6.4 Deeper Results with Some Applications*
6.4.1 Block Matrices whose Blocks Commute*
6.4.2 Tensor Products of Matrices*
6.4.3 The Cauchy-Binet Theorem - A Special Version*
6.4.5 The Matrix-Tree Theorem*
6.4.10 The Cauchy-Binet Theorem - A General Version*
6.4.12 The General Laplace Expansion*
6.4.13 Determinants, Ranks and Linear Equations*
6.5 Exercises
7 Operators and Invariant Subspaces
7.1 Eigenvalues and Eigenvectors
7.2 Upper-Triangular Matrices
7.3 Invariant Subspaces of Real Vector Spaces
7.4 Two Commuting Linear Operators
7.5 Commuting Families of Operators*
7.6 The Fundamental Theorem of Algebra*
7.7 Exercises
8 Inner Product Spaces
8.1 Inner Products
8.2 Orthonormal Bases
8.3 Orthogonal Projection and Minimization
8.4 Linear Functionals and Adjoints
8.5 Exercises
9 Operators on Inner Product Spaces
9.1 Self-Adjoint Operators
9.2 Normal Operators
9.3 Decomposition of Real Normal Operators
9.4 Positive Operators
9.5 Isometries
9.6 The Polar Decomposition
9.7 The Singular-Value Decomposition
9.7.3 Two Examples
9.8 Pseudoinverses and Least Squares*
9.9 Norms, Distance and More on Least Squares*
9.10 The Rayleigh Principle*
9.11 Exercises
10 Decomposition WRT a Linear Operator
10.1 Powers of Operators
10.2 The Algebraic Multiplicity of an Eigenvalue
10.3 Elementary Operations
10.4 Transforming Nilpotent Matrices
10.5 An Alternative Approach to the Jordan Form
10.6 A Jordan Form for Real Matrices
10.7 Exercises
11 Matrix Functions*
11.1 Operator Norms and Matrix Norms*
11.2 Polynomials in an Elementary Jordan Matrix*
11.3 Scalar Functions of a Matrix*
11.4 Scalar Functions as Polynomials*
11.5 Power Series*
11.6 Commuting Matrices*
11.7 A Matrix Differential Equation*
11.8 Exercises
12 Infinite Dimensional Vector Spaces*
12.1 Partially Ordered Sets & Zorn's Lemma*
12.2 Bases for Vector Spaces*
12.3 A Theorem of Philip Hall*
12.4 A Theorem of Marshall Hall, Jr.*
12.5 Exercises*
Chapter 1
Preliminaries
Preface to the Student
This book is intended to be used as a text for a second semester of linear
algebra either at the senior or first-year-graduate level. It is written for you
under the assumption that you have already successfully completed a first
course in linear algebra and a first course in abstract algebra. The first
short chapter is a very quick review of the basic material with which you are
supposed to be familiar. If this material looks new, this text is probably not
written for you. On the other hand, if you made it into graduate school, you
must have already acquired some background in modern algebra. Perhaps all
you need is to spend a little time with your undergraduate texts reviewing the
most basic facts about equivalence relations, groups, matrix computations,
row reduction techniques, and the basic concepts of linear independence,
span, and basis in the context of $\mathbb{R}^n$.
On the other hand, some of you will be ready for a more advanced approach
to the material covered here than we can justify making a part of the
course. For this reason I have included some starred sections that may be
skipped without disturbing the general flow of ideas, but that might be of
interest to some students. If material from a starred section is ever cited in
a later section, that section will necessarily also be starred.
For the material we do cover in detail, we hope that you will find our
presentation to be thorough and our proofs to be complete and clear. However,
when we indicate that you should verify something for yourself, that means
you should use paper and pen and write out the appropriate steps in detail.
One major difference between your undergraduate linear algebra text and
this one is that we discuss abstract vector spaces over arbitrary fields instead
of restricting our discussion to the standard space of n-tuples of real numbers.
A second difference is that the emphasis is on linear transformations from one
vector space over the field $F$ to a second one, rather than on matrices over
$F$. However, a great deal of the general theory can be effortlessly translated
into results about matrices.
1.1 Fields
The fields of most concern in this text are the complex numbers $\mathbb{C}$, the real
numbers $\mathbb{R}$, the rational numbers $\mathbb{Q}$, and the finite Galois fields $GF(q)$, where
$q$ is a prime power. In fact, it is possible to work through the entire book and
use only the fields $\mathbb{R}$ and $\mathbb{C}$. And if finite fields are being considered, most
of the time it is possible to use just those finite fields with a prime number of
elements, i.e., the fields $\mathbb{Z}_p \cong \mathbb{Z}/p\mathbb{Z}$, where $p$ is a prime integer, with the
algebraic operations just being addition and multiplication modulo $p$. For
the purpose of reading this book it is sufficient to be able to work with the
fields just mentioned. However, we urge you to pick up your undergraduate
abstract algebra text and review what a field is. For most purposes the
symbol $F$ denotes an arbitrary field except where inner products and/or
norms are involved. In those cases $F$ denotes either $\mathbb{R}$ or $\mathbb{C}$.
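As a small illustration of arithmetic in a prime field, the following Python sketch implements addition, multiplication and inversion modulo a prime. The prime 7 and the function names `add_mod`, `mul_mod`, `inv_mod` are choices made only for this example; the inverse is computed with Fermat's little theorem, which is one of several possible methods.

```python
# Arithmetic in Z_p, the integers modulo a prime p (illustrative sketch).
P = 7  # an arbitrary small prime chosen for this example


def add_mod(a, b, p=P):
    """Sum of two residues modulo p."""
    return (a + b) % p


def mul_mod(a, b, p=P):
    """Product of two residues modulo p."""
    return (a * b) % p


def inv_mod(a, p=P):
    """Multiplicative inverse of a nonzero residue, using a^(p-2) = a^(-1) mod p."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return pow(a, p - 2, p)


# Table of inverses of the nonzero elements of Z_7.
inverses = {a: inv_mod(a) for a in range(1, P)}
```

Every nonzero residue appears in `inverses`, which is exactly what makes $\mathbb{Z}_p$ a field rather than just a ring.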
Kronecker delta. When the underlying field $F$ is understood, the symbol
$\delta_{ij}$ (called the Kronecker delta) denotes the element $1 \in F$ if $i = j$ and
the element $0 \in F$ if $i \neq j$. Occasionally it means 0 or 1 in some other
structure, but the context should make that clear.
If you are not really familiar with the field of complex numbers, you should
spend a little time getting acquainted. A complex number $\alpha = a + bi$, where
$a, b \in \mathbb{R}$ and $i^2 = -1$, has a conjugate $\bar{\alpha} = a - bi$, and
$\alpha\bar{\alpha} = a^2 + b^2 = |\alpha|^2$.
It is easily checked that $\overline{\alpha\beta} = \bar{\alpha}\,\bar{\beta}$. Note that
$|\alpha| = \sqrt{a^2 + b^2} = 0$ if and only if $\alpha = 0$. You should show how to
compute the multiplicative inverse of any nonzero complex number.
We define the square-root symbol $\sqrt{\ }$ as usual: For $0 \le x \in \mathbb{R}$, put
$\sqrt{x} = |y|$ where $y$ is a real number such that $y^2 = x$.
The following two lemmas are often useful.
Lemma 1.1.1. Every complex number has a square root.
Proof. Let $\alpha = a + bi$ be any complex number. With the definition of $\sqrt{\ }$
given above, and with $\gamma = \sqrt{a^2 + b^2} \ge |a|$, we find that
$$\left(\sqrt{\frac{\gamma + a}{2}} \pm i\sqrt{\frac{\gamma - a}{2}}\right)^2 = a \pm |b|i.$$
Now just pick the sign $\pm$ so that $\pm|b| = b$.
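The formula in this proof can be checked numerically. The Python sketch below is only an illustration in floating-point arithmetic; the function name `complex_sqrt` is hypothetical, and `gamma` stands for $\sqrt{a^2 + b^2}$ as in the proof.

```python
import math


def complex_sqrt(a, b):
    """One square root of a + bi, built from real square roots only."""
    gamma = math.hypot(a, b)  # sqrt(a^2 + b^2), which is >= |a|
    # max(..., 0.0) guards against tiny negative values from rounding.
    real_part = math.sqrt(max((gamma + a) / 2, 0.0))
    imag_part = math.sqrt(max((gamma - a) / 2, 0.0))
    if b < 0:
        imag_part = -imag_part  # pick the sign so that (+/-)|b| = b
    return complex(real_part, imag_part)


root = complex_sqrt(3, 4)      # a square root of 3 + 4i
check = root * root            # should reproduce 3 + 4i up to rounding
```

For $3 + 4i$ we have $\gamma = 5$, so the two real square roots are $\sqrt{4}$ and $\sqrt{1}$ and the computed root is $2 + i$.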
Lemma 1.1.2. Every polynomial of odd degree with real coefficients has a
(real) zero.
Proof. It suffices to prove that a monic polynomial
$$P(x) = x^n + a_1x^{n-1} + \cdots + a_n$$
with some $a_i \neq 0$, with $a_1, \ldots, a_n \in \mathbb{R}$ and $n$ odd has a zero. Put
$a = |a_1| + |a_2| + \cdots + |a_n| + 1 > 1$, and $\epsilon = \pm 1$. Then
$$|a_1(\epsilon a)^{n-1} + \cdots + a_{n-1}(\epsilon a) + a_n| \le |a_1|a^{n-1} + \cdots + |a_{n-1}|a + |a_n|$$
$$\le (|a_1| + |a_2| + \cdots + |a_n|)a^{n-1} = (a - 1)(a^{n-1}) < a^n.$$
It readily follows that $P(a) > 0$ and $P(-a) < 0$. (Check this out for
yourself!) Hence by the Intermediate Value Theorem (from Calculus) there
is a $\lambda \in (-a, a)$ such that $P(\lambda) = 0$.
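The bound $a = |a_1| + \cdots + |a_n| + 1$ from this proof gives a concrete interval $[-a, a]$ on which a zero can be located. The Python sketch below uses bisection, a numerical stand-in for the Intermediate Value Theorem and not part of the proof itself; the names `odd_degree_root` and `coeffs` are hypothetical.

```python
def odd_degree_root(coeffs, tol=1e-12):
    """Approximate a real zero of the monic polynomial
    P(x) = x^n + a_1 x^(n-1) + ... + a_n, where coeffs = [a_1, ..., a_n], n odd."""
    n = len(coeffs)
    if n % 2 == 0:
        raise ValueError("the degree n must be odd")

    def P(x):
        value = 1.0  # leading coefficient of the monic polynomial
        for c in coeffs:
            value = value * x + c  # Horner's rule
        return value

    a = sum(abs(c) for c in coeffs) + 1.0
    lo, hi = -a, a  # P(lo) < 0 < P(hi) by the estimate in the proof
    for _ in range(200):  # bisection keeps P(lo) < 0 <= P(hi)
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        if P(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


r = odd_degree_root([0, -2, -5])  # P(x) = x^3 - 2x - 5, so a = 8
```

The returned value is only an approximation to a zero; the lemma itself guarantees that an exact real zero exists in $(-a, a)$.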
1.2 Groups
Let $G$ be an arbitrary set and let $\circ : G \times G \to G$ be a binary operation on $G$.
Usually we denote the image of $(g_1, g_2) \in G \times G$ under the map $\circ$ by $g_1 \circ g_2$.
Then you should know what it means for $(G, \circ)$ to be a group. If this is the
case, $G$ is an abelian group provided $g_1 \circ g_2 = g_2 \circ g_1$ for all $g_1, g_2 \in G$. Our
primary example of a group will be a vector space whose elements (called
vectors) form an abelian group under vector addition. It is also helpful if you
remember how to construct the quotient group $G/N$, where $N$ is a normal
subgroup of $G$. However, this latter concept will be introduced in detail in
the special case where it is needed.
1.3 Matrix Algebra
An $m \times n$ matrix $A$ over $F$ is an $m$ by $n$ array of elements from the field $F$.
We may think of $A$ as an ordered list of $m$ row vectors from $F^n$ or equally
well as an ordered list of $n$ column vectors from $F^m$. The element in row $i$
and column $j$ is usually denoted $A_{ij}$. The symbol $M_{m,n}(F)$ denotes the set
of all $m \times n$ matrices over $F$. It is readily made into a vector space over $F$.
For $A, B \in M_{m,n}(F)$ define $A + B$ by
$$(A + B)_{ij} = A_{ij} + B_{ij}.$$
Similarly, define scalar multiplication by
$$(aA)_{ij} = aA_{ij}.$$
Just to practice rehearsing the axioms for a vector space, you should show
that $M_{m,n}(F)$ really is a vector space over $F$. (See Section 2.1.)
If $A$ and $B$ are $m \times n$ and $n \times p$ over $F$, respectively, then the product
$AB$ is an $m \times p$ matrix over $F$ defined by
$$(AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}.$$
Lemma 1.3.1. Matrix multiplication, when defined, is associative.
Proof. (Sketch) If $A, B, C$ are $m \times n$, $n \times p$ and $p \times q$ matrices over $F$,
respectively, then
$$((AB)C)_{ij} = \sum_{l=1}^{p} (AB)_{il}C_{lj} = \sum_{l=1}^{p}\left(\sum_{k=1}^{n} A_{ik}B_{kl}\right)C_{lj}$$
$$= \sum_{1 \le k \le n;\ 1 \le l \le p} A_{ik}B_{kl}C_{lj} = (A(BC))_{ij}.$$
The following observations are not especially deep, but they come in so
handy that you should think about them until they are second nature to you.
Obs. 1.3.2. The $i$th row of $AB$ is the $i$th row of $A$ times the matrix $B$.
Obs. 1.3.3. The $j$th column of $AB$ is the matrix $A$ times the $j$th column of
$B$.
If $A$ is $m \times n$, then the $n \times m$ matrix whose $(i, j)$ entry is the $(j, i)$ entry of $A$
is called the transpose of $A$ and is denoted $A^T$.
Obs. 1.3.4. If $A$ is $m \times n$ and $B$ is $n \times p$, then $(AB)^T = B^TA^T$.
Obs. 1.3.5. If $A$ is $m \times n$ with columns $C_1, \ldots, C_n$ and $X = (x_1, \ldots, x_n)^T$ is
$n \times 1$, then $AX$ is the linear combination of columns of $A$ given by $\sum_{j=1}^{n} x_jC_j$.
Obs. 1.3.6. If $A$ is $m \times n$ with rows $\alpha_1, \ldots, \alpha_m$ and $X = (x_1, \ldots, x_m)$, then
$XA$ is the linear combination of rows of $A$ given by $\sum_{i=1}^{m} x_i\alpha_i$.
At this point you should review block multiplication of partitioned
matrices. The following special case is sometimes helpful.
Obs. 1.3.7. If $A$ is $m \times n$ and $B$ is $n \times p$, view $A$ as partitioned into $n$
column vectors in $F^m$ and view $B$ as partitioned into $n$ row vectors in $F^p$.
Then using block multiplication we can view $AB$ as
$$AB = \sum_{j=1}^{n} \mathrm{col}_j(A) \cdot \mathrm{row}_j(B).$$
Note that $\mathrm{col}_j(A) \cdot \mathrm{row}_j(B)$ is an $m \times p$ matrix with rank at most 1.
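The column-times-row expansion of Obs. 1.3.7 can be tried on a small example. In the Python sketch below all names (`matmul`, `outer`, `outer_product_expansion`) and the sample matrices are hypothetical; the sketch adds up the $n$ rank-at-most-one matrices and compares the result with the usual product.

```python
def matmul(A, B):
    """Usual matrix product of lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def outer(col, row):
    """The m x p matrix col * row built from a column (length m) and a row (length p)."""
    return [[c * r for r in row] for c in col]


def outer_product_expansion(A, B):
    """Sum over j of col_j(A) * row_j(B)."""
    m, n, p = len(A), len(B), len(B[0])
    total = [[0] * p for _ in range(m)]
    for j in range(n):
        col_j = [A[i][j] for i in range(m)]   # jth column of A
        term = outer(col_j, B[j])             # B[j] is the jth row of B
        total = [[t + s for t, s in zip(trow, srow)]
                 for trow, srow in zip(total, term)]
    return total


A = [[1, 2, 0], [0, 1, 3]]      # 2 x 3
B = [[1, 0], [2, 1], [0, 4]]    # 3 x 2
expanded = outer_product_expansion(A, B)
```

Each `term` has rows that are multiples of the single row `B[j]`, which is why its rank is at most 1.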
1.4 Linear Equations Solved with Matrix
Algebra
For each $i$, $1 \le i \le m$, let
$$\sum_{j=1}^{n} a_{ij}x_j = b_i$$
be a linear equation in the indeterminates $x_1, \ldots, x_n$ with the coefficients $a_{ij}$
and $b_i$ all being real numbers. One of the first things you learned to do in
your undergraduate linear algebra course was to replace this system of linear
equations with a single matrix equation of the form
$Ax = \vec{b}$, where $A = (a_{ij})$ is $m \times n$ with $m$ rows and $n$ columns,
and $x = (x_1, \ldots, x_n)^T$, $\vec{b} = (b_1, \ldots, b_m)^T$.
You then augmented the matrix $A$ with the column $\vec{b}$ to obtain
$$A' = \begin{pmatrix} a_{11} & \cdots & a_{1n} & b_1 \\ a_{21} & \cdots & a_{2n} & b_2 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{pmatrix}.$$
At this point you performed elementary row operations on the matrix $A'$
so as to replace the submatrix $A$ with a row-reduced echelon matrix $R$ and
at the same time replace $\vec{b}$ with a probably different column vector $\vec{b}'$. You
then used this matrix $(R\ \vec{b}')$ to read off all sorts of information about the
original matrix $A$ and the system of linear equations. The matrix $R$ has $r$
nonzero rows for some $r$, $0 \le r \le m$. The leftmost nonzero entry in each row
of $R$ is a 1, and it is the only nonzero entry in its column. Then the matrix
$(R\ \vec{b}')$ is used to solve the original system of equations by writing each of the
leading variables as a linear combination of the other (free) variables. The
$r$ nonzero rows of $R$ form a basis for the row space of $A$, the vector subspace
of $\mathbb{R}^n$ spanned by the rows of $A$. The columns of $A$ in the same positions as
the leading 1s of $R$ form a basis for the column space of $A$, i.e., the subspace
of $\mathbb{R}^m$ spanned by the columns of $A$. There are $n - r$ free variables. Let each
of these variables take a turn being equal to 1 while the other free variables
are all equal to 0, and solve for the leading variables using the matrix
equation $Rx = \vec{0}$. This gives a basis of the (right) null space of $A$, i.e., a basis
of the space of solutions to the system of homogeneous equations obtained
by replacing $\vec{b}$ with $\vec{0}$. In particular we note that $r$ is the dimension of both
the row space of $A$ and of the column space of $A$. We call $r$ the rank of $A$.
The dimension of the right null space of $A$ is $n - r$. Replacing $A$ with its
transpose $A^T$, so that the left null space of $A$ becomes the right null space
of $A^T$, shows that the left null space of $A$ has dimension $m - r$.
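The procedure just reviewed can be sketched in a few lines of Python. The sketch works over the rationals with exact arithmetic (`fractions.Fraction`); the function names `rref` and `null_space_basis` and the $3 \times 3$ matrix `M` are illustrative choices, not taken from the text.

```python
from fractions import Fraction


def rref(M):
    """Return (R, pivots): the row-reduced echelon form of M and the
    column indices of its leading 1s."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pivot is None:
            continue                        # no leading entry in this column
        R[r], R[pivot] = R[pivot], R[r]     # row swap
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]     # scale so the leading entry is 1
        for i in range(rows):               # clear the rest of column c
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [x - factor * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


def null_space_basis(M):
    """Basis of {x : Mx = 0}: each free variable takes a turn being 1."""
    R, pivots = rref(M)
    n = len(M[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][f]         # solve for the leading variables
        basis.append(v)
    return basis


M = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
R, pivots = rref(M)
rank = len(pivots)
kernel = null_space_basis(M)
```

Here `rank` counts the leading 1s, the columns of `M` indexed by `pivots` give a basis of the column space, and `kernel` has `n - rank` vectors.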
A good reference for the basic material is the following text: David C.
Lay, LINEAR ALGEBRA AND ITS APPLICATIONS, 3rd Edition, Addison
Wesley, 2003. An excellent reference at a somewhat higher level is R. A. Horn
and C. R. Johnson, MATRIX ANALYSIS, Cambridge Univ. Press, 1985.
1.5 Exercises
1. Consider the matrix
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 3 \\ 2 & 1 & 0 \\ 1 & 2 & 3 \end{pmatrix}.$$
(a) Find the rank of $A$.
(b) Compute a basis for the column space of $A$, the row space of $A$,
the null space of $A$, and the null space of $A^T$.
(c) Give the orthogonality relations among the four subspaces of part
(b).
(d) Find all solutions $x = (x_1, x_2, x_3)^T$ to $Ax = (1, 3, 3, 0)^T$; and to
$Ax = (3, 1, 4, 4)^T$.
2. Block Multiplication of matrices. Let $A$ be an $m \times n$ matrix over $F$
whose elements have been partitioned into blocks. Also, let $B$ be an
$n \times p$ matrix over $F$ whose elements are partitioned into blocks so that
the number of blocks in each row of $A$ is the same as the number of
blocks in each column of $B$. Moreover, suppose that the block $A_{ij}$ in the
$i$-th row and $j$-th column of blocks of $A$ and the block $B_{jk}$ in the $j$-th
row and $k$-th column of blocks of $B$ are compatible for multiplication,
i.e., $A_{ij} \cdot B_{jk}$ should be well-defined for all appropriate indices $i, j, k$.
Then the matrix product $AB$ (which is well-defined) may be computed
by block multiplication. For example, if $A$ and $B$ are partitioned in
this way, then the $(i, j)$-th block of $AB$ is given by the natural formula
$$(AB)_{ij} = \left( \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1r} \\ A_{21} & A_{22} & \cdots & A_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ A_{s1} & A_{s2} & \cdots & A_{sr} \end{pmatrix} \cdot \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1t} \\ B_{21} & B_{22} & \cdots & B_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ B_{r1} & B_{r2} & \cdots & B_{rt} \end{pmatrix} \right)_{ij} = \sum_{k=1}^{r} A_{ik}B_{kj}.$$
3. Suppose the matrix $A$ (which might actually be huge!) is partitioned
into four blocks in such a way that it is block upper triangular and the
two diagonal blocks are square and are invertible.
$$A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}.$$
Knowing that $A_{11}$ and $A_{22}$ are both invertible, compute $A^{-1}$. Then
illustrate your solution by using block computations to compute the
inverse of the following matrix.
$$A = \begin{pmatrix} 1 & 1 & 1 & 1 & 2 \\ 3 & 2 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
Chapter 2
Vector Spaces
Linear algebra is primarily the study of linear maps on finite-dimensional
vector spaces. In this chapter we define the concept of vector space and
discuss its elementary properties.
2.1 Definition of Vector Space
A vector space over the field $F$ is a set $V$ together with a binary operation "+"
on $V$ such that $(V, +)$ is an abelian group, along with a scalar multiplication
on $V$ (i.e., a map $F \times V \to V$) such that the following properties hold:
1. For each $a \in F$ and each $v \in V$, $av$ is a unique element of $V$ with
$1 \cdot v = v$ for all $v \in V$. Here 1 denotes the multiplicative identity of $F$.
2. Scalar multiplication distributes over vector addition: $a(u + v) =
(au) + (av)$, which is usually written as $au + av$, for all $a \in F$ and all
$u, v \in V$.
3. $(a + b)u = au + bu$, for all $a, b \in F$ and all $u \in V$.
4. $a(bv) = (ab)v$, for all $a, b \in F$ and all $v \in V$.
2.1.1 Prototypical Example
Let $S$ be any nonempty set and let $F$ be any field. Put
$V = F^S = \{f : S \to F : f \text{ is a function}\}$. Then we can make $V$ into a vector
space over $F$ as follows. For $f, g \in V$, define the vector sum $f + g : S \to F$ by
$$(f + g)(s) = f(s) + g(s) \text{ for all } s \in S.$$
Then define a scalar multiplication $F \times V \to V$ as follows:
$$(af)(s) = a(f(s)) \text{ for all } a \in F,\ f \in V,\ s \in S.$$
It is a very easy but worthwhile exercise to show that with this vector
addition and this scalar multiplication, $V$ is a vector space over $F$. It is also
interesting to see that this family of examples includes (in some abstract sense)
all the examples of vector spaces you might have studied in your first linear
algebra course. For example, let $F = \mathbb{R}$ and let $S = \{1, 2, \ldots, n\}$. Then
each $f : \{1, 2, \ldots, n\} \to \mathbb{R}$ is given by the $n$-tuple $(f(1), f(2), \ldots, f(n))$. So
with almost no change in your point of view you can see that this example
is essentially just $\mathbb{R}^n$ as you knew it before.
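To make the prototypical example concrete, a function $f : S \to F$ can be stored as a Python dictionary keyed by the elements of $S$. In the sketch below the set `S`, the field $\mathbb{Z}_5$, and the names `vec_add` and `scalar_mul` are all assumptions made for illustration.

```python
P = 5                   # work over the field Z_5 (an arbitrary choice)
S = ["x", "y", "z"]     # any nonempty set will do


def vec_add(f, g):
    """(f + g)(s) = f(s) + g(s), computed in Z_5 for every s in S."""
    return {s: (f[s] + g[s]) % P for s in S}


def scalar_mul(a, f):
    """(a f)(s) = a * f(s), computed in Z_5 for every s in S."""
    return {s: (a * f[s]) % P for s in S}


f = {"x": 1, "y": 4, "z": 0}
g = {"x": 3, "y": 2, "z": 2}

total = vec_add(f, g)
# One instance of the distributive law a(f + g) = af + ag:
lhs = scalar_mul(2, vec_add(f, g))
rhs = vec_add(scalar_mul(2, f), scalar_mul(2, g))
```

With `S = [1, 2, ..., n]` the dictionary is just an $n$-tuple in disguise, which matches the remark that this example is essentially $F^n$.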
2.1.2 A Second Example
Let $F$ be any field and let $x$ be an indeterminate over $F$. Then the ring $F[x]$
of all polynomials in $x$ with coefficients in $F$ is a vector space over $F$. In
Chapter 5 we will see how to view $F[x]$ as a subspace of the special case of
the preceding example where $S = \{0, 1, 2, \ldots\}$. However, in this case, any
two elements of $F[x]$ can be multiplied to give another element of $F[x]$, and
$F[x]$ has the structure of a commutative ring with 1. It is even an integral
domain! In fact, it is a linear algebra. See the beginning of Chapter 5 for
the definition of a linear algebra, a term that really ought to be defined
somewhere in a Linear Algebra course. (Look up any of these words that
you are unsure about in your abstract algebra text.) (Note: Our convention
is that the zero polynomial has degree equal to $-\infty$.)
If we fix the nonnegative integer $n$, then
$$\mathcal{P}_n = \{f(x) \in F[x] : \mathrm{degree}(f(x)) \le n\}$$
is a vector space with the usual addition of polynomials and scalar
multiplication.
2.2 Basic Properties of Vector Spaces
We are careful to distinguish between the zero 0 of the field $F$ and the zero
vector $\vec{0}$ in the vector space $V$.
Theorem 2.2.1. (Properties of zero) For $a \in F$ and $v \in V$ we have the
following:
$$av = \vec{0} \text{ if and only if } a = 0 \in F \text{ or } v = \vec{0} \in V.$$
Proof. First suppose that $a = 0$. Then $0v = (0 + 0)v = 0v + 0v$. Now adding
the additive inverse of $0v$ to both sides of this equation yields $\vec{0} = 0v$, for all
$v \in V$. Similarly, if $v = \vec{0}$, we have $a\vec{0} = a(\vec{0} + \vec{0}) = a\vec{0} + a\vec{0}$. Now add the
additive inverse of $a\vec{0}$ to both sides to obtain $\vec{0} = a\vec{0}$.
What remains to be proved is that if $av = \vec{0}$, then either $a = 0$ or $v = \vec{0}$.
So suppose $av = \vec{0}$. If $a = 0$ we are done. So suppose $a \neq 0$. In this case $a$
has a multiplicative inverse $a^{-1} \in F$. So $v = (a^{-1}a)v = a^{-1}(av) = a^{-1}\vec{0} = \vec{0}$
by the preceding paragraph. This completes the proof.
Let $-v$ denote the additive inverse of $v$ in $V$, and let $-1$ denote the
additive inverse of $1 \in F$.
Lemma 2.2.2. For all vectors $v \in V$, $(-1)v = -v$.
Proof. The idea is to show that if $(-1)v$ is added to $v$, then the result is $\vec{0}$,
the additive identity of $V$. So, $(-1)v + v = (-1)v + 1v = (-1 + 1)v = 0v = \vec{0}$
by the preceding result.
2.3 Subspaces
Definition: A subset $U$ of $V$ is called a subspace of $V$ provided that with
the vector addition and scalar multiplication of $V$ restricted to $U$, the set $U$
becomes a vector space in its own right.
Theorem 2.3.1. The subset $U$ of $V$ is a subspace of $V$ if and only if the
following properties hold:
(i) $U \neq \emptyset$.
(ii) If $u, v \in U$, then $u + v \in U$.
(iii) If $a \in F$ and $u \in U$, then $au \in U$.
Proof. Clearly the three properties all hold if $U$ is a subspace. So now suppose
the three properties hold. Then $U$ is not empty, so it has some vector $u$. Then
$0u = \vec{0} \in U$. For any $v \in U$, $(-1)v = -v \in U$. By these properties and
property (ii) of the Theorem, $(U, +)$ is a subgroup of $(V, +)$. (Recall this
from your abstract algebra course.) It is now easy to see that $U$, with the
addition and scalar multiplication inherited from $V$, must be a vector space,
since all the other properties hold for $U$ automatically because they hold for
$V$.
2.4 Sums and Direct Sums of Subspaces
Definition: Let $U_1, \ldots, U_m$ be subspaces of $V$. The sum $U_1 + U_2 + \cdots + U_m$
is defined to be
$$\sum_{i=1}^{m} U_i := \{u_1 + u_2 + \cdots + u_m : u_i \in U_i \text{ for } 1 \le i \le m\}.$$
You are asked in the exercises to show that the sum of subspaces is a
subspace.
Definition: The indexed family, or ordered list $(U_1, \ldots, U_m)$ of subspaces
is said to be independent provided that if $\vec{0} = u_1 + u_2 + \cdots + u_m$ with $u_i \in U_i$,
$1 \le i \le m$, then $u_i = \vec{0}$ for all $i = 1, 2, \ldots, m$.
Theorem 2.4.1. Let $U_i$ be a subspace of $V$ for $1 \le i \le m$. Each element
$v \in \sum_{i=1}^{m} U_i$ has a unique expression of the form $v = \sum_{i=1}^{m} u_i$ with $u_i \in U_i$
for all $i$ if and only if the list $(U_1, \ldots, U_m)$ is independent.
Proof. Suppose the list $(U_1, \ldots, U_m)$ is independent. Then by definition $\vec{0}$
has a unique representation of the desired form (as a sum of zero vectors).
Suppose that some $v = \sum_{i=1}^{m} u_i = \sum_{i=1}^{m} v_i$ has two such representations with
$u_i, v_i \in U_i$, $1 \le i \le m$. Then $\vec{0} = v - v = \sum_{i=1}^{m} (u_i - v_i)$ with $u_i - v_i \in U_i$
since $U_i$ is a subspace. So $u_i = v_i$ for all $i$. Conversely, if each element of the
sum has a unique representation as an element of the sum, then certainly $\vec{0}$
does also, implying that the list of subspaces is independent.
Definition: If each element of $\sum_{i=1}^{m} U_i$ has a unique representation as
an element in the sum, then we say that the sum is the direct sum of the
subspaces, and write
$$\sum_{i=1}^{m} U_i = U_1 \oplus U_2 \oplus \cdots \oplus U_m = \bigoplus_{i=1}^{m} U_i.$$
Theorem 2.4.2. Let $U_1, \ldots, U_m$ be subspaces of $V$ for which $V = \sum_{i=1}^{m} U_i$.
Then the following are equivalent:
(i) $V = \bigoplus_{i=1}^{m} U_i$.
(ii) $U_j \cap \sum_{1 \le i \le m;\ i \neq j} U_i = \{\vec{0}\}$ for each $j$, $1 \le j \le m$.
Proof. Suppose $V = \bigoplus_{i=1}^{m} U_i$. If $u \in U_j \cap \sum_{1 \le i \le m;\ i \neq j} U_i$, say $u = -u_j \in U_j$
and $u = \sum_{1 \le i \le m;\ i \neq j} u_i$, then $\sum_{i=1}^{m} u_i = \vec{0}$, forcing all the $u_i$'s equal to $\vec{0}$. This
shows that (i) implies (ii). It is similarly easy to see that (ii) implies that $\vec{0}$
has a unique representation in $\sum_{i=1}^{m} U_i$.
Note: When $m = 2$ this says that $U_1 + U_2 = U_1 \oplus U_2$ if and only if
$U_1 \cap U_2 = \{\vec{0}\}$.
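For two subspaces of a small vector space over the field with two elements, the criterion in the note can be checked by brute force. The Python sketch below is only an illustration: the space $GF(2)^3$, the generators, and the names `span_gf2`, `add_gf2` and `is_direct` are all choices made for this example.

```python
from itertools import product


def add_gf2(u, v):
    """Componentwise sum of two 0/1 tuples modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))


def span_gf2(generators, n):
    """Set of all GF(2)-linear combinations of the given length-n tuples."""
    vectors = {tuple([0] * n)}
    for g in generators:
        vectors |= {add_gf2(v, g) for v in vectors}
    return vectors


def is_direct(U1, U2):
    """True when every element of U1 + U2 arises from exactly one pair (u1, u2)."""
    sums = [add_gf2(u1, u2) for u1, u2 in product(U1, U2)]
    return len(sums) == len(set(sums))


ZERO = (0, 0, 0)
U1 = span_gf2([(1, 0, 0), (0, 1, 0)], 3)
U2 = span_gf2([(0, 0, 1)], 3)
U3 = span_gf2([(1, 1, 0), (0, 0, 1)], 3)

direct_12 = is_direct(U1, U2)   # U1 and U2 meet only in the zero vector
direct_13 = is_direct(U1, U3)   # U1 and U3 share the vector (1, 1, 0)
```

In both cases uniqueness of representations holds exactly when the intersection is the zero subspace, in agreement with the note.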
2.5 Exercises
1. Determine all possible subspaces of $F^2 = \{f : \{1, 2\} \to F : f \text{ is a function}\}$.
2. Prove that the intersection of any collection of subspaces of $V$ is again
a subspace of $V$. Here $V$ is any vector space over any field $F$.
3. Define the sum of a countably infinite number of subspaces of $V$ and
discuss what it should mean for such a collection of subspaces to be
independent.
4. Prove that the set-theoretic union of two subspaces of $V$ is a subspace
if and only if one of the subspaces is contained in the other.
5. Prove or disprove: If $U_1, U_2, W$ are subspaces of $V$ for which $U_1 + W =
U_2 + W$, then $U_1 = U_2$.
6. Prove or disprove: If $U_1, U_2, W$ are subspaces of $V$ for which $U_1 \oplus W =
U_2 \oplus W$, then $U_1 = U_2$.
7. Let $U_1, U_2, U_3$ be three subspaces of a vector space $V$ over the field $F$
for which $U_1 \cap U_2 = U_1 \cap U_3 = U_2 \cap U_3 = \{0\}$. Prove that
$$U_1 + U_2 + U_3 = U_1 \oplus U_2 \oplus U_3,$$
or give a counterexample.
Chapter 3
Dimensional Vector Spaces
Our main concern in this course will be finite dimensional vector spaces, a
concept that will be introduced in this chapter. However, in a starred section
of Chapter 12 and with the help of the appropriate axioms from set theory we
show that every vector space has a well-defined dimension. The key concepts
here are: span, linear independence, basis, dimension.
We assume throughout this chapter that $V$ is a vector space over the field
$F$.
3.1 The Span of a List
For the positive integer $N$ let $\overline{N}$ be the ordered set $\overline{N} = \{1, 2, \ldots, N\}$, and let
$\mathcal{N}$ be the ordered set $\{1, 2, \ldots\}$ of all natural numbers. Definition: A list
of elements of $V$ is a function from some $\overline{N}$ to $V$ or from $\mathcal{N}$ to $V$. Usually
such a list is indicated by $(v_1, v_2, \ldots, v_m)$ for some positive integer $m$, or
perhaps by $(v_1, v_2, \ldots)$ if the list is finite of unknown length or countably
infinite. An important aspect of a list of vectors of $V$ is that it is ordered.
A second important difference between lists and sets is that elements of lists
may be repeated, but in a set repetitions are not allowed. For most of the work
in this course the lists we consider will be finite, but it is important to keep
an open mind about the infinite case.
Definition: Let $L = (v_1, v_2, \ldots)$ be a list of elements of $V$. We say
that $v \in V$ is in the span of $L$ provided there are finitely many scalars
$a_1, a_2, \ldots, a_m \in F$ such that $v = \sum_{i=1}^{m} a_iv_i$. Then the set of all vectors in
the span of $L$ is said to be the span of $L$ and is denoted $\mathrm{span}(v_1, v_2, \ldots)$. By
convention we say that the span of the empty list $(\ )$ is the zero space $\{\vec{0}\}$.
The proof of the following lemma is a routine exercise.
Lemma 3.1.1. The span of any list of vectors of $V$ is a subspace of $V$.
If $\mathrm{span}(v_1, \ldots, v_m) = V$, we say $(v_1, \ldots, v_m)$ spans $V$. A vector space is
said to be finite dimensional if some finite list spans $V$. For example, let $F^n$
denote the vector space of all column vectors with $n$ entries from the field
$F$, and let $e_i$ be the column vector $(0, \ldots, 0, 1, 0, \ldots, 0)^T$ with $n - 1$ entries
equal to $0 \in F$ and a $1 \in F$ in position $i$. Then $F^n$ is finite dimensional
because the list $(e_1, e_2, \ldots, e_n)$ spans $F^n$.
Let $f(x) \in F[x]$ have the form $f(x) = a_0 + a_1x + \cdots + a_nx^n$ with $a_n \neq 0$.
We say that $f(x)$ has degree $n$. The zero polynomial has degree $-\infty$. We let
$\mathcal{P}_n(F)$ denote the set of all polynomials with degree at most $n$. Then $\mathcal{P}_n(F)$
is a subspace of $F[x]$ with spanning list $L = (1, x, x^2, \ldots, x^n)$.
3.2 Linear Independence and the Concept of
Basis
One of the most fundamental concepts of linear algebra is that of linear
independence.
Definition: A finite list $(v_1, \ldots, v_m)$ of vectors in $V$ is said to be linearly
independent provided the only choice of scalars $a_1, \ldots, a_m \in F$ for which
$\sum_{i=1}^{m} a_iv_i = \vec{0}$ is $a_1 = a_2 = \cdots = a_m = 0$. An infinite list $(v_1, v_2, \ldots)$ of
vectors in $V$ is said to be linearly independent provided that for each positive
integer $m$, the list $(v_1, \ldots, v_m)$ consisting of the first $m$ vectors of the infinite
list is linearly independent.
A finite set $S$ of vectors of $V$ is said to be linearly independent provided
each list of distinct vectors of $S$ (no repetition allowed) is linearly
independent. An arbitrary set $S$ of vectors of $V$ is said to be linearly independent
provided every finite subset of $S$ is linearly independent.
Any subset of $V$ or list of vectors in $V$ is said to be linearly dependent
provided it is not linearly independent. It follows immediately that a list
$L = (v_1, v_2, \ldots)$ is linearly dependent provided there is some integer $m \ge 1$
for which there are $m$ scalars $a_1, \ldots, a_m \in F$ such that $\sum_{i=1}^{m} a_iv_i = \vec{0}$ and
not all the $a_i$'s are equal to zero.
The following Linear Dependence Lemma will turn out to be
extremely useful.
3.2. LINEAR INDEPENDENCE AND THE CONCEPT OF BASIS 23
Lemma 3.2.1. Let L = (v
1
, v
2
, . . .) be a nonempty list of nonzero vectors in
V . (Of course we know that any list that includes the zero vector is automat-
ically linearly dependent.) Then the following are equivalent:
(i) L is linearly dependent.
(ii) There is some integer j 2 such that v
j
span(v
1
, . . . , v
j1
). (Usu-
ally we will want to choose the smallest index j for which this holds.)
(iii) There is some integer j 2 such that if the jth term v
j
is removed
from the list L, the span of the remaining list equals the span of L.
Proof. Suppose that the list $L = (v_1, v_2, \ldots)$ is linearly dependent and $v_1 \neq \vec{0}$. For some $m$ there are scalars $a_1, \ldots, a_m$ for which $\sum_{i=1}^{m} a_i v_i = \vec{0}$ with at least one of the $a_i$ not equal to 0. Since $v_1 \neq \vec{0}$, not all of the scalars $a_2, \ldots, a_m$ can equal 0. So let $j \geq 2$ be the largest index for which $a_j \neq 0$. Then
$$v_j = (-a_1 a_j^{-1}) v_1 + (-a_2 a_j^{-1}) v_2 + \cdots + (-a_{j-1} a_j^{-1}) v_{j-1},$$
proving (ii).
To see that (ii) implies (iii), just note that in any linear combination of vectors from $L$, if $v_j$ appears, it can be replaced by its expression as a linear combination of the vectors $v_1, \ldots, v_{j-1}$.
It should be immediately obvious that if (iii) holds, then the list $L$ is linearly dependent: $v_j$ then lies in the span of the remaining vectors, and moving $v_j$ to the other side of such an expression gives a linear combination equal to $\vec{0}$ in which the coefficient of $v_j$ is nonzero.
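As a small computational illustration of condition (ii), the following Python sketch searches for the smallest index $j$ with $v_j \in \mathrm{span}(v_1, \ldots, v_{j-1})$. It assumes the vectors lie in $\mathbb{Q}^n$ and are given as tuples; the helper names `rank` and `first_dependent_index` are ours and are not part of these notes. Membership in a span is tested by comparing ranks, which is one convenient choice among several.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors over the rationals (Gaussian elimination)."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        # Look for a pivot in column c at or below row rk.
        piv = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def first_dependent_index(vectors):
    """Smallest 1-based j >= 2 with v_j in span(v_1, ..., v_{j-1}), or None.

    Assumes every vector in the list is nonzero, as in Lemma 3.2.1.
    """
    for j in range(2, len(vectors) + 1):
        # v_j lies in the span of its predecessors iff adjoining it keeps the rank.
        if rank(vectors[:j]) == rank(vectors[:j - 1]):
            return j
    return None

vs = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]  # here v_3 = v_1 + v_2
j = first_dependent_index(vs)
remaining = vs[:j - 1] + vs[j:]                     # the list with v_j removed
```

In this example `j` is 3, and `rank(remaining)` equals `rank(vs)`, in agreement with condition (iii): removing $v_3$ does not change the span.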
The following theorem is of major importance in the theory. It says that (finite) linearly independent lists are never longer than (finite) spanning lists.
Theorem 3.2.2. Suppose that $V$ is spanned by the finite list $L = (w_1, \ldots, w_n)$ and that $M = (v_1, \ldots, v_m)$ is a linearly independent list. Then $m \leq n$.
Proof. We shall prove that $m \leq n$ by using an algorithm that is interesting in its own right. It amounts to starting with the list $L$ and removing one $w$ and adding one $v$ at each step so as to maintain a spanning list. Since the list $L = (w_1, \ldots, w_n)$ spans $V$, adjoining any vector to the list produces a linearly dependent list. In particular,
$$(v_1, w_1, \ldots, w_n)$$
is linearly dependent with its first element different from $\vec{0}$. So by the Linear Dependence Lemma we may remove one of the $w$'s so that the remaining list of length $n$ is still a spanning list. Suppose this process has been carried out until a spanning list of the form
$$\mathcal{B} = (v_1, \ldots, v_j, w_{t_1}, \ldots, w_{t_{n-j}})$$
has been obtained. If we now adjoin $v_{j+1}$ to the list $\mathcal{B}$ by inserting it immediately after $v_j$, the resulting list will be linearly dependent. By the Linear Dependence Lemma, one of the vectors in this list must be a linear combination of the vectors preceding it in the list. Since the list $(v_1, \ldots, v_{j+1})$ must be linearly independent, this vector must be one of the $w_t$'s and not one of the $v$'s. So we can remove this $w_t$ and obtain a new list of length $n$ which still spans $V$ and has $v_1, \ldots, v_{j+1}$ as its initial members. If at some step we had added a $v$ and had no more $w$'s to remove, we would have a contradiction, since the entire list of $v$'s must be linearly independent even though when we added a $v$ the list was to become dependent. So we may continue the process until all the vectors $v_1, \ldots, v_m$ have been added to the list, i.e., $m \leq n$.
Definition A vector space $V$ over the field $F$ is said to be finite dimensional provided there is a finite list that spans $V$.
Theorem 3.2.3. If $U$ is a subspace of the finite-dimensional space $V$, then $U$ is finite dimensional.
Proof. Suppose that $V$ is spanned by a list of length $m$. If $U = \{\vec{0}\}$, then certainly $U$ is spanned by a finite list and is finite dimensional. So suppose that $U$ contains a nonzero vector $v_1$. If the list $(v_1)$ does not span $U$, let $v_2$ be a vector in $U$ that is not in $\mathrm{span}(v_1)$. By the Linear Dependence Lemma, the list $(v_1, v_2)$ must be linearly independent. Continue this process. At each step, if the linearly independent list obtained does not span the space $U$, we can add one more vector of $U$ to the list keeping it linearly independent. By the preceding theorem, this process has to stop before a linearly independent list of length $m + 1$ has been obtained, i.e., $U$ is spanned by a list of length at most $m$, so is finite dimensional.
Note: In the preceding proof, the spanning list obtained for U was also
linearly independent. Moreover, we could have taken V as the subspace U on
which to carry out the algorithm to obtain a linearly independent spanning
list. This is a very important type of list.
Definition A basis for a vector space $V$ over $F$ is a list $L$ of vectors of $V$ that is a spanning list as well as a linearly independent list. We have just observed the following fact:
Lemma 3.2.4. Each finite dimensional vector space $V$ has a basis. By convention the empty set is a basis for the zero space $\{\vec{0}\}$.
Lemma 3.2.5. If $V$ is a finite dimensional vector space, then there is a unique integer $n \geq 0$ such that each basis of $V$ has length exactly $n$.
Proof. If $\mathcal{B}_1 = (v_1, \ldots, v_n)$ and $\mathcal{B}_2 = (u_1, \ldots, u_m)$ are two bases of $V$, then since $\mathcal{B}_1$ spans $V$ and $\mathcal{B}_2$ is linearly independent, $m \leq n$. Since $\mathcal{B}_1$ is linearly independent and $\mathcal{B}_2$ spans $V$, $n \leq m$. Hence $m = n$.
Definition If the finite dimensional vector space $V$ has a basis with length $n$, we say that $V$ is $n$-dimensional or that $n$ is the dimension of $V$. Moreover, each finite dimensional vector space has a well-defined dimension.
For this course it is completely satisfactory to consider bases only for finite dimensional spaces. However, students occasionally ask about the infinite dimensional case and we think it is pleasant to have a convenient treatment available. Hence we have included a treatment of the infinite dimensional case in Chapter 12 that treats a number of special topics. For our general purposes, however, we are content merely to say that any vector space which is not finite dimensional is infinite dimensional (without trying to associate any specific infinite cardinal number with the dimension).
Lemma 3.2.6. A list $\mathcal{B} = (v_1, \ldots, v_n)$ of vectors in $V$ is a basis of $V$ if and only if every $v \in V$ can be written uniquely in the form
$$v = a_1 v_1 + \cdots + a_n v_n. \tag{3.1}$$
Proof. First assume that $\mathcal{B}$ is a basis. Since it is a spanning set, each $v \in V$ can be written in the form given in Eq. 3.1. Since $\mathcal{B}$ is linearly independent, such an expression is easily seen to be unique. Conversely, suppose each $v \in V$ has a unique expression in the form of Eq. 3.1. The existence of the expression for each $v \in V$ implies that $\mathcal{B}$ is a spanning set. The uniqueness (applied to $v = \vec{0}$) then implies that $\mathcal{B}$ is linearly independent.
Theorem 3.2.7. If $L = (v_1, \ldots, v_n)$ is a spanning list of $V$, then a basis of $V$ can be obtained by deleting certain elements from $L$. Consequently every finite dimensional vector space has a basis.
Proof. If $L$ is linearly independent, it must already be a basis of $V$. If not, consider $v_1$. If $v_1 = \vec{0}$, discard $v_1$; the shorter list still spans $V$. If not, then since $L$ is assumed to be linearly dependent, there must be some $j \geq 2$ such that $v_j \in \mathrm{span}(v_1, \ldots, v_{j-1})$. Choose the smallest $j$ for which this is true and delete that $v_j$ from $L$. Either way this yields a list $L_1$ of $n - 1$ elements that still spans $V$. If $L_1$ is linearly independent, then $L_1$ is the desired basis of $V$. If not, then proceed as before to obtain a spanning list of size $n - 2$. Continue this way, always producing spanning lists of shorter length, until eventually a linearly independent spanning list is obtained.
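The deletion procedure in this proof can be sketched in Python as follows, assuming vectors in $\mathbb{Q}^n$ given as tuples. The sketch makes a single left-to-right pass and keeps a vector only when it is not in the span of the vectors already kept; this retains exactly the vectors that do not lie in the span of their predecessors, so it should produce the same list as the repeated deletions described above. The function names are ours.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors over the rationals (Gaussian elimination)."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def prune_to_basis(spanning):
    """Delete every vector that lies in the span of the vectors before it."""
    basis = []
    for v in spanning:
        # The kept list is independent, so its rank is len(basis);
        # v is kept exactly when adjoining it raises the rank.
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

spanning = [(1, 1, 0), (2, 2, 0), (0, 1, 0), (1, 0, 0), (0, 0, 3)]
basis = prune_to_basis(spanning)
```

Here `(2, 2, 0)` and `(1, 0, 0)` are deleted, and the three remaining vectors form a basis of $\mathbb{Q}^3$.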
Lemma 3.2.8. Every linearly independent list of vectors in a finite dimensional vector space $V$ can be extended to a basis of $V$.
Proof. Let $V$ be a finite dimensional space, say with $\dim(V) = n$. Let $L = (v_1, \ldots, v_k)$ be a linearly independent list. So $0 \leq k \leq n$. If $k < n$ we know that $L$ cannot span the space, so there is a vector $v_{k+1}$ not in $\mathrm{span}(L)$. Then $L' = (v_1, \ldots, v_k, v_{k+1})$ must still be linearly independent. If $k + 1 < n$ we can repeat this process, adjoining vectors one at a time to produce longer linearly independent lists until a basis is obtained. As we have seen above, this must happen when we have an independent list of length $n$.
Lemma 3.2.9. If $V$ is a finite-dimensional space and $U$ is a subspace of $V$, then there is a subspace $W$ of $V$ such that $V = U \oplus W$. Moreover, $\dim(U) \leq \dim(V)$ with equality if and only if $U = V$.
Proof. We have seen that $U$ must also be finite-dimensional, so that it has a basis $\mathcal{B}_1 = (v_1, \ldots, v_k)$. Since $\mathcal{B}_1$ is a linearly independent set of vectors in a finite-dimensional space $V$, it can be completed to a basis $\mathcal{B} = (v_1, \ldots, v_k, v_{k+1}, \ldots, v_n)$ of $V$. Put $W = \mathrm{span}(v_{k+1}, \ldots, v_n)$. Clearly $V = U + W$, and $U \cap W = \{\vec{0}\}$. It follows that $V = U \oplus W$. The last part of the lemma should also be clear.
Theorem 3.2.10. Let $L = (v_1, \ldots, v_k)$ be a list of vectors of an $n$-dimensional space $V$. Then any two of the following properties imply the third one:
(i) $k = n$;
(ii) $L$ is linearly independent;
(iii) $L$ spans $V$.
Proof. Assume that (i) and (ii) both hold. Then $L$ can be completed to a basis, which must have exactly $n$ vectors, i.e., $L$ already must span $V$. If (ii) and (iii) both hold, $L$ is a basis by definition and must have $n$ elements by definition of dimension. If (iii) and (i) both hold, $L$ can be restricted to form a basis, which must have $n$ elements. Hence $L$ must already be linearly independent.
Theorem 3.2.11. Let $U$ and $W$ be finite-dimensional subspaces of the vector space $V$. Then
$$\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W).$$
Proof. Let $\mathcal{B}_1 = (v_1, \ldots, v_k)$ be a basis for $U \cap W$. Complete it to a basis $\mathcal{B}_2 = (v_1, \ldots, v_k, u_1, \ldots, u_r)$ of $U$ and also complete it to a basis $\mathcal{B}_3 = (v_1, \ldots, v_k, w_1, \ldots, w_t)$ for $W$. Put $\mathcal{B} = (v_1, \ldots, v_k, u_1, \ldots, u_r, w_1, \ldots, w_t)$. We claim that $\mathcal{B}$ is a basis for $U + W$, from which the theorem follows, since then $\dim(U + W) = k + r + t = (k + r) + (k + t) - k$.
First we show that $\mathcal{B}$ is linearly independent. So suppose that there are scalars $a_i, b_i, c_i \in F$ for which
$$\sum_{i=1}^{k} a_i v_i + \sum_{i=1}^{r} b_i u_i + \sum_{i=1}^{t} c_i w_i = \vec{0}.$$
It follows that $\sum_{i=1}^{k} a_i v_i + \sum_{i=1}^{r} b_i u_i = -\sum_{i=1}^{t} c_i w_i \in U \cap W$. This vector is therefore a linear combination of $v_1, \ldots, v_k$ alone, and since $\mathcal{B}_2$ is linearly independent, this means all the $b_i$'s are equal to 0. This forces $\sum_{i=1}^{k} a_i v_i + \sum_{i=1}^{t} c_i w_i = \vec{0}$. Since $\mathcal{B}_3$ is linearly independent, this forces all the $a_i$'s and $c_i$'s to be 0. But it should be quite clear that $\mathcal{B}$ spans $U + W$, so that in fact $\mathcal{B}$ is a basis for $U + W$.
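The construction in this proof can be checked on a small example. The Python sketch below works in $\mathbb{Q}^4$ with $U = \mathrm{span}((1,1,0,0), (0,1,1,0))$ and $W = \mathrm{span}((1,1,0,0), (0,0,1,1))$; that $U \cap W = \mathrm{span}((1,1,0,0))$ is taken as an assumption verified by hand, not computed by the code. The variable names mirror the lists in the proof and the helper `rank` is ours.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors over the rationals (Gaussian elimination)."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

inter = [(1, 1, 0, 0)]   # basis of U ∩ W (assumed, checked by hand)
us = [(0, 1, 1, 0)]      # completes inter to a basis of U
ws = [(0, 0, 1, 1)]      # completes inter to a basis of W
B = inter + us + ws      # the list called B in the proof

dim_U = rank(inter + us)
dim_W = rank(inter + ws)
dim_sum = rank(B)        # dim(U + W), since B spans U + W
```

In this example `B` is linearly independent (`dim_sum == len(B)`), and `dim_sum == dim_U + dim_W - len(inter)`, as the theorem predicts.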
At this point the following theorem is easy to prove. We leave the proof as an exercise.
Theorem 3.2.12. Let $U_1, \ldots, U_m$ be finite-dimensional subspaces of $V$ with $V = U_1 + \cdots + U_m$ and with $\mathcal{B}_i$ a basis for $U_i$, $1 \leq i \leq m$. Then the following are equivalent:
(i) $V = U_1 \oplus \cdots \oplus U_m$;
(ii) $\dim(V) = \dim(U_1) + \dim(U_2) + \cdots + \dim(U_m)$;
(iii) $(\mathcal{B}_1, \mathcal{B}_2, \ldots, \mathcal{B}_m)$ is a basis for $V$.
(iv) The spaces $U_1, \ldots, U_m$ are linearly independent.
3.3 Exercises
1. Write out a proof of Lemma 3.1.1.
2. Let L be any list of vectors of V , and let S be the set of all vectors in
L. Show that the intersection of all subspaces having S as a subset is
just the span of L.
3. Show that any subset $T$ of a linearly independent set $S$ of vectors is also linearly independent, and observe that this is equivalent to the fact that if $T$ is a linearly dependent subset of $S$, then $S$ is also linearly dependent. Note: The empty set of vectors is linearly independent.
4. Show that the intersection of any family of linearly independent sets of vectors of $V$ is also linearly independent.
5. Give an example of two linearly dependent sets whose intersection is linearly independent.
6. Show that a list $(v)$ of length 1 is linearly dependent if and only if $v = \vec{0}$.
7. Show that a list $(v_1, v_2)$ of length 2 is linearly independent if and only if neither vector is a scalar times the other.
8. Let $m$ be a positive integer. Let $V = \{f \in F[x] : \deg(f) = m \text{ or } f = \vec{0}\}$. Show that $V$ is or is not a subspace of $F[x]$.
9. Prove or disprove: There is a basis of $\mathcal{P}_m(F)$ all of whose members have degree $m$.
10. Prove or disprove: there exists a basis $(p_0, p_1, p_2, p_3)$ of $\mathcal{P}_3(F)$ such that
(a) all the polynomials $p_0, \ldots, p_3$ have degree 3.
(b) all the polynomials $p_0, \ldots, p_3$ give the value 0 when evaluated at 3.
(c) all the polynomials $p_0, \ldots, p_3$ give the value 3 when evaluated at 0.
(d) all the polynomials $p_0, \ldots, p_3$ give the value 3 when evaluated at 0 and give the value 1 when evaluated at 1.
11. Prove that if $U_1, U_2, \ldots, U_m$ are subspaces of $V$ then $\dim(U_1 + \cdots + U_m) \leq \dim(U_1) + \dim(U_2) + \cdots + \dim(U_m)$.
12. Prove or give a counterexample: If $U_1, U_2, U_3$ are three subspaces of a finite dimensional vector space $V$, then
$$\dim(U_1 + U_2 + U_3) = \dim(U_1) + \dim(U_2) + \dim(U_3) - \dim(U_1 \cap U_2) - \dim(U_2 \cap U_3) - \dim(U_3 \cap U_1) + \dim(U_1 \cap U_2 \cap U_3).$$
13. Suppose that $p_0, p_1, \ldots, p_m$ are polynomials in the space $\mathcal{P}_m(F)$ (of polynomials over $F$ with degree at most $m$) such that $p_j(2) = 0$ for each $j$. Prove that the set $\{p_0, p_1, \ldots, p_m\}$ is not linearly independent in $\mathcal{P}_m(F)$.
14. Let $U, V, W$ be vector spaces over a field $F$, with $U$ and $W$ subspaces of $V$. We say that $V$ is the direct sum of $U$ and $W$ and write $V = U \oplus W$ provided that for each vector $v \in V$ there are unique vectors $u \in U$, $w \in W$ for which $v = u + w$.
(a) Prove that if $\dim U + \dim W = \dim V$ and $U \cap W = \{\vec{0}\}$, then $V = U \oplus W$.
(b) Prove that if $C$ is an $m \times n$ matrix over the field $\mathbb{R}$ of real numbers, then $\mathbb{R}^n = \mathrm{Col}(C^T) \oplus N(C)$, where $\mathrm{Col}(A)$ denotes the column space of the matrix $A$ and $N(A)$ denotes the right null space of $A$.
Chapter 4
Linear Transformations
4.1 Definitions and Examples
Throughout this chapter we let $U$, $V$ and $W$ be vector spaces over the field $F$.
Definition A function (or map) $T$ from $U$ to $V$ is called a linear map or linear transformation provided $T$ satisfies the following two properties:
(i) $T(u + v) = T(u) + T(v)$ for all $u, v \in U$,
and
(ii) $T(au) = aT(u)$ for all $a \in F$ and all $u \in U$.
These two properties can be combined into the following single property:
Obs. 4.1.1. $T : U \to V$ is linear provided $T(au + bv) = aT(u) + bT(v)$ for all $a, b \in F$ and all $u, v \in U$.
Notice that $T$ is a homomorphism of the additive group $(U, +)$ into the additive group $(V, +)$. So you should be able to show that $T(\vec{0}) = \vec{0}$, where the first $\vec{0}$ is the zero vector of $U$ and the second is the zero vector of $V$.
The zero map: The map $0 : U \to V$ defined by $0(v) = \vec{0}$ for all $v \in U$ is easily seen to be linear.
The identity map: Similarly, the map $I : U \to U$ defined by $I(v) = v$ for all $v \in U$ is linear.
For $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \in F[x]$, the formal derivative of $f(x)$ is defined to be $f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \cdots + n a_n x^{n-1}$.
Differentiation: Define $D : F[x] \to F[x]$ by $D(f) = f'$, where $f'$ is the formal derivative of $f$. Then $D$ is linear.
The prototypical linear map is given by the following. Let $F^n$ and $F^m$ be the vector spaces of column vectors with $n$ and $m$ entries, respectively. Let $A \in M_{m,n}(F)$. Define $T_A : F^n \to F^m$ by
$$T_A : X \mapsto AX \text{ for all } X \in F^n.$$
The usual properties of matrix algebra force $T_A$ to be linear.
Theorem 4.1.2. Suppose that $U$ is $n$-dimensional with basis $\mathcal{B} = (u_1, \ldots, u_n)$, and let $L = (v_1, \ldots, v_n)$ be any list of $n$ vectors of $V$. Then there is a unique linear map $T : U \to V$ such that $T(u_i) = v_i$ for $1 \leq i \leq n$.
Proof. Let $u$ be any vector of $U$, so $u = \sum_{i=1}^{n} a_i u_i$ for unique scalars $a_i \in F$. Then the desired $T$ has to be defined by $T(u) = \sum_{i=1}^{n} a_i T(u_i) = \sum_{i=1}^{n} a_i v_i$. This clearly defines $T$ uniquely. The fact that $T$ is linear follows easily from the basic properties of vector spaces.
Put $\mathcal{L}(U, V) = \{T : U \to V : T \text{ is linear}\}$.
The interesting fact here is that $\mathcal{L}(U, V)$ is again a vector space in its own right. Vector addition $S + T$ is defined for $S, T \in \mathcal{L}(U, V)$ by: $(S + T)(u) = S(u) + T(u)$ for all $u \in U$. Scalar multiplication is defined for $a \in F$ and $T \in \mathcal{L}(U, V)$ by $(aT)(u) = a(T(u))$ for all $u \in U$. You should verify that with this vector addition and scalar multiplication $\mathcal{L}(U, V)$ is a vector space with the zero map being the additive identity.
Now suppose that $T \in \mathcal{L}(U, V)$ and $S \in \mathcal{L}(V, W)$. Then the composition $(S \circ T) : U \to W$, usually just written $ST$, defined by $ST(u) = S(T(u))$, is well-defined and is easily shown to be linear. In general the composition product is associative (when a triple product is defined) because this is true of the composition of functions in general. But also we have the distributive properties $(S_1 + S_2)T = S_1 T + S_2 T$ and $S(T_1 + T_2) = ST_1 + ST_2$ whenever the products are defined.
In general the multiplication of linear maps is not commutative even when both products are defined.
4.2 Kernels and Images
We use the following language. If $f : A \to B$ is a function, we say that $A$ is the domain of $f$ and $B$ is the codomain of $f$. But $f$ might not be onto $B$. We define the image (or range) of $f$ by $\mathrm{Im}(f) = \{b \in B : f(a) = b \text{ for at least one } a \in A\}$. And we use this language for linear maps also. The null space (or kernel) of $T \in \mathcal{L}(U, V)$ is defined by $\mathrm{null}(T) = \{u \in U : T(u) = \vec{0}\}$. NOTE: The terms range and image are interchangeable, and the terms null space and kernel are interchangeable.
In the exercises you are asked to show that the null space and image of a linear map are subspaces of the appropriate spaces.
Lemma 4.2.1. Let $T \in \mathcal{L}(U, V)$. Then $T$ is injective (i.e., one-to-one) if and only if $\mathrm{null}(T) = \{\vec{0}\}$.
Proof. Since $T(\vec{0}) = \vec{0}$, if $T$ is injective, clearly $\mathrm{null}(T) = \{\vec{0}\}$. Conversely, suppose that $\mathrm{null}(T) = \{\vec{0}\}$. Then suppose that $T(u_1) = T(u_2)$ for $u_1, u_2 \in U$. Then by the linearity of $T$ we have $T(u_1 - u_2) = T(u_1) - T(u_2) = \vec{0}$, so $u_1 - u_2 \in \mathrm{null}(T) = \{\vec{0}\}$. Hence $u_1 = u_2$, implying $T$ is injective.
The following Theorem and its method of proof are extremely useful in many contexts.
Theorem 4.2.2. If $U$ is finite dimensional and $T \in \mathcal{L}(U, V)$, then $\mathrm{Im}(T)$ is finite-dimensional and
$$\dim(U) = \dim(\mathrm{null}(T)) + \dim(\mathrm{Im}(T)).$$
Proof. (Pay close attention to the details of this proof. You will want to use them for some of the exercises.)
Start with a basis $(u_1, \ldots, u_k)$ of $\mathrm{null}(T)$. Extend this list to a basis $(u_1, \ldots, u_k, w_1, \ldots, w_r)$ of $U$. Thus $\dim(\mathrm{null}(T)) = k$ and $\dim(U) = k + r$. To complete a proof of the theorem we need only show that $\dim(\mathrm{Im}(T)) = r$. We will do this by showing that $(T(w_1), \ldots, T(w_r))$ is a basis of $\mathrm{Im}(T)$. Let $u \in U$. Because $(u_1, \ldots, u_k, w_1, \ldots, w_r)$ spans $U$, there are scalars $a_i, b_i \in F$ such that
$$u = a_1 u_1 + \cdots + a_k u_k + b_1 w_1 + \cdots + b_r w_r.$$
Remember that $u_1, \ldots, u_k$ are in $\mathrm{null}(T)$ and apply $T$ to both sides of the preceding equation:
$$T(u) = b_1 T(w_1) + \cdots + b_r T(w_r).$$
This last equation implies that $(T(w_1), \ldots, T(w_r))$ spans $\mathrm{Im}(T)$, so at least $\mathrm{Im}(T)$ is finite dimensional. To show that $(T(w_1), \ldots, T(w_r))$ is linearly independent, suppose that
$$\sum_{i=1}^{r} c_i T(w_i) = \vec{0} \text{ for some } c_i \in F.$$
It follows easily that $\sum_{i=1}^{r} c_i w_i \in \mathrm{null}(T)$, so $\sum_{i=1}^{r} c_i w_i = \sum_{i=1}^{k} d_i u_i$ for some scalars $d_i \in F$. Since $(u_1, \ldots, u_k, w_1, \ldots, w_r)$ is linearly independent, we must have that all the $c_i$'s and $d_i$'s are zero. Hence $(T(w_1), \ldots, T(w_r))$ is linearly independent, and hence is a basis for $\mathrm{Im}(T)$.
Obs. 4.2.3. There are two other ways to view the equality of the preceding theorem. If $n = \dim(U)$, $k = \dim(\mathrm{null}(T))$ and $r = \dim(\mathrm{Im}(T))$, then the theorem says $n = k + r$. And if $\dim(V) = m$, then clearly $r \leq m$. So $k = n - r \geq n - m$. If $n > m$, then $k > 0$ and $T$ is not injective. If $n < m$, then $r \leq n < m$ says $T$ is not surjective (i.e., onto).
Definition: Often we say that the dimension of the null space of a linear map $T$ is the nullity of $T$.
4.3 Rank and Nullity Applied to Matrices
Let $A \in M_{m,n}(F)$. Recall that the row rank of $A$ (i.e., the dimension of the row space $\mathrm{row}(A)$ of $A$) equals the column rank (i.e., the dimension of the column space $\mathrm{col}(A)$) of $A$, and the common value $\mathrm{rank}(A)$ is called the rank of $A$. Let $T_A : F^n \to F^m : X \mapsto AX$ as usual. Then the null space of $T_A$ is the right null space $\mathrm{rnull}(A)$ of the matrix $A$, and the image of $T_A$ is the column space of $A$. So by Theorem 4.2.2, $n = \mathrm{rank}(A) + \dim(\mathrm{rnull}(A))$. Similarly, $m = \mathrm{rank}(A) + \dim(\mathrm{lnull}(A))$. (Here $\mathrm{lnull}(A)$ denotes the left null space of $A$.)
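The identity $n = \mathrm{rank}(A) + \dim(\mathrm{rnull}(A))$ can be checked numerically. The following Python sketch, which assumes rational entries and uses helper names of our own choosing (`rref`, `null_space_basis`), row-reduces $A$, reads off the pivot columns, and builds one null-space vector for each free column.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in r] for r in rows]
    pivots, rk = [], 0
    for c in range(len(m[0])):
        piv = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        m[rk] = [x / m[rk][c] for x in m[rk]]        # scale the pivot to 1
        for r in range(len(m)):
            if r != rk and m[r][c] != 0:             # clear the rest of column c
                f = m[r][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        pivots.append(c)
        rk += 1
    return m, pivots

def null_space_basis(rows):
    """Basis of rnull(A) = {x : Ax = 0}, one vector per free (non-pivot) column."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for i, p in enumerate(pivots):
            x[p] = -m[i][free]
        basis.append(x)
    return basis

A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]            # a 3 x 4 example, so n = 4
_, pivots = rref(A)
N = null_space_basis(A)
rank_A, nullity_A = len(pivots), len(N)
```

For this matrix the rank is 2 and the nullity is 2, which add up to $n = 4$; each vector in `N` is sent to zero by $A$.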
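Theorem 4.3.1. Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix over $F$. Then
(i) $\dim(\mathrm{rnull}(AB)) \leq \dim(\mathrm{rnull}(A)) + \dim(\mathrm{rnull}(B))$,
and
(ii) $\mathrm{rank}(A) + \mathrm{rank}(B) - \mathrm{rank}(AB) \leq n$.
Proof. Start with
$$\mathrm{rnull}(B) \subseteq \mathrm{rnull}(AB) \subseteq F^p,$$
$$\mathrm{col}(B) \cap \mathrm{rnull}(A) \subseteq \mathrm{rnull}(A) \subseteq F^n.$$
Consider the map $T_B : \mathrm{rnull}(AB) \to \mathrm{col}(B) \cap \mathrm{rnull}(A) : x \mapsto Bx$. Note that $T_B$ could be defined on all of $F^p$, but we have defined it only on $\mathrm{rnull}(AB)$. On the other hand, $\mathrm{rnull}(B)$ really is in $\mathrm{rnull}(AB)$, so $\mathrm{null}(T_B) = \mathrm{rnull}(B)$. Also, $\mathrm{Im}(T_B) \subseteq \mathrm{col}(B) \cap \mathrm{rnull}(A) \subseteq \mathrm{rnull}(A)$, since $T_B$ is being applied only to vectors $x$ for which $A(Bx) = 0$. Hence we have the following:
$$\dim(\mathrm{rnull}(AB)) = \dim(\mathrm{null}(T_B)) + \dim(\mathrm{Im}(T_B)) = \dim(\mathrm{rnull}(B)) + \dim(\mathrm{Im}(T_B)) \leq \dim(\mathrm{rnull}(B)) + \dim(\mathrm{rnull}(A)).$$
This proves (i).
This can be rewritten as $p - \mathrm{rank}(AB) \leq (p - \mathrm{rank}(B)) + (n - \mathrm{rank}(A))$, which implies (ii).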
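Both inequalities are easy to test on examples. In the Python sketch below (rational arithmetic, helper names `rank` and `matmul` are ours) the matrices are chosen, as an illustration only, so that $AB = 0$ and equality holds in (ii).

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix over the rationals (Gaussian elimination)."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 0, 0],
     [0, 0, 0]]          # m x n with m = 2, n = 3
B = [[0, 0],
     [1, 0],
     [0, 1]]             # n x p with p = 2
AB = matmul(A, B)
n, p = 3, 2

# Nullities via n = rank + nullity applied to each matrix.
nullity_A = n - rank(A)
nullity_B = p - rank(B)
nullity_AB = p - rank(AB)
```

Here `nullity_AB` is 2, `nullity_A` is 2 and `nullity_B` is 0, so (i) holds with equality, and `rank(A) + rank(B) - rank(AB)` equals $n = 3$.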
4.4 Projections
Suppose $V = U \oplus W$. Define $P : V \to V$ as follows. For each $v \in V$, write $v = u + w$ with $u \in U$ and $w \in W$. Then $u$ and $w$ are uniquely defined for each $v \in V$. Put $P(v) = u$. It is straightforward to verify the following properties of $P$.
Obs. 4.4.1. (i) $P \in \mathcal{L}(V)$.
(ii) $P^2 = P$.
(iii) $U = \mathrm{Im}(P)$.
(iv) $W = \mathrm{null}(P)$.
(v) $U$ and $W$ are both $P$-invariant.
This linear map $P$ is called the projection onto $U$ along $W$ (or parallel to $W$) and is often denoted by $P = P_{U,W}$. Using this notation we see that
Obs. 4.4.2. $I = P_{U,W} + P_{W,U}$.
As a kind of converse, suppose that $P \in \mathcal{L}(V)$ is idempotent, i.e., $P^2 = P$. Put $U = \mathrm{Im}(P)$ and $W = \mathrm{null}(P)$. Then for each $v \in V$ we can write $v = P(v) + (v - P(v)) \in \mathrm{Im}(P) + \mathrm{null}(P)$. Hence $V = U + W$. Now suppose that $u \in U \cap W$. Then on the one hand $u = P(v)$ for some $v \in V$. On the other hand $P(u) = \vec{0}$. Hence $\vec{0} = P(u) = P^2(v) = P(v) = u$, implying $U \cap W = \{\vec{0}\}$. Hence $V = U \oplus W$. It follows readily that $P = P_{U,W}$. Hence we have the following:
Obs. 4.4.3. The linear map $P$ is idempotent if and only if it is the projection onto its image along its null space.
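A concrete instance may help. In the Python sketch below we take $V = \mathbb{Q}^2$, $U = \mathrm{span}((1,0)^T)$ and $W = \mathrm{span}((1,1)^T)$; these choices, and the matrices written for $P_{U,W}$ and $P_{W,U}$ with respect to the standard basis, are our own illustration. Since $(x, y)^T = (x - y, 0)^T + y(1, 1)^T$, the projection onto $U$ along $W$ sends $(x, y)^T$ to $(x - y, 0)^T$.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    """Matrix times column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

P_UW = [[1, -1],
        [0, 0]]      # (x, y) -> (x - y, 0): projection onto U along W
P_WU = [[0, 1],
        [0, 1]]      # (x, y) -> (y, y):     projection onto W along U
I2 = [[1, 0],
      [0, 1]]

P_squared = matmul(P_UW, P_UW)
total = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(P_UW, P_WU)]
image_of_w = matvec(P_UW, [1, 1])   # a vector of W
image_of_u = matvec(P_UW, [1, 0])   # a vector of U
```

One finds `P_squared == P_UW` (idempotence), `total == I2` (Obs. 4.4.2), `image_of_w == [0, 0]` (so $W$ lies in the null space) and `image_of_u == [1, 0]` (so $P$ fixes $U$).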
4.5 Bases and Coordinate Matrices
In this section let $V$ be a finite dimensional vector space over the field $F$. Since $F$ is a field, i.e., since the multiplication in $F$ is commutative, it turns out that it really does not matter whether $V$ is a left vector space over $F$ (i.e., the scalars from $F$ are placed on the left side of the vectors of $V$) or $V$ is a right vector space. Let the list $\mathcal{B} = (v_1, \ldots, v_n)$ be a basis for $V$. So if $v$ is an arbitrary vector in $V$ there are unique scalars $c_1, \ldots, c_n$ in $F$ for which $v = \sum_{i=1}^{n} c_i v_i$. The column vector $[v]_{\mathcal{B}} = (c_1, \ldots, c_n)^T \in F^n$ is then called the coordinate matrix for $v$ with respect to the basis $\mathcal{B}$, and we may write
$$v = \sum_{i=1}^{n} c_i v_i = (v_1, \ldots, v_n) \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = \mathcal{B}[v]_{\mathcal{B}}.$$
Perhaps we should discuss this last multiplication a bit.
In the usual theory of matrix manipulation, if we want to multiply two matrices $A$ and $B$ to get a matrix $AB = C$, there are integers $n$, $m$ and $p$ such that $A$ is $m \times n$, $B$ is $n \times p$, and the product $C$ is $m \times p$. If in general the entry in the $i$th row and $j$th column of a matrix $A$ is denoted by $A_{ij}$, then the $(i, j)$th entry of $AB = C$ is $(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$. If $A$ is a row or a column the entries are usually indicated by a single subscript. So we might write
$$A = (a_1, \ldots, a_n); \quad B = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}; \quad AB = \sum_{k} a_k b_k.$$
In this context it is usually assumed that the entries of $A$ and $B$ (and hence also of $C$) come from some ring, probably a commutative ring $R$, so that in particular this sum $\sum_{k} a_k b_k$ is a uniquely defined element of $R$. Moreover, using the usual properties of arithmetic in $R$ it is possible to show directly that matrix multiplication (when defined!) is associative. Also, matrix addition is defined and matrix multiplication (when defined!) distributes over addition, etc. However, it is not always necessary to assume that the entries of $A$ and $B$ come from the same kind of algebraic system. We may just as easily multiply a column $(c_1, \ldots, c_n)^T$ of scalars from the field $F$ by the row $(v_1, \ldots, v_n)$ representing an ordered basis of $V$ over $F$ to obtain
$$v = (v_1, \ldots, v_n) \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = \sum_{i=1}^{n} c_i v_i.$$
We may also suppose $A$ is an $n \times n$ matrix over $F$ and write
$$(u_1, u_2, \ldots, u_n) = (v_1, v_2, \ldots, v_n)A, \text{ so } u_j = \sum_{i=1}^{n} A_{ij} v_i.$$
Then it follows readily that $(u_1, \ldots, u_n)$ is an ordered basis of $V$ if and only if the matrix $A$ is invertible, in which case it is also true that
$$(v_1, v_2, \ldots, v_n) = (u_1, u_2, \ldots, u_n)A^{-1}, \text{ so } v_j = \sum_{i=1}^{n} (A^{-1})_{ij} u_i.$$
To see this, observe that
$$(u_1, \ldots, u_n) \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = (v_1, v_2, \ldots, v_n) A \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = (v_1, v_2, \ldots, v_n) \left( A \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} \right).$$
Since $(v_1, v_2, \ldots, v_n)$ is an independent list, this product is the zero vector if and only if $A (c_1, c_2, \ldots, c_n)^T$ is the zero vector, from which the desired result is clear.
4.6 Matrices as Linear Transformations
Let $\mathcal{B}_1 = (u_1, \ldots, u_n)$ be an ordered basis for the vector space $U$ over the field $F$, and let $\mathcal{B}_2 = (v_1, \ldots, v_m)$ be an ordered basis for the vector space $V$ over $F$. Let $A$ be an $m \times n$ matrix over $F$. Define $T_A : U \to V$ by $[T_A(u)]_{\mathcal{B}_2} = A[u]_{\mathcal{B}_1}$ for all $u \in U$. It is quite straightforward to show that $T_A \in \mathcal{L}(U, V)$. It is also clear (by letting $u = u_j$) that the $j$th column of $A$ is $[T_A(u_j)]_{\mathcal{B}_2}$. Conversely, if $T \in \mathcal{L}(U, V)$, and if we define the matrix $A$ to be the matrix with $j$th column equal to $[T(u_j)]_{\mathcal{B}_2}$, then $[T(u)]_{\mathcal{B}_2} = A[u]_{\mathcal{B}_1}$ for all $u \in U$. In this case we say that $A$ is the matrix that represents $T$ with respect to the pair $(\mathcal{B}_1, \mathcal{B}_2)$ of ordered bases of $U$ and $V$, respectively, and we write $A = [T]_{\mathcal{B}_2, \mathcal{B}_1}$.
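To illustrate how the columns of $[T]_{\mathcal{B}_2, \mathcal{B}_1}$ are obtained, consider the differentiation map $D$ restricted to $\mathcal{P}_3(F) \to \mathcal{P}_2(F)$, with the monomial bases $(1, x, x^2, x^3)$ and $(1, x, x^2)$. This choice of map and bases is our own example. In the Python sketch below a polynomial is stored as its list of coefficients $(a_0, a_1, \ldots)$, which is exactly its coordinate matrix with respect to the monomial basis; the function names are hypothetical.

```python
def derivative_coeffs(coeffs):
    """Coordinates of the formal derivative: (a_0, ..., a_n) -> (a_1, 2a_2, ..., n a_n)."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

def matrix_of(T, n):
    """Matrix whose j-th column is T applied to the j-th standard coordinate vector."""
    cols = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*cols)]   # turn the list of columns into rows

def matvec(A, v):
    """Matrix times column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

D = matrix_of(derivative_coeffs, 4)   # 3 x 4 matrix representing D
f = [5, 4, 3, 2]                      # f(x) = 5 + 4x + 3x^2 + 2x^3
Df_via_matrix = matvec(D, f)
Df_direct = derivative_coeffs(f)
```

The matrix `D` has rows `[0, 1, 0, 0]`, `[0, 0, 2, 0]`, `[0, 0, 0, 3]`, and multiplying it by the coordinates of $f$ gives the coordinates `[4, 6, 6]` of $f'(x) = 4 + 6x + 6x^2$, the same result as differentiating directly.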
Note that a coordinate matrix of a vector with respect to a basis has a subscript that is a single basis, whereas the matrix representing a linear map $T$ has a subscript which is a pair of bases, with the basis of the range space listed first and that of the domain space listed second. Soon we shall see why this order is the convenient one. When $U = V$ and $\mathcal{B}_1 = \mathcal{B}_2$ it is sometimes the case that we write $[T]_{\mathcal{B}_1}$ in place of $[T]_{\mathcal{B}_1, \mathcal{B}_1}$. And we usually write $\mathcal{L}(V)$ in place of $\mathcal{L}(V, V)$, and $T \in \mathcal{L}(V)$ is called a linear operator on $V$.
In these notes, however, even when there is only one basis of $V$ being used and $T$ is a linear operator on $V$, we sometimes indicate the matrix that represents $T$ with a subscript that is a pair of bases, instead of just one basis, because there are times when we want to think of $T$ as a member of a vector space so that it has a coordinate matrix with respect to some basis of that vector space. Our convention makes it easy to recognize when the matrix represents $T$ with respect to a basis as a linear map and when it represents $T$ as a vector itself which is a linear combination of the elements of some basis. We give an example of this.
Example 4.6.1. Let $V = M_{2,3}(F)$. Put $\mathcal{B} = (v_1, \ldots, v_6)$ where the $v_i$ are defined as follows:
$$v_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad v_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad v_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix};$$
$$v_4 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}; \quad v_5 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}; \quad v_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
It is clear that $\mathcal{B}$ is a basis for $M_{2,3}(F)$, and if $A \in M_{2,3}(F)$, then the coordinate matrix $[A]_{\mathcal{B}}$ of $A$ with respect to the basis $\mathcal{B}$ is
$$[A]_{\mathcal{B}} = (A_{11}, A_{12}, A_{13}, A_{21}, A_{22}, A_{23})^T.$$
Given such a matrix $A$, define a linear map $T_A : F^3 \to F^2$ by $T_A(u) = Au$ for all $u \in F^3$. Let $\mathcal{S}_1 = (e_1, e_2, e_3)$ be the standard ordered basis of $F^3$, and let $\mathcal{S}_2 = (h_1, h_2)$ be the standard ordered basis of $F^2$. So, for example, $h_2 = (0, 1)^T$. You should check that
$$[T_A]_{\mathcal{S}_2, \mathcal{S}_1} = A.$$
For $1 \leq i \leq 3$; $1 \leq j \leq 2$, let $f_{ij} \in \mathcal{L}(F^3, F^2)$ be defined by $f_{ij}(e_k) = \delta_{ik} h_j$. Let $\mathcal{B}_3 = (f_{11}, f_{21}, f_{31}, f_{12}, f_{22}, f_{32})$. We want to figure out what is the coordinate matrix $[T_A]_{\mathcal{B}_3}$. (The next theorem shows that $\mathcal{B}_3$ is indeed a basis of $\mathcal{L}(F^3, F^2)$.)
We claim that $[T_A]_{\mathcal{B}_3} = (a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23})^T$, where $a_{ij} = A_{ij}$. Because of the order in which we listed the basis vectors $f_{ij}$, this is equivalent to saying that $T_A = \sum_{i,j} a_{ij} f_{ji}$. If we evaluate this sum at $e_k$ we get
$$\sum_{i,j} a_{ij} f_{ji}(e_k) = \sum_{i} a_{ik} f_{ki}(e_k) = \sum_{i} a_{ik} h_i = \begin{pmatrix} a_{1k} \\ a_{2k} \end{pmatrix} = Ae_k = T_A(e_k).$$
This establishes our claim.
Recall that $\mathcal{B}_1 = (u_1, \ldots, u_n)$ is an ordered basis for $U$ over the field $F$, and that $\mathcal{B}_2 = (v_1, \ldots, v_m)$ is an ordered basis for $V$ over $F$. Now suppose that $W$ has an ordered basis $\mathcal{B}_3 = (w_1, \ldots, w_p)$.
Let $S \in \mathcal{L}(U, V)$ and $T \in \mathcal{L}(V, W)$, so that $T \circ S \in \mathcal{L}(U, W)$, where $T \circ S$ means do $S$ first. Then we have
$$[(T \circ S)(u)]_{\mathcal{B}_3} = [T(S(u))]_{\mathcal{B}_3} = [T]_{\mathcal{B}_3, \mathcal{B}_2} [S(u)]_{\mathcal{B}_2} = [T]_{\mathcal{B}_3, \mathcal{B}_2} [S]_{\mathcal{B}_2, \mathcal{B}_1} [u]_{\mathcal{B}_1} = [T \circ S]_{\mathcal{B}_3, \mathcal{B}_1} [u]_{\mathcal{B}_1} \text{ for all } u \in U.$$
This implies that
$$[T \circ S]_{\mathcal{B}_3, \mathcal{B}_1} = [T]_{\mathcal{B}_3, \mathcal{B}_2} [S]_{\mathcal{B}_2, \mathcal{B}_1}.$$
This is the equation that suggests that the subscript on the matrix representing a linear map should have the basis for the range space listed first.
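The rule "matrix of a composition equals product of the matrices" can be checked on coordinates. In the Python sketch below we take $U = F^2$, $V = F^3$, $W = F^2$ with their standard bases, so that a map and its matrix can be identified; the particular matrices `S` and `T` are arbitrary choices of ours.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    """Matrix times column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

S = [[1, 0],
     [2, 1],
     [0, 3]]          # represents S : U -> V  (3 x 2)
T = [[1, 1, 0],
     [0, 2, 1]]       # represents T : V -> W  (2 x 3)
TS = matmul(T, S)     # should represent T o S : U -> W  (2 x 2)

u = [1, 2]
two_steps = matvec(T, matvec(S, u))   # apply S first, then T
one_step = matvec(TS, u)              # apply the product matrix once
```

Both computations give the coordinate vector `[5, 14]`, and `TS` is the matrix with rows `[3, 1]` and `[4, 5]`.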
Recall that $\mathcal{L}(U, V)$ is naturally a vector space over $F$ with the usual addition of linear maps and scalar multiplication of linear maps. Moreover, for $a, b \in F$ and $S, T \in \mathcal{L}(U, V)$, it follows easily that
$$[aS + bT]_{\mathcal{B}_2, \mathcal{B}_1} = a[S]_{\mathcal{B}_2, \mathcal{B}_1} + b[T]_{\mathcal{B}_2, \mathcal{B}_1}.$$
We leave the proof of this fact as a straightforward exercise. It then follows that if $U = V$ and $\mathcal{B}_1 = \mathcal{B}_2$, the correspondence $T \mapsto [T]_{\mathcal{B}_1, \mathcal{B}_1} = [T]_{\mathcal{B}_1}$ is an algebra isomorphism. This includes consequences such as $[T^{-1}]_{\mathcal{B}} = ([T]_{\mathcal{B}})^{-1}$ when $T$ happens to be invertible. Proving these facts is a worthwhile exercise!
Let $f_{ij} \in \mathcal{L}(U, V)$ be defined by
$$f_{ij}(u_k) = \delta_{ik} v_j, \quad 1 \leq i, k \leq n; \ 1 \leq j \leq m.$$
So $f_{ij}$ maps $u_i$ to $v_j$ and maps $u_k$ to the zero vector for $k \neq i$. This completely determines $f_{ij}$ as a linear map from $U$ to $V$.
Theorem 4.6.2. The set $\mathcal{B}^* = \{f_{ij} : 1 \leq i \leq n; \ 1 \leq j \leq m\}$ is a basis for $\mathcal{L}(U, V)$ as a vector space over $F$.
Note: We could turn $\mathcal{B}^*$ into a list, but we don't need to.
Proof. We start by showing that $\mathcal{B}^*$ is linearly independent. Suppose that $\sum_{i,j} c_{ij} f_{ij} = 0$, so that $\vec{0} = \sum_{i,j} c_{ij} f_{ij}(u_k) = \sum_{j} c_{kj} v_j$ for each $k$. Since $(v_1, \ldots, v_m)$ is linearly independent, $c_{k1} = c_{k2} = \cdots = c_{km} = 0$, and this holds for each $k$, so the set of $f_{ij}$ must be linearly independent. We now show that it spans $\mathcal{L}(U, V)$. For suppose that $S \in \mathcal{L}(U, V)$ and that $[S]_{\mathcal{B}_2, \mathcal{B}_1} = C = (c_{ij})$, i.e., $S(u_j) = \sum_{i=1}^{m} c_{ij} v_i$. Put $T = \sum_{i,k} c_{ik} f_{ki}$. Then $T(u_j) = \sum_{i,k} c_{ik} f_{ki}(u_j) = \sum_{i} c_{ij} v_i$, implying that $S = T$ since they agree on a basis.
Corollary 4.6.3. If $\dim(U) = n$ and $\dim(V) = m$, then $\dim \mathcal{L}(U, V) = mn$.
4.7 Change of Basis
We want to investigate what happens to coordinate matrices and to matrices representing linear operators when the ordered basis is changed. For the sake of simplicity we shall consider this question only for linear operators on a space $V$, so that we can use a single ordered basis. So in this section we write matrices representing linear operators with a subscript which is a single basis.
Let $F$ be any field and let $V$ be a finite dimensional vector space over $F$, say $\dim(V) = n$. Let $\mathcal{B}_1 = (u_1, \ldots, u_n)$ and $\mathcal{B}_2 = (v_1, \ldots, v_n)$ be two (ordered) bases of $V$. So for $v \in V$, say that $[v]_{\mathcal{B}_1} = (c_1, c_2, \ldots, c_n)^T$, i.e., $v = \sum_{i=1}^{n} c_i u_i$. We often write this equality in the form
$$v = (u_1, \ldots, u_n)[v]_{\mathcal{B}_1} = \mathcal{B}_1 [v]_{\mathcal{B}_1}.$$
Similarly, $v = \mathcal{B}_2 [v]_{\mathcal{B}_2}$.
Since $\mathcal{B}_1$ and $\mathcal{B}_2$ are both bases for $V$, there is an invertible matrix $Q$ such that
$$\mathcal{B}_1 = \mathcal{B}_2 Q \text{ and } \mathcal{B}_2 = \mathcal{B}_1 Q^{-1}.$$
The first equality indicates that $(u_1, \ldots, u_n) = (v_1, \ldots, v_n)Q$, or
$$u_j = \sum_{i=1}^{n} Q_{ij} v_i. \tag{4.1}$$
42 CHAPTER 4. LINEAR TRANSFORMATIONS
This equation says that the jth column of Q is the coordinate matrix of
u
j
with respect to B
2
. Similarly, the jth column of Q
1
is the coordinate
matrix of v
j
with respect to B
1
.
For every v V we now have
v = B
1
[v]
B
1
= (B
2
Q)[v]
B
1
= B
2
[v]
B
2
.
It follows that
Q[v]
B
1
= [v]
B
2
. (4.2)
Now let T /(V ). Recall that the matrix [T]
B
that represents T with
respect to the basis B is the unique matrix for which
[T(v)]
B
= [T]
B
[v]
B
for all v V.
Theorem 4.7.1. Let B
1
= B
2
Q as above. Then [T]
B
2
= Q[T]
B
1
Q
1
.
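Proof. In Eq. 4.2 replace $v$ with $T(v)$ to get
$$Q[T(v)]_{\mathcal{B}_1} = [T(v)]_{\mathcal{B}_2} = [T]_{\mathcal{B}_2} [v]_{\mathcal{B}_2} = [T]_{\mathcal{B}_2} Q [v]_{\mathcal{B}_1}$$
for all $v \in V$. It follows that
$$Q[T]_{\mathcal{B}_1} [v]_{\mathcal{B}_1} = [T]_{\mathcal{B}_2} Q [v]_{\mathcal{B}_1}$$
for all $v \in V$, implying
$$Q[T]_{\mathcal{B}_1} = [T]_{\mathcal{B}_2} Q, \tag{4.3}$$
which is equivalent to the statement of the theorem.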
It is often convenient to remember this fact in the following form (let $P$ play the role of $Q^{-1}$ above):
$$\text{If } \mathcal{B}_2 = \mathcal{B}_1 P, \text{ then } [T]_{\mathcal{B}_2} = P^{-1} [T]_{\mathcal{B}_1} P. \tag{4.4}$$
A Specific Setting
Now let $V = F^n$, whose elements we think of as being column vectors. Let $A$ be an $n \times n$ matrix over $F$ and define $T_A \in \mathcal{L}(F^n)$ by
$$T_A(v) = Av, \quad \text{for all } v \in F^n.$$
Let $\mathcal{S} = (e_1, \ldots, e_n)$ be the standard ordered basis for $F^n$, i.e., $e_j$ is the column vector in $F^n$ whose $j$th entry is 1 and all other entries are equal to 0. It is clear that we can identify each vector $v \in F^n$ with $[v]_{\mathcal{S}}$. Moreover, the $j$th column of $A$ is
$$Ae_j = [Ae_j]_{\mathcal{S}} = [T_A e_j]_{\mathcal{S}} = [T_A]_{\mathcal{S}} [e_j]_{\mathcal{S}} = [T_A]_{\mathcal{S}}\, e_j,$$
which is the $j$th column of $[T_A]_{\mathcal{S}}$. This implies that
$$A = [T_A]_{\mathcal{S}}. \tag{4.5}$$
Putting all this together, we have
Theorem 4.7.2. Let $\mathcal{S} = (e_1, \ldots, e_n)$ be the standard ordered basis of $F^n$. Let $\mathcal{B} = (v_1, \ldots, v_n)$ be a second ordered basis. Let $P$ be the matrix whose $j$th column is $v_j = [v_j]_{\mathcal{S}}$. Let $A$ be an $n \times n$ matrix over $F$ and define $T_A : F^n \to F^n$ by $T_A(v) = Av$. So $[T_A]_{\mathcal{S}} = A$. Then $[T_A]_{\mathcal{B}} = P^{-1}AP$.
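The statement of Theorem 4.7.2 can be checked numerically on a small case. The following is a minimal sketch, assuming $F = \mathbb{Q}$ and $n = 2$; the matrices `A` and `P`, the sample vector `v`, and the helper names `matmul` and `inverse_2x2` are all hypothetical choices made for illustration, not part of the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(P):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: A represents T_A in the standard basis S of Q^2.
A = [[Fraction(2), Fraction(1)],
     [Fraction(0), Fraction(3)]]
# The columns of P are the basis vectors v_1 = (1, 0)^T and v_2 = (1, 1)^T of B.
P = [[Fraction(1), Fraction(1)],
     [Fraction(0), Fraction(1)]]
P_inv = inverse_2x2(P)

# Theorem 4.7.2: [T_A]_B = P^{-1} A P.
T_B = matmul(matmul(P_inv, A), P)

# Check the defining property [T(v)]_B = [T]_B [v]_B on one sample vector.
v = [[Fraction(5)], [Fraction(-2)]]   # v is identified with [v]_S
v_B = matmul(P_inv, v)                # [v]_B = P^{-1} [v]_S, by Eq. 4.2 with Q = P^{-1}
lhs = matmul(P_inv, matmul(A, v))     # [T_A(v)]_B
rhs = matmul(T_B, v_B)                # [T_A]_B [v]_B
```

In this example the two basis vectors happen to be eigenvectors of `A`, so `T_B` comes out diagonal; that is a feature of the chosen data, not of the theorem.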
Definition. Two $n \times n$ matrices $A$ and $B$ over $F$ are said to be similar (written $A \sim B$) if and only if there is an invertible $n \times n$ matrix $P$ such that $B = P^{-1}AP$. You should prove that similarity is an equivalence relation on $M_n(F)$ and then go on to complete the details giving a proof of the following corollary.
Corollary 4.7.3. If $A, B \in M_n(F)$, and if $V$ is an $n$-dimensional vector space over $F$, then $A$ and $B$ are similar if and only if there are bases $\mathcal{B}_1$ and $\mathcal{B}_2$ of $V$ and $T \in \mathcal{L}(V)$ such that $A = [T]_{\mathcal{B}_1}$ and $B = [T]_{\mathcal{B}_2}$.
The Dual Space
We now specialize to the case where $V = F$ is viewed as a vector space over $F$. Here $\mathcal{L}(U, F)$ is denoted $U^*$ and is called the dual space of $U$. An element of $\mathcal{L}(U, F)$ is called a linear functional. Write $\mathcal{B} = (u_1, \ldots, u_n)$ for the fixed ordered basis of $U$. Then $1 \in F$ is a basis of $F$ over $F$, so we write this basis as $\bar{1} = (1)$ and $m = 1$, and there is a basis $\mathcal{B}^*$ of $U^*$ defined by $\mathcal{B}^* = (f_1, \ldots, f_n)$, where $f_i(u_j) = \delta_{ij} \in F$. This basis $\mathcal{B}^*$ is called the basis dual to $\mathcal{B}$. If $f \in U^*$ satisfies $f(u_j) = c_j$ for $1 \leq j \leq n$, then
$$[f]_{\bar{1},\mathcal{B}} = [c_1, \ldots, c_n] = [f(u_1), \ldots, f(u_n)].$$
As above in the more general case, if $g = \sum_i c_i f_i$, then $g(u_j) = \sum_i c_i f_i(u_j) = c_j$, so $f = g$, and $[f]_{\mathcal{B}^*} = [c_1, \ldots, c_n]^T = ([f]_{\bar{1},\mathcal{B}})^T$. We restate this in general:
$$\text{For } f \in U^*, \quad [f]_{\mathcal{B}^*} = ([f]_{\bar{1},\mathcal{B}})^T.$$
Given a vector space U, let GL(U) denote the set of all invertible linear
operators on U. If we let composition of maps be the binary operation of
GL(U), then GL(U) turns out to be a group called the general linear group
on U.
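Suppose that $T \in GL(U)$, i.e., $T$ is an invertible element of $\mathcal{L}(U, U)$. We define a map $\hat{T} : U^* \to U^*$ by
$$\hat{T}(f) = f \circ T^{-1}, \quad \text{for all } f \in U^*.$$
We want to determine the matrix $[\hat{T}]_{\mathcal{B}^*,\mathcal{B}^*}$. We know that the $j$th column of this matrix is
$$[\hat{T}(f_j)]_{\mathcal{B}^*} = [f_j \circ T^{-1}]_{\mathcal{B}^*} = \left([f_j \circ T^{-1}]_{\bar{1},\mathcal{B}}\right)^T = \left([f_j]_{\bar{1},\mathcal{B}}\, [T^{-1}]_{\mathcal{B},\mathcal{B}}\right)^T$$
$$= \left((0, \ldots, 1_j, \ldots, 0) \cdot ([T]_{\mathcal{B},\mathcal{B}})^{-1}\right)^T = ([T]_{\mathcal{B},\mathcal{B}})^{-T} \cdot (0, \ldots, 1_j, \ldots, 0)^T,$$
which is the $j$th column of $([T]_{\mathcal{B},\mathcal{B}})^{-T}$. This says that
$$[\hat{T}]_{\mathcal{B}^*,\mathcal{B}^*} = ([T]_{\mathcal{B},\mathcal{B}})^{-T}.$$
Exercise 18 asks you to show that the map $GL(V) \to GL(V^*) : T \mapsto \hat{T}$ is an isomorphism.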
4.8 Exercises
1. Let $T \in \mathcal{L}(U, V)$. Then $\mathrm{null}(T)$ is a subspace of $U$ and $\mathrm{Im}(T)$ is a subspace of $V$.
2. Suppose that $V$ and $W$ are finite-dimensional and that $U$ is a subspace of $V$. Prove that there exists a $T \in \mathcal{L}(V, W)$ such that $\mathrm{null}(T) = U$ if and only if $\dim(U) \geq \dim(V) - \dim(W)$.
3. Let $T \in \mathcal{L}(V)$. Put $R = \mathrm{Im}(T)$ and $N = \mathrm{null}(T)$. Note that both $R$ and $N$ are $T$-invariant. Show that $R$ has a complementary $T$-invariant subspace $W$ (i.e., $V = R \oplus W$ and $T(W) \subseteq W$) if and only if $R \cap N = \{0\}$, in which case $N$ is the unique $T$-invariant subspace complementary to $R$.
4. State and prove Theorem 4.3.1 for linear maps (instead of for matrices).
5. Prove Corollary 4.6.3.
6. If $T \in \mathcal{L}(U, V)$, we know that as a function from $U$ to $V$, $T$ has an inverse if and only if it is bijective (i.e., one-to-one and onto). Show that when $T$ is invertible as a function, then its inverse is in $\mathcal{L}(V, U)$.
7. If $T \in \mathcal{L}(U, V)$ is invertible, and if $\mathcal{B}_1$ is a basis for $U$ and $\mathcal{B}_2$ is a basis for $V$, then $([T]_{\mathcal{B}_2,\mathcal{B}_1})^{-1} = [T^{-1}]_{\mathcal{B}_1,\mathcal{B}_2}$.
8. Two vector spaces $U$ and $V$ are said to be isomorphic provided there is an invertible $T \in \mathcal{L}(U, V)$. Show that if $U$ and $V$ are finite-dimensional vector spaces over $F$, then $U$ and $V$ are isomorphic if and only if $\dim(U) = \dim(V)$.
9. Let $A \in M_{m,n}(F)$ and $b \in M_{m,1}(F) = F^m$. Consider the matrix equation $Ax = \vec{b}$ as a system of $m$ linear equations in $n$ unknowns $x_1, \ldots, x_n$. Interpret Obs. 4.2.3 for this system of linear equations.
10. Let $\mathcal{B}_1$, $\mathcal{B}_2$ be bases for $U$ and $V$, respectively, with $\dim(U) = n$ and $\dim(V) = m$. Show that the map
$$\mathcal{M} : \mathcal{L}(U, V) \to M_{m,n}(F) : T \mapsto [T]_{\mathcal{B}_2,\mathcal{B}_1}$$
is an invertible linear map.
11. Suppose that $V$ is finite-dimensional and that $T \in \mathcal{L}(V)$. Show that the following are equivalent:
(i) $T$ is invertible.
(ii) $T$ is injective.
(iii) $T$ is surjective.
12. Suppose that $V$ is finite dimensional and $S, T \in \mathcal{L}(V)$. Prove that $ST$ is invertible if and only if both $S$ and $T$ are invertible.
13. Suppose that $V$ is finite dimensional and $T \in \mathcal{L}(V)$. Prove that $T$ is a scalar multiple of the identity if and only if $ST = TS$ for every $S \in \mathcal{L}(V)$.
14. Suppose that $W$ is finite dimensional and $T \in \mathcal{L}(V, W)$. Prove that $T$ is injective if and only if there exists an $S \in \mathcal{L}(W, V)$ such that $ST$ is the identity map on $V$.
15. Suppose that $V$ is finite dimensional and $T \in \mathcal{L}(V, W)$. Prove that $T$ is surjective if and only if there exists an $S \in \mathcal{L}(W, V)$ such that $TS$ is the identity map on $W$.
16. Let $V$ be the vector space over the reals $\mathbb{R}$ consisting of the polynomials in $x$ of degree at most 4 with coefficients in $\mathbb{R}$, and with the usual addition of vectors (i.e., polynomials) and scalar multiplication.
Let $\mathcal{B}_1 = \{1, x, x^2, \ldots, x^4\}$ be the standard ordered basis of $V$. Let $W$ be the vector space of $2 \times 3$ matrices over $\mathbb{R}$ with the usual addition of matrices and scalar multiplication. Let $\mathcal{B}_2$ be the ordered basis of $W$ given as follows:
$$\mathcal{B}_2 = \left\{ v_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\ v_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},\ v_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \right.$$
$$\left. v_4 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},\ v_5 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},\ v_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right\}.$$
Define $T : V \to W$ by: For $f = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$, put
$$T(f) = \begin{pmatrix} 0 & a_3 & a_2 + a_4 \\ a_1 + a_0 & a_0 & 0 \end{pmatrix}.$$
Construct the matrix $A = [T]_{\mathcal{B}_1,\mathcal{B}_2}$ that represents $T$ with respect to the pair $\mathcal{B}_1, \mathcal{B}_2$ of ordered bases.
17. Let $T \in \mathcal{L}(V)$. Prove or disprove each of the following:
(a) $V = \mathrm{null}(T) \oplus \mathrm{range}(T)$.
(b) There exists a subspace $U$ of $V$ such that $U \cap \mathrm{null}(T) = \{0\}$ and $\mathrm{range}(T) = \{T(u) : u \in U\}$.
18. Let $V$ be a finite dimensional vector space over $F$. Show that the map $GL(V) \to GL(V^*) : T \mapsto \hat{T}$ is an isomorphism.
Chapter 5
Polynomials
5.1 Algebras
It is often the case that basic facts about polynomials are taken for granted as being well-known and the subject is never developed in a formal manner. In this chapter, which we usually assign as independent reading, we wish to give the student a somewhat formal introduction to the algebra of polynomials over a field. It is then natural to generalize to polynomials with coefficients from some more general algebraic structure, such as a commutative ring. The title of the course for which this book is intended includes the words "linear algebra," so we feel some obligation to define what a linear algebra is.
Definition. Let $F$ be a field. A linear algebra over the field $F$ is a vector space $\mathcal{A}$ over $F$ with an additional operation called multiplication of vectors which associates with each pair of vectors $u, v \in \mathcal{A}$ a vector $uv$ in $\mathcal{A}$ called the product of $u$ and $v$ in such a way that
(a) multiplication is associative: $u(vw) = (uv)w$ for all $u, v, w \in \mathcal{A}$;
(b) multiplication distributes over addition: $u(v + w) = (uv) + (uw)$ and $(u + v)w = (uw) + (vw)$, for all $u, v, w \in \mathcal{A}$;
(c) for each scalar $c \in F$, $c(uv) = (cu)v = u(cv)$ for all $u, v \in \mathcal{A}$.
If there is an element $1 \in \mathcal{A}$ such that $1u = u1 = u$ for each $u \in \mathcal{A}$, we call $\mathcal{A}$ a linear algebra with identity over $F$, and call 1 the identity of $\mathcal{A}$. The algebra $\mathcal{A}$ is called commutative provided $uv = vu$ for all $u, v \in \mathcal{A}$.
Warning: What we have just called a linear algebra is sometimes called a linear associative algebra, because multiplication of vectors is associative. There are important situations in which nonassociative linear algebras (i.e., not necessarily associative algebras) are studied. We will not meet them in this course, so for us all linear algebras are associative.
Example 5.1.1. The set of $n \times n$ matrices over a field, with the usual operations, is a linear algebra with identity; in particular the field itself is an algebra with identity. This algebra is not commutative if $n \geq 2$. Of course, the field itself is commutative.
Example 5.1.2. The space of all linear operators on a vector space, with composition as the product, is a linear algebra with identity. It is commutative if and only if the space is one-dimensional.
Now we turn our attention to the construction of an algebra which is quite different from the two just given. Let $F$ be a field and let $S$ be the set of all nonnegative integers. We have seen that the set of all functions from $S$ into $F$ is a vector space which we now denote by $F^\infty$. The vectors in $F^\infty$ are just infinite sequences (i.e., lists) $f = (f_0, f_1, f_2, \ldots)$ of scalars $f_i \in F$. If $g = (g_0, g_1, g_2, \ldots)$ and $a, b \in F$, then $af + bg$ is the infinite list given by
$$af + bg = (af_0 + bg_0, af_1 + bg_1, \ldots) \tag{5.1}$$
We define a product in $F^\infty$ by associating with each pair $(f, g)$ of vectors in $F^\infty$ the vector $fg$ which is given by
$$(fg)_n = \sum_{i=0}^{n} f_i g_{n-i}, \quad n = 0, 1, 2, \ldots \tag{5.2}$$
Since multiplication in $F$ is commutative, it is easy to show that multiplication in $F^\infty$ is also commutative. In fact, it is a relatively routine task to show that $F^\infty$ is now a linear algebra with identity over $F$. Of course the vector $(1, 0, 0, \ldots)$ is the identity, and the vector $x = (0, 1, 0, 0, \ldots)$ plays a distinguished role. Throughout this chapter $x$ will continue to denote this particular vector (and will never be an element of the field $F$). The product of $x$ with itself $n$ times will be denoted by $x^n$, and by convention $x^0 = 1$. Then
$$x^2 = (0, 0, 1, 0, \ldots), \quad x^3 = (0, 0, 0, 1, 0, \ldots), \quad \text{etc.}$$
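The product in Eq. 5.2 can be tried out on finite prefixes of sequences, since $(fg)_n$ depends only on $f_0, \ldots, f_n$ and $g_0, \ldots, g_n$. The sketch below assumes integer coefficients and truncates every sequence to its first five entries; the function name `series_product` and the sample sequences are illustrative choices, not notation from the text.

```python
def series_product(f, g):
    """First len(f) coefficients of fg, where (fg)_n = sum_{i=0}^{n} f_i * g_{n-i}.

    f and g are equal-length lists holding the leading coefficients
    of two sequences in F^infinity.
    """
    n_terms = len(f)
    return [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(n_terms)]

one = [1, 0, 0, 0, 0]   # the identity (1, 0, 0, ...)
x = [0, 1, 0, 0, 0]     # the distinguished vector x = (0, 1, 0, 0, ...)

x2 = series_product(x, x)    # x^2 = (0, 0, 1, 0, ...)
x3 = series_product(x2, x)   # x^3 = (0, 0, 0, 1, 0, ...)

# Two arbitrary sample sequences, used to spot-check commutativity.
f = [1, 2, 3, 0, 0]
g = [4, 5, 0, 0, 0]
fg = series_product(f, g)
gf = series_product(g, f)
```

Multiplying by `x` shifts every entry one place to the right, which is why the powers of `x` pick out successive positions.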
Obs. 5.1.3. The list $(1, x, x^2, \ldots)$ is both independent and infinite. Thus the algebra $F^\infty$ is not finite dimensional.
The algebra $F^\infty$ is sometimes called the algebra of formal power series over $F$. The element $f = (f_0, f_1, f_2, \ldots)$ is frequently written as
$$f = \sum_{n=0}^{\infty} f_n x^n. \tag{5.3}$$
This notation is very convenient, but it must be remembered that it is purely formal. In algebra there is no such thing as an infinite sum, and the power series notation is not intended to suggest anything about convergence.
5.2 The Algebra of Polynomials
Definition. Let $F[x]$ be the subspace of $F^\infty$ spanned by the vectors $1, x, x^2, \ldots$. An element of $F[x]$ is called a polynomial over $F$.
Since $F[x]$ consists of all (finite) linear combinations of $x$ and its powers, a non-zero vector $f$ in $F^\infty$ is a polynomial if and only if there is an integer $n \geq 0$ such that $f_n \neq 0$ and such that $f_k = 0$ for all integers $k > n$. This integer (when it exists) is called the degree of $f$ and is denoted by $\deg(f)$. The zero polynomial is said to have degree $-\infty$. So if $f \in F[x]$ has degree $n$, it may be written in the form
$$f = f_0 x^0 + f_1 x + f_2 x^2 + \cdots + f_n x^n, \quad f_n \neq 0.$$
Usually $f_0 x^0$ is simply written $f_0$ and called a scalar polynomial. A non-zero polynomial $f$ of degree $n$ such that $f_n = 1$ is called a monic polynomial. The verification of the various parts of the next result guaranteeing that $\mathcal{A} = F[x]$ is an algebra is routine and is left to the reader.
Theorem 5.2.1. Let $f$ and $g$ be non-zero polynomials over $F$. Then
(i) $fg$ is a non-zero polynomial;
(ii) $\deg(fg) = \deg(f) + \deg(g)$;
(iii) $fg$ is a monic polynomial if both $f$ and $g$ are monic;
(iv) $fg$ is a scalar polynomial if and only if both $f$ and $g$ are scalar polynomials;
(v) $\deg(f + g) \leq \max\{\deg(f), \deg(g)\}$.
Corollary 5.2.2. The set $F[x]$ of all polynomials over a given field $F$ with the addition and multiplication given above is a commutative linear algebra with identity over $F$.
Corollary 5.2.3. Suppose $f$, $g$, and $h$ are polynomials over $F$ such that $f \neq 0$ and $fg = fh$. Then $g = h$.
Proof. Since $fg = fh$, also $f(g - h) = 0$. Since $f \neq 0$, it follows from (i) above that $g - h = 0$.
Let $f = \sum_{i=0}^{m} f_i x^i$ and $g = \sum_{j=0}^{n} g_j x^j$, and interpret $f_k = 0$, $g_t = 0$, if $k > m$, $t > n$, respectively. Then
$$fg = \sum_{i,j} f_i g_j x^{i+j} = \sum_{i=0}^{m+n} \left( \sum_{j=0}^{i} f_j g_{i-j} \right) x^i,$$
where the first sum is extended over all integer pairs $(i, j)$ with $0 \leq i \leq m$ and $0 \leq j \leq n$.
Definition. Let $\mathcal{A}$ be a linear algebra with identity over the field $F$. We denote the identity of $\mathcal{A}$ by 1 and make the convention that $u^0 = 1$ for each $u \in \mathcal{A}$. Then to each polynomial $f = \sum_{i=0}^{n} f_i x^i$ over $F$ and $u \in \mathcal{A}$, we associate an element $f(u)$ in $\mathcal{A}$ by the rule
$$f(u) = \sum_{i=0}^{n} f_i u^i.$$
Warning: $f_0 u^0 = f_0 \cdot 1$, where 1 is the multiplicative identity of $\mathcal{A}$.
Example 5.2.4. Let $\mathbb{C}$ be the field of complex numbers and let $f = x^2 + 2$.
(a) If $\mathcal{A} = \mathbb{C}$ and $z \in \mathbb{C}$, $f(z) = z^2 + 2$, in particular $f(3) = 11$ and
$$f\left( \frac{1 + i}{1 - i} \right) = 1.$$
(b) If $\mathcal{A}$ is the algebra of all $2 \times 2$ matrices over $\mathbb{C}$ and if
$$B = \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix},$$
then
$$f(B) = 2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix}^2 = \begin{pmatrix} 3 & 0 \\ 3 & 6 \end{pmatrix}.$$
(c) If $\mathcal{A}$ is the algebra of all linear operators on $\mathbb{C}^3$ and $T$ is the element of $\mathcal{A}$ given by
$$T(c_1, c_2, c_3) = (i\sqrt{2}\, c_1, c_2, i\sqrt{2}\, c_3),$$
then $f(T)$ is the linear operator on $\mathbb{C}^3$ defined by
$$f(T)(c_1, c_2, c_3) = (0, 3c_2, 0).$$
(d) If $\mathcal{A}$ is the algebra of all polynomials over $\mathbb{C}$ and $g = x^4 + 3i$, then $f(g)$ is the polynomial in $\mathcal{A}$ given by
$$f(g) = -7 + 6ix^4 + x^8.$$
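Parts (a) and (b) of Example 5.2.4 can be reproduced by evaluating $f = x^2 + 2$ with Horner's rule, once in the algebra of scalars and once in the algebra of $2 \times 2$ matrices. This is only a sketch: the helper names (`mat_mul`, `mat_add`, `scalar_mat`, `eval_at_scalar`, `eval_at_matrix`) are hypothetical, and polynomials are stored as coefficient lists $[f_0, f_1, \ldots, f_n]$.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]

def scalar_mat(c, n):
    """c times the n x n identity matrix (the image of the scalar c in the matrix algebra)."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]

def eval_at_scalar(coeffs, z):
    """f(z) by Horner's rule, with coeffs = [f_0, ..., f_n]."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result

def eval_at_matrix(coeffs, M):
    """f(M) = sum_i f_i M^i by Horner's rule; note that f_0 contributes f_0 * I."""
    n = len(M)
    result = scalar_mat(0, n)
    for c in reversed(coeffs):
        result = mat_add(mat_mul(result, M), scalar_mat(c, n))
    return result

f = [2, 0, 1]                              # f = x^2 + 2
f_at_3 = eval_at_scalar(f, 3)              # part (a), f(3)
f_at_z = eval_at_scalar(f, (1 + 1j) / (1 - 1j))   # part (a), z = (1+i)/(1-i) = i
B = [[1, 0], [1, 2]]
f_at_B = eval_at_matrix(f, B)              # part (b)
```

The matrix version makes the earlier warning concrete: the constant term $f_0$ enters as $f_0$ times the identity matrix, not as a bare scalar.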
Theorem 5.2.5. Let $F$ be a field, $\mathcal{A}$ a linear algebra with identity over $F$, $f, g \in F[x]$, $u \in \mathcal{A}$ and $c \in F$. Then:
(i) $(cf + g)(u) = cf(u) + g(u)$;
(ii) $(fg)(u) = f(u)g(u) = (gf)(u)$.
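Proof. We leave (i) as an exercise. So for (ii), suppose
$$f = \sum_{i=0}^{m} f_i x^i \quad \text{and} \quad g = \sum_{j=0}^{n} g_j x^j.$$
Recall that $fg = \sum_{i,j} f_i g_j x^{i+j}$. So using (i) we obtain
$$(fg)(u) = \sum_{i,j} f_i g_j u^{i+j} = \left( \sum_{i=0}^{m} f_i u^i \right) \left( \sum_{j=0}^{n} g_j u^j \right) = f(u)g(u).$$
Since $fg = gf$ in $F[x]$, this element also equals $(gf)(u)$.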
Fix $u \in \mathcal{A}$ and define $E_u : F[x] \to \mathcal{A}$ by
$$E_u(f) = f(u). \tag{5.4}$$
Using Theorem 5.2.5 it is now easy to see that the map $E_u : F[x] \to \mathcal{A}$ is an algebra homomorphism, i.e., it preserves addition and multiplication. There is a special case of this that is so important for us that we state it as a separate corollary.
Corollary 5.2.6. If $\mathcal{A} = \mathcal{L}(V)$ and $T \in \mathcal{A}$, and if $f, g \in F[x]$, then $(f \cdot g)(T) = f(T) \circ g(T)$.
5.3 Lagrange Interpolation
Throughout this section $F$ is a fixed field and $t_0, t_1, \ldots, t_n$ are $n + 1$ distinct elements of $F$. Put $V = \{f \in F[x] : \deg(f) \leq n\}$, and define $E_i : V \to F$ by $E_i(f) = f(t_i)$, $0 \leq i \leq n$.
By Theorem 5.2.5 each $E_i$ is a linear functional on $V$. Moreover, we show that $\mathcal{B}^* = (E_0, E_1, \ldots, E_n)$ is the basis of $V^*$ dual to a particular basis of $V$. Put
$$p_i = \prod_{j \neq i} \left( \frac{x - t_j}{t_i - t_j} \right).$$
Then each $p_i$ has degree $n$, so belongs to $V$, and
$$E_j(p_i) = p_i(t_j) = \delta_{ij}. \tag{5.5}$$
It will turn out that $\mathcal{B} = (p_0, \ldots, p_n)$ is a basis for $V$, and then Eq. 5.5 expresses what we mean by saying that $\mathcal{B}^*$ is the basis dual to $\mathcal{B}$.
If $f = \sum_{i=0}^{n} c_i p_i$, then for each $j$
$$f(t_j) = \sum_{i} c_i p_i(t_j) = c_j. \tag{5.6}$$
So if $f$ is the zero polynomial, each $c_j$ must equal 0, implying that the list $(p_0, \ldots, p_n)$ is linearly independent in $V$. Since $(1, x, x^2, \ldots, x^n)$ is a basis for $V$, clearly $\dim(V) = n + 1$. Hence $\mathcal{B} = (p_0, \ldots, p_n)$ must be a basis for $V$. It then follows from Eq. 5.6 that for each $f \in V$, we have
$$f = \sum_{i=0}^{n} f(t_i) p_i. \tag{5.7}$$
The expression in Eq. 5.7 is known as Lagrange's Interpolation Formula. Setting $f = x^j$ in Eq. 5.7 we obtain
$$x^j = \sum_{i=0}^{n} (t_i)^j p_i \tag{5.8}$$
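Equations 5.5 and 5.7 can be checked on a small instance. The sketch below assumes $F = \mathbb{Q}$ (exact arithmetic via `Fraction`) and stores a polynomial as its coefficient list $[f_0, f_1, \ldots]$; the function names and the sample points $t = (0, 1, 2)$ are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def lagrange_basis(ts):
    """Return [p_0, ..., p_n] with p_i = prod_{j != i} (x - t_j) / (t_i - t_j)."""
    basis = []
    for i, ti in enumerate(ts):
        p = [Fraction(1)]
        for j, tj in enumerate(ts):
            if j != i:
                denom = Fraction(ti - tj)
                # Multiply by the linear factor (x - t_j) / (t_i - t_j).
                p = poly_mul(p, [-tj / denom, 1 / denom])
        basis.append(p)
    return basis

def evaluate(f, t):
    """f(t) by Horner's rule."""
    result = Fraction(0)
    for c in reversed(f):
        result = result * t + c
    return result

def interpolate(ts, values):
    """Eq. 5.7: the polynomial sum_i values[i] * p_i."""
    f = [Fraction(0)] * len(ts)
    for value, p in zip(values, lagrange_basis(ts)):
        for k, c in enumerate(p):
            f[k] += value * c
    return f

ts = [0, 1, 2]
basis = lagrange_basis(ts)
# The values 1, 3, 7 are those of x^2 + x + 1 at t = 0, 1, 2.
f = interpolate(ts, [1, 3, 7])
```

Because a polynomial of degree at most $n$ is determined by its values at $n + 1$ distinct points, `interpolate` recovers the coefficients of $x^2 + x + 1$ from its three sampled values.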
Definition. Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be two linear algebras over $F$. They are said to be isomorphic provided there is a one-to-one mapping $u \mapsto u'$ of $\mathcal{A}_1$ onto $\mathcal{A}_2$ such that
(a) $(cu + dv)' = cu' + dv'$
and
(b) $(uv)' = u'v'$
for all $u, v \in \mathcal{A}_1$ and all scalars $c, d \in F$. The mapping $u \mapsto u'$ is called an isomorphism of $\mathcal{A}_1$ onto $\mathcal{A}_2$. An isomorphism of $\mathcal{A}_1$ onto $\mathcal{A}_2$ is thus a vector space isomorphism of $\mathcal{A}_1$ onto $\mathcal{A}_2$ which has the additional property of preserving products.
Example 5.3.1. Let $V$ be an $n$-dimensional vector space over the field $F$. As we have seen earlier, each ordered basis $\mathcal{B}$ of $V$ determines an isomorphism $T \mapsto [T]_{\mathcal{B}}$ of the algebra of linear operators on $V$ onto the algebra of $n \times n$ matrices over $F$. Suppose now that $S$ is a fixed linear operator on $V$ and that we are given a polynomial
$$f = \sum_{i=0}^{n} c_i x^i$$
with coefficients $c_i \in F$. Then
$$f(S) = \sum_{i=0}^{n} c_i S^i.$$
Since $T \mapsto [T]_{\mathcal{B}}$ is a linear mapping,
$$[f(S)]_{\mathcal{B}} = \sum_{i=0}^{n} c_i [S^i]_{\mathcal{B}}.$$
From the additional fact that
$$[T_1 T_2]_{\mathcal{B}} = [T_1]_{\mathcal{B}} [T_2]_{\mathcal{B}}$$
for all $T_1, T_2 \in \mathcal{L}(V)$, it follows that
$$[S^i]_{\mathcal{B}} = ([S]_{\mathcal{B}})^i, \quad 2 \leq i \leq n.$$
As this relation is also valid for $i = 0$ and 1, we obtain the result that
Obs. 5.3.2.
$$[f(S)]_{\mathcal{B}} = f([S]_{\mathcal{B}}).$$
In other words, if $S \in \mathcal{L}(V)$, the matrix of a polynomial in $S$, with respect to a given basis, is the same polynomial in the matrix of $S$.
5.4 Polynomial Ideals
In this section we are concerned primarily with the fact that F[x] is a prin-
cipal ideal domain.
Lemma 5.4.1. Suppose $f$ and $d$ are non-zero polynomials in $F[x]$ such that $\deg(d) \leq \deg(f)$. Then there exists a polynomial $g \in F[x]$ for which $\deg(f - dg) < \deg(f)$.
Note: This includes the possibility that $f = dg$ so $\deg(f - dg) = -\infty$.
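Proof. Suppose
$$f = a_m x^m + \sum_{i=0}^{m-1} a_i x^i, \quad a_m \neq 0$$
and that
$$d = b_n x^n + \sum_{i=0}^{n-1} b_i x^i, \quad b_n \neq 0.$$
Then $m \geq n$ and
$$f - \left( \frac{a_m}{b_n} \right) x^{m-n} d = 0 \quad \text{or} \quad \deg\left( f - \left( \frac{a_m}{b_n} \right) x^{m-n} d \right) < \deg(f).$$
Thus we may take $g = \left( \frac{a_m}{b_n} \right) x^{m-n}$.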
This lemma is useful in showing that the usual algorithm for "long division" of polynomials works over any field.
Theorem 5.4.2. If $f, d \in F[x]$ and $d \neq 0$, then there are unique polynomials $q, r \in F[x]$ such that
(i) $f = dq + r$;
(ii) $\deg(r) < \deg(d)$.
Proof. If $\deg(f) < \deg(d)$ we may take $q = 0$ and $r = f$. In case $f \neq 0$ and $\deg(f) \geq \deg(d)$, the preceding lemma shows that we may choose a polynomial $g \in F[x]$ such that $\deg(f - dg) < \deg(f)$. If $f - dg \neq 0$ and $\deg(f - dg) \geq \deg(d)$ we choose a polynomial $h \in F[x]$ such that
$$\deg[f - d(g + h)] < \deg(f - dg).$$
Continuing this process as long as necessary, we ultimately obtain polynomials $q, r$ satisfying (i) and (ii).
Suppose we also have $f = dq_1 + r_1$ where $\deg(r_1) < \deg(d)$. Then $dq + r = dq_1 + r_1$ and $d(q - q_1) = r_1 - r$. If $q - q_1 \neq 0$, then $d(q - q_1) \neq 0$ and
$$\deg(d) + \deg(q - q_1) = \deg(r_1 - r).$$
But since the degree of $r_1 - r$ is less than the degree of $d$, this is impossible. Hence $q = q_1$ and then $r = r_1$.
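The existence half of the proof is really an algorithm: repeatedly apply Lemma 5.4.1 to cancel the leading term of the current remainder. A minimal sketch follows, assuming $F = \mathbb{Q}$ and coefficient lists $[f_0, f_1, \ldots]$; the names `degree` and `poly_divmod` are hypothetical, and `None` is used here as a stand-in for the degree $-\infty$ of the zero polynomial.

```python
from fractions import Fraction

def degree(f):
    """Degree of a coefficient list; None stands for the zero polynomial."""
    for k in range(len(f) - 1, -1, -1):
        if f[k] != 0:
            return k
    return None

def poly_divmod(f, d):
    """Return (q, r) with f = d*q + r and either r = 0 or deg(r) < deg(d)."""
    n = degree(d)
    if n is None:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - n, 1)
    m = degree(r)
    while m is not None and m >= n:
        # Lemma 5.4.1: subtract (a_m / b_n) x^(m-n) d to cancel the leading term of r.
        coeff = r[m] / d[n]
        q[m - n] += coeff
        for k in range(n + 1):
            r[m - n + k] -= coeff * d[k]
        m = degree(r)
    return q, r

# Sample division: f = x^3 - 2x + 5 by d = x - 2.
f = [5, -2, 0, 1]
d = [-2, 1]
q, r = poly_divmod(f, d)
```

For this sample input the quotient is $x^2 + 2x + 2$ and the remainder is the scalar polynomial 9. The loop terminates because each pass strictly lowers the degree of `r`, exactly as in the proof.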
Definition. Let $d$ be a non-zero polynomial over the field $F$. If $f \in F[x]$, the preceding theorem shows that there is at most one polynomial $q \in F[x]$ such that $f = dq$. If such a $q$ exists we say that $d$ divides $f$, that $f$ is divisible by $d$, and call $q$ the quotient of $f$ by $d$. We also write $q = f/d$.
Corollary 5.4.3. Let $f \in F[x]$, and let $c \in F$. Then $f$ is divisible by $x - c$ if and only if $f(c) = 0$.
Proof. By the theorem, $f = (x - c)q + r$ where $r$ is a scalar polynomial. By Theorem 5.2.5,
$$f(c) = 0 \cdot q(c) + r(c) = r(c).$$
Hence $r = 0$ if and only if $f(c) = 0$.
Definition. Let $F$ be a field. An element $c \in F$ is said to be a root or a zero of a given polynomial $f \in F[x]$ provided $f(c) = 0$.
Corollary 5.4.4. A polynomial $f \in F[x]$ of degree $n$ has at most $n$ roots in $F$.
Proof. The result is obviously true for polynomials of degree 0 or 1. We assume it to be true for polynomials of degree $n - 1$. If $a$ is a root of $f$, $f = (x - a)q$ where $q$ has degree $n - 1$. Since $f(b) = 0$ if and only if $a = b$ or $q(b) = 0$, it follows by our induction hypothesis that $f$ has at most $n$ roots.
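The role of the field hypothesis in Corollary 5.4.4 can be seen by brute force over small moduli. The sketch below is an illustration only: it counts the roots of $x^2 - 1$ in $\mathbb{Z}/7$ (a field) and in $\mathbb{Z}/8$ (a commutative ring that is not a field); the function name `roots_mod` and the choice of moduli are assumptions made for the example.

```python
def roots_mod(coeffs, modulus):
    """All c in {0, ..., modulus-1} with f(c) = 0 (mod modulus); coeffs = [f_0, ..., f_n]."""
    found = []
    for c in range(modulus):
        value = 0
        for coefficient in reversed(coeffs):   # Horner evaluation, reduced at each step
            value = (value * c + coefficient) % modulus
        if value == 0:
            found.append(c)
    return found

f = [-1, 0, 1]                  # f = x^2 - 1, of degree 2
roots_field = roots_mod(f, 7)   # Z/7 is a field: at most 2 roots
roots_ring = roots_mod(f, 8)    # Z/8 is not a field: the bound can fail
```

In $\mathbb{Z}/8$ the step "$f(b) = 0$ if and only if $a = b$ or $q(b) = 0$" breaks down because a product of two non-zero elements can be zero, and the polynomial $x^2 - 1$ picks up four roots.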
Definition. Let $F$ be a field. An ideal in $F[x]$ is a subspace $M$ of $F[x]$ such that $fg$ belongs to $M$ whenever $f \in F[x]$ and $g \in M$.
Example 5.4.5. If $F$ is a field and $d \in F[x]$, the set $M = dF[x]$ of all multiples $df$ of $d$ by arbitrary $f$ in $F[x]$ is an ideal. This is because $d \in M$ (so $M$ is nonempty), and it is easy to check that $M$ is closed under addition and under multiplication by any element of $F[x]$. The ideal $M$ is called the principal ideal generated by $d$ and is denoted by $dF[x]$. If $d$ is not the zero polynomial and its leading coefficient is $a$, then $d_1 = a^{-1}d$ is monic and $d_1 F[x] = dF[x]$.
Example 5.4.6. Let $d_1, \ldots, d_n$ be a finite number of polynomials over $F$. Then the (vector space) sum $M$ of the subspaces $d_i F[x]$ is a subspace and is also an ideal. $M$ is the ideal generated by the polynomials $d_1, \ldots, d_n$.
The following result is the main theorem on ideals in F[x].
Theorem 5.4.7. Let $M$ be any non-zero ideal in $F[x]$. Then there is a unique monic polynomial $d \in F[x]$ such that $M$ is the principal ideal $dF[x]$ generated by $d$.
Proof. Among all nonzero polynomials in $M$ there is (at least) one of minimal degree. Hence there must be a monic polynomial $d$ of least degree in $M$. Suppose that $f$ is any element of $M$. We can divide $f$ by $d$ and get a unique quotient and remainder: $f = qd + r$ where $\deg(r) < \deg(d)$. Then $r = f - qd \in M$, but $\deg(r)$ is less than the smallest degree of any nonzero polynomial in $M$. Hence $r = 0$. So $f = qd$. This shows that $M \subseteq dF[x]$. Clearly $d \in M$ implies $dF[x] \subseteq M$, so in fact $M = dF[x]$. If $d_1$ and $d_2$ are two monic polynomials in $F[x]$ for which $M = d_1 F[x] = d_2 F[x]$, then $d_1$ divides $d_2$ and $d_2$ divides $d_1$. Since they are monic, they clearly must be identical.
Definition. If $p_1, \ldots, p_k \in F[x]$ and not all of them are zero, then the monic generator $d$ of the ideal $p_1 F[x] + \cdots + p_k F[x]$ is called the greatest common divisor (gcd) of $p_1, \ldots, p_k$. We say that the polynomials $p_1, \ldots, p_k$ are relatively prime if the greatest common divisor is 1, or equivalently if the ideal they generate is all of $F[x]$.
There is a very important (but now almost trivial) consequence of Theorem 5.4.7. It is easy to show that if $A$ is any $n \times n$ matrix over any field $F$, the set $\{f(x) \in F[x] : f(A) = 0\}$ is an ideal in $F[x]$. Hence there is a unique monic polynomial $p(x)$ of least degree such that $p(A) = 0$, and if $g(x) \in F[x]$ satisfies $g(A) = 0$, then $g(x) = p(x)h(x)$ for some $h(x) \in F[x]$. This polynomial $p(x)$ is called the minimal polynomial of $A$.
The exercises at the end of this chapter are to be considered an integral
part of the chapter. You should study them all.
Definition. The field $F$ is called algebraically closed provided each polynomial in $F[x]$ that is irreducible over $F$ has degree 1.
To say that $F$ is algebraically closed means that every non-scalar irreducible monic polynomial over $F$ is of the form $x - c$. So to say $F$ is algebraically closed really means that each non-scalar polynomial $f$ in $F[x]$ can be expressed in the form
$$f = c(x - c_1)^{n_1} \cdots (x - c_k)^{n_k}$$
where $c$ is a scalar and $c_1, \ldots, c_k$ are distinct elements of $F$. It is also true that $F$ is algebraically closed provided that each non-scalar polynomial over $F$ has a root in $F$.
The field $\mathbb{R}$ of real numbers is not algebraically closed, since the polynomial $(x^2 + 1)$ is irreducible over $\mathbb{R}$ but not of degree 1. The Fundamental Theorem of Algebra states that the field $\mathbb{C}$ of complex numbers is algebraically closed. We shall prove this theorem later after we have introduced the concepts of determinant and of eigenvalues.
The Fundamental Theorem of Algebra also makes it clear what the possibilities are for the prime factorization of a polynomial with real coefficients. If $f$ is a polynomial with real coefficients and $c$ is a complex root of $f$, then the complex conjugate $\bar{c}$ is also a root of $f$. Therefore, those complex roots which are not real must occur in conjugate pairs, and the entire set of roots has the form $\{b_1, \ldots, b_r, c_1, \bar{c}_1, \ldots, c_k, \bar{c}_k\}$, where $b_1, \ldots, b_r$ are real and $c_1, \ldots, c_k$ are non-real complex numbers. Thus $f$ factors as
$$f = c(x - b_1) \cdots (x - b_r)\, p_1 \cdots p_k$$
where $p_i$ is the quadratic polynomial
$$p_i = (x - c_i)(x - \bar{c}_i).$$
These polynomials $p_i$ have real coefficients. We see that every irreducible polynomial over the real number field has degree 1 or 2. Each polynomial over $\mathbb{R}$ is the product of certain linear factors given by the real roots of $f$ and certain irreducible quadratic polynomials.
5.5 Exercises
1. (The Binomial Theorem) Let $\binom{m}{k} = \frac{m!}{k!(m-k)!}$ be the usual binomial coefficient. Let $a, b$ be elements of any commutative ring. Then
$$(a + b)^m = \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k.$$
Note that the binomial coefficient is an integer that may be reduced modulo any modulus $p$. It follows that even when the denominator of $\binom{m}{k}$ appears to be 0 in some ring of characteristic $p$, for example, this binomial coefficient can be interpreted modulo $p$. For example
$$\binom{6}{3} = 20 \equiv 2 \pmod{3},$$
even though $3!$ is zero modulo 3.
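The remark about reducing binomial coefficients can be made concrete: compute $\binom{m}{k}$ as an integer first and reduce afterwards, so no division by $k!$ ever happens in the ring of characteristic $p$. The sketch below uses Python's `math.comb`; the function names and the brute-force check over $\mathbb{Z}/3$ are illustrative assumptions.

```python
from math import comb

def binomial_mod(m, k, p):
    """The integer binomial coefficient C(m, k), reduced modulo p afterwards."""
    return comb(m, k) % p

def binomial_expansion_mod(a, b, m, p):
    """Right-hand side of the Binomial Theorem, evaluated in Z/p."""
    return sum(binomial_mod(m, k, p) * a ** (m - k) * b ** k
               for k in range(m + 1)) % p

example = binomial_mod(6, 3, 3)   # C(6, 3) = 20, reduced modulo 3

# Brute-force check of (a + b)^6 against the expansion for all a, b in Z/3.
all_agree = all(pow(a + b, 6, 3) == binomial_expansion_mod(a, b, 6, 3)
                for a in range(3) for b in range(3))
```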
The derivative of the polynomial
$$f = c_0 + c_1 x + \cdots + c_n x^n$$
is the polynomial
$$f' = Df = c_1 + 2c_2 x + \cdots + nc_n x^{n-1}.$$
Note: $D$ is a linear operator on $F[x]$.
2. (Taylor's Formula). Let $F$ be any field, let $n$ be any positive integer, let $c \in F$, and let $f \in F[x]$ have degree $m \leq n$. Then
$$f = \sum_{k=0}^{n} \frac{D^k(f)(c)}{k!} (x - c)^k.$$
Be sure to explain how to deal with the case where $k!$ is divisible by the characteristic of the field $F$.
If $f \in F[x]$ and $c \in F$, the multiplicity of $c$ as a root of $f$ is the largest positive integer $r$ such that $(x - c)^r$ divides $f$.
3. Show that if the multiplicity of $c$ as a root of $f$ is $r \geq 2$, then the multiplicity of $c$ as a root of $f'$ is at least $r - 1$.
4. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Let $M = \{f \in F[x] : f(A) = 0\}$. Show that $M$ is a nonzero ideal. (Hint: consider the polynomial $f(x) = x^2 - (a + d)x + (ad - bc)$.)
Definition. Let $F$ be a field. A polynomial $f \in F[x]$ is said to be reducible over $F$ provided there are polynomials $g, h \in F[x]$ with degree at least 1 for which $f = gh$. If $f$ is not reducible over $F$, it is said to be irreducible over $F$. A polynomial $p(x) \in F[x]$ of degree at least 1 is said to be a prime polynomial over $F$ provided whenever $p$ divides a product $gh$ of two polynomials in $F[x]$ then it has to divide at least one of $g$ and $h$.
5. Show that a polynomial $p(x) \in F[x]$ with degree at least 1 is prime over $F$ if and only if it is irreducible over $F$.
6. (The Primary Decomposition of $f$) If $F$ is a field, a non-scalar monic polynomial in $F[x]$ can be factored as a product of monic primes in $F[x]$ in one and, except for order, only one way. If $p_1, \ldots, p_k$ are the distinct monic primes occurring in this factorization of $f$, then
$$f = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k},$$
where $n_i$ is the number of times the prime $p_i$ occurs in this factorization. This decomposition is also clearly unique and is called the primary decomposition of $f$ (or simply the prime factorization of $f$).
7. Let $f$ be a non-scalar monic polynomial over the field $F$, and let
$$f = p_1^{n_1} \cdots p_k^{n_k}$$
be the prime factorization of $f$. For each $j$, $1 \leq j \leq k$, let
$$f_j = \frac{f}{p_j^{n_j}} = \prod_{i \neq j} p_i^{n_i}.$$
Then $f_1, \ldots, f_k$ are relatively prime.
8. Using the same notation as in the preceding problem, suppose that $f = p_1 \cdots p_k$ is a product of distinct non-scalar irreducible polynomials over $F$. So $f_j = f/p_j$. Show that
$$f' = p_1' f_1 + p_2' f_2 + \cdots + p_k' f_k.$$
9. Let $f \in F[x]$ have derivative $f'$. Then $f$ is a product of distinct irreducible polynomials over $F$ if and only if $f$ and $f'$ are relatively prime.
10. Euclidean algorithm for polynomials
Let $f(x)$ and $g(x)$ be polynomials over $F$ for which
$$\deg(f(x)) \geq \deg(g(x)) \geq 1.$$
Use the division algorithm for polynomials to compute polynomials $q_i(x)$ and $r_i(x)$ as follows.
$f(x) = q_1(x)g(x) + r_1(x)$, $\deg r_1(x) < \deg(g(x))$. If $r_1(x) \neq 0$, then
$g(x) = q_2(x)r_1(x) + r_2(x)$, $\deg r_2(x) < \deg(r_1(x))$. If $r_2(x) \neq 0$, then
$r_1(x) = q_3(x)r_2(x) + r_3(x)$, $\deg r_3(x) < \deg(r_2(x))$. If $r_3(x) \neq 0$, then
$r_2(x) = q_4(x)r_3(x) + r_4(x)$, $\deg r_4(x) < \deg(r_3(x))$. If $r_4(x) \neq 0$, then
$\vdots$
$r_j(x) = q_{j+2}(x)r_{j+1}(x) + r_{j+2}(x)$, $\deg r_{j+2}(x) < \deg(r_{j+1}(x))$. If $r_{j+2}(x) = 0$,
$r_j(x) = q_{j+2}(x)r_{j+1}(x) + 0$.
Show that $r_{j+1}(x) = \gcd(f(x), g(x))$. Then use these equations to obtain polynomials $a(x)$ and $b(x)$ for which $r_{j+1}(x) = a(x)f(x) + b(x)g(x)$. The case where 1 is the gcd of $f(x)$ and $g(x)$ is especially useful.
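For experimenting with the chain of divisions in Exercise 10, here is a minimal computational sketch over $F = \mathbb{Q}$. It is not a solution to the exercise, which asks for a proof. Polynomials are coefficient lists $[f_0, f_1, \ldots]$ with the zero polynomial stored as `[]`; all function names are hypothetical; the inputs are assumed not both zero; and the final result is rescaled to be monic so that it matches the definition of the gcd given earlier.

```python
from fractions import Fraction

def trim(f):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def add(f, g):
    """Sum of two polynomials."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])

def mul(f, g):
    """Product of two polynomials."""
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(f, d):
    """Quotient and remainder of f by a nonzero, trimmed d."""
    q, r = [], trim(f)
    while len(r) >= len(d):
        shift = len(r) - len(d)
        term = [Fraction(0)] * shift + [Fraction(r[-1]) / d[-1]]
        q = add(q, term)
        r = add(r, mul([-c for c in term], d))   # cancels the leading term of r
    return q, r

def extended_gcd(f, g):
    """Return (d, a, b) with d = a*f + b*g and d the monic gcd (f, g not both zero)."""
    r0, r1 = trim(f), trim(g)
    a0, a1 = [Fraction(1)], []
    b0, b1 = [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        neg_q = [-c for c in q]
        r0, r1 = r1, r
        a0, a1 = a1, add(a0, mul(neg_q, a1))
        b0, b1 = b1, add(b0, mul(neg_q, b1))
    lead = r0[-1]
    return ([Fraction(c) / lead for c in r0],
            [Fraction(c) / lead for c in a0],
            [Fraction(c) / lead for c in b0])

f = [-1, 0, 0, 1]   # x^3 - 1
g = [-1, 0, 1]      # x^2 - 1
d, a, b = extended_gcd(f, g)
```

For these sample inputs the last non-zero remainder is $x - 1$, obtained as $1 \cdot f + (-x) \cdot g$. The invariant maintained by the loop is that each remainder $r_i$ equals $a_i f + b_i g$, which is what allows the final equation to be read off.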
Chapter 6
Determinants
We assume that the reader has met the notion of a commutative ring $K$ with 1. Our main goal in this chapter is to study the usual determinant function defined on the set of square matrices with entries from such a $K$. However, essentially nothing from the general theory of commutative rings with 1 will be used.
One of the main types of application of the notion of determinant is to determinants of matrices whose entries are polynomials in one or more indeterminates over a field $F$. So we might have $K = F[x]$, the ring of polynomials in the indeterminate $x$ with coefficients from the field $F$. It is also quite useful to consider the theory of determinants over the ring $\mathbb{Z}$ of rational integers.
6.1 Determinant Functions
Throughout these notes $K$ will be a commutative ring with identity. Then for each positive integer $n$ we wish to assign to each $n \times n$ matrix over $K$ a scalar (element of $K$) to be known as the determinant of the matrix. As soon as we have defined these terms we may say that the determinant function is $n$-linear alternating with value 1 at the identity matrix.
Definition. Let $D$ be a function which assigns to each $n \times n$ matrix $A$ over $K$ a scalar $D(A)$ in $K$. We say that $D$ is $n$-linear provided that for each $i$, $1 \leq i \leq n$, $D$ is a linear function of the $i$th row when the other $n - 1$ rows are held fixed.
Perhaps this definition needs some clarification. If $D$ is a function from $M_n(K)$ into $K$, and if $\alpha_1, \ldots, \alpha_n$ are the rows of the matrix $A$, we also write
$$D(A) = D(\alpha_1, \ldots, \alpha_n),$$
that is, we think of $D$ as a function of the rows of $A$. The statement that $D$ is $n$-linear then means
$$D(\alpha_1, \ldots, c\alpha_i + \alpha_i', \ldots, \alpha_n) = cD(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_n) + D(\alpha_1, \ldots, \alpha_i', \ldots, \alpha_n). \tag{6.1}$$
If we fix all rows except row $i$ and regard $D$ as a function of the $i$th row, it is often convenient to write $D(\alpha_i)$ for $D(A)$. Thus we may abbreviate Eq. 6.1 to
$$D(c\alpha_i + \alpha_i') = cD(\alpha_i) + D(\alpha_i'),$$
so long as it is clear what the meaning is.
In the following sometimes we use $A_{ij}$ to denote the element in row $i$ and column $j$ of the matrix $A$, and sometimes we write $A(i, j)$.
Example 6.1.1. Let $k_1, \ldots, k_n$ be positive integers, $1 \leq k_i \leq n$, and let $a$ be any element of $K$. For each $n \times n$ matrix $A$ over $K$, define
$$D(A) = aA(1, k_1)A(2, k_2) \cdots A(n, k_n). \tag{6.2}$$
Then the function defined by Eq. 6.2 is $n$-linear. For, if we regard $D$ as a function of the $i$th row of $A$, the others being fixed, we may write
$$D(\alpha_i) = A(i, k_i)b$$
where $b$ is some fixed element of $K$ ($b$ is $a$ multiplied by one entry of $A$ from each row other than the $i$th row). Let $\alpha_i' = (A'_{i1}, \ldots, A'_{in})$. Then we have
$$D(c\alpha_i + \alpha_i') = [cA(i, k_i) + A'(i, k_i)]b = cD(\alpha_i) + D(\alpha_i').$$
Thus $D$ is a linear function of each of the rows of $A$.
A particular $n$-linear function of this type is just the product of the diagonal entries:
$$D(A) = A_{11}A_{22} \cdots A_{nn}.$$
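Example 2. We find all 2-linear functions on $2 \times 2$ matrices over $K$. Let $D$ be such a function. If we denote the rows of the $2 \times 2$ identity matrix by $\epsilon_1$ and $\epsilon_2$, then we have
$$D(A) = D(A_{11}\epsilon_1 + A_{12}\epsilon_2,\ A_{21}\epsilon_1 + A_{22}\epsilon_2).$$
Using the fact that $D$ is 2-linear, we have
$$D(A) = A_{11}D(\epsilon_1, A_{21}\epsilon_1 + A_{22}\epsilon_2) + A_{12}D(\epsilon_2, A_{21}\epsilon_1 + A_{22}\epsilon_2)$$
$$= A_{11}A_{21}D(\epsilon_1, \epsilon_1) + A_{11}A_{22}D(\epsilon_1, \epsilon_2) + A_{12}A_{21}D(\epsilon_2, \epsilon_1) + A_{12}A_{22}D(\epsilon_2, \epsilon_2).$$
This $D$ is completely determined by the four scalars
$$D(\epsilon_1, \epsilon_1),\quad D(\epsilon_1, \epsilon_2),\quad D(\epsilon_2, \epsilon_1),\quad D(\epsilon_2, \epsilon_2).$$
It is now routine to verify the following. If $a, b, c, d$ are any four scalars in $K$ and if we define
$$D(A) = A_{11}A_{21}a + A_{11}A_{22}b + A_{12}A_{21}c + A_{12}A_{22}d,$$
then $D$ is a 2-linear function on $2 \times 2$ matrices over $K$ and
$$D(\epsilon_1, \epsilon_1) = a, \quad D(\epsilon_1, \epsilon_2) = b,$$
$$D(\epsilon_2, \epsilon_1) = c, \quad D(\epsilon_2, \epsilon_2) = d.$$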
Lemma 6.1.2. A linear combination of $n$-linear functions is $n$-linear.
Proof. It suffices to prove that a linear combination of two $n$-linear functions is $n$-linear. Let $D$ and $E$ be $n$-linear functions. If $a$ and $b$ are elements of $K$ the linear combination $aD + bE$ is defined by
$$(aD + bE)(A) = aD(A) + bE(A).$$
Hence, if we fix all rows except row $i$,
$$(aD + bE)(c\alpha_i + \alpha_i') = aD(c\alpha_i + \alpha_i') + bE(c\alpha_i + \alpha_i')$$
$$= acD(\alpha_i) + aD(\alpha_i') + bcE(\alpha_i) + bE(\alpha_i')$$
$$= c(aD + bE)(\alpha_i) + (aD + bE)(\alpha_i').$$
64 CHAPTER 6. DETERMINANTS
NOTE: If $K$ is a field and $V$ is the set of $n \times n$ matrices over $K$, the above lemma says the following: the set of $n$-linear functions on $V$ is a subspace of the space of all functions from $V$ into $K$.
Example 3. Let $D$ be the function defined on $2 \times 2$ matrices over $K$ by
\[ D(A) = A_{11}A_{22} - A_{12}A_{21}. \tag{6.3} \]
This $D$ is the sum of two functions of the type described in Example 6.1.1:
\[ D = D_1 + D_2, \qquad D_1(A) = A_{11}A_{22}, \qquad D_2(A) = -A_{12}A_{21}. \tag{6.4} \]
Most readers will recognize this $D$ as the usual determinant function and will recall that it satisfies several additional properties, such as the following one.
6.1.3 n-Linear Alternating Functions
Definition: Let $D$ be an $n$-linear function. We say $D$ is alternating provided $D(A) = 0$ whenever two rows of $A$ are equal.
Lemma 6.1.4. Let $D$ be an $n$-linear alternating function, and let $A$ be $n \times n$. If $A'$ is obtained from $A$ by interchanging two rows of $A$, then $D(A') = -D(A)$.
Proof. If the $i$th row of $A$ is $\alpha$ and the $j$th row of $A$ is $\beta$, $i \ne j$, and all other rows are held constant, we write $D(\alpha, \beta)$ in place of $D(A)$. By linearity in rows $i$ and $j$,
\[ D(\alpha + \beta, \alpha + \beta) = D(\alpha, \alpha) + D(\alpha, \beta) + D(\beta, \alpha) + D(\beta, \beta). \]
By hypothesis, $D(\alpha + \beta, \alpha + \beta) = D(\alpha, \alpha) = D(\beta, \beta) = 0$. So
\[ 0 = D(\alpha, \beta) + D(\beta, \alpha). \]
If we assume that $D$ is $n$-linear and has the property that $D(A') = -D(A)$ when $A'$ is obtained from $A$ by interchanging any two rows of $A$, then if $A$ has two equal rows, clearly $D(A) = -D(A)$. If the characteristic of the ring $K$ is odd or zero, then this forces $D(A) = 0$, so $D$ is alternating. But, for example, if $K$ is an integral domain with characteristic 2, this is clearly not the case.
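A concrete instance of the characteristic 2 phenomenon may help. The sketch below is an illustration chosen here, not taken from the text: over $\mathbb{Z}/2$ the function $D(A) = A_{11}A_{21}$ (of the type in Eq. 6.2) satisfies $D(A') = -D(A)$ for every row interchange, yet it does not vanish on a matrix with two equal rows.

```python
def D(A, p=2):
    """D(A) = A11 * A21 reduced mod p: a 2-linear function on 2x2 matrices over Z/p."""
    return (A[0][0] * A[1][0]) % p

def swap_rows(A):
    """Interchange the two rows of a 2x2 matrix."""
    return [A[1], A[0]]

# All sixteen 2x2 matrices over Z/2.
matrices = [[[a, b], [c, d]] for a in (0, 1) for b in (0, 1)
            for c in (0, 1) for d in (0, 1)]

# Over Z/2 we have -x == x, so D(A') == -D(A) holds for every A ...
sign_flips = all(D(swap_rows(A)) == (-D(A)) % 2 for A in matrices)

# ... yet D does not vanish on a matrix with two equal rows.
equal_rows_value = D([[1, 0], [1, 0]])

print(sign_flips, equal_rows_value)  # True 1
```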
6.1.5 A Determinant Function - The Laplace Expansion
Definition: Let $K$ be a commutative ring with 1, and let $n$ be a positive integer. Suppose $D$ is a function from $n \times n$ matrices over $K$ into $K$. We say that $D$ is a determinant function if $D$ is $n$-linear, alternating and $D(I) = 1$.
It is clear that there is a unique determinant function on $1 \times 1$ matrices, and we are now in a position to handle the $2 \times 2$ case. It should be clear that the function given in Example 3 is a determinant function. Furthermore, the formulas exhibited in Example 2 make it easy to see that the $D$ given in Example 3 is the unique determinant function: in the notation of Example 2, alternating forces $a = d = 0$ and $c = -b$, and $D(I) = 1$ forces $b = 1$.
Lemma 6.1.6. Let $D$ be an $n$-linear function on $n \times n$ matrices over $K$. Suppose $D$ has the property that $D(A) = 0$ when any two adjacent rows of $A$ are equal. Then $D$ is alternating.
Proof. Let $B$ be obtained by interchanging rows $i$ and $j$ of $A$, where $i < j$. We can obtain $B$ from $A$ by a succession of interchanges of pairs of adjacent rows. We begin by interchanging row $i$ with row $i + 1$ and continue until the rows are in the order
\[ \alpha_1, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_j, \alpha_i, \alpha_{j+1}, \ldots, \alpha_n. \]
This requires $k = j - i$ interchanges of adjacent rows. We now move $\alpha_j$ to the $i$th position using $k - 1$ interchanges of adjacent rows. We have thus obtained $B$ from $A$ by $2k - 1$ interchanges of adjacent rows. The argument of Lemma 6.1.4 uses only the vanishing of $D$ on the two rows being interchanged, so it applies to each adjacent interchange, and therefore
\[ D(B) = (-1)^{2k-1} D(A) = -D(A). \]
Suppose $A$ is any $n \times n$ matrix with two equal rows, say $\alpha_i = \alpha_j$ with $i < j$. If $j = i + 1$, then $A$ has two equal and adjacent rows, so $D(A) = 0$. If $j > i + 1$, we interchange $\alpha_{i+1}$ and $\alpha_j$ and the resulting matrix $B$ has two equal and adjacent rows, so $D(B) = 0$. On the other hand, $D(B) = -D(A)$, hence $D(A) = 0$.
Lemma 6.1.7. Let $K$ be a commutative ring with 1 and let $D$ be an alternating $n$-linear function on $n \times n$ matrices over $K$. Then
(a) $D(A) = 0$ if one of the rows of $A$ is 0.
(b) $D(B) = D(A)$ if $B$ is obtained from $A$ by adding a scalar multiple of one row of $A$ to a different row of $A$.
Proof. For part (a), suppose the $i$th row $\alpha_i$ is a zero row. Since $\alpha_i + \alpha_i = \alpha_i$, linearity of $D$ in the $i$th row gives $D(\alpha_i) = D(\alpha_i + \alpha_i) = D(\alpha_i) + D(\alpha_i)$, which forces $D(A) = D(\alpha_i) = 0$. For part (b), if $i \ne j$, write $D(A) = D(\alpha_i, \alpha_j)$, with all rows other than the $i$th and $j$th held fixed. Then $D(B) = D(\alpha_i + c\alpha_j, \alpha_j) = D(\alpha_i, \alpha_j) + cD(\alpha_j, \alpha_j) = D(A) + 0$.
Definition: If $n > 1$ and $A$ is an $n \times n$ matrix over $K$, we let $A(i|j)$ denote the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$th row and $j$th column of $A$. If $D$ is an $(n-1)$-linear function and $A$ is an $n \times n$ matrix, we put $D_{ij}(A) = D[A(i|j)]$.
Theorem 6.1.8. Let $n > 1$ and let $D$ be an alternating $(n-1)$-linear function on $(n-1) \times (n-1)$ matrices over $K$. For each $j$, $1 \le j \le n$, the function $E_j$ defined by
\[ E_j(A) = \sum_{i=1}^{n} (-1)^{i+j} A_{ij} D_{ij}(A) \tag{6.5} \]
is an alternating $n$-linear function on $n \times n$ matrices $A$. If $D$ is a determinant function, so is each $E_j$.
Proof. If $A$ is an $n \times n$ matrix, $D_{ij}(A)$ is independent of the $i$th row of $A$. Since $D$ is $(n-1)$-linear, it is clear that $D_{ij}$ is linear as a function of any row except row $i$. Therefore $A_{ij}D_{ij}(A)$ is an $n$-linear function of $A$. Hence $E_j$ is $n$-linear by Lemma 6.1.2. To prove that $E_j$ is alternating it will suffice, by Lemma 6.1.6, to show that $E_j(A) = 0$ whenever $A$ has two equal and adjacent rows. Suppose $\alpha_k = \alpha_{k+1}$. If $i \ne k$ and $i \ne k + 1$, the matrix $A(i|j)$ has two equal rows, and thus $D_{ij}(A) = 0$. Therefore,
\[ E_j(A) = (-1)^{k+j} A_{kj} D_{kj}(A) + (-1)^{k+1+j} A_{(k+1)j} D_{(k+1)j}(A). \]
Since $\alpha_k = \alpha_{k+1}$,
\[ A_{kj} = A_{(k+1)j} \quad \text{and} \quad A(k|j) = A(k+1|j). \]
Clearly then $E_j(A) = 0$.
Now suppose $D$ is a determinant function. If $I^{(n)}$ is the $n \times n$ identity matrix, then $I^{(n)}(j|j)$ is the $(n-1) \times (n-1)$ identity matrix $I^{(n-1)}$. Since $I^{(n)}_{ij} = \delta_{ij}$, it follows from Eq. 6.5 that
\[ E_j(I^{(n)}) = D(I^{(n-1)}). \tag{6.6} \]
Now $D(I^{(n-1)}) = 1$, so that $E_j(I^{(n)}) = 1$ and $E_j$ is a determinant function.
We emphasize that this last Theorem (together with a simple induction argument) shows that if $K$ is a commutative ring with identity and $n \ge 1$, then there exists at least one determinant function on $K^{n \times n}$. In the next section we will show that there is only one determinant function. The determinant function $E_j$ is referred to as the Laplace expansion of the determinant along the $j$th column. There is a similar Laplace expansion of the determinant along the $i$th row of $A$ which will eventually show up as an easy corollary.
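The recursive construction of Theorem 6.1.8 translates directly into code. The following is a minimal sketch, assuming integer matrices stored as lists of rows with 0-based indices; `minor` and `laplace_det` are names chosen here for illustration. It also illustrates, on one example, that every column $j$ yields the same value, as the uniqueness result of the next section predicts.

```python
def minor(A, i, j):
    """A(i|j): delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def laplace_det(A, j=0):
    """E_j(A) = sum_i (-1)^(i+j) * A_ij * D(A(i|j)), expanding along column j.

    The (n-1) x (n-1) function D is computed recursively (along column 0).
    The sign (-1)^(i+j) is the same for 0-based and 1-based indices.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * laplace_det(minor(A, i, j))
               for i in range(n))

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
values = [laplace_det(A, j) for j in range(3)]
print(values)  # [39, 39, 39]
```

The recursion performs on the order of $n!$ multiplications, so it is useful for checking small examples rather than for serious computation.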
6.2 Permutations & Uniqueness of Determinants
6.2.1 A Formula for the Determinant
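Suppose that $D$ is an alternating $n$-linear function on $n \times n$ matrices over $K$. Let $A$ be an $n \times n$ matrix over $K$ with rows $\alpha_1, \alpha_2, \ldots, \alpha_n$. If we denote the rows of the $n \times n$ identity matrix over $K$ by $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$, then
\[ \alpha_i = \sum_{j=1}^{n} A_{ij}\epsilon_j, \qquad 1 \le i \le n. \tag{6.7} \]
Hence
\[ D(A) = D\Big(\sum_j A_{1j}\epsilon_j, \alpha_2, \ldots, \alpha_n\Big) = \sum_j A_{1j} D(\epsilon_j, \alpha_2, \ldots, \alpha_n). \]
If we now replace $\alpha_2$ with $\sum_k A_{2k}\epsilon_k$, we see that
\[ D(A) = \sum_{j,k} A_{1j}A_{2k} D(\epsilon_j, \epsilon_k, \alpha_3, \ldots, \alpha_n). \]
In this expression replace $\alpha_3$ by $\sum_l A_{3l}\epsilon_l$, etc. We finally obtain
\[ D(A) = \sum_{k_1, k_2, \ldots, k_n} A_{1k_1}A_{2k_2} \cdots A_{nk_n} D(\epsilon_{k_1}, \ldots, \epsilon_{k_n}). \tag{6.8} \]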
Here the sum is over all sequences $(k_1, k_2, \ldots, k_n)$ of positive integers not exceeding $n$. This shows that $D$ is a finite sum of functions of the type described by Eq. 6.2. Note that Eq. 6.8 is a consequence just of the assumption that $D$ is $n$-linear, and that a special case was obtained earlier in Example 2. Since $D$ is alternating,
\[ D(\epsilon_{k_1}, \epsilon_{k_2}, \ldots, \epsilon_{k_n}) = 0 \]
whenever two of the indices $k_i$ are equal. A sequence $(k_1, k_2, \ldots, k_n)$ of positive integers not exceeding $n$, with the property that no two of the $k_i$ are equal, is called a permutation of degree $n$. In Eq. 6.8 we need therefore sum only over those sequences which are permutations of degree $n$.
A permutation of degree $n$ may be defined as a one-to-one function $\sigma$ from the set $\{1, 2, \ldots, n\}$ onto itself. Such a function corresponds to the $n$-tuple $(\sigma 1, \sigma 2, \ldots, \sigma n)$ and is thus simply a rule for ordering $1, 2, \ldots, n$ in some well-defined way.
If $D$ is an alternating $n$-linear function and $A$ is an $n \times n$ matrix over $K$, we then have
\[ D(A) = \sum_{\sigma} A_{1(\sigma 1)} \cdots A_{n(\sigma n)} D(\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}) \tag{6.9} \]
where the sum is extended over the distinct permutations $\sigma$ of degree $n$.
Next we shall show that
\[ D(\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}) = \pm D(\epsilon_1, \ldots, \epsilon_n) \tag{6.10} \]
where the sign $\pm$ depends only on the permutation $\sigma$. The reason for this is as follows. The sequence $(\sigma 1, \sigma 2, \ldots, \sigma n)$ can be obtained from the sequence $(1, 2, \ldots, n)$ by a finite number of interchanges of pairs of elements. For example, if $\sigma 1 \ne 1$, we can transpose 1 and $\sigma 1$, obtaining $(\sigma 1, \ldots, 1, \ldots)$. Proceeding in this way we shall arrive at the sequence $(\sigma 1, \ldots, \sigma n)$ after $n$ or fewer such interchanges of pairs. Since $D$ is alternating, the sign of its value changes each time that we interchange two of the rows $\epsilon_i$ and $\epsilon_j$. Thus, if we pass from $(1, 2, \ldots, n)$ to $(\sigma 1, \sigma 2, \ldots, \sigma n)$ by means of $m$ interchanges of pairs $(i, j)$, we shall have
\[ D(\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}) = (-1)^m D(\epsilon_1, \ldots, \epsilon_n). \]
In particular, if $D$ is a determinant function
\[ D(\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}) = (-1)^m, \tag{6.11} \]
where $m$ depends only on $\sigma$, not on $D$. Thus all determinant functions assign the same value to the matrix with rows $\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}$, and this value is either 1 or $-1$.
A basic fact about permutations is the following: if $\sigma$ is a permutation of degree $n$, one can pass from the sequence $(1, 2, \ldots, n)$ to the sequence $(\sigma 1, \sigma 2, \ldots, \sigma n)$ by a succession of interchanges of pairs, and this can be done in a variety of ways. However, no matter how it is done, the number of interchanges used is either always even or always odd. The permutation is then called even or odd, respectively. One defines the sign of a permutation by
\[ \operatorname{sgn} \sigma = \begin{cases} 1, & \text{if } \sigma \text{ is even;} \\ -1, & \text{if } \sigma \text{ is odd.} \end{cases} \]
We shall establish this basic property of permutations below from what we already know about determinant functions. However, for the moment let us assume this property. Then the integer $m$ occurring in Eq. 6.11 is always even if $\sigma$ is an even permutation, and is always odd if $\sigma$ is an odd permutation. For any alternating $n$-linear function $D$ we then have
\[ D(\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}) = (\operatorname{sgn} \sigma) D(\epsilon_1, \ldots, \epsilon_n), \]
and using Eq. 6.9 we obtain
\[ D(A) = \Big[ \sum_{\sigma} (\operatorname{sgn} \sigma) A_{1(\sigma 1)} \cdots A_{n(\sigma n)} \Big] D(I). \tag{6.12} \]
From Eq. 6.12 we see that there is precisely one determinant function on $n \times n$ matrices over $K$. If we denote this function by det, it is given by
\[ \det(A) = \sum_{\sigma} (\operatorname{sgn} \sigma) A_{1(\sigma 1)} A_{2(\sigma 2)} \cdots A_{n(\sigma n)}, \tag{6.13} \]
the sum being extended over the distinct permutations $\sigma$ of degree $n$. We can formally summarize this as follows.
Theorem 6.2.2. Let $K$ be a commutative ring with 1 and let $n$ be a positive integer. There is precisely one determinant function on the set of $n \times n$ matrices over $K$ and it is the function det defined by Eq. 6.13. If $D$ is any alternating $n$-linear function on $M_n(K)$, then for each $n \times n$ matrix $A$,
\[ D(A) = (\det A) D(I). \]
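Eq. 6.13 can be evaluated directly for small matrices. The sketch below is illustrative only: it assumes permutations are given as tuples of 0-based images, and it computes the sign from the parity of the number of inversions, which is a standard fact that agrees with the parity of the number of interchanges but is not proved in the text.

```python
from itertools import permutations

def sign(perm):
    """sgn of a permutation (tuple of 0-based images), via the parity of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def leibniz_det(A):
    """det(A) = sum over sigma of sgn(sigma) * A[0][sigma 0] * ... * A[n-1][sigma(n-1)]."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

print(leibniz_det([[1, 2], [3, 4]]))                     # -2
print(leibniz_det([[2, 0, 1], [1, 3, -1], [0, 5, 4]]))   # 39
```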
This is the theorem we have been working towards, but we have left a gap in the proof. That gap is the proof that for a given permutation $\sigma$, when we pass from $(1, 2, \ldots, n)$ to $(\sigma 1, \ldots, \sigma n)$ by interchanging pairs, the number of interchanges is always even or always odd. This basic combinatorial fact can be proved without any reference to determinants. However, we now point out how it follows from the existence of a determinant function on $n \times n$ matrices.
Let $K$ be the ring of rational integers. Let $D$ be a determinant function on the $n \times n$ matrices over $K$. Let $\sigma$ be a permutation of degree $n$, and suppose we pass from $(1, 2, \ldots, n)$ to $(\sigma 1, \ldots, \sigma n)$ by $m$ interchanges of pairs $(i, j)$, $i \ne j$. As we showed in Eq. 6.11
\[ (-1)^m = D(\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}), \]
that is, the number $(-1)^m$ must be the value of $D$ on the matrix with rows $\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}$. If
\[ D(\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}) = 1, \]
then $m$ must be even. If
\[ D(\epsilon_{\sigma 1}, \ldots, \epsilon_{\sigma n}) = -1, \]
then $m$ must be odd.
From the point of view of products of permutations, the basic property of the sign of a permutation is that
\[ \operatorname{sgn}(\sigma\tau) = (\operatorname{sgn} \sigma)(\operatorname{sgn} \tau). \tag{6.14} \]
This result also follows from the theory of determinants (but is well known in the theory of the symmetric group independent of any determinant theory). In fact, it is an easy corollary of the following theorem.
Theorem 6.2.3. Let $K$ be a commutative ring with identity, and let $A$ and $B$ be $n \times n$ matrices over $K$. Then
\[ \det(AB) = (\det A)(\det B). \]
Proof. Let $B$ be a fixed $n \times n$ matrix over $K$, and for each $n \times n$ matrix $A$ define $D(A) = \det(AB)$. If we denote the rows of $A$ by $\alpha_1, \ldots, \alpha_n$, then
\[ D(\alpha_1, \ldots, \alpha_n) = \det(\alpha_1 B, \ldots, \alpha_n B). \]
Here $\alpha_j B$ denotes the $1 \times n$ matrix which is the product of the $1 \times n$ matrix $\alpha_j$ and the $n \times n$ matrix $B$. Since
\[ (c\alpha_i + \alpha_i')B = c\alpha_i B + \alpha_i' B \]
and det is $n$-linear, it is easy to see that $D$ is $n$-linear. If $\alpha_i = \alpha_j$, then $\alpha_i B = \alpha_j B$, and since det is alternating,
\[ D(\alpha_1, \ldots, \alpha_n) = 0. \]
Hence, $D$ is alternating. So $D$ is an alternating $n$-linear function, and by Theorem 6.2.2
\[ D(A) = (\det A) D(I). \]
But $D(I) = \det(IB) = \det B$, so
\[ \det(AB) = D(A) = (\det A)(\det B). \]
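Both the product rule and Eq. 6.14 are easy to spot-check. The following is a small sketch under stated assumptions: integer matrices as lists of rows, permutations as tuples of 0-based images, the sign computed by inversion parity, and `compose(sigma, tau)` meaning "apply `tau` first, then `sigma`". The matrices `A` and `B` are arbitrary examples chosen here.

```python
from itertools import permutations

def sign(perm):
    """sgn via the parity of the number of inversions."""
    inv = sum(1 for i in range(len(perm))
              for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det(A):
    """Determinant from the permutation-sum formula (Eq. 6.13)."""
    total = 0
    for perm in permutations(range(len(A))):
        term = sign(perm)
        for i in range(len(A)):
            term *= A[i][perm[i]]
        total += term
    return total

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def compose(sigma, tau):
    """The permutation i -> sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))

A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
B = [[2, 1, 1], [0, 3, -2], [1, 0, 1]]
product_ok = det(matmul(A, B)) == det(A) * det(B)

# sgn(sigma tau) == sgn(sigma) * sgn(tau) for all pairs in S_4
sign_ok = all(sign(compose(s, t)) == sign(s) * sign(t)
              for s in permutations(range(4)) for t in permutations(range(4)))
print(product_ok, sign_ok)  # True True
```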
6.3 Additional Properties of Determinants
Several well-known properties of the determinant function are now easy consequences of results we have already obtained, Eq. 6.13 and Theorem 6.2.3, for example. We give a few of these, proving some and leaving the others as rather routine exercises.
Recall that a unit $u$ in a ring $K$ with 1 is an element for which there is an element $v \in K$ for which $uv = vu = 1$.
6.3.1 If A is a unit in M_n(K), then det(A) is a unit in K.
If $A$ is an invertible $n \times n$ matrix over a commutative ring $K$ with 1, then $\det(A)$ is a unit in the ring $K$. Indeed, if $AB = BA = I$, then $(\det A)(\det B) = \det(I) = 1$ by Theorem 6.2.3.
6.3.2 Triangular Matrices
If the square matrix $A$ over $K$ is upper or lower triangular, then $\det(A)$ is the product of the diagonal entries of $A$.
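6.3.3 Transposes
If $A^T$ is the transpose of the square matrix $A$, then
\[ \det(A^T) = \det(A). \]
Proof. If $\sigma$ is a permutation of degree $n$,
\[ (A^T)_{i(\sigma i)} = A_{(\sigma i)i}. \]
Hence
\[ \det(A^T) = \sum_{\sigma} (\operatorname{sgn} \sigma) A_{(\sigma 1)1} \cdots A_{(\sigma n)n}. \]
When $i = \sigma^{-1}j$, $A_{(\sigma i)i} = A_{j(\sigma^{-1}j)}$. Thus
\[ A_{(\sigma 1)1} \cdots A_{(\sigma n)n} = A_{1(\sigma^{-1}1)} \cdots A_{n(\sigma^{-1}n)}. \]
Since $\sigma\sigma^{-1}$ is the identity permutation,
\[ (\operatorname{sgn} \sigma)(\operatorname{sgn} \sigma^{-1}) = 1, \quad \text{so} \quad \operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn}(\sigma). \]
Furthermore, as $\sigma$ varies over all permutations of degree $n$, so does $\sigma^{-1}$. Therefore,
\[ \det(A^T) = \sum_{\sigma} (\operatorname{sgn} \sigma^{-1}) A_{1(\sigma^{-1}1)} \cdots A_{n(\sigma^{-1}n)} = \det(A). \]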
6.3.4 Elementary Row Operations
If $B$ is obtained from $A$ by adding a multiple of one row of $A$ to another (or a multiple of one column to another), then $\det(A) = \det(B)$. If $B = cA$, then $\det(B) = c^n \det(A)$.
6.3.5 Triangular Block Form
Suppose an $n \times n$ matrix $A$ is given in block form
\[ A = \begin{pmatrix} B & C \\ 0 & E \end{pmatrix}, \]
where $B$ is an $r \times r$ matrix, $E$ is an $s \times s$ matrix, $C$ is $r \times s$, and 0 denotes the $s \times r$ zero matrix. Then
\[ \det \begin{pmatrix} B & C \\ 0 & E \end{pmatrix} = (\det B)(\det E). \tag{6.15} \]
Proof. To prove this, define
\[ D(B, C, E) = \det \begin{pmatrix} B & C \\ 0 & E \end{pmatrix}. \]
If we fix $B$ and $C$, then $D$ is alternating and $s$-linear as a function of the rows of $E$. Hence by Theorem 6.2.2
\[ D(B, C, E) = (\det E) D(B, C, I), \]
where $I$ is the $s \times s$ identity matrix. By subtracting multiples of the last $s$ rows $(0 \;\; I)$ from the first $r$ rows $(B \;\; C)$ and using the result of 6.3.4, we obtain
\[ D(B, C, I) = D(B, 0, I). \]
Now $D(B, 0, I)$ is clearly alternating and $r$-linear as a function of the rows of $B$. Thus
\[ D(B, 0, I) = (\det B) D(I, 0, I) = \det B, \]
since $D(I, 0, I) = 1$. Hence
\begin{align*}
D(B, C, E) &= (\det E) D(B, C, I) \\
&= (\det E) D(B, 0, I) \\
&= (\det E)(\det B).
\end{align*}
By taking transposes we obtain
\[ \det \begin{pmatrix} A & 0 \\ B & C \end{pmatrix} = (\det A)(\det C). \]
It is now easy to see that this result generalizes immediately to the case where $A$ is in upper (or lower) block triangular form.
6.3.6 The Classical Adjoint
Since the determinant function is unique and $\det(A) = \det(A^T)$, we know that for each fixed column index $j$,
\[ \det(A) = \sum_{i=1}^{n} (-1)^{i+j} A_{ij} \det A(i|j), \tag{6.16} \]
and for each row index $i$,
\[ \det(A) = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \det A(i|j). \tag{6.17} \]
As we mentioned earlier, the formulas of Eqs. 6.16 and 6.17 are known as the Laplace expansions of the determinant in terms of columns and rows, respectively. Later we will present a more general version of the Laplace expansion.
The scalar $(-1)^{i+j} \det A(i|j)$ is usually called the $i, j$ cofactor of $A$ or the cofactor of the $i, j$ entry of $A$. The above formulas for $\det(A)$ are called the expansion of $\det(A)$ by cofactors of the $j$th column (sometimes the expansion by minors of the $j$th column), or respectively, the expansion of $\det(A)$ by cofactors of the $i$th row (sometimes the expansion by minors of the $i$th row). If we set
\[ C_{ij} = (-1)^{i+j} \det A(i|j), \]
then the formula in Eq. 6.16 says that for each $j$,
\[ \det(A) = \sum_{i=1}^{n} A_{ij}C_{ij}, \]
where the cofactor $C_{ij}$ is $(-1)^{i+j}$ times the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$th row and $j$th column of $A$.
Similarly, for each fixed row index $i$,
\[ \det(A) = \sum_{j=1}^{n} A_{ij}C_{ij}. \]
If $j \ne k$, then
\[ \sum_{i=1}^{n} A_{ik}C_{ij} = 0. \]
To see this, replace the $j$th column of $A$ by its $k$th column, and call the resulting matrix $B$. Then $B$ has two equal columns and so $\det(B) = 0$.
Since $B(i|j) = A(i|j)$, we have
\begin{align*}
0 = \det(B) &= \sum_{i=1}^{n} (-1)^{i+j} B_{ij} \det(B(i|j)) \\
&= \sum_{i=1}^{n} (-1)^{i+j} A_{ik} \det(A(i|j)) \\
&= \sum_{i=1}^{n} A_{ik}C_{ij}.
\end{align*}
These properties of the cofactors can be summarized by
\[ \sum_{i=1}^{n} A_{ik}C_{ij} = \delta_{jk} \det(A). \tag{6.18} \]
The $n \times n$ matrix $\operatorname{adj} A$, which is the transpose of the matrix of cofactors of $A$, is called the classical adjoint of $A$. Thus
\[ (\operatorname{adj} A)_{ij} = C_{ji} = (-1)^{i+j} \det(A(j|i)). \tag{6.19} \]
These last two formulas can be summarized in the matrix equation
\[ (\operatorname{adj} A)A = (\det(A))I. \tag{6.20} \]
(To see this, just compute the $(j, k)$ entry on both sides of this equation.)
We wish to see that $A(\operatorname{adj} A) = (\det A)I$ also. Since $A^T(i|j) = (A(j|i))^T$, we have
\[ (-1)^{i+j} \det(A^T(i|j)) = (-1)^{i+j} \det(A(j|i)), \]
which simply says that the $i, j$ cofactor of $A^T$ is the $j, i$ cofactor of $A$. Thus
\[ \operatorname{adj}(A^T) = (\operatorname{adj} A)^T. \tag{6.21} \]
Applying Eq. 6.20 to $A^T$, we have
\[ (\operatorname{adj} A^T)A^T = (\det(A^T))I = (\det A)I. \]
Transposing, we obtain
\[ A(\operatorname{adj} A^T)^T = (\det(A))I. \]
Using Eq. 6.21 we have what we want:
\[ A(\operatorname{adj} A) = (\det(A))I. \tag{6.22} \]
An almost immediate corollary of the previous paragraphs is the following:
Theorem 6.3.7. Let $A$ be an $n \times n$ matrix over $K$. Then $A$ is invertible over $K$ if and only if $\det(A)$ is invertible in $K$. When $A$ is invertible, the unique inverse for $A$ is
\[ A^{-1} = (\det A)^{-1} \operatorname{adj} A. \]
In particular, an $n \times n$ matrix over a field is invertible if and only if its determinant is different from zero.
NOTE: This determinant criterion for invertibility proves that an $n \times n$ matrix with either a left or right inverse is invertible.
NOTE: The reader should think about the consequences of Theorem 6.3.7 in case $K$ is the ring $F[x]$ of polynomials over a field $F$, or in case $K$ is the ring of rational integers.
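The identities (6.20), (6.22) and the inverse formula of Theorem 6.3.7 can be illustrated on small integer matrices. This is a sketch under stated assumptions: matrices are lists of rows, the determinant is computed by Laplace expansion along the first column, and the $1 \times 1$ adjugate is taken to be $(1)$ by convention (the text defines $A(i|j)$ only for $n > 1$). The helper names are illustrative.

```python
from fractions import Fraction

def minor(A, i, j):
    """A(i|j): delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by Laplace expansion along the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def adjugate(A):
    """(adj A)_ij = (-1)^(i+j) * det A(j|i), as in Eq. 6.19."""
    n = len(A)
    if n == 1:
        return [[1]]  # convention for the 1x1 case
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_over_Q(A):
    """A^{-1} = (det A)^{-1} adj A, computed over the rationals."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0, so A is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjugate(A)]

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
scaled_identity = [[det(A) if i == j else 0 for j in range(3)] for i in range(3)]
print(matmul(adjugate(A), A) == scaled_identity)  # True
print(matmul(A, adjugate(A)) == scaled_identity)  # True

# Over the integers, A is invertible exactly when det(A) is a unit, i.e. +1 or -1.
U = [[2, 1], [5, 3]]  # det = 1, so the inverse has integer entries
print(inverse_over_Q(U))
```

Here `A` has determinant 39, so it is invertible over the rationals but not over the integers, whereas `U` is invertible over both.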
6.3.8 Characteristic Polynomial of a Linear Map
If $P$ is also an $n \times n$ invertible matrix, then because $K$ is commutative and det is multiplicative, it is immediate that
\[ \det(P^{-1}AP) = \det(A). \tag{6.23} \]
This means that if $K$ is actually a field, if $V$ is an $n$-dimensional vector space over $K$, if $T : V \to V$ is any linear map, and if $\mathcal{B}$ is any basis of $V$, then we may unambiguously define the characteristic polynomial $c_T(x)$ of $T$ to be
\[ c_T(x) = \det(xI - [T]_{\mathcal{B}}). \]
This is because if $A$ and $B$ are two matrices that represent the same linear transformation with respect to some bases of $V$, then by Eq. 6.23 and Theorem 4.7.1
\[ \det(xI - A) = \det(xI - B). \]
Given a square matrix $A$, a principal submatrix of $A$ is a square submatrix centered about the diagonal of $A$, that is, one obtained by selecting the same set of indices for the rows and for the columns. So a $1 \times 1$ principal submatrix is simply a diagonal element of $A$.
Theorem 6.3.9. Let $K$ be a commutative ring with 1, and let $A$ be an $n \times n$ matrix over $K$. The characteristic polynomial of $A$ is given by
\[ f(x) = \det(xI - A) = \sum_{i=0}^{n} c_i x^{n-i} \tag{6.24} \]
where $c_0 = 1$, and for $1 \le i \le n$, $c_i = \sum \det(-B) = (-1)^i \sum \det(B)$, where $B$ ranges over all the $i \times i$ principal submatrices of $A$.
For an $n \times n$ matrix $A$, the trace of $A$ is defined to be
\[ \operatorname{tr}(A) = \sum_{i=1}^{n} A_{ii}. \]
Note: Putting $i = 1$ yields the fact that the coefficient of $x^{n-1}$ is $-\sum_{i=1}^{n} A_{ii} = -\operatorname{tr}(A)$, and putting $i = n$ says that the constant term is $(-1)^n \det(A)$.
Proof. Clearly $\det(xI - A)$ is a polynomial of degree $n$ which is monic, i.e., $c_0 = 1$, and with constant term $\det(-A) = (-1)^n \det(A)$. Suppose $1 \le i \le n - 1$ and consider the coefficient $c_i$ of $x^{n-i}$ in the polynomial $\det(xI - A)$. Recall that in general, if $D = (d_{ij})$ is an $n \times n$ matrix over a commutative ring with 1, then
\[ \det(D) = \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, d_{1,\pi(1)} d_{2,\pi(2)} \cdots d_{n,\pi(n)}. \]
So to get a term of degree $n - i$ in
\[ \det(xI - A) = \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, (xI - A)_{1,\pi(1)} \cdots (xI - A)_{n,\pi(n)} \]
we first select $n - i$ indices $j_1, \ldots, j_{n-i}$, with complementary indices $k_1, \ldots, k_i$. Then in expanding the product $(xI - A)_{1,\pi(1)} \cdots (xI - A)_{n,\pi(n)}$ when $\pi$ fixes $j_1, \ldots, j_{n-i}$, we select the term $x$ from the factors $(xI - A)_{j_1,j_1}, \ldots, (xI - A)_{j_{n-i},j_{n-i}}$, and the terms $(-A)_{k_1,\pi(k_1)}, \ldots, (-A)_{k_i,\pi(k_i)}$ otherwise. So if $A(k_1, \ldots, k_i)$ is the principal submatrix of $A$ indexed by rows and columns $k_1, \ldots, k_i$, then $\det(-A(k_1, \ldots, k_i))$ is the associated contribution to the coefficient of $x^{n-i}$. It follows that $c_i = \sum \det(-B)$, where $B$ ranges over all the principal $i \times i$ submatrices of $A$.
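Theorem 6.3.9 gives a direct (if inefficient) recipe for the characteristic polynomial. The sketch below assumes the sign convention $c_i = (-1)^i \sum \det(B)$ over the $i \times i$ principal submatrices $B$, and compares the resulting polynomial with $\det(xI - A)$ at several integer values of $x$. The function names are illustrative.

```python
from itertools import combinations

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by Laplace expansion along the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def charpoly_coeffs(A):
    """Return [c_0, ..., c_n] with det(xI - A) = sum_i c_i x^(n-i)."""
    n = len(A)
    coeffs = [1]
    for i in range(1, n + 1):
        # sum of the determinants of all i x i principal submatrices
        minors = sum(det([[A[r][c] for c in idx] for r in idx])
                     for idx in combinations(range(n), i))
        coeffs.append((-1) ** i * minors)
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation; coeffs are listed from the leading term down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def shifted(A, x):
    """The matrix xI - A."""
    n = len(A)
    return [[(x if r == c else 0) - A[r][c] for c in range(n)] for r in range(n)]

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
coeffs = charpoly_coeffs(A)
print(coeffs)  # [1, -9, 31, -39]
agrees = all(evaluate(coeffs, x) == det(shifted(A, x)) for x in range(-3, 4))
print(agrees)  # True
```

In this example $c_1 = -\operatorname{tr}(A) = -9$ and $c_3 = (-1)^3 \det(A) = -39$, matching the Note after the theorem.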
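Suppose the permutation $\pi \in S_n$ consists of $k$ permutation cycles of sizes $l_1, \ldots, l_k$, respectively, where $\sum l_i = n$ (fixed points count as cycles of size 1). Then $\operatorname{sgn}(\pi)$ can be computed by
\[ \operatorname{sgn}(\pi) = (-1)^{(l_1 - 1) + (l_2 - 1) + \cdots + (l_k - 1)} = (-1)^{n-k} = (-1)^n (-1)^k. \]
We record this formally as:
\[ \operatorname{sgn}(\pi) = (-1)^n (-1)^k \quad \text{if } \pi \in S_n \text{ is the product of } k \text{ disjoint cycles.} \tag{6.25} \]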
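Formula (6.25) can be compared against an independent computation of the sign. The following sketch assumes permutations are tuples of 0-based images and uses inversion parity as the reference definition of the sign (a standard equivalent, not derived in the text); fixed points are counted as cycles of length 1.

```python
from itertools import permutations

def cycle_count(perm):
    """Number of disjoint cycles of perm, counting fixed points as 1-cycles."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = perm[k]
    return cycles

def sign_by_cycles(perm):
    """sgn(pi) = (-1)^(n - k), with k the number of disjoint cycles (Eq. 6.25)."""
    return (-1) ** (len(perm) - cycle_count(perm))

def sign_by_inversions(perm):
    inv = sum(1 for i in range(len(perm))
              for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

example = (1, 2, 0, 4, 3)  # a 3-cycle and a 2-cycle, so k = 2 and n = 5
print(cycle_count(example), sign_by_cycles(example))  # 2 -1
all_agree = all(sign_by_cycles(p) == sign_by_inversions(p)
                for p in permutations(range(5)))
print(all_agree)  # True
```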
6.3.10 The Cayley-Hamilton Theorem
In this section we let $K$ be an integral domain (i.e., $K$ is a commutative ring with 1 such that $a \cdot b = 0$ if and only if either $a = 0$ or $b = 0$) and let $\lambda$ be an indeterminate over $K$, so that the polynomial ring $K[\lambda]$ is also an integral domain. If $B$ is any $n \times n$ matrix over an appropriate ring, $|B|$ denotes its determinant. Let $A$ be an $n \times n$ matrix over $K$ and let
\[ f(\lambda) = |\lambda I - A| = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0 = \sum_{i=0}^{n} c_i \lambda^i \]
be the characteristic polynomial of $A$. (Note that we put $c_n = 1$, and that here $c_i$ is the coefficient of $\lambda^i$, unlike the indexing used in Eq. 6.24.) The Cayley-Hamilton Theorem asserts that $f(A) = 0$. The proof we give actually allows us to give some other interesting results also. The classical adjoint of the matrix $A$ (also known as the adjugate of $A$) will be denoted by $A^{\mathrm{adj}}$. Recall that $(A^{\mathrm{adj}})_{ij} = (-1)^{i+j} \det(A(j|i))$.
Since $\lambda I - A$ has entries from an integral domain, it also has a classical adjoint (i.e., an adjugate), and
\[ \big( (\lambda I - A)^{\mathrm{adj}} \big)_{ij} = (-1)^{i+j} \det\big( (\lambda I - A)(j|i) \big). \]
Since the $(i, j)$ entry of $(\lambda I - A)^{\mathrm{adj}}$ is a polynomial in $\lambda$ of degree at most $n - 1$, it follows that $(\lambda I - A)^{\mathrm{adj}}$ must itself be a polynomial in $\lambda$ of degree at most $n - 1$ whose coefficients are $n \times n$ matrices over $K$. Say
\[ (\lambda I - A)^{\mathrm{adj}} = \sum_{i=0}^{n-1} D_i \lambda^i \]
for some $n \times n$ matrices $D_0, \ldots, D_{n-1}$.
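At this point note that if $\lambda = 0$ we have: $(-A)^{\mathrm{adj}} = (-1)^{n-1} A^{\mathrm{adj}} = D_0$.
On the one hand we know that
\[ (\lambda I - A)(\lambda I - A)^{\mathrm{adj}} = \det(\lambda I - A) \cdot I = f(\lambda) \cdot I. \]
On the other hand we have
\[ (\lambda I - A)\Big( \sum_{i=0}^{n-1} D_i \lambda^i \Big) = -AD_0 + \sum_{i=1}^{n-1} (D_{i-1} - AD_i)\lambda^i + D_{n-1}\lambda^n = \sum_{i=0}^{n} c_i I \lambda^i. \]
If we define $D_n = D_{-1} = 0$, we can write
\[ \sum_{i=0}^{n} (D_{i-1} - AD_i)\lambda^i = \sum_{i=0}^{n} c_i I \lambda^i. \]
So $D_{i-1} - AD_i = c_i I$, $0 \le i \le n$, which implies that $A^j D_{j-1} - A^{j+1} D_j = c_j A^j$. Thus we find
\begin{align*}
f(A) = \sum_{j=0}^{n} c_j A^j &= \sum_{j=0}^{n} A^j (D_{j-1} - AD_j) \\
&= (D_{-1} - AD_0) + (AD_0 - A^2 D_1) + (A^2 D_1 - A^3 D_2) + \cdots \\
&\quad + (A^{n-1} D_{n-2} - A^n D_{n-1}) + (A^n D_{n-1} - A^{n+1} D_n) \\
&= D_{-1} - A^{n+1} D_n = 0.
\end{align*}
This proves the Cayley-Hamilton theorem.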
Multiply $D_{j-1} - AD_j = c_j I$ on the left by $A^{j-1}$ to get $A^{j-1}D_{j-1} - A^j D_j = c_j A^{j-1}$, and then sum for $1 \le j \le n$. This gives
\[ \sum_{j=1}^{n} c_j A^{j-1} = \sum_{j=1}^{n} (A^{j-1} D_{j-1} - A^j D_j) = D_0 - A^n D_n = D_0, \]
so
\[ (-1)^{n-1} A^{\mathrm{adj}} = D_0 = A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I. \]
Put $g(\lambda) = \dfrac{f(\lambda) - f(0)}{\lambda - 0} = \lambda^{n-1} + c_{n-1}\lambda^{n-2} + \cdots + c_1$ to obtain $g(A) = (-1)^{n-1} A^{\mathrm{adj}}$, or
\[ A^{\mathrm{adj}} = (-1)^{n-1} g(A). \tag{6.26} \]
This shows that the adjugate of a matrix is a polynomial in that matrix. Then if $\det(A)$ is a unit in $K$, so $A$ is invertible, we see how to view $A^{-1}$ as a polynomial in $A$.
Now use the equations $D_{j-1} - AD_j = c_j I$ (with $D_{-1} = D_n = 0$, $c_n = 1$) in a slightly different manner.
\begin{align*}
D_{n-1} &= I \\
D_{n-2} &= AD_{n-1} + c_{n-1}I = A + c_{n-1}I \\
D_{n-3} &= AD_{n-2} + c_{n-2}I = A^2 + c_{n-1}A + c_{n-2}I \\
D_{n-4} &= AD_{n-3} + c_{n-3}I = A^3 + c_{n-1}A^2 + c_{n-2}A + c_{n-3}I \\
&\;\;\vdots \\
D_{n-j} &= A^{j-1} + c_{n-1}A^{j-2} + c_{n-2}A^{j-3} + \cdots + c_{n-j+2}A + c_{n-j+1}I \\
&\;\;\vdots \\
D_j &= A^{n-j-1} + c_{n-1}A^{n-j-2} + c_{n-2}A^{n-j-3} + \cdots + c_{j+2}A + c_{j+1}I \\
&\;\;\vdots \\
D_2 &= AD_3 + c_3 I = A^{n-3} + c_{n-1}A^{n-4} + c_{n-2}A^{n-5} + \cdots + c_4 A + c_3 I \\
D_1 &= AD_2 + c_2 I = A^{n-2} + c_{n-1}A^{n-3} + c_{n-2}A^{n-4} + \cdots + c_3 A + c_2 I \\
D_0 &= AD_1 + c_1 I = A^{n-1} + c_{n-1}A^{n-2} + c_{n-2}A^{n-3} + \cdots + c_2 A + c_1 I
\end{align*}
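Substituting these values for $D_i$ into $(\lambda I - A)^{\mathrm{adj}} = \sum_{i=0}^{n-1} D_i \lambda^i$ and collecting coefficients on fixed powers of $A$ we obtain
\begin{align*}
(\lambda I - A)^{\mathrm{adj}} = {} & A^{n-1} + (\lambda + c_{n-1})A^{n-2} \\
& + (\lambda^2 + c_{n-1}\lambda + c_{n-2})A^{n-3} \\
& + (\lambda^3 + c_{n-1}\lambda^2 + c_{n-2}\lambda + c_{n-3})A^{n-4} \\
& \;\;\vdots \tag{6.27} \\
& + (\lambda^{n-2} + c_{n-1}\lambda^{n-3} + c_{n-2}\lambda^{n-4} + \cdots + c_3\lambda + c_2)A \\
& + (\lambda^{n-1} + c_{n-1}\lambda^{n-2} + c_{n-2}\lambda^{n-3} + c_{n-3}\lambda^{n-4} + \cdots + c_2\lambda + c_1)A^0.
\end{align*}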
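The recurrence $D_{n-1} = I$, $D_{j-1} = AD_j + c_j I$ can be run mechanically. The sketch below is an illustration under stated assumptions: the coefficients $c_i$ of $\lambda^i$ are obtained from principal minors as in Theorem 6.3.9, matrices are integer lists of rows, and the function names are chosen here. Running the recurrence one step past $D_0$ produces $D_{-1} = f(A)$, which the Cayley-Hamilton theorem says is zero, while $D_0$ should equal $(-1)^{n-1} A^{\mathrm{adj}}$.

```python
from itertools import combinations

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by Laplace expansion along the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def char_coeffs(A):
    """[c_0, ..., c_n] with f(lambda) = sum_i c_i lambda^i and c_n = 1."""
    n = len(A)
    c = [0] * (n + 1)
    c[n] = 1
    for i in range(1, n + 1):
        e_i = sum(det([[A[r][s] for s in idx] for r in idx])
                  for idx in combinations(range(n), i))
        c[n - i] = (-1) ** i * e_i
    return c

def adjugate_chain(A):
    """Return [D_{n-1}, ..., D_0, D_{-1}] using D_{j-1} = A D_j + c_j I."""
    n = len(A)
    c = char_coeffs(A)
    D = [[int(r == s) for s in range(n)] for r in range(n)]  # D_{n-1} = I
    chain = [D]
    for j in range(n - 1, -1, -1):
        AD = matmul(A, D)
        D = [[AD[r][s] + (c[j] if r == s else 0) for s in range(n)]
             for r in range(n)]
        chain.append(D)
    return chain

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
chain = adjugate_chain(A)
D0, D_minus_1 = chain[-2], chain[-1]
print(D_minus_1)  # the zero matrix: f(A) = 0
print(D0)         # (-1)^(n-1) adj A; here n = 3, so this is adj A itself
```

For this $3 \times 3$ example, multiplying `D0` by `A` gives $39 I = \det(A) I$, consistent with Eq. 6.22.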
6.3.11 The Companion Matrix of a Polynomial
In this section $K$ is a field and $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 \in K[x]$. Define the companion matrix $C(f(x))$ by
\[ C(f(x)) = \begin{pmatrix}
0 & 0 & \cdots & 0 & 0 & -a_0 \\
1 & 0 & \cdots & 0 & 0 & -a_1 \\
0 & 1 & \cdots & 0 & 0 & -a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & -a_{n-2} \\
0 & 0 & \cdots & 0 & 1 & -a_{n-1}
\end{pmatrix}. \]
The main facts about $C(f(x))$ are in the next result.
Theorem 6.3.12.
\[ \det(xI_n - C(f(x))) = f(x) \]
is both the minimal and characteristic polynomial of $C(f(x))$.
Proof. First we establish that $f(x) = \det(xI_n - C(f(x)))$. This result is clear if $n = 1$ and we proceed by induction. Suppose that $n > 1$ and compute the determinant by cofactor expansion along the first row, applying the induction hypothesis to the first summand.
\begin{align*}
\det(xI_n - C(f(x))) &= \det \begin{pmatrix}
x & 0 & \cdots & 0 & 0 & a_0 \\
-1 & x & \cdots & 0 & 0 & a_1 \\
0 & -1 & \cdots & 0 & 0 & a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & -1 & x & a_{n-2} \\
0 & 0 & \cdots & 0 & -1 & x + a_{n-1}
\end{pmatrix} \\
&= x \det \begin{pmatrix}
x & \cdots & 0 & 0 & a_1 \\
-1 & \cdots & 0 & 0 & a_2 \\
\vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \cdots & -1 & x & a_{n-2} \\
0 & \cdots & 0 & -1 & x + a_{n-1}
\end{pmatrix}
+ a_0 (-1)^{n+1} \det \begin{pmatrix}
-1 & x & \cdots & 0 & 0 \\
0 & -1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -1 & x \\
0 & 0 & \cdots & 0 & -1
\end{pmatrix} \\
&= x(x^{n-1} + a_{n-1}x^{n-2} + \cdots + a_1) + a_0 (-1)^{n+1} (-1)^{n-1} \\
&= x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = f(x).
\end{align*}
This shows that $f(x)$ is the characteristic polynomial of $C(f(x))$.
Now let $T$ be the linear operator on $K^n$ whose matrix with respect to the standard basis $\mathcal{S} = (e_1, e_2, \ldots, e_n)$ is $C(f(x))$. Then $Te_1 = e_2$, $T^2 e_1 = Te_2 = e_3$, $\ldots$, $T^j e_1 = T(e_j) = e_{j+1}$ for $1 \le j \le n - 1$, and $Te_n = -a_0 e_1 - a_1 e_2 - \cdots - a_{n-1} e_n$, so
\[ (T^n + a_{n-1}T^{n-1} + \cdots + a_1 T + a_0 I)e_1 = 0. \]
Also
\begin{align*}
(T^n + \cdots + a_1 T + a_0 I)e_{j+1} &= (T^n + \cdots + a_1 T + a_0 I)T^j e_1 \\
&= T^j (T^n + \cdots + a_1 T + a_0 I)e_1 = 0.
\end{align*}
It follows that $f(T)$ must be the zero operator. On the other hand, $(e_1, Te_1, \ldots, T^{n-1}e_1)$ is a linearly independent list, so that no nonzero polynomial in $T$ with degree less than $n$ can be the zero operator. Then since $f(x)$ is monic it must be that $f(x)$ is also the minimal polynomial for $T$ and hence for $C(f(x))$.
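The two claims of Theorem 6.3.12 can be checked on a specific polynomial. The sketch below is illustrative: the coefficient list `a = [a_0, ..., a_{n-1}]`, the example $f(x) = x^3 - 2x^2 + 5x + 7$, and all function names are assumptions made here. It compares $\det(xI - C)$ with $f(x)$ at several integers, verifies $f(C) = 0$ by Horner's scheme, and confirms that $e_1, Ce_1, C^2 e_1$ are the standard basis vectors.

```python
def companion(a):
    """Companion matrix of x^n + a[n-1] x^(n-1) + ... + a[1] x + a[0]."""
    n = len(a)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1          # subdiagonal of ones
    for i in range(n):
        C[i][n - 1] = -a[i]      # last column holds -a_0, ..., -a_{n-1}
    return C

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def f_scalar(a, x):
    """Evaluate the monic polynomial at a scalar x (Horner)."""
    value = 1
    for coef in reversed(a):
        value = value * x + coef
    return value

def f_matrix(a, C):
    """Evaluate the monic polynomial at the matrix C (Horner)."""
    n = len(C)
    result = [[int(r == s) for s in range(n)] for r in range(n)]
    for coef in reversed(a):
        result = matmul(result, C)
        result = [[result[r][s] + (coef if r == s else 0) for s in range(n)]
                  for r in range(n)]
    return result

a = [7, 5, -2]   # f(x) = x^3 - 2x^2 + 5x + 7
C = companion(a)
charpoly_ok = all(
    det([[(x if r == s else 0) - C[r][s] for s in range(3)] for r in range(3)])
    == f_scalar(a, x) for x in range(-3, 4))
print(charpoly_ok)        # True
print(f_matrix(a, C))     # the zero matrix

# e_1, C e_1, C^2 e_1 are the columns of I, so they are linearly independent.
e1 = [[1], [0], [0]]
orbit = [e1, matmul(C, e1), matmul(C, matmul(C, e1))]
print(orbit)
```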
6.3.13 The Cayley-Hamilton Theorem: A Second Proof
Let $\dim(V) = n$ and let $T \in \mathcal{L}(V)$. If $f$ is the characteristic polynomial for $T$, then $f(T) = 0$. This is equivalent to saying that the minimal polynomial for $T$ divides the characteristic polynomial for $T$.
Proof. This proof is an illuminating and fairly sophisticated application of the general theory of determinants developed above.
Let $K$ be the commutative ring with identity consisting of all polynomials in $T$. Actually, $K$ is a commutative algebra with identity over the scalar field $F$. Choose a basis $\mathcal{B} = (v_1, \ldots, v_n)$ for $V$ and let $A$ be the matrix which represents $T$ in the given basis. Then
\[ T(v_j) = \sum_{i=1}^{n} A_{ij} v_i, \qquad 1 \le j \le n. \]
These equations may be written in the equivalent form
\[ \sum_{i=1}^{n} (\delta_{ij} T - A_{ij} I) v_i = \vec{0}, \qquad 1 \le j \le n. \]
Let $B \in M_n(K)$ be the matrix with entries
\[ B_{ij} = \delta_{ij} T - A_{ji} I. \]
Note the interchanging of the $i$ and $j$ in the subscript on $A$. Also, keep in mind that each element of $B$ is a polynomial in $T$. Let $f(x) = \det(xI - A) = \det(xI - A^T)$. Then $f(T) = \det(B)$. (This is an element of $K$. Think about this equation until it seems absolutely obvious.) Our goal is to show that $f(T) = 0$. In order that $f(T)$ be the zero operator, it is necessary and sufficient that $\det(B)(v_k) = \vec{0}$ for $1 \le k \le n$. By the definition of $B$, the vectors $v_1, \ldots, v_n$ satisfy the equations
\[ \vec{0} = \sum_{i=1}^{n} (\delta_{ij} T - A_{ij} I)(v_i) = \sum_{i=1}^{n} B_{ji} v_i, \qquad 1 \le j \le n. \tag{6.28} \]
Let $\tilde{B}$ be the classical adjoint of $B$, so that $\tilde{B}B = B\tilde{B} = \det(B) I$. Note that $\tilde{B}$ also has entries that are polynomials in the operator $T$. Let $\tilde{B}_{kj}$ operate on the right side of Eq. 6.28 to obtain
\[ \vec{0} = \tilde{B}_{kj} \sum_{i=1}^{n} B_{ji}(v_i) = \sum_{i=1}^{n} (\tilde{B}_{kj} B_{ji})(v_i). \]
So summing over $j$ we have
\[ \vec{0} = \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} \tilde{B}_{kj} B_{ji} \Big)(v_i) = \sum_{i=1}^{n} (\delta_{ki} \det(B))(v_i) = \det(B)(v_k). \]
At this point we know that each irreducible factor of the minimal polynomial of $T$ is also a factor of the characteristic polynomial of $T$. A converse is also true: each irreducible factor of the characteristic polynomial of $T$ is also a factor of the minimal polynomial of $T$. However, we are not yet ready to give a proof of this fact.
6.3.14 Cramer's Rule
We now discuss Cramer's rule for solving systems of linear equations. Suppose $A$ is an $n \times n$ matrix over the field $F$ and we wish to solve the system of linear equations $AX = Y$ for some given $n$-tuple $(y_1, \ldots, y_n)$. If $AX = Y$, then
\[ (\operatorname{adj} A)AX = (\operatorname{adj} A)Y, \]
implying
\[ (\det A)X = (\operatorname{adj} A)Y. \]
Thus computing the $j$-th row on each side we have
\[ (\det A)x_j = \sum_{i=1}^{n} (\operatorname{adj} A)_{ji}\, y_i = \sum_{i=1}^{n} (-1)^{i+j} y_i \det A(i|j). \]
This last expression is the determinant of the $n \times n$ matrix obtained by replacing the $j$th column of $A$ by $Y$. If $\det A = 0$, all this tells us nothing. But if $\det A \ne 0$, we have Cramer's rule:
Let $A$ be an $n \times n$ matrix over the field $F$ such that $\det A \ne 0$. If $y_1, \ldots, y_n$ are any scalars in $F$, the unique solution $X = A^{-1}Y$ of the system of equations $AX = Y$ is given by
\[ x_j = \frac{\det B_j}{\det A}, \qquad j = 1, \ldots, n, \]
where $B_j$ is the $n \times n$ matrix obtained from $A$ by replacing the $j$th column of $A$ by $Y$.
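Cramer's rule is straightforward to implement for small systems. The following is a minimal sketch, assuming integer input and exact rational output; `cramer_solve` and the sample system are illustrative choices, and the determinant is computed by Laplace expansion, which is only practical for small $n$.

```python
from fractions import Fraction

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by Laplace expansion along the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def cramer_solve(A, Y):
    """Solve AX = Y by x_j = det(B_j) / det(A), where B_j has column j replaced by Y."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    solution = []
    for j in range(len(A)):
        B_j = [row[:j] + [Y[i]] + row[j + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det(B_j), d))
    return solution

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
Y = [5, 4, 22]          # chosen so that the solution is (1, 2, 3)
X = cramer_solve(A, Y)
print(X)  # [Fraction(1, 1), Fraction(2, 1), Fraction(3, 1)]
```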
6.3.15 Comparing ST with TS
We begin this subsection with a discussion (in terms of linear operators) that is somewhat involved and not nearly as elegant as that given at the end. But we find the techniques introduced first to be worthy of study. Then at the end of the subsection we give a better result with a simpler proof.
Theorem 6.3.16 (a) Let $V$ be finite dimensional over $F$ and suppose $S, T \in \mathcal{L}(V)$. Then $ST$ and $TS$ have the same eigenvalues, but not necessarily the same minimal polynomial.
Proof. Let $\lambda$ be an eigenvalue of $ST$ with nonzero eigenvector $v$. Then $TS(T(v)) = T(ST(v)) = T(\lambda v) = \lambda T(v)$. This says that if $T(v) \ne 0$, then $\lambda$ is an eigenvalue of $TS$ with eigenvector $T(v)$. However, it might be the case that $T(v) = 0$. In that case $\lambda v = ST(v) = 0$, so $\lambda = 0$. Moreover $T$ is not invertible, so $TS$ is not invertible. Hence it is not one-to-one. This implies that $\lambda = 0$ is an eigenvalue of $TS$. So each eigenvalue of $ST$ is an eigenvalue of $TS$. By a symmetric argument, each eigenvalue of $TS$ is an eigenvalue of $ST$.
To show that $ST$ and $TS$ might not have the same minimal polynomial, consider the matrices
\[ S = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}; \qquad T = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \]
Then
\[ ST = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad TS = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \]
The minimal polynomial of $ST$ is $x^2$ and that of $TS$ is $x$.
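The example is small enough to verify by direct multiplication. The sketch below (with `matmul` as an illustrative helper) confirms the two products and checks that $ST \ne 0$ while $(ST)^2 = 0$, which is what makes $x^2$ the minimal polynomial of $ST$.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

S = [[0, 1], [0, 0]]
T = [[0, 0], [0, 1]]
zero = [[0, 0], [0, 0]]

ST = matmul(S, T)
TS = matmul(T, S)
print(ST)  # [[0, 1], [0, 0]]
print(TS)  # [[0, 0], [0, 0]]

# ST is nonzero but squares to zero, so its minimal polynomial is x^2;
# TS is already zero, so its minimal polynomial is x.
print(ST != zero and matmul(ST, ST) == zero)  # True
```

Both products have trace 0 and determinant 0, so in this example they share the characteristic polynomial $x^2$ even though their minimal polynomials differ.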
Additional Comment 1: The above proof (and example) show that the multiplicities of 0 as a root of the minimal polynomials of $ST$ and $TS$ may be different. This raises the question as to whether the multiplicities of nonzero eigenvalues as roots of the minimal polynomials of $ST$ and $TS$ could be different. However, this cannot happen. We sketch a proof of this. Suppose the minimal polynomial of $ST$ has the factor $(x - \lambda)^r$ for some positive integer $r$ and some nonzero eigenvalue $\lambda$. This means that on the one hand, for each generalized eigenvector $w \ne 0$ associated with $\lambda$, $(ST - \lambda I)^r(w) = 0$, and on the other hand, there is a nonzero generalized eigenvector $v$ associated with $\lambda$ such that $(ST - \lambda I)^{r-1}(v) \ne 0$. (See the beginning of Chapter 10.)
Step 1. First prove (say by induction on $m$) that $T(ST - \lambda I)^m = (TS - \lambda I)^m T$ and (interchanging the roles of $S$ and $T$) $S(TS - \lambda I)^m = (ST - \lambda I)^m S$. This is for all positive integers $m$.
Step 2. Show $ST(v) \ne 0$, as follows: Assume that $ST(v) = 0$. Then
\begin{align*}
0 &= (ST - \lambda I)^r(v) \\
&= (ST - \lambda I)^{r-1}(ST - \lambda I)(v) \\
&= (ST - \lambda I)^{r-1}(0 - \lambda v) \\
&= -\lambda (ST - \lambda I)^{r-1}(v) \ne 0,
\end{align*}
a clear contradiction. Note that this also means that $T(v) \ne 0$.
Step 3. From the above we have $0 = T(ST - \lambda I)^r(v) = (TS - \lambda I)^r T(v)$, so $T(v)$ is a generalized eigenvector of $TS$ associated with $\lambda$.
Step 4. Show that $(TS - \lambda I)^{r-1} T(v) \ne 0$. This will imply that $(x - \lambda)^r$ divides the minimum polynomial of $TS$. Then interchanging the roles of $S$ and $T$ will show that if $(x - \lambda)^r$ divides the minimum polynomial of $TS$, it also divides the minimum polynomial of $ST$. Hence each nonzero eigenvalue $\lambda$ has the same multiplicity as a factor in the minimum polynomial of $TS$ as it has in the minimum polynomial of $ST$. The details are: Suppose that $0 = (TS - \lambda I)^{r-1} T(v)$. Then
\begin{align*}
0 = S(0) &= S(TS - \lambda I)^{r-1} T(v) \\
&= (ST - \lambda I)^{r-1} ST(v).
\end{align*}
But then
\begin{align*}
0 = (ST - \lambda I)^r(v) &= (ST - \lambda I)^{r-1}(ST - \lambda I)(v) \\
&= (ST - \lambda I)^{r-1}(ST(v) - \lambda v) \\
&= 0 - \lambda (ST - \lambda I)^{r-1}(v) \ne 0,
\end{align*}
a clear contradiction. This completes the proof.
Additional Comment 2: It would be easy to show that the two characteristic polynomials are equal if any two $n \times n$ matrices were simultaneously upper triangularizable. It is true that if two such matrices commute then (over an algebraically closed field) they are indeed simultaneously upper triangularizable. However, consider the following example:
$$S = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{and} \quad T = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.$$
Suppose there were an invertible matrix $P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ for which both $PSP^{-1}$ and $PTP^{-1}$ were upper triangular. Let $\Delta = \det P = ad - bc \neq 0$. Then
$$PSP^{-1} = \frac{1}{\Delta} \begin{pmatrix} ad & -ab \\ cd & -bc \end{pmatrix},$$
which is upper triangular if and only if $cd = 0$. Also
$$PTP^{-1} = \frac{1}{\Delta} \begin{pmatrix} ad + bd - ac & -ab - b^2 + a^2 \\ cd + d^2 - c^2 & ac - bc - bd \end{pmatrix},$$
which is upper triangular if and only if $d^2 + cd - c^2 = 0$. If $c = 0$ this forces $d = 0$, giving $\Delta = 0$, a contradiction. If $d = 0$, then $c = 0$, which again gives $\Delta = 0$. Hence no such $P$ exists, and $S$ and $T$ are not simultaneously upper triangularizable.
Additional Comment 3: The preceding Comment certainly raises the question as to whether or not the characteristic polynomial of $ST$ can be different from that of $TS$. In fact, they must always be equal, and there is a proof of this that introduces a valuable technique. Suppose that $\mathbb{Z}$ is the usual ring of integers. Suppose that $Y = (y_{ij})$ and $Z = (z_{ij})$ are two $n \times n$ matrices of indeterminates over the integers. Let $D = \mathbb{Z}[y_{ij}, z_{ij}]$, where $1 \le i, j \le n$. So $D$ is the ring of polynomials over $\mathbb{Z}$ in $2n^2$ commuting but algebraically independent indeterminates. Then let $L = \mathbb{Q}(y_{ij}, z_{ij})$ be the field of fractions of the integral domain $D$. Clearly the characteristic polynomials of $YZ$ and of $ZY$ are the same whether $Y$ and $Z$ are viewed as being over $D$ or $L$. But over $L$ both $Y$ and $Z$ are invertible, so $YZ = Y(ZY)Y^{-1}$ and $ZY$ are similar. Let $x$ be an additional indeterminate so that the characteristic polynomial of $ZY$ is
$$\det(xI - ZY) = \det(Y) \det(xI - ZY) \det(Y^{-1}) = \det(xI - YZ),$$
which is then also the characteristic polynomial of $YZ$. Hence $ZY$ and $YZ$ have the same characteristic polynomial.
Now let $R$ be a commutative ring with $1 \neq 0$. (You may take $R$ to be a field, but this is not necessary.) Also let $S = (s_{ij})$ and $T = (t_{ij})$ be $n \times n$ matrices over $R$. There is a homomorphism $\varphi : D[x] = \mathbb{Z}[y_{ij}, z_{ij}][x] \to R[x]$ that maps $y_{ij} \mapsto s_{ij}$ and $z_{ij} \mapsto t_{ij}$ for all $i, j$, and that maps $x$ to $x$. This homomorphism maps the characteristic polynomial of $YZ$ to the characteristic polynomial of $ST$ and the characteristic polynomial of $ZY$ to that of $TS$. It follows that $ST$ and $TS$ have the same characteristic polynomial since $ZY$ and $YZ$ do.
Exercise: If 0 has multiplicity at most 1 as a root of the characteristic
polynomial of ST, then ST and TS have the same minimal polynomial.
Now suppose that $R$ is a commutative ring with 1, that $A$ and $B$ are matrices over $R$ with $A$ being $m \times n$ and $B$ being $n \times m$, with $m \le n$. Let $f_{AB}$, $f_{BA}$ be the characteristic polynomials of $AB$ and $BA$, respectively, over $R$.
Theorem 6.3.15(b) $f_{BA} = x^{n-m} f_{AB}$.
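Proof. For square matrices of size $m + n$ we have the following two identities:
$$\begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix},$$
$$\begin{pmatrix} I & A \\ 0 & I \end{pmatrix} \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix}.$$
Since the $(m + n) \times (m + n)$ block matrix
$$\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}$$
is nonsingular (it is upper triangular with every diagonal entry equal to 1, so its determinant is 1), it follows that
$$\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}^{-1} \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}.$$
Hence the two $(m + n) \times (m + n)$ matrices
$$C_1 = \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \quad \text{and} \quad C_2 = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}$$
are similar.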
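The eigenvalues of $C_1$ are the eigenvalues of $AB$ together with $n$ zeros; more precisely, since $C_1$ is block lower triangular, its characteristic polynomial is $x^n f_{AB}$. The eigenvalues of $C_2$ are the eigenvalues of $BA$ together with $m$ zeros; its characteristic polynomial is $x^m f_{BA}$. Since $C_1$ and $C_2$ are similar, they have exactly the same eigenvalues, including multiplicities, that is, $x^n f_{AB} = x^m f_{BA}$. Cancelling $x^m$ gives $f_{BA} = x^{n-m} f_{AB}$, and the theorem follows.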
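As a numerical sanity check of $f_{BA} = x^{n-m} f_{AB}$, here is a small sketch in plain Python. The function `charpoly` uses the Faddeev-LeVerrier recursion with exact rational arithmetic; this is simply one convenient way to compute a characteristic polynomial and is not the method of the text. The matrices `A` and `B` are arbitrary illustrative choices with $m = 2$, $n = 3$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def charpoly(M):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(xI - M) (Faddeev-LeVerrier)."""
    n = len(M)
    M = [[Fraction(v) for v in row] for row in M]
    coeffs = [Fraction(1)]
    N = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # N_1 = I
    for k in range(1, n + 1):
        MN = matmul(M, N)
        c = -sum(MN[i][i] for i in range(n)) / k   # c_{n-k} = -tr(M N_k) / k
        coeffs.append(c)
        # N_{k+1} = M N_k + c_{n-k} I
        N = [[MN[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

A = [[1, 2, 0], [0, 1, 3]]       # 2 x 3
B = [[1, 0], [2, 1], [0, 1]]     # 3 x 2
f_AB = charpoly(matmul(A, B))    # degree 2
f_BA = charpoly(matmul(B, A))    # degree 3

# Multiplying f_AB by x^(n-m) = x appends one zero coefficient.
print(f_BA == f_AB + [0])
```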
6.4 Deeper Results with Some Applications

The remainder of this chapter may be omitted without loss of continuity.


In general we continue to let $K$ be a commutative ring with 1. Here $\mathrm{Mat}_n(K)$ denotes the ring of $n \times n$ matrices over $K$, and $|A|$ denotes the element of $K$ that is the determinant of $A$.
6.4.1 Block Matrices whose Blocks Commute

We can regard a $k \times k$ matrix $M = (A^{(i,j)})$ over $\mathrm{Mat}_n(K)$ as a block matrix, a matrix that has been partitioned into $k^2$ submatrices (blocks) over $K$, each of size $n \times n$. When $M$ is regarded in this way, we denote its determinant in $K$ by $|M|$. We use the symbol $D(M)$ for the determinant of $M$ viewed as a $k \times k$ matrix over $\mathrm{Mat}_n(K)$. It is important to realize that $D(M)$ is an $n \times n$ matrix.
Theorem. Assume that $M$ is a $k \times k$ block matrix of $n \times n$ blocks $A^{(i,j)}$ over $K$ that pairwise commute. Then
$$|M| = |D(M)| = \left| \sum_{\sigma \in S_k} (\operatorname{sgn} \sigma)\, A^{(1,\sigma(1))} A^{(2,\sigma(2))} \cdots A^{(k,\sigma(k))} \right|. \tag{6.29}$$
Here $S_k$ is the symmetric group on $k$ symbols, so the summation is the usual one that appears in a formula for the determinant. The first proof of this result to come to our attention is the one in N. Jacobson, Lectures in Abstract Algebra, Vol. III: Theory of Fields and Galois Theory, D. Van Nostrand Co., Inc., 1964, pp. 67-70. The proof we give now is from I. Kovacs, D. S. Silver, and Susan G. Williams, Determinants of Commuting-Block Matrices, Amer. Math. Monthly, Vol. 106, Number 10, December 1999, pp. 950-952.
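Proof. We use induction on $k$. The case $k = 1$ is evident. We suppose that Eq. 6.29 is true for $k - 1$ and then prove it for $k$. Observe that the following matrix equation holds:
$$\begin{pmatrix} I & 0 & \cdots & 0 \\ -A^{(2,1)} & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -A^{(k,1)} & 0 & \cdots & I \end{pmatrix} \begin{pmatrix} I & 0 & \cdots & 0 \\ 0 & A^{(1,1)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A^{(1,1)} \end{pmatrix} M = \begin{pmatrix} A^{(1,1)} & * & \cdots & * \\ 0 & & & \\ \vdots & & N & \\ 0 & & & \end{pmatrix},$$
where $N$ is a $(k - 1) \times (k - 1)$ matrix. (The first column of the product vanishes below the top block because $A^{(1,1)} A^{(i,1)} - A^{(i,1)} A^{(1,1)} = 0$.) To simplify the notation we write this as
$$PQM = R, \tag{6.30}$$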
where the symbols are defined appropriately. By the multiplicative property of determinants we have $D(PQM) = D(P)D(Q)D(M) = (A^{(1,1)})^{k-1} D(M)$ and $D(R) = A^{(1,1)} D(N)$. Hence we have $(A^{(1,1)})^{k-1} D(M) = A^{(1,1)} D(N)$. Take the determinant of both sides of the last equation. Since $|D(N)| = |N|$ by the induction hypothesis, and using $PQM = R$, we find
$$|A^{(1,1)}|^{k-1} |D(M)| = |A^{(1,1)}| |D(N)| = |A^{(1,1)}| |N| = |R| = |P| |Q| |M| = |A^{(1,1)}|^{k-1} |M|.$$
If $|A^{(1,1)}|$ is neither zero nor a zero divisor, then we can cancel $|A^{(1,1)}|^{k-1}$ from both sides to get Eq. 6.29.
For the general case, we embed $K$ in the polynomial ring $K[z]$, where $z$ is an indeterminate, and replace $A^{(1,1)}$ with the matrix $zI + A^{(1,1)}$. Since the determinant of $zI + A^{(1,1)}$ is a monic polynomial of degree $n$, and hence is neither zero nor a zero divisor, Eq. 6.29 holds again. Substituting $z = 0$ (equivalently, equating constant terms of both sides) yields the desired result.
6.4.2 Tensor Products of Matrices

Definition: Let $A = (a_{ij}) \in M_{m_1,n_1}(K)$, and let $B \in M_{m_2,n_2}(K)$. Then the tensor product or Kronecker product of $A$ and $B$, denoted $A \otimes B \in M_{m_1 m_2,\, n_1 n_2}(K)$, is the partitioned matrix
$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n_1}B \\ a_{21}B & a_{22}B & \cdots & a_{2n_1}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m_1 1}B & a_{m_1 2}B & \cdots & a_{m_1 n_1}B \end{pmatrix}. \tag{6.31}$$
It is clear that $I_m \otimes I_n = I_{mn}$.
Lemma Let $A_1 \in M_{m_1,n_1}(K)$, $A_2 \in M_{n_1,r_1}(K)$, $B_1 \in M_{m_2,n_2}(K)$, and $B_2 \in M_{n_2,r_2}(K)$. Then
$$(A_1 \otimes B_1)(A_2 \otimes B_2) = (A_1 A_2) \otimes (B_1 B_2). \tag{6.32}$$
Proof. Using block multiplication, we see that the $(i, j)$ block of $(A_1 \otimes B_1)(A_2 \otimes B_2)$ is
$$\sum_{k=1}^{n_1} \big((A_1)_{ik} B_1\big) \big((A_2)_{kj} B_2\big) = \left( \sum_{k=1}^{n_1} (A_1)_{ik} (A_2)_{kj} \right) B_1 B_2,$$
which is also seen to be the $(i, j)$ block of $(A_1 A_2) \otimes (B_1 B_2)$.
Corollary Let $A \in M_m(K)$ and $B \in M_n(K)$. Then
$$A \otimes B = (A \otimes I_n)(I_m \otimes B) \tag{6.33}$$
and
$$|A \otimes B| = |A|^n |B|^m. \tag{6.34}$$
Proof. Eq. 6.33 is an easy consequence of Eq. 6.32. Then prove Eq. 6.34 as follows. Use the fact that the determinant function is multiplicative. Clearly $I_m \otimes B$ is a block diagonal matrix with $m$ blocks along the diagonal, each equal to $B$, so $\det(I_m \otimes B) = (\det(B))^m$. The determinant $\det(A \otimes I_n)$ is a little trickier. The matrix $A \otimes I_n$ is a block matrix whose $(i, j)$-th block is $a_{ij} I_n$, $1 \le i, j \le m$. Since each two of the blocks commute, we can use the theorem that says that we can first compute the determinant as though it were an $m \times m$ matrix of elements from a commutative ring of $n \times n$ matrices, and then compute the determinant of this $n \times n$ matrix. Hence $\det(A \otimes I_n) = \det(\det(A) \cdot I_n) = (\det(A))^n$, and the proof is complete.
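The determinant formula $|A \otimes B| = |A|^n |B|^m$ is easy to test on a small case. The sketch below is plain Python; the helpers `det` and `kron` and the sample matrices are illustrative assumptions, not notation from the text. It builds the Kronecker product block by block as in Eq. 6.31 and compares both sides with $m = 2$, $n = 3$.

```python
from itertools import permutations

def det(M):
    """Determinant via the permutation (Leibniz) expansion; fine for small matrices."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

A = [[1, 2], [3, 4]]                      # m = 2, det(A) = -2
B = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]     # n = 3, det(B) = 7

lhs = det(kron(A, B))                     # determinant of the 6 x 6 product
rhs = det(A) ** 3 * det(B) ** 2           # |A|^n |B|^m
print(lhs == rhs)
```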
6.4.3 The Cauchy-Binet Theorem-A Special Version

The main ingredient in the proof of the Matrix-Tree theorem (see the next section) is the following theorem known as the Cauchy-Binet Theorem. It is more commonly stated and applied with the diagonal matrix $\Delta$ below taken to be the identity matrix. However, the generality given here actually simplifies the proof.
Theorem 6.4.4. Let $A$ and $B$ be, respectively, $r \times m$ and $m \times r$ matrices, with $r \le m$. Let $\Delta$ be the $m \times m$ diagonal matrix with entry $e_i$ in the $(i, i)$-position. For an $r$-subset $S$ of $[m]$, let $A_S$ and $B^S$ denote, respectively, the $r \times r$ submatrices of $A$ and $B$ consisting of the columns of $A$, or the rows of $B$, indexed by the elements of $S$. Then
$$\det(A \Delta B) = \sum_S \det(A_S) \det(B^S) \prod_{i \in S} e_i,$$
where the sum is over all $r$-subsets $S$ of $[m]$.
Proof. We prove the theorem assuming that $e_1, \ldots, e_m$ are independent (commuting) indeterminates over $F$. Of course it will then hold for all values of $e_1, \ldots, e_m$ in $F$.
Recall that if $C = (c_{ij})$ is any $r \times r$ matrix over $F$, then
$$\det(C) = \sum_{\sigma \in S_r} \operatorname{sgn}(\sigma)\, c_{1\sigma(1)} c_{2\sigma(2)} \cdots c_{r\sigma(r)}.$$
Given that $A = (a_{ij})$ and $B = (b_{ij})$, the $(i, j)$-entry of $A \Delta B$ is $\sum_{k=1}^m a_{ik} e_k b_{kj}$, and this is a linear form in the indeterminates $e_1, \ldots, e_m$. Hence $\det(A \Delta B)$ is a homogeneous polynomial of degree $r$ in $e_1, \ldots, e_m$. Suppose that $\det(A \Delta B)$ has a monomial $e_1^{t_1} e_2^{t_2} \cdots$ where the number of indeterminates $e_i$ that have $t_i > 0$ is less than $r$. Substitute 0 for the indeterminates $e_i$ that do not appear in $e_1^{t_1} e_2^{t_2} \cdots$, i.e., that have $t_i = 0$. This will not affect the monomial $e_1^{t_1} e_2^{t_2} \cdots$ or its coefficient in $\det(A \Delta B)$. But after this substitution $\Delta$ has rank less than $r$, so $A \Delta B$ has rank less than $r$, implying that $\det(A \Delta B)$ must be the zero polynomial. Hence we see that the coefficient of a monomial in the polynomial $\det(A \Delta B)$ is zero unless that monomial is the product of $r$ distinct indeterminates $e_i$, i.e., unless it is of the form $\prod_{i \in S} e_i$ for some $r$-subset $S$ of $[m]$.
The coefficient of a monomial $\prod_{i \in S} e_i$ in $\det(A \Delta B)$ is found by setting $e_i = 1$ for $i \in S$, and $e_i = 0$ for $i \notin S$. When this substitution is made in $\Delta$, $A \Delta B$ evaluates to $A_S B^S$. So the coefficient of $\prod_{i \in S} e_i$ in $\det(A \Delta B)$ is $\det(A_S) \det(B^S)$.
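The identity of Theorem 6.4.4 can be checked on a small instance. The following sketch is plain Python; the function names and the sample data `A`, `B`, `e` are illustrative assumptions. It evaluates $\det(A \Delta B)$ directly and compares it with the sum over $r$-subsets $S$.

```python
from itertools import combinations, permutations

def det(M):
    """Determinant via the permutation (Leibniz) expansion; fine for small matrices."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def det_weighted_product(A, B, e):
    """det(A * Delta * B), where Delta = diag(e)."""
    r, m = len(A), len(e)
    C = [[sum(A[i][k] * e[k] * B[k][j] for k in range(m)) for j in range(r)]
         for i in range(r)]
    return det(C)

def cauchy_binet_sum(A, B, e):
    """Sum over r-subsets S of det(A_S) * det(B^S) * prod_{i in S} e_i."""
    r, m = len(A), len(e)
    total = 0
    for S in combinations(range(m), r):
        A_S = [[A[i][j] for j in S] for i in range(r)]   # columns of A in S
        B_S = [list(B[j]) for j in S]                     # rows of B in S
        weight = 1
        for i in S:
            weight *= e[i]
        total += det(A_S) * det(B_S) * weight
    return total

A = [[1, 2, 3], [0, 1, 4]]        # r = 2, m = 3
B = [[1, 0], [2, 1], [0, 3]]
e = [2, 3, 5]
print(det_weighted_product(A, B, e) == cauchy_binet_sum(A, B, e))
```

Taking `e = [1, 1, 1]` recovers the more familiar form of the theorem for $\det(AB)$.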
Exercise 6.4.4.1. Let $M$ be an $n \times n$ matrix all of whose line sums are zero. Then one of the eigenvalues of $M$ is $\lambda_1 = 0$. Let $\lambda_2, \ldots, \lambda_n$ be the other eigenvalues of $M$. Show that all principal $(n - 1) \times (n - 1)$ submatrices have the same determinant and that this value is $\frac{1}{n} \lambda_2 \lambda_3 \cdots \lambda_n$.
Sketch of Proof: First note that since all line sums are equal to zero, the entries of the matrix are completely determined by the entries of $M$ in the first $n - 1$ rows and first $n - 1$ columns, and that the entry in the $(n, n)$ position is the sum of all $(n - 1)^2$ entries in the first $n - 1$ rows and columns.
Clearly $\lambda_1 = 0$ is an eigenvalue of $M$. Observe the appearance of the $(n - 1) \times (n - 1)$ subdeterminant obtained by deleting the bottom row and right hand column. Then consider the principal subdeterminant obtained by deleting row $j$ and column $j$, $1 \le j \le n - 1$, from the original matrix $M$. In this $(n - 1) \times (n - 1)$ matrix, add the first $n - 2$ columns to the last one, and then add the first $n - 2$ rows to the last one. Now multiply the last column and the last row by $-1$. This leaves a matrix that could have been obtained from the original upper $(n - 1) \times (n - 1)$ submatrix by moving its $j$th row and column to the last positions. So it has the same determinant.
Now note that the coefficient of $x$ in the characteristic polynomial $f(x) = \det(xI - M)$ is $(-1)^{n-1} \lambda_2 \lambda_3 \cdots \lambda_n$, since $\lambda_1 = 0$, and it is also $(-1)^{n-1} \sum \det(B)$, where the sum is over all principal subdeterminants of order $n - 1$, which by the previous paragraph all have the same value. Hence $\det(B) = \frac{1}{n} \lambda_2 \lambda_3 \cdots \lambda_n$, for any principal subdeterminant $\det(B)$ of order $n - 1$.
6.4.5 The Matrix-Tree Theorem

The matrix-tree theorem expresses the number of spanning trees in a graph as the determinant of an appropriate matrix, from which we obtain one more proof of Cayley's theorem counting labeled trees.
An incidence matrix $N$ of a directed graph $H$ is a matrix whose rows are indexed by the vertices $V$ of $H$, whose columns are indexed by the edges $E$ of $H$, and whose entries are defined by:
$$N(x, e) = \begin{cases} 0 & \text{if } x \text{ is not incident with } e, \text{ or } e \text{ is a loop}, \\ 1 & \text{if } x \text{ is the head of } e, \\ -1 & \text{if } x \text{ is the tail of } e. \end{cases}$$
Lemma 6.4.6. If $H$ has $k$ components, then $\operatorname{rank}(N) = |V| - k$.
Proof. $N$ has $v = |V|$ rows. The rank of $N$ is $v - n$, where $n$ is the dimension of the left null space of $N$, i.e., the dimension of the space of row vectors $g$ for which $gN = 0$. But if $e$ is any edge, directed from $x$ to $y$, then the entry of $gN$ in the column of $e$ is $g(y) - g(x)$, so $gN = 0$ if and only if $g(x) - g(y) = 0$ for every such edge. Hence $gN = 0$ iff $g$ is constant on each component of $H$, which says that $n$ is the number $k$ of components of $H$.
Lemma 6.4.7. Let $A$ be a square matrix that has at most two nonzero entries in each column, at most one 1 in each column, at most one $-1$ in each column, and whose entries are all either 0, 1 or $-1$. Then $\det(A)$ is 0, 1 or $-1$.
Proof. This follows by induction on the number of rows. If every column has both a 1 and a $-1$, then the sum of all the rows is zero, so the matrix is singular and $\det(A) = 0$. If some column is zero, then again $\det(A) = 0$. Otherwise, expand the determinant by a column with one nonzero entry, to find that it is equal to $\pm 1$ times the determinant of a smaller matrix with the same property.
Corollary 6.4.8. Every square submatrix of an incidence matrix of a directed graph has determinant 0 or $\pm 1$. (Such a matrix is called totally unimodular.)
Theorem 6.4.9. (The Matrix-Tree Theorem) The number of spanning trees in a connected graph $G$ on $n$ vertices and without loops is the determinant of any $(n - 1) \times (n - 1)$ principal submatrix of the matrix $D - A$, where $A$ is the adjacency matrix of $G$ and $D$ is the diagonal matrix whose diagonal contains the degrees of the corresponding vertices of $G$.
Proof. First let $H$ be a connected digraph with $n$ vertices and with incidence matrix $N$. $H$ must have at least $n - 1$ edges, because it is connected and must have a spanning tree, so we may let $S$ be a set of $n - 1$ edges. Using the notation of the Cauchy-Binet Theorem, consider the $n \times (n - 1)$ submatrix $N_S$ of $N$ whose columns are indexed by elements of $S$. By Lemma 6.4.6, $N_S$ has rank $n - 1$ iff the spanning subgraph of $H$ with $S$ as edge set is connected, i.e., iff $S$ is the edge set of a tree in $H$. Let $N'$ be obtained by dropping any single row of the incidence matrix $N$. Since the sum of all rows of $N$ (or of $N_S$) is zero, the rank of $N'_S$ is the same as the rank of $N_S$. Hence, using Corollary 6.4.8, we have the following:
$$\det(N'_S) = \begin{cases} \pm 1 & \text{if } S \text{ is the edge set of a spanning tree in } H, \\ 0 & \text{otherwise.} \end{cases} \tag{6.35}$$
Now let $G$ be a connected loopless graph on $n$ vertices. Let $H$ be any digraph obtained by orienting $G$, and let $N$ be an incidence matrix of $H$. Then we claim $NN^T = D - A$. For,
$$(NN^T)_{xy} = \sum_{e \in E(G)} N(x, e) N(y, e) = \begin{cases} \deg(x) & \text{if } x = y, \\ -t & \text{if } x \neq y \text{ and } x \text{ and } y \text{ are joined by } t \text{ edges in } G. \end{cases}$$
An $(n - 1) \times (n - 1)$ principal submatrix of $D - A$ is of the form $N' N'^T$ where $N'$ is obtained from $N$ by dropping any one row. By Cauchy-Binet,
$$\det(N' N'^T) = \sum_S \det(N'_S) \det\big((N'_S)^T\big) = \sum_S \big(\det(N'_S)\big)^2,$$
where the sum is over all $(n - 1)$-subsets $S$ of the edge set. By Eq. 6.35 this is the number of spanning trees of $G$.
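To see the theorem in action, the sketch below (plain Python; `spanning_tree_count` and the sample graphs are illustrative assumptions) builds $D - A$ from an edge list, deletes the last row and column, and takes the determinant. For the 4-cycle it returns 4, and for the complete graph $K_4$ it returns $16 = 4^{4-2}$.

```python
from itertools import combinations, permutations

def det(M):
    """Determinant via the permutation (Leibniz) expansion; fine for small matrices."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def spanning_tree_count(n, edges):
    """Number of spanning trees of a loopless graph on vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]          # L = D - A
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    minor = [row[:-1] for row in L[:-1]]     # drop the last row and column
    return det(minor)

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = list(combinations(range(4), 2))         # all 6 edges of K_4

print(spanning_tree_count(4, cycle4))        # 4
print(spanning_tree_count(4, k4))            # 16
```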
Exercise 6.4.9.1. (Cayley's Theorem) In the Matrix-Tree Theorem, take $G$ to be the complete graph $K_n$. Here the matrix $D - A$ is $nI - J$, where $I$ is the identity matrix of order $n$, and $J$ is the $n \times n$ matrix of all 1's. Now calculate the determinant of any $(n - 1) \times (n - 1)$ principal submatrix of this matrix to obtain another proof that $K_n$ has $n^{n-2}$ spanning trees.
Exercise 6.4.9.2. In the statement of the Matrix-Tree Theorem it is not necessary to use principal subdeterminants. If the $(n - 1) \times (n - 1)$ submatrix $M$ is obtained by deleting the $i$th row and $j$th column from $D - A$, then the number of spanning trees is $(-1)^{i+j} \det(M)$. This follows from the more general lemma: If $A$ is an $(n - 1) \times n$ matrix whose row sums are all equal to 0 and if $A_j$ is obtained by deleting the $j$th column of $A$, $1 \le j \le n$, then $\det(A_j) = -\det(A_{j+1})$ for $1 \le j < n$.
6.4.10 The Cauchy-Binet Theorem - A General Version

Let $1 \le p \le m \in \mathbb{Z}$, and let $Q_{p,m}$ denote the set of all sequences $\alpha = (i_1, i_2, \ldots, i_p)$ of $p$ integers with $1 \le i_1 < i_2 < \cdots < i_p \le m$. Note that $|Q_{p,m}| = \binom{m}{p}$.
Let $K$ be a commutative ring with 1, and let $A \in M_{m,n}(K)$. If $\alpha \in Q_{p,m}$ and $\beta \in Q_{j,n}$, let $A[\alpha|\beta]$ denote the submatrix of $A$ consisting of the elements whose row index is in $\alpha$ and whose column index is in $\beta$. If $\alpha \in Q_{p,m}$, then there is a complementary sequence $\hat{\alpha} \in Q_{m-p,m}$ consisting of the list of exactly those positive integers between 1 and $m$ that are not in $\alpha$, and the list is in increasing order.
Theorem 6.4.11. Let $A \in M_{m,n}(K)$ and $B \in M_{n,p}(K)$. Assume that $1 \le t \le \min\{m, n, p\}$ and let $\alpha \in Q_{t,m}$, $\beta \in Q_{t,p}$. Then
$$\det(AB[\alpha|\beta]) = \sum_{\gamma \in Q_{t,n}} \det(A[\alpha|\gamma]) \det(B[\gamma|\beta]).$$
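Proof. Suppose that $\alpha = (\alpha_1, \ldots, \alpha_t)$, $\beta = (\beta_1, \ldots, \beta_t)$, and let $C = AB[\alpha|\beta]$. Then
$$C_{ij} = \sum_{k=1}^n a_{\alpha_i k} b_{k \beta_j}.$$
So we have
$$C = \begin{pmatrix} \sum_{k=1}^n a_{\alpha_1 k} b_{k \beta_1} & \cdots & \sum_{k=1}^n a_{\alpha_1 k} b_{k \beta_t} \\ \vdots & & \vdots \\ \sum_{k=1}^n a_{\alpha_t k} b_{k \beta_1} & \cdots & \sum_{k=1}^n a_{\alpha_t k} b_{k \beta_t} \end{pmatrix}.$$
To calculate the determinant of $C$ we start by using the multilinearity in the first row, then the second row, etc.
$$\begin{aligned} \det(C) &= \sum_{k_1=1}^n a_{\alpha_1 k_1} \det \begin{pmatrix} b_{k_1 \beta_1} & \cdots & b_{k_1 \beta_t} \\ \sum_{k=1}^n a_{\alpha_2 k} b_{k \beta_1} & \cdots & \sum_{k=1}^n a_{\alpha_2 k} b_{k \beta_t} \\ \vdots & & \vdots \\ \sum_{k=1}^n a_{\alpha_t k} b_{k \beta_1} & \cdots & \sum_{k=1}^n a_{\alpha_t k} b_{k \beta_t} \end{pmatrix} \\ &= \sum_{k_1=1}^n \cdots \sum_{k_t=1}^n a_{\alpha_1 k_1} \cdots a_{\alpha_t k_t} \det \begin{pmatrix} b_{k_1 \beta_1} & \cdots & b_{k_1 \beta_t} \\ \vdots & & \vdots \\ b_{k_t \beta_1} & \cdots & b_{k_t \beta_t} \end{pmatrix}. \end{aligned} \tag{6.36}$$
If $k_i = k_j$ for $i \neq j$, then
$$\det \begin{pmatrix} b_{k_1 \beta_1} & \cdots & b_{k_1 \beta_t} \\ \vdots & & \vdots \\ b_{k_t \beta_1} & \cdots & b_{k_t \beta_t} \end{pmatrix} = 0.$$
Then the only possible nonzero determinant occurs when $(k_1, \ldots, k_t)$ is a permutation of a sequence $\gamma = (\gamma_1, \ldots, \gamma_t) \in Q_{t,n}$. Let $\sigma \in S_t$ be the permutation of $\{1, 2, \ldots, t\}$ such that $\gamma_i = k_{\sigma(i)}$ for $1 \le i \le t$. Then
$$\det \begin{pmatrix} b_{k_1 \beta_1} & \cdots & b_{k_1 \beta_t} \\ \vdots & & \vdots \\ b_{k_t \beta_1} & \cdots & b_{k_t \beta_t} \end{pmatrix} = \operatorname{sgn}(\sigma) \det(B[\gamma|\beta]). \tag{6.37}$$
Given a fixed $\gamma \in Q_{t,n}$, all possible permutations of $\gamma$ are included in the summation in Eq. 6.36. Therefore Eq. 6.36 may be rewritten, using Eq. 6.37, as
$$\det(C) = \sum_{\gamma \in Q_{t,n}} \left( \sum_{\sigma \in S_t} \operatorname{sgn}(\sigma)\, a_{\alpha_1 \gamma_{\sigma(1)}} \cdots a_{\alpha_t \gamma_{\sigma(t)}} \right) \det(B[\gamma|\beta]),$$
which is the desired formula, since the inner sum is $\det(A[\alpha|\gamma])$.
The Cauchy-Binet formula gives another verification of the fact that $\det(AB) = \det(A) \det(B)$ for square matrices $A$ and $B$.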
6.4.12 The General Laplace Expansion

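For $\gamma = (\gamma_1, \ldots, \gamma_t) \in Q_{t,n}$, put $s(\gamma) = \sum_{j=1}^t \gamma_j$.
Theorem: Let $A \in M_n(K)$ and let $\alpha \in Q_{t,n}$ ($1 \le t \le n$) be given. Then
$$\det(A) = \sum_{\gamma \in Q_{t,n}} (-1)^{s(\alpha)+s(\gamma)} \det(A[\alpha|\gamma]) \det(A[\hat{\alpha}|\hat{\gamma}]). \tag{6.38}$$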
Proof. For $A \in M_n(K)$, define
$$D_\alpha(A) = \sum_{\gamma \in Q_{t,n}} (-1)^{s(\alpha)+s(\gamma)} \det(A[\alpha|\gamma]) \det(A[\hat{\alpha}|\hat{\gamma}]). \tag{6.39}$$
Then $D_\alpha : M_n(K) \to K$ is easily shown to be $n$-linear as a function on the columns of $A$. To complete the proof, it is only necessary to show that $D_\alpha$ is alternating and that $D_\alpha(I_n) = 1$. Thus, suppose that the columns of $A$ labeled $p$ and $q$, $p < q$, are equal. If $p$ and $q$ are both in $\gamma \in Q_{t,n}$, then $A[\alpha|\gamma]$ will have two columns equal and hence have zero determinant. Similarly, if $p$ and $q$ are both in $\hat{\gamma} \in Q_{n-t,n}$, then $\det(A[\hat{\alpha}|\hat{\gamma}]) = 0$. Thus in the evaluation of $D_\alpha(A)$ it is only necessary to consider those $\gamma \in Q_{t,n}$ such that $p \in \gamma$ and $q \in \hat{\gamma}$, or vice versa. So suppose $p \in \gamma$, $q \in \hat{\gamma}$, and define a new sequence $\gamma'$ in $Q_{t,n}$ by replacing $p \in \gamma$ by $q$. Thus $\hat{\gamma}'$ agrees with $\hat{\gamma}$ except that $q$ has been replaced by $p$. Thus
$$s(\gamma') - s(\gamma) = q - p. \tag{6.40}$$
(Note that $s(\gamma) - p$ and $s(\gamma') - q$ are both the sum of all the things in $\gamma$ except for $p$.) Now consider the sum
$$(-1)^{s(\gamma)} \det(A[\alpha|\gamma]) \det(A[\hat{\alpha}|\hat{\gamma}]) + (-1)^{s(\gamma')} \det(A[\alpha|\gamma']) \det(A[\hat{\alpha}|\hat{\gamma}']),$$
which we denote by $S(A)$. We claim that this sum is 0. Assuming this, since $\gamma$ and $\gamma'$ appear in pairs in $Q_{t,n}$, it follows that $D_\alpha(A) = 0$ whenever two columns of $A$ agree, forcing $D_\alpha$ to be alternating. So now we show that $S(A) = 0$.
Suppose that $p = \gamma_k$ and $q = \hat{\gamma}_l$. Then $\gamma$ and $\gamma'$ agree except in the range from $p$ to $q$, as do $\hat{\gamma}$ and $\hat{\gamma}'$. This includes a total of $q - p + 1$ entries. If $r$ of these entries are included in $\gamma$, then
$$\gamma_1 < \cdots < \gamma_k = p < \gamma_{k+1} < \cdots < \gamma_{k+r-1} < q < \gamma_{k+r} < \cdots < \gamma_t$$
and, since columns $p$ and $q$ of $A$ are equal,
$$A[\alpha|\gamma'] = A[\alpha|\gamma] P_{w^{-1}},$$
where $w$ is the $r$-cycle $(k + r - 1, k + r - 2, \ldots, k)$. Similarly,
$$A[\hat{\alpha}|\hat{\gamma}'] = A[\hat{\alpha}|\hat{\gamma}] P_{w'}$$
where $w'$ is a $(q - p + 1 - r)$-cycle. Thus,
$$(-1)^{s(\gamma')} \det(A[\alpha|\gamma']) \det(A[\hat{\alpha}|\hat{\gamma}']) = (-1)^{s(\gamma')+(r-1)+(q-p)-r} \det(A[\alpha|\gamma]) \det(A[\hat{\alpha}|\hat{\gamma}]).$$
Since $s(\gamma') + (q - p) - 1 - s(\gamma) = 2(q - p) - 1$ is odd, we conclude that $S(A) = 0$. Thus $D_\alpha$ is $n$-linear and alternating. It is routine to check that $D_\alpha(I_n) = 1$, completing the proof.
Applying this formula for $\det(A)$ to $\det(A^T)$ gives the Laplace expansion in terms of columns.
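The general Laplace expansion of Eq. 6.38 can be verified numerically. The sketch below is plain Python; `laplace_expansion` and the sample matrix are illustrative assumptions. It uses 0-based indices, which shifts both $s(\alpha)$ and $s(\gamma)$ by $t$ and therefore leaves the sign $(-1)^{s(\alpha)+s(\gamma)}$ unchanged.

```python
from itertools import combinations, permutations

def det(M):
    """Determinant via the permutation (Leibniz) expansion; the empty matrix has det 1."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def submatrix(A, rows, cols):
    """A[rows | cols] for index sequences rows and cols."""
    return [[A[i][j] for j in cols] for i in rows]

def laplace_expansion(A, alpha):
    """Expand det(A) along the rows listed in alpha (0-based, increasing)."""
    n = len(A)
    alpha_hat = [i for i in range(n) if i not in alpha]
    total = 0
    for gamma in combinations(range(n), len(alpha)):
        gamma_hat = [j for j in range(n) if j not in gamma]
        sign = (-1) ** (sum(alpha) + sum(gamma))
        total += (sign * det(submatrix(A, alpha, gamma))
                  * det(submatrix(A, alpha_hat, gamma_hat)))
    return total

A = [[2, 0, 1, 3],
     [1, 4, 0, 2],
     [0, 1, 5, 1],
     [3, 2, 1, 0]]
print(laplace_expansion(A, (0, 2)) == det(A))
```

With `alpha = (0,)` the function reduces to the ordinary cofactor expansion along the first row.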
6.4.13 Determinants, Ranks and Linear Equations

If $K$ is a commutative ring with 1 and $A \in M_{m,n}(K)$, and if $1 \le t \le \min\{m, n\}$, then a $t \times t$ minor of $A$ is the determinant of any submatrix $A[\alpha|\beta]$ where $\alpha \in Q_{t,m}$, $\beta \in Q_{t,n}$. The determinantal rank of $A$, denoted D-rank($A$), is the largest $t$ such that there is a nonzero $t \times t$ minor of $A$. With the same notation,
$$F_t(A) = \langle \{\det A[\alpha|\beta] : \alpha \in Q_{t,m},\ \beta \in Q_{t,n}\} \rangle \subseteq K.$$
That is, $F_t(A)$ is the ideal of $K$ generated by all the $t \times t$ minors of $A$. Put $F_0(A) = K$ and $F_t(A) = 0$ if $t > \min\{m, n\}$. $F_t(A)$ is called the $t$-th Fitting ideal of $A$. The Laplace expansion of determinants along a row or column shows that $F_{t+1}(A) \subseteq F_t(A)$. Thus there is a decreasing chain of ideals
$$K = F_0(A) \supseteq F_1(A) \supseteq F_2(A) \supseteq \cdots.$$
Definition: If $K$ is a PID, then $F_t(A)$ is a principal ideal, say $F_t(A) = \langle d_t(A) \rangle$ where $d_t(A)$ is the greatest common divisor of all the $t \times t$ minors of $A$. In this case, a generator of $F_t(A)$ is called the $t$-th determinantal divisor of $A$.
Definition: If $A \in M_{m,n}(K)$, then the M-rank($A$) is defined to be the largest $t$ such that $\{0\} = \operatorname{Ann}(F_t(A)) = \{k \in K : kd = 0 \text{ for all } d \in F_t(A)\}$.
Obs. 6.4.14. 1. M-rank($A$) = 0 means that $\operatorname{Ann}(F_1(A)) \neq 0$. That is, there is a nonzero $a \in K$ with $a \cdot a_{ij} = 0$ for all entries $a_{ij}$ of $A$. Note that this is stronger than saying that every element of $A$ is a zero divisor. For example, if $A = (2\ \ 3) \in M_{1,2}(\mathbb{Z}_6)$, then every element of $A$ is a zero divisor in $\mathbb{Z}_6$, but there is no single nonzero element of $\mathbb{Z}_6$ that annihilates both entries in the matrix.
2. If $A \in M_n(K)$, then M-rank($A$) = $n$ means that $\det(A)$ is not a zero divisor of $K$.
3. To say that M-rank($A$) = $t$ means that there is an $a \neq 0 \in K$ with $a \cdot D = 0$ for all $(t + 1) \times (t + 1)$ minors $D$ of $A$, but there is no nonzero $b \in K$ which annihilates all $t \times t$ minors of $A$ by multiplication. In particular, if $\det(A[\alpha|\beta])$ is not a zero divisor of $K$ for some $\alpha \in Q_{s,m}$, $\beta \in Q_{s,n}$, then M-rank($A$) $\ge s$.
Lemma 6.4.15. If $A \in M_{m,n}(K)$, then
$$0 \le \text{M-rank}(A) \le \text{D-rank}(A) \le \min\{m, n\}.$$
Proof. Routine exercise.
We can now give a criterion for solvability of the homogeneous linear equation $AX = 0$, where $A \in M_{m,n}(K)$. This equation always has the trivial solution $X = 0$, so we want a criterion for the existence of a solution $X \neq 0 \in M_{n,1}(K)$.
Theorem 6.4.16. Let $K$ be a commutative ring with 1 and let $A \in M_{m,n}(K)$. The matrix equation $AX = 0$ has a nontrivial solution $X \neq 0 \in M_{n,1}(K)$ if and only if
$$\text{M-rank}(A) < n.$$
Proof. Suppose that M-rank($A$) = $t < n$. Then $\operatorname{Ann}(F_{t+1}(A)) \neq 0$, so choose $b \neq 0 \in K$ with $b \cdot F_{t+1}(A) = 0$. Without loss of generality, we may assume that $t < m$, since, if necessary, we may replace the system $AX = 0$ with an equivalent one (i.e., one with the same solutions) by adding some rows of zeros to the bottom of $A$. If $t = 0$, then $b a_{ij} = 0$ for all $a_{ij}$ and we may take
$$X = \begin{pmatrix} b \\ \vdots \\ b \end{pmatrix}.$$
Then $X \neq 0 \in M_{n,1}(K)$ and $AX = 0$.
So suppose that $t > 0$. Then $b \notin \operatorname{Ann}(F_t(A)) = 0$, so $b \cdot \det(A[\alpha|\beta]) \neq 0$ for some $\alpha \in Q_{t,m}$, $\beta \in Q_{t,n}$. By permuting rows and columns, which does not affect whether $AX = 0$ has a nontrivial solution, we can assume $\alpha = (1, \ldots, t) = \beta$. For $1 \le i \le t + 1$ let $\beta_i = (1, 2, \ldots, \hat{i}, \ldots, t + 1) \in Q_{t,t+1}$, where $\hat{i}$ indicates that $i$ is deleted. Let $d_i = (-1)^{t+1+i} \det(A[\alpha|\beta_i])$. Thus $d_1, \ldots, d_{t+1}$ are the cofactors of the matrix
$$A_1 = A[(1, \ldots, t + 1)|(1, \ldots, t + 1)]$$
obtained by deleting row $t + 1$ and column $i$. Hence the Laplace expansion gives
$$\begin{cases} \sum_{j=1}^{t+1} a_{ij} d_j = 0, & \text{if } 1 \le i \le t, \\ \sum_{j=1}^{t+1} a_{ij} d_j = \det(A[(1, \ldots, t, i)|(1, \ldots, t, t + 1)]), & \text{if } t < i \le m. \end{cases} \tag{6.41}$$
Let $X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$, where
$$\begin{cases} x_i = b d_i, & \text{if } 1 \le i \le t + 1, \\ x_i = 0, & \text{if } t + 2 \le i \le n. \end{cases}$$
Then $X \neq 0$ since $x_{t+1} = b \det(A[\alpha|\beta]) \neq 0$. But Eq. 6.41 and the fact that $b \in \operatorname{Ann}(F_{t+1}(A))$ show that
$$AX = \begin{pmatrix} b \sum_{j=1}^{t+1} a_{1j} d_j \\ \vdots \\ b \sum_{j=1}^{t+1} a_{mj} d_j \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ b \det(A[(1, \ldots, t, t + 1)|(1, \ldots, t, t + 1)]) \\ \vdots \\ b \det(A[(1, \ldots, t, m)|(1, \ldots, t, t + 1)]) \end{pmatrix} = 0.$$
Thus $X$ is a nontrivial solution to the equation $AX = 0$.
Conversely, assume that $X \neq 0 \in M_{n,1}(K)$ is a nontrivial solution to $AX = 0$, and choose $k$ with $x_k \neq 0$. We claim that $\operatorname{Ann}(F_n(A)) \neq 0$. If $n > m$, then $F_n(A) = 0$, and hence $\operatorname{Ann}(F_n(A)) = K \neq (0)$. Thus we may assume that $n \le m$. Let $\beta = (1, \ldots, n)$ and for each $\alpha \in Q_{n,m}$, let $B_\alpha = A[\alpha|\beta]$. Then since $AX = 0$ and since each row of $B_\alpha$ is a full row of $A$, we conclude that $B_\alpha X = 0$. The adjoint matrix formula (Eq. 6.20) then shows that
$$(\det(B_\alpha)) X = (\operatorname{Adj} B_\alpha) B_\alpha X = 0,$$
from which we conclude that $x_k \det(B_\alpha) = 0$. Since $\alpha \in Q_{n,m}$ is arbitrary, we conclude that $x_k \cdot F_n(A) = 0$, i.e., $x_k \in \operatorname{Ann}(F_n(A))$. But $x_k \neq 0$, so $\operatorname{Ann}(F_n(A)) \neq 0$, and we conclude that M-rank($A$) $< n$, completing the proof.
In case $K$ is an integral domain we may replace the M-rank by the ordinary determinantal rank to conclude the following:
Corollary 6.4.17. If $K$ is an integral domain and $A \in M_{m,n}(K)$, then $AX = 0$ has a nontrivial solution if and only if D-rank($A$) $< n$.
Proof. If $I \subseteq K$ is an ideal, then $\operatorname{Ann}(I) \neq 0$ if and only if $I = 0$ since an integral domain has no nonzero zero divisors. Therefore, in an integral domain D-rank($A$) = M-rank($A$).
The results for $n$ equations in $n$ unknowns are even simpler.
Corollary 6.4.18. Let $K$ be a commutative ring with 1.
1. If $A \in M_n(K)$, then $AX = 0$ has a nontrivial solution if and only if $\det(A)$ is a zero divisor of $K$.
2. If $K$ is an integral domain and $A \in M_n(K)$, then $AX = 0$ has a nontrivial solution if and only if $\det(A) = 0$.
Proof. If $A \in M_n(K)$, then $F_n(A) = \langle \det(A) \rangle$, so M-rank($A$) $< n$ if and only if $\det(A)$ is a zero divisor. In particular, if $K$ is an integral domain then M-rank($A$) $< n$ if and only if $\det(A) = 0$.
There are still two other concepts of rank which can be defined for matrices with entries in a commutative ring.
Definition: Let $K$ be a commutative ring with 1 and let $A \in M_{m,n}(K)$. Then we will define the row rank of $A$, denoted by row-rank($A$), to be the maximum number of linearly independent rows in $A$, while the column rank of $A$, denoted col-rank($A$), is the maximum number of linearly independent columns.
Corollary 6.4.19. Let $K$ be a commutative ring with 1.
1. If $A \in M_{m,n}(K)$, then
$$\max\{\text{row-rank}(A), \text{col-rank}(A)\} \le \text{M-rank}(A) \le \text{D-rank}(A).$$
2. If $K$ is an integral domain, then
$$\text{row-rank}(A) = \text{col-rank}(A) = \text{M-rank}(A) = \text{D-rank}(A).$$
Proof. We sketch the proof of the result when $K$ is an integral domain. In fact, it is possible to embed $K$ in its field $F$ of quotients and do all the algebra in $F$. Recall the following from an undergraduate linear algebra course. The proofs work over any field, even if you did only consider them over the real numbers. Let $A$ be an $m \times n$ matrix with entries in $F$. Row reduce $A$ until arriving at a matrix $R$ in row-reduced echelon form. The first thing to remember here is that the row space of $A$ and the row space of $R$ are the same. Similarly, the right null space of $A$ and the right null space of $R$ are the same. (Warning: the column space of $A$ and that of $R$ usually are not the same!) So the leading (i.e., leftmost) nonzero entry in each nonzero row of $R$ is a 1, called a leading 1. Any column with a leading 1 has that 1 as its only nonzero entry. The nonzero rows of $R$ form a basis for the row space of $A$, so the number $r$ of them is the row-rank of $A$. The (right) null space of $R$ (and hence of $A$) has a basis of size $n - r$. Also, one basis of the column space of $A$ is obtained by taking the set of columns of $A$ in the positions now indicated by the columns of $R$ in which there are leading 1's. So the column rank of $A$ is also $r$. This has the interesting corollary that if any $r$ independent columns of $A$ are selected, there must be some $r$ rows of those columns that are linearly independent, so there is an $r \times r$ submatrix with rank $r$. Hence this submatrix has determinant different from 0. Conversely, if some $r \times r$ submatrix has determinant different from 0, then the "short" columns of the submatrix must be independent, so the "long" columns of $A$ to which they belong must also be independent. It is now clear that the row-rank, column-rank, M-rank and determinantal rank of $A$ are all the same.
Obs. 6.4.20. Since all four ranks of $A$ are the same when $K$ is an integral domain, in this case we may speak unambiguously of the rank of $A$, denoted rank($A$). Moreover, the condition that $K$ be an integral domain is truly necessary, as the following example shows.
Define $A \in M_4(\mathbb{Z}_{210})$ by
$$A = \begin{pmatrix} 0 & 2 & 3 & 5 \\ 2 & 0 & 6 & 0 \\ 3 & 0 & 3 & 0 \\ 0 & 0 & 0 & 7 \end{pmatrix}.$$
It is an interesting exercise to show the following:
1. row-rank($A$) = 1.
2. col-rank($A$) = 2.
3. M-rank($A$) = 3.
4. D-rank($A$) = 4.
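Items 3 and 4 of this example lend themselves to a brute-force check, since $\mathbb{Z}_{210}$ is finite. The sketch below is plain Python; the function names `minors`, `d_rank` and `m_rank` are illustrative assumptions. It lists all $t \times t$ minors modulo the given modulus, takes D-rank as the largest $t$ with a nonzero minor, and takes M-rank as the largest $t$ for which no nonzero ring element annihilates every $t \times t$ minor (annihilating the generators of $F_t(A)$ is the same as annihilating the ideal). The row and column ranks are not computed here.

```python
from itertools import combinations, permutations

def det(M):
    """Integer determinant via the permutation (Leibniz) expansion."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def minors(A, t, modulus):
    """All t x t minors of A, reduced modulo `modulus`."""
    return [det([[A[i][j] for j in cols] for i in rows]) % modulus
            for rows in combinations(range(len(A)), t)
            for cols in combinations(range(len(A[0])), t)]

def d_rank(A, modulus):
    """Largest t such that some t x t minor is nonzero in Z_modulus."""
    best = 0
    for t in range(1, min(len(A), len(A[0])) + 1):
        if any(minors(A, t, modulus)):
            best = t
    return best

def m_rank(A, modulus):
    """Largest t such that no nonzero k in Z_modulus kills every t x t minor."""
    best = 0
    for t in range(1, min(len(A), len(A[0])) + 1):
        ms = minors(A, t, modulus)
        killed = any(all(k * d % modulus == 0 for d in ms)
                     for k in range(1, modulus))
        if not killed:
            best = t
    return best

A210 = [[0, 2, 3, 5], [2, 0, 6, 0], [3, 0, 3, 0], [0, 0, 0, 7]]
print(m_rank(A210, 210), d_rank(A210, 210))   # 3 4
print(m_rank([[2, 3]], 6))                    # 1
```

The last line revisits the matrix $(2\ \ 3)$ over $\mathbb{Z}_6$ from Obs. 6.4.14: no single nonzero element annihilates both entries, so its M-rank is 1 rather than 0.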
Theorem 6.4.21. Let $K$ be a commutative ring with 1, let $M$ be a finitely generated $K$-module, and let $S \subseteq M$ be a subset. If $|S| > \mu(M) = \operatorname{rank}(M)$, then $S$ is $K$-linearly dependent.
Proof. Let $\mu(M) = m$ and let $T = \{w_1, \ldots, w_m\}$ be a generating set for $M$ consisting of $m$ elements. Choose $n$ distinct elements $\{v_1, \ldots, v_n\}$ of $S$ for some $n > m$, which is possible by hypothesis. Since $M = \langle w_1, \ldots, w_m \rangle$, we may write
$$v_j = \sum_{i=1}^m a_{ij} w_i, \quad \text{with } a_{ij} \in K.$$
Let $A = (a_{ij}) \in M_{m,n}(K)$. Since $n > m$, it follows that M-rank($A$) $\le m < n$, so Theorem 6.4.16 shows that there is an $X \neq 0 \in M_{n,1}(K)$ such that $AX = 0$. Then
$$\sum_{j=1}^n x_j v_j = \sum_{j=1}^n x_j \left( \sum_{i=1}^m a_{ij} w_i \right) = \sum_{i=1}^m \left( \sum_{j=1}^n a_{ij} x_j \right) w_i = 0, \tag{6.42}$$
since $AX = 0$. Therefore, $S$ is $K$-linearly dependent.
Corollary 6.4.22. Let $K$ be a commutative ring with 1, let $M$ be a $K$-module, and let $N \subseteq M$ be a free submodule. Then $\operatorname{rank}(N) \le \operatorname{rank}(M)$.
Proof. If $\operatorname{rank}(M) = \infty$, there is nothing to prove, so assume that $\operatorname{rank}(M) = m < \infty$. If $\operatorname{rank}(N) > m$, then there is a linearly independent subset of $M$, namely a basis of $N$, with more than $m$ elements, which contradicts Theorem 6.4.21.
Theorem 6.4.23. If $A \in M_{m,n}(K)$ and $B \in M_{n,p}(K)$, then
$$\text{D-rank}(AB) \le \min\{\text{D-rank}(A), \text{D-rank}(B)\}. \tag{6.43}$$
Proof. Let $t > \min\{\text{D-rank}(A), \text{D-rank}(B)\}$ and suppose that $\alpha \in Q_{t,m}$, $\beta \in Q_{t,p}$. Then by the Cauchy-Binet formula
$$\det(AB[\alpha|\beta]) = \sum_{\gamma \in Q_{t,n}} \det(A[\alpha|\gamma]) \det(B[\gamma|\beta]).$$
Since $t > \min\{\text{D-rank}(A), \text{D-rank}(B)\}$, at least one of the determinants $\det(A[\alpha|\gamma])$ or $\det(B[\gamma|\beta])$ must be 0 for each $\gamma \in Q_{t,n}$. Thus $\det(AB[\alpha|\beta]) = 0$, and since $\alpha$ and $\beta$ are arbitrary, it follows that D-rank($AB$) $< t$, as required.
The preceding theorem has a useful corollary.
Corollary 6.4.24. Let $A \in M_{m,n}(K)$, $U \in GL(m, K)$, $V \in GL(n, K)$. Then
$$\text{D-rank}(UAV) = \text{D-rank}(A).$$
Proof. Any matrix $B \in M_{m,n}(K)$ satisfies D-rank($B$) $\le \min\{m, n\}$. Since D-rank($U$) = $m$ and D-rank($V$) = $n$, it follows from Eq. 6.43 that
$$\text{D-rank}(UAV) \le \min\{\text{D-rank}(A), n, m\} = \text{D-rank}(A)$$
and
$$\text{D-rank}(A) = \text{D-rank}(U^{-1}(UAV)V^{-1}) \le \text{D-rank}(UAV).$$
This completes the proof.
6.5 Exercises
Except where otherwise noted, the matrices in these exercises have entries
from a commutative ring K with 1.
1. If $A = \begin{pmatrix} * & * & * \\ * & 0 & 0 \\ * & 0 & 0 \end{pmatrix}$, show that no matter how the five elements $*$ might be replaced by elements of $K$ it is always the case that $\det(A) = 0$.
2. Suppose that $A$ is $n \times n$ with $n$ odd and that $A^T = -A$. Show that:
(i) If $2 \ne 0$ in $K$, then $\det(A) = 0$.
(ii) If $2 = 0$ but we assume that each diagonal entry of $A$ equals 0, then $\det(A) = 0$.
3. Let $A$ be an $m \times n$ complex matrix and let $B$ be an $n \times m$ complex matrix. Also, for any positive integer $p$, $I_p$ denotes the $p \times p$ identity matrix. So $AB$ is $m \times m$, while $BA$ is $n \times n$.
Show that the complex number $\lambda \ne 1$ is an eigenvalue of $I_m - AB$ if and only if $\lambda$ is an eigenvalue of $I_n - BA$.
4. (Vandermonde Determinant)
Let $t_1, \dots, t_n$ be commuting indeterminates over $K$, and let $A$ be the $n \times n$ matrix whose entries are from the commutative ring $K[t_1, \dots, t_n]$ defined by
\[ A = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-1} \end{pmatrix}. \]
Then
\[ \det A = \prod_{1 \le j < i \le n} (t_i - t_j). \]
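The product formula in Exercise 4 can be sanity-checked numerically by substituting integers for the indeterminates. The following is a minimal Python sketch, not a proof; the helper names `det`, `vandermonde`, and `vandermonde_product` are illustrative and do not come from the text, and the determinant is computed by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction
from itertools import combinations

def det(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def vandermonde(ts):
    """Row i is (1, t_i, t_i^2, ..., t_i^(n-1))."""
    n = len(ts)
    return [[t ** k for k in range(n)] for t in ts]

def vandermonde_product(ts):
    """Product of (t_i - t_j) over all pairs with j < i."""
    prod = 1
    for j, i in combinations(range(len(ts)), 2):
        prod *= ts[i] - ts[j]
    return prod

ts = [2, 3, 5, 7]  # sample values chosen only for illustration
print(det(vandermonde(ts)), vandermonde_product(ts))
```

Agreement on sample values is evidence only; the exercise still asks for an argument valid in $K[t_1, \dots, t_n]$.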
5. Define the following determinants of matrices whose elements come from the ring $F[x]$:
\[ \Delta_1 = -1, \qquad \Delta_2 = \begin{vmatrix} 0 & -1 \\ -1 & x \end{vmatrix}, \qquad \Delta_3 = \begin{vmatrix} 0 & -1 & 0 \\ 0 & x & -1 \\ -1 & 0 & x \end{vmatrix}, \qquad \Delta_4 = \begin{vmatrix} 0 & -1 & 0 & 0 \\ 0 & x & -1 & 0 \\ 0 & 0 & x & -1 \\ -1 & 0 & 0 & x \end{vmatrix}. \]
Continue in this fashion. $\Delta_n$ is the determinant of an $n \times n$ matrix each of whose entries is either 0 or $-1$ or $x$ according to the following rule. The diagonal entries are all equal to $x$ except for the first diagonal entry which is 0. Each entry along the super diagonal (i.e., just above the main diagonal) equals $-1$, as does the entry in the lower left-hand corner. All other entries are 0. Evaluate $\Delta_n$.
6. Suppose that $A$ is square and singular and $A\vec{x} = \vec{b}$ is consistent. Show that $A^{\mathrm{adj}} \vec{b} = \vec{0}$.
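7. Define the following determinants:
\[ \Delta_0 = 1, \qquad \Delta_1 = |a| = a \ \text{(a $1 \times 1$ determinant)}, \qquad \Delta_2 = \begin{vmatrix} a & b \\ b & a \end{vmatrix} = a^2 - b^2, \qquad \Delta_3 = \begin{vmatrix} a & b & 0 \\ b & a & b \\ 0 & b & a \end{vmatrix}. \]
In general, $\Delta_n$ is the determinant of an $n \times n$ matrix with each diagonal entry equal to $a$ and each entry just above or just below the main diagonal equal to $b$, and all other entries equal to 0.
(i) Show that $\Delta_{n+2} = a\Delta_{n+1} - b^2 \Delta_n$.
(ii) Show that $\Delta_n = \sum_{i} (-1)^i \binom{n-i}{i} a^{n-2i} b^{2i}$.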
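Before attempting a proof of Exercise 7, it may help to confirm on small cases that the recurrence in (i) and the closed form in (ii) agree. The sketch below assumes the sum in (ii) runs over $0 \le i \le \lfloor n/2 \rfloor$, which the text does not state explicitly; the function names are illustrative.

```python
from math import comb

def delta_by_recurrence(n, a, b):
    """Delta_n from Delta_0 = 1, Delta_1 = a and
    Delta_{k+2} = a * Delta_{k+1} - b^2 * Delta_k."""
    prev, cur = 1, a
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, a * cur - b * b * prev
    return cur

def delta_by_formula(n, a, b):
    """Closed form: sum of (-1)^i * C(n-i, i) * a^(n-2i) * b^(2i),
    with i assumed to range over 0 <= i <= n // 2."""
    return sum((-1) ** i * comb(n - i, i) * a ** (n - 2 * i) * b ** (2 * i)
               for i in range(n // 2 + 1))

# Compare the two expressions on a small grid of integer values.
for n in range(8):
    for a in range(-2, 4):
        for b in range(-2, 4):
            assert delta_by_recurrence(n, a, b) == delta_by_formula(n, a, b)
print(delta_by_formula(3, 2, 1))
```

For $n = 3$ both expressions reduce to $a^3 - 2ab^2$, which matches a direct expansion of $\Delta_3$.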
8. Show that if $A$ is a row vector and $B$ is a column vector then $A \otimes B = B \otimes A$.
For an $m \times n$ matrix $A$ let $A_i$ denote its $i$th row and $A^j$ denote its $j$th column.
9. Show that $(A \otimes B)^T = A^T \otimes B^T$.
10. If $A$ and $B$ are invertible, then so is $A \otimes B$, and $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$.
11. Suppose that $A \in M_{(m,m)}$ and $B \in M_{(n,n)}$. Show that there is an $mn \times mn$ permutation matrix $P$ for which $P(A \otimes B)P^{-1} = B \otimes A$.
12. Let $A$ and $B$ be as in Exercise 11. Show that $\det(A \otimes B) = \det(A)^n \cdot \det(B)^m$.
13. Let $E_{ij}$ be the $m \times n$ matrix over $F$ with a 1 in the $(i, j)$ position and 0 elsewhere; let $F_{ij}$ be the $k \times l$ matrix over $F$ with a 1 in the $(i, j)$ position and 0 elsewhere; let $G_{ij}$ be the $mk \times nl$ matrix over $F$ with a 1 in the $(i, j)$ position and 0 elsewhere. Then for $(d, g)$ and $(h, e)$ with $1 \le d \le m$, $1 \le g \le n$, $1 \le h \le k$, $1 \le e \le l$, we have
\[ E_{dg} \otimes F_{he} = G_{(d-1)k+h,\,(g-1)l+e}. \]
Show that $M_{(mk,nl)} \cong M_{(m,n)} \otimes M_{(k,l)}$.
For the next exercise we need to define a function Vec from the set of $m \times n$ matrices into $\mathbb{R}^{mn}$ as follows. If $A$ has columns $a_1, \dots, a_n$, each in $\mathbb{R}^m$, then $\mathrm{Vec}(A)$ is the column vector obtained by putting the columns of $A$ in one long column starting with $a_1$ and proceeding to $a_n$. As an example,
\[ \mathrm{Vec}\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 2 \\ 5 \\ 3 \\ 6 \end{pmatrix}. \]
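14. Let $M$ be $m \times n$ and $N$ be $n \times p$. Then we have
(i) $\mathrm{Vec}(MN) = (I_p \otimes M)\mathrm{Vec}(N)$.
(ii) $\mathrm{Vec}(MN) = (N^T \otimes I_m)\mathrm{Vec}(M)$.
(iii) If $A$ is $m \times n$, $X$ is $n \times p$ and $B$ is $p \times q$, then
\[ \mathrm{Vec}(AXB) = (B^T \otimes A)\mathrm{Vec}(X). \]
(iv) If all products are defined,
\[ A_1 X B_1 + A_2 X B_2 = C \iff \big[ (B_1^T \otimes A_1) + (B_2^T \otimes A_2) \big] \mathrm{Vec}(X) = \mathrm{Vec}(C). \]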
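The identity in Exercise 14(iii) is easy to test on concrete matrices. The following Python sketch uses plain lists of rows; the helpers `matmul`, `transpose`, `kron`, and `vec` are illustrative names introduced here, and `kron` assumes the usual block convention in which block $(i, j)$ of $A \otimes B$ is $a_{ij}B$.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def kron(a, b):
    """Kronecker product: block (i, j) of the result equals a[i][j] * b."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def vec(a):
    """Stack the columns of a into a single column (an (m*n) x 1 matrix)."""
    return [[a[i][j]] for j in range(len(a[0])) for i in range(len(a))]

# Sample matrices chosen only for illustration: A is 2x3, X is 3x2, B is 2x2.
A = [[1, 2, 3], [4, 5, 6]]
X = [[1, 0], [2, 1], [0, 3]]
B = [[2, 1], [0, 1]]

lhs = vec(matmul(matmul(A, X), B))
rhs = matmul(kron(transpose(B), A), vec(X))
print(lhs == rhs)
```

Here `vec(A)` reproduces the column $(1, 4, 2, 5, 3, 6)^T$ from the example preceding the exercise.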
15. Let $F$ be any field. For positive integers $m$, $n$, let $\mathcal{S}_n = (e_1, \dots, e_n)$, $\mathcal{S}_m = (h_1, \dots, h_m)$ be the standard ordered bases of $F^n$ and $F^m$, respectively. For $A \in M_{m,n}(F)$ there is the linear map $T_A \in \mathcal{L}(F^n, F^m)$ defined by $T_A : x \mapsto Ax$. Moreover, we showed that $A$ is the unique matrix $[T_A]_{\mathcal{S}_m,\mathcal{S}_n}$ for which $[T_A(x)]_{\mathcal{S}_m} = [T_A]_{\mathcal{S}_m,\mathcal{S}_n}[x]_{\mathcal{S}_n}$ for all $x \in F^n$. The map $A \mapsto T_A$ is an isomorphism from $M_{m,n}(F)$ onto $\mathcal{L}(F^n, F^m)$. Let $E_{ij}$ denote the $m \times n$ matrix with 1 in position $(i, j)$ and zero elsewhere. Then
\[ E_{ij} e_k = \delta_{jk} h_i. \tag{6.44} \]
Define $f_{ij} \in \mathcal{L}(F^n, F^m)$ by $f_{ij} = T_{E_{ji}}$. So
\[ f_{ij} : e_k \mapsto E_{ji} e_k = \delta_{ik} h_j, \quad 1 \le i, k \le n; \ 1 \le j \le m. \tag{6.45} \]
Let $B_1 = (v_1, v_2, \dots, v_{mn})$ be the standard ordered basis of $F^{mn}$, and let $B_2$ be the ordered basis of $\mathcal{L}(F^n, F^m)$ given by
\[ B_2 = (f_{11}, f_{21}, \dots, f_{n1}, f_{12}, f_{22}, \dots, f_{n2}, \dots, f_{nm}). \]
Prove the following:
(i) $\mathrm{Vec}(E_{ji}) = v_{(i-1)m+j}$, $1 \le i \le n$, $1 \le j \le m$. Moreover, $\mathrm{Vec} : M_{m,n}(F) \to F^{mn}$ is a vector space isomorphism.
(ii) If $A \in M_{m,n}(F)$, then $T_A \in \mathcal{L}(F^n, F^m)$, so $T_A$ is some linear combination of the elements of $B_2$. Hence it has a coordinate matrix with respect to $B_2$. In fact,
\[ [T_A]_{B_2} = \mathrm{Vec}(A^T). \]
Let $B \in M_{m,m}(F)$, $C \in M_{n,n}(F)$. Define $T_{B,C} : M_{m,n}(F) \to M_{m,n}(F)$ by
\[ T_{B,C} : A \mapsto BAC. \]
Recall that $\mathrm{Vec}(BAC) = (C^T \otimes B)\mathrm{Vec}(A)$. So if we interpret $T_{B,C}$ as an element of $\mathcal{L}(F^{mn}, F^{mn})$, we have $T_{B,C} : \mathrm{Vec}(A) \mapsto (C^T \otimes B)\mathrm{Vec}(A)$, i.e., $T_{B,C} = T_{C^T \otimes B}$. So we have
\[ T_{B,C} : F^{mn} \to F^{mn} : \mathrm{Vec}(A) \mapsto (C^T \otimes B)\mathrm{Vec}(A). \]
(iii) $[T_{B,C}]_{B_1,B_1} = C^T \otimes B$.
Define $g_{ij} : F^{mn} \to F^{mn} : v_k \mapsto \delta_{ik} v_j$, $1 \le i, j \le mn$. Put $G = (g_{ij})$ and $B_3 = (\mathrm{Vec}(G))^T = (g_{11}, g_{21}, \dots, g_{mn,1}, \dots, g_{1,mn}, \dots, g_{mn,mn})$. Then
(iv) $[T_{B,C}]_{B_3} = [T_{C^T \otimes B}]_{B_3} = \mathrm{Vec}(C \otimes B^T)$.
Chapter 7
Operators and Invariant
Subspaces
Throughout this chapter $F$ will be an arbitrary field unless otherwise restricted, and $V$ will denote an arbitrary vector space over $F$. Much of our work will require that $V$ be finite dimensional, but we shall be general as long as possible. Given an operator $T \in \mathcal{L}(V)$, our main interest will be in finding $T$-invariant subspaces $U_1, \dots, U_r$ such that $V = U_1 \oplus U_2 \oplus \cdots \oplus U_r$. Since each $U_i$ is invariant under $T$, we may consider the restriction $T_i = T|_{U_i}$ of $T$ to the subspace $U_i$. Then we write $T = T_1 \oplus \cdots \oplus T_r$. It is usually easier to analyze $T$ by writing it as the sum of the operators $T_i$ and then analyzing the operators $T_i$ on the smaller subspaces $U_i$.
7.1 Eigenvalues and Eigenvectors
Let $T \in \mathcal{L}(V)$. Then $\{\vec{0}\}$, $V$, $\mathrm{null}(T)$, and $\mathrm{Im}(T)$ are invariant under $T$. However, often these are not especially interesting as invariant subspaces, and we want to begin with 1-dimensional $T$-invariant subspaces.
Suppose that $U$ is a 1-dimensional $T$-invariant subspace. Then there exists some nonzero vector $u \in U$. Since $T(u) \in U$, there is some scalar $a \in F$ such that $T(u) = au$. By hypothesis $(u)$ is a basis for $U$. If $bu$ is any vector in $U$, then $T(bu) = bT(u) = b(au) = a(bu)$. Hence for each $v \in U$, $T(v) = av$. If $a \in F$ satisfies the property that there is some nonzero vector $v \in V$ such that $T(v) = av$, then $a$ is called an eigenvalue of $T$. A vector $v \in V$ is called an eigenvector of $T$ belonging to the eigenvalue $\lambda$ provided $T(v) = \lambda v$, i.e., $(T - \lambda I)(v) = \vec{0}$. Note that the eigenvalue $\lambda$ could be the zero scalar, but it is an eigenvalue if and only if there is a nonzero eigenvector belonging to it. For this reason many authors restrict all eigenvectors to be nonzero. However, it is convenient to include $\vec{0}$ in the set of eigenvectors belonging to any particular eigenvalue so that the set of all eigenvectors belonging to some eigenvalue is a subspace, in fact a $T$-invariant subspace. We have the following:
Obs. 7.1.1. If $\lambda$ is an eigenvalue of $T$, then $\mathrm{null}(T - \lambda I)$ is the $T$-invariant subspace of $V$ consisting of all eigenvectors belonging to $\lambda$.
Consider the example $T \in \mathcal{L}(F^2)$ defined by $T(y, z) = (2z, -y)$. Then $T(y, z) = \lambda(y, z)$ if and only if $2z = \lambda y$ and $-y = \lambda z$, so $2z = \lambda(-\lambda z)$. If $(y, z) \ne (0, 0)$, then $\lambda^2 = -2$. If $F = \mathbb{R}$, for example, then $T$ has no eigenvalue. However, if $F$ is algebraically closed, for example, then $T$ has two eigenvalues $\pm\lambda$ where $\lambda$ is one of the two solutions to $\lambda^2 = -2$. If $F = \mathbb{Z}_5$, then $-2 = 3$ is a non-square in $F$, so $T$ has no eigenvalues. But if $F = \mathbb{Z}_{11}$, then $\lambda = 3$ satisfies $\lambda^2 = -2$ in $F$.
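The finite-field cases of this example can be checked by brute force. The sketch below takes the operator to be $T(y, z) = (2z, -y)$ on $\mathbb{Z}_p^2$ (the minus sign is the reading under which $\lambda^2 = -2$) and searches every scalar and every nonzero vector; the function name `eigenvalues_mod_p` is illustrative, and the search is only practical for small primes $p$.

```python
def eigenvalues_mod_p(p):
    """Return all lam in Z_p for which some nonzero (y, z) in Z_p^2
    satisfies T(y, z) = lam * (y, z), where T(y, z) = (2z, -y)."""
    return [lam for lam in range(p)
            if any((2 * z - lam * y) % p == 0 and (-y - lam * z) % p == 0
                   for y in range(p) for z in range(p) if (y, z) != (0, 0))]

print(eigenvalues_mod_p(5))   # no eigenvalues: -2 = 3 is a non-square mod 5
print(eigenvalues_mod_p(11))  # the two square roots of -2 = 9 mod 11
```

Over $\mathbb{Z}_{11}$ the search returns both $3$ and $8 = -3$, the two roots of $\lambda^2 = -2$.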
Nonzero eigenvectors belonging to distinct eigenvalues are linearly independent.
Theorem 7.1.2. Let $T \in \mathcal{L}(V)$ and suppose that $\lambda_1, \dots, \lambda_m$ are distinct eigenvalues of $T$ with corresponding nonzero eigenvectors $v_1, \dots, v_m$. Then the list $(v_1, \dots, v_m)$ is linearly independent.
Proof. Suppose $(v_1, \dots, v_m)$ is linearly dependent. By the Linear Dependence Lemma we may let $j$ be the smallest positive integer for which $(v_1, \dots, v_j)$ is linearly dependent, so that $v_j \in \mathrm{span}(v_1, \dots, v_{j-1})$. So there are scalars $a_1, \dots, a_{j-1}$ for which
\[ v_j = a_1 v_1 + \cdots + a_{j-1} v_{j-1}. \tag{7.1} \]
Apply $T$ to both sides of this equation to obtain
\[ \lambda_j v_j = a_1 \lambda_1 v_1 + a_2 \lambda_2 v_2 + \cdots + a_{j-1} \lambda_{j-1} v_{j-1}. \]
Multiply both sides of Eq. 7.1 by $\lambda_j$ and subtract the equation above from it. This gives
\[ \vec{0} = a_1(\lambda_j - \lambda_1)v_1 + \cdots + a_{j-1}(\lambda_j - \lambda_{j-1})v_{j-1}. \]
Because $j$ was chosen to be the smallest integer for which
\[ v_j \in \mathrm{span}(v_1, \dots, v_{j-1}) \]
we now have that $(v_1, \dots, v_{j-1})$ is linearly independent. Since the $\lambda$'s were all distinct, this means that $a_1 = \cdots = a_{j-1} = 0$, implying that $v_j = \vec{0}$ (by Eq. 7.1), contradicting our hypothesis that all the $v_i$'s are nonzero. Hence our assumption that $(v_1, \dots, v_m)$ is linearly dependent must be false.
Since each linearly independent set of a finite dimensional space has no more elements than the dimension of that space, we obtain the following result.
Corollary 7.1.3. If $\dim(V) = n < \infty$, then for any $T \in \mathcal{L}(V)$, $T$ can never have more than $n$ distinct eigenvalues.
7.2 Upper-Triangular Matrices
In Chapter 5 we applied polynomials to elements of some linear algebra over $F$. In particular, if $T \in \mathcal{L}(V)$ and $f, g, h \in F[x]$ with $f = gh$, then by Theorem 5.2.5 we know that $f(T) = g(T)h(T)$.
Theorem 7.2.1. Let $V$ be a nonzero, finite dimensional vector space over the algebraically closed field $F$. Then each operator $T$ on $V$ has an eigenvalue.
Proof. Suppose $\dim(V) = n > 0$ and choose a nonzero vector $v \in V$. Let $T \in \mathcal{L}(V)$. Then the list
\[ (v, T(v), T^2(v), \dots, T^n(v)) \]
of $n + 1$ vectors in an $n$-dimensional space cannot be linearly independent. So there must be scalars, not all zero, such that $\vec{0} = a_0 v + a_1 T(v) + a_2 T^2(v) + \cdots + a_n T^n(v)$. Let $m$ be the largest index such that $a_m \ne 0$. Since $v \ne \vec{0}$, the coefficients $a_1, \dots, a_n$ cannot all be 0, so $0 < m \le n$. Use the $a_i$'s to construct a polynomial which can be written in factored form as
\[ a_0 + a_1 z + a_2 z^2 + \cdots + a_m z^m = c(z - \lambda_1)(z - \lambda_2) \cdots (z - \lambda_m), \]
where $c \in F$ is nonzero, each $\lambda_j \in F$, and the equation holds for all $z \in F$. We then have
\[ \begin{aligned} \vec{0} &= a_0 v + a_1 T(v) + \cdots + a_m T^m(v) \\ &= (a_0 I + a_1 T + \cdots + a_m T^m)(v) \\ &= c(T - \lambda_1 I)(T - \lambda_2 I) \cdots (T - \lambda_m I)(v). \end{aligned} \tag{7.2} \]
If $(T - \lambda_m I)(v) = \vec{0}$, then $\lambda_m$ must be an eigenvalue. If it is not the zero vector, consider $(T - \lambda_{m-1} I)(T - \lambda_m I)(v)$. If this is zero, then $\lambda_{m-1}$ must be an eigenvalue. Proceeding this way, we see that the first $\lambda_j$ (i.e., with the largest $j$) for which $(T - \lambda_j I)(T - \lambda_{j+1} I) \cdots (T - \lambda_m I)(v) = \vec{0}$ must be an eigenvalue.
Theorem 7.2.2. Suppose $T \in \mathcal{L}(V)$ and $B = (v_1, \dots, v_n)$ is a basis of $V$. Then the following are equivalent:
(i) $[T]_B$ is upper triangular.
(ii) $T(v_k) \in \mathrm{span}(v_1, \dots, v_k)$ for each $k = 1, \dots, n$.
(iii) The $\mathrm{span}(v_1, \dots, v_k)$ is $T$-invariant for each $k = 1, \dots, n$.
Proof. By now the proof of this result should be clear to the reader.
Theorem 7.2.3. Let $F$ be an algebraically closed field, let $V$ be an $n$-dimensional vector space over $F$ with $n \ge 1$, and let $T \in \mathcal{L}(V)$. Then there is a basis $B$ for $V$ such that $[T]_B$ is upper triangular.
Proof. We use induction on the dimension of $V$. Clearly the theorem is true if $n = 1$. So suppose that $n > 1$ and that the theorem holds for all vector spaces over $F$ whose dimension is a positive integer less than $n$. Let $\lambda$ be any eigenvalue of $T$ (which we know must exist by Theorem 7.2.1). Let
\[ U = \mathrm{Im}(T - \lambda I). \]
Because $T - \lambda I$ is not injective, it is also not surjective, so $\dim(U) < n = \dim(V)$. If $u \in U$, then
\[ T(u) = (T - \lambda I)(u) + \lambda u. \]
Obviously $(T - \lambda I)(u) \in U$ (from the definition of $U$) and $\lambda u \in U$. Thus the equation above shows that $T(u) \in U$, hence $U$ is $T$-invariant. Thus $T|_U \in \mathcal{L}(U)$. By our induction hypothesis, there is a basis $(u_1, \dots, u_m)$ of $U$ with respect to which $T|_U$ has an upper triangular matrix. Thus for each $j$ we have (using Theorem 7.2.2)
\[ T(u_j) = (T|_U)(u_j) \in \mathrm{span}(u_1, \dots, u_j). \tag{7.3} \]
Extend $(u_1, \dots, u_m)$ to a basis $(u_1, \dots, u_m, v_1, \dots, v_r)$ of $V$. For each $k$, $1 \le k \le r$, we have
\[ T(v_k) = (T - \lambda I)(v_k) + \lambda v_k. \]
The definition of $U$ shows that $(T - \lambda I)(v_k) \in U = \mathrm{span}(u_1, \dots, u_m)$. Clearly
\[ T(v_k) \in \mathrm{span}(u_1, \dots, u_m, v_1, \dots, v_k). \]
It is now clear that $T$ has an upper triangular matrix with respect to the basis $(u_1, \dots, u_m, v_1, \dots, v_r)$.
Obs. 7.2.4. Suppose $T \in \mathcal{L}(V)$ has an upper triangular matrix with respect to some basis $B$ of $V$. Then $T$ is invertible if and only if all the entries on the diagonal of that upper triangular matrix are nonzero.
Proof. We know that $T$ is invertible if and only if $[T]_B$ is invertible, which by Result 6.3.2 and Theorem 6.3.7 is invertible if and only if the diagonal entries of $[T]_B$ are all nonzero.
Corollary 7.2.5. Suppose $T \in \mathcal{L}(V)$ has an upper triangular matrix with respect to some basis $B$ of $V$. Then the eigenvalues of $T$ consist precisely of the entries on the diagonal of $[T]_B$. (Note: With a little care the diagonal of the upper triangular matrix can be arranged to have all equal eigenvalues bunched together. To see this, rework the proof of Theorem 7.2.3.)
Proof. Suppose the diagonal entries of the $n \times n$ upper triangular matrix $[T]_B$ are $\lambda_1, \dots, \lambda_n$. Let $\lambda \in F$. Then $[T - \lambda I]_B$ is upper triangular with diagonal elements equal to $\lambda_1 - \lambda, \lambda_2 - \lambda, \dots, \lambda_n - \lambda$. Hence $T - \lambda I$ is not invertible if and only if $\lambda$ equals one of the $\lambda_j$'s. In other words, $\lambda$ is an eigenvalue of $T$ if and only if $\lambda$ equals one of the $\lambda_j$'s, as desired.
Obs. 7.2.6. Let $B = (v_1, \dots, v_n)$ be a basis for $V$. An operator $T \in \mathcal{L}(V)$ has a diagonal matrix $\mathrm{diag}(\lambda_1, \dots, \lambda_n)$ with respect to $B$ if and only if $T(v_i) = \lambda_i v_i$, i.e., each vector in $B$ is an eigenvector of $T$.
In some ways, the nicest operators are those which are diagonalizable, i.e., those for which there is some basis with respect to which they are represented by a diagonal matrix. But this is not always the case even when the field $F$ is algebraically closed. Consider the following example over the complex numbers. Define $T \in \mathcal{L}(\mathbb{C}^2)$ by $T(y, z) = (z, 0)$. As you should verify, if $\mathcal{S}$ is the standard basis of $\mathbb{C}^2$, then $[T]_{\mathcal{S}} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, so the only eigenvalue of $T$ is 0. But $\mathrm{null}(T - 0I)$ is 1-dimensional. So clearly $\mathbb{C}^2$ does not have a basis consisting of eigenvectors of $T$.
One of the recurring themes in linear algebra is that of obtaining conditions that guarantee that some operator has a diagonal matrix with respect to some basis.
Theorem 7.2.7. If $\dim(V) = n$ and $T \in \mathcal{L}(V)$ has exactly $n$ distinct eigenvalues, then $T$ has a diagonal matrix with respect to some basis.
Proof. Suppose that $T$ has distinct eigenvalues $\lambda_1, \dots, \lambda_n$, and let $v_j$ be a nonzero eigenvector belonging to $\lambda_j$, $1 \le j \le n$. Because nonzero eigenvectors corresponding to distinct eigenvalues are linearly independent, $(v_1, \dots, v_n)$ is linearly independent, and hence a basis of $V$. So with respect to this basis $T$ has a diagonal matrix.
The following proposition gathers some of the necessary and sufficient conditions for an operator $T$ to be diagonalizable.
Theorem 7.2.8. Let $\lambda_1, \dots, \lambda_m$ denote the distinct eigenvalues of $T \in \mathcal{L}(V)$. Then the following are equivalent:
(i) $T$ has a diagonal matrix with respect to some basis of $V$.
(ii) $V$ has a basis consisting of eigenvectors of $T$.
(iii) There exist one-dimensional $T$-invariant subspaces $U_1, \dots, U_n$ of $V$ such that $V = U_1 \oplus \cdots \oplus U_n$.
(iv) $V = \mathrm{null}(T - \lambda_1 I) \oplus \cdots \oplus \mathrm{null}(T - \lambda_m I)$.
(v) $\dim(V) = \dim(\mathrm{null}(T - \lambda_1 I)) + \cdots + \dim(\mathrm{null}(T - \lambda_m I))$.
Proof. At this stage it should be clear to the reader that (i), (ii) and (iii) are equivalent and that (iv) and (v) are equivalent. At least you should think about this until the equivalences are quite obvious. We now show that (ii) and (iv) are equivalent.
Suppose that $V$ has a basis $B = (v_1, \dots, v_n)$ consisting of eigenvectors of $T$. We may group together those $v_i$'s belonging to the same eigenvalue, say $v_1, \dots, v_{d_1}$ belong to $\lambda_1$ so span a subspace $U_1$ of $\mathrm{null}(T - \lambda_1 I)$; $v_{d_1+1}, \dots, v_{d_1+d_2}$ belong to $\lambda_2$ so span a subspace $U_2$ of $\mathrm{null}(T - \lambda_2 I)$; $\dots$, and the last $d_m$ of the $v_i$'s belong to $\lambda_m$ and span a subspace $U_m$ of $\mathrm{null}(T - \lambda_m I)$. Since $B$ spans all of $V$, it is clear that $V = U_1 + \cdots + U_m$. Since the subspaces of eigenvectors belonging to distinct eigenvalues are independent (an easy corollary of Theorem 7.1.2), it must be that $V = U_1 \oplus \cdots \oplus U_m$. Hence each $U_i$ is all of $\mathrm{null}(T - \lambda_i I)$.
Conversely, if $V = U_1 \oplus \cdots \oplus U_m$ with $U_i = \mathrm{null}(T - \lambda_i I)$, by joining together bases of the $U_i$'s we get a basis of $V$ consisting of eigenvectors of $T$.
There is another handy condition that is necessary and sufficient for an operator on a finite-dimensional vector space to be diagonalizable. Before giving it we need the following lemma.
Lemma 7.2.9. Let $T \in \mathcal{L}(V)$ where $V$ is an $n$-dimensional vector space over the algebraically closed field $F$. Suppose that $\lambda \in F$ is an eigenvalue of $T$. Then $\lambda$ must be a root of the minimal polynomial $p(x)$ of $T$.
Proof. Let $v$ be a nonzero eigenvector of $T$ associated with $\lambda$. It is easy to verify that $p(T)(v) = p(\lambda) \cdot v$. But $p(T)$ is the zero operator, so $p(T)(v) = \vec{0} = p(\lambda) \cdot v$, and since $v \ne \vec{0}$ this implies that $p(\lambda) = 0$.
Theorem 7.2.10. Let $V$ be an $n$-dimensional vector space over the algebraically closed field $F$, and let $T \in \mathcal{L}(V)$. Then $T$ is diagonalizable if and only if the minimal polynomial for $T$ has no repeated root.
Proof. Suppose that $T \in \mathcal{L}(V)$ is diagonalizable. This means that there is a basis $B$ of $V$ such that $[T]_B$ is diagonal, say it is the matrix $\mathrm{diag}(\lambda_1, \dots, \lambda_n) = A$. If $f(x)$ is any polynomial over $F$, then $f(A) = \mathrm{diag}(f(\lambda_1), \dots, f(\lambda_n))$. So if (after relabeling) $\lambda_1, \dots, \lambda_r$ are the distinct eigenvalues of $T$, and if $p(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_r)$, then $p(A) = 0$, so $p(T) = 0$. Hence the minimal polynomial has no repeated roots (since it must divide $p(x)$).
The converse is a bit more complicated. Suppose that
\[ p(x) = (x - \lambda_1) \cdots (x - \lambda_r) \]
is the minimal polynomial of $T$ with $\lambda_1, \dots, \lambda_r$ distinct. By the preceding lemma we know that all the eigenvalues of $T$ appear among $\lambda_1, \dots, \lambda_r$. Let $W_i = \mathrm{null}(T - \lambda_i I)$ for $1 \le i \le r$. We proved earlier that $W_1 + \cdots + W_r = W_1 \oplus \cdots \oplus W_r$. What we need at this point is to show that $W_1 \oplus \cdots \oplus W_r$ is all of $V$.
For $1 \le j \le r$ put
\[ p_j(x) = \frac{p(x)}{x - \lambda_j}. \]
Note that $p'(x) = \sum_{j=1}^{r} p_j(x)$, so that $p'(\lambda_i) = \prod_{k \ne i} (\lambda_i - \lambda_k)$. Also,
\[ p_j(\lambda_i) = \begin{cases} 0, & \text{if } i \ne j; \\ p'(\lambda_i), & \text{if } i = j. \end{cases} \]
Use the partial fraction technique to write
\[ \frac{1}{p(x)} = \sum_{i=1}^{r} \frac{a_i}{x - \lambda_i}. \]
The usual process shows that $a_i = \frac{1}{p'(\lambda_i)}$, so we find
\[ 1 = \sum_{i=1}^{r} \frac{1}{p'(\lambda_i)} p_i(x). \]
For each $i$, $1 \le i \le r$, put $P_i = \frac{1}{p'(\lambda_i)} p_i(T)$, so $I = P_1 + P_2 + \cdots + P_r$. Since $p(x) = (x - \lambda_i)p_i(x)$, $(T - \lambda_i I)p_i(T)(v) = \vec{0}$ for all $v \in V$. This means that $P_i(v) \in W_i$ for all $v \in V$. Moreover, for each $v \in V$, $v = P_1(v) + P_2(v) + \cdots + P_r(v) \in W_1 \oplus \cdots \oplus W_r$. This says $V$ is the direct sum of the eigenspaces of $T$. Hence if we choose any basis for each $W_i$ and take their union, we have a basis $B$ for $V$ consisting of eigenvectors of $T$, so that $[T]_B$ is diagonal.
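The operators $P_i = \frac{1}{p'(\lambda_i)} p_i(T)$ used in the proof of Theorem 7.2.10 can be computed explicitly for a small matrix. The Python sketch below is an illustration over the rationals rather than an algebraically closed field; the sample matrix and the helper names `matmul` and `projections` are assumptions made here, and the code takes for granted that the supplied list contains exactly the distinct roots of the minimal polynomial.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def projections(a, eigenvalues):
    """Return P_i = p_i(A) / p'(lambda_i) for each listed eigenvalue.

    Assumes the minimal polynomial of A is the product of (x - lambda)
    over the given distinct eigenvalues, i.e. that A is diagonalizable.
    """
    n = len(a)
    result = []
    for i, lam_i in enumerate(eigenvalues):
        # Start from the identity and multiply by (A - lam_k I) / (lam_i - lam_k).
        p = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
        for k, lam_k in enumerate(eigenvalues):
            if k == i:
                continue
            scale = Fraction(1) / (lam_i - lam_k)
            factor = [[scale * (a[r][c] - (lam_k if r == c else 0))
                       for c in range(n)] for r in range(n)]
            p = matmul(p, factor)
        result.append(p)
    return result

# Sample matrix with eigenvalues 1, 1, 2 and minimal polynomial (x - 1)(x - 2).
A = [[1, 0, 1],
     [0, 1, 0],
     [0, 0, 2]]
P1, P2 = projections(A, [1, 2])
total = [[P1[r][c] + P2[r][c] for c in range(3)] for r in range(3)]
print(total)  # the identity matrix, with Fraction entries
print(matmul(A, P1) == P1, matmul(A, P2) == [[2 * x for x in row] for row in P2])
```

The two checks printed at the end correspond to $I = P_1 + P_2$ and to the fact that the image of each $P_i$ lies in $W_i = \mathrm{null}(T - \lambda_i I)$.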
Corollary 7.2.11. Let $T \in \mathcal{L}(V)$ be diagonalizable. Let $U$ be a $T$-invariant subspace of $V$. Then $T|_U$ is diagonalizable.
Proof. Let $p(x)$ be the minimal polynomial for $T$. If $u \in U$, then $p(T)(u) = \vec{0}$. So $p(T)|_U$ is the zero operator on $U$, implying that $p(x)$ is a multiple of the minimal polynomial for $T|_U$. Since $T$ is diagonalizable its minimal polynomial has no repeated roots. Hence the minimal polynomial for $T|_U$ can have no repeated roots, implying that $T|_U$ is diagonalizable.
7.3 Invariant Subspaces of Real Vector Spaces
We have seen that if $V$ is a finite dimensional vector space over an algebraically closed field $F$ then each linear operator on $V$ has an eigenvalue in $F$. We have also seen an example that shows that this is not the case for real vector spaces. This means that an operator on a nonzero finite dimensional real vector space may have no invariant subspace of dimension 1. However, we now show that an invariant subspace of dimension 1 or 2 always exists.
Theorem 7.3.1. Every operator on a finite dimensional, nonzero, real vector space has an invariant subspace of dimension 1 or 2.
Proof. Suppose $V$ is a real vector space with $\dim(V) = n > 0$ and let $T \in \mathcal{L}(V)$. Choose $v \in V$ with $v \ne \vec{0}$. Since $(v, T(v), \dots, T^n(v))$ must be linearly dependent (Why?), there are scalars $a_0, a_1, \dots, a_n \in \mathbb{R}$ such that not all the $a_i$'s are zero and
\[ \vec{0} = a_0 v + a_1 T(v) + \cdots + a_n T^n(v). \]
Construct the polynomial $f(x) = \sum_{i=0}^{n} a_i x^i$ which can be factored in the form
\[ f(x) = c(x - \lambda_1) \cdots (x - \lambda_r)(x^2 + \alpha_1 x + \beta_1) \cdots (x^2 + \alpha_k x + \beta_k), \]
where $c$ is a nonzero real number, each $\lambda_j$, $\alpha_j$, and $\beta_j$ is real, $r + k \ge 1$, $\alpha_j^2 < 4\beta_j$ and the equation holds for all $x \in \mathbb{R}$. We then have
\[ \begin{aligned} \vec{0} &= a_0 v + a_1 T(v) + \cdots + a_n T^n(v) \\ &= (a_0 I + a_1 T + \cdots + a_n T^n)(v) \\ &= c(T - \lambda_1 I) \cdots (T - \lambda_r I)(T^2 + \alpha_1 T + \beta_1 I) \cdots (T^2 + \alpha_k T + \beta_k I)(v), \end{aligned} \tag{7.4} \]
which means that $T - \lambda_j I$ is not injective for at least one $j$ or that $T^2 + \alpha_j T + \beta_j I$ is not injective for at least one $j$. If $T - \lambda_j I$ is not injective for some $j$, then $T$ has an eigenvalue and hence a one-dimensional invariant subspace. If $T^2 + \alpha_j T + \beta_j I$ is not injective for some $j$, then we can find a nonzero vector $w$ for which
\[ T^2(w) + \alpha_j T(w) + \beta_j w = \vec{0}. \tag{7.5} \]
Using Eq. 7.5 it is easy to show that $\mathrm{span}(w, T(w))$ is $T$-invariant and it clearly has dimension 1 or 2. If it had dimension 1, then $w$ would be an eigenvector of $T$ belonging to an eigenvalue $\lambda$ that would have to be a root of $x^2 + \alpha_j x + \beta_j = 0$, contradicting the assumption that $\alpha_j^2 < 4\beta_j$. Hence $T$ has a 2-dimensional invariant subspace.
In fact we now have an easy proof of the following:
Theorem 7.3.2. Every operator on an odd-dimensional real vector space has an eigenvalue.
Proof. Let $V$ be a real vector space with $\dim(V)$ odd and let $T \in \mathcal{L}(V)$. We know that the eigenvalues of $T$ are the roots of the characteristic polynomial of $T$ which has real coefficients and degree equal to $\dim(V)$. But every real polynomial of odd degree has a real root by Lemma 1.1.2, i.e., $T$ has a real eigenvalue.
7.4 Two Commuting Linear Operators
Let $V$ be a finite-dimensional vector space over the field $F$, and let $T$ be a linear operator on $V$. Suppose $T$ has matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ with respect to some basis. If $i$ is an element in $F$ (or in some extension of $F$) for which $i^2 = -1$, then the eigenvalues of $T$ are $\pm i$. So $T$ has eigenvectors in $V$ if and only if $i \in F$. For example, if $F = \mathbb{R}$, then $T$ has no eigenvectors. To avoid having to deal with this kind of situation we assume from now on that $F$ is algebraically closed, so that each polynomial that has coefficients in $F$ splits into linear factors over $F$. In particular, any linear operator $T$ on $V$ will have minimal and characteristic polynomials that split into linear factors over $F$. Our primary example of an algebraically closed field is the field $\mathbb{C}$ of complex numbers.
Recall that the ring $F[x]$ of polynomials in the indeterminate $x$ with coefficients from $F$ is a principal ideal domain. This means that if $I$ is an ideal of $F[x]$, it must consist of all multiples of some particular element of $F[x]$. Our chief example is the following: Let $T$ be any linear operator on $V$, let $W$ be a $T$-invariant subspace of $V$, and let $v$ be any vector in $V$. Put $T(v, W) = \{f(x) \in F[x] : f(T)(v) \in W\}$. It is easy to show that $T(v, W)$ is an ideal of $F[x]$. (This just means that the sum of any two polynomials in $T(v, W)$ is also in $T(v, W)$, and if $f(x) \in T(v, W)$ and $g(x)$ is any polynomial in $F[x]$, then the product $f(x)g(x)$ is back in $T(v, W)$.) Hence there is a unique monic polynomial $g(x)$ of minimal degree in $T(v, W)$ called the $T$-conductor of $v$ into $W$. For this conductor $g(x)$ it is true that $f(x) \in T(v, W)$ if and only if there is some $h(x) \in F[x]$ for which $f(x) = g(x) \cdot h(x)$. If $W = \{\vec{0}\}$, then $g(x)$ is called the $T$-annihilator of $v$. Clearly the minimal polynomial $p(x)$ of $T$ is in $T(v, W)$, so $g(x)$ divides $p(x)$. All these polynomials have coefficients in $F$, so by hypothesis they all split into linear factors over $F$.
The fact that $p(x)$ divides any polynomial $q(x)$ for which $q(T) = 0$ is quite important. Here is an example. Suppose $W$ is a subspace invariant under $T$, so the restriction $T|_W$ of $T$ to vectors in $W$ is a linear operator on $W$. Let $g(x)$ be the minimal polynomial for $T|_W$. Since $p(T) = 0$, clearly $p(T|_W) = 0$, so $g(x)$ divides $p(x)$.
Theorem 7.4.1. Let $V$ be a finite-dimensional vector space over (the algebraically closed) field $F$, with $n = \dim(V) \ge 1$. Let $S$ and $T$ be commuting linear operators on $V$. Then every eigenspace of $T$ is invariant under $S$, and $S$ and $T$ have a common eigenvector in $V$.
Proof. Since $\dim(V) \ge 1$ and $F$ is algebraically closed, $T$ has an eigenvector $v_1$ in $V$ with associated eigenvalue $c \in F$, i.e., $\vec{0} \ne v_1 \in V$ and $(T - cI)(v_1) = \vec{0}$. Put $W = \{v \in V : (T - cI)(v) = \vec{0}\}$, so $W$ is the eigenspace of $T$ associated with the eigenvalue $c$.
To see that $W$ is invariant under $S$ let $w \in W$, so $T(w) = cw$. Then since $S$ commutes with $T$, it also commutes with $T - cI$, and $(T - cI)(S(w)) = [(T - cI)S](w) = [S(T - cI)](w) = S((T - cI)(w)) = S(\vec{0}) = \vec{0}$. This says that $S(w)$ is in $W$, so $S$ acts on $W$, which has dimension at least 1. But then $S|_W$ is a linear operator on $W$ and must have an eigenvector $w_1$ in $W$. So $w_1$ is a common eigenvector of $S$ and $T$.
Note: In the above proof we do not claim that every element of $W$ is an eigenvector of $S$. Also, $S$ and $T$ play symmetric roles in the above proof, so that also each eigenspace of $S$ is invariant under $T$.
We need to use the concept of quotient space. Let $W$ be a subspace of $V$, where $V$ is any vector space over a field $F$. For each $v \in V$, the set $v + W = \{v + w : w \in W\}$ is called a coset of the subgroup $W$ in the additive group $(V, +)$, and these cosets form a group $V/W$ (called a quotient group) with the following binary operation: $(v_1 + W) + (v_2 + W) := (v_1 + v_2) + W$. From Modern Algebra we know that the quotient group $(V/W, +)$ is also an abelian group. (Think about the different roles being played by the symbol "+" here.) We can make this quotient group $V/W$ into a vector space over the same field by defining a scalar multiplication as follows: for $c \in F$ and $v + W \in V/W$, put $c(v + W) = cv + W$. It is routine to show that this makes $V/W$ into a vector space. Moreover, if $B_1 = (v_1, \dots, v_r)$ is a basis for $W$, and $B_2 = (v_1, \dots, v_r, v_{r+1}, \dots, v_n)$ is a basis for $V$, then $(v_{r+1} + W, \dots, v_n + W)$ is a basis for $V/W$. (Be sure to check this out!!) Hence $\dim(V) = \dim(W) + \dim(V/W)$. Moreover, if $W$ is invariant under the operator $T \in \mathcal{L}(V)$, then $T$ induces a linear operator $\overline{T}$ on $V/W$ as follows:
\[ \overline{T} : V/W \to V/W : v + W \mapsto T(v) + W. \]
Clearly if $T, S \in \mathcal{L}(V)$, if $W$ is invariant under both $S$ and $T$, and if $ST = TS$, then $\overline{T}\,\overline{S} = \overline{S}\,\overline{T}$.
We are now ready for the following theorem on two commuting operators.
Theorem 7.4.2. Let $S$ and $T$ be two commuting operators on $V$ over the algebraically closed field $F$. Then there is a basis $B$ of $V$ with respect to which both $[T]_B$ and $[S]_B$ are upper triangular.
Proof. We proceed by induction on $n$, the dimension of $V$. By the previous theorem there is a vector $v_1 \in V$ such that $Tv_1 = \lambda_1 v_1$ and $Sv_1 = \mu_1 v_1$ for some scalars $\lambda_1$ and $\mu_1$. Let $W$ be the subspace spanned by $v_1$, which is invariant under both $S$ and $T$. Then the dimension of $V/W$ is $n - 1$, and the operators $\overline{T}$ and $\overline{S}$ on $V/W$ commute, so by our induction hypothesis there is a basis $B_1 = (v_2 + W, v_3 + W, \dots, v_n + W)$ of $V/W$ with respect to which both $\overline{T}$ and $\overline{S}$ have upper triangular matrices. It follows that $B = (v_1, v_2, \dots, v_n)$ is a basis of $V$ with respect to which both $T$ and $S$ have upper triangular matrices.
Theorem 7.4.3. Let $T$ be a diagonalizable operator on $V$ and let $S \in \mathcal{L}(V)$. Then $ST = TS$ if and only if each eigenspace of $T$ is invariant under $S$.
Proof. Since $T$ is diagonalizable, there is a basis $B$ of $V$ consisting of eigenvectors of $T$. Let $W$ be the eigenspace of $T$ associated with the eigenvalue $\lambda$, and let $w \in W$. If $S(w) \in W$, then $(TS)(w) = T(Sw) = \lambda Sw = S(\lambda w) = S(Tw) = ST(w)$. So if each eigenspace of $T$ is invariant under $S$, $S$ and $T$ commute at each element of $B$, implying that $ST = TS$ on all of $V$. Conversely, suppose that $ST = TS$. Then for any $w \in W$, $T(Sw) = S(Tw) = S(\lambda w) = \lambda S(w)$, implying that $Sw \in W$. So each eigenspace of $T$ must be invariant under $S$.
Note that even if $T$ is diagonalizable and $S$ commutes with $T$, it need not be the case that $S$ must be diagonalizable. For example, if $T = I$, then $V$ is the only eigenspace of $T$, and if $S$ is any non-diagonalizable operator on $V$, then $S$ still commutes with $T$. However, if both $T$ and $S$ are known to be diagonalizable, then we can say a bit more.
Theorem 7.4.4. Let $S$ and $T$ both be diagonalizable operators on the $n$-dimensional vector space $V$ over the field $F$. Then $S$ and $T$ commute if and only if $S$ and $T$ are simultaneously diagonalizable.
Proof. First suppose that $S$ and $T$ are simultaneously diagonalizable. Let $B$ be a basis of $V$ with respect to which both $[T]_B$ and $[S]_B$ are diagonal. Since diagonal matrices commute, $[T]_B$ and $[S]_B$ commute, implying that $T$ and $S$ commute.
Conversely, suppose that $T$ and $S$ commute. We proceed by induction on $n$. If $n = 1$, then any basis is equivalent to the basis consisting of any nonzero vector, and any $1 \times 1$ matrix is diagonal. So assume that $1 < n$ and the result holds over all vector spaces of dimension less than $n$. If $T$ is a scalar times the identity operator, then clearly any basis that diagonalizes $S$ will diagonalize both $S$ and $T$. So suppose $T$ is not a scalar times the identity. Let $\lambda$ be an eigenvalue of $T$ and put $W = \mathrm{null}(T - \lambda I)$. So $W$ is the eigenspace of $T$ associated with $\lambda$, and by hypothesis $1 \le \dim(W) < n$. By Theorem 7.4.3 $W$ is invariant under $S$. It follows that $T|_W$ and $S|_W$ are both diagonalizable by Corollary 7.2.11, and hence by the induction hypothesis the two are simultaneously diagonalizable. Let $B_W$ be a basis for $W$ which consists of eigenvectors of both $T|_W$ and $S|_W$, so they are also eigenvectors of both $T$ and $S$. Repeat this process for each eigenvalue of $T$ and let $B$ be the union of all the bases of the various eigenspaces. Then the matrices $[T]_B$ and $[S]_B$ are both diagonal.
Theorem 7.4.5. Let $T$ be a diagonalizable operator on $V$, an $n$-dimensional vector space over $F$. Let $S \in \mathcal{L}(V)$. Then there is a polynomial $f(x) \in F[x]$ such that $S = f(T)$ if and only if each eigenspace of $T$ is contained in a single eigenspace of $S$.
Proof. First suppose that $S = f(T)$ for some $f(x) \in F[x]$. If $T(v) = \lambda v$, then $S(v) = f(T)(v) = f(\lambda)v$. Hence the entire eigenspace of $T$ associated with $\lambda$ is contained in the eigenspace of $S$ associated with its eigenvalue $f(\lambda)$. This completes the proof in one direction.
For the converse, let $\lambda_1, \dots, \lambda_r$ be the distinct eigenvalues of $T$, and let $W_i = \mathrm{null}(T - \lambda_i I)$, the eigenspace of $T$ associated with $\lambda_i$. Let $\mu_i$ be the eigenvalue of $S$ whose corresponding eigenspace contains $W_i$. Note that the values $\mu_1, \dots, \mu_r$ might not be distinct. Use Lagrange interpolation to construct the polynomial $f(x) \in F[x]$ for which $f(\lambda_i) = \mu_i$, $1 \le i \le r$. Then define an operator $S' \in \mathcal{L}(V)$ as follows. Let $B$ be a basis of $V$ consisting of the union of bases of the eigenspaces $W_i$. For each $v \in B$, say $v \in W_i$, put $S'(v) = f(T)(v) = f(\lambda_i)v = \mu_i v = S(v)$. Then since $S'$ and $S$ agree on a basis of $V$, they must be the same operator. Hence $S = f(T)$. (Here we have used the observation that if $S'(v) = f(T)(v)$ for each $v$ in some basis of $V$, then $S' = f(T)$.)
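The Lagrange interpolation step in the proof of Theorem 7.4.5 can be carried out by hand for a small example. The following Python sketch works over the rationals; the matrices `T` and `S`, the eigenvalue lists, and the helper names are illustrative assumptions, chosen so that the eigenspaces of `T` for 2 and 3 lie in the eigenspaces of `S` for 5 and 7.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lagrange_coefficients(xs, ys):
    """Coefficients c[0] + c[1] x + ... of the polynomial f of degree < len(xs)
    with f(xs[i]) = ys[i]; the xs are assumed to be distinct."""
    n = len(xs)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]   # running product of (x - xs[k]) for k != i
        denom = Fraction(1)     # running product of (xs[i] - xs[k])
        for k in range(n):
            if k == i:
                continue
            shifted = [Fraction(0)] * (len(basis) + 1)
            for d, c in enumerate(basis):
                shifted[d + 1] += c
                shifted[d] -= xs[k] * c
            basis = shifted
            denom *= xs[i] - xs[k]
        for d, c in enumerate(basis):
            coeffs[d] += ys[i] * c / denom
    return coeffs

def poly_at_matrix(coeffs, t):
    """Evaluate the polynomial at the square matrix t by Horner's rule."""
    n = len(t)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, t)
        for r in range(n):
            result[r][r] += c
    return result

T = [[2, 1], [0, 3]]   # eigenvalues 2 and 3, eigenvectors (1, 0) and (1, 1)
S = [[5, 2], [0, 7]]   # same eigenvectors, eigenvalues 5 and 7
f = lagrange_coefficients([2, 3], [5, 7])
print(f)                          # coefficients of f(x) = 1 + 2x
print(poly_at_matrix(f, T) == S)
```

In this instance the interpolating polynomial is $f(x) = 2x + 1$, and $f(T) = 2T + I$ reproduces $S$.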
Corollary 7.4.6. Let $T \in \mathcal{L}(V)$ have $n$ distinct eigenvalues where $n = \dim(V)$.
Then the following are equivalent:
(i) $ST = TS$.
(ii) $S$ and $T$ are simultaneously diagonalizable.
(iii) $S$ is a polynomial in $T$.
Proof. Since $T$ has $n$ distinct eigenvalues, its minimal polynomial has no
repeated factors, so $T$ is diagonalizable. Then Theorem 7.4.3 says that $ST = TS$
iff each eigenspace of $T$ is invariant under $S$. Since each eigenspace of $T$
is 1-dimensional, this means that each eigenvector of $T$ is also an eigenvector
of $S$. Using Theorem 7.4.5 we easily see that the corollary is completely
proved.
We note the following example: If T is any invertible operator, so the constant
term of its minimal polynomial is not zero, it is easy to use the minimal
polynomial to write I as a polynomial in T. However, any polynomial in I
is just some constant times I. So if T is any invertible operator that is not a
scalar times I, then I is a polynomial in T, but T is not a polynomial in I.
7.5 Commuting Families of Operators

Let $V$ be an $n$-dimensional vector space over $F$, and let $\mathcal{F}$ be a family of linear
operators on $V$. We want to know when we can simultaneously triangularize
or diagonalize the operators in $\mathcal{F}$, i.e., find one basis $B$ such that all of the
matrices $[T]_B$ for $T \in \mathcal{F}$ are upper triangular, or they are all diagonal. In
the case of diagonalization, it is necessary that $\mathcal{F}$ be a commuting family of
operators: $UT = TU$ for all $T, U \in \mathcal{F}$. That follows from the fact that all
diagonal matrices commute. Of course, it is also necessary that each operator
in $\mathcal{F}$ be a diagonalizable operator. In order to simultaneously triangularize,
each operator in $\mathcal{F}$ must be triangulable. It is not necessary that $\mathcal{F}$ be
a commuting family; however, that condition is sufficient for simultaneous
triangulation as long as each $T$ in $\mathcal{F}$ can be individually triangulated.
The subspace $W$ of $V$ is invariant under (the family of operators) $\mathcal{F}$ if
$W$ is invariant under each operator in $\mathcal{F}$.
Suppose that $W$ is a subspace that is $T$-invariant for some $T \in \mathcal{L}(V)$.
Recall the definition of the $T$-conductor of $v$ into $W$. It is the set
$\{f(x) \in F[x] : f(T)(v) \in W\}$. We saw that since $F[x]$ is a PID there must be a monic
polynomial $g(x)$ of minimal degree in this ideal such that $f(x)$ belongs to this
ideal if and only if $f(x) = g(x)h(x)$ for some $h(x) \in F[x]$. This polynomial
$g(x)$ was also called the $T$-conductor of $v$ into $W$. Since $\vec{0} \in W$, it is clear that
$g(x)$ divides the minimal polynomial for $T$.
Lemma 7.5.1. Let $\mathcal{F}$ be a commuting family of triangulable linear operators
on $V$. Let $W$ be a proper subspace of $V$ which is invariant under $\mathcal{F}$. There
exists a vector $v \in V$ such that:
(a) $v$ is not in $W$;
(b) for each $T$ in $\mathcal{F}$, the vector $T(v)$ is in the subspace spanned by $v$ and $W$.
Proof. Since the space of all linear operators on $V$ is a vector space with
dimension $n^2$, it is easy to see that without loss of generality we may assume
that $\mathcal{F}$ has only finitely many operators. For let $\{T_1, \dots, T_r\}$ be a maximal
linearly independent subset of $\mathcal{F}$, i.e., a basis for the subspace of $\mathcal{L}(V)$
spanned by $\mathcal{F}$. If $v$ is a vector such that (b) holds for each $T_i$, then (b) holds
for each operator which is a linear combination of $T_1, \dots, T_r$.
First we establish the lemma for one operator $T$. To do this we need to
show that there is some vector $v \in (V \setminus W)$ for which the $T$-conductor of $v$
into $W$ is a linear polynomial. Since $T$ is triangulable, its minimal polynomial
$p(x)$, as well as its characteristic polynomial, factors over $F$ into a product of
linear factors. Say $p(x) = (x - c_1)^{e_1}(x - c_2)^{e_2} \cdots (x - c_r)^{e_r}$. Let $w$ be any
vector of $V$ that is not in $W$, and let $g(x)$ be the $T$-conductor of $w$ into $W$.
Then $g$ divides the minimal polynomial for $T$. Since $w$ is not in $W$, $g$ is not
constant. So $g(x) = (x - c_1)^{f_1}(x - c_2)^{f_2} \cdots (x - c_r)^{f_r}$ where at least one of
the integers $f_i$ is positive. Choose $j$ so that $f_j > 0$. Then $(x - c_j)$ divides $g(x)$:
$$g = (x - c_j)h(x).$$
If $h(x) = 1$ (it must be monic in any case), then $w$ is the vector we seek. By
definition of $g$, if $h(x)$ has degree at least 1, then the vector $v = h(T)(w)$
cannot be in $W$. But
$$(T - c_j I)(v) = (T - c_j I)h(T)(w) = g(T)(w) \in W. \qquad (7.6)$$
Now return to thinking about the family $\mathcal{F}$. By the previous paragraph
we can find a vector $v_1$ not in $W$ and a scalar $c_1$ such that $(T_1 - c_1 I)(v_1) \in W$.
Let $V_1$ be the set of all vectors $v \in V$ such that $(T_1 - c_1 I)(v) \in W$. Then $V_1$
is a subspace of $V$ that properly contains $W$. Since $T \in \mathcal{F}$ commutes with
$T_1$ we have
$$(T_1 - c_1 I)(T(v)) = T(T_1 - c_1 I)(v).$$
If $v \in V_1$, then $(T_1 - c_1 I)v \in W$. Since $W$ is invariant under each $T$ in $\mathcal{F}$, we
have $T(T_1 - c_1 I)(v) \in W$, i.e., $Tv \in V_1$, for all $v \in V_1$ and all $T \in \mathcal{F}$.
Note again that $W$ is a proper subspace of $V_1$. Put $U_2 = T_2|_{V_1}$, the operator
obtained by restricting $T_2$ to the subspace $V_1$. The minimal polynomial
for $U_2$ divides the minimal polynomial for $T_2$. By the second paragraph
of this proof applied to $U_2$ and the invariant subspace $W$, there is a vector
$v_2 \in V_1$ but not in $W$, and a scalar $c_2$ such that $(T_2 - c_2 I)(v_2) \in W$. Note
that
(a) $v_2 \notin W$;
(b) $(T_1 - c_1 I)(v_2) \in W$;
(c) $(T_2 - c_2 I)(v_2) \in W$.
Let $V_2 = \{v \in V_1 : (T_2 - c_2 I)(v) \in W\}$. Then $V_2$ is invariant under $\mathcal{F}$.
Apply the same ideas to $U_3 = T_3|_{V_2}$. Continuing in this way we eventually
find a vector $v = v_r$ not in $W$ such that $(T_j - c_j I)(v) \in W$, for $1 \le j \le r$.
Theorem 7.5.2. Let $V$ be a finite-dimensional vector space over the field
$F$. Let $\mathcal{F}$ be a commuting family of triangulable linear operators on $V$ (i.e.,
the minimal polynomial of each $T \in \mathcal{F}$ splits into linear factors). There
exists an ordered basis for $V$ such that every operator in $\mathcal{F}$ is represented by
a triangular matrix with respect to that basis.
Proof. Start by applying Lemma 7.5.1 to the $\mathcal{F}$-invariant subspace $W = \{0\}$
to obtain a nonzero vector $v_1$ for which $T(v_1) \in W_1 = \langle v_1 \rangle$ for all $T \in \mathcal{F}$.
Then apply the lemma to $W_1$ to find a vector $v_2$ not in $W_1$ but for which
$T(v_2) \in W_2 = \langle v_1, v_2 \rangle$. Proceed in this way until a basis $B = (v_1, v_2, \dots)$ of
$V$ has been obtained. Clearly $[T]_B$ is upper triangular for every $T \in \mathcal{F}$.
Corollary 7.5.3. Let $\mathcal{F}$ be a commuting family of $n \times n$ matrices over an
algebraically closed field $F$. There exists a nonsingular $n \times n$ matrix $P$ with
entries in $F$ such that $P^{-1}AP$ is upper-triangular, for every matrix $A$ in $\mathcal{F}$.
Theorem 7.5.4. Let $\mathcal{F}$ be a commuting family of diagonalizable linear
operators on the finite-dimensional vector space $V$. There exists an ordered
basis for $V$ such that every operator in $\mathcal{F}$ is represented in that basis by a
diagonal matrix.
Proof. If $\dim(V) = 1$ or if each $T \in \mathcal{F}$ is a scalar times the identity, then
there is nothing to prove. So suppose $1 < n = \dim(V)$ and that the theorem
is true for vector spaces of dimension less than $n$. Also assume that for
some $T \in \mathcal{F}$, $T$ is not a scalar multiple of the identity. Since the operators
in $\mathcal{F}$ are all diagonalizable, we know that each minimal polynomial splits
into distinct linear factors. Let $c_1, \dots, c_k$ be the distinct eigenvalues of $T$,
and for each index $i$ put $W_i = \mathrm{null}(T - c_i I)$. Fix $i$. Then $W_i$ is invariant
under every operator that commutes with $T$. Let $\mathcal{F}_i$ be the family of linear
operators on $W_i$ obtained by restricting the operators in $\mathcal{F}$ to the invariant
subspace $W_i$. Each operator in $\mathcal{F}_i$ is diagonalizable, because its minimal
polynomial divides the minimal polynomial for the corresponding operator
in $\mathcal{F}$. Since $T$ is not a scalar multiple of the identity, $\dim(W_i) < \dim(V)$. So
by the induction hypothesis the operators in $\mathcal{F}_i$ can be
simultaneously diagonalized. In other words, $W_i$ has a basis $B_i$ which consists
of vectors which are simultaneously characteristic vectors for every operator
in $\mathcal{F}_i$. Then $B = (B_1, \dots, B_k)$ is the basis of $V$ that we seek.
7.6 The Fundamental Theorem of Algebra

This section is based on the following article: Harm Derksen, The Fundamental
Theorem of Algebra and Linear Algebra, The American Mathematical
Monthly, vol. 110, Number 7, August-September 2003, 620-623. We start
by quoting from the third paragraph of the article by Derksen.
"Since the fundamental theorem of algebra is needed in linear algebra
courses, it would be desirable to have a proof of it in terms of linear algebra.
In this paper we prove that every square matrix with complex coefficients has
an eigenvector. This statement is equivalent to the fundamental theorem of
algebra. In fact, we will prove the slightly stronger result that any number of
commuting square matrices with complex entries have a common eigenvector.
The proof lies entirely within the framework of linear algebra, and unlike
most other algebraic proofs of the fundamental theorem of algebra, it does
not require Galois theory or splitting fields."
Preliminaries
Several results we have obtained so far have made the assumption that
the field $F$ was algebraically closed. Moreover, we often gave the complex
numbers $\mathbb{C}$ as the prototypical example. So in this section we have to be careful
not to quote any results that might have hidden in them the assumption
that $\mathbb{C}$ is algebraically closed.
For the proof we use only the following elementary properties of real and
complex numbers that were established much earlier.
Lemma. Every polynomial of odd degree with real coefficients has a (real)
zero.
Lemma. Every complex number has a square root.
Theorem 7.3.2. If $A$ is real, $n \times n$, with $n$ odd, then $A$ has an eigenvector
(belonging to a real eigenvalue).
An Induction Argument
Keep in mind that we cannot use results that might have hidden in them
the assumption that $\mathbb{C}$ is algebraically closed.
For a field $K$ and for positive integers $d$ and $r$, consider the following
statement:
$P(K, d, r)$: Any $r$ commuting linear transformations $A_1, A_2, \dots, A_r$ of a
$K$-vector space $V$ of dimension $n$ such that $d$ does not divide $n$ have a
common eigenvector.
Lemma 7.6.1. If $P(K, d, 1)$ holds, then $P(K, d, r)$ holds for all $r \ge 1$.
It is important to realize that the smallest $d$ for which the hypothesis of
this lemma holds in a given situation might be much larger than $d = 1$.
Proof. The proof is by induction on $r$. The case of $P(K, d, 1)$ is true by
hypothesis. For $r \ge 2$, suppose that $P(K, d, r-1)$ is true and let $A_1, \dots, A_r$
be commuting linear transformations of $V$ of dimension $n$ such that $d$ does not
divide $n$. Because $P(K, d, 1)$ holds, $A_r$ has an eigenvalue $\lambda$ in $K$. Let $W$ be
the kernel and $Z$ the image of $A_r - \lambda I$. It is now easy to show that each of
$W$ and $Z$ is left invariant by each of $A_1, \dots, A_{r-1}$.
First suppose that $W \ne V$. Because $\dim W + \dim Z = \dim V$, either $d$
does not divide $\dim W$ or $d$ does not divide $\dim Z$. Since $\dim W < n$ and
$\dim Z < n$, we may assume by induction on $n$ that $A_1, \dots, A_r$ already have
a common eigenvector in $W$ or in $Z$.
In the remaining case, $W = V$. Because $P(K, d, r-1)$ holds, we may
assume that $A_1, \dots, A_{r-1}$ have a common eigenvector in $V$, say $v$. Since
$A_r v = \lambda v$ (because $W = V$), $v$ is a common eigenvector of $A_1, \dots, A_r$.
Lemma 7.6.2. $P(\mathbb{R}, 2, r)$ holds for all $r \ge 1$. In other words, if $A_1, \dots, A_r$
are commuting linear transformations on an odd dimensional $\mathbb{R}$-vector space,
then they have a common eigenvector.
Proof. By Lemma 7.6.1 it is enough to show that $P(\mathbb{R}, 2, 1)$ is true. If $A$ is
a linear transformation of an odd dimensional $\mathbb{R}$-vector space, $\det(xI - A)$
is a polynomial of odd degree, which has a zero $\lambda$ by Lemma 1.1.2. Then $\lambda$
is a real eigenvalue of $A$.
We now lift the result of Lemma 7.6.2 to the analogous result over the
field $\mathbb{C}$.
Definition. If $A$ is any $m \times n$ matrix over $\mathbb{C}$ we let $A^*$ denote the transpose
of the complex conjugate of the matrix $A$, i.e., $(A^*)_{ij} = \overline{A_{ji}}$. Then $A$ is said
to be Hermitian if $m = n$ and $A^* = A$.
Lemma 7.6.3. $P(\mathbb{C}, 2, 1)$ holds, i.e., every linear transformation of a
$\mathbb{C}$-vector space of odd dimension has an eigenvector.
Proof. Suppose that $T_A : \mathbb{C}^n \to \mathbb{C}^n : v \mapsto Av$ is a $\mathbb{C}$-linear map with $n$ odd.
Let $V$ be the $\mathbb{R}$-vector space $\mathrm{Herm}_n(\mathbb{C})$, the set of $n \times n$ Hermitian matrices.
Define two linear operators $L_1$ and $L_2$ on $V$ by
$$L_1(B) = \frac{AB + BA^*}{2},$$
and
$$L_2(B) = \frac{AB - BA^*}{2i}.$$
It is now easy to show that $\dim V = n^2$, which is odd. It is also routine
to check that $L_1$ and $L_2$ commute. Hence by Lemma 7.6.2, $P(\mathbb{R}, 2, 2)$ holds
and implies that $L_1$ and $L_2$ have a common eigenvector $B$, say $L_1(B) = \lambda B$
and $L_2(B) = \mu B$, with $\lambda$ and $\mu$ both real. But then
$$(L_1 + iL_2)(B) = AB = (\lambda + i\mu)B,$$
and any nonzero column vector of $B$ gives an eigenvector for the matrix $A$.
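The claims used in this proof (that $L_1$ and $L_2$ map Hermitian matrices to Hermitian matrices, that they commute, and that $L_1 + iL_2$ is left multiplication by $A$) can be checked numerically on one example. The sketch below is an illustration only: the matrices `A` and `B` are arbitrary sample data and the helper names are not from the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def conj_transpose(A):
    """A* = conjugate transpose of A."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def L1(A, B):
    """L1(B) = (AB + BA*) / 2."""
    AB, BAs = mat_mul(A, B), mat_mul(B, conj_transpose(A))
    n = len(A)
    return [[(AB[i][j] + BAs[i][j]) / 2 for j in range(n)] for i in range(n)]

def L2(A, B):
    """L2(B) = (AB - BA*) / (2i)."""
    AB, BAs = mat_mul(A, B), mat_mul(B, conj_transpose(A))
    n = len(A)
    return [[(AB[i][j] - BAs[i][j]) / 2j for j in range(n)] for i in range(n)]

def close(M, N, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    n = len(M)
    return all(abs(M[i][j] - N[i][j]) < tol for i in range(n) for j in range(n))

def is_hermitian(M):
    return close(M, conj_transpose(M))

# Sample data: A is an arbitrary 3x3 complex matrix, B is Hermitian.
A = [[1 + 2j, 3, 0], [0, 1j, 2], [1, 1, 1 - 1j]]
B = [[2, 1 - 1j, 0], [1 + 1j, 3, 2j], [0, -2j, 1]]

l1, l2 = L1(A, B), L2(A, B)
combo = [[l1[i][j] + 1j * l2[i][j] for j in range(3)] for i in range(3)]

print(is_hermitian(l1), is_hermitian(l2))           # both images are Hermitian
print(close(combo, mat_mul(A, B)))                  # (L1 + i L2)(B) = AB
print(close(L1(A, L2(A, B)), L2(A, L1(A, B))))      # L1 and L2 commute on B
```

A single example proves nothing, of course; it only confirms that the formulas were transcribed with the right signs.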
Lemma 7.6.4. $P(\mathbb{C}, 2^k, r)$ holds for all $k \ge 1$ and $r \ge 1$.
Proof. The proof is by induction on $k$. The case $k = 1$ follows from
Lemmas 7.6.3 and 7.6.1. Assume that $P(\mathbb{C}, 2^l, r)$ holds for $l < k$. We will establish
that $P(\mathbb{C}, 2^k, r)$ holds. In view of Lemma 7.6.1 it suffices to prove $P(\mathbb{C}, 2^k, 1)$.
Suppose that $A : \mathbb{C}^n \to \mathbb{C}^n$ is linear, where $n$ is divisible by $2^{k-1}$ but not by
$2^k$. (If $2^{k-1}$ does not divide $n$, the induction hypothesis already applies.)
Let $V$ be the $\mathbb{C}$-vector space $\mathrm{Skew}_n(\mathbb{C}) = \{B \in M_n(\mathbb{C}) : B^T = -B\}$, the
set of $n \times n$ skew-symmetric matrices with complex entries. Note that
$\dim V = n(n-1)/2$, which ensures that $2^{k-1}$ does not divide $\dim V$. Define two
commuting linear transformations $L_1$ and $L_2$ of $V$ by
$$L_1(B) = AB + BA^T$$
and
$$L_2(B) = ABA^T.$$
It is an easy exercise to show that $L_1$ and $L_2$ are indeed both in $\mathcal{L}(V)$
and they commute.
By $P(\mathbb{C}, 2^{k-1}, 2)$, $L_1$ and $L_2$ have a common eigenvector $B$, say
$$L_1(B) = \lambda B = AB + BA^T, \text{ i.e., } BA^T = (\lambda I - A)B$$
and
$$L_2(B) = \mu B = ABA^T = A(\lambda I - A)B = (\lambda A - A^2)B,$$
which implies
$$(A^2 - \lambda A + \mu I)B = 0,$$
where $\lambda$ and $\mu$ are now complex numbers.
Let $v$ be a nonzero column of $B$. Then
$$(A^2 - \lambda A + \mu I)v = 0.$$
Since each element of $\mathbb{C}$ has a square root, there is a $\delta$ in $\mathbb{C}$ such that
$\delta^2 = \lambda^2 - 4\mu$. We can write $x^2 - \lambda x + \mu = (x - \alpha)(x - \beta)$, where $\alpha = (\lambda + \delta)/2$
and $\beta = (\lambda - \delta)/2$. We then have
$$(A - \alpha I)w = 0,$$
where $w = (A - \beta I)v$. If $w = 0$, then $v$ is an eigenvector of $A$ with eigenvalue
$\beta$; if $w \ne 0$, then $w$ is an eigenvector of $A$ with eigenvalue $\alpha$.
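The last step of the proof is effectively an algorithm: given $v \ne 0$ with $(A^2 - \lambda A + \mu I)v = 0$, it produces an eigenvector of $A$. The following is a minimal sketch of that step, under assumptions not in the text: the function name `eigenvector_from_quadratic` is hypothetical, and the sample input uses the rotation matrix $A$ with $A^2 + I = 0$, so that $\lambda = 0$, $\mu = 1$ work for every $v$.

```python
import cmath

def mat_vec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def eigenvector_from_quadratic(A, v, lam, mu, tol=1e-12):
    """Given v != 0 with (A^2 - lam*A + mu*I) v = 0, return (eigenvalue, eigenvector)."""
    delta = cmath.sqrt(lam * lam - 4 * mu)      # a square root of lam^2 - 4*mu
    alpha = (lam + delta) / 2
    beta = (lam - delta) / 2
    Av = mat_vec(A, v)
    w = [Av[i] - beta * v[i] for i in range(len(v))]   # w = (A - beta*I) v
    if all(abs(c) < tol for c in w):
        return beta, v        # w = 0: v itself is an eigenvector for beta
    return alpha, w           # w != 0: (A - alpha*I) w = 0

# Sample data: rotation by 90 degrees, which satisfies A^2 + I = 0.
A = [[0, -1], [1, 0]]
v = [1, 0]
eigenvalue, eigenvector = eigenvector_from_quadratic(A, v, lam=0, mu=1)

Aw = mat_vec(A, eigenvector)
print(eigenvalue, eigenvector)
print(all(abs(Aw[i] - eigenvalue * eigenvector[i]) < 1e-12 for i in range(2)))
```

Note that the real matrix in this example has no real eigenvector; the square root $\delta$ is exactly where complex numbers enter the argument.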
We have now reached the point where we can prove the main result for
commuting operators on a complex space.
Theorem 7.6.5. If $A_1, A_2, \dots, A_r$ are commuting linear transformations of
a finite dimensional nonzero $\mathbb{C}$-vector space $V$, then they have a common
eigenvector.
Proof. Let $n$ be the dimension of $V$. There exists a positive integer $k$ such
that $2^k$ does not divide $n$. Since $P(\mathbb{C}, 2^k, r)$ holds by Lemma 7.6.4, the
theorem follows.
Theorem 7.6.6. (The Fundamental Theorem of Algebra) If $P(x)$ is a
nonconstant polynomial with complex coefficients, then there exists a $\lambda$ in $\mathbb{C}$ such
that $P(\lambda) = 0$.
Proof. It suffices to prove this for monic polynomials. So let
$$P(x) = x^n + a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_n.$$
Then $P(x) = \det(xI - A)$, where $A$ is the companion matrix of $P$:
$$A = \begin{pmatrix}
0 & 0 & \cdots & 0 & -a_n \\
1 & 0 & \cdots & 0 & -a_{n-1} \\
0 & 1 & \cdots & 0 & -a_{n-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -a_1
\end{pmatrix}.$$
Theorem 7.6.5 implies that $A$ has a complex eigenvalue $\lambda$ in $\mathbb{C}$, from which
it follows that $P(\lambda) = 0$.
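To see the link between roots of $P$ and eigenvalues of its companion matrix on a concrete case, here is a small sketch. It builds the companion matrix in the form displayed above and uses a standard fact that is not stated in the text: if $P(\lambda) = 0$, then the row vector $u = (1, \lambda, \dots, \lambda^{n-1})$ satisfies $uA = \lambda u$. The function name `companion` and the sample polynomial $(x-1)(x-2)(x-3)$ are illustrative choices.

```python
def companion(a):
    """Companion matrix of P(x) = x^n + a[0] x^(n-1) + ... + a[n-1].

    Ones on the subdiagonal; last column is (-a_n, -a_{n-1}, ..., -a_1).
    """
    n = len(a)
    A = [[0] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = 1
    for i in range(n):
        A[i][n - 1] = -a[n - 1 - i]
    return A

def row_times_matrix(u, A):
    """Row vector times matrix."""
    n = len(A)
    return [sum(u[i] * A[i][j] for i in range(n)) for j in range(n)]

# Sample polynomial: P(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3).
a = [-6, 11, -6]
A = companion(a)

checks = []
for lam in (1, 2, 3):
    u = [lam ** k for k in range(len(a))]          # (1, lam, lam^2)
    checks.append(row_times_matrix(u, A) == [lam * c for c in u])

print(A)
print(checks)
```

Since $A$ and $A^T$ have the same eigenvalues, a left eigenvector for $\lambda$ confirms that $\lambda$ is an eigenvalue of $A$.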
7.7 Exercises
1. Let $V$ be finite dimensional over $F$ and let $P \in \mathcal{L}(V)$ be idempotent.
Determine the eigenvalues of $P$ and show that $P$ is diagonalizable.
2. If $T, S \in \mathcal{L}(V)$ and $TS = ST$, show that
(i) $\mathrm{null}(T)$ is $S$-invariant; and
(ii) If $f(x) \in F[x]$, then $\mathrm{null}(f(T))$ is $S$-invariant.
3. Suppose $n$ is a positive integer and $T \in \mathcal{L}(F^n)$ is defined by
$$T(z_1, z_2, \dots, z_n) = (z_1 + \cdots + z_n,\ z_1 + \cdots + z_n,\ \dots,\ z_1 + \cdots + z_n).$$
Determine all eigenvalues and eigenvectors of $T$.
4. Suppose $T \in \mathcal{L}(V)$ and $\dim(\mathrm{Im}(T)) = k$. Prove that $T$ has at most
$k + 1$ distinct eigenvalues.
5. Suppose that $S, T \in \mathcal{L}(V)$, $\lambda \in F$ and $1 \le k \in \mathbb{Z}$. Show that
$(TS - \lambda I)^k T = T(ST - \lambda I)^k$.
6. Suppose that $S, T \in \mathcal{L}(V)$. Prove that $ST$ and $TS$ have the same
eigenvalues but not necessarily the same minimal polynomial.
7. Suppose that $S, T \in \mathcal{L}(V)$ and at least one of $S$, $T$ is invertible. Show
that $ST$ and $TS$ have the same minimal and characteristic polynomials.
8. Suppose that $F$ is algebraically closed, $p(z) \in F[z]$ and $a \in F$. Prove
that $a$ is an eigenvalue of $p(T)$ if and only if $a = p(\lambda)$ for some eigenvalue
$\lambda$ of $T$. (Hint: Suppose that $a$ is an eigenvalue of $p(T)$. Factor
$p(z) - a = c(z - \lambda_1) \cdots (z - \lambda_m)$. Use the fact that $p(T) - aI$ is not injective. Don't
forget to consider what happens if $c = 0$.)
9. Suppose that $S, T \in \mathcal{L}(V)$ and that $T$ is diagonalizable. Suppose that
each eigenvector of $T$ is an eigenvector of $S$. Show that $ST = TS$.
10. Let $T : V \to V$ be a linear operator on the vector space $V$ over the field
$F$. Let $v \in V$ and let $m$ be a positive integer for which $v \ne 0$, $T(v) \ne 0$,
..., $T^{m-1}(v) \ne 0$, but $T^m(v) = 0$. Show that $\{v, T(v), \dots, T^{m-1}(v)\}$ is
a linearly independent set.
11. Let $W$ be a subspace of the vector space $V$ over any field $F$.
(a) Prove the statements about the quotient space $V/W$ made in the
paragraph just before Theorem 7.4.2.
(b) First Isomorphism Theorem: Each linear transformation $T : V \to W$
induces a linear isomorphism $\overline{T} : V/\mathrm{null}(T) \to \mathrm{Im}(T)$.
Chapter 8
Inner Product Spaces
8.1 Inner Products
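Throughout this chapter $F$ will denote a subfield of the complex numbers $\mathbb{C}$,
and $V$ will denote a vector space over $F$.
Definition. An inner product on $V$ is a scalar-valued function
$\langle\ ,\ \rangle : V \times V \to F$ that satisfies the following properties:
(i) $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$ for all $u, v, w \in V$.
(ii) $\langle cu, v \rangle = c\langle u, v \rangle$ for all $c \in F$, $u, v \in V$.
(iii) $\langle v, u \rangle = \overline{\langle u, v \rangle}$ for all $u, v \in V$, where the overline denotes complex
conjugate.
(iv) $\langle u, u \rangle > 0$ if $u \ne \vec{0}$.
It is easy to check that the above properties force
$$\langle u, cv + w \rangle = \overline{c}\langle u, v \rangle + \langle u, w \rangle \quad \forall u, v, w \in V,\ c \in F.$$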
Example 8.1.1. On $F^n$ there is a standard inner product defined as follows:
for $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, put
$$\langle x, y \rangle = \sum_{i=1}^{n} x_i \overline{y_i}.$$
It is easy enough to show that the above definition really does give an
inner product on $F^n$. In fact, it is a special case of the following example.
Example 8.1.2. Let $A$ be an invertible $n \times n$ matrix over $F$. Define $\langle\ ,\ \rangle$
on $F^n$ (whose elements are written as row vectors for the purpose of this
example) as follows. For $u, v \in F^n$ put
$$\langle u, v \rangle = uAA^*v^*, \text{ where } B^* \text{ denotes the complex conjugate transpose of } B.$$
It is a fairly straightforward exercise to show that this definition really gives
an inner product. If $A = I$ the standard inner product of the previous example
is obtained.
Example 8.1.3. Let $C(0, 1)$ denote the vector space of all continuous,
real-valued functions on the interval $[0, 1]$. For $f, g \in C(0, 1)$ define
$$\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt.$$
Again it is a routine exercise to show that this really gives an inner product.
Definition. An inner product space is a vector space over $F$ (a subfield
of $\mathbb{C}$) together with a specified inner product on that space.
Let $V$ be an inner product space with inner product $\langle\ ,\ \rangle$. The length of
a vector $v \in V$ is defined to be $\|v\| = \sqrt{\langle v, v \rangle}$.
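Theorem 8.1.4. If $V$ is an inner product space, then for any vectors $u, v \in V$
and any scalar $c \in F$,
(i) $\|cu\| = |c|\,\|u\|$;
(ii) $\|u\| > 0$ for $u \ne \vec{0}$;
(iii) $|\langle u, v \rangle| \le \|u\|\,\|v\|$;
(iv) $\|u + v\| \le \|u\| + \|v\|$.
Proof. Statements (i) and (ii) follow almost immediately from the various
definitions involved. The inequality in (iii) is clearly valid when $u = \vec{0}$. If
$u \ne \vec{0}$, put
$$w = v - \frac{\langle v, u \rangle}{\|u\|^2}u.$$
It is easily checked that $\langle w, u \rangle = 0$ and
$$0 \le \|w\|^2 = \left\langle v - \frac{\langle v, u \rangle}{\|u\|^2}u,\ v - \frac{\langle v, u \rangle}{\|u\|^2}u \right\rangle
= \langle v, v \rangle - \frac{\langle v, u \rangle\langle u, v \rangle}{\|u\|^2}
= \|v\|^2 - \frac{|\langle u, v \rangle|^2}{\|u\|^2}.$$
Hence $|\langle u, v \rangle|^2 \le \|u\|^2\|v\|^2$. It now follows that
$$\mathrm{Re}\langle u, v \rangle \le |\langle u, v \rangle| \le \|u\|\,\|v\|,$$
and
$$\|u + v\|^2 = \|u\|^2 + \langle u, v \rangle + \langle v, u \rangle + \|v\|^2
= \|u\|^2 + 2\,\mathrm{Re}\langle u, v \rangle + \|v\|^2
\le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2
= (\|u\| + \|v\|)^2.$$
Thus $\|u + v\| \le \|u\| + \|v\|$.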
The inequality in (iii) is called the Cauchy-Schwarz inequality. It has
a very wide variety of applications. The proof shows that if $u$ is nonzero,
then $|\langle u, v \rangle| < \|u\|\,\|v\|$ unless
$$v = \frac{\langle v, u \rangle}{\|u\|^2}u,$$
which occurs if and only if $(u, v)$ is a linearly dependent list. You should try
out the Cauchy-Schwarz inequality on the examples of inner products given
above. The inequality in (iv) is called the triangle inequality.
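As one way of trying the inequality out, the sketch below checks the Cauchy-Schwarz and triangle inequalities for the standard inner product on $\mathbb{C}^3$, and also checks that the vector $w$ from the proof is orthogonal to $u$. The vectors `u` and `v` are arbitrary sample data and the function names are illustrative.

```python
import math

def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

# Arbitrary sample vectors in C^3.
u = [1 + 1j, 2, -1j]
v = [3, 1 - 2j, 1]

lhs = abs(inner(u, v))            # |<u, v>|
rhs = norm(u) * norm(v)           # ||u|| ||v||

# The vector w = v - (<v, u> / ||u||^2) u from the proof.
c = inner(v, u) / inner(u, u).real
w = [vi - c * ui for vi, ui in zip(v, u)]
residual = abs(inner(w, u))       # should be 0 up to rounding

triangle_ok = norm([a + b for a, b in zip(u, v)]) <= norm(u) + norm(v)

print(lhs <= rhs, residual < 1e-12, triangle_ok)
```

For these vectors $\langle u, v \rangle = 5 + 6i$, so $|\langle u, v \rangle|^2 = 61$, while $\|u\|^2\|v\|^2 = 7 \cdot 15 = 105$.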
Definitions. Let $u$ and $v$ be vectors in an inner product space $V$. Then
$u$ is orthogonal to $v$ if and only if $\langle u, v \rangle = 0$ if and only if $\langle v, u \rangle = 0$, in which
case we say $u$ and $v$ are orthogonal and write $u \perp v$. If $S$ is a set of vectors
in $V$, $S$ is called an orthogonal set provided each pair of distinct vectors in $S$
is orthogonal. An orthonormal set is an orthogonal set $S$ with the additional
property that $\|u\| = 1$ for every $u \in S$. Analogous definitions are made for
lists of vectors.
Note: The standard basis of $F^n$ is an orthonormal list with respect to
the standard inner product.
Also, the zero vector is the only vector orthogonal to every vector. (Prove
this!)
Theorem 8.1.5. An orthogonal set of nonzero vectors is linearly independent.
Proof. Let $S$ be a finite or infinite orthogonal set of nonzero vectors in a
given inner product space. Suppose $v_1, \dots, v_m$ are distinct vectors in $S$ and
that
$$w = c_1 v_1 + c_2 v_2 + \cdots + c_m v_m.$$
Then
$$\langle w, v_k \rangle = \Big\langle \sum_j c_j v_j, v_k \Big\rangle = \sum_j c_j \langle v_j, v_k \rangle = c_k \langle v_k, v_k \rangle.$$
Since $\langle v_k, v_k \rangle \ne 0$, it follows that
$$c_k = \frac{\langle w, v_k \rangle}{\|v_k\|^2}, \quad 1 \le k \le m.$$
When $w = \vec{0}$, each $c_k = 0$, so $S$ is an independent set.
Corollary 8.1.6. If a vector $w$ is a linear combination of an orthogonal list
$(v_1, \dots, v_m)$ of nonzero vectors, then $w$ is the particular linear combination
$$w = \sum_{k=1}^{m} \frac{\langle w, v_k \rangle}{\|v_k\|^2} v_k. \qquad (8.1)$$
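Formula (8.1) says that coordinates with respect to an orthogonal list can be read off with inner products alone, with no linear system to solve. Here is a small exact check over the rationals; the orthogonal list `vs` and the vector `w` are hypothetical sample data in $\mathbb{R}^3$ with the standard inner product.

```python
from fractions import Fraction

def inner(x, y):
    """Standard inner product on R^n (no conjugation needed for real entries)."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))

# An orthogonal (not orthonormal) list of nonzero vectors in R^3.
vs = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]
w = (3, 5, 4)

# Coefficients c_k = <w, v_k> / ||v_k||^2 from (8.1).
coeffs = [inner(w, v) / inner(v, v) for v in vs]

# Rebuild w from the coefficients to confirm the expansion.
rebuilt = tuple(sum(c * v[i] for c, v in zip(coeffs, vs)) for i in range(3))

print(coeffs, rebuilt == w)
```

The coefficients come out as $4, -1, 2$, and indeed $4(1,1,0) - (1,-1,0) + 2(0,0,2) = (3,5,4)$.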
Theorem 8.1.7. (Pythagorean Theorem) If $u \perp v$, then
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2. \qquad (8.2)$$
Proof. Suppose $u \perp v$. Then
$$\|u + v\|^2 = \langle u + v, u + v \rangle = \|u\|^2 + \|v\|^2 + \langle u, v \rangle + \langle v, u \rangle = \|u\|^2 + \|v\|^2.$$
Theorem 8.1.8. (Parallelogram Equality) If $u, v$ are vectors in the inner
product space $V$, then
$$\|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2).$$
Proof. The details are routine and are left as an exercise (see Exercise 1).
Starting with an inner product $\langle\ ,\ \rangle$ on a vector space $U$ over $F$ (where
$F$ is some subfield of $\mathbb{C}$), we defined a norm on $U$ by $\|v\| = \sqrt{\langle v, v \rangle}$, for all
$v \in U$. This norm function satisfies a variety of properties as we have seen
in this section. Sometimes we have a norm function given and would like
to know if it came from an inner product. The next theorem provides an
answer to this question, but first we give an official definition of a norm.
Definition. A norm on a vector space $U$ over the field $F$ is a function
$\|\cdot\| : U \to [0, \infty) \subseteq \mathbb{R}$ such that
(i) $\|u\| = 0$ iff $u = \vec{0}$;
(ii) $\|au\| = |a|\,\|u\|$ for all $a \in F$, $u \in U$;
(iii) $\|u + v\| \le \|u\| + \|v\|$.
Theorem 8.1.9. Let $\|\cdot\|$ be a norm on $U$. Then there is an inner product
$\langle\ ,\ \rangle$ on $U$ such that $\|u\| = \langle u, u \rangle^{1/2}$ for all $u \in U$ if and only if $\|\cdot\|$ satisfies
the parallelogram equality.
Proof. We have already seen that if the norm is derived from an inner product,
then it satisfies the parallelogram equality. For the converse, now suppose
that $\|\cdot\|$ is a norm on $U$ satisfying the parallelogram equality. We will
show that there must have been an inner product from which the norm was
derived in the usual fashion. We first consider the case $F = \mathbb{R}$. It is then
clear (from the real polarization identity; see Exercise 13) that $\langle\ ,\ \rangle$ must
be defined in the following way:
$$\langle u, v \rangle = \frac{\|u + v\|^2 - \|u - v\|^2}{4}. \qquad (8.3)$$
It is then clear that $\langle u, u \rangle = \frac{\|2u\|^2 - \|\vec{0}\|^2}{4} = \|u\|^2$, so $\|u\| = \langle u, u \rangle^{1/2}$ for all
$u \in U$. But it is not at all clear that $\langle\ ,\ \rangle$ is an inner product. However, since
$\langle u, u \rangle = \|u\|^2$, by the definition of norm we see that
(a) $\langle u, u \rangle \ge 0$, with equality if and only if $u = \vec{0}$.
Next we show that $\langle\ ,\ \rangle$ is additive in the first slot. We use the parallelogram
equality in the form $\|u\|^2 + \|v\|^2 = \frac{\|u+v\|^2}{2} + \frac{\|u-v\|^2}{2}$.
Let $u, v, w \in U$. Then from the definition of $\langle\ ,\ \rangle$ we have:
$$4(\langle u + v, w \rangle - \langle u, w \rangle - \langle v, w \rangle) \quad \text{(which should be 0)}$$
$$= \|u+v+w\|^2 - \|u+v-w\|^2 - \|u+w\|^2 + \|u-w\|^2 - \|v+w\|^2 + \|v-w\|^2$$
$$= \|u+v+w\|^2 + (\|u-w\|^2 + \|v-w\|^2) - \|u+v-w\|^2 - (\|u+w\|^2 + \|v+w\|^2)$$
$$= \|u+v+w\|^2 + \frac{\|u+v-2w\|^2}{2} + \frac{\|u-v\|^2}{2} - \|u+v-w\|^2 - \frac{\|u+v+2w\|^2}{2} - \frac{\|u-v\|^2}{2}$$
$$= (\|u+v+w\|^2 + \|w\|^2) + \frac{\|u+v-2w\|^2}{2} - (\|u+v-w\|^2 + \|w\|^2) - \frac{\|u+v+2w\|^2}{2}$$
$$= \frac{\|u+v+2w\|^2}{2} + \frac{\|u+v\|^2}{2} + \frac{\|u+v-2w\|^2}{2} - \frac{\|u+v\|^2}{2} - \frac{\|u+v-2w\|^2}{2} - \frac{\|u+v+2w\|^2}{2}$$
$$= 0.$$
Hence $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$, proving
(b) $\langle\ ,\ \rangle$ is additive in the first slot.
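To prove that $\langle\ ,\ \rangle$ is homogeneous in the first slot is rather more involved.
First suppose that $n$ is a positive integer. Then using additivity in the first
slot we have $\langle nu, v \rangle = n\langle u, v \rangle$. Replacing $u$ with $\frac{1}{n}u$ gives $\langle u, v \rangle = n\langle \frac{1}{n}u, v \rangle$,
so $\langle \frac{1}{n}u, v \rangle = \frac{1}{n}\langle u, v \rangle$. So if $m, n$ are positive integers,
$$\Big\langle \frac{m}{n}u, v \Big\rangle = m\Big\langle \frac{1}{n}u, v \Big\rangle = \frac{m}{n}\langle u, v \rangle.$$
Using property (ii) in the definition of norm we see $\|-u\| = |-1|\,\|u\| = \|u\|$.
Then using the definition of $\langle\ ,\ \rangle$, we have
$$\langle -u, v \rangle = \frac{\|-u + v\|^2 - \|-u - v\|^2}{4} = \frac{\|-(u - v)\|^2 - \|-(u + v)\|^2}{4}
= \frac{-\|u + v\|^2 + \|u - v\|^2}{4} = -\langle u, v \rangle.$$
This shows that if $r$ is any rational number, then
$$\langle ru, v \rangle = r\langle u, v \rangle.$$
Now suppose that $\lambda \in \mathbb{R}$ is any real number (with special interest in the
case where $\lambda$ is not rational). There must be a sequence $\{r_n\}_{n=1}^{\infty}$ of rational
numbers for which $\lim_{n \to \infty} r_n = \lambda$. Thus
$$\lambda\langle u, v \rangle = \lim_{n \to \infty} r_n\langle u, v \rangle = \lim_{n \to \infty} \langle r_n u, v \rangle
= \lim_{n \to \infty} \frac{\|r_n u + v\|^2 - \|r_n u - v\|^2}{4}.$$
We claim that $\lim_{n \to \infty} \|r_n u + v\|^2 = \|\lambda u + v\|^2$ and $\lim_{n \to \infty} \|r_n u - v\|^2 = \|\lambda u - v\|^2$.
Once we have shown this, we will have
$$\lambda\langle u, v \rangle = \frac{\|\lambda u + v\|^2 - \|\lambda u - v\|^2}{4} = \langle \lambda u, v \rangle,$$
completing the proof that $\langle\ ,\ \rangle$ is homogeneous in the first slot.
For $x, y \in U$, $\|x\| = \|y + (x - y)\| \le \|y\| + \|x - y\|$, so $\|x\| - \|y\| \le \|x - y\|$.
Interchanging the roles of $x$ and $y$ we see that also $\|y\| - \|x\| \le \|y - x\| = \|x - y\|$.
Hence $|\,\|x\| - \|y\|\,| \le \|y - x\|$. With $x = r_n u + v$ and $y = \lambda u + v$,
this latter inequality gives
$$|\,\|r_n u + v\| - \|\lambda u + v\|\,| \le \|(r_n - \lambda)u\| = |r_n - \lambda|\,\|u\|.$$
Since $r_n - \lambda \to 0$, we have $\|r_n u + v\| \to \|\lambda u + v\|$. Replacing $v$ with $-v$
gives $\lim_{n \to \infty} \|r_n u - v\| = \|\lambda u - v\|$. So indeed $\langle\ ,\ \rangle$ is homogeneous in the
first slot. Finally, we show that
(d) $\langle u, v \rangle = \langle v, u \rangle$.
So:
$$\langle u, v \rangle = \frac{\|u + v\|^2 - \|u - v\|^2}{4} = \frac{\|v + u\|^2 - \|v - u\|^2}{4} = \langle v, u \rangle.$$
This completes the proof in the case that $F = \mathbb{R}$.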
Now consider the case $F = \mathbb{C}$. By the complex polarization identity (see
Exercise 15) we must have
$$\langle u, v \rangle = \frac{1}{4}\sum_{n=1}^{4} i^n\|u + i^n v\|^2. \qquad (8.4)$$
Then putting $v = u$ we find
$$\langle u, u \rangle = \frac{1}{4}\sum_{n=1}^{4} i^n\|u + i^n u\|^2 = \frac{1}{4}\sum_{n=1}^{4} i^n|1 + i^n|^2\|u\|^2
= \frac{\|u\|^2}{4}[2i - 0 - 2i + 4] = \|u\|^2,$$
as desired. But we must still show that $\langle\ ,\ \rangle$ has the properties of an inner
product.
Because $\langle u, u \rangle = \|u\|^2$, it follows immediately from the properties of a
norm that $\langle u, u \rangle \ge 0$ with equality if and only if $u = \vec{0}$.
For convenience define $\langle\ ,\ \rangle_{\mathbb{R}}$ to be the real inner product defined above,
so
$$\langle u, v \rangle_{\mathbb{R}} = \frac{\|u + v\|^2 - \|u - v\|^2}{4}.$$
Note that
$$\langle u, v \rangle = \langle u, v \rangle_{\mathbb{R}} + i\langle u, iv \rangle_{\mathbb{R}}.$$
We have already proved that $\langle\ ,\ \rangle_{\mathbb{R}}$ is additive in the first slot, which can
now be used in a routine fashion to show that $\langle\ ,\ \rangle$ is also additive in the
first slot. It is even easier to show that $\langle au, v \rangle = a\langle u, v \rangle$ for $a \in \mathbb{R}$ using the
homogeneity of $\langle\ ,\ \rangle_{\mathbb{R}}$ in the first slot. However, we must still extend this to
all complex numbers. Note that:
$$\langle iu, v \rangle = \frac{\|iu + v\|^2 - \|iu - v\|^2 + i\|iu + iv\|^2 - i\|iu - iv\|^2}{4}$$
$$= \frac{i\|i(u + v)\|^2 - i\|i(u - v)\|^2 - \|i(u + iv)\|^2 + \|i(u - iv)\|^2}{4}$$
$$= \frac{i\|u + v\|^2 - i\|u - v\|^2 - \|u + iv\|^2 + \|u - iv\|^2}{4} = i\langle u, v \rangle.$$
Combining this with additivity and homogeneity with respect to real
numbers, we get that
$$\langle (a + bi)u, v \rangle = (a + bi)\langle u, v \rangle \quad \forall a, b \in \mathbb{R}.$$
Hence $\langle\ ,\ \rangle$ is homogeneous in the first slot. The only thing remaining to
show is that $\langle v, u \rangle = \overline{\langle u, v \rangle}$.
$$\overline{\langle u, v \rangle} = \frac{\|u + v\|^2 - \|u - v\|^2 - i\|u + iv\|^2 + i\|u - iv\|^2}{4}$$
$$= \frac{\|v + u\|^2 - \|v - u\|^2 - i\|i(v - iu)\|^2 + i\|(-i)(v + iu)\|^2}{4}$$
$$= \frac{\|v + u\|^2 - \|v - u\|^2 + i\|v + iu\|^2 - i\|v - iu\|^2}{4} = \langle v, u \rangle.$$
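As a sanity check on the complex polarization identity (8.4), the sketch below starts from the standard inner product on $\mathbb{C}^3$, forms the norm it induces, and recovers the inner product from that norm alone. The vectors are arbitrary sample data, and the function names are illustrative rather than taken from the text.

```python
def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """Squared norm induced by the inner product."""
    return inner(x, x).real

def polarization(u, v):
    """Right-hand side of (8.4): (1/4) * sum_{n=1}^{4} i^n ||u + i^n v||^2."""
    total = 0
    for n in range(1, 5):
        unit = 1j ** n
        shifted = [a + unit * b for a, b in zip(u, v)]
        total += unit * norm_sq(shifted)
    return total / 4

# Arbitrary sample vectors in C^3.
u = [1 + 1j, 2, -1j]
v = [3, 1 - 2j, 1]

recovered = polarization(u, v)
print(recovered, inner(u, v))
print(abs(polarization(u, u) - norm_sq(u)) < 1e-9)   # <u, u> = ||u||^2
```

Both printed values in the first line should agree up to rounding (here $5 + 6i$), which is the content of the identity when the norm does come from an inner product.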
Now suppose that $F$ is a subfield of $\mathbb{C}$ and that $V = F^n$. There are three
norms on $V$ that are most commonly used in applications.
Definition. For vectors $x = (x_1, \dots, x_n)^T \in V$, the norms $\|\cdot\|_1$, $\|\cdot\|_2$,
and $\|\cdot\|_\infty$, called the 1-norm, 2-norm, and $\infty$-norm, respectively, are defined
as:
$$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|;$$
$$\|x\|_2 = (|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2)^{1/2}; \qquad (8.5)$$
$$\|x\|_\infty = \max\{|x_1|, |x_2|, \dots, |x_n|\}.$$
Put $x = (1, 1)^T \in F^2$ and $y = (1, -1)^T \in F^2$. Using these vectors $x$ and $y$
it is routine to show that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ do not satisfy the parallelogram
equality, hence must not be derived from an inner product in the usual way.
On the other hand, all three norms are equivalent in a sense that we are
about to make clear. First we pause to notice that the so-called norms really
are norms. The only step that is challenging is the triangle inequality for the
2-norm, and we proved this earlier. The details for the other two norms are
left to the reader.
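The failure of the parallelogram equality for the 1-norm and the $\infty$-norm can be seen with a two-line computation. The sketch below takes $x = (1, 1)^T$ and $y = (1, -1)^T$ (the sign in $y$ is an assumption about the intended example; with $y = x$ the equality would hold trivially) and computes the "gap" $\|x+y\|^2 + \|x-y\|^2 - 2(\|x\|^2 + \|y\|^2)$ for each norm. The helper names are illustrative.

```python
def norm1(x):
    return sum(abs(c) for c in x)

def norm2_sq(x):
    # Squared 2-norm, kept exact for integer entries by avoiding the square root.
    return sum(abs(c) ** 2 for c in x)

def norm_inf(x):
    return max(abs(c) for c in x)

def parallelogram_gap(norm_sq, x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero iff the equality holds for x, y."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm_sq(s) + norm_sq(d) - 2 * (norm_sq(x) + norm_sq(y))

x, y = (1, 1), (1, -1)

gap1 = parallelogram_gap(lambda z: norm1(z) ** 2, x, y)
gap2 = parallelogram_gap(norm2_sq, x, y)
gap_inf = parallelogram_gap(lambda z: norm_inf(z) ** 2, x, y)

print(gap1, gap2, gap_inf)
```

Only the 2-norm gives a zero gap, consistent with Theorem 8.1.9: of the three norms, only the 2-norm comes from an inner product.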
Definition. Let $\|\cdot\|$ be a norm on $V$. A sequence $\{v_i\}_{i=1}^{\infty}$ of vectors is
said to converge to the vector $v_\infty$ provided the sequence $\{\|v_i - v_\infty\|\}$ of real
numbers converges to 0.
With this definition we can now talk of a sequence of vectors in $F^n$
converging by using norms. But which norm should we use? What we mean
by saying that all three norms are equivalent is that a sequence of vectors
converges to a vector $v$ using one of the norms if and only if it converges to
the same vector $v$ using either of the other norms.
Theorem 8.1.10. The 1-norm, 2-norm and $\infty$-norm on $F^n$ are all equivalent
in the sense that
(a) If a sequence $\{x_i\}$ of vectors converges to $x_\infty$ as determined in one of
the norms, then it converges to $x_\infty$ in all three norms, and for fixed index $j$,
the entries $(x_i)_j$ converge to the entry $(x_\infty)_j$. This is an easy consequence of
(b) $\|x\|_1 \le \|x\|_2\sqrt{n} \le n\|x\|_\infty \le n\|x\|_1$.
Proof. If $u = (u_1, \dots, u_n)$, put $x = (|u_1|, \dots, |u_n|)^T$ and $y = (1, 1, \dots, 1)^T$.
Then by the Cauchy-Schwarz inequality applied to $x$ and $y$, we have
$$|\langle x, y \rangle| = \sum |u_i| = \|u\|_1 \le \|x\|_2\|y\|_2 = \sqrt{\sum |u_i|^2}\,\sqrt{n} = \|u\|_2\sqrt{n}.$$
So $\|x\|_1 \le \|x\|_2\sqrt{n}$ for all $x \in F^n$.
Next, $\|x\|_2^2 = |x_1|^2 + \cdots + |x_n|^2 \le n(\max\{|x_i|\})^2 = n\|x\|_\infty^2$, implying
$\|x\|_2 \le \sqrt{n}\,\|x\|_\infty$. This proves the first and second inequalities in (b), and the third
is quite obvious.
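The chain of inequalities in (b) is easy to spot-check numerically. The sketch below evaluates the four quantities $\|x\|_1$, $\|x\|_2\sqrt{n}$, $n\|x\|_\infty$, $n\|x\|_1$ for a few arbitrary sample vectors (real and complex) and confirms they are nondecreasing, allowing a small tolerance for floating-point rounding. The function name `chain` is illustrative.

```python
import math

def chain(x):
    """Return (||x||_1, ||x||_2 * sqrt(n), n * ||x||_inf, n * ||x||_1)."""
    n = len(x)
    n1 = sum(abs(c) for c in x)
    n2 = math.sqrt(sum(abs(c) ** 2 for c in x))
    ninf = max(abs(c) for c in x)
    return (n1, n2 * math.sqrt(n), n * ninf, n * n1)

# Arbitrary sample vectors; the second one gives equality in the first two steps.
samples = [(3, -4, 0, 1), (1, 1, 1), (2 + 1j, -1j)]
results = [chain(x) for x in samples]

tol = 1e-12
all_ok = all(a <= b + tol and b <= c + tol and c <= d + tol for a, b, c, d in results)

for x, r in zip(samples, results):
    print(x, r)
print(all_ok)
```

The vector $(1, 1, 1)$ shows that the first two inequalities can be equalities, so the constants $\sqrt{n}$ and $n$ cannot be improved in general.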
In fact, any two norms on a finite dimensional vector space over F are
equivalent (in the sense given above), but we do not need this result.
8.2 Orthonormal Bases
Theorem 8.2.1. Let $L = (v_1, \dots, v_m)$ be an orthonormal list of vectors in
$V$ and put $W = \mathrm{span}(L)$. If $w = \sum_{i=1}^{m} a_i v_i$ is an arbitrary element of $W$,
then
(i) $a_i = \langle w, v_i \rangle$, and
(ii) $\|w\|^2 = \sum_{i=1}^{m} |a_i|^2 = \sum_{i=1}^{m} |\langle w, v_i \rangle|^2$.
Proof. If $w = \sum_{i=1}^{m} a_i v_i$, compute $\langle w, v_j \rangle = \langle \sum_{i=1}^{m} a_i v_i, v_j \rangle = a_j$. Then apply
the Pythagorean Theorem.
The preceding result shows that an orthonormal basis can be extremely
handy. The next result gives the Gram-Schmidt algorithm for replacing a
linearly independent list with an orthonormal one having the same span as
the original.
Theorem 8.2.2. (Gram-Schmidt method) If $L = (v_1, \dots, v_m)$ is a
linearly independent list of vectors in $V$, then there is an orthonormal list
$B = (e_1, \dots, e_m)$ such that $\mathrm{span}(e_1, \dots, e_j) = \mathrm{span}(v_1, \dots, v_j)$ for each
$j = 1, 2, \dots, m$.
Proof. Start by putting $e_1 = v_1/\|v_1\|$, so $e_1$ has norm 1 and spans the same
space as does $v_1$.
We construct the remaining vectors $e_2, \dots, e_m$ inductively. Suppose that
$e_1, \dots, e_k$ have been determined so that $(e_1, \dots, e_k)$ is orthonormal and
$\mathrm{span}(e_1, \dots, e_j) = \mathrm{span}(v_1, \dots, v_j)$ for each $j = 1, 2, \dots, k$. We then
construct $e_{k+1}$ as follows. Put $e'_{k+1} = v_{k+1} - \sum_{i=1}^{k} \langle v_{k+1}, e_i \rangle e_i$. Check that $e'_{k+1}$
is orthogonal to each of $e_1, \dots, e_k$. Then put $e_{k+1} = e'_{k+1}/\|e'_{k+1}\|$.
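The inductive construction in the proof translates directly into a short procedure. The following is a minimal sketch for the standard inner product on $\mathbb{C}^n$, using floating-point arithmetic; the function name `gram_schmidt`, the tolerance, and the sample list are illustrative assumptions. It follows the proof literally (the classical Gram-Schmidt method), which is fine for small examples, though it is known to be numerically less stable than some variants.

```python
import math

def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list (e_1, ..., e_m) with the same nested spans as the input."""
    basis = []
    for v in vectors:
        # e' = v - sum_i <v, e_i> e_i, exactly as in the proof.
        e = [complex(c) for c in v]
        for b in basis:
            coeff = inner(v, b)
            e = [ec - coeff * bc for ec, bc in zip(e, b)]
        length = math.sqrt(inner(e, e).real)
        if length < tol:
            raise ValueError("the input list is not linearly independent")
        basis.append([c / length for c in e])
    return basis

# Sample linearly independent list in R^3 (viewed inside C^3).
L = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
B = gram_schmidt(L)

# Gram matrix of B; it should be the identity up to rounding.
gram = [[inner(B[i], B[j]) for j in range(3)] for i in range(3)]
orthonormal = all(abs(gram[i][j] - (1 if i == j else 0)) < 1e-9
                  for i in range(3) for j in range(3))
print(orthonormal)
```

The first output vector is $v_1/\|v_1\| = (1, 1, 0)/\sqrt{2}$, matching the first step of the proof.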
At this point we need to assume that $V$ is finite dimensional.
Corollary 8.2.3. Let $V$ be a finite dimensional inner product space.
(i) $V$ has an orthonormal basis.
(ii) If $L$ is any orthonormal set in $V$ it can be completed to an orthonormal
basis of $V$.
Proof. We know that any independent set can be completed to a basis to
which we can then apply the Gram-Schmidt algorithm.
Lemma 8.2.4. If $T \in \mathcal{L}(V)$ has an upper triangular matrix with respect to
some basis of $V$, then it has an upper triangular matrix with respect to some
orthonormal basis of $V$.
Proof. Suppose that $B = (v_1, \dots, v_n)$ is a basis such that $[T]_B$ is upper
triangular. Basically this just means that for each $j = 1, 2, \dots, n$ the subspace
$\mathrm{span}(v_1, \dots, v_j)$ is $T$-invariant. Use the Gram-Schmidt algorithm to construct
an orthonormal basis $\mathcal{S} = (e_1, \dots, e_n)$ for $V$ such that $\mathrm{span}(v_1, \dots, v_j) =
\mathrm{span}(e_1, \dots, e_j)$ for each $j = 1, 2, \dots, n$. Hence for each $j$, $\mathrm{span}(e_1, \dots, e_j)$ is
$T$-invariant, so that $[T]_{\mathcal{S}}$ is upper triangular.
Using the fact that $\mathbb{C}$ is algebraically closed we showed that if $V$ is a
finite dimensional complex vector space, then there is a basis $B$ for $V$ such
that $[T]_B$ is upper triangular. Hence we have the following result which is
sometimes called Schur's Theorem.
Corollary 8.2.5. (Schur's Theorem) If $T \in \mathcal{L}(V)$ where $V$ is a finite
dimensional complex inner product space, then there is an orthonormal basis
for $V$ with respect to which $T$ has an upper triangular matrix.
We now apply the preceding results to the case where $V = \mathbb{C}^n$.
First we introduce a little more language. If $P$ is an invertible matrix
for which $P^{-1} = P^*$, where $P^*$ is the conjugate transpose of $P$,
then $P$ is said to be unitary. If $P$ is real and unitary (so
$P^{-1} = P^T$), we say $P$ is an orthogonal matrix. Let $A$ be an
$n \times n$ matrix over $\mathbb{C}$. View $\mathbb{C}^n$ as an inner
product space with the usual inner product. Let $\mathcal{S}$ be the
standard ordered (orthonormal) basis. Define
$T_A \in \mathcal{L}(\mathbb{C}^n)$ by $T_A(x) = Ax$. We know that
$[T_A]_{\mathcal{S}} = A$. Let $B = (v_1, \dots, v_n)$ be an orthonormal
basis with respect to which $T_A$ has an upper triangular matrix. Let $P$
be the matrix whose $j$th column is $v_j = [v_j]_{\mathcal{S}}$. Then
$[T_A]_B = P^{-1}AP$ by Theorem 4.7.2. Since $B$ is orthonormal it is easy
to check that $P$ is a unitary matrix. We have proved the following.
Corollary 8.2.6. If $A$ is an $n \times n$ complex matrix, there is a
unitary matrix $P$ such that $P^{-1}AP$ is upper triangular.
8.3 Orthogonal Projection and Minimization
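Let $V$ be a vector space over the field $F$, $F$ a subfield of
$\mathbb{C}$, and let $\langle\ ,\ \rangle$ be an inner product on $V$. Let
$W$ be a subspace of $V$, and put
\[ W^\perp = \{ v \in V : \langle w, v \rangle = 0 \text{ for all } w \in W \}. \]
Obs. 8.3.1. $W^\perp$ is a subspace of $V$ and $W \cap W^\perp = \{\vec{0}\}$.
Hence $W + W^\perp = W \oplus W^\perp$.
Proof. Easy exercise.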
We do not know if each $v \in V$ has a representation in the form
$v = w + w'$ with $w \in W$ and $w' \in W^\perp$, but at least we know from
the preceding observation that it has at most one. There is one case where
we know that $V = W \oplus W^\perp$.
Theorem 8.3.2. If $W$ is finite dimensional, then $V = W \oplus W^\perp$.
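Proof. Suppose $W$ is finite dimensional. Then using the Gram-Schmidt
process, for example, we can find an orthonormal basis
$\mathcal{T} = (v_1, \dots, v_m)$ of $W$. For arbitrary $v \in V$, put
$a_i = \langle v, v_i \rangle$. Then we know that
\[ w = \sum_{i=1}^{m} a_i v_i \]
is in $W$, and we write
\[ v = w + w' = \sum_{i=1}^{m} a_i v_i + \Bigl( v - \sum_{i=1}^{m} a_i v_i \Bigr). \]
We show that $v - \sum_{i=1}^{m} a_i v_i \in W^\perp$.
So let $u \in W$, say $u = \sum_{j=1}^{m} b_j v_j$. Since $\mathcal{T}$ is
orthonormal,
\begin{align*}
\Bigl\langle v - \sum_{i=1}^{m} a_i v_i,\ u \Bigr\rangle
  &= \Bigl\langle v - \sum_{i=1}^{m} a_i v_i,\ \sum_{j=1}^{m} b_j v_j \Bigr\rangle \\
  &= \sum_{j=1}^{m} \overline{b_j} \langle v, v_j \rangle
     - \sum_{i,j=1}^{m} \overline{b_j}\, a_i \langle v_i, v_j \rangle \\
  &= \sum_{j=1}^{m} \overline{b_j}\, a_j
     - \sum_{j=1}^{m} \overline{b_j}\, a_j \langle v_j, v_j \rangle = 0.
\end{align*}
So with $w = \sum_{i=1}^{m} a_i v_i$ and $w' = v - w$, $v = w + w'$ is the
unique way to write $v$ as the sum of an element of $W$ plus an element of
$W^\perp$.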
Now suppose $V = W \oplus W^\perp$ (which we have just seen is the case if
$W$ is finite dimensional). A linear transformation $P : V \to V$ is said
to be an orthogonal projection of $V$ onto $W$ provided the following hold:
(i) $P(w) = w$ for all $w \in W$,
(ii) $P(w') = \vec{0}$ for all $w' \in W^\perp$.
So let $P$ be an orthogonal projection onto $W$ by this definition. Let
$v = w + w'$ be any element of $V$ with $w \in W$, $w' \in W^\perp$. Then
$P(v) = P(w + w') = P(w) + P(w') = w + \vec{0} = w \in W$. So $P(v)$ is a
uniquely defined element of $W$ for all $v$.
Conversely, still under the hypothesis that $V = W \oplus W^\perp$, define
$P' : V \to V$ as follows. For $v \in V$, write $v = w + w'$, $w \in W$,
$w' \in W^\perp$ (uniquely!), and put $P'(v) = w$. It is an easy exercise to
show that $P'$ really is linear, $P'(w) = w$ for all $w \in W$, and
$P'(w') = P'(\vec{0} + w') = \vec{0}$ for $w' \in W^\perp$. So
$P' : v = w + w' \mapsto w$ is really the unique orthogonal projection of
$V$ onto $W$. Moreover,
Obs. 8.3.3. $P^2 = P$; $P(v) = v$ if and only if $v \in W$;
$P(v) = \vec{0}$ if and only if $v \in W^\perp$.
Obs. 8.3.4. $I - P$ is the unique orthogonal projection of $V$ onto
$W^\perp$.
Both Obs. 8.3.3 and 8.3.4 are fairly easy to prove, and their proofs are
worthwhile exercises.
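Obs. 8.3.5. $W \subseteq (W^\perp)^\perp$.
Proof. $W^\perp$ consists of all vectors in $V$ orthogonal to every vector
of $W$. So in particular, every vector of $W$ is orthogonal to every vector
of $W^\perp$, i.e., each vector of $W$ is in $(W^\perp)^\perp$. But in
general we do not know if there could be some vector outside $W$ that is in
$(W^\perp)^\perp$.
Theorem 8.3.6. If $V = W \oplus W^\perp$, then $W = (W^\perp)^\perp$.
Proof. By Obs. 8.3.5, we must show that $(W^\perp)^\perp \subseteq W$. Since
$V = W \oplus W^\perp$, we know there is a unique orthogonal projection $P$
of $V$ onto $W$, and for each $v \in V$, $v - P(v) \in W^\perp$. Keep in
mind that $W \subseteq (W^\perp)^\perp$ and
$P(v) \in W \subseteq (W^\perp)^\perp$, so
$\langle v - P(v), P(v) \rangle = 0$. It follows that for
$v \in (W^\perp)^\perp$ we have
\[ \|v - P(v)\|^2 = \langle v - P(v), v - P(v) \rangle
   = \langle v, v - P(v) \rangle - \langle P(v), v - P(v) \rangle = 0 - 0 = 0 \]
by the comments just above. Hence $v = P(v)$, which implies that $v \in W$,
i.e., $(W^\perp)^\perp \subseteq W$. Hence $(W^\perp)^\perp = W$.
Note: If $V = W \oplus W^\perp$, then $(W^\perp)^\perp = W$, so also
$V = W^\perp \oplus (W^\perp)^\perp$, and
$((W^\perp)^\perp)^\perp = W^\perp$.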
Given a finite dimensional subspace $U$ of the inner product space $V$ and
a point $v \in V$, we want to find a point $u \in U$ closest to $v$ in the
sense that $\|v - u\|$ is as small as possible. To do this we first
construct an orthonormal basis $B = (e_1, \dots, e_m)$ of $U$. The unique
orthogonal projection of $V$ onto $U$ is given by
\[ P_U(v) = \sum_{i=1}^{m} \langle v, e_i \rangle e_i. \]
We show that $P_U(v)$ is the unique $u \in U$ closest to $v$.
Theorem 8.3.7. Suppose $U$ is a finite dimensional subspace of the inner
product space $V$ and $v \in V$. Then
\[ \|v - P_U(v)\| \le \|v - u\| \quad \forall u \in U. \]
Furthermore, if $u \in U$ and equality holds, then $u = P_U(v)$.
Proof. Suppose $u \in U$. Then $v - P_U(v) \in U^\perp$ and
$P_U(v) - u \in U$, so we may use the Pythagorean Theorem in the following:
\begin{align*}
\|v - P_U(v)\|^2 &\le \|v - P_U(v)\|^2 + \|P_U(v) - u\|^2 \tag{8.6} \\
                 &= \|(v - P_U(v)) + (P_U(v) - u)\|^2 \\
                 &= \|v - u\|^2,
\end{align*}
where taking square roots gives the desired inequality. Also, the inequality
of the theorem is an equality if and only if the inequality in Eq. 8.6 is an
equality, which is if and only if $u = P_U(v)$.
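As a small numerical illustration of Theorem 8.3.7, the sketch below computes $P_U(v) = \sum_i \langle v, e_i \rangle e_i$ in $\mathbb{R}^3$ and compares distances. It assumes the standard dot product and an orthonormal basis supplied by the caller; the helper names `project` and `dist` and the sample vectors are hypothetical.

```python
import math

def dot(u, v):
    """Standard real inner product on R^n (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def project(v, onb):
    """P_U(v) = sum_i <v, e_i> e_i, where `onb` is an orthonormal basis of U."""
    p = [0.0] * len(v)
    for e in onb:
        c = dot(v, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

def dist(u, v):
    """Euclidean distance ||u - v||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# U is the xy-plane in R^3 with orthonormal basis (e_1, e_2).
onb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
v = [3.0, -2.0, 5.0]
p = project(v, onb)
residual = [a - b for a, b in zip(v, p)]   # v - P_U(v), which lies in U-perp
```

Here `residual` is orthogonal to both basis vectors, and `dist(v, p)` is no larger than `dist(v, u)` for any other `u` in the plane that one cares to try.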
Example 8.3.8. There are many applications of the above theorem. Here is
one example. Let $V = \mathbb{R}[x]$, and let $U$ be the subspace consisting
of all polynomials $f(x)$ with degree less than 4 and satisfying
$f(0) = f'(0) = 0$. Find a polynomial $p(x) \in U$ such that
$\int_0^1 |2 + 3x - p(x)|^2 \, dx$ is as small as possible.
Solution: Define an inner product on $\mathbb{R}[x]$ by
$\langle f, g \rangle = \int_0^1 f(x) g(x) \, dx$. Put $g(x) = 2 + 3x$, and
note that $U = \{ p(x) = a_2 x^2 + a_3 x^3 : a_2, a_3 \in \mathbb{R} \}$.
We need an orthonormal basis of $U$. So start with the basis
$B = (x^2, x^3)$ and apply the Gram-Schmidt algorithm. We want to put
$e_1 = x^2 / \|x^2\|$. First compute
$\|x^2\|^2 = \int_0^1 x^4 \, dx = \frac{1}{5}$, so
\[ e_1 = \sqrt{5}\, x^2. \tag{8.7} \]
Next we want to put
\[ e_2 = \frac{x^3 - \langle x^3, e_1 \rangle e_1}{\|x^3 - \langle x^3, e_1 \rangle e_1\|}. \]
Here $\langle x^3, e_1 \rangle = \sqrt{5} \int_0^1 x^5 \, dx = \frac{\sqrt{5}}{6}$.
Then $x^3 - \langle x^3, e_1 \rangle e_1 = x^3 - \frac{5}{6} x^2$. Now
$\|x^3 - \frac{5}{6} x^2\|^2 = \int_0^1 (x^3 - \frac{5}{6} x^2)^2 \, dx = \frac{1}{7 \cdot 36}$.
Hence
\[ e_2 = 6\sqrt{7} \left( x^3 - \tfrac{5}{6} x^2 \right) = \sqrt{7}\,(6x^3 - 5x^2). \tag{8.8} \]
Then the point $p \in U$ closest to $g = 2 + 3x$ is
\begin{align*}
p &= \langle g, e_1 \rangle e_1 + \langle g, e_2 \rangle e_2 \\
  &= \left( \sqrt{5} \int_0^1 (2 + 3x) x^2 \, dx \right) \sqrt{5}\, x^2 \tag{8.9} \\
  &\quad + \left( \sqrt{7} \int_0^1 (2 + 3x)(6x^3 - 5x^2) \, dx \right) \sqrt{7}\,(6x^3 - 5x^2), \tag{8.10}
\end{align*}
so that after a bit more computation we have
\[ p = 24x^2 - \frac{203}{10} x^3. \tag{8.11} \]
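The "bit more computation" can be checked with exact rational arithmetic. The sketch below is one way to do so, not the author's method: it represents polynomials as coefficient lists (lowest degree first) and uses an unnormalized Gram-Schmidt step so that no square roots are needed. The names `inner`, `f2`, `a1` and `a2` are hypothetical.

```python
from fractions import Fraction

def inner(f, g):
    """<f, g> = integral from 0 to 1 of f(x) g(x) dx, for coefficient lists.

    Uses the fact that the integral of x^(i+j) over [0, 1] is 1/(i+j+1).
    """
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(f) for j, b in enumerate(g))

g = [2, 3]             # 2 + 3x
x2 = [0, 0, 1]         # x^2
x3 = [0, 0, 0, 1]      # x^3

# Unnormalized Gram-Schmidt: f2 = x^3 - (<x^3, x^2> / <x^2, x^2>) x^2
c = inner(x3, x2) / inner(x2, x2)          # equals 5/6
f2 = [0, 0, -c, 1]                         # x^3 - (5/6) x^2

# Projection onto U: p = (<g, x2>/<x2, x2>) x^2 + (<g, f2>/<f2, f2>) f2
a1 = inner(g, x2) / inner(x2, x2)
a2 = inner(g, f2) / inner(f2, f2)
p = [0, 0, a1 + a2 * f2[2], a2 * f2[3]]    # coefficients of p(x)
```

Dividing by $\langle f, f \rangle$ rather than normalizing each vector gives the same projection, since $\langle g, e \rangle e = \frac{\langle g, f \rangle}{\langle f, f \rangle} f$ when $e = f / \|f\|$.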
8.4 Linear Functionals and Adjoints
Recall that if $V$ is a vector space over any field $F$, then a linear map
from $V$ to $F$ (viewed as a vector space over $F$) is called a linear
functional. The set $V^*$ of all linear functionals on $V$ is called the
dual space of $V$. When $V$ is a finite dimensional inner product space the
linear functionals on $V$ have a particularly nice form. Fix $v \in V$.
Then define
\[ \varphi_v : V \to F : u \mapsto \langle u, v \rangle. \]
It is easy to see that $\varphi_v$ is a linear functional. If $V$ is finite
dimensional then every linear functional on $V$ arises this way.
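Theorem 8.4.1. Let $V$ be a finite-dimensional inner product space. Let
$\varphi \in V^*$. Then there is a unique vector $v \in V$ such that
\[ \varphi(u) = \langle u, v \rangle \]
for every $u \in V$.
Proof. Let $(e_1, \dots, e_n)$ be an orthonormal basis of $V$. Then for any
$u \in V$, we have
\[ u = \sum_{i=1}^{n} \langle u, e_i \rangle e_i. \]
Hence
\begin{align*}
\varphi(u) &= \varphi\Bigl( \sum_{i=1}^{n} \langle u, e_i \rangle e_i \Bigr)
            = \sum_{i=1}^{n} \langle u, e_i \rangle \varphi(e_i) \\
           &= \sum_{i=1}^{n} \langle u, \overline{\varphi(e_i)}\, e_i \rangle. \tag{8.12}
\end{align*}
So if we put $v = \sum_{i=1}^{n} \overline{\varphi(e_i)}\, e_i$, we have
$\varphi(u) = \langle u, v \rangle$ for every $u \in V$. This shows the
existence of the desired $v$. For the uniqueness, suppose that
\[ \varphi(u) = \langle u, v_1 \rangle = \langle u, v_2 \rangle \quad \forall u \in V. \]
Then $0 = \langle u, v_1 - v_2 \rangle$ for all $u \in V$, forcing
$v_1 = v_2$.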
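The existence half of the proof is constructive, and the recipe $v = \sum_i \overline{\varphi(e_i)}\, e_i$ can be tried out directly. The sketch below assumes $V = \mathbb{C}^n$ with the standard inner product (linear in the first slot) and the standard basis; the functional `phi` and the helper `riesz_vector` are hypothetical examples, not taken from the text.

```python
def inner(u, v):
    """Standard inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def riesz_vector(phi, n):
    """Return v with phi(u) = <u, v>, built as v = sum_i conj(phi(e_i)) e_i."""
    v = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]   # i-th standard basis vector
        v.append(complex(phi(e_i)).conjugate())
    return v

# A hypothetical linear functional on C^3.
def phi(u):
    return (2 + 1j) * u[0] - 3j * u[1] + 4 * u[2]

v = riesz_vector(phi, 3)      # the vector (2 - i, 3i, 4)
u = [1 + 2j, -1j, 0.5]        # an arbitrary test vector
```

For this `u`, both `phi(u)` and `inner(u, v)` evaluate to $-1 + 5i$, up to floating point rounding.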
Now let $V$ and $W$ both be inner product spaces over $F$. Let
$T \in \mathcal{L}(V, W)$ and fix $w \in W$. Define $\varphi : V \to F$ by
$\varphi(v) = \langle T(v), w \rangle$. First, it is easy to check that
$\varphi \in V^*$. Second, by the previous theorem there is a unique vector
(that we now denote by $T^*(w)$) for which
\[ \varphi(v) = \langle T(v), w \rangle = \langle v, T^*(w) \rangle \quad \forall v \in V. \]
It is clear that $T^*$ is some kind of map from $W$ to $V$. In fact, it is
routine to check that $T^*$ is linear, i.e., $T^* \in \mathcal{L}(W, V)$.
$T^*$ is called the adjoint of $T$.
Theorem 8.4.2. Let $V$ and $W$ be inner product spaces and suppose that
$S, T \in \mathcal{L}(V, W)$ are both such that $S^*$ and $T^*$ exist. Then
(i) $(S + T)^* = S^* + T^*$.
(ii) $(aT)^* = \overline{a}\, T^*$.
(iii) $(T^*)^* = T$.
(iv) $I^* = I$.
(v) $(ST)^* = T^* S^*$.
Proof. The routine proofs are left to the reader.
Theorem 8.4.3. Let $V$ and $W$ be finite dimensional inner product spaces
over $F$. Suppose $T \in \mathcal{L}(V, W)$. Then
(i) $\mathrm{null}(T^*) = (\mathrm{Im}(T))^\perp$;
(ii) $\mathrm{Im}(T^*) = (\mathrm{null}(T))^\perp$;
(iii) $\mathrm{null}(T) = (\mathrm{Im}(T^*))^\perp$;
(iv) $\mathrm{Im}(T) = (\mathrm{null}(T^*))^\perp$.
Proof. For (i), $w \in \mathrm{null}(T^*)$ iff $T^*(w) = \vec{0}$ iff
$\langle v, T^*(w) \rangle = 0\ \forall v \in V$ iff
$\langle T(v), w \rangle = 0\ \forall v \in V$ iff
$w \in (\mathrm{Im}(T))^\perp$. This proves (i). By taking orthogonal
complements of both sides of an equality, or by replacing an operator with
its adjoint, the other three equalities are easily established.
Theorem 8.4.4. Let $V$ be a finite dimensional inner product space over $F$
and let $B = (e_1, \dots, e_n)$ be an orthonormal basis for $V$. Let
$T \in \mathcal{L}(V)$ and let $A = [T]_B$. Then
$A_{ij} = \langle T(e_j), e_i \rangle$.
Proof. The matrix $A$ is defined by
\[ T(e_j) = \sum_{i=1}^{n} A_{ij} e_i. \]
Since $B$ is orthonormal, we also have
\[ T(e_j) = \sum_{i=1}^{n} \langle T(e_j), e_i \rangle e_i. \]
Hence $A_{ij} = \langle T(e_j), e_i \rangle$.
Corollary 8.4.5. Let $V$ be a finite dimensional inner product space over
$F$ and let $B = (e_1, \dots, e_n)$ be an orthonormal basis for $V$. Let
$T \in \mathcal{L}(V)$ and let $A = [T]_B$. Then $[T^*]_B = A^*$, where
$A^*$ is the conjugate transpose of $A$.
Proof. According to Theorem 8.4.4, $A_{ij} = \langle T(e_j), e_i \rangle$,
and if $B = [T^*]_B$, then
\[ B_{ij} = \langle T^*(e_j), e_i \rangle
          = \overline{\langle e_i, T^*(e_j) \rangle}
          = \overline{\langle T(e_i), e_j \rangle}
          = \overline{A_{ji}}. \]
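Corollary 8.4.5 can be spot-checked numerically: for the standard (orthonormal) basis of $\mathbb{C}^n$, the adjoint of $x \mapsto Ax$ should be $y \mapsto A^* y$, so that $\langle Ax, y \rangle = \langle x, A^* y \rangle$. The sketch below assumes the standard inner product, linear in the first slot; the matrix and vectors are arbitrary sample data and the helper names are hypothetical.

```python
def inner(u, v):
    """Standard inner product on C^n, linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def conj_transpose(A):
    """Entry (j, i) of the result is the complex conjugate of A[i][j]."""
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

A = [[1 + 1j, 2], [3j, 4 - 1j]]
A_star = conj_transpose(A)
x = [1 - 1j, 2j]
y = [0.5, 1 + 3j]
lhs = inner(matvec(A, x), y)         # <Ax, y>
rhs = inner(x, matvec(A_star, y))    # <x, A* y>
```

The two quantities `lhs` and `rhs` agree up to rounding. The same helpers also illustrate Theorem 8.4.4: `inner(matvec(A, e_j), e_i)` recovers the entry `A[i][j]`.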
8.5 Exercises
1. Parallelogram Equality: If $u, v$ are vectors in the inner product
space $V$, then
\[ \|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2). \]
Explain why this equality should be so-named.
2. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be defined by
\[ T(z_1, z_2, \dots, z_n) = (z_2 - z_1, z_3 - z_2, \dots, z_1 - z_n). \]
(a) Give an explicit expression for the adjoint, $T^*$.
(b) Is $T$ invertible? Explain.
(c) Find the eigenvalues of $T$.
(d) Compute the characteristic polynomial of $T$.
(e) Compute the minimal polynomial of $T$.
3. Let $T \in \mathcal{L}(V)$, where $V$ is a complex inner product space
and $T$ has an adjoint $T^*$. Show that if $T$ is normal and
$T(v) = \lambda v$ for some $v \in V$, then
$T^*(v) = \overline{\lambda}\, v$.
4. Let $V$ and $W$ be inner product spaces over $F$, and let
$T \in \mathcal{L}(V, W)$. If there is an adjoint map $T^* : W \to V$ such
that $\langle T(v), w \rangle = \langle v, T^*(w) \rangle$ for all
$v \in V$ and all $w \in W$, show that $T^* \in \mathcal{L}(W, V)$.
5. Let $T : V \to W$ be a linear transformation whose adjoint
$T^* : W \to V$ with respect to $\langle\ ,\ \rangle_V$,
$\langle\ ,\ \rangle_W$ does exist. So for all $v \in V$, $w \in W$,
$\langle T(v), w \rangle_W = \langle v, T^*(w) \rangle_V$. Put
$N = \mathrm{null}(T)$ and $R^* = \mathrm{Im}(T^*)$.
(a) Show that $(R^*)^\perp = N$.
(b) Suppose that $R^*$ is finite dimensional and show that
$N^\perp = R^*$.
6. Prove Theorem 8.4.2.
7. Prove Theorem 8.4.3.
8. On an in-class linear algebra exam, a student was asked to give (for ten
points) the definition of "real symmetric matrix." He couldn't remember the
definition and offered the following alternative: A real, $n \times n$
matrix $A$ is symmetric if and only if $A^2 = AA^T$. When he received no
credit for this answer, he went to see the instructor to find out what was
wrong with his definition. By that time he had looked up the definition and
tried to see if he could prove that his definition was equivalent to the
standard one: $A$ is symmetric if and only if $A = A^T$. He had a proof that
worked for $2 \times 2$ matrices and thought he could prove it for
$3 \times 3$ matrices. The instructor said this was not good enough.
However, the instructor would give the student 5 points for a
counterexample, or the full ten points if he could prove that the two
definitions were equivalent for all $n$. Your problem is to determine
whether or not the student should have been able to earn ten points or
merely five points. And you might consider the complex case also: show that
$A^2 = AA^*$ if and only if $A = A^*$, or find a counterexample.
9. Let A be a real symmetric matrix. Either prove the following state-
ment or disprove it with a counterexample: Each diagonal element of
A is bounded above (respectively, below) by the largest (respectively,
smallest) eigenvalue of A.
10. Suppose $V$ and $W$ are finite dimensional inner product spaces with
orthonormal bases $B_1$ and $B_2$, respectively. Let
$T \in \mathcal{L}(V, W)$, so we know that $T^* \in \mathcal{L}(W, V)$
exists and is unique. Prove that $[T^*]_{B_1, B_2}$ is the conjugate
transpose of $[T]_{B_2, B_1}$.
11. Suppose that $V$ is an inner product space over $F$. Let $u, v \in V$.
Prove that $\langle u, v \rangle = 0$ iff
$\|u\| \le \|u + av\|\ \forall a \in F$.
(Hint: If $u \perp v$, use the Pythagorean theorem on $u + av$. For the
converse, square the assumed inequality and then show that
$-2\,\mathrm{Re}(\overline{a} \langle u, v \rangle) \le |a|^2 \|v\|^2$ for
all $a \in F$. Then put $a = -\langle u, v \rangle / \|v\|^2$ in the case
$v \neq \vec{0}$.)
12. For arbitrary real numbers $a_1, \dots, a_n$ and $b_1, \dots, b_n$,
show that
\[ \Bigl( \sum_{j=1}^{n} a_j b_j \Bigr)^2
   \le \Bigl( \sum_{j=1}^{n} j a_j^2 \Bigr) \Bigl( \sum_{j=1}^{n} b_j^2 / j \Bigr). \]
13. Let $V$ be a real inner product space. Show that
\[ \langle u, v \rangle = \frac{1}{4} \|u + v\|^2 - \frac{1}{4} \|u - v\|^2. \tag{8.13} \]
(This equality is the real polarization identity.)
14. Let $V$ be a complex inner product space. Show that
\[ \langle u, v \rangle = \mathrm{Re}(\langle u, v \rangle) + i\, \mathrm{Re}(\langle u, iv \rangle). \tag{8.14} \]
15. Let $V$ be a complex inner product space. Show that
\[ \langle u, v \rangle = \frac{1}{4} \sum_{n=1}^{4} i^n \|u + i^n v\|^2. \tag{8.15} \]
(This equality is the complex polarization identity.)
For the next two exercises let $V$ be a finite dimensional inner product
space, and let $P \in \mathcal{L}(V)$ satisfy $P^2 = P$.
16. Show that $P$ is an orthogonal projection if and only if
$\mathrm{null}(P) \subseteq (\mathrm{Im}(P))^\perp$.
17. Show that $P$ is an orthogonal projection if and only if
$\|P(v)\| \le \|v\|$ for all $v \in V$. (Hint: You will probably need to use
Exercises 11 and 16.)
18. Fix a vector $v \in V$ and define $T \in V^*$ by
$T(u) = \langle u, v \rangle$. For $a \in F$ determine the formula for
$T^*(a)$. Here we use the standard inner product on $F$ given by
$\langle a, b \rangle = a \overline{b}$ for $a, b \in F$.
19. Let $T \in \mathcal{L}(V, W)$. Prove that
(a) $T$ is injective if and only if $T^*$ is surjective;
(b) $T$ is surjective if and only if $T^*$ is injective.
20. If $T \in \mathcal{L}(V)$ and $U$ is a $T$-invariant subspace of $V$,
then $U^\perp$ is $T^*$-invariant.
21. Let $V$ be a vector space with norm $\|\cdot\|$. Prove that
\[ \bigl|\, \|u\| - \|v\| \,\bigr| \le \|u - v\|. \]
22. Use exercise 21 to show that if $\{v_i\}_{i=1}^{\infty}$ converges to
$v$ in $V$, then $\{\|v_i\|\}_{i=1}^{\infty}$ converges to $\|v\|$. Give an
example in $\mathbb{R}^2$ to show that the norms may converge without the
vectors converging.
23. Let $B$ be the basis for $\mathbb{R}^2$ given by $B = (e_1, e_1 + e_2)$
where $e_1 = (1, 0)^T$ and $e_2 = (0, 1)^T$. Let
$T \in \mathcal{L}(\mathbb{R}^2)$ be the operator whose matrix with respect
to the basis $B$ is
$[T]_B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. Compute the matrix
$[T^*]_B$, where the standard inner product on $\mathbb{R}^2$ is used.
24. Let $A, B$ denote matrices in $M_n(\mathbb{C})$.
(a) Define the term unitary matrix; define what it means for $A$ and $B$ to
be unitarily equivalent; and show that unitary equivalence on
$M_n(\mathbb{C})$ is an equivalence relation.
(b) Show that if $A = (a_{ij})$ and $B = (b_{ij})$ are unitarily
equivalent, then
$\sum_{i,j=1}^{n} |a_{ij}|^2 = \sum_{i,j=1}^{n} |b_{ij}|^2$. (Hint:
Consider $\mathrm{tr}(A^* A)$.)
(c) Show that $A = \begin{pmatrix} 3 & 1 \\ -2 & 0 \end{pmatrix}$ and
$B = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}$ are similar but not
unitarily equivalent.
(d) Give two $2 \times 2$ matrices that satisfy the equality of part (b)
above but are not unitarily equivalent. Explain why they are not unitarily
equivalent.
25. Let $U$ and $W$ be subspaces of the finite-dimensional inner product
space $V$.
(a) Prove that $U^\perp \cap W^\perp = (U + W)^\perp$.
(b) Prove that
\[ \dim(W) - \dim(U \cap W) = \dim(U^\perp) - \dim(U^\perp \cap W^\perp). \]
26. Let $u, v \in V$. Prove that $\langle u, v \rangle = 0$ iff
$\|u\| \le \|u + av\|$ for all $a \in F$.
Chapter 9
Operators on Inner Product
Spaces
Throughout this chapter $V$ will denote an inner product space over $F$. To
make life simpler we also assume that $V$ is finite dimensional, so
operators on $V$ always have adjoints, etc.
9.1 Self-Adjoint Operators
An operator $T \in \mathcal{L}(V)$ is called Hermitian or self-adjoint
provided $T = T^*$. The same language is used for matrices. An
$n \times n$ complex matrix $A$ is called Hermitian provided $A = A^*$
(where $A^*$ is the transpose of the complex conjugate of $A$). If $A$ is
real and Hermitian it is merely called symmetric.
Theorem 9.1.1. Let $F = \mathbb{C}$, i.e., $V$ is a complex inner product
space, and suppose $T \in \mathcal{L}(V)$. Then
(a) If $\langle T(v), v \rangle = 0\ \forall v \in V$, then $T = 0$.
(b) If $T$ is self-adjoint, then each eigenvalue of $T$ is real.
(c) $T$ is self-adjoint if and only if
$\langle T(v), v \rangle \in \mathbb{R}\ \forall v \in V$.
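Proof. It is routine to show that for all $u, w \in V$,
\begin{align*}
\langle T(u), w \rangle
  &= \frac{\langle T(u + w), u + w \rangle - \langle T(u - w), u - w \rangle}{4} \\
  &\quad + \frac{\langle T(u + iw), u + iw \rangle - \langle T(u - iw), u - iw \rangle}{4}\, i.
\end{align*}
Note that each term on the right hand side is of the form
$\langle T(v), v \rangle$ for an appropriate $v \in V$, so by hypothesis
$\langle T(u), w \rangle = 0$ for all $u, w \in V$. Put $w = T(u)$ to
conclude that $T = 0$. This proves part (a).
For part (b), suppose that $T = T^*$, and let $v$ be a nonzero vector in
$V$ such that $T(v) = \lambda v$ for some complex number $\lambda$. Then
\[ \lambda \langle v, v \rangle = \langle T(v), v \rangle
   = \langle v, T(v) \rangle = \overline{\lambda} \langle v, v \rangle. \]
Hence $\lambda = \overline{\lambda}$, implying $\lambda \in \mathbb{R}$,
proving part (b).
For part (c) note that for all $v \in V$ we have
\begin{align*}
\langle T(v), v \rangle - \overline{\langle T(v), v \rangle}
  &= \langle T(v), v \rangle - \langle v, T(v) \rangle \\
  &= \langle T(v), v \rangle - \langle T^*(v), v \rangle \\
  &= \langle (T - T^*)(v), v \rangle.
\end{align*}
If $\langle T(v), v \rangle \in \mathbb{R}$ for every $v \in V$, then the
left side of the equation equals 0, so
$\langle (T - T^*)(v), v \rangle = 0$ for each $v \in V$. Hence by part (a),
$T - T^* = 0$, i.e., $T$ is self-adjoint.
Conversely, suppose $T$ is self-adjoint. Then the right hand side of the
equation above equals 0, so the left hand side must also be 0, implying
$\langle T(v), v \rangle \in \mathbb{R}$ for all $v \in V$, as claimed.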
Now suppose that $V$ is a real inner product space. Consider the operator
$T \in \mathcal{L}(\mathbb{R}^2)$ defined by $T(x, y) = (-y, x)$ with the
standard inner product on $\mathbb{R}^2$. Then
$\langle T(v), v \rangle = 0$ for all $v \in \mathbb{R}^2$ but $T \neq 0$.
However, this cannot happen if $T$ is self-adjoint.
Theorem 9.1.2. Let $T$ be a self-adjoint operator on the real inner product
space $V$ and suppose that $\langle T(v), v \rangle = 0$ for all $v \in V$.
Then $T = 0$.
Proof. Suppose the hypotheses of the theorem hold. It is routine to verify
\[ \langle T(u), w \rangle
   = \frac{\langle T(u + w), u + w \rangle - \langle T(u - w), u - w \rangle}{4} \]
using $\langle T(w), u \rangle = \langle w, T(u) \rangle = \langle T(u), w \rangle$
because $T$ is self-adjoint and $V$ is a real vector space. Hence also
$\langle T(u), w \rangle = 0$ for all $u, w \in V$. With $w = T(u)$ we see
$T = 0$.
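Lemma 9.1.3. Let $F$ be any subfield of $\mathbb{C}$ and let
$T \in \mathcal{L}(V)$ be self-adjoint. If $\alpha, \beta \in \mathbb{R}$
are such that $x^2 + \alpha x + \beta$ is irreducible over $\mathbb{R}$,
i.e., $\alpha^2 < 4\beta$, then $T^2 + \alpha T + \beta I$ is invertible.
Proof. Suppose $\alpha^2 < 4\beta$ and $\vec{0} \neq v \in V$. Then
\begin{align*}
\langle (T^2 + \alpha T + \beta I)(v), v \rangle
  &= \langle T^2(v), v \rangle + \alpha \langle T(v), v \rangle + \beta \langle v, v \rangle \\
  &= \langle T(v), T(v) \rangle + \alpha \langle T(v), v \rangle + \beta \|v\|^2 \\
  &\ge \|T(v)\|^2 - |\alpha|\, \|T(v)\|\, \|v\| + \beta \|v\|^2 \\
  &= \Bigl( \|T(v)\| - \frac{|\alpha|\, \|v\|}{2} \Bigr)^2
     + \Bigl( \beta - \frac{\alpha^2}{4} \Bigr) \|v\|^2 > 0, \tag{9.1}
\end{align*}
where the first inequality holds by the Cauchy-Schwarz inequality. The last
inequality implies that $(T^2 + \alpha T + \beta I)(v) \neq \vec{0}$. Thus
$T^2 + \alpha T + \beta I$ is injective, hence it is invertible.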
We have seen that some operators on a real vector space fail to have
eigenvalues, but now we show that this cannot happen with self-adjoint
operators.
Lemma 9.1.4. Let $T$ be a self-adjoint linear operator on the real vector
space $V$. Then $T$ has an eigenvalue.
Proof. Suppose $n = \dim(V)$ and choose $v \in V$ with $\vec{0} \neq v$.
Then
\[ (v, T(v), \dots, T^n(v)) \]
must be linearly dependent. Hence there exist real numbers
$a_0, \dots, a_n$, not all 0, such that
\[ \vec{0} = a_0 v + a_1 T(v) + \cdots + a_n T^n(v). \]
Construct the polynomial $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$,
which can be written in factored form as
\[ f(x) = c (x^2 + \alpha_1 x + \beta_1) \cdots (x^2 + \alpha_k x + \beta_k)
          (x - \lambda_1) \cdots (x - \lambda_m), \]
where $c$ is a nonzero real number, each $\alpha_j$, $\beta_j$ and
$\lambda_j$ is real, each $\alpha_j^2 < 4\beta_j$, $m + k \ge 1$, and the
equation holds for all real $x$. Then we have
\begin{align*}
\vec{0} &= a_0 v + a_1 T(v) + \cdots + a_n T^n(v) \\
        &= (a_0 I + a_1 T + \cdots + a_n T^n)(v) \\
        &= c (T^2 + \alpha_1 T + \beta_1 I) \cdots (T^2 + \alpha_k T + \beta_k I)
           (T - \lambda_1 I) \cdots (T - \lambda_m I)(v).
\end{align*}
Each $T^2 + \alpha_j T + \beta_j I$ is invertible by Lemma 9.1.3 because
$T$ is self-adjoint and each $\alpha_j^2 < 4\beta_j$. Also $c \neq 0$.
Hence the equation above implies that
\[ \vec{0} = (T - \lambda_1 I) \cdots (T - \lambda_m I)(v). \]
It then follows that $T - \lambda_j I$ is not injective for at least one
$j$. This says that $T$ has an eigenvalue.
The next theorem is very important for operators on real inner product
spaces.
Theorem 9.1.5. The Real Spectral Theorem: Let $T$ be an operator on the
real inner product space $V$. Then $V$ has an orthonormal basis consisting
of eigenvectors of $T$ if and only if $T$ is self-adjoint.
Proof. First suppose that $V$ has an orthonormal basis $B$ consisting of
eigenvectors of $T$. Then $[T]_B$ is a real diagonal matrix, so it equals
its conjugate transpose, i.e., $T$ is self-adjoint.
For the converse, suppose that $T$ is self-adjoint. Our proof is by
induction on $n = \dim(V)$. The desired result clearly holds if $n = 1$. So
assume that $\dim(V) = n > 1$ and that the desired result holds on vector
spaces of smaller dimension. By Lemma 9.1.4 we know that $T$ has an
eigenvalue $\lambda$ with a nonzero eigenvector $u$, and without loss of
generality we may assume that $\|u\| = 1$. Let $U = \mathrm{span}(u)$.
Suppose $v \in U^\perp$, i.e. $\langle u, v \rangle = 0$. Then
\[ \langle u, T(v) \rangle = \langle T(u), v \rangle = \lambda \langle u, v \rangle = 0, \]
so $T(v) \in U^\perp$ whenever $v \in U^\perp$, showing that $U^\perp$ is
$T$-invariant. Hence the map
$S = T|_{U^\perp} \in \mathcal{L}(U^\perp)$. If $v, w \in U^\perp$, then
\[ \langle S(v), w \rangle = \langle T(v), w \rangle
   = \langle v, T(w) \rangle = \langle v, S(w) \rangle, \]
which shows that $S$ is self-adjoint. Thus by the induction hypothesis there
is an orthonormal basis of $U^\perp$ consisting of eigenvectors of $S$.
Clearly every eigenvector of $S$ is an eigenvector of $T$. Thus adjoining
$u$ to an orthonormal basis of $U^\perp$ consisting of eigenvectors of $S$
gives an orthonormal basis of $V$ consisting of eigenvectors of $T$, as
desired.
Corollary 9.1.6. Let $A$ be a real $n \times n$ matrix. Then there is an
orthogonal matrix $P$ such that $P^{-1}AP$ is a (necessarily real) diagonal
matrix if and only if $A$ is symmetric.
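For a concrete feel for Corollary 9.1.6, the $2 \times 2$ case can be worked in closed form. The sketch below is an illustration under that restriction only (it is not a general eigenvalue routine): for a real symmetric matrix with rows $(a, b)$ and $(b, d)$ it uses the quadratic formula for the eigenvalues and the vector $(b, \lambda - a)$ as an eigenvector when $b \neq 0$. The function name `diagonalize_symmetric_2x2` and the helpers are hypothetical.

```python
import math

def diagonalize_symmetric_2x2(a, b, d):
    """Orthonormal eigenbasis of the real symmetric matrix [[a, b], [b, d]].

    Returns (eigenvalues, P), where the columns of P are unit eigenvectors.
    """
    if b == 0:
        return (a, d), [[1.0, 0.0], [0.0, 1.0]]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    lam1, lam2 = mean + radius, mean - radius
    cols = []
    for lam in (lam1, lam2):
        # (b, lam - a) solves (A - lam I) x = 0 when b != 0
        x, y = b, lam - a
        n = math.hypot(x, y)
        cols.append((x / n, y / n))
    P = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
    return (lam1, lam2), P

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

A = [[2.0, 1.0], [1.0, 2.0]]
(lam1, lam2), P = diagonalize_symmetric_2x2(2.0, 1.0, 2.0)
D = matmul(transpose(P), matmul(A, P))   # P^T A P, which equals P^{-1} A P here
```

Since $P$ is orthogonal, $P^{-1} = P^T$, so `D` should be diagonal with the eigenvalues 3 and 1 on the diagonal, up to rounding.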
Corollary 9.1.7. Let $A$ be a real symmetric matrix with distinct
(necessarily real) eigenvalues $\lambda_1, \dots, \lambda_m$. Then
\[ V = \mathrm{null}(T - \lambda_1 I) \oplus \cdots \oplus \mathrm{null}(T - \lambda_m I). \]
9.2 Normal Operators
Definition An operator $T \in \mathcal{L}(V)$ is called normal provided
$TT^* = T^*T$. Clearly any self-adjoint operator is normal, but there are
many normal operators in general that are not self-adjoint. For example, if
$A$ is an $n \times n$ nonzero real matrix with $A^T = -A$ (i.e., $A$ is
skew-symmetric), then $A \neq A^*$ but $A$ is a normal matrix because
$AA^* = A^*A$. It follows that if $T \in \mathcal{L}(\mathbb{R}^n)$ is the
operator with $[T]_{\mathcal{S}} = A$ where $\mathcal{S}$ is the standard
basis of $\mathbb{R}^n$, then $T$ is normal but not self-adjoint.
Recall Theorem 7.1.2 (A list of nonzero eigenvectors belonging to distinct
eigenvalues must be linearly independent.)
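Theorem 9.2.1. Let $T \in \mathcal{L}(V)$. Then
$\|T(v)\| = \|T^*(v)\|\ \forall v \in V$ iff $T$ is normal.
Proof.
\begin{align*}
T \text{ is normal} &\iff T^*T - TT^* = 0 \\
  &\iff \langle (T^*T - TT^*)(v), v \rangle = 0 \quad \forall v \in V \\
  &\iff \langle T^*T(v), v \rangle = \langle TT^*(v), v \rangle \quad \forall v \in V \\
  &\iff \|T(v)\|^2 = \|T^*(v)\|^2 \quad \forall v \in V. \tag{9.2}
\end{align*}
Since $T^*T - TT^*$ is self-adjoint, the theorem follows from Theorems
9.1.1 part (a) and 9.1.2.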
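Theorem 9.2.1 is easy to probe numerically on small matrices. The sketch below assumes the standard inner product on $\mathbb{C}^2$ and treats a matrix as the operator $x \mapsto Ax$; the sample matrices `N` (normal but not self-adjoint) and `M` (not normal), the vector `v`, and all helper names are hypothetical choices for illustration.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def conj_transpose(A):
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def is_normal(A, tol=1e-12):
    """True when A A* and A* A agree entrywise up to `tol`."""
    S = conj_transpose(A)
    L, R = matmul(A, S), matmul(S, A)
    return all(abs(L[i][j] - R[i][j]) <= tol
               for i in range(len(A)) for j in range(len(A)))

N = [[1, -2], [2, 1]]      # rows (a, -b), (b, a): normal, not self-adjoint
M = [[1, 1], [0, 1]]       # not normal
v = [3, 4j]
```

With these choices `norm(matvec(N, v))` and `norm(matvec(conj_transpose(N), v))` coincide, whereas the corresponding two norms for `M` differ, as the theorem predicts.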
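Corollary 9.2.2. Let $T \in \mathcal{L}(V)$ be normal. Then
(a) If $v \in V$ is an eigenvector of $T$ with eigenvalue $\lambda \in F$,
then $v$ is also an eigenvector of $T^*$ with eigenvalue
$\overline{\lambda}$.
(b) Eigenvectors of $T$ corresponding to distinct eigenvalues are
orthogonal.
Proof. Note that $(T - \lambda I)^* = T^* - \overline{\lambda} I$, and that
$T$ is normal if and only if $T - \lambda I$ is normal. Suppose
$T(v) = \lambda v$. Since $T$ is normal, we have
\[ 0 = \|(T - \lambda I)(v)\|^2
     = \langle v, (T^* - \overline{\lambda} I)(T - \lambda I)(v) \rangle
     = \|(T^* - \overline{\lambda} I)(v)\|^2. \]
Part (a) follows.
For part (b), suppose $\lambda$ and $\mu$ are distinct eigenvalues with
associated eigenvectors $u$ and $v$, respectively. So $T(u) = \lambda u$
and $T(v) = \mu v$, and from part (a), $T^*(v) = \overline{\mu}\, v$. Hence
\begin{align*}
(\lambda - \mu) \langle u, v \rangle
  &= \langle \lambda u, v \rangle - \langle u, \overline{\mu}\, v \rangle \\
  &= \langle T(u), v \rangle - \langle u, T^*(v) \rangle \\
  &= 0.
\end{align*}
Because $\lambda \neq \mu$, the above equation implies that
$\langle u, v \rangle = 0$.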
The next theorem is one of the truly important results from the theory of
complex inner product spaces. Be sure to compare it with the Real Spectral
Theorem.
Theorem 9.2.3. Complex Spectral Theorem Let $V$ be a finite dimensional
complex inner product space and $T \in \mathcal{L}(V)$. Then $T$ is normal
if and only if $V$ has an orthonormal basis consisting of eigenvectors of
$T$.
Proof. First suppose that $V$ has an orthonormal basis $B$ consisting of
eigenvectors of $T$, so that $[T]_B = A$ is a diagonal matrix. Then $A^*$
is also diagonal and is the matrix $A^* = [T^*]_B$. Since any two diagonal
matrices commute, $AA^* = A^*A$. This implies that $T$ is normal.
For the converse, suppose that $T$ is normal. Since $V$ is a complex vector
space, we know (by Schur's Theorem) that there is an orthonormal basis
$B = (e_1, \dots, e_n)$ for which $A = [T]_B$ is upper triangular. If
$A = (a_{ij})$, then $a_{ij} = 0$ whenever $i > j$. Also
$T(e_1) = a_{11} e_1$, so
\begin{align*}
\|T(e_1)\|^2 &= |a_{11}|^2, \\
\|T^*(e_1)\|^2 &= |a_{11}|^2 + |a_{12}|^2 + \cdots + |a_{1n}|^2.
\end{align*}
Because $T$ is normal, $\|T(e_1)\| = \|T^*(e_1)\|$. So the two equations
above imply that all entries in the first row of $A$, except possibly the
diagonal entry $a_{11}$, equal 0. It now follows that
$T(e_2) = a_{12} e_1 + a_{22} e_2 = a_{22} e_2$, so
\begin{align*}
\|T(e_2)\|^2 &= |a_{22}|^2, \\
\|T^*(e_2)\|^2 &= |a_{22}|^2 + |a_{23}|^2 + \cdots + |a_{2n}|^2.
\end{align*}
Because $T$ is normal, $\|T(e_2)\| = \|T^*(e_2)\|$. Thus the two equations
just above imply that all the entries in the second row of $A$, except
possibly the diagonal entry $a_{22}$, must equal 0. Continuing in this
fashion we see that all the nondiagonal entries of $A$ equal 0, i.e., $A$
is diagonal.
Corollary 9.2.4. Let $A$ be a normal, $n \times n$ complex matrix. Then
there is a unitary matrix $P$ such that $P^{-1}AP$ is diagonal.
It follows that the minimal polynomial of $A$ has no repeated roots.
Theorem 9.2.5. Let $A$ be $n \times n$. Then $A$ is normal if and only if
the eigenspaces of $AA^*$ are $A$-invariant.
Proof. First suppose that $A$ is normal, so $AA^* = A^*A$, and suppose that
$AA^* x = \lambda x$. We show that $Ax$ also belongs to $\lambda$ for
$AA^*$:
$AA^*(Ax) = A(A^*A)x = A(AA^* x) = A \cdot \lambda x = \lambda (Ax)$. For
the converse, suppose the eigenspaces of $AA^*$ are $A$-invariant. We want
to show that $A$ is normal. We start with the easy case.
Lemma 9.2.6. Suppose
$BB^* = \mathrm{diag}(\lambda_1, \dots, \lambda_k, 0, \dots, 0)$ is a
diagonal matrix, and suppose that the eigenspaces of $BB^*$ are
$B$-invariant. Then $B$ is normal.
Proof of Lemma: $u = (0, \dots, 0, u_{k+1}, \dots, u_n)^T$ is a typical
element of the null space of $BB^*$, i.e., the eigenspace belonging to the
eigenvalue 0. First note that the bottom $n - k$ rows of $B$ must be zero,
since the $(i, i)$ entry of $BB^*$ is the inner product of the $i$th row of
$B$ with itself, which must be 0 if $i \ge k + 1$. So by hypothesis
$B(0, \dots, 0, u_{k+1}, \dots, u_n)^T = (0, \dots, 0, v_{k+1}, \dots, v_n)^T$.
Since the top $k$ entries of $B(0, \dots, 0, u_{k+1}, \dots, u_n)^T$ must be
zero, the entries in the top $k$ rows and last $n - k$ columns must be zero,
so
\[ B = \begin{pmatrix} B_1 & 0 \\ 0 & 0 \end{pmatrix}, \]
where $B_1$ is $k \times k$ with rank $k$.
For $1 \le i \le k$, the standard basis vector $e_i$ is an eigenvector of
$BB^*$ belonging to $\lambda_i$. And by hypothesis,
$BB^* \cdot B e_i = \lambda_i B e_i$. So
$B \cdot BB^* e_i = B \cdot \lambda_i e_i$, implying that
$[B^2 B^* - BB^* B] e_i = \vec{0}$. But this latter equality is also seen to
hold for $k + 1 \le i \le n$. Hence $B^2 B^* = BB^* B$. Now using block
multiplication, $B_1^2 B_1^* = B_1 B_1^* B_1$, implying that
$B_1 B_1^* = B_1^* B_1$. But this implies that $BB^* = B^* B$. So $B$ is
normal, proving the lemma.


Now return to the general case. Let $u_1, \dots, u_n$ be an orthonormal
basis of eigenvectors of $AA^*$. Use these vectors as the columns of a
matrix $U$, so that
$U^*(AA^*)U = \mathrm{diag}(\lambda_1, \dots, \lambda_k, 0 = \lambda_{k+1}, \dots, 0 = \lambda_n)$,
where $0 \neq \lambda_1 \cdots \lambda_k$ and $k = \mathrm{rank}(A)$. Our
hypothesis says that if $AA^* u_j = \lambda_j u_j$, then
$AA^* \cdot A u_j = \lambda_j \cdot A u_j$. Put $B = U^* A U$. So
$BB^* = U^* A U U^* A^* U = U^*(AA^*)U = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$.
Compute
$BB^*(U^* u_j) = U^* AA^* u_j = U^* \cdot \lambda_j u_j = \lambda_j U^* u_j$.
So $U^* u_j = e_j$ is an eigenvector of $BB^*$ belonging to $\lambda_j$.
Also
\begin{align*}
(BB^*) B U^* u_j &= U^* A U U^* A^* U U^* A U U^* u_j = U^* AA^* A u_j \\
                 &= U^* \lambda_j A u_j = \lambda_j U^* A U U^* u_j = \lambda_j B (U^* u_j).
\end{align*}
So the eigenspaces of $BB^*$ are $B$-invariant. Since $BB^*$ is diagonal,
by the Lemma we know $B$ is normal. But now it follows easily that
$A = UBU^*$ must be normal.
9.3 Decomposition of Real Normal Operators
Throughout this section V is a real inner product space.
Lemma 9.3.1. Suppose $\dim(V) = 2$ and $T \in \mathcal{L}(V)$. Then the
following are equivalent:
(a) $T$ is normal but not self-adjoint.
(b) The matrix of $T$ with respect to every orthonormal basis of $V$ has
the form $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$, with $b \neq 0$.
(c) The matrix of $T$ with respect to some orthonormal basis of $V$ has the
form $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$, with $b > 0$.
Proof. First suppose (a) holds and let $\mathcal{S} = (e_1, e_2)$ be an
orthonormal basis of $V$. Suppose
\[ [T]_{\mathcal{S}} = \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \]
Then $\|T(e_1)\|^2 = a^2 + b^2$ and $\|T^*(e_1)\|^2 = a^2 + c^2$. Because
$T$ is normal, by Theorem 9.2.1 $\|T(e_1)\| = \|T^*(e_1)\|$. Hence
$b^2 = c^2$. Since $T$ is not self-adjoint, $b \neq c$, so we have
$c = -b$. Then
$[T^*]_{\mathcal{S}} = \begin{pmatrix} a & b \\ -b & d \end{pmatrix}$. So
\[ [TT^*]_{\mathcal{S}} = \begin{pmatrix} a^2 + b^2 & ab - bd \\ ab - bd & b^2 + d^2 \end{pmatrix},
   \quad \text{and} \quad
   [T^*T]_{\mathcal{S}} = \begin{pmatrix} a^2 + b^2 & -ab + bd \\ -ab + bd & b^2 + d^2 \end{pmatrix}. \]
Since $T$ is normal it follows that $b(a - d) = 0$. Since $T$ is not
self-adjoint, $b \neq 0$, implying that $a = d$, completing the proof that
(a) implies (b).
Now suppose that (b) holds and let $B = (e_1, e_2)$ be any orthonormal
basis of $V$. Then either $B$ or $B' = (e_1, -e_2)$ will be a basis of the
type needed to show that (c) is satisfied.
Finally, suppose that (c) holds, i.e., there is an orthonormal basis
$B = (e_1, e_2)$ such that $[T]_B$ has the form given in (c). Clearly
$T \neq T^*$, but a simple computation with the matrices representing $T$
and $T^*$ shows that $TT^* = T^*T$, i.e., $T$ is normal, implying that (a)
holds.


Theorem 9.3.2. Suppose that $T \in \mathcal{L}(V)$ is normal and $U$ is a
$T$-invariant subspace. Then
(a) $U^\perp$ is $T$-invariant.
(b) $U$ is $T^*$-invariant.
(c) $(T|_U)^* = (T^*)|_U$.
(d) $T|_U$ is a normal operator on $U$.
(e) $T|_{U^\perp}$ is a normal operator on $U^\perp$.
Proof. Let $B' = (e_1, \dots, e_m)$ be an orthonormal basis of $U$ and
extend it to an orthonormal basis
$B = (e_1, \dots, e_m, f_1, \dots, f_n)$ of $V$. Since $U$ is
$T$-invariant,
\[ [T]_B = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix},
   \quad \text{where } A = [T|_U]_{B'}. \]
For each $j$, $1 \le j \le m$, $\|T(e_j)\|^2$ equals the sum of the squares
of the absolute values of the entries in the $j$th column of $A$. Hence
\[ \sum_{j=1}^{m} \|T(e_j)\|^2 = \text{the sum of the squares of the absolute values of the entries of } A. \tag{9.3} \]
For each $j$, $1 \le j \le m$, $\|T^*(e_j)\|^2$ equals the sum of the
squares of the absolute values of the entries in the $j$th rows of $A$ and
$B$. Hence
\[ \sum_{j=1}^{m} \|T^*(e_j)\|^2 = \text{the sum of the squares of the absolute values of the entries of } A \text{ and } B. \tag{9.4} \]
Because $T$ is normal, $\|T(e_j)\| = \|T^*(e_j)\|$ for each $j$. It follows
that the entries of $B$ must all be 0, so
\[ [T]_B = \begin{pmatrix} A & 0 \\ 0 & C \end{pmatrix}. \]
This shows that $U^\perp$ is $T$-invariant, proving (a).
But now we see that
\[ [T^*]_B = \begin{pmatrix} A^* & 0 \\ 0 & C^* \end{pmatrix}, \]
implying that $U$ is $T^*$-invariant. This completes a proof of (b).
Now let $S = T|_U$. Fix $v \in U$. Then
\[ \langle S(u), v \rangle = \langle T(u), v \rangle = \langle u, T^*(v) \rangle \quad \forall u \in U. \]
Because $T^*(v) \in U$ (by (b)), the equation above shows that
$S^*(v) = T^*(v)$, i.e., $(T|_U)^* = (T^*)|_U$, completing the proof of
(c). Parts (d) and (e) now follow easily.
At this point the reader should review the concept of block multiplication
for matrices partitioned into blocks of the appropriate sizes. In
particular, if $A$ and $B$ are two block diagonal matrices each with $m$
blocks down the diagonal, with the $j$th block being $n_j \times n_j$, then
the product $AB$ is also block diagonal. Suppose that $A_j$, $B_j$ are the
$j$th blocks of $A$ and $B$, respectively. Then the $j$th block of $AB$ is
$A_j B_j$:
\[
\begin{pmatrix}
A_1 & 0 & \cdots & 0 \\
0 & A_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_m
\end{pmatrix}
\cdot
\begin{pmatrix}
B_1 & 0 & \cdots & 0 \\
0 & B_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & B_m
\end{pmatrix}
=
\begin{pmatrix}
A_1 B_1 & 0 & \cdots & 0 \\
0 & A_2 B_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_m B_m
\end{pmatrix}.
\]
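The block multiplication rule is easy to confirm on a small case. The sketch below stores matrices as lists of rows and assembles block diagonal matrices from lists of square blocks; the helper names `block_diag` and `matmul` and the sample blocks are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def block_diag(blocks):
    """Assemble square blocks into one block diagonal matrix."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                M[offset + i][offset + j] = entry
        offset += len(b)
    return M

# Two block diagonal matrices with a 1-by-1 block followed by a 2-by-2 block.
A_blocks = [[[2]], [[1, -1], [1, 1]]]
B_blocks = [[[5]], [[0, 3], [3, 0]]]

lhs = matmul(block_diag(A_blocks), block_diag(B_blocks))              # full product AB
rhs = block_diag([matmul(a, b) for a, b in zip(A_blocks, B_blocks)])  # blockwise products
```

Multiplying the full matrices and multiplying block by block give the same result, provided corresponding blocks have matching sizes.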
We have seen the example $T(x, y) = (-y, x)$ of an operator on
$\mathbb{R}^2$ that is normal but has no eigenvalues, so has no diagonal
matrix. However, the following theorem says that normal operators have
block-diagonal matrices with blocks of size at most 2 by 2.
Theorem 9.3.3. Suppose that $V$ is a real inner product space and
$T \in \mathcal{L}(V)$. Then $T$ is normal if and only if there is an
orthonormal basis of $V$ with respect to which $T$ has a block diagonal
matrix where each block is a 1-by-1 matrix or a 2-by-2 matrix of the form
\[ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \tag{9.5} \]
with $b > 0$.
Proof. First suppose that $V$ has an orthonormal basis $\mathcal{B}$ for which $[T]_{\mathcal{B}}$ is block diagonal of the type described in the theorem. Since a matrix of the form given in Eq. 9.5 commutes with its adjoint, clearly $T$ is also normal.

For the converse, suppose that $T$ is normal. Our proof proceeds by induction on the dimension $n$ of $V$. For $n = 1$ the result is obvious. For $n = 2$, if $T$ is self-adjoint the result follows from the Real Spectral Theorem; if $T$ is not self-adjoint, use Lemma 9.3.1. Now assume that $n = \dim(V) > 2$ and that the desired result holds on vector spaces of dimension smaller than $n$. By Theorem 7.3.1 we may let $U$ be a $T$-invariant subspace of dimension 1 if there is one. If there is not, then we let $U$ be a 2-dimensional $T$-invariant subspace. First, if $\dim(U) = 1$, let $e_1$ be a vector in $U$ with norm 1. So $\mathcal{B}' = (e_1)$ is an orthonormal basis of $U$. Clearly the matrix $[T|_U]_{\mathcal{B}'}$ is 1-by-1. If $\dim(U) = 2$, then $T|_U$ is normal (by Theorem 9.3.2), but not self-adjoint (since otherwise $T|_U$, and hence $T$, would have an eigenvector in $U$ by Lemma 9.1.4). So we may choose an orthonormal basis of $U$ with respect to which the matrix of $T|_U$ has the desired form. We know that $U^\perp$ is $T$-invariant and $T|_{U^\perp}$ is a normal operator on $U^\perp$. By our induction hypothesis there is an orthonormal basis of $U^\perp$ of the desired type. Putting together the bases of $U$ and $U^\perp$ we obtain an orthonormal basis of $V$ of the desired type.
9.4 Positive Operators
In this section $V$ is a finite dimensional inner product space. An operator $T \in \mathcal{L}(V)$ is said to be a positive operator provided
$$T = T^* \quad\text{and}\quad \langle T(v), v\rangle \ge 0 \quad \forall v \in V.$$
Note that if $V$ is a complex space, then having $\langle T(v), v\rangle \in \mathbb{R}$ for all $v \in V$ is sufficient to force $T$ to be self-adjoint (by Theorem 9.1.1, part (c)). So $\langle T(v), v\rangle \ge 0$ for all $v \in V$ is sufficient to force $T$ to be positive.

There are many examples of positive operators. If $P$ is any orthogonal projection, then $P$ is positive. (You should verify this!) In the proof of Lemma 9.1.3 we showed that if $T \in \mathcal{L}(V)$ is self-adjoint and if $\alpha, \beta \in \mathbb{R}$ are such that $\alpha^2 < 4\beta$, then $T^2 + \alpha T + \beta I$ is positive. You should think about the analogy between positive operators (among all operators) and the nonnegative real numbers among all complex numbers. This will be made easier by the theorem that follows, which collects the main facts about positive operators.
If $S, T \in \mathcal{L}(V)$ and $S^2 = T$, we say that $S$ is a square root of $T$.

Theorem 9.4.1. Let $T \in \mathcal{L}(V)$. Then the following are equivalent:
(a) $T$ is positive;
(b) $T$ is self-adjoint and all the eigenvalues of $T$ are nonnegative;
(c) $T$ has a positive square root;
(d) $T$ has a self-adjoint square root;
(e) There exists an operator $S \in \mathcal{L}(V)$ such that $T = S^*S$.
Proof. We will prove that (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) $\Rightarrow$ (e) $\Rightarrow$ (a).

Suppose that (a) holds, i.e., $T$ is positive, so in particular $T$ is self-adjoint. Let $v$ be a nonzero eigenvector belonging to the eigenvalue $\lambda$. Then
$$0 \le \langle T(v), v\rangle = \langle \lambda v, v\rangle = \lambda\langle v, v\rangle,$$
implying that $\lambda$ is a nonnegative number, so (b) holds.

Now suppose that (b) holds, so $T$ is self-adjoint and all the eigenvalues of $T$ are nonnegative. By the Real and Complex Spectral Theorems, there is an orthonormal basis $\mathcal{B} = (e_1, \ldots, e_n)$ of $V$ consisting of eigenvectors of $T$. Say $T(e_j) = \lambda_j e_j$, where each $\lambda_j \ge 0$. Define $S \in \mathcal{L}(V)$ by
$$S(e_j) = \sqrt{\lambda_j}\, e_j, \quad 1 \le j \le n.$$
Since $S$ has a real diagonal matrix with respect to the basis $\mathcal{B}$, clearly $S$ is self-adjoint, and $S^2(e_j) = \lambda_j e_j = T(e_j)$ for each $j$, so $S^2 = T$. Now suppose $v = \sum_{j=1}^{n} a_j e_j$. Then $\langle S(v), v\rangle = \sum_{j=1}^{n} \sqrt{\lambda_j}\,|a_j|^2 \ge 0$, so $S$ is a positive square root of $T$. This shows (b) $\Rightarrow$ (c).

Clearly (c) implies (d), since by definition every positive operator is self-adjoint. So suppose (d) holds and let $S$ be a self-adjoint operator with $T = S^2$. Since $S$ is self-adjoint, $T = S^*S$, proving that (e) holds.

Finally, suppose that $T = S^*S$ for some $S \in \mathcal{L}(V)$. It is easy to check that $T^* = T$, and then $\langle T(v), v\rangle = \langle S^*S(v), v\rangle = \langle S(v), S(v)\rangle \ge 0$, showing that (e) implies (a).
An operator can have many square roots. For example, in addition to $\pm I$ being square roots of $I$, for each $a \in F$ and for each nonzero $b \in F$, we have that
$$A = \begin{pmatrix} a & b \\ \frac{1-a^2}{b} & -a \end{pmatrix} \quad\text{satisfies}\quad A^2 = I.$$
However, things are different if we restrict our attention to positive operators.
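The claim $A^2 = I$ is easy to confirm with exact rational arithmetic. This is a small sketch; the function names and the sample values of $a$ and $b$ are illustrative assumptions.

```python
from fractions import Fraction

def square_root_of_identity(a, b):
    """Return the 2x2 matrix [[a, b], [(1 - a^2)/b, -a]] with exact entries."""
    a, b = Fraction(a), Fraction(b)
    return [[a, b], [(1 - a * a) / b, -a]]

def matmul2(X, Y):
    """Multiply two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

# A few sample parameters; any a and any nonzero b should work.
roots = [square_root_of_identity(a, b) for a in (0, 2, -3) for b in (1, 5, -7)]
all_square_to_identity = all(matmul2(A, A) == [[1, 0], [0, 1]] for A in roots)
print(all_square_to_identity)
```

None of these matrices with $a \ne \pm 1$ is positive, since the trace of each is 0 while its eigenvalues are $1$ and $-1$.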
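Theorem 9.4.2. Each positive operator on $V$ has a unique positive square root.

Proof. Let $T \in \mathcal{L}(V)$ be positive with distinct (nonnegative) eigenvalues $\lambda_1, \ldots, \lambda_m$. Since $T$ is self-adjoint, we know by Theorem 7.2.8 (and the Spectral Theorems) that
$$V = \mathrm{null}(T - \lambda_1 I) \oplus \cdots \oplus \mathrm{null}(T - \lambda_m I).$$
By the preceding theorem we know that $T$ has a positive square root $S$. Suppose $\lambda$ is an eigenvalue of $S$; then $\lambda \ge 0$ because $S$ is positive. If $v \in \mathrm{null}(S - \lambda I)$, then $T(v) = S^2(v) = \lambda^2 v$, so $\lambda^2$ is some eigenvalue of $T$, i.e., $\lambda = \sqrt{\lambda_i}$ for some $i$. Clearly
$$\mathrm{null}(S - \sqrt{\lambda_j}\, I) \subseteq \mathrm{null}(T - \lambda_j I).$$
Since the only possible eigenvalues of $S$ are $\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_m}$, and because $S$ is self-adjoint, we also know that
$$V = \mathrm{null}(S - \sqrt{\lambda_1}\, I) \oplus \cdots \oplus \mathrm{null}(S - \sqrt{\lambda_m}\, I).$$
A dimension argument then shows that
$$\mathrm{null}(S - \sqrt{\lambda_j}\, I) = \mathrm{null}(T - \lambda_j I)$$
for each $j$. In other words, on $\mathrm{null}(T - \lambda_j I)$, the operator $S$ is just multiplication by $\sqrt{\lambda_j}$. Thus $S$, the positive square root of $T$, is uniquely determined by $T$.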
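As an illustration of the construction in the two preceding proofs, here is a worked example for one particular matrix (chosen here for convenience, not taken from the text). Let $T$ be the operator on $\mathbb{R}^2$ with matrix $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ in the standard basis.

```latex
% Eigenvalues 3 and 1, with orthonormal eigenvectors
%   e_1 = (1,1)^T/\sqrt{2}, \qquad e_2 = (1,-1)^T/\sqrt{2}.
% The positive square root acts as \sqrt{3} on span(e_1) and as 1 on span(e_2):
\[
  S \;=\; \sqrt{3}\, e_1 e_1^{T} + 1\cdot e_2 e_2^{T}
    \;=\; \frac{1}{2}\begin{pmatrix} \sqrt{3}+1 & \sqrt{3}-1 \\ \sqrt{3}-1 & \sqrt{3}+1 \end{pmatrix}.
\]
% Check: the diagonal entries of S^2 are
%   \tfrac14\big((\sqrt3+1)^2 + (\sqrt3-1)^2\big) = \tfrac14 \cdot 8 = 2,
% and the off-diagonal entries are
%   \tfrac14 \cdot 2(\sqrt3+1)(\sqrt3-1) = \tfrac14 \cdot 4 = 1,
% so S^2 = T.
```

By Theorem 9.4.2 this $S$ is the only positive square root of $T$, although $T$ has other square roots (for instance $-S$).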
9.5 Isometries
An operator $S \in \mathcal{L}(V)$ is called an isometry provided
$$\|S(v)\| = \|v\| \quad \forall v \in V.$$
For example, $\lambda I$ is an isometry whenever $\lambda \in F$ satisfies $|\lambda| = 1$. More generally, suppose that $\lambda_1, \ldots, \lambda_n$ are scalars with absolute value 1 and $S \in \mathcal{L}(V)$ satisfies $S(e_j) = \lambda_j e_j$ for some orthonormal basis $\mathcal{B} = (e_1, \ldots, e_n)$ of $V$. For $v \in V$ we have
$$v = \langle v, e_1\rangle e_1 + \cdots + \langle v, e_n\rangle e_n \tag{9.6}$$
and (using the Pythagorean theorem)
$$\|v\|^2 = |\langle v, e_1\rangle|^2 + \cdots + |\langle v, e_n\rangle|^2. \tag{9.7}$$
Apply $S$ to both sides of Eq. 9.6:
$$S(v) = \lambda_1\langle v, e_1\rangle e_1 + \cdots + \lambda_n\langle v, e_n\rangle e_n.$$
This last equation along with $|\lambda_j| = 1$ shows that
$$\|S(v)\|^2 = |\langle v, e_1\rangle|^2 + \cdots + |\langle v, e_n\rangle|^2. \tag{9.8}$$
If we compare Eqs. 9.7 and 9.8, we see that $\|v\| = \|S(v)\|$, i.e., $S$ is an isometry. In fact, this is the prototypical isometry, as we shall see. The next theorem collects the main results concerning isometries.
Theorem 9.5.1. Suppose $V$ is an $n$-dimensional inner product space and $S \in \mathcal{L}(V)$. Then the following are equivalent:
(a) $S$ is an isometry;
(b) $\langle S(u), S(v)\rangle = \langle u, v\rangle$ $\forall u, v \in V$;
(c) $S^*S = I$;
(d) $(S(e_1), \ldots, S(e_n))$ is an orthonormal basis of $V$ whenever $(e_1, \ldots, e_n)$ is an orthonormal basis of $V$;
(e) there exists an orthonormal basis $(e_1, \ldots, e_n)$ of $V$ for which $(S(e_1), \ldots, S(e_n))$ is orthonormal;
(f) $S^*$ is an isometry;
(g) $\langle S^*(u), S^*(v)\rangle = \langle u, v\rangle$ $\forall u, v \in V$;
(h) $SS^* = I$;
(i) $(S^*(e_1), \ldots, S^*(e_n))$ is orthonormal whenever $(e_1, \ldots, e_n)$ is an orthonormal list of vectors in $V$;
(j) there exists an orthonormal basis $(e_1, \ldots, e_n)$ of $V$ for which $(S^*(e_1), \ldots, S^*(e_n))$ is orthonormal.
Proof. To start, suppose $S$ is an isometry. If $V$ is a real inner product space, then for all $u, v \in V$, using the real polarization identity we have
$$\langle S(u), S(v)\rangle = \frac{\|S(u) + S(v)\|^2 - \|S(u) - S(v)\|^2}{4} = \frac{\|S(u + v)\|^2 - \|S(u - v)\|^2}{4} = \frac{\|u + v\|^2 - \|u - v\|^2}{4} = \langle u, v\rangle.$$
If $V$ is a complex inner product space, use the complex polarization identity in the same fashion. In either case we see that (a) implies (b).

Now suppose that (b) holds. Then
$$\langle (S^*S - I)(u), v\rangle = \langle S(u), S(v)\rangle - \langle u, v\rangle = 0$$
for every $u, v \in V$. In particular, if $v = (S^*S - I)(u)$, then necessarily $(S^*S - I)(u) = 0$ for all $u \in V$, forcing $S^*S = I$. Hence (b) implies (c).

Suppose that (c) holds and let $(e_1, \ldots, e_n)$ be an orthonormal list of vectors in $V$. Then
$$\langle S(e_j), S(e_k)\rangle = \langle S^*S(e_j), e_k\rangle = \langle e_j, e_k\rangle.$$
Hence $(S(e_1), \ldots, S(e_n))$ is orthonormal, proving that (c) implies (d). Clearly (d) implies (e).

Suppose that $(e_1, \ldots, e_n)$ is an orthonormal basis of $V$ for which $(S(e_1), \ldots, S(e_n))$ is orthonormal. For $v \in V$,
$$\|S(v)\|^2 = \Big\|S\Big(\sum_{i=1}^{n} \langle v, e_i\rangle e_i\Big)\Big\|^2 = \sum_{i=1}^{n} \|\langle v, e_i\rangle S(e_i)\|^2 = \sum_{i=1}^{n} |\langle v, e_i\rangle|^2 = \|v\|^2.$$
Taking square roots we see that (e) implies (a).

We have now shown that (a) through (e) are equivalent. Hence replacing $S$ by $S^*$ we have that (f) through (j) are equivalent. Clearly (c) and (h) are equivalent, so the proof of the theorem is complete.
The preceding theorem shows that an isometry is necessarily normal (see
parts (a), (c) and (h)). Using the characterization of normal operators proved
earlier we can now give a complete description of all isometries. But as usual,
there are separate statements for the real and the complex cases.
Theorem 9.5.2. Let $V$ be a (finite dimensional) complex inner product space and let $S \in \mathcal{L}(V)$. Then $S$ is an isometry if and only if there is an orthonormal basis of $V$ consisting of eigenvectors of $S$ all of whose corresponding eigenvalues have absolute value 1.

Proof. The example given at the beginning of this section shows that the condition given in the theorem is sufficient for $S$ to be an isometry. For the converse, suppose that $S \in \mathcal{L}(V)$ is an isometry. By the complex spectral theorem there is an orthonormal basis $(e_1, \ldots, e_n)$ of $V$ consisting of eigenvectors of $S$. For $1 \le j \le n$, let $\lambda_j$ be the eigenvalue corresponding to $e_j$. Then
$$|\lambda_j| = \|\lambda_j e_j\| = \|S(e_j)\| = \|e_j\| = 1.$$
Hence each eigenvalue of $S$ has absolute value 1, completing the proof.
The next result states that every isometry on a real inner product space is the direct sum of pieces that look like rotations on 2-dimensional subspaces, pieces that equal the identity operator, and pieces that equal multiplication by $-1$. It follows that an isometry on an odd-dimensional real inner product space must have 1 or $-1$ as an eigenvalue.

Theorem 9.5.3. Suppose that $V$ is a real inner product space and $S \in \mathcal{L}(V)$. Then $S$ is an isometry if and only if there is an orthonormal basis of $V$ with respect to which $S$ has a block diagonal matrix where each block on the diagonal is a 1-by-1 matrix containing 1 or $-1$, or a 2-by-2 matrix of the form
$$\begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}, \quad\text{with } \theta \in (0, \pi). \tag{9.9}$$

Proof. First suppose that $S$ is an isometry. Because $S$ is normal, there is an orthonormal basis of $V$ such that with respect to this basis $S$ has a block diagonal matrix, where each block is a 1-by-1 matrix or a 2-by-2 matrix of the form
$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \quad\text{with } b > 0. \tag{9.10}$$
If $\lambda$ is an entry in a 1-by-1 block along the diagonal of the matrix of $S$ (with respect to the basis just mentioned), then there is a basis vector $e_j$ such that $S(e_j) = \lambda e_j$. Because $S$ is an isometry, this implies that $|\lambda| = 1$ with $\lambda$ real, forcing $\lambda = \pm 1$.
Now consider a 2-by-2 matrix of the form in Eq. 9.10 along the diagonal of the matrix of $S$. There are basis vectors $e_j$, $e_{j+1}$ such that
$$S(e_j) = ae_j + be_{j+1}.$$
Thus
$$1 = \|e_j\|^2 = \|S(e_j)\|^2 = a^2 + b^2.$$
This equation, along with the condition that $b > 0$, implies that there exists a number $\theta \in (0, \pi)$ such that $a = \cos(\theta)$ and $b = \sin(\theta)$. Thus the matrix in Eq. 9.10 has the required form. This completes the proof in one direction.

Conversely, suppose that there is an orthonormal basis of $V$ with respect to which the matrix of $S$ has the form stated in the theorem. Then there is a direct sum decomposition
$$V = U_1 \oplus \cdots \oplus U_m,$$
where each subspace $U_j$ is a subspace of $V$ having dimension 1 or 2. Furthermore, any two vectors belonging to distinct $U$'s are orthogonal, and each $S|_{U_j}$ is an isometry mapping $U_j$ into $U_j$. If $v \in V$, write
$$v = \sum_{i=1}^{m} u_i, \quad u_i \in U_i.$$
Applying $S$ to this equation and taking norms gives
$$\|S(v)\|^2 = \|S(u_1) + \cdots + S(u_m)\|^2 = \|S(u_1)\|^2 + \cdots + \|S(u_m)\|^2 = \|u_1\|^2 + \cdots + \|u_m\|^2 = \|v\|^2.$$
This shows that $S$ is an isometry, as desired.
9.6 The Polar Decomposition
Theorem 9.6.1 (Polar Decomposition). If $T \in \mathcal{L}(V)$, then there exists an isometry $S \in \mathcal{L}(V)$ such that
$$T = S\sqrt{T^*T}.$$

Proof. Let $T \in \mathcal{L}(V)$. Then for each $v \in V$,
$$\|T(v)\|^2 = \langle T(v), T(v)\rangle = \langle T^*T(v), v\rangle = \langle \sqrt{T^*T}(v), \sqrt{T^*T}(v)\rangle = \|\sqrt{T^*T}(v)\|^2.$$
So we have established
$$\|T(v)\| = \|\sqrt{T^*T}(v)\| \quad \forall v \in V. \tag{9.11}$$
The next step is to construct a map
$$S_1 : \mathrm{Im}(\sqrt{T^*T}) \to \mathrm{Im}(T) : \sqrt{T^*T}(v) \mapsto T(v),$$
and show that it is a well-defined isometry. For $v_1, v_2 \in V$,
$$\|T(v_1) - T(v_2)\| = \|T(v_1 - v_2)\| = \|\sqrt{T^*T}(v_1 - v_2)\| = \|\sqrt{T^*T}(v_1) - \sqrt{T^*T}(v_2)\|.$$
This shows that $T(v_1) = T(v_2)$ if and only if $\sqrt{T^*T}(v_1) = \sqrt{T^*T}(v_2)$. In fact it shows that $S_1$ is well-defined and is a bijection from $\mathrm{Im}(\sqrt{T^*T})$ onto $\mathrm{Im}(T)$. One consequence of this is that
$$\dim(\mathrm{Im}(\sqrt{T^*T})) = \dim(\mathrm{Im}(T)) \quad\text{and}\quad \dim\big((\mathrm{Im}(\sqrt{T^*T}))^\perp\big) = \dim\big((\mathrm{Im}(T))^\perp\big).$$
It is also easy to check that $S_1$ is linear. Moreover, if $v = \sqrt{T^*T}(u)$, then $\|S_1(v)\| = \|T(u)\| = \|\sqrt{T^*T}(u)\| = \|v\|$, implying that $S_1$ is an isometry.

Now construct an orthonormal basis $(e_1, \ldots, e_m)$ of $(\mathrm{Im}(\sqrt{T^*T}))^\perp$ and an orthonormal basis $(f_1, \ldots, f_m)$ of $(\mathrm{Im}(T))^\perp$. Define
$$S_2 : (\mathrm{Im}(\sqrt{T^*T}))^\perp \to (\mathrm{Im}(T))^\perp$$
by $S_2(e_j) = f_j$ (and extend linearly). It follows that $\|S_2(w)\| = \|w\|$ for all $w \in (\mathrm{Im}(\sqrt{T^*T}))^\perp$. Here $S_2$ is an isometry by part (e) of Theorem 9.5.1.

(Notation: If $U$ and $W$ are independent, orthogonal subspaces of $V$, so $U + W = U \oplus W$ and $U \subseteq W^\perp$, we write $U \perp W$ in place of $U \oplus W$.)

We know that
$$V = \mathrm{Im}(\sqrt{T^*T}) \perp (\mathrm{Im}(\sqrt{T^*T}))^\perp.$$
For $v \in V$, write $v = u + w$ with $u \in \mathrm{Im}(\sqrt{T^*T})$, $w \in (\mathrm{Im}(\sqrt{T^*T}))^\perp$. Define $S : V \to V$ by $S(v) = S_1(u) + S_2(w)$. It is easy to check that $S \in \mathcal{L}(V)$. Moreover, $\|S(v)\|^2 = \|S_1(u) + S_2(w)\|^2 = \|S_1(u)\|^2 + \|S_2(w)\|^2 = \|u\|^2 + \|w\|^2 = \|v\|^2$, implying $S$ is an isometry. (Here we used the fact that $S_1(u) \in \mathrm{Im}(T)$ and $S_2(w) \in (\mathrm{Im}(T))^\perp$.) The only thing left to check is that $T = S\sqrt{T^*T}$, but this is obvious by the way $S_1$ is defined on the image of $\sqrt{T^*T}$.
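We can visualize this proof as follows. Start with $T \in \mathcal{L}(V)$.
$$\begin{array}{ccc} \mathrm{Im}(\sqrt{T^*T}) & \perp & (\mathrm{Im}(\sqrt{T^*T}))^\perp \\ \sqrt{T^*T}(v) & & e_1, \ldots, e_m \\ \downarrow S_1 & & \downarrow S_2 \\ T(v) & & f_1, \ldots, f_m \\ \mathrm{Im}(T) & \perp & (\mathrm{Im}(T))^\perp \end{array}$$
Here $S_1 : \sqrt{T^*T}(v) \mapsto T(v)$ and $S_2 : e_j \mapsto f_j$, and $S = S_1 \oplus S_2$ is an isometry. And

Polar Decomposition: $T = S\sqrt{T^*T}$.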
If $T$ is invertible, then $S = T(\sqrt{T^*T})^{-1}$ is unique. If $T$ is not invertible, then $\mathrm{Im}(T) \ne V$, so $S_2 \ne -S_2$. Hence $S' = S_1 \oplus (-S_2)$ yields a polar decomposition of $T$ distinct from that given by $S$.
The polar decomposition states that each operator on $V$ can be written as the product of an isometry and a positive operator. Thus we can write each operator on $V$ as the product of two operators, each of which is of a type that we have completely described and understand reasonably well. We know there is an orthonormal basis of $V$ with respect to which the isometry $S$ has a diagonal matrix (if $F = \mathbb{C}$) or a block diagonal matrix with blocks of size at most 2-by-2 (if $F = \mathbb{R}$), and there is an orthonormal basis of $V$ with respect to which $\sqrt{T^*T}$ has a diagonal matrix. Unfortunately, there may not be one orthonormal basis that does both at the same time. However, we can still say something interesting. This is given by the singular value decomposition, which is discussed in the next section.
9.7 The Singular-Value Decomposition
Statement of the General Result
Let $F$ be a subfield of the complex number field $\mathbb{C}$, and let $A \in M_{m,n}(F)$. Clearly $A^*A$ is self-adjoint, so there is an orthonormal basis $(v_1, \ldots, v_n)$ for $F^n$ (whose elements we view as column vectors) consisting of eigenvectors of $A^*A$ with associated eigenvalues $\lambda_1, \ldots, \lambda_n$. Let $\langle\ ,\ \rangle$ denote the usual inner product on $F^n$. Since
$$\|Av_i\|^2 = \langle Av_i, Av_i\rangle = \langle A^*Av_i, v_i\rangle = \lambda_i\langle v_i, v_i\rangle = \lambda_i\|v_i\|^2 = \lambda_i,$$
we have proved the following lemma.

Lemma 9.7.1. With the notation as above,
(i) Each eigenvalue $\lambda_i$ of $A^*A$ is real and nonnegative. Hence WLOG we may assume that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$.
(ii) With $s_i = \sqrt{\lambda_i}$ for $1 \le i \le n$, we say that the $s_i$, $1 \le i \le n$, are the singular values of $A$. Hence the singular values of $A$ are the lengths of the vectors $Av_1, \ldots, Av_n$.
This is enough for us to state the theorem concerning the Singular Value Decomposition of $A$.

Theorem 9.7.2. Given $A \in M_{m,n}(F)$ as above, there are unitary matrices $U \in M_m(F)$ and $V \in M_n(F)$ such that
$$A = U\Sigma V^*, \quad\text{where}\quad \Sigma = \begin{pmatrix} s_1 & & & \\ & \ddots & & \\ & & s_r & \\ & & & 0 \end{pmatrix} \tag{9.12}$$
is a diagonal $m \times n$ matrix (the final 0 standing for an $(m-r) \times (n-r)$ zero block) and $s_1 \ge s_2 \ge \cdots \ge s_r$ are the positive (i.e., nonzero) singular values of $A$.
(i) The columns of $U$ are eigenvectors of $AA^*$, the first $r$ columns of $U$ form an orthonormal basis for the column space $\mathrm{col}(A)$ of $A$, and the last $m - r$ columns of $U$ form an orthonormal basis for the (right) null space of the matrix $A^*$.
(ii) The columns of $V$ are eigenvectors of $A^*A$, the last $n - r$ columns of $V$ form an orthonormal basis for the (right) null space $\mathrm{null}(A)$ of $A$, and the first $r$ columns of $V$ form an orthonormal basis for the column space $\mathrm{col}(A^*)$.
(iii) The rank of $A$ is $r$.
Proof. Start by supposing that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0$. Since $\|Av_i\| = \sqrt{\lambda_i} = s_i$, clearly $Av_i \ne 0$ iff $1 \le i \le r$. Also, if $i \ne j$, then $\langle Av_i, Av_j\rangle = \langle A^*Av_i, v_j\rangle = \lambda_i\langle v_i, v_j\rangle = 0$. So $(Av_1, \ldots, Av_n)$ is an orthogonal list, and $(Av_1, \ldots, Av_r)$ is a linearly independent list that spans a subspace of $\mathrm{col}(A)$. On the other hand suppose that $y = Ax \in \mathrm{col}(A)$. Then $x = \sum_{i=1}^{n} c_i v_i$ for some scalars $c_i$, so $y = Ax = \sum_{i=1}^{n} c_i Av_i = \sum_{i=1}^{r} c_i Av_i$. It follows that $(Av_1, \ldots, Av_r)$ is a basis for $\mathrm{col}(A)$, implying that $r = \dim(\mathrm{col}(A)) = \mathrm{rank}(A)$. Of course, then the right null space of $A$ has dimension $n - r$.

For $1 \le i \le r$, put $u_i = \frac{Av_i}{\|Av_i\|} = \frac{1}{s_i}Av_i$, so that $Av_i = s_i u_i$. Now extend $(u_1, \ldots, u_r)$ to an orthonormal basis $(u_1, \ldots, u_m)$ of $F^m$. Let $U = [u_1, \ldots, u_m]$ be the matrix whose $j$th column is $u_j$. Similarly, put $V = [v_1, \ldots, v_n]$, so $U$ and $V$ are unitary matrices. And
$$AV = [Av_1, \ldots, Av_r, 0, \ldots, 0] = [s_1u_1, \ldots, s_ru_r, 0, \ldots, 0] = [u_1, \ldots, u_m]\begin{pmatrix} s_1 & & & \\ & \ddots & & \\ & & s_r & \\ & & & 0 \end{pmatrix} = U\Sigma,$$
where $\Sigma$ is diagonal, $m \times n$, with the singular values of $A$ along the diagonal. Then $AV = U\Sigma$ implies $A = U\Sigma V^*$, $A^* = V\Sigma^*U^*$, so $A^*A = V\Sigma^*U^*U\Sigma V^* = V\Sigma^*\Sigma V^*$. Then
$$(A^*A)V = V\Sigma^*\Sigma V^*V = V\Sigma^*\Sigma = V\begin{pmatrix} \lambda_1 & & & \\ & \ddots & & \\ & & \lambda_r & \\ & & & 0 \end{pmatrix} = (\lambda_1v_1, \ldots, \lambda_rv_r, 0, \ldots, 0).$$
For $j > r$ we have $(A^*A)v_j = 0$, which implies $v_j^*A^*Av_j = 0$, forcing $Av_j = 0$. Hence it is clear that $v_{r+1}, \ldots, v_n$ form an orthonormal basis for $\mathrm{null}(A)$. Since $\lambda_jv_j = (A^*A)v_j = A^*s_ju_j$ for $1 \le j \le r$, we see $A^*u_j = s_jv_j$. It now follows readily that $v_1, \ldots, v_r$ form a basis for the column space of $A^*$.

Similarly, $AA^* = U\Sigma\Sigma^*U^*$, so that
$$(AA^*)U = U\Sigma\Sigma^* = (\lambda_1u_1, \ldots, \lambda_ru_r, 0, \ldots, 0).$$
It follows that $(u_1, \ldots, u_m)$ consists of eigenvectors of $AA^*$. Also, $u_{r+1}, \ldots, u_m$ form an orthonormal basis for the (right) null space of $A^*$, and $u_1, \ldots, u_r$ form an orthonormal basis for $\mathrm{col}(A)$.
Recapitulation

We want to practice recognizing and/or finding a singular value decomposition for a given $m \times n$ matrix $A$ over the subfield $F$ of $\mathbb{C}$.
$$A = U\Sigma V^* \quad\text{if and only if}\quad A^* = V\Sigma^*U^*. \tag{9.13}$$
Here $\Sigma$ is $m \times n$ and diagonal with the nonzero singular values $s_1 \ge s_2 \ge \cdots \ge s_r$ down the main diagonal as its only nonzero entries. Similarly, $\Sigma^*$ is $n \times m$ and diagonal with $s_1 \ge \cdots \ge s_r$ down the main diagonal as its only nonzero entries. So the nonzero eigenvalues of $A^*A$ are identical to the nonzero eigenvalues of $AA^*$. The only difference is the multiplicity of 0 as an eigenvalue.
$$\text{For } 1 \le i \le r, \quad Av_i = s_iu_i \quad\text{and}\quad A^*u_i = s_iv_i. \tag{9.14}$$
$$(v_1, \ldots, v_r, v_{r+1}, \ldots, v_n) \text{ is an orthonormal basis of eigenvectors of } A^*A. \tag{9.15}$$
$$(u_1, \ldots, u_r, u_{r+1}, \ldots, u_m) \text{ is an orthonormal basis of eigenvectors of } AA^*. \tag{9.16}$$
$$(v_1, \ldots, v_r) \text{ is a basis for } \mathrm{col}(A^*) \text{ and } (v_{r+1}, \ldots, v_n) \text{ is a basis for } \mathrm{null}(A). \tag{9.17}$$
$$(u_1, \ldots, u_r) \text{ is a basis for } \mathrm{col}(A) \text{ and } (u_{r+1}, \ldots, u_m) \text{ is a basis for } \mathrm{null}(A^*). \tag{9.18}$$
We can first find $(v_1, \ldots, v_n)$ as an orthonormal basis of eigenvectors of $A^*A$, put $u_i = \frac{1}{s_i}Av_i$ for $1 \le i \le r$, and then complete $(u_1, \ldots, u_r)$ to an orthonormal basis $(u_1, \ldots, u_m)$ of eigenvectors of $AA^*$. Sometimes here it is efficient to use the fact that $(u_{r+1}, \ldots, u_m)$ form a basis for the null space of $AA^*$ or of $A^*$. If $n < m$, this is the approach usually taken.

Alternatively, we can find $(u_1, \ldots, u_m)$ as an orthonormal basis of eigenvectors of $AA^*$ (with eigenvalues ordered from largest to smallest), put $v_i = \frac{1}{s_i}A^*u_i$ for $1 \le i \le r$, and then complete $(v_1, \ldots, v_r)$ to an orthonormal basis $(v_1, \ldots, v_n)$ of eigenvectors of $A^*A$. If $m < n$, this is the approach usually taken.
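9.7.3 Two Examples

Problem 1. Find the Singular Value Decomposition of $A = \begin{pmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{pmatrix}$.

Solution: $A^*A = A^TA = \begin{pmatrix} 9 & -9 \\ -9 & 9 \end{pmatrix}$. In this case it is easy to find an orthonormal basis of $\mathbb{R}^2$ consisting of eigenvectors of $A^*A$. Put
$$v_1 = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}, \quad v_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}, \quad V = [v_1, v_2].$$
Then $A^*A[v_1, v_2] = [18v_1, 0 \cdot v_2]$. So put
$$u_1 = \frac{Av_1}{\|Av_1\|} = \frac{1}{3\sqrt{2}}\begin{pmatrix} -\sqrt{2} \\ 2\sqrt{2} \\ -2\sqrt{2} \end{pmatrix} = \begin{pmatrix} -\frac{1}{3} \\ \frac{2}{3} \\ -\frac{2}{3} \end{pmatrix} = \text{the first column of } U.$$
It is now easy to see that $w_1 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}$, $w_2 = \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}$ form a basis of $u_1^\perp$. We apply Gram-Schmidt to $(w_1, w_2)$ to obtain
$$u_2 = \begin{pmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \\ 0 \end{pmatrix}, \quad u_3 = \begin{pmatrix} -\frac{2}{\sqrt{45}} \\ \frac{4}{\sqrt{45}} \\ \frac{5}{\sqrt{45}} \end{pmatrix}.$$
Now put $U = [u_1, u_2, u_3]$. It follows that $A = U\Sigma V^*$ is one singular value decomposition of $A$, where
$$\Sigma = \begin{pmatrix} 3\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}.$$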
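Problem 2. Compute a singular value decomposition of $A = \begin{pmatrix} 1 & 0 & i \\ 0 & 1 & -i \end{pmatrix}$.

Solution: In this case $AA^*$ is $2 \times 2$, while $A^*A$ is $3 \times 3$, so we start with $AA^* = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$. Here $AA^*$ has eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 1$. So the singular values of $A^*$ (and hence of $A$) are $s_1 = \sqrt{3}$ and $s_2 = 1$. It follows that
$$\Sigma = \begin{pmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \quad\text{Also}\quad \Sigma^* = \begin{pmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
In this case we choose to compute an orthonormal basis $(u_1, u_2)$ of eigenvectors of $AA^*$. It is simple to check that $AA^* - 3I = \begin{pmatrix} -1 & -1 \\ -1 & -1 \end{pmatrix}$, and we may take $u_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix}$. Similarly, $AA^* - I = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, and we may take $u_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}$. Then $U = [u_1, u_2]$. At this point we must put
$$v_1 = \frac{1}{s_1}A^*u_1 = \frac{1}{\sqrt{3}}\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ -i & i \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ -\frac{2i}{\sqrt{6}} \end{pmatrix},$$
and
$$v_2 = 1 \cdot A^*u_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ -i & i \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}.$$
At this point we know that $(v_3)$ must be an orthonormal basis for $\mathrm{span}(v_1, v_2)^\perp$. So we want $v_3 = (x, y, z)^T$ with $0 = \langle (1, -1, -2i), (x, y, z)\rangle = \bar{x} - \bar{y} - 2i\bar{z}$ and $0 = \langle (1, 1, 0), (x, y, z)\rangle = \bar{x} + \bar{y}$. It follows easily that $(x, y, z) = (-iz, iz, z)$, where we must choose $z$ so that the norm of this vector is 1. If we put $z = \frac{i}{\sqrt{3}}$, then $(x, y, z) = \left(\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right)$. Then
$$A = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & \frac{2i}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & -\frac{i}{\sqrt{3}} \end{pmatrix},$$
which is a singular value decomposition of $A$.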
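A numerical check of Problem 2 is a useful guard against sign errors in the complex entries. The sketch below assumes $A = \begin{pmatrix} 1 & 0 & i \\ 0 & 1 & -i \end{pmatrix}$ with $u_1 = (1, -1)^T/\sqrt{2}$, $v_1 = (1, -1, -2i)^T/\sqrt{6}$, $v_2 = (1, 1, 0)^T/\sqrt{2}$ and $v_3 = (1, -1, i)^T/\sqrt{3}$; helper names are illustrative.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def conj_transpose(X):
    """Conjugate transpose X^* of a complex matrix."""
    return [[complex(X[i][j]).conjugate() for i in range(len(X))]
            for j in range(len(X[0]))]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [[1, 0, 1j], [0, 1, -1j]]
U = [[1 / s2, 1 / s2],                 # columns are u_1, u_2
     [-1 / s2, 1 / s2]]
Sigma = [[s3, 0, 0], [0, 1, 0]]
v1 = [1 / s6, -1 / s6, -2j / s6]
v2 = [1 / s2, 1 / s2, 0]
v3 = [1 / s3, -1 / s3, 1j / s3]
V = [[v1[i], v2[i], v3[i]] for i in range(3)]   # columns are v_1, v_2, v_3
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

reconstruction_error = max_abs_diff(
    matmul(matmul(U, Sigma), conj_transpose(V)), A)
v_unitary_error = max_abs_diff(matmul(conj_transpose(V), V), I3)
print(reconstruction_error, v_unitary_error)
```

Both errors should be at rounding level, so $V$ is unitary and $U\Sigma V^* = A$ for these choices.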
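THE REMAINDER OF THIS SECTION MAY BE CONSIDERED TO BE STARRED.

Definition. If $\lambda$ is an eigenvalue of the matrix $A$, then $\dim(\mathrm{null}(A - \lambda I))$ is called the geometric multiplicity of the eigenvalue $\lambda$.

Theorem 9.7.4. Let $M$ be $m_2 \times m_1$ and $N$ be $m_1 \times m_2$. Put
$$A = \begin{pmatrix} 0 & N \\ M & 0 \end{pmatrix}.$$
Then the following are equivalent:
(i) $\lambda \ne 0$ is an eigenvalue of $A$ with (geometric) multiplicity $f$.
(ii) $-\lambda \ne 0$ is an eigenvalue of $A$ with (geometric) multiplicity $f$.
(iii) $\lambda^2 \ne 0$ is an eigenvalue of $MN$ with (geometric) multiplicity $f$.
(iv) $\lambda^2 \ne 0$ is an eigenvalue of $NM$ with (geometric) multiplicity $f$.

Proof. Step 1. Show (i) $\Leftrightarrow$ (ii). Let $AU = \lambda U$ for some matrix $U$ of rank $f$. Write $U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}$, and put $\tilde{U} = \begin{pmatrix} U_1 \\ -U_2 \end{pmatrix}$, where $U_i$ has $m_i$ rows for $i = 1, 2$. Then $AU = \lambda U$ becomes
$$\begin{pmatrix} 0 & N \\ M & 0 \end{pmatrix}\begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} NU_2 \\ MU_1 \end{pmatrix} = \lambda\begin{pmatrix} U_1 \\ U_2 \end{pmatrix},$$
so $NU_2 = \lambda U_1$ and $MU_1 = \lambda U_2$. This implies
$$A\tilde{U} = \begin{pmatrix} 0 & N \\ M & 0 \end{pmatrix}\begin{pmatrix} U_1 \\ -U_2 \end{pmatrix} = \begin{pmatrix} -NU_2 \\ MU_1 \end{pmatrix} = -\lambda\begin{pmatrix} U_1 \\ -U_2 \end{pmatrix} = -\lambda\tilde{U}.$$
Since $\mathrm{rank}(U) = \mathrm{rank}(\tilde{U})$, the first equivalence follows.

Step 2. Show (iii) $\Leftrightarrow$ (iv). Let $MNU' = \lambda^2U'$ for some matrix $U'$ of rank $f$. Then $(NM)(NU') = \lambda^2NU'$, and $\mathrm{rank}(NU') = \mathrm{rank}(U')$, since $\mathrm{rank}(U') = \mathrm{rank}(\lambda^2U') = \mathrm{rank}(MNU') \le \mathrm{rank}(NU') \le \mathrm{rank}(U')$, using $\lambda \ne 0$. So $NM$ has $\lambda^2$ as eigenvalue with geometric multiplicity at least $f$. Interchanging roles of $N$ and $M$ proves (iii) $\Leftrightarrow$ (iv).

Step 3. Show (i) $\Leftrightarrow$ (iii). Let $AU = \lambda U$ with $U$ having rank $f$, and $\tilde{U} = \begin{pmatrix} U_1 \\ -U_2 \end{pmatrix}$ as in Step 1. Put $n = m_1 + m_2$. Then
$$A^2\begin{pmatrix} U & \tilde{U} \end{pmatrix} = \lambda^2\begin{pmatrix} U & \tilde{U} \end{pmatrix}.$$
Since $NU_2 = \lambda U_1$ and $MU_1 = \lambda U_2$, $U_1$ and $U_2$ have the same row space. So $\mathrm{row}(U) = \mathrm{row}(U_1) = \mathrm{row}(U_2)$. This implies $\mathrm{rank}(U_1) = \mathrm{rank}(U_2) = f$. Using column operations we can transform
$$\begin{pmatrix} U_1 & U_1 \\ U_2 & -U_2 \end{pmatrix} \quad\text{into}\quad \begin{pmatrix} U_1 & 0 \\ 0 & U_2 \end{pmatrix},$$
which has rank $2f$. So $\lambda^2$ is an eigenvalue of $A^2$ with geometric multiplicity at least $2f$.

On the other hand, the geometric multiplicity of an eigenvalue $\lambda^2 \ne 0$ of $A^2$ equals $n - \mathrm{rank}(A^2 - \lambda^2I) = n - \mathrm{rank}((A - \lambda I)(A + \lambda I)) \le n + n - \mathrm{rank}(A - \lambda I) - \mathrm{rank}(A + \lambda I) = 2f$. Since $A^2 = \begin{pmatrix} NM & 0 \\ 0 & MN \end{pmatrix}$, this multiplicity $2f$ is the sum of the geometric multiplicities of $\lambda^2$ as an eigenvalue of $NM$ and of $MN$, which are equal by Step 2. Hence each of them equals $f$.

Note: The singular values of a complex matrix $N$ are sometimes defined to be the positive eigenvalues of $\begin{pmatrix} 0 & N \\ N^* & 0 \end{pmatrix}$. By the above result we see that they are the same as the positive square roots of the nonzero eigenvalues of $NN^*$ (or of $N^*N$) as we have defined them above.
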


9.8 Pseudoinverses and Least Squares

Let $A$ be an $m \times n$ matrix over $F$ (where $F$ is either $\mathbb{R}$ or $\mathbb{C}$) and suppose the rank of $A$ is $r$. Let $A = U\Sigma V^*$ be a singular value decomposition of $A$. So $\Sigma$ is an $m \times n$ diagonal matrix with the nonzero singular values $s_1 \ge s_2 \ge \cdots \ge s_r$ down the main diagonal as its only nonzero entries. Define $\Sigma^+$ to be the $n \times m$ diagonal matrix with diagonal equal to $\left(\frac{1}{s_1}, \frac{1}{s_2}, \ldots, \frac{1}{s_r}, 0, \ldots\right)$. Then both $\Sigma\Sigma^+$ and $\Sigma^+\Sigma$ have the general block form
$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix},$$
but the first product is $m \times m$ and the second is $n \times n$.

Definition. The Moore-Penrose generalized inverse of $A$ (sometimes just called the pseudoinverse of $A$) is the $n \times m$ matrix $A^+$ over $F$ defined by
$$A^+ = V\Sigma^+U^*.$$

Theorem 9.8.1. Let $A$ be an $m \times n$ matrix over $F$ with pseudoinverse $A^+$ (as defined above). Then the following three properties hold:
(a) $AA^+A = A$;
(b) $A^+AA^+ = A^+$;
(c) $AA^+$ and $A^+A$ are hermitian (i.e., self-adjoint).
Proof. All three properties are easily shown to hold using the definition of $A^+$. You should do this now.
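As a guide to that verification, here is how properties (a) and (c) unwind for $AA^+$; the remaining cases follow the same pattern. The only facts used are $U^*U = I$, $V^*V = I$, and the block form of $\Sigma\Sigma^+$ noted above.

```latex
\begin{align*}
  AA^+A &= (U\Sigma V^*)(V\Sigma^+U^*)(U\Sigma V^*)
         = U\,\Sigma\Sigma^+\Sigma\,V^*
         = U\Sigma V^* = A,
  \\
  AA^+  &= (U\Sigma V^*)(V\Sigma^+U^*) = U(\Sigma\Sigma^+)U^*,
  \qquad
  (AA^+)^* = U(\Sigma\Sigma^+)^*U^* = U(\Sigma\Sigma^+)U^* = AA^+ .
\end{align*}
% Here \Sigma\Sigma^+\Sigma = \Sigma because \Sigma\Sigma^+ = diag(I_r, 0) fixes the
% first r rows of \Sigma and the remaining rows of \Sigma are zero; and
% (\Sigma\Sigma^+)^* = \Sigma\Sigma^+ because this matrix is real and diagonal.
```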
Our definition of the Moore-Penrose generalized inverse would not be valid if it were possible for there to be more than one. However, the following theorem shows that there is at most one (hence exactly one!) such pseudoinverse.

Theorem 9.8.2. Given an $m \times n$ matrix $A$, there is at most one matrix satisfying the three properties of $A^+$ given in Theorem 9.8.1. This means that $A$ has a unique pseudoinverse.

Proof. Let $B$ and $C$ be pseudoinverses of $A$, i.e., matrices satisfying the three properties of $A^+$ in Theorem 9.8.1. Then
$$CA = C(ABA) = CA(BA)^* = CAA^*B^* = (A(CA)^*)^*B^* = (ACA)^*B^* = A^*B^* = (BA)^* = BA,$$
i.e.,
$$CA = BA.$$
Then
$$B = BAB = B(AB)^* = BB^*A^*,$$
so that
$$B = BB^*(ACA)^* = BB^*A^*(AC)^* = BAC = CAC = C.$$
At this point we want to review and slightly revise the Gram-Schmidt algorithm given earlier. Let $(v_1, \ldots, v_n)$ be any list of column vectors in $F^m$. Let these be the columns of an $m \times n$ matrix $A$. Let $W = \mathrm{span}(v_1, \ldots, v_n)$ be the column space of $A$. Define $u_1 = v_1$. Then for $2 \le i \le n$ put
$$u_i = v_i - \alpha_{1i}u_1 - \alpha_{2i}u_2 - \cdots - \alpha_{i-1,i}u_{i-1},$$
where $\alpha_{ji} = \frac{\langle v_i, u_j\rangle}{\langle u_j, u_j\rangle}$ if $u_j \ne \vec{0}$, and $\alpha_{ji} = 0$ if $u_j = \vec{0}$. Hence
$$v_i = \alpha_{1i}u_1 + \alpha_{2i}u_2 + \alpha_{3i}u_3 + \cdots + \alpha_{i-1,i}u_{i-1} + u_i. \tag{9.19}$$
Let $Q_0$ be the $m \times n$ matrix whose columns are $u_1, u_2, \ldots, u_n$, respectively, and let $R_0$ be the $n \times n$ matrix given by
$$R_0 = \begin{pmatrix} 1 & \alpha_{12} & \cdots & \alpha_{1n} \\ 0 & 1 & \cdots & \alpha_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. \tag{9.20}$$
The $u_i$ in $Q_0$ are constructed by the Gram-Schmidt method and form an orthogonal set. This means that $Q_0$ has orthogonal columns, some of which may be zero. By Eq. 9.19 we have $A = Q_0R_0$. Let $Q$ and $R$ be the matrices obtained by deleting the zero columns from $Q_0$ and the corresponding rows from $R_0$, and by dividing each nonzero column of $Q_0$ by its norm and multiplying each corresponding row of $R_0$ by that same norm. Then $A = Q_0R_0$ becomes
$$A = QR \text{ with } R \text{ upper triangular and } Q \text{ having orthonormal columns.} \tag{9.21}$$
Eq. 9.21 is the normalized QR-decomposition of $A$. Note that if $A$ has rank $k$, then $Q$ is $m \times k$ with rank $k$ and $R$ is $k \times n$ and upper triangular with rank $k$. The columns of $Q$ form an orthonormal basis for the column space $W$ of $A$.
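The revised Gram-Schmidt procedure and the passage from $Q_0R_0$ to $QR$ can be sketched in a few lines. This is a minimal illustration for real vectors only (so no conjugation is needed in the inner product); the function name `normalized_qr`, the tolerance `tol`, and the sample columns are assumptions made here, not part of the text.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def normalized_qr(columns, tol=1e-12):
    """Return (Q_cols, R): the orthonormal columns of Q and the rows of R,
    for the matrix A whose columns are the given real vectors."""
    n = len(columns)
    us = []                                  # u_1, ..., u_n (some may be zero)
    R0 = [[0.0] * n for _ in range(n)]
    for i, v in enumerate(columns):
        u = [float(x) for x in v]
        for j, uj in enumerate(us):
            # alpha_{ji} = <v_i, u_j>/<u_j, u_j>, or 0 when u_j is the zero vector
            alpha = dot(v, uj) / dot(uj, uj) if any(uj) else 0.0
            R0[j][i] = alpha
            u = [a - alpha * b for a, b in zip(u, uj)]
        if math.sqrt(dot(u, u)) <= tol:      # treat tiny vectors as exactly zero
            u = [0.0] * len(u)
        R0[i][i] = 1.0
        us.append(u)
    Q_cols, R = [], []
    for j, uj in enumerate(us):
        norm = math.sqrt(dot(uj, uj))
        if norm > 0.0:                       # drop zero columns and matching rows
            Q_cols.append([x / norm for x in uj])
            R.append([norm * entry for entry in R0[j]])
    return Q_cols, R

# Rank-2 example: the second column is twice the first.
columns = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [1.0, 0.0, 1.0]]
Q_cols, R = normalized_qr(columns)
# rebuilt[i] is column i of the product QR.
rebuilt = [[sum(Q_cols[t][row] * R[t][i] for t in range(len(Q_cols)))
            for row in range(3)] for i in range(3)]
print(len(Q_cols), rebuilt)
```

For this input $Q$ is $3 \times 2$ and $R$ is $2 \times 3$, in line with the remark that $Q$ is $m \times k$ and $R$ is $k \times n$ when $A$ has rank $k$.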
If we compute $Q^*Q$, since the columns of $Q$ are orthonormal, we get
$$Q^*Q = I_k.$$
Write $u_1, \ldots, u_k$ for the columns of $Q$. Since $(u_1, \ldots, u_k)$ is an orthonormal basis for $W$, if $P_0 \in \mathcal{L}(F^m)$ is the orthogonal projection onto $W$, then for $v \in F^m$ we have
$$P_0(v) = \sum_{i=1}^{k} \langle v, u_i\rangle u_i = (u_1, \ldots, u_k)\begin{pmatrix} \langle v, u_1\rangle \\ \vdots \\ \langle v, u_k\rangle \end{pmatrix} = Q\begin{pmatrix} u_1^* \\ \vdots \\ u_k^* \end{pmatrix}v = QQ^*v.$$
So $QQ^*$ is the projection matrix projecting $v$ onto $W = \mathrm{col}(Q)$. Hence $QQ^*v$ is the unique vector in $W$ closest to $v$.
Lemma 9.8.3. Suppose that the $m \times n$ matrix $A$ has rank $k$ and that $A = BC$, where $B$ is $m \times k$ with rank $k$ and $C$ is $k \times n$ with rank $k$. Then
$$A^{++} = C^*(CC^*)^{-1}(B^*B)^{-1}B^*$$
is the pseudoinverse of $A$.

Proof. It is rather straightforward to verify that the three properties of Theorem 9.8.1 are satisfied by this $A^{++}$. Then by Theorem 9.8.2 we know that a matrix $A^+$ satisfying the three properties of Theorem 9.8.1 is uniquely determined by these properties, so $A^{++} = A^+$.

Corollary 9.8.4. If $A = QR$ is a normalized QR-decomposition of $A$, then
$$A^{+++} = R^*(RR^*)^{-1}Q^*$$
is the pseudoinverse of $A$.
Given that $A = U\Sigma V^*$ is a singular value decomposition of $A$, partition the two matrices as follows: $U = (U_k, U_{m-k})$ and $V = (V_k, V_{n-k})$, where $U_k$ is $m \times k$ and its columns are the first $k$ columns of $U$, i.e., they form an orthonormal basis of $\mathrm{col}(A)$. Similarly, $V_k$ is $n \times k$ and its columns are the first $k$ columns of $V$, so they form an orthonormal basis for $\mathrm{col}(A^*)$. This is the same as saying that the rows of $V_k^*$ form an orthonormal basis for $\mathrm{row}(A)$. Now let $D_k$ be the $k \times k$ diagonal matrix with the nonzero singular values of $A$ down the diagonal, i.e., $\Sigma = \begin{pmatrix} D_k & 0 \\ 0 & 0 \end{pmatrix}$. Now using block multiplication we see that
$$A = (U_k, U_{m-k})\begin{pmatrix} D_k & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} V_k^* \\ V_{n-k}^* \end{pmatrix} = U_kD_kV_k^*.$$
This expression $A = U_kD_kV_k^*$ is called the reduced singular value decomposition of $A$. Here $U_k$ is $m \times k$ with rank $k$, and $D_kV_k^*$ is $k \times n$ with rank $k$. Then $A = QR = U_kD_kV_k^*$ where $Q = U_k$ is $m \times k$ with rank $k$, and $R = D_kV_k^*$ is $k \times n$ with rank $k$. The columns of $Q$ form an orthonormal basis for $\mathrm{col}(A)$, and the rows of $R$ form an orthogonal basis for $\mathrm{row}(A)$. So by Lemma 9.8.3 (with $B = Q$ and $C = R$) the pseudoinverse $A^+$ is given by
$$A^+ = R^*(RR^*)^{-1}(Q^*Q)^{-1}Q^* = V_kD_k^{-1}U_k^*, \text{ after some computation.}$$
Suppose we are given the equation
$$Ax = b, \tag{9.22}$$
where $A$ is $m \times n$ and $b$ is $m \times 1$. It is possible that this equation is not consistent. In this case we want to find $\hat{x}$ so that $A\hat{x}$ is as close to $b$ as possible, i.e., it should be the case that $A\hat{x}$ is the projection $\hat{b}$ of $b$ onto the column space of $A$. Put
$$\hat{x} = A^+b = V_kD_k^{-1}U_k^*b.$$
Then
$$A\hat{x} = (U_kD_kV_k^*)V_kD_k^{-1}U_k^*b = U_kD_kD_k^{-1}U_k^*b = U_kU_k^*b, \tag{9.23}$$
which by the paragraph preceding Lemma 9.8.3 must be the projection $\hat{b}$ of $b$ onto $\mathrm{col}(A)$. Thus $\hat{x}$ is a least-squares solution to $Ax = b$, in the sense that $\hat{x}$ minimizes $\|A\hat{x} - b\|$ (which is the square root of a sum of squares). Moreover, it is true that $\hat{x}$ has the smallest length among all least-squares solutions to $Ax = b$.
Theorem 9.8.5. Let $A$ be $m \times n$ with rank $k$. Let $A = U\Sigma V^*$ be a normalized singular value decomposition of $A$. Let $U = (U_k, U_{m-k})$ and $V = (V_k, V_{n-k})$ be partitioned as above. Then $A^+ = V_kD_k^{-1}U_k^*$, as above, and $\hat{x} = A^+b$ is a least-squares solution to $Ax = b$. If $x_0$ is any other least-squares solution, i.e., $\|Ax_0 - b\| = \|A\hat{x} - b\|$, then $\|\hat{x}\| \le \|x_0\|$, with equality if and only if $x_0 = \hat{x}$.
Proof. What remains to be shown is that if $\|Ax_0 - b\| = \|A\hat{x} - b\|$, then $\|\hat{x}\| \le \|x_0\|$, with equality if and only if $x_0 = \hat{x}$. We know that $\|Ax - b\|$ is minimized precisely when $Ax = \hat{b}$ is the projection $U_kU_k^*b$ of $b$ onto $\mathrm{col}(A)$. So suppose $Ax_0 = \hat{b} = A\hat{x}$, which implies $A(x_0 - \hat{x}) = 0$, i.e., $x_0 - \hat{x} \in \mathrm{null}(A)$. By Eq. 9.17 this means that $x_0 = \hat{x} + V_{n-k}z$ for some $z \in F^{n-k}$. We claim that $\hat{x} = A^+b = V_kD_k^{-1}U_k^*b$ and $V_{n-k}z$ are orthogonal. For,
$$\langle V_kD_k^{-1}U_k^*b, V_{n-k}z\rangle = z^*(V_{n-k}^*V_k)D_k^{-1}U_k^*b = 0,$$
because by Eq. 9.15, $V_{n-k}^*V_k = 0_{(n-k) \times k}$. Then
$$\|x_0\|^2 = \|\hat{x} + V_{n-k}z\|^2 = \|\hat{x}\|^2 + \|V_{n-k}z\|^2 \ge \|\hat{x}\|^2,$$
with equality if and only if $\|V_{n-k}z\|^2 = 0$, which is if and only if $V_{n-k}z = \vec{0}$, which is if and only if $x_0 = \hat{x}$.
Theorem 9.8.6. Let $A$ be $m \times n$ with rank $k$. Then $A$ has a unique
(Moore-Penrose) pseudoinverse. If $k = n$, $A^+ = (A^*A)^{-1}A^*$, and $A^+$
is a left inverse of $A$. If $k = m$, $A^+ = A^*(AA^*)^{-1}$, and $A^+$ is a
right inverse of $A$. If $k = m = n$, $A^+ = A^{-1}$.
Proof. We have already seen that $A$ has a unique pseudoinverse. If $k = n$,
$A = B$, $C = I_k$ yields a factorization of the type used in Lemma 9.8.3,
so $A^+ = (A^*A)^{-1}A^*$. Similarly, if $k = m$, put $B = I_k$, $C = A$. Then
$A^+ = A^*(AA^*)^{-1}$. And if $A$ is invertible, so that
$(A^*A)^{-1} = A^{-1}(A^*)^{-1}$, by the $k = n$ case we have
\[ A^+ = (A^*A)^{-1}A^* = A^{-1}(A^*)^{-1}A^* = A^{-1}. \]
9.9 Norms, Distance and More on Least Squares

In this section the norm $\|A\|$ of an $m \times n$ matrix $A$ over
$\mathcal{C}$ is defined by:
\[ \|A\| = (\mathrm{tr}(A^*A))^{1/2}
        = \Big( \sum_{i,j} |A_{ij}|^2 \Big)^{1/2}. \]
(Here we write tr($A$) for the trace of $A$.)
Note: This norm is the 2-norm of $A$ thought of as a vector in the
$mn$-dimensional vector space $M_{m,n}(\mathcal{C})$. It is almost obvious
that $\|A\| = \|A^*\|$.
Theorem 9.9.1. Let $A$, $P$, $Q$ be arbitrary complex matrices for which the
appropriate multiplications are defined. Then
\[ \|AP\|^2 + \|(I - AA^+)Q\|^2 = \|AP + (I - AA^+)Q\|^2. \]
Proof.
\begin{align*}
\|AP + (I - AA^+)Q\|^2
  &= \mathrm{tr}\big( [AP + (I - AA^+)Q]^* [AP + (I - AA^+)Q] \big) \\
  &= \|AP\|^2 + \mathrm{tr}\big( (AP)^*(I - AA^+)Q \big)
     + \mathrm{tr}\big( ((I - AA^+)Q)^* AP \big) + \|(I - AA^+)Q\|^2.
\end{align*}
We now show that each of the two middle terms is the trace of the zero
matrix. Since the second of these matrices is the conjugate transpose of the
other, it is necessary only to see that one of them is the zero matrix. So for
the first matrix:
\begin{align*}
(AP)^*(I - AA^+)Q &= (AP)^*(I - AA^+)^*Q = ((I - AA^+)AP)^*Q \\
                  &= (AP - AA^+AP)^*Q = (AP - AP)^*Q = 0.
\end{align*}
Theorem 9.9.2. Let $A$ and $B$ be $m \times n$ and $m \times p$ complex
matrices. Then the matrix $X_0 = A^+B$ enjoys the following properties:
(i) $\|AX - B\| \ge \|AX_0 - B\|$ for all $n \times p$ $X$. Moreover, equality
holds if and only if $AX = AA^+B$.
(ii) $\|X\| \ge \|X_0\|$ for all $X$ such that $\|AX - B\| = \|AX_0 - B\|$,
with equality if and only if $X = X_0$.
Proof.
\begin{align*}
\|AX - B\|^2 &= \|A(X - A^+B) + (I - AA^+)(-B)\|^2 \\
             &= \|A(X - A^+B)\|^2 + \|(I - AA^+)(-B)\|^2 \\
             &= \|AX - AA^+B\|^2 + \|AA^+B - B\|^2 \ge \|AA^+B - B\|^2,
\end{align*}
with equality if and only if $AX = AA^+B$. This completes the proof of (i).
Now interchange $A^+$ and $A$ in the preceding theorem and assume
$AX = AA^+B$ so that equality holds in (i). Then we have
\begin{align*}
\|X\|^2 &= \|A^+B + X - A^+(AA^+B)\|^2 \\
        &= \|A^+B + (I - A^+A)X\|^2 \\
        &= \|A^+B\|^2 + \|X - A^+AX\|^2 \\
        &= \|A^+B\|^2 + \|X - A^+B\|^2 \ge \|A^+B\|^2,
\end{align*}
with equality if and only if $X = A^+B$.
NOTE: Suppose we have $n$ (column) vectors $v_1, \ldots, v_n$ in
$\mathcal{R}^m$ and some vector $y$ in $\mathcal{R}^m$, and we want to find
that vector in the space $\mathrm{span}(v_1, \ldots, v_n) = V$ which is closest
to the vector $y$. First, we need to use only an independent subset of the
$v_i$'s. So use row-reduction techniques to find a basis $(w_1, \ldots, w_k)$
of $V$. Now use these $w_i$ to form the columns of a matrix $A$. So $A$ is
$m \times k$ with rank $k$, and the pseudoinverse of $A$ is simpler to compute
than if we had used all of the $v_i$'s to form the columns of $A$. Put
$x_0 = A^+y$. Then $Ax_0 = AA^+y$ is the desired vector in $V$ closest to $y$.
Now suppose we want to find the distance from $y$ to $V$, i.e., the distance
\[ d(y, AA^+y) = \|y - AA^+y\|. \]
It is easier to compute the square of the distance first:
\begin{align*}
\|y - AA^+y\|^2 &= \|(I - AA^+)y\|^2 = y^*(I - AA^+)^*(I - AA^+)y \\
  &= y^*(I - AA^+)(I - AA^+)y \\
  &= y^*(I - AA^+ - AA^+ + AA^+AA^+)y = y^*(I - AA^+)y.
\end{align*}
We have proved the following:
Theorem 9.9.3. The distance between a vector $y$ of $\mathcal{R}^m$ and the
column space of an $m \times n$ matrix $A$ is $(y^*(I - AA^+)y)^{1/2}$.
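To see the formula of Theorem 9.9.3 at work, consider the following small
case. The vectors are picked here purely for illustration.

```latex
\[
A = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
y = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
A^+ = (A^*A)^{-1}A^* = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \end{pmatrix}.
\]
\[
AA^+ = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
I - AA^+ = \tfrac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
y^*(I - AA^+)y = \tfrac{1}{2}.
\]
% Hence the distance from y to col(A) is 1/\sqrt{2}.
```

This agrees with the elementary distance from the point $(1, 0)$ to the line
through the origin in the direction $(1, 1)$.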
Theorem 9.9.4. $AXB = C$ has a solution $X$ if and only if $AA^+CB^+B = C$,
in which case for any $Y$,
\[ X = A^+CB^+ + Y - A^+AYBB^+ \]
is a solution. This gives the general solution. (The reader might want to
review the exercises in Chapter 6 for a different approach to a more general
problem.)
Proof. If $AXB = C$, then $C = AXB = AA^+AXBB^+B = AA^+CB^+B$. Conversely, if
$C = AA^+CB^+B$, then $X = A^+CB^+$ is a particular solution of $AXB = C$. Any
expression of the form $X = Y - A^+AYBB^+$ satisfies $AXB = 0$. And if
$AXB = 0$, then $X = Y - A^+AYBB^+$ for $Y = X$. Put $X_0 = A^+CB^+$ and let
$X_1$ be any other solution to $AXB = C$. Then $X = X_1 - X_0$ satisfies
$AXB = 0$, so that
\[ X_1 - X_0 = X = Y - A^+AYBB^+ \quad \text{for some } Y. \]
If $x = (x_1, \ldots, x_n)^T$ and $y = (y_1, \ldots, y_n)^T$ are two points of
$\mathcal{C}^n$, the distance between $x$ and $y$ is defined to be
\[ d(x, y) = \|x - y\| = \Big( \sum |x_i - y_i|^2 \Big)^{1/2}. \]
A hyperplane $H$ in $\mathcal{C}^n$ is the set of vectors
$x = (x_1, \ldots, x_n)^T$ satisfying an equation of the form
$\sum_{i=1}^n a_i x_i + d = 0$, i.e.,
\[ H = \{ x \in \mathcal{C}^n : Ax + d = 0 \}, \quad
   \text{where } A = (a_1, \ldots, a_n) \ne \vec{0}. \]
$A = 1 \cdot (a_1, \ldots, a_n)$, where $1$ is $1 \times 1$ of rank 1 and
$(a_1, \ldots, a_n)$ is $1 \times n$ of rank 1, so
\[ A^+ = A^*(AA^*)^{-1} = \frac{1}{\|A\|^2} A^*. \]
For such an $A$ you should now show that $\|A^+\| = \frac{1}{\|A\|}$.
Let $y_0 = (y_1, \ldots, y_n)^T$ be a given point of $\mathcal{C}^n$ and let
$H : Ax + d = 0$ be a given hyperplane. We propose to find the point $x_0$ of
$H$ that is closest to $y_0$ and to find the distance $d(x_0, y_0)$.
Note that $x$ is on $H$ if and only if $Ax + d = 0$ if and only if
$A(x - y_0) - (-d - Ay_0) = 0$. Hence our problem is to find $x_0$ such that
(i) $A(x_0 - y_0) - (-d - Ay_0) = 0$ (i.e., $x_0$ is on $H$),
and
(ii) $\|x_0 - y_0\|$ is minimal (so $x_0$ is the point of $H$ closest to $y_0$).
Theorem 9.9.2 says that the vector $x = x_0 - y_0$ that satisfies
$Ax - (-d - Ay_0) = 0$ with $\|x\|$ minimal is given by
\[ x = x_0 - y_0 = A^+(-d - Ay_0) = \frac{1}{\|A\|^2} A^*(-d - Ay_0). \]
Hence
\[ x_0 = y_0 + \frac{-d}{\|A\|^2} A^* - \frac{1}{\|A\|^2} A^*Ay_0. \]
And
\begin{align*}
d(y_0, H) = d(y_0, x_0) &= \|x_0 - y_0\| = \|A^+(-d - Ay_0)\| \\
  &= |d + Ay_0| \cdot \|A^+\| = \frac{|Ay_0 + d|}{\|A\|}.
\end{align*}
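The closest-point and distance formulas above can be sketched numerically as
follows. This is a minimal sketch restricted to real coefficients (so $A^*$ is
just $A^T$); the function name `closest_point_on_hyperplane` and the sample
hyperplane are assumptions made for illustration, not part of the text.

```python
import math

def closest_point_on_hyperplane(a, d, y0):
    """Return (x0, dist) for the real hyperplane {x : a.x + d = 0}.

    Real-valued sketch of x0 = y0 + A^+(-d - A y0), with A^+ = A^T / ||A||^2,
    and dist = |A y0 + d| / ||A||.
    """
    norm_sq = sum(ai * ai for ai in a)                     # ||A||^2
    if norm_sq == 0:
        raise ValueError("the coefficient vector a must be nonzero")
    residual = sum(ai * yi for ai, yi in zip(a, y0)) + d   # A y0 + d
    x0 = [yi - residual * ai / norm_sq for ai, yi in zip(a, y0)]
    dist = abs(residual) / math.sqrt(norm_sq)
    return x0, dist

# Hypothetical example: the line x + y - 2 = 0 in the plane, point (0, 0).
x0, dist = closest_point_on_hyperplane([1.0, 1.0], -2.0, [0.0, 0.0])
print(x0, dist)
```

For this example the closest point is $(1, 1)$ and the distance is $\sqrt{2}$,
matching the familiar point-to-line formula in the plane.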
Note: This distance formula generalizes well known formulas for the
distance from a point to a line in $\mathcal{R}^2$ and from a point to a plane
in $\mathcal{R}^3$.
We now recall the method of approximating by least squares. Suppose
$y$ is a function of $n$ real variables $t^{(1)}, \ldots, t^{(n)}$, and we
want to approximate $y$ as a linear function of these variables. This means we
want to find (real) numbers $x_0, \ldots, x_n$ for which
1. $x_0 + x_1 t^{(1)} + \cdots + x_n t^{(n)}$ is as close to the function
$y = y(t^{(1)}, \ldots, t^{(n)})$ as possible.
Suppose we have $m$ measurements of $y$ corresponding to $m$ different
sets of values of the $t^{(j)}$: $(y_i; t_i^{(1)}, \ldots, t_i^{(n)})$,
$i = 1, \ldots, m$. The problem is to determine $x_0, \ldots, x_n$ so that
2. $y_i = x_0 + x_1 t_i^{(1)} + \cdots + x_n t_i^{(n)} + r_i$,
$1 \le i \le m$, where the $r_i$'s are small in some sense.
Put $t_i^{(j)} = a_{ij}$, $y = (y_1, \ldots, y_m)^T$,
$x = (x_0, \ldots, x_n)^T$, $r = (r_1, \ldots, r_m)^T$, and
\[ A = \begin{pmatrix}
1 & a_{11} & \cdots & a_{1n} \\
1 & a_{21} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
1 & a_{m1} & \cdots & a_{mn}
\end{pmatrix}. \]
Then 2. becomes:
3. $y = Ax + r$.
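A standard interpretation of saying that $r$ is small is to say that for
some weighting constants $w_1, \ldots, w_m$, the number
$S = \sum_{i=1}^m w_i r_i^2 = r^TWr$, where
$W = \mathrm{diag}(w_1, \ldots, w_m)$, is minimal.
As $S = \sum_{i=1}^m w_i (y_i - x_0 - x_1 a_{i1} - x_2 a_{i2} - \cdots
- x_n a_{in})^2$, to minimize $S$ as a function of $x_0, \ldots, x_n$, we
require that $\frac{\partial S}{\partial x_k} = 0$ for all $k$.
Then
\[ \frac{\partial S}{\partial x_0}
   = -2 \sum_i w_i (y_i - x_0 - x_1 a_{i1} - \cdots - x_n a_{in}) = 0 \]
implies
4. \[ \Big( \sum_{i=1}^m w_i \Big) x_0 + \Big( \sum_{i=1}^m w_i a_{i1} \Big) x_1
      + \cdots + \Big( \sum_{i=1}^m w_i a_{in} \Big) x_n = \sum w_i y_i. \]
And for $1 \le k \le n$:
\[ \frac{\partial S}{\partial x_k}
   = -2 \sum_{i=1}^m w_i (y_i - x_0 - x_1 a_{i1} - \cdots - x_n a_{in}) a_{ik}
   = 0 \]
implies
5. \[ \Big( \sum_{i=1}^m w_i a_{ik} \Big) x_0
      + \Big( \sum_{i=1}^m w_i a_{i1} a_{ik} \Big) x_1 + \cdots
      + \Big( \sum_{i=1}^m w_i a_{in} a_{ik} \Big) x_n
      = \sum w_i y_i a_{ik}. \]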
It is easy to check that
\[ A^TW = \begin{pmatrix}
1 & \cdots & 1 \\
a_{11} & \cdots & a_{m1} \\
\vdots & & \vdots \\
a_{1n} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix}
w_1 & & 0 \\
 & \ddots & \\
0 & & w_m
\end{pmatrix}
= \begin{pmatrix}
w_1 & \cdots & w_m \\
a_{11} w_1 & \cdots & a_{m1} w_m \\
\vdots & & \vdots \\
a_{1n} w_1 & \cdots & a_{mn} w_m
\end{pmatrix}. \]
So putting together the right hand sides of 4. and 5., we obtain
6. \[ (A^TW)y = \begin{pmatrix}
\sum w_i y_i \\
\sum a_{i1} w_i y_i \\
\vdots \\
\sum a_{in} w_i y_i
\end{pmatrix}. \]
Also we compute
\[ A^TWA = \begin{pmatrix}
w_1 & \cdots & w_m \\
a_{11} w_1 & \cdots & a_{m1} w_m \\
\vdots & & \vdots \\
a_{1n} w_1 & \cdots & a_{mn} w_m
\end{pmatrix}
\begin{pmatrix}
1 & a_{11} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots \\
1 & a_{m1} & \cdots & a_{mn}
\end{pmatrix}. \]
Comparing the above with the left hand sides of 4. and 5., and putting
the above equations together, we obtain the following system of $n + 1$
equations in $n + 1$ unknowns:
7. $(A^TWA)x = A^TWy$.
We now reconsider this problem taking advantage of our results on
generalized inverses. So $A$ is a real $m \times n$ matrix, $y$ is a real
$m \times 1$ matrix, and $x \in \mathcal{R}^n$ is sought for which
$(y - Ax)^T(y - Ax) = \|Ax - y\|^2$ is minimal (here we put $W = I$).
Theorem 9.9.2 solves this problem by putting $x = A^+y$. We show that this
also satisfies the above condition $A^TAx = A^Ty$, as should be expected. For
suppose $A = BC$ with $B$ $m \times k$, $C$ $k \times n$, where
$k = \mathrm{rank}(A) = \mathrm{rank}(B) = \mathrm{rank}(C)$. Then
\begin{align*}
A^TAx = A^TAA^+y &= C^TB^TBCC^T(CC^T)^{-1}(B^TB)^{-1}B^Ty \\
                 &= C^TB^Ty = A^Ty.
\end{align*}
So when $W = I$ the generalized inverse solution does the following:
First, it actually constructs a solution of the original problem that,
second, has the smallest norm of any solution.
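The weighted normal equations $(A^TWA)x = A^TWy$ can be tried out on a tiny
data set. The following is a minimal sketch for fitting a straight line
$y \approx x_0 + x_1 t$; the helper names `solve` and
`weighted_least_squares` and the sample data are assumptions introduced for
illustration. It uses exact fractions and assumes $A^TWA$ is invertible
(otherwise the pivot search fails).

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination.

    Works over exact fractions; assumes M is invertible.
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def weighted_least_squares(ts, ys, ws):
    """Fit y ~ x0 + x1*t by solving (A^T W A) x = A^T W y."""
    A = [[1, t] for t in ts]          # rows (1, a_i1), as in the matrix A above
    cols = range(2)
    AtWA = [[sum(w * row[j] * row[k] for w, row in zip(ws, A)) for k in cols]
            for j in cols]
    AtWy = [sum(w * row[j] * y for w, row, y in zip(ws, A, ys)) for j in cols]
    return solve(AtWA, AtWy)

# Hypothetical measurements lying exactly on y = 1 + 2t, with unequal weights.
coeffs = weighted_least_squares([0, 1, 2], [1, 3, 5], [1, 2, 1])
print(coeffs)
```

Because the sample points lie exactly on a line, the residual vector $r$ is
zero and the fitted coefficients are $x_0 = 1$, $x_1 = 2$ whatever positive
weights are used; with noisy data the weights change the answer.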
9.10 The Rayleigh Principle

All matrices in this section are over the field $\mathcal{C}$ of complex
numbers, and for any matrix $B$, $B^*$ denotes the conjugate transpose of $B$.
Also, $\langle \, , \rangle$ denotes the standard inner product on
$\mathcal{C}^n$ given by: $\langle x, y \rangle = x^T\bar{y} = y^*x$.
Let $A$ be $n \times n$ hermitian, so $A$ has real eigenvalues
$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ with an orthonormal set
$\{v_1, \ldots, v_n\}$ of associated eigenvectors: $Av_j = \lambda_j v_j$,
$\langle v_j, v_i \rangle = v_j^T \bar{v}_i = v_i^* v_j = \delta_{ij}$.
Let $Q = (v_1, \ldots, v_n)$ be the matrix whose $j$th column is $v_j$. Then
$Q^*Q = I_n$ and
$Q^*AQ = Q^*(\lambda_1 v_1, \ldots, \lambda_n v_n) = Q^*Q\Lambda = \Lambda
= \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, and $Q$ is unitary
($Q^* = Q^{-1}$).
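For $0 \ne x \in \mathcal{C}^n$ define the Rayleigh Quotient $\rho_A(x)$ for
$A$ by
\[ \rho_A(x) = \frac{\langle Ax, x \rangle}{\langle x, x \rangle}
            = \frac{x^*Ax}{\|x\|^2}. \tag{9.24} \]
Put $\mathcal{O} = \{ x \in \mathcal{C}^n : \langle x, x \rangle = 1 \}$, and
note that for $0 \ne k \in \mathcal{C}$, $0 \ne x \in \mathcal{C}^n$,
\[ \rho_A(kx) = \rho_A(x). \tag{9.25} \]
Hence
\[ \{ \rho_A(x) : x \ne 0 \} = \{ \rho_A(x) : x \in \mathcal{O} \}
   = \{ x^*Ax : x \in \mathcal{O} \}. \tag{9.26} \]
The set $W(A) = \{ \rho_A(x) : x \in \mathcal{O} \}$ is called the numerical
range of $A$. Observe that if $x = Qy$, then $x \in \mathcal{O}$ iff
$y \in \mathcal{O}$. Since $W(A)$ is the continuous image of a compact
connected set, it must be a closed bounded interval with a maximum $M$ and a
minimum $m$. Since $Q : \mathcal{C}^n \to \mathcal{C}^n : x \mapsto Qx$ is
nonsingular, $Q$ maps $\mathcal{O}$ to $\mathcal{O}$ in a one-to-one and onto
manner. Hence
\begin{align*}
M = \max_{x \in \mathcal{O}} x^*Ax
  &= \max_{y \in \mathcal{O}} (Qy)^*A(Qy)
   = \max_{y \in \mathcal{O}} y^*Q^*AQy \\
  &= \max_{y \in \mathcal{O}} y^*\Lambda y
   = \max_{y \in \mathcal{O}} \sum_{j=1}^n \lambda_j |y_j|^2,
\end{align*}
where $y = (y_1, y_2, \ldots, y_n)^T \in \mathcal{C}^n$, $\sum |y_i|^2 = 1$.
Similarly,
\[ m = \min_{x \in \mathcal{O}} \rho_A(x)
     = \min_{y \in \mathcal{O}} \sum_{j=1}^n \lambda_j |y_j|^2. \]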
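By the ordering of the eigenvalues, for $y \in \mathcal{O}$ we have
\[ \lambda_1 = \lambda_1 \sum_{j=1}^n |y_j|^2
   \le \sum_{j=1}^n \lambda_j |y_j|^2 = \rho_\Lambda(y)
   \le \lambda_n \sum_{j=1}^n |y_j|^2 = \lambda_n. \tag{9.27} \]
Furthermore, with $y = (1, 0, \ldots, 0)^* \in \mathcal{O}$ and
$z = (0, \ldots, 0, 1)^* \in \mathcal{O}$, we have
$\rho_\Lambda(y) = \lambda_1$ and $\rho_\Lambda(z) = \lambda_n$. Hence we have
almost proved the following first approximation to the Rayleigh Principle.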
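Theorem 9.10.1. Let $A$ be an $n \times n$ hermitian matrix with eigenvalues
$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Then for any nonzero
$x \in \mathcal{O}$,
\[ \lambda_1 \le \rho_A(x) \le \lambda_n, \text{ and} \tag{9.28} \]
\[ \lambda_1 = \min_{x \in \mathcal{O}} \rho_A(x); \qquad
   \lambda_n = \max_{x \in \mathcal{O}} \rho_A(x). \tag{9.29} \]
If $0 \ne x \in \mathcal{C}^n$ satisfies
\[ \rho_A(x) = \lambda_i \text{ for either } i = 1 \text{ or } i = n,
   \tag{9.30} \]
then $x$ is an eigenvector of $A$ belonging to the eigenvalue $\lambda_i$.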
Proof. Clearly Eqs. 9.28 and 9.29 are already proved. So consider Eq. 9.30.
Without loss of generality we may assume $x \in \mathcal{O}$. Suppose
$x = \sum_{j=1}^n c_j v_j$, so that
\[ \rho_A(x) = x^*Ax
   = \Big( \sum_{j=1}^n \bar{c}_j v_j^* \Big)
     \Big( \sum_{j=1}^n c_j \lambda_j v_j \Big)
   = \sum_{j=1}^n \lambda_j |c_j|^2. \]
Clearly $\lambda_1 = \lambda_1 \sum_{j=1}^n |c_j|^2
\le \sum_{j=1}^n \lambda_j |c_j|^2$ with equality iff
$\lambda_j = \lambda_1$ whenever $c_j \ne 0$. Hence $\rho_A(x) = \lambda_1$ iff
$x$ belongs to the eigenspace associated with $\lambda_1$. The argument for
$\lambda_n$ is similar.
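The bounds of Theorem 9.10.1 are easy to observe numerically. Below is a
minimal sketch for a real symmetric matrix (a special case of hermitian); the
function name `rayleigh_quotient` and the $2 \times 2$ example, whose
eigenvalues are 1 and 3, are assumptions chosen for illustration.

```python
def rayleigh_quotient(A, x):
    """Return x^T A x / x^T x for a real symmetric matrix A and nonzero x."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return sum(xi * yi for xi, yi in zip(x, Ax)) / sum(xi * xi for xi in x)

# Real symmetric example with eigenvalues 1 and 3;
# (1, -1) and (1, 1) are corresponding eigenvectors.
A = [[2.0, 1.0], [1.0, 2.0]]

lowest = rayleigh_quotient(A, [1.0, -1.0])    # attains the smallest eigenvalue
highest = rayleigh_quotient(A, [1.0, 1.0])    # attains the largest eigenvalue

# Sample the quotient along the vectors (1, s) for s in [-5, 5].
samples = [rayleigh_quotient(A, [1.0, t / 10.0]) for t in range(-50, 51)]
print(lowest, highest, min(samples), max(samples))
```

Every sampled value lies between the two eigenvalues, and the extremes are
reached only at the eigenvectors, as Eq. 9.30 asserts.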
Recall that $Q = (v_1, \ldots, v_n)$, and note that if $x = Qy$, so
$y = Q^*x$, then $x = v_i = Qy$ iff $y = Q^*v_i = e_i$. So with the notation
$x = Qy$, $y = (y_1, \ldots, y_n)^T$, we have
\[ \langle x, v_i \rangle = v_i^*x = v_i^*QQ^*x = e_i^*y = y_i. \]
Hence $x \perp v_i$ iff $y = Q^*x$ satisfies $y_i = 0$.
Def. $T_j = \{ x \ne 0 : \langle x, v_k \rangle = 0 \text{ for }
k = 1, \ldots, j \} = \{v_1, \ldots, v_j\}^\perp \setminus \{0\}$.
Theorem 9.10.2. $\rho_A(x) \ge \lambda_{j+1}$ for all $x \in T_j$, and
$\rho_A(x) = \lambda_{j+1}$ for some $x \in T_j$ iff $x$ is an eigenvector
associated with $\lambda_{j+1}$. Thus
\[ \lambda_{j+1} = \min_{x \in T_j} \rho_A(x) = \rho_A(v_{j+1}). \]
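Proof. $0 \ne x \in T_j$ iff $x = Qy$ where $y = \sum_{k=j+1}^n y_k e_k$ iff
$x = \sum_{k=j+1}^n y_k v_k$. Without loss of generality we may assume
$x \in \mathcal{O} \cap T_j$. Then $y = Q^*x \in \mathcal{O}$, and for
$x \in T_j \cap \mathcal{O}$ we have
\[ \rho_A(x) = \sum_{k=j+1}^n \lambda_k |y_k|^2
   \ge \lambda_{j+1} \sum_{k=j+1}^n |y_k|^2 = \lambda_{j+1}, \]
with equality iff $y_k = 0$ whenever $\lambda_k > \lambda_{j+1}$. In
particular, if $y = e_{j+1}$, so $x = v_{j+1}$, then
$\rho_A(x) = \lambda_{j+1}$.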
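Theorem 9.10.3. Put $S_j = \{ x \ne 0 : \langle x, v_k \rangle = 0 \text{ for }
k = n, n-1, \ldots, n-(j-1) \}$. So
$S_j = \{v_n, v_{n-1}, \ldots, v_{n-(j-1)}\}^\perp \setminus \{0\}$. Then
$\rho_A(x) \le \lambda_{n-j}$ for all $x \in S_j$, and equality holds iff $x$
is an eigenvector associated with $\lambda_{n-j}$. Thus
\[ \lambda_{n-j} = \max_{x \in S_j} \rho_A(x) = \rho_A(v_{n-j}). \]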
Proof. We leave the proof as an exercise for the reader.
The Rayleigh Principle consists of Theorems 9.10.2 and 9.10.3.
9.11 Exercises
1. Show that if $T \in \mathcal{L}(V)$, where $V$ is any finite-dimensional
inner product space, and if $T$ is normal, then
(a) $\mathrm{Im}(T) = \mathrm{Im}(T^*)$, and
(b) $\mathrm{null}(T) = \mathrm{null}(T^*)$.
2. Prove that if $T \in \mathcal{L}(V)$ is normal, then
$\mathrm{null}(T^k) = \mathrm{null}(T)$ and $\mathrm{Im}(T^k) = \mathrm{Im}(T)$
for every positive integer $k$.
3. Prove that a normal operator on a complex inner product space is
self-adjoint if and only if all its eigenvalues are real.
4. Let $V$ be a finite dimensional inner product space over $\mathcal{C}$.
Suppose $T \in \mathcal{L}(V)$ and $U$ is a subspace of $V$. Prove that $U$ is
invariant under $T$ if and only if $U^\perp$ is invariant under $T^*$.
5. Let $V$ be a finite dimensional inner product space over $\mathcal{C}$.
Suppose that $T$ is a positive operator on $V$ (called positive semidefinite
by some authors). Prove that $T$ is invertible if and only if
$\langle Tv, v \rangle > 0$
for every $v \in V \setminus \{0\}$.
6. In this problem $M_n(\mathcal{R})$ denotes the set of all $n \times n$ real
matrices, and $\mathcal{R}^n$ denotes the usual real inner product space of
all column vectors with $n$ real entries and inner product
$(x, y) = x^Ty = y^Tx = (y, x)$.
Define the following subsets of $M_n(\mathcal{R})$:

n
= A M
n
(1) : (Ax, x) > 0 for all 0 ,= x 1
n

o
n
= A M
n
(1) : A
T
= A

n
=
n
o
n
K
n
= A M
n
(1) : A
T
= A
For this problem we say that A M
n
(1) is positive denite if and
only if A
n
. A is symmetric if and only if it belongs to o
n
. It is
skew-symmetric if and only if it belongs to K
n
.
Note: It is sometimes the case that a real matrix is said to be positive
denite if and only if it belongs to o
n
also, i.e. A
n
. Remember
that we do not do that here.
Problem: Let $A \in M_n(\mathcal{R})$.
(i) Prove that there are unique matrices $B \in \mathcal{S}_n$ and
$C \in \mathcal{K}_n$ for which $A = B + C$. Here $B$ is called the symmetric
part of $A$ and $C$ is called the skew-symmetric part of $A$.
(ii) Show that $A$ is positive definite if and only if the symmetric part
of $A$ is positive definite.
(iii) Let $A$ be symmetric. Show that $A$ is positive definite if and only if
all eigenvalues of $A$ are positive. If this is the case, then
$\det(A) > 0$.
(iv) Let $\emptyset \ne S \subseteq \{1, 2, \ldots, n\}$. Let $A_S$ denote the
submatrix of $A$ formed by using the rows and columns of $A$ indexed by the
elements of $S$. (So in particular if $S = \{k\}$, then $A_S$ is the
$1 \times 1$ matrix whose entry is the diagonal entry $A_{kk}$ of $A$.) Prove
that if $A$ is positive definite (but not necessarily symmetric), then $A_S$
is positive definite. Hence conclude that each diagonal entry of $A$ is
positive.
(v) Let A
n
and for 1 i n let A
i
denote the principal subma-
trix of A dened by using the rst i rows and rst i columns of
A. Prove that det(A
i
) > 0 for each i = 1, 2, . . . , n. (Note: The
converse is also true, but the proof is a bit more dicult.)
7. $A$ is a real rectangular matrix (possibly square).
(a) Prove that $A^+ = A^{-1}$ when $A$ is nonsingular.
(b) Determine all singular value decompositions of $I$ with $U = V$.
Show that all of them lead to exactly the same pseudoinverse.
(c) The rank of $A^+$ is the same as the rank of $A$.
(d) If $A$ is self-conjugate, then $A^+$ is self-conjugate.
(e) $(cA)^+ = \frac{1}{c} A^+$ for $c \ne 0$.
(f) $(A^+)^* = (A^*)^+$.
(g) $(A^+)^+ = A$.
(h) Show by a counterexample that in general $(AB)^+ \ne B^+A^+$.
(i) If $A$ is $m \times r$, $B$ is $r \times n$, and both matrices have rank
$k$, then $(AB)^+ = B^+A^+$.
(j) Suppose that $x$ is $m \times 1$ and $y$ is $1 \times m$. Compute
(a) $x^+$; (b) $y^+$; (c) $(xy)^+$.
8. Let A =
_
_
_
_
1 2
1 2
0 3
2 5
_
_
_
_
,

b =
_
_
_
_
3
1
4
2
_
_
_
_
.
(a) Find a least-squares solution of Ax =

b.
(b) Find the orthogonal projection of

b onto the column space of A.


(c) Compute the least-squares error (in the solution of part (a)).
9. (a) Show that if $C \in M_n(\mathcal{C})$ is Hermitian and $x^*Cx = 0$ for
all $x \in \mathcal{C}^n$, then $C = 0$.
(b) Show that for any $A \in M_n(\mathcal{C})$ there are (unique) Hermitian
matrices $B$ and $C$ for which $A = B + iC$.
(c) Show that if $x^*Ax$ is real for all $x \in \mathcal{C}^n$, then $A$ is
Hermitian.
10. Let $V$ be the vector space over the reals $\mathcal{R}$ consisting of the
polynomials in $x$ of degree at most 4 with coefficients in $\mathcal{R}$, and
with the usual addition of vectors (i.e., polynomials) and scalar
multiplication.
Let $B_1 = \{1, x, x^2, \ldots, x^4\}$ be the standard ordered basis of $V$.
Let $W$ be the vector space of $2 \times 3$ matrices over $\mathcal{R}$ with
the usual addition of matrices and scalar multiplication. Let $B_2$ be the
ordered basis of $W$ given as follows:
\[ B_2 = \Big\{
v_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\;
v_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},\;
v_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},\;
v_4 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},\;
v_5 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},\;
v_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\Big\}. \]
Define $T : V \to W$ by:
For $f = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$, put
\[ T(f) = \begin{pmatrix} 0 & a_3 & a_2 + a_4 \\ a_1 + a_0 & a_0 & 0
\end{pmatrix}. \]
Construct the matrix $A = [T]_{B_1, B_2}$ that represents $T$ with respect to
the pair $B_1$, $B_2$ of ordered bases.
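11. Show that if $A \in M_n(\mathcal{C})$ is normal, then $Ax = \vec{0}$ if and
only if $A^*x = \vec{0}$. Use the matrices
$B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and
$B^* = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$ to show that
this "if and only if" does not hold in general.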
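12. Let $U \in M_{p,m}(\mathcal{C})$.
Show that the following are equivalent:
(a) $\begin{pmatrix} I_p & U \\ U^* & I_m \end{pmatrix}$ is positive definite;
(b) $I_p - UU^*$ is positive definite;
(c) $I_m - U^*U$ is positive definite.
(Hint: consider
\[ (v^*, w^*) \begin{pmatrix} I_p & U \\ U^* & I_m \end{pmatrix}
   \begin{pmatrix} v \\ w \end{pmatrix}
   = v^*(I_p - UU^*)v + \text{???} = w^*(I_m - U^*U)w + \text{???} \])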
13. Let A =
_
_
2 1 0
1 2 1
0 1 2
_
_
. Show that A is positive denite in three
dierent ways.
14. Compute a singular value decomposition of A where
A =
_
1 + i 1 0
1 i 0 1
_
.
15. You may assume that $V$ is a finite-dimensional vector space over $F$, a
subfield of $\mathcal{C}$, say $\dim(V) = n$. Let $T \in \mathcal{L}(V)$. Prove
or disprove each of the following:
(a) $V = \mathrm{null}(T) \oplus \mathrm{range}(T)$.
(b) There exists a subspace $U$ of $V$ such that
$U \cap \mathrm{null}(T) = \{0\}$ and
$\mathrm{range}(T) = \{ T(u) : u \in U \}$.
16. Let R, S, T /(V ), where V is a complex inner product space.
(i) Suppose that S is an isometry and R is a positive operator such
that T = SR. Prove that R =

T.
(ii) Let denote the smallest singular value of T, and let

denote
the largest singular value of T. Prove that
_
_
_
T(v)
|v|
_
_
_

for
every nonzero v V .
17. Let $A$ be an $n \times n$ matrix over $F$, where $F$ is either
$\mathcal{C}$ or $\mathcal{R}$. Show that there is a polynomial
$f(x) \in F[x]$ for which $A^* = f(A)$ if and only if $A$ is normal. What can
you say about the minimal degree of such a polynomial?
18. Let $F$ be any subfield of $\mathcal{C}$. Let $F^n$ be endowed with the
standard inner product, and let $A$ be a normal, $n \times n$ matrix over $F$.
(For example, $F$ could be the rational field $Q$.) The following steps give
another view of normal matrices. Prove the following statements without using
the spectral theorem (for $\mathcal{C}$).
(i) $\mathrm{null}(A) = (\mathrm{col}(A))^\perp = \mathrm{null}(A^*)$.
(ii) If $A^2x = 0$, then $Ax = 0$.
(iii) If $f(x) \in F[x]$, then $f(A)$ is normal.
(iv) Suppose $f(x), g(x) \in F[x]$ with $1 = \gcd(f(x), g(x))$. If
$f(A)x = 0$ and $g(A)y = 0$, then $\langle x, y \rangle = 0$.
(v) Suppose $p(x)$ is the minimal polynomial of $A$. Then $p(x)$ has
no repeated (irreducible) factors.
19. (a) Prove that a normal operator on a complex inner product space
with real eigenvalues is self-adjoint.
(b) Let $T : V \to V$ be a self-adjoint operator. Is it true that $T$ must
have a cube root? Explain. (A cube root of $T$ is an operator
$S : V \to V$ such that $S^3 = T$.)
Chapter 10
Decomposition WRT a Linear
Operator
To start this chapter we assume that $F$ is an arbitrary field and let $V$ be
an $n$-dimensional vector space over $F$. Note that this necessarily means
that we are not assuming that $V$ is an inner product space.
10.1 Powers of Operators
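First note that if $T \in \mathcal{L}(V)$, if $k$ is a nonnegative integer,
and if $T^k(v) = \vec{0}$, then $T^{k+1}(v) = \vec{0}$. Thus
$\mathrm{null}(T^k) \subseteq \mathrm{null}(T^{k+1})$. It follows that
\[ \{\vec{0}\} = \mathrm{null}(T^0) \subseteq \mathrm{null}(T^1) \subseteq
   \cdots \subseteq \mathrm{null}(T^k) \subseteq \mathrm{null}(T^{k+1})
   \subseteq \cdots. \tag{10.1} \]
Theorem 10.1.1. Let $T \in \mathcal{L}(V)$ and suppose that $m$ is a
nonnegative integer such that $\mathrm{null}(T^m) = \mathrm{null}(T^{m+1})$.
Then $\mathrm{null}(T^m) = \mathrm{null}(T^k)$ for all $k \ge m$.
Proof. Let $k$ be a positive integer. We want to prove that
\[ \mathrm{null}(T^{m+k}) = \mathrm{null}(T^{m+k+1}). \]
We already know that
$\mathrm{null}(T^{m+k}) \subseteq \mathrm{null}(T^{m+k+1})$. To prove the
inclusion in the other direction, suppose that
$v \in \mathrm{null}(T^{m+k+1})$. Then
\[ \vec{0} = T^{m+1}(T^k(v)). \]
So $T^k(v) \in \mathrm{null}(T^{m+1}) = \mathrm{null}(T^m)$, implying that
\[ \vec{0} = T^m(T^k(v)) = T^{m+k}(v). \]
Hence $\mathrm{null}(T^{m+k+1}) \subseteq \mathrm{null}(T^{m+k})$, completing
the proof.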
Corollary 10.1.2. If $T \in \mathcal{L}(V)$, $n = \dim(V)$, and $k$ is any
positive integer, then $\mathrm{null}(T^n) = \mathrm{null}(T^{n+k})$.
Proof. If the set containments in Eq. 10.1 are strict for as long as possible,
at each stage the dimension of a given null space is at least one more than
the dimension of the preceding null space. Since the entire space has
dimension $n$, at most $n$ proper containments can occur, so
$\mathrm{null}(T^m) = \mathrm{null}(T^{m+1})$ for some $m \le n$. Hence by
Theorem 10.1.1 the Corollary must be true.
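The stabilization of the chain in Eq. 10.1 can be watched directly on a small
matrix. The following is a minimal sketch using exact rational arithmetic;
the helper names `rank`, `matmul` and `nullity_chain` and the $3 \times 3$
example are assumptions introduced for illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def nullity_chain(T, steps):
    """Return [dim null(T^0), dim null(T^1), ..., dim null(T^steps)]."""
    n = len(T)
    power = [[int(i == j) for j in range(n)] for i in range(n)]   # T^0 = I
    dims = []
    for _ in range(steps + 1):
        dims.append(n - rank(power))      # rank-nullity: nullity = n - rank
        power = matmul(power, T)
    return dims

# Hypothetical example: a nilpotent 2x2 block together with a 1x1 block (1).
T = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 1]]
print(nullity_chain(T, 5))   # → [0, 1, 2, 2, 2, 2]
```

The dimensions increase strictly until two consecutive null spaces agree
(here at $m = 2$) and are constant afterwards, as Theorem 10.1.1 and
Corollary 10.1.2 state.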
Definition: Let $T \in \mathcal{L}(V)$ and suppose that $\lambda$ is an
eigenvalue of $T$. A vector $v \in V$ is called a generalized eigenvector of
$T$ corresponding to $\lambda$ provided
\[ (T - \lambda I)^j(v) = \vec{0} \text{ for some positive integer } j.
   \tag{10.2} \]
By taking $j = 1$ we see that each eigenvector is also a generalized
eigenvector. Also, the set of generalized eigenvectors is a subspace of $V$.
Moreover, by Corollary 10.1.2 (with $T$ replaced by $T - \lambda I$) we see
that
Corollary 10.1.3. The set of generalized eigenvectors corresponding to
$\lambda$ is exactly equal to $\mathrm{null}[(T - \lambda I)^n]$, where
$n = \dim(V)$.
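A two-dimensional example, chosen here only for illustration, shows how the
space of generalized eigenvectors can be strictly larger than the eigenspace.

```latex
\[
T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
T - I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
(T - I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
% null(T - I) = span(e_1): the eigenspace for \lambda = 1 is one-dimensional.
% null[(T - I)^2] = F^2: every vector is a generalized eigenvector for \lambda = 1.
% In particular e_2 is a generalized eigenvector that is not an eigenvector,
% since (T - I)e_2 = e_1 \ne 0 while (T - I)^2 e_2 = 0.
```

Here $n = 2$, so Corollary 10.1.3 identifies the generalized eigenvectors
with $\mathrm{null}[(T - I)^2]$, which is the whole space.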
An operator $N \in \mathcal{L}(V)$ is said to be nilpotent provided some power
of $N$ is the zero operator. This is equivalent to saying that the minimal
polynomial $p(x)$ for $N$ has the form $p(x) = x^j$ for some positive integer
$j$, implying that the characteristic polynomial for $N$ is $x^n$. Then by the
Cayley-Hamilton Theorem we see that $N^n$ is the zero operator.
Now we turn to dealing with images of operators. Let $T \in \mathcal{L}(V)$
and $k \ge 0$. If $w \in \mathrm{Im}(T^{k+1})$, say $w = T^{k+1}(v)$ for some
$v \in V$, then $w = T^k(T(v)) \in \mathrm{Im}(T^k)$. In other words we have
\[ V = \mathrm{Im}(T^0) \supseteq \mathrm{Im}(T^1) \supseteq \cdots \supseteq
   \mathrm{Im}(T^k) \supseteq \mathrm{Im}(T^{k+1}) \supseteq \cdots.
   \tag{10.3} \]
Theorem 10.1.4. If $T \in \mathcal{L}(V)$, $n = \dim(V)$, and $k$ is any
positive integer, then
\[ \mathrm{Im}(T^n) = \mathrm{Im}(T^{n+k}). \]
Proof. We use the corresponding result already proved for null spaces.
\begin{align*}
\dim(\mathrm{Im}(T^{n+k})) &= n - \dim(\mathrm{null}(T^{n+k})) \\
                           &= n - \dim(\mathrm{null}(T^n)) \\
                           &= \dim(\mathrm{Im}(T^n)).
\end{align*}
Now the proof is easily finished using Eq. 10.3.
10.2 The Algebraic Multiplicity of an
Eigenvalue
If the matrix $A$ is upper triangular, so is the matrix $xI - A$, whose
determinant is the product of its diagonal elements and also is the
characteristic polynomial of $A$. So the number of times a given scalar
$\lambda$ appears on the diagonal of $A$ is also the algebraic multiplicity of
$\lambda$ as a root of the characteristic polynomial of $A$. The next theorem
states that the algebraic multiplicity of an eigenvalue is also the dimension
of the space of generalized eigenvectors associated with that eigenvalue.
Theorem 10.2.1. Let $n = \dim(V)$, let $T \in \mathcal{L}(V)$ and let
$\lambda \in F$. Then for each basis $B$ of $V$ for which $[T]_B$ is upper
triangular, $\lambda$ appears on the diagonal of $[T]_B$ exactly
$\dim(\mathrm{null}[(T - \lambda I)^n])$ times.
Proof. For notational convenience, we first assume that $\lambda = 0$. Once
this case is handled, the general case is obtained by replacing $T$ with
$T - \lambda I$. The proof is by induction on $n$, and the theorem is clearly
true when $n = 1$. We assume that $n > 1$ and that the theorem holds on spaces
of dimension $n - 1$.
Suppose that $B = (v_1, \ldots, v_n)$ is a basis of $V$ for which $[T]_B$ is
the upper triangular matrix
\[ \begin{pmatrix}
\lambda_1 & & & * \\
 & \ddots & & \\
 & & \lambda_{n-1} & \\
0 & & & \lambda_n
\end{pmatrix}. \tag{10.4} \]
Let $U = \mathrm{span}(v_1, \ldots, v_{n-1})$. Clearly $U$ is invariant under
$T$ and the matrix of $T|_U$ with respect to the basis
$(v_1, \ldots, v_{n-1})$ is
\[ \begin{pmatrix}
\lambda_1 & & * \\
 & \ddots & \\
0 & & \lambda_{n-1}
\end{pmatrix}. \tag{10.5} \]
By our induction hypothesis, 0 appears on the diagonal of the matrix in
Eq. 10.5 exactly $\dim(\mathrm{null}((T|_U)^{n-1}))$ times. Also, we know that
$\mathrm{null}((T|_U)^{n-1}) = \mathrm{null}((T|_U)^n)$, because
$\dim(U) = n - 1$. Hence
\[ 0 \text{ appears on the diagonal of 10.5 }
   \dim(\mathrm{null}((T|_U)^n)) \text{ times.} \tag{10.6} \]
The proof now breaks into two cases, depending on whether $\lambda_n = 0$ or
not. First consider the case where $\lambda_n \ne 0$. We show in this case that
\[ \mathrm{null}(T^n) \subseteq U. \tag{10.7} \]
This will show that $\mathrm{null}(T^n) = \mathrm{null}((T|_U)^n)$, and hence
Eq. 10.6 will say that 0 appears on the diagonal of Eq. 10.4 exactly
$\dim(\mathrm{null}(T^n))$ times, completing the proof in the case where
$\lambda_n \ne 0$.
It follows from Eq. 10.4 that
\[ [T^n]_B = ([T]_B)^n = \begin{pmatrix}
\lambda_1^n & & & * \\
 & \ddots & & \\
 & & \lambda_{n-1}^n & \\
0 & & & \lambda_n^n
\end{pmatrix}. \tag{10.8} \]
This shows that
\[ T^n(v_n) = u + \lambda_n^n v_n \]
for some $u \in U$. Suppose that $v \in \mathrm{null}(T^n)$. Then
$v = \tilde{u} + a v_n$ where $\tilde{u} \in U$ and $a \in F$. Thus
\[ \vec{0} = T^n(v) = T^n(\tilde{u}) + a T^n(v_n)
   = T^n(\tilde{u}) + a u + a \lambda_n^n v_n. \]
Because $T^n(\tilde{u})$ and $au$ are in $U$ and $v_n \notin U$, this implies
that $a \lambda_n^n = 0$. Since $\lambda_n \ne 0$, clearly $a = 0$. Thus
$v = \tilde{u} \in U$, completing the proof of Eq. 10.7, and hence finishing
the case with $\lambda_n \ne 0$.
Suppose $\lambda_n = 0$. Here we show that
\[ \dim(\mathrm{null}(T^n)) = \dim(\mathrm{null}((T|_U)^n)) + 1,
   \tag{10.9} \]
which along with Eq. 10.6 will complete the proof when $\lambda_n = 0$.
First consider
\begin{align*}
\dim(\mathrm{null}(T^n))
  &= \dim(U \cap \mathrm{null}(T^n)) + \dim(U + \mathrm{null}(T^n)) - \dim(U) \\
  &= \dim(\mathrm{null}((T|_U)^n)) + \dim(U + \mathrm{null}(T^n)) - (n - 1).
\end{align*}
Also,
\[ n = \dim(V) \ge \dim(U + \mathrm{null}(T^n)) \ge \dim(U) = n - 1. \]
It follows that if we can show that $\mathrm{null}(T^n)$ contains a vector not
in $U$, then Eq. 10.9 will be established. First note that since
$\lambda_n = 0$, we have $T(v_n) \in U$, hence
\[ T^n(v_n) = T^{n-1}(T(v_n)) \in \mathrm{Im}[(T|_U)^{n-1}]
   = \mathrm{Im}[(T|_U)^n]. \]
This says that there is some $u \in U$ for which $T^n(u) = T^n(v_n)$. Then
$u - v_n$ is not in $U$ but $T^n(u - v_n) = \vec{0}$. Hence Eq. 10.9 holds,
completing the proof.


At this point we know that the geometric multiplicity of $\lambda$ is the
dimension of the null space of $T - \lambda I$, i.e., the dimension of the
eigenspace associated with $\lambda$, and this is less than or equal to the
algebraic multiplicity of $\lambda$, which is the dimension of the null space
of $(T - \lambda I)^n$ and also equal to the multiplicity of $\lambda$ as a
root of the characteristic polynomial of $T$, at least in the case that $T$ is
upper triangularizable. This is always true if $F$ is algebraically closed.
Moreover, in this case the following corollary is clearly true:
Corollary 10.2.2. If $F$ is algebraically closed, then the sum of the
algebraic multiplicities of all the eigenvalues of $T$ equals $\dim(V)$.
For any $f(x) \in F[x]$, $T$ and $f(T)$ commute, so that the null space of
$f(T)$ is invariant under $T$.
Theorem 10.2.3. Suppose $F$ is algebraically closed and
$T \in \mathcal{L}(V)$. Let $\lambda_1, \ldots, \lambda_m$ be the distinct
eigenvalues of $T$, and let $U_1, \ldots, U_m$ be the corresponding subspaces
of generalized eigenvectors. Then
(a) $V = U_1 \oplus \cdots \oplus U_m$;
(b) each $U_j$ is $T$-invariant;
(c) each $(T - \lambda_j I)|_{U_j}$ is nilpotent.
Proof. Since $U_j = \mathrm{null}(T - \lambda_j I)^n$ for each $j$, clearly
(b) follows. Clearly (c) follows from the definitions. Also the sum of the
multiplicities of the eigenvalues equals $\dim(V)$, i.e.
\[ \dim(V) = \sum_{j=1}^m \dim(U_j). \]
Put $U = U_1 + \cdots + U_m$. Clearly $U$ is invariant under $T$. Hence we can
define $S \in \mathcal{L}(U)$ by
\[ S = T|_U. \]
It is clear that $S$ has the same eigenvalues, with the same multiplicities,
as does $T$, since all the generalized eigenvectors of $T$ are in $U$. Then
the dimension of $U$ is the sum of the dimensions of the generalized
eigenspaces of $T$, forcing $V = U$, and $V = \oplus_{j=1}^m U_j$, completing
the proof of (a).
There is another style of proof that offers a somewhat different insight into
this theorem, so we give it also. Suppose that
$f(\lambda) = (\lambda - \lambda_1)^{k_1} \cdots (\lambda - \lambda_m)^{k_m}$
is the characteristic polynomial of $T$. Then by the Cayley-Hamilton theorem
$(T - \lambda_1 I)^{k_1} \cdots (T - \lambda_m I)^{k_m}$ is the zero operator.
Put $b_j(\lambda) = \frac{f(\lambda)}{(\lambda - \lambda_j)^{k_j}}$, for
$1 \le j \le m$. Then since
$\gcd(b_1(\lambda), \ldots, b_m(\lambda)) = 1$ there are polynomials
$a_j(\lambda)$ for which $1 = \sum_j a_j(\lambda) b_j(\lambda)$, which implies
that $I = \sum_j a_j(T) b_j(T)$. Then for any vector $u \in V$ we have
$u = \sum_j a_j(T) b_j(T)(u)$, where $a_j(T) b_j(T)(u) \in U_j$. It follows
that $V = \sum_j U_j$. Since the $U_j$ are linearly independent, we have
$V = \oplus_j U_j$.
By joining bases of the various generalized eigenspaces we obtain a basis
for $V$ consisting of generalized eigenvectors, proving the following
corollary.
Corollary 10.2.4. Let $F$ be algebraically closed and $T \in \mathcal{L}(V)$.
Then there is a basis of $V$ consisting of generalized eigenvectors of $T$.
Lemma 10.2.5. Let $N$ be a nilpotent operator on a vector space $V$ over any
field $F$. Then there is a basis of $V$ with respect to which the matrix of
$N$ has the form
\[ \begin{pmatrix}
0 & & * \\
 & \ddots & \\
0 & & 0
\end{pmatrix}. \tag{10.10} \]
Proof. First choose a basis of $\mathrm{null}(N)$. Then extend this to a basis
of $\mathrm{null}(N^2)$. Then extend this to a basis of $\mathrm{null}(N^3)$.
Continue in this fashion until eventually a basis of $V$ is obtained, since
$V = \mathrm{null}(N^m)$ for sufficiently large $m$. A little thought should
make it clear that with respect to this basis, the matrix of $N$ is upper
triangular with zeros on the diagonal.
10.3 Elementary Operations
For $1 \le i \le n$, let $e_i$ denote the column vector with a 1 in the $i$th
position and zeros elsewhere. Then $e_i e_j^T$ is an $n \times n$ matrix with
all entries other than the $(i, j)$ entry equal to zero, and with that entry
equal to 1.
Let
\[ E_{ij}(c) = I + c\, e_i e_j^T. \]
We leave to the reader the exercise of proving the following elementary
results:
Lemma 10.3.1. The following elementary row and column operations are
obtained by pre- and post-multiplying by the elementary matrix $E_{ij}(c)$:
(i) $E_{ij}(c)A$ is obtained by adding $c$ times the $j$th row of $A$ to the
$i$th row.
(ii) $AE_{ij}(c)$ is obtained by adding $c$ times the $i$th column of $A$ to
the $j$th column.
(iii) $E_{ij}(-c)$ is the inverse of the matrix $E_{ij}(c)$.
Moreover, if $T$ is upper triangular with diagonal
$\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, and if
$\lambda_i \ne \lambda_j$ with $i < j$, then
\[ T' = E_{ij}(-c)\, T\, E_{ij}(c) = E_{ij}(-c)
\begin{pmatrix}
\lambda_1 & & & & & * \\
 & \ddots & & & & \\
 & & \lambda_i & & & \\
 & & & \ddots & & \\
 & & & & \lambda_j & \\
0 & & & & & \ddots
\end{pmatrix}
E_{ij}(c), \]
where $T'$ is obtained from $T$ by replacing the $(i, j)$ entry $t_{ij}$ of
$T$ with $t_{ij} + c(\lambda_i - \lambda_j)$. The only other entries of $T$
that can possibly be affected are to the right of $t_{ij}$ or above $t_{ij}$.
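The effect described in Lemma 10.3.1 can be checked by hand in the smallest
case. The $2 \times 2$ matrix below is an illustrative choice, not an example
from the text.

```latex
\[
T = \begin{pmatrix} 1 & 5 \\ 0 & 2 \end{pmatrix}, \qquad
E_{12}(c) = \begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix}, \qquad
T E_{12}(c) = \begin{pmatrix} 1 & 5 + c \\ 0 & 2 \end{pmatrix},
\]
\[
E_{12}(-c)\, T\, E_{12}(c)
  = \begin{pmatrix} 1 & -c \\ 0 & 1 \end{pmatrix}
    \begin{pmatrix} 1 & 5 + c \\ 0 & 2 \end{pmatrix}
  = \begin{pmatrix} 1 & 5 - c \\ 0 & 2 \end{pmatrix}.
\]
% The (1,2) entry became t_{12} + c(\lambda_1 - \lambda_2) = 5 + c(1 - 2) = 5 - c.
% Choosing c = -t_{12} / (\lambda_1 - \lambda_2) = 5 clears it,
% leaving the diagonal matrix diag(1, 2).
```

Since $\lambda_i \ne \lambda_j$, such a $c$ always exists, which is what makes
the repeated clearing of off-diagonal entries possible.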
Using Lemma 10.3.1 over and over, starting low, working left to right in
a given row and moving upward, we can transform $T$ into a direct sum of
blocks
\[ P^{-1}TP = \begin{pmatrix}
A_1 & & & \\
 & A_2 & & \\
 & & \ddots & \\
 & & & A_r
\end{pmatrix}, \]
where each block $A_i$ is upper triangular with each diagonal element equal to
$\lambda_i$, $1 \le i \le r$.
Our next goal is to find an invertible matrix $U_i$ that transforms the block
$A_i$ into a matrix $U_i^{-1} A_i U_i = \lambda_i I + N$ where $N$ is
nilpotent with a special form. All entries on or below the main diagonal are
0, all entries immediately above the diagonal are 0 or 1, and all entries
further above the diagonal are 0. This matrix $\lambda_i I + N$ is called a
Jordan block. We want to arrange it so that it is a direct sum of elementary
Jordan blocks. These are matrices of the form $\lambda_i I + N$ where each
entry of $N$ just above the diagonal is 1 and all other elements of $N$ are 0.
Then we want to use block multiplication to transform $P^{-1}TP$ into a direct
sum of elementary Jordan blocks.
10.4 Transforming Nilpotent Matrices
Let $B$ be $n \times n$, complex and nilpotent of order $p$:
$B^{p-1} \ne 0 = B^p$. By $N(B^j)$ we mean the right null space of the matrix
$B^j$. Note that if $j > i$, then $N(B^i) \subseteq N(B^j)$. We showed above
that if $N(B^i) = N(B^{i+1})$, then $N(B^i) = N(B^k)$ for all $k \ge i$.
Step 1.
Lemma 10.4.1. Let $W = (w_1, \ldots, w_r)$ be an independent list of vectors
in $N(B^{j+1})$ with $\langle W \rangle \cap N(B^j) = \{0\}$. Then
$BW = (Bw_1, \ldots, Bw_r)$ is an independent list in $N(B^j)$ with
$\langle BW \rangle \cap N(B^{j-1}) = \{0\}$.
Proof. Clearly $BW \subseteq N(B^j)$. Suppose
$\sum_{i=1}^r c_i Bw_i + u_{j-1} = 0$, with $u_{j-1} \in N(B^{j-1})$. Then
$0 = B^{j-1}\big( \sum_{i=1}^r c_i Bw_i + u_{j-1} \big)
= \sum_{i=1}^r c_i B^j w_i + 0$. This says
$\sum_{i=1}^r c_i w_i \in N(B^j)$, so by hypothesis
$c_1 = c_2 = \cdots = c_r = 0$, and hence also $u_{j-1} = 0$. It is now easy
to see that the Lemma must be true.
Before proceeding to the next step, we introduce some new notation. Suppose $V_1$ is a subspace of $V$. To say that the list $(w_1, \ldots, w_r)$ is independent in $V \setminus V_1$ means first that it is independent, and second that if $W$ is the span $\langle w_1, \ldots, w_r \rangle$ of the given list, then $W \cap V_1 = \{0\}$. To say that the list $(w_1, \ldots, w_r)$ is a basis of $V \setminus V_1$ means that a basis of $V_1$ adjoined to the list $(w_1, \ldots, w_r)$ is a basis of $V$. Also, $\langle L \rangle$ denotes the space spanned by the list $L$.
Step 2. Let $U_p$ be a basis of $N(B^p) \setminus N(B^{p-1}) = \mathbb{C}^n \setminus N(B^{p-1})$, since $B^p = 0$. By Lemma 10.4.1, $BU_p$ is an independent list in $N(B^{p-1}) \setminus N(B^{p-2})$, with $\langle BU_p \rangle \cap N(B^{p-2}) = \{0\}$.
Complete $BU_p$ to a basis $(BU_p, U_{p-1})$ of $N(B^{p-1}) \setminus N(B^{p-2})$. At this point we have that
\[
(BU_p, U_{p-1}, U_p) \text{ is a basis of } N(B^p) \setminus N(B^{p-2}) = \mathbb{C}^n \setminus N(B^{p-2}).
\]
Step 3. $(B^2U_p, BU_{p-1})$ is an independent list in $N(B^{p-2}) \setminus N(B^{p-3})$. Complete this to a basis $(B^2U_p, BU_{p-1}, U_{p-2})$ of $N(B^{p-2}) \setminus N(B^{p-3})$. At this stage we have that
\[
(U_p, BU_p, U_{p-1}, B^2U_p, BU_{p-1}, U_{p-2}) \text{ is a basis of } N(B^p) \setminus N(B^{p-3}).
\]
Step 4. Proceed in this way until a basis for the entire space V has been
obtained. At that point we will have a situation described in the following
array:
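\[
\begin{array}{ll}
\text{Independent set} & \text{basis for subspace} \\
(U_p) & N(B^p) \setminus N(B^{p-1}) = \mathbb{C}^n \setminus N(B^{p-1}) \\
(U_{p-1}, BU_p) & N(B^{p-1}) \setminus N(B^{p-2}) \\
(U_{p-2}, BU_{p-1}, B^2U_p) & N(B^{p-2}) \setminus N(B^{p-3}) \\
(U_{p-3}, BU_{p-2}, B^2U_{p-1}, B^3U_p) & N(B^{p-3}) \setminus N(B^{p-4}) \\
\quad\vdots & \quad\vdots \\
(U_1, BU_2, \ldots, B^{p-2}U_{p-1}, B^{p-1}U_p) & N(B^{p-(p-1)}) \setminus N(B^0) = N(B)
\end{array}
\tag{10.11}
\]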
Choose $x = x_1 \in B^{p-j-1}U_{p-j}$ and follow it back up to the top of that column: $x_{p-j} \in U_{p-j}$; $Bx_{p-j} = x_{p-j-1}$; $\ldots$; $Bx_2 = x_1$. So $B^{p-j-1}x_{p-j} = x_1$.
We now want to interpret what it means for
\[
P^{-1}BP = H =
\begin{pmatrix}
H_1 & & \\
& \ddots & \\
& & H_s
\end{pmatrix}
\quad\text{with}\quad
H_i =
\begin{pmatrix}
0 & 1 & & & \\
& 0 & 1 & & \\
& & \ddots & \ddots & \\
& & & 0 & 1 \\
& & & & 0
\end{pmatrix}.
\]
This last matrix is supposed to have 1's along the superdiagonal and 0's elsewhere. This says that the $j$th column of $BP$ (which is $B$ times the $j$th column of $P$) equals the $j$th column of $PH$, which is either the zero column or the $(j-1)$st column of $P$. So we need
\[
B(j\text{th column of } P) = 0 \text{ or the } (j-1)\text{st column of } P.
\]
So to form $P$, as the columns of $P$ we take the vectors in the Array 10.11 starting at the bottom of a column, moving up to the top, then from the bottom to the top of the next column to the left, etc. Each column of Array 10.11 represents one block $H_i$, and the bottom row consists of a basis for the null space of $B$. Then we do have
\[
P^{-1}BP = H =
\begin{pmatrix}
H_1 & & \\
& \ddots & \\
& & H_s
\end{pmatrix}.
\]
Suppose we have a matrix $A = \lambda I + B$ where $B$ is nilpotent with all elements on or below the diagonal equal to 0. Construct $P$ so that $P^{-1}BP = H$, i.e., $P^{-1}AP = P^{-1}(B + \lambda I)P = \lambda I + H$ is a direct sum of elementary Jordan blocks each with the same eigenvalue $\lambda$ along the diagonal and 1's just above the diagonal. Such a matrix is said to be in Jordan form.
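Start with a general $n \times n$ matrix $A$ over $\mathbb{C}$. Here is the general algorithm for finding a Jordan form for $A$.
1. Find $P_1$ so that $P_1^{-1}AP_1 = T$ is in upper triangular Schur form with equal eigenvalues grouped together, so
\[
T =
\begin{pmatrix}
\lambda_1 & & & & & & & & & \\
& \ddots & & & & & * & & & \\
& & \lambda_1 & & & & & & & \\
& & & \lambda_2 & & & & & & \\
& & & & \ddots & & & & & \\
& & & & & \lambda_2 & & & & \\
& & & & & & \ddots & & & \\
& & 0 & & & & & \lambda_r & & \\
& & & & & & & & \ddots & \\
& & & & & & & & & \lambda_r
\end{pmatrix}.
\]
2. Find $P_2$ so that
\[
P_2^{-1}TP_2 =
\begin{pmatrix}
T_{\lambda_1} & & & 0 \\
& T_{\lambda_2} & & \\
& & \ddots & \\
0 & & & T_{\lambda_r}
\end{pmatrix}
= C.
\]
3. Find $P_3$ so that $P_3^{-1}CP_3 = D + H$ where
\[
(P_1P_2P_3)^{-1} A \,(P_1P_2P_3) = D + H \text{ is in Jordan form.}
\]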
An $n \times n$ elementary Jordan block is a matrix of the form $J_n(\lambda) = \lambda I + N_n$, where $\lambda \in \mathbb{C}$ and $N_n$ is the $n \times n$ nilpotent matrix with 1's along the superdiagonal and 0's elsewhere. The minimal and characteristic polynomials of $J_n(\lambda)$ are both equal to $(x - \lambda)^n$, and $J_n(\lambda)$ has a 1-dimensional space of eigenvectors spanned by $(1, 0, \ldots, 0)^T$ and an $n$-dimensional space of generalized eigenvectors, all belonging to the eigenvalue $\lambda$. If $A$ has a Jordan form consisting of a direct sum of $s$ elementary Jordan blocks, the characteristic polynomials of these blocks are the elementary divisors of $A$. The characteristic polynomial of $A$ is the product of its elementary divisors. If $\lambda_1, \ldots, \lambda_s$ are the distinct eigenvalues of $A$, and if for each $i$, $1 \le i \le s$, $(x - \lambda_i)^{m_i}$ is the largest elementary divisor involving the eigenvalue $\lambda_i$, then $\prod_{i=1}^{s} (x - \lambda_i)^{m_i}$ is the minimal polynomial for $A$. It is clear that $(x - \lambda)$ divides the minimal polynomial of $A$ if and only if it divides the characteristic polynomial of $A$ if and only if $\lambda$ is an eigenvalue of $A$.
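The claims about a single elementary Jordan block are easy to confirm by direct computation. The sketch below is only an illustration under our own naming (`jordan_block`, `nilpotency_index` are hypothetical helpers, and the size 4 and eigenvalue 3 are sample values): it builds $J_4(3)$, finds the smallest $k$ with $(J - 3I)^k = 0$, which is the exponent of $(x - 3)$ in the minimal polynomial, and checks that $e_1$ is an eigenvector.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def jordan_block(n, lam):
    """J_n(lam): lam on the diagonal, 1 on the superdiagonal, 0 elsewhere."""
    return [[lam if s == r else 1 if s == r + 1 else 0 for s in range(n)]
            for r in range(n)]

def nilpotency_index(M):
    """Smallest k with M^k = 0; M is assumed to be nilpotent."""
    n = len(M)
    zero = [[0] * n for _ in range(n)]
    power, k = M, 1
    while power != zero:
        power, k = matmul(power, M), k + 1
    return k

n, lam = 4, 3
J = jordan_block(n, lam)
# N_n = J - lam * I is the nilpotent part of the block.
N = [[J[r][s] - (lam if r == s else 0) for s in range(n)] for r in range(n)]
index = nilpotency_index(N)            # exponent in the minimal polynomial
first_column = [row[0] for row in N]   # (J - lam I) e_1, zero iff e_1 is an eigenvector
print(index, first_column)
```

For a direct sum of several blocks with the same eigenvalue, the same index computation returns the size of the largest block, in line with the statement about the largest elementary divisor.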
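At this point we consider an example in great detail. In order to minimize the tedium of working through many computations, we start with a block-upper triangular matrix $N$ with square nilpotent matrices along the main diagonal, so it is clear from the start that $N$ is nilpotent. Let $N = (P_{ij})$ where the blocks $P_{ij}$ are given by
\[
P_{11} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
P_{22} = P_{33} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad
P_{44} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
P_{55} = (0).
\]
These diagonal blocks determine the sizes of each of the other blocks, and all blocks $P_{ij}$ with $i > j$ are zero blocks. For the blocks above the diagonal we have the following.
\[
P_{12} = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix}; \quad
P_{13} = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 1 & 0 \\ -1 & 1 & 0 \\ -1 & 1 & 0 \end{pmatrix};
\]
\[
P_{14} = \begin{pmatrix} -1 & 1 \\ -1 & 1 \\ -1 & 1 \\ -1 & 1 \end{pmatrix}; \quad
P_{15} = \begin{pmatrix} -1 \\ -1 \\ -1 \\ -1 \end{pmatrix};
\]
\[
P_{23} = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix}; \quad
P_{24} = \begin{pmatrix} -1 & 1 \\ -1 & 1 \\ -1 & 1 \end{pmatrix};
\]
\[
P_{25} = P_{35} = \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}; \quad
P_{34} = \begin{pmatrix} -1 & 1 \\ -1 & 1 \\ 0 & 1 \end{pmatrix};
\]
\[
P_{45} = \begin{pmatrix} -1 \\ 0 \end{pmatrix}.
\]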
The reader might want to write out the matrix $N$ as a $13 \times 13$ matrix to visualize it better, but for computing $N^2$ using block multiplication the above blocks suffice. We give $N^2$ as an aid to computing its null space.
\[
N^2 =
\left(\begin{array}{rrrrrrrrrrrrr}
0 & 0 & 1 & 0 & -1 & 0 & 1 & -1 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 1 & -1 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right)
\]
Another routine computation shows that $N^3$ has first row equal to
\[
(0 \;\; 0 \;\; 0 \;\; 1 \;\; {-1} \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0)
\]
and all other entries are zeros.
Now let $x = (x_1, x_2, \ldots, x_{13})^T$ and write out the conditions for $x$ to be in the null space of $N^j$ for $j = 1, 2, 3, 4$.
The null space of $N^4$ is $\mathbb{R}^{13}$. So $N^4$ has rank 0 and nullity 13.
$x$ is in the null space of $N^3$ if and only if $x_4 = x_5$. $N^3$ has rank 1 and nullity 12.
$x$ is in the null space of $N^2$ if and only if $x_3 = x_4 = x_5$, $x_7 = x_8$ and $x_{10} = x_{11}$. $N^2$ has rank 4 and nullity 9.
$x$ is in the null space of $N$ if and only if $x_2 = x_3 = x_4 = x_5$, $x_6 = x_7 = x_8$, $x_9 = x_{10} = x_{11}$ and $x_{12} = x_{13}$. $N$ has rank 8 and nullity 5.
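The ranks and nullities just listed can be confirmed mechanically. The sketch below assembles $N$ from its blocks and computes the nullity of $N^j$ for $j = 1, \ldots, 4$ with exact rational arithmetic. The signs of the off-diagonal block entries are an assumption on our part: they are the ones forced by the null space conditions stated above (for instance $x_4 = x_5$ for $N^3$). The helper names (`shift`, `rank`, `blocks`, `offsets`) are ours.

```python
from fractions import Fraction

def shift(k):
    """k x k nilpotent block with 1's on the superdiagonal."""
    return [[1 if s == r + 1 else 0 for s in range(k)] for r in range(k)]

# Blocks P_ij (0-based block indices); signs above the diagonal are assumed.
blocks = {
    (0, 0): shift(4), (1, 1): shift(3), (2, 2): shift(3),
    (3, 3): shift(2), (4, 4): [[0]],
    (0, 1): [[-1, 1, 0]] * 3 + [[0, 1, 0]],
    (0, 2): [[-1, 1, 0]] * 4,
    (0, 3): [[-1, 1]] * 4,
    (0, 4): [[-1]] * 4,
    (1, 2): [[-1, 1, 0]] * 2 + [[0, 1, 0]],
    (1, 3): [[-1, 1]] * 3,
    (1, 4): [[-1]] * 3,
    (2, 3): [[-1, 1]] * 2 + [[0, 1]],
    (2, 4): [[-1]] * 3,
    (3, 4): [[-1], [0]],
}
offsets = [0, 4, 7, 10, 12]          # first row/column of each block

N = [[0] * 13 for _ in range(13)]
for (bi, bj), blk in blocks.items():
    for r, row in enumerate(blk):
        for s, val in enumerate(row):
            N[offsets[bi] + r][offsets[bj] + s] = val

def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for col in range(len(A[0])):
        pivot = next((r for r in range(rk, len(A)) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[rk], A[pivot] = A[pivot], A[rk]
        for r in range(rk + 1, len(A)):
            factor = A[r][col] / A[rk][col]
            if factor:
                A[r] = [a - factor * b for a, b in zip(A[r], A[rk])]
        rk += 1
    return rk

powers = [N]
for _ in range(3):
    powers.append(matmul(powers[-1], N))      # N, N^2, N^3, N^4
nullities = [13 - rank(M) for M in powers]
print(nullities)
```

With these signs the first row of `powers[2]` (that is, $N^3$) has a $1$ in position 4 and a $-1$ in position 5, matching the condition $x_4 = x_5$.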
Now let $\mathcal{S} = (e_1, e_2, \ldots, e_{13})$ be the standard basis of $\mathbb{R}^{13}$. $U_4$ is to be a basis of $\mathrm{null}(N^4) \setminus \mathrm{null}(N^3)$ and must have one element since the nullity of $N^4$ is one greater than the nullity of $N^3$. Put $U_4 = \{e_1 + e_2 + e_3 + e_4\}$. It is easy to see that this vector is not in the null space of $N^3$.
For the next step first compute $N(e_1 + e_2 + e_3 + e_4) = e_1 + e_2 + e_3$. This vector is in $\mathrm{null}(N^3) \setminus \mathrm{null}(N^2)$. Since the nullity of $N^3$ is three more than the nullity of $N^2$, we need two more vectors in $\mathrm{null}(N^3) \setminus \mathrm{null}(N^2)$. We propose $e_1 + e_2 + \cdots + e_7$ and $e_1 + \cdots + e_{10}$. Then we would have
\[
\{e_1 + \cdots + e_{10},\; e_1 + \cdots + e_7,\; e_1 + e_2 + e_3\}
\]
as a basis of $\mathrm{null}(N^3) \setminus \mathrm{null}(N^2)$. This is fairly easy to check. The next step is to compute the image of each of these three vectors under $N$. We get the three vectors
\[
\{e_1 + \cdots + e_9,\; e_1 + \cdots + e_6,\; e_1 + e_2\}.
\]
Since the nullity of $N^2$ is four more than the nullity of $N$, we need one additional vector to have a basis of $\mathrm{null}(N^2) \setminus \mathrm{null}(N)$. We propose $e_1 + \cdots + e_{12}$. Then it is routine to show that
\[
\{e_1 + \cdots + e_{12},\; e_1 + \cdots + e_9,\; e_1 + \cdots + e_6,\; e_1 + e_2\}
\]
is a basis of $\mathrm{null}(N^2) \setminus \mathrm{null}(N)$.
We next compute the image of each of these four vectors under $N$ and get
\[
\{e_1 + \cdots + e_{11},\; e_1 + \cdots + e_8,\; e_1 + \cdots + e_5,\; e_1\}
\]
in the null space of $N$. Since the nullity of $N$ is 5, we need one more vector. We propose $e_1 + \cdots + e_{13}$. So then we have
\[
\{e_1 + \cdots + e_{13},\; e_1 + \cdots + e_{11},\; e_1 + \cdots + e_8,\; e_1 + \cdots + e_5,\; e_1\}
\]
as a basis for the null space of $N$.


At this point we are ready to form the chains of generalized eigenvectors into a complete Jordan basis. The first chain is:
\[
v_1 = e_1, \quad v_2 = e_1 + e_2, \quad v_3 = e_1 + e_2 + e_3, \quad v_4 = e_1 + e_2 + e_3 + e_4.
\]
The second chain is
\[
v_5 = e_1 + \cdots + e_5, \quad v_6 = e_1 + \cdots + e_6, \quad v_7 = e_1 + \cdots + e_7.
\]
The third chain is
\[
v_8 = e_1 + \cdots + e_8, \quad v_9 = e_1 + \cdots + e_9, \quad v_{10} = e_1 + \cdots + e_{10}.
\]
The fourth chain is
\[
v_{11} = e_1 + \cdots + e_{11}, \quad v_{12} = e_1 + \cdots + e_{12}.
\]
The fifth chain is
\[
v_{13} = e_1 + \cdots + e_{13}.
\]
With the basis $\mathcal{B} = (v_1, v_2, \ldots, v_{13})$ forming the columns of a matrix $P$, we have that $P^{-1}NP$ is in Jordan form with elementary Jordan blocks of sizes 4, 3, 3, 2 and 1.
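Saying that $P^{-1}NP$ is in Jordan form with these block sizes is the same as saying that $N$ sends the first vector of each chain to $0$ and every other chain vector to its predecessor. The following sketch checks exactly that, using $v_k = e_1 + \cdots + e_k$. As before, the minus signs in the off-diagonal blocks of $N$ are an assumption (the ones consistent with the stated null spaces), and all helper names are ours.

```python
def shift(k):
    """k x k nilpotent block with 1's on the superdiagonal."""
    return [[1 if s == r + 1 else 0 for s in range(k)] for r in range(k)]

# Blocks P_ij (0-based block indices); off-diagonal signs are assumed.
blocks = {
    (0, 0): shift(4), (1, 1): shift(3), (2, 2): shift(3),
    (3, 3): shift(2), (4, 4): [[0]],
    (0, 1): [[-1, 1, 0]] * 3 + [[0, 1, 0]],
    (0, 2): [[-1, 1, 0]] * 4,
    (0, 3): [[-1, 1]] * 4,
    (0, 4): [[-1]] * 4,
    (1, 2): [[-1, 1, 0]] * 2 + [[0, 1, 0]],
    (1, 3): [[-1, 1]] * 3,
    (1, 4): [[-1]] * 3,
    (2, 3): [[-1, 1]] * 2 + [[0, 1]],
    (2, 4): [[-1]] * 3,
    (3, 4): [[-1], [0]],
}
offsets = [0, 4, 7, 10, 12]

N = [[0] * 13 for _ in range(13)]
for (bi, bj), blk in blocks.items():
    for r, row in enumerate(blk):
        for s, val in enumerate(row):
            N[offsets[bi] + r][offsets[bj] + s] = val

def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def v(k):
    """v_k = e_1 + ... + e_k in R^13 (k is 1-based)."""
    return [1] * k + [0] * (13 - k)

chains = [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10], [11, 12], [13]]
chain_ok = all(
    apply(N, v(chain[0])) == [0] * 13
    and all(apply(N, v(cur)) == v(prev) for prev, cur in zip(chain, chain[1:]))
    for chain in chains
)
print(chain_ok)
```

Each chain of length $k$ contributes one $k \times k$ block with 1's on the superdiagonal, so the five chains give the block sizes 4, 3, 3, 2 and 1.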
10.5 An Alternative Approach to the Jordan Form
Let $F$ be an algebraically closed field. Suppose $A$ is $n \times n$ over $F$ with characteristic polynomial $f(\lambda) = |\lambda I - A| = (\lambda - \lambda_1)^{m_1}(\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_q)^{m_q}$, where $\lambda_i \neq \lambda_j$ if $i \neq j$, and each $m_i \ge 1$. Let $T_A \in \mathcal{L}(F^n)$ be defined as usual by $T_A : F^n \to F^n : x \mapsto Ax$. We have just seen that there is a basis $\mathcal{B}$ of $F^n$ for which $[T_A]_{\mathcal{B}}$ is in Jordan form. This was accomplished by finding a basis $\mathcal{B}$ which is the union of bases, each corresponding to an elementary Jordan block. If $\mathcal{B}_i$ corresponds to the $k \times k$ elementary Jordan block $J = \lambda_i I + N_k$, then $\mathcal{B}_i = (v_1, v_2, \ldots, v_k)$ where $Av_j = \lambda_i v_j + v_{j-1}$ for $2 \le j \le k$, and $Av_1 = \lambda_i v_1$. We call $(v_1, \ldots, v_k)$ a $k$-chain of generalized eigenvectors for the eigenvalue $\lambda_i$.
If $P$ is the $n \times n$ matrix whose columns are the vectors in $\mathcal{B}$ arranged into chains, with all the chains associated with a given eigenvalue being grouped together, then $P^{-1}AP = J_1 \oplus J_2 \oplus \cdots \oplus J_q$, where $J_i$ is the direct sum of the elementary Jordan blocks associated with $\lambda_i$. It is easy to see that $P^{-1}AP = D + N$, where $D$ is a diagonal matrix whose diagonal entries are $\lambda_1, \ldots, \lambda_q$, with each $\lambda_i$ repeated $m_i$ times, and $N$ is nilpotent. For an elementary block $\lambda I + N$ it is clear that $\lambda I \cdot N = N \cdot \lambda I = \lambda N$, from which it follows that also $DN = ND$. So
Lemma 10.5.1. $A = PDP^{-1} + PNP^{-1}$, where $PDP^{-1}$ is diagonalizable, $PNP^{-1}$ is nilpotent and $PDP^{-1}$ commutes with $PNP^{-1}$.
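Write the partial fraction decomposition of $1/f(\lambda)$ as
\[
\frac{1}{f(\lambda)} = \frac{a_1(\lambda)}{(\lambda - \lambda_1)^{m_1}} + \cdots + \frac{a_q(\lambda)}{(\lambda - \lambda_q)^{m_q}}, \tag{10.12}
\]
where $a_i(\lambda)$ is a polynomial in $\lambda$ with degree at most $m_i - 1$.
Put
\[
b_i(\lambda) = \frac{f(\lambda)}{(\lambda - \lambda_i)^{m_i}} = \prod_{j \neq i} (\lambda - \lambda_j)^{m_j}. \tag{10.13}
\]
Then put
\[
P_i = a_i(A)\, b_i(A) = a_i(A) \prod_{j \neq i} (A - \lambda_j I)^{m_j}. \tag{10.14}
\]
Since $f(\lambda)$ divides $b_i(\lambda) b_j(\lambda)$ when $i \neq j$, and $f(A) = 0$, we have
\[
P_i P_j = 0, \quad \text{if } i \neq j. \tag{10.15}
\]
Since $1 = \sum_{i=1}^{q} a_i(\lambda) b_i(\lambda)$, we have
\[
I = \sum_{i=1}^{q} a_i(A) b_i(A) = P_1 + \cdots + P_q. \tag{10.16}
\]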
Then $P_i = P_i I = P_i(P_1 + \cdots + P_q) = P_i^2$, so we can combine these last two results as
\[
P_i P_j = \delta_{ij} P_i. \tag{10.17}
\]
Then using $P_i^{m_i} = P_i$ we have
\[
[P_i(A - \lambda_i I)]^{m_i} = P_i (A - \lambda_i I)^{m_i} = a_i(A) b_i(A) (A - \lambda_i I)^{m_i} = a_i(A) f(A) = 0. \tag{10.18}
\]
Hence if we put $N_i = P_i(A - \lambda_i I)$, then if $i \neq j$
\[
N_i N_j = 0 \quad \text{and} \quad N_i^{m_i} = 0. \tag{10.19}
\]
Note that $P_i$ and $N_i$ are both polynomials in $A$.
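Start with $I = P_1 + \cdots + P_q$. Multiply by $A$ on the right.
\begin{align}
A &= P_1 A + P_2 A + \cdots + P_q A \notag \\
&= P_1(\lambda_1 I + A - \lambda_1 I) + \cdots + P_q(\lambda_q I + A - \lambda_q I) \notag \\
&= \sum_{i=1}^{q} \left(\lambda_i P_i + P_i(A - \lambda_i I)\right) \notag \\
&= \sum_{i=1}^{q} \lambda_i P_i + \sum_{i=1}^{q} N_i = D + N, \tag{10.20}
\end{align}
where $D = \sum_{i=1}^{q} \lambda_i P_i$ and $N = \sum_{i=1}^{q} N_i$. Clearly $D$ and $N$ are polynomials in $A$, so they commute with each other and with $A$, and $N$ is nilpotent. Starting with the observation that $D^2 = \left(\sum_{i=1}^{q} \lambda_i P_i\right)^2 = \sum_{i=1}^{q} \lambda_i^2 P_i$ it is easy to see that for any polynomial $g(\lambda) \in F[\lambda]$ we have
\[
g(D) = \sum_{i=1}^{q} g(\lambda_i) P_i. \tag{10.21}
\]
It then follows that if $g(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_q)$, then $g(D) = 0$. This says the minimal polynomial of $D$ has no repeated roots, so that $D$ is diagonalizable.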
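A small worked case may help fix the construction. The sketch below is an illustration under our own choice of matrix: $A$ has characteristic polynomial $f(\lambda) = (\lambda - 1)^2(\lambda - 2)$, and one checks by hand that $1/f = a_1/(\lambda - 1)^2 + a_2/(\lambda - 2)$ with $a_1(\lambda) = -\lambda$ and $a_2(\lambda) = 1$, since $-\lambda(\lambda - 2) + (\lambda - 1)^2 = 1$. The code then forms $P_1$, $P_2$, $D$ and $N$ as in Eqs. 10.14 and 10.20; the helper names are ours.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def identity(n):
    return [[int(r == s) for s in range(n)] for r in range(n)]

# Sample matrix with f(x) = (x - 1)^2 (x - 2); not taken from the text.
A = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 2]]
I = identity(3)
zero = scale(0, I)
A_minus_I = add(A, scale(-1, I))
A_minus_2I = add(A, scale(-2, I))

# P_i = a_i(A) b_i(A) with a_1 = -x, b_1 = x - 2, a_2 = 1, b_2 = (x - 1)^2.
P1 = scale(-1, matmul(A, A_minus_2I))
P2 = matmul(A_minus_I, A_minus_I)

D = add(scale(1, P1), scale(2, P2))     # D = 1*P1 + 2*P2
N = add(A, scale(-1, D))                # N = A - D

print("P1 =", P1)
print("P2 =", P2)
print("D  =", D)
print("N  =", N)
```

Here $P_1 + P_2 = I$, $P_1 P_2 = 0$, $D = \mathrm{diag}(1, 1, 2)$, $N^2 = 0$ and $DN = ND$, as the general argument predicts.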
Theorem 10.5.2. Let $A$ be $n \times n$ over $F$ and suppose the minimal polynomial for $A$ factors into linear factors. Then there is a diagonalizable matrix $D$ and a nilpotent matrix $N$ such that
(i) $A = D + N$; and
(ii) $DN = ND$.
The diagonalizable matrix $D$ and the nilpotent matrix $N$ are uniquely determined by (i) and (ii) and each of them is a polynomial in $A$.
Proof. We have just observed that we can write $A = D + N$ where $D$ is diagonalizable and $N$ is nilpotent, and where $D$ and $N$ not only commute but are polynomials in $A$. Suppose that we also have $A = D' + N'$ where $D'$ is diagonalizable and $N'$ is nilpotent, and $D'N' = N'D'$. (Recall that this was the case in Lemma 10.5.1.) Since $D'$ and $N'$ commute with one another and $A = D' + N'$, they commute with $A$, and hence with any polynomial in $A$, especially with $D$ and $N$. Thus $D$ and $D'$ are commuting diagonalizable matrices, so by Theorem 7.4.4 they are simultaneously diagonalizable. This means there is some invertible matrix $P$ with $P^{-1}DP$ and $P^{-1}D'P$ both diagonal and, of course, $P^{-1}NP$ and $P^{-1}N'P$ are both nilpotent. From $A = D + N = D' + N'$ we have $D - D' = N' - N$. Then $P^{-1}(D - D')P$ is diagonal and equals $P^{-1}(N' - N)P$, which is nilpotent (Why?), implying $P^{-1}(D - D')P$ is both diagonal and nilpotent, forcing $P^{-1}(D - D')P = 0$. Hence $D = D'$ and $N = N'$.
10.6 A Jordan Form for Real Matrices
Theorem 10.6.1. Let $V$ be a real vector space and $T \in \mathcal{L}(V)$. Then there is a basis $\mathcal{B}$ of $V$ for which
\[
[T]_{\mathcal{B}} =
\begin{pmatrix}
A_1 & & * \\
& \ddots & \\
0 & & A_m
\end{pmatrix},
\]
where each $A_j$ is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues.
Proof. The result is clearly true if $n = \dim(V) = 1$, so suppose $n = 2$. If $T$ has an eigenvalue $\lambda$, let $v_1$ be any nonzero eigenvector of $T$ belonging to $\lambda$. Extend $(v_1)$ to a basis $(v_1, v_2)$ of $V$. With respect to this basis $T$ has an upper triangular matrix of the form
\[
\begin{pmatrix} \lambda & a \\ 0 & b \end{pmatrix}.
\]
In particular, if $T$ has an eigenvalue, then there is a basis of $V$ with respect to which $T$ has an upper triangular matrix. If $T$ has no eigenvalues, then choose any basis $(v_1, v_2)$ of $V$. With respect to this basis, the matrix of $T$ has no eigenvalues. So we have the desired result when $n = 2$. Now suppose that $n = \dim(V) > 2$ and that the desired result holds for all real vector spaces with smaller dimension. If $T$ has an eigenvalue, let $U$ be a 1-dimensional subspace of $V$ that is $T$-invariant. Otherwise, by Theorem 7.3.1 let $U$ be a 2-dimensional $T$-invariant subspace of $V$. Choose any basis of $U$ and let $A_1$ denote the matrix of $T|_U$ with respect to this basis. If $A_1$ is a 2-by-2 matrix then $T$ has no eigenvalues, since otherwise we would have chosen $U$ to be 1-dimensional. Hence $T|_U$ and $A_1$ have no eigenvalues.
Let $W$ be any subspace of $V$ for which $V = U \oplus W$. We would like to apply the induction hypothesis to the subspace $W$. Unfortunately, $W$ might not be $T$-invariant, so we have to be a little tricky. Define $S \in \mathcal{L}(W)$ by
\[
S(w) = P_{W,U}(T(w)) \quad \forall\, w \in W.
\]
Note that
\begin{align}
T(w) &= P_{U,W}(T(w)) + P_{W,U}(T(w)) \tag{10.22} \\
&= P_{U,W}(T(w)) + S(w) \quad \forall\, w \in W. \notag
\end{align}
By our induction hypothesis, there is a basis for $W$ with respect to which $S$ has a block upper triangular matrix of the form
\[
\begin{pmatrix}
A_2 & & * \\
& \ddots & \\
0 & & A_m
\end{pmatrix},
\]
where each $A_j$ is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues. Adjoin this basis of $W$ to the basis of $U$ chosen above, getting a basis of $V$. The corresponding matrix of $T$ is a block upper triangular matrix of the desired form.
Our definition of the characteristic polynomial of the $2 \times 2$ matrix $A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$ is that it is
\[
f(x) = (x - a)(x - d) - bc = x^2 - (\mathrm{trace}(A))\,x + \det(A).
\]
The fact that this is a reasonable definition (the only reasonable definition if we want the Cayley-Hamilton theorem to be valid for such matrices) follows from the next result.
Theorem 10.6.2. Suppose $V$ is a real vector space with dimension 2 and $T \in \mathcal{L}(V)$ has no eigenvalues. Suppose $A$ is the matrix of $T$ with respect to some basis. Let $p(x) \in \mathbb{R}[x]$ be a monic polynomial of degree 2.
(a) If $p(x)$ equals the characteristic polynomial of $A$, then $p(T) = 0$.
(b) If $p(x)$ does not equal the characteristic polynomial of $A$, then $p(T)$ is invertible.
Proof. Part (a) follows immediately from the Cayley-Hamilton theorem, but it is also easy to derive it independently of that result. For part (b), let $q(x)$ denote the characteristic polynomial of $A$ with $p(x) \neq q(x)$. Write $p(x) = x^2 + \alpha_1 x + \beta_1$ and $q(x) = x^2 + \alpha_2 x + \beta_2$ for some $\alpha_1, \alpha_2, \beta_1, \beta_2 \in \mathbb{R}$. Now, since $q(T) = 0$ by part (a),
\[
p(T) = p(T) - q(T) = (\alpha_1 - \alpha_2)T + (\beta_1 - \beta_2)I.
\]
If $\alpha_1 = \alpha_2$, then $\beta_1 \neq \beta_2$, since otherwise $p = q$. In this case $p(T)$ is some nonzero multiple of the identity and hence is invertible. If $\alpha_1 \neq \alpha_2$, then
\[
p(T) = (\alpha_1 - \alpha_2)\left(T - \frac{\beta_2 - \beta_1}{\alpha_1 - \alpha_2}\, I\right),
\]
which is an invertible operator since $T$ has no eigenvalues. Thus (b) holds.
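Both parts of the theorem can be seen on the rotation matrix, which has no real eigenvalues. The sketch below is an illustration with our own sample data and helper names (`poly_at`, `det2`): it evaluates $x^2 + \alpha x + \beta$ at a $2 \times 2$ matrix and compares the characteristic polynomial $x^2 + 1$ with a different monic quadratic.

```python
def poly_at(A, alpha, beta):
    """Return A^2 + alpha*A + beta*I for a 2 x 2 matrix A."""
    (a, b), (c, d) = A
    A2 = [[a * a + b * c, a * b + b * d],
          [c * a + d * c, c * b + d * d]]
    return [[A2[0][0] + alpha * a + beta, A2[0][1] + alpha * b],
            [A2[1][0] + alpha * c, A2[1][1] + alpha * d + beta]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Rotation by 90 degrees: trace 0, determinant 1, no real eigenvalues.
A = [[0, -1],
     [1, 0]]
trace, det = A[0][0] + A[1][1], det2(A)

at_char = poly_at(A, -trace, det)   # characteristic polynomial x^2 + 1
at_other = poly_at(A, 1, 1)         # a different monic quadratic, x^2 + x + 1

print(at_char, det2(at_other))
```

The first evaluation is the zero matrix, while the second has nonzero determinant and so is invertible, in agreement with parts (a) and (b).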
Now suppose $V$ is 1-dimensional and $T \in \mathcal{L}(V)$. For $\lambda \in \mathbb{R}$, $\mathrm{null}(T - \lambda I)$ equals $V$ if $\lambda$ is an eigenvalue of $T$ and $\{0\}$ otherwise. If $\alpha, \beta \in \mathbb{R}$ with $\alpha^2 < 4\beta$, so that $x^2 + \alpha x + \beta = 0$ has no real roots, then
\[
\mathrm{null}(T^2 + \alpha T + \beta I) = \{0\}.
\]
(Proof: Because $V$ is 1-dimensional, there is a constant $\lambda \in \mathbb{R}$ such that $Tv = \lambda v$ for all $v \in V$. (Why is this true?) So if $v$ is a nonzero vector in $V$, then $(T^2 + \alpha T + \beta I)v = (\lambda^2 + \alpha\lambda + \beta)v$. The only way this can be 0 is for $v = 0$ or $(\lambda^2 + \alpha\lambda + \beta) = 0$. But this second equality cannot hold because we are assuming that $\alpha^2 < 4\beta$. Hence $\mathrm{null}(T^2 + \alpha T + \beta I) = \{0\}$.)
Now suppose that $V$ is a 2-dimensional real vector space and that $T \in \mathcal{L}(V)$ still has no eigenvalues. For $\lambda \in \mathbb{R}$, $\mathrm{null}(T - \lambda I) = \{0\}$ exactly because $T$ has no eigenvalues. If $\alpha, \beta \in \mathbb{R}$ with $\alpha^2 < 4\beta$, then $\mathrm{null}(T^2 + \alpha T + \beta I)$ equals all of $V$ if $x^2 + \alpha x + \beta$ is the characteristic polynomial of $T$ with respect to some (and hence to all) ordered bases of $V$, and equals $\{0\}$ otherwise by part (b) of Theorem 10.6.2. It is important to note that in this case the null space of $T^2 + \alpha T + \beta I$ is either $\{0\}$ or the whole 2-dimensional space!
The goal of this section is to prove the following theorem.
Theorem 10.6.3. Suppose that $V$ is a real vector space of dimension $n$ and $T \in \mathcal{L}(V)$. Suppose that with respect to some basis of $V$, the matrix of $T$ has the form
\[
A =
\begin{pmatrix}
A_1 & & * \\
& \ddots & \\
0 & & A_m
\end{pmatrix}, \tag{10.23}
\]
where each $A_j$ is a $1 \times 1$ matrix or a $2 \times 2$ matrix with no eigenvalues (as in Theorem 10.6.1).
(a) If $\lambda \in \mathbb{R}$, then precisely $\dim(\mathrm{null}((T - \lambda I)^n))$ of the matrices $A_1, \ldots, A_m$ equal the $1 \times 1$ matrix $[\lambda]$.
(b) If $\alpha, \beta \in \mathbb{R}$ satisfy $\alpha^2 < 4\beta$, then precisely
\[
\frac{\dim(\mathrm{null}((T^2 + \alpha T + \beta I)^n))}{2}
\]
of the matrices $A_1, \ldots, A_m$ have characteristic polynomial equal to $x^2 + \alpha x + \beta$.
Note that this implies that $\mathrm{null}((T^2 + \alpha T + \beta I)^n)$ must have even dimension.
Proof. With a little care we can construct one proof that can be used to prove both (a) and (b). For this, let $\lambda, \alpha, \beta \in \mathbb{R}$ with $\alpha^2 < 4\beta$. Define $p(x) \in \mathbb{R}[x]$ by
\[
p(x) =
\begin{cases}
x - \lambda, & \text{if we are trying to prove (a);} \\
x^2 + \alpha x + \beta, & \text{if we are trying to prove (b).}
\end{cases}
\]
Let $d$ denote the degree of $p(x)$. Thus $d = 1$ or $d = 2$, depending on whether we are trying to prove (a) or (b).
The basic idea of the proof is to proceed by induction on $m$, the number of blocks along the diagonal in Eq. 10.23. If $m = 1$, then $\dim(V) = 1$ or $\dim(V) = 2$. In this case the discussion preceding this theorem implies that the desired result holds. Our induction hypothesis is that for $m > 1$, the desired result holds when $m$ is replaced with $m - 1$.
Let $\mathcal{B}$ be a basis of $V$ with respect to which $T$ has the block upper-triangular matrix of Eq. 10.23. Let $U_j$ denote the span of the basis vectors corresponding to $A_j$. So $\dim(U_j) = 1$ if $A_j$ is $1 \times 1$ and $\dim(U_j) = 2$ if $A_j$ is a $2 \times 2$ matrix (with no eigenvalues). Let
\[
U = U_1 + U_2 + \cdots + U_{m-1} = U_1 \oplus U_2 \oplus \cdots \oplus U_{m-1}.
\]
Clearly $U$ is invariant under $T$ and the matrix of $T|_U$ with respect to the basis $\mathcal{B}'$ obtained from the basis vectors corresponding to $A_1, \ldots, A_{m-1}$ is
\[
[T|_U]_{\mathcal{B}'} =
\begin{pmatrix}
A_1 & & * \\
& \ddots & \\
0 & & A_{m-1}
\end{pmatrix}. \tag{10.24}
\]
Suppose that $\dim(U) = n'$, so $n'$ is either $n - 1$ or $n - 2$. Also, $\dim(\mathrm{null}(p(T|_U)^{n'})) = \dim(\mathrm{null}(p(T|_U)^{n}))$. Hence our induction hypothesis implies that
\[
\text{Precisely } \tfrac{1}{d} \dim(\mathrm{null}(p(T|_U)^{n})) \text{ of the matrices } A_1, \ldots, A_{m-1} \text{ have characteristic polynomial } p. \tag{10.25}
\]
Let $u_m$ be a vector in $U_m$. $T(u_m)$ might not be in $U_m$, since the entries of the matrix $A$ in the columns above the matrix $A_m$ might not all be 0. But we can project $T(u_m)$ onto $U_m$. So let $S \in \mathcal{L}(U_m)$ be the operator whose matrix with respect to the basis corresponding to $U_m$ is $A_m$. It follows that $S(u_m) = P_{U_m,U}\,T(u_m)$. Since $V = U \oplus U_m$, we know that for any vector $v \in V$ we have $v = P_{U,U_m}(v) + P_{U_m,U}(v)$. Putting $T(u_m)$ in place of $v$, we have
\begin{align}
T(u_m) &= P_{U,U_m}T(u_m) + P_{U_m,U}T(u_m) \tag{10.26} \\
&= {}^{*}u + S(u_m), \notag
\end{align}
where ${}^{*}u$ denotes an unknown vector in $U$. (Each time the symbol ${}^{*}u$ is used, it denotes some vector in $U$, but it might be a different vector each time the symbol is used.) Since $S(u_m) \in U_m$, so $T(S(u_m)) = {}^{*}u + S^2(u_m)$, we can apply $T$ to both sides of Eq. 10.26 to obtain
\[
T^2(u_m) = {}^{*}u + S^2(u_m). \tag{10.27}
\]
(It is important to keep in mind that each time the symbol ${}^{*}u$ is used, it probably means a different vector in $U$.)
Using Eqs. 10.26 and 10.27 it is easy to show that
\[
p(T)(u_m) = {}^{*}u + p(S)u_m. \tag{10.28}
\]
Note that $p(S)(u_m) \in U_m$. Thus iterating the last equation gives
\[
p(T)^n(u_m) = {}^{*}u + p(S)^n(u_m). \tag{10.29}
\]
Since $V = U \oplus U_m$, for any $v \in V$, we can write $v = {}^{*}u + u_m$, with ${}^{*}u \in U$ and $u_m \in U_m$. Then using Eq. 10.29 and the fact that $U$ is invariant under any polynomial in $T$, we have
\[
p(T)^n(v) = {}^{*}u + p(S)^n(u_m), \tag{10.30}
\]
where $v = {}^{*}u + u_m$ as above. If $v \in \mathrm{null}(p(T)^n)$, we now see that $0 = {}^{*}u + p(S)^n(u_m)$, from which it follows that $p(S)^n(u_m) = 0$.
The proof now breaks into two cases: Case 1 is where $p(x)$ is not the characteristic polynomial of $A_m$; Case 2 is where $p(x)$ is the characteristic polynomial of $A_m$.
So consider Case 1. Since $p(x)$ is not the characteristic polynomial of $A_m$, we see that $p(S)$ must be invertible. This follows from Theorem 10.6.2 and the discussion immediately following, since the dimension of $U_m$ is at most 2. Hence $p(S)^n(u_m) = 0$ implies $u_m = 0$. This says:
\[
\text{The null space of } p(T)^n \text{ is contained in } U. \tag{10.31}
\]
This says that
\[
\mathrm{null}(p(T)^n) = \mathrm{null}(p(T|_U)^n).
\]
But now we can apply Eq. 10.25 to see that precisely $\left(\frac{1}{d}\right)\dim(\mathrm{null}(p(T)^n))$ of the matrices $A_1, \ldots, A_{m-1}$ have characteristic polynomial $p(x)$. But this means that precisely $\left(\frac{1}{d}\right)\dim(\mathrm{null}(p(T)^n))$ of the matrices $A_1, \ldots, A_m$ have characteristic polynomial $p(x)$. This completes Case 1. Now suppose that $p(x)$ is the characteristic polynomial of $A_m$. It is clear that $\dim(U_m) = d$.
Lemma 10.6.4. We claim that
\[
\dim(\mathrm{null}(p(T)^n)) = \dim(\mathrm{null}(p(T|_U)^n)) + d.
\]
This along with the induction hypothesis Eq. 10.25 would complete the proof of the theorem.
But we still have some work to do.
Lemma 10.6.5. $V = U + \mathrm{null}(p(T)^n)$.
Proof. Because the characteristic polynomial of the matrix $A_m$ of $S$ equals $p(x)$, we have $p(S) = 0$. So if $u_m \in U_m$, from Eq. 10.28 we see that $p(T)(u_m) \in U$. So
\[
p(T)^n(u_m) = p(T)^{n-1}(p(T)(u_m)) \in \mathrm{range}(p(T|_U)^{n-1}) = \mathrm{range}(p(T|_U)^{n}),
\]
where the last identity follows from the fact that $\dim(U) < n$. Thus we can choose $u \in U$ such that $p(T)^n(u_m) = p(T)^n(u)$. Then
\begin{align}
p(T)^n(u_m - u) &= p(T)^n(u_m) - p(T)^n(u) \notag \\
&= p(T|_U)^n(u) - p(T|_U)^n(u) \tag{10.32} \\
&= 0. \notag
\end{align}
This says that $u_m - u \in \mathrm{null}(p(T)^n)$. Hence $u_m$, which equals $u + (u_m - u)$, is in $U + \mathrm{null}(p(T)^n)$, implying $U_m \subseteq U + \mathrm{null}(p(T)^n)$. Therefore $V = U + U_m \subseteq U + \mathrm{null}(p(T)^n) \subseteq V$. This proves Lemma 10.6.5.
Since $\dim(U_m) = d$ and $\dim(U) = n - d$, we have
\begin{align}
\dim(\mathrm{null}(p(T)^n)) &= \dim(U \cap \mathrm{null}(p(T)^n)) + \dim(U + \mathrm{null}(p(T)^n)) - \dim U \notag \\
&= \dim(\mathrm{null}(p(T|_U)^n)) + \dim(V) - (n - d) \tag{10.33} \\
&= \dim(\mathrm{null}(p(T|_U)^n)) + d. \tag{10.34}
\end{align}
This completes a proof of the claim in Lemma 10.6.4, and hence a proof of Theorem 10.6.3.
Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. An ordered pair $(\alpha, \beta)$ of real numbers is called an eigenpair of $T$ if $\alpha^2 < 4\beta$ and
\[
T^2 + \alpha T + \beta I
\]
is not injective. The previous theorem shows that $T$ can have only finitely many eigenpairs, because each eigenpair corresponds to the characteristic polynomial of a $2 \times 2$ matrix on the diagonal of $A$ in Eq. 10.23, and there is room for only finitely many such matrices along that diagonal.
We define the multiplicity of an eigenpair $(\alpha, \beta)$ of $T$ to be
\[
\frac{\dim\left(\mathrm{null}\left((T^2 + \alpha T + \beta I)^{\dim(V)}\right)\right)}{2}.
\]
From Theorem 10.6.3 we see that the multiplicity of $(\alpha, \beta)$ equals the number of times that $x^2 + \alpha x + \beta$ is the characteristic polynomial of a $2 \times 2$ matrix on the diagonal of $A$ in Eq. 10.23.
Theorem 10.6.6. If $V$ is a real vector space and $T \in \mathcal{L}(V)$, then the sum of the multiplicities of all the eigenvalues of $T$ plus the sum of twice the multiplicities of all the eigenpairs of $T$ equals $\dim(V)$.
Proof. There is a basis of $V$ with respect to which the matrix of $T$ is as in Theorem 10.6.3. The multiplicity of an eigenvalue $\lambda$ equals the number of times the $1 \times 1$ matrix $[\lambda]$ appears on the diagonal of this matrix (from 10.6.3). The multiplicity of an eigenpair $(\alpha, \beta)$ equals the number of times $x^2 + \alpha x + \beta$ is the characteristic polynomial of a $2 \times 2$ matrix on the diagonal of this matrix (from 10.6.3). Because the diagonal of this matrix has length $\dim(V)$, the sum of the multiplicities of all the eigenvalues of $T$ plus the sum of twice the multiplicities of all the eigenpairs of $T$ must equal $\dim(V)$.
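The counting in this theorem can be tried on a concrete block upper-triangular matrix. The sketch below uses a $5 \times 5$ real matrix of our own choosing, with diagonal blocks $[3]$ and two copies of the rotation block $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (characteristic polynomial $x^2 + 1$, so eigenpair $(\alpha, \beta) = (0, 1)$), and arbitrary entries above the blocks. It computes the multiplicities from the nullities in the definitions; all helper names are ours.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def matpow(X, k):
    n = len(X)
    R = [[int(r == s) for s in range(n)] for r in range(n)]
    for _ in range(k):
        R = matmul(R, X)
    return R

def nullity(M):
    """Dimension of the null space, via exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for col in range(len(A[0])):
        pivot = next((r for r in range(rk, len(A)) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[rk], A[pivot] = A[pivot], A[rk]
        for r in range(rk + 1, len(A)):
            factor = A[r][col] / A[rk][col]
            if factor:
                A[r] = [a - factor * b for a, b in zip(A[r], A[rk])]
        rk += 1
    return len(A[0]) - rk

# Sample matrix (not from the text): blocks [3], rotation, rotation.
A = [[3, 1, 2, 0, 1],
     [0, 0, -1, 1, 0],
     [0, 1, 0, 0, 1],
     [0, 0, 0, 0, -1],
     [0, 0, 0, 1, 0]]
n = len(A)

A_minus_3I = [[A[r][s] - 3 * int(r == s) for s in range(n)] for r in range(n)]
A_sq = matmul(A, A)
A2_plus_I = [[A_sq[r][s] + int(r == s) for s in range(n)] for r in range(n)]

mult_eigenvalue = nullity(matpow(A_minus_3I, n))     # multiplicity of lambda = 3
mult_eigenpair = nullity(matpow(A2_plus_I, n)) // 2  # multiplicity of (0, 1)
total = mult_eigenvalue + 2 * mult_eigenpair
print(mult_eigenvalue, mult_eigenpair, total)
```

The eigenvalue 3 has multiplicity 1 and the eigenpair $(0, 1)$ has multiplicity 2, so the total $1 + 2 \cdot 2$ equals $\dim(V) = 5$.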
Axler's approach to the characteristic polynomial of a real matrix is to define it for matrices of sizes 1 and 2, and then define the characteristic polynomial of a real matrix $A$ as follows. First find the matrix $J = P^{-1}AP$ that is the Jordan form of $A$. Then the characteristic polynomial of $A$ is the product of the characteristic polynomials of the $1 \times 1$ and $2 \times 2$ matrices along the diagonal of $J$. Then he gives a fairly involved proof (page 207) that the Cayley-Hamilton theorem holds, i.e., the characteristic polynomial of a matrix $A$ has $A$ as a zero. So the minimal polynomial of $A$ divides the characteristic polynomial of $A$.
Theorem 10.6.7. Suppose $V$ is a real, $n$-dimensional vector space and $T \in \mathcal{L}(V)$. Let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$, with $U_1, \ldots, U_m$ the corresponding spaces of generalized eigenvectors. So $U_j = \mathrm{null}(T - \lambda_j I)^n$. Let $(\alpha_1, \beta_1), \ldots, (\alpha_r, \beta_r)$ be the distinct eigenpairs of $T$. Let $V_j = \mathrm{null}(T^2 + \alpha_j T + \beta_j I)^n$, for $1 \le j \le r$. Then
(a) $V = U_1 \oplus \cdots \oplus U_m \oplus V_1 \oplus \cdots \oplus V_r$.
(b) Each $U_i$ and each $V_j$ are invariant under $T$.
(c) Each $(T - \lambda_i I)|_{U_i}$ and each $(T^2 + \alpha_j T + \beta_j I)|_{V_j}$ are nilpotent.
10.7 Exercises
1. Prove or disprove: Two $n \times n$ real matrices with the same characteristic polynomials and the same minimal polynomials must be similar.
2. Let $A$ be an $n \times n$ idempotent matrix (i.e., $A^2 = A$) with real entries (or entries from ANY field). Prove that $A$ must be diagonalizable.
3. Suppose $A$ is a block upper-triangular matrix
\[
A =
\begin{pmatrix}
A_1 & & * \\
& \ddots & \\
0 & & A_m
\end{pmatrix},
\]
where each $A_j$ is a square matrix. Prove that the set of eigenvalues of $A$ equals the union of the sets of eigenvalues of $A_1, \ldots, A_m$.
4. Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Suppose $\alpha, \beta \in \mathbb{R}$ are such that $\alpha^2 < 4\beta$. Prove that
\[
\mathrm{null}(T^2 + \alpha T + \beta I)^k
\]
has even dimension for every positive integer $k$.
5. Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. Suppose $\alpha, \beta \in \mathbb{R}$ are such that $\alpha^2 < 4\beta$ and $T^2 + \alpha T + \beta I$ is nilpotent. Prove that $\dim(V)$ is even and
\[
(T^2 + \alpha T + \beta I)^{\dim(V)/2} = 0.
\]
6. Let $a$ and $b$ be arbitrary complex numbers with $b \neq 0$. Put $A = \begin{pmatrix} a & b \\ b & a \end{pmatrix}$. Define a map $T_A$ on $M_2(\mathbb{C})$ by
\[
T_A : M_2(\mathbb{C}) \to M_2(\mathbb{C}) : B \mapsto AB - BA.
\]
(i) Show that $T_A \in \mathcal{L}(M_2(\mathbb{C}))$, i.e., show that $T_A$ is linear.
(ii) Let $\mathcal{B}$ be the ordered basis of $M_2(\mathbb{C})$ given by
\[
\mathcal{B} = \{E_{11}, E_{12}, E_{21}, E_{22}\},
\]
where $E_{ij}$ has a 1 in position $(i, j)$ and zeros elsewhere. Compute the matrix $[T_A]_{\mathcal{B}}$ that represents $T_A$ with respect to the basis $\mathcal{B}$.
(iii) Compute the characteristic polynomial of the linear operator $T_A$ and determine its eigenvalues.
(iv) Compute a basis of each eigenspace of $T_A$.
(v) Give a basis for the range of $T_A$.
(vi) Give the Jordan form of the matrix $[T_A]_{\mathcal{B}}$ and explain why your form is correct.
7. Define $T \in \mathcal{L}(\mathbb{C}^3)$ by $T : (x_1, x_2, x_3) \mapsto (x_3, 0, 0)$.
(a) Find all eigenvalues and corresponding eigenvectors of $T$.
(b) Find all generalized eigenvectors of $T$.
(c) Compute the minimal polynomial for $T$.
(d) Give the Jordan form for $T$.
8. An $n \times n$ matrix $A$ is called nilpotent if $A^k = 0$ for some positive integer $k$. Prove:
(a) $A$ is nilpotent if and only if all its eigenvalues are 0.
(b) If $A$ is a nonzero real nilpotent matrix, it cannot be symmetric.
(c) If $A$ is a nonzero complex nilpotent matrix, it cannot be hermitian (i.e., self-adjoint).
9. Let $A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}$, where $A$ is to be considered as a matrix over $\mathbb{C}$.
(a) Determine the minimal and characteristic polynomials of $A$ and the Jordan form for $A$.
(b) Determine all generalized eigenvectors of $A$ and a basis $\mathcal{B}$ of $\mathbb{C}^4$ with respect to which the operator $T_A : x \mapsto Ax$ has Jordan form. Use this to write down a matrix $P$ such that $P^{-1}AP$ is in Jordan form.
10. Let $A$ be an $n \times n$ symmetric real matrix with some power of $A$ being the identity matrix. Show that either $A = I$ or $A = -I$ or $A^2 = I$ but $A$ is not a scalar times $I$.
11. Let $V$ be the usual real vector space of all $3 \times 3$ real matrices. Define subsets of $V$ by
\[
U = \{A \in V : A^T = A\} \quad \text{and} \quad W = \{A \in V : A^T = -A\}.
\]
(a) (3 points) Show that both $U$ and $W$ are subspaces of $V$.
(b) (3 points) Show that $V = U \oplus W$.
(c) (3 points) Determine the dimensions of $U$ and $W$.
(d) (3 points) Let $\mathrm{Tr} : V \to \mathbb{R} : A \mapsto \mathrm{trace}(A)$. Put $R = \{A \in U : \mathrm{Tr}(A) = 0\}$. Show that $R$ is a subspace of $U$ and compute a basis for $R$.
(e) (8 points) Put $P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ and define $T : W \to W : B \mapsto PB + BP^T$. Show that $T \in \mathcal{L}(W)$ and determine a basis $\mathcal{B}$ for $W$. Then compute the matrix $[T]_{\mathcal{B}}$. What are the minimal and characteristic polynomials for $T$?
12. Let $W$ be the space of $3 \times 2$ matrices over $\mathbb{R}$. Put
\[
V_1 = \left\{ \begin{pmatrix} a & 0 \\ b & 0 \\ c & 0 \end{pmatrix} : a, b, c \in \mathbb{R} \right\}
\quad \text{and} \quad
V_2 = \left\{ \begin{pmatrix} 0 & a \\ 0 & b \\ 0 & c \end{pmatrix} : a, b, c \in \mathbb{R} \right\}.
\]
So $W$ is isomorphic to $V_1 \oplus V_2$. Let $A$ be the matrix
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 4 \\ 3 & 4 & 1 \end{pmatrix}.
\]
Define a map $T \in \mathcal{L}(W)$ by $T : W \to W : B \mapsto AB$. Compute the eigenvalues of $T$, the minimal polynomial for $T$, and the characteristic polynomial for $T$. Compute the Jordan form for $T$. (Hint: One eigenvector of the matrix $A$ is $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$.)
13. Let $V = \mathbb{R}^5$ and let $T \in \mathcal{L}(V)$ be defined by $T(a, b, c, d, e) = (2a, 2b, 2c + d, a + 2d, b + 2e)$.
(a) Find the characteristic and minimal polynomial of $T$.
(b) Determine a basis of $\mathbb{R}^5$ consisting of eigenvectors and generalized eigenvectors of $T$.
(c) Find the Jordan form of $T$ (or of $M(T)$) with respect to your basis.
14. Suppose that $S, T \in \mathcal{L}(V)$. Suppose that $T$ has $\dim(V)$ distinct eigenvalues and that $S$ has the same eigenvectors as $T$ (though not necessarily with the same eigenvalues). Prove that $ST = TS$.
15. Let $k$ be a positive integer greater than 1 and let $N$ be an $n \times n$ nilpotent matrix over $\mathbb{C}$. Say $N^m = 0 \neq N^{m-1}$ for some positive integer $m \ge 2$.
(a) Prove that $I + N$ is invertible.
(b) Prove that there is some polynomial $f(x) \in \mathbb{C}[x]$ with degree at most $m - 1$ and constant term equal to 1 for which $B = f(N)$ satisfies $B^k = I + N$.
16. Let $A$ be any invertible $n \times n$ complex matrix, and let $k$ be any positive integer greater than 1. Show that there is a matrix $B$ for which $B^k = A$.
17. If $A$ is not invertible the situation is much more complicated. Show that if the minimal polynomial for $A$ has 0 as a root with multiplicity 1, then $A$ has a $k$-th root. In particular if $A$ is normal, then it has a $k$-th root.
18. If $A$ is nilpotent a wide variety of possibilities can occur concerning $k$-th roots.
(a) Show that there is an $A \in M_{10}(\mathbb{C})$ for which $A^5 = 0$ but $A^4 \neq 0$.
(b) Show that any $A$ as in Part (a) cannot have a cube root or any $k$-th root for $k \ge 3$.
19. Let $m, k$ be integers greater than 1, and let $J$ be an $m \times m$ Jordan block with 1's along the superdiagonal and zeros elsewhere. Show that there is no $m \times m$ complex matrix $B$ for which $B^k = J$.
232 CHAPTER 10. DECOMPOSITION WRT A LINEAR OPERATOR
Chapter 11
Matrix Functions

11.1 Operator Norms and Matrix Norms

Let $V$ and $W$ be vector spaces over $F$ having respective norms $\|\cdot\|_V$ and $\|\cdot\|_W$. Let $T \in \mathcal{L}(V, W)$. One common problem is to understand the size of the linear map $T$ in the sense of its effects on the magnitude of the inputs. For each nonzero $v \in V$, the quotient $\|T(v)\|_W / \|v\|_V$ measures the magnification caused by the transformation $T$ on that specific vector $v$. An upper bound on this quotient valid for all $v$ would thus measure the overall effect of $T$ on the size of vectors in $V$. It is well known that in every finite dimensional space $V$, the quotient $\|T(v)\|_W / \|v\|_V$ has a maximum value which is achieved with some specific vector $v_0$. Note that if $v$ is replaced by $cv$ for some nonzero $c \in F$, the quotient is not changed. Hence if $O = \{x \in V : \|x\|_V = 1\}$, then we may define a norm for $T$ (called a transformation norm) by
$$\|T\|_{V,W} = \max\{\|T(v)\|_W / \|v\|_V : \vec{0} \ne v \in V\} = \max\{\|T(v)\|_W : v \in O\}. \tag{11.1}$$
(In an Advanced Calculus course a linear map $T$ is shown to be continuous, and the continuous image of a compact set is compact.) In this definition we could replace max with supremum in order to guarantee that $\|T\|_{V,W}$ is always defined even when $V$ and $W$ are not finite dimensional. However, in this text we just deal with the three specific norms we have already defined on the finite dimensional vector spaces (see Section 8.1). (For more on matrix norms see the book MATRIX ANALYSIS by Horn and Johnson.) The norm of a linear map from $V$ to $W$ defined in this way is a vector norm on the vector space $\mathcal{L}(V, W)$. (See Exercise 1.) We can actually say a bit more.
Theorem 11.1.1. Let $U$, $V$ and $W$ be vector spaces over $F$ endowed with vector norms $\|\cdot\|_U$, $\|\cdot\|_V$, $\|\cdot\|_W$, respectively. Let $S, T \in \mathcal{L}(U, V)$ and $L \in \mathcal{L}(V, W)$ and suppose they each have norms defined by Eq. 11.1. Then:
(a) $\|T\|_{U,V} \ge 0$, and $\|T\|_{U,V} = 0$ if and only if $T(u) = \vec{0}$ for all $u \in U$.
(b) $\|aT\|_{U,V} = |a| \, \|T\|_{U,V}$ for all scalars $a$.
(c) $\|S + T\|_{U,V} \le \|S\|_{U,V} + \|T\|_{U,V}$.
(d) $\|T(u)\|_V \le \|T\|_{U,V} \cdot \|u\|_U$ for all $u \in U$.
(e) $\|I\|_{U,U} = 1$, where $I \in \mathcal{L}(U)$ is defined by $I(u) = u$ for all $u \in U$.
(f) $\|L \circ T\|_{U,W} \le \|L\|_{V,W} \cdot \|T\|_{U,V}$.
(g) If $U = V$, then $\|T^i\|_{U,U} \le (\|T\|_{U,U})^i$.
Proof. Parts (a), (d) and (e) follow immediately from the definition of the norm of a linear map. Part (b) follows easily since $|a|$ can be factored out of the appropriate maximum (or supremum). For part (c), from the triangle inequality for vector norms, we have
$$\|(S + T)(u)\|_V = \|S(u) + T(u)\|_V \le \|S(u)\|_V + \|T(u)\|_V \le (\|S\|_{U,V} + \|T\|_{U,V}) \|u\|_U,$$
from part (d). This says the quotient used to define the norm of $S + T$ is bounded above by $\|S\|_{U,V} + \|T\|_{U,V}$, so the least upper bound is less than or equal to this. Part (f) follows in a similar manner, and then (g) follows by repeated applications of part (f).
Let $A$ be an $m \times n$ matrix over $F$ (with $F$ a subfield of $\mathbb{C}$). As usual, we may consider the linear map $T_A : F^n \to F^m : x \mapsto Ax$. We consider the norm on $\mathcal{L}(F^n, F^m)$ induced by each of the standard norms $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$ (see Section 8.1), and use it to define a corresponding norm on the vector space of $m \times n$ matrices. It seems natural to let the transformation norm of $T_A$ also be a norm of $A$. Specifically, we have
$$\|A\|_1 = \max\left\{ \frac{\|Ax\|_1}{\|x\|_1} : x \ne \vec{0} \right\},$$
$$\|A\|_2 = \max\left\{ \frac{\|Ax\|_2}{\|x\|_2} : x \ne \vec{0} \right\},$$
$$\|A\|_\infty = \max\left\{ \frac{\|Ax\|_\infty}{\|x\|_\infty} : x \ne \vec{0} \right\}.$$
Theorem 11.1.2. Let $A$ be $m \times n$ over the field $F$. Then:
(a) $\|A\|_1 = \max\left\{ \sum_{i=1}^m |A_{ij}| : 1 \le j \le n \right\}$ (maximum absolute column sum).
(b) $\|A\|_\infty = \max\left\{ \sum_{j=1}^n |A_{ij}| : 1 \le i \le m \right\}$ (maximum absolute row sum).
(c) $\|A\|_2$ = maximum singular value of $A$.
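Parts (a) and (b) can be checked numerically on a small case. The following is a minimal sketch, assuming a hypothetical $2 \times 3$ real matrix `A` and helper names (`mat_norm1`, `mat_norm_inf`, and so on) that are not part of the text. It compares the column-sum and row-sum formulas with the quotient $\|Ax\| / \|x\|$ on random vectors and on the extremal vectors used in the proof.

```python
import random

def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def vec_norm1(x):
    return sum(abs(t) for t in x)

def vec_norm_inf(x):
    return max(abs(t) for t in x)

def mat_norm1(A):
    """Maximum absolute column sum, as in part (a)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def mat_norm_inf(A):
    """Maximum absolute row sum, as in part (b)."""
    return max(sum(abs(a) for a in row) for row in A)

# Hypothetical example matrix (an assumption, not taken from the text).
A = [[1, -2, 3],
     [4, 0, -6]]

# The quotient never exceeds the formula on randomly sampled vectors.
random.seed(0)
worst1 = worst_inf = 0.0
for _ in range(2000):
    x = [random.uniform(-1.0, 1.0) for _ in range(3)]
    y = matvec(A, x)
    worst1 = max(worst1, vec_norm1(y) / vec_norm1(x))
    worst_inf = max(worst_inf, vec_norm_inf(y) / vec_norm_inf(x))

# Extremal vectors from the proof: a standard basis vector picking out the
# heaviest column, and a sign vector aligned with the heaviest row.
e3 = [0, 0, 1]
signs = [1, 1, -1]
```

Random sampling only gives a lower estimate of the maximum; the equality comes from the two explicit vectors, exactly as in the proof.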
Proof. To prove (a), observe
$$\|Ax\|_1 = \sum_{i=1}^m \left| \sum_{j=1}^n A_{ij} x_j \right| \le \sum_{i=1}^m \sum_{j=1}^n |A_{ij}| \cdot |x_j| = \sum_{j=1}^n \left( \sum_{i=1}^m |A_{ij}| \right) |x_j| \le \sum_{j=1}^n \left( \max_j \sum_i |A_{ij}| \right) |x_j| = \left( \max_j \sum_{i=1}^m |A_{ij}| \right) \|x\|_1 = \alpha \|x\|_1.$$
So $\|A\|_1 \le \alpha = \max_j \sum_{i=1}^m |A_{ij}|$. To complete the proof of (a) we need to construct a vector $x \in F^n$ such that $\|Ax\|_1 = \alpha \|x\|_1$. Put $x = (0, 0, \ldots, 1, \ldots, 0)^T$ where the single nonzero entry 1 is in column $j_0$, and the maximum value of $\sum_{i=1}^m |A_{ij}|$ occurs when $j = j_0$. Then $\|x\|_1 = 1$, and $\|Ax\|_1 = \sum_{i=1}^m |A_{i j_0}| = \alpha = \alpha \|x\|_1$ as desired.
We now consider part (b). Define $\alpha$ by
$$\alpha = \max\left\{ \sum_{j=1}^n |A_{ij}| : 1 \le i \le m \right\}.$$
For an arbitrary $v = (v_1, \ldots, v_n)^T \in F^n$ suppose that the maximum defining $\alpha$ above is attained when $i = i_0$. Then we have
$$\|T_A(v)\|_\infty = \|Av\|_\infty = \max_i |(Av)_i| = \max_i \left| \sum_{j=1}^n A_{ij} v_j \right| \le \max_i \sum_{j=1}^n (|A_{ij}| \cdot |v_j|) \le \max_i \sum_{j=1}^n (|A_{ij}| \cdot \max_k |v_k|) \le \alpha \|v\|_\infty,$$
so $\|T_A(v)\|_\infty / \|v\|_\infty \le \alpha$ for all nonzero $v$. This says that this norm of $T_A$ is at most $\alpha$. To show that it equals $\alpha$ we must find a vector $v$ such that the usual quotient equals $\alpha$. For each $j = 1, \ldots, n$, choose $v_j$ with absolute value 1 so that $A_{i_0 j} v_j = |A_{i_0 j}|$. Clearly $\|v\|_\infty = 1$, so that $\|T_A(v)\|_\infty \le \alpha$, which we knew would have to be the case anyway. On the other hand
$$|(T_A(v))_{i_0}| = \left| \sum_{j=1}^n A_{i_0 j} v_j \right| = \sum_{j=1}^n |A_{i_0 j}| = \alpha.$$
This shows that $\|T_A(v)\|_\infty = \alpha$ for this particular $v$, and hence this norm of $T_A$ is $\alpha$.
Part (c) is a bit more involved. First note that $\|v\|$ as used in earlier chapters is just $\|v\|_2$. Suppose that $S$ is unitary and $m \times m$, and that $A$ is $m \times n$. Then $\|(SA)(v)\|_2 = \|S(A(v))\|_2 = \|A(v)\|_2$ for all $v \in F^n$. It follows immediately that $\|SA\|_2 = \|A\|_2$. Now let $A$ be an arbitrary $m \times n$ matrix with singular value decomposition $A = U \Sigma V^*$. So $\|A\|_2 = \|U \Sigma V^*\|_2 = \|\Sigma V^*\|_2$. Since $\|\cdot\|_2$ is a matrix norm coming from a transformation norm, by part (f) of Theorem 11.1.1, $\|\Sigma V^*\|_2 \le \|\Sigma\|_2 \cdot \|V^*\|_2 = \|\Sigma\|_2$. If $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$, then
$$\|\Sigma x\|_2 = \|(\sigma_1 x_1, \ldots, \sigma_k x_k, 0, \ldots, 0)^T\|_2 = \sqrt{\sigma_1^2 |x_1|^2 + \cdots + \sigma_k^2 |x_k|^2} \le \sigma_1 \|x\|_2.$$
So $\|\Sigma\|_2 \le \sigma_1$. Suppose $x = (1, 0, \ldots, 0)^T$. Then $\Sigma x = (\sigma_1 x_1, 0, \ldots, 0)^T$. So $\|x\|_2 = 1$ and $\|\Sigma x\|_2 = |\sigma_1 x_1| = \sigma_1 = \sigma_1 \|x\|_2$. This says $\|\Sigma\|_2 = \sigma_1$. So we know that $\|\Sigma V^*\|_2 \le \sigma_1$. Let $x$ be the first column of $V$, so $V^* x = (1, 0, \ldots, 0)^T$. Then $\|\Sigma V^* x\|_2 = \sigma_1$, showing that $\|\Sigma V^*\|_2 = \|\Sigma\|_2 = \sigma_1$, so that finally we see $\|A\|_2 = \sigma_1$.
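Part (c) suggests a simple way to estimate $\|A\|_2$ without a full singular value decomposition: $\sigma_1^2$ is the largest eigenvalue of $A^T A$ (real case), and power iteration approximates it. The sketch below is only an illustration under stated assumptions: real entries, a hypothetical function name `spectral_norm`, and a starting vector that is not orthogonal to the top eigenvector. It is not the method used in the proof above.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def spectral_norm(A, iters=500):
    """Estimate the largest singular value of a real matrix A by power
    iteration on the Gram matrix A^T A (an illustrative technique)."""
    G = matmul(transpose(A), A)
    n = len(G)
    # Arbitrary starting vector; assumed not orthogonal to the top eigenvector.
    x = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iters):
        y = matvec(G, x)
        size = math.sqrt(sum(t * t for t in y))
        if size == 0.0:
            return 0.0
        x = [t / size for t in y]
    # Rayleigh quotient of the unit vector x approximates sigma_1 squared.
    y = matvec(G, x)
    return math.sqrt(sum(a * b for a, b in zip(x, y)))

diag_example = [[3.0, 0.0], [0.0, 4.0]]    # singular values 3 and 4
shear_example = [[1.0, 1.0], [0.0, 1.0]]   # largest singular value is the golden ratio
```

For the shear matrix, $A^T A$ has eigenvalues $(3 \pm \sqrt{5})/2$, so the largest singular value is $(1 + \sqrt{5})/2$.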
11.2 Polynomials in an Elementary Jordan
Matrix

Let $N_n$ denote the $n \times n$ matrix with all entries equal to 0 except for those along the super diagonal just above the main diagonal:
$$N_n = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ \vdots & \vdots & \vdots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}. \tag{11.2}$$
So
$$(N_n)_{ij} = \begin{cases} 1, & \text{if } j = i + 1; \\ 0, & \text{otherwise.} \end{cases}$$
The following lemma is easily established.
Lemma 11.2.1. For $1 \le m \le n - 1$,
$$(N_n^m)_{ij} = \begin{cases} 1, & \text{if } j = i + m; \\ 0, & \text{otherwise.} \end{cases}$$
Also, $N_n^0 = I$ and $N_n^n = 0$.
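Lemma 11.2.1 is easy to confirm by direct multiplication. The following sketch uses hypothetical helper names (`shift_matrix`, `matpow`) and checks that each power of $N_n$ moves the diagonal of ones one step further from the main diagonal until it disappears.

```python
def shift_matrix(n):
    """The n x n matrix N_n with ones on the superdiagonal."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(A, m):
    """Compute A^m by repeated multiplication, with A^0 = I."""
    n = len(A)
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        P = matmul(P, A)
    return P

def expected_power(n, m):
    """The matrix predicted by the lemma: ones exactly where j = i + m."""
    return [[1 if j == i + m else 0 for j in range(n)] for i in range(n)]

N5 = shift_matrix(5)
```

With `n = 5`, `matpow(N5, 5)` is the zero matrix, in agreement with $N_n^n = 0$.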
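Then let $J_n(\lambda)$ be the elementary $n \times n$ Jordan block with eigenvalue $\lambda$ given by $J_n(\lambda) = \lambda I + N_n$. So
$$J_n(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \ddots & 0 \\ \vdots & \vdots & \vdots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix} = \lambda I + N_n,$$
where $N_n$ is nilpotent with minimal polynomial $x^n$.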
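Theorem 11.2.2. In this Theorem (and until notified otherwise) write $J$ in place of $J_n(\lambda)$. For each positive integer $m$,
$$J^m = \begin{pmatrix} \lambda^m & \binom{m}{1} \lambda^{m-1} & \cdots & \binom{m}{n-1} \lambda^{m-n+1} \\ & \lambda^m & \cdots & \binom{m}{n-2} \lambda^{m-n+2} \\ & & \ddots & \vdots \\ & & & \lambda^m \end{pmatrix},$$
that is,
$$(J^m)_{ij} = \binom{m}{j - i} \lambda^{m - j + i}, \qquad 1 \le i, j \le n.$$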
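Proof. It is easy to see that the desired result holds for $m = 1$ by the definition of $J$, so suppose that it holds for a given $m \ge 1$. Then
$$(J^{m+1})_{ij} = (J \cdot J^m)_{ij} = \sum_{k=1}^n J_{ik} (J^m)_{kj} = \lambda (J^m)_{ij} + 1 \cdot (J^m)_{i+1,j}$$
$$= \lambda \binom{m}{j - i} \lambda^{m - j + i} + \binom{m}{j - i - 1} \lambda^{m - j + i + 1}$$
$$= \binom{m}{j - i} \lambda^{m + 1 - j + i} + \binom{m}{j - i - 1} \lambda^{m + 1 - j + i} = \binom{m + 1}{j - i} \lambda^{m + 1 - j + i}.$$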
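The closed form for $(J^m)_{ij}$ can be compared with repeated multiplication. The sketch below uses integer entries so that the comparison is exact. The function names are hypothetical, and the formula is read with the usual convention that the binomial coefficient vanishes when $j - i < 0$ or $j - i > m$.

```python
from math import comb

def jordan_block(lam, n):
    """The elementary Jordan block J_n(lam) = lam * I + N_n."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def power_by_multiplication(J, m):
    """J^m for m >= 1, computed by m - 1 matrix products."""
    P = J
    for _ in range(m - 1):
        P = matmul(P, J)
    return P

def power_by_formula(lam, n, m):
    """Entries binom(m, j - i) * lam^(m - j + i); zero outside 0 <= j - i <= m."""
    return [[comb(m, j - i) * lam ** (m - (j - i)) if 0 <= j - i <= m else 0
             for j in range(n)] for i in range(n)]
```

For example, `power_by_formula(2, 4, 5)` and `power_by_multiplication(jordan_block(2, 4), 5)` return the same $4 \times 4$ integer matrix.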
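Now suppose that $f(x) = \sum_{m=0}^k a_m x^m$, so that $f(J) = a_0 I + a_1 J + \cdots + a_k J^k$. Recall (or prove by induction on $s$) that for $s \ge 0$,
$$f^{(s)}(x) = \sum_{m=0}^k a_m \binom{m}{s} s! \, x^{m - s}, \quad\text{so}\quad \sum_{m=0}^k a_m \binom{m}{s} \lambda^{m - s} = \frac{f^{(s)}(\lambda)}{s!}. \tag{11.3}$$
So for $s \ge 0$,
$$(f(J))_{i,i+s} = \sum_{m=0}^k a_m \binom{m}{s} \lambda^{m - s} = \frac{1}{s!} f^{(s)}(\lambda).$$
This says that
$$f(J) = \begin{pmatrix} f(\lambda) & \frac{1}{1!} f^{(1)}(\lambda) & \frac{1}{2!} f^{(2)}(\lambda) & \cdots & \frac{1}{(n-1)!} f^{(n-1)}(\lambda) \\ 0 & f(\lambda) & \frac{1}{1!} f^{(1)}(\lambda) & \cdots & \frac{1}{(n-2)!} f^{(n-2)}(\lambda) \\ 0 & 0 & f(\lambda) & \cdots & \frac{1}{(n-3)!} f^{(n-3)}(\lambda) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & f(\lambda) \end{pmatrix}. \tag{11.4}$$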
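Eq. 11.4 can be tested for a concrete polynomial. In the sketch below (all names hypothetical) a polynomial is stored as its coefficient list `[a_0, a_1, ..., a_k]`; one function evaluates $f(J)$ by Horner's rule on matrices, the other fills in the entries $f^{(s)}(\lambda)/s!$ using the sum in Eq. 11.3.

```python
from math import comb

def jordan_block(lam, n):
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def poly_of_matrix(coeffs, M):
    """Evaluate a_0 I + a_1 M + ... + a_k M^k by Horner's rule."""
    n = len(M)
    R = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        R = matmul(R, M)
        for i in range(n):
            R[i][i] += a
    return R

def poly_of_jordan_block(coeffs, lam, n):
    """Fill in f(J_n(lam)) entrywise from Eq. 11.4."""
    def scaled_derivative(s):
        # f^(s)(lam) / s!  =  sum over m >= s of a_m * binom(m, s) * lam^(m - s)
        return sum(a * comb(m, s) * lam ** (m - s)
                   for m, a in enumerate(coeffs) if m >= s)
    return [[scaled_derivative(j - i) if j >= i else 0 for j in range(n)]
            for i in range(n)]

# Hypothetical example: f(x) = 1 - 3x + 2x^3 + 5x^4.
coeffs = [1, -3, 0, 2, 5]
```

Because the block is upper triangular Toeplitz, only $n$ numbers need to be computed, one per diagonal.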
Now suppose that $J$ is a direct sum of elementary Jordan blocks:
$$J = J_1 \oplus J_2 \oplus \cdots \oplus J_s.$$
Then for the polynomial $f(x)$ we have
$$f(J) = f(J_1) \oplus f(J_2) \oplus \cdots \oplus f(J_s).$$
Here $f(J_1), \ldots, f(J_s)$ are polynomials in separate Jordan blocks, whose values are given by Eq. 11.3. This result may be applied in the computation of $f(A)$ even when $A$ is not originally in Jordan form. First determine a $T$ for which $J = T^{-1} A T$ has Jordan form. Then compute $f(J)$ as above and use
$$f(A) = f(T J T^{-1}) = T f(J) T^{-1}.$$
11.3 Scalar Functions of a Matrix

For certain functions $f : F \to F$ (satisfying requirements stipulated below) we can define a matrix function $f : A \mapsto f(A)$ so that if $f$ is a polynomial, the value $f(A)$ agrees with the value given just above. Start with an arbitrary square matrix $A$ over $F$ and let $\lambda_1, \ldots, \lambda_s$ be the distinct eigenvalues of $A$. Reduce $A$ to Jordan form
$$T^{-1} A T = J_1 \oplus J_2 \oplus \cdots \oplus J_t,$$
where $J_1, \ldots, J_t$ are elementary Jordan blocks. Consider the Jordan block
$$J_i = J_{n_i}(\lambda_i) = \begin{pmatrix} \lambda_i & 1 & 0 & \cdots & 0 \\ & \lambda_i & 1 & \cdots & 0 \\ & & \ddots & \ddots & \vdots \\ & & & & \lambda_i \end{pmatrix}, \tag{11.5}$$
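which has $(x - \lambda_i)^{n_i}$ as its minimal and characteristic polynomials. If the function $f : F \to F$ is defined in a neighborhood of the point $\lambda_i$ and has derivatives $f^{(1)}(\lambda_i), \ldots, f^{(n_i - 1)}(\lambda_i)$, then define $f(J_i)$ by
$$f(J_i) = \begin{pmatrix} f(\lambda_i) & \frac{1}{1!} f^{(1)}(\lambda_i) & \frac{1}{2!} f^{(2)}(\lambda_i) & \cdots & \frac{1}{(n_i-1)!} f^{(n_i-1)}(\lambda_i) \\ 0 & f(\lambda_i) & \frac{1}{1!} f^{(1)}(\lambda_i) & \cdots & \frac{1}{(n_i-2)!} f^{(n_i-2)}(\lambda_i) \\ 0 & 0 & f(\lambda_i) & \cdots & \frac{1}{(n_i-3)!} f^{(n_i-3)}(\lambda_i) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & f(\lambda_i) \end{pmatrix}. \tag{11.6}$$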
This says that
$$(f(J_i))_{r,r+j} = \frac{1}{j!} f^{(j)}(\lambda_i), \quad\text{for } 0 \le j \le n_i - r. \tag{11.7}$$
If $f$ is defined in a neighborhood of each of the eigenvalues $\lambda_1, \ldots, \lambda_s$ and has (finite) derivatives of the proper orders in these neighborhoods, then also
$$f(J) = f(J_1) \oplus f(J_2) \oplus \cdots \oplus f(J_t), \tag{11.8}$$
and
$$f(A) = T f(J) T^{-1} = T \left( f(J_1) \oplus \cdots \oplus f(J_t) \right) T^{-1}. \tag{11.9}$$
The matrix $f(A)$ is called the value of the function $f$ at the matrix $A$. We will show below that $f(A)$ does not depend on the method of reducing $A$ to Jordan form (i.e., it does not depend on the particular choice of $T$), and thus $f$ really defines a function on the $n \times n$ matrices $A$. This matrix function is called the correspondent of the numerical function $f$. Not all matrix functions have corresponding numerical functions. Those that do are called scalar functions.
Here are some of the simplest properties of scalar functions.
Theorem 11.3.1. It is clear that the definition of scalar function was chosen precisely so that part (a) below would be true.
(a) If $f(\lambda)$ is a polynomial in $\lambda$, then the value $f(A)$ of the scalar function $f$ coincides with the value of the polynomial $f(\lambda)$ evaluated at $\lambda = A$.
(b) Let $A$ be a square matrix over $F$ and suppose that $f_1(\lambda)$ and $f_2(\lambda)$ are two numerical functions for which the expressions $f_1(A)$ and $f_2(A)$ are meaningful. If $f(\lambda) = f_1(\lambda) + f_2(\lambda)$, then $f(A)$ is also meaningful and $f(A) = f_1(A) + f_2(A)$.
(c) With $A$, $f_1$ and $f_2$ as in the preceding part, if $f(\lambda) = f_1(\lambda) f_2(\lambda)$, then $f(A)$ is meaningful and $f(A) = f_1(A) f_2(A)$.
(d) Let $A$ be a matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$, each appearing as often as its algebraic multiplicity as an eigenvalue of $A$. If $f : F \to F$ is a numerical function and $f(A)$ is defined, then the eigenvalues of the matrix $f(A)$ are $f(\lambda_1), \ldots, f(\lambda_n)$.
Proof. The proofs of parts (b) and (c) are analogous so we just give the details for part (c). To compute the values $f_1(A)$, $f_2(A)$ and $f(A)$ according to the definition, we must reduce $A$ to Jordan form $J$ and apply the formulas given in Eqs. 11.8 and 11.9. If we show that $f(J) = f_1(J) f_2(J)$, then from Eq. 11.9 we immediately obtain $f(A) = f_1(A) f_2(A)$. In fact,
$$f(J) = f(J_1) \oplus \cdots \oplus f(J_t),$$
and
$$f_1(J) f_2(J) = f_1(J_1) f_2(J_1) \oplus \cdots \oplus f_1(J_t) f_2(J_t),$$
so that the proof is reduced to showing that
$$f(J_i) = f_1(J_i) f_2(J_i), \quad (i = 1, 2, \ldots, t),$$
where $J_i$ is an elementary Jordan block.
Start with the values of $f_1(J_i)$ and $f_2(J_i)$ given in Eq. 11.6, and multiply them together to find that
$$[f_1(J_i) f_2(J_i)]_{r,r+s} = \sum_{j=0}^s (f_1(J_i))_{r,r+j} (f_2(J_i))_{r+j,r+s} = \sum_{j=0}^s \frac{1}{j!} f_1^{(j)}(\lambda_i) \cdot \frac{1}{(s - j)!} f_2^{(s-j)}(\lambda_i)$$
$$= \frac{1}{s!} \sum_{j=0}^s \frac{s!}{j!(s - j)!} f_1^{(j)}(\lambda_i) f_2^{(s-j)}(\lambda_i) = \frac{1}{s!} (f_1 f_2)^{(s)}(\lambda_i),$$
where the last equality comes from Exercise 2, part (i).
Thus $f_1(J_i) f_2(J_i) = f(J_i)$, completing the proof of part (c) of the theorem. For part (d), the eigenvalues of the matrices $f(A)$ and $T^{-1} f(A) T = f(T^{-1} A T)$ are equal, and therefore we may assume that $A$ has Jordan form. Formulas in Eq. 11.5 and 11.6 show that in this case $f(A)$ is upper triangular with $f(\lambda_1), \ldots, f(\lambda_n)$ along the main diagonal. Since the diagonal elements of an upper triangular matrix are its eigenvalues, part (d) is proved.
Similarly, if $f$ and $g$ are functions such that $f(g(A))$ is defined, and if $h(\lambda) = f(g(\lambda))$, then $h(A) = f(g(A))$.
To finish this section we consider two examples.
Example 11.3.2. Let $f(\lambda) = \lambda^{-1}$. This function is defined everywhere except at $\lambda = 0$, and has finite derivatives of all orders everywhere it is defined. Consequently, if the matrix $A$ does not have zero as an eigenvalue, i.e., if $A$ is nonsingular, then $f(A)$ is meaningful. But $\lambda \cdot f(\lambda) = 1$, hence $A \cdot f(A) = I$, so $f(A) = A^{-1}$. Thus the matrix function $A \mapsto A^{-1}$ corresponds to the numerical function $\lambda \mapsto \lambda^{-1}$, as one would hope.
Example 11.3.3. Let $f(\lambda) = \sqrt{\lambda}$. To remove the two-valuedness of $\sqrt{\lambda}$ it is sufficient to slit the complex plane from the origin along a ray not containing any eigenvalues of $A$, and to consider one branch of the radical. Then this function, for $\lambda \ne 0$, has finite derivatives of all orders. It follows that the expression $\sqrt{A}$ is meaningful for all nonsingular matrices $A$. Putting $\lambda = A$ in the equation
$$f(\lambda) f(\lambda) = \lambda,$$
we obtain
$$f(A) f(A) = A.$$
This shows that each nonsingular matrix has a square root.
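For a single Jordan block the square root can be written down from Eq. 11.6. The sketch below assumes a real eigenvalue $\lambda > 0$ and the principal branch, so that $f^{(s)}(\lambda)/s! = \binom{1/2}{s} \lambda^{1/2 - s}$ with a generalized binomial coefficient. The names `gen_binom` and `sqrt_jordan_block` are hypothetical, and floating point arithmetic is used, so the check is only up to rounding.

```python
def gen_binom(alpha, s):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-s+1)/s!."""
    c = 1.0
    for k in range(s):
        c *= (alpha - k) / (k + 1)
    return c

def jordan_block(lam, n):
    return [[float(lam) if i == j else (1.0 if j == i + 1 else 0.0)
             for j in range(n)] for i in range(n)]

def sqrt_jordan_block(lam, n):
    """Square root of J_n(lam) from Eq. 11.6 with f = sqrt (lam > 0 assumed)."""
    return [[gen_binom(0.5, j - i) * lam ** (0.5 - (j - i)) if j >= i else 0.0
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

B = sqrt_jordan_block(4.0, 3)   # first row: 2, 1/4, -1/64
```

Squaring `B` recovers $J_3(4)$ up to rounding: the superdiagonal entries are $2 \cdot \tfrac14 \cdot 2 = 1$ and the corner entry is $2 \cdot (-\tfrac{1}{64}) \cdot 2 + \tfrac{1}{16} = 0$.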
11.4 Scalar Functions as Polynomials

At this point we need to generalize the construction given in Lagrange interpolation (see Section 5.3).
Lemma 11.4.1. Let $r_1, \ldots, r_s$ be distinct complex numbers, and for each $i$, $1 \le i \le s$, let $m_i$ be a nonnegative integer. Let $(a_{ij})$ be a table of arbitrary numbers, $1 \le i \le s$, and for each fixed $i$, $0 \le j \le m_i$. Then there exists a polynomial $p(x)$ such that $p^{(j)}(r_i) = a_{ij}$, for $1 \le i \le s$ and for each fixed $i$, $0 \le j \le m_i$.
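Proof. It is convenient first to construct auxiliary polynomials $p_i(x)$ such that $p_i(x)$ and its derivatives to the $m_i$th order assume the required values at the point $r_i$ and are all zero at the other given points. Put
$$\varphi_i(x) = b_{i0} + b_{i1}(x - r_i) + \cdots + b_{i m_i}(x - r_i)^{m_i} = \sum_{j=0}^{m_i} b_{ij} (x - r_i)^j,$$
where the $b_{ij}$ are complex numbers to be determined later. Note that $\varphi_i^{(j)}(r_i) = j! \, b_{ij}$.
Set
$$\Phi_i(x) = (x - r_1)^{m_1 + 1} \cdots (x - r_{i-1})^{m_{i-1} + 1} (x - r_{i+1})^{m_{i+1} + 1} \cdots (x - r_s)^{m_s + 1},$$
and
$$p_i(x) = \varphi_i(x) \Phi_i(x) = \left\{ \sum_{j=0}^{m_i} b_{ij} (x - r_i)^j \right\} \prod_{\substack{1 \le j \le s \\ j \ne i}} (x - r_j)^{m_j + 1}.$$
Each factor $(x - r_j)^{m_j + 1}$ guarantees that $p_i(x)$ and its derivatives up to order $m_j$ vanish at $r_j$.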
By the rule for differentiating a product (see Exercise 2),
$$p_i^{(j)}(r_i) = \sum_{l=0}^j \binom{j}{l} \varphi_i^{(l)}(r_i) \Phi_i^{(j-l)}(r_i),$$
or
$$a_{ij} = \sum_{l=0}^j \binom{j}{l} l! \, b_{il} \, \Phi_i^{(j-l)}(r_i). \tag{11.10}$$
Using Eq. 11.10 with $j = 0$ and the fact that $\Phi_i(r_i) \ne 0$ we find
$$b_{i0} = \frac{a_{i0}}{\Phi_i(r_i)}, \quad 1 \le i \le s. \tag{11.11}$$
For each $i = 1, 2, \ldots, s$, and for a given $j$ with $1 \le j \le m_i$, once $b_{il}$ is determined for all $l$ with $0 \le l < j$, we can solve Eq. 11.10 for
$$b_{ij} = \frac{a_{ij}}{j! \, \Phi_i(r_i)} - \sum_{l=0}^{j-1} \frac{b_{il} \, \Phi_i^{(j-l)}(r_i)}{(j - l)! \, \Phi_i(r_i)}, \quad 1 \le i \le s. \tag{11.12}$$
This determines $p_i(x)$ so that $p_i(x)$ and its derivatives up to the $m_i$th derivative have all the required values at $r_i$ and all equal zero at $r_t$ for $t \ne i$. It is now clear that the polynomial $p(x) = p_1(x) + p_2(x) + \cdots + p_s(x)$ satisfies all the requirements of the Lemma.
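The construction in the proof is effectively an algorithm. The sketch below follows Eqs. 11.11 and 11.12 literally, using exact rational arithmetic. It assumes that polynomials are stored as coefficient lists (lowest degree first) and that each exponent in $\Phi_i$ is $m_t + 1$ for the point $r_t$; the names `hermite` and `deriv_at` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def deriv_at(p, order, x):
    """Value at x of the derivative of p of the given order."""
    for _ in range(order):
        p = [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]
    return sum(c * x ** k for k, c in enumerate(p))

def hermite(points, table):
    """Polynomial p with p^(j)(points[i]) == table[i][j] for every listed j."""
    total = [Fraction(0)]
    for i, r in enumerate(points):
        r = Fraction(r)
        m_i = len(table[i]) - 1
        # Phi_i(x): product over t != i of (x - r_t)^(m_t + 1).
        Phi = [Fraction(1)]
        for t, rt in enumerate(points):
            if t != i:
                for _ in range(len(table[t])):
                    Phi = poly_mul(Phi, [Fraction(-rt), Fraction(1)])
        dPhi = [deriv_at(Phi, k, r) for k in range(m_i + 1)]
        # Solve Eqs. 11.11 and 11.12 for b_i0, b_i1, ..., in that order.
        b = []
        for j in range(m_i + 1):
            acc = Fraction(table[i][j]) / factorial(j)
            for l in range(j):
                acc -= b[l] * dPhi[j - l] / factorial(j - l)
            b.append(acc / dPhi[0])
        # phi_i(x) = sum of b_ij (x - r_i)^j, then p_i = phi_i * Phi_i.
        phi, power = [Fraction(0)], [Fraction(1)]
        for j in range(m_i + 1):
            phi = poly_add(phi, [b[j] * c for c in power])
            power = poly_mul(power, [-r, Fraction(1)])
        total = poly_add(total, poly_mul(phi, Phi))
    return total

# Hypothetical data: p(0) = 1, p'(0) = 0, p(1) = 2, p'(1) = 3.
p = hermite([0, 1], [[1, 0], [2, 3]])
```

The polynomial produced this way has degree at most $\sum_i (m_i + 1) - 1$; no attempt is made here to find one of lower degree.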
Consider a numerical function $\lambda \mapsto f(\lambda)$ and an $n \times n$ matrix $A$ for which the value $f(A)$ is defined. We show that there is a polynomial $p(x)$ for which $p(A)$ equals $f(A)$. Let $\lambda_1, \ldots, \lambda_s$ denote the distinct eigenvalues of the matrix $A$. Using only the proof of the lemma we can construct a polynomial $p(x)$ which satisfies the conditions
$$p(\lambda_i) = f(\lambda_i), \quad p'(\lambda_i) = f'(\lambda_i), \quad \ldots, \quad p^{(n-1)}(\lambda_i) = f^{(n-1)}(\lambda_i), \tag{11.13}$$
where if some of the derivatives $f^{(j)}(\lambda_i)$ are superfluous for the determination of $f(A)$, then the corresponding numbers in Eq. 11.12 may be replaced by zeros. Since the values of $p(x)$ and $f(\lambda)$ (and their derivatives) coincide at the numbers $\lambda_i$, then $f(A) = p(A)$. This completes a proof of the following theorem.
Theorem 11.4.2. The values of all scalar functions in a matrix $A$ can be expressed by polynomials in $A$.
Caution: The value $f(A)$ of a given scalar function $f$ can be represented in the form of some polynomial $p(A)$ in $A$. However, this polynomial, for a given function $f$, will be different for different matrices $A$. For example, the minimal polynomial $p(x)$ of a nonsingular matrix $A$ can be used to write $A^{-1}$ as a polynomial in $A$, but for different matrices $A$ different polynomials will occur giving $A^{-1}$ as a polynomial in $A$.
Also, considering the function $f(\lambda) = \sqrt{\lambda}$, we see that for every nonsingular matrix $A$ there exists a polynomial $p(x)$ for which
$$p(A) p(A) = A.$$
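The caution about $A^{-1}$ can be made concrete in the $2 \times 2$ case. The sketch below uses the characteristic polynomial and the Cayley-Hamilton relation $A^2 - \mathrm{tr}(A) A + \det(A) I = 0$ rather than the minimal polynomial (a substitution made here for simplicity), which gives $A^{-1} = c_0 I + c_1 A$ with $c_0 = \mathrm{tr}(A)/\det(A)$ and $c_1 = -1/\det(A)$. The function names are hypothetical.

```python
from fractions import Fraction

def inverse_coefficients(A):
    """Return (c0, c1) with A^{-1} = c0 * I + c1 * A for an invertible 2 x 2 A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return Fraction(tr, det), Fraction(-1, det)

def apply_linear_polynomial(c0, c1, A):
    """Evaluate c0 * I + c1 * A."""
    return [[c1 * A[i][j] + (c0 if i == j else 0) for j in range(2)]
            for i in range(2)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A1 = [[2, 1], [1, 1]]   # hypothetical examples
A2 = [[1, 2], [3, 4]]
coeffs1 = inverse_coefficients(A1)   # (3, -1):     inverse is 3I - A1
coeffs2 = inverse_coefficients(A2)   # (-5/2, 1/2): inverse is (-5/2)I + (1/2)A2
```

The same scalar function $\lambda \mapsto \lambda^{-1}$ is represented by two different polynomials for the two matrices, which is exactly the point of the caution.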
VERY IMPORTANT: With the help of Theorem 11.4.2 we can now resolve the question left open just preceding Theorem 11.3.1 concerning whether $f(A)$ was well-defined. If we know the function $f(\lambda)$ and its derivatives at the points $\lambda_1, \ldots, \lambda_s$, we can construct the polynomial $p(x)$ whose value $p(A)$ does not depend on the reduction of the matrix $A$ to Jordan form, and at the same time is equal to $f(A)$. Consequently, the value $f(A)$ defined in the preceding section using the reduction of $A$ to Jordan form, does not depend on the way this reduction is carried out.
Let $f(\lambda)$ be a numerical function, and let $A$ be a matrix for which $f(A)$ is meaningful. By Theorem 11.4.2 we can find a polynomial $p(x)$ for which $p(A) = f(A)$. For a given function $f(\lambda)$, the polynomial $p(x)$ depends only on the elementary divisors of the matrix $A$. But the elementary divisors of $A$ and its transpose $A^T$ coincide, so $p(A^T) = f(A^T)$. Since for a polynomial $p(x)$ we have $p(A^T) = p(A)^T$, it must be that $f(A^T) = f(A)^T$ for all scalar functions $f(\lambda)$ for which $f(A)$ is meaningful.
11.5 Power Series

All matrices are $n \times n$ over $F$, as usual. Let
$$G(x) = \sum_{k=0}^\infty a_k x^k$$
be a power series with coefficients from $F$. Let $G_N(x) = \sum_{k=0}^N a_k x^k$ be the $N$th partial sum of $G(x)$. For each $A \in M_n(F)$ let $G_N(A)$ be the element of $M_n(F)$ obtained by substituting $A$ in this polynomial. For each fixed $i, j$ we obtain a sequence of real or complex numbers $c_{ij}^N$, $N = 0, 1, 2, \ldots$ by taking $c_{ij}^N$ to be the $(i, j)$ entry of the matrix $G_N(A)$. The series
$$G(A) = \sum_{k=0}^\infty a_k A^k$$
is said to converge to the matrix $C$ in $M_n(F)$ if for each $i, j \in \{1, 2, \ldots, n\}$ the sequence $\{c_{ij}^N\}_{N=0}^\infty$ converges to the $(i, j)$ entry of $C$ (in which case we write $G(A) = C$). We say $G(A)$ converges if there is some $C \in M_n(F)$ such that $G(A) = C$.
For $A = (a_{ij}) \in M_n(F)$ define
$$\|A\| = \sum_{i,j=1}^n |a_{ij}|,$$
i.e., $\|A\|$ is the sum of the absolute values of all the entries of $A$.
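Lemma 11.5.1. For all $A, B \in M_n(F)$ and all $a \in F$
(a) $\|A + B\| \le \|A\| + \|B\|$;
(b) $\|AB\| \le \|A\| \cdot \|B\|$;
(c) $\|aA\| = |a| \cdot \|A\|$.
Proof. The proofs are rather easy. We just give a proof of (b).
$$\|AB\| = \sum_{i,j} |(AB)_{ij}| = \sum_{i,j} \left| \sum_k A_{ik} B_{kj} \right| \le \sum_{i,j,k} |A_{ik}| \cdot |B_{kj}| \le \sum_{i,j,k,r} |A_{ik}| \cdot |B_{rj}| = \|A\| \cdot \|B\|.$$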
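The three properties of the entry-sum norm are easy to spot-check. The sketch below uses two hypothetical integer matrices and a hypothetical function name `entry_sum_norm`; it verifies nothing beyond these particular examples.

```python
def entry_sum_norm(A):
    """Sum of the absolute values of all entries of A."""
    return sum(abs(a) for row in A for a in row)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

A = [[1, -2], [3, 4]]    # norm 10
B = [[0, 5], [-1, 2]]    # norm 8
product_norm = entry_sum_norm(matmul(A, B))   # AB = [[2, 1], [-4, 23]], norm 30
sum_norm = entry_sum_norm(matadd(A, B))       # A + B = [[1, 3], [2, 6]], norm 12
```

Here $30 \le 10 \cdot 8$ and $12 \le 10 + 8$, as parts (b) and (a) predict.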
Suppose that $G(x) = \sum_{k=0}^\infty a_k x^k$ has radius of convergence equal to $R$. Hence if $\|A\| < R$, then $\sum_{k=0}^\infty a_k \|A\|^k$ converges absolutely, i.e., $\sum_{k=0}^\infty |a_k| \cdot \|A\|^k$ converges. But then $|a_k (A^k)_{ij}| \le |a_k| \cdot \|A^k\| \le |a_k| \cdot \|A\|^k$, implying that $\sum_{k=0}^\infty |a_k (A^k)_{ij}|$ converges, hence $\sum_{k=0}^\infty a_k (A^k)_{ij}$ converges. At this point we have shown the following:
Lemma 11.5.2. If $\|A\| < R$ where $R$ is the radius of convergence of $G(x)$, then $G(A)$ converges to a matrix $C$.
We now give an example of special interest. Let $f(\lambda)$ be the numerical function
$$f(\lambda) = \exp(\lambda) = \sum_{k=0}^\infty \frac{\lambda^k}{k!}.$$
Since this power series converges for all complex numbers $\lambda$, each matrix $A \in M_n(F)$ satisfies the condition in Lemma 11.5.2, so $\exp(A)$ converges to a matrix $C = e^A$. Also, we know that $f(\lambda) = \exp(\lambda)$ has derivatives of all orders at each complex number. Hence $f(A)$ is meaningful in the sense of the preceding section. We want to be sure that the matrix $C$ to which the series $\sum_{k=0}^\infty \frac{1}{k!} A^k$ converges is the same as the value $f(A)$ given in the preceding section. So let us start this section over.
A sequence of square matrices
$$A_1, A_2, \ldots, A_m, A_{m+1}, \ldots, \tag{11.14}$$
all of the same order, is said to converge to the matrix $A$ provided the elements of the matrices in a fixed row and column converge to the corresponding element of the matrix $A$.
It is clear that if the sequences $\{A_m\}$ and $\{B_m\}$ converge to matrices $A$ and $B$, respectively, then $\{A_m + B_m\}$ and $\{A_m B_m\}$ converge to $A + B$ and $AB$, respectively. In particular, if $T$ is a constant matrix, and the sequence $\{A_m\}$ converges to $A$, then the sequence $\{T^{-1} A_m T\}$ will converge to $T^{-1} A T$. Further, if
$$A_m = A_m^{(1)} \oplus \cdots \oplus A_m^{(s)}, \quad (m = 1, 2, \ldots),$$
where the orders of the blocks do not depend on $m$, then $\{A_m\}$ will converge to some limit if and only if each block $\{A_m^{(i)}\}$ converges separately.
The last remark permits a completely simple solution of the question of the convergence of a matrix power series. Let
$$a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + \cdots \tag{11.15}$$
be a formal power series in an indeterminate $x$. The expression
$$a_0 I + a_1 A + a_2 A^2 + \cdots + a_m A^m + \cdots \tag{11.16}$$
is called the corresponding power series in the matrix $A$, and the polynomial
$$f_n(A) = a_0 I + a_1 A + \cdots + a_n A^n$$
is the $n$th partial sum of the series. The series in Eq. 11.16 is convergent provided the sequence $\{f_n(A)\}_{n=1}^\infty$ of partial sums has a limit. If this limit exists it is called the sum of the series.
Reduce the matrix $A$ to Jordan form:
$$T^{-1} A T = J = J_1 \oplus \cdots \oplus J_t,$$
where $J_1, \ldots, J_t$ are elementary Jordan blocks. We have seen above that convergence of the sequence $\{f_n(A)\}$ is equivalent to the convergence of the sequence $\{T^{-1} f_n(A) T\}$. But
$$T^{-1} f_n(A) T = f_n(T^{-1} A T) = f_n(J) = f_n(J_1) \oplus \cdots \oplus f_n(J_t),$$
and the question of the convergence of the series Eq. 11.16 is equivalent to the following: Under what conditions is this series convergent for the Jordan blocks $J_1, \ldots, J_t$? Consider one of these blocks, say $J_i$. Let it have the elementary divisor $(x - \lambda_i)^{n_i}$, i.e., it is an elementary Jordan block with minimal and characteristic polynomial equal to $(x - \lambda_i)^{n_i}$. By Eq. 11.4
$$f_n(J_i) = \begin{pmatrix} f_n(\lambda_i) & \frac{1}{1!} f_n^{(1)}(\lambda_i) & \frac{1}{2!} f_n^{(2)}(\lambda_i) & \cdots & \frac{1}{(n_i-1)!} f_n^{(n_i-1)}(\lambda_i) \\ 0 & f_n(\lambda_i) & \frac{1}{1!} f_n^{(1)}(\lambda_i) & \cdots & \frac{1}{(n_i-2)!} f_n^{(n_i-2)}(\lambda_i) \\ 0 & 0 & f_n(\lambda_i) & \cdots & \frac{1}{(n_i-3)!} f_n^{(n_i-3)}(\lambda_i) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & f_n(\lambda_i) \end{pmatrix}. \tag{11.17}$$
Consequently, $\{f_n(J_i)\}$ converges if and only if the sequences $\{f_n^{(j)}(\lambda_i)\}_{n=1}^\infty$ for each $j = 0, 1, \ldots, n_i - 1$ converge, i.e., if and only if the series Eq. 11.15 converges, and the series obtained by differentiating it term by term up to $n_i - 1$ times, inclusive, converges. It is known from the theory of analytic functions that all these series are convergent if either $\lambda_i$ lies inside the circle of convergence of Eq. 11.15 or $\lambda_i$ lies on the circle of convergence and the $(n_i - 1)$st derivative of Eq. 11.15 converges at $\lambda_i$. Moreover, when $\lambda_i$ lies inside the circle of convergence of Eq. 11.15, then the derivative of the series evaluated at $\lambda_i$ gives the derivative of the original function evaluated at $\lambda_i$. Thus we have
Theorem 11.5.3. A matrix power series in $A$ converges if and only if each eigenvalue $\lambda_i$ of $A$ either lies inside the circle of convergence of the corresponding power series $f(\lambda)$ or lies on the circle of convergence, and at the same time the series of $(n_i - 1)$st derivatives of the terms of $f(\lambda)$ converges at $\lambda_i$ to the derivative of the function given by the original power series evaluated at $\lambda_i$, where $n_i$ is the highest degree of an elementary divisor belonging to $\lambda_i$ (i.e., where $n_i$ is the size of the largest elementary Jordan block with eigenvalue $\lambda_i$). Moreover, if each eigenvalue $\lambda_i$ of $A$ lies inside the circle of convergence of the power series $f(\lambda)$, then the $j$th derivative of the power series in $A$ converges to the $j$th derivative of $f(\lambda)$ evaluated at $A$.
Now reconsider the exponential function mentioned above. We know that $f(\lambda) = \sum_{k=0}^\infty \frac{\lambda^k}{k!}$ converges, say to $e^\lambda$, for all complex numbers $\lambda$. Moreover, the function $f(\lambda) = e^\lambda$ has derivatives of all orders at each $\lambda \in \mathbb{C}$ and the power series obtained from the original by differentiating term by term converges to the derivative of the function at each complex number $\lambda$. In fact, the derivative of the function is again the original function, and the power series obtained by differentiating the original power series term by term is again the original power series. Hence the power series $\sum_{k=0}^\infty \frac{1}{k!} A^k$ not only converges to some matrix $C$, it converges to the value $\exp(A)$ defined in the previous section using the Jordan form of $A$. It follows that for each complex $n \times n$ matrix $A$ there is a polynomial $p(x)$ such that $\exp(A) = p(A)$.
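Let $J_\lambda$ denote the $n \times n$ elementary Jordan block
$$J_\lambda = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix} = \lambda I + N_n.$$
If $f(\lambda) = e^\lambda$, then we know
$$f(J_\lambda) = \begin{pmatrix} e^\lambda & \frac{1}{1!} e^\lambda & \frac{1}{2!} e^\lambda & \cdots & \frac{1}{(n-1)!} e^\lambda \\ 0 & e^\lambda & \frac{1}{1!} e^\lambda & \cdots & \frac{1}{(n-2)!} e^\lambda \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & e^\lambda & \frac{1}{1!} e^\lambda \\ 0 & 0 & \cdots & 0 & e^\lambda \end{pmatrix}.$$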
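Now let $t$ be any complex number. Let $B_n(t) = B_t$ be the $n \times n$ matrix defined by
$$B_t = \begin{pmatrix} 1 & \frac{t}{1!} & \frac{t^2}{2!} & \cdots & \frac{t^{n-1}}{(n-1)!} \\ 0 & 1 & \frac{t}{1!} & \cdots & \frac{t^{n-2}}{(n-2)!} \\ 0 & 0 & 1 & \cdots & \frac{t^{n-3}}{(n-3)!} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.$$
So $(B_t)_{ij} = 0$ if $j < i$ and $(B_t)_{ij} = \frac{t^{j-i}}{(j-i)!}$ if $i \le j$.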
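With $f(\lambda) = e^\lambda$ put $g(\lambda) = f(t\lambda) = e^{t\lambda}$. A simple induction shows that $g^{(j)}(\lambda) = t^j g(\lambda)$. Then
$$g(J_\lambda) = e^{t J_\lambda} = e^{t\lambda} B_t.$$
In particular,
$$e^{J_\lambda} = e^\lambda B_1.$$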
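We can even determine the polynomial $p(x)$ for which $p(J_\lambda) = e^{J_\lambda}$. Put
$$p_n(x) = \sum_{j=0}^{n-1} \frac{(x - \lambda)^j}{j!}.$$
It follows easily that
$$p_n^{(j)}(\lambda) = 1 \text{ for } 0 \le j \le n - 1, \text{ and } p_n^{(n)}(x) = 0.$$
From Eq. 11.6 we see that
$$p_n(J_n(\lambda)) = B_n(1), \quad\text{so}\quad e^\lambda p_n(J_n(\lambda)) = e^\lambda B_n(1) = e^{J_\lambda}.$$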
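We next compute $B_t B_s$, for arbitrary complex numbers $t$ and $s$. Clearly the product is upper triangular. Then for $i \le j$ we have
$$(B_t \cdot B_s)_{ij} = \sum_{k=i}^j (B_t)_{ik} (B_s)_{kj} = \sum_{k=i}^j \frac{t^{k-i}}{(k - i)!} \cdot \frac{s^{j-k}}{(j - k)!}$$
$$= \frac{1}{(j - i)!} \sum_{k=i}^j \binom{j - i}{j - k} t^{(j-i)-(j-k)} s^{j-k} = \frac{1}{(j - i)!} (t + s)^{j-i} = (B_{t+s})_{ij}.$$
Hence
$$B_t \cdot B_s = B_{t+s} = B_s \cdot B_t.$$
It now follows that
$$e^{t J_\lambda} \cdot e^{s J_\lambda} = e^{t\lambda} B_t \cdot e^{s\lambda} B_s = e^{(t+s)\lambda} B_{t+s} = e^{(t+s) J_\lambda} = e^{s J_\lambda} \cdot e^{t J_\lambda}.$$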
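Both identities, $e^{t J_\lambda} = e^{t\lambda} B_t$ and $B_t B_s = B_{t+s}$, can be checked against a truncated power series. The sketch below is a floating point illustration with hypothetical names (`exp_series`, `B_matrix`) and hypothetical sample values of $\lambda$, $t$, $s$; the 60-term truncation is an assumption that is adequate only for matrices of modest size and norm.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def jordan_block(lam, n):
    return [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=60):
    """Partial sum of I + M + M^2/2! + ... with the given number of terms."""
    n = len(M)
    total, term = identity(n), identity(n)
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, M))
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def B_matrix(t, n):
    """The matrix B_n(t) with entries t^(j-i)/(j-i)! on and above the diagonal."""
    return [[t ** (j - i) / math.factorial(j - i) if j >= i else 0.0
             for j in range(n)] for i in range(n)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

lam, n, t, s = 0.5, 4, 0.7, -0.3    # hypothetical sample values
J = jordan_block(lam, n)
lhs = exp_series(scale(t, J))
rhs = scale(math.exp(t * lam), B_matrix(t, n))
```

The closed form needs only $n$ scalar evaluations per block, whereas the series needs a matrix product per term.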
Fix the complex number $s$ and put $T_0 = \mathrm{diag}(s^{n-1}, s^{n-2}, \ldots, s, 1)$. It is now easy to verify that
$$T_0^{-1} \cdot s J_\lambda \cdot T_0 = J_{s\lambda}. \tag{11.18}$$
Hence $J_{s\lambda}$ is the Jordan form of $s J_\lambda$. Also, $\left( T_0 A T_0^{-1} \right)_{ij} = s^{j-i} A_{ij}$. From this it follows easily that
$$T_0 B_1 T_0^{-1} = B_s.$$
It now follows that
$$e^{s J_\lambda} = f(s J_\lambda) = f(T_0 J_{s\lambda} T_0^{-1}) = T_0 f(J_{s\lambda}) T_0^{-1} = T_0 e^{J_{s\lambda}} T_0^{-1} = T_0 (e^{s\lambda} B_1) T_0^{-1} = e^{s\lambda} B_s,$$
which agrees with an earlier equation.
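Suppose $A$ is a general $n \times n$ matrix with Jordan form
$$T^{-1} A T = J_{\lambda_1} \oplus \cdots \oplus J_{\lambda_t}, \quad J_{\lambda_i} \in M_{n_i}(F).$$
Let $T_0$ be the direct sum of the matrices $\mathrm{diag}(s^{n_i - 1}, s^{n_i - 2}, \ldots, s, 1)$. Then
$$T_0^{-1} T^{-1} \cdot sA \cdot T T_0 = J_{s\lambda_1} \oplus \cdots \oplus J_{s\lambda_t},$$
so
$$e^{sA} = T \left( e^{s\lambda_1} B_{n_1}(s) \oplus \cdots \oplus e^{s\lambda_t} B_{n_t}(s) \right) T^{-1}.$$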
Several properties of the exponential function are now easy corollaries.
Corollary 11.5.4. Using the above descriptions of $A$ and $e^{sA}$, etc., we obtain:
(a) $A e^A = e^A A$ for all square $A$.
(b) $e^{sA} e^{tA} = e^{(s+t)A} = e^{tA} e^{sA}$, for all $s, t \in \mathbb{C}$.
(c) $e^0 = I$, where $0$ is the zero matrix.
(d) $e^A$ is nonsingular and $e^{-A} = (e^A)^{-1}$.
(e) $e^I = eI$.
(f) $\det(e^{J_\lambda}) = e^{n\lambda} = e^{\mathrm{trace}(J_\lambda)}$, from which it follows that $\det(e^A) = e^{\mathrm{trace}(A)}$.
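Parts (d) and (f) can be illustrated on a small real matrix using the same truncated series for the exponential. The example matrix (eigenvalues $-1$ and $-2$, trace $-3$) and the name `exp_series` are assumptions made for this sketch, and the comparison is only up to floating point rounding.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def exp_series(M, terms=60):
    """Partial sum of the exponential series for a square matrix M."""
    n = len(M)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, M))
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[0.0, 1.0], [-2.0, -3.0]]         # hypothetical example, trace -3
E = exp_series(A)                      # approximates e^A
E_inv = exp_series(scale(-1.0, A))     # approximates e^{-A}
check = matmul(E, E_inv)               # should be close to the identity
```

Note that $\det(e^A) = e^{\mathrm{trace}(A)}$ is never zero, which is another way to see that $e^A$ is nonsingular.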
11.6 Commuting Matrices

Let $A$ be a fixed $n \times n$ matrix over $\mathbb{C}$. We want to determine which matrices commute with $A$. If $A$ commutes with matrices $B$ and $C$, clearly $A$ commutes with $BC$ and with any linear combination of $B$ and $C$. Also, if $A$ commutes with every $n \times n$ matrix, then $A$ must be a scalar multiple $A = aI$ of the identity matrix $I$. (If you have not already verified this, do it now.)
Now suppose
$$T^{-1} A T = J = J_1 \oplus \cdots \oplus J_s \tag{11.19}$$
is the Jordan form of $A$ with each $J_i$ an elementary Jordan block. It is easy to check that $X$ commutes with $J$ if and only if $T X T^{-1}$ commutes with $A$. Therefore the problem reduces to finding the matrices $X$ that commute with $J$. Write $X$ in block form corresponding to Eq. 11.19:
$$X = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1s} \\ X_{21} & X_{22} & \cdots & X_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ X_{s1} & X_{s2} & \cdots & X_{ss} \end{pmatrix}. \tag{11.20}$$
The condition $JX = XJ$ reduces to the equalities
$$J_p X_{pq} = X_{pq} J_q \quad (p, q = 1, \ldots, s). \tag{11.21}$$
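Note that there is only one equation for each $X_{pq}$. Fix attention on one block $X_{pq} = B = (b_{ij})$. Suppose $J_p$ is $r \times r$ and $J_q$ is $t \times t$. Then $X_{pq} = B$ is $r \times t$, $J_p = \lambda_p I + N_r$, $J_q = \lambda_q I + N_t$. From Eq. 11.21 we easily get
$$\lambda_p B_{uv} + \sum_{k=1}^r (N_r)_{uk} B_{kv} = \lambda_q B_{uv} + \sum_{k=1}^t B_{uk} (N_t)_{kv}, \tag{11.22}$$
for all $p, q, u, v$ with $1 \le p, q \le s$, $1 \le u \le r$, $1 \le v \le t$.
Eq. 11.22 quickly gives the following four equations:
$$(\text{For } v \ne 1,\ u \ne r), \quad \lambda_p B_{uv} + B_{u+1,v} = \lambda_q B_{uv} + B_{u,v-1}. \tag{11.23}$$
$$(\text{For } v = 1;\ u \ne r), \quad \lambda_p B_{u1} + B_{u+1,1} = \lambda_q B_{u1} + 0. \tag{11.24}$$
$$(\text{For } v \ne 1;\ u = r), \quad \lambda_p B_{rv} + 0 = \lambda_q B_{rv} + B_{r,v-1}. \tag{11.25}$$
$$(\text{For } u = r,\ v = 1), \quad \lambda_p B_{r1} + 0 = \lambda_q B_{r1}. \tag{11.26}$$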
First suppose that $\lambda_p \ne \lambda_q$. Then from Eq. 11.26, $B_{r1} = 0$. Then from Eq. 11.24, $(\lambda_q - \lambda_p) B_{u1} = B_{u+1,1}$. Put $u = r - 1, r - 2, \ldots, 1$, in that order, to get $B_{u1} = 0$ for $1 \le u \le r$, i.e., the first column of $B$ is the zero column. In Eq. 11.25 put $v = 2, 3, \ldots, t$ to get $B_{rv} = 0$ for $1 \le v \le t$, i.e., the bottom row of $B$ is the zero row. Put $u = r - 1$ in Eq. 11.23 and then put $v = 2, 3, \ldots, t$, in that order, to get $B_{r-1,v} = 0$ for all $v$, i.e., the $(r - 1)$st row has all entries equal to zero. Now put $u = r - 2$ and let $v = 2, 3, \ldots, t$, in that order, to get $B_{r-2,v} = 0$ for $1 \le v \le t$. Continue in this way for $u = r - 3$, $r - 4$, etc., to see that $B = 0$.
So for $\lambda_p \ne \lambda_q$ we have $X_{pq} = 0$.
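Now consider the case $\lambda_p = \lambda_q$. The equations Eq. 11.23 through 11.25 become
$$(\text{For } v \ne 1;\ u \ne r), \quad B_{u+1,v} = B_{u,v-1}. \tag{11.27}$$
$$(\text{For } v = 1;\ u \ne r), \quad B_{u+1,1} = 0. \tag{11.28}$$
$$(\text{For } v \ne 1;\ u = r), \quad B_{r,v-1} = 0. \tag{11.29}$$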
It is now relatively straightforward to check that there are only the follow-
ing three possibilities (each of which is said to be in linear triangular form):
1. r = t, in which case B =

t
i=0

i
N
i
t
for some scalars
i
c.
2. r > t, in which case B has the form
B =
_
t
i=1

i
N
i
t
0
rt,t
_
.
3. r < t, in which case B has the form
B =
_
0
r,tr

r
i=1

i
N
i
r
_
.
Conversely, the determination of the form of B shows that if B has any
of the forms shown above then B commutes with X. For example, if
B = I
3
+ N
3
I
2
+ N
2
I
3
+ N
3
(11.30)
for ,= , then the matrices X that commute with V have the form
X =
_
_
_
_

3
i=1
a
i
N
i
3

2
i=1
c
i
N
i
2
0
1,2
0
2,1

2
i=1
d
i
N
i
2

2
i=1
b
i
N
i
2
_
_
_
_

i=1

i
N
i
3
,
where a
i
, b
i
, c
i
d
i
,
i
are arbitrary scalars.
Polynomials in a matrix A have the special property of commuting, not
only with the matrix A, but also with any matrix X which commutes with
A.
Theorem 11.6.1. If C commutes with every matrix which commutes with
B, then C is a polynomial in B.
Proof. In the usual way we may assume that $B$ is in Jordan form. Suppose
$$B = \bigoplus_{i=1}^s B_i = \bigoplus_{i=1}^s J_{n_i}(\lambda_i) = \bigoplus_{i=1}^s \left( \lambda_i I_{n_i} + N_{n_i} \right).$$
The auxiliary matrices
$$X = \bigoplus_{i=1}^s a_i I_{n_i},$$
where the $a_i$ are arbitrary numbers, are known to commute with $B$. So by hypothesis $X$ also commutes with $C$. From the preceding discussion, if we take the $a_i$ to be distinct, we see that $C$ must be decomposable into blocks:
$$C = C_1 \oplus \cdots \oplus C_s.$$
Moreover, it follows from $CB = BC$ that these blocks have linear triangular form. Now let $X$ be an arbitrary matrix that commutes with $B$. The general form of the matrix $X$ was established in the preceding section. By assumption, $C$ commutes with $X$. Representing $X$ in block form, we see that the equality $CX = XC$ is equivalent to the relations
$$C_p X_{pq} = X_{pq} C_q \quad (p, q = 1, 2, \ldots, s). \tag{11.31}$$
If the corresponding blocks $B_p$, $B_q$ have distinct eigenvalues, then Eq. 11.31 contributes nothing, since then $X_{pq} = 0$. Hence we assume that $B_p$ and $B_q$ have the same eigenvalues. Let
C
p
=
r

i=1
a
i
N
i
r
, C
q
=
t

i=1
b
i
N
i
t
.
For definiteness, suppose that $r < t$. Then Eq. 11.31 becomes
\[ \left( \sum_{i=1}^{r} a_i N_r^i \right) \begin{pmatrix} 0_{r,t-r} & \sum_{i=1}^{r} \zeta_i N_r^i \end{pmatrix}
 = \begin{pmatrix} 0_{r,t-r} & \sum_{i=1}^{r} \zeta_i N_r^i \end{pmatrix} \left( \sum_{i=1}^{t} b_i N_t^i \right), \]
where the $\zeta_i$ are arbitrary complex numbers. Multiply the top row times the
right hand column on both sides of the equation to obtain:
\[ (a_1, \ldots, a_r) \cdot (\zeta_r, \ldots, \zeta_1)^T = (0, \ldots, 0, \zeta_1, \ldots, \zeta_r) \cdot (b_t, \ldots, b_1)^T, \]
which is equivalent to
\[ a_1 \zeta_r + a_2 \zeta_{r-1} + \cdots + a_r \zeta_1 = b_1 \zeta_r + b_2 \zeta_{r-1} + \cdots + b_r \zeta_1. \]
Since the $\zeta_i$ can be chosen arbitrarily, it must be that
\[ a_i = b_i, \quad \text{for } 1 \le i \le r. \tag{11.32} \]
It is routine to verify that a similar equality is obtained for $r \ge t$.
Suppose that the Jordan blocks of the matrix $B$ are such that blocks with
the same eigenvalues are adjacent. For example, let
\[ B = (B_1 \oplus \cdots \oplus B_{m_1}) \oplus (B_{m_1+1} \oplus \cdots \oplus B_{m_2}) \oplus \cdots \oplus (B_{m_k+1} \oplus \cdots \oplus B_s), \]
where blocks with the same eigenvalues are contained in the same set of
parentheses. Denoting the sums in parentheses by $B^{(1)}, B^{(2)}, \ldots$, respectively,
divide the matrix $B$ into larger blocks which we call cells. Then divide the
matrices $X$ and $C$ into cells in a corresponding way. The results above show
that all the nondiagonal cells of the matrix $X$ are zero; the structure of the
diagonal cells was also described above. The conditions for the matrix $C$
obtained in the present section show that the diagonal blocks of this matrix
decompose into blocks in the same way as those of $B$. Moreover, the blocks
of the matrix $C$ have a special triangular form. Equalities 11.32 mean that
in the blocks of the matrix $C$ belonging to a given cell, the nonzero elements
lying on a line parallel to the main diagonal are equal.
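For example, let $B$ have the form given in Eq. 11.30,
\[ B = \begin{pmatrix}
\lambda & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \lambda & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \lambda & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \lambda & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \lambda & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \mu & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \mu & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \mu
\end{pmatrix}. \tag{11.33} \]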
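Our results show that every matrix $C$ that commutes with each matrix
which commutes with $B$ must have the form
\[ C = \begin{pmatrix}
a_0 & a_1 & a_2 & 0 & 0 & 0 & 0 & 0 \\
0 & a_0 & a_1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & a_0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & a_0 & a_1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & a_0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & d_0 & d_1 & d_2 \\
0 & 0 & 0 & 0 & 0 & 0 & d_0 & d_1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & d_0
\end{pmatrix}. \tag{11.34} \]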
We need to prove that $C$ can be represented as a polynomial in $B$. We
do this only for the particular case where $B$ has the special form given in
Eq. 11.33, so that $C$ has its form given in Eq. 11.34. By Lemma 11.4.1 there
is a polynomial $f(\lambda)$ that satisfies the conditions:
\[ f(\lambda) = a_0, \quad f'(\lambda) = a_1, \quad f''(\lambda) = a_2, \]
\[ f(\mu) = d_0, \quad f'(\mu) = d_1, \quad f''(\mu) = d_2. \]
By applying Eq. 11.6 we see that $f(B) = C$.
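As a sanity check on the example, one can count dimensions by machine. The sketch below is not from the text: it fixes the two eigenvalues of the 8 x 8 example to the arbitrary values 2 and 5, solves BX = XB as an exact linear system over the rationals, and then solves CX = XC for every X found. The helper names (jordan_sum, commutator_rows, null_space, commutant) are hypothetical.

```python
from fractions import Fraction

def jordan_sum(blocks):
    """Direct sum of Jordan blocks J_n(lam), given as (lam, n) pairs."""
    size = sum(n for _, n in blocks)
    B = [[Fraction(0)] * size for _ in range(size)]
    pos = 0
    for lam, n in blocks:
        for i in range(n):
            B[pos + i][pos + i] = Fraction(lam)
            if i + 1 < n:
                B[pos + i][pos + i + 1] = Fraction(1)
        pos += n
    return B

def commutator_rows(M):
    """Rows of the linear system M X - X M = 0 in the unknowns X[k][j]."""
    n = len(M)
    rows = []
    for i in range(n):
        for j in range(n):
            row = [Fraction(0)] * (n * n)
            for k in range(n):
                row[k * n + j] += M[i][k]   # from (M X)[i][j]
                row[i * n + k] -= M[k][j]   # from (X M)[i][j]
            rows.append(row)
    return rows

def null_space(rows, ncols):
    """Basis of the solution space of rows . v = 0 (exact Gauss-Jordan)."""
    rows = [r[:] for r in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        inv = 1 / rows[r][c]
        rows[r] = [x * inv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b if b else a
                           for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[fc] = Fraction(1)
        for row, pc in zip(rows, pivots):
            v[pc] = -row[fc]
        basis.append(v)
    return basis

def commutant(mats, n):
    """Basis (as n x n matrices) of all X commuting with every M in mats."""
    rows = [row for M in mats for row in commutator_rows(M)]
    return [[v[i * n:(i + 1) * n] for i in range(n)]
            for v in null_space(rows, n * n)]

# The example of Eq. 11.33 with assumed eigenvalues 2 and 5.
B = jordan_sum([(2, 3), (2, 2), (5, 3)])
first = commutant([B], 8)       # matrices X with BX = XB
second = commutant(first, 8)    # matrices C commuting with every such X
print(len(first), len(second))
```

The second dimension should agree with the number of free parameters in Eq. 11.34, and every matrix in `second` should show the shared diagonal entries described by Equalities 11.32.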
11.7 A Matrix Differential Equation

Let $F$ denote either $\mathcal{R}$ or $\mathcal{C}$. If $B(t)$ is an $n \times n$ matrix each of whose entries
is a differentiable function $b_{ij}(t)$ from $F$ to $F$, we say that the derivative of
$B$ with respect to $t$ is the matrix $\frac{d}{dt}(B(t))$ whose $(i, j)$th entry is $(b_{ij})'(t)$.
Let $x(t) = (x_1(t), \ldots, x_n(t))^T$ be an $n$-tuple of unknown functions $F \to F$.
The derivative of $x(t)$ will be denoted $\dot{x}(t)$. Let $A$ be an $n \times n$ matrix over
$F$, and consider the matrix differential equation (with initial condition):
\[ \dot{x}(t) = Ax(t), \qquad x(0) = x_0. \tag{11.35} \]
Theorem 11.7.1. The solution to the system of differential equations in
Eq. 11.35 is given by $x(t) = e^{tA} x_0$.
Before we can prove this theorem, we need the following lemma.
Lemma 11.7.2. $\frac{d}{dt}\left(e^{tA}\right) = A e^{tA}$.
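Proof. First suppose that $A$ is the elementary Jordan block $A = J = J_n(\lambda) =
\lambda I + N$, where $N = N_n$. If $n = 1$, recall that the ordinary differential equation
$x'(t) = \lambda x(t)$ has the solution $x(t) = a e^{\lambda t}$ where $a = x(0)$. Since $N_1 = 0$,
we see that the theorem holds in this case. Now suppose that $n > 1$. So
$A = J = \lambda I + N$ where $N^n = 0$. Recall that
\[ e^{tA} = e^{tJ} = e^{\lambda t} B_t = \sum_{j=0}^{n-1} \frac{e^{\lambda t} t^j}{j!} N^j. \]
A simple calculation shows that
\[ \frac{d}{dt}\left(e^{tJ}\right) = \lambda e^{\lambda t} I + \sum_{j=1}^{n-1} \left( \frac{\lambda e^{\lambda t} t^j + j e^{\lambda t} t^{j-1}}{j!} \right) N^j. \]
On the other hand,
\[ J e^{tJ} = (\lambda I + N) \left( e^{\lambda t} \sum_{j=0}^{n-1} \frac{t^j}{j!} N^j \right) \]
\[ = e^{\lambda t} \left( (\lambda I + N) \left( I + \sum_{j=1}^{n-2} \frac{t^j}{j!} N^j + \frac{t^{n-1}}{(n-1)!} N^{n-1} \right) \right) \]
\[ = e^{\lambda t} \left( \lambda I + (1 + \lambda t) N + \sum_{j=2}^{n-2} \left( \frac{\lambda t^j}{j!} + \frac{t^{j-1}}{(j-1)!} \right) N^j + \left( \frac{\lambda t^{n-1}}{(n-1)!} + \frac{t^{n-2}}{(n-2)!} \right) N^{n-1} \right) \]
\[ = \frac{d}{dt}\left(e^{tJ}\right), \text{ as desired.} \]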
This completes a proof of the Lemma. Now turn to a proof of Theorem 11.7.1.
First suppose that $A$ has the Jordan form $J = P^{-1}AP$, where $J =
J_1 \oplus \cdots \oplus J_s$. Since $e^{tA}$ is a polynomial in $tA$, $e^{P^{-1}tAP} = P^{-1} e^{tA} P$, i.e.
\[ e^{tA} = P e^{tJ} P^{-1} = P \left( e^{tJ_1} \oplus \cdots \oplus e^{tJ_s} \right) P^{-1}. \]
It now follows easily that
\[ \frac{d}{dt}\left(e^{tA}\right) = A e^{tA}. \]
Since $x_0 = x(0)$ is a constant matrix, $\frac{d}{dt}\left(e^{tA} x_0\right) = A e^{tA} x_0$, so that $x = e^{tA} x_0$
satisfies the differential equation $\dot{x} = Ax$.
It is a standard result from the theory of differential equations that this
solution is the unique solution to the given equation.
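The statement of Theorem 11.7.1 can be illustrated numerically. The sketch below is not from the text: it approximates the exponential by a truncated power series (an assumption made for simplicity; the truncation length 40 is arbitrary), uses the sample matrix A = [[0, 1], [-1, 0]] and initial vector (1, 0) chosen here, and compares a finite-difference derivative of x(t) with Ax(t).

```python
import math

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, t, terms=40):
    """Partial sum of the series sum_k (tA)^k / k! with `terms` terms."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0
    total = [row[:] for row in term]
    for k in range(1, terms):
        # term_k = term_{k-1} * A * t / k
        term = [[t * x / k for x in row] for row in matmul(term, A)]
        total = [[a + b for a, b in zip(r1, r2)]
                 for r1, r2 in zip(total, term)]
    return total

def apply(M, v):
    """Matrix times column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[0.0, 1.0], [-1.0, 0.0]]   # sample matrix (illustrative)
x0 = [1.0, 0.0]                 # sample initial condition

def x(t):
    """Candidate solution x(t) = exp(tA) x0."""
    return apply(mat_exp(A, t), x0)

t, h = 0.7, 1e-5
derivative = [(p - m) / (2 * h) for p, m in zip(x(t + h), x(t - h))]
rhs = apply(A, x(t))

print(x(0.0))                              # initial condition
print(derivative, rhs)                     # the two sides of x' = Ax
print(x(t), [math.cos(t), -math.sin(t)])   # closed form for this A
```

For this particular A the exponential is a rotation matrix, which gives an independent closed form to compare against.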
11.8 Exercises
1. Show that the norm of $T \in \mathcal{L}(V, W)$ as defined in Equation 11.1 is a
vector norm on $\mathcal{L}(V, W)$ viewed as a vector space in the usual way.
2. Let $D$ denote the derivative operator, and for a function $f$ (in our case a
formal power series or Laurent series) let $f^{(j)}$ denote the $j$th derivative
of $f$, i.e., $D^j(f) = f^{(j)}$.
(i) Prove that
\[ D^n(f \cdot g) = \sum_{i=0}^{n} \binom{n}{i} f^{(i)} g^{(n-i)}. \]
(ii) Derive as a corollary to part (i) the fact that
\[ D^j(f^2) = \sum_{i_1+i_2=j} \binom{j}{i_1, i_2} f^{(i_1)} f^{(i_2)}. \]
(iii) Now use part (i) and induction on $n$ to prove that
\[ D^j(f^n) = \sum_{i_1+\cdots+i_n=j} \binom{j}{i_1, \ldots, i_n} f^{(i_1)} \cdots f^{(i_n)}. \]
3. The Frobenius norm $\|A\|_F$ of an $m \times n$ matrix $A$ is defined to be the
square root of the sum of the squares of the magnitudes of all the entries
of $A$.
(a) Show that the Frobenius norm really is a matrix norm (i.e., a vector
norm on the vector space of all $m \times n$ matrices).
(b) Compute $\|I\|_F$ and deduce that $\|\cdot\|_F$ cannot be a transformation
norm induced by some vector norm.
(c) Let $U$ and $V$ be unitary matrices of the appropriate sizes and show
that
\[ \|A\|_F = \|UA\|_F = \|AV\|_F = \|UAV\|_F. \]
4. Let $f(\lambda)$ and $g(\lambda)$ be two numerical functions and let $A$ be
an $n \times n$ matrix for which both $f(A)$ and $g(A)$ are defined. Show that
$f(A)g(A) = g(A)f(A)$.
Chapter 12
Infinite Dimensional Vector Spaces

12.1 Partially Ordered Sets & Zorn's Lemma

There are occasions when we would like to indulge in a kind of "infinite
induction." Basically this means that we want to show the existence of some
set which is maximal with respect to certain specified properties. In this text
we want to use Zorn's Lemma to show that every vector space has a basis,
i.e., a maximal linearly independent set of vectors in some vector space. The
maximality is needed to show that these vectors span the entire space.
In order to state Zorn's Lemma we need to set the stage.
Definition A partial order on a nonempty set $A$ is a relation $\le$ on $A$
satisfying the following:
1. $x \le x$ for all $x \in A$ (reflexive);
2. if $x \le y$ and $y \le x$ then $x = y$ for all $x, y \in A$ (antisymmetric);
3. if $x \le y$ and $y \le z$ then $x \le z$ for all $x, y, z \in A$ (transitive).
Given a partial order $\le$ on $A$ we often say that $A$ is partially ordered by
$\le$, or that $(A, \le)$ is a partially ordered set.
Definition Let the nonempty set $A$ be partially ordered by $\le$.
1. A subset $B$ of $A$ is called a chain if for all $x, y \in B$, either $x \le y$ or
$y \le x$.
2. An upper bound for a subset $B$ of $A$ is an element $u \in A$ such that
$b \le u$ for all $b \in B$.
3. A maximal element of $A$ is an element $m \in A$ such that if $m \le x$ for
some $x \in A$, then $m = x$.
In the literature there are several names for chains, such as linearly ordered
subset or simply ordered subset. The existence of upper bounds and maximal
elements depends on the nature of $(A, \le)$.
As an example, let $A$ be the collection of all proper subsets of $Z^+$ (the
set of positive integers) ordered by $\subseteq$. Then, for example, the chain
\[ \{1\} \subseteq \{1, 2\} \subseteq \{1, 2, 3\} \subseteq \cdots \]
does not have an upper bound. However, the set $A$ does have maximal
elements: for example $Z^+ \setminus \{n\}$ is a maximal element of $A$ for any $n \in Z^+$.
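A finite analogue of this example can be explored directly. The sketch below is not from the text: it replaces the positive integers by the four-element set {1, 2, 3, 4} (an assumption made so that everything is computable) and uses hypothetical helper names. In the finite case every chain does have an upper bound, which is exactly what fails for the infinite chain above.

```python
from itertools import combinations

def proper_subsets(universe):
    """All proper subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for k in range(len(items))          # sizes 0 .. n-1 only
            for c in combinations(items, k)]

def is_chain(subsets):
    """True if every two members are comparable under inclusion."""
    return all(a <= b or b <= a for a, b in combinations(subsets, 2))

def upper_bounds(chain, poset):
    """Members of the poset that contain every member of the chain."""
    return [u for u in poset if all(b <= u for b in chain)]

def maximal_elements(poset):
    """Members that are not properly contained in any other member."""
    return [m for m in poset if not any(m < x for x in poset)]

U = {1, 2, 3, 4}
A = proper_subsets(U)
chain = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]

print(is_chain(chain))
print(upper_bounds(chain, A))
print(maximal_elements(A))
```

The maximal elements found are the complements of single points, matching the description of maximal elements in the infinite example.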
Zorn's Lemma If $A$ is a nonempty partially ordered set in which every
chain has an upper bound, then $A$ has a maximal element.
It is a nontrivial result that Zorn's Lemma is independent of the usual
(Zermelo-Fraenkel) axioms of set theory in the sense that if the axioms of
set theory are consistent, then so are these axioms together with Zorn's
Lemma or with the negation of Zorn's Lemma. The two other most nearly
standard axioms that are equivalent to Zorn's Lemma (in the presence of the
usual Z-F axioms) are the Axiom of Choice and the Well Ordering Principle.
In this text, we just use Zorn's Lemma and leave any further discussion of
these matters to others.
12.2 Bases for Vector Spaces

Let $V$ be any vector space over an arbitrary field. Let $S = \{A \subseteq V :
A \text{ is linearly independent}\}$ and let $S$ be partially ordered by inclusion.
(Note that $S$ is nonempty, since it contains the empty set.) If $C$
is any chain in $S$, then the union of all the sets in $C$ is an upper bound
for $C$; this union is itself linearly independent, because any finite list of
vectors from it lies in a single member of the chain. Hence by Zorn's Lemma
$S$ must have a maximal element $B$. By definition $B$ is a linearly independent
set not properly contained in any other linearly independent set. If a vector
$v$ were not in the space spanned by $B$, then $B \cup \{v\}$ would be a linearly
independent set properly containing $B$, a contradiction. Hence $B$ is a basis
for $V$. This proves the following theorem:
Theorem 12.2.1. Each vector space $V$ over an arbitrary field $F$ has a basis.
This is a subset $B$ of $V$ such that each vector $v \in V$ can be written in just
one way as a linear combination of a finite list of vectors in $B$.
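In finite dimensions no appeal to Zorn's Lemma is needed: a maximal linearly independent subset can be built greedily, and it then spans. The sketch below illustrates this over the rationals; it is only a finite-dimensional illustration and not the argument of the text, and the sample vectors and function names are chosen here.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over the rationals (exact elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def maximal_independent(vectors):
    """Keep each vector that is independent of those already kept."""
    chosen = []
    for v in vectors:
        if rank(chosen + [v]) == len(chosen) + 1:
            chosen.append(v)
    return chosen

spanning = [(1, 2, 3), (2, 4, 6), (0, 1, 1), (1, 3, 4), (0, 0, 5)]
basis = maximal_independent(spanning)
print(basis)

# Maximality: adding any of the original vectors does not raise the rank,
# so every one of them lies in the span of the chosen set.
print(all(rank(basis + [v]) == len(basis) for v in spanning))
```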
12.3 A Theorem of Philip Hall

Let $\mathcal{S}$ and $I$ be arbitrary sets. For each $i \in I$ let $A_i \subseteq \mathcal{S}$. If $a_i \in A_i$ for all
$i \in I$, we say $\{a_i : i \in I\}$ is a system of representatives for $\mathcal{A} = (A_i : i \in I)$.
If in addition $a_i \neq a_j$ whenever $i \neq j$, even though $A_i$ may equal $A_j$, then
$\{a_i : i \in I\}$ is a system of distinct representatives (SDR) for $\mathcal{A}$. Our first
problem is: Under what conditions does some family $\mathcal{A}$ of subsets of a set $\mathcal{S}$
have an SDR?
For a finite collection of sets a reasonable answer was given by Philip Hall
in 1935. It is obvious that if $\mathcal{A} = (A_i : i \in I)$ has an SDR, then the union
of each $k$ of the members of $\mathcal{A} = (A_i : i \in I)$ must have at least $k$ elements.
Hall's observation was that this obvious necessary condition is also sufficient.
We state the condition formally as follows:
Condition (H): Let $I = [n] = \{1, 2, \ldots, n\}$, and let $S$ be any (nonempty)
set. For each $i \in I$, let $S_i \subseteq S$. Then $\mathcal{A} = (S_1, \ldots, S_n)$ satisfies Condition
(H) provided for each $K \subseteq I$, $|\cup_{k \in K} S_k| \ge |K|$.
Theorem 12.3.1. The family $\mathcal{A} = (S_1, \ldots, S_n)$ of finitely many (not
necessarily distinct) sets has an SDR if and only if it satisfies Condition (H).
Proof. As Condition (H) is clearly necessary, we now show that it is also
sufficient. $B_{r,s}$ denotes a block of $r$ subsets $(S_{i_1}, \ldots, S_{i_r})$ belonging to $\mathcal{A}$,
where $s = |\cup\{S_j : S_j \in B_{r,s}\}|$. So Condition (H) says: $s \ge r$ for each block
$B_{r,s}$. If $s = r$, $B_{r,s}$ is called a critical block. (By convention, the empty block
$B_{0,0}$ is critical.)
If $B_{r,s} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r)$ and
$B_{t,v} = (A_1, \ldots, A_u, D_{u+1}, \ldots, D_t)$, write $B_{r,s} \cap B_{t,v} =
(A_1, \ldots, A_u)$; $B_{r,s} \cup B_{t,v} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r, D_{u+1}, \ldots, D_t)$. Here
the notation implies that $A_1, \ldots, A_u$ are precisely the subsets in both blocks.
Then write
$B_{r,s} \cap B_{t,v} = B_{u,w}$, where $w = |\cup\{A_i : 1 \le i \le u\}|$, and $B_{r,s} \cup B_{t,v} = B_{y,z}$,
where $y = r + t - u$, $z = |\cup\{S_i : S_i \in B_{r,s} \cup B_{t,v}\}|$.
The proof will be by induction on the number $n$ of sets in the family $\mathcal{A}$,
but first we need two lemmas.
Lemma 12.3.2. If $\mathcal{A}$ satisfies Condition (H), then the union and
intersection of critical blocks are themselves critical blocks.
Proof of Lemma 12.3.2. Let $B_{r,r}$ and $B_{t,t}$ be given critical blocks. Say
$B_{r,r} \cap B_{t,t} = B_{u,v}$; $B_{r,r} \cup B_{t,t} = B_{y,z}$. The $z$ elements of the union will be
the $r + t$ elements of $B_{r,r}$ and $B_{t,t}$ reduced by the number of elements in
both blocks, and this latter number includes at least the $v$ elements in the
intersection: $z \le r + t - v$. Also $v \ge u$ and $z \ge y$ by Condition (H). Note:
$y + u = r + t$. Hence $r + t - v \ge z \ge y = r + t - u \ge r + t - v$, implying that
equality holds throughout. Hence $u = v$ and $y = z$ as desired for the proof
of Lemma 12.3.2.
Lemma 12.3.3. If $B_{k,k}$ is any critical block of $\mathcal{A}$, the deletion of elements
of $B_{k,k}$ from all sets in $\mathcal{A}$ not belonging to $B_{k,k}$ produces a new family $\mathcal{A}'$ in
which Condition (H) is still valid.
Proof of Lemma 12.3.3. Let $B_{r,s}$ be an arbitrary block, and $(B_{r,s})' = B'_{r,s'}$
the block after the deletion. We must show that $s' \ge r$. Let $B_{r,s} \cap B_{k,k} = B_{u,v}$
and $B_{r,s} \cup B_{k,k} = B_{y,z}$. Say
\[ B_{r,s} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r), \]
\[ B_{k,k} = (A_1, \ldots, A_u, D_{u+1}, \ldots, D_k). \]
So $B_{u,v} = (A_1, \ldots, A_u)$, $B_{y,z} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r, D_{u+1}, \ldots, D_k)$.
The deleted block $(B_{r,s})' = B'_{r,s'}$ is $(A_1, \ldots, A_u, C'_{u+1}, \ldots, C'_r)$. But $C_{u+1}, \ldots, C_r$,
as blocks of the union $B_{y,z}$, contain at least $z - k$ elements not in $B_{k,k}$. Thus
$s' \ge v + (z - k) \ge u + y - k = u + (r + k - u) - k = r$. Hence $s' \ge r$, as
desired for the proof of Lemma 12.3.3.
As indicated above, for the proof of the main theorem we now use
induction on $n$. For $n = 1$ the theorem is obviously true.
Induction Hypothesis: Suppose the theorem holds (Condition (H) implies
that there is an SDR) for any family of $m$ sets, $1 \le m < n$.
We need to show the theorem holds for a system of $n$ sets. So let $1 < n$,
assume the induction hypothesis, and let $\mathcal{A} = (S_1, \ldots, S_n)$ be a given
collection of subsets of $S$ satisfying Condition (H).
First Case: There is some critical block $B_{k,k}$ with $1 \le k < n$. Delete
the elements in the members of $B_{k,k}$ from the remaining subsets, to obtain
a new family $\mathcal{A}' = B_{k,k} \cup B'_{n-k,v}$, where $B_{k,k}$ and $B'_{n-k,v}$ have no common
elements in their members. By Lemma 12.3.3, Condition (H) holds in $\mathcal{A}'$,
and hence holds separately in $B_{k,k}$ and in $B'_{n-k,v}$ viewed as families of sets.
By the induction hypothesis, $B_{k,k}$ and $B'_{n-k,v}$ have (disjoint) SDRs whose
union is an SDR for $\mathcal{A}$.
Remaining Case: There is no critical block for $\mathcal{A}$ except possibly the
entire system. Select any $S_j$ of $\mathcal{A}$ and then select any element of $S_j$ as its
representative. Delete this element from all remaining sets to obtain a family
$\mathcal{A}'$. Hence a block $B_{r,s}$ with $r < n$ becomes a block $B'_{r,s'}$ with $s' \in \{s, s - 1\}$.
By hypothesis $B_{r,s}$ was not critical, so $s \ge r + 1$ and $s' \ge r$. So Condition
(H) holds for the family $\mathcal{A}' \setminus \{S_j\}$, which by induction has an SDR. Add to
this SDR the element selected as a representative for $S_j$ to obtain an SDR
for $\mathcal{A}$.
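For small families the theorem can be checked by brute force. The sketch below is not from the text: it tests Condition (H) by enumerating all subfamilies and searches for an SDR by simple backtracking (rather than following the inductive proof above). The function names and the two sample families are assumptions made for illustration.

```python
from itertools import combinations

def satisfies_hall(sets):
    """Condition (H): every k of the sets have at least k elements in all."""
    n = len(sets)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            if len(set().union(*(sets[i] for i in idx))) < k:
                return False
    return True

def find_sdr(sets):
    """Return a list of distinct representatives, or None if none exists."""
    def extend(i, used):
        if i == len(sets):
            return []
        for a in sorted(sets[i]):
            if a not in used:
                rest = extend(i + 1, used | {a})
                if rest is not None:
                    return [a] + rest
        return None
    return extend(0, frozenset())

good = [{1, 2}, {2, 3}, {1, 3}, {3, 4}]
bad = [{1, 2}, {1, 2}, {1, 2}, {3, 4}]   # three sets share only two elements

for family in (good, bad):
    print(satisfies_hall(family), find_sdr(family))
```

The exhaustive test of Condition (H) takes time exponential in the number of sets, so this is only suitable for small examples.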
We now interpret the SDR problem as one on matchings in bipartite
graphs. Let $G = (X, Y, E)$ be a bipartite graph. For each $S \subseteq X$, let $N(S)$
denote the set of elements of $Y$ connected to at least one element of $S$ by
an edge, and put $\delta(S) = |S| - |N(S)|$. Put $\delta(G) = \max\{\delta(S) : S \subseteq X\}$.
Since $\delta(\emptyset) = 0$, clearly $\delta(G) \ge 0$. Then Hall's theorem states that $G$ has an
$X$-saturating matching if and only if $\delta(G) = 0$.
Theorem 12.3.4. $G$ has a matching of size $t$ (or larger) if and only if
$t \le |X| - \delta(S)$ for all $S \subseteq X$.
Proof. First note that Hall's theorem says that $G$ has a matching of size
$t = |X|$ if and only if $\delta(S) \le 0$ for all $S \subseteq X$ iff $|X| \le |X| - \delta(S)$ for
all $S \subseteq X$. So our theorem is true in case $t = |X|$. Now suppose that
$t < |X|$. Form a new graph $G' = (X, Y \cup Z, E')$ by adding new vertices
$Z = \{z_1, \ldots, z_{|X|-t}\}$ to $Y$, and join each $z_i$ to each element of $X$ by an edge
of $G'$.
If $G$ has a matching of size $t$, then $G'$ has a matching of size $|X|$, implying
that for all $S \subseteq X$,
\[ |S| \le |N'(S)| = |N(S)| + |X| - t, \]
implying
\[ |N(S)| \ge |S| - |X| + t = t - (|X| - |S|) = t - |X \setminus S|. \]
This is also equivalent to $t \le |X| - (|S| - |N(S)|) = |X| - \delta(S)$.
Conversely, suppose $|N(S)| \ge t - |X \setminus S| = t - (|X| - |S|)$. Then $|N'(S)| =
|N(S)| + |X| - t \ge (t - |X| + |S|) + |X| - t = |S|$. By Hall's theorem, $G'$ has
an $X$-saturating matching $M$. At most $|X| - t$ edges of $M$ join $X$ to $Z$, so
at least $t$ edges of $M$ are from $X$ to $Y$.
Note that $t \le |X| - \delta(S)$ for all $S \subseteq X$ iff $t \le \min_{S \subseteq X}(|X| - \delta(S)) =
|X| - \max_{S \subseteq X} \delta(S) = |X| - \delta(G)$.
Corollary 12.3.5. The largest matching of $G$ has size $|X| - \delta(G) = m(G)$,
i.e., $m(G) + \delta(G) = |X|$.
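The corollary can be illustrated on a small bipartite graph. In the sketch below, which is not from the text, the graph is given by a dictionary mapping each vertex of X to its set of neighbors in Y; the quantity max(|S| - |N(S)|) is computed by brute force under the name `deficiency`, and the maximum matching is found by the standard augmenting-path method. The graph and all names are chosen here for illustration.

```python
from itertools import combinations

def deficiency(adj):
    """Maximum of |S| - |N(S)| over all subsets S of X (brute force)."""
    X = list(adj)
    best = 0                      # the empty set contributes 0
    for k in range(1, len(X) + 1):
        for S in combinations(X, k):
            neighbors = set().union(*(adj[x] for x in S))
            best = max(best, k - len(neighbors))
    return best

def max_matching(adj):
    """Size of a maximum matching, found by augmenting paths."""
    match = {}                    # maps a matched y to its partner x

    def augment(x, seen):
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                # Use y if it is free, or if its partner can be re-matched.
                if y not in match or augment(match[y], seen):
                    match[y] = x
                    return True
        return False

    return sum(augment(x, set()) for x in adj)

adj = {
    "x1": {"y1"},
    "x2": {"y1"},
    "x3": {"y1", "y2"},
    "x4": {"y2", "y3"},
}

m = max_matching(adj)
d = deficiency(adj)
print(m, d, len(adj))
```

Here x1 and x2 compete for the single neighbor y1, so no matching can saturate X.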
12.4 A Theorem of Marshall Hall, Jr.

Many of the ideas of "finite" combinatorics have generalizations to situations
in which some of the sets involved are infinite. We just touch on this subject.
Given a family $\mathcal{A}$ of sets, if the number of sets in the family is infinite,
there are several ways the theorem of P. Hall can be generalized. One of the
first (and to our mind one of the most useful) was given by Marshall Hall, Jr.
(no relative of P. Hall), and is as follows.
Theorem 12.4.1. Suppose that for each $i$ in some index set $I$ there is a
finite subset $A_i$ of a set $S$. The system $\mathcal{A} = (A_i)_{i \in I}$ has an SDR if and only
if the following Condition (H') holds: For each finite subset $I'$ of $I$ the system
$\mathcal{A}' = (A_i)_{i \in I'}$ satisfies Condition (H).
Proof. We establish a partial order on deletions, writing $D_1 \le D_2$ for
deletions $D_1$ and $D_2$ iff each element deleted by $D_1$ is also deleted by $D_2$. Of
course, we are interested only in deletions which preserve Condition (H).
If all deletions in an ascending chain $D_1 \le D_2 \le \cdots \le D_i \le \cdots$ preserve
Condition (H), let $D$ be the deletion which consists of deleting an element
$b$ from a set $A$ iff there is some $i$ for which $b$ is deleted from $A$ by $D_i$. We
assert that deletion $D$ also preserves Condition (H).
In any block $B_{r,s}$ of $\mathcal{A}$, ($r, s < \infty$), at most a finite number of deletions in
the chain can affect $B_{r,s}$. If no deletion of the chain affects $B_{r,s}$, then of course
$D$ does not affect $B_{r,s}$, and Condition (H) still holds for $B_{r,s}$. Otherwise, let
$D_n$ be the last deletion that affects $B_{r,s}$. So under $D_n$ (and hence also under
$D$) $(B_{r,s})' = B'_{r,s'}$ still satisfies Condition (H) by hypothesis, i.e., $s' \ge r$. But
$B_{r,s}$ is arbitrary, so $D$ preserves Condition (H) on $\mathcal{A}$. By Zorn's Lemma,
there will be a maximal deletion $\bar{D}$ preserving Condition (H). We show that
under such a maximal deletion $\bar{D}$ preserving Condition (H), each deleted set
$S'_i$ has only a single element. Clearly these elements would form an SDR for
the original $\mathcal{A}$.
Suppose there is an $a_1$ not belonging to a critical block. Delete $a_1$ from
every set $A_i$ containing $a_1$. Under this deletion a block $B_{r,s}$ is replaced by a
block $B'_{r,s'}$ with $s' \ge s - 1 \ge r$, so Condition (H) is preserved. Hence after
a maximal deletion each element left is in some critical block. And if $B_{k,k}$ is
a critical block, we may delete elements of $B_{k,k}$ from all sets not in $B_{k,k}$ and
still preserve Condition (H) by Lemma 12.3.3 (since it needs to apply only
to finitely many sets at a time). By Theorem 12.3.1 each critical block $B_{k,k}$
(being finite) possesses an SDR when Condition (H) holds. Hence we may
perform an additional deletion leaving $B_{k,k}$ as a collection of singleton sets
and with Condition (H) still holding for the entire remaining sets. It is now
clear that after a maximal deletion $\bar{D}$ preserving Condition (H), each element
is in a critical block, and each critical block consists of singleton sets. Hence
after a maximal deletion $\bar{D}$ preserving Condition (H), each set consists of a
single element, and these elements form an SDR for $\mathcal{A}$.
The following theorem, sometimes called the Cantor-Schroeder-Bernstein
Theorem, will be used with the theorem of M. Hall, Jr. to show that any
two bases of a vector space $V$ over a field $F$ must have the same cardinality.
Theorem 12.4.2. Let $X$, $Y$ be sets, and let $\varphi : X \to Y$ and $\psi : Y \to X$ be
injective mappings. Then there exists a bijection $\theta : X \to Y$.
Proof. The elements of $X$ will be referred to as males, those of $Y$ as females.
For $x \in X$, if $\varphi(x) = y$, we say $y$ is the daughter of $x$ and $x$ is the father of
$y$. Analogously, if $\psi(y) = x$, we say $x$ is the son of $y$ and $y$ is the mother of
$x$. A male with no mother is said to be an "adam." A female with no father
is said to be an "eve." Ancestors and descendants are defined in the natural
way, except that each $x$ or $y$ is both an ancestor of itself and a descendant of
itself. If $z \in X \cup Y$ has an ancestor that is an adam (resp., eve) we say that
$z$ has an adam (resp., eve). Partition $X$ and $Y$ into the following disjoint
sets:
\[ X_1 = \{x \in X : x \text{ has no eve}\}; \qquad X_2 = \{x \in X : x \text{ has an eve}\}; \]
\[ Y_1 = \{y \in Y : y \text{ has no eve}\}; \qquad Y_2 = \{y \in Y : y \text{ has an eve}\}. \]
Now a little thought shows that $\varphi : X_1 \to Y_1$ is a bijection, and
$\psi^{-1} : X_2 \to Y_2$ is a bijection. So
\[ \theta = \varphi|_{X_1} \cup \psi^{-1}|_{X_2} \]
is a bijection from $X$ to $Y$.
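The construction in this proof is effective whenever ancestry can be traced in finitely many steps. The sketch below is not from the text: it takes both X and Y to be the nonnegative integers and uses the two injections x -> x + 1 and y -> y + 1 (named phi and psi here), each of which misses exactly the element 0. All function names are assumptions made for illustration.

```python
def phi(x):
    """Injection X -> Y (sample choice): the daughter of x."""
    return x + 1

def psi(y):
    """Injection Y -> X (sample choice): the son of y."""
    return y + 1

def phi_inv(y):
    """Father of y, or None if y has no father (y is an eve)."""
    return y - 1 if y >= 1 else None

def psi_inv(x):
    """Mother of x, or None if x has no mother (x is an adam)."""
    return x - 1 if x >= 1 else None

def has_eve(x):
    """Trace the ancestors of the male x back to an adam or an eve.

    For these sample maps every line of ancestry is finite, so the loop
    always terminates; for general injections it need not.
    """
    while True:
        y = psi_inv(x)
        if y is None:
            return False          # the line ends at an adam
        x = phi_inv(y)
        if x is None:
            return True           # the line ends at an eve

def theta(x):
    """The bijection of the proof: psi^{-1} on X_2, phi on X_1."""
    return psi_inv(x) if has_eve(x) else phi(x)

values = [theta(x) for x in range(10)]
print(values)
```

For these maps the resulting bijection simply swaps each even number with its successor, although neither phi nor psi is itself surjective.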
Corollary 12.4.3. If $V$ is a vector space over the field $F$ and if $B_1$ and $B_2$
are two bases for $V$, then $|B_1| = |B_2|$.
Proof. Let $B_1 = \{x_i : i \in I\}$ and $B_2 = \{y_j : j \in J\}$. For each $i \in I$, let
$\Gamma_i = \{j \in J : y_j$ occurs with nonzero coefficient in the unique linear
expression for $x_i$ in terms of the $y_j$'s$\}$. Then the union of any $k$ ($\ge 1$) $\Gamma_i$'s, say
$\Gamma_{i_1}, \ldots, \Gamma_{i_k}$, each of which of course is finite, must contain at least $k$ distinct
elements. For otherwise $x_{i_1}, \ldots, x_{i_k}$ would belong to a space of dimension
less than $k$, and hence be linearly dependent. Thus the family $(\Gamma_i : i \in I)$ of
sets must have an SDR. This means there is a function $\varphi : I \to J$ which is
an injection. Similarly, there is an injection $\psi : J \to I$. So by the preceding
theorem there is a bijection $J \leftrightarrow I$, i.e., $|B_1| = |B_2|$.
12.5 Exercises

Exercise 12.5.0.1. Let $\mathcal{A} = (A_1, \ldots, A_n)$ be a family of subsets of $\{1, \ldots, n\}$.
Suppose that the incidence matrix of the family is invertible. Show that the
family has an SDR.
Exercise 12.5.0.2. Prove the following generalization of Hall's Theorem:
Let $\mathcal{A} = (A_1, \ldots, A_n)$ be a family of subsets of $X$ that satisfies the
following property: There is an integer $r$ with $0 \le r < n$ for which the union of
each subfamily of $k$ subsets of $\mathcal{A}$, for all $k$ with $0 \le k \le n$, has at least $k - r$
elements. Then there is a subfamily of size $n - r$ which has an SDR. (Hint:
Start by adding $r$ "dummy" elements that belong to all the sets.)
Exercise 12.5.0.3. Let $G$ be a (finite, undirected, simple) graph with vertex
set $V$. Let $C = \{C_x : x \in V\}$ be a family of sets indexed by the vertices of
$G$. For $X \subseteq V$, let $C_X = \cup_{x \in X} C_x$. A set $X \subseteq V$ is $C$-colorable if one can
assign to each vertex $x \in X$ a "color" $c_x \in C_x$ so that $c_x \neq c_y$ whenever $x$
and $y$ are adjacent in $G$. Prove that if $|C_X| \ge |X|$ whenever $X$ induces a
connected subgraph of $G$, then $V$ is $C$-colorable. (In the current literature of
graph theory, the sets assigned to the vertices are called lists, and the desired
proper coloring of $G$ chosen from the lists is a list coloring of $G$. When $G$ is
a complete graph, this exercise gives precisely Hall's Theorem on SDRs. A
current research topic in graph theory is the investigation of modifications of
this condition that suffice for the existence of list colorings.)
Exercise 12.5.0.4. With the same notation of the previous exercise, prove
that if every proper subset of $V$ is $C$-colorable and $|C_V| \ge |V|$, then $V$ is
$C$-colorable.
Index
$A = [T]_{\mathcal{B}_2,\mathcal{B}_1}$, 38
QR-decomposition, 186
T-annihilator of v, 122
T-conductor of v into W, 122, 127
$[v]_{\mathcal{B}}$, 36
$\infty$-norm, 145
$\mathcal{L}(U, V)$, 32
n-linear, 61
1-norm, 145
2-norm, 145
adjoint of linear operator, 153
adjoint, classical, 76
adjugate, 79
algebraic multiplicity of an eigenvalue,
205
algebraically closed, 57
alternating (n-linear), 64
basis of a vector space, 24
basis of quotient space, 210
bijective, 45
binomial theorem, 58
block multiplication, 13
Cayley-Hamilton Theorem, 79, 83
chain, 259
characteristic polynomial, 77
classical adjoint, 76
codomain, 33
companion matrix, 81
complex number, 8
convergence of a sequence of matrices, 246
convergence of a sequence of vectors,
145
convergence of power series in a matrix, 245
coordinate matrix, 36
determinant, 65
diagonalizable, 118
dimension, 25
direct sum of subspaces, 18
dual basis, 43
dual space, 43, 152
eigenvalue, 113
eigenvector, 114
elementary divisors of a matrix, 213
elementary Jordan block, 213
Euclidean algorithm, 60
exponential of a matrix, 246, 250
finite dimensional, 24
Frobenius norm of a matrix, 258
Fundamental Theorem of Algebra, 57
generalized eigenvector, 204
geometric multiplicity of an eigenvalue,
183
Gram-Schmidt algorithm, 146
Hermitian matrix, 131, 159
Hermitian operator, 159
hyperplane, 191
ideal in F[x], 56
idempotent, 36
independent list of subspaces, 18
injective, 33
inner product, 137
inner product space, 138
isometry, 171
isomorphic, 45, 52
Jordan form, 212
kernel, 33
Kronecker delta, 8
Lagrange Interpolation, 52
Lagrange interpolation generalized, 242
Laplace expansion, 67, 75
least-squares solution, 188
linear algebra, 47
linear functional, 43, 152
linear operator, 38
linear transformation, 31
linear triangular form, 252
linearly dependent, 22
linearly independent, 22
list, 21
lnull(A), 34
maximal element, 260
minimal polynomial, 56
monic polynomial, 49
Moore-Penrose generalized inverse, 184
nilpotent, 204
norm, 145
norm (on a vector space), 141
normal operator, 163
normal matrix, 163
normalized QR-decomposition, 186
null space, 33
nullity, 34
orthogonal, 139
orthogonal matrix, 148
orthonormal, 139
partial order, 259
partially ordered set, 259
polarization identity, complex, 157
polarization identity, real, 156
polynomial, 49
positive (semidefinite) operator, 169
positive definite, 198
prime polynomial, 59
principal ideal, 56
principal submatrix, 78
projection, 35
projection matrix, 186
pseudoinverse, 184
quotient space, 123
range, 33
Rayleigh Principle, 196
reduced singular value decomposition,
187
rnull(A), 34
scalar matrix functions, 240
Schur's Theorem, 147
self-adjoint operator, 159
singular value, 178
singular value decomposition, 178
skew-symmetric matrix, 163
span, 22
sum of subspaces, 18
surjective, 34
system of distinct representatives (SDR),
261
Taylor's Formula, 58
tensor product, 91
trace of a matrix, 78
transformation norm, 233
unitary matrix, 148
upper bound, 260
Vec, 109
vector space, 15
vector subspace, 17
Zorn's Lemma, 260