
Jin Ho Kwak

Sungpyo Hong

Linear Algebra
Second Edition

Springer Science+Business Media, LLC


Jin Ho Kwak                              Sungpyo Hong
Department of Mathematics                Department of Mathematics
Pohang University of Science             Pohang University of Science
  and Technology                           and Technology
Pohang, Kyungbuk 790-784                 Pohang, Kyungbuk 790-784
South Korea                              South Korea

Library of Congress Cataloging-in-Publication Data


Kwak, Jin Ho, 1948-
    Linear algebra / Jin Ho Kwak, Sungpyo Hong.-2nd ed.
        p. cm.
    Includes bibliographical references and index.
    ISBN 978-0-8176-4294-5    ISBN 978-0-8176-8194-4 (eBook)
    DOI 10.1007/978-0-8176-8194-4
    1. Algebras, Linear. I. Hong, Sungpyo, 1948-  II. Title.

QA184.2.K93 2004
512'.5-dc22                                        2004043751
                                                          CIP

AMS Subject Classifications: 15-01

ISBN 978-0-8176-4294-5 Printed on acid-free paper.

&#169;2004 Springer Science+Business Media New York


Originally published by Birkhäuser Boston in 2004

All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher, Springer Science+Business Media, LLC, except for brief excerpts in
connection with reviews or scholarly analysis. Use in connection with any form of information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.

987654321 SPIN 10979327

www.birkhauser-science.com
Preface to the Second Edition

This second edition is based on many valuable comments and suggestions from
readers of the first edition. In this edition, the last two chapters are interchanged
and also several new sections have been added. The following diagram illustrates the
dependencies of the chapters.

[Diagram: chapter dependency chart. Chapter 1 Linear Equations and Matrices; Chapter 2 Determinants; Chapter 4 Linear Transformations; Chapter 6 Diagonalization; Chapter 5 Inner Product Spaces; Chapter 7 Complex Vector Spaces; Chapter 8 Jordan Canonical Forms.]

The major changes from the first edition are the following.
(1) In Chapter 2, Section 2.5.1 "Miscellaneous examples for determinants" is added as an application.
(2) In Chapter 4, "A homogeneous coordinate system" is introduced for an application in computer graphics.
(3) In Chapter 5, Section 5.7 "Relations of fundamental subspaces" and Section 5.8 "Orthogonal matrices and isometries" are interchanged. "Least squares solutions," "Polynomial approximations" and "Orthogonal projection matrices" are collected together in Section 5.9, Applications.
(4) Chapter 6 is entitled "Diagonalization" instead of "Eigenvectors and Eigenvalues." In Chapters 6 and 8, "Recurrence relations," "Linear difference equations" and "Linear differential equations" are described in more detail as applications of diagonalizations and the Jordan canonical forms of matrices.
(5) In Chapter 8, Section 8.5 "The minimal polynomial of a matrix" has been added to introduce more easily accessible computational methods for A^n and e^A, with complete solutions of linear difference equations and linear differential equations.
(6) Chapter 8 "Jordan Canonical Forms" and Chapter 9 "Quadratic Forms" are interchanged for a smooth continuation of the diagonalization problem of matrices. Chapter 9 "Quadratic Forms" is extended to a complex case and includes many new figures.
(7) The errors and typos found to date in the first edition have been corrected.
(8) Problems are refined to supplement the worked-out illustrative examples and to enable the reader to check his or her understanding of new definitions or theorems. Additional problems are added in the last exercise section of each chapter. More answers, sometimes with brief hints, are added, including some corrections.
(9) In most examples, we begin with a brief explanatory phrase to enhance the reader's understanding.
This textbook can be used for a one- or two-semester course in linear algebra. A theory-oriented one-semester course may cover Chapter 1, Sections 1.1-1.4, 1.6-1.7; Chapter 2, Sections 2.1-2.3; Chapter 3, Sections 3.1-3.6; Chapter 4, Sections 4.1-4.6; Chapter 5, Sections 5.1-5.4; Chapter 6, Sections 6.1-6.2; Chapter 7, Sections 7.1-7.4, with possible additions from Sections 1.8, 2.4 or 9.1-9.4. Selected applications are included in each chapter as appropriate. For a beginning applied algebra course, an instructor might include some of them in the syllabus at his or her discretion, depending on which area is to be emphasized or considered more interesting to the students.
In definitions, we use boldface for the word being defined, and sometimes an italic or shadow box to emphasize a sentence or undefined or post-defined terminology.

Acknowledgement: The authors would like to express their sincere appreciation for the many opinions and suggestions from the readers of the first edition, including many of our colleagues at POSTECH. The authors are also indebted to Ki Hang Kim and Fred Roush at Alabama State University and Christoph Dalitz at Hochschule Niederrhein for improving the manuscript and selecting the newly added subjects in this edition. Our thanks again go to Mrs. Kathleen Roush for grammatical corrections in the final manuscript, and also to the editing staff of Birkhäuser for gladly accepting the second edition for publication.

Jin Ho Kwak
Sungpyo Hong
E-mail: jinkwak@postech.ac.kr
sungpyo@postech.ac.kr
January 2004, Pohang, South Korea
Preface to the First Edition

Linear algebra is one of the most important subjects in the study of science and engineering because of its widespread applications in social or natural science, computer science, physics, or economics. As one of the most useful courses in undergraduate mathematics, it has provided essential tools for industrial scientists. The basic concepts of linear algebra are vector spaces, linear transformations, matrices and determinants, and they serve as an abstract language for stating ideas and solving problems.
This book is based on lectures delivered over several years in a sophomore-level linear algebra course designed for science and engineering students. The primary purpose of this book is to give a careful presentation of the basic concepts of linear algebra as a coherent part of mathematics, and to illustrate its power and utility through applications to other disciplines. We have tried to emphasize computational skills along with mathematical abstractions, which have an integrity and beauty of their own. The book includes a variety of interesting applications with many examples, not only to help students understand new concepts but also to practice wide applications of the subject to such areas as differential equations, statistics, geometry, and physics. Some of those applications may not be central to the mathematical development and may be omitted or selected in a syllabus at the discretion of the instructor. Most basic concepts and introductory motivations begin with examples in Euclidean space or solving a system of linear equations, and are gradually examined from different points of view to derive general principles.
For students who have finished a year of calculus, linear algebra may be the first course in which the subject is developed in an abstract way, and we often find that many students struggle with the abstractions and miss the applications. Our experience is that, to understand the material, students should practice with many problems, which are sometimes omitted. To encourage repeated practice, we placed in the middle of the text not only many examples but also some carefully selected problems, with answers or helpful hints. We have tried to make this book as easily accessible and clear as possible, but certainly there may be some awkward expressions in several ways. Any criticism or comment from the readers will be appreciated.

We are very grateful to many colleagues in Korea, especially to the faculty members in the mathematics department at Pohang University of Science and Technology (POSTECH), who helped us over the years with various aspects of this book. For their valuable suggestions and comments, we would like to thank the students at POSTECH, who have used photocopied versions of the text over the past several years. We would also like to acknowledge the invaluable assistance we have received from the teaching assistants who have checked and added some answers or hints for the problems and exercises in this book. Our thanks also go to Mrs. Kathleen Roush who made this book much more readable with grammatical corrections in the final manuscript. Our thanks finally go to the editing staff of Birkhäuser for gladly accepting our book for publication.

Jin Ho Kwak
Sungpyo Hong
April 1997, Pohang, South Korea
Contents

Preface to the Second Edition

Preface to the First Edition

1 Linear Equations and Matrices
   1.1 Systems of linear equations
   1.2 Gaussian elimination
   1.3 Sums and scalar multiplications of matrices
   1.4 Products of matrices
   1.5 Block matrices
   1.6 Inverse matrices
   1.7 Elementary matrices and finding A^{-1}
   1.8 LDU factorization
   1.9 Applications
       1.9.1 Cryptography
       1.9.2 Electrical network
       1.9.3 Leontief model
   1.10 Exercises

2 Determinants
   2.1 Basic properties of the determinant
   2.2 Existence and uniqueness of the determinant
   2.3 Cofactor expansion
   2.4 Cramer's rule
   2.5 Applications
       2.5.1 Miscellaneous examples for determinants
       2.5.2 Area and volume
   2.6 Exercises

3 Vector Spaces
   3.1 The n-space R^n and vector spaces
   3.2 Subspaces
   3.3 Bases
   3.4 Dimensions
   3.5 Row and column spaces
   3.6 Rank and nullity
   3.7 Bases for subspaces
   3.8 Invertibility
   3.9 Applications
       3.9.1 Interpolation
       3.9.2 The Wronskian
   3.10 Exercises

4 Linear Transformations
   4.1 Basic properties of linear transformations
   4.2 Invertible linear transformations
   4.3 Matrices of linear transformations
   4.4 Vector spaces of linear transformations
   4.5 Change of bases
   4.6 Similarity
   4.7 Applications
       4.7.1 Dual spaces and adjoint
       4.7.2 Computer graphics
   4.8 Exercises

5 Inner Product Spaces
   5.1 Dot products and inner products
   5.2 The lengths and angles of vectors
   5.3 Matrix representations of inner products
   5.4 Gram-Schmidt orthogonalization
   5.5 Projections
   5.6 Orthogonal projections
   5.7 Relations of fundamental subspaces
   5.8 Orthogonal matrices and isometries
   5.9 Applications
       5.9.1 Least squares solutions
       5.9.2 Polynomial approximations
       5.9.3 Orthogonal projection matrices
   5.10 Exercises

6 Diagonalization
   6.1 Eigenvalues and eigenvectors
   6.2 Diagonalization of matrices
   6.3 Applications
       6.3.1 Linear recurrence relations
       6.3.2 Linear difference equations
       6.3.3 Linear differential equations I
   6.4 Exponential matrices
   6.5 Applications continued
       6.5.1 Linear differential equations II
   6.6 Diagonalization of linear transformations
   6.7 Exercises

7 Complex Vector Spaces
   7.1 The n-space C^n and complex vector spaces
   7.2 Hermitian and unitary matrices
   7.3 Unitarily diagonalizable matrices
   7.4 Normal matrices
   7.5 Application
       7.5.1 The spectral theorem
   7.6 Exercises

8 Jordan Canonical Forms
   8.1 Basic properties of Jordan canonical forms
   8.2 Generalized eigenvectors
   8.3 The power A^k and the exponential e^A
   8.4 Cayley-Hamilton theorem
   8.5 The minimal polynomial of a matrix
   8.6 Applications
       8.6.1 The power matrix A^k again
       8.6.2 The exponential matrix e^A again
       8.6.3 Linear difference equations again
       8.6.4 Linear differential equations again
   8.7 Exercises

9 Quadratic Forms
   9.1 Basic properties of quadratic forms
   9.2 Diagonalization of quadratic forms
   9.3 A classification of level surfaces
   9.4 Characterizations of definite forms
   9.5 Congruence relation
   9.6 Bilinear and Hermitian forms
   9.7 Diagonalization of bilinear or Hermitian forms
   9.8 Applications
       9.8.1 Extrema of real-valued functions on R^n
       9.8.2 Constrained quadratic optimization
   9.9 Exercises

Selected Answers and Hints

Bibliography

Index
1 Linear Equations and Matrices

1.1 Systems of linear equations


One of the central motivations for linear algebra is solving a system of linear equations. We begin with the problem of finding the solutions of a system of m linear equations in n unknowns of the following form:

    a11x1 + a12x2 + ... + a1nxn = b1
    a21x1 + a22x2 + ... + a2nxn = b2
        :
    am1x1 + am2x2 + ... + amnxn = bm,

where x1, x2, ..., xn are the unknowns and the aij's and bi's denote constant (real or complex) numbers.
A sequence of numbers (s1, s2, ..., sn) is called a solution of the system if x1 = s1, x2 = s2, ..., xn = sn satisfy each equation in the system simultaneously. When b1 = b2 = ... = bm = 0, we say that the system is homogeneous.
The central topic of this chapter is to examine whether or not a given system has a solution, and to find the solution if it has one. For instance, every homogeneous system always has at least one solution x1 = x2 = ... = xn = 0, called the trivial solution. Naturally, one may ask whether such a homogeneous system has a nontrivial solution or not. If so, we would like to have a systematic method of finding all the solutions. A system of linear equations is said to be consistent if it has at least one solution, and inconsistent if it has no solution.
For example, suppose that the system has only one linear equation

    a1x1 + a2x2 + ... + anxn = b.

If ai = 0 for i = 1, ..., n, then the equation becomes 0 = b. Thus it has no solution if b ≠ 0 (nonhomogeneous), or has infinitely many solutions (any n numbers xi's can be a solution) if b = 0 (homogeneous).
In any case, if all the coefficients of an equation in a system are zero, the equation is vacuously trivial. In this book, when we speak of a system of linear equations, we always assume that not all the coefficients in each equation of the system are zero unless otherwise specified.

Example 1.1 The system of one equation in two unknowns x and y is

    ax + by = c,

in which at least one of a and b is nonzero. Geometrically this equation represents a straight line in the xy-plane. Therefore, a point P = (x, y) (actually, the coordinates x and y) is a solution if and only if the point P lies on the line. Thus there are infinitely many solutions, which are all the points on the line.

Example 1.2 The system of two equations in two unknowns x and y is

    a1x + b1y = c1
    a2x + b2y = c2.

Solution: (I) Geometric method. Since the equations represent two straight lines in the xy-plane, only three types are possible as shown in Figure 1.1.

[Figure 1.1. Three types of solution sets: Case (1) x - y = -1, 2x - 2y = -2 (coinciding lines); Case (2) x - y = -1, x - y = 0 (parallel lines); Case (3) x + y = 1, x - y = 0 (lines crossing at a point).]

Since a solution is a point lying on both lines simultaneously, by looking at the


graphs in Figure 1.1, one can see that only the following three types of solution sets
are possible:
(1) the straight line itself if they coincide,
(2) the empty set if the lines are parallel and distinct,
(3) only one point if they cross at a point.

(II) Algebraic method. Case (1) (two lines coincide): Let the two equations represent the same straight line, that is, one equation is a nonzero constant multiple of the other. This condition is equivalent to

    a2 = λa1,  b2 = λb1,  c2 = λc1  for some λ ≠ 0.

In this case, if a point (s, t) satisfies one equation, then it automatically satisfies the other too. Thus, there are infinitely many solutions, which are all the points on the line.
Case (2) (two lines are parallel but distinct): In this case, a2 = λa1, b2 = λb1, but c2 ≠ λc1 for λ ≠ 0. (Note that the first two equalities are equivalent to a1b2 - a2b1 = 0.) Then no point (s, t) can satisfy both equations simultaneously, so that there are no solutions.
Case (3) (two lines cross at a point): Let the two lines have distinct slopes, which means a1b2 - a2b1 ≠ 0. In this case, they cross at a point (the only solution), which can be found by the elementary method of elimination and substitution. The following computation shows how to do this.
Without loss of generality, one may assume a1 ≠ 0 by interchanging the two equations if necessary. (If both a1 and a2 are zero, the system reduces to a system of one variable.)

(1) Elimination: The variable x can be eliminated from the second equation by adding -a2/a1 times the first equation to the second, to get

    a1x + b1y = c1
    ((a1b2 - a2b1)/a1) y = (a1c2 - a2c1)/a1.

(2) Since a1b2 - a2b1 ≠ 0, y can be found by multiplying the second equation by the nonzero number a1/(a1b2 - a2b1), to get

    y = (a1c2 - a2c1)/(a1b2 - a2b1).

(3) Substitution: Now, x is solved by substituting the value of y into the first equation, and we obtain the solution to the problem:

    x = (b2c1 - b1c2)/(a1b2 - a2b1),    y = (a1c2 - a2c1)/(a1b2 - a2b1).

Note that the condition a1b2 - a2b1 ≠ 0 is necessary for the system to have only one solution. □
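These closed-form expressions are easy to try out; here is a minimal Python sketch of the computation (the helper name solve_2x2 is made up just for this illustration):

    # Solve a1*x + b1*y = c1, a2*x + b2*y = c2 when a1*b2 - a2*b1 != 0,
    # using the formulas obtained above by elimination and substitution.
    def solve_2x2(a1, b1, c1, a2, b2, c2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            raise ValueError("a1*b2 - a2*b1 = 0: no unique solution")
        x = (b2 * c1 - b1 * c2) / det
        y = (a1 * c2 - a2 * c1) / det
        return x, y

    # Case (3) of Figure 1.1: x + y = 1, x - y = 0 cross at the single point (1/2, 1/2).
    print(solve_2x2(1, 1, 1, 1, -1, 0))   # (0.5, 0.5)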

In Example 1.2, the original system of equations has been transformed into a simpler one through certain operations, called elimination and substitution, which is just the solution of the given system. That is, if (x, y) satisfies the original system of equations, then it also satisfies the simpler system in (3), and vice-versa. As in
Example 1.2, we will see later that any system of linear equations may have either no
solution, exactly one solution, or infinitely many solutions. (See Theorem 1.6.)
Note that an equation ax + by + cz = d, (a, b, c) ≠ (0, 0, 0), in three unknowns represents a plane in the 3-space R^3. The solution set includes

    {(x, y, 0) | ax + by = d} in the xy-plane,
    {(x, 0, z) | ax + cz = d} in the xz-plane,
    {(0, y, z) | by + cz = d} in the yz-plane.

One can also examine the various possible types of the solution set of a system of three equations in three unknowns. Figure 1.2 illustrates three possible cases.

[Figure 1.2. Three planes in R^3: infinitely many solutions, only one solution, no solutions.]

Problem 1.1 For a system of three linear equations in three unknowns

    a11x + a12y + a13z = b1
    a21x + a22y + a23z = b2
    a31x + a32y + a33z = b3,

describe all the possible types of the solution set in the 3-space R^3.

1.2 Gaussian elimination


A basic idea for solving a system of linear equations is to transform the given system into a simpler one, keeping the solution set unchanged, and Example 1.2 shows an idea of how to do it. In fact, the basic operations used in Example 1.2 are essentially only the following three operations, called elementary operations:

(1) multiply a nonzero constant throughout an equation,


(2) interchange two equations,
(3) add a constant multiple of an equation to another equation.
It is not hard to see that none of these operations alters the solutions. That is, if the xi's satisfy the original equations, then they also satisfy those equations altered by the three operations, and vice-versa.
Moreover, each of the three elementary operations has its inverse operation, which is also an elementary operation:

(1') multiply the equation by the reciprocal of the same nonzero constant,
(2') interchange the two equations again,
(3') add the negative of the same constant multiple of the equation to the other.
Therefore, by applying a finite sequence of the elementary operations to the given
original system, one obtains another new system, and by applying these inverse oper-
ations in reverse order to the new system, one can recover the original system. Since
none of the three elementary operations alters the solutions, the two systems have
the same set of solutions. In fact, a system may be solved by transforming it into a
simpler system using the three elementary operations finitely many times.
These arguments can be formalized in mathematical language. Observe that in performing any of these three elementary operations, only the coefficients of the variables are involved in the operations, while the variables x1, x2, ..., xn and the equal sign "=" are simply repeated. Thus, keeping the places of the variables and "=" in mind, we just pick up the coefficients only from the given system of equations and make a rectangular array of numbers as follows:

    [ a11  a12  ...  a1n  b1 ]
    [ a21  a22  ...  a2n  b2 ]
    [  :    :         :    : ]
    [ am1  am2  ...  amn  bm ]

This matrix is called the augmented matrix of the system. The term matrix means just any rectangular array of numbers, and the numbers in this array are called the entries of the matrix. In the following sections, we shall discuss matrices in general. For the moment, we restrict our attention to the augmented matrix of a system.
Within an augmented matrix, the horizontal and vertical subarrays

    [ ai1  ai2  ...  ain  bi ]    and    [ a1j  a2j  ...  amj ]^T

are called the i-th row (matrix), which represents the i-th equation, and the j-th column (matrix), which consists of the coefficients of the j-th variable xj, of the augmented matrix, respectively. The matrix consisting of the first n columns of the augmented matrix,

    [ a11  a12  ...  a1n ]
    [ a21  a22  ...  a2n ]
    [  :    :         :  ]
    [ am1  am2  ...  amn ],

is called the coefficient matrix of the system.


One can easily see that there is a one-to-one correspondence between the columns of the coefficient matrix and the variables of the system. Note also that the last column [b1 b2 ... bm]^T of the augmented matrix determines whether or not the system is homogeneous, and no variable corresponds to it.
Since each row of the augmented matrix contains all the information of the corresponding equation of the system, we may deal with this augmented matrix instead of handling the whole system of linear equations, and the elementary operations may be applied to an augmented matrix just as they are applied to a system of equations. But in this case, the elementary operations are rephrased as the elementary row operations for the augmented matrix:

(1st kind) multiply a nonzero constant throughout a row,
(2nd kind) interchange two rows,
(3rd kind) add a constant multiple of a row to another row.

The inverse row operations, which are also elementary row operations, are

(1st kind) multiply the row by the reciprocal of the same constant,
(2nd kind) interchange the two rows again,
(3rd kind) add the negative of the same constant multiple of the row to the other.

Definition 1.1 Two augmented matrices (or systems of linear equations) are said to
be row-equivalent if one can be transformed to the other by a finite sequence of
elementary row operations .

Note that, if a matrix B can be obtained from a matrix A by these elementary


row operations, then one can obviously recover A from B by applying the inverse
elementary row operations to B in reverse order. Therefore, the two systems have the
same solutions:

Theorem 1.1 If two systems of linear equations are row-equivalent, then they have
the same set ofsolutions.

The general procedure for finding the solutions will be illustrated in the following
example :

Example 1.3 Solve the system of linear equations:

         2y + 4z =  2
    x  + 2y + 2z =  3
    3x + 4y + 6z = -1.

Solution: One could work with the augmented matrix only. However, to compare the operations on the system of linear equations with those on the augmented matrix, we work on the system and the augmented matrix in parallel. Note that the associated augmented matrix for the system is

    [ 0  2  4   2 ]
    [ 1  2  2   3 ]
    [ 3  4  6  -1 ].

(1) Since the coefficient of x in the first equation is zero while that in the second equation is not zero, we interchange these two equations:

    x  + 2y + 2z =  3        [ 1  2  2   3 ]
         2y + 4z =  2        [ 0  2  4   2 ]
    3x + 4y + 6z = -1        [ 3  4  6  -1 ].

(2) Add -3 times the first equation to the third equation:

    x + 2y + 2z =   3        [ 1   2  2    3 ]
        2y + 4z =   2        [ 0   2  4    2 ]
      - 2y      = -10        [ 0  -2  0  -10 ].

Thus, the first variable x is eliminated from the second and the third equations. In this process, the coefficient 1 of the first unknown x in the first equation (row) is called the first pivot.
Consequently, the second and the third equations have only the two unknowns y and z. Leave the first equation (row) alone, and the same elimination procedure can be applied to the second and the third equations (rows): The pivot to eliminate y from the last equation is the coefficient 2 of y in the second equation (row).
(3) Add 1 times the second equation (row) to the third equation (row):

    x + 2y + 2z =  3         [ 1  2  2   3 ]
        2y + 4z =  2         [ 0  2  4   2 ]
             4z = -8         [ 0  0  4  -8 ].

The elimination process (i.e., (1): row interchange, (2): elimination of x from the last two equations (rows), and then (3): elimination of y from the last equation (row)) done so far to obtain this result is called forward elimination. After this forward elimination, the leftmost nonzero entries in the nonzero rows are called the pivots. Thus the pivots of the second and third rows are 2 and 4, respectively.
(4) Normalize nonzero rows by dividing them by their pivots. Then the pivots are replaced by 1:
    x + 2y + 2z =  3         [ 1  2  2   3 ]
         y + 2z =  1         [ 0  1  2   1 ]
              z = -2         [ 0  0  1  -2 ].

The resulting matrix on the right-hand side is called a row-echelon form of the augmented matrix, and the 1's at the pivotal positions are called the leading 1's. The process so far is called Gaussian elimination.
The last equation gives z = -2. Substituting z = -2 into the second equation gives y = 5. Now, putting these two values into the first equation, we get x = -3. This process is called back substitution. The computation is shown below: i.e., eliminating the numbers above the leading 1's,
(5) Add -2 times the third row to the second and the first rows:

    x + 2y      =  7         [ 1  2  0   7 ]
         y      =  5         [ 0  1  0   5 ]
              z = -2         [ 0  0  1  -2 ].

(6) Add -2 times the second row to the first row:

    x           = -3         [ 1  0  0  -3 ]
         y      =  5         [ 0  1  0   5 ]
              z = -2         [ 0  0  1  -2 ].

This resulting matrix is called the reduced row-echelon form of the augmented matrix, which is row-equivalent to the original augmented matrix and gives the solution to the system. The whole process to obtain the reduced row-echelon form is called Gauss-Jordan elimination. □
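The whole procedure can also be sketched in a few lines of Python; the sketch below is only an illustration of the steps just carried out (the function name gauss_jordan and the use of plain Python lists are arbitrary choices, and it assumes a pivot can be found in every column of the coefficient part, as in this example):

    def gauss_jordan(M):
        """Reduce the augmented matrix M (a list of rows) to reduced row-echelon
        form using the three elementary row operations, assuming every column of
        the coefficient part contains a pivot (no free variables)."""
        m = [row[:] for row in M]          # work on a copy
        n = len(m)
        for i in range(n):
            # (2nd kind) interchange rows so that the pivot entry is nonzero
            p = next(r for r in range(i, n) if m[r][i] != 0)
            m[i], m[p] = m[p], m[i]
            # (1st kind) divide the pivot row by its pivot to get a leading 1
            pivot = m[i][i]
            m[i] = [x / pivot for x in m[i]]
            # (3rd kind) eliminate all other entries in the pivot column
            for r in range(n):
                if r != i:
                    factor = m[r][i]
                    m[r] = [x - factor * y for x, y in zip(m[r], m[i])]
        return m

    aug = [[0, 2, 4,  2],
           [1, 2, 2,  3],
           [3, 4, 6, -1]]
    print(gauss_jordan(aug))   # last column gives x = -3, y = 5, z = -2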

In summary, by applying a finite sequence of elementary row operations, the augmented matrix for a system of linear equations can be transformed into its reduced row-echelon form, which is row-equivalent to the original one. Hence the two corresponding systems have the same solutions. From the reduced row-echelon form, one can easily decide whether the system has a solution or not, and find the solution of the given system if it is consistent.

Definition 1.2 A row-echelon form of an augmented matrix is of the following form:

(1) The zero rows, if they exist, come last in the order of rows.
(2) The first nonzero entries in the nonzero rows are 1, called leading 1's.
(3) Below each leading 1 is a column of zeros. Thus, in any two consecutive nonzero rows, the leading 1 in the lower row appears farther to the right than the leading 1 in the upper row.

The reduced row-echelon form of an augmented matrix is of the form:
(4) Above each leading 1 is a column of zeros, in addition to a row-echelon form.

Example 1.4 The first three augmented matrices below are in reduced row-echelon form, and the last one is just in row-echelon form (here * denotes an arbitrary entry):

    [ 1  0  * ]    [ 1  *  0  *  * ]    [ 1  0  *  * ]    [ 1  *  *  * ]
    [ 0  1  * ],   [ 0  0  1  *  * ],   [ 0  1  *  * ],   [ 0  1  *  * ].
    [ 0  0  0 ]    [ 0  0  0  0  0 ]    [ 0  0  0  1 ]    [ 0  0  1  3 ]

Recall that in an augmented matrix [A b], the last column b does not correspond to any variable. Thus, if the reduced row-echelon form of an augmented matrix for a nonhomogeneous system has a row of the form [ 0 0 ... 0 b ] with b ≠ 0, then the associated equation is 0x1 + 0x2 + ... + 0xn = b with b ≠ 0, which means the system is inconsistent. If b = 0, then it has a row containing only 0's, which can be neglected and deleted. In this example, the third matrix shows the former case, and the first two matrices show the latter case. □
In the following example, we use Gauss-Jordan elimination again to solve a system which has infinitely many solutions.

Example 1.5 Solve the following system of linear equations by Gauss-Jordan elimination.

     x1 + 3x2 - 2x3       =  3
    2x1 + 6x2 - 2x3 + 4x4 = 18
          x2 +  x3 + 3x4  = 10.

Solution: The augmented matrix for the system is

    [ 1  3  -2  0   3 ]
    [ 2  6  -2  4  18 ]
    [ 0  1   1  3  10 ].

The Gaussian elimination begins with:

(1) Adding -2 times the first row to the second produces

    [ 1  3  -2  0   3 ]
    [ 0  0   2  4  12 ]
    [ 0  1   1  3  10 ].

(2) Note that the coefficient of x2 in the second equation is zero and that in the third equation is not. Thus, interchanging the second and the third rows produces

    [ 1  3  -2  0   3 ]
    [ 0  1   1  3  10 ]
    [ 0  0   2  4  12 ].

(3) Dividing the third row by the pivot 2 produces a row-echelon form

    [ 1  3  -2  0   3 ]
    [ 0  1   1  3  10 ]
    [ 0  0   1  2   6 ].

We now continue the back-substitution:

(4) Adding -1 times the third row to the second, and 2 times the third row to the first produces

    [ 1  3  0  4  15 ]
    [ 0  1  0  1   4 ]
    [ 0  0  1  2   6 ].

(5) Finally, adding -3 times the second row to the first produces the reduced row-echelon form:

    [ 1  0  0  1  3 ]
    [ 0  1  0  1  4 ]
    [ 0  0  1  2  6 ].

The corresponding system of equations is

    x1           +  x4 = 3
         x2      +  x4 = 4
              x3 + 2x4 = 6.

This system can be rewritten as follows:

    x1 = 3 -  x4
    x2 = 4 -  x4
    x3 = 6 - 2x4.

Since there is no other condition on x4, one can see that all the other variables x1, x2, and x3 are uniquely determined if an arbitrary real value t ∈ R is assigned to x4 (R denotes the set of real numbers): thus the solutions can be written as

    (x1, x2, x3, x4) = (3 - t, 4 - t, 6 - 2t, t),   t ∈ R. □
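The one-parameter family of solutions can be checked directly. As a small illustrative Python sketch, the following substitutes (3 - t, 4 - t, 6 - 2t, t) back into the three equations and shows that every residual is zero for any value of t:

    def residuals(t):
        """Plug (x1, x2, x3, x4) = (3 - t, 4 - t, 6 - 2t, t) into the system of
        Example 1.5 and return left-hand side minus right-hand side for each equation."""
        x1, x2, x3, x4 = 3 - t, 4 - t, 6 - 2 * t, t
        return (x1 + 3 * x2 - 2 * x3              -  3,
                2 * x1 + 6 * x2 - 2 * x3 + 4 * x4 - 18,
                x2 + x3 + 3 * x4                  - 10)

    for t in (0, 1, -2.5, 7):
        print(t, residuals(t))    # every residual is 0, for any t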


Note that if we look at the reduced row-echelon form in Example 1.5, the variables x1, x2, and x3 correspond to the columns containing leading 1's, while the column corresponding to x4 contains no leading 1.
An augmented matrix of a system of linear equations may have more than one row-echelon form, but it has only one reduced row-echelon form (see Remark (2) on page 97 for a concrete proof). Thus the number of leading 1's in a system does not depend on the Gaussian elimination.

Definition 1.3 Among the variables in a system, the ones corresponding to the columns containing leading 1's are called the basic variables, and the ones corresponding to the columns without leading 1's, if there are any, are called the free variables.

Clearly the sum of the number of basic variables and that of the free variables is equal to the total number of unknowns: the number of columns of the coefficient matrix.
In Example 1.4, the first and the last augmented matrices have only basic variables but no free variables, while the second one has two basic variables x1 and x3, and two free variables x2 and x4. The third one has two basic variables x1 and x2, and only one free variable x3.
In general, as we have seen in Example 1.5, a consistent system has infinitely many solutions if it has at least one free variable, and has a unique solution if it has no free variable. In fact, if a consistent system has a free variable (which always happens when the number of equations is less than that of the unknowns), then by assigning an arbitrary value to the free variable, one always obtains infinitely many solutions.

Theorem 1.2 If a homogeneous system has more unknowns than equations, then it
has infinitely many solutions.
Problem 1.2 Suppose that the augmented matrices for some systems of linear equations have
been reduced to reduced row-echelon forms as below by elementary row operations. Solve the
systems:

(1)
[
1 0
0 I
o 0
o
o
o
5]
-2 ,
4
(2) [bo ~ ~ ~
0 I 3
-I ]
6
2
.

Problem 1.3 Solve the following systems of equations by Gaussian elimination. What are the
pivots?
-x + y + 2z 0 2y z I
(1)1 3x + 4y + z 0 (2) 14X lOy + 3z 5
2x + 5y + 3z = O. 3x 3y 6.
w + x + y = 3
-3w 17x + y + 2z I
(3)\ 4w 17x + 8y 5z I
5x 2y + z l.

Problem 1.4 Determine the condition on bi so that the following system has a solution.
+ 2y + 6z + 3y
1~
2z
~
bl bl
(1) 3y 2z = b2 (2) { Y + 3z b2
3x y + 4z b3 4x + 2y + Z b3

1.3 Sums and scalar multiplications of matrices

Rectangular arrays of real numbers arise in many real-world problems. Historically, it was the English mathematician J.J. Sylvester who first introduced the word "matrix" in the year 1848. It was the Latin word for womb, as a name for an array of numbers.

Definition 1.4 An m by n (written m×n) matrix is a rectangular array of numbers arranged into m (horizontal) rows and n (vertical) columns. The size of a matrix is specified by the number m of the rows and the number n of the columns.

In general, a matrix is written in the following form:

    A = [ a11  a12  ...  a1n ]
        [ a21  a22  ...  a2n ]
        [  :    :          : ]
        [ am1  am2  ...  amn ],

or just A = [aij] if the size of the matrix is clear from the context. The number aij is called the (i, j)-entry of the matrix A, and is written as aij = [A]ij.
An m×1 matrix is called a column (matrix) or sometimes a column vector, and a 1×n matrix is called a row (matrix), or a row vector. In general, we use capital letters like A, B, C for matrices and small boldface letters like x, y, z for column or row vectors.

Definition 1.5 Let A = [aij] be an m×n matrix. The transpose of A is the n×m matrix, denoted by A^T, whose j-th column is taken from the j-th row of A: that is, [A^T]ij = [A]ji.

Example 1.6 (1) If

    A = [ 1  3  5 ]          then    A^T = [ 1  2 ]
        [ 2  4  6 ],                       [ 3  4 ]
                                           [ 5  6 ].

(2) The transpose of a column vector is a row vector and vice-versa:

    x = [ x1 ]
        [ x2 ]        <=>      x^T = [ x1  x2  ...  xn ].
        [  : ]
        [ xn ]

□
Definition 1.6 A matrix A = [aij] is called a square matrix of order n if the number of rows and the number of columns are both equal to n.

Definition 1.7 Let A be a square matrix of order n.

(1) The entries a11, a22, ..., ann are called the diagonal entries of A.
(2) A is called a diagonal matrix if all the entries except for the diagonal entries are zero.
(3) A is called an upper (lower) triangular matrix if all the entries below (above, respectively) the diagonal are zero.

The following matrices U and L are the general forms of the upper triangular and lower triangular matrices, respectively:

    U = [ a11  a12  ...  a1n ]        L = [ a11   0   ...   0  ]
        [  0   a22  ...  a2n ]            [ a21  a22  ...   0  ]
        [  :              :  ],           [  :              :  ]
        [  0    0   ...  ann ]            [ an1  an2  ...  ann ].

Note that a matrix which is both upper and lower triangular must be a diagonal matrix,
and the transpose of an upper (lower, respectively) triangular matrix is lower (upper,
respectively) triangular.

Definition 1.8 Two matrices A and B are said to be equal, written A = B, if their sizes are the same and their corresponding entries are equal: i.e., [A]ij = [B]ij for all i and j.

This definition allows us to write a matrix equation. A simple example is (A^T)^T = A by definition.
Let Mm×n(R) denote the set of all m×n matrices with entries of real numbers. Among the elements of Mm×n(R), one can define two operations, called the scalar multiplication and the sum of matrices, as follows:

Definition 1.9 (1) (Scalar multiplication) For an m×n matrix A = [aij] ∈ Mm×n(R) and a scalar k ∈ R (which is simply a real number), the scalar multiplication of k and A is defined to be the matrix kA such that [kA]ij = k[A]ij for all i and j: i.e., in an expanded form:

    k [ a11  ...  a1n ]    [ ka11  ...  ka1n ]
      [  :          :  ] = [   :           :  ]
      [ am1  ...  amn ]    [ kam1  ...  kamn ].

(2) (Sum of matrices) For two matrices A = [aij] and B = [bij] in Mm×n(R), the sum of A and B is defined to be the matrix A + B such that [A + B]ij = [A]ij + [B]ij for all i and j: i.e., in an expanded form:

    [ a11  ...  a1n ]    [ b11  ...  b1n ]    [ a11 + b11  ...  a1n + b1n ]
    [  :          :  ] + [  :          :  ] = [     :                :     ]
    [ am1  ...  amn ]    [ bm1  ...  bmn ]    [ am1 + bm1  ...  amn + bmn ].


The resulting matrices kA and A + B from these two operations also belong to Mm×n(R). In this sense, we say Mm×n(R) is closed under the two operations. Note that matrices of different sizes cannot be added; for example, a sum

    [ 1  2 ]   [ a  b  c ]
    [ 3  4 ] + [ d  e  f ]

cannot be defined.
If B is any matrix, then -B is by definition the multiplication (-1)B. Moreover, if A and B are two matrices of the same size, then the subtraction A - B is by definition the sum A + (-1)B. A matrix whose entries are all zeros is called a zero matrix, denoted by the symbol 0 (or 0m×n when the size is emphasized).
Clearly, the matrix sum has the same properties as the sum of real numbers. The real numbers in the context here are traditionally called scalars even though "numbers" is a perfectly good name and "scalar" sounds more technical. The following theorem lists the basic arithmetic properties of the sum and scalar multiplication of matrices.

Theorem 1.3 Suppose that the sizes of A, B and C are the same. Then the following arithmetic rules of matrices are valid:
(1) (A + B) + C = A + (B + C), (written as A + B + C) (Associativity),
(2) A + 0 = 0 + A = A,
(3) A + (-A) = (-A) + A = 0,
(4) A + B = B + A, (Commutativity),
(5) k(A + B) = kA + kB,
(6) (k + l)A = kA + lA,
(7) (kl)A = k(lA).

Proof: We prove only the equality (5), and the remaining ones are left as exercises. For any (i, j),

    [k(A + B)]ij = k[A + B]ij = k([A]ij + [B]ij) = [kA]ij + [kB]ij = [kA + kB]ij. □

In particular, A + A = 2A, A + (A + A) = 3A = (A + A) + A, and inductively nA = (n - 1)A + A for any positive integer n.

Definition 1.10 A square matrix A is said to be symmetric if A^T = A, or skew-symmetric if A^T = -A.
For example, the matrices

    A = [ 1  a  b ]            B = [  0   1   2 ]
        [ a  2  c ],               [ -1   0   3 ]
        [ b  c  3 ]                [ -2  -3   0 ]

are symmetric and skew-symmetric, respectively. Notice that all the diagonal entries of a skew-symmetric matrix must be zero, since aii = -aii.
By a direct computation, one can easily verify the following properties of the transpose of matrices:

Theorem 1.4 Let A and B be m×n matrices. Then

    (kA)^T = kA^T    and    (A + B)^T = A^T + B^T.
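Readers who like to experiment can test the identities of Theorems 1.3 and 1.4 numerically. The sketch below uses NumPy, which is merely one convenient choice of tool (the text itself does not rely on any software), on two randomly generated 3×4 integer matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 6, size=(3, 4))
    B = rng.integers(-5, 6, size=(3, 4))
    k, l = 2, -3

    # Theorem 1.3: arithmetic rules for sum and scalar multiplication
    assert np.array_equal(k * (A + B), k * A + k * B)          # rule (5)
    assert np.array_equal((k + l) * A, k * A + l * A)          # rule (6)
    assert np.array_equal((k * l) * A, k * (l * A))            # rule (7)

    # Theorem 1.4: the transpose respects scalar multiplication and sums
    assert np.array_equal((k * A).T, k * A.T)
    assert np.array_equal((A + B).T, A.T + B.T)
    print("all identities hold for this sample")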
Problem 1.5 Prove the remaining parts of Theorem 1.3.

Problem 1.6 Find a matrix B such that A + B^T = (A - B)^T, where

    A = [  2  -3  0 ]
        [  4  -1  3 ]
        [ -1   0  1 ].

Problem 1.7 Find a , b , c and d such that

[ ac db] = 2[a2 a+c + [2+b


c+d
3]
a+9]
b .

1.4 Products of matrices


The sum and the scalar multiplication of matrices were introduced in Section 1.3. In this section, we introduce the product of matrices. Unlike the sum of two matrices, the product of matrices is a little bit more complicated, in the sense that it can be defined for two matrices of different sizes. The product of matrices will be defined in three steps:

Step (1) Product of vectors: For a 1×n row vector a = [a1 a2 ... an] and an n×1 column vector x = [x1 x2 ... xn]^T, the product ax is a 1×1 matrix (i.e., just a number) defined by the rule

    ax = [ a1 a2 ... an ] [ x1 ]
                          [ x2 ]  = [ a1x1 + a2x2 + ... + anxn ] = [ Σ_{i=1}^{n} aixi ].
                          [  : ]
                          [ xn ]

Note that the number of entries of the first row vector is equal to the number of entries of the second column vector, so that an entry-wise multiplication is possible.
Step (2) Product of a matrix and a vector: For an m×n matrix

    A = [ a11  a12  ...  a1n ]   [ a1 ]
        [ a21  a22  ...  a2n ] = [ a2 ]
        [  :                :  ]   [  : ]
        [ am1  am2  ...  amn ]   [ am ]

with the row vectors ai's, and for an n×1 column vector x = [x1 ... xn]^T, the product Ax is by definition an m×1 matrix, whose m rows are computed according to Step (1):

    Ax = [ a1x ]   [ a11x1 + ... + a1nxn ]   [ Σ_{i=1}^{n} a1ixi ]
         [ a2x ] = [ a21x1 + ... + a2nxn ] = [ Σ_{i=1}^{n} a2ixi ]
         [  :  ]   [          :           ]   [          :         ]
         [ amx ]   [ am1x1 + ... + amnxn ]   [ Σ_{i=1}^{n} amixi ].

Therefore, for a system of m linear equations in n unknowns, by writing the n unknowns as an n×1 column matrix x and the coefficients as an m×n matrix A, the system may be expressed as a matrix equation Ax = b.
Step (3) Product of matrices: Let A be an m×n matrix and B an n×r matrix with columns b1, b2, ..., br, written as B = [b1 b2 ... br]. The product AB is defined to be an m×r matrix whose r columns are the products of A and the r columns of B, each computed according to Step (2) in corresponding order. That is,

    AB = [ Ab1  Ab2  ...  Abr ],

which is an m×r matrix. Therefore, the (i, j)-entry [AB]ij of AB is

    [AB]ij = aibj = ai1b1j + ai2b2j + ... + ainbnj = Σ_{k=1}^{n} aikbkj.

This can be easily memorized as the sum of entry-wise multiplications of the i-th row of A and the j-th column of B, as indicated in Figure 1.3.

[Figure 1.3. The entry [AB]ij: the i-th row of A is paired with the j-th column of B.]

Example 1.7 Consider the matrices

    A = [ 2  3 ],    B = [ 1   2  0 ]
        [ 4  0 ]         [ 5  -1  0 ].

The columns of AB are the products of A and each column of B:

    [ 2  3 ] [ 1 ] = [ 2·1 + 3·5 ] = [ 17 ],
    [ 4  0 ] [ 5 ]   [ 4·1 + 0·5 ]   [  4 ]

    [ 2  3 ] [  2 ] = [ 2·2 + 3·(-1) ] = [ 1 ],
    [ 4  0 ] [ -1 ]   [ 4·2 + 0·(-1) ]   [ 8 ]

    [ 2  3 ] [ 0 ] = [ 2·0 + 3·0 ] = [ 0 ].
    [ 4  0 ] [ 0 ]   [ 4·0 + 0·0 ]   [ 0 ]

Therefore, AB is

    [ 2  3 ] [ 1   2  0 ] = [ 17  1  0 ]
    [ 4  0 ] [ 5  -1  0 ]   [  4  8  0 ].

Since A is a 2×2 matrix and B is a 2×3 matrix, the product AB is a 2×3 matrix. If we concentrate, for example, on the (2, 1)-entry of AB, we single out the second row from A and the first column from B, and then we multiply corresponding entries together and add them up, i.e., 4·1 + 0·5 = 4. □

Note that the product AB of A and B is not defined if the number of columns of
A and the number of rows of B are not equal.
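The entry formula [AB]ij = Σk aikbkj translates directly into code. The following plain Python sketch (an illustration only, not an efficient implementation; the name mat_mul is arbitrary) recomputes the product of Example 1.7 from that rule:

    def mat_mul(A, B):
        """Product of A (m x n) and B (n x r) from the rule [AB][i][j] = sum_k A[i][k]*B[k][j]."""
        m, n, r = len(A), len(B), len(B[0])
        assert all(len(row) == n for row in A), "columns of A must match rows of B"
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
                for i in range(m)]

    A = [[2, 3],
         [4, 0]]
    B = [[1, 2, 0],
         [5, -1, 0]]
    print(mat_mul(A, B))   # [[17, 1, 0], [4, 8, 0]], as in Example 1.7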

Remark: In Step (2), instead of defining a product of a matrix and a vector, one can define alternatively the product of a 1×n row matrix a and an n×r matrix B, using the same rule defined in Step (1), to have a 1×r row matrix aB. Accordingly, in Step (3) an appropriate modification produces the same definition of the product of two matrices. We suggest that readers complete the details. (See Example 1.10.)
The identity matrix of order n, denoted by In (or I if the order is clear from the context), is a diagonal matrix whose diagonal entries are all 1, i.e.,

    In = [ 1  0  ...  0 ]
         [ 0  1  ...  0 ]
         [ :          : ]
         [ 0  0  ...  1 ].

By a direct computation, one can easily see that AIn = A = InA for any n×n matrix A.
The operations of scalar multiplication, sum and product of matrices satisfy many, but not all, of the same arithmetic rules that real or complex numbers have. The matrix 0m×n plays the role of the number 0, and In plays that of the number 1 in the set of usual numbers.
The rule that does not hold for matrices in general is commutativity AB = BA of the product, while commutativity of the matrix sum A + B = B + A always holds. The following example illustrates noncommutativity of the product of matrices.

Example 1.8 (Noncommutativity of the matrix product)

Let A = [ 1   0 ]   and   B = [ 0  1 ].   Then
        [ 0  -1 ]             [ 1  0 ]

    AB = [  0  1 ],    BA = [ 0  -1 ],
         [ -1  0 ]          [ 1   0 ]

which shows AB ≠ BA. □
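A quick numerical check of this noncommutativity, as a small sketch assuming NumPy is available (any matrix library would do):

    import numpy as np

    A = np.array([[1, 0], [0, -1]])
    B = np.array([[0, 1], [1, 0]])
    print(A @ B)                          # [[ 0  1] [-1  0]]
    print(B @ A)                          # [[ 0 -1] [ 1  0]]
    print(np.array_equal(A @ B, B @ A))   # False: AB != BA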

The following theorem lists the basic arithmetic rules that hold in the matrix
product.

Theorem 1.5 Let A, B, C be arbitrary matrices for which the matrix operations below can be defined, and let k be an arbitrary scalar. Then
(1) A(BC) = (AB)C, (written as ABC) (Associativity),
(2) A(B + C) = AB + AC, and (A + B)C = AC + BC, (Distributivity),
(3) IA = A = AI,
(4) k(BC) = (kB)C = B(kC),
(5) (AB)^T = B^T A^T.

Proof: Each equality can be shown by direct computation of each entry on both sides of the equalities. We illustrate this by proving (1) only, and leave the others to the reader.
Assume that A = [aij] is an m×n matrix, B = [bkl] is an n×p matrix, and C = [cst] is a p×r matrix. We now compute the (i, j)-entry of each side of the equation. Note that BC is an n×r matrix whose (i, j)-entry is [BC]ij = Σ_{λ=1}^{p} biλcλj. Thus

    [A(BC)]ij = Σ_{μ=1}^{n} aiμ[BC]μj = Σ_{μ=1}^{n} aiμ Σ_{λ=1}^{p} bμλcλj = Σ_{μ=1}^{n} Σ_{λ=1}^{p} aiμbμλcλj.

Similarly, AB is an m×p matrix with the (i, j)-entry [AB]ij = Σ_{μ=1}^{n} aiμbμj, and

    [(AB)C]ij = Σ_{λ=1}^{p} [AB]iλcλj = Σ_{λ=1}^{p} Σ_{μ=1}^{n} aiμbμλcλj = Σ_{μ=1}^{n} Σ_{λ=1}^{p} aiμbμλcλj.

This clearly shows that [A(BC)]ij = [(AB)C]ij for all i, j, and consequently A(BC) = (AB)C as desired. □

Problem 1.8 Give an example of matrices A and B such that (AB)^T ≠ A^T B^T.

Problem 1.9 Prove or disprove: If A is not a zero matrix and AB = AC, then B = C. Similarly, is it true or not that AB = 0 implies A = 0 or B = 0?

Problem 1.10 Show that any triangular matrix A satisfying AA^T = A^T A is a diagonal matrix.

Problem 1.11 For a square matrix A, show that
(1) AA^T and A + A^T are symmetric,
(2) A - A^T is skew-symmetric, and
(3) A can be expressed as the sum of its symmetric part B = (1/2)(A + A^T) and its skew-symmetric part C = (1/2)(A - A^T), so that A = B + C.

As an application of our results on matrix operations, one can prove the following important theorem:

Theorem 1.6 Any system of linear equations has either no solution, exactly one solution, or infinitely many solutions.

Proof: We have seen that a system of linear equations may be written in matrix form as Ax = b. This system may have either no solution or a solution. If it has only one solution, then there is nothing to prove. Suppose that the system has more than one solution and let x1 and x2 be two different solutions, so that Ax1 = b and Ax2 = b. Let x0 = x1 - x2. Then x0 ≠ 0, and Ax0 = A(x1 - x2) = 0. Thus

    A(x1 + kx0) = Ax1 + kAx0 = b.

This means that x1 + kx0 is also a solution of Ax = b for any k. Since there are infinitely many choices for k, Ax = b has infinitely many solutions. □
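This argument can be watched in action on the system of Example 1.5: a particular solution plus any multiple of a nontrivial solution of Ax = 0 again solves Ax = b. The sketch below is only an illustration (it assumes NumPy), with x1 and x0 read off from the parametric solution found there:

    import numpy as np

    A = np.array([[1, 3, -2, 0],
                  [2, 6, -2, 4],
                  [0, 1,  1, 3]])
    b = np.array([3, 18, 10])

    x1 = np.array([3, 4, 6, 0])        # a particular solution (t = 0 in Example 1.5)
    x0 = np.array([-1, -1, -2, 1])     # a nontrivial solution of Ax = 0 (t = 1 minus t = 0)

    for k in (0, 1, 2, -3.5):
        print(k, A @ (x1 + k * x0))    # always [ 3 18 10 ] = b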

Problem 1.12 For which values of a does each of the following systems have no solution, exactly one solution, or infinitely many solutions?

    (1)   x + 2y - 3z = 4
         3x -  y + 5z = 2
         4x +  y + (a^2 - 14)z = a + 2.

    (2)   x -  y +  z = 1
          x + 3y + az = 2
         2x + ay + 3z = 3.

1.5 Block matrices


In this section we introduce some techniques that may be helpful in manipulations of matrices. A submatrix of a matrix A is a matrix obtained from A by deleting certain rows and/or columns of A. Using some horizontal and vertical lines, one can partition a matrix A into submatrices, called blocks, of A as follows: Consider a matrix

    A = [ a11  a12  a13 | a14 ]
        [ a21  a22  a23 | a24 ]
        [ ------------------- ]
        [ a31  a32  a33 | a34 ],

divided up into four blocks by the dotted lines shown. Now, if we write

    A11 = [ a11  a12  a13 ],   A12 = [ a14 ],   A21 = [ a31  a32  a33 ],   A22 = [ a34 ],
          [ a21  a22  a23 ]          [ a24 ]

then A can be written as

    A = [ A11  A12 ]
        [ A21  A22 ],

called a block matrix.
The product of matrices partitioned into blocks also follows the matrix product formula, as if the blocks Aij were numbers: If

    A = [ A11  A12 ]    and    B = [ B11  B12 ]
        [ A21  A22 ]               [ B21  B22 ]

are block matrices, and the number of columns in Aik is equal to the number of rows in Bkj, then

    AB = [ A11B11 + A12B21    A11B12 + A12B22 ]
         [ A21B11 + A22B21    A21B12 + A22B22 ].

This will be true only if the columns of A are partitioned in the same way as the rows of B.
It is not hard to see that the matrix product by blocks is correct. Suppose, for example, that we have a 3×3 matrix A and partition it as

    A = [ a11  a12 | a13 ]   [ A11  A12 ]
        [ a21  a22 | a23 ] = [ A21  A22 ],
        [ --------------- ]
        [ a31  a32 | a33 ]

and a 3×2 matrix B, which we partition as

    B = [ b11  b12 ]   [ B11 ]
        [ b21  b22 ] = [ B21 ].
        [ --------- ]
        [ b31  b32 ]

Then the entries of C = [cij] = AB are

    cij = (ai1b1j + ai2b2j) + ai3b3j.

The quantity ai1b1j + ai2b2j is simply the (i, j)-entry of A11B11 if i ≤ 2, and is the (i, j)-entry of A21B11 if i = 3. Similarly, ai3b3j is the (i, j)-entry of A12B21 if i ≤ 2, and of A22B21 if i = 3. Thus AB can be written as

    AB = [ C11 ] = [ A11B11 + A12B21 ]
         [ C21 ]   [ A21B11 + A22B21 ].
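Block multiplication is also easy to confirm numerically. The following sketch (assuming NumPy) partitions a random 3×3 matrix A and a 3×2 matrix B exactly as above and compares the blockwise product with the ordinary product A @ B:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.integers(-3, 4, size=(3, 3))
    B = rng.integers(-3, 4, size=(3, 2))

    A11, A12 = A[:2, :2], A[:2, 2:]     # partition A after row 2 and column 2
    A21, A22 = A[2:, :2], A[2:, 2:]
    B11, B21 = B[:2, :], B[2:, :]       # partition B after row 2

    top = A11 @ B11 + A12 @ B21
    bottom = A21 @ B11 + A22 @ B21
    blockwise = np.vstack([top, bottom])

    print(np.array_equal(blockwise, A @ B))   # True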
Example 1.9 If an m×n matrix A is partitioned into blocks of column vectors, i.e., A = [c1 c2 ... cn], where each block cj is the j-th column, then the product Ax with x = [x1 ... xn]^T is the sum of the block matrices (or column vectors) with coefficients xj's:

    Ax = [ c1 c2 ... cn ] [ x1 ]
                          [  : ]  = x1c1 + x2c2 + ... + xncn,
                          [ xn ]

where xjcj = xj[a1j a2j ... amj]^T. Hence, a matrix equation Ax = b is nothing but the vector equation x1c1 + x2c2 + ... + xncn = b. □
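This 'column picture' of the product is easy to see numerically; the sketch below (assuming NumPy) compares Ax with the explicit linear combination of the columns, using the coefficient matrix of Example 1.5:

    import numpy as np

    A = np.array([[1, 3, -2, 0],
                  [2, 6, -2, 4],
                  [0, 1,  1, 3]])      # coefficient matrix of Example 1.5
    x = np.array([3, 4, 6, 0])

    combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))   # x1*c1 + ... + xn*cn
    print(A @ x)                          # [ 3 18 10 ]
    print(combo)                          # the same vector
    print(np.array_equal(A @ x, combo))   # True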
Example 1.10 Let A be an m×n matrix partitioned into the row vectors a1, a2, ..., am as its blocks, and let B be an n×r matrix so that their product AB is well defined. By considering the matrix B as a block, the product AB can be written as

    AB = [ a1 ]     [ a1B ]   [ a1b1  a1b2  ...  a1br ]
         [ a2 ] B = [ a2B ] = [ a2b1  a2b2  ...  a2br ]
         [  : ]     [  :  ]   [   :                :  ]
         [ am ]     [ amB ]   [ amb1  amb2  ...  ambr ],

where b1, b2, ..., br denote the columns of B. Hence, the row vectors of AB are the products of the row vectors of A and the column vectors of B. □
Problem 1.13 Compute AB using block multiplication, where

A=
1211
-3 ~~ ~
0] , B=
1
~
012]
~~ .
[ o 01 2 -1 [
3 -2 I 1

1.6 Inverse matrices


As shown in Section 1.4, a system of linear equations can be written as Ax = b in matrix form. This form resembles one of the simplest linear equations in one variable, ax = b, whose solution is simply x = a^{-1}b when a ≠ 0. Thus it is tempting to write the solution of the system as x = A^{-1}b. However, in the case of matrices we first have to assign a meaning to A^{-1}. To discuss this we begin with the following definition.

Definition 1.11 For an m×n matrix A, an n×m matrix B is called a left inverse of A if BA = In, and an n×m matrix C is called a right inverse of A if AC = Im.

Example 1.11 (One-sided inverse) From a direct calculation for the two matrices

    A = [ 1  2  -1 ]    and    B = [  1  -3 ]
        [ 2  0   1 ]               [ -1   5 ]
                                   [ -2   7 ],

we have AB = I2, and

    BA = [ -5   2  -4 ]
         [  9  -2   6 ]  ≠  I3.
         [ 12  -4   9 ]

Thus, the matrix B is a right inverse but not a left inverse of A, while A is a left inverse but not a right inverse of B. □

A matrix A has a right inverse if and only if A^T has a left inverse, since (AB)^T = B^T A^T and I^T = I. In general, a matrix with a left (right) inverse need not have a right (left, respectively) inverse. However, the following lemma shows that if a matrix has both a left inverse and a right inverse, then they must be equal:

Lemma 1.7 If an n×n square matrix A has a left inverse B and a right inverse C, then B and C are equal, i.e., B = C.

Proof: A direct calculation shows that

    B = BIn = B(AC) = (BA)C = InC = C. □



By Lemma 1.7, one can say that if a matrix A has both left and right inverses,
then any two left inverses must be both equal to a right inverse C, and hence to each
other. By the same reason, any two right inverses must be both equal to a left inverse
B, and hence to each other. So there exists only one left and only one right inverse,
which must be equal.
We will show later (Theorem 1.9) that if A is a square matrix and has a left inverse,
then it has also a right inverse, and vice-versa. Moreover, Lemma 1.7 says that the left
inverse and the right inverse must be equal. However, we shall also show in Chapter 3
that any non-square matrix A cannot have both a right inverse and a left inverse: that
is, a non-square matrix may have only a one-sided inverse. The following example
shows that such a matrix may have infinitely many one-sided inverses.

Example 1.12 (Infinitely many one-sided inverses) A non-square matrix

    A = [ 1  0 ]
        [ 0  1 ]
        [ 0  0 ]

can have more than one left inverse. In fact, for any x, y ∈ R, the matrix

    B = [ 1  0  x ]
        [ 0  1  y ]

is a left inverse of A. □

Definition 1.12 An n×n square matrix A is said to be invertible (or nonsingular) if there exists a square matrix B of the same size such that

    AB = In = BA.

Such a matrix B is called the inverse of A, and is denoted by A^{-1}. A matrix A is said to be singular if it is not invertible.
Lemma 1.7 implies that the inverse matrix of a square matrix is unique. That is why we call B 'the' inverse of A. For instance, consider a 2×2 matrix

    A = [ a  b ]
        [ c  d ].

If ad - bc ≠ 0, then it is easy to verify that

    A^{-1} = 1/(ad - bc) [  d  -b ] = [  d/(ad - bc)  -b/(ad - bc) ]
                         [ -c   a ]   [ -c/(ad - bc)   a/(ad - bc) ],

since AA^{-1} = I2 = A^{-1}A. Note that any zero matrix is singular.
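The 2×2 formula is simple enough to code directly; this small Python sketch (with a made-up helper name inverse_2x2) also checks that AA^{-1} = I2:

    def inverse_2x2(a, b, c, d):
        """Inverse of [[a, b], [c, d]], assuming ad - bc != 0."""
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0: the matrix is singular")
        return [[ d / det, -b / det],
                [-c / det,  a / det]]

    A = [[1, 2], [3, 4]]
    Ainv = inverse_2x2(1, 2, 3, 4)
    # check that A times Ainv is the identity, entry by entry
    prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    print(Ainv)   # [[-2.0, 1.0], [1.5, -0.5]]
    print(prod)   # [[1.0, 0.0], [0.0, 1.0]]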


Problem 1.14 Let A be an invertible matrix and k any nonzero scalar. Show that
(1) A^{-1} is invertible and (A^{-1})^{-1} = A;
(2) the matrix kA is invertible and (kA)^{-1} = (1/k)A^{-1};
(3) A^T is invertible and (A^T)^{-1} = (A^{-1})^T.

Theorem 1.8 The product of invertible matrices is also invertible, whose inverse is the product of the individual inverses in reversed order:

    (AB)^{-1} = B^{-1}A^{-1}.

Proof: Suppose that A and B are invertible matrices of the same size. Then (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I, and similarly (B^{-1}A^{-1})(AB) = I. Thus, AB has the inverse B^{-1}A^{-1}. □

The inverse of A is written as 'A to the power -1', so one can give the meaning of A^k for any integer k. Let A be a square matrix. Define A^0 = I. Then, for any positive integer k, we define the power A^k of A inductively as

    A^k = A^{k-1}A.

Moreover, if A is invertible, then the negative integer power is defined as

    A^{-k} = (A^{-1})^k.

It is easy to check that A^{k+l} = A^k A^l whenever the right-hand side is defined. (If A is not invertible, A^{3+(-1)} is defined but A^{-1} is not.)
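As a small check of the rule A^{k+l} = A^k A^l, including a negative power of an invertible matrix, one can use NumPy's matrix_power routine (this is only an illustration; the text defines the powers abstractly and does not depend on any software):

    import numpy as np
    from numpy.linalg import matrix_power, inv

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])          # invertible, since ad - bc = -2 != 0

    def power(M, k):
        """M^k for any integer k, following the definitions above."""
        return matrix_power(M, k) if k >= 0 else matrix_power(inv(M), -k)

    k, l = 3, -1
    lhs = power(A, k + l)
    rhs = power(A, k) @ power(A, l)
    print(np.allclose(lhs, rhs))        # True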

Problem 1.15 Prove:
(1) If A has a zero row, so does AB.
(2) If B has a zero column, so does AB.
(3) Any matrix with a zero row or a zero column cannot be invertible.

Problem 1.16 Let A be an invertible matrix. Is it true that (A^k)^T = (A^T)^k for any integer k? Justify your answer.

1.7 Elementary matrices and finding A^{-1}

We now return to the system of linear equations Ax = b. If A has a right inverse B so that AB = Im, then x = Bb is a solution of the system, since

    Ax = A(Bb) = (AB)b = b.

(Compare with Problem 1.23.) In particular, if A is an invertible square matrix, then it has only one inverse A^{-1}, and x = A^{-1}b is the only solution of the system. In this section, we discuss how to compute A^{-1} when A is invertible.
Recall that Gaussian elimination is a process in which the augmented matrix is
transformed into its row-echelon form by a finite number of elementary row op-
erations. In the following, one can see that each elementary row operation can be
expressed as a nonsingular matrix, called an elementary matrix, so that the process
of Gaussian elimination is the same as multiplying a finite number of corresponding
elementary matrices to the augmented matrix.

Definition 1.13 An elementary matrix is a matrix obtained from the identity matrix
In by executing only one elementary row operation.

For example, the following matrices are three elementary matrices corresponding
to each type of the three elementary row operations:

(1st kind)  [ 1  0 ]
            [ 0 -5 ] :  the second row of I2 is multiplied by -5;

(2nd kind)  [ 1  0  0  0 ]
            [ 0  0  0  1 ]
            [ 0  0  1  0 ]
            [ 0  1  0  0 ] :  the second and the fourth rows of I4 are interchanged;

(3rd kind)  [ 1  0  3 ]
            [ 0  1  0 ]
            [ 0  0  1 ] :  3 times the third row is added to the first row of I3.

It is an interesting fact that, if E is an elementary matrix obtained by executing a


certain elementary row operation on the identity matrix Im, then for any m x n matrix
A, the product EA is exactly the matrix that is obtained when the same elementary
row operation in E is executed on A.
The following example illustrates this argument. (Note that AE is not what we
want. For this, see Problem 1.18.)

Example 1.13 (Elementary operation by an elementary matrix) Let b = [b1 b2 b3]^T
be a 3 x 1 column matrix. Suppose that we want to execute a third kind of elementary
operation 'adding (-2) x the first row to the second row' on the matrix b. First, we
execute this operation on the identity matrix I3 to get an elementary matrix E:

    E = [  1  0  0 ]
        [ -2  1  0 ]
        [  0  0  1 ].

Multiplying this elementary matrix E to b on the left produces the desired result:

    Eb = [  1  0  0 ] [ b1 ]   [     b1     ]
         [ -2  1  0 ] [ b2 ] = [ -2b1 + b2  ]
         [  0  0  1 ] [ b3 ]   [     b3     ].

Similarly, the second kind of elementary operation 'interchanging the first and
the third rows' on the matrix b can be achieved by multiplying an elementary matrix
P, obtained from I3 by interchanging the two rows, to b on the left:

    Pb = [ 0  0  1 ] [ b1 ]   [ b3 ]
         [ 0  1  0 ] [ b2 ] = [ b2 ]
         [ 1  0  0 ] [ b3 ]   [ b1 ].
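As a quick numerical illustration of this fact, the sketch below multiplies the elementary
matrix E of Example 1.13 against an arbitrary test matrix (the test matrix is my own
choice, not one from the text) and checks that EA equals the row-operated matrix.

    import numpy as np

    # Elementary matrix E from Example 1.13: add (-2) x row 1 to row 2 of I3.
    E = np.array([[ 1, 0, 0],
                  [-2, 1, 0],
                  [ 0, 0, 1]])

    # Any 3 x n matrix will do; this one is just for illustration.
    A = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

    B = A.copy()
    B[1] = B[1] - 2 * B[0]           # perform the row operation directly on A

    assert np.array_equal(E @ A, B)  # E A is exactly the row-operated matrix
    print(E @ A)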

Recall that each elementary row operation has an inverse operation, which is
also an elementary operation, that brings the elementary matrix back to the original
identity matrix. In other words, if E denotes an elementary matrix and if E' denotes
the elementary matrix corresponding to the 'inverse' elementary row operation of E,
then E'E = I, because

(1) if E multiplies a row by c ≠ 0, then E' multiplies the same row by 1/c;
(2) if E interchanges two rows, then E' interchanges them again;
(3) if E adds a multiple of one row to another, then E' subtracts it from the same
row.
Furthermore, E'E = I = EE': every elementary matrix is invertible
and its inverse E^{-1} = E' is also an elementary matrix.

Example 1.14 (Inverse of an elementary matrix) If

    E1 = [ 1  0  0 ]      E2 = [ 1  0  0 ]      E3 = [ 0  1  0 ]
         [ 0  c  0 ],          [ 0  1  0 ],          [ 1  0  0 ]    (c ≠ 0),
         [ 0  0  1 ]           [ 3  0  1 ]           [ 0  0  1 ]

then

    E1^{-1} = [ 1   0   0 ]     E2^{-1} = [  1  0  0 ]     E3^{-1} = [ 0  1  0 ]
              [ 0  1/c  0 ],              [  0  1  0 ],              [ 1  0  0 ] = E3.
              [ 0   0   1 ]               [ -3  0  1 ]               [ 0  0  1 ]

Definition 1.14 A permutation matrix is a square matrix obtained from the identity
matrix by permuting the rows.

In Example 1.14, E3 is a permutation matrix, but E2 is not.


Problem 1.17 Prove:
(1) A permutation matrix is the product of a finite number of elementary matrices each of
which corresponds to the 'row-interchanging' elementary row operation.
(2) Every permutation matrix P is invertible and P^{-1} = P^T.
(3) The product of any two permutation matrices is a permutation matrix.
(4) The transpose of a permutation matrix is also a permutation matrix.

Problem 1.18 Define the elementary column operations for a matrix by just replacing 'row '
by 'column' in the definition of the elementary row operations . Show that if A is an m x n
matrix and if E is a matrix obtained by executing an elementary column operation on In, then
AE is exactly the matrix that is obtained from A when the same column operation is executed
on A. In particular, if D is an n x n diagonal matrix with diagonal entries d1, d2, ..., dn,
then AD is obtained by multiplying the columns of A by d1, d2, ..., dn, respectively, while
DA is obtained by multiplying the rows of A by d1, d2, ..., dn.

The next theorem establishes some fundamental relations between n xn square


matrices and systems of n linear equations in n unknowns.

Theorem 1.9 Let A be an n x n matrix. The following are equivalent:


(1) A has a left inverse;
(2) Ax = 0 has only the trivial solution x = 0;

(3) A is row-equivalent to In ;
(4) A is a product of elementary matrices;
(5) A is invertible;
(6) A has a right inverse.

Proof: (1) ⇒ (2): Let x be a solution of the homogeneous system Ax = 0, and let
B be a left inverse of A. Then

    x = Inx = (BA)x = B(Ax) = B0 = 0.


(2) ⇒ (3): Suppose that the homogeneous system Ax = 0 has only the trivial
solution x = 0:

    x1 = 0,  x2 = 0,  ...,  xn = 0.

This means that the augmented matrix [A 0] of the system Ax = 0 is reduced to the
system [In 0] by Gauss-Jordan elimination. Hence, A is row-equivalent to In.
(3) ⇒ (4): Assume A is row-equivalent to In, so that A can be reduced to In by a
finite sequence of elementary row operations. Thus, one can find elementary matrices
E1, E2, ..., Ek such that

    Ek ··· E2E1A = In.

By multiplying successively both sides of this equation by Ek^{-1}, ..., E2^{-1}, E1^{-1} on
the left, we obtain

    A = E1^{-1}E2^{-1} ··· Ek^{-1}In = E1^{-1}E2^{-1} ··· Ek^{-1},

which expresses A as the product of elementary matrices.

(4) ⇒ (5) is trivial, because any elementary matrix is invertible. In fact, A^{-1} =
Ek ··· E2E1.
(5) ⇒ (1) and (5) ⇒ (6) are trivial.
(6) ⇒ (5): If B is a right inverse of A, then A is a left inverse of B and one can
apply (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (5) to B and conclude that B is invertible, with A as
its unique inverse, by Lemma 1.7. That is, B is the inverse of A and so A is invertible.
o

If a triangular matrix A has a zero diagonal entry, then the system Ax = 0 has at
least one free variable, so that it has infinitely many solutions . Hence, one can have
the following corollary.
Corollary 1.10 A triangular matrix is invertible if and only if it has no zero diagonal
entry.

From Theorem 1.9, one can see that a square matrix is invertible if it has a one-
sided inverse. In particular, if a square matrix A is invertible, then x = A^{-1}b is the
unique solution to the system Ax = b.

Problem 1.19 Find the inverse of the product

[~ o
~ ~] [ ~ ~ ~] [-~ ~ ~1]'
-c 1 -b 0 1 0 0

As an application of Theorem 1.9, one can find a practical method for finding
the inverse A^{-1} of an invertible n x n matrix A. If A is invertible, then A is row
equivalent to In and so there are elementary matrices E1, E2, ..., Ek such that
Ek ··· E2E1A = In. Hence,

    A^{-1} = Ek ··· E2E1 = Ek ··· E2E1 In.

This means that A^{-1} can be obtained by performing on In the same sequence of the
elementary row operations that reduces A to In. Practically, one first constructs an
n x 2n augmented matrix [A | In] and then performs Gauss-Jordan elimination, which
reduces A to In, on [A | In] to get [In | A^{-1}]: that is,

    [A | In] → [Eℓ ··· E1A | Eℓ ··· E1In] = [U | K]
             → [Fk ··· F1U | Fk ··· F1K] = [I | A^{-1}],

where Eℓ ··· E1 represents a Gaussian elimination that reduces A to a row-echelon
form U and Fk ··· F1 represents the back substitution. The following example illus-
trates the computation of an inverse matrix.
Example 1.15 (Computing A^{-1} by Gauss-Jordan elimination) Find the inverse of

    A = [ 1  2  3 ]
        [ 2  3  5 ]
        [ 1  0  2 ].

Solution: Apply Gauss-Jordan elimination to

    [A | I] = [ 1  2  3 | 1  0  0 ]
              [ 2  3  5 | 0  1  0 ]    (-2)row 1 + row 2
              [ 1  0  2 | 0  0  1 ]    (-1)row 1 + row 3

            → [ 1  2  3 |  1  0  0 ]
              [ 0 -1 -1 | -2  1  0 ]    (-1)row 2
              [ 0 -2 -1 | -1  0  1 ]

            → [ 1  2  3 |  1  0  0 ]
              [ 0  1  1 |  2 -1  0 ]    (2)row 2 + row 3
              [ 0 -2 -1 | -1  0  1 ]

            → [ 1  2  3 |  1  0  0 ]
              [ 0  1  1 |  2 -1  0 ]
              [ 0  0  1 |  3 -2  1 ].

This is [U | K] obtained by Gaussian elimination. Now continue the back substitution
to reduce [U | K] to [I | A^{-1}].

    [U | K] = [ 1  2  3 |  1  0  0 ]    (-1)row 3 + row 2
              [ 0  1  1 |  2 -1  0 ]    (-3)row 3 + row 1
              [ 0  0  1 |  3 -2  1 ]

            → [ 1  2  0 | -8  6 -3 ]    (-2)row 2 + row 1
              [ 0  1  0 | -1  1 -1 ]
              [ 0  0  1 |  3 -2  1 ]

            → [ 1  0  0 | -6  4 -1 ]
              [ 0  1  0 | -1  1 -1 ]  = [I | A^{-1}].
              [ 0  0  1 |  3 -2  1 ]

Thus, we get

    A^{-1} = [ -6  4 -1 ]
             [ -1  1 -1 ].
             [  3 -2  1 ]

(The reader should verify that AA^{-1} = I = A^{-1}A.) □
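The same computation is easy to mechanize. The routine below is a minimal sketch of the
[A | I] procedure described above; it is written without row interchanges or safeguards
against zero pivots, so it assumes an invertible matrix whose elimination needs no row
swaps, such as the A of Example 1.15.

    import numpy as np

    def inverse_by_gauss_jordan(A):
        """Reduce the augmented matrix [A | I] to [I | A^{-1}]."""
        n = A.shape[0]
        M = np.hstack([A.astype(float), np.eye(n)])   # build [A | I]
        for i in range(n):
            M[i] = M[i] / M[i, i]                     # make the pivot equal to 1
            for j in range(n):
                if j != i:
                    M[j] = M[j] - M[j, i] * M[i]      # clear the rest of column i
        return M[:, n:]                               # right half is now A^{-1}

    A = np.array([[1, 2, 3],
                  [2, 3, 5],
                  [1, 0, 2]])
    print(inverse_by_gauss_jordan(A))   # [[-6. 4. -1.], [-1. 1. -1.], [3. -2. 1.]]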

Note that if A is not invertible, then, at some step in Gaussian elimination, a
zero row will show up on the left-hand side of [U | K]. For example, the matrix

    A = [  1  6  4 ]
        [  2  4 -1 ]
        [ -1  2  5 ]

is row-equivalent to

    [ 1  6  4 ]
    [ 0 -8 -9 ],
    [ 0  0  0 ]

which has a zero row and is not invertible.
By Theorem 1.9, a square matrix A is invertible if and only if Ax = 0 has only
the trivial solution. That is, a square matrix A is noninvertible if and only if Ax = 0
has a nontrivial solution, say x0. Now, for any column vector b = [b1 ··· bn]^T, if x1
is a solution of Ax = b for a noninvertible matrix A, so is kx0 + x1 for any k, since

    A(kx0 + x1) = k(Ax0) + Ax1 = k0 + b = b.

This argument strengthens Theorem 1.6 as follows when A is a square matrix:

Theorem 1.11 If A is an invertible n x n matrix, then for any column vector b =
[b1 ... bn]^T, the system Ax = b has exactly one solution x = A^{-1}b. If A is not
invertible, then the system has either no solution or infinitely many solutions according
to the consistency of the system.

Problem 1.20 Express A^{-1} as a product of elementary matrices for A given in Example 1.15.

Problem 1.21 When is a diagonal matrix D with diagonal entries d1, d2, ..., dn nonsingular,
and what is D^{-1}?

Problem 1.22 Write the system of linear equations

     x + 2y + 2z = 10
    2x - 2y + 3z = 1
    4x - 3y + 5z = 4

in matrix form Ax = b and solve it by finding A^{-1}b.

Problem 1.23 True or false: If the matrix A has a left inverse C so that C A = In, then x = Cb
is a solution of the system Ax = b . Justify your answer.

1.8 LDU factorization

In this section, we show that the forward elimination for solving a system of linear
equations Ax = b can be expressed by some invertible lower triangular matrix, so
that the matrix A can be factored as a product of two or more triangular matrices.
We first assume that no permutations of rows (2nd kind of operation) are necessary
throughout the whole process of forward elimination on the augmented matrix [A b].
Then the forward elimination is just a multiplication of the augmented matrix [A b]
by finitely many elementary matrices Ek, ..., E1: that is,

    Ek ··· E2E1[A b] = [U y],

where each Ei is a lower triangular elementary matrix whose diagonal entries are all
1's and [U y] is a row-echelon form of [A b] without divisions of the rows by the
pivots. (Note that if A is a square matrix, then U must be an upper triangular matrix.)
Therefore, if we set L = (Ek ··· E1)^{-1} = E1^{-1} ··· Ek^{-1}, then we have A = LU,
where L is a lower triangular matrix whose diagonal entries are all 1's. (In fact, each
Ei^{-1} is also a lower triangular matrix, and a product of lower triangular matrices is
also lower triangular; see Problem 1.25.) Such a factorization A = LU is called an LU
factorization or an LU decomposition of A. For example, a 3 x 3 matrix factors as

    A = [ 1  0  0 ] [ d1  *  * ]
        [ *  1  0 ] [  0 d2  * ]  = LU,
        [ *  *  1 ] [  0  0 d3 ]

where the di's are the pivots.


Now, let A = LU be an LU factorization. Then the system Ax = b can be written
as LUx = b. Let Ux = y. Thus, the system

    Ax = LUx = b

can be solved by the following two steps:


Step 1 Solve Ly = b for y.
Step 2 Solve Ux = y by back substitution.
The following example illustrates the convenience of an LU factorization of a matrix
A for solving the system Ax = b.
Example 1.16 Solve the system of linear equations

    Ax = [  2  1  1  0 ] [ x1 ]   [  1 ]
         [  4  1  0  1 ] [ x2 ] = [ -2 ]
         [ -2  2  1  1 ] [ x3 ]   [  7 ]
                         [ x4 ]

by using an LU factorization of A.

Solution: The elementary matrices for the forward elimination on the augmented
matrix [A b] are easily found to be

    E1 = [  1  0  0 ]      E2 = [ 1  0  0 ]      E3 = [ 1  0  0 ]
         [ -2  1  0 ],          [ 0  1  0 ],          [ 0  1  0 ],
         [  0  0  1 ]           [ 1  0  1 ]           [ 0  3  1 ]

so that

    E3E2E1A = [ 2  1  1  0 ]
              [ 0 -1 -2  1 ]  = U.
              [ 0  0 -4  4 ]

Thus, if we set

    L = E1^{-1}E2^{-1}E3^{-1} = [  1  0  0 ]
                                [  2  1  0 ],
                                [ -1 -3  1 ]

which is a lower triangular matrix with 1's on the diagonal, then

    A = LU = [  1  0  0 ] [ 2  1  1  0 ]
             [  2  1  0 ] [ 0 -1 -2  1 ].
             [ -1 -3  1 ] [ 0  0 -4  4 ]

Now, the system

    Ly = b :     y1             =  1
                2y1 +  y2       = -2
                -y1 - 3y2 + y3  =  7

can be easily solved inductively to get y = (1, -4, -4), and the system

    Ux = y :   2x1 + x2 +  x3        =  1
                   -  x2 - 2x3 + x4  = -4
                         - 4x3 + 4x4 = -4

also can be solved by back substitution to get

    x = (-1, 2 - t, 1 + t, t)

for t ∈ ℝ, which is the solution for the original system. □


As shown in Example 1.16, it is a simple computation to solve the systems Ly = b
and Ux = y, because the matrix L is lower triangular and the matrix U is the matrix
obtained from A after forward elimination, so that most entries on the lower-left side
are zero.

Remark: For a system Ax = b, the Gaussian elimination may be described as an
LU factorization of the matrix A. Let us assume that one needs to solve several systems
of linear equations Ax = bi for i = 1, 2, ..., ℓ with the same coefficient matrix A.
Instead of performing the Gaussian elimination process ℓ times to solve these systems,
one can use an LU factorization of A: first solve Ly = bi for i = 1, 2, ..., ℓ
to get solutions yi, and then the solutions of Ax = bi are just those of Ux = yi. From
an algorithmic point of view, the method based on the LU factorization of A is much
more efficient than doing the Gaussian elimination repeatedly, in particular when ℓ
is large.
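As an illustration, the following routine is a minimal sketch of an LU factorization followed
by the two triangular solves Ly = b and Ux = y. It assumes that no row interchanges are
needed and that all pivots are nonzero; the test matrix is the symmetric tridiagonal matrix
of Problem 1.24 below, solved for two right-hand sides with a single factorization.

    import numpy as np

    def lu_factor(A):
        """Doolittle LU factorization without row interchanges (assumes nonzero pivots)."""
        n = A.shape[0]
        L, U = np.eye(n), A.astype(float).copy()
        for i in range(n):
            for j in range(i + 1, n):
                L[j, i] = U[j, i] / U[i, i]     # multiplier stored in L
                U[j] = U[j] - L[j, i] * U[i]    # eliminate entry (j, i)
        return L, U

    def lu_solve(L, U, b):
        n = len(b)
        y = np.zeros(n)
        for i in range(n):                      # Step 1: forward substitution, Ly = b
            y[i] = b[i] - L[i, :i] @ y[:i]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):          # Step 2: back substitution, Ux = y
            x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x

    A = np.array([[1, -1, 0], [-1, 2, -1], [0, -1, 2]])
    L, U = lu_factor(A)
    for b in ([1, 1, 1], [2, 0, -1]):
        print(lu_solve(L, U, np.array(b, dtype=float)))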
Problem 1.24 Determine an LU factorization of the matrix

    A = [  1 -1  0 ]
        [ -1  2 -1 ],
        [  0 -1  2 ]

from which solve Ax = b for (1) b = [1 1 1]^T and (2) b = [2 0 -1]^T.

Problem 1.25 Let A and B be two lower triangular matrices. Prove that
(1) their product AB is also a lower triangular matrix;
(2) if A is invertible , then its inverse is also a lower triangular matrix ;
(3) if the diagonal entries of A and B are all 1 's, then the same holds for their product AB
and their inverses.

Note that the same holds for upper triangular matrices, and for the product of more than two
matrices .

The matrix U in the decomposition A = LU of A can further be factored as the
product U = DŪ, where D is a diagonal matrix whose diagonal entries are the pivots
of U or zeros, and Ū is a row-echelon form of A with leading 1's, so that A = LDŪ.
For example,

    A = [ 1  0  0  0 ] [ d1  *  *  *  * ]
        [ *  1  0  0 ] [  0 d2  *  *  * ]
        [ *  *  1  0 ] [  0  0  0 d3  * ]  = LU
        [ *  *  *  1 ] [  0  0  0  0  0 ]

      = [ 1  0  0  0 ] [ d1  0  0  0 ] [ 1  *  *  *  * ]
        [ *  1  0  0 ] [  0 d2  0  0 ] [ 0  1  *  *  * ]
        [ *  *  1  0 ] [  0  0 d3  0 ] [ 0  0  0  1  * ]  = LDŪ.
        [ *  *  *  1 ] [  0  0  0  0 ] [ 0  0  0  0  0 ]

For notational convenience, we replace Ū again by U and write A = LDU. This
decomposition of A is called an LDU factorization or an LDU decomposition of
A.
For example, the matrix A in Example 1.16 was factored as

    A = [  1  0  0 ] [ 2  1  1  0 ]
        [  2  1  0 ] [ 0 -1 -2  1 ]  = LU.
        [ -1 -3  1 ] [ 0  0 -4  4 ]

It can be further factored as A = LDU by taking

    [ 2  1  1  0 ]   [ 2  0  0 ] [ 1  1/2  1/2   0 ]
    [ 0 -1 -2  1 ] = [ 0 -1  0 ] [ 0   1    2   -1 ]  = DU.
    [ 0  0 -4  4 ]   [ 0  0 -4 ] [ 0   0    1   -1 ]
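Concretely, once an LU factorization with nonzero pivots is available, D and the new U are
easy to read off: D collects the pivots and each row of U is divided by its pivot. The lines
below are a minimal sketch of this step for the U of Example 1.16.

    import numpy as np

    # U from the LU factorization of the matrix A in Example 1.16.
    U = np.array([[2.0,  1.0,  1.0,  0.0],
                  [0.0, -1.0, -2.0,  1.0],
                  [0.0,  0.0, -4.0,  4.0]])

    d = np.diag(U)            # the pivots 2, -1, -4
    D = np.diag(d)            # diagonal matrix of pivots
    U_bar = U / d[:, None]    # divide each row by its pivot -> leading 1's

    assert np.allclose(D @ U_bar, U)   # U = D * U_bar, hence A = L D U_bar
    print(D)
    print(U_bar)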

The LDU factorization of a matrix A is always possible when no row interchange


is needed in the forward elimination process . In general, if a permutation matrix for
a row interchange is necessary in the forward elimination process , then an LDU
factorization may not be possible.

Example 1.17 (The LDU factorization cannot exist) Consider the matrix

    A = [ 0  1 ]
        [ 1  0 ].

For forward elimination, it is necessary to interchange the first row with the second
row. Without this interchange, A has no LU or LDU factorization. In fact, one can
show that it cannot be expressed as a product of any lower triangular matrix L and
any upper triangular matrix U. □
Suppose now that a row interchange is necessary during the forward elimination
on the augmented matrix [A b]. In this case, one can first do all the row interchanges
before doing any other type of elementary row operations, since the interchange of
rows can be done at any time, before or after the other elementary operations, with the
same effect on the solution . Those 'row-interchanging' elementary matrices altogether
form a permutation matrix P so that no more row interchanges are needed during the
forward elimination on PA. Now, the matrix PA can have an LDU factorization.

Example 1.18 (LDU factorization after a row interchange) Consider a matrix A whose
forward elimination requires interchanging one of its rows with the third row. Multiplying
A by the corresponding permutation matrix P produces a matrix PA whose forward
elimination needs no further row interchanges, and PA then has a factorization
PA = LDU. Note that U is a row-echelon form of the matrix A. □

Of course, if we choose a different permutation matrix P', then the LDU factorization of
P'A may be different from that of PA, even if there is another permutation matrix P''
that changes P'A to PA. Moreover, as the following example shows, even if a per-
mutation matrix is not necessary in the Gaussian elimination, the LDU factorization
of A need not be unique.

Example 1.19 (Infinitely many LDU factorizations) The matrix

    B = [ 1  1  0 ]
        [ 1  3  0 ]
        [ 0  0  0 ]

has the LDU factorization

    B = [ 1  0  0 ] [ 1  0  0 ] [ 1  1  0 ]
        [ 1  1  0 ] [ 0  2  0 ] [ 0  1  0 ]  = LDU
        [ 0  0  1 ] [ 0  0  0 ] [ 0  0  x ]

for any value x. It shows that a singular matrix B has infinitely many LDU factor-
izations. □

However, if the matrix A is invertible and if the permutation matrix P is fixed


when it is necessary, then the matrix PA has a unique LDU factorization.

Theorem 1.12 Let A be an invertible matrix. Then for a fixed suitable permutation
matrix P, the matrix PA has a unique LDU factorization.

Proof: Suppose that PA = L1D1U1 = L2D2U2, where the L's are lower triangular,
the U's are upper triangular whose diagonal entries are all 1's, and the D's are diagonal
matrices with no zeros on the diagonal. One needs to show L1 = L2, D1 = D2, and
U1 = U2 for the uniqueness.
Note that the inverse of a lower triangular matrix is also lower triangular, and the
inverse of an upper triangular matrix is also upper triangular. And the inverse of a
diagonal matrix is also diagonal. Therefore, by multiplying (L1D1)^{-1} = D1^{-1}L1^{-1}
on the left and U2^{-1} on the right, the equation L1D1U1 = L2D2U2 becomes

    U1U2^{-1} = D1^{-1}L1^{-1}L2D2.

The left-hand side is upper triangular, while the right-hand side is lower triangular.
Hence, both sides must be diagonal. However, since the diagonal entries of the upper
triangular matrix U1U2^{-1} are all 1's, it must be the identity matrix I (see Problem
1.25). Thus U1U2^{-1} = I, i.e., U1 = U2. Similarly, L1^{-1}L2 = D1D2^{-1} implies that
L1 = L2 and D1 = D2. □

In particular, if an invertible matrix A is symmetric (i.e., A = A^T), and if it can
be factored into A = LDU without row interchanges, then we have

    A = A^T = (LDU)^T = U^T D L^T,

and thus, by the uniqueness of the factorization, we have U = L^T and A = LDL^T.

Problem 1.26 Find the factors L, D, and U for

    A = [  2 -1  0 ]
        [ -1  2 -1 ].
        [  0 -1  2 ]

What is the solution to Ax = b for b = [1 0 -1]^T?

probl[em/2; ~o]r all possible permutation matrices P, find the LDU factorization of PA for

A= 2 4 2 .
111

1.9 Applications

1.9.1 Cryptography

Cryptography is the study of sending messages in disguised form (secret codes) so that
only the intended recipients can remove the disguise and read the message; modern
cryptography uses advanced mathematics. As an application of invertible matrices,
we introduce a simple coding. Suppose we associate a prescribed number with every
letter in the alphabet; for example,

    A  B  C  D  ···  X   Y   Z   Blank  ?
    ↓  ↓  ↓  ↓       ↓   ↓   ↓     ↓    ↓
    0  1  2  3  ···  23  24  25    26   27.

Suppose that we want to send the message "GOOD LUCK." Replace this message
by

    6, 14, 14, 3, 26, 11, 20, 2, 10

according to the preceding substitution scheme. To use a matrix technique, we first
break the message into three vectors in ℝ3, each with three components, by adding
extra blanks if necessary:

    [  6 ]    [  3 ]    [ 20 ]
    [ 14 ],   [ 26 ],   [  2 ].
    [ 14 ]    [ 11 ]    [ 10 ]

Next, choose a nonsingular 3 x 3 matrix A, say

    A = [ 1  0  0 ]
        [ 2  1  0 ],
        [ 1  1  1 ]

which is supposed to be known to both sender and receiver. Then, as a matrix multi-
plication, A translates our message into

    A(6, 14, 14)^T = (6, 26, 34)^T,   A(3, 26, 11)^T = (3, 32, 40)^T,   A(20, 2, 10)^T = (20, 42, 32)^T.

By putting the components of the resulting vectors consecutively, we transmit

    6, 26, 34, 3, 32, 40, 20, 42, 32.

To decode a message, the receiver may reverse the process. Suppose
that we received the following reply from our correspondent:

    19, 45, 26, 13, 36, 41.

To decode it, first break the message into two vectors in ℝ3 as before:

    [ 19 ]    [ 13 ]
    [ 45 ],   [ 36 ].
    [ 26 ]    [ 41 ]

We want to find two vectors x1, x2 such that Axi is the i-th vector of the above two
vectors: i.e.,

    Ax1 = (19, 45, 26)^T,   Ax2 = (13, 36, 41)^T.

Since A is invertible, the vectors x1, x2 can be found by multiplying the inverse of A
to the two vectors given in the message. By an easy computation, one can find

    A^{-1} = [  1  0  0 ]
             [ -2  1  0 ].
             [  1 -1  1 ]

Therefore,

    x1 = A^{-1}(19, 45, 26)^T = (19, 7, 0)^T,   x2 = A^{-1}(13, 36, 41)^T = (13, 10, 18)^T.

The numbers one obtains are

    19, 7, 0, 13, 10, 18.

Using our correspondence between letters and numbers, the message we have received
is "THANKS."

Problem 1.28 Encode ''TAKEUFO" using the same matrix A used in the above example.

1.9.2 Electrical network

In an electrical network, a simple current flow may be illustrated by a diagram like the
one below. Such a network involves only voltage sources , like batteries, and resistors,
like bulbs , motors, or refrigerators. The voltage is measured in volts, the resistance in
ohms, and the current flow in amperes (amps, in short). For such an electrical network,
current flow is governed by the following three laws:

Ohm's Law: The voltage drop V across a resistor is the product of the current I
and the resistance R: V = I R.
Kirchhoff's Current Law (KCL): The current flow into a node equals the current
flow out of the node.
Kirchhoff's Voltage Law (KVL): The algebraic sum of the voltage drops around
a closed loop equals the total voltage sources in the loop .

Example 1.20 Determine the currents in the network given in Figure 1.4.

Figure 1.4. A circuit network


Figure 1.5. Two circuit networks

Solution: By applying KCL to nodes P and Q, we get the equations

    I1 + I3 = I2   at P,
    I2 = I1 + I3   at Q.

Observe that both equations are the same, and one of them is redundant. By applying
KVL to each of the loops in the network in the clockwise direction, we get

    6I1 + 2I2 = 0     from the left loop,
    2I2 + 3I3 = 18    from the right loop.

Collecting all the equations, we get a system of linear equations:

    I1 -  I2 +  I3 =  0
   6I1 + 2I2       =  0
         2I2 + 3I3 = 18.

By solving it, the currents are I1 = -1 amp, I2 = 3 amps and I3 = 4 amps. The
negative sign for I1 means that the current I1 flows in the direction opposite to that
shown in the figure. □
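Here is a minimal numerical check of this solution, solving the 3 x 3 system above with NumPy:

    import numpy as np

    # Coefficient matrix for the unknowns (I1, I2, I3) and the right-hand side.
    A = np.array([[1, -1, 1],     # KCL at node P:   I1 - I2 + I3 = 0
                  [6,  2, 0],     # KVL, left loop:  6*I1 + 2*I2 = 0
                  [0,  2, 3]])    # KVL, right loop: 2*I2 + 3*I3 = 18
    b = np.array([0, 0, 18])

    I1, I2, I3 = np.linalg.solve(A, b)
    print(I1, I2, I3)             # -1.0 3.0 4.0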

Problem 1.29 Determine the currents in the networks given in Figure 1.5.

1.9.3 Leontief model

Another significant application of linear algebra is to a mathematical model in
economics. In most nations, an economic society may be divided into many sectors that
produce goods or services, such as the automobile industry, oil industry, steel industry,
communication industry, and so on. Then a fundamental problem in economics is to


find the equilibrium of the supply and the demand in the economy.
There are two kinds of demands for the goods: the intermediate demand from the
industries themselves (or the sectors) that are needed as inputs for their own pro-
duction, and the extra demand from the consumer, the governmental use, surplus
production, or exports. Practically, the interrelation between the sectors is very com-
plicated, and the connection between the extra demand and the production is unclear.
A natural question is whether there is a production level such that the total amounts
produced (or supply) will exactly balance the total demand for the production, so that
the equality

    {Total output} = {Total demand}
                   = {Intermediate demand} + {Extra demand}

holds. This problem can be described by a system of linear equations, which is called
the Leontief Input-Output Model. To illustrate this, we show a simple example.
Suppose that a nation's economy consists of three sectors: I1 = automobile in-
dustry, I2 = steel industry, and I3 = oil industry.
Let x = [x1 x2 x3]^T denote the production vector (or production level) in ℝ3,
where each entry xi denotes the total amount (in a common unit such as 'dollars'
rather than quantities such as 'tons' or 'gallons') of the output that the industry Ii
produces per year.
The intermediate demand may be explained as follows. Suppose that, for the total
output x2 units of the steel industry I2, 20% is contributed by the output of I1, 40%
by that of I2 and 20% by that of I3. Then we can write this as a column vector, called
the unit consumption vector of I2:

    c2 = [ 0.2 ]
         [ 0.4 ].
         [ 0.2 ]

For example, if I2 decides to produce 100 units per year, then it will order (or demand)
20 units from I1, 40 units from I2 and 20 units from I3: i.e., the consumption vector
of I2 for the production of x2 = 100 units can be written as a column vector: 100c2 =
[20 40 20]^T. From the concept of the consumption vector, it is clear that the sum of
the decimal fractions in the column c2 must be ≤ 1.
In our example, suppose that the demands (inputs) of the outputs are given by the
following matrix, called an input-output matrix:

                            output
                       I1     I2     I3
                I1  [ 0.3    0.2    0.3 ]
    A = input   I2  [ 0.1    0.4    0.1 ]
                I3  [ 0.3    0.2    0.3 ]
                      ↑      ↑      ↑
                      c1     c2     c3
1.9.3. Application: Leontiefmodel 39

In this matrix, an industry looks down a column to see how much it needs from
where to produce its total output, and it looks across a row to see how much of its
output goes to where. For example, the second row says that, as the intermediate
demand for the output of the steel industry I2, the automobile industry I1 demands
10% of its output x1, the steel industry I2 demands 40% of its output x2, and the oil
industry I3 demands 10% of its output x3. Therefore, it is now easy to see that the
intermediate demand of the economy can be written as

    Ax = [ 0.3  0.2  0.3 ] [ x1 ]   [ 0.3x1 + 0.2x2 + 0.3x3 ]
         [ 0.1  0.4  0.1 ] [ x2 ] = [ 0.1x1 + 0.4x2 + 0.1x3 ].
         [ 0.3  0.2  0.3 ] [ x3 ]   [ 0.3x1 + 0.2x2 + 0.3x3 ]

Suppose that the extra demand in our example is given by d = [d1, d2, d3]^T =
[30, 20, 10]^T. Then the problem for this economy is to find the production vector x
satisfying the following equation:

    x = Ax + d.

Another form of the equation is (I - A)x = d, where the matrix I - A is called the
Leontief matrix. If I - A is not invertible, then the equation may have no solution
or infinitely many solutions depending on what d is. If I - A is invertible, then the
equation has the unique solution x = (I - A)^{-1}d. Now, our example can be written
as

    [ x1 ]   [ 0.3  0.2  0.3 ] [ x1 ]   [ 30 ]
    [ x2 ] = [ 0.1  0.4  0.1 ] [ x2 ] + [ 20 ].
    [ x3 ]   [ 0.3  0.2  0.3 ] [ x3 ]   [ 10 ]

In this example, it turns out that the matrix I - A is invertible and

    (I - A)^{-1} = [ 2.0  1.0  1.0 ]
                   [ 0.5  2.0  0.5 ].
                   [ 1.0  1.0  2.0 ]

Therefore,

    x = (I - A)^{-1}d = [ 2.0  1.0  1.0 ] [ 30 ]   [ 90 ]
                        [ 0.5  2.0  0.5 ] [ 20 ] = [ 60 ],
                        [ 1.0  1.0  2.0 ] [ 10 ]   [ 70 ]

which gives the total amount of product xi of the industry Ii for one year to meet the
required demand.

Remark: (1) Under the usual circumstances, the sum of the entries in a column of the
consumption matrix A is less than one because a sector should require less than one
unit's worth of inputs to produce one unit of output. This actually implies that I - A
is invertible and the production vector x is feasible in the sense that the entries in x
are all nonnegative, as the following argument shows.
(2) In general, by using induction one can easily verify that for any k = 1, 2, ...,

    (I - A)(I + A + ··· + A^k) = I - A^{k+1}.

If the sums of the column entries of A are all strictly less than one, then lim_{k→∞} A^k = 0
(see Section 6.4 for the limit of a sequence of matrices). Thus, we get (I - A)(I +
A + ··· + A^k + ···) = I, that is,

    (I - A)^{-1} = I + A + ··· + A^k + ··· .

This also shows a practical way of computing (I - A)^{-1}, since by taking k sufficiently
large the right-hand side may be made very close to (I - A)^{-1}. In Chapter 6, an easier
method of computing A^k will be shown.
In summary, if A and d have nonnegative entries and if the sum of the entries of
each column of A is less than one, then I - A is invertible and the inverse is given by
the above formula. Moreover, as the formula shows, the entries of the inverse are all
nonnegative, and so are those of the production vector x = (I - A)^{-1}d.
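The computation for the example above, both by solving with the Leontief matrix directly and
by truncating the series I + A + A^2 + ···, can be sketched as follows (50 terms is an
arbitrary truncation chosen only for illustration):

    import numpy as np

    A = np.array([[0.3, 0.2, 0.3],
                  [0.1, 0.4, 0.1],
                  [0.3, 0.2, 0.3]])      # input-output matrix of the example
    d = np.array([30.0, 20.0, 10.0])     # extra demand

    I = np.eye(3)
    x = np.linalg.solve(I - A, d)        # production level: [90. 60. 70.]
    print(x)

    # Approximating (I - A)^{-1} d by the partial sums d + A d + A^2 d + ...
    approx, term = np.zeros(3), d.copy()
    for _ in range(50):
        approx += term
        term = A @ term                  # next term A^k d
    print(approx)                        # converges to [90. 60. 70.]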
Problem 1.30 Determine the total demand for industries I1, I2 and I3 for the input-output
matrix A and the extra demand vector d given below:

    A = [ 0.1  0.7  0.2 ]
        [ 0.5  0.1  0.6 ]      with d = 0.
        [ 0.4  0.2  0.2 ]

Problem 1.31 Suppose that an economy is divided into three sectors: I1 = services, I2 =
manufacturing industries, and I3 = agriculture. For each unit of output, I1 demands no services
from I1, 0.4 units from I2, and 0.5 units from I3. For each unit of output, I2 requires 0.1 units
from sector I1 of services, 0.7 units from other parts in sector I2, and no product from sector
I3. For each unit of output, I3 demands 0.8 units of services from I1, 0.1 units of manufacturing
products from I2, and 0.1 units of its own output from I3. Determine the production level to
balance the economy when 90 units of services, 10 units of manufacturing, and 30 units of
agriculture are required as the extra demand.

1.10 Exercises
1.1. Which of the following matrices are in row-echelon form or in reduced row-echelon form?

A~U
0 0 0
0 1 0
-3 ]
4
h[l n- 0
O
0
0
1
0
0
0
1

-n D=[l
0 0 1 2
0 0 0
0 0 0

c~u
1 0 0

~l
0 1 2
0 1 1

nF=[l
0 0 1
0 0 1
0 0 0

-H
0 0 0

E=[l
1 0 0
1 0 0
0 1 0
0 0 1
1 0 -2
0 0 0

1.2. Find a row-echelon form of each matrix.

[I -3 212]
H
2 3 4

[~
3 4 5
(1) 3 -9 10 2 9 (2) 4 5 1
2 -6 4 2 4 '
5 1 2
2 -6 8 1 7
1 2 3

1.3. Find the reduced row-echelon form of the matrices in Exercise 1.2.
1.4. Solve the systems of equations by Gauss-Jordan elimination.

(1) 12;~ ~ ;~
3XI + 2X2
: X3
;~ +;:
X4
-~
I

I~;
Xl + X2 + 3X3 3X4 -8 .

(2) ;~ + z 1~
2x + 4z 1.
What are the pivots in each of 3rd kind elementary operations?
1.5. Which of the following systems has a nontrivial solution?

    (1)  x + 2y + 3z = 0         (2)  2x +  y -  z = 0
             2y + 2z = 0               x - 2y - 3z = 0
         x + 2y + 3z = 0.             3x +  y - 2z = 0.
1.6. Determine all values of the bi that make the following system consistent:

        x + y - z = b1
            2y + z = b2
             y - z = b3.

1.7. Determine the condition on the bi so that the following system has no solution:

        2x +  y +  7z = b1
        6x - 2y + 11z = b2
        2x -  y +  3z = b3.

1.8. Let A and B be matrices of the same size.


(I) Show that, if Ax = 0 for all x, then A is the zero matrix.
(2) Show that, if Ax = Bx for all x, then A = B.
1.9. Compute ABC and CAB for

A~[i -~ :j. B~[jl C~[I -I]


1.10. Prove that if A is a 3 x 3 matrix such that AB = BA for every 3 x 3 matrix B , then
A = clJ for some constant c.
1.11. Let A = [~ ~ ~]. Find Ak for all integers k.
001

1.12. Compute (2A - B)C and CC T for

A = [~ ~ ~] , B = [-; ~ ~] , C =[ ~ ~ ~] .
I 0 I 0 0 1 -2 2 I
1.13. Let f(x) =anxn + an_IX n- 1 + ...+ alx + ao be a polynomial. For any square matrix
A, a matrix polynomial f (A) is defined as

n
f(A) = anAn + an_lAn-I + ...+ alA + aol.
For f(x) = 3x 3 + x 2 - 2x + 3, find f(A) for

(I)A~[ -i ~ (2)A~U -~-n


1.14. Find the symmetric part and the skew-symmetric part of each of the following matrices .

(I) A =[ ; ;
-1 3 2
~] , (2) A = [~
0 0
; -1].
3

1.15. Find AA T and A T A for the matrix A = [; - ~ ~ i].


2 8 4 0

1.16. Let A -I = [~ ~ ~] .
U:J.
421

(I) Flod. matrix B such that AB ~


(2) Find a matrix C such that AC = A2 + A.

1.17. Find all possible choices of a, band c so that A = [; ~] has an inverse matrix such
that A-I = A.
1.18. Decide whether or not each of the following matrices is invertible. Find the inverses for
invertible ones.

A=[~o ~ ~ :], B= [ 011I]


2 3 ,
5 5 I
; -;].
2 I
0 0 4
1.19. Find the inverse of each of the following matrices:

A=[=~ -~ ;]'B=[~ ~ ~ ~],c=[l ~ ~ ~](k=l=O)'


6411 1248 OOlk
1.20. Suppose A is a 2 x 1 matrix and B is a 1 x 2 matrix. Prove that the product AB is not
invertible.

1.21. Find three matrices which are row equivalent to A ~ [~ -


3
2
4 ]
-I .
-3 4

1.22. Write the following systems of equations as matrix equations Ax = b and solve them by

I
computing A -I b:
2x1 - X2 + 2
3X3 = XI - X2 + X3 5
= XI + =
(1)
1 X2 -
2x1 + X2 - 2X3
5 (2)
4X3
7, = 4xI
X2
3X2 + 2x3
X3

1.23. Find the LDU factorization for each of the following matrices:
=
-1
-3.

(1) A = [~ ~ J. (2) A = [~ ~ l
U~ H
1.24. Find the LDL T factorization of the following symmetric matrices :

(1) A ~ (2) A ~ [: n
1.25. Solve Ax = b with A = LU , where Land U are given as
L = [ .: ~ ~], U = [~ -~ -~] , b = [ -; ] .

n b~Ul
o -1 I 0 0 1 4
Forward elimination is the same as Lc = b, and back-substitution is Ux = c.

1~ l<tA~U : ,00

(1) Solve Ax = b by Gauss-Jordan elimination.


(2) Find the LDU factorizat ion of A.
(3) Write A as a product of elementary matrice s.
(4) Find the inverse of A.
1.27. A square matrix A is said to be nilpotent if A k = 0 for a positive integer k.
(1) Show that any invertible matrix is not nilpotent.
(2) Show that any triangular matrix with zero diagonal is nilpoten t.
(3) Show that if A is nilpotent with A^k = 0, then I - A is invertible with inverse
I + A + ··· + A^{k-1}.
1.28 . A square matrix A is said to be idempotent if A 2 = A.
(l) Find an example of an idempotent matrix other than 0 or I .
(2) Show that, if a matrix A is both idempotent and invertible, then A = I.
1.29. Determine whether the following statements are true or false, in general, and justify your
answers.
(1) Let A and B be row-equivalent square matrices. Then A is invertible if and only if B
is invertible .
(2) Let A be a square matrix such that AA = A. Then A is the identity.
(3) If A and B are invertible matrices such that A 2 = I and B2 = I, then (A B) -I = BA.
(4) If A and B are invertible matrices, A + B is also invertible .
(5) If A, Band AB are symmetric, then AB = BA .
(6) If A and B are symmetric and of the same size, then AB is also symmetric.

(7) If A is invertible and symmetric, then A^{-1} is also symmetric.


(8) Let AB T = I . Then A is invertible if and only if B is invertible .
(9) If a square matrix A is not invertible, then neither is AB for any B.
(10) If E\ and E2 are elementary matrices , then E\E2 = E2E\ .
(11) The inverse of an invertible upper triangular matrix is upper triangular.
(12) Any invertible matrix A can be written as A = LU, where L is lower triangular and
U is upper triangular.
2
Determinants

2.1 Basic properties of the determinant


Our primary interest in Chapter 1 was in the solvability or finding solutions of a
system Ax = b of linear equations . For an invertible matrix A, Theorem 1.9 shows
that the system has a unique solution x = A-I b for any b.
Now the question is how to determine whether or not a square matrix A is invert-
ible. In this chapter, we introduce the notion of determinant as a real-valued function
of square matrices that satisfies certain axiomatic rules, and then show that a square
matrix A is invertible if and only if the determinant of A is not zero. In fact, it was
shown in Chapter 1 that a 2 x 2 matrix

    A = [ a  b ]
        [ c  d ]

is invertible if and only if ad - bc ≠ 0. This number is called the determinant of A,
written det A, and is defined formally as follows:

Definition 2.1 For a 2 x 2 matrix

    A = [ a  b ]
        [ c  d ]

in M2x2(ℝ), the determinant of A is defined as det A = ad - bc.

Geometrically, it turns out that the determinant of a 2 x 2 matrix A represents, up


to sign, the area of a parallelogram in the xy-plane whose edges are constructed
by the row vectors of A (see Theorem 2.10). Naturally, one can expect to define
a determinant function on higher order square matrices so that it has a geometric
interpretation similar to the 2 x 2 case . However, the formula itself in Definition 2.1
does not provide any clue of how to extend this idea of determinant to higher order
matrices. Hence, we first examine some fundamental properties of the determinant
function defined in Definition 2.1.
By a direct computation, one can easily verify that the function det in Definition 2.1
satisfies the following lemma.

Lemma 2.1 (1) det [ 1  0 ] = 1.
                  [ 0  1 ]

(2) det [ c  d ] = - det [ a  b ].
        [ a  b ]         [ c  d ]

(3) det [ ka + la'   kb + lb' ] = k det [ a  b ] + l det [ a'  b' ].
        [     c          d    ]         [ c  d ]         [ c   d  ]

Proof: (2) det [ c  d ] = cb - da = -(ad - bc) = - det [ a  b ].
               [ a  b ]                                 [ c  d ]

(3) det [ ka + la'   kb + lb' ] = (ka + la')d - (kb + lb')c
        [     c          d    ]
                                = k(ad - bc) + l(a'd - b'c)

                                = k det [ a  b ] + l det [ a'  b' ].          □
                                        [ c  d ]         [ c   d  ]

In Lemma 2.5, it will be shown that if a function f : M2x2(ℝ) → ℝ satisfies the
properties (1)-(3) in Lemma 2.1, then it must be the function det defined in Definition
2.1, that is, f(A) = ad - bc. The properties (1)-(3) in Lemma 2.1 of the determinant
on M2x2(ℝ) enable us to define the determinant function for any square matrix.

Definition 2.2 A real-valued function f : Mnxn(ℝ) → ℝ of all n x n square matrices
is called a determinant if it satisfies the following three rules:
(R1) The value of f at the identity matrix is 1, i.e., f(In) = 1;
(R2) the value of f changes sign if any two rows are interchanged;
(R3) f is linear in the first row: that is, by definition,

      [ kr1 + lr1' ]       [ r1 ]       [ r1' ]
    f [     r2     ] = k f [ r2 ] + l f [ r2  ] ,
      [     ⋮      ]       [ ⋮  ]       [ ⋮   ]
      [     rn     ]       [ rn ]       [ rn  ]

where the ri's denote the row vectors [ai1 ... ain] of a matrix.

Remark: (1) To be familiar with the linearity rule (R3), note that all row vectors of
any n x n matrix belong to the set M1xn(ℝ), on which a matrix sum and a scalar
multiplication are well defined. A real-valued function f : M1xn(ℝ) → ℝ is said to be
linear if it preserves these two operations: that is, for any two vectors x, y ∈ M1xn(ℝ)
and scalar k,

    f(x + y) = f(x) + f(y)   and   f(kx) = kf(x),

or, equivalently, f(kx + ly) = kf(x) + lf(y). Such a linear function will be discussed
again in Chapter 4.
(2) The determinant is not defined for a non-square matrix .

It is already shown that the det on 2 x 2 matrices satisfies the rules (R1)-(R3). In the
next section, one can see that for each positive integer n there always exists a function
f : Mnxn(ℝ) → ℝ satisfying the three rules (R1)-(R3) and that such a function is unique
(existence and uniqueness). Therefore, we say 'the' determinant and designate it as
'det' for any order n.
Let us first derive some direct consequences of the rules (R1)-(R3).

Theorem 2.2 The determinant satisfies the following properties.


(1) The determinant is linear in each row.
(2) If A has either a zero row or two identical rows, then det A = O.
(3) The elementary row operation that adds a constant multiple ofone row to another
row leaves the determinant unchanged.

Proof: (1) Any row can be placed in the first row by interchanging rows, with a change
of sign in the determinant by the rule (R2); then use the linearity rule (R3) and apply
(R2) again by interchanging the same rows.
(2) If A has a zero row, then this row is zero times the zero row, so that det A = 0
by (1). If A has two identical rows, then interchanging those two identical rows does
not change the matrix itself, but det A = - det A by the rule (R2), so that det A = 0.
(3) By a direct computation using (1), one can get

        [    ⋮     ]       [ ⋮  ]         [ ⋮  ]
        [ ri + krj ]       [ ri ]         [ rj ]
    det [    ⋮     ] = det [ ⋮  ] + k det [ ⋮  ] ,
        [    rj    ]       [ rj ]         [ rj ]
        [    ⋮     ]       [ ⋮  ]         [ ⋮  ]

in which the second term on the right-hand side is zero by (2). □


The rule (R2) of the determinant function is said to be the alternating property,
and the property (1) in Theorem 2.2 is said to be multilinearity.
It is now easy to see the effect of elementary row operations on evaluations of the
determinant. The first elementary row operation that 'multiplies a row by a constant
k' changes the determinant to k times the determinant, by Theorem 2.2(1). The rule
(R2) explains the effect of the second elementary row operation that 'interchanges
two rows '. The third elementary row operation that 'adds a constant multiple of a row
to another' is explained in Theorem 2.2(3). In summary, one can see that

det(EA) = det E det A for any elementary matrix E.

For example, if E is the elementary matrix obtained from the identity matrix by
'multiplying a row by a constant k', then det(EA) = k det A and det E = k by
Theorem 2.2(1), so that det(EA) = det E det A. As a consequence, if two matrices
A and B are row-equivalent, then det A = k det B for some nonzero number k.

Example 2.1 Consider a matrix

    A = [   1      1      1   ]
        [   a      b      c   ].
        [ b + c  c + a  a + b ]

If one adds the second row to the third, then the third row becomes

    [ a + b + c   a + b + c   a + b + c ],

which is a scalar multiple of the first row. Thus, det A = 0. □

Problem 2.1 Show that, for an n x n matrix A and k ∈ ℝ, det(kA) = k^n det A.

Problem 2.2 Explain why det A = 0 for

    (1) A = [ a+1  a+4  a+7 ]
            [ a+2  a+5  a+8 ].
            [ a+3  a+6  a+9 ]

Recall that any square matrix can be transformed into an upper triangular matrix
by forward elimination, possibly with row interchanges. Further properties of the
determinant are obtained in the following theorem.

Theorem 2.3 The determinant satisfies the following properties.


(1) The determinant ofa triangular matrix is the product of the diagonal entries.
(2) The matrix A is invertible if and only if det A =1= O.
(3) For any two n x n matrices A and B, det(AB) = det A det B.
(4) detA T = detA.

Proof: (1) If A is a diagonal matrix, then it is clear that det A = a11 ··· ann by the
multilinearity in Theorem 2.2(1) and rule (R1). Suppose that A is a lower triangular
matrix. If A has a zero diagonal entry, then a forward elimination, which does not
change the determinant, produces a zero row, so that det A = 0. If A does not have
a zero diagonal entry, a forward elimination makes A row equivalent to the diagonal
matrix D whose diagonal entries are exactly those of A, so that det A = det D =
a11 ··· ann. Similar arguments can be applied to an upper triangular matrix.
(2) A square matrix A is row equivalent to an upper triangular matrix U through
a forward elimination, possibly with row interchanges: that is, A = PLU for some
permutation matrix P and a lower triangular matrix L whose diagonal entries are all
1's. Thus det A = ± det U, and the invertibility of U and A are equivalent. However,
U is invertible if and only if U has no zero diagonal entry by Corollary 1.10, which
is equivalent to det U ≠ 0 by (1).

(3) If A is not invertible, then neither is AB, and so det(AB) = 0 = det A det B.
If A is invertible, it can be written as a product of elementary matrices by Theorem
1.9, say A = E1E2 ··· Ek. Then by induction on k,

    det(AB) = det(E1E2 ··· EkB)
            = det E1 det E2 ··· det Ek det B
            = det(E1E2 ··· Ek) det B
            = det A det B.

(4) Clearly, A is not invertible if and only if A^T is not. Thus, for a singular matrix
A we have det A^T = 0 = det A. If A is invertible, then write it again as a product of
elementary matrices, say A = E1E2 ··· Ek. But det E = det E^T for any elementary
matrix E. In fact, if E is an elementary matrix obtained from the identity matrix by a
row interchange, then det E^T = -1 = det E by (R2), and all elementary matrices of
the other types are triangular, so that det E = det E^T. Hence, we have by (3)

    det A^T = det(E1E2 ··· Ek)^T
            = det(Ek^T ··· E2^T E1^T)
            = det Ek^T ··· det E2^T det E1^T
            = det Ek ··· det E2 det E1
            = det A.                                  □

Remark: From the equality det A = det A^T, one could define the determinant in
terms of columns instead of rows in Definition 2.2, and Theorem 2.2 is also true with
'columns' instead of 'rows'.
Example 2.2 (Computing det A by a forward elimination) Evaluate the determinant
of

    A = [ 2 -4  0  0 ]
        [ 1 -3  0  1 ]
        [ 1  0 -1  2 ].
        [ 3 -4  3 -1 ]

Solution: By using forward elimination, A can be transformed to an upper triangu-
lar matrix U. Since the forward elimination does not change the determinant, the
determinant of A is simply the product of the diagonal entries of U:

    det A = det U = det [ 2 -4  0  0 ]
                        [ 0 -1  0  1 ]  = 2 · (-1)^2 · 13 = 26.          □
                        [ 0  0 -1  4 ]
                        [ 0  0  0 13 ]
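This elimination-based evaluation is easy to implement. The following routine is a minimal
sketch that tracks the sign changes coming from row interchanges; it is not tuned for
numerical robustness, but it reproduces the value found in Example 2.2.

    import numpy as np

    def det_by_elimination(A):
        """Determinant via forward elimination with row interchanges."""
        U = A.astype(float).copy()
        n, sign = U.shape[0], 1.0
        for i in range(n):
            p = i + np.argmax(np.abs(U[i:, i]))   # choose a nonzero pivot row
            if U[p, i] == 0:
                return 0.0                        # zero column: A is singular
            if p != i:
                U[[i, p]] = U[[p, i]]             # a row interchange flips the sign
                sign = -sign
            for j in range(i + 1, n):
                U[j] -= (U[j, i] / U[i, i]) * U[i]
        return sign * np.prod(np.diag(U))         # product of the diagonal of U

    A = np.array([[2, -4, 0, 0],
                  [1, -3, 0, 1],
                  [1,  0, -1, 2],
                  [3, -4, 3, -1]])
    print(det_by_elimination(A))                  # 26.0 (up to rounding)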

Problem 2.3 Prove that if A is invertible, then det A-I = 1/ det A.



Problem 2.4 Evaluate the determinant of each of the following matrices:

1 4 2] [11 13 24
12 23
21 22 14] [ x13
(1) 3 1 1 , (2) 31 32 33 34 , (3) x 2
[
-2 2 3 41 42 43 44 x

2.2 Existence and uniqueness of the determinant


Throughout this section, we prove the following fundamental theorem for the deter-
minant.

Theorem 2.4 For any natural number n,


(1) (Existence) there exists a real-valued function f : Mnxn(ℝ) → ℝ which satisfies
the three rules (R1)-(R3) in Definition 2.2.
(2) (Uniqueness) Such a function is unique.

Clearly, it is true when n = 1, with conclusion det[a] = a.

For 2 x 2 matrices: When n = 2, the existence theorem comes from Lemma
2.1. The next lemma shows that any function f : M2x2(ℝ) → ℝ satisfying the three
rules (R1)-(R3) must be the det in Definition 2.1, which implies the uniqueness of
the determinant function on M2x2(ℝ).

Lemma 2.5 If a function f : M2x2(ℝ) → ℝ satisfies the rules (R1)-(R3), then

    f [ a  b ] = ad - bc.     That is, f(A) = det A.
      [ c  d ]

Proof: First, note that f [ 0  1 ] = -1 by the rules (R1) and (R2). Writing each row of A
                          [ 1  0 ]
as [a b] = [a 0] + [0 b] and [c d] = [c 0] + [0 d], we get

    f(A) = f [ a  b ] = f [ a  0 ] + f [ 0  b ]
             [ c  d ]     [ c  d ]     [ c  d ]

         = f [ a  0 ] + f [ a  0 ] + f [ 0  b ] + f [ 0  b ]
             [ c  0 ]     [ 0  d ]     [ c  0 ]     [ 0  d ]

         = 0 + ad - bc + 0 = ad - bc,

where the third and fourth equalities come from the multilinearity in Theorem 2.2(1).
□

For 3 x 3 matrices: For n = 3, the same process as in the case of n = 2 can
be applied. That is, by repeated use of the three rules (R1)-(R3) as in the proof of
Lemma 2.5, one can derive an explicit formula for det A of a matrix A = [aij] in
M3x3(ℝ) as follows:

    det [ a11 a12 a13 ]
        [ a21 a22 a23 ]
        [ a31 a32 a33 ]

      =  det [ a11  0   0  ] + det [ a11  0   0  ] + det [  0  a12  0  ]
             [  0  a22  0  ]       [  0   0  a23 ]       [ a21  0   0  ]
             [  0   0  a33 ]       [  0  a32  0  ]       [  0   0  a33 ]

       + det [  0  a12  0  ] + det [  0   0  a13 ] + det [  0   0  a13 ]
             [  0   0  a23 ]       [ a21  0   0  ]       [  0  a22  0  ]
             [ a31  0   0  ]       [  0  a32  0  ]       [ a31  0   0  ]

      =  a11a22a33 - a11a23a32 - a12a21a33 + a12a23a31 + a13a21a32 - a13a22a31.

The first equality is obtained by the multilinearity in Theorem 2.2(1): First, by


applying it to the first row with

    [a11 a12 a13] = [a11 0 0] + [0 a12 0] + [0 0 a13],

det A becomes the sum of determinants of three matrices . Observe that, in each of the
three matrices, the first row has just one entry from A and all others zero. Subsequently,
by applying the same multilinearity to the second and the third rows of each of the
three matrices, one gets the sum of the determinants of 33 = 27 matrices, each of
which has exactly three entries from A, one in each of three rows, and all other entries
zero. In each of those 27 matrices, if any two of the three entries of A are in the
same column, then the matrix contains a zero column so that its determinant is zero.
Consequently, the determinants of six matrices are left to get the first equality.
The second equality is just the computation of the six determinants by using the
rules (R2) and Theorem 2.3(1). In fact, in each of those six matrices, no two entries
from A are in the same row or in the same column, and thus one can take suitable
'column interchanges' to convert it to a diagonal matrix. Thus the determinant of each
of them is just the product of the three entries with a ± sign, which is determined by
the number of column interchanges.

Remark: The explicit formula for the determinant of a 3 x 3 matrix can easily be
memorized by the following scheme. Copy the first two columns and put them on
the right of the matrix , and compute the determinant by multiplying entries on six
diagonals with a + sign or a - sign as in Figure 2.1. This is known as Sarrus's method
for 3 x 3 matrices. It has no analogue for matrices of higher order n ≥ 4.

Figure 2.1. Sarrus's method
The computation of the explicit formula for det A shows that, if any real-valued
function f : M3x3(lR) -+ lR satisfies the rules (Rl)-(R3) , then f(A) = detA for
any matrix A = [aij] E M3x3(lR). This proves the uniqueness theorem when n = 3.
On the other hand, one can easily show that the given explicit formula for det A of
a matrix A E M3x3(lR) satisfies the three rules, which proves the existence when
n = 3. Therefore, for n = 3, it shows both the uniqueness and the existence of the
determinant function on M3x3(lR), which proves Theorem 2.4 when n = 3.
Problem 2.5 Show that the given explicit formula of the determinant for 3 x 3 matrices satisfies
the three rules (R1)-(R3).

Problem 2.6 Use Sarrus's methodto evaluate the determinants of

(1) A =[ 142] ,
3 1 1
-2 2 3
(2) A = [ 4-2~
1
-2
_2~] .

Now, a reader might have an idea how to prove the uniqueness and the existence of
the determinant function on Mn xn(R) for n > 3. If so, that reader may omit reading its
continued proof below and rather concentrate on understanding the explicit formula
of det A in Theorem 2.6.
For matrices of higher order n > 3: Again , we repeat the same procedure as for
the 3 x 3 matrices to get an explicit formula for det A of any square matrix A = [aij]
of order n.
(Step 1) Just as in the case for n = 3, use the multilinearity in each row of A to
get det A as the sum of the determinants of n^n matrices. Notice that each one of the n^n
matrices has exactly n entries from A, one in each of the n rows. However, if any two
of the n entries from A are in the same column, then it must have a zero column, so
that its determinant is zero and it can be neglected in the summation. Now, in each
remaining matrix, the n entries from A must be in different columns: that is, no two
of the n entries from A are in the same row or in the same column.
(Step 2) Now, we aim to estimate how many of them remain. From the observation
in Step 1, in each of the remaining matrices, those n entries from A are of the form

    a1i, a2j, a3k, ..., anl

with some column indices i, j, k, ..., l. Since no two of these n entries are in the
same column, the column indices i, j, k, ..., l are just a rearrangement of 1, 2, ..., n
without repetitions or omissions. It is not hard to see that there are just n! ways of such
rearrangements, so that n! matrices remain for further consideration. (Here n! =
n(n - 1) ··· 2 · 1, called n factorial.)

Remark: In fact, the n! remaining matrices can be constructed from the matrix A =
[aij] as follows: First, choose any one entry from the first row of A, say a1i, in the
i-th column. Then all the other n - 1 entries a2j, a3k, ..., anl should be taken from
the columns different from the i-th column. That is, they should be chosen from
the submatrix of A obtained by deleting the row and the column containing a1i. If
the second entry a2j is taken from the second row, then the third entry a3k should be
taken from the submatrix of A obtained by deleting the two rows and the two columns
containing a1i and a2j, and so on. Finally, if the first n - 1 entries a1i, a2j, ..., are
chosen, then there is no alternative choice for the last one anl, since it is the one
left after deleting n - 1 rows and n - 1 columns from A.
(Step 3) We now compute the determinant of each of the n! remaining matrices.
Since each of those matrices has just n entries from A so that no two of them are in
the same row or in the same column, one can convert it into a diagonal matrix by
'suitable' column interchanges. Then the determinant will be just the product of the
n entries from A with a ± sign, which will be determined by the number (actually
the parity) of the column interchanges. To determine the sign, let us once again look
back to the case of n = 3.

Example 2.3 (Convert intoa diagonal matrixby column interchanges) Suppose that
one of the six matrices is of the form:

    [  0   0  a13 ]
    [  0  a22  0  ].
    [ a31  0   0  ]

Then, one can convert this matrix into a diagonal matrix by interchanging the first
and the third columns. That is,

    det [  0   0  a13 ]         [ a13  0   0  ]
        [  0  a22  0  ] = - det [  0  a22  0  ].
        [ a31  0   0  ]         [  0   0  a31 ]
Note that a column interchange is the same as an interchange of the corresponding
column indices. Moreover, in each diagonal entry of a matrix, the row index must be
the same as its column index . Hence, to convert such a matrix into a diagonal matrix,
one has to convert the given arrangement of the column indices (in the example we
have 3, 2, 1) to the standard order 1, 2, 3 to be matched with the arrangement 1,2,3
of the row indices.
In this case, there may be several ways of column interchanges to convert the
given matrix to a diagonal matrix. For example, to convert the given arrangement 3,
2, 1 of the column indices to the standard order 1, 2, 3, one can take either just one
interchange, of 3 and 1, or three interchanges: 3 and 2, then 3 and 1, and then 2 and 1. In
either case the parity is odd, so that the "-" sign in the computation of the determinant
came from (-1)^1 = (-1)^3, where the exponents mean the numbers of interchanges
of the column indices. □

To formalize our discussions, we introduce a mathematical terminology for a


rearrangement of n objects:

Definition 2.3 A permutation of n objects is a one-to-one function from the set of


n objects onto itself.

In most cases, we use the set of integers Nn = {1, 2, ..., n} for a set of n objects.
A permutation σ of Nn assigns a number σ(i) in Nn to each number i in Nn, and this
permutation σ is usually denoted by

    σ = (σ(1), σ(2), ..., σ(n)) = (  1     2   ···   n   ).
                                  ( σ(1)  σ(2) ···  σ(n) )

Here, the first row is the usual lay-out of Nn as the domain set, and the second row
is the image set showing an arrangement of the numbers in Nn in a certain order
without repetitions or omissions. If Sn denotes the set of all permutations of Nn,
then, as mentioned previously, one can see that Sn has exactly n! permutations. For
example, S2 has 2 = 2!, S3 has 6 = 3!, and S4 has 24 = 4! permutations.

Definition 2.4 A permutation σ = (j1, j2, ..., jn) is said to have an inversion if
js > jt for s < t (i.e., a larger number precedes a smaller number).

For example, the permutation σ = (3, 1, 5, 4, 2) has five inversions, since 3 pre-
cedes 1 and 2; 5 precedes 4 and 2; and 4 precedes 2. Note that the identity (1, 2, ..., n)
is the only permutation without inversions.

Definition 2.5 A permutation is said to be even if it has an even number of inversions,
and it is said to be odd if it has an odd number of inversions. For a permutation σ in
Sn, the sign of σ is defined as

    sgn(σ) = {  1  if σ is an even permutation }  = (-1)^k,
             { -1  if σ is an odd permutation  }

where k is the number of inversions of σ.

For example, when n = 3, the permutations (1, 2, 3), (2, 3, 1) and (3, 1, 2)
are even, while the permutations (1, 3, 2), (2, 1, 3) and (3, 2, 1) are odd.
In general, one can convert a permutation σ = (σ(1), σ(2), ..., σ(n)) in Sn into
the identity permutation (1, 2, ..., n) by transposing each inversion of σ. However,
the number of necessary transpositions to convert the given permutation into the
identity permutation need not be unique, as shown in Example 2.3. An interesting
fact is that, even though the number of necessary transpositions is not unique, the
parity (even or odd) is always the same as that of the number of inversions. (This
may not be clear, and the readers are suggested to convince themselves with a couple
of examples.)
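For instance, counting inversions directly gives the sign of a permutation; the following helper
is a minimal sketch of Definition 2.5.

    def sgn(perm):
        """Sign of a permutation given as a tuple (sigma(1), ..., sigma(n))."""
        inversions = sum(1
                         for s in range(len(perm))
                         for t in range(s + 1, len(perm))
                         if perm[s] > perm[t])      # a larger number precedes a smaller one
        return 1 if inversions % 2 == 0 else -1

    print(sgn((3, 1, 5, 4, 2)))   # five inversions, so the permutation is odd: -1
    print(sgn((1, 2, 3, 4, 5)))   # no inversions: +1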
We now go back to Step 3 to compute the determinants of those remaining
n! matrices. Each of them has n entries of the form a1σ(1), a2σ(2), ..., anσ(n)
for a permutation σ ∈ Sn. Moreover, it can be converted into a diagonal ma-
trix by column interchanges corresponding to the inversions in the permutation
σ = (σ(1), σ(2), ..., σ(n)). Hence, its determinant is equal to

    sgn(σ) a1σ(1) a2σ(2) ··· anσ(n).

This is called a signed elementary product of A.

Our discussions can be summarized as follows to get an explicit formula for det A:

Theorem 2.6 For an n x n matrix A,

    det A = Σ_{σ∈Sn} sgn(σ) a1σ(1) a2σ(2) ··· anσ(n).

That is, det A is the sum of all signed elementary products of A.

This shows that the determinant must be unique if it exists. On the other hand, one
can show that the explicit formula for det A in Theorem 2.6 satisfies the three rules
(R1)-(R3). Therefore, we have both existence and uniqueness for the determinant
function of square matrices of any order n ≥ 1, which proves Theorem 2.4.
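Theorem 2.6 can be turned directly into a (very inefficient, n!-term) computation. The sketch
below sums the signed elementary products over all permutations and reproduces the value
found in Example 2.2.

    import numpy as np
    from itertools import permutations

    def sgn(perm):
        inversions = sum(1 for s in range(len(perm))
                           for t in range(s + 1, len(perm)) if perm[s] > perm[t])
        return 1 if inversions % 2 == 0 else -1

    def det_by_permutations(A):
        """Sum of all signed elementary products (Theorem 2.6); O(n * n!) work."""
        n = A.shape[0]
        total = 0.0
        for sigma in permutations(range(n)):       # sigma[i] is the column chosen in row i
            prod = sgn(sigma)
            for i in range(n):
                prod *= A[i, sigma[i]]
            total += prod
        return total

    A = np.array([[2, -4, 0, 0],
                  [1, -3, 0, 1],
                  [1,  0, -1, 2],
                  [3, -4, 3, -1]])
    print(det_by_permutations(A))    # 26.0, as in Example 2.2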

As the last part of this section, we add an example to demonstrate that any per-
mutation σ can be converted into the identity permutation by the same number of
transpositions as the number of inversions in σ.

Example 2.4 (Convert into the identity permutation by transpositions) Consider the
permutation σ = (3, 1, 5, 4, 2) in S5. It has five inversions, and it can be converted
to the identity permutation by composing five transpositions successively:

    σ = (3, 1, 5, 4, 2) → (1, 3, 5, 4, 2) → (1, 3, 5, 2, 4) → (1, 3, 2, 5, 4)
                        → (1, 2, 3, 5, 4) → (1, 2, 3, 4, 5).

It was done by moving the number 1 to the first position, and then 2 to the second
position by transpositions, and so on. In fact, the numbers interchanged in each of the
five steps determine the five transpositions used to convert σ into the identity
permutation, shown below:

    σ(2, 1, 3, 4, 5)(1, 2, 3, 5, 4)(1, 2, 4, 3, 5)(1, 3, 2, 4, 5)(1, 2, 3, 5, 4) = (1, 2, 3, 4, 5).

Here, two permutations are composed from right to left for notational convention: i.e.,
if we denote τ = (2, 1, 3, 4, 5), then στ = σ ∘ τ. For example, στ(2) = σ(1) = 3.
Also, note that σ can be converted to the identity permutation by composing the
following three transpositions successively:

    σ(2, 1, 3, 4, 5)(1, 5, 3, 4, 2)(1, 2, 5, 4, 3) = (1, 2, 3, 4, 5). □


It is not hard to see that the number of even permutations is equal to that of odd
permutations, so each is n!/2. In the case n = 3, one can notice that there are three
terms with + sign and three terms with − sign in det A.

Problem 2.7 Show that the number of even permutations and the number of odd permutations
in S_n are equal.

Problem 2.8 Let A = [c_1 ··· c_n] be an n × n matrix with column vectors c_j's. Show that

    det[c_j c_1 ··· c_{j−1} c_{j+1} ··· c_n] = (−1)^{j−1} det[c_1 ··· c_j ··· c_n].

Note that the same kind of equality holds when A is written in row vectors.

2.3 Cofactor expansion

Even if one has found an explicit formula for the determinant, as shown in Theorem 2.6,
it is not of much help in computing, because one has to sum up n! terms, which becomes
a very large number as n gets large. Thus, we reformulate the formula by rewriting it
in an inductive way, by which the amount of computation can be reduced.
The first factor a_{1σ(1)} in each of the n! terms is one of a_{11}, a_{12}, ..., a_{1n} in the first
row of A. Hence, one can divide the n! terms of the expansion of det A into n groups
according to the value of σ(1): say,

    det A = Σ_{σ∈S_n} sgn(σ) a_{1σ(1)} a_{2σ(2)} ··· a_{nσ(n)}
          = a_{11}A_{11} + a_{12}A_{12} + ··· + a_{1n}A_{1n},

where, for j = 1, 2, ..., n, A_{1j} is defined as

    A_{1j} = Σ_{σ∈S_n, σ(1)=j} sgn(σ) a_{2σ(2)} ··· a_{nσ(n)}.

This number A_{1j} will turn out to be the determinant, up to a sign, of the submatrix of
A obtained by deleting the first row and the j-th column. This submatrix will be denoted
by M_{1j} and called the minor of the entry a_{1j}.

Remark: If we replace the entries a_{11}, a_{12}, ..., a_{1n} in the first row of A by unknown
variables x_1, x_2, ..., x_n, then det A is a polynomial in the variables x_i, and the number
A_{1j} becomes the coefficient of the variable x_j in this polynomial.
We now aim to compute A_{1j} for j = 1, 2, ..., n. Clearly, when j = 1,

    A_{11} = Σ_{σ∈S_n, σ(1)=1} sgn(σ) a_{2σ(2)} ··· a_{nσ(n)} = Σ_τ sgn(τ) a_{2τ(2)} ··· a_{nτ(n)},

summing over all permutations τ of the numbers 2, 3, ..., n. Note that each term in
A_{11} contains no entries from the first row or from the first column of A. Hence, all
the (n − 1)! terms in the sum of A_{11} are just the signed elementary products of the
submatrix M_{11} of A obtained by deleting the first row and the first column of A, so
that
    A_{11} = det M_{11}.
To compute the number A_{1j} for j > 1, let A = [c_1 ··· c_n] with column vectors
c_j's and let B = [c_j c_1 ··· c_{j−1} c_{j+1} ··· c_n] be the matrix obtained from A by
interchanging the j-th column with its preceding j − 1 columns one by one up to the
first. Then, det A = (−1)^{j−1} det B (see Problem 2.8). Write

    det B = b_{11}B_{11} + b_{12}B_{12} + ··· + b_{1n}B_{1n}

as the expansion of det B. Then, a_{1j} = b_{11} and the number B_{11} is the coefficient of
the entry b_{11} in the formula of det B. By noting that A_{1j} is the coefficient of the entry a_{1j}
in the formula of det A, one can see that A_{1j} = (−1)^{j−1} B_{11}. Moreover, the minor M_{1j}
of the entry a_{1j} is the same as the minor N_{11} of the entry b_{11}. Now, by applying the
previous conclusion A_{11} = det M_{11} to the matrix B, one can obtain B_{11} = det N_{11}
and then

    A_{1j} = (−1)^{j−1} B_{11} = (−1)^{j−1} det N_{11} = (−1)^{j−1} det M_{1j}.

In summary, one can get an expansion of det A with respect to the first row:

    det A = a_{11}A_{11} + a_{12}A_{12} + ··· + a_{1n}A_{1n}
          = a_{11} det M_{11} − a_{12} det M_{12} + ··· + (−1)^{1+n} a_{1n} det M_{1n}.

This is called the cofactor expansion of det A along the first row.
There is a similar expansion with respect to any other row, say the i-th row. To
show this, first construct a new matrix C from A by moving the i-th row of A up
to the first row by interchanging it with its preceding i − 1 rows one by one. Then
det A = (−1)^{i−1} det C as before. Now, the expansion of det C with respect to the
first row [a_{i1} ··· a_{in}] is

    det C = a_{i1}C_{11} + a_{i2}C_{12} + ··· + a_{in}C_{1n},

where C_{1j} = (−1)^{j−1} det M̄_{1j} and M̄_{1j} denotes the minor of c_{1j} in the matrix C.
Noting that M̄_{1j} = M_{ij} as minors, we have

    A_{ij} = (−1)^{i−1} C_{1j} = (−1)^{i+j} det M_{ij}

as before, and then

    det A = a_{i1}A_{i1} + a_{i2}A_{i2} + ··· + a_{in}A_{in}.

The submatrix M_{ij} is called the minor of the entry a_{ij}, and the number A_{ij} =
(−1)^{i+j} det M_{ij} is called the cofactor of the entry a_{ij}.
Also, one can do the same with the column vectors because det Aᵀ = det A. This
gives the following theorem:

Theorem 2.7 Let A be an n × n matrix and let A_{ij} be the cofactor of the entry a_{ij}.
Then,
(1) for each 1 ≤ i ≤ n,

    det A = a_{i1}A_{i1} + a_{i2}A_{i2} + ··· + a_{in}A_{in},

called the cofactor expansion of det A along the i-th row.
(2) For each 1 ≤ j ≤ n,

    det A = a_{1j}A_{1j} + a_{2j}A_{2j} + ··· + a_{nj}A_{nj},

called the cofactor expansion of det A along the j-th column.

This cofactor expansion gives an alternative way of defining the determinant
inductively.
Remark: The sign (−1)^{i+j} of the cofactor A_{ij} forms a checkerboard pattern:

    [ +           −           +           ···   (−1)^{1+n} ]
    [ −           +           −           ···   (−1)^{2+n} ]
    [ +           −           +           ···   (−1)^{3+n} ]
    [ ⋮                                            ⋮       ]
    [ (−1)^{n+1}  (−1)^{n+2}  (−1)^{n+3}  ···   (−1)^{n+n} ]

Therefore, the determinant of an n × n matrix A is the sum of the products of the
entries in any given row (or any given column) with their cofactors.
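The following Python sketch (ours, not the book's) implements the cofactor expansion of Theorem 2.7 along the first row; it is a direct recursive translation of the formula, not an efficient algorithm.

    def det_cofactor(A):
        """Determinant by cofactor expansion along the first row (Theorem 2.7)."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # minor M_{1j}: delete row 1 and column j+1
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            cofactor = (-1) ** j * det_cofactor(minor)   # (-1)^(1+(j+1)) = (-1)^j
            total += A[0][j] * cofactor
        return total

    print(det_cofactor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0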

Example 2.5 (Computing det A by a cofactor expansion) Let

    A = [ 1  2  3 ]
        [ 4  5  6 ].
        [ 7  8  9 ]

Then the cofactors of a_{11}, a_{12} and a_{13} are

    A_{11} = (−1)^{1+1} det [ 5 6; 8 9 ] = 5·9 − 8·6 = −3,
    A_{12} = (−1)^{1+2} det [ 4 6; 7 9 ] = −(4·9 − 7·6) = 6,
    A_{13} = (−1)^{1+3} det [ 4 5; 7 8 ] = 4·8 − 7·5 = −3,

respectively. Hence the expansion of det A along the first row is

    det A = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} = 1·(−3) + 2·6 + 3·(−3) = 0.   □


The cofactor expansion formula for det A suggests that the evaluation of A_{ij} can be
avoided whenever a_{ij} = 0, because the product a_{ij}A_{ij} is zero regardless of the value
of A_{ij}. Therefore, the computation of the determinant is simplified by taking the
cofactor expansion along a row or a column that contains as many zero entries
as possible. Moreover, by using the elementary row (or column) operations that
do not alter the determinant, a matrix A may be simplified into another one having
more zero entries in a row or in a column, for which the computation of the determinant
is simpler. For example, a forward elimination applied to a square matrix A produces an
upper triangular matrix U, and so the determinant of A is just the product of the
diagonal entries of U, up to the sign caused by possible row interchanges. The next
examples illustrate this method for evaluating the determinant.

Example 2.6 (Computing det A by a forward elimination and a cofactor expansion)
Evaluate the determinant of

    A = [  1  −1   2  −1 ]
        [ −3   4   1  −1 ]
        [  2  −5  −3   8 ].
        [ −2   6  −4   1 ]

Solution: Apply the elementary operations

    3 × row 1 + row 2,
    (−2) × row 1 + row 3,
    2 × row 1 + row 4

to A; then

    det A = det [ 1  −1   2  −1 ]          [  1   7  −4 ]
                [ 0   1   7  −4 ]  =  det  [ −3  −7  10 ].
                [ 0  −3  −7  10 ]          [  4   0  −1 ]
                [ 0   4   0  −1 ]

Now apply the operation 1 × row 1 + row 2 to the matrix on the right-hand side,
and take the cofactor expansion along the second column to get

    det [  1   7  −4 ]          [  1   7  −4 ]
        [ −3  −7  10 ]  =  det  [ −2   0   6 ]  =  (−1)^{1+2} · 7 · det [ −2 6; 4 −1 ]
        [  4   0  −1 ]          [  4   0  −1 ]
                        =  −7(2 − 24)  =  154.

Thus, det A = 154.   □
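A small Python sketch (our own, using exact fractions to avoid rounding) of the elimination strategy just used: reduce A to upper triangular form by row operations, track row swaps, and multiply the pivots.

    from fractions import Fraction

    def det_by_elimination(A):
        """Reduce A to upper triangular form; det A = (+/-1) * product of pivots."""
        M = [[Fraction(x) for x in row] for row in A]
        n, sign = len(M), 1
        for k in range(n):
            # find a nonzero pivot in column k, swapping rows if necessary
            p = next((i for i in range(k, n) if M[i][k] != 0), None)
            if p is None:
                return Fraction(0)
            if p != k:
                M[k], M[p] = M[p], M[k]
                sign = -sign                    # a row interchange flips the sign
            for i in range(k + 1, n):
                factor = M[i][k] / M[k][k]
                for j in range(k, n):
                    M[i][j] -= factor * M[k][j]
        det = Fraction(sign)
        for k in range(n):
            det *= M[k][k]
        return det

    A = [[1, -1, 2, -1], [-3, 4, 1, -1], [2, -5, -3, 8], [-2, 6, -4, 1]]
    print(det_by_elimination(A))   # 154, as in Example 2.6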



Example 2.7 Show that det A = (x − y)(x − z)(x − w)(y − z)(y − w)(z − w) for
the Vandermonde matrix of order 4:

    A = [ 1  x  x²  x³ ]
        [ 1  y  y²  y³ ]
        [ 1  z  z²  z³ ].
        [ 1  w  w²  w³ ]

Solution: Use Gaussian elimination. To begin with, add (−1) × row 1 to rows 2, 3,
and 4 of A:

    det A = det [ 1    x      x²       x³      ]
                [ 0  y − x  y² − x²  y³ − x³ ]
                [ 0  z − x  z² − x²  z³ − x³ ]
                [ 0  w − x  w² − x²  w³ − x³ ]

          = det [ y − x  y² − x²  y³ − x³ ]
                [ z − x  z² − x²  z³ − x³ ]
                [ w − x  w² − x²  w³ − x³ ]

          = (y − x)(z − x)(w − x) det [ 1  y + x  y² + xy + x² ]
                                      [ 1  z + x  z² + xz + x² ]
                                      [ 1  w + x  w² + xw + x² ]

          = (x − y)(x − z)(w − x) det [ 1  y + x  y² + xy + x²        ]
                                      [ 0  z − y  (z − y)(z + y + x) ]
                                      [ 0  w − y  (w − y)(w + y + x) ]

          = (x − y)(x − z)(w − x) det [ z − y  (z − y)(z + y + x) ]
                                      [ w − y  (w − y)(w + y + x) ]

          = (x − y)(x − z)(x − w)(y − z)(w − y) det [ 1  z + y + x ]
                                                   [ 1  w + y + x ]

          = (x − y)(x − z)(x − w)(y − z)(y − w)(z − w).   □

Problem 2.9 Use cofactor expansions along a row or a column to evaluate the determinants of
the following matrices:

(1) A = [~ ~ ~ ~]
2 2 0 1 '
2 220

Problem 2.10 Evaluate the determinant of

    (2) B = [  0   a   b   c ]
            [ −a   0   d   e ]
            [ −b  −d   0   1 ].
            [ −c  −e  −1   0 ]

2.4 Cramer's rule


In Chapter 1, we have studied two ways of solving a system of linear equations
Ax = b: (i) by Gauss-Jordan elimination (or by an LDU factorization), or (ii) by using
A^{-1} if A is invertible. In this section, we introduce another method for solving the
system Ax = b for an invertible matrix A.
The cofactor expansion of the determinant gives a method for computing the
inverse of an invertible matrix A. For i ≠ j, let A* be the matrix A with the j-th row
replaced by the i-th row. Then the determinant of A* must be zero, because the i-th
and j-th rows are the same. Moreover, with respect to the j-th row, the cofactors of
A* are the same as those of A: that is, A*_{jk} = A_{jk} for all k = 1, ..., n. Therefore,
we have

    0 = det A* = a*_{j1}A*_{j1} + a*_{j2}A*_{j2} + ··· + a*_{jn}A*_{jn}
               = a_{i1}A_{j1} + a_{i2}A_{j2} + ··· + a_{in}A_{jn}.

This proves the following lemma.


Lemma 2.8 For an n × n matrix A,

    a_{i1}A_{j1} + a_{i2}A_{j2} + ··· + a_{in}A_{jn} = det A  if i = j,   and   = 0  if i ≠ j.
Definition 2.6 Let A be an n × n matrix and let A_{ij} denote the cofactor of a_{ij}. Then
the new matrix

    [ A_{11}  A_{12}  ···  A_{1n} ]
    [ A_{21}  A_{22}  ···  A_{2n} ]
    [   ⋮       ⋮     ⋱      ⋮    ]
    [ A_{n1}  A_{n2}  ···  A_{nn} ]

is called the matrix of cofactors of A. Its transpose is called the adjugate of A and
is denoted by adj A.

It follows from Lemma 2.8 that

    A adj A = [ det A    0    ···    0   ]
              [   0    det A  ···    0   ]  =  (det A) I.
              [   ⋮      ⋮    ⋱      ⋮   ]
              [   0      0    ···  det A ]

If A is invertible, then det A ≠ 0 and we may write A ((1/det A) adj A) = I. Thus

    A^{-1} = (1/det A) adj A   and   A = (det A) adj(A^{-1}),

the latter by replacing A with A^{-1}.

Example 2.8 (Computing A^{-1} with adj A) For a matrix A = [ a b; c d ], we have
adj A = [ d −b; −c a ], and if det A = ad − bc ≠ 0, then

    A^{-1} = (1/(ad − bc)) [  d  −b ]
                           [ −c   a ].
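As an illustration (our own sketch, not the book's), the adjugate and the inverse can be computed directly from cofactors; the helper `det` below is the recursive cofactor-expansion function sketched earlier, repeated so this block is self-contained.

    from fractions import Fraction

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
                   for j in range(len(A)))

    def adjugate(A):
        """adj A: the transpose of the matrix of cofactors."""
        n = len(A)
        cof = [[(-1) ** (i + j) * det([r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i])
                for j in range(n)] for i in range(n)]
        return [[cof[j][i] for j in range(n)] for i in range(n)]   # transpose

    def inverse(A):
        """A^{-1} = (1/det A) adj A, assuming det A != 0."""
        d = Fraction(det(A))
        return [[Fraction(x) / d for x in row] for row in adjugate(A)]

    A = [[1, 2], [3, 4]]
    print(adjugate(A))   # [[4, -2], [-3, 1]]
    print(inverse(A))    # [[-2, 1], [3/2, -1/2]]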

Problem 2.11 Compute adj A and A -I for A =[ ; i ~] .


2 -2 1

Problem 2.12 Show that A is invertible if and only if adj A is invertible, and that if A is
invertible, then

    (adj A)^{-1} = A / det A = adj(A^{-1}).

Problem 2.13 Let A be an n × n invertible matrix with n > 1. Show that
(1) det(adj A) = (det A)^{n−1};
(2) adj(adj A) = (det A)^{n−2} A.

Problem 2.14 For invertible matrices A and B, show that
(1) adj AB = adj B · adj A;
(2) adj QAQ^{-1} = Q(adj A)Q^{-1} for any invertible matrix Q;
(3) if AB = BA, then (adj A)B = B(adj A).
In fact, these three properties are satisfied for any two square matrices A and B (invertible or
not). (See Exercise 6.5.)

The next theorem establishes a formula for the solution of a system of n equations
in n unknowns. It may not be useful as a practical method, but it can be used to study
properties of the solution without solving the system.

Theorem 2.9 (Cramer's rule) Let Ax = b be a system of n linear equations in n
unknowns such that det A ≠ 0. Then the system has the unique solution given by

    x_j = det C_j / det A,    j = 1, 2, ..., n,

where C_j is the matrix obtained from A by replacing the j-th column with the column
matrix b = [b_1 b_2 ··· b_n]ᵀ.
Proof: If det A ≠ 0, then A is invertible and x = A^{-1}b is the unique solution of
Ax = b. Since

    x = A^{-1}b = (1/det A) (adj A) b,

it follows that

    x_j = (1/det A) (A_{1j}b_1 + A_{2j}b_2 + ··· + A_{nj}b_n) = det C_j / det A,

since the sum in parentheses is the cofactor expansion of det C_j along its j-th column.   □

Example 2.9 Use Cramer's rule to solve

     x_1 + 2x_2 +  x_3 = 50
    2x_1 + 2x_2 +  x_3 = 60
     x_1 + 2x_2 + 3x_3 = 90.

Solution:

    A  = [ 1 2 1 ]      C_1 = [ 50 2 1 ]      C_2 = [ 1 50 1 ]      C_3 = [ 1 2 50 ]
         [ 2 2 1 ],           [ 60 2 1 ],           [ 2 60 1 ],           [ 2 2 60 ].
         [ 1 2 3 ]            [ 90 2 3 ]            [ 1 90 3 ]            [ 1 2 90 ]

Therefore,

    x_1 = det C_1/det A = 10,    x_2 = det C_2/det A = 10,    x_3 = det C_3/det A = 20.   □
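A short Python sketch (ours) of Cramer's rule applied to Example 2.9; `det` is the cofactor-expansion helper from the earlier sketch, repeated here for self-containment.

    from fractions import Fraction

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
                   for j in range(len(A)))

    def cramer(A, b):
        """Solve Ax = b by Cramer's rule, assuming det A != 0."""
        d = Fraction(det(A))
        x = []
        for j in range(len(A)):
            # C_j: replace the j-th column of A by b
            Cj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
            x.append(Fraction(det(Cj)) / d)
        return x

    A = [[1, 2, 1], [2, 2, 1], [1, 2, 3]]
    b = [50, 60, 90]
    print(cramer(A, b))   # [10, 10, 20]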
Cramer's rule provides a convenient way of writing down the solution of a
system of n linear equations in n unknowns in terms of determinants. To find the
solution, however, one must evaluate n + 1 determinants of order n. Evaluating even
two of these determinants generally involves more computation than solving the
system by Gauss-Jordan elimination.
Problem 2.15 Use Cramer 's rule to solve the following systems.

I
4x2 + 3X3 -2
(I) 3Xl + 4X2 + 5X3 6
- 2x 1 + 5X2 2X3 1.
2 3 5
- + - 3
x y z
4 7 2
(2) + -y + - 0
x z
2 I
- - - 2.
y z
Problem 2.16 Let A be the matrix obtained from the identity matrix I_n by replacing its i-th
column with the column vector x = [x_1 ··· x_n]ᵀ. Compute det A.

2.5 Applications

2.5.1 Miscellaneous examples for determinants

Example 2.10 Let A be the Vandermonde matrix of order n:

    A = [ 1  x_1  x_1²  ···  x_1^{n−1} ]
        [ 1  x_2  x_2²  ···  x_2^{n−1} ]
        [ ⋮   ⋮     ⋮    ⋱      ⋮      ].
        [ 1  x_n  x_n²  ···  x_n^{n−1} ]

Its determinant can be computed by the same method as in Example 2.7:

    det A = Π_{1≤i<j≤n} (x_j − x_i).
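A quick numerical check of this product formula (our own sketch, using NumPy; `np.vander` with `increasing=True` builds the matrix above):

    import numpy as np

    x = np.array([2.0, 3.0, 5.0, 7.0])
    A = np.vander(x, increasing=True)            # rows 1, x_i, x_i^2, x_i^3
    product = np.prod([x[j] - x[i] for i in range(len(x)) for j in range(i + 1, len(x))])
    print(np.linalg.det(A), product)             # both should be 240.0 (up to rounding)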

Example 2.11 Let A_{ij} denote the cofactor of a_{ij} in a matrix A = [a_{ij}]. If n > 1,
then

    det [ 0     x_1     x_2    ···  x_n    ]
        [ x_1  a_{11}  a_{12}  ···  a_{1n} ]
        [ x_2  a_{21}  a_{22}  ···  a_{2n} ]  =  − Σ_{i=1}^{n} Σ_{j=1}^{n} A_{ij} x_i x_j.
        [  ⋮     ⋮       ⋮     ⋱      ⋮    ]
        [ x_n  a_{n1}  a_{n2}  ···  a_{nn} ]

Solution: First take the cofactor expansion along the first row, and then compute the
cofactor expansion along the first column of each n × n submatrix.   □

Example 2.12 Let f_1, f_2, ..., f_n be n real-valued differentiable functions on ℝ.
For the matrix

    [ f_1(x)          f_2(x)          ···  f_n(x)          ]
    [ f_1'(x)         f_2'(x)         ···  f_n'(x)         ]
    [   ⋮               ⋮             ⋱      ⋮             ],
    [ f_1^{(n−1)}(x)  f_2^{(n−1)}(x)  ···  f_n^{(n−1)}(x)  ]

its determinant is called the Wronskian of (f_1(x), f_2(x), ..., f_n(x)). For example,

    det [ x    sin x    cos x ]               det [ x   sin x    x + sin x ]
        [ 1    cos x   −sin x ]  =  −x,  but      [ 1   cos x    1 + cos x ]  =  0.
        [ 0   −sin x   −cos x ]                   [ 0  −sin x     −sin x   ]

In general, if one of f_1, f_2, ..., f_n is a constant multiple of another, or a
sum of such multiples, then the Wronskian must be zero.
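A symbolic check of these two Wronskians (our own sketch, using SymPy; the helper name is ours):

    import sympy as sp

    x = sp.symbols('x')

    def wronskian(fns):
        """Determinant of the matrix whose k-th row holds the k-th derivatives."""
        n = len(fns)
        M = sp.Matrix([[sp.diff(f, x, k) for f in fns] for k in range(n)])
        return sp.simplify(M.det())

    print(wronskian([x, sp.sin(x), sp.cos(x)]))          # -x
    print(wronskian([x, sp.sin(x), x + sp.sin(x)]))      # 0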

Example 2.13 An n × n matrix A is called a circulant matrix if the i-th row of A
is obtained from the first row of A by a cyclic shift of i − 1 steps, i.e., the general
form of a circulant matrix is

    A = [ a_1      a_2    a_3  ···  a_n     ]
        [ a_n      a_1    a_2  ···  a_{n−1} ]
        [ a_{n−1}  a_n    a_1  ···  a_{n−2} ]
        [   ⋮       ⋮      ⋮   ⋱      ⋮     ].
        [ a_2      a_3    a_4  ···  a_1     ]

For example, when n = 3, det A = (a_1 + a_2 + a_3)(a_1 + a_2ω + a_3ω²)(a_1 + a_2ω² + a_3ω),
where ω = e^{2πi/3} is the primitive cube root of unity. In general, for n > 1,

    det A = Π_{j=0}^{n−1} (a_1 + a_2ω_j + a_3ω_j² + ··· + a_nω_j^{n−1}),

where ω_j = e^{2πij/n}, j = 0, 1, ..., n − 1, are the n-th roots of unity. (See Example 8.18
for the proof.)
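A numerical check of the circulant determinant formula (our own sketch with NumPy; the first-row entries below are arbitrary):

    import numpy as np

    a = np.array([2.0, 5.0, 1.0, 3.0])                 # first row a_1, ..., a_n
    n = len(a)
    A = np.array([np.roll(a, i) for i in range(n)])    # i-th row: cyclic shift by i steps
    w = np.exp(2j * np.pi * np.arange(n) / n)          # the n-th roots of unity w_j
    formula = np.prod([np.sum(a * wj ** np.arange(n)) for wj in w])
    print(np.linalg.det(A), formula.real)              # the two values agree up to rounding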

Example 2.14 A tridiagonal matrix is a square matrix of the form

    T_n = [ a_1  b_1   0   ···    0       0       0      ]
          [ c_1  a_2  b_2  ···    0       0       0      ]
          [  0   c_2  a_3  ···    0       0       0      ]
          [  ⋮    ⋮    ⋮   ⋱      ⋮       ⋮       ⋮      ].
          [  0    0    0   ···  a_{n−2} b_{n−2}   0      ]
          [  0    0    0   ···  c_{n−2} a_{n−1} b_{n−1}  ]
          [  0    0    0   ···    0     c_{n−1}  a_n     ]

The determinant of this matrix can be computed by a recurrence relation: set D_0 = 1
and D_k = det T_k for k ≥ 1. By expanding T_k with respect to the k-th row, one
obtains the recurrence relation

    D_k = a_k D_{k−1} − b_{k−1} c_{k−1} D_{k−2}.

The following two special cases are interesting.

Case (1) Let all the a_i, b_j, c_k be the same, say a_i = b_j = c_k = b > 0. Then D_1 = b,
D_2 = 0 and

    D_n = b D_{n−1} − b² D_{n−2}   for n ≥ 3.

Successively, one can find D_3 = −b³, D_4 = −b⁴, D_5 = 0, .... In general, the
n-th term D_n of the recurrence relation is given by

    D_n = bⁿ [ cos(nπ/3) + (1/√3) sin(nπ/3) ].

Later, in Section 6.3.1, it will be discussed how to find the n-th term of a given
recurrence relation. (See Exercise 8.6.)
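A small sketch (ours) of the recurrence for Case (1), checked against the closed form:

    import math

    def tridiagonal_dets(b, N):
        """D_0 = 1, D_1 = b, D_k = b*D_{k-1} - b**2 * D_{k-2} (all entries equal to b)."""
        D = [1, b]
        for _ in range(2, N + 1):
            D.append(b * D[-1] - b ** 2 * D[-2])
        return D

    b, N = 2, 8
    recurrence = tridiagonal_dets(b, N)
    closed = [round(b ** n * (math.cos(n * math.pi / 3) + math.sin(n * math.pi / 3) / math.sqrt(3)))
              for n in range(N + 1)]
    print(recurrence)   # [1, 2, 0, -8, -16, 0, 64, 128, 0]
    print(closed)       # the same values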
Case (2) Let all b_j = 1 and all c_k = −1, and let us write

    (a_1 a_2 ··· a_n) = det [ a_1   1    0   ···    0       0      0   ]
                            [ −1   a_2   1   ···    0       0      0   ]
                            [  0   −1   a_3  ···    0       0      0   ]
                            [  ⋮    ⋮    ⋮   ⋱      ⋮       ⋮      ⋮   ].
                            [  0    0    0   ···  a_{n−2}   1      0   ]
                            [  0    0    0   ···   −1    a_{n−1}   1   ]
                            [  0    0    0   ···    0      −1     a_n  ]

Then,

    (a_1 a_2 ··· a_n) / (a_2 a_3 ··· a_n) = a_1 + 1/(a_2 + 1/(a_3 + ··· + 1/(a_{n−1} + 1/a_n))).

Proof: Let us prove it by induction on n. Clearly, a_1 + 1/a_2 = (a_1 a_2)/(a_2). It remains to
show that

    a_1 + 1/[(a_2 a_3 ··· a_n)/(a_3 a_4 ··· a_n)] = (a_1 a_2 ··· a_n)/(a_2 a_3 ··· a_n),

i.e., a_1 (a_2 ··· a_n) + (a_3 ··· a_n) = (a_1 a_2 ··· a_n). But this identity follows from the
previous recurrence relation, since (a_1 a_2 ··· a_n) = (a_n ··· a_2 a_1).   □

Example 2.15 (Binet-Cauchy formula) Let A and B be matrices of size n × m and
m × n, respectively, with n ≤ m. Then

    det(AB) = Σ_{1≤k_1<k_2<···<k_n≤m} A_{k_1...k_n} B_{k_1...k_n},

where A_{k_1...k_n} is the minor obtained from the columns of A whose numbers are
k_1, ..., k_n and B_{k_1...k_n} is the minor obtained from the rows of B whose numbers
are k_1, ..., k_n. In other words, det(AB) is the sum of products of the corresponding
majors of A and B, where a major of a matrix is, by definition, a determinant of a
maximal-order minor in the matrix.

Proof: Let C = AB, so that c_{ij} = Σ_{k=1}^{m} a_{ik}b_{kj}. Then

    det C = Σ_{σ∈S_n} sgn(σ) c_{1σ(1)} ··· c_{nσ(n)}
          = Σ_{k_1,...,k_n=1}^{m} a_{1k_1} ··· a_{nk_n} Σ_{σ∈S_n} sgn(σ) b_{k_1σ(1)} ··· b_{k_nσ(n)}
          = Σ_{k_1,...,k_n=1}^{m} a_{1k_1} ··· a_{nk_n} B_{k_1...k_n}.

The minor B_{k_1...k_n} is nonzero only if the numbers k_1, ..., k_n are distinct. Thus, the
summation can be performed over distinct numbers k_1, ..., k_n. Since B_{τ(k_1)...τ(k_n)} =
sgn(τ) B_{k_1...k_n} for any permutation τ of the numbers k_1, ..., k_n, grouping the terms
with the same set {k_1, ..., k_n} gives the formula above.

1 -1 3 3 21 -12 ]
For example, if A = [ 2 2-1 2] and B ~[ _; ~ then

det(AB) = det [ ~ -1]


2 det [12 -12 ] + det [12 -13] det [ - 31 2]
1

+ det [ 21 3]
2 det [11 2]
2 + det [ -12 -13] det [ -23 -1]
1

+ det [
- 1
2 23] [2 -1] + [3 3] [-3 1]
det 1 2 det -1 2 det 1 2

= -167.
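A numerical check of the Binet-Cauchy formula on small matrices of our own choosing (a sketch using NumPy, not the book's example):

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(0)
    n, m = 2, 4
    A = rng.integers(-3, 4, size=(n, m)).astype(float)
    B = rng.integers(-3, 4, size=(m, n)).astype(float)

    # Sum of products of corresponding majors over all choices k_1 < ... < k_n
    total = sum(np.linalg.det(A[:, list(cols)]) * np.linalg.det(B[list(cols), :])
                for cols in combinations(range(m), n))
    print(np.linalg.det(A @ B), total)   # the two values agree up to rounding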

2.5.2 Area and volume

In this section, we give a geometric interpretation of the determinant of a
square matrix A as the volume (or area for n = 2) of the parallelepiped P(A) spanned
by the row vectors of A. For this, we restrict our attention to the case of n = 2 or 3
for visualization, even though a similar argument applies for n > 3.

For an n × n square matrix A, its row vectors r_i = [a_{i1} ··· a_{in}] can be considered
as elements of ℝⁿ. The set

    P(A) = { Σ_{i=1}^{n} t_i r_i : 0 ≤ t_i ≤ 1, i = 1, 2, ..., n }

is called a parallelogram if n = 2, or a parallelepiped if n ≥ 3. Note that the row
vectors of A form the edges of P(A), and changing the order of the row vectors does not
alter the shape of P(A).

Theorem 2.10 The determinant det A of an n × n matrix A is the volume of P(A)
up to sign. In fact, the volume of P(A) is equal to |det A|.

Proof: We give the proof for the case n = 2 only, and leave the case n = 3 to the
reader. Write [r_1; r_2] for the matrix whose rows are r_1 and r_2, and let A = [r_1; r_2],
where r_1, r_2 are the row vectors of A. Let Area(A) denote the area of the parallelo-
gram P(A) (see Figure 2.3). Note that

    Area[r_1; r_2] = Area[r_2; r_1],   but   det[r_1; r_2] = −det[r_2; r_1].

Thus, one can expect in general that

    det[r_1; r_2] = ±Area[r_1; r_2],

which explains why we say 'up to sign'. To determine the sign, we first define the
orientation of A = [r_1; r_2] to be

    ρ(A) = det A / |det A|   if det A ≠ 0,   and   ρ(A) = 0   if det A = 0.

In general, ρ(A) = 1 if and only if det A > 0: in this case, we say the ordered
pair (r_1, r_2) is positively oriented; while ρ(A) = −1 if and only if det A < 0: in this
case, A is negatively oriented. See Figure 2.2 (next page).

For example, ρ([ 1 0; 0 1 ]) = 1, while ρ([ 0 1; 1 0 ]) = −1.

To finish the proof, it is sufficient to show that the function D(A) = ρ(A)Area(A)
satisfies the rules (R1)-(R3) of the determinant, so that det = D, or
Area(A) = |det A|. Indeed,

(Figure 2.2. Orientation of vectors)

(1) It is clear that D([ 1 0; 0 1 ]) = 1.

(2) D[r_2; r_1] = −D[r_1; r_2], because ρ([r_2; r_1]) = −ρ([r_1; r_2]) and
    Area[r_2; r_1] = Area[r_1; r_2].

(3) D[kr_1; r_2] = k D[r_1; r_2] for any k. Indeed, if k = 0, it is clear. Suppose k ≠ 0.
    Then, as illustrated in Figure 2.3, the bottom edge r_1 of P(A) is elongated by the
    factor |k| while the height h remains unchanged. Thus

        Area[kr_1; r_2] = |k| Area[r_1; r_2].

    (Figure 2.3. The parallelogram P([kr_1; r_2]))

    On the other hand,

        ρ([kr_1; r_2]) = (k/|k|) ρ([r_1; r_2]).

    Therefore, we have

        D[kr_1; r_2] = ρ([kr_1; r_2]) Area[kr_1; r_2]
                     = (k/|k|) ρ([r_1; r_2]) · |k| Area[r_1; r_2] = k D[r_1; r_2].
(4) D[r_1 + r_2; u] = D[r_1; u] + D[r_2; u] for any u, r_1 and r_2 in ℝ². If u = 0, there
    is nothing to prove. Assume that u ≠ 0. Choose any vector v ∈ ℝ² such that {u, v}
    is a basis for ℝ² and the pair (u, v) is positively oriented. Then r_i = a_i u + b_i v,
    i = 1, 2, and

        D[r_1 + r_2; u] = D[(a_1 + a_2)u + (b_1 + b_2)v; u]
                        = D[(b_1 + b_2)v; u] = (b_1 + b_2) D[v; u]
                        = D[a_1u + b_1v; u] + D[a_2u + b_2v; u]
                        = D[r_1; u] + D[r_2; u].

    The second equality follows from (2) and Figure 2.4.   □

Remark: (1) Note that if we had constructed the parallelepiped P(A) using the
column vectors of A, then the shape of the parallelepiped would in general be different
from the one constructed using the row vectors. However, det A = det Aᵀ means their
volumes are the same, which is a totally nontrivial fact.
(2) For n ≥ 3, the volume of P(A) can be defined by induction on n, and exactly the
same argument as in the proof can be applied to show that the volume is the determinant
up to sign. However, there is another way of looking at this fact. Let {c_1, c_2, ..., c_n} be the n column
vectors of an m × n matrix A. They constitute an n-dimensional parallelepiped in ℝᵐ
such that

    P(A) = { Σ_{i=1}^{n} t_i c_i : 0 ≤ t_i ≤ 1, i = 1, 2, ..., n }.

A formula for the volume of this parallelepiped may be expressed as follows. We
first consider a two-dimensional parallelepiped (a parallelogram) determined by two
column vectors c_1 and c_2 of A = [c_1 c_2] in ℝ³ (Figure 2.5. A parallelogram in ℝ³).
The area of this parallelogram is simply Area(P(A)) = ‖c_1‖ h, where h = ‖c_2‖ sin θ
and θ is the angle between c_1 and c_2. Therefore, we have

    Area(P(A))² = ‖c_1‖² ‖c_2‖² sin² θ = ‖c_1‖² ‖c_2‖² (1 − cos² θ)
                = (c_1·c_1)(c_2·c_2) ( 1 − (c_1·c_2)² / ((c_1·c_1)(c_2·c_2)) )
                = (c_1·c_1)(c_2·c_2) − (c_1·c_2)²
                = det [ c_1·c_1   c_1·c_2 ]
                      [ c_2·c_1   c_2·c_2 ]
                = det ( [ c_1ᵀ; c_2ᵀ ] [ c_1  c_2 ] ) = det(AᵀA),

where "·" denotes the dot product. In general, let c_1, ..., c_n be the n column vectors
of an m × n (not necessarily square) matrix A. Then one can show (for a proof, see
Exercise 5.17) that the volume of the n-dimensional parallelepiped P(A) determined
by those n column vectors c_i's in ℝᵐ is

    vol(P(A)) = √(det(AᵀA)).

In particular, if A is an m × m square matrix, then

    vol(P(A)) = √(det(AᵀA)) = √(det(Aᵀ) det(A)) = |det A|,

as expected.
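A short numerical illustration (our own) of the formula vol(P(A)) = √det(AᵀA) for two column vectors in ℝ³, compared with the classical cross-product area:

    import numpy as np

    c1 = np.array([1.0, 0.0, 4.0])
    c2 = np.array([0.0, -2.0, 2.0])
    A = np.column_stack([c1, c2])                       # 3 x 2 matrix [c1 c2]

    vol = np.sqrt(np.linalg.det(A.T @ A))               # sqrt(det(A^T A))
    area = np.linalg.norm(np.cross(c1, c2))             # area of the parallelogram
    print(vol, area)                                    # both equal, about 8.485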
Problem 2.17 Show that the area of a triangle ABC in the plane ℝ², where A = (x_1, y_1),
B = (x_2, y_2), C = (x_3, y_3), is equal to the absolute value of

    (1/2) det [ x_1  y_1  1 ]
              [ x_2  y_2  1 ].
              [ x_3  y_3  1 ]

2.6 Exercises

2.1. Determine the values of k for which det [~ 2:] = o.


2.2. Evaluate det(A²BA⁻¹) and det(B⁻¹A³) for the following matrices:
A = [ - ; -; { ] , B = [~ -~ ; ] .
o 1 0 2 I 3
2.3. Evaluate the determinant of

A = [=~ -~ -~ =~].
2 -3 -5 8
2.4. Evaluate det A for an n x n matrix A = [aij] when
(1) aij = { 0
I i
j f
. _ . (2) aij = j + j.
J,
1-

2.5. Find all solutions of the equation det(AB) = 0 for


A=[X;2 X~2J. B=[~ x~2l
2.6. Prove that if A is an n × n skew-symmetric matrix and n is odd, then det A = 0. Give an
example of a 4 × 4 skew-symmetric matrix A with det A ≠ 0.
2.7. Use the determinant function to find
(1) the area of the parallelogram with edges determined by (4, 3) and (7, 5),
(2) the volume of the parallelepiped with edges determined by the vectors
(1,0,4), (0, -2,2) and (3, 1, -I) .
2.8. Use Cramer 's rule to solve each system .

(1) { Xl + X2 = 3
xl x2 = -1.
Xl + x2 + X3 2
(2)
1 xl
Xl
+
+
2X2
3X2
+ X3
X3
=
-4 .
2

X2 + X4 = -1
Xl + X3 = 3
(3)/ Xl x2 x3 X4 2
Xl + X2 + X3 + X4 O.

(U: -nx~[ -n
2.9. Use Cramer's rule to solve the given system :

(l)[l ;]x~[n 2J
2.10. Find a constant k so that the system of linear equations

        kx − 2y −  z = 0
           (k+1)y + 4z = 0
                (k−1)z = 0

has more than one solution. (Is it possible to apply Cramer's rule here?)
2.11. Solve the following system of linear equations by using Cramer's rule and by using Gaus-
sian elimination:

[1 : ~ i]x~[n
2.12 . Solve the following system of equations by using Cramer's rule:
3x + 2y = 3z + 1

1 3x +
3z -
2z
1
=
=
8 -
x -
5y
2y.
2.13. Calculate the cofactors A II, A \2, A 13 and A33 for the matrix A:

(1)A=[~211
; ; ] , (2)A=[~ ~ ~], (3)A=[-; -~
312 32
;1]'
2.14. Let A be the n × n matrix whose entries are all 1. Show that
(1) det(A − nI_n) = 0;
(2) (A − nI_n)_{ij} = (−1)^{n−1} n^{n−2} for all i, j, where (A − nI_n)_{ij} denotes the cofactor of
the (i, j)-entry of A − nI_n.
2.15 . Show that if A is symmetric, so is adj A. Moreover, if it is invertible , then the inverse of
A is also symmetric.
2.16 . Use the adjugate formula to compute the inverses of the following matrices:

A=[-~ ~ ;], B=[COS~ ~ -Sin~] .


4 1 -1 sine 0 cos s
2.17. Compute adj A, det A, det(adj A), A⁻¹, and verify A adj A = (det A)I for

(1) A = [-;
3 -2 1
~ ~], (2) A = [~1 ~5 ~]
7
.
2.18. Let A, B be invertible matrices. Show that adj(AB) = adj B · adj A.
(The reader may also try to prove this equality for noninvertible matrices.)
2.19. For an m x n matrix A and n x m matrix B, show that

det [_~ ~ ] = det(AB) .


2.20. Find the area of the triangle with vertices at (0,0), (1,3) and (3,1) in ]R2 .

2.21. Find the area of the triangle with vertices at (0, 0, 0), (1, 1,2) and (2,2,1 ) in ]R3.

2.22. For A, B, C, D ∈ M_{n×n}(ℝ), show that det [ A B; O D ] = det A det D. But, in general,
det [ A B; C D ] ≠ det A det D − det B det C.

2.23. Determine whether or not the following statements are true in general, and justify your
answers.
(1) For any square matrices A and B of the same size, det(A + B) = det A + det B.
(2) For any square matrices A and B of the same size, det(AB) = det(BA).
(3) If A is an n × n square matrix, then for any scalar c, det(cI_n − A) = cⁿ − det A.
(4) If A is an n × n square matrix, then for any scalar c, det(cI_n − Aᵀ) = det(cI_n − A).
(5) If E is an elementary matrix, then det E = 1.
(6) There is no matrix A of order 3 such that A² = −I_3.
(7) Let A be a nilpotent matrix, i.e., Aᵏ = O for some natural number k. Then det A = 0.
(8) det(kA) = k det A for any square matrix A.
(9) The multilinearity holds in any two rows at the same time:

    det [ a+u  b+v  c+w ]       [ a  b  c ]       [ u  v  w ]
        [ d+x  e+y  f+z ] = det [ d  e  f ] + det [ x  y  z ].
        [  l    m    n  ]       [ l  m  n ]       [ l  m  n ]

(10) Any system Ax = b has a solution if and only if det A ≠ 0.
(11) For any n × 1 column vectors u and v with n ≥ 2, det(uvᵀ) = 0.
(12) If A is a square matrix with det A = 1, then adj(adj A) = A.
(13) If the entries of A are all integers and det A = 1 or −1, then the entries of A⁻¹ are
also integers.
(14) If the entries of A are 0's or 1's, then det A = 1, 0, or −1.
(15) Every system of n linear equations in n unknowns can be solved by Cramer's rule.
(16) If A is a permutation matrix, then Aᵀ = A.
3
Vector Spaces

3.1 The n-space ℝⁿ and vector spaces

We have seen that the Gauss-Jordan elimination is the most basic technique for solving
a system Ax = b of linear equations, and that it can be written in matrix notation as an
LDU factorization. Moreover, the questions of the existence or the uniqueness of the
solution are much easier to answer after the Gauss-Jordan elimination. In particular,
if det A ≠ 0, then x = 0 is the unique solution of Ax = 0. In general, the set of solutions
of Ax = 0 has a kind of mathematical structure, called a vector space, and with this
concept one can characterize the uniqueness of the solution of a system Ax = b of
linear equations in a more systematic way.
In this chapter, we introduce the notion of a vector space, which is an abstraction
of the usual algebraic structure of the 3-space ℝ³, and then elaborate our study of a
system of linear equations in this framework.
Usually, many physical quantities, such as length, area, mass and temperature, are
described by real numbers as magnitudes. Other physical quantities, like force or
velocity, have directions as well as magnitudes. Such quantities with direction are
called vectors, while the numbers are called scalars. For instance, a vector (or a
point) x in the 3-space ℝ³ is usually represented as a triple of real numbers:

    x = (x_1, x_2, x_3),

where x_i ∈ ℝ, i = 1, 2, 3, are called the coordinates of x. This expression
provides a rectangular coordinate system in a natural way. On the other hand, pictor-
ially such a point in the 3-space ℝ³ can also be represented by an arrow from the
origin to x. In this way, a point in the 3-space ℝ³ can be understood as a vector. The
direction of the arrow specifies the direction of the vector, and the length of the arrow
describes its magnitude.
In order to have a more general definition of vectors, we extract the most basic
properties of those arrows in ℝ³. Note that for all vectors (or points) in ℝ³, there are
two algebraic operations: the sum of two vectors and scalar multiplication of a vector


by a scalar. That is, for two vectors x = (x_1, x_2, x_3), y = (y_1, y_2, y_3) in ℝ³ and a
scalar k, we define

    x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3),
    kx = (kx_1, kx_2, kx_3).

Then a vector x = (x_1, x_2, x_3) in ℝ³ may be written as

    x = x_1 i + x_2 j + x_3 k,

where i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1), which were introduced as the
rectangular coordinate system in vector calculus. The sum of vectors and the scalar
multiplication of vectors in the 3-space ℝ³ are illustrated in Figure 3.1.

(Figure 3.1. A vector sum and a scalar multiplication)

Even though our geometric visualization of vectors does not go beyond the
3-space ℝ³, it is possible to extend these algebraic operations of vectors in the 3-
space ℝ³ to the n-space ℝⁿ for any positive integer n. It is defined to be the set of all
ordered n-tuples (a_1, a_2, ..., a_n) of real numbers, called vectors: i.e.,

    ℝⁿ = { (a_1, a_2, ..., a_n) : a_i ∈ ℝ for i = 1, 2, ..., n }.

For any two vectors x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) in the n-space ℝⁿ,
and a scalar k, the sum x + y and the scalar multiplication kx are vectors in ℝⁿ
defined by

    x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n),
    kx = (kx_1, kx_2, ..., kx_n).

It is easy to verify the following arithmetic rules for these operations:

Theorem 3.1 For any scalars k and ℓ, and vectors x = (x_1, x_2, ..., x_n), y =
(y_1, y_2, ..., y_n), and z = (z_1, z_2, ..., z_n) in the n-space ℝⁿ, the following rules
hold:
(1) x + y = y + x,
(2) x + (y + z) = (x + y) + z,
(3) x + 0 = x = 0 + x,
(4) x + (−1)x = 0,
(5) k(x + y) = kx + ky,
(6) (k + ℓ)x = kx + ℓx,
(7) k(ℓx) = (kℓ)x,
(8) 1x = x,
where 0 = (0, 0, ..., 0) is the zero vector.

We usually identify a vector (a_1, a_2, ..., a_n) in the n-space ℝⁿ with the n × 1
column vector

    [ a_1 ]
    [ a_2 ]
    [  ⋮  ].
    [ a_n ]

Sometimes a vector in ℝⁿ is also identified with a 1 × n row vector (see Section 3.5).
Then, the two operations of the matrix sum and the scalar multiplication of column
matrices coincide with those of vectors in ℝⁿ, and Theorem 3.1 rephrases Theorem 1.3.
These rules of arithmetic of vectors are the most important ones because they
are the only rules that we need to manipulate vectors in the n-space ℝⁿ. Hence,
an (abstract) vector space can be defined with respect to these rules of operations of
vectors in the n-space ℝⁿ, so that ℝⁿ itself becomes a vector space. In general, a vector
space is defined to be a set with two operations, an addition and a scalar multiplication,
which satisfy the rules (1)-(8) in Theorem 3.1.
Definition 3.1 A (real) vector space is a nonempty set V of elements, called vectors,
with two algebraic operations that satisfy the following rules.
(A) There is an operation called vector addition that associates to every pair x and
y of vectors in V a unique vector x + y in V, called the sum of x and y, so that the
following rules hold for all vectors x, y, z in V:
(1) x + y = y + x (commutativity of addition),
(2) x + (y + z) = (x + y) + z (= x + y + z) (associativity of addition),
(3) there is a unique vector 0 in V such that x + 0 = x = 0 + x for all x ∈ V (it is
called the zero vector),
(4) for any x ∈ V, there is a vector −x ∈ V, called the negative of x, such that
x + (−x) = (−x) + x = 0.
(B) There is an operation called scalar multiplication that associates to each
vector x in V and each scalar k a unique vector kx in V so that the following rules
hold for all vectors x, y, z in V and all scalars k, ℓ:
(5) k(x + y) = kx + ky (distributivity with respect to vector addition),
(6) (k + ℓ)x = kx + ℓx (distributivity with respect to scalar addition),

(7) k(ℓx) = (kℓ)x (associativity of scalar multiplication),
(8) 1x = x.
Clearly, the n-space ℝⁿ is a vector space by Theorem 3.1. A complex vector
space is obtained if, instead of real numbers, we take complex numbers for scalars.
For example, the set ℂⁿ of all ordered n-tuples of complex numbers is a complex
vector space. In Chapter 7 we shall discuss complex vector spaces, but until then we
will discuss only real vector spaces unless otherwise stated.

Example 3.1 (Miscellaneous examples of vector spaces)
(1) For any two positive integers m and n, the set M_{m×n}(ℝ) of all m × n matrices
forms a vector space under the matrix sum and the scalar multiplication defined in
Section 1.3. The zero vector in this space is the zero matrix O_{m×n}, and −A is the
negative of a matrix A.
(2) Let A be an m × n matrix. Then it is easy to show that the set of solutions
of the homogeneous system Ax = 0 is a vector space (under the sum and the scalar
multiplication of matrices).
(3) Let C(ℝ) denote the set of real-valued continuous functions defined on the
real line ℝ. For two functions f and g, and a real number k, the sum f + g and the
scalar multiplication kf are defined by

    (f + g)(x) = f(x) + g(x),
    (kf)(x) = k f(x).

Then the set C(ℝ) is a vector space under these operations. The zero vector in this
space is the constant function whose value at each point is zero.
(4) Let S(ℝ) denote the set of real-valued functions defined on the set of integers.
A function f ∈ S(ℝ) can be written as a doubly infinite sequence of real numbers:

    ..., x_{−2}, x_{−1}, x_0, x_1, x_2, ...,

where x_k = f(k) for each k. This kind of sequence appears frequently in engineering,
where it is called a discrete or a digital signal. One can define the sum of two functions
and the scalar multiplication of a function by a scalar just as in C(ℝ) in (3), so that
S(ℝ) becomes a vector space.   □

Theorem 3.2 Let V be a vector space and let x, y be vectors in V. Then
(1) x + y = y implies x = 0,
(2) 0x = 0,
(3) k0 = 0 for any k ∈ ℝ,
(4) −x is unique and −x = (−1)x,
(5) if kx = 0, then k = 0 or x = 0.

Proof: (1) By adding −y to both sides of x + y = y, we have

    x = x + 0 = x + y + (−y) = y + (−y) = 0.

(2) 0x = (0 + 0)x = 0x + 0x implies 0x = 0 by (1).
(3) This is an easy exercise.
(4) The uniqueness of the negative −x of x can be shown by a simple modification
of Lemma 1.7. In fact, if x̃ is another negative of x such that x + x̃ = 0, then

    −x = −x + 0 = −x + (x + x̃) = (−x + x) + x̃ = 0 + x̃ = x̃.

On the other hand, the equation

    x + (−1)x = 1x + (−1)x = (1 − 1)x = 0x = 0

shows that (−1)x is another negative of x, and hence −x = (−1)x by the uniqueness
of −x.
(5) Suppose kx = 0 and k ≠ 0. Then x = 1x = (1/k)(kx) = (1/k)0 = 0.   □

Problem 3.1 Let V be the set of all pairs (x, y) of real numbers. Suppose that an addition and
scalar multiplication of pairs are defined by

(x, y) + (u , v) = (x + 2u, y + 2v), k(x, y) = (kx, ky).

Is the set V a vector space under those operations? Justify your answer.

3.2 Subspaces
Definition 3.2 A subset W of a vector space V is called a subspace of V if W itself
is a vector space under the addition and the scalar multiplication defined in V.

In order to show that a subset W is a subspace of a vector space V, it is not


necessary to verify all the arithmetic rules of the definition of a vector space. One
only needs to check whether a given subset is closed under the same vector addition
and scalar multiplication as in V . This is due to the fact that certain rules satisfied in
the larger space are automatically satisfied in every subset.

Theorem 3.3 A nonempty subset W of a vector space V is a subspace if and only if
x + y and kx are contained in W (or, equivalently, x + ky ∈ W) for any vectors x
and y in W and any scalar k ∈ ℝ.

Proof: We only need to prove the sufficiency. Assume both conditions hold and let
x be any vector in W. Since W is closed under scalar multiplication, 0 = 0x and
−x = (−1)x are in W, so rules (3) and (4) for a vector space hold. All the other rules
for a vector space are clear.   □

A vector space V itself and the zero vector space {0} are trivially subspaces. Some
nontrivial subspaces are given in the following examples.

Example 3.2 (Which planes in ℝ³ can be subspaces?) Let

    W = { (x, y, z) ∈ ℝ³ : ax + by + cz = 0 },

where a, b, c are constants. If x = (x_1, x_2, x_3), y = (y_1, y_2, y_3) are points in W,
then clearly x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3) is also a point in W, because it
satisfies the equation defining W. Similarly, kx also lies in W for any scalar k. Hence, W is
a subspace of ℝ³, which is a plane passing through the origin in ℝ³.   □

Example 3.3 (The solutions of Ax = 0 form a subspace) Let A be an m × n matrix.
Then, as shown in Example 3.1(2), the set

    W = { x ∈ ℝⁿ : Ax = 0 }

of solutions of the homogeneous system Ax = 0 is a vector space. Moreover, since
the operations in W and in ℝⁿ coincide, W is a subspace of ℝⁿ.   □

Example 3.4 For a nonnegative integer n, let P_n(ℝ) denote the set of all real poly-
nomials in x of degree ≤ n. Then P_n(ℝ) is a subspace of the vector space C(ℝ) of
all continuous functions on ℝ.   □

Example 3.5 (The space of symmetric or skew-symmetric matrices) Let W be the set
of all n × n real symmetric matrices. Then W is a subspace of the vector space M_{n×n}(ℝ)
of all n × n matrices, because the sum of two symmetric matrices is symmetric and
a scalar multiple of a symmetric matrix is also symmetric. Similarly, the set of
all n × n skew-symmetric matrices is also a subspace of M_{n×n}(ℝ).   □

Problem 3.2 Which of the following sets are subspaces of the 3-space ℝ³? Justify your answers.
(1) W = {(x, y, z) ∈ ℝ³ : xyz = 0},
(2) W = {(2t, 3t, 4t) ∈ ℝ³ : t ∈ ℝ},
(3) W = {(x, y, z) ∈ ℝ³ : x² + y² − z² = 0},
(4) W = {x ∈ ℝ³ : xᵀu = 0 = xᵀv}, where u and v are any two fixed nonzero vectors in
ℝ³.
Can you describe all subspaces of the 3-space ℝ³?

Problem 3.3 Let V = C(ℝ) be the vector space of all continuous functions on ℝ. Which of
the following sets W are subspaces of V? Justify your answers.
(1) W is the set of all differentiable functions on ℝ.
(2) W is the set of all bounded continuous functions on ℝ.
(3) W is the set of all continuous nonnegative-valued functions on ℝ, i.e., f(x) ≥ 0 for any
x ∈ ℝ.
(4) W is the set of all continuous odd functions on ℝ, i.e., f(−x) = −f(x) for any x ∈ ℝ.
(5) W is the set of all polynomials with integer coefficients.

Definition 3.3 Let U and W be two subspaces of a vector space V.
(1) The sum of U and W is defined by

    U + W = { u + w ∈ V : u ∈ U, w ∈ W }.

(2) A vector space V is called the direct sum of two subspaces U and W, written as
V = U ⊕ W, if V = U + W and U ∩ W = {0}.

It is easy to see that U + W and U ∩ W are also subspaces of V. If V = ℝ²
(the xy-plane), U = {xi : x ∈ ℝ} (the x-axis), and W = {yj : y ∈ ℝ} (the y-axis), then it is easy
to see that ℝ² = U ⊕ W = ℝ ⊕ ℝ, by considering the x-axis (and also the y-axis) as
ℝ. Similarly, one can easily be convinced that ℝ³ = ℝ² ⊕ ℝ¹ = ℝ¹ ⊕ ℝ¹ ⊕ ℝ¹.

Problem 3.4 Let U and W be subspaces of a vector space V.

(l) Suppose that Z is a subspace of V contained in both U and W . Show that Z is also
contained in U n W.
(2) Suppose that Z is a subspace of V containing both U and W. Show that Z also contains
U + W as a subspace.

Theorem 3.4 A vector space V is the direct sum of subspaces U and W, i.e., V =
U ⊕ W, if and only if for any v ∈ V there exist unique u ∈ U and w ∈ W such that
v = u + w.

Proof: (⇒) Suppose that V = U ⊕ W. Then, for any v ∈ V, there exist vectors
u ∈ U and w ∈ W such that v = u + w, since V = U + W. To show the uniqueness,
suppose that v is also expressed as a sum u' + w' for u' ∈ U and w' ∈ W. Then
u + w = u' + w' implies

    u − u' = w' − w ∈ U ∩ W = {0}.

Hence, u = u' and w = w'.
(⇐) Clearly, V = U + W. Suppose that there exists a nonzero vector v in U ∩ W.
Then v can be written as a sum of vectors in U and W in many different ways:

    v = v + 0 = 0 + v = (1/2)v + (1/2)v = (1/3)v + (2/3)v ∈ U + W.   □
Example 3.6 (Sum, but not direct sum) In the 3-space ℝ³, consider the three vectors
e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1). These three vectors are also well
known as i, j and k, respectively. Let U = {ai + ck : a, c ∈ ℝ} be the xz-plane, and
let W = {bj + ck : b, c ∈ ℝ} be the yz-plane, which are both subspaces of ℝ³. Then
a vector in U + W is of the form

    (ai + c_1k) + (bj + c_2k) = ai + bj + (c_1 + c_2)k = ai + bj + ck = (a, b, c),

where c = c_1 + c_2 and a, b, c can be arbitrary numbers. Thus U + W = ℝ³. However,
ℝ³ ≠ U ⊕ W since clearly k ∈ U ∩ W ≠ {0}. In fact, the vector k ∈ ℝ³ can be
written as many different sums of vectors in U and W:

    k = (1/2)k + (1/2)k = (1/3)k + (2/3)k ∈ U + W.

Note that if we had taken W = {yj : y ∈ ℝ} to be the y-axis, then it would be
easy to see that ℝ³ = U ⊕ W. Note also that there are many choices for W so that
ℝ³ = U ⊕ W.   □
Problem 3.5 Let U and W be the subspaces of the vector space M_{n×n}(ℝ) consisting of all
symmetric matrices and all skew-symmetric matrices, respectively. Show that M_{n×n}(ℝ) =
U ⊕ W. Therefore, the decomposition of a square matrix A given in (3) of Problem 1.11 is
unique.

3.3 Bases
As we know, a vector in the 3-space ℝ³ is of the form (x_1, x_2, x_3), and it can be
written as

    (x_1, x_2, x_3) = x_1(1, 0, 0) + x_2(0, 1, 0) + x_3(0, 0, 1).

That is, any vector in ℝ³ can be expressed as a sum of scalar multiples of the three vectors
e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1). The following definition gives a
name to such an expression.

Definition 3.4 Let V be a vector space and let {x_1, x_2, ..., x_m} be a set of vectors in
V. Then a vector y in V of the form

    y = a_1x_1 + a_2x_2 + ··· + a_mx_m,

where a_1, a_2, ..., a_m are scalars, is called a linear combination of the vectors
x_1, x_2, ..., x_m.

The next theorem shows that the set of all linear combinations of a finite set of
vectors in a vector space forms a subspace.

Theorem 3.5 Let x_1, x_2, ..., x_m be vectors in a vector space V. Then the set

    W = { a_1x_1 + a_2x_2 + ··· + a_mx_m : a_i ∈ ℝ }

of all linear combinations of x_1, x_2, ..., x_m is a subspace of V. It is called the
subspace of V spanned by x_1, x_2, ..., x_m. Or, we say x_1, x_2, ..., x_m span the
subspace W.

Proof: It is necessary to show that W is closed under the vector sum and the scalar
multiplication. Let u and w be any two vectors in W. Then

    u = a_1x_1 + a_2x_2 + ··· + a_mx_m,
    w = b_1x_1 + b_2x_2 + ··· + b_mx_m

for some scalars a_i's and b_i's. Therefore,

    u + w = (a_1 + b_1)x_1 + (a_2 + b_2)x_2 + ··· + (a_m + b_m)x_m,

and, for any scalar k,

    ku = (ka_1)x_1 + (ka_2)x_2 + ··· + (ka_m)x_m.

Thus, u + w and ku are linear combinations of x_1, x_2, ..., x_m and consequently
contained in W. Therefore, W is a subspace of V.   □

Example 3.7 (A space can be spanned by many different sets)
(1) For a nonzero vector v in a vector space V, a linear combination of v is simply
a scalar multiple of v. Thus the subspace W of V spanned by v is W = {kv : k ∈ ℝ}.
Note that this subspace W can be spanned by any kv, k ≠ 0.
(2) Consider three vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and v = e_1 + e_2 =
(1, 1, 0) in ℝ³. The subspace W_1 spanned by e_1 and e_2 is written as

    W_1 = { a_1e_1 + a_2e_2 = (a_1, a_2, 0) : a_i ∈ ℝ },

and the subspace W_2 spanned by e_1, e_2 and v is written as

    W_2 = { a_1e_1 + a_2e_2 + a_3v = (a_1 + a_3, a_2 + a_3, 0) : a_i ∈ ℝ }.

Clearly, W_1 ⊆ W_2. On the other hand, since v = e_1 + e_2 ∈ W_1, we also have W_2 ⊆
W_1. Thus W_1 = W_2, which is the xy-plane in ℝ³. In general, a subspace of a vector
space can have many different spanning sets.   □
Example 3.8 (For any m ≤ n, ℝᵐ is a subspace of ℝⁿ) Let

    e_1 = (1, 0, 0, ..., 0),
    e_2 = (0, 1, 0, ..., 0),
    ⋮
    e_n = (0, 0, 0, ..., 1)

be the n vectors in the n-space ℝⁿ (n ≥ 3). Then a linear combination of e_1, e_2, e_3 is of
the form

    a_1e_1 + a_2e_2 + a_3e_3 = (a_1, a_2, a_3, 0, ..., 0).

Hence, the set

    W = { (a_1, a_2, a_3, 0, ..., 0) : a_i ∈ ℝ }

is the subspace of the n-space ℝⁿ spanned by the vectors e_1, e_2, e_3. Note that the
subspace W can be identified with the 3-space ℝ³ through the identification

    (a_1, a_2, a_3, 0, ..., 0) ≡ (a_1, a_2, a_3)

with a_i ∈ ℝ. In general, for m ≤ n, the m-space ℝᵐ can be identified with a subspace
of the n-space ℝⁿ.   □

Example 3.9 (All Ax's form the column space) Let A = [c_1 c_2 ··· c_n] be an m × n
matrix with columns c_i's. Then the column vectors c_i are in ℝᵐ, and the matrix product
Ax represents the linear combination of the column vectors c_i whose coefficients are
the components of x ∈ ℝⁿ, i.e., Ax = x_1c_1 + x_2c_2 + ··· + x_nc_n (see Example 1.9).
Therefore, the set

    W = { Ax ∈ ℝᵐ : x ∈ ℝⁿ }

of all linear combinations of the column vectors of A is a subspace of ℝᵐ, called the
column space of A. Consequently, Ax = b has a solution (x_1, x_2, ..., x_n) in ℝⁿ if
and only if the vector b belongs to the column space of A.   □

Remark: One can take another point of view on the equation Ax = b. For any vector
x ∈ ℝⁿ, A assigns another vector b in ℝᵐ given as Ax. That is, A can be
considered as a function from ℝⁿ into ℝᵐ, and the column space of A is nothing but
the image of this function.
Problem 3.6 Let x_1, x_2, ..., x_m be vectors in a vector space V and let W be the sub-
space spanned by x_1, x_2, ..., x_m. Show that W is the smallest subspace of V containing
x_1, x_2, ..., x_m. In other words, if U is a subspace of V containing x_1, x_2, ..., x_m, then
W ⊆ U.

As we saw in Theorem 3.5 and Example 3.7, any nonempty subset of a vector
space V spans a subspace through the linear combinations of the vectors, and two
different subsets may span the same subspace. This means that a vector can be written
as linear combinations in various ways.
However, for some sets of vectors in a vector space V , any vector in V can be
expressed uniquely as a linear combination of the set. Such a set of vectors is called
a basis for V. In the following we will make this notion clear and show how to find
such a basis .

Definition 3.5 A set of vectors {x_1, x_2, ..., x_m} in a vector space V is said to be
linearly independent if the vector equation, called the linear dependence of the x_i's,

    c_1x_1 + c_2x_2 + ··· + c_mx_m = 0,

has only the trivial solution c_1 = c_2 = ··· = c_m = 0. Otherwise, it is said to be
linearly dependent.

By definition, a set of vectors {x_1, x_2, ..., x_m} is linearly dependent if and only
if the linear dependence

    c_1x_1 + c_2x_2 + ··· + c_mx_m = 0

has a nontrivial solution (c_1, c_2, ..., c_m). For example, if c_m ≠ 0, the equation can
be rewritten as

    x_m = −(c_1/c_m)x_1 − (c_2/c_m)x_2 − ··· − (c_{m−1}/c_m)x_{m−1}.

That is, a set of vectors is linearly dependent if and only if at least one of the vectors
in the set can be written as a linear combination of the others. It means that at least
one of the vectors can be expressed as a linear combination of the set in two different
ways.
Example 3.10 (Linear independence in ℝ³) Let x = (1, 2, 3) and y = (3, 2, 1) be
two vectors in the 3-space ℝ³. Then clearly y ≠ λx for any λ ∈ ℝ (or: ax + by = 0 is
possible only when a = b = 0). This means that {x, y} is linearly independent in ℝ³.
If w = (3, 6, 9), then {x, w} is linearly dependent since w − 3x = 0. In general, if
x, y are non-collinear vectors in the 3-space ℝ³, the set of all linear combinations of x
and y determines a plane W through the origin in ℝ³, i.e., W = {ax + by : a, b ∈ ℝ}.
Let z be another nonzero vector in the 3-space ℝ³. If z ∈ W, then there are some
scalars a, b ∈ ℝ, not both zero, such that z = ax + by; that is, the set
{x, y, z} is linearly dependent. If z ∉ W, then ax + by + cz = 0 is possible only
when a = b = c = 0 (prove it). Therefore, the set {x, y, z} is linearly independent
if and only if z does not lie in W.   □
By abuse of language, it is sometimes convenient to say that "the vectors
x_1, x_2, ..., x_m are linearly independent," although this is really a property of a
set.
Example 3.11 The columns of the matrix

    A = [ 1  −2  −1  0 ]
        [ 2  −1   1  3 ]
        [ 4   2   6  8 ]

are linearly dependent in the 3-space ℝ³, since the third column is the sum of the first
and the second.   □
As shown in Example 3.11, the concept of linear dependence can be applied to
the row or column vectors of any matrix.
Example 3.12 Consider an upper triangular matrix

    A = [ 2  3  5 ]
        [ 0  1  6 ].
        [ 0  0  4 ]

The linear dependence of the column vectors of A may be written as

    c_1 [ 2 ]   + c_2 [ 3 ]   + c_3 [ 5 ]   =  [ 0 ]
        [ 0 ]         [ 1 ]         [ 6 ]      [ 0 ],
        [ 0 ]         [ 0 ]         [ 4 ]      [ 0 ]

which, in matrix notation, may be written as the homogeneous system Ac = 0 with
c = (c_1, c_2, c_3). From the third row, c_3 = 0; from the second row, c_2 = 0; and
substituting these into the first row forces c_1 = 0. That is, the homogeneous system has
only the trivial solution, so the column vectors are linearly independent.   □
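In practice, one can test the linear independence of column vectors numerically. A minimal sketch (ours, using NumPy): the columns are independent exactly when the rank equals the number of columns.

    import numpy as np

    def columns_independent(A):
        """Columns of A are linearly independent iff rank(A) equals the number of columns."""
        A = np.asarray(A, dtype=float)
        return np.linalg.matrix_rank(A) == A.shape[1]

    print(columns_independent([[2, 3, 5], [0, 1, 6], [0, 0, 4]]))              # True  (Example 3.12)
    print(columns_independent([[1, -2, -1, 0], [2, -1, 1, 3], [4, 2, 6, 8]]))  # False (Example 3.11)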

The following theorem can be proved by the same argument.

Theorem 3.6 The nonzero rows of a matrix in row-echelon form are linearly inde-
pendent, and so are the columns that contain leading 1's.

In particular, the rows of any triangular matrix with nonzero diagonals are linearly
independent, and so are the columns.

If V = ℝᵐ and v_1, v_2, ..., v_n are n vectors in ℝᵐ, then they form an m × n
matrix A = [v_1 v_2 ··· v_n]. On the other hand, Example 3.9 shows that the linear
dependence c_1v_1 + c_2v_2 + ··· + c_nv_n = 0 of the v_i's is nothing but the homogeneous
equation Ax = 0, where x = (c_1, c_2, ..., c_n). Thus, one can say that
(1) the column vectors v_i's of A are linearly independent in ℝᵐ if and only if the
homogeneous system Ax = 0 has only the trivial solution, and
(2) they are linearly dependent if and only if Ax = 0 has a nontrivial solution.
If U is the reduced row-echelon form of A, then we know that Ax = 0 and Ux = 0
have the same set of solutions. Moreover, a homogeneous system Ax = 0 with more
unknowns than equations always has a nontrivial solution by Theorem 1.2. This proves
the following lemma.

Lemma 3.7 (1) If n > m, any set of n vectors in the m-space ℝᵐ is linearly depen-
dent.
(2) If U is the reduced row-echelon form of A, then the columns of U are linearly
independent if and only if the columns of A are linearly independent.

Example 3.13 Consider the vectors e_1 = (1, 0, 0), e_2 = (0, 1, 0) and e_3 = (0, 0, 1)
in the 3-space ℝ³. The matrix A = [e_1 e_2 e_3] is the identity matrix, and so Ax = 0 has
only the trivial solution. Thus, the set of vectors {e_1, e_2, e_3} is linearly independent
and also spans ℝ³.   □

Example 3.14 (The standard basis for ℝⁿ) The vectors e_1, e_2, ..., e_n in ℝⁿ are
clearly linearly independent (see Theorem 3.6). Moreover, they span the n-space ℝⁿ:
in fact, a vector x = (x_1, x_2, ..., x_n) ∈ ℝⁿ is a linear combination of the vectors e_i's:

    x = x_1e_1 + x_2e_2 + ··· + x_ne_n.   □

Definition 3.6 Let V be a vector space. A basis for V is a set of linearly independent
vectors that spans V.

For example, as in Example 3.14, the set {e_1, e_2, ..., e_n} forms a basis, called the
standard basis for the n-space ℝⁿ. Of course, there are many other bases for ℝⁿ.

Example 3.15 (A basis or not)


(1) The set of vectors (1,1, 0) , (0, -1 , 1), and (1,0,1) is not a basis for the 3-
space lR3 , since this set is linearly dependent (the third is the sum of the first two
vectors), and cannot span lR3. (The vector (1, 0, 0) cannot be obtained as a linear
combination of them (prove it).) This set does not have enough vectors spanning lR3.

(2) The set of vectors (1, 0, 0), (0,1,1), (1, 0,1) and (0,1,0) is not a basis either,
since they are not linearly independent (the sum of the first two minus the third makes
the fourth) even though they span jR3 . This set of vectors has some redundant vectors
spanning jR3.
(3) The set of vectors (1,1,1), (0,1,1), and (0, 0,1) is linearly independent and
also spans jR3. That is, it is a basis for jR3, different from the standard basis. This set
has the proper number of vectors spanning jR3, since the set cannot be reduced to a
smaller set nor does it need any additional vector to span jR3 . 0

By definition, in order to show that a set of vectors in a vector space is a basis, one
needs to show two things: it is linearly independent, and it spans the whole space.
The following theorem shows that a basis for a vector space represents a coordinate
system, just like the rectangular coordinate system given by the standard basis for ℝⁿ.

Theorem 3.8 Let α = {v_1, v_2, ..., v_n} be a basis for a vector space V. Then each
vector x in V can be uniquely expressed as a linear combination of v_1, v_2, ..., v_n,
i.e., there are unique scalars a_i, i = 1, 2, ..., n, such that

    x = a_1v_1 + a_2v_2 + ··· + a_nv_n.

Proof: If x can also be expressed as x = b_1v_1 + b_2v_2 + ··· + b_nv_n, then we have
0 = (a_1 − b_1)v_1 + (a_2 − b_2)v_2 + ··· + (a_n − b_n)v_n. By the linear independence of
the v_i's, a_i = b_i for all i = 1, 2, ..., n.   □

Example 3.16 (Two different bases for ℝ³) Let α = {e_1, e_2, e_3} be the standard basis
for ℝ³, and let β = {v_1, v_2, v_3} with v_1 = (1, 1, 1) = e_1 + e_2 + e_3, v_2 = (0, 1, 1) =
e_2 + e_3, v_3 = (0, 0, 1) = e_3. Then β is also a basis for ℝ³ (see Example 3.15(3)).
For any x = (x_1, x_2, x_3) ∈ ℝ³, one can easily verify that

    x = x_1e_1 + x_2e_2 + x_3e_3
      = x_1v_1 + (x_2 − x_1)v_2 + (x_3 − x_2)v_3.   □


Problem 3.7 Show that the vectors v_1 = (1, 2, 1), v_2 = (2, 9, 0) and v_3 = (3, 3, 4) in the
3-space ℝ³ form a basis.

Problem 3.8 Show that the set {1, x, x², ..., xⁿ} is a basis for P_n(ℝ), the vector space of all
polynomials of degree ≤ n with real coefficients.

Problem 3.9 Let x_k denote the vector in ℝⁿ whose first k − 1 coordinates are zero and whose
last n − k + 1 coordinates are 1. Show that the set {x_1, x_2, ..., x_n} is a basis for ℝⁿ.

3.4 Dimensions
We often say that the line ℝ¹ is one-dimensional, the plane ℝ² is two-dimensional and
the space ℝ³ is three-dimensional, etc. This is mostly due to the fact that the freedom
in choosing coordinates for each element in the space is 1, 2 or 3, respectively. This
means that the concept of dimension is closely related to the concept of a basis. Note
that for a vector space in general there is no unique way of choosing a basis. However,
there is something common to all bases, and this is related to the notion of dimension.
We first need the following lemma, from which one can define the dimension of a
vector space.

Lemma 3.9 Let V be a vector space and let α = {x_1, x_2, ..., x_m} be a set of m
vectors in V.
(1) If α spans V, then every set of vectors with more than m vectors cannot be linearly
independent.
(2) If α is linearly independent, then any set of vectors with fewer than m vectors
cannot span V.

Proof: Since (2) follows directly from (1), we prove only (1). Let β = {y_1, y_2, ..., y_n}
be a set of n vectors in V with n > m. We will show that β is linearly dependent.
Indeed, since each vector y_j is a linear combination of the vectors in the spanning set
α, i.e., for j = 1, 2, ..., n,

    y_j = a_{1j}x_1 + a_{2j}x_2 + ··· + a_{mj}x_m = Σ_{i=1}^{m} a_{ij}x_i,

we have

    c_1y_1 + c_2y_2 + ··· + c_ny_n
        = c_1(a_{11}x_1 + a_{21}x_2 + ··· + a_{m1}x_m)
        + c_2(a_{12}x_1 + a_{22}x_2 + ··· + a_{m2}x_m)
        + ··· + c_n(a_{1n}x_1 + a_{2n}x_2 + ··· + a_{mn}x_m)
        = (a_{11}c_1 + a_{12}c_2 + ··· + a_{1n}c_n)x_1
        + (a_{21}c_1 + a_{22}c_2 + ··· + a_{2n}c_n)x_2
        + ··· + (a_{m1}c_1 + a_{m2}c_2 + ··· + a_{mn}c_n)x_m.

Thus, β is linearly dependent if and only if the system of linear equations

    c_1y_1 + c_2y_2 + ··· + c_ny_n = 0

has a nontrivial solution (c_1, c_2, ..., c_n) ≠ (0, 0, ..., 0). This is true if all the coef-
ficients of the x_i's are zero but not all of the c_i's are zero. It means that the homogeneous
system of linear equations in the c_i's

    a_{i1}c_1 + a_{i2}c_2 + ··· + a_{in}c_n = 0,   i = 1, 2, ..., m,

must have a nontrivial solution. But this is guaranteed by Lemma 3.7, since m < n.   □

It is clear by Lemma 3.9 that if a set α = {x_1, x_2, ..., x_n} of n vectors is a basis
for a vector space V, then no other set β = {y_1, y_2, ..., y_r} of r vectors can be a basis
for V if r ≠ n. This means that all bases for a vector space V have the same number
of vectors, even though there are many different bases for a vector space. Therefore, we
obtain the following important result:

Theorem 3.10 If a basis for a vector space V consists of n vectors, so does every
other basis.

Definition 3.7 The dimension of a vector space V is the number, say n, of vectors
in a basis for V, denoted by dim V = n. When V has a basis of a finite number of
vectors, V is said to be finite dimensional.

Example 3.17 (Computing the dimension) The following are trivial:
(1) If V has only the zero vector, V = {0}, then dim V = 0.
(2) If V = ℝⁿ, then the standard basis {e_1, e_2, ..., e_n} for V implies dim ℝⁿ = n.
(3) If V = P_n(ℝ), the space of all polynomials of degree less than or equal to n, then
    dim P_n(ℝ) = n + 1, since {1, x, x², ..., xⁿ} is a basis for V.
(4) If V = M_{m×n}(ℝ), the space of all m × n matrices, then dim M_{m×n}(ℝ) = mn, since
    {E_{ij} : i = 1, ..., m, j = 1, ..., n} is a basis for V, where E_{ij} is the m × n matrix
    whose (i, j)-th entry is 1 and all others are zero.   □

If V = C(ℝ), the space of all real-valued continuous functions defined on the real line,
then one can show that V is not finite dimensional. A vector space V is infinite
dimensional if it is not finite dimensional. In this book, we are concerned only with
finite-dimensional vector spaces unless otherwise stated.

Theorem 3.11 Let V be a finite-dimensional vector space.
(1) Any linearly independent set in V can be extended to a basis by adding more
vectors if necessary.
(2) Any set of vectors that spans V can be reduced to a basis by discarding vectors
if necessary.

Proof: We prove (1) and leave (2) as an exercise. Let α = {x_1, x_2, ..., x_k} be a
linearly independent set in V. If α spans V, then α is a basis. If α does not span V,
then there exists a vector, say x_{k+1}, in V that is not contained in the subspace spanned
by the vectors in α. Now {x_1, ..., x_k, x_{k+1}} is linearly independent (check why). If
{x_1, ..., x_k, x_{k+1}} spans V, then this is a basis for V. If it does not span V, then the
same procedure can be repeated, yielding a linearly independent set that spans V, i.e.,
a basis for V. This procedure must stop in a finite number of steps, because of Lemma
3.9 for a finite-dimensional vector space V.   □

Theorem 3.11 shows that a basis for a vector space V is a set of vectors in V which
is maximally independent and minimally spanning in the above sense. In particular,
if W is a subspace of V, then any basis for W is linearly independent also in V, and
can be extended to a basis for V. Thus dim W ≤ dim V.

Corollary 3.12 Let V be a vector space ofdimension n. Then


(1) any set of n vectors that spans V is a basis for V, and
(2) any set of n linearly independent vectors is a basis for V.

Proof: Again we prove (1) only. If a spanning set of n vectors were not linearly independent, then by Theorem 3.11(2) it could be reduced to a basis with fewer than n vectors, which contradicts Theorem 3.10. □

Corollary 3.12 means that if it is known that dim V = n and if a set of n vectors
either is linearly independent or spans V, then it is already a basis for the space V.

Example 3.18 (Constructing a basis) Let W be the subspace of ℝ^4 spanned by the vectors

    x1 = (1, -2, 5, -3),  x2 = (0, 1, 1, 4),  x3 = (1, 0, 1, 0).

Find a basis for W and extend it to a basis for ℝ^4.

Solution: Note that dim W ≤ 3 since W is spanned by the three vectors xi. Let A be the 3 × 4 matrix whose rows are x1, x2 and x3:

    A = [ 1 -2 5 -3 ]
        [ 0  1 1  4 ]
        [ 1  0 1  0 ].

Reduce A to a row-echelon form U:

    U = [ 1 -2 5 -3  ]
        [ 0  1 1  4  ]
        [ 0  0 1 5/6 ].

The three nonzero row vectors of U are clearly linearly independent, and they also span W because the vectors x1, x2 and x3 can be expressed as linear combinations of these three nonzero row vectors of U. Hence, the three nonzero row vectors of U provide a basis for W. (Note that this implies dim W = 3, and hence {x1, x2, x3} is also a basis for W by Corollary 3.12. The linear independence of the xi's is a by-product of this fact.)
To extend it to a basis for ℝ^4, just add any nonzero vector of the form x4 = (0, 0, 0, t) to the rows of U. □
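The reduction in Example 3.18 is easy to reproduce with a computer algebra system; the following sketch uses SymPy (the choice of library, and the use of exact rationals, are our assumptions, not part of the text).

```python
# A small computational check of Example 3.18.
from sympy import Matrix

A = Matrix([[1, -2, 5, -3],
            [0,  1, 1,  4],
            [1,  0, 1,  0]])   # rows are x1, x2, x3

U, pivots = A.rref()           # reduced row-echelon form and pivot columns
print(U)                       # three nonzero rows, so dim W = 3
print(A.rank())                # 3, so x1, x2, x3 themselves form a basis for W

# Extend to a basis for R^4 by appending a vector of the form (0, 0, 0, t).
B = A.col_join(Matrix([[0, 0, 0, 1]]))
print(B.rank())                # 4, so the four rows form a basis for R^4
```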

Problem 3.10 Let W be a subspace of a vector space V. Show that if dim W = dim V , then
W=V .

Problem 3.11 Find a basis and the dimension of each of the following subspaces of Mn×n(ℝ), the space of all n × n matrices.
(1) The space of all n x n diagonal matrices whose traces are zero.
(2) The space of all n x n symmetric matrices .
(3) The space of all n x n skew-symmetric matrices.

As a direct consequence of Theorem 3.11 and the definition of the direct sum of
subspaces, one can show the following corollary.

Corollary 3.13 For any subspace U of V, there is a subspace W of V such that


V = U EB W.

Proof: Choose a basis {u1, ..., uk} for U, and extend it to a basis {u1, ..., uk, uk+1, ..., un} for V. Then the subspace W spanned by {uk+1, ..., un} satisfies the requirement. □

Problem 3.12 Let {v1, v2, ..., vn} be a basis for a vector space V and let Wi = {r vi : r ∈ ℝ} be the subspace of V spanned by vi. Show that V = W1 ⊕ W2 ⊕ ··· ⊕ Wn.

3.5 Row and column spaces


In this section, we go back to systems of linear equations and study them in terms of the concepts introduced in the previous sections. Note that an m × n matrix A can be abbreviated by its row vectors or column vectors as follows:

    A = [ r1 ]
        [ r2 ]  = [ c1  c2  ···  cn ],
        [ ⋮  ]
        [ rm ]

where ri is the i-th row vector of A in ℝ^n, and cj is the j-th column vector of A in ℝ^m.

Definition 3.8 Let A be an m × n matrix with row vectors {r1, r2, ..., rm} and column vectors {c1, c2, ..., cn}.
(1) The row space of A is the subspace in ℝ^n spanned by the row vectors {r1, r2, ..., rm}, denoted by R(A).
(2) The column space of A is the subspace in ℝ^m spanned by the column vectors {c1, c2, ..., cn}, denoted by C(A).
(3) The solution set of the homogeneous equation Ax = 0 is called the null space of A, denoted by N(A).

Note that the null space N(A) is a subspace of the n-space ℝ^n. Its dimension is called the nullity of A. Since the row vectors of A are just the column vectors of its transpose A^T and the column vectors of A are the row vectors of A^T, the row (column) space of A is just the column (row) space of A^T; that is,

    R(A) = C(A^T)  and  C(A) = R(A^T).

Since Ax = x1c1 + x2c2 + ··· + xncn for any vector x = (x1, x2, ..., xn) ∈ ℝ^n, we get

    C(A) = {Ax : x ∈ ℝ^n}.

Thus, for a vector b ∈ ℝ^m, the system Ax = b has a solution if and only if b ∈ C(A) ⊆ ℝ^m. In other words, the column space C(A) is the set of vectors b ∈ ℝ^m for which Ax = b has a solution.
It is quite natural to ask what the dimensions of those subspaces are, and how one can find bases for them. This will help us to understand the structure of all the solutions of the equation Ax = b. Since the set of the row vectors and the set of the column vectors of A are spanning sets for the row space and the column space, respectively, a minimally spanning subset of each of them will be a basis for each of them. This is not a difficult problem for a matrix in a (reduced) row-echelon form.

Example 3.19 (Find a basis for the null space) Let U be in a reduced row-echelon form given as

    U = [ 1 0 0  2  2 ]
        [ 0 1 0 -1  3 ]
        [ 0 0 1  4 -1 ]
        [ 0 0 0  0  0 ].

Clearly, the first three nonzero row vectors containing leading 1's are linearly independent and they form a basis for the row space R(U), so that dim R(U) = 3. On the other hand, the first three columns containing leading 1's are also linearly independent (see Theorem 3.6), and the last two column vectors can be expressed as linear combinations of them. Hence, they form a basis for C(U), and dim C(U) = 3. To find a basis for the null space N(U), we first solve the system Ux = 0 to get the solution

    x = s n_s + t n_t,

where n_s = (-2, 1, -4, 1, 0), n_t = (-2, -3, 1, 0, 1), and s and t are arbitrary values for the free variables x4 and x5, respectively. It shows that these two vectors n_s and n_t span the null space N(U), and they are clearly linearly independent (see their last two entries). Hence, the set {n_s, n_t} is a basis for the null space N(U). □
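For readers who want to experiment, the null space computation of Example 3.19 can be checked as follows; again SymPy is only one possible choice of tool.

```python
# A quick check of Example 3.19.
from sympy import Matrix

U = Matrix([[1, 0, 0,  2,  2],
            [0, 1, 0, -1,  3],
            [0, 0, 1,  4, -1],
            [0, 0, 0,  0,  0]])

print(U.nullspace())    # two vectors spanning N(U); they agree with n_s and n_t
                        # above (up to how the free variables are ordered)
print(len(U.rowspace()), len(U.columnspace()))   # dim R(U) = dim C(U) = 3
```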
For any matrix A, we first investigate the row space R (A) and the null space
N (A) of A by comparing them with those of the reduced row-echelon form U of A.
Since Ax = 0 and Ux = 0 have the same solution set by Theorem 1.1, we clearly
have N (A ) = N(U).
Let {r1, r2, ..., rm} be the row vectors of an m × n matrix A. The three elementary row operations change A into a matrix Ai of one of the following three types: A1, obtained from A by replacing a row ri by k ri with k ≠ 0; A2, obtained by interchanging two rows ri and rj with i < j; and A3, obtained by replacing a row ri by ri + k rj.
It is clear that the row vectors of the three matrices AI, A2 and A3 are linear com-
binations of the row vectors of A. On the other hand, by the inverse elementary row
operations, these matrices can be changed into A. Thus, the row vectors of A can also
be written as linear combinations of those of Ai'S. This means that if matrices A and
B are row equivalent, then their row spaces must be equal, i.e., R(A) = R(B).
Now the nonzero row vectors in the reduced row-echelon form U are always
linearly independent and span the row space of U (see Theorem 3.6). Thus they form
a basis for the row space R (A) of A. It gives the following theorem.
Theorem 3.14 Let U be a (reduced) row-echelon form of a matrix A. Then

R (A) = R(U) and N(A) = N(U ).

Moreover, if U has r nonzero row vectors containing leading 1's, then they form a
basis for the row space R(A), so that the dimension ofR(A) is r.

The following example shows how to find bases for the row and the null spaces,
and at the same time how to find a basis for the column space.
Example 3.20 (Find bases for the row space and the column space of A) Let A be a matrix given as

    A = [  1  2  0  2  5 ]   [ r1 ]
        [ -2 -5  1 -1 -8 ] = [ r2 ]
        [  0 -3  3  4  1 ]   [ r3 ]
        [  3  6  0 -7  2 ]   [ r4 ].

Find bases for the row space R(A), the null space N(A), and the column space C(A) of A.

Solution: (1) Find a basis for R(A): By Gauss-Jordan elimination on A, we get the reduced row-echelon form U:

    U = [ 1 0  2 0 1 ]
        [ 0 1 -1 0 1 ]
        [ 0 0  0 1 1 ]
        [ 0 0  0 0 0 ].

Since the three nonzero row vectors

    u1 = (1, 0, 2, 0, 1),
    u2 = (0, 1, -1, 0, 1),
    u3 = (0, 0, 0, 1, 1)

of U are linearly independent, they form a basis for the row space R(U) = R(A), so dim R(A) = 3. (Note that in the process of Gaussian elimination, we did not use a permutation matrix. This means that the three nonzero rows of U were obtained from the first three row vectors r1, r2, r3 of A, and the fourth row r4 of A turned out to be a linear combination of them. Thus the first three row vectors of A also form a basis for the row space.)
(2) Find a basis for N(A): It is enough to solve the homogeneous system Ux = 0, since N(A) = N(U). That is, neglecting the fourth zero equation, the equation Ux = 0 becomes the following system of equations:

    x1       + 2x3      + x5 = 0
          x2 -  x3      + x5 = 0
                     x4 + x5 = 0.

Since the first, the second and the fourth columns of U contain the leading 1's, we see that the basic variables are x1, x2, x4, and the free variables are x3, x5. As in Example 3.19, by assigning arbitrary values s and t to the free variables x3 and x5, one can find the solution x of Ux = 0 as

    x = s n_s + t n_t,

where n_s = (-2, 1, 1, 0, 0) and n_t = (-1, -1, 0, -1, 1). In fact, the two vectors n_s and n_t are the solutions when (x3, x5) = (s, t) is (1, 0) and when (x3, x5) = (s, t) is (0, 1), respectively. They must be linearly independent, since (1, 0) and (0, 1), as the (x3, x5)-coordinates of n_s and n_t respectively, are linearly independent. Since any solution of Ux = 0 is a linear combination of them, the set {n_s, n_t} is a basis for the null space N(U) = N(A). Thus dim N(A) = 2 = the number of free variables in Ux = 0.
(3) Find a basis for C(A): Let c1, c2, c3, c4, c5 denote the column vectors of A in the given order. Since these column vectors of A span C(A), we only need to discard those columns that can be expressed as linear combinations of other column vectors. But the linear dependence

    x1c1 + x2c2 + x3c3 + x4c4 + x5c5 = 0,  i.e.,  Ax = 0,

holds if and only if x = (x1, ..., x5) ∈ N(A). By taking x = n_s = (-2, 1, 1, 0, 0) or x = n_t = (-1, -1, 0, -1, 1), the basis vectors of N(A) given in (2), we obtain two nontrivial linear dependencies among the cj's:

    -2c1 + c2 + c3 = 0,
    -c1 - c2 - c4 + c5 = 0,

respectively. Hence, the column vectors c3 and c5 corresponding to the free variables in Ax = 0 can be written as

    c3 = 2c1 - c2,
    c5 = c1 + c2 + c4.

That is, the column vectors c3, c5 of A are linear combinations of the column vectors c1, c2, c4, which correspond to the basic variables in Ax = 0. Hence, {c1, c2, c4} spans the column space C(A).
We claim that {c1, c2, c4} is linearly independent. Let Â = [c1 c2 c4] and Û = [u1 u2 u4] be submatrices of A and U, respectively, where uj is the j-th column vector of the reduced row-echelon form U of A obtained in (1):

    Û = [ 1 0 0 ]              Â = [  1  2  2 ]
        [ 0 1 0 ]    and           [ -2 -5 -1 ]
        [ 0 0 1 ]                  [  0 -3  4 ]
        [ 0 0 0 ]                  [  3  6 -7 ].

Then clearly Û is the reduced row-echelon form of Â, so that N(Â) = N(Û). Since the vectors u1, u2, u4 are just the columns of U containing leading 1's, they are linearly independent by Theorem 3.6, and Ûx = 0 has only a trivial solution. This means that Âx = 0 also has only a trivial solution, so {c1, c2, c4} is linearly independent. Therefore, it is a basis for the column space C(A), and dim C(A) = 3 = the number of basic variables. That is, the column vectors of A corresponding to the basic variables in Ux = 0 form a basis for the column space C(A). □

In summary, given a matrix A, we first find the (reduced) row-echelon form U of A by Gauss-Jordan elimination. Then
a basis for R(A) = R(U) is the set of nonzero row vectors of U;
a basis for N(A) = N(U) can be found by solving Ux = 0;
for a basis for the column space C(A), one notices that C(U) ≠ C(A) in general, since the column space of A is not preserved by Gauss-Jordan elimination (see Problem 3.16). However, we have dim C(A) = dim C(U), and a basis for C(A) can be formed by selecting the columns in A, not in U, which correspond to the basic variables (or the leading 1's in U).

Alternatively, a basis for the column space C(A) can also be found with the elementary column operations, which is the same as finding a basis for the row space R(A^T) of A^T.
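The whole recipe can be carried out mechanically. The sketch below runs it on the matrix A of Example 3.20 with SymPy; using this particular library is our assumption, not the book's.

```python
# Bases for R(A), N(A) and C(A) by the recipe above.
from sympy import Matrix

A = Matrix([[ 1,  2, 0,  2,  5],
            [-2, -5, 1, -1, -8],
            [ 0, -3, 3,  4,  1],
            [ 3,  6, 0, -7,  2]])

U, pivot_cols = A.rref()
print(U)                              # the reduced row-echelon form of Example 3.20
print(pivot_cols)                     # (0, 1, 3): columns c1, c2, c4 in 0-based indexing

row_basis  = [U.row(i) for i in range(A.rank())]   # basis for R(A): nonzero rows of U
null_basis = A.nullspace()                         # basis for N(A): solve Ux = 0
col_basis  = [A.col(j) for j in pivot_cols]        # basis for C(A): pivot columns of A, not U
print(row_basis, null_basis, col_basis, sep="\n")
```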
Problem 3.13 Let A be the matrix given in Example 3.20. Find the conditions on a, b, c, d so that the vector x = (a, b, c, d) belongs to C(A).

Problem 3.14 Find bases for R(A) and N(A) of the matrix

    A = [ 1 -2  0  0 ]
        [ 2 -5 -3 -2 ]
        [ 0  5 15 10 ]
        [ 2  6 18  8 ].

Also find a basis for C(A) by finding a basis for R(A^T).

Problem 3.15 Let A and B be two n × n matrices. Show that AB = O if and only if the column space of B is a subspace of the null space of A.

Problem 3.16 Find an example of a matrix A and its row-echelon form U such that C(A) ≠ C(U). What is wrong in C(A) = R(A^T) = R(U^T) = C(U)?

3.6 Rank and nullity


The argument in Example 3.20 is so general that it can be used to prove the following
theorem, which is one of the most fundamental results in linear algebra. The proof
given here is just a repetition of the argument in Example 3.20 in a general case, and
so it may be skipped at the reader's discretion.

Theorem 3.15 (The fundamental theorem) For any m x n matrix A , the row space
and the column space of A have the same dimension ; that is, dim R(A) = dim C(A).

Proof: Let dim R(A) = r and let U be the reduced row-echelon form of A . Then
r is the number of the nonzero row (or column) vectors of U containing leading 1's,
which is equal to the number of basic variables in Ux = 0 or Ax = O. We shall prove
that the r columns of A corresponding to the leading l's (or basic variables) form a
basis for C(A), so that dim C(A) = r = dim R(A).
(1) They are linearly independent: Let Â denote the submatrix of A whose columns are those of A corresponding to the r basic variables (or leading 1's) in U, and let Û denote the submatrix of U consisting of the r columns containing leading 1's. Then, it is clear that Û is the reduced row-echelon form of Â, so that Âx = 0 if and only if Ûx = 0. However, Ûx = 0 has only a trivial solution since the columns of Û containing the leading 1's are linearly independent by Theorem 3.6. Therefore, Âx = 0 also has only the trivial solution, so the columns of Â are linearly independent.
(2) They span C(A): Note that the columns of A corresponding to the free variables are not contained in Â, and each of these column vectors of A can be written as a linear combination of the column vectors of Â (see Example 3.20). To show this, let {c_{i1}, c_{i2}, ..., c_{ik}} be the columns of A (not contained in Â) corresponding to the free variables {x_{i1}, x_{i2}, ..., x_{ik}}, and let x_{ij} be any one of these free variables. Then, by assigning the value 1 to x_{ij} and 0 to all the other free variables, one can get a nontrivial solution of

    Ax = x1c1 + x2c2 + ··· + xncn = 0.

When such a solution is substituted into this equation, one can see that the column c_{ij} of A corresponding to x_{ij} = 1 is written as a linear combination of the columns of Â. This can be done for each free variable x_{ij}, j = 1, 2, ..., k, so the columns of A corresponding to those free variables are redundant in the spanning set of C(A). □

Remark: (1) In the proof of Theorem 3.15, once we have shown that the columns in Â are linearly independent as in step (1), we may replace step (2) by the following argument: One can easily see that dim C(A) ≥ dim R(A) by Theorem 3.11. On the other hand, since this inequality holds for arbitrary matrices, applying it to A^T in particular we get dim C(A^T) ≥ dim R(A^T). Moreover, C(A^T) = R(A) and R(A^T) = C(A) imply dim C(A) ≤ dim R(A), which means dim C(A) = dim R(A). This also means that the column vectors of Â span C(A), and so form a basis.
(2) The proof of (2) in Theorem 3.15 also shows that the reduced row-echelon form of a matrix is unique, which was stated on page 10. In fact, if U1 and U2 are two reduced row-echelon forms of an m × n matrix A, then the columns of U1 and U2 corresponding to the basic variables (i.e., containing leading 1's) must be the same and of the form [0 ··· 0 1 0 ··· 0]^T by the definition of the reduced row-echelon form. If there are no free variables, then it is quite clear that

    U1 = [ 1 0 ··· 0 ]
         [ 0 1 ··· 0 ]
         [ ⋮      ⋮  ]
         [ 0 0 ··· 1 ]  = U2.
         [ 0 0 ··· 0 ]
         [ ⋮      ⋮  ]
         [ 0 0 ··· 0 ]

Suppose that there is a free variable. Since U1x = 0 if and only if U2x = 0, one can easily check that the columns of U1 and U2 corresponding to each free variable must also be the same, so that U1 = U2.

In summary, the following equalities are now clear from Theorems 3.14 and 3.15:

    dim N(A) = dim N(U)
             = the number of free variables in Ux = 0.
    dim R(A) = dim R(U)
             = the number of nonzero row vectors of U
             = the maximal number of linearly independent row vectors of A
             = the number of basic variables in Ux = 0
             = the maximal number of linearly independent column vectors of A
             = dim C(A).

Definition 3.9 For an m x n matrix A, the rank of A is defined to be the dimension


of the row space (or the column space), denoted by rank A.

Clearly, rank I_n = n and rank A = rank A^T. And for an m × n matrix A, since dim R(A) ≤ m and dim C(A) ≤ n, we have the following corollary:

Corollary 3.16 If A is an m × n matrix, then rank A ≤ min{m, n}.


Since dim R(A) = dim C(A) = rank A is the number of basic variables in Ax = 0, and dim N(A) = nullity of A is the number of free variables in Ax = 0, we have the following theorem.

Theorem 3.17 (Rank Theorem) For any m × n matrix A,

    dim R(A) + dim N(A)   = rank A + nullity of A   = n,
    dim C(A) + dim N(A^T) = rank A + nullity of A^T = m.

If dim N(A) = 0 (or N(A) = {0}), then dim R(A) = n (or R(A) = ℝ^n), which
means that A has exactly n linearly independent rows and n linearly independent
columns. In particular, if A is a square matrix of order n, then the row vectors are
linearly independent if and only if the column vectors are linearly independent. There-
fore, Ax = 0 has only the trivial solution, and by Theorem 1.9 we get the following
corollary.

Corollary 3.18 Let A be an n x n square matrix. Then A is invertible if and only if


rank A = n.

Example 3.21 (Find the rank and the nullity) For a given 4 × 5 matrix A, find the rank and the nullity of A.

Solution: Gaussian elimination reduces A to a row-echelon form U whose first three rows are nonzero, each containing a leading 1, and whose fourth row is zero. These three nonzero rows form a basis for R(U) = R(A). Therefore, rank A = dim R(A) = dim C(A) = 3, and the nullity of A = dim N(A) = 5 − dim R(A) = 2. □
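Numerically, the rank and the nullity can be read off from a floating-point rank computation. The matrix below is an arbitrary 4 × 5 example made up for illustration (it is not the matrix of Example 3.21), and NumPy is our choice of tool.

```python
# Rank and nullity of an arbitrary 4 x 5 example matrix.
import numpy as np

A = np.array([[1, 2, 0, 2, 1],
              [0, 1, 1, 3, 1],
              [1, 3, 1, 5, 2],      # = row1 + row2
              [2, 4, 0, 4, 2]],     # = 2 * row1
             dtype=float)

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank          # Rank Theorem: rank A + nullity of A = n
print(rank, nullity)                 # here: 2 and 3
```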

Problem 3.17 Find the nullity and the rank of each of the following matrices :

(1) A =[ ~ ; -~ ~], (2) A = [~ ~ ~ ~].


-I -2 0-5 2 I 5 -2

For each of the matrices, show that dim R(A) = dim C(A) directly by finding their bases.
Problem 3.18 Show that a system of linear equations Ax = b has a solution if and only if
rank A = rank [A b], where [A b] denotes the augmented matrix for Ax = b.

Theorem 3.19 For any two matrices A and B for which A B can be defined,
(1) N(AB) ⊇ N(B),
(2) N((AB)^T) ⊇ N(A^T),
(3) C(AB) ⊆ C(A),
(4) R(AB) ⊆ R(B).

Proof: (1) and (2) are clear, since Bx = 0 implies (AB)x = A(Bx) = 0, and A^T y = 0 implies (AB)^T y = B^T(A^T y) = 0.
(3) For an m × n matrix A and an n × p matrix B,

    C(AB) = {ABx : x ∈ ℝ^p} ⊆ {Ay : y ∈ ℝ^n} = C(A),

because Bx ∈ ℝ^n for any x ∈ ℝ^p. (See Example 3.9.)
(4) R(AB) = C((AB)^T) = C(B^T A^T) ⊆ C(B^T) = R(B). □

Corollary 3.20 rank(AB) ≤ min{rank A, rank B}.


In some particular cases, the equality holds. In fact, it will be shown later in
Theorem 5.25 that for any square matrix A, rank(A T A) = rank A = rank(AA T ) . The
following problem illustrates another such case.
Problem 3.19 Let A be an invertible square matrix. Show that, for any matrix B, rank(AB) =
rank B = rank(B A) .
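A quick numerical sanity check of Corollary 3.20 and Problem 3.19 can be run as follows; the random integer matrices are arbitrary test data, not taken from the text.

```python
# Check rank(AB) <= min(rank A, rank B), and rank(PA) = rank A for invertible P.
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 3)).astype(float)
B = rng.integers(-3, 4, size=(3, 5)).astype(float)

rA, rB, rAB = (np.linalg.matrix_rank(M) for M in (A, B, A @ B))
assert rAB <= min(rA, rB)                       # Corollary 3.20

P = rng.integers(-3, 4, size=(4, 4)).astype(float)
while np.linalg.matrix_rank(P) < 4:             # make sure P is invertible
    P = rng.integers(-3, 4, size=(4, 4)).astype(float)
assert np.linalg.matrix_rank(P @ A) == rA       # Problem 3.19
```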

Theorem 3.21 Let A be an m x n matrix of rank r. Then


(1) for every submatrix C of A, rank C ≤ r, and
(2) the matrix A has at least one r × r submatrix of rank r; that is, A has an invertible submatrix of order r.

Proof: (1) Consider an intermediate matrix B which is obtained from A by removing the rows that are not wanted in C. Then clearly R(B) ⊆ R(A) and hence rank B ≤ rank A. Moreover, since the columns of C are taken from those of B, C(C) ⊆ C(B) and rank C ≤ rank B.
(2) Note that one can find r linearly independent row vectors of A, which form a basis for the row space of A. Let B be the matrix whose row vectors consist of these vectors. Then rank B = r and the column space of B must be of dimension r. By taking r linearly independent column vectors of B, one can find an r × r submatrix C of A with rank r. □

Problem 3.20 Prove that the rank of a matrix is equal to the largest order of its invertible
submatrices .

Problem 3.21 For each of the matrices given in Problem 3.17, find an invertible submatrix of
the largest order.

3.7 Bases for subspaces


In this section, we introduce two ways of finding bases for V + W and V ∩ W for two subspaces V and W of the n-space ℝ^n, and then derive an important relationship between the dimensions of those subspaces in terms of the dimensions of V and W.
Let α = {v1, v2, ..., vk} and β = {w1, w2, ..., wℓ} be bases for V and W, respectively. Let Q be the n × (k + ℓ) matrix whose columns are those basis vectors:

    Q = [v1 ··· vk  w1 ··· wℓ]_{n×(k+ℓ)}.

Theorem 3.22 Let V and W be two subspaces ofjRn, and Q be the matrix defined
above.
(1) C(Q) = V + W, so that a basis for the column space C(Q) is a basis for V + W.
(2) N(Q) can be identified with V ∩ W, so that dim(V ∩ W) = dim N(Q).

Proof: (1) It is clear that C(Q) = V + W.
(2) Let x = (a1, ..., ak, b1, ..., bℓ) ∈ N(Q) ⊆ ℝ^{k+ℓ}. Then

    Qx = a1v1 + ··· + akvk + b1w1 + ··· + bℓwℓ = 0,

from which we get

    a1v1 + ··· + akvk = -(b1w1 + ··· + bℓwℓ).

If we set

    y = a1v1 + ··· + akvk = -(b1w1 + ··· + bℓwℓ),

then y ∈ V ∩ W, since the first right-hand side a1v1 + ··· + akvk is in V as a linear combination of the basis vectors in α, and the second right-hand side -(b1w1 + ··· + bℓwℓ) is in W as a linear combination of the basis vectors in β. That is, to each x ∈ N(Q), there corresponds a vector y in V ∩ W.
On the other hand, if y ∈ V ∩ W, then y can be written in two linear combinations by the bases for V and W separately as

    y = a1v1 + ··· + akvk ∈ V,
    y = b1w1 + ··· + bℓwℓ ∈ W,

for some a1, ..., ak and b1, ..., bℓ. Let x = (a1, ..., ak, -b1, ..., -bℓ) ∈ ℝ^{k+ℓ}. Then it is quite clear that Qx = 0, i.e., x ∈ N(Q). Therefore, the correspondence of x in N(Q) ⊆ ℝ^{k+ℓ} to a vector y in V ∩ W ⊆ ℝ^n gives us a one-to-one correspondence between the sets N(Q) and V ∩ W.
Moreover, if xi, i = 1, 2, correspond to yi, then one can easily check that x1 + x2 corresponds to y1 + y2, and kx1 corresponds to ky1. This means that the two vector spaces N(Q) and V ∩ W can be identified as vector spaces (see Section 4.2 for an exact meaning of this identification). In particular, for a basis for N(Q), the corresponding set in V ∩ W is a basis for V ∩ W: that is, if the set of vectors

    x1 = (a11, ..., a1k, b11, ..., b1ℓ),
     ⋮
    xs = (as1, ..., ask, bs1, ..., bsℓ)

is a basis for N(Q), then the set of vectors

    y1 = a11v1 + ··· + a1kvk,          y1 = -(b11w1 + ··· + b1ℓwℓ),
     ⋮                         or       ⋮
    ys = as1v1 + ··· + askvk,          ys = -(bs1w1 + ··· + bsℓwℓ)

is a basis for V ∩ W, and vice versa. This implies that

    dim N(Q) = dim(V ∩ W). □
Note that dim(V + W) ≠ dim V + dim W in general. The following theorem gives a relation between them.

Theorem 3.23 For any subspaces V and W of the n-space IRn,

dim(V + W) + dim(V n W) = dim V + dim W.

Proof: Let α = {v1, v2, ..., vk} and β = {w1, w2, ..., wℓ} be bases for V and W, respectively. Let Q be the n × (k + ℓ) matrix whose columns are these basis vectors:

    Q = [v1 ··· vk  w1 ··· wℓ]_{n×(k+ℓ)}.

Then, by the Rank Theorem and Theorem 3.22, we have

    k + ℓ = dim C(Q) + dim N(Q) = dim(V + W) + dim(V ∩ W). □

In particular, dim(V + W) = dim V + dim W if and only if V ∩ W = {0}. In this case, V + W = V ⊕ W.

Example 3.22 (Find a basis for a subspace) Let V and W be two subspaces of ℝ^5 with bases

    v1 = (1, 3, -2, 2, 3),    w1 = (2, 3, -1, -2, 9),
    v2 = (1, 4, -3, 4, 2),    w2 = (1, 5, -6, 6, 1),
    v3 = (1, 3, 0, 2, 3),     w3 = (2, 4, 4, 2, 8),

respectively. Find bases for V + W and V ∩ W.

Solution: The matrix Q takes the following form:

    Q = [  1  1  1  2  1  2 ]
        [  3  4  3  3  5  4 ]
        [ -2 -3  0 -1 -6  4 ]
        [  2  4  2 -2  6  2 ]
        [  3  2  3  9  1  8 ].

The Gauss-Jordan elimination gives

    U = [ 1 0 0  5  0 0 ]
        [ 0 1 0 -3  2 0 ]
        [ 0 0 1  0 -1 0 ]
        [ 0 0 0  0  0 1 ]
        [ 0 0 0  0  0 0 ].

From this, one can directly see that dim(V + W) = 4, and the columns v1, v2, v3, w3 corresponding to the basic variables in Qx = 0 (or the leading 1's in U) form a basis for C(Q) = V + W. Moreover, dim N(Q) = dim(V ∩ W) = 2, corresponding to the two free variables x4 and x5 in Qx = 0.
To find a basis for V ∩ W, we solve Ux = 0 for (x4, x5) = (1, 0) and (x4, x5) = (0, 1) respectively to obtain a basis for N(Q):

    x1 = (-5, 3, 0, 1, 0, 0)  and  x2 = (0, -2, 1, 0, 1, 0).


From Qxi = 0, we obtain two equations:

    -5v1 + 3v2 + w1 = 0,
    -2v2 +  v3 + w2 = 0.

Therefore, {y1, y2} is a basis for V ∩ W, where

    y1 = 5v1 - 3v2 = w1  and  y2 = 2v2 - v3 = w2.

Clearly, one can check

    dim(V + W) + dim(V ∩ W) = 4 + 2 = 3 + 3 = dim V + dim W. □
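The same computation can be scripted: with the basis vectors of Example 3.22 as the columns of Q, a basis of V + W comes from C(Q) and a basis of V ∩ W from N(Q). The sketch below uses SymPy (our choice of tool).

```python
# Example 3.22 revisited: C(Q) gives V + W, N(Q) gives V ∩ W.
from sympy import Matrix

v = [Matrix([1, 3, -2, 2, 3]), Matrix([1, 4, -3, 4, 2]), Matrix([1, 3, 0, 2, 3])]
w = [Matrix([2, 3, -1, -2, 9]), Matrix([1, 5, -6, 6, 1]), Matrix([2, 4, 4, 2, 8])]

Q = Matrix.hstack(*(v + w))          # the 5 x 6 matrix [v1 v2 v3 w1 w2 w3]
print(Q.columnspace())               # a basis for C(Q) = V + W (4 vectors)

for x in Q.nullspace():              # each null vector gives a vector of V ∩ W
    a = x[:3, :]                     # coefficients of v1, v2, v3
    y = sum((a[i] * v[i] for i in range(3)), Matrix.zeros(5, 1))
    print(y.T)                       # y lies in V ∩ W
```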

Remark: (Another method for finding bases) Example 3.22 illustrates a method for finding bases for V + W and V ∩ W for given subspaces V and W of ℝ^n by constructing a matrix Q whose columns are basis vectors for V and basis vectors for W. There is another method for finding their bases by constructing a matrix Q whose rows are basis vectors for V and basis vectors for W. In this case, clearly V + W = R(Q). By finding a basis for the row space R(Q), one can get a basis for V + W.
On the other hand, a basis for V ∩ W can be found as follows: Let A be the k × n matrix whose rows are basis vectors for V, and B the ℓ × n matrix whose rows are basis vectors for W. Then V = R(A) and W = R(B). Let Ã denote the matrix A with an additional unknown vector x = (x1, x2, ..., xn) ∈ ℝ^n attached as the bottom row, i.e.,

    Ã = [ A ]
        [ x ],

and the matrix B̃ is defined similarly. Then it is clear that R(Ã) = R(A) and R(B̃) = R(B) if and only if x ∈ V ∩ W = R(A) ∩ R(B). This means that the row-echelon form of Ã and that of A should be the same via the same Gaussian elimination. Thus, by comparing the row vectors of the row-echelon form of Ã with those of A, one can obtain a system of linear equations for x = (x1, x2, ..., xn). By the same argument applied to B̃ and B, one gets another system of linear equations for the same x = (x1, x2, ..., xn). Common solutions of these two systems together will provide us with a basis for V ∩ W.
The following example illustrates how one can apply this argument to find bases for V + W and V ∩ W.

Example 3.23 (Find a basis for a subspace) Let V be the subspace of ℝ^5 spanned by

    v1 = (1, 3, -2, 2, 3),
    v2 = (1, 4, -3, 4, 2),
    v3 = (2, 3, -1, -2, 10),

and W the subspace spanned by

    w1 = (1, 3, 0, 2, 1),
    w2 = (1, 5, -6, 6, 3),
    w3 = (2, 5, 3, 2, 1).

Find a basis for V + W and for V ∩ W.


Solution: Note that the matrix A whose row vectors are the vi's is reduced to a row-echelon form

    [ 1 3 -2 2  3 ]
    [ 0 1 -1 2 -1 ]
    [ 0 0  0 0  1 ],

so that dim V = 3. Similarly, the matrix B whose row vectors are the wi's is reduced to a row-echelon form

    [ 1 3  0 2 1 ]
    [ 0 2 -6 4 2 ]
    [ 0 0  0 0 0 ],

so that dim W = 2.

Now, if Q denotes the 6 × 5 matrix whose row vectors are the vi's and wi's, then V + W = R(Q). By Gaussian elimination, Q is reduced to a row-echelon form, excluding zero rows:

    [ 1 3 -2 2  3 ]
    [ 0 1 -1 2 -1 ]
    [ 0 0  1 0 -1 ]
    [ 0 0  0 0  1 ].

Thus, the four nonzero row vectors

    (1, 3, -2, 2, 3), (0, 1, -1, 2, -1), (0, 0, 1, 0, -1), (0, 0, 0, 0, 1)

form a basis for V + W, so that dim(V + W) = 4.


We now find a basis for V ∩ W. A vector x = (x1, x2, x3, x4, x5) ∈ ℝ^5 is contained in V ∩ W if and only if x is contained in both the row space of A and that of B.
Let Ã be A with x attached as the last row:

    Ã = [ 1  3  -2  2   3 ]
        [ 1  4  -3  4   2 ]
        [ 2  3  -1 -2  10 ]
        [ x1 x2 x3  x4 x5 ].

Then, by the same Gaussian elimination used to reduce A to its row-echelon form, Ã is reduced to

    [ 1 3 -2 2  3 ]
    [ 0 1 -1 2 -1 ]
    [ 0 0  0 0  1 ]
    [ 0 0  -x1+x2+x3  4x1-2x2+x4  0 ].

Therefore, x ∈ R(A) = V if and only if R(Ã) = R(A). By comparing the row vectors of the row-echelon form of Ã with those of A, one can say that x ∈ R(A) if and only if the last row vector of the row-echelon form of Ã is the zero vector, that is, x is a solution of the homogeneous system of equations

    -x1 +  x2 + x3 = 0
    4x1 - 2x2 + x4 = 0.

The same calculation with B̃ gives another homogeneous system of linear equations for x:

    -9x1 + 3x2 + x3 = 0
     4x1 - 2x2 + x4 = 0
     2x1 -  x2 + x5 = 0.

Solving these two homogeneous systems together yields

    V ∩ W = {t(1, 4, -3, 4, 2) : t ∈ ℝ}.

Hence, {(1, 4, -3, 4, 2)} is a basis for V ∩ W and dim(V ∩ W) = 1. □

Problem 3.22 Let V and W be the subspaces of the vector space P3(ℝ) spanned by

    v1(x) = 3 - x + 4x^2 + x^3,
    v2(x) = 5 + 5x^2 + x^3,
    v3(x) = 5 - 5x + 10x^2 + 3x^3,

and

    w1(x) = 9 - 3x + 3x^2 + 2x^3,
    w2(x) = 5 - x + 2x^2 + x^3,
    w3(x) = 6 + 4x^2 + x^3,

respectively. Find the dimensions of and bases for V + W and V ∩ W.

Problem 3.23 Let

    V = {(x, y, z, u) ∈ ℝ^4 : y + z + u = 0},
    W = {(x, y, z, u) ∈ ℝ^4 : x + y = 0, z = 2u}

be two subspaces of ℝ^4. Find bases for V, W, V + W, and V ∩ W.

3.8 Invertibility

In Chapter 1, we have seen that a non-square matrix A may have only one-sided
(right or left) inverses. In this section , it will be shown that the existence of a one-
sided inverse (right or left) of A implies the existence or the uniqueness of the solutions
of a system Ax = b.

Theorem 3.24 (Existence) Let A be an m xn matrix. Then the following statements


are equivalent.
(1) For each b ∈ ℝ^m, Ax = b has at least one solution x in ℝ^n.
(2) The column vectors of A span ℝ^m, i.e., C(A) = ℝ^m.
(3) rank A = m (hence m ≤ n).
(4) A has a right inverse (i.e., a matrix B such that AB = I_m).

Proof: (1) ⟺ (2): In general, C(A) ⊆ ℝ^m. For any b ∈ ℝ^m, there is a solution x ∈ ℝ^n of Ax = b if and only if b is a linear combination of the column vectors of A, i.e., b ∈ C(A). Thus ℝ^m = C(A).
(2) ⟺ (3): C(A) = ℝ^m if and only if dim C(A) = m ≤ n (see Problem 3.10). But dim C(A) = rank A = dim R(A) ≤ min{m, n}.
(1) ⟹ (4): Let e1, e2, ..., em be the standard basis for ℝ^m. Then for each ei, one can find an xi ∈ ℝ^n such that Axi = ei by the hypothesis (1). If B is the n × m matrix whose columns are these xi's, i.e., B = [x1 x2 ··· xm], then, by matrix multiplication, AB = [Ax1 Ax2 ··· Axm] = [e1 e2 ··· em] = I_m.
(4) ⟹ (1): If B is a right inverse of A, then for any b ∈ ℝ^m, x = Bb is a solution of Ax = b. □

Condition (2) means that A has m linearly independent column vectors, and condition (3) implies that there exist m linearly independent row vectors of A, since rank A = m = dim R(A).
Note that if C(A) ≠ ℝ^m, then Ax = b has no solution for b ∉ C(A).

Theorem 3.25 (Uniqueness) Let A be an m x n matrix. Then thefollowing statements


are equivalent.
(1) For each b ∈ ℝ^m, Ax = b has at most one solution x in ℝ^n.
(2) The column vectors of A are linearly independent.
(3) dim C(A) = rank A = n (hence n ≤ m).
(4) R(A) = ℝ^n.
(5) N(A) = {0}.
(6) A has a left inverse (i.e., a matrix C such that CA = I_n).

Proof: (1) ⟹ (2): Note that the column vectors of A are linearly independent if and only if the homogeneous equation Ax = 0 has only the trivial solution. However, Ax = 0 always has the trivial solution x = 0, and statement (1) implies that it is the only one.
(2) ⟺ (3): Clear, because all the column vectors are linearly independent if and only if they form a basis for C(A), or dim C(A) = n ≤ m.
(3) ⟺ (4): Clear, because dim R(A) = rank A = dim C(A) = n if and only if R(A) = ℝ^n (see Problem 3.10).
(4) ⟺ (5): Clear, since dim R(A) + dim N(A) = n.
(2) ⟹ (6): Suppose that the columns of A are linearly independent, so that rank A = n. Extend these column vectors of A to a basis for ℝ^m by adding m − n additional independent vectors to them. Construct an m × m matrix S with those basis vectors in its columns. Then the matrix S has rank m, and hence it is invertible. Let C be the n × m matrix obtained from S^{-1} by throwing away the last m − n rows. Since the first n columns of S constitute the matrix A, we have CA = I_n.
(6) ⟹ (1): Let C be a left inverse of A. If Ax = b has no solution, then we are done. Suppose that Ax = b has two solutions, say x1 and x2. Then

    x1 = CAx1 = Cb = CAx2 = x2.

Hence, the system can have at most one solution. □

Remark: (1) We have proved that an m × n matrix A has a right inverse if and only if rank A = m, while A has a left inverse if and only if rank A = n. Therefore, if m ≠ n, A cannot have both left and right inverses.
(2) For a practical way of finding a right or a left inverse of an m × n matrix A, we will show later (see Remark (1) below Theorem 5.26) that if rank A = m, then (AA^T)^{-1} exists and A^T(AA^T)^{-1} is a right inverse of A, and if rank A = n, then (A^T A)^{-1} exists and (A^T A)^{-1} A^T is a left inverse of A (see Theorem 5.26).
(3) Note that if m = n so that A is a square matrix, then A has a right inverse (and a left inverse) if and only if rank A = m = n. Moreover, in this case the inverses are the same (see Theorem 1.9). Therefore, a square matrix A has rank n if and only if A is invertible. This means that for a square matrix "Existence = Uniqueness," and the ten statements listed in Theorems 3.24-3.25 are all equivalent. In particular, for the invertibility of a square matrix it is enough to show the existence of a one-sided inverse.
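The formulas in Remark (2) are easy to try numerically. The following sketch uses NumPy on a small arbitrary matrix of full row rank; the particular matrix is made up for illustration.

```python
# One-sided inverses from A^T (A A^T)^{-1} and (A^T A)^{-1} A^T.
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 1., 1.]])                      # 2 x 3, rank 2 = m

R = A.T @ np.linalg.inv(A @ A.T)                  # right inverse: A R = I_2
print(np.allclose(A @ R, np.eye(2)))              # True

B = A.T                                           # 3 x 2, rank 2 = n
L = np.linalg.inv(B.T @ B) @ B.T                  # left inverse: L B = I_2
print(np.allclose(L @ B, np.eye(2)))              # True
```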

Problem 3.24 For each of the following matrices, find all vectors b such that the system of
linear equations Ax = b has at least one solution . Also, discuss the uniqueness of the solution.

(I) A ~ [J 3 -2 5
4 1 3
~l
7 - 3 6 13
(2) A ~[ -6
; -;
1

(3) A ~ [l 2 -3 -2
3 -2 0 -4
8 -7 -2 _~;
1 -9 -10
-3] , (4) A = [1 1
~ 5
2 -2
;l
Summarizing all the results obtained so far about solvability of a system, one can
obtain several characterizations of the invertibility of a square matrix. The following
theorem is a collection of the results proved in Theorems 1.9,3.24, and 3.25.

Theorem 3.26 For a square matrix A of order n, the following statements are equivalent.
(1) A is invertible.
(2) det A ≠ 0.
(3) A is row equivalent to I_n.
(4) A is a product of elementary matrices.
(5) Elimination can be completed: PA = LDU, with all di ≠ 0.
(6) Ax = b has a solution for every b ∈ ℝ^n.
(7) Ax = 0 has only the trivial solution, i.e., N(A) = {0}.
(8) The columns of A are linearly independent.
(9) The columns of A span ℝ^n, i.e., C(A) = ℝ^n.
(10) A has a left inverse.
(11) rank A = n.
(12) The rows of A are linearly independent.
(13) The rows of A span ℝ^n, i.e., R(A) = ℝ^n.
(14) A has a right inverse.
(15)* The linear transformation A : ℝ^n → ℝ^n via A(x) = Ax is injective.
(16)* The linear transformation A : ℝ^n → ℝ^n is surjective.
(17)* Zero is not an eigenvalue of A.

Proof: Exercise: determine where each statement has already been proved, and prove any that are not yet covered. The statements marked with asterisks will be explained in the following places: (15) and (16) in the Remark on page 133, and (17) in Theorem 6.1. □

3.9 Applications
3.9.1 Interpolation

In many scientific experiments , a scientist wants to find the precise functional rela-
tionship between input data and output data. That is, in his experiment, he puts various
input values into his experimental device and obtains output values corresponding to
those input values. After his experiment, what he has is a table of inputs and out-
puts. The precise functional relationship might be very complicated, and sometimes
it might be very hard or almost impossible to find the precise function. In this case,
one thing he can do is to find a polynomial whose graph passes through each of the
data points and comes very close to the function he wanted to find. That is, he is
looking for a polynomial that approximates the precise function. Such a polynomial
is called an interpolating polynomial. This problem is closely related to systems of
linear equations.
Let us begin with a set of given data: Suppose that for n + 1 distinct experimental input values x0, x1, ..., xn, we obtained n + 1 output values y0 = f(x0), y1 = f(x1), ..., yn = f(xn). The output values are supposed to be related to the inputs by a certain (unknown) function f. We wish to construct a polynomial p(x) of degree less than or equal to n which interpolates f(x) at x0, x1, ..., xn: i.e., p(xi) = yi = f(xi) for i = 0, 1, ..., n.
Note that if there is such a polynomial, it must be unique. Indeed, if q(x) is another such polynomial, then h(x) = p(x) − q(x) is also a polynomial of degree less than or equal to n vanishing at the n + 1 distinct points x0, x1, ..., xn. Hence h(x) must be the identically zero polynomial, so that p(x) = q(x) for all x ∈ ℝ.
To find such a polynomial p(x), let

    p(x) = a0 + a1x + a2x^2 + ··· + anx^n,

with n + 1 unknowns ai. Then

    p(xi) = a0 + a1xi + ··· + anxi^n = yi = f(xi),

for i = 0, 1, ..., n. In matrix notation,

    [ 1  x0  x0^2  ···  x0^n ] [ a0 ]   [ y0 ]
    [ 1  x1  x1^2  ···  x1^n ] [ a1 ] = [ y1 ]
    [ ⋮   ⋮    ⋮          ⋮  ] [ ⋮  ]   [ ⋮  ]
    [ 1  xn  xn^2  ···  xn^n ] [ an ]   [ yn ].

The coefficient matrix A is a square matrix of order n + 1, known as Vandermonde's matrix (see Example 2.10), whose determinant is

    det A = ∏_{0 ≤ i < j ≤ n} (xj − xi).

Since the xi's are all distinct, det A ≠ 0. Hence, A is nonsingular, and Ax = b has a unique solution, which determines the unique polynomial p(x) of degree ≤ n passing through the given n + 1 points (x0, y0), (x1, y1), ..., (xn, yn) in the plane ℝ^2.

Example 3.24 (Finding an interpolating polynomial) Given four points

    (0, 3), (1, 0), (−1, 2), (3, 6)

in the plane ℝ^2, let p(x) = a0 + a1x + a2x^2 + a3x^3 be the polynomial passing through the given four points. Then, we have a system of equations

    a0                     = 3
    a0 +  a1 +  a2 +   a3  = 0
    a0 −  a1 +  a2 −   a3  = 2
    a0 + 3a1 + 9a2 + 27a3  = 6.

Solving this system, one can get a0 = 3, a1 = −2, a2 = −2, a3 = 1, and the unique polynomial is p(x) = 3 − 2x − 2x^2 + x^3. □
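The same system can be solved numerically by building the Vandermonde matrix directly; the sketch below uses NumPy (our choice of tool) on the data of Example 3.24.

```python
# Solving the Vandermonde system of Example 3.24.
import numpy as np

x = np.array([0., 1., -1., 3.])
y = np.array([3., 0., 2., 6.])

V = np.vander(x, increasing=True)      # rows are (1, x_i, x_i^2, x_i^3)
a = np.linalg.solve(V, y)              # coefficients a0, a1, a2, a3
print(a)                               # approximately [ 3. -2. -2.  1.]
```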

Problem 3.25 Let f(x) = sin x. Then at x = 0, π/4, π/2, 3π/4, π, the values of f are y = 0, √2/2, 1, √2/2, 0. Find the polynomial p(x) of degree ≤ 4 that passes through these five points. (One may need to use a computer to avoid messy computation.)

Problem 3.26 Find a polynomial p(x) = a + bx + cx^2 + dx^3 that satisfies p(0) = 1, p′(0) = 2, p(1) = 4, p′(1) = 4.

Problem 3.27 Find the equation of a circle that passes through the three points (2, −2), (3, 5), and (−4, 6) in the plane ℝ^2.

Remark: Note that the interpolating polynomial p(x) of degree ≤ n is uniquely determined when we have the correct data, i.e., when we are given precisely n + 1 values of y at n + 1 distinct points x0, x1, ..., xn.
However, if we are given fewer data, then the polynomial is under-determined: i.e., if we have m values of y with m < n + 1 at m distinct points x1, x2, ..., xm, then there are as many interpolating polynomials as the null space of A, since in this case A is an m × (n + 1) matrix with m < n + 1. (See the Existence Theorem 3.24.)
On the other hand, if we are given more than n + 1 data, then the polynomial is over-determined: i.e., if we have m values of y with m > n + 1 at m distinct points x1, x2, ..., xm, then there may not exist an interpolating polynomial since the system could be inconsistent. (See the Uniqueness Theorem 3.25.) In this case, the best one can do is to find a polynomial of degree ≤ n to which the data is closest, called the least squares solution. It will be reviewed again in Sections 5.9-5.9.2.

3.9.2 The Wronskian

Let y1, y2, ..., yn be n vectors in an m-dimensional vector space V. To check the independence of the vectors yi, consider a linear dependence:

    c1y1 + c2y2 + ··· + cnyn = 0.

Let α = {x1, x2, ..., xm} be a basis for V. By expressing each yj as a linear combination of the basis vectors xi, i.e., yj = ∑_{i=1}^{m} aij xi, the linear dependence of the yi's can be written as a linear combination of the basis vectors xi's:

    0 = c1y1 + c2y2 + ··· + cnyn = (a11c1 + a12c2 + ··· + a1ncn)x1
                                 + (a21c1 + a22c2 + ··· + a2ncn)x2
                                   ⋮
                                 + (am1c1 + am2c2 + ··· + amncn)xm,

so that all of the coefficients (which are linear combinations of the ci's) must be zero. This gives a homogeneous system of linear equations in the ci's, say Ac = 0 with the m × n matrix A = [aij], as in the proof of Lemma 3.9:

    [ a11 a12 ··· a1n ] [ c1 ]   [ 0 ]
    [ a21 a22 ··· a2n ] [ c2 ] = [ 0 ]
    [  ⋮   ⋮       ⋮  ] [ ⋮  ]   [ ⋮ ]
    [ am1 am2 ··· amn ] [ cn ]   [ 0 ].

Recall that the vectors yi are linearly independent if and only if the system Ac = 0 has only the trivial solution. Hence, the linear independence of a set of vectors in a finite dimensional vector space can be tested by solving a homogeneous system of linear equations.

If V is not finite dimensional, this test for the linear independence of a set of vectors
cannot be applied. In this section, we introduce a test for the linear independence of
a set of functions. For our purpose, let V be the vector space of all functions on ℝ
which are differentiable infinitely many times . Then one can easily see that V is an
infinite dimensional vector space .

Let f1, f2, ..., fn be n functions in V. The n functions are linearly independent in V if

    c1f1 + c2f2 + ··· + cnfn = 0

implies that all ci = 0. Note that the zero function 0 takes the value zero at all points of the domain. Thus they are linearly independent if

    c1f1(x) + c2f2(x) + ··· + cnfn(x) = 0

for all x ∈ ℝ implies that all ci = 0. By differentiating n − 1 times, one can obtain n equations:

    c1f1(x)       + c2f2(x)       + ··· + cnfn(x)       = 0
    c1f1′(x)      + c2f2′(x)      + ··· + cnfn′(x)      = 0
      ⋮
    c1f1^(n−1)(x) + c2f2^(n−1)(x) + ··· + cnfn^(n−1)(x) = 0

for all x ∈ ℝ. Or, in matrix form:

    [ f1(x)        f2(x)        ···  fn(x)        ] [ c1 ]   [ 0 ]
    [ f1′(x)       f2′(x)       ···  fn′(x)       ] [ c2 ] = [ 0 ]
    [   ⋮            ⋮                 ⋮          ] [ ⋮  ]   [ ⋮ ]
    [ f1^(n−1)(x)  f2^(n−1)(x)  ···  fn^(n−1)(x)  ] [ cn ]   [ 0 ].

The determinant of the coefficient matrix is called the Wronskian for {f1(x), f2(x), ..., fn(x)} and is denoted by W(x). Therefore, if there is a point x0 ∈ ℝ such that W(x0) ≠ 0, then the coefficient matrix is nonsingular at x = x0, and so all ci = 0. Therefore,

    if the Wronskian W(x) ≠ 0 for at least one x ∈ ℝ,
    then f1, f2, ..., fn are linearly independent.

However, the Wronskian W(x) = 0 for all x does not imply linear dependence of the given functions fi. In fact, W(x) = 0 means that the functions are linearly dependent at each point x ∈ ℝ, but the constants ci giving a nontrivial linear dependence may vary as x varies in the domain. (See Example 3.25 (2).)

Example 3.25 (Test the linear independence of functions by the Wronskian)

(1) For the sets of functions F1 = {x, cos x, sin x} and F2 = {x, e^x, e^{-x}}, the Wronskians are

    W1(x) = det [ x    cos x   sin x ]  = x
                [ 1   -sin x   cos x ]
                [ 0   -cos x  -sin x ]

and

    W2(x) = det [ x   e^x    e^{-x} ]  = 2x.
                [ 1   e^x   -e^{-x} ]
                [ 0   e^x    e^{-x} ]

Since Wi(x) ≠ 0 for x ≠ 0, both F1 and F2 are linearly independent.
(2) For the set of functions {x|x|, x^2} on ℝ, the Wronskian is

    W(x) = det [ x|x|   x^2 ]  = 0
               [ 2|x|   2x  ]

for all x. These two functions are linearly dependent on each of (−∞, 0] and [0, ∞), since x|x| = −x^2 on (−∞, 0] and x|x| = x^2 on [0, ∞). But they are clearly linearly independent functions on ℝ. □
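The Wronskians above can be computed symbolically. The sketch below builds the matrix of derivatives by hand with SymPy (our choice of tool) rather than relying on any built-in routine.

```python
# Wronskians of the sets F1 and F2 from Example 3.25.
from sympy import symbols, Matrix, sin, cos, exp, simplify

x = symbols('x')

def wronskian(funcs):
    n = len(funcs)
    # entry (i, j) is the i-th derivative of the j-th function
    M = Matrix(n, n, lambda i, j: funcs[j] if i == 0 else funcs[j].diff(x, i))
    return simplify(M.det())

print(wronskian([x, cos(x), sin(x)]))        # x
print(wronskian([x, exp(x), exp(-x)]))       # 2*x
```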

Problem 3.28 Show that 1, x, x^2, ..., x^n are linearly independent in the vector space C(ℝ) of continuous functions.

3.10 Exercises
3.1. Let V be the set of all pairs (x, y) of real numbers. Define

    (x, y) + (x1, y1) = (x + x1, y + y1),
    k(x, y) = (kx, y).

Is V a vector space with these operations?

3.2. For x, y ∈ ℝ^n and k ∈ ℝ, define two operations as

    x ⊕ y = x − y,    k · x = −kx.

The operations on the right-hand sides are the usual ones. Which of the rules in the definition of a vector space are satisfied for (ℝ^n, ⊕, ·)?
3.3. Determine whether the given set is a vector space with the usual addition and scalar
multiplication of functions .
(1) The set of all continuous functions f defined on the interval [−1, 1] such that f(0) = 1.
(2) The set of all continuous functions f defined on the real line ℝ such that lim_{x→∞} f(x) = 0.
(3) The set of all twice differentiable functions f defined on ℝ such that f″(x) + f(x) = 0.
3.4. Let C^2[−1, 1] be the vector space of all functions with continuous second derivatives on the domain [−1, 1]. Which of the following subsets is a subspace of C^2[−1, 1]?
(1) W = {f(x) ∈ C^2[−1, 1] : f″(x) + f(x) = 0, −1 ≤ x ≤ 1}.
(2) W = {f(x) ∈ C^2[−1, 1] : f″(x) + f(x) = x^2, −1 ≤ x ≤ 1}.
3.5. Which of the following subsets of C[−1, 1] is a subspace of the vector space C[−1, 1] of continuous functions on [−1, 1]?
(1) W = {f(x) ∈ C[−1, 1] : f(−1) = −f(1)}.
(2) W = {f(x) ∈ C[−1, 1] : f(x) ≥ 0 for all x in [−1, 1]}.
(3) W = {f(x) ∈ C[−1, 1] : f(−1) = −2 and f(1) = 2}.
(4) W = {f(x) ∈ C[−1, 1] : f(1/2) = 0}.
3.6 . Show that the set of all matrices of the form AB - BA cannot span the vector space
Mnxn(lR).
3.7. Does the vector (3, −1, 0, −1) belong to the subspace of ℝ^4 spanned by the vectors (2, −1, 3, 2), (−1, 1, 1, −3) and (1, 1, 9, −5)?
3.8. Express the given function as a linear combination of functions in the given set Q.
(1) p(x) = −1 − 3x + 3x^2 and Q = {p1(x), p2(x), p3(x)}, where
    p1(x) = 1 + 2x + x^2,  p2(x) = 2 + 5x,  p3(x) = 3 + 8x − 2x^2.
(2) p(x) = −2 − 4x + x^2 and Q = {p1(x), p2(x), p3(x), p4(x)}, where
    p1(x) = 1 + 2x^2 + x^3,  p2(x) = 1 + x + 2x^3,  p3(x) = −1 − 3x − 4x^3,
    p4(x) = 1 + 2x − x^2 + x^3.
3.9. Is {cos^2 x, sin^2 x, 1, e^x} linearly independent in the vector space C(ℝ)?
3.10. In the n-space ℝ^n, determine whether or not the set

is linearly dependent.
3.11. Show that the given sets of functions are linearly independent in the vector space C[−π, π].
(1) {1, x, x^2, x^3, x^4}

(2) {1, e^x, e^{2x}, e^{3x}}
(3) {1, sin x, cos x, ..., sin kx, cos kx}
3.12. Are the vectors

    v1 = (1, 1, 2, 4),  v2 = (2, −1, −5, 2),  v3 = (1, −1, −4, 0),  v4 = (2, 1, 1, 6)

linearly independent in the 4-space ℝ^4?
3.13. In the 3-space ℝ^3, let W be the set of all vectors (x1, x2, x3) that satisfy the equation x1 − x2 − x3 = 0. Prove that W is a subspace of ℝ^3. Find a basis for the subspace W.
3.14. Let W be the subspace of C[−1, 1] consisting of functions of the form f(x) = a sin x + b cos x. Determine the dimension of W.
3.15. Let V denote the set of all infinite sequences of real numbers:

    V = {x : x = {xi}_{i=1}^{∞}, xi ∈ ℝ}.

If x = {xi} and y = {yi} are in V, then x + y is the sequence {xi + yi}_{i=1}^{∞}. If c is a real number, then cx is the sequence {cxi}_{i=1}^{∞}.
(1) Prove that V is a vector space.
(2) Prove that V is not finite dimensional.
3.16. For two matrices A and B for which AB can be defined, prove the following statements:
(1) If both A and B have linearly independent column vectors, then the column vectors
of AB are also linearly independent.
(2) If both A and B have linearly independent row vectors, then the row vectors of AB
are also linearly independent.
(3) If the column vectors of B are linearly dependent, then the column vectors of AB are
also linearly dependent.
(4) If the row vectors of A are linearly dependent, then the row vectors of AB are also
linearly dependent.
3.17. Let U = {(x, y, z) : 2x + 3y + z = 0} and V = {(x, y, z) : x + 2y − z = 0} be subspaces of ℝ^3.
(1) Find a basis for U ∩ V.
(2) Determine the dimension of U + V.
(3) Describe U, V, U ∩ V and U + V geometrically.
3.18. How many 5 × 5 permutation matrices are there? Are they linearly independent? Do they span the vector space M5×5(ℝ)?
3.19. Find bases for the row space, the column space, and the null space for each of the following
matrices .

(1) A ~ [i IS]2
4 -3 0 , (2) B ~ [! 2
1 -2
1
-52 ] ,

n
2 -I 1 5 0 0

~ [i
1 -1

(3) C
3
6
9
(4) D ~1 [
1 -1
1 -1
0 -2
-23 '1 ]
8 3 .
2 1
5 -5 5 10
n

3.20. Find the rank of the given matrix A as a function of x.


3.21. Find the rank and the largest invertible submatrix of each of the following matrices.

(1) ~ ! i1.
[~ooooJ (2) [ ;
1114
~~ ;], (3) [i ~ ~ i1.
IOOOJ
3.22. For any nonzero column vectors u and v, show that the matrix A = uv^T has rank 1. Conversely, every matrix of rank 1 can be written as uv^T for some vectors u and v.
3.23. Determine whether the following statements are true or false, and justify your answers.
(1) The set of all n × n matrices A such that A^T = A^{-1} is a subspace of the vector space Mn×n(ℝ).
(2) If α and β are linearly independent subsets of a vector space V, so is their union α ∪ β.
(3) If U and W are subspaces of a vector space V with bases α and β respectively, then the intersection α ∩ β is a basis for U ∩ W.
(4) Let U be the row-echelon form of a square matrix A. If the first r columns of U are linearly independent, so are the first r columns of A.
(5) Any two row-equivalent matrices have the same column space.
(6) Let A be an m × n matrix with rank m. Then the column vectors of A span ℝ^m.
(7) Let A be an m × n matrix with rank n. Then Ax = b has at most one solution.
(8) If U is a subspace of V and x, y are vectors in V such that x + y is contained in U, then x ∈ U and y ∈ U.
(9) Let U and V be vector spaces. Then U is a subspace of V if and only if dim U ≤ dim V.
(10) For any m × n matrix A, dim C(A^T) + dim N(A^T) = m.
4
Linear Transformations

4.1 Basic properties of linear transformations


As shown in Chapter 3, there are many different vector spaces even with the same
dimension . The question now is how one can determine whether or not two given
vector spaces have the 'same' structure as vector spaces , or can be identified as the
same vector space. To answer the question , one has to compare them first as sets, and
then see whether their arithmetic rules are the same or not. A usual way of comparing
two sets is to define a function between them. When a function f is given between the underlying sets of vector spaces, one can compare the arithmetic rules of the vector spaces by examining whether the function f preserves the two algebraic operations, the vector addition and the scalar multiplication, that is, whether f(x + y) = f(x) + f(y) and f(kx) = kf(x) for any vectors x, y and any scalar k. In this chapter, we discuss this kind of function between vector spaces.

Definition 4.1 Let V and W be vector spaces. A function T : V -+ W is called a


linear transformation from V to W if for all x, y E V and scalar k the following
conditions hold:
(1) T(x + y) = T(x) + T(y),
(2) T(kx) = kT(x) .

We often call T simply linear. It is easy to see that the two conditions for a linear
transformation can be combined into a single requirement

T(x + ky) = T(x) + kT(y).

Geometrically, the linearity is just the requirement for a straight line to be transformed
to a straight line, since x + ky represents a straight line through x in the direction y
in V , and its image T(x) + kT(y) also represents a straight line through T(x) in the
direction of T(y) in W .


Example 4.1 (Linear or not) Consider the following functions:

(1) f : ℝ → ℝ defined by f(x) = 2x;
(2) g : ℝ → ℝ defined by g(x) = x^2 − x;
(3) h : ℝ^2 → ℝ^2 defined by h(x, y) = (x − y, 2x);
(4) k : ℝ^2 → ℝ^2 defined by k(x, y) = (xy, x^2 + 1).
One can easily see that g and k are not linear, while f and h are linear. Moreover, on the 1-space ℝ, all polynomials of degree greater than one are not linear. □

Example 4.2 (A matrix A as a linear transformation)


(1) For an m x n matrix A, the transformation T : jRn -+ jRm defined by the
matrix product
T(x) = Ax
is a linear transformation by the distributive law A(x + ky) = Ax + kAy for any
x, y E jRn and for any scalar k E IR. Therefore, a matrix A, identified with the
transformation T, may be considered as a linear transformation from jRn to jRm.
(2) For a vector space V, the identity transformation id : V -+ V is defined
by id(x) = x for all x E V. If W is another vector space, the zero transformation
To : V -+ W is defined by To(x) = 0 (the zero vector) for all x E V. Clearly, both
transformations are linear. 0

The following theorem is a direct consequence of the definition, and the proof is
left for an exercise.

Theorem 4.1 Let T : V -+ W be a linear transformation. Then


(1) T(0) = 0.
(2) For any x1, x2, ..., xn ∈ V and scalars k1, k2, ..., kn,

    T(k1x1 + k2x2 + ··· + knxn) = k1T(x1) + k2T(x2) + ··· + knT(xn).

Nontrivial important examples of linear transformations are the rotations , reflec-


tions, and projections in a geometry.

Example 4.3 (The rotations, reflections and projections in a geometry)


(1) Let θ denote the angle between the x-axis and a fixed vector in ℝ^2. Then the matrix

    R_θ = [ cos θ   −sin θ ]
          [ sin θ    cos θ ]

defines a linear transformation on ℝ^2 that rotates any vector in ℝ^2 through the angle θ about the origin. It is called a rotation by the angle θ.
(2) The projection on the x-axis is the linear transformation P : ℝ^2 → ℝ^2 defined by, for x = (x, y) ∈ ℝ^2,

    P(x) = [ 1  0 ] [ x ] = [ x ]
           [ 0  0 ] [ y ]   [ 0 ].

Figure 4.1. Three linear transformations on ℝ^2

(3) The linear transformation T : ℝ^2 → ℝ^2 defined by, for x = (x, y),

    T(x) = [ 1   0 ] [ x ] = [  x ]
           [ 0  −1 ] [ y ]   [ −y ],

is the reflection about the x-axis. □
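The three matrices of Example 4.3 can be written down and their linearity checked numerically; in the sketch below the rotation angle θ = π/6 and the test vectors are arbitrary choices, and NumPy is our choice of tool.

```python
# Rotation, projection and reflection on R^2, with a numerical linearity check.
import numpy as np

theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by theta
P = np.array([[1., 0.],
              [0., 0.]])                           # projection on the x-axis
T = np.array([[1., 0.],
              [0., -1.]])                          # reflection about the x-axis

x, y = np.array([1., 2.]), np.array([3., -1.])
k = 2.5
for M in (R, P, T):
    # linearity: M(x + k y) = M x + k (M y)
    assert np.allclose(M @ (x + k * y), M @ x + k * (M @ y))
```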


Problem 4.1 Find the matrix of the reflection about the line y = x in the plane ]R2 .
Example 4.4 (Differentiations and integrations in calculus) In calculus, it is well known that the two transformations D and I, defined by differentiation and integration,

    D(f)(x) = f′(x),      I(f)(x) = ∫_0^x f(t) dt,

satisfy linearity, and so they are linear transformations. Many problems related to differential and integral equations may be reformulated in terms of linear transformations. □

Definition 4.2 Let V and W be two vector spaces, and let T : V -+ W be a linear
transformation from V into W.
(1) Ker(T) = {v ∈ V : T(v) = 0} ⊆ V is called the kernel of T.
(2) Im(T) = {T(v) ∈ W : v ∈ V} = T(V) ⊆ W is called the image of T.

Example 4.5 Let V and W be vector spaces and let id : V → V and T0 : V → W be the identity and the zero transformations, respectively. Then it is easy to see that Ker(id) = {0}, Im(id) = V, Ker(T0) = V and Im(T0) = {0}. □

Theorem 4.2 Let T : V -+ W be a linear transformation from a vector space V to


a vector space W. Then the kernel Ker(T) and the image Im(T) are subspaces of V
and W, respectively.

Proof: Since T(O) = 0, each of Ker(T) and Im (T) is nonempty having O.


(1) For any x, y E Ker(T) and for any scalar k ,

T(x + ky) = T(x) + kT(y) = 0 + kO = O.


Hence x + ky E Ker(T), so that Ker(T) is a subspace of V .
(2) If v, W E Im(T), then there exist x and y in V such that T(x) = v and
T(y) = w. Thus, for any scalar k,

v + kw = T(x) + kT(y) = T(x + ky) .

Thus v + kw E Im(T), so that Im(T) is a subspace of W. o


Example 4.6 (Ker(A) = N(A) and Im(A) = C(A) for any matrix A) Let A : ℝ^n → ℝ^m be the linear transformation defined by an m × n matrix A as in Example 4.2(1). The kernel Ker(A) of A consists of all solutions of the homogeneous system Ax = 0. Hence, the kernel Ker(A) of A is nothing but the null space N(A) of the matrix A, and the image Im(A) of A is just the column space C(A) ⊆ ℝ^m of the matrix A. Recall that Ax is a linear combination of the column vectors of A. □
Example 4.7 (The trace is linear) The trace is the function tr : Mn×n(ℝ) → ℝ defined by the sum of the diagonal entries,

    tr(A) = a11 + a22 + ··· + ann = ∑_{i=1}^{n} aii

for A = [aij] ∈ Mn×n(ℝ); tr(A) is called the trace of the matrix A. It is easy to show that

    tr(A + B) = tr(A) + tr(B)  and  tr(kA) = k tr(A)

for any two matrices A and B in Mn×n(ℝ), which means that 'tr' is a linear transformation from Mn×n(ℝ) to the 1-space ℝ. In addition, one can easily show that the set of all n × n matrices with trace 0 is a subspace of the vector space Mn×n(ℝ). □

Problem 4.2 Let W = {A ∈ Mn×n(ℝ) : tr(A) = 0}. Show that W is a subspace, and find a basis for W.

Problem 4.3 Show that, for any matrices A and B in Mn×n(ℝ), tr(AB) = tr(BA).
One of the most important properties of linear transformations is that they are
completely determined by their values on a basis.

Theorem 4.3 Let V and W be vector spaces. Let {v1, v2, ..., vn} be a basis for V and let w1, w2, ..., wn be any vectors (possibly repeated) in W. Then there exists a unique linear transformation T : V → W such that T(vi) = wi for i = 1, 2, ..., n.

Proof: Let x ∈ V. Then it has a unique expression x = ∑_{i=1}^{n} ai vi for some scalars a1, a2, ..., an. Define

    T : V → W  by  T(x) = ∑_{i=1}^{n} ai wi.

In particular, T(vi) = wi for i = 1, 2, ..., n.
Linearity: For x = ∑_{i=1}^{n} ai vi, y = ∑_{i=1}^{n} bi vi ∈ V and a scalar k, we have x + ky = ∑_{i=1}^{n} (ai + kbi) vi. Then

    T(x + ky) = ∑_{i=1}^{n} (ai + kbi) wi = ∑_{i=1}^{n} ai wi + k ∑_{i=1}^{n} bi wi = T(x) + kT(y).

Uniqueness: Suppose that S : V → W is linear and S(vi) = wi for i = 1, 2, ..., n. Then for any x ∈ V with x = ∑_{i=1}^{n} ai vi, we have

    S(x) = ∑_{i=1}^{n} ai S(vi) = ∑_{i=1}^{n} ai wi = T(x).

Hence, we have S = T. □
The uniqueness in Theorem 4.3 may be rephrased as the following corollary.

Corollary 4.4 Let V and W be vector spaces and let {v1, v2, ..., vn} be a basis for V. If S, T : V → W are linear transformations and S(vi) = T(vi) for i = 1, 2, ..., n, then S = T, i.e., S(x) = T(x) for all x ∈ V.

Example 4.8 (Linear extension of a transformation defined on a basis) Let w1 = (1, 0), w2 = (2, −1), w3 = (4, 3) be three vectors in ℝ².
(1) Let α = {e1, e2, e3} be the standard basis for the 3-space ℝ³, and let T : ℝ³ → ℝ² be the linear transformation defined by T(e_i) = w_i for i = 1, 2, 3. Find a formula for T(x1, x2, x3), and then use it to compute T(2, −3, 5).
(2) Let β = {v1, v2, v3} be another basis for ℝ³, where v1 = (1, 1, 1), v2 = (1, 1, 0), v3 = (1, 0, 0), and let T : ℝ³ → ℝ² be the linear transformation defined by T(v_i) = w_i for i = 1, 2, 3. Find a formula for T(x1, x2, x3), and then use it to compute T(2, −3, 5).

Solution: (1) For x = (x1, x2, x3) = Σ_{i=1}^3 x_i e_i ∈ ℝ³,

T(x) = Σ_{i=1}^3 x_i T(e_i) = Σ_{i=1}^3 x_i w_i = x1(1, 0) + x2(2, −1) + x3(4, 3) = (x1 + 2x2 + 4x3, −x2 + 3x3).

Thus, T(2, −3, 5) = (16, 18). In matrix notation, this can be written as

\begin{bmatrix} 1 & 2 & 4 \\ 0 & -1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 + 2x_2 + 4x_3 \\ -x_2 + 3x_3 \end{bmatrix}.

(2) In this case, we need to express x = (x1, x2, x3) as a linear combination of v1, v2, v3, i.e.,

(x1, x2, x3) = Σ_{i=1}^3 k_i v_i = k1(1, 1, 1) + k2(1, 1, 0) + k3(1, 0, 0).

By equating corresponding components we obtain a system of equations

k1 + k2 + k3 = x1
k1 + k2      = x2
k1           = x3.

The solution is k1 = x3, k2 = x2 − x3, k3 = x1 − x2. Therefore,

(x1, x2, x3) = x3 v1 + (x2 − x3) v2 + (x1 − x2) v3, and

T(x1, x2, x3) = x3 T(v1) + (x2 − x3) T(v2) + (x1 − x2) T(v3)
             = x3(1, 0) + (x2 − x3)(2, −1) + (x1 − x2)(4, 3)
             = (4x1 − 2x2 − x3, 3x1 − 4x2 + x3).

From this formula, we obtain T(2, −3, 5) = (9, 23). □
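Both parts of Example 4.8 can be checked with a few lines of numpy. The sketch below is illustrative only and is not part of the text; it uses the fact that a linear map with T(v_i) = w_i has standard matrix [w1 w2 w3][v1 v2 v3]^{-1}.

import numpy as np

W = np.array([[1, 2, 4], [0, -1, 3]], dtype=float)          # columns w1, w2, w3
x = np.array([2, -3, 5], dtype=float)

print(W @ x)                                                 # part (1): [16. 18.]

V = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0]], dtype=float).T   # columns v1, v2, v3
T2 = W @ np.linalg.inv(V)                                    # standard matrix of T in part (2)
print(T2 @ x)                                                # part (2): [ 9. 23.]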


Problem 4.4 Is there a linear transformation T : ℝ³ → ℝ² such that T(3, 1, 0) = (1, 1) and T(−6, −2, 0) = (2, 1)? If yes, can you find an expression of T(x) for x = (x1, x2, x3) in ℝ³?

Problem 4.5 Let V and W be vector spaces and T : V → W be linear. Let {w1, w2, ..., wk} be a linearly independent subset of the image Im(T) ⊆ W. Suppose that α = {v1, v2, ..., vk} is chosen so that T(v_i) = w_i for i = 1, 2, ..., k. Prove that α is linearly independent.

4.2 Invertible linear transformations


A function f from a set X to a set Y is said to be invertible if there is a function g from Y to X such that their compositions satisfy g ∘ f = id and f ∘ g = id. Such a function g is called the inverse function of f and is denoted by g = f⁻¹. One can notice that if there exists an invertible function from a set X onto another set Y, then it gives a one-to-one correspondence between these two sets so that they can be identified as sets. A useful criterion for a function between two given sets to be invertible is that it is one-to-one and onto. Recall that a function f : X → Y is one-to-one (or injective) if f(u) = f(v) in Y implies u = v in X, and is onto (or surjective) if for each element y in Y there exists an element x in X such that f(x) = y. A function is said to be bijective if it is both one-to-one and onto, that is, if for each element y in Y there is a unique element x in X such that f(x) = y.

Lemma 4.5 A function f : X ~ Y is invertible if and only if it is bijective (or


one-to-one and onto) .

Proof: Suppose f : X ~ Y is invertible, and let g : Y ~ X be its inverse. If


feu) = f(v), then u = g(f(u)) = g(f(v)) = v. Thus f is one-to-one. For each
y E Y, let g(y) = x in X. Then f(x) = f(g(y)) = y. Thus it is onto.
Conversely, suppose f is bijective. Then, for each y E Y, there is a unique x E X
such that f(x) = y . Now for each y E Y define g(y) = x. Then one can easily check
that g : Y ~ X is well defined, and that fog = id and go f = id, i.e., g is the
inverse function of f. 0

If T : V ~ Wand S : W ~ Z are linear transformations, then it is easy to


show that their composition (S 0 T)(v) = S(T(v)) is also a linear transformation. In
particular, if two linear transformations are defined by matrices A : lRn ~ lRm and
B : lRm ~ lRk as in Example 4.2(1), then their composition is nothing but the matrix
product BA of them, i.e., (B 0 A)(x) = B(Ax) = (BA)x.
The following lemma shows that if a given function is an invertible linear trans-
formation from a vector space into another, then the linearity is preserved by the
inversion.

Lemma 4.6 Let V and W be vector spaces. If T : V → W is an invertible linear transformation, then its inverse T⁻¹ : W → V is also linear.

Proof: Let w1, w2 ∈ W, and let k be any scalar. Since T is invertible, there exist unique vectors v1 and v2 in V such that T(v1) = w1 and T(v2) = w2. Then

T⁻¹(w1 + kw2) = T⁻¹(T(v1) + kT(v2)) = T⁻¹(T(v1 + kv2)) = v1 + kv2 = T⁻¹(w1) + kT⁻¹(w2). □
Definition 4.3 A linear transformation T : V ~ W from a vector space V to another
W is called an isomorphism if it is invertible (or one-to-one and onto) . In this case,
we say that V and W are isomorphic to each other.

Example 4.9 (The vector space P_n(ℝ) is isomorphic to ℝⁿ⁺¹) Consider the vector space P2(ℝ) = {a + bx + cx² : a, b, c ∈ ℝ} of all polynomials of degree ≤ 2 with real coefficients. To each polynomial a + bx + cx² in the space P2(ℝ), one can assign the column vector [a b c]ᵀ in ℝ³. Then it is not hard to see that this assignment is an isomorphism from the vector space P2(ℝ) to the 3-space ℝ³, by which one can identify the polynomial a + bx + cx² with the column vector [a b c]ᵀ. It means that these two vector spaces can be considered as the same vector space through the isomorphism. In this sense, one often says that a vector space can be identified with another if they are isomorphic to each other. In general, the vector space P_n(ℝ) can be identified with the (n + 1)-space ℝⁿ⁺¹. □

It is clear from Lemma 4.6 that if T is an isomorphism, then its inverse T⁻¹ is also an isomorphism with (T⁻¹)⁻¹ = T. In particular, if a linear transformation A : ℝⁿ → ℝⁿ is defined by an invertible n × n matrix A as in Example 4.2(1), then the inverse matrix A⁻¹ defines the inverse linear transformation, so that it is also an isomorphism on ℝⁿ. That is, a linear transformation A : ℝⁿ → ℝⁿ defined by an n × n square matrix A is an isomorphism if and only if A is invertible, that is, rank A = n.
Problem 4.6 Suppose that Sand T are linear transformations whose composition SoT is well
defined. Prove that
(1) if SoT is one-to-one, so is T,
(2) if SoT is onto, so is S,
(3) if Sand T are isomorphisms, so is SoT,
(4) if A and B are two n x n matrices of rank n, so is AB .

Problem 4.7 Let T : V ~ W be a linear transformation. Prove that


(1) T is one-to -one if and only if Ker(T) = {O}.
(2) If V = W, then T is one-to-one if and only if T is onto.

Theorem 4.7 Two vector spaces V and W are isomorphic if and only if dim V = dim W.

Proof: Let T : V → W be an isomorphism, and let {v1, v2, ..., vn} be a basis for V. Then we show that the set {T(v1), T(v2), ..., T(vn)} is a basis for W, so that dim W = n = dim V.
(1) It is linearly independent: Since T is one-to-one, the equation

c1 T(v1) + c2 T(v2) + ··· + cn T(vn) = T(c1 v1 + c2 v2 + ··· + cn vn) = 0

implies that 0 = c1 v1 + c2 v2 + ··· + cn vn. Since the v_i's are linearly independent, we have c_i = 0 for all i = 1, 2, ..., n.
(2) It spans W: Since T is onto, for any y ∈ W there exists an x ∈ V such that T(x) = y. Write x = Σ_{i=1}^n a_i v_i. Then

y = T(x) = Σ_{i=1}^n a_i T(v_i),

i.e., y is a linear combination of T(v1), T(v2), ..., T(vn).
Conversely, if dim V = dim W = n, then one can choose bases {v1, v2, ..., vn} and {w1, w2, ..., wn} for V and W, respectively. By Theorem 4.3, there exist linear transformations T : V → W and S : W → V such that T(v_i) = w_i and S(w_i) = v_i for i = 1, 2, ..., n. Clearly, (S ∘ T)(v_i) = v_i and (T ∘ S)(w_i) = w_i for i = 1, 2, ..., n, which implies that S ∘ T and T ∘ S are the identity transformations on V and W, respectively, by the uniqueness in Corollary 4.4. Hence, T and S are isomorphisms, and consequently V and W are isomorphic. □

Corollary 4.8 Let V and W be vector spaces.


(1) If dim V = n, then V is isomorphic to the n-space jRn.
(2) If dim V = dim W, any bijective function from a basis for V to a basis for W
can be extended to an isomorphism from V to w.
An isomorphism between a vector space V and jRn in Corollary 4.8 depends on the
choices of bases for two spaces as shown in Theorem 4.7. However, an isomorphism
is uniquely determined if we fix the bases in which the order of the vectors is also
fixed.
An ordered basis for a vector space is a basis endowed with a specific order.
For example, in the 3-space jR3 , two bases {eI, e2, e3} with the order eI, e2, e3 and
{e2, eI, e3} with the order e2, ej , e3 are clearly different as ordered bases, but the
same as unordered ones . The basis {el , e2, ... , en} with the order ej , e2, . .. , en is
called the standard ordered basis for IRn. However, we often say simply a basis for
an ordered basis if there is no ambiguity in the context.
Let V be a vector space of dimension n with an ordered basis α = {v1, v2, ..., vn}, and let β = {e1, e2, ..., en} be the standard ordered basis for ℝⁿ. Then the isomorphism Φ : V → ℝⁿ defined by Φ(v_i) = e_i is called the natural isomorphism with respect to the basis α. By this isomorphism, a vector in V can be identified with a column vector in ℝⁿ. In fact, for any x = Σ_{i=1}^n a_i v_i ∈ V, the image of x under this natural isomorphism is written as

Φ(x) = Σ_{i=1}^n a_i Φ(v_i) = Σ_{i=1}^n a_i e_i = (a_1, ..., a_n) = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} ∈ ℝⁿ,

which is called the coordinate vector of x with respect to the ordered basis α, and it is denoted by [x]_α. Clearly [v_i]_α = e_i.
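For V = ℝⁿ the coordinate vector can be computed by solving one linear system: if the basis vectors are the columns of a matrix P, then [x]_α is the solution c of Pc = x. The following numpy sketch is illustrative only (the basis and the vector are made-up examples).

import numpy as np

P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)  # columns: v1, v2, v3
x = np.array([3, 4, 5], dtype=float)

c = np.linalg.solve(P, x)      # c = [x]_alpha
print(c)                       # [1. 2. 3.]
print(P @ c)                   # recovers x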

Example 4.10 (1) Recall that, from Example 4.3, the rotation by the angle θ of ℝ² is given by the matrix

R_θ = \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix}.

Clearly, it is invertible and hence is an isomorphism of ℝ². In fact, the inverse R_θ⁻¹ is another rotation R_{−θ}.
(2) Let α = {e1, e2} be the standard ordered basis, and let β = {v1, v2}, where v_i = R_θ e_i, i = 1, 2. Then β is also a basis for ℝ². The coordinate vectors of v_i with respect to α are

[v_1]_α = \begin{bmatrix} \cos θ \\ \sin θ \end{bmatrix}, \quad [v_2]_α = \begin{bmatrix} -\sin θ \\ \cos θ \end{bmatrix},

while

[v_1]_β = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad [v_2]_β = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.

If we choose α′ = {e2, e1} as a different ordered basis for ℝ², then the coordinate vectors of v_i with respect to α′ are

[v_1]_{α′} = \begin{bmatrix} \sin θ \\ \cos θ \end{bmatrix}, \quad [v_2]_{α′} = \begin{bmatrix} \cos θ \\ -\sin θ \end{bmatrix}. □


Example 4.11 (All reflections are of the form R_θ ∘ T ∘ R_{−θ}) In the plane ℝ², the reflection about the line y = x can be obtained as the composition of the rotation of the plane by −π/4, the reflection about the x-axis, and the rotation by π/4. Actually, it is a product of the matrices given in (1) and (3) of Example 4.3 with θ = π/4. Note that the rotation by π/4 is

R_{π/4} = \begin{bmatrix} \cos\frac{π}{4} & -\sin\frac{π}{4} \\ \sin\frac{π}{4} & \cos\frac{π}{4} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \end{bmatrix},

the reflection about the x-axis is \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, and R_{−π/4} = R_{π/4}⁻¹. Hence, the matrix for the reflection about the line y = x is

R_{π/4} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} R_{π/4}^{-1} = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.

In general, the reflection about a line ℓ in the plane can be expressed as the composition R_θ ∘ T ∘ R_{−θ}, where T is the reflection about the x-axis and θ is the angle between the x-axis and the line ℓ (see Figure 4.2). □

Figure 4.2. The reflection R_θ ∘ T ∘ R_{−θ}
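The composition R_θ T R_{−θ} is easy to evaluate numerically. The sketch below (illustrative only, not part of the text) reproduces the y = x case and can be reused for any angle θ, for instance the one needed in Problem 4.8.

import numpy as np

def rot(t):
    # rotation of the plane by the angle t
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

T = np.array([[1.0, 0.0], [0.0, -1.0]])          # reflection about the x-axis

theta = np.pi / 4                                 # the line y = x makes angle pi/4 with the x-axis
print(np.round(rot(theta) @ T @ rot(-theta), 10)) # [[0. 1.] [1. 0.]]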

Problem 4.8 Find the matrix of the reflection about the line y = √3 x in ℝ².

Problem 4.9 Find the coordinate vector of 5 + 2x + 3x² with respect to the given ordered basis α for P2(ℝ):
(1) α = {1, x, x²};  (2) α = {1 + x, 1 + x², x + x²}.

4.3 Matrices of linear transformations


We have seen that the product of an m × n matrix A and an n × 1 column matrix x gives rise to a linear transformation from ℝⁿ to ℝᵐ. Conversely, one can show that a linear transformation of a vector space into another can be represented by a matrix via the natural isomorphism between an n-dimensional vector space V and the n-space ℝⁿ, which will be shown in this section.
Let T : V → W be a linear transformation from an n-dimensional vector space V to an m-dimensional vector space W, and let α = {v1, ..., vn} and β = {w1, ..., wm} be any ordered bases for V and W, respectively, which will be fixed throughout this section. Then by Theorem 4.3 the linear transformation T is completely determined by its values on the basis α: Write them as

T(v_1) = a_{11}w_1 + a_{21}w_2 + ··· + a_{m1}w_m
T(v_2) = a_{12}w_1 + a_{22}w_2 + ··· + a_{m2}w_m
  ⋮
T(v_n) = a_{1n}w_1 + a_{2n}w_2 + ··· + a_{mn}w_m,

or, in a short form,

T(v_j) = Σ_{i=1}^m a_{ij} w_i  for 1 ≤ j ≤ n,

for some scalars a_{ij} (i = 1, 2, ..., m; j = 1, 2, ..., n).
Now, for any vector x = Σ_{j=1}^n x_j v_j ∈ V,

T(x) = Σ_{j=1}^n x_j T(v_j) = Σ_{j=1}^n x_j Σ_{i=1}^m a_{ij} w_i = Σ_{i=1}^m (Σ_{j=1}^n a_{ij} x_j) w_i.

Equivalently, the coordinate vector of T(x) with respect to the basis β in W is

[T(x)]_β = \begin{bmatrix} \sum_{j=1}^n a_{1j}x_j \\ \vdots \\ \sum_{j=1}^n a_{mj}x_j \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = A[x]_α.
That is, for any x ∈ V the coordinate vector [T(x)]_β of T(x) in W is just the product of a fixed matrix A and the coordinate vector [x]_α of x. This situation can be incorporated in the commutative diagram in Figure 4.3 with the natural isomorphisms Φ and Ψ defined in Section 4.2. Note that the commutativity of the diagram means that A ∘ Φ = Ψ ∘ T.

Figure 4.3. The associated matrix for T
Note that

A = \begin{bmatrix} [T(v_1)]_β & [T(v_2)]_β & \cdots & [T(v_n)]_β \end{bmatrix}

is the matrix whose column vectors are just the coordinate vectors [T(v_j)]_β of T(v_j) with respect to the basis β. In fact, A = [a_{ij}] is just the transpose of the coefficient matrix in the expression of the T(v_j) with respect to the basis β in W. Note that this matrix is unique, since the coordinate expression of a vector with respect to a fixed basis is unique.

Definition 4.4 The matrix A is called the associated matrix for T (or the matrix representation of T) with respect to the ordered bases α and β, and is denoted by A = [T]_α^β. When V = W and α = β, we simply write [T]_α for [T]_α^α.
Now, the argument so far can be summarized in the following theorem.

Theorem 4.9 Let T : V → W be a linear transformation from an n-dimensional vector space V to an m-dimensional vector space W. For fixed ordered bases α = {v1, v2, ..., vn} for V and β for W, there corresponds a unique associated m × n matrix [T]_α^β for T such that for any vector x ∈ V the coordinate vector [T(x)]_β of T(x) with respect to β is given as the matrix product of the associated matrix [T]_α^β for T and the coordinate vector [x]_α, i.e.,

[T(x)]_β = [T]_α^β [x]_α.

The associated matrix [T]_α^β is given as

[T]_α^β = \begin{bmatrix} [T(v_1)]_β & [T(v_2)]_β & \cdots & [T(v_n)]_β \end{bmatrix}.
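The recipe of Theorem 4.9 can be carried out numerically when V = ℝⁿ and W = ℝᵐ: if the basis vectors of α and β are the columns of matrices A and B, then column j of [T]_α^β solves B c = T(v_j). The numpy sketch below is illustrative only; the transformation T and the bases are made-up examples.

import numpy as np

T = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])               # T : R^3 -> R^2 (standard matrix)
A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)   # columns: basis alpha of R^3
B = np.array([[1, 1], [0, 1]], dtype=float)                    # columns: basis beta of R^2

M = np.linalg.solve(B, T @ A)      # column j is [T(v_j)]_beta, so M = [T]_alpha^beta
x = np.array([2.0, -1.0, 3.0])
lhs = np.linalg.solve(B, T @ x)    # [T(x)]_beta
rhs = M @ np.linalg.solve(A, x)    # [T]_alpha^beta [x]_alpha
print(np.allclose(lhs, rhs))       # True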

The following examples illustrate the computation of the associated matrices for
linear transformations.

Example 4.12 (The associated matrix [id]_α) Let id : V → V be the identity transformation on a vector space V. Then for any ordered basis α for V, the matrix [id]_α = I, the identity matrix, because if α = {v1, v2, ..., vn}, then

id(v_1) = 1·v_1 + 0·v_2 + ··· + 0·v_n
id(v_2) = 0·v_1 + 1·v_2 + ··· + 0·v_n
  ⋮
id(v_n) = 0·v_1 + 0·v_2 + ··· + 1·v_n. □
Example 4.13 (The associated matrix [T]_α^β) Let T : P1(ℝ) → P2(ℝ) be the linear transformation defined by

(T(p))(x) = x p(x).

Find the associated matrix [T]_α^β with respect to the ordered bases α = {1, x} and β = {1, x, x²} for P1(ℝ) and P2(ℝ), respectively.

Solution: Clearly,

(T(1))(x) = x  = 0·1 + 1·x + 0·x²
(T(x))(x) = x² = 0·1 + 0·x + 1·x².

Hence, the associated matrix for T is the transpose of the coefficient matrix in this expression, that is,

[T]_α^β = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}. □

Example 4.14 (The associated matrices [T]_α^β and [T]_α^{β′}) Let T : ℝ² → ℝ³ be the linear transformation defined by T(x, y) = (x + 2y, 0, 2x + 3y), and let α and β be the standard bases for ℝ² and ℝ³, respectively. Then

T(e_1) = T(1, 0) = (1, 0, 2) = 1e_1 + 0e_2 + 2e_3
T(e_2) = T(0, 1) = (2, 0, 3) = 2e_1 + 0e_2 + 3e_3.

Hence, [T]_α^β = \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ 2 & 3 \end{bmatrix}. If β′ = {e_3, e_2, e_1}, then [T]_α^{β′} = \begin{bmatrix} 2 & 3 \\ 0 & 0 \\ 1 & 2 \end{bmatrix}. □

Example 4.15 (The associated matrix [T]_α for the standard basis α) Let T : ℝ² → ℝ² be a linear transformation given by T(1, 1) = (0, 1) and T(−1, 1) = (2, 3). Find the matrix representation [T]_α of T with respect to the standard basis α = {e1, e2}.

Solution: Note (a, b) = ae_1 + be_2 for any (a, b) ∈ ℝ². Thus, the definition of T shows

 T(e_1) + T(e_2) = T(e_1 + e_2) = T(1, 1) = (0, 1) = e_2,
−T(e_1) + T(e_2) = T(−e_1 + e_2) = T(−1, 1) = (2, 3) = 2e_1 + 3e_2.

By solving these equations, we obtain

T(e_1) = −e_1 − e_2,
T(e_2) =  e_1 + 2e_2.

Therefore, [T]_α = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix}. □

Example 4.16 (The associated matrix [T]_β for a non-standard basis β) Let T be the linear transformation given in Example 4.15. Find [T]_β for the basis β = {v1, v2}, where v1 = (0, 1) and v2 = (2, 3).

Solution: From Example 4.15,

T(v_1) = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} = [T(v_1)]_α, \quad T(v_2) = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix} = [T(v_2)]_α.

To write these vectors as linear combinations of the basis vectors in β, we put

\begin{bmatrix} 1 \\ 2 \end{bmatrix} = a v_1 + b v_2 = \begin{bmatrix} 2b \\ a + 3b \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 4 \end{bmatrix} = c v_1 + d v_2 = \begin{bmatrix} 2d \\ c + 3d \end{bmatrix}.

Solving for a, b, c and d, we obtain

[T(v_1)]_β = \begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} and [T(v_2)]_β = \begin{bmatrix} c \\ d \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 5 \\ 1 \end{bmatrix}.

Therefore, [T]_β = \frac{1}{2}\begin{bmatrix} 1 & 5 \\ 1 & 1 \end{bmatrix}. □
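Examples 4.15 and 4.16 can be checked with a few lines of numpy; the sketch below is illustrative only and follows the same method as the examples (solve for [T]_α from the two given values, then express T(v_j) in terms of β).

import numpy as np

P = np.array([[1, -1], [1, 1]], dtype=float)     # columns (1,1) and (-1,1)
W = np.array([[0, 2], [1, 3]], dtype=float)      # columns T(1,1) and T(-1,1)
T_alpha = W @ np.linalg.inv(P)
print(T_alpha)                                   # [[-1.  1.] [-1.  2.]]

Q = np.array([[0, 2], [1, 3]], dtype=float)      # columns v1 = (0,1), v2 = (2,3)
Tv = T_alpha @ Q                                 # columns T(v1), T(v2) in standard coordinates
print(np.linalg.solve(Q, Tv))                    # columns [T(v1)]_beta, [T(v2)]_beta = [[0.5 2.5] [0.5 0.5]]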
Remark: (1) Recall that any m × n matrix A can be considered as a linear transformation from the n-space ℝⁿ to the m-space ℝᵐ via x ↦ Ax. Clearly, its matrix representation with respect to the standard bases α for ℝⁿ and β for ℝᵐ is the matrix A itself, i.e., A = [A]_α^β. (Note that Ae_j is just the j-th column vector of A.) In particular, if A is an invertible n × n square matrix, then the column vectors c_1, c_2, ..., c_n form another basis γ for ℝⁿ. Thus, A is simply the linear transformation on ℝⁿ that takes the standard basis α to γ; in fact, Ae_j = c_j, the j-th column of A, so that its matrix representation [A]_α^γ is the identity matrix.
(2) Let V and W be vector spaces with bases α and β, respectively, and let T : V → W be a linear transformation with the matrix representation [T]_α^β = A. Then it is clear that Ker(T) and Im(T) are isomorphic to the null space N(A) and the column space C(A), respectively, via the natural isomorphisms. In particular, if V = ℝⁿ and W = ℝᵐ with the standard bases, then Ker(T) = N(A) and Im(T) = C(A). Therefore, from Theorem 3.17, we have

dim Ker(T) + dim Im(T) = dim V.

(3) Let Ax = b be a system of linear equations with an m × n coefficient matrix A. By considering the matrix A as a linear transformation from ℝⁿ to ℝᵐ, one can have other equivalent conditions to those mentioned in Theorems 3.24 and 3.25: The conditions in Theorem 3.24 (e.g., C(A) = ℝᵐ) are equivalent to the condition that A is surjective, and those in Theorem 3.25 (e.g., N(A) = {0}) are equivalent to the condition that A is one-to-one. This observation gives the proof of (15)-(16) in Theorem 3.26.

Problem 4.10 Find the matrix representations [TJa and [Tl,8 of each of the following linear
transformations T on jR3 with respect to the standard basis a. = {eI, e2, e3} and another
fJ = [ej , e2, ej}:
(1) T(x, y, z) = (2x - 3y + 4z, Sx - y + 2z , 4x + 7y),
(2) T(x, y, z) = (2y + z, x - 4y , 3x).

Also , find the matrix representation [T]~ of each of the linear transformations T.

Problem 4.11 Let T : jR4 _ jR3 be the linear transformation defined by

T(x , y , z , u)=(x+2y, x - 3z + u, 2y+3z+4u) .

Let a. and fJ be the standard bases for jR4 and jR3 , respectively . Find [Tl~ .

Problem 4.12 Let id : jRn _ jRn be the identity transformation. Let Xk denote the vector in
jRn whose first k - 1 coordinates are zero and the last n - k + 1 coordinates are 1. Then clearly
fJ = [xj , . .. , xnl is a basis for jRn (see Problem 3.9) . Let a. = {eI, . . . , en} be the standard
basis for jRn. Find the matrix representations [idl~ and [idl~ .

4.4 Vector spaces of linear transformations


Let V and W be two vector spaces of dimensions n and m. Let L(V; W) denote the set of all linear transformations from V to W, i.e.,

L(V; W) = {T : T is a linear transformation from V to W}.

For any two linear transformations S and T in L(V; W) and λ ∈ ℝ, we define the sum S + T and the scalar multiplication λS by

(S + T)(v) = S(v) + T(v) and (λS)(v) = λ(S(v))

for any v ∈ V. Clearly, the sum S + T and the scalar multiplication λS are also linear and satisfy the operational rules of a vector space, so that L(V; W) becomes a vector space.
Let α and β be two ordered bases for V and W, respectively, and let T : V → W be a linear transformation. Then the associated matrix [T]_α^β of T with respect to these bases is uniquely determined by Theorem 4.9. That is, the function φ : L(V; W) → M_{m×n}(ℝ) defined by

φ(T) = [T]_α^β ∈ M_{m×n}(ℝ)

for T ∈ L(V; W) is well defined (see Section 4.3).

Lemma 4.10 The function φ : L(V; W) → M_{m×n}(ℝ) is a one-to-one correspondence between L(V; W) and M_{m×n}(ℝ).

Proof: (1) It is one-to-one: If [S]_α^β = [T]_α^β for S and T in L(V; W), then we have S = T by Corollary 4.4.
(2) It is onto: For any m × n matrix A (considered as a linear transformation from ℝⁿ to ℝᵐ), define a linear transformation T : V → W by T = Ψ⁻¹ ∘ A ∘ Φ as the composition of A with the natural isomorphisms Φ : V → ℝⁿ and Ψ : W → ℝᵐ. Then clearly [T]_α^β = A, i.e., φ is onto. □

Furthermore, the following lemma shows that φ is linear, so that it is in fact an isomorphism from L(V; W) to M_{m×n}(ℝ).

Lemma 4.11 Let V and W be vector spaces with ordered bases α and β, respectively, and let S, T : V → W be linear. Then we have

[S + T]_α^β = [S]_α^β + [T]_α^β and [kS]_α^β = k[S]_α^β.

Proof: Let α = {v1, ..., vn} and β = {w1, ..., wm}. Then we have unique expressions S(v_j) = Σ_{i=1}^m a_{ij} w_i and T(v_j) = Σ_{i=1}^m b_{ij} w_i for each 1 ≤ j ≤ n, so that [S]_α^β = [a_{ij}] and [T]_α^β = [b_{ij}]. Hence

(S + T)(v_j) = Σ_{i=1}^m a_{ij} w_i + Σ_{i=1}^m b_{ij} w_i = Σ_{i=1}^m (a_{ij} + b_{ij}) w_i.

Thus

[S + T]_α^β = [S]_α^β + [T]_α^β.

The proof of the second equality [kS]_α^β = k[S]_α^β is similar and left as an exercise. □

In particular, if V = ℝⁿ and W = ℝᵐ, then the vector space M_{m×n}(ℝ) of m × n matrices may be identified with the vector space L(ℝⁿ; ℝᵐ), since such a matrix A is a linear transformation and A itself is the matrix representation of itself with respect to the standard bases of ℝⁿ and ℝᵐ.
One can summarize our discussions in the following theorem:

Theorem 4.12 For vector spaces V of dimension n and W of dimension m, the vector space L(V; W) of all linear transformations from V to W is isomorphic to the vector space M_{m×n}(ℝ) of all m × n matrices, and

dim L(V; W) = dim M_{m×n}(ℝ) = mn = dim V · dim W.

Remark: With an isomorphism from the vector space L(V; W) to the vector space M_{m×n}(ℝ) as mentioned in Theorem 4.12, one can prove that the following conditions for a linear transformation T on a vector space V are equivalent, as mentioned in Theorem 3.26:
(1) T is an isomorphism,
(2) T is one-to-one,
(3) T is surjective.
(One can also prove this directly by using the definition of a basis for V. See Problem 4.7.)

The next theorem shows that the one-to-one correspondence between L(V; W) and M_{m×n}(ℝ) preserves not only the vector space structure but also the compositions of linear transformations. Let V, W and Z be vector spaces. Suppose that S : V → W and T : W → Z are linear transformations. Then the composition T ∘ S : V → Z is also linear.

Theorem 4.13 Let V, W and Z be vector spaces with ordered bases α, β, and γ, respectively. Suppose that S : V → W and T : W → Z are linear transformations. Then

[T ∘ S]_α^γ = [T]_β^γ [S]_α^β.

Proof: Let α = {v1, ..., vn}, β = {w1, ..., wm} and γ = {z1, ..., zℓ}. Let [T]_β^γ = [a_{ij}] and [S]_α^β = [b_{pq}]. Then, for 1 ≤ i ≤ n,

(T ∘ S)(v_i) = T(S(v_i)) = T(Σ_{k=1}^m b_{ki} w_k) = Σ_{k=1}^m b_{ki} T(w_k)
            = Σ_{k=1}^m b_{ki} (Σ_{j=1}^ℓ a_{jk} z_j) = Σ_{j=1}^ℓ (Σ_{k=1}^m a_{jk} b_{ki}) z_j.

It shows that [T ∘ S]_α^γ = [T]_β^γ [S]_α^β. □



Problem 4.13 Let ex be the standard basis for lR3, and let S, T : lR3 ~ lR3 be two linear
transformations given by

S(el) = (2, 2, 1), S(e2) = (0, I, 2), S(e3) = (-1, 2, I) ,


T(el) = (1, 0, 1), T(e2) = (0, 1, I), T(e3) = (1, 1, 2).

Compute [S + Tl a, [2T - Sla and [T 0 Sla .

Problem 4.14 Let T : P2(ℝ) → P2(ℝ) be the linear transformation defined by T(f) = (3 + x)f′ + 2f, and let S : P2(ℝ) → ℝ³ be the one defined by S(a + bx + cx²) = (a − b, a + b, c). For the basis α = {1, x, x²} for P2(ℝ) and the standard basis β = {e1, e2, e3} for ℝ³, compute [S]_α^β, [T]_α and [S ∘ T]_α^β.

Theorem 4.14 Let V and W be vector spaces with ordered bases α and β, respectively, and let T : V → W be an isomorphism. Then

[T⁻¹]_β^α = ([T]_α^β)⁻¹.

Proof: Since T is invertible, dim V = dim W, and the matrices [T]_α^β and [T⁻¹]_β^α are square and of the same size. Thus,

[T⁻¹]_β^α [T]_α^β = [T⁻¹ ∘ T]_α = [id_V]_α

is the identity matrix. Hence, [T⁻¹]_β^α = ([T]_α^β)⁻¹. □

Problem 4.15 For the vector spaces P1(ℝ) and ℝ², choose the bases α = {1, x} for P1(ℝ) and β = {e1, e2} for ℝ², respectively. Let T : P1(ℝ) → ℝ² be the linear transformation defined by T(a + bx) = (a, a + b).
(1) Show that T is invertible. (2) Find [T]_α^β and [T⁻¹]_β^α.

4.5 Change of bases

In Section 4.2, we have seen that, in an n-dimensional vector space V with a fixed basis α, any vector x can be identified with a column vector [x]_α in the n-space ℝⁿ via the natural isomorphism Φ. Of course, one may get a different column vector [x]_β if another basis β is given instead of α. Thus, one may naturally ask what the relation between [x]_α and [x]_β is for two different bases α and β.
To answer this question, let us begin with an example in the plane ℝ². The coordinate expression of x = (x, y) ∈ ℝ² with respect to the standard basis α = {e1, e2} is x = xe_1 + ye_2, so that [x]_α = \begin{bmatrix} x \\ y \end{bmatrix}.



Figure 4.4. Coordinates {e1, e2} and {e′1, e′2}

Now let β = {e′1, e′2} be another basis for ℝ² obtained by rotating α counterclockwise through an angle θ as in Figure 4.4, and suppose that the coordinate expression of x ∈ ℝ² with respect to β is written as x = x′e′1 + y′e′2, or [x]_β = \begin{bmatrix} x' \\ y' \end{bmatrix}. Then, the expressions of the vectors in β with respect to α are

e′1 = id(e′1) = cos θ e_1 + sin θ e_2
e′2 = id(e′2) = −sin θ e_1 + cos θ e_2,

so

[e′1]_α = \begin{bmatrix} \cos θ \\ \sin θ \end{bmatrix}, \quad [e′2]_α = \begin{bmatrix} -\sin θ \\ \cos θ \end{bmatrix}.

Therefore, from x = xe_1 + ye_2 and

x = x′e′1 + y′e′2 = (x′ cos θ − y′ sin θ)e_1 + (x′ sin θ + y′ cos θ)e_2,

one can have the matrix equation:

\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}, or [x]_α = [id]_β^α [x]_β,

where

[id]_β^α = \begin{bmatrix} [e′1]_α & [e′2]_α \end{bmatrix} = \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix}.

It means that the two coordinate vectors [x]_α and [x]_β in the 2-space ℝ² are related by the associated matrix [id]_β^α for the identity transformation id on ℝ². Note that [id]_α^β = ([id]_β^α)⁻¹ = \begin{bmatrix} \cos θ & \sin θ \\ -\sin θ & \cos θ \end{bmatrix} by Theorem 4.14.

In general, if α = {v1, v2, ..., vn} and β = {w1, w2, ..., wn} are two ordered bases for an n-dimensional vector space V, then any vector x ∈ V has two expressions:

x = Σ_{i=1}^n x_i v_i = Σ_{j=1}^n y_j w_j.

In particular, each vector in β is expressed as a linear combination of the vectors in α: say w_j = id(w_j) = Σ_{i=1}^n q_{ij} v_i for j = 1, 2, ..., n, so that

[w_j]_α = \begin{bmatrix} q_{1j} \\ \vdots \\ q_{nj} \end{bmatrix}.

Then for any x ∈ V,

x = Σ_{i=1}^n x_i v_i = Σ_{j=1}^n y_j w_j = Σ_{j=1}^n y_j Σ_{i=1}^n q_{ij} v_i = Σ_{i=1}^n (Σ_{j=1}^n q_{ij} y_j) v_i.

This is equivalent to the matrix equation

[x]_α = \begin{bmatrix} \sum_{j=1}^n q_{1j} y_j \\ \vdots \\ \sum_{j=1}^n q_{nj} y_j \end{bmatrix} = [id]_β^α [x]_β,

where

[id]_β^α = \begin{bmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & & \vdots \\ q_{n1} & \cdots & q_{nn} \end{bmatrix} = \begin{bmatrix} [w_1]_α & \cdots & [w_n]_α \end{bmatrix}.

This means that any two coordinate vectors of a vector in V with respect to two different ordered bases α and β are related by the matrix representation [id]_β^α of the identity transformation on V, and this can be incorporated in the commutative diagram in Figure 4.5.

Definition 4.5 The matrix representation [id]_β^α of the identity transformation id : V → V with respect to any two ordered bases α and β is called the basis-change matrix or the coordinate-change matrix from β to α.

Since the identity transformation id : V → V is invertible, the basis-change matrix Q = [id]_β^α is also invertible by Theorem 4.14. If we had taken the expressions of the vectors in the basis α with respect to the basis β, say v_j = id(v_j) = Σ_{i=1}^n p_{ij} w_i for j = 1, 2, ..., n, then [p_{ij}] = [id]_α^β = Q⁻¹ and

[x]_β = [id]_α^β [x]_α = ([id]_β^α)⁻¹ [x]_α.



Figure 4.5. The basis-change matrix Q = [id]_β^α

Example 4.17 (Analytic interpretation of a basis-change matrix) Consider the curve xy = 1 in the plane ℝ². Find the equation of the curve which is obtained from the curve xy = 1 by rotating it around the origin clockwise through an angle π/4.

Solution: Let β = {e′1, e′2} be the basis for ℝ² obtained by rotating the standard basis α = {e1, e2} counterclockwise through an angle π/4, and let [x]_α = \begin{bmatrix} x \\ y \end{bmatrix} and [x]_β = \begin{bmatrix} x' \\ y' \end{bmatrix}. Then,

\begin{bmatrix} x \\ y \end{bmatrix} = [id]_β^α \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\frac{π}{4} & -\sin\frac{π}{4} \\ \sin\frac{π}{4} & \cos\frac{π}{4} \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix} = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}.

Hence, the equation xy = 1 is transformed to

1 = xy = \left(\frac{1}{\sqrt 2}x' - \frac{1}{\sqrt 2}y'\right)\left(\frac{1}{\sqrt 2}x' + \frac{1}{\sqrt 2}y'\right) = \frac{(x')^2}{2} - \frac{(y')^2}{2},

which is a hyperbola. (See Figure 4.6.) □

Figure 4.6. The graphs of xy = 1 and (x′)²/2 − (y′)²/2 = 1

Example 4.18 (Computing a basis-change matrix) Let the 3-space ℝ³ be equipped with the standard xyz-coordinate system, i.e., with the standard basis α = {e1, e2, e3}. Take a new x′y′z′-coordinate system by rotating the xyz-system around its z-axis counterclockwise through an angle θ, i.e., we take a new basis β = {e′1, e′2, e′3} obtained by rotating the basis α about the z-axis through θ. Then we get

[e′1]_α = \begin{bmatrix} \cos θ \\ \sin θ \\ 0 \end{bmatrix}, \quad [e′2]_α = \begin{bmatrix} -\sin θ \\ \cos θ \\ 0 \end{bmatrix}, \quad [e′3]_α = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.

Hence, the basis-change matrix from β to α is

Q = [id]_β^α = \begin{bmatrix} \cos θ & -\sin θ & 0 \\ \sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{bmatrix},

so

[x]_α = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \cos θ & -\sin θ & 0 \\ \sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = Q[x]_β.

Moreover, Q = [id]_β^α is invertible and the basis-change matrix from α to β is

Q⁻¹ = [id]_α^β = \begin{bmatrix} \cos θ & \sin θ & 0 \\ -\sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{bmatrix},

so that

\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos θ & \sin θ & 0 \\ -\sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}. □
Problem 4.16 Find the basis-change matrix from a basis α to another basis β for the 3-space ℝ³, where α = {(1, 0, 1), (1, 1, 0), (0, 1, 1)}, β = {(2, 3, 1), (1, 2, 0), (2, 0, 3)}.

4.6 Similarity

The coordinate expression of a vector in a vector space V depends on the choice of


an ordered basis. Hence, the matrix representation of a linear transformation is also
dependent on the choice of bases.
Let V and W be two vector spaces of dimensions n and m with two ordered bases α and β, respectively, and let T : V → W be a linear transformation. In Section 4.3, we discussed how to find the associated matrix [T]_α^β. If one takes different bases α′ and β′ for V and W, respectively, then one may get another associated matrix [T]_{α′}^{β′} of T. In fact, we have two different expressions

[x]_α and [x]_{α′} in ℝⁿ for each x ∈ V,
[T(x)]_β and [T(x)]_{β′} in ℝᵐ for T(x) ∈ W.

They are related by the basis-change matrices as follows:

[x]_{α′} = [id_V]_α^{α′} [x]_α, and [T(x)]_{β′} = [id_W]_β^{β′} [T(x)]_β.

On the other hand, by Theorem 4.9, we have

[T(x)]_β = [T]_α^β [x]_α, and [T(x)]_{β′} = [T]_{α′}^{β′} [x]_{α′}.

Therefore, we get

[T]_{α′}^{β′} [x]_{α′} = [T(x)]_{β′} = [id_W]_β^{β′} [T(x)]_β = [id_W]_β^{β′} [T]_α^β [x]_α = [id_W]_β^{β′} [T]_α^β [id_V]_{α′}^α [x]_{α′}.

This equation looks messy. However, by Theorem 4.13, this relation can be obtained directly from T = id_W ∘ T ∘ id_V as

[T]_{α′}^{β′} = [id_W ∘ T ∘ id_V]_{α′}^{β′} = [id_W]_β^{β′} [T]_α^β [id_V]_{α′}^α.

Note that [T]_α^β and [T]_{α′}^{β′} are m × n matrices, [id_V]_{α′}^α is an n × n matrix and [id_W]_β^{β′} is an m × m matrix.
The relation can also be incorporated in the diagram in Figure 4.7, in which all rectangles are commutative.

Figure 4.7. Relating the two associated matrices [T]_α^β and [T]_{α′}^{β′}

Our discussion is summarized in the following theorem.



Theorem 4.15 Let T : V → W be a linear transformation from a vector space V with bases α and α′ to another vector space W with bases β and β′. Then

[T]_{α′}^{β′} = P⁻¹ [T]_α^β Q,

where Q = [id_V]_{α′}^α and P = [id_W]_{β′}^β are the basis-change matrices.

In particular, if we take W = V, α = β and α′ = β′, then P = Q and we get the following corollary.
Corollary 4.16 Let T : V → V be a linear transformation on a vector space V and let α and β be ordered bases for V. Let Q = [id]_β^α be the basis-change matrix from β to α. Then
(1) Q is invertible, and Q⁻¹ = [id]_α^β.
(2) For any x ∈ V, [x]_α = Q[x]_β.
(3) [T]_β = Q⁻¹[T]_α Q.

The relation (3) between [T]_β and [T]_α in Corollary 4.16 is called a similarity. In general, we have the following definition.

Definition 4.6 For any square matrices A and B, A is said to be similar to B if there exists a nonsingular matrix Q such that B = Q⁻¹AQ.

Note that if A is similar to B, then B is also similar to A. Thus we simply say that A and B are similar. We saw in Corollary 4.16 that if A and B are n × n matrices representing the same linear transformation T on a vector space V, then A and B are similar.
Example 4.19 (Two similar associated matrices) Let β = {v1, v2, v3} be a basis for the 3-space ℝ³ consisting of v1 = (1, 1, 0), v2 = (1, 0, 1) and v3 = (0, 1, 1). Let T be the linear transformation on ℝ³ given by the matrix

[T]_β = \begin{bmatrix} 2 & 1 & -1 \\ 1 & 2 & 3 \\ -1 & 1 & 1 \end{bmatrix}.

Let α = {e1, e2, e3} be the standard basis. Find the basis-change matrix [id]_α^β and [T]_α.

Solution: Since v1 = e1 + e2, v2 = e1 + e3, v3 = e2 + e3, we have

[id]_β^α = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, and [id]_α^β = ([id]_β^α)⁻¹ = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & 1 \end{bmatrix}.

Therefore,

[T]_α = [id]_β^α [T]_β [id]_α^β = \frac{1}{2}\begin{bmatrix} 4 & 2 & 2 \\ 3 & -1 & 1 \\ -1 & 1 & 7 \end{bmatrix}. □
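The similarity relation [T]_α = [id]_β^α [T]_β [id]_α^β of Example 4.19 can be verified numerically. The sketch below is illustrative only and assumes the data of the example as printed above.

import numpy as np

P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)    # columns v1, v2, v3, i.e. [id]_beta^alpha
T_beta = np.array([[2, 1, -1], [1, 2, 3], [-1, 1, 1]], dtype=float)

T_alpha = P @ T_beta @ np.linalg.inv(P)
print(2 * T_alpha)      # twice the answer, to display the integer entries [[4 2 2] [3 -1 1] [-1 1 7]]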

Example 4.20 (Computing an associated matrix) Let T : ℝ³ → ℝ³ be the linear transformation defined by

T(x1, x2, x3) = (2x1 + x2, x1 + x2 + 3x3, −x2).

Let α = {e1, e2, e3} be the standard ordered basis for ℝ³, and let β = {v1, v2, v3} be another ordered basis consisting of v1 = (−1, 0, 0), v2 = (2, 1, 0), and v3 = (1, 1, 1). Find the associated matrices [T]_α and [T]_β for T. Also, show that T(v_j) is the linear combination of the basis vectors in β with the entries of the j-th column of [T]_β as its coefficients for j = 1, 2, 3.

Solution: One can easily show that

[T]_α = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & 3 \\ 0 & -1 & 0 \end{bmatrix} and [id]_β^α = \begin{bmatrix} -1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.

Thus, with the inverse [id]_α^β = ([id]_β^α)⁻¹ = \begin{bmatrix} -1 & 2 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}, it follows that

[T]_β = [id]_α^β [T]_α [id]_β^α = \begin{bmatrix} 0 & 2 & 8 \\ -1 & 4 & 6 \\ 0 & -1 & -1 \end{bmatrix}.

To show the second statement, let j = 2. Then T(v2) = T(2, 1, 0) = (5, 3, −1). On the other hand, the coefficients of [T(v2)]_β are just the entries of the second column of [T]_β. Therefore,

T(v2) = 2v1 + 4v2 − v3 = 2(−1, 0, 0) + 4(2, 1, 0) − (1, 1, 1) = (5, 3, −1),

as expected. □
The next theorem shows that two similar matrices can be matrix representations of the same linear transformation.

Theorem 4.17 Suppose that an n × n matrix A represents a linear transformation T : V → V on a vector space V with respect to an ordered basis α = {v1, v2, ..., vn}, i.e., [T]_α = A. If B = Q⁻¹AQ for some nonsingular matrix Q, then there exists a basis β for V such that B = [T]_β and Q = [id]_β^α.

Proof: Let Q = [q_{ij}] and let w1, w2, ..., wn be the vectors in V defined by

w_1 = q_{11}v_1 + q_{21}v_2 + ··· + q_{n1}v_n
w_2 = q_{12}v_1 + q_{22}v_2 + ··· + q_{n2}v_n
  ⋮
w_n = q_{1n}v_1 + q_{2n}v_2 + ··· + q_{nn}v_n.

Then the nonsingularity of Q = [q_{ij}] implies that β = {w1, w2, ..., wn} is an ordered basis for V, and Corollary 4.16(3) shows that [T]_β = Q⁻¹[T]_α Q = Q⁻¹AQ = B with Q = [id]_β^α. □

Example 4.21 (A matrix similar to an associated matrix is also an associated matrix) Let D be the differential operator on the vector space P2(ℝ). Given the ordered basis α = {1, x, x²}, first note that

D(1)  = 0  = 0·1 + 0·x + 0·x²
D(x)  = 1  = 1·1 + 0·x + 0·x²
D(x²) = 2x = 0·1 + 2·x + 0·x².

Hence, the matrix representation of D with respect to α is given by

[D]_α = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}.

Choose a nonsingular matrix

Q = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}.

Let

B = Q⁻¹[D]_α Q = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{bmatrix}.

Now, we are going to find a basis β = {v1, v2, v3} so that B = [D]_β. But, if it is, the matrix Q must be the basis-change matrix [id]_β^α, and then

v1 =  1·1 + 0·x + 0·x² = 1,
v2 =  0·1 + 2·x + 0·x² = 2x,
v3 = −2·1 + 0·x + 4·x² = −2 + 4x².

Clearly, one can obtain

D(1)        = 0  = 0·1 + 0·(2x) + 0·(−2 + 4x²),
D(2x)       = 2  = 2·1 + 0·(2x) + 0·(−2 + 4x²),
D(−2 + 4x²) = 8x = 0·1 + 4·(2x) + 0·(−2 + 4x²),

and

[D]_β = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{bmatrix};

thus, as expected, [D]_β = B = Q⁻¹[D]_α Q. □

Problem 4.17 Let T : ]R3 ~ ]R3 be the linear transformation defined by

Let a be the standard basis, and let fJ = {VI, V2, V3} be another ordered basis consisting of
VI = (1, 0, 0), v2 = (1, 1, 0), and v3 = (1, 1, 1) for ]R3. Find the associated matrix of T
with respect to a and the associated matrix of T with respect to fJ. Are they similar?

Problem 4.18 Suppose that A and B are similar n × n matrices. Show that
(1) det A = det B,
(2) tr(A) = tr(B),
(3) rank A = rank B.

Problem 4.19 Let A and B be n × n matrices. Show that if A is similar to B, then A² is similar to B². In general, Aⁿ is similar to Bⁿ for all n ≥ 2.

4.7 Applications

4.7.1 Dual spaces and adjoint

Note that the space of all scalars is a one-dimensional vector space ℝ, and the set of all linear transformations from V to ℝ is the vector space L(V; ℝ) whose dimension is equal to the dimension of V (see Theorem 4.12), so that the two vector spaces L(V; ℝ) and V are isomorphic. In this section, we are concerned exclusively with such linear transformations from V to the scalar space ℝ.

Definition 4.7 Let V be a vector space.
(1) The vector space L(V; ℝ) of all linear transformations from V to ℝ is called the dual space of V and denoted by V*.
(2) An element (i.e., a linear transformation) in the dual space L(V; ℝ) is called a linear functional of V.

From the definition, one can say that any vector space is isomorphic to its dual space.

Example 4.22 The trace function tr : M_{n×n}(ℝ) → ℝ is a linear functional of M_{n×n}(ℝ). □

The definite integral of continuous functions is one of the most important examples of linear functionals in mathematics.

Example 4.23 (Fourier coefficients are linear functionals) Let C[a, b] be the vector space of all continuous real-valued functions on the interval [a, b]. The definite integral I : C[a, b] → ℝ defined by

I(f) = \int_a^b f(t)\,dt

is a linear functional of C[a, b]. In particular, if the interval is [0, 2π] and n is an integer, then

A_n(f) = \frac{1}{π}\int_0^{2π} f(t)\cos nt\,dt and B_n(f) = \frac{1}{π}\int_0^{2π} f(t)\sin nt\,dt

are linear functionals, called the n-th Fourier coefficients of f. □


For a matrix A regarded as a linear transformation A : ℝⁿ → ℝᵐ, the transpose Aᵀ of A is another linear transformation Aᵀ : ℝᵐ → ℝⁿ. For a linear transformation T : V → W from a vector space V to W, one can naturally ask what its transpose is and how it should be defined. In this section, we discuss this problem.
Recall that a linear functional T : V → ℝ is completely determined by its values on a basis for V. Let α = {v1, v2, ..., vn} be a basis for a vector space V. For each i = 1, 2, ..., n, define a linear functional

v_i* : V → ℝ

by v_i*(v_j) = δ_{ij} for each j = 1, 2, ..., n. Then, for any x = Σ_j a_j v_j ∈ V, we have v_i*(x) = a_i, which is the i-th coordinate of x with respect to α. Thus, the functional v_i* is called the i-th coordinate function with respect to the basis α.

Theorem 4.18 The set α* = {v_1*, v_2*, ..., v_n*} of coordinate functions forms a basis for the dual space V*, and for any T ∈ V* we have

T = Σ_{i=1}^n T(v_i) v_i*.

Proof: Clearly, the set α* = {v_1*, v_2*, ..., v_n*} is linearly independent, since 0 = Σ_{i=1}^n c_i v_i* implies 0 = Σ_{i=1}^n c_i v_i*(v_j) = c_j for each j = 1, 2, ..., n. Because dim V* = dim V = n, these n linearly independent vectors in α* must form a basis.
Now, for any T ∈ V*, let T = Σ_{i=1}^n c_i v_i*. Then T(v_j) = Σ_{i=1}^n c_i v_i*(v_j) = c_j. It gives T = Σ_{i=1}^n T(v_i) v_i*. □

Definition 4.8 For a basis α = {v1, v2, ..., vn} for a vector space V, the basis α* = {v_1*, v_2*, ..., v_n*} for V* is called the dual basis of α.

Example 4.24 (Computing a dual basis) Let α = {v1, v2} be a basis for ℝ², where v1 = (1, 2) and v2 = (1, 3). To find the dual basis α* = {v_1*, v_2*} of α, we consider the equations

1 = v_1*(v_1) = v_1*(e_1) + 2v_1*(e_2),
0 = v_1*(v_2) = v_1*(e_1) + 3v_1*(e_2).

Solving these equations, we obtain v_1*(e_1) = 3 and v_1*(e_2) = −1. Thus v_1*(x, y) = 3x − y. Similarly, it can be shown that v_2*(x, y) = −2x + y. □
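For ℝⁿ the whole dual basis can be read off at once: if the basis vectors are the columns of a matrix P, then the rows of P⁻¹ give the coefficients of the coordinate functions, since row i of P⁻¹ times v_j is δ_ij. The numpy sketch below is illustrative only and reproduces Example 4.24.

import numpy as np

P = np.array([[1, 1], [2, 3]], dtype=float)   # columns v1 = (1, 2), v2 = (1, 3)
D = np.linalg.inv(P)                          # row i holds the coefficients of v_i^*
print(D)                                      # [[ 3. -1.] [-2.  1.]]  i.e. v1*(x,y) = 3x - y, v2*(x,y) = -2x + y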

The following example shows a natural isomorphism between the n-space ℝⁿ and its dual space ℝⁿ*.

Example 4.25 (The dual basis vector e_i* is the i-th coordinate function) For the standard basis α = {e1, e2, ..., en} for the n-space ℝⁿ, its dual basis vector e_i* is just the i-th coordinate function. In fact, for any vector x = (x1, x2, ..., xn) = x1e_1 + x2e_2 + ··· + xne_n ∈ ℝⁿ, we have e_i*(x) = e_i*(x1e_1 + x2e_2 + ··· + xne_n) = x_i for all i.
On the other hand, when we write a vector in ℝⁿ as x = (x1, x2, ..., xn) with variables x_i, it means that for a given point a = (a1, a2, ..., an) ∈ ℝⁿ, each x_i gives us the i-th coordinate of a, that is, x_i(a) = a_i for all i. In this sense, one can identify e_i* = x_i for i = 1, 2, ..., n, so that ℝⁿ* = ℝⁿ, and the e_i* are called coordinate functions. □

Problem 4.20 Let α = {(1, 0, 1), (1, 2, 1), (0, 0, 1)} be a basis for ℝ³. Find the dual basis α*.

For a given linear transformation T : V → W, one can define T* : W* → V* by T*(g) = g ∘ T for any g ∈ W*. In fact, for any linear functional g ∈ W*, i.e., g : W → ℝ, the composition g ∘ T : V → ℝ given by (g ∘ T)(x) = g(T(x)) for x ∈ V defines a linear functional on V, i.e., T*(g) = g ∘ T ∈ V*.

Lemma 4.19 The transformation T* : W* → V* defined by T*(g) = g ∘ T for g ∈ W* is a linear transformation. It is called the adjoint (or transpose) of T.

Proof: For any f, g ∈ W*, a, b ∈ ℝ and x ∈ V,

T*(af + bg)(x) = (af + bg)(T(x)) = af(T(x)) + bg(T(x)) = (aT*(f) + bT*(g))(x). □

Example 4.26 ((id_V)* = id_{V*} and (T ∘ S)* = S* ∘ T*)
(1) Let id : V → V be the identity transformation on a vector space V. Then for any g ∈ V*, id*(g) = g ∘ id = g. Hence, the adjoint id* : V* → V* is the identity transformation on V*, i.e., id* = id.
(2) Let S : U → V and T : V → W be two linear transformations. Then for any g ∈ W*, we have

(T ∘ S)*(g) = g ∘ (T ∘ S) = (g ∘ T) ∘ S = T*(g) ∘ S = S*(T*(g)) = (S* ∘ T*)(g).

It shows that (T ∘ S)* = S* ∘ T*. □


Now, if S : V → W is an isomorphism, then (S⁻¹)* ∘ S* = (S ∘ S⁻¹)* = id* = id shows that S* : W* → V* is also an isomorphism.
Note that the linear transformation * : V → V* defined by assigning a basis for V to its dual basis is an isomorphism, so that the composition ** : V → V** is also an isomorphism. However, an isomorphism between V and V** can be defined without choosing a basis for V. In fact, for each x ∈ V, one can first define x̂ : V* → ℝ by x̂(f) = f(x) for every f ∈ V*. It is easy to verify that x̂ is a linear functional on V*, so that x̂ ∈ V**. The following theorem shows that the mapping Φ : V → V** defined by Φ(x) = x̂ is an isomorphism and that it does not depend on the choice of a basis for V.

Theorem 4.20 The mapping Φ : V → V** defined by Φ(x) = x̂ is an isomorphism from V to V**.

Proof: To show the linearity of Φ, let x, y ∈ V and let k be a scalar. Then, for any f ∈ V*,

Φ(x + ky)(f) = \widehat{x + ky}(f) = f(x + ky) = f(x) + kf(y) = x̂(f) + kŷ(f) = (x̂ + kŷ)(f) = (Φ(x) + kΦ(y))(f).

Hence, Φ(x + ky) = Φ(x) + kΦ(y).
To show that Φ is injective, suppose x ∈ Ker(Φ). Then Φ(x) = x̂ = 0 in V**, i.e., x̂(f) = 0 for all f ∈ V*. It implies that x = 0: In fact, if x ≠ 0, one can choose a basis α = {v1, v2, ..., vn} for V such that v1 = x. Let α* = {v_1*, v_2*, ..., v_n*} be the dual basis of α. Then

0 = x̂(v_1*) = v_1*(x) = v_1*(v_1) = 1,

which is a contradiction. Thus, x = 0 and Ker(Φ) = {0}.
Since dim V = dim V**, Φ is an isomorphism. □

Problem 4.21 Let V = ℝ³ and define f_i ∈ V* as follows:

f_1(x, y, z) = x − 2y,  f_2(x, y, z) = x + y + z,  f_3(x, y, z) = y − 3z.

Prove that {f_1, f_2, f_3} is a basis for V*, and then find a basis for V for which it is the dual.

We now consider the matrix representation of the transpose S* : W* → V* of a linear transformation S : V → W. Let α = {v1, v2, ..., vn} and β = {w1, w2, ..., wm} be bases for V and W with their dual bases α* = {v_1*, v_2*, ..., v_n*} and β* = {w_1*, w_2*, ..., w_m*}, respectively.

Theorem 4.21 The matrix representation of the transpose S* : W* → V* is the transpose of the matrix representation of S : V → W, that is,

[S*]_{β*}^{α*} = ([S]_α^β)ᵀ.

Proof: Let S(v_i) = Σ_{k=1}^m a_{ki} w_k, so that [S]_α^β = [a_{ij}]. Then

[S*]_{β*}^{α*} = \begin{bmatrix} [S*(w_1*)]_{α*} & \cdots & [S*(w_m*)]_{α*} \end{bmatrix}.

Note that, for 1 ≤ j ≤ m,

S*(w_j*) = Σ_{i=1}^n S*(w_j*)(v_i) v_i* = Σ_{i=1}^n a_{ji} v_i*,

since

S*(w_j*)(v_i) = (w_j* ∘ S)(v_i) = w_j*(S(v_i)) = w_j*(Σ_{k=1}^m a_{ki} w_k) = Σ_{k=1}^m a_{ki} w_j*(w_k) = a_{ji}.

Hence, we get [S*]_{β*}^{α*} = ([S]_α^β)ᵀ. □

Example 4.27 (The transpose Aᵀ is the adjoint transformation of A) Let A : ℝⁿ → ℝᵐ be a linear transformation defined by an m × n matrix A. Let α and β be the standard bases for ℝⁿ and ℝᵐ, respectively. Then [A]_α^β = A. By Theorem 4.21, we have [A*]_{β*}^{α*} = ([A]_α^β)ᵀ. Thus, with the identification ℝᵏ* = ℝᵏ via α* = α and β* = β as in Example 4.25, we have [A*]_{β*}^{α*} = A* and A* = Aᵀ. In this sense, we see that the transpose Aᵀ is the adjoint transformation of A. □
As the final part of the section, we consider the dual space of a subspace. Let V be a vector space of dimension n, and let U be a subspace of V of dimension k. Then U* = {T : U → ℝ : T is linear on U} is not a subspace of V*. However, one can extend each T ∈ U* to a linear functional on V as follows. Choose a basis α = {u1, u2, ..., uk} for U. Then by definition α* = {u_1*, u_2*, ..., u_k*} is its dual basis for U*. Now extend α to a basis β = {u1, u2, ..., uk, u_{k+1}, ..., un} for V. For each T ∈ U*, let T̃ : V → ℝ be the linear functional on V defined by

T̃(u_i) = T(u_i) if i ≤ k, and T̃(u_i) = 0 if k + 1 ≤ i ≤ n.

Then clearly T̃ ∈ V* and the restriction T̃|_U of T̃ to U is simply T; i.e., T̃|_U = T ∈ U*. It is easy to see that (T + kS)~ = T̃ + kS̃. In particular, it is also easy to see that {ũ_1*, ũ_2*, ..., ũ_k*} is linearly independent in V* and ũ_i*|_U = u_i* ∈ U* for i = 1, 2, ..., k. Therefore, one obtains a one-to-one linear transformation

φ : U* → V*

given by φ(T) = T̃ for all T ∈ U*. The image φ(U*) is now a subspace of V*. By identifying U* with the image φ(U*), one can say U* is a subspace of V*.
Problem 4.22 Let U and W be subspaces of a vector space V. Show that U :: W if and only
ifW* :: U*.

Let S be an arbitrary subset of V, and let ⟨S⟩ denote the subspace of V spanned by the vectors in S. Let S^⊥ = {f ∈ V* : f(x) = 0 for any x ∈ S}. Then it is easy to show that S^⊥ is a subspace of V*, S^⊥ = ⟨S⟩^⊥, and dim⟨S⟩ + dim S^⊥ = n.
Let R be a subset of V*. Then R^⊥ = {x ∈ V : f(x) = 0 for any f ∈ R} is again a subspace of V such that R^⊥ = ⟨R⟩^⊥ and dim R^⊥ + dim⟨R⟩ = n.

Problem 4.23 For subspaces U and W of a vector space V, show that
(1) (U + W)^⊥ = U^⊥ ∩ W^⊥;  (2) (U ∩ W)^⊥ = U^⊥ + W^⊥.

4.7.2 Computer graphics

One of the simple applications of a linear transformation is to animation or graphical


display of pictures on a computer screen. For a simple display of the idea, let us
consider a picture in the 2-plane ]R2. Note that a picture or an image on a screen
usually consists of a number of points, lines or curves connecting some of them, and
information about how to fill the regions bounded by the lines and curves. Assuming
that the computer has information about how to connect the points and curves, a figure
can be defined by a list of points.
For example, consider the capital letters 'LA' as in Figure 4.8.

Figure 4.8. Letter L.A. on a screen

They can be represented by a matrix with coordinates of the vertices. For example, the coordinates of the 6 vertices of 'L' form a matrix:

vertex:          1    2    3    4     5    6
x-coordinate [ 0.0  0.0  0.5  0.5   2.0  2.0 ]  = A.
y-coordinate [ 0.0  2.0  2.0  0.5   0.5  0.0 ]
Of course, we assume that the computer knows which vertices are connected to which by lines via some algorithm. We know that line segments are transformed to other line segments by a matrix, considered as a linear transformation. Thus, by multiplying A by a matrix, the vertices are transformed to the other set of vertices, and the line segments connecting the vertices are preserved. For example, the matrix B = \begin{bmatrix} 1 & 0.25 \\ 0 & 1 \end{bmatrix} transforms the matrix A to the following form, which represents the new coordinates of the vertices:

vertex:     1    2    3     4      5     6
BA = [ 0.0  0.5  1.0  0.625  2.125  2.0 ]
     [ 0.0  2.0  2.0  0.5    0.5    0.0 ]

Now, the computer connects these vertices properly by lines according to the given algorithm and displays on the screen the changed figure as the left-hand side of Figure 4.9. The multiplication of the matrix C = \begin{bmatrix} 0.5 & 0 \\ 0 & 1 \end{bmatrix} by BA shrinks the width of BA by half, producing the right-hand side of Figure 4.9. Thus, changes in the shape of a figure may be obtained by compositions of appropriate linear transformations.

Figure 4.9. Tilting and Shrinking
Now, it is suggested that the readers try to find various matrices such as reflections ,
rotations, or any other linear transformations, and multiply A by them to see how the
shape of the figure changes.

Problem 4.24 For the given matrices A and B = \begin{bmatrix} 1 & 0.25 \\ 0 & 1 \end{bmatrix} above, with the matrix BᵀA instead of BA, what kind of figure can you have?

Remark: Incidentally, one can see that the composition of a rotation by π followed by a reflection about the x-axis is the same as the composition of the reflection followed by the rotation (see Figure 4.10). In general, a rotation and a reflection are not commutative, neither are a reflection and another reflection.
The above argument generally applies to a figure in any dimension. For instance, a 3 × 3 matrix may be used to convert a figure in ℝ³ since each point has three components.

Example 4.28 (Classifying all rotations in ℝ³) It is easy to see that the matrices

R_{(x,α)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos α & -\sin α \\ 0 & \sin α & \cos α \end{bmatrix}, \quad R_{(y,β)} = \begin{bmatrix} \cos β & 0 & -\sin β \\ 0 & 1 & 0 \\ \sin β & 0 & \cos β \end{bmatrix}, \quad R_{(z,γ)} = \begin{bmatrix} \cos γ & -\sin γ & 0 \\ \sin γ & \cos γ & 0 \\ 0 & 0 & 1 \end{bmatrix}

are the rotations about the x-, y-, z-axes by the angles α, β and γ, respectively.

Figure 4.10. Commutativity of a rotation and a reflection

In general, the matrix that rotates ℝ³ about a given axis appears frequently in many applications. One can easily express such a general rotation as a composition of the basic rotations R_{(x,α)}, R_{(y,β)} and R_{(z,γ)}:
First, note that by choosing a unit vector u in ℝ³, one can determine an axis for a rotation by taking the line passing through u and the origin O. (In fact, the vectors u and −u determine the same line.) Let u = (cos α cos β, cos α sin β, sin α), −π/2 ≤ α ≤ π/2, 0 ≤ β ≤ 2π, in spherical coordinates. To find the matrix R_{(u,θ)} of the rotation about the u-axis by θ, we first rotate the u-axis about the z-axis into the xz-plane by R_{(z,−β)} and then into the x-axis by the rotation R_{(y,−α)} about the y-axis. Then the rotation about the u-axis is the same as the rotation about the x-axis followed by the inverses of the above rotations, i.e., take the rotation R_{(x,θ)} about the x-axis, and then get back to the rotation about the u-axis via R_{(y,α)} and R_{(z,β)}. In summary,

R_{(u,θ)} = R_{(z,β)} R_{(y,α)} R_{(x,θ)} R_{(y,−α)} R_{(z,−β)}.

(See Figure 4.11.)

Figure 4.11. A rotation about the u-axis
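The composition above can be checked numerically. The numpy sketch below is illustrative only; it assumes the sign conventions of the rotation matrices as printed in Example 4.28, and the angles are made-up test values. The check confirms that the composed matrix fixes the axis u.

import numpy as np

def Rx(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

a, b, theta = 0.4, 1.1, np.pi / 4
u = np.array([np.cos(a) * np.cos(b), np.cos(a) * np.sin(b), np.sin(a)])

R = Rz(b) @ Ry(a) @ Rx(theta) @ Ry(-a) @ Rz(-b)
print(np.allclose(R @ u, u))      # True: the axis u is fixed by R_{(u, theta)}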



Problem 4.25 Find the matrix R_{(u,θ)} for the rotation about the line determined by u = (1, 1, 1) by θ = π/4.

So far, we have seen rotations, reflections, tilting (or shear) and scaling (shrinking or enlargement), and their compositions, as linear transformations on the plane ℝ² or the space ℝ³ for computer graphics. However, another indispensable transformation for computer graphics is a translation: A translation is by definition a transformation T : ℝⁿ → ℝⁿ defined by T(x) = x + x₀ for any x ∈ ℝⁿ, where x₀ is a fixed vector in ℝⁿ. Unfortunately, a translation is not linear if x₀ ≠ 0 and hence it cannot be represented by a matrix. To escape this disadvantage, we introduce a new coordinate system, called a homogeneous coordinate. For brevity, we will consider only the 3-space ℝ³.
A point x = (x, y, z) ∈ ℝ³ in the rectangular coordinate can be viewed as the set of vectors x = (hx, hy, hz, h), h ≠ 0, in the 4-space ℝ⁴ in a homogeneous coordinate. Most of the time, we use (x, y, z, 1) as a representative of this set. Conversely, a point (hx, hy, hz, h), h ≠ 0, in the 4-space ℝ⁴ in a homogeneous coordinate corresponds to the point (x/h, y/h, z/h) ∈ ℝ³ in the rectangular coordinate. Now, it is possible to represent all of our transformations including translations as 4 × 4 matrices by using the homogeneous coordinate, and this will be shown case by case as follows.
(1) Translations: A translation T : ℝ³ → ℝ³ defined by T(x) = x + x₀, where x₀ = (x₀, y₀, z₀), can be represented by a matrix multiplication in the homogeneous coordinate as

\begin{bmatrix} 1 & 0 & 0 & x_0 \\ 0 & 1 & 0 & y_0 \\ 0 & 0 & 1 & z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x + x_0 \\ y + y_0 \\ z + z_0 \\ 1 \end{bmatrix}.
(2) Rotations: With the notations in Example 4.28,

R_{(x,α)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos α & -\sin α & 0 \\ 0 & \sin α & \cos α & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad R_{(y,β)} = \begin{bmatrix} \cos β & 0 & -\sin β & 0 \\ 0 & 1 & 0 & 0 \\ \sin β & 0 & \cos β & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad R_{(z,γ)} = \begin{bmatrix} \cos γ & -\sin γ & 0 & 0 \\ \sin γ & \cos γ & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

are the rotations about the x-, y-, z-axes by the angles α, β and γ in the homogeneous coordinate, respectively.
(3) Reflections: An xy-reflection (the reflection about the xy-plane) is represented by a matrix multiplication in the homogeneous coordinate as

\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ -z \\ 1 \end{bmatrix}.

Similarly, one can have an xz-reflection and a yz-reflection.


(4) Shear: An xy-shear can be represented by a matrix multiplication in the homogeneous coordinate, and similarly one can have an xz-shear and a yz-shear, each given by a 4 × 4 matrix.
(5) Scaling: A scaling is represented by a matrix multiplication in the homogeneous coordinate as

\begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} s_x x \\ s_y y \\ s_z z \\ 1 \end{bmatrix},

where s_x, s_y, s_z are the scaling factors in the x, y and z directions.
In summary, all of the transformations can be represented by matrix multiplications in the homogeneous coordinate, and their compositions can also be carried out by the corresponding matrix multiplications.
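The following numpy sketch (illustrative only, not part of the text) composes a rotation about the z-axis with a translation as 4 × 4 homogeneous-coordinate matrices; the point and the parameters are made-up examples.

import numpy as np

def translate(x0, y0, z0):
    M = np.eye(4)
    M[:3, 3] = [x0, y0, z0]
    return M

def rotate_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

p = np.array([1.0, 2.0, 3.0, 1.0])                # the point (1, 2, 3) in homogeneous form
M = translate(5, 0, 0) @ rotate_z(np.pi / 2)      # rotate first, then translate
print(M @ p)                                      # [3. 1. 3. 1.]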

4.8 Exercises
4.1. Which of the following functions T are linear transformations?
(1) T(x, y) = (x² − y², x² + y²).
(2) T(x, y, z) = (x + y, 0, 2x + 4z).
(3) T(x, y) = (sin x, y).
(4) T(x, y) = (x + 1, 2y, x + y).
(5) T(x, y, z) = (1, 0).
4.2. Let T : P2(ℝ) → P3(ℝ) be a linear transformation such that T(1) = 1, T(x) = x² and T(x²) = x³ + x. Find T(ax² + bx + c).
4.3. Find SoT and/or T 0 S whenever it is defined.

(1) T(x , y,z) = (x - y+ z, x +z), S(x, y ) = (x, x - y , y) ;


(2) T(x, y ) = (x , 3y +x, 2x - 4y , y) , S(x, y, z) = (2x, y ).
4.4. Let S : C(ℝ) → C(ℝ) be the function on the vector space C(ℝ) defined by, for f ∈ C(ℝ),

S(f)(x) = f(x) − \int_0^x u f(u)\,du.

Show that S is a linear transformation on the vector space C(ℝ).
4.5. Let T be a linear transformation on a vector space V such that T² = id and T ≠ id. Let U = {v ∈ V : T(v) = v} and W = {v ∈ V : T(v) = −v}. Show that
(1) at least one of U and W is a nonzero subspace of V;
(2) U ∩ W = {0};
(3) V = U + W.
4.6. If T : ]R3 --+ ]R3 is defined by T(x , y, z) = (2x - z, 3x - 2y , x - 2y + z),
(1) determine the null space N (T) of T,
(2) determine whether T is one-to-one,
(3) find a basis for N(T) .
4.7. Show that each of the following linear transformations T on ]R3 is invertible, and find a
formula for T- 1:
(1) T (x,y ,z) = (3x , x- y, 2x+y + z).
(2) T(x, y, z) = (2x , 4x - y, 2x + 3y - z) .
4.8. Let S, T : V --+ V be linear transformations on a vector space V.
(1) Show that if T 0 S is one-to-one , then T is an isomorphism.
(2) Show that if T 0 S is onto, then T is an isomorphism.
(3) Show that if T k is an isomorphism for some positive k, then T is an isomorphism.
4.9. Let T be a linear transformation from ]R3 to ]R2, and let S be a linear transformation from
]R2 to ]R3 . Prove that the compo sition SoT is not invertible.
4.10. Let T be a linear transformation on a vector space V satisfying T - T 2 = id . Show that
T is invertible.
4.11. Let A be an n × n matrix, which is a linear transformation on the n-space ℝⁿ by the matrix multiplication Ax for any x ∈ ℝⁿ. Suppose that r1, r2, ..., rn are linearly independent vectors in ℝⁿ constituting a parallelepiped (see Remark (2) on page 70). Then A transforms this parallelepiped into another parallelepiped determined by Ar1, Ar2, ..., Arn. Suppose that we denote the n × n matrix whose j-th column is r_j by B, and the n × n matrix whose j-th column is Ar_j by C. Prove that

vol(P(C)) = |det A| vol(P(B)).

(This means that, for a square matrix A considered as a linear transformation, the absolute value of the determinant of A is the ratio between the volumes of a parallelepiped P(B) and its image parallelepiped P(C) under the transformation by A. If det A = 0, then the image P(C) is a parallelepiped in a subspace of dimension less than n.)
4.12 . Let T : ]R3 --+ ]R3 be the linear transformation given by

T (x, y ,z) = (x + y,y + z ,x + z ).

Let C denote the unit cube in ]R3 determined by the standard basis ej , e2, e3. Find the
volume of the image parallelepiped T (C) of C under T.

4.13. With respect to the ordered basis a = {I, x, x 2 } for the vector space P2(lR), find the
coordinate vector of the following polynomials:

(1) f(x) =x2 - x + I, (2) f(x) = x 2 + 4x - I, (3) f(x) = 2x + 5.


4.14. Let T : P3(ℝ) → P3(ℝ) be the linear transformation defined by

Tf(x) = f''(x) − 4f'(x) + f(x).

Find the matrix [T]_α for the basis α = {x, 1 + x, x + x^2, x^3}.
4.15. Let T be the linear transformation on ℝ^2 defined by T(x, y) = (−y, x).
(1) What is the matrix of T with respect to an ordered basis α = {v1, v2}, where v1 = (1, 2), v2 = (1, −1)?
(2) Show that for every real number c the linear transformation T − c id is invertible.
4.16. Find the matrix representation of each of the following linear transformations T on ℝ^2 with respect to the standard basis {e1, e2}.
(1) T(x, y) = (2y, 3x − y).
(2) T(x, y) = (3x − 4y, x + 5y).

4.17. Let M be a given 2 × 3 matrix.
(1) Find the unique linear transformation T : ℝ^3 → ℝ^2 so that M is the associated matrix of T with respect to given bases α1 for ℝ^3 and α2 for ℝ^2.
(2) Find T(x, y, z).
4.18. Find the matrix representation of each of the following linear transformations T on P2(ℝ) with respect to the basis {1, x, x^2}.
(1) T : p(x) ↦ p(x + 1).
(2) T : p(x) ↦ p'(x).
(3) T : p(x) ↦ p(0)x.
(4) T : p(x) ↦ p(x) − p(0).
4.19. Consider the following ordered bases of ℝ^3: α = {e1, e2, e3}, the standard basis, and β = {u1 = (1, 1, 1), u2 = (1, 1, 0), u3 = (1, 0, 0)}.
(1) Find the basis-change matrix P from α to β.
(2) Find the basis-change matrix Q from β to α.
(3) Verify that Q = P^{-1}.
(4) Show that [v]_β = P[v]_α for any vector v ∈ ℝ^3.
(5) Show that [T]_β = Q^{-1}[T]_α Q for the linear transformation T defined by T(x, y, z) = (2y + x, x − 4y, 3x).
4.20. Show that there are no matrices A and B in M_{n×n}(ℝ) such that AB − BA = I_n.
4.21. Let T : ℝ^3 → ℝ^2 be the linear transformation defined by

T(x, y, z) = (3x + 2y − 4z, x − 5y + 3z),

and let α = {(1, 1, 1), (1, 1, 0), (1, 0, 0)} and β = {(1, 3), (2, 5)} be bases for ℝ^3 and ℝ^2, respectively.
(1) Find the associated matrix [T]_α^β for T.
(2) Verify [T]_α^β [v]_α = [T(v)]_β for any v ∈ ℝ^3.
4.22. Find the basis-change matrix [id]_α^β from α to β, when
(1) α = {(2, 3), (0, 1)}, β = {(6, 4), (4, 8)};
(2) α = {(5, 1), (1, 2)}, β = {(1, 0), (0, 1)};
(3) α = {(1, 1, 1), (1, 1, 0), (1, 0, 0)}, β = {(2, 0, 3), (−1, 4, 1), (3, 2, 5)};
(4) α = {t, 1, t^2}, β = {3 + 2t + t^2, t^2 − 4, 2 + t}.

4.23. Show that all matrices of the form

A_θ = [ cos θ   sin θ ]
      [ sin θ  −cos θ ]

are similar.
4.24. Show that the matrix A = [~ ~] cannot be similar to a diagonal matrix.

4.25. Are the matrices [~ i ~]


I DID D 3
and [ - ~ ~ ~]
similar?

4.26. For a linear transformation T on a vector space V, show that T is one-to-one if and only if its transpose T* is one-to-one.
4.27. Let T : ℝ^3 → ℝ^3 be the linear transformation defined by
T(x, y, z) = (2y + z, −x + 4y + z, x + z).
Compute [T]_α and [T*]_{α*} for the standard basis α = {e1, e2, e3}.
4.28. Let T be the linear transformation from ℝ^3 into ℝ^2 defined by
T(x1, x2, x3) = (x1 + x2, 2x3 − x1).
(1) For the standard ordered bases α and β for ℝ^3 and ℝ^2 respectively, find the associated matrix for T with respect to the bases α and β.
(2) Let α = {x1, x2, x3} and β = {y1, y2}, where x1 = (1, 0, −1), x2 = (1, 1, 1), x3 = (1, 0, 0), and y1 = (0, 1), y2 = (1, 0). Find the associated matrices [T]_α^β and [T*]_{β*}^{α*}.
4.29. Let T be the linear transformation from ℝ^3 to ℝ^4 defined by
T(x, y, z) = (2x + y + 4z, x + y + 2z, y + 2z, x + y + 3z).
Find the image and the kernel of T. What is the dimension of Im(T)? Find [T]_α^β and [T*]_{β*}^{α*}, where
α = {(1, 0, 0), (0, 1, 0), (0, 0, 1)},
β = {(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)}.
4.30. Let T be the linear transforma tion on V = ]R3, for which the associated matrix with
respect to the standard ordered basis is

A= [ -I~i3 4~ ] .
Find the bases for the kernel and the image of the transpose T* on V* .
4.31. Define three linear functionals on the vector space V = P2(ℝ) by

f1(p) = ∫₀¹ p(x) dx,   f2(p) = ∫₀² p(x) dx,   f3(p) = ∫₀^{−1} p(x) dx.

Show that {f1, f2, f3} is a basis for V* by finding its dual basis for V.
4.32. Determine whether or not the following statements are true in general, and justify your
answers .
(1) For a linear transformation T : ℝ^n → ℝ^m, Ker(T) = {0} if m > n.
(2) For a linear transformation T : ℝ^n → ℝ^m, Ker(T) ≠ {0} if m < n.
(3) A linear transformation T : ℝ^n → ℝ^m is one-to-one if and only if the null space of [T]_α^β is {0} for any basis α for ℝ^n and any basis β for ℝ^m.
(4) For any linear transformation T on ℝ^n, the dimension of the image of T is equal to that of the row space of [T]_α for any basis α for ℝ^n.
(5) For any two linear transformations T : V → W and S : W → Z, if Ker(S ∘ T) = {0}, then Ker(T) = {0}.
(6) Any polynomial p(x) is linear if and only if the degree of p(x) is less than or equal to 1.
(7) Let T : ℝ^3 → ℝ^2 be a function given as T(x) = (T1(x), T2(x)) for any x ∈ ℝ^3. Then T is linear if and only if its coordinate functions T_i, i = 1, 2, are linear.
(8) For a linear transformation T : ℝ^n → ℝ^n, if [T]_α^β = I_n for some bases α and β of ℝ^n, then T must be the identity transformation.
(9) If a linear transformation T : jRn --+ jRn is one-to-one , then any matrix representation
of T is nonsingular.
(10) Any m x n matrix A can be a matrix representation of a linear transformation T :
jRn --+ jRm.

(11) Every basis-change matrix is invertible.
(12) A matrix similar to a basis-change matrix is also a basis-change matrix.
(13) det : M_{n×n}(ℝ) → ℝ is a linear functional.
(14) Every translation in ℝ^n is a linear transformation.
5
Inner Product Spaces

5.1 Dot products and inner products


To study the geometry of a vector space, we go back to the case of the 3-space ℝ^3. The dot (or Euclidean inner) product of two vectors x = (x1, x2, x3) and y = (y1, y2, y3) in ℝ^3 is a number defined by the formula

x · y = x1y1 + x2y2 + x3y3 = [x1 x2 x3] [y1 y2 y3]^T = x^T y,

where x^T y is the matrix product of x^T and y, which is also a number identified with the 1 × 1 matrix x^T y. Using the dot product, the length (or magnitude) of a vector x = (x1, x2, x3) is defined by

‖x‖ = (x · x)^{1/2} = √(x1^2 + x2^2 + x3^2),

and the Euclidean distance between two vectors x and y in ℝ^3 is defined by

d(x, y) = ‖x − y‖.

In this way, the dot product can be considered to be a ruler for measuring the length of a line segment in the 3-space ℝ^3. Furthermore, it can also be used to measure the angle between two nonzero vectors: in fact, the angle θ between two vectors x and y in ℝ^3 is measured by the formula involving the dot product

cos θ = (x · y)/(‖x‖ ‖y‖),   0 ≤ θ ≤ π,

since the dot product satisfies the formula

x · y = ‖x‖ ‖y‖ cos θ.
In particular, two vectors x and y are orthogonal (i.e., they form a right angle θ = π/2) if and only if the Pythagorean theorem holds:

‖x + y‖^2 = ‖x‖^2 + ‖y‖^2.

By rewriting this formula in terms of the dot product, we obtain another equivalent
condition:
x · y = x1y1 + x2y2 + x3y3 = 0.
In fact, this dot product is one of the most important structures with which JR3 is
equipped. Euclidean geometry begins with the vector space JR3 together with the dot
product, because the Euclidean distance can be defined by the dot product.
The dot product has a direct extension to the n-space ℝ^n of any dimension n: for any two vectors x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) in ℝ^n, their dot product, also called the Euclidean inner product, and the length (or magnitude) of a vector are defined similarly as

x · y = x1y1 + x2y2 + ··· + xnyn = x^T y,
‖x‖ = (x · x)^{1/2} = √(x1^2 + x2^2 + ··· + xn^2).


To extend this notion of the dot product to a (real) vector space, we extract the
most essential properties that the dot product in JRn satisfies and take these properties
as axioms for an inner product on a vector space V .

Definition 5.1 An inner product on a real vector space V is a function that associates
a real number (x, y) to each pair of vectors x and y in V in such a way that the following
rules are satisfied: For any vectors x, y and z in V and any scalar k in JR,
(1) (x, y) = (y , x) (symmetry),
(2) (x + y, z) = (x, z) + (y, z) (additivity),
(3) (kx , y) = k(x, y) (homogeneity),
(4) (x, x) ≥ 0, and (x, x) = 0 if and only if x = 0 (positive definiteness).
A pair (V, ( ,)) of a (real) vector space V and an inner product ( , ) is called a (real)
inner product space. In particular, the pair (JRn , .) is called the Euclidean n-space.

Note that by symmetry (1), additivity (2) and homogeneity (3) also hold for the
second variable: i.e.,

(2') (x, Y+ z) = (x, y) + (x, z),


(3') (x, ky) = k(x, y}.
It is easy to show that (0, y) = 0(0, y) = 0 and also (x, O) = O.
Remark: In Definition 5.1, the rules (2) and (3) mean that the inner product is linear
for the first variable; and the rules (2') and (3') above mean that the inner product is
also linear for the second variable. In this sense, the inner product is called bilinear.

Example 5.1 (Non-Euclidean inner product on ℝ^2) For any two vectors x = (x1, x2) and y = (y1, y2) in ℝ^2, define

(x, y) = a x1y1 + c(x1y2 + x2y1) + b x2y2 = [x1 x2] [ a  c ; c  b ] [y1 y2]^T = x^T A y,

where a, b and c are arbitrary real numbers. Then this function ( , ) clearly satisfies the first three rules of the inner product, i.e., ( , ) is symmetric and bilinear. Moreover, if a > 0 and det A = ab − c^2 > 0 hold, then it also satisfies rule (4), the positive definiteness of the inner product. (Hint: the inequality (x, x) = a x1^2 + 2c x1x2 + b x2^2 ≥ 0 holds if and only if either x2 = 0 or the discriminant of (x, x)/x2^2, viewed as a quadratic in x1/x2, is nonpositive.) In the case of c = 0, this reduces to (x, y) = a x1y1 + b x2y2. Notice also that a = (e1, e1), b = (e2, e2) and c = (e1, e2) = (e2, e1). □
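For readers who wish to experiment numerically, the following NumPy sketch (illustrative only; the particular values a = 2, b = 3, c = 1 are chosen here just so that a > 0 and ab − c^2 > 0) checks the inner product axioms for (x, y) = x^T A y on random vectors.

    # Illustrative sketch: checking the inner product axioms for <x, y> = x^T A y
    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])          # a = 2, c = 1, b = 3; a > 0 and ab - c^2 = 5 > 0

    def ip(x, y):
        """The candidate inner product <x, y> = x^T A y on R^2."""
        return x @ A @ y

    rng = np.random.default_rng(0)
    for _ in range(1000):
        x, y, z = rng.normal(size=(3, 2))
        k = rng.normal()
        assert np.isclose(ip(x, y), ip(y, x))                  # symmetry
        assert np.isclose(ip(x + y, z), ip(x, z) + ip(y, z))   # additivity
        assert np.isclose(ip(k * x, y), k * ip(x, y))          # homogeneity
        assert ip(x, x) >= 0                                   # nonnegativity on the samples
    print("all axioms hold on the random samples")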

Problem 5.1 In Example 5.1, the converse is also true: Prove that if (x, y) = x^T A y is an inner product in ℝ^2, then a > 0 and ab − c^2 > 0.

Example 5.2 (Case of x ≠ 0 ≠ y but (x, y) = 0) Let V = C[0, 1] be the vector space of all real-valued continuous functions on [0, 1]. For any two functions f and g in V, define

(f, g) = ∫₀¹ f(x)g(x) dx.

Then ( , ) is an inner product on V (verify this). Let

f(x) = { 1 − 2x  if 0 ≤ x ≤ 1/2,          g(x) = { 0        if 0 ≤ x ≤ 1/2,
       { 0       if 1/2 ≤ x ≤ 1,                 { 2x − 1   if 1/2 ≤ x ≤ 1.

Then f ≠ 0 ≠ g, but (f, g) = 0. □
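A quick numerical check of this example can be done with SciPy; the sketch below (an illustration, not part of the development) confirms that (f, g) = 0 while ‖f‖ and ‖g‖ are both positive.

    # Illustrative sketch: f and g are nonzero, yet <f, g> = integral_0^1 f(x) g(x) dx = 0
    from scipy.integrate import quad

    f = lambda x: 1 - 2 * x if x <= 0.5 else 0.0
    g = lambda x: 0.0 if x <= 0.5 else 2 * x - 1

    inner_fg, _ = quad(lambda x: f(x) * g(x), 0.0, 1.0)
    norm_f2, _ = quad(lambda x: f(x) ** 2, 0.0, 1.0)
    norm_g2, _ = quad(lambda x: g(x) ** 2, 0.0, 1.0)
    print(inner_fg)            # approximately 0
    print(norm_f2, norm_g2)    # both positive, so f != 0 != g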
By a subspace W of an inner product space V , we mean a subspace of the vector
space V together with the inner product that is the restriction of the inner product on
Vto W.

Example 5.3 (A subspace as an inner product space) The set W = D^1[0, 1] of all real-valued differentiable functions on [0, 1] is a subspace of V = C[0, 1]. The restriction to W of the inner product on V defined in Example 5.2 makes W an inner product subspace of V. However, one can define another inner product on W by the following formula: For any two functions f(x) and g(x) in W,

((f, g)) = ∫₀¹ f(x)g(x) dx + ∫₀¹ f'(x)g'(x) dx.

Then (( , )) is also an inner product on W, which is different from the restriction to W of the inner product of V, and hence W with this new inner product is not a subspace of the inner product space V. □

Remark: From vector calculus, most readers might be already familiar with the dot
product (or the inner product) and the cross product (or the outer product) in the
3-space jR3. The concept of this dot product is extended to a higher dimensional
Euclidean space jRn in this section. However, it is known in advanced mathematics
that the cross product in the 3-space jR3 cannot be extended to a higher dimensional
Euclidean space ℝ^n. In fact, it is known that if there is a bilinear function f : ℝ^n × ℝ^n → ℝ^n, f(x, y) = x × y, satisfying the properties that x × y is perpendicular to x and y, and ‖x × y‖^2 = ‖x‖^2 ‖y‖^2 − (x · y)^2, then n = 3 or 7. Hence, the cross product or
the outer product will not be introduced in linear algebra.

5.2 The lengths and angles of vectors

In this section, we study a geometry of an inner product space by introducing a length ,


an angle or a distance between two vectors. The following inequality will enable us
to define an angle between two vectors in an inner product space V.

Theorem 5.1 (Cauchy-Schwarz inequality) If x and y are vectors in an inner product space V, then

(x, y)^2 ≤ (x, x)(y, y).

Proof: If x = 0, it is clear. Assume x ≠ 0. For any scalar t, we have

0 ≤ (tx + y, tx + y) = (x, x)t^2 + 2(x, y)t + (y, y).

This inequality implies that the polynomial (x, x)t^2 + 2(x, y)t + (y, y) in t has either no real roots or a repeated real root. Therefore, its discriminant must be nonpositive:

(x, y)^2 − (x, x)(y, y) ≤ 0,

which implies the inequality. □


Problem 5.2 Prove that the equalityin the Cauchy-Schwarz inequality holdsif and onlyif the
vectorsx and yare linearlydependent.

The lengths of vectors and angles between two vectors in an inner product space
are defined in a similar way to the case of the Euclidean n-space.

Definition 5.2 Let V be an inner product space.


(1) The magnitude ‖x‖ (or the length) of a vector x is defined by

‖x‖ = √(x, x).



(2) The distance d(x, y) between two vectors x and y is defined by

d(x, y) = ‖x − y‖.

(3) From the Cauchy-Schwarz inequality, we have −1 ≤ (x, y)/(‖x‖ ‖y‖) ≤ 1 for any two nonzero vectors x and y. Hence, there is a unique number θ ∈ [0, π] such that

cos θ = (x, y)/(‖x‖ ‖y‖)   or   (x, y) = ‖x‖ ‖y‖ cos θ.

Such a number θ is called the angle between x and y.

For example, the dot product in the Euclidean 3-space ]R3 defines the Euclidean
distance in ]R3. However, one can define infinitely many non-Euclidean distances in
]R3 as shown in the following example.

Example 5.4 (Infinitely many different inner products on ℝ^2 or ℝ^3)
(1) In ℝ^2 equipped with an inner product (x, y) = 2x1y1 + 3x2y2, the angle between x = (1, 1) and y = (1, 0) is computed as

cos θ = (x, y)/(‖x‖ ‖y‖) = 2/(√5 √2) = 0.6324··· .

Thus, θ = cos^{-1}(2/√10). Notice that in the Euclidean 2-space ℝ^2 with the dot product, the angle between x = (1, 1) and y = (1, 0) is clearly π/4 and cos(π/4) = 1/√2 = 0.7071··· . It shows that the angle between two vectors actually depends on the inner product on the vector space.
(2) For any diagonal matrix A = [ d1 0 0 ; 0 d2 0 ; 0 0 d3 ] with all d_i > 0, the formula

(x, y) = x^T A y = d1 x1y1 + d2 x2y2 + d3 x3y3

defines an inner product on ℝ^3. Thus, there are infinitely many different inner products on ℝ^3. Moreover, an inner product in the 3-space ℝ^3 may play the roles of a ruler and a protractor in our physical world ℝ^3. □
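The computation in part (1) is easy to reproduce numerically; the following sketch (illustrative only) compares the angle under the inner product (x, y) = 2x1y1 + 3x2y2 with the usual dot-product angle.

    # Illustrative sketch: the angle depends on the chosen inner product
    import numpy as np

    A = np.diag([2.0, 3.0])                      # the weights of Example 5.4(1)
    x, y = np.array([1.0, 1.0]), np.array([1.0, 0.0])

    def cos_angle(x, y, G):
        ip = lambda u, v: u @ G @ v              # inner product x^T G y
        return ip(x, y) / np.sqrt(ip(x, x) * ip(y, y))

    print(cos_angle(x, y, A))           # 2/sqrt(10) = 0.6324...
    print(cos_angle(x, y, np.eye(2)))   # 1/sqrt(2)  = 0.7071...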
Problem 5.3 In Example 5.4(2), show that x T Ay cannot be an inner product if A has a negative
diagonal entry d; < O.

Problem 5.4 Prove the following properties of length in an inner product space V: For any vectors x, y ∈ V,
(1) ‖x‖ ≥ 0,
(2) ‖x‖ = 0 if and only if x = 0,
(3) ‖kx‖ = |k| ‖x‖,
(4) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).

Problem 5.5 Let V be an inner product space. Show that for any vectors x, y and z in V,
(1) d(x, y) ≥ 0,
(2) d(x, y) = 0 if and only if x = y,
(3) d(x, y) = d(y, x),
(4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).

Definition 5.3 Two vectors x and y in an inner product space are said to be orthogonal (or perpendicular) if (x, y) = 0.

Note that for nonzero vectors x and y, (x, y) = 0 if and only if θ = π/2.
Lemma 5.2 Let V be an inner product space and let x ∈ V. Then, the vector x is orthogonal to every vector y in V (i.e., (x, y) = 0 for all y in V) if and only if x = 0.

Proof: If x = 0, clearly (x, y) = 0 for all y in V. Suppose that (x, y) = 0 for all y in V. Then (x, x) = 0, implying x = 0 by positive definiteness. □

Corollary 5.3 Let V be an inner product space, and let α = {v1, ..., vn} be a basis for V. Then, a vector x in V is orthogonal to every basis vector v_i in α if and only if x = 0.

Proof: If (x, v_i) = 0 for i = 1, 2, ..., n, then (x, y) = Σ_{i=1}^n y_i (x, v_i) = 0 for any y = Σ_{i=1}^n y_i v_i ∈ V. □

Example 5.5 (Pythagorean theorem) Let V be an inner product space, and let x and y be any two nonzero vectors in V with the angle θ. Then, (x, y) = ‖x‖ ‖y‖ cos θ gives the equality

‖x + y‖^2 = ‖x‖^2 + ‖y‖^2 + 2 ‖x‖ ‖y‖ cos θ.

Moreover, it deduces the Pythagorean theorem: ‖x + y‖^2 = ‖x‖^2 + ‖y‖^2 for any orthogonal vectors x and y. □
Theorem 5.4 If x1, x2, ..., xk are nonzero mutually orthogonal vectors in an inner product space V (i.e., each vector is orthogonal to every other vector), then they are linearly independent.

Proof: Suppose c1x1 + c2x2 + ··· + ckxk = 0. Then for each i = 1, 2, ..., k,

0 = (0, x_i) = (c1x1 + ··· + ckxk, x_i)
  = c1(x1, x_i) + ··· + c_i(x_i, x_i) + ··· + ck(xk, x_i)
  = c_i ‖x_i‖^2,

because x1, x2, ..., xk are mutually orthogonal. Since each x_i is not the zero vector, ‖x_i‖ ≠ 0; so c_i = 0 for i = 1, 2, ..., k. □

Problem 5.6 Let f(x) and g(x) be continuous real-valued functions on [0, 1]. Prove

(1) [∫₀¹ f(x)g(x) dx]^2 ≤ [∫₀¹ f^2(x) dx] [∫₀¹ g^2(x) dx].
(2) [∫₀¹ (f(x) + g(x))^2 dx]^{1/2} ≤ [∫₀¹ f^2(x) dx]^{1/2} + [∫₀¹ g^2(x) dx]^{1/2}.

Problem 5.7 Let V = C[0, 1] be the inner product space of all real-valued continuous functions on [0, 1] equipped with the inner product

(f, g) = ∫₀¹ f(x)g(x) dx for any f and g in V.

For the following two functions f and g in V, compute the angle between them: For any natural numbers k, l,
(1) f(x) = kx and g(x) = lx,
(2) f(x) = sin 2πkx and g(x) = sin 2πlx,
(3) f(x) = cos 2πkx and g(x) = cos 2πlx.

5.3 Matrix representations of inner products


Let A be an n x n diagonal matrix with positive diagonal entries. Then one can
show that (x, y) = x T Ay defines an inner product on the n-space IRn as shown in
Example 5.4(2). The converse is also true: every inner product on a vector space can be
expressed in such a matrix product form. Let (V, ( , )) be an inner product space, and let α = {v1, v2, ..., vn} be a fixed ordered basis for V. Then for any x = Σ_{i=1}^n x_i v_i and y = Σ_{j=1}^n y_j v_j in V,

(x, y) = Σ_{i=1}^n Σ_{j=1}^n x_i y_j (v_i, v_j)

holds. If we set a_ij = (v_i, v_j) for i, j = 1, 2, ..., n, then these numbers constitute a symmetric matrix A = [a_ij], since (v_i, v_j) = (v_j, v_i). Thus, in matrix notation, the inner product may be written as

(x, y) = Σ_{i=1}^n Σ_{j=1}^n x_i y_j a_ij = [x]_α^T A [y]_α.

The matrix A is called the matrix representation of the inner product ( , ) with respect to the basis α.

Example 5.6 (Matrix representation of an inner product)
(1) With respect to the standard basis {e1, e2, ..., en} for the Euclidean n-space ℝ^n, the matrix representation of the dot product is the identity matrix, since e_i · e_j = δ_ij. Thus, for x = Σ_i x_i e_i and y = Σ_j y_j e_j ∈ ℝ^n, the dot product is just the matrix product x^T y.
(2) On V = P2([0, 1]), we define an inner product on V as

(f, g) = ∫₀¹ f(x)g(x) dx.

Then for a basis α = {f1(x) = 1, f2(x) = x, f3(x) = x^2} for V, one can easily find its matrix representation A = [a_ij]: For instance,

a_23 = (f2, f3) = ∫₀¹ f2(x) f3(x) dx = ∫₀¹ x · x^2 dx = 1/4. □
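The whole matrix representation in part (2) can be computed at once; the sketch below (an illustration, using numerical integration rather than hand computation) produces A = [a_ij] with a_ij = 1/(i + j − 1).

    # Illustrative sketch: the Gram matrix of the basis {1, x, x^2} under <f, g> = int_0^1 f g dx
    import numpy as np
    from scipy.integrate import quad

    basis = [lambda x: 1.0, lambda x: x, lambda x: x ** 2]
    A = np.array([[quad(lambda x, fi=fi, fj=fj: fi(x) * fj(x), 0, 1)[0]
                   for fj in basis] for fi in basis])
    print(A)   # [[1, 1/2, 1/3], [1/2, 1/3, 1/4], [1/3, 1/4, 1/5]]; in particular a_23 = 1/4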

The expression of the dot product as a matrix product is very useful in stating or
proving theorems in the Euclidean space.
For any symmetric matrix A and for a fixed basis α, the formula (x, y) = [x]_α^T A [y]_α seems to give rise to an inner product on V. In fact, the formula clearly is symmetric and bilinear, but does not necessarily satisfy the fourth rule, positive definiteness. The following theorem gives a necessary condition for a symmetric matrix A to give rise to an inner product. Some necessary and sufficient conditions will be discussed in Chapter 8.

Theorem 5.5 The matrix representation A of an inner product (with respect to any basis) on a vector space V is invertible. That is, det A ≠ 0.

Proof: Let (x, y) = [x]_α^T A [y]_α be an inner product on a vector space V with respect to a basis α, and consider A[y]_α = 0 as a homogeneous system of linear equations. Then

(y, y) = [y]_α^T A [y]_α = 0.

It implies that A[y]_α = 0 has only the trivial solution y = 0, or equivalently A is invertible by Theorem 1.9. □

Recall that the conditions a > 0 and det A = ab − c^2 > 0 in Example 5.1 are sufficient for A to give rise to an inner product on ℝ^2.

5.4 Gram-Schmidt orthogonalization


The standard basis for the Euclidean n-space Rn has a special property: The basis
vectors are mutually orthogonal and are of length 1. In this sense, it is called the

rectangular coordinate system for ℝ^n. In an inner product space, a vector with length 1 is called a unit vector. If x is a nonzero vector in an inner product space V, the vector (1/‖x‖)x is a unit vector. The process of obtaining a unit vector from a nonzero vector by multiplying by the reciprocal of its length is called normalization.
Thus, if there is a set of mutually orthogonal vectors (or a basis) in an inner product
space, then the vectors can be converted to unit vectors by normalizing them without
losing their mutual orthogonality.

Definition 5.4 A set of vectors x1, x2, ..., xk in an inner product space V is said to be orthonormal if

(x_i, x_j) = 0 for i ≠ j                (orthogonality),
‖x_i‖ = 1 for i = 1, 2, ..., k          (normality).

A set {x1, x2, ..., xn} of vectors is called an orthonormal basis for V if it is a basis and orthonormal.
Problem 5.8 Determine whether each of the following sets of vectors in ℝ^2 is orthogonal, orthonormal, or neither with respect to the Euclidean inner product.

(I) {[ ~ ] , [ ~ ]}

(3) {[ ~ l [-~ ]} (4) I/J2 ] [-I/J2]}


{[ 1/J2 ' 1/J2

It will be shown later in Theorem 5.6 that every inner product space has an
orthonormal basis , just like the standard basis for the Euclidean n-space jRn.
The following example illustrates how to construct such an orthonormal basis.

Example 5.7 (How to construct an orthonormal basis?) For a matrix

A = [ 1  1  2 ]
    [ 1  2  2 ]
    [ 1  0  4 ]
    [ 1  1  0 ],

find an orthonormal basis for the column space C(A) of A.

Solution: Let c1, c2 and c3 be the column vectors of A in the order from left to right. It is easily verified that they are linearly independent, so they form a basis for the column space C(A) of dimension 3 in ℝ^4. For notational convenience, we denote by Span{x1, ..., xk} the subspace spanned by {x1, ..., xk}.
(1) First normalize c1 to get

u1 = c1/‖c1‖ = (1/2)c1 = (1/2, 1/2, 1/2, 1/2),

which is a unit vector. Then Span{u1} = Span{c1}, because one is a scalar multiple of the other.
(2) Noting that the vector c2 − (u1, c2)u1 = c2 − 2u1 = (0, 1, −1, 0) is a nonzero vector orthogonal to u1, we set

u2 = (c2 − 2u1)/‖c2 − 2u1‖ = (0, 1/√2, −1/√2, 0).

Then {u1, u2} is orthonormal and Span{u1, u2} = Span{c1, c2}, because each u_i is a linear combination of c1 and c2, and the converse is also true.
(3) Finally, note that c3 − (u1, c3)u1 − (u2, c3)u2 = c3 − 4u1 + √2 u2 = (0, 1, 1, −2) is also a nonzero vector orthogonal to both u1 and u2. In fact,

(u1, c3 − 4u1 + √2 u2) = (u1, c3) − 4(u1, u1) + √2 (u1, u2) = 0,
(u2, c3 − 4u1 + √2 u2) = (u2, c3) − 4(u2, u1) + √2 (u2, u2) = 0.

By the normalization, the vector

u3 = (0, 1, 1, −2)/‖(0, 1, 1, −2)‖ = (0, 1/√6, 1/√6, −2/√6)

is a unit vector, and one can also show that Span{u1, u2, u3} = Span{c1, c2, c3} = C(A). Consequently, {u1, u2, u3} is an orthonormal basis for C(A). □

The orthonormalization process in Example 5.7 indicates how to prove the following general case, called the Gram-Schmidt orthogonalization.

Theorem 5.6 Every inner product space has an orthonormal basis.

Proof: [Gram-Schmidt orthogonalization process] Let {x1, x2, ..., xn} be a basis for an n-dimensional inner product space V. Let

u1 = x1/‖x1‖.

Of course, x2 − (x2, u1)u1 ≠ 0, because {x1, x2} is linearly independent. Generally, one can define, by induction on k = 1, 2, ..., n,

u_k = (x_k − Σ_{i=1}^{k−1} (x_k, u_i) u_i) / ‖x_k − Σ_{i=1}^{k−1} (x_k, u_i) u_i‖.

Then, as Example 5.7 shows, the vectors u1, u2, ..., un are orthonormal in the n-dimensional vector space V. Since every orthonormal set is linearly independent, it is an orthonormal basis for V. □
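The proof is constructive, and the construction is easy to code. The following minimal sketch (illustrative only) implements the Gram-Schmidt process for the Euclidean dot product; a general inner product would only change the two occurrences of np.dot. The input vectors are the columns of the matrix in Example 5.7, so the output reproduces the orthonormal basis found there.

    # Illustrative sketch: Gram-Schmidt orthogonalization for the dot product
    import numpy as np

    def gram_schmidt(vectors):
        """Turn a list of linearly independent vectors into an orthonormal list."""
        ortho = []
        for x in vectors:
            for u in ortho:
                x = x - np.dot(u, x) * u      # remove the component along u
            ortho.append(x / np.linalg.norm(x))
        return ortho

    c1, c2, c3 = map(np.array, ([1., 1., 1., 1.], [1., 2., 0., 1.], [2., 2., 4., 0.]))
    u1, u2, u3 = gram_schmidt([c1, c2, c3])
    print(u1)   # (1/2, 1/2, 1/2, 1/2)
    print(u2)   # (0, 1/sqrt(2), -1/sqrt(2), 0)
    print(u3)   # (0, 1/sqrt(6), 1/sqrt(6), -2/sqrt(6))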

Problem 5.9 Use the Gram-Schmidt orthogonalization on the Euclidean space ℝ^4 to transform the basis
{(0, 1, 1, 0), (−1, 1, 0, 0), (1, 2, 0, −1), (−1, 0, 0, −1)}
into an orthonormal basis.

Problem 5.10 Find an orthonormal basis for the subspace W of the Euclidean space ℝ^3 given by x + 2y − z = 0.

Problem 5.11 Let V = C[0, 1] with the inner product

(f, g) = ∫₀¹ f(x)g(x) dx for any f and g in V.

Find an orthonormal basis for the subspace spanned by 1, x and x^2.

The next theorem shows that an orthonormal basis acts just like the standard basis for the Euclidean n-space ℝ^n.

Theorem 5.7 Let {u1, u2, ..., uk} be an orthonormal basis for a subspace U in an inner product space V. Then, for any vector x in U,

x = (u1, x)u1 + (u2, x)u2 + ··· + (uk, x)uk.

Proof: For any vector x ∈ U, one can write x = x1u1 + x2u2 + ··· + xkuk as a linear combination of the basis vectors. However, for each i = 1, ..., k,

(u_i, x) = (u_i, x1u1 + ··· + xkuk)
         = x1(u_i, u1) + ··· + x_i(u_i, u_i) + ··· + xk(u_i, uk)
         = x_i,

because {u1, u2, ..., uk} is orthonormal. □

In particular, if α = {v1, v2, ..., vn} is an orthonormal basis for V, then any vector x in V can be written uniquely as

x = (v1, x)v1 + (v2, x)v2 + ··· + (vn, x)vn.

Moreover, one can identify an n-dimensional inner product space V with the Euclidean n-space ℝ^n. Let α = {v1, v2, ..., vn} be an orthonormal basis for the space V. With this orthonormal basis α, the natural isomorphism Φ : V → ℝ^n given by Φ(v_i) = [v_i]_α = e_i, i = 1, 2, ..., n, preserves the inner product on vectors: For a vector x = Σ_{i=1}^n x_i v_i in V with x_i = (x, v_i), the coordinate vector of x with respect to α is the column matrix

[x]_α = [x1 x2 ··· xn]^T.

Moreover, for another vector y = Σ_{i=1}^n y_i v_i in V,

(x, y) = x1y1 + x2y2 + ··· + xnyn.

The right-hand side of this equation is just the dot product of vectors in the Euclidean space ℝ^n. That is,

(x, y) = [x]_α · [y]_α = Φ(x) · Φ(y)

for any x, y ∈ V. Hence, the natural isomorphism Φ preserves the inner product, and we have the following theorem (compare with Corollary 4.8(1)).
Theorem 5.8 Any n-dimensional inner product space V with an inner product ( , ) is isomorphic to the Euclidean n-space ℝ^n with the dot product ·, where the isomorphism preserves the inner product.

In this sense, one may restrict the study of an inner product space to the case of the Euclidean n-space ℝ^n with the dot product.
A special kind of linear transformation that preserves the inner product such as
the natural isomorphism from V to JRn plays an important role in linear algebra, and
it will be studied in detail in Section 5.8.

5.5 Projections
Let U be a subspace of a vector space V. Then, by Corollary 3.13 there is another subspace W of V such that V = U ⊕ W, so that any x ∈ V has a unique expression as x = u + w for u ∈ U and w ∈ W. As an easy exercise, one can show that a function T : V → V defined by T(x) = T(u + w) = u is a linear transformation, whose image Im(T) = T(V) is the subspace U and whose kernel Ker(T) is the subspace W.

Definition 5.5 Let U and W be subspaces of a vector space V. A linear transformation T : V → V is called the projection of V onto the subspace U along W if V = U ⊕ W and T(x) = u for x = u + w ∈ U ⊕ W.

Example 5.8 (Infinitely many different projections of ℝ^2 onto the x-axis) Let X, Y and Z be the 1-dimensional subspaces of the Euclidean 2-space ℝ^2 spanned by the vectors e1, e2, and v = e1 + e2 = (1, 1), respectively:

X = {r e1 : r ∈ ℝ} = the x-axis,
Y = {r e2 : r ∈ ℝ} = the y-axis,
Z = {r(e1 + e2) : r ∈ ℝ}.

Since the pairs {e1, e2} and {e1, v} are linearly independent, the space ℝ^2 can be expressed as the direct sum in two ways: ℝ^2 = X ⊕ Y = X ⊕ Z.

Figure 5.1. Two decompositions of ℝ^2

Thus, a vector x = (2, 1) ∈ ℝ^2 may be written in two ways:

x = (2, 1) = 2(1, 0) + (0, 1) ∈ X ⊕ Y = ℝ^2,   or
x = (2, 1) = (1, 0) + (1, 1) ∈ X ⊕ Z = ℝ^2.

Let T_X and S_X denote the projections of ℝ^2 onto X along Y and Z, respectively. Then

T_X(x) = 2(1, 0) ∈ X,  T_Y(x) = (0, 1) ∈ Y,  and
S_X(x) = (1, 0) ∈ X,   S_Z(x) = (1, 1) ∈ Z.

This shows that a projection of ℝ^2 onto the subspace X depends on a choice of complementary subspace of X. For example, by choosing Z_n = {r(n, 1) : r ∈ ℝ} for any integer n as a complementary subspace, one can construct infinitely many different projections of ℝ^2 onto the x-axis. □

Note that for a given subspace U of V, a projection T of V onto U depends on the choice of a complementary subspace W of U, as shown in Example 5.8. However, by definition, T(u) = u for any u ∈ U and for any choice of W. That is, T ∘ T = T for every projection T of V.
The following theorem gives an algebraic characterization of a linear transformation being a projection.

Theorem 5.9 A linear transformation T : V → V is a projection if and only if T = T^2 (= T ∘ T by definition).

Proof: The necessity is clear, because T ∘ T = T for any projection T.
For the sufficiency, suppose T^2 = T. It suffices to show that V = Im(T) ⊕ Ker(T) and T(u + w) = u for any u + w ∈ Im(T) ⊕ Ker(T). First, one needs to prove Im(T) ∩ Ker(T) = {0} and V = Im(T) + Ker(T). Indeed, if y ∈ Im(T) ∩ Ker(T), then there exists x ∈ V such that T(x) = y and T(y) = 0. It implies

y = T(x) = T^2(x) = T(T(x)) = T(y) = 0.

The hypothesis T^2 = T also shows that T(v) ∈ Im(T) and v − T(v) ∈ Ker(T) for any v ∈ V. It implies V = Im(T) + Ker(T). Finally, note that T(u + w) = T(u) + T(w) = T(u) = u for any u + w ∈ Im(T) ⊕ Ker(T). □

Let T : V -+ V be a projection, so that V = Im(T) EEl Ker(T). It is not difficult


to show that Im(idv - T) = Ker(T) and Ker(idv - T) = Im(T) for the identity
transformation idv on V.

Corollary 5.10 A linear transformation T : V → V is a projection if and only if id_V − T is a projection. Moreover, if T is the projection of V onto a subspace U along W, then id_V − T is the projection of V onto W along U.

Proof: It is enough to show that (id_V − T) ∘ (id_V − T) = id_V − T. But

(id_V − T) ∘ (id_V − T) = (id_V − T) − (T − T^2) = id_V − T. □

Problem 5.12 For V = U ⊕ W, let T_U denote the projection of V onto U along W, and let T_W denote the projection of V onto W along U. Prove the following.
(1) For any x ∈ V, x = T_U(x) + T_W(x).
(2) T_U ∘ (id_V − T_U) = 0.
(3) T_U ∘ T_W = T_W ∘ T_U = 0.
(4) For any projection T : V → V, Im(id_V − T) = Ker(T) and Ker(id_V − T) = Im(T).

5.6 Orthogonal projections


Let U be a subspace of a vector space V. As shown in Example 5.8, we learn that there
are infinitely many projections of V onto U which depend on a choice of complemen-
tary subspace W of U. However, if V is an inner product space, there is a particular
choice of complementary subspace W, called the orthogonal complement of U, along
which the projection onto U is called the orthogonal projection, defined below. To
show this, we first extend the orthogonality of two vectors to an orthogonality of two
subspaces.

Definition 5.6 Let U and W be subspaces of an inner product space V.
(1) Two subspaces U and W are said to be orthogonal, written U ⊥ W, if (u, w) = 0 for each u ∈ U and w ∈ W.
(2) The set of all vectors in V that are orthogonal to every vector in U is called the orthogonal complement of U, denoted by U^⊥, i.e.,

U^⊥ = {v ∈ V : (v, u) = 0 for all u ∈ U}.



One can easily show that U^⊥ is a subspace of V, and v ∈ U^⊥ if and only if (v, u) = 0 for every u ∈ β, where β is a basis for U. Moreover, W ⊥ U if and only if W ⊆ U^⊥.
Problem 5.13 Let U and W be subspaces of an inner product space V. Show that
(1) If U ⊥ W, then U ∩ W = {0}.  (2) U ⊆ W if and only if W^⊥ ⊆ U^⊥.

Theorem 5.11 Let U be a subspace of an inner product space V. Then
(1) (U^⊥)^⊥ = U.
(2) V = U ⊕ U^⊥; that is, for each x ∈ V, there exist unique vectors x_U ∈ U and x_{U^⊥} ∈ U^⊥ such that x = x_U + x_{U^⊥}. This is called the orthogonal decomposition of V (or of x) by U.

Proof: Let dim U = k. To show (U^⊥)^⊥ = U, take an orthonormal basis for U, say α = {v1, v2, ..., vk}, by the Gram-Schmidt orthogonalization, and then extend it to an orthonormal basis for V, say β = {v1, v2, ..., vk, vk+1, ..., vn}, which is always possible. Then, clearly γ = {vk+1, ..., vn} forms an (orthonormal) basis for U^⊥, which means that (U^⊥)^⊥ = U and V = U ⊕ U^⊥. □

Definition 5.7 Let U be a subspace of an inner product space V, and let {u1, u2, ..., um} be an orthonormal basis for U. The orthogonal projection Proj_U from V onto the subspace U is defined by

Proj_U(x) = (u1, x)u1 + (u2, x)u2 + ··· + (um, x)um

for any x ∈ V.

Clearly, Proj_U is linear and a projection, because Proj_U ∘ Proj_U = Proj_U. Moreover, Proj_U(x) ∈ U and x − Proj_U(x) ∈ U^⊥, because

(x − Proj_U(x), u_i) = (x, u_i) − (Proj_U(x), u_i) = (x, u_i) − (u_i, x) = 0

for every basis vector u_i. Hence, by Theorem 5.7, we have

Corollary 5.12 The orthogonal projection Proj_U is the projection of V onto a subspace U along its orthogonal complement U^⊥.

Therefore, in Definition 5.7, the projection Proj_U(x) is independent of the choice of an orthonormal basis for the subspace U. In this sense, it is called the orthogonal projection from the inner product space V onto the subspace U.
Almost all projections used in linear algebra are orthogonal projections.

Example 5.9 (The orthogonal projection from ℝ^3 onto the xy-plane) In the Euclidean 3-space ℝ^3, let U be the xy-plane with the orthonormal basis α = {e1, e2}. Then, the orthogonal projection Proj_U(x) = (e1, x)e1 + (e2, x)e2 is the orthogonal projection onto the xy-plane in the usual sense in geometry, and x − Proj_U(x) ∈ U^⊥, which is the z-axis. It actually means that Proj_U(x1, x2, x3) = (x1, x2, 0) for any x = (x1, x2, x3) ∈ ℝ^3. □

Example 5.10 (The orthogonal projection from ℝ^2 onto the x-axis) As in Example 5.8, let X, Y and Z be the 1-dimensional subspaces of the Euclidean 2-space ℝ^2 spanned by the vectors e1, e2, and v = e1 + e2 = (1, 1), respectively. Then clearly Y = X^⊥ and Z ≠ X^⊥. And, for the projections T_X and S_X of ℝ^2 given in Example 5.8, T_X is the orthogonal projection, but S_X is not, so that T_X = Proj_X and S_X ≠ Proj_X. □
Theorem 5.13 Let U be a subspace of an inner product space V, and let x ∈ V. Then, the orthogonal projection Proj_U(x) of x satisfies

‖x − Proj_U(x)‖ ≤ ‖x − y‖

for all y ∈ U. The equality holds if and only if y = Proj_U(x).

Proof: First, note that for any vector x ∈ V, we have Proj_U(x) ∈ U and x − Proj_U(x) ∈ U^⊥. Thus, for all y ∈ U,

‖x − y‖^2 = ‖(x − Proj_U(x)) + (Proj_U(x) − y)‖^2
          = ‖x − Proj_U(x)‖^2 + ‖Proj_U(x) − y‖^2
          ≥ ‖x − Proj_U(x)‖^2,

where the second equality comes from the Pythagorean theorem for the orthogonality (x − Proj_U(x)) ⊥ (Proj_U(x) − y). (See Figure 5.2.) □

It follows from Theorem 5.13 that the orthogonal projection Proj_U(x) of x is the unique vector in U that is closest to x, in the sense that it minimizes the distance from x to the vectors in U. It also shows that, in Definition 5.7, the vector Proj_U(x) is independent of the choice of an orthonormal basis for the subspace U. Geometrically, Figure 5.2 depicts the vector Proj_U(x).


Problem 5.14 Find the point on the plane x − y − z = 0 that is closest to p = (1, 2, 0).

Problem 5.15 Let U ⊂ ℝ^4 be the subspace of the Euclidean 4-space ℝ^4 spanned by (1, 1, 0, 0) and (1, 0, 1, 0), and let W ⊂ ℝ^4 be the subspace spanned by (0, 1, 0, 1) and (0, 0, 1, 1). Find a basis for, and the dimension of, each of the following subspaces:
(1) U + W,  (2) U^⊥,  (3) U^⊥ + W^⊥,  (4) U ∩ W.

Figure 5.2. Orthogonal projection Proj_U

Problem 5.16 Let U and W be subspaces of an inner product space V. Show that
(1) (U + W)^⊥ = U^⊥ ∩ W^⊥.  (2) (U ∩ W)^⊥ = U^⊥ + W^⊥.

As a particular case, let V = ℝ^n be the Euclidean n-space with the dot product, and let U = {r u : r ∈ ℝ} be a 1-dimensional subspace determined by a unit vector u. Then for a vector x in ℝ^n, the orthogonal projection of x onto U is

Proj_U(x) = (u · x)u = (u^T x)u = u(u^T x) = (u u^T)x.

(Here, the last two equalities come from the facts that u · x = u^T x is a scalar and the associativity of the matrix product u u^T x, respectively.) This equation shows that the matrix representation of the orthogonal projection Proj_U with respect to the standard basis α is

[Proj_U]_α = u u^T.
If U is an m-dimensional subspace of ℝ^n with an orthonormal basis {u1, u2, ..., um}, then for any x ∈ ℝ^n,

Proj_U(x) = (u1 · x)u1 + (u2 · x)u2 + ··· + (um · x)um
          = u1(u1^T x) + u2(u2^T x) + ··· + um(um^T x)
          = (u1 u1^T + u2 u2^T + ··· + um um^T)x.

Thus, the matrix representation of the orthogonal projection Proj_U with respect to the standard basis α is

[Proj_U]_α = u1 u1^T + u2 u2^T + ··· + um um^T.

Definition 5.8 The matrix representation [Proj_U]_α of the orthogonal projection Proj_U : ℝ^n → ℝ^n of ℝ^n onto a subspace U with respect to the standard basis α is called the (orthogonal) projection matrix on U.

Further discussions about the orthogonal projection matrices will be continued in


Section 5.9.3.
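As a small numerical illustration (not part of the text), one can assemble a projection matrix from an orthonormal basis and check its characteristic properties P^2 = P and P^T = P.

    # Illustrative sketch: the projection matrix P = u1 u1^T + ... + um um^T
    import numpy as np

    u1 = np.array([1.0, 0.0, 0.0])
    u2 = np.array([0.0, 1.0, 0.0])            # U = the xy-plane, as in Example 5.9
    P = np.outer(u1, u1) + np.outer(u2, u2)

    x = np.array([3.0, -2.0, 7.0])
    print(P @ x)                                          # (3, -2, 0): projection onto the xy-plane
    print(np.allclose(P @ P, P), np.allclose(P, P.T))     # True True: idempotent and symmetric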

Example 5.11 (Distance from a point to a line) Let ax + by + c = 0 be a line L in the plane ℝ^2. (Note that the line L cannot be a subspace of ℝ^2 if c ≠ 0.) For any two points Q = (x1, y1) and R = (x2, y2) on the line, the equality a(x2 − x1) + b(y2 − y1) = 0 implies that the nonzero vector n = (a, b) is perpendicular to the line L, that is, QR ⊥ n.
Let P = (x0, y0) be any point in the plane ℝ^2. Then the distance d between the point P and the line L is simply the length of the orthogonal projection of QP onto n, for any point Q = (x1, y1) on the line. Thus,

d = ‖Proj_n(QP)‖
  = |QP · (n/‖n‖)|                                (the dot product)
  = |a(x0 − x1) + b(y0 − y1)| / √(a^2 + b^2)
  = |a x0 + b y0 + c| / √(a^2 + b^2).

Figure 5.3. Distance from a point to a line

Note that the last equality is due to the fact that the point Q is on the line (i.e., a x1 + b y1 + c = 0).
To find the orthogonal projection matrix, let u = n/‖n‖ = (1/√(a^2 + b^2)) (a, b). Then the orthogonal projection matrix onto U = {r u : r ∈ ℝ} is

u u^T = 1/(a^2 + b^2) [ a ; b ] [ a  b ] = 1/(a^2 + b^2) [ a^2  ab ; ab  b^2 ].

Thus, if x = (1, 1) ∈ ℝ^2, then

Proj_U(x) = (u u^T)x = 1/(a^2 + b^2) [ a^2 + ab ; ab + b^2 ]. □
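The two descriptions of the distance agree, as the following sketch verifies for a hypothetical line and point (the numbers 3x + 4y − 12 = 0 and P = (1, 1) are chosen only for illustration).

    # Illustrative sketch: distance from a point to a line, by formula and by projection
    import numpy as np

    a, b, c = 3.0, 4.0, -12.0          # hypothetical line 3x + 4y - 12 = 0
    x0, y0 = 1.0, 1.0                  # hypothetical point P

    d_formula = abs(a * x0 + b * y0 + c) / np.hypot(a, b)

    n = np.array([a, b])               # normal vector of the line
    Q = np.array([0.0, -c / b])        # a point Q on the line (here b != 0)
    QP = np.array([x0, y0]) - Q
    d_proj = abs(QP @ n) / np.linalg.norm(n)   # length of the projection of QP onto n

    print(d_formula, d_proj)           # both 1.0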

Problem 5.17 Let V = P3(ℝ) be the vector space of polynomials of degree ≤ 3 equipped with the inner product

(f, g) = ∫₀¹ f(x)g(x) dx for any f and g in V.

Let W be the subspace of V spanned by {1, x}, and define f(x) = x^2. Find the orthogonal projection Proj_W(f) of f onto W.

5.7 Relations of fundamental subspaces

We now go back to the study of a system Ax = b of linear equations with an m × n matrix A. One of the most important applications of the orthogonal projection of vectors onto a subspace is the decomposition of the domain space and the image space of A by the four fundamental subspaces N(A), R(A) in ℝ^n and C(A), N(A^T) in ℝ^m (see Theorem 5.16). From these decompositions, one can completely determine the solution set of a consistent system Ax = b.

Lemma 5.14 For an m × n matrix A, the null space N(A) and the row space R(A) are orthogonal: i.e., N(A) ⊥ R(A) in ℝ^n. Similarly, N(A^T) ⊥ C(A) in ℝ^m.

Proof: Note that w ∈ N(A) if and only if Aw = 0, i.e., for every row vector r in A, r · w = 0. For the second statement, do the same with A^T. □

From Lemma 5.14, it is clear that

N(A) ⊆ R(A)^⊥ (or R(A) ⊆ N(A)^⊥), and
N(A^T) ⊆ C(A)^⊥ (or C(A) ⊆ N(A^T)^⊥).

Moreover, by comparing the dimensions of these subspaces and by using Theorem 5.11 and the Rank Theorem 3.17, we have

dim R(A) + dim N(A) = n = dim R(A) + dim R(A)^⊥,
dim C(A) + dim N(A^T) = m = dim C(A) + dim C(A)^⊥.

This means that the inclusions are actually equalities.

Lemma 5.15 (1) N(A) = R(A)^⊥ (or R(A) = N(A)^⊥).
(2) N(A^T) = C(A)^⊥ (or C(A) = N(A^T)^⊥).

This shows that the row space R(A) is the orthogonal complement of the null space N(A) in ℝ^n, and vice versa. Similarly, the same thing happens for the column space C(A) and the null space N(A^T) of A^T in ℝ^m. Hence, by Theorem 5.11, we have the following orthogonal decomposition.

Theorem 5.16 For any m × n matrix A,
(1) N(A) ⊕ R(A) = ℝ^n,
(2) N(A^T) ⊕ C(A) = ℝ^m.

Note that if rank A = r, so that dim R(A) = r = dim C(A), then dim N(A) = n − r and dim N(A^T) = m − r. Considering the matrix A as a linear transformation A : ℝ^n → ℝ^m, Figure 5.4 depicts Theorem 5.16.

Figure 5.4. Relations of the four fundamental subspaces

Corollary 5.17 The set of solutions of a consistent system Ax = b is precisely x0 + N(A), where x0 is any solution of Ax = b.

Proof: Let x0 ∈ ℝ^n be a solution of the system Ax = b. Now consider the set x0 + N(A), which is just a translation of N(A) by x0.
(1) Any vector x0 + n in x0 + N(A) is also a solution, because A(x0 + n) = Ax0 = b.
(2) If x is another solution, then clearly x − x0 is in the null space N(A), so that x = x0 + n for some n ∈ N(A), i.e., x ∈ x0 + N(A). □

In particular, if rank A = m (so that m ≤ n), then C(A) = ℝ^m. Thus, for any b ∈ ℝ^m, the system Ax = b has a solution in ℝ^n. (This is the case of the existence Theorem 3.24.)
On the other hand, if rank A = n (so that n ≤ m), then N(A) = {0} and R(A) = ℝ^n. Therefore, the system Ax = b has at most one solution; that is, it has a unique solution x in R(A) if b ∈ C(A), and has no solution if b ∉ C(A). (This is the case of the uniqueness Theorem 3.25.) The latter case may occur when m > r = rank A; that is, N(A^T) is a nontrivial subspace of ℝ^m, and will be discussed later in Section 5.9.1.
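Theorem 5.16 is easy to check numerically; the sketch below (illustrative only, for a randomly generated matrix) verifies that dim N(A) + dim R(A) = n, that the two subspaces are orthogonal, and that a vector splits into its two components.

    # Illustrative sketch: N(A) and R(A) give an orthogonal decomposition of R^n
    import numpy as np
    from scipy.linalg import null_space, orth

    rng = np.random.default_rng(1)
    A = rng.integers(-3, 4, size=(3, 5)).astype(float)   # hypothetical 3 x 5 matrix

    N = null_space(A)          # orthonormal basis of N(A)
    R = orth(A.T)              # orthonormal basis of R(A) = C(A^T)
    print(N.shape[1] + R.shape[1])                 # = 5 = n  (dim N(A) + dim R(A))
    print(np.allclose(R.T @ N, 0))                 # True: the two subspaces are orthogonal

    x = rng.normal(size=5)
    x_row, x_null = R @ (R.T @ x), N @ (N.T @ x)   # components in R(A) and N(A)
    print(np.allclose(x_row + x_null, x))          # True: x = x_R + x_N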

Problem 5.18 Prove the following statements.
(1) If Ax = b and A^T y = 0, then y^T b = 0, i.e., y ⊥ b.
(2) If Ax = 0 and A^T y = c, then x^T c = 0, i.e., x ⊥ c.

Problem 5.19 Given two vectors (1, 2, 1, 2) and (0, −1, −1, 1) in ℝ^4, find all vectors in ℝ^4 that are perpendicular to them.

Problem 5.20 Find a basis for the orthogonal complement of the row space of A :

1 28] 001]
(1) A =
[ 23 0-1 61 , (2) A =
[0 0 I .
111

5.8 Orthogonal matrices and isometries


In Chapter 4, we saw that a linear transformation can be associated with a matrix, and
vice-versa. In this section , we are mainly interested in those linear transformations
(or matrices) that preserve the length of a vector in an inner product space.
Let A = [c1 ··· cn] be an n × n square matrix with columns c1, ..., cn. Then a simple computation shows that the (i, j)-entry of A^T A is c_i^T c_j = c_i · c_j.
Hence, if the column vectors are orthonormal, c_i^T c_j = δ_ij, then A^T A = I_n; that is, A^T is a left inverse of A, and vice-versa. Since A is a square matrix, this left inverse must be the right inverse of A, i.e., AA^T = I_n. Equivalently, the row vectors of A are also orthonormal. This argument can be summarized as follows.

Lemma 5.18 For an n × n matrix A, the following are equivalent.
(1) The column vectors of A are orthonormal.
(2) A^T A = I_n.
(3) A^T = A^{-1}.
(4) AA^T = I_n.
(5) The row vectors of A are orthonormal.

Definition 5.9 A square matrix A is called an orthogonal matrix if A satisfies one


(and hence all) of the statements in Lemma 5.18.

Clearly, A is orthogonal if and only if AT is orthogonal.



Example 5.12 (Rotations and reflections are orthogonal) The matrices

A = [ cos θ  −sin θ ]          B = [ cos θ   sin θ ]
    [ sin θ   cos θ ],             [ sin θ  −cos θ ]

are orthogonal, and satisfy

A^{-1} = A^T = [ cos θ   sin θ ]          B^{-1} = B^T = [ cos θ   sin θ ]
               [ −sin θ  cos θ ],                        [ sin θ  −cos θ ].

Note that the linear transformation T : ℝ^2 → ℝ^2 defined by T(x) = Ax is a rotation through the angle θ, while S : ℝ^2 → ℝ^2 defined by S(x) = Bx is the reflection about the line passing through the origin that forms an angle θ/2 with the positive x-axis. □
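A quick numerical confirmation (illustrative only) that these matrices satisfy the conditions of Lemma 5.18:

    # Illustrative sketch: the rotation A and reflection B are orthogonal matrices
    import numpy as np

    t = 0.7   # an arbitrary angle theta
    A = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])     # rotation
    B = np.array([[np.cos(t),  np.sin(t)],
                  [np.sin(t), -np.cos(t)]])     # reflection

    for Q in (A, B):
        print(np.allclose(Q.T @ Q, np.eye(2)),          # Q^T Q = I
              np.allclose(np.linalg.inv(Q), Q.T))       # Q^{-1} = Q^T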

Example 5.13 (All 2 × 2 orthogonal matrices) Show that every 2 × 2 orthogonal matrix must be one of the forms

[ cos θ  −sin θ ]        or        [ cos θ   sin θ ]
[ sin θ   cos θ ]                  [ sin θ  −cos θ ].

Solution: Suppose that A = [ a  b ; c  d ] is an orthogonal matrix, so that AA^T = I_2 = A^T A. The first equality gives a^2 + b^2 = 1, ac + bd = 0, and c^2 + d^2 = 1. The second equality gives a^2 + c^2 = 1, ab + cd = 0, and b^2 + d^2 = 1. Thus, b = ±c. If b = −c, then we get a = d. If b = c, then we get a = −d. Now, choose θ so that a = cos θ and b = sin θ. □

Problem 5.21 Find the inverse of each of the following matrices.

(1) [ 1    0       0    ]        (2) [  1/√2  −1/√2   0 ]
    [ 0   cos θ   sin θ ]            [ −1/√2  −1/√2   0 ]
    [ 0  −sin θ   cos θ ],           [  0      0      1 ].

What are they as linear transformations on ℝ^3: rotations, reflections, or other?

Problem 5.22 Find eight 2 × 2 orthogonal matrices which transform the square −1 ≤ x, y ≤ 1 onto itself.

As shown in Examples 5.12 and 5.13, all rotations and reflections on the Euclidean 2-space ℝ^2 are orthogonal and, intuitively, preserve both the lengths of vectors and the angle between two vectors. In fact, every orthogonal matrix A preserves the lengths of vectors:

‖Ax‖^2 = Ax · Ax = (Ax)^T (Ax) = x^T A^T A x = x^T x = ‖x‖^2.


Definition 5.10 Let V and W be two inner product spaces. A linear transformation T : V → W is called an isometry, or an orthogonal transformation, if it preserves the lengths of vectors, that is, for every vector x ∈ V,

‖T(x)‖ = ‖x‖.

Clearly, any orthogonal matrix is an isometry as a linear transformation. If T : V → W is an isometry, then T is one-to-one, since the kernel of T is trivial: T(x) = 0 implies ‖x‖ = ‖T(x)‖ = 0. Thus, if dim V = dim W, then an isometry is also an isomorphism.
The following theorem gives an interesting characterization of an isometry .

Theorem 5.19 Let T : V → W be a linear transformation from an inner product space V to another W. Then, T is an isometry if and only if T preserves inner products, that is,

(T(x), T(y)) = (x, y)

for any vectors x, y in V.

Proof: Let T be an isometry. Then ‖T(x)‖^2 = ‖x‖^2 for any x ∈ V. Hence,

(T(x + y), T(x + y)) = ‖T(x + y)‖^2 = ‖x + y‖^2 = (x + y, x + y)

for any x, y ∈ V. On the other hand,

(T(x + y), T(x + y)) = (T(x), T(x)) + 2(T(x), T(y)) + (T(y), T(y)),
(x + y, x + y) = (x, x) + 2(x, y) + (y, y),

from which we get (T(x), T(y)) = (x, y).
The converse is quite clear by choosing y = x. □
Theorem 5.20 Let A be an n × n matrix. Then, A is an orthogonal matrix if and only if A : ℝ^n → ℝ^n, as a linear transformation, preserves the dot product. That is, for any vectors x, y ∈ ℝ^n,

Ax · Ay = x · y.

Proof: The necessity is clear. For the sufficiency, suppose that A preserves the dot product. Then for any vectors x, y ∈ ℝ^n,

x^T A^T A y = Ax · Ay = x · y = x^T y.

Take x = e_i and y = e_j. Then this equation is just [A^T A]_ij = δ_ij. □
Since d(x , y) = IIx - YII for any x and y in V , one can easily derive the following
corollary.

Corollary 5.21 A linear transformation T : V → W is an isometry if and only if

d(T(x), T(y)) = d(x, y)

for any x and y in V.

Recall that if θ is the angle between two nonzero vectors x and y in an inner product space V, then for any isometry T : V → V,

cos θ = (x, y)/(‖x‖ ‖y‖) = (Tx, Ty)/(‖Tx‖ ‖Ty‖).

Hence, we have

Corollary 5.22 An isometry preserves the angle.

The converse of Corollary 5.22 is not true in general. A linear transformation T(x) = 2x on the Euclidean space ℝ^n preserves the angle but not the lengths of vectors (i.e., it is not an isometry). Such a linear transformation is called a dilation.
We have seen that any orthogonal matrix is an isometry as the linear transforma-
tion T (x) = Ax. The following theorem says that the converse is also true, that is,
the matrix representation of an isometry with respect to an orthonormal basis is an
orthogonal matrix.

Theorem 5.23 Let T : V → W be an isometry from an inner product space V to another W of the same dimension. Let α = {v1, ..., vn} and β = {w1, ..., wn} be orthonormal bases for V and W, respectively. Then, the matrix [T]_α^β for T with respect to the bases α and β is an orthogonal matrix.

Proof: Note that the k-th column vector of the matrix [T]_α^β is just [T(v_k)]_β. Since T preserves inner products and α, β are orthonormal, we get

[T(v_k)]_β · [T(v_l)]_β = (T(v_k), T(v_l)) = (v_k, v_l) = δ_kl,

which shows that the column vectors of [T]_α^β are orthonormal. □


Remark: In summary, for a linear transformation T : V ~ W, the following are
equivalent:
(1) T is an isometry: that is, T preserves the lengths of vectors.
(2) T preserves the inner product.
(3) T preserves the distance.
(4) [T]~ with respect to orthonormal bases ex and fJ is an orthogonal matrix.

Anyone (hence all) of these conditions implies that T preserves the angle, but the
converse is not true.

Problem 5.23 Find values r > 0, S > 0, a > 0, band c such that matrix Q is orthogonal.

(1) Q =
[
-s2ssa]
r

r
b ,
c
-s a]
3s b
-15 c
.

Problem 5.24 (Bessel's Inequality) Let V be an inner product space, and let {v1, ..., vm} be a set of orthonormal vectors in V (not necessarily a basis for V). Prove that for any x in V, ‖x‖^2 ≥ Σ_{i=1}^m |(x, v_i)|^2.

Problem 5.25 Determine whether the following linear transformations on the Euclidean space
]R3are orthogonal.

(1) T(x, y, z) = (z, :!,fx + iY, ! - :!,fy).


(2) T(x, y, z) = (U5 x + UZ'
11 12
UY 5
- UZ' x .
)

5.9 Applications

5.9.1 Least squares solutions

In the previous section, we have completely determined the solution set for a system Ax = b when b ∈ C(A). In this section, we discuss what we can do when the system Ax = b is inconsistent, that is, when b ∉ C(A) ⊆ ℝ^m. Certainly, there exists no solution in this case, but one can find a 'pseudo'-solution in the following sense.
Note that for any vector x in ℝ^n, Ax ∈ C(A). Hence, the best we can do is to find a vector x0 ∈ ℝ^n so that Ax0 is the closest to the given vector b ∈ ℝ^m: i.e., ‖Ax0 − b‖ is as small as possible. Such a vector x0 will give us the best approximation Ax0 to b among Ax for all vectors x in ℝ^n, and it is called a least squares solution of Ax = b.
To find a least squares solution, we first need to find a vector in C(A) that is closest to b. However, from the orthogonal decomposition ℝ^m = C(A) ⊕ N(A^T), any b ∈ ℝ^m has the unique orthogonal decomposition

b = b_c + b_n ∈ C(A) ⊕ N(A^T) = ℝ^m,

where b_c = Proj_{C(A)}(b) ∈ C(A) and b_n = b − b_c ∈ N(A^T). Here, the vector b_c = Proj_{C(A)}(b) ∈ C(A) has two basic properties:
(1) There always exists a solution x0 ∈ ℝ^n of Ax = b_c, since b_c ∈ C(A).
(2) b_c is the closest vector to b among the vectors in C(A) (see Theorem 5.13).
Therefore, a least squares solution x0 ∈ ℝ^n of Ax = b is just a solution of Ax = b_c. Furthermore, if x0 ∈ ℝ^n is a least squares solution, then the set of all least squares solutions is x0 + N(A) by Corollary 5.17.
In particular, if b ∈ C(A), then b = b_c, so that the least squares solutions are just the 'true' solutions of Ax = b. The second property of b_c means that a least squares

solution x0 ∈ ℝ^n of Ax = b gives the best approximation Ax0 = b_c to b: i.e., for any vector x in ℝ^n,

‖Ax0 − b‖ ≤ ‖Ax − b‖.

In summary, to have a least squares solution of Ax = b, the first step is to find the orthogonal projection b_c = Proj_{C(A)}(b) ∈ C(A) of b, and then solve Ax = b_c as usual.
One can find b_c from b ∈ ℝ^m by using the orthogonal projection if we have an orthonormal basis for C(A). But such a computation of b_c could be uncomfortable, because the only way we know so far to find an orthonormal basis for C(A) is the Gram-Schmidt orthogonalization (whose computation may be cumbersome).
However, there is a bypass to avoid the Gram-Schmidt orthogonalization. For this, let us examine a least squares solution once again. If x0 ∈ ℝ^n is a least squares solution of Ax = b, then

Ax0 − b = b_c − b = −b_n ∈ N(A^T)

holds, since Ax0 = b_c. Thus, A^T(Ax0 − b) = A^T(−b_n) = 0, or equivalently A^T A x0 = A^T b; that is, x0 is a solution of the equation

A^T A x = A^T b.

This equation is very interesting because it is also a sufficient condition for a least squares solution, as the next theorem shows, and it is called the normal equation of Ax = b.

Theorem 5.24 Let A be an m × n matrix, and let b ∈ ℝ^m be any vector. Then, a vector x0 ∈ ℝ^n is a least squares solution of Ax = b if and only if x0 is a solution of the normal equation A^T A x = A^T b.

Proof: We only need to show the sufficiency. Let x0 be a solution of the normal equation A^T A x = A^T b. Then A^T(Ax0 − b) = 0, so Ax0 − b ∈ N(A^T). Say Ax0 − b = n ∈ N(A^T), and let b = b_c + b_n ∈ C(A) ⊕ N(A^T). Then Ax0 − b_c = n + b_n ∈ N(A^T). Since Ax0 − b_c is also contained in C(A) and N(A^T) ∩ C(A) = {0}, Ax0 = b_c = Proj_{C(A)}(b), i.e., x0 is a least squares solution of Ax = b. □
Example 5.14 (The best approximated solution of an inconsistent system Ax = b)
Find all the least squares solutions of Ax = b, and then determine the orthogonal
projection b c ofb into the column space C(A), where

1 -2
2 -3
A = -1 1
[
3 -5

Solution: (The reader may check that Ax = b has no solutions).

-3
-1
2
-1 3] [ ~
2
1 -5
0 -1
3

and

ATb =
[ 1 2-1 3] [~]~ [ 0]
-2 -3
1 -1
1 -5
2 0
= -1
3
.

From the normal equation, a least squares solution of Ax = b is a solution of A T Ax =


ATb, i.e.,

[
-~~ -;~ -~] [;~] = [ -~ ] .
-3 3 6 X3 3
By solving this system of equations (left for an exercise), one can obtain all the least squares solutions, which are of the form

[x1  x2  x3]^T = [−8/3  5/3  0]^T + t [5  3  1]^T

for any number t ∈ ℝ. Moreover, the orthogonal projection b_c is obtained as b_c = Ax0.

Note that the set of least squares solutions is x0 + N(A), where x0 = [−8/3  5/3  0]^T and N(A) = {t [5  3  1]^T : t ∈ ℝ}. □
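For a machine check of the normal-equation method, the following sketch (with a small made-up matrix and right-hand side, not the data of Example 5.14) solves A^T A x = A^T b and compares the result with NumPy's built-in least squares routine.

    # Illustrative sketch: least squares via the normal equation A^T A x = A^T b
    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])         # hypothetical 3 x 2 matrix with independent columns
    b = np.array([6.0, 0.0, 0.0])      # hypothetical right-hand side, not in C(A)

    x0 = np.linalg.solve(A.T @ A, A.T @ b)          # unique least squares solution
    x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)   # library solution for comparison
    print(x0, x_ref)                                # both (5, -3)
    print(A @ x0)                                   # b_c = Proj_C(A)(b) = (5, 2, -1)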

Problem 5.26 Find all least squares solutions x in ]R3 of Ax = b, where

10 2]
o 2 2 [3]
-3
A = [ -1
-1 2
1 -1
0
' b = O
-3
'

Note that the normal equation is always consistent by construction, and, as Example 5.14 shows, a least squares solution can be found by the Gauss-Jordan elimination even though A^T A is not invertible. If b ∈ C(A) or, even better, if the rows of A are linearly independent (thus, rank A = m and C(A) = ℝ^m), then b_c = b, so that the system Ax = b is always consistent and the least squares solutions coincide with the true solutions. Therefore, for any given system Ax = b, consistent or inconsistent, by solving the normal equation A^T A x = A^T b one can obtain either the true solutions or the least squares solutions.
If the square matrix A^T A is invertible, then the normal equation A^T A x = A^T b of the system Ax = b resolves to x = (A^T A)^{-1} A^T b, which is a least squares solution. In particular, if A^T A = I_n, or equivalently the columns of A are orthonormal (see Lemma 5.18), then the normal equation reduces to the least squares solution x = A^T b. The following theorem gives a condition for A^T A to be invertible.

Theorem 5.25 For any m × n matrix A, A^T A is a symmetric n × n square matrix and rank(A^T A) = rank A.

Proof: Clearly, A^T A is square and symmetric. Since the numbers of columns of A and A^T A are both n, we have

rank A + dim N(A) = n = rank(A^T A) + dim N(A^T A).

Hence, it suffices to show that N(A) = N(A^T A), so that dim N(A) = dim N(A^T A). It is trivial to see that N(A) ⊆ N(A^T A), since Ax = 0 implies A^T A x = 0. Conversely, suppose that A^T A x = 0. Then

Ax · Ax = (Ax)^T (Ax) = x^T (A^T A x) = x^T 0 = 0.

Hence Ax = 0, and x ∈ N(A). □
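The equality rank(A^T A) = rank A can be sanity-checked numerically; the sketch below (illustrative only) tests it on a few random matrices.

    # Illustrative sketch: rank(A^T A) equals rank(A)
    import numpy as np

    rng = np.random.default_rng(2)
    for _ in range(5):
        m, n = rng.integers(2, 7, size=2)
        r = int(rng.integers(1, min(m, n) + 1))
        A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # random m x n matrix of rank r
        assert np.linalg.matrix_rank(A.T @ A) == np.linalg.matrix_rank(A)
    print("rank(A^T A) == rank(A) in every trial")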
It follows from Theorem 5.25 that A^T A is invertible if and only if rank A = n, that is, the columns of A are linearly independent. In this case, N(A) = {0} and so the system Ax = b has a unique least squares solution x0 in R(A) = ℝ^n, which is

x0 = (A^T A)^{-1} A^T b.

This can be summarized in the following theorem:


Theorem 5.26 Let A be an m × n matrix. If rank A = n, or equivalently the columns of A are linearly independent, then
(1) A^T A is invertible, so that (A^T A)^{-1} A^T is a left inverse of A,
(2) the vector x0 = (A^T A)^{-1} A^T b is the unique least squares solution of a system Ax = b, and
(3) Ax0 = A(A^T A)^{-1} A^T b = b_c = Proj_{C(A)}(b); that is, the orthogonal projection of ℝ^m onto C(A) is Proj_{C(A)} = A(A^T A)^{-1} A^T.

Remark: (1) For an m x n matrix A, by applying Theorem 5.26 to AT, one can say
that rank A = m if and only if AA T is invertible. In this case AT (AAT)-I is a right
inverse of A (cf. Remark after Theorem 3.25). Moreover, AA T is invertible if and
only if the rows of A are linearly independent by Theorem 5.25.

(2) If the matrix A is orthogonal, then the columns u1, ..., un of A form an orthonormal basis for the column space C(A), so that for any b ∈ ℝ^m,

b_c = (u1 · b)u1 + ··· + (un · b)un = (u1 u1^T + ··· + un un^T)b,

and the projection matrix is

Proj_{C(A)} = u1 u1^T + ··· + un un^T.

In fact, this result coincides with Theorem 5.26: If A is orthogonal, so that A^T A = I_n, then

Proj_{C(A)} = A(A^T A)^{-1} A^T = AA^T = u1 u1^T + ··· + un un^T,

and the least squares solution is

x0 = A^T b = [u1 ··· un]^T b = [u1 · b  ···  un · b]^T,

which is the coordinate expression of Ax0 = b_c = Proj_{C(A)}(b) with respect to the orthonormal basis {u1, ..., un} for C(A).
In general, the columns of A need not be orthonormal, in which case the above
formula is not possible. In Section 5.9.3, we will discuss more about this general case.
(3) If rank A = r < n, one can reduce the columns of A to a basis for the column space and work with this reduced matrix (its A^T A is then invertible) to find the orthogonal projection Proj_{C(A)} = A(A^T A)^{-1} A^T of ℝ^m onto the column space C(A). However, the least squares solutions of Ax = b should be found from the original normal equation directly, since the least squares solution x_0 = (A^T A)^{-1} A^T b of the reduced system has only r components, so that it cannot be a solution of Ax = b_c.
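As a concrete illustration of solving the normal equation numerically, the following NumPy sketch (not part of the original text; the matrix A and vector b are made-up placeholders) computes a least squares solution and the projection b_c:

```python
import numpy as np

# A made-up overdetermined system Ax = b (illustrative data only).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equation A^T A x = A^T b; the columns of A are independent,
# so A^T A is invertible and the least squares solution is unique.
x0 = np.linalg.solve(A.T @ A, A.T @ b)

b_c = A @ x0                 # orthogonal projection of b onto C(A)
print(x0)                    # least squares solution
print(b_c)                   # projection of b
print(A.T @ (b - b_c))       # residual is orthogonal to C(A): ~ zero vector
```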
Example 5.15 (Solving an inconsistent system Ax = b by the normal equation) Find the least squares solutions of the system Ax = b. Determine also the orthogonal projection b_c of b in the column space C(A).

Solution: Clearly, the two columns of A are linearly independent and C(A) is the xy-plane. Thus, b ∉ C(A). Note that A^T A is invertible. By a simple computation with the normal equation A^T Ax = A^T b, one can obtain

    x_0 = (14/3, -1/3),

which is a least squares solution; it is unique since N(A) = {0}. The orthogonal projection of b in C(A) is

    b_c = Ax_0. □

Problem 5.27 Find all the least squares solutions of the following inconsistent system of linear
equations:

5.9.2 Polynomial approximations

In this section, one can find a reason for the name of the "least squares" solutions, and the following example illustrates an application of the least squares solution to the determination of the spring constants in physics.

Example 5.16 Hooke's law for springs in physics says that for a uniform spring, the length stretched or compressed is a linear function of the force applied; that is, the force F applied to the spring is related to the length x stretched or compressed by the equation

    F = a + kx,

where a and k are some constants determined by the spring.
Suppose now that, given a spring of length 6.1 inches, we want to determine the constants a and k under the experimental data: the lengths are measured to be 7.6, 8.7 and 10.4 inches when forces of 2, 4 and 6 kilograms, respectively, are applied to the spring. However, by plotting these data

    (x, F) = (6.1, 0), (7.6, 2), (8.7, 4), (10.4, 6)

in the xF-plane, one can easily recognize that they are not on a straight line of the form F = a + kx, which may be caused by experimental errors. This means that the system of linear equations

    F_1 = a +  6.1k = 0
    F_2 = a +  7.6k = 2
    F_3 = a +  8.7k = 4
    F_4 = a + 10.4k = 6

is inconsistent (i.e., it has no solutions, so the second equality in each equation may not be a true equality). It means that if we put b = (0, 2, 4, 6) and F = (F_1, F_2, F_3, F_4) as vectors in ℝ^4 representing the data and the points on the line at the x_i's, respectively, then ||b - F|| is not zero. Thus, the best thing one can do is to determine the straight line F = a + kx that 'fits' the data best: that is, to minimize the sum of the squares of the vertical distances from the line to the data (x_i, F_i) for i = 1, 2, 3, 4 (see Figure 5.5) (this is the reason why we say least squares):

    (0 - F_1)^2 + (2 - F_2)^2 + (4 - F_3)^2 + (6 - F_4)^2 = ||b - F||^2.

Thus, for the original inconsistent system

    Ax = [1 6.1; 1 7.6; 1 8.7; 1 10.4] [a; k] = (0, 2, 4, 6)^T = b ∉ C(A),

we are looking for F ∈ C(A), which is the projection of b onto the column space C(A) of A, and the least squares solution x_0, which satisfies Ax_0 = F.

[Figure 5.5. Least squares fitting: the data points and the fitted line in the xF-plane.]

It is now easily computed (by solving the normal equation A^T Ax = A^T b) as

    [a; k] = x_0 = (A^T A)^{-1} A^T b ≈ [-8.6; 1.4].

It gives F = -8.6 + 1.4x. □
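As a quick numerical cross-check (a sketch outside the text, using only the data of Example 5.16), NumPy reproduces these coefficients; the exact least squares values are approximately a ≈ -8.64 and k ≈ 1.42, which round to the -8.6 and 1.4 above.

```python
import numpy as np

# Data of Example 5.16: measured lengths x and applied forces F.
x = np.array([6.1, 7.6, 8.7, 10.4])
F = np.array([0.0, 2.0, 4.0, 6.0])

# Design matrix for the model F = a + k*x, then the normal equation.
A = np.column_stack([np.ones_like(x), x])
a, k = np.linalg.solve(A.T @ A, A.T @ F)
print(a, k)   # roughly -8.64 and 1.42
```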

In general, a common problem in experimental work is to obtain a polynomial y = f(x) relating two variables x and y that best 'fits' the data of various values of y determined experimentally for inputs x, say

    (x_1, y_1), (x_2, y_2), ..., (x_n, y_n),

plotted in the xy-plane. Some possible fitting polynomials are
(1) by a straight line: y = a + bx,
(2) by a quadratic polynomial: y = a + bx + cx^2, or
(3) by a polynomial of degree k: y = a_0 + a_1 x + ... + a_k x^k, etc.
As a general case, suppose that we are looking for a polynomial y = f(x) = a_0 + a_1 x + a_2 x^2 + ... + a_k x^k of degree k that passes through the given data. Then we obtain a system of linear equations

    f(x_1) = a_0 + a_1 x_1 + a_2 x_1^2 + ... + a_k x_1^k = y_1
    f(x_2) = a_0 + a_1 x_2 + a_2 x_2^2 + ... + a_k x_2^k = y_2
        ...
    f(x_n) = a_0 + a_1 x_n + a_2 x_n^2 + ... + a_k x_n^k = y_n,

or, in matrix form, the system may be written as Ax = b:

    [1 x_1 x_1^2 ... x_1^k; 1 x_2 x_2^2 ... x_2^k; ... ; 1 x_n x_n^2 ... x_n^k] [a_0; a_1; ...; a_k] = [y_1; y_2; ...; y_n].

The left-hand side Ax represents the values of the polynomial at the x_i's and the right-hand side represents the data obtained from the inputs x_i's in the experiment.
If n ≤ k + 1, then the cases have already been discussed in Section 3.9.1. If n > k + 1, this kind of system may be inconsistent. Therefore, the best thing one can do is to find the polynomial f(x) that minimizes the sum of the squares of the vertical distances between the graph of the polynomial and the data. But this is equivalent to finding the least squares solution of the system Ax = b, because any c ∈ C(A) is of the form

    c = [a_0 + a_1 x_1 + ... + a_k x_1^k;  a_0 + a_1 x_2 + ... + a_k x_2^k;  ... ;  a_0 + a_1 x_n + ... + a_k x_n^k],

and for such a c we have

    ||b - c||^2 = (y_1 - a_0 - a_1 x_1 - ... - a_k x_1^k)^2 + ... + (y_n - a_0 - a_1 x_n - ... - a_k x_n^k)^2.
The previous theory says that the orthogonal projection b_c of b into the column space of A minimizes this quantity, and it shows how to find b_c and a least squares solution x_0.
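The following sketch (not from the text; the data points and the degree k are arbitrary placeholders) shows how the matrix form Ax = b above can be assembled as a Vandermonde-type matrix and solved in the least squares sense:

```python
import numpy as np

# Placeholder sample data (x_i, y_i) and polynomial degree k.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
k = 2

# Columns 1, x, x^2, ..., x^k, matching the matrix form Ax = b above.
A = np.vander(xs, k + 1, increasing=True)

# Least squares coefficients a_0, a_1, ..., a_k.
coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
print(coeffs)
```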
Example 5.17 Find a straight line y = a + bx that best fits the given experimental data (1, 0), (2, 3), (3, 4) and (4, 4).

Solution: We are looking for a line y = a + bx that minimizes the sum of squares of the vertical distances |y_i - a - b x_i| from the line y = a + bx to the data (x_i, y_i). By adopting matrix notation

    A = [1 1; 1 2; 1 3; 1 4],   x = [a; b],   b = (0, 3, 4, 4)^T,

we have Ax = b and want to find a least squares solution of Ax = b. But the columns of A are linearly independent, and the least squares solution is x_0 = (A^T A)^{-1} A^T b. Now,

    A^T A = [4 10; 10 30],   (A^T A)^{-1} = [3/2 -1/2; -1/2 1/5],   A^T b = [11; 34].

Hence, we have

    x_0 = (A^T A)^{-1} A^T b = [-1/2; 13/10],

that is, the best fitting line is y = -0.5 + 1.3x. □
Problem 5.28 From Newton's second law of motion, a body near the surface of the earth falls vertically downward according to the equation

    s(t) = s_0 + v_0 t + (1/2) g t^2,

where s(t) is the distance that the body has travelled in time t, s_0 and v_0 are the initial displacement and velocity, respectively, of the body, and g is the gravitational acceleration at the earth's surface. Suppose a weight is released, and the distances that the body has fallen from some reference point were measured to be s = -0.18, 0.31, 1.03, 2.48, 3.73 feet at times t = 0.1, 0.2, 0.3, 0.4, 0.5 seconds, respectively. Determine approximate values of s_0, v_0, g using these data.

5.9.3 Orthogonal projection matrices

In Section 5.9.1, we have seen that the orthogonal projection Proj_{C(A)} of the Euclidean space ℝ^m on the column space C(A) of an m × n matrix A plays an important role in finding a least squares solution of an inconsistent system Ax = b. Also, the orthogonal projection Proj_{C(A)} is the main tool in the Gram-Schmidt orthogonalization.
In general, for a given subspace U of ℝ^m, the computation of the orthogonal projection Proj_U of ℝ^m onto U appears quite often in applied science and engineering problems. The least squares solution method can also be used to find the orthogonal projection: indeed, by first taking a basis for U and then making an m × n matrix A with these basis vectors as columns, one clearly gets U = C(A), and so by Theorem 5.26

    Proj_U = Proj_{C(A)} = A(A^T A)^{-1} A^T.

In fact, this projection itself is the orthogonal projection matrix, that is, the matrix representation of Proj_U with respect to the standard basis α. Note that this projection matrix Proj_U is independent of the choice of a basis for U, due to the uniqueness of the matrix representation of a linear transformation with respect to a fixed basis.

Example 5.18 Find the projection matrix P on the plane 2x - y - 3z = 0 in the space ℝ^3, and calculate Pb for b = (1, 0, 1).

Solution: Choose any basis for the plane 2x - y - 3z = 0, say,

    v_1 = (0, 3, -1) and v_2 = (1, 2, 0).

Let A = [0 1; 3 2; -1 0] be the matrix with v_1 and v_2 as columns. Then

    (A^T A)^{-1} = [10 6; 6 5]^{-1} = (1/14) [5 -6; -6 10].

The orthogonal projection matrix P = Proj_{C(A)} is

    P = A(A^T A)^{-1} A^T = (1/14) [10 2 6; 2 13 -3; 6 -3 5],

and

    Pb = (1/14) [10 2 6; 2 13 -3; 6 -3 5] (1, 0, 1)^T = (1/14) (16, -1, 11)^T. □
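As a numerical cross-check of Example 5.18 (a sketch, not part of the text):

```python
import numpy as np

# Basis of the plane 2x - y - 3z = 0, as columns of A.
A = np.array([[0.0, 1.0],
              [3.0, 2.0],
              [-1.0, 0.0]])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # orthogonal projection onto C(A)
b = np.array([1.0, 0.0, 1.0])

print(14 * P)       # compare with [[10, 2, 6], [2, 13, -3], [6, -3, 5]]
print(14 * P @ b)   # compare with [16, -1, 11]
```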

If an orthonormal basis for U is known, then the computation of Proj_U = A(A^T A)^{-1} A^T is easy, as shown in Remark (2) following Theorem 5.26: for an orthonormal basis β = {u_1, u_2, ..., u_n} for U, the orthogonal projection matrix onto the subspace U is given as

    Proj_U = u_1 u_1^T + u_2 u_2^T + ... + u_n u_n^T.

Example 5.19 If A = [c_1 c_2], where c_1 = (1, 0, 0), c_2 = (0, 1, 0), then the column vectors of A are orthonormal, C(A) is the xy-plane, and the projection of b = (x, y, z) ∈ ℝ^3 onto C(A) is b_c = (x, y, 0). In fact,

    Proj_{C(A)} = A A^T = [1 0 0; 0 1 0; 0 0 0],

which is equal to c_1 c_1^T + c_2 c_2^T. □

Note that, if we denote by Proj_{U_i} the orthogonal projection of ℝ^m on the subspace spanned by the basis vector u_i for each i, then its projection matrix is u_i u_i^T, and so

    Proj_U = Proj_{U_1} + Proj_{U_2} + ... + Proj_{U_n},

and

    Proj_{U_i} Proj_{U_j} = 0 if i ≠ j,   Proj_{U_i} Proj_{U_j} = Proj_{U_i} if i = j.

Problem 5.29 Let u = (1/√2, 1/√2) be a vector in ℝ^2, which determines the 1-dimensional subspace U = {au = (a/√2, a/√2) : a ∈ ℝ}. Show that the matrix

    P = u u^T = [1/2 1/2; 1/2 1/2],

considered as a linear transformation on ℝ^2, is an orthogonal projection onto the subspace U.

Problem 5.30 Show that if {v_1, v_2, ..., v_m} is an orthonormal basis for ℝ^m, then v_1 v_1^T + v_2 v_2^T + ... + v_m v_m^T = I_m.
In general, if a basis {c_1, c_2, ..., c_n} for U is given, but not orthonormal, then one has to compute directly

    Proj_U = A(A^T A)^{-1} A^T,

where A = [c_1 c_2 ... c_n] is an m × n matrix whose columns are the given basis vectors c_i's.
Sometimes it is necessary to compute an orthonormal basis from the given basis for U by the Gram-Schmidt orthogonalization. This computation gives us a decomposition of the matrix A into an orthogonal part and an upper triangular part, from which the computation of the projection matrix might be easier.
QR decomposition method: Let {c_1, c_2, ..., c_n} be an arbitrary basis for a subspace U. The Gram-Schmidt orthogonalization process applied to this basis may be written as the following steps:
(1) From the basis {c_1, c_2, ..., c_n}, find an orthogonal basis {q_1, q_2, ..., q_n} for U by

    q_1 = c_1,
    q_2 = c_2 - ((q_1, c_2)/(q_1, q_1)) q_1,
      ...
    q_n = c_n - ((q_{n-1}, c_n)/(q_{n-1}, q_{n-1})) q_{n-1} - ... - ((q_1, c_n)/(q_1, q_1)) q_1.

(2) By normalizing these vectors, u_i = q_i/||q_i||, one obtains an orthonormal basis {u_1, ..., u_n} for U.
(3) By rewriting the equations in (1), one gets

    c_1 = q_1                                        = b_11 u_1,
    c_2 = a_12 q_1 + q_2                             = b_12 u_1 + b_22 u_2,
      ...
    c_n = a_1n q_1 + ... + a_{n-1,n} q_{n-1} + q_n   = b_1n u_1 + ... + b_nn u_n,

where a_ij = (q_i, c_j)/(q_i, q_i) for i < j, a_jj = 1, and

    b_ij = a_ij ||q_i|| = ((q_i, c_j)/(q_i, q_i)) ||q_i|| = (u_i, c_j)

for i ≤ j, which is just the component of c_j in the u_i direction.
(4) Let A = [c_1 c_2 ... c_n]. Then, the equations in (3) can be written in matrix notation as

    A = [c_1 c_2 ... c_n] = [u_1 u_2 ... u_n] [b_11 b_12 ... b_1n; 0 b_22 ... b_2n; ... ; 0 0 ... b_nn] = QR,
where Q = [u_1 u_2 ... u_n] is the m × n matrix whose orthonormal columns are obtained from the c_j's by the Gram-Schmidt orthogonalization, and R is the n × n upper triangular matrix.
(5) Note that rank A = rank Q = n ≤ m, and C(Q) = U = C(A), which is of dimension n in ℝ^m. Moreover, the matrix R is an invertible n × n matrix, since each b_jj = (u_j, c_j) is equal to ||c_j - Proj_{U_{j-1}}(c_j)||, where U_{j-1} is the subspace of ℝ^m spanned by {c_1, c_2, ..., c_{j-1}} (or equivalently, by {u_1, u_2, ..., u_{j-1}}), and so b_jj ≠ 0 for all j because c_j ∉ U_{j-1}.
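The steps (1)-(4) above translate directly into a short classical Gram-Schmidt routine. The following Python sketch (not from the text) builds Q and R for a matrix with linearly independent columns; it uses the formula b_ij = (u_i, c_j) literally and is meant for illustration rather than numerical robustness. The test matrix is the one from Example 5.20 below.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR of an m x n matrix A with rank n."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        q = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # b_ij = (u_i, c_j)
            q -= R[i, j] * Q[:, i]        # remove the component along u_i
        R[j, j] = np.linalg.norm(q)       # b_jj = ||q_j||
        Q[:, j] = q / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))
```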

One of the byproducts of this computation is the following theorem.

Theorem 5.27 Any m × n matrix of rank n can be factored into a product QR, where Q is an m × n matrix with orthonormal columns and R is an n × n invertible upper triangular matrix.

Definition 5.11 The decomposition A = QR is called the QR factorization or the QR decomposition of an m × n matrix A (rank A = n), where the matrix Q = [u_1 u_2 ... u_n] is called the orthogonal part of A, and the matrix R = [b_ij] is called the upper triangular part of A.

Remark: In the QR factorization A = QR, the orthonormality of the column vectors of Q means Q^T Q = I_n, and the j-th column of the matrix R is simply the coordinate vector of c_j with respect to the orthonormal basis β = {u_1, u_2, ..., u_n} for U: i.e.,

    [c_j]_β = (b_1j, b_2j, ..., b_jj, 0, ..., 0)^T,

and so

    R = [[c_1]_β [c_2]_β ... [c_n]_β].

With this QR decomposition of A, the projection matrix and the least squares solution of Ax = b for b ∈ ℝ^m can be calculated easily as

    P = A(A^T A)^{-1} A^T = QR(R^T Q^T QR)^{-1} R^T Q^T = QQ^T,
    x_0 = (A^T A)^{-1} A^T b = (R^T Q^T QR)^{-1} R^T Q^T b = R^{-1} Q^T b.
Corollary 5.28 Let A be an m × n matrix of rank n and let A = QR be its QR factorization. Then,
(1) the projection matrix on the column space of A is [Proj_{C(A)}]_α = QQ^T.
(2) The least squares solution of the system Ax = b is given by x_0 = R^{-1} Q^T b, which can be obtained by applying back substitution to the system Rx = Q^T b.
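Corollary 5.28 suggests a convenient computational route: factor A = QR once and then solve Rx = Q^T b. A hedged NumPy sketch (the right-hand side b is a placeholder; numpy.linalg.qr supplies the orthogonal and upper triangular parts):

```python
import numpy as np

# Placeholder overdetermined system; any full-column-rank A works.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

Q, R = np.linalg.qr(A, mode='reduced')   # A = QR with Q^T Q = I_n

# Solve the triangular system R x = Q^T b (a dedicated triangular solver
# such as scipy.linalg.solve_triangular would exploit the structure).
x0 = np.linalg.solve(R, Q.T @ b)

# Same answer as the normal equation route.
print(np.allclose(x0, np.linalg.solve(A.T @ A, A.T @ b)))
```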

Example 5.20 (QR decomposition of A) Find the QR factorization A = QR and the orthogonal projection matrix P = [Proj_{C(A)}]_α for

    A = [c_1 c_2 c_3] = [1 1 0; 1 0 1; 0 1 1; 0 0 1].

Solution: We first find the decomposition of A into Q and R, the orthogonal part and the upper triangular part. Use the Gram-Schmidt orthogonalization to get the column vectors of Q:

    q_1 = c_1 = (1, 1, 0, 0),
    q_2 = c_2 - ((c_2 · q_1)/(q_1 · q_1)) q_1 = (1/2, -1/2, 1, 0),
    q_3 = c_3 - ((c_3 · q_2)/(q_2 · q_2)) q_2 - ((c_3 · q_1)/(q_1 · q_1)) q_1 = (-2/3, 2/3, 2/3, 1),

and ||q_1|| = √2, ||q_2|| = √(3/2), ||q_3|| = √(7/3). Hence,

    u_1 = q_1/||q_1|| = (1/√2, 1/√2, 0, 0),
    u_2 = q_2/||q_2|| = (1/√6, -1/√6, √2/√3, 0),
    u_3 = q_3/||q_3|| = (-2/√21, 2/√21, 2/√21, √3/√7).

Then c_1 = √2 u_1, c_2 = (1/√2) u_1 + √(3/2) u_2, c_3 = (1/√2) u_1 + (1/√6) u_2 + √(7/3) u_3. In fact, these equations can also be derived from A = QR with an upper triangular matrix R (it gives that the (i, j)-entry of R is b_ij = (u_i, c_j)). Therefore,

    A = [u_1 u_2 u_3] R
      = [1/√2 1/√6 -2/√21; 1/√2 -1/√6 2/√21; 0 √2/√3 2/√21; 0 0 √3/√7] [√2 1/√2 1/√2; 0 √3/√2 1/√6; 0 0 √7/√3] = QR,

and

    P = QQ^T = (1/7) [6 1 1 -2; 1 6 -1 2; 1 -1 6 2; -2 2 2 3]. □

Problem 5.31 Find the 2 × 2 matrix P that projects the xy-plane onto the line y = x.

Problem 5.32 Find the projection matrix P of the Euclidean 3-space ℝ^3 onto the column space C(A) of a given matrix A.

Problem 5.33 Find the projection matrix P on the x_1, x_2, x_4 coordinate subspace of the Euclidean 4-space ℝ^4.

Problem 5.34 Find the QR factorization of a given 2 × 2 matrix whose entries involve cos θ and sin θ.

As the last part of this section, we introduce a characterization of the orthogonal projection matrices.

Theorem 5.29 A square matrix P is an orthogonal projection matrix if and only if it is symmetric and idempotent, i.e., P^T = P and P^2 = P.

Proof: Let P be an orthogonal projection matrix. Then the matrix P can be written as P = A(A^T A)^{-1} A^T for a matrix A whose column vectors form a basis for the column space of P. It gives

    P^T = (A(A^T A)^{-1} A^T)^T = A((A^T A)^{-1})^T A^T = A(A^T A)^{-1} A^T = P,
    P^2 = PP = (A(A^T A)^{-1} A^T)(A(A^T A)^{-1} A^T) = A(A^T A)^{-1} A^T = P.

In fact, this second equation was already shown in Theorem 5.9.
Conversely, by Theorem 5.16, one has the orthogonal decomposition ℝ^m = C(P) ⊕ N(P^T). But N(P^T) = N(P) since P^T = P. Thus, for any u + v ∈ C(P) ⊕ N(P^T) = ℝ^m, P(u + v) = Pu + Pv = Pu = u, because P^2 = P implies Pu = u for u ∈ C(P). It shows that P is an orthogonal projection matrix. (Alternatively, one can use Theorem 5.9 directly.) □

From Corollary 5.10, if P is a projection matrix on C(P), then I - P is a projection matrix on the null space N(P) (= C(I - P)), which is orthogonal to C(P) (= N(I - P)).

Example 5.21 Let P_i : ℝ^m → ℝ^m be defined by

    P_i(x_1, ..., x_m) = (0, ..., 0, x_i, 0, ..., 0),

for i = 1, 2, ..., m. Then, each P_i is the projection of ℝ^m onto the i-th axis, whose matrix form is the diagonal matrix with a single 1 in the (i, i) position and zeros elsewhere; accordingly, I - P_i is the diagonal matrix with 0 in the (i, i) position and 1's elsewhere. When we restrict the image to ℝ, P_i is an element of the dual space (ℝ^m)*, usually denoted by x_i as the i-th coordinate function (see Example 4.25). □
Problem 5.35 Show that any square matrix P that satisfies P^T P = P is a projection matrix.

5.10 Exercises
5.1. Decide which of the following functions on ℝ^2 are inner products and which are not. For x = (x_1, x_2), y = (y_1, y_2) in ℝ^2:
(1) (x, y) = x_1 y_1 x_2 y_2,
(2) (x, y) = 4x_1 y_1 + 4x_2 y_2 - x_1 y_2 - x_2 y_1,
(3) (x, y) = x_1 y_2 - x_2 y_1,
(4) (x, y) = x_1 y_1 + 3x_2 y_2,
(5) (x, y) = x_1 y_1 - x_1 y_2 - x_2 y_1 + 3x_2 y_2.
5.2. Show that the function (A, B) = tr(A^T B) for A, B ∈ M_{n×n}(ℝ) defines an inner product on M_{n×n}(ℝ).
5.3. Find the angle between the vectors (4, 7, 9, 1, 3) and (2, 1, 1, 6, 8) in ℝ^5.
5.4. Determine the values of k so that the given pairs of vectors are orthogonal with respect to the Euclidean inner product in ℝ^4.

5.5. Consider the space C[0, 1] with the inner product defined by

    (f, g) = ∫_0^1 f(x)g(x) dx.

Compute the length of each vector and the cosine of the angle between each pair of vectors in each of the following:
(1) f(x) = 1, g(x) = x;
(2) f(x) = x^m, g(x) = x^n, where m, n are positive integers;
(3) f(x) = sin mπx, g(x) = cos nπx, where m, n are positive integers.
5.6. Prove that

    (a_1 + a_2 + ... + a_n)^2 ≤ n(a_1^2 + a_2^2 + ... + a_n^2)

for any real numbers a_1, a_2, ..., a_n. When does equality hold?
5.7. Let V = P_2([0, 1]) be the vector space of polynomials of degree ≤ 2 on [0, 1] equipped with the inner product

    (f, g) = ∫_0^1 f(t)g(t) dt.

(1) Compute (f, g) and ||f|| for f(x) = x + 2 and g(x) = x^2 - 2x - 3.
(2) Find the orthogonal complement of the subspace of scalar polynomials.
5.8. Find an orthonormal basis for the Euclidean 3-space ℝ^3 by applying the Gram-Schmidt orthogonalization to the three vectors x_1 = (1, 0, 1), x_2 = (1, 0, -1), x_3 = (0, 3, 4).
5.9. Let W be the subspace of the Euclidean space ℝ^3 spanned by the vectors v_1 = (1, 1, 2) and v_2 = (1, 1, -1). Find Proj_W(b) for b = (1, 3, -2).
5.10. Show that if u is orthogonal to v, then every scalar multiple of u is also orthogonal to v. Find a unit vector orthogonal to v_1 = (1, 1, 2) and v_2 = (0, 1, 3) in the Euclidean 3-space ℝ^3.
5.11. Determine the orthogonal projection of v_1 onto v_2 for the following vectors in the n-space ℝ^n with the Euclidean inner product.
(1) v_1 = (1, 2, 3), v_2 = (1, 1, 2),
(2) v_1 = (1, 2, 1), v_2 = (2, 1, -1),
(3) v_1 = (1, 0, 1, 0), v_2 = (0, 2, 2, 0).
5.12. Let S = {v_i}, where the v_i's are given below. For each S, find a basis for S^⊥ with respect to the Euclidean inner product on ℝ^n.
(1) v_1 = (0, 1, 0), v_2 = (0, 0, 1),
(2) v_1 = (1, 1, 0), v_2 = (1, 1, 1),
(3) v_1 = (1, 0, 1, 2), v_2 = (1, 1, 1, 1), v_3 = (2, 2, 0, 1).
5.13. Which of the following matrices are orthogonal?
(1) [1/2 -1/3; -1/2 1/3],
(2) [4/5 -3/5; -3/5 4/5],
(3) [1/√2 -1/√2; -1/√2 1/√2],
(4) [1/√2 1/√3 -1/√6; 1/√2 -1/√3 1/√6; 0 1/√3 2/√6].
5.14. Let W be the subspace of the Euclidean 4-space ℝ^4 consisting of all vectors that are orthogonal to both x = (1, 0, -1, 1) and y = (2, 3, -1, 2). Find a basis for the subspace W.
5.15. Let V be an inner product space. For vectors x and y in V, establish the following identities:
(1) (x, y) = (1/4)||x + y||^2 - (1/4)||x - y||^2 (polarization identity),
(2) (x, y) = (1/2)(||x + y||^2 - ||x||^2 - ||y||^2) (polarization identity),
(3) ||x + y||^2 + ||x - y||^2 = 2(||x||^2 + ||y||^2) (parallelogram equality).
5.16. Show that x + y is perpendicular to x - y if and only if ||x|| = ||y||.

Figure 5.6. n-dimensional parallelepiped P(A)

5.17. Let A be the m × n matrix whose columns are c_1, c_2, ..., c_n in the Euclidean m-space ℝ^m. Prove that the volume of the n-dimensional parallelepiped P(A) determined by those vectors c_j's in ℝ^m is given by

    vol(A) = √(det(A^T A)).

(Note that the volume of the n-dimensional parallelepiped determined by the vectors c_1, c_2, ..., c_n in ℝ^m is by definition the product of the volume of the (n-1)-dimensional parallelepiped (base) determined by c_2, ..., c_n and the height of c_1 from the plane W which is spanned by c_2, ..., c_n. Here, the height is the length of the vector c = c_1 - Proj_W(c_1), which is orthogonal to W. (See Figure 5.6.) If the vectors are linearly dependent, then the parallelepiped is degenerate, i.e., it is contained in a subspace of dimension less than n.)
5.18. Find the volume of the three-dimensional tetrahedron in the Euclidean 4-space ℝ^4 whose vertices are at (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 2, 2) and (0, 0, 1, 2).
5.19. For an orthogonal matrix A, show that det A = ±1. Give an example of an orthogonal matrix A for which det A = -1.
5.20. Find orthonormal bases for the row space and the null space of each of the following matrices:
(1) [2 4 3; 2 1 1; 0 0 1],   (2) [1 4 0; -2 -3 1; 0 0 2].
5.21. Let A be an m × n matrix of rank r. Find a relation among m, n and r so that Ax = b has infinitely many solutions for every b ∈ ℝ^m.
5.22. Find the equation of the straight line that fits best the data of the four points (0, 1), (1, 3),
(2, 4), and (3, 4).
5.23. Find the cubic polynomial that fits best the data of the five points
(-1 , -14), (0, -5), (1, -4), (2, 1), and (3, 22).
5.24. Let W be the subspace of the Euclidean 4-space ℝ^4 spanned by the vectors x_j's given in each of the following problems. Find the projection matrix P for the subspace W and the null space N(P) of P. Compute Pb for b given in each problem.
(1) x_1 = (1, 1, 1, 1), x_2 = (1, -1, 1, -1), x_3 = (-1, 1, 1, 0), and b = (1, 2, 1, 1).
(2) x_1 = (0, -2, 2, 1), x_2 = (2, 0, -1, 2), and b = (1, 1, 1, 1).
(3) x_1 = (2, 0, 3, -6), x_2 = (-3, 6, 8, 0), and b = (-1, 2, -1, 1).
5.25. Find the matrix for the orthogonal projection from the Euclidean 3-space ℝ^3 to the plane spanned by the vectors (1, 1, 1) and (1, 0, 2).
5.26. Find the projection matrix for the row space and the null space of each of the following matrices:
(1) [1/√5 -2/√5; 2/√5 1/√5],   (2) [2 4 1; 1 1 1; 2 3 -1],   (3) [1 4 0; 0 0 2].
5.27. Consider the space C[-1, 1] with the inner product defined by

    (f, g) = ∫_{-1}^{1} f(x)g(x) dx.

A function f ∈ C[-1, 1] is even if f(-x) = f(x), or odd if f(-x) = -f(x). Let U and V be the sets of all even functions and odd functions in C[-1, 1], respectively.
(1) Prove that U and V are subspaces and C[-1, 1] = U + V.
(2) Prove that U ⊥ V.
(3) Prove that for any f ∈ C[-1, 1], ||f||^2 = ||h||^2 + ||g||^2, where f = h + g ∈ U ⊕ V.
5.28. Determine whether the following statements are true or false, in general, and justify your answers.
(1) An inner product can be defined on any vector space.
(2) Two nonzero vectors x and y in an inner product space are linearly independent if and only if the angle between x and y is not zero.
(3) If V is perpendicular to W, then V^⊥ is perpendicular to W^⊥.
(4) Let V be an inner product space. Then ||x - y|| ≥ ||x|| - ||y|| for any vectors x and y in V.
(5) Every permutation matrix is an orthogonal matrix.
(6) For any n × n symmetric matrix A, x^T Ay defines an inner product on ℝ^n.
(7) A square matrix A is a projection matrix if and only if A^2 = I.
(8) For a linear transformation T : ℝ^n → ℝ^n, T is an orthogonal projection if and only if id_{ℝ^n} - T is an orthogonal projection.
(9) For any m × n matrix A, the row space R(A) and the column space C(A) are orthogonal.
(10) A linear transformation T is an isomorphism if and only if it is an isometry.
(11) For any m × n matrix A and b ∈ ℝ^m, A^T Ax = A^T b always has a solution.
(12) The least squares solution of Ax = b is unique for any matrix A.
(13) The least squares solution of Ax = b is the orthogonal projection of b on the column space of A.
6
Diagonalization

6.1 Eigenvalues and eigenvectors

Gaussian elimination plays a fundamental role in solving a system Ax = b of linear equations. In general, instead of solving the given system, one could try to solve the normal equation A^T Ax = A^T b, whose solutions are the true solutions or the least squares solutions depending on whether or not the given system is consistent. Note that the matrix A^T A is a symmetric square matrix, and so one may assume that the matrix in the system is a square matrix. For this kind of reason, we focus on square matrices, or linear transformations from a vector space to itself, throughout this chapter.
Recall that a square matrix A, as a linear transformation on ℝ^n, may have various matrix representations depending on the choice of the bases, which are all in similar relations. In particular, A itself is the matrix representation with respect to the standard basis. One may now ask whether or not there exists a basis β with respect to which the matrix representation [A]_β of A is diagonal. But then A and a diagonal matrix D = [A]_β are similar: i.e., there is an invertible matrix Q such that D = Q^{-1} A Q.
In this chapter, we will see which matrices can have diagonal matrix representations and how one can find such representations. For this we introduce eigenvalues and eigenvectors, which play important roles in their own right in mathematics and have far-reaching applications not only in mathematics, but also in other fields of science and engineering. Some specific applications of diagonalization of a square matrix A are to
(1) solving a system Ax = b of linear equations,
(2) checking the invertibility of A or estimating det A,
(3) calculating a power A^n or the limit of a matrix series Σ_{n=1}^{∞} A^n,
(4) solving systems of linear differential equations or difference equations,
(5) finding a simple form of the matrix representation of a linear transformation, etc.
One might notice that some of these problems are easy if A is diagonal.

Definition 6.1 Let A be an n × n square matrix. A nonzero vector x in the n-space ℝ^n is called an eigenvector (or characteristic vector) of A if there is a scalar λ in ℝ such that

    Ax = λx.

The scalar λ is called an eigenvalue (or characteristic value) of A, and we say x belongs to λ.

Geometrically, an eigenvector of a matrix A is a nonzero vector x in the n-space ℝ^n such that the vectors x and Ax are parallel. In other words, the subspace W spanned by x is invariant under the linear transformation A : ℝ^n → ℝ^n in the sense A(W) ⊆ W. Algebraically, an eigenvector x is a nontrivial solution of the homogeneous system (λI - A)x = 0 of linear equations, that is, an eigenvector x is a nonzero vector in the null space N(λI - A).
There are two unknowns in the system (λI - A)x = 0: an eigenvalue λ and an eigenvector x. To find those unknowns, first we should determine an eigenvalue λ by using the fact that the equation (λI - A)x = 0 has a nontrivial solution x if and only if λ satisfies the equation

    det(λI - A) = 0,

called the characteristic equation of A. Note that det(λI - A) is a polynomial of degree n in λ, and it is called the characteristic polynomial of A. Thus, the eigenvalues are just the roots of the characteristic equation det(λI - A) = 0.
Next, the eigenvectors of A can be determined by solving the homogeneous system (λI - A)x = 0 for each eigenvalue λ. In summary, by referring to Theorem 3.26 we have the following theorem.

Theorem 6.1 For any square matrix A, the following are equivalent:
(1) λ is an eigenvalue of A;
(2) det(λI - A) = 0 (or det(A - λI) = 0);
(3) λI - A is singular;
(4) the homogeneous system (λI - A)x = 0 has a nontrivial solution.

Recall that the eigenvectors of A belonging to an eigenvalue λ are just the nonzero vectors x in the null space N(λI - A). This null space is called the eigenspace of A belonging to λ, and is denoted by E(λ).

Example 6.1 (Matrix having distinct eigenvalues) Find the eigenvalues and eigenvectors of

    A = [2 √2; √2 1].

Solution: The characteristic polynomial is

    det(λI - A) = det [λ-2 -√2; -√2 λ-1] = λ^2 - 3λ = λ(λ - 3).

Thus the eigenvalues are λ_1 = 0 and λ_2 = 3. To determine the eigenvectors belonging to the λ_i's, we should solve the homogeneous system of equations (λ_i I - A)x = 0. Let us take λ_1 = 0 first; then the system of equations (λ_1 I - A)x = 0 becomes

    -2x_1 - √2 x_2 = 0,
    -√2 x_1 - x_2 = 0,      or  x_2 = -√2 x_1.

Hence, x_1 = (x_1, x_2) = (-1, √2) is an eigenvector belonging to λ_1 = 0, and E(0) = {t x_1 : t ∈ ℝ}. (Here, one can take any nonzero solution (x_1, x_2) as an eigenvector x_1 belonging to λ_1 = 0.)
For λ_2 = 3, the system of equations (λ_2 I - A)x = 0 becomes

    x_1 - √2 x_2 = 0,
    -√2 x_1 + 2 x_2 = 0,    or  x_1 = √2 x_2.

Thus, by a similar calculation, x_2 = (√2, 1) is one of the eigenvectors belonging to λ_2 = 3 and E(3) = {t x_2 : t ∈ ℝ}. Note that the eigenvectors x_1 and x_2 belonging to the eigenvalues λ_1 and λ_2, respectively, are linearly independent. □
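For a quick numerical check of Example 6.1 (a sketch outside the text), numpy.linalg.eig returns the same eigenvalues, with eigenvectors scaled to unit length, so they are scalar multiples of the ones found above:

```python
import numpy as np

A = np.array([[2.0, np.sqrt(2.0)],
              [np.sqrt(2.0), 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # approximately 3 and 0 (order may vary)
print(eigenvectors)   # columns are unit eigenvectors, multiples of (√2, 1) and (-1, √2)
```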

Example 6.2 (Matrix having a repeated eigenvalue but full eigenvectors) Find a basis for the eigenspaces of

    A = [3 -2 0; -2 3 0; 0 0 5].

Solution: The characteristic polynomial of A is (λ - 1)(λ - 5)^2, so that the eigenvalues of A are λ_1 = 1 and λ_2 = 5 with multiplicity 2. Thus, there are two eigenspaces of A. By definition, x = (x_1, x_2, x_3) is an eigenvector of A belonging to λ if and only if x is a nontrivial solution of the homogeneous system (λI - A)x = 0:

    [λ-3 2 0; 2 λ-3 0; 0 0 λ-5] [x_1; x_2; x_3] = [0; 0; 0].

If λ_1 = 1, then the system becomes

    -2x_1 + 2x_2 = 0,   2x_1 - 2x_2 = 0,   -4x_3 = 0.

Solving this system yields x_1 = t, x_2 = t, x_3 = 0 for t ∈ ℝ. Thus, the eigenvectors belonging to λ_1 = 1 are nonzero vectors of the form

    (x_1, x_2, x_3) = t(1, 1, 0),

so that (1, 1, 0) is a basis for the eigenspace E(λ_1) belonging to λ_1 = 1.
If λ_2 = 5, then the system becomes

    2x_1 + 2x_2 = 0,   2x_1 + 2x_2 = 0,   0 · x_3 = 0.

Solving this system yields x_1 = -s, x_2 = s, x_3 = t for s, t ∈ ℝ. Thus, the eigenvectors of A belonging to λ_2 = 5 are nonzero vectors of the form

    (x_1, x_2, x_3) = s(-1, 1, 0) + t(0, 0, 1)

for s, t ∈ ℝ. Since (-1, 1, 0) and (0, 0, 1) are linearly independent, they form a basis for the eigenspace E(λ_2) belonging to λ_2 = 5. □

For each eigenvalue λ of A in Examples 6.1 and 6.2, one can see that the dimension of the eigenspace E(λ) is equal to the multiplicity of λ as a root of the equation det(λI - A) = 0. But in general this is not true, as the next example shows.

Example 6.3 (Matrix having a repeated eigenvalue with insufficient eigenvectors) Consider the 5 × 5 matrix

    A = [2 1 0 0 0; 0 2 1 0 0; 0 0 2 1 0; 0 0 0 2 1; 0 0 0 0 2].

A simple computation shows that the characteristic polynomial of the matrix A is (λ - 2)^5, so that the eigenvalue λ = 2 is of multiplicity 5. However, there is only one linearly independent eigenvector e_1 = (1, 0, 0, 0, 0) belonging to λ = 2, because rank(2I - A) = 4, which shows that dim E(λ) = dim N(2I - A) = 1 is less than the multiplicity of λ. This kind of matrix will be discussed later in Chapter 8. □
Note that the equation det(λI - A) = 0 may have complex roots, which are called complex eigenvalues. However, the complex numbers are not scalars of the real vector space. In many cases, it is necessary to deal with those complex numbers, that is, we need to expand the set of scalars to the set of complex numbers. This expansion of the set of scalars to the set of complex numbers leads us to work with complex vector spaces, which will be treated in Chapter 7. In this chapter, we restrict our discussion to the case of real eigenvalues, even though the entire discussion in this chapter applies in the same way to complex vector spaces.

Example 6.4 (Matrix having complex eigenvalues) The characteristic polynomial of the matrix

    A = [cos θ  -sin θ; sin θ  cos θ]

is λ^2 - 2 cos θ λ + (cos^2 θ + sin^2 θ). Thus, the eigenvalues are λ = cos θ ± i sin θ, which are complex numbers, so this matrix, as a rotation of ℝ^2, has no real eigenvalues unless θ = nπ, n = 0, ±1, ±2, .... □
6.1. Eigenvalues and eigenvectors 205

Problem 6.1 Let λ be an eigenvalue of A and let x be an eigenvector belonging to λ. Use mathematical induction to show that λ^m is an eigenvalue of A^m and x is an eigenvector of A^m belonging to λ^m for each m = 1, 2, ....

In the following, we derive some basic properties of the eigenvalues and eigenvectors.

Lemma 6.2 (1) If A is a triangular matrix, then the diagonal entries are exactly the eigenvalues of A.
(2) If A and B are square matrices similar to each other, then they have the same characteristic polynomial.

Proof: (1) The characteristic equation of an upper triangular matrix is

    det(λI - A) = (λ - a_11)(λ - a_22) ··· (λ - a_nn) = 0.

(2) Since there exists a nonsingular matrix Q such that B = Q^{-1} A Q,

    det(λI - B) = det(Q^{-1}(λI)Q - Q^{-1} A Q)
                = det(Q^{-1}(λI - A)Q)
                = det Q^{-1} det(λI - A) det Q
                = det(λI - A). □

Lemma 6.2(2) says that similar matrices have the same eigenvalues, i.e., the eigenvalues are invariant under similarity. However, their eigenvectors might be different: in fact, x is an eigenvector of B belonging to λ if and only if Qx is an eigenvector of A belonging to λ, since AQ = QB and AQx = QBx = λQx.

Theorem 6.3 Let an n × n matrix A have n eigenvalues λ_1, λ_2, ..., λ_n, possibly with repetition. Then,
(1) det A = λ_1 λ_2 ··· λ_n (the product of the n eigenvalues),
(2) tr(A) = λ_1 + λ_2 + ... + λ_n (the sum of the n eigenvalues).

Proof: (1) Since the eigenvalues λ_1, λ_2, ..., λ_n are the zeros of the characteristic polynomial of A, we have

    det(λI - A) = (λ - λ_1)(λ - λ_2) ··· (λ - λ_n).

If we take λ = 0 on both sides, then we get det(-A) = (-1)^n λ_1 λ_2 ··· λ_n, that is, det A = λ_1 λ_2 ··· λ_n.
(2) On the other hand,

    (λ - λ_1)(λ - λ_2) ··· (λ - λ_n) = det(λI - A)
      = det [λ-a_11 -a_12 ... -a_1n; -a_21 λ-a_22 ... -a_2n; ... ; -a_n1 -a_n2 ... λ-a_nn],

which is a polynomial of the form p(λ) = λ^n + c_{n-1} λ^{n-1} + ... + c_1 λ + c_0 in λ. One can compute the coefficient c_{n-1} of λ^{n-1} in two ways by expanding both sides, and get λ_1 + λ_2 + ... + λ_n = a_11 + a_22 + ... + a_nn = tr(A). □

Problem 6.2 Show that
(1) for any 2 × 2 matrix A, det(A - λI) = λ^2 - tr(A)λ + det A;
(2) for any 3 × 3 matrix A,

    det(A - λI) = -λ^3 + tr(A)λ^2 + (1/2) Σ_{i≠j} (a_ij a_ji - a_ii a_jj) λ + det A.

In Theorem 6.3, we assume that the matrix A has n (real) eigenvalues counting multiplicities. But, by allowing the scalars to be complex numbers, which will be done in the next chapter, every n × n matrix has n eigenvalues counting multiplicities, so that Theorem 6.3 remains true for any square matrix.

Corollary 6.4 The determinant and the trace of A are invariant under similarity.

Recall that any square matrix A is singular if and only if det A = 0. However, det A is the product of its n eigenvalues. Thus a square matrix A is singular if and only if zero is an eigenvalue of A, or A is invertible if and only if zero is not an eigenvalue of A.
The following corollaries are easy consequences of this fact.

Corollary 6.5 For any n × n matrices A and B, the following are equivalent.
(1) Zero is an eigenvalue of AB.
(2) A or B is singular.
(3) Zero is an eigenvalue of BA.

Corollary 6.6 For any n × n matrices A and B, the matrices AB and BA have the same eigenvalues.

Proof: By Corollary 6.5, zero is an eigenvalue of AB if and only if it is an eigenvalue of BA. Let λ be a nonzero eigenvalue of AB with (AB)x = λx for a nonzero vector x. Then the vector Bx is not zero, since λ ≠ 0, but

    (BA)(Bx) = B(AB)x = B(λx) = λ(Bx).

This means that Bx is an eigenvector of BA belonging to the eigenvalue λ, and λ is an eigenvalue of BA. Similarly, any nonzero eigenvalue of BA is also an eigenvalue of AB. □

Problem 6.3 Find matrices A and B such that det A = det B and tr(A) = tr(B), but A is not similar to B.

Problem 6.4 Show that A and A^T have the same eigenvalues. Do they necessarily have the same eigenvectors?

Problem 6.5 Let λ_1, λ_2, ..., λ_n be the eigenvalues of an n × n matrix A. Then
(1) A is invertible if and only if λ_i ≠ 0 for all i = 1, 2, ..., n.
(2) If A is invertible, then the inverse A^{-1} has eigenvalues 1/λ_1, 1/λ_2, ..., 1/λ_n.

Problem 6.6 For any n × n matrices A and B, show that AB and BA are similar if A or B is nonsingular. Is it true for two singular matrices A and B?

6.2 Diagonalization of matrices

In this section, we are going to show what kinds of square matrices are similar to diagonal matrices. That is, given a square matrix A, we want to know whether there exists an invertible matrix Q such that Q^{-1} A Q is a diagonal matrix, and if so, how one can find such a matrix Q.

Definition 6.2 A square matrix A is said to be diagonalizable if there exists an invertible matrix Q such that Q^{-1} A Q is a diagonal matrix (i.e., A is similar to a diagonal matrix).

If a square matrix A is diagonalizable, then the similarity D = Q^{-1} A Q gives an easy way to solve some problems related to the matrix A, like (1)-(5) listed in Section 6.1. For instance, let Ax = b be a system of linear equations with a square matrix A, and suppose that there is an invertible matrix Q such that Q^{-1} A Q is a diagonal matrix D. Then the system Ax = b can be written as QDQ^{-1} x = b, or equivalently DQ^{-1} x = Q^{-1} b. Hence, for c = Q^{-1} b, the solution y of Dy = c yields the solution x = Qy of the system Ax = b. Note that Dy = c can be solved easily.
The next theorem characterizes a diagonalizable matrix, and the proof shows a practical way of diagonalizing a matrix.

Theorem 6.7 Let A be an n × n matrix. Then A is diagonalizable if and only if A has n linearly independent eigenvectors.

Proof: (⇒) Suppose A is diagonalizable. Then there is an invertible matrix Q such that Q^{-1} A Q is a diagonal matrix D, say

    Q^{-1} A Q = D = diag(λ_1, λ_2, ..., λ_n),

or, equivalently, AQ = QD. Let x_1, ..., x_n denote the column vectors of Q. Since

    AQ = [Ax_1 Ax_2 ... Ax_n],   QD = [λ_1 x_1 λ_2 x_2 ... λ_n x_n],

the matrix equation AQ = QD implies Ax_i = λ_i x_i for i = 1, ..., n. Moreover, since Q is invertible, its column vectors are nonzero and linearly independent; that is, the x_i's are n linearly independent eigenvectors of A.
(⇐) Assume that A has n linearly independent eigenvectors x_1, ..., x_n belonging to the eigenvalues λ_1, ..., λ_n, respectively, so that Ax_i = λ_i x_i for i = 1, ..., n. If we define a matrix Q as

    Q = [x_1 x_2 ... x_n]

with x_j as the j-th column vector, then the same equation shows AQ = QD, where D is the diagonal matrix having the eigenvalues λ_1, ..., λ_n on the diagonal. Since the column vectors of Q are assumed to be linearly independent, Q is invertible, so Q^{-1} A Q = D. □

Remark: (1) The proof of Theorem 6.7 reveals how to diagonalize an n × n matrix A.
Step 1: Find n linearly independent eigenvectors x_1, x_2, ..., x_n of A.
Step 2: Form the matrix Q = [x_1 x_2 ... x_n].
Step 3: The matrix Q^{-1} A Q will be a diagonal matrix with λ_1, ..., λ_n as its successive diagonal entries, where λ_j is the eigenvalue associated with the eigenvector x_j, j = 1, 2, ..., n.

(2) Let α denote the standard basis for ℝ^n and let β = {x_1, x_2, ..., x_n} be the basis for ℝ^n consisting of n linearly independent eigenvectors of A. Then the matrix

    Q = [x_1 x_2 ... x_n] = [[x_1]_α [x_2]_α ... [x_n]_α]

is the basis-change matrix from β to α, and the matrix representation of A, as a linear transformation, with respect to β is

    [A]_β = Q^{-1} [A]_α Q = Q^{-1} A Q = diag(λ_1, ..., λ_n).

Note that the diagonal entries λ_i's are the eigenvalues of A.


(3) Not all matrices are diagonalizable. A standard example is A = [0 1; 0 0]. Its eigenvalues are λ_1 = λ_2 = 0. Hence, if A were diagonalizable, then Q^{-1} A Q = 0 for some invertible matrix Q, and then A must be the zero matrix. Since A is not the zero matrix, no invertible matrix Q can be obtained so that Q^{-1} A Q is diagonal.

Example 6.5 (Several different types of a diagonalization) Diagonalize the matrix

    A = [1 -3 3; 0 -5 6; 0 -3 4].

Solution: A direct calculation gives that the eigenvalues of A are λ_1 = 1, λ_2 = 1 and λ_3 = -2, and their associated eigenvectors are

    x_1 = (1, 0, 0),  x_2 = (0, 1, 1)  and  x_3 = (1, 2, 1),

respectively. They are linearly independent; the first two vectors x_1, x_2 form a basis for the eigenspace E(1) belonging to λ_1 = λ_2 = 1, and x_3 forms a basis for the eigenspace E(-2) belonging to λ_3 = -2. Thus, the matrix

    P = [1 0 1; 0 1 2; 0 1 1]

diagonalizes A. In fact, one can verify that

    P^{-1} A P = [1 0 0; 0 1 0; 0 0 -2].

What would happen if one chose different eigenvectors belonging to the eigenvalues 1 and -2? According to the proof of Theorem 6.7, nothing would happen: any matrix whose columns are linearly independent eigenvectors will diagonalize A. For example, {(-1, 0, 0), (0, -1, -1)} is another basis for E(1), and {(2, 4, 2)} is also a basis for E(-2). The matrix

    Q = [-1 0 2; 0 -1 4; 0 -1 2]

also diagonalizes A, as Q^{-1} A Q = [1 0 0; 0 1 0; 0 0 -2].
A change in the order of the eigenvectors in constructing a basis-change matrix Q does not change the diagonalizability of A, but the eigenvalues appearing on the main diagonal of the resulting diagonal matrix will appear in accordance with the order of the eigenvectors in the basis-change matrix. For example, let

    S = [1 1 0; 0 2 1; 0 1 1].

Then S will diagonalize A, because it has linearly independent eigenvectors as columns. In fact, one can show that

    S^{-1} = [1 -1 1; 0 1 -1; 0 -1 2]   and   S^{-1} A S = [1 0 0; 0 -2 0; 0 0 1]. □
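As a numerical sanity check of Example 6.5 (a sketch, not from the text), one can verify P^{-1} A P directly with NumPy:

```python
import numpy as np

A = np.array([[1.0, -3.0, 3.0],
              [0.0, -5.0, 6.0],
              [0.0, -3.0, 4.0]])

# Columns are the eigenvectors x1, x2, x3 from Example 6.5.
P = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0]])

D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))   # diag(1, 1, -2)
```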

Problem 6.7 Show that the following matrices are not diagonalizable, where λ is any scalar:

    (1) A = [λ 1 0; 0 λ 1; 0 0 λ],    (2) B = [λ 0 0; 1 λ 0; 0 1 λ].

Problem 6.8 Construct a 2 × 2 matrix A whose eigenvalues are 2 and 3, and whose eigenvectors are (2, 1) and (3, 2), respectively.

From Theorem 6.7, we learn how to diagonalize a matrix and what the diagonal matrix is when the matrix has a full set of linearly independent eigenvectors. The next question is when a square matrix A can have a full set of linearly independent eigenvectors. The following theorem shows that it can happen if an n × n matrix has n distinct (real) eigenvalues.

Theorem 6.8 Let λ_1, λ_2, ..., λ_k be distinct eigenvalues of a matrix A and x_1, x_2, ..., x_k eigenvectors belonging to them, respectively. Then {x_1, x_2, ..., x_k} is linearly independent.

Proof: Let r be the largest integer such that {x_1, ..., x_r} is linearly independent. If r = k, then there is nothing to prove. Suppose not, i.e., 1 ≤ r < k. Then
{x_1, ..., x_{r+1}} is linearly dependent. Thus, there exist scalars c_1, c_2, ..., c_{r+1} with c_{r+1} ≠ 0 such that

    c_1 x_1 + c_2 x_2 + ... + c_{r+1} x_{r+1} = 0.    (1)

Multiplying both sides by A and using Ax_i = λ_i x_i, one can get

    c_1 λ_1 x_1 + c_2 λ_2 x_2 + ... + c_{r+1} λ_{r+1} x_{r+1} = 0.    (2)

Multiplying both sides of (1) by λ_{r+1} and subtracting the resulting equation from (2) yields

    c_1(λ_1 - λ_{r+1}) x_1 + c_2(λ_2 - λ_{r+1}) x_2 + ... + c_r(λ_r - λ_{r+1}) x_r = 0.

Since {x_1, x_2, ..., x_r} is linearly independent and λ_1, λ_2, ..., λ_{r+1} are all distinct, it follows that c_1 = c_2 = ... = c_r = 0. Substituting these values in (1) yields c_{r+1} = 0, which is a contradiction to the assumption. □

As a consequence of Theorems 6.7 and 6.8, we obtain the following.

Theorem 6.9 If an n × n matrix A has n distinct eigenvalues, then A is diagonalizable.

It follows from Theorem 6.9 that if x_1, x_2, ..., x_n are eigenvectors of an n × n matrix A belonging to n distinct eigenvalues λ_1, λ_2, ..., λ_n, respectively, then they form a basis for ℝ^n, and the matrix representation of A with respect to this basis should be a diagonal matrix, as shown in Remark (2) after Theorem 6.7.
Of course, some matrices can have eigenvalues with multiplicities greater than 1, so that the number of distinct eigenvalues is strictly less than n. In this case, if such a matrix still has n linearly independent eigenvectors, then it is also diagonalizable, because for a diagonalization all we need is n linearly independent eigenvectors (see Example 6.2, or try with the identity matrix I_n). In some cases, such a matrix does not have n linearly independent eigenvectors (see Example 6.3), so a diagonalization is impossible. This case will be discussed in Chapter 8.
The next example shows a simple application of the diagonalization to the computation of the power A^n of a matrix A.
Example 6.6 Compute A^100 for A = [1 4; 3 2].

Solution: Its eigenvalues are 5 and -2 with associated eigenvectors (1, 1) and (-4, 3), respectively. Hence Q = [1 -4; 1 3] diagonalizes A, i.e.,

    Q^{-1} A Q = [5 0; 0 -2] = D.

Therefore,

    A^100 = Q D^100 Q^{-1} = (1/7) [3·5^100 + 4·2^100   4·5^100 - 4·2^100;  3·5^100 - 3·2^100   4·5^100 + 3·2^100]. □
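Because 5^100 far exceeds floating point range, the closed form is best checked with exact integer arithmetic. The following Python sketch (not part of the text) compares it with repeated integer matrix multiplication:

```python
# Exact check of the closed form for A^100 using Python integers.
def mat_mult(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(X, n):
    R = [[1, 0], [0, 1]]      # 2 x 2 identity
    for _ in range(n):
        R = mat_mult(R, X)
    return R

A = [[1, 4], [3, 2]]
p, q = 5 ** 100, 2 ** 100     # note (-2)^100 = 2^100

closed = [[(3 * p + 4 * q) // 7, (4 * p - 4 * q) // 7],
          [(3 * p - 3 * q) // 7, (4 * p + 3 * q) // 7]]

print(mat_pow(A, 100) == closed)   # True
```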

Problem 6.9 For the matrix A = [5 -4 4; 12 -11 12; 4 -4 5],
(1) diagonalize the matrix A; and (2) find the eigenvalues of A^10 + A^7 + 5A.

6.3 Applications

6.3.1 Linear recurrence relations

Early in the thirteenth century, Fibonacci posed the following problem: "Suppose that a newly born pair of rabbits produces no offspring during the first month of their lives, but each pair gives birth to a new pair once a month from the second month onward. Starting with one (= x_1) newly born pair in the first month, how many pairs of rabbits can be bred in a given time, assuming no rabbit dies?"
Initially, there is one pair. After one month there is still one pair, but two months later it gives birth, so there are two pairs. If at the end of n months there are x_n pairs, then after n + 1 months the number will be the x_n pairs plus the number of offspring of the x_{n-1} pairs who were alive at n - 1 months. Therefore, we have, for n ≥ 1,

    x_{n+1} = x_n + x_{n-1}.

Here, if we assume x_0 = 0 and x_1 = 1, then the first several terms of the sequence become

    0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ....

This sequence is called the Fibonacci sequence and each term is called a Fibonacci number.

Example 6.7 Find the 2000th Fibonacci number.

Solution: A standard trick is to consider a trivial extra equation x_n = x_n together with the given equation:

    x_{n+1} = x_n + x_{n-1}
    x_n     = x_n.

Equivalently, in matrix notation,

    [x_{n+1}; x_n] = [1 1; 1 0] [x_n; x_{n-1}],


which is of the form

    x_n = A x_{n-1} = A^n x_0,   n = 1, 2, ...,

where x_n = [x_{n+1}; x_n], x_0 = [x_1; x_0] = [1; 0] and A = [1 1; 1 0]. Thus, the problem is reduced to computing A^n. A simple computation gives the eigenvalues λ_1 = (1 + √5)/2, λ_2 = (1 - √5)/2 of A and their associated eigenvectors v_1 = (λ_1, 1), v_2 = (λ_2, 1), respectively. Moreover, the basis-change matrix and its inverse are found to be

    Q = [λ_1 λ_2; 1 1],    Q^{-1} = (1/√5) [1 -(1 - √5)/2; -1 (1 + √5)/2].

With D = [(1 + √5)/2  0; 0  (1 - √5)/2],

    A^n = Q D^n Q^{-1} = Q [((1 + √5)/2)^n  0; 0  ((1 - √5)/2)^n] Q^{-1}.

For instance, if n = 2000, then

    [x_2001; x_2000] = x_2000 = A^2000 x_0.

It gives

    x_2000 = (1/√5) [((1 + √5)/2)^2000 - ((1 - √5)/2)^2000].

In general, the Fibonacci numbers satisfy

    x_n = (1/√5) [((1 + √5)/2)^n - ((1 - √5)/2)^n]   for n ≥ 0.

Note that since x_2000 must be an integer, we look for the nearest integer to the huge number (1/√5)((1 + √5)/2)^2000, because ((1 - √5)/2)^k is very small in absolute value for large k.
Historically, the number (1 + √5)/2, which is very close to the ratio x_{n+1}/x_n for large n, is called the golden mean. □
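A short Python sketch (not from the text) illustrates the two routes to x_n: exact integer iteration of the recurrence, and the closed form above rounded to the nearest integer, which agrees for moderate n but overflows floating point long before n = 2000:

```python
import math

def fib_iter(n):
    """Exact x_n from x_{k+1} = x_k + x_{k-1} with x_0 = 0, x_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed(n):
    """Nearest integer to ((1 + sqrt 5)/2)^n / sqrt 5; reliable for moderate n."""
    phi = (1 + math.sqrt(5)) / 2
    return round(phi ** n / math.sqrt(5))

print(all(fib_iter(n) == fib_closed(n) for n in range(60)))   # True
```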

Remark: The golden mean is one of the mysterious naturally occurring numbers, like e = 2.71828182... or π = 3.14159265..., and it is denoted by φ. Its decimal representation is φ = 1.61803398.... It is also described as φ = 1/s for the number 0 < s < 1 satisfying s^2 = 1 - s.
Definition 6.3 A sequence {x_n : n ≥ 0} of numbers is said to satisfy a linear recurrence relation of order k if there exist k constants a_i, i = 1, ..., k, with a_1 and a_k nonzero, such that

    x_n = a_1 x_{n-1} + a_2 x_{n-2} + ... + a_k x_{n-k}   for all n ≥ k.

For example, the relation x_n = a x_{n-1} of order 1 for n = 1, 2, ... gives a geometric sequence x_n = a^n x_0, and the relation x_{n+1} = x_n + x_{n-1} of order 2 with x_0 = 0, x_1 = 1 for n = 1, 2, ... gives the Fibonacci sequence. A solution to the recurrence relation is any sequence {x_n : n ≥ 0} of numbers that satisfies the equation. Of course, a solution can be found by simply writing out enough terms of the sequence if k beginning values x_0, x_1, ..., x_{k-1}, called the initial values, are given.
As in the case of the Fibonacci sequence, one can write the recurrence relation

    x_n = a_1 x_{n-1} + a_2 x_{n-2} + ... + a_k x_{n-k}   for all n ≥ k,

or equivalently,

    x_{n+k-1} = a_1 x_{n+k-2} + a_2 x_{n+k-3} + ... + a_k x_{n-1}   for all n ≥ 1.
Its matrix form with some trivial extra equations is

    x_n = [x_{n+k-1}; x_{n+k-2}; ...; x_{n+1}; x_n]
        = [a_1 a_2 a_3 ... a_{k-1} a_k; 1 0 0 ... 0 0; 0 1 0 ... 0 0; ... ; 0 0 0 ... 1 0] [x_{n+k-2}; x_{n+k-3}; ...; x_n; x_{n-1}]
        = A x_{n-1}

for n ≥ 1, or simply x_n = A x_{n-1}. The matrix A is called the companion matrix of the recurrence relation x_n = A x_{n-1}.

To solve a recurrence relation x_n = A x_{n-1}, we first compute the characteristic polynomial of the companion matrix A.

Lemma 6.10 For a companion matrix

    A = [a_1 a_2 a_3 ... a_{k-1} a_k; 1 0 0 ... 0 0; 0 1 0 ... 0 0; ... ; 0 0 0 ... 1 0]   with a_1 and a_k nonzero,

(1) the characteristic polynomial of A is λ^k - a_1 λ^{k-1} - ... - a_{k-1} λ - a_k.
(2) All eigenvalues of A are nonzero, and for any eigenvalue λ of A, x_n = λ^n is a solution of the corresponding recurrence relation.

Proof: (1) Use induction on k. The claim is clearly true for k = 1. Assume the equality for k = n - 1 and let k = n. By taking the cofactor expansion of det(λI - A) along the last column, the induction hypothesis gives

    det(λI - A) = λ(λ^{n-1} - a_1 λ^{n-2} - ... - a_{n-1}) + (-1)^{2n-1} a_n
                = λ^n - a_1 λ^{n-1} - ... - a_{n-1} λ - a_n.

(2) Clearly all eigenvalues are nonzero, because a_k ≠ 0. It follows from (1) that for any eigenvalue λ of A, x_n = λ^n satisfies the recurrence relation

    λ^n = a_1 λ^{n-1} + a_2 λ^{n-2} + ... + a_k λ^{n-k}. □

Remark: (1) By Lemma 6.10(1), every monic polynomial, that is, a polynomial whose coefficient of the highest degree term is 1, can be expressed as the characteristic polynomial of some matrix A. This matrix A is also called the companion matrix of the monic polynomial p(λ) = λ^k - a_1 λ^{k-1} - ... - a_{k-1} λ - a_k.
(2) From Lemma 6.10(1), one can see that if a recurrence relation

    x_n = a_1 x_{n-1} + a_2 x_{n-2} + ... + a_k x_{n-k}

is given, then the characteristic equation of the associated companion matrix A can be obtained from the recurrence relation by replacing x_i with λ^i and dividing the resulting equation by λ^{n-k}. This relation between the recurrence relation and the characteristic equation of the matrix A can be a reason why {λ^n : n ≥ 0} is a solution for each eigenvalue λ.

Lemma 6.11 If λ_0 is an eigenvalue of the companion matrix A of a linear difference equation of order k, then the eigenspace E(λ_0) is a 1-dimensional subspace and contains [λ_0^{k-1} ... λ_0 1]^T.

Proof: For x = (x_1, x_2, ..., x_k), an entry-wise comparison of Ax = λ_0 x shows that x_{i-1} = λ_0 x_i for i = 2, ..., k, so that x_i = λ_0^{k-i} x_k and x is a scalar multiple of (λ_0^{k-1}, ..., λ_0, 1)^T. □

The recurrence relation x_n = A x_{n-1} can be solved if the companion matrix A is diagonalizable.

Example 6.8 (Recurrence relation x_n = A x_{n-1} with diagonalizable A) Solve the recurrence relation

    x_n = 6x_{n-1} - 11x_{n-2} + 6x_{n-3}   for n ≥ 3

with initial values x_0 = 0, x_1 = 1, x_2 = -1.

Solution: In matrix form, it is

    x_n = [x_{n+2}; x_{n+1}; x_n] = [6 -11 6; 1 0 0; 0 1 0] [x_{n+1}; x_n; x_{n-1}] = A x_{n-1}.

The characteristic polynomial of A is

    det(λI - A) = λ^3 - 6λ^2 + 11λ - 6 = (λ - 1)(λ - 2)(λ - 3)

by Lemma 6.10. Hence, the eigenvalues are λ_1 = 1, λ_2 = 2, λ_3 = 3, and their associated eigenvectors are

    v_1 = (1, 1, 1),  v_2 = (4, 2, 1),  v_3 = (9, 3, 1),

respectively, by Lemma 6.11. Moreover, the basis-change matrix and its inverse can be found to be

    Q = [v_1 v_2 v_3] = [1 4 9; 1 2 3; 1 1 1],    Q^{-1} = (1/2) [1 -5 6; -2 8 -6; 1 -3 2].

With

    D = [1 0 0; 0 2 0; 0 0 3],

one can get

    x_n = A^n x_0 = Q D^n Q^{-1} x_0,   where x_0 = (x_2, x_1, x_0) = (-1, 1, 0).

It implies that the solution is x_n = -3 + 5·2^n - 2·3^n. □
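A few lines of Python (a sketch, not part of the text) confirm that this closed form matches the recurrence and its initial values:

```python
def x_sequence(n_max):
    """x_0, ..., x_{n_max} from x_n = 6x_{n-1} - 11x_{n-2} + 6x_{n-3}, x_0=0, x_1=1, x_2=-1."""
    xs = [0, 1, -1]
    for n in range(3, n_max + 1):
        xs.append(6 * xs[n - 1] - 11 * xs[n - 2] + 6 * xs[n - 3])
    return xs

closed = lambda n: -3 + 5 * 2 ** n - 2 * 3 ** n
print(all(x == closed(n) for n, x in enumerate(x_sequence(20))))   # True
```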


As a generalization of a recurrence relation x_n = A x_{n-1} with a companion matrix A, let us consider a sequence {x_n} of vectors in ℝ^k defined by a matrix equation x_n = A x_{n-1} with an arbitrary k × k square matrix A (not necessarily a companion matrix). Such an equation x_n = A x_{n-1} is called a linear difference equation.
A solution to the linear difference equation x_n = A x_{n-1} is any sequence {x_n ∈ ℝ^k : n ≥ 0} of vectors that satisfies the equation. In fact, solving a linear difference equation is reduced to a simple computation of A^n if the starting vector x_0, called the initial value, is given.
We first examine the set of its solutions.

Theorem 6.12 For any k × k matrix A, the set of all solutions of the linear difference equation x_n = A x_{n-1} is a k-dimensional vector space.
In particular, the set of solutions of the recurrence relation of order k,

    x_n = a_1 x_{n-1} + a_2 x_{n-2} + ... + a_k x_{n-k},   n ≥ k,

with nonzero a_1 and a_k, is a k-dimensional vector space.

Proof: Since the proofs are similar, we prove this only for the recurrence relation. Let W be the set of solutions {x_n} of the recurrence relation. Clearly, a sum of two solutions and any scalar multiple of a solution are also solutions. Hence, the solutions form a vector space, as shown in Example 3.1(4). One can show that the function f : W → ℝ^k defined by f({x_n}) = (x_{k-1}, ..., x_1, x_0) is a linear transformation. Clearly, it is bijective, because any given initial k values x_0, x_1, ..., x_{k-1} generate recursively a unique sequence {x_n : n ≥ 0} of numbers that satisfies the equation. Hence, dim W = dim ℝ^k = k. (For the linear difference equation x_n = A x_{n-1}, see Problem 6.10.) □

Problem 6.10 Let x_n = A x_{n-1} be any linear difference equation with a k × k matrix A. For each basis vector e_i in {e_1, ..., e_k} in ℝ^k, there is a unique solution of x_n = A x_{n-1} with initial value e_i. Show that these k solutions form a basis for the solution space of x_n = A x_{n-1}.

Definition 6.4 A basis for the space of solutions of a linear difference equation or a recurrence relation is called a fundamental set of solutions, and a general solution is described as a linear combination of its members. If the initial value is specified, the solution is uniquely determined and it is called a particular solution.

By Theorem 6.12, it is enough to find k linearly independent solutions in order to solve a given linear difference equation or a recurrence relation of order k, and its general solution is just a linear combination of those linearly independent solutions.
First, we assume that the square matrix A is diagonalizable with k linearly independent eigenvectors v_1, v_2, ..., v_k belonging to the eigenvalues λ_1, λ_2, ..., λ_k, respectively. Since {v_1, v_2, ..., v_k} is a basis for ℝ^k, any initial vector x_0 can be written as

    x_0 = c_1 v_1 + c_2 v_2 + ... + c_k v_k.

Since Av_j = λ_j v_j, we have

    x_1 = A x_0 = c_1 λ_1 v_1 + c_2 λ_2 v_2 + ... + c_k λ_k v_k,

and, in general, for all n = 1, 2, ...,

    x_n = A^n x_0 = c_1 λ_1^n v_1 + c_2 λ_2^n v_2 + ... + c_k λ_k^n v_k.

In particular, if the companion matrix A of the recurrence relation x_n = a_1 x_{n-1} + a_2 x_{n-2} + ... + a_k x_{n-k} of order k has k distinct eigenvalues λ_1, ..., λ_k, then its solution x_n is (as the (k, 1)-entry of the vector x_n) a linear combination of λ_1^n, λ_2^n, ..., λ_k^n. In fact, for each 1 ≤ j ≤ k, x_n = λ_j^n is a solution of the recurrence relation, and these k solutions are linearly independent, so that they form a fundamental set of solutions by Theorem 6.12. (One can also directly show that x_n = λ_j^n satisfies the recurrence relation.) Note that all these solutions are geometric sequences.
We can summarize as follows.

Theorem 6.13 Let A be a k × k diagonalizable matrix with k linearly independent eigenvectors v_1, v_2, ..., v_k belonging to the eigenvalues λ_1, λ_2, ..., λ_k, respectively. Then, a general solution of the linear difference equation x_n = A x_{n-1} can be written as

    x_n = c_1 λ_1^n v_1 + c_2 λ_2^n v_2 + ... + c_k λ_k^n v_k

for some constants c_1, c_2, ..., c_k.
In particular, for the recurrence relation

    x_n = a_1 x_{n-1} + a_2 x_{n-2} + ... + a_k x_{n-k},   n ≥ k,

with nonzero a_1 and a_k, if the associated companion matrix A has k distinct eigenvalues λ_1, λ_2, ..., λ_k, then its general solution is

    x_n = c_1 λ_1^n + c_2 λ_2^n + ... + c_k λ_k^n   with constants c_i's.

Example 6.9 (Recurrence relation with distinct eigenvalues) Solve the recurrence relation

    x_n = x_{n-1} + 7x_{n-2} - x_{n-3} - 6x_{n-4}   for n ≥ 4,

and also find its particular solution satisfying the initial conditions x_0 = 0, x_1 = 1, x_2 = -1, x_3 = 2.

Solution: By Lemma 6.10, the characteristic polynomial of the companion matrix A associated with the given recurrence relation is

    det(λI - A) = λ^4 - λ^3 - 7λ^2 + λ + 6 = (λ - 1)(λ + 1)(λ + 2)(λ - 3),

so that A has four distinct eigenvalues λ_1 = 1, λ_2 = -1, λ_3 = -2, λ_4 = 3. Hence, the geometric sequences {1^n}, {(-1)^n}, {(-2)^n}, {3^n} are linearly independent, and a general solution is a linear combination of them by Theorem 6.13:

    x_n = c_1 1^n + c_2 (-1)^n + c_3 (-2)^n + c_4 3^n

with constants c_1, c_2, c_3, c_4. The initial values give

    n = 0:  c_1 + c_2 + c_3 + c_4 = 0,
    n = 1:  c_1 + (-1)c_2 + (-2)c_3 + 3c_4 = 1,
    n = 2:  c_1 + (-1)^2 c_2 + (-2)^2 c_3 + 3^2 c_4 = -1,
    n = 3:  c_1 + (-1)^3 c_2 + (-2)^3 c_3 + 3^3 c_4 = 2,

which is a system of linear equations with a 4 × 4 Vandermonde matrix as its coefficient matrix. Hence, one can solve it to get c_1 = 5/12, c_2 = -1/8, c_3 = -4/15, c_4 = -1/40, and then its particular solution is

    x_n = (5/12) 1^n - (1/8)(-1)^n - (4/15)(-2)^n - (1/40) 3^n. □

Problem 6.11 Let {a_n} be a sequence with a_0 = 1, a_1 = 2, a_2 = 0, and the recurrence relation a_n = 2a_{n-1} + a_{n-2} - 2a_{n-3} for n ≥ 3. Find the n-th term a_n.

The next example illustrates how to solve a recurrence relation when its associated
companion matrix A has a repeated eigenvalue.

Example 6.10 (Recurrence relation with a repeated eigenvalue) Solve the recurrence
relation
Xn = -2Xn- 1 - Xn-2 for n :::: 2,
and also find its particular solution satisfying the initial conditions Xo = 1, XI = 2.

Solution: Its characteristic polynomial is
det(λI - A) = λ^2 + 2λ + 1 = (λ + 1)^2,
and λ = -1 is an eigenvalue of multiplicity 2. Hence, the geometric sequence {x_n} = {(-1)^n} is a solution of the recurrence relation. Since its solution space is of dimension 2 by Theorem 6.12, we should find one more solution which is independent of {x_n} = {(-1)^n}. But, in this case {x_n} = {n(-1)^n} is also a solution of the recurrence relation. In fact, for n ≥ 2,
-2x_{n-1} - x_{n-2} = -2(n - 1)(-1)^{n-1} - (n - 2)(-1)^{n-2} = n(-1)^n = x_n.



Clearly, the two solutions {(-1)^n} and {n(-1)^n} are linearly independent, and so x_n = c1(-1)^n + c2 n(-1)^n is a general solution. The initial condition gives c1 = 1, c2 = -3, and x_n = (-1)^n - 3n(-1)^n is the particular solution of the recurrence relation.  □
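A quick numerical sanity check of Example 6.10 (an illustration, not part of the text):

    # verify that x_n = (-1)^n - 3n(-1)^n satisfies x_n = -2 x_{n-1} - x_{n-2}
    def x(n):
        return (-1)**n - 3*n*(-1)**n

    assert x(0) == 1 and x(1) == 2                                 # initial conditions
    assert all(x(n) == -2*x(n-1) - x(n-2) for n in range(2, 30))   # the recurrence
    print("Example 6.10 verified")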

In Example 6.10, we showed that λ = -1 is an eigenvalue of multiplicity 2, and that the two sequences {(-1)^n} and {n(-1)^n} are linearly independent solutions of the recurrence relation.
As a general case, let us consider a recurrence relation
x_n = a1 x_{n-1} + a2 x_{n-2} + ... + ak x_{n-k},  n ≥ k,
with nonzero a1 and ak, and let the associated companion matrix A have s distinct eigenvalues λ1, λ2, ..., λs with multiplicities m1, m2, ..., ms, respectively. For each eigenvalue λi with multiplicity mi > 1, we have f(λi) = f'(λi) = ... = f^{(mi-1)}(λi) = 0, where f(λ) = λ^k - a1 λ^{k-1} - a2 λ^{k-2} - ... - a_{k-1} λ - ak is the characteristic polynomial of A. Hence, for a new function F1(λ) defined by
F1(λ) = λ^{n-k} f(λ) = λ^n - a1 λ^{n-1} - ... - a_{k-1} λ^{n-k+1} - ak λ^{n-k},
one can see that the derivative F1'(λ) vanishes at λ = λi, and then F2(λ) = λ F1'(λ) = 0 at λ = λi. That is,
n λ^n - a1(n-1) λ^{n-1} - ... - a_{k-1}(n-k+1) λ^{n-k+1} - ak(n-k) λ^{n-k} = 0  at λ = λi.
It shows that x_n = n λi^n is also a solution of the recurrence relation. Inductively, Fj(λ) = λ F'_{j-1}(λ) = 0 at λ = λi shows that x_n = n^{j-1} λi^n is also a solution for j = 1, ..., mi. Therefore, one can conclude that x_n = λi^n, n λi^n, ..., n^{mi-1} λi^n are mi linearly independent solutions of the recurrence relation. Collecting such mi linearly independent solutions for each eigenvalue λi, one can get a fundamental set of solutions of the recurrence relation.
In summary, we have the following theorem.

Theorem 6.14 For any given recurrence relation
x_n = a1 x_{n-1} + a2 x_{n-2} + ... + ak x_{n-k},  n ≥ k,
with nonzero a1 and ak, let the associated companion matrix A have s distinct eigenvalues λ1, λ2, ..., λs with multiplicities m1, m2, ..., ms, respectively. Then
{ {λi^n}, {n λi^n}, ..., {n^{mi-1} λi^n} : i = 1, 2, ..., s }
forms a fundamental set of solutions, and a general solution is a linear combination of them.

Problem 6.12 Prove that if λ = q is an eigenvalue of a recurrence relation with multiplicity m, then the m solutions {q^n}, {nq^n}, ..., {n^{m-1} q^n} are linearly independent.

Problem 6.13 Solve the recurrence relation x_n = 3x_{n-1} - 4x_{n-3} for n ≥ 3. What is its particular solution if x_0 = 1, x_1 = x_2 = 1?

6.3.2 Linear difference equations

A linear difference equation x_n = A x_{n-1} often serves as a mathematical model of a dynamic process that changes over time; such models are widely used in areas such as economics, electrical engineering, and ecology. In this case, the vector x_n gives information about the state of the dynamic process at time n.
In this context, a linear difference equation
x_n = A x_{n-1},  n = 1, 2, ...,
with a square matrix A is also called a discrete dynamical system. If the matrix A is a companion matrix, then it is nothing but a recurrence relation.
If the matrix A is diagonal with diagonal entries λ1, ..., λk, then, by Theorem 6.13, a general solution of x_n = A x_{n-1} is
x_n = c1 λ1^n e1 + c2 λ2^n e2 + ... + ck λk^n ek
with some constants c1, c2, ..., ck.


Throughout this section, we are concerned only with a linear difference equation x_n = A x_{n-1}, n = 1, 2, ..., for a diagonalizable matrix A, because if A is not diagonalizable, it is not easy in general to solve it. However, it can be done after reducing A to a simpler form called the Jordan canonical form, and this case will be discussed again in Chapter 8.
Let A be a k × k diagonalizable matrix with k linearly independent eigenvectors v1, ..., vk belonging to the eigenvalues λ1, ..., λk, respectively. Then, by Theorem 6.13 again, a general solution of x_n = A x_{n-1} is
x_n = c1 λ1^n v1 + c2 λ2^n v2 + ... + ck λk^n vk
with some constants c1, c2, ..., ck. Hence, if |λi| < 1 for all i, then the vector x_n must approach the zero vector as n increases. On the other hand, if there exists an eigenvalue λi with |λi| > 1, this vector x_n may grow exponentially in magnitude.
Therefore, we have three possible cases for a dynamic process given by x_n = A x_{n-1}, n = 1, 2, .... The process is said to be

(1) unstable if A has an eigenvalue λ with |λ| > 1,
(2) stable if |λ| < 1 for all eigenvalues λ of A,
(3) neutrally stable if the maximum of the absolute values of the eigenvalues of A is 1.

To determine the stability of a dynamic process, it is often necessary to estimate


the (upper) bound for the absolute values of the eigenvalues of a square matrix A. To do this, for any square matrix A = [a_ij] of order k, let

R(A) = max{ R_i(A) = Σ_{j=1}^{k} |a_ij| : 1 ≤ i ≤ k },
c(A) = max{ c_j(A) = Σ_{i=1}^{k} |a_ij| : 1 ≤ j ≤ k },
s_i = R_i(A) - |a_ii|.
Theorem 6.15 (Gerschgorin's Theorem) For any square matrix A of order k, every eigenvalue λ of A satisfies |λ - a_ℓℓ| ≤ s_ℓ for some 1 ≤ ℓ ≤ k.

Proof: Let λ be an eigenvalue with eigenvector x = [x1 x2 ... xk]^T. Then Σ_{j=1}^{k} a_ij x_j = λ x_i for i = 1, ..., k. Take a coordinate x_ℓ of x with the largest absolute value. Then clearly x_ℓ ≠ 0, and
|λ - a_ℓℓ| |x_ℓ| = |λ x_ℓ - a_ℓℓ x_ℓ| = | Σ_{j≠ℓ} a_ℓj x_j | ≤ Σ_{j≠ℓ} |a_ℓj| |x_ℓ| = s_ℓ |x_ℓ|.
Since |x_ℓ| > 0, |λ - a_ℓℓ| ≤ s_ℓ.  □


Corollary 6.16 For any square matrix A of order k, every eigenvalue λ of A satisfies |λ| ≤ min{R(A), c(A)}.

Proof: Note that |λ| ≤ |λ - a_ℓℓ| + |a_ℓℓ| ≤ s_ℓ + |a_ℓℓ| = R_ℓ(A) ≤ R(A). Moreover, since λ is also an eigenvalue of A^T, |λ| ≤ R(A^T) = c(A).  □
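As an illustration (not from the text), the bound of Corollary 6.16 is easy to compare with the actual eigenvalue moduli in NumPy; the matrix below is the Markov matrix that appears in Example 6.12.

    import numpy as np

    A = np.array([[0.8, 0.1],
                  [0.2, 0.9]])

    R = np.abs(A).sum(axis=1).max()    # largest absolute row sum, R(A) = 1.1
    c = np.abs(A).sum(axis=0).max()    # largest absolute column sum, c(A) = 1.0
    print(min(R, c))                   # 1.0

    print(np.abs(np.linalg.eigvals(A)))    # [1.0, 0.7], both <= min{R(A), c(A)}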

Example 6.11 (Stable or unstable) Solve a discrete dynamical system x_n = A x_{n-1}, n = 1, 2, ..., where

(1) A = [0.8 0.0; 0.0 0.5],  (2) A = [1.2 0.0; 0.0 0.6],  (3) A = (1/2)[3 1; 1 3].

Solution: (1) Clearly, the eigenvalues of A are 0.8 and 0.5 with eigenvectors v1 = [1; 0] and v2 = [0; 1], respectively. Hence, its general solution is
x_n = c1 (0.8)^n [1; 0] + c2 (0.5)^n [0; 1].
It follows that the system x_n = A x_{n-1} is stable. (See Figure 6.1.)


Figure 6.1. A stable dynamical system

(2) Similarly, one can show that
x_n = c1 (1.2)^n [1; 0] + c2 (0.6)^n [0; 1],
in which the system is unstable if c1 ≠ 0. (See Figure 6.2.)

Figure 6.2. An unstable dynamical system

(3) The eigenvalues of A are 1 and 2 with eigenvectors v1 = [-1; 1] and v2 = [1; 1], respectively. Hence, a general solution of x_n = A x_{n-1} is
x_n = c1 (1)^n [-1; 1] + c2 (2)^n [1; 1].
It is unstable if c2 ≠ 0: For example, if c1 = -1, c2 = 1, then
x_0 = (2, 0), x_1 = (3, 1), x_2 = (5, 3), x_3 = (9, 7), x_4 = (17, 15), ....  □

The following example is a special type of a discrete dynamical system, called a


Markov process.

Example 6.12 (Markov process with distinct eigenvalues) Suppose that the population of a certain metropolitan area starts with x0 people outside a big city and y0 people inside the city. Suppose that each year 20% of the people outside the city move in, and 10% of the people inside move out. What is the 'eventual' distribution of the population?

Solution: At the end of the first year, the distribution of the population will be

x1 = 0.8 x0 + 0.1 y0,
y1 = 0.2 x0 + 0.9 y0.

Or, in matrix form,

x_1 = [x1; y1] = [0.8 0.1; 0.2 0.9] [x0; y0] = A x_0.

Thus, if x_n = (x_n, y_n) denotes the distribution of the population in the metropolitan area after n years, we get x_n = A^n x_0.
In this formulation , the problem can be summarized as follows :
(1) The entries of A are all nonnegative because the entries of each column of A
represent the probabilities of residing in one of the two locations in the next year,
(2) the entries of each column of A add up to 1 because the total population of the
metropolitan area remains constant.

Now, to solve the problem, we first find the eigenvalues and eigenvectors of A. They are λ1 = 1, λ2 = 0.7 and v1 = (1, 2), v2 = (-1, 1), respectively, so that its general solution is
x_n = c1 (1)^n [1; 2] + c2 (0.7)^n [-1; 1] = [c1 (1)^n - c2 (0.7)^n;  2c1 (1)^n + c2 (0.7)^n].
But, the initial condition x_0 = (x0, y0) gives c1 = x0/3 + y0/3 and c2 = -2x0/3 + y0/3, so that
x_n = (x0/3 + y0/3) [1; 2] + (-2x0/3 + y0/3) (0.7)^n [-1; 1]
    → (x0/3 + y0/3) [1; 2]  as n → ∞.



Note that, since x_n + y_n = a is fixed for all n, the process in time remains on the straight line x + y = a. Thus, for a given initial total population a, the eventual ratio x_n : y_n of the populations tends to 1 : 2, which is independent of the initial distribution. For initial populations of a = 3, 4, 5, 6 million people, the processes are shown in Figure 6.3.  □

Figure 6.3. A Markov process A^n x_0
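The convergence in Example 6.12 can also be watched numerically; the sketch below is illustrative only (the initial split of 3 million people outside and 0 inside is an assumption) and reads off the equilibrium from the eigenvector belonging to λ = 1.

    import numpy as np

    A = np.array([[0.8, 0.1],
                  [0.2, 0.9]])
    x = np.array([3.0, 0.0])          # assumed: 3 million outside, 0 inside

    for n in range(60):
        x = A @ x
    print(x)                           # approximately [1., 2.]

    # equilibrium state: eigenvector for lambda = 1, scaled to the total population
    w, V = np.linalg.eig(A)
    v = V[:, np.argmax(np.isclose(w, 1.0))].real
    print(3.0 * v / v.sum())           # also approximately [1., 2.]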

Recall that the matrix A in Example 6.12 satisfies the following two conditions:
(1) all entries of A are nonnegative,
(2) the entries of each column of A add up to 1.

Such a matrix A is called a Markov matrix (or a stochastic matrix). In general, a dynamical system x_n = A x_{n-1} with a Markov matrix A is called a Markov process.

The next theorem follows directly from Gerschgorin's Theorem 6.15 .

Theorem 6.17 If λ is an eigenvalue of any Markov matrix A, then |λ| ≤ 1.

In fact, every Markov matrix has an eigenvalue λ = 1. To show this, let A be any Markov matrix. Then the entries of each column of A add up to 1. It means that the sum of each column of A - I is 0, or equivalently r1 + r2 + ... + rn = 0 for the row vectors ri of A - I. This is a nontrivial linear combination of the row vectors, so these row vectors are linearly dependent, and hence det(A - I) = 0. Consequently, λ = 1 is an eigenvalue of A. If x is an eigenvector of A belonging to λ = 1, then Ax = x. Such a vector x is called an equilibrium state.

Theorem 6.18 If A is a stochastic matrix, then
(1) λ = 1 is an eigenvalue of A,
(2) there exists an equilibrium state x that remains fixed by the Markov process.
Problem 6.14 Suppose that the land use in a city in 2000 is

Residential: x0 = 30%,
Commercial: y0 = 20%,
Industrial:  z0 = 50%.

Denote by xk, yk, zk the percentages of residential, commercial, and industrial use, respectively, after k years, and assume that the stochastic matrix is given as follows:

[x_{k+1}; y_{k+1}; z_{k+1}] = [0.8 0.1 0.0; 0.1 0.7 0.1; 0.1 0.2 0.9] [xk; yk; zk].

Find the land use in the city after 50 years.

Problem 6.15 A car rental company has three branch offices in different cities. When a car is rented at one of the offices, it may be returned to any of the three offices. This company started business with 900 cars, and initially an equal number of cars was distributed to each office. When the week-by-week distribution of cars is governed by the stochastic matrix

A = [0.6 0.1 0.2; 0.2 0.2 0.2; 0.2 0.7 0.6],

determine the number of cars at each office in the k-th week. Also, find lim_{k→∞} A^k.

6.3.3 Linear differential equations I

A first-order differential equation is a relation between a real-valued differentiable function y(t) of time t and its first derivative y'(t), and it can be written in the form
y'(t) = dy(t)/dt = f(t, y(t)).
As a special case, if it can be written as y'(t) = g(t) for an integrable function g(t), then its solution is y(t) = ∫ g(t) dt + c for a constant c. However, it is difficult to solve it in most other cases such as y'(t) = sin(t y^2).
As another case, if y'(t) = 5y(t), then it has a general solution y = c e^{5t}, where c is an arbitrary constant. If an additional condition y(0) = 3, called an initial condition, is given, then its solution is y = 3e^{5t}, called a particular solution.
The second case can be generalized to a system of n linear differential equations with constant coefficients, which is by definition of the form

y1' = a11 y1 + a12 y2 + ... + a1n yn
y2' = a21 y1 + a22 y2 + ... + a2n yn
  ⋮
yn' = an1 y1 + an2 y2 + ... + ann yn,

where yi = fi(t) for i = 1, 2, ..., n are real-valued differentiable functions on an interval I = (a, b). In most cases, one may assume that the interval I contains 0, and some initial conditions are given as fi(0) = di at 0 ∈ I.
Let y = [f1 f2 ... fn]^T denote the vector whose entries are the differentiable functions yi = fi defined on an interval I = (a, b); thus, for each t ∈ I, y(t) = [f1(t) f2(t) ... fn(t)]^T is a vector in R^n. Its derivative is defined by
y' = [f1' f2' ... fn']^T,  or  y'(t) = [f1'(t) f2'(t) ... fn'(t)]^T.

If A denotes the coefficient matrix of the system of linear differential equations, the matrix form of the system can be written as
y' = Ay,  or  y'(t) = A y(t)  for all t ∈ I.
An initial condition is given by y0 = y(0) = (d1, ..., dn) ∈ R^n.


A differentiable vector function y(t) is called a solution of the system y'(t) =
Ay(t) if it satisfies the equation. In general, the entries of the coefficient matrix A
could be functions. However, in this book, we restrict our attention to the systems
with constant coefficients.
Example 6.13 Consider the following three systems:

y1' = 2y1 - 3y2      y1' = t y1 + t^2 y2      y1' = 2y1 - 3y2^2
y2' = 2y1 + y2,      y2' = 2y1 + t^3 y2,      y2' = sin y1 + 5y2.

The first two systems are linear, but the coefficients of the second are functions of t. The third is not linear because of the terms y2^2 and sin y1.  □
Example 6.14 (Population model) Let p(t) denote the population of a given species, such as bacteria, at time t, and let r(t, p) denote the difference between its birth rate and its death rate at time t. If r(t, p) is independent of time t, i.e., it is a constant r, then dp(t)/dt = r p(t) is the rate of change of the population, and its general solution is p(t) = p(0) e^{rt}.  □

Some basic facts about a system y'(t) = A y(t) of linear differential equations defined on I = (a, b), where A is any n × n matrix, are listed below.
(I) (The fundamental theorem for a system of linear differential equations) The system y'(t) = A y(t) always has a solution. In addition, if an initial condition y0 is given, then there is a unique solution y(t) on I which satisfies the initial condition. If y = [y1 y2 ... yn]^T is a solution on I, then it draws a curve in R^n passing through the initial vector y0 = y(0) = (d1, ..., dn) as t varies in the interval I.
(II) (Linear independence of solutions) Let {y1, ..., yn} be a set of n solutions of the system y' = Ay on I. The linear independence of the solutions y1, ..., yn on I is defined as usual: they are linearly independent if c1 y1 + ... + cn yn = 0 implies c1 = ... = cn = 0. Or, equivalently, they are linearly dependent if and only if one of them can be written as a linear combination of all the others. Define
Y(t) = [y1(t) ... yn(t)] = [ y11(t) y12(t) ... y1n(t); y21(t) y22(t) ... y2n(t); ... ; yn1(t) yn2(t) ... ynn(t) ]  for t ∈ I.

If the n solutions are linearly dependent, then det Y(t) = 0 for all t ∈ I. Or, equivalently, if det Y(t) ≠ 0 for at least one point t ∈ I, then the solutions are linearly independent. However, the next lemma says that
det Y(t) ≠ 0 for all t ∈ I if and only if det Y(t) ≠ 0 at one point t ∈ I.
The determinant of Y(t) is called the Wronskian of the solutions, denoted by W(t) = det Y(t) for t ∈ I. Note that the Wronskian W(t) is a real-valued differentiable function on I.

Lemma 6.19 W'(t) = tr(A) W(t).

Proof:
W'(t) = (det Y(t))' = Σ_{σ∈S_n} sgn(σ) ( y_{1σ(1)} ⋯ y_{nσ(n)} )'
      = Σ_{σ∈S_n} sgn(σ) y'_{1σ(1)} ⋯ y_{nσ(n)} + ... + Σ_{σ∈S_n} sgn(σ) y_{1σ(1)} ⋯ y'_{nσ(n)}
      = Σ_i Σ_j y'_{ij} Y_{ij} = Σ_i ( Σ_j y'_{ij} [adj Y]_{ji} ) = Σ_{i=1}^{n} [Y' · adj Y]_{ii}
      = tr(Y' · adj Y) = tr(A (Y · adj Y))
      = tr(det Y(t) A) = tr(A) W(t),
where Y_{ij}(t) is the cofactor of y_{ij}, and the equalities in the last two lines are due to the facts that
Y'(t) = [y1'(t) ... yn'(t)] = A [y1(t) ... yn(t)] = A Y(t),
Y(t) adj Y(t) = det Y(t) I_n = W(t) I_n.  □

From Lemma 6.19, it is clear that the Wronskian W(t) is an exponential function of the form W(t) = c e^{tr(A) t} with the initial condition W(0) = c. It implies that the value of W(t) is either zero for all t or never zero on I, depending on whether or not c = 0.
Thus, we have the following lemma.

Lemma 6.20 Let {y1, y2, ..., yn} be a set of n solutions of the system y' = Ay on I, where A is any n × n matrix. Then the following are equivalent.
(1) The vectors y1, y2, ..., yn are linearly independent.
(2) W(t) ≠ 0 for some t, that is, y1(t), y2(t), ..., yn(t) are linearly independent in R^n for some t.
(3) W(t) ≠ 0 for all t, that is, y1(t), y2(t), ..., yn(t) are linearly independent in R^n for all t.

(III) (Dimension of the solution space) Clearly, the set of all solutions of y'(t) = A y(t) is a vector space. In fact, for any two solutions y1, y2 of the system y'(t) = A y(t), we have
(c1 y1 + c2 y2)' = c1 y1' + c2 y2' = c1 A y1 + c2 A y2 = A(c1 y1 + c2 y2).
Thus, c1 y1 + c2 y2 is also a solution for any constants ci's.
Let {e1, e2, ..., en} be the standard basis for R^n. For each ei, there exists a unique solution yi of y'(t) = A y(t) such that yi(0) = ei, by (I). All such solutions y1, y2, ..., yn are linearly independent by Lemma 6.20. Moreover, they generate the vector space of solutions. To show this, let y be any solution. Then the vector y(0) can be written as a linear combination of the standard basis vectors: say, y0 = c1 e1 + c2 e2 + ... + cn en. Then, by the uniqueness of the solution in (I), we have
y(t) = c1 y1(t) + c2 y2(t) + ... + cn yn(t).
This proves the following theorem.

Theorem 6.21 For any n × n matrix A, the set of solutions of a system y' = Ay on I is an n-dimensional vector space.
Definition 6.5 A basis for the solution space is called a fundamental set of solutions.
The solution expressed as a linear combination of a fundamental set is called a general
solution of the system . The solution determined by a given initial condition is called
a particular solution.

By Theorem 6.21, it is enough to find n linearly independent solutions to solve a system y' = Ay on I, and then its general solution is just a linear combination of those linearly independent solutions. This may be considered in three steps: (1) A is diagonal, (2) A is diagonalizable, and finally (3) A is any square matrix.
(1) First suppose that A is a diagonal matrix D with diagonal entries λ1, ..., λn. Then y'(t) = A y(t) is
[y1'(t); y2'(t); ...; yn'(t)] = [λ1 0 ... 0; 0 λ2 ... 0; ...; 0 0 ... λn] [y1(t); y2(t); ...; yn(t)].
This system is just n simple linear differential equations of the first order:
yi'(t) = λi yi(t),  i = 1, 2, ..., n,
and their solutions are trivial: yi(t) = ci e^{λi t} with a constant ci for i = 1, 2, ..., n.
On the other hand, the diagonal matrix A has n linearly independent eigenvectors e1, ..., en belonging to the eigenvalues λ1, ..., λn, respectively. One can see that yi(t) = e^{λi t} ei is a solution of the system y'(t) = A y(t) for i = 1, 2, ..., n. Moreover, at t = 0, the solution set {y1(0), ..., yn(0)} = {e1, ..., en} is linearly independent. Hence, by Lemma 6.20, a general solution of the system y'(t) = A y(t) is
y(t) = c1 y1(t) + ... + cn yn(t) = c1 e^{λ1 t} e1 + ... + cn e^{λn t} en

with constants ci's. Or, in matrix notation,
y(t) = [y1(t); ...; yn(t)] = [e^{λ1 t} 0 ... 0; 0 e^{λ2 t} ... 0; ...; 0 0 ... e^{λn t}] [c1; ...; cn] = e^{tD} y0,
where e^{tD} is, by definition, the diagonal matrix with diagonal entries e^{λ1 t}, ..., e^{λn t}.

Example 6.15 (A predator-prey problem as a model of a system of differential equations) One of the fundamental problems of mathematical ecology is the predator-prey problem. Let x(t) and y(t) denote the populations at time t of two species in a specified region, one of which, x, preys upon the other, y. For example, x(t) and y(t) may
be the number of sharks and small fishes, respectively, in a restricted region of the
ocean. Without the small fishes (preys) the population of the sharks (predators) will
decrease, and without the sharks the population of the fishes will increase. A math-
ematical model showing their interactions and whether an ecological balance exists
can be written as the following system of differential equations:

x'(t) = a x(t) - b x(t)y(t),
y'(t) = -c y(t) + d x(t)y(t).

In this equation, the coefficients a and c are the birth rate of x and the death rate of y,
respectively. The nonlinear x(t)y(t) terms in the two equations mean the interaction
of the two species such as the number of contacts per unit time between predators
and prey, so the coefficients b and d are the measures of the effect of the interaction
between them . A study of this general system of differential equations leads to very
interesting developments in the theory of dynamical systems and can be found in any
book on ordinary differential equations . Here, we restrict our study to the case of x
and y very small, i.e., near the origin in the plane. In this case, one can neglect the nonlinear terms in the equations, so the system is assumed to be given as follows:
[x'(t); y'(t)] = [a 0; 0 -c] [x(t); y(t)].

Thus, the eigenvalues are λ1 = a and λ2 = -c with associated eigenvectors e1 and e2, respectively. Therefore, its general solution is
[x(t); y(t)] = [c1 e^{at}; c2 e^{-ct}] = [e^{at} 0; 0 e^{-ct}] [c1; c2] = c1 e^{at} e1 + c2 e^{-ct} e2.  □

(2) We next assume that the matrix A in the system y'(t) = A y(t) is diagonalizable, that is, it has n linearly independent eigenvectors v1, ..., vn belonging to the eigenvalues λ1, ..., λn, respectively. Then the basis-change matrix Q = [v1 ... vn] diagonalizes A and
A = Q D Q^{-1} = Q diag(λ1, ..., λn) Q^{-1}.
Thus the system becomes Q^{-1} y' = D Q^{-1} y. If we take a change of variables by the new vector x = Q^{-1} y (or y = Q x), then we obtain a new system
x' = D x,
with an initial condition x0 = Q^{-1} y0 = (c1, ..., cn). Since D is diagonal, its general solution is
x = e^{tD} x0 = c1 e^{λ1 t} e1 + ... + cn e^{λn t} en.
Now, a general solution of the original system y' = Ay is
y = Q x = Q e^{tD} Q^{-1} y0
  = [v1 ... vn] diag(e^{λ1 t}, ..., e^{λn t}) [c1; ...; cn]
  = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2 + ... + cn e^{λn t} vn.


Remark: One can check directly that each vector function yi(t) = e^{λi t} vi is the particular solution of the system with the initial condition yi(0) = vi for i = 1, ..., n. Since the vectors yi(0) = vi, i = 1, ..., n, are linearly independent, the solutions e^{λi t} vi, i = 1, ..., n, form a fundamental set of solutions.
Thus, we have obtained the following theorem :
Theorem 6.22 Let A be a diagonalizable n × n matrix with n linearly independent eigenvectors v1, v2, ..., vn belonging to the eigenvalues λ1, λ2, ..., λn, respectively. Then, a general solution of the system of linear differential equations y'(t) = A y(t) is
y(t) = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2 + ... + cn e^{λn t} vn
with constants c1, c2, ..., cn.


Note that a particular solution can be obtained from a general solution by deter-
mining the coefficients depending on the given initial condition.
Example 6.16 (y' = Ay with a diagonalizable matrix A) Solve the system of linear differential equations
y1' =  5y1 -  4y2 +  4y3,
y2' = 12y1 - 11y2 + 12y3,
y3' =  4y1 -  4y2 +  5y3.

Solution: In matrix form, the system may be written as y' = Ay with
A = [5 -4 4; 12 -11 12; 4 -4 5].
The eigenvalues of A are λ1 = λ2 = 1 and λ3 = -3, and their associated eigenvectors are v1 = (1, 1, 0), v2 = (-1, 0, 1) and v3 = (1, 3, 1), respectively, which are linearly independent (see Problem 6.9). Hence, by Theorem 6.22, its general solution is
y(t) = c1 e^t (1, 1, 0) + c2 e^t (-1, 0, 1) + c3 e^{-3t} (1, 3, 1).  □
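The eigenvector formula can be checked against a numerical ODE solver; the following sketch is illustrative only (the initial condition is an arbitrary assumption, and SciPy is not used in the text).

    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[ 5.0,  -4.0,  4.0],
                  [12.0, -11.0, 12.0],
                  [ 4.0,  -4.0,  5.0]])
    Q = np.array([[1.0, -1.0, 1.0],    # columns: v1, v2, v3
                  [1.0,  0.0, 3.0],
                  [0.0,  1.0, 1.0]])
    lam = np.array([1.0, 1.0, -3.0])

    y0 = np.array([1.0, 2.0, 3.0])     # assumed initial condition
    c = np.linalg.solve(Q, y0)         # constants c1, c2, c3

    def y_closed(t):
        return Q @ (c * np.exp(lam * t))

    sol = solve_ivp(lambda t, y: A @ y, (0.0, 1.0), y0, rtol=1e-10, atol=1e-12)
    print(np.allclose(sol.y[:, -1], y_closed(sol.t[-1]), atol=1e-6))   # True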

(3) A system y' = Ay of linear differential equations with a non-diagonalizable


matrix A will be discussed in Section 6.5.

6.4 Exponential matrices


Just like the Maclaurin series of the exponential function e^x, we define the exponential of a matrix.

Definition 6.6 For any square matrix A, the exponential matrix of A is defined as the series
e^A = Σ_{k=0}^{∞} A^k/k! = I + A + A^2/2! + A^3/3! + ... .

That is, the exponential matrix e^A is defined to be the (entry-wise) limit of the sequence of partial sums: [e^A]_{ij} = lim_{m→∞} [ Σ_{k=0}^{m} A^k/k! ]_{ij} for all i, j.

Example 6.17 If D = diag(λ1, ..., λn), then D^k = diag(λ1^k, ..., λn^k) for any k ≥ 0. Thus, the exponential matrix e^D is
e^D = Σ_{k=0}^{∞} D^k/k! = diag( Σ_{k=0}^{∞} λ1^k/k!, ..., Σ_{k=0}^{∞} λn^k/k! ) = diag(e^{λ1}, ..., e^{λn}),
which coincides with the definition given on page 230.  □


Practically, the computation of e^A involves the computation of the powers A^k for all k ≥ 1, and hence it is not easy in general. Nevertheless, one can show that the limit e^A exists for any square matrix A.

Theorem 6.23 For any square matrix A, the matrix e^A exists. In other words, each (i, j)-entry of e^A is convergent.

Proof: Since A has only n^2 entries, there is a number M such that |a_ij| ≤ M for all (i, j)-entries a_ij of A. Then one can easily show that |[A^k]_ij| ≤ n^{k-1} M^k for all k and i, j. Thus
|[e^A]_ij| ≤ Σ_{k=0}^{∞} (1/k!) n^{k-1} M^k = (1/n) e^{nM},
so by the comparison test, each entry of e^A = Σ_{k=0}^{∞} A^k/k! is absolutely convergent for any square matrix A.  □

Example 6.18 If A = [6 ; l then

eA = I+A + ~ A +.. .
2
2

= [6 ~]+[~ ;]+~[~ 1~3]+ ... = [~~l


It is a good exercise to calculate the missing entry * directly from the definition. 0

Problem 6.16 Let A = [~ ~] . Find k~~ A k if it exists. (Note that the matrix A is not
diagonalizable.)

The following theorem is sometimes helpful to compute eA.

Theorem 6.24 Let A1, A2, A3, ... be a sequence of m × n matrices such that lim_{k→∞} Ak = L. Then
lim_{k→∞} B Ak = B L  and  lim_{k→∞} Ak C = L C

for any matrices B and C for which the products can be defined.

Proof: By comparing the (i, j)-entries of both sides,
lim_{k→∞} [B Ak]_{ij} = lim_{k→∞} ( Σ_{l=1}^{m} [B]_{il} [Ak]_{lj} ) = Σ_{l=1}^{m} [B]_{il} lim_{k→∞} [Ak]_{lj} = Σ_{l=1}^{m} [B]_{il} [L]_{lj} = [BL]_{ij},
we get lim_{k→∞} B Ak = B L. Similarly, lim_{k→∞} Ak C = L C.  □

For example, if A is a diagonalizable matrix and Q^{-1} A Q = D is diagonal for some invertible matrix Q, then, for each integer k ≥ 0, A^k = Q D^k Q^{-1} and
lim_{k→∞} A^k = Q (lim_{k→∞} D^k) Q^{-1} = Q diag( lim_{k→∞} λ1^k, ..., lim_{k→∞} λn^k ) Q^{-1}.
Thus, lim_{k→∞} A^k exists if and only if lim_{k→∞} λi^k exists for i = 1, 2, ..., n.
Also, by Theorem 6.24,
e^A = Q e^D Q^{-1} = Q diag(e^{λ1}, ..., e^{λn}) Q^{-1},
whose computation is easy.

Example 6.19 (Computing e^A for a diagonalizable matrix A) Let A = (1/2)[3 1; 1 3]. Then its eigenvalues are 1 and 2 with associated eigenvectors u1 = [-1; 1] and u2 = [1; 1], respectively. Thus A = Q D Q^{-1} with D = [1 0; 0 2] and Q = [-1 1; 1 1]. Therefore,
e^A = Q e^D Q^{-1} = (1/2) [e + e^2  e^2 - e;  e^2 - e  e + e^2].
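A numerical cross-check of this computation (an illustration, not from the text; scipy.linalg.expm is used only as an independent reference):

    import numpy as np
    from scipy.linalg import expm

    A = 0.5 * np.array([[3.0, 1.0],
                        [1.0, 3.0]])

    w, Q = np.linalg.eig(A)                            # eigenvalues and eigenvectors
    eA = Q @ np.diag(np.exp(w)) @ np.linalg.inv(Q)     # Q e^D Q^{-1}

    e = np.e
    target = 0.5 * np.array([[e + e**2, e**2 - e],
                             [e**2 - e, e + e**2]])
    print(np.allclose(eA, target), np.allclose(eA, expm(A)))   # True True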

The following theorem shows some basic properties of the exponential matrices,
whose proofs are easy, and are left for exercises.

Theorem 6.25 (1) e^{A+B} = e^A e^B provided that AB = BA.
(2) e^A is invertible for any square matrix A, and (e^A)^{-1} = e^{-A}.
(3) e^{Q^{-1} A Q} = Q^{-1} e^A Q for any invertible matrix Q.
(4) If λ1, λ2, ..., λn are the eigenvalues of a matrix A with associated eigenvectors v1, v2, ..., vn, then the e^{λi}'s are the eigenvalues of e^A with the same associated eigenvectors vi's for i = 1, 2, ..., n. Moreover, det e^A = e^{λ1} ⋯ e^{λn} = e^{tr(A)} ≠ 0 for any square matrix A.

Problem 6.17 Prove Theorem 6.25.

Problem 6.18 Finish the computation of e A for the matrix A in Example 6.18.

Problem 6.19 Prove that if A is skew-symmetric, then e A is orthogonal.

In general, the computation of e^A is not easy at all if A is not diagonalizable. However, if A is a triangular matrix, it is relatively easy, as shown in the following example.

Example 6.20 (Computing e^A for a triangular matrix of the form A = λI + N)
For A = [2 3; 0 2], compute e^A.

Solution: Write A = 2I + N with N = [0 3; 0 0]. Since (2I)N = N(2I), by Theorem 6.25(1), e^A = e^{2I} e^N. From the direct computation of the series expansion, we get e^{2I} = e^2 I. Moreover, since N^k = 0 for k ≥ 2, e^N = I + N + N^2/2! + ... = I + N = [1 3; 0 1]. Thus,
e^A = e^2 (I + N) = e^2 [1 3; 0 1] = [e^2  3e^2; 0  e^2].  □

Problem 6.20 Compute e^A for A = [2 3 0; 0 2 3; 0 0 2].
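The identity e^A = e^2(I + N) used in Example 6.20 (and needed again for Problem 6.20) is easy to confirm numerically; the sketch below is an illustration, not code from the text.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[2.0, 3.0],
                  [0.0, 2.0]])
    N = A - 2.0 * np.eye(2)                 # nilpotent part: N @ N == 0

    eA = np.exp(2.0) * (np.eye(2) + N)      # e^2 (I + N)
    print(np.allclose(eA, expm(A)))         # True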

6.5 Applications continued

6.5.1 Linear differential equations II

One of the most prominent applications of exponential matrices is to the theory


of linear differential equations. In this section, we show that a general solution of y'(t) = A y(t) is of the form y(t) = e^{tA} y0.

Lemma 6.26 For any t ∈ R and any square matrix A, the exponential matrix
e^{tA} = I + tA + (t^2/2!)A^2 + (t^3/3!)A^3 + ...
is a differentiable function of t, and (d/dt) e^{tA} = A e^{tA}.

Proof: By absolute convergence of the series expansion of e^{tA}, one can use term-by-term differentiation, i.e.,
(d/dt) e^{tA} = (d/dt)( I + tA + (t^2/2!)A^2 + (t^3/3!)A^3 + ... )
             = A + tA^2 + (t^2/2!)A^3 + ... = A e^{tA}.  □

As a direct consequence of Lemma 6.26, one can see that y(t) = e^{tA} y0 is a solution of the linear differential equation y' = Ay. In fact, by taking the initial vector y0 to be each of the standard basis vectors ei, 1 ≤ i ≤ n, the n columns of e^{tA} become solutions, and they are clearly linearly independent, so they form a fundamental set of solutions. Hence, we have the following.

Theorem 6.27 For any n × n matrix A, the linear differential equation y' = Ay has a general solution
y(t) = e^{tA} y0.

In particular, if A is diagonalizable, say Q^{-1} A Q = D is diagonal with a basis-change matrix Q = [v1 ... vn] consisting of n linearly independent eigenvectors of A belonging to the eigenvalues λi's, then a general solution of a system y' = Ay is
y(t) = e^{tA} y0 = e^{t Q D Q^{-1}} y0 = Q e^{tD} Q^{-1} y0
     = [v1 ... vn] diag(e^{λ1 t}, ..., e^{λn t}) [c1; ...; cn]
     = c1 e^{λ1 t} v1 + c2 e^{λ2 t} v2 + ... + cn e^{λn t} vn.
In fact, {yi(t) = e^{λi t} vi : i = 1, ..., n} forms a fundamental set of solutions, and the constants (c1, ..., cn) = Q^{-1} y0 can be determined if an initial condition is given. Note that this just rephrases Theorem 6.22.

Example 6.21 (y' = Ay for a diagonalizable matrix A) Solve the system
[y1'(t); y2'(t)] = [0 1; 1 0] [y1(t); y2(t)],
with initial conditions y1(0) = 1, y2(0) = 0.


Solution: (1) The eigenvalues of A = [0 1; 1 0] are λ1 = 1 and λ2 = -1 with associated eigenvectors v1 = [1; 1] and v2 = [1; -1], respectively.
(2) By setting Q = [v1 v2] = [1 1; 1 -1], we get Q^{-1} A Q = [1 0; 0 -1] = D.
(3) A general solution y(t) = e^{tA} y0 is
e^{tA} y0 = e^{t Q D Q^{-1}} y0 = Q e^{tD} Q^{-1} y0 = [1 1; 1 -1] [e^t 0; 0 e^{-t}] Q^{-1} y0 = c1 e^t [1; 1] + c2 e^{-t} [1; -1]
with constants c1, c2. The initial conditions y1(0) = 1, y2(0) = 0 determine c1 = c2 = 1/2, so that
y(t) = (1/2) e^t [1; 1] + (1/2) e^{-t} [1; -1].  □

Problem 6.21 Solvethe system { Y}


Y2

I
Problem 6.22 Solve the system
y1' = 4y1 + y3,
y2' = -2y1 + y2,
y3' = -2y1 + y3,
and find the particular solution of the system satisfying the initial conditions y1(0) = -1, y2(0) = 1, y3(0) = 0.

If A is not diagonalizable, then it is not easy in general to compute e^{tA} directly. However, one can still reduce A to a simpler form called the Jordan canonical form, which will be introduced in Chapter 8, and then the computation of e^{tA} becomes relatively easy. The following example shows that the computation of e^{tA} is possible for some triangular matrices A even if they are not diagonalizable. A general case will be treated again in Chapter 8.

Example 6.22 (y' = Ay for a triangular matrix A = λI + N) Solve the system y' = Ay of linear differential equations with initial condition y(0) = y0, where
A = [λ 1; 0 λ],  y0 = [a; b].

Solution: First note that A has an eigenvalue λ of multiplicity 2 and is not diagonalizable. One can rewrite A as
A = λI + N  with  N = [0 1; 0 0].
Then, by the same argument as in Example 6.20,
e^{tA} = e^{t(λI + N)} = e^{λt} e^{tN} = e^{λt} [1 t; 0 1].
Therefore, the solution is
y = e^{tA} y0 = e^{λt} [1 t; 0 1] [a; b] = [(a + bt) e^{λt}; b e^{λt}] = e^{λt} [a; b] + t e^{λt} [b; 0].
In terms of components, y1 = (a + bt) e^{λt}, y2 = b e^{λt}.  □

Example 6.23 (y' = Ay with A having complex eigenvalues) Find a general solution of the system y' = Ay, where
A = [a -b; b a].

Solution: Note that the eigenvalues of A are a ± ib, which are not real. However, one can compute e^{tA} directly without using diagonalization. We first write A as
A = [a -b; b a] = a [1 0; 0 1] + b [0 -1; 1 0] = aI + bJ.
Then clearly IJ = JI and e^{tA} = e^{atI + btJ} = e^{at} e^{btJ}. Since
J^2 = [-1 0; 0 -1] = -I,  J^3 = [0 1; -1 0] = -J,
one can deduce J^k = J^{k+4} for all k = 1, 2, ..., and
e^{btJ} = I + (bt)J/1! + (bt)^2 J^2/2! + (bt)^3 J^3/3! + (bt)^4 J^4/4! + ...
        = [ 1 - (bt)^2/2! + (bt)^4/4! - ... ] I + [ (bt) - (bt)^3/3! + (bt)^5/5! - ... ] J
        = [cos bt  -sin bt;  sin bt  cos bt]
for any constants b and t. Thus, a general solution of y' = Ay is
y = e^{tA} c = e^{at} e^{btJ} c = e^{at} [cos bt  -sin bt;  sin bt  cos bt] [c1; c2].
In terms of components,
y1 = e^{at}(c1 cos bt - c2 sin bt),
y2 = e^{at}(c1 sin bt + c2 cos bt).  □
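Again as an illustration (not from the text; the sample values of a, b, t are assumptions), the rotation form of e^{tA} can be checked numerically:

    import numpy as np
    from scipy.linalg import expm

    a, b, t = 0.3, 2.0, 1.7                 # arbitrary sample values
    A = np.array([[a, -b],
                  [b,  a]])

    R = np.array([[np.cos(b*t), -np.sin(b*t)],
                  [np.sin(b*t),  np.cos(b*t)]])
    print(np.allclose(expm(t*A), np.exp(a*t) * R))   # True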

Problem 6.23 Solve the system y' = Ay with initial condition y(0) = y0 by computing e^{tA} y0

U~ j l
for

(1) A ~ [~ =;J. YO ~ [ : 1 (2) A ~ Yo ~ [:]

Remark: Consider the n-th order homogeneous linear differential equation
d^n y/dt^n + a1 d^{n-1}y/dt^{n-1} + a2 d^{n-2}y/dt^{n-2} + ... + an y = 0,
where the ai are constants and y(t) is a differentiable function on an interval I = (a, b). A fundamental theorem of differential equations says that such a differential equation has a unique solution y(t) on I satisfying a given initial condition: for a point t0 in I and arbitrary constants c0, ..., c_{n-1}, there is a unique solution y = y(t) of the equation such that y(t0) = c0, y'(t0) = c1, ..., y^{(n-1)}(t0) = c_{n-1}. This can be confirmed as follows: Let

y1 = y,
y2 = y' = dy1/dt,
y3 = y'' = dy2/dt,
  ⋮
yn = y^{(n-1)} = dy_{n-1}/dt.

Then the original homogeneous linear differential equation is nothing but
dyn/dt = d^n y/dt^n = -a1 yn - a2 y_{n-1} - ... - a_{n-1} y2 - an y1.
In matrix notation,
y'(t) = [y1'; y2'; ...; yn'] = [0 1 0 ... 0; 0 0 1 ... 0; ...; 0 0 0 ... 1; -an -a_{n-1} ... -a2 -a1] [y1; y2; ...; yn] = A y(t),
which is just a system of linear differential equations with a companion matrix A. It is treated in Section 6.3.3 (see Theorem 6.22). Therefore, the solution of the original differential equation is just the first component of the solution of y'(t) = A y(t), which is of the form
y(t) = c1 e^{λ1 t} + ... + cn e^{λn t}
if A has distinct eigenvalues λ1, ..., λn. In Chapter 8, we will discuss the case of eigenvalues with multiplicity.

6.6 Diagonalization of linear transformations

Recall that two matrices are similar if and only if they can be the matrix representations
of the same linear transformation, and similar matrices have the same eigenvalues . In
this section, we aim to find a basis α so that the matrix representation of a linear transformation with respect to α is a diagonal matrix. First, we start with the eigenvalues
and the eigenvectors of a linear transformation.

Definition 6.7 Let V be an n-dimensional vector space, and let T : V → V be a linear transformation on V. Then the eigenvalues and eigenvectors of T are defined by the same equation, Tx = λx, with a nonzero vector x ∈ V.

Practically, the eigenvalues and eigenvectors of T can be computed as follows: Let α = {v1, v2, ..., vn} be a basis for V. Then the natural isomorphism Φ : V → R^n identifies the associated matrix A = [T]_α : R^n → R^n with the linear transformation T : V → V via the following commutative diagram.

              T
        V ---------> V
        |            |
      Φ |            | Φ
        v            v
       R^n --------> R^n
           A = [T]_α

Now, the eigenvalues of T are those of its matrix representation A = [T]_α, because [T]_α is similar to [T]_β for any other basis β for V, and hence their eigenvalues are the same by Theorem 6.3. For eigenvectors of T, note that x = (x1, x2, ..., xn) ∈ R^n is an eigenvector of A belonging to λ (Ax = λx) if and only if Φ^{-1}(x) = v = x1 v1 + x2 v2 + ... + xn vn ∈ V is an eigenvector of T (T(v) = λv), because the commutativity of the diagram shows
[T(v)]_α = [T]_α [v]_α = Ax = λx = [λv]_α.
Therefore, if x1, x2, ..., xk are linearly independent eigenvectors of A = [T]_α, then Φ^{-1}(x1), Φ^{-1}(x2), ..., Φ^{-1}(xk) are linearly independent eigenvectors of T. Hence, the linear transformation T has a diagonal matrix representation if and only if it has n linearly independent eigenvectors, by Theorem 6.7.
The following example illustrates how to find a diagonal matrix representation of
a linear transformation on a vector space .

Example 6.24 Let T : P2(R) → P2(R) be the linear transformation defined by
(Tf)(x) = f(x) + x f'(x) + f'(x).
Find a basis for P2(R) with respect to which the matrix of T is diagonal.

Solution: First of all, we find the eigenvalues and the eigenvectors of T. Take a basis for the vector space P2(R), say α = {1, x, x^2}. Then the matrix of T with respect to α is
[T]_α = [1 1 0; 0 2 2; 0 0 3],
which is upper triangular. Hence, the eigenvalues of T are λ1 = 1, λ2 = 2 and λ3 = 3. By a simple computation, one can verify that the vectors x1 = (1, 0, 0), x2 = (1, 1, 0) and x3 = (1, 2, 1) are eigenvectors of [T]_α in R^3 belonging to the eigenvalues λ1, λ2, λ3, respectively. Their associated eigenvectors of T in P2(R) are f1(x) = 1, f2(x) = 1 + x, f3(x) = 1 + 2x + x^2, respectively. Since the eigenvalues λ1, λ2, λ3 are all distinct, the eigenvectors {x1, x2, x3} of [T]_α are linearly independent and so is β = {f1, f2, f3} in P2(R). Thus, each fi is a basis for the eigenspace E(λi) of T belonging to λi for i = 1, 2, 3, and the basis-change matrix is
Q = [id]_β = [x1 x2 x3] = [[f1]_α [f2]_α [f3]_α] = [1 1 1; 0 1 2; 0 0 1].
Hence, by changing the basis α to β, the matrix representation of T is a diagonal matrix:
[T]_β = [id]_β^{-1} [T]_α [id]_β = Q^{-1} [T]_α Q = [1 0 0; 0 2 0; 0 0 3] = D.  □
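Numerically, the change of basis in Example 6.24 amounts to the matrix computation below (an illustrative sketch, not from the text).

    import numpy as np

    T_alpha = np.array([[1.0, 1.0, 0.0],     # matrix of T with respect to {1, x, x^2}
                        [0.0, 2.0, 2.0],
                        [0.0, 0.0, 3.0]])
    Q = np.array([[1.0, 1.0, 1.0],           # coordinate vectors of f1, f2, f3
                  [0.0, 1.0, 2.0],
                  [0.0, 0.0, 1.0]])

    D = np.linalg.inv(Q) @ T_alpha @ Q
    print(np.round(D, 12))                    # diag(1, 2, 3)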

Note that, if T = A is an n × n square matrix written in column vectors, A = [c1 ... cn], then the linear transformation A : R^n → R^n is given by A(ei) = ci, i = 1, ..., n, so that A itself is just the matrix representation with respect to the standard basis α = {e1, ..., en} for R^n, say A = [A]_α. Now if there is a basis β = {x1, ..., xn} of n linearly independent eigenvectors of A, then the natural isomorphism Φ : R^n → R^n defined by Φ(xi) = ei is simply a change of basis by the basis-change matrix Q = [id]_β, and the matrix representation of A with respect to β is a diagonal matrix:
[A]_β = Q^{-1} [A]_α Q = Q^{-1} A Q = D.
Problem 6.24 Let T be the linear transformation on R^3 defined by
T(x, y, z) = (4x + z, 2x + 3y + 2z, x + 4z).
Find all the eigenvalues and their eigenvectors of T and diagonalize T.

Problem 6.25 Let M2×2(R) be the vector space of all real 2 × 2 matrices and let T be the linear transformation on M2×2(R) defined by
T [a b; c d] = [a+b+d  a+b+c; b+c+d  a+c+d].
Find the eigenvalues and a basis for each of the eigenspaces of T, and diagonalize T.

Problem 6.26 Let T : P2(R) → P2(R) be the linear transformation defined by T(f(x)) = f(x) + x f'(x). Find all the eigenvalues of T and find a basis α for P2(R) so that [T]_α is a diagonal matrix.

6.7 Exercises
6.1. Find the eigenvalues and eigenvectors for the given matrix, if they exist.

(1) [_~ ~ l (2)


[
311 -i l
1
- 33 ] ,

(3) [! ~ ! ~], [~1 ~ ~ ~], (4)

[-! ~~ -! ~~], [i -1 ~ j].


1010 III

(5) (6)
-1 0 -1 2 0 0 2 1 .

n
6.2. Find the characteristic polynomial, eigenvalue s and eigenvectors of the matrix

-2 0
A =[ 3 2
4 -1

6.3. Show that a 2 × 2 matrix A = [a b; c d] has
(1) two distinct real eigenvalues if (a - d)^2 + 4bc > 0,
(2) one eigenvalue if (a - d)^2 + 4bc = 0,
(3) no real eigenvalues if (a - d)^2 + 4bc < 0,
(4) only real eigenvalues if it is symmetric (i.e., b = c).
6.4. Suppose that a 3 x 3 matrix A has eigenvalues -1, 0, 1 with eigenvectors u, v, w,
respectively. Describe the null space N(A) , and the column space C(A) .
6.5. For any two matrices A and B, show that
(1) adj AB = adj B . adj A;
(2) adj QAQ-l = Q(adj A)Q-l for any invertible matrix Q;
(3) if AB = BA, then (adj A)B = B(adj A) .

(Hint: It was mentioned for any two invertible matrices A and B in Problem 2.14.)
6.6. If a 3 × 3 matrix A has eigenvalues 1, 2, 3, what are the eigenvectors of B = (A - I)(A - 2I)(A - 3I)?
6.7. Show that any 2 × 2 skew-symmetric nonzero matrix has no real eigenvalue.
6.8. Find a 3 × 3 matrix that has the eigenvalues λ1 = 1, λ2 = 2, λ3 = 3 with the associated eigenvectors x1 = (2, -1, 0), x2 = (-1, 2, -1), x3 = (0, -1, 2).
6.9. Let P be the projection matrix that projects R^n onto a subspace W. Find the eigenvalues and the eigenspaces for P.
6.10. Let u, v be n × 1 column vectors, and let A = uv^T. Show that u is an eigenvector of A, and find the eigenvalues and the eigenvectors of A.
6.11. Show that if λ is an eigenvalue of an idempotent n × n matrix A (i.e., A^2 = A), then λ must be either 0 or 1.
6.12. Prove that if A is an idempotent matrix, then tr(A) = rank A.
6.13. Let A = [a_ij] be an n × n matrix with eigenvalues λ1, ..., λn. Show that
λ_j = a_jj + Σ_{i≠j} (a_ii - λ_i)  for j = 1, ..., n.
6.14. Prove that if two diagonalizable matrices A and B have the same eigenvectors (i.e., there exists an invertible matrix Q such that both Q^{-1}AQ and Q^{-1}BQ are diagonal; such matrices A and B are said to be simultaneously diagonalizable), then AB = BA. In fact, the converse is also true. (See Exercise 7.17.) Prove the converse with an assumption that the eigenvalues of A are all distinct.
6.15. Let D : P3(R) → P3(R) be the differentiation defined by Df(x) = f'(x) for f ∈ P3(R). Find all eigenvalues and eigenvectors of D and of D^2.
6.16. Let T : P2(R) → P2(R) be the linear transformation defined by
T(a2 x^2 + a1 x + a0) = (a0 + a1)x^2 + (a1 + a2)x + (a0 + a2).
Find a basis for P2(R) with respect to which the matrix representation for T is diagonal.
6.17. Determine whether or not each of the following matrices is diagonalizable.

(1) [ i b -;],
-I 2 3
(2) [i ~ ~] ,
0 I 2
(3) [ ~ ~ ~]
-2 0 -I
.

6.18. Find an orthogonal matrix Q and a diagonal matrix D such that Q T A Q = D for
(1) A = [-;
4
-~ ~ ] , (2) A = [;
2 -3 0 2 3
; ~], (3) A = [b 0 1 1
~ ~] .
6.19. Calculate A lOx for A = [bo ~ 6 -2
=;] , x= [ ~] .
7
6.20. For n ≥ 1, let a_n denote the number of subsets of {1, 2, ..., n} that contain no consecutive integers. Find the number a_n for all n ≥ 1.
6.21. Find a general solution of each of the following recurrence relations.
(1) x_n = 6x_{n-1} - 11x_{n-2} + 6x_{n-3}, n ≥ 3,
(2) x_n = 3x_{n-1} - 4x_{n-2} + 2x_{n-3}, n ≥ 3,
(3) x_n = 4x_{n-1} - 6x_{n-2} + 4x_{n-3} - x_{n-4}, n ≥ 4.

6.22. LetA = [0~6 0;3] . Find a value X so that A has an eigenvalue A = 1. For X() = (1,1),
calculate lim Xb where Xk
k-..oo
= AXk-l , k = 1, 2, . ...
6.23 . Compute e A for

(1) A = [~ ~ l (2) A = [i ~ l
6.24 . In 2000, the initial status of the car owners in a city was reported as follows: 40% of the
car owners drove large cars, 20% drove medium-sized cars, and 40% drove small cars. In
2005, 70% of the large-car owners in 2000 still owned large cars, but 30% had changed to
a medium-sized car. Of those who owned medium-sized cars in 2000, 10% had changed to
large cars , 70% continued to drive medium-sized cars, and 20% had changed to small cars .
Finally, of those who owned the small cars in 2000, 10% had changed to medium-sized
cars and 90% still owned small cars in 2005. Assuming that these trends continue, and
that no car owners are born, die or otherwise add realism to the problem, determine the
percentage of car owners who will own cars of each size in 2035.

6.25 . Let A = [~ ; J
(1) Compute eA directly from the expansion.
(2) Compute eA by diagonalizing A.
6.26. Let A(t) be a matrix whose entries are all differentiable functions in t and invertible for all t. Compute the following:
(1) d/dt (A(t)^3),  (2) d/dt (A(t)^{-1}).
6.27. Solve y' = Ay, where
(1)A=
-6
-1
24
8
[ 2 -12 -:J ODd y(l) ~
(2) A = [; -~] and y(O) = [ ~ l
I
y' - Yl Y2 + 2Y3
6.28. Solve the system Y~:: 3Yl + 4Y3
Y3 = 2Yl + Y2
with initial conditions Yl (0) = 0, Y2(0) = 2, Y3(0) = 1.
6.29. Let f(λ) = det(λI - A) be the characteristic polynomial of A. Evaluate f(A) for

(1) A = [;
I
~ ~] ,
1 3
(2) A =[ ~
-1
; -
1
i].
4
In fact, f(A) = 0 for any square matrix A and its characteristic polynomial f(λ). (This is the Cayley-Hamilton theorem.)
6.30. Determine whether the following statements are true or false, in general, and justify your
answers.

(1) If B is obtained from A by interchanging two rows, then B is similar to A .


(2) If λ is an eigenvalue of A of multiplicity k, then there exist k linearly independent eigenvectors belonging to λ.
(3) If A and Bare diagonalizable, so is AB .
(4) Every invertible matrix is diagonalizable.
(5) Every diagonalizable matrix is invertible.
(6) Interchanging the rows of a 2 x 2 matrix reverses the signs of its eigenvalues.
(7) A matrix A cannot be similar to A + I.
(8) Each eigenvalue of A + B is a sum of an eigenvalue of A and one of B.
(9) The total sum of eigenvalues of A + B equals the sum of all the eigenvalues of A and
of those of B.
(10) A sum of two eigenvectors of A is also an eigenvector of A.
(11) Any two similar matrices have the same eigenvectors .
(12) For any square matrix A, det e^A = e^{det A}.
7
Complex Vector Spaces

7.1 The n-space C^n and complex vector spaces


So far, we have been dealing with matrices having only real entries and vector spaces
with real scalars. Also , in any system of linear (difference or differential) equations,
we assumed that the coefficients of an equation are all real. However, for many ap-
plications of linear algebra, it is desirable to extend the scalars to complex numbers.
For example, by allowing complex scalars, any polynomial of degree n (even with
complex coefficients) has n complex roots counting multiplicity. (This is well known
as the fundamental theorem of algebra). By applying it to a characteristic polynomial
of a matrix, one can say that all the square matrices of order n will have n eigen-
values counting multiplicity. For instance, the matrix A = [0 -1; 1 0] has no real eigenvalues, but it has two complex eigenvalues λ = ±i. Thus, it is indispensable
to work with complex numbers to find the full set of eigenvalues and eigenvectors.
Therefore, it is natural to extend the concept of real vector spaces to that of complex
vector spaces, and develop the basic properties of complex vector spaces.
The complex n-space C^n is the set of all ordered n-tuples (z1, z2, ..., zn) of complex numbers:
C^n = {(z1, z2, ..., zn) : zi ∈ C, i = 1, 2, ..., n},
and it is clearly a complex vector space with addition and scalar multiplication defined as follows:
(z1, z2, ..., zn) + (z1', z2', ..., zn') = (z1 + z1', z2 + z2', ..., zn + zn'),
k(z1, z2, ..., zn) = (kz1, kz2, ..., kzn)  for k ∈ C.
The standard basis for the space C^n is again {e1, e2, ..., en} as in the real case, but the scalars are now complex numbers, so that any vector z in C^n is of the form z = Σ_{k=1}^{n} zk ek with zk = xk + i yk ∈ C, i.e., z = x + iy with x, y ∈ R^n.
In a complex vector space, linear combinations are defined in the same way as
the real case except that scalars are allowed to be complex numbers. Thus the same


is true for linear independence, spanning spaces, basis, dimension, and subspace. For
complex matrices, whose entries are complex numbers, the matrix sum and product
follow the same rules as real matrices. The same is true for the concept of a linear
transformation T : V -+ W from a complex vector space V to a complex vector space
W. The definitions of the kernel and the image of a linear transformation remain the
same as those in the real case, as well as the facts about null spaces , column spaces,
matrix representations of linear transformations, similarity, and so on.
However, if we are concerned about the inner product, there should be a modification from the real case. Note that the absolute value (or modulus) of a complex number z = x + iy is defined as the nonnegative real number |z| = (z̄z)^{1/2} = √(x^2 + y^2), where z̄ is the complex conjugate of z. Accordingly, the length of a vector z = (z1, z2, ..., zn) in the n-space C^n with zk = xk + i yk ∈ C has to be modified: if one would take an inner product in C^n as ||z||^2 = z1^2 + ... + zn^2, then the nonzero vector (1, i) in C^2 would have zero length: 1^2 + i^2 = 0. In any case, a modified definition should coincide with the old definition when the vectors and matrices are real. The following is the definition of the usual inner product on the n-space C^n.
Definition 7.1 For two vectors u = [u1 u2 ... un]^T and v = [v1 v2 ... vn]^T in C^n, uk, vk ∈ C, the dot (or Euclidean inner) product u · v of u and v is defined by
u · v = ū1 v1 + ū2 v2 + ... + ūn vn = ū^T v,
where ū = [ū1 ū2 ... ūn]^T is the conjugate of u. The Euclidean length (or magnitude) of a vector u in C^n is defined by
||u|| = (u · u)^{1/2} = ( |u1|^2 + |u2|^2 + ... + |un|^2 )^{1/2},
where |uk|^2 = ūk uk, and the distance between two vectors u and v in C^n is defined by
d(u, v) = ||u - v||.
In an (abstract) complex vector space, one can also define an inner product by adopting the basic properties of the Euclidean inner product on C^n as axioms.
Definition 7.2 A (complex) inner product (or Hermitian inner product) on a complex vector space V is a function that associates a complex number ⟨u, v⟩ with each pair of vectors u and v in V in such a way that the following rules are satisfied: For all vectors u, v and w in V and all scalars k in C,
(1) ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩,
(2) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩  (additivity),
(3) ⟨ku, v⟩ = k̄ ⟨u, v⟩  (antilinear),
(4) ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 if and only if v = 0  (positive definiteness).
A complex vector space together with an inner product is called a complex inner product space or a unitary space. In particular, the n-space C^n with the dot product is called the Euclidean (complex) n-space.

The following properties are immediate from the definition of an inner product:
(5) ⟨0, v⟩ = ⟨v, 0⟩ = 0,
(6) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩,
(7) ⟨u, kv⟩ = k ⟨u, v⟩.

Remark: There is another way to define an inner product on a complex vector space. If we redefine the dot product u · v on the n-space C^n by
u · v = u1 v̄1 + u2 v̄2 + ... + un v̄n,
then the third rule in Definition 7.2 should be modified to be
(3') ⟨u, kv⟩ = k̄ ⟨u, v⟩, so that ⟨ku, v⟩ = k ⟨u, v⟩.
But these two different definitions do not induce any essential difference in a complex vector space.
In a complex inner product space, as in the real case, the length (or magnitude) of a vector u and the distance between two vectors u and v are defined by
||u|| = ⟨u, u⟩^{1/2},  d(u, v) = ||u - v||,
respectively.

Example 7.1 (A complex inner product on a function space) Let C_C[a, b] denote the set of all complex-valued continuous functions defined on [a, b]. Thus an element in C_C[a, b] is of the form f(x) = f1(x) + i f2(x), where f1(x) and f2(x) are real-valued and continuous on [a, b]. Note that f is continuous if and only if each component function fi is continuous. Clearly, the set C_C[a, b] is a complex vector space under the sum and scalar multiplication of functions. For a vector f(x) = f1(x) + i f2(x) in C_C[a, b], its integral is defined as follows:
∫_a^b f(x) dx = ∫_a^b [f1(x) + i f2(x)] dx = ∫_a^b f1(x) dx + i ∫_a^b f2(x) dx.
It is an elementary exercise to show that, for vectors f(x) = f1(x) + i f2(x) and g(x) = g1(x) + i g2(x) in the complex vector space C_C[a, b], the following formula defines an inner product on C_C[a, b]:
⟨f, g⟩ = ∫_a^b f̄(x) g(x) dx
       = ∫_a^b [f1(x) - i f2(x)][g1(x) + i g2(x)] dx
       = ∫_a^b [f1(x) g1(x) + f2(x) g2(x)] dx + i ∫_a^b [f1(x) g2(x) - f2(x) g1(x)] dx.
□

Problem 7.1 Show that the Euclidean inner product on C^n satisfies all the inner product axioms.

The definitions of such terms as orthogonal sets, orthogonal complements, orthonormal sets, and orthonormal basis remain the same in complex inner product spaces as in real inner product spaces. Moreover, the Gram-Schmidt orthogonalization is still valid in complex inner product spaces, and can be used to convert an arbitrary basis into an orthonormal basis. If V is an n-dimensional complex vector space, then by taking an orthonormal basis for V, there is a natural isometry from V to C^n that preserves the inner product as in the real case. Hence, without loss of generality, one may work only in C^n with the Euclidean inner product, and we use · and ⟨ , ⟩ interchangeably.
On the other hand, one may consider the set C^n as a real vector space by defining addition and scalar multiplication as
(z1, z2, ..., zn) + (z1', z2', ..., zn') = (z1 + z1', z2 + z2', ..., zn + zn'),
r(z1, z2, ..., zn) = (r z1, r z2, ..., r zn)  for r ∈ R.
Two vectors e1 = (1, 0, ..., 0) and i e1 = (i, 0, ..., 0) are linearly dependent when the space C^n is considered as a complex vector space. However, they are linearly independent if C^n is considered as a real vector space. In general,
{e1, i e1, e2, i e2, ..., en, i en}
forms a basis for C^n considered as a real vector space. In this way, C^n is naturally identified with the 2n-dimensional real vector space R^{2n}. That is, dim C^n = n when C^n is considered as a complex vector space, but dim C^n = 2n when C^n is considered as a real vector space.
Note that when C^n is considered as a 2n-dimensional real vector space, the space R^n = {(x1, x2, ..., xn) : xi ∈ R} is a subspace of C^n, but not when C^n is considered as an n-dimensional complex vector space.
Example 7.2 (Gram-Schmidt orthogonalization on a complex vector space) Consider the complex vector space C^3 with the Euclidean inner product. Apply the Gram-Schmidt orthogonalization to convert the basis x1 = (i, i, i), x2 = (0, i, i), x3 = (0, 0, i) into an orthonormal basis.

Solution: Step 1: Set
u1 = x1/||x1|| = (i, i, i)/√3 = (i/√3, i/√3, i/√3).
Step 2: Let W1 denote the subspace spanned by u1. Then
x2 - Proj_{W1} x2 = x2 - ⟨u1, x2⟩ u1 = (0, i, i) - (2/√3)(i/√3, i/√3, i/√3) = (-2i/3, i/3, i/3).
Therefore,
u2 = (x2 - Proj_{W1} x2)/||x2 - Proj_{W1} x2|| = (3/√6)(-2i/3, i/3, i/3) = (-2i/√6, i/√6, i/√6).
Step 3: Let W2 denote the subspace spanned by {u1, u2}. Then
x3 - Proj_{W2} x3 = x3 - ⟨u1, x3⟩ u1 - ⟨u2, x3⟩ u2
= (0, 0, i) - (1/√3)(i/√3, i/√3, i/√3) - (1/√6)(-2i/√6, i/√6, i/√6)
= (0, -i/2, i/2).
Therefore,
u3 = (x3 - Proj_{W2} x3)/||x3 - Proj_{W2} x3|| = (0, -i/√2, i/√2).
Thus,
u1 = (i/√3, i/√3, i/√3),  u2 = (-2i/√6, i/√6, i/√6),  u3 = (0, -i/√2, i/√2)
form an orthonormal basis for C^3.  □
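The same steps in NumPy (an illustrative sketch, not from the text); note that np.vdot conjugates its first argument, matching the inner product ⟨u, v⟩ = ū^T v used here.

    import numpy as np

    def gram_schmidt(vectors):
        basis = []
        for x in vectors:
            for u in basis:
                x = x - np.vdot(u, x) * u          # subtract the projection onto u
            basis.append(x / np.linalg.norm(x))
        return basis

    x1 = np.array([1j, 1j, 1j])
    x2 = np.array([0,  1j, 1j])
    x3 = np.array([0,  0,  1j])

    u1, u2, u3 = gram_schmidt([x1, x2, x3])
    print(np.allclose(u2, np.array([-2j, 1j, 1j]) / np.sqrt(6)))   # True
    print(np.allclose(u3, np.array([0, -1j, 1j]) / np.sqrt(2)))    # True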

Example 7.3 (An orthonormal set in the complex-valued function space C_C[0, 2π]) Let C_C[0, 2π] be the complex vector space with the inner product given in Example 7.1, and let W be the set of vectors in C_C[0, 2π] of the form
e^{ikx} = cos kx + i sin kx,
where k is an integer. The set W is orthogonal. In fact, if g_k(x) = e^{ikx} and g_ℓ(x) = e^{iℓx} are vectors in W, then
⟨g_k, g_ℓ⟩ = ∫_0^{2π} conj(e^{ikx}) e^{iℓx} dx = ∫_0^{2π} e^{-ikx} e^{iℓx} dx = ∫_0^{2π} e^{i(ℓ-k)x} dx
          = ∫_0^{2π} cos(ℓ - k)x dx + i ∫_0^{2π} sin(ℓ - k)x dx
          = [ (1/(ℓ-k)) sin(ℓ - k)x ]_0^{2π} + i [ (-1/(ℓ-k)) cos(ℓ - k)x ]_0^{2π} = 0   if k ≠ ℓ,
while ⟨g_k, g_k⟩ = ∫_0^{2π} dx = 2π if k = ℓ.
Thus, the vectors in W are orthogonal and each vector has length √(2π). By normalizing each vector in the orthogonal set W, one can get an orthonormal set. Therefore, the vectors
f_k(x) = (1/√(2π)) e^{ikx},  k = 0, ±1, ±2, ...,
form an orthonormal set in the complex vector space C_C[0, 2π].  □
Problem 7.2 Prove that in a complex inner product space V,
(1) |⟨x, y⟩|^2 ≤ ⟨x, x⟩⟨y, y⟩ (Cauchy-Schwarz inequality),
(2) ||x + y|| ≤ ||x|| + ||y|| (triangle inequality),
(3) ||x + y||^2 = ||x||^2 + ||y||^2 if x and y are orthogonal (Pythagorean theorem).

The definitions of eigenvalues and eigenvectors in a complex vector space are the same as in the real case, but the eigenvalues can now be complex numbers. Hence, for any n × n (real or complex) matrix A, the characteristic polynomial det(λI - A) always has n complex roots (i.e., eigenvalues) counting multiplicities.
For example, consider a rotation matrix
A = [cos θ  -sin θ; sin θ  cos θ]
with real entries. This matrix has two complex eigenvalues for any θ ∈ R, but no real eigenvalues unless θ = kπ for an integer k.
Therefore, all theorems and corollaries in Chapter 6 regarding eigenvalues and eigenvectors remain true without requiring the existence of n eigenvalues explicitly, and exactly the same proofs as in the real case are valid, since the arguments in the proofs are not concerned with what the scalars are. For example, one can have theorems like 'for an n × n matrix A, the eigenvectors belonging to distinct eigenvalues are linearly independent', and 'if the n eigenvalues of A are distinct, then the eigenvectors belonging to them form a basis for C^n, so that A is diagonalizable'.
An n × n real matrix A can be considered as a linear transformation on both R^n and C^n:
T : R^n → R^n defined by T(x) = Ax,
S : C^n → C^n defined by S(x) = Ax.
Since the entries are all real, the coefficients of the characteristic polynomial f(λ) = det(λI - A) of A are all real. Thus, if λ is a root of f(λ) = 0, then its conjugate λ̄ is also a root because f(λ̄) = conj(f(λ)) = 0. In other words, if λ is an eigenvalue of a real matrix A, then λ̄ is also an eigenvalue. In particular, any n × n real matrix A has at least one real eigenvalue if n is odd.
Moreover, if x is an eigenvector belonging to a complex eigenvalue λ, then the complex conjugate x̄ is an eigenvector belonging to λ̄. In fact, if Ax = λx with x ≠ 0, then
A x̄ = conj(Ax) = conj(λx) = λ̄ x̄,

where x̄ denotes the vector whose entries are the complex conjugates of the corresponding entries of x.
Using this fact, the following example shows that any 2 x 2 matrix with no real
eigenvalues can be written as a scalar multiple of a rotation.

Example 7.4 Show that if A is a 2 x 2 real matrix having no real eigenvalues, then
A is similar to a matrix of the form

r cos B rsin B ]
[ -r sin B r cos B .

Solution: Let A be a 2 × 2 real matrix having no real eigenvalues, and let λ = a + ib and λ̄ = a − ib with a, b ∈ ℝ and b ≠ 0 be the two complex eigenvalues of A with associated eigenvectors x = u + iv and x̄ = u − iv with u, v ∈ ℝ², respectively. It follows immediately that

    u = ½(x + x̄),   v = −(i/2)(x − x̄),
    a = ½(λ + λ̄),   b = −(i/2)(λ − λ̄).

Since λ ≠ λ̄, the eigenvectors x and x̄ are linearly independent in the complex vector space ℂ², as they are when ℂ² is considered as a real vector space. It implies that the vectors x + x̄ and x − x̄ are linearly independent in the real vector space ℂ² (see Problem 7.3 below), so that the real vectors u and v are linearly independent in the subspace ℝ² of the real vector space ℂ². Thus α = {u, v} is a basis for the real vector space ℝ², and

    Au = ½(Ax + Ax̄) = ½(λx + λ̄x̄)
       = λ (u + iv)/2 + λ̄ (u − iv)/2 = au − bv.

Similarly, one can get Av = bu + av, implying that the matrix representation of the linear transformation A : ℝ² → ℝ² with respect to the basis α is

    [A]_α = [ a  b ; −b  a ].

That is, any 2 × 2 matrix that has no real eigenvalues is similar to a matrix of such form. Now, by setting r = √(a² + b²) > 0, one can get a = r cos θ and b = r sin θ for some θ ∈ ℝ, so

    [A]_α = [ r cos θ  r sin θ ; −r sin θ  r cos θ ].    □
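A small numerical illustration of Example 7.4 (a sketch, not from the book; the matrix below is an arbitrary real 2 × 2 matrix with negative discriminant): extract a = Re λ, b = Im λ and the real and imaginary parts u, v of an eigenvector, and check that [u v]^{-1} A [u v] = [ a b ; −b a ].

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  2.0]])          # no real eigenvalues: (tr A)^2 - 4 det A < 0

lam, vecs = np.linalg.eig(A)
l, x = lam[0], vecs[:, 0]            # one eigenvalue and its eigenvector x = u + iv
a, b = l.real, l.imag
u, v = x.real, x.imag

Q = np.column_stack([u, v])          # basis-change matrix with columns u, v
print(np.round(np.linalg.inv(Q) @ A @ Q, 10))   # ~ [[a, b], [-b, a]]
print(a, b, np.hypot(a, b))                     # r = sqrt(a^2 + b^2)
```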

Problem 7.3 Let x and y be two vectors in a vector space V . Show that x and y are linearly
independent if and only if x + y and x - y are linearly independent.

Problem 7.4 Find the eigenvalues and the eigenvectors of

(1)
[ 0i O
2 01] ,
1 0 -i
[1 . . :
(2) - i
1- i 0
2 0
1
.

Problem 7.5 Prove that an n x n complex matrix A is diagonalizable if and only if A has n
linearly independent eigenvectors in the complex vector space en.

7.2 Hermitian and unitary matrices


Recall that the dot product of real vectors x, y ∈ ℝ^n is given by x · y = x^T y in matrix form. For complex vectors u, v ∈ ℂ^n, the Euclidean inner product is defined by u · v = ū_1 v_1 + ⋯ + ū_n v_n = ū^T v, which involves the conjugate transpose, not just the transpose.

Definition 7.3 For a complex matrix A, its complex conjugate transpose, A^H = Ā^T, is called the adjoint of A.
Note that Ā is the matrix whose entries are the complex conjugates of the corresponding entries in A. Thus, [a_ij]^H = [ā_ji]. With this notation, the Euclidean inner product on ℂ^n can be written as

    u · v = ū^T v = u^H v.

Problem 7.6 Show that (AB)^H = B^H A^H when AB can be defined.

Problem 7.7 Prove that if A is invertible, so is A^H, and (A^H)^{-1} = (A^{-1})^H.
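A quick numerical check of these identities (a sketch with arbitrary random data, not from the book): A^H is obtained as A.conj().T, and np.vdot implements u · v = ū^T v since it conjugates its first argument.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

adj = lambda M: M.conj().T                       # the adjoint A^H

print(np.vdot(u, v))                             # u . v = conj(u)^T v
print(np.allclose(adj(A @ B), adj(B) @ adj(A)))  # (AB)^H = B^H A^H   (Problem 7.6)
print(np.allclose(adj(np.linalg.inv(A)),
                  np.linalg.inv(adj(A))))        # (A^H)^{-1} = (A^{-1})^H  (Problem 7.7)
```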

For complex matrices, the notions of symmetric and skew-symmetric real matrices are replaced by Hermitian and skew-Hermitian matrices, respectively.
Definition 7.4 A complex square matrix A is said to be Hermitian (or self-adjoint) if A^H = A, or skew-Hermitian if A^H = −A.
For matrices

A= [ 4 ~ i 4 j i] and B= [ _/+ i 1': i J.


one can see that A is Hermitian and B is skew-Hermitian.
A Hermitian matrix with real entries is just a real symmetric matrix, and con-
versely, any real symmetric matrix is Hermitian.
Like real matrices , any m x n (complex) matrix A can be considered as a linear
transformation from en to em, and

    (Ax) · y = (Ax)^H y = x^H A^H y = x · (A^H y)

for any x ∈ ℂ^n and y ∈ ℂ^m. The following theorem lists some important properties of Hermitian matrices.

Theorem 7.1 Let A be a Hermitian matrix.

(1) For any (complex) vector x ∈ ℂ^n, x^H A x is real.
(2) All (complex) eigenvalues of A are real. In particular, an n × n real symmetric matrix has precisely n real eigenvalues.
(3) The eigenvectors of A belonging to distinct eigenvalues are mutually orthogonal.

Proof: (1) Since x^H A x is a 1 × 1 matrix, its complex conjugate equals (x^H A x)^H = x^H A^H x = x^H A x, so x^H A x is real.

(2) If Ax = λx, then x^H A x = x^H λx = λ x^H x = λ||x||². The left-hand side is real and ||x||² is real and positive, because x ≠ 0. Therefore, λ must be real.
(3) Let x and y be eigenvectors of A belonging to eigenvalues λ and μ, respectively, with λ ≠ μ. Because A = A^H and λ is real, it follows that

    λ(x · y) = (λx) · y = (Ax) · y = x · (Ay) = μ(x · y).

Since λ ≠ μ, it gives that x · y = x^H y = 0, i.e., x is orthogonal to y. □


In particular, eigenvectors belonging to distinct eigenvalues of a real symmetric
matrix are orthogonal.
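As a numerical sketch of Theorem 7.1 (not from the book; the Hermitian matrix below is built from random data), one can check that the eigenvalues come out real and that eigenvectors belonging to distinct eigenvalues are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                       # A^H = A, so A is Hermitian

eigvals, X = np.linalg.eig(A)            # generic (non-symmetric) eigensolver
print(np.max(np.abs(eigvals.imag)))      # ~ 0: all eigenvalues are real
print(abs(np.vdot(X[:, 0], X[:, 1])))    # ~ 0: eigenvectors of distinct eigenvalues are orthogonal
```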

Remark: Condition (1) in Theorem 7.1 (i.e., x^H A x is real for any complex vector x ∈ ℂ^n) is equivalent to saying that the diagonal entries of A are real:

    x^H A x = Σ_{i,j} a_ij x̄_i x_j = Σ_i a_ii |x_i|² + C + C̄,

where C = Σ_{i<j} a_ij x̄_i x_j. Since C + C̄ is real, all a_ii ∈ ℝ if and only if x^H A x ∈ ℝ for any x ∈ ℂ^n.

Problem 7.8 Prove that the determinant of any Hermitian matrix is real.

Problem 7.9 Let x be a nonzero vector in the complex vector space ℂ^n, and A = xx^H. Show that A is Hermitian, and find all the eigenvalues and their eigenspaces for A.

It is easy to see that if A is Hermitian, then the matrix i A is skew-Hermitian;


similarly, if A is skew-Hermitian, then i A is Hermitian. Therefore, the following
theorem is a direct consequence of this fact and Theorem 7.1. The proof is left for an
exercise.

Theorem 7.2 Let A be a skew-Hermitian matrix.

(1) For any complex vector x ≠ 0, x^H A x is purely imaginary, and the diagonal entries of A are purely imaginary.
(2) All eigenvalues of A are purely imaginary. In particular, an n × n real skew-symmetric matrix has n purely imaginary eigenvalues.
(3) The eigenvectors of A belonging to distinct eigenvalues are mutually orthogonal.

Problem 7.10 Prove Theorem 7.2 by using Theorem 7.1, and prove (3) directly.

Problem 7.11 Show that A = B + iC (B and C real matrices) is skew-Hermitian if and only if B is skew-symmetric and C is symmetric.

Problem 7.12 Let A and B be either both Hermitian or both skew-Hermitian. Show that
(1) AB is Hermitian if and only if AB = BA;
(2) AB is skew-Hermitian if and only if AB = −BA.

Recall that a square matrix Q with real entries is orthogonal if its column vectors are orthonormal (i.e., Q^T Q = I). The same is true for complex matrices (compare with Lemma 5.18).

Lemma 7.3 For a complex square matrix U, the following are equivalent:
(1) the column vectors of U are orthonormal;
(2) U^H U = I;
(3) U^{-1} = U^H;
(4) UU^H = I;
(5) the row vectors of U are orthonormal.

The complex analogue to an orthogonal matrix is a unitary matrix.


Definition 7.5 A complex square matrix U is said to be unitary if it satisfies anyone
(and hence, all) of the conditions in Lemma 7.3.

Like a real orthogonal matrix, any unitary matrix preserves the lengths of vectors.

Theorem 7.4 Let U be an n × n unitary matrix.

(1) U preserves the dot product on ℂ^n; i.e., for all x and y in ℂ^n,

    (Ux) · (Uy) = x · y.

(2) If λ is an eigenvalue of U, then |λ| = 1.
(3) The eigenvectors of U belonging to distinct eigenvalues are mutually orthogonal.

Proof: (1) (Ux)^H (Uy) = x^H U^H U y = x^H y.

(2) For Ux = λx with x ≠ 0, x^H x = (Ux)^H (Ux) = |λ|² x^H x, so |λ| = 1.
(3) Let Ux = λx, Uy = μy, and λ ≠ μ. Since U is unitary, we have λλ̄ = 1 = μμ̄ and U^{-1} y = μ^{-1} y = μ̄ y. Therefore,

    λ̄ x^H y = (λx)^H y = (Ux)^H y = x^H U^{-1} y = x^H (μ̄ y) = μ̄ x^H y

holds, and λ ≠ μ implies x^H y = 0. □

From the same argument as in the proof of Theorem 5.19, U preserves the dot product if and only if it preserves the lengths of vectors: ||Ux|| = ||x|| for all x in ℂ^n. Thus, a unitary matrix is an isometry.
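The following sketch (not from the book) builds a unitary matrix as the Q-factor of a QR factorization of a random complex matrix and checks U^H U = I, |λ| = 1 for every eigenvalue, and the isometry property.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)                          # the Q-factor is unitary

print(np.allclose(U.conj().T @ U, np.eye(4)))   # U^H U = I
print(np.abs(np.linalg.eigvals(U)))             # all moduli ~ 1
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(np.linalg.norm(U @ x), np.linalg.norm(x)) # ||Ux|| = ||x||: an isometry
```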
Theorem 7.5 A basis-change matrix from one orthonormal basis to another in a
complex vector space is unitary.

Proof: Let α = {v_1, ..., v_n} and β = {w_1, ..., w_n} be two orthonormal bases, and let Q = [q_ij] be the basis-change matrix from the basis β to the basis α. By definition,

    w_j = Σ_{i=1}^n q_ij v_i.

Thus,

    δ_ij = ⟨w_i, w_j⟩ = Σ_{k=1}^n q̄_ki Σ_{l=1}^n q_lj ⟨v_k, v_l⟩ = Σ_{k=1}^n q̄_ki q_kj = Σ_{k=1}^n [Q^H]_ik [Q]_kj = [Q^H Q]_ij.
This means that the columns of Q are orthonormal and Q is unitary. o
Just as in the real case, it is true that two matrices representing the same linear
transformation on a complex vector space with respect to different bases are similar.
If the two bases are both orthonormal , then the basis-change matrix is unitary (or
orthogonal) .
Problem 7.13 Show that |det U| = 1 for any unitary matrix U.

Problem 7.14 Show that

    A = (1/2) [ 1+i  −1+i ; 1+i  1−i ]

is unitary but neither Hermitian nor skew-Hermitian.

Problem 7.15 Show that the adjoint of a unitary matrix is unitary, and the product of two
unitary matrices is unitary.

Problem 7.16 Describe all 3 x 3 matrices that are simultaneously Hermitian, unitary, and
diagonal. How many are there?

7.3 Unitarily diagonalizable matrices


In the previous section, it was shown that if an n × n square matrix A is Hermitian, skew-Hermitian or unitary, then the eigenvectors belonging to distinct eigenvalues are mutually orthogonal. Hence, if such a matrix A has n distinct eigenvalues, then there exists an orthonormal basis α for ℂ^n consisting of eigenvectors of A so that the matrix representation [A]_α is diagonal, i.e., A is diagonalizable by a unitary matrix. In this section, it will be shown that any Hermitian, skew-Hermitian or unitary matrix has n orthonormal eigenvectors even if the eigenvalues are not all distinct. In particular, it is always diagonalizable by a unitary matrix.

Definition 7.6 (1) Two real matrices A and B are orthogonally similar if there exists an orthogonal matrix P such that P^{-1}AP = B. A matrix is orthogonally diagonalizable if it is orthogonally similar to a diagonal matrix.
(2) Two complex matrices A and B are unitarily similar if there exists a unitary matrix U such that U^{-1}AU = B. A matrix is unitarily diagonalizable if it is unitarily similar to a diagonal matrix.

We begin with a classical theorem due to Schur (1909) concerning orthogonal and
unitary similarity.

Lemma 7.6 (Schur's Lemma) (1) If an n × n real matrix A has only real eigenvalues, then A is orthogonally similar to an upper triangular matrix.
(2) Every n × n complex matrix is unitarily similar to an upper triangular matrix.

Proof: We prove only the second assertion (2) by mathematical induction on n, because (1) can be done in a similar way. Clearly, it is true for n = 1. Assume now that the assertion (2) holds for n = r − 1. Let A be any r × r complex matrix and let λ_1 be an eigenvalue of A with a normalized eigenvector x. Extend it to an orthonormal basis, say {x, u_2, ..., u_r} for ℂ^r, by the Gram-Schmidt orthogonalization. Set a unitary matrix U_1 = [x u_2 ⋯ u_r] with these basis vectors as its columns. A direct computation of the product U_1^{-1} A U_1 shows

    U_1^{-1} A U_1 = U_1^H A U_1 = U_1^H [Ax  Au_2  ⋯  Au_r]
                   = [ x^H ; u_2^H ; ⋯ ; u_r^H ] [λ_1 x  Au_2  ⋯  Au_r]
                   = [ λ_1  * ; 0  B ],

where B is an (r − 1) × (r − 1) matrix. By the induction hypothesis there exists an (r − 1) × (r − 1) unitary matrix V_2 such that V_2^{-1} B V_2 is an upper triangular matrix with diagonal entries λ_2, λ_3, ..., λ_r. Define

    U = U_1 [ 1  0 ; 0  V_2 ].

Then it is easy to check that U is also a unitary matrix, and

    U^{-1} A U = [ λ_1  * ; 0  V_2^{-1} B V_2 ]

is an upper triangular matrix. □
Schur's lemma is a cornerstone in the study of complex matrices.
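For computation, the Schur factorization is available in SciPy; the following sketch (not from the book, with an arbitrary random matrix) checks A = UTU^H with U unitary and T upper triangular.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

T, U = schur(A, output='complex')               # A = U T U^H
print(np.allclose(U @ T @ U.conj().T, A))
print(np.allclose(np.tril(T, -1), 0))           # T is upper triangular
print(np.allclose(U.conj().T @ U, np.eye(4)))   # U is unitary
```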

Theorem 7.7 If A is either a Hermitian, a skew-Hermitian or a unitary matrix, then


it is unitarily diagonalizable.

Proof: By Schur's lemma, U^H A U = B is an upper triangular matrix for some unitary matrix U. However,

    B^H = (U^H A U)^H = U^H A^H U = B,  −B,  or  B^{-1},

where the right-hand side of the equality depends on whether A is a Hermitian, a skew-Hermitian or a unitary matrix. This means that the upper triangular matrix B is of the same type, Hermitian, skew-Hermitian or unitary, as A.
Note that B^H is a lower triangular matrix and B^{-1} is an upper triangular matrix because B is upper triangular. Therefore, the upper triangular matrix B must be a diagonal matrix in each case of Hermitian, skew-Hermitian or unitary. □

Note that, in the similarity condition U^{-1} A U (= U^H A U) = D of A to a diagonal matrix D through a unitary matrix U, the equation AU = UD shows that the column vectors of U constitute a set of n orthonormal eigenvectors of A while the diagonal entries of D are eigenvalues of A, as shown in Theorem 6.7. Therefore, by Theorems 7.1, 7.2 and 7.4, all the diagonal entries of D are real, purely imaginary or of unit length depending on the type (Hermitian, skew-Hermitian or unitary, respectively) of the matrix A.

Example 7.5 (A Hermitian matrix is unitarily diagonalizable) Diagonalize the matrix

    A = [ 2  1−i ; 1+i  1 ]

by a unitary matrix.

Solution: Since A is Hermitian, it is unitarily diagonalizable. One can show that A has the eigenvalues λ_1 = 3 and λ_2 = 0 with associated eigenvectors x_1 = (1−i, 1) and x_2 = (−1, 1+i), respectively. Let

    u_1 = x_1/||x_1|| = (1/√3)(1−i, 1),   u_2 = x_2/||x_2|| = (1/√3)(−1, 1+i),

and let

    U = (1/√3) [ 1−i  −1 ; 1  1+i ].

Then, U is a unitary matrix and diagonalizes A:

    U^H A U = (1/3) [ 1+i  1 ; −1  1−i ] [ 2  1−i ; 1+i  1 ] [ 1−i  −1 ; 1  1+i ] = [ 3  0 ; 0  0 ].    □
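A direct numerical verification of Example 7.5 (a sketch, not from the book):

```python
import numpy as np

A = np.array([[2, 1 - 1j],
              [1 + 1j, 1]])
u1 = np.array([1 - 1j, 1]) / np.sqrt(3)
u2 = np.array([-1, 1 + 1j]) / np.sqrt(3)
U = np.column_stack([u1, u2])

print(np.allclose(U.conj().T @ U, np.eye(2)))   # U is unitary
print(np.round(U.conj().T @ A @ U, 10))         # diag(3, 0)
```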

Since all the real symmetric matrices are Hermitian matrices, they are unitarily
diagonalizable by Theorem 7.7 . However, the following theorem says more than that.

Theorem 7.8 For any n x n real matrix A, the following are equivalent.
(1) A is symmetric.
(2) A is orthogonally diagonalizable.
(3) A has a full set of n orthonormal eigenvectors.

Proof: (1) ⇒ (2): If A is real and symmetric, then it is a Hermitian matrix, so it has only real eigenvalues. By Schur's Lemma 7.6, A is orthogonally similar to an upper triangular matrix, which must be already diagonal. Hence it is orthogonally diagonalizable.
(2) ⇒ (3): If A is diagonalized by an orthogonal matrix Q, then the column vectors of Q are eigenvectors of A. Hence A has a full set of n orthonormal eigenvectors.
(3) ⇒ (1): If A has a full set of n orthonormal eigenvectors, then these eigenvectors form an orthogonal basis-change matrix Q such that AQ = QD. It is now trivial to show that A = QDQ^{-1} = QDQ^T is symmetric. □

Corollary 7.9 Let A be a real symmetric matrix, and let λ be an eigenvalue of A of multiplicity m_λ. Then

    dim E(λ) = dim N(λI − A) = m_λ.

By Theorem 7.8, all real symmetric matrices are always diagonalizable, even more, orthogonally. Moreover, they are all that can be "orthogonally" diagonalized. Even though not all matrices are diagonalizable, certain non-symmetric matrices may still have a full set of linearly independent eigenvectors so that they are diagonalizable, but in this case the eigenvectors cannot be orthogonal. That is, the basis-change matrix Q cannot be an orthogonal matrix. For example, any triangular matrix having all distinct diagonal entries is diagonalizable because its eigenvalues are all distinct, but it cannot be orthogonally diagonalizable if it is not diagonal (i.e., not symmetric).
Problem 7.17 Show that the non-symmetric matrix

    A = [ 1  0  −1 ; 0  1  0 ; 0  0  2 ]

is diagonalizable, but not orthogonally.

Remark: The procedure for orthogonal diagonalization of a symmetric matrix A can


be summarized as follows.
Step 1 Find a basis for each eigenspace of A.
Step 2 Apply the Gram-Schmidt orthogonalization to each of these bases to
obtain an orthonormal basis for each eigenspace.
Step 3 Form the matrix Q whose columns are the basis vectors constructed in
Step 2; this matrix orthogonally diagonalizes A.

The justification of this procedure should be clear, because eigenvectors belonging


to distinct eigenvalues are orthogonal, while an application of the Gram-Schmidt
orthogonalization assures that the eigenvectors obtained within the same eigenspace
are orthonormal. Thus, the entire set of eigenvectors obtained by this procedure is
orthonormal.

Example 7.6 (A symmetric matrix is orthogonally diagonalizable) Find an orthogo-


nal matrix Q that diagonalizes the symmetric matrix

    A = [ 4  2  2 ; 2  4  2 ; 2  2  4 ].

Solution: The characteristic polynomial of A is

    det(λI − A) = det [ λ−4  −2  −2 ; −2  λ−4  −2 ; −2  −2  λ−4 ] = (λ − 2)²(λ − 8).

Thus, the eigenvalues of A are λ = 2 and λ = 8. By the method used in Example 6.2, it can be shown that

    x_1 = (−1, 1, 0)   and   x_2 = (−1, 0, 1)

form a basis for the eigenspace belonging to λ = 2. Applying the Gram-Schmidt orthogonalization to {x_1, x_2} yields the following orthonormal eigenvectors (verify):

    u_1 = (1/√2)(−1, 1, 0)   and   u_2 = (1/√6)(−1, −1, 2).

The eigenspace belonging to λ = 8 has x_3 = (1, 1, 1) as a basis. The normalization of x_3 yields u_3 = (1/√3)(1, 1, 1). Finally, using u_1, u_2, and u_3 as column vectors, one can obtain

    Q = [ −1/√2  −1/√6  1/√3 ; 1/√2  −1/√6  1/√3 ; 0  2/√6  1/√3 ],

which orthogonally diagonalizes A. (It is suggested that readers verify that Q^T A Q is actually a diagonal matrix.) □
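The suggested verification can be done numerically; the following sketch (not part of the book) confirms that Q is orthogonal and that Q^T A Q = diag(2, 2, 8).

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])
Q = np.column_stack([np.array([-1, 1, 0]) / np.sqrt(2),
                     np.array([-1, -1, 2]) / np.sqrt(6),
                     np.array([1, 1, 1]) / np.sqrt(3)])

print(np.allclose(Q.T @ Q, np.eye(3)))   # Q is orthogonal
print(np.round(Q.T @ A @ Q, 10))         # diag(2, 2, 8)
```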

Example 7.7 (Diagonal, but neither Hermitian, skew-Hermitian, nor unitary) A matrix

    A = [ 1  0  0 ; 0  i  0 ; 0  0  x ],   x ∈ ℝ with |x| ≠ 1,

is neither Hermitian, skew-Hermitian, nor unitary, but is a diagonal matrix. Hence, there are infinitely many unitarily diagonalizable matrices which are neither Hermitian, skew-Hermitian, nor unitary. □

Problem 7.18 For each of matrices [2i 0 0]


(1) [~ ~ ] (2) i -1 -i
-1 0 2i
find a unitary matrix U such that U- I AU is an upper triangular matrix.

7.4 Normal matrices


We have seen that Hermitian, skew-Hermitian and unitary matrices are all unitarily diagonalizable. However, it turns out that they do not constitute the entire class of unitarily diagonalizable matrices, whereas in the class of real matrices the real symmetric matrices are the only matrices with real entries that are orthogonally diagonalizable. That is, there are infinitely many unitarily diagonalizable matrices that belong to none of the above-mentioned classes of matrices (see Example 7.7). Actually, all unitarily diagonalizable matrices belong to the following class of matrices, called normal matrices.

Definition 7.7 A complex square matrix A is called normal if

    AA^H = A^H A.

Note that all the Hermitian, skew-Hermitian and unitary matrices are normal. But there are infinitely many matrices that are normal but are none of these, as shown in Example 7.7. Moreover, there exist examples of such matrices which are not diagonal.

Example 7.8 (Normal, but neither Hermitian , skew-Hermitian, unitary, nor diago-
nal) The 2 x 2 matrix

is normal, but is neither Hermitian, skew-Hermitian, unitary, nor diagonal. However, one can easily check that this matrix is unitarily diagonalizable.
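The particular 2 × 2 matrix displayed in the book is not reproduced here; as a stand-in (hypothetical example, not the book's), the matrix [ 1 1 ; −1 1 ] has all the stated properties, as the following sketch checks.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-1.0, 1.0]])            # hypothetical example of a normal, non-diagonal matrix
AH = A.conj().T

print(np.allclose(A @ AH, AH @ A))                     # normal
print(np.allclose(AH, A), np.allclose(AH, -A))         # not Hermitian, not skew-Hermitian
print(np.allclose(AH @ A, np.eye(2)))                  # not unitary
print(np.linalg.eigvals(A))                            # distinct eigenvalues 1 +- i, so it is
                                                       # (unitarily) diagonalizable
```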

Problem 7.19 Whichof followingmatricesare Hermitian,skew-Hermitian, unitaryor normal?

~ ~] , (3) [ - ~ L -~],
I 1 -i 0 -i

(4) [
-i
3 2
3] ' o ,(6) [3
Oi] 1 -:- i
l+1 ii]
3 .
o 0 -I 3 1

As a matter of fact, it will be shown that the normal matrices are exactly the unitarily diagonalizable matrices. We begin with a lemma.

Lemma 7.10 If an upper triangular matrix T is normal, then it must be a diagonal


matrix.

Proof: Use induction on k in comparing the diagonal (k, k)-entry of both sides of TT^H = T^H T, where T = [t_ij] is upper triangular (t_ij = 0 for i > j).
If k = 1, the equality

    [TT^H]_11 = |t_11|² + ⋯ + |t_1n|²   and   [T^H T]_11 = |t_11|²

implies t_12 = ⋯ = t_1n = 0. Inductively, assume that t_{i,i+1} = ⋯ = t_{i,n} = 0 has been shown for i = 1, ..., k − 1. Then

    [TT^H]_kk = |t_kk|² + |t_{k,k+1}|² + ⋯ + |t_kn|²

and

    [T^H T]_kk = |t_1k|² + ⋯ + |t_{k−1,k}|² + |t_kk|² = |t_kk|²,

because t_1k = ⋯ = t_{k−1,k} = 0 by the induction hypothesis. But TT^H = T^H T yields t_{k,k+1} = ⋯ = t_kn = 0. It concludes that t_{k,k+1} = ⋯ = t_kn = 0 for all k = 1, ..., n, which shows that all the entries of T off the diagonal are zero, i.e., T is diagonal. □

Theorem 7.11 For any n x n complex matrix A, the following are equivalent:
(1) A is normal;
(2) A is unitarily diagonalizable;
(3) A has a full set ofn orthonormal eigenvectors.

Proof: (1) ⇒ (2): Suppose that A is normal. By Schur's lemma, there exists a unitary matrix U such that T = U^H A U is an upper triangular matrix. Then T is also normal, since

    TT^H = U^H A U U^H A^H U = U^H A A^H U = U^H A^H A U
         = U^H A^H U U^H A U = T^H T.

Thus, by Lemma 7.10, T is already diagonal so that A is unitarily diagonalizable.

(2) ⇒ (3): It is clear that the columns of the basis-change matrix U are n orthonormal eigenvectors of A.
(3) ⇒ (1): Let U be the unitary matrix whose columns are the n orthonormal eigenvectors. Then AU = UD or A = UDU^H, and

    AA^H = UDU^H UD^H U^H = UDD^H U^H = UD^H DU^H
         = UD^H U^H UDU^H = A^H A.

That is, A is normal. □
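Numerically, the chain normal ⇒ Schur form diagonal can be observed directly; the skew-Hermitian (hence normal) matrix below is an arbitrary illustration, not from the book.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1j, 1.0, 0.0],
              [-1.0, 2j, 0.0],
              [0.0, 0.0, 3j]])                         # skew-Hermitian, hence normal
print(np.allclose(A @ A.conj().T, A.conj().T @ A))     # A is normal

T, U = schur(A, output='complex')                      # U^H A U = T upper triangular
print(np.round(T, 10))                                 # for normal A, T is in fact diagonal (Lemma 7.10)
```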

Note that there exist infinitely many non-normal complex matrices that are still diagonalizable, but of course not unitarily. One can find such examples among the triangular matrices having distinct diagonal entries.
Recall that any n × n real matrix A can be written as the sum S + T of a symmetric matrix S = ½(A + A^T) and a skew-symmetric matrix T = ½(A − A^T). (See Problem 1.11.) The same kind of expression is also possible for a complex matrix. A complex matrix A can be written as the sum A = H_1 + iH_2, where

    H_1 = ½(A + A^H),   H_2 = −(i/2)(A − A^H);   or   iH_2 = ½(A − A^H).

Clearly both H_1 and H_2 are Hermitian, and iH_2 is skew-Hermitian.

Problem 7.20 Show that the matrix A = [~ ~], x E JR, is not normal, so it cannot be
uniIarily diagonalizable. But it is diagonalizable.

[Ii i]
Problem 7.21 Determine whether or not the matrix

A = iii
iii

is unitarily diagonalizable. If it is, find a unitary matrix U that diagonalizes A.

Problem 7.22 Let HI and H2 be two Hermitian matrices . Show that A = HI + i H2 is normal
if and only if HIH2 = H2HI.

Problem 7.23 For any unitarily diagonalizable matrix A, prove that

(1) A is Hermitian if and only if A has only real eigenvalues;
(2) A is skew-Hermitian if and only if A has only purely imaginary eigenvalues;
(3) A is unitary if and only if |λ| = 1 for any eigenvalue λ of A.

7.5 Application

7.5.1 The spectral theorem

As shown in the previous section, the normal matrices are the only matrices that can be unitarily diagonalized. That is, A is normal if and only if there exists a basis α for ℂ^n consisting of orthonormal eigenvectors of A such that the matrix representation [A]_α of A with respect to α is diagonal.

Theorem 7.12 (Spectral theorem) Let A be a normal matrix, and let {u_1, u_2, ..., u_n} be a set of orthonormal eigenvectors belonging to the eigenvalues λ_1, λ_2, ..., λ_n of A, respectively. Then A can be written as

    A = UDU^H = λ_1 u_1 u_1^H + λ_2 u_2 u_2^H + ⋯ + λ_n u_n u_n^H,

and u_i u_i^H is the orthogonal projection matrix onto the subspace spanned by the eigenvector u_i for i = 1, ..., n.

Proof: Note that U = [u_1 u_2 ⋯ u_n] is a unitary matrix that transforms A into a diagonal matrix D, i.e., U^{-1} A U = U^H A U = D. Then

    A = UDU^H = [u_1 u_2 ⋯ u_n] diag(λ_1, λ_2, ..., λ_n) [u_1 u_2 ⋯ u_n]^H
      = λ_1 u_1 u_1^H + λ_2 u_2 u_2^H + ⋯ + λ_n u_n u_n^H
      = λ_1 P_1 + λ_2 P_2 + ⋯ + λ_n P_n,

where

    P_i = u_i u_i^H = [ u_1i ; ⋯ ; u_ni ] [ ū_1i ⋯ ū_ni ] = [ |u_1i|²  ⋯  u_1i ū_ni ; ⋯ ; u_ni ū_1i  ⋯  |u_ni|² ],

which is a Hermitian matrix. Now, for any x ∈ ℂ^n and i, j = 1, ..., n,

    P_i x = u_i u_i^H x = ⟨u_i, x⟩ u_i,
    P_i P_j = u_i u_i^H u_j u_j^H = ⟨u_i, u_j⟩ u_i u_j^H = P_i  if i = j,  and  = 0  if i ≠ j,
    (P_1 + ⋯ + P_n)x = P_1 x + ⋯ + P_n x = Σ_{i=1}^n ⟨u_i, x⟩ u_i = x = id(x).

Therefore, each P_i is nothing but the orthogonal projection onto the subspace spanned by the eigenvector u_i. □

Note that the equation P_1 + ⋯ + P_n = id means that if one restricts the image of P_i to be the subspace spanned by u_i, which is isomorphic to ℂ, then (P_1, ..., P_n) defines another orthogonal coordinate system with respect to the orthonormal basis {u_1, ..., u_n}, just like (z_1, ..., z_n) of ℂ^n (see Sections 5.6 and 5.9.3).
Note that any x ∈ ℂ^n has the unique expression x = Σ_i ⟨u_i, x⟩ u_i as a linear combination of the orthonormal basis vectors u_i, and by the spectral theorem,

    Ax = λ_1 P_1 x + λ_2 P_2 x + ⋯ + λ_n P_n x
       = λ_1 u_1 (u_1^H x) + ⋯ + λ_n u_n (u_n^H x)
       = λ_1 ⟨u_1, x⟩ u_1 + ⋯ + λ_n ⟨u_n, x⟩ u_n.

If an eigenvalue λ has multiplicity ℓ, i.e., λ = λ_{i_1} = ⋯ = λ_{i_ℓ}, with a set of ℓ orthonormal eigenvectors u_{i_1}, ..., u_{i_ℓ}, then they form an orthonormal basis for the eigenspace E(λ), and

    P_λ = u_{i_1} u_{i_1}^H + ⋯ + u_{i_ℓ} u_{i_ℓ}^H

is the orthogonal projection matrix onto E(λ) = N(λI − A). Therefore, counting the multiplicity of each eigenvalue, every normal matrix A has the unique spectral decomposition into the projections

    A = λ_1 P_{λ_1} + λ_2 P_{λ_2} + ⋯ + λ_k P_{λ_k}

for k ≤ n, where the λ_i's are all distinct.

Corollary 7.13 Let A be a normal matrix.

(1) The eigenvectors of A belonging to distinct eigenvalues are mutually orthogonal.
(2) If an eigenvalue λ of A has multiplicity k, then the eigenspace N(A − λI) belonging to λ is of dimension k.

Corollary 7.14 Let A be a normal matrix with the spectral decomposition A = λ_1 P_{λ_1} + ⋯ + λ_k P_{λ_k}. Then, for any positive integer m,

    A^m = λ_1^m P_{λ_1} + ⋯ + λ_k^m P_{λ_k}.

Moreover, if A is invertible, then for any positive integer t,

    A^{-t} = (1/λ_1^t) P_{λ_1} + ⋯ + (1/λ_k^t) P_{λ_k}.

Example 7.9 (Spectral decomposition of a symmetric matrix) Find the spectral decomposition of

    A = [ 4  2  2 ; 2  4  2 ; 2  2  4 ].

Solution: From Example 7.6, the spectral decomposition is A = 2(P_1 + P_2) + 8P_3, where the projections are

    P_1 = u_1 u_1^T = (1/2) [ 1  −1  0 ; −1  1  0 ; 0  0  0 ],
    P_2 = u_2 u_2^T = (1/6) [ 1  1  −2 ; 1  1  −2 ; −2  −2  4 ],
    P_3 = u_3 u_3^T = (1/3) [ 1  1  1 ; 1  1  1 ; 1  1  1 ].

Hence,

    P_1 + P_2 = (1/3) [ 2  −1  −1 ; −1  2  −1 ; −1  −1  2 ]

is the projection onto the eigenspace E(2) belonging to λ = 2, P_3 is the projection onto the eigenspace E(8) belonging to λ = 8, and

    A = [ 4  2  2 ; 2  4  2 ; 2  2  4 ] = (2/3) [ 2  −1  −1 ; −1  2  −1 ; −1  −1  2 ] + (8/3) [ 1  1  1 ; 1  1  1 ; 1  1  1 ].    □
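The spectral decomposition and Corollary 7.14 can be checked numerically; the sketch below (not from the book) rebuilds the projections as u_i u_i^H and verifies A = 2(P_1 + P_2) + 8P_3 and A³ = 2³(P_1 + P_2) + 8³P_3.

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])
u1 = np.array([-1, 1, 0]) / np.sqrt(2)
u2 = np.array([-1, -1, 2]) / np.sqrt(6)
u3 = np.array([1, 1, 1]) / np.sqrt(3)

P = [np.outer(u, u.conj()) for u in (u1, u2, u3)]        # projections u_i u_i^H
print(np.allclose(A, 2 * (P[0] + P[1]) + 8 * P[2]))      # spectral decomposition
print(np.allclose(P[0] + P[1] + P[2], np.eye(3)))        # P_1 + P_2 + P_3 = I
print(np.allclose(np.linalg.matrix_power(A, 3),
                  2**3 * (P[0] + P[1]) + 8**3 * P[2]))   # Corollary 7.14 with m = 3
```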

Problem 7.24 Given A = [ 0  2  −1 ; 2  3  −2 ; −1  −2  0 ], find an orthogonal matrix Q that diagonalizes A, and find the spectral decomposition of A.

Example 7.10 (Spectral decomposition of a normal matrix) Find the spectral decomposition of a normal matrix

    A = [ 0  0  i ; 0  i  0 ; i  0  0 ].

Solution: Since A is normal (AA^H = A^H A), it is unitarily diagonalizable. The characteristic polynomial of A is

    det(λI − A) = det [ λ  0  −i ; 0  λ−i  0 ; −i  0  λ ] = (λ − i)²(λ + i).

Hence, the eigenvalues are λ_1 = λ_2 = i of multiplicity 2 and λ_3 = −i of multiplicity 1. By a simple computation using the Gram-Schmidt orthogonalization, one can find that

    u_1 = (0, 1, 0),   u_2 = (1/√2)(1, 0, 1),   u_3 = (1/√2)(1, 0, −1)

are orthonormal eigenvectors of A belonging to the eigenvalues λ_1, λ_2 and λ_3, respectively. Now, the spectral decomposition is A = i(P_1 + P_2) − iP_3, where the projection matrices are

    P_1 = [ 0  0  0 ; 0  1  0 ; 0  0  0 ],   P_2 = (1/2) [ 1  0  1 ; 0  0  0 ; 1  0  1 ],   P_3 = (1/2) [ 1  0  −1 ; 0  0  0 ; −1  0  1 ].

Hence,

    A = i [ 1/2  0  1/2 ; 0  1  0 ; 1/2  0  1/2 ] − (i/2) [ 1  0  −1 ; 0  0  0 ; −1  0  1 ].    □

Problem 7.25 Find the spectral decomposition of each of the following matrices :

(1) A = [~ ~ J' (2) B =[ 2 ~ i 2+iJ


3 '

(3) C =
1 0 0
020
0 0 2
(4) D _
-
I
1 0
1 0
1
11]
o
o
0
0 .
[ 000 [
1 0 o 0

7.6 Exercises
7.1. Calculate [ x ] for

(1) x = [ 1 +2 i ]
' (2) x =[ I ~ 2i ] .
3+i
7.2. Construct an orthonormal basis for (;2 from {(i, 4 + 2i) , (5 + 6i , I)} by applying the
Gram-Schmidt orthogonalization.
i 1 l - i l+ i ]
7.3. Find the rank of the matrix A = 1- i I + i I 2+ i .
[ 1 + 3i 1 - i 2 - i 1 + 4i
7.4. Find the eigenvalues and eigenvectors for each of the following matrices:

(1) [-~ -~ J.
(3) [- ~ -~ -;],
3 - 5 - 3

7.5. Find the third column vector v so that U =


[
JJ
1 ~o
_...L
v
I] I
is unitary. How much

"J3 .j'j,
freedom is there in this choice ?
7.6. Find a real matrix A such that A + rI is invertible for all r ∈ ℝ. Does there exist a square matrix A such that A + cI is invertible for all c ∈ ℂ?
7.7. Find a unitary matrix whose first row is
(1)k(l,I-i)wherekisanumber, (2)(1 '~ ' 9)-
7.8. Let V = (;2 with the Euclidean inner product. Let T be the linear transformation on V
with the matrix representation A = [~ ~ ] with respect to the standard basis. Show
that T is normal and find a set of orthonormal eigenvectors of T.
7.9. Prove that the following matrices are unitarily similar :

[~~~: -:~~: J. [e~O ) iOJ. where e is a real number.



7.10. For each of the following real symmetric matrices A, find a real orthogonal matrix Q such
that QT AQ is diagonal :

(1) [~ ~ J. (2) [~ ~ l
7.11. For each of the following Hermitian matrices A, find a unitary matrix U such that UH AU
is diagonal.

(2) [ 1 . 2 + 3i ] ,
I i 2+i]
(3) -i 2 1- i .
2 - 31 -1 [ 2-i l+i 2
7.12. Find the diagonal matrices to which the following matrices are unitarily similar. Determine
whether each of them is Hermitian, unitary or orthogonal.

(1)
l[l+i
2: 1- i
l-i]
I+ i '
(2) [0.6 -0.8],
0.8 0.6
(3)
[
1
-i
i
1 i
0] .
o -i I
7.13. For a skew-Hermitian matrix A, show that
(1) A - I is invertible, (2) eA is unitary.
7.14. Let U be a unitary matrix. Prove that U and U T have the same set of eigenvalues .

7.15. Verify that A = [; ~] is normal. Diagonalize A by a unitary matrix U .


7.16. Show that the non-symmetric real matrix

A =[ ~ ~ ~ ] can be diagonalized.
-2 -4 -5
7.17. Suppose that A , Bare diagonalizable n x n matrices. Prove that AB = BA if and only
if A and B can be diagonalized simultaneously by the same matrix Q, i.e., Q-I AQ and

7.18. :~~:':':""~::i:: = [~ ~ ~] . of A
1 1 2
7.19. Let A and B be 2 x 2 symmetric matrices. Prove that A and B are similar if and only if
det A = det Band tr(A) = tr(B) .
7.20. Let A be a real symmetric n × n matrix and λ an eigenvalue of A with multiplicity m. Show that dim N(A − λI) = m.
7.21. Show that a matrix A is nilpotent, i.e., An = 0 for some integer n ~ 1, if and only if its
eigenvalues are all zero.
7.22. Determine whether the following statements are true or false, in general, and justify your
answers .
(1) Every Hermitian matrix is unitarily similar to a diagonal matrix.
(2) An orthogonal matrix is always unitarily similar to a real diagonal matrix.
(3) For any square matrix A, AA H and AHA have the same eigenvalues.
(4) If a triangular matrix is similar to a diagona l matrix, it is already diagonal .
(5) If all the columns of a square matrix A are orthonormal, then A is diagonalizable.
(6) Every permutation matrix is diagonalizable.
(7) Every permutation matrix is Hermitian.

(8) A nonzero nilpotent matrix cannot be Hermitian .


(9) Every square matrix is similar to a triangular matrix .
(10) If A is a Hermitian matrix, then A + i I is invertible.
(11) If A is a real matrix, then A + i I is invertible.
(12) If A is an orthogonal matrix, then A + -! I is invertible.
(13) Every unitarily diagonalizable matrix with real eigenvalues is Hermitian .
(14) Every diagonalizable matrix is normal.
(15) Every invertible matrix is similar to a unitary matrix.
8
Jordan Canonical Forms

8.1 Basic properties of Jordan canonical forms

Most problems related to a (complex) matrix A can be easily solved if the matrix is
diagonalizable, as shown in previous chapters. For example, this is true in computing
the power An, in solving a linear difference equation Xn = AXn-J or a linear dif-
ferential equation y' (t) = Ay(t) . In this chapter, we discuss how to solve the same
problems for a non-diagonalizable matrix A by introducing the Jordan canonical form
of a square matrix.
Recall that an n × n matrix A is diagonalizable if and only if A has a full set of n linearly independent eigenvectors, or equivalently, the dimension of each eigenspace E(λ) = N(λI − A) is equal to the multiplicity of the eigenvalue λ. Hence, if λ_1, ..., λ_t are distinct eigenvalues of A with multiplicities m_{λ_1}, ..., m_{λ_t}, respectively, then

    dim E(λ_1) + ⋯ + dim E(λ_t) = m_{λ_1} + ⋯ + m_{λ_t} = n

and

    ℂ^n = E(λ_1) ⊕ ⋯ ⊕ E(λ_t).

On the other hand, a matrix A is not diagonalizable if and only if A has an eigenvalue λ with multiplicity m_λ > 1 such that

    1 ≤ dim E(λ) < m_λ,

so that the number of linearly independent eigenvectors belonging to λ must be less than m_λ.
However, even if a matrix A is not diagonalizable, one may try to find a matrix
similar to A which has as many zero entries as possible except diagonals. Schur's
lemma says that any square matrix is (unitarily) similar to an upper triangular matrix.
But, it is a fact that any square matrix A can be similar to a matrix much "closer"
to a diagonal matrix, called a Jordan canonical form . Its diagonal entries are the
eigenvalues of A, the entry just above each diagonal entry is 0 or 1, and all other entries are 0. In this case, the columns of a basis-change matrix Q are something like eigenvectors, but not the same in general. They are called generalized eigenvectors.

Theorem 8.1 For any square matrix A, A is similar to a matrix J of the following form, called the Jordan canonical form of A or a Jordan canonical matrix,

    J = Q^{-1} A Q = diag(J_1, J_2, ..., J_s)   (block diagonal),

in which
(1) s is the number of linearly independent eigenvectors of A, and
(2) each J_i is an upper triangular matrix of the form

    J_i = [ λ_i  1  ⋯  0 ; 0  λ_i  ⋱  ⋮ ; ⋮  ⋱  ⋱  1 ; 0  ⋯  0  λ_i ]

(with λ_i on the diagonal, 1's just above the diagonal, and 0's elsewhere), where λ_i is a single eigenvalue of J_i with only one linearly independent associated eigenvector. Such a J_i is called a Jordan block belonging to the eigenvalue λ_i.
The proof of Theorem 8.1 may be beyond a beginning linear algebra course.
Hence, we leave it to some advanced books and we are only concerned with how to
find the Jordan canonical form J of A and a basis-change matrix Q in this book.
First, note that if an n x n matrix A has a full set of n linearly independent
eigenvectors (that is s = n), then there have to be n Jordan blocks so that each
Jordan block is just a 1 x 1 matrix, and an eigenvalue A appears as many times as its
multiplicity. In this case, the Jordan canonical form J of A is just the diagonal matrix
with eigenvalues on the diagonal and a basis-change matrix Q is defined by taking
the n linearly independent eigenvectors as its columns. Hence, a diagonal matrix is a
particular case of the Jordan canonical form.
Remark: (1) For a given Jordan canonical form J of A, one can get another one
by permuting the Jordan blocks of J, and this new one is also similar to A. It will
be shown later that any two Jordan canonical forms of A cannot be similar except
this possibility. In other words, any (complex) square matrix A is similar to only one
Jordan canonical matrix up to permutations of the Jordan blocks. In this sense, it is
called the Jordan canonical form of a matrix A.
(2) As an alternative way to define the Jordan canonical form of A, one can take the
transpose JT of the Jordan canonical matrix J given in Theorem 8.1. In this case,
each Jordan block becomes a lower triangular matrix with a single eigenvalue. But
this alternative definition does not induce any essential difference from the original
one.

If J is the Jordan canonical form ofa matrix A , then they have the same eigenvalues
and the same number of linearly independent eigenvectors, but not the same set of
them in general. (Note that x is an eigenvector of J = Q -I A Q if and only if Qx is
an eigenvector of A).
Actually, for a given matrix A, its Jordan canonical form J is completely de-
termined by the number s of linearly independent eigenvectors of A and their ranks
(which will be defined in Section 8.2): each eigenvector corresponds to a Jordan block
in J, and the rank of an eigenvector determines the size of the corresponding block.
The following example illustrates which matrices A have the Jordan canonical
form J and how the s linearly independent eigenvectors of A (or J) correspond to
the Jordan blocks in J.

Example 8.1 (Each Jordan block to each eigenvector) Let J be a Jordan canonical matrix of the form

    J = [ 6  1  0  0  0 ; 0  6  0  0  0 ; 0  0  2  1  0 ; 0  0  0  2  0 ; 0  0  0  0  2 ] = diag(J_1, J_2, J_3),

where J_1 = [ 6  1 ; 0  6 ], J_2 = [ 2  1 ; 0  2 ] and J_3 = [2]. Find the number of linearly independent eigenvectors of J and determine all matrices whose Jordan canonical forms are J.

Find the number of linearly independent eigenvectors of J and determine all matrices
whose Jordan canonical forms are J .

Solution: Since J is a triangular matrix, the eigenvalues of J are the diagonal entries 6 and 2 with multiplicities 2 and 3, respectively. The eigenspace E(6) has a single linearly independent eigenvector because dim E(6) = dim N(J − 6I) = 5 − rank(J − 6I) = 1, by the Rank Theorem 3.17. In fact, e_1 = (1, 0, 0, 0, 0) is such a vector and λ = 6 appears only in the single block J_1. Similarly, one can see that the eigenspace E(2) has two linearly independent eigenvectors e_3 and e_5 with dim E(2) = 5 − rank(J − 2I) = 2, and λ = 2 appears in the two blocks J_2 and J_3.
Hence, one can conclude that if a matrix A is similar to J, then A is a 5 × 5 matrix whose eigenvalues are 6 and 2 with multiplicities 2 and 3 respectively, but there is only one linearly independent eigenvector belonging to 6 (i.e., dim E(6) = 1) and there are only two linearly independent eigenvectors belonging to 2 (i.e., dim E(2) = 2). Moreover, the converse is also true by Theorem 8.1. In general, one can say that if a matrix A is similar to a Jordan canonical matrix J, then both matrices have the same eigenvalues of the same multiplicities and dim N(λI − A) = dim N(λI − J) for each eigenvalue λ. □

As shown in Example 8.1, the standard basis vectors ej's associated with the first
column vectors of the Jordan blocks Jj'S of J form linearly independent eigenvectors
of the matrix J, and then the vectors Qe j 's form linearly independent eigenvectors
of the matri x A, where Q-1AQ = J.

Problem 8.1 Note that the matrix

A = [~o - i-~ j -~]


0 0 2 0
o 0 0 0 2
has the eigenvalues 6 and 2 with multiplicities 2 and 3, respectively. Moreover, there are two
linearly independent eigenvectors uj = =
(0, -1 , 1, 0, 0) and U2 (0, 1, 0, 0, 1) belonging
to A = 2, and vI = (-1, 0, 0, 0, 0) is an eigenvector belonging to A = 6. Show that the
Jordan canonical form of A is the matrix J given in Example 8.1 by showing Q-I A Q = J
with an invertible matrix

Q = [-~o 1-! ~! ].
0 0 -1 0
o 0 001

Problem 8.2 Show that for any Jordan block J, the order of J is the smallest positive integer k such that (J − λI)^k = 0, where λ is an eigenvalue of J.

An eigenvalue λ may appear in several blocks. In fact, the number of Jordan blocks belonging to an eigenvalue λ is equal to the number of linearly independent eigenvectors of A belonging to λ, which is the dimension of the eigenspace E(λ) = N(λI − A). Moreover, the sum of the orders of all Jordan blocks belonging to an eigenvalue λ is equal to the multiplicity m_λ of λ.
Next, one might ask how to determine the Jordan blocks belonging to a given eigenvalue λ. The following example shows all possible cases of Jordan blocks belonging to an eigenvalue λ when its multiplicity m_λ is fixed.
Example 8.2 (Classifying Jordan canonical matrices having a single eigenvalue) Classify all possible Jordan canonical forms of a 5 × 5 matrix A that has a single eigenvalue λ of multiplicity 5 (up to permutations of the Jordan blocks).

Solution: There are seven possible Jordan canonical forms as follows.

(1) Suppose A has only one linearly independent eigenvector belonging to λ. Then the Jordan canonical form of A is of the form

    J(1) = Q^{-1} A Q = [ λ  1  0  0  0 ; 0  λ  1  0  0 ; 0  0  λ  1  0 ; 0  0  0  λ  1 ; 0  0  0  0  λ ],

which consists of only one Jordan block with eigenvalue λ on the diagonal. And, both A and J(1) have only one linearly independent eigenvector belonging to λ. (Note that rank(J(1) − λI) = 4.)

(2) Suppose A has two linearly independent eigenvectors belonging to λ. Then the Jordan canonical form of A is either one of the forms

    J(2) = diag( [ λ  1 ; 0  λ ], [ λ  1  0 ; 0  λ  1 ; 0  0  λ ] )   or   J(3) = diag( [λ], [ λ  1  0  0 ; 0  λ  1  0 ; 0  0  λ  1 ; 0  0  0  λ ] ),

each of which consists of two Jordan blocks belonging to the eigenvalue λ. Note that J(2) has two linearly independent eigenvectors e_1 and e_3, while J(3) has e_1 and e_2. These two matrices J(2) and J(3) cannot be similar, because (J(2) − λI)³ = 0, but (J(3) − λI)³ ≠ 0. (One can justify it by a direct computation.)
(3) Suppose A has three linearly independent eigenvectors belonging to λ. Then the Jordan canonical form of A consists of three Jordan blocks belonging to the eigenvalue λ: either J(4), with one block of order 1 followed by two blocks of order 2, or J(5), with two blocks of order 1 followed by one block of order 3. Note that J(4) has three linearly independent eigenvectors e_1, e_2 and e_4, while J(5) has e_1, e_2 and e_3. These two matrices J(4) and J(5) are not similar, because (J(4) − λI)² = 0, but (J(5) − λI)² ≠ 0.
(4) Suppose A has four linearly independent eigenvectors belonging to λ. Then the Jordan canonical form of A is the matrix J(6) consisting of three blocks of order 1 and one block of order 2, that is, of four Jordan blocks with eigenvalue λ.

(5) Suppose A has five linearly independent eigenvectors belonging to λ. Then the Jordan canonical form of A is the diagonal matrix J(7) with diagonal entries λ.
Note that all of these seven possible Jordan canonical matrices have the same trace, determinant and the same characteristic polynomial, but no two of them are similar to each other. □

As shown in case (2) (also in (3)) of Example 8.2, the two Jordan canonical matrices J(2) and J(3) have the same eigenvalue of the same multiplicity and they also have the same number of linearly independent eigenvectors, but they are not similar. The problem of choosing the one of the two possible Jordan canonical forms which is similar to the given matrix A depends on the sequence of rank(A − λI)^ℓ for ℓ = 1, 2, ..., n.
The next example illustrates how to determine the orders of the Jordan blocks belonging to the same eigenvalue λ.

Example 8.3 (Determine Jordan blocks belonging to the same eigenvalue) Let J be the 8 × 8 Jordan canonical matrix with a single eigenvalue λ consisting of one Jordan block of order 3, two Jordan blocks of order 2 and one Jordan block of order 1. A direct computation of the powers of J − λI shows that (J − λI)³ = 0.
Thus, one can get rank(J − λI) = 4, rank(J − λI)² = 1 and rank(J − λI)³ = 0. This sequence of ranks, rank(J − λI)^ℓ for ℓ = 1, 2, 3, determines completely the orders of the blocks of J. In fact, one can notice that
(i) the fact that (J − λI)³ = 0 but (J − λI)² ≠ 0 implies that the largest block has order 3,
(ii) rank(J − λI)² = 1 is equal to the number of blocks of order 3,
(iii) rank(J − λI) = 4 is equal to twice the number of blocks of order 3 plus the number of blocks of order 2, so there are two of them,
(iv) the number of blocks of order 1 is 8 − (2 × 2) − (3 × 1) = 1.

Recall that two similar matrices have the same rank. Hence, if J is the Jordan canonical form of A, then for any eigenvalue λ and for any positive integer ℓ, we have rank(A − λI)^ℓ = rank(J − λI)^ℓ. Furthermore, in the sequence of matrices J − λI, (J − λI)², (J − λI)³, ..., all Jordan blocks belonging to the eigenvalue λ will terminate to a zero matrix but all other blocks (belonging to an eigenvalue different from λ) remain upper triangular matrices with nonzero diagonal entries. Hence, the sequence of rank(J − λI)^k must stop decreasing at n − m_λ. (Note: rank(J − λI)^k = n − m_λ when k = m_λ.)
Let c_λ = n − m_λ for simplicity. Then, the decreasing sequence

    { rank(A − λI)^k − c_λ : k = 1, ..., m_λ }

determines completely the orders of the blocks of J belonging to λ, as shown in Example 8.3:
(i) The order of the largest block belonging to λ is the smallest positive integer k such that rank(A − λI)^k − c_λ = 0. And the number of such largest blocks is equal to rank(A − λI)^{k−1} − c_λ (say = i_1).
(ii) The number of blocks of order k − 1 is equal to rank(A − λI)^{k−2} − c_λ − 2i_1 (say = i_2).
(iii) The number of blocks of order k − 2 is equal to rank(A − λI)^{k−3} − c_λ − 3i_1 − 2i_2 (say = i_3), and so on.
In general, if i_1, i_2, ..., i_j are given, one can determine i_{j+1} inductively as follows:
(iv) The number of blocks of order k − j is equal to rank(A − λI)^{k−(j+1)} − c_λ − (j+1)i_1 − ji_2 − ⋯ − 2i_j (say = i_{j+1}), with i_0 = 0, for j = 0, ..., k − 1.

In summary, one can determine the Jordan canonical form J of an n × n matrix A by the following procedure.

Step 1 Find all distinct eigenvalues λ_1, ..., λ_t of A. Let their multiplicities be m_{λ_1}, ..., m_{λ_t}, respectively, so that m_{λ_1} + ⋯ + m_{λ_t} = n, and let c_{λ_j} = n − m_{λ_j}.
Step 2 For each eigenvalue λ, the Jordan blocks belonging to the eigenvalue λ are determined by the following criteria:
    (i) The order of the largest block belonging to λ is the smallest positive integer k such that rank(A − λI)^k − c_λ = 0, and
    (ii) the number of blocks of order k − j is inductively determined as rank(A − λI)^{k−(j+1)} − c_λ − (j+1)i_1 − ji_2 − ⋯ − 2i_j (say = i_{j+1}), with i_0 = 0, for j = 0, ..., k − 1.

This is a general guide to determine the Jordan canonical form of a matrix. However, for a matrix of large order, the evaluation of rank(A − λI)^k might not be easy at all, while, for matrices of lower order or relatively simple matrices, the computations may be accessible.
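The rank criteria of Steps 1-2 translate directly into a small routine; the sketch below (not from the book; the function name is ours) reads off the block orders from the ranks of (A − λI)^k. It uses a standard equivalent bookkeeping: the number of blocks of order at least k is rank(A − λI)^{k−1} − rank(A − λI)^k. Note that numerical ranks in floating point are sensitive to the tolerance, so this is only reliable for small, well-scaled examples.

```python
import numpy as np

def jordan_block_orders(A, lam, tol=1e-9):
    """Orders of the Jordan blocks of A belonging to the eigenvalue lam (a sketch)."""
    n = A.shape[0]
    ranks = [n]                                  # rank((A - lam I)^0) = n
    M = np.eye(n)
    for _ in range(n):
        M = M @ (A - lam * np.eye(n))
        ranks.append(np.linalg.matrix_rank(M, tol=tol))
    at_least = [ranks[k - 1] - ranks[k] for k in range(1, n + 1)]   # blocks of order >= k
    orders = []
    for k in range(n, 0, -1):
        exactly_k = at_least[k - 1] - (at_least[k] if k < n else 0)
        orders += [k] * exactly_k
    return orders

A = np.array([[2.0, 1.0, 4.0],
              [0.0, 2.0, -1.0],
              [0.0, 0.0, 3.0]])
print(jordan_block_orders(A, 2.0))   # [2]  -> one 2x2 block for lambda = 2 (Example 8.4)
print(jordan_block_orders(A, 3.0))   # [1]
```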

Example 8.4 (Jordan canonical form of a triangular matrix) Find the Jordan canonical form J of the matrix

    A = [ 2  1  4 ; 0  2  −1 ; 0  0  3 ].

Solution: Since A is triangular, the eigenvalues of A are the diagonal entries λ_1 = λ_2 = 2, λ_3 = 3. Hence, there are two possibilities for the Jordan canonical form of A:

    J(1) = [ 2  0  0 ; 0  2  0 ; 0  0  3 ]   or   J(2) = [ 2  1  0 ; 0  2  0 ; 0  0  3 ].

But, one can see that rank(A − 2I) = 2. It implies that rank(J − 2I) = 2 and so the Jordan canonical form of A must be J(2). □

Example 8.5 (Jordan canonical form of a companion matrix) Find the Jordan canonical form J of the matrix

    A = [ 0  1  0  0 ; 0  0  1  0 ; 0  0  0  1 ; −1  4  −6  4 ].

Solution: The characteristic polynomial of the matrix A is

    det(λI − A) = λ⁴ − 4λ³ + 6λ² − 4λ + 1 = (λ − 1)⁴.

The eigenvalue of A is λ = 1 of multiplicity 4. Note that the rank of the matrix

    A − I = [ −1  1  0  0 ; 0  −1  1  0 ; 0  0  −1  1 ; −1  4  −6  3 ]

is 3, by noting that the first three rows are linearly independent, or the determinant of the 3 × 3 principal submatrix of the upper left part is not zero. Hence, the rank of J − I is also 3 and so J − I must have three 1's above the diagonal entries. It means that

    J = [ 1  1  0  0 ; 0  1  1  0 ; 0  0  1  1 ; 0  0  0  1 ].

Also, one can check the following equations:

    (A − I)³ = [ −1  3  −3  1 ; −1  3  −3  1 ; −1  3  −3  1 ; −1  3  −3  1 ] ≠ 0,   but   (A − I)⁴ = 0,

which says that the order of the largest Jordan block is 4. □

Problem 8.3 Let A be a 5 × 5 matrix with two distinct eigenvalues: λ of multiplicity 3 and μ of multiplicity 2. Find all possible Jordan canonical forms of A up to permutations of the Jordan blocks.

n
Problem 8.4 Find the Jordan canonical form for each of the following matrices:

(2)
412]
[ 004
0 4 2 ,
(3) U~ !
8.2 Generalized eigenvectors
In Section 8.1, assuming Theorem 8.1, we have shown how to determine the Jordan
canonical form J of a matrix A . In this section, we discuss how to determine a basis-
change matrix Q so that Q^{-1}AQ = J is the Jordan canonical form of A. In fact, if J is given, then a basis-change matrix Q is a nonsingular solution of the matrix equation AQ = QJ.
The following example illustrates how to determine a basis-change matrix Q when
a matrix A and its Jordan canonical form J are given.

Example 8.6 (Each Jordan block to each chain of generalized eigenvectors) Let A be a 5 × 5 matrix similar to a Jordan block of the form

    Q^{-1} A Q = J = [ λ  1  0  0  0 ; 0  λ  1  0  0 ; 0  0  λ  1  0 ; 0  0  0  λ  1 ; 0  0  0  0  λ ].

Determine a basis-change matrix Q = [x_1 x_2 x_3 x_4 x_5].

Solution: Clearly, two similar matrices A and J have the same eigenvalue λ of multiplicity 5. Since rank(J − λI) = 4, dim N(J − λI) = dim E(λ) = 1. In fact, J has only one linearly independent eigenvector, which is e_1 = (1, 0, 0, 0, 0). Thus Qe_1 = x_1 is a linearly independent eigenvector of A. Also, note that the smallest positive integer k such that (A − λI)^k = (J − λI)^k = 0 is 5, which is the order of the block J.
To see what the other columns of Q are, we expand AQ = QJ as

    [Ax_1  Ax_2  Ax_3  Ax_4  Ax_5] = [λx_1   x_1 + λx_2   x_2 + λx_3   x_3 + λx_4   x_4 + λx_5].

By comparing the column vectors, we have

    Ax_5 = x_4 + λx_5,   or   (A − λI)x_5 = (A − λI)¹x_5 = x_4,
    Ax_4 = x_3 + λx_4,   or   (A − λI)x_4 = (A − λI)²x_5 = x_3,
    Ax_3 = x_2 + λx_3,   or   (A − λI)x_3 = (A − λI)³x_5 = x_2,
    Ax_2 = x_1 + λx_2,   or   (A − λI)x_2 = (A − λI)⁴x_5 = x_1,
    Ax_1 = λx_1,         or   (A − λI)x_1 = (A − λI)⁵x_5 = 0.

Note that the vector x_5 satisfies (A − λI)⁵x_5 = 0 but (A − λI)⁴x_5 = x_1 ≠ 0. However, (A − λI)⁵ = (J − λI)⁵ = 0. Hence, if one gets x_5 as a solution of (A − λI)⁴x ≠ 0, then all the other x_i's can be obtained by x_4 = (A − λI)x_5, x_3 = (A − λI)x_4, x_2 = (A − λI)x_3, etc. Such a vector x_5 is called a generalized eigenvector of rank 5, and the (ordered) set {x_1, ..., x_5} is called a chain of generalized eigenvectors belonging to λ. Therefore, the columns of the basis-change matrix Q form a chain of generalized eigenvectors. □
In general, by expanding A Q = QJ, one can see that the columns of Q corre-
sponding to the first columns of Jordan blocks of J form a maximal set of linearly
independent eigenvectors of A, and remaining columns of Q are generalized eigen-
vectors .
Definition 8.1 A nonzero vector x is said to be a generalized eigenvector of A of rank k belonging to an eigenvalue λ if

    (A − λI)^k x = 0   and   (A − λI)^{k−1} x ≠ 0.

Note that if k = 1, this is the usual definition of an eigenvector. For a generalized eigenvector x of rank k ≥ 1 belonging to an eigenvalue λ, define

    x_k     = x,
    x_{k−1} = (A − λI)x_k = (A − λI)x,
    x_{k−2} = (A − λI)x_{k−1} = (A − λI)²x,
        ⋮
    x_2     = (A − λI)x_3 = (A − λI)^{k−2}x,
    x_1     = (A − λI)x_2 = (A − λI)^{k−1}x.

Thus, for each ℓ, 1 ≤ ℓ ≤ k, we have (A − λI)^ℓ x_ℓ = (A − λI)^k x = 0 and (A − λI)^{ℓ−1} x_ℓ = x_1 ≠ 0. Note also that (A − λI)^i x_j = 0 for i ≥ j. Hence, the vector x_ℓ = (A − λI)^{k−ℓ}x is a generalized eigenvector of A of rank ℓ. See Figure 8.1.
Definition 8.2 The set of vectors {x_1, x_2, ..., x_k} is called a chain of generalized eigenvectors belonging to the eigenvalue λ. The eigenvector x_1 is called the initial eigenvector of the chain.
The following successive three theorems show that a basis-change matrix Q can
be constructed from the chains of generalized eigenvectors initiated from s linearly
independent eigenvectors of A, and also justify the invertibility of Q. (A reader may
omit reading their proofs below if not interested in details .)
Theorem 8.2 A chain of generalized eigenvectors {XI, X2, ... , Xk} belonging to an
eigenvalue A is linearly independent.

Proof: Let c_1x_1 + c_2x_2 + ⋯ + c_kx_k = 0 with constants c_i, i = 1, ..., k. If we multiply (on the left) both sides of this equation by (A − λI)^{k−1}, then for i = 1, ..., k − 1,
[Figure 8.1. A chain of generalized eigenvectors: x = x_k ↦ x_{k−1} ↦ ⋯ ↦ x_2 ↦ x_1 under repeated application of A − λI]

    (A − λI)^{k−1} x_i = (A − λI)^{k−(i+1)} (A − λI)^i x_i = 0.

Thus, c_k (A − λI)^{k−1} x_k = c_k x_1 = 0, and, hence, c_k = 0.
Do the same to the equation c_1x_1 + ⋯ + c_{k−1}x_{k−1} = 0 with (A − λI)^{k−2} and get c_{k−1} = 0. Proceeding successively, one can show that c_i = 0 for all i = 1, ..., k. □

Theorem 8.3 The union of chains of generalized eigenvectors of a square matrix A belonging to distinct eigenvalues is linearly independent.

Proof: For brevity, we prove this theorem for only two distinct eigenvalues. Let {x_1, x_2, ..., x_k} and {y_1, y_2, ..., y_ℓ} be the chains of generalized eigenvectors of A belonging to the eigenvalues λ and μ, respectively, and let λ ≠ μ. In order to show that the set of vectors {x_1, ..., x_k, y_1, ..., y_ℓ} is linearly independent, let

    c_1x_1 + ⋯ + c_kx_k + d_1y_1 + ⋯ + d_ℓy_ℓ = 0

with constants c_i's and d_j's. We multiply both sides of the equation by (A − λI)^k and note that (A − λI)^k x_i = 0 for all i = 1, ..., k. Thus we have

    (A − λI)^k (d_1y_1 + d_2y_2 + ⋯ + d_ℓy_ℓ) = 0.

Again, multiply this equation by (A − μI)^{ℓ−1} and note that

    (A − μI)^{ℓ−1}(A − λI)^k = (A − λI)^k(A − μI)^{ℓ−1},
    (A − μI)^{ℓ−1} y_ℓ = y_1,
    (A − μI)^{ℓ−1} y_i = 0

for i = 1, ..., ℓ − 1. Thus we obtain

    d_ℓ (A − λI)^k y_1 = 0.

Because (A − μI)y_1 = 0 (or Ay_1 = μy_1), this reduces to

    d_ℓ (μ − λ)^k y_1 = 0,

which implies that d_ℓ = 0 by the assumption λ ≠ μ and y_1 ≠ 0. Proceeding successively, one can show that d_i = 0, i = ℓ, ℓ−1, ..., 2, 1, so we are left with

    c_1x_1 + ⋯ + c_kx_k = 0.

Since {x_1, ..., x_k} is already linearly independent by Theorem 8.2, c_i = 0 for all i = 1, ..., k. Thus the set of generalized eigenvectors {x_1, ..., x_k, y_1, ..., y_ℓ} is linearly independent. □

The next step to determine Q such that A Q = Q J is how to choose chains of


generalized eigenvectors from a generalized eigenspace, which is defined below, so
that the union of the chains is linearly independent.

Definition 8.3 Let λ be an eigenvalue of A. The generalized eigenspace of A belonging to λ, denoted by K_λ, is the set

    K_λ = { x ∈ ℂ^n : (A − λI)^p x = 0 for some positive integer p }.


It turns out that dim K_λ is the multiplicity of λ, and K_λ contains the usual eigenspace N(A − λI). The following theorem enables us to choose a basis for K_λ, but we omit the proof even though it can be proved by induction on the number of vectors in S ∪ T.

Theorem 8.4 Let S = {x_1, x_2, ..., x_k} and T = {y_1, y_2, ..., y_ℓ} be two chains of generalized eigenvectors of A belonging to the same eigenvalue λ. If the initial vectors x_1 and y_1 are linearly independent, then the union S ∪ T is also linearly independent.

Note that Theorem 8.4 extends easily to a finite number of chains of generalized eigenvectors of A belonging to an eigenvalue λ, and the union of such chains will form a basis for K_λ so that the matrix Q may be constructed from these bases for each eigenvalue as usual.

Example 8.7 (Basis-change matrix for a triangular matrix) For the matrix

    A = [ 2  1  4 ; 0  2  −1 ; 0  0  3 ],

find a basis-change matrix Q so that Q^{-1}AQ is the Jordan canonical matrix.

Solution: Method 1: In Example 8.4, the Jordan canonical form J of A is determined as

    J = [ 2  1  0 ; 0  2  0 ; 0  0  3 ].

By comparing the column vectors of AQ = QJ with Q = [x_1 x_2 x_3], one can get

    Ax_1 = 2x_1,   Ax_2 = x_1 + 2x_2,   Ax_3 = 3x_3.
Since x_1 and x_3 are eigenvectors of A belonging to λ = 2 and λ = 3, one can take x_1 = (1, 0, 0) and x_3 = (3, −1, 1). Also, from the equation Ax_2 = 2x_2 + x_1, one can conclude x_2 = (a, 1, 0) with any constant a, so that

    Q = [ 1  a  3 ; 0  1  −1 ; 0  0  1 ].

One may check the equality AQ = QJ by a direct computation.

Method 2: This is a direct method to compute Q without using the Jordan canonical
form J. Clearly, the eigenvalues of A are Al = A2 = 2, A3 = 3. Since rank(A -
All) = 2, the dimension of the eigenspace N(A - All) is 1. Thus there is only
one linearly independent eigenvector belonging to Al = A2 = 2, and an eigenvector
belonging to A3 = 3 is found to be X3 = (3, -1 , 1). We need to find a generalized
eigenvector X2 of rank 2 belonging to A = 2, which is a solution of the following
systems:

(A - 2l)x =
[~
1
0
0
-:]. ~.
(A - 2l)2 x
~U
0
0
0
-!]. ~.
From the second equation , X2 has to be ofthe form (a , b, 0), and from the first equation
we must have b :f; O. Let us take X2 = (0, 1, 0) as a generalized eigenvector of rank
2. Then, we have XI = (A - 2l)x2 = (1, 0, 0). Now, one can set

Q~['1'2'3]~ U! -:]
The reader may check by a direct computation

210]
Q-I AQ =
[ o0 02 03 = [ 0 JI

where JI = [~ ; ] and Jz = [3]. o

Example 8.8 (Basis-change matrix for a companion matrix) Find a basis-change matrix Q so that Q^{-1}AQ = J is the Jordan canonical form of the matrix

    A = [ 0  1  0  0 ; 0  0  1  0 ; 0  0  0  1 ; −1  4  −6  4 ].
Solution: Method 1: In Example 8.5, we computed

    J = [ 1  1  0  0 ; 0  1  1  0 ; 0  0  1  1 ; 0  0  0  1 ].

Now, one can find a basis-change matrix Q = [x_1 x_2 x_3 x_4] by comparing the column vectors of AQ = QJ:

    Ax_1 = x_1,   Ax_2 = x_2 + x_1,   Ax_3 = x_3 + x_2,   Ax_4 = x_4 + x_3.

By computing an eigenvector of A belonging to λ = 1, one can get x_1 = (1, 1, 1, 1). The equation Ax_2 = x_2 + x_1 gives x_2 = (a, a+1, a+2, a+3) for any a. Take x_2 = (0, 1, 2, 3). Similarly, from the equations Ax_3 = x_3 + x_2 and Ax_4 = x_4 + x_3, one can get x_3 = (b, b, b+1, b+3) for any b and set x_3 = (0, 0, 1, 3), and successively one can take x_4 = (0, 0, 0, 1). We conclude that

    Q = [ 1  0  0  0 ; 1  1  0  0 ; 1  2  1  0 ; 1  3  3  1 ].

One may check AQ = QJ by a direct matrix multiplication.
Method 2: The characteristic polynomial of the matrix A is

    det(A − λI) = λ⁴ − 4λ³ + 6λ² − 4λ + 1 = (λ − 1)⁴.

The only eigenvalue of A is λ = 1 of multiplicity 4. Note that dim N(A − I) = 1 because the rank of the matrix

    A − I = [ −1  1  0  0 ; 0  −1  1  0 ; 0  0  −1  1 ; −1  4  −6  3 ]

is 3. Thus, a basis-change matrix Q = [x_1 x_2 x_3 x_4] consists of a chain of generalized eigenvectors. First, we find a generalized eigenvector x_4 of rank 4, which is a solution x of the following equations:

    (A − I)³x = [ −1  3  −3  1 ; −1  3  −3  1 ; −1  3  −3  1 ; −1  3  −3  1 ] x ≠ 0,   (A − I)⁴x = 0.

But, a direct computation shows that the matrix (A − I)⁴ = 0. Hence, one can take any vector that satisfies the first equation as a generalized eigenvector of rank 4: take x_4 = (−1, 0, 0, 0). Then,
    x_3 = (A − I)x_4 = [ −1  1  0  0 ; 0  −1  1  0 ; 0  0  −1  1 ; −1  4  −6  3 ] (−1, 0, 0, 0) = (1, 0, 0, 1),
    x_2 = (A − I)x_3 = (−1, 0, 1, 2),
    x_1 = (A − I)x_2 = (1, 1, 1, 1).

Now, one can set

    Q = [x_1 x_2 x_3 x_4] = [ 1  −1  1  −1 ; 1  0  0  0 ; 1  1  0  0 ; 1  2  1  0 ].

Then,

    Q^{-1} A Q = [ 1  1  0  0 ; 0  1  1  0 ; 0  0  1  1 ; 0  0  0  1 ] = J,   with   Q^{-1} = [ 0  1  0  0 ; 0  −1  1  0 ; 0  1  −2  1 ; −1  3  −3  1 ].    □

Remark: In Examples 8.7-8.8, we use two different methods to determine a basis-change matrix. In Method 1, we first find an initial eigenvector x_1 in order to get a chain {x_1, x_2, ..., x_k} of generalized eigenvectors belonging to λ. After that, we find x_2 as a solution of (A − λI)x_2 = x_1, and x_3 as a solution of (A − λI)x_3 = x_2, and so on. In this method, we don't need to compute the power matrices (A − λI)², (A − λI)³, etc. But this method may sometimes fail, because the system (A − λI)x_2 = x_1 may be inconsistent for the chosen eigenvector x_1. See the next Example 8.9. In Method 2, we first find a generalized eigenvector x_k of rank k as a solution of (A − λI)^k x = 0 with (A − λI)^{k−1} x ≠ 0. With this x_k, one can get x_{k−1} as x_{k−1} = (A − λI)x_k, and x_{k−2} = (A − λI)x_{k−1}, and so on. This method always works, but we need to compute the power (A − λI)^k.
The next example shows that a chain of generalized eigenvectors may sometimes
not be obtained from an initial eigenvector of the chain.

Example 8.9 For a matrix

A=
[ 5-3 -2]
8 -5 -4 ,
-4 3 3

find a basis-change matrix Q so that Q-l A Q is the Jordan canonical matrix.

Solution: Method 1: The eigenvalue of A is A = I of multiplicity 3, and the rank of

[ 4-3 -2]
the matrix

A - 1= 8 -6 -4
-4 3 2
288 Chapter 8. Jordan Canonical Forms

is 1. (Note that the second and the third rows are scalar multiples of the first row.)
Hence, there are two linearly independent eigenvectors belonging to A. = 1, and the
Jordan canonical form J of A is determined as

J= 1 1
010
[ 001
J.
Now, one may find a basis-change matrix Q = [XI X2 X3] by comparing the column
vectors of AQ = QJ:

By computing an eigenvector of A belonging to A. = 1, one may get two linearly


independent eigenvectors: take UI = =
(l, 0, 2) and U2 (0, 2, -3). If we take the
eigenvector XI as UI or U2, then a generalized eigenvector X2 must be a solution of

But this system is inconsistent and one cannot get a generalized eigenvector X2 in this
way. It means that we are supposed to take an eigenvector XI E (1) carefully in
=
order to get X2 as a solution of (A - I)x XI .
Method 2: First, note that A has an eigenvalue A. = 1 of multiplicity 3 and there
are two linearly independent eigenvectors belonging to A. = 1. Hence, we need to find
a generalized eigenvector of rank 2, which is a solution X of the following equations:

(A -I)x

(A _1)2 x
=
=
U=~ =n [n ~
O.
0,

But, a direct computation shows that the matrix (A - 1)2 = O. Hence, one can take
any vector that satisfies the first equation as a generalized eigenvector of rank 2: take
X2 = (0, 0, -1). Then,

x\ ~ (A -I)x, ~ [j =~ =n UJ ~ Ul
Now, by taking another eigenvector X3 = (1, 0, 2), so that XI and X3 are linearly
independent, one can get
8.3. The power A k and the exponential e A 289

-n
One may check by a direct computation

1 1 0]
Q-'AQ= 0 1 0 =J o
[ 001

Problem 8.5 (From Problem 8.4) Find a full set of generalized eigenvectors of the following
matrices:

412]
(2)
[ 0 4 2
004

8.3 The power A k and the exponential eA


In this section, we discuss how to compute the power Ak and the exponential matrix
e A , which are necessary for solving linear difference or differential equations as
mentioned in Sections 6.3 and 6.5. It can be dealt with in two ways. Firstly, it can
be done by computing the power Jk and the exponential matrix e J for the Jordan
canonical form J of A, as shown in this section. The second method is based on the
Cayley-Hamilton theorem (or the minimal polynomial) and it will be discussed later
in Sections 8.6.1-8.6.2.
Let J be the Jordan canonical form of a square matrix A and let

with a basis-change matrix Q. Since

for k = 1,2, . .. it is enough to compute Jk for a single Jordan block J. Now an


n x n Jordan block J belonging to an eigenvalue). of A can be written as

).J + N.
290 Chapter 8. Jordan Canonical Forms

Since I is the identity matrix, clearly IN = N I and

Note that N k = 0 for k ~ n. Thus, by assuming (~) = 0 if k < e,

Next, to compute the exponential matrix e A , we first note that

Thus, it is enough to compute e J for a single Jordan block J. Let J = AI + N, as


before. Then, N k = 0 for k ~ nand

I I 1
1 -
21 (n - 2)! (n - I)!
1 1
0 -
2! (n - 2)!
Nk
L-
n-l
eJ =eJ..leN = eA = eA
k=O k! 1
1 -
2!
1
0 0 1
8.3. The power A k and the exponential e A 291

Example 8.10 (Computing Ak and eA by usingthe Jordan canonicalform) Compute


the power Ak and the exponential matrix eA by using the Jordan canonical form of A
for

(1)A=
[o
2 2I - 4]
0
0
I
3
(2) A =[ ~ ~ ! ~].
-I 4 -6 4

Solution: (1) From Examples 8.4-8.7, one can see that

A=QJQ-l=[~0~0-i][~; ~][~ ~
1003001
-i],
and

Jk =
2k
00
(~)2kk-l
2
0]
0
[
o 3k
Hence,

and

(2) Do the same process as (I). From Examples 8.5-8.8,

-~][~
o
i: ~][ ~ -l j ~1]'
0 0 0 I -1 3-3

and
292 Chapter 8. Jordan Canonical Forms

(~) @ (;)
o 1 m @ o1 11 ~1 ~]
1
and e J
=e 2I .
o 0 1 (~) [
o0 1 1
000 1 o0 0 1

Now. one can compute Af = QJkQ-1 and e/' = Qe JQ-l.Forexample,

J[ :
1

-~o ~ 0]
l]U
2I 1
A
[ '1 -10 0I 1 -1 o1 0
e = ell 0 0 0 1 1 -2 1
1 2 1 o 0 0 3 -3 1
0
1

[-I
2

J
1
-2o ']
= e 2 5
0
-3 2 -3
13 21 17
-'6 8 -2 T

Example 8.11 (Computing Ak and e A by usingthe Jordancanonicalform) Compute


the power A k and the exponential matrix eA by using the Jordan canonicalform of A
for

o1
21 21 01]
A= 0 0 2 0 .
[
-1 1 0 3

Solution: (1) The characteristicpolynomialof A is det(AI - A) = >.. 4 _8>..3 +24>..2-


32>" + 16 = (>" - 2)4 , and A = 2 is an eigenvalue of multiplicity4. Since

-1 1 1 1]
o0 2 0
rank(A - 2l) = rank 0 0 0 0 =2
[
-1 1 0 1

and

2 00 00 01 00 ]
rank(A - 2l) = rank 0 0 0 0 = 1,
[
o 0 1 0
the Jordan canonicalform J of A must be of the form
8.3. The power Ak and the exponential e A 293

o2100]
2I 0
J=
[0 020'
o0 0 2
and a basis-change matrix Q such that Q-I AQ = J is

101 0]
2 0
o
1 -1-1 [ 2 -~ ~ -~]
0 ' and then Q = 0 o 1 0 .
o 0 -1 2 -1 0 -2

Therefore,

2 1 O]k
Ak _ QJk Q-I _ Q 0 2 1
[ 0 0 2
- -
[
o
and

~ ~]k [~
02
=
0
Hence,

k-2]
Zk (~)2k-1 @2 ]
k
Ak = QJk Q-I = Q 0 2 (~)2k-l 0 Q_I
[
[ o 0 2k
o 2k
2k _ k2 k- 1 kZk-l ~2k-2 + k2 k- 1 k2 k- 1 ]
o 2k 2k2 k- 1 0
- 0 0 2k O'
[
-kZk-l k2 k- 1 k(k;I)Zk-2 k2 k- 1 + Zk

(Z) With the same notation,

o ] Q-l
eh '

where
294 Chapter 8. Jordan Canonical Forms

;, ]
Hence,
I
ell -, = e2 [ 01
= e21eN = e2 L'N' I I .
k-O k.
- 0 0 I
Thus we have

~] g-l~ U'
1
I

~ ~ ~
2 e2 J. e2
2
A I I e2 2e 2 e'0 ]
e gel g -l e'g [
0 I 0 e2 o . 0
e2 ~e2 2e 2
0 0

j j
Problem 8.6 Compute A k and e A by using the Jordan canonical form for

2 1 0 -3
2 o
o1 00 -2 1 -11 2
2
(1) A = 0 0 2 0 ' (2) A = -2 1 -1 2 .
[ o 0 o 1 [
-2 -3 1 4

8.4 Cayley-Hamilton theorem


As we saw in earlier chapters, the association of the characteristic polynomial with
a matrix is very useful in studying matrices. In this section, using this association of
polynomials with matrices, we prove one more useful theorem, called the Cayley-
Hamilton theorem, which makes the calculation of matrix polynomials simple, and
has many applications to real problems.

Let f(x) = amx m + am_lX m-1 + . . . + alx + ao be a polynomial, and let A be


an n x n square matrix. The matrix defined by

is called a matrix polynomial of A.


For example, if f(x) = x 2 - 2x + 2 and A = [ ; ~], then
f(A) = A 2-2A+2h

= [~~]-2[; ~]+2[~ ~]=[~ ~] .


8.4. Cayley-Hamilton theorem 295

Problem 8.7 Let}. be an eigenvalue of a matrix A . For any polynomial f(x), show that f(}.)
is an eigenvalue of the matrix polynomial f(A) .

Theorem 8.5 (Cayley-Hamilton) For any n x n matrix A , if f(A) = det(Al - A)


is the characteristic polynomial of A, then f(A) O. =
Proof: If A is diagonal , its proof is an easy exercise. For an arbitrary square matrix
A, let its Jordan canonical form be

sothatf(A) = Qf(J)Q-I. Since

and f(J) = [f(JI> ". 0 ],


o f(Js)

it is sufficient to show that f(Jj) = 0 for each Jordan block Jj. Let Jj = Aol + N
with eigenvalue Ao of multiplicity m, in which N'" = O. Since f(A) = det(Al - A) =
det(Al - J) = (A - Ao)mg(A) for some g(A) , we have

f(Jj) = (Jj - Aol)mg(Jj) = (Aol + N - Aol)mg(Jj) = Nmg(Jj) = 0 g(Jj) = O.


o
Remark: For a Jordan block

J=[1 :]

o Ao

with a single eigenvalue Ao of multiplicity m, we have

( k ),k-m+1
.. . \m-I 1\.0

o
Hence, for any polynomial p(A),
296 Chapter 8. Jordan CanonicalForms
a~m-l ) pO.O)
(m-l) !

p(J) =
cl).pOo)
o p().o)

where 11>.. denotes the derivative with respect to x . In particular, for the characteristic
polynomial f().) = det(Al - J) = (). - ).o)m, we have f().o) = a>..f().o) = ... =
aim-I) f().o) = 0 and hence f(J) = O.

Example 8.12 The characteristic polynomial of

A=
3
0
6
2
[ -3 -12 j]
is f().) = det(Al - A) =).3 +).2 - 6)" and

f(A) = A 3+A2-6A

=
[27o 78 54]o + [-9 -42 -18]
8 0 4 0

n
-27 -102 -54 9 30 18

-6[ : 6 6] [0
-3 -12 -6
2 0 = 0 0
0 0
0
o

Problem 8.8 Let us prove the Cayley-Hamilton Theorem 8.5 as follows: by setting A = A ,
f(A) = det(AI - A) = =
det 0 O. Is it correct or not? If not, what is a wrong step?

Problem 8.9 Prove the Cayley-Hamilton theorem for a diagonal matrix A by computing f(A)
directly. By using this, do the same for a diagonalizable matrix A.

The Cayley-Hamilton theorem can be used to find the inverse of a nonsingular


matrix. If f().) = ).n + an_l).n-l + . . . + al). + ao is the characteristic polynomial
of a matrix A, then

O=f(A) = An+an_lAn-l+ ... +alA+aoI,


or - aoI (A n- l + an_lA n- 2 + . . . + all)A.

Since ao = f(O) = det(OI - A) = det( -A) = (_l)n det A, A is nonsingular if and


only if ao = (_l)n det A :/= O. Therefore, if A is nonsingular,

1
A -1 = --(A n- l +an-lA n-2 + .. . +all).
ao
8.4. Cayley-Hamilton theorem 297

Example 8.13 (Compute A-I by the Cayley-Hamilton theorem) The characteristic


polynomial of the matrix
42-2]
A = [ -5 3
-2 4
2
I
is f()...) = det()...h - A) =)...3 - 8)...2 + 17A - 10, and the Cayley-Hamilton theorem
yields

Hence

-2] +-17 [1
2 0
1 10 1

o
Problem 8.10 Let A and B be square matrices , not necessarily of the same order, and let
f()..) = det(A1 - A) be the characteristic polynomial of A. Show that f(B) is invertible if and
only if A has no eigenvalue in common with B.

The Cayley-Hamilton theorem can also be used to simplify the calculation of


matrix polynomials. Let p()...) be any polynomial and let f()...) be the characteris-
tic polynomial of a square matrix A. A theorem of algebra tells us that there are
polynomials q()...) and r()...) such that

p()...) = q()...)f()...) + r()...) ,


where the degree of r()...) is less than the degree of f()...). Then

p(A) = q(A)f(A) + r(A).

By the Cayley-Hamilton theorem, f(A) = 0 and


p(A) = r(A).

Thus, the problem of evaluating a polynomial of an n x n matrix A or in particular a


power A k can be reduced to the problem of evaluating a matrix polynomial of degree
less than n.

Example 8.14 The characteristic polynomial of the matrix A = [ ; i] is f()...) =


)...2 _ 2)", -3. Let p()...) = )...4 _7A 3 - 3)...2 + ).. + 4 be a polynomial. A division by
f()...) gives that
298 Chapter 8. Jordan Canonical Forms

p(),) = (),2 - 5), - lO)f(),) - 34), - 26.


Therefore

p(A) = (A 2 - 5A - lO)f(A) - 34A - 26/


= -34A -26/

= -34 [; i] - [b 26 ~] = [ =~~ -68 ]


-60 . o

Example 8.15 (Computing A k by the Cayley-Hamilton theorem) Computethepower


A 10 by using the Cayley-Hamiltontheoremfor

o1 1
2 1
210]
A= 002 0 .
[
- 1 1 0 3

Solution: The characteristic polynomial of A is f(),) = det(Al - A) =), 4 - 8),3 +


24),2 - 32), + 16 = (), - 2)4, see Example 8.11. A divisionby f(),) gives

), 10 = q(),)f(),) + r(),)

with a quotientpolynomial q(),) = ),6+8),5+40),4+ 160),3 +560),2+ 1792),+5376


and a remainderpolynomial r(),) = 15360),3 -80640),2+ 143360),-86016. Hence,

A IO = r(A) = 15360A 3 - 80640A 2 + 143360A - 86016/


-4096 5120 16640 5120]
=
o 1024 10240 0
[ o 0 1024 O '
-5120 5120 11520 6144

(Compare this result with Ak givenin Example 8.11). One may noticethat this com-
putationalmethodfor An will becomeincreasingly complicated if n becomesbigger
and bigger. A simpler methodwill be shownlater in Example 8.22. 0

I 0 1]
Problem 8.11 For the matrix A =[ 0 2 0 ,(1) evaluate the power matrix A 10 and the
002
inverse A-I ; (2) evaluate the matrix polynomial A 5 + 3A 4 + A 3 - A 2 + 4A + 6/.
8.5. The minimal polynomial 299

8.5 The minimal polynomial of a matrix

Let A be a square matrix of order n and let


t
to.) = det(H - A) = An + an_lAn-I + """ + alA + ao = TI (A - Ai)m}.j
i= 1

be the characteristic polynomial of A, where mi; is the multiplicity of the eigenvalue


Ai. Then, the Cayley-Hamilton theorem says that
t
An + an_lAn-I +.. . + alA + ao! = TI (A - Ail)ml.j = O.
i=1

The minimal polynomial of a matrix A is the monic polynomial m(A) = Er=o CiAi
of smallest degree m such that
m

m(A) = L ci A i = o.
i=O

Clearly, the minimal polynomial m(A) divides any polynomial p(A) satisfying p(A) =
O. In fact, if p(A) = q(A)m(A) + r(A) as on page 297, where the degree of r(A) is
less than the degree of m(A), then 0 = p(A) = q(A)m(A) + r(A) = r(A) and
=
the minimality of m(A) implies that r(A) O. In particular, the minimal polynomial
m(A) divides the characteristic polynomial so that
t
m(A) = TI (A - Ai)k j
i=1

with k; ~ mi. For example, the characteristic polynomial of the n x n zero matrix is
An and its minimal polynomial is just A.
Clearly, any two similar matrices have the same minimal polynomial because

for any invertible matrix Q.

Example 8.16 For a diagonal matrix

n
0 0 0

[i
2 0 0
0 2 0
A- 0 0 5
0 0 0
300 Chapter 8. Jordan Canonical Forms

its characteristic polynomial is f(A) = (A- 2)3 (A- 5)2. However, for the polynomial
m(A) = (A - 2)(A - 5), the matrix m(A) is

m(A) = (A - 2/)(A - 5/) = [~ ~ ~ ~ ~] [-~ -~ -~ ~ ~~]


00030 0000
= O.

00003 0000
Hence, the minimal polynomial of A is m(A) = (A - 2)(A - 5). o
Example 8.17 (The minimal polynomial ofa diagonal matrix)
(1) Any diagonal matrix of the form AO/ has the minimal polynomial A - AO. In
particular, the minimal polynomial of the zero matrix is the monomial A, and the
minimal polynomial of the identity matrix is the monomial A - I.
(2) If an n x n matrix A has n distinct eigenvalues AI , ... , An , then its minimal
polynomial coincides with the characteristic polynomial f(A) = 07=1(A - Ai).
In fact, for the diagonal matrix D having distinct diagonal entries AI, .. . , An
successively and for any given j, the (j, j)-entry of Oi;=j (D - Ai l) is equal to
Oi;=/Aj - Ai), which is not zero . 0
Example 8.18 (The minimal polynomial ofa Jordan canonical matrix J having one
or two blocks)

[I ! ~ i ~],
(1) For any 5 x 5 matrix A similar to a Jordan block of the form

Q-I AQ = J =

OOOOAO
its minimal polynomial is equal to the characteristic polynomial f(A) = (A -
AO)5, because (J - AO/)4 :p 0 but (J - AO/)5 O. =
(2) For a matrix J having two Jordan blocks belonging to a single eigenvalue A, say

JI 0]
J = [ 0 h =
AO
o
1 0]
AO 1
[ o 0 AO
the minimal polynomial of the smaller block Jz is a divisor of the minimal poly-
nomial of the larger block JI . In general, if a matrix A or its Jordan canonical
form J has a single eigenvalue AO, then its minimal polynomial is (A - AO)k,
where k is the smallest positive integer l such that (A - A/)i = O. In fact, such
number k is known as the order of the largest Jordan block belonging to A.
8.5. The minimal polynomial 301

(3) For a matrix J having two Jordan blocks belonging to two different eigenvalues
AO :/= AI respectively, say
I 0 0

[~
AO I o 0
o
0 AO 1 0
]
J-[iJ 0]_

n
0 0 AO I
- 0 h - 0 0 o AO

[~
1
Al
0
the minimal polynomial of J is a product of the minimal polynomials of JI and
h which is (A - Ao)5(A - AI)3.
In general, for a Jordan canonical matrix J, let h denote the direct sum of Jordan
blocks belonging to the eigenvalueA. Then, J = E9:=1 JA; , where t is the number
of the distinct eigenvalues of J. In this case, the minimal polynomial of J is the
product of the minimal polynomials of the summands hi's. 0
By a method similar to Example 8.18 (2) and (3) with Step 2(i) on page 279, one
can have the following theorem.
Theorem 8.6 For any n x n matrix A or its Jordan canonical matrix J. its minimal
polynomial is n:=1 (A - Ai)ki , where AI, . . . , At are the distinct eigenvalues of A
and k; is the order ofthe largest Jordan block in J belonging to the eigenvalue Ai. Or
=
equivalently, k, is the smallest positive integer i such that rank(A - Ai /)l + m A; n,
where mAl is the multiplicity of the eigenvalue Ai.
Corollary 8.7 A matrix A is diagonalizable if and only if its minimal polynomial is
equal to n:=1 (A - Ai), where AI, ... , At are the distinct eigenvalues of A.
Example 8.19 (Computing the minimal polynomial) Compute the minimal polyno-
mial of A for

20 2I -I4] 01
o 0 00]
1 0
(I)A=
[o 0 3
(2) A =
[ -10 0 0 1
4 -6 4
.

Solution: (1) Since A is triangular, its eigenvalues are Al = A2 = 2, A3 = 3. But,


rank(A - 2/) = 2 and so A is not diagonalizable. Hence, its minimal polynomial is
m(A) = (A - 2)2(A - 3).
(2) Recalling Example 8.5, we know that the eigenvalue of A is A = 1 of mul-
tiplicity 4, and rank(A - I) = 3. It implies that the Jordan canonical form of A
is

J =
1 1
0 I 1 0
o
o
0 1 1 '
0]
[
o 0 o 1
302 Chapter 8. Jordan Canonical Forms

and the minimal polynomial of A is meA) = (A - 1)4, by Theorem 8.6. 0

Example 8.20 Compute the minimal polynomial of A for

o1 12 1 2 1 0 J
A= 002 0 .
[ -1 1 0 3

Solution: In Example 8.11, we show that the characteristic polynomial of A is


f(A) = (A - 2)4 , rank(A - 2/) = 2, rank(A - 2/)2 = 1 and the Jordan canonical
form J of A is

J=[H
o
HJ.
0 0 2
So, the minimal polynomial of A is meA) = ()" - 2)3, by Theorem 8.6. 0

Problem 8.12 In Example8.2, we haveseen that there are seven possible(nonsimilar) Jordan
canonicalmatricesof order 5 that have a single eigenvalue. Computethe minimalpolynomial
of each of them.

8.6 Applications

8.6.1 The power matrix Ak again

We already know how to compute a power matrix Ak in two ways: One is by using the
Jordan canonical form J of A with a basis-change matrix Q as shown in Section 8.3;
and the other is by using the Cayley-Hamilton theorem with the characteristic poly-
nomial of A as shown in Section 8.4. In this section, we introduce the third method
by using the minimal polynomial , as a possibly simpler method.
The following example demonstrates how to compute a power A k and A-I for a
diagonalizable matrix A by using the minimal polynomial instead ofthe characteristic
polynomial and also without using the Jordan canonical form J of A.

Example 8.21 (Computing A k by the minimalpolynomial when A is diagonalizable)


Compute the power A k and A-I by using the minimal polynomial for a symmetric
matrix
4 0
A= 0 4
1 1 5
~ -~0 J .
[
-1 1 o 5
8.6.1. Application: The power matrix Ak again 303

Solution: Its characteristic polynomial is f (A) = (A - 3)2(A - 6)2. Since A is


symmetric and so diagonalizable, its minimal polynomial is meA) = (A - 3)(A - 6),
or equivalently meA) = A 2 - 9A + 181 = O. Hence ,

I 1
-5 0 I-I]
0 -5 I I
[
I
A- =--(A-9l)=-- .
18 18 I 1 -4 0
-I 1 0-4

To compute Ak for any natural number k, first note that the power Ak with k ~ 2
can be written as a linear combination of I and A, because A 2 = 9A - 18/. (See
page 297.) Hence, one can write

with unknown coefficients Xi'S. Now, by multiplying an eigenvector x of A belonging


to each eigenvalue Ain this equation, that is, by computing Akx = (xol + XI A)x, we
have a system of equations

as A = 3, 3k = Xo + 3XI ,
as A = 6, 6k = Xo + 6xI.

Its solution is Xo = 2 . 3k - 6k and XI = 1(6k - 3k). Hence ,

6k _ 3k
6k - 3k
3k + 2. 6k
o

Now, we discuss how to compute a power A k in two separate cases depending on


the diagonalizability of A.
(1) Firstly, suppose that A is diagonalizable. Then its minimal polynomial is equal
to meA) = 0:=1 (A - Ai), where AI , . . . , AI are the distinct eigenvalues of A. Now,
by using meA) = 0:=1(A - Ail) = 0 as in Example 8.21, one can write

A k = xoI + x\A + ... + xl_ IA I - I

with unknowns Xi 'S. Now, by multiplying an eigenvector Xi of A belonging to each


eigenvalue Ai in this equation , one can have a system of equations
304 Chapter 8. Jordan Canonical Forms

Its coefficient matrix P.. {] is an invertible Vandermonde matrix of order k because


the eigenvalues Ai'S are all distinct. Hence, the system is consistent and its unique
solution Xi'S determines the power A k completely.
(2) Secondly, let A be any n x n, may not be diagonalizable. By Theorem 8.6,
the minimal polynomial of A is m(A) = m=1
(A - Ai )k; , where AI, ... , A, are the
distinct eigenvalues of A and ki is the smallest positive integer l such that rank(A -
Ai l)l + mA; =
n, Let s =
L::=1 k, be the degree of the minimal polynomial m(A)
and let
A k = xoI + xlA + .. . + xs_lA s- l
with unknown Xi'S as before. Now, by multiplying an eigenvector x of A belonging
to each eigenvalue Ai in this equation, we have

Af = Xo + AiXl + ... + Af-lxs_l.

For the eigenvalue Ai, let {Xl, X2, .. . , Xkj} be a chain of generalized eigenvectors
belonging to Ai. Then these vectors satisfy

= AjXl,
Ai X2 + Xl,

By using the first two equations repeatedly, one can get

AkX2 = Afx2 + kAf-lXl .

On the other hand,

Akx2 = (xoI+XlA+ "'+XS_lA S - l)X2

= XOX2 + Xl(AiX2 + Xl) + . .. + Xs-l (Af- lX2 + (s - 1)Af-2Xl)

= (XO+AiXl+"'+Af-lXs-l)X2

+ (Xl + 2AiX2 + . .. + (s - 1)Ar2Xs_l) Xl.

In these two equations of Akx2, since the vectors Xl and X2 are linearly independent,
their coefficients must be the same. It means that
, s-l
= Xo + AiXl + ArX2 + + "i Xs-l,
Xl + 2AiX2 + + (s - 1)Ar2xs_l.

Here, the second equation is the derivative of the first equation with respect to Ai.
Similarly, one can write AkXk; as a linear combination oflinearly independent vec-
tors xi, X2, . .. , Xkj in two different ways and then a comparison of their coefficients
gives the following ki equations:
8.6.1. Application: The power matrix A k again 305

).} = + xo AjXI + ... + As-I


i Xs-I,
(~)A/-I = (DXI + (i)AiX2 + .. . + e- I)AS - 2
I i Xs-I ,
(~)Ajk-2 = @X2 + @Ai X3 + ... + e- I S 3
2 )Ai - Xs-I,

Note that the last ki - 1 equations are equivalent to the consecutive derivatives of
the first equationwith respect to Ai . For example, the two times of the third equation
is just the second derivative of the first equation, and the (ki - I)! times of the last
equationis the (ki - 1)-th derivative of the firstequation. Gettingtogetherall of such
equations for each eigenvalue Ai, i = 1, .. . , s, one can get a system of s equations
with s unknowns x j 's with an invertible coefficient matrix. Therefore,the unknowns
xo, . . . , Xs-I can be uniquely determined and so can the power Ak ,

Example 8.22 (Computing A k by the minimal polynomial when A is not diagonal-


izable) (Example8.11 again) Computethe power A k and A -I by using the minimal
polynomial for
I 21I2 01]
o
A=
[ -10 0 2 0
1 0 3
'

Solution: In Example 8.20, the minimal polynomial of A is determinedas m(A) =


(A - 2)3. Hence, one can write
k
A = xoI + xIA + x2A2

with unknown coefficients Xi'S and

2
A = [ ~ ~ ~8]'
-4 4 1
:

Now, at the eigenvalue A = 2, one can have

as A = 2, 2k = Xo + 2xI + 22x2,
take a", k2k- 1 = XI + 2 2x2,
take aI, k(k - 1)2k - 2
= 2X2

Its solutionis

X2 = k(k - 1)2k- 3; xI = k(2 - k)2 k _ I and XQ = 2k (1 3+ )


"ik2 - "ik 1 ,
306 Chapter 8. Jordan Canonical Forms

Thus, one can have (the same power as in Example 8.11)

2k - 3(3k + k2 )
k2 k
2k
k(k - 1)2k- 3

To find A -1, first note that m(A) = A 3 - 6A 2 + 12A - 8/ = O. Hence,

6 -2 -1 -2]
1 1 0 4 -4 0
A
-1
= '8 (A
2
- 6A + 12/) = '8 0 0 4 0 . o
[
2 -2 1 2

Problem 8.13 (Problem 8.11 again) F., the matrix A ~ [~ ~ ~]. ,"",.,1< the pow"

matrix A 10 and the inverse A - 1.

Problem 8.14 (Problem8.6 again)Compute Ak by using the minimal polynomial for

(1) A =
2 1
o 2 o
0 0 2 0
1 0 0] (2) A = -2
0 -3
1 - 1 2
1 -1 2
-2
12] .
[
'
[
o 0 o 1 -2 -3 1 4

8.6.2 The exponential matrix eA again

As a continuation of computing the exponential matrix eA in Section 8.3 in which


we use the Jordan canonical form J of A and a basis-change matrix Q, we intro-
duce another method with its minimal polynomial. This method is quite similar to
computing a power A k discussed in Section 8.6.1 (see also Example 8.22). It means
that we don't need the Jordan canonical form J of A and a basis-change matrix Q.
Because of a similarity to computing A k , we state only the difference in computing
the exponential matrix eA in this section .
We also discuss how to compute eA in two cases depending on the diagonalizability
of A.
(1) First, let A be diagonalizable. Then its minimal polynomial is equal to m(A) =
n:=1 (A - Ai), where A1' .. . , At are the distinct eigenvalues of A. Now, as before,
one can set
8.6.2. Application: The exponential matrix e A again 307

with unknowns Xj'S and by multiplying an eigenvector x of A belonging to each


eigenvalue Aj in this equation, one can get

~ ~~ ~l=~ ] [ ~~] [ :~: ]


[ ; A, A;~l X,~, ,L . =

The coefficient matrix is invertible and the unique solution x i 's determines the matrix
eA.
(2) Next, let A be any n x n, may not be diagonalizable. Then, the minimal
polynomial of A is m(A) = m=1 (A - A;)k; , where AI, ... , At are the distinct
eigenvalues of A and k; is the smallest positive integer f. such that rank(A - Aj /) l +
mA; = n. Let s = L~=I k, be the degree of m(A) and let

e
A
= xoI + xlA + .. . + xs_IA
s- 1

with unknown Xj'S as before.


For each eigenvalue Aj and a chain of generalized eigenvectors {XI , X2, . . . , Xk;}
belonging to Aj, a parallel procedure in computing eAxk; to that for A kxk; gives the
following kj equations :

eA; = AjXI + + ,s-l


I\.j Xs-I,
teA; = q)AjX 2 + + s- l ) , s-2
( 1 I\.j xs-I,
"':'e A; =
2! (2)A jX3 + + (S-2 l),I\.js-3 Xs-I,

Note that the last k; - 1 equations are equivalent to the consecutive derivatives of
the first equation with respect to Aj. For example, the 2! times of the third equation
is just the second derivative of the first equation, and the (kj - I)! times of the last
equation is the (kj - 1)-th derivative of the first equation . Getting together all of such
equations for each eigenvaluex. , i = 1, . .. , t, one can get a system of s equations
with s unknowns x j 's with an invertible coefficient matrix. Therefore, the unknowns
xo, .. . , Xs-I can be uniquely determined and so can the exponential eA.

Example 8.23 (Computing e A by the minimalpolynomialwhen A is diagonalizable)


Compute the exponential matrix e A by using the minimal polynomial for

5 -4 4]
A = [ 12 -11 12
4 -4 5
.

Solution: First, recall that the matrix A is the coefficient matrix of the system oflinear
differential equations given in Example 6.16:
308 Chapter 8. Jordan Canonical Forms

I
yj = 5Yl - 4Y2 + 4Y3
y~ = l2Yl - llY2 + l2Y3
Y~ = 4Yl - 4Y2 + 5Y3
It was known that A is diagonalizable with the eigenvalues Al = A2 = 1, and
A3 = -3. Hence, the minimal polynomial of A is m(A) = (A - l)(A + 3) , and one
can write e A =
xoI + Xl A with unknowns Xi'S. Then
as A = 1, e = Xo + Xl,
as A = -3, e- 3 = Xo - 3Xl.
By solving it, one can get Xo = t(3e + e- 3 ) , Xl = t(e - e- 3 ) and

2e - e- 3 -e + e- 3 e - e- 3 ]
e A
= xoI + xlA = 3e - 3e-3 -2e + 3e-3 3e - 3e-3 .
[ e - e- 3
o
-e + e- 3 2e - e- 3
Example 8.24 (Example 8.11 again) (Computing eA by theminimalpolynomialwhen
A is not diagonalizable) Compute the exponential matrix e A by using the minimal
polynomial for

o1 1
2 1
210]
A= 0 0 2 0 .
[
-1 1 0 3

Solution: In Example 8.20, the minimal polynomial of A is computed as m(A) =


(A - 2)3. Hence, one can set as in Example 8.22,
e A = xoI +xlA +x2A2
with unknown coefficients Xi 'S and

A
2
=[ ~ ~ : ~8]'
-4 4 1
Now, at the eigenvalue A = 2, one can have
as A = 2, e2 = Xo + 2xl +
take 0)", e2 = Xl +
take (0),,)2, e2 =
Its solution is X2 = !e2; Xl = _ e2 and Xo = e2. Hence,
e2 ~e2
A
e =xoI +xlA +x2A
2
= e2 (I - A+ ~A2) = [ ~ e2 2e2
o e2
_e 2 e2 !e 2
as shown in Example 8.11. o
8.6.3. Application: Linear Difference equations again 309

Problem 8.15 (Problem 8.6 again) Compute eA by using the minimal polynomial for

(1) A [~ ~ ~ ~]
= o (2) A = -2
0 -3
1 -1 2
12]
0 2 0 ' -2 1 -1 2 .
[
000 1 -2 - 3 1 4

8.6.3 Linear difference equations again

A linear difference equation is a matrix equation Xn =


AXn-I with a k x k matrix A
and if an initial vector Xo is given, then Xn = Anxo for all n.
If the matrix A is diagonalizable with k linearly independent eigenvectors
VI , V2, . . Vk belonging to the eigenvalues AI . A2, .. , Ak, respectively, a gen-
eral solution of a linear difference equation Xn =
AXn-I is known as

Xn = CIA'jVI + c2A2v2 + .. . + CkAkvk

with constants CI, C2, . q . (See Theorem 6.13.)


On the other hand, a linear difference equation Xn =
AXn-I can be solved for
any square matrix A (not necessarily diagonalizable) by using the power An, whose
computation was discussed in Sections 8.3 and 8.5.

Example 8.25 Solve a linear difference equation Xn = AXn-I, where

21 o1 211]
0
A= 0 0 2 0 .
[
- 1 1 0 3

Solution: In Example 8.11. it was shown that the matrix A has an eigenvalue 2 with
multiplicity 4, and A is not diagonalizable. However, the solution of Xn = AXn-I is
Xn = Anxo , where

2n _ n2 n- I n2 n- 1 2n- 3(3n + n2 )
n 0 ~ n~
A = 0 0 2n
[
_n2n- 1 n2 n- 1 n(n - 1)2 n- 3

which is given in Examples 8.11 and 8.22 o

Problem 8.16 Solve the linear difference equation Xn = AXn-l for


21
(1) A _
-
0 2
[ o0 00 ! ~],
o 2
(2) A = [~ i ! ~],
0 0 0 2
(3) A = [~ i ~ ~2] '
000
310 Chapter 8. Jordan Canonical Forms

8.6.4 Linear differential equations again


Now, we go back to a system of linear differential equations
y' = Ay with initial condition y(O) = Yo.
Its solution is known as y(t) = etAyo. (See Theorem 6.27.) In particular, if A is
diagonal izable, a general solution of y' = Ay is known as
y(t) = etAyo = cteA1tvl + c2eA2tv2 + ...+ cneA.tvn,
where VI , V2 , . .. , Vn aretheeigenvectorsbelongingtotheeigenvaluesAl, A2, . . . , An
of A, respectively .
For any square matrix A (not necessarily diagonalizable), the matrix etA can be
computed in two different ways. Firstly, let Q-l AQ =
J be the Jordan canonical
form of A. Then, the solution y(t) = etAyo is

where Q-lyO = (cj , ... , cn) and the Uj'S are generalized eigenvectors of A. In
particular, if Q-I A Q = J is a single Jordan block with corresponding generalized
eigenvectors u, of order k, then the solution becomes
iAyo = eAt QetN Q-I yO
= eAt [UI U2 . . . un]

o
x

o
= eAt ((~Ck+<~) UI + (~Ck+2~) U2 + ... +cnun).

As a simpler method to compute et A, one can use the minimal polynomial of t A


as discussed in Section 8.6.2. First, note that if m().,) is the minimal polynomial of A,
then m(t).,) is that of the matrix tAo
8.6.4. Application: Linear Differential equations again 311

Example 8.26 Solve the linear differential equation y' = Ay with initial condition
y(O) = Yo, where

A = [ 4-3 -1]
1
-1
0 -1
2 3
, Yo = [2]
1
4
.

Solution: Method 1: (i) Note that the characteristic polynomial of A is det(AI - A) =


A3 - n
2 + 16A - 12 = (A - 3) (A - 2)2 and A is not diagonalizable (because

rank(A-2I) = 2). By taking Xl = (-1, -1 , Ij and x, = (2,1, -1)aseigenvectors


belonging to A = 2 and A = 3, respectively, one can compute the Jordan canonical
form of A as follows:

].
where

h = [3], and Q = [=~1 0~ -1~


(ii) Let y = Qx. Then the given system changes to x' = Jx with

-1 1] [2] = [5]
1 1 1 5
-1 0 4 1

and its solution is


2t
to
J2 ] 5] =
[ 5 [e0 te:
e 0 0] [5]
5,
e 1 0 Oe 3t 1

since

(iii) Thus, we get

y(t) = Qx(t) = [=~


1 0 -1
~ ~ [e~] 0
t;;'t ~] [ ; ]
0 e 3t 1
312 Chapter 8. Jordan Canonical Forms

Method 2: To use the minimal polynomial of the matrix t A, first recall that the
characteristic polynomial of A is f(A) =
(A - 3) (A - 2)2 and A is not diagonal-
izable. Hence, its minimal polynomial coincides with the characteristic polynomial.
Therefore, the minimal polynomial of t A is the polynomial f(t A), and one can write

with unknown coefficient functions Xj(t)'s and

-4 ]
-4 .
8

Now, at each eigenvalue A, one can have


2t + +
as A = 2, e = Xo(t) 2x1 (t) 22X2(t) ,
take 0)", te 2t = XI (t) + 2 2x2(t) ,
=
as A 3, e3t = xo(t) + 3xI (t) + 32x2(t).

Its solution is xo(t) =-3e 2t - 6te 2t + 4e3t; XI(t) = 5te 2t - 4e 3t + 4e2t and
3t 2t 2t
X2(t) = e - e - te Hence,

Now, one might compare the value y(t) = etAy(O) with the solution obtained by
Method 1. The reader can easily notice that Method 2 is simpler than Method 1. 0

Example 8.27 (Computing etA when A is diagonalizable) (Example 6.16 again)


Solve the system of linear differential equations

Y~ = 5YI - 4Y2 + 4Y3


Y2 = 12YI - +
1
l1Y2 12Y3
Y3 = 4YI - 4Y2 + 5Y3,

and also find its particular solution satisfying the initial conditions YI (0) = 0, Y2 (0) =
3 and Y3(0) = 2.

Solution: The matrix form of the system is y' = Ay with

5 -4 4]
A = [ 12 -11 12
4 -4 5
,
8.6.4. Application: Linear Differential equations again 313

and its general solution is y = elAyo. It was known that A is diagonalizable and the
minimal polynomial of A is meA) = (A-l)(A+3). Ifwe write elA = xo(t)I +Xl (t)A
with unknown functions Xi (t)'s, then

as A = 1, el = xo(t) + Xl (t) ,
=
as A -3, e- 31 = xo(t) 3Xl(t).

(Compare with Example 8.23) . By solving it, we havexo(t) = t(3e' +e- 31) , XI(t) =
teet - e- 31) and

-el +e- 31
-2el + 3e-31
-el + e- 31

Moreover, with the initial conditions Yl (0) = 0, Y2 (0) = 3 and Y3 (0) = 2, the
particular solution is

One might compare this method with that given in Example 6.16. o

Note: In Example 8.27, we compute el A by finding the unknown functions Xi (t)'s in


elA = xo(t)I +xl(t)A. However, in Example 8.23, we determined e A = xoI +xlA
with Xo = t(3e + e-3 ) and Xl = tee - e- 3 ) . Hence, it looks true that elA =
xoI + Xl (t A) with the same Xo = t(3e + e- 3 ) and Xl = tee - e- 3 ) . But, it is not a
fact, because if we put elA = xoI + xlA, then Xo and Xl must be functions of t,

Example 8.28 (Computing elA when A is not diagonalizable) Solve the system of
linear differential equations y'(t)=Ay(t) , where

1111]
o 2 2 0
A=
[ 002 0
-1 I 0 3
.

Solution: In Example 8.20, the minimal polynomial of A is computed as meA) =


(A - 2)3 . Hence, as in Example 8.24, one can set
314 Chapter 8. Jordan Canonical Forms

with unknown coefficient functions Xj(t)'s. Now, at the eigenvalue X = 2, one can
have

as ).. = 2, e 2t = xo(t) + 2XI(t) + 22X2(t),


take aj", te 2t = XI(t) + 2 2x2(t),
take af, t 2e2t = 2X2(t) .

Its solution is X2(t) = !t 2e2t ; Xl (t) = te 21- 2t 2e2t andxo(t) = e2t - 2te 2t +2t 2e2t
Hence,

etA = xo(t)l +xI(t)A +x2(t)A2

[ e" - "u "u ,e" + k",u te 2t


o e21 2te 21
=
o 0 e 2t
oo ]'
-te 2t te 2t !t 2e21 e 21 + te 21

and the solution ofy'(t) = Ay(t) is given by y(t) = e'Ayo. o

Example 8.29 (Solving y' (t) =


Ay(t) by the minimal polynomial) Solve the system
of linear differential equations y' (t) =
Ay(t), where

A=
[ 5-3 -2]
8 -5 -4
-4 3 3
.

Solution: (1) The characteristic polynomial of A is det(Al - A) = )..3 -3)..2+3),,-1 =


=
().. - 1)3, so that the eigenvalue of A is ), 1 of multiplicity 3.
(2) In the matrix

A -I =
[ 4-3 -2]
8 -6 -4
-4 3 2
,

one can see that the second and the third rows are constant multiples of the first row,
and hence rank(A - I) =
1. Hence, A is not diagonalizable, but (A - 1)2 O. It =
means that the minimal polynomial of A is m()..) = ().. - 1)2 . Therefore, one can
write
e'A = xo(t)l + Xl (t)A
with unknown coefficient functions x, (t)'s, and one can have

as ).. = 1, Xo(t) + 1 Xl (t),


take aj", Xl (t) .
8.7. Exercises 315

Its solution is xo(t) = e' (l - t); XI (t) = te': Hence,

(l + 4t)e l -3te l -2te


l
]
eIA=xo(t)/+xt(t)A= 8te' (l-6t)e l -4te l
[
-4te l 3te' (1 + 2t)e' ,

and a generalsolutionof y' (t) = Ay(t) is given by y(t) = elAyo. o

Problem 8.17 Solve the system of linear differential equations y' = Ay with the initial condi-
tion yeO) = YO, where

2 1 -1]
A =[ -3 -1 1 , Yo =[ -1 ]
- 1 .
9 3 -4 1

Problem 8.18 Solve the system of linear differential equations y' = Ay for

!~] , [~ i !o ~], [~ i ~
2 1
(1) A = 0 2
= =
o 0 (2) A (3) A
[ o 0 o 2 0 0 2 000

8.7 Exercises

8.1. For A=[ ~


o
1! ~] ()"
0 0 )"
=1= 0), find A-I and its Jordan canonical form J.
8.2. Show that if A is nonsingular, then A - I has the same block structure in its Jordan canonical
form as A does .
8.3. Find the number of linearly independent eigenvectors for each of the following matrices :

[11000] ['0000] ['


1 0 0
01100 02000 0 2 0 o 0
(1) 0 0 1 0 0 ,(2) 0 0 2 0 0 ,(3) 0 0 3 o o]
0 .
00031 00051 0 0 0 3 0
00003 00005 0 0 0 o 5

8.4. Find the Jordan canonical form for each of the following matrices :

(1) [~ ~]. (2)


[-2o 0-2]
-1 1
1
-2
-1
, (3) [=~ 3~ -2~ ].
o 2 1
Also, find a full set of generalized eigenvectors of each of them.
316 Chapter 8. Jordan Canonical Forms

8.5. Show that a Jordan block J is similar to its transpose , J T = P -I J P, by the permutation
=
matrix P [en . .. ej]. Deduce that every matrix is similar to its transpose.
8.6. Evaluate det An for a tridiagonal matrix

b b 0 0 0 0 0 0
b b b 0 0 0 0 0
0 b b b 0 0 0 0
An = b > O.
0 0 0 0 b b b 0
0 0 0 0 0 b b b
0 0 0 0 0 0 b b

8.7. Solve the system of linear equations


{ (1- i)x + (1 + i)y = 2-i
(1 + i)x + (1 + i)y = 1 + 3i .
8.8. Solve the system of three difference equations:

{ X.+l = 3xn + 5Yn + 2zn


Yn+1 = Xn Yn + Zn
Zn+1 = 2xn + Yn + 3zn
= 0 , 1, 2, .. ..
1
for n

8.9. Solve Yn = AYn-1 for A = [i ~] with Yo = [ ~


8.10. Solve Yn = AYn-1 for A = [=~ ~
2 -12 -6
:] with Yo = (2, 1, 0) .

8.11. Solve s' = Ay for A = [~ ~ ] with Yo = [ ~ 1


8.12. Solve y' = Ay for A = [ =~ 2: : ] with y(1) = (2, 1, 0) .
2 -12 -6
8.13. Solve the initial value problem

YI(O) = -2
n(O) = 0
Y3(O) = - 1.

8.14. Consider a 2 x 2 matrix A = [~ ~].


(1) Find a necessary and sufficient condition for A to be diagonalizable.
(2) The characteristic polynomial for A is f()..) = )..2 - (a + d))" + (ad - be). Show
that f(A) = O.
8.7. Exercises 317

8.15. For each of the following matrices, find its Jordan canonical form and the minimal poly-
nomial .

(2) [ 7
2 -3 ]
-4 ' (3) [~ ~ -~].
o 2 -1
8.16. Compute the minimal polynomial of each of the following matrices and from these results

n
compute A -I , if it exists .

(1) [~ ~J. (2) [ ; ~ J. mU 0


2
0

8.17. Compute A-I, An and e A for

(1)
[~ ~ J. (2)
4I 2]
o4 2 , (3)
[1 0 I]
0 2 1 ,
004 001

[" [ 10
1 1 0
0 0]
1 0
(4) 0 2 0 0 0
o0 1 1 (5) 0 1 1 0 .
o0 0 2 -1 0 0 1

8.18 . An n x n matrix A is called a circulant matrix if the i -th row of A is obtained from the
first row of A by a cyclic shift of the i - I steps, i.e., the general form of the circulant
matrix is

a2 a3
[ an
"I al a2
A = an-I: an al

a2 a3 a4
(1) Show that any circulant matrix is normal.
(2) Find all eigenvalues of the n x n circulant matrix

(3) Find all eigenvalues of the circulant matrix A by showing that


n
A = "L..Jai W;-I .
;=1

(4) Compute det A. (Hint: It is the produc t of all eigenvalues.)


318 Chapter 8. Jordan Canonical Forms

(5) Use your answer to find the eigenvalues of

8.19. Determine whether the following statements are true or false, in general, and justify your
answers .
(1) Any square matrix is similar to a triangular matrix.
(2) If a matrix A has exactly k linearly independent eigenvectors, then the Jordan canon-
ical form of A has k Jordan blocks .
(3) If a matrix A has k distinct eigenvalues, then the Jordan canonical form of A has k
Jordan blocks.
(4) If two square matrices A and B have the same characteristic polynomial det(AI- A) =
det(AI - B) and for each eigenvalue Athe dimensions of their eigenspacesN(AI - A)
and N (AI - B) are the same, then A and B are similar.
(5) lfa4x4matrix A has eigenvalues 1 and2,eachofmultiplicity2,suchthatdim E(1) =
2 and dim E(2) =
I, then the Jordan canonical form of A has three Jordan blocks .
(6) If there is an eigenvalue Aof A with multiplicity mj., and dim E(Aj) :F mj." then A is
not diagonalizable.
(7) For any Jordan block J with eigenvalue A, det eJ = ej.,.
(8) For any square matrix A , A and A T have the same Jordan canonical form.
(9) If f(x) is a polynomial and A is a square matrix such that f(A) = 0, then f(x) is a
multiple of the characteristic polynomial of A.
(10) The minimal polynomial of a Jordan canonical matrix J is the product of the minimal
polynomials of its Jordan blocks Jj .
(11) If the degree of the minimal polynomial of A is equal to the number of the distinct
eigenvalues of A , then A is diagonalizable.
9

Quadratic Forms

9.1 Basic properties of quadratic forms


In the beginning of this book, we started with systems of linear equations , one of
which can be written as

+ a2x2 + .. . + anXn = b.
alXl
The left-hand side al X l +a2x2+ ... +anXn = aT x of the equation is a (homogeneous)
polynomial of degree I in n real variables. In this chapter, we study a (homogeneous)
polynomial of degree 2 in several variables, called a quadraticform, and show that
matrices also play an important role in the study of a quadratic form. Quadratic forms
arise in a variety of applications, including geometry , number theory, vibrations of
mechanical systems, statistics, electrical engineering, etc. A more general type of a
quadratic form is a bilinearform which will be described in Section 9.6. As a matter
of fact, a quadratic form (or bilinear form) can be associated with a real symmetric
matrix, and vice-versa.
A quadratic equation in two variables X and y is an equat ion of the form

ax
2
+ 2bx y + cy2 + dx + ey +f = 0,
in which the left-hand side consists of a constant term f , a linear form dx + ey , and
a quadratic form ax 2 + 2bxy + cy2 . Note that this quadratic form may be written in
matrix notation as

ax
2
+ 2bxy + cl = [x y] [ : ~] [ ~ ] = xT Ax,

l
where

x =[~ ] and A = [: ~
Note also that the matrix A is taken to be a (real) symmetric matrix.
Geometrically, the solution set of a quadratic equation in X and y usually represents
a conic section, such as an ellipse , a parabola or a hyperbola in the xy -plane. (See
Figure 9.1.)

J H Kwak et al., Linear Algebra


Birkhauser Boston 2004
320 Chapter 9. Quadratic Forms

Definition 9.1 (1) A linear form on the Euclidean space jRn is a polynomial of
degree 1 in n variables XI. X2, ... Xn of the form
n
b
T
X = L:>;x;.
;= 1

where x = [XI .. . xn]T and b = [bl ... bn]T in jRn.


(2) A quadratic equation on jRn is an equation in n variables XI. X2 , .. . Xn of the
fonn
n n n
f(x) = L L aijx;xj + L bix; + c = O.
;=1 j=1 ;=1

where aij , bj and c are real constants. In matrix form. it can be written as

f(x) = xT Ax + b T X + c = 0,
where A = [aij]. x = [XI . . , xn]T and b = [bl .. . bnf in lRn
(3) A quadraticform on jRn is a (homogeneous) polynomial of degree 2 in n variables
XI. X2 . . . Xn of the form

X2
XI ] n n
q(x) = xT Ax = [XI X2 ., . xn][aij] : = LLa;jX;Xj,
[ . ;=1 j=l
Xn

where x = [XI X2 .. . xnf E jRn and A = [aij] is a real n x n symmetric matrix.

It is also possible to define a linear form and a quadratic form on the Euclidean
complex n-space en instead of the Euclidean real n-space lRn

Definition 9.2 (1) A linear form on the complex n-space en is a polynomial of


degree 1 in n complex variables XI. X2 . . . Xn of the form
n
bH x = "
L.Jb;x;
'- ,
;= 1

=
where x [XI .. . xn]T and b = [bl . . . bnf in en.
(2) A complex quadratic form on en is a polynomial of degree 2 in n complex
variables Xl. X2 . . . Xn of the form

where x E en and A = [aij] is an n x n Hermitian matrix .


9.1. Basic properties of quadratic forms 321

The real quadratic form on jRn and the complex quadratic form on en can be
denoted simultaneously as q(x) = (x, Ax) by using the dot product on the real
n-space jRn or on the complex n-space en .

Remark: (I)A quadratic equation f(x) is said to be consistent ifithas a solution, i.e.,
there is a vector x E jRn such that f(x) = O. Otherwise, it is said to be inconsistent.
For instance, the equation 2x 2 + 3 y 2 = -1 in jR2 is inconsistent. In the following,
we will consider only consistent equations.
(2) A linear form is simply the dot product on the real n-space jRn or on the
complex n-space en with a fixed vector b.
(3) The matrix A in the definition of a real quadratic form can be any square
matrix. In fact, a square matrix A can be expressed as the sum of a symmetric part B
and a skew-symmetric part C, say

1 TIT
A = B+C, where B = -(A + A ) and C = -(A - A ).
2 2
For the skew-symmetric matrix C, we have

Hence, as a real number, x T Cx = O. Therefore,

q(x) = x T Ax = x T (B + C)x = x T Bx.


This means that, without loss of generality, one may assume that the matrix A in the
definition of a real quadratic form is a symmetric matrix.
(4) For the definition of a complex quadratic form, let A be any n x n complex
matrix. Then, for any x E en, the matrix product x H Ax is a complex number. But,
for the matrix A, it is known that there are Hermitian matrices Band C such that
A = B + iC. (See page 264.) Hence,

x H Ax = x H (B + iC)x = x H Bx + ix H Cx ,

in which x H Bx and x H Cx are real numbers. Hence, for a complex quadratic form on
en , we are only concerned with a Hermitian matrix A so that x H Ax is a real number
for any x E en.
The solution set of a consistent quadratic equation f (x) = x T Ax + b T x + c = 0
is a level surface in jRn, that is, a curved surface that can be parameterized in n - 1
variables. In particular, if n = 2, the solution set of a quadratic equation is called
a quadratic curve, or more commonly a conic section. When n = 3, it is called a
quadratic surface, which is an ellipsoid, a paraboloid or a hyperboloid.

Example 9.1 (The standard three types of conic sections)

(1) (circle or ellipse) ~ + ~ = 1 with A = [~ ~] .


322 Chapter 9. Quadratic Forms
2
(2) (hyperbola) ~ - ~ = 1 or ;;. -
2 2
ir2 = 1 with

A= [* _~] or A= [-t ~].


(3) (parabola) x
2
= ay or y2 = bx with A = [~ ~] or A = [~ ~].
All of these cases are illustrated in Figures 9.1 as conic sections. 0

Figure 9.1. Conic sections

Example 9.2 (The standardfour typesof quadratic surfaces)

(1) (ellipsoids) ~+ ~+ ~= 1 with A = [* X ~].


00&
+
. 2 2 2 2
(2) (hyperboloids of one or two sheets) ~
a trb - ~
c
= 1 (of one sheet) or -~
a -
2 2
~ + ~ = 1 (of two sheets) with

:0!I
a 01 0] [-:!I0 a 01 0]
A=
[o b!
0 -1c
0 or A=
0
-b! &? .
0

:0!I b!0 0]
a 1

[o
2 2 2
(3) (cones) ~ + ~ - ~ = 0 with A = 0 .
0 -&1
(4) (paraboloids; elliptic or hyperbolic) ~+ ~ = ~ (elliptic) or ~ - ~ = ~, c > 0
(hyperbolic) with

A~[~ 0]
1
o1 0
'Q!
b! or A= [ ~
o 0
9.1. Basic properties of quadratic forms 323

2 2 2
Figure 9.2. Ellipsoid: a~ + trb + ~
c =1

2 2 2 x2 2 2
Figure 9.3. Hyperboloid of onesheet: ~
a
~ = 1;and of
+ trb - c twosheets: -:;r - trb + ~
a e
=1

2 2 2
Figure 9.4. Cone: ~
a + trb - 5
c =0

All of these cases are illustrated in Figures 9.2-9.5. o


Problem 9.1 Find the symmetricmatricesrepresenting the quadraticforms

(1) 9x[ - xi + 4xj + 6XjX2 - 8XjX3 + 2X2X3,


(2) XjX2 + XjX3 + X2x3,
(3) X[ + xi - xj - xl + 2xjX2 - lOxjX4 + 4X3X4.
324 Chapter 9. Quadratic Forms

Figure 9.5. Elliptic paraboloid: ~+ ~ = ~ and Hyperbolic paraboloid: ~ - ~ = ~,


c>O

9.2 Diagonalization of quadratic forms


In this section, we discuss how to sketch the level surface of a quadratic equation
on jRn. To do this for a quadratic equation f(x) =
x T Ax + b T X + c 0, we first =
T
consider a special case of the type x Ax = c without a linear form .
A quadratic form on jRn without a linear form may be written as the sum of two
parts:
n
T
q(x) = x Ax = 'L,aiixl + 2 'L,aijXiXj,
;=1 i<j

in which the first part L:?=Iaiixl is called the (perfect) square terms and the second
part Li;fj aijxjxj is called the cross-product terms. Actually, what makes it hard to
sketch the level surface of a quadratic equation is the cross-product terms, However,
the quadratic form q(x) = x T Ax can be transformed into a new quadratic form
without the cross-product terms by a suitable change of variables . It can be done
by computing the eigenvalues of A and their associated eigenvectors. In fact, the
symmetric matrix A can be orthogonally diagonalized, i.e., there exists an orthogonal
matrix P such that

AI 0 ]
T -I A2
P AP = P AP = D = .. . .
[
o An

Here, the diagonal entries Ai's are the eigenvalues of A and the column vectors of P
are their associated eigenvectors of A. Now, by setting x = Py, (that is, by a change
of variables), we have

which is a quadratic form without the cross-product terms,


It is also true for a complex quadratic form q(x) = x H Ax with a Hermitian
matrix A. Since every Hermitian matrix is unitarily diagonalizable, there exists a
9.2. Diagonalizationof quadratic forms 325

unitary matrix U such that U H AU = D is a diagonal matrix. Hence , by a change of


= =
variables x Uy, the quadratic form q(x) s" Dy has only square terms.
In either case of real or complex, we consequently have the following theorem .

Theorem 9.1 (The principal axes theorem)


(1) Let x T Ax be a quadraticform in x = [XI X2 xnf E ]Rn for a symmetric ma-
trix A. Then, there isachangeofcoordinatesofxintoy = pT X = [YI Y2 . . . Yn]T
such that
x TAx = yTDy = AIYf + A2yi + . .. + AnY; ,
where P is an orthogonal matrixand pT AP = D is diagonal.
(2) Let x H Ax be a complex quadraticform on en witha Hermitian matrix A. Then,
thereis a change ofcoordinates ofx into y = U H X = [YI Y2 ... Ynf such that

where U is a unitary matrixand U H AU = D is diagonal.


Clearly, the columns of the matrix P (U, respectively) in Theorem 9.1 form an
orthonormal basis for R" (for en, respectively) and it is called the principal axes of
the quadratic form. The vector y is just the coordinate expression of x with respect to
the principal axes.

Example 9.3 (Via a diagonalization ofa quadraticform) Determine the conic section
3x 2 + 2x y + 3y 2 - 8 = 0 on ]R2.

Solution: In matrix form, it is

The matrix A = [~ ; ] has eigenvalues Al = 2 and A2 = 4 with associated unit


eigenvectors

VI=(~' - ~) and

respectively, which form an orthonormal basis p. If ex denotes the standard basis, then
the basis-change matrix

1 [ 1 1] [ cos 45 sin 45 ]
P = [id]p = [VI V2] = r;; -1 I = . ,
"0/2 -sm45 cos45

which is a rotation through 45 in the clockwise direction such that pT = p-I . It


gives a change of coordinates, x = Py, i.e.,
326 Chapter 9. Quadratic Forms

~
[ ]= ~ [-~ ~] [ ~; ] = [ - J,::: ~~: l
It implies that

3x 2 + 2xy + 3y2 = xTAx = yT pT APy


= yT [~ ~] y = 2(x')2 + 4(y')2 = 8,

or
(X')2 (y')2
-4-+-2-= I,

which is an ellipse with the principal axes VI = pT el and V2 = pT e2. 0

To determine the type of a quadratic form, we introduce the following definition


for a symmetric matrix A or a quadratic form x T Ax.
Definition 9.3 Let A = [aij] E Mnxn{lR) be a symmetric matrix and let x =
(Xl , X2, ... , Xn ) E jRn . Then, the matrix A, or a quadratic form xT Ax, is said
to be
(1) positive definite if x T Ax = Li,j aijXiXj > 0 for all nonzero x,
(2) positive semidefinite if x T Ax = Li,j aijXiXj 2: 0 for all x,
(3) negative definite if x T Ax = Li,j aijxix j < 0 for all nonzero x,
(4) negative semidefinite if x T Ax = Li,j aijxix] ::: 0 for all x,
(5) indefinite if x T Ax takes both positive and negative values.
Similarly, one can define the same terminologies for a Hermitian matrix or for a
complex quadratic form x H Ax on en .
For example, the real symmetric matrix

is positive definite, because the quadratic form satisfies

xT Ax = [Xl x2 X3] [ - io -;-1 -~2 ][ ~~ ]


X3

= [Xl X2 X3] [ _XI2~2~2X~ X3 ]


-X2 + 2X3
= Xl (2Xl - X2) + X2(-Xl + 2x2 - X3) + X3(-X2 + 2X3)
= 2xr - 2X\X2 + 2xi - 2X2X3 + 2xi
= xr + (Xl - X2)2 + (X2 - X3)2 + xi > 0
9.3. A classification of levelsurfaces 327

unless XI = X2 = X3 = O.
The following characterizationsfollow from the principal axes theorem.
Corollary 9.2 A real symmetric or a Hermitian matrix A is
(1) positive definite if and only if all the eigenvalues of A are positive,
(2) positive semidefinite if and only if all the eigenvalues of A are nonnegative,
(3) negative definite if and only if all the eigenvalues of A are negative,
(4) negative semidefinite if and only if all the eigenvalues of A are nonpositive,
(5) indefinite if and only if A takes both positive and negative eigenvalues.

Note that if A is positive definite, det A > 0 (as the product of all eigenvalues).
If the eigenvalues of A are all negative, then - A must be positive definite and con-
sequently A must be negative definite. If A has eigenvalues that differ in sign, then
A is indefinite. Indeed, if Al is a positive eigenvalue of A and XI is an eigenvector
belonging to AI, then

and if A2 is a negativeeigenvaluewith eigenvector X2, then

If A is definite,then 0 is the only criticalpoint of a quadraticform q(x) = (x, Ax),


and q (0) = 0 is the global minimum if A is positive definiteand the global maximum
if A is negativedefinite. If A is indefinite, then 0 is a saddle point.

9.3 A classification of level surfaces

We have seen already in Section 9.1 that the geometric type of the level surface of a
quadraticequation xT Ax = 0 depends on the signs of the eigenvalues of A. In fact, it
is completely determined by the numbers of positive, negative and zero eigenvalues
of A.

Definition 9.4 The inertia of a Hermitian (or a symmetric) matrix A is a triple of


integers denoted by In(A) = (p, q , k), where p, q and k are the numbersof positive,
negative and zero eigenvalues of A, respectively.

The inertiaIn(A) determinesthegeometrictype ofthe quadraticsurfacex T Ax = 0


on jRn in the following sense. Since In(-A) = (q, p, k) ifIn(A) = (p , q, k) and the
equation x T Ax = c is inconsistent if p = 0 and c > 0, it suffices to consider the
cases of c :::: 0 and p > O. Excluding those inconsistentcases, we have the following
characterization of the solution sets for n = 2 and 3:
328 Chapter 9. Quadratic Forms

For n = 2, there are only three possible cases for In(A):


In(A) The solution of x T Ax = c
(p , q , k) c>O c=O
(2,0,0) ellipse a point
(I, 1,0) hyperbola two lines crossing at 0
(1,0 , 1) two parallel lines a line

For n = 3, there are six possibilities:


In(A) The solution of x T Ax = c
(p, q, k) c>O c=O
(3,0,0) ellipsoid a point
(2, 1,0) one-sheeted hyperboloid elliptic cone
(2,0, 1) elliptic cylinder a line
(1,2,0) two-sheeted hyperboloid elliptic cone
(1 , I, 1) hyperbolic cylinder two planes crossing in a line
(1 ,0,2) two parallel planes a plane

In general , for an n x n symmetric matrix A, In(A) will have n(n + 1)/2 possibil-
ities , each characterizing a different geometric type of a quadratic form. For example,
if In(A) = (n, 0, 0) and c > 0, i.e., the eigenvalues of A are all positive, then the
quadratic form describes an ellipsoid in ]Rn, etc.

Example 9.4 (The inertia determines the geometric type of the quadratic form) De-
termine the quadratic surface for 2xy + 2xz = 1 on ]R3.

Solution: The matrix for the given quadratic form is

011]
A=
[ 1 0
100
,

and the eigenvalues of A can be found to be Al = ./2, A2 = -./2, A3 = 0, with


associated orthonormal eigenvectors

VI = (_1 ~~) V2 = (__


1 ~~) V3 = (0 __
1 _1)
./2 ' 2' 2 ' ./2' 2' 2 ' './2' ./2 '
respectively. Hence, an orthogonal matrix P that diagonalizes A is

1 [./2 -./2 0]
P = 2: 1 1 -./2 ,
1 1./2
9.3. A classification of levelsurfaces 329

and with the change of coordinates x = Py, that is,


1 / / 1 / / r: /
y = - (x + y -v2 z), z = ~(x/ + y/ + hz/) ,
X = - ( x - y ),
./2 2

the equation is transformed to ./2(x/)2 - ./2(y')2 = 1, which is a hyperbolic cylinder


as shown in Figure 9.6. Note that In(A) = (1, 1, 1). D

Figure 9.6. Hyperbolic cylinder: 2x y + 2xz =1

Now, consider a general form of a quadratic equation on IRn

(1) If it does not have a linearfonn, i.e., b = 0, then, as shown already, aparabolic
level surface does not appear as a solution of the quadratic equation.
(2) Suppose that it has a nonzero linear form , i.e., b ::/= O. If the matrix A is
invertible, then, by taking a change of variables as y = x +! -!
A -I b (or x = y A -I b),
(it is a translation) the given quadratic equation is transformed into a new quadratic
equation yTAy = d without a linear form, where d = c + ~bT A-lb. However, if
A is not invertible, the solution of the quadratic equation depends not only on the
inertia of A , but also on the type of linear form, and a parabolic level surface appears
as the solution of the quadratic equation with a nonzero linear form, For example, the
equation x 2 - Z = c has a singular quadratic form for which In(A) = (1,0,2) and
also has a nonzero linear form that cannot be removed by any change of variables.
The solution of this equation is a parabolic cylinder when n = 3.

Example 9.5 (A quadratic equation having a linear form) Determine the conic sec-
tion for 3x 2 - 6x y + 4y2 + 2x - 2y = O.

Solution: The matrix for the quadratic form 3x 2 - 6x y + 4y2 is

A = [ 3-3]
-3 4 .
330 Chapter 9. Quadratic Forms

Its inverse is
A-I = j [~ ;] and b = [ _; l
With the change of variables y = x + ! A-I b , that is
, 1,
x = x + 3' y = y,

the equation is transformed to a new equation 3(x')2 - 6x' y' + 4(/)2 = j. Clearly,
the matrix representation of the new quadratic form is also A, and its eigenvalues
are 1(7 .,ffl). Therefore, In(A) = (2,0,0) and the solution of the equation is an
ellipse. D

The following is another view of the classifying of the conic sections in ]R2 and it
can be skipped at the reader's discretion.

Example 9.6 (The classification of the conic sections in ]R2) Consider a quadratic
equation in two variables on ]R2 :

ax 2 + 2bxy + ci + dx + ey + f = o.
Or, in matrix form
x Ax + b
T T
X + f = 0,
with the symmetric matrix A = [~ ~], b = [d ef and x = [x yf in ]R2. We
present here the classification of the conic sections according to the coefficients.
(1) If b = 0, then A is already a diagonal matrix with the eigenvalues a and c, and the equation becomes

    ax² + cy² + dx + ey + f = 0.

(i) If a = 0 = c, then the conic section is a line in the plane.
(ii) If a ≠ 0 = c, then it is a parabola when e ≠ 0, or one or two lines when e = 0.
(iii) If a ≠ 0 ≠ c, then the quadratic equation becomes

    ax² + cy² + dx + ey + f = a(x − p)² + c(y − q)² + h = 0

for some constants p, q, and h. If h = 0, the cases are easily classified (try). Suppose h ≠ 0. Then, the conic section is a circle if a = c, an ellipse if ac > 0, or a hyperbola if ac < 0.

(2) Suppose that b ≠ 0. Since A is symmetric, it can be diagonalized by an orthogonal matrix P whose columns are orthonormal eigenvectors, and the diagonal matrix has eigenvalues λ₁ and λ₂ on the diagonal. By a basis change by P, the quadratic equation becomes

    ax² + 2bxy + cy² + dx + ey + f = λ₁u² + λ₂v² + d′u + e′v + f = 0

for some constants d′ and e′. Hence, the classification of the conic sections is reduced to the case (1) according to the various possible cases of the eigenvalues of A. However, the eigenvalues are given as

    λ₁ = ( (a + c) + √((a − c)² + 4b²) ) / 2,    λ₂ = ( (a + c) − √((a − c)² + 4b²) ) / 2,

which are determined by the coefficients a, b, and c. Hence one can classify the conic section according to the various possible cases of a, b, and c (see Exercise 9.4).
(3) The axes of the conic section are the directions of the eigenvectors, which are orthogonal to each other. Since we only need to find the axis lines, but not the direction vectors, we may choose them to be the rotation of the standard coordinate x- and y-axes, which are determined by e₁, e₂. Now, a pair of orthogonal eigenvectors are found to be

    vᵢ = [ xᵢ ] = [     b      ]
         [ yᵢ ]   [ −(a − λᵢ) ]

for i = 1, 2. The slope of v₁ from the x-axis is

    (λ₁ − a)/b = −(a − c)/(2b) + √( ((a − c)/(2b))² + 1 ) = −cot 2θ + cosec 2θ = tan θ,

where cot 2θ = (a − c)/(2b) for some θ. Since b ≠ 0, one may assume that 0 < 2θ < π. This means that if we set tan 2θ = 2b/(a − c) with −π/4 < θ < π/4, then θ is the rotation angle we were looking for. Therefore, the orthonormal eigenvectors u₁ and u₂ of A may be chosen as the rotation of the standard basis through the angle θ. The basis-change matrix is now

    P = [u₁ u₂] = [ cos θ  −sin θ ],    and    P^T A P = P⁻¹ A P = [ λ₁   0 ].
                  [ sin θ   cos θ ]                                 [  0  λ₂ ]

By a change of coordinates [x y]^T = P[x′ y′]^T, the quadratic equation becomes

    ax² + 2bxy + cy² + dx + ey + f = λ₁(x′)² + λ₂(y′)² + d′x′ + e′y′ + f = 0,

where d′ = d cos θ + e sin θ and e′ = −d sin θ + e cos θ. □
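A hedged NumPy sketch of this classification step (our own illustration; the function name and the choice of angle via arctan2 are ours, and it simply transcribes the formulas of Example 9.6):

    import numpy as np

    def rotate_out_xy_term(a, b, c, d, e, f):
        """Eigenvalues lam1 >= lam2 and a rotation angle theta with cot 2*theta = (a-c)/(2b)."""
        assert b != 0
        theta = 0.5 * np.arctan2(2.0 * b, a - c)
        lam1 = ((a + c) + np.hypot(a - c, 2.0 * b)) / 2.0
        lam2 = ((a + c) - np.hypot(a - c, 2.0 * b)) / 2.0
        d1 = d * np.cos(theta) + e * np.sin(theta)      # rotated linear coefficients, as in the text
        e1 = -d * np.sin(theta) + e * np.cos(theta)
        return lam1, lam2, theta, d1, e1

    # Example 9.5 revisited: 3x^2 - 6xy + 4y^2 + 2x - 2y = 0
    lam1, lam2, theta, d1, e1 = rotate_out_xy_term(3, -3, 4, 2, -2, 0)
    print(lam1, lam2)      # (7+sqrt(37))/2 and (7-sqrt(37))/2, both positive, so the conic is an ellipse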

Problem 9.2 Sketch the level surface of each of the following quadratic equations:
(1) 2x² + 2y² + 6yz + 10z² = 9;
(2) x² − 8xy + 16y² − 3z² = 8;
(3) 4x² + 12xy + 9y² + 3x − 4 = 0.

9.4 Characterizations of definite forms


In the previous section, we have seen that the geometric type of a quadratic equation
(x, Ax) = 0 depends on the inertia of the matrix A. Hence, it is important to determine
whether or not a Hermitian (or symmetric) matrix A is positive definite or negative
definite. In most cases, the definition does not help much for such criteria. But we
have seen that Corollary 9.2 gives us a practical characterization of positive definite matrices: A is positive definite if and only if all eigenvalues of A are positive. We will find some other practical criteria in terms of the determinant of the matrix. For this, we again look at the quadratic form in two real variables, q(x, y) = ax² + 2bxy + cy², which may be rewritten in a complete square form as

    q(x) = ax² + 2bxy + cy² = a(x + (b/a)y)² + (c − b²/a)y².

We see that q is positive definite, i.e., q(x) = x^T A x > 0 for any nonzero vector x = (x, y) ∈ ℝ², if and only if a > 0 and ac > b², or equivalently, the determinants of

    [ a ]    and    [ a  b ]
                    [ b  c ]

are positive.
A generalization of these conditions will involve n submatrices of A, called the principal submatrices of A, which are defined as the upper left square submatrices

    A_k = the k × k submatrix of A consisting of its first k rows and first k columns,    k = 1, 2, ..., n.

With this construction, we have the following characterization of positive definite matrices.

Theorem 9.3 The following are equivalent for a Hermitian (or a real symmetric) matrix A:
(1) A is positive definite, i.e., ⟨x, Ax⟩ > 0 for all nonzero vectors x;
(2) all the eigenvalues of A are positive;
(3) all the principal submatrices A_k's have positive determinants;
(4) A can be reduced to an upper triangular matrix by using only the elementary operation of "adding a constant multiple of a row to another" (without row interchanges), and all the pivots are positive;
(5) there exists a lower triangular matrix L with positive diagonal entries such that A = LL^H (= LL^T if A is real symmetric) (called a Cholesky decomposition or a Cholesky factorization);
(6) there exists a nonsingular matrix W such that A = W^H W (= W^T W if A is real symmetric).

Proof: (1) ⇔ (2) was shown.


(2) ⇒ (3) First, we prove it for the real case with a symmetric matrix A. If A has positive eigenvalues λ₁, λ₂, ..., λₙ, then det A = λ₁λ₂ ⋯ λₙ > 0. To prove the same result for all the submatrices A_k, we claim that if A is positive definite, so is every A_k. For each k = 1, ..., n, consider all the vectors whose last n − k components are zero, say x = [x₁ ⋯ x_k 0 ⋯ 0]^T = [x_k^T 0]^T, where x_k is any vector in ℝ^k. Then

    x^T A x = [x_k^T  0] [ A_k  * ] [ x_k ] = x_k^T A_k x_k.
                         [  *   * ] [  0  ]

Since x^T A x > 0 for all nonzero x, x_k^T A_k x_k > 0 for all nonzero x_k ∈ ℝ^k; that is, the A_k's are positive definite, all eigenvalues of A_k are positive, and their determinants are positive. The complex case with a Hermitian matrix A can be proved by the same argument except for using H instead of T.
(3) ⇒ (4) Let A = [a_ij]. Then the first principal submatrix A₁ = [a_11] has positive determinant, i.e., a_11 > 0. Thus, a_11 can be used as a pivot for forward elimination to make all other first column entries below a_11 zero. Let a_22^(1) denote the (2, 2)-entry of the resulting matrix, so that the principal submatrix A₂ has been transformed into a matrix

    [ a_11  a_12     ]
    [   0   a_22^(1) ].

Since the elementary operation of "adding a constant multiple of a row to another" does not change the determinant, we have

    det A₂ = a_11 a_22^(1)    and    a_22^(1) = det A₂ / a_11 = det A₂ / det A₁ > 0.

Since a_22^(1) ≠ 0, it can be used as a pivot in the second forward elimination, to transform the principal submatrix A₃ into

    [ a_11  a_12      a_13     ]
    [   0   a_22^(1)  a_23^(1) ]
    [   0     0       a_33^(2) ].

Also, one can show that

    det A₃ = a_11 a_22^(1) a_33^(2)    and    a_33^(2) = det A₃ / (a_11 a_22^(1)) = det A₃ / det A₂ > 0.

A similar process can be repeated to get an upper triangular matrix as a row-echelon form of A with the k-th pivot a_kk^(k−1), which is exactly the ratio of det A_k to det A_{k−1}. Hence, all pivots are positive.
(4) ⇒ (5) First, consider the real case. By the hypothesis (4) and the uniqueness of the LDU factorization of a symmetric matrix, the matrix A can be factored as a product LDL^T with

    D = diag(d₁, d₂, ..., dₙ),    dᵢ > 0.

Define √D = diag(√d₁, √d₂, ..., √dₙ). Then, clearly det(√D) > 0, D = √D √D and (√D)^T = √D. Hence,

    A = LDL^T = (L√D)(√D L^T) = (L√D)(L√D)^T,

as desired. A similar process can be repeated for the complex case with H instead of T.
(5) ⇒ (6) is easy.
(6) ⇒ (1) Let A = W^T W for a nonsingular matrix W. Then, for x ≠ 0,

    x^T A x = x^T (W^T W)x = (Wx)^T (Wx) = ||Wx||² > 0,

because Wx ≠ 0. Similarly for the complex case. □
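Several of these equivalent conditions are easy to check numerically. The following NumPy sketch (our own illustration; the example matrix and helper name are ours) tests conditions (2), (3), (5) and (6) on one positive definite matrix.

    import numpy as np

    def leading_principal_minors(A):
        """det A_1, det A_2, ..., det A_n of the upper left square submatrices."""
        return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

    A = np.array([[2., -1., 0.],
                  [-1., 2., -1.],
                  [0., -1., 2.]])

    print(np.linalg.eigvalsh(A))           # all eigenvalues positive        -> condition (2)
    print(leading_principal_minors(A))     # 2, 3, 4, all positive           -> condition (3)

    # Condition (5): Cholesky factorization A = L L^T with positive diagonal entries of L.
    L = np.linalg.cholesky(A)              # raises LinAlgError if A is not positive definite
    print(np.allclose(L @ L.T, A))         # True

    # Condition (6): A = W^T W for the nonsingular matrix W = L^T.
    W = L.T
    print(np.allclose(W.T @ W, A))         # True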

Problem 9.3 Determine which one of the following matrices A and B is positive definite. For the positive definite one, find a nonsingular matrix W such that it is W^T W.

    A = [  2  −1  −1 ]         B = [  2  −1   0 ]
        [ −1   2  −1 ],            [ −1   2   1 ].
        [ −1  −1   2 ]             [  0   1   2 ]

Problem 9.4 Let A be a positive definite matrix. Prove that CT AC (or C H AC for a complex
case) is also positive definite for any nonsingular matrix C .

Since a Hermitian matrix A is negative definite if and only if - A is positive


definite, one can get the following theorem from Theorem 9.3.

Theorem 9.4 The following statements are equivalent for a Hermitian (or a real
symmetric) matrix A:

(1) A is negative definite, i.e., ⟨x, Ax⟩ < 0 for all nonzero vectors x;
(2) all the eigenvalues of A are negative;
(3) the determinants of the principal submatrices A_k's alternate in sign: i.e., det A₁ < 0, det A₂ > 0, det A₃ < 0, and so on;
(4) A can be reduced to an upper triangular matrix by using only the elementary operation of "adding a constant multiple of a row to another" (without row interchanges), and all the pivots are negative;
(5) there exists a lower triangular matrix L with positive diagonal entries such that A = −LL^H (= −LL^T if A is real symmetric);
(6) there exists a nonsingular matrix W such that A = −W^H W (= −W^T W if A is real symmetric).

Problem 9.5 Show that the determinant of a negative definite n x n symmetric matrix is positive
if n is even and negative if n is odd.

One can easily establish the following analogous theorem for semidefinite matri-
ces.

Theorem 9.5 The following statements are equivalent for a Hermitian (or a real
symmetric) matrix A:
(1) A is positive semidefinite, i.e., ⟨x, Ax⟩ ≥ 0 for all nonzero vectors x;
(2) all the eigenvalues of A are nonnegative;
(3) A can be reduced to an upper triangular matrix by using only the elementary operation of "adding a constant multiple of a row to another" (without row interchanges), and all the pivots are nonnegative;
(4) there exists a matrix W, possibly singular, such that A = W^H W (= W^T W if A is real symmetric).
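Numerically, semidefiniteness is usually tested with a small tolerance on the eigenvalues. A minimal sketch (ours, with the tolerance value chosen arbitrarily):

    import numpy as np

    def is_positive_semidefinite(A, tol=1e-10):
        """All eigenvalues of the symmetric (Hermitian) matrix A are >= -tol."""
        return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

    A = np.array([[1., 1.],
                  [1., 1.]])                 # eigenvalues 0 and 2
    print(is_positive_semidefinite(A))       # True
    print(is_positive_semidefinite(-A))      # False: -A has the eigenvalue -2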

Problem 9.6 Determine whether the following statement is true or false: A Hermitian matrix
A is positive semidefinite if and only if all the principal submatrices Ak'S have nonnegative
determinants.

Problem 9.7 State the corresponding conditions to the ones in Theorem 9.5 for the negative
semidefinite forms.

Problem 9.8 Which of the following matrices are positive definite? negative definite? indefinite?

    (1) [ 1  2  1 ]       (2) [ 2  0  0 ]       (3) [ −2  −1  0 ]
        [ 2  1  1 ],          [ 0  5  3 ],          [ −1   2  1 ].
        [ 1  1  2 ]           [ 0  3  5 ]           [  0   1  3 ]

9.5 Congruence relation

As we have seen already, in a quadratic equation x^T A x + b^T x + c = 0 on ℝⁿ, the linear form may be eliminated by a change of variables when A is invertible, and then by the principal axis theorem the equation can be transformed into a simple form y^T A y = c having only square terms. Hence, the geometric type of the quadratic equation may be easily classified. However, these changes of variables involve basis changes by some invertible matrices.
Let us now consider a change of basis (or variables) and a relation between two different matrix representations of a quadratic form. Usually a real or complex quadratic form q(x) = ⟨x, Ax⟩ is expressed in the coordinates of x with respect to the standard basis α = {e₁, e₂, ..., eₙ} for ℝⁿ or for ℂⁿ depending on a real or complex case. Let β = {e₁′, e₂′, ..., eₙ′} be another basis. Then, any vector x has two coordinate representations [x]_α and [x]_β through the equations

    x₁e₁ + x₂e₂ + ⋯ + xₙeₙ = x = y₁e₁′ + y₂e₂′ + ⋯ + yₙeₙ′.

They are related as [x]_α = P[x]_β, where P = [id]_β is the basis-change matrix from β to α. This is just a change of variables. If we set notations x = [x]_α and y = [x]_β, then the quadratic form can be written as

    q(x) = ⟨x, Ax⟩ = ⟨Py, APy⟩ = ⟨y, P^H A P y⟩ = ⟨y, By⟩,

where B = P^H A P and ⟨y, By⟩ is the expression of q(x) = ⟨x, Ax⟩ in a new basis (or a new coordinate system) β.
Definition 9.5 (1) Two real n × n matrices A and B are said to be congruent if there exists an invertible real matrix P such that P^T A P = B.
(2) Two complex n × n matrices A and B are said to be Hermitian congruent if there exists an invertible complex matrix P such that P^H A P = B.
It is easily seen that the congruence relation is an equivalence relation in the vector space M_{n×n}(ℝ), and any two matrix representations of a quadratic form on ℝⁿ with respect to different bases are congruent. A similar statement also holds for the Hermitian congruence and a complex quadratic form.
Remark: (1) Two orthogonally similar real matrices are clearly congruent, but the converse is not true in general. Clearly, a real symmetric matrix A is congruent to a diagonal matrix D by an orthogonal matrix P. However, it can be congruent to infinitely many different diagonal matrices (not necessarily by orthogonal matrices). In fact, if P^T A P = D by an orthogonal matrix P, then the matrix Q = kP, k ≠ 0, also diagonalizes A to a different diagonal matrix via a congruence relation:

    Q^T A Q = (kP)^T A (kP) = k² P^T A P = k² D,

which is also diagonal with diagonal entries k²λ₁, k²λ₂, ..., k²λₙ. In this case, if k ≠ 1, Q is not an orthogonal matrix and the resulting diagonal entries are not the eigenvalues of A anymore.
(2) Sylvester's law of inertia (Theorem 9.10 in Section 9.6) says that even though a real symmetric matrix A may be congruent to various diagonal matrices, the numbers of positive, negative and zero diagonal entries are invariant under the congruence relation. That is, any two symmetric matrices which are congruent have the same inertia. A similar result holds for Hermitian matrices: any two Hermitian matrices which are Hermitian congruent have the same inertia. (See Corollary 9.11.)

Certainly, the inertia of a real symmetric matrix (a Hermitian matrix in a complex


case) can be found by computing the eigenvalues. However, there is another practical
method of diagonalizing it through the congruence (the Hermitian congruence in a
complex case) relation by using the elementary row operation of adding a constant
multiple of a row (or a column) to another row (or a column).
First, suppose that a real symmetric matrix A is diagonalized by an invertible matrix P through the congruence relation P^T A P = D. Since both P and P^T are invertible matrices, P^T can be written as a product of elementary matrices, say P^T = E_k ⋯ E₂E₁. Then we have

    E_k ⋯ E₂E₁ A E₁^T E₂^T ⋯ E_k^T = P^T A P = D.

Recall that for any elementary matrix E, the product EA is exactly the matrix that is obtained from A when the same elementary row operation is executed on A. Clearly, if E is an elementary matrix, so is E^T. Moreover, if an elementary matrix E is obtained by executing an elementary operation on the i-th row, then the product EAE^T is just the matrix that is obtained from A when the same elementary operation is executed both on the i-th row and on the i-th column. Since A is symmetric, the operation EAE^T will have the same effect on the diagonally opposite entries of A simultaneously. For instance, if

    A = [ 1  1  2 ]            E = [  1  0  0 ]
        [ 1  0  3 ]    and         [ −1  1  0 ],
        [ 2  3  6 ]                [  0  0  1 ]

which is an elementary matrix adding −1 times the first row to the second row, then EA is the matrix obtained from A by replacing the second row [1 0 3] by [0 −1 1]. Now, the matrix EAE^T is obtained from the matrix EA by replacing the second column by [0 −1 1]^T for the symmetry of the matrix EAE^T. In fact,

    EAE^T = [ 1   0  2 ]
            [ 0  −1  1 ]
            [ 2   1  6 ].

It implies that the operations performed from the left of A (i.e., the product of E_k ⋯ E₂E₁) are nothing but a forward elimination on A to get an upper triangular matrix P^T A, and those from the right (i.e., the product of E₁^T E₂^T ⋯ E_k^T) are the corresponding column operations to yield a diagonal matrix D. In summary, if we take a forward elimination on A to get an upper triangular matrix by the elementary matrices E₁, ..., E_k, then E_k ⋯ E₁ A E₁^T ⋯ E_k^T = D is diagonal and P^T = E_k ⋯ E₁. It gives

    [A | I] → [E₁AE₁^T | E₁I] → [E₂E₁AE₁^T E₂^T | E₂E₁I] → ⋯
            → [E_k ⋯ E₁AE₁^T ⋯ E_k^T | E_k ⋯ E₁I] = [D | P^T].

Remark: (1) In the congruence relation P^T A P = D, the matrix P need not be an orthogonal matrix, and the diagonal entries of D need not be eigenvalues of A.
(2) Be careful not to apply the same argument to the diagonalization of symmetric matrices through the similarity P⁻¹AP = D, because multiplying E⁻¹ on the right of A is not the same column operation as E^T, so that the operations EAE^T do not work for that diagonalization of A.
To diagonalize a Hermitian matrix A through the Hermitian congruence in the complex case, a similar argument as in the real case can be applied in parallel with H instead of T.
The following example shows how to determine the inertia of a real symmetric
matrix A through the congruence relation (instead of computing the eigenvalues).

Example 9.7 (Computing In(A) through the congruence relation) Determine the inertia of the symmetric matrix

    A = [ 1  1  2 ]
        [ 1  0  3 ]
        [ 2  3  6 ].

Is it positive definite, negative definite, or indefinite?

Solution: The preceding method produces

    [A | I] = [ 1  1  2 | 1  0  0 ]
              [ 1  0  3 | 0  1  0 ]
              [ 2  3  6 | 0  0  1 ]

    → [E₂E₁AE₁^T E₂^T | E₂E₁I] = [ 1   0  0 |  1  0  0 ]
                                  [ 0  −1  1 | −1  1  0 ]
                                  [ 0   1  2 | −2  0  1 ]

    → [E₃E₂E₁AE₁^T E₂^T E₃^T | E₃E₂E₁I] = [ 1   0  0 |  1  0  0 ]
                                           [ 0  −1  0 | −1  1  0 ]
                                           [ 0   0  3 | −3  1  1 ]
                                         = [D | P^T],

where

    E₁ = [  1  0  0 ]       E₂ = [  1  0  0 ]       E₃ = [ 1  0  0 ]
         [ −1  1  0 ],           [  0  1  0 ],           [ 0  1  0 ].
         [  0  0  1 ]            [ −2  0  1 ]            [ 0  1  1 ]

Since the diagonal entries of D are 1, −1 and 3, we get In(A) = (2, 1, 0) and A is indefinite. One can check that P^T A P = D by a direct computation. □
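The same symmetric elimination is easy to automate. The sketch below (our own illustration; it assumes, as this simple version must, that no zero pivot is met, which would require the extra tricks not shown here) returns D and P with P^T A P = D and reproduces In(A) = (2, 1, 0) for the matrix of Example 9.7.

    import numpy as np

    def congruence_diagonalize(A):
        """Return (D, P) with P^T A P = D, using only 'add a multiple of a row/column' operations."""
        D = np.array(A, dtype=float)
        n = D.shape[0]
        Pt = np.eye(n)                         # accumulates E_k ... E_1 = P^T
        for i in range(n):
            if abs(D[i, i]) < 1e-12:
                raise ValueError("zero pivot met; this simple sketch does not handle that case")
            for j in range(i + 1, n):
                m = D[j, i] / D[i, i]
                D[j, :] -= m * D[i, :]         # row operation (multiply by E on the left)
                D[:, j] -= m * D[:, i]         # the same operation on columns (E^T on the right)
                Pt[j, :] -= m * Pt[i, :]       # record the row operation
        return D, Pt.T

    A = np.array([[1., 1., 2.],
                  [1., 0., 3.],
                  [2., 3., 6.]])
    D, P = congruence_diagonalize(A)
    print(np.round(np.diag(D), 10))            # [ 1. -1.  3.]  ->  In(A) = (2, 1, 0)
    print(np.allclose(P.T @ A @ P, D))         # True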

Problem 9.9 Find an invertible matrix P such that P^T A P is diagonal for each of the following symmetric matrices:

Problem9.10 For each A of the following Hermitian matrices, find an invertible matrix P
such that pH AP is diagonal and determine In(A) :

    (1) A = [  0  1  i ]      (2) A = [   1    1+3i   1 ]      (3) A = [  0  0  i ]
            [  1  1  0 ],             [ 1−3i     4   2i ],             [  0  0  0 ].
            [ −i  0  2 ]              [   1    −2i    5 ]              [ −i  0  0 ]

9.6 Bilinear and Hermitian forms

In this section, we are concerned with two new forms, bilinear and Hermitian, to have a little deeper insight into a real or complex quadratic form, and prove Sylvester's law of inertia as one of the main results.

Definition 9.6 A bilinear form on a pair of real vector spaces V and W is a real-valued function b : V × W → ℝ satisfying
(1) b(kx + ℓx′, y) = k b(x, y) + ℓ b(x′, y),
(2) b(x, ky + ℓy′) = k b(x, y) + ℓ b(x, y′)
for any x, x′ in V, y, y′ in W and any scalars k, ℓ. In particular, if V = W, b : V × V → ℝ is called a bilinear form on V.

The conditions (1) and (2) say that b is linear in the first variable and also in the second variable. In this sense, the function b : V × W → ℝ is said to be bilinear.

Example 9.8 (Every inner product is a bilinear form) Let A be an m × n real matrix and let b : ℝ^m × ℝⁿ → ℝ be defined by b(x, y) = x^T A y for x ∈ ℝ^m, y ∈ ℝⁿ. Then b is clearly a bilinear form. In particular, if m = n and A = Iₙ, the identity matrix, then it shows
(1) the dot product on ℝⁿ is a bilinear form. In general,
(2) any inner product b(x, y) = ⟨x, y⟩ on a real vector space is a bilinear form. □

Example 9.9 Let V be a vector space and V* its dual vector space, that is, V* = L(V; ℝ). Let b : V × V* → ℝ be defined by

    b(v, v*) = v*(v)    for any v ∈ V, v* ∈ V*.

Then, b is clearly a bilinear form on the pair of vector spaces V and V*. □



Definition 9.7 A bilinear form b on a vector space V is said to be symmetric if


b(x, y) = b(y, x) for any x, y E V , and is skew-symmetric (or alternating) if
b(x, y) = -b(y, x) for any x, y E V .
For example, the bilinear form b : ℝⁿ × ℝⁿ → ℝ defined by b(x, y) = x^T A y is symmetric (skew-symmetric, respectively) if and only if the matrix A is symmetric (skew-symmetric, respectively). Clearly, the dot product and a real quadratic form on ℝⁿ are symmetric bilinear forms.

Problem 9.11 Show that a bilinear form b on ℝⁿ is skew-symmetric if and only if b(x, x) = 0 for all x ∈ ℝⁿ.

Definition 9.8 A sesquilinear form on a complex vector space V is a complex-valued function b : V × V → ℂ satisfying
(1) b(kx + ℓx′, y) = k̄ b(x, y) + ℓ̄ b(x′, y) (semilinear in the 1st variable, the bar denoting complex conjugation),
(2) b(x, ky + ℓy′) = k b(x, y) + ℓ b(x, y′) (linear in the 2nd variable)
for any x, x′, y, y′ in V and any complex scalars k, ℓ.
A sesquilinear form is called Hermitian if it satisfies
(3) b(x, y) is the complex conjugate of b(y, x) for any x, y in V.

Example 9.10 (Every complex inner product is a Hermitian form) For any n × n complex matrix A, the function b : ℂⁿ × ℂⁿ → ℂ defined by b(x, y) = x^H A y for x, y ∈ ℂⁿ is a Hermitian form if and only if A is a Hermitian matrix. In fact, b(x, y) = x^H A y is certainly semilinear in the first variable, and x^H A y equals the complex conjugate of y^H A x for all x, y ∈ ℂⁿ if and only if the matrix A is Hermitian. As a special case, if one takes A = Iₙ, the identity matrix, then it shows
(1) the dot product on ℂⁿ is a Hermitian form. In general,
(2) any complex inner product b(x, y) = ⟨x, y⟩ on a complex vector space is a Hermitian form. □
Let b : V × V → ℝ be a bilinear form on a real vector space V, and let α = {v₁, v₂, ..., vₙ} be a basis for V. Such a bilinear form is completely determined by the values b(vᵢ, vⱼ) of the vectors vᵢ, vⱼ in the basis α because of the bilinearity. In fact, if

    x = x₁v₁ + x₂v₂ + ⋯ + xₙvₙ,    y = y₁v₁ + y₂v₂ + ⋯ + yₙvₙ

are vectors in V, then

    b(x, y) = Σ_{i,j=1}^{n} xᵢyⱼ b(vᵢ, vⱼ) = [x]_α^T A [y]_α,

where A = [aᵢⱼ], aᵢⱼ = b(vᵢ, vⱼ). It is called the matrix representation of b with respect to the basis α and denoted by [b]_α. Let β be another basis for the vector space V and let P = [id]_β be the basis-change matrix from β to α. Then we get

    P[x]_β = [id]_β [x]_β = [x]_α

for any x in V, and

    b(x, y) = [x]_α^T A [y]_α = [x]_β^T (P^T A P) [y]_β

for any x and y in V. Thus, two matrix representations of a bilinear form b with respect to different bases are congruent, and conversely any two congruent matrices can be matrix representations of the same bilinear form (verify it). Moreover, a bilinear form is symmetric (or skew-symmetric) if and only if its matrix representation is symmetric (or skew-symmetric) for any basis.
A similar process works in the complex case with H instead of T in order to have a matrix representation [b]_α of a sesquilinear form b with respect to the basis α. As in the real case, one can show that two matrix representations of a sesquilinear form b with respect to different bases are Hermitian congruent, and conversely any two Hermitian congruent matrices can be matrix representations of the same sesquilinear form. Moreover, a sesquilinear form b is Hermitian if and only if its matrix representation is Hermitian for any basis.

Problem 9.12 Prove:
(1) A bilinear form b is symmetric (or skew-symmetric, resp.) if and only if b(vᵢ, vⱼ) = b(vⱼ, vᵢ) (or b(vᵢ, vⱼ) = −b(vⱼ, vᵢ), resp.) for any vectors vᵢ, vⱼ in a basis α, or equivalently, the matrix representation [b]_α is symmetric (or skew-symmetric, resp.) for some basis α.
(2) A sesquilinear form is Hermitian if and only if the matrix representation [b]_α is Hermitian for some basis α.
(3) A sesquilinear form on a complex vector space V is called skew-Hermitian if b(x, y) is the negative of the complex conjugate of b(y, x) for any x, y in V. Show that a sesquilinear form b is skew-Hermitian if and only if its matrix representation [b]_α is skew-Hermitian for some basis α.

Note that congruent or Hermitian congruent matrices have the same rank because the basis-change matrix P is nonsingular and so is P^T (or P^H).

Definition 9.9 The rank of a bilinear or a sesquilinear form b on a vector space V,


written rank(b), is defined as the rank of any matrix representation of b.

Example 9.11 (Computing rank(b)) Let b : ℝ² × ℝ² → ℝ be defined by b(x, y) = x₁y₁ + 3x₁y₂ + 2x₂y₁ − x₂y₂ with respect to the standard basis α = {e₁, e₂}. Then, b is clearly a bilinear form but not symmetric, and the matrix representation of b with respect to α is

    [b]_α = [ 1   3 ]
            [ 2  −1 ].

If β = {v₁, v₂} with v₁ = (1, 0), v₂ = (1, 1) is another basis for ℝ², then the matrix representation of b with respect to β becomes

    [b]_β = [ 1  4 ]
            [ 3  5 ],

because b(v₁, v₁) = 1, b(v₁, v₂) = 4, b(v₂, v₁) = 3 and b(v₂, v₂) = 5. Hence, rank[b]_α = rank[b]_β = 2, and the rank of b is also 2. □
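The congruence [b]_β = P^T [b]_α P can also be checked directly. A minimal NumPy sketch (our illustration; the basis-change matrix P has the β-vectors as columns):

    import numpy as np

    B_alpha = np.array([[1., 3.],
                        [2., -1.]])      # [b]_alpha for b(x, y) = x1y1 + 3x1y2 + 2x2y1 - x2y2

    # Columns of P are v1 = (1, 0) and v2 = (1, 1) written in the standard basis.
    P = np.array([[1., 1.],
                  [0., 1.]])

    B_beta = P.T @ B_alpha @ P
    print(B_beta)                        # [[1. 4.] [3. 5.]], as in Example 9.11
    print(np.linalg.matrix_rank(B_alpha), np.linalg.matrix_rank(B_beta))   # 2 2, so rank(b) = 2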
Problem 9.13 (1) Let b : ℝ³ × ℝ³ → ℝ be defined by b(x, y) = x₁y₁ − 2x₁y₂ + x₂y₁ − x₃y₃ with respect to the standard basis. Is this a bilinear form? If so, find the matrix representation of b with respect to the basis

    α = {v₁ = (1, 0, 1), v₂ = (1, 0, −1), v₃ = (0, 1, 0)}.

Find its rank.
(2) Let V = M_{2×2}(ℝ) be the vector space of 2 × 2 matrices, and let b : V × V → ℝ be defined by b(A, B) = tr(A) tr(B). Is this a bilinear form? If so, find the matrix representation of b with respect to the basis

    α = { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] }.

Find its rank.

9.7 Diagonalization of bilinear or Hermitian forms


Every inner product ⟨x, Ax⟩ on a real vector space can be represented by a symmetric matrix A, which is diagonalizable. However, this is not true for a general bilinear form.
Definition 9.10 A bilinear (or sesquilinear) form b on V is diagonalizable if there exists a basis α for V such that the matrix representation [b]_α of b with respect to α is diagonal.
Theorem 9.6 (1) A bilinear form b on a real vector space V is symmetric if and only if it is diagonalizable.
(2) A sesquilinear form b on a complex vector space V is Hermitian if and only if it is diagonalizable with all diagonal entries real.

Proof: We prove only (1) and leave (2) as an exercise. Since every symmetric matrix is orthogonally diagonalizable, we only need to prove the sufficiency. Let a bilinear form b be diagonalizable so that the matrix representation [b]_α is diagonal for some basis α. Then, for any vectors vᵢ, vⱼ in the basis α = {v₁, ..., vₙ}, we have b(vᵢ, vⱼ) = b(vⱼ, vᵢ). Now, for any two vectors x and y in V, let x = Σ_{i=1}^{n} xᵢvᵢ and y = Σ_{j=1}^{n} yⱼvⱼ. Then,

    b(x, y) = Σ_{i,j=1}^{n} xᵢyⱼ b(vᵢ, vⱼ) = Σ_{i,j=1}^{n} yⱼxᵢ b(vⱼ, vᵢ) = b(y, x).

Hence, b is symmetric. (See also Problem 9.12(1).) □



Problem 9.14 Prove Theorem 9.6(2): a sesquilinear form b on a complex vector space V is
Hermitian if and only if it is diagonalizable in which all diagonal entries are real.

Example 9.12 (Diagonalizing a symmetric bilinear form) Let b : ℝ³ × ℝ³ → ℝ be the bilinear form defined by

    b(x, y) = x₁y₃ − 2x₂y₂ + 2x₂y₃ + x₃y₁ + 2x₃y₂ − x₃y₃.

Clearly, b(x, y) = b(y, x), and the matrix representation of b with respect to the standard basis α = {e₁, e₂, e₃} is

    [b]_α = [ 0   0   1 ]
            [ 0  −2   2 ]
            [ 1   2  −1 ],

which is symmetric. Hence, the bilinear form b is symmetric. By Theorem 9.6, it is diagonalizable through the congruence. In fact,

    [[b]_α | I] = [ 0   0   1 | 1  0  0 ]
                  [ 0  −2   2 | 0  1  0 ]
                  [ 1   2  −1 | 0  0  1 ]

    → [E₁[b]_α E₁^T | E₁I] = [ 0   0  1 | 1  0  0 ]
                              [ 0  −2  0 | 0  1  0 ]
                              [ 1   0  1 | 0  1  1 ]

    → [E₂E₁[b]_α E₁^T E₂^T | E₂E₁I] = [ −1   0  0 | 1  −1  −1 ]
                                       [  0  −2  0 | 0   1   0 ]
                                       [  0   0  1 | 0   1   1 ]
                                     = [D | P^T],

where

    E₁ = [ 1  0  0 ]        E₂ = [ 1  0  −1 ]
         [ 0  1  0 ],            [ 0  1   0 ].
         [ 0  1  1 ]             [ 0  0   1 ]

By a direct computation, one can show that P^T [b]_α P = D. Moreover, if we take another basis β = {c₁, c₂, c₃} consisting of the column vectors of the matrix P, then P = [id]_β and

    [b]_β = P^T [b]_α P = D.

Hence, if we write [x]_β = (x₁′, x₂′, x₃′) and [y]_β = (y₁′, y₂′, y₃′) as new variables, then the bilinear form b becomes

    b(x, y) = −x₁′y₁′ − 2x₂′y₂′ + x₃′y₃′.    □

A skew-symmetric matrix is not diagonalizable in general, but the following theorem shows the structure of a skew-symmetric bilinear form. Note that a bilinear form b is skew-symmetric if and only if b(x, x) = 0 for any x in V.

Theorem 9.7 Let b : V × V → ℝ be a skew-symmetric bilinear form. Then there exists a basis α for V with respect to which the matrix representation [b]_α is of the block diagonal form

    [  0  1                      ]
    [ −1  0                      ]
    [        0  1                ]
    [       −1  0                ]
    [                ⋱           ]
    [                   0        ]
    [                      ⋱     ]
    [                         0  ],

that is, a string of 2 × 2 blocks [ 0  1 ; −1  0 ] down the diagonal followed by zeros.
Proof: If b = 0, then [b]_α is the zero matrix. Also if dim V = 1, then b(x, x) = 0 for any basis vector x in V, so b = 0.
Now, we assume that b ≠ 0 and prove it by induction on dim V. Since b ≠ 0, there exist nonzero vectors x and y in V such that b(x, y) ≠ 0. By the bilinearity of b, one can assume that b(x, y) = 1. Such vectors x and y must be linearly independent, because if y = kx, then b(x, y) = k b(x, x) = 0. Let U be the subspace of V spanned by x and y, and let

    W = {v ∈ V : b(v, u) = 0 for any u ∈ U}.

Then, one can easily show that W is also a subspace of V and U ∩ W = {0}. Moreover, U + W = V. In fact, for a given vector v ∈ V, let u = b(v, y)x − b(v, x)y. It is easy to show that u ∈ U and v − u ∈ W. Thus V = U ⊕ W, where dim W = n − 2. Clearly, the matrix representation of the restriction of b to U with respect to the basis {x, y} is

    [  0  1 ]
    [ −1  0 ],

and the restriction of b to W is also skew-symmetric. The induction hypothesis can be applied to W, and then one can finish the proof. □

Problem 9.15 Prove that U ∩ W = {0} in the proof of Theorem 9.7.

Example 9.13 (Block diagonalizing a skew-symmetric bilinear form) Let b : ℝ³ × ℝ³ → ℝ be the bilinear form defined by

    b(x, y) = x₁y₂ − x₂y₁ + x₃y₁ − x₁y₃ + x₂y₃ − x₃y₂.

Clearly, b(x, y) = −b(y, x), and the matrix representation of b with respect to the standard basis α = {e₁, e₂, e₃} is

    [b]_α = [  0   1  −1 ]
            [ −1   0   1 ]
            [  1  −1   0 ],

which is skew-symmetric. By a simple computation, b(e₁, e₂) = 1 = −b(e₂, e₁). Let U be the subspace of ℝ³ spanned by e₁ and e₂, i.e., the xy-plane. If we set W = {v ∈ V : b(v, u) = 0 for any u ∈ U}, then W = {λz : λ ∈ ℝ}, where z = (1, 1, 1). Clearly, β = {e₁, e₂, z} is a basis for ℝ³ and b(z, z) = 0 so that

    [b]_β = [  0  1  0 ]
            [ −1  0  0 ]
            [  0  0  0 ].    □

Problem 9.16 Show that any bilinear form b on a vector space V is the sum of a symmetric
bilinear form and a skew-symmetric bilinear form.

The following theorem shows how quadratic forms and symmetric bilinear forms
are related.

Theorem 9.8 If b is a symmetric bilinear form on ℝⁿ, then the function q(x) = b(x, x) for x ∈ ℝⁿ is a quadratic form.
Conversely, for every quadratic form q, there is a unique symmetric bilinear form b such that q(x) = b(x, x) for all x in ℝⁿ.

Proof: If b(x, y) = x^T A y is a symmetric bilinear form, then q(x) = b(x, x) = x^T A x is clearly a quadratic form.
Conversely, if b is a symmetric bilinear form, then

    b(x + y, x + y) = b(x, x) + 2b(x, y) + b(y, y),

which is called the polar form of b. Hence, for any given quadratic form q(x) = x^T A x with a symmetric matrix A, a bilinear form b can be defined by

    b(x, y) = (1/2)[q(x + y) − q(x) − q(y)].

This form b is clearly symmetric, bilinear and b(x, x) = q(x). The uniqueness also comes from this relation. □

The following theorem shows how complex quadratic forms and Hermitian forms
are related.

Theorem 9.9 If b is a Hermitian form on ℂⁿ, then the function q(x) = b(x, x) for x ∈ ℂⁿ is a complex quadratic form.
Conversely, for every complex quadratic form q, there is a unique Hermitian form b such that q(x) = b(x, x) for all x in ℂⁿ.

Proof: If b(x, y) = x^H A y is a Hermitian form, then q(x) = b(x, x) = x^H A x is clearly a complex quadratic form.
Conversely, if b is a Hermitian form, then

    b(x + y, x + y) = b(x, x) + b(y, y) + b(x, y) + b(y, x),
    b(x − y, x − y) = b(x, x) + b(y, y) − b(x, y) − b(y, x).

Hence, for any given complex quadratic form q(x) = x^H A x with a Hermitian matrix A, a Hermitian form b can be defined by

    b(x, y) = (1/4)[q(x + y) − q(x − y)] + (i/4)[q(ix + y) − q(ix − y)],

which is called the polar form of a Hermitian form b. This form b is clearly Hermitian and b(x, x) = q(x), which implies the uniqueness of b. □

Now, we prove Sylvester's law of inertia.

Theorem 9.10 (Sylvester's law of inertia) Let b be a symmetric bilinear or a Hermitian form on a vector space V. Then, the number of positive diagonal entries and the number of negative diagonal entries of any diagonal representation of b are both independent of the diagonal representation.

Proof: We only prove it for a symmetric bilinear form, because the other case can be proved by a similar method. Let b be a symmetric bilinear form on a vector space V and let α = {x₁, ..., x_p, x_{p+1}, ..., xₙ} be an ordered basis for V in which

    b(xᵢ, xᵢ) > 0    for i = 1, 2, ..., p,    and
    b(xᵢ, xᵢ) ≤ 0    for i = p + 1, ..., n,

and let β = {y₁, ..., y_{p′}, y_{p′+1}, ..., yₙ} be another ordered basis for V in which

    b(yᵢ, yᵢ) > 0    for i = 1, 2, ..., p′,    and
    b(yᵢ, yᵢ) ≤ 0    for i = p′ + 1, ..., n.

To show p = p′, let U and W be subspaces of V spanned by {x₁, ..., x_p} and {y_{p′+1}, ..., yₙ}, respectively. Then, b(u, u) > 0 for any nonzero vector u ∈ U and b(w, w) ≤ 0 for any nonzero vector w ∈ W by the bilinearity of b. Thus, U ∩ W = {0}, and

    dim(U + W) = dim U + dim W − dim(U ∩ W) = p + (n − p′) ≤ n,

or p ≤ p′. Similarly, one can show p′ ≤ p to conclude p = p′. Therefore, any two diagonal matrix representations of b have the same number of positive diagonal entries. By considering the bilinear form −b instead of b, one can also have that any two diagonal matrix representations of b have the same number of negative diagonal entries. □
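Sylvester's law is easy to observe experimentally. The sketch below (our own illustration; the diagonal test matrix and the random congruences are ours) shows that the inertia of P^T A P does not depend on the invertible matrix P.

    import numpy as np

    def inertia(S, tol=1e-10):
        w = np.linalg.eigvalsh(S)
        return (int(np.sum(w > tol)), int(np.sum(w < -tol)), int(np.sum(np.abs(w) <= tol)))

    rng = np.random.default_rng(0)
    A = np.diag([2., 5., -1., 0.])                 # inertia (2, 1, 1)

    for _ in range(3):
        P = rng.standard_normal((4, 4))
        if abs(np.linalg.det(P)) < 1e-8:           # make sure P is invertible
            continue
        print(inertia(P.T @ A @ P))                # always (2, 1, 1)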

Corollary 9.11 (1) Any two symmetricmatrices which are congruenthave the same
inertia.
(2) Any two Hermitian matrices which are Hermitian congruenthave the same iner-
tia.
Definition 9.11 Let A be a real symmetric or a Hermitian matrix. The number of
positive eigenvalues of A is called the index of A. The difference between the number
of positive eigenvalues and the number of negative eigenvalues of A is called the
signature of A .
Hence, the index and signature together with the rank of a symmetric or a Her-
mitian matrix are invariants under the congruence relation, and any two of these
invariants determine the third: that is,

the index = the number of positive eigenvalues,


the rank = the index + the number of negative eigenvalues,
the signature = the index - the number of negative eigenvalues.

We have shown the necessary condition of the following corollary.

Corollary 9.12 (1) Two symmetric matrices are congruent if and only if they have the same invariants: index, signature and rank.
(2) Two Hermitian matrices are Hermitian congruent if and only if they have the same invariants.

Proof: We only prove (1). Suppose that two symmetric matrices A and B have the same invariants, and let D and E be diagonal matrices congruent to A and B, respectively. Without loss of generality, one may choose D and E so that the diagonal entries are in the order of positive, negative and zero. Let p and r denote the index and the rank, respectively, of both D and E. Let dᵢ denote the i-th diagonal entry of D. Define the diagonal matrix Q whose i-th diagonal entry qᵢ is given by

    qᵢ = 1/√dᵢ       if 1 ≤ i ≤ p,
         1/√(−dᵢ)    if p < i ≤ r,
         1           if r < i ≤ n.

Then,

    Q^T D Q = J_pr,

the diagonal matrix whose first p diagonal entries are 1, whose next r − p diagonal entries are −1, and whose remaining diagonal entries are 0. Hence, A is congruent to J_pr and similarly so is B. It concludes that A is congruent to B. □

Example 9.14 Determine the index, the signature and the rank for each of the following matrices:

    A = [ 1  1  2 ]
        [ 1  0  3 ]
        [ 2  3  6 ].

Which are congruent to each other?

Solution: In Example 9.7, we saw that the matrix A is congruent to the diagonal matrix

    D = [ 1   0  0 ]
        [ 0  −1  0 ]
        [ 0   0  3 ].

Therefore, A has rank 3, index 2 and signature 1. The matrix B is already diagonal, and has rank 3, index 3 and signature 3. Using the method of Example 9.7, one can show that C is congruent to the diagonal matrix with diagonal entries 1, 1, −4. Therefore, C has rank 3, index 2 and signature 1. (Note that it is not necessary to find the eigenvalues of C to determine its invariants.) We conclude that A and C are congruent and B is congruent to neither A nor C by Corollary 9.12. □
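The three invariants follow at once from the inertia. A short sketch (ours; eigenvalues are used here only as a convenient way of counting signs):

    import numpy as np

    def invariants(S, tol=1e-10):
        w = np.linalg.eigvalsh(S)
        p = int(np.sum(w > tol))          # index = number of positive eigenvalues
        q = int(np.sum(w < -tol))         # number of negative eigenvalues
        return {"rank": p + q, "index": p, "signature": p - q}

    A = np.array([[1., 1., 2.],
                  [1., 0., 3.],
                  [2., 3., 6.]])
    print(invariants(A))                  # {'rank': 3, 'index': 2, 'signature': 1}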

Problem 9.17 Prove that if the diagonal entries of a diagonal matrix are permuted, then the
resulting diagonal matrix is congruent to the original one.

Problem 9.18 Prove that the total number of distinct equivalence classes of congruent n × n real symmetric matrices is equal to (1/2)(n + 1)(n + 2).

Problem 9.19 Find the signature, the index and the rank of each of the following matrices.

    (1) [ 0   1   2 ]       (2) [ 1  2  3 ]
        [ 1   3  −2 ],          [ 2  4  5 ].
        [ 2  −2   4 ]           [ 3  5  6 ]

Which are congruent to each other?

9.8 Applications

9.8.1 Extrema of real-valued functions on ℝⁿ

In calculus, one uses the second derivative test to see whether a given function y =
f(x) takes a local maximum or a local minimum at a critical point. In this section,
we show a similar test for a function of more than one variable and also show how
quadratic forms arise and how they can be used in this context.

Let f(x) be a real-valued function (not necessarily a quadratic equation) on ℝⁿ. A point x₀ in ℝⁿ at which either a first partial derivative of f fails to exist or the first partial derivatives of f are all zero is called a critical point of f. If f(x) has either a local maximum or a local minimum at a point x₀ and all the first partial derivatives of f exist at x₀, then all of them must be zero, i.e., f_{x_i}(x₀) = 0 for all i = 1, 2, ..., n. Thus, if f(x) has first partial derivatives everywhere, its local maxima and minima will occur at critical points.
Let us first consider a function of two variables: f(x), x = (x, y) ∈ ℝ², which has a critical point x₀ = (x₀, y₀) ∈ ℝ². If f has continuous third partial derivatives in a neighborhood of x₀, it can be expanded in a Taylor series about that point: For x = (x₀ + h, y₀ + k),

    f(x) = f(x₀ + h, y₀ + k) = f(x₀) + (h f_x(x₀) + k f_y(x₀))
                                      + (1/2)(h² f_xx(x₀) + 2hk f_xy(x₀) + k² f_yy(x₀)) + R
         = f(x₀) + (1/2)(ah² + 2bhk + ck²) + R,

where

    a = f_xx(x₀),    b = f_xy(x₀),    c = f_yy(x₀)

(the first-order term vanishes because x₀ is a critical point), and the remainder R is given by

    R = (1/6)(h³ f_xxx(z) + 3h²k f_xxy(z) + 3hk² f_xyy(z) + k³ f_yyy(z)),

with z = (x₀ + θh, y₀ + θk) for some 0 < θ < 1.
If h and k are sufficiently small, |R| will be smaller than the absolute value of (1/2)(ah² + 2bhk + ck²), and hence f(x) − f(x₀) and ah² + 2bhk + ck² will have the same sign. Note that the expression

    q(h, k) = ah² + 2bhk + ck² = [h  k] H [ h ]
                                          [ k ]

is a quadratic form in the variables h and k, where

    H = H(x₀) = [ a  b ] = [ f_xx(x₀)  f_xy(x₀) ]
                [ b  c ]   [ f_xy(x₀)  f_yy(x₀) ]

is a symmetric matrix, called the Hessian of f at x₀ = (x₀, y₀). Hence, f(x, y) has a local minimum (or maximum) at x₀ if the quadratic form q(h, k) is positive (or negative, respectively) for all sufficiently small (h, k). The critical point x₀ is called a saddle point if q(h, k) takes both positive and negative values. Thus, at this point f(x, y) has neither a local minimum nor a local maximum. (This is the second derivative test for a local extremum of f(x, y).)
In particular, a quadratic form

    q(x) = x^T A x = [x  y] [ a  b ] [ x ] = ax² + 2bxy + cy²
                            [ b  c ] [ y ]

for x = [x y]^T ∈ ℝ² is itself a function of two variables, and its first partial derivatives are

    q_x = 2ax + 2by,    q_y = 2bx + 2cy.

By setting these equal to zero, we see that 0 = (0, 0) is a critical point of q. If ac − b² ≠ 0, this will be the only critical point of q. Note that the Hessian of q is

    H = 2A = [ 2a  2b ]
             [ 2b  2c ].

Thus, H is nonsingular if and only if ac − b² ≠ 0.


Since q(O) = 0, it follows that the quadratic form q takes the global minimum at
oif and only if
q(x) = x T Ax > 0 for all x f= 0,
and q takes the global maximum at 0 if and only if

q(x) = x T Ax < 0 for all x f= O.

If x T Ax takes both positive and negative values, then 0 is a saddle point. Thus, if A
is nonsingular, the quadratic form q will have either the global minimum, the global
maximum or a saddle point at O.
In general, if a function f of two variables has a nonsingular Hessian H at a
critical point xo = (xo, YO) which has nonzero eigenvalues Al and A2, then the
second derivative test for f(x) says
(l) f has a minimum at Xo if both Al and A2 are positive,
(2) f has a maximum at Xo if both Al and A2 are negative,
(3) f has a saddle point at Xo if Al and A2 have different signs.

Example 9.15 (The extrema of a quadratic form f(x, y) = x^T A x can be determined by the inertia of A) For q(x, y) = 2x² − 4xy + 5y², determine the nature of the critical point (0, 0).

Solution: The matrix of the quadratic form is

    A = [  2  −2 ]
        [ −2   5 ].

There are two methods:
(1) Similarity method: Solve det(λI − A) = 0 to get the eigenvalues λ₁ = 6 and λ₂ = 1. Since both eigenvalues are positive, A is positive definite and hence (0, 0) is a global minimum.
(2) Congruence method: Diagonalize the matrix A through the congruence relation to get

    A = [  2  −2 ]    →    EAE^T = [ 2  0 ],
        [ −2   5 ]                 [ 0  3 ]

where E = [ 1  0 ; 1  1 ]. It shows that In(A) = (2, 0, 0) and A is positive definite and hence (0, 0) is a global minimum. □

Example 9.16 (The inertia of the Hessian determines the local extrema of any (non-quadratic) f(x, y) at critical points) Find and describe all critical points of the function

    f(x, y) = (1/3)x³ + xy² − 4xy + 1.

Solution: The first partial derivatives of f are

    f_x = x² + y² − 4y,    f_y = 2xy − 4x = 2x(y − 2).

Setting f_y = 0, we get x = 0 or y = 2. Setting f_x = 0, we see that if x = 0, then y must be either 0 or 4, and if y = 2, then x = ±2. Thus, (0, 0), (0, 4), (2, 2), (−2, 2) are the critical points of f. To classify these critical points, we compute the second partial derivatives:

    f_xx = 2x,    f_xy = 2y − 4,    f_yy = 2x.

For each critical point (x₀, y₀), one can determine the eigenvalues λ₁ and λ₂ of the Hessian

    H = [ 2x₀       2y₀ − 4 ]
        [ 2y₀ − 4   2x₀     ].

These values are summarized in the following table:

    Critical point (x₀, y₀)    λ₁    λ₂    Description
    (0, 0)                      4    −4    saddle point
    (0, 4)                      4    −4    saddle point
    (2, 2)                      4     4    local minimum
    (−2, 2)                    −4    −4    local maximum

As an alternative method, one can compute the inertia of the Hessian at each critical point by a congruence relation and get the same description of the nature of the critical points. □
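The table above can be reproduced with a few lines of NumPy (our own illustration; the Hessian formula is the one computed in the example):

    import numpy as np

    # f(x, y) = (1/3)x^3 + x*y^2 - 4*x*y + 1 has Hessian [[2x, 2y-4], [2y-4, 2x]].
    def classify(H, tol=1e-10):
        w = np.linalg.eigvalsh(H)
        if np.all(w > tol):
            return "local minimum"
        if np.all(w < -tol):
            return "local maximum"
        if np.any(w > tol) and np.any(w < -tol):
            return "saddle point"
        return "test is inconclusive (zero eigenvalue)"

    for (x0, y0) in [(0, 0), (0, 4), (2, 2), (-2, 2)]:
        H = np.array([[2.0 * x0, 2.0 * y0 - 4.0],
                      [2.0 * y0 - 4.0, 2.0 * x0]])
        print((x0, y0), np.linalg.eigvalsh(H), classify(H))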

Beyond the functions of two variables, the same argument of the second derivative test for functions of two variables can be justified for functions of more than two variables with a Taylor series about critical points: Let f(x) = f(x₁, x₂, ..., xₙ) be a real-valued function whose third partial derivatives are all continuous. If x₀ is a critical point of f, the Hessian of f at x₀ is the n × n symmetric matrix H = H(x₀) = [h_ij] given by

    h_ij = (∂²f/∂xᵢ∂xⱼ)(x₀).

The critical point can be classified as follows:
(1) f has a local minimum at x₀ if H(x₀) is positive definite,
(2) f has a local maximum at x₀ if H(x₀) is negative definite,
(3) x₀ is a saddle point of f if H(x₀) is indefinite.

Example 9.17 (The inertia of the Hessian determines the extrema of any f(x, y, z) at critical points) Find the local extrema of the function

    f(x, y, z) = x² + xz − 3cos y + z².

Solution: The first partial derivatives of f are

    f_x = 2x + z,    f_y = 3 sin y,    f_z = x + 2z.

It follows that (x, y, z) is a critical point of f if and only if x = z = 0 and y = nπ, where n is an integer. Let x₀ = (0, 2kπ, 0). The Hessian of f at x₀ is given by

    H(x₀) = [ 2  0  1 ]
            [ 0  3  0 ]
            [ 1  0  2 ].

It can be diagonalized through the congruence relation to get

    H(x₀) = [ 2  0  1 ]    →    [ 2  0   0  ]
            [ 0  3  0 ]         [ 0  3   0  ]
            [ 1  0  2 ]         [ 0  0  3/2 ].

It shows that In(H(x₀)) = (3, 0, 0) and H(x₀) is positive definite, and hence f has a local minimum at x₀. (Alternatively, one can compute the eigenvalues of H(x₀), which are 3, 3, and 1, which implies that H(x₀) is positive definite.)
On the other hand, at a critical point of the form x₁ = (0, (2k − 1)π, 0), the Hessian will be

    H(x₁) = [ 2   0  1 ]
            [ 0  −3  0 ]
            [ 1   0  2 ].

One can show either that In(H(x₁)) = (2, 1, 0) by using a congruence relation or that the eigenvalues of H(x₁) are −3, 3, and 1. Either one shows that H(x₁) is indefinite and hence x₁ is a saddle point of f. □

Problem 9.20 For each of the following functions, determine whether the given critical point corresponds to a local minimum, local maximum, or saddle point:
(1) f(x, y) = 3x² − xy + y² at (0, 0);
(2) f(x, y, z) = x³ + xyz + y² − 3x at (1, 0, 0).

Problem 9.21 Show that for a continuous function f(x, y) on ℝ² which has continuous third partial derivatives, a critical point x₀ = (x₀, y₀) ∈ ℝ² is a saddle point if and only if det H(x₀) < 0.
Is it also true for such a function f(x, y, z) on ℝ³?

9.8.2 Constrained quadratic optimization

One of the most importantproblems in appliedmathematics is the optimization (min-


imizationor maximization) of a real-valued function f of n variables subjectto con-
straintson the variables. For example, whenthe function f is a linearform subjectto
constraints in the form of linear equalities and/or inequalities, the optimization prob-
lem is known as linear programming. Those optimization problems are extensively
used in the military, industrial, governmental planningfields, among others.
In this section,weconsideran optimization problemof a quadraticformin n vari-
ables. If there are no constraints on the variables, then such an optimization problem
was discussed in Section 9.8.1.
As a quadraticoptimization problemwith constraints, we considera very special
one: Find the maximumand minimumvalues of a (real or complex) quadratic form
q(x) = (x, Ax} subject to the constraint [x] = 1. Advanced calculus tells us that
such constraintextremaof q(x) alwaysexists.
Theorem 9.13 Let A be a symmetric or a Hermitian matrix, and let the eigenvalues of A be λ_min = λ₁ ≤ λ₂ ≤ ⋯ ≤ λₙ = λ_max in increasing order. Then,
(1) λ_min ||x||² ≤ ⟨x, Ax⟩ ≤ λ_max ||x||² for all x.
(2) ⟨x, Ax⟩ = λ ||x||² if x is an eigenvector of A belonging to an eigenvalue λ.
(3) λ_max = max_{x≠0} ⟨x, Ax⟩/⟨x, x⟩ = max_{||x||=1} ⟨x, Ax⟩, and for a unit vector x, λ_max = ⟨x, Ax⟩ if and only if x is an eigenvector belonging to the eigenvalue λ_max.
(4) λ_min = min_{x≠0} ⟨x, Ax⟩/⟨x, x⟩ = min_{||x||=1} ⟨x, Ax⟩, and for a unit vector x, λ_min = ⟨x, Ax⟩ if and only if x is an eigenvector belonging to the eigenvalue λ_min.
In particular, the maximum and minimum values of a (real or complex) quadratic form q(x) = ⟨x, Ax⟩ subject to the constraint ||x|| = 1 are the largest and the smallest eigenvalues of A, respectively.

Proof: We prove it only for a Hermitian matrix A; the other case of a real symmetric matrix is left as an exercise. If A is Hermitian, there is a unitary matrix U such that U^H A U = D is a diagonal matrix with λ₁, λ₂, ..., λₙ as its diagonal entries. Moreover, with a change of coordinates y = U^H x = [y₁ y₂ ⋯ yₙ]^T we have

    ⟨x, Ax⟩ = λ₁|y₁|² + λ₂|y₂|² + ⋯ + λₙ|yₙ|²

by the principal axes theorem, and ||x|| = ||y|| because U is unitary. It implies that

    ⟨x, Ax⟩ = x^H A x = λ₁|y₁|² + λ₂|y₂|² + ⋯ + λₙ|yₙ|²
            ≤ λₙ|y₁|² + λₙ|y₂|² + ⋯ + λₙ|yₙ|²
            = λₙ(|y₁|² + |y₂|² + ⋯ + |yₙ|²)
            = λₙ||y||² = λ_max ||x||²,

since λₙ = λ_max is the largest eigenvalue. Similarly, one can show λ_min ||x||² ≤ ⟨x, Ax⟩ for all x. It proves (1).
(2) If x is an eigenvector of A belonging to λ, then

    ⟨x, Ax⟩ = ⟨x, λx⟩ = λ⟨x, x⟩ = λ||x||².

In particular, if x is an eigenvector of A belonging to λ_max (λ_min, respectively) and ||x|| = 1, then ⟨x, Ax⟩ = λ_max (λ_min, respectively).
(3) We only prove the necessity part of the second assertion, because all other parts are clear from (1) and (2). To show this, suppose that λ_max = ⟨x, Ax⟩ for a Hermitian matrix A and a unit vector x. Let λ₁, λ₂, ..., λₙ be the eigenvalues of A with associated eigenvectors v₁, v₂, ..., vₙ, respectively. One can assume that λ_max = λ₁ and the eigenvectors v₁, v₂, ..., vₙ are orthonormal since A is Hermitian. Let x = Σⱼ aⱼvⱼ. Then, we have

    λ₁ = ⟨x, Ax⟩ = Σ_{j=1}^{n} |aⱼ|²λⱼ ≤ Σ_{j=1}^{n} |aⱼ|²λ₁ = λ₁

since λ_max = λ₁. Hence, it should be aⱼ = 0 whenever λⱼ ≠ λ₁, which implies that x is an eigenvector belonging to λ₁.
(4) can be proved in a similar way to (3). □

Definition 9.12 The Rayleigh quotient of a symmetric or Hermitian matrix A is the function R_A defined for x ≠ 0 by

    R_A(x) = ⟨x, Ax⟩ / ⟨x, x⟩.

It follows from Theorem 9.13 that, subject to the constraint ||x|| = 1, the quadratic form ⟨x, Ax⟩ has the maximum value λ_max and the minimum value λ_min. It means that the smallest and the largest eigenvalues of a Hermitian matrix are characterized as the solutions of a constrained minimum and maximum problem of the Rayleigh quotient. This is very important in vibration problems ranging from aerodynamics to particle physics.

Example 9.18 (The extreme values of a constrained quadratic form) Find the maximum and minimum values of the quadratic form

    x₁² + x₂² + 4x₁x₂

subject to the constraint x₁² + x₂² = 1, and determine values of x₁ and x₂ at which the maximum and minimum occur.

Solution: The quadratic form can be written as

    x₁² + x₂² + 4x₁x₂ = x^T A x = [x₁  x₂] [ 1  2 ] [ x₁ ]
                                           [ 2  1 ] [ x₂ ].

The eigenvalues of A are λ = 3 and λ = −1, which are the largest and smallest eigenvalues, respectively. Their associated unit eigenvectors are

    x = (1/√2, 1/√2)    and    x = (1/√2, −1/√2),

respectively. Note that those extreme values of the quadratic form occur at those unit eigenvectors. Thus, subject to the constraint x₁² + x₂² = 1, the maximum value of the quadratic form is λ = 3, which occurs at x = (1/√2, 1/√2), and the minimum value is λ = −1, which occurs at x = (1/√2, −1/√2). Clearly, the quadratic equation x^T A x = c is a hyperbola and the extreme values occur when a hyperbola and the unit circle intersect as in Figure 9.7. □

~"":-""""+-t-----:*,""-+-i--'::""";:"""""~ Xl
Xf +Xi +4XI X2 = 3
xf +X~ +4XIX2 =-1

Figure 9.7. Extremevalues of the constraint quadratic form



Remark: In Example 9.18, it is shown that the maximum value of the quadratic form x₁² + x₂² + 4x₁x₂ subject to the constraint x₁² + x₂² = 1 is 3. By examining Figure 9.7, one can also see the following dual constrained optimization: the minimum value of the quadratic form x₁² + x₂² subject to the constraint x₁² + x₂² + 4x₁x₂ = 3 is 1.
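A hedged numerical sketch of Theorem 9.13 and Example 9.18 (our own illustration; the random sampling is only a sanity check, not a proof): the extreme values of ⟨x, Ax⟩ on the unit sphere are the extreme eigenvalues, attained at the corresponding unit eigenvectors.

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 1.]])

    eigvals, eigvecs = np.linalg.eigh(A)        # ascending eigenvalues, orthonormal eigenvectors
    lam_min, lam_max = eigvals[0], eigvals[-1]
    print(lam_min, lam_max)                     # -1.0 and 3.0
    print(eigvecs[:, 0], eigvecs[:, -1])        # +-(1, -1)/sqrt(2) and +-(1, 1)/sqrt(2)

    # Compare with the Rayleigh quotient at many random unit vectors.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((2, 10000))
    X /= np.linalg.norm(X, axis=0)
    R = np.einsum('ij,ij->j', X, A @ X)         # <x, Ax> for each unit vector x
    print(R.min(), R.max())                     # close to -1 and 3, never outside [lam_min, lam_max]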

Problem 9.22 Find the maximum and minimum values of the Rayleigh quotient of each of the following matrices:

Problem 9.23 Find the maximum and minimum values of the quadratic form

    2x₁² + 2x₂² + 3x₁x₂

subject to the constraint x₁² + x₂² = 1, and determine values of x₁ and x₂ at which the maximum and minimum occur.

Problem 9.24 Find the maximum and minimum of the following quadratic forms subject to the constraint x₁² + x₂² + x₃² = 1 and determine the values of x₁, x₂, and x₃ at which the maximum and minimum occur:
(1) x₁² + x₂² + 2x₃² − 2x₁x₂ + 4x₁x₃ + 4x₂x₃,
(2) 2x₁² + x₂² + x₃² + 2x₁x₃ + 2x₁x₂.

We have seen that the Rayleigh quotient characterizes the largest and the smallest eigenvalues and their associated eigenvectors of a real symmetric or a Hermitian matrix A in terms of a constrained optimization problem. But all the other eigenvalues and their associated eigenvectors can be characterized in a similar way. For example, the second largest eigenvalue can be characterized as the maximum value of the quadratic form ⟨x, Ax⟩ subject to the constraints ||x|| = 1 and x^H vₙ = 0, where vₙ is an eigenvector belonging to the largest eigenvalue λ_max. For further discussion, one can refer to some advanced linear algebra books.

9.9 Exercises
9.1. Find the matrix representing each of the following quadratic forms:
(1) x₁² + 4x₁x₂ + 3x₂²,
(2) x₁² − x₂² + x₃² + 4x₁x₃ − 5x₂x₃,
(3) x₁² − 2x₂² − 3x₃² + 4x₁x₂ + 6x₁x₃ − 8x₂x₃,
(4) 3x₁y₁ − 2x₁y₂ + 5x₂y₁ + 7x₂y₂ − 8x₂y₃ + 4x₃y₂ − x₃y₃,
(5) [Xl X2] [~ ~] [ ;~


9.2. Sketch the level surface of each of the following quadratic equations:
(1) xy = 2,
(2) 53x² − 72xy + 32y² = 80,
(3) 16x² − 24xy + 9y² − 60x − 80y + 100 = 0.

9.3. Letq be a quadratic form on]R3 and let A =[ ~ -~ -~] bethematrixrepresenting


-5 4 7
q with respect to the basis

    α = {(1, 0, 1), (1, 1, 0), (0, 0, 1)}.

(1) Diagonalize A, i.e., find an orthogonal matrix P so that P^T A P is a diagonal matrix.
(2) Construct a basis β for ℝ³ such that the elements of β are the principal axes of the quadratic surface q(x) = 0.
9.4. For a given quadratic equation ax² + 2bxy + cy² + dx + ey + f = 0 with b ≠ 0, classify the conic section according to the various possible cases of a, b, and c (see Example 9.6).
9.5. For a positive definite quadratic form q(x) = ax² + 2bxy + cy², the curve q(x) = 1 is an ellipse. When a = c = 2 and b = −1, sketch the ellipse.
9.6. Show that if A and B are both positive definite, so are A², A⁻¹ and A + B.
9.7. Prove that if A and B are symmetric and positive definite, so is A² + B⁻¹.
9.8. Find a substitution x = Qy that diagonalizes each of the following quadratic forms, where
Q is orthogonal. Also, classify the form as positive definite, positive semidefinite, and so
on .
(1) q(x) = 2x² + 6xy + 2y²,
(2) q(x) = x² + y² + z² + 2(xy + xz + yz).
9.9. Determine whether or not each of the following matrices is positive definite:

(1)A=[-~ -~ =~] , (2)A=[~ ~ ~] .


-1 -1 2 1 0 1
Use the decomposition A = LDL^T to write x^T A x as a sum of squares.
9.10. Let b be a bilinear form on ℝ² defined by

    b(x, y) = 2x₁y₁ − 3x₁y₂ + x₂y₂.

(1) Find the matrix A of b with respect to the basis α = {(1, 0), (1, 1)}.
(2) Find the matrix B of b with respect to the basis β = {(2, 1), (1, −1)}.
(3) Find the basis-change matrix Q from the basis β to the basis α and verify that B = Q^T A Q.
9.11. Find the signature, index and rank of each of the following symmetric matrices:

    (1) [ 0   1  2 ]       (2) [ 2   3   0 ]       (3) [  4  −3   2 ]
        [ 1  −1  3 ],          [ 3  −1  −2 ],          [ −3   5   1 ].
        [ 2   3  4 ]           [ 0  −2   0 ]           [  2   1  −6 ]
9.12. Which of the following functions b on ℝ² are bilinear forms?
(1) b(x, y) = 1,
(2) b(x, y) = (x₁ − y₁)² + x₂y₂,
(3) b(x, y) = (x₁ + y₁)² − (x₁ − y₁)²,
(4) b(x, y) = x₁y₂ − x₂y₁.

9.13. For a bilinear form on ℝ² defined by b(x, y) = x₁y₁ + x₂y₂, find the matrix representation of b with respect to each of the following bases:

    α = {(1, 0), (0, 1)},    β = {(1, −1), (1, 1)},    γ = {(1, 2), (3, 4)}.

9.14. Which of the following bilinear forms on ℝ³ are symmetric or skew-symmetric? For each symmetric one, find its matrix representation of the diagonal form, and for each skew-symmetric one, find its matrix representation of the block form in Theorem 9.7.
(1) b(x, y) = x₁y₃ + x₃y₁,
(2) b(x, y) = x₁y₁ + 2x₁y₃ + 2x₃y₁ − x₂y₂,
(3) b(x, y) = x₁y₂ + 2x₁y₃ − x₂y₃ − x₂y₁ − 2x₃y₁ + x₃y₂,
(4) b(x, y) = Σ_{i,j=1}^{3} (i − j) xᵢyⱼ.
9.15. Determine whether each of the following functions takes a local minimum, local maximum or saddle point at the given point:
(1) f(x, y) = −1 + 4(eˣ − x) − 5x sin y + 6y² at the point (x, y) = (0, 0);
(2) f(x, y) = (x² − 2x) cos y at (x, y) = (1, π).

9.16. Show that the quadratic form q(x) = 2x 2 + 4xy + y 2 has a saddle point at the origin,
despite the fact that its coefficients are positive . Show that q can be written as the difference
of two perfect squares.
9.17. Find the eigenvalues of the following matrices and the maximum value of the associated
quadratic forms on the unit sphere .

(1) [-~ ., ~ ], (2) [-~ -~ ~ ] , (3) [-~ -~ ~] .


o 1 -1 0 1 -2 0 0 5
9.18. A bilinear form b : V × W → ℝ on vector spaces V and W is said to be nondegenerate if it satisfies

    b(v, w) = 0 for all w ∈ W implies v = 0,    and
    b(v, w) = 0 for all v ∈ V implies w = 0.

As an example, an inner product on a vector space V is just a symmetric, nondegenerate bilinear form on V. Let b : V × W → ℝ be a nondegenerate bilinear form. For a fixed w ∈ W, we define φ_w : V → ℝ by

    φ_w(v) = b(v, w)    for v ∈ V.

Then, the bilinearity of b proves that φ_w ∈ V*, from which we obtain a linear transformation

    φ : W → V*    defined by φ(w) = φ_w.

Similarly, we can have a linear transformation ψ : V → W* defined by

    ψ(v)(w) = b(v, w)    for v ∈ V and w ∈ W.

Prove the following statements:
(1) If b : V × W → ℝ is a nondegenerate bilinear form, then the linear transformations φ : W → V* and ψ : V → W* are isomorphisms.
(2) If there exists a nondegenerate bilinear form b : V × W → ℝ, then dim V = dim W.
9.19. Determine whether the following statements are true or false, in general, and justify your
answers .
(1) For any quadratic form q on JRn , there exists a basis ex for JRn with respect to which
the matrix representation of q is diagonal.
(2) Any two matrix representations of a quadratic form have the same inertia.
(3) If A is positive definite symmetric matrix, then every square submatrix of A has
positive determinant.
(4) If A is negative definite, det A < O.
(5) The sum of a positive definite quadratic form and a negative definite quadratic form
is indefinite.
(6) If A is a real symmetric positive definite matrix, then the solution set of x^T A x = 1 is an ellipsoid.
(7) For any nontrivial bilinear form b ≠ 0 on a vector space V, if b(v, v) = 0, then v = 0.
(8) Any symmetric matrix is congruent to a diagonal matrix.
(9) Any two congruent matrices have the same eigenvalues.
(10) Any two congruent matrices have the same determinant.
(11) The sum of two bilinear forms on V is also a bilinear form.
(12) Any matrix representation of a bilinear form is diagonalizable.
(13) If a real symmetric matrix A is both positive semidefinite and negative semidefinite,
then A must be the zero matrix.
(14) Any two similar real symmetric matrices have the same signature .
Selected Answers and Hints

Chapter1
Problems
1.2 (1) Inconsistent.
(2) (x₁, x₂, x₃, x₄) = (−1 − 4t, 6 − 2t, 2 − 3t, t) for any t ∈ ℝ.
1.3 (1) (x , y, z) = (1, -1 , 1). (3) (w, x , y, z) = (2, 0, 1, 3)
1.4 (1) bl + bZ - b3 = O. (2) For any bj's.
1.7 a = -, b = , c = .!j, d = -4.
1.9 Consider the matrices : A = [; : l B = [; ~ J. C = [~ ~ J-
1.10 Compare the diagonal entries of AA T and AT A.
1.12 (1) Infinitely many for a = 4, exactly one for a ≠ ±4, and none for a = −4.
(2) Infinitely many for a = 2, none for a = -3, and exactly one otherwise.
1.14 (3) I = IT = (AA-I)T = (A-I)T AT means by definition (AT)-I = (A-I)T.
1.16 Use Problem 1.14(3).
1 .17 Any permutation on n objects can be obtained by taking a finite number of interchangings
of two objects .
1.21 Consider the case that some dj is zero.
1.22 X = 2, y = 3, Z = 1.
1.23 True if Ax = b is consistent, but not true in general.
1.24 L =[ -~
o -I I
~ ~], U=
0
[b -~ -~ ].
0 I
1.25 (1) Consider (i, j)-entries of AB for i < j .
(2) A can be written as a product of lower triangular elementary matrices .

1.26 L = [1 0 0; −1/2 1 0; 0 −2/3 1], D = [2 0 0; 0 3/2 0; 0 0 4/3], U = [1 −1/2 0; 0 1 −2/3; 0 0 1].
1.27 There are four possibilities for P.
1.29 (1) I₁ = 0.5, I₂ = 6, I₃ = 0.55. (2) I₁ = 0, I₂ = I₃ = 1, I₄ = I₅ = 5.
1.30 x = k(0.35, 0.40, 0.25)ᵀ for k > 0.

1.31 A = [0.0 0.1 0.8; 0.4 0.7 0.1; 0.5 0.0 0.1] with d = (90, 10, 30)ᵀ.

Exercises

1.1 Row-echelon forms are A, B, D, F . Reduced row-echelon forms are A, B, F.


1.2 (1) [1 −3 2 1 2; 0 0 1 −1/4 3/4; 0 0 0 0 0].
1.3 (1) [1 −3 0 3/2 1/2; 0 0 1 −1/4 3/4; 0 0 0 0 0; 0 0 0 0 0].
1.4 (1) x₁ = 0, x₂ = 1, x₃ = −1, x₄ = 2. (2) x = 17/2, y = 3, z = −4.
1.5 (1) and (2).
1.6 For any bᵢ's.
1.7 b₁ − 2b₂ + 5b₃ ≠ 0.
1.8 (1) Take x to be the transpose of each row vector of A.
1.10 Try it with several kinds of diagonal matrices for B.
1.11 Aᵏ = [1 2k 3k(k−1); 0 1 3k; 0 0 1].
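As a check, setting k = 1 in the displayed formula recovers A = [1 2 0; 0 1 3; 0 0 1]. Writing A = I + N with N = A − I, one has N³ = 0, so

    Aᵏ = (I + N)ᵏ = I + kN + ½k(k−1)N²,

and since the only nonzero entry of N² is (N²)₁₃ = 2·3 = 6, the (1, 3) entry is ½k(k−1)·6 = 3k(k−1), matching the formula.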

1.13 (2)
5
0
-2227
101]
-60 .
[ o 0 87
1.14 See Problem 1.9.
1.16 (1) A⁻¹AB = B. (2) A⁻¹AC = C = A + I.
1.17 a = 0, c⁻¹ = b ≠ 0.

1.18 A⁻¹ = [1 −1 0 0; 0 1/2 −1/2 0; 0 0 1/3 −1/3; 0 0 0 1/4]; B⁻¹ = [13/8 ...; −15/8 ...; ...; 5/4 ...].

= -Is
8 -23
-19 2]
1.19 A-I
[4
1 4 .
-2 1

1.22 (I)x=A-Ib= [-~~; -~~~ ~~~] [ ; ] = [-~~;].


-1/3 -2/3 1/3 7 - 5/ 3

1.23 (1) A = [~ ~] [~ ~] [~ Ii
2]
= LDU , (2) L = A, D = U = I.

1.24 (1) A = [~3 ~1 ~]


1
[~~ ~ ] [~0 ~0 ~]
0 0 -1 1
,
(2) [1 0; b/a 1] [a 0; 0 d − b²/a] [1 b/a; 0 1].
1.25 c = [2 −1 3]ᵀ, x = [4 2 3]ᵀ.

1.26 (2) A = [~1~1~]1[~0 0


~ ~]2 [~
001
~ 4/~] .
1.27 (1) (Aᵏ)⁻¹ = (A⁻¹)ᵏ. (2) A^(n−1) = 0 if A ∈ Mₙₓₙ.
(3) (I − A)(I + A + ⋯ + A^(k−1)) = I − Aᵏ.

1.28 (1) A = [~ ~ 1 (2) A = r l


AZ = r l
A= I.

l
1.29 Exactly seven of them are true.
(1) See Theorem 1.9. (2) Consider A = [~ ~

::: r)
;:n::e~:=[1-1~ T]1f ~:B-: [? tB~=]~ABl = BT AT =BA .
5 3 1 1 3 2
(7) (A-Il = (AT)-I = A-I. (8) If A-I exists, A-I = A-I (AB T) = B T.

U~ l
(9) If AB has the (right) inverse C, then A-I = BC.
(10) Con sider EI = [~ ~] and EZ =
(12) Consider a permutation matrix [~ ~] .

Chapter 2
Problems
2.2 (1) Note: 2nd column − 1st column = 3rd column − 2nd column.
2.4 (1) −27, (2) 0, (3) (1 − x⁴)³.
2.7 Let σ be a transposition in Sₙ. Then the composition of σ with an even (odd) permutation
in Sₙ is an odd (even, respectively) permutation.
2.9 (1) -14. (2) O.
2.10 (1) (y − x)(z − x)(z − y)(w − x)(w − y)(w − z)(w + x + y + z).
(2) (fa − be + cd)(fa + cd − eb).

2.11 A-I = [-~ I =1]; adjA= [_~ -58 -52] -1 I .


2 -! ! 6

2.12 If A = 0, then clearly adj A = 0. Otherwise, use A · adj A = (det A)I.


2.13 Use adj A . adj(adj A) = det(adj A) I .
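A useful identity behind 2.13: taking determinants in A · adj A = (det A)I for an n × n matrix A gives

    det A · det(adj A) = (det A)ⁿ,  so  det(adj A) = (det A)^(n−1)  when det A ≠ 0.

This is consistent with the 3 × 3 cases in Exercise 2.17 below, where det A = −7 gives det(adj A) = 49 and det A = 2 gives det(adj A) = 4.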
2.14 If A and B are invertible matrices, then (AB)⁻¹ = B⁻¹A⁻¹. Since for any invertible
matrix A we have adj A = (det A)A⁻¹, (1) and (2) are obvious. To show (3), let AB = BA
and A be invertible. Then A⁻¹B = A⁻¹(BA)A⁻¹ = A⁻¹(AB)A⁻¹ = BA⁻¹, which
gives (3).
2.15 (1) x₁ = 4, x₂ = 1, x₃ = −2.
(2) x = 10/23, y = 5/6, z = 5/2.

2.16 The solution of the system id(x) = x is Xi = Ji~Si = det A.


Exercises

2.1 k = 0 or 2.
2.2 It is not necessary to compute A 2 or A 3 .
2.3 -37.
2.4 (1) det A = (−1)^(n−1)(n − 1). (2) 0.
2.5 -2,0,1,4.
2.6 Consider Σ_{σ∈Sₙ} a_{1σ(1)} ⋯ a_{nσ(n)}.
2.7 (1) 1, (2) 24.
2.8 (3) x₁ = 1, x₂ = −1, x₃ = 2, x₄ = −2.
2.9 (2) x = (3, 0, 4/11)ᵀ.
2.10 k = 0 or 1.
2.11 x = (−5, 1, 2, 3)ᵀ.
2.12 x = 3, y = -1, z = 2.
2.13 (3) All = - 2, A12 = 7, An = -8, A33 = 3.

2.16 A-I = -h. [~~ -~ 1~ ] .


6 14 -18

2.17 (1) adj A = [


-4
i =~ =~] ,
7 5
det A = -7, det(adj A) = 49,

A-I = -+adj A. (2) adj A = [-1~ ~ -~],


7 -3 -1
det A = 2, det(adj A) = 4, A⁻¹ = ½ adj A.

2.19 Multiply [~ ~].


= [; ~], then the area is 11 det AI = 4.

[i ~ 1
2.20 If we set A

2.21 If we set A = then the area is !.fl "'I(AT A)I ~ 1#.



2.22 Use det A = Σ_{σ∈Sₙ} sgn(σ) a_{1σ(1)} ⋯ a_{nσ(n)}.

l
2.23 Exactly seven of them are true.

(I) Consider A = [ ; ; ] and B = [~ ;


(2) det(AB) = det A det B = det B det A .
(3) Consider A = [ ; ; ] and c = 3. (4) (cIₙ − A)ᵀ = cIₙ − Aᵀ.

(5) Consider E = [~ ~ ] . (6) and (7) Compare their determinants.


(8) If A = I₂? (9) Find a counterexample.
(10) What happens for A = 0?
(11) uvᵀ = u[v₁ ⋯ vₙ] = [v₁u ⋯ vₙu];
det(uvᵀ) = v₁ ⋯ vₙ det([u ⋯ u]) = 0.

: : ::::, U[~(:'
o
~ 'J':
1 1
A : :l IT ~ dot A 0'
(16) Aᵀ = A⁻¹ for any permutation matrix A.

Chapter 3
Problems

3.1 Check the commutativity in addition.


3.2 (2), (4).
3.3 (1), (2), (4).
3.5 See Problem 1.11.
3.6 Note that any vector yin W is of the form aIxI + a2x2 + ... + amXm which is a vector
inU .
3.7 Use Lemma 3.7(2) .
3.9 Use Theorem 3.6
3.10 Any basis for W must be a basis for V already, by Corollary 3.12.
3.11 (1) dim = n − 1, (2) dim = n(n+1)/2, (3) dim = n(n−1)/2.
3.13 63a + 39b - 13c + 5d = O.
3.15 If bj , . .. , b n denote the column vectors of B, then AB = [Abl ... Ab n].
3.16 Consider the matrix A from Example 3.20.
3.17 (I) rank = 3, nullity = 1. (2) rank = 2, nullity = 2.
3.18 Ax = b has a solution if and only if b E C(A) .
3.19 A⁻¹(AB) = B implies rank B = rank A⁻¹(AB) ≤ rank(AB).
3.20 By (2) of Theorem 3.21 and Corollary 3.18, a matrix A of rank r must have an invertible
submatrix C of rank r. By (1) of the same theorem, the rank of C must be the largest.
3.22 dim(V + W) = 4 and dim(V n W) = 1.

3.23 A basis for V is {(1, 0, 0, 0), (0, −1, 1, 0), (0, −1, 0, 1)},
for W: {(−1, 1, 0, 0), (0, 0, 2, 1)}, and for V ∩ W: {(3, −3, 2, 1)}. Thus,
dim(V + W) = 4 means V + W = ℝ⁴ and any basis for ℝ⁴ works for V + W.
3.26
1 0

A= 1 1
[
1

Exercises

3.1 Consider 0(1, 1).


3.2 (5).
3.3 (2), (3). For (1), if f(O) = 1, 2f(O) = 2.
3.4 (1).
3.5 (1), (4).
3.6 tr(AB - BA) = 0.
3.7 No.
3.8 (1) p(x) = -PI (x) + 3P2(X) - 2P3(X).
3.9 No.
3.10 Linearly dependent.
3.12 No.
3.13 ((1,1,0) , (1,0, I)}.
3.14 2.
3.15 Consider {eⱼ = (aᵢ)ᵢ₌₁^∞ : j = 1, 2, ...}, where aᵢ = 1 if i = j, and aᵢ = 0 otherwise.

3.16 (1) 0 = c₁Ab₁ + ⋯ + c_pAb_p = A(c₁b₁ + ⋯ + c_pb_p) implies c₁b₁ + ⋯ + c_pb_p = 0
since N(A) = {0}, and this also implies cᵢ = 0 for all i = 1, ..., p since the columns of B are
linearly independent.
(2) B has a right inverse. (3) and (4): Look at (1) and (2) above.
3.17 (1) {(-5, 3, I)}. (2) 3.
3.18 5f, and dependent.
3.19 (1) R(A) = {(1, 2, 0, 3), (0, 0, 1, 2)}, C(A) = {(5, 0, 1), (0, 5, 2)},
N(A) = {(−2, 1, 0, 0), (−3, 0, −2, 1)}.
(2) R(B) = {(1, 1, −2, 2), (0, 2, 1, −5), (0, 0, 0, 1)}, C(B) = {(1, −2, 0), (0, 1, 1),
(0, 0, 1)}, N(B) = {(5, −1, 2, 0)}.
3.20 rank = 2 when x = −3, rank = 3 when x ≠ −3.
3.22 See Exercise 2.23: Each column vector of uvᵀ is of the form vᵢu, that is, u spans the
column space. Conversely, if A is of rank 1, then the column space is spanned by any one
column of A, say the first column u of A, and the remaining columns are of the form vᵢu,
i = 2, ..., n. Take v = [1 v₂ ⋯ vₙ]ᵀ. Then one can easily see that A = uvᵀ.
3.23 Four of them are true.
(1) A - A =1 or 2A = 1
(2) In]R2 , let a = [ej , e2} and f3 = [ej , -e2}.

(3) Even U = W, (X n fJ can be an empty set.


(4) How can you find a basis for C(A)? See Example 3.20.
(5) Consider A = [0
1 0
0 ] and B = [ 1
0 0
0] .
(6) See Theorem 3.24. (7) See Theorem 3.25. (8) If x = −s.
(9) In ℝ², consider U = ℝ² × 0 and V = 0 × ℝ².
(10) Note dim C(Aᵀ) = dim R(A) = dim C(A).
(11) By the Fundamental Theorem and the Rank Theorem.

Chapter 4
Problems

4.1 [~ ~ l since it is simply the change of coordinates x and y .

4.2 To show W is a subspace, see Theorem 4.2. Let Eᵢⱼ be the matrix with 1 at the (i, j)-th
position and 0 at others. Let Fₖ be the matrix with 1 at the (k, k)-th position, −1 at the
(n, n)-th position and 0 at others. Then the set {Eᵢⱼ, Fₖ : 1 ≤ i ≠ j ≤ n, k = 1, ..., n − 1}
is a basis for W. Thus dim W = n² − 1.
4.3 tr(AB) = Σᵢ₌₁ⁿ Σₖ₌₁ⁿ aᵢₖbₖᵢ = Σₖ₌₁ⁿ Σᵢ₌₁ⁿ bₖᵢaᵢₖ = tr(BA).
4.4 If yes, (2, I) = T(-6 , -2, 0) = -2T(3 , I, 0) = (-2, -2).
4.5 If a₁v₁ + a₂v₂ + ⋯ + aₖvₖ = 0, then
0 = T(a₁v₁ + a₂v₂ + ⋯ + aₖvₖ) = a₁w₁ + a₂w₂ + ⋯ + aₖwₖ implies aᵢ = 0 for
i = 1, ..., k.
4.6 (1) If T(x) = T(y), then S∘T(x) = S∘T(y) implies x = y. (4) They are invertible.
4.7 (1) T(x) = T(y) if and only if T(x − y) = 0, i.e., x − y ∈ Ker(T).
(2) Let {v₁, ..., vₙ} be a basis for V. If T is one-to-one, then the set {T(v₁), ..., T(vₙ)} is
linearly independent as the proof of Theorem 4.7 shows. Corollary 3.12 shows it is a basis
for V. Thus, for any y ∈ V, we can write it as y = Σᵢ₌₁ⁿ aᵢT(vᵢ) = T(Σᵢ₌₁ⁿ aᵢvᵢ). Set
x = Σᵢ₌₁ⁿ aᵢvᵢ ∈ V. Then clearly T(x) = y so that T is onto. If T is onto, then for each
i = 1, ..., n there exists xᵢ ∈ V such that T(xᵢ) = vᵢ. Then the set {x₁, ..., xₙ} is linearly
independent in V, since, if Σᵢ₌₁ⁿ aᵢxᵢ = 0, then 0 = T(Σᵢ₌₁ⁿ aᵢxᵢ) = Σᵢ₌₁ⁿ aᵢT(xᵢ) =
Σᵢ₌₁ⁿ aᵢvᵢ implies aᵢ = 0 for all i = 1, ..., n. Thus it is a basis by Corollary 3.12 again.
If T(x) = 0 for x = Σᵢ₌₁ⁿ aᵢxᵢ ∈ V, then 0 = T(x) = Σᵢ₌₁ⁿ aᵢT(xᵢ) = Σᵢ₌₁ⁿ aᵢvᵢ
implies aᵢ = 0 for all i = 1, ..., n, that is x = 0. Thus Ker(T) = {0}.

4.8 Use rotation R!f andrefiection [~ _~] about the x-axis.


4.9 (1) (5, 2, 3). (2) (2, 3, 0).

4.10 (I)[Tla=[;
4
=i ~]'[Tlll=[~ -~
7 0 4 -3
245 ].

4.11 [Tl~ = [~o ~2 -~3 4~] .


U~ n[TaS]a~ U~ n

413 [S+Tla~
4.14 [Sl~ = [~o - 0~ ~]
1
, [Tl a = [~ ~ ~] .
0 0 4

4.15 (2) [T]~ = [~ ~] [T -1l p = [ - ~ ~ l


4.16 [idlp=~[~ -; -~]'[idl~=[-; -~ -1~] .
2 2 1 1 1 1 -2

4.17 [T]a=[~1 -~0 4~]'[T1P=[-~1 -~1 -~]


5
.
4.18 Write B = Q⁻¹AQ with some invertible matrix Q.
(1) det B = det(Q⁻¹AQ) = det Q⁻¹ det A det Q = det A. (2) tr(B) = tr(Q⁻¹AQ) =
tr(QQ⁻¹A) = tr(A) (see Problem 4.3). (3) Use Problem 3.19.
4.20 a* = {fI (x, y, z) = x- !y, !2(x, y, z) = !y, f3(X, y, z) = -x + z},
4.24 By BA, we get a tilting along the x-axis; while by BT A, one can get a tilting along the
y-axis .

Exercises

4.1 (2).
4.2 ax 3 + bx 2 + ax + c.
4.4 S is linear because the integration satisfies the linearity.
4.5 (1) Consider the decomposition ofv = v+I(V) + v-I(v).
4.6 (1) {(x, ~x, 2x) E lR3 : x E R}.
4.7 (2) T-1(r, s, t) = (! r, 2r - s, 7r - 3s - r) ,
4.8 (1) Since T∘S is one-to-one from V into V, T∘S is also onto and so T is onto. Moreover,
if S(u) = S(v), then T∘S(u) = T∘S(v) implies u = v. Thus, S is one-to-one, and
so onto. This implies T is one-to-one. In fact, if T(u) = T(v), then there exist x and y
such that S(x) = u and S(y) = v. Thus T∘S(x) = T∘S(y) implies x = y and so
u = S(x) = S(y) = v.
4.9 Note that T cannot be one-to-one and S cannot be onto.
4.12 vol(T(C)) = |det(A)| vol(C), for the matrix representation A of T.
4.13 (3) (5, 2, 0).
5 4
4.14
[
-4 -3
0
o 0
0 =~o -:~]. 1
-1/3
4.15 (1) [ -5/3 2/3 ]
1/3 .

4.16 (1) [~ _ i l [i (2) -~ 1


4.17 (1) T(1, 0, 0) = (4, 0), T(1, 1, 0) = (1, 3), T(1, 1, 1) = (4, 3).
(2) T(x, y, z) = (4x − 2y + z, y + 2z).

4.18 (1)[~001
~ ;] '(4)[~0 0~ 0~] .
~1' h (I) U~: ~n(2) Q ~ [: i iJ ~ tr:' ,

4.20 Compute the trace of AB - BA .

4.21 (1) [-7 -13]


4
-33
19 8'

-2/3 1/3 4/3]


4.22 (2) [~ ;], (4) 2/3 -1/3 -1/3 .
[ 7/3 -2/3 -8/3
4.23 ' All represents reflection in a line at angle ()/2 to the x -axis . And , any two such reflections
are similar (by a rotation).
4.25 Compute their determinants.

4.27 [T]a = [-~ ~ ~] = ([T*Ja*)T .


1 0 1

4.28 (1) [_~ ~ ~ l(2)[T]~ = [-i ; -~ 1


4J9 :~):Il::: I , 0, I), 0, I, I , I) , 14,2,2,3, [T)~~ ~1 ~ ~n
[

4.31 PI (x) =1 + x - ~x2, P2(x) = -i + ~x2, P3(X) = -j. + x - ~x2 .


4.32 Five of them are false.
(1) Consider T : ℝ² → ℝ³ defined by T(x, y) = (x, 0, 0).
(2) Note dim Ker(T) + dim Im(T) = dim ℝⁿ = n.
(3) dim Ker(T) = dim N([T]_α).
(4) dim Im(T) = dim C([T]_α) = dim R([T]_α).
(5) Ker(T) ⊆ Ker(S∘T).
(6) P(x) = 2 is not linear.
(7) Use the definition of a linear transformation.
(8) and (10) See Remark (1) in Section 4.3.
(9) T : ℝⁿ → ℝⁿ is one-to-one iff T is an isomorphism. See Remark in Section 4.4 and
Theorem 4.14.
(11) By Definition 4.5 and Theorem 4.14.
(12) Cf. Theorem 4.17. (13) det(A + B) ≠ det(A) + det(B) in general.
(14) T(0) ≠ 0 in general for a translation T.

Chapter 5
Problems
5.1 Let ⟨x, y⟩ = xᵀAy be an inner product. Then, for x = (1, 0), ⟨x, x⟩ = ax₁y₁ +
c(x₁y₂ + x₂y₁) + bx₂y₂ > 0 implies a > 0. Similarly, for any x = (x, 1), ⟨x, x⟩ > 0
implies ab − c² > 0.
5.2 ⟨x, y⟩² = ⟨x, x⟩⟨y, y⟩ if and only if ‖tx + y‖² = ⟨x, x⟩t² + 2⟨x, y⟩t + ⟨y, y⟩ = 0 has
a repeated real root t₀.
5.3 If d₁ < 0, then for x = (1, 0, 0), ⟨x, x⟩ = d₁ < 0: impossible.
5.4 (4) Compute the square of both sides and then use Cauchy-Schwarz inequality.
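In detail, squaring and applying the Cauchy–Schwarz inequality gives

    ‖x + y‖² = ‖x‖² + 2⟨x, y⟩ + ‖y‖² ≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)²,

which is the triangle inequality used in 5.5(4).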
5.5 (4) Use Problem 5.4(4): triangle inequality in the length.
5.6 ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx defines an inner product on C[0, 1]. Use the Cauchy–Schwarz
inequality.
5.7 (2)-(3): Use fJ
f(x)g(x) dx = 0 if k f. i; and = ~ if k = e.
5.8 (1): Orthogonal, (2) and (3): None, (4): Orthonormal.
5.10 Clearly, x₁ = (1, 0, 1), x₂ = (0, 1, 2) are in W. The Gram–Schmidt orthogonalization
gives u₁ = (1/√2)(1, 0, 1), u₂ = (1/√3)(−1, 1, 1), which form a basis.

5.11 {1, √3(2x − 1), √5(6x² − 6x + 1)}.

5.12 (4) Im(id_V − T) ⊆ Ker(T) because T(id_V − T)(x) = T(x) − T²(x) = 0. Im(id_V − T) ⊇ Ker(T) because if T(x) = 0, x = x − 0 = x − T(x) = (id_V − T)x.
5.13 (1) ⟨x, x⟩ = 0 only for x = 0. (2) Use Definition 5.6(2).
5.16 (1) is just the definition, and use (1) to prove (2).
5.17 -~+x.
5.18 (1) b E C(A) and y E N(A T) .

5.19 The null space of the matrix [b -i _~ i] is


x = t[1 −1 1 0]ᵀ + s[−4 1 0 1]ᵀ for t, s ∈ ℝ.
5.20 Note: R(A)⊥ = N(A).
5.22 There are 4 rotations and 4 reflections.
5.23 (1) r = ~, s= ~, a = JJ ' b = - JJ ' C = - JJ.
5.24 Extend {v₁, ..., vₘ} to an orthonormal basis {v₁, ..., vₘ, ..., vₙ}. Then ‖x‖² =
Σᵢ₌₁ᵐ |⟨x, vᵢ⟩|² + Σⱼ₌ₘ₊₁ⁿ |⟨x, vⱼ⟩|².
5.25 (1) orthogonal. (2) not orthogonal.
5.26 x = (1, -1 , 0) + t(2, 1, -1) for any number t .

5.28 [ ~~ ] =x= (AT A)-IATb= [~~3:] .


'!g 16.1
5.30 For x ∈ ℝᵐ, x = ⟨v₁, x⟩v₁ + ⋯ + ⟨vₘ, x⟩vₘ = (v₁v₁ᵀ)x + ⋯ + (vₘvₘᵀ)x.
5.31 The line is a subspace with an orthonormal basis (1/√2)(1, 1), or is the column space of
A = (1/√2)[1 1]ᵀ.

5.32 P =~ [ 7 ; -~ ] .
3 1 - I 2
5.33 Note that (e) , e2, ea} is an orthonormal basis for the subspace.
5.35 Hint: First, show that P is symmetric.
Exercises
5.1 Inner products are (2) , (4), (5).
5.2 For the last condition of the definition, note that ⟨A, A⟩ = tr(AᵀA) = Σᵢ,ⱼ aᵢⱼ² = 0
if and only if aᵢⱼ = 0 for all i, j.
5.4 (I)k = 3.
5.5 (3) 11/11 = IIgll =../f7'1., The angle is 0 ifn = m, ~ ifn i: m .
5.6 Use the Cauchy-Schwarz inequality and Problem 5.2 with x = (a), . . . , an) and
y = (1, .. . , 1) in (R", .).
5.7 (1) 37/4, JT97J.
(2) If (h , g) = h(% + ~ + c)= 0 with h i: 0 a constant and g(x) = ax 2 + bx + c,
then (a , b, c) is on the plane %+ ~ + c = 0 in jR3.
5.9 Hint: For A = [V) V2], two columns are linearly independent, and its column space
is W.
3 1
5.11 (I ) ZV2, (2) ZV2.
5.13 Orthogonal: (4). Nonorthogonal: (I ), (2), (3).
5.17 Use induction on n. Let B be the matrix A with the first column c₁ replaced by
c = c₁ − Proj_W(c₁), and write Proj_W(c₁) = a₂c₂ + ⋯ + aₙcₙ for some aᵢ's. Show
that √det(AᵀA) = √det(BᵀB) = ‖c‖ vol(c₂, ..., cₙ) = vol(P(A)).

5.18 Let A = [~ ! ~] . Then the volume of the tetrahedron is tJ det(A T A ) = 1.


012
5.19 A T A =I = det A imply det A = 1.
and det AT

sin e
. A = [cos
Th e matrix e ] ISort
sin e
e _ cos . h ' h det A = - I .
i WIt
ogona

5.21 Ax = b has a solution for every b E jRm if r = m. It has infinitely many solutions if
nullity =n - r =n - m > O.

5.22 Fiod a least squares <0""00 0' [ 1

i
. 3
my = a + bx. Then y = x + Z.

5.23 Follow",,,,,'se 5.22 with A ~ 1-~ !


[
-I ]
.Then y = 2x 3 - 4x 2 + 3x - 5.

27

5.27 (1) Let h(x) = ½(f(x) + f(−x)) and g(x) = ½(f(x) − f(−x)). Then f = h + g.
(2) For f ∈ U and g ∈ V, ⟨f, g⟩ = ∫₋₁¹ f(x)g(x) dx = ∫₋₁¹ f(−t)g(−t) dt
= −∫₋₁¹ f(t)g(t) dt = −⟨f, g⟩, by the change of variable x = −t.
(3) Expand the length in the inner product.
5.28 Five of them are true.
(1) Possible via a natural isomorphism. See Theorem 5.8.
(2) Consider (I , 0) and (-1 , 0).
(3) Consider two subspaces U and W of R 3 spanned by el and e2, respectively.
(4) IIx - YII + IIYII ::: [x] by the triangle inequality.
(5) The columns of any permutation matrix P are {el, ... , en} in some order.
(6) See Theorem 5.5.
(7) =
A is a projection iff A 2 A. (See Theorem 5.9.)
(8) By Corollaries 5.10 and 5.12.
(9) R(A) and C(A) are not subspaces of the same vector space .
(10) A dilation is an isomorphism.
(11) ATb E C(A T A) always.
(12) The solution set is Xo + N(A) by Corollary 5.17.

Chapter 6

l
Problems

6.3 Consider the matrices [~ ~] and [~ ~


6.4 Check with A = [~ ~ l
6.5 (1) Use det A = λ₁ ⋯ λₙ. (2) Ax = λx if and only if x = λA⁻¹x.
6.6 If A is invertible, then AB = A(BA)A⁻¹. For two singular matrices A = [~ ~]
and B = [~ ~], AB and BA are not similar, but they have the same eigenvalues.
6.7 (1) If Q = [XI X2 X3] diagonalizes A, then the diagonal matrix must be AI and
=
A Q AQI . Expand this equation and compare the corresponding columns of the
equation to find a contradiction on the invertibility of Q.

6.8 Q = [~ ~ l D = [~ ~ J Then A = QDQ-I = [=~ :l


6.9 (1) The eigenvalues of A are 1, 1, −3, and their associated eigenvectors are (1, 1, 0),
(−1, 0, 1) and (1, 3, 1), respectively.
(2) If f(x) = x¹⁰ + x⁷ + 5x, then f(1), f(1) and f(−3) are the eigenvalues of
A¹⁰ + A⁷ + 5A.

6.11 Note that [ a~~1


an-I
] [i ~ -~] [a:~1
=
0 1 0 a n -2
] .TheeigenvaluesareI ,2,-Iand

eigenvectors are (1, I, 1), (4, 2,1) and (1, -1 , 1), respectively. It turns out that an =
2 2n
2 - (_I)n_ - -.
3 3
6.13 Its characteristic polynomial is f(λ) = (λ + 1)(λ − 2)²; so (−1)ⁿ, 2ⁿ, n2ⁿ form a
fundamental set.
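Indeed, the characteristic polynomial expands to λ³ − 3λ² + 4, so (assuming the standard convention that the associated recurrence is aₙ₊₃ = 3aₙ₊₂ − 4aₙ) each of the three sequences can be checked directly; for example, for aₙ = n2ⁿ,

    (n+3)2^(n+3) − 3(n+2)2^(n+2) + 4n·2ⁿ = 2ⁿ(8n + 24 − 12n − 24 + 4n) = 0.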

6.14 The eigenvalues are 0.6, 0.8, and 1.


6.15 The eigenvalues are 0, 0.4, and 1, and their eigenvectors are
(1,4, -5) , (1,0, -1) and (3, 2, 5), respectively.
6.17 For (1), use (A + B)ᵏ = Σⱼ₌₀ᵏ C(k, j) Aʲ B^(k−j) if AB = BA. For (2) and (3), use the
definition of e^A. Use (1) for (4).
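If (1) is the identity e^(A+B) = e^A e^B for commuting A and B (as the hint's use of the binomial expansion suggests), the argument runs

    e^(A+B) = Σₖ (A+B)ᵏ/k! = Σₖ Σⱼ AʲB^(k−j)/(j!(k−j)!) = (Σⱼ Aʲ/j!)(Σₘ Bᵐ/m!) = e^A e^B,

where the rearrangement of the double series is justified by absolute convergence.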
6.19 Note that e^(Aᵀ) = (e^A)ᵀ by definition (thus, if A is symmetric, so is e^A), and use (4).
2

6.20 A = 2l + N with N = [~o ~ ~].


0 0
Then N = O. e = e [
3 3 ~
3
A
1 3
0 o 1
2
~ ]
.
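In general, for A = 2I + N with N strictly upper triangular (so N³ = 0 in the 3 × 3 case), 2I commutes with N and hence

    e^A = e^(2I) e^N = e²(I + N + ½N²),

which is how the displayed matrix for e^A is obtained.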

6.21 Yl = CJ e2x - !C2e-3x; Y2 = Cl e2x + C2e-3x .


I
I
2x 3x
Yl = - C2 e2x + C3 e3x Yl = e - 2e
6.22 Y2 = CJ eX + 2c2e2x - C3e3x, Y2 = eX - 2e + 2e3x
2x
Y3 = 2c2e2x - C3e3x Y3 = - 2e2x + 2e3x .

623 (1) [ :=: ], (2) [ i~;'3l


6.24 With respect to the standard basis a , [T]a = [~ ~ ;] with eigenvalues 3, 3, 5
104
and eigenvectors (0, 1,0), (-1,0,1) and 0 ,2,1), respectively.
6.25 With the standard basis for M2x2(lR):

1 1 0 1]
[T]a = A = b~ ~ ~ . The eigenvalues are 3, I, I, -I, and their asso-
[
1 0 1 1
ciated eigenvectors are 0,1 ,1, I), (-1,0,1,0), (0, -1 , 0, I) , and (-1, I, -1, I) ,
respectively.

6.26 With tho basis e ~ [I, x, x'), IT]. ~ A ~ [ ~ o2 0]


0 .
o 3
Exercises

6.1 (4) 0 of multiplicity 3, 4 of multiplicity 1. Eigenvectors are eᵢ − eᵢ₊₁ for 1 ≤ i ≤ 3
and Σᵢ₌₁⁴ eᵢ.
6.2 f(λ) = (λ + 2)(λ² − 8λ + 15), λ₁ = −2, λ₂ = 3, λ₃ = 5,
x₁ = (−35, 12, 19), x₂ = (0, 3, 1), x₃ = (0, 1, 1).
6.4 {v} is a basis for N(A) , and {u, w} is a basis for C(A).

6.5 Assume that it is true for invertible matrices. In each of the equations (1)-(3) both sides
continuously depend on the entries of A and B. Any matrix A can be approximated
by matrices of the form A_ε = A + εI which are invertible for sufficiently small
nonzero ε. (Actually, if λ₁, ..., λₙ is the whole set of eigenvalues of A, then A_ε is
invertible for all ε ≠ −λᵢ.) Besides, if AB = BA, then A_εB = BA_ε.
6.6 Note that the order in the product doesn't matter, and any eigenvector of A is killed
by B. Since the eigenvalues are all different, the eigenvectors belonging to 1,2,3
form a basis. Thus B = 0, that is, B has only the zero eigenvalue, so all vectors are
eigenvectors of B.

6.8 A = QDQ-I = ~ [~ -; =~].


2 1 2 7
6.9 Note that ℝⁿ = W ⊕ W⊥ and P(w) = w for w ∈ W and P(v) = 0 for v ∈ W⊥.
Thus, the eigenspace belonging to λ = 1 is W, and that to λ = 0 is W⊥.
6.10 Foranyw E jRn,Aw = u(v T w) = (v-w)u. Thus Au = (v-uju.so u is an eigenvector
belonging to the eigenvalue A = v . u . The other eigenvectors are those in vi. with
eigenvalue zero. Thus, A has either two eigenspaces E(A) that are l-dimensional
spanned by u and E(O) = vi. if v . u :/= 0, or just one eigenspace E(O) = jRn if
vu=O.
6.11 λv = Av = A²v = λ²v implies λ(λ − 1) = 0.
6.13 Use tr(A) = λ₁ + ⋯ + λₙ = a₁₁ + ⋯ + aₙₙ.
6.14 A = QD₁Q⁻¹ and B = QD₂Q⁻¹ imply AB = BA since D₁D₂ = D₂D₁.
Conversely, suppose AB = BA and all eigenvalues λ₁, ..., λₙ of A are distinct.
Then the eigenspaces E(λᵢ) are all 1-dimensional for i = 1, ..., n. But if Ax = λᵢx,
then A(Bx) = BAx = λᵢBx implies Bx ∈ E(λᵢ). Thus Bx = μx means x is also an
eigenvector of B. By the same reason, any eigenvector of B is also an eigenvector of
A. Choose a set of linearly independent eigenvectors of A, which form an invertible
matrix Q such that Q⁻¹AQ = D₁ and Q⁻¹BQ = D₂.

6.16 With respect to the basis α = {1, x, x²}, [T]_α = [1 0 1; 0 1 1; 1 1 0]. The eigenvalues are
2, 1, −1 and the eigenvectors are (1, 1, 1), (−1, 1, 0) and (1, 1, −2), respectively.
6.17 None is diagonalizable.

6.18 (1) D = [~o ~70 -7~] (2) D = [~I0 0~ ~]


5
(3) D = [~0 0~ 2~]
6.19 Eigenvalues are 1, 1, 2 and eigenvectors are (1, 0, 0), (0, 1, 2) and (1, 2, 3). A¹⁰x =
(1025, 2050, 3076).
6.20 Fibonacci sequence: aₙ₊₁ = aₙ + aₙ₋₁ with a₁ = 2 and a₂ = 3.
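One standard way to apply diagonalization here (an illustrative sketch, not necessarily the book's exact setup) is to rewrite the recurrence in matrix form:

    [aₙ₊₁; aₙ] = [1 1; 1 0][aₙ; aₙ₋₁],  so  [aₙ₊₁; aₙ] = [1 1; 1 0]^(n−1)[a₂; a₁],

and the eigenvalues (1 ± √5)/2 of the coefficient matrix give a closed form for aₙ.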
6.22 The characteristic equation is λ² − xλ − 0.18 = 0. Since λ = 1 is a solution, x = 0.82.
The eigenvalues are now 1, −0.18 and the eigenvectors are (−0.3, −1) and (1, −0.6).

6.23 (l) e A = [~ e- ~

6.24 The initial status in 1985 is Xo = (xo, YO, zo) = (0.4,0.2,0.4), where x, Y, z rep-
resent the perc[en;:ge] of lar[geo.~e~~m, ~d]sm[~.;a]r owners. In 1995, the sta-

tus is Xl = Yl =
0.3 0.7 0.1 0.2 = Axo. Thus , in 2025,
Zl 0 0.2 0.9 0.4
the status is X4 = A4xo. The eigenvalues are 0.5, 0.8, and 1, whose eigenvectors
are (-0.41,0.82, -0.41), (0.47,0.47, -0.94), and (-0.17, -0.52, -1.04), respec-
tively.

I
YI(X) = _2e 2(l - x ) +4e2(x - l ) 2x.
6.27 (1) Y2(X) = _e 2(I - x) + 2e2(x - l ) (2) { Yl (x) =e (co~x.- smx)
( )
Y3X = 2e
2(J-x)
-e
2 2(x-l)
.
Y2(X) = 2e smx .

6.28 Yl = 0, Y2 = 2e2t, Y3 = e2t.


6.29 (1) f(λ) = λ³ − 10λ² + 28λ − 24, eigenvalues are 6, 2, 2, and eigenvectors are
(1, 2, 1), (−1, 1, 0) and (−1, 0, 1).
(2) f(λ) = (λ − 1)(λ² − 6λ + 9), eigenvalues are 1, 3, 3, and eigenvectors are
(2, −1, 1), (1, 1, 0) and (1, 0, 1).
6.30 Two of them are true:
(1) For A = [~ ~ 1 if B = Q-l AQ then B must be the identity.
Or. [~ ~] and [~ ~] have a different eigenvalue .

1
(2) See Example 6.3.

(3) Consider A = [~ ~] and B = [~ ~


(4) Consider [~ ~]. (5) Consider [~ ~ 1 (6) Consider [~ ~ l
(7) For any eigenvalue λ of A, λ + 1 is an eigenvalue of A + I.

(8) Consider A = [b ~ ] and B = [~ ~


(9) tr(A + B) = tr(A) + tr(B). See Theorem 6.3.
(10) If both belong to the same eigenvalue.
(11) In Example 6.6, Q⁻¹AQ is diagonal and its two linearly independent eigenvectors are e₁ and e₂.
(12) Use Theorem 6.25 with A = Iₙ.
Chapter 7
Problems
7.1 (1) u- v = liT v = Lj uiv; = Lj V;Uj = v u.
= Lj kuiu, = kI:; UjVj = k(u , v).
(3) (ku) v
(4)u u = Lj lu;l2 ~ 0, and u u = 0 if and only ifuj = 0 for all i ,
7.2 (1) If x = 0, clear. Suppose X =1= 0 =1= y. For any scalar k,
o :::: (x - ky, x - ky) = (x , x) - k(x, y) - k(y, x) + kk(y, y). Let k = ~ to obtain
I (x, x) (y, y) - 1(x, y) 12 ~ O. Note that equality holds if and only if x = ky for some
scalar k.
(2) Expand IIx + yll2 = (x + y, x + y) and use (1).

7.3 Suppose that x and y are linearly independent, and consider the linear dependence
a(x + y) + b(x − y) = 0 of x + y and x − y. Then 0 = (a + b)x + (a − b)y. Since x
and y are linearly independent, we have a + b = 0 and a − b = 0, which are possible
only for a = 0 = b. Thus x + y and x − y are linearly independent. Conversely, if
x + y and x − y are linearly independent, then the linear dependence ax + by = 0
of x and y gives ½(a + b)(x + y) + ½(a − b)(x − y) = 0. Thus we get a = 0 = b,
and x and y are linearly independent.
7.4 (1) Eigenvalues are 0, 0, 2 and their eigenvectors are (1,0, -i) and (0, 1,0), respec-
, I-p,and their eigenvectors are (1, -i, ),
tively. (2) Eigenvalues are 3,
(1- 3i, I, ( 1 + .. and (- 4+ 3i, I, l+p(l + .. respectively.
7.5 Refer to the real case.
7.6 (AB)H = (ABl = 7F7/ = BHA H .
7.7 (AH)(A-I)H = (A-IA)H = I.
7.8 The determinant is just the product of the eigenvalues and a Hermitian matrix has
only real eigenvalues.
7.9 See Exercise 6.10.
7.10 To prove (3) directly, show that I(x . y) = jr(x . y) by using the fact that A H x = -/-,x
when Ax = /-,X.
7.11 A H = BH += BT - iC T = -B - iC = -A .
(iC)H
7.12 AB = (AB)H = BHA H = (B)(A) = BA, + if they are Hermitian, - if they
are skew-Hermitian.
= =
7.13 Note that det U H det U, and I detl det(UH U) = = I det U1 2.
7.15 If A⁻¹ = Aᴴ and B⁻¹ = Bᴴ, then (AB)ᴴAB = I.
7.16 Hermitian means the diagonal entries are real, and diagonality implies off-diagonal
entries are zero. Unitary means the diagonal entries must be l.

(1)IfTJii./3+~ -ii./3+~] U-IATJ~-~i./3


.
718
v1 -i./3 i./3 ' v1 0 ~ + 0~i./3 ]
(2)lfU=[-~~!5i
o
~:!i
-1
~~!5i]'U-IAU=[~1
0 0
~0 2i~]
7.20 Note that A has two distinct eigenvalues.
7.21 This is a normal matrix. From a direct computation, one can find the eigenvalues,
1 − i, 1 − i and 1 + 2i, and the associated eigenvectors: (−1, 0, 1), (−1, 1, 0) and
(1, 1, 1), respectively, which are not orthogonal. But, by an orthonormalization, one
can obtain a unitary basis-change matrix so that A is unitarily diagonalizable.
7.22 AᴴA = (H₁ − H₂)(H₁ + H₂) = (H₁ + H₂)(H₁ − H₂) = AAᴴ if and only if
H₁H₂ − H₂H₁ = 0.
7.23 In one direction these are all already proven in the theorems. Suppose that UᴴAU = D
for a unitary matrix U and a diagonal matrix D.
(1) and (2). If all the eigenvalues of A are real (or purely imaginary), then the diagonal
entries of D are all real (or purely imaginary). Thus Dᴴ = D, so that A is Hermitian
(or skew-Hermitian).
(3) The diagonal entries of D satisfy |λ| = 1. Thus, Dᴴ = D⁻¹, and
Aᴴ = UD⁻¹U⁻¹ = A⁻¹.

7.24 Q = _1 [~ -./2 -1]


./2 -2 .
./6 ./3 ./21

7.25 (1)A=i[ -~ -~ ]+%[ ~ ~ 1


1 (l +..(6)(2+i) ]
(2) B = 3+ g./6 [ (1+1)(2- i) 7+~./6
1 (1-../6)(2+i) ]
3- g./6 [
+ a
(l-v~(2-i) 14-2
7J-i6'

Exercises
7.1 (1) √6, (2) 4.
7.4 (1) λ = i, x = t(1, −2 − i); λ = −i, x = t(1, −2 + i).
(2) λ = 1, x = t(i, 1); λ = −1, x = t(−i, 1).
(3) Eifenvalues are 2, 2 + i, 2 - i, and eigenvectors are (0, -I, 1),
(1 , -3'(2 + i), I), (1, - ~ (2 - i), 1).
(4) Eigenvalues are 0, -1 ,2, and eigenvectors are
(1,0, -1 , (1 , - i , 1), (1, 2i, 1).
7.6 A + cI is invertible if det(A + cI) ≠ 0. However, for any matrix A, det(A + cI) = 0,
as a complex polynomial in c, always has a (complex) solution. For the real matrix
[cos θ −sin θ; sin θ cos θ], A + rI is invertible for every real number r since A has no real
eigenvalues.

7.7 (1)./3
I[ I+i1 I-i]
-1 ' (2) ~ [~I2i ~ 1 ~ i ].
-I+i

7.10 (2) Q =./2 1[I 1] 1 -1 .


7.12 (1) Unitary; diagonal entries are {1, i}. (2) Orthogonal; {cos θ + i sin θ, cos θ − i sin θ},
where θ = cos⁻¹(0.6). (3) Hermitian; {1, 1 + √2, 1 − √2}.
7.13 (1) Since the eigenvalues of a skew-Hermitian matrix must always be purely imaginary,
1 cannot be an eigenvalue.
(2) Note that, for any skew-Hermitian matrix A, (e^A)ᴴ = e^(Aᴴ) = e^(−A) = (e^A)⁻¹.
=
7.14 det(U - AI) det(U - AI)T det(U T - AI). =
7.15 U= ~[ ~ -~ l D=U
HAU=[26"i
2~ i l
7.17 See Exercise 6.14.
7.18 The eigenvalues are I, 1,4, and the orthonormal eigenvectors are
(Ji,-Ji' 0), (- ~, - ~, 1) and (-JJ' JJ, JJ).Therefore,
A = ~ [~l
-I
-;1 =~]
-I 2
+ ~ [~I I~ I~] .

7.20 See Theorem 7.8.


7.21 If λ is an eigenvalue of A, then λⁿ is an eigenvalue of Aⁿ. Thus, if Aⁿ = 0, then
λⁿ = 0 or λ = 0. Conversely, by Schur's lemma, A is similar to an upper triangular
matrix, whose diagonals are eigenvalues that are supposed to be zero. Then it is easy
to conclude A is nilpotent.
7.22 Ten of them are true.
(1) See Theorem 7.7.
(2) Consider [cos θ −sin θ; sin θ cos θ] with θ ≠ kπ.

(3) True: See Corollary 6.6. (4) Consider [~ ~


(5) Such a matrix A is unitary.
(6) and (7) A permutation matrix is an orthogonal matrix, but need not be symmetric.
(8) True: If A is Hermitian, by Schur's lemma, A is orthogonally similar to an upper
triangular matrix T. If A is nilpotent, the eigenvalues of A, as the diagonal entries
of T, are all zero. By showing Tᴴ = T, one can conclude that such A must be
the zero matrix.
(9) Schur's lemma.
(10) For a Hermitian matrix A, −i cannot be an eigenvalue of A. Hence, det(A + iI) ≠ 0.
(11) Consider A = [~ =~
(12) Modify (10).
(13) If UᴴAU = D with real diagonal entries, then Aᴴ = A.
(15) |det U| = 1 for any unitary matrix U.
Chapter 8
Problems
8.2 Hint: Let

Then,

~ [i o1
01 00 00]
J -H
o0 1 0
000 1
o0 0 0

and that (J − λI)⁴ ≠ 0 but (J − λI)⁵ = 0.

n
8.3 Six different possibilities.

U !l [!
0 0
1
2 0
8.4 (1) [~ ~] , (2) 4 (3)
0 1
0
0 0

=~ ~ ~]
1 0 1 .
1 0 0
8.7 See Problem 6.1.
8.8 f(A) ≠ det(AI − A) in general.
8.9 For a diagonal D, all diagonal entries of f(D) are zero. For a diagonalizable A =
Q⁻¹DQ, f(A) = Q⁻¹f(D)Q.
8.10 Let λ₁, ..., λₙ be the eigenvalues of A. Then

    f(λ) = det(λI − A) = (λ − λ₁) ⋯ (λ − λₙ).

Thus, f(B) = (B − λ₁Iₘ) ⋯ (B − λₙIₘ) is nonsingular if and only if B − λᵢIₘ,
i = 1, ..., n, are all nonsingular. That is, none of the λᵢ's is an eigenvalue of B.
8.11 (1) A⁻¹ = [1 0 −1/2; 0 1/2 0; 0 0 1/2] and A¹⁰ = [1 0 1023; 0 1024 0; 0 0 1024].
(2) The characteristic polynomial of A is f(λ) = (λ − 1)(λ − 2)², and the remainder
is 104A² − 228A + 138I = [ ... ; 0 98 0; 0 0 98].
8.12 For J(2), m(x) = (x − λ)³. For J(3), m(x) = (x − λ)⁴.
8.14 (1) m(λ) = (λ − 1)(λ − 2)³.
n 1 n n
0 _2 + 2 2 ]
2 n _2 n 0 0 2n
(2) m(A) = A (A - 2) . A = _2n 0 0 2n n ~ 2.
[
_2n _2n+1 2n 2n+1
2 2 2
-2 + e
1 3 - 2e -1 + e ]
2 2
8.15 (2)eA = l-e 2 -1 -1 +e .
[ 1- e
2 1 0 -1 + e 2
1 - e2 3 - 2e 2 -2 + e2 -1 + 2e 2
8.17 The eigenvalue is −1 of multiplicity 3 and has only one linearly independent eigenvector (1, 0, 3). The solution is

    y(t) = (y₁(t), y₂(t), y₃(t)) = e⁻ᵗ(−1 − 5t + 2t², −1 + 4t, 1 − 15t + 6t²).

8.2 Find the Jordan canonical form of A as Q⁻¹AQ = J. Since A is nonsingular,
all the diagonal entries λᵢ of J, as the eigenvalues of A, are nonzero. Hence, each
Jordan block Jᵢ of J is invertible. Now one can easily show that (Q⁻¹AQ)⁻¹ =
Q⁻¹A⁻¹Q = J⁻¹, which is the Jordan form of A⁻¹, whose Jordan blocks are of the
form Jᵢ⁻¹.
8.3 (1) [ej , ea}; (3) {el, e3, ea, eS}.
8.4 (1) For A = j - I, XI = (1, -1); for A = j + I , x2 = (1, 1) . (2) For A = -I,
XI = (-2, 0, 1), x2 = (0, 1, 1), and for A = 0, XI = (-1, 1, 1). (3) For A = I,
XI =(2,0, -I), X2=(-~, -~ , ~),andforA=-I,xl =(9,1, -1).
8.6 Solve the recurrence relation in Example 2.14.
8.7 (x,y)=~(4+ j ,i).

8.9 Use [3 1; 1 3] = [1 1; −1 1] [2 0; 0 4] · ½[1 −1; 1 1].
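If the factorization is used to compute powers (a natural use of it, though the exercise's exact question is not restated here), then

    [3 1; 1 3]ⁿ = [1 1; −1 1] [2ⁿ 0; 0 4ⁿ] · ½[1 −1; 1 1] = ½[2ⁿ + 4ⁿ, 4ⁿ − 2ⁿ; 4ⁿ − 2ⁿ, 2ⁿ + 4ⁿ].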

8.10 Use A = QJQ-I = [=~6 -1i -4~] [~20 0~ -4~] [~i -~~ ot].
'.11 y(l) ~~,~ ~] _~U [ [ -t' l
I
8.12 y₁(t) = −2e^(2(1−t)) + 4e^(2(t−1)), y₂(t) = −e^(2(1−t)) + 2e^(2(t−1)), y₃(t) = 2e^(2(1−t)) − 2e^(2(t−1)).

I
8.13 y₁(t) = 2(t − 1)eᵗ, y₂(t) = −2teᵗ, y₃(t) = (2t − 1)eᵗ.
8.14 (1) (a − d)² + 4bc ≠ 0 or A = aI.
8.15 (1) t² + t − 11, (2) t² + 2t + 13, (3) (t − 1)(t² − 2t − 5).

U -: ]
8.17 (3)

A-I~
0
I
2: -2: ' An = [IOn
0 2n 2n - 1 ] .
0 1 o 0 1
(5)

r l - [ -:
0
1 o 0 o 0] n [ n1
00]
o1 0 0
- 1 -1 1 0 ' A = n(yl) n 1 0 .
1 0 o 1 -n 0 o 1

8.18 (2) The characteristic polynomial of W is f(λ) = λⁿ − 1. So, its eigenvalues are
1, ω, ω², ..., ω^(n−1), where ω = e^(2πi/n).
(3) Eigenvalues of A are λₖ = Σᵢ₌₁ⁿ aᵢω^((i−1)k) = a₁ + a₂ωᵏ + a₃ω²ᵏ + ⋯ + aₙω^((n−1)k),
k = 0, 1, ..., n − 1.
(4) det A = ∏ₖ₌₀^(n−1) (a₁ + a₂ωₖ + a₃ωₖ² + ⋯ + aₙωₖ^(n−1)), where ωₖ = e^(2πik/n).
(5) The characteristic polynomial of B is f(λ) = (λ − n + 1)(λ + 1)^(n−1).
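The formula in (3) reflects the fact that a circulant matrix built from W, namely A = a₁I + a₂W + ⋯ + aₙW^(n−1) (assuming this is the A of the exercise), has the same eigenvectors as W; evaluating the polynomial a₁ + a₂λ + ⋯ + aₙλ^(n−1) at each eigenvalue ωᵏ of W gives λₖ, and det A in (4) is the product of these values.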

8.19 Six of them are true.


(I) and (2) See Theorem 8.1.
(3) and (4): Check it with Example 8.2.
(5) and (6) See Examples 8.2 and 8.3.
(7) det e 1n = en .
(8) For any Jordan matrix J, J and JT are similar.
(9) What is the minimal polynomial of In? (10) See Example 8.18(2).
(II) See Corollary 8.7.

Chapter 9
Problems
9.1

5]

11
3-4] 1[011] 110-0
(I) [ ; 1 , (2) 2 1

-1 1 ,(3) -1 2 .
-4 1 4 [
1 1 -5 2-1

9.2 (1) The eigenvalues of A are 1,2, 11. (2) The eigenvalues are 17,0, -3, and so it is
a hyperbolic cylinder. (3) A is singular and the linear form is present, thus the graph
is a parabola.
9.3 B with the eigenvalues 2, 2 + √2 and 2 − √2.
9.5 The determinant is the product of the eigenvalues.

9.6 False with a counter-example A = [~ _~ ] .


9.8 (1) is indefinite. (2) and (3) are positive definite.
9.13 (2) bll = bl4 = b41 = b44 = 1, all others are zero.
9.15 If u ∈ U ∩ W, then u = αx + βy ∈ W for some scalars α and β. Since x, y ∈ U,
b(u, x) = b(u, y) = 0. But b(u, x) = βb(y, x) = −β and b(u, y) = αb(x, y) = α.
9.16 Let c(x, y) = ½(b(x, y) + b(y, x)) and d(x, y) = ½(b(x, y) − b(y, x)). Then b = c + d.
9.17 Let D be a diagonal matrix, and let D′ be obtained from D by interchanging two
diagonal entries dᵢᵢ and dⱼⱼ, i ≠ j. Let P be the permutation matrix interchanging
the i-th and j-th rows. Then PDPᵀ = D′.
9.18 Count the number of distinct inertia (p, q, k). For n, the number of inertia with p = i
is n − i + 1.
9.19 (3) index = 2, signature = 1, and rank = 3.
9.20 (1) local minimum, (2) saddle point.
9.21 Check it with f(x, y, z) = x² − y² − z².
9.22 Note that the maximum value of R(x) is the maximum eigenvalue of A, and similarly
for the minimum value.
9.23 max =i at (I/J'i, 1/J'i), min = ~ at (1/J'i, - 1/ J'i).
9.24 (1) max = 4 at (1/√6)(1, 1, 2), min = −2 at (1/√3)(−1, −1, 1);
(2) max = 3 at (1/√6)(2, 1, 1), min = 0 at (1/√3)(1, −1, −1).

Exercises

9.1 (1) [ ; ; J' (3) [ ;


3 -4 -3
-~ -~], (4) [ ; 0 4-1
-; -~] .
9.3 (2) {(2, 1,2) , (-1, -2,2), O,O,O)}.
9.4 (i) If a = 0 = c, then λ = ±b. Thus the conic section is a hyperbola.
(ii) Since we assumed that b ≠ 0, the discriminant (a − c)² + 4b² > 0. By the
symmetry of the equation in x and y, we may assume that a − c ≥ 0.
If a − c = 0, then λ = a ± b. Thus, the conic section is an ellipse if λ₁λ₂ =
a² − b² > 0, or a hyperbola if a² − b² < 0. If λ₁λ₂ = a² − b² = 0, then it is a
parabola when d′ ≠ 0 and e′ ≠ 0, or a line or two lines for the other cases.
If a − c > 0, let r² = (a − c)² + 4b² > 0. Then λⱼ = (a + c ± r)/2 for j = 1, 2.
Hence, 4λ₁λ₂ = (a + c)² − r² = 4(ac − b²). Thus, the conic section is an ellipse if
det A = ac − b² > 0, or a hyperbola if det A = ac − b² < 0. If det A = ac − b² = 0,
it is a parabola, or a line or two lines depending on some possible values of d′, e′ and
the eigenvalues.
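The quantities λ₁, λ₂ above are the eigenvalues of the symmetric matrix A = [a b; b c] of the quadratic part, namely the roots of λ² − (a + c)λ + (ac − b²) = 0:

    λ = ( (a + c) ± √((a − c)² + 4b²) ) / 2,

which is where r² = (a − c)² + 4b² and the product 4λ₁λ₂ = (a + c)² − r² come from.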
t
9.6 If)" is an eigenvalue of A, then)"2 and areeigenvalues of A 2 and A-I, respectively.
Note x T (A + B)x = x T Ax + x T Bx.
9.8 (1) Q = Jz [~ _~ 1 The formis indefinite witheigenvalues)" = 5 and)" = -1.

9.10 (1) A = [~ - ~ J. (2) B = [~ :], (3) Q = [~ - i 1


J
9.11 (2) The signatureis I, the index is 2, and the rank is 3.

9.15 (2) The point (1, π) is a critical point, and the Hessian is [−2 0; 0 −1]. Hence, (1, π)
is a local maximum.
9.18 (1) Suppose that φ_w = φ_w′. Then, for all v ∈ V,
b(v, w) = φ(w)(v) = φ(w′)(v) = b(v, w′), or b(v, w − w′) = 0.
The nondegeneracy of b implies that w = w′, that is, φ is one-to-one. This also
implies that dim W ≤ dim V*. A similar argument shows that the linear transformation
ψ : V → W* is also one-to-one, and therefore dim V ≤ dim W*.
Since dim V = dim V* and dim W = dim W* from Theorem 4.18, we have
dim V ≤ dim W* = dim W ≤ dim V* = dim V. Therefore, φ and ψ are surjective,
and so are isomorphisms.
(2) comes from (1).
9.19 Exactly seven of them are true.
(1) See Theorem 9.1 (the principal axes theorem).
(2) Any two congruent matrices have the same inertia.

(3) Consider A = ~ ~ ~ . (4) Consider A = 0

(7) Consider a bilinear form b(x, y) = x₁y₁ − x₂y₂ on ℝ².


(9) The identity I is congruent to k²I for all k ∈ ℝ.
(10) See (9).
(12) Consider a bilinear form b(x, y) = x₁y₂. Its matrix [0 1; 0 0] is not diagonalizable.
Bibliography

1. M. Artin, Algebra, Prentice-Hall, Englewood Cliffs, NJ, 1991.
2. M. Braun, Differential Equations and Their Applications, 4th Edition, Springer-Verlag, New York, 1993.
3. F.R. Gantmakher, The Theory of Matrices, I, II, Chelsea, New York, 1959.
4. P.R. Halmos, Finite-dimensional Vector Spaces, Springer-Verlag, New York, 1974.
5. K. Hoffman and R. Kunze, Linear Algebra, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1971.
6. R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1986.
7. G. Strang, Linear Algebra and Its Applications, 3rd Edition, Harcourt Brace Jovanovich, San Diego, CA, 1998.
8. V.V. Prasolov, Problems and Theorems in Linear Algebra, translated from the Russian manuscript by D.A. Leites, American Mathematical Society, Providence, RI, 1994.
Index

LDU decomposition, 32 rank of, 341


LDU factorization, 32 skew-symmetric, 340
LU decomposition, 29 symmetric, 340
LU factorization, 29 Binet-Cauchy formula, 66
QR decomposition, 193 Block ,19
QR factorization, 193 matrix, 19
n-space
complex, 247 Cauchy-Schwarz inequality, 160,252
real,76 Cayley-Hamilton theorem, 244, 295
Characteristic equation, 202
Additivity, 158,248 Characteristic polynomial, 202
Adjoint, 143, 145,254 Characteristic value, 202
Adjugate , 61 Characteristic vector, 202
Angle, 157, 161 Cholesky decomposition, 332
Antilinear, 248 Cholesky factorization , 332
Associated matrix, 128 Circulant matrix, 65, 317
Augmented matrix, 5 Coefficient matrix, 6
Cofactor, 57
Back substitution, 8 expansion , 58
Basic variable, 10 Column (matrix), 12
Basis, 86 Column space , 84, 92
change of, 134 Column vector, 12
dual,l44 Companion matrix, 214, 215
ordered, 125 Computer graphics, 148
orthonormal, 165 Congruent matrix, 336
standard, 86, 163 Conic section, 321, 322
Basis-change matrix , 136 Conjugate, 248
Bessel's inequality, 181 Coordinate, 75
Bijective, 123 homogeneous, 151
Bilinear, 158 rectangular, 165
Bilinear form , 339 Coordinate function , 144
alternating, 340 Coordinate vector, 125
diagonalizable, 342 Coordinate-change matrix, 136
nondegenerate, 358 Cramer's rule, 62

Critical point, 349 Gauss-Jordan, 8


Cross-product term, 324 Entry, 12
Cryptography, 34 Equilibrium state, 225
Euclidean n-space, 158
Decomposition Euclidean complex n-space, 248
LDU ,32 Euclidean inner product, 158
LU,29 Euclidean length, 248
QR,193 Exponential matrix, 232, 306
Cholesky,332
Definite form, 326, 327 Factorization
negative, 326, 327 LDU, 32
positive, 326, 327 LU,29
Determinant, 46 QR,193
Diagonal entry, 12 Cholesky, 332
Diagonal matrix, 12 Fibonacci,212
Diagonalizable number,212
orthogonally, 258 sequence, 212
unitarily,258 Forward elimination, 7
Diagonalization Fourier coefficient, 144
of a quadratic form, 324 Free variables, 10
of linear transformation,240 Fundamental set, 229
of matrices, 207 Fundamental theorem, 96
Difference equation
Gauss-Jordan elimination, 8
linear, 217
Gaussian elimination, 8
Differential equation
General solution, 226, 229
linear, 226
Generalized eigenspace, 284
Dilation, 180
Generalized eigenvector, 282
Dimension, 89
chain of, 282
finite, 89
Gerschgorin's theorem, 222
infinite, 89
Global maximum, 327
Direct sum, 81
Global minimum, 327
Discrete dynamical system, 221
Golden mean, 214
Distance, 157, 161,249
Gram-Schmidt orthogonalization, 166
Dot product, 157, 158
Dual basis, 144 Hermitian congruent matrix, 336
Dual space, 143 Hermitian form, 340
rank of, 341
Eigenspace, 202 Hermitian matrix, 254
Eigenvalue, 202 Hessian, 349, 351
Eigenvector,201 Homogeneity, 158
Electrical network, 36 Homogeneouscoordinate, 151
Elementary column operation, 25 Homogeneoussystem, 1
Elementary matrix, 23
Elementary operations, 4 Idempotent matrix, 43, 243
Elementary product, 55 Identity matrix, 17
signed,55 Identity transformation, 118
Elementary row operation, 6 Image, 119
Elimination, 3 Indefinite form, 326, 327
forward,7 Index, 347

Inertia, 327, 336, 346 associated matrix of, 128


Initial condition, 226 diagonalization of, 240
Initial eigenvector, 282 dilation , 180
Injective, 123 eigenvalue of, 240
Inner product, 158,248 eigenvector of, 240
complex, 248 identity, 118
Euclidean, 157, 158 image, 119
Hermitian, 248 invertible , 123
matrix representation of, 163 isomorphism, 123
positive definite, 158,248 kernel,119
real, 158 matrix representation of, 128
Input-output model , 38 orthogonal, 179
Interpolating polynomial, 109 projection, 168
Interpolation, 108 reflection, 119
Inverse rotation, 118
left,21 scalar multiplication of, 131
right, 21 sum of, 131
Inverse matrix, 22 transpose, 145
Inversion, 54 zero, 118
Invertible matrix, 22 Linearly dependent, 84
Isometry, 179 Linearly independent, 84
Isomorphism, 123 Lower triangular matrix, 12
natural, 125
Magnitude, 157, 160,248
Jordan, 273 Markov matrix, 225
block,274 Markov process, 224, 225
canonical form, 273,274 Matrix, 11
canonical matrix, 274 associated, 128
augmented, 5
Kernel,119 basis-change, 136
Kirchhoff's Current Law, 36 block,19
Kirchhoff's Voltage Law, 36 circulant, 65, 317
column, 12
Leading 1's, 8 congruent, 336
Least squares solution, 181 coordinate-change, 136
Length, 157, 160,249 diagonal, 12
Linear combination, 82 diagonalizable, 207
Linear dependence, 84 diagonalization of, 207
Linear difference equation, 217, 221, 309 elementary, 23
Linear differential equation , 226, 235, 310 entry of, 12
Linear equations, 1 exponential, 232, 306
consistent system of, I Hermitian, 254
homogeneous system of, 1 Hermitian congruent, 336
inconsistent system of, 1 idempotent, 43
system of, 1 identity, 17
Linear form , 320 indefinite, 326, 327
Linear functional , 143 inverse, 22
Linear programming, 353 invertible , 22
Linear transformation, 117 Jordan canonical, 274

lower triangular, 12 Nilpotent matrix, 43


Markov, 225 Nonsingular matrix, 22
minimal polynomialof, 299 Normal equation, 182
negative definite, 326, 327 Normal matrix, 262
negative semidefinite, 326, 327 Normalization, 165
nilpotent, 43 Null space, 92
nonsingular, 22 Nullity,92
normal,262
order of, 12 Ohm's Law,36
orthogonal, 177 One-to-one, 123
orthogonal part of, 193 Onto, 123
orthogonal projection, 173, 190, 195 Ordered basis, 125
permutation,25 Orientation,68
positive definite, 326, 327 Orthogonal, 170
power of, 289 complement, 170
product of, 15 decomposition, 171
row, 12 matrix, 177
scalar multiplication of, 13 projectionmatrix, 173, 190, 195
semidefinite, 326, 327 transformation, 179
similar, 140 vectors, 162
simultaneously diagonalizable, 243 Orthogonalization, 166
singular,22 Gram-Schmidt, 166
size of, 11 Orthogonally similar, 258
skew-Hermitian, 254 Orthonormal basis, 165
skew-symmetric, 14 Orthonormal vectors, 165
square, 12
stochastic, 225 Parabolic cylinder, 329
sum of, 13 Parallelepiped, 68
symmetric, 14 Parallelogram, 68
transpose of, 12 equality,198
tridiagonal, 65, 316 Particular solution,226, 229
unitary,256 Permutation, 54
upper triangular, 12 even, 54
upper triangular part of, 193 inversionof, 54
Vandermonde, 60, 64, 109 odd,54
zero, 13 matrix, 25
Matrix of cofactors, 61 sign of, 54
Matrix polynomial,42, 294 Perpendicularvectors, 162
Matrix representation, 128, 163 Pivot,7
inner product, 163 Polarizationidentity, 198
linear transformation, 128 Polynomialapproximations, 186
Maximum, 349 Predator-prey problem, 230
Minimal polynomial,299 Principal submatrix,332
Minimum, 349 Projection, 168
Minor, 57 Pythagorean theorem, 162, 252
Monic, 215
Multilinear, 47 Quadraticequation, 320
Quadraticform, 319, 320
Newton's second law, 189 complex,320

Quadratic surface, 321 Stochastic matrix, 225


Submatrix, 19
Rank ,98 minor, 57
Rank theorem, 98 principal , 332
Rayleigh quotient, 354 Subspace, 79
Real inner product space , 158 fundamental, 175
Recurrence relation , 212, 214 spanned ,82
Reduced row-echelon form, 8 sum of, 81
Row (matrix), 12 Substitution, 3
Row space, 92 Sum of
Row vector, 12,91 linear transformations, 131
Row-echelon form, 8 matrices, 13
reduced,8 subspaces,81
Row-echelon matrix, 8 vectors, 77
Row-equivalent, 6 Surjective, 123
Sylvester's law of inertia, 336, 346
Saddle point, 327, 350 Symmetric matrix, 14
Sarrus 's method , 51
Trace, 120, 143
Scalar, 13,75
Transformation
Scalar multiplication of
identity, 118
linear transformation, 131
injective, 123
matrix, 13
linear, 117
vectors , 77
surjective, 123
Schur's lemma , 258
zero, 118
Second derivative test, 350
Transpose, 12, 145
Self-adjoint, 254
Triangle inequality, 162, 252
Semidefinite form, 326, 327
Tridiagonal matrix, 65, 316
negative, 326,327
positive, 326, 327 Unit vector, 165
Semilinear, 340 Unitarily similar, 258
Sesquilinear form, 340 Unitary matrix, 256
Sign of permutation, 54 Unitary space, 248
Signature, 347 Upper triangular matrix, 12
Similar, 140
orthogonally, 258 Value
unitarily, 258 characteristic, 202
Similar matrix, 140 Vandermonde matrix, 60, 64, 109,304
Similarity, 140 Vector, 75, 77
Simultaneously diagonalizable, 243 characteristic, 202
Singular matrix, 22 column, 12
Skew-Hermitian form, 341 component of, 75
Skew-Hermitian matrix, 254 orthogonal, 162
Skew-symmetric matrix, 14 perpendicular, 162
Spectral decomposition, 266 row, 12,91
Spectral theorem , 265 scalar multiplication of, 77
Square matrix, 12 sum of, 77
Square term, 324 unit, 165
Standard basis, 86, 163 zero, 77
Standard ordered basis, 125 Vector addition , 77

Vector space, 77 Wronskian, 64, 110,228


complex , 78, 248
isomorphic, 123 Zero matrix, 13
real,77 Zero transformation, 118
Volume, 67 Zero vector, 77
