
Arindama Singh

Linear Algebra for Engineers

Classnotes for MA2031


Department of Mathematics, IIT Madras

Contents

1  Matrix Operations                                 1
   1.1  Examples of linear equations                 1
   1.2  Basic matrix operations                      3
   1.3  Transpose and adjoint                       10
   1.4  Elementary row operations                   13
   1.5  Row reduced echelon form                    15
   1.6  Determinant                                 20
   1.7  Computing inverse of a matrix               23

2  Rank and Linear Equations                        27
   2.1  Linear independence                         27
   2.2  Determining linear independence             31
   2.3  Rank of a matrix                            33
   2.4  Solvability of linear equations             36
   2.5  Gauss-Jordan elimination                    40

3  Subspace and Dimension                           45
   3.1  Subspace and span                           45
   3.2  Basis and dimension                         48
   3.3  Matrix as a linear map                      54
   3.4  Change of basis                             57
   3.5  Equivalence and Similarity                  61

4  Orthogonalization                                65
   4.1  Inner products                              65
   4.2  Gram-Schmidt orthogonalization              67
   4.3  Best approximation                          71
   4.4  QR factorization and least squares          73

5  Eigenvalues and Eigenvectors                     77
   5.1  Eigenvalues                                 77
   5.2  Characteristic polynomial                   78
   5.3  Special types of matrices                   82

6  Canonical Forms                                  87
   6.1  Schur triangularization                     87
   6.2  Diagonalizability                           91
   6.3  Jordan form                                 97
   6.4  Singular value decomposition               110
   6.5  Polar decomposition                        117

Short Bibliography                                 123

Index                                              124

1  Matrix Operations

1.1  Examples of linear equations

Linear equations are everywhere, from mental arithmetic problems to advanced defense applications. We start with an example. The system of linear equations

x1 + x2 = 3
x1 - x2 = 1

has a unique solution x1 = 2, x2 = 1. Substituting these values for the unknowns, we see that the equations are satisfied; but why are there no other solutions? Well, we have not merely guessed this solution; we have solved the system! The details are as follows. Suppose the pair (x1, x2) is a solution of the system. Subtracting the second equation from the first, we get another equation: 2x2 = 2. It implies x2 = 1. Then from either of the equations, we get x1 = 2. To proceed systematically, we would like to replace the original system with the following:

x1 + x2 = 3
     x2 = 1

Substituting x2 = 1 in the first equation of the new system, we get x1 = 2. In fact, substituting these values of x1 and x2, we see that both original equations are satisfied. Convinced? The only solution of the system is x1 = 2, x2 = 1. What about the system
 x1 + x2 = 3
 x1 - x2 = 1
2x1 - x2 = 3

The first two equations have a unique solution, and that solution satisfies the third. Hence this system also has a unique solution x1 = 2, x2 = 1. So the extra equation does not put any constraint on the solution that we obtained earlier.
But what about our systematic solution method? We aim at eliminating the first unknown from all but the first equation. We replace the second equation with the one obtained by second minus the first. We also replace the third by third minus twice the first. It results in

x1 + x2 =  3
   -2x2 = -2
   -3x2 = -3

Notice that the second and the third equations are equivalent, each giving x2 = 1; hence the conclusion. We give another twist. Consider the system

 x1 + x2 = 3
 x1 - x2 = 1
2x1 + x2 = 3

The first two equations again have the same solution x1 = 2, x2 = 1. But this time, the third equation is not satisfied by these values of the unknowns. So, the system has no solution.
Also, by using our elimination method, we obtain the equations:

x1 + x2 =  3
   -2x2 = -2
    -x2 = -3

The last two equations are inconsistent: the second gives x2 = 1 while the third gives x2 = 3. So, the original system has no solution.
Finally, instead of adding another equation, we drop one. Consider the linear system

x1 + x2 = 3

having only one equation. The old solution x1 = 2, x2 = 1 is still a solution of this system. But x1 = 1, x2 = 2 is also a solution. Moreover, since x1 = 3 - x2, by assigning x2 any real number, we get a corresponding value for x1, which together give a solution. Thus, the system has infinitely many solutions. Notice that the same conclusion holds if we add more equations which are multiples of the only given equation. For example,

 x1 +  x2 = 3
2x1 + 2x2 = 6
3x1 + 3x2 = 9
We see that the number of equations really does not matter; what matters is the number of independent equations.
Caution: the notion of independent equations is not yet precise; nonetheless we have some working idea.
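The three behaviours seen above (a unique solution, no solution, infinitely many solutions) can also be checked numerically. The following sketch assumes the NumPy library, which is not part of these notes:

```python
import numpy as np

# Unique solution: x1 + x2 = 3, x1 - x2 = 1.
A = np.array([[1.0, 1.0], [1.0, -1.0]])
b = np.array([3.0, 1.0])
x = np.linalg.solve(A, b)  # the unique solution (2, 1)

# Redundant third equation 2*x1 - x2 = 3: same unique solution.
A3 = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, -1.0]])
b3 = np.array([3.0, 1.0, 3.0])
x3, residual, rank, _ = np.linalg.lstsq(A3, b3, rcond=None)

# Single equation x1 + x2 = 3: one solution for every free choice of x2.
for x2 in (0.0, 1.0, 5.5):
    x1 = 3.0 - x2              # x1 is determined by the chosen x2
    assert np.isclose(x1 + x2, 3.0)
```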
It is also not yet clear when a system of equations has a solution, a unique solution, infinitely many solutions, or no solution at all. And why can a system of equations not have more than one but only finitely many solutions? How do we use our elimination method to obtain an infinite number of solutions? To answer these questions, we will introduce matrices. Matrices will help us represent the problem in a compact way and will lead to definitive answers. We will also study the eigenvalue problem for matrices, which comes up often in applications. These concerns will allow us to represent matrices in elegant forms.

1.2  Basic matrix operations

As usual, R denotes the set of all real numbers and C denotes the set of all complex
numbers. We will write F for either R or C. The numbers in F will also be referred
to as scalars.
A matrix is a rectangular array of symbols. For us these symbols are real numbers or, in general, complex numbers. The individual numbers in the array are called the entries of the matrix. Each entry of a matrix is a scalar. The number of rows and the number of columns in any matrix are necessarily positive integers. A matrix with m rows and n columns is called an m×n matrix, and it may be written as

A = [ a_11 ⋯ a_1n ]
    [  ⋮        ⋮ ]
    [ a_m1 ⋯ a_mn ]

or as A = [a_ij] for short, with a_ij ∈ F for i = 1, …, m, j = 1, …, n. The number a_ij, which occurs at the ith row and jth column, is referred to as the (i, j)th entry of the matrix [a_ij].
The set of all m×n matrices with entries from F will be denoted by F^{m×n}.
A row vector of size n is a matrix in F^{1×n}. Similarly, a column vector of size n is a matrix in F^{n×1}. The row vectors in F^{1×n} will be written as

[a_1, …, a_n]   or as   [ a_1 ⋯ a_n ]

for scalars a_1, …, a_n. The vectors in F^{n×1} are written as

[ b_1 ]
[  ⋮  ]
[ b_n ]

for scalars b_1, …, b_n. We will sometimes write such a column vector as [b_1 ⋯ b_n]^t, to save vertical space.
We will write both F^{1×n} and F^{n×1} as F^n. Especially when a result is applicable to both row vectors and column vectors, this notation will come in handy. Also, we may write a typical vector in F^n as

(a_1, …, a_n).

When F^n is F^{1×n}, you should read (a_1, …, a_n) as the row vector [a_1, …, a_n], and when F^n is F^{n×1}, you should read (a_1, …, a_n) as the column vector [a_1, …, a_n]^t.
Any matrix in F^{m×n} is said to have size m×n. If m = n, the rectangular array becomes a square array with m rows and m columns, and the matrix is called a square matrix of order m.
Naturally, two matrices of the same size are considered equal when their corresponding entries coincide; i.e., if A = [a_ij] and B = [b_ij] are in F^{m×n}, then

A = B   iff   a_ij = b_ij

for each i ∈ {1, …, m} and for each j ∈ {1, …, n}. Thus matrices of different sizes are unequal.
The zero matrix is a matrix each entry of which is 0. We write 0 for zero matrices of all sizes; the size is to be understood from the context.
Let A = [a_ij] ∈ F^{n×n} be a square matrix of order n. The entries a_ii are called the diagonal entries of A. The diagonal of A consists of all diagonal entries; the first entry on the diagonal is a_11, and the last diagonal entry is a_nn. The entries of A which are not on the diagonal are called the off-diagonal entries of A; they are the a_ij for i ≠ j. In the matrix

[ 1 2 3 ]
[ 2 3 4 ]
[ 3 4 5 ]

the diagonal consists of the entries 1, 3, 5. Here, 1 is the first diagonal entry, 3 is the second diagonal entry, and 5 is the third and last diagonal entry.
The super-diagonal of a matrix consists of the entries immediately above the diagonal. That is, the entries a_{i,i+1} constitute the super-diagonal of an n×n matrix A = [a_ij]; of course, i varies from 1 to n-1 here. In the matrix above, the super-diagonal consists of the entries 2 and 4.
If all off-diagonal entries of A are 0, then A is said to be a diagonal matrix. Only a square matrix can be a diagonal matrix. There is a way to generalize this notion to any matrix, but we do not require it. Notice that the diagonal entries in a diagonal matrix need not all be nonzero. For example, the zero matrix of order n is also a diagonal matrix. The following is a diagonal matrix; we follow the convention of not showing the 0 entries in a matrix.

[ 1     ]   [ 1 0 0 ]
[   3   ] = [ 0 3 0 ]
[     0 ]   [ 0 0 0 ]

We also write a diagonal matrix with diagonal entries d_1, …, d_n as diag(d_1, …, d_n). Thus the above diagonal matrix is also written as diag(1, 3, 0).
The identity matrix is a square matrix of which each diagonal entry is 1 and each off-diagonal entry is 0:

I = diag(1, …, 1).
When identity matrices of different orders are used in a context, we will use the
notation Im for the identity matrix of order m.
We write e_i for a column vector whose ith component is 1 and all other components are 0. When we consider e_i as a column vector in F^{n×1}, the jth component of e_i is δ_ij. Here,

δ_ij = 1 if i = j,   δ_ij = 0 if i ≠ j

is Kronecker's delta. Notice that the identity matrix is I = [δ_ij].
There are then n distinct column vectors e_1, …, e_n. The list of column vectors e_1, …, e_n is called the standard basis for F^{n×1}, for reasons we will discuss later. Accordingly, the e_i's are referred to as the standard basis vectors. These are the columns of the identity matrix of order n, in that order; that is, e_i is the ith column of I. The transposes of these e_i's are the rows of I; that is, the ith row of I is e_i^t. Thus

I = [ e_1 ⋯ e_n ] = [ e_1^t ]
                    [   ⋮   ]
                    [ e_n^t ]
A scalar matrix is a square matrix in which each diagonal entry is the same scalar and each off-diagonal entry is 0. Each scalar matrix is thus a diagonal matrix with the same scalar repeated on the diagonal. The following is a scalar matrix:

[ 3       ]
[   3     ]
[     3   ]
[       3 ]

It is also written as diag(3, 3, 3, 3). If A, B ∈ F^{m×m} and A is a scalar matrix, then AB = BA. Conversely, if A ∈ F^{m×m} is such that AB = BA for all B ∈ F^{m×m}, then A must be a scalar matrix. This fact is not obvious, and its proof requires much more than we have discussed so far.
A matrix A ∈ F^{m×n} is said to be upper triangular iff all its entries below the diagonal are zero. That is, A = [a_ij] is upper triangular when a_ij = 0 for i > j. In writing such a matrix, we simply do not show the zero entries below the diagonal. Similarly, a matrix is called lower triangular iff all its entries above the diagonal are zero. Both upper triangular and lower triangular matrices are referred to as triangular matrices.

A diagonal matrix is both upper and lower triangular. The following are examples of a lower triangular matrix L and an upper triangular matrix U, both of order 3.

L = [ 1     ]       U = [ 1 2 3 ]
    [ 2 3   ]  ,        [   3 4 ]
    [ 3 4 5 ]           [     5 ]
The sum of two matrices of the same size is the matrix whose entries are obtained by adding the corresponding entries of the given two matrices. That is, if A = [a_ij] and B = [b_ij] are in F^{m×n}, then

A + B = [a_ij + b_ij] ∈ F^{m×n}.

For example,

[ 1 2 3 ]   [ 3 1 2 ]   [ 4 3 5 ]
[ 2 3 1 ] + [ 2 1 3 ] = [ 4 4 4 ]
We informally say that matrices are added entry-wise. Matrices of different sizes can
never be added.
It then follows that
A + B = B + A.
Similarly, matrices can be multiplied by a scalar entry-wise. If A = [a_ij] ∈ F^{m×n} and α ∈ F, then

αA = [α a_ij] ∈ F^{m×n}.

Therefore, a scalar matrix with α on the diagonal can be written as αI. Notice that

A + 0 = 0 + A = A

for all matrices A ∈ F^{m×n}, with the implicit understanding that 0 ∈ F^{m×n}. For A = [a_ij], the matrix -A ∈ F^{m×n} is the one whose (i, j)th entry is -a_ij. Thus

-A = (-1)A   and   A + (-A) = -A + A = 0.

We also abbreviate A + (-B) to A - B, as usual. For example,

  [ 1 2 3 ]   [ 3 1 2 ]   [ 0 5 7 ]
3 [ 2 3 1 ] - [ 2 1 3 ] = [ 4 8 0 ]
The addition and scalar multiplication defined above satisfy the following properties. Let A, B, C ∈ F^{m×n} and let α, β ∈ F.
1. A + B = B + A.
2. (A + B) + C = A + (B + C).
3. A + 0 = 0 + A = A.
4. A + (-A) = (-A) + A = 0.
5. α(βA) = (αβ)A.
6. α(A + B) = αA + αB.
7. (α + β)A = αA + βA.
8. 1·A = A.

Notice that whatever we discuss here for matrices applies to row vectors and column vectors in particular. But remember that a row vector cannot be added to a column vector unless both are of size 1×1, when both become numbers in F.
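The entry-wise operations and the laws listed above are easy to verify numerically; here is a short sketch, assuming NumPy (not part of the notes):

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 3, 1]])
B = np.array([[3, 1, 2], [2, 1, 3]])

S = A + B                   # entry-wise sum
assert (S == np.array([[4, 3, 5], [4, 4, 4]])).all()

D = 3 * A - B               # scalar multiple and difference
assert (D == np.array([[0, 5, 7], [4, 8, 0]])).all()

# a couple of the listed laws
assert (A + B == B + A).all()                 # property 1
assert (2 * (A + B) == 2 * A + 2 * B).all()   # property 6
```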
Another operation that we have on matrices is multiplication of matrices, which is a bit involved. Let A = [a_ik] ∈ F^{m×n} and B = [b_kj] ∈ F^{n×r}. Then their product AB is the matrix [c_ij] ∈ F^{m×r} whose entries are

c_ij = a_i1 b_1j + ⋯ + a_in b_nj = Σ_{k=1}^{n} a_ik b_kj.

Notice that the matrix product AB is defined only when the number of columns in A is equal to the number of rows in B.
A particular case might be helpful. Suppose A is a row vector in F^{1×n} and B is a column vector in F^{n×1}. Then their product AB ∈ F^{1×1}; it is a matrix of size 1×1. Often we will identify such matrices with numbers. The product now looks like:

              [ b_1 ]
[ a_1 ⋯ a_n ] [  ⋮  ] = [ a_1 b_1 + ⋯ + a_n b_n ]
              [ b_n ]

This is helpful in visualizing the general case, which looks like

[ a_11 ⋯ a_1n ]                            [ c_11 ⋯ c_1j ⋯ c_1r ]
[  ⋮        ⋮ ]  [ b_11 ⋯ b_1j ⋯ b_1r ]    [  ⋮       ⋮       ⋮ ]
[ a_i1 ⋯ a_in ]  [  ⋮       ⋮       ⋮ ]  = [ c_i1 ⋯ c_ij ⋯ c_ir ]
[  ⋮        ⋮ ]  [ b_n1 ⋯ b_nj ⋯ b_nr ]    [  ⋮       ⋮       ⋮ ]
[ a_m1 ⋯ a_mn ]                            [ c_m1 ⋯ c_mj ⋯ c_mr ]
The ith row of A multiplied with the jth column of B gives the (i, j)th entry in AB.
Thus to get AB, you have to multiply all m rows of A with all r columns of B. Besides
writing a linear system in compact form, we will see later why matrix multiplication
is defined this way. For example,

[ 3 5 -1 ] [ 2 2 3 1 ]   [ 22  2 43 42 ]
[ 4 0  2 ] [ 5 0 7 8 ] = [ 26 16 14  6 ]
[ 6 3 -2 ] [ 9 4 1 1 ]   [  9  4 37 28 ]

If u ∈ F^{1×n} and v ∈ F^{n×1}, then uv ∈ F^{1×1}, but vu ∈ F^{n×n}. For example,

          [ 1 ]              [ 1 ]              [  3  6  1 ]
[ 3 6 1 ] [ 2 ] = [ 19 ],    [ 2 ] [ 3 6 1 ]  = [  6 12  2 ]
          [ 4 ]              [ 4 ]              [ 12 24  4 ]
It shows clearly that matrix multiplication is not commutative. Commutativity can break down for various reasons. First of all, when AB is defined, BA may not be defined. Secondly, even when both AB and BA are defined, they may not be of the same size. And thirdly, even when they are of the same size, they need not be equal. For example,

[ 1 2 ] [ 0 1 ]   [ 4  7 ]         [ 0 1 ] [ 1 2 ]   [ 2  3 ]
[ 2 3 ] [ 2 3 ] = [ 6 11 ]   but   [ 2 3 ] [ 2 3 ] = [ 8 13 ]

It does not mean that AB is never equal to BA. There can be particular matrices A and B, both in F^{n×n}, such that AB = BA.
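The defining formula c_ij = Σ_k a_ik b_kj can be turned into a direct (if inefficient) triple loop; the sketch below, assuming NumPy, also exhibits the failure of commutativity on a 2×2 pair:

```python
import numpy as np

def matmul(A, B):
    """Multiply per the definition: c_ij = sum over k of a_ik * b_kj."""
    m, n = A.shape
    n2, r = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, r), dtype=A.dtype)
    for i in range(m):
        for j in range(r):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2], [2, 3]])
B = np.array([[0, 1], [2, 3]])
assert (matmul(A, B) == np.array([[4, 7], [6, 11]])).all()
assert (matmul(B, A) == np.array([[2, 3], [8, 13]])).all()   # AB != BA
assert (matmul(A, B) == A @ B).all()   # agrees with the built-in product
```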
Observe that if A ∈ F^{m×n}, then A I_n = A and I_m A = A. Look at the columns of I_n in the product A I_n. They say that

A e_j = the jth column of A,   for j = 1, …, n.

Here, e_j is the jth standard basis vector, the jth column of the identity matrix of order n; its jth component is 1 and all other components are 0. Also, directly multiplying A with e_j, we see that

        [ a_11 ⋯ a_1j ⋯ a_1n ] [ 0 ]   [ a_1j ]
A e_j = [  ⋮       ⋮       ⋮ ] [ ⋮ ] = [   ⋮  ] = the jth column of A.
        [ a_m1 ⋯ a_mj ⋯ a_mn ] [ 1 ]   [ a_mj ]
                               [ ⋮ ]
                               [ 0 ]

Thus A can be written in block form as

A = [ A e_1 ⋯ A e_j ⋯ A e_n ].
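The identity "A e_j is the jth column of A" is immediate to check numerically; NumPy is assumed here:

```python
import numpy as np

A = np.array([[3, 5, -1],
              [4, 0, 2],
              [6, 3, -2]])
n = A.shape[1]
for j in range(n):
    e_j = np.zeros(n)
    e_j[j] = 1                          # jth standard basis vector
    assert (A @ e_j == A[:, j]).all()   # A e_j equals the jth column of A
```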
Unlike numbers, the product of two nonzero matrices can be a zero matrix. For example,

[ 1 0 ] [ 0 0 ]   [ 0 0 ]
[ 0 0 ] [ 0 1 ] = [ 0 0 ]
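A quick numerical confirmation of this phenomenon (NumPy assumed):

```python
import numpy as np

A = np.array([[1, 0], [0, 0]])
B = np.array([[0, 0], [0, 1]])
P = A @ B
assert (P == 0).all()                       # the product AB is the zero matrix
assert (A != 0).any() and (B != 0).any()    # although neither factor is zero
```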
It is easy to verify the following properties of matrix multiplication:
1. If A ∈ F^{m×n}, B ∈ F^{n×r} and C ∈ F^{r×p}, then (AB)C = A(BC).
2. If A, B ∈ F^{m×n} and C ∈ F^{n×r}, then (A + B)C = AC + BC.
3. If A ∈ F^{m×n} and B, C ∈ F^{n×r}, then A(B + C) = AB + AC.
4. If α ∈ F, A ∈ F^{m×n} and B ∈ F^{n×r}, then α(AB) = (αA)B = A(αB).
You can also see matrix multiplication in block form. Suppose A ∈ F^{m×n}. Write its ith row as A_i⋆ and its kth column as A_⋆k. Then we can write A as a row of columns and also as a column of rows, in the following manner:

A = [a_ik] = [ A_⋆1 ⋯ A_⋆n ] = [ A_1⋆ ]
                               [   ⋮  ]
                               [ A_m⋆ ]

Write B ∈ F^{n×r} similarly as

B = [b_kj] = [ B_⋆1 ⋯ B_⋆r ] = [ B_1⋆ ]
                               [   ⋮  ]
                               [ B_n⋆ ]

Then their product AB can be written as

AB = [ A B_⋆1 ⋯ A B_⋆r ] = [ A_1⋆ B ]
                           [    ⋮   ]
                           [ A_m⋆ B ]

When writing this way, we ignore the extra brackets [ and ].
Powers of square matrices are defined inductively by taking

A^0 = I   and   A^n = A A^{n-1} for n ∈ N.

Example 1.1

    [ 1 1 0 ]                      [ 1 n n(n-1) ]
A = [ 0 1 2 ] .   Show that A^n =  [ 0 1   2n   ]   for n ∈ N.
    [ 0 0 1 ]                      [ 0 0    1   ]

We use induction on n. The basis case n = 1 is obvious. Suppose A^n is as given. Now,

                  [ 1 1 0 ] [ 1 n n(n-1) ]   [ 1 n+1 (n+1)n ]
A^{n+1} = A A^n = [ 0 1 2 ] [ 0 1   2n   ] = [ 0  1  2(n+1) ]
                  [ 0 0 1 ] [ 0 0    1   ]   [ 0  0     1   ]

Notice that taking n = 0 in the formula for A^n, we see that A^0 = I.
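The closed form of Example 1.1 can be confirmed for several values of n by repeated multiplication; NumPy is assumed here:

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 2],
              [0, 0, 1]])

def power_formula(n):
    # the claimed closed form for A**n from Example 1.1
    return np.array([[1, n, n * (n - 1)],
                     [0, 1, 2 * n],
                     [0, 0, 1]])

for n in range(6):
    assert (np.linalg.matrix_power(A, n) == power_formula(n)).all()
```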
A square matrix A of order m is called invertible iff there exists a matrix B of order m such that

AB = I = BA.

Such a matrix B is called an inverse of A. If C is another inverse of A, then

C = CI = C(AB) = (CA)B = IB = B.

Therefore, the inverse of a matrix, when it exists, is unique; it is denoted by A^{-1}. We talk of invertibility of square matrices only; and not all square matrices are invertible. For example, I is invertible but 0 is not. If AB = 0 for nonzero square matrices A and B, then neither A nor B is invertible.
If both A, B ∈ F^{n×n} are invertible, then (AB)^{-1} = B^{-1}A^{-1}. Reason:

(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}B = I,  and  (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I.

Invertible matrices play a crucial role in solving linear systems uniquely. We will come back to this issue later.
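The identity (AB)^{-1} = B^{-1}A^{-1} is easy to sanity-check on a pair of invertible matrices (NumPy assumed):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 3.0]])   # invertible: determinant -1
B = np.array([[0.0, 1.0], [2.0, 3.0]])   # invertible: determinant -2

Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)
I = np.eye(2)

assert np.allclose(A @ Ainv, I) and np.allclose(Ainv @ A, I)
# (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(A @ B), Binv @ Ainv)
```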

Exercises for 1.2

1. Compute AB, CA, DC, DCAB, A^2, D^2 and A^3 B^2, where

A = [ 2 3 ]   B = [ 4 1 ]   C = [ 1 2 ]   D = [ 3 2 1 ]
    [ 1 2 ]       [ 4 0 ]       [ 2 1 ]       [ 4 6 0 ]
                                [ 1 3 ]       [ 1 2 2 ]

2. Let E_ij be the n×n matrix whose (i, j)th entry is 1 and all other entries are 0. Show that each A = [a_ij] ∈ C^{n×n} can be expressed as A = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij E_ij. Also show that E_ij E_km = 0 if j ≠ k, and E_ij E_jm = E_im.
3. Let A ∈ C^{m×n}, B ∈ C^{n×p}. Let B_1, …, B_p be the columns of B. Show that AB_1, …, AB_p are the columns of AB.
4. Let A ∈ C^{m×n}, B ∈ C^{n×p}. Let A_1, …, A_m be the rows of A. Show that A_1 B, …, A_m B are the rows of AB.
5. Construct two 3×3 matrices A and B such that AB = 0 but BA ≠ 0.

1.3  Transpose and adjoint

Given a matrix A ∈ F^{m×n}, its transpose is the matrix in F^{n×m}, denoted by A^t, defined by

the (i, j)th entry of A^t = the (j, i)th entry of A.

That is, the ith column of A^t is the column vector [a_i1, …, a_in]^t. The rows of A become the columns of A^t, and the columns of A become the rows of A^t. In particular, if u = [a_1 ⋯ a_m] is a row vector, then its transpose

u^t = [ a_1 ]
      [  ⋮  ]
      [ a_m ]

is a column vector, as mentioned earlier. Similarly, the transpose of a column vector is a row vector. If you write A as a row of column vectors, then you can express A^t as a column of row vectors, and vice versa:

A = [ A_⋆1 ⋯ A_⋆n ]   implies   A^t = [ A_⋆1^t ]
                                      [    ⋮   ]
                                      [ A_⋆n^t ]

A = [ A_1⋆ ]          implies   A^t = [ A_1⋆^t ⋯ A_m⋆^t ].
    [   ⋮  ]
    [ A_m⋆ ]
For example,

A = [ 1 2 3 ]   implies   A^t = [ 1 2 ]
    [ 2 3 1 ]                   [ 2 3 ]
                                [ 3 1 ]
It then follows that the transpose of the transpose is the original matrix. The following are some properties of the transpose operation.
1. (A^t)^t = A.
2. (A + B)^t = A^t + B^t.
3. (αA)^t = α A^t.
4. (AB)^t = B^t A^t.
5. If A is invertible, then A^t is invertible, and (A^t)^{-1} = (A^{-1})^t.
In the above properties, we assume that the operations are defined; that is, in (2), A and B must be of the same size; in (4), the number of columns in A must equal the number of rows in B; and in (5), A must be a square matrix.
It is easy to see all the above properties, except perhaps the fourth. For this, let A ∈ F^{m×n} and B ∈ F^{n×r}. Now, the (j, i)th entry in (AB)^t is the (i, j)th entry in AB, which is

a_i1 b_1j + ⋯ + a_in b_nj.

On the other side, the (j, i)th entry in B^t A^t is obtained by multiplying the jth row of B^t with the ith column of A^t. This is the same as multiplying the entries in the jth column of B with the corresponding entries in the ith row of A, and then taking the sum. Thus it is

b_1j a_i1 + ⋯ + b_nj a_in.

This is the same as computed earlier.
The fifth property follows from the fourth and the fact that (AB)^{-1} = B^{-1}A^{-1}.
Observe that the transpose of a lower triangular matrix is an upper triangular matrix, and vice versa.
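The transpose rules, in particular (AB)^t = B^t A^t, can be verified on small matrices (NumPy assumed; `.T` is NumPy's transpose):

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 3, 1]])      # 2x3
B = np.array([[1, 0], [2, 1], [0, 3]])    # 3x2

assert (A.T.T == A).all()                                   # (A^t)^t = A
assert (A.T == np.array([[1, 2], [2, 3], [3, 1]])).all()
assert ((A @ B).T == B.T @ A.T).all()                       # (AB)^t = B^t A^t
```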


Close to the operation of transpose of a matrix is the adjoint. Let A = [a_ij] ∈ F^{m×n}. The adjoint of A is denoted by A*, and is defined by

the (i, j)th entry of A* = the complex conjugate of the (j, i)th entry of A.

We write ᾱ for the complex conjugate of a scalar α; that is, if α = a + ib with a, b real, then ᾱ = a - ib. Thus, if a_ij ∈ R, then its conjugate is a_ij itself; so when A has only real entries, A* = A^t. Also, the ith column of A* is the column vector whose entries are the conjugates of a_i1, …, a_in. For example,

A = [ 1 2 3 ]   implies   A* = [ 1 2 ]
    [ 2 3 1 ]                  [ 2 3 ]
                               [ 3 1 ]

A = [ 1+i 2  3  ]   implies   A* = [ 1-i  2  ]
    [  2  3 1-i ]                  [  2   3  ]
                                   [  3  1+i ]
Similar to the transpose, the adjoint satisfies the following properties:
1. (A*)* = A.
2. (A + B)* = A* + B*.
3. (αA)* = ᾱ A*.
4. (AB)* = B* A*.
5. If A is invertible, then A* is invertible, and (A*)^{-1} = (A^{-1})*.
Here also, in (2), the matrices A and B must be of the same size, and in (4), the number of columns in A must equal the number of rows in B. The adjoint of A is also called the conjugate transpose of A. Notice that if A ∈ R^{m×n}, then A* = A^t.
Occasionally, we will write Ā for the matrix obtained from A by taking the complex conjugate of each entry; that is, the (i, j)th entry of Ā is the complex conjugate of the (i, j)th entry of A. Hence A* = (Ā)^t.
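In NumPy (assumed here), the adjoint is the conjugate transpose `A.conj().T`; the sketch below checks it on a small complex matrix, together with the rule (AB)* = B* A*:

```python
import numpy as np

A = np.array([[1 + 1j, 2, 3],
              [2, 3, 1 - 1j]])

A_star = A.conj().T          # the adjoint (conjugate transpose)
assert (A_star == np.array([[1 - 1j, 2],
                            [2, 3],
                            [3, 1 + 1j]])).all()

B = np.array([[1j, 0], [1, 2 - 1j], [0, 3]])
assert ((A @ B).conj().T == B.conj().T @ A.conj().T).all()   # (AB)* = B* A*
```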

Exercises for 1.3

1. Determine A^t, Ā, A*, A*A and AA*, where

(a) A = [ 1 2 3 1 ]      (b) A = [  1    2+i  3-i ]
        [ 2 1 0 3 ]              [  i    1-i  2i  ]
        [ 0 1 3 1 ]              [ 1+3i   i    3  ]
                                 [  2     0    i  ]

2. Let A ∈ C^{m×n}. Suppose AA* = I_m. Does it follow that A*A = I_n?

1.4  Elementary row operations

Recall that while solving linear equations in two or three variables, you try to eliminate a variable from all but one equation, by adding one equation to another, or by adding a constant times one equation to another. We perform similar operations on the rows of a matrix. These are achieved by multiplying the given matrix with some special matrices, called elementary matrices.
Let e_1, …, e_m ∈ F^{m×1} be the standard basis vectors, and let 1 ≤ i, j ≤ m. The product e_i e_j^t is an m×m matrix whose (i, j)th entry is 1 and all other entries are 0. We write such a matrix as E_ij. For instance, when m = 3, we have

e_2 e_3^t = [ 0 ]             [ 0 0 0 ]
            [ 1 ] [ 0 0 1 ] = [ 0 0 1 ] = E_23.
            [ 0 ]             [ 0 0 0 ]


An elementary matrix of order m is one of the following three types:
1. E[i, j] := I - E_ii - E_jj + E_ij + E_ji, with i ≠ j.
2. E_α[i] := I - E_ii + α E_ii, where α is a nonzero scalar.
3. E_α[i, j] := I + α E_ij, where α is a nonzero scalar and i ≠ j.
Here, I is the identity matrix of order m. As earlier, the order of the elementary matrices will be understood from the context; we will not show it in our symbolism.
Example 1.2
The following are instances of elementary matrices of order 3.

E[1,2] = [ 0 1 0 ]    E_{-1}[2] = [ 1  0 0 ]    E_{-2}[3,1] = [  1 0 0 ]
         [ 1 0 0 ]                [ 0 -1 0 ]                  [  0 1 0 ]
         [ 0 0 1 ]                [ 0  0 1 ]                  [ -2 0 1 ]

We observe that for a matrix A ∈ F^{m×n}, the following are true:
1. E[i, j] A is the matrix obtained from A by exchanging its ith and jth rows.
2. E_α[i] A is the matrix obtained from A by replacing its ith row with α times the ith row.
3. E_α[i, j] A is the matrix obtained from A by replacing its ith row with the ith row plus α times the jth row.
We call these operations of pre-multiplying a matrix with an elementary matrix elementary row operations. Thus there are three kinds of elementary row operations, as listed above. Sometimes we will refer to them as operations of Type-1, 2, or 3, respectively. Also, in computations, we will write

A --E--> B

to mean that the matrix B has been obtained from A by the elementary row operation E, that is, B = EA.
Example 1.3
See the following applications of elementary row operations:

[ 1 1 1 ]                 [ 1 1 1 ]                 [ 1 1 1 ]
[ 2 2 2 ] --E_{-3}[3,1]-> [ 2 2 2 ] --E_{-2}[2,1]-> [ 0 0 0 ]
[ 3 3 3 ]                 [ 0 0 0 ]                 [ 0 0 0 ]

Often we will apply elementary row operations in a sequence. In this way, the above operations could be shown in one step as E_{-3}[3,1], E_{-2}[2,1]. However, remember that the result of applying this sequence of elementary row operations on a matrix A is E_{-2}[2,1] E_{-3}[3,1] A; the products are in reverse order.
Elementary row operations can be undone by other elementary row operations. The reason: each elementary matrix is invertible. In fact, the inverses of the elementary matrices are as follows:

(E[i, j])^{-1} = E[i, j],   (E_α[i])^{-1} = E_{1/α}[i],   (E_α[i, j])^{-1} = E_{-α}[i, j].

Therefore, applying a sequence of elementary row operations on a matrix A amounts to pre-multiplying A with a suitable invertible matrix.
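The three types of elementary matrices and their inverses can be sketched in code; NumPy is assumed, and the helper names `E_swap`, `E_scale`, `E_add` are hypothetical conveniences, not notation from the notes (rows are 0-based here):

```python
import numpy as np

def E_swap(m, i, j):
    """E[i,j]: the identity with rows i and j exchanged."""
    E = np.eye(m); E[[i, j]] = E[[j, i]]; return E

def E_scale(m, i, alpha):
    """E_alpha[i]: the identity with row i multiplied by alpha != 0."""
    E = np.eye(m); E[i, i] = alpha; return E

def E_add(m, i, j, alpha):
    """E_alpha[i,j]: the identity plus alpha at position (i, j), i != j."""
    E = np.eye(m); E[i, j] = alpha; return E

A = np.array([[1.0, 1, 1], [2, 2, 2], [3, 3, 3]])

# the two operations of Example 1.3, in sequence
B = E_add(3, 2, 0, -3.0) @ A        # row 3 := row 3 - 3 * row 1
C = E_add(3, 1, 0, -2.0) @ B        # row 2 := row 2 - 2 * row 1
assert (C == np.array([[1, 1, 1], [0, 0, 0], [0, 0, 0]])).all()

# each elementary matrix has an elementary inverse
assert np.allclose(np.linalg.inv(E_swap(3, 0, 1)), E_swap(3, 0, 1))
assert np.allclose(np.linalg.inv(E_scale(3, 1, 2.0)), E_scale(3, 1, 0.5))
assert np.allclose(np.linalg.inv(E_add(3, 0, 2, 5.0)), E_add(3, 0, 2, -5.0))
```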

Exercises for 1.4

1. Compute E[2,3] A, E_i[2] A, E_{1/2}[1,3] A and E_i[1,2] A for the matrices A of (a) and (b) below:

(a) A = [ 1 2 3 1 ]      (b) A = [  1    2+i  3-i ]
        [ 2 1 0 3 ]              [  i    1-i  2i  ]
        [ 0 1 3 1 ]              [ 1+3i   i    3  ]
                                 [  2     0    i  ]

2. Argue in general terms why the following are true:
(a) E[i, j] A is the matrix obtained from A by exchanging its ith and jth rows.
(b) E_α[i] A is the matrix obtained from A by replacing its ith row with α times the ith row.
(c) E_α[i, j] A is the matrix obtained from A by replacing its ith row with the ith row plus α times the jth row.
3. Describe A E[i, j], A E_α[i] and A E_α[i, j] as to how they are obtained from A.

1.5  Row reduced echelon form

Elementary row operations can be used to reduce a matrix to a nice form, bringing in many zero entries. Recall that this corresponds to eliminating a variable from an equation of a linear system. The first (from the left) nonzero entry in a nonzero row of a matrix is called a pivot. We denote a pivot in a row by putting a box around it. A column where a pivot occurs is called a pivotal column; a row where a pivot occurs is called a pivotal row.
A matrix A ∈ F^{m×n} is said to be in row reduced echelon form (RREF) iff the following conditions are satisfied:
(1) Each pivot is equal to 1.
(2) In a pivotal column, all entries other than the pivot are zero.
(3) The row index of each pivotal row is smaller than the row index of each zero row.
(4) If the ith row and the (i+k)th row are pivotal rows, for i, k ≥ 1, then the column index of the pivot in the (i+k)th row is greater than the column index of the pivot in the ith row.
Example 1.4
The matrices

[ 0 0 ]   [ 0 1 3 0 ]   [ 1 2 0 ]   [ 1 0 ]
[ 0 0 ] , [ 0 0 0 1 ] , [ 0 0 1 ] , [ 0 1 ]
          [ 0 0 0 0 ]   [ 0 0 0 ]   [ 0 0 ]
                                    [ 0 0 ]

are in row reduced echelon form, whereas the matrices

[ 1 3 0 ]
[ 0 0 2 ]   [ 0 0 0 0 ]   [ 0 1 ]   [ 0 1 3 1 ]
[ 0 0 0 ] , [ 0 0 0 1 ] , [ 1 0 ] , [ 0 0 0 1 ]
[ 0 0 0 ]                           [ 0 0 0 0 ]

are not in row reduced echelon form: in the first, a pivot is not equal to 1; in the second, a zero row precedes a pivotal row; in the third, the pivot of the second row does not occur to the right of the pivot of the first row; and in the fourth, a pivotal column (the fourth) contains a nonzero entry other than its pivot.


Observe that a (single) column vector in row reduced echelon form is either the zero vector or e_1. Similarly, a row vector in row reduced echelon form is either a zero row or a row whose first nonzero entry is 1.
If a matrix in RREF has k pivotal columns, then those columns occur in the matrix as e_1, …, e_k, read from left to right, though there can be other columns in between these pivotal columns.
Any matrix can be brought to a row reduced echelon form by using elementary row operations. We give an algorithm to achieve this.

Reduction to Row Reduced Echelon Form
1. Set the work region R as the whole matrix A.
2. If all entries in R are 0, then stop.
3. If there are nonzero entries in R, then find the leftmost nonzero column. Mark it as the pivotal column.
4. Find the topmost nonzero entry in the pivotal column. Box it; it is a pivot.
5. If the pivot is not on the top row of R, then exchange the row of A which contains the top row of R with the row where the pivot is.
6. If the pivot, say α, is not equal to 1, then replace the top row of R in A by 1/α times that row.
7. Make all entries in the pivotal column, except the pivot, zero by adding suitable multiples of the top row of R to the rows of A above and below it.
8. Find the sub-matrix to the right of and below the pivot. If no such sub-matrix exists, then stop. Else, reset the work region R to this sub-matrix, and go to Step 2.
We will refer to the output of the above reduction algorithm as the row reduced
echelon form (the RREF) of a given matrix.
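The reduction algorithm above can be sketched in code. NumPy is assumed (not part of the notes), and a floating-point tolerance stands in for exact zero tests:

```python
import numpy as np

def rref(A, tol=1e-12):
    """Row reduce A following the algorithm above; returns the RREF."""
    R = A.astype(float).copy()
    m, n = R.shape
    pivot_row = 0
    for col in range(n):  # scan for the leftmost nonzero column of the region
        offsets = np.where(np.abs(R[pivot_row:, col]) > tol)[0]
        if offsets.size == 0:
            continue                       # no pivot in this column
        top = pivot_row + offsets[0]       # topmost nonzero entry: the pivot
        R[[pivot_row, top]] = R[[top, pivot_row]]   # step 5: row exchange
        R[pivot_row] /= R[pivot_row, col]           # step 6: make the pivot 1
        for r in range(m):                 # step 7: clear the pivotal column
            if r != pivot_row:
                R[r] -= R[r, col] * R[pivot_row]
        pivot_row += 1
        if pivot_row == m:
            break
    return R

A = np.array([[1.0, 1, 2, 0],
              [3, 5, 7, 1],
              [1, 5, 4, 5],
              [2, 8, 7, 9]])
B = rref(A)
```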
Example 1.5

    [ 1 1 2 0 ]        [ 1 1 2 0 ]                [ 1 1  2   0  ]
A = [ 3 5 7 1 ]  --R1->[ 0 2 1 1 ]  --E_{1/2}[2]->[ 0 1 1/2 1/2 ]
    [ 1 5 4 5 ]        [ 0 4 2 5 ]                [ 0 4  2   5  ]
    [ 2 8 7 9 ]        [ 0 6 3 9 ]                [ 0 6  3   9  ]

       [ 1 0 3/2 -1/2 ]                [ 1 0 3/2 -1/2 ]        [ 1 0 3/2 0 ]
 --R2->[ 0 1 1/2  1/2 ]  --E_{1/3}[3]->[ 0 1 1/2  1/2 ]  --R3->[ 0 1 1/2 0 ] = B
       [ 0 0  0    3  ]                [ 0 0  0    1  ]        [ 0 0  0  1 ]
       [ 0 0  0    6  ]                [ 0 0  0    6  ]        [ 0 0  0  0 ]

Here, R1 = E_{-3}[2,1], E_{-1}[3,1], E_{-2}[4,1]; R2 = E_{-1}[1,2], E_{-4}[3,2], E_{-6}[4,2]; and R3 = E_{1/2}[1,3], E_{-1/2}[2,3], E_{-6}[4,3]. The matrix B is the RREF of A; and

B = E_{-6}[4,3] E_{-1/2}[2,3] E_{1/2}[1,3] E_{1/3}[3] E_{-6}[4,2] E_{-4}[3,2] E_{-1}[1,2] E_{1/2}[2] E_{-2}[4,1] E_{-1}[3,1] E_{-3}[2,1] A.

The products are in reverse order.


The RREF of an m×n matrix is special. We try to see what information can be derived from such a special form. Let A have the columns u_1, …, u_n; these are column vectors from F^{m×1}. That is,

A = [ u_1 u_2 ⋯ u_n ].

Let B be the RREF of A obtained by applying a sequence of elementary row operations. Let E be the m×m invertible matrix (the product of the corresponding elementary matrices) so that

EA = E[ u_1 u_2 ⋯ u_n ] = B.

Suppose the number of pivots in B is r. Then the standard basis vectors e_1, …, e_r of F^{m×1} occur as the pivotal columns in B. Denote the n - r non-pivotal columns in B as v_1, …, v_{n-r}. In B, the columns e_1, …, e_r, v_1, …, v_{n-r} occur in some order. The following observations are immediate from the above equations.
Observation 1: If e_i occurs as the jth column in B, then Eu_j = e_i.
Observation 2: If v_i occurs as the jth column in B, then Eu_j = v_i.
Notice that in B, the vectors e_1, …, e_r occur in that order, though some vectors v_i may occur between them; look at Example 1.5. If v_i occurs between e_j and e_{j+1}, then v_i has zero entries beyond the jth position. We then observe the following.
Observation 3: In B, if a vector v_i occurs between the standard basis vectors e_j and e_{j+1}, then v_i = [a_1 a_2 ⋯ a_j 0 ⋯ 0]^t = a_1 e_1 + ⋯ + a_j e_j for some a_1, …, a_j ∈ F.
If e_1 occurs as the k_1th column, e_2 occurs as the k_2th column, and so on, then v_i = a_1 Eu_{k_1} + ⋯ + a_j Eu_{k_j}. That is,

E^{-1} v_i = a_1 u_{k_1} + ⋯ + a_j u_{k_j}.

However, E^{-1} v_i is the corresponding column of A. Thus we observe the following.
Observation 4: In B, if a vector v_i = [a_1 a_2 ⋯ a_j 0 ⋯ 0]^t occurs as the kth column, and prior to it occur the standard basis vectors e_1, …, e_j (and no others) in the columns k_1, …, k_j, respectively, then u_k = a_1 u_{k_1} + ⋯ + a_j u_{k_j}.
It thus follows that each column of A can be written as b_1 u_{k_1} + ⋯ + b_r u_{k_r} for some b_1, …, b_r ∈ F, where k_1, …, k_r are the column indices of the pivotal columns in B. Thus we have the following observation.
Observation 5: If v is any vector expressible in the form v = α_1 u_1 + ⋯ + α_n u_n, and k_1, …, k_r are the column indices of the pivotal columns in B, then there are scalars β_1, …, β_r ∈ F such that v = β_1 u_{k_1} + ⋯ + β_r u_{k_r}. Moreover, u_{k_i} is not expressible in the form γ_1 u_{k_1} + ⋯ + γ_{i-1} u_{k_{i-1}} + γ_{i+1} u_{k_{i+1}} + ⋯ + γ_r u_{k_r} for any γ_j ∈ F.
Notice that v = β_1 u_{k_1} + ⋯ + β_r u_{k_r} = β_1 E^{-1} e_1 + ⋯ + β_r E^{-1} e_r. Since e_{r+1} is not expressible in the form α_1 e_1 + ⋯ + α_r e_r, we see that E^{-1} e_{r+1} is not expressible in

18

Elementary Matrix Theory
the form δ1 E⁻¹e1 + · · · + δr E⁻¹er. Due to Observation 5, we conclude the following.
Observation 6: If the number of pivots r in B is less than m, then the vector E⁻¹er+k, for 1 ≤ k ≤ m − r, is not expressible in the form α1 u1 + · · · + αn un.
In B, the m − r bottom rows are zero rows. They have been produced by elementary row operations using the pivotal rows. Monitoring the row exchanges that have been applied on A to arrive at B, we see that the zero rows correspond to some m − r rows of A. Therefore, similar to Observation 5, we find the following.
Observation 7: Let wk1, . . . , wkr be the rows of A which have become the pivotal rows in B. If w is any other row of A, then there exist scalars β1, . . . , βr such that w = β1 wk1 + · · · + βr wkr. Moreover, wki ≠ β1 wk1 + · · · + βi−1 wki−1 + βi+1 wki+1 + · · · + βr wkr for any scalars βj ∈ F.
For vectors in Fn, we say that v is a linear combination of v1, . . . , vm if there exist scalars ai ∈ F such that v = a1 v1 + · · · + am vm. Suppose the number of pivots in the RREF of A ∈ Fm×n is r. Then Observations 5 and 7 imply that there exist exactly r columns in A such that each of the other n − r columns is a linear combination of these r columns, and none of these r columns is a linear combination of the other r − 1 such columns. These r columns correspond to the pivotal columns in the RREF of A. Similarly, there exist r rows of A such that each of the other m − r rows is a linear combination of these r rows, and none of these r rows is a linear combination of the other r − 1 such rows. Again, these r rows correspond to the nonzero rows in the RREF of A, monitoring the row exchanges.
The row reduced echelon form of a matrix is canonical, in the following sense.
Theorem 1.1
Let A ∈ Fm×n. There exists a unique matrix in Fm×n in row reduced echelon form obtained from A by elementary row operations.
Proof Suppose B, C ∈ Fm×n are matrices in RREF such that each has been obtained from A by elementary row operations. Then B = E1A and C = E2A for some invertible matrices E1, E2 ∈ Fm×m. Now, B = E1A = E1(E2)⁻¹C. Write E = E1(E2)⁻¹ to have B = EC, where E is invertible.
Assume, on the contrary, that B ≠ C. Then there exists a column index, say k ≥ 1, such that the first k − 1 columns of B coincide with the first k − 1 columns of C, respectively, and the kth column of B is not equal to the kth column of C. Let u be the kth column of B, and let v be the kth column of C. We have u = Ev and u ≠ v.
Suppose the pivotal columns that appear within the first k − 1 columns in C, and also in B, are e1, . . . , ej. Since B = EC, we have

e1 = Ee1 = E⁻¹e1, . . . , ej = Eej = E⁻¹ej.
Matrix Operations
Since B is in RREF, either u = ej+1 or there exist scalars α1, . . . , αj such that u = α1 e1 + · · · + αj ej. The latter case includes the possibility that u = 0. (If none of the first k columns in B is a pivotal column, we take u = 0.) Similarly, C being in RREF implies that either v = ej+1 or v = β1 e1 + · · · + βj ej for some scalars β1, . . . , βj. We consider the following exhaustive cases.
If u = ej+1 and v = ej+1, then u = v.
If v = β1 e1 + · · · + βj ej (whether u = ej+1 or u = α1 e1 + · · · + αj ej), then

u = Ev = β1 Ee1 + · · · + βj Eej = β1 e1 + · · · + βj ej = v.

If u = α1 e1 + · · · + αj ej and v = ej+1, then

v = E⁻¹u = α1 E⁻¹e1 + · · · + αj E⁻¹ej = α1 e1 + · · · + αj ej = u.

In each case, u = v; this is a contradiction. Therefore, B = C.
Theorem 1.1 justifies our use of the term the RREF of a matrix. Given a matrix,
it does not matter whether you compute its RREF by following our algorithm or any
other algorithm; the end result is the same matrix in RREF.
Exercises for 1.5
1. Compute the row reduced echelon forms of the following matrices:

       0 0 1        2 1 1 0        1 2 1 1
   (a) 0 1 0    (b) 1 0 1 4    (c) 0 2 3 3
       1 0 0        0 1 1 4        1 1 3 4
                                   1 1 5 2
2. Argue why our algorithm for reducing a matrix to its RREF gives a unique
output.
3. In Example 1.5, let ui be the ith column and let wj be the jth row of A.
(a) Compute the matrix X so that XA is in RREF.
(b) Verify that Xu2 = e2.
(c) Find a, b ∈ R such that u3 = au1 + bu2, using the RREF of A.
(d) Determine a, b, c ∈ R such that w4 = aw1 + bw2 + cw3, using the RREF reduction of A.
4. Construct v ∈ R4 which is not expressible as av1 + bv2 + cv3 + dv4 for any a, b, c, d ∈ R, where v1 = (1, 2, 3, 4), v2 = (2, 0, 1, 1), v3 = (3, 2, 1, 2) and v4 = (1, 2, 2, 3). (Hint: take A = [v1t v2t v3t v4t]. Compute its RREF and use Observation 6.)
1.6 Determinant
There are two important quantities associated with a square matrix. One is the trace
and the other is the determinant.
The sum of all diagonal entries of a square matrix is called the trace of the matrix. That is, if A = [aij] ∈ Fn×n, then

tr(A) = a11 + · · · + ann = ∑_{k=1}^{n} akk.

In addition to tr(In) = n and tr(0) = 0, the trace satisfies the following properties:

1. tr(αA) = α tr(A) for each α ∈ F.
2. tr(At) = tr(A), and tr(A∗) is the complex conjugate of tr(A).
3. tr(A + B) = tr(A) + tr(B) and tr(AB) = tr(BA).
4. tr(A∗A) = 0 iff tr(AA∗) = 0 iff A = 0.

Observe that tr(A∗A) = ∑_{i=1}^{n} ∑_{j=1}^{n} |aij|² = tr(AA∗). From this, (4) follows.
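Properties (3) and (4) are easy to check numerically. Below is a minimal sketch with matrices as plain Python lists of lists; the helper names (trace, matmul, adjoint) are ours, not from any library.

```python
def trace(M):
    # sum of the diagonal entries
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(A):
    # conjugate transpose A*; int, float and complex all support .conjugate()
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

A = [[1, 2 + 1j], [0, 3]]
B = [[2, 1], [1, 1]]

assert trace(matmul(A, B)) == trace(matmul(B, A))      # tr(AB) = tr(BA)

# tr(A*A) equals the sum of |a_ij|^2, computed exactly as conj(a) * a
assert trace(matmul(adjoint(A), A)) == sum(
    A[i][j].conjugate() * A[i][j] for i in range(2) for j in range(2))
```

For this particular A, both sides of the second assertion evaluate to 15.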

The second quantity, called the determinant of a square matrix A = [aij] ∈ Fn×n, written as det(A), is defined inductively as follows:

If n = 1, then det(A) = a11.
If n > 1, then det(A) = ∑_{j=1}^{n} (−1)^{1+j} a1j det(A1j),

where the matrix A1j ∈ F(n−1)×(n−1) is obtained from A by deleting the first row and the jth column of A.
When A = [aij] is written showing all its entries, we also write det(A) by replacing the two big brackets [ and ] by two vertical bars | and |. For a 2×2 matrix, the determinant is seen as follows:
| a11 a12 |
| a21 a22 | = (−1)^{1+1} a11 det[a22] + (−1)^{1+2} a12 det[a21] = a11 a22 − a12 a21.
Similarly, for a 3×3 matrix, we need to compute three 2×2 determinants. For example,

| 1 2 3 |
| 2 3 1 |
| 3 1 2 |

= (−1)^{1+1} · 1 · | 3 1 |  +  (−1)^{1+2} · 2 · | 2 1 |  +  (−1)^{1+3} · 3 · | 2 3 |
                   | 1 2 |                      | 3 2 |                      | 3 1 |

= 1 · (3·2 − 1·1) − 2 · (2·2 − 1·3) + 3 · (2·1 − 3·3)
= 5 − 2·1 + 3·(−7) = −18.
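This inductive definition translates directly into code. A sketch (the recursion is exponential in n, so it is only practical for small matrices; `det` and `minor` are our own names):

```python
def minor(A, i, j):
    # delete row i and column j (0-indexed)
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    # expansion along the first row: sum over j of (-1)^(1+j) a_1j det(A_1j)
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3
assert det([[1, 2, 3], [2, 3, 1], [3, 1, 2]]) == -18   # the 3x3 computation above
```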
For a lower triangular matrix, expanding along the first row at each step, we see that

| a11                 |
| a21  a22            |          | a22             |
| a31  a32  a33       |  = a11 · | a32  a33        |  = · · · = a11 a22 · · · ann,
|  .     .    .   .   |          |  .    .    .    |
| an1  an2  an3 · · ann|          | an2  an3 · · ann |

since deleting the first row and the first column of a lower triangular matrix leaves another lower triangular matrix.
In general, the determinant of any triangular matrix (upper or lower) is the product of its diagonal entries. In particular, the determinant of a diagonal matrix is also the product of its diagonal entries. Thus, if I is the identity matrix of order n, then det(I) = 1 and det(−I) = (−1)ⁿ.
Our definition of the determinant expands it along the first row. In fact, the same result is obtained by expanding along any other row, or even along any column. Along with this, some more properties of the determinant are listed in the following.
Let A ∈ Fn×n. The sub-matrix of A obtained by deleting the ith row and the jth column is called the (i, j)th minor of A, and is denoted by Aij. The (i, j)th co-factor of A is (−1)^{i+j} det(Aij); it is denoted by Cij(A). Sometimes, when the matrix A is fixed in a context, we write Cij(A) as Cij. The adjugate of A is the n×n matrix obtained by taking the transpose of the matrix whose (i, j)th entry is Cij(A); it is denoted by adj(A). That is, adj(A) ∈ Fn×n is the matrix whose (i, j)th entry is the (j, i)th co-factor Cji(A). Also, we write Ai(x) for the matrix obtained from A by replacing its ith row by a row vector x of appropriate size.
Let A ∈ Fn×n. Let i, j, k ∈ {1, . . . , n}. Let E[i, j], Eα[i] and Eα[i, j] be the elementary matrices of order n, with 1 ≤ i ≠ j ≤ n and α ≠ 0 a scalar. Then the following statements are true.
1. det(E[i, j] A) = −det(A).
2. det(Eα[i] A) = α det(A).
3. det(Eα[i, j] A) = det(A).
4. If some row of A is the zero vector, then det(A) = 0.
5. If one row of A is a scalar multiple of another row, then det(A) = 0.
6. For any i {1, . . . , n}, det( Ai (x + y) ) = det( Ai (x) ) + det( Ai (y) ).
7. det(At ) = det(A).
8. If A is a triangular matrix, then det(A) is equal to the product of the diagonal
entries of A.
9. det(AB) = det(A) det(B) for any matrix B ∈ Fn×n.
10. det(A∗) is the complex conjugate of det(A).
11. A adj(A) = adj(A)A = det(A) I.
12. A is invertible iff det(A) ≠ 0.
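The cofactor and adjugate definitions, together with the identity A adj(A) = adj(A) A = det(A) I from item (11), can be checked directly in code. A sketch reusing the recursive first-row expansion (all function names are ours):

```python
def submatrix(A, i, j):
    # delete row i and column j (0-indexed): the (i, j)th minor's matrix
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j))
               for j in range(len(A)))

def adjugate(A):
    n = len(A)
    # the (i, j)th entry of adj(A) is the (j, i)th cofactor C_ji(A)
    return [[(-1) ** (i + j) * det(submatrix(A, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
d = det(A)
scaled_identity = [[d if i == j else 0 for j in range(3)] for i in range(3)]
assert matmul(A, adjugate(A)) == scaled_identity   # A adj(A) = det(A) I
assert matmul(adjugate(A), A) == scaled_identity   # adj(A) A = det(A) I
```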
Elementary column operations are similar to row operations, but they act on columns instead of rows. Notice that since det(At) = det(A), the facts concerning elementary row operations also hold true if elementary column operations are used. Using elementary operations, the computational complexity of evaluating a determinant can be reduced drastically. The trick is to bring the matrix to a triangular form by using elementary row operations, so that the determinant of the triangular matrix can be computed easily.
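A sketch of this trick with exact Fraction arithmetic: operations of the form Eα[i, j] preserve the determinant, each row exchange flips its sign, and the determinant of the final upper triangular matrix is the product of its diagonal entries. The 4×4 test matrix is chosen for illustration.

```python
from fractions import Fraction

def det_by_elimination(A):
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    sign = 1
    for c in range(n):
        # find a nonzero pivot in column c, exchanging rows if necessary
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)          # a pivot-free column: determinant is 0
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign                # a row exchange flips the sign
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]               # product of the diagonal entries
    return result

A = [[1, -1, -1, -1],
     [1,  1, -1, -1],
     [1,  1,  1, -1],
     [1,  1,  1,  1]]
assert det_by_elimination(A) == 8
assert det_by_elimination([[0, 1], [1, 0]]) == -1
```

This is O(n³) arithmetic operations, against the n! terms of the raw expansion.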
Example 1.6

| 1 −1 −1 −1 |    | 1 −1 −1 −1 |    | 1 −1 −1 −1 |    | 1 −1 −1 −1 |
| 1  1 −1 −1 | R1 | 0  2  0  0 | R2 | 0  2  0  0 | R3 | 0  2  0  0 |
| 1  1  1 −1 |  = | 0  2  2  0 |  = | 0  0  2  0 |  = | 0  0  2  0 |  = 8.
| 1  1  1  1 |    | 0  2  2  2 |    | 0  0  2  2 |    | 0  0  0  2 |

Here, R1 = E−1[2, 1]; E−1[3, 1]; E−1[4, 1], R2 = E−1[3, 2]; E−1[4, 2], and R3 = E−1[4, 3].
Finally, the upper triangular matrix has the required determinant: 1 · 2 · 2 · 2 = 8.
Example 1.7
Property (6) above may be verified on first rows: since [3, 1, 2, 4] = [1, 0, 0, 1] + [2, 1, 2, 3], we have

det(A1([3, 1, 2, 4])) = det(A1([1, 0, 0, 1])) + det(A1([2, 1, 2, 3]))

for any 4×4 matrix A, where Ai(x) denotes A with its ith row replaced by x and all other rows unchanged.
Exercises for 1.6
1. Construct an n×n nonzero matrix where no row is a scalar multiple of another row, but whose determinant is 0.
2. Let A ∈ Cn×n. Show that if tr(A∗A) = 0, then A = 0.
3. Let a1, . . . , an ∈ C. Let A be the n×n matrix whose first row has all entries as 1 and whose kth row has entries a1^{k−1}, . . . , an^{k−1}, in that order. Show that det(A) = ∏_{i<j} (aj − ai).
4. Let A be an n×n matrix with integer entries. Prove that if det(A) = 1, then A⁻¹ has only integer entries.
5. Determine A⁻¹ using adj(A), where

       1 0 0 0
   A = 1 1 0 0
       1 1 1 0
       1 1 1 1

6. Compute the determinant of the n×n matrix whose anti-diagonal entries are all 1 and all other entries are 0.

1.7 Computing inverse of a matrix
The adjugate property of the determinant provides a way to compute the inverse of a matrix, provided it is invertible. However, it is very inefficient. We may instead use elementary row operations to compute the inverse. Our computation of the inverse is based on the following fact.
Theorem 1.2
A square matrix is invertible iff it is a product of elementary matrices.
Proof Each elementary matrix is invertible since E[i, j] is its own inverse, E1/α[i] is the inverse of Eα[i], and E−α[i, j] is the inverse of Eα[i, j]. Therefore, any product of elementary matrices is invertible.
Conversely, suppose that A is an invertible matrix. Let EA⁻¹ be the RREF of A⁻¹, where E is a product of elementary matrices. If EA⁻¹ has a zero row, then EA⁻¹A also has a zero row. That is, E has a zero row. But E is a product of elementary matrices, which is invertible; it does not have a zero row. Therefore, EA⁻¹ does not have a zero row. Then each row in the square matrix EA⁻¹ has a pivot. But the only square matrix in RREF having a pivot in each row is the identity matrix. Therefore, EA⁻¹ = I. That is, A = E, a product of elementary matrices.
The computation of the inverse is easier if we write the matrix A and the identity matrix I side by side and apply the elementary operations on both of them simultaneously. For this purpose, we introduce the notion of an augmented matrix.
If A ∈ Fm×n and B ∈ Fm×k, then the matrix [A|B] ∈ Fm×(n+k), obtained from A and B by writing first all the columns of A and then the columns of B, in that order, is called an augmented matrix.
The vertical bar shows the separation between the columns of A and those of B, though it is conceptually unnecessary.
For computing the inverse of a matrix, start with the augmented matrix [A|I]. Apply elementary row operations to reduce A to its row reduced echelon form, while simultaneously applying the same operations on the entries of I. This means we pre-multiply the matrix [A|I] with a product B of elementary matrices. In block form, the result is the augmented matrix [BA|BI]. If BA = I, then BI = A⁻¹. That is, the part that contained I originally will give the matrix A⁻¹ after the elementary row operations have been applied. If, after row reduction, it turns out that BA ≠ I, then A is not invertible; this information is a bonus.
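A sketch of this [A|I] procedure with exact Fractions (the function name is ours; it returns None when no pivot can be found in some column, i.e. when A is not invertible):

```python
from fractions import Fraction

def inverse(A):
    n = len(A)
    # build the augmented matrix [A | I]
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return None                       # A is not invertible
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]    # make the pivot 1
        for r in range(n):
            if r != c and M[r][c] != 0:       # zero-out the rest of the column
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]             # the right block is now the inverse

assert inverse([[2, 1], [1, 1]]) == [[1, -1], [-1, 2]]
assert inverse([[1, 2], [2, 4]]) is None      # a singular matrix
```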
Example 1.8
For illustration, consider the following square matrices:

    1  1  2  0          1  1  2  0
A = 1  0  0  2 ,    B = 1  0  0  2 .
    2 −1 −1  2          2 −1 −1  2
    1  2  4 −2          0  2  0 −2

We want to find the inverses of the matrices, if at all they are invertible.
Augment A with an identity matrix to get

1  1  2  0 | 1 0 0 0
1  0  0  2 | 0 1 0 0
2 −1 −1  2 | 0 0 1 0
1  2  4 −2 | 0 0 0 1

Use elementary row operations. Since a11 = 1, we leave row(1) untouched. To zero-out the other entries in the first column, we use the sequence of elementary row operations E−1[2, 1], E−2[3, 1], E−1[4, 1] to obtain

1  1  2  0 |  1 0 0 0
0 −1 −2  2 | −1 1 0 0
0 −3 −5  2 | −2 0 1 0
0  1  2 −2 | −1 0 0 1

The pivot is −1 in (2, 2) position. Use E−1[2] to make the pivot 1.

1  1  2  0 |  1  0 0 0
0  1  2 −2 |  1 −1 0 0
0 −3 −5  2 | −2  0 1 0
0  1  2 −2 | −1  0 0 1

Use E−1[1, 2], E3[3, 2], E−1[4, 2] to zero-out all non-pivot entries in the pivotal column:

1  0  0  2 |  0  1 0 0
0  1  2 −2 |  1 −1 0 0
0  0  1 −4 |  1 −3 1 0
0  0  0  0 | −2  1 0 1

Since a zero row has appeared in the A portion of the augmented matrix, we conclude that A is not invertible. You see that the second portion of the augmented matrix has no meaning now. However, it records the elementary row operations which were carried out in the reduction process. Verify that this matrix is equal to

E−1[4, 2] E3[3, 2] E−1[1, 2] E−1[2] E−1[4, 1] E−2[3, 1] E−1[2, 1]

and that the first portion is equal to this matrix times A.
For B, we proceed similarly. The augmented matrix [B|I] is

1  1  2  0 | 1 0 0 0
1  0  0  2 | 0 1 0 0
2 −1 −1  2 | 0 0 1 0
0  2  0 −2 | 0 0 0 1

The sequence of elementary row operations E−1[2, 1]; E−2[3, 1] yields

1  1  2  0 |  1 0 0 0
0 −1 −2  2 | −1 1 0 0
0 −3 −5  2 | −2 0 1 0
0  2  0 −2 |  0 0 0 1

Next, the pivot is −1 in (2, 2) position. Use E−1[2] to get the pivot as 1.

1  1  2  0 |  1  0 0 0
0  1  2 −2 |  1 −1 0 0
0 −3 −5  2 | −2  0 1 0
0  2  0 −2 |  0  0 0 1

And then E−1[1, 2]; E3[3, 2]; E−2[4, 2] gives

1  0  0  2 |  0  1 0 0
0  1  2 −2 |  1 −1 0 0
0  0  1 −4 |  1 −3 1 0
0  0 −4  2 | −2  2 0 1

Next pivot is 1 in (3, 3) position. Now, E−2[2, 3]; E4[4, 3] produces

1  0  0   2 |  0   1  0 0
0  1  0   6 | −1   5 −2 0
0  0  1  −4 |  1  −3  1 0
0  0  0 −14 |  2 −10  4 1

Next pivot is −14 in (4, 4) position. Use E−1/14[4] to get the pivot as 1:

1 0 0  2 |    0     1    0     0
0 1 0  6 |   −1     5   −2     0
0 0 1 −4 |    1    −3    1     0
0 0 0  1 | −1/7   5/7 −2/7 −1/14

Use E−2[1, 4]; E−6[2, 4]; E4[3, 4] to zero-out the entries in the pivotal column:

1 0 0 0 |  2/7 −3/7  4/7   1/7
0 1 0 0 | −1/7  5/7 −2/7   3/7
0 0 1 0 |  3/7 −1/7 −1/7  −2/7
0 0 0 1 | −1/7  5/7 −2/7 −1/14

Thus

            2 −3  4    1
B⁻¹ = 1/7 · −1  5 −2    3
            3 −1 −1   −2
           −1  5 −2 −1/2

Verify that B⁻¹B = BB⁻¹ = I.

Observe that if a matrix is not invertible, then our algorithm for reduction to RREF
produces a pivot in the I portion of the augmented matrix.
Exercises for 1.7
1. Compute the inverses of the following matrices, if possible:

       2 1 2        1 4 6        3 1 1 2
   (a) 1 3 1    (b) 1 1 3    (c) 1 2 0 1
       1 1 2        1 2 3        1 1 2 1
                                 2 1 1 3

2. Let A =
       0  1  0
       0  0  1
       1 −b −c
   where b, c ∈ C. Show that A⁻¹ = bI + cA + A².
3. Show that if a matrix A is upper triangular and invertible, then so is A⁻¹.
4. Show that if a matrix A is lower triangular and invertible, then so is A⁻¹.
5. Show that every n×n matrix can be written as a sum of two invertible matrices.
6. Show that every n×n invertible matrix can be written as a sum of two non-invertible matrices.

2
Rank and Linear Equations

2.1 Linear independence

In the reduction to RREF, why are some rows reduced to zero rows while the others are not? To see what is going on in such a reduction, we need to introduce some more concepts. If A is an m×n matrix with entries from F, then its rows are vectors in F1×n and its columns are vectors in Fm×1. Recall that to talk about row and column vectors at the same time, we write Fn for both F1×n and Fn×1. The elements of Fn are written as (a1, . . . , an). That is, such an n-tuple of numbers from F is interpreted as either a row vector with n components or a column vector with n components, as the case demands.
Let v1, . . . , vm be vectors in Fn. Let α1, . . . , αm ∈ F be scalars. Recall that the vector

α1 v1 + · · · + αm vm

is called a linear combination of v1, . . . , vm.
For example, in F1×2, one linear combination of v1 = [1, 1] and v2 = [1, −1] is as follows:

2[1, 1] + 1[1, −1].

This linear combination evaluates to [3, 1]. Thus [3, 1] is a linear combination of v1, v2.
Is [4, −2] a linear combination of v1 and v2? Yes, since

[4, −2] = 1[1, 1] + 3[1, −1].

In fact, every vector in F1×2 is a linear combination of v1 and v2. Reason:

[a, b] = (a+b)/2 [1, 1] + (a−b)/2 [1, −1].

However, not every vector in F1×2 is a linear combination of [1, 1] and [2, 2]. Reason? Any linear combination of these two vectors is a multiple of [1, 1]. Then [1, 0] is not a linear combination of these two vectors.
Now, you see that a zero row in a row echelon form matrix is a linear combination of earlier rows. Conversely, if a row is a linear combination of earlier rows in any matrix, then in the RREF of the matrix, this row is reduced to a zero row. However, during the reduction process, there can be row exchanges. In that case, instead of talking about a linear combination of earlier rows, we may think of a linear combination of all other rows.
The vectors v1, . . . , vm in Fn are called linearly dependent iff at least one of them is a linear combination of the others. The vectors are called linearly independent iff none of them is a linear combination of the others.
For example, [1, 1], [1, −1], [4, −2] are linearly dependent vectors, whereas [1, 1], [1, −1] are linearly independent vectors in F1×2.
If α1 = · · · = αm = 0, then the linear combination α1 v1 + · · · + αm vm evaluates to 0. That is, the zero vector can always be written as a trivial linear combination.
Suppose the vectors v1, . . . , vm are linearly dependent. Then one of them, say vi, is a linear combination of the others. That is,

vi = α1 v1 + · · · + αi−1 vi−1 + αi+1 vi+1 + · · · + αm vm.

Then

α1 v1 + · · · + αi−1 vi−1 + (−1)vi + αi+1 vi+1 + · · · + αm vm = 0.

Here, we see that a linear combination becomes zero, where at least one of the coefficients, namely the ith one, is nonzero.
Conversely, suppose that we have scalars α1, . . . , αm, not all zero, such that

α1 v1 + · · · + αm vm = 0.

Suppose that the kth scalar αk is nonzero. Then

vk = −(1/αk) (α1 v1 + · · · + αk−1 vk−1 + αk+1 vk+1 + · · · + αm vm).

That is, the vectors v1, . . . , vm are linearly dependent.
Thus we have proved the following:

v1, . . . , vm are linearly dependent
iff α1 v1 + · · · + αm vm = 0 for scalars α1, . . . , αm, not all zero;
iff the zero vector can be written as a non-trivial linear combination of v1, . . . , vm.
The same may be expressed in terms of linear independence.
Theorem 2.1
The vectors v1, . . . , vm ∈ Fn are linearly independent iff for all α1, . . . , αm ∈ F, α1 v1 + · · · + αm vm = 0 implies that α1 = · · · = αm = 0.
Theorem 2.1 provides a way to determine whether a finite number of vectors are linearly independent or not. You start with a linear combination of the given vectors and equate it to 0. Then you must be able to derive that each coefficient in that linear combination is 0. If this is the case, then the given vectors are linearly independent. If it is not possible, then from your work-out you must be able to find a way of expressing one of the vectors as a linear combination of the others, showing that the vectors are linearly dependent.
Example 2.1
Are the vectors [1, 1, 1], [2, 1, 1], [3, 1, 0] linearly independent?
We start with an arbitrary linear combination and equate it to the zero vector.
Solve the resulting linear equations to determine whether all the coefficients
are necessarily 0 or not.
So, let
a[1, 1, 1] + b[2, 1, 1] + c[3, 1, 0] = [0, 0, 0].
Comparing the components, we have
a + 2b + 3c = 0, a + b + c = 0, a + b = 0.
The last two equations imply that c = 0. Substituting in the first, we see that
a + 2b = 0.
This and the equation a + b = 0 give b = 0. Then it follows that a = 0.
We conclude that the given vectors are linearly independent.
Example 2.2
Are the vectors [1, 1, 1], [2, 1, 1], [3, 2, 2] linearly independent?
Clearly, the third one is the sum of the first two. So, the given vectors are
linearly dependent.
To illustrate our method, we start with an arbitrary linear combination and
equate it to the zero vector. We then solve the resulting linear equations to
determine whether all the coefficients are necessarily 0 or not.
So, as earlier, let
a[1, 1, 1] + b[2, 1, 1] + c[3, 2, 2] = [0, 0, 0].
Comparing the components, we have
a + 2b + 3c = 0, a + b + 2c = 0, a + b + 2c = 0.
The last equation is redundant. From the first and the second, we have
b + c = 0.
We may choose b = 1, c = −1 to satisfy this equation. Then from the second
equation, we have a = 1. Our starting equation says that the third vector is
the sum of the first two.
Be careful with the direction of implication here. Your work-out must be in the form

α1 v1 + · · · + αm vm = 0 ⟹ α1 = · · · = αm = 0.

And that would prove linear independence.
To see how linear independence is helpful, consider the following system of linear equations:

 x1 + 2x2 − 3x3 = 2
2x1 −  x2 + 2x3 = 3
4x1 + 3x2 − 4x3 = 7

Here, we find that the third equation is redundant, since 2 times the first plus the second gives the third. That is, the third one linearly depends on the first two. (You can of course choose any other equation here as linearly depending on the other two, but that is not important.) Now, take the row vectors of coefficients of the unknowns along with the right hand sides, as in the following:

v1 = [1, 2, −3, 2],    v2 = [2, −1, 2, 3],    v3 = [4, 3, −4, 7].

We see that v3 = 2v1 + v2, as it should be. We see that the vectors v1, v2, v3 are linearly dependent, but the vectors v1, v2 are linearly independent. Thus, solving the given system of linear equations is the same thing as solving the system with only the first two equations. For solving linear systems, it is of primary importance to find out which equations linearly depend on the others. Once determined, such equations can be thrown away, and the rest can be solved.
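This test can be mechanized: v1, . . . , vm are linearly independent iff forward elimination on the matrix with rows v1, . . . , vm produces m pivots, i.e. no zero row. A sketch with exact Fractions (function names ours), checked on the vectors of Examples 2.1 and 2.2:

```python
from fractions import Fraction

def num_pivots(rows):
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = 0, 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots += 1
        r += 1
    return pivots

def independent(vectors):
    return num_pivots(vectors) == len(vectors)

assert independent([[1, 1, 1], [2, 1, 1], [3, 1, 0]])        # Example 2.1
assert not independent([[1, 1, 1], [2, 1, 1], [3, 2, 2]])    # Example 2.2
```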
Exercises for 2.1
1. Check whether the given vectors are linearly independent, in each case:
(a) (1, 2, 6), (1, 3, 4), (1, 4, 2) in R3 .
(b) (1, 0, 2, 1), (1, 3, 2, 1), (4, 1, 2, 2) in C4 .
2. Suppose that u, v, w are linearly independent in C5 . Are the following lists of
vectors linearly independent?
(a) u, αv + w, w, where α is a nonzero complex number.
(b) u + v, v + w, w + u.
(c) u − v, v − w, w − u.
3. Give three linearly dependent vectors in R2 such that none of the three is a
scalar multiple of another.
4. Suppose S is a set of vectors and some v ∈ S is not a linear combination of the other vectors in S. Is S linearly independent?
5. Prove that the nonzero vectors v1, . . . , vm ∈ Fn are linearly dependent iff there exists a vector vk which is a linear combination of v1, . . . , vk−1.
2.2 Determining linear independence

We may use elementary row operations to check linear independence. Given m row vectors v1, . . . , vm ∈ F1×n, we form a matrix A with its ith row as vi. Then, using elementary row operations, we bring it to its RREF. Observe that exchanging vi with vj in the list of vectors does not change linear independence of the vectors. Multiplying vi by a nonzero scalar does not affect linear independence. Also, replacing vi with vi + αvj does not alter linear independence.
To see the last one, suppose v1, . . . , vm are linearly independent. Let wi = vi + αvj, i ≠ j. To show the linear independence of v1, . . . , vi−1, wi, vi+1, . . . , vm, suppose that

β1 v1 + · · · + βi−1 vi−1 + βi wi + βi+1 vi+1 + · · · + βm vm = 0.

Then

β1 v1 + · · · + βi−1 vi−1 + βi (vi + αvj) + βi+1 vi+1 + · · · + βm vm = 0.

Simplifying, we have

β1 v1 + · · · + βi vi + · · · + (βj + αβi)vj + · · · + βm vm = 0.

Using linear independence of v1, . . . , vm, we obtain

β1 = · · · = βi = · · · = βj + αβi = · · · = βm = 0.

This gives βj = βi = 0, and all the other βs are zero. Thus v1, . . . , wi, . . . , vm are linearly independent. Similarly, the converse also holds.
Thus, we take these vectors as the rows of a matrix and apply our reduction to
RREF algorithm. From the RREF, we know that all rows where a pivot occurs are
linearly independent. If you want to determine exactly which vectors among these
are linearly independent, you must keep track of the row exchanges. A summary of
the discussion in terms of a matrix is as follows.
Theorem 2.2
Let v1, . . . , vm ∈ F1×n. Let A ∈ Fm×n be the matrix whose jth row is vj. Then v1, . . . , vm are linearly independent iff the RREF of A has no zero row.
Example 2.3
To determine whether the vectors [1, 1, 0, 1], [0, 1, 1, 1] and [1, 3, 2, 1] are
linearly independent or not, we proceed as follows.
Form the matrix with the three given vectors as its rows and reduce it to its RREF by elementary row operations. The resulting matrix in RREF has a pivot in each of its three rows. Therefore, the original vectors are linearly independent.
Though we have formulated Theorem 2.2 for row vectors, it is applicable to column vectors as well. All that we do is start with the transposes of the column vectors and apply the theorem.
Example 2.4
Are the vectors [1, 1, 0, 1]t, [0, 1, −1, −1]t and [2, −1, 3, 5]t linearly independent?
The vectors are in F4×1. These are linearly independent iff their transposes are. Forming a matrix with the transposes of the given vectors as rows, and reducing it to its RREF, we see that

1  1  0  1             1  1  0  1          1  0  1  2
0  1 −1 −1   E−2[3,1]  0  1 −1 −1    R1    0  1 −1 −1
2 −1  3  5      →      0 −3  3  3    →     0  0  0  0

Here, R1 = E−1[1, 2], E3[3, 2]. Since a zero row has appeared, the original vectors are linearly dependent. Also, notice that no row exchanges were carried out in the reduction process. Therefore, the third vector is a linear combination of the first two vectors, which are linearly independent.
Due to our observations on RREF, we may follow another alternative. Instead of taking transposes of the given column vectors, we proceed with the vectors themselves, thus forming a matrix whose columns are the given vectors. In the RREF of this matrix, if each column is a pivotal column, then the vectors are linearly independent. Moreover, if a column remains non-pivotal, then such a non-pivotal column is a linear combination of the pivotal columns that precede it. The entries in the non-pivotal column give the coefficients of such a linear combination. We solve Example 2.4 once more to illustrate this point.
Example 2.5
To determine whether v1 = [1, 1, 0, 1]t, v2 = [0, 1, −1, −1]t and v3 = [2, −1, 3, 5]t are linearly independent or not, we form the matrix [v1 v2 v3] and then reduce it to its RREF. It is as follows.

1  0  2        1  0  2        1  0  2
1  1 −1   R1   0  1 −3   R2   0  1 −3
0 −1  3   →    0 −1  3   →    0  0  0
1 −1  5        0 −1  3        0  0  0

Here, R1 = E−1[2, 1], E−1[4, 1] and R2 = E1[3, 2], E1[4, 2]. The third column is non-pivotal. Thus, the corresponding vector v3 is a linear combination of the vectors that correspond to the pivotal columns, that is, of v1 and v2. Moreover, the entries in the non-pivotal column say that v3 = 2v1 − 3v2. You can easily verify it.
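This read-off can be done in code: reduce the matrix whose columns are the given vectors to RREF, and read the coefficients from each non-pivotal column. The vectors below are chosen for illustration, with v3 constructed as 2v1 − 3v2, so the coefficients the RREF should recover are known in advance.

```python
from fractions import Fraction

def rref(rows):
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]        # make the pivot 1
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

v1, v2 = [1, 1, 0, 1], [0, 1, -1, -1]
v3 = [2 * a - 3 * b for a, b in zip(v1, v2)]      # v3 = 2 v1 - 3 v2
A = [list(col) for col in zip(v1, v2, v3)]        # matrix with columns v1, v2, v3
R = rref(A)
# the non-pivotal third column of the RREF holds the coefficients 2 and -3
assert [R[0][2], R[1][2]] == [2, -3]
```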
Exercises for 2.2
1. Using elementary row operations determine whether the given vectors are linearly dependent or independent in each of the following cases.
(a) [1, 0, 1, 2, 3], [2, 1, 2, 4, 1], [3, 0, 1, 1, 1], [2, 1, 1, 1, 2].
(b) [1, 0, 1, 2, 3], [2, 1, 2, 4, 1], [3, 0, 1, 1, 1], [2, 1, 0, 7, 3].
(c) [1, i, 1, 1 − i], [i, 1, i, 1 + i], [2, 0, 1, i], [1 + i, 1 − i, 1, i].
2. Let V = R3 ; A = {(1, 2, 3), (4, 5, 6), (7, 8, 9)}. Determine whether A is linearly
dependent and if it is, express one of the vectors in A as a linear combination
of the remaining vectors.

2.3 Rank of a matrix

Suppose B is the RREF of a matrix A. Keeping track of the row exchanges, suppose that the jth row of A has become a zero row in B. In that case, the jth row of A is a linear combination of the other rows. Conversely, if the jth row of A is a linear combination of the other rows, then in B this row becomes a zero row. If B has r pivots, then A has r linearly independent rows, and the other rows are linear combinations of these r rows.
We define the rank of A as the number of pivots in the RREF of A, and denote it by rank(A).
For an m×n matrix A, the number n − rank(A) is called the nullity of the matrix A. The nullity of A is the number of non-pivotal columns in the RREF of A. We will connect the nullity of a matrix to the solutions of the homogeneous linear system Ax = 0 later.
Example 2.6
Let A be the 4×5 matrix below. We compute its RREF as follows:

1 1 1 2 1        1  1 1  2 1        1 0 1  3 1
1 2 1 1 1   R1   0  1 0 −1 0   R2   0 1 0 −1 0
3 5 3 4 3   →    0  2 0 −2 0   →    0 0 0  0 0
1 0 1 3 1        0 −1 0  1 0        0 0 0  0 0

Here, R1 is E−1[2, 1], E−3[3, 1], E−1[4, 1] and R2 is E−1[1, 2], E−2[3, 2], E1[4, 2].
Thus rank(A) = 2.
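Since rank(A) is the number of pivots, forward elimination already computes it. A sketch with exact Fractions (the function name is ours), checked against the matrix of Example 2.6:

```python
from fractions import Fraction

def rank(rows):
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1                     # one more pivot found
    return r

A = [[1, 1, 1, 2, 1],
     [1, 2, 1, 1, 1],
     [3, 5, 3, 4, 3],
     [1, 0, 1, 3, 1]]
assert rank(A) == 2
```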
Example 2.7
Determine the rank of the matrix A in Example 1.5, and point out which rows of A are linear combinations of other rows, and which columns are linear combinations of other columns, by reducing A to its RREF.
From Example 1.5, we have seen that

    1 1 2 0         1 0 3/2 0
A = 3 5 7 1    E    0 1 1/2 0
    1 5 4 5    →    0 0  0  1
    2 8 7 9         0 0  0  0

The row operation E is given by

E = E−3[2, 1], E−1[3, 1], E−2[4, 1], E1/2[2], E−1[1, 2], E−4[3, 2], E−6[4, 2], E1/3[3], E1/2[1, 3], E−1/2[2, 3], E−6[4, 3].

We see that rank(A) = 3, the number of pivots in the RREF of A. In this reduction, no row exchanges have been used. Thus the first three rows of A are the required rows. The fourth row is a linear combination of these three rows. In fact,

row(4) = 3 row(1) + (−1) row(2) + 2 row(3).

The RREF also says that the third column is a linear combination of the first and the second. Notice that the coefficients in such a linear combination are given by the entries of the third column in the RREF. We can easily check that

col(3) = 3/2 col(1) + 1/2 col(2).
Let A ∈ Fm×n. If rank(A) = r, then there are r linearly independent rows in the RREF of A, and the other rows are linear combinations of these r rows. In A, the corresponding r rows (with row exchanges taken care of) are linearly independent, and the other rows are linear combinations of these r rows. Therefore, the maximum number of linearly independent rows in A is r.
Looking at the columns in the RREF of A, we see that if rank(A) = r, then the pivotal columns in the RREF are the standard basis vectors e1, . . . , er. Thus, the non-pivotal columns in the RREF are linear combinations of the pivotal ones. It shows that in the RREF, there are r linearly independent columns and all other columns are linear combinations of these r columns. Is this true in A also?
The RREF of A can be expressed as EA, where the matrix E is a product of elementary matrices. E is invertible. If the jth column of A is v j , then the jth column
of EA is Ev j . Without loss of generality, suppose that v1 , . . . , vr are linearly independent and each of vr+1 , . . . , vn is a linear combination of v1 , . . . , vr . We claim that the
columns of EA also has this property. To see this, suppose that
1 Ev1 + + r Evr = 0.
As, E is invertible, multiplying this equation with E 1 , we have
1 v1 + + r vr = 0.
Then the linear independence of v1 , . . . , vr implies that 1 = = r = 0. That is,
the vectors Ev1 , . . . , Evr are linearly independent. Next, for j > r, if
v j = 1 v1 + + r vr ,
then it follows that
Ev j = 1 Ev1 + + r Evr .
That is each of the vectors Evr+1 , . . . , Evn is a linear combination of Ev1 , . . . , Evr . It
proves our claim.
It then follows that if k is the maximum number of linearly independent columns
in A, then k is also the maximum number of columns in EA. Conversely, if EA has
maximum of k number of linearly independent columns, then A = E 1 (EA) will also
have maximum of k number of linearly independent columns.
Therefore, we conclude that the maximum number of linearly independent columns
in A is the same as the maximum number of linearly independent columns in the
RREF of A, which is equal to rank(A).
The maximum number of linearly independent rows of a matrix is called the row
rank of the matrix. Similarly, the column rank of a matrix is the maximum number
of linearly independent columns. Using this terminology, we note down the above
discussion as our next theorem.
Theorem 2.3
Let A ∈ F^{m×n}. Then rank(A) is equal to the row rank of A, which is also equal
to the column rank of A.


It then follows that rank(A*) = rank(A^t) = rank(A). Moreover, our discussion also
reveals that if P is any invertible matrix of order m, then rank(PA) = rank(A). Now,
if Q is an invertible matrix of order n, then
rank(AQ) = rank((AQ)^t) = rank(Q^t A^t) = rank(A^t) = rank(A).
We summarize the result as follows.
Theorem 2.4
Let A ∈ F^{m×n}. Let P ∈ F^{m×m} and Q ∈ F^{n×n} be invertible matrices. Then
rank(PAQ) = rank(PA) = rank(AQ) = rank(A).
In general, if P ∈ F^{k×m} and Q ∈ F^{n×s}, then rank(PA) ≤ rank(A) and rank(AQ) ≤ rank(A). These are left as exercises for you.
Further, it follows that a matrix A ∈ F^{n×n} is invertible iff rank(A) = n.
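These rank identities are easy to check numerically. A small NumPy sketch (the matrices A, P, Q below are our own illustrations, not from the notes):

```python
import numpy as np

# rank(A) = row rank = column rank, and rank is unchanged by
# invertible factors (Theorems 2.3 and 2.4)
A = np.array([[1., 2., 3.],
              [2., 4., 6.],     # row 2 = 2 * row 1, so rank(A) = 2
              [1., 0., 1.]])
P = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 2.]])    # invertible (det = 2)
Q = np.array([[1., 0., 1.],
              [0., 1., 0.],
              [0., 0., 1.]])    # invertible (det = 1)

r = np.linalg.matrix_rank(A)
assert r == 2
assert np.linalg.matrix_rank(A.T) == r          # row rank = column rank
assert np.linalg.matrix_rank(P @ A) == r        # rank(PA) = rank(A)
assert np.linalg.matrix_rank(A @ Q) == r        # rank(AQ) = rank(A)
assert np.linalg.matrix_rank(P @ A @ Q) == r    # rank(PAQ) = rank(A)
```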

Exercises for 2.3

1. Determine the rank r of

    [1 2 1 1 1]
    [3 5 3 4 3]
    [1 1 1 2 1]
    [5 8 5 7 5]

Find out the r linearly independent rows, and also the r linearly independent
columns of the matrix. Then express the remaining 4 − r rows as linear
combinations of those r rows, and the remaining 5 − r columns as linear
combinations of those r columns.
2. Let A ∈ F^{n×n}. Prove that A is invertible iff rank(A) = n iff det(A) ≠ 0.
3. Let A ∈ F^{m×n}. Let P ∈ F^{m×m}. Is it true that the RREF of PA is the same as the RREF of A?
4. Let A ∈ F^{m×n} and B ∈ F^{n×k}. Prove that rank(AB) ≤ min{rank(A), rank(B)}.

2.4 Solvability of linear equations

We can now use our knowledge about matrices to settle some issues regarding solvability of linear systems. A linear system with m equations in n unknowns looks


like:
a11 x1 + a12 x2 + ⋯ + a1n xn = b1
a21 x1 + a22 x2 + ⋯ + a2n xn = b2
⋮
am1 x1 + am2 x2 + ⋯ + amn xn = bm

Solving such a linear system amounts to determining the unknowns x1 , . . . , xn with


known scalars ai j and bi . Using the abbreviation x = [x1 , . . . , xn ]t , b = [b1 , . . . , bm ]t
and A = [ai j ], the system can be written in the compact form:
Ax = b.
Here, A ∈ F^{m×n}, x ∈ F^{n×1} and b ∈ F^{m×1}. We also say that the matrix A is the system
matrix of the linear system Ax = b. Observe that the matrix A is a linear transformation
from F^{n×1} to F^{m×1}, where m is the number of equations and n is the number of
unknowns in the system.
There is a slight deviation from our accepted symbolism. In case of linear systems,
we write b as a column vector and xi are unknown scalars.
Let A ∈ F^{m×n} and b ∈ F^{m×1}. A solution of the system Ax = b is any vector y ∈ F^{n×1}
such that Ay = b. In such a case, if y = [a1, . . . , an]^t, then ai is called the value of
the unknown xi in the solution y. In this language a solution of the system is also
written informally as
x1 = a1, . . . , xn = an.
The system Ax = b has a solution iff b ∈ R(A); and it has a unique solution iff b ∈
R(A) and A is a one-one map. Corresponding to the linear system Ax = b is the
homogeneous system
Ax = 0.
The homogeneous system always has a solution, since y := 0 is a solution. It has
infinitely many solutions when it has a nonzero solution. For, if y is a solution of
Ax = 0, then so is αy for any scalar α.
To study the non-homogeneous system, we use the augmented matrix [A|b] ∈ F^{m×(n+1)},
which has its first n columns as those of A in the same order, and the (n+1)th column
as b. For example,



 

    A = [1 2 3],   b = [4],   give   [A|b] = [1 2 3 | 4].
        [2 3 1]        [5]                   [2 3 1 | 5]
Theorem 2.5
Let A ∈ F^{m×n} and b ∈ F^{m×1}. Then the following statements are true.
(1) Ax = b has a solution iff rank([A|b]) = rank(A).


(2) If u is a particular solution of Ax = b, then each solution of Ax = b is
given by u + y, where y is a solution of the homogeneous system Ax = 0.
(3) If [A′|b′] is obtained from [A|b] by a finite sequence of elementary row
operations, then each solution of Ax = b is a solution of A′x = b′, and
vice versa.
(4) If r = rank([A|b]) = rank(A) < n, then there are n − r unknowns which
can take arbitrary values, and the other r unknowns are determined by the
values of these n − r unknowns.
(5) If m < n, then the homogeneous system has infinitely many solutions.
(6) Ax = b has a unique solution iff rank([A|b]) = rank(A) = n.
(7) If m = n, then Ax = b has a unique solution iff det(A) ≠ 0.
Proof (1) Ax = b has a solution iff b is a linear combination of the columns
of A iff the column rank of [A|b] is equal to the column rank of A. By Theorem 2.3, this happens if and only if rank([A|b]) = rank(A).
(2) Let u be a particular solution of Ax = b. Then Au = b. Now, y is a solution
of Ax = b iff Ay = b iff Ay = Au iff A(y − u) = 0 iff y − u is a solution of Ax = 0.
(3) If [A′|b′] has been obtained from [A|b] by a finite sequence of elementary
row operations, then A′ = EA and b′ = Eb, where E is the product of the corresponding elementary matrices. The matrix E is invertible. Now, A′x = b′ iff
EAx = Eb iff Ax = E^{-1}Eb = b.
(4) Due to (2), consider solving the corresponding homogeneous system. Let
rank(A) = r < n. Due to (3), assume that A is in RREF. There are r pivots
in A and m − r zero rows. Omit all the zero rows; this does not affect the
solutions. The n − r unknowns which do not correspond to pivots can take
arbitrary values. Then the r unknowns corresponding to the pivots can be
expressed in terms of these n − r unknowns.
(5) If m < n, then r = rank(A) ≤ m < n. Consider the homogeneous system
Ax = 0. By (4), there are n − r ≥ 1 unknowns which can take arbitrary values, and the other r unknowns are determined accordingly. Each such
assignment of values to the n − r unknowns gives rise to a distinct solution,
resulting in an infinite number of solutions of Ax = 0.
(6) It follows from (1) and (4).

(7) Notice that a matrix A ∈ F^{n×n} is invertible iff rank(A) = n iff
det(A) ≠ 0. Then the statement follows from (6).
A system of linear equations Ax = b is said to be consistent iff rank([A|b]) = rank(A).


Theorem 2.5(1) says that only consistent systems have solutions. Conversely, if a
system has a solution, then the system must be consistent. The statement in Theorem 2.5(4) is sometimes informally stated as follows:
A consistent system has n − rank(A) linearly independent solutions.
The unknowns that correspond to the pivots are called the basic variables, and the
unknowns which correspond to the un-pivoted ones are called the free variables.
Thus there are rank(A) basic variables and n − rank(A) free variables,
which are assigned arbitrary values. Therefore, the number of free variables is equal
to the nullity of A.
To summarize, suppose that a linear homogeneous system Ax = 0 has m number
of equations and n number of unknowns. If m < n, then the system has a nonzero
solution; and hence, infinitely many solutions. If m > n, then the number of solutions
depends on the rank of the system matrix. In this case, if rank(A) < n, then Ax = 0
has infinitely many solutions; and if rank(A) = n, then it has a unique solution, which
is the trivial solution.
For non-homogeneous linear systems the same conclusion is drawn provided that
the system is consistent. To say it explicitly, let the linear system Ax = b have m
equations and n unknowns with rank(A) = r. Then it has no solution iff rank([A|b]) > r;
it has a unique solution iff rank([A|b]) = r = n; and it has infinitely many solutions
iff rank([A|b]) = r < n. Notice that the number m of equations plays no role; but the
number r, which is the number of linearly independent equations, is important here.
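The consistency criterion rank([A|b]) = rank(A) can be checked numerically. A NumPy sketch using the system that reappears in Examples 2.8 and 2.9 below (the helper name `is_consistent` is our own, not from the notes):

```python
import numpy as np

A = np.array([[5., 2., -3., 1.],
              [1., -3., 2., -2.],
              [3., 8., -7., 5.]])   # row 3 = row 1 - 2 * row 2
b_bad = np.array([[7.], [11.], [8.]])     # 8 != 7 - 2*11: inconsistent
b_good = np.array([[7.], [11.], [-15.]])  # -15 = 7 - 2*11: consistent

def is_consistent(A, b):
    # Ax = b is solvable iff rank([A|b]) = rank(A)  (Theorem 2.5(1))
    return np.linalg.matrix_rank(np.hstack([A, b])) == np.linalg.matrix_rank(A)

assert not is_consistent(A, b_bad)
assert is_consistent(A, b_good)
```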

Exercises for 2.4


1. Show that a linear system Ax = b is solvable iff b is a linear combination of
columns of A.
2. Consider the linear system Ax = b, where A ∈ F^{m×n} and rank(A) = r. Write
explicit conditions on m, n, r so that the system has
(a) no solution
(b) unique solution
(c) infinite number of solutions
3. Let A ∈ F^{n×n}. Prove that the following are equivalent:
(a) A is invertible.
(b) Ax = 0 has no non-trivial solution.
(c) Ax = b has a unique solution for some b ∈ F^{n×1}.
(d) Ax = b has at least one solution for each b ∈ F^{n×1}.
(e) Ax = ei has at least one solution for each i ∈ {1, . . . , n}.
(f) Ax = vi has at least one solution for each basis {v1, . . . , vn} of F^{n×1}.
(g) Ax = b has at most one solution for each b ∈ F^{n×1}.
(h) Ax = b has a unique solution for each b ∈ F^{n×1}.
(i) rank(A) = n.



(j) The RREF of A is I.
(k) The rows of A are linearly independent.
(l) The columns of A are linearly independent.
(m) det(A) ≠ 0.
(n) For each B ∈ C^{n×n}, AB = 0 implies that B = 0.
4. Let A, B ∈ F^{m×n} be in RREF. Prove that Sol(A, 0) = Sol(B, 0) iff A = B.

2.5 Gauss-Jordan elimination

Gauss-Jordan elimination is an application of converting the augmented matrix to its
row reduced echelon form for solving linear systems.
To determine whether a system of linear equations is consistent or not, we convert
the augmented matrix [A|b] to its RREF. In the RREF, if an entry in the b portion has
become a pivot, then the system is inconsistent; otherwise, the system is consistent.
Example 2.8
Is the following system of linear equations consistent?
5x1 + 2x2 − 3x3 + x4 = 7
x1 − 3x2 + 2x3 − 2x4 = 11
3x1 + 8x2 − 7x3 + 5x4 = 8
We take the augmented matrix and reduce it to its row reduced echelon form
by elementary row operations.

    [5  2 −3  1 |  7]   R1   [1   2/5  −3/5   1/5  |  7/5 ]
    [1 −3  2 −2 | 11]   →    [0 −17/5  13/5 −11/5  | 48/5 ]
    [3  8 −7  5 |  8]        [0  34/5 −26/5  22/5  | 19/5 ]

                        R2   [1  0   −5/17  −1/17  |  43/17]
                        →    [0  1  −13/17  11/17  | −48/17]
                             [0  0     0      0    |   23  ]

Here, R1 = E_{1/5}[1], E_{−1}[2, 1], E_{−3}[3, 1] and R2 = E_{−5/17}[2], E_{−2/5}[1, 2], E_{−34/5}[3, 2].
Since an entry in the b portion has become a pivot, the system is inconsistent.
In fact, you can verify that the third row in A is simply first row minus twice
the second row, whereas the third entry in b is not the first entry minus twice
the second entry. Therefore, the system is inconsistent.


Example 2.9
We change the last equation in the previous example to make it consistent.
We consider the new system
5x1 + 2x2 − 3x3 + x4 = 7
x1 − 3x2 + 2x3 − 2x4 = 11
3x1 + 8x2 − 7x3 + 5x4 = −15
The reduction to echelon form is as follows:

    [5  2 −3  1 |   7]   R1   [1   2/5  −3/5   1/5  |   7/5 ]
    [1 −3  2 −2 |  11]   →    [0 −17/5  13/5 −11/5  |  48/5 ]
    [3  8 −7  5 | −15]        [0  34/5 −26/5  22/5  | −96/5 ]

                         R2   [1  0   −5/17  −1/17  |  43/17]
                         →    [0  1  −13/17  11/17  | −48/17]
                              [0  0     0      0    |    0  ]

with R1 = E_{1/5}[1], E_{−1}[2, 1], E_{−3}[3, 1] and R2 = E_{−5/17}[2], E_{−2/5}[1, 2], E_{−34/5}[3, 2]
as the row operations. This expresses the fact that the third equation is
redundant. Now, solving the new system in row reduced echelon form is
easier. Writing as linear equations, we have
    x1 − (5/17) x3 − (1/17) x4 = 43/17
    x2 − (13/17) x3 + (11/17) x4 = −48/17
The unknowns corresponding to the pivots, that is, x1 and x2 are the basic
variables and the other unknowns, x3 , x4 are the free variables. The number
of basic variables is equal to the number of pivots, which is the rank of the
system matrix. By assigning the free variables xi arbitrary values, say, αi,
the basic variables can be evaluated in terms of the αi.
We assign x3 to α and x4 to β. Then we have
    x1 = 43/17 + (5/17)α + (1/17)β,    x2 = −48/17 + (13/17)α − (11/17)β.

Therefore, any vector y ∈ F^{4×1} of the form

    y := [ 43/17 + (5/17)α + (1/17)β  ]
         [−48/17 + (13/17)α − (11/17)β]      for α, β ∈ F
         [              α             ]
         [              β             ]

is a solution of the linear system. Observe that

    y = [ 43/17]       [ 5/17]       [  1/17]
        [−48/17]  + α  [13/17]  + β  [−11/17].
        [   0  ]       [  1  ]       [   0  ]
        [   0  ]       [  0  ]       [   1  ]


Here, the first vector is a particular solution of the original system. The two
vectors

    [ 5/17]        [  1/17]
    [13/17]  and   [−11/17]
    [  1  ]        [   0  ]
    [  0  ]        [   1  ]
are linearly independent solutions of the corresponding homogeneous system.
There should be exactly two such linearly independent solutions of the homogeneous
system, because the nullity of the system matrix is the number of
unknowns minus its rank, which is 4 − 2 = 2.
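The solution just computed can be verified numerically. A NumPy check of the particular and homogeneous solutions found above (our own verification, not part of the notes):

```python
import numpy as np

A = np.array([[5., 2., -3., 1.],
              [1., -3., 2., -2.],
              [3., 8., -7., 5.]])
b = np.array([7., 11., -15.])

u = np.array([43/17, -48/17, 0., 0.])    # particular solution (alpha = beta = 0)
h1 = np.array([5/17, 13/17, 1., 0.])     # homogeneous solution (alpha = 1, beta = 0)
h2 = np.array([1/17, -11/17, 0., 1.])    # homogeneous solution (alpha = 0, beta = 1)

assert np.allclose(A @ u, b)             # u solves Ax = b
assert np.allclose(A @ h1, 0)            # h1, h2 solve Ax = 0
assert np.allclose(A @ h2, 0)
# every choice of alpha, beta gives another solution of Ax = b
alpha, beta = 2.0, -3.0
assert np.allclose(A @ (u + alpha * h1 + beta * h2), b)
```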
It is easy to devise a mechanical way to write out the solution set of a consistent
linear system using Gauss-Jordan elimination. One has to add the necessary number of
zero rows, or delete some, so that the RREF is a square matrix. Then, identifying the
free and basic variables, one can just write out the solution set by reversing the signs
of entries on the un-pivoted columns, and changing one of the 0s to a 1. This is left
as an exercise for you.
There are variations of Gauss-Jordan elimination. Instead of reducing the augmented matrix to its row reduced echelon form, if we reduce it to another intermediary form, called the row echelon form, then we obtain the method of Gaussian elimination. In the row echelon form, we do not require the entries above a pivot to be 0;
also the pivots need not be equal to 1. In that case, we will require back-substitution
in solving a linear system. To illustrate this process, we redo Example 2.9 starting
with the augmented matrix, as follows:

    [5  2 −3  1 |   7]   R1   [5    2    −3     1   |   7  ]
    [1 −3  2 −2 |  11]   →    [0 −17/5  13/5 −11/5  |  48/5]
    [3  8 −7  5 | −15]        [0  34/5 −26/5  22/5  | −96/5]

                  E_{2}[3,2]   [5    2    −3     1   |   7  ]
                  →            [0 −17/5  13/5 −11/5  |  48/5]
                               [0    0     0     0   |   0  ]

Here, R1 = E_{−1/5}[2, 1], E_{−3/5}[3, 1]. The augmented matrix is now in row echelon
form. It is a consistent system, since no entry in the b portion is a pivot. The pivots
say that x1, x2 are basic variables and x3, x4 are free variables. We assign x3 to α and
x4 to β. Writing in equations form, we have

    x1 = (1/5)(7 − 2x2 + 3x3 − x4),    x2 = −(5/17)(48/5 − (13/5)x3 + (11/5)x4).

First we determine x2 and then back-substitute. We obtain

    x1 = 43/17 + (5/17)α + (1/17)β,   x2 = −48/17 + (13/17)α − (11/17)β,   x3 = α,   x4 = β.

As you see we end up with the same set of solutions as in Gauss-Jordan elimination.
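The elimination-and-back-substitution process described above can be sketched in code. A minimal NumPy version for a square invertible system (our own sketch; it adds partial pivoting for numerical safety, which the hand computation above does not need):

```python
import numpy as np

def gaussian_solve(A, b):
    """Solve Ax = b for a square invertible A: forward elimination
    (with partial pivoting) to row echelon form, then back-substitution."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    n = len(b)
    for j in range(n):                        # forward elimination
        p = j + np.argmax(np.abs(M[j:, j]))   # choose a pivot row
        M[[j, p]] = M[[p, j]]                 # row exchange
        for i in range(j + 1, n):
            M[i] -= (M[i, j] / M[j, j]) * M[j]
    x = np.zeros(n)
    for j in range(n - 1, -1, -1):            # back-substitution
        x[j] = (M[j, n] - M[j, j + 1:n] @ x[j + 1:]) / M[j, j]
    return x

A = np.array([[2., 1.], [1., 3.]])
b = np.array([3., 5.])
assert np.allclose(gaussian_solve(A, b), np.linalg.solve(A, b))
```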


Exercises for 2.5


1. Using Gauss-Jordan elimination, and also by Gaussian elimination, solve the
following linear systems:
(a) 3w + 2x + 2y − z = 2, 2x + 3y + 4z = 2, y − 6z = 6.
(b) w + 4x + y + 3z = 1, 2x + y + 3z = 0, w + 3x + y + 2z = 1, 2x + y + 6z = 0.
(c) w − x + y − z = 1, w + x − y − z = 1, w − x − y + z = 2, 4w − 2x − 2y = 1.
2. Show that the linear system x + y + kz = 1, x − y − z = 2, 2x + y − 2z = 3 has
no solution for k = −1, and has a unique solution for each k ≠ −1.

3
Subspace and Dimension

3.1 Subspace and span

Recall that F stands for either R or C; and F^n denotes either F^{1×n} or F^{n×1}. Also,
recall that a typical row vector in F^{1×n} is written as [a1, . . . , an] and a column vector
in F^{n×1} is written as [a1, . . . , an]^t. Both the row and column vectors are written
uniformly as (a1, . . . , an); these constitute the vectors in F^n. In F^n, we have a special
vector, called the zero vector, which we denote by 0 := (0, . . . , 0). And if
x = (a1, . . . , an) ∈ F^n, then its additive inverse is −x := (−a1, . . . , −an).
The operations of addition and scalar multiplication in F^n enjoy the following
properties:
For u, v, w ∈ F^n and α, β ∈ F,
1. u + v = v + u.
2. (u + v) + w = u + (v + w).
3. u + 0 = 0 + u = u.
4. u + (−u) = −u + u = 0.
5. α(βu) = (αβ)u.
6. α(u + v) = αu + αv.
7. (α + β)u = αu + βu.
8. 1 · u = u.
9. (−1)u = −u.
10. If u + v = u + w, then v = w.
11. If αu = 0, then α = 0 or u = 0.
It so happens that the last three properties follow from the earlier ones. Any
nonempty set where the two operations of addition and scalar multiplication are defined, and which enjoy the first eight properties above, is called a vector space. In
this sense, both F^{1×n} and F^{n×1} are vector spaces. In such a general setting,


a nonempty subset of a vector space that is closed under both the operations is called a
subspace. We may not need these general notions. However, we define a subspace
of our two specific vector spaces.
Let V be a nonempty subset of F^n. We say that V is a subspace of F^n iff the
following properties are satisfied:
1. For each u, v ∈ V, u + v ∈ V.
2. For each α ∈ F and for each v ∈ V, αv ∈ V.
Example 3.1
1. {0} and F^n are subspaces of F^n.
2. Let V = {(a, b, c) : 2a + 3b + 5c = 0, a, b, c ∈ F}. Clearly, (0, 0, 0) ∈ V. So,
V ≠ ∅. If (a1, b1, c1), (a2, b2, c2) ∈ V, then
2a1 + 3b1 + 5c1 = 0,    2a2 + 3b2 + 5c2 = 0.
So, 2(a1 + a2) + 3(b1 + b2) + 5(c1 + c2) = 0; thus (a1, b1, c1) + (a2, b2, c2) ∈ V.
If α ∈ F, then 2(αa1) + 3(αb1) + 5(αc1) = 0. So, α(a1, b1, c1) ∈ V.
Therefore, V is a subspace of F^3.
3. Let V = {(a, b, c) : 2a + 3b + 5c = 1, a, b, c ∈ F}. Clearly, (1/2, 0, 0) ∈ V. So,
V ≠ ∅. Also, (0, 1/3, 0) ∈ V.
We see that (1/2, 0, 0) + (0, 1/3, 0) = (1/2, 1/3, 0). And
2 · 1/2 + 3 · 1/3 + 5 · 0 = 2 ≠ 1.
That is, (1/2, 0, 0) + (0, 1/3, 0) ∉ V.
Therefore, V is not a subspace of F^3.
Also, notice that 2 · (1/2, 0, 0) ∉ V.
4. Let α1, . . . , αn ∈ F. Let V = {[a1, . . . , an] : α1 a1 + ⋯ + αn an = 0}. It is easy
to check that V is a subspace of F^{1×n}.
5. Let α1, . . . , αn ∈ F; β ∈ F, β ≠ 0; V = {[a1, . . . , an] : α1 a1 + ⋯ + αn an = β}.
Then V is not a subspace of F^{1×n}. Why?
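The closure tests in items 2 and 3 can be replayed numerically. A small NumPy sketch (the sample vectors and helper names are our own):

```python
import numpy as np

w = np.array([2., 3., 5.])

def in_V0(x):   # membership in {(a,b,c) : 2a + 3b + 5c = 0}
    return abs(w @ x) < 1e-9

def in_V1(x):   # membership in {(a,b,c) : 2a + 3b + 5c = 1}
    return abs(w @ x - 1.0) < 1e-9

u = np.array([1., 1., -1.])      # 2 + 3 - 5 = 0
v = np.array([5., 0., -2.])      # 10 + 0 - 10 = 0
assert in_V0(u) and in_V0(v)
assert in_V0(u + v) and in_V0(3.7 * u)        # V0 is closed under + and scaling

p, q = np.array([0.5, 0., 0.]), np.array([0., 1/3, 0.])
assert in_V1(p) and in_V1(q)
assert not in_V1(p + q) and not in_V1(2 * p)  # V1 fails both closure tests
```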
It is easy to verify that in a subspace V, all the properties (1)-(8) above hold true.
This is the reason we call such a nonempty subset a subspace.
A single nonzero vector does not form a subspace. For example, {(1, 1)} is not a subspace
of F^2, but the set generated from it, namely
{α(1, 1) : α ∈ F}


is a subspace of F^2. This is the set of all linear combinations of the vector (1, 1).
Recall that a linear combination of vectors v1, . . . , vm is any vector of the form
α1 v1 + ⋯ + αm vm
for scalars α1, . . . , αm. We give a name to the set of all linear combinations of given
vectors.
If S is any nonempty subset of F^n, we define span(S) as the set of all linear combinations
of finitely many vectors from S. We read span(S) as the span of S. That is,
span(S) = {α1 v1 + ⋯ + αm vm : α1, . . . , αm ∈ F, v1, . . . , vm ∈ S for some m ∈ N}.
If S = ∅, then we define span(∅) = {0}.
When S = {v1, . . . , vm}, we also write span(S) as span{v1, . . . , vm}. We see that
span{v1, . . . , vm} = {α1 v1 + ⋯ + αm vm : α1, . . . , αm ∈ F}.
For instance, v1 + v2 + ⋯ + vm and v1 + 5v2 are in span{v1, . . . , vm}. In the first case,
each αi is equal to 1, whereas in the second case, α1 = 1, α2 = 5 and all other α's
are 0.
Notice that S ⊆ span(S) since each u ∈ S is a linear combination of itself, with the
coefficient 1. Similarly, 0 ∈ span(S) since 0 = 0 · u for any u ∈ S. However,
this argument is valid provided there exists such a u in S. Otherwise, by definition,
span(∅) = {0}. Therefore, 0 ∈ span(S) for every subset S of F^n.
Suppose S ⊆ F^n. If u, v ∈ span(S), then both of them are linear combinations of
vectors from S. Their sum u + v is also a linear combination of vectors from S. Hence,
u + v ∈ span(S). Similarly, αu ∈ span(S) for each α ∈ F. Therefore, span(S) is a subspace of F^n.
Moreover, span(span(S)) = span(S). In general, the span of any subspace is the subspace
itself.
Let V be a subspace of F^n and let S ⊆ V. We say that S is a spanning subset of V,
or that S spans V, iff V = span(S). In this case, each vector in V can be expressed as
a linear combination of vectors from S. We also informally say that the vectors in S
span V whenever span(S) = V.
We just saw that

    span{[1, 1]^t, [2, 2]^t} = {α[1, 1]^t : α ∈ F} ⊊ F^{2×1},    span{[1, 1]^t, [1, −1]^t} = F^{2×1}.

Notice that the vectors [1, 1]^t, [1, −1]^t, [4, 1]^t also span F^{2×1}. In fact, since the
first two vectors span F^{2×1}, any list of vectors containing these two will also span
F^{2×1}.
Similarly, the vectors e1, . . . , en in F^{n×1} span F^{n×1}, where ei is the column vector
in F^{n×1} whose ith component is 1 and all other components are 0.
In this terminology, vectors v1 , . . . , vn are linearly dependent iff one of the vectors
in this list is in the span of the rest. If no vector in the list is in the span of the rest,
then the vectors are linearly independent.
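Whether a given vector lies in the span of given vectors can be tested with ranks. A NumPy sketch (the helper `in_span` is our own illustration, not from the notes):

```python
import numpy as np

def in_span(S, v):
    """Is v a linear combination of the columns of S?
    True iff appending v does not increase the rank."""
    S = np.atleast_2d(S)
    return np.linalg.matrix_rank(np.column_stack([S, v])) == np.linalg.matrix_rank(S)

# span{(1,1)} is a proper subspace of F^2 ...
assert in_span(np.array([[1.], [1.]]), np.array([3., 3.]))
assert not in_span(np.array([[1.], [1.]]), np.array([1., 0.]))
# ... while (1,1) and (1,-1) together span all of F^2
S2 = np.array([[1., 1.], [1., -1.]])
assert in_span(S2, np.array([4., 1.]))
```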


Exercises for 3.1


1. Let W be a vector space. Suppose U is a subspace of V and V is a subspace of
W. Is U a subspace of W ?
2. Let u, v1, v2, . . . , vn be n + 1 distinct vectors in F^n. Take S1 = {v1, v2, . . . , vn}
and S2 = {u, v1, v2, . . . , vn}. Prove that span(S1) = span(S2) iff u ∈ span(S1).
3. Let A, B be subsets of F^n. Prove or disprove the following:
(a) A is a subspace of F^n if and only if span(A) = A.
(b) If A ⊆ B, then span(A) ⊆ span(B).
(c) span(A ∪ B) = {u + v : u ∈ span(A), v ∈ span(B)}.
(d) span(A ∩ B) ⊆ span(A) ∩ span(B).
4. Let S be a subset of a vector space V. Prove that span(S) is the subspace of V
satisfying the following properties:
(a) S ⊆ span(S)
(b) If U is a subspace and S ⊆ U, then span(S) ⊆ U.

5. Let A and B be subsets of a vector space V. Prove or disprove:
span(A) ∩ span(B) = {0} iff A ∪ B is linearly independent.

3.2 Basis and dimension

We bring in some flexibility in using the phrases linearly dependent and independent.
When the vectors v1, . . . , vm in F^n are linearly independent, we say that the list
v1, . . . , vm is linearly independent, and also that the set {v1, . . . , vm} is linearly
independent. Similarly, the vectors v1, . . . , vm are linearly dependent iff the list
v1, . . . , vm is linearly dependent. If there are no repetitions of vectors in this list,
then it is also equivalent to asserting that the set {v1, . . . , vm} is linearly dependent.
Let V be a subspace of F^n. Let S be a subset of V. The subset S may or may not
span V. If it spans V, it is possible that it has a proper subset which also spans V. For
instance,
S = {[1, 2, −3], [1, 0, −1], [2, −4, 2], [0, −2, 2]}
spans the subspace V = {[a, b, c] : a + b + c = 0} of F^{1×3}. Also, the subset
{[1, 2, −3], [1, 0, −1], [2, −4, 2]}
spans the same subspace V. Notice that S is linearly dependent. Reason:
[0, −2, 2] = (−1)[1, 2, −3] + [1, 0, −1].


On the other hand, the linearly independent set {[1, 2, −3]} does not span V. For
instance,
[1, 0, −1] ≠ α[1, 2, −3] for any α ∈ F.
That is, a spanning subset may be superfluous and a linearly independent set may
be deficient. A linearly independent set which also spans a subspace may be just
adequate in spanning the subspace.
Let V be a subspace of F^n. Let B be a list of vectors from V. We say that B is a
basis of V iff B is linearly independent and B spans V. We write a basis using the set
notation, though it is a list of vectors. However, we remember that a basis is a list, an
ordered set, where the ordering of the vectors is as they are written. For instance, if
{v1, v3, v2} is a basis for a subspace U, then we consider v1 as the first basis vector,
v3 as the second basis vector, and v2 as the third basis vector.
Example 3.2
1. It is easy to check that B = {e1, . . . , en} is a basis of F^{n×1}. Similarly,
E = {e1^t, . . . , en^t} is a basis of F^{1×n}.
2. We show that B = {[1, 2, −3], [1, 0, −1]} is a basis of
V = {[a, b, c] : a + b + c = 0, a, b, c ∈ F}.
First, B ⊆ V. Second, any vector in V is of the form [a, b, −a − b] for
a, b ∈ F. Now,
[a, b, −a − b] = (b/2) [1, 2, −3] + (a − b/2) [1, 0, −1]
shows that span(B) = V. For linear independence, suppose
α[1, 2, −3] + β[1, 0, −1] = [0, 0, 0].
Then α + β = 0, 2α = 0, −3α − β = 0. It implies that α = β = 0.
3. Also, E = {[1, −1, 0], [0, 1, −1]} is a basis for the subspace V in (2).
Let B be a basis of a subspace V of F^n. If C is any proper superset of B, then any
vector in C \ B is a linear combination of vectors from B. So, C is linearly dependent.
On the other hand, if D is any proper subset of B, then each vector in B \ D fails
to be a linear combination of vectors from D. For, otherwise, B would be linearly
dependent. We thus say that
A basis is a maximal linearly independent set.
A basis is a minimal spanning set.
The zero subspace {0} has a single basis, the empty set ∅. But other subspaces do not have a unique
basis. For instance, the subspace V in Example 3.2 has at least two bases. However,
something remains the same in all these bases. In that example, both the bases have


exactly two vectors. Is it true that all bases of a subspace have the same number of
vectors?
Theorem 3.1
If a subspace V of F^n has a basis of k vectors, then any list of vectors from V
having more than k vectors is linearly dependent.
Proof Let B = {u1, . . . , uk} be a basis for the subspace V of F^n. Let E =
{v1, . . . , vm}, where m > k. Each vj is a linear combination of the u's. So, we have
scalars aij for i = 1, . . . , k and j = 1, . . . , m such that

    vj = a1j u1 + a2j u2 + ⋯ + akj uk = ∑_{i=1}^{k} aij ui    for j = 1, 2, . . . , m.

Now, suppose ∑_{j=1}^{m} αj vj = 0. Then ∑_{j=1}^{m} ∑_{i=1}^{k} αj aij ui = 0. Or that

    ∑_{i=1}^{k} (∑_{j=1}^{m} αj aij) ui = (∑_{j=1}^{m} αj a1j) u1 + ⋯ + (∑_{j=1}^{m} αj akj) uk = 0.

As B is linearly independent, ∑_{j=1}^{m} αj aij = 0 for each i = 1, . . . , k. We then get
the linear system

    a11 α1 + a12 α2 + ⋯ + a1m αm = 0
    a21 α1 + a22 α2 + ⋯ + a2m αm = 0
    ⋮
    ak1 α1 + ak2 α2 + ⋯ + akm αm = 0

This is a homogeneous linear system with k equations and m > k unknowns
α1, . . . , αm. Thus it has a nonzero solution. Therefore, ∑_{j=1}^{m} αj vj = 0, where
not all the α's are zero. That is, E is linearly dependent.
How does it answer our question? Well, suppose V is a subspace of F^n. Since F^n
has the standard basis with n vectors, we cannot find more than n vectors
from V which are linearly independent. Therefore, any basis of V will have at
most n vectors. Now, suppose V has two bases B and E with k and m
vectors, respectively. As B is a basis and E is linearly independent, m cannot be
greater than k. Again, since B is linearly independent and E is a basis, k cannot be
greater than m. Therefore, k = m.
Now we know that each basis of a subspace of F^n has a definite number of vectors
in it. We give a name to this important number associated with a subspace.
Let V be a subspace of F^n. The number of vectors in some (or any) basis for V is
called the dimension of V. We write this number as dim(V).


Since {e1, . . . , en} is a basis for F^{n×1}, dim(F^{n×1}) = n. Similarly, dim(F^{1×n}) = n.
Remember that when we consider C^{n×1} or C^{1×n}, the scalars are complex numbers,
and for R^{n×1} or R^{1×n}, the scalars are real numbers.
Example 3.3
1. The dimension of the zero space is 0. That is, dim ({0}) = 0.
2. The subspace U := {[a, b, c, d] : a − 2b + 3c = 0 = d + a, a, b, c, d ∈ F} can
be written as
U = {[2b − 3c, b, c, −2b + 3c] : b, c ∈ F}
= {b [2, 1, 0, −2] + c [−3, 0, 1, 3] : b, c ∈ F}.
The vectors [2, 1, 0, −2] and [−3, 0, 1, 3] are linearly independent. Therefore, U has a basis {[2, 1, 0, −2], [−3, 0, 1, 3]}. So, dim(U) = 2.
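The dimension count in item 2 can be confirmed numerically: dim(U) equals the rank of the matrix whose rows are the two spanning vectors. A NumPy check (our own, using the vectors found above):

```python
import numpy as np

# spanning vectors of U from item 2, written as rows
B = np.array([[2., 1., 0., -2.],
              [-3., 0., 1., 3.]])
# dim(U) = maximum number of linearly independent spanning vectors = rank
assert np.linalg.matrix_rank(B) == 2
```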
For any subset B of a subspace V of F^n, the following statements should then be
obvious.
1. If B has fewer vectors than dim(V), then span(B) is a proper subspace of V.
2. If B has more vectors than dim (V ), then B is linearly dependent.
3. If B has dim (V ) number of vectors and span (B) = V, then B is a basis for V.
4. If B has dim (V ) number of vectors and B is linearly independent, then B is a
basis of V.
5. If B is a proper superset of a spanning set of V, then B is linearly dependent.
6. If B is a proper subset of a linearly independent subset of V, then B is linearly
independent and span (B) is a proper subspace of V.
7. If U is a subspace of V, then dim(U) ≤ dim(V) ≤ n.
8. If B is a spanning set of V, then it contains a basis for V.
9. If B is linearly independent, then there exists a superset of B which is a basis
of V.
To see (8), suppose B is a spanning subset of V. If B = {0}, then V = {0} and in
that case, the empty set ∅ is a basis of V. Otherwise, choose a nonzero vector v1 from B. Take
C := {v1}. If V = span(C), then C is a basis of V. Else, choose a vector
v2 from B \ span(C); such a vector exists since span(B) = V ≠ span(C). Update C to C ∪ {v2}. Notice that C is linearly independent.
Continue this process to obtain a basis C ⊆ B for V. This process terminates since V is
finite dimensional.


Incidentally the same process is applicable to prove (9) starting from the linearly
independent subset B of V. Observe that a linearly independent subset is a basis for
the span of the subset. Therefore, we conclude the following statement from (9).
Theorem 3.2
(Basis Extension Theorem) Let V be a subspace of F^n. Then each basis of a
subspace of V can be extended to a basis of V.
You can use the methods of the last section, using elementary row operations, to extract
a basis for a subspace which is given in the form of span of some finite number of
vectors. The trick is to throw away the vectors which are linear combinations of the
selected ones. That is, write the vectors as row vectors and form a matrix; convert the
matrix to its RREF; and then throw away the zero rows, or the rows corresponding
to the zero rows by monitoring row exchanges.
Example 3.4
Find a basis for the subspace U of F^4, where
U = span{(1, 1, 1, 1), (2, 1, 0, 3), (1, 0, −1, 2), (0, 3, 2, 1)}.
We start with the matrix having these vectors as its rows and convert it to
its RREF as follows.

    [1 1  1 1]  R1   [1  1  1  1]  R2   [1 0 −1  2]  R3   [1 0 0  1]
    [2 1  0 3]  →    [0 −1 −2  1]  →    [0 1  2 −1]  →    [0 1 0  1]
    [1 0 −1 2]       [0 −1 −2  1]       [0 0  0  0]       [0 0 1 −1]
    [0 3  2 1]       [0  3  2  1]       [0 0 −4  4]       [0 0 0  0]

Here, R1 is E_{−2}[2, 1], E_{−1}[3, 1]; R2 is E_{−1}[2], E_{−1}[1, 2], E_{1}[3, 2], E_{−3}[4, 2]; and
R3 is E[3, 4], E_{−1/4}[3], E_{1}[1, 3], E_{−2}[2, 3].
Taking the pivoted rows, we see that {(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, −1)} is
a basis for the given subspace. Notice that only one row exchange has been
done in this reduction process, which means that the third row in the RREF
corresponds to the fourth vector and the fourth row corresponds to the third
vector. Thus the pivoted rows correspond to the first, second and fourth
vectors, originally. This says that a basis for the subspace is also given by
{(1, 1, 1, 1), (2, 1, 0, 3), (0, 3, 2, 1)}.
The reduction process confirms that the third vector is a linear combination
of the first, the second, and the fourth.
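The RREF computation of this example can be reproduced with a computer algebra system. A SymPy sketch (SymPy's `Matrix.rref` returns the exact RREF together with the pivot column indices):

```python
import sympy as sp

vecs = [(1, 1, 1, 1), (2, 1, 0, 3), (1, 0, -1, 2), (0, 3, 2, 1)]
M = sp.Matrix(vecs)            # the vectors as rows
R, pivot_cols = M.rref()       # exact row reduced echelon form

# the nonzero rows of the RREF form a basis of the row space
basis = [tuple(R.row(i)) for i in range(M.rows) if any(R.row(i))]
assert basis == [(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, -1)]
assert pivot_cols == (0, 1, 2)   # pivots in the first three columns
assert M.rank() == 3             # so one of the four vectors is redundant
```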


Linear independence has something to do with invertibility of a square matrix.
Suppose that the rows of a square matrix A ∈ F^{n×n} are linearly independent. Then
the RREF of A has n pivots. That is, rank(A) = n. Consequently, A is
invertible. On the other hand, if a row of A is a linear combination of the other rows,
then this row appears as a zero row in the RREF of A. That is, A is not invertible.
Considering A^t instead of A, we conclude that A^t is invertible iff the rows of A^t,
that is, the columns of A, are linearly independent. However, A is invertible iff A^t is
invertible. Therefore, A is invertible iff its columns are linearly independent. We note
it down as our next result.
Theorem 3.3
A square matrix is invertible iff its rows are linearly independent iff its columns
are linearly independent.
From Theorem 3.3 it follows that an n × n matrix is invertible iff its rows form a
basis for F^{1×n} iff its columns form a basis for F^{n×1}.
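Theorem 3.3 is easy to illustrate numerically. A NumPy sketch with our own 2 × 2 examples:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])   # linearly independent rows
B = np.array([[1., 2.], [2., 4.]])   # row 2 = 2 * row 1

assert np.linalg.matrix_rank(A) == 2   # full rank: invertible
assert np.linalg.matrix_rank(B) == 1   # rank deficient: singular
np.linalg.inv(A)                       # succeeds
try:
    np.linalg.inv(B)                   # must fail for a singular matrix
    invertible = True
except np.linalg.LinAlgError:
    invertible = False
assert not invertible
```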

Exercises for 3.2


1. Answer the following questions with justification:
(a) Is every subset of a linearly independent set linearly independent?
(b) Is every subset of a linearly dependent set linearly dependent?
(c) Is every superset of a linearly independent set linearly independent?
(d) Is every superset of a linearly dependent set linearly dependent?
(e) Is union of two linearly independent sets linearly independent?
(f) Is union of two linearly dependent sets linearly dependent?
(g) Is intersection of two linearly independent sets linearly independent?
(h) Is intersection of two linearly dependent sets linearly dependent?
2. Prove statements in (1)-(7) listed after Example 3.3.
3. Let {x, y, z} be a basis for a vector space V. Is {x + y, y + z, z + x} also a basis
for V ?
4. Find a basis for the subspace {(a, b, c) ∈ R^3 : a + b − 5c = 0} of R^3.
5. Find bases and dimensions of the following subspaces of R^5:
(a) {(a, b, c, d, e) ∈ R^5 : a − c − d = 0}.
(b) {(a, b, c, d, e) ∈ R^5 : b = c = d, a + e = 0}.
(c) span{(1, 1, 0, 2, 1), (2, 1, 2, 0, 0), (0, 3, 2, 4, 2), (3, 3, 4, 2, 1),
(5, 7, 3, 2, 0)}.



6. Extend the set {(1, 0, 1, 0), (1, 0, −1, 0)} to a basis of R^4.
7. Prove that the only nonzero proper subspaces of R^2 are the straight lines passing through
the origin.

3.3 Matrix as a linear map

Let A ∈ F^{m×n}. We may view the matrix A as a function from F^{n×1} to F^{m×1}. It goes
as follows.
Let x ∈ F^{n×1}. Then define the matrix A as a function A : F^{n×1} → F^{m×1} by
A(x) = Ax.
That is, the value of the function A at any vector x ∈ F^{n×1} is the vector Ax in F^{m×1}.
Since the matrix product Ax is well defined, such a function is meaningful. We see
that due to the properties of matrix product, the following are true:
1. A(u + v) = A(u) + A(v) for all u, v Fn1 .
2. A( v) = A(v) for all v Fn1 and for all F.
In this manner a matrix is considered as a linear map. In fact, any function A from a
vector space to another (both over the same field) satisfying the above two properties
is called a linear transformation or a linear map.
To see the connection between the matrix as a rectangular array and as a function,
consider the values of the matrix A at the standard basis vectors e1 , . . . , en in Fn1 .
Recall that e j is a column vector in Fn1 where the jth entry is 1 and all other entries
are 0. Let A = [ai j ] Fmn . Then Ae j is the jth column of A. We thus observe the
following:
A matrix A Fmn is viewed as the linear map A : Fn1 Fm1 , where
A(e j ) is the jth column of A, and A(v) = Av for each v Fn1 .
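This viewpoint is easy to check numerically. A minimal numpy sketch (the matrix here is an arbitrary example, not one from the text):

```python
import numpy as np

# An arbitrary 3x2 real matrix, viewed as a linear map from F^{2x1} to F^{3x1}.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

n = A.shape[1]
E = np.eye(n)  # columns are the standard basis vectors e_1, ..., e_n

# A(e_j) equals the j-th column of A.
for j in range(n):
    assert np.allclose(A @ E[:, j], A[:, j])

# Linearity: A(u + v) = A(u) + A(v) and A(alpha v) = alpha A(v).
u, v, alpha = np.array([1.0, -1.0]), np.array([2.0, 0.5]), 3.0
assert np.allclose(A @ (u + v), A @ u + A @ v)
assert np.allclose(A @ (alpha * v), alpha * (A @ v))
```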
The range of the matrix A (of the linear map A) is the set R(A) = {Ax : x ∈ F^{n×1}}.
Now, each vector x = [α_1, . . . , α_n]^t ∈ F^{n×1} can be written as
x = α_1 e_1 + ⋯ + α_n e_n.
If y ∈ R(A), then there exists an x ∈ F^{n×1} such that y = Ax. Such a y is written as
y = Ax = α_1 Ae_1 + ⋯ + α_n Ae_n.
Conversely, we see that each vector α_1 Ae_1 + ⋯ + α_n Ae_n is in R(A). Since Ae_j is the
jth column of A, we find that
R(A) = {α_1 A_1 + ⋯ + α_n A_n : α_1, . . . , α_n ∈ F},


where A_1, . . . , A_n are the n columns of A. Thus, R(A) is the span of the columns of A;
and hence, it is a subspace of F^{m×1}. We thus refer to R(A) as the range space of A.
In this terminology, rank(A), which is the maximum number of linearly independent
columns of A, is the maximum number of linearly independent vectors in R(A). That is,
rank(A) = dim(R(A)).
In the RREF of A, the un-pivoted columns are linear combinations of the pivoted
ones. If rank(A) = r, then the pivoted columns are e_1, . . . , e_r. Thus, in A, the columns
that correspond to the pivoted ones form a basis of R(A).
The null space N(A) = {x ∈ F^{n×1} : Ax = 0} of A is simply the set of solutions of
the homogeneous system Ax = 0. If u, v ∈ N(A), then A(u + v) = Au + Av = 0. For
any scalar α, A(αu) = αAu = 0. Therefore, N(A) is a subspace of F^{n×1}; and we
refer to N(A) as the null space of A.
The nullity of A, denoted by null(A) := n − rank(A), is the maximum number of
linearly independent vectors in N(A). That is,
null(A) = dim(N(A)).
As we know from the properties of a linear system, null(A) gives the number of
linearly independent solutions of the homogeneous system Ax = 0. In fact, by
Gauss-Jordan elimination we can construct a basis for N(A).
Explicitly, if A has more rows than columns, then we neglect the last m − n zero
rows in the RREF of A; and if A has fewer rows than columns, we put in n − m
zero rows at the bottom of the RREF of A, to get a square matrix. From the
resulting square matrix B, we collect all un-pivoted columns, and if an un-pivoted
column had the column index j in B, then we change its jth entry from 0 to −1.
These changed un-pivoted column vectors form a basis of N(A).
Example 3.5
Consider the system matrix in Example 2.9. We had its RREF with its pivots in
the first two columns, as shown below:

A = [ 5   2   3   1          RREF(A) = [ 1  0   5/17  −1/17
      1  −3  −2  −2                      0  1  13/17  11/17
      3   8   7   5 ],                   0  0    0      0   ].

The first two columns in RREF(A) are the pivoted columns. So, the first two
columns in A form a basis for R(A). That is,
Basis for R(A) is {[5, 1, 3]^t, [2, −3, 8]^t}.
For a basis of N(A), we adjoin a zero row to the RREF to make it a square
matrix, to obtain

B = [ 1  0   5/17  −1/17
      0  1  13/17  11/17
      0  0    0      0
      0  0    0      0   ].

Then we change the diagonal entries in the un-pivoted columns to −1. These
changed un-pivoted columns form a basis for N(A). That is,
Basis for N(A) is {[5/17, 13/17, −1, 0]^t, [−1/17, 11/17, 0, −1]^t}.
Check whether these vectors came up in writing the solution set of the system
in Example 2.9.
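As a numerical sanity check (a numpy sketch, with the matrix and null-space vectors of this example as reconstructed here), each constructed basis vector of N(A) is annihilated by A:

```python
import numpy as np

# Matrix of Example 3.5 (the system matrix of Example 2.9).
A = np.array([[5.0,  2.0,  3.0,  1.0],
              [1.0, -3.0, -2.0, -2.0],
              [3.0,  8.0,  7.0,  5.0]])

# Basis of N(A): the sign-tweaked un-pivoted columns of the padded RREF.
n1 = np.array([5/17, 13/17, -1.0, 0.0])
n2 = np.array([-1/17, 11/17, 0.0, -1.0])

# Each null-space basis vector is mapped to zero by A.
assert np.allclose(A @ n1, 0)
assert np.allclose(A @ n2, 0)

# They are linearly independent, so dim(N(A)) is at least 2.
assert np.linalg.matrix_rank(np.column_stack([n1, n2])) == 2
```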
It follows that dim(R(A)) + dim(N(A)) is equal to the dimension of the domain
space of the linear map A, which is the same as the number of columns of A. This
statement is referred to as the rank-nullity theorem.
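The rank-nullity theorem is easy to illustrate numerically. A sketch with numpy (the matrix is an arbitrary example; the nullity is counted here from the singular values):

```python
import numpy as np

# rank(A) + nullity(A) = number of columns of A.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 2.0]])   # third row = first row + second row

r = np.linalg.matrix_rank(A)

# Nullity = n minus the number of (numerically) nonzero singular values.
s = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - int(np.sum(s > 1e-10))

assert r == 2
assert r + nullity == A.shape[1]
```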

Exercises for 3.3


1. Given any subspace U of F^{n×1}, does there exist a matrix A in F^{n×n} such that
U = N(A)?
2. Let A ∈ F^{m×n}. Let {u_1, . . . , u_k} be a basis for N(A). Extend this to {u_1, . . . , u_k,
v_1, . . . , v_{n−k}}, a basis for F^{n×1}. Then show that {Av_1, . . . , Av_{n−k}} is a basis for
R(A). This will give an alternate proof of the rank-nullity theorem.
3. Determine the rank r of
A = [ 1  2  1  1  1
      3  5  3  4  3
      1  1  1  2  1
      5  8  5  7  5 ].
Then express suitable 4 − r rows of A as linear combinations of the other r
rows. Also, express suitable 5 − r columns of A as linear combinations of the
other r columns.
4. If E ∈ F^{m×m} is an elementary matrix and A ∈ F^{m×n}, then show that the row
rank of EA is equal to the row rank of A.
5. If B ∈ F^{m×m} is an invertible matrix and A ∈ F^{m×n}, then show that the column
rank of BA is equal to the column rank of A.
6. From the previous two exercises, conclude that an elementary row operation alters
neither the row rank nor the column rank of a matrix.
7. Let A ∈ F^{m×n}. Prove that the linear map A : F^{n×1} → F^{m×1} given by A(x) := Ax
for each x ∈ F^{n×1} is one-one iff N(A) = {0}.
8. Let A ∈ F^{n×n}. Prove that the linear map A : F^{n×1} → F^{n×1} given by A(x) := Ax
is one-one and onto iff A maps any basis onto another (may be the same) basis of
F^{n×1}.
9. Let A ∈ F^{n×n}. Prove that the linear map A : F^{n×1} → F^{n×1} given by A(x) := Ax
for each x ∈ F^{n×1} is one-one iff it is onto.


3.4 Change of basis

Let B = {v_1, . . . , v_m} be a list of vectors in a subspace V of F^n. Suppose that B is a
basis of V. Let v ∈ V. As B spans V, the vector v is a linear combination of vectors
from B. Can there be two distinct linear combinations? Suppose that there exist
scalars a_1, . . . , a_m ∈ F and b_1, . . . , b_m ∈ F such that
v = a_1 v_1 + ⋯ + a_m v_m = b_1 v_1 + ⋯ + b_m v_m.
Then (a_1 − b_1)v_1 + ⋯ + (a_m − b_m)v_m = 0. Due to linear independence of B, we
conclude that a_1 = b_1, . . . , a_m = b_m. That is, such a linear combination is unique.
Conversely, suppose that each vector in V is written uniquely as a linear combination
of v_1, . . . , v_m. To show linear independence of these vectors, suppose that
α_1 v_1 + ⋯ + α_m v_m = 0
for scalars α_1, . . . , α_m. We also have
0 v_1 + ⋯ + 0 v_m = 0.
From the uniqueness of writing the zero vector as a linear combination of v_1, . . . , v_m,
we conclude that α_1 = 0, . . . , α_m = 0. Therefore, B is linearly independent.
We note down the result we have proved.
Theorem 3.4
Let B = {v_1, . . . , v_m} be an ordered subset of a subspace V of F^n. B is a basis of
V iff for each v ∈ V, there exists a unique column vector [α_1, . . . , α_m]^t ∈ F^{m×1}
such that v = α_1 v_1 + ⋯ + α_m v_m.
Notice that in such a case, dim(V) = m. Once a basis B having m vectors is given
for a subspace V of F^n, the unique column vector [α_1, . . . , α_m]^t ∈ F^{m×1} is called the
coordinate vector of v with respect to the basis B; and it is denoted by [v]_B.
In this sense, we say that a basis provides a coordinate system in a subspace; it
co-ordinatizes the subspace.
Example 3.6
Let V = {[a, b, c]^t : a + 2b + 3c = 0, a, b, c ∈ R}. This subspace has a basis
B = {[0, 3, −2]^t, [2, −1, 0]^t}.
Let v = [3, 0, −1]^t ∈ V. We find that
v = [3, 0, −1]^t = (1/2) [0, 3, −2]^t + (3/2) [2, −1, 0]^t.
Therefore, [v]_B = [1/2, 3/2]^t.
If w ∈ V has coordinate vector given by [w]_B = [1, 1]^t, then
w = 1 [0, 3, −2]^t + 1 [2, −1, 0]^t = [2, 2, −2]^t.
Since dim(V) = 2, all the coordinate vectors are elements of F^{2×1}.
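A coordinate vector can be computed by solving the linear system whose coefficient columns are the basis vectors. A numpy sketch, using the basis of Example 3.6 (signs as reconstructed here):

```python
import numpy as np

# Basis B of V = {[a, b, c]^t : a + 2b + 3c = 0}, as in Example 3.6.
u1 = np.array([0.0, 3.0, -2.0])
u2 = np.array([2.0, -1.0, 0.0])
B = np.column_stack([u1, u2])        # 3x2 matrix with the basis vectors as columns

v = np.array([3.0, 0.0, -1.0])

# [v]_B is the unique solution c of B @ c = v; least squares recovers it exactly here.
c, *_ = np.linalg.lstsq(B, v, rcond=None)

assert np.allclose(c, [0.5, 1.5])    # [v]_B = [1/2, 3/2]^t
assert np.allclose(B @ c, v)         # the combination reconstructs v
```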


When we change the coordinate system, there are two questions to answer. How
do we obtain the new coordinate vectors from the old? And how does a matrix
change with respect to the change in coordinate system?
Let A ∈ F^{m×n}. We view A as a linear map from F^{n×1} to F^{m×1}. For a vector v ∈
F^{n×1}, we have a vector Av ∈ F^{m×1}. In writing the vectors v and Av, we had used the
standard bases, by default. If we choose a different pair of bases, that is, a basis B
for F^{n×1} and a basis C for F^{m×1}, then our second question is formulated as follows:
What is the matrix M such that [Av]_C = M[v]_B?
Let B = {u_1, . . . , u_n} be a basis for F^{n×1}. Suppose the jth standard basis vector of
F^{n×1} can be written as
e_j = c_1 u_1 + ⋯ + c_n u_n,  so that  [e_j]_B = [c_1, . . . , c_n]^t.
Construct a matrix [u_1 u_2 ⋯ u_n] by taking the vectors from B as columns, in that
order. Then
[u_1 ⋯ u_n] [e_j]_B = [u_1 ⋯ u_n] [c_1, . . . , c_n]^t = c_1 u_1 + ⋯ + c_n u_n = e_j.
We thus observe the following.
Observation 1: Let B = {u_1, . . . , u_n} be an ordered basis for F^{n×1}. Let e_j denote the
jth standard basis vector of F^{n×1}. Construct the matrix P = [u_1 ⋯ u_n] by taking the
column vector u_k as its kth column. Then P [e_j]_B = e_j.
We use this observation in the proof of the following result.
Theorem 3.5
Let B = {u_1, . . . , u_n} and C = {v_1, . . . , v_m} be bases for F^{n×1} and F^{m×1}, respectively.
Let A ∈ F^{m×n}. Construct the matrices
P = [u_1 ⋯ u_n],   Q = [v_1 ⋯ v_m],
by taking the column vectors u_i and v_j as columns of the respective matrices.
Then, for each w ∈ F^{n×1}, [Aw]_C = Q⁻¹AP [w]_B.
Proof  Let w = [a_1, . . . , a_n]^t ∈ F^{n×1}. Write Aw = [b_1, . . . , b_m]^t. Also, let e_1, . . . , e_n
be the standard basis vectors in F^{n×1}; and f_1, . . . , f_m the standard basis vectors
in F^{m×1}. We know that, for j = 1, . . . , n and i = 1, . . . , m,
P [e_j]_B = e_j,   Q [f_i]_C = f_i,   w = Σ_{j=1}^n a_j e_j,   Aw = Σ_{i=1}^m b_i f_i.
Then
P [w]_B = P [Σ_{j=1}^n a_j e_j]_B = P Σ_{j=1}^n a_j [e_j]_B = Σ_{j=1}^n a_j P [e_j]_B = Σ_{j=1}^n a_j e_j = w.
Q [Aw]_C = Q [Σ_{i=1}^m b_i f_i]_C = Σ_{i=1}^m b_i Q [f_i]_C = Σ_{i=1}^m b_i f_i = Aw = AP [w]_B.
Since the columns of Q ∈ F^{m×m} are linearly independent, Q is invertible. It
then follows that [Aw]_C = Q⁻¹AP [w]_B.
Theorem 3.5 says that the matrix Q⁻¹AP, when multiplied with the coordinate
vector of w with respect to the basis B, produces the coordinate vector of Aw with
respect to the basis C. In this sense, the matrix Q⁻¹AP now represents the same linear
map A in the new coordinate system.
In particular, taking A as the identity matrix, we see that
[u]_C = Q⁻¹P [u]_B.
Here, of course, both B and C are bases for the same space F^{n×1}. This formula
shows how the coordinate vector changes when a basis changes, thus answering our
first question. The matrix Q⁻¹P is called the change of basis matrix.
Example 3.7
Consider the following bases for R^3:
O = {(1, 0, 1), (1, 1, 0), (0, 1, 1)},   N = {(1, −1, 1), (1, 1, −1), (−1, 1, 1)}.
Find the change of basis matrix M when the basis changes from O to N. Also
find the matrix B that represents the linear map given by

A = [ 1   1   1
     −1   0   1
      0   1   0 ]

and verify that
[(1, 2, 3)^t]_N = M [(1, 2, 3)^t]_O,   [A(1, 2, 3)^t]_N = B [(1, 2, 3)^t]_O.
We consider the transposes of the basis vectors and work in R^{3×1}. As per
the construction in Theorem 3.5, the change of basis matrix is M = Q⁻¹P, where
Q and P have the vectors of N and of O, respectively, as their columns. This gives

M = [  1   1/2  1/2
      1/2   1   1/2
      1/2  1/2   1  ].

To verify our result for (1, 2, 3), notice that
(1, 2, 3) = 1 (1, 0, 1) + 0 (1, 1, 0) + 2 (0, 1, 1),
(1, 2, 3) = 2 (1, −1, 1) + (3/2) (1, 1, −1) + (5/2) (−1, 1, 1).
Therefore, [(1, 2, 3)]_O = [1, 0, 2]^t and [(1, 2, 3)]_N = [2, 3/2, 5/2]^t. Then
M [(1, 2, 3)]_O = M [1, 0, 2]^t = [2, 3/2, 5/2]^t = [(1, 2, 3)]_N.
According to Theorem 3.5,

B = Q⁻¹AP = (1/2) [ 2  3  3
                    2  1  3
                    0  0  2 ].

As to the verification,
A [1, 2, 3]^t = [6, 2, 2]^t = 4 [1, −1, 1]^t + 4 [1, 1, −1]^t + 2 [−1, 1, 1]^t.
We find that
B [(1, 2, 3)]_O = B [1, 0, 2]^t = [4, 4, 2]^t = [A [1, 2, 3]^t]_N.
Observe that the matrix B is now a linear map from R^{3×1} with basis O to
the space R^{3×1} with basis N.
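The computations above can be reproduced numerically. A numpy sketch, using the two bases of Example 3.7 (signs as reconstructed here):

```python
import numpy as np

# Bases of Example 3.7, written as the columns of P (old basis O) and Q (new basis N).
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
Q = np.array([[ 1.0, 1.0, -1.0],
              [-1.0, 1.0,  1.0],
              [ 1.0, -1.0, 1.0]])

M = np.linalg.inv(Q) @ P              # change of basis matrix from O to N

u_O = np.array([1.0, 0.0, 2.0])       # [(1, 2, 3)]_O
u_N = M @ u_O

assert np.allclose(u_N, [2.0, 1.5, 2.5])   # [(1, 2, 3)]_N = [2, 3/2, 5/2]^t
assert np.allclose(Q @ u_N, P @ u_O)       # both reconstruct the vector (1, 2, 3)
```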

Exercises for 3.4


1. Consider the subspace V = {[a, b, c]^t : a + 2b + 3c = 0, a, b, c ∈ R} of F^{3×1}. V
has bases B_1 = {[0, 3, −2]^t, [2, −1, 0]^t} and B_2 = {[1, 1, −1]^t, [3, 0, −1]^t}. Let
A = [ 3   1   0
      0   1   3
     −1  −1  −2 ].
(a) Show that A : V → V is a well defined map.
(b) Extend the basis B_1 of V to a basis O for F^{3×1}.
(c) Extend the basis B_2 of V to a basis N for F^{3×1}.
(d) Find the change of basis matrix M when the basis changes from O to N.
(e) Verify that [v]_N = M[v]_O for v = [4, 1, −2]^t.
(f) Find a matrix M′ that represents A, obtained by changing the basis from O
to N.
(g) Verify that [Av]_N = M′[v]_O.
(h) Find a matrix B such that [Av]_N = B[v]_N for any v ∈ F^{3×1}.
2. Given any subspace U of F^{n×1}, does there exist a matrix A in F^{n×n} such that
U = R(A)?

3.5 Equivalence and Similarity

In view of Theorem 3.5, we say that two matrices A, B ∈ F^{m×n} are equivalent iff
there exist invertible matrices P ∈ F^{n×n} and Q ∈ F^{m×m} such that B = Q⁻¹AP.
Observe that equivalent matrices represent the same linear map (matrix) with respect
to possibly different pairs of bases. Therefore, the ranks of two equivalent matrices
are the same.
We can construct a matrix of rank r relatively easily. Let r ≤ min{m, n}. The
matrix R_r ∈ F^{m×n} whose first r columns are the first standard basis vectors e_1, . . . , e_r
of F^{m×1} and all other entries are zero, has rank r. That is, in block form,

R_r = [ I_r  0
         0   0 ].

Such a matrix is called a rank echelon matrix. For notational ease, we do not show
the size of a rank echelon matrix; we rather specify it in different contexts. From
Theorem 2.4, it follows that any matrix which is equivalent to R_r also has rank r.
Conversely, if a row of a matrix is a linear combination of the other rows, then the
rank of the matrix is the same as that of the matrix obtained by deleting such a row.
Similarly, deleting a column which is a linear combination of the other columns does
not change the rank of the matrix. It is thus possible to perform elementary row and
column operations to bring a matrix of rank r to its rank echelon form R_r. We state
this result and give a rigorous proof.
Theorem 3.6
(Rank factorization) A matrix is of rank r iff it is equivalent to the rank echelon
matrix R_r of the same size.
Proof  Let A ∈ F^{m×n}. Suppose rank(A) = r. Convert A to its row reduced
echelon form C := E_1 A, where E_1 is a suitable product of elementary matrices.
Now, each non-pivotal column is a linear combination of the r pivotal columns.
Consider the matrix C^t. The pivotal columns of C are now the pivotal rows
e_i^t in C^t. Each other row in C^t is a linear combination of these pivotal rows.
Exchange the rows of C^t so that the first r rows of the new C^t are e_1^t, . . . , e_r^t, in
that order. Use suitable elementary row operations to zero-out all non-pivotal
rows. This is possible since each non-pivotal row is a linear combination of
e_1^t, . . . , e_r^t. Thus we obtain the matrix E_2 C^t, where E_2 is a suitable product of
elementary matrices such that E_2 C^t has first r rows e_1^t, . . . , e_r^t and all other
rows are zero rows. Then, taking the transpose, we see that C E_2^t is a matrix whose
first r columns are e_1, . . . , e_r and all other columns are zero columns.
To summarize, we have obtained two matrices E_1 and E_2, which are products
of elementary matrices, such that

C E_2^t = E_1 A E_2^t = [ I_r  0
                           0   0 ] = R_r.

With P = (E_2^t)⁻¹ and Q = E_1⁻¹, we see that A = Q⁻¹ R_r P.
For the converse, suppose that A = Q⁻¹ R_r P for some invertible matrices P
and Q. By Theorem 2.4, rank(A) = rank(R_r) = r.
The rank factorization can be used to characterize equivalence of matrices. If A
and B are equivalent matrices, then clearly they have the same rank. Conversely,
if two m × n matrices have the same rank r, then both of them are equivalent to R_r,
and hence they are equivalent to each other. Therefore, we have the rank theorem,
which is stated as follows:
Two matrices of the same size are equivalent iff they have the same rank.
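A quick numerical illustration of the rank echelon matrix and of rank invariance under equivalence (numpy; the invertible P and Q below are arbitrarily chosen unit triangular matrices):

```python
import numpy as np

def rank_echelon(m, n, r):
    """The m x n rank echelon matrix R_r: an I_r block in the top-left corner."""
    R = np.zeros((m, n))
    R[:r, :r] = np.eye(r)
    return R

R2 = rank_echelon(3, 4, 2)
assert np.linalg.matrix_rank(R2) == 2

# Invertible (unit upper triangular) P and Q; any matrix Q^{-1} R_r P is
# equivalent to R_r and therefore has the same rank r.
P = np.triu(np.ones((4, 4)))
Q = np.triu(np.ones((3, 3)))
A = np.linalg.inv(Q) @ R2 @ P
assert np.linalg.matrix_rank(A) == 2
```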
Observe that

R_r = [ I_r  0 ] = [ I_r ] [ I_r  0 ].
      [  0   0 ]   [  0  ]

Therefore, any matrix A of rank r can be written as
A = BC,  with  B = Q [ I_r; 0 ],  C = [ I_r  0 ] P⁻¹,
for some invertible matrices P and Q. Here, B ∈ F^{m×r} is of rank r and C ∈ F^{r×n} is of
rank r also. Such a factorization of a matrix is called a full rank factorization.
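A full rank factorization can be produced numerically in several ways. The sketch below uses the SVD (a different route from the R_r construction in the text) on an arbitrarily chosen rank-2 matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # second row = 2 * first row
              [1.0, 0.0, 1.0]])

r = np.linalg.matrix_rank(A)
U, s, Vt = np.linalg.svd(A)

B = U[:, :r] * s[:r]          # m x r, columns scaled by the nonzero singular values
C = Vt[:r, :]                 # r x n

assert r == 2
assert np.allclose(B @ C, A)                       # A = B C exactly (A has rank r)
assert np.linalg.matrix_rank(B) == r
assert np.linalg.matrix_rank(C) == r
```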
The notion of equivalence stems from the change of bases in both the domain and
the co-domain of a matrix viewed as a linear map. In case the matrix is a square
matrix of order n, it is considered as a linear map on F^{n×1}. If we change the basis in
F^{n×1}, we would have a corresponding representation of the matrix in the new basis.
Let A ∈ F^{n×n}, a square matrix of order n. The matrix A is a map from F^{n×1} to
F^{n×1}. Let E = {e_1, . . . , e_n} be the standard basis of F^{n×1}. The matrix A acts in the
usual way: Ae_j is the jth column of A. Suppose we change the basis of F^{n×1} to
C = {v_1, . . . , v_n}. That is, in both the domain and the co-domain space, we take the
new basis as C. Then the following equation will hold for each v ∈ F^{n×1}:
[Av]_C = P⁻¹AP [v]_C,  where  P = [v_1 ⋯ v_n].
The matrix A as a linear map now takes the form P⁻¹AP, where the columns of P
form a basis, or a new coordinate system, in F^{n×1}. This leads to similarity of two
matrices.


We say that two matrices A, B ∈ F^{n×n} are similar iff B = P⁻¹AP for some invertible
matrix P ∈ F^{n×n}.
We emphasize that if B = P⁻¹AP is a matrix similar to A, then the matrix A as a
linear map on F^{n×1} with the standard basis, and the matrix B as a linear map on F^{n×1}
with an ordered basis given by the columns of P, are the same linear map.
Let N be the ordered basis whose jth element is the jth column of P. We see that
for each vector v ∈ F^{n×1}, [Av]_N = P⁻¹AP [v]_N.
Example 3.8
Consider the basis N = {[1, −1, 1]^t, [1, 1, −1]^t, [−1, 1, 1]^t} for R^{3×1}. To determine
the matrix similar to

A = [ 1   1   1
     −1   0   1
      0   1   0 ],

when the basis is changed from the standard basis to N, we construct the matrix P
by taking the basis vectors of N as its columns:

P = [  1   1  −1
      −1   1   1
       1  −1   1 ],   with   P⁻¹ = (1/2) [ 1  0  1
                                           1  1  0
                                           0  1  1 ].

Then the matrix similar to A with the change of basis to N is

B = P⁻¹AP = (1/2) [  0   2   2
                     1  −1   3
                    −1  −1   3 ].

From Example 3.7, we know that for u = [1, 2, 3]^t,
[u]_N = [2, 3/2, 5/2]^t,   [Au]_N = [4, 4, 2]^t.
Now,
B [u]_N = B [2, 3/2, 5/2]^t = [4, 4, 2]^t.
This verifies the condition [Au]_N = B [u]_N for the vector u = [1, 2, 3]^t.
Though equivalence is easy to characterize by the rank, similarity is much more
difficult. We postpone this to a later chapter.

Exercises for 3.5


1. Let A ∈ C^{m×n}. Define T : C^{1×m} → C^{1×n} by T(x) = xA for x ∈ C^{1×m}. Show
that T is a linear map. Identify T(e_j^t). Find a rank factorization of A.
2. Define T : R^{3×1} → R^{2×1} by T([a, b, c]^t) = [c, b + a]^t. Show that T is a linear
map. Find a matrix A ∈ R^{2×3} such that T([a, b, c]^t) = A[a, b, c]^t. Determine a
full rank factorization of A.
3. Let T : R^{3×1} → R^{3×1} be a linear map defined by
T([a, b, c]^t) = [a + b, 2a − b − c, a + b + c]^t.
Let A ∈ R^{3×3} be the matrix such that T(x) = Ax for x ∈ R^{3×1}. Find rank(A).
Then determine a rank factorization of A.
4. Using rank factorization, show that for any m × k matrix A and k × n matrix B,
rank(AB) ≤ min{rank(A), rank(B)}.
5. Let A, B ∈ F^{n×n}. Show that rank(A) + rank(B) − n ≤ rank(AB).
6. Using full rank factorization, show that rank(A + B) ≤ rank(A) + rank(B) for
A, B ∈ F^{m×n}.
7. Which matrices are equivalent to the zero matrix?
8. Which matrices are similar to the zero matrix?
9. Which matrices are equivalent to the identity matrix?
10. Which matrices are similar to the identity matrix?
11. Is a rank factorization of a matrix unique?
12. Is a full rank factorization of a matrix unique?

4
Orthogonalization

4.1 Inner products

The dot product in R^3 is used to define length and angle. In particular, the dot
product is used to determine when two vectors become perpendicular to each other.
This notion can be generalized to F^n.
For vectors u, v ∈ F^{1×n}, we define their inner product as
⟨u, v⟩ = u v*.
For example, if u = [1, 2, 3], v = [2, 1, 3], then ⟨u, v⟩ = 1·2 + 2·1 + 3·3 = 13.
Similarly, for x, y ∈ F^{n×1}, we define their inner product as
⟨x, y⟩ = y* x.
In case F = R, the adjoint in the definition of the inner product is simply the
transpose; for instance, y* becomes y^t. The inner product satisfies the following
properties: for x, y, z ∈ F^n and α, β ∈ F,
1. ⟨x, x⟩ ≥ 0.
2. ⟨x, x⟩ = 0 iff x = 0.
3. ⟨x, y⟩ = conj ⟨y, x⟩, the complex conjugate of ⟨y, x⟩.
4. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
5. ⟨z, x + y⟩ = ⟨z, x⟩ + ⟨z, y⟩.
6. ⟨αx, y⟩ = α ⟨x, y⟩.
7. ⟨x, βy⟩ = conj(β) ⟨x, y⟩.
Any vector space V with a map ⟨·,·⟩ : V × V → F that satisfies Properties
(1)-(4) and (6) is called an inner product space. Properties (5) and (7) follow from
the others.
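These properties can be checked numerically over F = C. A numpy sketch (the vectors and scalars below are arbitrary choices):

```python
import numpy as np

# Inner product on F^{n x 1} with F = C:  <x, y> = y* x (conjugate the second argument).
def ip(x, y):
    return np.vdot(y, x)   # np.vdot conjugates its first argument, so this is y* x

x = np.array([1 + 1j, 2.0, -1j])
y = np.array([2.0, 1j, 3.0])
z = np.array([0.5, -1.0, 1 + 2j])
a, b = 2 - 1j, 1 + 3j

assert np.isclose(ip(x, x).imag, 0) and ip(x, x).real >= 0     # property 1
assert np.isclose(ip(x, y), np.conj(ip(y, x)))                 # property 3
assert np.isclose(ip(x + y, z), ip(x, z) + ip(y, z))           # property 4
assert np.isclose(ip(a * x, y), a * ip(x, y))                  # property 6
assert np.isclose(ip(x, b * y), np.conj(b) * ip(x, y))         # property 7
```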
The inner product gives rise to the length of a vector, as in the familiar case of
R^{1×3}. We now call the generalized version of length the norm.


For u ∈ F^n, we define its norm, denoted by ‖u‖, as the nonnegative square root of
⟨u, u⟩. That is,
‖u‖ = √⟨u, u⟩.
The norm satisfies the following properties: for x, y ∈ F^n and α ∈ F,
1. ‖x‖ ≥ 0.
2. ‖x‖ = 0 iff x = 0.
3. ‖αx‖ = |α| ‖x‖.
4. |⟨x, y⟩| ≤ ‖x‖ ‖y‖. (Cauchy-Schwartz inequality)
5. ‖x + y‖ ≤ ‖x‖ + ‖y‖. (Triangle inequality)
A proof of the Cauchy-Schwartz inequality goes as follows:
If y = 0, then the inequality clearly holds. Else, ⟨y, y⟩ ≠ 0. Write α = ⟨x, y⟩/⟨y, y⟩.
Then conj(α) = ⟨y, x⟩/⟨y, y⟩ and conj(α)⟨x, y⟩ = α⟨y, x⟩ = |α|² ‖y‖². Then
0 ≤ ⟨x − αy, x − αy⟩ = ⟨x, x⟩ − conj(α)⟨x, y⟩ − α⟨y, x⟩ + |α|² ⟨y, y⟩
  = ‖x‖² − |α|² ‖y‖² = ‖x‖² − |⟨x, y⟩|² / ‖y‖².
The triangle inequality can be proved using Cauchy-Schwartz, as in the following:
‖x + y‖² = ⟨x + y, x + y⟩ = ‖x‖² + ‖y‖² + ⟨x, y⟩ + ⟨y, x⟩ ≤ ‖x‖² + ‖y‖² + 2‖x‖ ‖y‖ = (‖x‖ + ‖y‖)².
Using these properties, the acute (non-obtuse) angle between any two nonzero
vectors can be defined. Let x, y ∈ F^{1×n} (or in F^{n×1}). The angle between x and y,
denoted by θ(x, y), is defined by
cos θ(x, y) = |⟨x, y⟩| / (‖x‖ ‖y‖).
We single out a particular case. We say that a vector x is orthogonal to a vector y,
and write it as x ⊥ y, iff ⟨x, y⟩ = 0.
Notice that this definition allows x and y to be zero vectors; indeed, the zero vector is
orthogonal to every vector. Also, if x ⊥ y, then y ⊥ x; thus whenever x is orthogonal
to y, we say that x and y are orthogonal vectors.
It follows that if x ⊥ y, then ‖x‖² + ‖y‖² = ‖x + y‖². This is referred to as the
Pythagoras law. The converse of the Pythagoras law holds when F = R. For F = C,
it does not hold, in general.
We extend the notion of orthogonality to a set of vectors. A set of nonzero vectors
in F^n is called an orthogonal set in F^n iff each vector in the set is orthogonal to every
other vector in the set. An orthogonal set of vectors is called an orthonormal set if
the norm of each vector is 1. For example,
{[1, 2, 3]^t, [2, −1, 0]^t}
is an orthogonal set in F^{3×1}. And
{[1/√14, 2/√14, 3/√14]^t, [2/√5, −1/√5, 0]^t}
is an orthonormal set in F^{3×1}. The standard basis {e_1, . . . , e_n} is an orthonormal set in
F^n. Orthogonal and orthonormal sets enjoy nice properties; some of them are listed
in the exercises.

Exercises for 4.1


1. In C, consider the inner product ⟨x, y⟩ = x conj(y). Let x = 1 and y = i be two vectors
in C. Show that ‖x‖² + ‖y‖² = ‖x + y‖² but ⟨x, y⟩ ≠ 0.
2. In F^{n×1}, show that the parallelogram law holds. That is, for all x, y ∈ F^{n×1}, we
have ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²).
3. Write a vector u as u = (⟨u, v⟩/‖v‖²) v + w for v ≠ 0. Show that ⟨v, w⟩ = 0. Then use
the Pythagoras theorem to derive the Cauchy-Schwartz inequality.
4. Is the set {[1, 2, 3, −1]^t, [2, −1, 0, 0]^t, [0, 0, 1, 3]^t} an orthogonal set in F^{4×1}?
Is it also a linearly independent set?
5. Prove that each orthogonal set in F^n is linearly independent.
6. Construct an orthonormal set from {[1, 2, 3, −1]^t, [2, −1, 0, 0]^t, [0, 0, 1, 3]^t}.
7. If an orthogonal set is given, how do we construct an orthonormal set from it?
8. Let B = {v_1, . . . , v_m} be an orthonormal set in F^n. Let V = span(B). Let x ∈ F^n.
Prove the following:
(a) Fourier Expansion: If x ∈ V, then x = Σ_{j=1}^m ⟨x, v_j⟩ v_j.
(b) Parseval's Identity: If x ∈ V, then ‖x‖² = Σ_{j=1}^m |⟨x, v_j⟩|².
(c) Bessel's Inequality: ‖x‖² ≥ Σ_{j=1}^m |⟨x, v_j⟩|².

4.2 Gram-Schmidt orthogonalization

It is easy to see that if the nonzero vectors v_1, . . . , v_n are orthogonal, then they are
also linearly independent. For, suppose v_1, . . . , v_n are nonzero orthogonal vectors.
Assume that
α_1 v_1 + ⋯ + α_n v_n = 0.
Let j ∈ {1, . . . , n}. Take the inner product of the left hand side and the right hand side
of the above equation with v_j. If i ≠ j, then ⟨v_i, v_j⟩ = 0. So, we have α_j ⟨v_j, v_j⟩ = 0.
But v_j ≠ 0 implies that ⟨v_j, v_j⟩ ≠ 0. Therefore, α_j = 0. That is,
α_1 = ⋯ = α_n = 0.
Therefore, the vectors v_1, . . . , v_n are linearly independent.
Conversely, given n linearly independent vectors u_1, . . . , u_n (necessarily all nonzero),
we can orthogonalize them. If u_1, . . . , u_k are linearly independent but u_1, . . . , u_k, u_{k+1}
are linearly dependent, then we will see that our orthogonalization process will yield
the (k + 1)th vector as the zero vector. We now discuss this method, called Gram-Schmidt
orthogonalization.
Given two linearly independent vectors u_1, u_2 on the plane, how do we construct
two orthogonal vectors? Keep v_1 = u_1. Take out the projection of u_2 on u_1 to get v_2.
Then v_2 ⊥ v_1.
What is the projection of u_2 on u_1? Its length is determined by ⟨u_2, u_1⟩, and its
direction is that of u_1. Thus, taking v_1 = u_1 and
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1
does the job. You can now verify that ⟨v_2, v_1⟩ = 0. We may continue this process of
taking away projections in F^n. It results in the following process.
Theorem 4.1
(Gram-Schmidt orthogonalization) Let u_1, u_2, . . . , u_n be linearly independent
vectors in F^n. Define
v_1 = u_1,
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1,
  ⋮
v_n = u_n − (⟨u_n, v_1⟩/⟨v_1, v_1⟩) v_1 − ⋯ − (⟨u_n, v_{n−1}⟩/⟨v_{n−1}, v_{n−1}⟩) v_{n−1}.
Then v_1, v_2, . . . , v_n are orthogonal and span{v_1, v_2, . . . , v_n} = span{u_1, u_2, . . . , u_n}.
Proof
⟨v_2, v_1⟩ = ⟨u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1, v_1⟩ = ⟨u_2, v_1⟩ − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) ⟨v_1, v_1⟩ = 0.
Also, span{u_1, u_2} = span{v_1, v_2}. To complete the proof, use induction.
Observe that if u_1, . . . , u_k are linearly independent but u_1, . . . , u_{k+1} are linearly
dependent, then the Gram-Schmidt process will compute nonzero orthogonal vectors
v_1, . . . , v_k and it will give v_{k+1} as the zero vector.

Example 4.1
The vectors u_1 = [1, 0, 0], u_2 = [1, 1, 0], u_3 = [1, 1, 1] are linearly independent
in R^{1×3}. Apply Gram-Schmidt orthogonalization.
v_1 = [1, 0, 0].
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1 = [1, 1, 0] − 1·[1, 0, 0] = [0, 1, 0].
v_3 = u_3 − (⟨u_3, v_1⟩/⟨v_1, v_1⟩) v_1 − (⟨u_3, v_2⟩/⟨v_2, v_2⟩) v_2 = [1, 1, 1] − [1, 0, 0] − [0, 1, 0] = [0, 0, 1].
The vectors [1, 0, 0], [0, 1, 0], [0, 0, 1] are orthogonal.

Example 4.2
The vectors u_1 = [1, 1, 0], u_2 = [0, 1, 1], u_3 = [1, 0, 1] form a basis for F^{1×3}. Apply
Gram-Schmidt orthogonalization.
v_1 = [1, 1, 0].
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1 = [0, 1, 1] − (1/2) [1, 1, 0] = [−1/2, 1/2, 1].
v_3 = u_3 − (⟨u_3, v_1⟩/⟨v_1, v_1⟩) v_1 − (⟨u_3, v_2⟩/⟨v_2, v_2⟩) v_2
    = [1, 0, 1] − (1/2) [1, 1, 0] − (1/3) [−1/2, 1/2, 1] = [2/3, −2/3, 2/3].
The set {[1, 1, 0], [−1/2, 1/2, 1], [2/3, −2/3, 2/3]} is orthogonal.

Example 4.3
Apply the Gram-Schmidt orthogonalization process on the vectors u_1 = [1, 1, 0, −1],
u_2 = [0, 1, 1, 1] and u_3 = [1, 3, 2, 1].
v_1 = [1, 1, 0, −1].
v_2 = u_2 − (⟨u_2, v_1⟩/⟨v_1, v_1⟩) v_1 = [0, 1, 1, 1] − 0·[1, 1, 0, −1] = [0, 1, 1, 1].
v_3 = u_3 − (⟨u_3, v_1⟩/⟨v_1, v_1⟩) v_1 − (⟨u_3, v_2⟩/⟨v_2, v_2⟩) v_2
    = [1, 3, 2, 1] − [1, 1, 0, −1] − 2 [0, 1, 1, 1] = [0, 0, 0, 0].
Discarding v_3, which is the zero vector, we have only two linearly independent
vectors out of u_1, u_2, u_3. They are u_1 and u_2; and u_3 is a linear combination of
these two. In fact, the process also revealed that u_3 = u_1 + 2u_2.
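The process of Theorem 4.1 can be sketched in a few lines of numpy (real case, classical Gram-Schmidt; the sketch assumes, as in this example, that any dependent vector occurs last, so no zero division arises). With the vectors of Example 4.3 (signs as reconstructed here), the third output is the zero vector:

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt for real vectors, following Theorem 4.1."""
    ortho = []
    for u in vectors:
        v = u.astype(float)
        for w in ortho:
            v = v - (np.dot(u, w) / np.dot(w, w)) * w   # subtract projection on w
        ortho.append(v)
    return ortho

u1 = np.array([1.0, 1.0, 0.0, -1.0])
u2 = np.array([0.0, 1.0, 1.0, 1.0])
u3 = u1 + 2 * u2                      # dependent: u3 = u1 + 2 u2, as in Example 4.3

v1, v2, v3 = gram_schmidt([u1, u2, u3])
assert np.isclose(np.dot(v1, v2), 0)  # v1 and v2 are orthogonal
assert np.allclose(v3, 0)             # dependence is revealed by a zero vector
```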
An orthogonal set can be made orthonormal by dividing each vector by its norm.
You can also modify the Gram-Schmidt orthogonalization process to directly output
orthonormal vectors.
Let V be a subspace of F^n. An orthogonal subset of V which is also a basis of V is
called an orthogonal basis of V. Similarly, when an orthonormal set is a basis of V,
the set is said to be an orthonormal basis of V.
For example, the standard basis {e_1, . . . , e_n} of F^{n×1} is an orthonormal basis of
F^{n×1}. Similarly, {e_1^t, . . . , e_n^t} is an orthonormal basis of F^{1×n}.
The Gram-Schmidt procedure constructs an orthogonal or an orthonormal basis from
a given basis of any subspace of F^n. It also shows that every subspace of F^n has an
orthogonal (orthonormal) basis.
In an orthonormal basis, the components of the coordinate vector of any vector can
be expressed through the inner product. Revisit Exercise 8(a) of Section 4.1. The
Fourier expansion there says that the coordinate vector of any vector x with respect
to an orthonormal basis has as its jth component the inner product of x with the jth
basis vector.

Exercises for 4.2


1. Using the Gram-Schmidt process, orthonormalize the vectors [1, 1, 1], [1, 0, 1],
and [0, 1, 2].
2. Find u ∈ R^{1×3} so that [1/√3, 1/√3, 1/√3], [1/√2, 0, −1/√2], u are
orthonormal. Form a matrix with the vectors as rows, in that order. Verify that the
columns of the matrix are also orthonormal.
3. Show that the cross product u × v of two linearly independent vectors u, v in
R^{1×3} is orthogonal to both u and v. How can this third vector u × v be obtained by the
Gram-Schmidt process?
4. If U = span{v_1, . . . , v_m} is a subspace of F^n, can we use the Gram-Schmidt
process to extract a basis for U using the vectors v_1, . . . , v_m?
5. How do we use the Gram-Schmidt process to compute the rank of a matrix?

4.3 Best approximation

To find the point in a given plane closest to a point in space, we drop a perpendicular
from the point to the plane. The foot of the perpendicular is the closest point in the
plane.
Let U be a subspace of F^n. Let v ∈ F^n. A vector u ∈ U is a best approximation
of v iff ‖v − u‖ ≤ ‖v − x‖ for each x ∈ U.
We show that our intuition of taking a perpendicular can be used to compute a best
approximation.
Theorem 4.2
Let U be a subspace of F^n. A vector u ∈ U is a best approximation of v ∈ F^n iff
v − u ⊥ U. Moreover, a best approximation is unique.
Proof  Suppose v − u ⊥ U. Let x ∈ U. Now, u − x ∈ U. By the Pythagoras theorem,
‖v − x‖² = ‖(v − u) + (u − x)‖² = ‖v − u‖² + ‖u − x‖² ≥ ‖v − u‖².
Hence, ‖v − u‖ ≤ ‖v − x‖ for each x ∈ U; so, u is a best approximation of v.
Conversely, suppose u is a best approximation of v. Then
‖v − u‖ ≤ ‖v − x‖ for each x ∈ U.  (4.1)
Let y ∈ U. We show that ⟨v − u, y⟩ = 0. For y = 0, clearly ⟨v − u, y⟩ = 0. For y ≠ 0,
let α = ⟨v − u, y⟩/‖y‖². Then ⟨v − u, αy⟩ = |α|² ‖y‖². Thus, ⟨αy, v − u⟩ = |α|² ‖y‖²
also. From (4.1), we have
‖v − u‖² ≤ ‖v − u − αy‖² = ⟨v − u − αy, v − u − αy⟩
  = ‖v − u‖² − ⟨v − u, αy⟩ − ⟨αy, v − u⟩ + |α|² ‖y‖²
  = ‖v − u‖² − |α|² ‖y‖².
Hence, |α|² ‖y‖² = 0. As y ≠ 0, |α|² = 0. It follows that ⟨v − u, y⟩ = 0.
To show uniqueness of a best approximation, suppose that u and w are best
approximations of v. Then ‖v − u‖ ≤ ‖v − w‖ and ‖v − w‖ ≤ ‖v − u‖. So,
‖v − u‖ = ‖v − w‖.
Now, since v − w ⊥ w − u, by the Pythagoras theorem,
‖v − u‖² = ‖(v − w) + (w − u)‖² = ‖v − w‖² + ‖w − u‖² = ‖v − u‖² + ‖w − u‖².
Thus, ‖w − u‖² = 0. That is, w = u.
In the presence of an orthonormal basis, we can have an explicit expression for the
best approximation.
Theorem 4.3
Let {u_1, . . . , u_n} be an orthonormal basis for a subspace U of F^n. Let v ∈ F^n.
The unique best approximation of v from U is u = Σ_{i=1}^n ⟨v, u_i⟩ u_i.
Proof  Let u = Σ_{i=1}^n ⟨v, u_i⟩ u_i. Let x ∈ U. We have scalars α_1, . . . , α_n such that
x = Σ_{j=1}^n α_j u_j. Let j ∈ {1, . . . , n}. Then
⟨v − u, u_j⟩ = ⟨v − Σ_{i=1}^n ⟨v, u_i⟩ u_i, u_j⟩ = ⟨v, u_j⟩ − Σ_{i=1}^n ⟨v, u_i⟩ ⟨u_i, u_j⟩ = ⟨v, u_j⟩ − ⟨v, u_j⟩ = 0.
Therefore, ⟨v − u, x⟩ = 0. Due to Theorem 4.2, u is the best approximation of
v from U.
In case {u_1, . . . , u_n} is any basis for U, we may orthonormalize it using the Gram-Schmidt
process, and then apply the formula given in Theorem 4.3 for computing
the best approximation. Alternatively, we may write u = Σ_{j=1}^n α_j u_j. Our requirement
v − u ⊥ u_i means determining the α_j from ⟨v − Σ_{j=1}^n α_j u_j, u_i⟩ = 0. Thus, we solve the
linear system
Σ_{j=1}^n ⟨u_j, u_i⟩ α_j = ⟨v, u_i⟩,  i = 1, . . . , n.
Theorem 4.3 guarantees that this linear system has a unique solution. Notice that the
system matrix of this linear system is A = [a_ij], where a_ij = ⟨u_j, u_i⟩. Such a matrix,
which results from a basis by taking the inner products of basis vectors, is called a
Gram matrix. Our result shows that a Gram matrix is invertible. Can you prove
directly that a Gram matrix is invertible?
Example 4.4
Find the best approximation of v = (1, 0) ∈ R² from U = {(a, a) : a ∈ R}.
We seek (α, α) so that (1, 0) − (α, α) ⊥ (β, β) for all β. That is, we find
(α, α) so that ⟨(1 − α, −α), (1, 1)⟩ = 0. So, α = 1/2. The best approximation
here is (1/2, 1/2).
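The orthonormal-basis formula of Theorem 4.3 is easy to run numerically. The following is a minimal sketch (assumption: real vectors with the Euclidean inner product, and an orthonormal basis of U supplied by hand), applied to Example 4.4, where U has the orthonormal basis {(1/√2, 1/√2)}.

```python
import math

def dot(x, y):
    # Euclidean inner product of two real vectors
    return sum(a * b for a, b in zip(x, y))

def best_approximation(v, onb):
    """Return u = sum_i <v, u_i> u_i for an orthonormal basis `onb` of U."""
    u = [0.0] * len(v)
    for b in onb:
        c = dot(v, b)
        u = [ui + c * bi for ui, bi in zip(u, b)]
    return u

# Example 4.4: U = {(a, a)} has the orthonormal basis {(1/sqrt 2, 1/sqrt 2)}.
onb = [(1 / math.sqrt(2), 1 / math.sqrt(2))]
u = best_approximation((1.0, 0.0), onb)
print([round(x, 6) for x in u])   # [0.5, 0.5]
```

The same routine covers exercise 1 below once an orthonormal basis for U is produced by Gram-Schmidt.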


Exercises for 4.3


1. Find the best approximation of x ∈ V from U, where
   (a) V = R³, x = (1, 2, 1), U = span {(3, 1, 2), (1, 0, 1)}.
   (b) V = R³, x = (1, 2, 1), U = {(a, b, c) ∈ R³ : a + b + c = 0}.
   (c) V = R⁴, x = (1, 0, 1, 1), U = span {(1, 0, 1, 1), (0, 0, 1, 1)}.

4.4  QR factorization and least squares

Notice that we have discussed two ways of extracting a basis for the span of a finite number of vectors from Fⁿ. One is the method of elementary row operations and the other is Gram-Schmidt orthogonalization. The orthogonalization is a superior tool, though computationally more demanding. We will now see one of its applications in factorizing a matrix.

Let u₁, u₂, …, uₙ be the columns of A ∈ F^{m×n}, where m ≥ n. Suppose that the columns are linearly independent. Using the Gram-Schmidt process and then normalizing, we get the vectors v₁, v₂, …, vₙ. Since span {u₁, …, u_k} = span {v₁, …, v_k} for each k = 1, …, n, there exist scalars aᵢⱼ, 1 ≤ i ≤ j ≤ n, such that

u₁ = a₁₁v₁
u₂ = a₁₂v₁ + a₂₂v₂
⋮
uₙ = a₁ₙv₁ + a₂ₙv₂ + ⋯ + aₙₙvₙ.

We take aᵢⱼ = 0 for i > j. Writing R = [aᵢⱼ] for i, j = 1, 2, …, n and Q = [v₁ v₂ ⋯ vₙ], we see that

A = [u₁ u₂ ⋯ uₙ] = QR.

Since the columns of Q are orthonormal, Q ∈ F^{m×n}, R ∈ F^{n×n}, Q*Q = I, and R is upper triangular.

The QR factorization of a matrix A is the determination of a matrix Q with orthonormal columns and an upper triangular matrix R so that A = QR.
Recall that if Q ∈ R^{m×n} has orthonormal columns, then QᵗQ = I. The above discussion boils down to the following result.
Theorem 4.4
Any matrix A ∈ F^{m×n} with linearly independent columns has a QR factorization. Moreover, R is invertible.


Example 4.5

Let A = ⎡1 1⎤ .  Orthonormalization of the columns of A yields  Q = ⎡1/√2 0⎤ .
        ⎢0 1⎥                                                      ⎢ 0   1⎥
        ⎣1 1⎦                                                      ⎣1/√2 0⎦

Since A = QR and QᵗQ = I, we have

R = QᵗA = ⎡√2 √2⎤ .
          ⎣ 0  1⎦

It is easy to check that A = QR.
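The construction above can be sketched directly as code. Below is a minimal classical Gram-Schmidt QR for real matrices with linearly independent columns (assumption: the matrix is stored column by column as plain lists; this is an illustration, not a numerically robust implementation), run on the matrix of Example 4.5.

```python
import math

def qr(cols):
    """Return (Q, R): Q a list of orthonormal columns, R upper triangular."""
    n = len(cols)
    q = []
    r = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = list(a)
        for i in range(j):
            r[i][j] = sum(x * y for x, y in zip(q[i], a))      # <a_j, v_i>
            v = [x - r[i][j] * y for x, y in zip(v, q[i])]     # subtract projection
        r[j][j] = math.sqrt(sum(x * x for x in v))             # normalize
        q.append([x / r[j][j] for x in v])
    return q, r

# Example 4.5: A has columns (1, 0, 1) and (1, 1, 1).
Q, R = qr([[1.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
print([[round(x, 6) for x in row] for row in R])   # [[1.414214, 1.414214], [0.0, 1.0]]
```

The printed R agrees with R = [√2 √2; 0 1] computed in Example 4.5.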
The QR factorization and best approximation together give an efficient procedure for approximating a solution of a system of linear equations. In order to discuss this, we first define the so-called least squares approximation of a solution of a linear system.

Let A ∈ F^{m×n}. A vector u ∈ F^{n×1} is called a least squares solution of the linear system Ax = b iff ‖Au − b‖ ≤ ‖Az − b‖ for all z ∈ F^{n×1}.

Notice that u is a least squares solution of Ax = b iff Au is the best approximation of b from R(A), the range (column space) of A. Then Theorem 4.2 yields the following result.
Theorem 4.5
Let A ∈ F^{m×n}, and let b ∈ F^{m×1}.
1. The linear system Ax = b has a least squares solution.
2. A vector u ∈ F^{n×1} is a least squares solution iff Au − b ⊥ R(A).
3. A least squares solution is unique iff N(A) = {0}.

Recall that the null space N(A) of A is the set of all solutions of the homogeneous system Ax = 0. Thus the condition N(A) = {0} says that the homogeneous system Ax = 0 does not have a nonzero solution.
Least squares solutions can be computed by solving a related linear system.

Theorem 4.6
Let A ∈ F^{m×n} and let b ∈ F^{m×1}. A vector u ∈ F^{n×1} is a least squares solution of Ax = b iff A*Au = A*b.

Proof  The columns u₁, …, uₙ of A span R(A). Due to Theorem 4.5,
u is a least squares solution of Ax = b
iff ⟨Au − b, uᵢ⟩ = 0, for i = 1, …, n
iff uᵢ*(Au − b) = 0, for i = 1, …, n
iff A*(Au − b) = 0
iff A*Au = A*b.


If A ∈ R^{m×n} and b ∈ R^{m×1}, then a least squares solution u of the system Ax = b satisfies AᵗAu = Aᵗb.
Least squares solutions are helpful in those cases where some errors in data lead to an inconsistent system.
Example 4.6

Let A = ⎡1 1⎤ ,  b = ⎡0⎤ ,  u = ⎡ 1⎤ .
        ⎣0 0⎦       ⎣1⎦       ⎣−1⎦

We see that AᵗAu = Aᵗb. Hence u is a least squares solution of Ax = b.
Notice that Ax = b has no solution.
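The normal-equations criterion is mechanical to check. Below is a small numeric sketch on an inconsistent system of the same shape as above (treat the entries as illustrative; plain nested lists stand in for matrices).

```python
def matmul(A, B):
    # naive matrix product for nested-list matrices
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

A = [[1.0, 1.0], [0.0, 0.0]]   # second equation reads 0 = b2: inconsistent if b2 != 0
b = [[0.0], [1.0]]
u = [[1.0], [-1.0]]

At = transpose(A)
lhs = matmul(At, matmul(A, u))   # A^t A u
rhs = matmul(At, b)              # A^t b
print(lhs == rhs)                # True: u is a least squares solution
```

Here Au = 0 while the second equation of Ax = b demands 0 = 1, so u minimizes ‖Ax − b‖ even though no solution exists.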
We may use the QR factorization in computing a least squares solution of a linear system.

Theorem 4.7
Let A ∈ F^{m×n} have linearly independent columns. Let A = QR be a QR factorization of A. Then the least squares solution of Ax = b is given by u = R⁻¹Q*b.

Proof  Let u = R⁻¹Q*b. Now, A*Au = R*Q*QRR⁻¹Q*b = R*Q*b = A*b. That is, u satisfies the equation A*Ax = A*b. Thus, u is a least squares solution of Ax = b.
Moreover, since A has linearly independent columns, rank(A) = n. Then null(A) = n − n = 0. That is, N(A) = {0}. So, the homogeneous system Ax = 0 has no nonzero solution. Therefore, this least squares solution u is the unique least squares solution.

Why is u = R⁻¹Q*b not necessarily a solution of Ax = b? The reason is, Q has orthonormal columns, but it need not have orthonormal rows. Consequently, QQ* need not be equal to I. Then Au = QRR⁻¹Q*b = QQ*b need not be equal to b.
But, if a solution v exists for Ax = b, then Av = b. It implies QRv = b; and then Rv = Q*b. And finally, it yields v = u. That is, a solution of Ax = b must be equal to the least squares solution.
Notice that u = R⁻¹Q*b leads to the linear system Ru = Q*b, which is easy to solve since R is upper triangular.
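Solving Ru = Q*b needs only back substitution, since R is upper triangular. A minimal sketch (assumption: real R with nonzero diagonal, which Theorem 4.4 guarantees):

```python
import math

def back_substitute(R, c):
    """Solve R u = c for an upper triangular R with nonzero diagonal."""
    n = len(c)
    u = [0.0] * n
    for i in range(n - 1, -1, -1):            # last unknown first
        s = sum(R[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (c[i] - s) / R[i][i]
    return u

# With R = [[sqrt 2, sqrt 2], [0, 1]] from Example 4.5 and a right-hand side c:
R = [[math.sqrt(2), math.sqrt(2)], [0.0, 1.0]]
print(back_substitute(R, [math.sqrt(2), 1.0]))   # [0.0, 1.0]
```

In the full procedure, c would be Q*b computed from the QR factorization of A.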

Exercises for 4.4

1. Find a QR factorization of each of the following matrices:

   (a) ⎡1 −1⎤     (b) ⎡1 0 2⎤     (c) ⎡2 0⎤
       ⎢1  1⎥         ⎢0 1 1⎥         ⎢0 1⎥
       ⎣0  0⎦         ⎣1 2 0⎦         ⎣0 1⎦



2. Let A ∈ R^{m×n} and b ∈ R^{m×1}. If the columns of A are linearly independent, then show that there exists a unique x ∈ R^{n×1} such that AᵗAx = Aᵗb.
3. Find the least squares solution of the system Ax = b, where

   (a) A = ⎡1 1⎤ ,  b = ⎡0⎤         (b) A = ⎡1 1 1⎤ ,  b = ⎡3⎤
           ⎢1 2⎥        ⎢2⎥                 ⎢1 0 1⎥        ⎢0⎥
           ⎣2 1⎦        ⎣2⎦                 ⎢1 1 0⎥        ⎢1⎥ .
                                            ⎣0 1 1⎦        ⎣2⎦
4. Let A ∈ R^{n×n}. Let b ∈ R^{n×1} with b ≠ 0. Show that if b is orthogonal to each column of A, then Ax = b is inconsistent. What are the least squares solutions of Ax = b?

5  Eigenvalues and Eigenvectors

5.1  Eigenvalues



Let A = ⎡0 1⎤ .
        ⎣1 0⎦

We view A as a linear transformation A : R^{2×1} → R^{2×1}. It transforms straight lines to straight lines or points. Does there exist a straight line which is transformed to itself? We compute

A ⎡x⎤ = ⎡0 1⎤ ⎡x⎤ = ⎡y⎤ .
  ⎣y⎦   ⎣1 0⎦ ⎣y⎦   ⎣x⎦

Thus, the line {(x, x) : x ∈ R} never moves. So also the line {(x, −x) : x ∈ R}. Observe that

A ⎡x⎤ = 1 · ⎡x⎤   and   A ⎡ x⎤ = (−1) · ⎡ x⎤ .
  ⎣x⎦       ⎣x⎦         ⎣−x⎦           ⎣−x⎦

Let A ∈ F^{n×n}. A scalar λ ∈ F is called an eigenvalue of A iff there exists a nonzero vector v ∈ F^{n×1} such that Av = λv. Such a vector v is called an eigenvector of A for (or, associated with, or, corresponding to) the eigenvalue λ.
Example 5.1

Consider the matrix A = ⎡1 0 0⎤
                        ⎢1 1 0⎥ .
                        ⎣1 1 1⎦

It has an eigenvector [0, 0, 1]ᵗ associated with the eigenvalue 1. Is [0, 0, c]ᵗ also an eigenvector associated with the same eigenvalue 1?
In fact, corresponding to an eigenvalue, there are infinitely many eigenvectors.
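Checking an eigenpair is a one-line computation: multiply and compare. A small sketch with the lower triangular matrix of Example 5.1 (plain lists stand in for column vectors):

```python
def matvec(A, v):
    # product of a nested-list matrix with a vector
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 0, 0],
     [1, 1, 0],
     [1, 1, 1]]
v = [0, 0, 1]
print(matvec(A, v) == [1 * x for x in v])   # True: Av = 1 * v
```

Replacing v by [0, 0, c] for any c ≠ 0 gives the same comparison, illustrating that eigenvectors for a fixed eigenvalue are plentiful.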

Exercises for 5.1

1. Suppose A ∈ F^{n×n}, λ ∈ F, and b ∈ F^{n×1} are such that (A − λI)x = b has a unique solution. Show that λ is not an eigenvalue of A.
2. Formulate a converse of the statement in Exercise 1 and prove it.
3. Let A ∈ F^{n×n}. Show that A is invertible iff 0 is not an eigenvalue of A.

5.2  Characteristic polynomial

Notice that Av = λv iff (A − λI)v = 0. Thus a nonzero vector v is an eigenvector for the eigenvalue λ iff v is a nonzero solution of (A − λI)x = 0. Further, the linear system (A − λI)x = 0 has a nonzero solution iff rank(A − λI) < n, where A is an n × n matrix. And this happens iff det(A − λI) = 0. Therefore, we have the following result.
Theorem 5.1
Let A ∈ C^{n×n}. A scalar λ ∈ C is an eigenvalue of A iff det(A − λI) = 0.
The polynomial det(A − tI) is called the characteristic polynomial of the matrix A. Each eigenvalue of A is a zero of the characteristic polynomial of A. Conversely, each zero of the characteristic polynomial is said to be a complex eigenvalue of A.
If A is a matrix with real entries, some of the zeros of its characteristic polynomial may turn out to be complex numbers. Considering A as a linear transformation from R^{n×1} to R^{n×1}, the scalars are now only real numbers. Thus each zero of the characteristic polynomial need not be an eigenvalue; only the real zeros are.
If we regard A as a matrix with complex entries, then A is a linear transformation on C^{n×1}. Then each complex eigenvalue, that is, each zero of the characteristic polynomial of A, is an eigenvalue of A.
Since the characteristic polynomial of a matrix A of order n is a polynomial of degree n in t, it has exactly n, not necessarily distinct, zeros in C. And these are the eigenvalues (complex eigenvalues) of A. Notice that, here, we are using the fundamental theorem of algebra, which says that each polynomial of degree n with complex coefficients can be factored into exactly n linear factors.

Caution: When λ is a complex eigenvalue of A ∈ F^{n×n}, a corresponding eigenvector x is, in general, a vector in C^{n×1}.
For computing eigenvalues, a matrix in R^{n×n} is considered as a matrix in C^{n×n}. Thus, the eigenvalues of A are complex eigenvalues, in general.
Example 5.2
Find the eigenvalues and corresponding eigenvectors of the matrix

A = ⎡1 0 0⎤
    ⎢1 1 0⎥ .
    ⎣1 1 1⎦

The characteristic polynomial is

det(A − tI) = det ⎡1−t  0   0 ⎤
                  ⎢ 1  1−t  0 ⎥ = (1 − t)³.
                  ⎣ 1   1  1−t⎦

The eigenvalues of A are its zeros, that is, 1, 1, 1. To get an eigenvector, we solve A[a, b, c]ᵗ = [a, b, c]ᵗ, or that

a = a,  a + b = b,  a + b + c = c.

It gives a = b = 0, and c ∈ F can be arbitrary. Since an eigenvector is nonzero, all the eigenvectors are given by [0, 0, c]ᵗ, for c ≠ 0.
Example 5.3
For A ∈ R^{2×2} given by

A = ⎡0 −1⎤ ,
    ⎣1  0⎦

the characteristic polynomial is t² + 1. It has no real zeros. Then A has no (real) eigenvalue.
However, i and −i are its complex eigenvalues. That is, the same matrix A ∈ C^{2×2} has eigenvalues i and −i. The corresponding eigenvectors are obtained by solving

A[a, b]ᵗ = i[a, b]ᵗ  and  A[a, b]ᵗ = −i[a, b]ᵗ.

For λ = i, we have −b = ia and a = ib; that is, b = −ia. Thus, [a, −ia]ᵗ is an eigenvector for a ≠ 0. For the eigenvalue −i, the eigenvectors are [a, ia]ᵗ for a ≠ 0.
We consider A as a matrix with complex entries; and it has the (complex) eigenvalues i and −i.
The following theorem lists some important facts about eigenvalues.

Theorem 5.2
Let A ∈ F^{n×n}. Then the following are true.
1. A and Aᵗ have the same eigenvalues.
2. Similar matrices have the same eigenvalues.
3. If A is a diagonal or an upper triangular or a lower triangular matrix, then its diagonal elements are precisely its eigenvalues.
4. The product of all eigenvalues of A is equal to det(A).
5. The sum of all eigenvalues of A is equal to tr(A).
6. (Cayley-Hamilton) A satisfies its characteristic polynomial.

Proof
(1) det(Aᵗ − tI) = det((A − tI)ᵗ) = det(A − tI).

(2) det(P⁻¹AP − tI) = det(P⁻¹(A − tI)P) = det(P⁻¹) det(A − tI) det(P) = det(A − tI).


(3) In all these cases, det(A − tI) = (a₁₁ − t) ⋯ (aₙₙ − t).

(4) Let λ₁, …, λₙ be the eigenvalues of A, not necessarily distinct. Now,

det(A − tI) = (λ₁ − t) ⋯ (λₙ − t).

Put t = 0. It gives det(A) = λ₁ ⋯ λₙ.

(5) Expand det(A − tI) along its first row. The first term is (a₁₁ − t)A₁₁, where A₁₁ is the minor corresponding to the (1, 1)th entry. All other terms are polynomials of degree less than or equal to n − 2. Continuing in a similar fashion by expanding the minor A₁₁, we see that

coeff of tⁿ⁻¹ in det(A − tI) = coeff of tⁿ⁻¹ in (a₁₁ − t)A₁₁
  = ⋯ = coeff of tⁿ⁻¹ in (a₁₁ − t)(a₂₂ − t) ⋯ (aₙₙ − t) = (−1)ⁿ⁻¹ tr(A).

But by (4), coeff of tⁿ⁻¹ in det(A − tI) = (−1)ⁿ⁻¹(λ₁ + ⋯ + λₙ). Hence tr(A) = λ₁ + ⋯ + λₙ.

(6) Let A ∈ F^{n×n}. Its characteristic polynomial is

p(t) = (−1)ⁿ det(A − tI).

We show that p(A) = 0, the zero matrix. Recall that for any square matrix B, we have B adj(B) = adj(B)B = det(B) I. Taking B = A − tI, we have

p(t) I = (−1)ⁿ det(A − tI) I = (−1)ⁿ (A − tI) adj(A − tI).

The entries in adj(A − tI) are polynomials in t of degree at most n − 1. Write

adj(A − tI) := B₀ + tB₁ + ⋯ + tⁿ⁻¹Bₙ₋₁,

where B₀, …, Bₙ₋₁ ∈ F^{n×n}. Then

p(t)I = (−1)ⁿ(A − tI)(B₀ + tB₁ + ⋯ + tⁿ⁻¹Bₙ₋₁).

Notice that this is an identity between polynomials whose coefficients of tʲ are matrices. Equating the coefficients of each power of t on the two sides, multiplying the equation obtained for tʲ by Aʲ on the left, and adding all of them, we obtain p(A) = 0.
The Cayley-Hamilton theorem helps us in computing powers of matrices, and also the inverse of a matrix if at all it is invertible. For instance, suppose that a matrix A has the characteristic polynomial

a₀ + a₁t + ⋯ + aₙtⁿ.

By the Cayley-Hamilton theorem, we have

a₀I + a₁A + ⋯ + aₙAⁿ = 0.

Then Aⁿ = −(1/aₙ)(a₀I + a₁A + ⋯ + aₙ₋₁Aⁿ⁻¹). Thereby computation of Aⁿ, Aⁿ⁺¹, … can be reduced to computing A, A², …, Aⁿ⁻¹.


For computing the inverse, suppose that A is invertible. Then det(A) ≠ 0. Since det(A) is the product of all eigenvalues of A, λ = 0 is not an eigenvalue of A. It implies that t is not a factor of the characteristic polynomial of A. Therefore, the constant term a₀ in the characteristic polynomial of A is nonzero. Then we can rewrite the above equation as

a₀I + A(a₁I + a₂A + ⋯ + aₙAⁿ⁻¹) = 0.

Multiplying by A⁻¹ and simplifying, we obtain

A⁻¹ = −(1/a₀)(a₁I + a₂A + ⋯ + aₙAⁿ⁻¹).

This way, A⁻¹ can also be computed from A, A², …, Aⁿ⁻¹.
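For a 2 × 2 matrix the characteristic polynomial is t² − tr(A)t + det(A), so the inverse formula above reduces to A⁻¹ = (tr(A)I − A)/det(A). A minimal sketch (assumption: real entries and det(A) ≠ 0):

```python
def inverse_2x2_via_cayley_hamilton(A):
    """A^{-1} = (tr(A) I - A) / det(A), from A^2 - tr(A) A + det(A) I = 0."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[(tr * (1 if i == j else 0) - A[i][j]) / det for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 1.0]]
print(inverse_2x2_via_cayley_hamilton(A))   # [[1.0, -1.0], [-1.0, 2.0]]
```

Multiplying the printed matrix by A recovers the identity, confirming the formula.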

Exercises for 5.2

1. Find the eigenvalues and corresponding eigenvectors of ⎡3 0 0  0⎤
                                                          ⎢0 2 0  0⎥
                                                          ⎢0 0 0 −2⎥ .
                                                          ⎣0 0 2  0⎦

2. Find all eigenvalues and their corresponding eigenvectors of the n × n matrix A whose jth row has each entry equal to j.
3. Determine all eigenvalues and their corresponding eigenvectors for the 5 × 5 matrix whose each row is [1 2 3 4 5].
4. Let A ∈ F^{n×n} be a matrix such that the sum of all entries in any row is λ, for some λ ∈ F. Then show that λ is an eigenvalue of A.
5. Show that if the rank of an n × n matrix is 1, then its trace is one of its eigenvalues. What are its other eigenvalues?
6. Let A ∈ F^{n×n}. Let p(t) be a polynomial of degree n with the coefficient of tⁿ as (−1)ⁿ. If p(A) = 0, then does it follow that p(t) is the characteristic polynomial of A?
7. Let A, B, P ∈ C^{n×n} be such that B = P⁻¹AP. Let λ be an eigenvalue of A. Show that a vector v is an eigenvector of B corresponding to the eigenvalue λ iff Pv is an eigenvector of A corresponding to the same eigenvalue λ.
8. An n × n matrix A is said to be idempotent if A² = A. Show that the only possible eigenvalues of an idempotent matrix are 0 and 1.
9. An n × n matrix A is said to be nilpotent if Aᵏ = 0 for some natural number k. Show that 0 is the only eigenvalue of a nilpotent matrix.
10. Show that if each eigenvalue of A ∈ F^{n×n} has absolute value less than 1, then both I − A and I + A are invertible.

5.3  Special types of matrices

A square matrix A is called hermitian iff A* = A. A hermitian matrix with real entries satisfies Aᵗ = A; accordingly, such a matrix is called a real symmetric matrix. In general, A is called a symmetric matrix iff Aᵗ = A. And A is called skew-hermitian iff A* = −A; also, a matrix is called skew-symmetric iff Aᵗ = −A. In the following, B is symmetric, C is skew-symmetric, D is hermitian, and E is skew-hermitian. B is also hermitian and C is also skew-hermitian.

B = ⎡1 2 3⎤   C = ⎡ 0  2 3⎤   D = ⎡ 1  2i 3⎤   E = ⎡  0  2+i 3 ⎤
    ⎢2 3 4⎥       ⎢−2  0 4⎥       ⎢−2i  3 4⎥       ⎢−2+i  i  4i⎥
    ⎣3 4 5⎦       ⎣−3 −4 0⎦       ⎣ 3   4 5⎦       ⎣ −3  4i  0 ⎦

Notice that a skew-symmetric matrix must have a zero diagonal, and the diagonal entries of a skew-hermitian matrix must be 0 or purely imaginary. Reason: aᵢᵢ = −āᵢᵢ implies 2 Re(aᵢᵢ) = 0.

Let A be a square matrix. Since A + Aᵗ is symmetric and A − Aᵗ is skew-symmetric, every square matrix can be written as a sum of a symmetric matrix and a skew-symmetric matrix:

A = ½(A + Aᵗ) + ½(A − Aᵗ).

A similar rewriting is possible with hermitian and skew-hermitian matrices:

A = ½(A + A*) + ½(A − A*).
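The decomposition A = ½(A + Aᵗ) + ½(A − Aᵗ) is easy to verify numerically. A small sketch for real matrices stored as nested lists:

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def symmetric_skew_split(A):
    """Return (S, K) with S symmetric, K skew-symmetric, and S + K = A."""
    n = len(A)
    At = transpose(A)
    S = [[(A[i][j] + At[i][j]) / 2 for j in range(n)] for i in range(n)]
    K = [[(A[i][j] - At[i][j]) / 2 for j in range(n)] for i in range(n)]
    return S, K

A = [[1.0, 2.0], [4.0, 3.0]]
S, K = symmetric_skew_split(A)
print(S)   # [[1.0, 3.0], [3.0, 3.0]]  (symmetric part)
print(K)   # [[0.0, -1.0], [1.0, 0.0]] (skew-symmetric part)
```

The hermitian/skew-hermitian split works the same way with the conjugate transpose in place of the transpose.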
A square matrix A is called unitary iff A*A = I = AA*. In addition, if A is real, then it is called an orthogonal matrix. That is, an orthogonal matrix is a matrix with real entries satisfying AᵗA = I = AAᵗ. Notice that a square matrix is unitary iff it is invertible and its inverse is equal to its adjoint. Similarly, a real matrix is orthogonal iff it is invertible and its inverse is its transpose. In the following, B is a unitary matrix of order 2, and C is an orthogonal matrix (also unitary) of order 3:

B = 1/2 ⎡1+i 1−i⎤ ,   C = 1/3 ⎡ 2 −1  2⎤
        ⎣1−i 1+i⎦             ⎢ 2  2 −1⎥ .
                              ⎣−1  2  2⎦

The following are examples of orthogonal 2 × 2 matrices:

O₁ := ⎡cos θ −sin θ⎤ ,   O₂ := ⎡cos θ  sin θ⎤ .
      ⎣sin θ  cos θ⎦          ⎣sin θ −cos θ⎦

If A = [aᵢⱼ] is an orthogonal matrix of order 2, then AᵗA = I implies

a₁₁² + a₂₁² = 1 = a₁₂² + a₂₂²,   a₁₁a₁₂ + a₂₁a₂₂ = 0.

Thus, there exist θ, φ such that a₁₁ = cos θ, a₂₁ = sin θ, a₁₂ = cos φ, a₂₂ = sin φ, and cos(θ − φ) = 0. It then follows that A is in the form of either O₁ or O₂.

Let (a, b) denote the vector in the plane that starts at the origin and ends at the point (a, b). Writing the point (a, b) as a column vector [a b]ᵗ, we see that the matrix product O₁[a b]ᵗ is the end-point of the vector obtained by rotating the vector (a, b) by an angle θ. Similarly, O₂[a b]ᵗ gives the point obtained by reflecting (a, b) along a straight line that makes an angle θ/2 with the x-axis. Thus, O₁ is said to be a rotation by an angle θ, and O₂ is called a reflection along a line making an angle of θ/2 with the x-axis.
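The action of O₁ can be sketched in a few lines; for example, rotating (1, 0) by θ = π/2 should give (0, 1):

```python
import math

def rotate(theta, p):
    """Apply O1 = [[cos t, -sin t], [sin t, cos t]] to the point p = (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = p
    return (c * x - s * y, s * x + c * y)

x, y = rotate(math.pi / 2, (1.0, 0.0))
print(round(x, 12), round(y, 12))   # 0.0 1.0
```

Swapping in O₂'s entries in the same way sends (1, 0) to (cos θ, sin θ), its mirror image across the line at angle θ/2.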
If A ∈ F^{m×n}, then A*A = I is equivalent to asserting that the columns of A are orthonormal; and AA* = I is equivalent to the fact that the rows of A are orthonormal. Unitary and orthogonal matrices preserve the inner product and also the norm. This is the reason unitary matrices are also called isometries. We prove these facts about unitary matrices in the following theorem.

Theorem 5.3
Let A ∈ F^{n×n} be a unitary or an orthogonal matrix.
1. For each pair of vectors x, y ∈ F^{n×1}, ⟨Ax, Ay⟩ = ⟨x, y⟩.
2. For each x ∈ F^{n×1}, ‖Ax‖ = ‖x‖.
3. The columns of A are orthonormal.
4. The rows of A are orthonormal.
5. |det(A)| = 1.
Proof

(1) ⟨Ax, Ay⟩ = (Ay)*Ax = y*A*Ax = y*x = ⟨x, y⟩.

(2) Take x = y in (1).

(3) Since A*A = I, the ith row of A* multiplied with the jth column of A gives δᵢⱼ. However, this product is simply the inner product of the jth column of A with the ith column of A.

(4) It follows as in (3), considering AA* = I.

(5) Notice that det(A*) is the complex conjugate of det(A). Thus

det(A*A) = det(A*) det(A) = |det(A)|².

However, det(A*A) = det(I) = 1. Therefore, |det(A)| = 1.

It thus follows that the determinant of an orthogonal matrix is either 1 or −1.
We wish to see the nature of eigenvalues and eigenvectors of these special types of matrices.


Theorem 5.4
Let A ∈ F^{n×n}. Let λ be any complex eigenvalue of A.
1. If A is hermitian or real symmetric, then λ ∈ R. Moreover, if A is real symmetric, then there exists a real eigenvector corresponding to λ.
2. If A is skew-hermitian or skew-symmetric, then λ is purely imaginary or zero.
3. If A is unitary or orthogonal, then |λ| = 1.

Proof  Let A ∈ F^{n×n}. Let λ be any complex eigenvalue of A with an eigenvector v ∈ C^{n×1}. Now, Av = λv. Pre-multiplying with v*, we have v*Av = λv*v ∈ C.

(1) If A is hermitian, then A = A*. Now,

(v*Av)* = v*A*v = v*Av  and  (v*v)* = v*v.

So, both v*Av and v*v are real. Therefore, in v*Av = λv*v, λ is also real.
If A is real symmetric and v = x + iy ∈ C^{n×1} is an eigenvector corresponding to λ, with x, y ∈ R^{n×1}, then

A(x + iy) = λ(x + iy).

Since A and λ are real, comparing the real and imaginary parts, we have

Ax = λx,  Ay = λy.

Since x + iy ≠ 0, at least one of x or y is nonzero. One such nonzero vector is a real eigenvector corresponding to the eigenvalue λ of A.

(2) When A is skew-hermitian, (v*Av)* = −v*Av. Then v*Av = λv*v implies that

(λv*v)* = −(λv*v).

Since v ≠ 0, v*v ≠ 0. Therefore, λ̄ = −λ. That is, 2 Re(λ) = 0. This shows that λ is purely imaginary or zero.

(3) Suppose A*A = I. Now, Av = λv implies v*A* = (λv)* = λ̄v*. Then

v*v = v*Iv = v*A*Av = λ̄λ v*v = |λ|² v*v.

Since v*v ≠ 0, |λ| = 1.

Exercises for 5.3

1. Construct an orthogonal 2 × 2 matrix whose determinant is 1.
2. Construct an orthogonal 2 × 2 matrix whose determinant is −1.
3. Construct a 3 × 3 hermitian matrix with no zero entry whose eigenvalues are 1, 2 and 3.
4. Construct a 2 × 2 skew-hermitian matrix whose eigenvalues are purely imaginary.
5. Show that if a matrix A is real symmetric and invertible, then so is A⁻¹.
6. Show that if a matrix A is hermitian and invertible, then so is A⁻¹.
7. Show that in the plane,
   (a) a rotation following a rotation is a rotation;
   (b) a rotation following a reflection is a reflection;
   (c) a reflection following a rotation is a reflection; and
   (d) a reflection following a reflection is a rotation.
8. Let A ∈ F^{n×n}. Show that ⟨Ax, Ay⟩ = ⟨x, y⟩ for all x, y ∈ F^{n×1} iff ‖Ax‖ = ‖x‖ for all x ∈ F^{n×1}.
9. Let A ∈ R^{2×2} be an orthogonal matrix with A ≠ I. Suppose that A has a non-trivial fixed point; that is, there exists a nonzero vector v ∈ R^{2×1} such that Av = v. Show that with respect to any orthonormal basis B of R^{2×1}, the matrix [A]_B is in the form

⎡cos θ  sin θ⎤ .
⎣sin θ −cos θ⎦

6  Canonical Forms

6.1  Schur triangularization

Eigenvalues and eigenvectors can be used to bring a matrix to nice forms using similarity transformations. A very general result in this direction is Schur's unitary triangularization. It says that using a suitable similarity transformation, we can represent a square matrix by an upper triangular matrix. Thus, the diagonal entries of the upper triangular matrix must be the eigenvalues of the given matrix. This information can be used to construct the appropriate similarity transformation.

Theorem 6.1
(Schur Triangularization) Let A ∈ C^{n×n}. Then there exists a unitary matrix P ∈ C^{n×n} such that P*AP is upper triangular. Moreover, if A ∈ R^{n×n} has only real eigenvalues, then P can be chosen to be an orthogonal matrix.
Proof  Our proof is by induction on n. If n = 1, then clearly A is an upper triangular matrix, and we take P = [1], the identity matrix with a single entry 1, which is both unitary and orthogonal.
Assume that for every B ∈ C^{m×m}, m ≥ 1, there exists a unitary matrix Q ∈ C^{m×m} such that Q*BQ is upper triangular. Let A ∈ C^{(m+1)×(m+1)}. Let λ ∈ C be an eigenvalue of A with an associated eigenvector u. Consider C^{(m+1)×1} as an inner product space with the usual inner product ⟨w, z⟩ = z*w. Let v = u/‖u‖, so that v is an eigenvector of A of norm 1 associated with the eigenvalue λ. Extend the set {v} to an orthonormal ordered basis E = {v, v₁, …, vₘ} for C^{(m+1)×1}. Here, you may have to use an extension to a basis, and then the Gram-Schmidt orthonormalization process. Now, construct the matrix R ∈ C^{(m+1)×(m+1)} by taking these basis vectors as its columns, in that order; that is, let

R = [v v₁ ⋯ vₘ].

Since E is an orthonormal set, R is unitary. With respect to the basis E, the matrix representation of A is R⁻¹AR = R*AR. The first column of R*AR is

R*ARe₁ = R*Av = R*λv = λR*v = λR*Re₁ = λe₁,


where e₁ ∈ C^{(m+1)×1} has first component 1 and all other components 0. Then R*AR can be written in the following block form:

R*AR = ⎡λ x⎤ ,
       ⎣0 C⎦

where 0 ∈ C^{m×1}, C ∈ C^{m×m} and x ∈ C^{1×m}. In fact, x = [v*Av₁ v*Av₂ ⋯ v*Avₘ]; but that is not important for the purpose of the proof.
Notice that if m = 1, the proof is complete. For m > 1, by the induction hypothesis, we have a unitary matrix S ∈ C^{m×m} such that S*CS is upper triangular. Then take

P = R ⎡1 0⎤ .
      ⎣0 S⎦

Since S is unitary, P*P = PP* = I; that is, P is unitary. Moreover,

P*AP = ⎡1  0⎤ R*AR ⎡1 0⎤ = ⎡1  0⎤ ⎡λ x⎤ ⎡1 0⎤ = ⎡λ   y  ⎤
       ⎣0 S*⎦      ⎣0 S⎦   ⎣0 S*⎦ ⎣0 C⎦ ⎣0 S⎦   ⎣0 S*CS⎦

for some y ∈ C^{1×m}. Since S*CS is upper triangular, the induction proof is complete for the case A ∈ C^{n×n}.
When A ∈ R^{n×n} and all the eigenvalues of A are real, we use the transpose instead of the adjoint everywhere in the above proof. Thus, P can be chosen to be an orthogonal matrix.
To eradicate possible misunderstanding, we recall that "A has only real eigenvalues" means that when we consider this A as a matrix in C^{n×n}, all its complex eigenvalues turn out to be real numbers. This again means that all zeros of the characteristic polynomial of A are real.
Further, during the course of the proof of Schur's triangularization, once we obtain a matrix similar to A in the form

⎡λ   y  ⎤ ,
⎣0 S*CS⎦

we look for whether λ is still an eigenvalue of S*CS. If so, we choose this eigenvalue over the others for further reduction. In the next step we obtain a matrix similar to A in the form

⎡λ y z⎤
⎢0 λ x⎥ ,
⎣0 0 M⎦

where M is an (n − 2) × (n − 2) matrix. Continuing further this way, we see that a Schur triangularization of A exists where, on the diagonal of the final upper triangular matrix, equal eigenvalues occur together. Of course, the construction allows an upper triangular form where the eigenvalues can be chosen to occur on the diagonal in any given order. However, this particular form, where equal eigenvalues occur together, will be helpful later.

Example 6.1

Consider the matrix A = ⎡2 1 0⎤
                        ⎢2 3 0⎥  for Schur triangularization.
                        ⎣1 1 1⎦

We find that the characteristic polynomial of A is (1 − t)²(4 − t). All eigenvalues of A are real; thus there exists an orthogonal matrix P such that PᵗAP is upper triangular. To determine such a matrix P, we take one of the eigenvalues, say 1. An associated eigenvector of norm 1 is v = [0, 0, 1]ᵗ. We extend {v} to an orthonormal basis for C^{3×1}. For convenience, we take the (ordered) orthonormal basis as

{[0, 0, 1]ᵗ, [1, 0, 0]ᵗ, [0, 1, 0]ᵗ}.

Taking the basis vectors as columns, we form the matrix

R = ⎡0 1 0⎤
    ⎢0 0 1⎥ .
    ⎣1 0 0⎦

We then find that

RᵗAR = ⎡1 1 1⎤
       ⎢0 2 1⎥ .
       ⎣0 2 3⎦

Now, we try to triangularize the matrix C = ⎡2 1⎤ . It has eigenvalues 1
                                            ⎣2 3⎦

and 4. An eigenvector of unit norm associated with the eigenvalue 1 is [1/√2, −1/√2]ᵗ. We extend it to an orthonormal basis

{[1/√2, −1/√2]ᵗ, [1/√2, 1/√2]ᵗ}

for C^{2×1}. Then we construct the matrix S by taking these basis vectors as its columns, that is,

S = ⎡ 1/√2 1/√2⎤ .
    ⎣−1/√2 1/√2⎦

We find that SᵗCS = ⎡1 −1⎤ , which is an upper triangular matrix. Then
                    ⎣0  4⎦

P = R ⎡1 0⎤ = ⎡0 1 0⎤ ⎡1   0     0 ⎤   ⎡0  1/√2 1/√2⎤
      ⎣0 S⎦   ⎢0 0 1⎥ ⎢0  1/√2 1/√2⎥ = ⎢0 −1/√2 1/√2⎥ .
              ⎣1 0 0⎦ ⎣0 −1/√2 1/√2⎦   ⎣1   0    0  ⎦

Computing PᵗAP, we have

PᵗAP = ⎡1 0 √2⎤
       ⎢0 1 −1⎥ ,
       ⎣0 0  4⎦

which is upper triangular.
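The final product in Example 6.1 can be verified numerically. A small sketch (the matrices follow the example above; plain nested lists stand in for matrices) checks that PᵗAP is upper triangular with the eigenvalues 1, 1, 4 on the diagonal:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

r = 1 / math.sqrt(2)
A = [[2.0, 1.0, 0.0], [2.0, 3.0, 0.0], [1.0, 1.0, 1.0]]
P = [[0.0, r, r], [0.0, -r, r], [1.0, 0.0, 0.0]]
Pt = [list(row) for row in zip(*P)]
T = matmul(Pt, matmul(A, P))

print([round(T[i][i], 9) for i in range(3)])                       # [1.0, 1.0, 4.0]
print(all(abs(T[i][j]) < 1e-9 for i in range(3) for j in range(i)))  # True
```

The second print confirms that every entry below the diagonal vanishes (up to rounding).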


Since P* = P⁻¹, Schur triangularization is informally stated as



any square matrix is unitarily similar to an upper triangular matrix.

Further, there is nothing sacred about being upper triangular. For, given a matrix A ∈ C^{n×n}, consider using Schur triangularization of A*. There exists a unitary matrix P such that P*A*P is upper triangular. Then taking the adjoint, we see that P*AP is lower triangular. That is,

any square matrix is unitarily similar to a lower triangular matrix.

Analogously, a real square matrix having only real eigenvalues is also orthogonally similar to a lower triangular matrix. We remark that the lower triangular form of a matrix need not be the transpose or the adjoint of its upper triangular form.
Moreover, neither the unitary matrix P nor the upper triangular matrix P*AP in Schur triangularization is unique. That is, there can be unitary matrices P and Q such that both P*AP and Q*AQ are upper triangular, P ≠ Q, and P*AP ≠ Q*AQ. The non-uniqueness stems from the choices involved in the associated eigenvectors and in extending these to an orthonormal basis. For instance, in Example 6.1, if you extend {[0, 0, 1]ᵗ} to the ordered orthonormal basis

{[0, 0, 1]ᵗ, [0, 1, 0]ᵗ, [1, 0, 0]ᵗ},

then you end up with

P = ⎡0 −1/√2 1/√2⎤          ⎡1 0 √2⎤
    ⎢0  1/√2 1/√2⎥ , PᵗAP = ⎢0 1  1⎥ .
    ⎣1   0    0  ⎦          ⎣0 0  4⎦

Exercises for 6.1

1. Prove that if all eigenvalues of a real square matrix are real, then it is orthogonally similar to a lower triangular matrix.
2. Let B ∈ F^{n×n} be such that I + B is invertible. Let C = (I + B)⁻¹(I − B). Show that if B is skew-hermitian, then C is unitary. Further, if B is unitary, then show that C is skew-hermitian.






3. Let A = ⎡1 2⎤ , B = ⎡3 1⎤ and C = ⎡0 1⎤ . Show that both B and C are Schur
           ⎣1 2⎦      ⎣0 0⎦        ⎣0 3⎦
   triangularizations of A. This would prove that Schur triangularization is not unique.






4. Let A = ⎡1 2⎤ , B = ⎡0 1⎤ and C = ⎡0 2⎤ . Show that B is a Schur triangularization
           ⎣3 6⎦      ⎣0 7⎦        ⎣0 7⎦
   of A but C is not.
5. Using Schur triangularization, prove the spectral mapping theorem, which states that if λ is a complex eigenvalue of A, then p(λ) is a complex eigenvalue of p(A), for any polynomial p(t). Moreover, all complex eigenvalues of p(A) are of this form.
6. Prove the Cayley-Hamilton theorem using Schur triangularization.

6.2  Diagonalizability

As you have seen from Schur triangularization, each matrix with complex entries is similar to an upper triangular matrix. Moreover, a matrix A with real entries is similar to an upper triangular real matrix provided all zeros of its characteristic polynomial are real. The upper triangular matrix similar to a given matrix A takes a better form when A is hermitian.

Theorem 6.2
(Spectral theorem for hermitian matrices) Each hermitian matrix is unitarily similar to a diagonal matrix; each real symmetric matrix is orthogonally similar to a diagonal matrix.

Proof  Let A ∈ C^{n×n} be a hermitian matrix. Due to Schur triangularization, we have a unitary matrix P such that D = P*AP is upper triangular. Now,

D* = P*A*P = P*AP = D.

Since D is upper triangular and D* = D, we see that D is a diagonal matrix.
Observe that the diagonal entries of the diagonal matrix D are the eigenvalues of A. Since A is hermitian, all of them are real numbers. When A is real symmetric, its eigenvalues are thus real, so by Theorem 6.1 the matrix P can be chosen orthogonal; then D = PᵗAP is both upper triangular and symmetric, hence diagonal.
It thus follows that each real symmetric matrix is orthogonally similar to a diagonal matrix. The spectrum of a matrix is the multiset of its eigenvalues. Theorem 6.2 is called the spectral theorem for hermitian matrices since it explicitly involves the eigenvalues of the matrix.
A matrix A ∈ F^{n×n} is called diagonalizable iff there exists an invertible matrix P ∈ F^{n×n} such that P⁻¹AP is a diagonal matrix. When P⁻¹AP is a diagonal matrix, we say that A is diagonalized by P. In this language, the spectral theorem for hermitian matrices may be stated as follows:

Every hermitian matrix is unitarily diagonalizable.
To see how the eigenvalues and eigenvectors are involved in the diagonalization process, we proceed as follows.
Let A ∈ F^{n×n}. Let λ₁, …, λₙ be all the complex eigenvalues (not necessarily distinct) of A. Let v₁, …, vₙ ∈ F^{n×1} be corresponding eigenvectors. Construct the n × n matrices

P := [v₁ v₂ ⋯ vₙ],   D := diag(λ₁, λ₂, …, λₙ).

Notice that Peᵢ = vᵢ and Deᵢ = λᵢeᵢ for i = 1, …, n. Then

col(i) of AP = Avᵢ = λᵢvᵢ = λᵢPeᵢ = P(λᵢeᵢ) = PDeᵢ = col(i) of PD.

That is, AP = PD. If P is invertible, then P⁻¹AP = D, a diagonal matrix. That is, A is similar to a diagonal matrix. Moreover, P is invertible iff its columns form a basis for F^{n×1}. We thus obtain the following result.

Theorem 6.3
A matrix A ∈ F^{n×n} is diagonalizable iff there exists a basis of F^{n×1} consisting of eigenvectors of A.
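The identity AP = PD behind Theorem 6.3 can be checked in a few lines. A sketch using the reflection matrix from Section 5.1, whose eigenpairs are (1, (1, 1)) and (−1, (1, −1)):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, 1], [1, 0]]        # the reflection from Section 5.1
P = [[1, 1], [1, -1]]       # columns: eigenvectors (1, 1) and (1, -1)
D = [[1, 0], [0, -1]]       # eigenvalues on the diagonal, in the same order
print(matmul(A, P) == matmul(P, D))   # True
```

Since the two eigenvectors are linearly independent, P is invertible and P⁻¹AP = D.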
The question is: when are there n linearly independent eigenvectors of A? The spectral theorem provides a partial answer. We can generalize the spectral theorem to the so-called normal matrices.
A matrix A ∈ C^{n×n} is called a normal matrix iff A*A = AA*. We observe the following.
Theorem 6.4
Each upper triangular normal matrix is diagonal.

Proof  Let U ∈ C^{n×n} be an upper triangular normal matrix. If n = 1, then clearly U is a diagonal matrix. Lay out the induction hypothesis that each upper triangular normal matrix of order k is diagonal. Let U be an upper triangular normal matrix of order k + 1. Write U in a partitioned form, as in the following:

U = ⎡V u⎤ ,
    ⎣0 a⎦

where V ∈ C^{k×k} is upper triangular, u ∈ C^{k×1}, 0 is the zero row vector in C^{1×k}, and a ∈ C. Then

U*U = ⎡V*V     V*u    ⎤ = UU* = ⎡VV* + uu*   āu ⎤
      ⎣u*V  u*u + |a|²⎦         ⎣   au*     |a|²⎦

implies that u*u + |a|² = |a|². That is, u = 0. Plugging u = 0 into the above equation, we see that V*V = VV*. Since V is upper triangular, by the induction hypothesis, V is a diagonal matrix. Then, with u = 0, U is also a diagonal matrix.
Using this result on upper triangular normal matrices, we can generalize the spectral theorem to normal matrices.

Theorem 6.5
(Spectral theorem for normal matrices) A square matrix is unitarily diagonalizable iff it is a normal matrix.

Proof  Let A ∈ C^{n×n}. If A is unitarily diagonalizable, then we have a diagonal matrix D = diag(λ₁, …, λₙ) and a unitary matrix P such that A = PDP*. Then A*A = PD*DP* and AA* = PDD*P*. However, D*D is a diagonal matrix with diagonal entries |λᵢ|²; so is DD*. That is, D*D = DD*. Therefore, A*A = AA*. That is, A is a normal matrix.
Conversely, if A is normal, then A*A = AA*. Due to Schur triangularization, let Q be a unitary matrix such that Q*AQ = U, an upper triangular matrix. Since Q* = Q⁻¹, the condition A*A = AA* implies that U*U = UU*. By Theorem 6.4, U is a diagonal matrix.
There can be non-normal matrices which are diagonalizable. For example, with

    A = [  1  0  0 ]         P = [ 0  1  0 ]
        [ −4  3 −2 ],            [ 1  5  2 ]
        [ −2  1  0 ]             [ 1  3  1 ],

we see that A∗A ≠ AA∗ and P∗P ≠ I, but

    P⁻¹AP = [ −1 −1  2 ] [  1  0  0 ] [ 0  1  0 ]   [ 1  0  0 ]
            [  1  0  0 ] [ −4  3 −2 ] [ 1  5  2 ] = [ 0  1  0 ]
            [ −2  1 −1 ] [ −2  1  0 ] [ 1  3  1 ]   [ 0  0  2 ].

Observe that in such a case, the diagonalizing matrix P is non-unitary.
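A quick numerical check of this example (plain Python, matrices as lists of rows): P⁻¹AP = diag(1, 1, 2) can be verified without inverting P, since for invertible P it is equivalent to AP = PD.

```python
def mul(A, B):
    # product of two 3x3 matrices given as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[1, 0, 0], [-4, 3, -2], [-2, 1, 0]]
P = [[0, 1, 0], [1, 5, 2], [1, 3, 1]]
D = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]

# P^{-1} A P = D is equivalent to A P = P D, as P is invertible
print(mul(A, P) == mul(P, D))   # True
```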


Another partial answer is provided by the following theorem.
Theorem 6.6
Eigenvectors associated with distinct eigenvalues of a matrix are linearly independent. In particular, if a matrix of order n has n distinct eigenvalues,
then it is diagonalizable.
Proof Let λ1, ..., λn be the distinct eigenvalues of A, and let v1, ..., vn be
corresponding eigenvectors. We use induction on k ∈ {1, ..., n}.
For k = 1, since v1 ≠ 0, {v1} is linearly independent. Lay out the induction
hypothesis: for k = m suppose {v1, ..., vm} is linearly independent. Now, for
k = m + 1, suppose

    α1 v1 + α2 v2 + ··· + αm vm + αm+1 vm+1 = 0.        (6.1)

Then, A(α1 v1 + α2 v2 + ··· + αm vm + αm+1 vm+1) = 0 gives (since Avi = λi vi)

    α1 λ1 v1 + α2 λ2 v2 + ··· + αm λm vm + αm+1 λm+1 vm+1 = 0.

Multiply (6.1) with λm+1 and subtract from the last equation to get

    α1 (λ1 − λm+1) v1 + ··· + αm (λm − λm+1) vm = 0.


Elementary Matrix Theory

By the induction hypothesis, αi (λi − λm+1) = 0 for i = 1, 2, ..., m. However,
λi ≠ λm+1. Thus αi = 0 for each such i. Then, (6.1) yields αm+1 vm+1 = 0.
Since vm+1 ≠ 0, αm+1 = 0. This completes the proof of linear independence of
eigenvectors associated with distinct eigenvalues.
For the second statement, suppose A ∈ C^{n×n} has n distinct eigenvalues
λ1, ..., λn. Then the associated eigenvectors v1, ..., vn are linearly independent, and thus form a basis for C^{n×1}. Therefore, A is diagonalizable. More
directly, taking

    P = [v1 v2 ··· vn],

we see that P is invertible and P⁻¹AP = diag(λ1, ..., λn).
As you have seen, the procedure for diagonalization of a matrix A ∈ F^{n×n} goes
as follows. First, we find its eigenvalues and the associated eigenvectors. Observe
that if F = R, and there exists an eigenvalue with nonzero imaginary part, then A
cannot be diagonalized by a matrix with real entries. Suppose we have n eigenvalues
in F, counting multiplicities. Then, we check whether n linearly
independent eigenvectors exist or not. If not, we cannot diagonalize A. If yes, then
we put the eigenvectors together as columns to form the matrix P; and P⁻¹AP is a
diagonalization of A.
Example 6.2
The matrix

    A = [  1  1 −1 ]
        [  1  1  1 ]
        [ −1  1  1 ]

is real symmetric. It has eigenvalues −1, 2, 2, with associated (normalized) eigenvectors

    [  1/√3 ]   [ 1/√2 ]   [ −1/√6 ]
    [ −1/√3 ],  [ 1/√2 ],  [  1/√6 ].
    [  1/√3 ]   [   0  ]   [  2/√6 ]

They form an orthonormal basis for R³. Taking

    P = [  1/√3   1/√2   −1/√6 ]
        [ −1/√3   1/√2    1/√6 ]
        [  1/√3     0     2/√6 ],

we see that P⁻¹ = Pᵗ and

    P⁻¹AP = PᵗAP = [ −1  0  0 ]
                   [  0  2  0 ]
                   [  0  0  2 ].
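A float-arithmetic check of this example (plain Python; entries as in the matrices above):

```python
from math import sqrt, isclose

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

A = [[1, 1, -1], [1, 1, 1], [-1, 1, 1]]
r3, r2, r6 = 1 / sqrt(3), 1 / sqrt(2), 1 / sqrt(6)
P = [[r3, r2, -r6], [-r3, r2, r6], [r3, 0.0, 2 * r6]]

D = mul(transpose(P), mul(A, P))
expected = [[-1, 0, 0], [0, 2, 0], [0, 0, 2]]
ok = all(isclose(D[i][j], expected[i][j], abs_tol=1e-12)
         for i in range(3) for j in range(3))
print(ok)   # True
```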

If λ is an eigenvalue of a matrix A, then its associated eigenvector u is a solution of
Au = λu. This means u ∈ N(A − λI). Now, how many linearly independent solution
vectors u of Au = λu are there? Obviously, it is dim(N(A − λI)). This number has a
certain relation with the diagonalizability of A. To use this further, we first give this
number a name.


Let λ be an eigenvalue of a matrix A ∈ F^{n×n}. The geometric multiplicity of λ
is dim N(A − λI); and the algebraic multiplicity of λ is the number of times λ is a
zero of the characteristic polynomial of A.
Observe that if λ is an eigenvalue of A, then its geometric multiplicity is the maximum number of linearly independent eigenvectors associated with λ.
Theorem 6.7
Let A ∈ F^{n×n}. Let λ be an eigenvalue of A. Then the geometric multiplicity of
λ is less than or equal to its algebraic multiplicity. Further, A is diagonalizable
iff the geometric multiplicity of each eigenvalue is equal to its algebraic multiplicity
iff the sum of geometric multiplicities of all eigenvalues is n.
Proof Let the geometric multiplicity of an eigenvalue λ be k. Then we
have k linearly independent eigenvectors of A associated with the
eigenvalue λ, and no more. Extend the set of these eigenvectors to an ordered
basis B of F^{n×1}. Let P be the matrix whose columns are the vectors in B. Then

    M := P⁻¹AP = [ λIk  C ]
                 [  0   D ],

where Ik is the identity matrix of order k and C ∈ C^{k×(n−k)}, D ∈ C^{(n−k)×(n−k)}
are some matrices. Since A is similar to M, they have the same characteristic
polynomial; and it is of the form

    (λ − t)^k p(t)

for some polynomial p(t) of degree n − k. Clearly, the algebraic multiplicity of
λ is at least k. This proves the first statement.
For the second statement, suppose A is diagonalizable. Then we have an
ordered basis E of F^{n×1} which consists of eigenvectors of A, with respect to which
the matrix of A is diagonal. If λ is an eigenvalue of A of algebraic multiplicity
m, then in this diagonal matrix there are exactly m entries equal to
λ. Then in the basis E there are exactly m eigenvectors associated
with λ. Therefore, the geometric multiplicity of λ is m.
Conversely, suppose that the geometric multiplicity of each eigenvalue is
equal to its algebraic multiplicity. Then corresponding to each eigenvalue, we
have exactly as many linearly independent eigenvectors as its algebraic
multiplicity. Collecting together the eigenvectors associated with all eigenvalues, we get n linearly independent eigenvectors, which form a basis for F^{n×1}.
Therefore, A is diagonalizable.
The second iff statement follows since the geometric multiplicity of each eigenvalue is at most its algebraic multiplicity.
Example 6.3
Let

    A = [ 1  0 ]    and    B = [ 1  1 ]
        [ 0  1 ]               [ 0  1 ].

The characteristic polynomials of both A and B are equal to (1 − t)². The eigenvalue λ = 1 has algebraic multiplicity
2 for both A and B.
For geometric multiplicities, we solve Ax = x and By = y.
Now, Ax = x gives x = x, which is satisfied by the linearly independent vectors [1, 0]ᵗ and [0, 1]ᵗ. Thus, N(A − I) has dimension 2. Thus, the geometric
multiplicity of the only eigenvalue 1 of A is the same as its algebraic multiplicity.
Also, we see that A is diagonalizable; in fact, it is already a diagonal matrix.
Proceeding similarly with the matrix B, we see that the linear system Bx = x
with x = [a, b]ᵗ gives a + b = a and b = b. That is, b = 0 and a can be any
complex number. For example, [1, 0]ᵗ is such an eigenvector. Now, dim(N(B − I)) = 1. The geometric
multiplicity of the eigenvalue 1 of B is 1, which is not equal to its algebraic
multiplicity. Therefore, B is not diagonalizable.
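The two geometric multiplicities here can be computed as dim N(B − λI) = n − rank(B − λI). A small sketch (plain Python, exact rational arithmetic) with a rank routine based on Gaussian elimination:

```python
from fractions import Fraction

def rank(M):
    # row reduction over the rationals
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def geometric_multiplicity(M, lam):
    n = len(M)
    shifted = [[M[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A = [[1, 0], [0, 1]]
B = [[1, 1], [0, 1]]
print(geometric_multiplicity(A, 1), geometric_multiplicity(B, 1))   # 2 1
```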

Exercises for 6.2

1. In each of the following cases, determine whether the given matrix is diagonalizable by a matrix with complex entries:

    (a) [ 2  3 ]    (b) [ 1 10  0 ]    (c) [ 2  1  0  0 ]
        [ 6 −1 ]        [ 1  3  1 ]        [ 0  2  0  0 ]
                        [ 1  0  4 ]        [ 0  0  2  0 ]
                                           [ 0  0  0  5 ]

2. Diagonalize A = [ 7 −5 15 ]
                   [ 6 −4 15 ]. Then compute A⁶.
                   [ 0  0  1 ]

3. Diagonalize the following matrices:

    (a) [ 0  1  1 ]    (b) [  7 −2  0 ]
        [ 1  0  1 ]        [ −2  6 −2 ]
        [ 1  1  0 ]        [  0 −2  5 ]
4. Check whether each of the following matrices is diagonalizable. If diagonalizable, find a basis of eigenvectors for the space C^{3×1}:

    (a) [ 1  1  1 ]    (b) [ 1  1  1 ]    (c) [ 1  0  1 ]
        [ 1  1  1 ]        [ 0  1  1 ]        [ 1  1  0 ]
        [ 1  1  1 ]        [ 0  0  1 ]        [ 0  1  1 ]
5. Show that each of the following matrices is diagonalizable with a matrix in
R^{3×3}. Also find a basis of eigenvectors for R^{3×1}.

    (a) [ 3/2  1/2  0 ]    (b) [ 3  1/2  3/2 ]    (c) [  2 −1  0 ]
        [ 1/2  3/2  0 ]        [ 1  3/2  3/2 ]        [ −1  2  0 ]
        [ 1/2  1/2  1 ]        [ 1  5/2  1/2 ]        [  2  2  3 ]
6. Prove that if a normal matrix has only real eigenvalues, then it is hermitian.
Conclude that if a real normal matrix has only real eigenvalues, then it is symmetric.


6.3  Jordan form

We now know that not all matrices can be diagonalized, since corresponding to an
eigenvalue, there may not be a sufficient number of linearly independent eigenvectors.
Non-diagonalizability of a matrix A ∈ F^{n×n} means that we cannot have a basis of F^{n×1} consisting of vectors vj such that Avj = λj vj for scalars λj. In that case, we
would like to have a basis which would bring the matrix of the linear operator to a
nearly diagonal form.
In what follows, we will be using similarity transformations resulting from elementary matrices. A similarity transformation that uses an elementary matrix E[i, j]
on a matrix A transforms A to (E[i, j])⁻¹ A E[i, j]. Since (E[i, j])⁻¹ = (E[i, j])ᵗ =
E[i, j], the net effect of this transformation is described as follows:

    E[i, j]⁻¹ A E[i, j] = E[i, j] A E[i, j] exchanges the ith and jth rows, and
    then exchanges the jth and ith columns of A.

We will refer to this type of similarity transformation by the name permutation
similarity.
Using the second type of elementary matrices, we have a similarity transformation (E_α[i])⁻¹ A E_α[i] for α ≠ 0. Since (E_α[i])⁻¹ = E_{1/α}[i] and (E_α[i])ᵗ = E_α[i], this
similarity transformation has the following effect:

    (E_α[i])⁻¹ A E_α[i] = E_{1/α}[i] A E_α[i] multiplies all entries in the ith row
    with 1/α and then multiplies all entries in the ith column with α; thus
    keeping the (i, i)th entry intact.

We will refer to this type of similarity as dilation similarity. In particular, suppose
the ith row of A has all entries 0 except the (i, i)th entry and one more entry α ≠ 0,
and the ith column of A has all off-diagonal entries 0. Then (E_α[i])⁻¹ A E_α[i] is the matrix
in which this α changes to 1 and all other entries are as in A.
The third type of similarity transformation applied on A yields (E_α[i, j])⁻¹ A E_α[i, j].
Notice that (E_α[i, j])⁻¹ = E_{−α}[i, j] and (E_α[j, i])ᵗ = E_α[i, j]. This similarity transformation changes a matrix A as described below:

    (E_α[i, j])⁻¹ A E_α[i, j] = E_{−α}[i, j] A E_α[i, j] is obtained from A by replacing the ith row by the ith row minus α times the jth row, and replacing
    the jth column by the jth column plus α times the ith column.

We name this type of similarity a combination similarity.
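Two of these elementary similarities can be played with directly. A small sketch (plain Python, exact rational arithmetic; E[i, j] and E_α[i, j] built from the identity, 0-based indices, on a hypothetical upper triangular matrix):

```python
from fractions import Fraction as F

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def swap(n, i, j):            # permutation matrix E[i, j]
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def combine(n, i, j, a):      # E_a[i, j]: identity with a at position (i, j)
    E = identity(n)
    E[i][j] = F(a)
    return E

A = [[F(x) for x in row] for row in [[2, 1, 0], [0, 2, 3], [0, 0, 5]]]

# permutation similarity: exchange rows 0 and 2, then columns 0 and 2
S = mul(swap(3, 0, 2), mul(A, swap(3, 0, 2)))

# combination similarity zeroing the (1, 2) entry x: a = -x/(a_rr - a_ss)
a = -A[1][2] / (A[1][1] - A[2][2])
T = mul(combine(3, 1, 2, -a), mul(A, combine(3, 1, 2, a)))

print(S[0][0], T[1][2])   # 5 0
```

Note that T picks up a new nonzero entry at position (0, 2), illustrating the remark made later in the proof of Theorem 6.8 that such transformations can make a previously zero entry nonzero.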
Our goal is to prove that there exists an invertible matrix P such that P⁻¹AP is in
the form

    P⁻¹AP = diag(J1, J2, ..., Jk),

where each Ji is a block diagonal matrix of the form

    Ji = diag(J1(λi), J2(λi), ..., J_{si}(λi))

for some si; each matrix Jj(λi) here has the form

    Jj(λi) = [ λi  1           ]
             [     λi  ⋱       ]
             [         ⋱   1   ]
             [             λi  ].

The missing entries are all 0. Such a matrix Jj(λi) is called a Jordan block with
diagonal entries λi. Any matrix which is in the above block diagonal form is said
to be in Jordan form. We will see that the number of Jordan blocks with diagonal
entries λi is the geometric multiplicity of the eigenvalue λi.
Example 6.4
The following matrix is in Jordan form:

    [ 1                   ]
    [    1  1             ]
    [       1             ]
    [          1          ]
    [             2  1    ]
    [                2  1 ]
    [                   2 ].

It has three Jordan blocks for the eigenvalue 1, of which two are of size 1 × 1 and
one is of size 2 × 2; and it has one block of size 3 × 3 for the eigenvalue 2. Notice
that the eigenvalue 1 has geometric multiplicity 3 and algebraic multiplicity 4,
and 2 has geometric multiplicity 1 and algebraic multiplicity 3.
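Reading off multiplicities from a matrix in Jordan form is mechanical: the algebraic multiplicity of λ is the number of diagonal entries equal to λ, and the geometric multiplicity is the number of blocks, i.e., the number of positions where a new block with that diagonal entry starts. A sketch (plain Python; encoding the 7 × 7 matrix above by its diagonal and super-diagonal is my own convention):

```python
def multiplicities(diag, superdiag, lam):
    # diag: diagonal entries; superdiag: entries just above the diagonal
    algebraic = sum(d == lam for d in diag)
    # a block with diagonal entry lam starts at i when diag[i] == lam and
    # either i == 0 or the super-diagonal entry to its left is 0
    geometric = sum(d == lam and (i == 0 or superdiag[i - 1] == 0)
                    for i, d in enumerate(diag))
    return algebraic, geometric

diag = [1, 1, 1, 1, 2, 2, 2]
superdiag = [0, 1, 0, 0, 1, 1]
print(multiplicities(diag, superdiag, 1), multiplicities(diag, superdiag, 2))
# (4, 3) (3, 1)
```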
Theorem 6.8
(Jordan form) Each matrix A ∈ C^{n×n} is similar to a matrix in Jordan form J,
where the diagonal entries are the eigenvalues of A. Moreover, if mk(λ) is the
number of Jordan blocks of order k with diagonal entry λ, then

    mk(λ) = rank((A − λI)^{k−1}) − 2 rank((A − λI)^k) + rank((A − λI)^{k+1})   for k = 1, ..., n.

In particular, the Jordan form of A is unique up to a permutation of the blocks.
In the formula for mk we use the convention that for any matrix B of order n, B⁰ is
the identity matrix of order n.
Proof First, we will show the existence of a Jordan form, and then we will
come back to the formula for mk, which will show the uniqueness of a Jordan form
up to a permutation of Jordan blocks.


Due to Schur triangularization, we assume that A is an upper triangular
matrix, where the eigenvalues of A occur on the diagonal, and equal eigenvalues occur together. If λ1, ..., λk are the distinct eigenvalues of A, then our
assumption means that A is an upper triangular matrix whose diagonal entries,
read from top left to bottom right, appear as

    λ1, ..., λ1; λ2, ..., λ2; ...; λk, ..., λk.

In this list suppose λi occurs ni times. First, we show that by way of
similarity transformations, A can be brought to the form

    diag(A1, A2, ..., Ak),

where each Ai is an upper triangular matrix of size ni × ni and each diagonal
entry of Ai is λi. Our requirement is shown schematically as follows, where each
element marked x that is not inside the blocks Ai needs to be zeroed-out
by a similarity transformation.

    [ A1   x      x ]        [ A1   0      0 ]
    [    A2       x ]        [    A2      0 ]
    [        ⋱    ⋮ ]   →    [        ⋱   ⋮ ]
    [           Ak  ]        [           Ak ]
If such an x occurs as the (r, s)th entry in A, then r < s. Moreover, the corresponding diagonal entries arr and ass are eigenvalues of A occurring in different
blocks Ai and Aj. Thus arr ≠ ass. Further, all entries below the diagonals of Ai
and of Aj are 0. We use a combination similarity to obtain

    (E_α[r, s])⁻¹ A E_α[r, s]   with   α = −x / (arr − ass).

This similarity transformation replaces the rth row with the rth row minus α
times the sth row, and then replaces the sth column with the sth column plus α
times the rth column. Since r < s, it changes the entries of A in the rth
row to the right of the sth column, and the entries in the sth column above
the rth row. Thus the upper triangular nature of the matrix does not change.
Further, it replaces the (r, s)th entry x with

    ars + α(arr − ass) = x + [−x/(arr − ass)](arr − ass) = 0.

We use a sequence of such similarity transformations, starting from the last row
of A_{k−1} with the least column index and ending in the first row with the largest column
index. Observe that an entry beyond the blocks which was 0 previously can
become nonzero after a single such similarity transformation. Such an entry
will eventually be zeroed-out. Finally, each position which is not inside any


of the k blocks A1 , . . . , Ak contains only 0. On completion of this stage, we end


up with a matrix
diag (A1 , A2 , . . . , Ak ).
In the second stage, we focus on bringing each block Ai to the Jordan form.
For notational convenience, write λi as a. If ni = 1, then such an Ai is already
in Jordan form. We use induction on the order ni of Ai. Lay out the induction
hypothesis that each such matrix of order m − 1 has a Jordan form. Suppose
Ai has order m. Look at Ai in the following partitioned form:

    Ai = [ B  u ]
         [ 0  a ],

where B is the first (m − 1) × (m − 1) block, 0 is the zero row vector in C^{1×(m−1)},
and u is a column vector in C^{(m−1)×1}. Due to the induction hypothesis, there
exists an invertible matrix Q such that Q⁻¹BQ is in Jordan form; it looks like

    Q⁻¹BQ = [ B1          ]                        [ a  1       ]
            [    B2       ]      where each Bj =   [    a  ⋱    ]
            [       ⋱     ],                       [       ⋱  1 ]
            [          Bℓ ]                        [          a ].

Then

    C := [ Q  0 ]⁻¹ Ai [ Q  0 ] = [ Q⁻¹BQ  Q⁻¹u ] = [ a  ∗              b1   ]
         [ 0  1 ]      [ 0  1 ]   [   0      a  ]   [    a  ∗           b2   ]
                                                    [       ⋱  ⋱        ⋮    ]
                                                    [          a  ∗     bm−2 ]
                                                    [             a     bm−1 ]
                                                    [                   a    ].

Here, the sequence of ∗s on the super-diagonal read from top left to bottom
right comprises a block of 1s followed by a 0, then a block of 1s followed
by a 0, and so on. The number of 1s depends on the sizes of B1, B2, etc. That
is, when B1 is complete and B2 starts, we have a 0. Also, we have written Q⁻¹u
as [b1 ··· bm−1]ᵗ. Our goal is to zero-out all the bj s except bm−1, which may
be made a 0 or 1.
In the next sub-stage, call it the third stage, we apply similarity transformations to zero-out (all, or all except one of) the entries b1, ..., bm−2. In any row
of C the entry above the diagonal (the ∗ there) is either 0 or 1. The ∗ is a 0
in the last row of each block Bj. We leave all such bs for now; they are to
be tackled separately. So, suppose in the rth row, br ≠ 0 and the (r, r + 1)th
entry (the ∗ above the diagonal) is a 1. We wish to zero-out each such
br, which is in the (r, m) position. For this purpose, we use a combination
similarity to transform C to

    (E_{−br}[r + 1, m])⁻¹ C E_{−br}[r + 1, m] = E_{br}[r + 1, m] C E_{−br}[r + 1, m].

Observe that this matrix is obtained from C by replacing the (r + 1)th row
with the (r + 1)th row plus br times the last row, and then replacing the last
column with the last column minus br times the (r + 1)th column. Its net
result is replacing the (r, m)th entry by 0, and keeping all other entries intact.
Continuing this process of applying a suitable combination similarity transformation, each nonzero bi with a corresponding 1 on the super-diagonal in
the same row is reduced to 0. We then obtain a matrix where all such entries in
the last column of C have been zeroed-out, without touching the entries at
the last rows of the blocks Bj. Write those entries as c1, ..., cℓ. Thus at
the end of the third stage, Ai has been brought to the following form by similarity
transformations:

    F := [ B1             c1 ]
         [     B2         c2 ]
         [         ⋱      ⋮  ]
         [            Bℓ  cℓ ]
         [                a  ].

Notice that if Bj is a 1 × 1 block, then the corresponding entry in the last
column is already 0. In the next sub-stage, call it the fourth stage, we keep
the nonzero c corresponding to the last block (the c entry with the highest column index) and zero-out all the other cs. Let Bq be the last block whose
corresponding c entry is cq ≠ 0, say in the sth row. (It may not be cℓ; in that
case, all of cq+1, ..., cℓ are already 0.) We first make cq a 1 by using a dilation
similarity:

    G := E_{1/cq}[s] F E_{cq}[s].
In G, the earlier cq at the (s, m)th position is now 1. Let Bp be any block other
than Bq with cp ≠ 0, say in the rth row. Our goal in this sub-stage, call it the fifth
stage, is to zero-out cp. We use two combination similarity transformations as
shown below:

    H := E_{−cp}[r − 1, s − 1] E_{−cp}[r, s] G E_{cp}[r, s] E_{cp}[r − 1, s − 1].

This similarity transformation brings cp to 0 and keeps other entries intact.
We do this for each such cp. Thus in the mth column of H, we have only one
nonzero entry, a 1 at the (s, m)th position. If this happens to be in the last row, then
we have obtained a Jordan form. Otherwise, call this sub-stage the sixth
stage; we move this 1 to the (s, s + 1)th position by the following sequence of
permutation similarities:

    K := E[m − 1, m] ··· E[s + 2, m] E[s + 1, m] H E[s + 1, m] E[s + 2, m] ··· E[m − 1, m].


This transformation exchanges the rows and columns beyond the sth so that
the 1 in the (s, m)th position moves to the (s, s + 1)th position, making up a block; and
other entries remain as they were earlier.
Here ends the proof by induction that each block Ai can be brought to a
Jordan form by similarity transformations. From a similarity transformation
for Ai, a similarity transformation can be constructed for the block diagonal
matrix

    Ā := diag(A1, A2, ..., Ak)

by putting identity matrices of suitable order and the similarity transformation
for Ai in a block form. As these transformations do not affect any other
rows and columns of Ā, a sequence of such transformations brings Ā to its
Jordan form, proving the existence part of the theorem.
For the formula for mk, let λ be an eigenvalue of A. Suppose k ∈ {1, ..., n}.
Observe that A − λI is similar to J − λI. Thus, rank((A − λI)^i) = rank((J − λI)^i)
for each i. Therefore, it is enough to prove the formula for J instead of A. We
use induction on n. In the basis case, J = [λ]. Here, k = 1 and mk = m1 = 1. On
the right hand side, due to the convention,

    (J − λI)^{k−1} = I = [1],  (J − λI)^k = [0]¹ = [0],  (J − λI)^{k+1} = [0]² = [0].

So, the formula holds for n = 1.
Lay out the induction hypothesis that for all matrices in Jordan form of
order less than n, the formula holds. Let J be a matrix of order n which is in
Jordan form. We consider two cases.
Case 1: Let J have a single Jordan block corresponding to λ. That is,

    J = [ λ  1          ]           J − λI = [ 0  1          ]
        [    λ  ⋱       ]                    [    0  ⋱       ]
        [       ⋱   1   ],                   [       ⋱   1   ]
        [           λ   ]                    [           0   ].

Here m1 = 0, m2 = 0, ..., mn−1 = 0 and mn = 1. We see that (J − λI)² has 1s
on the super-super-diagonal, and 0 elsewhere. Proceeding similarly for higher
powers of J − λI, we see that their ranks are given by

    rank(J − λI) = n − 1,  rank((J − λI)²) = n − 2,  ...,  rank((J − λI)^i) = n − i,  ...,
    rank((J − λI)^n) = 0,  rank((J − λI)^{n+1}) = 0,  ....
Then for k < n,

    rank((J − λI)^{k−1}) − 2 rank((J − λI)^k) + rank((J − λI)^{k+1})
        = (n − (k − 1)) − 2(n − k) + (n − k − 1) = 0 = mk.

And for k = n,

    rank((J − λI)^{n−1}) − 2 rank((J − λI)^n) + rank((J − λI)^{n+1})
        = (n − (n − 1)) − 2·0 + 0 = 1 = mn.


Case 2: Suppose J has more than one Jordan block corresponding to λ. Suppose that the first Jordan block in J corresponding to λ has order r < n.
Then J − λI can be written in block form as

    J − λI = [ C  0 ]
             [ 0  D ],

where C is the Jordan block of order r with diagonal entries 0, and D is
the matrix of order n − r in Jordan form consisting of the other blocks of J − λI.
Then, for any j,

    (J − λI)^j = [ C^j   0  ]
                 [  0   D^j ].

Therefore,

    rank((J − λI)^j) = rank(C^j) + rank(D^j).

Write mk(C) and mk(D) for the number of Jordan blocks of order k for the
eigenvalue λ that appear in C and in D, respectively. Then

    mk = mk(C) + mk(D).

By the induction hypothesis,

    mk(C) = rank(C^{k−1}) − 2 rank(C^k) + rank(C^{k+1}),
    mk(D) = rank(D^{k−1}) − 2 rank(D^k) + rank(D^{k+1}).

It then follows that

    mk = rank((J − λI)^{k−1}) − 2 rank((J − λI)^k) + rank((J − λI)^{k+1}).

Since the number of Jordan blocks of order k corresponding to each eigenvalue of A is uniquely determined, the Jordan form of A is also uniquely
determined up to a permutation of blocks.
To obtain a Jordan form of a given matrix, we may use the construction of similarity transformations as used in the proof of Theorem 6.8, or we may use the formula
for mk as given there. We illustrate these methods in the following examples.
Example 6.5
Consider the upper triangular matrix

    A = [ 2  1  0  0  0  1  0  2  0 ]
        [    2  0  0  0  3  0  0  1 ]
        [       2  1  0  0  2  0  0 ]
        [          2  0  2  0  0  0 ]
        [             2  0  0  0  0 ]
        [                2  0  0  0 ]
        [                   3  1  1 ]
        [                      3  1 ]
        [                         3 ],

where the entries below the diagonal are all 0. The circled entries, at positions (3, 7), (2, 9) and (1, 8), lie outside the two diagonal blocks and are to be zeroed-out in the first stage.


Following the proof of Theorem 6.8, we first zero-out the circled entries,
starting from the entry on the third row. Here, the row index is r = 3, the
column index is s = 7, the eigenvalues are arr = 2, ass = 3, and the entry to
be zeroed-out is x = 2. Thus, α = −2/(2 − 3) = 2. We use the combination
similarity:

    M1 = (E2[3, 7])⁻¹ A E2[3, 7].

That is, in A, we replace row(3) with row(3) − 2 row(7) and then replace
col(7) with col(7) + 2 col(3) to obtain

    M1 = [ 2  1  0  0  0  1  0   2  0 ]
         [    2  0  0  0  3  0   0  1 ]
         [       2  1  0  0  0  −2  0 ]
         [          2  0  2  0   0  0 ]
         [             2  0  0   0  0 ]
         [                2  0   0  0 ]
         [                   3   1  1 ]
         [                       3  1 ]
         [                          3 ].
Notice that the similarity transformation made a previously 0 entry nonzero.
It brought in a new nonzero entry, namely −2 in the (3, 8) position. We will
zero it out before proceeding to the originally circled ones. The suitable
combination similarity is

    M2 = (E−2[3, 8])⁻¹ M1 E−2[3, 8],

which replaces row(3) with row(3) + 2 row(8) and then replaces col(8) with
col(8) − 2 col(3). Verify that it zeroes-out the entry −2 but introduces 2 at the
(3, 9) position. Once more, we use a combination similarity. This time we use

    M3 = (E2[3, 9])⁻¹ M2 E2[3, 9],

replacing row(3) with row(3) − 2 row(9) and then replacing col(9) with col(9) +
2 col(3). Now,

    M3 = [ 2  1  0  0  0  1  0  2  0 ]
         [    2  0  0  0  3  0  0  1 ]
         [       2  1  0  0  0  0  0 ]
         [          2  0  2  0  0  0 ]
         [             2  0  0  0  0 ]
         [                2  0  0  0 ]
         [                   3  1  1 ]
         [                      3  1 ]
         [                         3 ].
Similar to the above, we use combination similarities to reduce M3 to M4,
where

    M4 = (E1[2, 9])⁻¹ M3 E1[2, 9].

To zero-out the circled 2 at the (1, 8) position, we use the combination similarity

    M5 = (E2[1, 8])⁻¹ M4 E2[1, 8].

It zeroes-out that 2 but introduces 2 at the (1, 9) position. Once more,
we use a suitable combination similarity to obtain

    M6 = (E2[1, 9])⁻¹ M5 E2[1, 9] = [ 2  1  0  0  0  1          ]
                                    [    2  0  0  0  3          ]
                                    [       2  1  0  0          ]
                                    [          2  0  2          ]
                                    [             2  0          ]
                                    [                2          ]
                                    [                   3  1  1 ]
                                    [                      3  1 ]
                                    [                         3 ].
Now, the matrix M6 is in block diagonal form. We focus on each of the
blocks, though we will be working with the whole matrix. We consider the
block corresponding to the eigenvalue 2 first. Since this step is inductive, we
scan this block from the top left corner. The 2 × 2 principal sub-matrix of
this block is already in Jordan form. The 3 × 3 principal sub-matrix is also
in Jordan form. We see that the principal sub-matrices of size 4 × 4 and 5 × 5
are also in Jordan form, but the 6 × 6 sub-matrix, which is the block itself, is
not in Jordan form. We wish to bring the sixth column to its proper shape.
Recall that our strategy is to zero-out all those entries in the sixth column
which are opposite to a 1 on the super-diagonal of this block. There is only
one such entry, the 1 at the (1, 6) position in M6 above.
The row index of this entry is r = 1, its column index is m = 6, and the
entry itself is br = 1. We use the combination similarity

    M7 = E1[2, 6] M6 E−1[2, 6] = [ 2  1  0  0  0  0          ]
                                 [    2  0  0  0  5          ]
                                 [       2  1  0  0          ]
                                 [          2  0  2          ]
                                 [             2  0          ]
                                 [                2          ]
                                 [                   3  1  1 ]
                                 [                      3  1 ]
                                 [                         3 ].
Next, among the nonzero entries 5 and 2 at the positions (2, 6) and (4, 6), we
wish to zero-out the 5 and keep the 2, as the row index of 2 is higher. First, we
use a dilation similarity to make this entry 1, as in the following:

    M8 = E1/2[4] M7 E2[4].

It replaces row(4) with 1/2 times itself, and then replaces col(4) with 2 times
itself, thus making the (4, 6)th entry 1 and keeping all other entries intact. Next,
we zero-out the 5 at the (2, 6) position by using two combination similarities.
Here, cp = 5, r = 2, s = 4; thus

    M9 = E−5[1, 3] E−5[2, 4] M8 E5[2, 4] E5[1, 3] = [ 2  1  0  0  0  0          ]
                                                    [    2  0  0  0  0          ]
                                                    [       2  1  0  0          ]
                                                    [          2  0  1          ]
                                                    [             2  0          ]
                                                    [                2          ]
                                                    [                   3  1  1 ]
                                                    [                      3  1 ]
                                                    [                         3 ].

Notice that M9 has been obtained from M8 by replacing row(2) with row(2) −
5 row(4), col(4) with col(4) + 5 col(2), row(1) with row(1) − 5 row(3), and
then col(3) with col(3) + 5 col(1).
Next, we move the 1 at the (4, 6) position to the (4, 5) position by a permutation similarity. Here, s = 4, m = 6. Thus the sequence of permutation similarities boils down to only one,
i.e., exchanging row(5) with row(6) and then exchanging col(6) with col(5).
Observe that we would have to use more permutation similarities
if the difference between m and s were more than 2. We thus obtain

    M10 = E[5, 6] M9 E[5, 6] = [ 2  1  0  0  0  0          ]
                               [    2  0  0  0  0          ]
                               [       2  1  0  0          ]
                               [          2  1  0          ]
                               [             2  0          ]
                               [                2          ]
                               [                   3  1  1 ]
                               [                      3  1 ]
                               [                         3 ].

Now, the diagonal block corresponding to the eigenvalue 2 is in Jordan form.
We focus on the other block, corresponding to 3. Here, the (7, 9)th entry, which
contains a 1, is to be zeroed-out: this entry is opposite to a 1 on the super-diagonal. Thus we use a combination similarity. Here, the row index is
r = 7, the column index is m = 9, and the entry is br = 1. Thus the similarity
transformation is

    M11 = E1[8, 9] M10 E−1[8, 9] = [ 2  1  0  0  0  0          ]
                                   [    2  0  0  0  0          ]
                                   [       2  1  0  0          ]
                                   [          2  1  0          ]
                                   [             2  0          ]
                                   [                2          ]
                                   [                   3  1  0 ]
                                   [                      3  1 ]
                                   [                         3 ].

Now, M11 is in Jordan form.

Example 6.6
We consider the same matrix A of Example 6.5. Here, we compute the number
mk of Jordan blocks of size k corresponding to each eigenvalue. For
this purpose, we require the ranks of the matrices (A − λI)^k for successive k
and for each eigenvalue λ of A. We see that A has two eigenvalues, 2 and 3. You
may compute the successive powers of (A − 2I) and of (A − 3I) and their ranks
using packages such as Matlab or Scilab. We find that for the eigenvalue 2,

    rank((A − 2I)⁰) = rank(I) = 9,  rank(A − 2I) = 6,  rank((A − 2I)²) = 4,
    rank((A − 2I)^{3+k}) = 3   for k = 0, 1, 2, ....

Similarly, we see that

    rank((A − 3I)⁰) = rank(I) = 9,  rank(A − 3I) = 8,  rank((A − 3I)²) = 7,
    rank((A − 3I)^{3+k}) = 6   for k = 0, 1, 2, ....

Using the formula for mk(λ), we obtain

    m1(2) = 9 − 2·6 + 4 = 1,   m2(2) = 6 − 2·4 + 3 = 1,
    m3(2) = 4 − 2·3 + 3 = 1,   m3+k(2) = 3 − 2·3 + 3 = 0,
    m1(3) = 9 − 2·8 + 7 = 0,   m2(3) = 8 − 2·7 + 6 = 0,
    m3(3) = 7 − 2·6 + 6 = 1,   m3+k(3) = 6 − 2·6 + 6 = 0.

Therefore, in the Jordan form of A, there is one Jordan block of size 1, one of
size 2 and one of size 3 with eigenvalue 2, and one block of size 3 with eigenvalue 3. From this information we see that the Jordan form of A is uniquely
determined up to a rearrangement of the blocks. Check that M11 as obtained in Example 6.5 is one such Jordan form of A.
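The rank computation in this example can be replayed on the Jordan form itself, since A and M11 are similar, and corresponding powers of A − λI and M11 − λI have equal ranks. A sketch (plain Python, exact arithmetic; rank via Gaussian elimination over the rationals; the block-list encoding of a Jordan form is my own):

```python
from fractions import Fraction

def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def jordan(blocks):
    # blocks: list of (eigenvalue, size); returns the block diagonal matrix
    n = sum(s for _, s in blocks)
    J = [[0] * n for _ in range(n)]
    i = 0
    for lam, s in blocks:
        for k in range(s):
            J[i + k][i + k] = lam
            if k + 1 < s:
                J[i + k][i + k + 1] = 1
        i += s
    return J

def mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def m_k(J, lam, k):
    # mk(lam) = rank(S^{k-1}) - 2 rank(S^k) + rank(S^{k+1}), S = J - lam*I
    n = len(J)
    S = [[J[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    powers = [[[int(i == j) for j in range(n)] for i in range(n)]]  # S^0 = I
    for _ in range(k + 1):
        powers.append(mul(powers[-1], S))
    return rank(powers[k - 1]) - 2 * rank(powers[k]) + rank(powers[k + 1])

# the Jordan form found above: blocks of sizes 2, 3, 1 for 2 and size 3 for 3
J = jordan([(2, 2), (2, 3), (2, 1), (3, 3)])
print([m_k(J, 2, k) for k in (1, 2, 3)], [m_k(J, 3, k) for k in (1, 2, 3)])
# [1, 1, 1] [0, 0, 1]
```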


Suppose that a matrix A ∈ C^{n×n} has a Jordan form J = P⁻¹AP, in which the first
Jordan block is of size k with diagonal entries λ. Suppose P = [v1 ··· vn]. Then
AP = PJ implies that

    Av1 = λv1,  Av2 = v1 + λv2,  ...,  Avk = vk−1 + λvk.

If the next Jordan block in J has diagonal entries µ (which may or may not be
equal to λ), then we have Avk+1 = µvk+1, Avk+2 = vk+1 + µvk+2, ..., and so on.
The list of vectors v1, ..., vk above is called a Jordan string that starts with v1 and
ends with vk. The number k is called the length of the Jordan string. In such a Jordan
string, we see that

    v1 ∈ N(A − λI),  v2 ∈ N((A − λI)²),  ...,  vk ∈ N((A − λI)^k).

Any vector in N((A − λI)^j), for some j, is called a generalized eigenvector corresponding to the eigenvalue λ of A.
The columns of P are all generalized eigenvectors of A corresponding to some
eigenvalue of A. Moreover, the columns of P can be constructed this way, looking at
the subspaces N((A − λI)^j). One may start with linearly independent vectors satisfying
(A − λI)v = 0. Corresponding to each solution v1 of this linear system, one determines linearly independent vectors satisfying (A − λI)v = v1. Next, corresponding
to each solution v2 of this linear system, one solves (A − λI)v = v2, and so on. The
process stops when n linearly independent vectors have been obtained this
way. These vectors form the matrix P.
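A tiny illustration of this construction (plain Python; the 2 × 2 matrix is a hypothetical example of mine, with the single eigenvalue 2 of geometric multiplicity 1): start with an eigenvector v1, solve (A − 2I)v = v1 for a generalized eigenvector v2, and check AP = PJ for P = [v1 v2].

```python
A = [[3, 1], [-1, 1]]        # single eigenvalue 2, one Jordan block of size 2

v1 = [1, -1]                 # eigenvector: (A - 2I) v1 = 0
v2 = [1, 0]                  # generalized eigenvector: (A - 2I) v2 = v1

def mul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

P = [[v1[0], v2[0]], [v1[1], v2[1]]]   # columns v1, v2
J = [[2, 1], [0, 2]]                   # the Jordan form

print(mul(A, P) == mul(P, J))   # True
```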
In the first stage, if the geometric multiplicity of the eigenvalue λ is γ, then there
are γ linearly independent eigenvectors associated with λ. These are the possible candidates for v1. Thus there are γ Jordan strings associated with λ.
These strings give rise to the Jordan blocks with diagonal entries λ. Thus, in J
there are exactly γ Jordan blocks with diagonal entries λ.
You can prove this fact from J directly by first showing that the geometric multiplicity of the eigenvalue λ of A is the same as the geometric multiplicity of the
eigenvalue λ of J.
The uniqueness of a Jordan form can be made exact by first ordering the eigenvalues of A and then arranging the blocks corresponding to each eigenvalue (which now
appear together on the diagonal) in some order, say in ascending order of their sizes.
In doing so, the Jordan form of any matrix becomes unique. Such a Jordan form is
called the Jordan canonical form of a matrix. It then follows that if two matrices
are similar, then they have the same Jordan canonical form. Moreover, uniqueness
also implies that two dissimilar matrices will have different Jordan canonical forms.
Therefore, the Jordan form characterizes similarity of matrices.
As an application of the Jordan form, we will show that each matrix is similar to its
transpose. Suppose J = P⁻¹AP. Now, Jᵗ = PᵗAᵗ(P⁻¹)ᵗ = PᵗAᵗ(Pᵗ)⁻¹. That is, Aᵗ is
similar to Jᵗ. Thus it is enough to show that Jᵗ is similar to J. First, let us see it for a
single Jordan block. So, let

    Jλ = [ λ  1          ]
         [    λ  ⋱       ]
         [       ⋱   1   ]
         [           λ   ].

Take the matrix Q as

    Q = [       1 ]
        [    ⋰    ]
        [ 1       ],

where the entries on the anti-diagonal are all 1 and all other entries are 0. We see that
Q² = I. Thus Q⁻¹ = Q. Further,

    Q⁻¹ Jλ Q = Q Jλ Q = (Jλ)ᵗ.

Therefore, each Jordan block is similar to its transpose. Now, construct a matrix R
by putting such matrices as its blocks, matching the orders of the Jordan blocks in J.
Then it follows that R⁻¹JR = Jᵗ.
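A quick check of Q Jλ Q = (Jλ)ᵗ for one block (plain Python; λ = 4 and order 3 are an arbitrary choice of mine):

```python
n, lam = 3, 4

# Jordan block J_lam and the anti-diagonal reversal matrix Q
J = [[lam if i == j else 1 if j == i + 1 else 0 for j in range(n)]
     for i in range(n)]
Q = [[1 if i + j == n - 1 else 0 for j in range(n)] for i in range(n)]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(r) for r in zip(*A)]

print(mul(Q, Q) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]],   # Q² = I
      mul(Q, mul(J, Q)) == transpose(J))                # Q J Q = Jᵗ
```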
It also follows from the Jordan form that one can always choose m linearly independent generalized eigenvectors corresponding to the eigenvalue λ, where m is the
algebraic multiplicity of λ. Further, it is guaranteed that

    if the linear system (A − λI)^k x = 0 has r < m linearly independent solutions, then (A − λI)^{k+1} x = 0 has at least r + 1 linearly
    independent solutions.

This result is often more useful in computing the exponential of a matrix than
using explicitly the Jordan form, which is comparatively difficult to compute.

Exercises for 6.3

1. Determine the Jordan forms of the following matrices:

    (a) [ 0  0  0 ]    (b) [ 2  1  3 ]
        [ 1  0  0 ]        [ 4  3  3 ]
        [ 2  1  0 ]        [ 2  1  1 ]
2. Let A be a 7 × 7 matrix with characteristic polynomial (t − 2)⁴(3 − t)³. In the
Jordan form of A, the largest block for each of the eigenvalues is of order 2. Show that
there are only two possible Jordan forms for A; and determine those Jordan
forms.
3. Let A be a 5 × 5 matrix whose first two rows are [0, 1, 1, 0, 1] and [0, 0, 1, 1, 1];
all other rows are zero rows. What is the Jordan form of A?
4. Determine the matrix P ∈ C^{3×3} such that P⁻¹AP is in Jordan form, where A is
the matrix in Exercise 1(b).
5. Let A ∈ C^{n×n} have an eigenvalue λ. Suppose the numbers mk for this eigenvalue are known for each k ∈ N. Show that for each j, both rank((A − λI)^j) and
null((A − λI)^j) are uniquely determined.
6. Let A, B ∈ C^{n×n}. Show that A and B are similar iff they have the same eigenvalues and for each eigenvalue λ, for each k ∈ N, rank((A − λI)^k) = rank((B − λI)^k).
7. Let A, B ∈ C^{n×n}. Show that A and B are similar iff they have the same eigenvalues and for each eigenvalue λ, for each k ∈ N, null((A − λI)^k) = null((B − λI)^k).
8. Let λ be an eigenvalue of a matrix A ∈ C^{n×n} having algebraic multiplicity m.
Then prove that null((A − λI)^m) = m.
9. Let J be a Jordan form of a matrix A ∈ C^{n×n}. Let λ be an eigenvalue of A.
Show that the geometric multiplicity of λ as an eigenvalue of A is the same as
the geometric multiplicity of λ as an eigenvalue of J.
10. Let J be a matrix in Jordan form. Let λ be an eigenvalue of J. Show that the
geometric multiplicity of λ is equal to the number of Jordan blocks in J having
λ as the diagonal entries.
11. Conclude from the previous two exercises that if λ is an eigenvalue of a matrix A
and J is the Jordan form of A, then the number of Jordan blocks with diagonal
entry λ in J is the geometric multiplicity of λ.

6.4 Singular value decomposition

Given an m×n matrix A with complex entries, there are two hermitian matrices that
can be constructed naturally from it, namely, A*A and AA*. We wish to study the
eigenvalues and eigenvectors of these matrices and their relations to certain parameters associated with A. We will see that these concerns yield a factorization of A.
The hermitian matrix A*A ∈ C^{n×n} has only real eigenvalues. If λ ∈ R is such an
eigenvalue with an associated eigenvector v ∈ C^{n×1}, then A*Av = λv implies that

λ‖v‖² = λ v*v = v*(λv) = v*A*Av = (Av)*(Av) = ‖Av‖².

Since ‖v‖ > 0, we see that λ ≥ 0. The eigenvalues of A*A can thus be arranged in a
decreasing list

λ_1 ≥ λ_2 ≥ ... ≥ λ_r > λ_{r+1} = ... = λ_n = 0


Canonical Forms

for some r with 0 ≤ r ≤ n. Notice that all of λ_1, ..., λ_r are positive and the rest are all
equal to 0. Conventionally, each λ_i is written as s_i² for s_i ∈ R. Notice that in this
notation, an s_i may be positive, negative or zero. We first give a name to the square
roots of the eigenvalues of A*A and then relate the number r of positive eigenvalues
of A*A to the rank of the matrix A. Of course, we could have started with the
eigenvalues of AA* instead of A*A.
Let A ∈ C^{m×n}. Let s_1² ≥ ... ≥ s_n² be the n eigenvalues of A*A. The non-negative
real numbers s_1, ..., s_n are called the singular values of A.
Theorem 6.9
Let A ∈ C^{m×n}. Then rank(A) = rank(A*A) = rank(AA*) = the number of positive
singular values of A.

Proof: As linear transformations, A : C^{n×1} → C^{m×1}, A* : C^{m×1} → C^{n×1}, and
R(A*A) = A*(R(A)). Applying the rank nullity theorem to A* restricted to R(A),

rank(A*A) = dim(R(A*A)) ≤ dim(R(A)) = rank(A).

For the other inequality, let v ∈ N(A*A). That is, A*Av = 0. Then v*A*Av = 0
implies that ‖Av‖² = 0, giving Av = 0. That is, N(A*A) ⊆ N(A). It implies that
null(A*A) ≤ null(A). Notice that A*A ∈ C^{n×n}. Thus

rank(A) = n − null(A) ≤ n − null(A*A) = rank(A*A).

Combining both inequalities, we obtain rank(A*A) = rank(A).
Now, consider A* instead of A. What we just proved implies that

rank(AA*) = rank((A*)*A*) = rank(A*).

But rank(A^t) = rank(A), and taking complex conjugates preserves rank; thus rank(A*) = rank(A).
Therefore,

rank(AA*) = rank(A).

Notice that A*A is hermitian. So, it is unitarily similar to the diagonal matrix

D = diag(s_1², ..., s_r², s_{r+1}², ..., s_n²)

with s_1² ≥ ... ≥ s_r² ≥ s_{r+1}² ≥ ... ≥ s_n². Since rank(A*A) = rank(D) = r, we see that
s_1 ≥ ... ≥ s_r > 0 and s_{r+1} = ... = s_n = 0. That is, A has exactly r
positive singular values. ∎
Not only is the number of positive eigenvalues of A*A and of AA* the same, but
something more can be said about these eigenvalues.
Suppose λ > 0 is an eigenvalue of A*A with an associated eigenvector v. Then
A*Av = λv implies (AA*)(Av) = λ(Av). Since A*Av = λv ≠ 0, we have Av ≠ 0. Thus the same scalar
λ is also an eigenvalue of AA*, with an associated eigenvector Av.

Similarly, if λ > 0 is an eigenvalue of AA*, then it follows that the same λ is an
eigenvalue of A*A. That is, a positive real number is an eigenvalue of A*A iff it is an
eigenvalue of AA*.
It also follows that A and A* have the same r positive singular values,
where r = rank(A) = rank(A*). Further, A has n − r zero singular values,
whereas A* has m − r zero singular values. In addition, if A ∈ C^{n×n} is
hermitian and has eigenvalues λ_1, ..., λ_n, then its singular values are |λ_1|, ..., |λ_n|.
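A quick numerical check of these observations, for an arbitrarily chosen complex matrix (a sketch using numpy, not part of the original notes):

```python
import numpy as np

# An arbitrary 2x3 complex matrix of rank 2.
A = np.array([[1, 2j, 0], [0, 1, 1]], dtype=complex)

s = np.linalg.svd(A, compute_uv=False)   # singular values, in decreasing order
eig_AsA = np.sort(np.linalg.eigvalsh(A.conj().T @ A))[::-1]   # eigenvalues of A*A
eig_AAs = np.sort(np.linalg.eigvalsh(A @ A.conj().T))[::-1]   # eigenvalues of AA*

# Squared singular values are the eigenvalues of A*A; AA* shares the positive ones.
assert np.allclose(eig_AsA[:2], s**2) and abs(eig_AsA[2]) < 1e-12
assert np.allclose(eig_AAs, s**2)
# rank(A) equals the number of positive singular values (Theorem 6.9).
assert np.linalg.matrix_rank(A) == np.sum(s > 1e-12) == 2
```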
During the course of the proof of Theorem 6.9, we have shown that there exists a
unitary matrix P such that

P*(A*A)P = diag(s_1², ..., s_r², 0, ..., 0),

where s_1 ≥ ... ≥ s_r are the positive singular values of A. An analogous factorization
for A itself also holds.
Theorem 6.10 (SVD)
Let A ∈ C^{m×n} be of rank r. Let s_1 ≥ ... ≥ s_r be the positive singular
values of A. Let S := diag(s_1, ..., s_r) ∈ C^{r×r}. Then there exist unitary matrices
P ∈ C^{m×m} and Q ∈ C^{n×n} such that

A = PΣQ*,  Σ := [S 0; 0 0] ∈ C^{m×n}.

Further, the columns of P are eigenvectors of AA* that form an orthonormal basis of C^{m×1}, and the columns of Q are eigenvectors of A*A that form
an orthonormal basis of C^{n×1}.
Proof: The matrix A has singular values s_1 ≥ ... ≥ s_r > 0, 0, ..., 0. Thus, the
eigenvalues of AA* and of A*A are s_1² ≥ ... ≥ s_r² > 0, 0, ..., 0. In case of AA*, there
are m − r zeros, and in case of A*A, there are n − r zeros.
The matrix A*A is hermitian. There exists an orthonormal basis {v_1, ..., v_n}
for C^{n×1} such that

A*A v_i = s_i² v_i for i = 1, ..., r;  A*A v_j = 0 for j = r+1, ..., n.

For i = 1, ..., r, define u_i ∈ C^{m×1} by

u_i = (1/s_i) A v_i.

Then,

AA* u_i = (1/s_i) AA*A v_i = (1/s_i) s_i² A v_i = s_i² u_i.

That is, u_1, ..., u_r are eigenvectors of AA* associated with the eigenvalues
s_1², ..., s_r², respectively. Further, since {v_1, ..., v_r} is an orthonormal set, for
i, j = 1, ..., r, we have

u_j* u_i = (1/(s_i s_j)) (A v_j)*(A v_i) = (1/(s_i s_j)) v_j*(A*A v_i) = (1/(s_i s_j)) v_j* s_i² v_i = (s_i/s_j) v_j* v_i.

It says that for i ≠ j, u_i ⊥ u_j, and ‖u_i‖² = 1. Therefore, {u_1, ..., u_r} is an
orthonormal set of eigenvectors of AA* in C^{m×1}.
Also, since A v_i = s_i u_i, the above equation shows that, for i, j = 1, ..., r,

u_i* A v_i = s_i  and  u_j* A v_i = s_i u_j* u_i = 0 for i ≠ j.

Now, extend the orthonormal ordered set {u_1, ..., u_r} to an orthonormal
basis {u_1, ..., u_m} for C^{m×1}. Clearly, the above equations continue to hold. Moreover,
A*A v_k = 0 for each k = r+1, ..., n. Now, ‖A v_k‖² = v_k* A*A v_k = 0. Hence, A v_k = 0
for k = r+1, ..., n. It follows that for j = r+1, ..., m and k = r+1, ..., n, we
have u_j* A v_k = 0. In summary, for i = 1, ..., m and j = 1, ..., n, we obtain

u_i* A v_j = s_i if 1 ≤ i = j ≤ r;  u_i* A v_j = 0 otherwise.

Writing this in matrix form, we have

[u_1*; ... ; u_m*] A [v_1 ... v_n] = [diag(s_1, ..., s_r) 0; 0 0].

Next, take P as the matrix whose ith column is u_i, and Q as the matrix whose
jth column is v_j. We obtain

P* A Q = [S 0; 0 0],  S := diag(s_1, ..., s_r).

Since P ∈ C^{m×m} has orthonormal columns, it is unitary. Similarly, Q is unitary.
That is, P*P = PP* = I and Q*Q = QQ* = I. Multiplying by P on the left and by Q*
on the right, we obtain the required factorization of A. ∎
In the singular value decomposition of a matrix A, the columns u_i of P are the
eigenvectors of AA*; these are called the left singular vectors of A. Analogously,
the columns v_j of Q are the eigenvectors of A*A, and are called the right singular
vectors of A.
Observe that the columns r + 1 onwards in the matrices P and Q produce the zero blocks in the product
P*AQ. Thus, taking

P̃ = [u_1 ... u_r],  Q̃ = [v_1 ... v_r],

we see that a simplified decomposition of A can also be given. It is as follows:

A = P̃ S Q̃*.

Such a decomposition is called the tight SVD of the matrix A.


In the tight SVD, A ∈ C^{m×n}, P̃ ∈ C^{m×r}, S ∈ C^{r×r} and Q̃* ∈ C^{r×n} are matrices each
of rank r. Write B = P̃S and C = SQ̃* to obtain

A = B Q̃* = P̃ C,

where B ∈ C^{m×r} and C ∈ C^{r×n} have rank r. It shows that each m×n matrix of rank
r can be written as a product of an m×r matrix of rank r and a matrix of size
r×n, which is also of rank r. Recall that this factorization is named the full rank
factorization of a matrix.
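A sketch of this factorization computed with numpy (numpy's `svd` returns the factor Q* as `Vh`; the rank-1 matrix below is an arbitrary choice):

```python
import numpy as np

A = np.array([[2., -1.], [-2., 1.], [4., -2.]])   # a 3x2 matrix of rank 1
P, s, Vh = np.linalg.svd(A)

r = int(np.sum(s > 1e-12))                        # numerical rank
Pt, S, Qh = P[:, :r], np.diag(s[:r]), Vh[:r, :]   # tight SVD factors

B = Pt @ S    # m x r, rank r
C = S @ Qh    # r x n, rank r
# Both groupings give a full rank factorization of A.
assert np.allclose(B @ Qh, A) and np.allclose(Pt @ C, A)
```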
Example 6.7
To obtain an SVD, a tight SVD, and a full rank factorization of

A = [2 −1; −2 1; 4 −2],

we consider

A*A = [24 −12; −12 6].

It has eigenvalues λ_1 = 30 and λ_2 = 0. Thus s_1 = √30. It is easy to check that
rank(A) = 1, as the first column of A is −2 times the second column. Solving
the equations A*A[a, b]^t = 30[a, b]^t, that is,

24a − 12b = 30a,  −12a + 6b = 30b,

we obtain a solution as a = 2, b = −1. So, a unit eigenvector of A*A corresponding to the eigenvalue 30 is

v_1 = (1/√5)[2; −1].

For the eigenvalue λ_2 = 0, the equations are

24a − 12b = 0,  −12a + 6b = 0.

Thus a unit eigenvector orthogonal to the earlier one is

v_2 = (1/√5)[1; 2].

Then,

u_1 = (1/√30) A v_1 = (1/√30)[2 −1; −2 1; 4 −2][2/√5; −1/√5] = (1/√6)[1; −1; 2].

Notice that ‖u_1‖ = 1. We extend {u_1} to an orthonormal basis of C^{3×1}:

u_1 = (1/√6)[1; −1; 2],  u_2 := (1/√2)[1; 1; 0],  u_3 := (1/√3)[1; −1; −1].

Next, we take u_1, u_2, u_3 as the columns of P and v_1, v_2 as the columns of Q to
obtain the SVD of A as

[2 −1; −2 1; 4 −2] = [1/√6 1/√2 1/√3; −1/√6 1/√2 −1/√3; 2/√6 0 −1/√3] [√30 0; 0 0; 0 0] [2/√5 −1/√5; 1/√5 2/√5].

For the tight SVD, P̃ has as its r columns the first r columns of P, Q̃ has
as its r columns the first r columns of Q, and S is the usual r×r block
consisting of the positive singular values of A as the diagonal entries. With r = rank(A) = 1,
we thus have the tight SVD as

[2 −1; −2 1; 4 −2] = [1/√6; −1/√6; 2/√6] [√30] [2/√5 −1/√5].

In the tight SVD, using associativity of the matrix product, we get the rank
factorizations

[2 −1; −2 1; 4 −2] = [√5; −√5; 2√5] [2/√5 −1/√5] = [1/√6; −1/√6; 2/√6] [2√6 −√6].

You should check that the columns of P are eigenvectors of AA*.
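This check, and the rest of the example, can be replayed with numpy (a sketch, not part of the original notes); numpy may pick opposite signs for the singular vectors, so only sign-independent quantities are compared:

```python
import numpy as np

A = np.array([[2., -1.], [-2., 1.], [4., -2.]])
P, s, Vh = np.linalg.svd(A)

# The only positive singular value is sqrt(30).
assert np.isclose(s[0], np.sqrt(30)) and np.isclose(s[1], 0.0)

# Rows of Vh are right singular vectors (eigenvectors of A*A);
# columns of P are left singular vectors (eigenvectors of AA*).
assert np.allclose((A.T @ A) @ Vh[0], 30 * Vh[0])
assert np.allclose((A @ A.T) @ P[:, 0], 30 * P[:, 0])

# The rank-1 tight SVD reconstructs A.
assert np.allclose(s[0] * np.outer(P[:, 0], Vh[0]), A)
```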
Like the tight SVD, another simplification can be made in the SVD. Let A ∈ C^{m×n}
with m ≤ n. Suppose A = PΣQ* is an SVD of A. Let the ith row of Q* be denoted by
v_i* ∈ C^{1×n}. Write

P_1 := P,  Q_1* := [v_1*; ... ; v_m*] ∈ C^{m×n},  Σ_1 := diag(s_1, ..., s_r, 0, ..., 0) ∈ C^{m×m}.

Notice that P_1 is unitary and the m rows of Q_1* are orthonormal. In block form, we
have

Q* = [Q_1*; Q_2*],  Σ = [Σ_1 0],

where Q_2* = [v_{m+1}*; ... ; v_n*]. Then

A = PΣQ* = P_1 [Σ_1 0] [Q_1*; Q_2*] = P_1 Σ_1 Q_1*.

Similarly, when m ≥ n, we may curtail P accordingly. That is, suppose the ith
column of P is denoted by u_i ∈ C^{m×1}. Write

P_2 := [u_1 ... u_n] ∈ C^{m×n},  Σ_2 := diag(s_1, ..., s_r, 0, ..., 0) ∈ C^{n×n},  Q_2 := Q.

Here, the n columns of P_2 are orthonormal, and Q_2 is unitary. Then

A = PΣQ* = P_2 Σ_2 Q_2*.

These two forms of the SVD, one for m ≤ n and the other for m ≥ n, are called
the thin SVD of A.
It is easy to see that a singular value decomposition of a matrix is not unique. For,
the SVD depends on the choice of orthonormal bases; and we can always choose different orthonormal bases, for instance, just by multiplying an already constructed
one by −1. Also, it can be shown that when A ∈ R^{m×n}, the matrices P and Q can be chosen
to have real entries.
Singular value decomposition is perhaps the most important result for scientists and engineers, next to the theory of linear equations. It shows clearly the power
of eigenvalues and eigenvectors in a dramatic way. Observe that when we write
an m×n matrix A of rank r in its SVD form A = PΣQ*, the columns of P are the
eigenvectors of the matrix AA* associated with the eigenvalues s_1², ..., s_r², 0, ..., 0.
Similarly, the columns of Q are the eigenvectors of the matrix A*A associated with
the same eigenvalues. In the former case, there are m − r zero eigenvalues, and in the
latter case, they are n − r in number. Writing the ith column of P as u_i and the jth
column of Q as v_j, the SVD amounts to writing A as

A = s_1 u_1 v_1* + ... + s_r u_r v_r*.

Each matrix u_k v_k* here is of rank 1. This means that if we know the first r singular
values of A and their corresponding left and right singular vectors, then we
know A completely. This is particularly useful when A is a very large matrix of low
rank. No wonder, SVD is used in image processing, various compression algorithms,
and in principal components analysis. We will see another application of SVD in
representing a matrix in a very useful and elegant manner.
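The rank-1 expansion can be verified directly; the 6×4 matrix of rank 2 below is randomly generated (an arbitrary choice). Truncating the sum after fewer terms gives a lower-rank approximation of A, which is the idea behind SVD-based compression:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6x4, rank 2

P, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))   # numerical rank
assert r == 2

# The sum of r rank-1 terms s_k u_k v_k* recovers A exactly.
A_sum = sum(s[k] * np.outer(P[:, k], Vh[k]) for k in range(r))
assert np.allclose(A_sum, A)

# Keeping only the first term gives a rank-1 approximation of A.
A1 = s[0] * np.outer(P[:, 0], Vh[0])
assert np.linalg.matrix_rank(A1) == 1
```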

Exercises for 6.4

1. Let A ∈ C^{m×n}. Let s_1 ≥ ... ≥ s_r be the positive singular values of A. Show that
   the positive singular values of A* are also s_1, ..., s_r.
2. Prove that if λ_1, ..., λ_n are the eigenvalues of an n×n hermitian matrix, then
   its singular values are |λ_1|, ..., |λ_n|.
3. Compute the singular value decomposition of the following matrices:

   (a) [2 2; 1 1]    (b) [1 2 2; 2 1 2]    (c) [2 0 5; 2 1 2; 3 0 0].




4. Show that the matrices [1 0; 1 1] and [2 1; −1 0] are similar but they have different
   singular values.
5. Show that a matrix A ∈ C^{m×n} is of rank 1 iff there exist vectors u ∈ C^{m×1} and
   v ∈ C^{1×n} such that A = uv.
6. Let A ∈ C^{m×n} be a matrix of rank r with positive singular values s_1, ..., s_r.
   Suppose A = P [S 0; 0 0] Q* is an SVD of A, where S = diag(s_1, ..., s_r). Define
   A† = Q [S⁻¹ 0; 0 0] P*. Prove that A† satisfies the following properties:

   (AA†)* = AA†,  (A†A)* = A†A,  AA†A = A,  A†AA† = A†.

   A† is called the generalized inverse of A.
7. Let A ∈ F^{m×n}. Prove that there exists a unique matrix A† ∈ F^{n×m} satisfying the
   four identities mentioned in the previous exercise.
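As a numerical aid to Exercise 6: numpy's `pinv` computes this generalized inverse via the SVD, and the four identities can be checked directly (the rank-1 matrix is the one from Example 6.7):

```python
import numpy as np

A = np.array([[2., -1.], [-2., 1.], [4., -2.]])   # rank-1 matrix of Example 6.7
Ad = np.linalg.pinv(A)                            # the generalized inverse of A

# The four defining identities of the generalized inverse.
assert np.allclose((A @ Ad).conj().T, A @ Ad)
assert np.allclose((Ad @ A).conj().T, Ad @ A)
assert np.allclose(A @ Ad @ A, A)
assert np.allclose(Ad @ A @ Ad, Ad)
```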

6.5 Polar decomposition

Square matrices behave like complex numbers in many ways. One example is a powerful representation of square matrices using a stretch and a rotation. This mimics
the polar representation of a complex number as z = re^{iθ}. In this representation, r is
a non-negative real number, and thus it represents the stretch; and e^{iθ} is a rotation. Similarly, a square matrix can be written as a product of a positive semidefinite matrix
and a unitary matrix. The positive semidefinite matrix is a stretch and the unitary
matrix is a rotation. We slightly generalize the representation to any m×n matrix.
A hermitian matrix P ∈ F^{n×n} is called positive semidefinite iff x*Px ≥ 0 for each
x ∈ F^{n×1}.
Recall that a matrix U has orthonormal rows iff UU* = I; it has orthonormal
columns iff U*U = I; and it is unitary iff its rows are orthonormal and its columns
are orthonormal, iff UU* = U*U = I.

Theorem 6.11 (Polar decomposition)
Let A ∈ C^{m×n}. Then there exist positive semidefinite
matrices P ∈ C^{m×m}, Q ∈ C^{n×n}, and a matrix U ∈ C^{m×n} such that

A = PU = UQ,

where P² = AA*, Q² = A*A, and U satisfies the following:
1. If m = n, then the n×n matrix U is unitary.
2. If m < n, then the rows of U are orthonormal.
3. If m > n, then the columns of U are orthonormal.


Proof: Let A ∈ C^{m×n} be a matrix of rank r having positive singular values
s_1 ≥ ... ≥ s_r. Let A = BDE* be an SVD of A, where B ∈ C^{m×m}, E ∈ C^{n×n} are
unitary matrices, and D ∈ C^{m×n} has its first r diagonal entries equal to s_1, ..., s_r, and all
other entries 0.
(1) Suppose m = n. Then all the matrices A, B, D, E are of size n×n. Since
B*B = BB* = E*E = EE* = I, we rewrite A as follows:

A = BDE* = (BDB*)(BE*) = (BE*)(EDE*).

We take U := BE*, P := BDB* and Q := EDE*, so that A = PU = UQ. We must
show that U is unitary and that P, Q are positive semidefinite satisfying P² = AA*
and Q² = A*A. Now,

U*U = EB*BE* = EE* = I,  UU* = BE*EB* = BB* = I.

Thus, U is unitary.
Clearly, both P and Q are hermitian. For the other properties of P and Q,
let x ∈ C^{n×1}. Then

x*Px = x*BDB*x = (B*x)*D(B*x).

Write B*x := (a_1, ..., a_n)^t ∈ C^{n×1}. Then

x*Px = |a_1|²s_1 + ... + |a_r|²s_r ≥ 0.

Therefore, P is positive semidefinite. Also,

P² = BDB*BDB* = BDDB* = BDE*EDB* = (BDE*)(BDE*)* = AA*.

Similarly, it follows that Q is positive semidefinite and Q² = A*A.
(2) Let m < n. Write the n×n matrix E in block form

E = [E_1 E_2],

where E_1 ∈ C^{n×m} comprises the first m columns of E and E_2 ∈ C^{n×(n−m)} comprises the rest of the columns. Since E is unitary, the columns of E_1 are
orthonormal; that is, E_1*E_1 = I. Notice that E_1E_1* need not be I. Further, write
D in block form with D_1 ∈ C^{m×m} as the matrix obtained from D by retaining
the first m columns and deleting the next n − m columns. That is,

D = [D_1 0],  D_1 = diag(s_1, ..., s_r, 0, ..., 0).

Consequently,

DE* = [D_1 0][E_1*; E_2*] = D_1E_1*,  ED^t = (DE*)* = (D_1E_1*)* = E_1D_1.

Set U := BE_1* and Q := E_1DE*. Now, U ∈ C^{m×n}, Q ∈ C^{n×n}; and

A = BDE* = BE_1*E_1DE* = UQ.

We find that

UU* = (BE_1*)(BE_1*)* = BE_1*E_1B* = BB* = I.

That is, the rows of U are orthonormal.
Clearly, Q is hermitian. Next, let x ∈ C^{n×1}. Write E*x := (a_1, ..., a_n)^t ∈ C^{n×1};
so E_1*x = (a_1, ..., a_m)^t ∈ C^{m×1}. Then

x*Qx = x*E_1DE*x = x*E_1D_1E_1*x = (E_1*x)*D_1(E_1*x) = |a_1|²s_1 + ... + |a_r|²s_r ≥ 0.

That is, Q is positive semidefinite.
Using DE* = D_1E_1*, E_1D_1 = ED^t, E_1*E_1 = I and B*B = I, we have

Q² = (E_1DE*)² = E_1DE*E_1DE* = E_1D_1E_1*E_1DE* = E_1D_1DE*
   = ED^tDE* = ED^tB*BDE* = (BDE*)*(BDE*) = A*A.

To show that A can also be written in the form PU, consider the following:

A = BDE* = BD_1E_1* = (BD_1B*)(BE_1*) = PU,  with P := BD_1B*.

As earlier, it is easily checked that P is positive semidefinite and P² = AA*.
(3) Let m > n. Then A* ∈ C^{n×m} has fewer rows than columns. We
then use (2) to obtain positive semidefinite matrices P̂ ∈ C^{n×n}, Q̂ ∈ C^{m×m} and
a matrix Û ∈ C^{n×m} having orthonormal rows such that

A* = P̂Û = ÛQ̂.

Taking adjoints, and writing P := Q̂, Q := P̂ and U := Û*, we obtain

A = UQ = PU,

where U ∈ C^{m×n} has orthonormal columns, and P ∈ C^{m×m}, Q ∈ C^{n×n} satisfy
P² = AA* and Q² = A*A. ∎
Notice that the proof of Theorem 6.11(2) is also valid when m ≤ n. Thus, the proof
of (1) is redundant, as it would follow from (2) and (3). Also, (2) can be proved more
easily by using the thin SVD. Further, in the proof of (3) we have not constructed
the matrices P and Q explicitly. To unfold the proof, we start with A = BDE*, which
yields A* = ED^tB*. It is in the form

A* = B̂D̂Ê*,  B̂ = E,  D̂ = D^t,  Ê = B.

Next, we follow the construction in (2) with m and n interchanged. It asks us to take
D̂_1 as the first n columns of D̂, and the first n
columns of Ê as Ê_1. Then A* = P̂Û = ÛQ̂ with

Û = B̂Ê_1*,  P̂ = B̂D̂_1B̂*,  Q̂ = Ê_1D̂Ê*.

We also write B_1, E_1 for the matrices formed by taking the first n columns of B, E,
respectively, and D_1 for the matrix formed by taking the first n rows of D; so that Ê_1 = B_1 and D̂_1 = D_1.
Now, taking adjoints, we have A = PU = UQ with

U = Û* = Ê_1B̂* = B_1E*,
P = Q̂ = Ê_1D̂Ê* = BDB_1*,
Q = P̂ = B̂D̂_1B̂* = ED_1E*.

With these U, P and Q, you can give a direct proof of (3) in Theorem 6.11.
The construction of the polar decomposition from an SVD may be summarized as follows:
If A ∈ C^{m×n} has an SVD A = BDE*, then A = PU = UQ, where

m ≤ n :  U = BE_1*,  P = BD_1B*,  Q = E_1DE*;
m ≥ n :  U = B_1E*,  P = BDB_1*,  Q = ED_1E*.

Here, E_1 is constructed from E by taking its first m columns; B_1 is constructed from
B by taking its first n columns; and D_1 is constructed from D by taking its first m
columns when m ≤ n, and its first n rows when m ≥ n. In case m = n, the subscripts
go away from B, D and E.
Example 6.8
Consider the matrix A = [2 −1; −2 1; 4 −2] of Example 6.7. We had obtained its
SVD as A = BDE*, where

B = [1/√6 1/√2 1/√3; −1/√6 1/√2 −1/√3; 2/√6 0 −1/√3],
D = [√30 0; 0 0; 0 0],
E = [2/√5 1/√5; −1/√5 2/√5].

We follow the notation used in the proof of Theorem 6.11. Here, A ∈ C^{3×2}.
Thus Theorem 6.11(3) is applicable; see the discussion following the proof of
the theorem. We construct the matrix B_1 by taking the first two columns of B,
and D_1 by taking the first two rows of D, as in the following:

B_1 = [1/√6 1/√2; −1/√6 1/√2; 2/√6 0],  D_1 = [√30 0; 0 0].

Then

U = B_1E* = (1/√30)[2+√3 −1+2√3; −2+√3 1+2√3; 4 −2],
P = BDB_1* = (√5/√6)[1 −1 2; −1 1 −2; 2 −2 4],
Q = ED_1E* = (√6/√5)[4 −2; −2 1].

As expected, we find that

PU = (1/6)[1 −1 2; −1 1 −2; 2 −2 4][2+√3 −1+2√3; −2+√3 1+2√3; 4 −2] = [2 −1; −2 1; 4 −2] = A,
UQ = (1/5)[2+√3 −1+2√3; −2+√3 1+2√3; 4 −2][4 −2; −2 1] = [2 −1; −2 1; 4 −2] = A.

Computation of a polar decomposition does not require the SVD. In A = PU = UQ, the
matrices P and Q satisfy P² = AA* and Q² = A*A. If A ∈ C^{m×n}, then AA* ∈ C^{m×m}
and A*A ∈ C^{n×n} are hermitian matrices with eigenvalues s_1², ..., s_r², 0, ..., 0. Thus,
if AA* = C diag(s_1², ..., s_r², 0, ..., 0)C*, then P = C diag(s_1, ..., s_r, 0, ..., 0)C*.
Here, the matrix C consists of orthonormal eigenvectors of AA* corresponding to
the eigenvalues s_1², ..., s_r², 0, ..., 0.
Similarly, the matrix Q is equal to F diag(s_1, ..., s_r, 0, ..., 0)F*, where F consists of orthonormal eigenvectors of A*A corresponding to the eigenvalues s_1², ..., s_r²,
0, ..., 0. Finally, the Us can be computed by solving the linear systems A = PU and
A = UQ. The Us in the two instances may differ, since they depend on the choices of
orthonormal eigenvectors of AA* and A*A. In case A is invertible, you would end up
with the same U.
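The SVD-based construction for the case m ≥ n translates into a short computation; the sketch below (not part of the original notes) uses the matrix of Example 6.8, with `Eh` standing for E* as returned by numpy's `svd`:

```python
import numpy as np

A = np.array([[2., -1.], [-2., 1.], [4., -2.]])   # 3x2, so m >= n
m, n = A.shape
B, s, Eh = np.linalg.svd(A)                       # A = B D E*, with Eh = E*

D = np.zeros((m, n)); D[:n, :n] = np.diag(s)      # 3x2 "diagonal" factor
B1 = B[:, :n]                                     # first n columns of B
D1 = D[:n, :]                                     # first n rows of D

U = B1 @ Eh                                       # m x n, orthonormal columns
P = B @ D @ B1.conj().T                           # positive semidefinite, P^2 = AA*
Q = Eh.conj().T @ D1 @ Eh                         # positive semidefinite, Q^2 = A*A

assert np.allclose(U.conj().T @ U, np.eye(n))
assert np.allclose(P @ U, A) and np.allclose(U @ Q, A)
assert np.allclose(P @ P, A @ A.T) and np.allclose(Q @ Q, A.T @ A)
```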

Exercises for 6.5

1. Determine the polar decompositions of the matrix A of Example 6.8 by diagonalizing AA* and A*A, as mentioned in the text.
2. Let A ∈ C^{m×n} with m < n. Prove that there exist a unitary matrix U ∈ C^{n×n}
   and a matrix P ∈ C^{m×n} such that A = PU.
3. Give a direct proof of Theorem 6.11(3) analogous to that of (2) there. You
   may have to partition B instead of E in the SVD of A as A = BDE*.
4. Prove Theorem 6.11(2)-(3) by using the thin SVD.
5. Derive the singular value decomposition from the polar decomposition.


Index

adjoint of a matrix, 12
adjugate, 21
algebraic multiplicity, 95
angle between vectors, 66
basic variable, 39
basis, 49
best approximation, 71
Cayley-Hamilton, 79
change of basis matrix, 59
characteristic polynomial, 78
co-factor, 21
column rank, 35
column vector, 3
combination similarity, 97
complex conjugate, 12
complex eigenvalue, 78
conjugate transpose, 12
consistent system, 38
coordinate vector, 57
Determinant, 20
diagonal entries, 4
diagonal matrix, 4
diagonal of a matrix, 4
diagonalizable, 91
diagonalized by, 91
dilation similarity, 97
dimension, 50
eigenvalue, 77
eigenvector, 77
elementary matrix, 13
elementary row operation, 14
equal matrices, 4
equivalent matrices, 61
free variable, 39
full rank factorization, 62, 114
Gaussian elimination, 42
geometric multiplicity, 95
Gram matrix, 72
Gram-Schmidt orthogonalization, 68
Homogeneous system, 37
identity matrix, 5
inner product, 65
Jordan block, 98
Jordan form, 98
least squares, 74
linear combination, 18, 27
linear map, 54
Linear system, 36
linearly dependent, 28
linearly independent, 28
Matrix, 3
  augmented, 23
  entry, 3
  hermitian, 82
  inverse, 9
  invertible, 9
  lower triangular, 5
  multiplication, 7
  multiplication by scalar, 6
  normal, 92
  order, 4
  orthogonal, 82
  real symmetric, 82
  size, 4
  skew hermitian, 82
  skew symmetric, 82
  sum, 6
  symmetric, 82
  trace, 20
  unitary, 82
minor, 21
norm, 66
null space, 55
off diagonal entries, 4
orthogonal basis, 70
orthogonal set, 66
orthogonal vectors, 66
orthonormal basis, 70
orthonormal set, 67
permutation similarity, 97
pivot, 15
pivotal column, 15
pivotal row, 15
positive semidefinite, 117
powers of matrices, 9
Pythagoras, 66
QR factorization, 73
range, 54
range space, 55
rank echelon matrix, 61
rank factorization, 61
rank nullity theorem, 56
Reduction
  row reduced echelon form, 16
row rank, 35
Row reduced echelon form, 15
row vector, 3
scalar matrix, 5
scalars, 3
similar matrices, 63
singular values, 111
solution of linear system, 37
span, 47
spanning subset, 47
spans, 47
Spectral theorem, 92
standard basis, 5
standard basis vectors, 5
subspace, 46
super-diagonal, 4
system matrix, 37
thin SVD, 116
tight SVD, 113
transpose of a matrix, 10
triangular matrix, 5
upper triangular matrix, 5
value of unknown, 37
zero matrix, 4