
CHAPTER 5 The Vector Space $\mathbb{R}^n$
SECTION 5.1 Subspaces and Spanning
In Section 2.5 we introduced the set $\mathbb{R}^n$ of all $n \times 1$ columns, investigated the linear transformations $\mathbb{R}^n \to \mathbb{R}^m$, and showed that they are all given by left multiplication by an $m \times n$ matrix. Particular attention was paid to the euclidean plane $\mathbb{R}^2$ and to euclidean space $\mathbb{R}^3$, where geometric transformations like rotations and reflections were shown to be matrix transformations. We returned to this in Section 4.4 where projections in $\mathbb{R}^2$ or $\mathbb{R}^3$ were also shown to be matrix transformations, and where determinants were related to areas and volumes.
In this chapter we investigate $\mathbb{R}^n$ in full generality, and introduce some of the most important concepts and methods in linear algebra. While the n-tuples in $\mathbb{R}^n$ can be written as rows or as columns, we will primarily denote them as column matrices X, Y, etc. The main exception is that the geometric vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ will be written as $\vec{v}$, $\vec{w}$, etc.
Subspaces of $\mathbb{R}^n$
A set$^1$ U of vectors in $\mathbb{R}^n$ is called a subspace of $\mathbb{R}^n$ if it satisfies the following properties:
S1. The zero vector 0 is in U.
S2. If X and Y are in U, then X + Y is also in U.
S3. If X is in U, then aX is in U for every real number a.
We say that the subset U is closed under addition if S2 holds, and that U is closed under scalar multiplication if S3 holds.
Clearly $\mathbb{R}^n$ is a subspace of itself. The set $U = \{0\}$, consisting of only the zero vector, is also a subspace because $0 + 0 = 0$ and $a0 = 0$ for each a in $\mathbb{R}$; it is called the zero subspace. Any subspace of $\mathbb{R}^n$ other than $\{0\}$ or $\mathbb{R}^n$ is called a proper subspace.
1 We use the language of sets. Informally, a set X is a collection of objects, called the elements of the set. Two sets X and Y are called equal (written $X = Y$) if they have the same elements. If every element of X is in the set Y, we say that X is a subset of Y, and write $X \subseteq Y$. Hence both $X \subseteq Y$ and $Y \subseteq X$ if and only if $X = Y$.
nic22772_ch05.qxd 11/21/2005 6:49 PM Page 197
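We saw in Section 4.2 that every plane M through the origin in $\mathbb{R}^3$ has equation $ax + by + cz = 0$ where a, b, and c are not all zero. Here $\vec{n} = [a\ b\ c]^T$ is a normal to the plane and
$$M = \{\vec{v} \text{ in } \mathbb{R}^3 \mid \vec{n} \cdot \vec{v} = 0\}$$
where $\vec{v} = [x\ y\ z]^T$ and $\vec{n} \cdot \vec{v}$ denotes the dot product introduced in Section 4.2.$^2$ Then M is a subspace of $\mathbb{R}^3$. Indeed we show that M satisfies S1, S2, and S3 as follows:
S1. $\vec{0}$ is in M because $\vec{n} \cdot \vec{0} = 0$;
S2. If $\vec{v}$ and $\vec{v}_1$ are in M, then $\vec{n} \cdot (\vec{v} + \vec{v}_1) = \vec{n} \cdot \vec{v} + \vec{n} \cdot \vec{v}_1 = 0 + 0 = 0$, so $\vec{v} + \vec{v}_1$ is in M;
S3. If $\vec{v}$ is in M, then $\vec{n} \cdot (a\vec{v}) = a(\vec{n} \cdot \vec{v}) = a(0) = 0$, so $a\vec{v}$ is in M.
This proves the first part of
Example 1
Planes and lines through the origin in $\mathbb{R}^3$ are all subspaces of $\mathbb{R}^3$.
Solution We dealt with planes above. If L is a line through the origin with direction vector $\vec{d}$, then $L = \{t\vec{d} \mid t \text{ in } \mathbb{R}\}$. We leave it as an exercise to verify that L satisfies S1, S2, and S3.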
Example 1 shows that lines through the origin in $\mathbb{R}^2$ are subspaces; in fact, they are the only proper subspaces of $\mathbb{R}^2$ (Exercise 24). Indeed, we shall see in Example 11 of Section 5.2 that lines and planes through the origin in $\mathbb{R}^3$ are the only proper subspaces of $\mathbb{R}^3$. Thus the geometry of lines and planes through the origin is captured by the subspace concept. (Note that every line or plane is just a translation of one of these.)
Subspaces can also be used to describe important features of an $m \times n$ matrix A. The null space of A, denoted null A, and the image space of A, denoted im A, are defined by
$$\operatorname{null} A = \{X \text{ in } \mathbb{R}^n \mid AX = 0\} \quad \text{and} \quad \operatorname{im} A = \{AX \mid X \text{ in } \mathbb{R}^n\}.$$
In the language of Chapter 2, null A consists of all solutions X in $\mathbb{R}^n$ of the homogeneous system $AX = 0$, and im A is the set of all vectors Y in $\mathbb{R}^m$ such that $AX = Y$ has a solution for some X.
Note that X is in null A if it satisfies the condition $AX = 0$, while im A consists of vectors of the form AX for some X in $\mathbb{R}^n$. These two ways to describe subsets occur frequently.
Example 2
If A is an $m \times n$ matrix, then:
1. null A is a subspace of $\mathbb{R}^n$.
2. im A is a subspace of $\mathbb{R}^m$.
Solution 1. The zero vector 0 in $\mathbb{R}^n$ lies in null A because $A0 = 0$.$^3$ If X and $X_1$ are in null A, then $X + X_1$ and $aX$ are in null A because they satisfy the required condition:
$$A(X + X_1) = AX + AX_1 = 0 + 0 = 0 \quad \text{and} \quad A(aX) = a(AX) = a0 = 0.$$
Hence null A satisfies S1, S2, and S3, and so is a subspace of $\mathbb{R}^n$.
2. The zero vector 0 in $\mathbb{R}^m$ lies in im A because $0 = A0$. Suppose that Y and $Y_1$ are in im A, say $Y = AX$ and $Y_1 = AX_1$ where X and $X_1$ are in $\mathbb{R}^n$. Then
$$Y + Y_1 = AX + AX_1 = A(X + X_1) \quad \text{and} \quad aY = a(AX) = A(aX)$$
show that $Y + Y_1$ and $aY$ are both in im A (they have the required form). Hence im A is a subspace of $\mathbb{R}^m$.
2 We are using set notation here. In general $\{q \mid p\}$ means the set of all objects q with property p.
3 We are using 0 to represent the zero vector in both $\mathbb{R}^m$ and $\mathbb{R}^n$. This abuse of notation is common, and causes no confusion once everyone knows that it is going on.
4 The vector $\vec{n}$ is nonzero because $\vec{v}$ and $\vec{w}$ are not parallel.
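There are other important subspaces associated with a matrix A that clarify basic properties of A. If A is an $n \times n$ matrix and $\lambda$ is any number, let
$$E_\lambda(A) = \{X \text{ in } \mathbb{R}^n \mid AX = \lambda X\}.$$
A vector X is in $E_\lambda(A)$ if and only if $(\lambda I - A)X = 0$, so Example 2 gives:
Example 3
$E_\lambda(A) = \operatorname{null}(\lambda I - A)$ is a subspace of $\mathbb{R}^n$ for each $n \times n$ matrix A and number $\lambda$.
$E_\lambda(A)$ is called the eigenspace of A corresponding to $\lambda$. The reason for the name is that, in the terminology of Section 3.3, $\lambda$ is an eigenvalue of A if $E_\lambda(A) \neq \{0\}$. In this case the nonzero vectors in $E_\lambda(A)$ are called the eigenvectors of A corresponding to $\lambda$.
The reader should not get the impression that every subset of $\mathbb{R}^n$ is a subspace. For example:
$U_1 = \{[x\ y]^T \mid x \geq 0\}$ satisfies S1 and S2, but not S3;
$U_2 = \{[x\ y]^T \mid x^2 = y^2\}$ satisfies S1 and S3, but not S2.
Hence neither $U_1$ nor $U_2$ is a subspace of $\mathbb{R}^2$. (However, see Exercise 20.)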
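To make the failure of closure concrete, here is one possible pair of counterexamples, assuming the subsets are $U_1 = \{[x\ y]^T \mid x \geq 0\}$ and $U_2 = \{[x\ y]^T \mid x^2 = y^2\}$. The particular vectors are chosen only for illustration; many others would do.

```latex
% U_1 is not closed under scalar multiplication (S3 fails):
\begin{bmatrix} 1 \\ 0 \end{bmatrix} \in U_1
\quad\text{but}\quad
(-1)\begin{bmatrix} 1 \\ 0 \end{bmatrix}
  = \begin{bmatrix} -1 \\ 0 \end{bmatrix} \notin U_1
\quad\text{because } -1 < 0 .

% U_2 is not closed under addition (S2 fails):
\begin{bmatrix} 1 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ -1 \end{bmatrix} \in U_2
\quad\text{but}\quad
\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \end{bmatrix}
  = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \notin U_2
\quad\text{because } 2^2 \neq 0^2 .
```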
Spanning Sets
Let $\vec{v}$ and $\vec{w}$ be two nonzero, nonparallel vectors in $\mathbb{R}^3$ with their tails at the origin. The plane M through the origin containing these vectors is described in Section 4.2 by saying that $\vec{n} = \vec{v} \times \vec{w}$ is a normal for M, and that M consists of all vectors $\vec{p}$ such that $\vec{n} \cdot \vec{p} = 0$.$^4$ While this is a very useful way to look at planes, there is another approach that is at least as useful in $\mathbb{R}^3$ and, more importantly, works for all subspaces of $\mathbb{R}^n$.
The idea is as follows: Observe that, by the diagram, a vector $\vec{p}$ is in M if and only if it has the form
$$\vec{p} = a\vec{v} + b\vec{w}$$
for certain real numbers a and b (we say that $\vec{p}$ is a linear combination of $\vec{v}$ and $\vec{w}$).
[Diagram: the plane M, with $\vec{p}$ drawn as the sum of $a\vec{v}$ and $b\vec{w}$.]


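Hence we can describe M as
$$M = \{a\vec{v} + b\vec{w} \mid a, b \text{ in } \mathbb{R}\}\ {}^5$$
and we say that $\{\vec{v}, \vec{w}\}$ is a spanning set for M. It is this notion of a spanning set that provides a way to describe all subspaces of $\mathbb{R}^n$.
Given vectors $X_1, X_2, \ldots, X_k$ in $\mathbb{R}^n$, a vector of the form
$$t_1X_1 + t_2X_2 + \cdots + t_kX_k \quad \text{where the } t_i \text{ are scalars}$$
is called a linear combination of the $X_i$, and $t_i$ is called the coefficient of $X_i$ in the linear combination. The set of all such linear combinations is called the span of the $X_i$ and is denoted
$$\operatorname{span}\{X_1, X_2, \ldots, X_k\} = \{t_1X_1 + t_2X_2 + \cdots + t_kX_k \mid t_i \text{ in } \mathbb{R}\}.$$
Thus $\operatorname{span}\{X, Y\} = \{sX + tY \mid s, t \text{ in } \mathbb{R}\}$, and $\operatorname{span}\{X\} = \{tX \mid t \text{ in } \mathbb{R}\}$.
In particular, the above discussion shows that, if $\vec{v}$ and $\vec{w}$ are two nonzero, nonparallel vectors in $\mathbb{R}^3$, then
$$M = \operatorname{span}\{\vec{v}, \vec{w}\}$$
is the plane in $\mathbb{R}^3$ containing $\vec{v}$ and $\vec{w}$. Moreover, if $\vec{d}$ is any nonzero vector in $\mathbb{R}^3$ (or $\mathbb{R}^2$), then
$$L = \operatorname{span}\{\vec{d}\} = \{t\vec{d} \mid t \text{ in } \mathbb{R}\}$$
is the line with direction vector $\vec{d}$. Hence lines and planes can both be described in terms of spanning sets.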
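Example 4
Let $X = [2\ {-1}\ 2\ 1]^T$ and $Y = [3\ 4\ {-1}\ 1]^T$ in $\mathbb{R}^4$. Determine whether $P = [0\ {-11}\ 8\ 1]^T$ or $Q = [2\ 3\ 1\ 2]^T$ is in $U = \operatorname{span}\{X, Y\}$.
Solution The vector P is in U if and only if $P = sX + tY$ for scalars s and t. Equating components gives equations
$$2s + 3t = 0, \quad -s + 4t = -11, \quad 2s - t = 8, \quad \text{and} \quad s + t = 1.$$
This linear system has solution $s = 3$ and $t = -2$, so P is in U. On the other hand, $Q = sX + tY$ leads to equations
$$2s + 3t = 2, \quad -s + 4t = 3, \quad 2s - t = 1, \quad \text{and} \quad s + t = 2$$
and this system has no solution. So Q does not lie in U.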
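The membership test in Example 4 amounts to row-reducing the augmented matrix $[X\ Y \mid P]$. The following is a minimal sketch of that idea in Python, using exact fractions; the function names `rref` and `coefficients_in_span` are hypothetical helpers introduced here for illustration, and the vectors are those of Example 4 with the signs read as $X = [2\ {-1}\ 2\ 1]^T$, $Y = [3\ 4\ {-1}\ 1]^T$.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot columns), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def coefficients_in_span(vectors, target):
    """Return coefficients t with sum(t[j] * vectors[j]) == target, or None if target is not in the span."""
    k, n = len(vectors), len(target)
    # augmented matrix: one row per component, last column is the target
    aug = [[vectors[j][i] for j in range(k)] + [target[i]] for i in range(n)]
    red, pivots = rref(aug)
    if k in pivots:          # a pivot in the last column means the system is inconsistent
        return None
    coeffs = [Fraction(0)] * k   # free variables (if any) are set to 0
    for row, c in zip(red, pivots):
        coeffs[c] = row[k]
    return coeffs

X = [2, -1, 2, 1]
Y = [3, 4, -1, 1]
P = [0, -11, 8, 1]
Q = [2, 3, 1, 2]
print(coefficients_in_span([X, Y], P))   # coefficients s, t for P
print(coefficients_in_span([X, Y], Q))   # None: Q is not in span{X, Y}
```

Exact `Fraction` arithmetic is used rather than floating point so that the test for a zero entry is reliable.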
Theorem 1
Let $U = \operatorname{span}\{X_1, X_2, \ldots, X_k\}$ in $\mathbb{R}^n$. Then:
1. U is a subspace of $\mathbb{R}^n$ containing each $X_i$.
2. If W is a subspace of $\mathbb{R}^n$ and each $X_i$ is in W, then $U \subseteq W$.
5 In particular, this implies that any vector $\vec{p}$ orthogonal to $\vec{v} \times \vec{w}$ must be a linear combination $\vec{p} = a\vec{v} + b\vec{w}$ of $\vec{v}$ and $\vec{w}$ for some a and b. Can you prove this directly?
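PROOF
Write $U = \operatorname{span}\{X_1, X_2, \ldots, X_k\}$ for convenience.
1. The zero vector 0 is in U because $0 = 0X_1 + 0X_2 + \cdots + 0X_k$ is a linear combination of the $X_i$. If $X = t_1X_1 + t_2X_2 + \cdots + t_kX_k$ and $Y = s_1X_1 + s_2X_2 + \cdots + s_kX_k$ are in U, then X + Y and aX are in U because
$$X + Y = (s_1 + t_1)X_1 + (s_2 + t_2)X_2 + \cdots + (s_k + t_k)X_k, \quad \text{and}$$
$$aX = (at_1)X_1 + (at_2)X_2 + \cdots + (at_k)X_k.$$
Hence S1, S2, and S3 are satisfied for U, proving (1).
2. Let $X = t_1X_1 + t_2X_2 + \cdots + t_kX_k$ where the $t_i$ are scalars and each $X_i$ is in W. Then each $t_iX_i$ is in W because W satisfies S3. But then X is in W because W satisfies S2 (verify). This proves (2).
If $U = \operatorname{span}\{X_1, X_2, \ldots, X_k\}$ we say that the vectors $X_1, X_2, \ldots, X_k$ span the subspace U.
Condition (2) in Theorem 1 can be expressed by saying that $\operatorname{span}\{X_1, X_2, \ldots, X_k\}$ is the smallest subspace of $\mathbb{R}^n$ that contains each $X_i$. Here is an example of how it is used.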
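Example 5
If X and Y are in $\mathbb{R}^n$, show that $\operatorname{span}\{X, Y\} = \operatorname{span}\{X + Y, X - Y\}$.
Solution Since both $X + Y$ and $X - Y$ are in $\operatorname{span}\{X, Y\}$, Theorem 1 gives
$$\operatorname{span}\{X + Y, X - Y\} \subseteq \operatorname{span}\{X, Y\}.$$
But $X = \tfrac{1}{2}(X + Y) + \tfrac{1}{2}(X - Y)$ and $Y = \tfrac{1}{2}(X + Y) - \tfrac{1}{2}(X - Y)$ are both in $\operatorname{span}\{X + Y, X - Y\}$, so
$$\operatorname{span}\{X, Y\} \subseteq \operatorname{span}\{X + Y, X - Y\}$$
again by Theorem 1. Thus $\operatorname{span}\{X, Y\} = \operatorname{span}\{X + Y, X - Y\}$.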
It turns out that many important subspaces are best described by giving a spanning set. Here are three examples, beginning with an important spanning set for $\mathbb{R}^n$ itself. Column j of the $n \times n$ identity matrix $I_n$ is denoted $E_j$ and called the jth coordinate vector in $\mathbb{R}^n$, and the set $\{E_1, E_2, \ldots, E_n\}$ is called the standard basis of $\mathbb{R}^n$. If $X = [x_1\ x_2\ \cdots\ x_n]^T$ is any vector in $\mathbb{R}^n$, then
$$X = x_1E_1 + x_2E_2 + \cdots + x_nE_n$$
as the reader can verify. This proves:
Example 6
$\mathbb{R}^n = \operatorname{span}\{E_1, E_2, \ldots, E_n\}$.
If A is an $m \times n$ matrix, the next two examples show that it is a routine matter to find spanning sets for null A and im A.
Example 7
Given an $m \times n$ matrix A, let $X_1, X_2, \ldots, X_k$ denote the basic solutions to the system $AX = 0$ given by the gaussian algorithm. Then
$$\operatorname{null} A = \operatorname{span}\{X_1, X_2, \ldots, X_k\}.$$
Solution If X is in null A, then $AX = 0$ so Theorem 3 of Section 2.2 shows that X is a linear combination of the basic solutions; that is, $\operatorname{null} A \subseteq \operatorname{span}\{X_1, X_2, \ldots, X_k\}$. On the other hand, if X is in $\operatorname{span}\{X_1, X_2, \ldots, X_k\}$, then $X = t_1X_1 + t_2X_2 + \cdots + t_kX_k$ for scalars $t_i$, so
$$AX = t_1AX_1 + t_2AX_2 + \cdots + t_kAX_k = t_1 0 + t_2 0 + \cdots + t_k 0 = 0.$$
This shows that X is in null A, and hence that $\operatorname{span}\{X_1, X_2, \ldots, X_k\} \subseteq \operatorname{null} A$. Thus we have equality.
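As a rough illustration of Example 7, the sketch below computes basic solutions of $AX = 0$ by row reduction: each free variable in turn is set to 1 (the others to 0) and the leading variables are read off the reduced matrix. The helper names `rref` and `basic_solutions` and the sample matrix are assumptions made for this illustration; they are not taken from the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, list of pivot columns), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def basic_solutions(A):
    """Return a list of vectors spanning null A, one per free variable."""
    red, pivots = rref(A)
    n = len(A[0])
    free = [c for c in range(n) if c not in pivots]
    solutions = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                 # this free variable is the parameter
        for row, p in zip(red, pivots):
            x[p] = -row[f]                 # leading variable determined by the parameter
        solutions.append(x)
    return solutions

def mat_vec(A, x):
    """Matrix-vector product AX."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 2, 1],
     [2, 4, 0]]          # hypothetical 2 x 3 example
null_basis = basic_solutions(A)
print(null_basis)
print([mat_vec(A, x) for x in null_basis])   # each product should be the zero vector
```

Every vector returned satisfies $AX = 0$, and by Example 7 their span is all of null A.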
Example 8
Let $C_1, C_2, \ldots, C_n$ denote the columns of the $m \times n$ matrix A. Then
$$\operatorname{im} A = \operatorname{span}\{C_1, C_2, \ldots, C_n\}.$$
Solution Observe first that $AE_j = C_j$ for each j where $E_j$ is the jth coordinate vector in $\mathbb{R}^n$. Hence each $C_j$ is in im A, and so $\operatorname{span}\{C_1, C_2, \ldots, C_n\} \subseteq \operatorname{im} A$ by Theorem 1. Conversely, let Y be a vector in im A, say $Y = AX$ for some X in $\mathbb{R}^n$. If $X = [x_1\ x_2\ \cdots\ x_n]^T$, then Theorem 4 of Section 2.2 gives
$$Y = AX = [C_1\ C_2\ \cdots\ C_n]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1C_1 + x_2C_2 + \cdots + x_nC_n$$
so Y is in $\operatorname{span}\{C_1, C_2, \ldots, C_n\}$. Hence $\operatorname{im} A \subseteq \operatorname{span}\{C_1, C_2, \ldots, C_n\}$, and the result is proved.
Exercises 5.1
1. In each case determine whether U is a subspace of $\mathbb{R}^3$. Support your answer.
(a) $U = \{[1\ s\ t]^T \mid s \text{ and } t \text{ in } \mathbb{R}\}$.
(b) $U = \{[0\ s\ t]^T \mid s \text{ and } t \text{ in } \mathbb{R}\}$.
(c) $U = \{[r\ s\ t]^T \mid r, s, \text{ and } t \text{ in } \mathbb{R},\ r + 3s + 2t = 0\}$.
(d) $U = \{[r\ \ 3s\ \ r - 2]^T \mid r \text{ and } s \text{ in } \mathbb{R}\}$.
(e) $U = \{[r\ 0\ s]^T \mid r^2 + s^2 = 0,\ r \text{ and } s \text{ in } \mathbb{R}\}$.
(f) $U = \{[2r\ \ s^2\ \ t]^T \mid r, s, \text{ and } t \text{ in } \mathbb{R}\}$.
2. In each case determine if X lies in $U = \operatorname{span}\{Y, Z\}$. If X is in U, write it as a linear combination of Y and Z; if X is not in U, show why not.
(a) $X = [2\ 1\ 0\ 1]^T$, $Y = [1\ 0\ 0\ 1]^T$, and $Z = [0\ 1\ 0\ 1]^T$.
(b) $X = [1\ 2\ 15\ 11]^T$, $Y = [2\ 1\ 0\ 2]^T$, and $Z = [1\ 1\ 3\ 1]^T$.
(c) $X = [8\ 3\ 13\ 20]^T$, $Y = [2\ 1\ 3\ 5]^T$, and $Z = [1\ 0\ 2\ 3]^T$.
(d) $X = [2\ 5\ 8\ 3]^T$, $Y = [2\ 1\ 0\ 5]^T$, and $Z = [1\ 2\ 2\ 3]^T$.
3. In each case determine if the given vectors span $\mathbb{R}^4$. Support your answer.
(a) $\{[1\ 1\ 1\ 1]^T, [0\ 1\ 1\ 1]^T, [0\ 0\ 1\ 1]^T, [0\ 0\ 0\ 1]^T\}$.
(b) $\{[1\ 3\ 5\ 0]^T, [2\ 1\ 0\ 0]^T, [0\ 2\ 1\ 1]^T, [1\ 4\ 5\ 0]^T\}$.
4. Is it possible that $\{[1\ 2\ 0]^T, [2\ 0\ 3]^T\}$ can span the subspace $U = \{[r\ s\ 0]^T \mid r \text{ and } s \text{ in } \mathbb{R}\}$? Defend your answer.
5. Give a spanning set for the zero subspace $\{0\}$ of $\mathbb{R}^n$.
6. Is $\mathbb{R}^2$ a subspace of $\mathbb{R}^3$? Defend your answer.
7. If $U = \operatorname{span}\{X, Y, Z\}$ in $\mathbb{R}^n$, show that $U = \operatorname{span}\{X + tZ, Y, Z\}$ for every t in $\mathbb{R}$.
8. If $U = \operatorname{span}\{X, Y, Z\}$ in $\mathbb{R}^n$, show that $U = \operatorname{span}\{X + Y, Y + Z, Z + X\}$.
9. If $a \neq 0$ is a scalar, show that $\operatorname{span}\{aX\} = \operatorname{span}\{X\}$ for every vector X in $\mathbb{R}^n$.
10. If $a_1, a_2, \ldots, a_k$ are nonzero scalars, show that $\operatorname{span}\{a_1X_1, a_2X_2, \ldots, a_kX_k\} = \operatorname{span}\{X_1, X_2, \ldots, X_k\}$ for any vectors $X_i$ in $\mathbb{R}^n$.
11. If $X \neq 0$ in $\mathbb{R}^n$, determine all subspaces of $\operatorname{span}\{X\}$.
12. Suppose that $U = \operatorname{span}\{X_1, X_2, \ldots, X_k\}$ where each $X_i$ is in $\mathbb{R}^n$. If A is an $m \times n$ matrix and $AX_i = 0$ for each i, show that $AY = 0$ for every vector Y in U.
13. If A is an $m \times n$ matrix, show that, for each invertible $m \times m$ matrix U, $\operatorname{null}(A) = \operatorname{null}(UA)$.
14. If A is an $m \times n$ matrix, show that, for each invertible $n \times n$ matrix V, $\operatorname{im}(A) = \operatorname{im}(AV)$.
15. Let U be a subspace of $\mathbb{R}^n$, and let X be a vector in $\mathbb{R}^n$.
(a) If aX is in U where $a \neq 0$ is a number, show that X is in U.
(b) If Y and X + Y are in U where Y is a vector in $\mathbb{R}^n$, show that X is in U.
16. In each case either show that the statement is true or give an example showing that it is false.
(a) If $U \neq \mathbb{R}^n$ is a subspace of $\mathbb{R}^n$ and X + Y is in U, then X and Y are both in U.
(b) If U is a subspace of $\mathbb{R}^n$ and rX is in U for all r in $\mathbb{R}$, then X is in U.
(c) If U is a subspace of $\mathbb{R}^n$ and X is in U, then $-X$ is also in U.
(d) If X is in U and $U = \operatorname{span}\{Y, Z\}$, then $U = \operatorname{span}\{X, Y, Z\}$.
(e) The empty set of vectors in $\mathbb{R}^n$ is a subspace of $\mathbb{R}^n$.
17. (a) If A and B are $m \times n$ matrices, show that $U = \{X \text{ in } \mathbb{R}^n \mid AX = BX\}$ is a subspace of $\mathbb{R}^n$.
(b) What if A is $m \times n$, B is $k \times n$, and $m \neq k$?
18. Suppose that $X_1, X_2, \ldots, X_k$ are vectors in $\mathbb{R}^n$. If $Y = a_1X_1 + a_2X_2 + \cdots + a_kX_k$ where $a_1 \neq 0$, show that $\operatorname{span}\{X_1, X_2, \ldots, X_k\} = \operatorname{span}\{Y, X_2, \ldots, X_k\}$.
19. If $U \neq \{0\}$ is a subspace of $\mathbb{R}$, show that $U = \mathbb{R}$.
20. Let U be a nonempty subset of $\mathbb{R}^n$. Show that U is a subspace if and only if S2 and S3 hold.
21. If S and T are nonempty sets of vectors in $\mathbb{R}^n$, and if $S \subseteq T$, show that $\operatorname{span}\{S\} \subseteq \operatorname{span}\{T\}$.
22. Let U and W be subspaces of $\mathbb{R}^n$. Define their intersection $U \cap W$ and their sum $U + W$ as follows:
$U \cap W = \{X \text{ in } \mathbb{R}^n \mid X \text{ belongs to both } U \text{ and } W\}$.
$U + W = \{X \text{ in } \mathbb{R}^n \mid X \text{ is a sum of a vector in } U \text{ and a vector in } W\}$.
(a) Show that $U \cap W$ is a subspace of $\mathbb{R}^n$.
(b) Show that $U + W$ is a subspace of $\mathbb{R}^n$.
23. Let P denote an invertible $n \times n$ matrix. If $\lambda$ is a number, show that
$$E_\lambda(PAP^{-1}) = \{PX \mid X \text{ is in } E_\lambda(A)\}$$
for each $n \times n$ matrix A. [Here $E_\lambda(A)$ is the set of eigenvectors of A.]
24. Show that every proper subspace U of $\mathbb{R}^2$ is a line through the origin. [Hint: If $\vec{d}$ is a nonzero vector in U, let $L = \mathbb{R}\vec{d} = \{r\vec{d} \mid r \text{ in } \mathbb{R}\}$ denote the line with direction vector $\vec{d}$. If $\vec{u}$ is in U but not in L, argue geometrically that every vector in $\mathbb{R}^2$ is a linear combination of $\vec{u}$ and $\vec{d}$.]
SECTION 5.2 Independence and Dimension
Some spanning sets are better than others. If $U = \operatorname{span}\{X_1, X_2, \ldots, X_k\}$ is a subspace of $\mathbb{R}^n$, then every vector in U can be written as a linear combination of the $X_i$ in at least one way. Our interest here is in spanning sets where each vector in U has exactly one representation as a linear combination of these vectors.
Linear Independence
Suppose that two linear combinations are equal in $\mathbb{R}^n$:
$$r_1X_1 + r_2X_2 + \cdots + r_kX_k = s_1X_1 + s_2X_2 + \cdots + s_kX_k.$$
We are looking for a condition on the set $\{X_1, X_2, \ldots, X_k\}$ of vectors that guarantees that this representation is unique; that is, $r_i = s_i$ for each i. Taking all terms to the left side gives
$$(r_1 - s_1)X_1 + (r_2 - s_2)X_2 + \cdots + (r_k - s_k)X_k = 0,$$
so the required condition is that this equation forces all the coefficients $r_i - s_i$ to be zero.
With this in mind, we call a set $\{X_1, X_2, \ldots, X_k\}$ of vectors linearly independent (or simply independent) if it satisfies the following condition:
$$\text{If } t_1X_1 + t_2X_2 + \cdots + t_kX_k = 0 \text{ then } t_1 = t_2 = \cdots = t_k = 0.$$
We record the result of the above discussion for reference.
Theorem 1
If $\{X_1, X_2, \ldots, X_k\}$ is an independent set of vectors in $\mathbb{R}^n$, then every vector in $\operatorname{span}\{X_1, X_2, \ldots, X_k\}$ has a unique representation as a linear combination of the $X_i$.
It is useful to state the definition of independence in different language. Let us say that a linear combination vanishes if it equals the zero vector, and call a linear combination trivial if every coefficient is zero. Then the definition of independence can be compactly stated as follows:
A set of vectors is independent if and only if the only linear combination that vanishes is the trivial one.
Hence the procedure for checking that a set of vectors is independent is:
Independence Test
To verify that a set $\{X_1, X_2, \ldots, X_k\}$ of vectors in $\mathbb{R}^n$ is independent, proceed as follows:
1. Set a linear combination equal to zero: $t_1X_1 + t_2X_2 + \cdots + t_kX_k = 0$.
2. Show that $t_i = 0$ for each i (that is, the linear combination is trivial).
Of course, if some nontrivial linear combination vanishes, the vectors are not independent.
Example 1
Determine whether $\{[1\ 0\ {-2}\ 5]^T, [2\ 1\ 0\ {-1}]^T, [1\ 1\ 2\ 1]^T\}$ is independent in $\mathbb{R}^4$.
Solution Suppose a linear combination vanishes:
$$r[1\ 0\ {-2}\ 5]^T + s[2\ 1\ 0\ {-1}]^T + t[1\ 1\ 2\ 1]^T = [0\ 0\ 0\ 0]^T.$$
Equating corresponding entries gives a system of four equations:
$$r + 2s + t = 0, \quad s + t = 0, \quad -2r + 2t = 0, \quad \text{and} \quad 5r - s + t = 0.$$
The only solution is the trivial one $r = s = t = 0$ (verify), so these vectors are independent by the independence test.
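The independence test can be mechanized: place the vectors as the columns of a matrix and row-reduce; the homogeneous system has only the trivial solution exactly when every column contains a pivot. The sketch below is one way to do this with exact fractions. The names `pivot_columns` and `is_independent` are hypothetical, and the first test set uses the vectors of Example 1 with the signs read as $[1\ 0\ {-2}\ 5]^T$, $[2\ 1\ 0\ {-1}]^T$, $[1\ 1\ 2\ 1]^T$.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Return the pivot column indices found by forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                       # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots

def is_independent(vectors):
    """True if the only vanishing linear combination of the vectors is the trivial one."""
    k, n = len(vectors), len(vectors[0])
    # matrix whose columns are the given vectors
    A = [[vectors[j][i] for j in range(k)] for i in range(n)]
    return len(pivot_columns(A)) == k

example_1 = [[1, 0, -2, 5], [2, 1, 0, -1], [1, 1, 2, 1]]
print(is_independent(example_1))            # the set from Example 1
print(is_independent([[1, 2], [2, 4]]))     # second vector is twice the first
```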
Example 2
Show that the standard basis $\{E_1, E_2, \ldots, E_n\}$ of $\mathbb{R}^n$ is independent.
Solution We have $t_1E_1 + t_2E_2 + \cdots + t_nE_n = [t_1\ t_2\ \cdots\ t_n]^T$ for all scalars $t_i$, so the linear combination vanishes if and only if each $t_i = 0$. Hence the independence test applies.
Example 3
If $\{X, Y\}$ is independent, show that $\{2X + 3Y, X - 5Y\}$ is also independent.
Solution If $s(2X + 3Y) + t(X - 5Y) = 0$, collect terms to get $(2s + t)X + (3s - 5t)Y = 0$. Since $\{X, Y\}$ is independent this combination must be trivial; that is, $2s + t = 0$ and $3s - 5t = 0$. These equations have only the trivial solution $s = t = 0$, as required.
Example 4
Show that the zero vector in $\mathbb{R}^n$ does not belong to any independent set.
Solution Given a set $\{0, X_1, X_2, \ldots, X_k\}$ of vectors containing 0, we have a vanishing, nontrivial linear combination $1 \cdot 0 + 0X_1 + 0X_2 + \cdots + 0X_k = 0$. Hence the set is not independent.
Example 5
Given X in $\mathbb{R}^n$, show that $\{X\}$ is independent if and only if $X \neq 0$.
Solution A vanishing linear combination from $\{X\}$ takes the form $tX = 0$, t in $\mathbb{R}$. If $X \neq 0$, this implies that $t = 0$, so $\{X\}$ is independent. Conversely, if $X = 0$ then $\{X\}$ is not independent by Example 4.
A set of vectors in $\mathbb{R}^n$ is called linearly dependent (or simply dependent) if it is not linearly independent, equivalently if some nontrivial linear combination vanishes.
Example 6
If $\vec{v}$ and $\vec{w}$ are nonzero vectors in $\mathbb{R}^3$, show that $\{\vec{v}, \vec{w}\}$ is dependent if and only if $\vec{v}$ and $\vec{w}$ are parallel.
Solution If $\vec{v}$ and $\vec{w}$ are parallel, then one is a scalar multiple of the other (Theorem 4 of Section 4.1), say $\vec{v} = a\vec{w}$ for some scalar a. Then the nontrivial linear combination $\vec{v} - a\vec{w} = \vec{0}$ vanishes, so $\{\vec{v}, \vec{w}\}$ is dependent. Conversely, if $\{\vec{v}, \vec{w}\}$ is dependent, let $s\vec{v} + t\vec{w} = \vec{0}$ be nontrivial, say $s \neq 0$. Then $\vec{v} = -\tfrac{t}{s}\vec{w}$, so $\vec{v}$ and $\vec{w}$ are parallel (by Theorem 4 of Section 4.1). A similar argument works if $t \neq 0$.
By Theorem 5 of Section 2.3, the following conditions are equivalent for an $n \times n$ matrix A:
1. A is invertible.
2. If $AX = 0$ where X is in $\mathbb{R}^n$, then $X = 0$.
3. $AX = B$ has a solution X for every vector B in $\mathbb{R}^n$.
While (1) makes no sense if A is not square, conditions (2) and (3) are meaningful for any matrix A and, in fact, are related to independence and spanning. To see how, let $C_1, C_2, \ldots, C_n$ denote the columns of an $m \times n$ matrix A. If $X = [x_1\ x_2\ \cdots\ x_n]^T$ is a column in $\mathbb{R}^n$, then
$$AX = x_1C_1 + x_2C_2 + \cdots + x_nC_n \qquad (*)$$
by Theorem 4 of Section 2.2. With this, we get the following theorem:
Theorem 2
If A is an $m \times n$ matrix, let $\{C_1, C_2, \ldots, C_n\}$ denote the columns of A.
1. $\{C_1, C_2, \ldots, C_n\}$ is independent in $\mathbb{R}^m$ if and only if $AX = 0$, X in $\mathbb{R}^n$, implies $X = 0$.
2. $\mathbb{R}^m = \operatorname{span}\{C_1, C_2, \ldots, C_n\}$ if and only if $AX = B$ has a solution X for every vector B in $\mathbb{R}^m$.
PROOF
Write $X = [x_1\ x_2\ \cdots\ x_n]^T$. Then $AX = 0$ means $x_1C_1 + x_2C_2 + \cdots + x_nC_n = 0$ by $(*)$, so (1) follows from the definition of independence. Similarly, $(*)$ shows that a vector B in $\mathbb{R}^m$ satisfies $AX = B$ for some X if and only if B is a linear combination of the columns $C_j$, so (2) follows from the definition of a spanning set.
For a square matrix A, Theorem 2 characterizes the invertibility of A in terms of the spanning and independence of its columns (see the discussion preceding Theorem 2). It is important to be able to discuss these notions for rows. If $X_1, X_2, \ldots, X_k$ are $1 \times n$ rows, we define $\operatorname{span}\{X_1, X_2, \ldots, X_k\}$ to be the set of all linear combinations of the $X_i$ (as matrices), and we say that $\{X_1, X_2, \ldots, X_k\}$ is linearly independent if the only vanishing linear combination is the trivial one (that is, if $\{X_1^T, X_2^T, \ldots, X_k^T\}$ is independent in $\mathbb{R}^n$, as the reader can verify).$^6$
Theorem 3
The following are equivalent for an $n \times n$ matrix A:
1. A is invertible.
2. The columns of A are linearly independent.
3. The columns of A span $\mathbb{R}^n$.
4. The rows of A are linearly independent.
5. The rows of A span the set of all $1 \times n$ rows.
PROOF
Let $C_1, C_2, \ldots, C_n$ denote the columns of A.
(1) $\Leftrightarrow$ (2). By Theorem 5 of Section 2.3, A is invertible if and only if $AX = 0$ implies $X = 0$; this holds if and only if $\{C_1, C_2, \ldots, C_n\}$ is independent by Theorem 2.
(1) $\Leftrightarrow$ (3). Again by Theorem 5 of Section 2.3, A is invertible if and only if $AX = B$ has a solution for every column B in $\mathbb{R}^n$; this holds if and only if $\operatorname{span}\{C_1, C_2, \ldots, C_n\} = \mathbb{R}^n$ by Theorem 2.
(1) $\Leftrightarrow$ (4). The matrix A is invertible if and only if $A^T$ is invertible (by the Corollary to Theorem 4 of Section 2.3); this in turn holds if and only if $A^T$ has independent columns (by (1) $\Leftrightarrow$ (2)); finally, this last statement holds if and only if A has independent rows (because the rows of A are the transposes of the columns of $A^T$).
(1) $\Leftrightarrow$ (5). The proof is similar to (1) $\Leftrightarrow$ (4).
6 It is best to view columns and rows as just two different notations for ordered n-tuples. This discussion will become redundant in Chapter 6 where we define the general notion of a vector space.
7 The plural of basis is bases.
8 We will show in Theorem 6 that every subspace of $\mathbb{R}^n$ does indeed have a basis.
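Theorem 3 says that, for a square matrix, independence of the columns and independence of the rows stand or fall together. The short sketch below checks both conditions on two small matrices by counting pivots in A and in its transpose. The function names and the sample matrices are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def pivot_count(rows):
    """Number of pivots found by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def theorem3_checks(A):
    """Return (columns independent, rows independent) for a square matrix A."""
    n = len(A)
    At = [list(col) for col in zip(*A)]        # transpose: rows of A become columns
    cols_independent = pivot_count(A) == n     # AX = 0 forces X = 0
    rows_independent = pivot_count(At) == n    # same test applied to the transpose
    return cols_independent, rows_independent

print(theorem3_checks([[2, 1], [1, 1]]))   # an invertible matrix
print(theorem3_checks([[1, 2], [2, 4]]))   # a singular matrix
```

In both cases the two answers agree, as the theorem predicts.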
Dimension
It is common geometrical language to say that $\mathbb{R}^3$ is 3-dimensional, that planes are 2-dimensional and that lines are 1-dimensional. The next theorem is a basic tool for clarifying this idea of dimension. Its importance is difficult to exaggerate.
Theorem 4 Fundamental Theorem
Let U be a subspace of $\mathbb{R}^n$. If U is spanned by m vectors, and if U contains k linearly independent vectors, then $k \leq m$.
We give a proof at the end of this section.
The main use of the fundamental theorem depends on the following concept. If U is a subspace of $\mathbb{R}^n$, a set $\{X_1, X_2, \ldots, X_m\}$ of vectors in U is called a basis of U if it satisfies the following two conditions:
1. $\{X_1, X_2, \ldots, X_m\}$ is linearly independent.
2. $U = \operatorname{span}\{X_1, X_2, \ldots, X_m\}$.
The most remarkable result about bases$^7$ is:
Theorem 5 Invariance Theorem
If $\{X_1, X_2, \ldots, X_m\}$ and $\{Y_1, Y_2, \ldots, Y_k\}$ are bases of a subspace U of $\mathbb{R}^n$, then $m = k$.
PROOF
We have $k \leq m$ by the fundamental theorem because $\{X_1, X_2, \ldots, X_m\}$ spans U, and $\{Y_1, Y_2, \ldots, Y_k\}$ is independent. Similarly $m \leq k$, so $m = k$.
The invariance theorem guarantees that there is no ambiguity in the following definition: If U is a subspace of $\mathbb{R}^n$ and $\{X_1, X_2, \ldots, X_m\}$ is any basis of U, the number m of vectors in the basis is called the dimension of U, and is denoted
$$\dim U = m.$$
The importance of the invariance theorem is that the dimension of U can be determined by counting the number of vectors in any basis.$^8$ This is very useful as we shall see.
Let $\{E_1, E_2, \ldots, E_n\}$ denote the standard basis of $\mathbb{R}^n$, that is the set of columns of the identity matrix. Then $\mathbb{R}^n = \operatorname{span}\{E_1, E_2, \ldots, E_n\}$ by Example 6 of Section 5.1, and $\{E_1, E_2, \ldots, E_n\}$ is independent by Example 2. Hence it is indeed a basis of $\mathbb{R}^n$ in the present terminology, and we have
Example 7
$\dim(\mathbb{R}^n) = n$ and $\{E_1, E_2, \ldots, E_n\}$ is a basis.
This agrees with our sense that $\mathbb{R}^2$ is two-dimensional and $\mathbb{R}^3$ is three-dimensional. It also says that $\mathbb{R}^1 = \mathbb{R}$ is one-dimensional, and $\{1\}$ is a basis. Returning to subspaces of $\mathbb{R}^n$, we define
$$\dim\{0\} = 0.$$
This amounts to saying $\{0\}$ has a basis containing no vectors. This makes sense because 0 cannot belong to any independent set (Example 4).
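Example 8
Let $U = \{[r\ s\ t\ s]^T \mid r, s, \text{ and } t \text{ in } \mathbb{R}\}$. Show that U is a subspace of $\mathbb{R}^4$, find a basis of U, and calculate dim U.
Solution Clearly, $[r\ s\ t\ s]^T = rX_1 + sX_2 + tX_3$ where $X_1 = [1\ 0\ 0\ 0]^T$, $X_2 = [0\ 1\ 0\ 1]^T$, and $X_3 = [0\ 0\ 1\ 0]^T$. It follows that $U = \operatorname{span}\{X_1, X_2, X_3\}$, and hence that U is a subspace of $\mathbb{R}^4$. Moreover, if a linear combination $rX_1 + sX_2 + tX_3 = [r\ s\ t\ s]^T$ vanishes, it is clear that $r = s = t = 0$, so $\{X_1, X_2, X_3\}$ is independent. Hence $\{X_1, X_2, X_3\}$ is a basis of U and so $\dim U = 3$.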
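When a subspace is given by a spanning set that may contain redundant vectors, a basis can be extracted by keeping the vectors that land in pivot columns after elimination; the number kept is the dimension. The sketch below applies this to the spanning set of Example 8 with one deliberately redundant vector added. The names `pivot_columns` and `basis_of_span` are hypothetical, and the extra vector `X4` is an assumption made for the illustration.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Return the pivot column indices found by forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots

def basis_of_span(vectors):
    """Cut a spanning set down to a basis by keeping the vectors in pivot columns."""
    k, n = len(vectors), len(vectors[0])
    A = [[vectors[j][i] for j in range(k)] for i in range(n)]   # vectors as columns
    return [vectors[j] for j in pivot_columns(A)]

X1 = [1, 0, 0, 0]
X2 = [0, 1, 0, 1]
X3 = [0, 0, 1, 0]
X4 = [1, 1, 0, 1]          # X1 + X2, added only to show a redundant vector being dropped
basis = basis_of_span([X1, X2, X3, X4])
print(basis, "dimension:", len(basis))
```

A vector in a non-pivot column is a linear combination of the pivot-column vectors before it, which is why dropping it does not change the span.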
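Example 9
Let $B = \{X_1, X_2, \ldots, X_n\}$ be a basis of $\mathbb{R}^n$. If A is an invertible $n \times n$ matrix, then $D = \{AX_1, AX_2, \ldots, AX_n\}$ is also a basis of $\mathbb{R}^n$.
Solution Let X be a vector in $\mathbb{R}^n$. Then $A^{-1}X$ is in $\mathbb{R}^n$ so, since B is a basis, we have $A^{-1}X = t_1X_1 + t_2X_2 + \cdots + t_nX_n$ for $t_i$ in $\mathbb{R}$. Left multiplication by A gives
$$X = t_1(AX_1) + t_2(AX_2) + \cdots + t_n(AX_n),$$
and it follows that D spans $\mathbb{R}^n$. To show independence, let $s_1(AX_1) + s_2(AX_2) + \cdots + s_n(AX_n) = 0$, where the $s_i$ are in $\mathbb{R}$. Then $A(s_1X_1 + s_2X_2 + \cdots + s_nX_n) = 0$, so left multiplication by $A^{-1}$ gives $s_1X_1 + s_2X_2 + \cdots + s_nX_n = 0$. Now the independence of B shows that each $s_i = 0$, and so proves the independence of D. Hence D is a basis of $\mathbb{R}^n$.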
While we have found bases in many subspaces of $\mathbb{R}^n$, we have not yet shown that every subspace has a basis. This is part of the next theorem, the proof of which is deferred to Section 6.4 where it will be proved in more generality.
Theorem 6
Let $U \neq \{0\}$ be a subspace of $\mathbb{R}^n$. Then:
1. U has a basis and $\dim U \leq n$.
2. Any independent set in U can be enlarged (by adding vectors) to a basis of U.
3. If B spans U, then B can be cut down (by deleting vectors) to a basis of U.
Theorem 6 has a number of useful consequences. Here is the first.
Theorem 7
Let U be a subspace of $\mathbb{R}^n$, and let $B = \{X_1, X_2, \ldots, X_m\}$ be a set of m vectors in U where $m = \dim U$. Then
B is independent if and only if B spans U.
PROOF
Suppose B is independent. If B does not span U then, by Theorem 6, B can be enlarged to a basis of U containing more than m vectors. This contradicts the invariance theorem because $\dim U = m$, so B spans U. Conversely, if B spans U but is not independent, then B can be cut down to a basis of U containing fewer than m vectors, again a contradiction. So B is independent, as required.
Theorem 7 is a labour-saving result. It asserts that, given a subspace U of dimension m and a set B of exactly m vectors in U, to prove that B is a basis of U it suffices to show either that B spans U or that B is independent. It is not necessary to verify both properties.
Example 10
Find a basis of $\mathbb{R}^4$ containing $B = \{X_1, X_2, X_3\}$ where $X_1 = [1\ 2\ 1\ 0]^T$, $X_2 = [0\ 1\ 1\ 3]^T$, and $X_3 = [1\ 1\ 1\ 1]^T$.
Solution If $E_1 = [1\ 0\ 0\ 0]^T$, then it is routine to verify that $\{E_1, X_1, X_2, X_3\}$ is linearly independent. Since $\mathbb{R}^4$ has dimension 4 it follows by Theorem 7 that $\{E_1, X_1, X_2, X_3\}$ is a basis.$^9$
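The enlargement in Example 10 can be carried out greedily: try each standard basis vector in turn and keep it whenever the enlarged set stays independent. The sketch below assumes the input set is already independent and takes the entries of $X_1$, $X_2$, $X_3$ as they appear in this copy of Example 10. The function names are hypothetical.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Return the pivot column indices found by forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots

def is_independent(vectors):
    """True if the vectors, placed as columns, give a pivot in every column."""
    k, n = len(vectors), len(vectors[0])
    A = [[vectors[j][i] for j in range(k)] for i in range(n)]
    return len(pivot_columns(A)) == k

def extend_to_basis(vectors, n):
    """Enlarge an independent set in R^n to a basis by adding standard basis vectors."""
    basis = [list(v) for v in vectors]
    for j in range(n):
        if len(basis) == n:                # n independent vectors already form a basis
            break
        e = [1 if i == j else 0 for i in range(n)]
        if is_independent(basis + [e]):
            basis.append(e)
    return basis

X1 = [1, 2, 1, 0]
X2 = [0, 1, 1, 3]
X3 = [1, 1, 1, 1]
extended = extend_to_basis([X1, X2, X3], 4)
print(extended)
```

Stopping once n vectors have been collected relies on Theorem 7: n independent vectors in an n-dimensional space automatically span it.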
Theorem 8
Let $U \subseteq W$ be subspaces of $\mathbb{R}^n$. Then:
1. $\dim U \leq \dim W$.
2. If $\dim U = \dim W$, then $U = W$.
PROOF
Write $\dim W = k$, and let B be a basis of U.
1. If $\dim U > k$, then B is an independent set in W containing more than k vectors, contradicting the fundamental theorem. So $\dim U \leq k = \dim W$, proving (1).
2. If $\dim U = k$, then B is an independent set in W containing $k = \dim W$ vectors, so B spans W by Theorem 7. Hence $W = \operatorname{span} B = U$, proving (2).
9 In fact, an independent subset of $\mathbb{R}^n$ can always be enlarged to a basis by adding vectors from the standard basis of $\mathbb{R}^n$. (See Example 7 of Section 6.4.)
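It follows from Theorem 8 that if U is a subspace of $\mathbb{R}^n$, then $\dim U = 0, 1, 2, \ldots, n$, and that
$\dim U = 0$ if and only if $U = \{0\}$, and
$\dim U = n$ if and only if $U = \mathbb{R}^n$.
The other subspaces are called proper. The following example uses Theorem 8 to show that the proper subspaces of $\mathbb{R}^2$ are the lines through the origin, while the proper subspaces of $\mathbb{R}^3$ are the lines and planes through the origin.
Example 11
1. If U is a subspace of $\mathbb{R}^2$ or $\mathbb{R}^3$, then $\dim U = 1$ if and only if U is a line through the origin.
2. If U is a subspace of $\mathbb{R}^3$, then $\dim U = 2$ if and only if U is a plane through the origin.
PROOF
1. Since $\dim U = 1$, let $\{\vec{u}\}$ be a basis of U. Then $U = \{t\vec{u} \mid t \text{ in } \mathbb{R}\}$, so U is the line through the origin with direction vector $\vec{u}$. Conversely each line L with direction vector $\vec{d}$ has the form $L = \{t\vec{d} \mid t \text{ in } \mathbb{R}\}$. Hence $\{\vec{d}\}$ is a basis of L, so L has dimension 1.
2. If $U \subseteq \mathbb{R}^3$ has dimension 2, let $\{\vec{v}, \vec{w}\}$ be a basis of U. Then $\vec{v}$ and $\vec{w}$ are not parallel (by Example 6) so $\vec{n} = \vec{v} \times \vec{w} \neq \vec{0}$. Let $P = \{\vec{x} \text{ in } \mathbb{R}^3 \mid \vec{n} \cdot \vec{x} = 0\}$ denote the plane through the origin with normal $\vec{n}$. Then P is a subspace of $\mathbb{R}^3$ (Example 1 of Section 5.1) and both $\vec{v}$ and $\vec{w}$ lie in P (they are orthogonal to $\vec{n}$), so $U = \operatorname{span}\{\vec{v}, \vec{w}\} \subseteq P$ by Theorem 1 of Section 5.1. Hence
$$U \subseteq P \subseteq \mathbb{R}^3.$$
Since $\dim U = 2$ and $\dim(\mathbb{R}^3) = 3$, it follows from Theorem 8 that $\dim P = 2$ or 3, whence $P = U$ or $\mathbb{R}^3$. But $P \neq \mathbb{R}^3$ (for example, $\vec{n}$ is not in P) and so $U = P$ is a plane through the origin.
Conversely, if U is a plane through the origin, then $\dim U = 0$, 1, 2, or 3 by Theorem 8. But $\dim U \neq 0$ or 3 because $U \neq \{\vec{0}\}$ and $U \neq \mathbb{R}^3$, and $\dim U \neq 1$ by (1). So $\dim U = 2$.
Note that this proof shows that if $\vec{v}$ and $\vec{w}$ are nonzero, nonparallel vectors in $\mathbb{R}^3$, then $\operatorname{span}\{\vec{v}, \vec{w}\}$ is the plane with normal $\vec{n} = \vec{v} \times \vec{w}$. We gave a geometrical verification of this fact in Section 5.1.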
Proof of the Fundamental Theorem
Fundamental Theorem (Theorem 4)
Let U be a subspace of $\mathbb{R}^n$. If $U = \operatorname{span}\{X_1, X_2, \ldots, X_m\}$ and if $\{Y_1, Y_2, \ldots, Y_k\}$ is an independent set in U, then $k \leq m$.
PROOF
We assume that $k > m$ and show that this leads to a contradiction. Each $Y_j$ is in $U = \operatorname{span}\{X_1, X_2, \ldots, X_m\}$, so write
$$Y_j = a_{1j}X_1 + a_{2j}X_2 + \cdots + a_{mj}X_m = \sum_{i=1}^{m} a_{ij}X_i, \qquad j = 1, 2, \ldots, k,$$
where the coefficients $a_{ij}$ are real numbers. These coefficients form column j of a matrix $A = [a_{ij}]$ of size $m \times k$. Since $k > m$ by assumption, the homogeneous system $AX = 0$ has a nontrivial solution $X = [x_1\ x_2\ \cdots\ x_k]^T \neq 0$. Consider the linear combination of the $Y_j$ with these $x_j$ as coefficients:
$$\sum_{j=1}^{k} x_jY_j = \sum_{j=1}^{k} x_j\left(\sum_{i=1}^{m} a_{ij}X_i\right) = \sum_{i=1}^{m}\left(\sum_{j=1}^{k} a_{ij}x_j\right)X_i = \sum_{i=1}^{m} (0)X_i = 0,$$
where $\sum_{j=1}^{k} a_{ij}x_j = 0$ for each i because it is the ith entry of $AX = 0$. Since the $Y_j$ are independent, the equation $\sum_{j=1}^{k} x_jY_j = 0$ implies that each $x_j = 0$, a contradiction because $X \neq 0$.
Exercises 5.2
1. Which of the following subsets are independent? Support your answer.
(a) $\{[1\ 1\ 0]^T, [3\ 2\ 1]^T, [3\ 5\ 2]^T\}$ in $\mathbb{R}^3$.
(b) $\{[1\ 1\ 1]^T, [1\ 1\ 1]^T, [0\ 0\ 1]^T\}$ in $\mathbb{R}^3$.
(c) $\{[1\ 1\ 1\ 1]^T, [2\ 0\ 1\ 0]^T, [0\ 2\ 1\ 2]^T\}$ in $\mathbb{R}^4$.
(d) $\{[1\ 1\ 0\ 0]^T, [1\ 0\ 1\ 0]^T, [0\ 0\ 1\ 1]^T, [0\ 1\ 0\ 1]^T\}$ in $\mathbb{R}^4$.
2. Let $\{X, Y, Z, W\}$ be an independent set in $\mathbb{R}^n$. Which of the following sets is independent? Support your answer.
(a) $\{X - Y, Y - Z, Z - X\}$
(b) $\{X + Y, Y + Z, Z + X\}$
(c) $\{X - Y, Y - Z, Z - W, W - X\}$
(d) $\{X + Y, Y + Z, Z + W, W + X\}$
3. Find a basis and calculate the dimension of the following subspaces of $\mathbb{R}^4$.
(a) $\operatorname{span}\{[1\ 1\ 2\ 0]^T, [2\ 3\ 0\ 3]^T, [1\ 9\ 6\ 6]^T\}$.
(b) $\operatorname{span}\{[2\ 1\ 0\ 1]^T, [1\ 1\ 1\ 2]^T, [2\ 7\ 4\ 5]^T\}$.
(c) $\operatorname{span}\{[1\ 2\ 1\ 0]^T, [2\ 0\ 3\ 1]^T, [4\ 4\ 11\ 3]^T, [3\ 2\ 2\ 1]^T\}$.
(d) $\operatorname{span}\{[2\ 0\ 3\ 1]^T, [1\ 2\ 1\ 0]^T, [2\ 8\ 5\ 3]^T, [1\ 2\ 2\ 1]^T\}$.
4. Find a basis and calculate the dimension of the following subspaces of $\mathbb{R}^4$.
(a) $U = \{[a\ \ a + b\ \ a - b\ \ b]^T \mid a \text{ and } b \text{ in } \mathbb{R}\}$.
(b) U = {[a + b a b b a]^T | a and b in $\mathbb{R}$}.
(c) $U = \{[a\ \ b\ \ c + a\ \ c]^T \mid a, b, \text{ and } c \text{ in } \mathbb{R}\}$.
(d) U = {[a b b + c a b + c]^T | a, b, and c in $\mathbb{R}$}.
(e) $U = \{[a\ b\ c\ d]^T \mid a + b - c + d = 0 \text{ in } \mathbb{R}\}$.
(f) $U = \{[a\ b\ c\ d]^T \mid a + b = c + d \text{ in } \mathbb{R}\}$.
5. Suppose that $\{X, Y, Z, W\}$ is a basis of $\mathbb{R}^4$. Show that:
(a) $\{X + aW, Y, Z, W\}$ is also a basis of $\mathbb{R}^4$ for any choice of the scalar a.
(b) $\{X + W, Y + W, Z + W, W\}$ is also a basis of $\mathbb{R}^4$.
(c) $\{X, X + Y, X + Y + Z, X + Y + Z + W\}$ is also a basis of $\mathbb{R}^4$.
6. Use Theorem 3 to determine if the following sets of vectors are a basis of the indicated space.
(a) $\{[3\ 1]^T, [2\ 2]^T\}$ in $\mathbb{R}^2$.
(b) $\{[1\ 1\ 1]^T, [1\ 1\ 1]^T, [0\ 0\ 1]^T\}$ in $\mathbb{R}^3$.
(c) $\{[1\ 1\ 1]^T, [1\ 1\ 2]^T, [0\ 0\ 1]^T\}$ in $\mathbb{R}^3$.
(d) $\{[5\ 2\ 1]^T, [1\ 0\ 1]^T, [3\ 1\ 0]^T\}$ in $\mathbb{R}^3$.
(e) $\{[2\ 1\ 1\ 3]^T, [1\ 1\ 0\ 2]^T, [0\ 1\ 0\ 3]^T, [1\ 2\ 3\ 1]^T\}$ in $\mathbb{R}^4$.
(f) $\{[1\ 0\ 2\ 5]^T, [4\ 4\ 3\ 2]^T, [0\ 1\ 0\ 3]^T, [1\ 3\ 3\ 10]^T\}$ in $\mathbb{R}^4$.
7. In each case show that the statement is true or give an example showing that it is false.
(a) If $\{X, Y\}$ is independent, then $\{X, Y, X + Y\}$ is independent.
(b) If $\{X, Y, Z\}$ is independent, then $\{Y, Z\}$ is independent.
(c) If $\{Y, Z\}$ is dependent, then $\{X, Y, Z\}$ is dependent.
(d) If all of $X_1, X_2, \dots, X_k$ are nonzero, then $\{X_1, X_2, \dots, X_k\}$ is independent.
(e) If one of $X_1, X_2, \dots, X_k$ is zero, then $\{X_1, X_2, \dots, X_k\}$ is dependent.
(f) If $aX + bY + cZ = 0$, then $\{X, Y, Z\}$ is independent.
(g) If $\{X, Y, Z\}$ is independent, then $aX + bY + cZ = 0$ for some $a$, $b$, and $c$ in $\mathbb{R}$.
(h) If $\{X_1, X_2, \dots, X_k\}$ is dependent, then $t_1X_1 + t_2X_2 + \cdots + t_kX_k = 0$ for some numbers $t_i$ in $\mathbb{R}$ not all zero.
(i) If $\{X_1, X_2, \dots, X_k\}$ is independent, then $t_1X_1 + t_2X_2 + \cdots + t_kX_k = 0$ for some $t_i$ in $\mathbb{R}$.
8. If $A$ is an $n \times n$ matrix, show that $\det A = 0$ if and only if some column of $A$ is a linear combination of the other columns.
9. Let $\{X, Y, Z\}$ be a linearly independent set in $\mathbb{R}^4$. Show that $\{X, Y, Z, E_k\}$ is a basis of $\mathbb{R}^4$ for some $E_k$ in the standard basis $\{E_1, E_2, E_3, E_4\}$.
10. If $\{X_1, X_2, X_3, X_4, X_5, X_6\}$ is an independent set of vectors, show that the subset $\{X_2, X_3, X_5\}$ is also independent.
11. Let $A$ be any $m \times n$ matrix, and let $B_1, B_2, B_3, \dots, B_k$ be columns in $\mathbb{R}^m$ such that the system $AX = B_i$ has a solution $X_i$ for each $i$. If $\{B_1, B_2, B_3, \dots, B_k\}$ is independent in $\mathbb{R}^m$, show that $\{X_1, X_2, X_3, \dots, X_k\}$ is independent in $\mathbb{R}^n$.
12. If $\{X_1, X_2, X_3, \dots, X_k\}$ is independent, show that $\{X_1,\ X_1 + X_2,\ X_1 + X_2 + X_3,\ \dots,\ X_1 + X_2 + \cdots + X_k\}$ is also independent.
13. If $\{Y, X_1, X_2, X_3, \dots, X_k\}$ is independent, show that $\{Y + X_1,\ Y + X_2,\ Y + X_3,\ \dots,\ Y + X_k\}$ is also independent.
14. Suppose that $\{X, Y\}$ is a basis of $\mathbb{R}^2$, and let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.
(a) If $A$ is invertible, show that $\{aX + bY,\ cX + dY\}$ is a basis of $\mathbb{R}^2$.
(b) If $\{aX + bY,\ cX + dY\}$ is a basis of $\mathbb{R}^2$, show that $A$ is invertible.
15. Let $A$ denote an $m \times n$ matrix.
(a) Show that $\operatorname{null} A = \operatorname{null}(UA)$ for every invertible $m \times m$ matrix $U$.
(b) Show that $\dim(\operatorname{null} A) = \dim(\operatorname{null}(AV))$ for every invertible $n \times n$ matrix $V$. [Hint: If $\{X_1, X_2, \dots, X_k\}$ is a basis of $\operatorname{null} A$, show that $\{V^{-1}X_1, V^{-1}X_2, \dots, V^{-1}X_k\}$ is a basis of $\operatorname{null}(AV)$.]
16. Let $A$ denote an $m \times n$ matrix.
(a) Show that $\operatorname{im} A = \operatorname{im}(AV)$ for every invertible $n \times n$ matrix $V$.
(b) Show that $\dim(\operatorname{im} A) = \dim(\operatorname{im}(UA))$ for every invertible $m \times m$ matrix $U$. [Hint: If $\{Y_1, Y_2, \dots, Y_k\}$ is a basis of $\operatorname{im}(UA)$, show that $\{U^{-1}Y_1, U^{-1}Y_2, \dots, U^{-1}Y_k\}$ is a basis of $\operatorname{im} A$.]
17. Let $U$ and $W$ denote subspaces of $\mathbb{R}^n$, and assume that $U \subseteq W$. If $\dim U = n - 1$, show that either $W = U$ or $W = \mathbb{R}^n$.
18. Let $U$ and $W$ denote subspaces of $\mathbb{R}^n$, and assume that $U \subseteq W$. If $\dim W = 1$, show that either $U = \{0\}$ or $U = W$.
SECTION 5.3 Orthogonality
Length and orthogonality are basic concepts in geometry and, in $\mathbb{R}^2$ and $\mathbb{R}^3$, they can both be defined using the dot product. In this section we extend these concepts to $\mathbb{R}^n$, introduce the idea of an orthogonal basis (one of the most useful concepts in linear algebra), and begin exploring some of its applications.
Dot Product, Length, and Distance
Let $X = [x_1\ x_2\ \cdots\ x_n]^T$ and $Y = [y_1\ y_2\ \cdots\ y_n]^T$ be two vectors in $\mathbb{R}^n$. The dot product $X \cdot Y$ of $X$ and $Y$ is defined by
$$X \cdot Y = X^TY = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$
Note that technically $X^TY$ is a $1 \times 1$ matrix, which we take to be a number. The length $\|X\|$ of the vector $X$ is defined by
$$\|X\| = \sqrt{X \cdot X} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
where $\sqrt{\ \ }$ indicates the positive square root. A vector of length 1 is called a unit vector.
Example 1
If $X = [1\ 0\ {-3}\ 2\ {-1}]^T$ and $Y = [2\ 1\ 5\ 1\ 1]^T$ in $\mathbb{R}^5$, then $X \cdot Y = 2 + 0 - 15 + 2 - 1 = -12$, and $\|X\| = \sqrt{1 + 0 + 9 + 4 + 1} = \sqrt{15}$.
These definitions agree with those in $\mathbb{R}^2$ and $\mathbb{R}^3$, and many properties carry over to $\mathbb{R}^n$:
Theorem 1
Let $X$, $Y$, and $Z$ denote vectors in $\mathbb{R}^n$. Then:
1. $X \cdot Y = Y \cdot X$.
2. $X \cdot (Y + Z) = X \cdot Y + X \cdot Z$.
3. $(aX) \cdot Y = a(X \cdot Y) = X \cdot (aY)$ for all scalars $a$.
4. $\|X\|^2 = X \cdot X$.
5. $\|X\| \ge 0$, and $\|X\| = 0$ if and only if $X = 0$.
6. $\|aX\| = |a|\,\|X\|$ for all scalars $a$.
PROOF
(1), (2), and (3) follow from matrix arithmetic because $X \cdot Y = X^TY$; (4) is clear from the definition; and (6) is a routine verification since $|a| = \sqrt{a^2}$. If $X = [x_1\ x_2\ \cdots\ x_n]^T$, then $\|X\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$, so $\|X\| = 0$ if and only if $x_1^2 + x_2^2 + \cdots + x_n^2 = 0$. Since each $x_i$ is a real number this happens if and only if $x_i = 0$ for each $i$; that is, if and only if $X = 0$. This proves (5).
Because of Theorem 1, computations with dot products in $\mathbb{R}^n$ are similar to those in $\mathbb{R}^3$. In particular, the dot product
$$(X_1 + X_2 + \cdots + X_m) \cdot (Y_1 + Y_2 + \cdots + Y_n)$$
equals the sum of $mn$ terms, $X_i \cdot Y_j$, one for each choice of $i$ and $j$. For example:
$$(3X - 4Y) \cdot (7X + 2Y) = 21(X \cdot X) + 6(X \cdot Y) - 28(Y \cdot X) - 8(Y \cdot Y) = 21\|X\|^2 - 22(X \cdot Y) - 8\|Y\|^2$$
holds for all vectors $X$ and $Y$.
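The definitions of dot product and length translate directly into code. The following is a minimal sketch in plain Python; the helper names `dot`, `length`, and `combo`, and the sample vectors, are illustrative assumptions and not part of the text. It checks the expansion of $(3X - 4Y) \cdot (7X + 2Y)$ on one pair of vectors.

```python
import math

def dot(x, y):
    """Dot product x . y = x1*y1 + ... + xn*yn of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def length(x):
    """Length ||x|| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def combo(a, x, b, y):
    """Linear combination a*x + b*y, computed entry by entry."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

# Hypothetical sample vectors in R^4 (not taken from the text).
X = [1, -2, 0, 3]
Y = [2, 1, -1, 4]

# Left side: (3X - 4Y) . (7X + 2Y), computed directly.
lhs = dot(combo(3, X, -4, Y), combo(7, X, 2, Y))
# Right side: 21||X||^2 - 22(X . Y) - 8||Y||^2, using ||X||^2 = X . X.
rhs = 21 * dot(X, X) - 22 * dot(X, Y) - 8 * dot(Y, Y)

print(lhs, rhs)
```

Because the sample entries are integers, both sides are computed exactly and can be compared with `==`; with floating-point entries a tolerance would be the safer comparison.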
Example 2
Show that $\|X + Y\|^2 = \|X\|^2 + 2(X \cdot Y) + \|Y\|^2$ for any $X$ and $Y$ in $\mathbb{R}^n$.
Solution Using Theorem 1 several times:
$$\|X + Y\|^2 = (X + Y) \cdot (X + Y) = X \cdot X + X \cdot Y + Y \cdot X + Y \cdot Y = \|X\|^2 + 2(X \cdot Y) + \|Y\|^2.$$
Example 3
Suppose that $\mathbb{R}^n = \operatorname{span}\{F_1, F_2, \dots, F_k\}$ for some vectors $F_i$. If $X \cdot F_i = 0$ for each $i$ where $X$ is in $\mathbb{R}^n$, show that $X = 0$.
Solution We show $X = 0$ by showing that $\|X\| = 0$ and using (5) of Theorem 1. Since the $F_i$ span $\mathbb{R}^n$, write $X = t_1F_1 + t_2F_2 + \cdots + t_kF_k$ where the $t_i$ are in $\mathbb{R}$. Then
$$\|X\|^2 = X \cdot X = X \cdot (t_1F_1 + t_2F_2 + \cdots + t_kF_k) = t_1(X \cdot F_1) + t_2(X \cdot F_2) + \cdots + t_k(X \cdot F_k) = t_1(0) + t_2(0) + \cdots + t_k(0) = 0.$$
We saw in Section 4.2 that if $\vec{u}$ and $\vec{v}$ are nonzero vectors in $\mathbb{R}^3$, then
$$\frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|\,\|\vec{v}\|} = \cos\theta$$
where $\theta$ is the angle between $\vec{u}$ and $\vec{v}$. Since $|\cos\theta| \le 1$ for any angle $\theta$, this shows that $|\vec{u} \cdot \vec{v}| \le \|\vec{u}\|\,\|\vec{v}\|$. In this form the result holds in $\mathbb{R}^n$.
Theorem 2 Cauchy Inequality (see footnote 10)
If $X$ and $Y$ are vectors in $\mathbb{R}^n$, then
$$|X \cdot Y| \le \|X\|\,\|Y\|.$$
Moreover $|X \cdot Y| = \|X\|\,\|Y\|$ if and only if one of $X$ and $Y$ is a multiple of the other.
PROOF
The inequality holds if $X = 0$ or $Y = 0$ (in fact it is equality). Otherwise, write $\|X\| = a > 0$ and $\|Y\| = b > 0$ for convenience. A computation like that in Example 2 gives
$$\|bX - aY\|^2 = 2ab(ab - X \cdot Y) \quad\text{and}\quad \|bX + aY\|^2 = 2ab(ab + X \cdot Y). \qquad (*)$$
It follows that $ab - X \cdot Y \ge 0$ and $ab + X \cdot Y \ge 0$, and so that $-ab \le X \cdot Y \le ab$. Hence $|X \cdot Y| \le ab = \|X\|\,\|Y\|$, proving the Cauchy inequality.
If equality holds, then $|X \cdot Y| = ab$, so $X \cdot Y = ab$ or $X \cdot Y = -ab$. Hence $(*)$ shows that $bX - aY = 0$ or $bX + aY = 0$, so one of $X$ and $Y$ is a multiple of the other (even if $a = 0$ or $b = 0$).
10 Augustin Louis Cauchy (1789–1857) was born in Paris and became a professor at the École Polytechnique at the age of 26. He was one of the great mathematicians, producing more than 700 papers, and is best remembered for his work in analysis in which he established new standards of rigour and founded the theory of functions of a complex variable. He was a devout Catholic with a long-term interest in charitable work, and he was a royalist, following King Charles X into exile in Prague after he was deposed in 1830.
[Photograph: Augustin Louis Cauchy. Photo © Corbis.]
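The Cauchy inequality and its equality case can be spot-checked numerically. This is only a sketch on hypothetical vectors (the names `cauchy_gap`, `X`, `Y`, `Z` are assumptions for illustration); a numerical check on a few vectors illustrates the theorem but does not replace the proof.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Length ||x|| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def cauchy_gap(x, y):
    """Return ||x|| ||y|| - |x . y|, which the Cauchy inequality says is >= 0."""
    return norm(x) * norm(y) - abs(dot(x, y))

# Hypothetical vectors in R^5 (not taken from the text).
X = [1, 2, -1, 0, 3]
Y = [2, 0, 1, -1, 1]
Z = [-2 * xi for xi in X]   # a multiple of X, so equality is expected

gap_xy = cauchy_gap(X, Y)   # strictly positive: neither vector is a multiple of the other
gap_xz = cauchy_gap(X, Z)   # zero up to floating-point rounding

print(gap_xy, gap_xz)
```

The equality case is compared against a small tolerance rather than exact zero because the square roots are computed in floating point.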
The Cauchy inequality is equivalent to $(X \cdot Y)^2 \le \|X\|^2\,\|Y\|^2$; for example, in $\mathbb{R}^5$ this becomes
$$(x_1y_1 + x_2y_2 + x_3y_3 + x_4y_4 + x_5y_5)^2 \le (x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2 + y_5^2)$$
for all $x_i$ and $y_i$ in $\mathbb{R}$.
There is an important consequence of the Cauchy inequality. Given $X$ and $Y$ in $\mathbb{R}^n$, use Example 2 and the fact that $X \cdot Y \le \|X\|\,\|Y\|$ to compute
$$\|X + Y\|^2 = \|X\|^2 + 2(X \cdot Y) + \|Y\|^2 \le \|X\|^2 + 2\|X\|\,\|Y\| + \|Y\|^2 = (\|X\| + \|Y\|)^2.$$
Taking positive square roots gives:
Corollary Triangle Inequality
If $X$ and $Y$ are vectors in $\mathbb{R}^n$, then $\|X + Y\| \le \|X\| + \|Y\|$.
The reason for the name comes from the observation that in $\mathbb{R}^2$ the inequality asserts that the sum of the lengths of two sides of a triangle is not less than the length of the third side. This is illustrated in the first diagram.
If $X$ and $Y$ are two vectors in $\mathbb{R}^n$, we define the distance $d(X, Y)$ between $X$ and $Y$ by
$$d(X, Y) = \|X - Y\|.$$
The motivation again comes from $\mathbb{R}^2$ as is clear in the second diagram. This distance function has all the intuitive properties of distance in $\mathbb{R}^2$, including another version of the triangle inequality.
Theorem 3
If $X$, $Y$, and $Z$ are three vectors in $\mathbb{R}^n$ we have:
1. $d(X, Y) \ge 0$ for all $X$ and $Y$.
2. $d(X, Y) = 0$ if and only if $X = Y$.
3. $d(X, Y) = d(Y, X)$.
4. Triangle inequality. $d(X, Z) \le d(X, Y) + d(Y, Z)$.
PROOF
(1) and (2) restate part (5) of Theorem 1 because $d(X, Y) = \|X - Y\|$, and (3) follows because $\|U\| = \|{-U}\|$ for every vector $U$ in $\mathbb{R}^n$. To prove (4) use the Corollary to Theorem 2:
$$d(X, Z) = \|X - Z\| = \|(X - Y) + (Y - Z)\| \le \|X - Y\| + \|Y - Z\| = d(X, Y) + d(Y, Z).$$
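The distance function and the properties listed in Theorem 3 can be illustrated with a short sketch. The function name `distance` and the three sample points are assumptions chosen for illustration only.

```python
import math

def distance(x, y):
    """d(x, y) = ||x - y|| for two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical points in R^3 (not taken from the text).
X = [1, 0, 2]
Y = [3, -1, 2]
Z = [0, 4, -2]

d_xy = distance(X, Y)   # sqrt(5)
d_yz = distance(Y, Z)   # sqrt(50)
d_xz = distance(X, Z)   # sqrt(33)

# Property 4 of Theorem 3: going directly from X to Z is never longer
# than going through Y.
triangle_holds = d_xz <= d_xy + d_yz

print(d_xy, d_yz, d_xz, triangle_holds)
```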


Orthogonal Sets and the Expansion Theorem
Two nonzero vectors $\vec{v}$ and $\vec{w}$ in $\mathbb{R}^3$ are orthogonal if and only if $\vec{v} \cdot \vec{w} = 0$ (Theorem 3 §4.2). More generally, a set $\{X_1, X_2, \dots, X_k\}$ of vectors in $\mathbb{R}^n$ is called an orthogonal set if
$$X_i \cdot X_j = 0 \text{ for all } i \neq j \quad\text{and}\quad X_i \neq 0 \text{ for all } i \quad\text{(see footnote 11).}$$
Note that $\{X\}$ is an orthogonal set if $X \neq 0$. A set $\{X_1, X_2, \dots, X_k\}$ of vectors in $\mathbb{R}^n$ is called orthonormal if it is orthogonal and, in addition, each $X_i$ is a unit vector:
$$\|X_i\| = 1 \text{ for each } i.$$
Example 4
The standard basis $\{E_1, E_2, \dots, E_n\}$ is an orthonormal set in $\mathbb{R}^n$.
The routine verification is left to the reader, as is the proof of:
Example 5
If $\{X_1, X_2, \dots, X_k\}$ is orthogonal, so also is $\{a_1X_1, a_2X_2, \dots, a_kX_k\}$ for any nonzero scalars $a_i$.
If $X \neq 0$, it follows from item (6) of Theorem 1 that $\frac{1}{\|X\|}X$ is a unit vector, that is it has length 1. Hence if $\{X_1, X_2, \dots, X_k\}$ is an orthogonal set, then $\left\{\frac{1}{\|X_1\|}X_1, \frac{1}{\|X_2\|}X_2, \dots, \frac{1}{\|X_k\|}X_k\right\}$ is an orthonormal set, and we say that it is the result of normalizing the orthogonal set $\{X_1, X_2, \dots, X_k\}$.
Example 6
If $F_1 = [1\ 1\ 1\ {-1}]^T$, $F_2 = [1\ 0\ 1\ 2]^T$, $F_3 = [{-1}\ 0\ 1\ 0]^T$, and $F_4 = [{-1}\ 3\ {-1}\ 1]^T$, then $\{F_1, F_2, F_3, F_4\}$ is an orthogonal set in $\mathbb{R}^4$ as is easily verified. After normalizing, the corresponding orthonormal set is $\left\{\frac{1}{2}F_1, \frac{1}{\sqrt{6}}F_2, \frac{1}{\sqrt{2}}F_3, \frac{1}{2\sqrt{3}}F_4\right\}$.
The most important result about orthogonality is Pythagoras' theorem. Given orthogonal vectors $\vec{v}$ and $\vec{w}$ in $\mathbb{R}^3$, it asserts that $\|\vec{v} + \vec{w}\|^2 = \|\vec{v}\|^2 + \|\vec{w}\|^2$ as in the diagram. In this form the result holds for any orthogonal set in $\mathbb{R}^n$.
Theorem 4 Pythagoras' Theorem
If $\{X_1, X_2, \dots, X_k\}$ is an orthogonal set in $\mathbb{R}^n$, then
$$\|X_1 + X_2 + \cdots + X_k\|^2 = \|X_1\|^2 + \|X_2\|^2 + \cdots + \|X_k\|^2.$$
PROOF
The fact that $X_i \cdot X_j = 0$ whenever $i \neq j$ gives
$$\|X_1 + X_2 + \cdots + X_k\|^2 = (X_1 + X_2 + \cdots + X_k) \cdot (X_1 + X_2 + \cdots + X_k) = (X_1 \cdot X_1 + X_2 \cdot X_2 + \cdots + X_k \cdot X_k) + \sum_{i \neq j} X_i \cdot X_j = \|X_1\|^2 + \|X_2\|^2 + \cdots + \|X_k\|^2 + (0 + 0 + \cdots + 0).$$
This is what we wanted.
11 The reason for insisting that orthogonal sets consist of nonzero vectors is that we will be primarily concerned with orthogonal bases.
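Normalizing an orthogonal set and checking Pythagoras' theorem are both short computations. The sketch below assumes the four vectors of Example 6 carry the signs $F_1 = [1\ 1\ 1\ {-1}]^T$, $F_2 = [1\ 0\ 1\ 2]^T$, $F_3 = [{-1}\ 0\ 1\ 0]^T$, $F_4 = [{-1}\ 3\ {-1}\ 1]^T$ (the choice that makes them pairwise orthogonal); the helper names are illustrative and not from the text.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Length ||x|| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def is_orthogonal(vectors, tol=1e-12):
    """True if every vector is nonzero and distinct vectors have dot product 0."""
    nonzero = all(norm(v) > tol for v in vectors)
    pairwise = all(
        abs(dot(vectors[i], vectors[j])) <= tol
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )
    return nonzero and pairwise

def normalize(vectors):
    """Divide each vector by its length, giving unit vectors."""
    return [[c / norm(v) for c in v] for v in vectors]

# Assumed signs for the Example 6 vectors (see the lead-in).
F = [[1, 1, 1, -1], [1, 0, 1, 2], [-1, 0, 1, 0], [-1, 3, -1, 1]]

E = normalize(F)                          # orthonormal set
total = [sum(col) for col in zip(*F)]     # F1 + F2 + F3 + F4
lhs = dot(total, total)                   # ||F1 + F2 + F3 + F4||^2
rhs = sum(dot(v, v) for v in F)           # ||F1||^2 + ... + ||F4||^2

print(is_orthogonal(F), lhs, rhs)
```

The squared lengths are 4, 6, 2, and 12, which is where the normalizing factors $\frac12$, $\frac1{\sqrt6}$, $\frac1{\sqrt2}$, $\frac1{2\sqrt3}$ come from.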
If $\vec{v}$ and $\vec{w}$ are orthogonal, nonzero vectors in $\mathbb{R}^3$, then they are certainly not parallel, and so are linearly independent by Example 6 §5.2. The next theorem gives a far-reaching extension of this observation.
Theorem 5
Every orthogonal set in $\mathbb{R}^n$ is linearly independent.
PROOF
Let $\{X_1, X_2, \dots, X_k\}$ be an orthogonal set in $\mathbb{R}^n$ and suppose a linear combination vanishes: $t_1X_1 + t_2X_2 + \cdots + t_kX_k = 0$. Then
$$0 = X_1 \cdot 0 = X_1 \cdot (t_1X_1 + t_2X_2 + \cdots + t_kX_k) = t_1(X_1 \cdot X_1) + t_2(X_1 \cdot X_2) + \cdots + t_k(X_1 \cdot X_k) = t_1\|X_1\|^2 + t_2(0) + \cdots + t_k(0) = t_1\|X_1\|^2.$$
Since $\|X_1\|^2 \neq 0$, this implies that $t_1 = 0$. Similarly $t_i = 0$ for each $i$.
Theorem 5 suggests considering orthogonal bases for $\mathbb{R}^n$, that is orthogonal sets that span $\mathbb{R}^n$. These turn out to be the best bases. One reason is that, when expanding a vector as a linear combination of the basis vectors, there are explicit formulas for the coefficients.
Theorem 6 Expansion Theorem
Let $\{F_1, F_2, \dots, F_m\}$ be an orthogonal basis of a subspace $U$ of $\mathbb{R}^n$. If $X$ is any vector in $U$, we have
$$X = \frac{X \cdot F_1}{\|F_1\|^2}F_1 + \frac{X \cdot F_2}{\|F_2\|^2}F_2 + \cdots + \frac{X \cdot F_m}{\|F_m\|^2}F_m.$$
PROOF
Since $\{F_1, F_2, \dots, F_m\}$ spans $U$, we have $X = t_1F_1 + t_2F_2 + \cdots + t_mF_m$ where the $t_i$ are scalars. To find $t_1$ we take the dot product of both sides with $F_1$:
$$X \cdot F_1 = (t_1F_1 + t_2F_2 + \cdots + t_mF_m) \cdot F_1 = t_1(F_1 \cdot F_1) + t_2(F_2 \cdot F_1) + \cdots + t_m(F_m \cdot F_1) = t_1\|F_1\|^2 + t_2(0) + \cdots + t_m(0) = t_1\|F_1\|^2.$$
Since $F_1 \neq 0$, this gives $t_1 = \dfrac{X \cdot F_1}{\|F_1\|^2}$. Similarly, $t_i = \dfrac{X \cdot F_i}{\|F_i\|^2}$ for each $i$.
The expansion of $X$ as a linear combination of the orthogonal basis $\{F_1, F_2, \dots, F_m\}$ is called the Fourier expansion of $X$, and the coefficients $t_i = \dfrac{X \cdot F_i}{\|F_i\|^2}$ are called the Fourier coefficients. Note that if $\{F_1, F_2, \dots, F_m\}$ is actually orthonormal, then $t_i = X \cdot F_i$ for each $i$. We will have a great deal more to say about this in Section 10.5.
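The formula for the Fourier coefficients is easy to apply mechanically. The sketch below uses exact rational arithmetic; the function names and the small orthogonal basis of $\mathbb{R}^3$ are hypothetical choices for illustration, not taken from the text.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def fourier_coefficients(x, basis):
    """t_i = (x . F_i) / ||F_i||^2 for an orthogonal basis F_1, ..., F_m."""
    return [Fraction(dot(x, f), dot(f, f)) for f in basis]

def expand(coeffs, basis):
    """Rebuild t_1 F_1 + ... + t_m F_m entry by entry."""
    n = len(basis[0])
    return [sum(t * f[i] for t, f in zip(coeffs, basis)) for i in range(n)]

# Hypothetical orthogonal basis of R^3 and a vector to expand.
basis = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]
X = [3, 5, 4]

t = fourier_coefficients(X, basis)   # the coefficients of X in this basis
rebuilt = expand(t, basis)           # should reproduce X exactly

print(t, rebuilt)
```

Note that the formula only recovers $X$ when the basis really is orthogonal and $X$ lies in its span; the code does not check either assumption.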
Example 7
Expand $X = [a\ b\ c\ d]^T$ as a linear combination of the orthogonal basis $\{F_1, F_2, F_3, F_4\}$ of $\mathbb{R}^4$ given in Example 6.
Solution We have $F_1 = [1\ 1\ 1\ {-1}]^T$, $F_2 = [1\ 0\ 1\ 2]^T$, $F_3 = [{-1}\ 0\ 1\ 0]^T$, and $F_4 = [{-1}\ 3\ {-1}\ 1]^T$, so the Fourier coefficients are
$$t_1 = \frac{X \cdot F_1}{\|F_1\|^2} = \tfrac{1}{4}(a + b + c - d), \qquad t_2 = \frac{X \cdot F_2}{\|F_2\|^2} = \tfrac{1}{6}(a + c + 2d),$$
$$t_3 = \frac{X \cdot F_3}{\|F_3\|^2} = \tfrac{1}{2}(-a + c), \qquad t_4 = \frac{X \cdot F_4}{\|F_4\|^2} = \tfrac{1}{12}(-a + 3b - c + d).$$
The reader can verify that indeed $X = t_1F_1 + t_2F_2 + t_3F_3 + t_4F_4$.
A natural question arises here: Does every subspace $U$ of $\mathbb{R}^n$ have an orthogonal basis? The answer is yes; in fact, there is a systematic procedure, called the Gram-Schmidt algorithm, for turning any basis of $U$ into an orthogonal one. This leads to a definition of the projection onto a subspace $U$ that generalizes the projection along a vector used in $\mathbb{R}^2$ and $\mathbb{R}^3$. All this is discussed in Section 8.1.
Exercises 5.3
1. Obtain an orthonormal basis of $\mathbb{R}^3$ by normalizing the following.
(a) $\{[1\ {-1}\ 2]^T, [0\ 2\ 1]^T, [5\ 1\ {-2}]^T\}$
(b) $\{[1\ 1\ 1]^T, [4\ 1\ {-5}]^T, [2\ {-3}\ 1]^T\}$
2. In each case, show that the set of vectors is orthogonal in $\mathbb{R}^4$.
(a) $\{[1\ {-1}\ 2\ 5]^T, [4\ 1\ 1\ {-1}]^T, [{-7}\ 28\ 5\ 5]^T\}$
(b) $\{[2\ {-1}\ 4\ 5]^T, [0\ {-1}\ 1\ {-1}]^T, [0\ 3\ 2\ {-1}]^T\}$
3. In each case, show that $B$ is an orthogonal basis of $\mathbb{R}^3$ and use Theorem 6 to expand $X = [a\ b\ c]^T$ as a linear combination of the basis vectors.
(a) $B = \{[1\ {-1}\ 3]^T, [{-2}\ 1\ 1]^T, [4\ 7\ 1]^T\}$
(b) $B = \{[1\ 0\ {-1}]^T, [1\ 4\ 1]^T, [2\ {-1}\ 2]^T\}$
(c) $B = \{[1\ 2\ 3]^T, [{-1}\ {-1}\ 1]^T, [5\ {-4}\ 1]^T\}$
(d) $B = \{[1\ 1\ 1]^T, [1\ {-1}\ 0]^T, [1\ 1\ {-2}]^T\}$
4. In each case, write $X$ as a linear combination of the orthogonal basis of the subspace $U$.
(a) $X = [13\ {-20}\ 15]^T$; $U = \operatorname{span}\{[1\ {-2}\ 3]^T, [{-1}\ 1\ 1]^T\}$
(b) $X = [14\ 1\ {-8}\ 5]^T$; $U = \operatorname{span}\{[2\ {-1}\ 0\ 3]^T, [2\ 1\ {-2}\ {-1}]^T\}$
5. In each case, find all $[a\ b\ c\ d]^T$ in $\mathbb{R}^4$ such that the given set is orthogonal.
(a) $\{[1\ 2\ 1\ 0]^T, [1\ {-1}\ 1\ 3]^T, [2\ {-1}\ 0\ {-1}]^T, [a\ b\ c\ d]^T\}$
(b) $\{[1\ 0\ {-1}\ 1]^T, [2\ 1\ 1\ {-1}]^T, [1\ {-3}\ 1\ 0]^T, [a\ b\ c\ d]^T\}$
6. If $\|X\| = 3$, $\|Y\| = 1$, and $X \cdot Y = -2$, compute:
(a) $\|3X - 5Y\|$
(b) $\|2X + 7Y\|$
(c) $(3X - Y) \cdot (2Y - X)$
(d) $(X - 2Y) \cdot (3X + 5Y)$
7. In each case either show that the statement is true or give an example showing that it is false.
(a) Every independent set in $\mathbb{R}^n$ is orthogonal.
(b) If $\{X, Y\}$ is an orthogonal set in $\mathbb{R}^n$, then $\{X, X + Y\}$ is also orthogonal.
(c) If $\{X, Y\}$ and $\{Z, W\}$ are both orthogonal in $\mathbb{R}^n$, then $\{X, Y, Z, W\}$ is also orthogonal.
(d) If $\{X_1, X_2\}$ and $\{Y_1, Y_2, Y_3\}$ are both orthogonal and $X_i \cdot Y_j = 0$ for all $i$ and $j$, then $\{X_1, X_2, Y_1, Y_2, Y_3\}$ is orthogonal.
(e) If $\{X_1, X_2, \dots, X_n\}$ is orthogonal in $\mathbb{R}^n$, then $\mathbb{R}^n = \operatorname{span}\{X_1, X_2, \dots, X_n\}$.
(f) If $X \neq 0$ in $\mathbb{R}^n$, then $\{X\}$ is an orthogonal set.
8. If $A$ is an $m \times n$ matrix with orthonormal columns, show that $A^TA = I_n$. [Hint: If $C_1, C_2, \dots, C_n$ are the columns of $A$, show that column $j$ of $A^TA$ is $[C_1 \cdot C_j\ \ C_2 \cdot C_j\ \ \cdots\ \ C_n \cdot C_j]^T$.]
9. Use the Cauchy inequality to show that $\sqrt{xy} \le \frac{1}{2}(x + y)$ for all $x \ge 0$ and $y \ge 0$. Here $\sqrt{xy}$ and $\frac{1}{2}(x + y)$ are called, respectively, the geometric mean and arithmetic mean of $x$ and $y$. [Hint: Use $X = \begin{bmatrix} \sqrt{x} \\ \sqrt{y} \end{bmatrix}$ and $Y = \begin{bmatrix} \sqrt{y} \\ \sqrt{x} \end{bmatrix}$.]
10. Use the Cauchy inequality to prove that:
(a) $(r_1 + r_2 + \cdots + r_n)^2 \le n(r_1^2 + r_2^2 + \cdots + r_n^2)$ for all $r_i$ in $\mathbb{R}$ and all $n \ge 1$.
(b) $r_1r_2 + r_1r_3 + r_2r_3 \le r_1^2 + r_2^2 + r_3^2$ for all $r_1$, $r_2$, and $r_3$ in $\mathbb{R}$. [Hint: See part (a).]
11. (a) Show that $X$ and $Y$ are orthogonal in $\mathbb{R}^n$ if and only if $\|X + Y\| = \|X - Y\|$.
(b) Show that $X + Y$ and $X - Y$ are orthogonal in $\mathbb{R}^n$ if and only if $\|X\| = \|Y\|$.
12. Show that $\|X + Y\|^2 = \|X\|^2 + \|Y\|^2$ if and only if $X$ is orthogonal to $Y$.
13. (a) Show that $X \cdot Y = \frac{1}{4}\left[\|X + Y\|^2 - \|X - Y\|^2\right]$ for all $X$, $Y$ in $\mathbb{R}^n$.
(b) Show that $\|X\|^2 + \|Y\|^2 = \frac{1}{2}\left[\|X + Y\|^2 + \|X - Y\|^2\right]$ for all $X$, $Y$ in $\mathbb{R}^n$.
14. If $A$ is $n \times n$, show that every eigenvalue of $A^TA$ is nonnegative. [Hint: Compute $\|AX\|^2$ where $X$ is an eigenvector.]
15. If $\mathbb{R}^n = \operatorname{span}\{X_1, \dots, X_m\}$ and $X \cdot X_i = 0$ for all $i$, show that $X = 0$. [Hint: Show $\|X\| = 0$.]
16. If $\mathbb{R}^n = \operatorname{span}\{X_1, \dots, X_m\}$ and $X \cdot X_i = Y \cdot X_i$ for all $i$, show that $X = Y$.
17. Let $\{E_1, \dots, E_n\}$ be an orthogonal basis of $\mathbb{R}^n$. Given $X$ and $Y$ in $\mathbb{R}^n$, show that
$$X \cdot Y = \frac{(X \cdot E_1)(Y \cdot E_1)}{\|E_1\|^2} + \cdots + \frac{(X \cdot E_n)(Y \cdot E_n)}{\|E_n\|^2}.$$
SECTION 5.4 Rank of a Matrix
In this section we use independence and spanning to properly define the rank of a matrix and to study its properties. This requires that we deal with rows and columns in the same way. While it has been our custom to write the n-tuples in $\mathbb{R}^n$ as columns, in this section we will frequently write them as rows. Subspaces, independence, spanning, and dimension are defined for rows using matrix operations, just as for columns. If $A$ is an $m \times n$ matrix, we define:
The column space, col $A$, of $A$ is the subspace of $\mathbb{R}^m$ spanned by the columns of $A$.
The row space, row $A$, of $A$ is the subspace of $\mathbb{R}^n$ spanned by the rows of $A$.
Much of what we do in this section involves these subspaces. Recall from Theorem 4 §2.2 that if $C_1, C_2, \dots, C_n$ are the columns of an $m \times n$ matrix $A$, and if $X = [x_1\ x_2\ \cdots\ x_n]^T$ is any column in $\mathbb{R}^n$, then
$$AX = [C_1\ C_2\ \cdots\ C_n]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1C_1 + x_2C_2 + \cdots + x_nC_n. \qquad (*)$$
With this we can prove:
Theorem 1
Let $A$, $U$, and $V$ be matrices of sizes $m \times n$, $p \times m$, and $n \times q$ respectively. Then:
1. $\operatorname{col}(AV) \subseteq \operatorname{col} A$, with equality if $V$ is (square and) invertible.
2. $\operatorname{row}(UA) \subseteq \operatorname{row} A$, with equality if $U$ is (square and) invertible.
PROOF
Let $C_1, C_2, \dots, C_n$ denote the columns of $A$, and let $X_j = [x_1\ x_2\ \cdots\ x_n]^T$ denote column $j$ of $V$. Then column $j$ of $AV$ is $AX_j$, and by $(*)$ this is $AX_j = x_1C_1 + x_2C_2 + \cdots + x_nC_n$. Since this is in col $A$ for each $j$, it follows that $\operatorname{col}(AV) \subseteq \operatorname{col} A$. If $V$ is invertible, we obtain $\operatorname{col} A = \operatorname{col}[(AV)V^{-1}] \subseteq \operatorname{col}(AV)$ in the same way, and (1) follows.
As to (2), we have $\operatorname{col}[(UA)^T] = \operatorname{col}(A^TU^T) \subseteq \operatorname{col}(A^T)$ by (1), from which it follows that $\operatorname{row}(UA) \subseteq \operatorname{row} A$. If $U$ is invertible, we obtain $\operatorname{row}(UA) = \operatorname{row} A$ as in the proof of (1).
Now suppose that a matrix $A$ is carried to some row-echelon matrix $R$ by row operations. Then $R = UA$ for some invertible matrix $U$ by Theorem 1 §2.4, so Theorem 1 shows that $\operatorname{row} R = \operatorname{row} A$. Moreover, the next lemma shows that $\dim(\operatorname{row} R)$ is the rank of $A$ defined in Section 1.2, and hence shows that rank $A$ is independent of the particular row-echelon matrix to which $A$ can be carried. This fact was not proved in Section 1.2.
Lemma 1
Let $R$ denote an $m \times n$ row-echelon matrix.
1. The nonzero rows of $R$ are a basis of row $R$.
2. The columns of $R$ containing the leading ones are a basis of col $R$.
PROOF
1. If $R_1, R_2, \dots, R_r$ denote the nonzero rows of $R$, we have $\operatorname{row} R = \operatorname{span}\{R_1, R_2, \dots, R_r\}$ by definition. Suppose $a_1R_1 + a_2R_2 + \cdots + a_rR_r = 0$ where each $a_i$ is in $\mathbb{R}$. Then $a_1 = 0$ because the leading 1 in $R_1$ is to the left of any nonzero entry in any other $R_i$. But then $a_2R_2 + \cdots + a_rR_r = 0$ and so $a_2 = 0$ in the same way (because the matrix $R$ with $R_1$ deleted is also row-echelon). This continues to show that each $a_i = 0$.
2. The $r$ columns containing leading ones are independent because the leading ones are in different rows (and have zeros below them). It is clear that col $R$ is contained in the subspace of all columns in $\mathbb{R}^m$ with the last $m - r$ entries zero. This space has dimension $r$, so the $r$ independent columns containing leading ones are a basis by Theorem 7 §5.2.
Somewhat surprisingly, Lemma 1 is instrumental in showing that $\dim(\operatorname{col} A) = \dim(\operatorname{row} A)$ for any matrix $A$. This is the main result in the following fundamental theorem.
Theorem 2 Rank Theorem
Let $A$ denote any $m \times n$ matrix. Then
$$\dim(\operatorname{row} A) = \dim(\operatorname{col} A).$$
Moreover, suppose $A$ can be carried to a matrix $R$ in row-echelon form by a series of elementary row operations. If $r$ denotes the number of nonzero rows in $R$, then
1. The $r$ nonzero rows of $R$ are a basis of row $A$.
2. If the leading 1s lie in columns $j_1, j_2, \dots, j_r$ of $R$, then the corresponding columns $j_1, j_2, \dots, j_r$ of $A$ are a basis of col $A$.
PROOF
We have $R = UA$ for some invertible matrix $U$. Hence $\operatorname{row} A = \operatorname{row} R$ by Theorem 1, and (1) follows from Lemma 1.
To prove (2), let $C_1, C_2, \dots, C_n$ denote the columns of $A$. Then $A = [C_1\ C_2\ \cdots\ C_n]$ in block form, and
$$R = UA = U[C_1\ C_2\ \cdots\ C_n] = [UC_1\ UC_2\ \cdots\ UC_n].$$
Hence, in the notation of (2), the set $B = \{UC_{j_1}, UC_{j_2}, \dots, UC_{j_r}\}$ consists of the columns of $R$ that contain a leading 1, so $B$ is a basis of col $R$ by Lemma 1. But then the fact that $U$ is invertible implies that $\{C_{j_1}, C_{j_2}, \dots, C_{j_r}\}$ is linearly independent. Furthermore, if $C_j$ is any column of $A$, then $UC_j$ is a linear combination of the columns in the set $B$. Again, the invertibility of $U$ implies that $C_j$ is a linear combination of $C_{j_1}, C_{j_2}, \dots, C_{j_r}$. This proves (2).
Finally, $\dim(\operatorname{row} A) = r = \dim(\operatorname{col} A)$ by (1) and (2).
The common dimension of the row and column spaces of an $m \times n$ matrix $A$ is called the rank of $A$ and is denoted rank $A$. By (1) of Theorem 2, this agrees with the definition in Section 1.2 and we record the result for reference.
Corollary 1
Suppose a matrix $A$ can be carried to a matrix $R$ in row-echelon form by a series of elementary row operations. Then the rank of $A$ is equal to the number of nonzero rows of $R$.
Example 1
Compute the rank of matrix $A = \begin{bmatrix} 1 & 2 & 2 & -1 \\ 3 & 6 & 5 & 0 \\ 1 & 2 & 1 & 2 \end{bmatrix}$ and find bases for the row space and the column space of $A$.
Solution The reduction of $A$ to row-echelon form is as follows:
$$\begin{bmatrix} 1 & 2 & 2 & -1 \\ 3 & 6 & 5 & 0 \\ 1 & 2 & 1 & 2 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 2 & -1 \\ 0 & 0 & -1 & 3 \\ 0 & 0 & -1 & 3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 2 & -1 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
Hence rank $A = 2$, and $\{[1\ 2\ 2\ {-1}], [0\ 0\ 1\ {-3}]\}$ is a basis of the row space of $A$. Moreover, the leading 1s are in columns 1 and 3 of the row-echelon matrix, so Theorem 2 shows that columns 1 and 3 of $A$, namely $\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 5 \\ 1 \end{bmatrix}$, are a basis of col $A$.
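The procedure in Theorem 2 (reduce to row-echelon form, read off the nonzero rows, and take the columns of $A$ in the leading-1 positions) can be sketched in a few lines. The function name `row_echelon` is an assumption for illustration, and the matrix is that of Example 1 with the signs $A = \begin{bmatrix} 1 & 2 & 2 & -1 \\ 3 & 6 & 5 & 0 \\ 1 & 2 & 1 & 2 \end{bmatrix}$ assumed. Exact fractions are used so that no rounding can affect which entries count as zero.

```python
from fractions import Fraction

def row_echelon(rows):
    """Return (R, pivots): a row-echelon form with leading 1s, and the
    indices of the columns that contain those leading 1s."""
    R = [[Fraction(v) for v in row] for row in rows]
    pivots = []
    r = 0  # index of the next row that needs a leading 1
    for c in range(len(R[0])):
        if r == len(R):
            break
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]          # create the leading 1
        for i in range(r + 1, len(R)):           # clear the entries below it
            factor = R[i][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

# Matrix of Example 1 (signs as assumed in the lead-in).
A = [[1, 2, 2, -1],
     [3, 6, 5, 0],
     [1, 2, 1, 2]]

R, pivots = row_echelon(A)
rank = len(pivots)                                      # number of nonzero rows of R
row_basis = R[:rank]                                    # nonzero rows of R
col_basis = [[row[c] for row in A] for c in pivots]     # columns of A, not of R

print(rank, row_basis, col_basis)
```

Note that the column-space basis is taken from the original matrix $A$: row operations preserve the row space but generally change the column space.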
The rank theorem has several other important consequences. Corollary 2 follows because the rows of $A$ are independent (respectively, span row $A$) if and only if their transposes are independent (respectively, span $\operatorname{col}(A^T)$).
Corollary 2
If $A$ is any matrix, then $\operatorname{rank} A = \operatorname{rank}(A^T)$.
Corollary 3
If $A$ is an $m \times n$ matrix, then $\operatorname{rank} A \le m$ and $\operatorname{rank} A \le n$.
PROOF
If $A$ is carried to the row-echelon matrix $R$ by row operations, then Corollary 1 shows that $\operatorname{rank} A = r$ where $r$ is the number of nonzero rows of $R$. Since $R$ is $m \times n$ too, it follows that $r \le m$. Applying this to $A^T$ gives $\operatorname{rank}(A^T) \le n$ because $A^T$ is $n \times m$. Hence we are done by Corollary 2.
Theorem 1 immediately yields
Corollary 4
$\operatorname{rank} A = \operatorname{rank}(UA) = \operatorname{rank}(AV)$ whenever $U$ and $V$ are invertible.
Corollary 5
An $n \times n$ matrix $A$ is invertible if and only if $\operatorname{rank} A = n$.
PROOF
If $A$ is invertible, then $A \rightarrow I_n$ by row operations (by Theorem 5 §2.3), so $\operatorname{rank} A = n$ by Corollary 1. Conversely, let $A \rightarrow R$ by row operations where $R$ is an $n \times n$ reduced row-echelon matrix. If $\operatorname{rank} A = n$, then $R$ has $n$ leading ones by Corollary 1, and so $R = I_n$. Hence $A \rightarrow I_n$ so $A$ is invertible, again by Theorem 5 §2.3.
The rank theorem can be used to find bases of subspaces of the space $\mathbb{R}^n$ of all $1 \times n$ rows. Here is an example where $n = 4$.
Example 2
Find a basis for the following subspace of $\mathbb{R}^4$ (written as rows).
$$U = \operatorname{span}\{[1\ 1\ 2\ 3],\ [2\ 4\ 1\ 0],\ [1\ 5\ {-4}\ {-9}]\}$$
Solution $U$ is just the row space of $\begin{bmatrix} 1 & 1 & 2 & 3 \\ 2 & 4 & 1 & 0 \\ 1 & 5 & -4 & -9 \end{bmatrix}$, so we reduce this to row-echelon form:
$$\begin{bmatrix} 1 & 1 & 2 & 3 \\ 2 & 4 & 1 & 0 \\ 1 & 5 & -4 & -9 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 2 & 3 \\ 0 & 2 & -3 & -6 \\ 0 & 4 & -6 & -12 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & -\frac{3}{2} & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
The required basis is $\{[1\ 1\ 2\ 3],\ [0\ 1\ {-\tfrac{3}{2}}\ {-3}]\}$. Thus $\{[1\ 1\ 2\ 3],\ [0\ 2\ {-3}\ {-6}]\}$ is also a basis and avoids fractions.
In Section 5.1 we discussed two other subspaces associated with an $m \times n$ matrix $A$, the null space $\operatorname{null} A = \{X \mid X \text{ in } \mathbb{R}^n \text{ and } AX = 0\}$ and the image $\operatorname{im} A = \{AX \mid X \text{ in } \mathbb{R}^n\}$. Using the rank, there are simple ways to find bases for these spaces.
We already know (Theorem 3 §2.2) that null $A$ is spanned by the basic solutions to the system $AX = 0$. The following example is instructive in showing that these basic solutions are, in fact, independent (and so are a basis of null $A$).
Example 3
If $A = \begin{bmatrix} 1 & -2 & 1 & 1 \\ -1 & 2 & 0 & 1 \\ 2 & -4 & 1 & 0 \end{bmatrix}$, find a basis of null $A$ and so find its dimension.
Solution If $X$ is in null $A$, then $AX = 0$, so $X$ is given by solving the system $AX = 0$. The reduction of the augmented matrix to reduced form is
$$\left[\begin{array}{rrrr|r} 1 & -2 & 1 & 1 & 0 \\ -1 & 2 & 0 & 1 & 0 \\ 2 & -4 & 1 & 0 & 0 \end{array}\right] \rightarrow \left[\begin{array}{rrrr|r} 1 & -2 & 0 & -1 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
Hence, writing $X = [x_1\ x_2\ x_3\ x_4]^T$, the leading variables are $x_1$ and $x_3$, and the nonleading variables $x_2$ and $x_4$ become parameters: $x_2 = s$ and $x_4 = t$. Then the equations corresponding to the reduced matrix determine the leading variables in terms of the parameters:
$$x_1 = 2s + t \quad\text{and}\quad x_3 = -2t.$$
This means that the general solution is
$$X = [2s + t\ \ s\ \ {-2t}\ \ t]^T = s[2\ 1\ 0\ 0]^T + t[1\ 0\ {-2}\ 1]^T. \qquad (*)$$
Hence $X$ is in $\operatorname{span}\{X_1, X_2\}$ where $X_1 = [2\ 1\ 0\ 0]^T$ and $X_2 = [1\ 0\ {-2}\ 1]^T$ are the basic solutions, and we have shown that $\operatorname{null}(A) \subseteq \operatorname{span}\{X_1, X_2\}$. But $X_1$ and $X_2$ are in null $A$ (they are solutions of $AX = 0$), so
$$\operatorname{null} A = \operatorname{span}\{X_1, X_2\}$$
by Theorem 1 §5.1. We claim further that $\{X_1, X_2\}$ is linearly independent. To see this, let $sX_1 + tX_2 = 0$ be a linear combination that vanishes. Then $(*)$ shows that $[2s + t\ \ s\ \ {-2t}\ \ t]^T = 0$, whence $s = t = 0$. Thus $\{X_1, X_2\}$ is a basis of $\operatorname{null}(A)$, and so $\dim(\operatorname{null} A) = 2$.
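The computation of basic solutions can be automated: reduce $A$ to reduced row-echelon form, then build one solution per nonleading variable by setting that variable to 1 and the other nonleading variables to 0. The sketch below is one way to do this in plain Python with exact fractions; the function names are assumptions for illustration, and the matrix is that of Example 3 with the signs $A = \begin{bmatrix} 1 & -2 & 1 & 1 \\ -1 & 2 & 0 & 1 \\ 2 & -4 & 1 & 0 \end{bmatrix}$ assumed.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form and the list of pivot (leading-1) columns."""
    R = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0])):
        if r == len(R):
            break
        pivot = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(len(R)):                  # clear above and below the leading 1
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def basic_solutions(rows):
    """Return (basis, rank): one basic solution of AX = 0 per nonleading variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                       # this parameter is 1, the others 0
        for row_index, p in enumerate(pivots):
            x[p] = -R[row_index][f]              # leading variable from the reduced equation
        basis.append(x)
    return basis, len(pivots)

# Matrix of Example 3 (signs as assumed in the lead-in).
A = [[1, -2, 1, 1],
     [-1, 2, 0, 1],
     [2, -4, 1, 0]]

basis, rank = basic_solutions(A)
products = [[sum(a * x for a, x in zip(row, v)) for row in A] for v in basis]

print(basis, rank, products)
```

For this matrix the number of basic solutions equals the number of columns minus the rank, which is the count $n - r$ that appears in the general discussion of the null space.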
The calculation in Example 3 is typical of what happens in general. If A is an
m n matrix with rank A r, there are exactly r leading variables, and hence exactly
n r nonleading variables. These lead to exactly n r basic solutions X
1
, X
2
, , X
nr
,
and the reduced equations give the leading variables as linear combinations of the
X
i
. Hence
( This is Theorem 3 2.2.). We now claim that these basic solutions X
i
are indepen-
dent. The general solution is a linear combination X t
1
X
1
+ t
2
X
2
+

+ t
nr
X
nr
where each coefficient t
i
is a parameter equal to a nonleading variable. Thus, if this
linear combination vanishes, then each t
i
0 (as for s and t in Example 3, each t
i
is a
coefficient when X is expressed as a linear combination of the standard basis of
n
).
This proves that {X
1
, X
2
, , X
nr
} is linearly independent, and so is a basis of null A.
This proves the first part of the following theorem.
Theorem 3
Let A denote an m n matrix of rank r.
1. If X
1
, X
2
, , X
nr
are the basic solutions of the homogeneous system AX 0
that are produced by the gaussian algorithm, then {X
1
, X
2
, , X
nr
} is
a basis of null A. In particular
2. We have im A col A so the rank theorem provides a basis of im A.
In particular,
PROOF
It remains to prove (2). But im A col A by Example 8 5.1, so
dim(im A) dim(col A) r. The rest follows from Theorem 2.
Let A be an m n matrix. Corollary 3 of the rank theorem asserts that
rank A m and rank A n, and it is natural to ask when these extreme cases arise.
If are the columns of A, Theorem 2 5.2 shows that
spans
m
if and only if the system AX B is consistent for every B in
m
, and
that is independent if and only if AX 0, X in
n
, implies X 0.
The next two theorems improve on both these results, and relate them to when the
rank of A is n or m.
Theorem 4
The following are equivalent for an m n matrix A:
1. rank A n.
2. The rows of A span
n
.
3. The columns of A are linearly independent in
m
.
{ } C C C
n 1 2
, , ,
{ } C C C
n 1 2
, , , C C C
n 1 2
, , ,
dim(im ) . A r
dim(null ) . A n r
null { , , , }. A X X X
n r


span
1 2

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 224
4. The n n matrix A
T
A is invertible.
5. CA I
n
for some n m matrix C.
6. If AX 0, X in
n
, then X 0.
PROOF
(1) (2). We have row A


n
, and dim(row A) n by (1), so row A
n
by
Theorem 8 5.2. This is (2).
(2) (3). By (2), row A
n
, so rank A n. This means dim(col A) n. Since the
n columns of A span col A, they are independent by Theorem 7 5.2.
(3) (4). If (A
T
A)X 0, X in
n
, we show that X 0 (Theorem 5 2.3). We have
.
Hence AX 0, so X 0 by (3) and Theorem 2 5.2.
(4) (5). Given (4), take C (A
T
A)
1
A
T
.
(5) (6). If AX 0, then left multiplication by C (from (5)) gives X 0.
(6) (1). Given (6), the columns of A are independent by Theorem 2 5.2.
Hence dim(col A) n, and (1) follows.
Theorem 5
The following are equivalent for an m n matrix A:
1. rank A m.
2. The columns of A span
m
.
3. The rows of A are independent in
n
.
4. The m m matrix AA
T
is invertible.
5. AC I
m
for some n m matrix C.
6. The system AX B is consistent for every B in
m
.
PROOF
(1) (2). By (1), dim(col A) m, so col A
m
by Theorem 8 5.2.
(2) (3). By (2), col A
m
, so rank A m. This means dim(row A) m. Since
the m rows of A span row A, they are independent by Theorem 7 5.2.
(3) (4). We have rank A m by (3), so the n m matrix A
T
has rank m. Hence
applying Theorem 4 to A
T
in place of A shows that (A
T
)
T
A
T
is invertible, proving (4).
(4) (5). Given (4), take C A
T
(AA
T
)
1
in (5).
(5) (6). Comparing columns in AC I
m
gives AC
j
E
j
for each j, where C
j
and
E
j
denote column j of C and I
m
respectively. Given B in
m
, write , r
j
in
. Then (6) holds with as the reader can verify.
(6) (1). Given (6), the columns of A span
m
by Theorem 2 5.2. Thus
col A
m
and (1) follows.
Example 4
Show that $\begin{bmatrix} 3 & x+y+z \\ x+y+z & x^2+y^2+z^2 \end{bmatrix}$ is invertible if x, y, and z are all distinct.
Solution The given matrix has the form $A^TA$ where $A = \begin{bmatrix} 1 & x \\ 1 & y \\ 1 & z \end{bmatrix}$ has independent columns (verify). Hence Theorem 4 applies.
Theorems 4 and 5 relate several important properties of an $m \times n$ matrix A to the invertibility of the square, symmetric matrices $A^TA$ and $AA^T$. In fact, even if the columns of A are not independent or do not span $\mathbb{R}^m$, the matrices $A^TA$ and $AA^T$ are both symmetric and, as such, have real eigenvalues as we shall see. We return to this in Chapter 7.
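As a small numerical companion to Example 4, the following sketch forms $A^TA$ for the $3 \times 2$ matrix A with rows [1 x], [1 y], [1 z] (our reading of the example) and computes its determinant. The function names are illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def gram(A):
    """Return A^T A for a matrix A given as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)]
            for i in range(n)]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def example4_matrix(x, y, z):
    """The 3 x 2 matrix with rows [1 x], [1 y], [1 z]."""
    return [[Fraction(1), Fraction(t)] for t in (x, y, z)]


G = gram(example4_matrix(1, 2, 3))
print(G, det2(G))          # entries 3, 6, 6, 14 and determinant 6

# When x = y = z the columns of A are dependent and A^T A is singular.
print(det2(gram(example4_matrix(2, 2, 2))))
```

A nonzero determinant confirms invertibility of $A^TA$ for that choice of x, y, z; the second call shows how the test fails when the columns of A are dependent.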
Exercises 5.4
1. In each case find bases for the row and column spaces of A and determine the rank of A.
(a) $A = \begin{bmatrix} 2 & 4 & 6 & 8 \\ 2 & 1 & 3 & 2 \\ 4 & 5 & 9 & 10 \\ 0 & 1 & 1 & 2 \end{bmatrix}$
(b) $A = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 1 & 1 \\ 4 & 2 & 3 \\ 6 & 3 & 0 \end{bmatrix}$
(c) $A = \begin{bmatrix} 1 & 1 & 5 & 2 & 2 \\ 2 & 2 & 2 & 5 & 1 \\ 0 & 0 & 12 & 9 & 3 \\ 1 & 1 & 7 & 7 & 1 \end{bmatrix}$
(d) $A = \begin{bmatrix} 1 & 2 & 1 & 3 \\ 3 & 6 & 3 & 2 \end{bmatrix}$
2. In each case find a basis of the subspace U.
(a) U = span{[1 1 0 3], [2 1 5 1], [4 2 5 7]}
(b) U = span{[1 1 2 5 1], [3 1 4 2 7], [1 1 0 0 0], [5 1 6 7 8]}
(c) U = span{$[1\ 1\ 0\ 0]^T$, $[0\ 0\ 1\ 1]^T$, $[1\ 0\ 1\ 0]^T$, $[0\ 1\ 0\ 1]^T$}
(d) U = span{$[1\ 5\ 6]^T$, $[2\ 6\ 8]^T$, $[3\ 7\ 10]^T$, $[4\ 8\ 12]^T$}
3. (a) Can a $3 \times 4$ matrix have independent columns? Independent rows? Explain.
(b) If A is $4 \times 3$ and rank A = 2, can A have independent columns? Independent rows? Explain.
(c) If A is an $m \times n$ matrix and rank A = m, show that $m \le n$.
(d) Can a nonsquare matrix have its rows independent and its columns independent? Explain.
(e) Can the null space of a $3 \times 6$ matrix have dimension 2? Explain.
(f) If A is not square, show that either the rows of A or the columns of A are not linearly independent.
4. (a) Show that rank UA ≤ rank A, with equality if U is invertible.
(b) Show that rank AV ≤ rank A, with equality if V is invertible.
5. Show that rank(AB) ≤ rank A and that rank(AB) ≤ rank B.
6. Show that the rank does not change when an elementary row or column operation is performed on a matrix.
7. In each case find a basis of the null space of A. Then compute rank A and verify (1) of Theorem 3.
(a) $A = \begin{bmatrix} 3 & 1 & 1 \\ 2 & 0 & 1 \\ 4 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}$
(b) $A = \begin{bmatrix} 3 & 5 & 5 & 2 & 0 \\ 1 & 0 & 2 & 2 & 1 \\ 1 & 1 & 1 & 2 & 2 \\ 2 & 0 & 4 & 4 & 2 \end{bmatrix}$
8. Let $A = CR$ where $C \ne 0$ is a column in $\mathbb{R}^m$ and $R \ne 0$ is a row in $\mathbb{R}^n$.
(a) Show that col A = span{C} and row A = span{R}.
(b) Find dim(null A).
(c) Show that null A = null R.
SECTION 5.5 Similarity and Diagonalization
In Section 3.3 we studied diagonalization of a square matrix A, and found important
applications (for example to linear dynamical systems). We can now utilize the
concepts of subspace, basis, and dimension to clarify the diagonalization process, reveal
some new results, and prove some theorems which could not be demonstrated in
Section 3.3.
Before proceeding, we introduce a notion that simplifies the discussion of
diagonalization, and is used throughout the book.
Similar Matrices
If A and B are $n \times n$ matrices, we say that A and B are similar, and write $A \sim B$, if $B = P^{-1}AP$ for some invertible matrix P, equivalently (writing $Q = P^{-1}$) if $B = QAQ^{-1}$ for an invertible matrix Q. The language of similarity is used throughout linear algebra. For example, a matrix A is diagonalizable if and only if it is similar to a diagonal matrix.
If $A \sim B$, then necessarily $B \sim A$. To see why, suppose that $B = P^{-1}AP$. Then $A = PBP^{-1} = Q^{-1}BQ$ where $Q = P^{-1}$ is invertible. This proves the second of the following properties of similarity (the others are left as an exercise):
1. $A \sim A$ for all square matrices A.
2. If $A \sim B$, then $B \sim A$. (∗)
3. If $A \sim B$ and $B \sim C$, then $A \sim C$.
Exercises 5.4 (continued)
9. Show that null A = 0 if and only if the columns of A are independent.
10. Let A be an $n \times n$ matrix.
(a) Show that $A^2 = 0$ if and only if col A ⊆ null A.
(b) Conclude that if $A^2 = 0$, then rank $A \le \frac{n}{2}$.
(c) Find a matrix A for which col A = null A.
11. If A is $m \times n$ and B is $n \times m$, show that $AB = 0$ if and only if col B ⊆ null A.
12. If A is an $m \times n$ matrix, show that col A = $\{AX \mid X \text{ in } \mathbb{R}^n\}$.
13. Let A be an $m \times n$ matrix with columns $C_1, C_2, \dots, C_n$. If rank A = n, show that $\{A^TC_1, A^TC_2, \dots, A^TC_n\}$ is a basis of $\mathbb{R}^n$.
14. If A is $m \times n$ and B is $m \times 1$, show that B lies in the column space of A if and only if rank[A B] = rank A.
15. (a) Show that $AX = B$ has a solution if and only if rank A = rank[A B]. [Hint: Exercises 12 and 14.]
(b) If $AX = B$ has no solution, show that rank[A B] = 1 + rank A.
16. Let X be a $k \times m$ matrix. If I is the $m \times m$ identity matrix, show that $I + X^TX$ is invertible. [Hint: $I + X^TX = A^TA$ where $A = \begin{bmatrix} I \\ X \end{bmatrix}$ in block form.]
17. If A is $m \times n$ of rank r, show that A can be factored as $A = PQ$ where P is $m \times r$ with r independent columns, and Q is $r \times n$ with r independent rows. [Hint: Let $UAV = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ by Theorem 3 §2.4, and write $U^{-1} = \begin{bmatrix} U_1 & U_2 \\ U_3 & U_4 \end{bmatrix}$ and $V^{-1} = \begin{bmatrix} V_1 & V_2 \\ V_3 & V_4 \end{bmatrix}$ in block form, where $U_1$ and $V_1$ are $r \times r$.]
18. (a) Show that if A and B have independent columns, so does AB.
(b) Show that if A and B have independent rows, so does AB.
19. A matrix obtained from A by deleting rows and columns is called a submatrix of A. If A has an invertible $k \times k$ submatrix, show that rank $A \ge k$. [Hint: Show that row and column operations carry $A \to \begin{bmatrix} I_k & P \\ 0 & Q \end{bmatrix}$ in block form.] Remark: It can be shown that rank A is the largest integer r such that A has an invertible $r \times r$ submatrix.
These properties are often expressed by saying that the similarity relation $\sim$ is an equivalence relation on the set of $n \times n$ matrices. Here is an example showing how these properties are used.
Example 1
If A is similar to B and either A or B is diagonalizable, show that the other is also diagonalizable.
Solution We have $A \sim B$. Suppose that A is diagonalizable, say $A \sim D$ where D is diagonal. Since $B \sim A$ by (2) of (∗), we have $B \sim A$ and $A \sim D$. Hence $B \sim D$ by (3) of (∗), so B is diagonalizable too. An analogous argument works if we assume instead that B is diagonalizable.
Similarity is compatible with inverses, transposes, and powers in the following sense: If $A \sim B$, then $A^{-1} \sim B^{-1}$, $A^T \sim B^T$, and $A^k \sim B^k$ for all integers $k \ge 1$ (the proofs are routine matrix computations using Theorem 1 §3.3). Thus, for example, if A is diagonalizable, so also is $A^T$, $A^{-1}$ (if it exists), and $A^k$ (for each $k \ge 1$). Indeed, if $A \sim D$ where D is a diagonal matrix, we obtain $A^T \sim D^T$, $A^{-1} \sim D^{-1}$, and $A^k \sim D^k$, and each of the matrices $D^T$, $D^{-1}$, and $D^k$ is diagonal.
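We pause to introduce a simple matrix function that will be referred to later. The trace tr A of an $n \times n$ matrix A is defined to be the sum of the main diagonal elements of A. In other words:
$$\text{If } A = [a_{ij}], \text{ then } \operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{nn}.$$
It is evident that tr(A + B) = tr A + tr B and that tr(cA) = c tr A holds for all $n \times n$ matrices A and B and all scalars c. The following fact is more surprising.
Lemma 1
Let A and B be $n \times n$ matrices. Then tr(AB) = tr(BA).
PROOF
Write $A = [a_{ij}]$ and $B = [b_{ij}]$. For each i, the (i, i)-entry $d_i$ of the matrix AB is
$$d_i = a_{i1}b_{1i} + a_{i2}b_{2i} + \cdots + a_{in}b_{ni} = \sum_j a_{ij}b_{ji}.$$
Hence
$$\operatorname{tr}(AB) = d_1 + d_2 + \cdots + d_n = \sum_i d_i = \sum_i \Big( \sum_j a_{ij}b_{ji} \Big).$$
Similarly we have $\operatorname{tr}(BA) = \sum_i \big( \sum_j b_{ij}a_{ji} \big)$. Since these two double sums are the same, Lemma 1 is proved.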
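Lemma 1 is easy to test on a concrete pair of matrices. The sketch below uses two arbitrarily chosen $2 \times 2$ integer matrices (our own example, not from the text) for which $AB \ne BA$, yet the traces agree.

```python
def matmul(A, B):
    """Return the matrix product AB for matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def trace(A):
    """Sum of the main diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


# Hypothetical sample matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

AB = matmul(A, B)   # [[10, -3], [20, -5]]
BA = matmul(B, A)   # [[3, 4], [-1, 2]]
print(AB != BA, trace(AB), trace(BA))   # True 5 5
```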
As the name indicates, similar matrices share many properties, some of which are collected in the next theorem for reference.
Theorem 1
If A and B are similar $n \times n$ matrices, then A and B have the same determinant, rank, trace, characteristic polynomial, and eigenvalues.
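PROOF
Let $B = P^{-1}AP$ for some invertible matrix P. Then we have $\det B = \det(P^{-1}) \det A \det P = \det A$ because $\det(P^{-1}) = 1/\det P$. Similarly, rank B = rank($P^{-1}AP$) = rank A by Corollary 4 of Theorem 2 §5.4. Next Lemma 1 gives
$$\operatorname{tr}(P^{-1}AP) = \operatorname{tr}[P^{-1}(AP)] = \operatorname{tr}[(AP)P^{-1}] = \operatorname{tr} A.$$
As to the characteristic polynomial,
$$c_B(x) = \det(xI - B) = \det\{x(P^{-1}IP) - P^{-1}AP\} = \det\{P^{-1}(xI - A)P\} = \det(xI - A) = c_A(x).$$
Finally, this shows that A and B have the same eigenvalues because the eigenvalues of a matrix are the roots of its characteristic polynomial.
Example 2
The matrices $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ have the same determinant, rank, trace, characteristic polynomial, and eigenvalues, but they are not similar because $P^{-1}IP = I$ for any invertible matrix P. Hence sharing the five properties in Theorem 1 does not guarantee that two matrices are similar.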
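Theorem 1 and Example 2 can be illustrated together for $2 \times 2$ matrices, where the characteristic polynomial is $x^2 - (\operatorname{tr} A)x + \det A$. In the sketch below the matrix P and the helper names are our own illustrative choices; A is the matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ of Example 2.

```python
from fractions import Fraction


def matmul(A, B):
    """Return the matrix product AB."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse2(P):
    """Exact inverse of an invertible 2 x 2 integer or Fraction matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(P[1][1], det), Fraction(-P[0][1], det)],
            [Fraction(-P[1][0], det), Fraction(P[0][0], det)]]


def conjugate(A, P):
    """Return P^{-1} A P."""
    return matmul(inverse2(P), matmul(A, P))


def char_poly2(A):
    """Coefficients (1, -tr A, det A) of x^2 - (tr A) x + det A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return (1, -tr, det)


A = [[1, 1], [0, 1]]
I = [[1, 0], [0, 1]]
P = [[2, 1], [1, 1]]          # hypothetical invertible matrix (det = 1)

B = conjugate(A, P)            # similar to A, so the invariants agree
print(B, char_poly2(A), char_poly2(B))

# I has the same characteristic polynomial, but P^{-1} I P is always I,
# so I is similar only to itself and in particular not to A.
print(char_poly2(I) == char_poly2(A), conjugate(I, P) == I)
```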
Diagonalization Revisited
Recall that a square matrix A is diagonalizable if there exists an invertible matrix P such that $P^{-1}AP = D$ is a diagonal matrix, that is if A is similar to a diagonal matrix D. Unfortunately, not all matrices are diagonalizable, for example the matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ (see Example 8 §3.3). Determining whether A is diagonalizable is closely related to the eigenvalues and eigenvectors of A. Recall that a number $\lambda$ is called an eigenvalue of A if $AX = \lambda X$ for some nonzero column X in $\mathbb{R}^n$, and any such nonzero vector X is called an eigenvector of A corresponding to $\lambda$ (or simply a $\lambda$-eigenvector of A). The eigenvalues and eigenvectors of A are closely related to the characteristic polynomial $c_A(x)$ of A, defined by
$$c_A(x) = \det(xI - A).$$
If A is $n \times n$ this is a polynomial of degree n, and its relationship to the eigenvalues is given in the following theorem (a repeat of Theorem 2 §3.3).
Theorem 2
Let A be an $n \times n$ matrix.
1. The eigenvalues $\lambda$ of A are the roots of the characteristic polynomial $c_A(x)$ of A.
2. The $\lambda$-eigenvectors X are the nonzero solutions to the homogeneous system
$$(\lambda I - A)X = 0$$
of linear equations with $\lambda I - A$ as coefficient matrix.
Example 3
Show that the eigenvalues of a triangular matrix are the main diagonal entries.
Solution Assume that A is triangular. Then the matrix $xI - A$ is also triangular and has diagonal entries $(x - a_{11}), (x - a_{22}), \dots, (x - a_{nn})$ where $A = [a_{ij}]$. Hence Theorem 4 §3.1 gives
$$c_A(x) = (x - a_{11})(x - a_{22}) \cdots (x - a_{nn})$$
and the result follows because the eigenvalues are the roots of $c_A(x)$.
Theorem 3 §3.3 asserts (in part) that an $n \times n$ matrix A is diagonalizable if and only if it has n eigenvectors $X_1, \dots, X_n$ such that the matrix $P = [X_1 \ \cdots \ X_n]$ with the $X_i$ as columns is invertible. This is equivalent to requiring that $\{X_1, \dots, X_n\}$ is a basis of $\mathbb{R}^n$ consisting of eigenvectors of A. Hence we can restate Theorem 3 §3.3 as follows:
Theorem 3
Let A be an $n \times n$ matrix.
1. A is diagonalizable if and only if $\mathbb{R}^n$ has a basis $\{X_1, X_2, \dots, X_n\}$ consisting of eigenvectors of A.
2. When this is the case, the matrix $P = [X_1 \ \cdots \ X_n]$ is invertible and $P^{-1}AP = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ where, for each i, $\lambda_i$ is the eigenvalue of A corresponding to $X_i$.
The next result is a basic tool for determining when a matrix is diagonalizable. It reveals an important connection between eigenvalues and linear independence: Eigenvectors corresponding to distinct eigenvalues are necessarily linearly independent.
Theorem 4
Let $X_1, X_2, \dots, X_k$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_k$ of an $n \times n$ matrix A. Then $\{X_1, X_2, \dots, X_k\}$ is a linearly independent set.
PROOF
We use induction on k. If k = 1, then $\{X_1\}$ is independent because $X_1 \ne 0$. In general, suppose the theorem is true for some $k \ge 1$. Given eigenvectors $\{X_1, X_2, \dots, X_{k+1}\}$, suppose a linear combination vanishes:
$$t_1X_1 + t_2X_2 + \cdots + t_{k+1}X_{k+1} = 0. \qquad (\ast\ast)$$
We must show that each $t_i = 0$. Left multiply (∗∗) by A and use the fact that $AX_i = \lambda_iX_i$ to get
$$t_1\lambda_1X_1 + t_2\lambda_2X_2 + \cdots + t_{k+1}\lambda_{k+1}X_{k+1} = 0. \qquad (\ast\ast\ast)$$
If we multiply (∗∗) by $\lambda_1$ and subtract the result from (∗∗∗), the first terms cancel and we obtain
$$t_2(\lambda_2 - \lambda_1)X_2 + t_3(\lambda_3 - \lambda_1)X_3 + \cdots + t_{k+1}(\lambda_{k+1} - \lambda_1)X_{k+1} = 0.$$
Since $X_2, X_3, \dots, X_{k+1}$ correspond to distinct eigenvalues $\lambda_2, \lambda_3, \dots, \lambda_{k+1}$, the set $\{X_2, X_3, \dots, X_{k+1}\}$ is independent by the induction hypothesis. Hence,
$$t_2(\lambda_2 - \lambda_1) = 0, \quad t_3(\lambda_3 - \lambda_1) = 0, \quad \dots, \quad t_{k+1}(\lambda_{k+1} - \lambda_1) = 0$$
and so $t_2 = t_3 = \cdots = t_{k+1} = 0$ because the $\lambda_i$ are distinct. Hence (∗∗) becomes $t_1X_1 = 0$, which implies that $t_1 = 0$ because $X_1 \ne 0$. This is what we wanted.
Theorem 4 will be applied several times; we begin by using it to give a useful test for when a matrix is diagonalizable.
Theorem 5
If A is an $n \times n$ matrix with n distinct eigenvalues, then A is diagonalizable.
PROOF
Choose one eigenvector for each of the n distinct eigenvalues. Then these eigenvectors are independent by Theorem 4, and so are a basis of $\mathbb{R}^n$ by Theorem 7 §5.2. Now use Theorem 3.
Example 4
Show that $A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 3 \\ 1 & 1 & 0 \end{bmatrix}$ is diagonalizable.
Solution A routine computation shows that $c_A(x) = (x - 1)(x - 3)(x + 1)$ and so has distinct eigenvalues 1, 3, and −1. Hence Theorem 5 applies.
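The "routine computation" in Example 4 can be spot-checked by evaluating $\det(xI - A)$ at integer values of x. The sketch below assumes the entries of A as they are printed above; since the first row of A is [1 0 0], the characteristic polynomial does not depend on the two entries below the 1 in the first column, so a different sign there would not change the outcome. The function names are illustrative.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly_at(A, x):
    """Evaluate c_A(x) = det(xI - A) for a 3 x 3 matrix A."""
    n = len(A)
    shifted = [[(x if r == s else 0) - A[r][s] for s in range(n)]
               for r in range(n)]
    return det3(shifted)


A = [[1, 0, 0], [1, 2, 3], [1, 1, 0]]

# Integer roots of the characteristic polynomial in a small search window.
roots = [x for x in range(-5, 6) if char_poly_at(A, x) == 0]
print(roots)   # [-1, 1, 3]
```

Three distinct roots for a $3 \times 3$ matrix is exactly the hypothesis of Theorem 5.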
However, a matrix can have multiple eigenvalues as we saw in Section 3.3. To
deal with this situation, we prove an important lemma which formalizes a technique
that is basic to diagonalization, and which will be used three times below.
Lemma 2
Let $\{X_1, X_2, \dots, X_k\}$ be a linearly independent set of eigenvectors of an $n \times n$ matrix A, extend it to a basis $\{X_1, X_2, \dots, X_k, \dots, X_n\}$ of $\mathbb{R}^n$ (by Theorem 6 §5.2), and let
$$P = [X_1 \ X_2 \ \cdots \ X_n]$$
be the (invertible) matrix with the $X_i$ as its columns. If $\lambda_1, \lambda_2, \dots, \lambda_k$ are the (not necessarily distinct) eigenvalues of A corresponding to $X_1, X_2, \dots, X_k$ respectively, then $P^{-1}AP$ has block form
$$P^{-1}AP = \begin{bmatrix} \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_k) & B \\ 0 & A_1 \end{bmatrix}$$
where B and $A_1$ are matrices of size $k \times (n - k)$ and $(n - k) \times (n - k)$ respectively.
PROOF
If $\{E_1, E_2, \dots, E_n\}$ is the standard basis of $\mathbb{R}^n$, then
$$[E_1 \ E_2 \ \cdots \ E_n] = I_n = P^{-1}P = P^{-1}[X_1 \ X_2 \ \cdots \ X_n] = [P^{-1}X_1 \ P^{-1}X_2 \ \cdots \ P^{-1}X_n].$$
Comparing columns, we have $P^{-1}X_i = E_i$ for each $1 \le i \le n$. On the other hand, observe that
$$P^{-1}AP = P^{-1}A[X_1 \ X_2 \ \cdots \ X_n] = [(P^{-1}A)X_1 \ (P^{-1}A)X_2 \ \cdots \ (P^{-1}A)X_n].$$
Hence, if $1 \le i \le k$, column i of $P^{-1}AP$ is
$$(P^{-1}A)X_i = P^{-1}(\lambda_iX_i) = \lambda_i(P^{-1}X_i) = \lambda_iE_i.$$
This describes the first k columns of $P^{-1}AP$, and Lemma 2 follows.
Note that Lemma 2 (with k = n) shows that an $n \times n$ matrix A is diagonalizable if $\mathbb{R}^n$ has a basis of eigenvectors of A, as in (1) of Theorem 3.
If $\lambda$ is an eigenvalue of an $n \times n$ matrix A, write
$$E_\lambda(A) = \{X \text{ in } \mathbb{R}^n \mid AX = \lambda X\}.$$
This is a subspace of $\mathbb{R}^n$ called the eigenspace of A corresponding to $\lambda$ (see Example 3 §5.1) and the eigenvectors corresponding to $\lambda$ are just the nonzero vectors in $E_\lambda(A)$. In fact $E_\lambda(A)$ is the null space of the matrix $(\lambda I - A)$:
$$E_\lambda(A) = \{X \mid (\lambda I - A)X = 0\} = \operatorname{null}(\lambda I - A).$$
Hence, by Example 7 §5.1, the basic solutions of the homogeneous system $(\lambda I - A)X = 0$ given by the gaussian algorithm form a basis for $E_\lambda(A)$. In particular
$$\dim[E_\lambda(A)] \text{ is the number of basic solutions to the system } (\lambda I - A)X = 0. \qquad (\ast\ast\ast\ast)$$
Now recall that the multiplicity of an eigenvalue $\lambda$ of A is the number of times $\lambda$ occurs as a root of the characteristic polynomial. In other words, the multiplicity of $\lambda$ is the largest integer $m \ge 1$ such that
$$c_A(x) = (x - \lambda)^m g(x)$$
for some polynomial g(x). Because of (∗∗∗∗), the assertion (without proof) in Theorem 4 §3.3 can be stated as follows: A square matrix is diagonalizable if and only if the multiplicity of each eigenvalue $\lambda$ equals $\dim[E_\lambda(A)]$. We are going to prove this, and the proof requires the following result which is valid for any square matrix, diagonalizable or not.
Lemma 3
Let $\lambda$ be an eigenvalue of multiplicity m of a square matrix A. Then $\dim[E_\lambda(A)] \le m$.
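PROOF
Write $\dim[E_\lambda(A)] = d$. It suffices to show that $c_A(x) = (x - \lambda)^d g(x)$ for some polynomial g(x), because m is the highest power of $(x - \lambda)$ that divides $c_A(x)$. To this end, let $\{X_1, X_2, \dots, X_d\}$ be a basis of $E_\lambda(A)$. Then Lemma 2 shows that an invertible $n \times n$ matrix P exists such that
$$P^{-1}AP = \begin{bmatrix} \lambda I_d & B \\ 0 & A_1 \end{bmatrix}$$
in block form, where $I_d$ denotes the $d \times d$ identity matrix. Now write $A' = P^{-1}AP$ and observe that $c_{A'}(x) = c_A(x)$ by Theorem 1. But Theorem 5 §3.1 gives
$$c_{A'}(x) = \det(xI_n - A') = \det \begin{bmatrix} (x - \lambda)I_d & -B \\ 0 & xI_{n-d} - A_1 \end{bmatrix} = \det[(x - \lambda)I_d] \, \det[xI_{n-d} - A_1] = (x - \lambda)^d c_{A_1}(x).$$
Hence $c_A(x) = c_{A'}(x) = (x - \lambda)^d g(x)$ where $g(x) = c_{A_1}(x)$. This is what we wanted.
It is impossible to ignore the question when equality holds in Lemma 3 for each eigenvalue $\lambda$. It turns out that this characterizes the diagonalizable matrices. This was stated without proof in Theorem 4 §3.3.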
Theorem 6
The following are equivalent for a square matrix A:
1. A is diagonalizable.
2. $\dim[E_\lambda(A)]$ equals the multiplicity of $\lambda$ for every eigenvalue $\lambda$ of the matrix A.
PROOF
Let A be $n \times n$ and let $\lambda_1, \lambda_2, \dots, \lambda_k$ be the distinct eigenvalues of A. For each i, let $m_i$ denote the multiplicity of $\lambda_i$ and write $d_i = \dim[E_{\lambda_i}(A)]$. Then
$$c_A(x) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2} \cdots (x - \lambda_k)^{m_k}$$
so $m_1 + \cdots + m_k = n$ because $c_A(x)$ has degree n. Moreover, $d_i \le m_i$ for each i by Lemma 3.
(1) ⇒ (2). By (1), $\mathbb{R}^n$ has a basis of n eigenvectors of A, so let $t_i$ of them lie in $E_{\lambda_i}(A)$ for each i. Since the subspace spanned by these $t_i$ eigenvectors has dimension $t_i$, we have $t_i \le d_i$ for each i by Theorem 4 §5.2. Hence
$$n = t_1 + \cdots + t_k \le d_1 + \cdots + d_k \le m_1 + \cdots + m_k = n.$$
It follows that $d_1 + \cdots + d_k = m_1 + \cdots + m_k$, so, since $d_i \le m_i$ for each i, we must have $d_i = m_i$. This is (2).
(2) ⇒ (1). Let $B_i$ denote a basis of $E_{\lambda_i}(A)$ for each i, and let $B = B_1 \cup \cdots \cup B_k$. Since each $B_i$ contains $m_i$ vectors by (2), and since the $B_i$ are pairwise disjoint (the $\lambda_i$ are distinct), it follows that B contains n vectors. So it suffices to show that B is linearly independent (then B is a basis of $\mathbb{R}^n$). Suppose a linear combination of the vectors in B vanishes, and let $Y_i$ denote the sum of all terms that come from $B_i$. Then $Y_i$ lies in $E_{\lambda_i}(A)$ for each i, so the nonzero $Y_i$ are independent by Theorem 4 (as the $\lambda_i$ are distinct). Since the sum of the $Y_i$ is zero, it follows that $Y_i = 0$ for each i. Hence all coefficients of terms in $Y_i$ are zero (because $B_i$ is independent). This shows that B is independent.
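Example 5
If $A = \begin{bmatrix} 5 & 8 & 16 \\ 4 & 1 & 8 \\ -4 & -4 & -11 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 1 & -2 \\ -1 & 0 & -2 \end{bmatrix}$, show that A is diagonalizable but B is not.
Solution We have $c_A(x) = (x + 3)^2(x - 1)$ so the eigenvalues are $\lambda_1 = -3$ and $\lambda_2 = 1$. The corresponding eigenspaces are $E_{\lambda_1}(A) = \operatorname{span}\{X_1, X_2\}$ and $E_{\lambda_2}(A) = \operatorname{span}\{X_3\}$ where
$$X_1 = [-1 \ \ 1 \ \ 0]^T, \quad X_2 = [-2 \ \ 0 \ \ 1]^T, \quad X_3 = [2 \ \ 1 \ \ {-1}]^T$$
as the reader can verify. Since $\{X_1, X_2\}$ is independent, we have $\dim(E_{\lambda_1}(A)) = 2$ which is the multiplicity of $\lambda_1$. Similarly, $\dim(E_{\lambda_2}(A)) = 1$ equals the multiplicity of $\lambda_2$. Hence A is diagonalizable by Theorem 6, and a diagonalizing matrix is $P = [X_1 \ X_2 \ X_3]$.
Turning to B, $c_B(x) = (x + 1)^2(x - 3)$ so the eigenvalues are $\lambda_1 = -1$ and $\lambda_2 = 3$. The corresponding eigenspaces are $E_{\lambda_1}(B) = \operatorname{span}\{Y_1\}$ and $E_{\lambda_2}(B) = \operatorname{span}\{Y_2\}$ where
$$Y_1 = [-1 \ \ 2 \ \ 1]^T, \quad Y_2 = [5 \ \ 6 \ \ {-1}]^T.$$
Here $\dim(E_{\lambda_1}(B)) = 1$ is smaller than the multiplicity of $\lambda_1$, so the matrix B is not diagonalizable, again by Theorem 6. The fact that $\dim(E_{\lambda_1}(B)) = 1$ means that there is no possibility of finding three linearly independent eigenvectors.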
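The dimension counts in Example 5 can be reproduced mechanically, since $\dim[E_\lambda(A)] = n - \operatorname{rank}(\lambda I - A)$. The sketch below is one possible way to do this with exact arithmetic; the matrices are those of Example 5 with signs chosen to agree with the stated characteristic polynomials and eigenvectors, and the helper names are our own.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix, computed by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):      # clear the entries below the pivot
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def eigenspace_dim(A, lam):
    """dim E_lam(A) = n - rank(lam*I - A)."""
    n = len(A)
    shifted = [[(lam if i == j else 0) - A[i][j] for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


A = [[5, 8, 16], [4, 1, 8], [-4, -4, -11]]
B = [[2, 1, 1], [2, 1, -2], [-1, 0, -2]]

print(eigenspace_dim(A, -3), eigenspace_dim(A, 1))   # 2 1  (sum 3: diagonalizable)
print(eigenspace_dim(B, -1), eigenspace_dim(B, 3))   # 1 1  (sum 2: not diagonalizable)
```

By Theorem 6, a matrix is diagonalizable exactly when these dimensions match the multiplicities, which for a $3 \times 3$ matrix forces them to add up to 3.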
Complex Eigenvalues
All the matrices we have considered have had real eigenvalues. But this need not be the case: The matrix $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ has characteristic polynomial $c_A(x) = x^2 + 1$ which has no real roots. Nonetheless, this matrix is diagonalizable; the only difference is that we must use a larger set of scalars, the complex numbers. The basic properties of these numbers are outlined in Appendix A.
Indeed, nearly everything we have done for real matrices can be done for complex matrices. The methods are the same; the only difference is that the arithmetic is carried out with complex numbers rather than real ones. For example, the gaussian algorithm works in exactly the same way to solve systems of linear equations with complex coefficients, matrix multiplication is defined the same way, and the matrix inversion algorithm works in the same way.
But the complex numbers are better than the real numbers in one respect: While there are polynomials like $x^2 + 1$ with real coefficients that have no real root, this problem does not arise with the complex numbers: Every nonconstant polynomial with complex coefficients has a complex root, and hence factors completely as a product of linear factors. This fact is known as the Fundamental Theorem of Algebra, and was first proved by Gauss.¹²
12 This was a famous open problem in 1799 when Gauss solved it at the age of 22 in his Ph.D. dissertation.
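Example 6
Diagonalize the matrix $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.
Solution The characteristic polynomial of A is
$$c_A(x) = \det(xI - A) = x^2 + 1 = (x - i)(x + i)$$
where $i^2 = -1$. Hence the eigenvalues are $\lambda_1 = i$ and $\lambda_2 = -i$, with corresponding eigenvectors $X_1 = \begin{bmatrix} 1 \\ i \end{bmatrix}$ and $X_2 = \begin{bmatrix} 1 \\ -i \end{bmatrix}$. Hence A is diagonalizable by the complex version of Theorem 5, and the complex version of Theorem 3 shows that $P = [X_1 \ X_2] = \begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix}$ is invertible and
$$P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}.$$
Of course, this can be checked directly.
We shall return to complex linear algebra in Section 8.6.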
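The direct check mentioned in Example 6 can be carried out with Python's built-in complex numbers. The sketch below assumes $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ and $P = \begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix}$ (the columns of P being eigenvectors for $i$ and $-i$); the helper names are illustrative.

```python
def matmul(A, B):
    """Return the matrix product AB (entries may be complex)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse2(P):
    """Inverse of an invertible 2 x 2 matrix with real or complex entries."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]


A = [[0, 1], [-1, 0]]
P = [[1, 1], [1j, -1j]]      # columns are eigenvectors for i and -i

D = matmul(inverse2(P), matmul(A, P))
print(D)   # approximately [[1j, 0], [0, -1j]]
```

The arithmetic is in floating point, so the off-diagonal entries should be compared with zero up to a small tolerance rather than exactly.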
Symmetric Matrices¹³
On the other hand, many of the applications of linear algebra involve a real matrix A and, while A will have complex eigenvalues by the Fundamental Theorem of Algebra, it is always of interest to know when the eigenvalues are, in fact, real. While this can happen in a variety of ways, it turns out to hold whenever A is symmetric. This important theorem will be used extensively later. Surprisingly, the theory of complex eigenvalues can be used to prove this useful result about real eigenvalues.
If Z is a complex matrix, the conjugate matrix $\bar{Z}$ is defined to be the matrix obtained from Z by conjugating every entry. Thus, if $Z = [z_{ij}]$, then $\bar{Z} = [\bar{z}_{ij}]$. For example,
$$\text{If } Z = \begin{bmatrix} -i + 2 & 5 \\ i & 3 + 4i \end{bmatrix} \text{ then } \bar{Z} = \begin{bmatrix} i + 2 & 5 \\ -i & 3 - 4i \end{bmatrix}.$$
Recall that $\overline{z + w} = \bar{z} + \bar{w}$ and $\overline{zw} = \bar{z}\,\bar{w}$ hold for all complex numbers z and w. It follows that if Z and W are two complex matrices, then
$$\overline{Z + W} = \bar{Z} + \bar{W}, \quad \overline{ZW} = \bar{Z}\,\bar{W} \quad \text{and} \quad \overline{\lambda Z} = \bar{\lambda}\,\bar{Z}$$
hold for all complex scalars $\lambda$. These facts are used in the proof of the following theorem.
Theorem 7
Let A be a symmetric real matrix. If $\lambda$ is any eigenvalue of A, then $\lambda$ is real.¹⁴
13 This discussion uses complex conjugation and absolute value. These topics are discussed in Appendix A.
14 This theorem was first proved in 1829 by the great French mathematician Augustin Louis Cauchy (1789–1857) who is most remembered for his work in analysis.
PROOF
Observe that $\bar{A} = A$ because A is real. If $\lambda$ is an eigenvalue of A, we show that $\lambda$ is real by showing that $\bar{\lambda} = \lambda$. Let X be a (possibly complex) eigenvector corresponding to $\lambda$, so that $X \ne 0$ and $AX = \lambda X$. Define $c = X^T\bar{X}$.
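If we write $X = [z_1 \ z_2 \ \cdots \ z_n]^T$ where the $z_i$ are complex numbers, we have
$$c = X^T\bar{X} = [z_1 \ z_2 \ \cdots \ z_n] \begin{bmatrix} \bar{z}_1 \\ \bar{z}_2 \\ \vdots \\ \bar{z}_n \end{bmatrix} = z_1\bar{z}_1 + z_2\bar{z}_2 + \cdots + z_n\bar{z}_n = |z_1|^2 + |z_2|^2 + \cdots + |z_n|^2.$$
Thus c is a real number, and c > 0 because at least one of the $z_i \ne 0$ (as $X \ne 0$). We show that $\bar{\lambda} = \lambda$ by verifying that $\lambda c = \bar{\lambda} c$. We have
$$\lambda c = \lambda(X^T\bar{X}) = (\lambda X)^T\bar{X} = (AX)^T\bar{X} = X^TA^T\bar{X}.$$
At this point we use the hypothesis that A is symmetric and real. This means $A^T = A = \bar{A}$, so we continue
$$\lambda c = X^TA^T\bar{X} = X^T(\bar{A}\,\bar{X}) = X^T(\overline{AX}) = X^T(\overline{\lambda X}) = X^T(\bar{\lambda}\,\bar{X}) = \bar{\lambda}X^T\bar{X} = \bar{\lambda} c$$
as required.
The technique in the proof of Theorem 7 will be used again when we return to complex linear algebra in Section 8.6.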
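Example 7
Verify Theorem 7 for every real, symmetric $2 \times 2$ matrix A.
Solution If $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ we have $c_A(x) = x^2 - (a + c)x + (ac - b^2)$, so the eigenvalues are given by $\lambda = \frac{1}{2}\left[(a + c) \pm \sqrt{(a + c)^2 - 4(ac - b^2)}\right]$. But the discriminant
$$(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \ge 0$$
for any choice of a, b, and c. Hence, the eigenvalues are real numbers.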
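The formula in Example 7 translates directly into a few lines of code. This is a minimal sketch (the function name is our own) that computes both eigenvalues of the symmetric matrix with rows [a b] and [b c]; the square root is always defined because the discriminant $(a - c)^2 + 4b^2$ is never negative.

```python
import math


def symmetric_eigenvalues(a, b, c):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, c]].

    Uses the quadratic formula from Example 7; the discriminant
    (a - c)^2 + 4 b^2 is a sum of squares, so math.sqrt never fails.
    """
    disc = (a - c) ** 2 + 4 * b ** 2
    root = math.sqrt(disc)
    return ((a + c - root) / 2, (a + c + root) / 2)


print(symmetric_eigenvalues(2, 1, 2))   # (1.0, 3.0)
print(symmetric_eigenvalues(1, 2, 1))   # (-1.0, 3.0)
```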
Exercises 5.5
1. By computing the trace, determinant, and rank, show that A and B are not similar in each case.
(a) $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$
(b) $A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}$
(c) $A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 3 & 0 \\ 1 & 1 \end{bmatrix}$
(d) $A = \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}$
(e) $A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 3 & 6 & 3 \end{bmatrix}$
(f) $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 0 & 3 & 5 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 1 & 3 \\ 6 & 3 & 9 \\ 0 & 0 & 0 \end{bmatrix}$
2. Show that $\begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 4 & 3 & 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 1 & 1 & 3 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 4 & 1 \\ 5 & 1 & 1 & 4 \end{bmatrix}$ are not similar.
3. If $A \sim B$, show that:
(a) $A^T \sim B^T$
(b) $A^{-1} \sim B^{-1}$
(c) $rA \sim rB$ for r in $\mathbb{R}$
(d) $A^n \sim B^n$ for $n \ge 1$
4. In each case, decide whether the matrix A is diagonalizable. If so, find P such that $P^{-1}AP$ is diagonal.
(a) $\begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix}$
(b) $\begin{bmatrix} 3 & 0 & 6 \\ 0 & 3 & 0 \\ 5 & 0 & 2 \end{bmatrix}$
(c) $\begin{bmatrix} 3 & 1 & 6 \\ 2 & 1 & 0 \\ 1 & 0 & 3 \end{bmatrix}$
(d) $\begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 & 2 \\ 2 & 3 & 1 \end{bmatrix}$
5. If A is invertible, show that AB is similar to BA for all B.
6. Show that the only matrix similar to a scalar matrix $A = rI$, r in $\mathbb{R}$, is A itself.
7. Let $\lambda$ be an eigenvalue of A with corresponding eigenvector X. If $B = P^{-1}AP$ is similar to A, show that $P^{-1}X$ is an eigenvector of B corresponding to $\lambda$.
8. If $A \sim B$ and A has any of the following properties, show that B has the same property.
(a) Idempotent, that is $A^2 = A$.
(b) Nilpotent, that is $A^k = 0$ for some $k \ge 1$.
(c) Invertible.
9. Let A denote an $n \times n$ upper triangular matrix.
(a) If all the main diagonal entries of A are distinct, show that A is diagonalizable.
(b) If all the main diagonal entries of A are equal, show that A is diagonalizable only if it is already diagonal.
(c) Show that $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$ is diagonalizable but that $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$ is not.
10. Let A be a diagonalizable $n \times n$ matrix with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ (including multiplicities). Show that:
(a) $\det A = \lambda_1\lambda_2 \cdots \lambda_n$
(b) $\operatorname{tr} A = \lambda_1 + \lambda_2 + \cdots + \lambda_n$
11. Given a polynomial $p(x) = r_0 + r_1x + \cdots + r_nx^n$ and a square matrix A, the matrix $p(A) = r_0I + r_1A + \cdots + r_nA^n$ is called the evaluation of p(x) at A. Let $B = P^{-1}AP$. Show that $p(B) = P^{-1}p(A)P$ for all polynomials p(x).
12. Let P be an invertible $n \times n$ matrix. If A is any $n \times n$ matrix, write $T_P(A) = P^{-1}AP$. Verify that:
(a) $T_P(I) = I$
(b) $T_P(AB) = T_P(A)T_P(B)$
(c) $T_P(A + B) = T_P(A) + T_P(B)$
(d) $T_P(rA) = rT_P(A)$
(e) $T_P(A^k) = [T_P(A)]^k$ for $k \ge 1$
(f) If A is invertible, $T_P(A^{-1}) = [T_P(A)]^{-1}$.
(g) If Q is invertible, $T_Q[T_P(A)] = T_{PQ}(A)$.
13. (a) Show that two diagonalizable matrices are similar if and only if they have the same eigenvalues with the same multiplicities.
(b) If A is diagonalizable, show that $A \sim A^T$.
14. If A is $2 \times 2$ and diagonalizable, show that $C(A) = \{X \mid XA = AX\}$ has dimension 2 or 4. [Hint: If $P^{-1}AP = D$, show that X is in C(A) if and only if $P^{-1}XP$ is in C(D).]
15. If A is diagonalizable and p(x) is a polynomial such that $p(\lambda) = 0$ for all eigenvalues $\lambda$ of A, show that $p(A) = 0$ (see Example 9 §3.3). In particular, show $c_A(A) = 0$. [Remark: $c_A(A) = 0$ for all square matrices A; this is the Cayley-Hamilton theorem (see Theorem 2 §9.4).]
16. Let A be $n \times n$ with n distinct real eigenvalues. If $AC = CA$, show that C is diagonalizable.
17. Let $A = \begin{bmatrix} 0 & a & b \\ a & 0 & c \\ b & c & 0 \end{bmatrix}$ and $B = \begin{bmatrix} c & a & b \\ a & b & c \\ b & c & a \end{bmatrix}$.
(a) Show that $x^3 - (a^2 + b^2 + c^2)x - 2abc$ has real roots by considering A.
(b) Show that $a^2 + b^2 + c^2 \ge ab + ac + bc$ by considering B.
18. Assume the $2 \times 2$ matrix A is similar to an upper triangular matrix. If $\operatorname{tr} A = 0 = \operatorname{tr} A^2$, show that $A^2 = 0$.
19. Show that A is similar to $A^T$ for all $2 \times 2$ matrices A. [Hint: Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If c = 0, treat the cases b = 0 and b ≠ 0 separately. If c ≠ 0, reduce to the case c = 1 using Exercise 12(d).]
20. Refer to Section 3.4 on linear recurrences. Assume that the sequence $x_0, x_1, x_2, \dots$ satisfies
$$x_{n+k} = r_0x_n + r_1x_{n+1} + \cdots + r_{k-1}x_{n+k-1}$$
for all $n \ge 0$. Define
$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ r_0 & r_1 & r_2 & \cdots & r_{k-1} \end{bmatrix}, \quad V_n = \begin{bmatrix} x_n \\ x_{n+1} \\ \vdots \\ x_{n+k-1} \end{bmatrix}.$$
Then show that:
(a) $V_n = A^nV_0$ for all n.
(b) $c_A(x) = x^k - r_{k-1}x^{k-1} - \cdots - r_1x - r_0$.
(c) If $\lambda$ is an eigenvalue of A, the eigenspace $E_\lambda$ has dimension 1, and $X = (1, \lambda, \lambda^2, \dots, \lambda^{k-1})^T$ is an eigenvector. [Hint: Use $c_A(\lambda) = 0$ to show that $E_\lambda = \mathbb{R}X$.]
(d) A is diagonalizable if and only if the eigenvalues of A are distinct. [Hint: See part (c) and Theorem 4.]
(e) If $\lambda_1, \lambda_2, \dots, \lambda_k$ are distinct real eigenvalues, there exist constants $t_1, t_2, \dots, t_k$ such that $x_n = t_1\lambda_1^n + \cdots + t_k\lambda_k^n$ holds for all n. [Hint: If D is diagonal with $\lambda_1, \lambda_2, \dots, \lambda_k$ as the main diagonal entries, show that $A^n = PD^nP^{-1}$ has entries that are linear combinations of $\lambda_1^n, \lambda_2^n, \dots, \lambda_k^n$.]
SECTION 5.6 An Application to Correlation and Variance
Suppose the heights $h_1, h_2, \dots, h_n$ of n men are measured. Such a data set is called a sample of the heights of all the men in the population under study, and various questions are often asked about such a sample: What is the average height in the sample? How much variation is there in the sample heights, and how can it be measured? What can be inferred from the sample about the heights of all men in the population? How do these heights compare to heights of men in neighbouring countries? Does the prevalence of smoking affect the height of a man?
The analysis of samples, and of inferences that can be drawn from them, is a subject called mathematical statistics, and an extensive body of information has been developed to answer many such questions. In this section we will describe a few ways that linear algebra can be used.
It is convenient to represent a sample $\{x_1, x_2, \dots, x_n\}$ as a sample vector $X = [x_1 \ x_2 \ \cdots \ x_n]^T$ in $\mathbb{R}^n$. This being done, the dot product in $\mathbb{R}^n$ provides a convenient tool to study the sample and describe some of the statistical concepts related to it. The most widely known statistic for describing a data set is the sample mean $\bar{x}$ defined by¹⁵
$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
The mean $\bar{x}$ is typical of the sample values $x_i$, but may not itself be one of them. The number $x_i - \bar{x}$ is called the deviation of $x_i$ from the mean $\bar{x}$. The deviation is positive if $x_i > \bar{x}$ and it is negative if $x_i < \bar{x}$. Moreover, the sum of these deviations is zero:
$$\sum_{i=1}^{n}(x_i - \bar{x}) = \Big(\sum_{i=1}^{n} x_i\Big) - n\bar{x} = n\bar{x} - n\bar{x} = 0. \qquad (\ast)$$
This is described by saying that the sample mean $\bar{x}$ is central to the sample values $x_i$.
15 The mean is often called the average of the sample values $x_i$, but statisticians use the term mean.
If the mean $\bar{x}$ is subtracted from each data value $x_i$, the resulting data $x_i - \bar{x}$ are said to be centred. The corresponding data vector is
$$X_c = [x_1 - \bar{x} \ \ x_2 - \bar{x} \ \ \cdots \ \ x_n - \bar{x}]^T$$
and (∗) shows that the mean $\bar{x}_c = 0$. For example, the sample $X = [-1 \ \ 0 \ \ 1 \ \ 4 \ \ 6]^T$ is plotted in the diagram. The mean is $\bar{x} = 2$, and the centred sample $X_c = [-3 \ \ {-2} \ \ {-1} \ \ 2 \ \ 4]^T$ is also plotted. Thus, the effect of centring is to shift the data by an amount $\bar{x}$ (to the left if $\bar{x}$ is positive) so that the mean moves to 0.
Another question that arises about samples is how much variability there is in the sample $X = [x_1 \ x_2 \ \cdots \ x_n]^T$; that is, how widely are the data spread out around the sample mean $\bar{x}$. A natural measure of variability would be the sum of the deviations of the $x_i$ about the mean, but this sum is zero by (∗); these deviations cancel out. To avoid this cancellation, statisticians use the squares $(x_i - \bar{x})^2$ of the deviations as a measure of variability. More precisely, they compute a statistic called the sample variance $s_x^2$, defined¹⁶ as follows:
$$s_x^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right] = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
The sample variance will be large if there are many $x_i$ at a large distance from the mean $\bar{x}$, and it will be small if all the $x_i$ are tightly clustered about the mean. The variance is clearly nonnegative (hence the notation $s_x^2$), and the square root $s_x$ of the variance is called the sample standard deviation.
The sample mean and variance can be conveniently described using the dot product. Let
$$\mathbf{1} = [1 \ \ 1 \ \ \cdots \ \ 1]^T$$
denote the column with every entry equal to 1. If $X = [x_1 \ x_2 \ \cdots \ x_n]^T$, then $X \cdot \mathbf{1} = x_1 + x_2 + \cdots + x_n$, so the sample mean is given by the formula
$$\bar{x} = \frac{1}{n}(X \cdot \mathbf{1}).$$
Moreover, remembering that $\bar{x}$ is a scalar, we have $\bar{x}\mathbf{1} = [\bar{x} \ \ \bar{x} \ \ \cdots \ \ \bar{x}]^T$, so the centred sample vector $X_c$ is given by
$$X_c = X - \bar{x}\mathbf{1} = [x_1 - \bar{x} \ \ x_2 - \bar{x} \ \ \cdots \ \ x_n - \bar{x}]^T.$$
Thus we obtain a formula for the sample variance:
$$s_x^2 = \frac{1}{n-1}\|X_c\|^2 = \frac{1}{n-1}\|X - \bar{x}\mathbf{1}\|^2.$$
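The dot-product formulas for the mean, the centred sample, and the variance can be tried on the sample $X = [-1 \ \ 0 \ \ 1 \ \ 4 \ \ 6]^T$ used in the text. The sketch below is one straightforward way to code them; the function names are our own, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def sample_mean(X):
    """Mean as (1/n)(X . 1), where 1 is the all-ones vector."""
    ones = [1] * len(X)
    return Fraction(dot(X, ones), len(X))


def centred(X):
    """Centred sample X_c = X - (mean) 1."""
    mean = sample_mean(X)
    return [x - mean for x in X]


def sample_variance(X):
    """Variance as ||X_c||^2 / (n - 1)."""
    Xc = centred(X)
    return dot(Xc, Xc) / (len(X) - 1)


X = [-1, 0, 1, 4, 6]
print(sample_mean(X))       # 2
print(centred(X))           # entries -3, -2, -1, 2, 4
print(sample_variance(X))   # 17/2
```

Here $\|X_c\|^2 = 9 + 4 + 1 + 4 + 16 = 34$, so the variance is $34/4 = 17/2$ and the deviations sum to zero as (∗) predicts.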
Linear algebra is also useful for comparing two different samples. To illustrate
how, consider two examples.
s X X x
x n c n
2
1
1
2 1
1
2


1 .
X X x x x x x x x
c n
T
1 [ ] .
1 2

x x x x
T
1 [ ] x
x
n
X
1
( ) i 1
X x x x
n
i 1 + + +
1 2
X x x x
n
T
[ ]
1 2

1 [ ] 1 1 1
T
s
x
2
x
s x x x x x x x x
x
n
n
n
i
i
n
2
1
1
1
2
2
2 2
1
1
2
1
+ + +

[ ] ( ) ( ) ( ) ( ) .
s
x
2
( ) x x
i

2
x
X x x x
n
T
[ ]
1 2

x x
[ ] 3 2 1 2 4
T
X
c

x 2
X
T
[ ] 1 0 1 4 6
c
x 0
X x x x x x x
c n
T
[ ]
1 2

x x
i
x
x
( ) x x x nx nx nx
i
i
n
i
i
n

_
,

.


1 1
0
x x
i
< x x
i
>
239 Section 5.6 An Application to Correlation and Variance
16 Since there are n sample values, it seems more natural to divide by n here, rather than by n 1. The reason for using n 1
is that then the sample variance s
2
x
provides a better estimate of the variance of the entire population from which the sample
was drawn.
1 0 1
Sample X
Centred Sample X
c
2 4 6
4 2 0 -1 -2 -3
x
x
c
nic22772_ch05.qxd 11/21/2005 6:49 PM Page 239
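The dot-product descriptions of the sample mean, the centred sample, and the sample variance can be checked numerically. The following is a minimal sketch in plain Python, applied to the sample with entries −1, 0, 1, 4, 6 used in the centring example; the function names (`dot`, `sample_mean`, `centred`, `sample_variance`) are illustrative choices and do not come from the text.

```python
import math

def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))

def sample_mean(x):
    """Mean computed as (1/n)(X . 1), where 1 is the all-ones column."""
    ones = [1] * len(x)
    return dot(x, ones) / len(x)

def centred(x):
    """Centred sample X_c = X - mean * 1."""
    m = sample_mean(x)
    return [xi - m for xi in x]

def sample_variance(x):
    """Sample variance computed as ||X_c||^2 / (n - 1)."""
    xc = centred(x)
    return dot(xc, xc) / (len(x) - 1)

X = [-1, 0, 1, 4, 6]
mean_x = sample_mean(X)      # 2.0
Xc = centred(X)              # [-3.0, -2.0, -1.0, 2.0, 4.0]
var_x = sample_variance(X)   # (9 + 4 + 1 + 4 + 16) / 4 = 8.5
std_x = math.sqrt(var_x)     # sample standard deviation s_x

print(mean_x, Xc, var_x)
```

The entries of `Xc` sum to zero, which is the statement that the centred sample has mean 0.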
The following table represents the number of sick days at work per year and the yearly number of visits to a physician for 10 individuals.

Individual      1   2   3   4   5   6   7   8   9   10
Doctor visits   2   6   8   1   5   10  3   9   7   4
Sick days       2   4   8   3   5   9   4   7   7   2

[Scatter diagram: Sick Days plotted against Doctor Visits]

The data are plotted in the scatter diagram where it is evident that, roughly speaking, the more visits to the doctor the more sick days. This is an example of a positive correlation between sick days and doctor visits.
Now consider the following table representing the daily doses of vitamin C and the number of sick days.

Individual      1   2   3   4   5   6   7   8   9   10
Vitamin C       1   5   7   0   4   9   2   8   6   3
Sick days       5   2   2   6   2   1   4   3   2   5

[Scatter diagram: Sick Days plotted against Vitamin C Doses]

The scatter diagram is plotted as shown and it appears that the more vitamin C taken, the fewer sick days. In this case there is a negative correlation between daily vitamin C and sick days.
In both these situations, we have paired samples, that is, observations of two variables are made for ten individuals: doctor visits and sick days in the first case; daily vitamin C and sick days in the second case. The scatter diagrams point to a relationship between these variables, and there is a way to use the sample to compute a number, called the correlation coefficient, that measures the degree to which the variables are associated.
To motivate the definition of the correlation coefficient, suppose two paired samples $X = [\, x_1 \quad x_2 \quad \cdots \quad x_n \,]^T$ and $Y = [\, y_1 \quad y_2 \quad \cdots \quad y_n \,]^T$ are given and consider the centred samples

$$X_c = [\, x_1 - \bar{x} \quad x_2 - \bar{x} \quad \cdots \quad x_n - \bar{x} \,]^T \quad \text{and} \quad Y_c = [\, y_1 - \bar{y} \quad y_2 - \bar{y} \quad \cdots \quad y_n - \bar{y} \,]^T.$$

If $x_k$ is large among the $x_i$, then the deviation $x_k - \bar{x}$ will be positive; and $x_k - \bar{x}$ will be negative if $x_k$ is small among the $x_i$. The situation is similar for Y, and the following table displays the sign of the quantity $(x_i - \bar{x})(y_i - \bar{y})$ in all four cases:

Sign of $(x_i - \bar{x})(y_i - \bar{y})$:

              $x_i$ large   $x_i$ small
$y_i$ large   positive      negative
$y_i$ small   negative      positive

Intuitively, if X and Y are positively correlated, then two things happen:
1. Large values of the $x_i$ tend to be associated with large values of the $y_i$, and
2. Small values of the $x_i$ tend to be associated with small values of the $y_i$.
It follows from the table that, if X and Y are positively correlated, then the dot product

$$X_c \cdot Y_c = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

is positive. Similarly $X_c \cdot Y_c$ is negative if X and Y are negatively correlated. With this in mind, the sample correlation coefficient¹⁷ r is defined by

$$r = r(X, Y) = \frac{X_c \cdot Y_c}{\|X_c\| \, \|Y_c\|}.$$

17 The idea of using a single number to measure the degree of relationship between different variables was pioneered by Francis Galton (1822–1911). He was studying the degree to which characteristics of an offspring relate to those of its parents. The idea was refined by Karl Pearson (1857–1936) and r is often referred to as the Pearson correlation coefficient.
Bearing the situation in $\mathbb{R}^3$ in mind, r is the cosine of the angle between the vectors $X_c$ and $Y_c$, and so we would expect it to lie between $-1$ and $1$. Moreover, we would expect r to be near 1 (or $-1$) if these vectors were pointing in the same (opposite) direction, that is the angle is near zero (or $\pi$).
This is confirmed by Theorem 1 below, and it is also borne out in the examples above. If we compute the correlation between sick days and visits to the physician (in the first scatter diagram above) the result is $r = 0.90$ as expected. On the other hand, the correlation between daily vitamin C doses and sick days (second scatter diagram) is $r = -0.84$.
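The value quoted for the first example can be reproduced directly from the definition of r as the cosine of the angle between the centred samples. The sketch below is plain Python with illustrative function names; it assumes the sick-days row of the first table reads 2, 4, 8, 3, 5, 9, 4, 7, 7, 2 (the fifth entry taken as 5).

```python
import math

def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))

def centred(x):
    """Subtract the sample mean from every entry."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def correlation(x, y):
    """r = (X_c . Y_c) / (||X_c|| ||Y_c||); assumes neither centred sample is zero."""
    xc, yc = centred(x), centred(y)
    return dot(xc, yc) / math.sqrt(dot(xc, xc) * dot(yc, yc))

# Data from the first table (fifth sick-days entry assumed to be 5).
doctor_visits = [2, 6, 8, 1, 5, 10, 3, 9, 7, 4]
sick_days = [2, 4, 8, 3, 5, 9, 4, 7, 7, 2]

r = correlation(doctor_visits, sick_days)
print(round(r, 2))  # prints 0.9
```

Because r is a cosine, the same function returns a value in the interval from −1 to 1 for any pair of samples whose centred vectors are nonzero.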
However, a word of caution is in order here. We cannot conclude from the second
example that taking more vitamin C will reduce the number of sick days at work.
The (negative) correlation may arise because of some third factor that is related to
both variables. In this case it may be that less healthy people are inclined to take
more vitamin C. Correlation does not imply causation. Similarly, the correlation
between sick days and visits to the doctor does not mean that having many sick days
causes more visits to the doctor. A correlation between two variables may point to
the existence of other underlying factors, but it does not necessarily mean that there
is a causality relationship between the variables.
Our discussion of the dot product in $\mathbb{R}^n$ provides the basic properties of the correlation coefficient:

Theorem 1
Let $X = [\, x_1 \quad x_2 \quad \cdots \quad x_n \,]^T$ and $Y = [\, y_1 \quad y_2 \quad \cdots \quad y_n \,]^T$ be (nonzero) paired samples, and let $r = r(X, Y) = \dfrac{X_c \cdot Y_c}{\|X_c\| \, \|Y_c\|}$ denote the correlation coefficient. Then:
1. $-1 \le r \le 1$.
2. $r = 1$ if and only if there exist a and b > 0 such that $y_i = a + bx_i$ for each i.
3. $r = -1$ if and only if there exist a and b < 0 such that $y_i = a + bx_i$ for each i.

PROOF
The Cauchy inequality (Theorem 2 §5.3) proves (1), and also shows that $r = \pm 1$ if and only if one of $X_c$ and $Y_c$ is a scalar multiple of the other. This in turn holds if and only if $Y_c = bX_c$ for some $b \neq 0$, and it is easy to verify that $r = 1$ when b > 0 and $r = -1$ when b < 0.
Finally, $Y_c = bX_c$ means $y_i - \bar{y} = b(x_i - \bar{x})$ for each i; that is, $y_i = a + bx_i$ where $a = \bar{y} - b\bar{x}$. Conversely, if $y_i = a + bx_i$, then $\bar{y} = a + b\bar{x}$ (verify), so

$$y_i - \bar{y} = (a + bx_i) - (a + b\bar{x}) = b(x_i - \bar{x})$$

for each i. In other words, $Y_c = bX_c$. This completes the proof.

Properties (2) and (3) in Theorem 1 show that $r(X, Y) = 1$ means that there is a linear relation with positive slope between the paired data (so large x values are paired with large y values). Similarly, $r(X, Y) = -1$ means that there is a linear relation with negative slope between the paired data (so large x values are paired with small y values). This is borne out in the two scatter diagrams above.
We conclude by using the dot product to derive some useful formulas for computing variances and correlation coefficients. Given samples $X = [\, x_1 \quad x_2 \quad \cdots \quad x_n \,]^T$ and $Y = [\, y_1 \quad y_2 \quad \cdots \quad y_n \,]^T$, the key observation is the following formula:

$$X_c \cdot Y_c = X \cdot Y - n\bar{x}\bar{y}. \qquad (**)$$
Indeed, remembering that $\bar{x}$ and $\bar{y}$ are scalars:

$$\begin{aligned}
X_c \cdot Y_c &= (X - \bar{x}\mathbf{1}) \cdot (Y - \bar{y}\mathbf{1}) \\
&= X \cdot Y - X \cdot (\bar{y}\mathbf{1}) - (\bar{x}\mathbf{1}) \cdot Y + (\bar{x}\mathbf{1}) \cdot (\bar{y}\mathbf{1}) \\
&= X \cdot Y - \bar{y}(n\bar{x}) - \bar{x}(n\bar{y}) + \bar{x}\bar{y}(n) \\
&= X \cdot Y - n\bar{x}\bar{y}.
\end{aligned}$$

Taking Y = X in $(**)$ gives a formula for the variance $s_x^2 = \frac{1}{n-1}\|X_c\|^2$ of X.

Variance Formula
If X is a sample vector, then $s_x^2 = \dfrac{1}{n-1} \left( \|X\|^2 - n\bar{x}^2 \right)$.

We also get a convenient formula for the correlation coefficient,

$$r = r(X, Y) = \frac{X_c \cdot Y_c}{\|X_c\| \, \|Y_c\|}.$$

Moreover, $(**)$ and the fact that $s_x^2 = \frac{1}{n-1}\|X_c\|^2$ give:

Correlation Formula
If X and Y are sample vectors, then

$$r = r(X, Y) = \frac{X \cdot Y - n\bar{x}\bar{y}}{(n-1)\, s_x s_y}.$$

Finally, we give a method that simplifies the computations of variances and correlations.

Data Scaling
Let $X = [\, x_1 \quad x_2 \quad \cdots \quad x_n \,]^T$ and $Y = [\, y_1 \quad y_2 \quad \cdots \quad y_n \,]^T$ be sample vectors. Given constants a, b, c, and d, consider new samples $Z = [\, z_1 \quad z_2 \quad \cdots \quad z_n \,]^T$ and $W = [\, w_1 \quad w_2 \quad \cdots \quad w_n \,]^T$ where $z_i = a + bx_i$ for each i and $w_i = c + dy_i$ for each i. Then:
(a) $\bar{z} = a + b\bar{x}$.
(b) $s_z^2 = b^2 s_x^2$, so $s_z = |b| \, s_x$.
(c) If b and d have the same sign, then $r(X, Y) = r(Z, W)$.
The verification is left as an exercise.
For example, if $X = [\, 101 \quad 98 \quad 103 \quad 99 \quad 100 \quad 97 \,]^T$, subtracting 100 yields $Z = [\, 1 \quad -2 \quad 3 \quad -1 \quad 0 \quad -3 \,]^T$. A routine calculation shows that $\bar{z} = -\frac{1}{3}$ and $s_z^2 = \frac{14}{3}$, so $\bar{x} = 100 - \frac{1}{3} = 99.67$, and $s_x^2 = \frac{14}{3} = 4.67$.

Exercises 5.6
1. The following table gives IQ scores for 10 fathers and their eldest sons. Calculate the means, the variances, and the correlation coefficient r. (The data scaling formula is useful.)

               1    2    3    4    5    6    7    8    9    10
Father's IQ    140  131  120  115  110  106  100  95   91   86
Son's IQ       130  138  110  99   109  120  105  99   100  94
2. The following table gives the number of years of education and the annual income (in thousands) of 10 individuals. Find the means, the variances, and the correlation coefficient. (Again the data scaling formula is useful.)

Individual              1   2   3   4   5   6   7   8   9   10
Years of education      12  16  13  18  19  12  18  19  12  14
Yearly income (1000s)   31  48  35  28  55  40  39  60  32  35

3. If X is a sample vector, and $X_c$ is the centred sample, show that $\bar{x}_c = 0$ and the standard deviation of $X_c$ is $s_x$.
4. Prove the data scaling formulas (a), (b), and (c) stated under Data Scaling above.

SECTION 5.7 An Application to Least Squares Approximation
In many scientific investigations, data are collected that relate two variables. For example, if x is the number of dollars spent on advertising by a manufacturer and y is the value of sales in the region in question, the manufacturer could generate data by spending $x_1, x_2, \ldots, x_n$ dollars at different times and measuring the corresponding sales values $y_1, y_2, \ldots, y_n$.
Suppose it is known that a linear relationship exists between the variables x and y; in other words, that $y = a + bx$ for some constants a and b. If the data are plotted, the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ may appear to lie on a straight line and estimating a and b requires finding the "best-fitting" line through these data points. For example, if five data points occur as shown in Figure 5.1, line 1 is clearly a better fit than line 2. In general, the problem is to find the values of the constants a and b such that the line $y = a + bx$ best approximates the data in question. Note that an exact fit would be obtained if a and b were such that $y_i = a + bx_i$ were true for each data point $(x_i, y_i)$. But this is too much to expect. Experimental errors in measurement are bound to occur, so the choice of a and b should be made in such a way that the errors between the observed values $y_i$ and the corresponding fitted values $a + bx_i$ are in some sense minimized.

[Figure 5.1: five data points $(x_1, y_1), \ldots, (x_5, y_5)$ in the XY-plane with two candidate lines, Line 1 and Line 2]

The first thing we must do is explain exactly what we mean by the best fit of a line $y = a + bx$ to an observed set of data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. For convenience, write the linear function $r + sx$ as

$$f(x) = r + sx$$

so that the fitted points (on the line) have coordinates $(x_1, f(x_1)), \ldots, (x_n, f(x_n))$.
Figure 5.2 is a sketch of what the line $y = f(x)$ might look like. For each i the observed data point $(x_i, y_i)$ and the fitted point $(x_i, f(x_i))$ need not be the same, and the distance $d_i$ between them measures how far the line misses the observed point. For this reason $d_i$ is often called the error at $x_i$, and a natural measure of how close the line $y = f(x)$ is to the observed data points is the sum $d_1 + d_2 + \cdots + d_n$ of all these errors. However, it turns out to be better to use the sum of squares

$$S = d_1^2 + d_2^2 + \cdots + d_n^2$$

as the measure of error, and the line $y = f(x)$ is to be chosen so as to make this sum as small as possible. This line is said to be the least squares approximating line for the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

[Figure 5.2: the line $y = f(x)$ in the XY-plane, showing the observed points $(x_i, y_i)$, the fitted points $(x_i, f(x_i))$, and the errors $d_1$, $d_i$, $d_n$ at $x_1$, $x_i$, $x_n$]

The square of the error $d_i$ is given by $d_i^2 = [y_i - f(x_i)]^2$ for each i, so the quantity S to be minimized is the sum:

$$S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + \cdots + [y_n - f(x_n)]^2.$$
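To make the quantity S concrete, the sum of squared errors can be evaluated for any candidate line $f(x) = r + sx$. The sketch below is plain Python; the four data points and the two candidate lines are hypothetical values chosen only for illustration, and the function name `sum_squared_errors` is not from the text.

```python
def sum_squared_errors(xs, ys, r, s):
    """S = sum over i of [y_i - f(x_i)]^2 for the line f(x) = r + s*x."""
    return sum((y - (r + s * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points, chosen only for illustration.
xs = [0, 1, 2, 3]
ys = [1, 3, 4, 7]

# Candidate line 1: f(x) = 1 + 2x gives errors 0, 0, -1, 0.
S_line1 = sum_squared_errors(xs, ys, r=1, s=2)   # 1

# Candidate line 2: f(x) = 3 (horizontal) gives errors -2, 0, 1, 4.
S_line2 = sum_squared_errors(xs, ys, r=3, s=0)   # 21

print(S_line1, S_line2)
```

The smaller value of S for the first candidate is what "better fit" means in the least squares sense; the least squares approximating line is the choice of r and s for which S is smallest among all lines.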
Note that all the numbers $x_i$ and $y_i$ are given here: what is required is that the function f be chosen in such a way as to minimize S. Because $f(x) = r + sx$, this amounts to choosing r and s so as to minimize S, and the problem can be solved using vector techniques. The following notations simplify the discussion.

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad Z = \begin{bmatrix} r \\ s \end{bmatrix}$$

Observe that

$$Y - MZ = \begin{bmatrix} y_1 - (r + sx_1) \\ y_2 - (r + sx_2) \\ \vdots \\ y_n - (r + sx_n) \end{bmatrix} = \begin{bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{bmatrix}$$

so the quantity S to be minimized is

$$S = \|Y - MZ\|^2.$$

Here Y and M are given and we are asked to find Z such that the length of the vector $Y - MZ$ is as small as possible. To this end, consider the set U of all vectors MZ where Z varies:

$$U = \left\{ MZ \;\middle|\; Z = \begin{bmatrix} r \\ s \end{bmatrix} \text{ arbitrary} \right\} = \left\{ \begin{bmatrix} r + sx_1 \\ r + sx_2 \\ \vdots \\ r + sx_n \end{bmatrix} \;\middle|\; r \text{ and } s \text{ arbitrary} \right\}.$$

This is a subspace of $\mathbb{R}^n$, and the task is to choose MZ in U as close as possible to Y.
If n = 3 and $x_1$, $x_2$ and $x_3$ are distinct, U is the plane containing $\vec{v} = [\, 1 \quad 1 \quad 1 \,]^T$ and $\vec{x} = [\, x_1 \quad x_2 \quad x_3 \,]^T$ with normal $\vec{x} \times \vec{v} = [\, x_2 - x_3 \quad x_3 - x_1 \quad x_1 - x_2 \,]^T$. In this case, we look for $\vec{a} = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$ so that $\vec{y} - M\vec{a}$ is orthogonal to every vector $M\vec{z}$ in the plane U, as in Figure 5.3.

[Figure 5.3: the plane U containing $M\vec{z}$, with the vector $\vec{y} - M\vec{a}$ orthogonal to U]

This condition¹⁸ makes sense in $\mathbb{R}^n$ so we look for $A = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$ such that $(MZ) \cdot (Y - MA) = 0$ for all Z in $\mathbb{R}^2$. This dot product is in $\mathbb{R}^n$, and it can be written as a dot product in $\mathbb{R}^2$:

$$0 = (MZ) \cdot (Y - MA) = Z^T [M^T (Y - MA)] = Z \cdot (M^T Y - M^T M A)$$

for all Z in $\mathbb{R}^2$. This means that $M^T Y - M^T M A$ is orthogonal to every vector in $\mathbb{R}^2$. In particular, it is orthogonal to itself, and so must be zero, and we obtain

$$(M^T M) A = M^T Y.$$

These are called the normal equations for A, and can be solved using gaussian elimination. Moreover, if at least two of the $x_i$ are distinct, the matrix $M^T M$ can be shown to be invertible, so the solution A is unique. If the solution is $A = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$, the best fitting line is $y = a_0 + a_1 x$. This proves the following useful theorem.

18 We will revisit this in Chapter 8 where a more rigorous argument will be given.
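Theorem 1
Suppose that n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are given, where at least two of $x_1, x_2, \ldots, x_n$ are distinct. Put

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.$$

Then the least squares approximating line for these data points has the equation

$$y = a_0 + a_1 x$$

where $A = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$ is found by gaussian elimination from the normal equations

$$(M^T M) A = M^T Y.$$

The condition that at least two of $x_1, x_2, \ldots, x_n$ are distinct ensures that $M^T M$ is an invertible matrix, so A is unique:

$$A = (M^T M)^{-1} M^T Y.$$

Example 1
Let data points $(x_1, y_1), (x_2, y_2), \ldots, (x_5, y_5)$ be given as in the accompanying table. Find the least squares approximating line for these data.

x   y
1   1
3   2
4   3
6   4
7   5

Solution In this case we have

$$M^T M = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_5 \end{bmatrix} \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_5 \end{bmatrix} = \begin{bmatrix} 5 & x_1 + \cdots + x_5 \\ x_1 + \cdots + x_5 & x_1^2 + \cdots + x_5^2 \end{bmatrix} = \begin{bmatrix} 5 & 21 \\ 21 & 111 \end{bmatrix}$$

and

$$M^T Y = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_5 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_5 \end{bmatrix} = \begin{bmatrix} y_1 + y_2 + \cdots + y_5 \\ x_1 y_1 + x_2 y_2 + \cdots + x_5 y_5 \end{bmatrix} = \begin{bmatrix} 15 \\ 78 \end{bmatrix},$$

so the normal equations $(M^T M) A = M^T Y$ for $A = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$ become

$$\begin{bmatrix} 5 & 21 \\ 21 & 111 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 15 \\ 78 \end{bmatrix}.$$

The solution (using gaussian elimination) is $A = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 0.24 \\ 0.66 \end{bmatrix}$ to two decimal places, so the least squares approximating line for these data is $y = 0.24 + 0.66x$. Note that $M^T M$ is indeed invertible here (the determinant is 114), and the exact solution is

$$A = (M^T M)^{-1} M^T Y = \frac{1}{114} \begin{bmatrix} 111 & -21 \\ -21 & 5 \end{bmatrix} \begin{bmatrix} 15 \\ 78 \end{bmatrix} = \frac{1}{114} \begin{bmatrix} 27 \\ 75 \end{bmatrix} = \frac{1}{38} \begin{bmatrix} 9 \\ 25 \end{bmatrix}.$$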
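The computation in Example 1 can be reproduced exactly with rational arithmetic. The sketch below is plain Python using the standard `fractions` module; it forms the entries of $M^T M$ and $M^T Y$ for a line fit and solves the 2 × 2 normal equations by the explicit inverse (Cramer's rule) rather than by gaussian elimination, which is a substitution made here for brevity. The function name `least_squares_line` is illustrative and not from the text.

```python
from fractions import Fraction

def least_squares_line(xs, ys):
    """Solve (M^T M) A = M^T Y for the line y = a0 + a1*x.

    For M with rows [1, x_i], M^T M = [[n, sum x], [sum x, sum x^2]]
    and M^T Y = [sum y, sum x*y].
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of M^T M
    if det == 0:
        raise ValueError("need at least two distinct x values")
    a0 = Fraction(sxx * sy - sx * sxy, det)
    a1 = Fraction(n * sxy - sx * sy, det)
    return a0, a1

# Data of Example 1.
xs = [1, 3, 4, 6, 7]
ys = [1, 2, 3, 4, 5]

a0, a1 = least_squares_line(xs, ys)
print(a0, a1)                                    # prints 9/38 25/38
print(round(float(a0), 2), round(float(a1), 2))  # prints 0.24 0.66
```

The `ValueError` branch corresponds to the hypothesis of Theorem 1: if all the $x_i$ coincide, the determinant of $M^T M$ is zero and the normal equations do not have a unique solution.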
Suppose now that, rather than a straight line, we want to find the parabola $y = a_0 + a_1 x + a_2 x^2$ that is the least squares approximation to the data points $(x_1, y_1), \ldots, (x_n, y_n)$. In the function $f(x) = a_0 + a_1 x + a_2 x^2$, the three constants $a_0$, $a_1$, and $a_2$ must be chosen to minimize the sum of squares of the errors:

$$S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + \cdots + [y_n - f(x_n)]^2.$$

Choosing $a_0$, $a_1$, and $a_2$ amounts to choosing the (parabolic) function f that minimizes S.
In general, there is a relationship $y = f(x)$ between the variables, and the range of candidate functions is limited, say, to all lines or to all parabolas. The task is to find, among the suitable candidates, the function that makes the quantity S as small as possible. The function that does so is called the least squares approximating function (of that type) for the data points.
As might be imagined, this is not always an easy task. However, if the functions f(x) are restricted to polynomials of degree m,

$$f(x) = a_0 + a_1 x + \cdots + a_m x^m,$$

the analysis proceeds much as before (where m = 1). The problem is to choose the numbers $a_0, a_1, \ldots, a_m$ so as to minimize the sum

$$S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + \cdots + [y_n - f(x_n)]^2.$$

The resulting function $y = f(x) = a_0 + a_1 x + \cdots + a_m x^m$ is called the least squares approximating polynomial of degree m for the data $(x_1, y_1), \ldots, (x_n, y_n)$. By analogy with the preceding analysis, define

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \qquad A = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}.$$

Then

$$Y - MA = \begin{bmatrix} y_1 - (a_0 + a_1 x_1 + \cdots + a_m x_1^m) \\ y_2 - (a_0 + a_1 x_2 + \cdots + a_m x_2^m) \\ \vdots \\ y_n - (a_0 + a_1 x_n + \cdots + a_m x_n^m) \end{bmatrix} = \begin{bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{bmatrix}$$

so S is the sum of the squares of the entries of $Y - MA$. An analysis similar to that for Theorem 1 can be used to prove Theorem 2.

Theorem 2
Suppose n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are given, where at least m + 1 of $x_1, x_2, \ldots, x_n$ are distinct (in particular $n \ge m + 1$). Put
$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}.$$

Then the least squares approximating polynomial of degree m for the data points has the equation

$$y = a_0 + a_1 x + \cdots + a_m x^m$$

where $A = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}$ is found by gaussian elimination from the normal equations

$$(M^T M) A = M^T Y.$$

The condition that at least m + 1 of $x_1, x_2, \ldots, x_n$ be distinct ensures that the matrix $M^T M$ is invertible, so A is unique:

$$A = (M^T M)^{-1} M^T Y.$$

A proof of this theorem is given in Section 8.7 (Theorem 2).

Example 2
Find the least squares approximating quadratic $y = a_0 + a_1 x + a_2 x^2$ for the following data points.
(−3, 3), (−1, 1), (0, 1), (1, 2), (3, 4)
Solution This is an instance of Theorem 2 with m = 2. Here

$$Y = \begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \\ 4 \end{bmatrix}, \qquad M = \begin{bmatrix} 1 & -3 & 9 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 3 & 9 \end{bmatrix}.$$

Hence,

$$M^T M = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -3 & -1 & 0 & 1 & 3 \\ 9 & 1 & 0 & 1 & 9 \end{bmatrix} \begin{bmatrix} 1 & -3 & 9 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 3 & 9 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 20 \\ 0 & 20 & 0 \\ 20 & 0 & 164 \end{bmatrix},$$

$$M^T Y = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -3 & -1 & 0 & 1 & 3 \\ 9 & 1 & 0 & 1 & 9 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 11 \\ 4 \\ 66 \end{bmatrix}.$$
The normal equations for A are

$$\begin{bmatrix} 5 & 0 & 20 \\ 0 & 20 & 0 \\ 20 & 0 & 164 \end{bmatrix} A = \begin{bmatrix} 11 \\ 4 \\ 66 \end{bmatrix}, \quad \text{whence} \quad A = \begin{bmatrix} 1.15 \\ 0.20 \\ 0.26 \end{bmatrix}.$$

This means that the least squares approximating quadratic for these data is $y = 1.15 + 0.20x + 0.26x^2$.
Least squares approximation can be used to estimate physical constants, as is illustrated by the next example.

Example 3
Hooke's law in mechanics asserts that the magnitude of the force f required to hold a spring is a linear function of the extension e of the spring (see the accompanying diagram). That is,

$$f = ke + e_0$$

where k and $e_0$ are constants depending only on the spring. The following data were collected for a particular spring.

e   9   11  12  16  19
f   33  38  43  54  61

[Diagram: a spring with extension e held by a force f]

Find the least squares approximating line $f = a_0 + a_1 e$ for these data, and use it to estimate k.
Solution Here f and e play the role of y and x in the general theory. We have

$$Y = \begin{bmatrix} 33 \\ 38 \\ 43 \\ 54 \\ 61 \end{bmatrix}, \qquad M = \begin{bmatrix} 1 & 9 \\ 1 & 11 \\ 1 & 12 \\ 1 & 16 \\ 1 & 19 \end{bmatrix}$$

as in Theorem 1, so

$$M^T M = \begin{bmatrix} 5 & 67 \\ 67 & 963 \end{bmatrix} \quad \text{and} \quad M^T Y = \begin{bmatrix} 229 \\ 3254 \end{bmatrix}.$$

Hence the normal equations for A are

$$\begin{bmatrix} 5 & 67 \\ 67 & 963 \end{bmatrix} A = \begin{bmatrix} 229 \\ 3254 \end{bmatrix}, \quad \text{whence} \quad A = \begin{bmatrix} 7.70 \\ 2.84 \end{bmatrix}.$$

The least squares approximating line is $f = 7.70 + 2.84e$, so the estimate for k is $k = 2.84$.

Exercises 5.7
1. Find the least squares approximating line $y = a_0 + a_1 x$ for each of the following sets of data points.
(a) (1, 1), (3, 2), (4, 3), (6, 4)
(b) (2, 4), (4, 3), (7, 2), (8, 1)
(c) (1, 1), (0, 1), (1, 2), (2, 4), (3, 6)
(d) (2, 3), (1, 1), (0, 0), (1, 2), (2, 4)
2. Find the least squares approximating quadratic $y = a_0 + a_1 x + a_2 x^2$ for each of the following sets of data points.
(a) (0, 1), (2, 2), (3, 3), (4, 5)
(b) (2, 1), (0, 0), (3, 2), (4, 3)
3. If M is a square invertible matrix, show that $A = M^{-1} Y$ (in the notation of Theorem 2).
4. Newton's laws of motion imply that an object dropped from rest at a height of 100 metres will be at a height $s = 100 - \frac{1}{2} g t^2$ metres t seconds later, where g is a constant called the acceleration due to gravity. The values of s and t given in the table are observed. Write $x = t^2$, find the least squares approximating line $s = a + bx$ for these data, and use b to estimate g. Then find the least squares approximating quadratic $s = a_0 + a_1 t + a_2 t^2$ and use the value of $a_2$ to estimate g.

t   1   2   3
s   95  80  56

5. A naturalist measured the heights $y_i$ (in metres) of several spruce trees with trunk diameters $x_i$ (in centimetres). The data are as given in the table. Find the least squares approximating line for these data and use it to estimate the height of a spruce tree with a trunk of diameter 10 cm.

$x_i$   5   7    8   12   13   16
$y_i$   2   3.3  4   7.3  7.9  10.1

6. (a) Use m = 0 in Theorem 2 to show that the best-fitting horizontal line $y = a_0$ through the data points $(x_1, y_1), \ldots, (x_n, y_n)$ is $y = \frac{1}{n}(y_1 + y_2 + \cdots + y_n)$, the average of the y coordinates.
(b) Deduce the conclusion in (a) without using Theorem 2.
7. Assume n = m + 1 in Theorem 2 (so M is square). If the $x_i$ are distinct, use Theorem 6 §3.2 to show that M is invertible. Deduce that $A = M^{-1} Y$ and that the least squares polynomial is the interpolating polynomial (Theorem 6 §3.2) and actually passes through all the data points.
Supplementary Exercises for Chapter 5
1. In each case either show that the statement is true or give an example showing that it is false. Throughout, $X, Y, Z, X_1, X_2, \ldots, X_n$ denote vectors in $\mathbb{R}^n$.
(a) If U is a subspace of $\mathbb{R}^n$ and X + Y is in U, then X and Y are both in U.
(b) If U is a subspace of $\mathbb{R}^n$ and rX is in U, then X is in U.
(c) If U is a nonempty set and sX + tY is in U for any s and t whenever X and Y are in U, then U is a subspace.
(d) If U is a subspace of $\mathbb{R}^n$ and X is in U, then −X is in U.
(e) If {X, Y} is independent, then {X, Y, X + Y} is independent.
(f) If {X, Y, Z} is independent, then {X, Y} is independent.
(g) If {X, Y} is not independent, then {X, Y, Z} is not independent.
(h) If all of $X_1, X_2, \ldots, X_n$ are nonzero, then $\{X_1, X_2, \ldots, X_n\}$ is independent.
(i) If one of $X_1, X_2, \ldots, X_n$ is zero, then $\{X_1, X_2, \ldots, X_n\}$ is not independent.
(j) If aX + bY + cZ = 0 where a, b, and c are in $\mathbb{R}$, then {X, Y, Z} is independent.
(k) If {X, Y, Z} is independent, then aX + bY + cZ = 0 for some a, b, and c in $\mathbb{R}$.
(l) If $\{X_1, X_2, \ldots, X_n\}$ is not independent, then $t_1 X_1 + t_2 X_2 + \cdots + t_n X_n = 0$ for $t_i$ in $\mathbb{R}$ not all zero.
(m) If $\{X_1, X_2, \ldots, X_n\}$ is independent, then $t_1 X_1 + t_2 X_2 + \cdots + t_n X_n = 0$ for some $t_i$ in $\mathbb{R}$.
(n) Every set of four nonzero vectors in $\mathbb{R}^4$ is a basis.
(o) No basis of $\mathbb{R}^3$ can contain a vector with a component 0.
(p) $\mathbb{R}^3$ has a basis of the form {X, X + Y, Y} where X and Y are vectors.
(q) Every basis of $\mathbb{R}^5$ contains one column of $I_5$.
(r) Every nonempty subset of a basis of $\mathbb{R}^3$ is again a basis of $\mathbb{R}^3$.
(s) If $\{X_1, X_2, X_3, X_4\}$ and $\{Y_1, Y_2, Y_3, Y_4\}$ are bases of $\mathbb{R}^4$, then $\{X_1 + Y_1, X_2 + Y_2, X_3 + Y_3, X_4 + Y_4\}$ is also a basis of $\mathbb{R}^4$.