10 tayangan

Diunggah oleh bhameed-1958

- Vector Algebra MCQ
- IIT QUIZ
- Artin Algebra CheatSheet
- 1971-OntheCalculationofConsistentStressDistributionsinFiniteElementApproximations
- LL~PresentationMME120913
- Comm Ch06 Signal Space En
- 020 Advanced Engineering Mathematics
- Course Outline WS 2017
- Modal Analysis Lecture Notes
- Theorist's Toolkit Lecture 9: High Dimensional Geometry (continued) and VC-dimension
- P. Dorbec and A. Gajardo- Langton’s Fly
- Linear Algebra Answers
- Additive
- Assignment Week9
- maths 1.pdf
- tt18
- Vector Tensors
- aggregationmaterialbalances.pdf
- BS Mechanical Engineering Curricula PIEAS 2013.pdf
- toth

Anda di halaman 1dari 53

n

5

SECTION 5.1 Subspaces and Spanning

In Section 2.5 we introduced the set

n

of all n 1 columns, investigated the linear

transformations

n

m

, and showed that they are all given by left multiplication

by an m n matrix. Particular attention was paid to the euclidean plane

2

and to

euclidean space

3

, where geometric transformations like rotations and reflections

were shown to be matrix transformations. We returned to this in Section 4.4 where

projections in

2

or

3

were also shown to be matrix transformations, and where

determinants were related to areas and volumes.

In this chapter we investigate

n

in full generality, and introduce some of the

most important concepts and methods in linear algebra. While the n-tuples in

n

can be written as rows or as columns, we will primarily denote them as column

matrices X, Y, etc. The main exception is that the geometric vectors in

2

and

3

will be written as , , etc.

Subspaces of

n

A set

1

U of vectors in

n

is called a subspace of

n

if it satisfies the following

properties:

S1. The zero vector 0 is in U.

S2. If X and Y are in U, then X + Y is also in U.

S3. If X is in U, then aX is in U for every real number a.

We say that the subset U is closed under addition if S2 holds, and that U is closed

under scalar multiplication if S3 holds.

Clearly

n

is a subspace of itself. The set U {0}, consisting of only the zero

vector, is also a subspace because 0 + 0 0 and a0 0 for each a in ; it is called the

zero subspace. Any subspace of

n

other than {0} or

n

is called a proper

subspace.

w

1 We use the language of sets. Informally, a set X is a collection of objects, called the elements of the set. Two sets X and Y are

called equal (written X Y) if they have the same elements. If every element of X is in the set Y, we say that X is a subset

of Y, and write X

Y. Hence both X

Y and Y

X if and only if X Y.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 197

198 Chapter 5 The Vector Space

n

We saw in Section 4.2 that every plane M through the origin in

3

has equation

ax + by + cz 0 where a, b, and c are not all zero. Here is a normal to

the plane and

where and denotes the dot product introduced in Section 4.2.

2

Then Mis a subspace of

3

. Indeed we show that Msatisfies S1, S2, and S3 as follows:

S1. is in M because ;

S2. If and are in M, then , so is in M;

S3. If is in M, then , so is in M.

This proves the first part of

Example 1

Planes and lines through the origin in

3

are all subspaces of

3

.

Solution We dealt with planes above. If L is a line through the origin with direction

vector then . We leave it as an exercise to verify that L

satisfies S1, S2, and S3.

Example 1 shows that lines through the origin in

2

are subspaces; in fact, they are

the only proper subspaces of

2

(Exercise 24). Indeed, we shall see in Example 11

5.2 that lines and planes through the origin in

3

are the only proper subspaces of

3

. Thus the geometry of lines and planes through the origin is captured by the

subspace concept. (Note that every line or plane is just a translation of one of these.)

Subspaces can also be used to describe important features of an m n matrix A.

The null space of A, denoted null A, and the image space of A, denoted im A, are

defined by

.

In the language of Chapter 2, null A consists of all solutions X in

n

of the

homogeneous system AX 0, and im A is the set of all vectors Y in

m

such that

AX Y has a solution for some X.

Note that X is in null A if it satisfies the condition AX 0, while im A consists of

vectors of the form AX for some X in

n

. These two ways to describe subsets occur

frequently.

Example 2

If A is an m n matrix, then:

1. null A is a subspace of

n

.

2. im A is a subspace of

m

.

Solution 1. The zero vector 0 in

n

lies in null A because A0 0.

3

If X and X

1

are in null A, then X + X

1

and aX are in null A because they satisfy the

required condition:

null { } im { } A X AX A AX X

n n

in and in

0

L td t { }

in d

,

av

n av a n v a

i ( ) ( ) ( ) i 0 0 v

v v

+

1

n v v n v n v

i i i ( ) + + +

1 1

0 0 0 v

1

i 0 0 0

n v

i v x y z

T

[ ]

M v n v { }

in

3

i 0

n a b c

T

[ ]

2 We are using set notation here. In general {q | p } means the set of all objects q with property p.

3 We are using 0 to represent the zero vector in both

m

and

n

. This abuse of notation is common, and causes no confusion

once everyone knows that it is going on.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 198

199 Section 5.1 Subspaces and Spanning

4 The vector n

is nonzero because v

and w

Hence null A satisfies S1, S2, and S3, and so is a subspace of

n

.

2. The zero vector 0 in

m

lies in im A because 0 A0. Suppose that Y and

Y

1

are in im A, say Y AX and Y

1

AX

1

where X and X

1

are in

n

. Then

show that Y + Y

1

and aY are both in im A (they have the required form).

Hence im A is a subspace of

m

.

There are other important subspaces associated with a matrix A that clarify basic

properties of A. If A is an m n matrix and is any number, let

.

A vector X is in E

, so Example 2 gives:

Example 3

E

n

for each n n matrix A and number .

E

(A) is called the eigenspace of A corresponding to . The reason for the name is

that, in the terminology of Section 3.3, is an eigenvalue of A if E

(A) {0}.

In this case the nonzero vectors in E

corresponding to .

The reader should not get the impression that every subset of

n

is a subspace.

For example:

Hence neither U

1

nor U

2

is a subspace of

2

. (However, see Exercise 20.)

Spanning Sets

Let and be two nonzero, nonparallel vectors in

3

with their tails at the origin.

The plane M through the origin containing these vectors is described in Section 4.2

by saying that is a normal for M, and that M consists of all vectors such

that .

4

While this is a very useful way to look at planes, there is another

approach that is at least as useful in

3

and, more importantly, works for all subspaces

of

n

.

The idea is as follows: Observe that, by the diagram, a vector is in M if and

only if it has the form

for certain real numbers a and b (we say that p

is a linear combination of v

and w

).

p av bw

+

p

n p

i 0

p

n v w

w

U

x

y

x

U

x

y

x y

1

2

2

0

1

]

1

1

]

1

22

E A X AX X

n

( ) { } in

Y Y AX AX A X X aY a AX A aX + + +

1 1 1

( ) ( ) ( ) and

A X X AX AX A aX a AX a ( ) ( ) ( ) . + + +

1 1

0 0 0 0 0 and

a

b

M

p

Hence we can describe M as

and we say that is a spanning set for M. It is this notion of a spanning set that

provides a way to describe all subspaces of

n

.

Given vectors X

1

, X

2

, , X

k

in

n

, a vector of the form

is called a linear combination of the X

i

, and t

i

is called the coefficient of X

i

in the

linear combination. The set of all such linear combinations is called the span of the

X

i

and is denoted

.

Thus span{X, Y} {sX + tY

s, t in }, and span{X} {tX

t in }.

In particular, the above discussion shows that, if and are two nonzero,

nonparallel vectors in

3

, then

M span

is the plane in

3

containing and . Moreover, if is any nonzero vector in

3

(or

2

), then

is the line with direction vector . Hence lines and planes can both be described in

terms of spanning sets.

Example 4

Let and in

4

. Determine whether

or is in U span{X, Y}.

Solution The vector P is in U if and only if P sX + tY for scalars s and t. Equating

components gives equations

.

This linear system has solution s 3 and t 2, so P is in U. On the other

hand, Q sX + tY leads to equations

and this system has no solution. So Q does not lie in U.

Theorem 1

Let in

n

. Then:

1. U is a subspace of

n

containing each X

i

.

2. If Wis a subspace of

n

and each X

i

is in W, then U

W.

U X X X

k

span , , , { }

1 2

2 3 2 4 3 2 1 2 s t s t s t s t + + + , , , and

2 3 0 4 11 2 8 1 s t s t s t s t + + + , , , and

Q

T

[ ] 2 3 1 2 P

T

[ ] 0 11 8 1

Y

T

[ ] 3 4 1 1 X

T

[ ] 2 1 2 1

d

L v td t span{ } { }

in

d

{ , } v w

w

span{ , , , } { } X X X t X t X t X t

k k k i 1 2 1 1 2 2

+ + + in

t X t X t X t

k k i 1 1 2 2

+ + + where the are scalars

{ , } v w

M av bw a b + { , }

in

5

200 Chapter 5 The Vector Space

n

5 In particular, this implies that any vector p

orthogonal to v

av

+ bw

of v

and w

for

some a and b. Can you prove this directly?

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 200

PROOF

Write

1. The zero vector 0 is in U because is a linear

combination of the X

i

. If and

are in U, then X + Y and aX are in U because

Hence S1, S2, and S3 are satisfied for U, proving (1).

2. Let where the t

i

are scalars and each X

i

is in

W. Then each t

i

X

i

is in Wbecause Wsatisfies S3. But then X is in W

because Wsatisfies S2 (verify). This proves (2).

Condition (2) in Theorem 1 can be expressed by saying that is

the smallest subspace of

n

that contains each X

i

. Here is an example of how it is used.

If we say that the vectors span the

subspace U.

Example 5

If X and Y are in

n

, show that .

Solution Since both X + Y and X Y are in span{X, Y }, Theorem 1 gives

But and are both in

span{X + Y, X Y}, so

again by Theorem 1. Thus span{X, Y} span{X + Y, X Y}.

It turns out that many important subspaces are best described by giving a

spanning set. Here are three examples, beginning with an important spanning set

for

n

itself. Column j of the n n identity matrix I

n

is denoted E

j

and called the

jth coordinate vector in

n

, and the set is called the standard basis

of

n

. If is any vector in

n

, then

as the reader can verify. This proves:

Example 6

.

If A is an m n matrix A, the next two examples show that it is a routine

matter to find spanning sets for null A and im A.

Example 7

Given an m n matrix A, let X

1

, X

2

, , X

k

denote the basic solutions to the

system AX 0 given by the gaussian algorithm. Then

. null span{

2

A X X X

k

1

, , , }

n

k

E E E span{ , , , }

1 2

X x E x E x E

n n

+ + +

1 1 2 2

X x x x

n

T

[ ]

1 2

{ , , , } E E E

n 1 2

span span { , } { , } X Y X Y X Y

+

Y X Y X Y +

1

2

( ) ( )

1

2

X X Y X Y + +

1

2

1

2

( ) ( )

span span { , } { , }. X Y X Y X Y +

span span { } { } X Y X Y X Y , , +

X X X

k 1 2

, , , U X X X

k

span , , , { }

1 2

span{ } X X X

k 1 2

, , ,

X t X t X t X

k k

+ + +

1 1 2 2

X Y s t X s t X s t X

aX at X at X

k k k

+ + + + + + +

+ +

( ) ( ) ( ) and

( ) ( )

1 1 1 2 2 2

1 1 2 2

,

+ ( ) at X

k k

.

Y s X s X s X

k k

+ + +

1 1 2 2

X t X t X t X

k k

+ + +

1 1 2 2

0 0 0 0

1 2

+ + + X X X

k

U X X X

k

span , , , { } for convenience.

1 2

201 Section 5.1 Subspaces and Spanning

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 201

Solution If X is in null A, then AX 0 so Theorem 3 2.2 shows that X is a linear

combination of the basic solutions; that is, null A

span{X

1

, X

2

, , X

k

}. On

the other hand, if X is in span{X

1

, X

2

, , X

k

}, then X t

1

X

1

+ t

2

X

2

+

+ t

k

X

k

for scalars t

i

, so

.

This shows that X is in null A, and hence that span{X

1

, X

2

, , X

k

}

null A.

Thus we have equality.

Example 8

Let C

1

, C

2

, , C

n

denote the columns of the m n matrix A. Then

im A span{C

1

, C

2

, , C

n

}.

Solution Observe first that AE

j

C

j

for each j where E

j

is the jth coordinate vector in

n

.

Hence each C

j

is in im A, and so span{C

1

, C

2

, , C

n

}

im A by Theorem 1.

Conversely, let Y be a vector in im A, say Y AX for some X in

n

.

If X [x

1

x

2

x

n

]

T

, then Theorem 4 2.2 gives

so Y is in span{C

1

, C

2

, , C

n

}. Hence im A

span{C

1

, C

2

, , C

n

}, and the

result is proved.

Exercises 5.1

Y AX C C C

x

x

x

x C x C x C

n

n

n n

1

]

1

1

1

1

+ + + [ ]

1 2

1

2

1 1 2 2

AX t AX t AX t AX t t t

k k k

+ + + + + +

1 1 2 2 1 2

0 0 0 0

202 Chapter 5 The Vector Space

n

1. In each case determine whether U is a subspace

of

3

. Support your answer.

(a) U {[1 s t]

T

s and t in }.

(b) U {[0 s t]

T

s and t in }.

(c) U {[r s t]

T

r, s, and t in , r + 3s + 2t 0}.

(d) U {[r 3s r 2]

T

r and s in }.

(e) U {[r 0 s]

T

r

2

+ s

2

0, r and s in }.

(f ) U {[2r s

2

t]

T

r, s, and t in }.

2. In each case determine if X lies in U span{Y, Z}.

If X is in U, write it as a linear combination of Y

and Z; if X is not in U, show why not.

(a) X [2 1 0 1]

T

, Y [1 0 0 1]

T

, and

Z [0 1 0 1]

T

.

(b) X [1 2 15 11]

T

, Y [2 1 0 2]

T

, and

Z [1 1 3 1]

T

.

(c) X [8 3 13 20]

T

, Y [2 1 3 5]

T

, and

Z [1 0 2 3]

T

.

(d) X [2 5 8 3]

T

, Y [2 1 0 5]

T

, and

Z [1 2 2 3]

T

.

3. In each case determine if the given vectors

span

4

. Support your answer.

(a) {[1 1 1 1]

T

, [0 1 1 1]

T

, [0 0 1 1]

T

,

[0 0 0 1]

T

}.

(b) {[1 3 5 0]

T

, [2 1 0 0]

T

, [0 2 1 1]

T

,

[1 4 5 0]

T

}.

4. Is it possible that {[1 2 0]

T

, [2 0 3]

T

} can span

the subspace U {[r s 0]

T

r and s in }?

Defend your answer.

5. Give a spanning set for the zero subspace {0}

of

n

.

6. Is

2

a subspace of

3

? Defend your answer.

7. If U span{X, Y, Z} in

n

, show that

U span{X + tZ, Y, Z} for every t in .

8. If U span{X, Y, Z} in

n

, show that

U span{X + Y, Y + Z, Z + X}.

9. If a 0 is a scalar, show that span{aX} span{X}

for every vector X in

n

.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 202

10. If are nonzero scalars, show that

for any vectors X

i

in

n

.

11. If X 0 in

n

, determine all subspaces of

span{X}.

12. Suppose that where

each X

i

is in

n

. If A is an m n matrix and

AX

i

0 for each i, show that AY 0 for every

vector Y in U.

13. If A is an m n matrix, show that, for each

invertible m m matrix U, null(A) null(UA).

14. If A is an m n matrix, show that, for each

invertible n n matrix V, im(A) im(AV ).

15. Let U be a subspace of

n

, and let X be a vector

in

n

.

(a) If aX is in U where a 0 is a number, show

that X is in U.

(b) If Y and X + Y are in U where Y is a vector

in

n

, show that X is in U.

16. In each case either show that the statement is

true or give an example showing that it is false.

(a) If U

n

is a subspace of

n

and X + Y is in

U, then X and Y are both in U.

(b) If U is a subspace of

n

and rX is in U for all

r in , then X is in U.

(c) If U is a subspace of

n

and X is in U, then

X is also in U.

(d) If X is in U and U span{Y, Z}, then

U span{X, Y, Z}.

(e) The empty set of vectors in

n

is a subspace

of

n

.

17. (a) If A and B are m n matrices, show that

U {X in

n

AX BX} is a subspace of

n

.

(b) What if A is m n, B is k n, and m k?

18. Suppose that are vectors in

n

.

If Y a

1

X

1

+ a

2

X

2

+

+ a

k

X

k

where a

1

0, show

that span{X

1

, X

2

, , X

k

} span{Y, X

2

, , X

k

}.

19. If U {0} is a subspace of , show that U .

n

. Show that U

is a subspace if and only if S2 and S3 hold.

21. If S and T are nonempty sets of vectors in

n

,

and if S

T, show that span{S}

span{T}.

22. Let U and Wbe subspaces of

n

. Define their

intersection U Wand their sum U + Was

follows:

U W {X in

n

X belongs to both

U and W}.

U + W {X in

n

X is a sum of a vector in U

and a vector in W}.

(a) Show that U Wis a subspace of

n

.

(b) Show that U + Wis a subspace of

n

.

23. Let P denote an invertible n n matrix.

If is a number, show that

for each n n matrix A. [Here E

eigenvectors of A.]

24. Show that every proper subspace U of

2

is a

line through the origin. [Hint: If is a nonzero

vector in U, let denote the

line with direction vector . If is in U but not

in L, argue geometrically that every vector in

2

is a linear combination of and .] d

L d rd r

{ } in

d

{ is in ( )} P E A X X

E PAP

( )

1

X X X

k 1 2

, , ,

U X X X

k

, , , span{ }

1 2

span span { } { } a X a X a X X X X

k k k 1 1 2 2 1 2

, , , , , ,

a a a

k 1 2

, , ,

203 Section 5.2 Independence and Dimension

SECTION 5.2 Independence and Dimension

Some spanning sets are better than others. If is a subspace

of

n

, then every vector in U can be written as a linear combination of the X

i

in at

least one way. Our interest here is in spanning sets where each vector in U has a

exactly one representation as a linear combination of these vectors.

Linear Independence

Suppose that two linear combinations are equal in

n

:

. r X r X r X s X s X s X

k k k k 1 1 2 2 1 1 2 2

+ + + + + +

U X X X

k

, , , span{ }

1 2

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 203

We are looking for a condition on the set of vectors that guarantees

that this representation is unique; that is, r

i

s

i

for each i. Taking all terms to the left

side gives

.

so the required condition is that this equation forces all the coefficients r

i

s

i

to be zero.

With this in mind, we call a set of vectors linearly independent

(or simply independent) if it satisfies the following condition:

.

We record the result of the above discussion for reference.

Theorem 1

If is an independent set of vectors in

n

, then every vector in

has a unique representation as a linear combination of the X

i

.

It is useful to state the definition of independence in different language. Let us

say that a linear combination vanishes if it equals the zero vector, and call a linear

combination trivial if every coefficient is zero. Then the definition of independence

can be compactly stated as follows:

A set of vectors is independent if and only if the only

linear combination that vanishes is the trivial one.

Hence the procedure for checking that a set of vectors is independent is:

Independence Test

To verify that a set of vectors in

n

is independent, proceed as

follows:

1. Set a linear combination equal to zero: .

2. Show that t

i

0 for each i (that is, the linear combination is trivial).

Of course, if some nontrivial linear combination vanishes, the vectors are not

independent.

Example 1

Determine whether is independent

in

4

.

Solution Suppose a linear combination vanishes:

.

Equating corresponding entries gives a system of four equations:

.

The only solution is the trivial one r s t 0 (verify), so these vectors are

independent by the independence test.

r s t s t r t r s t + + , + , + + 2 0 0 2 2 0 5 0 , and

r s t

T T T T

[ ] [ ] [ ] [ ] 1 0 2 5 2 1 0 1 1 1 2 1 0 0 0 0 + +

{[ ] , [ ] , [ ] } 1 0 2 5 2 1 0 1 1 1 2 1

T T T

t X t X t X

k k 1 1 2 2

0 + + +

{ } X X X

k 1 2

, , ,

span{ , , , } X X X

k 1 2

{ } X X X

k 1 2

, , ,

If then t X t X t X t t t

k k k 1 1 2 2 1 2

0 0 + + +

{ } X X X

k 1 2

, , ,

( ) ( ) ( ) r s X r s X r s X

k k k 1 1 1 2 2 2

0 + + +

{ } X X X

k 1 2

, , ,

204 Chapter 5 The Vector Space

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 204

Example 2

Show that the standard basis of

n

is independent.

Solution We have for all scalars t

i

, so the linear

combination vanishes if and only if each t

i

0. Hence the independence test

applies.

Example 3

If {X, Y} is independent, show that {2X + 3Y, X 5Y} is also independent.

Solution If s(2X + 3Y) + t(X 5Y) 0, collect terms to get (2s + t)X + (3s 5t)Y 0.

Since {X, Y} is independent this combination must be trivial; that is, 2s + t 0

and 3s 5t 0. These equations have only the trivial solution s t 0, as

required.

Example 4

Show that the zero vector in

n

does not belong to any independent set.

Solution Given a set of vectors containing 0, we have a vanishing,

nontrivial linear combination . Hence the set is

not independent.

Example 5

Given X in

n

, show that {X} is independent if and only if X 0.

Solution A vanishing linear combination from {X} takes the form tX 0, t in . This

implies that t 0 because X 0.

A set of vectors in

n

is called linearly dependent (or simply dependent) if it is

not linearly independent, equivalently if some nontrivial linear combination vanishes.

Example 6

If and are nonzero vectors in

3

, show that is dependent if and only

if and are parallel.

Solution If and are parallel, then one is a scalar multiple of the other (Theorem 4

4.1), say for some scalar a. Then the nontrivial linear combination

vanishes, so is dependent. Conversely, if is dependent,

let be nontrivial, say s 0. Then , so X and are parallel

(by Theorem 4 4.1). A similar argument works if t 0.

By Theorem 5 2.3, the following conditions are equivalent for an n n matrix A:

1. A is invertible.

2. If AX 0 where X is in

n

, then X 0.

3. AX B has a solution X for every vector B in

n

.

w

v w

t

s

sv tw

+ 0

{ } v w

, { } v w

, v aw

0

v aw

{ } v w

, w

1 0 0 0 0 0

1 2

+ + + + X X X

k

{ } 0

1 2

, , , , X X X

k

t E t E t E t t t

n n n

T

1 1 2 2 1 2

+ + + [ ]

{ } E E E

k 1 2

, , ,

205 Section 5.2 Independence and Dimension

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 205

While (1) makes no sense if A is not square, conditions (2) and (3) are meaningful

for any matrix A and, in fact, are related to independence and spanning. To see

how, let denote the columns of an m n matrix A. If X [x

1

x

2

x

n

]

T

is a column in

n

, then

()

by Theorem 4 2.2. With this, we get the following theorem:

Theorem 2

If A is an m n matrix, let denote the columns of A.

1. is independent in

m

if and only if AX 0, X in

n

,

implies X 0.

2. if and only if AX B has a solution X for every

vector B in

n

.

PROOF

Write . Then AX 0 means by

(), so (1) follows from the definition of independence. Similarly, () shows that

a vector B in

n

satisfies AX B if and only if X is a linear combination of the

columns C

j

, so (2) follows from the definition of a spanning set.

For a square matrix A, Theorem 2 characterizes the invertibility of A in terms of

the spanning and independence of its columns (see the discussion preceding

Theorem 2). It is important to be able to discuss these notions for rows. If

are 1 n rows, we define to be the set of all

linear combinations of the X

i

(as matrices), and we say that is

linearly independent if the only vanishing linear combination is the trivial one (that

is, if is independent in

n

, as the reader can verify).

6

Theorem 3

The following are equivalent for an n n matrix A:

1. A is invertible.

2. The columns of A are linearly independent.

3. The columns of A span

n

.

4. The rows of A are linearly independent.

5. The rows of A span the set of all 1 n rows.

PROOF

Let denote the columns of A.

(1) (2). By Theorem 5 2.3, A is invertible if and only if AX 0 implies X 0; this

holds if and only if is independent by Theorem 2.

(1) (3). Again by Theorem 5 2.3, A is invertible if and only if AX B has a

solution for every column B in

n

; this holds if and only if

by Theorem 2.

span{ , , , } C C C

n

n

1 2

{ , , , } C C C

n 1 2

C C C

n 1 2

, , ,

{ , , , } X X X

T T

k

T

1 2

{ , , , } X X X

k 1 2

span{ , , , } X X X

k 1 2

X X X

k 1 2

, , ,

x C x C x C

n n 1 1 2 2

0 + + X x x x

n

T

[ ]

1 2

m

n

C C C span{ , , , }

1 2

{ , , , } C C C

n 1 2

{ , , , } C C C

n 1 2

AX x C x C x C

n n

+ +

1 1 2 2

C C C

n 1 2

, , ,

206 Chapter 5 The Vector Space

n

6 It is best to view columns and rows as just two different notations for ordered n-tuples. This discussion will become

redundant in Chapter 6 where we define the general notion of a vector space.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 206

207 Section 5.2 Independence and Dimension

7 The plural of basis is bases.

8 We will show in Theorem 6 that every subspace of

n

does indeed have a basis.

(1) (4). The matrix A is invertible if and only if A

T

is invertible (by the

Corollary to Theorem 4 2.3); this in turn holds if and only if A

T

has independent

columns (by (1) (2)); finally, this last statement holds if and only if A has

independent rows (because the rows of A are the transposes of the columns of A

T

).

(1) (5). The proof is similar to (1) (4).

Dimension

It is common geometrical language to say that

3

is 3-dimensional, that planes

are 2-dimensional and that lines are 1-dimensional. The next theorem is a

basic tool for clarifying this idea of dimension. Its importance is difficult to

exaggerate.

Theorem 4 Fundamental Theorem

Let U be a subspace of

n

. If U is spanned by m vectors, and if U contains k

linearly independent vectors, then k m.

We give a proof at the end of this section.

The main use of the fundamental theorem depends on the following concept. If

U is a subspace of

n

, a set of vectors in U is called a basis of U if

it satisfies the following two conditions:

1. is linearly independent.

2. .

The most remarkable result about bases

7

is:

Theorem 5 Invariance Theorem

If and are bases of a subspace U of

n

, then m k.

PROOF

We have k m by the fundamental theorem because spans U,

and is independent. Similarly m k, so m k.

The invariance theorem guarantees that there is no ambiguity in the follow-

ing definition: If U is a subspace of

n

and is any basis of U, the

number m of vectors in the basis is called the dimension of U, and is denoted

.

The importance of the invariance theorem is that the dimension of U can be deter-

mined by counting the number of vectors in any basis.

8

This is very useful as we

shall see.

dimU m

{ , , , } X X X

m 1 2

{ , , , } Y Y Y

k 1 2

{ , , , } X X X

m 1 2

{ , , , } Y Y Y

k 1 2

{ , , , } X X X

m 1 2

U X X X

m

span{ , , , }

1 2

{ , , } X X X

m 1 2

,

{ , , , } X X X

m 1 2

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 207

Let denote the standard basis of

n

, that is the set of columns of

the identity matrix. Then by Example 6 5.1, and

is independent by Example 2. Hence it is indeed a basis of

n

in the

present terminology, and we have

Example 7

dim(

n

) n and is a basis.

This agrees with our sense that

2

is two-dimensional and

3

is three-

dimensional. It also says that

1

is one-dimensional, and {1} is a basis.

Returning to subspaces of

n

, we define

.

This amounts to saying {0} has a basis containing no vectors. This makes sense

because 0 cannot belong to any independent set (Example 4).

Example 8

Let . Show that U is a subspace of

4

, find a

basis of U, and calculate dim U.

Solution Clearly, where

, and . It follows that U span{X

1

, X

2

, X

3

},

and hence that U is a subspace of

4

. Moreover, if a linear combination

vanishes, it is clear that r s t 0, so {X

1

, X

2

, X

3

}

is independent. Hence {X

1

, X

2

, X

3

} is a basis of U and so dim U 3.

Example 9

Let be a basis of

n

. If A is an invertible n n matrix,

then is also a basis of

n

.

Solution Let X be a vector in

n

. Then A

1

X is in

n

so, since B is a basis, we have

for t

i

in . Left multiplication by A gives

,

and it follows that D spans

n

. To

show independence, let , where the s

i

are

in . Then , so left multiplication by A

1

gives

. Now the independence of B shows that each

s

i

0, and so proves the independence of D. Hence D is a basis of

n

.

While we have found bases in many subspaces of

n

, we have not yet shown that

every subspace has a basis. This is part of the next theorem, the proof of which is

deferred to Section 6.4 where it will be proved in more generality.

Theorem 6

Let U {0} be a subspace of

n

. Then:

1. U has a basis and dim U n.

2. Any independent set in U can be enlarged (by adding vectors) to a basis of U.

3. If B spans U, then B can be cut down (by deleting vectors) to a basis of U.

s X s X s X

n n 1 1 2 2

0 + + +

A s X s X s X

n n

( )

1 1 2 2

0 + + +

s AX s AX s AX

n n 1 1 2 2

0 ( ) ( ) ( ) + + +

X t AX t AX t AX

n n

+ + +

1 1 2 2

( ) ( ) ( )

A X t X t X t X

n n

+ + +

1

1 1 2 2

D AX AX AX

n

{ , , , }

1 2

B X X X

n

{ , , , }

1 2

rX sX tX r s t s

T

1 2 3

+ + [ ]

X

T

3

0 0 1 0 [ ] X

T

2

0 1 0 1 [ ]

X

T

1

1 0 0 0 [ ] , [ ] r s t s rX sX tX

T

+ +

1 2 3

U r s t s r s t

T

{[ ] | , , } and in

dim{ } 0 0

{ , , , } E E E

n 1 2

{ , , , } E E E

n 1 2

n

n

E E E span{ , , , }

1 2

{ , , , } E E E

n 1 2

208 Chapter 5 The Vector Space

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 208

Theorem 6 has a number of useful consequences. Here is the first.

Theorem 7

Let U be a subspace of

n

, and let be a set of m vectors in U

where m dim U. Then

B is independent if and only if B spans U.

PROOF

Suppose B is independent. If B does not span U then, by Theorem 6, B can be

enlarged to a basis of U containing more than m vectors. This contradicts the

invariance theorem because dim U m, so B spans U. Conversely, if B spans U

but is not independent, then B can be cut down to a basis of U containing fewer

than m vectors, again a contradiction. So B is independent, as required.

Theorem 7 is a labour-saving result. It asserts that, given a subspace U of

dimension m and a set B of exactly m vectors in U, to prove that B is a basis of U it

suffices to show either that B spans U or that B is independent. It is not necessary to

verify both properties.

Example 10

Find a basis of

4

containing B {X

1

, X

2

, X

3

} where ,

, and .

Solution If , then it is routine to verify that {E

1

, X

1

, X

2

, X

3

} is linearly

independent. Since

4

has dimension 4 it follows by Theorem 7 that

{E

1

, X

1

, X

2

, X

3

} is a basis.

9

Theorem 8

Let U

Wbe subspaces of

n

. Then:

1. dim U dim W.

2. If dim U dim W, then U W.

PROOF

Write dim W k, and let B be a basis of U.

1. If dim U > k, then B is an independent set in Wcontaining more than k

vectors, contradicting the fundamental theorem. So dim U dim W,

proving (1).

2. If dim U k, then B is an independent set in Wcontaining k dim Wvectors,

so B spans Wby Theorem 7. Hence W span B U, proving (2).

E

T

1

1 0 0 0 [ ]

X

T

3

1 1 1 1 [ ] X

T

2

0 1 1 3 [ ]

X

T

1

1 2 1 0 [ ]

B X X X

m

{ , , , }

1 2

209 Section 5.2 Independence and Dimension

9 In fact, an independent subset of

n

can always be enlarged to a basis by adding vectors from the standard basis of

n

. (See

Example 7 6.4.)

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 209

It follows from Theorem 8 that if U is a subspace of

n

, then , and

that

, and

The other subspaces are called proper. The following example uses Theorem 8 to

show that the proper subspaces of

2

are the lines through the origin, while the

proper subspaces of

3

are the lines and planes through the origin.

Example 11

1. If U is a subspace of

2

or

3

, then dim U 1 if and only if U is a line

through the origin.

2. If U is a subspace of

3

, then dim U 2 if and only if U is a plane through

the origin.

PROOF

1. Since dim U 1, let be a basis of U. Then , so U is the line

through the origin with direction vector . Conversely each line L with direction

vector has the form . Hence {d

} is a basis of U, so U has

dimension 1.

2. If U

3

has dimension 2, let be a basis of U. Then and are not par-

allel (by Example 6) so . Let denote

the plane through the origin with normal . Then P is a subspace of

3

(Example 1 5.1) and both and lie in P (they are orthogonal to ), so

by Theorem 1 5.1. Hence

.

Since dim U 2 and dim(

3

) 3, it follows from Theorem 8 that dimP 2 or 3,

whence P U or

3

. But P

3

(for example, is not in P) and so U P is a plane

through the origin.

Conversely, if U is a plane through the origin, then dim U 0, 1, 2, or 3 by

Theorem 8. But dim U 0 or 3 because and U

3

, and dim U 1 by (1).

So dim U 2.

Note that this proof shows that if and are nonzero, nonparallel vectors in

3

,

then is the plane with normal . We gave a geometrical

verification of this fact in Section 5.1.

Proof of the Fundamental Theorem

Fundamental Theorem (Theorem 4)

Let U be a subspace of

n

. If and if is an

independent set in U, then k m.

{ , , , } Y Y Y

k 1 2

U X X X

m

span{ , , , }

1 2

n v w

span{ , } v w

w

U { } 0

U P

3

U v w P

span{ , }

n

P x n x { }

in

3

0 i n v w

0

w

{ , } v w

L td t { }

in d

U tu t { }

in { } u

dimU n U

n

if and only if

dim { } U U 0 0 if and only if

dimU n 0 1 2 , , , ,

210 Chapter 5 The Vector Space

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 210

PROOF

We assume that k > m and show that this leads to a contradiction. Each Y

j

is in

, so write

where the coefficients a

ij

are real numbers. These coefficients form column j of

a matrix A [a

ij

] of size m k. Since k > m by assumption, the homogeneous

system AX 0 has a nontrivial solution . Consider the

linear combination of the Y

j

with these x

j

as coefficients:

where for each i because it is the ith entry of AX 0. Since the

Y

j

are independent, the equation implies that each x

j

0, a

contradiction because X 0.

Exercises 5.2

j

k

j j

x Y

1

0

j

k

ij j

a x

1

0

x Y x a X a x X

j j

j

k

j

j

k

ij i

i

m

i

m

ij j

j

k

i

_

,

_

,

1 1 1 1 1

(( ) , 0 0

1

X

i

i

m

X x x x

k

T

[ ]

1 2

0

Y a X a X a X a X j k

j j j mj m ij i

i

m

+ + +

1 1 2 2

1

1 2 , , , , ,

U X X X

m

span{ , , , }

1 2

211 Section 5.2 Independence and Dimension

1. Which of the following subsets are independent?

Support your answer.

(a) {[1 1 0]

T

, [3 2 1]

T

, [3 5 2]

T

} in

3

.

(b) {[1 1 1]

T

, [1 1 1]

T

, [0 0 1]

T

} in

3

.

(c) {[1 1 1 1]

T

, [2 0 1 0]

T

, [0 2 1 2]

T

} in

4

.

(d) {[1 1 0 0]

T

, [1 0 1 0]

T

, [0 0 1 1]

T

,

[0 1 0 1]

T

} in

4

.

2. Let {X, Y, Z, W} be an independent set in

n

.

Which of the following sets is independent?

Support your answer.

(a) {X Y, Y Z, Z X}

(b) {X + Y, Y + Z, Z + X}

(c) {X Y, Y Z, Z W, W X}

(d) {X + Y, Y + Z, Z + W, W + X}

3. Find a basis and calculate the dimension of the

following subspaces of

4

.

(a) span{[1 1 2 0]

T

, [2 3 0 3]

T

, [1 9 6 6]

T

}.

(b) span{[2 1 0 1]

T

, [1 1 1 2]

T

, [2 7 4 5]

T

}.

(c) span{[1 2 1 0]

T

, [2 0 3 1]

T

, [4 4 11 3]

T

,

[3 2 2 1]

T

}.

(d) span{[2 0 3 1]

T

, [1 2 1 0]

T

, [2 8 5 3]

T

,

[1 2 2 1]

T

}.

4. Find a basis and calculate the dimension of the

following subspaces of

4

.

(a) U {[a a + b a b b]

T

a and b in }.

(b) U {[a + b a b b a]

T

a and b in }.

(c) U {[a b c + a c]

T

a, b, and c in }.

(d) U {[a b b + c a b + c]

T

a, b, and c in }.

(e) U {[a b c d]

T

a + b c + d 0 in }.

(f ) U {[a b c d]

T

a + b c + d in }.

5. Suppose that {X, Y, Z, W} is a basis of

4

. Show

that:

(a) {X + aW, Y, Z, W} is also a basis of

4

for

any choice of the scalar a.

(b) {X + W, Y + W, Z + W, W} is also a basis

of

4

.

(c) {X, X + Y, X + Y + Z, X + Y + Z + W} is also

a basis of

4

.

6. Use Theorem 3 to determine if the following

sets of vectors are a basis of the indicated space.

(a)

(b)

(c)

(d)

(e)

(f ) {[ ] , [ ] , [ ] ,

[ ] } .

1 0 2 5 4 4 3 2 0 1 0 3

1 3 3 10

4

T T T

T

in

{[ ] , [ ] , [ ] ,

[ ] } .

2 1 1 3 1 1 0 2 0 1 0 3

1 2 3 1

4

T T T

T

in

{[ ] , [ ] , [ ] } . 5 2 1 1 0 1 3 1 0

3

T T T

in

{[ ] , [ ] , [ ] } . 1 1 1 1 1 2 0 0 1

3 T T T

in

{[ ] , [ ] , [ ] } . 1 1 1 1 1 1 0 0 1

3

T T T

in

{[ ] , [ ] } . 3 1 2 2

2

T T

in

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 211

7. In each case show that the statement is true or

give an example showing that it is false.

(a) If {X, Y} is independent, then {X, Y, X + Y}

is independent.

(b) If {X, Y, Z} is independent, then {Y, Z} is

independent.

(c) If {Y, Z} is dependent, then {X, Y, Z} is

dependent.

(d) If all of are nonzero, then

is independent.

(e) If one of is zero, then

is dependent.

(f ) If aX + bY + cZ 0, then {X, Y, Z} is inde-

pendent.

(g) If {X, Y, Z} is independent, then

aX + bY + cZ 0 for some a, b, and c in .

(h) If is dependent, then

for some num-

bers t

i

in not all zero.

(i) If is independent, then

for some t

i

in .

8. If A is an n n matrix, show that det A 0 if

and only if some column of A is a linear

combination of the other columns.

9. Let {X, Y, Z} be a linearly independent set in

4

. Show that {X, Y, Z, E

k

} is a basis of

4

for

some E

k

in the standard basis {E

1

, E

2

, E

3

, E

4

}.

10. If {X

1

, X

2

, X

3

, X

4

, X

5

, X

6

} is an independent set

of vectors, show that the subset {X

2

, X

3

, X

5

} is

also independent.

11. Let A be any m n matrix, and let B

1

, B

2

,

B

3

, , B

k

be columns in

m

such that the

system AX B

i

has a solution X

i

for each i. If

{B

1

, B

2

, B

3

, , B

k

} is independent in

m

, show

that {X

1

, X

2

, X

3

, , X

k

} is independent in

n

.

12. If {X

1

, X

2

, X

3

, , X

k

} is independent, show that

{X

1

, X

1

+ X

2

, X

1

+ X

2

+ X

3

, , X

1

+ X

2

+

+ X

k

}

is also independent.

13. If {Y, X

1

, X

2

, X

3

, , X

k

} is independent, show

that {Y + X

1

, Y + X

2

, Y + X

3

, , Y + X

k

} is also

independent.

14. Suppose that {X, Y} is a basis of

2

, and let

.

(a) If A is invertible, show that

{aX + bY, cX + dY } is a basis of

2

.

(b) If {aX + bY, cX + dY } is a basis of

2

, show

that A is invertible.

15. Let A denote an m n matrix.

(a) Show that null A null(UA) for every

invertible m m matrix U.

(b) Show that dim(null A) dim(null(AV )) for

every invertible n n matrix V. [Hint: If

{X

1

, X

2

, , X

k

} is a basis of null A, show that

{V

1

X

1

, V

1

X

2

, , V

1

X

k

} is a basis of

null(AV ).]

16. Let A denote an m n matrix.

(a) Show that im A im(AV ) for every

invertible n n matrix V.

(b) Show that dim(im A) dim(im(UA)) for

every invertible m m matrix U.

[Hint: If {Y

1

, Y

2

, , Y

k

} is a basis of im(UA),

show that {U

1

Y

1

, U

1

Y

2

, , U

1

Y

k

} is a

basis of im A.]

17. Let U and Wdenote subspaces of

n

, and

assume that U

W. If dimU n 1, show

that either W U or W

n

.

n

, and

assume that U

either U {0} or U W.

A

a b

c d

1

]

1

t X t X t X

k k 1 1 2 2

0 + + +

{ } X X X

k 1 2

, , ,

t X t X t X

k k 1 1 2 2

0 + + +

{ } X X X

k 1 2

, , ,

{ } X X X

k 1 2

, , ,

X X X

k 1 2

, , ,

{ } X X X

k 1 2

, , ,

X X X

k 1 2

, , ,

212 Chapter 5 The Vector Space

n

SECTION 5.3 Orthogonality

Length and orthogonality are basic concepts in geometry and, in

2

and

3

, they

can both can be defined using the dot product. In this section we extend these

concepts to

n

, introduce the idea of an orthogonal basisone of the most useful

concepts in linear algebra, and begin exploring some of its applications.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 212

Dot Product, Length, and Distance

Let and be two vectors in

n

. The dot

product X

i

Y of X and Y is defined by

.

Note that technically X

T

Y is a 1 1 matrix, which we take to be a number. The

length

X

where indicates the positive square root. A vector of length 1 is called a

unit vector.

Example 1

If and in

5

, then

X

i

Y 2 + 0 15 + 2 1 12, and .

These definitions agree with those in

2

and

3

, and many properties carry over to

n

:

Theorem 1

Let X, Y, and Z denote vectors in

n

. Then:

1. X

i

Y Y

i

X.

2. X

i

(Y + Z) X

i

Y + X

i

Z.

3. (aX)

i

Y a(X

i

Y) X

i

(aY ) for all scalars a.

4.

X

2

X

i

X.

5.

X

0, and

0 if and only if X 0.

6.

aX

PROOF

(1), (2), and (3) follow from matrix arithmetic because X

i

Y X

T

Y; (4) is clear

from the definition; and (6) is a routine verification since . If

, then , so

0 if and only if

. Since each x

i

is a real number this happens if and only if x

i

0

for each i; that is, if and only if X 0. This proves (5).

Because of Theorem 1, computations with dot products in

n

are similar to those

in

3

. In particular, the dot product

equals the sum of mn terms, X

i

i

Y

j

, one for each choice of i and j. For example:

holds for all vectors X and Y.

( ) ( ) 3 4 7 2 21 6 28 8

21 22 8

2 2

X Y X Y X X X Y Y X Y Y

X X Y Y

+ +

i i i i i

i

( ) ( ) X X X Y Y Y

m n 1 2 1 2

+ + + + + + i

x x x

n 1

2

2

2 2

0 + + +

X x x x

n

+ + +

1

2

2

2 2

X x x x

n

T

[ ]

1 2

a a

2

X + + + + 1 0 9 4 1 15

Y

T

[ ] 2 1 5 1 1 X

T

[ ] 1 0 3 2 1

X X X x x x

n

+ + + i

1

2

2

2 2

X Y X Y

x y x y x y

T

n n

i

+ + +

1 2 2 2

Y

y y y

n

T

[ ] 1 2 X x x x

n

T

[ ]

1 2

213 Section 5.3 Orthogonality

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 213

Example 2

Show that

X + Y

2

+ 2X

i

Y +

Y

2

for any X and Y in

n

.

Solution Using Theorem 1 several times:

Example 3

Suppose that for some vectors F

i

. If X

i

F

i

0 for each

i where X is in

n

, show that X 0.

Solution We show X 0 by showing that

Since the F

i

span

n

, write where the t

i

are in .

Then

We saw in Section 4.2 that if u

and v

3

, then

where is the angle between u

and v

. Since

cos

1 for

any angle , this shows that

u

i

v

n

.

Theorem 2 Cauchy Inequality

10

If X and Y are vectors in

n

, then

.

Moreover

X

i

Y

X

Y

of the other.

PROOF

The inequality holds if X 0 or Y 0 (in fact it is equality). Otherwise, write

a > 0 and

Example 2 gives

. ()

It follows that ab X

i

Y 0 and ab + X

i

Y 0, and so that ab X

i

Y ab.

Hence

X

i

Y

ab

X

Y

bX aY ab ab X Y bX aY ab ab X Y + +

2 2

2 2 ( ) ( ) i i and

X Y X Y i

u v

u u

i

|| |||| ||

cos

2

1 1 2 2

1 1 2 2

X X X X t F t F t F

t X F t X F t X F

t

k k

k k

+ + +

+ + +

i i

i i i

( )

( ) ( ) ( )

11 2

0 0 0 0 ( ) ( ) ( ) . + + + t t

k

X t F t F t F

k k

+ + +

1 1 2 2

n

k

F F F , , , span{ }

1 2

X Y X Y X Y X X X Y Y X Y Y

X X Y Y

+ + + + + +

+ +

2

2 2

2

( ) ( )

.

i i i i i

i

214 Chapter 5 The Vector Space

n

10 Augustin Louis Cauchy (17891857) was born in Paris and became a professor at the cole Polytechnique at the age of 26.

He was one of the great mathematicians, producing more than 700 papers, and is best remembered for his work in analysis

in which he established new standards of rigour and founded the theory of functions of a complex variable. He was a devout

Catholic with a long-term interest in charitable work, and he was a royalist, following King Charles X into exile in Prague after

he was deposed in 1830.

Augustin Louis Cauchy.

Photo Corbis.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 214

215 Section 5.3 Orthogonality

If equality holds, then

X

i

Y

ab, so X

i

Y ab or X

i

Y ab. Hence ()

shows that bX aY 0 or bX + aY 0, so one of X and Y is a multiple of the

other (even if a 0 or b 0).

The Cauchy inequality is equivalent to (X

i

Y)

2

2

; for example, in

5

this

becomes

for all x

i

and y

i

in .

There is an important consequence of the Cauchy inequality. Given X and Y in

n

, use Example 2 and the fact that X

i

Y

X

Y

to compute

.

Taking positive square roots gives:

Corollary Triangle Inequality

If X and Y are vectors in

n

, then

X + Y

+

Y

.

The reason for the name comes from the observation that in

2

the inequality

asserts that the sum of the lengths of two sides of a triangle is not less than the

length of the third side. This is illustrated in the first diagram.

If X and Y are two vectors in

n

, we define the distance d(X, Y ) between

X and Y by

The motivation again comes from

2

as is clear in the second diagram. This

distance function has all the intuitive properties of distance in

2

, including another

version of the triangle inequality.

Theorem 3

If X, Y, and Z are three vectors in

n

we have:

1. d(X, Y ) 0 for all X and Y.

2. d(X, Y ) 0 if and only if X Y.

3. d(X, Y ) d(Y, X).

4. Triangle inequality. d(X, Z) d(X, Y ) + d(Y, Z ).

PROOF

(1) and (2) restate part (5) of Theorem 1 because d(X, Y )

X Y

because

n

. To prove (4) use the Corollary

to Theorem 2:

d X Z X Z X Y Y Z

X Y Y Z d X Y d Y Z

( , ) ( ) ( )

( ) ( ) ( , ) ( , ).

+

+ +

d X Y X Y ( , )

X Y X X Y Y X X Y Y X Y + + + + + +

2 2 2 2 2 2

2 2 i ( )

( )

( ) (

x y x y x y x y x y

x x x x x

y y

1 1 2 2 3 3 4 4 5 5

2

1

2

2

2

3

2

4

2

5

2

2

1

2

2

2

+ + + +

+ + + +

+ ++ + + y y y

3

2

4

2

5

2

2

)

+

v

Orthogonal Sets and the Expansion Theorem

Two nonzero vectors and in

3

are orthogonal if and only if

(Theorem 3 4.2). More generally, a set of vectors in

n

is called

an orthogonal set if

.

Note that {X} is an orthogonal set if X 0. A set of vectors in

n

is

called orthonormal if it is orthogonal and, in addition, each X

i

is a unit vector:

.

Example 4

The standard basis is an orthonormal set in

n

.

The routine verification is left to the reader, as is the proof of:

Example 5

If is orthogonal, so also is for any

nonzero scalars a

i

.

If X 0, it follows from item (6) of Theorem 1 that X is a unit vector,

that is it has length 1. Hence if is an orthogonal set, then

is an orthonormal set, and we say that it is

the result of normalizing the orthogonal set .

Example 6

If , and ,

then {F

1

, F

2

, F

3

, F

4

} is an orthogonal set in

4

as is easily verified. After

normalizing, the corresponding orthonormal set is .

The most important result about orthogonality is Pythagoras theorem. Given

orthogonal vectors and in

3

, it asserts that as in the dia-

gram. In this form the result holds for any orthogonal set in

n

.

Theorem 4 Pythagoras Theorem

If is a orthogonal set in

n

, then

.

PROOF

The fact that X

i

i

X

j

0 whenever i j gives

This is what we wanted.

X X X X X X X X X

X X X X X X

k k k

k k

1 2

2

1 2 1 2

1 1 2 2

+ + + + + + + + +

+ + +

i

i i i

( ) ( )

( ))

( )

( ) ( ).

+ + + +

+ + + + + + +

X X X X X X

X X X

k

1 2 1 3 2 3

1

2

2

2 2

0 0 0

i i i

X X X X X X

k k 1 2

2

1

2

2

2 2

+ + + + + +

X X X

k 1 2

, , ,

v w v w

+ +

2 2 2

= w

{ } , , ,

1

2

1

1

6

2

1

2

3

1

2 3

4

F F F F

F

T

4

1 3 1 1 [ ] F F F

T T T

1 2 3

1 1 1 1 1 0 1 2 1 0 1 0 [ ] , [ ] , [ ]

{ , , , } X X X

k 1 2

1 1 1

1

1

2

2

X

X

X

X

X

X

k

k

, , ,

{ , , , } X X X

k 1 2

1

X

{ , , , } a X a X a X

k k 1 1 2 2

{ , , , } X X X

k 1 2

{ , , , } E E E

n 1 2

X i

i

1 for each

{ , , , } X X X

k 1 2

X X i j X i

i j i

i 0 0

11

for all and for all

{ , , , } X X X

k 1 2

v w

i 0 w

n

11 The reason for insisting that orthogonal sets consist of nonzero vectors is that we will be primarily concerned with orthogonal bases.

v

v + w w

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 216

If and are orthogonal, nonzero vectors in

3

, then they are certainly not

parallel, and so are linearly independent by Example 6 5.2. The next theorem gives

a far-reaching extension of this observation.

Theorem 5

Every orthogonal set in

n

is linearly independent.

PROOF

Let be an orthogonal set in

n

and suppose a linear combination

vanishes: . Then

Since

X

1

2

0, this implies that t

1

0. Similarly t

i

0 for each i.

Theorem 5 suggests considering orthogonal bases for

n

, that is orthogonal sets

that span

n

. These turn out to be the best bases. One reason is that, when expand-

ing a vector as a linear combination of the basis vectors, there are explicit formulas

for the coefficients.

Theorem 6 Expansion Theorem

Let be an orthogonal basis of a subspace U of

n

. If X is any

vector in U, we have

.

PROOF

Since spans U, we have where the t

i

are scalars. To find t

1

we take the dot product of both sides with F

1

:

Since F

1

0, this gives . Similarly, for each i.

The expansion of X as a linear combination of the orthogonal basis is

called the Fourier expansion of X, and the coefficients are called the

Fourier coefficients. Note that if is actually orthonormal, then

t

i

X

i

F

i

for each i. We will have a great deal more to say about this in Section 10.5.

{ } F F F

m 1 2

, , ,

t

X F

F

i

i

i

i

2

{ } F F F

m 1 2

, , ,

t

X F

F

i

i

i

i

2

t

X F

F

1

1

1

2

i

X F t F t F t F F

t F F t F F t F F

m m

m m

i i

i i i

1 1 1 2 2 1

1 1 1 2 2 1 1

+ + +

+ + +

( )

( ) ( ) ( )

+ + +

t F t t

t F

m 1 1

2

2

1 1

2

0 0 ( ) ( )

.

X t F t F t F

m m

+ + +

1 1 2 2

{ } F F F

m 1 2

, , ,

X

X F

F

F

X F

F

F

X F

F

F

m

m

m

+ + +

i i

i

1

1

2

1

2

2

2

2

2

{ , , , } F F F

m 1 2

0 0

1 1 1 1 2 2

1 1 1 2 1 2 1

+ + +

+ + +

X X t X t X t X

t X X t X X t X X

k k

k

i i

i i i

( )

( ) ( ) (

kk

k

t X t t

t X

)

( ) ( )

.

+ + +

1 1

2

2

1 1

2

0 0

t X t X t X

k k 1 1 2 2

0 + + +

{ , , , } X X X

k 1 2

w

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 217

Example 7

Expand as a linear combination of the orthogonal basis

{F

1

, F

2

, F

3

, F

4

} of

4

given in Example 6.

Solution We have , , , and

, so the Fourier coefficients are

The reader can verify that indeed X t

1

F

1

+ t

2

F

2

+ t

3

F

3

+ t

4

F

4

.

A natural question arises here: Does every subspace U of

n

have an orthogonal

basis? The answer is yes; in fact, there is a systematic procedure, called the

GramSchmidt algorithm, for turning any basis of U into an orthogonal one. This

leads to a definition of the projection onto a subspace U that generalizes the projec-

tion along a vector used in

2

and

3

. All this is discussed in Section 8.1.

Exercises 5.3

t

X F

F

a b c d t

X F

F

a c

t

X F

F

a c

1

1

1

2

1

4

3

3

3

2

1

2

2

2

2

2

1

6

+ + +

+

i i

i

( ) ( )

( ++ + + 2 3

4

4

4

2

1

12

d t

X F

F

a b c d ) ( )

i

F

T

4

1 3 1 1 [ ]

F

T

3

1 0 1 0 [ ] F

T

2

1 0 1 2 [ ] F

T

1

1 1 1 1 [ ]

X a b c d

T

[ ]

218 Chapter 5 The Vector Space

n

1. Obtain an orthonormal basis of

3

by

normalizing the following.

(a) {[1 1 2]

T

, [0 2 1]

T

, [5 1 2]

T

}

(b) {[1 1 1]

T

, [4 1 5]

T

, [2 3 1]

T

}

2. In each case, show that the set of vectors is

orthogonal in

4

.

(a) {[1 1 2 5]

T

, [4 1 1 1]

T

, [7 28 5 5]

T

}

( b) {[2 1 4 5]

T

, [0 1 1 1]

T

, [0 3 2 1]

T

}

3. In each case, show that B is an orthogonal

basis of

3

and use Theorem 6 to expand

X [a b c]

T

as a linear combination of the basis

vectors.

(a) B {[1 1 3]

T

, [2 1 1]

T

, [4 7 1]

T

}

( b) B {[1 0 1]

T

, [1 4 1]

T

, [2 1 2]

T

}

(c) B {[1 2 3]

T

, [1 1 1]

T

, [5 4 1]

T

}

(d) B {[1 1 1]

T

, [1 1 0]

T

, [1 1 2]

T

}

4. In each case, write X as a linear combination of

the orthogonal basis of the subspace U.

(a)

(b)

5. In each case, find all [a b c d ]

T

in

4

such that

the given set is orthogonal.

(a) {[1 2 1 0]

T

, [1 1 1 3]

T

, [2 1 0 1]

T

,

[a b c d]

T

}

(b) {[1 0 1 1]

T

, [2 1 1 1]

T

, [1 3 1 0]

T

,

[a b c d]

T

}

6. If and X i Y 2, compute:

7. In each case either show that the statement is

true or give an example showing that it is false.

(a) Every independent set in

n

is orthogonal.

(b) If {X, Y } is an orthogonal set in

n

, then

{X, X + Y } is also orthogonal.

(c) If {X, Y } and {Z, W} are both orthogonal in

n

, then {X, Y, Z, W} is also orthogonal.

(d) If {X

1

, X

2

} and {Y

1

, Y

2

, Y

3

} are both

orthogonal and X

i

i

Y

j

0 for all i and j,

then {X

1

, X

2

, Y

1

, Y

2

, Y

3

} is orthogonal.

(e) If is orthogonal in

n

, then

.

(f ) If X 0 in

n

, then {X} is an orthogonal set.

n

n

X X X , , , span{ }

1 2

{ } X X X

n 1 2

, , ,

(a)

(b)

(c)

(

3 5

2 7

3 2

X Y

X Y

X Y Y X

( ) ( ) i

dd) ( ) ( ) X Y X Y + 2 3 5 i

X Y 3 1 , ,

X

U

T

T T

[ ] ;

{[ ] , [ ] }

14 1 8 5

2 1 0 3 2 1 2 1 span

X

U

T

T T

[ ] ;

{[ ] , [ ] }

13 20 15

1 2 3 1 1 1 span

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 218

8. If A is an m n matrix with orthonormal

columns, show that A

T

A I

n

. [Hint: If

are the columns of A, show that

column j of A

T

A is

.

]

9. Use the Cauchy inequality to show that

for all x 0 and y 0. Here

and are called, respectively,

the geometric mean and arithmetic mean of

x and y.

[Hint: Use and .]

10. Use the Cauchy inequality to prove that:

(a) for all

r

i

in and all n 1.

(b) for all r

1

, r

2

,

and r

3

in . [Hint: See part (a).]

11. (a) Show that X and Y are orthogonal in

n

if

and only if

( b) Show that X + Y and X Y are orthogonal

in

n

if and only if

12. Show that if and only if

X is orthogonal to Y.

13. (a) Show that

for all X, Y in

n

.

( b) Show that

for all X, Y in

n

.

14. If A is n n, show that every eigenvalue of A

T

A

is nonnegative. [Hint: Compute where X

is an eigenvector.]

15. If

n

span{X

1

, , X

m

} and X

i

X

i

0 for all i,

show that X 0. [Hint: Show

]

16. If

n

span{X

1

, , X

m

} and X

i

X

i

Y

i

X

i

for

all i, show that X Y.

17. Let {E

1

, , E

n

} be an orthogonal basis of

n

.

Given X and Y in

n

, show that

. X Y

X E Y E

E

X E Y E

E

n n

n

i

i i

i i

+ +

( )( ) ( )( )

1 1

1

2 2

X 0.

AX

2

X Y

2

+ ]

X Y X Y

2 2 2 1

2

+ + [

X Y X Y X Y i +

1

4

2 2

[ ]

X Y X Y + +

2 2 2

X Y .

X Y X Y + .

r r r r r r r r r

1 2 1 3 2 3 1

2

2

2

3

2

+ + + +

( ) ( ) r r r n r r r

n n 1 2

2

1

2

2

2 2

+ + + + + +

Y

y

x

1

]

1

1

1

X

x

y

1

]

1

1

1

1

2

( ) x y +

xy xy x y +

1

2

( )

[ ] C C C C C C

j j n j

T

1 2

i i i

C C C

n 1 2

, , ,

219 Section 5.4 Rank of a Matrix

SECTION 5.4 Rank of a Matrix

In this section we use independence and spanning to properly define the rank of a

matrix and to study its properties. This requires that we deal with rows and columns

in the same way. While it has been our custom to write the n-tuples in

n

as

columns, in this section we will frequently write them as rows. Subspaces, independ-

ence, spanning, and dimension are defined for rows using matrix operations, just as

for columns. If A is an m n matrix, we define:

The column space, col A, of A is the subspace of

m

spanned by the columns of A.

The row space, row A, of A is the subspace of

n

spanned by the rows of A.

Much of what we do in this section involves these subspaces. Recall from

Theorem 4 2.2 that if are the columns of an m n matrix A, and if

is any column in

n

, then

. ()

With this we can prove:

Theorem 1

Let A, U, and V be matrices of sizes m n, p m, and n q respectively. Then:

1. col(AV)

2. row(UA)

AX C C C

x

x

x

x C x C x C

n

n

n n

+ + +

1

]

1

]

1

1

1

1

1

1

1

1 2

1

2

1 1 2 2

X x x x

n

T

[ ]

1 2

C C C

n 1 2

, , ,

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 219

PROOF

Let denote the columns of A, and let

denote column j of V. Then column j of AV is AX

j

, and by () this is

AX

j

. Since this is in col A for each j, it follows that

col(AV)

1

]

col(AV) in

the same way, and (1) follows.

As to (2), we have col[(UA)

T

] col(A

T

U

T

)

col(A

T

) by (1), from which it

follows that row(UA)

in the proof of (1).

Now suppose that a matrix A is carried to some row-echelon matrix R by row

operations. Then R UA for some invertible matrix U by Theorem 1 2.4, so

Theorem 1 shows that row R row A. Moreover, the next lemma shows that

dim(row R) is the rank of A defined in Section 1.2, and hence shows that rank A is

independent of the particular row-echelon matrix to which A can be carried. This

fact was not proved in Section 1.2.

Lemma 1

Let R denote an m n row-echelon matrix.

1. The rows of R are a basis of row R.

2. The columns of R containing the leading ones are a basis of col R.

PROOF

1. If denote the nonzero rows of R, we have

by definition. Suppose where each a

i

is in . Then

a

1

0 because the leading 1 in R

1

is to the left of any nonzero entry in any other

R

i

. But then and so a

2

0 in the same way (because the

matrix R with R

1

deleted is also row-echelon). This continues to show that

each a

i

0.

2. The r columns containing leading ones are independent because the leading ones

are in different rows (and have zeros below them). It is clear that col R is con-

tained in the subspace of all columns in

m

with the last m r entries zero. This

space has dimension r, so the r independent columns containing leading ones are

a basis by Theorem 7 5.2.

Somewhat surprisingly, Lemma 1 is instrumental in showing that

dim(col A) dim(row A) for any matrix A. This is the main result in the following

fundamental theorem.

Theorem 2 Rank Theorem

Let A denote any m n matrix. Then

.

Moreover, suppose A can be carried to a matrix R in row-echelon form by a

series of elementary row operations. If r denotes the number of nonzero rows

in R, then

dim( ) dim( ) row col A A

a R a R

r r 2 2

0 + +

a R a R a R

r r 1 1 2 2

0 + + +

row span{ } R R R R

r

, , ,

1 2

R R R

r 1 2

, , ,

x C x C x C

n n 1 1 2 2

+ + +

X x x x

j n

T

[ ]

1 2

C C C

n 1 2

, , ,

220 Chapter 5 The Vector Space

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 220

1. The r nonzero rows of R are a basis of row A.

2. If the leading 1s lie in columns j

1

, j

2

, , j

r

of R, then the corresponding

columns j

1

, j

2

, , j

r

of A are a basis of col A.

PROOF

We have R UA for some invertible matrix U. Hence rowA rowR by Theorem 1,

and (1) follows from Lemma 1.

To prove (2), let C

1

, C

2

, , C

n

denote the columns of A. Then

A [C

1

C

2

C

n

] in block form, and

.

Hence, in the notation of (2), the set consists of

the columns of R that contain a leading 1, so B is a basis of col R by Lemma 1.

But then the fact that U is invertible implies that is linearly

independent. Furthermore, if C

j

is any column of A, then UC

j

is a linear

combination of the columns in the set B. Again, the invertibility of U implies that

C

j

is a linear combination of This proves (2).

Finally, dim(row A) r dim(col A) by (1) and (2).

The common dimension of the row and column spaces of an m n matrix A is

called the rank of A and is denoted rank A. By (1) of Theorem 2, this agrees with

the definition in Section 1.2 and we record the result for reference.

Corollary 1

Suppose a matrix A can be carried to a matrix R in row-echelon form by a series

of elementary row operations. Then the rank of A is equal to the number of

nonzero rows of R.

Example 1

Compute the rank of matrix and find bases for the row

space and the column space of A.

Solution The reduction of A to row-echelon form is as follows:

Hence rank A 2, and {[ 1 2 2 1 ], [ 0 0 1 3 ]} is a basis of the row

space of A. Moreover, the leading 1s are in columns 1 and 3 of the

row-echelon matrix, so Theorem 2 shows that columns 1 and 3 of A are

a basis of col A.

1

3

1

2

5

1

1

]

1

1

1

]

1

1

,

1 2 2 1

3 6 5 0

1 2 1 2

1 2 2 1

0 0 1 3

0 0 1 3

1 2 2 1

0 0 1 3

0 0

1

]

1

1

1

]

1

1

00 0

1

]

1

1

A

1

]

1

1

1 2 2 1

3 6 5 0

1 2 1 2

C C C

j j j

r 1 2

, , , .

{ , , , } C C C

j j j

r 1 2

B UC UC UC

j j j

r

{ , , }

1 2

,

R UA U C C C UC UC UC

n n

[ ] [ ]

1 2 1 2

221 Section 5.4 Rank of a Matrix

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 221

The rank theorem has several other important consequences. Corollary 2 follows

because the rows of A are independent (respectively, span row A) if and only if their

transposes are independent (respectively, span col(A

T

)).

Corollary 2

If A is any matrix, then rank A rank(A

T

).

Corollary 3

If A is an m n matrix, then rank A m and rank A n.

PROOF

If A is carried to the row-echelon matrix R by row operations, then Corollary 1

shows that rank A r where r is the number of nonzero rows of R. Since R is m n

too, it follows that r m. Applying this to A

T

gives rank(A

T

) n because A

T

is

n m. Hence we are done by Corollary 2.

Theorem 1 immediately yields

Corollary 4

rank A rank(UA) rank(AV) whenever U and V are invertible.

Corollary 5

An n n matrix A is invertible if and only if rank A n.

PROOF

If A is invertible, then A I

n

by row operations ( by Theorem 5 2.3), so rank A n

by Corollary 1. Conversely, let A R by row operations where R is an n n

reduced row-echelon matrix. If rank A n, then R has n leading ones by Corollary 1,

and so R I

n

. Hence A I

n

so A is invertible, again by Theorem 5 2.3.

The rank theorem can be used to find bases of subspaces of the space of all

n 1 rows. Here is an example where n 4.

Example 2

Find a basis for the following subspace of

4

, (written as rows).

U

span{[ ], [ ], [ ]}

1 1 2 3 2 4 1 0 1 5 4 9

222 Chapter 5 The Vector Space

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 222

Solution U is just the row space of , so we reduce this to row-echelon

form:

The required basis is Thus is

also a basis and avoids fractions.

In Section 5.1 we discussed two other subspaces associated with an m n matrix A,

the null space null A {X

X in

n

and AX 0} and the image im A {AX X in

n

}.

Using the rank, there are simple ways to find bases for these spaces.

We already know ( Theorem 3 2.2) that null A is spanned by the basic solutions

to the system AX 0. The following example is instructive in showing that these

basic solutions are, in fact, independent (and so are a basis of null A).

Example 3

If find a basis of null A and so find its dimension.

Solution If X is in null A, then AX 0, so X is given by solving the system AX 0.

The reduction of the augmented matrix to reduced form is

Hence, writing X [x

1

x

2

x

3

x

4

]

T

, the leading variables are x

1

and x

3

, and the

nonleading variables x

2

and x

4

become parameters: x

2

s and x

4

t. Then the

equations corresponding to the reduced matrix determine the leading variables

in terms of the parameters:

This means that the general solution is

()

Hence X is in span{X

1

, X

2

} where X

1

[2 1 0 0]

T

and X

2

[1 0 2 1]

T

are

the basic solutions, and we have shown that null(A)

span{X

1

, X

2

}. But X

1

and

X

2

are in null A (they are solutions of AX 0), so

by Theorem 1 5.1. We claim further that {X

1

, X

2

} is linearly independent.

To see this, let sX

1

+ tX

2

0 be a linear combination that vanishes. Then (

)

shows that [2s + t s 2t t ]

T

0, whence s t 0. Thus {X

1

, X

2

} is a basis of

null(A), and so dim(null A) 2.

null { , } A X X span

1 2

X s t s t t s t

T T T

+ + [ ] [ ] [ ] . 2 2 2 1 0 0 1 0 2 1

x s t x t

1 3

2 2 + and .

1 2 1 1 0

1 2 0 1 0

2 4 1 0 0

1 2 0 1 0

0 0 1 2 0

0 0 0 0 0

1

]

1

1

1

]

1

1

A

1

]

1

1

1

1 2 1 1

1 2 0 1

2 4 1 0

{[1 1 2 3 0 2 3 6 ], [ ]} {[ ], [ ]}. 1 1 2 3 0 1 3

3

2

1 1 2 3

2 4 1 0

1 5 4 9

1 1 2 3

0 2 3 6

0 4 6 12

1 1 2 3

0 1

3

2

1

]

1

1

1

]

1

1

1

]

1

1

1

3

0 0 0 0

1 1 2 3

2 4 1 0

1 5 4 9

1

]

1

1

223 Section 5.4 Rank of a Matrix

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 223

224 Chapter 5 The Vector Space

n

The calculation in Example 3 is typical of what happens in general. If A is an

m n matrix with rank A r, there are exactly r leading variables, and hence exactly

n r nonleading variables. These lead to exactly n r basic solutions X

1

, X

2

, , X

nr

,

and the reduced equations give the leading variables as linear combinations of the

X

i

. Hence

( This is Theorem 3 2.2.). We now claim that these basic solutions X

i

are indepen-

dent. The general solution is a linear combination X t

1

X

1

+ t

2

X

2

+

+ t

nr

X

nr

where each coefficient t

i

is a parameter equal to a nonleading variable. Thus, if this

linear combination vanishes, then each t

i

0 (as for s and t in Example 3, each t

i

is a

coefficient when X is expressed as a linear combination of the standard basis of

n

).

This proves that {X

1

, X

2

, , X

nr

} is linearly independent, and so is a basis of null A.

This proves the first part of the following theorem.

Theorem 3

Let A denote an m n matrix of rank r.

1. If X

1

, X

2

, , X

nr

are the basic solutions of the homogeneous system AX 0

that are produced by the gaussian algorithm, then {X

1

, X

2

, , X

nr

} is

a basis of null A. In particular

2. We have im A col A so the rank theorem provides a basis of im A.

In particular,

PROOF

It remains to prove (2). But im A col A by Example 8 5.1, so

dim(im A) dim(col A) r. The rest follows from Theorem 2.

Let A be an m n matrix. Corollary 3 of the rank theorem asserts that

rank A m and rank A n, and it is natural to ask when these extreme cases arise.

If are the columns of A, Theorem 2 5.2 shows that

spans

m

if and only if the system AX B is consistent for every B in

m

, and

that is independent if and only if AX 0, X in

n

, implies X 0.

The next two theorems improve on both these results, and relate them to when the

rank of A is n or m.

Theorem 4

The following are equivalent for an m n matrix A:

1. rank A n.

2. The rows of A span

n

.

3. The columns of A are linearly independent in

m

.

{ } C C C

n 1 2

, , ,

{ } C C C

n 1 2

, , , C C C

n 1 2

, , ,

dim(im ) . A r

dim(null ) . A n r

null { , , , }. A X X X

n r

span

1 2

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 224

4. The n n matrix A

T

A is invertible.

5. CA I

n

for some n m matrix C.

6. If AX 0, X in

n

, then X 0.

PROOF

(1) (2). We have row A

n

, and dim(row A) n by (1), so row A

n

by

Theorem 8 5.2. This is (2).

(2) (3). By (2), row A

n

, so rank A n. This means dim(col A) n. Since the

n columns of A span col A, they are independent by Theorem 7 5.2.

(3) (4). If (A

T

A)X 0, X in

n

, we show that X 0 (Theorem 5 2.3). We have

.

Hence AX 0, so X 0 by (3) and Theorem 2 5.2.

(4) (5). Given (4), take C (A

T

A)

1

A

T

.

(5) (6). If AX 0, then left multiplication by C (from (5)) gives X 0.

(6) (1). Given (6), the columns of A are independent by Theorem 2 5.2.

Hence dim(col A) n, and (1) follows.

Theorem 5

The following are equivalent for an m n matrix A:

1. rank A m.

2. The columns of A span

m

.

3. The rows of A are independent in

n

.

4. The m m matrix AA

T

is invertible.

5. AC I

m

for some n m matrix C.

6. The system AX B is consistent for every B in

m

.

PROOF

(1) (2). By (1), dim(col A) m, so col A

m

by Theorem 8 5.2.

(2) (3). By (2), col A

m

, so rank A m. This means dim(row A) m. Since

the m rows of A span row A, they are independent by Theorem 7 5.2.

(3) (4). We have rank A m by (3), so the n m matrix A

T

has rank m. Hence

applying Theorem 4 to A

T

in place of A shows that (A

T

)

T

A

T

is invertible, proving (4).

(4) (5). Given (4), take C A

T

(AA

T

)

1

in (5).

(5) (6). Comparing columns in AC I

m

gives AC

j

E

j

for each j, where C

j

and

E

j

denote column j of C and I

m

respectively. Given B in

m

, write , r

j

in

. Then (6) holds with as the reader can verify.

(6) (1). Given (6), the columns of A span

m

by Theorem 2 5.2. Thus

col A

m

and (1) follows.

Example 4

Show that is invertible if x, y, and z are all distinct.

3

2 2 2

x y z

x y z x y z

+ +

+ + + +

1

]

1

X r C

j j

j

m

1

B r E

j j

j

m

1

AX AX AX X A AX X

T T T T 2

0 0 ( )

225 Section 5.4 Rank of a Matrix

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 225

Solution The given matrix has the form A

T

A where has independent

columns (verify). Hence Theorem 4 applies.

Theorems 4 and 5 relate several important properties of an m n matrix A to the

invertibility of the square, symmetric matrices A

T

A and AA

T

. In fact, even if the

columns of A are not independent or do not span

m

, the matrices A

T

A and AA

T

are

both symmetric and, as such, have real eigenvalues as we shall see. We return to this

in Chapter 7.

Exercises 5.4

A

x

y

z

1

]

1

1

1

1

1

226 Chapter 5 The Vector Space

n

1. In each case find bases for the row and column

spaces of A and determine the rank of A.

2. In each case find a basis of the subspace U.

(a) U span{[1 1 0 3], [2 1 5 1], [4 2 5 7]}

(b) U span{[1 1 2 5 1], [3 1 4 2 7],

[1 1 0 0 0], [5 1 6 7 8]}

(c)

(d)

3. (a) Can a 3 4 matrix have independent

columns? Independent rows? Explain.

(b) If A is 4 3 and rank A 2, can A have

independent columns? Independent rows?

Explain.

(c) If A is an m n matrix and rank A m,

show that m n.

(d) Can a nonsquare matrix have its rows

independent and its columns independent?

Explain.

(e) Can the null space of a 3 6 matrix have

dimension 2? Explain.

(f ) If A is not square, show that either the rows

of A or the columns of A are not linearly

independent.

4. (a) Show that rank UA rank A, with equality if

U is invertible.

(b) Show that rank AV rank A, with equality if

V is invertible.

5. Show that rank (AB) rank A and that

rank (AB) rank B.

6. Show that the rank does not change when an

elementary row or column operation is

performed on a matrix.

7. In each case find a basis of the null space of A.

Then compute rank Aand verify (1) of Theorem 3.

(b)

8. Let A CR where C 0 is a column in

m

and

R 0 is a row in

n

.

(a) Show that col A span{C} and

row A span{R}.

(b) Find dim(null A).

(c) Show that null A null R.

A

1

]

1

1

1

1

3 5 5 2 0

1 0 2 2 1

1 1 1 2 2

2 0 4 4 2

(a) A

1

]

1

1

1

1

3 1 1

2 0 1

4 2 1

1 1 1

U

1

]

1

1

1

1

]

1

1

1

1

]

1

1

1

span

1

5

6

2

6

8

3

7

10

4

8

12

11

]

1

1

1

1

]

1

1

1

1

1

]

1

1

1

1

1

]

1

1

1

1

span

1

1

0

0

0

0

1

1

1

0

1

0

0

1

0

1

1

]

1

1

1

1

(d) A

1

]

1

1 2 1 3

3 6 3 2

( ) c A

1

]

1

1

1

1

1 1 5 2 2

2 2 2 5 1

0 0 12 9 3

1 1 7 7 1

(a) (b) A A

1

]

1

1

1

1

2 4 6 8

2 1 3 2

4 5 9 10

0 1 1 2

2 1 1

2 1 1

4 2 3

6

33 0

1

]

1

1

1

1

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 226

SECTION 5.5 Similarity and Diagonalization

In Section 3.3 we studied diagonalization of a square matrix A, and found important

applications (for example to linear dynamical systems). We can now utilize the con-

cepts of subspace, basis, and dimension to clarify the diagonalization process, reveal

some new results, and prove some theorems which could not be demonstrated in

Section 3.3.

Before proceeding, we introduce a notion that simplifies the discussion of

diagonalization, and is used throughout the book.

Similar Matrices

If A and B are n n matrices, we say that A and B are similar, and write A B, if

B P

1

AP for some invertible matrix P, equivalently (writing Q P

1

) if B QAQ

1

for an invertible matrix Q. The language of similarity is used throughout linear

algebra. For example, a matrix A is diagonalizable if and only if it is similar to a

diagonal matrix.

If A B, then necessarily B A. To see why, suppose that B P

1

AP. Then

A PBP

1

Q

1

BQ where Q P

1

is invertible. This proves the second of the

following properties of similarity (the others are left as an exercise):

1. A A for all square matrices A.

2. If A B, then B A. (

)

3. If A B and B C, then A C.

227 Section 5.5 Similarity and Diagonalization

9. Show that null A 0 if and only if the columns

of A are independent.

10. Let A be an n n matrix.

(a) Show that A

2

0 if and only if col A

null A.

(b) Conclude that if A

2

0, then rank

(c) Find a matrix A for which col A null A.

11. If A is m n and B is n m, show that AB 0 if

and only if col B

null A.

col A {AX X in

n

}.

13. Let A be an m n matrix with columns

C

1

, C

2

, , C

n

. If rank A n, show that

{A

T

C

1

, A

T

C

2

, , A

T

C

n

} is a basis of

n

.

14. If A is m n and B is m 1, show that B lies

in the column space of A if and only if

rank[A B] rank A.

15. (a) Show that AX B has a solution if and only

if rank A rank[A B]. [Hint: Exercises 12

and 14.]

(b) If AX B has no solution, show that

rank[A B] 1 + rank A.

16. Let X be a k m matrix. If I is the m m

identity matrix, show that I + X

T

X is invertible.

[Hint: I + X

T

X A

T

A where A in

block form.]

17. If A is m n of rank r, show that A can be

factored as A PQ where P is m r with r

independent columns, and Q is r n with r

independent rows. [Hint: Let by

Theorem 3, 2.4, and write and

in block form, where U

1

and V

1

are r r.]

18. (a) Show that if A and B have independent

columns, so does AB.

(b) Show that if A and B have independent

rows, so does AB.

19. A matrix obtained from A by deleting rows and

columns is called a submatrix of A. If A has an

invertible k k submatrix, show that rank A k.

[Hint: Show that row and column operations

carry in block form.] Remark: It

can be shown that rank A is the largest integer r

such that A has an invertible r r submatrix.

A

I P

Q

k

1

]

1

0

V

V V

V V

1

]

1

1 1 2

3 4

U

U U

U U

1

]

1

1 1 2

3 4

UAV

I

r

1

]

1

0

0 0

I

X

1

]

1

A

n

2

.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 227

These properties are often expressed by saying that the similarity relation is an

equivalence relation on the set of n n matrices. Here is an example showing how

these properties are used.

Example 1

If A is similar to B and either A or B is diagonalizable, show that the other is

also diagonalizable.

Solution We have A B. Suppose that A is diagonalizable, say A D where D is

diagonal. Since B A by (2) of (

(3) of (

assume instead that B is diagonalizable.

Similarity is compatible with inverses, transposes, and powers in the following

sense: If A B, then A

1

B

1

, A

T

B

T

, and A

k

B

k

for all integers k 1 (the

proofs are routine matrix computations using Theorem 1 3.3). Thus, for example,

if A is diagonalizable, so also is A

T

, A

1

(if it exists), and A

k

(for each k 1).

Indeed, if A D where D is a diagonal matrix, we obtain A

T

D

T

, A

1

D

1

, and

A

k

D

k

, and each of the matrices D

T

, D

1

, and D

k

is diagonal.

We pause to introduce a simple matrix function that will be referred to later.

The trace tr A of an n n matrix A is defined to be the sum of the main diagonal

elements of A. In other words:

It is evident that tr(A + B) tr A + tr B and that tr(cA) c tr A holds for all n n

matrices A and B and all scalars c. The following fact is more surprising.

Lemma 1

Let A and B be n n matrices. Then tr(AB) tr(BA).

PROOF

Write A [a

i j

] and B [b

i j

]. For each i, the (i, i )-entry of the matrix AB is

Hence

.

Similarly we have . Since these two double sums are the same,

Lemma 1 is proved.

As the name indicates, similar matrices share many properties, some of which are

collected in the next theorem for reference.

Theorem 1

If A and B are similar n n matrices, then A and B have the same determinant,

rank, trace, characteristic polynomial, and eigenvalues.

tr( ) ( ) BA b a

ij ji

j i

tr( )

( )

AB d d d d a b

n i i i j ij ji

+ + +

1 2

d a b a b a b a b

i i i i i in ni ij ji

j

+ + +

1 1 2 2

.

If then tr A a A a a a

i j nn

+ + + [ ], .

11 22

228 Chapter 5 The Vector Space

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 228

PROOF

Let B P

1

AP for some invertible matrix P. Then we have

det B det(P

1

) det A det P det A because det(P

1

) 1/det P. Similarly,

rank B rank(P

1

AP) rank A by Corollary 4 of Theorem 2 5.4. Next Lemma 1

gives

As to the characteristic polynomial,

Finally, this shows that A and B have the same eigenvalues because the eigenvalues

of a matrix are the roots of its characteristic polynomial.

Example 2

The matrices have the same determinant, rank,

trace, characteristic polynomial, and eigenvalues, but they are not similar

because P

1

IP I for any invertible matrix P. Hence sharing the five properties

in Theorem 1 does not guarantee that two matrices are similar.

Diagonalization Revisited

Recall that a square matrix A is diagonalizable if there exists an invertible matrix P

such that P

1

AP D is a diagonal matrix, that is if A is similar to a diagonal matrix D.

Unfortunately, not all matrices are diagonalizable, for example the matrix

(see Example 8 3.3). Determining whether A is diagonalizable is closely related to

the eigenvalues and eigenvectors of A. Recall that a number is called an

eigenvalue of A if AX X for some nonzero column X in

n

, and any such

nonzero vector X is called an eigenvector of A corresponding to (or simply

a -eigenvector of A). The eigenvalues and eigenvectors of A are closely related to

the characteristic polynomial c

A

(x) of A, defined by

.

If A is n n this is a polynomial of degree n, and its relationship to the eigenvalues

is given in the following theorem ( a repeat of Theorem 2 3.3).

Theorem 2

Let A be an n n matrix.

1. The eigenvalues of A are the roots of the characteristic polynomial c

A

(x) of A.

2. The -eigenvectors X are the nonzero solutions to the homogeneous system

of linear equations with I A as coefficient matrix.

( ) I A X 0

c x xI A

A

( ) det( )

1 1

0 1

1

]

1

A I

1 1

0 1

and

1 0

0 1

1

]

1

1

]

1

c x xI B x P IP P AP

P xI A P

xI

B

( ) det{ } det{ ( ) }

det{ ( ) }

det(

1 1

1

A

c x

A

)

( ).

tr tr tr tr ( ) [ ( )] [( ) ] . P AP P AP AP P A

1 1 1

229 Section 5.5 Similarity and Diagonalization

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 229

Example 3

Show that the eigenvalues of a triangular matrix are the main diagonal entries.

Solution Assume that A is triangular. Then the matrix xI A is also triangular and has

diagonal entries (x a

11

), (x a

22

), , (x a

nn

) where A [a

i j

]. Hence

Theorem 4 3.1 gives

and the result follows because the eigenvalues are the roots of c

A

(x).

Theorem 3 3.3 asserts (in part) that an n n matrix A is diagonalizable if and

only if it has n eigenvectors X

1

, , X

n

such that the matrix P [X

1

X

n

] with the

X

i

as columns is invertible. This is equivalent to requiring that {X

1

, , X

n

} is a basis

of

n

consisting of eigenvectors of A. Hence we can restate Theorem 3 3.3 as

follows:

Theorem 3

Let A be an n n matrix.

1. A is diagonalizable if and only if

n

has a basis {X

1

, X

2

, , X

n

} consisting of

eigenvectors of A.

2. When this is the case, the matrix P [X

1

X

n

] is invertible and

P

1

AP diag(

1

,

2

, ,

n

) where, for each i,

i

is the eigenvalue of A

corresponding to X

i

.

The next result is a basic tool for determining when a matrix is diagonalizable.

It reveals an important connection between eigenvalues and linear independence:

Eigenvectors corresponding to distinct eigenvalues are necessarily linearly

independent.

Theorem 4

Let X

1

, X

2

, , X

k

be eigenvectors corresponding to distinct eigenvalues

1

,

2

, ,

k

of an n n matrix A. Then {X

1

, X

2

, , X

k

} is a linearly

independent set.

PROOF

We use induction on k. If k 1, then {X

1

} is independent because X

1

0. In general,

suppose the theorem is true for some k 1. Given eigenvectors {X

1

, X

2

, , X

k +1

},

suppose a linear combination vanishes:

(

)

We must show that each t

i

0. Left multiply (

AX

i

i

X

i

to get

(

)

If we multiply (

) by

1

and subtract the result from (

we obtain

t X t X t X

k k k 2 2 1 2 3 3 1 3 1 1 1 1

0 ( ) ( ) ( ) . + + +

+ + +

t X t X t X

k k k 1 1 1 2 2 2 1 1 1

0 + + +

+ + +

.

t X t X t X

k k 1 1 2 2 1 1

0 + + +

+ +

.

c x x a x a x a

A nn

( ) ( )( ) ( )

11 22

230 Chapter 5 The Vector Space

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 230

Since X

2

, X

3

, , X

k+1

correspond to distinct eigenvalues

2

,

3

, ,

k+1

, the set

{X

2

, X

3

, , X

k+1

} is independent by the induction hypothesis. Hence,

and so t

2

t

3

t

k +1

0 because the

i

are distinct. Hence (

) becomes

t

1

X

1

0, which implies that t

1

0 because X

1

0. This is what we wanted.

Theorem 4 will be applied several times; we begin by using it to give a useful test

for when a matrix is diagonalizable.

Theorem 5

If A is an n n matrix with n distinct eigenvalues, then A is diagonalizable.

PROOF

Choose one eigenvector for each of the n distinct eigenvalues. Then these

eigenvectors are independent by Theorem 4, and so are a basis of

n

by

Theorem 7 5.2. Now use Theorem 3.

Example 4

Show that is diagonalizable.

Solution A routine computation shows that c

A

(x) (x 1)(x 3)(x + 1) and so has

distinct eigenvalues 1, 3, and 1. Hence Theorem 5 applies.

However, a matrix can have multiple eigenvalues as we saw in Section 3.3. To

deal with this situation, we prove an important lemma which formalizes a technique

that is basic to diagonalization, and which will be used three times below.

Lemma 2

Let {X

1

, X

2

, , X

k

} be a linearly independent set of eigenvectors of an n n

matrix A, extend it to a basis {X

1

, X

2

, , X

k

, , X

n

} of

n

( by Theorem 6 5.2),

and let

be the (invertible) matrix with the X

i

as its columns. If

1

,

2

, ,

k

are the

(not necessarily distinct) eigenvalues of A corresponding to X

1

, X

2

, , X

k

respectively, then P

1

AP has block form

where B and A

1

are matrices of size k (n k) and (n k) (n k) respectively.

P AP

B

A

k

1

]

1

1 1 2

1

0

diag ( , , ... , )

P X X X

n

[ ]

1 2

A

1

]

1

1

1

1 0 0

1 2 3

1 1 0

t t t

k k 2 2 1 3 3 1 1 1 1

0 0 0 ( ) , ( ) , , ( )

+ +

231 Section 5.5 Similarity and Diagonalization

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 231

PROOF

If {E

1

, E

2

, , E

n

} is the standard basis of

n

, then

Comparing columns, we have P

1

X

i

E

i

for each 1 i n. On the other hand,

observe that

Hence, if 1 i k, column i of P

1

AP is

This describes the first k columns of P

1

AP, and Lemma 2 follows.

Note that Lemma 2 (with k n) shows that an n n matrix A is diagonalizable

if

n

has a basis of eigenvectors of A, as in (1) of Theorem 3.

If is an eigenvalue of an n n matrix A, write

This is a subspace of

n

called the eigenspace of A corresponding to (see Example 3

5.1) and the eigenvectors corresponding to are just the nonzero vectors in E

(A).

In fact E

Hence, by Example 7 5.1, the basic solutions of the homogeneous system

(I A)X 0 given by the gaussian algorithm form a basis for E

(A). In particular

.

(

)

Now recall that the multiplicity of an eigenvalue of A is the number of times

occurs as a root of the characteristic polynomial. In other words, the multiplicity of

is the largest integer m 1 such that

for some polynomial g(x). Because of (

Theorem 4 3.3 can be stated as follows: A square matrix is diagonalizable if and

only if the multiplicity of each eigenvalue equals dim[E

prove this, and the proof requires the following result which is valid for any square

matrix, diagonalizable or not.

Lemma 3

Let be an eigenvalue of multiplicity m of a square matrix A. Then dim[E

(A)] m.

c x x g x

A

m

( ) ( ) ( )

dim[ ( )] E A

E A X I A X I A

( ) { ( ) } ( ). 0 null

E A X AX X

n

( ) { }. in

( ) ( ) ( ) . P A X P X P X E

i i i i i i i

1 1 1

P AP P A X X X P AX P AX P AX

n n

1 1

[ ] [ ].

1 2

1

1

1

2

1

[ ] [ ]

1 2

1 1

1 2

E E E I P P P X X X

n n n

[ ].

1

1

1

2

1

P X P X P X

n

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 232

PROOF

Write dim[E

A

(x) (x )

d

g(x) for some poly-

nomial g(x), because m is the highest power of (x ) that divides c

A

(x). To this

end, let {X

1

, X

2

, , X

d

} be a basis of E

invertible n n matrix P exists such that

in block form, where I

d

denotes the d d identity matrix. Now write A

P

1

AP

and observe that c

A

(x) c

A

(x) by Theorem 1. But Theorem 5 3.1 gives

Hence c

A

(x) c

A

(x) (x )

d

g(x) where g(x) c

A

1

(x). This is what we wanted.

It is impossible to ignore the question when equality holds in Lemma 3 for each

eigenvalue . It turns out that this characterizes the diagonalizable matrices. This

was stated without proof in Theorem 4 3.3.

Theorem 6

The following are equivalent for a square matrix A:

1. A is diagonalizable.

2. dim[E

PROOF

Let A be n n and let

1

,

2

, ,

k

be the distinct eigenvalues of A. For each i,

let m

i

denote the multiplicity of

i

and write d

i

dim[E

i

(A)]. Then

so m

1

+

+ m

k

n because c

A

(x) has degree

n. Moreover, d

i

m

i

for each i by Lemma 3.

(1) (2). By (1),

n

has a basis of n eigenvectors of A, so let t

i

of them lie in E

i

(A)

for each i. Since the subspace spanned by these t

i

eigenvectors has dimension t

i

,

we have t

i

d

i

for each i by Theorem 4 5.2. Hence

It follows that d

1

+

+ d

k

m

1

+

+ m

k

, so, since d

i

m

i

for each i, we must have

d

i

m

i

. This is (2).

(2) (1). Let B

i

denote a basis of E

i

(A) for each i, and let B B

1

B

k

.

Since each B

i

contains m

i

vectors by (2), and since the B

i

are pairwise disjoint (the

i

are distinct), it follows that B contains n vectors. So it suffices to show that B is

linearly independent (then B is a basis of

n

). Suppose a linear combination of the

vectors in B vanishes, and let Y

i

denote the sum of all terms that come from B

i

.

n t t d d m m n

k k k

+ + + + + +

1 1 1

.

c x x x x

A

m m

k

m

k

( ) ( ( )

1 2

1 2

) ( )

c x xI A

x I B

xI A

x I

A n

d

n d

d

1

]

1

( ) det( ) det

( )

det [( )

0

1

]] det [ ]

( ) ( ).

xI A

x c x

n d

d

A

1

1

P AP

I B

A

d

1

]

1

1

1

0

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 233

Then Y

i

lies in E

i

(A) for each i, so the nonzero Y

i

are independent by Theorem 4

(as the

i

are distinct). Since the sum of the Y

i

is zero, it follows that Y

i

0 for each

i. Hence all coefficients of terms in Y

i

are zero (because B

i

is independent). This

shows that B is independent.

Example 5

If show that A is diagonalizable

but B is not.

Solution We have c

A

(x) (x + 3)

2

(x 1) so the eigenvalues are

1

3 and

2

1. The

corresponding eigenspaces are E

1

(A) span{X

1

, X

2

} and E

2

(A) span{X

3

}

where

as the reader can verify. Since {X

1

, X

2

} is independent, we have dim(E

1

(A)) 2

which is the multiplicity of

1

. Similarly, dim(E

2

(A)) 1 equals the multiplicity

of

2

. Hence A is diagonalizable by Theorem 6, and a diagonalizing matrix is

P [X

1

X

2

X

3

].

Turning to B, c

B

(x) (x + 1)

2

(x 3) so the eigenvalues are

1

1 and

2

3.

The corresponding eigenspaces are E

1

(B) span{Y

1

} and E

2

(B) span{Y

2

} where

Here dim(E

1

(B)) 1 is smaller than the multiplicity of

1

, so the matrix B is

not diagonalizable, again by Theorem 6. The fact that dim(E

1

(B)) 1 means

that there is no possibility of finding three linearly independent eigenvectors.

Complex Eigenvalues

All the matrices we have considered have had real eigenvalues. But this need not be

the case: The matrix has characteristic polynomial c

A

(x) x

2

+1 which

has no real roots. Nonetheless, this matrix is diagonalizable; the only difference is

that we must use a larger set of scalars, the complex numbers. The basic properties

of these numbers are outlined in Appendix A.

Indeed, nearly everything we have done for real matrices can be done for com-

plex matrices. The methods are the same; the only difference is that the arithmetic

is carried out with complex numbers rather than real ones. For example, the

gaussian algorithm works in exactly the same way to solve systems of linear equa-

tions with complex coefficients, matrix multiplication is defined the same way, and

the matrix inversion algorithm works in the same way.

But the complex numbers are better than the real numbers in one respect: While

there are polynomials like x

2

+ 1 with real coefficients that have no real root, this

problem does not arise with the complex numbers: Every nonconstant polynomial

with complex coefficients has a complex root, and hence factors completely as

a product of linear factors. This fact is known as the Fundamental Theorem of

Algebra, and was first proved by Gauss.

12

A

1

]

1

0 1

1 0

Y Y

T T

1 2

1 2 1 5 6 1 [ ] , [ ] .

X X X

T T T

1 2 3

1 1 0 2 0 1 2 1 1 [ ] , [ ] , [ ]

A B

1

]

1

1

1

]

1

1

5 8 16

4 1 8

4 4 11

2 1 1

2 1 2

1 0 2

and ,

234 Chapter 5 The Vector Space

n

12 This was a famous open problem in 1799 when Gauss solved it at the age of 22 in his Ph.D. dissertation.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 234

Example 6

Diagonalize the matrix

Solution The characteristic polynomial of A is

where i

2

1. Hence the eigenvalues are

1

i and

2

i, with correspond-

ing eigenvectors . Hence A is diagonalizable by the

complex version of Theorem 5, and the complex version of Theorem 3 shows

that is invertible and

Of course, this can be checked directly.

We shall return to complex linear algebra in Section 8.6.

Symmetric Matrices

13

On the other hand, many of the applications of linear algebra involve a real matrix A

and, while A will have complex eigenvalues by the Fundamental Theorem of Algebra,

it is always of interest to know when the eigenvalues are, in fact, real. While this can

happen in a variety of ways, it turns out to hold whenever A is symmetric. This

important theorem will be used extensively later. Surprisingly, the theory of complex

eigenvalues can be used to prove this useful result about real eigenvalues.

If Z is a complex matrix, the conjugate matrix is defined to be the matrix

obtained from Z by conjugating every entry. Thus, if Z [z

i j

], then For

example,

Recall that holds for all complex numbers z and w. It

follows that if Z and Ware two complex matrices, then

hold for all complex scalars . These facts are used in the proof of the following theorem.

Theorem 7

Let A be a symmetric real matrix. If is any eigenvalue of A, then is real.

14

PROOF

Observe that A

real by showing that

to , so that X 0 and AX X. Define c X

T

X

.

Z W Z W ZW Z W Z + + , ) and (

z w z w zw z w + + and

If then Z

i

i i

Z

i

i i

+

+

1

]

1

+

1

]

1

2 5

3 4

2 5

3 4

Z z

i j

[ ].

Z

P AP

i

i

1

]

1

1

]

1

1 1

2

0

0

0

0

. P X X

i i

1

]

1

[ ]

1 2

1 1

X

i

X

i

1 2

1 1

1

]

1

1

]

1

and

c x xI A x x i x i

A

( ) det ( ) ( )( ) + +

2

1

A

1

]

1

0 1

1 0

.

235 Section 5.5 Similarity and Diagonalization

13 This discussion uses complex conjugation and absolute value. These topics are discussed in Appendix A.

14 This theorem was first proved in 1829 by the great French mathematician Augustin Louis Cauchy (17891857) who is most

remembered for his work in analysis.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 235

If we write X [z

1

z

2

z

n

]

T

where the z

i

are complex numbers, we have

Thus c is a real number, and c > 0 because at least one of the z

i

0 (as X 0).

We show that

by verifying that c

c. We have

At this point we use the hypothesis that A is symmetric and real. This means

A

T

A A

, so we continue

as required.

The technique in the proof of Theorem 7 will be used again when we return to

complex linear algebra in Section 8.6.

Example 7

Verify Theorem 7 for every real, symmetric 2 2 matrix A.

Solution If we have c

A

(x) x

2

(a + c) x + (ac b

2

), so the eigenvalues are

given by . But the discriminant

for any choice of a, b, and c. Hence, the eigenvalues are real numbers.

Exercises 5.5

( ) ( ) ( ) a c ac b a c b + +

2 2 2 2

4 4 0

+ +

1

]

1

2

2 2

4 ( ) ( ) ( ) a c a c ac b

A

a b

b c

1

]

1

c X A X X AX X AX X X

X X

X X

c

T T T T T

T

T

( ) ( ) ( )

( )

c X X X X AX X X A X

T T T T T

( ) ( ) ( ) .

c X X z z z

z

z

z

z z z z z z

T

n

n

n n

1

]

1

1

1

1

+ + + [ ]

1 2

1

2

1 1 2 2

+ + + z z z

n 1

2

2

2 2

.

236 Chapter 5 The Vector Space

n

1. By computing the trace, determinant, and rank,

show that A and B are not similar in each case.

1

]

1

1

]

1

1

]

1

1

(d)

(e)

A B

A B

3 1

1 2

2 1

3 2

2 1 1

1 0 1

1 1 0

,

,

11 2 1

2 4 2

3 6 3

1 2 3

1 1 2

0 3 5

2 1 3

6

1

]

1

1

1

]

1

1

(f ) , A B

1

]

1

1

3 9

0 0 0

(a)

(b)

A B

A B

1

]

1

1

]

1

1

]

1

1 2

2 1

1 1

1 1

3 1

2 1

1 1

2 1

,

,

11

]

1

1

]

1

1

]

1

(c) A B

2 1

1 1

3 0

1 1

,

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 236

2.

are not similar.

3. If A B, show that:

4. In each case, decide whether the matrix A is

diagonalizable. If so, find P such that P

1

AP is

diagonal.

5. If A is invertible, show that AB is similar to BA

for all B.

6. Show that the only matrix similar to a scalar

matrix A rI, r in , is A itself.

7. Let be an eigenvalue of A with corresponding

eigenvector X. If B P

1

AP is similar to A, show

that P

1

X is an eigenvector of B corresponding

to .

8. If A B and A has any of the following proper-

ties, show that B has the same property.

(a) Idempotent, that is A

2

A.

k

0 for some k 1.

(c) Invertible.

9. Let A denote an n n upper triangular matrix.

(a) If all the main diagonal entries of A are

distinct, show that A is diagonalizable.

show that A is diagonalizable only if it is

already diagonal.

(c) Show that is diagonalizable but

that is not.

10. Let A be a diagonalizable n n matrix

with eigenvalues

1

,

2

, ,

n

(including

multiplicities). Show that:

(a) det A

1

(b) tr A

1

+

2

+

+

n

11. Given a polynomial p(x) r

0

+ r

1

x +

+ r

n

x

n

and a square matrix A, the matrix

p(A) r

0

I + r

1

A +

+ r

n

A

n

is called the

evaluation of p(x) at A. Let B P

1

AP. Show

that p(B) P

1

p(A)P for all polynomials p(x).

12. Let P be an invertible n n matrix. If A is

any n n matrix, write T

P

(A) P

1

AP.

Verify that:

(a) T

P

(I ) I

(b) T

P

(AB) T

P

(A) T

P

(B)

(c) T

P

(A + B) T

P

(A) + T

P

(B)

(d) T

P

(rA) rT

P

(A)

(e) T

P

(A

k

) [T

P

(A)]

k

for k 1

(f ) If A is invertible, T

P

(A

1

) [T

P

(A)]

1

.

(g) If Q is invertible, T

Q

[T

P

(A)] T

PQ

(A).

13. (a) Show that two diagonalizable matrices are

similar if and only if they have the same eigen-

values with the same multiplicities.

T

.

14. If A is 2 2 and diagonalizable, show that

C(A) { X XA AX } has dimension 2 or 4.

[Hint: If P

1

AP D, show that X is in C(A) if

and only if P

1

XP is in C(D).]

15. If A is diagonalizable and p(x) is a polynomial

such that p() 0 for all eigenvalues of A,

show that p(A) 0 (see Example 9 3.3). In

particular, show c

A

(A) 0. [Remark: c

A

(A) 0

for all square matrices Athis is the

CayleyHamilton theorem (see Theorem 2 9.4).]

16. Let A be n n with n distinct real eigenvalues.

If AC CA, show that C is diagonalizable.

1 1 0

0 1 0

0 0 2

1

]

1

1

1 0 1

0 1 0

0 0 2

1

]

1

1

(c) 3 1 6

2 1 0

1 0 3

1

]

1

1

1

]

1

1

(d) 4 0 0

0 2 2

2 3 1

(a)

(b)

1 0 0

1 2 1

0 0 1

3 0 6

0 3 0

5 0 2

1

]

1

1

1

]

1

1

(a) (b)

(c) for in (d) for

A B A B

rA rB r A B n

T T

n n

1 1

1

Show that and

1 2 1 0

2 0 1 1

1 1 0 1

4 3 0 0

1 1 3 0

1 0 1 1

0 1 4

1

]

1

1

1

1

11

5 1 1 4

1

]

1

1

1

1

237 Section 5.5 Similarity and Diagonalization

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 237

17. Let

(a) Show that x

3

(a

2

+ b

2

+ c

2

)x 2abc has real

roots by considering A.

2

+ b

2

+ c

2

ab + ac + bc by

considering B.

18. Assume the 2 2 matrix A is similar to an upper

triangular matrix. If tr A 0 tr A

2

, show that

A

2

0.

19. Show that A is similar to A

T

for all 2 2

matrices A. [Hint: Let . If c 0, treat

the cases b 0 and b 0 separately. If c 0,

reduce to the case c 1 using Exercise 12(d).]

20. Refer to Section 3.4 on linear recurrences.

Assume that the sequence x

0

, x

1

, x

2

, satisfies

x

n +k

r

0

x

n

+ r

1

x

n +1

+

+ r

k 1

x

n +k 1

for all n 0. Define

Then show that:

(a) V

n

A

n

V

0

for all n.

(b) c

A

(x) x

k

r

k 1

x

k 1

r

1

x r

0

.

(c) If is an eigenvalue of A, the eigenspace E

2

, ,

k 1

)

T

is an eigenvector. [Hint: Use c

A

() 0 to

show that E

X.]

(d) A is diagonalizable if and only if the eigen-

values of A are distinct. [Hint: See part (c)

and Theorem 4.]

(e) If

1

,

2

, ,

k

are distinct real eigenvalues,

there exist constants t

1

, t

2

, , t

k

such that

holds for all n. [Hint: If

D is diagonal with

1

,

2

, ,

k

as the main

diagonal entries, show that A

n

PD

n

P

1

has

entries that are linear combinations of

.]

1 2

n n

k

n

, , ,

x t t

n

n

k k

n

+ +

1 1

A

r r r r

V

x

x

x

k

n

n

n

1

]

1

1

1

1

1

+

0 1 0 0

0 0 1 0

0 0 0 1

0 1 2 1

1

,

nn k +

1

]

1

1

1

1

1

.

A

a b

c d

1

]

1

A

a b

a c

b c

B

c a b

a b c

b c a

1

]

1

1

1

]

1

1

0

0

0

and .

238 Chapter 5 The Vector Space

n

SECTION 5.6 An Application to Correlation and Variance

Suppose the heights of n men are measured. Such a data set is called a

sample of the heights of all the men in the population under study, and various

questions are often asked about such a sample: What is the average height in the

sample? How much variation is there in the sample heights, and how can it be

measured? What can be inferred from the sample about the heights of all men in

the population? How do these heights compare to heights of men in neighbouring

countries? Does the prevalence of smoking affect the height of a man?

The analysis of samples, and of inferences that can be drawn from them, is a

subject called mathematical statistics, and an extensive body of information has been

developed to answer many such questions. In this section we will describe a few

ways that linear algebra can be used.

It is convenient to represent a sample as a sample vector

in

n

. This being done, the dot product in

n

provides a

convenient tool to study the sample and describe some of the statistical concepts

related to it. The most widely known statistic for describing a data set is the sample

mean defined by

15

The mean is typical of the sample values x

i

, but may not itself be one of them.

The number x

i

is called the deviation of x

i

from the mean . The deviation is x x

x

x x x x x

n

n

n

i

i

n

+ + +

1

1 2

1

1

( ) .

x

X x x x

n

T

[ ]

1 2

{ } x x x

n 1 2

, , ,

h h h

n 1 2

, , ,

15 The mean is often called the average of the sample values x

i

, but statisticians use the term mean.

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 238

positive if and it is negative if . Moreover, the sum of these deviations is

zero:

(

)

This is described by saying that the sample mean is central to the sample values x

i

.

If the mean is subtracted from each data value x

i

, the resulting data are

said to be centred. The corresponding data vector is

and () shows that the mean . For example, the sample

is plotted in the diagram. The mean is , and the centred sample

is also plotted. Thus, the effect of centring is to shift the

data by an amount (to the left if is positive) so that the mean moves to 0.

Another question that arises about samples is how much variability there is in the

sample ; that is, how widely are the data spread out around

the sample mean . A natural measure of variability would be the sum of the

deviations of the x

i

about the mean, but this sum is zero by (); these deviations

cancel out. To avoid this cancellation, statisticians use the squares of the

deviations as a measure of variability. More precisely, they compute a statistic called

the sample variance , defined

16

as follows:

The sample variance will be large if there are many x

i

at a large distance from the

mean , and it will be small if all the x

i

are tightly clustered about the mean. The

variance is clearly nonnegative (hence the notation ), and the square root s

x

of the

variance is called the sample standard deviation.

The sample mean and variance can be conveniently described using the dot

product. Let

denote the column with every entry equal to 1. If , then

, so the sample mean is given by the formula

.

Moreover, remembering that is a scalar, we have , so the

centred sample vector X

c

is given by

Thus we obtain a formula for the sample variance:

Linear algebra is also useful for comparing two different samples. To illustrate

how, consider two examples.

s X X x

x n c n

2

1

1

2 1

1

2

1 .

X X x x x x x x x

c n

T

1 [ ] .

1 2

x x x x

T

1 [ ] x

x

n

X

1

( ) i 1

X x x x

n

i 1 + + +

1 2

X x x x

n

T

[ ]

1 2

1 [ ] 1 1 1

T

s

x

2

x

s x x x x x x x x

x

n

n

n

i

i

n

2

1

1

1

2

2

2 2

1

1

2

1

+ + +

[ ] ( ) ( ) ( ) ( ) .

s

x

2

( ) x x

i

2

x

X x x x

n

T

[ ]

1 2

x x

[ ] 3 2 1 2 4

T

X

c

x 2

X

T

[ ] 1 0 1 4 6

c

x 0

X x x x x x x

c n

T

[ ]

1 2

x x

i

x

x

( ) x x x nx nx nx

i

i

n

i

i

n

_

,

.

1 1

0

x x

i

< x x

i

>

239 Section 5.6 An Application to Correlation and Variance

16 Since there are n sample values, it seems more natural to divide by n here, rather than by n 1. The reason for using n 1

is that then the sample variance s

2

x

provides a better estimate of the variance of the entire population from which the sample

was drawn.

1 0 1

Sample X

Centred Sample X

c

2 4 6

4 2 0 -1 -2 -3

x

x

c

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 239

The following table represents the number of sick days at work per year and the

yearly number of visits to a physician for 10 individuals.

The data are plotted in the scatter diagram where it is evident that, roughly

speaking, the more visits to the doctor the more sick days. This is an example of a

positive correlation between sick days and doctor visits.

Now consider the following table representing the daily doses of vitamin C

and the number of sick days.

The scatter diagram is plotted as shown and it appears that the more vitamin C

taken, the fewer sick days. In this case there is a negative correlation between daily

vitamin C and sick days.

In both these situations, we have paired samples, that is observations of two

variables are made for ten individuals: doctor visits and sick days in the first case;

daily vitamin C and sick days in the second case. The scatter diagrams point to a

relationship between these variables, and there is a way to use the sample to

compute a number, called the correlation coefficient, that measures the degree to

which the variables are associated.

To motivate the definition of the correlation coefficient, suppose two paired

samples and are given and consider the

centred samples

If x

k

is large among the x

i

s, then the deviation will be positive; and

will be negative if x

k

is small among the x

i

s. The situation is similar for Y, and the

following table displays the sign of the quantity in all four cases:

Intuitively, if X and Y are positively correlated, then two things happen:

1. Large values of the x

i

tend to be associated with large values of the y

i

, and

2. Small values of the x

i

tend to be associated with small values of the y

i

.

It follows from the table that, if X and Y are positively correlated, then the dot

product

is positive. Similarly X

c

i Y

c

is negative if X and Y are negatively correlated. With

this in mind, the sample correlation coefficient

17

r is defined by

X Y x x y y

c c i i

i

n

i

( )( )

1

Sign of

large small

large positive negative ( )( ): x x y y

x x

y

y

i i

i i

i

i

ssmall negative positive

( )( ) x x y y

i i

x x

k

x x

k

X x x x x x x Y y y y y y y

c n

T

c n

T

[ ] [ ] .

1 2 1 2

and

Y y y y

n

T

[ ]

1 2

X x x x

n

T

[ ]

1 2

Individual

Vitamin C

Sick days

1 5 7 0 4 9 2 8 6 3

5 2 2 6 2 1 4 3

1 2 3 4 5 6 7 8 9 10

22 5

Individual 1 2 3 4 5 6 7 8 9 10

Doctor visits 2 6 8 1 5 10 3 9 7 4

Sick days 2 4 8 3 55 9 4 7 7 2

240 Chapter 5 The Vector Space

n

17 The idea of using a single number to measure the degree of relationship between different variables was pioneered by Francis

Galton (18221911). He was studying the degree to which characteristics of an offspring relate to those of its parents. The

idea was refined by Karl Pearson (18571936) and r is often referred to as the Pearson correlation coefficient.

Sick

Days

Sick

Days

Doctor Visits

Vitamin C Doses

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 240

Bearing the situation in

3

in mind, r is the cosine of the angle between the

vectors X

c

and Y

c

, and so we would expect it to lie between 1 and 1. Moreover,

we would expect r to be near 1 (or 1) if these vectors were pointing in the same

(opposite) direction, that is the angle is near zero (or ).

This is confirmed by Theorem 1 below, and it is also borne out in the examples

above. If we compute the correlation between sick days and visits to the physician

(in the first scatter diagram above) the result is r 0.90 as expected. On the other

hand, the correlation between daily vitamin C doses and sick days (second scatter

diagram) is r 0.84.

However, a word of caution is in order here. We cannot conclude from the second

example that taking more vitamin C will reduce the number of sick days at work.

The (negative) correlation may arise because of some third factor that is related to

both variables. In this case it may be that less healthy people are inclined to take

more vitamin C. Correlation does not imply causation. Similarly, the correlation

between sick days and visits to the doctor does not mean that having many sick days

causes more visits to the doctor. A correlation between two variables may point to

the existence of other underlying factors, but it does not necessarily mean that there

is a causality relationship between the variables.

Our discussion of the dot product in

n

provides the basic properties of the

correlation coefficient:

Theorem 1

Let and be (nonzero) paired samples,

and let r r(X, Y) denote the correlation coefficient. Then:

1. 1 r 1.

2. r 1 if and only if there exist a and b > 0 such that y

i

a + bx

i

for each i.

3. r 1 if and only if there exist a and b < 0 such that y

i

a + bx

i

for each i.

PROOF

The Cauchy inequality (Theorem 2 5.3) proves (1), and also shows that r 1 if and

only if one of X

c

and Y

c

is a scalar multiple of the other. This in turn holds if and

only if Y

c

bX

c

for some b 0, and it is easy to verify that r 1 when b > 0 and

r 1 when b < 0.

Finally, Y

c

bX

c

means for each i; that is, y

i

a + bx

i

where

. Conversely, if y

i

a + bx

i

, then (verify), so

for each i. In other words, Y

c

bX

c

. This

completes the proof.

Properties (2) and (3) in Theorem 1 show that r(X, Y) 1 means that there is

a linear relation with positive slope between the paired data (so large x values are

paired with large y values). Similarly, r(X, Y) 1 means that there is a linear

relation with negative slope between the paired data (so small x values are paired

with small y values). This is borne out in the two scatter diagrams above.

We conclude by using the dot product to derive some useful formulas for com-

puting variances and correlation coefficients. Given samples

and , the key observation is the following formula:

(

)

X Y X Y n x y

c c

i i .

Y y y y

n

T

[ ]

1 2

X x x x

n

T

[ ]

1 2

( ) ( ) ( ) a bx a bx b x x

i i

+ +

y y

i

y a bx + a y bx

y y b x x

i i

( )

Y y y y

n

T

[ ]

1 2

X x x x

n

T

[ ]

1 2

r r X Y

X Y

X Y

c c

c c

, ( ) .

i

241 Section 5.6 An Application to Correlation and Variance

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 241

1. The following table gives IQ scores for 10

fathers and their eldest sons. Calculate the

means, the variances, and the correlation coeffi-

cient r. (The data scaling formula is useful.)

1 2 3 4 5 6 7 8 9 10

Fathers IQ 140 131 120 115 110 106 100 95 91 86

Sons IQ 130 138 110 99 109 120 105 99 100 94

Indeed, remembering that and are scalars:

Taking Y X in () gives a formula for the variance of X.

Variance Formula

If x is a sample vector, then .

We also get a convenient formula for the correlation coefficient,

Moreover, () and the fact that give:

Correlation Formula

If X and Y are sample vectors, then

Finally, we give a method that simplifies the computations of variances and

correlations.

Data Scaling

Let and be sample vectors. Given

constants a, b, c, and d, consider new samples and

where z

i

a + bx

i

for each i and w

i

c + dy

i

for each i.

Then:

(a) .

(b) , so s

z

b

s

x

.

(c) If b and d have the same sign, then r(X, Y ) r(Z, W).

The verification is left as an exercise.

For example, if , subtracting 100 yields

. A routine calculation shows that and , so

, and .

Exercises 5.6

s

x

2

14

3

4 67 . x . 100 99 67

1

3

s

z

2

14

3

z

1

3

Z

T

[ ] 1 2 3 1 0 3

X

T

[ ] 101 98 103 99 100 97

s b s

z x

2 2 2

z a bx +

W w w w

n

T

[ ]

1 2

Z z z z

n

T

[ ]

1 2

Y y y y

n

T

[ ]

1 2

X x x x

n

T

[ ]

1 2

r r X Y

X Y nx y

n s s

x y

,

( )

( )

.

i

1

s X

x

n

c

2

1

1

2

r r X Y

X Y

X Y

c c

c c

, ( ) .

i

s X nx

x n

2

1

1

2 2

s X

x n c

2

1

1

2

X Y X x Y y

X Y y X x Y x y

X Y y nx x

c c

i i

i i i i

i

+

( ) ( )

( ) ( ) ( )

( ) (

1 1

1 1 1 1

nny x y n

X Y n x y

) ( )

.

+

i

y x

242 Chapter 5 The Vector Space

n

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 242

SECTION 5.7 An Application to Least Squares Approximation

In many scientific investigations, data are collected that relate two variables. For

example, if x is the number of dollars spent on advertising by a manufacturer and y

is the value of sales in the region in question, the manufacturer could generate data

by spending x

1

, x

2

, , x

n

dollars at different times and measuring the corresponding

sales values y

1

, y

2

, , y

n

.

Suppose it is known that a linear relationship exists between the variables x and

yin other words, that y a + bx for some constants a and b. If the data are plotted,

the points (x

1

, y

1

), (x

2

, y

2

), , (x

n

, y

n

) may appear to lie on a straight line and

estimating a and b requires finding the best-fitting line through these data points.

For example, if five data points occur as shown in Figure 5.1, line 1 is clearly a

better fit than line 2. In general, the problem is to find the values of the constants

a and b such that the line y a + bx best approximates the data in question. Note

that an exact fit would be obtained if a and b were such that y

i

a + bx

i

were true

for each data point (x

i

, y

i

). But this is too much to expect. Experimental errors in

measurement are bound to occur, so the choice of a and b should be made in such

a way that the errors between the observed values y

i

and the corresponding fitted

values a + bx

i

are in some sense minimized.

The first thing we must do is explain exactly what we mean by the best fit of a line

y a + bx to an observed set of data points (x

1

, y

1

), (x

2

, y

2

), , (x

n

, y

n

). For

convenience, write the linear function r + sx as

so that the fitted points (on the line) have coordinates (x

1

, f (x

1

)), , (x

n

, f (x

n

)).

Figure 5.2 is a sketch of what the line y f (x) might look like. For each i the

observed data point (x

i

, y

i

) and the fitted point (x

i

, f (x

i

)) need not be the same, and

the distance d

i

between them measures how far the line misses the observed point.

For this reason d

i

is often called the error at x

i

, and a natural measure of how close

the line y f (x) is to the observed data points is the sum d

1

+ d

2

+

+ d

n

of all these

errors. However, it turns out to be better to use the sum of squares

as the measure of error, and the line y f (x) is to be chosen so as to make this sum

as small as possible. This line is said to be the least squares approximating line for

the data points (x

1

, y

1

), (x

2

, y

2

), , (x

n

, y

n

).

The square of the error d

i

is given by for each i, so the quantity

S to be minimized is the sum:

S y f x y f x y f x

n n

+ + + [ ( )] [ ( )] [ ( )] .

1 1

2

2 2

2 2

d y f x

i i i

2 2

[ ( )]

S d d d

n

+ + +

1

2

2

2 2

f x r sx ( ) +

243 Section 5.7 An Application to Least Squares Approximation

Figure 5.1

Y

X

O

(x

5

, y

5

)

(x

4

, y

4

)

(x

3

, y

3

)

(x

2

, y

2

)

(x

1

, y

1

)

Line2

Line1

Figure 5.2

Y

X O x

1

y f x = ( )

d

i

d

1

d

n

x

i

x

n

(x

n

, f(x

n

))

(x

n

, y

n

)

(x

i

, y

i

)

(x

i

, f(x

i

))

(x

1

, y

1

)

(x

1

, f(x

1

))

education and the annual income (in thousands) of

10 individuals. Find the means, the variances, and

the correlation coefficient. (Again the data scaling

formula is useful.)

Individual 1 2 3 4 5 6 7 8 9 10

Years of education 12 16 13 18 19 12 18 19 12 14

Yearly income (1000s) 31 48 35 28 55 40 39 60 32 35

3. If X is a sample vector, and X

c

is the centred

sample, show that and the standard

deviation of X

c

is s

x

.

4. Prove the data scaling formulas:

(a)

(b) (c)

found on page 242.

c

x 0

nic22772_ch05.qxd 11/21/2005 6:49 PM Page 243

Note that all the numbers x_i and y_i are given here: what is required is that the function f be chosen in such a way as to minimize S. Because f(x) = r + sx, this amounts to choosing r and s so as to minimize S, and the problem can be solved using vector techniques. The following notations simplify the discussion.

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \qquad Z = \begin{bmatrix} r \\ s \end{bmatrix}$$

Observe that

$$Y - MZ = \begin{bmatrix} y_1 - (r + sx_1) \\ y_2 - (r + sx_2) \\ \vdots \\ y_n - (r + sx_n) \end{bmatrix} = \begin{bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{bmatrix}$$

so the quantity S to be minimized is

$$S = \| Y - MZ \|^2.$$

Here Y and M are given and we are asked to find Z such that the length of the vector Y - MZ is as small as possible. To this end, consider the set U of all vectors MZ where Z varies:

$$U = \left\{ MZ = \begin{bmatrix} r + sx_1 \\ r + sx_2 \\ \vdots \\ r + sx_n \end{bmatrix} \;\middle|\; Z = \begin{bmatrix} r \\ s \end{bmatrix},\ r \text{ and } s \text{ arbitrary} \right\}$$

This is a subspace of R^n, and the task is to choose MZ in U as close as possible to Y. If n = 3 and x_1, x_2, and x_3 are distinct, U is the plane containing v = [1 1 1]^T and x = [x_1 x_2 x_3]^T, with normal x × v = [x_2 - x_3, x_3 - x_1, x_1 - x_2]^T. In this case, we look for A = [a_0, a_1]^T so that Y - MA is orthogonal to every vector in the plane U, as in Figure 5.3.

This condition^18 makes sense in R^n, so we look for A = [a_0, a_1]^T such that (MZ) · (Y - MA) = 0 for all Z in R^2. This dot product is in R^n, and it can be written as a dot product in R^2:

$$0 = (MZ) \cdot (Y - MA) = (MZ)^T (Y - MA) = Z^T [M^T (Y - MA)] = Z \cdot (M^T Y - M^T M A)$$

for all Z in R^2. This means that M^T Y - M^T M A is orthogonal to every vector in R^2. In particular, it is orthogonal to itself, and so must be zero, and we obtain

$$(M^T M) A = M^T Y.$$

These are called the normal equations for A, and can be solved using gaussian elimination. Moreover, if at least two of the x_i are distinct, the matrix M^T M can be shown to be invertible, so the solution A is unique. If the solution is A = [a_0, a_1]^T, the best fitting line is y = a_0 + a_1 x. This proves the following useful theorem.

18 We will revisit this in Chapter 8 where a more rigorous argument will be given.

[Figure 5.3: the vector Y, its closest vector MA in the subspace U, and the difference Y - MA orthogonal to U.]
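For readers who want to compute with this, the derivation above translates directly into a few lines of code. The following is a minimal Python sketch (our own illustration, not from the text, assuming NumPy; the name least_squares_line is ours), where numpy.linalg.solve plays the role of gaussian elimination on the normal equations:

```python
import numpy as np

def least_squares_line(x, y):
    """Fit y = a0 + a1*x by solving the normal equations (M^T M) A = M^T Y.

    Requires at least two distinct x-values so that M^T M is invertible.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    M = np.column_stack([np.ones_like(x), x])  # rows [1, x_i], as in the text
    A = np.linalg.solve(M.T @ M, M.T @ y)      # solve (M^T M) A = M^T Y
    return A                                   # A = [a0, a1]
```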

Theorem 1

Suppose that n data points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) are given, where at least two of x_1, x_2, ..., x_n are distinct. Put

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$

Then the least squares approximating line for these data points has the equation

y = a_0 + a_1 x

where A = [a_0, a_1]^T is found by gaussian elimination from the normal equations

(M^T M) A = M^T Y.

The condition that at least two of x_1, x_2, ..., x_n are distinct ensures that M^T M is an invertible matrix, so A is unique:

A = (M^T M)^{-1} M^T Y.

Example 1

Let data points (x_1, y_1), (x_2, y_2), ..., (x_5, y_5) be given as in the accompanying table. Find the least squares approximating line for these data.

x | 1 3 4 6 7
y | 1 2 3 4 5

Solution In this case we have

$$M^T M = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 3 & 4 & 6 & 7 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 3 \\ 1 & 4 \\ 1 & 6 \\ 1 & 7 \end{bmatrix} = \begin{bmatrix} 5 & 21 \\ 21 & 111 \end{bmatrix}$$

and

$$M^T Y = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 3 & 4 & 6 & 7 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 15 \\ 78 \end{bmatrix}$$

so the normal equations (M^T M)A = M^T Y for A = [a_0, a_1]^T become

$$\begin{bmatrix} 5 & 21 \\ 21 & 111 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 15 \\ 78 \end{bmatrix}$$

The solution (using gaussian elimination) is A = [a_0, a_1]^T = [0.24, 0.66]^T to two decimal places, so the least squares approximating line for these data is y = 0.24 + 0.66x. Note that M^T M is indeed invertible here (the determinant is 114), and the exact solution is

$$A = (M^T M)^{-1} M^T Y = \frac{1}{114} \begin{bmatrix} 111 & -21 \\ -21 & 5 \end{bmatrix} \begin{bmatrix} 15 \\ 78 \end{bmatrix} = \frac{1}{114} \begin{bmatrix} 27 \\ 75 \end{bmatrix} = \frac{1}{38} \begin{bmatrix} 9 \\ 25 \end{bmatrix}.$$
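As an informal numerical check of Example 1, here is a sketch of ours (assuming NumPy) that reproduces these numbers:

```python
import numpy as np

# Data from Example 1
x = np.array([1.0, 3.0, 4.0, 6.0, 7.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

M = np.column_stack([np.ones_like(x), x])
print(M.T @ M)  # [[  5.  21.]
                #  [ 21. 111.]]
print(M.T @ y)  # [15. 78.]

a0, a1 = np.linalg.solve(M.T @ M, M.T @ y)
print(f"{a0:.2f} {a1:.2f}")  # 0.24 0.66  (exactly 9/38 and 25/38)
```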

Suppose now that, rather than a straight line, we want to find the parabola y = a_0 + a_1 x + a_2 x^2 that is the least squares approximation to the data points (x_1, y_1), ..., (x_n, y_n). In the function f(x) = a_0 + a_1 x + a_2 x^2, the three constants a_0, a_1, and a_2 must be chosen to minimize the sum of squares of the errors:

S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + ... + [y_n - f(x_n)]^2.

Choosing a_0, a_1, and a_2 amounts to choosing the (parabolic) function f that minimizes S.

In general, there is a relationship y = f(x) between the variables, and the range of candidate functions is limited, say, to all lines or to all parabolas. The task is to find, among the suitable candidates, the function that makes the quantity S as small as possible. The function that does so is called the least squares approximating function (of that type) for the data points.

As might be imagined, this is not always an easy task. However, if the functions f(x) are restricted to polynomials of degree m,

f(x) = a_0 + a_1 x + ... + a_m x^m,

the analysis proceeds much as before (where m = 1). The problem is to choose the numbers a_0, a_1, ..., a_m so as to minimize the sum

S = [y_1 - f(x_1)]^2 + [y_2 - f(x_2)]^2 + ... + [y_n - f(x_n)]^2.

The resulting function y = f(x) = a_0 + a_1 x + ... + a_m x^m is called the least squares approximating polynomial of degree m for the data (x_1, y_1), ..., (x_n, y_n). By analogy with the preceding analysis, define

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix} \qquad A = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}$$

Then

$$Y - MA = \begin{bmatrix} y_1 - (a_0 + a_1 x_1 + \cdots + a_m x_1^m) \\ y_2 - (a_0 + a_1 x_2 + \cdots + a_m x_2^m) \\ \vdots \\ y_n - (a_0 + a_1 x_n + \cdots + a_m x_n^m) \end{bmatrix} = \begin{bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{bmatrix}$$

so S is the sum of the squares of the entries of Y - MA. An analysis similar to that for Theorem 1 can be used to prove Theorem 2.

Theorem 2

Suppose n data points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) are given, where at least m + 1 of x_1, x_2, ..., x_n are distinct (in particular n ≥ m + 1). Put

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix} \qquad A = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}$$

Then the least squares approximating polynomial of degree m for the data points has the equation

y = a_0 + a_1 x + ... + a_m x^m

where A = [a_0, a_1, ..., a_m]^T is found by gaussian elimination from the normal equations

(M^T M) A = M^T Y.

The condition that at least m + 1 of x_1, x_2, ..., x_n be distinct ensures that the matrix M^T M is invertible, so A is unique:

A = (M^T M)^{-1} M^T Y.

A proof of this theorem is given in Section 8.7 (Theorem 2).
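In code, the only change from the straight-line case is that M gains the extra columns x^2, ..., x^m. A minimal Python sketch of ours (assuming NumPy; the name least_squares_poly is ours):

```python
import numpy as np

def least_squares_poly(x, y, m):
    """Least squares polynomial of degree m via the normal equations of Theorem 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    M = np.vander(x, m + 1, increasing=True)  # columns 1, x, x^2, ..., x^m
    return np.linalg.solve(M.T @ M, M.T @ y)  # A = [a0, a1, ..., am]

# With the data of Example 2 below this gives approximately [1.15, 0.20, 0.26].
# Cross-check: np.polyfit(x, y, m)[::-1] minimizes the same sum of squares
# (polyfit lists the highest-degree coefficient first).
print(least_squares_poly([-3, -1, 0, 1, 3], [3, 1, 1, 2, 4], 2))
```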

Example 2

Find the least squares approximating quadratic y = a_0 + a_1 x + a_2 x^2 for the following data points.

(-3, 3), (-1, 1), (0, 1), (1, 2), (3, 4)

Solution This is an instance of Theorem 2 with m = 2. Here

$$Y = \begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \\ 4 \end{bmatrix} \qquad M = \begin{bmatrix} 1 & -3 & 9 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 3 & 9 \end{bmatrix}$$

Hence,

$$M^T M = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -3 & -1 & 0 & 1 & 3 \\ 9 & 1 & 0 & 1 & 9 \end{bmatrix} \begin{bmatrix} 1 & -3 & 9 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 3 & 9 \end{bmatrix} = \begin{bmatrix} 5 & 0 & 20 \\ 0 & 20 & 0 \\ 20 & 0 & 164 \end{bmatrix}$$

and

$$M^T Y = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ -3 & -1 & 0 & 1 & 3 \\ 9 & 1 & 0 & 1 & 9 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 11 \\ 4 \\ 66 \end{bmatrix}$$

The normal equations for A are

$$\begin{bmatrix} 5 & 0 & 20 \\ 0 & 20 & 0 \\ 20 & 0 & 164 \end{bmatrix} A = \begin{bmatrix} 11 \\ 4 \\ 66 \end{bmatrix} \qquad \text{whence} \qquad A = \begin{bmatrix} 1.15 \\ 0.20 \\ 0.26 \end{bmatrix}$$

to two decimal places. This means that the least squares approximating quadratic for these data is y = 1.15 + 0.20x + 0.26x^2.

Least squares approximation can be used to estimate physical constants, as is illustrated by the next example.

Example 3

Hooke's law in mechanics asserts that the magnitude of the force f required to hold a spring is a linear function of the extension e of the spring (see the accompanying diagram). That is,

f = ke + e_0

where k and e_0 are constants depending only on the spring. The following data were collected for a particular spring.

e |  9 11 12 16 19
f | 33 38 43 54 61

Find the least squares approximating line f = a_0 + a_1 e for these data, and use it to estimate k.

[Diagram: a spring stretched by extension e under an applied force f.]

Solution Here f and e play the role of y and x in the general theory. We have

$$Y = \begin{bmatrix} 33 \\ 38 \\ 43 \\ 54 \\ 61 \end{bmatrix} \qquad M = \begin{bmatrix} 1 & 9 \\ 1 & 11 \\ 1 & 12 \\ 1 & 16 \\ 1 & 19 \end{bmatrix}$$

as in Theorem 1, so

$$M^T M = \begin{bmatrix} 5 & 67 \\ 67 & 963 \end{bmatrix} \qquad \text{and} \qquad M^T Y = \begin{bmatrix} 229 \\ 3254 \end{bmatrix}$$

Hence the normal equations for A are

$$\begin{bmatrix} 5 & 67 \\ 67 & 963 \end{bmatrix} A = \begin{bmatrix} 229 \\ 3254 \end{bmatrix} \qquad \text{whence} \qquad A = \begin{bmatrix} 7.70 \\ 2.84 \end{bmatrix}$$

The least squares approximating line is f = 7.70 + 2.84e, so the estimate for k is k = 2.84.
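A quick numerical check of this example (a sketch of ours, assuming NumPy):

```python
import numpy as np

# Spring data from Example 3: extensions e and forces f
e = np.array([9.0, 11.0, 12.0, 16.0, 19.0])
f = np.array([33.0, 38.0, 43.0, 54.0, 61.0])

M = np.column_stack([np.ones_like(e), e])
a0, a1 = np.linalg.solve(M.T @ M, M.T @ f)  # normal equations (M^T M) A = M^T Y
print(f"f = {a0:.2f} + {a1:.2f}e")          # f = 7.70 + 2.84e; the slope estimates k
```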

Exercises 5.7

1. Find the least squares approximating line y = a_0 + a_1 x for each of the following sets of data points.

(a) (1, 1), (3, 2), (4, 3), (6, 4)
(b) (2, 4), (4, 3), (7, 2), (8, 1)
(c) (-1, 1), (0, 1), (1, 2), (2, 4), (3, 6)
(d) (-2, 3), (-1, 1), (0, 0), (1, 2), (2, 4)

2. Find the least squares approximating quadratic y = a_0 + a_1 x + a_2 x^2 for each of the following sets of data points.

(a) (0, 1), (2, 2), (3, 3), (4, 5)
(b) (-2, 1), (0, 0), (3, 2), (4, 3)

3. If M is a square invertible matrix, show that A = M^{-1}Y (in the notation of Theorem 2).

4. Newton's laws of motion imply that an object dropped from rest at a height of 100 metres will be at a height s = 100 - (1/2)gt^2 metres t seconds later, where g is a constant called the acceleration due to gravity. The values of s and t given in the table are observed:

t |  1  2  3
s | 95 80 56

Write x = t^2, find the least squares approximating line s = a + bx for these data, and use b to estimate g. Then find the least squares approximating quadratic s = a_0 + a_1 t + a_2 t^2 and use the value of a_2 to estimate g.

5. A naturalist measured the heights y_i (in metres) of several spruce trees with trunk diameters x_i (in centimetres). The data are as given in the table:

x_i | 5   7 8 12   13 16
y_i | 2 3.3 4 7.3 7.9 10.1

Find the least squares approximating line for these data and use it to estimate the height of a spruce tree with a trunk of diameter 10 cm.

6. (a) Use m = 0 in Theorem 2 to show that the best-fitting horizontal line y = a_0 through the data points (x_1, y_1), ..., (x_n, y_n) is

y = (1/n)(y_1 + y_2 + ... + y_n),

the average of the y coordinates.

(b) Deduce the conclusion in (a) without using Theorem 2.

7. Assume n = m + 1 in Theorem 2 (so M is square). If the x_i are distinct, use Theorem 6 §3.2 to show that M is invertible. Deduce that A = M^{-1}Y and that the least squares polynomial is the interpolating polynomial (Theorem 6 §3.2) and actually passes through all the data points.


Supplementary Exercises for Chapter 5

1. In each case either show that the statement is true or give an example showing that it is false. Throughout, X, Y, Z, X_1, X_2, ..., X_n denote vectors in R^n.

(a) If U is a subspace of R^n and X + Y is in U, then X and Y are both in U.

(b) If U is a subspace of R^n and rX is in U, then X is in U.

(c) If U is a nonempty set and sX + tY is in U for any s and t whenever X and Y are in U, then U is a subspace.

(d) If U is a subspace of R^n and X is in U, then -X is in U.

(e) If {X, Y} is independent, then {X, Y, X + Y} is independent.

(f) If {X, Y, Z} is independent, then {X, Y} is independent.

(g) If {X, Y} is not independent, then {X, Y, Z} is not independent.

(h) If all of X_1, X_2, ..., X_n are nonzero, then {X_1, X_2, ..., X_n} is independent.

(i) If one of X_1, X_2, ..., X_n is zero, then {X_1, X_2, ..., X_n} is not independent.

(j) If aX + bY + cZ = 0 where a, b, and c are in R, then {X, Y, Z} is independent.

(k) If {X, Y, Z} is independent, then aX + bY + cZ = 0 for some a, b, and c in R.

(l) If {X_1, X_2, ..., X_n} is not independent, then t_1X_1 + t_2X_2 + ... + t_nX_n = 0 for t_i in R not all zero.

(m) If {X_1, X_2, ..., X_n} is independent, then t_1X_1 + t_2X_2 + ... + t_nX_n = 0 for some t_i in R.

(n) Every set of four nonzero vectors in R^4 is a basis.

(o) No basis of R^3 can contain a vector with a component 0.

(p) R^3 has a basis of the form {X, X + Y, Y} where X and Y are vectors.

(q) Every basis of R^5 contains one column of I_5.

(r) Every nonempty subset of a basis of R^3 is again a basis of R^3.

(s) If {X_1, X_2, X_3, X_4} and {Y_1, Y_2, Y_3, Y_4} are bases of R^4, then {X_1 + Y_1, X_2 + Y_2, X_3 + Y_3, X_4 + Y_4} is also a basis of R^4.

