
Linear Algebra: MAT 217

Lecture notes, Spring 2012

Michael Damron

compiled from lectures and exercises designed with Tasho Kaletha

Princeton University

1
Contents
1 Vector spaces 4
1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Linear independence and bases . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Linear transformations 16
2.1 Definitions and basic properties . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Range and nullspace, one-to-one, onto . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Isomorphisms and L(V, W ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Matrices and coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3 Dual spaces 32
3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Annihilators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Double dual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Dual maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4 Determinants 39
4.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Determinants: existence and uniqueness . . . . . . . . . . . . . . . . . . . . 41
4.3 Properties of determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

5 Eigenvalues 54
5.1 Definitions and the characteristic polynomial . . . . . . . . . . . . . . . . . . 54
5.2 Eigenspaces and the main diagonalizability theorem . . . . . . . . . . . . . . 56
5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

6 Jordan form 62
6.1 Generalized eigenspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.2 Primary decomposition theorem . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.3 Nilpotent operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.4 Existence and uniqueness of Jordan form, Cayley-Hamilton . . . . . . . . . . 70
6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7 Bilinear forms 79
7.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2 Symmetric bilinear forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.3 Sesquilinear and Hermitian forms . . . . . . . . . . . . . . . . . . . . . . . . 84

2
7.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

8 Inner product spaces 88


8.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.3 Adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
8.4 Spectral theory of self-adjoint operators . . . . . . . . . . . . . . . . . . . . . 95
8.5 Normal and commuting operators . . . . . . . . . . . . . . . . . . . . . . . . 98
8.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

3
1 Vector spaces
1.1 Definitions
We begin with the definition of a vector space. (Keep in mind vectors in R^n or C^n.)

Definition 1.1.1. A vector space is a collection of two sets, V and F. The elements of F
(usually we take R or C) are called scalars and the elements of V are called vectors. For
each v, w ∈ V, there is a vector sum, v + w ∈ V, with the following properties.

0. There is one (and only one) vector called ~0 with the property

   v + ~0 = v for all v ∈ V ;

1. for each v ∈ V there is one (and only one) vector called −v with the property

   v + (−v) = ~0 ;

2. commutativity of vector sum:

   v + w = w + v for all v, w ∈ V ;

3. associativity of vector sum:

   (v + w) + z = v + (w + z) for all v, w, z ∈ V .

Furthermore, for each v ∈ V and c ∈ F there is a scalar product cv ∈ V with the following
properties.

1. For all v ∈ V ,
   1v = v .

2. For all v ∈ V and c, d ∈ F ,
   (cd)v = c(dv) .

3. For all c ∈ F and v, w ∈ V ,
   c(v + w) = cv + cw .

4. For all c, d ∈ F and v ∈ V ,
   (c + d)v = cv + dv .

Here are some examples.


1. V = Rn , F = R. Addition is given by
(v1 , . . . , vn ) + (w1 , . . . , wn ) = (v1 + w1 , . . . , vn + wn )
and scalar multiplication is given by
c(v1 , . . . , vn ) = (cv1 , . . . , cvn ) .

4
2. Polynomials: take V to be all polynomials of degree up to n with real coefficients

   V = {a_n x^n + · · · + a_1 x + a_0 : a_i ∈ R for all i}

   and take F = R. Note the similarity to R^{n+1} (a polynomial here is determined by its
   n + 1 coefficients).

3. Let S be any nonempty set and let V be the set of functions from S to C. Set F = C.
If f1 , f2 V set f1 + f2 to be the function given by

(f1 + f2 )(s) = f1 (s) + f2 (s) for all s S

and if c C set cf1 to be the function given by

(cf1 )(s) = c(f1 (s)) .

4. Let

       V = { ( 1 1 ; 0 0 ) } ,

   the set containing the single 2 × 2 matrix with rows (1, 1) and (0, 0) (or any other fixed
   object), with F = C. Here and below we write a 2 × 2 matrix as ( a b ; c d ), separating
   its rows by a semicolon. Define

       ( 1 1 ; 0 0 ) + ( 1 1 ; 0 0 ) = ( 1 1 ; 0 0 )

   and

       c ( 1 1 ; 0 0 ) = ( 1 1 ; 0 0 ) .

In general F is allowed to be a field.

Definition 1.1.2. A set F is called a field if for each a, b ∈ F there is an element ab ∈ F
and an element a + b ∈ F such that the following hold.

1. For all a, b, c ∈ F we have (ab)c = a(bc) and (a + b) + c = a + (b + c).

2. For all a, b ∈ F we have ab = ba and a + b = b + a.

3. There exists an element 0 ∈ F such that for all a ∈ F , we have a + 0 = a; furthermore
   there is a non-zero element 1 ∈ F such that for all a ∈ F , we have 1a = a.

4. For each a ∈ F there is an element −a ∈ F such that a + (−a) = 0. If a ≠ 0 there
   exists an element a^{−1} such that a a^{−1} = 1.

5. For all a, b, c ∈ F ,
   a(b + c) = ab + ac .

Here are some general facts.

1. For all c F , c~0 = ~0.

5
Proof.
    c~0 = c(~0 + ~0) = c~0 + c~0

    c~0 + (−(c~0)) = (c~0 + c~0) + (−(c~0))

    ~0 = c~0 + (c~0 + (−(c~0)))
    ~0 = c~0 + ~0
    ~0 = c~0 .

Similarly one may prove that for all v ∈ V , 0v = ~0.

2. For all v ∈ V , (−1)v = −v.

Proof.

    v + (−1)v = 1v + (−1)v
              = (1 + (−1))v
              = 0v
              = ~0 .

However −v is the unique vector such that v + (−v) = ~0. Therefore (−1)v = −v.

1.2 Subspaces
Definition 1.2.1. A subset W ⊆ V of a vector space is called a subspace if (W, F ) with the
same operations is also a vector space.

Many of the rules for vector spaces follow directly by inheritance. For example, if
W ⊆ V then for all v, w ∈ W we have v + w = w + v. We actually only need to check a few:

A. ~0 ∈ W .
B. For all w ∈ W the vector −w is also in W .
C. For all w ∈ W and c ∈ F , cw ∈ W .
D. For all v, w ∈ W , v + w ∈ W .

Theorem 1.2.2. W V is a subspace if and only if it is nonempty and for all v, w W


and c F we have cv + w W .
Proof. Suppose that W is a subspace and let v, w W , c F . By (C) we have cv W . By
(D) we have cv + w W .
Conversely suppose that for all v, w W and c F we have cv + w W . Then we need
to show A-D.

6
A. Since W is nonempty choose w ∈ W . Let v = w and c = −1. This gives ~0 = −w + w ∈ W .
B. Set v = w, w = ~0 and c = −1.
C. Set v = w, w = ~0 and c ∈ F .
D. Set c = 1.

Examples:
1. If V is a vector space then {0} is a subspace.
2. Take V = C^n . Let

       W = {(z1 , . . . , zn ) : z1 + · · · + zn = 0} .

   Then W is a subspace. (Exercise.)


3. Let V be the set of 2 × 2 matrices with real entries, with

       ( a1 b1 ; c1 d1 ) + ( a2 b2 ; c2 d2 ) = ( a1 + a2 , b1 + b2 ; c1 + c2 , d1 + d2 )

   and

       c ( a1 b1 ; c1 d1 ) = ( ca1 , cb1 ; cc1 , cd1 ) .

   Is W a subspace, where

       W = { ( a b ; c d ) : a + d = 1 } ?

4. In R^n , lines and hyperplanes through the origin are subspaces.


Theorem 1.2.3. Suppose that C is a non-empty collection of subspaces of V . Then the
intersection
    W̃ = ∩_{W ∈ C} W
is a subspace.

Proof. Let c ∈ F and v, w ∈ W̃ . We need to show that (a) W̃ ≠ ∅ and (b) cv + w ∈ W̃ . The
first holds because each W is a subspace, so ~0 ∈ W . Next, for all W ∈ C we have v, w ∈ W .
Then cv + w ∈ W . Therefore cv + w ∈ W̃ .

Definition 1.2.4. If S ⊆ V is a subset of vectors let C_S be the collection of all subspaces
containing S. The span of S is defined as

    span(S) = ∩_{W ∈ C_S} W .

Since C_S is non-empty, span(S) is a subspace.

7
Question: What is span(∅)?

Definition 1.2.5. If
    v = a1 w1 + · · · + ak wk
for scalars ai ∈ F and vectors wi ∈ V then we say that v is a linear combination of
{w1 , . . . , wk }.

Theorem 1.2.6. If S ≠ ∅ then span(S) is equal to the set of all finite linear combinations
of elements of S.

Proof. Set
    S̃ = the set of all finite linear combinations of elements of S .
We want to show that S̃ = span(S). First we show that S̃ ⊆ span(S). Let

    a1 s1 + · · · + ak sk ∈ S̃

and let W be a subspace in C_S . Since si ∈ S for all i we have si ∈ W . By virtue of
W being a subspace, a1 s1 + · · · + ak sk ∈ W . Since this is true for all W ∈ C_S then
a1 s1 + · · · + ak sk ∈ span(S). Therefore S̃ ⊆ span(S).
In the other direction, the set S̃ is itself a subspace and it contains S (exercise). Thus
S̃ ∈ C_S and so
    span(S) = ∩_{W ∈ C_S} W ⊆ S̃ .
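A small numerical sketch of Theorem 1.2.6 (not part of the notes; the vectors are arbitrary
choices for illustration): deciding whether a vector of R^3 lies in the span of finitely many
vectors amounts to solving a linear system for the coefficients of a linear combination.

    import numpy as np

    # Two spanning vectors in R^3 (arbitrary choices).
    s1 = np.array([1.0, 0.0, 1.0])
    s2 = np.array([0.0, 1.0, 1.0])

    # Is v in span({s1, s2})?  Equivalently: does a1*s1 + a2*s2 = v have a solution?
    v = np.array([2.0, 3.0, 5.0])

    A = np.column_stack([s1, s2])          # columns are the spanning vectors
    coeffs, _, _, _ = np.linalg.lstsq(A, v, rcond=None)

    in_span = np.allclose(A @ coeffs, v)   # True exactly when v is a linear combination
    print(in_span, coeffs)                 # True, with coefficients a1 = 2, a2 = 3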

Question: If W1 and W2 are subspaces, is W1 ∪ W2 a subspace? If it were, it would be the
smallest subspace containing W1 ∪ W2 and thus would be equal to span(W1 ∪ W2 ). But it is
not!

Proposition 1.2.7. If W1 and W2 are subspaces then span(W1 ∪ W2 ) = W1 + W2 , where

    W1 + W2 = {w1 + w2 : w1 ∈ W1 , w2 ∈ W2 } .

Furthermore for each n ≥ 1,

    span(W1 ∪ · · · ∪ Wn ) = W1 + · · · + Wn .

Proof. A general element w1 + w2 in W1 + W2 is a linear combination of elements in W1 ∪ W2 so
it is in the span of W1 ∪ W2 . On the other hand if a1 v1 + · · · + an vn is an element of the span then
we can split the vi 's into vectors from W1 and those from W2 . For instance, v1 , . . . , vk ∈ W1
and vk+1 , . . . , vn ∈ W2 . Now a1 v1 + · · · + ak vk ∈ W1 and ak+1 vk+1 + · · · + an vn ∈ W2 by the
fact that these are subspaces. Thus this linear combination is equal to w1 + w2 for

    w1 = a1 v1 + · · · + ak vk and w2 = ak+1 vk+1 + · · · + an vn .

The general case is an exercise.

Remark. 1. Span(S) is the smallest subspace containing S in the following sense: if
W is any subspace containing S then span(S) ⊆ W .
2. If S ⊆ T then Span(S) ⊆ Span(T ).
3. If W is a subspace then Span(W ) = W . Therefore Span(Span(S)) = Span(S).

8
1.3 Linear independence and bases
Now we move on to linear independence.

Definition 1.3.1. We say that vectors v1 , . . . , vk V are linearly independent if whenever

a1 v1 + + ak vk = ~0 for scalars ai F

then ai = 0 for all i. Otherwise we say they are linearly dependent.

Lemma 1.3.2. Let S = {v1 , . . . , vn } for n ≥ 1. Then S is linearly dependent if and only if
there exists v ∈ S such that v ∈ Span(S \ {v}).

Proof. Suppose first that S is linearly dependent and that n ≥ 2. Then there exist scalars
a1 , . . . , an ∈ F which are not all zero such that a1 v1 + · · · + an vn = ~0. By reordering we may
assume that a1 ≠ 0. Now

    v1 = (−a2 /a1 ) v2 + · · · + (−an /a1 ) vn .

So v1 ∈ Span(S \ {v1 }).
If S is linearly dependent and n = 1 then there exists a nonzero a1 such that a1 v1 = ~0,
so v1 = ~0. Now v1 ∈ Span(∅) = Span(S \ {v1 }).
Conversely, suppose there exists v ∈ S such that v ∈ Span(S \ {v}) and n ≥ 2. By
reordering we may suppose that v = v1 . Then there exist scalars a2 , . . . , an such that

    v1 = a2 v2 + · · · + an vn .

But now we have

    (−1)v1 + a2 v2 + · · · + an vn = ~0 .

Since this is a nontrivial linear combination, S is linearly dependent.
If n = 1 and v1 ∈ Span(S \ {v1 }) then v1 ∈ Span(∅) = {~0}, so that v1 = ~0. Now it is
easy to see that S is linearly dependent.

Examples:

1. For two vectors v1 , v2 ∈ V , they are linearly dependent if and only if one is a scalar
   multiple of the other. By reordering, we may suppose v1 = av2 .

2. For three vectors this is not true anymore. The vectors (1, 1), (1, 0) and (0, 1) in R^2
   are linearly dependent since

       (1, 1) + (−1)(1, 0) + (−1)(0, 1) = (0, 0) .

   However none of these is a scalar multiple of another.

9
3. {~0} is linearly dependent:
1 ~0 = ~0 .
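A quick numerical check of example 2 above (a sketch, not part of the notes): finitely many
vectors are linearly independent exactly when the matrix having them as rows has rank equal
to the number of vectors.

    import numpy as np

    # The three vectors of example 2, as rows: (1,1), (1,0), (0,1) in R^2.
    vectors = np.array([[1.0, 1.0],
                        [1.0, 0.0],
                        [0.0, 1.0]])

    rank = np.linalg.matrix_rank(vectors)
    print(rank == len(vectors))   # False: three vectors in R^2 are always dependent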

Proposition 1.3.3. If S is a linearly independent set and T is a nonempty subset then T


is linearly independent.

Proof. Suppose that


a1 t1 + + an tn = ~0
for vectors ti T and scalars ai F . Since each ti S this is a linear combination
of elements of S. Since S is linearly independent all the coefficients must be zero. Thus
ai = 0 for all i. This was an arbitrary linear combination of elements of T so T is linearly
independent.

Corollary 1.3.4. If S is linearly dependent and R contains S then R is linearly dependent.


In particular, any set containing ~0 is linearly dependent.

Proposition 1.3.5. If S is linearly independent and v Span(S) then there exist unique
vectors v1 , . . . , vn and scalars a1 , . . . , an such that

v = a1 v1 + + an vn .

Proof. Suppose that S is linearly independent and there are two representations

v = a1 v1 + + an vn and v = b1 w1 + + bk wk .

Then split the vectors in {v1 , . . . , vn } ∪ {w1 , . . . , wk } into three sets: S1 are those in the first
but not the second, S2 are those in the second but not the first, and S3 are those in both.

    ~0 = v − v = Σ_{sj ∈ S1} aj sj + Σ_{sj ∈ S2} (−bj )sj + Σ_{sj ∈ S3} (aj − bj )sj .

This is a linear combination of elements from S and by linear independence all coefficients are
zero. Thus both representations used the same vectors (in S3 ) and with the same coefficients
and are thus the same.

Lemma 1.3.6 (Steinitz exchange). Let L = {v1 , . . . , vk } be a linearly independent set in a
vector space V and let S = {w1 , . . . , wm } be a spanning set; that is, Span(S) = V . Then

1. k ≤ m and

2. there exist m − k vectors s1 , . . . , s_{m−k} ∈ S such that

   Span({v1 , . . . , vk , s1 , . . . , s_{m−k} }) = V .

10
Proof. We will prove this by induction on k. For k = 0 it is obviously true (using the fact
that ∅ is linearly independent). Suppose it is true for k and we will prove it for k + 1.
In other words, let {v1 , . . . , vk+1 } be a linearly independent set. Then by last lecture, since
{v1 , . . . , vk } is linearly independent, we find k ≤ m and vectors s1 , . . . , s_{m−k} ∈ S with

    Span({v1 , . . . , vk , s1 , . . . , s_{m−k} }) = V .

Now since vk+1 ∈ V we can find scalars a1 , . . . , ak and b1 , . . . , b_{m−k} in F such that

    vk+1 = a1 v1 + · · · + ak vk + b1 s1 + · · · + b_{m−k} s_{m−k} .     (1)

We claim that not all of the bi 's are zero. If this were the case then we would have

    vk+1 = a1 v1 + · · · + ak vk

    a1 v1 + · · · + ak vk + (−1)vk+1 = ~0 ,

a contradiction to linear independence. Also this implies that k ≠ m, since otherwise the
linear combination (1) would contain no bi 's. Thus

    k + 1 ≤ m .

Suppose for example that b1 ≠ 0. Then we could write

    b1 s1 = (−a1 )v1 + · · · + (−ak )vk + vk+1 + (−b2 )s2 + · · · + (−b_{m−k} )s_{m−k}

    s1 = (−a1 /b1 )v1 + · · · + (−ak /b1 )vk + (1/b1 )vk+1 + (−b2 /b1 )s2 + · · · + (−b_{m−k} /b1 )s_{m−k} .

In other words,

    s1 ∈ Span(v1 , . . . , vk , vk+1 , s2 , . . . , s_{m−k} )

or said differently,

    V = Span(v1 , . . . , vk , s1 , . . . , s_{m−k} ) ⊆ Span(v1 , . . . , vk+1 , s2 , . . . , s_{m−k} ) .

This completes the proof.


This lemma has loads of consequences.
Definition 1.3.7. A set B V is called a basis for V if B is linearly independent and
Span(B) = V .
Corollary 1.3.8. If B1 and B2 are bases for V then they have the same number of elements.
Proof. If both sets are infinite, we are done. If B1 is infinite and B2 is finite then we may
choose |B2 | + 1 elements of B1 that will be linearly independent. Since B2 spans V , this
contradicts Steinitz. Similarly if B1 is finite and B2 is infinite.
Otherwise they are both finite. Since each spans V and is linearly independent, we apply
Steinitz twice to get |B1 | |B2 | and |B2 | |B1 |.

11
Definition 1.3.9. We define the dimension dim V to be the number of elements in a basis.
By the above, this is well-defined.
Remark. Each nonzero element of V has a unique representation in terms of a basis.
Theorem 1.3.10. Let V be a nonzero vector space and suppose that V is finitely generated;
that is, there is a finite set S ⊆ V such that V = Span(S). Then V has a finite basis:
dim V < ∞.

Proof. Let B be a minimal spanning subset of S. We claim that B is linearly independent.
If not, then by a previous result, there is a vector b ∈ B such that b ∈ Span(B \ {b}). It
then follows that
    V = Span(B) ⊆ Span(B \ {b}) ,
so B \ {b} is a spanning set, a contradiction.
Theorem 1.3.11 (1-subspace theorem). Let V be a finite dimensional vector space and W
be a nonzero subspace of V . Then dim W < ∞. If C = {w1 , . . . , wk } is a basis for W then
there exists a basis B for V such that C ⊆ B.

Proof. It is an exercise to show that dim W < ∞. Write n for the dimension of V and
let B be a basis for V (with n elements). Since this is a spanning set and C is a linearly
independent set, there exist vectors b1 , . . . , b_{n−k} ∈ B such that

    B′ := C ∪ {b1 , . . . , b_{n−k} }

is a spanning set. Note that B′ is a spanning set with n = dim V elements. Thus
we will be done if we prove the following lemma.
Lemma 1.3.12. Let V be a vector space of dimension n 1 and S = {v1 , . . . , vk } V .
1. If k < n then S cannot span V .
2. If k > n then S cannot be linearly independent.
3. If k = n then S is linearly independent if and only if S spans V .
Proof. Let B be a basis of V . If S spans V , Steinitz gives that |B| |S|. This proves 1. If
S is linearly independent then again Steinitz gives |S| |B|. This proves 2.
Suppose that k = n and S is linearly independent. By Steinitz, we may add 0 vectors
from the set B to S to make S span V . Thus Span(S) = V . Conversely, suppose that
Span(S) = V . If S is linearly dependent then there exists s S such that s Span(S \{s}).
Then S \ {s} is a set of smaller cardinality than S and spans V . But now this contradicts
Steinitz, using B as our linearly independent set and S \ {s} as our spanning set.
Corollary 1.3.13. If V is a vector space and W is a subspace then dim W ≤ dim V .

Theorem 1.3.14. Let W1 and W2 be subspaces of V (with dim V < ∞). Then

    dim W1 + dim W2 = dim(W1 ∩ W2 ) + dim(W1 + W2 ) .

12
Proof. Easy if either is zero. Otherwise we argue as follows. Let

    {v1 , . . . , vk }

be a basis for W1 ∩ W2 . By the 1-subspace theorem, extend this to a basis of W1 :

    {v1 , . . . , vk , w1 , . . . , w_{m1} }

and also extend it to a basis for W2 :

    {v1 , . . . , vk , w′1 , . . . , w′_{m2} } .

We claim that

    B := {v1 , . . . , vk , w1 , . . . , w_{m1} , w′1 , . . . , w′_{m2} }

is a basis for W1 + W2 . It is not hard to see it is spanning.
To show linear independence, suppose

    a1 v1 + · · · + ak vk + b1 w1 + · · · + b_{m1} w_{m1} + c1 w′1 + · · · + c_{m2} w′_{m2} = ~0 .

Then

    c1 w′1 + · · · + c_{m2} w′_{m2} = (−a1 )v1 + · · · + (−ak )vk + (−b1 )w1 + · · · + (−b_{m1} )w_{m1} ∈ W1 .

Also it is clearly in W2 . So it is in W1 ∩ W2 . So we can find scalars ã1 , . . . , ãk such that

    c1 w′1 + · · · + c_{m2} w′_{m2} = ã1 v1 + · · · + ãk vk

    ã1 v1 + · · · + ãk vk + (−c1 )w′1 + · · · + (−c_{m2} )w′_{m2} = ~0 .

This is a linear combination of basis elements, so all ci 's are zero. A similar argument gives
all bi 's as zero. Thus we finally have

    a1 v1 + · · · + ak vk = ~0 .

But again this is a linear combination of basis elements so the ai 's are zero.

Theorem 1.3.15 (2 subspace theorem). Let V be a finite dimensional vector space and W1
and W2 be nonzero subspaces. There exists a basis of V that contains bases for W1 and W2 .

Proof. The proof of the previous theorem shows that there is a basis for W1 + W2 which
contains bases for W1 and W2 . Extend this to a basis for V .

13
1.4 Exercises
Notation:

1. If F is a field then define s1 : F → F by s1 (x) = x. For integers n ≥ 2 define
   sn : F → F by sn (x) = x + s_{n−1} (x). Last, define the characteristic of F as

       char(F ) = min{n : sn (1) = 0} .

   If sn (1) ≠ 0 for all n ≥ 1 then we set char(F ) = 0.

Exercises:

1. The finite field Fp : For n ∈ N, let Z/nZ denote the set of integers mod n. That is, each
   element of Z/nZ is a subset of Z of the form d + nZ, where d ∈ Z. We define addition
   and multiplication on Z/nZ by

       (a + nZ) + (b + nZ) = (a + b) + nZ ,
       (a + nZ) · (b + nZ) = (ab) + nZ .

   Show that these operations are well defined. That is, if a′, b′ ∈ Z are integers such that
   a + nZ = a′ + nZ and b + nZ = b′ + nZ, then (a′ + nZ) + (b′ + nZ) = (a + b) + nZ and
   (a′ + nZ) · (b′ + nZ) = (ab) + nZ. Moreover, show that these operations make Z/nZ into
   a field if and only if n is prime. In that case, one writes Z/pZ = Fp .

2. The finite field Fpn : Let F be a finite field.

   (a) Show that char(F ) is a prime number (in particular non-zero).

   (b) Write p for the characteristic of F and define

           F ′ = {sn (1) : n = 1, . . . , p} ,

       where sn is the function given in the notation section. Show that F ′ is a subfield
       of F , isomorphic to Fp .
   (c) We can consider F as a vector space over F ′. Vector addition and scalar mul-
       tiplication are interpreted using the operations of F . Show that F has finite
       dimension.
   (d) Writing n for the dimension of F , show that

           |F | = p^n .

3. Consider R as a vector space over Q (using addition and multiplication of real numbers).
Does this vector space have finite dimension?

14
4. Recall the definition of direct sum: if W1 and W2 are subspaces of a vector space V
   then we write W1 ⊕ W2 for the space W1 + W2 if W1 ∩ W2 = {~0}. For k ≥ 3 we write
   W1 ⊕ · · · ⊕ Wk for the space W1 + · · · + Wk if for each i = 2, . . . , k, we have

       Wi ∩ (W1 + · · · + W_{i−1} ) = {~0} .

   Let S = {v1 , . . . , vn } be a subset of nonzero vectors in a vector space V and for each
   k = 1, . . . , n write Wk = Span({vk }). Show that S is a basis for V if and only if

       V = W1 ⊕ · · · ⊕ Wn .

15
2 Linear transformations
2.1 Definitions and basic properties
We now move to linear transformations.
Definition 2.1.1. Let V and W be vector spaces over the same field F . A function T : V
W is called a linear transformation if
1. for all v1 , v2 V , T (v1 + v2 ) = T (v1 ) + T (v2 ) and

2. for all v V and c F , T (cv) = cT (v).


Remark. We only need to check that for all v1 , v2 V , c F , T (cv1 + v2 ) = cT (v1 ) + T (v2 ).

Examples:
1. Let V = F n and W = F m , the vector spaces of n-tuples and m-tuples respectively.
Any m n matrix A defines a linear transformation LA : F n F m by

LA~v = A ~v .

2. Let V be a finite dimensional vector space (of dimension n) and fix an (ordered) basis

       β = {v1 , . . . , vn }

   of V . Define T : V → F^n by

       T (v) = (a1 , . . . , an ) ,

   where v = a1 v1 + · · · + an vn . Then T is linear. It is called the coordinate map for the
   ordered basis β.
Suppose that T : F^n → F^m is a linear transformation and ~x ∈ F^n . Then writing ~x =
(x1 , . . . , xn ),

    T (~x) = T ((x1 , . . . , xn ))
          = T (x1 (1, 0, . . . , 0) + · · · + xn (0, . . . , 0, 1))
          = x1 T ((1, 0, . . . , 0)) + · · · + xn T ((0, . . . , 0, 1)) .

Therefore we only need to know the values of T at the standard basis. This leads us to:
Theorem 2.1.2 (The slogan). Given V and W , vector spaces over F , let {v1 , . . . , vn } be a
basis for V . If {w1 , . . . , wn } are any vectors in W , there exists exactly one linear transfor-
mation T : V W such that

T (vi ) = wi for all i = 1, . . . , n .

16
Proof. We need to prove two things: (a) there is such a linear transformation and (b) there
cannot be more than one. Motivated by the above, we first prove (a).
Each v ∈ V has a unique representation

    v = a1 v1 + · · · + an vn .

Define T by
    T (v) = a1 w1 + · · · + an wn .
Note that by unique representations, T is well-defined. We claim that T is linear. Let
v, ṽ ∈ V and c ∈ F . We must show that T (cv + ṽ) = cT (v) + T (ṽ). If v = Σ_{i=1}^n ai vi and
ṽ = Σ_{i=1}^n ãi vi then we claim that the unique representation of cv + ṽ is

    cv + ṽ = (ca1 + ã1 )v1 + · · · + (can + ãn )vn .

Therefore

    T (cv + ṽ) = (ca1 + ã1 )w1 + · · · + (can + ãn )wn
               = c(a1 w1 + · · · + an wn ) + (ã1 w1 + · · · + ãn wn )
               = cT (v) + T (ṽ) .

Thus T is linear.
Now we show that T is unique. Suppose that T ′ is another linear transformation such
that
    T ′(vi ) = wi for all i = 1, . . . , n .
Then if v ∈ V write v = Σ_{i=1}^n ai vi . We have

    T ′(v) = T ′(a1 v1 + · · · + an vn )
           = a1 T ′(v1 ) + · · · + an T ′(vn )
           = a1 w1 + · · · + an wn
           = T (v) .

Since T (v) = T ′(v) for all v, by definition T = T ′.
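A concrete numerical sketch of the slogan (not from the notes; the basis and target vectors
are arbitrary choices): if the basis vectors of V = R^n are the columns of an invertible matrix
B and the prescribed images wi are the columns of a matrix W, then the unique linear map
with T (vi ) = wi has standard matrix W B^{−1}.

    import numpy as np

    B = np.array([[1.0, 1.0],
                  [0.0, 1.0]])      # basis vectors v1 = (1,0), v2 = (1,1) as columns
    W = np.array([[2.0, 0.0],
                  [3.0, 1.0]])      # prescribed images w1 = (2,3), w2 = (0,1)

    # Standard matrix of the unique linear T with T(vi) = wi.
    T = W @ np.linalg.inv(B)

    # Check: T applied to each basis vector recovers the prescribed image.
    print(np.allclose(T @ B, W))    # True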

2.2 Range and nullspace, one-to-one, onto


Definition 2.2.1. If T : V W is linear we define the nullspace of T by

N (T ) = {v V : T (v) = ~0} .

Here, ~0 is the zero vector in W . We also define the range of T

R(T ) = {w W : there exists v V s.t. T (v) = w} .

17
Proposition 2.2.2. If T : V W is linear then N (T ) is a subspace of V and R(T ) is a
subspace of W .

Proof. First note that T (~0) = ~0. This holds from


~0 = 0T (v) = T (0v) = T (~0) .

Therefore both spaces are nonempty.


Choose v1 , v2 N (T ) and c F . Then

T (cv1 + v2 ) = cT (v1 ) + T (v2 ) = c~0 + ~0 = ~0 ,

so that cv1 + v2 N (T ). Therefore N (T ) is a subspace of V . Also if w1 , w2 R(T ) and


c F then we may find v1 , v2 V such that T (v1 ) = w1 and T (v2 ) = w2 . Now

T (cv1 + v2 ) = cT (v1 ) + T (v2 ) = cw1 + w2 ,

so that cw1 + w2 R(T ). Therefore R(T ) is a subspace of W .

Definition 2.2.3. Let S and T be sets and f : S T a function.

1. f is one-to-one if it maps distinct points to distinct points. In other words if s1 , s2 S


such that s1 6= s2 then f (s1 ) 6= f (s2 ). Equivalently, whenever f (s1 ) = f (s2 ) then
s1 = s2 .

2. f is onto if its range is equal to T . That is, for each t T there exists s S such that
f (s) = t.

Theorem 2.2.4. Let T : V W be linear. Then T is one-to-one if and only if N (T ) = {~0}.

Proof. Suppose T is one-to-one. We want to show that N (T ) = {~0}. Clearly ~0 N (T ) as it


is a subspace of V . If v N (T ) then we have

T (v) = ~0 = T (~0) .

Since T is one-to-one this implies that v = ~0.


Suppose conversely that N (T ) = {~0}. If T (v1 ) = T (v2 ) then
    ~0 = T (v1 ) − T (v2 ) = T (v1 − v2 ) ,

so that v1 − v2 ∈ N (T ). But the only vector in the nullspace is ~0 so v1 − v2 = ~0. This implies
that v1 = v2 and T is one-to-one.
We now want to give a theorem that characterizes one-to-one and onto linear maps in a
different way.

Theorem 2.2.5. Let T : V W be linear.

18
1. T is one-to-one if and only if it maps linearly independent sets in V to linearly inde-
pendent sets in W .

2. T is onto if and only if it maps spanning sets of V to spanning sets of W .


Proof. Suppose that T is one-to-one and that S is a linearly independent set in V . We will
show that T (S), defined by
T (S) := {T (s) : s S} ,
is also linearly independent. If

a1 T (s1 ) + + ak T (sk ) = ~0

for some si S and ai F then

T (a1 s1 + + ak sk ) = ~0 .

Therefore a1 s1 + + ak sk N (T ). But T is one-to-one so N (T ) = {~0}. This gives

a1 s1 + + ak sk = ~0 .

Linear independence of S gives the ai s are zero. Thus T (S) is linearly independent.
Suppose conversely that T maps linearly independent sets to linearly independent sets.
If v is any nonzero vector in V then {v} is linearly independent. Therefore so is {T (v)}.
This implies that T (v) 6= 0. Therefore N (T ) = {~0} and so T is one-to-one.
If T maps spanning sets to spanning sets then let w W . Let S be a spanning set of V ,
so that consequently T (S) spans W . If w W we can write w = a1 T (s1 ) + + ak T (sk )
for ai F and si S, so

w = T (a1 s1 + + ak sk ) R(T ) ,

giving that T is onto.


For the converse suppose that T is onto and that S spans V . We claim that T (S) spans
W . To see this, let w W and note there exists v V such that T (v) = w. Write

v = a1 s 1 + + ak s k ,

so w = T (v) = a1 T (s1 ) + + ak T (sk ) Span(T (S)). Therefore T (S) spans W .


Corollary 2.2.6. Let T : V W be linear.
1. if V and W are finite dimensional, then T is an isomorphism (one-to-one and onto)
if and only if T maps bases to bases.

2. If V is finite dimensional, then every basis of V is mapped to a spanning set of R(T ).

3. If V is finite dimensional, then T is one-to-one if and only if T maps bases of V to


bases of R(T ).

19
Theorem 2.2.7 (Rank-nullity theorem). Let T : V → W be linear and V of finite dimen-
sion. Then
    rank(T ) + nullity(T ) = dim V .

Proof. Let
    {v1 , . . . , vk }
be a basis for N (T ). Extend it to a basis

    {v1 , . . . , vk , vk+1 , . . . , vn }

of V . Write N ′ = Span{vk+1 , . . . , vn } and note that V = N (T ) ⊕ N ′ .

Lemma 2.2.8. If N (T ) and N ′ are complementary subspaces (that is, V = N (T ) ⊕ N ′ ) then
T is one-to-one on N ′ .

Proof. If z1 , z2 ∈ N ′ are such that T (z1 ) = T (z2 ) then z1 − z2 ∈ N (T ). But it is in N ′ so it is in
N ′ ∩ N (T ), which is only the zero vector. So z1 = z2 .

We may view T as a linear transformation only on N ′ ; call it T |_{N ′} ; in other words, T |_{N ′}
is a linear transformation from N ′ to W that acts exactly as T does. By the corollary,
{T |_{N ′}(vk+1 ), . . . , T |_{N ′}(vn )} is a basis for R(T |_{N ′}). Therefore {T (vk+1 ), . . . , T (vn )} is a basis
for R(T |_{N ′}). By part two of the corollary, {T (v1 ), . . . , T (vn )} spans R(T ). So

    R(T ) = Span({T (v1 ), . . . , T (vn )})
          = Span({T (vk+1 ), . . . , T (vn )})
          = R(T |_{N ′}) .

The second equality follows because the vectors T (v1 ), . . . , T (vk ) are all zero and do not
contribute to the span (you can work this out as an exercise). Thus {T (vk+1 ), . . . , T (vn )} is
a basis for R(T ) and

    rank(T ) + nullity(T ) = (n − k) + k = n = dim V .
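A quick numerical illustration of the theorem (a sketch, not part of the notes; the matrix is
an arbitrary choice): for the map L_A : R^4 → R^3 given by a 3 × 4 matrix A, numpy can
compute the rank directly and the nullity as dim V minus the rank.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 4.0, 6.0, 8.0],
                  [1.0, 0.0, 1.0, 0.0]])

    dim_V = A.shape[1]                     # dimension of the domain R^4
    rank = np.linalg.matrix_rank(A)        # dim R(L_A)
    nullity = dim_V - rank                 # dim N(L_A), by rank-nullity

    print(rank, nullity, rank + nullity == dim_V)   # 2 2 True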

2.3 Isomorphisms and L(V, W )


Definition 2.3.1. If S and T are sets and f : S → T is a function then a function g : T → S
is called an inverse function for f (written f^{−1} ) if

    f (g(t)) = t and g(f (s)) = s for all t ∈ T, s ∈ S .

Fact: f : S → T has an inverse function if and only if f is one-to-one and onto. Furthermore
the inverse is one-to-one and onto. (Explain this.)

Theorem 2.3.2. If T : V → W is an isomorphism then the inverse map T^{−1} : W → V is
an isomorphism.

20
Proof. We have one-to-one and onto. We just need to show linear. Suppose that w1 , w2 ∈ W
and c ∈ F . Then
    T (T^{−1} (cw1 + w2 )) = cw1 + w2
and
    T (cT^{−1} (w1 ) + T^{−1} (w2 )) = cT (T^{−1} (w1 )) + T (T^{−1} (w2 )) = cw1 + w2 .

However T is one-to-one, so

    T^{−1} (cw1 + w2 ) = cT^{−1} (w1 ) + T^{−1} (w2 ) .

The proof of the next lemma is in the homework.


Lemma 2.3.3. Let V and W be vector spaces with dim V = dim W . If T : V → W is
linear then T is one-to-one if and only if T is onto.

Example: The coordinate map is an isomorphism. For V of dimension n choose a basis

    β = {v1 , . . . , vn }

and define T : V → F^n by T (v) = (a1 , . . . , an ), where

    v = a1 v1 + · · · + an vn .

Then T is linear (check). To show one-to-one and onto we only need to check one (since
dim V = dim F^n ). If (a1 , . . . , an ) ∈ F^n then define v = a1 v1 + · · · + an vn . Now

    T (v) = (a1 , . . . , an ) .

So T is onto.

The space of linear maps. Let V and W be vector spaces over the same field F . Define

    L(V, W ) = {T : V → W | T is linear} .

We define addition and scalar multiplication as usual: for T, U ∈ L(V, W ) and c ∈ F ,

    (T + U )(v) = T (v) + U (v) and (cT )(v) = cT (v) .

This is a vector space (exercise).

Theorem 2.3.4. If dim V = n and dim W = m then dim L(V, W ) = mn. Given bases
{v1 , . . . , vn } and {w1 , . . . , wm } of V and W , the set {Ti,j : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis
for L(V, W ), where

    Ti,j (vk ) = wj if i = k, and Ti,j (vk ) = ~0 otherwise.

21
Proof. First, to show linear independence, suppose that

    Σ_{i,j} ai,j Ti,j = 0_T ,

where the element on the right is the zero transformation. Then for each k = 1, . . . , n, apply
both sides to vk :
    Σ_{i,j} ai,j Ti,j (vk ) = 0_T (vk ) = ~0 .

We then get

    ~0 = Σ_{j=1}^m Σ_{i=1}^n ai,j Ti,j (vk ) = Σ_{j=1}^m ak,j wj .

But the wj 's form a basis, so all ak,1 , . . . , ak,m = 0. This is true for all k so the Ti,j 's are
linearly independent.
To show spanning suppose that T : V → W is linear. Then for each i = 1, . . . , n, the
vector T (vi ) is in W , so we can write it in terms of the wj 's:

    T (vi ) = ai,1 w1 + · · · + ai,m wm .

Now define the transformation

    T̃ = Σ_{i,j} ai,j Ti,j .

We claim that this equals T . To see this, we must only check on the basis vectors. For some
k = 1, . . . , n,
    T (vk ) = ak,1 w1 + · · · + ak,m wm .
However,

    T̃ (vk ) = Σ_{i,j} ai,j Ti,j (vk ) = Σ_{j=1}^m Σ_{i=1}^n ai,j Ti,j (vk )
            = Σ_{j=1}^m ak,j wj
            = ak,1 w1 + · · · + ak,m wm .

2.4 Matrices and coordinates


Let T : V → W be a linear transformation and

    β = {v1 , . . . , vn } and γ = {w1 , . . . , wm }

bases for V and W , respectively.
We now build a matrix, which we label [T ]^γ_β . (Using the column convention.)

22
1. Since T (v1 ) ∈ W , we can write

       T (v1 ) = a1,1 w1 + · · · + am,1 wm .

2. Put the entries a1,1 , . . . , am,1 into the first column of [T ]^γ_β .

3. Repeat for k = 1, . . . , n, writing

       T (vk ) = a1,k w1 + · · · + am,k wm

   and place the entries a1,k , . . . , am,k into the k-th column.

Theorem 2.4.1. For each linear T : V → W and bases β and γ, there exists a unique m × n
matrix [T ]^γ_β such that for all v ∈ V ,

    [T ]^γ_β [v]_β = [T (v)]_γ .

Proof. Let v ∈ V . Then write v = a1 v1 + · · · + an vn .

    T (v) = a1 T (v1 ) + · · · + an T (vn )
          = a1 (a1,1 w1 + · · · + am,1 wm ) + · · · + an (a1,n w1 + · · · + am,n wm ) .

Collecting terms,

    T (v) = (a1 a1,1 + · · · + an a1,n )w1 + · · · + (a1 am,1 + · · · + an am,n )wm .

This gives the coordinates of T (v) in terms of γ:

    [T (v)]_γ = ( a1 a1,1 + · · · + an a1,n ; . . . ; a1 am,1 + · · · + an am,n ) = [T ]^γ_β [v]_β .

Suppose that A and B are two matrices such that for all v ∈ V ,

    A[v]_β = [T (v)]_γ = B[v]_β .

Take v = vk . Then [v]_β = ~ek and A[v]_β is the k-th column of A (and similarly for B).
Therefore A and B have all the same columns. This means A = B.
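A numerical sketch of this construction (not from the notes; the matrices below are arbitrary
choices): if the β-basis vectors of R^2 are the columns of B, the γ-basis vectors are the columns
of C, and T acts by a standard matrix M, then [T ]^γ_β = C^{−1} M B, since its k-th column is
the γ-coordinate vector of T (vk ).

    import numpy as np

    M = np.array([[1.0, 2.0],          # T in standard coordinates
                  [0.0, 1.0]])
    B = np.array([[1.0, 1.0],          # columns form the basis beta of the domain
                  [0.0, 1.0]])
    C = np.array([[2.0, 0.0],          # columns form the basis gamma of the codomain
                  [0.0, 1.0]])

    T_gamma_beta = np.linalg.inv(C) @ M @ B

    # Check the defining property [T]^gamma_beta [v]_beta = [T(v)]_gamma.
    v = np.array([3.0, 4.0])
    v_beta = np.linalg.solve(B, v)             # coordinates of v in beta
    Tv_gamma = np.linalg.solve(C, M @ v)       # coordinates of T(v) in gamma
    print(np.allclose(T_gamma_beta @ v_beta, Tv_gamma))   # True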

Theorem 2.4.2. Given bases β and γ of V and W , the map T ↦ [T ]^γ_β is an isomorphism.

Proof. It is one-to-one. To show linear let c ∈ F and T, U ∈ L(V, W ). For each k = 1, . . . , n,

    [cT + U ]^γ_β [vk ]_β = [cT (vk ) + U (vk )]_γ = c[T (vk )]_γ + [U (vk )]_γ = c[T ]^γ_β [vk ]_β + [U ]^γ_β [vk ]_β
                          = (c[T ]^γ_β + [U ]^γ_β )[vk ]_β .

However the left side is the k-th column of [cT + U ]^γ_β and the right side is the k-th column
of c[T ]^γ_β + [U ]^γ_β .
Both spaces have the same dimension, so the map is also onto and thus an isomorphism.

23
Examples:

1. Change of coordinates: Let V be finite dimensional and β, β′ two bases of V . How
   do we express coordinates in β in terms of those in β′ ? For each v ∈ V ,

       [v]_{β′} = [Iv]_{β′} = [I]^{β′}_β [v]_β .

   We multiply by the matrix [I]^{β′}_β .

2. Suppose that V, W and Z are vector spaces over the same field. Let T : V → W and
   U : W → Z be linear with β, γ and δ bases for V, W and Z.

   (a) U T : V → Z is linear. If v1 , v2 ∈ V and c ∈ F then

       (U T )(cv1 + v2 ) = U (T (cv1 + v2 )) = U (cT (v1 ) + T (v2 )) = c(U T )(v1 ) + (U T )(v2 ) .

   (b) For each v ∈ V ,

       [(U T )v]_δ = [U (T (v))]_δ = [U ]^δ_γ [T (v)]_γ = [U ]^δ_γ [T ]^γ_β [v]_β .

       Therefore
           [U T ]^δ_β = [U ]^δ_γ [T ]^γ_β .

   (c) If T is an isomorphism from V to W ,

           Id = [I]^γ_γ = [T ]^γ_β [T^{−1} ]^β_γ .

       Similarly,
           Id = [I]^β_β = [T^{−1} ]^β_γ [T ]^γ_β .

       This implies that [T^{−1} ]^β_γ = ([T ]^γ_β )^{−1} .

3. To change coordinates back,

       [I]^β_{β′} = [I^{−1} ]^β_{β′} = ([I]^{β′}_β )^{−1} .

Definition 2.4.3. An n × n matrix A is invertible if there exists an n × n matrix B such
that
    I = AB = BA .

Remark. If β, β′ are bases of V then

    I = [I]^β_β = [I]^β_{β′} [I]^{β′}_β .

Therefore each change of basis matrix is invertible.

Now how do we relate the matrix of T with respect to different bases?

24
Theorem 2.4.4. Let V and W be finite-dimensional vector spaces over F with β, β′ bases
for V and γ, γ′ bases for W .

1. If T : V → W is linear then there exist invertible matrices P and Q such that

       [T ]^γ_β = P [T ]^{γ′}_{β′} Q .

2. If T : V → V is linear then there exists an invertible matrix P such that

       [T ]^β_β = P^{−1} [T ]^{β′}_{β′} P .

Proof.
    [T ]^γ_β = [I]^γ_{γ′} [T ]^{γ′}_{β′} [I]^{β′}_β .
Also
    [T ]^β_β = [I]^β_{β′} [T ]^{β′}_{β′} [I]^{β′}_β = ([I]^{β′}_β )^{−1} [T ]^{β′}_{β′} [I]^{β′}_β .

Definition 2.4.5. Two n × n matrices A and B are similar if there exists an n × n invertible
matrix P such that
    A = P^{−1} BP .

Theorem 2.4.6. Let A and B be n × n matrices with entries from F . If A and B are similar
then there exists an n-dimensional vector space V , a linear transformation T : V → V , and
bases β, β′ such that
    A = [T ]^β_β and B = [T ]^{β′}_{β′} .

Proof. Since A and B are similar we can write B = P^{−1} AP for some invertible P (if
A = Q^{−1} BQ, take P = Q^{−1} ). Define the linear transformation LA : F^n → F^n by
LA (~v ) = A ~v .
If we choose β to be the standard basis then

    [LA ]^β_β = A .

Next we will show that if β′ = {~p1 , . . . , ~pn } are the columns of P then β′ is a basis and
P = [I]^β_{β′} , where I : F^n → F^n is the identity map. If we prove this, then P^{−1} = [I]^{β′}_β and so

    B = P^{−1} AP = [I]^{β′}_β [LA ]^β_β [I]^β_{β′} = [LA ]^{β′}_{β′} .

Why is β′ a basis? Note that
    ~pk = P ~ek ,
so that if LP is invertible then β′ will be the image of a basis and thus a basis. But for all
~v ∈ F^n ,
    L_{P^{−1}} LP ~v = P^{−1} P ~v = ~v
and
    LP L_{P^{−1}} ~v = ~v .
So (LP )^{−1} = L_{P^{−1}} . This completes the proof.

25
The moral: Similar matrices represent the same transformation but with respect to two
different bases. Any property of matrices that is invariant under conjugation can be viewed
as a property of the underlying transformation.

Example: Trace. Given an n × n matrix A with entries from F , define

    Tr A = Σ_{i=1}^n ai,i .

Note that if P is another matrix (not nec. invertible),

    Tr (AP ) = Σ_{i=1}^n (AP )i,i = Σ_{i=1}^n Σ_{l=1}^n ai,l pl,i = Σ_{l=1}^n Σ_{i=1}^n pl,i ai,l = Σ_{l=1}^n (P A)l,l = Tr (P A) .

Therefore if P is invertible,

    Tr (P^{−1} AP ) = Tr (AP P^{−1} ) = Tr A .

This means that trace is invariant under conjugation. Thus if T : V → V is linear (and V is
finite dimensional) then Tr T can be defined as

    Tr [T ]^β_β

for any basis β.
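A quick numerical check of this invariance (a sketch, not part of the notes; the random
matrices are arbitrary test data):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))          # an arbitrary 4x4 matrix
    P = rng.standard_normal((4, 4))          # generically invertible

    similar = np.linalg.inv(P) @ A @ P       # a matrix similar to A
    print(np.isclose(np.trace(similar), np.trace(A)))   # True (up to rounding)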

2.5 Exercises
Notation:

1. A group is a pair (G, ·), where G is a set and · : G × G → G is a function (usually
   called product) such that

   (a) there is an identity element e; that is, an element with the property

       e · g = g · e = g for all g ∈ G ,

   (b) for all g ∈ G there is an inverse element in G called g^{−1} such that

       g^{−1} · g = g · g^{−1} = e ,

   (c) and the operation is associative: for all g, h, k ∈ G,

       (g · h) · k = g · (h · k) .

   If the operation is commutative; that is, for all g, h ∈ G, we have g · h = h · g then we
   call G abelian.

26
2. If (G, ) is a group and H is a subset of G then we call H a subgroup of G if (H, |HH )
is a group. Equivalently (and analogously to vector spaces and subspaces), H G is
a subgroup of G if and only if

(a) for all h1 , h2 H we have h1 h2 H and


(b) for all h H, we have h1 H.

3. If G and H are groups then a function φ : G → H is called a group homomorphism if

       φ(g1 · g2 ) = φ(g1 ) · φ(g2 ) for all g1 and g2 ∈ G .

   Note that the product on the left is in G whereas the product on the right is in H. We
   define the kernel of φ to be

       Ker(φ) = {g ∈ G : φ(g) = eH } .

   Here, eH refers to the identity element of H.

4. A group homomorphism φ : G → H is called

   a monomorphism (or injective, or one-to-one), if φ(g1 ) = φ(g2 ) implies g1 = g2 ;
   an epimorphism (or surjective, or onto), if φ(G) = H;
   an isomorphism (or bijective), if it is both injective and surjective.

   A group homomorphism φ : G → G is called an endomorphism of G. An endomor-
   phism which is also an isomorphism is called an automorphism. The set of automor-
   phisms of a group G is denoted by Aut(G).

5. Recall that, if F is a field (with operations + and ·) then (F, +) and (F \ {0}, ·) are
   abelian groups, with identity elements 0 and 1 respectively. If F and G are fields and
   φ : F → G is a function then we call φ a field homomorphism if

   (a) for all a, b ∈ F we have
       φ(a + b) = φ(a) + φ(b) ,
   (b) for all a, b ∈ F we have
       φ(ab) = φ(a)φ(b) ,
   (c) and φ(1F ) = 1G . Here 1F and 1G are the multiplicative identities of F and G
       respectively.

6. If V and W are vector spaces over the same field F then we define their product to be
   the set
       V × W = {(v, w) : v ∈ V and w ∈ W } ,
   which becomes a vector space under the operations

       (v, w) + (v′, w′) = (v + v′, w + w′),   c(v, w) = (cv, cw)   for v, v′ ∈ V, w, w′ ∈ W, c ∈ F.

27
If you are familiar with the notion of an external direct sum, notice that the product
of two vector spaces is the same as their external direct sum. The two notions cease
being equivalent when one considers infinitely many factors/summands.
If Z is another vector space over F then we call a function f : V W Z bilinear if

(a) for each fixed v V , the function fv : W Z defined by

fv (w) = f ((v, w))

is a linear transformation as a function of w and


(b) for each fixed w W , the function fw : V Z defined by

fw (v) = f ((v, w))

is a linear transformation as a function of v.

Exercises:

1. Suppose that G and H are groups and φ : G → H is a homomorphism.

   (a) Prove that if H′ is a subgroup of H then the inverse image

           φ^{−1}(H′) = {g ∈ G : φ(g) ∈ H′}

       is a subgroup of G. Deduce that Ker(φ) is a subgroup of G.

   (b) Prove that if G′ is a subgroup of G, then the image of G′ under φ,

           φ(G′) = {φ(g) | g ∈ G′}

       is a subgroup of H.
   (c) Prove that φ is one-to-one if and only if Ker(φ) = {eG }. (Here, eG is the identity
       element of G.)

2. Prove that every field homomorphism is one-to-one.

3. Let V and W be finite dimensional vector spaces with dim V = n and dim W = m.
Suppose that T : V W is a linear transformation.

(a) Prove that if n > m then T cannot be one-to-one.


(b) Prove that if n < m then T cannot be onto.
(c) Prove that if n = m then T is one-to-one if and only if T is onto.

28
4. Let F be a field, V, W be finite-dimensional F -vector spaces, and Z be any F -vector
space. Choose a basis {v1 , . . . , vn } of V and a basis {w1 , . . . , wm } of W . Let

{zi,j : 1 i n, 1 j m}

be any set of mn vectors from Z. Show that there is precisely one bilinear transforma-
tion f : V W Z such that

f (vi , wj ) = zi,j for all i, j .

5. Let V be a vector space and T : V V a linear transformation. Show that the


following two statements are equivalent.
(A) V = R(T ) N (T ), where R(T ) is the range of T and N (T ) is the nullspace of T .
(B) N (T ) = N (T 2 ), where T 2 is T composed with itself.
6. Let V, W and Z be finite-dimensional vector spaces over a field F . If T : V W and
U : W Z are linear transformations, prove that

rank(U T ) min{rank(U ), rank(T )} .

Prove also that if either of U or T is invertible, then the rank of U T is equal to the
rank of the other one. Deduce that if P : V V and Q : W W are isomorphisms
then the rank of QT P equals the rank of T .
7. Let V and W be finite-dimensional vector spaces over a field F and T : V → W be
   a linear transformation. Show that there exist ordered bases β of V and γ of W such
   that
       ([T ]^γ_β )i,j = 0 if i ≠ j, and ([T ]^γ_β )i,j = 0 or 1 if i = j .

8. The purpose of this question is to show that the row rank of a matrix is equal to its
column rank. Note that this is obviously true for a matrix of the form described in
the previous exercise. Our goal will be to put an arbitrary matrix in this form without
changing either its row rank or its column rank. Let A be an m n matrix with entries
from a field F .

(a) Show that the column rank of A is equal to the rank of the linear transformation
LA : F n F m defined by LA (~v ) = A ~v , viewing ~v as a column vector.
(b) Use question 1 to show that if P and Q are invertible n n and m m matrices
respectively then the column rank of QAP equals the column rank of A.
(c) Show that the row rank of A is equal to the rank of the linear transformation
RA : F m F n defined by RA (~v ) = ~v A, viewing ~v as a row vector.
(d) Use question 1 to show that if P and Q are invertible n n and m m matrices
respectively then the row rank of QAP equals the row rank of A.

29
(e) Show that there exist n n and m m matrices P and Q respectively such that
QAP has the form described in question 2. Deduce that the row rank of A equals
the column rank of A.

9. Given an angle θ ∈ [0, 2π) let Tθ : R^2 → R^2 be the function which rotates a vector
   clockwise about the origin by an angle θ. Find the matrix of Tθ relative to the standard
   basis. You do not need to prove that Tθ is linear.

10. Given m R, define the line

Lm = {(x, y) R2 : y = mx} .

(a) Let Tm be the function which maps a point in R2 to its closest point in Lm . Find
the matrix of Tm relative to the standard basis. You do not need to prove that
Tm is linear.
(b) Let Rm be the function which maps a point in R2 to the reflection of this point
about the line Lm . Find the matrix of Rm relative to the standard basis. You do
not need to prove that Rm is linear.
Hint for (a) and (b): first find the matrix relative to a carefully chosen basis
and then perform a change of basis.

11. The quotient space V /W . Let V be an F-vector space, and W ⊆ V a subspace. A
    subset S ⊆ V is called a W -affine subspace of V , if the following holds:

        for all s, s′ ∈ S : s − s′ ∈ W,   and for all s ∈ S, w ∈ W : s + w ∈ S.

    (a) Let S and T be W -affine subspaces of V and c ∈ F. Put

            S + T := {s + t : s ∈ S, t ∈ T },   cT := {ct : t ∈ T } if c ≠ 0, and cT := W if c = 0.

        Show that S + T and cT are again W -affine subspaces of V .


(b) Show that the above operations define an F-vector space structure on the set of
all W -affine subspaces of V .
We will write V /W for the set of W -affine subspaces of V . We now know that it
is a vector space. Note that the elements of V /W are subsets of V .
(c) Show that if v V , then

v + W := {v + w : w W }

is a W -affine subspace of V . Show moreover that for any W -affine subspace S V


there exists a v V such that S = v + W .
(d) Show that the map p : V V /W defined by p(v) = v +W is linear and surjective.

30
(e) Compute the nullspace of p and, if the dimension of V is finite, the dimension of
V /W .

A helpful way to think about the quotient space V /W is to think of it as being the
vector space V , but with a new notion of equality of vectors. Namely, two vectors
v1 , v2 V are now seen as equal if v1 v2 W . Use this point of view to find a
solution for the following exercise. When you find it, use the formal definition given
above to write your solution rigorously.

12. Let V and X be F-vector spaces, and f L(V, X). Let W be a subspace of V
contained in N (f ). Consider the quotient space V /W and the map p : V V /W from
the previous exercise.

(a) Show that there exists a unique f̄ ∈ L(V /W, X) such that f = f̄ ◦ p.
(b) Show that f̄ is injective if and only if W = N (f ).

31
3 Dual spaces
3.1 Definitions
Consider the space F n and write each vector as a column vector. When can we say that a
vector is zero? When all coordinates are zero. Further, we say that two vectors are the same
if all of their coordinates are the same. This motivates the definition of the coordinate maps
ei : F n F by ei (~v ) = i-th coordinate of ~v .
Notice that each ei is a linear function from F n to F . Furthermore they are linearly inde-
pendent, so since the dimension of L(F n , F ) is n, they form a basis.
Last it is clear that a vector ~v is zero if and only if ei (~v ) = 0 for all i. This is true if
and only if f (~v ) = 0 for all f which are linear functions from F n F . This motivates the
following definition.
Definition 3.1.1. If V is a vector space over F we define the dual space V* as the space of
linear functionals
    V* = {f : V → F | f is linear} .
We can view this as the space L(V, F ), where F is considered as a one-dimensional vector
space over itself.
Suppose that V is finite dimensional and f ∈ V*. Then by the rank-nullity theorem,
either f ≡ 0 or N (f ) is a (dim V − 1)-dimensional subspace of V . Conversely, you will show
in the homework that any (dim V − 1)-dimensional subspace W (that is, a hyperspace) is the
nullspace of some linear functional.
Definition 3.1.2. If β = {v1 , . . . , vn } is a basis for V then we define the dual basis β* =
{fv1 , . . . , fvn } as the unique functionals satisfying

    fvi (vj ) = 1 if i = j, and fvi (vj ) = 0 if i ≠ j .

From our proof of the dimension of L(V, W ) we know that β* is a basis of V*.
Proposition 3.1.3. Given a basis β = {v1 , . . . , vn } of V and dual basis β* of V* we can
write
    f = f (v1 )fv1 + · · · + f (vn )fvn .
In other words, the coefficients for f in the dual basis are just f (v1 ), . . . , f (vn ).

Proof. Given f ∈ V*, we can write

    f = a1 fv1 + · · · + an fvn .

To find the coefficients, we evaluate both sides at vk . The left is just f (vk ). The right is

    ak fvk (vk ) = ak .

Therefore ak = f (vk ) and we are done.
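A numerical sketch of the dual basis (not from the notes; the basis is an arbitrary choice):
for V = R^n with a basis given by the columns of an invertible matrix B, the dual basis
functionals can be identified with the rows of B^{−1}, since row i of B^{−1} sends the j-th basis
vector to the (i, j) entry of B^{−1}B = I.

    import numpy as np

    B = np.array([[1.0, 1.0],
                  [0.0, 1.0]])          # columns v1, v2 form a basis of R^2
    D = np.linalg.inv(B)                # row i of D plays the role of f_{v_i}

    print(np.allclose(D @ B, np.eye(2)))   # f_{v_i}(v_j) = 1 if i = j, 0 otherwise

    # Proposition 3.1.3: a functional f (a row vector) has coefficients f(v_i)
    # in the dual basis.
    f = np.array([5.0, 7.0])            # the functional f(x, y) = 5x + 7y
    coeffs = f @ B                      # (f(v1), f(v2))
    print(np.allclose(coeffs @ D, f))   # True: f is recovered from its coefficients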

32
3.2 Annihilators
We now study annihilators.

Definition 3.2.1. If S ⊆ V we define the annihilator of S as

    S° = {f ∈ V* : f (v) = 0 for all v ∈ S} .

Theorem 3.2.2. Let V be a vector space and S ⊆ V .

1. S° is a subspace of V* (although S does not have to be a subspace of V ).

2. S° = (Span S)° .

3. If dim V < ∞ and U is a subspace of V then whenever {v1 , . . . , vk } is a basis for U
   and {v1 , . . . , vn } is a basis for V ,

       {fvk+1 , . . . , fvn } is a basis for U° .

Proof. First we show that S° is a subspace of V*. Note that the zero functional obviously
sends every vector in S to zero, so 0 ∈ S° . If c ∈ F and f1 , f2 ∈ S° , then for each v ∈ S,

    (cf1 + f2 )(v) = cf1 (v) + f2 (v) = 0 .

So cf1 + f2 ∈ S° and S° is a subspace of V*.
Next we show that S° = (Span S)° . To prove the forward inclusion, take f ∈ S° . Then
if v ∈ Span S we can write
    v = a1 v1 + · · · + am vm
for scalars ai ∈ F and vi ∈ S. Thus

    f (v) = a1 f (v1 ) + · · · + am f (vm ) = 0 ,

so f ∈ (Span S)° . On the other hand if f ∈ (Span S)° then clearly f (v) = 0 for all v ∈ S
(since S ⊆ Span S). This completes the proof of item 2.
For the third item, we know that the functionals fvk+1 , . . . , fvn are linearly independent.
Therefore we just need to show that they span U° . To do this, take f ∈ U° . We can write
f in terms of the dual basis fv1 , . . . , fvn :

    f = a1 fv1 + · · · + ak fvk + ak+1 fvk+1 + · · · + an fvn .

Using the formula we have for the coefficients, we get aj = f (vj ), which is zero for j ≤ k.
Therefore
    f = ak+1 fvk+1 + · · · + an fvn
and we are done.
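A small sketch of annihilators (not from the notes; the vectors are arbitrary choices):
identifying functionals on R^4 with row vectors, a functional f annihilates S = {s1, s2}
exactly when f kills every column of the matrix with columns s1, s2, i.e. when f^T lies in
the null space of that matrix's transpose. The dimension count matches the corollary that
follows.

    import numpy as np

    S = np.array([[1.0, 0.0],
                  [2.0, 1.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])              # columns are s1 and s2 in R^4

    # Null space of S^T via the SVD: right singular vectors beyond the rank.
    _, sing, Vt = np.linalg.svd(S.T)
    ann_basis = Vt[len(sing):]               # rows span the annihilator of Span(S)

    print(ann_basis.shape[0])                # 2 = dim V - dim Span(S) = 4 - 2
    print(np.allclose(ann_basis @ S, 0))     # every such functional kills s1 and s2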

33
Corollary 3.2.3. If V is finite dimensional and W is a subspace then

    dim V = dim W + dim W° .

Definition 3.2.4. For S′ ⊆ V* we define

    °(S′) = {v ∈ V : f (v) = 0 for all f ∈ S′} .

In the homework you will prove similar properties for °(S′).

Fact: v ∈ V is zero if and only if f (v) = 0 for all f ∈ V*. One implication is easy. To prove
the other, suppose that v ≠ ~0 and extend {v} to a basis for V . Then the dual basis has
the property that fv (v) ≠ 0.

Proposition 3.2.5. If W ⊆ V is a subspace and V is finite-dimensional then °(W°) = W .

Proof. If w ∈ W then for all f ∈ W° , we have f (w) = 0, so w ∈ °(W°). Suppose conversely
that w ∈ V has f (w) = 0 for all f ∈ W° . If w ∉ W then build a basis {v1 , . . . , vn } of V
such that {v1 , . . . , vk } is a basis for W and vk+1 = w. Then by the previous proposition,
{fvk+1 , . . . , fvn } is a basis for W° . However fw (w) = 1 ≠ 0, which is a contradiction, since
fw ∈ W° .

3.3 Double dual


Lemma 3.3.1. If v ∈ V is nonzero and dim V < ∞, there exists a linear functional fv ∈ V*
such that fv (v) = 1. Therefore v = ~0 if and only if f (v) = 0 for all f ∈ V*.

Proof. Extend {v} to a basis of V and consider the dual basis. fv is in this basis and
fv (v) = 1.

For each v ∈ V we can define the evaluation map ṽ : V* → F by

    ṽ(f ) = f (v) .

Theorem 3.3.2. Suppose that V is finite-dimensional. Then the map ψ : V → V** given by

    ψ(v) = ṽ

is an isomorphism.

Proof. First we show that if v ∈ V then ψ(v) ∈ V**. Clearly ṽ maps V* to F , but we just
need to show that ṽ is linear. If f1 , f2 ∈ V* and c ∈ F then

    ṽ(cf1 + f2 ) = (cf1 + f2 )(v) = cf1 (v) + f2 (v) = cṽ(f1 ) + ṽ(f2 ) .

Therefore ψ(v) ∈ V**. We now must show that ψ is linear and either one-to-one or onto
(since the dimension of V is equal to the dimension of V**). First if v1 , v2 ∈ V , c ∈ F then
we want to show that
    ψ(cv1 + v2 ) = cψ(v1 ) + ψ(v2 ) .

34
Both sides are elements of V** so we need to show they act the same on elements of V*. Let
f ∈ V*. Then

    ψ(cv1 + v2 )(f ) = f (cv1 + v2 ) = cf (v1 ) + f (v2 ) = cψ(v1 )(f ) + ψ(v2 )(f ) = (cψ(v1 ) + ψ(v2 ))(f ) .

Finally to show one-to-one, we show that N (ψ) = {~0}. If ψ(v) = ~0 then for all f ∈ V*,

    0 = ψ(v)(f ) = f (v) .

This implies v = ~0.

Theorem 3.3.3. Let V be finite dimensional.

1. If β = {v1 , . . . , vn } is a basis for V then ψ(β) = {ψ(v1 ), . . . , ψ(vn )} is the basis of V**
   dual to β* (the double dual of this basis).

2. If W is a subspace of V then ψ(W ) is equal to (W°)° .

Proof. Recall that the dual basis of β is β* = {fv1 , . . . , fvn }, where

    fvi (vj ) = 1 if i = j, and fvi (vj ) = 0 if i ≠ j .

Since ψ is an isomorphism, ψ(β) is a basis of V**. Now

    ψ(vi )(fvk ) = fvk (vi ) ,

which is 1 if i = k and 0 otherwise. This means ψ(β) = (β*)* .
Next if W is a subspace, let w ∈ W . Letting f ∈ W° ,

    ψ(w)(f ) = f (w) = 0 .

So ψ(w) ∈ (W°)° . However, since ψ is an isomorphism, ψ(W ) is a subspace of (W°)° . But
they have the same dimension, so they are equal.

3.4 Dual maps


Definition 3.4.1. Let T : V → W be linear. We define the dual map T* : W* → V* by

    T*(g)(v) = g(T (v)) .

Theorem 3.4.2. Let V and W be finite dimensional and let β and γ be bases for V and W .
If T : V → W is linear, so is T*. If β* and γ* are the dual bases, then

    [T*]^{β*}_{γ*} = ([T ]^γ_β )^t .

35
Proof. First we show that T* is linear. If g1 , g2 ∈ W* and c ∈ F then for each v ∈ V ,

    T*(cg1 + g2 )(v) = (cg1 + g2 )(T (v)) = cg1 (T (v)) + g2 (T (v))
                     = cT*(g1 )(v) + T*(g2 )(v) = (cT*(g1 ) + T*(g2 ))(v) .

Next let β = {v1 , . . . , vn }, γ = {w1 , . . . , wm }, β* = {fv1 , . . . , fvn } and γ* = {gw1 , . . . , gwm },
and write [T ]^γ_β = (ai,j ). Then recall from the lemma that for any f ∈ V* we have

    f = f (v1 )fv1 + · · · + f (vn )fvn .

Therefore the coefficient of fvi for T*(gwk ) is

    T*(gwk )(vi ) = gwk (T (vi )) = gwk (a1,i w1 + · · · + am,i wm ) = ak,i .

So
    T*(gwk ) = ak,1 fv1 + · · · + ak,n fvn .

This is the k-th column of ([T ]^γ_β )^t .

Theorem 3.4.3. If V and W are finite dimensional and T : V → W is linear then R(T*) =
(N (T ))° and (R(T ))° = N (T*).

Proof. If g ∈ N (T*) then T*(g)(v) = 0 for all v ∈ V . If w ∈ R(T ) then w = T (v) for
some v ∈ V . Then g(w) = g(T (v)) = T*(g)(v) = 0. Thus g ∈ (R(T ))° . If, conversely,
g ∈ (R(T ))° then we would like to show that T*(g)(v) = 0 for all v ∈ V . We have

    T*(g)(v) = g(T (v)) = 0 .

Let f ∈ R(T*). Then f = T*(g) for some g ∈ W*. If T (v) = 0, we have f (v) =
T*(g)(v) = g(T (v)) = 0. Therefore f ∈ (N (T ))° . This gives R(T*) ⊆ (N (T ))° . To show
the other direction,
    dim R(T*) = m − dim N (T*)
and
    dim (N (T ))° = n − dim N (T ) .
However, by the first part, dim N (T*) = dim (R(T ))° = m − dim R(T ), so

    dim R(T*) = dim R(T ) = n − dim N (T ) = dim (N (T ))° .

This gives the other inclusion.

36
3.5 Exercises
Notation:

1. Recall the definition of a bilinear function. Let F be a field, and V, W and Z be


F -vector spaces. A function f : V W Z is called bilinear if

(a) for each v V the function fv : W Z given by fv (w) = f (v, w) is linear as a


function of w and
(b) for each w W the function fw : V Z given by fw (v) = f (v, w) is linear as a
function of v.

When Z is the F -vector space F , one calls f a bilinear form.

2. Given a bilinear function f : V W Z, we define its left kernel and its right kernel
as
LN (f ) = {v V : f (v, w) = 0 w W },
RN (f ) = {w W : f (v, w) = 0 v V }.
More generally, for subspaces U V and X W we define their orthogonal comple-
ments
U f = {w W : f (u, w) = 0 u U },
f
X = {v V : f (v, x) = 0 x X}.
Notice that LN (f ) = f W and RN (f ) = V f .

Exercises:

1. Let V and W be vector spaces over a field F and let f : V W F be a bilinear


form. For each v V , we denote by fv the linear functional W F given by
fv (w) = f (v, w). For each w W , we denote by fw the linear functional V F given
by fw (v) = f (v, w).

(a) Show that the map


: V W , (v) = fv
is linear and its kernel is LN (f ).
(b) Analogously, show that the map

: W V , (w) = fw

is linear and its kernel is RN (f ).


(c) Assume now that V and W are finite-dimensional. Show that the map W V
given by composing with the inverse of the canonical isomorphism W W
is equal to , the map dual to .

37
(d) Assuming further dim(V ) = dim(W ), conclude that the following statements are
equivalent:
i. LN (f ) = {0},
ii. RN (f ) = {0},
iii. is an isomorphism,
iv. is an isomorphism.
2. Let V, W be finite-dimensional F -vector spaces. Denote by V and W the canonical
isomorphisms V V and W W . Show that if T : V W is linear then
1
W T V = T .

3. Let V be a finite-dimensional F -vector space. Show that any basis (1 , . . . , n ) of V


is the dual basis to some basis (v1 , . . . , vn ) of V .

4. Let V be an F -vector space, and S 0 V a subset. Recall the definition



S 0 = {v V : (v) = 0 S 0 }.

Imitating a proof given in class, show the following:



(a) S 0 = span(S 0 ).
(b) Assume that V is finite-dimensional, let U 0 V be a subspace, and let (1 , . . . , n )
be a basis for V such that (1 , . . . , k ) is a basis for U 0 . If (v1 , . . . , vn ) is the ba-
sis of V from the previous exercise, then (vk+1 , . . . , vn ) is a basis for U 0 . In
particular, dim(U 0 ) + dim( U 0 ) = dim(V ).

5. Let V be a finite-dimensional F -vector space, and U V a hyperplane (that is, a


subspace of V of dimension dim V 1).

(a) Show that there exists V with N () = U .


(b) Show that if V is another functional with N () = U , then there exists
c F with = c.

6. (From Hoffman-Kunze)

(a) Let A and B be n n matrices with entries from a field F . Show that T r (AB) =
T r (BA).
(b) Let T : V V be a linear transformation on a finite-dimensional vector space.
Define the trace of T as the trace of the matrix of T , represented in some basis.
Prove that the definition of trace does not depend on the basis thus chosen.
(c) Prove that on the space of n n matrices with entries from a field F , the trace
function T r is a linear functional. Show also that, conversely, if some linear
functional g on this space satisfies g(AB) = g(BA) then g is a scalar multiple of
the trace function.

38
4 Determinants
4.1 Permutations
Now we move to permutations. These will be used when we talk about the determinant.

Definition 4.1.1. A permutation on n letters is a function σ : {1, . . . , n} → {1, . . . , n}
which is a bijection.

The set of all permutations forms a group under composition. There are n! elements.
There are two main ways to write a permutation.

1. Row notation:
1 2 3 4 5 6
6 4 2 3 5 1
Here we write the elements of {1, . . . , n} in the first row, in order. In the second row
we write the elements they are mapped to, in order.

2. Cycle decomposition:
(1 6)(2 4 3)(5)
All cycles are disjoint. It is easier to compose permutations this way. Suppose σ is the
permutation given above and σ′ is the permutation

    σ′ = (1 2 3 4 5)(6) .

Then the product σσ′ is (here we will apply σ′ first)

    ((1 6)(2 4 3)(5)) ((1 2 3 4 5)(6)) = (1 4 5 6)(2)(3) .

It is a simple fact that each permutation has a cycle decomposition with disjoint cycles.

Definition 4.1.2. A transposition is a permutation that swaps two letters and fixes the
others. Removing the fixed letters, it looks like (i j) for i 6= j. An adjacent transposition is
one that swaps neighboring letters.

Lemma 4.1.3. Every permutation can be written as a product of transpositions. (Not


necessarily disjoint.)

Proof. All we need to do is write a cycle as a product of transpositions. Note

(a1 a2 · · · ak) = (a1 ak)(a1 ak−1) · · · (a1 a3)(a1 a2) .

Definition 4.1.4. A pair of numbers (i, j) is an inversion pair for σ if i < j but σ(i) > σ(j).
Write Ninv(σ) for the number of inversion pairs of σ.

39
For example in the permutation (1 3)(2 4 5), also written as

1 2 3 4 5
3 4 1 5 2

we have inversion pairs (1, 3), (1, 5), (2, 3), (2, 5), (4, 5).
Lemma 4.1.5. Let σ be a permutation and τ = (k k+1) be an adjacent transposition. Then
Ninv(τσ) = Ninv(σ) ± 1. If τ1, . . . , τm are adjacent transpositions then

Ninv(τ1 · · · τm σ) − Ninv(σ) is even if m is even and odd if m is odd.

Proof. Let a < b ∈ {1, . . . , n}. If σ(a), σ(b) ∈ {k, k + 1} then τσ(a) − τσ(b) = −(σ(a) − σ(b)),
so (a, b) is an inversion pair for τσ if and only if it is not one for σ. We claim that in all
other cases, the sign of τσ(a) − τσ(b) is the same as the sign of σ(a) − σ(b). If neither
of σ(a) and σ(b) is in {k, k + 1} then τσ(a) − τσ(b) = σ(a) − σ(b). The other cases are
somewhat similar: if σ(a) = k but σ(b) > k + 1 then τσ(a) − τσ(b) = k + 1 − σ(b) < 0 and
σ(a) − σ(b) = k − σ(b) < 0. Keep going.
Therefore τσ has exactly the same inversion pairs as σ except for the pair (a, b) with
{σ(a), σ(b)} = {k, k + 1}, which switches status. This proves the lemma.
Definition 4.1.6. Given a permutation σ on n letters, we say that σ is even if it can be
written as a product of an even number of transpositions and odd otherwise. This is called
the signature (or sign) of a permutation:

sgn(σ) = +1 if σ is even and −1 if σ is odd.
Theorem 4.1.7. If σ can be written as a product of an even number of transpositions, it
cannot be written as a product of an odd number of transpositions. In other words, signature
is well-defined.

Proof. Suppose that σ = s1 · · · sk and σ = t1 · · · tj , where the si's and ti's are transpositions.
We want to show that k − j is even. In other words, if k is odd, so is j, and if k is even, so is j.
Now note that each transposition can be written as a product of an odd number of
adjacent transpositions:

(5 1) = (5 4)(4 3)(3 2)(1 2)(2 3)(3 4)(4 5) ,

so write s1 · · · sk = s̃1 · · · s̃k′ and t1 · · · tj = t̃1 · · · t̃j′ , where the s̃'s and t̃'s are adjacent
transpositions and both k − k′ and j − j′ are even.
We have 0 = Ninv(id) = Ninv(t̃j′ · · · t̃1 σ), which by Lemma 4.1.5 is Ninv(σ) plus an even
number if j′ is even or an odd number if j′ is odd. This means that Ninv(σ) − j′ is even. The
same argument works for k, so Ninv(σ) − k′ is even. Now

j − k = (j − j′) + (j′ − Ninv(σ)) + (Ninv(σ) − k′) + (k′ − k)

is even.

40
Corollary 4.1.8. For any two permutations σ and σ′,

sgn(σσ′) = sgn(σ) sgn(σ′) .
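A short computational sketch (not from the notes): the sign can be computed directly from the inversion count Ninv, and the multiplicative property of the corollary can be checked by brute force on S_4. Permutations are stored as tuples p with p[i-1] = p(i).

from itertools import permutations

def sign(p):
    # sgn via the parity of the number of inversion pairs
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return 1 if inversions % 2 == 0 else -1

def compose(p, q):
    # (p o q)(i) = p(q(i)); q is applied first, as in the notes
    return tuple(p[q[i] - 1] for i in range(len(p)))

n = 4
for p in permutations(range(1, n + 1)):
    for q in permutations(range(1, n + 1)):
        assert sign(compose(p, q)) == sign(p) * sign(q)
print("sgn is multiplicative on S_4")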

4.2 Determinants: existence and uniqueness


Given n vectors ~v1 , . . . , ~vn in Rn we want to define something like the volume of the paral-
lelepiped spanned by these vectors. What properties would we expect of a volume?
1. vol(e1 , . . . , en ) = 1.
2. If two of the vectors ~vi are equal the volume should be zero.
3. For each c > 0, vol(c~v1 , ~v2 , . . . , ~vn ) = c vol(~v1 , . . . , ~vn ). Same in other arguments.
4. For each ~v10 , vol(~v1 +~v10 , ~v2 , . . . , ~vn ) = vol(~v1 , . . . , ~vn )+vol(~v10 , ~v2 , . . . , ~vn ). Same in other
arguments.
Using the motivating example of the volume, we define a multilinear function as follows.
Definition 4.2.1. If V is an n-dimensional vector space over F then define

V^n = {(v1, . . . , vn) : vi ∈ V for all i = 1, . . . , n} .

A function f : V^n → F is called multilinear if for each i and all vectors v1, . . . , vi−1, vi+1, . . . , vn
in V , the function fi : V → F is linear, where

fi(v) = f(v1, . . . , vi−1, v, vi+1, . . . , vn) .

A multilinear function f is called alternating if f(v1, . . . , vn) = 0 whenever vi = vj for some
i ≠ j.
Proposition 4.2.2. Let f : V^n → F be a multilinear function. If F does not have charac-
teristic two then f is alternating if and only if for all v1, . . . , vn and i < j,

f(v1, . . . , vi, . . . , vj , . . . , vn) = −f(v1, . . . , vj , . . . , vi, . . . , vn) .

Proof. Suppose that f is alternating. Then

0 = f(v1, . . . , vi + vj , . . . , vi + vj , . . . , vn)
  = f(v1, . . . , vi, . . . , vi + vj , . . . , vn) + f(v1, . . . , vj , . . . , vi + vj , . . . , vn)
  = f(v1, . . . , vi, . . . , vj , . . . , vn) + f(v1, . . . , vj , . . . , vi, . . . , vn) .

Conversely suppose that f has the property above. Then if vi = vj ,

f(v1, . . . , vi, . . . , vj , . . . , vn) = −f(v1, . . . , vj , . . . , vi, . . . , vn)
                                     = −f(v1, . . . , vi, . . . , vj , . . . , vn) .

Since F does not have characteristic two, this means this term is zero.

41
Corollary 4.2.3. Let f : V^n → F be an n-linear alternating function. Then for each σ ∈ Sn,

f(v_{σ(1)}, . . . , v_{σ(n)}) = sgn(σ) f(v1, . . . , vn) .

Proof. Write σ = τ1 · · · τk where the τi's are transpositions and (−1)^k = sgn(σ). Then

f(v_{σ(1)}, . . . , v_{σ(n)}) = −f(v_{τ1 · · · τ_{k−1}(1)}, . . . , v_{τ1 · · · τ_{k−1}(n)}) .

Applying this k − 1 more times gives the corollary.

Theorem 4.2.4. Let {v1, . . . , vn} be a basis for V . There is at most one multilinear alter-
nating function f : V^n → F such that f(v1, . . . , vn) = 1.

Proof. Let u1, . . . , un ∈ V and write

uk = a_{1,k} v1 + · · · + a_{n,k} vn .

Then

f(u1, . . . , un) = Σ_{i1=1}^{n} a_{i1,1} f(v_{i1}, u2, . . . , un)
                = · · ·
                = Σ_{i1,...,in=1}^{n} a_{i1,1} · · · a_{in,n} f(v_{i1}, . . . , v_{in}) .

However whenever two of the ij 's are equal, we get zero, so we can restrict the sum to distinct
ij 's. So this is

Σ_{i1,...,in distinct} a_{i1,1} · · · a_{in,n} f(v_{i1}, . . . , v_{in}) .

This can now be written as

Σ_{σ∈Sn} a_{σ(1),1} · · · a_{σ(n),n} f(v_{σ(1)}, . . . , v_{σ(n)})
  = Σ_{σ∈Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} f(v1, . . . , vn)
  = Σ_{σ∈Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} .

We now find that if dim V = n and f : V^n → F is an n-linear alternating function
with f(v1, . . . , vn) = 1 for some fixed basis {v1, . . . , vn}, then we have a specific form for f .
Writing vectors u1, . . . , un as

uk = a_{1,k} v1 + · · · + a_{n,k} vn ,

we then have
f(u1, . . . , un) = Σ_{σ∈Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} .

Now we would like to show that the formula above indeed does define an n-linear alternating
function with the required property.

1. Alternating. Suppose that ui = uj for some i < j. We will then split the set of
permutations into two classes. Let A = {σ ∈ Sn : σ(i) < σ(j)}. Letting τ_{i,j} = (i j),
note that σ ↦ στ_{i,j} is a bijection from A to Sn \ A. Then

f(u1, . . . , un) = Σ_{σ∈A} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} + Σ_{τ∈Sn\A} sgn(τ) a_{τ(1),1} · · · a_{τ(n),n}
                = Σ_{σ∈A} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} + Σ_{σ∈A} sgn(στ_{i,j}) a_{στ_{i,j}(1),1} · · · a_{στ_{i,j}(n),n} .

However, στ_{i,j}(k) = σ(k) when k ≠ i, j, and ui = uj , so this equals

Σ_{σ∈A} [sgn(σ) + sgn(στ_{i,j})] a_{σ(1),1} · · · a_{σ(n),n} = 0 .

2. 1 at the basis. Note that for ui = vi for all i we have

a_{i,j} = 1 if i = j and a_{i,j} = 0 if i ≠ j .

Therefore the value of f is

Σ_{σ∈Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} = sgn(id) a_{1,1} · · · a_{n,n} = 1 .

3. Multilinear. Write

uk = a_{1,k} v1 + · · · + a_{n,k} vn for k = 1, . . . , n

and write u = b1 v1 + · · · + bn vn . Now for c ∈ F ,

cu + u1 = (cb1 + a_{1,1}) v1 + · · · + (cbn + a_{n,1}) vn .

Therefore

f(cu + u1, u2, . . . , un)
  = Σ_{σ∈Sn} sgn(σ) [cb_{σ(1)} + a_{σ(1),1}] a_{σ(2),2} · · · a_{σ(n),n}
  = c Σ_{σ∈Sn} sgn(σ) b_{σ(1)} a_{σ(2),2} · · · a_{σ(n),n} + Σ_{σ∈Sn} sgn(σ) a_{σ(1),1} a_{σ(2),2} · · · a_{σ(n),n}
  = c f(u, u2, . . . , un) + f(u1, . . . , un) .
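The permutation-sum formula above is exactly the familiar Leibniz formula for the determinant when the vi are the standard basis vectors and a_{i,k} are the entries of the matrix whose k-th column is uk. The following Python/sympy sketch (not from the notes; the matrix is an arbitrary example) implements it and compares against sympy's built-in determinant.

from itertools import permutations
from sympy import Matrix

def leibniz_det(A):
    # sum over all permutations sigma of sgn(sigma) * a_{sigma(1),1} ... a_{sigma(n),n}
    n = A.rows
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if sigma[i] > sigma[j])        # inversion count gives the sign
        term = (-1) ** inv
        for k in range(n):
            term *= A[sigma[k], k]               # a_{sigma(k), k}
        total += term
    return total

A = Matrix([[2, 1, 0], [1, 3, 4], [0, 5, 6]])
assert leibniz_det(A) == A.det()
print(leibniz_det(A))    # -10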

43
4.3 Properties of determinants
Theorem 4.3.1. Let f : V^n → F be a multilinear alternating function and let {v1, . . . , vn}
be a basis with f(v1, . . . , vn) ≠ 0. Then {u1, . . . , un} is linearly dependent if and only if
f(u1, . . . , un) = 0.

Proof. One direction is on the homework: suppose that f(u1, . . . , un) = 0 but that {u1, . . . , un}
is linearly independent. Then write

vk = a_{1,k} u1 + · · · + a_{n,k} un .

By the same computation as above,

f(v1, . . . , vn) = f(u1, . . . , un) Σ_{σ∈Sn} sgn(σ) a_{σ(1),1} · · · a_{σ(n),n} = 0 ,

which is a contradiction. Therefore {u1, . . . , un} is linearly dependent.
Conversely, if {u1, . . . , un} is linearly dependent then for n ≥ 2 we can find some uj and
scalars ai for i ≠ j such that
uj = Σ_{i≠j} ai ui .

Now we have

f(u1, . . . , uj , . . . , un) = Σ_{i≠j} ai f(u1, . . . , ui, . . . , un) = 0 ,

since each term on the right has two equal arguments.

Definition 4.3.2. On the space F^n we define det : (F^n)^n → F as the unique alternating
n-linear function that gives det(e1, . . . , en) = 1. If A is an n × n matrix then we define

det A = det(~a1, . . . , ~an) ,

where ~ak is the k-th column of A.

Corollary 4.3.3. An n × n matrix A over F is invertible if and only if det A ≠ 0.

Proof. We have det A ≠ 0 if and only if the columns of A are linearly independent. This is
true if and only if A is invertible.
We start with the multiplicative property of determinants.

Theorem 4.3.4. Let A and B be n n matrices over a field F . Then

det (AB) = det A det B .

44
Proof. If det B = 0 then B is not invertible, so it cannot have full column rank. Therefore
neither can AB (by a homework problem). This means det (AB) = 0 and we are done.
Otherwise det B ≠ 0. Define a function f : Mn×n(F) → F by

f(A) = det(AB) / det B .
We claim that f is n-linear, alternating and assigns the value 1 to the standard basis (that
is, the identity matrix).

1. f is alternating. If A has two equal columns then its column rank is not full.
Therefore neither can be the column rank of AB and we have det (AB) = 0. This
implies f (A) = 0.

2. f (I) = 1. This is clear since IB = B.

3. f is n-linear. This follows because det is.

But there is exactly one function satisfying the above. We find f (A) = det A and we are
done.
For the rest of the lecture we will give further properties of determinants.

det A = det At . This is on homework.

det is alternating and n-linear as a function of rows.

If A is (a block matrix) of the form

A = [ B  C ]
    [ 0  D ]

then det A = det B · det D. This is also on homework.

The determinant is unchanged if we add a multiple of one column (or row) to another.
To show this, write a matrix A as a collection of columns (~a1, . . . , ~an). For example if
we add a multiple of column 1 to column 2 we get

det(~a1, c~a1 + ~a2, ~a3, . . . , ~an) = c · det(~a1, ~a1, ~a3, . . . , ~an) + det(~a1, ~a2, ~a3, . . . , ~an)
                                    = det(~a1, ~a2, ~a3, . . . , ~an) .

det(cA) = c^n det A.
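As a quick sanity check (not part of the notes), the following Python/sympy sketch verifies the properties above on small, arbitrarily chosen examples.

from sympy import Matrix, Rational

A = Matrix([[1, 2], [3, 5]])
B = Matrix([[0, 1], [4, 2]])

assert (A * B).det() == A.det() * B.det()        # det(AB) = det A det B
assert A.det() == A.T.det()                      # det A = det A^t

M = Matrix([[1, 2, 7], [3, 5, 8], [0, 0, 4]])    # block form [[A, C], [0, D]] with D = [4]
assert M.det() == A.det() * 4

c = Rational(3, 2)
assert (c * A).det() == c**2 * A.det()           # det(cA) = c^n det A, here n = 2

A2 = A.copy()
A2[:, 1] = A2[:, 1] + 5 * A2[:, 0]               # add a multiple of column 1 to column 2
assert A2.det() == A.det()
print("all properties check out on these examples")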

We will now discuss cofactor expansion.

Definition 4.3.5. Let A ∈ Mn×n(F). For i, j ∈ {1, . . . , n} define the (i, j)-minor of A (written
A(i|j)) to be the (n − 1) × (n − 1) matrix obtained from A by removing the i-th row and the
j-th column.

45
Theorem 4.3.6 (Laplace expansion). Let A ∈ Mn×n(F) for n ≥ 2 and fix j ∈ {1, . . . , n}. We
have
det A = Σ_{i=1}^{n} (−1)^{i+j} A_{i,j} det A(i|j) .

Proof. Let us begin by taking j = 1. Now write the column ~a1 as

~a1 = A_{1,1} e1 + A_{2,1} e2 + · · · + A_{n,1} en .

Then we get

det A = det(~a1, . . . , ~an) = Σ_{i=1}^{n} A_{i,1} det(ei, ~a2, . . . , ~an) .     (2)

We now consider the term det(ei, ~a2, . . . , ~an). This is the determinant of the following matrix:

[ 0  A_{1,2}  · · ·  A_{1,n} ]
[ :     :               :   ]
[ 1  A_{i,2}  · · ·  A_{i,n} ]
[ :     :               :   ]
[ 0  A_{n,2}  · · ·  A_{n,n} ]

Here, the first column is 0 except for a 1 in the i-th spot. We can now swap the i-th row to
the top using i − 1 adjacent transpositions (1 2) · · · (i−1 i). We are left with (−1)^{i−1} times
the determinant of the matrix

[ 1  A_{i,2}     · · ·  A_{i,n}   ]
[ 0  A_{1,2}     · · ·  A_{1,n}   ]
[ :     :                  :      ]
[ 0  A_{i−1,2}   · · ·  A_{i−1,n} ]
[ 0  A_{i+1,2}   · · ·  A_{i+1,n} ]
[ :     :                  :      ]
[ 0  A_{n,2}     · · ·  A_{n,n}   ]

This is a block matrix of the form

[ 1    B     ]
[ 0  A(i|1)  ]

By the remarks earlier, its determinant is equal to 1 · det A(i|1). Plugging this into formula
(2), we get
det A = Σ_{i=1}^{n} (−1)^{i−1} A_{i,1} det A(i|1) ,

which equals Σ_{i=1}^{n} (−1)^{i+1} A_{i,1} det A(i|1), the claimed formula for j = 1.
If j ≠ 1 then we perform j − 1 adjacent column switches to bring the j-th column to the
first. This gives us a new matrix Ã. For this matrix, the formula holds. Compensating for
the switches,

det A = (−1)^{j−1} det Ã = (−1)^{j−1} Σ_{i=1}^{n} (−1)^{i−1} Ã_{i,1} det Ã(i|1)
      = Σ_{i=1}^{n} (−1)^{i+j} A_{i,j} det A(i|j) .

Check the last equality. This completes the proof.
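A recursive implementation of the expansion along the first column (a sketch, not from the notes; the 4 × 4 matrix is an arbitrary example whose last row is twice its first, so the determinant is 0) agrees with sympy's determinant. Here A.minor_submatrix(i, j) plays the role of A(i|j) from Definition 4.3.5, with 0-based indices.

from sympy import Matrix

def laplace_det(A):
    # expansion along the first column: det A = sum_i (-1)^(i+1) A_{i,1} det A(i|1)
    n = A.rows
    if n == 1:
        return A[0, 0]
    return sum((-1) ** i * A[i, 0] * laplace_det(A.minor_submatrix(i, 0))
               for i in range(n))

A = Matrix([[1, 4, 5, 7], [0, 0, 2, 3], [1, 4, 1, 7], [2, 8, 10, 14]])
assert laplace_det(A) == A.det()
print(laplace_det(A))    # 0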

We have discussed determinants of matrices (and of n vectors in F^n). We will now define
the determinant for transformations.

Definition 4.3.7. Let V be a finite dimensional vector space over a field F . If T : V → V
is linear, we define det T as det [T]_β for any basis β of V .

Note that det T does not depend on the choice of basis. Indeed, if β′ is another basis and
P is the change of basis matrix between β and β′, then

det [T]_{β′} = det(P^{-1} [T]_β P) = det(P)^{-1} det [T]_β det(P) = det [T]_β .

If T and U are linear transformations from V to V then det(TU) = det T · det U .

det(cT) = c^{dim V} det T .

det T = 0 if and only if T is non-invertible.
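The basis-independence above can be spot-checked numerically (a sketch, not from the notes; A and P are arbitrary, with P any invertible matrix playing the role of a change of basis).

from sympy import Matrix

A = Matrix([[2, 1, 0], [0, 2, 0], [1, 3, 5]])
P = Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])    # any invertible matrix

assert P.det() != 0
assert (P * A * P.inv()).det() == A.det()        # similar matrices share their determinant
print(A.det())                                   # 20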

4.4 Exercises
Notation:
1. n = {1, . . . , n} is the finite set of natural numbers between 1 and n;

2. Sn is the set of all bijective maps n n;

3. For a sequence k1 , . . . , kt of distinct elements of n, we denote by (k1 k2 . . . kt ) the element


of Sn which is defined by

ks , i = ks1 , 1 < s < t + 1

(i) = k1 , i = kt

i, i
/ {k1 , . . . , kt }

Elements of this form are called cycles (or t-cycles). Two cycles (k1 . . . kt ) and (l1 . . . ls )
are called disjoint if the sets {k1 , . . . , kt } and {l1 , . . . , ls } are disjoint.

4. Let Sn . A subset {k1 , . . . , kt } n is called an orbit of if the following conditions


hold

47
For any j N there exists an 1 i t such that j (k1 ) = ki .
For any 1 i t there exists a j N such that ki = j (k1 ).

Here j is the product of j-copies of .

5. Let V and W be two vector spaces over an arbitrary field F, and k ∈ N. Recall that
a k-linear map f : V^k → W is called

alternating, if f(v1, . . . , vk) = 0 whenever the vectors (v1, . . . , vk) are not distinct;
skew-symmetric, if f(v1, . . . , vk) = −f(v_{τ(1)}, . . . , v_{τ(k)}) for any transposition
τ ∈ Sk ;
symmetric, if f(v1, . . . , vk) = f(v_{τ(1)}, . . . , v_{τ(k)}) for any transposition τ ∈ Sk .

6. If k and n are positive integers such that k n the binomial coefficient nk is defined


as  
n n!
= .
k k!(n k)!
Note that this number is equal to the number of distinct subsets of size k of a finite
set of size n.

7. Given an F-vector space V , denote by Altk (V ) the set of alternating k-linear forms
(functions) on V ; that is, the set of alternating k-linear map V k F.

Exercises:

1. Prove that composition of maps defines a group law on Sn . Show that this group is
abelian only if n 2.

2. Let f : Sn Z be a function which is multiplicative, i.e. f ( ) = f ()f ( ). Show that


f must be one of the following three functions: f () = 0, f () = 1, f () = sgn().

3. (From Dummit-Foote) List explicitly the 24 permutations of degree 4 and state which
are odd and which are even.

4. Let k1 , . . . , kt be a sequence of distinct elements of n. Show that sgn((k1 . . . kt )) =


(1)t1 .

5. Let Sn be the element (k1 . . . kt ) from the previous exercise, and let Sn be any
element. Find a formula for the element 1 .

6. Let = (k1 . . . kt ) and = (l1 . . . ls ) be disjoint cycles. Show that then = . One
says that and commute.

7. Let Sn . Show that can be written as a product of disjoint (and hence, by the
previous exercise, commuting) cycles.
Hint: Consider the orbits of .

48
8. If G is a group and S G is a subset, define hSi to be the intersection of all subgroups
of G that contain S. (This is the subgroup of G generated by S.)

(a) Show that if S is a subset of G then hSi is a subgroup of G.


(b) Let S be a subset of G and define

S = S {s1 : s S} .

Show that
hSi = {a1 ak : k 1 and ai S for all i} .

9. Prove that Sn = h(12), (12 n)i.


Hint: Use exercise 5.

10. Let V be an finite-dimensional vector space over some field F , W an arbitrary vector
space over F , and k > dim(V ). Show that every alternating k-linear function V k W
is identically zero. Give an example (choose F , V , W , and k as you wish, as long as
k > dim(V )) of a skew-symmetric k-linear function V k W which is not identically
zero.

11. (From Hoffman-Kunze) Let F be a field and f : F^2 × F^2 → F be a 2-linear alternating
function. Show that

f( (a, b), (c, d) ) = (ad − bc) f(e1, e2) .

Find an analogous formula for F^3. Deduce from this the formula for the determinant
of a 2 × 2 and a 3 × 3 matrix.

12. Let V be an n-dimensional vector space over a field F. Suppose that f : V n F is an


n-linear alternating function such that f (v1 , . . . , vn ) 6= 0 for some basis {v1 , . . . , vn } of
V . Show that f (u1 , . . . , un ) = 0 implies that {u1 , . . . , un } is linearly dependent.

13. Let V and W be vector spaces over a field F , and f : V W a linear map.

(a) For Altk (W ) let f be the function on V k defined by

[f ](v1 , . . . , vk ) = (f (v1 ), . . . , f (vk )).

Show that f Altk (V ).


(b) Show that in this way we obtain a linear map f : Altk (W ) Altk (V ).
(c) Show that, given a third vector space X over F and a linear map g : W X,
one has (g f ) = f g .
(d) Show that if f is an isomorphism, then so is f .

49
14. For n 2, we call M Mnn (F) a block upper-triangular matrix if there exists k with
1 k n 1 and matrices A Mkk (F), B Mk(nk) (F) and C M(nk)(nk) (F)
such that M has the form  
A B
.
0 C
That is, the elements of M are given by


Ai,j 1 i k, 1jk

B
i,jk 1 i k, k<jn
Mi,j = .


0 k < i n, 1jk
k < i n, k<jn

Cik,jk
We will show in this exercise that
det M = det A det C . (3)
(a) Show that if det C = 0 then formula (3) holds.
 
(b) Suppose that det C 6= 0 and define a function A 7 A for A Mkk (F) by

B
 
A
 
1
A = [det C] det .
0 C
 
That is, A is a scalar multiple of the determinant of the block upper-triangular
matrix we get when we replace A by A and keep B and C fixed.

i. Show that is k-linear as a function of the columns of A.
ii. Show that is alternating and satisfies (Ik ) = 1, where Ik is the k k
identity matrix.
iii. Conclude that formula (3) holds when det C 6= 0.
15. Suppose that A Mnn (F) is upper-triangular; that is, ai,j = 0 when 1 j < i n.
Show that det A = a1,1 a2,2 an,n .
16. Let A Mnn (F) such that Ak = 0 for some k 0. Show that det A = 0.
17. Let a0, a1, . . . , an be distinct complex numbers. Write Mn(a0, . . . , an) for the matrix

[ 1  a0  a0^2  · · ·  a0^n ]
[ 1  a1  a1^2  · · ·  a1^n ]
[ :   :    :            :  ]
[ 1  an  an^2  · · ·  an^n ]

The goal of this exercise is to show that

det Mn(a0, . . . , an) = Π_{0≤i<j≤n} (aj − ai) .     (4)

We will argue by induction on n.

50
(a) Show that if n = 2 then formula (4) holds.
(b) Now suppose that k ≥ 2 and that formula (4) holds for all 2 ≤ n ≤ k. Show that
it holds for n = k + 1 by completing the following outline.
i. Define the function f : C → C by f(z) = det Mn(z, a1, . . . , an). Show that f
is a polynomial of degree at most n.
ii. Find all the zeros of f .
iii. Show that the coefficient of z^n is (−1)^n det Mn−1(a1, . . . , an).
iv. Show that formula (4) holds for n = k + 1, completing the proof.

18. Show that if A Mnn (F) then det A = det At , where At is the transpose of A.

19. Let V be an n-dimensional F-vector space and k n. The purpose of this problem is
to show that  
k n
dim(Alt (V )) = ,
k
by completing the following steps:

(a) Let W be a subspace of V and let B = (v1 , . . . , vn ) be a basis for V such that
(v1 , . . . , vk ) is a basis for W . Show that
(
vi , i k
pW,B : V W, vi 7
0, i > k

specifies a linear map with the property that pW,B pW,B = pW,B . Such a map
(that is, a T such that T T = T ) is called a projection.
(b) With W and B as in the previous part, let dW be a non-zero element of Altk (W ).
Show that [pW,B ] dW is a non-zero element of Altk (V ). (Recall this notation from
exercise 3.)
(c) Let B = (v1 , . . . , vn ) be a basis of V . Let S1 , . . . , St be subsets of n = {1, . . . , n}.
Assume that each Si has exactly k elements and no two Si s are the same. Let

Wi = Span({vj : j Si }).

For i = 1, . . . , t, let dWi Altk (Wi ) be non-zero. Show that the collection
{[pWi ,B ] dWi : 1 i t} of elements of Altk (V ) is linearly independent.
(d) Show that the above collection is also generating, by taking an arbitrary
Altk (V ), an arbitrary collection u1 , . . . , uk of vectors in V , expressing each ui as a
linear combination of (v1 , . . . , vk ) and plugging those linear combinations into .
In doing this, it may be helpful (although certainly not necessary) to assume
that dWi is the unique element of Altk (Wi ) with dWi (w1 , . . . , wk ) = 1, where
Si = (w1 , . . . , wk ).

51
20. Let A Mnn (F ) for some field F . Recall that if 1 i, j n then the (i, j)-th minor
of A, written A(i|j), is the (n 1) (n 1) matrix obtained by removing the i-th row
and j-th column from A. Define the cofactor
Ci,j = (1)i+j det A(i|j) .
Note that the Laplace expansion for the determinant can be written
n
X
det A = Ai,j Ci,j .
i=1

(a) Show that if 1 i, j, k n with j 6= k then


n
X
Ai,k Ci,j = 0 .
i=1

(b) Define the classical adjoint of A, written adj A, by


(adj A)i,j = Cj,i .
Show that (adj A)A = (det A)I.
(c) Show that A(adj A) = (det A)I and deduce that if A is invertible then
A1 = (det A)1 adj A .

Hint: begin by applying the result of the previous part to At .


(d) Use the formula in the last part to find the inverses of the following matrices:

1 2 3 4
1 2 4 1 0 0
1 3 9 , 0 .
0 1 1 1
1 4 16
6 0 1 1

21. Consider a system of equations in n variables with coefficients from a field F . We


can write this as AX = Y for an n n matrix A, an n 1 matrix X (with entries
x1 , . . . , xn ) and an n 1 matrix Y (with entries y1 , . . . , yn ). Given the matrices A and
Y we would like to solve for X.
(a) Show that
n
X
(det A)xj = (1)i+j yi det A(i|j) .
i=1

(b) Show that if det A 6= 0 then we have


xj = (det A)1 det Bj ,
where Bj is an n n matrix obtained from A by replacing the j-th column of A
by Y . This is known as Cramers rule.

52
(c) Solve the following systems of equations using Cramers rule.

2x y + z 2t = 5
2x y + z =3



2x + 2y 3z + t = 1
2y z =1
x + y z = 1
yx =1


4x 3y + 2z 3t = 8

22. Find the determinants of the following matrices. In the first example, the entries are
from R and in the second they are from Z3.

[ 1  4   5   7 ]      [ 1  1  0  0  0 ]
[ 0  0   2   3 ]      [ 1  1  1  0  0 ]
[ 1  4   1   7 ]  ,   [ 0  1  1  1  0 ]
[ 2  8  10  14 ]      [ 0  0  1  1  1 ]
                      [ 0  0  0  1  1 ]

53
5 Eigenvalues
5.1 Definitions and the characteristic polynomial
The simplest matrix is λI for some λ ∈ F . These act just like the field F . What is the
second simplest? A diagonal matrix; that is, a matrix D that satisfies Di,j = 0 if i ≠ j.

Definition 5.1.1. Let V be a finite dimensional vector space over F . A linear transfor-
mation T is called diagonalizable if there exists a basis β such that [T]_β is diagonal.

Proposition 5.1.2. T is diagonalizable if and only if there exists a basis {v1, . . . , vn} of V
and scalars λ1, . . . , λn such that

T(vi) = λi vi for all i .

Proof. Suppose that T is diagonalizable. Then there is a basis β = {v1, . . . , vn} such that
[T]_β is diagonal. Then, writing D = [T]_β ,

[T vk]_β = D [vk]_β = D ek = D_{k,k} ek .

Now we can choose λk = D_{k,k} .
If the second condition holds then we see that [T]_β is diagonal with entries Di,j = 0 if
i ≠ j and Di,i = λi .
This motivates the following definition.

Definition 5.1.3. If T : V → V is linear then we call a nonzero vector v an eigenvector of
T if there exists λ ∈ F such that T(v) = λv. In this case we call λ the eigenvalue for v.

Theorem 5.1.4. If dim V < ∞ and T : V → V is linear then the following are equivalent.

1. λ is an eigenvalue of T (for some eigenvector).

2. T − λI is not invertible.

3. det(T − λI) = 0.

Proof. If (1) holds then the eigenvector v is a non-zero vector in the nullspace of T − λI. Thus
T − λI is not invertible. We already know that (2) and (3) are equivalent. If T − λI is not
invertible then there is a non-zero vector in its nullspace. This vector is an eigenvector.

Definition 5.1.5. If T : V → V is linear and dim V = n then we define the characteristic
polynomial c : F → F by
c(λ) = det(T − λI) .

Note that c(λ) does not depend on the choice of basis.

54
We can write c in terms of the matrix of T :

c(λ) = det(T − λI) = det([T − λI]_β) = det([T]_β − λ Id) .

Eigenvalues are exactly the roots of c(λ).

Facts about c(x).

1. c is a polynomial of degree n. We can see this by analyzing each term in the definition
of the determinant: set B = A − xI (where A = [T]_β) and look at

sgn(σ) B_{1,σ(1)} · · · B_{n,σ(n)} .

Each term B_{i,σ(i)} is a polynomial of degree 0 or 1 in x. So the product has degree at
most n. A sum of such polynomials is a polynomial of degree at most n.
In fact, the only term of degree n is

sgn(id) B_{1,1} · · · B_{n,n} = (A_{1,1} − x) · · · (A_{n,n} − x) .

So the coefficient of x^n is (−1)^n.

2. In the above description of c(x), all terms corresponding to non-identity permutations
have degree at most n − 2. Therefore the degree n − 1 term comes from (A_{1,1} −
x) · · · (A_{n,n} − x) as well. It is

(−1)^{n−1} x^{n−1} [A_{1,1} + · · · + A_{n,n}] = (−1)^{n−1} x^{n−1} Tr A .

3. Because c(0) = det A,

c(x) = (−1)^n [ x^n − Tr(A) x^{n−1} + · · · ] + det A .

For F = C (or any field so that c(x) splits), we can always write c(x) = (−1)^n (x −
λ1) · · · (x − λn). Thus the constant term in the polynomial is Π_i λi . Therefore

c(x) = (−1)^n [ x^n − Tr(A) x^{n−1} + · · · + (−1)^n Π_{i=1}^{n} λi ] .

We find det A = Π_i λi in C.
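A short sympy sketch (not from the notes; the matrix is an arbitrary example with eigenvalues 1, 2, 3) checking the facts above about c(x) = det(A − xI).

from sympy import Matrix, symbols, eye, expand

x = symbols('x')
A = Matrix([[1, 2, 0], [0, 3, 0], [2, -4, 2]])
n = A.rows

c = expand((A - x * eye(n)).det())
print(c)                                             # -x**3 + 6*x**2 - 11*x + 6

assert c.coeff(x, n) == (-1) ** n                    # leading coefficient is (-1)^n
assert c.coeff(x, n - 1) == (-1) ** (n - 1) * A.trace()
assert c.subs(x, 0) == A.det()                       # constant term is det A

eigs = A.eigenvals()                                 # {1: 1, 2: 1, 3: 1}
assert sum(lam * m for lam, m in eigs.items()) == A.trace()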

Theorem 5.1.6. If dim V = n and c(λ) has n distinct roots then T is diagonalizable. The
converse is not true.

Proof. Write the eigenvalues as λ1, . . . , λn. For each i we have an eigenvector vi. We claim
that the vi's are linearly independent. This follows from the lemma:

55
Lemma 5.1.7. If λ1, . . . , λk are k distinct eigenvalues associated to eigenvectors v1, . . . , vk
then {v1, . . . , vk} is linearly independent.

Proof. Suppose that
a1 v1 + · · · + ak vk = ~0 .
Take T of both sides:
a1 λ1 v1 + · · · + ak λk vk = ~0 .
Keep doing this, k − 1 times in all, so we get the system of equations

a1 v1 + · · · + ak vk = ~0
a1 λ1 v1 + · · · + ak λk vk = ~0
· · ·
a1 λ1^{k−1} v1 + · · · + ak λk^{k−1} vk = ~0 .

Write each vi as [vi]_β for some basis β. The system is then equivalent to the matrix equation

M W = [ ~0  ~0  · · ·  ~0 ] ,

where M is the n × k matrix whose j-th column is the column vector aj [vj]_β and W is the
k × k matrix whose i-th row is (1, λi, λi^2, . . . , λi^{k−1}). But W has nonzero determinant
when the λi's are distinct: its determinant is Π_{1≤i<j≤k} (λj − λi). Therefore it is invertible.
Multiplying both sides by its inverse, we find ai vi = ~0 for all i. Since vi ≠ ~0, it follows that
ai = 0 for all i.

5.2 Eigenspaces and the main diagonalizability theorem


Definition 5.2.1. If λ ∈ F we define the eigenspace

Eλ = N(T − λI) = {v ∈ V : T(v) = λv} .

Note that Eλ is a subspace even if λ is not an eigenvalue. Furthermore,

Eλ ≠ {0} if and only if λ is an eigenvalue of T

and
Eλ is T-invariant for all λ ∈ F .

What this means is that if v ∈ Eλ then so is T(v):

(T − λI)(T(v)) = T((T − λI)(v)) = T(~0) = ~0 .

56
Definition 5.2.2. If W1, . . . , Wk are subspaces of a vector space V then we write

W1 ⊕ · · · ⊕ Wk

for the sum space W1 + · · · + Wk and say the sum is direct if

Wj ∩ [W1 + · · · + Wj−1] = {0} for all j = 2, . . . , k .

We also say the subspaces are independent.

Theorem 5.2.3. If λ1, . . . , λk are distinct eigenvalues of T then

Eλ1 + · · · + Eλk = Eλ1 ⊕ · · · ⊕ Eλk .

Furthermore

dim( Σ_{i=1}^{k} Eλi ) = Σ_{i=1}^{k} dim Eλi .

Proof. The theorem will follow directly from the following lemma, whose proof is in the
homework.

Lemma 5.2.4. Let W1, . . . , Wk be subspaces of V . The following are equivalent.

1. W1 + · · · + Wk = W1 ⊕ · · · ⊕ Wk .

2. Whenever w1 + · · · + wk = ~0 for wi ∈ Wi for all i, we have wi = ~0 for all i.

3. Whenever βi is a basis for Wi for all i, the βi's are disjoint and β := ∪_{i=1}^{k} βi is a basis
for Σ_{i=1}^{k} Wi .

So take w1 + · · · + wk = ~0 for wi ∈ Eλi for all i. Note that each nonzero wi is an eigenvector
for the eigenvalue λi. Remove all the zero ones. If we are left with any nonzero ones, then by
Lemma 5.1.7 they must be linearly independent. This would be a contradiction. So
they are all zero.
For the second claim take bases βi of Eλi . By the lemma, ∪_{i=1}^{k} βi is a basis for Σ_{i=1}^{k} Eλi .
This implies the claim.

Theorem 5.2.5 (Main diagonalizability theorem). Let T : V → V be linear and dim V < ∞.
The following are equivalent.

1. T is diagonalizable.

2. c(x) can be written as (−1)^n (x − λ1)^{n1} · · · (x − λk)^{nk} , where ni = dim Eλi for all i.

3. V = Eλ1 ⊕ · · · ⊕ Eλk , where λ1, . . . , λk are the distinct eigenvalues of T .

57
Proof. Suppose first that T is diagonalizable. Then there exists a basis β of eigenvectors for
T ; that is, one for which [T]_β is diagonal. Clearly each diagonal element is an eigenvalue. For
each i, call ni the number of entries on the diagonal that are equal to λi. Then [T − λiI]_β
has exactly ni zeros on the diagonal. All other diagonal entries must be non-zero, so the
nullspace has dimension ni. In other words, ni = dim Eλi .
Suppose that condition 2 holds. Since c is a polynomial of degree n we must have

dim Eλ1 + · · · + dim Eλk = dim V .

However since the λi's are distinct the previous theorem gives that

dim( Σ_{i=1}^{k} Eλi ) = Σ_{i=1}^{k} dim Eλi = dim V .

In other words, V = Σ_{i=1}^{k} Eλi . The previous theorem implies that the sum is direct and the
claim follows.
Suppose that condition 3 holds. Then take βi a basis for Eλi for all i. Then β = ∪_{i=1}^{k} βi
is a basis for V . We claim that [T]_β is diagonal. This is because each vector in β is an
eigenvector. This proves 1 and completes the proof.
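The theorem can be tested computationally (a sketch, not from the notes; the two matrices are arbitrary examples, one diagonalizable and one not): T is diagonalizable exactly when the sum of the dimensions of the eigenspaces equals dim V.

from sympy import Matrix

A = Matrix([[1, 2, 0], [0, 3, 0], [2, -4, 2]])   # distinct eigenvalues 1, 2, 3
B = Matrix([[1, 1], [0, 1]])                     # single eigenvalue 1

for M in (A, B):
    n = M.rows
    geom = sum(len(vecs) for _, _, vecs in M.eigenvects())  # sum of dim E_lambda
    print(M.is_diagonalizable(), geom == n)
# A prints: True True        B prints: False False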

5.3 Exercises
1. Let V be anP
F-vector space and let W1 , . . . , Wk be subspaces of V . Recall the definition
of the sum ki=1 Wi . It is the subspace of V given by

{w1 + + wk : wi Wi }.
Lk
Recall further that this sum is called direct, and written as i=1 if and only if for all
1 < i n we have
i1
X
Wi ( Wj ) = {0}.
j=1

Show that the following statements are equivalent:

(a) The sum ki=1 Wi is direct.


P

(b) For any collection {w1 , . . . , wk } with wi Wi for all i, we have


k
X
wi = 0 i : wi = 0.
i=1

(c) If, for each i, i is a basis of Wi , then


Pk the i s are disjoint and their union
k
= ti=1 i is a basis for the subspace i=1 Wi .
(d) For any v ki=1 Wi there exist unique vectors w1 , . . . , wk such that wi Wi for
P

all i and v = ki=1 wi .


P

58
2. Let V be an F-vector space. Recall that a linear map p L(V, V ) is called a projection
if p p = p.

(a) Show that if p is a projection, then so is q = idV p, and we have p q = q p = 0.


(b) Let W1 , . . . , Wk be subspaces of V and assume that V = ki=1 Wi . For 1 t k,
L
show that there is a unique element pt L(V, Wt ) such that for any choice of
vectors w1 , . . . , wk such that wj Wj for all j,
(
wj j = t
pt (wj ) = .
0 j 6= t

(c) ShowPthat each pt defined in the previous part is a projection. Show furthermore
that ki=1 pt = idV and that for t 6= s we have pt ps = 0.

Pkconversely that if p1 , . . . , pt L(V, V ) are projections with the properties


(d) Show
(a) i=1 pi = idV and (b) pi pj = 0 for all i 6= j, and if we put Wi = R(pi ), then
V = ki=1 Wi .
L

3. Let V be an F-vector space.

(a) If U V is a subspace, W is another F-vector space, and f L(V, W ), define


f |U : U W by
f |U (u) = f (u) u U.
Show that the map f 7 f |U is a linear map L(V, W ) L(U, W ). It is called the
restriction map.
(b) Let f L(V, V ) and let U V be an f -invariant subspace (that is, a subspace U
with the property that f (u) U whenever u U ). Observe that f |U L(U, U ).
If W V is another f -invariant subspace and V = U W , show that

N(f) = N(f|U) ⊕ N(f|W ), R(f) = R(f|U) ⊕ R(f|W ), det(f) = det(f|U) det(f|W ).

(c) Let f, g L(V, V ) be two commuting endomorphisms, i.e. we have f g = g f .


Show that N (g) and R(g) are f -invariant subspaces of V .

4. Consider the matrix 


1 1
A := .
0 1
Show that there does not exist an invertible matrix P M22 (C) such that P AP 1 is
diagonal.

5. Let V be a finite-dimensional F-vector space and f L(V, V ). Observe that for each
natural number k we have N (f k ) N (f k+1 ).

(a) Show that there exists a natural number k so that N (f k ) = N (f k+1 ).

59
(b) Show further that for all l k one has N (f l ) = N (f k ).

6. Let V be a finite-dimensional F-vector space and f L(V, V ).

(a) Let U V be an f -invariant subspace and = (v1 , . . . , vn ) a basis of V such that


0 = (v1 , . . . , vk ) is a basis for U . Show that
 
A B
[f ] =
0 C
0
with A = [f |U ] 0 Mkk (F), B Mk(nk) (F), and C M(nk)(nk) (F).
(b) Let U, W V be f -invariant subspaces with V = U W . Let 0 be a basis for
U , 00 a basis for W , and = 0 t 00 . Show that
 
A 0
[f ] =
0 C
0 00
with A = [f |U ] 0 and C = [f |W ] 00 .

7. Let V be a finite-dimensional F-vector space. Recall that an element p L(V, V ) with


p2 = p is called a projection. On the other hand, an element i L(V, V ) with i2 = idV
is called an involution.

(a) Assume that char(F) ≠ 2. Show that the maps

Involutions on V ⟷ Projections in V
i ⟼ (1/2)(idV + i)
2p − idV ⟻ p

are mutually inverse bijections.


(b) Show that if p L(V, V ) is a projection, then the only eigenvalues of p are 0
and 1. Furthermore, V = E0 (p) E1 (p) (the eigenspaces for p). That is, p is
diagonalizable.
(c) Show that if i L(V, V ) is an involution, then the only eigenvalues of i are +1
and 1. Furthermore, V = E+1 (i) E1 (i). That is, i is diagonalizable.

Observe that projections and involutions are examples of diagonalizable endomor-


phisms which do not have dim(V )-many distinct eigenvalues.

8. In this problem we will show that every endomorphism of a vector space over an
algebraically closed field can be represented as an upper triangular matrix. This is a
simpler result than (and is implied by) the Jordan Canonical form, which we will cover
in class soon.

60
We will argue by (strong) induction on the dimension of V . Clearly the result holds for
dim V = 1. So suppose that for some k 1 whenever dim W k and U : W W
is linear, we can find a basis of W with respect to which the matrix of U is upper-
triangular. Further, let V be a vector space of dimension k + 1 over F and T : V V
be linear.

(a) Let be an eigenvalue of T . Show that the dimension of R := R(T I) is


strictly less than dim V and that R is T -invariant.
(b) Apply the inductive hypothesis to T |R to find a basis of R with respect to which
T |R is upper-triangular. Extend this to a basis for V and complete the argument.

9. Let A Mnn (F) be upper-triangular. Show that the eigenvalues of A are the diagonal
entries of A.
10. Let A be the matrix
6 3 2
A = 4 1 2 .
10 5 3
(a) Is A diagonalizable over R? If so, find a basis for R3 of eigenvectors of A.
(b) Is A diagonalizable over C? If so, find a basis for C3 of eigenvectors of A.
11. For which values of a, b, c R is the following matrix diagonalizable over R?

0 0 0 0
a 0 0 0

0 b 0 0
0 0 c 0

12. Let V be a finite dimensional vector space over a field F and let T : V V be linear.
Suppose that every subspace of V is T -invariant. What can you say about T ?
13. Let V be a finite dimensional vector space over a field F and let T, U : V V be
linear transformations.
(a) Prove that if I T U is invertible then I U T is invertible and
(I U T )1 = I + U (I T U )1 T .
(b) Use the previous part to show that T U and U T have the same eigenvalues.
14. Let A be the matrix
1 1 1
1 1 1 .
1 1 1
Find An for all n 1.
Hint: first diagonalize A.

61
6 Jordan form
6.1 Generalized eigenspaces
It is of course not always true that T is diagonalizable. There can be a couple of reasons for
that. First it may be that the roots of the characteristic polynomial do not lie in the field.
For instance
[ 0  −1 ]
[ 1   0 ]
has characteristic polynomial x^2 + 1, which has no real roots. Even if the eigenvalues do lie
in the field, we still may not be able to diagonalize. On the homework you will see that the matrix
[ 1  1 ]
[ 0  1 ]

is not diagonalizable over C (although its eigenvalues are certainly in C). So we resort to
looking for a block diagonal matrix.
looking for a block diagonal matrix.
Suppose that we can show that

V = W1 Wk

for some subspaces Wi . Then we can choose a basis for V made up of bases for the Wi s. If
the Wi s are T -invariant then the matrix will be in block form.
Definition 6.1.1. Let T : V V be linear. A subspace W of V is T -invariant if T (w) W
whenever w W .

Each eigenspace is T -invariant. If w E then

(T I)T (w) = (T I)w = ~0 .

Therefore the eigenspace decomposition is a T -invariant direct sum.

To find a general T -invariant direct sum we define generalized eigenspaces.


Definition 6.1.2. Let T : V → V be linear. If λ ∈ F then the set

Ẽλ = {v ∈ V : (T − λI)^k v = ~0 for some k}

is called the generalized eigenspace for λ.

This is a subspace. If v, w ∈ Ẽλ and c ∈ F then there exist kv and kw such that

(T − λI)^{kv} v = ~0 = (T − λI)^{kw} w .

Choosing k = max{kv, kw} we find

(T − λI)^k (cv + w) = ~0 .

62
Each generalized eigenspace is T-invariant. To see this, suppose that (T − λI)^k v = ~0.
Then because T commutes with (T − λI)^k we have

(T − λI)^k T v = T (T − λI)^k v = ~0 .

To make sure the characteristic polynomial has roots we will take F to be an algebraically
closed field. That is, each polynomial with coefficients in F has a root in F .
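For the non-diagonalizable 2 × 2 matrix from the start of this section, a short sympy sketch (not from the notes) shows how the generalized eigenspace is strictly larger than the eigenspace: the eigenspace E_1 is one-dimensional, while the generalized eigenspace is all of F^2.

from sympy import Matrix, eye

T = Matrix([[1, 1], [0, 1]])
U = T - eye(2)                       # T - 1*I

print(len(U.nullspace()))            # 1: dim of the ordinary eigenspace E_1
print(len((U ** 2).nullspace()))     # 2: dim of the generalized eigenspace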

6.2 Primary decomposition theorem


Theorem 6.2.1 (Primary decomposition theorem). Let F be algebraically closed and V a
finite-dimensional vector space over F . If T : V → V is linear and λ1, . . . , λk are the distinct
eigenvalues of T then
V = Ẽλ1 ⊕ · · · ⊕ Ẽλk .
Proof. We follow several steps.
Step 1. c(x) has a root. Therefore T has an eigenvalue. Call it λ1.
Step 2. Consider the generalized eigenspace Ẽλ1. We first show that there exists k1 such that

Ẽλ1 = N(T − λ1I)^{k1} .

Let v1, . . . , vm be a basis for Ẽλ1. Then for each i there is an li such that (T − λ1I)^{li} vi = ~0.
Choose k1 = max{l1, . . . , lm}. Then (T − λ1I)^{k1} kills all the basis vectors and thus kills
everything in Ẽλ1. Therefore
Ẽλ1 ⊆ N(T − λ1I)^{k1} .
The other direction is obvious.
Step 3. We now claim that

V = R(T − λ1I)^{k1} ⊕ N(T − λ1I)^{k1} = R(T − λ1I)^{k1} ⊕ Ẽλ1 .

First we show that the intersection is only the zero vector. Suppose that v is in the in-
tersection. Then (T − λ1I)^{k1} v = ~0 and there exists w such that (T − λ1I)^{k1} w = v. Then
(T − λ1I)^{2k1} w = ~0, so w ∈ Ẽλ1. Therefore

v = (T − λ1I)^{k1} w = ~0 .

By the rank-nullity theorem,

dim R(T − λ1I)^{k1} + dim N(T − λ1I)^{k1} = dim V .

By the 2-subspace (dimension) theorem,

V = R(T − λ1I)^{k1} + N(T − λ1I)^{k1} ,

and so it is a direct sum.

63
Step 4. Write W1 = R(T 1 I)k1 so that

V = E1 W1 .

These spaces are T -invariant. To show that note that we know E1 is already. For W1 ,
suppose that w W1 . Then there exists u such that

w = (T 1 I)k1 u .

So
(T 1 I)k1 (T 1 I)u = (T 1 I)w .
Therefore (T 1 I)w W1 and thus W1 is (T 1 I)-invariant. If w W1 then

T w = (T 1 I)w + 1 Iw W1 ,

so W1 is T -invariant.
Step 5. We now argue by induction and do the base case. Let e(T ) be the number of
distinct eigenvalues of T . Note e(T ) 1.
We first assume e(T ) = 1. In this case we write 1 for the eigenvalue and see

V = E1 R(T 1 I)k1 = E1 W1 .

We claim that the second space is only the zero vector. Otherwise we restrict T to it to get
an operator TW1 . Then TW1 has an eigenvalue . So there is a nonzero vector w W1 such
that
T w = TW1 w = w ,
so w is an eigenvector for T . But T has only one eigenvalue so = 1 . This means that
w E1 and thus
w E1 W1 = {~0} .
This is a contradiction, so
V = E1 {~0} = E1 ,
and we are done.
Step 6. Suppose the theorem is true for any transformation U with e(U ) = k (k 1). Then
suppose that e(T ) = k + 1. Let 1 , . . . , k+1 be the distinct eigenvalues of T and decompose
as before:
V = E1 R(T 1 I)k1 = E1 W1 .
Now restrict T to W1 and call it TW1 .
Claim 6.2.2. TW1 has eigenvalues 2 , . . . , k+1 with the generalized eigenspaces from T : they
are E2 , . . . , Ek+1 .

64
Once we show this we will be done: we will have e(TW1 ) = k and so we can apply the
theorem:
W1 = E2 Ek+1 ,
so
V = E1 Ek+1 .

Proof. We first show that each of E2 , . . . , Ek+1 is in W1 . For this we want a lemma and a
definition:
Definition 6.2.3. If p(x) is a polynomial with coefficients in F and T : V V is linear,
where V is a vector space over F , we define the transformation

P (T ) = an T n + + a1 T + a0 I ,

where p(x) = an xn + + a1 x + a0 .
Lemma 6.2.4. Suppose that p(x) and q(x) are two polynomials with coefficients in F . If
they have no common root then there exist polynomials a(x) and b(x) such that

a(x)p(x) + b(x)q(x) = 1 .

Proof. Homework
Now choose v Ej for some j = 2, . . . , k + 1. By the decomposition we can write
v = u + w where u E1 and w W1 . We can now write

E1 = N (T 1 I)k1 and Ej = N (T j I)kj

and see
~0 = (T j I)kj v = (T j I)kj u + (T j I)kj w .

However E1 and W1 are T -invariant so they are (T j I)kj -invariant. This is a sum of
vectors equal to zero, where on is in E1 , the other is in W1 . Because these spaces direct
sum to V we know both vectors are zero. Therefore

u satisfies (T j I)kj u = ~0 = (T 1 I)k1 u .

In other words, p(T )u = q(T )u = ~0, where p(x) = (x j )kj and q(x) = (x 1 )k1 . Since
these polynomials have no root in common we can find a(x) and b(x) as in the lemma.
Finally,
u = (a(T )p(T ) + b(T )q(T ))u = ~0 .
This implies that v = w W1 and therefore all of E2 , . . . , Ek+1 are in W1 .
Because of the above statement, we now know that all of 2 , . . . , k+1 are eigenvalues of
TW1 . Furthermore if W1 is an eigenvalue of TW1 then it is an eigenvalue of T . It cannot
be 1 because then any eigenvector for TW1 with eigenvalue W1 would have to be in E1

65
but also in W1 so it would be zero, a contradiction. Therefore the eigenvalues of TW1 are
precisely 2 , . . . , k+1 .
Let Ẽ^{W1}_{λj} be the generalized eigenspace for TW1 corresponding to λj . We want

Ẽ^{W1}_{λj} = Ẽλj , j = 2, . . . , k + 1 .

If w ∈ Ẽ^{W1}_{λj} then there exists k such that (TW1 − λjI)^k w = ~0. But on W1 , (TW1 − λjI)^k
is the same as (T − λjI)^k , so
(T − λjI)^k w = (TW1 − λjI)^k w = ~0 ,
so that Ẽ^{W1}_{λj} ⊆ Ẽλj . To show the other inclusion, take w ∈ Ẽλj . Since Ẽλj ⊆ W1 , this implies
that w ∈ W1 . Now since there exists k such that (T − λjI)^k w = ~0, we find
(TW1 − λjI)^k w = (T − λjI)^k w = ~0 ,
and we are done. We find
V = Ẽλ1 ⊕ · · · ⊕ Ẽλ_{k+1} .

6.3 Nilpotent operators


Now we look at the operator T on the generalized eigenspaces. We need only restrict T to
each eigenspace to determine the action on all of V . So for this purpose we will assume that
T has only one generalized eigenspace: there exists λ ∈ F such that
V = Ẽλ .
In other words, for each v ∈ V there exists k such that (T − λI)^k v = ~0. Recall we can then
argue that there exists k such that

V = N(T − λI)^k ,

or, if U = T − λI, U^k = 0.
Definition 6.3.1. Let U : V → V be linear. We say that U is nilpotent if there exists k
such that
U^k = 0 .
The smallest k for which U^k = 0 is called the degree of U .
The point of this section will be to give a structure theorem for nilpotent operators. It
can be seen as a special case of Jordan form when all eigenvalues are zero. To prove this
structure theorem, we will look at the nullspaces of powers of U . Note that if k = deg(U),
then N(U^k) = V but N(U^{k−1}) ≠ V . We then get an increasing chain of subspaces
N0 ⊆ N1 ⊆ · · · ⊆ Nk , where Nj = N(U^j) .

66
If v Nj \ Nj1 then U (v) Nj1 \ Nj2 .

Proof. v has the property that U j v = ~0 but U j1 v 6= ~0. Therefore U j1 (U v) = ~0 but


U j2 (U v) 6= ~0.

Definition 6.3.2. If W1 W2 are subspaces of V then we say that v1 , . . . , vm W2 are


linearly independent mod W1 if

a1 v1 + + am vm W1 implies ai = 0 for all i .

Lemma 6.3.3. Suppose that dim W2 − dim W1 = l and v1, . . . , vm ∈ W2 \ W1 are linearly
independent mod W1 . Then

1. m ≤ l and

2. we can choose l − m vectors vm+1, . . . , vl in W2 \ W1 such that {v1, . . . , vl} are linearly
independent mod W1 .

Proof. It suffices to show that we can add just one vector. Let w1, . . . , wt be a basis for W1 .
Then define
X = Span({w1, . . . , wt, v1, . . . , vm}) .
This spanning set is linearly independent. Indeed, if

a1 w1 + · · · + at wt + b1 v1 + · · · + bm vm = ~0 ,

then b1 v1 + · · · + bm vm ∈ W1 , so all bi's are zero. Then

a1 w1 + · · · + at wt = ~0 ,

so all ai's are zero. Thus

t + m = dim X ≤ dim W2 = t + l ,

or m ≤ l.
For the second part, if m = l, we are done. Otherwise dim X < dim W2 , so there exists
vm+1 ∈ W2 \ X. To show linear independence mod W1 , suppose that

a1 v1 + · · · + am vm + am+1 vm+1 = w ∈ W1 .

If am+1 = 0 then we are done. Otherwise we can solve for vm+1 and see it is in X. This is a
contradiction.

Lemma 6.3.4. Suppose that for some m, v1, . . . , vp ∈ Nm \ Nm−1 are linearly independent
mod Nm−1 . Then U(v1), . . . , U(vp) are linearly independent mod Nm−2 .

Proof. Suppose that
a1 U(v1) + · · · + ap U(vp) = ~n ∈ Nm−2 .
Then
U(a1 v1 + · · · + ap vp) = ~n .
Now
U^{m−1}(a1 v1 + · · · + ap vp) = U^{m−2}(~n) = ~0 .
Therefore a1 v1 + · · · + ap vp ∈ Nm−1 . But the vi's are linearly independent mod Nm−1 so we
find that ai = 0 for all i.
Now we do the following.

1. Write dm = dim Nm − dim Nm−1 . Starting at the top, choose

βk = {v^k_1, . . . , v^k_{dk}} ,

a set which is linearly independent mod Nk−1 .

2. Move down one level: write v^{k−1}_i = U(v^k_i). Then {v^{k−1}_1, . . . , v^{k−1}_{dk}} is linearly indepen-
dent mod Nk−2 , so dk ≤ dk−1 . By the lemma we can extend this to

βk−1 = {v^{k−1}_1, . . . , v^{k−1}_{dk}, v^{k−1}_{dk+1}, . . . , v^{k−1}_{dk−1}} ,

a maximal linearly independent set mod Nk−2 in Nk−1 \ Nk−2 .

3. Repeat.

Note that dk + dk−1 + · · · + d1 = dim V . We claim that if βi is the set at level i then

β = β1 ∪ · · · ∪ βk

is a basis for V . It suffices to show linear independence. For this, we use the following fact.

If W1 ⊆ · · · ⊆ Wk = V is a nested sequence of subspaces with γi ⊆ Wi \ Wi−1 linearly
independent mod Wi−1 , then γ = ∪i γi is linearly independent. (Check.)

We have shown the following result.

Definition 6.3.5. A chain of length l for U is a set {v, U(v), U^2(v), . . . , U^{l−1}(v)} of non-zero
vectors such that U^l(v) = ~0.

Theorem 6.3.6. If U : V → V is linear and nilpotent (dim V < ∞) then there exists a
basis of V consisting entirely of chains for U .

68
Let U : V → V be nilpotent. If C = {U^{l−1}v, U^{l−2}v, . . . , U(v), v} is a chain then note that

C̄ = Span(C) is U-invariant .

Since V has a basis of chains, say C1, . . . , Cm, we can write

V = C̄1 ⊕ · · · ⊕ C̄m .

In our situation each Ci is a basis for C̄i, so our matrix for U is block diagonal. Each
block corresponds to a chain. Then U|_{C̄i} has the following matrix w.r.t. Ci (an l × l block,
where l is the length of the chain, with 1's on the superdiagonal and 0's elsewhere):

[ 0  1  0  · · ·  0 ]
[ 0  0  1  · · ·  0 ]
[ :  :  :         : ]
[ 0  0  0  · · ·  1 ]
[ 0  0  0  · · ·  0 ]

Theorem 6.3.7 (Uniqueness of nilpotent form). Let U : V → V be linear and nilpotent
with dim V < ∞. Write

li(β) = # of (maximal) chains of length i in β .

Then if β, β′ are bases of V consisting of chains for U , then

li(β) = li(β′) for all i .

Proof. Write Ki(β) for the set of elements v of β such that U^i(v) = ~0 but U^{i−1}(v) ≠ ~0 (for
i = 1 we only require U(v) = ~0). Let l̄i(β) be the number of (maximal) chains in β of length
at least i.
We first claim that #Ki(β) = l̄i(β) for all i. To see this note that for each chain C of
length at least i there is a unique element v ∈ C such that v ∈ Ki(β). Conversely, for each
v ∈ Ki(β) there is a unique chain of length at least i containing v.
Let ni be the dimension of N(U^i). We claim that ni equals the number mi(β) of v ∈ β
such that U^i(v) = ~0. Indeed, the set of such v's is linearly independent and in N(U^i), so
ni ≥ mi(β). However all other v's (dim V − mi(β) of them) are mapped by U^i to distinct elements
of β (since β is made up of chains), so dim R(U^i) ≥ dim V − mi(β), so ni ≤ mi(β).
Because N(U^i) contains N(U^{i−1}) for all i (here we take N(U^0) = {~0}), we have

l̄i(β) = #Ki(β) = ni − ni−1 .

Therefore
li(β) = l̄i(β) − l̄i+1(β) = (ni − ni−1) − (ni+1 − ni) .

The right side does not depend on β and in fact the same argument shows it is equal to
li(β′). This completes the proof.
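The formula at the end of the proof gives a practical way to recover the chain (block) structure of a nilpotent operator purely from the dimensions ni = dim N(U^i). A Python/sympy sketch (not from the notes; U is an arbitrary nilpotent example with one chain of length 3 and one of length 2):

from sympy import Matrix

U = Matrix([[0, 1, 0, 0, 0],
            [0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0]])

n_dim = U.rows
dims = [0]                                # n_0 = 0
k = 1
while dims[-1] < n_dim:                   # compute n_i = dim N(U^i) until it stabilizes
    dims.append(len((U ** k).nullspace()))
    k += 1
dims.append(n_dim)                        # n_{k+1} = n_k once N(U^k) = V

# l_i = (n_i - n_{i-1}) - (n_{i+1} - n_i): number of chains of length exactly i
chains = {i: (dims[i] - dims[i - 1]) - (dims[i + 1] - dims[i])
          for i in range(1, len(dims) - 1)}
print(chains)                             # {1: 0, 2: 1, 3: 1}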

69
6.4 Existence and uniqueness of Jordan form, Cayley-Hamilton
Definition 6.4.1. A Jordan block for λ of size l is the l × l matrix

         [ λ  1  0  · · ·  0 ]
         [ 0  λ  1  · · ·  0 ]
Jλ,l  =  [ :  :  :         : ]
         [ 0  0  0  · · ·  1 ]
         [ 0  0  0  · · ·  λ ]
Theorem 6.4.2 (Jordan canonical form). Let T : V → V be linear with dim V < ∞ and F
algebraically closed. Then there is a basis β of V such that [T]_β is block diagonal with Jordan
blocks.

Proof. First decompose V = Ẽλ1 ⊕ · · · ⊕ Ẽλk . On each Ẽλi , the operator T − λiI is nilpotent.
Each chain for (T − λiI)|_{Ẽλi} gives a block in the nilpotent decomposition. Then T =
(T − λiI) + λiI gives a Jordan block.
Draw a picture at this point (of sets of chains). We first decompose

V = Ẽλ1 ⊕ · · · ⊕ Ẽλk

and then
Ẽλi = C̄^i_1 ⊕ · · · ⊕ C̄^i_{ki} ,
where each C̄^i_j is the span of a chain of generalized eigenvectors: C^i_j = {v1, . . . , vp}, where

T(v1) = λi v1 , T(v2) = λi v2 + v1 , . . . , T(vp) = λi vp + vp−1 .

Theorem 6.4.3 (Cayley-Hamilton). Let T : V → V be linear with dim V < ∞ and F
algebraically closed. Then
c(T) = 0 .

Remark. In fact the theorem holds even if F is not algebraically closed, by passing to a field
extension.

Lemma 6.4.4. If U : V → V is linear and nilpotent with dim V = n < ∞ then

U^n = 0 .

Therefore if T : V → V is linear and v ∈ Ẽλ then

(T − λI)^{dim Ẽλ} v = ~0 .

Proof. Let β be a basis of chains for U . Then the length of the longest chain is at most n.

70
Lemma 6.4.5. Let T : V → V be linear with dim V < ∞ and let β be a basis such that [T]_β
is in Jordan form. For each eigenvalue λ, let Sλ be the set of basis vectors corresponding to
blocks for λ. Then
Span(Sλ) = Ẽλ for each λ .

Therefore if
c(x) = Π_{i=1}^{k} (λi − x)^{ni} ,

then ni = dim Ẽλi for each i.

Proof. Write λ1, . . . , λk for the distinct eigenvalues of T . Let

Wi = Span(Sλi) .

We may assume that the blocks corresponding to λ1 appear first, λ2 appear second, and so
on. Since [T]_β is in block form, this means V is a T-invariant direct sum

W1 ⊕ · · · ⊕ Wk .

However for each i, T − λiI restricted to Wi is in nilpotent form. Thus (T − λiI)^{dim Wi} v = ~0
for each v ∈ Sλi . This means

Wi ⊆ Ẽλi for all i, so dim Wi ≤ dim Ẽλi .

But V = Ẽλ1 ⊕ · · · ⊕ Ẽλk , so Σ_{i=1}^{k} dim Ẽλi = dim V . This gives that dim Wi = dim Ẽλi for
all i, or Wi = Ẽλi .
For the second claim, ni is the number of times that λi appears on the diagonal; that is,
the dimension of Span(Sλi).
Proof of Cayley-Hamilton. We first factor

c(x) = Π_{i=1}^{k} (λi − x)^{ni} ,

where ni is called the algebraic multiplicity of the eigenvalue λi . Let β be a basis such that
[T]_β is in Jordan form. If v ∈ β is in a block corresponding to λj then v ∈ Ẽλj and so
(T − λjI)^{dim Ẽλj} v = ~0 by the previous lemma. Now

c(T)v = ( Π_{i=1}^{k} (λiI − T)^{ni} ) v = ( Π_{i≠j} (λiI − T)^{ni} ) (λjI − T)^{nj} v = ~0

since nj = dim Ẽλj . As c(T) kills every element of the basis β, we conclude c(T) = 0.
Finally we have uniqueness of Jordan form.

71
Theorem 6.4.6. If β and β′ are bases of V for which [T] is in Jordan form, then the matrices
are the same up to permutation of the blocks.

Proof. First we note that the characteristic polynomial can be read off of the matrices and
is invariant. This gives that the diagonal entries are the same, and the number of basis vectors
corresponding to each eigenvalue is the same.
We see from the second lemma that if βi and β′i are the parts of the bases corresponding
to blocks involving λi then

Wi := Span(βi) = Ẽλi = Span(β′i) =: W′i .

Restricting T to Wi and to W′i then gives the blocks for λi. But then βi and β′i are just bases
of Ẽλi consisting of chains for T − λiI. The number of chains of each length is the same (by
Theorem 6.3.7), and this is the number of blocks of each size.
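A Jordan form can be computed symbolically with sympy. The following sketch (not from the notes; J and P are arbitrary choices) builds a matrix from a known Jordan matrix and checks that sympy recovers the block structure.

from sympy import Matrix

J = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])                    # one 2x2 block for 2, one 1x1 block for 3
P = Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])

A = P * J * P.inv()                        # similar to J, but not itself in Jordan form
P2, J2 = A.jordan_form()                   # A = P2 * J2 * P2^{-1}
print(J2)                                  # recovers J, up to the ordering of blocks
assert A == P2 * J2 * P2.inv()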

6.5 Exercises
Notation:

1. If F is a field then we write F[X] for the set of polynomials with coefficients in F.

2. If P, Q F[X] then we say that P divides Q and write P |Q if there exists S F[X]
such that Q = SP .

3. If P F[X] then we write the deg(P ) for the largest k such that the coefficient of xk
in P is nonzero. We define the degree of the zero polynomial to be .

4. If P F[X] then we say that P is monic if the coefficient of xn is 1, where n =deg(P ).

5. For a complex number z, we denote the complex conjugate by z, i.e. if z = a + ib, with
a, b R, then z = a ib.

6. If V is an F -vector space, recall the definition of V V : it is the F -vector space whose


elements are
V V = {(v1 , v2 ) : v1 , v2 V } .
Vector addition is performed as (v1 , v2 ) + (v3 , v4 ) = (v1 + v3 , v2 + v4 ) and for c F ,
c(v1 , v2 ) is defined as (cv1 , cv2 ).

Exercises

1. (a) Show that for P, Q F[X], one has deg(P Q) = deg(P ) + deg(Q).
(b) Show that for P, D F[X] such that D is nonzero, there exist Q, R F[X] such
that P = QD + R and deg(R) < deg(D).
Hint: Use induction on deg(P ).

72
(c) Show that, for any F ,

P () = 0 (X )|P.

(d) Let P F[X] be of the form p(X) = a(X 1 )n1 (X k )nk for some
a, 1 , . . . , k F and natural numbers n1 , . . . , nk . Show that Q F[X] divides
P if and only if Q(X) = b(X 1 )m1 (X k )mk for some b F and natural
numbers mi with mi ni (we allow mi = 0).
(e) Assume that F is algebraically closed. Show that every P F[X] is of the
form a(X 1 )n1 . . . (X k )nk for some a, 1 , . . . , k F and natural numbers
n1 , . . . , nk with n1 + + nk = deg(P ). In this case we call the i s the roots of
P.

2. Let F be a field and suppose that P, Q are nonzero polynomials in F[X]. Define the
subset S of F[X] as follows:

S = {AP + BQ : A, B F[X]} .

(a) Let D S be of minimal degree. Show that D divides both P and Q.


Hint: Use part 2 of the previous problem.
(b) Show that if S F [X] divides both P and Q then S divides D.
(c) Conclude that there exists a unique monic polynomial D F[X] satisfying the
following conditions
i. D divides P and Q.
ii. If T F[X] divides both P and Q then T divides D.
(Such a polynomial is called the greatest common divisor (GCD) of P and Q.)
(d) Show that if F is algebraically closed and P and Q are polynomials in F[X] with
no common root then there exist A and B in F[X] such that

AP + BQ = 1 .

3. Let F be any field, V be an F -vector space, f L(V, V ), and W V an f -invariant


subspace.

(a) Let p : V V /W denote the natural map defined in Homework 8, problem 5.


Show that there exists an element of L(V /W, V /W ), which we will denote by
f |V /W , such that p f = f |V /W p. It is customary to expresses this fact using
the following diagram:
f
V / V
p p
 f |V /W 
V /W / V /W

73
(b) Let 0 be a basis for W and be a basis for V which contains 0 . Show that the
image of 00 := \ 0 under p is a basis for V /W .
(c) Let 00 be a subset of V with the property that the restriction of p to 00 (which
is a map of sets 00 V /W ) is injective and its image is a basis for V /W . Show
that 00 is a linearly-independent set. Show moreover that if 0 is a basis for W ,
then := 0 t 00 is a basis for V .
(d) Let , 0 , and 00 be as above. Show that
 
A B
[f ] =
0 C
0 p( 00 )
with A = [f |W ] 0 and C = [f |V /W ]p( 00 ) .

4. The minimal polynomial. Let F be any field, V an F -vector space, and f L(V, V ).

(a) Consider the subset S F [X] defined by

S = {P F [X]|P (f ) = 0}.

Show that S contains a nonzero element.


(b) Let Mf S be a monic non-zero element of minimal degree. Show that Mf
divides any other element of S. Conclude that Mf as defined is unique. It is
called the minimal polynomial of f .
(c) Show that the roots of Mf are precisely the eigenvalues of f by completing the
following steps.
i. Suppose that r F is such that Mf (r) = 0. Show that

Mf (X) = Q(X)(X r)k

for some positive integer k and Q F[X] such that Q(r) 6= 0. Prove also
that Q(f ) 6= 0.
ii. Show that if r F satisfies Mf (r) = 0 then f rI is not invertible and thus
r is an eigenvalue of f .
iii. Conversely, let be an eigenvalue of f with eigenvector v. Show that if
P F[X] then
P (f )v = P ()v .
Conclude that is a root of Mf .
(d) Assume that F is algebraically closed. For each eigenvalue of f , express mult (Mf )
in terms of the Jordan form of f .
(e) Assume that F is algebraically closed. Show that f is diagonalizable if and only
if mult (Mf ) = 1 for all eigenvalues of f .

74
(f) Assume that F is algebraically closed. Under which circumstances the does Mf
equal the characteristic polynomial of f ?
5. If T : V V is linear and V is a finite-dimensional F-vector space with F algebraically
closed, we define the algebraic multiplicity of an eigenvalue to be a(), the dimension
of the generalized eigenspace E . The geometric multiplicity of is g(), the dimension
of the eigenspace E . Finally, the index of is i(), the length of the longest chain of
generalized eigenvectors in E .
Suppose that is an eigenvalue of T and g = g() and i = i() are given integers.

(a) What is the minimal possible value for a = a()?


(b) What is the maximal possible for a?
(c) Show that a can take any value between the answers for the above two questions.
(d) What is the smallest dimension n of V for which there exist two linear transfor-
mations T and U from V to V with all of the following properties? (i) There
exists F which is the only eigenvalue of either T or U , (ii) T and U are not
similar transformations and (iii) the geometric multiplicity of for T equals that
of U and similarly for the index.

6. Find the Jordan form for each of the following matrices over C. Write the minimal
polynomial and characteristic polynomial for each. To do this, first find the eigenvalues.
Then, for each eigenvalue , find the dimensions of the nullspaces of (A I)k for
pertinent values of k (where A is the matrix in question). Use this information to
deduce the block forms.

1 0 0 2 3 0 5 1 3
(a) 1 4 1 (b) 0 1 0 (c) 0 2 0
1 4 0 0 1 2 6 1 4

3 0 0 2 3 2 4 1 0
(d) 4 2 0 (e) 1 4 2 (f ) 1 2 0
5 0 2 0 1 3 1 1 3

7. (a) The characteristic polynomial of the matrix



7 1 2 2
1 4 1 1
A= 2 1 5 1

1 1 2 8
is c(x) = (x 6)4 . Find an invertible matrix S such that S 1 AS is in Jordan
form.
(b) Find all complex matrices in Jordan form with characteristic polynomial
c(x) = (i x)3 (2 x)2 .

75
8. Complexification of finite-dimensional real vector spaces. Let V be an R-
   vector space. Just as we can view R as a subset of C, we will be able to view V as a
   subset of a C-vector space. This will be useful because C is algebraically closed, so we
   can, for example, use the theory of Jordan form in V_C and bring it back (in a certain
   form) to V. We will give two constructions of the complexification; the first is more
   elementary and the second is the standard construction you will see in algebra.
   We put V_C = V ⊕ V.
   (a) Right now, V_C is only an R-vector space. We must define what it means to
       multiply vectors by complex scalars. For z ∈ C, z = a + ib with a, b ∈ R, and
       v = (v_r, v_i) ∈ V_C, we define the element zv ∈ V_C to be

           (a v_r − b v_i, a v_i + b v_r).

       Show that in this way, V_C becomes a C-vector space. This is the complexification
       of V. (A small numerical sketch of this scalar multiplication appears at the end
       of this exercise.)
   (b) We now show how to view V as a subset of V_C. Show that the map ι : V → V_C
       which maps v to (v, 0) is injective and R-linear. (Thus the set ι(V) is a copy of
       V sitting in V_C.)
   (c) Show that dim_C(V_C) = dim_R(V). Conclude that V_C is equal to span_C(ι(V)).
       Conclude further that if v_1, …, v_n is an R-basis for V, then ι(v_1), …, ι(v_n) is a
       C-basis for V_C.
   (d) Complex conjugation: We define the complex conjugation map c : V_C → V_C to
       be the map (v_r, v_i) ↦ (v_r, −v_i). Just as R (sitting inside of C) is invariant under
       complex conjugation, so will our copy of V (and its subspaces) be inside of V_C.
       i. Prove that c² = 1 and ι(V) = {v ∈ V_C | c(v) = v}.
       ii. Show that for all z ∈ C and v ∈ V_C, we have c(zv) = z̄ c(v). Maps with this
          property are called anti-linear.
       iii. In the next two parts, we classify those subspaces of V_C that are invariant
          under c. Let W be a subspace of V. Show that the C-subspace of V_C spanned
          by ι(W) equals
              {(w_1, w_2) ∈ V_C : w_1, w_2 ∈ W}
          and is invariant under c.
       iv. Show conversely that if a subspace W̃ of the C-vector space V_C is invariant
          under c, then there exists a subspace W ⊆ V such that W̃ = span_C(ι(W)).
       v. Last, notice that the previous two parts told us the following: The subspaces
          of V_C which are invariant under conjugation are precisely those which are
          equal to W_C for subspaces W of the R-vector space V. Show that moreover,
          in that situation, the restriction of the complex conjugation map c : V_C → V_C
          to W_C is equal to the complex conjugation map defined for W_C (the latter
          map is defined intrinsically for W_C, i.e. without viewing it as a subspace of
          V_C).
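   Referring back to part (a) of this exercise: the following is a minimal numerical sketch
   (not part of the original notes), assuming V = R^n represented by numpy arrays. It
   implements the complex scalar multiplication on V_C and spot-checks the axiom
   z(z'v) = (zz')v on random data.

    import numpy as np

    # An element of V_C = V (+) V is stored as a pair (vr, vi) of real vectors.
    def scal(z, v):
        """Multiply v = (vr, vi) in V_C by the complex scalar z = a + ib."""
        a, b = z.real, z.imag
        vr, vi = v
        return (a * vr - b * vi, a * vi + b * vr)

    rng = np.random.default_rng(0)
    v = (rng.standard_normal(3), rng.standard_normal(3))
    z, w = 2.0 - 1.0j, 0.5 + 3.0j

    lhs = scal(z, scal(w, v))
    rhs = scal(z * w, v)
    print(np.allclose(lhs[0], rhs[0]) and np.allclose(lhs[1], rhs[1]))  # True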

9. Let V be a finite dimensional R-vector space. For this exercise we use the notation of
   the previous one.

   (a) Let W be another finite dimensional R-vector space, and let f ∈ L(V, W). Show
       that
           f_C((v, w)) := (f(v), f(w))
       defines an element f_C ∈ L(V_C, W_C).
   (b) Show that for v ∈ V_C, we have f_C(c(v)) = c(f_C(v)). Show conversely that if
       f̃ ∈ L(V_C, W_C) has the property that f̃(c(v)) = c(f̃(v)) for all v ∈ V_C, then
       f̃ = f_C for some f ∈ L(V, W).

10. In this problem we will establish the real Jordan form. Let V be a vector space over
    R of dimension n < ∞. Let T : V → V be linear and T_C its complexification.

    (a) If λ ∈ C is an eigenvalue of T_C, and E_λ is the corresponding generalized eigenspace,
        show that
            c(E_λ) = E_λ̄.
    (b) Show that the non-real eigenvalues of T_C come in pairs. In other words, show that
        we can list the distinct eigenvalues of T_C as

            λ_1, …, λ_r, μ_1, μ_2, …, μ_{2m},

        where for each j = 1, …, r, λ_j = λ̄_j and for each i = 1, …, m, μ_{2i−1} = μ̄_{2i}.

    (c) Because C is algebraically closed, the proof of Jordan form shows that

            V_C = E_{λ_1} ⊕ ⋯ ⊕ E_{λ_r} ⊕ E_{μ_1} ⊕ ⋯ ⊕ E_{μ_{2m}}.

        Using the previous two points, show that for j = 1, …, r and i = 1, …, m, the
        subspaces of V_C
            E_{λ_j}   and   E_{μ_{2i−1}} ⊕ E_{μ_{2i}}
        are c-invariant.
    (d) Deduce from the results of problem 6, homework 10 that there exist subspaces
        X_1, …, X_r and Y_1, …, Y_m of V such that for each j = 1, …, r and i = 1, …, m,

            E_{λ_j} = Span_C(ι(X_j))   and   E_{μ_{2i−1}} ⊕ E_{μ_{2i}} = Span_C(ι(Y_i)).

        Show that
            V = X_1 ⊕ ⋯ ⊕ X_r ⊕ Y_1 ⊕ ⋯ ⊕ Y_m.
    (e) Prove that for each j = 1, …, r, the transformation T − λ_j I restricted to X_j is
        nilpotent and thus we can find a basis β_j for X_j consisting entirely of chains for
        T − λ_j I.

    (f) For each i = 1, …, m, let

            β_i = {(v_1^i, w_1^i), …, (v_{n_i}^i, w_{n_i}^i)}

        be a basis of E_{μ_{2i−1}} consisting of chains for T_C − μ_{2i−1} I. Prove that

            γ_i = {v_1^i, w_1^i, …, v_{n_i}^i, w_{n_i}^i}

        is a basis for Y_i. Describe the form of the matrix representation of T restricted
        to Y_i relative to γ_i.
    (g) Gathering the previous parts, state and prove a version of Jordan form for linear
        transformations over finite-dimensional real vector spaces. Your version should
        be of the form "If T : V → V is linear then there exists a basis β of V such that
        [T]_β has the form …"

7 Bilinear forms
7.1 Definitions
We now switch gears from Jordan form.
Definition 7.1.1. If V is a vector space over F, a function f : V × V → F is called a
bilinear form if for fixed v ∈ V, f(v, w) is linear in w and for fixed w ∈ V, f(v, w) is linear
in v.
Bilinear forms have matrix representations similar to those for linear transformations.
Choose a basis β = {v_1, …, v_n} for V and write

    v = a_1 v_1 + ⋯ + a_n v_n,    w = b_1 v_1 + ⋯ + b_n v_n.

Now
    f(v, w) = Σ_{i=1}^n a_i f(v_i, w) = Σ_{i=1}^n Σ_{j=1}^n a_i b_j f(v_i, v_j).

Define an n × n matrix A by A_{i,j} = f(v_i, v_j). Then this is

    Σ_{i=1}^n Σ_{j=1}^n a_i b_j A_{i,j} = Σ_{i=1}^n a_i (A b⃗)_i = (a⃗)^t A b⃗ = ([v]_β)^t [f]_β [w]_β,

where we write [f]_β for the matrix A.

We have proved:
Theorem 7.1.2. If dim V < ∞ and f : V × V → F is a bilinear form there exists a unique
matrix [f]_β such that for all v, w ∈ V,

    f(v, w) = ([v]_β)^t [f]_β [w]_β.

Furthermore the map f ↦ [f]_β is an isomorphism from Bil(V, F) to M_{n×n}(F).


Proof. We showed existence. To prove uniqueness, suppose that A is another such matrix.
Then
    A_{i,j} = e_i^t A e_j = ([v_i]_β)^t A [v_j]_β = f(v_i, v_j).

If β' is another basis, write [I]_{β'}^{β} for the change of coordinates matrix taking
β'-coordinates to β-coordinates. Then

    ([v]_{β'})^t ([I]_{β'}^{β})^t [f]_β [I]_{β'}^{β} [w]_{β'} = ([I]_{β'}^{β}[v]_{β'})^t [f]_β ([I]_{β'}^{β}[w]_{β'}) = ([v]_β)^t [f]_β [w]_β = f(v, w).

Therefore
    [f]_{β'} = ([I]_{β'}^{β})^t [f]_β [I]_{β'}^{β}.
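As a quick illustration of the theorem and of the change of basis rule just derived, here is a
short numerical check (not part of the original notes), assuming numpy and F = R; the
matrix A and basis S below are arbitrary choices.

    import numpy as np

    # Bilinear form on R^3 given in the standard basis by f(v, w) = v^t A w.
    A = np.array([[1.0, 2.0, 0.0],
                  [2.0, 0.0, 1.0],
                  [0.0, 1.0, 3.0]])
    f = lambda v, w: v @ A @ w

    # A new basis beta', written as the columns of S (the change of coordinates matrix).
    S = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    A_prime = S.T @ A @ S            # matrix of f in the basis beta'

    # Check: f(v, w) computed from beta'-coordinates agrees with the original.
    vp, wp = np.array([1.0, 2.0, -1.0]), np.array([0.0, 1.0, 1.0])  # beta'-coordinates
    v, w = S @ vp, S @ wp                                            # the actual vectors
    print(np.isclose(vp @ A_prime @ wp, f(v, w)))                    # True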
Note that for fixed v ∈ V the map L_f(v) : V → F given by L_f(v)(w) = f(v, w) is a
linear functional. So f gives a map L_f in L(V, V*).

Theorem 7.1.3. Denote by Bil(V, F) the set of bilinear forms on V. The map L :
Bil(V, F) → L(V, V*) given by
    L(f) = L_f
is an isomorphism.
Proof. If f, g ∈ Bil(V, F) and c ∈ F then
    (L(cf + g)(v))(w) = (L_{cf+g}(v))(w) = (cf + g)(v, w) = c f(v, w) + g(v, w)
                      = c L_f(v)(w) + L_g(v)(w) = (c L_f(v) + L_g(v))(w)
                      = (c L(f)(v) + L(g)(v))(w).
Thus L(cf + g)(v) = c L(f)(v) + L(g)(v). This is the same as (c L(f) + L(g))(v).
Therefore L(cf + g) = c L(f) + L(g). Thus L is linear.
Now Bil(V, F) has dimension n², because the map from the last theorem is an
isomorphism onto M_{n×n}(F). So does L(V, V*). Therefore we only need to show one-to-one or onto. To
show one-to-one, suppose that L(f) = 0. Then for all v, L_f(v) = 0. In other words, for all
v and w ∈ V, f(v, w) = 0. This means f = 0.
Remark. We can also define R_f(w) by R_f(w)(v) = f(v, w). Then the corresponding map
R : Bil(V, F) → L(V, V*) is an isomorphism.
You will prove the following fact in homework. If β is a basis for V and β* is the dual
basis, then for each f ∈ Bil(V, F),
    [R_f]_{β}^{β*} = [f]_β.
Then we have
    [L_f]_{β}^{β*} = ([f]_β)^t.
To see this, set g ∈ Bil(V, F) to be g(v, w) = f(w, v). Then for each v, w ∈ V,
    ([w]_β)^t [f]_β [v]_β = f(w, v) = g(v, w).
Taking transposes on both sides,
    ([v]_β)^t ([f]_β)^t [w]_β = g(v, w),
so [g]_β = ([f]_β)^t. But L_f = R_g, so
    ([f]_β)^t = [g]_β = [R_g]_{β}^{β*} = [L_f]_{β}^{β*}.

Definition 7.1.4. If f ∈ Bil(V, F) then we define the rank of f to be the rank of R_f.

By the above remark, the rank equals the rank of either of the matrices
    [f]_β   or   [L_f]_{β}^{β*}.

Therefore the rank of [f]_β does not depend on the choice of basis.

80
For f ∈ Bil(V, F), define

    N(f) = {v ∈ V : f(v, w) = 0 for all w ∈ V}.

This is just
    {v ∈ V : L_f(v) = 0} = N(L_f).
But L_f is a map from V to V*, so we have

    rank(f) = rank(L_f) = dim V − dim N(f).

7.2 Symmetric bilinear forms


Definition 7.2.1. A bilinear form f ∈ Bil(V, F) is called symmetric if f(v, w) = f(w, v)
for all v, w ∈ V. f is called skew-symmetric if f(v, v) = 0 for all v ∈ V.

The matrix for a symmetric bilinear form is symmetric and the matrix for a skew-
symmetric bilinear form is anti-symmetric.

Furthermore, each symmetric matrix A gives a symmetric bilinear form:

    f(v, w) = ([v]_β)^t A [w]_β.

Similarly for skew-symmetric matrices.

Lemma 7.2.2. If f ∈ Bil(V, F) is symmetric and char(F) ≠ 2 then f(v, w) = 0 for all
v, w ∈ V if and only if f(v, v) = 0 for all v ∈ V.
Proof. One direction is clear. For the other, suppose that f(v, v) = 0 for all v ∈ V. Then

    f(v + w, v + w) = f(v, v) + 2f(v, w) + f(w, w),

    f(v − w, v − w) = f(v, v) − 2f(v, w) + f(w, w).

Therefore
    0 = f(v + w, v + w) − f(v − w, v − w) = 4f(v, w).
Here 4f(v, w) means f(v, w) + f(v, w) + f(v, w) + f(v, w). If char(F) ≠ 2 then this implies
f(v, w) = 0.
Remark. If char(F) = 2 then the above lemma is false. Take F = Z_2 with V = F² and f
with matrix
    0 1
    1 0
Check that this has f(v, v) = 0 for all v but f(v, w) is clearly not 0 for all v, w ∈ V.
Definition 7.2.3. A basis β = {v_1, …, v_n} of V is orthogonal for f ∈ Bil(V, F) if f(v_i, v_j) =
0 whenever i ≠ j. It is orthonormal if it is orthogonal and f(v_i, v_i) = 1 for all i.

Theorem 7.2.4 (Diagonalization of symmetric bilinear forms). Let f ∈ Bil(V, F) with
char(F) ≠ 2 and dim V < ∞. If f is symmetric then V has an orthogonal basis for f.

Proof. We argue by induction on n = dim V. If n = 1 it is clear. Let us now suppose that
the statement holds for all k < n for some n > 1 and show that it holds for n. If f(v, v) = 0
for all v then f is identically zero and thus we are done. Otherwise we can find some v ≠ 0
such that f(v, v) ≠ 0.
Define
    W = {w ∈ V : f(v, w) = 0}.
Since this is the nullspace of L_f(v) and L_f(v) is a nonzero element of V*, it follows that W
is n − 1 dimensional. Because f restricted to W is still a symmetric bilinear form, we can
find a basis β' of W such that [f|_W]_{β'} is diagonal. Write β' = {v_1, …, v_{n−1}} and β = β' ∪ {v}.

Then we claim β is a basis for V: if

    a_1 v_1 + ⋯ + a_{n−1} v_{n−1} + a v = 0⃗,

then applying L_f(v) to both sides we find a f(v, v) = 0, so a = 0. Linear independence gives
that the other a_i's are zero.
Now it is clear that [f]_β is diagonal. For i ≠ j which are both < n this follows because β'
is a basis for which f is diagonal on W. Otherwise one of i, j is n and then the other vector
is in W and so f(v_i, v_j) = 0.

In the basis , f has a diagonal matrix. This says that for each symmetric matrix A
we can find an invertible matrix S such that

S t AS is diagonal .

In fact, if F is a field such that each number has a square root (like C) thenp
we can make
a new basis, replacing each element v of such that f (v, v) 6= 0 by v/ f (v, v) and
leaving all elements such that f (v, v) = 0 to find a basis such that the representation
of f is diagonal, with all 1 and 0 on the diagonal. The number of 1s equals the rank
of f .

Therefore if f has full rank and each element of F has a square root, there exists an
orthonormal basis of V for f .
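The construction above is effectively a symmetric version of Gaussian elimination, and it is
easy to run numerically. The following sketch is not part of the original notes; it assumes
numpy and F = R, the helper name congruence_diagonalize is hypothetical, and exact
zero tests are replaced by a tolerance.

    import numpy as np

    def congruence_diagonalize(A, tol=1e-12):
        """Return (S, D) with D = S.T @ A @ S diagonal, for a real symmetric A.
        This mirrors the inductive proof: repeatedly produce a vector v with
        f(v, v) != 0 and clear the off-diagonal entries against it."""
        A = A.astype(float).copy()
        n = A.shape[0]
        S = np.eye(n)
        for i in range(i := 0, n):
            if abs(A[i, i]) < tol:
                # try to bring a vector with f(u, u) != 0 into position i
                j = next((j for j in range(i + 1, n) if abs(A[j, j]) > tol), None)
                if j is not None:
                    P = np.eye(n); P[:, [i, j]] = P[:, [j, i]]   # swap v_i and v_j
                    A = P.T @ A @ P; S = S @ P
                else:
                    j = next((j for j in range(i + 1, n) if abs(A[i, j]) > tol), None)
                    if j is None:
                        continue                  # f vanishes on the remaining block
                    E = np.eye(n); E[j, i] = 1.0  # replace v_i by v_i + v_j
                    A = E.T @ A @ E; S = S @ E
            for j in range(i + 1, n):
                c = A[i, j] / A[i, i]
                E = np.eye(n); E[i, j] = -c       # replace v_j by v_j - c v_i
                A = E.T @ A @ E; S = S @ E
        return S, A

    A = np.array([[0.0, 1.0], [1.0, 0.0]])
    S, D = congruence_diagonalize(A)
    print(np.round(D, 10))                        # [[2, 0], [0, -0.5]]
    print(np.allclose(S.T @ A @ S, D))            # True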

Theorem 7.2.5 (Sylvester's law). Let f be a symmetric bilinear form on R^n. There exists
a basis β such that [f]_β is diagonal, with only 0's, 1's and −1's. Furthermore, the number
of each is independent of the choice of basis that puts f into this form.

Proof. Certainly such a basis exists: just modify the construction above by dividing by √|f(v_i, v_i)|
instead. So we show the other claim. Because the number of 0's is independent of the basis,
we need only show the statement for 1's.

For a basis β, let V^+(β) be the span of the v_i's such that f(v_i, v_i) > 0, and similarly for
V^−(β) and V^0(β). Clearly
    V = V^+(β) ⊕ V^−(β) ⊕ V^0(β).
Note that the number of 1's equals the dimension of V^+(β). Furthermore, for each nonzero
v ∈ V^+(β), writing v = a_1 v_1 + ⋯ + a_p v_p in terms of the basis vectors v_1, …, v_p of V^+(β), we have

    f(v, v) = Σ_{i=1}^p a_i² f(v_i, v_i) > 0.

A similar argument gives f(v, v) ≤ 0 for all v ∈ V^−(β) ⊕ V^0(β).
If β' is another basis we also have
    V = V^+(β') ⊕ V^−(β') ⊕ V^0(β').
Suppose that dim V^+(β') > dim V^+(β). Then
    dim V^+(β') + dim(V^−(β) ⊕ V^0(β)) > n,
so V^+(β') intersects V^−(β) ⊕ V^0(β) in at least one non-zero vector, say v. Since v ∈ V^+(β'),
f(v, v) > 0. However, since v ∈ V^−(β) ⊕ V^0(β), f(v, v) ≤ 0, a contradiction. Therefore
dim V^+(β) = dim V^+(β') and we are done.

The subspace V^0(β) is unique. We can define

    N_L(f) = N(L_f),   N_R(f) = N(R_f).

In the symmetric case, these are equal and we can define it to be N(f). We claim
that
    V^0(β) = N(f) for all β.
Indeed, if v ∈ V^0(β) then, because the basis is orthogonal for f, f(v, v_i) = 0 for each
basis vector v_i, and so v ∈ N(f). On the other hand,
    dim V^0(β) = dim V − (dim V^+(β) + dim V^−(β)) = dim V − rank [f]_β = dim N(f).
However the spaces V^+(β) and V^−(β) are not unique. Let us take f ∈ Bil(R², R) with
matrix (in the standard basis)

    [f]_β =  1  0
             0 −1

Then f((a, b), (c, d)) = ac − bd. Take v_1 = (2, √3) and v_2 = (√3, 2). Then we get

    f(v_1, v_1) = (2)(2) − (√3)(√3) = 1,

    f(v_1, v_2) = (2)(√3) − (√3)(2) = 0,

    f(v_2, v_2) = (√3)(√3) − (2)(2) = −1.
7.3 Sesquilinear and Hermitian forms
One important example of a symmetric bilinear form on R^n is
    f(v, w) = v_1 w_1 + ⋯ + v_n w_n.
In this case, √f(v, v) actually defines a good notion of length of vectors on R^n (we will
define precisely what this means later). In particular, we have f(v, v) ≥ 0 for all v. On C^n,
however, this is not true. If f is the bilinear form from above, then f((i, …, i), (i, …, i)) < 0.
But if we define the form
    f(v, w) = v_1 w̄_1 + ⋯ + v_n w̄_n,
then it is true. This is not bilinear, but it is sesquilinear.
Definition 7.3.1. Let V be a finite dimensional complex vector space. A function f :
V × V → C is called sesquilinear if
1. for each w ∈ V, the function v ↦ f(v, w) is linear and
2. for each v ∈ V, the function w ↦ f(v, w) is anti-linear.
To be anti-linear means that f(v, c w_1 + w_2) = c̄ f(v, w_1) + f(v, w_2). The sesquilinear form
f is called Hermitian if f(v, w) = \overline{f(w, v)}.
Note that if f is hermitian, then f(v, v) = \overline{f(v, v)}, so f(v, v) ∈ R.
1. If f(v, v) ≥ 0 (resp. > 0) for all v ≠ 0 then f is positive semi-definite (resp. positive definite).
2. If f(v, v) ≤ 0 (resp. < 0) for all v ≠ 0 then f is negative semi-definite (resp. negative
   definite).
If f is a sesquilinear form and β is a basis then there is a matrix for f: as before, if
v = a_1 v_1 + ⋯ + a_n v_n and w = b_1 v_1 + ⋯ + b_n v_n,

    f(v, w) = Σ_{i=1}^n a_i f(v_i, w) = Σ_{i=1}^n Σ_{j=1}^n a_i b̄_j f(v_i, v_j) = ([v]_β)^t [f]_β \overline{[w]_β}.

The map w ↦ f(·, w) is a conjugate isomorphism from V to V*.


We have the polarization formula
    4f(u, v) = f(u + v, u + v) − f(u − v, u − v) + i f(u + iv, u + iv) − i f(u − iv, u − iv).
From this we deduce that if f(v, v) = 0 for all v then f = 0.
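A quick numerical spot-check of the polarization formula (not part of the original notes),
assuming numpy and the standard Hermitian form on C^4:

    import numpy as np

    f = lambda u, v: np.vdot(v, u)   # f(u, v) = sum_k u_k * conj(v_k)

    rng = np.random.default_rng(1)
    u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

    rhs = (f(u + v, u + v) - f(u - v, u - v)
           + 1j * f(u + 1j * v, u + 1j * v) - 1j * f(u - 1j * v, u - 1j * v))
    print(np.isclose(4 * f(u, v), rhs))   # True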
Theorem 7.3.2 (Sylvester for Hermitian forms). Let f be a hermitian form on a finite-
dimensional complex vector space V. There exists a basis β of V such that [f]_β is diagonal
with only 0's, 1's and −1's. Furthermore the number of each does not depend on β so long
as the matrix is in diagonal form.
Proof. Same proof.

84
7.4 Exercises
Notation:
1. For all problems below, F is a field of characteristic different from 2, and V is a finite-
   dimensional F-vector space. We write Bil(V, F) for the vector space of bilinear forms
   on V, and Sym(V, F) for the subspace of symmetric bilinear forms.
2. If B is a bilinear form on V and W ⊆ V is any subspace, we define the restriction of
   B to W, written B|_W ∈ Bil(W, F), by B|_W(w_1, w_2) = B(w_1, w_2).
3. We call B ∈ Sym(V, F) non-degenerate if N(B) = 0.

Exercises

1. Let l ∈ V*. Define a symmetric bilinear form B on V by B(v, w) = l(v)l(w). Compute
   the nullspace of B.

2. Let B be a symmetric bilinear form on V. Suppose that W ⊆ V is a subspace with
   the property that V = W ⊕ N(B). Show that B|_W is non-degenerate.

3. Let B be a symmetric bilinear form on V and char(F) ≠ 2. Suppose that W ⊆ V is a
   subspace such that B|_W is non-degenerate. Show that then V = W ⊕ W^⊥.
Hint: Use induction on dim(W ).

4. Recall the isomorphism L : Bil(V, F) → L(V, V*) given by

       L(B)(v)(w) = B(v, w),   v, w ∈ V.

   If β is a basis of V, and β* is the dual basis of V*, show that

       [B]_β = ([L(B)]_{β}^{β*})^T.

5. Let n denote the dimension of V. Let d ∈ Alt_n(V), and B ∈ Sym(V, F) both be non-
   zero. We are going to show that there exists a constant c_{d,B} ∈ F with the property
   that for any vectors v_1, …, v_n, w_1, …, w_n ∈ V, the following identity holds:

       det( (B(v_i, w_j))_{i,j=1}^n ) = c_{d,B} · d(v_1, …, v_n) d(w_1, …, w_n),

   by completing the following steps:

   (a) Show that for fixed (v_1, …, v_n), there exists a constant c_{d,B}(v_1, …, v_n) ∈ F such
       that
       det( (B(v_i, w_j))_{i,j=1}^n ) = c_{d,B}(v_1, …, v_n) · d(w_1, …, w_n).

   (b) We now let (v_1, …, v_n) vary. Show that there exists a constant c_{d,B} ∈ F such
       that
       c_{d,B}(v_1, …, v_n) = c_{d,B} · d(v_1, …, v_n).
       Show further that c_{d,B} = 0 precisely when B is degenerate.

6. The orthogonal group. Let B be a non-degenerate symmetric bilinear form on V.
   Consider

       O(B) = {f ∈ L(V, V) | B(f(v), f(w)) = B(v, w) for all v, w ∈ V}.

   (a) Show that if f ∈ O(B), then det(f) is either 1 or −1.
       Hint: Use the previous exercise.
   (b) Show that composition of maps makes O(B) into a group.
   (c) Let V = R², B((x_1, x_2), (y_1, y_2)) = x_1 y_1 + x_2 y_2. Give a formula for the 2×2
       matrices that belong to O(B).
7. The vector product. Assume that V is 3-dimensional. Let B ∈ Sym(V, F) be
   non-degenerate, and d ∈ Alt_3(V) be non-zero.
   (a) Show that for any v, w ∈ V there exists a unique vector z ∈ V such that for all
       vectors x ∈ V the following identity holds: B(z, x) = d(v, w, x).
       Hint: Consider the element d(v, w, ·) ∈ V*.
   (b) We will denote the unique vector z from part (a) by v × w. Show that V × V → V,
       (v, w) ↦ v × w is bilinear and skew-symmetric.
   (c) For f ∈ O(B), show that f(v × w) = det(f) · (f(v) × f(w)).
   (d) Show that v × w is B-orthogonal to both v and w.
   (e) Show that v × w = 0 precisely when v and w are linearly dependent.
8. Let V be a finite dimensional R-vector space. Recall its complexification V_C, defined
   in the exercises of the last chapter. It is a C-vector space with dim_C V_C = dim_R V. As an
   R-vector space, it equals V ⊕ V. We have the injection ι : V → V_C, v ↦ (v, 0). We
   also have the complex conjugation map c(v, w) = (v, −w).
   (a) Let B be a bilinear form on V. Show that

           B_C((v, w), (x, y)) := B(v, x) − B(w, y) + iB(v, y) + iB(w, x)

       defines a bilinear form on V_C. Show that N(B_C) = N(B)_C. Show that B_C is
       symmetric if and only if B is.
   (b) Show that for v, w ∈ V_C, we have B_C(c(v), c(w)) = \overline{B_C(v, w)}. Show conversely
       that any bilinear form B̃ on V_C with the property B̃(c(v), c(w)) = \overline{B̃(v, w)} is equal
       to B_C for some bilinear form B on V.

86
   (c) Let B be a symmetric bilinear form on V. Show that

           B_H((v, w), (x, y)) := B(v, x) + B(w, y) − iB(v, y) + iB(w, x)

       defines a Hermitian form on V_C. Show that N(B_H) = N(B)_C.

   (d) Show that for v, w ∈ V_C, we have B_H(c(v), c(w)) = \overline{B_H(v, w)}. Show conversely
       that any Hermitian form B̃ on V_C with the property B̃(c(v), c(w)) = \overline{B̃(v, w)} is
       equal to B_H for some bilinear form B on V.

9. Prove that if V is a finite-dimensional F-vector space with char(F) ≠ 2 and f is a
   nonzero skew-symmetric bilinear form (that is, a bilinear form such that f(v, w) =
   −f(w, v) for all v, w ∈ V) then there is no basis β for V such that [f]_β is upper
   triangular.

10. For each of the following real matrices A, find an invertible matrix S such that S^t A S
    is diagonal.

         2  3  5           0 1 2 3
         3  7 11           1 0 1 2
         5 11 13     ,     2 1 0 1
                           3 2 1 0

    Also find a complex matrix T such that T^t A T is diagonal with only entries 0 and 1.

87
8 Inner product spaces
8.1 Definitions
We will be interested in positive definite hermitian forms.

Definition 8.1.1. Let V be a complex vector space. A hermitian form f is called an inner
product (or scalar product) if f is positive definite. In this case we call V a (complex) inner
product space.

An example is the standard dot product:

    ⟨u, v⟩ = u_1 v̄_1 + ⋯ + u_n v̄_n.

It is customary to write an inner product f(u, v) as ⟨u, v⟩. In addition, we write ‖u‖ =
√⟨u, u⟩. This is the norm induced by the inner product ⟨·, ·⟩. In fact, (V, d) is a metric
space, using
    d(u, v) = ‖u − v‖.

Properties of norm. Let (V, ⟨·, ·⟩) be a complex inner product space.

1. For all c ∈ C, ‖cu‖ = |c| ‖u‖.

2. ‖u‖ = 0 if and only if u = 0⃗.

3. (Cauchy-Schwarz inequality) For u, v ∈ V,

       |⟨u, v⟩| ≤ ‖u‖ ‖v‖.

   Proof. If u or v is 0⃗ then we are done. Otherwise, set
       w = u − (⟨u, v⟩ / ‖v‖²) v.
   Then
       0 ≤ ⟨w, w⟩ = ⟨w, u⟩ − \overline{(⟨u, v⟩ / ‖v‖²)} ⟨w, v⟩.
   However
       ⟨w, v⟩ = ⟨u, v⟩ − (⟨u, v⟩ / ‖v‖²) ⟨v, v⟩ = 0,
   so
       0 ≤ ⟨w, u⟩ = ⟨u, u⟩ − ⟨u, v⟩\overline{⟨u, v⟩} / ‖v‖²,
   and
       0 ≤ ⟨u, u⟩ − |⟨u, v⟩|² / ‖v‖²,   that is,   |⟨u, v⟩|² ≤ ‖u‖² ‖v‖².

   Moreover, equality holds throughout if and only if w = 0⃗, that is, if and only if u and v
   are linearly dependent.

4. (Triangle inequality) For u, v ∈ V,

       ‖u + v‖ ≤ ‖u‖ + ‖v‖.

   This is also written ‖u − v‖ ≤ ‖u − w‖ + ‖w − v‖.

   Proof.

       ‖u + v‖² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + ⟨u, v⟩ + \overline{⟨u, v⟩} + ⟨v, v⟩
                = ⟨u, u⟩ + 2 Re ⟨u, v⟩ + ⟨v, v⟩
                ≤ ‖u‖² + 2 |⟨u, v⟩| + ‖v‖²
                ≤ ‖u‖² + 2 ‖u‖ ‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)².

   Taking square roots gives the result.

8.2 Orthogonality
Definition 8.2.1. Given a complex inner product space (V, ⟨·, ·⟩) we say that vectors u, v ∈ V
are orthogonal if ⟨u, v⟩ = 0.

Theorem 8.2.2. Let v_1, …, v_k be nonzero and pairwise orthogonal in a complex inner prod-
uct space. Then they are linearly independent.

Proof. Suppose that

    a_1 v_1 + ⋯ + a_k v_k = 0⃗.

Then we take the inner product with v_i:

    0 = ⟨0⃗, v_i⟩ = Σ_{j=1}^k a_j ⟨v_j, v_i⟩ = a_i ‖v_i‖².

Therefore a_i = 0.
We begin with a method to transform a linearly independent set into an orthonormal set.

Theorem 8.2.3 (Gram-Schmidt). Let V be a complex inner product space and v_1, …, v_k ∈ V
be linearly independent. There exist u_1, …, u_k such that

1. {u_1, …, u_k} is orthonormal and

2. for all j = 1, …, k, Span({u_1, …, u_j}) = Span({v_1, …, v_j}).

Proof. We will prove this by induction. If k = 1, we must have v_1 ≠ 0⃗, so set u_1 = v_1/‖v_1‖. This
gives ‖u_1‖ = 1 so that {u_1} is orthonormal and certainly the second condition holds.
If k ≥ 2 then assume the statement holds for k − 1 vectors. Find vectors u_1, …, u_{k−1} as
in the statement. Now to define u_k we set

    w_k = v_k − [⟨v_k, u_1⟩u_1 + ⋯ + ⟨v_k, u_{k−1}⟩u_{k−1}].

We claim that w_k is orthogonal to all the u_j's and is not zero. To check the first, let 1 ≤ j ≤ k − 1
and compute

    ⟨w_k, u_j⟩ = ⟨v_k, u_j⟩ − [⟨v_k, u_1⟩⟨u_1, u_j⟩ + ⋯ + ⟨v_k, u_{k−1}⟩⟨u_{k−1}, u_j⟩]
               = ⟨v_k, u_j⟩ − ⟨v_k, u_j⟩⟨u_j, u_j⟩ = 0.

Second, if w_k were zero then we would have

    v_k ∈ Span({u_1, …, u_{k−1}}) = Span({v_1, …, v_{k−1}}),

a contradiction to linear independence. Therefore we set u_k = w_k/‖w_k‖ and we see that
{u_1, …, u_k} is orthonormal and therefore linearly independent.
Furthermore note that by induction,

    Span({u_1, …, u_k}) ⊆ Span({u_1, …, u_{k−1}, v_k}) ⊆ Span({v_1, …, v_k}).

Since the spaces on the left and right have the same dimension they are equal.
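The proof is constructive, and the construction is easy to run numerically. Below is a short
sketch (not part of the original notes), assuming numpy; note that numpy's vdot conjugates
its first argument, so vdot(u, v) equals ⟨v, u⟩ in the convention used here.

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalize a list of linearly independent vectors in C^n,
        following the proof: subtract projections onto the earlier u_j, then normalize."""
        us = []
        for v in vectors:
            w = v - sum(np.vdot(u, v) * u for u in us)   # w_k = v_k - sum <v_k, u_j> u_j
            us.append(w / np.linalg.norm(w))
        return us

    vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
    us = gram_schmidt(vs)
    G = np.array([[np.vdot(a, b) for b in us] for a in us])
    print(np.allclose(G, np.eye(3)))   # True: the u_i are orthonormal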

Corollary 8.2.4. If V is a finite-dimensional inner product space then V has an orthonormal


basis.

What do vectors look like represented in an orthonormal basis? Let β = {v_1, …, v_n} be
an orthonormal basis and let v ∈ V. Then
    v = a_1 v_1 + ⋯ + a_n v_n.
Taking the inner product with v_j on both sides gives a_j = ⟨v, v_j⟩, so

    v = ⟨v, v_1⟩v_1 + ⋯ + ⟨v, v_n⟩v_n.

Thus, in this orthonormal case, we can view the number ⟨v, v_i⟩ as the projection of v onto
v_i. We can then find the norm of v easily:

    ‖v‖² = ⟨v, v⟩ = ⟨v, Σ_{i=1}^n ⟨v, v_i⟩ v_i⟩ = Σ_{i=1}^n \overline{⟨v, v_i⟩} ⟨v, v_i⟩
         = Σ_{i=1}^n |⟨v, v_i⟩|².

This is known as Parseval's identity.

Definition 8.2.5. If V is an inner product space and W is a subspace of V we define the
orthogonal complement of W as

    W^⊥ = {v ∈ V : ⟨v, w⟩ = 0 for all w ∈ W}.

{0⃗}^⊥ = V and V^⊥ = {0⃗}.

If S ⊆ V then S^⊥ is always a subspace of V (even if S was not). Furthermore,

    S^⊥ = (Span S)^⊥   and   (S^⊥)^⊥ = Span S.

Theorem 8.2.6. Let V be a finite-dimensional inner product space with W a subspace. Then

    V = W ⊕ W^⊥.

Proof. Let {w_1, …, w_k} be a basis for W and extend it to a basis {w_1, …, w_n} for V. Then
perform Gram-Schmidt to get an orthonormal basis {v_1, …, v_n} such that

    Span({v_1, …, v_j}) = Span({w_1, …, w_j})   for all j = 1, …, n.

In particular, {v_1, …, v_k} is an orthonormal basis for W. We claim that {v_{k+1}, …, v_n} is a
basis for W^⊥. To see this, define W̃ to be the span of these vectors. Clearly W̃ ⊆ W^⊥. On
the other hand,

    W ∩ W^⊥ = {w ∈ W : ⟨w, w'⟩ = 0 for all w' ∈ W}
            ⊆ {w ∈ W : ⟨w, w⟩ = 0} = {0⃗}.

This means that dim W + dim W^⊥ ≤ n, or dim W^⊥ ≤ n − k. Since dim W̃ = n − k we see
they are equal.
This leads us to a definition.
Definition 8.2.7. Let V be a finite dimensional inner product space. If W is a subspace of
V we write P_W : V → V for the operator

    P_W(v) = w_1,

where v is written uniquely as w_1 + w_2 for w_1 ∈ W and w_2 ∈ W^⊥. P_W is called the orthogonal
projection onto W.

Properties of orthogonal projection.

1. P_W is linear.

2. P_W² = P_W.

3. P_{W^⊥} = I − P_W.

4. For all v_1, v_2 ∈ V,

       ⟨P_W(v_1), v_2⟩ = ⟨P_W(v_1), P_W(v_2)⟩ + ⟨P_W(v_1), P_{W^⊥}(v_2)⟩
                       = ⟨P_W(v_1), P_W(v_2)⟩ + ⟨P_{W^⊥}(v_1), P_W(v_2)⟩ = ⟨v_1, P_W(v_2)⟩,

   since both cross terms vanish.

Alternatively one may define an orthogonal projection as a linear map with properties 2 and
4. That is, if T : V → V is linear with T² = T and ⟨T(v), w⟩ = ⟨v, T(w)⟩ for all v, w ∈ V
then (check this)

    V = R(T) ⊕ N(T)   where   N(T) = (R(T))^⊥   and   T = P_{R(T)}.

Example. Orthogonal projection onto a 1-d subspace. What we saw in the proof of V =
W ⊕ W^⊥ is the following. If W is a subspace of a finite dimensional inner product space,
there exists an orthonormal basis of V of the form β = β_1 ∪ β_2, where β_1 is an orthonormal
basis of W and β_2 is an orthonormal basis of W^⊥.
Now let W be one-dimensional. Choose an orthonormal basis {v_1, …, v_n} of V so that v_1
spans W and the other vectors span W^⊥. For any v ∈ V,

    v = ⟨v, v_1⟩v_1 + ⟨v, v_2⟩v_2 + ⋯ + ⟨v, v_n⟩v_n,

which is a representation of v in terms of W and W^⊥. Thus P_W(v) = ⟨v, v_1⟩v_1.

For any nonzero vector v' we can define the orthogonal projection onto v' as P_W, where W =
Span{v'}. Then we choose w' = v'/‖v'‖ as our first vector in the orthonormal basis and

    P_{v'}(v) = P_W(v) = ⟨v, w'⟩w' = (⟨v, v'⟩ / ‖v'‖²) v'.
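A small numerical sketch of this last formula (not part of the original notes), assuming numpy:

    import numpy as np

    def project_onto(v, vp):
        """Orthogonal projection of v onto the line spanned by vp (real or complex)."""
        return (np.vdot(vp, v) / np.vdot(vp, vp)) * vp   # (<v, v'> / ||v'||^2) v'

    v  = np.array([3.0, 1.0, 2.0])
    vp = np.array([1.0, 1.0, 0.0])
    p  = project_onto(v, vp)
    print(p)                                      # [2. 2. 0.]
    print(np.isclose(np.vdot(vp, v - p), 0.0))    # the residual is orthogonal to v'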

Theorem 8.2.8. If V is a finite dimensional inner product space with W a subspace of V,
then for each v ∈ V, P_W(v) is the closest vector in W to v (using the distance coming from ‖·‖).
That is, for all w ∈ W,
    ‖v − P_W(v)‖ ≤ ‖v − w‖.

Proof. First we note that if w ∈ W and w' ∈ W^⊥ then the Pythagorean theorem holds:

    ‖w + w'‖² = ⟨w + w', w + w'⟩ = ⟨w, w⟩ + ⟨w', w'⟩ = ‖w‖² + ‖w'‖².

We now take v ∈ V and w ∈ W and write
v − w = P_W(v) + P_{W^⊥}(v) − w = (P_W(v) − w) + P_{W^⊥}(v).
Applying Pythagoras,

    ‖v − w‖² = ‖P_W(v) − w‖² + ‖P_{W^⊥}(v)‖² ≥ ‖P_{W^⊥}(v)‖² = ‖P_W(v) − v‖².
8.3 Adjoints
Theorem 8.3.1. Let V be a finite-dimensional inner product space. If T : V → V is linear
then there exists a unique linear transformation T* : V → V such that for all v, u ∈ V,

    ⟨T(v), u⟩ = ⟨v, T*(u)⟩.    (5)

We call T* the adjoint of T.


Proof. We will use the Riesz representation theorem.
Lemma 8.3.2 (Riesz). Let V be a finite-dimensional inner product space. For each f ∈ V*
there exists a unique z ∈ V such that

    f(v) = ⟨v, z⟩ for all v ∈ V.

Given this we will define T* as follows. For u ∈ V we define the linear functional

    f_{u,T} : V → C by f_{u,T}(v) = ⟨T(v), u⟩.

You can check this is indeed a linear functional. By Riesz, there exists a unique z ∈ V such
that
    f_{u,T}(v) = ⟨v, z⟩ for all v ∈ V.
We define this z to be T*(u). In other words, for a given u ∈ V, T*(u) is the unique vector
in V with the property

    ⟨T(v), u⟩ = f_{u,T}(v) = ⟨v, T*(u)⟩ for all v ∈ V.

Because of this identity, we see that there exists a function T* : V → V such that for all
u, v ∈ V, (5) holds. In other words, given T, we have a way of mapping a vector u ∈ V to
another vector which we call T*(u). We need to know that it is unique and linear.
Suppose that R : V → V is another function such that for all u, v ∈ V,

    ⟨T(v), u⟩ = ⟨v, R(u)⟩.

Then we see that

    ⟨v, T*(u) − R(u)⟩ = ⟨v, T*(u)⟩ − ⟨v, R(u)⟩ = ⟨T(v), u⟩ − ⟨v, R(u)⟩ = 0

for all u, v ∈ V. Choosing v = T*(u) − R(u) gives that

    ‖T*(u) − R(u)‖ = 0,

or T*(u) = R(u) for all u. This means T* = R.

To show linearity, let c ∈ C and u_1, u_2, v ∈ V.

    ⟨T(v), c u_1 + u_2⟩ = c̄ ⟨T(v), u_1⟩ + ⟨T(v), u_2⟩ = c̄ ⟨v, T*(u_1)⟩ + ⟨v, T*(u_2)⟩
                        = ⟨v, c T*(u_1) + T*(u_2)⟩.

This means that
    ⟨v, T*(c u_1 + u_2) − c T*(u_1) − T*(u_2)⟩ = 0
for all v ∈ V. Choosing v = T*(c u_1 + u_2) − c T*(u_1) − T*(u_2) gives that

    T*(c u_1 + u_2) = c T*(u_1) + T*(u_2),

so T* is linear.

Properties of adjoint.

1. T* : V → V is linear. To see this, if w_1, w_2 ∈ V and c ∈ C then for all v,

       ⟨T(v), c w_1 + w_2⟩ = c̄ ⟨T(v), w_1⟩ + ⟨T(v), w_2⟩
                           = c̄ ⟨v, T*(w_1)⟩ + ⟨v, T*(w_2)⟩
                           = ⟨v, c T*(w_1) + T*(w_2)⟩.

   By uniqueness, T*(c w_1 + w_2) = c T*(w_1) + T*(w_2).

2. If β is an orthonormal basis of V then [T*]_β = \overline{([T]_β)^t}.

   Proof. If β is an orthonormal basis, remembering that ⟨·, ·⟩ is a sesquilinear form, its
   matrix in the basis β is simply the identity. Therefore

       ⟨T(v), w⟩ = ([T(v)]_β)^t \overline{[w]_β} = ([T]_β [v]_β)^t \overline{[w]_β} = ([v]_β)^t ([T]_β)^t \overline{[w]_β}.

   On the other hand, for all v, w,

       ⟨v, T*(w)⟩ = ([v]_β)^t \overline{[T*(w)]_β} = ([v]_β)^t \overline{[T*]_β} \overline{[w]_β}.

   Therefore
       ([v]_β)^t ([T]_β)^t \overline{[w]_β} = ([v]_β)^t \overline{[T*]_β} \overline{[w]_β}.
   Choosing v = v_i and w = v_j tells us that all the entries of the matrices ([T]_β)^t and
   \overline{[T*]_β} are equal, which gives the claim. (A numerical check of this description
   appears after this list of properties.)

3. (T + S)* = T* + S*.

   Proof. If v, w ∈ V,

       ⟨(T + S)(v), w⟩ = ⟨T(v), w⟩ + ⟨S(v), w⟩ = ⟨v, T*(w)⟩ + ⟨v, S*(w)⟩.

   This equals ⟨v, (T* + S*)(w)⟩.

4. (cT)* = c̄ T*. This is similar.

5. (TS)* = S* T*.

   Proof. For all v, w ∈ V,

       ⟨(TS)(v), w⟩ = ⟨T(S(v)), w⟩ = ⟨S(v), T*(w)⟩ = ⟨v, S*(T*(w))⟩.

   This is ⟨v, (S* T*)(w)⟩.

6. (T*)* = T.

   Proof. If v, w ∈ V,

       ⟨T*(v), w⟩ = \overline{⟨w, T*(v)⟩} = \overline{⟨T(w), v⟩} = ⟨v, T(w)⟩.
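As promised above, the conjugate-transpose description in property 2 is easy to check
numerically. The following sketch is not part of the original notes; it assumes numpy, the
standard inner product on C^3, and the standard basis as the orthonormal basis β.

    import numpy as np

    rng = np.random.default_rng(2)
    T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    T_star = T.conj().T                 # matrix of the adjoint in an orthonormal basis

    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    inner = lambda x, y: np.vdot(y, x)  # <x, y> = sum_k x_k conj(y_k)
    print(np.isclose(inner(T @ v, w), inner(v, T_star @ w)))   # True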

8.4 Spectral theory of self-adjoint operators


Definition 8.4.1. If V is an inner product space and T : V → V is linear we say that T is
1. self-adjoint if T* = T;

2. skew-adjoint if T* = −T;

3. unitary if T is invertible and T^{-1} = T*;

4. normal if T T* = T* T.
Note that all of the above operators are normal. Also, orthogonal projections are self-adjoint.
Keep in mind the analogy with complex numbers: self-adjoint operators correspond to real
numbers, skew-adjoint operators to purely imaginary numbers, and unitary operators to
numbers on the unit circle.
Theorem 8.4.2. Let V be an inner product space and T : V → V linear, with λ an eigenvalue
of T.
1. If T is self-adjoint then λ is real.

2. If T is skew-adjoint then λ is purely imaginary.

3. If T is unitary then |λ| = 1 and |det T| = 1.

Proof. Let T : V → V be linear and λ an eigenvalue with eigenvector v.
1. Suppose T* = T. Then

       λ ‖v‖² = ⟨T(v), v⟩ = ⟨v, T(v)⟩ = λ̄ ‖v‖².

   But v ≠ 0 so λ = λ̄.
2. Suppose that T* = −T. Define S = iT. Then S* = −iT* = −i(−T) = iT = S, so S is
   self-adjoint. Now iλ is an eigenvalue of S:

       S(v) = (iT)(v) = iλ v.

   This means iλ is real, so λ is purely imaginary.

3. Suppose T* = T^{-1}. Then

       |λ|² ‖v‖² = ⟨T(v), T(v)⟩ = ⟨v, T^{-1} T(v)⟩ = ‖v‖².

   This means |λ| = 1. Furthermore, det T is the product of the eigenvalues, so |det T| = 1.

What do these operators look like? If β is an orthonormal basis for V then:

1. If T* = T then [T]_β = \overline{([T]_β)^t}.

2. If T* = −T then [T]_β = −\overline{([T]_β)^t}.

Lemma 8.4.3. Let V be a finite-dimensional inner product space. If T : V → V is linear,
the following are equivalent.

1. T is unitary.

2. For all v ∈ V, ‖T(v)‖ = ‖v‖.

3. For all v, w ∈ V, ⟨T(v), T(w)⟩ = ⟨v, w⟩.

Proof. If T is unitary then ⟨T(v), T(v)⟩ = ⟨v, T*T(v)⟩ = ⟨v, T^{-1}T(v)⟩ = ⟨v, v⟩. This shows 1
implies 2. If T preserves the norm then it also preserves the inner product, by the polarization
identity. This proves 2 implies 3. To see that 3 implies 1, we take v, w ∈ V and see

    ⟨v, w⟩ = ⟨T(v), T(w)⟩ = ⟨v, T*T(w)⟩.

This implies that ⟨v, w − T*T(w)⟩ = 0 for all v. Taking v = w − T*T(w) gives that
T*T(w) = w. Thus T must be invertible and T* = T^{-1}.

Furthermore, T is unitary if and only if T maps orthonormal bases to orthonormal
bases. In particular, [T]_β has orthonormal columns whenever β is orthonormal.

For β an orthonormal basis, the unitary operators are exactly those whose matrices
relative to β have orthonormal columns.

We begin with a definition.

Definition 8.4.4. If V is a finite-dimensional inner product space and T : V → V is linear,
we say that T is unitarily diagonalizable if there exists an orthonormal basis β of V such
that [T]_β is diagonal.
Note that T is unitarily diagonalizable if and only if there exists a unitary operator U
such that
    U^{-1} T U is diagonal.
Theorem 8.4.5 (Spectral theorem). Let V be a finite-dimensional inner product space. If
T : V → V is self-adjoint then T is unitarily diagonalizable.
Proof. We will use induction on dim V = n. If n = 1 just choose a vector of norm 1.
Otherwise suppose the statement is true for all dimensions less than k and we will show it
for dimension k ≥ 2. Since T has an eigenvalue λ, it has an eigenvector v_1. Choose v_1 with
norm 1.
Let U = T − λI. We claim that
    V = N(U) ⊕ R(U).
To show this we need only prove that R(U) = N(U)^⊥. This will follow from a lemma:
Lemma 8.4.6. Let V be a finite-dimensional inner product space and U : V → V linear.
Then
    R(U) = N(U*)^⊥.
Proof. If w ∈ R(U), let z ∈ N(U*). There exists v ∈ V such that U(v) = w. Therefore
    ⟨w, z⟩ = ⟨U(v), z⟩ = ⟨v, U*(z)⟩ = 0.
Therefore R(U) ⊆ N(U*)^⊥. For the other containment, note that dim R(U) = dim R(U*)
(since the matrix of U* in an orthonormal basis is just the conjugate transpose of that of
U). Therefore
    dim R(U) = dim R(U*) = dim V − dim N(U*) = dim N(U*)^⊥.

Now we apply the lemma. Note that since T is self-adjoint,

    U* = (T − λI)* = T* − λ̄ I = T − λI = U,

since λ ∈ R. Thus using the lemma with U,
    V = N(U) ⊕ N(U)^⊥ = N(U) ⊕ R(U).
Note that these are T-invariant subspaces and dim R(U) < k since λ is an eigenvalue of T. Thus
by induction there is an orthonormal basis β' of R(U) such that the matrix of T restricted to
this space is diagonal. Taking
    β = β' ∪ β'',
where β'' is an orthonormal basis of N(U) (for instance, one containing v_1), gives an
orthonormal basis such that [T]_β is block diagonal. But T acts as λI on N(U), so the block
for β'' is diagonal and hence [T]_β is diagonal.
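Numerically, the content of the spectral theorem is what numpy.linalg.eigh computes for a
Hermitian matrix. The following check (not part of the original notes) is only an illustration,
not the proof.

    import numpy as np

    # A random self-adjoint (Hermitian) matrix.
    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    T = (B + B.conj().T) / 2

    evals, U = np.linalg.eigh(T)        # columns of U: an orthonormal eigenbasis
    print(np.allclose(U.conj().T @ U, np.eye(4)))              # U is unitary
    print(np.allclose(U.conj().T @ T @ U, np.diag(evals)))     # U* T U is diagonal
    print(np.allclose(np.imag(evals), 0.0))                    # eigenvalues are real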

Note that if T is skew-adjoint, then iT is self-adjoint, so we can find an orthonormal basis
β such that [iT]_β is diagonal. This implies that T itself is diagonalized by β: its
matrix is just −i[iT]_β.

8.5 Normal and commuting operators


Lemma 8.5.1. Let U, T : V → V be linear and F be algebraically closed. Write E_T^1, …, E_T^k
for the generalized eigenspaces of T. If T and U commute then

    V = E_T^1 ⊕ ⋯ ⊕ E_T^k

is both a T-invariant direct sum and a U-invariant direct sum.

Proof. We need only show that the generalized eigenspaces of T are U-invariant. If v ∈
N((T − λ_i I)^m) then
    (T − λ_i I)^m (U(v)) = U((T − λ_i I)^m v) = 0⃗.

Theorem 8.5.2. Let U, T : V → V be linear and F algebraically closed. Suppose that T
and U commute. Then

1. If T and U are diagonalizable then there exists a basis β such that both [T]_β and [U]_β
   are diagonal.

2. If V is an inner product space and T and U are self-adjoint then we can choose β to
   be orthonormal.

Proof. Suppose that T and U are diagonalizable. Then the decomposition into generalized
eigenspaces for T is simply
    V = E_T^1 ⊕ ⋯ ⊕ E_T^k,
where the E_T^j are the (ordinary) eigenspaces. For each j, choose a Jordan basis β_j for U on
E_T^j. Set β = ∪_{j=1}^k β_j. These are all eigenvectors for T so [T]_β is diagonal. Further, [U]_β is
in Jordan form. But since U is diagonalizable, its Jordan form is diagonal. By uniqueness,
[U]_β is diagonal.
If T and U are self-adjoint, the decomposition

    V = E_T^1 ⊕ ⋯ ⊕ E_T^k

is orthogonal. For each j, choose an orthonormal basis β_j of E_T^j consisting of eigenvectors for
U (this is possible since U is self-adjoint on E_T^j). Now β = ∪_{j=1}^k β_j is an orthonormal basis
of eigenvectors for both T and U.

Theorem 8.5.3. Let V be a finite-dimensional inner product space. If T : V → V is linear
then T is normal if and only if T is unitarily diagonalizable.

Definition 8.5.4. If V is an inner product space and T : V → V is linear, we write
    T = (1/2)(T + T*) + (1/2)(T − T*) = T_1 + T_2
and call these operators the self-adjoint part and the skew-adjoint part of T, respectively.
Of course each part of a linear transformation can be unitarily diagonalized on its own.
We now need to diagonalize them simultaneously.
Proof (of Theorem 8.5.3). If T is unitarily diagonalizable then, taking a unitary U such that
T = U^{-1} D U for a diagonal D, we get

    T T* = (U^{-1} D U)(U^{-1} D U)* = U^{-1} D U U^{-1} D* U = U^{-1} D D* U = U^{-1} D* D U = T* T,

so T is normal.
Suppose that T is normal. Then T_1 and T_2 commute. Note that T_1 is self-adjoint and
iT_2 is also. They commute, so we can find an orthonormal basis β such that [T_1]_β and [iT_2]_β
are diagonal. Now
    [T]_β = [T_1]_β − i[iT_2]_β
is diagonal.
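A small numerical illustration of the two directions (not part of the original notes), assuming
numpy; the unitary matrix T below is an arbitrary normal example built from a random
Hermitian matrix.

    import numpy as np

    rng = np.random.default_rng(4)
    # A unitary matrix (hence normal), built from the eigenvectors of a Hermitian matrix.
    H = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    _, Q = np.linalg.eigh(H + H.conj().T)
    T = Q @ np.diag(np.exp(1j * rng.standard_normal(3))) @ Q.conj().T

    T1 = (T + T.conj().T) / 2           # self-adjoint part
    T2 = (T - T.conj().T) / 2           # skew-adjoint part
    print(np.allclose(T @ T.conj().T, T.conj().T @ T))   # T is normal
    print(np.allclose(T1 @ T2, T2 @ T1))                 # its two parts commute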

8.6 Exercises
Notation
1. If V is a vector space over R and ⟨·, ·⟩ : V × V → R is a positive-definite symmetric
   bilinear form, then we call ⟨·, ·⟩ a (real) inner product. The pair (V, ⟨·, ·⟩) is called a
   (real) inner product space.
2. If (V, ⟨·, ·⟩) is a real inner product space and S is a subset of V we say that S is
   orthogonal if ⟨v, w⟩ = 0 whenever v, w ∈ S are distinct. We say S is orthonormal if S
   is orthogonal and ⟨v, v⟩ = 1 for all v ∈ S.
3. If f is a symmetric bilinear form on a vector space V the orthogonal group is the set
       O(f) = {T : V → V | f(T(u), T(v)) = f(u, v) for all u, v ∈ V}.

Exercises
1. Let V be a complex inner product space. Let T ∈ L(V, V) be such that T* = −T.
   We call such T skew-self-adjoint. Show that the eigenvalues of T are purely imaginary.
   Show further that V is the orthogonal direct sum of the eigenspaces of T. In other
   words, V is a direct sum of the eigenspaces and ⟨v, w⟩ = 0 if v and w are in distinct
   eigenspaces.
   Hint: Construct from T a suitable self-adjoint operator and apply the known results
   from the lecture to that operator.

2. Let V be a complex inner product space, and T ∈ L(V, V).
   (a) Show that T is unitary if and only if it maps orthonormal bases to orthonormal
       bases.
   (b) Let β be an orthonormal basis of V. Show that T is unitary if and only if the
       columns of the matrix [T]_β form a set of orthonormal vectors in C^n with
       respect to the standard hermitian form (standard dot product).
3. Let (V, ⟨·, ·⟩) be a complex inner product space, and Φ be a Hermitian form on V (in
   addition to ⟨·, ·⟩). Show that there exists an orthonormal basis β of V such that [Φ]_β is
   diagonal, by completing the following steps:
   (a) Show that for each w ∈ V, there exists a unique vector, which we call Aw, in V
       with the property that for all v ∈ V,

           Φ(v, w) = ⟨v, Aw⟩.

   (b) Show that the map A : V → V which sends a vector w ∈ V to the vector Aw just
       defined, is linear and self-adjoint.
   (c) Use the spectral theorem for self-adjoint operators to complete the problem.
4. Let (V, ⟨·, ·⟩) be a real inner product space.
   (a) Define ‖·‖ : V → R by
           ‖v‖ = √⟨v, v⟩.
       Show that for all v, w ∈ V,

           |⟨v, w⟩| ≤ ‖v‖ ‖w‖.

   (b) Show that ‖·‖ is a norm on V.

   (c) Show that there exists an orthonormal basis of V.
5. Let (V, ⟨·, ·⟩) be a real inner product space and T : V → V be linear.
   (a) Prove that for each f ∈ V* there exists a unique z ∈ V such that for all v ∈ V,

           f(v) = ⟨v, z⟩.

   (b) For each u ∈ V define f_{u,T} : V → R by

           f_{u,T}(v) = ⟨T(v), u⟩.

       Prove that f_{u,T} ∈ V*. Define T^t(u) to be the unique vector in V such that for all
       v ∈ V,
           ⟨T(v), u⟩ = ⟨v, T^t(u)⟩
       and show that T^t is linear.

   (c) Show that if β is an orthonormal basis for V then
           [T^t]_β = ([T]_β)^t.

6. Let (V, ⟨·, ·⟩) be a real inner product space and define the complexification of ⟨·, ·⟩ as
   in homework 11 by
       ⟨(v, w), (x, y)⟩_C = ⟨v, x⟩ + ⟨w, y⟩ − i⟨v, y⟩ + i⟨w, x⟩.
   (a) Show that ⟨·, ·⟩_C is an inner product on V_C.
   (b) Let T : V → V be linear.
       i. Prove that (T_C)* = (T^t)_C.
       ii. If T^t = T then we say that T is symmetric. Show in this case that T_C is
           Hermitian.
       iii. If T^t = −T then we say T is anti-symmetric. Show in this case that T_C is
           skew-adjoint.
       iv. If T is invertible and T^t = T^{-1} then we say that T is orthogonal. Show in
           this case that T_C is unitary. Show that this is equivalent to
               T ∈ O(⟨·, ·⟩),
           where O(⟨·, ·⟩) is the orthogonal group for ⟨·, ·⟩.
7. Let (V, ⟨·, ·⟩) be a real inner product space and T : V → V be linear.

   (a) Suppose that T T^t = T^t T. Show then that T_C is normal. In this case, we can find
       a basis β of V_C such that β is orthonormal (with respect to ⟨·, ·⟩_C) and [T_C]_β is
       diagonal. Define the subspaces of V
           X_1, …, X_r, Y_1, …, Y_{2m}
       as in problem 1, question 3. Show that these are mutually orthogonal; that is, if
       v, w are in different subspaces then ⟨v, w⟩ = 0.
   (b) If T is symmetric then show that there exists an orthonormal basis β of V such
       that [T]_β is diagonal.
   (c) If T is skew-symmetric, what is the form of the matrix of T in real Jordan form?
   (d) A ∈ M_{2×2}(R) is called a rotation matrix if there exists θ ∈ [0, 2π) such that

           A = ( cos θ   −sin θ )
               ( sin θ    cos θ ).

       If T is orthogonal, show that there exists a basis β of V such that [T]_β is block
       diagonal, and the blocks are either 2×2 rotation matrices or 1×1 matrices
       consisting of 1 or −1.
       Hint: Use the real Jordan form.
