Anda di halaman 1dari 60

MA251 Algebra I – Advanced Linear Algebra

Daan Krammer
November 27, 2014

Contents

1 Review of Some Linear Algebra 2


1.1 The matrix of a linear map with respect to two bases . . . . . . . . . . 2
1.2 The column vector of a vector with respect to a basis . . . . . . . . . . 2
1.3 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 The Jordan Canonical Form 3


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 The Cayley-Hamilton theorem . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 The minimal polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Jordan chains and Jordan blocks . . . . . . . . . . . . . . . . . . . . . 7
2.5 Jordan bases and the Jordan canonical form . . . . . . . . . . . . . . . 8
2.6 The JCF when n = 2 and 3 . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.7 The JCF for general n . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.8 Proof of theorem 26 (non-examinable) . . . . . . . . . . . . . . . . . . 14
2.9 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.10 Powers of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.11 Applications to difference equations . . . . . . . . . . . . . . . . . . . 18
2.12 Functions of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.13 Applications to differential equations . . . . . . . . . . . . . . . . . . . 20

3 Bilinear Maps and Quadratic Forms 21


3.1 Bilinear maps: definitions . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Bilinear maps: change of basis . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Quadratic forms: introduction . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Quadratic forms: definitions . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5 Change of variable under the general linear group . . . . . . . . . . . . 26
3.6 Change of variable under the orthogonal group . . . . . . . . . . . . . 28
3.7 Applications of quadratic forms to geometry . . . . . . . . . . . . . . . 33
3.7.1 Reduction of the general second degree equation . . . . . . . . 33
3.7.2 The case n = 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.7.3 The case n = 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.8 Unitary, hermitian and normal matrices . . . . . . . . . . . . . . . . . 38
3.9 Applications to quantum mechanics (non-examinable) . . . . . . . . . 40

4 Finitely Generated Abelian Groups 42


4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Cosets and quotient groups . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 Homomorphisms and the first isomorphism theorem . . . . . . . . . . 48
4.5 Free abelian groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6 Unimodular elementary row and column operations and the Smith
normal form for integral matrices . . . . . . . . . . . . . . . . . . . . . 51
4.7 Subgroups of free abelian groups . . . . . . . . . . . . . . . . . . . . . 53
4.8 General finitely generated abelian groups . . . . . . . . . . . . . . . . 54
4.9 Finite abelian groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.10 Tensor products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.11 Hilbert’s Third problem . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2 MA251 Algebra I November 27, 2014

4.12 Possible topics for the second year essays . . . . . . . . . . . . . . . . . 60

1 Review of Some Linear Algebra


Students will need to be familiar with the whole of the contents of the first year Linear
Algebra module (MA106). In this section, we shall review the material on matrices
of linear maps and change of basis. Other material will be reviewed as it arises.
Throughout these lecture notes, K is a field and all vector spaces and linear map
are over K. If K is not allowed to be an arbitrary field we will say so.

1.1 The matrix of a linear map with respect to two bases


Let T : V ! W be a linear map. Let E = (e1 , . . . , en ) be a basis of V and F =
( f 1 , . . . , f m ) of W. It is known that there exist unique scalars ai j for 1  j  n and
1  i  m such that
m
T (e j ) = Â ai j f i
i =1

for all j. The matrix


0 1
a11 ··· a1n
B . .. C
( ai j )i j = @ .. . A
am1 ··· amn

is written [ F, T, E] or [ FTE] and called the matrix of T with respect to E and F.


The set (or vector space) of linear maps V ! W is written Hom(V, W ). For fixed
V, W, E, F as above, the association T 7! [ FTE] defines a bijective map Hom(V, W ) !
K m,n .
Theorem 1. Let S : U ! V and T : V ! W be linear maps. Let (respectively) E, F, G
be bases of (respectively) U, V, W. Then

[ G, TS, E] = [ G, T, F ][ F, S, E]. ⇤

1.2 The column vector of a vector with respect to a basis


Let E = (e1 , . . . , en ) be a basis of a vector space V and let x 2 V. It is known that
there are unique scalars ai for 1  i  n such that
n
v= Â ai ei .
i =1

The column vector


0 1
a1
B.C
@ .. A (2)
an

is written [ E, v] or [ Ev] and is called the column vector of v with respect to E. The ai
are known as the coordinates of v with respect to E.
For typographical reasons we often denote the column vector (2) by ( a1 , . . . , an )T
(T denotes transpose).
For fixed V, E as above, the association v 7! [ E, v] defines a bijective map V !
n,1
K .
Theorem 3. Let T : V ! W be a linear map and let v 2 V. Let (respectively) E, F be
bases of (respectively) V, W. Then

[ F, T (v)] = [ F, T, E][ E, v]. ⇤


November 27, 2014 MA251 Algebra I 3

If E = (e1 , . . . , en ) is the standard basis of K n and v 2 K n then [ E, v] = v. If


F = ( f 1 , . . . , f n ) is another basis of K n then [ E, 1, F ] ei = f i for all i because

[ E, 1, F ] ei = [ E, 1, F ][ F, f i ] = [ E, f i ] = f i .

1.3 Change of basis


Let U be a vector space. The identity map U ! U is defined by x 7! x and is variously
written I, id, 1, IU , idU , 1U .
An easy consequence of theorem 3 is:
Corollary 4. Let T : V ! W be a linear map. Let (respectively) E1 , E2 , F1 , F2 be bases
of (respectively) V, V, W, W. Then

[ F2 , T, E2 ] = [ F2 , 1, F1 ][ F1 , T, E1 ][ E1 , 1, E2 ]. ⇤

The foregoing corollary explains why matrices of the form [ F, 1, E] are called
change of base matrices.
Definition 5. Two matrices A, B 2 K m,n are said to be equivalent if there exist invert-
ible matrices P 2 K m,m , Q 2 K n,n such that B = PAQ.
Theorem 6. A, B 2 K m,n . Then the following are equivalent:
(a) The matrices A, B are equivalent.
(b) The matrices A, B represent the same linear map with respect to possibly different
bases.
(c) The matrices A, B have the same rank. ⇤

2 The Jordan Canonical Form


2.1 Introduction
Throughout this section V will be a vector space of dimension n over a field K, T :
V ! V will be a linear operator1 , and A will be the matrix of T with respect to a
fixed basis e1 , . . . , en of V. Our aim is to find a new basis e01 , . . . , e0n for V, such that
the matrix of T with respect to the new basis is as simple as possible. Equivalently
(by corollary 4), we want to find an invertible matrix P (the associated basis change
matrix) such that P 1 AP is a simple as possible.
Our preferred form of matrix is a diagonal matrix, but we saw in MA106 that
the matrix ( 10 11 ), for example, is not similar to a diagonal matrix. We shall generally
assume that K = C. This is to ensure that the characteristic polynomial of A factorises
into linear factors. Under this assumption, it can be proved that A is always similar
to a matrix B = ( i j ) of a certain type (called the Jordan canonical form or sometimes
Jordan normal form of the matrix), which is not far off being diagonal. In fact i j is
zero except when j = i or j = i + 1, and i,i+1 is either 0 or 1.
We start by summarising some definitions and results from MA106. We shall use
0 both for the zero vector in V and the zero n ⇥ n matrix. The zero linear operator
0V : V ! V corresponds to the zero matrix 0, and the identity linear operator IV :
V ! V corresponds to the identity n ⇥ n matrix In .
Because of the correspondence between linear maps and matrices, which respects
addition and multiplication, all statements about A can be rephrased as equivalent
statements about T. For example, if p( x) is a polynomial equation in a variable x,
then p( A) = 0 , p( T ) = 0V .
1 i.e. a linear map from a space to itself
4 MA251 Algebra I November 27, 2014

If Tv = v for 2 K and 0 6= v 2 V, or equivalently, if Av = v, then is an


eigenvalue, and v a corresponding eigenvector of T and A. The eigenvalues can be
computed as the roots of the characteristic polynomial c A ( x) = det( A xIn ) of A.
The eigenvectors corresponding to are the non-zero elements in the nullspace
(= kernel) of the linear operator T IV This nullspace is called the eigenspace of T
with respect to the eigenvalue . In other words, the eigenspace is equal to {v 2 V |
T (v) = v}, which is equal to the set of eigenvectors together with 0.
The dimension of the eigenspace, which is called the nullity of T IV is therefore
equal to the number of linearly independent eigenvectors corresponding to . This
number plays an important role in the theory of the Jordan canonical form. From the
Dimension Theorem, proved in MA106, we know that

rank( T IV ) + nullity( T IV ) = n,

where rank( T IV ) is equal to the dimension of the image of T IV .


For the sake of completeness, we shall now repeat the results proved in MA106
about the diagonalisability of matrices. We shall use the theorem that a set of n
linearly independent vectors of V form a basis of V without further explicit reference.
Theorem 7. Let T : V ! V be a linear operator. Then the matrix of T is diagonal with
respect to some basis of V if and only if V has a basis consisting of eigenvectors of T.
Proof. Suppose that the matrix A = (↵i j ) of T is diagonal with respect to the basis
e1 , . . . , en of V. Recall that the image of the i-th basis vector of V is represented by
the i-th column of A. But since A is diagonal, this column has the single non-zero
entry ↵ii . Hence T (ei ) = ↵ii ei , and so each basis vector ei is an eigenvector of A.
Conversely, suppose that e1 , . . . , en is a basis of V consisting entirely of eigenvec-
tors of T. Then, for each i, we have T (ei ) = i ei for some i 2 K. But then the matrix
of A with respect to this basis is the diagonal matrix A = (↵i j ) with ↵ii = i for each
i. ⇤
Theorem 8. Let 1 , . . . , r be distinct eigenvalues of T : V ! V, and let v1 , . . . , vr
be corresponding eigenvectors. (So T (vi ) = i vi for 1  i  r.) Then v1 , . . . , vr are
linearly independent.
Proof. We prove this by induction on r. It is true for r = 1, because eigenvectors are
non-zero by definition. For r > 1, suppose that for some ↵1 , . . . , ↵r 2 K we have

↵1 v1 + ↵2 v2 + · · · + ↵r vr = 0.

Then, applying T to this equation gives

↵1 1 v1 + ↵2 2 v2 + · · · + ↵r r vr = 0.

Now, subtracting 1 times the first equation from the second gives

↵2 ( 2 1 )v2 + · · · + ↵r ( r 1 ) vr = 0.

By inductive hypothesis, v2 , . . . , vr are linearly independent, so ↵i ( i 1 ) = 0 for


2  i  r. But, by assumption, i 1 6 = 0 for i > 1, so we must have ↵i = 0 for
i > 1. But then ↵1 v1 = 0, so ↵1 is also zero. Thus ↵i = 0 for all i, which proves that
v1 , . . . , vr are linearly independent. ⇤
Corollary 9. If the linear operator T : V ! V (or equivalently the n ⇥ n matrix A) has
n distinct eigenvalues, where n = dim(V ), then T (or A) is diagonalisable.
Proof. Under the hypothesis, there are n linearly independent eigenvectors, which
therefore form a basis of V. The result follows from theorem 7. ⇤
November 27, 2014 MA251 Algebra I 5

2.2 The Cayley-Hamilton theorem


This theorem says that a matrix satisfies its own characteristic equation. It is easy to
visualise with the following “non-proof”:

c A ( A) = det( A AI ) = det(0) = 0.

This argument is faulty because you cannot really plug the matrix A into det( A xI ):
you must compute this polynomial first.
Theorem 10 (Cayley-Hamilton). Let c A ( x) be the characteristic polynomial of the n ⇥
n matrix A over an arbitrary field K. Then c A ( A) = 0.
Proof. Recall from MA106 that, for any n ⇥ n matrix B, the adjoint adj( B) is the
n ⇥ n matrix whose ( j, i )th entry is the cofactor ci j = ( 1)i+ j det( Bi j ), where Bi j is
the matrix obtained from B by deleting the i-th row and the j-th column of B. We
proved that B adj( B) = det( B) In .
By definition, c A ( x) = det( A xIn ), and ( A xIn )adj( A xIn ) = det( A
xIn ) In . Now det( A xIn ) is a polynomial of degree n in x; that is det( A xIn ) =
a0 x0 + a1 x1 + · · · + an xn , with ai 2 K. Similarly, putting B = A xIn in the last
paragraph, we see that the ( j, i )-th entry ( 1)i+ j det( Bi j ) of adj( B) is a polynomial
of degree at most n 1 in x. Hence adj( A xIn ) is itself a polynomial of degree
at most n 1 in x in which the coefficients are n ⇥ n matrices over K. That is,
adj( A xIn ) = B0 x0 + B1 x + · · · + Bn 1 xn 1 , where each Bi is an n ⇥ n matrix over
K. So we have

(A xIn )( B0 x0 + B1 x + · · · + Bn 1x
n 1
) = ( a0 x0 + a1 x1 + · · · + an xn ) In .

Since this is a polynomial identity, we can equate coefficients of the powers of x on


the left and right hand sides. In the list of equations below, the equations on the left
are the result of equating coefficients of xi for 0  i  n, and those on right are
obtained by multiplying Ai by the corresponding left hand equation:

AB0 = a0 In AB0 = a0 In
AB1 B0 = a1 In A2 B1 AB0 = a1 A
AB2 B1 = a2 In A3 B2 A2 B1 = a2 A2
··· ···
ABn 1 Bn 2 = an 1 In An Bn 1 An 1 Bn 2 = an 1 An 1
Bn 1 = an In An Bn 1 = an An .

Now summing all of the equations in the right hand column gives

0 = a0 A0 + a1 A + . . . + an 1A
n 1
+ an An

(remember A0 = In ), which says exactly that c A ( A) = 0. ⇤


By the correspondence between linear maps and matrices, we also have c A ( T ) =
0.

2.3 The minimal polynomial


We start this section with a brief general discussion of polynomials in a single variable
x with coefficients in a field K, such as p = p( x) = 2x2 3x + 11. The set of all such
polynomials is denoted by K [ x]. There are two binary operations on this set: addition
and multiplication of polynomials. These operations turn K [ x] into a ring, which will
be studied in great detail in Algebra-II.
6 MA251 Algebra I November 27, 2014

As a ring K [ x] has a number of properties in common2 with the integers Z. The


notation a|b mean a divides b. It can be applied to integers (for instance, 3|12), and
also to polynomials (for instance, ( x 3)|( x2 4x + 3)).
We can divide one polynomial p (with p 6= 0) into another polynomial q and get
a remainder with degree less than p. For example, if q = x5 3, p = x2 + x + 1, then
we find q = sp + r with s = x3 x2 + 1 and r = x 4. For both Z and K [ x], this is
known as the Euclidean Algorithm.
A polynomial r is said to be a greatest common divisor of p, q 2 K [ x] if r| p, r|q,
and, for any polynomial r0 with r0 | p, r0 |q, we have r0 |r. Any two polynomials p, q 2
K [ x] have a greatest common divisor and a least common multiple (which is defined
similarly), but these are only determined up to multiplication by a constant. For
example, x 1 is a greatest common divisor of x2 2x + 1 and x2 3x + 2, but so is
1 x and 2x 2. To resolve this ambiguity, we make the following definition.
Definition 11. A polynomial with coefficients in a field K is called monic if the coef-
ficient of the highest power of x is 1.
For example, x3 2x2 + x + 11 is monic, but 2x2 x 1 is not.
Now we can define gcd( p, q) to be the unique monic greatest common divisor of
p and q, and similarly for lcm( p, q).
As with the integers, we can use the Euclidean Algorithm to compute gcd( p, q).
For example, if p = x4 3x3 + 2x2 , q = x3 2x2 x + 2, then p = q( x 1) + r with
r = x2 3x + 2, and q = r( x + 1), so gcd( p, q) = r.
Theorem 12. Let A be an n ⇥ n matrix over K representing the linear operator T :
V ! V. The following hold:
(a) There is a unique monic non-zero polynomial p( x) with minimal degree and coef-
ficients in K such that p( A) = 0,
(b) If q( x) is any polynomial with q( A) = 0, then p|q.
Proof. (a). If we have any polynomial p( x) with p( A) = 0, then we can make
p monic by multiplying it by a constant. By theorem 10, there exists such a p( x),
namely c A ( x). If we had two distinct monic polynomials p1 ( x), p2 ( x) of the same
minimal degree with p1 ( A) = p2 ( A) = 0, then p = p1 p2 would be a non-zero
polynomial of smaller degree with p( A) = 0, contradicting the minimality of the
degree, so p is unique.
(b). Let p( x) be the minimal monic polynomial in (a) and suppose that q( A) = 0.
As we saw above, we can write q = sp + r where r has smaller degree than p. If r is
non-zero, then r( A) = q( A) s( A) p( A) = 0 contradicting the minimality of p, so
r = 0 and p|q. ⇤
Definition 13. The unique monic polynomial µ A ( x) of minimal degree with µ A ( A) =
0 is called the minimal polynomial of A or of the corresponding linear operator T.
(Note that p( A) = 0 () p( T ) = 0 for p 2 K [ x].)
By theorem 10 and theorem 12 (ii), we have:
Corollary 14. The minimal polynomial of a square matrix A divides its characteristic
polynomial. ⇤
Similar matrices A and B represent the same linear operator T, and so their min-
imal polynomial is the same as that of T. Hence we have
Proposition 15. Similar matrices have the same minimal polynomial. ⇤
For a vector v 2 V, we can also define a relative minimal polynomial µ A,v as
2 Technically speaking, they are both Euclidean Domains that is an important topic in Algebra-II.
November 27, 2014 MA251 Algebra I 7

the unique monic polynomial p of minimal degree for which p( T )(v) = 0V . Since
p( T ) = 0 if and only if p( T )(v) = 0V for all v 2 V, µ A is the least common multiple
of the polynomials µ A,v for all v 2 V.
But p( T )(v) = 0V for all v 2 V if and only if p( T )(bi ) = 0V for all bi in a basis
b1 , . . . , bn of V (exercise), so µ A is the least common multiple of the polynomials
µ A,bi .
This gives a method of calculating µ A . For any v 2 V, we can compute µ A,v
by calculating the sequence of vectors v, T (v), T 2 (v), T 3 (v) and stopping when
it becomes linearly dependent. In practice, we compute T (v) etc. as Av for the
corresponding column vector v 2 K n,1 .
For example, let K = R and
0 1
3 1 0 1
B 1 1 0 1 C
A= B C.
@ 0 0 1 0 A
0 0 0 1

Using the standard basis b1 = (1 0 0 0)T , b2 = (0 1 0 0)T , b1 = (0 0 1 0)T ,


b4 = (0 0 0 1)T of R4,1 , we have:
Ab1 = (3 1 0 0)T , A2 b1 = A( Ab1 ) = (8 4 0 0)T = 4Ab1 4b1 , so ( A2 4A +
4)b1 = 0, and hence µ A,b1 = x2 4x + 4 = ( x 2)2 .
Ab2 = ( 1 1 0 0)T , A2 b2 = ( 4 0 0 0)T = 4Ab2 4b2 , so µ A,b2 = x2 4x + 4.
Ab3 = b3 , so µ A,b3 = x 1.
Ab4 = (1 1 0 1)T , A2 b4 = (3 3 0 1)T = 3Ab4 2b4 , so µ A,b4 = x2 3x + 2 =
( x 2)( x 1).
So we have µ A = lcm(µ A,b1 , µ A,b2 , µ A,b3 , µ A,b4 ) = ( x 2)2 ( x 1).

2.4 Jordan chains and Jordan blocks


The Cayley-Hamilton theorem and the theory of minimal polynomials are valid for
any matrix over an arbitrary field K, but the theory of Jordan forms will require an
additional assumption that the characteristic polynomial c A ( x) is split in K [ x], i.e. it
factorises into linear factors. If the field K = C then all polynomials in K [ x] factorise
into linear factors by the Fundamental Theorem of Algebra and JCF works for any
matrix.
Definition 16. A Jordan chain of length k is a sequence of non-zero vectors v1 , . . . ,
vk 2 K n,1 that satisfies
Av1 = v1 , Avi = vi + vi 1, 2  i  k,
for some eigenvalue of A.
Exercise 17. Prove that every Jordan chain is independent.
Example 18. Let V be the vector space of functions in the form f ( z)e z where f ( z)
is the polynomial of degree less than k. Consider the derivative, that is, the linear
operator T : V ! V given by T ( ( z)) = 0 ( z). The vectors vi = zi 1 e z /(i 1)!
form a Jordan chain for T and a basis of V. In particular, the matrix of T in this basis
is the Jordan block defined below.
Definition 19. Let T : V ! V be linear, 2 K and n > 0. The kernel of the linear
map ( T IV )i is called a generalised eigenspace. Likewise for matrices.
Note that ker( T IV ) is an eigenspace; this is the case i = 1. The nonzero
elements of generalised eigenspaces are called generalised eigenvectors.
Notice that v 2 V is an eigenvector with eigenvalue if and only if µ A,v = x .
Similarly, generalised eigenvectors are characterised by the property µ A,v = ( x )i
for some and i.
8 MA251 Algebra I November 27, 2014

Example 20. Let’s find the generalised eigenspaces and a Jordan chain of
0 1
3 1 0
A= @0 3 1A.
0 0 3

Firstly, ker( A I3 ) = 0 if 6= 3. Furthermore


0 1 0 1
0 1 0 0 0 1
2
A 3 I3 = @0 0 1A, (A 3 I3 ) = @0 0 0A, (A 3 I3 )3 = 0.
0 0 0 0 0 0

So the remaining generalised eigenspaces are


ker( A 3 I3 ) = hb1 i, ker( A 3 I3 )2 = hb1 , b2 i, ker( A 3 I3 )3 = hb1 , b2 , b3 i
where bi denotes the standard base vectors.
Also Ab1 = 3b1 , Ab2 = 3b2 + b1 , Ab3 = 3b3 + b2 , so b1 , b2 , b3 is a Jordan chain
of length 3 for the eigenvalue 3 of A. The generalised eigenspaces of index 1, 2, and
3 are respectively hb1 i, hb1 , b2 i, and hb1 , b2 , b3 i.
Notice that the dimension of a generalised eigenspace of A is the nullity of ( T
IV )i , which is a a function of the linear operator T associated with A. Since similar
matrices represent the same linear operator, we have
Proposition 21. Let A, B 2 K n,n be similar. Then
dim ker( A )i = dim ker( B )i
for all 2 K and i > 0. In words, the dimensions of the generalised eigenspaces of A
and B are equal. ⇤
Definition 22. We define a Jordan block with eigenvalue of degree k to be a k ⇥ k
matrix J ,k = ( i j ), such that ii = for 1  i  k, i,i+1 = 1 for 1  i < k, and
i j = 0 if j is not equal to i or i + 1.

So, for example,


0 3 i 1 0
0 1 0 0
1
✓ ◆ 2 1 0
1 1 3 i B 0 0 1 0 C
J1,2 = , J ,3 =@ 0 1 A, and J0,4 = B C
0 1 2
3 i
@ 0 0 0 1 A
0 0 2 0 0 0 0

are Jordan blocks, where = 3 2 i in the second example.


It should be clear that the matrix of T with respect to the basis v1 , . . . , vn of K n,1
is a Jordan block of degree n if and only if v1 , . . . , vn is a Jordan chain for A.
Note also that for A = J ,k , µ A,vi = ( x )i , so µ A = ( x )k . Since J ,k is an
upper triangular matrix with entries on the diagonal, we see that the characteristic
polynomial c A of A is also equal to ( x)k .
Warning: Some authors put the 1’s below rather than above the main diagonal in
a Jordan block. This corresponds to either writing the Jordan chain in the reversed
order or using rows instead of columns for the standard vector space. However, if an
author does both (uses rows and reverses the order) then the 1’s will go back above
the diagonal.

2.5 Jordan bases and the Jordan canonical form


Definition 23. We denote the m ⇥ n matrix in which all entries are 0 by 0m,n . If A
is an m ⇥ m matrix and B an n ⇥ n matrix, then we denote the (m + n) ⇥ (m + n)
matrix with block form ✓ ◆
A 0m,n
0n,m B
by A B. We call A B the block sum of A and B.
November 27, 2014 MA251 Algebra I 9

Definition 24. Let T : V ! V be linear. A Jordan basis for T and V is a finite basis E
of V such that there exist Jordan blocks J1 , . . . , Jk such that

[ ETE] = J1 ··· Jk .

Likewise for matrices instead of linear maps.


Note that a linear map admits Jordan basis if and only if it is similar to a block
sum of Jordan blocks.
Combining definitions 16, 22 and 24 we find the following explicit characterisa-
tion of Jordan bases in terms of Jordan chains.
Proposition 25. Let T : V ! V be linear and E = (e1 , . . . , en ) a basis of V. Then E is
a Jordan basis for T and V if and only if there exists k 1 and integers i (0), . . . , i (k)
such that 0 = i (0) < · · · < i (k) = n and

ei(t)+1 , ei(t)+2 , . . . , ei(t+1)

is a Jordan chain for all t with 0  t < k. ⇤


We can now state without proof the main theorem of this section, which says that
Jordan bases exist.
Theorem 26. Let A be an n ⇥ n matrix over K such that c A ( x) splits into linear factors
in K [ x].
(a) Then there exists a Jordan basis for A, and hence A is similar to a matrix J which
is a block sum of Jordan blocks.
(b) The Jordan blocks occurring in J are uniquely determined by A. More precisely,
if A is similar to J1 · · · J j and L1 · · · L` and Ji and Li are (nonempty)
Jordan blocks then j = ` and there exists a permutation s of {1, . . . , j} such that
Ji = Ls(i) for all i.
The matrix J in the theorem is said to be the Jordan canonical form (JCF) or
sometimes Jordan normal form of A.
We will prove the theorem later. First we derive some consequences and study
methods for calculating the JCF of a matrix. As we have discussed before, polynomials
over C always split. The gives the following corollary.
Corollary 27. Let A be an n ⇥ n matrix over C. Then there exists a Jordan basis for A.

The proof of the following corollary requires algebraic techniques beyond the
scope of this course. You can try to prove yourself after you have done Algebra-II3 .
The trick is to find a field extension F K such that c A ( x) splits in F [ x]. For example,
consider the rotation by 90 degrees matrix A = ( 01 01 ). Since c A ( x) = x2 + 1, its
eigenvalues are imaginary numbers i and i. Hence, it admits no JCF over R but
over complex numbers it has JCF ( 0i 0i ).
Corollary 28. Let A be an n ⇥ n matrix over K. Then there exists a field extension
F K and a Jordan basis for A in F n,1 . ⇤
The next two corollaries are immediate4 consequences of theorem 26 but they are
worth stating because of their computational significance.
Corollary 29. Let A be an n ⇥ n matrix over K that admits a Jordan basis. If P is the
matrix having a Jordan basis as columns, then P 1 AP is the JCF of A.
3 Or you can take Galois Theory next year and this should become obvious.
4 This means I am not proving them here but I expect you to be able to prove them
10 MA251 Algebra I November 27, 2014

Notice that a Jordan basis is not, in general, unique. Thus, there exists multiple
matrices P such that J = P 1 AP is the JCF of A.
The final corollary follows from an explicit calculation5 for J because both mini-
mal and characteristic polynomials of J and A are the same.
Corollary 30. Suppose that the eigenvalues of A are 1 , . . . , t , and that the Jordan
blocks in J for the eigenvalue i are J i ,ki,1 , . . . , J i ,ki, j , where ki,1 ki,2 · · · ki, ji .
i
Then the characteristic polynomial is c A ( x) = ’it=1 ( i x)ki , where ki = ki,1 +
· · · + ki, ji for 1  i  t. The minimal polynomial is µ A ( x) = ’it=1 ( x ki,1
i) .

2.6 The JCF when n = 2 and 3

Figure 1: All Jordan matrices of size 2 or 3 up to reordering the blocks. A dot means
zero; , µ, ⇡ 2 K are distinct.

Jordan matrix cA µA
✓ ◆
·
· µ
( x)(µ x) (x )( x µ)
✓ ◆
1
·
( x)2 (x )2
✓ ◆
·
·
( x)2 (x )
0 1
· ·
@· µ ·A ( x)(µ x)(⇡ x) ( x )( x µ )( x ⇡)
· · ⇡
0 1
1 ·
@· ·A ( x)2 (µ x) (x )2 ( x µ)
· · µ
0 1
· ·
@· ·A ( x)2 (µ x) (x )( x µ)
· · µ
0 1
1 ·
@· 1A ( x)3 (x )3
· ·
0 1
1 ·
@· ·A ( x)3 (x )2
· ·
0 1
· ·
@· ·A ( x)3 (x )
· ·

In figure 1 you can find a full list of the Jordan matrices of size 2 or 3 up to
reordering the blocks. For each we give the minimal and characteristic polynomials.
Corollary 31. Let A 2 K n,n with n 2 {2, 3} admit a JCF. Then a JCF of A is determined
by the minimal and characteristic polynomials of A.
Proof. Immediate from figure 1. ⇤
In the rest of this section we shall show examples where A 2 K n,n is given with
n 2 {2, 3} and one is asked to find a Jordan basis.
Example 32. A = ( 11 41 ). We calculate c A ( x) = x2 2x 3 = (x 3)( x + 1), so
5 The characteristic polynomial of J is the product of the characteristic polynomials of the Jordan

blocks and the minimal polynomial of J is the least common multiple of characteristic polynomials of
the Jordan blocks
November 27, 2014 MA251 Algebra I 11

there are two distinct eigenvalues, 3 and 1. Associated eigenvectors are (2 1)T and
( 2 1)T , so we put P = ( 21 12 ) and then P 1 AP = ( 30 01 ).
If the eigenvalues are equal, then there are two possible JCF’s, J 1 ,1 J 1 ,1 , which
is a scalar matrix, and J 1 ,2 . The minimal polynomial is respectively ( x 1 ) and
(x ) 2 in these two cases. In fact, these cases can be distinguished without any
1
calculation whatsoever, because in the first case A = PJP 1 = J so A is its own JCF.
In the second case, a Jordan basis consists of a single Jordan chain of length
2. To find such a chain, let v2 be any vector for which ( A 1 I2 ) v2 6 = 0 and let
v1 = ( A 1 I2 ) v2 . (In practice, it is often easier to find the vectors in a Jordan chain
in reverse order.)
Example 33. A = ( 11 43 ). We have c A ( x) = x2 + 2x + 1 = ( x + 1)2 , so there is a
single eigenvalue 1 with multiplicity 2. Since the first column of A + I2 is non-zero,
we can choose v2 = (1 0)T and v1 = ( A + I2 )v2 = (2 1)T , so P = ( 21 01 ) and
P 1 AP = ( 01 11 ).
Now let n = 3. If there are three distinct eigenvalues, then A is diagonalisable.
Suppose that there are two distinct eigenvalues, so one has multiplicity 2, and
the other has multiplicity 1. Let the eigenvalues be 1 , 1 , 2 , with 1 6= 2 . Then
there are two possible JCF’s for A, J 1 ,1 J 1 ,1 J 2 ,1 and J 1 ,2 J 2 ,1 , and the minimal
polynomial is ( x 2
1 )( x 2 ) in the first case and ( x 1 ) (x 2 ) in the second.
In the first case, a Jordan basis is a union of three Jordan chains of length 1, each
of which consists of an eigenvector of A.
0 1
2 0 0
Example 34. A = @ 1 5 2 A. Then
2 6 2

c A ( x) = (2 x)[(5 x)( 2 x) + 12] = (2 x)( x2 3x + 2) = (2 x)2 (1 x).

We know from the theory above that the minimal polynomial must be ( x 2)( x 1)
or ( x 2)2 ( x 1). We can decide which simply by calculating ( A 2I3 )( A I3 ) to
test whether or not it is 0. We have
0 1 0 1
0 0 0 1 0 0
A 2I3 = @ 1 3 2 A, A I3 = @ 1 4 2 A,
2 6 4 2 6 3

and the product of these two matrices is 0, so µ A = ( x 2)( x 1).


The eigenvectors v for 1 = 2 satisfy ( A 2I3 )v = 0, and we must find two
linearly independent solutions; for example we can take v1 = (0 2 3 )T , v 2 =
T T
(1 1 1) . An eigenvector for the eigenvalue 1 is v3 = (0 1 2) , so we can choose
0 1
0 1 0
P=@ 2 1 1 A
3 1 2

and then P 1 AP is diagonal with entries 2, 2, 1.


In the second case, there are two Jordan chains, one for 1 of length 2, and one for
2
2 of length 1. For the first chain, we need to find a vector v2 with ( A 1 I3 ) v2 = 0
but ( A 1 I3 ) v2 6 = 0, and then the chain is v1 = ( A 1 I3 ) v2 , v2 . For the second
chain, we simply need an eigenvector for 2 .
0 1
3 2 1
Example 35. A = @ 0 3 1 A. Then
1 4 1

c A ( x) = (3 x)[(3 x)( 1 x) + 4] 2 + (3 x)
3 2 2
= x + 5x 8x + 4 = (2 x) (1 x),
0 1 0 1
1 2 1 0 0 0
A 2I3 = @ 0 1 1 A, (A 2I3 )2 = @ 1 3 2 A,
1 4 3 2 6 4
12 MA251 Algebra I November 27, 2014
0 1
2 2 1
(A I3 ) = @ 0 2 1 A.
1 4 2

and we can check that ( A 2I3 )( A I3 ) is non-zero, so we must have µ A = ( x


2 ) 2 ( x 1 ).
For the Jordan chain of length 2, we need a vector with ( A 2I3 )2 v2 = 0 but
( A 2I3 )v2 6= 0, and we can choose v2 = (2 0 1)T . Then v1 = ( A 2I3 )v2 =
(1 1 1)T . An eigenvector for the eigenvalue 1 is v3 = (0 1 2)T , so we can choose
0 1
1 2 0
P=@ 1 0 1 A
1 1 2

and then 0 1
2 1 0
1
P AP = @ 0 2 0 A.
0 0 1

Finally, suppose that there is a single eigenvalue, 1 , so c A = ( 1 x)3 . There are


three possible JCF’s for A, J 1 ,1 J 1 ,1 J 1 ,1 , J 1 ,2 J 1 ,1 , and J 1 ,3 , and the minimal
polynomials in the three cases are ( x 2 3
1 ), ( x 1 ) , and ( x 1 ) , respectively.
1
In the first case, J is a scalar matrix, and A = PJP = J, so this is recognisable
immediately.
In the second case, there are two Jordan chains, one of length 2 and one of length
1. For the first, we choose v2 with ( A 1 I3 ) v2 6 = 0, and let v1 = ( A 1 I3 ) v2 . (This
case is easier than the case illustrated in Example 4, because we have ( A 2
1 I3 ) v =
3,1
0 for all v 2 C .) For the second Jordan chain, we choose v3 to be an eigenvector
for 1 such that v2 and v3 are linearly independent.
0 1
0 2 1
Example 36. A = @ 1 3 1 A. Then
1 2 0

c A ( x) = x[(3 + x) x + 2] 2( x + 1) 2 + (3 + x)
= x3 3x2 3x 1= (1 + x)3 .

We have 0 1
1 2 1
A + I3 = @ 1 2 1 A,
1 2 1

and we can check that ( A + I3 )2 = 0. The first column of A + I3 is non-zero, so


( A + I3 )(1 0 0)T 6= 0, and we can choose v2 = (1 0 0)T and v1 = ( A + I3 )v2 =
(1 1 1)T . For v3 we need to choose a vector which is not a multiple of v1 such that
( A + I3 )v3 = 0, and we can choose v3 = (0 1 2)T . So we have
0 1
1 1 0
P=@ 1 0 1 A
1 0 2

and then 0 1
1 1 0
1
P AP = @ 0 1 0 A.
0 0 1

In the third case, there is a single Jordan chain, and we choose v3 such that
2 2
(A 1 I3 ) v3 6 = 0, v2 = ( A 1 I3 ) v3 , v1 = ( A 1 I3 ) v3 .
0 1
0 1 0
Example 37. A = @ 1 1 1 A. Then
1 0 2

c A ( x) = x[(2 + x)(1 + x)] (2 + x) + 1 = (1 + x)3 .


November 27, 2014 MA251 Algebra I 13

We have 0 1 0 1
1 1 0 0 1 1
A + I3 = @ 1 0 1 A, ( A + I3 )2 = @ 0 1 1 A,
1 0 1 0 1 1

so ( A + I3 )2 6= 0 and µ A = ( x + 1)3 . For v3 , we need a vector that is not in the


nullspace of ( A + I3 )2 . Since the second column, which is the image of (0 1 0)T is
non-zero, we can choose v3 = (0 1 0)T , and then v2 = ( A + I3 )v3 = (1 0 0)T and
v1 = ( A + I3 )v2 = (1 1 1)T . So we have
0 1 0 1
1 1 0 1 1 0
1
P=@ 1 0 1 A, P AP = @ 0 1 1 A.
1 0 0 0 0 1

2.7 The JCF for general n


Reminder on direct sums. Let W1 , . . . , Wk be subspaces of a vector space V. We
say that V is a direct sum of W1 , . . . , Wk and write

V = W1 ··· Wk

if there exists a bijective map from the Cartesian product W1 ⇥ · · · ⇥ Wk to V,


defined by ( x1 , . . . , xk ) = x1 + · · · + xk .
Lemma 38. Let V = W1 · · · Wk . Let T : V ! V be linear. Assume T (Wi ) ⇢ Wi for
all i and let Ti : Wi ! Wi be the restriction of T to Wi . Then

ker( T ) = ker( T1 ) ··· ker( Tk ).

Proof. Let ⇢ : W1 . . . Wk ! V be defined by addition: ⇢( x1 , . . . , xk ) = x1 + · · · xk . Let


be the restriction of ⇢ to ker( T1 ) ⇥ · · · ⇥ ker( Tk ).
Claim 1: im( ) ⇢ ker( T ). Proof of this. Let xi 2 ker( Ti ) for all i and put
x = ( x1 , . . . , xk ). Then

T x = T ( x1 , . . . , xk ) = T ( x1 + · · · + xk )
= Tx1 + · · · + Txk because T is linear
= T1 x1 + · · · + Tk xk because xi 2 Wi for all i
= 0+···+0 because Ti xi = 0 for all i
=0

which proves ( x) 2 ker( T ) as required.


Claim 2: im( ) ker( T ). Proof of this. Let y 2 ker( T ), say, y = ⇢( x1 , . . . , xk ).
Then

0 = Ty = T⇢( x1 , . . . , xk ) = T ( x1 + · · · + xk )
= Tx1 + · · · + Txk because T is linear
= T1 x1 + · · · + Tk xk because xi 2 Wi for all i
= ⇢( T1 x1 , . . . , Tk xk ) because Ti xi 2 Wi for all i.

But ⇢ is bijective so Ti xi = 0 for all i, that is, xi 2 ker( Ti ). So y = ( x1 , . . . , xk ) as


required.
The above claims show that : ker( T1 ) ⇥ · · · ⇥ ker( Tk ) ! ker( T ) defined by
( x1 , . . . , xk ) = x1 + · · · + xk is surjective. It remains to prove that it is injective.
Well, it is because ⇢ is. The proof is finished. ⇤
A JCF is determined by the dimensions of the generalised eigenspaces. Recall that
the degree of a Jordan block J ,k is k.
14 MA251 Algebra I November 27, 2014

Theorem 39. Let T : V ! V be linear admitting a JCF J. Let i > 0 and 2 K. Then
the number of Jordan blocks of J with eigenvalue and degree at least i is equal to
nullity ( T In )i nullity ( T In )i 1 .
Proof. Let us first prove this if J is a Jordan block, J = J ,n .
Let (b1 , . . . , bn ) be a Jordan basis for J. For all ai 2 K
⇣ ⌘ n
(J In )k  ai bi =  ai bi k
i i =k+1

which shows
(
Span(b1 , . . . , bk ) if 0  k  n;
ker( J In )k =
Kn if n  k.
so (
k if 0  k  n;
nullity( A In )k = nullity( J In )k =
n if n  k.
The result for J = J ,k follows.
Knowing the result for Jordan blocks, we deduce the full result as follows. There
are subspaces W1 , . . . , Wk such that V = W1 · · · Wk and T (Wi ) ⇢ Wi and the
restriction Ti of T to Wi is a Jordan block (that is, the matrix of T with respect to a
suitable basis is a Jordan block). Then

# Jordan blocks of T of eigenvalue and degree i


= Â # Jordan blocks of Tj of eigenvalue and degree i
j
⇣ ⌘
= Â nullity( Tj IW j )i nullity( Tj IW j )i 1
by the special case
j
⇣ ⌘ ⇣ ⌘
=  nullity(Tj IW j )i  nullity(Tj IW j )i 1

j j
i 1
= nullity( T IW ) nullity( T IW )i by lemma 38. ⇤

2.8 Proof of theorem 26 (non-examinable)


We proceed by induction on n = dim(V ). The case n = 0 is clear.
Let be an eigenvalue of T and let U = im( T IV ) and m = dim(U ). Then
m = rank( T IV ) = n nullity( T IV ) < n, because the eigenvectors for lie in
the nullspace of T IV .
Note T (U ) ⇢ U because for u 2 U, we have u = ( T IV )(v) for some v 2 V,
and hence T (u) = T ( T IV )(v) = ( T IV ) T (v) 2 U.
Let TU : U ! U denote its restriction. We apply the inductive hypothesis to TU to
deduce that U has a Jordan basis e1 , . . . , em .
We now show how to extend the Jordan basis of U to one of V. We do this in
two stages. For the first stage, suppose that ` of the (nonempty) Jordan chains of
TU are for the eigenvalue (possibly ` = 0). For each such chain v1 , . . . , vk with
T (v1 ) = v1 , T (vi ) = vi + vi 1 , 2  i  k, since vk 2 U = im( T IV ), we can
find vk+1 2 V with T (vk+1 ) = vk+1 + vk , thereby extending the chain by an extra
vector. So far we have adjoined ` new vectors to the basis, by extending the Jordan
chains of eigenvalue by one vector. Let us call these new vectors w1 , . . . , w` .
Now for the second stage. Recall that W := ker( T IV ) has dimension n m.
We already have ` vectors in W, namely the first vector in each Jordan chain of
eigenvalue .
We can adjoin (n m) ` further eigenvectors of T to the ` that we have already
to complete a basis of W. Let us call these (n m) ` new vectors wl +1 , . . . , wn m .
November 27, 2014 MA251 Algebra I 15

They are adjoined to our basis of V in the second stage. They each form a Jordan
chain of length 1, so we now have a collection of n vectors which form a disjoint
union of Jordan chains.
To complete the proof, we need to show that these n vectors form a basis of V, for
which is it is enough to show that they are linearly independent.
By lemma 40 we may assume that T has only one eigenvalue .
Suppose that ↵1 w1 + · · · + ↵n m wn m + x = 0, where x 2 U. Applying T In
gives
↵1 ( T In )(w1 ) + · · · + ↵l ( T In )(w` ) + ( T In )( x) = 0.
For all i with 1  i  ` then, ( T In )(wi ) is the last member of one of the `
Jordan chains for TU . Moreover, ( T In )( x) is a linear combination of the basis
vectors of U other than ( T In )(wi ) for 1  i  `. Hence, by linear independence
of the basis of U, we deduce that ↵i = 0 for 1  i  ` and ( T IV )( x) = 0.
So x 2 ker( TU IU ). But, by construction, w`+1 , . . . , wn m extend a basis of
ker( TU IU ) to W = ker( T IV ), so we also get ↵i = 0 for ` + 1  i  n m,
which completes the proof. ⇤
Lemma 40. Let T : V ! V be linear and dim(V ) = n < 1. Let ⇤ be a set of
eigenvalues of T. For all 2 ⇤ let x 2 V be such that ( T IV )n x = 0. Assume
Â⇤ x = 0. Then x = 0 for all .
Proof. Induction on k = #⇤. For k = 0 there is nothing to prove. Assume it’s true for
k 1. Pick ⇡ 2 ⇤ and apply ( T ⇡ IV )n to Â⇤ x = 0. We get

 (T ⇡ IV )n x = 0.
⇤\{⇡ }

By the induction hypothesis ( T ⇡ IV )n x = 0 for all .


Fix 2 ⇤ \ {⇡ }. Let µ be the unique monic polynomial such that µ ( T ) x = 0
of least degree (relative minimal polynomial). Then µ ( x) divides both ( x ⇡ )n and
(x )n and ⇡ 6= so µ = 1 and x = 0. ⇤

2.9 Examples
0 1
2 0 0 0
B 0 2 1 0 C
Example 41. A = B
@ 0 0 2
C.
0 A
1 0 2 2
Then c A ( x) = ( 2 x)4 , so there is a single eigenvalue 2 with multiplicity 4. We
find 0 1
0 0 0 0
B 0 0 1 0 C
( A + 2I4 ) = B
@ 0 0 0
C,
0 A
1 0 2 0

and ( A + 2I4 )2 = 0, so µ A = ( x + 2)2 , and the JCF of A could be J 2,2 J 2,2 or


J 2,2 J 2,1 J 2,1 .
To decide which case holds, we calculate the nullity of A + 2I4 which, by theo-
rem 39, is equal to the number of Jordan blocks with eigenvalue 2. Since A + 2I4
has just two non-zero rows, which are distinct, its rank is clearly 2, so its nullity is
4 2 = 2, and hence the JCF of A is J 2,2 J 2,2 .
A Jordan basis consists of a union of two Jordan chains, which we will call v1 , v2 ,
and v3 , v4 , where v1 and v3 are eigenvectors and v2 and v4 are generalised eigenvec-
tors of index 2. To find such chains, it is probably easiest to find v2 and v4 first and
then to calculate v1 = ( A + 2I4 )v2 and v3 = ( A + 2I4 )v4 .
Although it is not hard to find v2 and v4 in practice, we have to be careful, because
they need to be chosen so that no nonzero linear combination of them lies in the
16 MA251 Algebra I November 27, 2014

nullspace of A + 2I4 . In fact, since this nullspace is spanned by the second and fourth
standard basis vectors, the obvious choice is v2 = (1 0 0 0)T , v4 = (0 0 1 0)T , and
then v1 = ( A + 2I4 )v2 = (0 0 0 1)T , v3 = ( A + 2I4 )v4 = (0 1 0 2)T , so to transform
A to JCF, we put
0 1 0 1 0 1
0 1 0 0 0 2 0 1 2 1 0 0
B 0 0 1 0 C 1 B 1 0 0 0 C 1 B 0 2 0 0 C
P= B C, P = B C, P AP = B C.
@ 0 0 0 1 A @ 0 1 0 0 A @ 0 0 2 1 A
1 0 2 0 0 0 1 0 0 0 0 2
0 1
1 3 1 0
B 0 2 1 0 C
Example 42. A = B
@ 0 0 2
C.
0 A
0 3 1 1
Then c A ( x) = ( 1 x)2 (2 x)2 , so there are two eigenvalue 1, 2, both with
multiplicity 2. There are four possibilities for the JCF (one or two blocks for each
of the two eigenvalues). We could determine the JCF by computing the minimal
polynomial µ A but it is probably easier to compute the nullities of the eigenspaces
and use theorem 39. We have
0 1
0 3 1 0
B 0 3 1 0 C
A + I4 = B C,
@ 0 0 3 0 A
0 3 1 0
0 1 0 1
3 3 1 0 9 9 0 0
B 0 0 1 0 C B 0 0 0 0 C
(A 2I4 ) = B
@ 0 0 0
C,
0 A
(A 2I4 )2 = B
@ 0 0 0
C.
0 A
0 3 1 3 0 9 0 9

The rank of A + I4 is clearly 2, so its nullity is also 2, and hence there are two
Jordan blocks with eigenvalue 1. The three non-zero rows of ( A 2I4 ) are linearly
independent, so its rank is 3, hence its nullity 1, so there is just one Jordan block with
eigenvalue 2, and the JCF of A is J 1,1 J 1,1 J2,2 .
For the two Jordan chains of length 1 for eigenvalue 1, we just need two linearly
independent eigenvectors, and the obvious choice is v1 = (1 0 0 0)T , v2 = (0 0 0 1)T .
For the Jordan chain v3 , v4 for eigenvalue 2, we need to choose v4 in the nullspace of
( A 2I4 )2 but not in the nullspace of A 2I4 . (This is why we calculated ( A 2I4 )2 .)
An obvious choice here is v4 = (0 0 1 0)T , and then v3 = ( 1 1 0 1)T , and to
transform A to JCF, we put
0 1 0 1 0 1
1 0 1 0 1 1 0 0 1 0 0 0
B 0 0 1 0 C 1 B 0 1 0 1 C 1 B 0 1 0 0 C
P= B C, P = B C, P AP = B C.
@ 0 0 0 1 A @ 0 1 0 0 A @ 0 0 2 1 A
0 1 1 0 0 0 1 0 0 0 0 2

2.10 Powers of matrices


The theory we developed can be used to compute powers of matrices efficiently. Sup-
pose we need to compute A2012 where
0 1
2 0 0 0
B 0 2 1 0 C
A=B
@ 0 0 2
C
0 A
1 0 2 2

from example 41.


There are two practical ways of computing An for a general matrix. The first one
involves Jordan forms. If J = P 1 AP is the JCF of A then it is sufficient to compute
J n because of the telescoping product:
1 n 1 1 1 1 1 1
An = ( PJP ) = P( JP P)n JP = PJ n JP = PJ n P .
November 27, 2014 MA251 Algebra I 17

0 1 0 1
Jk1 , 1
0 ··· 0 Jkn1 , 1
0 ··· 0
B 0
B Jk2 , ··· 0 C
C
B 0
B Jkn2 , ··· 0 C
C
2 n 2
If J = B .. C then J = B .. C .
@ . A @ . A
0 0 ··· Jkt , t 0 0 ··· Jknt , t

Finally, the power of an individual Jordan block can be computed as


0 n 1
n n 1 · · · Ckn 2 n k+2 Ckn 1 n k+1
B 0 n · · · Ckn 3 n k+3 Ckn 2 n k+2 C
B C
B C
Jk,n = B . . . ..
.
..
.
..
. C
B C
@ 0 0 ··· n n n 1 A
0 0 ··· 0 n

where Ctn = n!/(n t)!t! is the Choose-function, interpreted as Ctn = 0 whenever


t > n.
Let us apply it to the matrix from example 41:
0 1 0 1n 0 1
0 1 0 0 2 1 0 0 0 2 0 1
1 B 0 0 1 0 C B 0 2 0 0 C B 1 0 0 0 C
An = PJ n P =B@ 0 0
C B C B C=
0 1 A @ 0 0 2 1 A @ 0 1 0 0 A
1 0 2 0 0 0 0 2 0 0 1 0
0 1 0 1
10 1
0 1 0 0 ( 2)n n( 2)n 0 0 0 2 0 1
B 0 0 1 0 C B 0 ( 2)n 0 0 CB 1 0 0 0 C
=B C B
1
CB C=
@ 0 0 0 1 A @ 0 0 ( 2)n n( 2)n A@ 0 1 0 0 A
1 0 2 0 0 0 0 ( 2)n 0 0 1 0
0 1
( 2)n 0 0 0
B 0 ( 2)n n( 2)n 1 0 C
C.
=B
@ 0 0 ( 2)n 0 A
n( 2)n 1 0 n( 2)n ( 2)n
The second method of computing An uses Lagrange’s interpolation polynomial.
It is less labour intensive and more suitable for pen-and-paper calculations. Suppose
( A) = 0 for a polynomial ( z). In practice, ( z) is either the minimal or charac-
teristic polynomial. Dividing with a remainder zn = q( z) ( z) + h( z), we conclude
that
An = q( A) ( A) + h( A) = h( A).
Division with a remainder may appear problematic6 for large n but there is a shortcut.
If we know the roots of ( z), say ↵1 , . . . , ↵k with their multiplicities m1 , . . . , mk , then
h( z) can be found by solving the system of simultaneous equations in coefficients of
h ( z ):
f (t) (↵ j ) = h(t) (↵ j ) whenever 1  j  k, 0  t < m j
where f ( z) = zn and f (t) = ( f (t 1) )0 is the t th derivative. In other words, h( z) is
Lagrange’s interpolation polynomial for the function zn at the roots of ( z).
We know that µ A ( z) = ( z + 2)2 for the matrix A above. Suppose the Lagrange
interpolation of zn at the roots of ( z + 2)2 is h( z) = ↵z + . The condition on the
coefficients is given by

( 2)n = h( 2) = 2↵ +
n( 2)n 1 = h0 ( 2) = ↵
Solving them gives ↵ = n( 2)n 1 and = (1 n)( 2)n . It follows that
0 1
( 2)n 0 0 0
1 B 0 ( 2)n n( 2)n 1 0 C
An = n( 2)n A + (1 n)( 2)n I = B
@ n
C.
0 0 ( 2) 0 A
n( 2)n 1 0 n( 2)n ( 2)n
6 Try to divide z2012 by z2 + z + 1 without reading any further.
18 MA251 Algebra I November 27, 2014

2.11 Applications to difference equations


Let us consider an initial value problem for an autonomous system with discrete time:

x(n + 1) = A x(n), n 2 N, x(0) = w.

Here x(n) 2 K m is a sequence of vectors in a vector space over a field K. One thinks
of x(n) as a state of the system at time n. The initial state is x(0) = w. The n ⇥ n
matrix A with coefficients in K describes the evolution of the system. The adjective
autonomous means that the evolution equation does not change with the time7 .
It takes longer to formulate this problem than to solve it. The solution is a no-
brainer:
x(n) = Ax(n 1) = A2 x(n 2) = · · · = An x(0) = An w.
As a working example, let us consider a 2-step linearly recursive sequence. It is
determined by a quadruple ( a, b, c, d) 2 K 4 and the rules

s0 = a, s1 = b, sn = csn 1 + dsn 2 for n 2.

Such sequences are ubiquitous. Arithmetic sequences form a subclass with c = 2,


d = 1. In general, ( a, b, 2, 1) determines the arithmetic sequence starting at a with
the difference b a. For instance, (0, 1, 2, 1) determines the sequence of natural
numbers sn = n.
A geometric sequence starting at a with ratio q admits a non-unique description.
One obvious quadruples giving it is ( a, aq, q, 0). However, it is conceptually better
to use quadruple ( a, aq, 2q, q2 ) because the sequences coming from ( a, b, 2q, q2 )
include both arithmetic and geometric sequences and can be called arithmo-geometric
sequences.
If c = d = 1 then this is a Fibonacci type sequence. For instance, (0, 1, 1, 1)
determines Fibonacci numbers Fn while (2, 1, 1, 1) determines Lucas numbers Ln .
All of these examples admit closed 8 formulae for a generic term sn . Can we find a
closed formula for sn , in general? Yes, we can because this problem is reduced to an
initial value problem with discrete time if we set
✓ ◆ ✓ ◆ ✓ ◆
sn a 0 1
x(n) = , w= , A= .
sn+1 b d c

Computing the characteristic 2 cz d. If c2 + 4d = 0, the


✓ ◆polynomial, c A ( z) = z
c/2 1
JCF of A is J = . Let q = c/2. Then d = q2 and we are dealing with
0 c/2
the arithmo-geometric sequence ( a, b, 2q, ✓q2 ). Let us◆find the closed formula for sn
0 1
in this case using Jordan forms. As A = 2 one can choose the Jordan
q 2q
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
0 1 1 0 1 1 0
basis e2 = , e1 = . If P = then P = and
1 q q 1 q 1
✓ n ◆ ✓ ◆
n 1 n n 1 q nqn 1 1 (1 n)qn nqn 1
A = ( PJP ) = PJ P = P P = .
0 qn nqn+1 (1 + n)qn

This gives the closed formula for arithmo-geometric sequence we were seeking:

n)qn a + nqn 1 b.
sn = (1
✓ p ◆
2 (c + c2 + 4d)/2 p 0
If c + 4d 6= 0, the JCF of A is and
0 (c c2 + 4d)/2
the closed formula for sn will involve the sum of two geometric sequences. Let us
7A nonautonomous system would be described by x(n + 1) = A(n) x(n) here.
8 Closed means non-recursive, for instance, sn = a + n(b a) for the arithmetic sequence
November 27, 2014 MA251 Algebra I 19

see it through for Fibonacci and Lucas numbers using Lagrange’s polynomial.pSince
c = d = 1, c2 + 4d p = 5 and the roots of c A ( z) are the goldenpratio = (1 + 5)/2
and 1 = (1 5)/2. It is useful to observe that 2 1 = 5 and (1 ) = 1.
Let us introduce the number µn = n (1 )n . Suppose the Lagrange interpolation
of zn at the roots of z2 z 1 is h( z) = ↵z + . The condition on the coefficients is
given by ⇢ n
= h( ) = ↵ +
(1 )n = h(1 ) = ↵ (1 )+
p p
Solving them gives ↵ = µn / 5 and = µn 1 / 5. It follows that
✓ p p ◆
n
p p µn 1 /p 5 µn / 5 p
A = ↵A + = µn / 5A + µn 1 / 5I2 = .
µn / 5 (µn + µn 1 )/ 5
✓ ◆ ✓ ◆
Fn n 0
Since =A , it immediately implies that
Fn+1 1
✓ ◆ p
n Fn 1 Fn
A = and Fn = µn / 5 .
Fn Fn+1
✓ ◆ ✓ ◆
Ln n 2
Similarly for the Lucas numbers, we get =A and
Ln+1 1
p
Ln = 2Fn 1 + Fn = Fn 1 + Fn+1 = (µn 1 + µn+1 )/ 5.

2.12 Functions of matrices


We restrict to K = R in this section. Let us consider a power series Ân an zn , an 2 R
with a positive radius of convergence ". It defines a function f : ( ", ") ! R by
1
f ( x) = Â an xn
n=0

and the power series is Taylor’s series of f ( x) at zero. In particular,

an = f [n] (0) = f (n) (0)/n!,


where f [n] ( z) = f (n) ( z)/n! is a divided derivative. We extend the function f ( z) to
matrices by the formula
1
f ( A) = Â f [n] ( 0 ) An .
n=0
The right hand side of this formula is a matrix whose entries are series. All these
series need to converge for f ( A) to be well defined. If the norm9 of A is less than "
then f ( A) is well defined. Alternatively, if all eigenvalues of A belong to ( ", ") then
f ( A) is well defined as can be seen from the JCF method of computing f ( A). If
0 1
Jk1 , 1 0 ··· 0
B 0 Jk2 , 2 · · · 0 C
B C
J=B .. C = P 1 AP
@ . A
0 0 ··· Jkt , t

is the JCF of A then


0 1
f ( Jk1 , 1 ) 0 ··· 0
B 0 f ( Jk2 , ) ··· 0 C
1 B 2 C
f ( A) = P f ( J ) P = PB .. C
@ . A
0 0 ··· f ( Jkt , t )
9 this notion is beyond the scope of this module and will be discussed in Differentiation
20 MA251 Algebra I November 27, 2014

while 0 1
f( ) f [1] ( ) ··· f [k 1]
( )
B 2] C
B 0 f( ) ··· f [k ( ) C
f ( Jk, ) = B C.
B .. C
@ . A
0 0 ··· f( )

Lagrange’s method works as well:


Proposition 43. Let an 2 R for n 0. Suppose that the radius of convergence of
f ( x) = Ân 0 an xn is " > 0.
Let A 2 Rk,k and 2 R[ x] be such that ( A) = 0 and splits in R[ x] and all roots
of are in ( ", ").
Let h 2 R[ x] (Lagrance interpolation) be the unique polynomial of degree < deg( )
such that h(`) (↵ ) = f (`) (↵ ) whenever ( x ↵ )`+1 | . Then f ( A) = h( A).
Proof. In analysis people prove that there exists a unique function q on ( ", "), given
by a converging Taylor series, such that f ( x) h( x) = ( x) q( x). Now set x = A. ⇤
Recall that Taylor’s series for exponent e x = Â1 n
n=0 x / n! converges for all x.
A n
Consequently the matrix exponent e = Ân=0 A /n! is defined for all real m-by-m
1

matrices A. Let us compute e A for the matrix A from example 41.


Suppose the Lagrange interpolation of e z at the roots of µ A ( z) = ( z + 2)2 is
h( z) = ↵z + . The condition on the coefficients is given by
⇢ 2
e = h( 2) = 2↵ +
e 2 = h0 ( 2) = ↵

Solving them gives ↵ = e 2 and = 3e 2. It follows that


0 2 1
e 0 0 0
B 2 2
2 2 0 e e 0 C
eA = e A + 3e I=B
@ 2
C.
0 0 e 0 A
2 2 2
e 0 2e e

2.13 Applications to differential equations


Let us now consider an initial value problem for an autonomous system with contin-
uous time:
dx(t)
= Ax(t), t 2 [0, 1), x(0) = w.
dt
Here A 2 Rn⇥n , w 2 Rn are given, x : R 0 ! Rn is a smooth function to be found.
One thinks of x(t) as a state of the system at time t. The solution to this problem is

x(t) = etA w

because, as one can easily check,


✓ ◆
d d tn n tn 1 tk
dt
( x(t)) = Â dt n!
A w =Â
(n 1)!
An w = A Â Ak w = Ax(t).
k!
n n k

Example 44. Let us consider a harmonic oscillator described by equation y00 (t) +
y(t) = 0. The general solution y(t) = ↵ sin(t) + cos(t) is well known. Let us
obtain it using matrix exponents. Setting
✓ ◆ ✓ ◆
y(t) 0 1
x(t) = , A=
y0 (t) 1 0
November 27, 2014 MA251 Algebra I 21

the harmonic oscillator becomes the initial value problem with a solution x(t) =
etA x(0). The eigenvalues of A are i and i. Interpolating e zt at these values of z gives
the following condition on h( z) = ↵z +

eit = h (i ) = ↵i +
e it = h( i) = ↵i +

Solving them gives ↵ = (eit e it )/ 2i = sin(t) and = (eit + e it )/ 2 = cos(t). It


follows that ✓ ◆
tA cos(t) sin(t)
e = sin(t) A + cos(t) I2 =
sin(t) cos(t)

and y(t) = cos(t) y(0) + sin(t) y0 (0).


Example 45. Let us consider a system of differential equations
8 0 8
< y1 = y1 3y3 < y1 (0) = 1
y0 = y1 y2 6y3 with initial condition y (0) = 1
: 20 : 2
y3 = y1 + 2y2 + 5y3 y3 (0) = 0

Using matrices
0 1 0 1 0 1
y1 (t) 1 1 0 3
x(t) = @ y2 (t) A, w= @ 1 A, A=@ 1 1 6 A ,
y3 (t) 0 1 2 5

it becomes an initial value problem. The characteristic polynomial is c A ( z) = z3 +


5z2 8z + 4 = (1 z)(2 z)2 . We need to interpolate etz at 1 and 2 by h( z) =
↵z2 + z + . At the multiple root 2 we need to interpolate up to order 2 that involves
tracking the derivative (etz )0 = tetz :
8 t
< e = h(1) = ↵+ +
e 2t = h(2) = 4↵ + 2 +
: 2t
te = h0 (2) = 4↵ +

Solving, ↵ = (t 1)e2t + et , = (4 3t)e2t 4et , = (2t 3)e2t + 4et . It follows


that 0 1 0 1
3t 3 6t + 6 9t + 6 4 6 6
etA = e2t @ 3t 2 6t + 4 9t + 3 A + et @ 2 3 3 A
t 2t 3t + 1 0 0 0

and 0 1 0 1 0 1
y1 (t) 1 (3 3t)e2t 2et
x(t) = @ y2 (t) A = etA @ 1 A = @ (2 3t)e2t et A .
y3 (t) 0 te2t

3 Bilinear Maps and Quadratic Forms


3.1 Bilinear maps: definitions
Definition 46. Let V and W be vector spaces over a field K. A bilinear map on W
and V is a map ⌧ : W ⇥ V ! K such that
(a) ⌧ (↵1 w1 + ↵2 w2 , v) = ↵1 ⌧ (w1 , v) + ↵2 ⌧ (w2 , v) and
(b) ⌧ (w, ↵1 v1 + ↵2 v2 ) = ↵1 ⌧ (w, v1 ) + ↵2 ⌧ (w, v2 )
for all w, w1 , w2 2 W, v, v1 , v2 2 V, and ↵1 , ↵2 2 K.
22 MA251 Algebra I November 27, 2014

Notice the difference between linear and bilinear maps. For instance, let V =
W = K. Addition is a linear map but not bilinear. On the other hand, multiplication
is bilinear but not linear.
Let us choose a basis E = (e1 , . . . , en ) of V and a basis F = ( f 1 , . . . , f m ) of W.
Let ⌧ : W ⇥ V ! K be a bilinear map, and let ↵i j = ⌧ ( f i , e j ), for 1  i  m,
1  j  n. The m ⇥ n matrix A = (↵i j ) is called the matrix of ⌧ with respect to the
bases E and F.
Let v 2 V, w 2 W and write v = x1 e1 + · · · + xn en and w = y1 f 1 + · · · + ym f m .
Recall our notation from section 1.2
0 1 0 1
x1 y1
B . C B . C
[ E, v] = @ .. A 2 K n,1 , and [ F, w] = @ .. A 2 K m,1
xn ym

Then, by using the equations (a) and (b) above, we get


m n m n
⌧ (w, v) = ÂÂ yi ⌧ ( f i , e j ) x j = Â Â yi ↵i j x j = [ F, w]T A[E, v] (47)
i =1 j=1 i =1 j=1

For 2
✓ example,
◆ let V = W = R and use the natural basis of V. Suppose that
1 1
A= 2 0
. Then
✓ ◆✓ ◆
1 1 x1
⌧ (( y1 , y2 ), ( x1 , x2 )) = ( y1 , y2 ) 2 0 x2
= y1 x1 y1 x2 + 2y2 x1 .

Proposition 48. Let E be a basis of V and F of W. Write n = dim V, m = dim W.


Then there is a bijection from the set of bilinear forms on W ⇥ V to K m,n , taking ⌧ to its
matrix with respect to ( E, F ).
Proof. Exercise. ⇤

3.2 Bilinear maps: change of basis


Theorem 49. Let A be the matrix of a bilinear map ⌧ : W ⇥ V ! K with respect to the
bases E1 and F1 of V and W, and let B be its matrix with respect to the bases E2 and
F2 of V and W. Consider the basis change matrices P = [ E1 , 1, E2 ] and Q = [ F1 , 1, F2 ].
Then B = QT AP.
Proof. Recall (CD )T = DT CT whenever C 2 K p,q and D 2 K q,r . By (47)

[ F2 , y]T B[ E2 , x] = ⌧ ( x, y) = [ F1 , x]T A[ E1 , y]
T
= [ F1 , 1, F2 ][ F2 , y] A [ E1 , 1, E2 ][ E2 , x]
T
= Q [ F2 , y] A P [ E2 , x] = [ F2 , y]T QT A P [ E2 , x]

for all ( x, y) 2 V ⇥ W. By proposition 48 it follows that B = QT AP. ⇤


Compare this result with the formula B = QAP of corollary 4.
Definition 50. A bilinear map ⌧ : V ⇥ V ! K (so ‘V = W’) is called a bilinear form
on V.
From now on we shall mainly be concerned with bilinear forms rather than bilin-
ear maps.
Instead of saying that the matrix of a bilinear form is with respect to ( E, E) we
just say that it is with respect to E. Theorem 49 immediately yields:
Theorem 51. Let A be the matrix of a bilinear form ⌧ on V with respect to a basis E1
of V, and let B be its matrix with respect to a basis E2 of V. Consider the change of basis
matrix P = [ E1 , 1, E2 ]. Then B = PT AP. ⇤
November 27, 2014 MA251 Algebra I 23

So, in the example at the ✓end of Subsection



3.1,

if we choose

the new basis e01 =
1 1 0 1
(1 1), e02 = (1 0) then P = 1 0
, PT AP = 2 1
, and

⌧ y01 e01 + y02 e02 , x01 e01 + x02 e02 = y01 x02 + 2y02 x01 + y02 x02 .

Definition 52. Two square matrices A and B are called congruent if there exists an
invertible matrix P with B = PT AP.
Compare this with similarity of matrices which is defined by the formula B =
P 1 AP.

Definition 53. A bilinear form ⌧ on V is called symmetric if ⌧ (w, v) = ⌧ (v, w) for all
v, w 2 V. An n ⇥ n matrix A is called symmetric if AT = A.
Proposition 54. Let E be a finite basis of a vector space V. Then, a bilinear form ⌧ on
V is symmetric if and only if its matrix with respect to E is symmetric.
Proof. Easy exercise. ⇤
Example 55. The best known example of a bilinear form is when V = Rn , and ⌧ is
defined by

⌧ (( x1 , x2 , . . . , xn ), ( y1 , y2 , . . . , yn )) = x1 y1 + x2 y2 + · · · + xn yn .

The matrix of this bilinear form with respect to the standard basis of Rn is the iden-
tity matrix I_n. Geometrically, it is equal to the usual scalar product τ(v, w) = |v| |w| cos θ, where θ is the angle between the vectors v and w.

3.3 Quadratic forms: introduction


A quadratic form on K n is a polynomial function of several variables x1 , . . . , xn in
which each term has total degree two, such as 3x2 + 2xz + z2 4yz + xy. One mo-
tivation to study them comes from the geometry of curves or surfaces defined by
quadratic equations.
As an example, consider the equation 5x² + 5y² − 6xy = 2. See figure 2(a). This represents an ellipse, in which the two principal axes are at an angle of π/4 with the x- and y-axes. To study such curves in general, it is desirable to change variables (which will turn out to be equivalent to a change of basis) so as to make the principal axes of the ellipse coincide with the x- and y-axes. This is equivalent to eliminating the xy-term in the equation. We can do this easily by completing the square.
In the example

5x² + 5y² − 6xy = 2 ⟺ 5(x − 3y/5)² − 9y²/5 + 5y² = 2 ⟺ 5(x − 3y/5)² + 16y²/5 = 2,

so if we change variables, and put x′ = x − 3y/5 and y′ = y, then the equation becomes 5(x′)² + 16(y′)²/5 = 2. See figure 2(b).
Here we have allowed an arbitrary basis change. We shall study this situation in
section 3.5.
One disadvantage of doing this is that the shape of the curve has become distorted.
If we wish to preserve the shape, then we should restrict our basis changes to those
that preserve distance and angle. These are called orthogonal basis changes, and
we shall study that situation in section 3.6. In the example, we can use the change of variables x′ = (x + y)/√2, y′ = (x − y)/√2 (which represents a non-distorting rotation through an angle of π/4), and the equation becomes (x′)² + 4(y′)² = 1. See figure 3.
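A quick symbolic check of this change of variables (my own verification, not part of the notes), using sympy:

    import sympy as sp

    x1, y1 = sp.symbols("x1 y1")                 # the new coordinates x', y'
    x = (x1 + y1) / sp.sqrt(2)                   # invert x' = (x+y)/sqrt(2), y' = (x-y)/sqrt(2)
    y = (x1 - y1) / sp.sqrt(2)
    print(sp.expand(5*x**2 + 5*y**2 - 6*x*y))    # 2*x1**2 + 8*y1**2, i.e. x1^2 + 4*y1^2 = 1 when set equal to 2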
[Figure 2: (a) the ellipse 5x² + 5y² − 6xy = 2; (b) the ellipse 5(x′)² + 16(y′)²/5 = 2.]

[Figure 3: the ellipse (x′)² + 4(y′)² = 1.]

3.4 Quadratic forms: definitions


Definition 56. Let V be a finite-dimensional vector space. A function q : V ! K is
said to be a quadratic form on V if there exists a bilinear form ⌧ : V ⇥ V ! K such
that q(v) = ⌧ (v, v) for all v 2 V.
As this is the official definition of a quadratic form we will use, we do not really
need to observe that it yields the same notion for the standard vector space K n as the
definition in the previous section. However, it is a good exercise that an inquisitive
reader should definitely do. The key is to observe that the function xi x j comes from
the bilinear form ⌧i, j such that ⌧i, j (ei , e j ) = 1 and zero elsewhere.
In proposition 57 we need to be able to divide by 2 in the field K. This means that
we must assume10 that 1 + 1 6= 0 in K. For example, we would like to avoid the field
of two elements. If you prefer to avoid worrying about such technicalities, then you
can safely assume that K is either Q, R or C.
Let us consider the following three sets. The first set Q(V, K ) consists of all
quadratic forms on V. It is a subset of the set of all functions from V to K. The
second set Bil(V ⇥ V, K ) consists of all bilinear forms on V. It is a subset of the set
of all functions from V ⇥ V to K. Finally, we need Sym(V ⇥ V, K ), the subset of
Bil(V ⇥ V, K ) consisting of the symmetric bilinear forms.
There are two interesting functions connecting these sets:

Bil(V ⇥ V, K ) ! Q(V, K ) ! Sym(V ⇥ V, K ) :


(⌧ ) (v) = ⌧ (v, v), (q) (u, v) = q(u + v) q(u) q(v).
Proposition 57. The following hold for all q 2 Q(V, K ) and ⌧ 2 Sym(V ⇥ V, K ):
(a) (q) 2 Sym(V ⇥ V, K ),
(b) ( (q)) = 2q,
(c) ( (⌧ )) = 2⌧,
(d) If 1 + 1 6= 0 in K then : Q(V, K ) ! Sym(V ⇥ V, K ) is bijective.
Proof. Proof of (a). Let be a bilinear form on V such that q = ( ). So q(v) =
(v, v) for all v. Then
(q) (v, w) = q(v + w) q(v) q(w)
= (v + w, v + w) (v, v) (w, w) = (v, w) + (w, v) (58)
for all v, w. Prove yourself that (w, v) and (v, w) + (w, v) are bilinear expressions
in (v, w). This proves (a).
Proof of (b). Using the notation of (a) (q) (v) = (v, v) + (v, v) = 2 (v, v) =
2 q(v) for all v so (q) = 2q as required.
Proof of (c). Write r = (⌧ ). So (r) (v, w) = ⌧ (v, w) + ⌧ (w, v) by (58).
Therefore (⌧ ))(v, w) = ⌧ (v, w) + ⌧ (w, v) = 2 ⌧ (v, w) because ⌧ is symmetric.
This proves (c).
Proof of (d). By (b) and (c) /2 is an inverse to . ⇤
Let ⌧ : V ⇥ V ! K be a symmetric bilinear form. Let E = (e1 , . . . , en ) be a basis
of V. Recall that the coordinates of v with respect to E are defined to be the scalars
xi such that v = Âin=1 xi ei .
Let A = (↵i j ) be the matrix of ⌧ with respect to E. We will also call A the matrix
of q := (⌧ ) with respect to this basis. Then A is symmetric because ⌧ is, and by (47)
n n
q(v) = [ E, v]T A [ E, v] = Â Â xi ↵i j x j = Â ↵ii xi2 + 2 Â ↵i j xi x j .
i =1 j=1 1 i  n 1 i < j  n

10 Fields
with 1 + 1 = 0 are fields of characteristic 2. One can actually do quadratic and bilinear forms
over them but the theory is quite specific. It could be a good topic for a second year essay.

When n ≤ 3, we shall usually write x, y, z instead of x_1, x_2, x_3. For example, if n = 2 and A = \begin{pmatrix} 1 & 3 \\ 3 & -2 \end{pmatrix}, then q(v) = x² − 2y² + 6xy.
Conversely, if we are given a quadratic form as in the right hand side of the formula displayed above, then it is easy to write down its matrix A. For example, if n = 3 and q(v) = 3x² + y² − 2z² + 4xy − xz, then A = \begin{pmatrix} 3 & 2 & -1/2 \\ 2 & 1 & 0 \\ -1/2 & 0 & -2 \end{pmatrix}.
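Passing from a quadratic polynomial to its symmetric matrix can be automated: A is half the Hessian of q. A short sympy sketch (my own illustration) checking the example above:

    import sympy as sp

    x, y, z = sp.symbols("x y z")
    q = 3*x**2 + y**2 - 2*z**2 + 4*x*y - x*z
    A = sp.hessian(q, (x, y, z)) / 2         # diagonal = coefficients of the squares,
    print(A)                                  # off-diagonal = half the mixed coefficients
    v = sp.Matrix([x, y, z])
    print(sp.expand((v.T * A * v)[0] - q))    # 0, confirming q(v) = [E,v]^T A [E,v]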

3.5 Change of variable under the general linear group


Remark 59. Let V be a finite-dimensional vector space. A cobasis of V is a sequence
( f 1 , . . . , f n ) of linear maps f i : V ! K such that for every linear map g : V ! K
there are unique scalars ↵i such that g = ↵1 f 1 + · · · + ↵n f n (recall that this means
g( x) = ↵1 f 1 ( x) + · · · + ↵n f n ( x) for all x 2 V).
We also say that ( f 1 , . . . , f n ) is a system of linear coordinates on V. We sometimes
abuse notation and write f i instead of f i ( x), as in { x 2 V | f 1 + 2 f 3 = 0}.
A basis (e_1, ..., e_n) and cobasis (f_1, ..., f_n) of V are said to be dual to each other (or correspond to each other) if f_i(e_j) = δ_ij for all i, j.
Sometimes a cobasis is more useful than a basis. For example, if ( f 1 , . . . , f n ) is a
cobasis and α_ij ∈ K for 1 ≤ i ≤ j ≤ n then

Σ_{1 ≤ i ≤ j ≤ n} α_ij f_i f_j

is a quadratic form on V.
It is for the sake of simplicity that we have introduced cobasis in the foregoing.
The grown-up way to say ‘cobasis of V’ is ‘basis of the dual of V’.
Theorem 60. Assume that 2 6= 0 in K.
Let q be a quadratic form on V and write dim(V ) = n. Then there are ↵1 , . . . ,
↵n 2 K and a basis F of V such that q(v) = Âin=1 ↵i yi2 , where the yi are the coordinates
of v with respect to F.
Equivalently, any symmetric matrix is congruent to a diagonal one.
Proof. This is by induction on n. There is nothing to prove when n = 1. As usual,
let A = (↵i j ) be the matrix of q with respect to an initial basis e1 , . . . , en and write
v = ⌃i xi ei .
Case 1. First suppose that α_11 ≠ 0. As in the example in subsection 3.3, we can complete the square. Since 2 ≠ 0 we can write

q(v) = α_11 ( x_1² + 2β_2 x_1 x_2 + ··· + 2β_n x_1 x_n ) + q_0(v),

where q_0 is a quadratic form involving only the coordinates x_2, ..., x_n and the β_i ∈ K.
We make the change of coordinates

y_1 = x_1 + β_2 x_2 + ··· + β_n x_n,   y_i = x_i for 2 ≤ i ≤ n.

Then q(v) = α_11 y_1² + q_1(v) for another quadratic form q_1(v) involving only y_2, ..., y_n.
By the inductive hypothesis (applied to the subspace of V spanned by e_2, ..., e_n), we can change the coordinates of q_1 from y_2, ..., y_n to z_2, ..., z_n, say, to bring it to the required form, and then we get q(v) = Σ_{i=1}^n α_i z_i² (where z_1 = y_1) as required.
Case 2. α_11 = 0 but α_ii ≠ 0 for some i > 1. In this case, we start by interchanging e_1 with e_i (or equivalently x_1 with x_i), which takes us back to case 1.
Case 3. α_ii = 0 for all i. If α_ij = 0 for all i and j then there is nothing to prove, so assume that α_ij ≠ 0 for some i, j. Then we start by making a coordinate change x_i = y_i + y_j, x_j = y_i − y_j, x_k = y_k for k ≠ i, j. This introduces terms 2α_ij(y_i² − y_j²) into q, taking us back to case 2. ⇤
Example 61. Let n = 3 and q(v) = xy + 3yz − 5xz, so A = \begin{pmatrix} 0 & 1/2 & -5/2 \\ 1/2 & 0 & 3/2 \\ -5/2 & 3/2 & 0 \end{pmatrix}.
Since we are using x, y, z for our variables, we can use x_1, y_1, z_1 (rather than x′, y′, z′) for the variables with respect to a new basis, which will make things typographically simpler!
We are in Case 3 of the proof above, and so we start with a coordinate change x = x_1 + y_1, y = x_1 − y_1, z = z_1, which corresponds to the basis change matrix P_1 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. Then we get q(v) = x_1² − y_1² − 2x_1z_1 − 8y_1z_1.
We are now in Case 1 of the proof above, and the next basis change, from completing the square, is x_2 = x_1 − z_1, y_2 = y_1, z_2 = z_1, or equivalently, x_1 = x_2 + z_2, y_1 = y_2, z_1 = z_2, and then the associated basis change matrix is P_2 = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, and q(v) = x_2² − y_2² − 8y_2z_2 − z_2².
We now proceed by induction on the 2-coordinate form in y_2, z_2, and completing the square again leads to the basis change x_3 = x_2, y_3 = y_2 + 4z_2, z_3 = z_2, which corresponds to the basis change matrix P_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix}, and q(v) = x_3² − y_3² + 15z_3².
The total basis change in moving from the original basis with coordinates x, y, z to the final basis with coordinates x_3, y_3, z_3 is

P = P_1 P_2 P_3 = \begin{pmatrix} 1 & 1 & -3 \\ 1 & -1 & 5 \\ 0 & 0 & 1 \end{pmatrix},

and you can check that P^T A P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 15 \end{pmatrix}, as expected.
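A quick sympy verification of this computation (my own check, not part of the notes):

    import sympy as sp

    half = sp.Rational(1, 2)
    A = sp.Matrix([[0, half, -5*half],
                   [half, 0, 3*half],
                   [-5*half, 3*half, 0]])    # matrix of q(v) = xy + 3yz - 5xz
    P = sp.Matrix([[1, 1, -3],
                   [1, -1, 5],
                   [0, 0, 1]])               # total basis change P = P1 P2 P3
    print(P.T * A * P)                       # Matrix([[1, 0, 0], [0, -1, 0], [0, 0, 15]])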

Lemma 62. Let P, A 2 K n,n with P invertible. Then rank( A) = rank( PT AP).
Proof. In MA106 we proved that rank( A) = rank( QAP) for any invertible matrices
P, Q. Well, PT is also invertible and ( PT ) 1 = ( P 1 )T . Setting Q := PT finishes the
proof. ⇤
Definition 63. Assume 2 6= 0 in K and q 2 Q(V, K ). We define rank(q) to be the
rank of the matrix of q with respect to a basis E of V. This rank doesn’t depend on E
by theorem 51 and lemma 62.
If PT AP is diagonal, then its rank is equal to the number of non-zero terms on the
diagonal.
Remark 64. Notice that both statements of theorem 60 fail in characteristic 2. Let K
be the field of two elements. The quadratic form q( x, y) = xy cannot be diagonalised.
Similarly, the symmetric matrix \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} is not congruent to a diagonal matrix. Do it as an exercise: there are 6 possible changes of variable (you need to choose two among the three possible variables x, y and x + y) and you can observe directly what happens with each change of variables.
Proposition 65. Any quadratic form q over C has the form q(v) = Âri=1 yi2 with respect
to a suitable basis, where r = rank(q).
Equivalently, for any symmetric matrix A 2 Cn,n , there is an invertible matrix P 2
Cn,n such that PT AP = B, where B = ( i j ) is a diagonal matrix with ii = 1 for
1  i  r, ii = 0 for r + 1  i  n, and r = rank( A).
Proof. By theorem 60 there are coordinates x_i such that q(v) = Σ_{i=1}^n α_ii x_i².
We may assume that α_ii ≠ 0 for 1 ≤ i ≤ r and α_ii = 0 for r + 1 ≤ i ≤ n, where r = rank(q) (otherwise we permute the coordinates).
Finally we make a coordinate change y_i = √α_ii · x_i (1 ≤ i ≤ r), giving q(v) = Σ_{i=1}^r y_i². ⇤
When K = R, we cannot take square roots of negative numbers, but we can replace each positive α_i by 1 and each negative α_i by −1 to get:
Proposition 66 (Sylvester's theorem). Any quadratic form q over R has the form q(v) = Σ_{i=1}^t x_i² − Σ_{i=1}^u x_{t+i}² with respect to a suitable basis, where t + u = rank(q).
Equivalently, given a symmetric matrix A ∈ R^{n,n}, there is an invertible matrix P ∈ R^{n,n} such that P^T A P = B, where B = (β_ij) is a diagonal matrix with β_ii = 1 for 1 ≤ i ≤ t, β_ii = −1 for t + 1 ≤ i ≤ t + u, and β_ii = 0 for t + u + 1 ≤ i ≤ n, and t + u = rank(A). ⇤
We shall now prove that the numbers t and u of positive and negative terms are
invariants of q, that is, don’t depend on the basis. The difference t u is called the
signature of q.
Theorem 67 (Sylvester’s Law of Inertia). Let q be a quadratic form on the vector space
V over R. Suppose that E and F are two bases of V with respective coordinates xi and
yi such that

q(v) = Σ_{i=1}^t x_i² − Σ_{i=1}^u x_{t+i}² = Σ_{i=1}^{t′} y_i² − Σ_{i=1}^{u′} y_{t′+i}².   (68)

Then t = t′ and u = u′.
Proof. Write n = dim(V). We know that t + u = rank(q) = t′ + u′, so it is enough to prove that t = t′. Suppose not, and suppose that t > t′. Let

V_1 = {v ∈ V | x_{t+1} = x_{t+2} = ... = x_n = 0}
V_2 = {v ∈ V | y_1 = y_2 = ... = y_{t′} = 0}.

Then

dim(V_1 ∩ V_2) = dim(V_1) + dim(V_2) − dim(V_1 + V_2) = t + (n − t′) − dim(V_1 + V_2) ≥ t + (n − t′) − dim(V) = t − t′ > 0

and there is a non-zero vector v ∈ V_1 ∩ V_2. But it is easily seen from (68) that 0 ≠ v ∈ V_1 ⇒ q(v) > 0 and 0 ≠ v ∈ V_2 ⇒ q(v) ≤ 0. This contradiction completes the proof. ⇤
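In practice t and u can be read off from the signs of the eigenvalues of the matrix of q, since the orthogonal diagonalisation of section 3.6 below produces a congruent diagonal matrix whose entries are those eigenvalues. A numpy sketch (my own illustration, using the form of example 61):

    import numpy as np

    A = np.array([[0., 0.5, -2.5],
                  [0.5, 0., 1.5],
                  [-2.5, 1.5, 0.]])          # matrix of q(v) = xy + 3yz - 5xz
    eig = np.linalg.eigvalsh(A)              # real eigenvalues of a symmetric matrix
    t = int(np.sum(eig > 1e-12))
    u = int(np.sum(eig < -1e-12))
    print(t, u, t - u)                       # 2 1 1: rank 3, signature 1, matching x3^2 - y3^2 + 15*z3^2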

3.6 Change of variable under the orthogonal group


In subsection 3.6, we assume throughout that K = R.
Definition 69. A quadratic form q on V is said to be positive definite if q(v) > 0
whenever 0 6= v 2 V. The associated symmetric bilinear form ⌧ = (q) is also called
positive definite when q is.
Proposition 70. Let q be a positive definite quadratic form on V and put n = dim(V ).
Then there are coordinates xi on V such that q(v) = x21 + · · · + x2n .
Equivalently, the matrix of q with respect to a suitable basis is In .
Proof. Easy using proposition 66. ⇤
Definition 71. A vector space V over R together with a positive definite symmetric
bilinear form ⌧ is called a Euclidean space. We write v · w instead of ⌧ (v, w).
Definition 72. Let ⌧ be a bilinear form on a vector space V. A sequence of vectors
F = ( f 1 , . . . , f n ) in V is said to be orthonormal if ⌧ ( f i , f j ) = i j for all i, j.

Every Euclidean space admits an orthonormal basis by proposition 70.


We shall assume from now on that (V, ⌧ ) is a Euclidean space and we fix an
orthonormal basis E = (e1 , . . . , en ) of V.
Note that v · w = [ E, v]T [ E, w] for all v, w 2 V.
Definition 73. A linear operator T : V ! V is said to be orthogonal if T (v) · T (w) =
v · w for all v, w 2 V. In words, it preserves the scalar product on V.
Definition 74. An n ⇥ n matrix A is called orthogonal if AT A = In .
Proposition 75. A linear operator T : V ! V is orthogonal if and only if [ E, T, E] is
orthogonal (recall our assumption that E is orthonormal).
Proof. Write A := [E, T, E]. For all v, w ∈ V

T(v) · T(w) = [E, Tv]^T [E, Tw] = (A[E, v])^T (A[E, w]) = [E, v]^T A^T A [E, w]

and likewise v · w = [E, v]^T [E, w]. Therefore

T is orthogonal ⟺ T(v) · T(w) = v · w for all v, w ∈ V
⟺ [E, v]^T A^T A [E, w] = [E, v]^T I_n [E, w] for all v, w ⟺ A^T A = I_n. ⇤

Proposition 76. Every orthogonal matrix A is invertible. Equivalently, every orthogo-


nal operator is invertible.
Proof. The identity AT A = In implies that AT is an inverse to A. ⇤
Proposition 77. Let T : V ! V be linear. Then T is orthogonal if and only if ( T (e1 ),
. . . , T (en )) is an orthonormal basis.
Proof. Proof of ). By proposition 76 T is invertible so ( T (e1 ), . . . , T (en )) is a basis.
Setting (v, w) = (ei , e j ) in the equation T (v) · T (w) = v · w gives T (ei ) · T (e j ) = i j
for all i, j as required.
Proof of (. Let v, w 2 V and write v = Âi ai ei , w = Â j b j e j . Using bilinearity of
the scalar product
⇣ ⌘ ⇣ ⌘
T (v) · T (w) = T Â ai ei · T Â b j e j = Â ai b j ( Tei · Te j ) = Â ai bi = v · w. ⇤
i j i, j i

Proposition 78. Let A 2 Rn,n and let c j denote the jth column of A. Then A is
orthogonal if and only if cTi c j = i j for all i, j.
Proof. Let c j denote the jth column of A. Then cTi is the ith row of AT . So the (i, j)th
entry of AT A is cTi c j . So AT A = In if and only if cTi c j = i j for all i, j. ⇤
Example 79. For any θ ∈ R, let A = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}. (This represents a counter-clockwise rotation through an angle θ.) Then it is easily checked that A^T A = A A^T = I_2. Notice that the columns of A are mutually orthogonal vectors of length 1, and the same applies to the rows of A.
Definition/Lemma 80 (Gram-Schmidt step). Let F = ( f 1 , . . . , f n ) be a basis of a
Euclidean space V whose first t 1 vectors are orthonormal. We define St ( F ) =
( f 1 , . . . , f t 1 , h, f t+1 , . . . , f n ) where
g = f_t − Σ_{i=1}^{t-1} (f_t · f_i) f_i,    h = g (g · g)^{-1/2}

(note that g · g ≠ 0 because F is independent). Then S_t(F) is a basis of V whose first


t vectors are orthonormal.
Proof. For 1 ≤ j < t,

g · f_j = f_t · f_j − Σ_{i=1}^{t-1} (f_t · f_i)(f_i · f_j) = f_t · f_j − f_t · f_j = 0.

Note that g ≠ 0 because F is independent. Therefore h := g (g · g)^{-1/2} is well-defined. We still have h · f_j = 0 whenever 1 ≤ j < t. We also have h · h = 1 by construction. ⇤
Definition 81. Let F be any basis of an n-dimensional Euclidean space V. Note that
Sn · · · S1 ( F ) is an orthonormal basis of V by lemma 80. Turning F into the latter is
called the Gram-Schmidt orthonormalisation process.
A slight variation of the Gram-Schmidt process replaces f_t by

f_t − Σ_{i=1}^{t-1} ((f_t · f_i)/(f_i · f_i)) f_i

in the tth step and rescales all vectors at the very end. This gives the same result as
the original Gram-Schmidt process but is faster in practice.
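A direct transcription of the Gram-Schmidt process into Python (my own sketch, for the standard inner product on R^n):

    import numpy as np

    def gram_schmidt(F):
        """Orthonormalise the rows of F (assumed linearly independent)."""
        ortho = []
        for f in np.asarray(F, dtype=float):
            g = f - sum((f @ e) * e for e in ortho)   # subtract projections onto earlier vectors
            ortho.append(g / np.sqrt(g @ g))           # rescale to length 1
        return np.array(ortho)

    Q = gram_schmidt([[1., 1., 0.], [1., 0., 1.], [0., 1., 1.]])
    print(np.round(Q @ Q.T, 10))                       # identity matrix: the rows are orthonormal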
Corollary 82. Let f 1 , . . . , f r be orthonormal vectors in an n-dimensional Euclidean vec-
tor space. Then they can be extended to an orthonormal basis ( f 1 , . . . , f n ).
Proof. We prove first that f_1, ..., f_r are linearly independent. Suppose Σ_{i=1}^r x_i f_i = 0 for some x_1, ..., x_r ∈ R and let j be such that 1 ≤ j ≤ r. Taking the scalar product with f_j gives 0 = f_j · 0 = Σ_{i=1}^r x_i (f_j · f_i) = x_j, since f_1, ..., f_r are orthonormal.
In MA106 you proved that these can be extended to a basis F = ( f 1 , . . . , f n ). Now
Sn · · · Sr+1 ( F ) is an orthonormal basis and includes f i for i  r as required. ⇤
Proposition 83. Let A 2 Rn,n be a real symmetric matrix. Then:
(a) All complex eigenvalues of A lie in R.
(b) If n > 0 then A has a real eigenvalue.
Proof. Proof of (a). For a column vector v or matrix B over C, we denote by v̄ or B̄ the result of replacing all entries of v or B by their complex conjugates. Since the entries of A lie in R, we have Ā = A.
Let λ be a complex eigenvalue of A and v an associated complex eigenvector. Then

Av = λv   (84)

so, taking complex conjugates and using Ā = A, we get

Av̄ = λ̄v̄.   (85)

Transposing (84) and using A^T = A gives

v^T A = λ v^T.   (86)

By (85) and (86) we have

λ v^T v̄ = v^T A v̄ = λ̄ v^T v̄

and so (λ − λ̄) v^T v̄ = 0.
But v^T v̄ ≠ 0 because writing v = (α_1, ..., α_n)^T we have v^T v̄ = α_1ᾱ_1 + ··· + α_nᾱ_n ∈ R_{>0} as v ≠ 0. Thus λ = λ̄, so λ ∈ R.
Proof of (b). In MA106 we proved that there exists a complex eigenvalue λ of A. By (a), λ is a real eigenvalue of A. ⇤

Before coming to the main theorem of this section, we recall the block sum A ⊕ B of matrices, which we introduced in section 2.5. It is straightforward to check that (A_1 ⊕ B_1)(A_2 ⊕ B_2) = (A_1A_2 ⊕ B_1B_2), provided the sizes of the matrices are such that A_1A_2 and B_1B_2 are defined.
Theorem 87. For any symmetric matrix A 2 Rn,n , there is an orthogonal matrix P
such that PT AP is a diagonal matrix.
Note PT = P 1 so that A is simultaneously congruent and similar to the diagonal
matrix PT AP.
Proof. Induction on n. For n = 1 there is nothing to prove. Assume it’s true for n 1.
We use the standard inner product on Rn defined by v · w = vT w and the stan-
dard (orthonormal) basis E = (e1 , . . . , en ) of Rn .
By proposition 83 there exists an eigenvector g ∈ R^n of A with eigenvalue, say, λ. Note that g · g > 0 and put f_1 = g (g · g)^{-1/2}. Then f_1 · f_1 = 1, so by corollary 82 there exists an orthonormal basis F of R^n whose first vector is f_1.
Put S = [E, 1, F]. Then S is orthogonal by proposition 77. So S^T = S^{-1}.
Put B = S^T A S. Then B is again symmetric. Moreover Be_1 = λe_1 because

Be_1 = S^{-1} A S e_1 = [F, 1, E] A [E, 1, F] e_1 = [F, 1, E] A f_1 = λ [F, 1, E] f_1 = λ e_1.

In other words, B is a block sum (λ) ⊕ C for some symmetric (n−1) × (n−1) matrix C.
By the induction hypothesis there exists an orthogonal (n−1) × (n−1) matrix Q such that Q^T C Q is diagonal. Putting P = S(1 ⊕ Q) we deduce

P^T A P = (1 ⊕ Q)^T S^T A S (1 ⊕ Q) = (1 ⊕ Q)^T B (1 ⊕ Q) = (1 ⊕ Q^T)((λ) ⊕ C)(1 ⊕ Q) = (λ) ⊕ (Q^T C Q)

which is diagonal. Also, P is orthogonal because

P^T P = (1 ⊕ Q)^T (S^T S)(1 ⊕ Q) = (1 ⊕ Q^T)(1 ⊕ Q) = 1 ⊕ Q^T Q = 1 ⊕ I_{n-1} = I_n. ⇤

The following is just a restatement of theorem 87:


Theorem 88. Let q be a (second) quadratic form defined on a Euclidean space V. Then
there exists an orthonormal basis F, with coordinates yi say, and scalars ↵i 2 R such
that q(v) = Âin=1 ↵i yi2 . Furthermore, the numbers ↵i are uniquely determined by q.
Proof. Fix an orthonormal basis E of V. Let A be the matrix of q with respect to
E. Then A is symmetric. By theorem 87 there exists an orthogonal matrix P 2 Rn,n
such that B := PT AP is diagonal. But B is the matrix of q with respect to the basis F
defined by P = [ E1F ]. So q is of the form q(v) = Âin=1 ↵i yi2 as stated. Moreover F is
orthonormal by proposition 77.
Finally, the ↵i are the eigenvalues of the matrix of q with respect to any orthonor-
mal basis and hence depend only on q. ⇤
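In computations, numpy's eigh performs exactly this orthogonal diagonalisation of a real symmetric matrix. A small sketch (my own illustration with a made-up matrix):

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 2.]])               # a symmetric matrix
    alphas, P = np.linalg.eigh(A)          # eigenvalues and an orthogonal matrix of eigenvectors
    print(alphas)                          # [1. 3.]
    print(np.round(P.T @ A @ P, 10))       # diag(1, 3)
    print(np.round(P.T @ P, 10))           # identity, so P is orthogonal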
Although it is not used in the proof of the theorem above, the following proposi-
tion is useful when calculating examples. It helps us to write down more vectors in
the final orthonormal basis immediately, without having to use corollary 82 repeat-
edly.
We use the standard inner product on Rn defined by v · w = vT w.
Proposition 89. Let A be a real symmetric matrix, and let 1 , 2 be two distinct eigen-
values of A, with corresponding eigenvectors v1 , v2 . Then v1 · v2 = 0.
Proof. We have

Av_1 = λ_1 v_1   (1)    and    Av_2 = λ_2 v_2   (2).

Transposing (1) and using A = A^T gives v_1^T A = λ_1 v_1^T, and so

v_1^T A v_2 = λ_1 v_1^T v_2   (3)    and by (2)    v_1^T A v_2 = λ_2 v_1^T v_2   (4).

Subtracting (3) from (4) gives (λ_2 − λ_1) v_1^T v_2 = 0. Since λ_2 − λ_1 ≠ 0 by assumption, we have v_1^T v_2 = 0. ⇤
In the following examples a real symmetric matrix A is given and we aim to find
an orthogonal P such that PT AP is diagonal.

Example 90. Let n = 2 and q(v) = x² + y² + 6xy, so A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}. Then

det(A − xI_2) = (1 − x)² − 9 = x² − 2x − 8 = (x − 4)(x + 2),

so the eigenvalues of A are 4 and −2. Solving Av = λv for λ = 4 and −2, we find corresponding eigenvectors (1 1)^T and (1 −1)^T. Proposition 89 tells us that these vectors are orthogonal to each other (which we can of course check directly!), so if we divide them by their lengths to give vectors of length 1, namely (1/√2  1/√2)^T and (1/√2  −1/√2)^T, then we get an orthonormal basis consisting of eigenvectors of A, which is what we want. The corresponding basis change matrix P has these vectors as columns, so P = (1/√2)\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, and we can check that P^T P = I_2 (hence P is orthogonal) and that P^T A P = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}.

Example 91. Let n = 3 and q(v) = 3x² + 6y² + 3z² − 4xy − 4yz + 2xz, so

A = \begin{pmatrix} 3 & -2 & 1 \\ -2 & 6 & -2 \\ 1 & -2 & 3 \end{pmatrix}.

Then, expanding by the first row,

det(A − xI_3) = (3 − x)(6 − x)(3 − x) − 4(3 − x) − 4(3 − x) + 4 + 4 − (6 − x)
= −x³ + 12x² − 36x + 32 = (2 − x)²(8 − x),

so the eigenvalues are 8, 2, 2.
For the eigenvalue 8, if we solve Av = 8v then we find a solution v = (1 −2 1)^T. Since 2 is a repeated eigenvalue, we need two corresponding eigenvectors, which must be orthogonal to each other. The equations Av = 2v all reduce to x − 2y + z = 0, and so any vector (x, y, z)^T satisfying this equation is an eigenvector for λ = 2. By proposition 89 these eigenvectors will all be orthogonal to the eigenvector for λ = 8, but we will have to choose them orthogonal to each other. We can choose the first one arbitrarily, so let's choose (1 0 −1)^T. We now need another solution that is orthogonal to this. In other words, we want x, y and z not all zero satisfying x − 2y + z = 0 and x − z = 0, and x = y = z = 1 is a solution. So we now have a basis (1 −2 1)^T, (1 0 −1)^T, (1 1 1)^T of three mutually orthogonal eigenvectors.
To get an orthonormal basis, we just need to divide by their lengths, which are, respectively, √6, √2, and √3, and then the basis change matrix P has these vectors as columns, so

P = \begin{pmatrix} 1/\sqrt{6} & 1/\sqrt{2} & 1/\sqrt{3} \\ -2/\sqrt{6} & 0 & 1/\sqrt{3} \\ 1/\sqrt{6} & -1/\sqrt{2} & 1/\sqrt{3} \end{pmatrix}.

It can then be checked that P^T P = I_3 and that P^T A P is the diagonal matrix with diagonal 8, 2, 2.
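The same check can be done in numpy (my own verification of example 91):

    import numpy as np

    A = np.array([[3., -2., 1.],
                  [-2., 6., -2.],
                  [1., -2., 3.]])
    evals, P = np.linalg.eigh(A)
    print(np.round(evals, 10))             # [2. 2. 8.]
    print(np.round(P.T @ A @ P, 10))       # diagonal matrix with entries 2, 2, 8
    # numpy may pick different eigenvectors for the repeated eigenvalue 2 than the ones
    # chosen above, but they still form an orthonormal basis of the same eigenspace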

3.7 Applications of quadratic forms to geometry


3.7.1 Reduction of the general second degree equation
Definition 92. A quadric in Rn is a subset of Rn of the form {v 2 Rn | f (v) = 0}
where f is a polynomial of degree 2, that is, of the form
f = Σ_{i=1}^n α_i x_i² + Σ_{i=1}^n Σ_{j=1}^{i-1} α_ij x_i x_j + Σ_{i=1}^n β_i x_i + γ   (93)

such that some α_i or α_ij is nonzero.

For example, both x21 + 1 = 0 and x21 + 2 = 0 define the same quadric: the empty
set.
A quadric in R2 is also called a quadric curve, in R3 a quadric surface.
An isometry of Rn is a permutation preserving distance. Equivalently, it’s the
composition of an orthogonal operator with a translation x 7! x + a.
Theorem 94. Any quadric in R^n can be moved by an appropriate isometry to one of the following:

Σ_{i=1}^r α_i x_i² = 0,    Σ_{i=1}^r α_i x_i² + 1 = 0,    Σ_{i=1}^r α_i x_i² + x_{r+1} = 0.

Proof. Step 1. By theorem 88, we can apply an orthogonal basis change (that is, an isometry of R^n that fixes the origin) which has the effect of eliminating the terms α_ij x_i x_j in (93).
Step 2. Whenever α_i ≠ 0, we can replace x_i by x_i − β_i/(2α_i), and thereby eliminate the term β_i x_i from the equation. This transformation is just a translation, which is also an isometry.
Step 3. If α_i = 0, then we cannot eliminate the term β_i x_i. Let us permute the coordinates such that α_i ≠ 0 for 1 ≤ i ≤ r, and β_i ≠ 0 for r + 1 ≤ i ≤ r + s. Then if s > 1, by using corollary 82, we can find an orthogonal transformation that leaves x_i unchanged for 1 ≤ i ≤ r and replaces Σ_{j=1}^s β_{r+j} x_{r+j} by βx_{r+1} (where β is the length of (β_{r+1}, ..., β_{r+s})), and then we have only a single non-zero β_i, namely β_{r+1}.
Step 4. Finally, if there is a non-zero γ and β_{r+1} = β ≠ 0, then we can perform the translation that replaces x_{r+1} by x_{r+1} − γ/β, and thereby eliminate γ. We have now reduced to one of two possible types of equation:

Σ_{i=1}^r α_i x_i² + γ = 0    and    Σ_{i=1}^r α_i x_i² + βx_{r+1} = 0.

By dividing through by γ or β, we can assume that γ = 0 or 1 in the first equation, and that β = 1 in the second. ⇤
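Steps 1 and 2 of this reduction are easy to carry out numerically when the quadratic part is invertible: translate to kill the linear terms, then diagonalise orthogonally. A sketch (my own illustration, not an algorithm from the notes), applied to 5x² + 5y² − 6xy − 2 = 0:

    import numpy as np

    A = np.array([[5., -3.],
                  [-3., 5.]])               # quadratic part, assumed invertible here
    b = np.array([0., 0.])                  # linear part
    c = -2.0                                # constant term of f(v) = v^T A v + b^T v + c
    s = -0.5 * np.linalg.solve(A, b)        # translation v -> v + s removing the linear terms
    gamma = c + s @ A @ s + b @ s           # new constant term
    alphas, P = np.linalg.eigh(A)           # orthogonal change of basis diagonalising A
    print(alphas, gamma)                    # [2. 8.] -2.0, i.e. 2u^2 + 8w^2 - 2 = 0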
A curve defined by the first or second equation is called a central quadric because it has central symmetry, that is, if a vector v satisfies the equation, then so does −v.
We shall now consider the types of curves and surfaces that can arise in the familiar cases n = 2 and n = 3. These different types correspond to whether the α_i are positive, negative or zero, and whether γ = 0 or 1.
We shall use x, y, z instead of x_1, x_2, x_3, and α, β, γ instead of α_1, α_2, α_3. We shall assume also that α, β, γ are all positive, and write −α, etc., for the negative case.

3.7.2 The case n = 2


When n = 2 we have the following possibilities.
(1) αx² = 0. This just defines the line x = 0 (the y-axis).
(2) αx² = 1. This defines the two parallel lines x = ±1/√α.
(3) −αx² = 1. This is the empty curve!
(4) αx² + βy² = 0. The single point (0, 0).
(5) αx² − βy² = 0. Two straight lines y = ±√(α/β) x, which intersect at (0, 0).
(6) αx² + βy² = 1. An ellipse.
(7) αx² − βy² = 1. A hyperbola.
(8) −αx² − βy² = 1. The empty curve again.
(9) αx² − y = 0. A parabola.

3.7.3 The case n = 3

[Figure 4: the elliptical cone x²/4 + y² − z² = 0.]

When n = 3, we still get the nine possibilities (1)–(9) that we had in the case
n = 2, but now they must be regarded as equations in the three variables x, y, z that
happen not to involve z.
So, in case (1), we now get the plane x = 0, in case (2) we get two parallel planes x = ±1/√α, in case (4) we get the line x = y = 0 (the z-axis), in case (5) two intersecting planes y = ±√(α/β) x, and in cases (6), (7) and (9), we get, respectively, elliptical, hyperbolic and parabolic cylinders.
The remaining cases involve all of x, y and z. We omit −αx² − βy² − γz² = 1, which is empty.
(10). αx² + βy² + γz² = 0. The single point (0, 0, 0).
(11). αx² + βy² − γz² = 0. See Figure 4.
This is an elliptical cone. The cross sections parallel to the xy-plane are ellipses of the form αx² + βy² = c, whereas the cross sections parallel to the other coordinate
planes are generally hyperbolas. Notice also that if a particular point ( a, b, c) is on
the surface, then so is t( a, b, c) for any t 2 R. In other words, the surface contains the
straight line through the origin and any of its points. Such lines are called generators.
[Figure 5: (a) the ellipsoid x² + 2y² + 4z² = 7; (b) the hyperboloid of one sheet x²/4 + y² − z² = 1.]

[Figure 6: the hyperboloid of two sheets x²/4 − y² − z² = 1.]

When each point of a 3-dimensional surface lies on one or more generators, it is


possible to make a model of the surface with straight lengths of wire or string.
(12). αx² + βy² + γz² = 1. An ellipsoid. See Figure 5(a).
(13). αx² + βy² − γz² = 1. A hyperboloid. See Figure 5(b).
There are two types of 3-dimensional hyperboloids. This one is connected, and is known as a hyperboloid of one sheet. Although it is not immediately obvious, each point of this surface lies on exactly two generators; that is, lines that lie entirely on the surface. For each λ ∈ R, the line defined by the pair of equations

√α x − √γ z = λ(1 − √β y);    λ(√α x + √γ z) = 1 + √β y

lies entirely on the surface; to see this, just multiply the two equations together. The same applies to the lines defined by the pairs of equations

√β y − √γ z = μ(1 − √α x);    μ(√β y + √γ z) = 1 + √α x.

It can be shown that each point on the surface lies on exactly one of the lines in each of these two families.
(14). αx² − βy² − γz² = 1. A hyperboloid of two sheets. See Figure 6. It does not have generators. Besides, it is easy to observe that it is disconnected. Substitute x = 0 into its equation. The resulting equation −βy² − γz² = 1 has no solutions. This means that the hyperboloid does not intersect the plane x = 0. A closer inspection confirms that the two parts of the hyperboloid lie on opposite sides of this plane: intersect the hyperboloid with the line y = z = 0 to see two points, one on each side.
(15). αx² + βy² − z = 0. An elliptical paraboloid. See Figure 7(a).
(16). αx² − βy² − z = 0. A hyperbolic paraboloid. See Figure 7(b). As in the case of the hyperboloid of one sheet, there are two generators passing through each point of this surface, one from each of the following two families of lines:

λ(√α x − √β y) = z;    √α x + √β y = λ.
μ(√α x + √β y) = z;    √α x − √β y = μ.
[Figure 7: (a) the elliptical paraboloid z = x²/2 + y²; (b) the hyperbolic paraboloid z = x² − y².]

3.8 Unitary, hermitian and normal matrices


In the rest of this chapter K = C unless stated otherwise. All our vector spaces are
finite-dimensional.
Definition 95. Let W, V be complex vector spaces. A map ⌧ : W ⇥ V ! C is called
sesquilinear 11 if
(a) τ(α_1 w_1 + α_2 w_2, v) = ᾱ_1 τ(w_1, v) + ᾱ_2 τ(w_2, v) and
(b) τ(w, α_1 v_1 + α_2 v_2) = α_1 τ(w, v_1) + α_2 τ(w, v_2)
for all w, w1 , w2 2 W, v, v1 , v2 2 V, and ↵1 , ↵2 2 C. Here z denotes the complex
conjugate of z. If W = V then ⌧ is called a sesquilinear form on V.
Let us choose a basis E = (e1 , . . . , en ) of V and a basis F = ( f 1 , . . . , f m ) of W.
Let ⌧ : W ⇥ V ! K be a sesquilinear map, and let ↵i j = ⌧ ( f i , e j ), for 1  i  m,
1  j  n. The m ⇥ n matrix A = (↵i j ) is called the matrix of ⌧ with respect to the
bases E and F.
Let A* = Ā^T denote the conjugate transpose of a matrix A. Similarly to (47) we have

τ(w, v) = Σ_{i=1}^m Σ_{j=1}^n ȳ_i τ(f_i, e_j) x_j = Σ_{i=1}^m Σ_{j=1}^n ȳ_i α_ij x_j = [F, w]* A [E, v].   (96)

The standard inner product on Cn is defined by v · w = v⇤ w rather than vT w. Note


that, for v 2 Rn , v⇤ = vT , so this definition is compatible with the one for real
vectors. The length |v| of a vector is given by |v|2 = v · v = v⇤ v, which is always a
non-negative real number.
We shall formulate several propositions that generalise results of the previous
sections to hermitian matrices. The proofs are very similar and left for you to fill in
as an exercise. The first two propositions are analogous to theorems 49 and 51.
Proposition 97. Let ⌧ : W ⇥ V ! C be sesquilinear. Let E1 , E2 be bases of V and F1 , F2
of W. Let A (respectively, B) be the matrix of ⌧ with respect to E1 , F1 (respectively,
E2 , F2 ). Then B = [ F1 , 1, F2 ]⇤ A[ E1 , 1, E2 ]. ⇤
Proposition 98. Let ⌧ : V ⇥ V ! C be sesquilinear. Let E1 , E2 be bases of V. Let A
(respectively, B) be the matrix of ⌧ with respect to E1 (respectively, E2 ). Then B =
[ E1 , 1, E2 ]⇤ A[ E1 , 1, E2 ]. ⇤
Definition 99. A matrix A 2 Cn,n is called hermitian if A = A⇤ . A sesquilinear form
⌧ on V is called hermitian if ⌧ (w, v) = ⌧ (v, w) for all v, w 2 V. ⇤
These are the complex analogues of symmetric matrices and symmetric bilinear
forms. The following proposition is an analogue of proposition 54.
Proposition 100. Let ⌧ : V ⇥ V ! C be sesquilinear and let E be a basis of V. Then ⌧
is hermitian if and only if the matrix of ⌧ with respect to E is hermitian. ⇤
Two hermitian matrices A and B are congruent if there exists an invertible matrix
P with B = P⇤ AP. A Hermitian quadratic form is a function q : V ! C such that there
exists a sesquilinear form ⌧ on V with q(v) = ⌧ (v, v) for all v 2 V. The following is
a hermitian version of Sylvester’s theorems (proposition 66 and theorem 67).
Proposition 101. Any hermitian quadratic form q has the form q(v) = Σ_{i=1}^t |x_i|² − Σ_{i=1}^u |x_{t+i}|² with respect to a suitable basis.
Equivalently, given a hermitian matrix A ∈ C^{n,n}, there is an invertible matrix P ∈ C^{n,n} such that P*AP = B, where B = (β_ij) is a diagonal matrix with β_ii = 1 for 1 ≤ i ≤ t, β_ii = −1 for t + 1 ≤ i ≤ t + u, and β_ii = 0 for t + u + 1 ≤ i ≤ n, and t + u = rank(A).
11 from Latin one and a half

The numbers t and u are uniquely determined by q (or A). ⇤


Similarly to the real case, t + u is called the rank of q (or A) and t − u the signature.
A hermitian quadratic form q on V is said to be positive definite if q(v) > 0 for all
nonzero v 2 V. By proposition 101 a positive definite hermitian form looks like the
standard inner product on Cn in some choice of a basis. A hermitian vector space is a
vector space over C equipped with a hermitian positive definite form.
Definition 102. A linear operator T : V ! V on a hermitian vector space (V, ⌧ ) is
said to be unitary if it preserves ⌧, that is, if ⌧ ( Tv, Tw) = ⌧ (v, w) for all v, w 2 V.
Definition 103. A matrix A 2 Cn,n is called unitary if A⇤ A = In .
Definition 104. A sequence of vectors (e1 , . . . , en ) in a hermitian space (V, ⌧ ) is or-
thonormal if ⌧ (ei , e j ) = i j for all i, j.
The following is an analogue of proposition 75.
Proposition 105. Let E be an orthonormal basis of a hermitian space (V, ⌧ ) and let
T : V ! V be linear. Then T is unitary if and only if [ ETE] is unitary. ⇤
The Gram-Schmidt process works perfectly well in the hermitian setting. In par-
ticular we obtain the following analogue to corollary 82:
Proposition 106. Let (V, ⌧ ) be a hermitian space of dimension n. Then any orthonor-
mal vectors f 1 , . . . , f r can be extended to an orthonormal basis ( f 1 , . . . , f n ). ⇤
Proposition 106 ensures the existence of orthonormal bases in hermitian spaces.
Proposition 83 and theorems 87 and 88 have analogues as well.
Proposition 107. Let A 2 Cn,n be a complex hermitian matrix. Then all complex
eigenvalues of A are real. If n > 0 then A has a real eigenvalue. ⇤
Theorem 108. Let q be a (second) hermitian quadratic form defined on a hermitian
space (V, ⌧ ). Then there exists an orthonormal basis F, with coordinates yi say, and real
scalars α_i ∈ R such that q(v) = Σ_{i=1}^n α_i |y_i|². Furthermore, the numbers α_i are uniquely
determined by q.
Equivalently, for any hermitian matrix A there is a unitary matrix P such that P⇤ AP
is a real diagonal matrix. ⇤
Notice the crucial difference between theorem 87 and theorem 108. In the former we start with a real matrix and end up with a real diagonal matrix. In the latter
we start with a complex matrix but still we end up with a real diagonal matrix. The
point is that theorem 88 admits a useful generalisation to a wider class of matrices.
Definition 109. A matrix A 2 Cn,n is called normal if AA⇤ = A⇤ A.
In particular, all diagonal, all hermitian and all unitary matrices are normal. Con-
sequently, all real symmetric and real orthogonal matrices are normal.
Lemma 110. If A 2 Cn,n is normal and P 2 Cn,n is unitary, then P⇤ AP is normal.
Proof. If B = P⇤ AP then using (CD )⇤ = D ⇤ C ⇤

BB⇤ = ( P⇤ AP)( P⇤ AP)⇤ = P⇤ APP⇤ A⇤ P


= P⇤ AA⇤ P = P⇤ A⇤ AP = ( P⇤ A⇤ P)( P⇤ AP) = B⇤ B. ⇤

Theorem 111. A matrix A 2 Cn,n is normal if and only if there exists a unitary matrix
P 2 Cn,n such that P⇤ AP is diagonal12 .
12 with complex entries

Proof. The “if” part follows from lemma 110 as diagonal matrices are normal.
For the “only if” part we proceed by induction on n. If n = 1, there is nothing to
prove. Let us assume we have proved the statement for all dimensions less than n.
The matrix A admits an eigenvector v ∈ C^n with eigenvalue λ. Let W be the vector subspace of all vectors x satisfying Ax = λx. If W = C^n then A is a scalar matrix and we are done. Otherwise, we have a nontrivial13 decomposition C^n = W ⊕ W⊥ where W⊥ = {v ∈ C^n | v*w = 0 for all w ∈ W}.
Let us notice that A*W ⊂ W because AA*x = A*Ax = A*(λx) = λ(A*x) for any x ∈ W.
It follows that AW⊥ ⊂ W⊥ since (Ay)*x = y*(A*x) ∈ y*W = 0, so (Ay)*x = 0 for all x ∈ W, y ∈ W⊥. Also AW ⊂ W.
Now choose orthonormal bases of W and W⊥. Together they form a new orthonormal basis of C^n. The change of basis matrix P is unitary, hence by lemma 110 the matrix P*AP = \begin{pmatrix} B & 0 \\ 0 & C \end{pmatrix} is normal. It follows that the matrices B and C are normal of smaller size and we can use the inductive hypothesis to complete the proof. ⇤
Theorem 111 is an extremely useful criterion for diagonalisability of matrices. To
find P in practice, we use similar methods to those used in the real case.
Example 112. Let A be the matrix A = \begin{pmatrix} 6 & 2+2i \\ 2-2i & 4 \end{pmatrix}. Then

c_A(x) = (6 − x)(4 − x) − (2 + 2i)(2 − 2i) = x² − 10x + 16 = (x − 2)(x − 8),

so the eigenvalues of A are 2 and 8. (We saw in proposition 107 that the eigenvalues of any hermitian matrix are real.) The corresponding eigenvectors are v_1 = (1 + i, −2)^T and v_2 = (1 + i, 1)^T. We find that |v_1|² = v_1* v_1 = 6 and |v_2|² = 3, so we divide by their lengths to get an orthonormal basis v_1/|v_1|, v_2/|v_2| of C². Then the matrix

P = \begin{pmatrix} (1+i)/\sqrt{6} & (1+i)/\sqrt{3} \\ -2/\sqrt{6} & 1/\sqrt{3} \end{pmatrix}

having this basis as columns is unitary and satisfies P*AP = \begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix}.
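numpy's eigh also handles hermitian matrices directly; a quick check of example 112 (my own verification):

    import numpy as np

    A = np.array([[6, 2 + 2j],
                  [2 - 2j, 4]])
    evals, P = np.linalg.eigh(A)              # real eigenvalues, unitary matrix of eigenvectors
    print(np.round(evals, 10))                # [2. 8.]
    print(np.round(P.conj().T @ A @ P, 10))   # diag(2, 8)
    print(np.round(P.conj().T @ P, 10))       # identity, so P is unitary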

3.9 Applications to quantum mechanics (non-examinable)


With all the linear algebra we know it is a little step aside to understand the basics of
quantum mechanics. We discuss Schrödinger’s picture14 of quantum mechanics and
(mathematically) derive Heisenberg’s uncertainty principle.
The main ingredient of quantum mechanics is a hermitian vector space (V, h·, ·i).
There are physical arguments showing that real Euclidean vector spaces are no good
and that V must be infinite-dimensional. Here we just take their conclusions for
granted. We denote by [v] the line Cv spanned by v 2 V r {0}. The states of the
physical system are the lines [v]. We use normalised vectors, that is, v such that
hv, vi = 1, to present states as this makes our formulae slightly easier.
It is impossible to observe the state of the quantum system but we can try to ob-
serve some physical quantities such as momentum, energy, spin, etc. Such physical
quantities become observables, that is, hermitian linear operators : V ! V. Her-
mitian in this context means that h x, yi = h x, yi for all x, y 2 V. Sweeping a
subtle mathematical point under the carpet15 , we assume that is diagonalisable
13 that is, neither W nor W ? is zero.
14 The alternative is Heisenberg’s picture which we have no time to discuss here.
15 If V were finite-dimensional we could have used proposition 108. But V is infinite dimensional! To

ensure diagonalisability V must be complete with respect to the hermitian norm. Such spaces are called

with eigenvectors ei with eigenvalues i (i 1). The proof of proposition 107 goes
through in the infinite dimensional case, so we conclude that all i belong to R. Back
to physics, if we measure on a state [v] where v is Ân ↵n en and is normalised then
the measurement will return n as a result with probability |↵n |2 .
One observable is energy H : V ! V, often called hamiltonian. It is central to the
theory because it determines the time evolution [v(t)] of the system by Schrödinger’s
equation:
dv(t)/dt = (1/(iħ)) H v(t)

where ħ ≈ 10^{−34} Joule·seconds16 is the reduced Planck constant. We know how to solve this equation: v(t) = e^{tH/(iħ)} v(0).
As a concrete example, let us look at the quantum oscillator. The full energy of the classical harmonic oscillator of mass m and frequency ω is

h = p²/(2m) + (1/2) m ω² x²
where x is the position and p = mx0 is the momentum. To quantise it, we have to
play with this expression. The vector space of all smooth functions C^∞(R, C) admits a convenient subspace V = { f(x) e^{−x²/2} | f(x) ∈ C[x] }, which we make hermitian by ⟨φ(x), ψ(x)⟩ = ∫_{−∞}^{∞} φ̄(x) ψ(x) dx. Quantum momentum and quantum position are linear operators (observables) on this space:

P( f(x) ) = −iħ f′(x),    X( f(x) ) = f(x) · x.

The quantum Hamiltonian is a second order differential operator given by the same equation

H = P²/(2m) + (1/2) m ω² X² = −(ħ²/(2m)) d²/dx² + (1/2) m ω² x².

As mathematicians, we can assume that m = 1 and ω = 1, so that H(f) = (x²f − f″)/2. The eigenvectors of H are the Hermite functions

ψ_n(x) = (−1)^n e^{x²/2} (e^{−x²})^{(n)},   n = 0, 1, 2, ...

with eigenvalues n + 1/2, which are the discrete energy levels of the quantum oscillator. Notice that ⟨ψ_k, ψ_n⟩ = δ_{k,n} 2^n n! √π, so they are orthogonal but not orthonormal. The states [ψ_n] are pure states: they do not change with time and always give n + 1/2 as energy. If we take a system in a state [v] where v is normalised and

v = Σ_n α_n ( π^{1/4} 2^{n/2} √(n!) )^{−1} ψ_n

then the measurement of energy will return n + 1/2 with probability |↵n |2 . Notice
that the measurement breaks the system!! It changes it to the state [ n ] and all future
measurements will return the same energy!
Alternatively, it is possible to model the quantum oscillator on the vector space
W = C[ x] of polynomials. One has to use the natural linear bijection
x2 /2
↵ : W ! V, ↵ ( f ( x)) = f ( x) e

and transfer all the formulae to W. The metric becomes h f , gi = h↵ ( f ), ↵ ( g)i =


R1
¯ x2 dx, the formulae for P and X changes accordingly, and at the end
1 f ( x) g( x) e

Hilbert spaces. Diagonalisability is still subtle as eigenvectors do not span the whole of V but only a
dense subspace. Furthermore, if V admits no dense countably dimensional subspace, further difficulties
arise. . . Pandora’s box of functional analysis is wide open, so let us try to keep it shut.
16 Notice the physical dimensions: H is energy, t is time, i dimensionless, h̄ equalises the dimensions

in the both sides irrespectively of what v is.



2 2
one arrives at Hermite polynomials ↵ 1 ( n ( x)) = ( 1)n e x (e x )(n) instead of Her-
mite functions.
Let us go back to an abstract system with two observables P and Q. It is pointless
to measure Q after measuring P as the system is broken. But can we measure them
simultaneously? The answer is given by Heisenberg’s uncertainty principle. Mathe-
matically, it is a corollary of Schwarz’s inequality:

kvk2 · kwk2 = hv, vihw, wi |hv, wi|2 .

Let e1 , e2 , . . . be eigenvectors for P and let p1 , p2 , . . . be the corresponding eigenval-


ues. The probability that p j is returned after measuring on [v] with v = Ân ↵n en
normalised depends on the multiplicity of the eigenvalue:

Prob( p j is returned) = Â |↵k |2 .


pk = p j

Hence, we should have the expected value

E ( P, v) = Â pk |↵k |2 = Âh↵k ek , pk↵k ek i = hv, P(v)i.


k k

To compute the expected quadratic error we use the shifted observable Pv = P


E ( P, v) I:
1/2 1/2
D( P, v) = E ( Pv 2 , v) = h Pv2 (v), v)i = h Pv (v), Pv (v)i1/2 = k Pv (v)k

where we use the fact that P and Pv are hermitian. Notice that D( P, v) has a physical
meaning of uncertainty of measurement of P. Notice also that the operator PQ QP
is no longer hermitian in general but we can still talk about its expected value. Here
is Heisenberg’s principle.
Theorem 113. D(P, v) · D(Q, v) ≥ (1/2) |E(PQ − QP, v)|.
Proof. In the right hand side, E ( PQ QP, v) = E ( Pv Qv Qv Pv , v) = hv, Pv Qv (v)i
hv, Qv Pv (v)i = h Pv (v), Qv (v)i h Qv (v), Pv (v)i. Remembering that the form is her-
mitian,

E ( PQ QP, v) = h Pv (v), Qv (v)i h Pv (v), Qv (v)i = 2 · Im(h Pv (v), Qv (v)i) ,

twice the imaginary part. So the right hand side is estimated by Schwarz’s inequality:

Im(h Pv (v), Qv (v)i)  |h Pv (v), Qv (v)i|  k Pv (v)k · k Qv (v)k . ⇤

Two cases of particular physical interest are commuting observables, that is, PQ = QP, and conjugate observables, that is, PQ − QP = iħI. Commuting observables can be measured simultaneously with any degree of certainty. Conjugate observables obey Heisenberg's uncertainty:

D(P, v) · D(Q, v) ≥ ħ/2.

4 Finitely Generated Abelian Groups


4.1 Definitions
Groups were introduced in the first year in Foundations, and will be studied in detail
next term in Algebra II: Groups and Rings. In this module, we are only interested in
abelian (= commutative) groups, which are defined as follows.

Definition 114. An abelian group is a set G together with a binary operation G × G → G : (g, h) ↦ g + h, which we write as addition, and which satisfies the following properties:
(a) (Associativity). For all g, h, k 2 G, ( g + h) + k = g + (h + k);
(b) (Identity or zero). There exists an element 0G 2 G such that g + 0G = g for all
g 2 G;
(c) (Inverse or negative). For all g ∈ G there exists −g ∈ G such that g + (−g) = 0_G;
(d) (Commutativity). For all g, h 2 G, g + h = h + g.
It can be shown that 0_G is uniquely determined by (G, +). Likewise −g is determined by (G, +, g).
Usually we just write 0 rather than 0G . We only write 0G if we need to distinguish
between the zero elements of different groups.
The commutativity axiom (d) is not part of the definition of a general group, and
for general (non-abelian) groups, it is more usual to use multiplicative rather than
additive notation. All groups in this course should be assumed to be abelian, although
some of the definitions and results apply equally well to general groups.
The terms identity, inverse are intended for the multiplicative notation of groups.
If groups are written additive, as in this chapter, then one says zero, negative instead.
Examples 115. 1. The integers Z.
2. Fix a positive integer n > 0 and let

Z_n = {0, 1, 2, ..., n − 1} = {x ∈ Z | 0 ≤ x < n}.

where addition is computed modulo n. So, for example, when n = 9, we have


2 + 5 = 7, 3 + 8 = 2, 6 + 7 = 4, etc. Note that the negative −x of x ∈ Z_n is equal to n − x (if x ≠ 0) in this example. (Later we shall see a better construction of Z_n as a quotient Z/nZ.) A small computational sketch of this arithmetic appears after these examples.
3. Examples from linear algebra. Let K be a field, for example, Q, R, C, Z p with
p prime.
(a). The elements of K form an abelian group under addition.
(b). The non-zero elements of K form an abelian group K ⇥ under multiplication.
(c). The vectors in any vector space form an abelian group under addition.
4. A group G with just one element (that is, G = {0G }) is said to be a trivial
group. We simply write G = 0.
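Addition in Z_n is just addition of representatives followed by reduction modulo n. A tiny Python illustration (my own, with n = 9 as in example 2 above):

    n = 9
    def add_mod(x, y):
        return (x + y) % n          # addition in Z_n
    def neg_mod(x):
        return (-x) % n             # the negative of x; equals n - x for x != 0
    print(add_mod(3, 8), add_mod(6, 7), neg_mod(4))   # 2 4 5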
Proposition 116. (The cancellation law) Let G be any group, and let g, h, k 2 G. Then
g + h = g + k ⇒ h = k.
Proof. Add −g to both sides of the equation and use the axioms of groups. ⇤
Definition 117. For n 2 Z and g in an abelian group G we define ng recursively as
follows. Firstly 0g := 0. Next (n + 1)g := ng + g for n ≥ 0. Finally, (−n)g := −(ng) for n > 0.
Exercise 118. Let G be an abelian group. Prove

(m + n) g = mg + ng, m(ng) = (mn) g, n( g + h) = ng + nh (119)

for all m, n 2 Z and g, h 2 G.


Scalar multiplication Z ⇥ G ! G allows us to think of abelian groups as “vector
spaces over Z” (or, using correct terminology, Z-modules. Modules over rings will
play a significant role in Rings and Modules in year 3). We shall often use (119) in
this chapter.

One can identify abelian groups with Z-modules and the two terms can be used
interchangeably. However, Z-modules would be a better term than abelian groups
given the material in this chapter.
Definition 120. A group G is called cyclic if there exists an element x 2 G such that
G = {mx | m 2 Z}.
The element x in the definition is called a generator of G. Note that Z and Zn are
cyclic with generator 1.
Definition 121. A bijection φ : G → H between two (abelian) groups is called an isomorphism if φ(g + h) = φ(g) + φ(h) for all g, h ∈ G. The groups G and H are called isomorphic, and we write G ≅ H, if there is an isomorphism φ : G → H.
Isomorphic groups are often thought of as being essentially the same group, but with elements having different names.
Exercise 122. Prove that any isomorphism φ : G → H satisfies φ(ng) = nφ(g) for all g ∈ G, n ∈ Z.
Proposition 123. Any cyclic group G is isomorphic either to Z or to Z_n for some n > 0.
Proof. Let G be cyclic with generator x. So G = {mx | m ∈ Z}. Suppose first that the elements mx for m ∈ Z are all distinct. Then the map φ : Z → G defined by φ(m) = mx is a bijection, and it is clearly an isomorphism.
Otherwise, we have lx = mx for some l < m, and so (m − l)x = 0 with m − l > 0. Let n be the least integer with n > 0 and nx = 0. Then the elements 0x = 0, 1x, 2x, ..., (n − 1)x of G are all distinct, because otherwise we could find a smaller n. Furthermore, for any mx ∈ G, we can write m = rn + s for some r, s ∈ Z with 0 ≤ s < n. Then mx = (rn + s)x = sx, so G = {0, 1x, 2x, ..., (n − 1)x}, and the map φ : Z_n → G defined by φ(m) = mx for 0 ≤ m < n is a bijection, which is easily seen to be an isomorphism. ⇤
Definition 124. For an element g 2 G, the least positive integer n with ng = 0, if it
exists, is called the order |g| of g. If there is no such n, then g has infinite order and we write |g| = ∞. The order |G| of a group G is just the number of elements of G.
Exercise 125. If φ : G → H is an isomorphism, then |g| = |φ(g)| for all g ∈ G.
Exercise 126. Let G be a finite cyclic group with generator g. Prove: | g| = | G |.
Exercise 127. If G is a finite abelian group then any element of G has finite order.
Definition 128. Let X be a subset of a group G. We say that G is generated or spanned
by X, and X is said to be a generating set of G, if every g 2 G can be written as a finite
sum Âik=1 mi xi , with mi 2 Z and xi 2 X for all i.
If G is generated by X, then we write G = h X i. We write G = h x1 , . . . , xn i instead
of G = h{ x1 , . . . , xn }i.
If G admits a finite generating set X then G is said to be finitely generated.
So a group is cyclic if and only if it has a generating set X with | X | = 1.
Definition 129. Let G_1, ..., G_n be groups. Their direct sum is written G_1 ⊕ ··· ⊕ G_n or G_1 × ··· × G_n and defined to be the set

{ (g_1, ..., g_n) | g_i ∈ G_i for all i }

(Cartesian product) with component-wise addition

( g1 , . . . , gn ) + ( h1 , . . . , hn ) = ( g1 + h1 , . . . , gn + hn ).

Exercise 130. Prove that G_1 ⊕ ··· ⊕ G_n is again an abelian group with zero element (0, ..., 0) and −(g_1, ..., g_n) = (−g_1, ..., −g_n).
In general (non-abelian) group theory this is more often known as the direct
product of groups.
One of the main results of this chapter is known as the fundamental theorem of
finitely generated abelian groups, and states that every finitely generated abelian group
is isomorphic to a direct sum of cyclic groups.
Exercise 131. Prove that the group (Q, +) is not finitely generated. Prove that it is
not isomorphic to a direct sum of cyclic groups.

4.2 Subgroups
Definition 132. A subset H of a group G is called a subgroup of G if it forms a group
under the same operation as that of G.
Lemma 133. If H is a subgroup of G, then the identity element 0 H of H is equal to the
identity element 0G of G.
Proof. Using the identity axioms for H and G, 0 H + 0 H = 0 H = 0 H + 0G . Now by
the cancellation law, 0 H = 0G . ⇤
The definition of a subgroup is semantic in its nature. While it precisely pinpoints
what a subgroup is, it is quite cumbersome to use. The following proposition gives a
usable criterion.
Proposition 134. Let H be a subset of a group G. The following statements are equiv-
alent.
(a) H is a subgroup of G.
(b) H is nonempty; and h_1, h_2 ∈ H ⇒ h_1 + h_2 ∈ H; and h ∈ H ⇒ −h ∈ H.
(c) H is nonempty; and h_1, h_2 ∈ H ⇒ h_1 − h_2 ∈ H.
Proof. Proof of (a) ⇒ (c). If H is a subgroup of G then it is nonempty as it contains 0_H. Moreover, h_1 − h_2 = h_1 + (−h_2) ∈ H if so are h_1 and h_2.
Proof of (c) ⇒ (b). Pick x ∈ H. Then 0 = x − x ∈ H. Now −h = 0 − h ∈ H for any h ∈ H. Finally, h_1 + h_2 = h_1 − (−h_2) ∈ H for all h_1, h_2 ∈ H.
Proof of (b) ⇒ (a). We need to verify the four group axioms in H. Two of these, 'Closure' and 'Inverse', are exactly the two conditions in (b). The other two axioms are 'Associativity' and 'Identity'. Associativity holds because it holds in G, and H is a subset of G. Since we are assuming that H is nonempty, there exists h ∈ H, and then −h ∈ H by (b), and h + (−h) = 0 ∈ H by (b), and so 'Identity' holds, and H is a subgroup. ⇤
Examples 135. 1. There are two standard subgroups of any group G: the whole
group G itself, and the trivial subgroup {0} consisting of the identity alone. Sub-
groups other than G are called proper subgroups, and subgroups other than {0} are
called non-trivial subgroups.
2. If g is any element of any group G, then the set of all integer multiples h gi =
{mg | m 2 Z} forms a cyclic subgroup of G called the subgroup generated by g. Note
|h gi| = | g|.
Let us look at a few specific examples. If G = Z, then 5Z, which consists of all
multiples of 5, is the cyclic subgroup generated by 5. Of course, we can replace 5 by any integer here, but note that the cyclic groups generated by 5 and −5 are the same.
If G = h gi is a finite cyclic group of order n and m is a positive integer dividing n,
then the cyclic subgroup generated by mg has order n/m and consists of the elements
kmg for 0  k < n/m.
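These orders are easy to experiment with. A short sketch (my own illustration) computing the order of the element m (that is, m·1) in Z_n by brute force:

    def order_in_Zn(m, n):
        """Least k > 0 with k*m = 0 in Z_n."""
        k, x = 1, m % n
        while x != 0:
            k, x = k + 1, (x + m) % n
        return k

    print([order_in_Zn(m, 12) for m in range(1, 12)])
    # [12, 6, 4, 3, 12, 2, 12, 3, 4, 6, 12], e.g. 3 generates a subgroup of order 12/3 = 4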

Exercise 136. What is the order of the cyclic subgroup generated by mg for general
m (where we drop the assumption that m|n)?
Exercise 137. Show that the group of non-zero complex numbers C^× under the operation of multiplication has finite cyclic subgroups of all possible orders.
Exercise 138. Let G be an (abelian) group.
(a) Let {H_i | i ∈ I} be a family of subgroups of G. Prove that the intersection K = ∩_{i∈I} H_i is also a subgroup of G.
(b) Let X be a subset of G and let K be the intersection of those subgroups H ⊆ G satisfying X ⊆ H. Prove K = ⟨X⟩. In words: ⟨X⟩ is the least subgroup of G containing X.
(c) Give an example showing that the union of two subgroups of G may not be a
subgroup of G.
Exercise 139. Let G be a group and n 2 Z. Prove that { x 2 G | nx = 0} is a
subgroup of G.
Exercise 140. Let A, B be subgroups of a group G. Prove that { a + b | a 2 A, b 2 B}
(written A + B) is also a subgroup of G.

4.3 Cosets and quotient groups


Definition 141. Let H be a subgroup of G and g 2 G. Then the coset H + g is the
subset {h + g | h 2 H } of G.
Note: Since our groups are abelian, we have H + g = g + H, but in general group
theory the right and left cosets Hg and gH can be different.
Examples 142. 3. G = Z, H = 5Z. There are just 5 distinct cosets H = H + 0 =
{ 5n | n 2 Z }, H + 1 = { 5n + 1 | n 2 Z }, H + 2, H + 3, H + 4. Note that
H + i = H + j whenever i ⌘ j (mod 5).
4. G = Z6 , H = {0, 3}. There are 3 distinct cosets, H = H + 3 = {0, 3},
H + 1 = H + 4 = {1, 4}, and H + 2 = H + 5 = {2, 5},
5. G = C⇥ , the group of non-zero complex numbers under multiplication, H =
S1 = { z 2 C : | z| = 1}, the unit circle. The cosets are the circles centered at the
origin. There are uncountably many distinct cosets, one for each positive real number
(radius of a circle).
Proposition 143. Let H be a subgroup of a group G and g, k ∈ G. The following are
equivalent:
(a) k ∈ H + g;
(b) H + g = H + k;
(c) k − g ∈ H.
Proof. Proof of (b) ⇒ (a). Clearly H + g = H + k ⇒ k ∈ H + g.
Proof of (a) ⇒ (b). If k ∈ H + g, then k = h + g for some fixed h ∈ H, so
g = k − h. Let f ∈ H + g. Then, for some h1 ∈ H, we have f = h1 + g = h1 + k − h ∈
H + k, so H + g ⊆ H + k. Similarly, if f ∈ H + k, then for some h1 ∈ H, we have
f = h1 + k = h1 + h + g ∈ H + g, so H + k ⊆ H + g. Thus H + g = H + k.
Proof of (a) ⇒ (c). If k ∈ H + g, then, as above, k = h + g, so k − g = h ∈ H.
Proof of (c) ⇒ (a). If k − g ∈ H, then putting h = k − g, we have h + g = k, so
k ∈ H + g. □
Corollary 144. Two cosets of H in G are either equal or disjoint.
Proof. If H + g1 and H + g2 are not disjoint, then there exists an element
k ∈ (H + g1) ∩ (H + g2), but then H + g1 = H + k = H + g2 by proposition 143. □
Corollary 145. The cosets of H in G partition G. ⇤
Proposition 146. If H is finite, then all cosets have exactly | H | elements.
Proof. The map H → H + g defined by h ↦ h + g is bijective because h1 + g =
h2 + g ⇒ h1 = h2 by the cancellation law. □
Corollary 145 and proposition 146 together imply:
Theorem 147 (Lagrange’s Theorem). Let G be a finite (abelian) group and H a sub-
group of G. Then the order of H divides the order of G. ⇤
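For a concrete sanity check of corollary 145, proposition 146 and Lagrange's theorem, the following sketch (our own illustration, not part of the notes) lists the cosets of a subgroup of Z_n and verifies that they partition the group into equally sized pieces.

```python
def cosets(n, H):
    """Cosets H + g in Z_n, each returned as a frozenset."""
    return {frozenset((h + g) % n for h in H) for g in range(n)}

n, H = 12, [0, 4, 8]            # H = <4> is a subgroup of Z_12
C = cosets(n, H)
print(sorted(sorted(c) for c in C))   # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
assert all(len(c) == len(H) for c in C)           # proposition 146
assert sum(len(c) for c in C) == n                # the cosets partition Z_12
assert n % len(H) == 0 and len(C) == n // len(H)  # Lagrange: |G| = |H| * |G : H|
```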
Definition 148. Let H be a subgroup of G. The set of cosets of H in G is written
G/H. The number of elements of G/H is called the index of H in G and is written as
|G : H|.
If G is finite, then we clearly have | G : H | = | G |/| H |. But, from the example
G = Z, H = 5Z above, we see that | G : H | can be finite even when G and H are
infinite.
Proposition 149. Let G be a finite (abelian) group. Then for any g 2 G, the order | g|
of g divides the order | G | of G.
Proof. Let | g| = n. We saw in example 2 above that the integer multiples { mg | m 2
Z } of g form a subgroup H of G. By minimality of n we have | H | = n. The result
follows now from Lagrange’s Theorem. ⇤
As an application, we can now immediately classify all finite (abelian) groups
whose order is prime.
Proposition 150. Let G be an (abelian) group having prime order p. Then G is cyclic;
that is, G ≅ Z_p.
Proof. Let g ∈ G with g ≠ 0. Then |g| > 1, but |g| divides p by proposition 149, so
|g| = p. But then G must consist entirely of the integer multiples mg (0 ≤ m < p) of
g, so G is cyclic. □
Definition 151. For subsets A and B of a group G we define the sum A + B = {a + b |
a ∈ A, b ∈ B}.
Lemma 152. Let H be a subgroup of G and g, h ∈ G. Then (H + g) + (H + h) =
H + (g + h).
Proof. We have H + H = H, so

    (H + g) + (H + h) = {x + y | x ∈ H + g, y ∈ H + h}
                      = {(a + g) + (b + h) | a, b ∈ H} = {(a + b) + (g + h) | a, b ∈ H}
                      = {c + (g + h) | c ∈ H + H} = {c + (g + h) | c ∈ H} = H + (g + h). □

Theorem 153. Let H be a subgroup of an abelian group G. Then G/H forms a group
under addition of subsets.
Proof. We have just seen that (H + g) + (H + h) = H + (g + h), so we have closure,
and associativity follows easily from associativity of G. Since (H + 0) + (H + g) =
H + g for all g ∈ G, H = H + 0 is an identity element, and since (H − g) + (H + g) =
H + (−g + g) = H, H − g is an inverse to H + g for all cosets H + g. Thus the four group
axioms are satisfied and G/H is a group. □
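To see theorem 153 in action, here is a small sketch (our own, not from the notes) that forms the quotient of G = Z_6 by H = {0, 3} from example 142.4 and prints its addition table; the three cosets behave exactly like Z_3.

```python
n, H = 6, frozenset({0, 3})

def coset(g):
    """The coset H + g in Z_6."""
    return frozenset((h + g) % n for h in H)

reps = []                       # one representative per distinct coset
for g in range(n):
    if coset(g) not in [coset(r) for r in reps]:
        reps.append(g)

print(reps)                     # [0, 1, 2]
for a in reps:
    # (H + a) + (H + b) = H + (a + b), so adding representatives mod n suffices
    print([sorted(coset(a + b)) for b in reps])
# [[0, 3], [1, 4], [2, 5]]
# [[1, 4], [2, 5], [0, 3]]
# [[2, 5], [0, 3], [1, 4]]
```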

Definition 154. The group G / H is called the quotient group (or the factor group) of
G by H.
Notice that if G is finite, then | G / H | = | G : H | = | G |/| H |. So, although the
quotient group seems a rather complicated object at first sight, it is actually a smaller
group than G.
Examples 155. 1. Let G = Z and H = mZ for some m > 0. Then there are exactly
m distinct cosets, H, H + 1, . . . , H + (m − 1). If we add together k copies of H + 1,
then we get H + k. So G/H is cyclic of order m with generator H + 1. So by
proposition 123, Z/mZ ≅ Z_m. The original definition of Z_m was rather clumsy, so we
put Z_m := Z/mZ from now on.
2. G = R and H = Z. The quotient group G/H is isomorphic to the circle
subgroup S¹ of the multiplicative group C×. One writes down an explicit isomorphism
φ : G/H → S¹ by φ(x + Z) = e^{2πxi}.
3. G = Q and H = Z. The quotient group G/H featured in a previous exam,
where it was asked to show that this group is infinite, is not finitely generated, and that every
element of G/H has finite order.
4. Let V be a vector space over K and W ⇢ V a subspace. In particular V is
an abelian group with subgroup W, so that quotient group V /W is defined. It can
naturally be made into a vector space over K (outside our scope).

4.4 Homomorphisms and the first isomorphism theorem


Definition 156. Let G and H be groups. A homomorphism from G to H is a map
φ : G → H such that φ(g1 + g2) = φ(g1) + φ(g2) for all g1, g2 ∈ G.
Linear maps between vector spaces are examples of homomorphisms.
Note that an isomorphism is just a bijective homomorphism.
Lemma 157. Let φ : G → H be a homomorphism. Then φ(ng) = nφ(g) for all g ∈ G,
n ∈ Z.
Proof. Exercise. □
Example 158. Let G be any abelian group, and let n ∈ Z. Then φ : G → G defined
by φ(g) = ng for all g ∈ G is a homomorphism.
Definition 159. Let φ : G → H be a homomorphism. Then the kernel ker(φ) of φ is
defined to be the set of elements of G that map onto 0_H; that is,

    ker(φ) = {g ∈ G | φ(g) = 0_H}.

Note that by lemma 157 above, ker(φ) always contains 0_G.


Proposition 160. Let φ : G → H be a homomorphism. Then φ is injective if and only
if ker(φ) = {0_G}.
Proof. Proof of ⇒. Since 0_G ∈ ker(φ), if φ is injective then we must have ker(φ) =
{0_G}.
Proof of ⇐. Suppose that ker(φ) = {0_G}, and let g1, g2 ∈ G with φ(g1) = φ(g2).
Then 0_H = φ(g1) − φ(g2) = φ(g1 − g2) (by lemma 157), so g1 − g2 ∈ ker(φ) and
hence g1 − g2 = 0_G and g1 = g2. So φ is injective. □
Theorem 161.
(a) Let φ : G → H be a homomorphism. Then ker(φ) is a subgroup of G and im(φ) is
a subgroup of H.
(b) Let H be a subgroup of a group G. Then the map π : G → G/H defined by
π(g) = H + g is a surjective homomorphism with kernel H. We call π the natural
map or quotient map.
Proof. Part (a) is straightforward using proposition 134.
Proof of (b). The map π is a homomorphism by lemma 152. It is clearly surjective.
Also, for all g ∈ G we have π(g) = 0_{G/H} ⇔ H + g = H + 0_G ⇔ g ∈ H, so
ker(π) = H. □
The following lemma explains a connection between quotients and homomorphisms.
It clarifies the trickiest point in the proof of the forthcoming First Isomorphism
Theorem.
Lemma 162. Let φ : G → H be a homomorphism with kernel K, and let A be a subgroup
of G. Then A ⊆ K if and only if there exists a homomorphism ψ : G/A → H
given by ψ(A + g) = φ(g) for all g ∈ G.
Proof. We say that the definition of ψ in the lemma is well-defined if for all g, h ∈ G
with A + g = A + h one has φ(g) = φ(h). Then

    ψ is well-defined
    ⇔ for all g, h ∈ G with A + g = A + h one has φ(g) = φ(h)
    ⇔ for all g, h ∈ G with g − h ∈ A one has g − h ∈ K
    ⇔ A ⊆ K.

Once ψ is well-defined, it is a homomorphism because

    ψ(A + h) + ψ(A + g) = φ(h) + φ(g) = φ(h + g)
                        = ψ(A + h + g) = ψ((A + h) + (A + g)). □

We denote the set of all homomorphisms from G to H by hom(G, H). There is
an elegant way to reformulate lemma 162. Composition with the quotient map
π : G → G/A defines a bijection

    hom(G/A, H) → {α ∈ hom(G, H) | α(A) = {0}},    ψ ↦ ψ ∘ π.

Theorem 163 (First Isomorphism Theorem). Let φ : G → H be a homomorphism with
kernel K. Then G/K ≅ im(φ). More precisely, there is an isomorphism ψ : G/K →
im(φ) defined by ψ(K + g) = φ(g) for all g ∈ G.
Proof. The map ψ is a well-defined homomorphism by lemma 162. Clearly, im(ψ) =
im(φ). Finally,

    ψ(K + g) = 0_H ⇔ φ(g) = 0_H ⇔ g ∈ K ⇔ K + g = 0_{G/K}.

By proposition 160, ψ is injective. Thus ψ : G/K → im(φ) is an isomorphism. □
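The theorem is easy to verify on a finite example. In the sketch below (an illustration of ours; the particular homomorphism is chosen by us, not taken from the notes) we take φ : Z_12 → Z_8, φ(x) = 2x mod 8, compute its kernel and image, and check that the induced map on cosets of the kernel is a well-defined bijection onto the image.

```python
G = range(12)
phi = {g: (2 * g) % 8 for g in G}         # a homomorphism Z_12 -> Z_8

kernel = [g for g in G if phi[g] == 0]    # [0, 4, 8]
image = sorted(set(phi.values()))         # [0, 2, 4, 6]

def coset(g):
    return frozenset((k + g) % 12 for k in kernel)

cosets = {coset(g) for g in G}
# psi(K + g) = phi(g) is well defined and injective:
induced = {c: {phi[g] for g in c} for c in cosets}
assert all(len(v) == 1 for v in induced.values())   # well defined on cosets
assert len(cosets) == len(image)                    # G/K has the same size as im(phi)
print(len(kernel), len(cosets), image)              # 3 4 [0, 2, 4, 6]
```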


One can associate two quotient groups to a homomorphism φ : G → H. The
cokernel of φ is Coker(φ) = H/im(φ) and the coimage of φ is Coim(φ) = G/ker(φ).
In short, the first isomorphism theorem states that the natural homomorphism from
the coimage to the image is an isomorphism.

4.5 Free abelian groups


In linear algebra you learned about bases of finite-dimensional vector spaces. We
shall now define bases of abelian groups. Another notion that may be new to you is
that of infinite bases which we introduce at the same time.

Let X be an infinite set, G a group, and f : X → G any map. The support of f is
{a ∈ X | f(a) ≠ 0}. An (apparently infinite) sum Σ_{a∈X} f(a) is said to be a finite sum
if the support of f is finite. A finite sum in this sense has a well-defined value. In
these notes Σ_{a∈X} f(a) is not allowed unless f has finite support.
Definition 164. Let X be a possibly infinite subset of an abelian group G.
(a) A linear combination of X is a finite sum Σ_{a∈X} c_a a with c_a ∈ Z, all but finitely
many zero.
(b) We call X linearly independent if Σ_{a∈X} c_a a = 0 with c_a ∈ Z, all but finitely many
zero, implies c_a = 0 for all a ∈ X.
(c) We call X an unordered basis of G if it is linearly independent and spans G.
(d) An abelian group is free if it admits a basis.
(e) An ordered basis or simply basis of G is an n-tuple (x1, . . . , xn) of distinct elements
such that {x1, . . . , xn} is an unordered basis in the above sense.
It is possible to define infinite ordered bases but we won’t need them.
Exercise 165. Give an example where { x1 , . . . , xn } is an unordered basis but ( x1 ,
. . . , xn ) is not a basis.
Example 166. The standard vectors e1, . . . , en ∈ Z^n are defined as in K^n for fields K.
Then (e1, . . . , en) is a basis of Z^n. It is known as the standard basis. So Z^n is free.
Exercise 167. Contrary to vector spaces, an abelian group may not have a basis.
Prove that the cyclic group Z_n has no basis for n > 1.
Proposition 168. Let X be a set. Then there exists a group G such that X ⊆ G and
such that X is a basis of G. Moreover, if H has the same properties then there exists an
isomorphism φ : G → H which is the identity on X.
Proof. Existence. The elements of G are the ‘formal finite sums’ Σ_{a∈X} c_a a where
c_a ∈ Z, all but finitely many zero. This contains X in an obvious way. Addition in G
is defined to be pointwise:

    (Σ_{a∈X} c_a a) + (Σ_{a∈X} d_a a) = Σ_{a∈X} (c_a + d_a) a.

Prove yourself that this makes G into a group and that X is a basis of G.
Uniqueness. For a ∈ X let g(a) be the element a ∈ X viewed as an element of G
and h(a) the same element viewed as an element of H. We define φ : G → H by

    φ(Σ_{a∈X} c_a g(a)) = Σ_{a∈X} c_a h(a).

It is clear that this φ : G → H is an isomorphism which is the identity on X. □


Definition 169. Let X be a set. A free (abelian) group on X is a pair ( F, f ) where F is
a group, f : X ! F an injective map (of sets) such that f ( X ) is a basis of F.
Many free groups ( F, f ) on X are such that f ( a) = a for all a 2 X (we say that f
is the inclusion map) but we allow otherwise.
Proposition 170. Let ( F, f ) be a free (abelian) group on X. Let G be any group and
g : X ! G any map of sets.
(Universal property). Then there exists a unique homomorphism h : F → G such
that h ∘ f = g.
h is injective if and only if g is injective with linearly independent image.
h is surjective if and only if G is spanned by g(X).

h is an isomorphism if and only if ( G, g) is a free group on X.


Proof. Exercise. ⇤
If G has a basis of n elements then G ≅ Z^n. A finitely generated group is free if
and only if it is isomorphic to Z^n for some n.
As for finite-dimensional vector spaces, it turns out that any two bases of a free
abelian group have the same size, but this has to be proved. It will follow directly
from the next theorem.
A square matrix P ∈ Z^{n,n} is said to be unimodular if det(P) ∈ {−1, 1}.
Theorem 171. Let P ∈ Z^{m,n}. Then the following are equivalent:
(a) The columns of P form a basis of Z^m.
(b) m = n and P^{−1} ∈ Z^{n,n}.
(c) m = n and P is unimodular.
Proof. Let f_i denote the ith column of P and F = (f_1, . . . , f_n). Let E = (e1, . . . , em)
be the standard basis of Z^m.
(a) ⇒ (b). Since F is a Z-basis of Z^m, it is a Q-basis of Q^m. So m = n. We have
P = [E 1 F] and P^{−1} = [F 1 E]. But F is a Z-basis, so [F 1 E] ∈ Z^{n,n}.
(b) ⇒ (a). By what we learned in linear algebra, F is a Q-basis of Q^n and P =
[E 1 F]. So F is Z-independent. But P^{−1} ∈ Z^{n,n}, so [F 1 E] ∈ Z^{n,n}, so every standard basis
vector is a linear combination of the f_j, so F spans Z^n.
(b) ⇒ (c). If T = P^{−1} has entries in Z, then det(PT) = det(P) det(T) =
det(I_n) = 1, and since det(P), det(T) ∈ Z, this implies det(P) ∈ {−1, 1}.
(c) ⇒ (b). From first year linear algebra, P^{−1} = (det(P))^{−1} adj(P), so det(P) ∈
{−1, 1} implies that P^{−1} has entries in Z. □
Examples. 1. If n = 2 and

    P = ( 2 1 )
        ( 7 4 )

then det(P) = 8 − 7 = 1, so the columns of P form a basis of Z².
2. If

    P = ( 1 0 )
        ( 0 2 )

then det(P) = 2, so the columns of P don't form a basis of Z².
Recall that in linear algebra over a field, any n linearly independent vectors
in a vector space V of dimension n form a basis of V. This fails in Z^n: the columns of
this P are linearly independent but do not span Z².
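Whether the columns of an integer matrix form a basis of Z^n can therefore be tested by computing a determinant. A small sketch of this test (ours, not part of the notes; it uses integer cofactor expansion so that no floating point is involved):

```python
def det(M):
    """Determinant of an integer matrix by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def columns_form_basis(M):
    return det(M) in (1, -1)      # unimodular <=> columns are a basis of Z^n

print(columns_form_basis([[2, 1], [7, 4]]))   # True,  det = 1
print(columns_form_basis([[1, 0], [0, 2]]))   # False, det = 2
```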

4.6 Unimodular elementary row and column operations and the Smith
normal form for integral matrices
We interrupt our discussion of finitely generated abelian groups at this stage to inves-
tigate how the row and column reduction process of Linear Algebra can be adapted
to matrices over Z. Recall from MA106 that we can use elementary row and column
operations to reduce an m × n matrix of rank r over a field K to a matrix B = (β_ij)
with β_ii = 1 for 1 ≤ i ≤ r and β_ij = 0 otherwise. We called this the Smith Normal
Form of the matrix. We can do something similar over Z, but the non-zero elements
β_ii will not necessarily all be equal to 1.
The reason that we disallowed λ = 0 for the row and column operations (R3)
and (C3) (multiply a row or column by a scalar λ) was that we wanted all of our
elementary operations to be reversible. When performed over Z, (R1), (C1), (R2)
and (C2) are reversible, but (R3) and (C3) are reversible only when λ = ±1. So, if A
is an m × n matrix over Z, then we define the three types of unimodular elementary
row operations as follows:
(UR1): Replace some row r_i of A by r_i + t r_j, where j ≠ i and t ∈ Z;
(UR2): Interchange two rows of A;
(UR3): Replace some row r_i of A by −r_i.

The unimodular column operations (UC1), (UC2), (UC3) are defined similarly.
Recall from MA106 that performing elementary row or column operations on a matrix
A corresponds to multiplying A on the left or right, respectively, by an elementary
matrix. These elementary matrices all have determinant ±1 (1 for (UR1) and −1 for
(UR2) and (UR3)), so are unimodular matrices over Z.

Definition 172. A matrix A = (α_ij) ∈ Z^{m,n} is said to be in Smith normal form if
α_ij = 0 whenever i ≠ j, and α_ii | α_{i+1,i+1} (divides) whenever 1 ≤ i < min(m, n).
(Note that any integer divides 0.)

Theorem 173 (Smith Normal Form). Any matrix A ∈ Z^{m,n} can be put into Smith normal
form B through a sequence of unimodular elementary row and column operations.
Moreover, B is unique.

Proof. We shall not prove the uniqueness part here. We use induction on m + n. The
base case is m = n = 1, where there is nothing to prove. Also if A is the zero matrix
then there is nothing to prove, so assume not.
Let d be the smallest positive entry in any matrix C = (γ_ij) that we can obtain
from A by using unimodular elementary row and column operations. By using (UR2)
and (UC2), we can move d to position (1, 1) and hence assume that γ_11 = d. If d does
not divide γ_1j for some j > 1, then we can write γ_1j = qd + r with q, r ∈ Z and
0 < r < d, and then replacing the j-th column c_j of C by c_j − q c_1 results in the entry
r in position (1, j), contrary to the choice of d. Hence d | γ_1j for 2 ≤ j ≤ n and
similarly d | γ_i1 for 2 ≤ i ≤ m.
Now, if γ_1j = qd, then replacing c_j of C by c_j − q c_1 results in entry 0 in position
(1, j). So we can assume that γ_1j = 0 for 2 ≤ j ≤ n and γ_i1 = 0 for 2 ≤ i ≤ m. If
m = 1 or n = 1, then we are done. Otherwise, we have C = (d) ⊕ C′ for some
(m − 1) × (n − 1) matrix C′. By the inductive hypothesis, the result of the theorem
applies to C′, so by applying unimodular row and column operations to C which
do not involve the first row or column, we can reduce C to D = (δ_ij), which satisfies
δ_11 = d, δ_ii = d_i > 0 for 2 ≤ i ≤ r, and δ_ij = 0 otherwise, where d_i | d_{i+1} for
2 ≤ i < r. To complete the proof, we still have to show that d | d_2. If not, then
adding row 2 to row 1 results in an entry d_2 in position (1, 2) not divisible by d, and we obtain
a contradiction as before. □

In the following two examples we determine the Smith normal form of a given matrix A.

Example 174. Let A be the matrix

    A = (  42  21 )
        ( −35 −14 )

The general strategy is to reduce the size of the entries in the first row and column,
until the (1,1)-entry divides all other entries in the first row and column. Then we can
clear all of these other entries.

    (  42  21 )   c1 → c1 − 2c2   (  0  21 )   r2 → −r2
    ( −35 −14 )                   ( −7 −14 )   r1 ↔ r2

    (   7  14 )   c2 → c2 − 2c1   (  7   0 )
    (   0  21 )                   (  0  21 )
Example 175. Let A be the matrix

    A = ( −18 −18 −18  90 )
        (  54  12  45  48 )
        (   9  −6   6  63 )
        (  18   6  15  12 )

    Matrix                        Operation

    ( −18 −18 −18  90 )
    (  54  12  45  48 )           c1 → c1 − c3
    (   9  −6   6  63 )
    (  18   6  15  12 )

    (   0 −18 −18  90 )
    (   9  12  45  48 )           r1 ↔ r4
    (   3  −6   6  63 )
    (   3   6  15  12 )

    (   3   6  15  12 )
    (   9  12  45  48 )           r2 → r2 − 3r1, r3 → r3 − r1
    (   3  −6   6  63 )
    (   0 −18 −18  90 )

    (   3   6  15  12 )
    (   0  −6   0  12 )           c2 → c2 − 2c1, c3 → c3 − 5c1, c4 → c4 − 4c1
    (   0 −12  −9  51 )
    (   0 −18 −18  90 )

    (   3   0   0   0 )
    (   0  −6   0  12 )           c2 → −c2, then c2 → c2 + c3
    (   0 −12  −9  51 )
    (   0 −18 −18  90 )

    (   3   0   0   0 )
    (   0   6   0  12 )           r2 ↔ r3
    (   0   3  −9  51 )
    (   0   0 −18  90 )

    (   3   0   0   0 )
    (   0   3  −9  51 )           r3 → r3 − 2r2
    (   0   6   0  12 )
    (   0   0 −18  90 )

    (   3   0   0   0 )
    (   0   3  −9  51 )           c3 → c3 + 3c2, c4 → c4 − 17c2
    (   0   0  18 −90 )
    (   0   0 −18  90 )

    (   3   0   0   0 )
    (   0   3   0   0 )           c4 → c4 + 5c3, r4 → r4 + r3
    (   0   0  18 −90 )
    (   0   0 −18  90 )

    (   3   0   0   0 )
    (   0   3   0   0 )
    (   0   0  18   0 )
    (   0   0   0   0 )
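The reductions in examples 174 and 175 can also be reproduced mechanically. The sketch below is our own rough implementation of the strategy in the proof of theorem 173 (it is not code from the notes and makes no attempt at efficiency): it repeatedly clears the first row and column of the remaining block, and a standard final pass with gcd/lcm enforces the divisibility condition on the diagonal.

```python
from math import gcd

def smith_normal_form(A):
    """Diagonal d1 | d2 | ... of the Smith normal form of an integer matrix (rough sketch)."""
    A = [row[:] for row in A]               # work on a copy
    m, n = len(A), len(A[0])
    for t in range(min(m, n)):
        pivot = next(((i, j) for i in range(t, m) for j in range(t, n) if A[i][j]), None)
        if pivot is None:
            break                           # the remaining block is zero
        i, j = pivot
        A[t], A[i] = A[i], A[t]             # move a nonzero entry to position (t, t)
        for row in A:
            row[t], row[j] = row[j], row[t]
        while True:
            for i in range(t + 1, m):       # clear column t by row operations
                q = A[i][t] // A[t][t]
                A[i] = [a - q * b for a, b in zip(A[i], A[t])]
            if any(A[i][t] for i in range(t + 1, m)):
                i = next(i for i in range(t + 1, m) if A[i][t])
                A[t], A[i] = A[i], A[t]     # a smaller remainder becomes the new pivot
                continue
            for j in range(t + 1, n):       # clear row t by column operations
                q = A[t][j] // A[t][t]
                for row in A:
                    row[j] -= q * row[t]
            if any(A[t][j] for j in range(t + 1, n)):
                j = next(j for j in range(t + 1, n) if A[t][j])
                for row in A:
                    row[t], row[j] = row[j], row[t]
                continue
            break
    diag = [abs(A[i][i]) for i in range(min(m, n))]
    # enforce d1 | d2 | ... : replacing a pair by (gcd, lcm) keeps the matrix equivalent
    for i in range(len(diag) - 1):
        for j in range(i + 1, len(diag)):
            g = gcd(diag[i], diag[j])
            diag[i], diag[j] = g, 0 if g == 0 else diag[i] * diag[j] // g
    return diag

print(smith_normal_form([[42, 21], [-35, -14]]))    # [7, 21]       (example 174)
print(smith_normal_form([[-18, -18, -18, 90], [54, 12, 45, 48],
                         [9, -6, 6, 63], [18, 6, 15, 12]]))   # [3, 3, 18, 0]  (example 175)
```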

Note: There is also a generalisation to integer matrices of the row reduced normal
form from Linear Algebra, where only row operations are allowed. This is known
as the Hermite Normal Form and is more complicated. It will appear on an exercise
sheet.

4.7 Subgroups of free abelian groups


Proposition 176. Let G be an abelian group generated by n elements. Then any subgroup
of G is also generated by at most n elements.
Proof. Induction on n. Let G be generated by x1, . . . , xn and let K be a subgroup of G.
For n = 0 there is nothing to prove.
Suppose n > 0, and let H be the subgroup of G generated by x1, . . . , x_{n−1}. By
induction, K ∩ H is generated by y1, . . . , y_{n−1}, say.
If K ⊆ H, then K = K ∩ H and we are done, so suppose not. Then there exist
elements of the form h + t x_n ∈ K with h ∈ H and t ≠ 0. Since −(h + t x_n) ∈ K, we
can assume that t > 0. Choose such an element y_n = h + t x_n ∈ K with t minimal
subject to t > 0. We claim that K is generated by y1, . . . , y_n, which will complete the
proof.
Let k ∈ K. Then k = h′ + u x_n for some h′ ∈ H and u ∈ Z. If t does not
divide u then we can write u = tq + r with q, r ∈ Z and 0 < r < t, and then
k − q y_n = (h′ − qh) + r x_n ∈ K, contrary to the choice of t. So t | u, hence u = tq
and k − q y_n ∈ K ∩ H. But K ∩ H is generated by y1, . . . , y_{n−1}, so we are done. □
Definition 177. Let T : G → H be a homomorphism. Let E be a basis of G and F of
H. Then [F T E] is not only said to represent T but also to represent im(T).
In particular, a matrix in Z^{m,n} represents the subgroup of Z^m generated by its
columns.

Example 178. If n = 3 and H is generated by v1 = (1, 3, −1) and v2 = (2, 0, 1), then

    A = (  1  2 )
        (  3  0 )
        ( −1  1 )

Proposition 179. Let H ⊆ Z^m be represented by A ∈ Z^{m,n} and let P ∈ Z^{n,n} and
Q ∈ Z^{m,m} be unimodular. Then H is also represented by QAP.
Proof. Let T : G → H := Z^m be a homomorphism and E, F bases of (respectively)
G, H such that A = [F T E]. By theorem 171 there are bases E′ of G and F′ of H such
that P = [E 1 E′] and Q = [F′ 1 F]. So QAP = [F′ T E′]. □
If H is a subgroup of Z^m represented by A, and B is obtained from A by removing
a zero column, then B also represents H.

Theorem 180. Let H be a subgroup of Z^m. Then there exists a basis y1, . . . , ym of Z^m
and integers d1, . . . , dm ≥ 0 such that H = ⟨d1 y1, . . . , dm ym⟩ and d_i | d_{i+1} for all i.
Proof. By proposition 176 there are m generators x1, . . . , xm of H. By the universal
property of free abelian groups (proposition 170) there exists a homomorphism
T : Z^m → Z^m such that T(e_i) = x_i for all i, where E = (e1, . . . , em) denotes the
standard basis of Z^m. So im(T) = H. Put A = [E T E]. By theorem 173 there are
unimodular P, Q such that B := QAP is in Smith normal form. There are bases E′
and F′ of Z^m such that P = [E 1 E′] and Q = [F′ 1 E]. So B = QAP = [F′ T E′].
Write E′ = (z1, . . . , zm), F′ = (y1, . . . , ym), and let d_i be the diagonal entries of B, so that
d_i | d_{i+1} for all i. Then T(z_i) = d_i y_i for all i, so H = im(T) = ⟨d1 y1, . . . , dm ym⟩. □
Example 181. In Example 178, it is straightforward to calculate the Smith normal
form of A, which is

    ( 1 0 )
    ( 0 3 )
    ( 0 0 )

so H = ⟨y1, 3y2⟩.
By keeping track of the unimodular row operations carried out, we can, if we
need to, find the basis y1, y2, y3 of Z³. Doing this in Example 178, we get (on each
line, the basis shown is the one obtained after the operation on that line has been applied):

    Matrix        Operation              New basis

    (  1  2 )     r2 → r2 − 3r1          y1 = (1, 3, −1), y2 = (0, 1, 0), y3 = (0, 0, 1)
    (  3  0 )     r3 → r3 + r1
    ( −1  1 )

    (  1  2 )     c2 → c2 − 2c1          y1 = (1, 3, −1), y2 = (0, 1, 0), y3 = (0, 0, 1)
    (  0 −6 )
    (  0  3 )

    (  1  0 )     r2 ↔ r3                y1 = (1, 3, −1), y2 = (0, 0, 1), y3 = (0, 1, 0)
    (  0 −6 )
    (  0  3 )

    (  1  0 )     r3 → r3 + 2r2          y1 = (1, 3, −1), y2 = (0, −2, 1), y3 = (0, 1, 0)
    (  0  3 )
    (  0 −6 )

    (  1  0 )
    (  0  3 )
    (  0  0 )
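This bookkeeping can be automated: whenever a row operation is applied to A, the corresponding change of basis of Z³ can be recorded by applying the inverse operation to the columns of a matrix that starts as the identity. A small sketch of ours (not code from the notes), replaying the operations above:

```python
A = [[1, 2], [3, 0], [-1, 1]]
Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # columns of Y are the current basis y1, y2, y3

def add_row(i, j, t):
    """Row operation r_i -> r_i + t*r_j on A; the basis changes by y_j -> y_j - t*y_i."""
    A[i] = [a + t * b for a, b in zip(A[i], A[j])]
    for row in Y:
        row[j] -= t * row[i]

def swap_rows(i, j):
    A[i], A[j] = A[j], A[i]
    for row in Y:
        row[i], row[j] = row[j], row[i]

add_row(1, 0, -3)             # r2 -> r2 - 3 r1
add_row(2, 0, 1)              # r3 -> r3 + r1
for row in A:                 # the column operation c2 -> c2 - 2 c1 leaves the basis alone
    row[1] -= 2 * row[0]
swap_rows(1, 2)               # r2 <-> r3
add_row(2, 1, 2)              # r3 -> r3 + 2 r2

print(A)                                            # [[1, 0], [0, 3], [0, 0]]
print([[Y[i][j] for i in range(3)] for j in range(3)])
# columns: y1 = (1, 3, -1), y2 = (0, -2, 1), y3 = (0, 1, 0)
```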

4.8 General finitely generated abelian groups


Definition 182. Let X be a basis of an (abelian) group F. Let K ⊆ F be the subgroup
generated by a subset Y ⊆ F. We write

    ⟨X | Y⟩ := F/K.

This is called a presentation of F/K or of any isomorphic group. If X and Y are finite
then it is called a finite presentation. In this setting X is the set of generators and Y
the set of relations.

Example 183. Z_n ≅ ⟨x | nx⟩.

Proposition 184. Any finitely generated abelian group admits a finite presentation of
the form

    ⟨y1, . . . , yn | d1 y1, . . . , dn yn⟩

where d_i ∈ Z_{≥0} and d_i | d_{i+1} for all i.


Proof. Let G be an abelian group generated by a finite set {x1, . . . , xn}. Let E =
(e1, . . . , en) be the standard basis of Z^n. Define h : Z^n → G by h(e_i) = x_i for all i and
put K := ker(h). By theorem 180 there exists a basis (y1, . . . , yn) of Z^n and non-negative
integers d1, . . . , dn with d_i | d_{i+1} such that K = ⟨d1 y1, . . . , dn yn⟩. By the first isomorphism theorem
(theorem 163),

    G = im(h) ≅ Z^n/K = ⟨y1, . . . , yn | d1 y1, . . . , dn yn⟩. □

We shall use the notation Z_0 := Z/0Z ≅ Z ≅ ⟨x | 0x⟩.
Proposition 185. Let d_i ∈ Z_{≥0} for 1 ≤ i ≤ n. Then

    ⟨y1, . . . , yn | d1 y1, . . . , dn yn⟩ ≅ Z_{d1} ⊕ · · · ⊕ Z_{dn}.

Proof. This is another application of the first isomorphism theorem. Let H = Z_{d1} ⊕
· · · ⊕ Z_{dn}, so H is generated by x1, . . . , xn, with x1 = (1, 0, . . . , 0), . . . , xn = (0, . . . ,
0, 1). Let e1, . . . , en be the standard basis of Z^n. By the universal property (proposition
170) there is a surjective homomorphism φ : Z^n → H for which

    φ(α1 e1 + · · · + αn en) = α1 x1 + · · · + αn xn

for all α1, . . . , αn ∈ Z. By theorem 163, we have H ≅ Z^n/K where K is the kernel of
φ. We have

    K = {(α1, . . . , αn) ∈ Z^n | α1 x1 + · · · + αn xn = 0_H}
      = {(α1, . . . , αn) ∈ Z^n | d_i divides α_i for all i} = ⟨d1 e1, . . . , dn en⟩.

Thus

    H ≅ Z^n/K = ⟨e1, . . . , en | d1 e1, . . . , dn en⟩. □
Note Z_1 = 0 and recall Z_0 ≅ Z. Putting all results together, we get the main
theorem of this chapter.

Theorem 186 (Fundamental theorem of finitely generated abelian groups). Let G be
a finitely generated abelian group. Then there exist integers r, k and d1, . . . , dr ≥ 2 with
d_i | d_{i+1} for all i such that

    G ≅ Z_{d1} ⊕ · · · ⊕ Z_{dr} ⊕ Z^k.

(For k = r = 0 this just means that G = 0.) □


It can be proved that r, k and the di are uniquely determined by G.
The integer k may be 0, which is the case if and only if G is finite. At the other
extreme, if r = 0 then G is free abelian.
Examples 187. 1. The group G corresponding to example 174 is

    ⟨x1, x2 | 42x1 − 35x2, 21x1 − 14x2⟩

and we have G ≅ Z_7 ⊕ Z_{21}, a group of order 7 × 21 = 147.
2. The group defined by example 175 is

    ⟨x1, x2, x3, x4 | −18x1 + 54x2 + 9x3 + 18x4, −18x1 + 12x2 − 6x3 + 6x4,
                      −18x1 + 45x2 + 6x3 + 15x4, 90x1 + 48x2 + 63x3 + 12x4⟩

which is isomorphic to Z_3 ⊕ Z_3 ⊕ Z_{18} ⊕ Z, and is an infinite group with a (maximal)
finite subgroup of order 3 × 3 × 18 = 162.
3. The group defined by example 178 is

    ⟨x1, x2, x3 | x1 + 3x2 − x3, 2x1 + x3⟩,

and is isomorphic to Z_1 ⊕ Z_3 ⊕ Z ≅ Z_3 ⊕ Z, so it is infinite, with a finite subgroup of
order 3.
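Combining proposition 185 with a Smith normal form computation gives a mechanical way to identify such groups. The snippet below is our own illustration (it reuses the `smith_normal_form` sketch given after example 175, and the output format is ours): it turns a relation matrix, whose columns are the relations, into the decomposition of theorem 186.

```python
def decomposition(relation_matrix):
    """Describe the abelian group presented by the columns of relation_matrix."""
    m = len(relation_matrix)               # number of generators
    d = smith_normal_form(relation_matrix) # invariant factors d1 | d2 | ...
    d += [0] * (m - len(d))                # generators hit by no relation give Z factors
    parts = [f"Z_{x}" for x in d if x > 1] + ["Z"] * d.count(0)
    return " + ".join(parts) if parts else "0"

print(decomposition([[42, 21], [-35, -14]]))       # Z_7 + Z_21   (example 187.1)
print(decomposition([[1, 2], [3, 0], [-1, 1]]))    # Z_3 + Z      (example 187.3)
```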

4.9 Finite abelian groups


In particular, any finite abelian group G is of the form G ≅ Z_{d1} ⊕ · · · ⊕ Z_{dr}, where
d_i > 1 and d_i | d_{i+1} for all i, and |G| = d1 · · · dr. (For r = 0 this just means G = 0.)
It can be shown that r and the d_i are uniquely determined by G. This enables us
to classify isomorphism classes of finite abelian groups of a given order n.

Examples 188. 1. n = 4. The decompositions are 4 and 2 × 2, so G ≅ Z_4 or Z_2 ⊕ Z_2.
2. n = 15. The only decomposition is 15, so G ≅ Z_{15} is necessarily cyclic.
3. n = 36. The decompositions are 36, 2 × 18, 3 × 12 and 6 × 6, so G ≅ Z_{36}, Z_2 ⊕ Z_{18},
Z_3 ⊕ Z_{12} or Z_6 ⊕ Z_6.
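This classification is easy to enumerate by machine. The sketch below (ours, not from the notes) lists, for a given order n, all chains d1 | d2 | · · · | dr of integers d_i ≥ 2 with product n; each chain corresponds to one isomorphism class Z_{d1} ⊕ · · · ⊕ Z_{dr}.

```python
def invariant_factor_chains(n, prev=1):
    """All lists [d1, ..., dr] with d1 | d2 | ... | dr, each d_i >= 2 and divisible
    by prev, and d1 * ... * dr == n."""
    if n == 1:
        return [[]]
    chains = []
    for d in range(2, n + 1):
        if n % d == 0 and d % prev == 0:
            chains.extend([d] + rest for rest in invariant_factor_chains(n // d, d))
    return chains

for n in (4, 15, 36):
    print(n, invariant_factor_chains(n))
# 4 [[2, 2], [4]]
# 15 [[15]]
# 36 [[2, 18], [3, 12], [6, 6], [36]]
```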

4.10 Tensor products


Given two abelian groups A and B, one can form a new abelian group A ⊗ B, their
tensor product — do not confuse it with the direct product! We denote by F the free
abelian group with the elements of the direct product A × B as a basis:

    F = ⟨A × B | ∅⟩.

Elements of F are formal finite Z-linear combinations Σ_i n_i (a_i, b_i), with n_i ∈ Z, a_i ∈ A,
b_i ∈ B. Let F_0 be the subgroup of F generated by the elements

    (a + a′, b) − (a, b) − (a′, b),        n(a, b) − (na, b),
    (a, b + b′) − (a, b) − (a, b′),        n(a, b) − (a, nb)                 (189)

for all possible n ∈ Z, a, a′ ∈ A, b, b′ ∈ B. The tensor product is the quotient group

    A ⊗ B = F/F_0 = ⟨A × B | relations in (189)⟩.

We have to get used to this definition, which may seem strange at first glance. First, it
is easy to write down certain elements of A ⊗ B. An elementary tensor is

    a ⊗ b := (a, b) + F_0

for a ∈ A, b ∈ B.
The generators (189) of F_0 give rise to properties of elementary tensors:

    (a + a′) ⊗ b = a ⊗ b + a′ ⊗ b,        n(a ⊗ b) = (na) ⊗ b,
    a ⊗ (b + b′) = a ⊗ b + a ⊗ b′,        n(a ⊗ b) = a ⊗ (nb).              (190)

Exercise 191. Show that a ⊗ 0 = 0 ⊗ b = 0 for all a ∈ A, b ∈ B.

It is important to realise that not all elements of A ⊗ B are elementary. They are
Z-linear combinations of elementary tensors.

Proposition 192. Suppose that A, B are abelian groups. Let {a_i | i ∈ I} span A and
{b_j | j ∈ J} span B. Then the tensor product A ⊗ B is spanned by {a_i ⊗ b_j | (i, j) ∈
I × J}.

Proof. Given Σ_k c_k ⊗ d_k ∈ A ⊗ B, we can express each c_k = Σ_i n_{ki} a_i and each d_k = Σ_j m_{kj} b_j.
Then

    Σ_k c_k ⊗ d_k = Σ_k (Σ_i n_{ki} a_i) ⊗ (Σ_j m_{kj} b_j) = Σ_{k,i,j} n_{ki} m_{kj} a_i ⊗ b_j. □

In fact, even a more subtle statement holds.


Exercise 193. Suppose that A, B are free abelian groups. Let {a_i | i ∈ I} be a basis
of A and {b_j | j ∈ J} a basis of B. Show that the tensor product A ⊗ B is a free abelian group
with basis {a_i ⊗ b_j | (i, j) ∈ I × J}.

However, for general groups tensor products can behave in quite an unpredictable
way. For instance, Z_2 ⊗ Z_3 = 0. Indeed,

    1_{Z_2} ⊗ 1_{Z_3} = (3 · 1_{Z_2}) ⊗ 1_{Z_3} = 1_{Z_2} ⊗ (3 · 1_{Z_3}) = 1_{Z_2} ⊗ 0 = 0.

To help sort out zero from nonzero elements in tensor products we need to
understand a connection between tensor products and bilinear maps.

Definition 194. Let A, B, and C be abelian groups. A function ω : A × B → C is a
bilinear map if

    ω(a + a′, b) = ω(a, b) + ω(a′, b),        nω(a, b) = ω(na, b),
    ω(a, b + b′) = ω(a, b) + ω(a, b′),        nω(a, b) = ω(a, nb)            (195)

for all n ∈ Z, a, a′ ∈ A, b, b′ ∈ B.
Let Bil(A × B, C) be the set of all bilinear maps from A × B to C.
Lemma 196 (Universal property of tensor product). The function

    θ : A × B → A ⊗ B,    θ(a, b) = a ⊗ b

is a bilinear map. This bilinear map is universal, that is, for all C composition with
θ defines a bijection

    hom(A ⊗ B, C) → Bil(A × B, C),    φ ↦ φ ∘ θ.

Proof. The function θ is a bilinear map: the properties (195) of a bilinear map easily
follow from the corresponding properties (190) of elementary tensors.
Let Fun(·, ·) denote the set of functions between two sets. Recall that F denotes
the free abelian group with basis indexed by A × B. By the universal property of free
abelian groups (proposition 170), we have a bijection

    hom(F, C) → Fun(A × B, C).

Bilinear maps A × B → C correspond to homomorphisms F → C vanishing on F_0, or
equivalently to homomorphisms from F/F_0 = A ⊗ B (lemma 162). □
In the following section we will need a criterion for elements of R ⊗ S¹ to be
nonzero. The circle group S¹ is a group under multiplication, creating certain confusion
for tensor products. To avoid this confusion we identify the multiplicative group
S¹ with the additive group R/2πZ via the natural isomorphism e^{xi} ↦ x + 2πZ.

Proposition 197. Let a ⊗ (x + 2πZ) ∈ R ⊗ R/2πZ where a, x ∈ R. Then a ⊗ (x +
2πZ) = 0 if and only if a = 0 or x ∈ πQ.

Proof. If a = 0, then a ⊗ (x + 2πZ) = 0. If x = (n/m)π with m, n ∈ Z, then

    a ⊗ (x + 2πZ) = 2m (a/2m) ⊗ (x + 2πZ)
                  = (a/2m) ⊗ 2m(x + 2πZ) = (a/2m) ⊗ (2nπ + 2πZ) = (a/2m) ⊗ 0 = 0.

In the opposite direction, let us consider a ⊗ (x + 2πZ) with a ≠ 0 and x/π ∉ Q.
It suffices to construct a bilinear map ω : R × R/2πZ → A to some group A such that
ω(a, x + 2πZ) ≠ 0. By lemma 196 this gives a homomorphism ω̃ : R ⊗ R/2πZ → A
with ω̃(a ⊗ (x + 2πZ)) = ω(a, x + 2πZ) ≠ 0. Hence, a ⊗ (x + 2πZ) ≠ 0.
Let us consider R as a vector space over Q. The subgroup πQ of R is a vector
subspace, hence the quotient group A = R/πQ is also a vector space over Q. Since
2πZ ⊆ πQ, we have a homomorphism

    ψ : R/2πZ → R/πQ,    ψ(z + 2πZ) = z + πQ.

Since x/π ∉ Q, it follows that ψ(x + 2πZ) ≠ 0.
Choose a basis17 (e_i) of R over Q such that e_1 = a. Let e_i* : R → Q be the linear
function18 computing the i-th coordinate in this basis:

    e_i*(Σ_j x_j e_j) = x_i.

The required bilinear map is defined by ω(b, z + 2πZ) = e_1*(b) ψ(z + 2πZ). Clearly,

    ω(a, x + 2πZ) = e_1*(e_1) ψ(x + 2πZ) = 1 · ψ(x + 2πZ) = x + πQ ≠ 0. □
Exercise 198. Show that Z_n ⊗ Z_m ≅ Z_{gcd(n,m)}.

4.11 Hilbert’s Third problem


All the hard work we have done is going to pay off now. We will understand a
solution of the third Hilbert problem. In 1900 Hilbert formulated 23 problems that,
in his view, would influence mathematics of the 20th century. The third problem was
solved first, in the same year 1900 by Dehn, which is quite remarkable as the problem
was missing from Hilbert’s lecture and appeared in print only in 1902, two years after
its solution.
We need some terminology. A subset C ⊆ R^n is said to be convex if tx + (1 − t)y ∈
C whenever x, y ∈ C and t ∈ R with 0 ≤ t ≤ 1. The convex hull of a subset V ⊆ R^n
is the intersection of all convex subsets of R^n containing V. A polytope in R^n is the
convex hull of a finite subset of R^n.
In his third problem Hilbert asks whether two 3D polytopes of the same volume
are scissor congruent defined as follows.

Definition 199. The scissor group P_n is by definition presented by n-dimensional
polytopes M ⊆ R^n as generators, with relations
  M = N whenever M can be moved to N by an isometry of R^n (we say that M
and N are congruent);
  A = B + C whenever there exists a hyperplane cutting A into two pieces B and C.
For a polytope M, let [M] ∈ P_n denote its class in the scissor group. Two polytopes
M, N are said to be scissor congruent if [M] = [N].

By lemma 162, n-dimensional volume gives rise to a homomorphism

    ν_n : P_n → R,    ν_n([M]) = volume(M).

The 3rd Hilbert problem asks whether ν_3 is injective.

Theorem 200. ν_2 is injective.

Proof. The following picture shows that a triangle with base b and height h is equiv-
alent to the rectangle with sides b and h/2.

In particular, any triangle is equivalent to a right-angled triangle.


17 Every independent set of vectors in an infinite-dimensional vector space can be extended to a basis.
18 commonly known as a covector

Next we shall show that two right-angled triangles of the same area are scissors
congruent.

    [Figure: triangles CAB and CPQ sharing the vertex C, with A, P on one ray from C and B, Q on the other]

The equal-area triangles are CAB and CPQ. This means that |CA| |CB| = |CP| |CQ|.
Hence, |CA|/|CQ| = |CP|/|CB| and the triangles CPB and CAQ are similar. In
particular, the edges AQ and PB are parallel, thus the triangles AQP and AQB share
the same base AQ and height and, consequently, are scissors congruent. So

    [ACB] = [ACQ] − [ABQ] = [ACQ] − [APQ] = [PCQ].

Let x ∈ ker(ν_2). We can write x = Σ_i n_i [M_i] for integers n_i and polytopes M_i. It
is easy to prove that each [M_i] is a sum of triangles, [M_i] = Σ_j [T_ij]. So there are triangles
A_i and B_j such that

    x = (Σ_i [A_i]) − (Σ_j [B_j]).

Replacing the A_i by triangles of the same volumes if necessary, we may assume
that the A_i all have the same height and that they combine to form a single triangle A,
as in the following picture.

Likewise for the B_j. So there are triangles A, B such that x = [A] − [B]. But ν_2(x) = 0,
so A and B have the same volume. Therefore [A] = [B] and x = 0. □
Observe that ν_n is surjective. So ν_2 is an isomorphism and P_2 ≅ R. However, ν_3
is not injective, as we shall now show.
Theorem 201. Let T be a regular tetrahedron and C a cube, both of unit volume. Then
[T] ≠ [C] in P_3.

Proof. Let M be a polytope with set of edges I. For each edge i, let h_i be its length
and let α_i ∈ R/2πZ be the (dihedral) angle along this edge. The Dehn invariant of M is

    δ(M) = Σ_i h_i ⊗ α_i ∈ R ⊗ R/2πZ.

Using lemma 162 we shall prove that δ gives a well-defined homomorphism δ : P_3 →
R ⊗ R/2πZ. Indeed, δ defines a homomorphism from the free group F generated by all
polytopes. Keeping in mind that P_3 = F/F_0, we need to check that δ vanishes on the
generators of F_0. Clearly, δ(M − N) = 0 if M and N are congruent. It is slightly more subtle to
see that δ(A − B − C) = 0 if A = B ∪ C is a cut. One collects the terms in 4 groups,
with the terms in each group summing to zero.
  Survivor: an edge of length h and angle α survives completely in B or C but not
both. This contributes h ⊗ α − h ⊗ α = 0.
  Edge cut: an edge of length h and angle α is cut into edges of lengths h_B in B and h_C
in C. This contributes h ⊗ α − h_B ⊗ α − h_C ⊗ α = (h − h_B − h_C) ⊗ α = 0 ⊗ α = 0.
  Angle cut: an edge of length h and angle α has its angle cut into angles α_B in B
and α_C in C. This contributes h ⊗ α − h ⊗ α_B − h ⊗ α_C = h ⊗ (α − α_B − α_C) =
h ⊗ 0 = 0.
  New edge: a new edge of length h is created. If its angle in B is α, then its angle
in C is π − α. This contributes −h ⊗ α − h ⊗ (π − α) = −h ⊗ π = 0, by
proposition 197.
Finally, the unit cube has 12 edges of length 1, each of dihedral angle π/2, while the regular
tetrahedron of unit volume has 6 edges of length √2 · ∛3, each of dihedral angle arccos(1/3). Hence

    δ([C]) = 12 (1 ⊗ π/2) = 12 ⊗ π/2 = 0,    while    δ([T]) = 6 √2 ∛3 ⊗ arccos(1/3) ≠ 0,

by proposition 197 and lemma 202. Hence, [C] ≠ [T]. □


Lemma 202. arccos(1/3) ∉ πQ.

Proof. Suppose arccos(1/3) = qπ with q ∈ Q. Consider the sequence x_n = cos(2^n qπ). Since q is
rational, this sequence takes only finitely many values. On the other hand,
x_0 = 1/3 and

    x_{n+1} = cos(2 · 2^n qπ) = 2 cos²(2^n qπ) − 1 = 2 x_n² − 1.

For example

    x_1 = −7/9,    x_2 = 17/81,    x_3 = −5983/3^8,    x_4 = 28545857/3^16,    . . .

It is easy to show that the denominators grow indefinitely (in lowest terms the denominator
of x_n is 3^{2^n}), so the sequence takes infinitely many values. Contradiction. □
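The growth of the denominators is easy to observe with exact rational arithmetic. A short sketch (our own check, not part of the notes):

```python
from fractions import Fraction

x = Fraction(1, 3)
for n in range(1, 5):
    x = 2 * x * x - 1       # x_{n+1} = 2 x_n^2 - 1
    print(n, x)
# 1 -7/9
# 2 17/81
# 3 -5983/6561
# 4 28545857/43046721
# the denominator of x_n (in lowest terms) is 3**(2**n), so the values never repeat
```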
Now it is natural to ask what exactly the group P_3 is. It was proved later (in 1965)
that the joint homomorphism (ν_3, δ) : P_3 → R ⊕ (R ⊗ R/2πZ) is injective. It is not
surjective, and the image can be explicitly described, but we won't do that here.

4.12 Possible topics for the second year essays


If you would like to write an essay taking something further from this course, here
are some suggestions. Ask me if you want more information.
(a) Bilinear and quadratic forms over fields of characteristic 2 (i.e. where 1 + 1 =
0). You can do hermitian forms too.
(b) Grassmann algebras, determinants and tensors.
(c) Matrix exponentials and the Baker–Campbell–Hausdorff formula.
(d) Abelian groups and public key cryptography (be careful not to repeat whatever
is covered in Algebra 2).
(e) Lattices (abelian groups with bilinear forms), the E8 lattice and the Leech lattice.
(f) Abelian group law on an elliptic curve.
(g) Groups Pn for other n, including a precise description of P3 .

The End
