Daan Krammer
November 27, 2014
[G, TS, E] = [G, T, F][F, S, E]. □
is written [E, v] or [E v] and is called the column vector of v with respect to E. The a_i are known as the coordinates of v with respect to E.
For typographical reasons we often denote the column vector (2) by (a_1, . . . , a_n)^T (T denotes transpose).
For fixed V, E as above, the association v ↦ [E, v] defines a bijective map V → K^{n,1}.
Theorem 3. Let T : V → W be a linear map and let v ∈ V. Let (respectively) E, F be bases of (respectively) V, W. Then [F, T(v)] = [F, T, E][E, v].
[E, 1, F] e_i = [E, 1, F][F, f_i] = [E, f_i] = f_i.
[F_2, T, E_2] = [F_2, 1, F_1][F_1, T, E_1][E_1, 1, E_2]. □
The foregoing corollary explains why matrices of the form [ F, 1, E] are called
change of base matrices.
Definition 5. Two matrices A, B ∈ K^{m,n} are said to be equivalent if there exist invertible matrices P ∈ K^{m,m}, Q ∈ K^{n,n} such that B = PAQ.
Theorem 6. Let A, B ∈ K^{m,n}. Then the following are equivalent:
(a) The matrices A, B are equivalent.
(b) The matrices A, B represent the same linear map with respect to possibly different
bases.
(c) The matrices A, B have the same rank. □
rank(T − λI_V) + nullity(T − λI_V) = n,
α_1v_1 + α_2v_2 + · · · + α_rv_r = 0.
α_1λ_1v_1 + α_2λ_2v_2 + · · · + α_rλ_rv_r = 0.
Now, subtracting λ_1 times the first equation from the second gives
α_2(λ_2 − λ_1)v_2 + · · · + α_r(λ_r − λ_1)v_r = 0.
c_A(A) = det(A − AI) = det(0) = 0.
This argument is faulty because you cannot really plug the matrix A into det(A − xI): you must compute this polynomial first.
Theorem 10 (Cayley–Hamilton). Let c_A(x) be the characteristic polynomial of the n × n matrix A over an arbitrary field K. Then c_A(A) = 0.
Proof. Recall from MA106 that, for any n × n matrix B, the adjoint adj(B) is the n × n matrix whose (j, i)th entry is the cofactor c_{ij} = (−1)^{i+j} det(B_{ij}), where B_{ij} is the matrix obtained from B by deleting the i-th row and the j-th column of B. We proved that B adj(B) = det(B) I_n.
By definition, c_A(x) = det(A − xI_n), and (A − xI_n) adj(A − xI_n) = det(A − xI_n) I_n. Now det(A − xI_n) is a polynomial of degree n in x; that is, det(A − xI_n) = a_0x^0 + a_1x^1 + · · · + a_nx^n, with a_i ∈ K. Similarly, putting B = A − xI_n in the last paragraph, we see that the (j, i)-th entry (−1)^{i+j} det(B_{ij}) of adj(B) is a polynomial of degree at most n − 1 in x. Hence adj(A − xI_n) is itself a polynomial of degree at most n − 1 in x in which the coefficients are n × n matrices over K. That is, adj(A − xI_n) = B_0x^0 + B_1x + · · · + B_{n−1}x^{n−1}, where each B_i is an n × n matrix over K. So we have
(A − xI_n)(B_0x^0 + B_1x + · · · + B_{n−1}x^{n−1}) = (a_0x^0 + a_1x^1 + · · · + a_nx^n) I_n.
Comparing coefficients of x^k (left column), and then multiplying the equation for x^k on the left by A^k (right column), gives
AB_0 = a_0 I_n                      AB_0 = a_0 I_n
AB_1 − B_0 = a_1 I_n                A²B_1 − AB_0 = a_1 A
AB_2 − B_1 = a_2 I_n                A³B_2 − A²B_1 = a_2 A²
· · ·                               · · ·
AB_{n−1} − B_{n−2} = a_{n−1} I_n    AⁿB_{n−1} − A^{n−1}B_{n−2} = a_{n−1} A^{n−1}
−B_{n−1} = a_n I_n                  −AⁿB_{n−1} = a_n Aⁿ.
Now summing all of the equations in the right hand column gives
0 = a_0A^0 + a_1A + · · · + a_{n−1}A^{n−1} + a_nAⁿ,
that is, c_A(A) = 0. □
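The theorem is easy to check mechanically on any given matrix. Below is a minimal Python/sympy sketch (not from the notes; the 2 × 2 matrix is an arbitrary illustration) that evaluates c_A(A) by Horner's rule:

```python
# Evaluate the characteristic polynomial at the matrix itself; by
# Cayley-Hamilton the result must be the zero matrix.
from sympy import Matrix, eye, symbols

x = symbols('x')
A = Matrix([[1, 2], [3, 4]])      # arbitrary example matrix
p = A.charpoly(x)                 # monic characteristic polynomial of A
coeffs = p.all_coeffs()           # leading coefficient first
result = Matrix.zeros(2, 2)
for c in coeffs:                  # Horner's rule, with I in place of x^0
    result = result * A + c * eye(2)
print(result)                     # Matrix([[0, 0], [0, 0]])
```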
the unique monic polynomial p of minimal degree for which p(T)(v) = 0_V. Since p(T) = 0 if and only if p(T)(v) = 0_V for all v ∈ V, μ_A is the least common multiple of the polynomials μ_{A,v} for all v ∈ V.
But p(T)(v) = 0_V for all v ∈ V if and only if p(T)(b_i) = 0_V for all b_i in a basis b_1, . . . , b_n of V (exercise), so μ_A is the least common multiple of the polynomials μ_{A,b_i}.
This gives a method of calculating μ_A. For any v ∈ V, we can compute μ_{A,v} by calculating the sequence of vectors v, T(v), T²(v), T³(v), . . . and stopping when it becomes linearly dependent. In practice, we compute T(v) etc. as Av for the corresponding column vector v ∈ K^{n,1}.
For example, let K = R and
A = ( 3 1 0 1 ; −1 1 0 −1 ; 0 0 1 0 ; 0 0 0 1 )   (rows separated by semicolons).
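The procedure just described is easy to mechanise. The sketch below (Python/sympy; not part of the notes, and the entry signs of A are as reconstructed above) stacks v, Av, A²v, . . . as columns and stops at the first linear dependence:

```python
# Compute mu_{A,v} by listing v, Av, A^2 v, ... and stopping at the
# first linear dependence (exact arithmetic with sympy).
from sympy import Matrix, symbols, Poly

def mu_A_v(A, v):
    x = symbols('x')
    vecs = [v]
    while True:
        vecs.append(A * vecs[-1])          # next vector A^k v
        M = Matrix.hstack(*vecs)           # columns v, Av, ..., A^k v
        ns = M.nullspace()
        if ns:                             # first dependence found
            c = ns[0] / ns[0][-1]          # make the relation monic
            return Poly(sum(c[i] * x**i for i in range(len(c))), x)

A = Matrix([[3, 1, 0, 1], [-1, 1, 0, -1], [0, 0, 1, 0], [0, 0, 0, 1]])
for i in range(4):
    print(i + 1, mu_A_v(A, Matrix.eye(4)[:, i]).as_expr())
```

Running this for the four standard basis vectors returns (x − 2)², (x − 2)², x − 1 and (x − 1)(x − 2), whose least common multiple is μ_A = (x − 1)(x − 2)².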
Example 20. Let's find the generalised eigenspaces and a Jordan chain of
A = ( 3 1 0 ; 0 3 1 ; 0 0 3 ).
Definition 24. Let T : V → V be linear. A Jordan basis for T and V is a finite basis E of V such that there exist Jordan blocks J_1, . . . , J_k such that
[E, T, E] = J_1 ⊕ · · · ⊕ J_k.
Notice that a Jordan basis is not, in general, unique. Thus, there exist multiple matrices P such that J = P⁻¹AP is the JCF of A.
The final corollary follows from an explicit calculation⁵ for J because both minimal and characteristic polynomials of J and A are the same.
Corollary 30. Suppose that the eigenvalues of A are λ_1, . . . , λ_t, and that the Jordan blocks in J for the eigenvalue λ_i are J_{λ_i,k_{i,1}}, . . . , J_{λ_i,k_{i,j_i}}, where k_{i,1} ≥ k_{i,2} ≥ · · · ≥ k_{i,j_i}.
Then the characteristic polynomial is c_A(x) = ∏_{i=1}^t (λ_i − x)^{k_i}, where k_i = k_{i,1} + · · · + k_{i,j_i} for 1 ≤ i ≤ t. The minimal polynomial is μ_A(x) = ∏_{i=1}^t (x − λ_i)^{k_{i,1}}.
Figure 1: All Jordan matrices of size 2 or 3 up to reordering the blocks, written as block sums of Jordan blocks; λ, μ, π ∈ K are distinct.

  Jordan matrix                    c_A                  μ_A
  J_{λ,1} ⊕ J_{μ,1}                (λ−x)(μ−x)           (x−λ)(x−μ)
  J_{λ,2}                          (λ−x)²               (x−λ)²
  J_{λ,1} ⊕ J_{λ,1}                (λ−x)²               (x−λ)
  J_{λ,1} ⊕ J_{μ,1} ⊕ J_{π,1}      (λ−x)(μ−x)(π−x)      (x−λ)(x−μ)(x−π)
  J_{λ,2} ⊕ J_{μ,1}                (λ−x)²(μ−x)          (x−λ)²(x−μ)
  J_{λ,1} ⊕ J_{λ,1} ⊕ J_{μ,1}      (λ−x)²(μ−x)          (x−λ)(x−μ)
  J_{λ,3}                          (λ−x)³               (x−λ)³
  J_{λ,2} ⊕ J_{λ,1}                (λ−x)³               (x−λ)²
  J_{λ,1} ⊕ J_{λ,1} ⊕ J_{λ,1}      (λ−x)³               (x−λ)
In figure 1 you can find a full list of the Jordan matrices of size 2 or 3 up to
reordering the blocks. For each we give the minimal and characteristic polynomials.
Corollary 31. Let A ∈ K^{n,n} with n ∈ {2, 3} admit a JCF. Then a JCF of A is determined by the minimal and characteristic polynomials of A.
Proof. Immediate from figure 1. □
In the rest of this section we shall show examples where A ∈ K^{n,n} is given with n ∈ {2, 3} and one is asked to find a Jordan basis.
Example 32. A = ( 1 4 ; 1 1 ). We calculate c_A(x) = x² − 2x − 3 = (x − 3)(x + 1), so
⁵ The characteristic polynomial of J is the product of the characteristic polynomials of the Jordan blocks, and the minimal polynomial of J is the least common multiple of the characteristic polynomials of the Jordan blocks.
there are two distinct eigenvalues, 3 and −1. Associated eigenvectors are (2 1)^T and (−2 1)^T, so we put P = ( 2 −2 ; 1 1 ) and then P⁻¹AP = ( 3 0 ; 0 −1 ).
If the eigenvalues are equal, then there are two possible JCF's, J_{λ_1,1} ⊕ J_{λ_1,1}, which is a scalar matrix, and J_{λ_1,2}. The minimal polynomial is respectively (x − λ_1) and (x − λ_1)² in these two cases. In fact, these cases can be distinguished without any calculation whatsoever, because in the first case A = PJP⁻¹ = J so A is its own JCF.
In the second case, a Jordan basis consists of a single Jordan chain of length 2. To find such a chain, let v_2 be any vector for which (A − λ_1I_2)v_2 ≠ 0 and let v_1 = (A − λ_1I_2)v_2. (In practice, it is often easier to find the vectors in a Jordan chain in reverse order.)
Example 33. A = ( 1 −4 ; 1 −3 ). We have c_A(x) = x² + 2x + 1 = (x + 1)², so there is a single eigenvalue −1 with multiplicity 2. Since the first column of A + I_2 is non-zero, we can choose v_2 = (1 0)^T and v_1 = (A + I_2)v_2 = (2 1)^T, so P = ( 2 1 ; 1 0 ) and P⁻¹AP = ( −1 1 ; 0 −1 ).
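These small examples can be confirmed with sympy's built-in Jordan form (a sketch; sympy may return a different P, since Jordan bases are not unique):

```python
# Check example 33 against sympy's Jordan form.
from sympy import Matrix

A = Matrix([[1, -4], [1, -3]])
P, J = A.jordan_form()        # returns (P, J) with A = P*J*P**-1
print(J)                      # Matrix([[-1, 1], [0, -1]])
print(P**-1 * A * P == J)     # True
```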
Now let n = 3. If there are three distinct eigenvalues, then A is diagonalisable.
Suppose that there are two distinct eigenvalues, so one has multiplicity 2, and the other has multiplicity 1. Let the eigenvalues be λ_1, λ_1, λ_2, with λ_1 ≠ λ_2. Then there are two possible JCF's for A, J_{λ_1,1} ⊕ J_{λ_1,1} ⊕ J_{λ_2,1} and J_{λ_1,2} ⊕ J_{λ_2,1}, and the minimal polynomial is (x − λ_1)(x − λ_2) in the first case and (x − λ_1)²(x − λ_2) in the second.
In the first case, a Jordan basis is a union of three Jordan chains of length 1, each
of which consists of an eigenvector of A.
Example 34. A = ( 2 0 0 ; 1 5 2 ; −2 −6 −2 ). Then
We know from the theory above that the minimal polynomial must be (x − 2)(x − 1) or (x − 2)²(x − 1). We can decide which simply by calculating (A − 2I_3)(A − I_3) to test whether or not it is 0. We have
A − 2I_3 = ( 0 0 0 ; 1 3 2 ; −2 −6 −4 ),   A − I_3 = ( 1 0 0 ; 1 4 2 ; −2 −6 −3 ),
Example 35. A = ( 3 2 −1 ; 0 3 −1 ; 1 4 −1 ). Then
c_A(x) = (3 − x)[(3 − x)(−1 − x) + 4] − 2 + (3 − x)
       = −x³ + 5x² − 8x + 4 = (2 − x)²(1 − x),
A − 2I_3 = ( 1 2 −1 ; 0 1 −1 ; 1 4 −3 ),   (A − 2I_3)² = ( 0 0 0 ; −1 −3 2 ; −2 −6 4 ),
(A − I_3) = ( 2 2 −1 ; 0 2 −1 ; 1 4 −2 ).
and then
P⁻¹AP = ( 2 1 0 ; 0 2 0 ; 0 0 1 ).
Example 36. A = ( 0 2 1 ; −1 −3 −1 ; 1 2 0 ). Then
c_A(x) = −x[(3 + x)x + 2] − 2(x + 1) − 2 + (3 + x)
       = −x³ − 3x² − 3x − 1 = −(1 + x)³.
We have
A + I_3 = ( 1 2 1 ; −1 −2 −1 ; 1 2 1 ),
and then
P⁻¹AP = ( −1 1 0 ; 0 −1 0 ; 0 0 −1 ).
In the third case, there is a single Jordan chain, and we choose v_3 such that
(A − λ_1I_3)²v_3 ≠ 0,   v_2 = (A − λ_1I_3)v_3,   v_1 = (A − λ_1I_3)²v_3.
Example 37. A = ( 0 −1 0 ; 1 −1 −1 ; 1 0 −2 ). Then
We have
A + I_3 = ( 1 −1 0 ; 1 0 −1 ; 1 0 −1 ),   (A + I_3)² = ( 0 −1 1 ; 0 −1 1 ; 0 −1 1 ),
V = W_1 ⊕ · · · ⊕ W_k
Tx = Tρ(x_1, . . . , x_k) = T(x_1 + · · · + x_k)
  = Tx_1 + · · · + Tx_k              because T is linear
  = T_1x_1 + · · · + T_kx_k          because x_i ∈ W_i for all i
  = 0 + · · · + 0                    because T_ix_i = 0 for all i
  = 0
0 = Ty = Tρ(x_1, . . . , x_k) = T(x_1 + · · · + x_k)
  = Tx_1 + · · · + Tx_k              because T is linear
  = T_1x_1 + · · · + T_kx_k          because x_i ∈ W_i for all i
  = ρ(T_1x_1, . . . , T_kx_k)        because T_ix_i ∈ W_i for all i.
Theorem 39. Let T : V → V be linear admitting a JCF J. Let i > 0 and λ ∈ K. Then the number of Jordan blocks of J with eigenvalue λ and degree at least i is equal to
nullity (T − λI_V)^i − nullity (T − λI_V)^{i−1}.
Proof. Let us first prove this if J is a Jordan block, J = J_{λ,n}.
Let (b_1, . . . , b_n) be a Jordan basis for J. For all a_i ∈ K,
(J − λI_n)^k ( Σ_i a_ib_i ) = Σ_{i=k+1}^n a_ib_{i−k},
which shows
ker (J − λI_n)^k = Span(b_1, . . . , b_k) if 0 ≤ k ≤ n,   and = Kⁿ if n ≤ k,
so
nullity (A − λI_n)^k = nullity (J − λI_n)^k = k if 0 ≤ k ≤ n,   and = n if n ≤ k.
The result for J = J_{λ,n} follows.
Knowing the result for Jordan blocks, we deduce the full result as follows. There are subspaces W_1, . . . , W_k such that V = W_1 ⊕ · · · ⊕ W_k and T(W_i) ⊂ W_i and the restriction T_i of T to W_i is a Jordan block (that is, the matrix of T_i with respect to a suitable basis is a Jordan block). Then
nullity (T − λI_V)^i − nullity (T − λI_V)^{i−1} = Σ_j [ nullity (T_j − λI_{W_j})^i − nullity (T_j − λI_{W_j})^{i−1} ]   by lemma 38. □
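Theorem 39 gives a purely computational recipe for reading off the block structure. A small hedged sketch (Python/sympy; the illustrative matrix is chosen here, not taken from the notes):

```python
# Number of Jordan blocks with eigenvalue lam and degree >= i equals
# nullity((A - lam*I)^i) - nullity((A - lam*I)^(i-1))  (theorem 39).
from sympy import Matrix, eye

def blocks_of_degree_at_least(A, lam, i):
    n = A.shape[0]
    M = A - lam * eye(n)
    null = lambda k: n - (M**k).rank()   # nullity of (A - lam*I)^k
    return null(i) - null(i - 1)

# J_{3,2} + J_{3,1}: one block of degree 2 and one of degree 1:
A = Matrix([[3, 1, 0], [0, 3, 0], [0, 0, 3]])
print([blocks_of_degree_at_least(A, 3, i) for i in (1, 2, 3)])  # [2, 1, 0]
```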
They are adjoined to our basis of V in the second stage. They each form a Jordan
chain of length 1, so we now have a collection of n vectors which form a disjoint
union of Jordan chains.
To complete the proof, we need to show that these n vectors form a basis of V, for which it is enough to show that they are linearly independent.
By lemma 40 we may assume that T has only one eigenvalue λ.
Suppose that α_1w_1 + · · · + α_{n−m}w_{n−m} + x = 0, where x ∈ U. Applying T − λI_V gives
α_1(T − λI_V)(w_1) + · · · + α_ℓ(T − λI_V)(w_ℓ) + (T − λI_V)(x) = 0.
For all i with 1 ≤ i ≤ ℓ then, (T − λI_V)(w_i) is the last member of one of the ℓ Jordan chains for T_U. Moreover, (T − λI_V)(x) is a linear combination of the basis vectors of U other than (T − λI_V)(w_i) for 1 ≤ i ≤ ℓ. Hence, by linear independence of the basis of U, we deduce that α_i = 0 for 1 ≤ i ≤ ℓ and (T − λI_V)(x) = 0.
So x ∈ ker(T_U − λI_U). But, by construction, w_{ℓ+1}, . . . , w_{n−m} extend a basis of ker(T_U − λI_U) to W = ker(T − λI_V), so we also get α_i = 0 for ℓ + 1 ≤ i ≤ n − m, which completes the proof. □
Lemma 40. Let T : V → V be linear and dim(V) = n < ∞. Let Λ be a set of eigenvalues of T. For all λ ∈ Λ let x_λ ∈ V be such that (T − λI_V)ⁿ x_λ = 0. Assume Σ_{λ∈Λ} x_λ = 0. Then x_λ = 0 for all λ.
Proof. Induction on k = #Λ. For k = 0 there is nothing to prove. Assume it's true for k − 1. Pick π ∈ Λ and apply (T − πI_V)ⁿ to Σ_Λ x_λ = 0. We get
Σ_{λ∈Λ∖{π}} (T − πI_V)ⁿ x_λ = 0.
2.9 Examples
Example 41. A = ( −2 0 0 0 ; 0 −2 1 0 ; 0 0 −2 0 ; 1 0 −2 −2 ).
Then c_A(x) = (−2 − x)⁴, so there is a single eigenvalue −2 with multiplicity 4. We find
A + 2I_4 = ( 0 0 0 0 ; 0 0 1 0 ; 0 0 0 0 ; 1 0 −2 0 ),
nullspace of A + 2I_4. In fact, since this nullspace is spanned by the second and fourth standard basis vectors, the obvious choice is v_2 = (1 0 0 0)^T, v_4 = (0 0 1 0)^T, and then v_1 = (A + 2I_4)v_2 = (0 0 0 1)^T, v_3 = (A + 2I_4)v_4 = (0 1 0 −2)^T, so to transform A to JCF, we put
P = ( 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ; 1 0 −2 0 ),
P⁻¹ = ( 0 2 0 1 ; 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ),
P⁻¹AP = ( −2 1 0 0 ; 0 −2 0 0 ; 0 0 −2 1 ; 0 0 0 −2 ).
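A quick machine check of this example (entry signs of A as reconstructed above; a sketch, not part of the notes):

```python
# Verify the Jordan basis of example 41.
from sympy import Matrix

A = Matrix([[-2, 0, 0, 0], [0, -2, 1, 0], [0, 0, -2, 0], [1, 0, -2, -2]])
P = Matrix([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, -2, 0]])
print(P**-1 * A * P)   # block diagonal J_{-2,2} + J_{-2,2}
```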
Example 42. A = ( −1 3 1 0 ; 0 2 1 0 ; 0 0 2 0 ; 0 3 1 −1 ).
Then c_A(x) = (−1 − x)²(2 − x)², so there are two eigenvalues, −1 and 2, both with multiplicity 2. There are four possibilities for the JCF (one or two blocks for each of the two eigenvalues). We could determine the JCF by computing the minimal polynomial μ_A but it is probably easier to compute the nullities of the eigenspaces and use theorem 39. We have
A + I_4 = ( 0 3 1 0 ; 0 3 1 0 ; 0 0 3 0 ; 0 3 1 0 ),
A − 2I_4 = ( −3 3 1 0 ; 0 0 1 0 ; 0 0 0 0 ; 0 3 1 −3 ),   (A − 2I_4)² = ( 9 −9 0 0 ; 0 0 0 0 ; 0 0 0 0 ; 0 −9 0 9 ).
The rank of A + I_4 is clearly 2, so its nullity is also 2, and hence there are two Jordan blocks with eigenvalue −1. The three non-zero rows of (A − 2I_4) are linearly independent, so its rank is 3, hence its nullity 1, so there is just one Jordan block with eigenvalue 2, and the JCF of A is J_{−1,1} ⊕ J_{−1,1} ⊕ J_{2,2}.
For the two Jordan chains of length 1 for eigenvalue −1, we just need two linearly independent eigenvectors, and the obvious choice is v_1 = (1 0 0 0)^T, v_2 = (0 0 0 1)^T. For the Jordan chain v_3, v_4 for eigenvalue 2, we need to choose v_4 in the nullspace of (A − 2I_4)² but not in the nullspace of A − 2I_4. (This is why we calculated (A − 2I_4)².) An obvious choice here is v_4 = (0 0 1 0)^T, and then v_3 = (1 1 0 1)^T, and to transform A to JCF, we put
P = ( 1 0 1 0 ; 0 0 1 0 ; 0 0 0 1 ; 0 1 1 0 ),
P⁻¹ = ( 1 −1 0 0 ; 0 −1 0 1 ; 0 1 0 0 ; 0 0 1 0 ),
P⁻¹AP = ( −1 0 0 0 ; 0 −1 0 0 ; 0 0 2 1 ; 0 0 0 2 ).
If J = J_{k_1,λ_1} ⊕ J_{k_2,λ_2} ⊕ · · · ⊕ J_{k_t,λ_t} then Jⁿ = (J_{k_1,λ_1})ⁿ ⊕ (J_{k_2,λ_2})ⁿ ⊕ · · · ⊕ (J_{k_t,λ_t})ⁿ.
Here x(n) ∈ Kⁿ is a sequence of vectors in a vector space over a field K. One thinks of x(n) as a state of the system at time n. The initial state is x(0) = w. The n × n matrix A with coefficients in K describes the evolution of the system. The adjective autonomous means that the evolution equation does not change with the time⁷.
It takes longer to formulate this problem than to solve it. The solution is a no-
brainer:
x(n) = Ax(n − 1) = A²x(n − 2) = · · · = Aⁿx(0) = Aⁿw.
As a working example, let us consider a 2-step linearly recursive sequence. It is determined by a quadruple (a, b, c, d) ∈ K⁴ and the rules
This gives the closed formula⁸ for the arithmo-geometric sequence we were seeking:
s_n = (1 − n)qⁿa + nqⁿ⁻¹b.
If c² + 4d ≠ 0, the JCF of A is
( (c + √(c² + 4d))/2   0 ; 0   (c − √(c² + 4d))/2 )
and the closed formula for s_n will involve the sum of two geometric sequences. Let us
⁷ A nonautonomous system would be described by x(n + 1) = A(n)x(n) here.
⁸ Closed means non-recursive, for instance, s_n = a + n(b − a) for the arithmetic sequence.
see it through for Fibonacci and Lucas numbers using Lagrange's polynomial. Since c = d = 1, c² + 4d = 5 and the roots of c_A(z) are the golden ratio φ = (1 + √5)/2 and 1 − φ = (1 − √5)/2. It is useful to observe that 2φ − 1 = √5 and φ(1 − φ) = −1.
Let us introduce the number μ_n = φⁿ − (1 − φ)ⁿ. Suppose the Lagrange interpolation of zⁿ at the roots of z² − z − 1 is h(z) = αz + β. The condition on the coefficients is given by
  φⁿ = h(φ) = αφ + β
  (1 − φ)ⁿ = h(1 − φ) = α(1 − φ) + β
Solving them gives α = μ_n/√5 and β = μ_{n−1}/√5. It follows that
Aⁿ = αA + βI_2 = (μ_n/√5)A + (μ_{n−1}/√5)I_2 = ( μ_{n−1}/√5   μ_n/√5 ; μ_n/√5   (μ_n + μ_{n−1})/√5 ).
Since (F_n, F_{n+1})^T = Aⁿ(0, 1)^T, it immediately implies that
Aⁿ = ( F_{n−1}   F_n ; F_n   F_{n+1} )   and   F_n = μ_n/√5.
Similarly for the Lucas numbers, we get (L_n, L_{n+1})^T = Aⁿ(2, 1)^T and
L_n = 2F_{n−1} + F_n = F_{n−1} + F_{n+1} = (μ_{n−1} + μ_{n+1})/√5.
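The closed formulas can be cross-checked against direct matrix powers. A short numpy sketch (illustrative, not from the notes; object dtype keeps the integer arithmetic exact):

```python
# Fibonacci and Lucas numbers from powers of A = [[0, 1], [1, 1]].
import numpy as np

A = np.array([[0, 1], [1, 1]], dtype=object)   # exact integer arithmetic

def fib(n):
    # A^n = [[F_{n-1}, F_n], [F_n, F_{n+1}]]
    return np.linalg.matrix_power(A, n)[0, 1]

def lucas(n):
    # (L_n, L_{n+1})^T = A^n (2, 1)^T
    return (np.linalg.matrix_power(A, n) @ np.array([2, 1], dtype=object))[0]

print([fib(n) for n in range(1, 10)])    # [1, 1, 2, 3, 5, 8, 13, 21, 34]
print([lucas(n) for n in range(1, 10)])  # [1, 3, 4, 7, 11, 18, 29, 47, 76]
```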
while
f(J_{k,λ}) = ( f(λ)   f^[1](λ)   · · ·   f^[k−1](λ) ; 0   f(λ)   · · ·   f^[k−2](λ) ; … ; 0   0   · · ·   f(λ) ).
x(t) = e^{tA}w
Example 44. Let us consider a harmonic oscillator described by the equation y″(t) + y(t) = 0. The general solution y(t) = α sin(t) + β cos(t) is well known. Let us obtain it using matrix exponents. Setting
x(t) = ( y(t) ; y′(t) ),   A = ( 0 1 ; −1 0 )
the harmonic oscillator becomes the initial value problem with a solution x(t) = e^{tA}x(0). The eigenvalues of A are i and −i. Interpolating e^{zt} at these values of z gives the following condition on h(z) = αz + β:
  e^{it} = h(i) = αi + β
  e^{−it} = h(−i) = −αi + β
Using matrices
x(t) = ( y_1(t) ; y_2(t) ; y_3(t) ),   w = ( 1 ; 1 ; 0 ),   A = ( 1 0 −3 ; 1 −1 −6 ; −1 2 5 ),
and then
x(t) = ( y_1(t) ; y_2(t) ; y_3(t) ) = e^{tA} ( 1 ; 1 ; 0 ) = ( (3 − 3t)e^{2t} − 2eᵗ ; (2 − 3t)e^{2t} − eᵗ ; te^{2t} ).
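A numerical cross-check of this solution (a sketch; scipy's expm computes e^{tA}, and the matrix entries are as reconstructed above):

```python
# Compare e^{tA} w against the closed-form solution at one time t.
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 0.0, -3.0], [1.0, -1.0, -6.0], [-1.0, 2.0, 5.0]])
w = np.array([1.0, 1.0, 0.0])

def exact(t):
    return np.array([(3 - 3*t)*np.exp(2*t) - 2*np.exp(t),
                     (2 - 3*t)*np.exp(2*t) - np.exp(t),
                     t*np.exp(2*t)])

t = 0.7
print(np.allclose(expm(t * A) @ w, exact(t)))   # True
```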
Notice the difference between linear and bilinear maps. For instance, let V =
W = K. Addition is a linear map but not bilinear. On the other hand, multiplication
is bilinear but not linear.
Let us choose a basis E = (e_1, . . . , e_n) of V and a basis F = (f_1, . . . , f_m) of W.
Let τ : W × V → K be a bilinear map, and let α_{ij} = τ(f_i, e_j), for 1 ≤ i ≤ m, 1 ≤ j ≤ n. The m × n matrix A = (α_{ij}) is called the matrix of τ with respect to the bases E and F.
Let v ∈ V, w ∈ W and write v = x_1e_1 + · · · + x_ne_n and w = y_1f_1 + · · · + y_mf_m. Recall our notation from section 1.2:
[E, v] = (x_1, . . . , x_n)^T ∈ K^{n,1},   and   [F, w] = (y_1, . . . , y_m)^T ∈ K^{m,1}.
For example, let V = W = R² and use the natural basis of V. Suppose that A = ( 1 −1 ; 2 0 ). Then
τ((y_1, y_2), (x_1, x_2)) = (y_1, y_2) ( 1 −1 ; 2 0 ) ( x_1 ; x_2 ) = y_1x_1 − y_1x_2 + 2y_2x_1.
[F_2, y]^T B [E_2, x] = τ(y, x) = [F_1, y]^T A [E_1, x]
  = ([F_1, 1, F_2][F_2, y])^T A ([E_1, 1, E_2][E_2, x])
  = (Q[F_2, y])^T A (P[E_2, x]) = [F_2, y]^T Q^T A P [E_2, x]
τ(y′_1e′_1 + y′_2e′_2, x′_1e′_1 + x′_2e′_2) = y′_1x′_2 + 2y′_2x′_1 + y′_2x′_2.
Definition 52. Two square matrices A and B are called congruent if there exists an invertible matrix P with B = P^T AP.
Compare this with similarity of matrices which is defined by the formula B = P⁻¹AP.
Definition 53. A bilinear form τ on V is called symmetric if τ(w, v) = τ(v, w) for all v, w ∈ V. An n × n matrix A is called symmetric if A^T = A.
Proposition 54. Let E be a finite basis of a vector space V. Then, a bilinear form τ on V is symmetric if and only if its matrix with respect to E is symmetric.
Proof. Easy exercise. □
Example 55. The best known example of a bilinear form is when V = Rⁿ, and τ is defined by
τ((x_1, x_2, . . . , x_n), (y_1, y_2, . . . , y_n)) = x_1y_1 + x_2y_2 + · · · + x_ny_n.
The matrix of this bilinear form with respect to the standard basis of Rⁿ is the identity matrix I_n. Geometrically, it is equal to the normal scalar product τ(v, w) = |v||w| cos θ, where θ is the angle between the vectors v and w.
Figure 2: (two plots, in coordinates x, y and x′, y′; graphics omitted)
Figure 3: (x′)² + 4(y′)² = 1 (graphics omitted)
¹⁰ Fields with 1 + 1 = 0 are fields of characteristic 2. One can actually do quadratic and bilinear forms over them but the theory is quite specific. It could be a good topic for a second year essay.
When n ≤ 3, we shall usually write x, y, z instead of x_1, x_2, x_3. For example, if n = 2 and A = ( 1 3 ; 3 −2 ), then q(v) = x² − 2y² + 6xy.
Conversely, if we are given a quadratic form as in the right hand side of (3.4), then it is easy to write down its matrix A. For example, if n = 3 and q(v) = 3x² + y² − 2z² + 4xy − xz, then
A = ( 3   2   −1/2 ; 2   1   0 ; −1/2   0   −2 ).
Σ_{1 ≤ i ≤ j ≤ n} α_{ij} f_i f_j
is a quadratic form on V.
It is for the sake of simplicity that we have introduced cobasis in the foregoing.
The grown-up way to say ‘cobasis of V’ is ‘basis of the dual of V’.
Theorem 60. Assume that 2 ≠ 0 in K.
Let q be a quadratic form on V and write dim(V) = n. Then there are α_1, . . . , α_n ∈ K and a basis F of V such that q(v) = Σ_{i=1}^n α_iy_i², where the y_i are the coordinates of v with respect to F.
Equivalently, any symmetric matrix is congruent to a diagonal one.
Proof. This is by induction on n. There is nothing to prove when n = 1. As usual, let A = (α_{ij}) be the matrix of q with respect to an initial basis e_1, . . . , e_n and write v = Σ_i x_ie_i.
Case 1. First suppose that α_{11} ≠ 0. As in the example in Subsection 3.3, we can complete the square. Since 2 ≠ 0 we can write
q(v) = α_{11}(x_1² + 2β_2x_1x_2 + · · · + 2β_nx_1x_n) + q_0(v),
where q_0 is a quadratic form involving only the coordinates x_2, . . . , x_n and β_i ∈ K.
We make the change of coordinates
y_1 = x_1 + β_2x_2 + · · · + β_nx_n,   y_i = x_i for 2 ≤ i ≤ n.
Then q(v) = α_{11}y_1² + q_1(v) for another quadratic form q_1(v) involving only y_2, . . . , y_n.
By the inductive hypothesis (applied to the subspace of V spanned by e_2, . . . , e_n), we can change the coordinates of q_1 from y_2, . . . , y_n to z_2, . . . , z_n, say, to bring it to the required form, and then we get q(v) = Σ_{i=1}^n α_iz_i² (where z_1 = y_1) as required.
Case 2. α_{11} = 0 but α_{ii} ≠ 0 for some i > 1. In this case, we start by interchanging e_1 with e_i (or equivalently x_1 with x_i), which takes us back to case 1.
Case 3. α_{ii} = 0 for all i. If α_{ij} = 0 for all i and j then there is nothing to prove, so assume that α_{ij} ≠ 0 for some i, j. Then we start by making a coordinate change x_i = y_i + y_j, x_j = y_i − y_j, x_k = y_k for k ≠ i, j. This introduces terms 2α_{ij}(y_i² − y_j²) into q, taking us back to case 2. □
Example 61. Let n = 3 and q(v) = xy + 3yz − 5xz, so A = ( 0   1/2   −5/2 ; 1/2   0   3/2 ; −5/2   3/2   0 ).
Since we are using x, y, z for our variables, we can use x_1, y_1, z_1 (rather than x′, y′, z′) for the variables with respect to a new basis, which will make things typographically simpler!
We are in Case 3 of the proof above, and so we start with a coordinate change x = x_1 + y_1, y = x_1 − y_1, z = z_1, which corresponds to the basis change matrix
P_1 = ( 1 1 0 ; 1 −1 0 ; 0 0 1 ).
Then we get q(v) = x_1² − y_1² − 2x_1z_1 − 8y_1z_1.
We are now in Case 1 of the proof above, and the next basis change, from completing the square, is x_2 = x_1 − z_1, y_2 = y_1, z_2 = z_1, or equivalently, x_1 = x_2 + z_2, y_1 = y_2, z_1 = z_2, and then the associated basis change matrix is
P_2 = ( 1 0 1 ; 0 1 0 ; 0 0 1 ),
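One can let sympy confirm that these basis changes do what the proof promises (a sketch, assuming the matrices P_1 and P_2 above):

```python
# After the two congruence steps, row/column 1 of P^T A P should be
# cleared except for the diagonal entry.
from sympy import Matrix, Rational

A = Matrix([[0, Rational(1, 2), Rational(-5, 2)],
            [Rational(1, 2), 0, Rational(3, 2)],
            [Rational(-5, 2), Rational(3, 2), 0]])
P1 = Matrix([[1, 1, 0], [1, -1, 0], [0, 0, 1]])
P2 = Matrix([[1, 0, 1], [0, 1, 0], [0, 0, 1]])
P = P1 * P2
print(P.T * A * P)   # Matrix([[1, 0, 0], [0, -1, -4], [0, -4, -1]])
```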
Lemma 62. Let P, A ∈ K^{n,n} with P invertible. Then rank(A) = rank(P^T AP).
Proof. In MA106 we proved that rank(A) = rank(QAP) for any invertible matrices P, Q. Well, P^T is also invertible and (P^T)⁻¹ = (P⁻¹)^T. Setting Q := P^T finishes the proof. □
Definition 63. Assume 2 ≠ 0 in K and q ∈ Q(V, K). We define rank(q) to be the rank of the matrix of q with respect to a basis E of V. This rank doesn't depend on E by theorem 51 and lemma 62.
If P^T AP is diagonal, then its rank is equal to the number of non-zero terms on the diagonal.
Remark 64. Notice that both statements of theorem 60 fail in characteristic 2. Let K be the field of two elements. The quadratic form q(x, y) = xy cannot be diagonalised. Similarly, the symmetric matrix ( 0 1 ; 1 0 ) is not congruent to a diagonal matrix. Do it as an exercise: there are 6 possible changes of variables (you need to choose two among the three possible variables x, y and x + y) and you can observe directly what happens with each change of variables.
Proposition 65. Any quadratic form q over C has the form q(v) = Σ_{i=1}^r y_i² with respect to a suitable basis, where r = rank(q).
Equivalently, for any symmetric matrix A ∈ C^{n,n}, there is an invertible matrix P ∈ C^{n,n} such that P^T AP = B, where B = (β_{ij}) is a diagonal matrix with β_{ii} = 1 for 1 ≤ i ≤ r, β_{ii} = 0 for r + 1 ≤ i ≤ n, and r = rank(A).
Proof. By theorem 60 there are coordinates x_i such that q(v) = Σ_{i=1}^n α_{ii}x_i².
We may assume that α_{ii} ≠ 0 for 1 ≤ i ≤ r and α_{ii} = 0 for r + 1 ≤ i ≤ n, where r = rank(q) (otherwise we permute the coordinates).
Finally we make a coordinate change y_i = √(α_{ii}) x_i (1 ≤ i ≤ r), giving q(v) = Σ_{i=1}^r y_i². □
q(v) = Σ_{i=1}^t x_i² − Σ_{i=1}^u x_{t+i}² = Σ_{i=1}^{t′} y_i² − Σ_{i=1}^{u′} y_{t′+i}².     (68)
Then t = t′ and u = u′.
Proof. Write n = dim(V). We know that t + u = rank(q) = t′ + u′, so it is enough to prove that t = t′. Suppose not, and suppose that t > t′. Let
V_1 = {v ∈ V | x_{t+1} = x_{t+2} = . . . = x_n = 0}
V_2 = {v ∈ V | y_1 = y_2 = . . . = y_{t′} = 0}.
Then dim(V_1) + dim(V_2) = t + (n − t′) > n, and there is a non-zero vector v ∈ V_1 ∩ V_2. But it is easily seen from (68) that 0 ≠ v ∈ V_1 ⇒ q(v) > 0 and 0 ≠ v ∈ V_2 ⇒ q(v) ≤ 0. This contradiction completes the proof. □
Proposition 78. Let A ∈ R^{n,n} and let c_j denote the jth column of A. Then A is orthogonal if and only if c_i^T c_j = δ_{ij} for all i, j.
Proof. Let c_j denote the jth column of A. Then c_i^T is the ith row of A^T. So the (i, j)th entry of A^T A is c_i^T c_j. So A^T A = I_n if and only if c_i^T c_j = δ_{ij} for all i, j. □
Example 79. For any θ ∈ R, let A = ( cos θ   −sin θ ; sin θ   cos θ ). (This represents a counterclockwise rotation through an angle θ.) Then it is easily checked that A^T A = AA^T = I_2. Notice that the columns of A are mutually orthogonal vectors of length 1, and the same applies to the rows of A.
Definition/Lemma 80 (Gram–Schmidt step). Let F = (f_1, . . . , f_n) be a basis of a Euclidean space V whose first t − 1 vectors are orthonormal. We define S_t(F) = (f_1, . . . , f_{t−1}, h, f_{t+1}, . . . , f_n) where
g = f_t − Σ_{i=1}^{t−1} (f_t · f_i) f_i,   h = g (g · g)^{−1/2}
g · f_j = f_t · f_j − Σ_{i=1}^{t−1} (f_t · f_i)(f_i · f_j) = f_t · f_j − f_t · f_j = 0.
f_t − Σ_{i=1}^{t−1} ((f_t · f_i)/(f_i · f_i)) f_i
in the t-th step and rescales all vectors at the very end. This gives the same result as the original Gram–Schmidt process but is faster in practice.
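A compact implementation of this rescale-at-the-end variant (Python/numpy sketch; the input vectors are an arbitrary illustration):

```python
# Classical Gram-Schmidt, deferring normalisation to the very end.
import numpy as np

def gram_schmidt(F):
    """F: list of linearly independent vectors; returns an orthonormal list."""
    G = []
    for f in F:
        g = f.astype(float)
        for h in G:
            g = g - (g @ h) / (h @ h) * h    # remove the component along h
        G.append(g)
    return [g / np.sqrt(g @ g) for g in G]   # rescale at the very end

F = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
Q = np.column_stack(gram_schmidt(F))
print(np.allclose(Q.T @ Q, np.eye(3)))       # True: columns are orthonormal
```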
Corollary 82. Let f_1, . . . , f_r be orthonormal vectors in an n-dimensional Euclidean vector space. Then they can be extended to an orthonormal basis (f_1, . . . , f_n).
Proof. We prove first that f_1, . . . , f_r are linearly independent. Suppose Σ_{i=1}^r x_if_i = 0 for some x_1, . . . , x_r ∈ R and let j be such that 1 ≤ j ≤ r. Taking the scalar product with f_j gives 0 = f_j · 0 = Σ_{i=1}^r x_i f_j · f_i = x_j, since f_1, . . . , f_r are orthonormal.
In MA106 you proved that these can be extended to a basis F = (f_1, . . . , f_n). Now S_n · · · S_{r+1}(F) is an orthonormal basis and includes f_i for i ≤ r as required. □
Proposition 83. Let A ∈ R^{n,n} be a real symmetric matrix. Then:
(a) All complex eigenvalues of A lie in R.
(b) If n > 0 then A has a real eigenvalue.
Proof. Proof of (a). For a column vector v or matrix B over C, we denote by v̄ or B̄ the result of replacing all entries of v or B by their complex conjugates. Since the entries of A lie in R, we have Ā = A.
Let v be a complex eigenvector associated with λ. Then
Av = λv (84)
Av̄ = λ̄v̄ (85)
v̄^T A = λ̄v̄^T. (86)
Before coming to the main theorem of this section, we recall the block sum A ⊕ B of matrices, which we introduced in section 2.5. It is straightforward to check that (A_1 ⊕ B_1)(A_2 ⊕ B_2) = (A_1A_2 ⊕ B_1B_2), provided the sizes of the matrices are such that A_1A_2 and B_1B_2 are defined.
Theorem 87. For any symmetric matrix A ∈ R^{n,n}, there is an orthogonal matrix P such that P^T AP is a diagonal matrix.
Note P^T = P⁻¹ so that A is simultaneously congruent and similar to the diagonal matrix P^T AP.
Proof. Induction on n. For n = 1 there is nothing to prove. Assume it's true for n − 1.
We use the standard inner product on Rⁿ defined by v · w = v^T w and the standard (orthonormal) basis E = (e_1, . . . , e_n) of Rⁿ.
By proposition 83 there exists an eigenvector g ∈ Rⁿ of A with eigenvalue, say, λ. Note that g · g > 0 and put f_1 = g (g · g)^{−1/2}. Then f_1 · f_1 = 1 so by corollary 82 there exists an orthonormal basis F of Rⁿ whose first vector is f_1.
Put S = [E, 1, F]. Then S is orthogonal by proposition 77. So S^T = S⁻¹.
Put B = S^T AS. Then B is again symmetric. Moreover Be_1 = λe_1 because
Be_1 = S⁻¹ASe_1 = [F, 1, E]A[E, 1, F]e_1 = [F, 1, E]A f_1 = λ[F, 1, E] f_1 = λe_1.
P^T AP = (1 ⊕ Q)^T S^T A S (1 ⊕ Q) = (1 ⊕ Q)^T B (1 ⊕ Q)
       = (1 ⊕ Q^T)(λ ⊕ C)(1 ⊕ Q) = λ ⊕ (Q^T CQ)
P^T P = (1 ⊕ Q)^T (S^T S)(1 ⊕ Q)
      = (1 ⊕ Q^T)(1 ⊕ Q) = 1 ⊕ Q^T Q = 1 ⊕ I_{n−1} = I_n. □
v_1^T Av_2 = λ_1 v_1^T v_2 (3)   and by (2)   v_1^T Av_2 = λ_2 v_1^T v_2 (4).
Example 91. Let n = 3 and q(v) = 3x² + 6y² + 3z² − 4xy − 4yz + 2xz, so
A = ( 3 −2 1 ; −2 6 −2 ; 1 −2 3 ).
It can then be checked that P^T P = I_3 and that P^T AP is the diagonal matrix with diagonal 8, 2, 2.
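Numerically, numpy.linalg.eigh produces exactly such an orthogonal P (a sketch checking example 91; eigh returns the eigenvalues in ascending order):

```python
# Orthogonal diagonalisation of a real symmetric matrix.
import numpy as np

A = np.array([[3.0, -2.0, 1.0], [-2.0, 6.0, -2.0], [1.0, -2.0, 3.0]])
evals, P = np.linalg.eigh(A)
print(evals)                                      # [2. 2. 8.]
print(np.allclose(P.T @ A @ P, np.diag(evals)))   # True
print(np.allclose(P.T @ P, np.eye(3)))            # True: P is orthogonal
```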
For example, both x_1² + 1 = 0 and x_1² + 2 = 0 define the same quadric: the empty set.
A quadric in R² is also called a quadric curve, in R³ a quadric surface.
An isometry of Rⁿ is a permutation of Rⁿ preserving distance. Equivalently, it is the composition of an orthogonal operator with a translation x ↦ x + a.
Theorem 94. Any quadric in Rⁿ can be moved by an appropriate isometry to one of the following:
Σ_{i=1}^r α_ix_i² = 0,   Σ_{i=1}^r α_ix_i² + 1 = 0,   Σ_{i=1}^r α_ix_i² + x_{r+1} = 0.
Proof. Step 1. By theorem 88, we can apply an orthogonal basis change (that is, an isometry of Rⁿ that fixes the origin) which has the effect of eliminating the terms α_{ij}x_ix_j in (93).
Step 2. Whenever α_i ≠ 0, we can replace x_i by x_i − β_i/(2α_i), and thereby eliminate the term β_ix_i from the equation. This transformation is just a translation, which is also an isometry.
Step 3. If α_i = 0, then we cannot eliminate the term β_ix_i. Let us permute the coordinates such that α_i ≠ 0 for 1 ≤ i ≤ r, and β_i ≠ 0 for r + 1 ≤ i ≤ r + s. Then if s > 1, by using corollary 82, we can find an orthogonal transformation that leaves x_i unchanged for 1 ≤ i ≤ r and replaces Σ_{j=1}^s β_{r+j}x_{r+j} by βx_{r+1} (where β is the length of (β_{r+1}, . . . , β_{r+s})), and then we have only a single non-zero β_i, namely β_{r+1}.
Step 4. Finally, if there is a non-zero β_{r+1} = β, then we can perform the translation that replaces x_{r+1} by x_{r+1} − γ/β, where γ is the constant term, and thereby eliminate γ. We have now reduced to one of two possible types of equation:
Σ_{i=1}^r α_ix_i² + γ = 0   and   Σ_{i=1}^r α_ix_i² + βx_{r+1} = 0.
Figure 4: x²/4 + y² − z² = 0 (graphics omitted)
When n = 3, we still get the nine possibilities (1)–(9) that we had in the case n = 2, but now they must be regarded as equations in the three variables x, y, z that happen not to involve z.
So, in case (1), we now get the plane x = 0, in case (2) we get two parallel planes x = ±1/√α, in case (4) we get the line x = y = 0 (the z-axis), in case (5) two intersecting planes y = ±√(α/β) x, and in cases (6), (7) and (9), we get, respectively, elliptical, hyperbolic and parabolic cylinders.
The remaining cases involve all of x, y and z. We omit −αx² − βy² − γz² = 1, which is empty.
(10). αx² + βy² + γz² = 0. The single point (0, 0, 0).
(11). αx² + βy² − γz² = 0. See Figure 4.
This is an elliptical cone. The cross sections parallel to the xy-plane are ellipses of the form αx² + βy² = c, whereas the cross sections parallel to the other coordinate planes are generally hyperbolas. Notice also that if a particular point (a, b, c) is on the surface, then so is t(a, b, c) for any t ∈ R. In other words, the surface contains the straight line through the origin and any of its points. Such lines are called generators.
Figure 5: two surface plots; panel (b): x²/4 + y² − z² = 1 (graphics omitted)
Figure 6: x²/4 − y² − z² = 1 (graphics omitted)
Figure 7: (a) z = x²/2 + y²; (b) z = x² − y². (graphics omitted)
Theorem 111. A matrix A ∈ C^{n,n} is normal if and only if there exists a unitary matrix P ∈ C^{n,n} such that P*AP is diagonal¹².
¹² with complex entries
Proof. The “if” part follows from lemma 110 as diagonal matrices are normal.
For the “only if” part we proceed by induction on n. If n = 1, there is nothing to
prove. Let us assume we have proved the statement for all dimensions less than n.
The matrix A admits an eigenvector v ∈ Cⁿ with eigenvalue λ. Let W be the vector subspace of all vectors x satisfying Ax = λx. If W = Cⁿ then A is a scalar matrix and we are done. Otherwise, we have a nontrivial¹³ decomposition Cⁿ = W ⊕ W^⊥ where
W^⊥ = {v ∈ Cⁿ | v*w = 0 for all w ∈ W}.
Let us notice that A*W ⊂ W because AA*x = A*Ax = A*λx = λ(A*x) for any x ∈ W.
It follows that AW^⊥ ⊂ W^⊥ since (Ay)*x = y*(A*x) ∈ y*W = 0 so (Ay)*x = 0 for all x ∈ W, y ∈ W^⊥. Also AW ⊂ W.
Now choose orthonormal bases of W and W^⊥. Together they form a new orthonormal basis of Cⁿ. The change of basis matrix P is unitary, hence by lemma 110 the matrix P*AP = ( B 0 ; 0 C ) is normal. It follows that the matrices B and C are normal of smaller size and we can use the inductive hypothesis to complete the proof. □
Theorem 111 is an extremely useful criterion for diagonalisability of matrices. To
find P in practice, we use similar methods to those used in the real case.
Example 112. Let A be the matrix A = ( 6   2 + 2i ; 2 − 2i   4 ). Then
so the eigenvalues of A are 2 and 8. (We saw in proposition 107 that the eigenvalues of any Hermitian matrix are real.) The corresponding eigenvectors are v_1 = (1 + i, −2)^T and v_2 = (1 + i, 1)^T. We find that |v_1|² = v_1*v_1 = 6 and |v_2|² = 3, so we divide by their lengths to get an orthonormal basis v_1/|v_1|, v_2/|v_2| of C². Then the matrix
P = ( (1 + i)/√6   (1 + i)/√3 ; −2/√6   1/√3 )
having this basis as columns is unitary and satisfies P*AP = ( 2 0 ; 0 8 ).
with eigenvectors e_i with eigenvalues λ_i (i ≥ 1). The proof of proposition 107 goes through in the infinite dimensional case, so we conclude that all λ_i belong to R. Back to physics, if we measure on a state [v] where v is Σ_n α_ne_n and is normalised then the measurement will return λ_n as a result with probability |α_n|².
One observable is energy H : V → V, often called the hamiltonian. It is central to the theory because it determines the time evolution [v(t)] of the system by Schrödinger's equation:
dv(t)/dt = (1/iħ) Hv(t)
where ħ ≈ 10⁻³⁴ Joule seconds¹⁶ is the reduced Planck constant. We know how to solve this equation: v(t) = e^{tH/iħ}v(0).
As a concrete example, let us look at the quantum oscillator. The full energy of the classical harmonic oscillator of mass m and frequency ω is
h = p²/2m + mω²x²/2
where x is the position and p = mx′ is the momentum. To quantise it, we have to play with this expression. The vector space of all smooth functions C^∞(R, C) admits a convenient subspace V = {f(x)e^{−x²/2} | f(x) ∈ C[x]}, which we make hermitian by ⟨φ(x), ψ(x)⟩ = ∫_{−∞}^{∞} φ̄(x)ψ(x) dx. Quantum momentum and quantum position are linear operators (observables) on this space:
The quantum Hamiltonian is a second order differential operator given by the same equation
H = P²/2m + mω²X²/2 = −(ħ²/2m) d²/dx² + mω²x²/2.
As mathematicians, we can assume that m = 1 and ω = 1, so that H(f) = (x²f − f″)/2. The eigenvectors of H are the Hermite functions
ψ_n(x) = (−1)ⁿ e^{x²/2} (e^{−x²})⁽ⁿ⁾,   n = 0, 1, 2, . . .
with eigenvalues n + 1/2 which are discrete energy levels of the quantum oscillator. Notice that ⟨ψ_k, ψ_n⟩ = δ_{k,n} 2ⁿ n! √π, so they are orthogonal but not orthonormal. The states [ψ_n] are pure states: they do not change with time and always give n + 1/2 as energy. If we take a system in a state [v] where v is normalised and
v = Σ_n α_n π^{−1/4} 2^{−n/2} (n!)^{−1/2} ψ_n
then the measurement of energy will return n + 1/2 with probability |α_n|². Notice that the measurement breaks the system!! It changes it to the state [ψ_n] and all future measurements will return the same energy!
Alternatively, it is possible to model the quantum oscillator on the vector space W = C[x] of polynomials. One has to use the natural linear bijection
α : W → V,   α(f(x)) = f(x)e^{−x²/2}
…to ensure diagonalisability V must be complete with respect to the hermitian norm. Such spaces are called Hilbert spaces. Diagonalisability is still subtle as eigenvectors do not span the whole of V but only a dense subspace. Furthermore, if V admits no dense countably dimensional subspace, further difficulties arise. . . Pandora's box of functional analysis is wide open, so let us try to keep it shut.
¹⁶ Notice the physical dimensions: H is energy, t is time, i dimensionless, ħ equalises the dimensions.
one arrives at the Hermite polynomials α⁻¹(ψ_n(x)) = (−1)ⁿ e^{x²} (e^{−x²})⁽ⁿ⁾ instead of Hermite functions.
Let us go back to an abstract system with two observables P and Q. It is pointless to measure Q after measuring P as the system is broken. But can we measure them simultaneously? The answer is given by Heisenberg's uncertainty principle. Mathematically, it is a corollary of Schwarz's inequality:
where we use the fact that P and P_v are hermitian. Notice that D(P, v) has the physical meaning of the uncertainty of the measurement of P. Notice also that the operator PQ − QP is no longer hermitian in general but we can still talk about its expected value. Here is Heisenberg's principle.
Theorem 113. D(P, v) · D(Q, v) ≥ (1/2) |E(PQ − QP, v)|.
Proof. In the right hand side, E(PQ − QP, v) = E(P_vQ_v − Q_vP_v, v) = ⟨v, P_vQ_v(v)⟩ − ⟨v, Q_vP_v(v)⟩ = ⟨P_v(v), Q_v(v)⟩ − ⟨Q_v(v), P_v(v)⟩. Remembering that the form is hermitian, this difference is ⟨P_v(v), Q_v(v)⟩ minus its complex conjugate, i.e., twice the imaginary part. So the right hand side is estimated by Schwarz's inequality:
Two cases of particular physical interest are commuting observables, that is, PQ = QP, and conjugate observables, that is, PQ − QP = iħI. Commuting observables can be measured simultaneously with any degree of certainty. Conjugate observables obey Heisenberg's uncertainty:
D(P, v) · D(Q, v) ≥ ħ/2.
One can identify abelian groups with Z-modules and the two terms can be used
interchangeably. However, Z-modules would be a better term than abelian groups
given the material in this chapter.
Definition 120. A group G is called cyclic if there exists an element x ∈ G such that G = {mx | m ∈ Z}.
The element x in the definition is called a generator of G. Note that Z and Zn are
cyclic with generator 1.
Definition 121. A bijection φ : G → H between two (abelian) groups is called an isomorphism if φ(g + h) = φ(g) + φ(h) for all g, h ∈ G. The groups G and H are called isomorphic, and we write G ≅ H, if there is an isomorphism φ : G → H.
Isomorphic groups are often thought of as being essentially the same group, but
with elements having different names.
Exercise 122. Prove that any isomorphism φ : G → H satisfies φ(ng) = nφ(g) for all g ∈ G, n ∈ Z.
Proposition 123. Any cyclic group G is isomorphic either to Z or to Zn for some n > 0.
Proof. Let G be cyclic with generator x. So G = {mx | m ∈ Z}. Suppose first that the elements mx for m ∈ Z are all distinct. Then the map φ : Z → G defined by φ(m) = mx is a bijection, and it is clearly an isomorphism.
Otherwise, we have lx = mx for some l < m, and so (m − l)x = 0 with m − l > 0. Let n be the least integer with n > 0 and nx = 0. Then the elements 0x = 0, 1x, 2x, . . . , (n − 1)x of G are all distinct, because otherwise we could find a smaller n. Furthermore, for any mx ∈ G, we can write m = rn + s for some r, s ∈ Z with 0 ≤ s < n. Then mx = (rn + s)x = sx, so G = {0, 1x, 2x, . . . , (n − 1)x}, and the map φ : Zn → G defined by φ(m) = mx for 0 ≤ m < n is a bijection, which is easily seen to be an isomorphism. □
Definition 124. For an element g ∈ G, the least positive integer n with ng = 0, if it exists, is called the order |g| of g. If there is no such n, then g has infinite order and we write |g| = ∞. The order |G| of a group G is just the number of elements of G.
Exercise 125. If φ : G → H is an isomorphism, then |g| = |φ(g)| for all g ∈ G.
Exercise 126. Let G be a finite cyclic group with generator g. Prove: |g| = |G|.
Exercise 127. If G is a finite abelian group then any element of G has finite order.
Definition 128. Let X be a subset of a group G. We say that G is generated or spanned by X, and X is said to be a generating set of G, if every g ∈ G can be written as a finite sum Σ_{i=1}^k m_ix_i, with m_i ∈ Z and x_i ∈ X for all i.
If G is generated by X, then we write G = ⟨X⟩. We write G = ⟨x_1, . . . , x_n⟩ instead of G = ⟨{x_1, . . . , x_n}⟩.
If G admits a finite generating set X then G is said to be finitely generated.
So a group is cyclic if and only if it has a generating set X with |X| = 1.
Definition 129. Let G_1, . . . , G_n be groups. Their direct sum is written G_1 ⊕ · · · ⊕ G_n or G_1 × · · · × G_n and defined to be the set
{ (g_1, . . . , g_n) | g_i ∈ G_i for all i }
with addition
(g_1, . . . , g_n) + (h_1, . . . , h_n) = (g_1 + h_1, . . . , g_n + h_n).
Exercise 130. Prove that G_1 ⊕ · · · ⊕ G_n is again an abelian group with zero element (0, . . . , 0) and −(g_1, . . . , g_n) = (−g_1, . . . , −g_n).
In general (non-abelian) group theory this is more often known as the direct
product of groups.
One of the main results of this chapter is known as the fundamental theorem of
finitely generated abelian groups, and states that every finitely generated abelian group
is isomorphic to a direct sum of cyclic groups.
Exercise 131. Prove that the group (Q, +) is not finitely generated. Prove that it is
not isomorphic to a direct sum of cyclic groups.
4.2 Subgroups
Definition 132. A subset H of a group G is called a subgroup of G if it forms a group
under the same operation as that of G.
Lemma 133. If H is a subgroup of G, then the identity element 0 H of H is equal to the
identity element 0G of G.
Proof. Using the identity axioms for H and G, 0_H + 0_H = 0_H = 0_H + 0_G. Now by the cancellation law, 0_H = 0_G. □
The definition of a subgroup is semantic in its nature. While it precisely pinpoints
what a subgroup is, it is quite cumbersome to use. The following proposition gives a
usable criterion.
Proposition 134. Let H be a subset of a group G. The following statements are equivalent.
(a) H is a subgroup of G.
(b) H is nonempty; and h_1, h_2 ∈ H ⇒ h_1 + h_2 ∈ H; and h ∈ H ⇒ −h ∈ H.
(c) H is nonempty; and h_1, h_2 ∈ H ⇒ h_1 − h_2 ∈ H.
Proof. Proof of (a) ⇒ (c). If H is a subgroup of G then it is nonempty as it contains 0_H. Moreover, h_1 − h_2 = h_1 + (−h_2) ∈ H if so are h_1 and h_2.
Proof of (c) ⇒ (b). Pick x ∈ H. Then 0 = x − x ∈ H. Now −h = 0 − h ∈ H for any h ∈ H. Finally, h_1 + h_2 = h_1 − (−h_2) ∈ H for all h_1, h_2 ∈ H.
Proof of (b) ⇒ (a). We need to verify the four group axioms in H. Two of these, 'Closure' and 'Inverse', are the conditions in (b). The other two axioms are 'Associativity' and 'Identity'. Associativity holds because it holds in G, and H is a subset of G. Since we are assuming that H is nonempty, there exists h ∈ H, and then −h ∈ H, and h + (−h) = 0 ∈ H, and so 'Identity' holds, and H is a subgroup. □
Examples 135. 1. There are two standard subgroups of any group G: the whole
group G itself, and the trivial subgroup {0} consisting of the identity alone. Sub-
groups other than G are called proper subgroups, and subgroups other than {0} are
called non-trivial subgroups.
2. If g is any element of any group G, then the set of all integer multiples ⟨g⟩ = {mg | m ∈ Z} forms a cyclic subgroup of G called the subgroup generated by g. Note |⟨g⟩| = |g|.
Let us look at a few specific examples. If G = Z, then 5Z, which consists of all multiples of 5, is the cyclic subgroup generated by 5. Of course, we can replace 5 by any integer here, but note that the cyclic groups generated by 5 and −5 are the same.
If G = ⟨g⟩ is a finite cyclic group of order n and m is a positive integer dividing n, then the cyclic subgroup generated by mg has order n/m and consists of the elements kmg for 0 ≤ k < n/m.
Exercise 136. What is the order of the cyclic subgroup generated by mg for general
m (where we drop the assumption that m|n)?
Exercise 137. Show that the group of non-zero complex numbers C^× under the operation of multiplication has finite cyclic subgroups of all possible orders.
Exercise 138. Let G be an (abelian) group.
(a) Let {H_i | i ∈ I} be a family of subgroups of G. Prove that the intersection K = ∩_{i∈I} H_i is also a subgroup of G.
(b) Let X be a subset of G and let K be the intersection of those subgroups H ⊂ G satisfying X ⊂ H. Prove K = ⟨X⟩. In words: ⟨X⟩ is the least subgroup of G containing X.
(c) Give an example showing that the union of two subgroups of G may not be a subgroup of G.
Exercise 139. Let G be a group and n ∈ Z. Prove that {x ∈ G | nx = 0} is a subgroup of G.
Exercise 140. Let A, B be subgroups of a group G. Prove that {a + b | a ∈ A, b ∈ B} (written A + B) is also a subgroup of G.
(H + g) + (H + h) = {x + y | x ∈ H + g, y ∈ H + h}
  = {(a + g) + (b + h) | a, b ∈ H} = {(a + b) + (g + h) | a, b ∈ H}
  = {c + (g + h) | c ∈ H + H} = {c + (g + h) | c ∈ H} = H + (g + h). □
Definition 154. The group G / H is called the quotient group (or the factor group) of
G by H.
Notice that if G is finite, then | G / H | = | G : H | = | G |/| H |. So, although the
quotient group seems a rather complicated object at first sight, it is actually a smaller
group than G.
Examples 155. 1. Let G = Z and H = mZ for some m > 0. Then there are exactly m distinct cosets, H, H + 1, . . . , H + (m − 1). If we add together k copies of H + 1, then we get H + k. So G/H is cyclic of order m and with generator H + 1. So by proposition 123, Z/mZ ≅ Zm. The original definition of Zm was rather clumsy so we put Zm := Z/mZ from now on.
2. G = R and H = Z. The quotient group G/H is isomorphic to the circle subgroup S¹ of the multiplicative group C^×. One writes an explicit isomorphism φ : G/H → S¹ by φ(x + Z) = e^{2πxi}.
3. G = Q and H = Z. The quotient group G/H featured in a previous exam. The task was to show that this group is infinite, not finitely generated, and that every element of G/H has finite order.
4. Let V be a vector space over K and W ⇢ V a subspace. In particular V is
an abelian group with subgroup W, so that quotient group V /W is defined. It can
naturally be made into a vector space over K (outside our scope).
ker(φ) = {g ∈ G | φ(g) = 0_H}.
is well-defined
⟺ for all g, h ∈ G with A + g = A + h one has φ(g) = φ(h)
⟺ for all g, h ∈ G with g − h ∈ A one has g − h ∈ K
⟺ A ⊂ K.
φ̄(A + h) + φ̄(A + g) = φ(h) + φ(g) = φ(h + g) = φ̄(A + h + g) = φ̄((A + h) + (A + g)). □
φ̄(K + g) = 0_H ⟺ φ(g) = 0_H ⟺ g ∈ K ⟺ K + g = 0_{G/K}.
Prove yourself that this makes G into a group and that X is a basis of G.
Uniqueness. For a ∈ X let g(a) be the element a ∈ X viewed as an element of G and h(a) viewed as an element of H. We define φ : G → H by
φ( Σ_{a∈X} c_a g(a) ) = Σ_{a∈X} c_a h(a).
4.6 Unimodular elementary row and column operations and the Smith
normal form for integral matrices
We interrupt our discussion of finitely generated abelian groups at this stage to inves-
tigate how the row and column reduction process of Linear Algebra can be adapted
to matrices over Z. Recall from MA106 that we can use elementary row and column
operations to reduce an m × n matrix of rank r over a field K to a matrix B = (β_{ij}) with β_{ii} = 1 for 1 ≤ i ≤ r and β_{ij} = 0 otherwise. We called this the Smith Normal Form of the matrix. We can do something similar over Z, but the non-zero elements β_{ii} will not necessarily all be equal to 1.
The reason that we disallowed λ = 0 for the row and column operations (R3) and (C3) (multiply a row or column by a scalar λ) was that we wanted all of our elementary operations to be reversible. When performed over Z, (R1), (C1), (R2) and (C2) are reversible, but (R3) and (C3) are reversible only when λ = ±1. So, if A is an m × n matrix over Z, then we define the three types of unimodular elementary row operations as follows:
(UR1): Replace some row r_i of A by r_i + tr_j, where j ≠ i and t ∈ Z;
(UR2): Interchange two rows of A;
(UR3): Replace some row r_i of A by −r_i.
The unimodular column operations (UC1), (UC2), (UC3) are defined similarly.
Recall from MA106 that performing elementary row or column operations on a matrix
A corresponds to multiplying A on the left or right, respectively, by an elementary
matrix. These elementary matrices all have determinant ±1 (1 for (UR1) and −1 for (UR2) and (UR3)), so are unimodular matrices over Z.
Theorem 173 (Smith Normal Form). Any matrix A ∈ Z^{m,n} can be put into Smith normal form B through a sequence of unimodular elementary row and column operations. Moreover, B is unique.
Proof. We shall not prove the uniqueness part here. We use induction on m + n. The base case is m = n = 1, where there is nothing to prove. Also if A is the zero matrix then there is nothing to prove, so assume not.
Let d be the smallest positive entry in any matrix C = (γ_{ij}) that we can obtain from A by using unimodular elementary row and column operations. By using (R2) and (C2), we can move d to position (1, 1) and hence assume that γ_{11} = d. If d does not divide γ_{1j} for some j > 1, then we can write γ_{1j} = qd + r with q, r ∈ Z and 0 < r < d, and then replacing the j-th column c_j of C by c_j − qc_1 results in the entry r in position (1, j), contrary to the choice of d. Hence d | γ_{1j} for 2 ≤ j ≤ n and similarly d | γ_{i1} for 2 ≤ i ≤ m.
Now, if γ_{1j} = qd, then replacing c_j of C by c_j − qc_1 results in entry 0 in position (1, j). So we can assume that γ_{1j} = 0 for 2 ≤ j ≤ n and γ_{i1} = 0 for 2 ≤ i ≤ m. If m = 1 or n = 1, then we are done. Otherwise, we have C = (d) ⊕ C′ for some (m − 1) × (n − 1) matrix C′. By inductive hypothesis, the result of the theorem applies to C′, so by applying unimodular row and column operations to C which do not involve the first row or column, we can reduce C to D = (δ_{ij}), which satisfies δ_{11} = d, δ_{ii} = d_i > 0 for 2 ≤ i ≤ r, and δ_{ij} = 0 otherwise, where d_i | d_{i+1} for 2 ≤ i < r. To complete the proof, we still have to show that d | d_2. If not, then adding row 2 to row 1 results in d_2 in position (1,2) not divisible by d, and we obtain a contradiction as before. □
In the following two examples we determine the Smith normal forms of the matrix
A.
Example 174. Let A be the matrix A = ( 42 21 ; 35 14 ). The general strategy is to reduce the size of entries in the first row and column, until the (1,1)-entry divides all other entries in the first row and column. Then we can clear all of these other entries.
Matrix               Operation
( 42 21 ; 35 14 )    c_1 → c_1 − 2c_2
( 0 21 ; 7 14 )      r_1 ↔ r_2
( 7 14 ; 0 21 )      c_2 → c_2 − 2c_1
( 7 0 ; 0 21 )
Example 175. Let A be the matrix
A = ( 18 18 18 90 ; 54 12 45 48 ; 9 6 6 63 ; 18 6 15 12 ).
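sympy computes Smith normal forms over Z directly, which is a convenient check on hand computations (a sketch using example 174; sign and ordering conventions may differ slightly from the notes):

```python
# Smith normal form over the integers with sympy.
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

A = Matrix([[42, 21], [35, 14]])        # the matrix of example 174
print(smith_normal_form(A, domain=ZZ))  # Matrix([[7, 0], [0, 21]])
```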
Note: There is also a generalisation to integer matrices of the row reduced normal
form from Linear Algebra, where only row operations are allowed. This is known
as the Hermite Normal Form and is more complicated. It will appear on an exercise
sheet.
⟨X | Y⟩ := F/K.
≅ Zⁿ/K = ⟨y_1, . . . , y_n | d_1y_1, . . . , d_ny_n⟩.
G = im(h) ≅ …. □
Proof. This is another application of the first isomorphism theorem. Let H = Z_{d_1} ⊕ · · · ⊕ Z_{d_n}, so H is generated by x_1, . . . , x_n, with x_1 = (1, 0, . . . , 0), . . . , x_n = (0, . . . , 0, 1). Let e_1, . . . , e_n be the standard basis of Zⁿ. By the universal property (proposition 170) there is a surjective homomorphism φ : Zⁿ → H for which
φ(α_1e_1 + · · · + α_ne_n) = α_1x_1 + · · · + α_nx_n
K = { (α_1, . . . , α_n) ∈ Zⁿ | α_1x_1 + · · · + α_nx_n = 0_H }
  = { (α_1, . . . , α_n) ∈ Zⁿ | d_i divides α_i for all i } = ⟨d_1e_1, . . . , d_ne_n⟩.
Thus
H ≅ Zⁿ/K = ⟨e_1, . . . , e_n | d_1e_1, . . . , d_ne_n⟩. □
Note Z_1 = 0 and recall Z_0 ≅ Z. Putting all results together, we get the main theorem of this chapter.
Theorem 186 (Fundamental theorem of finitely generated abelian groups). Let G be a finitely generated abelian group. Then there exist integers r, k and d_1, . . . , d_r ≥ 2 with d_i | d_{i+1} for all i such that
G ≅ Z_{d_1} ⊕ · · · ⊕ Z_{d_r} ⊕ Z^k.
⟨x_1, x_2, x_3 | x_1 + 3x_2 − x_3, 2x_1 + x_3⟩,
and is isomorphic to Z_1 ⊕ Z_3 ⊕ Z ≅ Z_3 ⊕ Z, so it is infinite, with a finite subgroup of order 3.
F = ⟨A × B | ∅⟩.
(a + a′, b) − (a, b) − (a′, b)      n(a, b) − (na, b)
(a, b + b′) − (a, b) − (a, b′)      n(a, b) − (a, nb)     (189)
A ⊗ B = F/F_0 = ⟨A × B | relations in (189)⟩.
We have to get used to this definition that may seem strange at first glance. First, it is easy to materialise certain elements of A ⊗ B. An elementary tensor is
a ⊗ b := (a, b) + F_0
for a ∈ A, b ∈ B.
The generators (189) for F_0 give rise to properties of elementary tensors:
(a + a′) ⊗ b = a ⊗ b + a′ ⊗ b      n(a ⊗ b) = (na) ⊗ b
a ⊗ (b + b′) = a ⊗ b + a ⊗ b′      n(a ⊗ b) = a ⊗ (nb).     (190)
However, for general groups tensor products could behave in quite an unpredictable way. For instance, Z_2 ⊗ Z_3 = 0. Indeed, for a ∈ Z_2, b ∈ Z_3 we have a ⊗ b = 3(a ⊗ b) − 2(a ⊗ b) = a ⊗ (3b) − (2a) ⊗ b = a ⊗ 0 − 0 ⊗ b = 0.
To help sort out zero from nonzero elements in tensor products we need to understand a connection between tensor products and bilinear maps.
Definition 194. Let A, B, and C be abelian groups. A function ω : A × B → C is a bilinear map if
ω(a + a′, b) = ω(a, b) + ω(a′, b)      nω(a, b) = ω(na, b)
ω(a, b + b′) = ω(a, b) + ω(a, b′)      nω(a, b) = ω(a, nb)     (195)
for all n ∈ Z, a, a′ ∈ A, b, b′ ∈ B.
Let Bil(A × B, C) be the set of all bilinear maps from A × B to C.
Lemma 196 (Universal property of tensor product). The function
θ : A × B → A ⊗ B,   θ(a, b) = a ⊗ b
is a bilinear map. This bilinear map is universal, that is, for all C the composition with θ defines a bijection
hom(A ⊗ B, C) → Bil(A × B, C),   ψ ↦ ψ ∘ θ.
Proof. The function θ is a bilinear map: the properties (195) of a bilinear map easily follow from the corresponding properties (190) of elementary tensors.
Let Fun(·, ·) denote the set of functions between two sets. Recall that F denotes the free abelian group with basis indexed by A × B. By the universal property of free abelian groups (proposition 170), we have a bijection
hom(F, C) → Fun(A × B, C).
ψ : R/2πZ → R/πQ,   ψ(z + 2πZ) = z + πQ.
Proof. The following picture shows that a triangle with base b and height h is equiv-
alent to the rectangle with sides b and h/2.
Next we shall show that two right-angled triangles of the same area are scissors
congruent.
(figure: triangles CAB and CPQ sharing the vertex C; graphics omitted)
The equal area triangles are CAB and CPQ. This means that |CA||CB| = |CP||CQ|. Hence, |CA|/|CQ| = |CP|/|CB| and the triangles CPB and CAQ are similar. In particular, the edges AQ and PB are parallel, thus the triangles AQP and AQB share the same base AQ and height and, consequently, are scissors congruent. So
Likewise for the B_j. So there are triangles A, B such that x = [A] − [B]. But ν_2(x) = 0 so A and B have the same volume. Therefore [A] = [B] and x = 0. □
Observe that ν_n is surjective. So ν_2 is an isomorphism and P_2 ≅ R. However, ν_3 is not injective as we shall now show.
Theorem 201. Let T be a regular tetrahedron, C a cube, both of unit volume. Then [T] ≠ [C] in P_3.
Proof. Let M be a polytope with set of edges I. For each edge i, let h_i be its length and let α_i ∈ R/2πZ be the angle along this edge. The Dehn invariant of M is
δ(M) = Σ_i h_i ⊗ α_i ∈ R ⊗ R/2πZ.
Angle cut: an edge of length h and angle α has its angle cut into angles α_B in B and α_C in C. This contributes h ⊗ α − h ⊗ α_B − h ⊗ α_C = h ⊗ (α − α_B − α_C) = h ⊗ 0 = 0.
New edge: a new edge of length h is created. If its angle in B is α, then its angle in C is π − α. This contributes −h ⊗ α − h ⊗ (π − α) = −h ⊗ π = 0, by proposition 197.
Finally
δ([C]) = 12(1 ⊗ π/2) = 12 ⊗ π/2 = 0,   while   δ([T]) = 6√2·∛3 ⊗ arccos(1/3) ≠ 0,
For example
x_1 = −7/9,   x_2 = 17/81,   x_3 = −5983/3⁸,   x_4 = 28545857/3¹⁶, . . .
It is easy to show that the denominators grow indefinitely. Contradiction. □
Now it is natural to ask what exactly the group P_3 is. It was proved later (in 1965) that the joint homomorphism (ν_3, δ) : P_3 → R ⊕ (R ⊗ R/2πZ) is injective. It is not surjective and the image can be explicitly described, but we won't do it here.
The End