MA101
Note 6 by K. D. Joshi
are in W and so can be expressed uniquely as linear combinations of the basis elements
w1 , w2 , . . . , wm , say,
T (v1 ) = a11 w1 + a21 w2 + . . . + am1 wm
T (v2 ) = a12 w1 + a22 w2 + . . . + am2 wm
   ...                                                                  (1)
T (vn ) = a1n w1 + a2n w2 + . . . + amn wm
Then the m × n matrix A = (aij ) is called the matrix of T w.r.t. the ordered
bases (v1 , v2 , . . . , vn ) for V and (w1 , w2 , . . . , wm ) for W . (Note that, unlike in the
case of writing a system of linear equations where the first suffix is common for all
the coefficients appearing in the same row, here the second suffix is common to all the
coefficients appearing in the same row.) The system (1) can be written compactly as
[T (v1 ) T (v2 ) . . . T (vn )]^t = A^t [w1 w2 . . . wm ]^t                (2)
where on the R. H. S. we are multiplying a scalar matrix with a vector matrix. A
good way to remember the matrix A associated to a linear transformation T (with given
ordered bases for the domain and the codomain) is to note that for every j = 1, 2, . . . , n,
the j-th column of A consists of the coefficients when the vector T (vj ) is expressed as a
linear combination of the basis elements of W .
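The column-by-column recipe is easy to carry out mechanically. Here is a small sketch in Python; the map T and the bases are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical example: T : R^3 -> R^2 given by T(x, y, z) = (x + z, 2y),
# with the standard ordered bases on both sides.
def T(v):
    x, y, z = v
    return np.array([x + z, 2 * y])

# The j-th column of A holds the coefficients of T(vj) w.r.t. (w1, w2);
# with standard bases these coefficients are just the components of T(vj).
basis_V = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]
A = np.column_stack([T(v) for v in basis_V])
print(A)  # [[1 0 1]
          #  [0 2 0]]
```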
Let us see how this matrix A helps us write down the image of a typical element, say
v of the domain V . First write v as c1 v1 + c2 v2 + . . . + cn vn . Similarly, write T (v) as
b1 w1 + b2 w2 + . . . + bm wm . We shall show how these coefficients b1 , b2 , . . . , bm can be calculated
from the coefficients c1 , c2 , . . . , cn if we know the matrix A. For this, we express linear
combinations of vectors as products of scalar and vector matrices. By linearity of T and
using (2) above, we have
[b1 b2 . . . bm ] [w1 w2 . . . wm ]^t = T (v) = c1 T (v1 ) + c2 T (v2 ) + . . . + cn T (vn )
                                     = [c1 c2 . . . cn ] [T (v1 ) T (v2 ) . . . T (vn )]^t
                                     = [c1 c2 . . . cn ] A^t [w1 w2 . . . wm ]^t
As the vectors w1 , w2 , . . . , wm are linearly independent, the preceding equality means
[b1 b2 . . . bm ] = [c1 c2 . . . cn ] A^t . Taking transposes, we have
b = Ac (3)
where b and c are the column vectors [b1 b2 . . . bm ]^t and [c1 c2 . . . cn ]^t respectively.
Worded differently, once we fix the ordered bases (v1 , v2 , . . . , vn ) and (w1 , w2 , . . . , wm ),
elements of V correspond to column vectors of length n while those of W to column
vectors of length m. With these identifications (3) says that the transformation T
behaves very much like the linear transformation induced by the matrix A. As we shall
see later, this fact is crucial in converting concepts about abstract linear transformations
to the corresponding concepts about matrices, which are often easier to handle.
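As a quick numerical illustration of (3), with a made-up matrix A (the bases stay implicit once we work in coordinates):

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 2, 0]])     # an illustrative 2 x 3 matrix of some T
c = np.array([3, -1, 2])      # coordinates of v w.r.t. (v1, v2, v3)
b = A @ c                     # coordinates of T(v) w.r.t. (w1, w2), by (3)
print(b)  # [ 5 -2]
```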
Combining this with (2) we have
[T (S(u1 )) T (S(u2 )) . . . T (S(up ))]^t = B^t A^t [w1 w2 . . . wm ]^t        (6)
some canonical choice of bases or unless we show that the concept we are associating is
independent of the choice of the bases, our definitions are ambiguous.
There is no satisfactory way out of this difficulty when V and W are different vector
spaces. But when W = V , the situation can be salvaged. So, suppose T : V → V is
a linear transformation where V is a vector space of dimension n (say). In such a case,
we can take two different bases for V , one as the domain and the other as the codomain
and form the matrix, say A, of T w.r.t. these bases. This is rarely done. So, when we
talk about the matrix of a linear transformation from V to V w.r.t. some basis for V
it is tacitly assumed that the same ordered basis is used for both the domain and the
codomain. Of course, when this basis is changed, the same linear transformation may
have a different matrix. But these matrices are related in a particular way. To see what
it is, suppose (v1 , v2 , . . . , vn ) and (w1 , w2 , . . . , wn ) are two ordered bases for V and let
A and B be the matrices of T w.r.t. these bases respectively. Both are n × n matrices
and we want to see how A and B are related to each other. We already know that
[T (v1 ) T (v2 ) . . . T (vn )]^t = A^t [v1 v2 . . . vn ]^t    and    [T (w1 ) T (w2 ) . . . T (wn )]^t = B^t [w1 w2 . . . wn ]^t        (8)
We now introduce one more square matrix C called the change of basis matrix.
Each wi is a unique linear combination of the basis elements v1 , v2 , . . . , vn . The matrix
C is obtained by collecting the coefficients of the vs in these n linear combinations. We
can therefore write
[w1 w2 . . . wn ]^t = C [v1 v2 . . . vn ]^t                (9)
In essence this matrix C expresses the new basis elements in terms of the old ones and
hence is called the change of basis matrix. It is clear that the change of basis matrix
which expresses the vs in terms of ws will be the inverse of C. So we have
[v1 v2 . . . vn ]^t = C^{-1} [w1 w2 . . . wn ]^t                (10)
Consider v1 . It is a linear combination of the vectors w1 , w2 , . . . , wn with the coefficients
coming from the first row of the matrix C^{-1} . By linearity of T , T (v1 ) will be a linear
combination of the vectors T (w1 ), T (w2 ), . . . , T (wn ) with the same coefficients, viz. the
entries in the first row of C^{-1} . A similar argument applies to T (v2 ), T (v3 ), . . . , T (vn ).
Putting these expressions together, and further using (9), we get
[T (v1 ) T (v2 ) . . . T (vn )]^t = C^{-1} [T (w1 ) T (w2 ) . . . T (wn )]^t = C^{-1} B^t [w1 w2 . . . wn ]^t = C^{-1} B^t C [v1 v2 . . . vn ]^t        (11)
The first part of (8) and (11) together imply
C^{-1} B^t C = A^t                (12)
Similarity of Matrices
In Note 4, we commented that the word 'faithful', which is generally used socially,
is also used technically in mathematics. The same is true of the word 'similar'. Many
times we make a statement like 'the second assertion follows by a similar argument'.
This is the common, non-technical use of the word. But when we say that two triangles
are similar, we mean that their respective sides are proportional. This is a technical use
of the word.
For matrices, too, similarity is a technical concept. The formal definition is as follows.
Definition 1: Suppose A and B are square matrices of the same order, say n. Then B
is said to be similar to A if there exists some non-singular matrix C (of order n) such
that C^{-1}AC = B.
If A is the identity matrix In , then it commutes with every matrix C and so it follows
that the only matrix similar to In is In itself. The same is true of the zero matrix. But
in general, every matrix has many other matrices similar to it. For example, the matrix
B = [4 3; 2 1] is similar to the matrix A = [1 2; 3 4], as we see by taking C = [0 1; 1 0].
(Note that C is its own inverse.) By changing C to [1 0; 0 2], we get that [1 1; 6 4] is
also similar to A.
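The first similarity claim above is easy to check numerically (a sketch; numpy is used only as a calculator here):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
C = np.array([[0, 1], [1, 0]])        # C is its own inverse
B = np.linalg.inv(C) @ A @ C          # C^{-1} A C
print(B)  # [[4. 3.]
          #  [2. 1.]]
```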
A few elementary properties of the relation just defined are listed below.
(i) Similarity is an equivalence relation, i.e. it is reflexive, symmetric and transitive.
(ii) Similar matrices have the same rank, the same determinant and the same trace.
(iii) If two matrices are similar to each other, then so are their transposes.
Proof: For (i) we have to verify that similarity is a binary relation which is reflexive,
symmetric and transitive. Taking C = In , we get C^{-1}AC = In A In = A, which shows
that A is similar to itself. Next, suppose B is similar to A. Then there is some C such
that B = C^{-1}AC. From this we get CBC^{-1} = CC^{-1}ACC^{-1} = A, which shows that A
is similar to B because C can be written as (C^{-1})^{-1} . Finally, for transitivity, assume
B = C^{-1}AC and P = Q^{-1}BQ. Then P = Q^{-1}(C^{-1}AC)Q = (CQ)^{-1}A(CQ), which
shows that P is similar to A.
For (ii), assume B = C^{-1}AC. Since C and C^{-1} are both non-singular, by Part (ii)
of Exercise (5.12), B has the same rank as A. Also, from the multiplicative property of
determinants, it follows that det(B) = det(C^{-1}) det(A) det(C) = (1/det(C)) det(A) det(C).
Even though the matrices A and C^{-1}, or the matrices A and C, may not commute with
each other, their determinants are real (or complex) numbers, and numbers always commute
with each other. This gives det(B) = det(A). Thus we have shown that ranks and
determinants are invariant under similarity. The invariance of the trace is a bit tricky
to prove and will be given as an exercise.
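The rank and determinant invariance just proved can be spot-checked numerically (the matrices below are arbitrary choices; trace invariance, left as an exercise, shows up too):

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 3., 1.],
              [1., 0., 1.]])
C = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])          # det(C) = 2, so C is non-singular
B = np.linalg.inv(C) @ A @ C          # a matrix similar to A
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # True
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))         # True
print(np.isclose(np.trace(A), np.trace(B)))                   # True
```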
For (iii), assume B = C^{-1}AC. Taking transposes, B^t = C^t A^t (C^{-1})^t . But (C^{-1})^t
is the same as (C^t)^{-1} , as one sees by directly multiplying (C^{-1})^t and C^t . So, if we let
P = (C^{-1})^t = (C^t)^{-1} , we have B^t = P^{-1} A^t P , which shows that B^t is similar to A^t .
Given two matrices A and B, in general there is no easy way to tell if they are similar
to each other. But Part (ii) of the Theorem above gives some necessary conditions.
Going back to (12), we now see that the matrices B t and At are similar to each
other. By Part (iii) of the theorem above, we see that A and B are also similar to each
other. Put differently, for a linear transformation T : V → V , even though its
matrix w.r.t. a basis may change as the basis changes, its similarity class
will not change. Therefore properties of matrices which are invariant under similarity
can be defined unambiguously for such transformations. For example we can talk about
the determinant of a linear transformation as the determinant of its matrix w.r.t. any
basis. This is well defined because similar matrices have the same determinant. The
same holds for trace and, more generally, for eigenvalues as we shall show later. As with
ranks, we shall also give a direct definition of eigenvalues of linear transformations. It
will then be seen that the two definitions coincide. In fact, the direct definition has the
advantage that it is applicable even when V is infinite dimensional. Matrices can handle
only finite dimensional vector spaces.
0 0 0 5
0 0 0 0
(They do have the same rank, trace and the same determinant. But that is only a
necessary condition for similarity.)
As another example, consider the transformation T : IR^2 → IR^2 defined by

T (x, y) = ((16/25)x + (12/25)y, (12/25)x + (9/25)y)                (13)

This linear transformation is induced by the 2 × 2 matrix A = [16/25 12/25; 12/25 9/25]. So
the matrix of T w.r.t. the usual ordered basis (e1 , e2 ) is simply A. Suppose, however,
that we take a different ordered basis (v1 , v2 ), where v1 = [4/5 3/5]^t and v2 = [−3/5 4/5]^t .
Let us find the matrix, say B, of T w.r.t. this ordered basis. By a direct calculation, we
have
T (v1 ) = [16/25 12/25; 12/25 9/25][4/5 3/5]^t = [4/5 3/5]^t = v1 = 1v1 + 0v2        (14)

and T (v2 ) = [16/25 12/25; 12/25 9/25][−3/5 4/5]^t = [0 0]^t = 0 = 0v1 + 0v2        (15)
Hence B, the matrix of T w.r.t. the ordered basis (v1 , v2 ) is simply
B = [1 0; 0 0]                (16)
which is considerably simpler than the original matrix A. In fact, it is a diagonal matrix.
And diagonal matrices are the next simplest matrices after those which are multiples of
the identity matrix.
We already know that A and B are similar to each other. An explicit matrix P
for which B will equal P^{-1}AP is easy to obtain. It is the matrix whose columns are the
vectors v1 , v2 . That is, P = [4/5 −3/5; 3/5 4/5]. We leave it as an exercise to find P^{-1} and
verify that P^{-1}AP indeed equals the diagonal matrix B.
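Filling in that exercise numerically (a sketch; here P happens to be orthogonal, so P^{-1} is just P^t):

```python
import numpy as np

A = np.array([[16., 12.], [12., 9.]]) / 25    # matrix of T from (13)
P = np.array([[4., -3.], [3., 4.]]) / 5       # columns are v1 and v2
B = np.linalg.inv(P) @ A @ P
print(np.round(B, 10))  # [[1. 0.]
                        #  [0. 0.]]
```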
[Figure: a point (x, y) and its image T (x, y) on the line along v1 , with v2 perpendicular to that line.]
Summing up, the magical feature of the vectors v1 , v2 was that their images under
the transformation were their own scalar multiples. Normally, if T : V → V is a linear
transformation and v is some element of V , there is no reason why T (v) should have the
same (or the opposite) direction as v. Vectors for which this happens are very special
and are given a special name.
T (v) = λv                (17)
eigenvectors (except when θ is a multiple of 2π, in which case the rotation is just the
identity transformation). As in the case of an orthogonal projection, a reflection of
the plane in a line L through the origin has two eigenvectors, v1 (along L) and v2
(perpendicular to L). Note, however, that the eigenvalue corresponding to v2 is −1 and
not 0. So, the matrix of this reflection w.r.t. the basis (v1 , v2 ) will be [1 0; 0 −1].
Note that the vector space appearing in Definition 2 does not have to be finite
dimensional. For example, let V be the vector space of all infinitely differentiable real
valued functions on IR (i.e. functions f : IR → IR which have derivatives of all orders).
We already saw that the differential operator D : V → V (defined by D(f (x)) = f ′(x))
is a linear transformation. Since D(e^{λx}) = λe^{λx} for every real number λ, every such
function is an eigenvector of D and the corresponding eigenvalue is λ. Since here the
elements of the vector space V are functions, it is more customary to call the eigenvectors
eigenfunctions. We leave it as an exercise to find the eigenfunctions of the operator
D^2 and the corresponding eigenvalues.
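One can watch the eigenfunction relation D(e^{λx}) = λe^{λx} numerically, using a central-difference approximation to the derivative (λ and x below are arbitrary sample values):

```python
import math

lam, x, h = 1.7, 0.3, 1e-6
f = lambda t: math.exp(lam * t)               # candidate eigenfunction of D
approx_Df = (f(x + h) - f(x - h)) / (2 * h)   # numerical derivative at x
print(approx_Df, lam * f(x))                  # the two values agree closely
```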
where D = D(λ1 , λ2 , . . . , λn ) is a diagonal matrix whose (i, i)-th entry is λi for i =
1, 2, . . . , n. Since D is its own transpose, it follows from (19) that it is the matrix of T
w.r.t. the ordered basis (v1 , v2 , . . . , vn ).
Conversely, suppose {v1 , v2 , . . . , vn } is a basis w.r.t. which the matrix of T is a
diagonal matrix D. Let λi be the (i, i)-th entry of D for i = 1, 2, . . . , n. Then (19) holds
with D replaced by D^t . But D^t is the same as D. So, (19) holds as it is. But then (18)
also holds for every i = 1, 2, . . . , n. And that means each vi is an eigenvector of T (with
λi as the corresponding eigenvalue).
Av = λv                (20)
Any such v is called an eigenvector corresponding to the eigenvalue λ.
Note the striking resemblance between (17) and (20). But there are two subtle
differences. First, in Definition 2 we defined eigenvectors first and then the eigenvalues
corresponding to them. Here we are turning the tables around. More importantly,
Definition 2 is applicable even when V is infinite dimensional while in Definition 3, the
vector space involved is finite dimensional (and further restricted to a euclidean space).
Despite these differences, we have the following expected relationship.
Proof: For the first part, recall that TA is defined by TA (v) = Av. So, it is clear that
(17) holds (with T replaced by TA ) if and only if (20) holds with the same value of λ.
For the converse, suppose A is the matrix of T w.r.t. the ordered basis (v1 , v2 , . . . , vn ).
Suppose v = c1 v1 + c2 v2 + . . . + cn vn V and T (v) = b1 v1 + b2 v2 + . . . + bn vn . Then by
Equation (3) above when we represented linear transformations by matrices, we have
b = Ac (21)
where b, c are the column vectors [b1 b2 . . . bn ]^t and [c1 c2 . . . cn ]^t respectively. Therefore,
the equality T (v) = λv translates into Ac = λc. Note also that v ≠ 0 if and only
if c ≠ 0. So, λ is an eigenvalue of T if and only if it is an eigenvalue of the matrix A
associated to T . We see further that the eigenvector v corresponding to λ corresponds
to the eigenvector c of A.
This theorem converts the problem of finding the eigenvalues and eigenvectors of
T : V → V (when V is finite dimensional) to the problem of finding the eigenvalues
and eigenvectors of the matrix A associated to T . Incidentally, this also shows that it
does not matter which basis is used for the conversion. If we take a different basis for
V , we may get a different matrix, say B. But its eigenvalues will be the same because
they coincide with those of the transformation T (which can be defined directly without
involving any basis). It is like this. The year of birth as well as the year of death of a
person can change if you change the calendar. But the duration of his life span will be
independent of which calendar is used. A mathematical analogy would be that even if
the coordinates of the three vertices of a triangle may change when we change the coordinate
frame, its area will not change, because it is an intrinsic geometric attribute.
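The basis-independence just described can be illustrated numerically: conjugating a (made-up) matrix by any change-of-basis matrix leaves the eigenvalues untouched.

```python
import numpy as np

A = np.array([[2., 1.], [0., 3.]])     # matrix of T in one basis
C = np.array([[1., 1.], [0., 1.]])     # an invertible change-of-basis matrix
B = np.linalg.inv(C) @ A @ C           # matrix of the same T in another basis
print(sorted(np.linalg.eigvals(A).real), sorted(np.linalg.eigvals(B).real))
# the two sorted lists coincide: eigenvalues 2 and 3 in both bases
```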
How to Find Eigenvalues of a Matrix?
The conversion achieved by Theorem 3 will be of some use only if we have some easy
way of finding the eigenvalues and corresponding eigenvectors of matrices. Fortunately,
this turns out to be the case. In fact, this is one respect in which the matrices are
superior. (Another one is the calculation of the rank of a linear transformation.)
We begin by rewriting (20) as
(A − λIn )v = 0                (22)
This system has a non-zero solution v if and only if det(A − λIn ) = 0. Write p(λ) for
det(A − λIn ). Clearly, p(λ) is a polynomial in λ of degree n with leading coefficient (−1)^n . It is called the
characteristic polynomial of A, its roots the characteristic roots and the equation
p(λ) = 0 the characteristic equation of A. The theorem above says that the characteristic
roots of A are the same as its eigenvalues. But conceptually, they are very
different things. The characteristic roots are the roots of a certain polynomial associated
with the matrix, while an eigenvalue is a special scalar λ for which something special (viz.
an eigenvector, i.e. a vector whose direction remains unchanged) exists.
Note that if A is a real matrix, then p(λ) is a real polynomial. Still it may have
complex roots. We can consider them as eigenvalues provided we allow complex eigenvectors.
This is what happens for rotations, as will be pointed out in the exercises. A
few interesting results about eigenvalues will also be given as exercises.
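For the projection matrix studied below, the characteristic polynomial and its roots can be obtained mechanically; note that numpy's `np.poly` returns the monic polynomial det(λI − A), which is (−1)^n times p(λ) as defined here:

```python
import numpy as np

A = np.array([[16., 12.], [12., 9.]]) / 25
coeffs = np.poly(A)           # coefficients of det(lam*I - A) = lam^2 - lam
print(np.round(coeffs, 10))   # [ 1. -1.  0.]
print(np.round(np.sort(np.roots(coeffs)), 6))  # [0. 1.] -> the eigenvalues
```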
Now that we know how to find eigenvalues, let us go back to the orthogonal projection
T defined by (13) above. The matrix of T w.r.t. the standard basis is A =
[16/25 12/25; 12/25 9/25]. Hence p(λ) = det [16/25 − λ 12/25; 12/25 9/25 − λ]. Upon expansion, this
comes out as λ^2 − λ, whose roots are λ = 1 and λ = 0. Corresponding
to λ = 1, we have to solve the system (A − I2 )[x1 x2 ]^t = [0 0]^t , i.e. the system

−(9/25)x1 + (12/25)x2 = 0   and   (12/25)x1 − (16/25)x2 = 0        (24)
Both the equations are the same and there are infinitely many solutions. This is to
be expected because any (non-zero) multiple of an eigenvector is also an eigenvector
corresponding to the same eigenvalue. One possible solution is x1 = 4, x2 = 3. So an
eigenvector corresponding to the eigenvalue 1 is [4 3]^t . By a similar calculation, which
we skip, an eigenvector corresponding to the other eigenvalue, viz. 0, is [−3 4]^t . These
are multiples of the unit vectors v1 , v2 we obtained earlier. But that time we had to
rely on the geometric interpretation of the transformation T . This time we found the
answer in a purely self-contained manner.
It would thus appear that in order to find an eigenvector basis for any linear transformation
T : V → V where V is n-dimensional, we should first associate an n × n
matrix A to T , then find the characteristic equation of A, find all characteristic roots
(of which there would be n in all, counting multiplicities) and then, for each eigenvalue λ,
find a corresponding eigenvector v by solving the system Av = λv. This way we should
get n eigenvectors and they would form a desired eigenvector basis. The matrix of T
w.r.t. this basis would be a diagonal matrix by Theorem 2.
While this procedure is basically correct, it turns out that it does not always work.
Just exactly what the difficulties are and under what conditions the procedure will work
will be taken up in the next note, called diagonalisation.
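The whole procedure, including the possibility of failure, can be sketched as follows (`try_diagonalise` is a made-up helper name; numpy's `eig` does the characteristic-root and eigenvector computations for us):

```python
import numpy as np

def try_diagonalise(A, tol=1e-9):
    # Columns of V are eigenvectors; they form a basis iff V is non-singular.
    eigvals, V = np.linalg.eig(A)
    if abs(np.linalg.det(V)) < tol:
        return None                     # no eigenvector basis: procedure fails
    return np.linalg.inv(V) @ A @ V     # diagonal matrix of T in that basis

A = np.array([[16., 12.], [12., 9.]]) / 25
print(np.round(try_diagonalise(A), 10))       # diag(1, 0), in some order
print(try_diagonalise(np.array([[0., 1.], [0., 0.]])))  # None
```

The second call fails in exactly the way the text warns about; the matrix [0 1; 0 0] of Exercise (6.12) below is the standard example.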
Exercises
(6.1) Suppose A is the matrix of a linear transformation w.r.t. some ordered bases for
the domain and the codomain. If we permute the elements of these bases, how
does it affect A?
(6.2) For any two square matrices P and Q of the same order, prove that P Q and QP
have the same trace. Hence show that the trace is a similarity invariant. [Hint:
Given B = C^{-1}AC, choose P and Q cleverly.]
W respectively. Prove that there are invertible matrices P and Q of sizes n and
m respectively such that B = QAP .
(6.5) Verify that the matrix of the differential operator w.r.t. the alternate basis given
for the vector space of all polynomials of degree 3 or less, indeed comes out as
claimed.
(6.6) For the differential operator D^2 defined on the vector space of all infinitely differentiable
functions from IR to IR, prove that every real number is an eigenvalue.
Identify two linearly independent eigenvectors corresponding to it. (This is more
a question about solving differential equations.)
(6.7) Let A = [cos θ −sin θ; sin θ cos θ] be the matrix of the counterclockwise rotation through
an angle θ. (See Exercise (5.3).) Prove that the eigenvalues of A are e^{iθ} and e^{−iθ} .
Find the corresponding (complex) eigenvectors.
(6.8) Prove that similar matrices have the same eigenvalues, but that the corresponding
eigenvectors may be different.
(6.9) Prove that the determinant and the trace of an n × n matrix equal, respectively, the
product and the sum of its n eigenvalues (some of which may be complex and some
may be repeated). [Hint: Using the fundamental theorem of algebra, factorise the
characteristic polynomial completely as (−1)^n (λ − λ1 )(λ − λ2 ) . . . (λ − λn ). Consider
the coefficients of λ^0 and λ^{n−1} . This and the last exercise give alternate proofs
that the trace and the determinant are similarity invariant.]
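This exercise's claim is easy to spot-check numerically on an arbitrary sample matrix (symmetric, so that the eigenvalues come out real):

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
lams = np.linalg.eigvals(A)
print(np.isclose(lams.sum().real, np.trace(A)))            # trace = sum
print(np.isclose(np.prod(lams).real, np.linalg.det(A)))    # det = product
```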
(6.11) Prove that 0 is the only eigenvalue of a nilpotent matrix and that the only possible
eigenvalues of an idempotent matrix are 0 and 1.
(6.12) Show that the matrix [0 1; 0 0] has 0 as its only eigenvalue but no eigenvector
basis. (This is the prototype which illustrates what can go wrong in finding an
eigenvector basis.)
(6.13) Let T : IR^3 → IR^2 be TA where A = [1 1 0; 2 0 1]. Compute the matrix of T
w.r.t.
(i) the standard bases (e1 , e2 , e3 ) for IR^3 and (e1 , e2 ) for IR^2
(ii) the ordered basis (e2 , e1 + e3 , e3 ) for IR^3 and the standard basis for IR^2
(iii) the standard basis for IR^3 and the ordered basis (e1 − e2 , 2e1 + e2 ) for IR^2
(iv) the ordered basis (e2 , e1 + e3 , e3 ) for IR^3 and the ordered basis (e1 − e2 , 2e1 + e2 )
for IR^2 .
(6.15) In the last exercise, suppose that T (vi ) lies in the span of {v1 , v2 , . . . , v_{i−1} } for
i = 1, 2, . . . , n (for i = 1 this means T (v1 ) = 0). Prove that T is nilpotent. Use this
to give an easier proof that a square matrix in which all the entries on and below
the diagonal vanish is nilpotent.