
Eigenvalues and Eigenvectors

7.1 Eigenvalues and Eigenvectors


7.2 Diagonalization
7.3 Symmetric Matrices and Orthogonal Diagonalization
7.4 Application of Eigenvalues and Eigenvectors
7.5 Principal Component Analysis
7.1
7.1 Eigenvalues and Eigenvectors
 Eigenvalue problem (one of the most important problems in linear algebra):
If A is an n×n matrix, do there exist nonzero vectors x in R^n
such that Ax is a scalar multiple of x?
(The term eigenvalue is from the German word Eigenwert, meaning
“proper value”)

 Eigenvalue and Eigenvector

A: an n×n matrix
λ: a scalar (could be zero)
x: a nonzero vector in R^n

Eigenvalue problem: find a scalar λ and a nonzero vector x such that $A\mathbf{x} = \lambda\mathbf{x}$; λ is called an eigenvalue of A, and x is called an eigenvector of A corresponding to λ

※ Geometric interpretation: $A\mathbf{x} = \lambda\mathbf{x}$ means that Ax lies on the same line through the origin as x, scaled by the factor λ
7.2
 Ex 1: Verifying eigenvalues and eigenvectors

$A = \begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix}, \quad \mathbf{x}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

$A\mathbf{x}_1 = \begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 2\mathbf{x}_1$
⟹ eigenvalue 2 with eigenvector x1

$A\mathbf{x}_2 = \begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix} = (-1)\begin{bmatrix} 0 \\ 1 \end{bmatrix} = (-1)\mathbf{x}_2$
⟹ eigenvalue −1 with eigenvector x2

※ In fact, each eigenvalue has infinitely many eigenvectors. For λ = 2, [3 0]^T and [5 0]^T are both corresponding eigenvectors; moreover, [3 0]^T + [5 0]^T is still an eigenvector. The proof is in Thm. 7.1.
7.3
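※ A minimal NumPy sketch (added here for illustration, not part of the original slides) that verifies the relation Ax = λx for the matrix and vectors of Ex 1:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, -1.0]])
x1 = np.array([1.0, 0.0])   # eigenvector for lambda = 2
x2 = np.array([0.0, 1.0])   # eigenvector for lambda = -1

# Check the defining relation Ax = lambda * x for both pairs
print(np.allclose(A @ x1, 2 * x1))    # True
print(np.allclose(A @ x2, -1 * x2))   # True

# Any nonzero combination of eigenvectors for lambda = 2 is again an
# eigenvector for lambda = 2 (cf. Thm. 7.1)
v = 3 * x1 + 5 * x1
print(np.allclose(A @ v, 2 * v))      # True
```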
 Thm. 7.1: The eigenspace corresponding to λ of matrix A
If A is an n×n matrix with an eigenvalue λ, then the set of all
eigenvectors of λ together with the zero vector is a subspace
of R^n. This subspace is called the eigenspace of λ
Pf:
Let x1 and x2 be eigenvectors corresponding to λ
(i.e., $A\mathbf{x}_1 = \lambda\mathbf{x}_1$, $A\mathbf{x}_2 = \lambda\mathbf{x}_2$)
(1) $A(\mathbf{x}_1 + \mathbf{x}_2) = A\mathbf{x}_1 + A\mathbf{x}_2 = \lambda\mathbf{x}_1 + \lambda\mathbf{x}_2 = \lambda(\mathbf{x}_1 + \mathbf{x}_2)$
(i.e., x1 + x2 is also an eigenvector corresponding to λ)
(2) $A(c\mathbf{x}_1) = c(A\mathbf{x}_1) = c(\lambda\mathbf{x}_1) = \lambda(c\mathbf{x}_1)$
(i.e., cx1 is also an eigenvector corresponding to λ)
Since this set is closed under vector addition and scalar
multiplication, this set is a subspace of R^n according to
Theorem 4.5
7.4
Ex 3: Examples of eigenspaces on the xy-plane
For the matrix A below, the corresponding eigenvalues
are λ1 = –1 and λ2 = 1:
$A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$
Sol:
For the eigenvalue λ1 = –1, the corresponding eigenvectors are the nonzero vectors on the x-axis:
$A\begin{bmatrix} x \\ 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ 0 \end{bmatrix} = \begin{bmatrix} -x \\ 0 \end{bmatrix} = -1\begin{bmatrix} x \\ 0 \end{bmatrix}$
※ Thus, the eigenspace corresponding to λ = –1 is the x-axis, which is a subspace of R^2
For the eigenvalue λ2 = 1, the corresponding eigenvectors are the nonzero vectors on the y-axis:
$A\begin{bmatrix} 0 \\ y \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} = 1\begin{bmatrix} 0 \\ y \end{bmatrix}$
※ Thus, the eigenspace corresponding to λ = 1 is the y-axis, which is a subspace of R^2
7.5
※ Geometrically speaking, multiplying a vector (x, y) in R^2 by the matrix A
corresponds to a reflection about the y-axis:
$A\mathbf{v} = A\begin{bmatrix} x \\ y \end{bmatrix} = A\left(\begin{bmatrix} x \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ y \end{bmatrix}\right) = A\begin{bmatrix} x \\ 0 \end{bmatrix} + A\begin{bmatrix} 0 \\ y \end{bmatrix} = -1\begin{bmatrix} x \\ 0 \end{bmatrix} + 1\begin{bmatrix} 0 \\ y \end{bmatrix} = \begin{bmatrix} -x \\ y \end{bmatrix}$
7.6
 Thm. 7.2: Finding eigenvalues and eigenvectors of a matrix A ∈ M_{n×n}
Let A be an n×n matrix.
(1) An eigenvalue of A is a scalar λ such that $\det(\lambda I - A) = 0$
(2) The eigenvectors of A corresponding to λ are the nonzero
solutions of $(\lambda I - A)\mathbf{x} = \mathbf{0}$
 Note: following the definition of the eigenvalue problem,
$A\mathbf{x} = \lambda\mathbf{x} \Rightarrow A\mathbf{x} = \lambda I\mathbf{x} \Rightarrow (\lambda I - A)\mathbf{x} = \mathbf{0}$ (homogeneous system)
$(\lambda I - A)\mathbf{x} = \mathbf{0}$ has nonzero solutions for x iff $\det(\lambda I - A) = 0$
(This iff result comes from the equivalent conditions on Slide 4.101)

 Characteristic equation of A:
$\det(\lambda I - A) = 0$
 Characteristic polynomial of A ∈ M_{n×n}:
$\det(\lambda I - A) = |\lambda I - A| = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$
7.7
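※ As a supplementary illustration (not from the original slides), NumPy can return the coefficients of the characteristic polynomial det(λI − A) directly; the matrix used below is the 2×2 matrix of Ex 4 on the next slide:

```python
import numpy as np

A = np.array([[2.0, -12.0],
              [1.0, -5.0]])

# Coefficients of det(lambda*I - A), highest degree first:
# [1, 3, 2] corresponds to lambda^2 + 3*lambda + 2
coeffs = np.poly(A)
print(coeffs)

# The roots of the characteristic polynomial are the eigenvalues of A
print(np.roots(coeffs))        # -1 and -2 (order may vary)
print(np.linalg.eigvals(A))    # same values
```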
 Ex 4: Finding eigenvalues and eigenvectors

$A = \begin{bmatrix} 2 & -12 \\ 1 & -5 \end{bmatrix}$

Sol: Characteristic equation:
$\det(\lambda I - A) = \begin{vmatrix} \lambda-2 & 12 \\ -1 & \lambda+5 \end{vmatrix} = \lambda^2 + 3\lambda + 2 = (\lambda + 1)(\lambda + 2) = 0$
⟹ λ = −1, −2

Eigenvalues: λ1 = −1, λ2 = −2
7.8
(1) 1  1  ( I  A)x   3 12   x1   0 
1  1 4   x  0 
  2  
 3 12  G.-J. E. 1 4 
    
  1 4   0 0 
 x1   4t   4 
       t  , t  0
 x2   t  1 
 4 12   x1  0 
(2) 2  2  (2 I  A)x       
 1 3   x2  0 
 4 12  G.-J. E. 1 3
    
 1 3   0 0 
 x1  3s  3
       s  , s  0
 x2   s  1 7.9
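※ A short NumPy sketch (added for illustration) reproducing Ex 4; np.linalg.eig returns unit-length eigenvectors, so they appear as scalar multiples of [4 1]^T and [3 1]^T:

```python
import numpy as np

A = np.array([[2.0, -12.0],
              [1.0, -5.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                       # [-1., -2.] (order may vary)

# Each column of eigvecs is a unit eigenvector; rescale so the last
# entry is 1 to compare with t*[4, 1]^T and s*[3, 1]^T from the slide
for i in range(2):
    v = eigvecs[:, i]
    print(eigvals[i], v / v[-1])     # -> [4., 1.] and [3., 1.]
```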
 Ex 5: Finding eigenvalues and eigenvectors
Find the eigenvalues and corresponding eigenvectors for
the matrix A. What is the dimension of the eigenspace of
each eigenvalue?

$A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$
Sol: Characteristic equation:
$|\lambda I - A| = \begin{vmatrix} \lambda-2 & -1 & 0 \\ 0 & \lambda-2 & 0 \\ 0 & 0 & \lambda-2 \end{vmatrix} = (\lambda - 2)^3 = 0$
Eigenvalue: λ = 2
7.10
The eigenspace of λ = 2:
$(\lambda I - A)\mathbf{x} = \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
$\Rightarrow \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} s \\ 0 \\ t \end{bmatrix} = s\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \; (s, t) \neq (0, 0)$
$\left\{ s\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} : s, t \in R \right\}$: the eigenspace of A corresponding to λ = 2

Thus, the dimension of its eigenspace is 2
7.11
 Notes:
(1) If an eigenvalue λ1 occurs as a multiple root (k times) of
the characteristic polynomial, then λ1 has multiplicity k
(2) The multiplicity of an eigenvalue is greater than or equal
to the dimension of its eigenspace. (In Ex. 5, k is 3 and
the dimension of the eigenspace is 2; see the numerical check below)

7.12
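※ A small NumPy check (added for illustration) of note (2): the eigenspace dimension equals n − rank(λI − A), computed here for the matrix of Ex. 5:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
lam = 2.0
n = A.shape[0]

# dim(eigenspace) = dim Null(lam*I - A) = n - rank(lam*I - A)
eigenspace_dim = n - np.linalg.matrix_rank(lam * np.eye(n) - A)
print(eigenspace_dim)   # 2, while the algebraic multiplicity of lam = 2 is 3
```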
 Ex 6: Find the eigenvalues of the matrix A and find a basis
for each of the corresponding eigenspaces
$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 5 & -10 \\ 1 & 0 & 2 & 0 \\ 1 & 0 & 0 & 3 \end{bmatrix}$
Sol: Characteristic equation:
$|\lambda I - A| = \begin{vmatrix} \lambda-1 & 0 & 0 & 0 \\ 0 & \lambda-1 & -5 & 10 \\ -1 & 0 & \lambda-2 & 0 \\ -1 & 0 & 0 & \lambda-3 \end{vmatrix} = (\lambda - 1)^2(\lambda - 2)(\lambda - 3) = 0$
Eigenvalues: λ1 = 1, λ2 = 2, λ3 = 3
※ According to the note on the previous slide, the dimension of the eigenspace of λ1 = 1 is at most 2
※ For λ2 = 2 and λ3 = 3, the dimensions of their eigenspaces are at most 1
7.13
0 0 0 0   x1  0 
0 0 5 10   x2  0 
(1) 1  1  (1 I  A)x   
 1 0 1 0   x3  0 
    
 1 0 0 2   x4  0
 x1   2t  0   2 
   s 
G.-J.E. x 1   0 
  2      s    t   , s, t  0
 x3   2t  0  2 
       
 x4   t  0  1 
 0    2  
    
1  0  
 ,  is a basis for the eigenspace
0  2   corresponding to 1  1
0  1  
 
※The dimension of the eigenspace of λ1 = 1 is 2
7.14
1 0 0 0   x1  0 
0 1 5 10   x2  0
(2) 2  2  (2 I  A)x   
 1 0 0 0   x3  0 
    
 1 0 0 1  x4  0
 x1   0  0 
  5t  5 
G.-J.E. x
  2     t  , t  0
 x3   t  1 
     
 x4   0  0 
 0  
  
5  is a basis for the eigenspace
 
1  corresponding to 2  2
0 
 
※The dimension of the eigenspace of λ2 = 2 is 1
7.15
2 0 0 0   x1  0
0 2 5 10   x2  0 
(3) 3  3  (3 I  A)x   
 1 0 1 0   x3  0
    
 1 0 0 0   x4  0
 x1   0   0 
   5t   5
G.-J.E. x
  2     t  , t  0
 x3   0   0 
     
 x4   t   1 

 0  
  
 5  is a basis for the eigenspace
 
  0   corresponding to 3  3
 1  
 
※The dimension of the eigenspace of λ3 = 3 is 1
7.16
 Thm. 7.3: Eigenvalues for triangular matrices
If A is an n×n triangular matrix, then its eigenvalues are
the entries on its main diagonal
 Ex 7: Finding eigenvalues for triangular and diagonal matrices
(a) $A = \begin{bmatrix} 2 & 0 & 0 \\ -1 & 1 & 0 \\ 5 & 3 & -3 \end{bmatrix}$  (b) $A = \begin{bmatrix} -1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -4 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix}$
Sol:
(a) $|\lambda I - A| = \begin{vmatrix} \lambda-2 & 0 & 0 \\ 1 & \lambda-1 & 0 \\ -5 & -3 & \lambda+3 \end{vmatrix} = (\lambda - 2)(\lambda - 1)(\lambda + 3) = 0$
⟹ λ1 = 2, λ2 = 1, λ3 = −3
※ According to Thm. 3.2, the determinant of a triangular matrix is the product of the entries on the main diagonal
(b) λ1 = −1, λ2 = 2, λ3 = 0, λ4 = −4, λ5 = 3
7.17
 Eigenvalues and eigenvectors of linear transformations:
A number λ is called an eigenvalue of a linear transformation
T : V → V if there is a nonzero vector x such that T(x) = λx.
The vector x is called an eigenvector of T corresponding to λ,
and the set of all eigenvectors of λ (together with the zero
vector) is called the eigenspace of λ
※ The definition of linear transformations is introduced in Ch 6
※ Here the linear transformation and some of its basic properties are introduced briefly
※ The typical example of a linear transformation is one in which each
component of the resulting vector is a linear combination of the components
of the input vector x

 An example of a linear transformation T: R^3 → R^3:
$T(x_1, x_2, x_3) = (x_1 + 3x_2,\; 3x_1 + x_2,\; -2x_3)$
7.18
 Theorem: Standard matrix for a linear transformation
Let T : R^n → R^n be a linear transformation such that
$T(\mathbf{e}_1) = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{bmatrix}, \; T(\mathbf{e}_2) = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{bmatrix}, \; \ldots, \; T(\mathbf{e}_n) = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{nn} \end{bmatrix},$
where $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is the standard basis for R^n. Then the n×n
matrix A, whose i-th column corresponds to $T(\mathbf{e}_i)$,
$A = [T(\mathbf{e}_1) \; T(\mathbf{e}_2) \; \cdots \; T(\mathbf{e}_n)] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix},$
satisfies $T(\mathbf{x}) = A\mathbf{x}$ for every x in R^n. A is called the
standard matrix for T (T的標準矩陣)
7.19
 Consider the same linear transformation T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, –2x3)
$T(\mathbf{e}_1) = T\!\left(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}, \quad T(\mathbf{e}_2) = T\!\left(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}, \quad T(\mathbf{e}_3) = T\!\left(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix}$
 Thus, the above linear transformation T has the following corresponding standard matrix A such that T(x) = Ax:
$A = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix} \;\Rightarrow\; A\mathbf{x} = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 + 3x_2 \\ 3x_1 + x_2 \\ -2x_3 \end{bmatrix}$
※ The statement on Slide 7.18 is valid because for any linear transformation T: V → V,
there is a corresponding square matrix A such that T(x) = Ax. Consequently, the
eigenvalues and eigenvectors of a linear transformation T are in essence the
eigenvalues and eigenvectors of the corresponding square matrix A
7.20
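※ The standard matrix can be assembled exactly as the theorem describes: apply T to each standard basis vector and use the images as columns. A small illustrative sketch (not from the original slides):

```python
import numpy as np

def T(x):
    """The linear transformation T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, -2x3)."""
    x1, x2, x3 = x
    return np.array([x1 + 3 * x2, 3 * x1 + x2, -2 * x3])

# Columns of the standard matrix are T(e1), T(e2), T(e3)
A = np.column_stack([T(e) for e in np.eye(3)])
print(A)                            # [[1 3 0], [3 1 0], [0 0 -2]]

x = np.array([5.0, -1.0, 4.0])
print(np.allclose(A @ x, T(x)))     # True: T(x) = Ax
```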
 Ex 8: Finding eigenvalues and eigenvectors for standard matrices
Find the eigenvalues and corresponding eigenvectors for
$A = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}$
※ A is the standard matrix for T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, –2x3) (see Slides 7.19 and 7.20)
Sol:
$|\lambda I - A| = \begin{vmatrix} \lambda-1 & -3 & 0 \\ -3 & \lambda-1 & 0 \\ 0 & 0 & \lambda+2 \end{vmatrix} = (\lambda + 2)^2(\lambda - 4) = 0$
⟹ eigenvalues λ1 = 4, λ2 = −2

For λ1 = 4, the corresponding eigenvector is (1, 1, 0).
For λ2 = −2, the corresponding eigenvectors are (1, −1, 0) and (0, 0, 1).
7.21
 Transformation matrix A' for nonstandard bases
Suppose B is the standard basis of R^n. Since the coordinate matrix of a vector
relative to the standard basis consists of the components of that vector, i.e.,
for any x in R^n, x = [x]_B, the theorem on Slide 7.19 can be restated as follows:
$T(\mathbf{x}) = A\mathbf{x} \;\Leftrightarrow\; [T(\mathbf{x})]_B = A[\mathbf{x}]_B$, where $A = \big[[T(\mathbf{e}_1)]_B \; [T(\mathbf{e}_2)]_B \; \cdots \; [T(\mathbf{e}_n)]_B\big]$
is the standard matrix for T, or the matrix of T relative to the standard
basis B
The above theorem can be extended to a nonstandard basis B', which
consists of {v1, v2, …, vn}:
$[T(\mathbf{x})]_{B'} = A'[\mathbf{x}]_{B'}$, where $A' = \big[[T(\mathbf{v}_1)]_{B'} \; [T(\mathbf{v}_2)]_{B'} \; \cdots \; [T(\mathbf{v}_n)]_{B'}\big]$
is the transformation matrix for T relative to the basis B'
※ On the next two slides, an example is provided to verify numerically that this
extension is valid
7.22
 Ex: Consider an arbitrary nonstandard basis B' = {v1, v2, v3} = {(1, 1, 0), (1, –1, 0), (0, 0, 1)}, and find the transformation matrix A' such that $[T(\mathbf{x})]_{B'} = A'[\mathbf{x}]_{B'}$ for the same linear transformation T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, –2x3)
$[T(\mathbf{v}_1)]_{B'} = \left[T\!\left(\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\right)\right]_{B'} = \begin{bmatrix} 4 \\ 4 \\ 0 \end{bmatrix}_{B'} = \begin{bmatrix} 4 \\ 0 \\ 0 \end{bmatrix}, \quad [T(\mathbf{v}_2)]_{B'} = \left[T\!\left(\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}\right)\right]_{B'} = \begin{bmatrix} -2 \\ 2 \\ 0 \end{bmatrix}_{B'} = \begin{bmatrix} 0 \\ -2 \\ 0 \end{bmatrix},$
$[T(\mathbf{v}_3)]_{B'} = \left[T\!\left(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right)\right]_{B'} = \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix}_{B'} = \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix}$
$\Rightarrow A' = \begin{bmatrix} 4 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}$
7.23
 Consider x = (5, –1, 4), and check that $[T(\mathbf{x})]_{B'} = A'[\mathbf{x}]_{B'}$ for the linear transformation T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, –2x3)
$[T(\mathbf{x})]_{B'} = \left[T\!\left(\begin{bmatrix} 5 \\ -1 \\ 4 \end{bmatrix}\right)\right]_{B'} = \begin{bmatrix} 2 \\ 14 \\ -8 \end{bmatrix}_{B'} = \begin{bmatrix} 8 \\ -6 \\ -8 \end{bmatrix}, \quad [\mathbf{x}]_{B'} = \begin{bmatrix} 5 \\ -1 \\ 4 \end{bmatrix}_{B'} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix},$
$\Rightarrow A'[\mathbf{x}]_{B'} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 8 \\ -6 \\ -8 \end{bmatrix} = [T(\mathbf{x})]_{B'}$

7.24
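※ Numerically, each coordinate vector [w]_{B'} is the solution of a linear system whose coefficient matrix has the basis vectors of B' as columns. The sketch below (an added illustration) reproduces A' and the check with x = (5, −1, 4):

```python
import numpy as np

def T(x):
    x1, x2, x3 = x
    return np.array([x1 + 3 * x2, 3 * x1 + x2, -2 * x3])

# Columns of M are the basis vectors v1, v2, v3 of B'
M = np.array([[1.0,  1.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])

# [w]_{B'} solves M c = w
coord = lambda w: np.linalg.solve(M, w)

# A' has columns [T(v1)]_{B'}, [T(v2)]_{B'}, [T(v3)]_{B'}
A_prime = np.column_stack([coord(T(M[:, i])) for i in range(3)])
print(A_prime)                       # diag(4, -2, -2)

x = np.array([5.0, -1.0, 4.0])
print(coord(T(x)))                   # [ 8. -6. -8.]
print(A_prime @ coord(x))            # [ 8. -6. -8.]  -> same vector
```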
 For a special basis B' = {v1, v2, …, vn}, where the vi's are eigenvectors of
the standard matrix A, A' is obtained immediately to be diagonal because
$T(\mathbf{v}_i) = A\mathbf{v}_i = \lambda_i\mathbf{v}_i$
and
$[\lambda_i\mathbf{v}_i]_{B'} = [0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + \lambda_i\mathbf{v}_i + \cdots + 0\mathbf{v}_n]_{B'} = [0 \; \cdots \; 0 \; \lambda_i \; 0 \; \cdots \; 0]^T$
 Let B' be a basis of R^3 made up of three linearly independent eigenvectors
of A, e.g., B' = {v1, v2, v3} = {(1, 1, 0), (1, –1, 0), (0, 0, 1)} in Ex. 8
Then A', the transformation matrix for T relative to the basis B', defined as
$[[T(\mathbf{v}_1)]_{B'} \; [T(\mathbf{v}_2)]_{B'} \; [T(\mathbf{v}_3)]_{B'}]$ (see Slide 7.22), is diagonal, and the main
diagonal entries are the corresponding eigenvalues (see Slide 7.23)
B' = {(1, 1, 0), (1, –1, 0), (0, 0, 1)} (eigenvectors of A, for λ1 = 4 and λ2 = −2)
$\Rightarrow A' = \begin{bmatrix} 4 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}$ (eigenvalues of A on the main diagonal)
7.25
Keywords in Section 7.1:
 eigenvalue problem:
 eigenvalue:
 eigenvector:
 characteristic equation:
 characteristic polynomial:
 eigenspace:
 multiplicity:
 linear transformation:
 diagonalization:

7.26
7.2 Diagonalization
 Diagonalization problem :
For a square matrix A, does there exist an invertible matrix P
such that P–1AP is diagonal?
 Diagonalizable matrix :
Definition 1: A square matrix A is called diagonalizable if
there exists an invertible matrix P such that P–1AP is a
diagonal matrix (i.e., P diagonalizes A)
Definition 2: A square matrix A is called diagonalizable if A
is similar to a diagonal matrix
※ In Sec. 6.4, two square matrices A and B are similar if there exists an invertible
matrix P such that B = P–1AP.
 Notes:
In this section, I will show that the eigenvalue and eigenvector
problem is closely related to the diagonalization problem
7.27
 Thm. 7.4: Similar matrices have the same eigenvalues
If A and B are similar n×n matrices, then they have the
same eigenvalues
Pf:
A and B are similar ⟹ $B = P^{-1}AP$
Consider the characteristic equation of B (note that for any diagonal matrix of the form D = λI, $P^{-1}DP = D$, i.e., $\lambda I = P^{-1}\lambda IP$):
$|\lambda I - B| = |\lambda I - P^{-1}AP| = |P^{-1}\lambda IP - P^{-1}AP| = |P^{-1}(\lambda I - A)P|$
$= |P^{-1}|\,|\lambda I - A|\,|P| = |P^{-1}|\,|P|\,|\lambda I - A| = |P^{-1}P|\,|\lambda I - A| = |\lambda I - A|$
Since A and B have the same characteristic equation,
they have the same eigenvalues
※ Note that the eigenvectors of A and B are not necessarily identical
7.28
 Ex 1: Eigenvalue problems and diagonalization
$A = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}$
Sol: Characteristic equation:
$|\lambda I - A| = \begin{vmatrix} \lambda-1 & -3 & 0 \\ -3 & \lambda-1 & 0 \\ 0 & 0 & \lambda+2 \end{vmatrix} = (\lambda - 4)(\lambda + 2)^2 = 0$
The eigenvalues: λ1 = 4, λ2 = −2, λ3 = −2
(1) λ = 4 ⟹ the eigenvector $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$
7.29
1 0
   
(2)   2  the eigenvector p 2   1 , p3  0 
 0  1 
1 1 0  4 0 0 
P  [p1 p 2 p3 ]  1 1 0 , and P 1 AP  0 2 0 
0 0 1  0 0 2
 Note: If P  [p 2 p1 p3 ]
 1 1 0  2 0 0 
  1 1 0  P 1 AP   0 4 0 
 0 0 1   0 0 2 
※ The above example can verify Thm. 7.4 since the eigenvalues for both A and P–1AP
are the same to be 4, –2, and –2
※ The reason why the matrix P is constructed with the eigenvectors of A is
demonstrated in Thm. 7.5 on the next slide 7.30
 Thm. 7.5: Condition for diagonalization
An n×n matrix A is diagonalizable if and only if it has n
linearly independent eigenvectors
※ If there are n linearly independent eigenvectors, it does not imply that there are n distinct
eigenvalues. In an extreme case, it is possible to have only one eigenvalue with
multiplicity n and still have n linearly independent eigenvectors for this eigenvalue
※ On the other hand, if there are n distinct eigenvalues, then there are n linearly
independent eigenvectors (see Thm. 7.6), and thus A must be diagonalizable
Pf: (⟹)
Since A is diagonalizable, there exists an invertible P s.t. $D = P^{-1}AP$
is diagonal. Let $P = [\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_n]$ and $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$; then
$PD = [\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_n]\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = [\lambda_1\mathbf{p}_1 \; \lambda_2\mathbf{p}_2 \; \cdots \; \lambda_n\mathbf{p}_n]$
7.31
$AP = PD$ (since $D = P^{-1}AP$)
$[A\mathbf{p}_1 \; A\mathbf{p}_2 \; \cdots \; A\mathbf{p}_n] = [\lambda_1\mathbf{p}_1 \; \lambda_2\mathbf{p}_2 \; \cdots \; \lambda_n\mathbf{p}_n]$
$\Rightarrow A\mathbf{p}_i = \lambda_i\mathbf{p}_i, \; i = 1, 2, \ldots, n$
(The above equations imply that the column vectors p_i of P are eigenvectors
of A, and the diagonal entries λ_i in D are eigenvalues of A)
Because A is diagonalizable ⟹ P is invertible
⟹ the columns of P, i.e., p1, p2, …, pn, are linearly independent
(see Slide 4.101 in the lecture note)
Thus, A has n linearly independent eigenvectors
(⟸)
Since A has n linearly independent eigenvectors p1, p2, …, pn with
corresponding eigenvalues λ1, λ2, …, λn (which could be the same), then
$A\mathbf{p}_i = \lambda_i\mathbf{p}_i, \; i = 1, 2, \ldots, n$
Let $P = [\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_n]$
7.32
$AP = A[\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_n] = [A\mathbf{p}_1 \; A\mathbf{p}_2 \; \cdots \; A\mathbf{p}_n] = [\lambda_1\mathbf{p}_1 \; \lambda_2\mathbf{p}_2 \; \cdots \; \lambda_n\mathbf{p}_n]$
$= [\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_n]\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = PD$
Since p1, p2, …, pn are linearly independent
⟹ P is invertible (see Slide 4.101 in the lecture note)
$AP = PD \Rightarrow P^{-1}AP = D$
⟹ A is diagonalizable
(according to the definition of a diagonalizable matrix on Slide 7.27)
※ Note that the p_i's are linearly independent eigenvectors and the diagonal
entries λ_i in the resulting diagonalized D are eigenvalues of A
7.33
 Ex 4: A matrix that is not diagonalizable
Show that the following matrix is not diagonalizable
$A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$
Sol: Characteristic equation:
$|\lambda I - A| = \begin{vmatrix} \lambda-1 & -2 \\ 0 & \lambda-1 \end{vmatrix} = (\lambda - 1)^2 = 0$
The eigenvalue is λ1 = 1; solving $(\lambda_1 I - A)\mathbf{x} = \mathbf{0}$ for eigenvectors:
$\lambda_1 I - A = I - A = \begin{bmatrix} 0 & -2 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\;$ eigenvector $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$
Since A does not have two linearly independent eigenvectors,
A is not diagonalizable
7.34
 Steps for diagonalizing an n×n square matrix:
Step 1: Find n linearly independent eigenvectors p1, p2, …, pn
for A with corresponding eigenvalues λ1, λ2, …, λn
Step 2: Let $P = [\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_n]$
Step 3: $P^{-1}AP = D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$, where $A\mathbf{p}_i = \lambda_i\mathbf{p}_i, \; i = 1, 2, \ldots, n$
(a numerical sketch of these steps is given below)

7.35
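※ A direct NumPy rendering of the three steps (an illustrative sketch; np.linalg.eig already returns the eigenvectors as the columns of P), using the matrix of Ex 5 below:

```python
import numpy as np

A = np.array([[ 1.0, -1.0, -1.0],
              [ 1.0,  3.0,  1.0],
              [-3.0,  1.0, -1.0]])

# Step 1: eigenvalues and eigenvectors; Step 2: P = [p1 p2 p3]
eigvals, P = np.linalg.eig(A)

# Step 3: P^{-1} A P is the diagonal matrix of the eigenvalues
D = np.linalg.inv(P) @ A @ P
print(eigvals)              # 2, -2, 3 (order may vary)
print(np.round(D, 10))      # diagonal matrix with the same entries
```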
 Ex 5: Diagonalizing a matrix
$A = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 3 & 1 \\ -3 & 1 & -1 \end{bmatrix}$
Find a matrix P such that $P^{-1}AP$ is diagonal.
Sol: Characteristic equation:
$|\lambda I - A| = \begin{vmatrix} \lambda-1 & 1 & 1 \\ -1 & \lambda-3 & -1 \\ 3 & -1 & \lambda+1 \end{vmatrix} = (\lambda - 2)(\lambda + 2)(\lambda - 3) = 0$
The eigenvalues: λ1 = 2, λ2 = −2, λ3 = 3

7.36
1 1 1 1 0 1   x1  0 
1  2  1 I  A   1 1 1 G.-J. E.
 0 1 0   x2   0 
 3 1 3  0 0 0   x3  0 
 x1   t   1
 x    0   eigenvector p   0 
 2   1  
 x3   t   1 

 3 1 1  1 0  14   x1  0 
2  2  2 I  A   1 5 1 
G.-J. E.
 0 1 14   x2   0 
 3 1 1 0 0 0   x3  0 

 x1   14 t  1
 x     1 t   eigenvector p   1
 2  4  2  
 x3   t   4 
7.37
2 1 1 1 0 1   x1  0 
3  3  3 I  A   1 0 1  G.-J. E.
 0 1 1  x2   0 
 3 1 4  0 0 0   x3  0 
 x1   t   1
 x    t   eigenvector p   1 
 2   3  
 x3   t   1 

 1 1 1
P  [p1 p 2 p3 ]   0 1 1  and it follows that
 1 4 1 
2 0 0
P 1 AP   0 2 0 
 0 0 3
7.38
 Note: a quick way to calculate A^k based on the diagonalization technique (see the sketch below)
(1) $D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \;\Rightarrow\; D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}$
(2) $D = P^{-1}AP \;\Rightarrow\; D^k = \underbrace{(P^{-1}AP)(P^{-1}AP)\cdots(P^{-1}AP)}_{k \text{ times}} = P^{-1}A^kP$
$\Rightarrow A^k = PD^kP^{-1}$, where $D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}$
7.39
Thm. 7.6: Sufficient condition for diagonalization
If an n×n matrix A has n distinct eigenvalues, then the
corresponding eigenvectors are linearly independent and
thus A is diagonalizable according to Thm. 7.5.
Pf:
Let λ1, λ2, …, λn be distinct eigenvalues with corresponding
eigenvectors x1, x2, …, xn. Suppose, for contradiction, that
the first m eigenvectors are linearly independent, but the
first m+1 eigenvectors are linearly dependent, i.e.,
$\mathbf{x}_{m+1} = c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_m\mathbf{x}_m, \quad (1)$
where the c_i's are not all zero. Multiplying both sides of Eq. (1)
by A yields
$A\mathbf{x}_{m+1} = c_1A\mathbf{x}_1 + c_2A\mathbf{x}_2 + \cdots + c_mA\mathbf{x}_m$
$\lambda_{m+1}\mathbf{x}_{m+1} = c_1\lambda_1\mathbf{x}_1 + c_2\lambda_2\mathbf{x}_2 + \cdots + c_m\lambda_m\mathbf{x}_m \quad (2)$
7.40
On the other hand, multiplying both sides of Eq. (1) by λ_{m+1} yields
$\lambda_{m+1}\mathbf{x}_{m+1} = c_1\lambda_{m+1}\mathbf{x}_1 + c_2\lambda_{m+1}\mathbf{x}_2 + \cdots + c_m\lambda_{m+1}\mathbf{x}_m \quad (3)$
Now, subtracting Eq. (2) from Eq. (3) produces
$c_1(\lambda_{m+1} - \lambda_1)\mathbf{x}_1 + c_2(\lambda_{m+1} - \lambda_2)\mathbf{x}_2 + \cdots + c_m(\lambda_{m+1} - \lambda_m)\mathbf{x}_m = \mathbf{0}$
Since the first m eigenvectors are linearly independent, all
coefficients of this equation must be zero, i.e.,
$c_1(\lambda_{m+1} - \lambda_1) = c_2(\lambda_{m+1} - \lambda_2) = \cdots = c_m(\lambda_{m+1} - \lambda_m) = 0$
Because all the eigenvalues are distinct, it follows that all c_i's equal 0,
which contradicts our assumption that x_{m+1} can be expressed as a
linear combination of the first m eigenvectors. So, the set of n
eigenvectors is linearly independent given n distinct eigenvalues,
and according to Thm. 7.5, we can conclude that A is diagonalizable
7.41
 Ex 7: Determining whether a matrix is diagonalizable
$A = \begin{bmatrix} 1 & -2 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & -3 \end{bmatrix}$
Sol: Because A is a triangular matrix, its eigenvalues are
the main diagonal entries:
λ1 = 1, λ2 = 0, λ3 = −3
According to Thm. 7.6, because these three eigenvalues are
distinct, A is diagonalizable

7.42
 Ex 8: Finding a diagonalized matrix for a linear transformation
Let $T: R^3 \to R^3$ be the linear transformation given by
$T(x_1, x_2, x_3) = (x_1 - x_2 - x_3,\; x_1 + 3x_2 + x_3,\; -3x_1 + x_2 - x_3)$
Find a basis B' for R^3 such that the matrix for T relative
to B' is diagonal
Sol:
The standard matrix for T is given by
$A = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 3 & 1 \\ -3 & 1 & -1 \end{bmatrix}$
From Ex. 5 you know that λ1 = 2, λ2 = –2, λ3 = 3 and thus A is
diagonalizable. So, similar to the result on Slide 7.25, the
three linearly independent eigenvectors found in Ex. 5 can be
used to form the basis B'. That is,
7.43
B '  {v1 , v 2 , v3}  {(1, 0, 1),(1, 1, 4),(1, 1, 1)}

The matrix for T relative to this basis is

A '  [T ( v1 )]B ' [T ( v 2 )]B ' [T ( v 3 )]B ' 


2 0 0
  0 2 0 
 0 0 3

※ Note that it is not necessary to calculate A ' through the above equation.
According to the result on Slide 7.25, we already know that A ' is a diagonal
matrix and its main diagonal entries are corresponding eigenvalues of A

7.44
Keywords in Section 7.2:
 diagonalization problem:
 diagonalization:
 diagonalizable matrix:

7.45
7.3 Symmetric Matrices and Orthogonal Diagonalization
 Symmetric matrix:
A square matrix A is symmetric if it is equal to its transpose:
$A = A^T$
 Ex 1: Symmetric matrices and nonsymmetric matrices
$A = \begin{bmatrix} 0 & 1 & -2 \\ 1 & 3 & 0 \\ -2 & 0 & 5 \end{bmatrix}$ (symmetric)
$B = \begin{bmatrix} 4 & 3 \\ 3 & 1 \end{bmatrix}$ (symmetric)
$C = \begin{bmatrix} 3 & 2 & 1 \\ 1 & -4 & 0 \\ 1 & 0 & 5 \end{bmatrix}$ (nonsymmetric)
7.46
 Thm. 7.7: Eigenvalues of symmetric matrices
If A is an n×n symmetric matrix, then the following properties
are true (a numerical illustration follows below)
(1) A is diagonalizable (symmetric matrices (except the
matrices in the form of D = aI) are guaranteed to have n
linearly independent eigenvectors and thus be
diagonalizable)
(2) All eigenvalues of A are real numbers
(3) If λ is an eigenvalue of A with multiplicity k, then
λ has k linearly independent eigenvectors. That is, the
eigenspace of λ has dimension k
※ The above theorem is called the Real Spectral Theorem (實數頻譜理論),
and the set of eigenvalues of A is called the spectrum (頻譜) of A
7.47
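※ The properties of Thm. 7.7 can be observed numerically; np.linalg.eigh is the routine for symmetric matrices and returns real eigenvalues together with orthonormal eigenvectors (illustrative sketch, using the symmetric matrix A from Ex 1):

```python
import numpy as np

A = np.array([[ 0.0, 1.0, -2.0],
              [ 1.0, 3.0,  0.0],
              [-2.0, 0.0,  5.0]])

eigvals, P = np.linalg.eigh(A)        # eigh: for symmetric (Hermitian) matrices
print(eigvals)                                      # all real
print(np.allclose(P.T @ P, np.eye(3)))              # True: orthonormal eigenvectors
print(np.allclose(P.T @ A @ P, np.diag(eigvals)))   # True: A is (orthogonally) diagonalized
```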
 Ex 2:
Prove that a 2×2 symmetric matrix is diagonalizable
$A = \begin{bmatrix} a & c \\ c & b \end{bmatrix}$
Pf: Characteristic equation:
$|\lambda I - A| = \begin{vmatrix} \lambda-a & -c \\ -c & \lambda-b \end{vmatrix} = \lambda^2 - (a+b)\lambda + ab - c^2 = 0$
As a function of λ, this quadratic polynomial has a
nonnegative discriminant (判別式) as follows:
$(a+b)^2 - 4(1)(ab - c^2) = a^2 + 2ab + b^2 - 4ab + 4c^2 = a^2 - 2ab + b^2 + 4c^2 = (a-b)^2 + 4c^2 \ge 0$
⟹ real-number solutions
7.48
(1) $(a-b)^2 + 4c^2 = 0$
⟹ $a = b, \; c = 0$
⟹ $A = \begin{bmatrix} a & c \\ c & b \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$ is itself a diagonal matrix
※ Note that in this case, A has one eigenvalue, a, whose multiplicity is 2
(2) $(a-b)^2 + 4c^2 > 0$
The characteristic polynomial of A has two distinct real roots,
which implies that A has two distinct real eigenvalues.
According to Thm. 7.6, A is diagonalizable
7.49
 Orthogonal matrix:
A square matrix P is called orthogonal if it is invertible and
$P^{-1} = P^T$ (or $PP^T = P^TP = I$)
 Thm. 7.8: Properties of orthogonal matrices
An n×n matrix P is orthogonal if and only if its column vectors
form an orthonormal set
Pf: Suppose the column vectors of P form an orthonormal set, i.e.,
$P = [\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_n]$, where $\mathbf{p}_i \cdot \mathbf{p}_j = 0$ for $i \neq j$ and $\mathbf{p}_i \cdot \mathbf{p}_i = 1$
$P^TP = \begin{bmatrix} \mathbf{p}_1^T\mathbf{p}_1 & \mathbf{p}_1^T\mathbf{p}_2 & \cdots & \mathbf{p}_1^T\mathbf{p}_n \\ \mathbf{p}_2^T\mathbf{p}_1 & \mathbf{p}_2^T\mathbf{p}_2 & \cdots & \mathbf{p}_2^T\mathbf{p}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{p}_n^T\mathbf{p}_1 & \mathbf{p}_n^T\mathbf{p}_2 & \cdots & \mathbf{p}_n^T\mathbf{p}_n \end{bmatrix} = \begin{bmatrix} \mathbf{p}_1\cdot\mathbf{p}_1 & \mathbf{p}_1\cdot\mathbf{p}_2 & \cdots & \mathbf{p}_1\cdot\mathbf{p}_n \\ \mathbf{p}_2\cdot\mathbf{p}_1 & \mathbf{p}_2\cdot\mathbf{p}_2 & \cdots & \mathbf{p}_2\cdot\mathbf{p}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{p}_n\cdot\mathbf{p}_1 & \mathbf{p}_n\cdot\mathbf{p}_2 & \cdots & \mathbf{p}_n\cdot\mathbf{p}_n \end{bmatrix} = I_n$
This implies $P^{-1} = P^T$ and thus P is orthogonal
7.50


 Ex 5: Show that P is an orthogonal matrix.
$P = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\ -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \\ -\frac{2}{3\sqrt{5}} & -\frac{4}{3\sqrt{5}} & \frac{5}{3\sqrt{5}} \end{bmatrix}$
Sol: If P is an orthogonal matrix, then $P^{-1} = P^T$, i.e., $PP^T = I$:
$PP^T = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\ -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \\ -\frac{2}{3\sqrt{5}} & -\frac{4}{3\sqrt{5}} & \frac{5}{3\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{1}{3} & -\frac{2}{\sqrt{5}} & -\frac{2}{3\sqrt{5}} \\ \frac{2}{3} & \frac{1}{\sqrt{5}} & -\frac{4}{3\sqrt{5}} \\ \frac{2}{3} & 0 & \frac{5}{3\sqrt{5}} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I$

7.51
 1   2   2 
 3   3  
3

Moreover, let p1   5  , p 2   5  , and p3   0  ,
2 1

 2   4   5 
 3 5   3 5  3 5 
we can produce p1  p 2  p1  p3  p 2  p3  0 and p1  p1 
p 2  p 2  p3  p3  1

So, {p1 , p 2 , p3} is an orthonormal set (These results are


consistent with Thm. 7.8)

7.52
 Thm. 7.9: Properties of symmetric matrices
Let A be an n×n symmetric matrix. If λ1 and λ2 are distinct
eigenvalues of A, then their corresponding eigenvectors x1 and x2
are orthogonal. (Thm. 7.6 only states that eigenvectors
corresponding to distinct eigenvalues are linearly independent)
Pf:
$\lambda_1(\mathbf{x}_1 \cdot \mathbf{x}_2) = (\lambda_1\mathbf{x}_1) \cdot \mathbf{x}_2 = (A\mathbf{x}_1) \cdot \mathbf{x}_2 = (A\mathbf{x}_1)^T\mathbf{x}_2 = (\mathbf{x}_1^TA^T)\mathbf{x}_2$
$= (\mathbf{x}_1^TA)\mathbf{x}_2$ (because A is symmetric) $= \mathbf{x}_1^T(A\mathbf{x}_2) = \mathbf{x}_1^T(\lambda_2\mathbf{x}_2) = \mathbf{x}_1 \cdot (\lambda_2\mathbf{x}_2) = \lambda_2(\mathbf{x}_1 \cdot \mathbf{x}_2)$
The above equation implies $(\lambda_1 - \lambda_2)(\mathbf{x}_1 \cdot \mathbf{x}_2) = 0$, and because
$\lambda_1 \neq \lambda_2$, it follows that $\mathbf{x}_1 \cdot \mathbf{x}_2 = 0$. So, x1 and x2 are orthogonal
※ For distinct eigenvalues of a symmetric matrix, the corresponding
eigenvectors are orthogonal and thus linearly independent
※ Note that there may be multiple eigenvectors x1 and x2 corresponding to λ1 and λ2
7.53
 Orthogonal diagonalization:
A matrix A is orthogonally diagonalizable if there exists an
orthogonal matrix P such that P^{-1}AP = D is diagonal
 Thm. 7.10: Fundamental theorem of symmetric matrices
Let A be an n×n matrix. Then A is orthogonally diagonalizable
and has real eigenvalues if and only if A is symmetric
Pf:
(⟹)
A is orthogonally diagonalizable
⟹ $D = P^{-1}AP$ is diagonal, and P is an orthogonal matrix s.t. $P^{-1} = P^T$
⟹ $A = PDP^{-1} = PDP^T \Rightarrow A^T = (PDP^T)^T = (P^T)^TD^TP^T = PDP^T = A$
(⟸)
See the next two slides
7.54
 Orthogonal diagonalization of a symmetric matrix:
Let A be an n×n symmetric matrix.
(1) Find all eigenvalues of A and determine the multiplicity of each
※ According to Thm. 7.9, eigenvectors corresponding to distinct eigenvalues are
orthogonal
(2) For each eigenvalue of multiplicity 1, choose the unit eigenvector
(3) For each eigenvalue of multiplicity k ≥ 2, find a set of k
linearly independent eigenvectors. If this set {v1, v2, …, vk} is not
orthonormal, apply the Gram-Schmidt orthonormalization process
It is known that the G.-S. process is a kind of linear transformation, i.e., each
produced vector can be expressed as $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$ (see Slide 5.55)
i. Since $A\mathbf{v}_1 = \lambda\mathbf{v}_1, A\mathbf{v}_2 = \lambda\mathbf{v}_2, \ldots, A\mathbf{v}_k = \lambda\mathbf{v}_k$,
$A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k) = \lambda(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k)$
⟹ the vectors produced by the G.-S. process are still eigenvectors for λ
ii. Since v1, v2, …, vk are orthogonal to the eigenvectors corresponding to other
(different) eigenvalues (according to Thm. 7.9), $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$ is also
orthogonal to the eigenvectors corresponding to other eigenvalues.
7.55
(4) The composite of steps (2) and (3) produces an orthonormal set of
n eigenvectors. Use these orthonormal and thus linearly
independent eigenvectors as column vectors to form the matrix P.
i. According to Thm. 7.8, the matrix P is orthogonal
ii. Following the diagonalization process on Slide 7.35, D = P–1AP
is diagonal
Therefore, the matrix A is orthogonally diagonalizable

7.56
 Ex 7: Determining whether a matrix is orthogonally diagonalizable
(By Thm. 7.10, the symmetric matrices below are orthogonally diagonalizable, while the nonsymmetric one is not)
$A_1 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}$: symmetric ⟹ orthogonally diagonalizable
$A_2 = \begin{bmatrix} 5 & 2 & 1 \\ 2 & 1 & 8 \\ 1 & 8 & 0 \end{bmatrix}$: symmetric ⟹ orthogonally diagonalizable
$A_3 = \begin{bmatrix} 3 & 2 & 0 \\ 2 & 0 & 1 \end{bmatrix}$: not square, hence not symmetric ⟹ not orthogonally diagonalizable
$A_4 = \begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}$: symmetric ⟹ orthogonally diagonalizable
7.57
 Ex 9: Orthogonal diagonalization
Find an orthogonal matrix P that diagonalizes A.
$A = \begin{bmatrix} 2 & 2 & -2 \\ 2 & -1 & 4 \\ -2 & 4 & -1 \end{bmatrix}$
Sol:
(1) $|\lambda I - A| = (\lambda - 3)^2(\lambda + 6) = 0$
⟹ λ1 = −6, λ2 = 3 (with multiplicity 2)
(2) λ1 = −6: $\mathbf{v}_1 = (1, -2, 2) \;\Rightarrow\; \mathbf{u}_1 = \frac{\mathbf{v}_1}{\lVert\mathbf{v}_1\rVert} = \left(\tfrac{1}{3}, -\tfrac{2}{3}, \tfrac{2}{3}\right)$
(3) λ2 = 3: $\mathbf{v}_2 = (2, 1, 0)$, $\mathbf{v}_3 = (-2, 4, 5)$
※ Verify Thm. 7.9: v1·v2 = v1·v3 = 0, i.e., the eigenvectors for distinct eigenvalues are orthogonal
7.58
※ If v2 and v3 were not orthogonal, the Gram-Schmidt process should be
performed. Here v2·v3 = 0, so we simply normalize v2 and v3 to find the
corresponding unit vectors:
$\mathbf{u}_2 = \frac{\mathbf{v}_2}{\lVert\mathbf{v}_2\rVert} = \left(\tfrac{2}{\sqrt{5}}, \tfrac{1}{\sqrt{5}}, 0\right), \quad \mathbf{u}_3 = \frac{\mathbf{v}_3}{\lVert\mathbf{v}_3\rVert} = \left(-\tfrac{2}{3\sqrt{5}}, \tfrac{4}{3\sqrt{5}}, \tfrac{5}{3\sqrt{5}}\right)$
$P = [\mathbf{u}_1 \; \mathbf{u}_2 \; \mathbf{u}_3] = \begin{bmatrix} \frac{1}{3} & \frac{2}{\sqrt{5}} & -\frac{2}{3\sqrt{5}} \\ -\frac{2}{3} & \frac{1}{\sqrt{5}} & \frac{4}{3\sqrt{5}} \\ \frac{2}{3} & 0 & \frac{5}{3\sqrt{5}} \end{bmatrix} \;\Rightarrow\; P^{-1}AP = \begin{bmatrix} -6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
※ Note that there are some calculation errors in the solution of Ex. 9 in the
textbook
7.59
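※ For Ex 9, the same orthogonal diagonalization can be checked with np.linalg.eigh (illustrative sketch; eigh orders eigenvalues in ascending order, so −6 comes first here as well):

```python
import numpy as np

A = np.array([[ 2.0, 2.0, -2.0],
              [ 2.0, -1.0, 4.0],
              [-2.0, 4.0, -1.0]])

eigvals, P = np.linalg.eigh(A)
print(eigvals)                            # [-6., 3., 3.]
print(np.allclose(P.T @ P, np.eye(3)))    # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))          # diag(-6, 3, 3)
```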
Keywords in Section 7.3:
 symmetric matrix:
 orthogonal matrix:
 orthonormal set:
 orthogonal diagonalization:

7.60
7.4 Applications of Eigenvalues and Eigenvectors
 The rotation of quadratic equations: $ax^2 + bxy + cy^2 + dx + ey + f = 0$
 Ex 5: Identify the graphs of the following quadratic equations
(a) $4x^2 + 9y^2 - 36 = 0$  (b) $13x^2 - 10xy + 13y^2 - 72 = 0$
Sol:
(a) In standard form, we can obtain $\dfrac{x^2}{3^2} + \dfrac{y^2}{2^2} = 1$.
※ Since there is no xy-term, it is easy to derive the standard form, and it is
apparent that this equation represents an ellipse.

7.61
(b) $13x^2 - 10xy + 13y^2 - 72 = 0$
※ Since there is an xy-term, it is difficult to identify the graph of this equation.
In fact, it is also an ellipse, which is oblique on the xy-plane.
※ There is an easy way to identify the graph of a
quadratic equation. The basic idea is to rotate the
x- and y-axes to the x'- and y'-axes such that there is
no x'y'-term in the new quadratic equation.
※ In the above example, if we rotate the x- and y-
axes by 45 degrees counterclockwise, the new
quadratic equation $\dfrac{(x')^2}{3^2} + \dfrac{(y')^2}{2^2} = 1$ can be
derived, which apparently represents an ellipse.
※ In Section 4.8, the rotation of conics is achieved by changing bases, but
here the diagonalization technique based on eigenvalues and
eigenvectors is applied to solve the rotation problem
7.62
 Quadratic form:
$ax^2 + bxy + cy^2$
is the quadratic form associated with the quadratic equation
$ax^2 + bxy + cy^2 + dx + ey + f = 0$
 Matrix of the quadratic form:
$A = \begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix}$ ※ Note that A is a symmetric matrix
If we define $X = \begin{bmatrix} x \\ y \end{bmatrix}$, then $X^TAX = ax^2 + bxy + cy^2$. In fact, the
quadratic equation can be expressed in terms of X as follows:
$X^TAX + \begin{bmatrix} d & e \end{bmatrix}X + f = 0$
7.63
 Principal Axes Theorem
For a conic whose equation is $ax^2 + bxy + cy^2 + dx + ey + f = 0$,
the rotation to eliminate the xy-term is achieved by X = PX',
where P is an orthogonal matrix that diagonalizes A. That is,
$P^{-1}AP = P^TAP = D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix},$
where λ1 and λ2 are eigenvalues of A. The equation of the
rotated conic is given by
$\lambda_1(x')^2 + \lambda_2(y')^2 + \begin{bmatrix} d & e \end{bmatrix}PX' + f = 0$

7.64
Pf:
According to Thm. 7.10, since A is symmetric, we can
conclude that there exists an orthogonal matrix P such that
$P^{-1}AP = P^TAP = D$ is diagonal.
Replacing X with PX', the quadratic form becomes
$X^TAX = (PX')^TA(PX') = (X')^TP^TAPX' = (X')^TDX' = \lambda_1(x')^2 + \lambda_2(y')^2$
※ It is obvious that the new quadratic form in terms of X' has no x'y'-term, and
the coefficients of (x')^2 and (y')^2 are the two eigenvalues of the matrix A
※ $X = PX' \Rightarrow \begin{bmatrix} x \\ y \end{bmatrix} = [\mathbf{v}_1 \; \mathbf{v}_2]\begin{bmatrix} x' \\ y' \end{bmatrix} = x'\mathbf{v}_1 + y'\mathbf{v}_2$. Since $\begin{bmatrix} x \\ y \end{bmatrix}$ and $\begin{bmatrix} x' \\ y' \end{bmatrix}$ are
the original and new coordinates, the roles of v1 and v2 (the eigenvectors
of A) are like the basis vectors (or the axis vectors) in the new coordinate
system
7.65
 Ex 6: Rotation of a conic
Perform a rotation of axes to eliminate the xy-term in the
following quadratic equation:
$13x^2 - 10xy + 13y^2 - 72 = 0$
Sol:
The matrix of the quadratic form associated with this equation is
$A = \begin{bmatrix} 13 & -5 \\ -5 & 13 \end{bmatrix}$
The eigenvalues are λ1 = 8 and λ2 = 18, and the corresponding
eigenvectors are
$\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$
7.66
After normalizing each eigenvector, we can obtain the
orthogonal matrix P as follows:
$P = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ \\ \sin 45^\circ & \cos 45^\circ \end{bmatrix}$
※ According to the results on p. 268 in Ch 4, X = PX' is
equivalent to rotating the xy-coordinates by 45 degrees to
form the new x'y'-coordinates, which is also illustrated in the
figure on Slide 7.62
Then, replacing X with PX', the equation of the rotated conic is
$8(x')^2 + 18(y')^2 - 72 = 0,$
which can be written in the standard form
$\frac{(x')^2}{3^2} + \frac{(y')^2}{2^2} = 1$
※ The above equation represents an ellipse on the x'y'-plane
7.67
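※ The eigen-decomposition of Ex 6 can be reproduced numerically; the sketch below (an added illustration) diagonalizes the quadratic-form matrix and prints the coefficients of the rotated conic 8(x')² + 18(y')² − 72 = 0:

```python
import numpy as np

# 13x^2 - 10xy + 13y^2 - 72 = 0  ->  quadratic-form matrix A
A = np.array([[13.0, -5.0],
              [-5.0, 13.0]])
d, e, f = 0.0, 0.0, -72.0

lam, P = np.linalg.eigh(A)          # columns of P: orthonormal eigenvectors
print(lam)                          # [ 8. 18.]

# Rotated conic: lam1*(x')^2 + lam2*(y')^2 + [d e] P X' + f = 0
print(np.array([d, e]) @ P)         # [0. 0.]: no linear terms in this example
print(f)                            # -72.0
```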
 In the three-dimensional version:
$ax^2 + by^2 + cz^2 + dxy + exz + fyz$
is the quadratic form associated with the equation of the quadric
surface: $ax^2 + by^2 + cz^2 + dxy + exz + fyz + gx + hy + iz + j = 0$
 Matrix of the quadratic form:
$A = \begin{bmatrix} a & d/2 & e/2 \\ d/2 & b & f/2 \\ e/2 & f/2 & c \end{bmatrix}$ ※ Note that A is a symmetric matrix
If we define X = [x y z]^T, then $X^TAX = ax^2 + by^2 + cz^2 + dxy + exz + fyz$, and the quadric surface equation can be expressed as
$X^TAX + \begin{bmatrix} g & h & i \end{bmatrix}X + j = 0$
7.68
Keywords in Section 7.4:
 quadratic form
 principal axes theorem

7.69
7.5 Principal Component Analysis
 Principal component analysis
 It is a way of identifying the underlying patterns in data
 It can extract information in a large data set with many
variables and approximate this data set with fewer factors
 In other words, it can reduce the number of variables to a
more manageable set
 Steps of the principal component analysis
 Step 1: Get some data
 Step 2: Subtract the mean
 Step 3: Calculate the covariance matrix
 Step 4: Calculate the eigenvectors and eigenvalues of the
covariance matrix
 Step 5: Deriving the transformed data set
 Step 6: Getting the original data back
(a small numerical sketch of Steps 2–5 is given below; the data set is tabulated on the following slides)
7.70
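※ A compact NumPy sketch (added for illustration) of Steps 2–5 using the data set of the following slides; the eigenvector signs returned by eigh may differ from the slides, which only flips the sign of the corresponding principal component:

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Step 2: subtract the mean of each variable
demeaned = data - data.mean(axis=0)

# Step 3: covariance matrix (divided by n-1, as in the slides)
A = np.cov(demeaned, rowvar=False)
print(A)                         # [[0.616556 0.615444] [0.615444 0.716556]]

# Step 4: eigenvalues and eigenvectors of the covariance matrix
eigvals, P = np.linalg.eigh(A)   # ascending order: lambda2, lambda1
print(eigvals)                   # [0.049083 1.284028]

# Step 5: transformed data (principal components) X' = X P
transformed = demeaned @ P
print(np.round(np.cov(transformed, rowvar=False), 6))  # diag(0.049083, 1.284028)
```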
Step 1: Get the data          Step 2: Subtract the mean
    x     y                       x      y
   2.5   2.4                     0.69   0.49
   0.5   0.7                    -1.31  -1.21
   2.2   2.9                     0.39   0.99
   1.9   2.2                     0.09   0.29
   3.1   3.0                     1.29   1.09
   2.3   2.7                     0.49   0.79
   2.0   1.6                     0.19  -0.31
   1.0   1.1                    -0.81  -0.81
   1.5   1.6                    -0.31  -0.31
   1.1   0.9                    -0.71  -1.01
 mean: 1.81  1.91             mean: 0      0
(The demeaned data matrix satisfies $X^T = [\mathbf{x} \; \mathbf{y}]$, where x and y are the demeaned series)

Step 3: Calculate the covariance matrix
$\mathrm{var}(X) = E[XX^T] = E\!\left[\begin{bmatrix} \mathbf{x}^T \\ \mathbf{y}^T \end{bmatrix}[\mathbf{x} \; \mathbf{y}]\right] = \begin{bmatrix} \mathrm{var}(x) & \mathrm{cov}(x,y) \\ \mathrm{cov}(x,y) & \mathrm{var}(y) \end{bmatrix} = \begin{bmatrix} 0.616556 & 0.615444 \\ 0.615444 & 0.716556 \end{bmatrix} = A$
7.71
 Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix A
$\lambda_1 = 1.284028, \; \mathbf{v}_1 = \begin{bmatrix} -0.67787 \\ -0.73518 \end{bmatrix}; \qquad \lambda_2 = 0.049083, \; \mathbf{v}_2 = \begin{bmatrix} -0.73518 \\ 0.67787 \end{bmatrix}$
[Figure: scatter plot of the demeaned data with the v1 and v2 directions drawn through the origin]
1. The two eigenvectors are perpendicular (orthogonal) to each other according to Thm. 7.9 (in fact, they are orthonormal here)
2. The eigenvector v1 (corresponding to the largest eigenvalue λ1) is just like a best-fit regression line
3. v2 seems less important for explaining the data since the projection of each point onto the v2 axis is very close to zero
4. The interpretation of v1 is the new axis which retains as much as possible of the interpoint distance information (or the variance information) that was contained in the original two dimensions
7.72
※ The intuition behind the principal component analysis:
(1) Total variance of the series x and y = variance of x + variance of y
= sum of the main diagonal entries in the covariance matrix A
= 0.616556 + 0.716556 = 1.33311 (The series x, i.e., the coordinate values
on the x-axis, explains 0.616556/1.33311 = 46.25% of the total variance)
(2) Consider P = [v1 v2] and X = PX'. (According to the Principal Axes Theorem
on Slide 7.65, this is equivalent to transforming the x- and y-axes into the v1- and v2-axes,
i.e., the data are the same but with different coordinate values X' on the v1v2-plane.)
⟹ $X' = P^{-1}X = P^TX$, i.e., $(X')^T = X^TP$, where $(X')^T = [\mathbf{x}' \; \mathbf{y}']$ and $X^T = [\mathbf{x} \; \mathbf{y}]$
⟹ $\mathrm{var}((X')^T) = \mathrm{var}(X^TP) = P^T\mathrm{var}(X^T)P = P^TAP = D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$
(It also implies that the new series x' and y' are uncorrelated)
(3) Total variance of the transformed series x' and y' (called principal components) in X'
= variance of x' + variance of y'
= sum of the main diagonal entries in the covariance matrix var((X')^T) = λ1 + λ2
(4) A property of eigenvalues: Trace(A) = Σλi, which means that after the transformation,
the total variance remains the same. In this case, λ1 + λ2 = 1.284028 + 0.049083 = 1.33311.
(5) The new series x', i.e., the coordinate values on the v1-axis, explains λ1/(λ1 + λ2)
= 1.284028/(1.284028 + 0.049083) = 96.32% of the total variance
7.73
 Step 5: Deriving the transformed data set: $(X')^T = X^TP$
Case 1: $P = [\mathbf{v}_1 \; \mathbf{v}_2] = \begin{bmatrix} v_{11} = -0.67787 & v_{21} = -0.73518 \\ v_{12} = -0.73518 & v_{22} = 0.67787 \end{bmatrix}$
$(X')^T = [\mathbf{x}' \; \mathbf{y}'] = [\mathbf{x} \; \mathbf{y}]\begin{bmatrix} -0.67787 & -0.73518 \\ -0.73518 & 0.67787 \end{bmatrix} = [v_{11}\mathbf{x} + v_{12}\mathbf{y} \;\;\; v_{21}\mathbf{x} + v_{22}\mathbf{y}] = [-0.67787\mathbf{x} - 0.73518\mathbf{y} \;\;\; -0.73518\mathbf{x} + 0.67787\mathbf{y}]$
Case 2: Set y' = 0 on purpose
$(X')^T = [\mathbf{x}' \;\; \mathbf{0}] = [-0.67787\mathbf{x} - 0.73518\mathbf{y} \;\;\; \mathbf{0}]$

   Case 1:                     Case 2:
    x'        y'                 x'       y'
  -0.82797  -0.17512           -0.82797   0
   1.77758   0.14286            1.77758   0
  -0.99220   0.38437           -0.99220   0
  -0.27421   0.13042           -0.27421   0
  -1.67580  -0.20950           -1.67580   0
  -0.91295   0.17528           -0.91295   0
   0.09911  -0.34982            0.09911   0
   1.14457   0.04642            1.14457   0
   0.43805   0.01776            0.43805   0
   1.22382  -0.16268            1.22382   0
  mean: 0    0                  mean: 0   0

$\mathrm{var}((X')^T) = \begin{bmatrix} 1.284028 & 0 \\ 0 & 0.049083 \end{bmatrix}$ (Case 1), $\qquad \mathrm{var}((X')^T) = \begin{bmatrix} 1.284028 & 0 \\ 0 & 0 \end{bmatrix}$ (Case 2)
7.74
 Step 6: Getting the original data back:
$X^T = (X')^TP^{-1} \; (= (X')^TP^T)$ + original mean, where P = [v1 v2]
$[\mathbf{x} \; \mathbf{y}] = [\mathbf{x}' \; \mathbf{y}']\begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = [v_{11}\mathbf{x}' + v_{21}\mathbf{y}' \;\;\; v_{12}\mathbf{x}' + v_{22}\mathbf{y}'] = [-0.67787\mathbf{x}' - 0.73518\mathbf{y}' \;\;\; -0.73518\mathbf{x}' + 0.67787\mathbf{y}']$

Recovered data (after adding back the means 1.81 and 1.91):
   Case 1 (use x' and y'):     Case 2 (use x' only):
    x      y                      x      y
   2.5    2.4                    2.37   2.52
   0.5    0.7                    0.61   0.60
   2.2    2.9                    2.48   2.64
   1.9    2.2                    2.00   2.11
   3.1    3.0                    2.95   3.14
   2.3    2.7                    2.43   2.58
   2.0    1.6                    1.74   1.84
   1.0    1.1                    1.03   1.07
   1.5    1.6                    1.51   1.59
   1.1    0.9                    0.98   1.01
  mean: 1.81  1.91             mean: 1.81  1.91

※ We can recover the original data set exactly if we take both v1 and v2 (and thus both x'
and y') into account when deriving the transformed data
※ Although only v1 (and thus only x') is considered when deriving the transformed data in
Case 2, the recovered data are still similar to the original data. That means x' can serve as a
common factor almost able to explain both series x and y
7.75
[Figure: demeaned data points projected onto the v1 direction, with v2 orthogonal to v1]
※ If only the principal component x' is considered in the Principal Component Analysis
(PCA), it is equivalent to projecting all points onto the v1 vector
※ It can be observed in the above figure that the projection onto the v1 vector retains as much
as possible of the interpoint distance information (variance) that was contained in the original
series of (x, y)

7.76
 Factor loadings: the correlations between the principal
components (F1 = x' and F2 = y') and the original variables (x1 = x and x2 = y)
$l_{ij} = \rho_{F_i, x_j} = \frac{v_{ij}\sqrt{\lambda_i}}{\mathrm{s.d.}_j}$
$l_{11} = \rho_{x'x} = \frac{-0.67787\sqrt{1.284028}}{0.785211} = -0.97824, \qquad l_{21} = \rho_{y'x} = \frac{-0.73518\sqrt{0.049083}}{0.785211} = -0.20744$
$l_{12} = \rho_{x'y} = \frac{-0.73518\sqrt{1.284028}}{0.846496} = -0.98414, \qquad l_{22} = \rho_{y'y} = \frac{0.67787\sqrt{0.049083}}{0.846496} = 0.177409$

 Factor loadings are used to identify and interpret the
unobservable principal components

※ The factor loadings of x' on x and y are close to 1 in absolute value,
which implies that the principal component x' can explain x and y quite well
and thus x' can be viewed as an index representing the combined information
of x and y
7.77
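※ The loadings can be computed directly from the eigen-decomposition of the covariance matrix (illustrative sketch; the signs follow whichever eigenvector orientation the routine returns):

```python
import numpy as np

A = np.array([[0.616556, 0.615444],
              [0.615444, 0.716556]])     # covariance matrix from Step 3

eigvals, P = np.linalg.eigh(A)           # ascending order: lambda2, lambda1
sd = np.sqrt(np.diag(A))                 # standard deviations of x and y

# l_ij = v_ij * sqrt(lambda_i) / s.d._j  (loading of component i on variable j)
loadings = (P * np.sqrt(eigvals)).T / sd
print(np.round(loadings, 5))
# row for x' (largest eigenvalue): magnitudes ~0.978 and ~0.984
# row for y' (smallest eigenvalue): magnitudes ~0.207 and ~0.177
```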
※ Questionnaire (問卷) analysis: salary vs. personality and learned-course information
i. Suppose there are five variables (the results of five problems), x1 to x5; x1 to x3 are
about the personality of the respondent (作答者), and x4 and x5 are
about the courses that this respondent has taken
ii. Suppose there are two more important principal components, x1' and x2'
(x3', x4', and x5' are less important principal components); the principal
component x1' is highly correlated with x1 to x3, and the principal component
x2' is highly correlated with x4 and x5
iii. x1' can be interpreted as a general index representing the personal
characteristics, and x2' can be interpreted as a general index associated with
the course profile of the respondent
iv. Study the relationship between the salary level and the indexes x1' and x2'
7.78
