
Principal Component Analysis

Philosophy of PCA
Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables.
We typically have a data matrix of n observations on p correlated variables x1, x2, ..., xp.
PCA looks for a transformation of the xi into p new variables yi that are uncorrelated.

PCA

The data matrix

case   ht (x1)   wt (x2)   age (x3)   sbp (x4)   heart rate (x5)
1      175       1225      25         117        56
2      156       1050      31         122        63
3      202       1350      58         154        67
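As a minimal sketch, the same toy data matrix can be entered in R (the language the slides themselves point to later, for ICA); the short column names are just shorthand for the headers above:

# The 3 x 5 data matrix from the slide, entered in R
X <- matrix(c(175, 1225, 25, 117, 56,
              156, 1050, 31, 122, 63,
              202, 1350, 58, 154, 67),
            nrow = 3, byrow = TRUE)
colnames(X) <- c("ht", "wt", "age", "sbp", "heart.rate")
cov(X)   # the covariance matrix C used throughout these slides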

Reduce dimension
The simplest way is to keep one variable and discard all others: not reasonable!
Weight all variables equally: not reasonable (unless they have the same variance).
Weighted average based on some criterion.
Which criterion?

Let us write it first
Looking for a transformation of the data matrix X (n×p) such that
Y = δᵀX = δ1 X1 + δ2 X2 + ... + δp Xp
where δ = (δ1, δ2, ..., δp)ᵀ is a column vector of weights with
δ1² + δ2² + ... + δp² = 1

One good criterion
Maximize the variance of the projection of the observations on the Y variables.
Find δ so that
Var(δᵀX) = δᵀ Var(X) δ
is maximal.
The matrix C = Var(X) is the covariance matrix of the Xi variables.
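A quick numerical check of this identity, as a sketch on hypothetical toy data (the weights d are arbitrary, chosen to have unit norm):

# Var(d'X) equals d' C d for any weight vector d
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)   # 100 toy observations on 2 variables
d <- c(0.6, 0.8)                    # unit-norm weights: 0.6^2 + 0.8^2 = 1
var(X %*% d)                        # variance of the projection
t(d) %*% cov(X) %*% d               # the same value, written as d' C d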

Let us see it on a figure
[Figure: the observations projected on two candidate directions, labelled "Good" and "Better"]

Covariance matrix
      | v(x1)      c(x1,x2)  ...  c(x1,xp) |
C  =  | c(x1,x2)   v(x2)     ...  c(x2,xp) |
      | ...                                |
      | c(x1,xp)   c(x2,xp)  ...  v(xp)    |

And so... we find that
The direction of δ is given by the eigenvector a1 corresponding to the largest eigenvalue λ1 of matrix C.
The second vector, orthogonal (uncorrelated) to the first, is the one with the second highest variance, which comes to be the eigenvector corresponding to the second eigenvalue.
And so on.

So PCA gives
New variables Yi that are linear combinations of the original variables (xi):
Yi = ai1 x1 + ai2 x2 + ... + aip xp ;  i = 1..p
The new variables Yi are derived in decreasing order of importance; they are called principal components.

Calculating eigenvalues and eigenvectors
The eigenvalues λi are found by solving the equation
det(C − λI) = 0
Eigenvectors are the columns of the matrix A such that
C = A D Aᵀ
where
      | λ1   0    ...   0  |
D  =  | 0    λ2   ...   0  |
      | 0    0    ...   λp |
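In R this is a one-liner with eigen(); a minimal sketch on hypothetical toy data:

# eigen() returns eigenvalues in decreasing order and eigenvectors as columns
set.seed(1)
X <- matrix(rnorm(500), ncol = 5)
C <- cov(X)
e <- eigen(C)
e$values    # lambda_1 >= lambda_2 >= ... >= lambda_5
e$vectors   # the matrix A
all.equal(e$vectors %*% diag(e$values) %*% t(e$vectors), C)   # C = A D A^T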

An example
Let us take two variables with covariance c > 0

C  =  | 1  c |
      | c  1 |

C − λI  =  | 1−λ   c  |
           | c    1−λ |

det(C − λI) = (1−λ)² − c²
Solving this we find λ1 = 1 + c
and λ2 = 1 − c < λ1

and eigenvectors
Any eigenvector A = (a1, a2)ᵀ satisfies the condition CA = λA:

CA  =  | 1  c | | a1 |  =  | a1 + c a2 |  =  λ | a1 |
       | c  1 | | a2 |     | c a1 + a2 |       | a2 |

Solving, we find A1 = (1/√2)(1, 1)ᵀ and A2 = (1/√2)(1, −1)ᵀ
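A minimal check of this worked example in R, assuming the arbitrary value c = 0.5:

c0 <- 0.5                               # any covariance 0 < c < 1
C  <- matrix(c(1, c0, c0, 1), nrow = 2)
eigen(C)$values    # 1.5 and 0.5, i.e. 1 + c and 1 - c
eigen(C)$vectors   # columns proportional to (1, 1) and (1, -1), up to sign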

PCA is sensitive to scale
If you multiply one variable by a scalar you get different results (can you show it? see the sketch below).
This is because PCA uses the covariance matrix (and not the correlation matrix).
PCA should be applied on data that have approximately the same scale in each variable.
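One way to show it, as a sketch on hypothetical toy data: rescale a single column and compare the loadings.

set.seed(1)
X  <- matrix(rnorm(300), ncol = 3)
X2 <- X
X2[, 1] <- 100 * X2[, 1]        # same variable, different measurement unit
prcomp(X)$rotation              # loadings on the original scale
prcomp(X2)$rotation             # different result: PC1 is dominated by column 1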

Interpretation of PCA
The new variables (PCs) have a variance equal to their corresponding eigenvalue:
Var(Yi) = λi for all i = 1..p
Small λi means small variance: the data change little in the direction of component Yi.
The relative variance explained by each PC is given by λi / Σλj.
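A sketch of both facts with prcomp() on hypothetical toy data (prcomp reports the PC standard deviations, so their squares are the eigenvalues):

set.seed(1)
X  <- matrix(rnorm(300), ncol = 3)
pc <- prcomp(X)
pc$sdev^2                                    # Var(Y_i) = lambda_i
all.equal(pc$sdev^2, eigen(cov(X))$values)   # same as the eigenvalues of C
pc$sdev^2 / sum(pc$sdev^2)                   # relative variance explained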

How many components to keep?
Enough PCs to have a cumulative variance explained by the PCs that is >50-70%.
Kaiser criterion: keep PCs with eigenvalues >1.
Scree plot: represents the ability of the PCs to explain the variation in the data; do it graphically (see the sketch below).
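A sketch of the three rules on hypothetical toy data (the Kaiser rule is usually applied to standardized PCA, where the eigenvalues average 1):

set.seed(1)
lambda <- prcomp(matrix(rnorm(500), ncol = 5), scale. = TRUE)$sdev^2
plot(lambda, type = "b", xlab = "component", ylab = "eigenvalue",
     main = "Scree plot")            # look for the 'elbow'
cumsum(lambda) / sum(lambda)         # keep enough PCs to pass ~0.5-0.7
which(lambda > 1)                    # Kaiser criterion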

Interpretation of components
See the weights of the variables in each component.
If Y1 = 0.89 X1 + 0.15 X2 − 0.77 X3 + 0.51 X4
then X1 and X3 have the highest weights and so are the most important variables in the first PC.
See also the correlation between the variables Xi and the PCs: the circle of correlation.

Circle of correlation
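The quantities plotted in that circle are just the variable-to-PC correlations; a minimal sketch on hypothetical toy data:

set.seed(1)
X  <- matrix(rnorm(500), ncol = 5)
pc <- prcomp(X, scale. = TRUE)
cor(X, pc$x)   # rows: variables Xi; columns: PCs; each row gives a point inside the circle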

Normalized (standardized) PCA
If variables have very heterogeneous variances we standardize them.
The standardized variables are
Xi* = (Xi − mean) / standard deviation
The new variables all have the same variance (1), so each variable has the same weight.
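In R, a sketch of the two equivalent routes, via scale() or via prcomp's own option:

set.seed(1)
X  <- matrix(rnorm(300), ncol = 3)
Xs <- scale(X)                  # (Xi - mean) / standard deviation
apply(Xs, 2, var)               # every column now has variance 1
prcomp(X, scale. = TRUE)        # standardized PCA in one call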

Application of PCA in Genomics
PCA is useful for finding new, more informative, uncorrelated features; it reduces dimensionality by rejecting low-variance features.
Analysis of expression data
Analysis of metabolomics data (Ward et al., 2003)

However
PCA is only powerful if the biological question is related to the highest variance in the dataset.
If not, other techniques are more useful: Independent Component Analysis (ICA).
Introduced by Jutten in 1987.

What is ICA?

It looks like this

The idea behind ICA

How does it work?

Rationale of ICA
Find the components Si that are as independent as possible, in the sense of maximizing some function F(s1, s2, ..., sk) that measures independence.
All ICs (except possibly one) should be non-Normal.
The variance of all ICs is 1.
There is no hierarchy between the ICs.

How to find ICs?
Many choices of objective function F
Mutual information:
MI = ∫ f(s1, s2, ..., sk) log [ f(s1, s2, ..., sk) / ( f1(s1) f2(s2) ... fk(sk) ) ] ds1...dsk
We use the kurtosis of the variables to approximate the distribution function.
The number of ICs is chosen by the user.
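Kurtosis here is the classical fourth-moment measure of non-Normality; a minimal sketch (the plain moment formula, not any particular package's estimator):

kurt <- function(s) mean((s - mean(s))^4) / var(s)^2 - 3   # excess kurtosis
set.seed(1)
kurt(rnorm(1e5))   # close to 0 for a Normal variable
kurt(runif(1e5))   # negative: sub-Gaussian, hence 'interesting' for ICA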

Difference with PCA
It is not a dimensionality-reduction technique.
There is no single (exact) solution for the components; different algorithms are used (in R: FastICA, PearsonICA, MLICA).
ICs are of course uncorrelated, but also as independent as possible.
Uninteresting for Normally distributed variables.
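A minimal sketch with the first of those packages (on CRAN as fastICA; it must be installed first), on hypothetical mixed signals:

library(fastICA)
set.seed(1)
S    <- cbind(runif(500), runif(500)^3)        # two non-Normal sources
Xmix <- S %*% matrix(c(1, 2, 2, 1), nrow = 2)  # observed linear mixtures
ica  <- fastICA(Xmix, n.comp = 2)              # n.comp: number of ICs, chosen by the user
head(ica$S)                                    # estimated independent components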

Example: Lee and Batzoglou (2003)
Microarray expression data on 7070 genes in 59 normal human tissue samples (19 types).
We are not interested in reducing dimension but rather in looking for genes that show a tissue-specific expression profile (what makes tissue types different).

PCA vs ICA
Hsiao et al. (2002) applied PCA and, by visual inspection, observed three gene clusters of 425 genes: liver-specific, brain-specific and muscle-specific.
ICA identified more tissue-specific genes than PCA.
