
Principal Component Analysis

by V V HaraGopal, Dept. of Statistics, Osmania University, Hyderabad

Philosophy of PCA
Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables.
We typically have a data matrix of n observations on p correlated variables x1, x2, ..., xp.
PCA looks for a transformation of the xi into p new variables yi that are uncorrelated.

The data matrix


case    ht (x1)   wt (x2)   age (x3)   sbp (x4)   heart rate (x5)
1       175       1225      25         117        56
2       156       1050      31         122        63
...     ...       ...       ...        ...        ...
n       202       1350      58         154        67

Reduce dimension
The simplest way is to keep one variable and discard all others: not reasonable!
Weight all variables equally: not reasonable (unless they all have the same variance).
Weighted average based on some criterion. Which criterion?

Let us write it first


Looking for a transformation of the data matrix X (n x p) such that

Y = a^T X = a1 X1 + a2 X2 + ... + ap Xp

where a = (a1, a2, ..., ap)^T is a column vector of weights with a1^2 + a2^2 + ... + ap^2 = 1.
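
As a minimal sketch in R (the data matrix and the weight vector below are assumptions made for illustration), forming such a linear combination looks like this:

    set.seed(1)
    X <- cbind(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))  # simulated 100 x 3 data matrix (assumed)
    a <- c(1, 1, 1) / sqrt(3)   # weights scaled so that a1^2 + a2^2 + a3^2 = 1
    sum(a^2)                    # check: equals 1
    Y <- drop(X %*% a)          # Y = a'X, one value per observation
    head(Y)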

One good criterion


Maximize the variance of the projection of the observations onto the new variable Y: find a so that

Var(a^T X) = a^T Var(X) a

is maximal. The matrix C = Var(X) is the covariance matrix of the Xi variables.
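
A short sketch in R (simulated data, an assumption) showing that Var(a'X) equals a'Ca, where C = cov(X):

    set.seed(2)
    X <- cbind(x1 = rnorm(200), x2 = rnorm(200, sd = 3))  # x2 is more variable (assumed data)
    C <- cov(X)                # C = Var(X), the covariance matrix
    a <- c(0, 1)               # project onto x2 alone
    drop(t(a) %*% C %*% a)     # Var(a'X) computed as a'Ca
    var(drop(X %*% a))         # the same variance computed directly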

Let us see it on a figure


(Figure: two candidate projection directions, labelled "Good" and "Better" by the variance they capture.)

Covariance matrix
    C = | v(x1)     c(x1,x2)   ...   c(x1,xp) |
        | c(x1,x2)  v(x2)      ...   c(x2,xp) |
        |   ...       ...      ...     ...    |
        | c(x1,xp)  c(x2,xp)   ...   v(xp)    |

And so... we find that

The direction of a is given by the eigenvector a1 corresponding to the largest eigenvalue λ1 of the matrix C.
The second vector, orthogonal (uncorrelated) to the first, is the one with the second-highest variance: the eigenvector corresponding to the second-largest eigenvalue.
And so on.
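
This can be checked numerically in R (simulated data, an assumption): the unit eigenvector of the largest eigenvalue gives a projected variance equal to that eigenvalue.

    set.seed(3)
    X <- cbind(rnorm(300), rnorm(300, sd = 2))  # simulated data (assumed)
    C <- cov(X)
    e <- eigen(C)               # eigenvalues come out in decreasing order
    a1 <- e$vectors[, 1]        # eigenvector of the largest eigenvalue
    drop(t(a1) %*% C %*% a1)    # equals e$values[1]: the maximal projected variance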

So PCA gives
New variables Yi that are linear combinations of the original variables xi:

Yi = ai1 x1 + ai2 x2 + ... + aip xp ,   i = 1..p

The new variables Yi are derived in decreasing order of importance; they are called principal components.

Calculating eigenvalues and eigenvectors

The eigenvalues λi are found by solving the equation det(C - λI) = 0.
The eigenvectors are the columns of the matrix A such that

C = A D A^T ,   where D = diag(λ1, λ2, ..., λp)
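
A minimal check in R that the decomposition reproduces C; the 3 x 3 covariance matrix below is an assumed example.

    C <- matrix(c(4, 2,   1,
                  2, 3,   0.5,
                  1, 0.5, 2), nrow = 3)  # an assumed covariance matrix
    e <- eigen(C)
    A <- e$vectors                       # columns are the eigenvectors
    D <- diag(e$values)                  # D = diag(lambda_1, ..., lambda_p)
    A %*% D %*% t(A)                     # reproduces C (up to rounding error)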

An example

Let us take two variables with covariance c > 0:

    C = | 1  c |
        | c  1 |

    C - λI = | 1-λ    c  |
             |  c    1-λ |

    det(C - λI) = (1 - λ)^2 - c^2

Solving this we find

    λ1 = 1 + c
    λ2 = 1 - c < λ1

and eigenvectors

Any eigenvector a satisfies the condition Ca = λa:

    Ca = | 1  c | | a1 |  =  | a1 + c a2 |  =  λ | a1 |
         | c  1 | | a2 |     | c a1 + a2 |       | a2 |

Solving, we find

    A1 = (1/sqrt(2)) (1,  1)^T   for λ1 = 1 + c
    A2 = (1/sqrt(2)) (1, -1)^T   for λ2 = 1 - c
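
A quick numerical check of this example in R, using an assumed value c = 0.6:

    c_val <- 0.6                                  # assumed value of c
    C <- matrix(c(1, c_val, c_val, 1), nrow = 2)
    eigen(C)$values    # 1 + c and 1 - c, i.e. 1.6 and 0.4
    eigen(C)$vectors   # columns proportional to (1, 1) and (1, -1), up to sign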

PCA is sensitive to scale


If you multiply one variable by a scalar you get different results: PCA works on the covariance matrix, not the correlation matrix.
PCA should therefore be applied to data whose variables have approximately the same scale.
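
A small demonstration in R (simulated data, an assumption) that rescaling one variable changes the covariance-based PCs:

    set.seed(4)
    X  <- cbind(a = rnorm(100), b = rnorm(100))  # simulated data (assumed)
    X2 <- X
    X2[, "b"] <- 1000 * X2[, "b"]                # same variable measured in different units
    prcomp(X)$rotation    # loadings on the original scale
    prcomp(X2)$rotation   # first PC now lies almost entirely along b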

Interpretation of PCA
The new variables (PCs) have a variance equal to their corresponding eigenvalue: Var(Yi) = λi for all i = 1..p.
A small eigenvalue λi means the data change little in the direction of component Yi (small variance).
The relative variance explained by each PC is given by λi / Σ λi.
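
A sketch in R (simulated data, an assumption) verifying Var(Yi) = λi and the relative variance λi / Σ λi:

    set.seed(5)
    X <- matrix(rnorm(100 * 4), ncol = 4)  # simulated data (assumed)
    pca <- prcomp(X)                       # covariance-based PCA
    pca$sdev^2                             # Var(Y_i); identical to eigen(cov(X))$values
    pca$sdev^2 / sum(pca$sdev^2)           # relative variance explained by each PC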

How many components to keep?


Enough PCs to have a cumulative explained variance of more than 50-70%.
Kaiser criterion: keep PCs with eigenvalues > 1.
Scree plot: shows how much of the variation in the data each PC explains (see the sketch below).
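
The three rules of thumb can be applied in R roughly as follows (simulated data, an assumption):

    set.seed(6)
    X <- matrix(rnorm(200 * 6), ncol = 6)  # simulated data (assumed)
    pca <- prcomp(X, scale. = TRUE)
    summary(pca)                           # cumulative proportion of variance explained
    sum(pca$sdev^2 > 1)                    # Kaiser criterion: how many eigenvalues exceed 1
    screeplot(pca, type = "lines")         # scree plot of the eigenvalues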

Do it graphically

Interpretation of components
Look at the weights of the variables in each component.
If Y1 = 0.89 X1 + 0.15 X2 - 0.77 X3 + 0.51 X4, then X1 and X3 have the highest weights (in absolute value) and so are the most important variables in the first PC.
Also look at the correlation between the variables Xi and the PCs: the circle of correlations (see the sketch below).
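
A sketch in R (simulated data, an assumption) of reading the weights and the variable-PC correlations that the circle of correlations displays:

    set.seed(7)
    X <- matrix(rnorm(100 * 4), ncol = 4,
                dimnames = list(NULL, c("X1", "X2", "X3", "X4")))  # simulated data (assumed)
    pca <- prcomp(X, scale. = TRUE)
    pca$rotation[, 1]     # weights of the variables in the first PC
    cor(X, pca$x[, 1:2])  # correlations of each Xi with PC1 and PC2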

Circle of correlation

Normalized (standardized) PCA


If the variables have very heterogeneous variances, we standardize them. The standardized variables are

Xi* = (Xi - mean) / standard deviation

The new variables all have the same variance (1), so each variable has the same weight.
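
A minimal sketch in R (simulated data with very different variances, an assumption); prcomp(..., scale. = TRUE) performs exactly this standardization:

    set.seed(8)
    X <- cbind(x1 = rnorm(100, sd = 1),
               x2 = rnorm(100, sd = 100))  # heterogeneous variances (assumed)
    Xs <- scale(X)                         # (Xi - mean) / standard deviation
    apply(Xs, 2, var)                      # every standardized variable has variance 1
    prcomp(X, scale. = TRUE)$sdev^2        # same eigenvalues as eigen(cor(X))$values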

Application of PCA
PCA is useful for finding new, more informative, uncorrelated features; it reduces dimensionality by rejecting low-variance features.
Analysis of expression data.
Analysis of biological data (Ward et al., 2003).

However
PCA is only powerful if the question of interest is related to the highest variance in the dataset.
If not, other techniques are more useful, e.g. Independent Component Analysis (ICA), introduced by Jutten in 1987.

What is ICA?


The idea behind ICA

How it works?

Rationale of ICA
Find the components Si that are as independent as possible, in the sense of maximizing some function F(s1, s2, ..., sk) that measures independence.
All ICs (except possibly one) should be non-Normal.
The variance of all ICs is 1.
There is no hierarchy between the ICs.

How to find ICs ?


There are many possible choices of the objective function F, for example the mutual information

MI = ∫ f(s1, s2, ..., sk) log [ f(s1, s2, ..., sk) / ( f1(s1) f2(s2) ... fk(sk) ) ] ds1 ... dsk

We use the kurtosis of the variables to approximate the distribution function (see the sketch below).
The number of ICs is chosen by the user.
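
A small sketch of the kurtosis idea, using a hand-rolled excess-kurtosis helper (a hypothetical function written here for illustration, not taken from any particular package):

    # excess kurtosis: about 0 for a Normal variable, clearly non-zero otherwise
    excess_kurtosis <- function(x) {
      m <- mean(x)
      mean((x - m)^4) / mean((x - m)^2)^2 - 3
    }
    set.seed(9)
    excess_kurtosis(rnorm(10000))  # close to 0   (Normal)
    excess_kurtosis(runif(10000))  # about -1.2   (sub-Gaussian)
    excess_kurtosis(rexp(10000))   # about  6     (super-Gaussian)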

Difference with PCA


ICA is not a dimensionality reduction technique.
There is no single (exact) solution for the components; different algorithms are used (in R: fastICA, PearsonICA, MLICA).
ICs are of course uncorrelated, but also as independent as possible.
ICA is uninteresting for Normally distributed variables.
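
A minimal example with the fastICA package on a toy two-source mixing problem (the sources and the mixing matrix below are assumptions made for illustration):

    # install.packages("fastICA") may be needed first
    library(fastICA)
    set.seed(10)
    S <- cbind(runif(500), rexp(500))      # two independent, non-Normal sources (assumed)
    A <- matrix(c(1, 0.5, 0.3, 1), 2, 2)   # an assumed mixing matrix
    X <- S %*% A                           # observed mixtures
    ica <- fastICA(X, n.comp = 2)          # the user chooses the number of ICs
    head(ica$S)                            # estimated independent components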

Example: Lee and Batzoglou (2003)


Microarray expression data on 7070 genes in 59 Normal human tissue samples (19 types).
We are not interested in reducing dimension, but rather in looking for genes that show a tissue-specific expression profile (what makes tissue types different).

PCA vs ICA
Hsiao et al. (2002) applied PCA and, by visual inspection, observed three clusters of 425 genes: liver-specific, brain-specific and muscle-specific.
ICA identified more tissue-specific genes than PCA.

Thank You
