
Software Assignment 2

Shubham Patel CS17M051


March 16, 2018

1 Kernel PCA
Kernel PCA is a method of dimensionality reduction in which the reduction is performed over a transformed, higher-dimensional space. The higher-dimensional space is generated implicitly using kernel methods.
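
As a rough illustration of the idea, here is a minimal sketch assuming scikit-learn is available; the toy data and the kernel settings are placeholders, not the assignment's data.

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Toy data that is not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA: perform PCA in the (implicit) higher-dimensional feature space
# induced by the chosen kernel, then keep the top components.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1.0)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (400, 2)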

1.1 Kernel Methods


Kernel methods are used to propose a higher-dimensional space corresponding to the current space. This can be beneficial because points that are not separable in the current dimension may become separable in some higher-dimensional space.

Below, some popular kernels are applied to our data set.

1.1.1 Linear
The linear kernel is the simplest kernel function. It is given by the inner product <x, y> plus an optional constant c. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts, i.e. KPCA with a linear kernel is the same as standard PCA.

k(x, y) = x^T y + c
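
As a quick sanity check of this equivalence, a sketch on toy data (assuming scikit-learn): the linear-kernel KPCA projection matches standard PCA up to a sign flip per component.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, _ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

pca_proj = PCA(n_components=2).fit_transform(X)
kpca_proj = KernelPCA(n_components=2, kernel="linear").fit_transform(X)

# The two projections agree up to a sign flip of each component
# (and small numerical error).
print(np.allclose(np.abs(pca_proj), np.abs(kpca_proj), atol=1e-6))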

[Figure: KPCA with linear kernel — top 1, 2, and 3 components for gamma = 0.01, 0.1, 1, 10]

1.1.2 Polynomial
The polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited to problems where all the training data is normalized.

k(x, y) = (α x^T y + c)^d

The adjustable parameters are the slope α, the constant term c, and the polynomial degree d.
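
As a quick illustration (assuming NumPy and scikit-learn; the values of α, c, and d below are arbitrary examples), the polynomial kernel matrix can be computed directly and compared with the library routine:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

alpha, c, d = 0.5, 1.0, 3  # slope, constant term, degree (arbitrary example values)

K_manual = (alpha * X @ X.T + c) ** d
K_library = polynomial_kernel(X, gamma=alpha, coef0=c, degree=d)
print(np.allclose(K_manual, K_library))  # True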

[Figure: KPCA with polynomial kernel — top 1, 2, and 3 components for degree = 1, 2, 5, 10]

1.1.3 RBF
The Gaussian kernel is an example of a radial basis function kernel.

k(x, y) = exp(−‖x − y‖^2 / (2σ^2))

It can also be written as below, simply by substituting γ = 1/(2σ^2):

k(x, y) = exp(−γ ‖x − y‖^2)

The adjustable parameter σ plays a major role in the performance of the kernel and should be carefully tuned to the problem. If σ is overestimated, the exponential behaves almost linearly and the non-linearity is lost. If it is underestimated, the function lacks regularization and the decision boundary becomes highly sensitive to noise in the training data.
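
A small numerical check of the γ = 1/(2σ^2) rewriting (a sketch using NumPy and scikit-learn's pairwise kernels; the data and the value of σ are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

sigma = 2.0
gamma = 1.0 / (2.0 * sigma ** 2)  # gamma = 1 / (2 sigma^2)

# Gaussian kernel written with sigma: exp(-||x - y||^2 / (2 sigma^2)).
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_sigma_form = np.exp(-sq_dists / (2.0 * sigma ** 2))

# Same kernel written with gamma: exp(-gamma * ||x - y||^2).
K_gamma_form = rbf_kernel(X, gamma=gamma)
print(np.allclose(K_sigma_form, K_gamma_form))  # True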

[Figure: KPCA with RBF kernel — top 1, 2, and 3 components for gamma = 0.01, 0.1, 1, 2]

1.1.4 Sigmoid
The sigmoid kernel is also known as the multilayer perceptron kernel. It comes from the neural networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons.

k(x, y) = tanh(α x^T y + c)

It is interesting to note that an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. There are two adjustable parameters in the sigmoid kernel: the slope α and the constant c. A common choice is α = 1/N, where N is the number of dimensions of the data.
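
A short sketch of this kernel (assuming NumPy and scikit-learn; the constant c is an arbitrary example value), computing the sigmoid kernel with α = 1/N and comparing it with the library routine:

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))   # N = 4 dimensions

alpha = 1.0 / X.shape[1]      # alpha = 1/N, with N the number of dimensions
c = 1.0                       # constant term (arbitrary example value)

K_manual = np.tanh(alpha * X @ X.T + c)
K_library = sigmoid_kernel(X, gamma=alpha, coef0=c)
print(np.allclose(K_manual, K_library))  # True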

Let us look at some illustrations of how the shapes produced by KPCA change across the top k components.

[Figure: KPCA with sigmoid kernel — top 1, 2, and 3 components for gamma = 0.1, 1, 10, 100]

2 Naive Bayes
Naive Bayes is a special case of the Bayesian classifier in which the different dimensions of the data are assumed to be independent of each other. During the training phase we compute the per-class mean and variance of every dimension, and during the testing phase we use this information. It follows Bayes' rule, which is written as follows:

P(C_j | X) = P(X | C_j) P(C_j) / P(X)

where P(C_j | X) is the posterior probability, P(X | C_j) is the likelihood, which comes from a multivariate Gaussian distribution, and P(C_j) is the prior probability.

The multivariate Gaussian distribution is

p(x; µ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−(1/2) (x − µ)^T Σ^(−1) (x − µ)),

where Σ is the covariance matrix and µ is the mean.

In the naive Bayes model only the diagonal entries of the covariance matrix are considered; the off-diagonal entries are set to zero. This makes the problem computationally more feasible, and the shape of the distribution appears less distorted. Also, naive Bayes (Bayesian) classifiers are linear in nature; they find it hard to classify data with non-linear boundaries.
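
To make the training step concrete, a minimal sketch (assuming scikit-learn; the toy data below is a stand-in, not the assignment's distributions) that fits a Gaussian naive Bayes model and exposes the per-class means and per-dimension variances it learns:

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in labels

clf = GaussianNB().fit(X, y)

# Per-class means and per-dimension variances: the diagonal-covariance assumption.
print(clf.theta_)  # shape (n_classes, n_features)
print(clf.var_)    # shape (n_classes, n_features); named sigma_ in older releases
print(clf.score(X, y))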

3 Results and Plots of Accuracy
3.1 Training Datasets

[Figure: accuracy bar plots, KPCA + Naive Bayes vs. Naive Bayes, over distributions 1-4 — (a) N=3, P=0.7K, (b) N=10, P=7K, (c) N=50, P=70K]

3.2 Train and Test Accuracies


3.2.1 Naive : Train

Naive Bayes, training accuracy (%) | distribution 1 | distribution 2 | distribution 3 | distribution 4
N=3,  P=0.7K                       | 100            | 100            | 86.85          | 55.28
N=10, P=7K                         | 100            | 100            | 98.19          | 51.32
N=50, P=70K                        | 100            | 100            | 99.99          | 51.08

3.2.2 Naive : Test

Naive Bayes, test accuracy (%)     | distribution 1 | distribution 2 | distribution 3 | distribution 4
N=3,  P=0.7K                       | 100            | 100            | 87.5           | 48.83
N=10, P=7K                         | 100            | 100            | 98.43          | 49.85
N=50, P=70K                        | 100            | 100            | 100            | 49.94

The parameters used are:

Kernel: rbf
gamma: 0.01 for dataset 1 and 0.1 for dataset 2
components: 5 for dataset 1 and 10 for dataset 2
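
A minimal sketch of how such a KPCA + Naive Bayes pipeline can be assembled with these settings (the toy data below is a stand-in for the assignment's distributions; only the dataset-1 parameters are shown):

from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Stand-in data; the assignment's actual distributions are not reproduced here.
X, y = make_classification(n_samples=1000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Settings reported above for dataset 1: rbf kernel, gamma = 0.01, 5 components.
model = make_pipeline(KernelPCA(n_components=5, kernel="rbf", gamma=0.01),
                      GaussianNB())
model.fit(X_train, y_train)
print(model.score(X_train, y_train), model.score(X_test, y_test))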
3.2.3 KPCA + Naive : Train

KPCA + Naive Bayes, training accuracy (%) | distribution 1 | distribution 2 | distribution 3 | distribution 4
N=3,  P=0.7K                              | 100            | 100            | 86.71          | 53.28
N=10, P=7K                                | 100            | 100            | 98.21          | 51.32
N=50, P=70K*                              | 100            | 100            | 99.99          | 50.72

3.2.4 KPCA + Naive : Test

KPCA + Naive Bayes, test accuracy (%)     | distribution 1 | distribution 2 | distribution 3 | distribution 4
N=3,  P=0.7K                              | 100            | 100            | 87.83          | 52.66
N=10, P=7K                                | 100            | 100            | 98.45          | 50.06
N=50, P=70K*                              | 100            | 100            | 100            | 49.73

*The full kernel transformation could not be performed because the data size is too large, so training was done on a sample of the data rather than the full data set.
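
A sketch of this workaround (the array sizes, sample size, and kernel settings below are placeholders): kernel PCA builds an n × n kernel matrix, so the transformation is fitted on a random subsample and then applied to the rest of the data.

import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X_full = rng.normal(size=(70_000, 50))    # stand-in for the large data set

# A 70,000 x 70,000 kernel matrix is too large, so fit on a random subsample.
sample_size = 2_000                        # arbitrary example value
idx = rng.choice(len(X_full), size=sample_size, replace=False)

kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.1).fit(X_full[idx])
X_reduced = kpca.transform(X_full[:1000])  # transform the remaining data in batches
print(X_reduced.shape)  # (1000, 10)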

4 References
1. Kernel functions for machine learning applications

2. Kernel PCA documentation, scikit-learn

3. Naive Bayes: theory

4. Naive Bayes documentation, scikit-learn

5. Multivariate Gaussian distribution
