
Software Assignment 2

Shubham Patel CS17M051


March 16, 2018

1 Kernel PCA
Kernel PCA is a method of dimensionality reduction in which the reduction is performed over a transformed, higher-dimensional space. The higher-dimensional space is generated implicitly using kernel methods.
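
As a rough illustration of the idea, here is a minimal sketch assuming scikit-learn is available; the toy data and the kernel settings are placeholders, not the assignment's data.

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Toy data that is not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA: perform PCA in the (implicit) higher-dimensional feature space
# induced by the chosen kernel, then keep the top components.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1.0)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (400, 2)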

1.1 Kernel Methods


Kernel methods are used to propose a higher-dimensional space corresponding to the current space. This can be beneficial because points that are not separable in the current dimension may become separable in some higher-dimensional space.

Below, some popular kernels are applied to our data set.

1.1.1 Linear
The linear kernel is the simplest kernel function. It is given by the inner product <x, y> plus an optional constant c. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts, i.e. KPCA with a linear kernel is the same as standard PCA.

k(x, y) = x^T y + c
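
As a quick sanity check of this equivalence, a sketch on toy data (assuming scikit-learn): the linear-kernel KPCA projection matches standard PCA up to a sign flip per component.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, _ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

pca_proj = PCA(n_components=2).fit_transform(X)
kpca_proj = KernelPCA(n_components=2, kernel="linear").fit_transform(X)

# The two projections agree up to a sign flip of each component
# (and small numerical error).
print(np.allclose(np.abs(pca_proj), np.abs(kpca_proj), atol=1e-6))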

[Figure: KPCA with linear kernel — top 1, 2, and 3 components for gamma = 0.01, 0.1, 1, 10]

1.1.2 Polynomial
The polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited to problems where all the training data is normalized.

k(x, y) = (α x^T y + c)^d

The adjustable parameters are the slope α, the constant term c, and the polynomial degree d.
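
As a quick illustration (assuming NumPy and scikit-learn; the values of α, c, and d below are arbitrary examples), the polynomial kernel matrix can be computed directly and compared with the library routine:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

alpha, c, d = 0.5, 1.0, 3  # slope, constant term, degree (arbitrary example values)

K_manual = (alpha * X @ X.T + c) ** d
K_library = polynomial_kernel(X, gamma=alpha, coef0=c, degree=d)
print(np.allclose(K_manual, K_library))  # True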

[Figure: KPCA with polynomial kernel — top 1, 2, and 3 components for degree = 1, 2, 5, 10]

1.1.3 RBF
The Gaussian kernel is an example of a radial basis function kernel.

k(x, y) = exp(−‖x − y‖^2 / (2σ^2))

It can also be written as below, simply by substituting γ = 1/(2σ^2):

k(x, y) = exp(−γ ‖x − y‖^2)

The adjustable parameter σ plays a major role in the performance of the kernel and should be carefully tuned to the problem. If σ is overestimated, the exponential behaves almost linearly and the non-linearity is lost. If it is underestimated, the function lacks regularization and the decision boundary becomes highly sensitive to noise in the training data.
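
A small numerical check of the γ = 1/(2σ^2) rewriting (a sketch using NumPy and scikit-learn's pairwise kernels; the data and the value of σ are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

sigma = 2.0
gamma = 1.0 / (2.0 * sigma ** 2)  # gamma = 1 / (2 sigma^2)

# Gaussian kernel written with sigma: exp(-||x - y||^2 / (2 sigma^2)).
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_sigma_form = np.exp(-sq_dists / (2.0 * sigma ** 2))

# Same kernel written with gamma: exp(-gamma * ||x - y||^2).
K_gamma_form = rbf_kernel(X, gamma=gamma)
print(np.allclose(K_sigma_form, K_gamma_form))  # True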

[Figure: KPCA with RBF kernel — top 1, 2, and 3 components for gamma = 0.01, 0.1, 1, 2]

1.1.4 Sigmoid
The sigmoid kernel is also known as the multilayer perceptron kernel. It comes from the neural networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons.

k(x, y) = tanh(α x^T y + c)

It is interesting to note that an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. There are two adjustable parameters in the sigmoid kernel: the slope α and the constant c. A common choice is α = 1/N, where N is the number of dimensions of the data.
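
A short sketch of this kernel (assuming NumPy and scikit-learn; the constant c is an arbitrary example value), computing the sigmoid kernel with α = 1/N and comparing it with the library routine:

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))   # N = 4 dimensions

alpha = 1.0 / X.shape[1]      # alpha = 1/N, with N the number of dimensions
c = 1.0                       # constant term (arbitrary example value)

K_manual = np.tanh(alpha * X @ X.T + c)
K_library = sigmoid_kernel(X, gamma=alpha, coef0=c)
print(np.allclose(K_manual, K_library))  # True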

Let us look at some illustrations of how the shapes produced by KPCA change across the top k components.

[Figure: KPCA with sigmoid kernel — top 1, 2, and 3 components for gamma = 0.1, 1, 10, 100]

2 Naive Bayes
Naive Bayes is a special case of the Bayesian classifier in which the different dimensions of the data are assumed to be independent of each other. During the training phase we compute the per-class mean and variance of every dimension, and during the testing phase we use this information. It follows Bayes' rule, which is written as follows:

P(C_j | X) = P(X | C_j) P(C_j) / P(X)

where P(C_j | X) is the posterior probability, P(X | C_j) is the likelihood, which comes from a multivariate Gaussian distribution, and P(C_j) is the prior probability.

The multivariate Gaussian distribution is

p(x; µ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−(1/2) (x − µ)^T Σ^(−1) (x − µ)),

where Σ is the covariance matrix and µ is the mean.

In the naive Bayes model only the diagonal entries of the covariance matrix are considered; the off-diagonal entries are set to zero. This makes the problem computationally more feasible, and the shape of the distribution appears less distorted. Also, naive Bayes (Bayesian) classifiers are linear in nature; they find it hard to classify data with non-linear boundaries.
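
To make the training step concrete, a minimal sketch (assuming scikit-learn; the toy data below is a stand-in, not the assignment's distributions) that fits a Gaussian naive Bayes model and exposes the per-class means and per-dimension variances it learns:

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in labels

clf = GaussianNB().fit(X, y)

# Per-class means and per-dimension variances: the diagonal-covariance assumption.
print(clf.theta_)  # shape (n_classes, n_features)
print(clf.var_)    # shape (n_classes, n_features); named sigma_ in older releases
print(clf.score(X, y))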

3 Results and Plots of Accuracy
3.1 Training Datasets

[Figure: accuracy bar plots, KPCA + Naive Bayes vs. Naive Bayes, over distributions 1-4 — (a) N=3, P=0.7K, (b) N=10, P=7K, (c) N=50, P=70K]

3.2 Train and Test Accuracies


3.2.1 Naive : Train

Naive Bayes, training accuracy (%) | distribution 1 | distribution 2 | distribution 3 | distribution 4
N=3,  P=0.7K                       | 100            | 100            | 86.85          | 55.28
N=10, P=7K                         | 100            | 100            | 98.19          | 51.32
N=50, P=70K                        | 100            | 100            | 99.99          | 51.08

3.2.2 Naive : Test

Naive Bayes, test accuracy (%)     | distribution 1 | distribution 2 | distribution 3 | distribution 4
N=3,  P=0.7K                       | 100            | 100            | 87.5           | 48.83
N=10, P=7K                         | 100            | 100            | 98.43          | 49.85
N=50, P=70K                        | 100            | 100            | 100            | 49.94

The parameters used are:

Kernel: rbf
gamma: 0.01 for dataset 1 and 0.1 for dataset 2
components: 5 for dataset 1 and 10 for dataset 2
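
A minimal sketch of how such a KPCA + Naive Bayes pipeline can be assembled with these settings (the toy data below is a stand-in for the assignment's distributions; only the dataset-1 parameters are shown):

from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Stand-in data; the assignment's actual distributions are not reproduced here.
X, y = make_classification(n_samples=1000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Settings reported above for dataset 1: rbf kernel, gamma = 0.01, 5 components.
model = make_pipeline(KernelPCA(n_components=5, kernel="rbf", gamma=0.01),
                      GaussianNB())
model.fit(X_train, y_train)
print(model.score(X_train, y_train), model.score(X_test, y_test))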
3.2.3 KPCA + Naive : Train

KPCA + Naive Bayes, training accuracy (%) | distribution 1 | distribution 2 | distribution 3 | distribution 4
N=3,  P=0.7K                              | 100            | 100            | 86.71          | 53.28
N=10, P=7K                                | 100            | 100            | 98.21          | 51.32
N=50, P=70K*                              | 100            | 100            | 99.99          | 50.72

3.2.4 KPCA + Naive : Test

KPCA + Naive Bayes, test accuracy (%)     | distribution 1 | distribution 2 | distribution 3 | distribution 4
N=3,  P=0.7K                              | 100            | 100            | 87.83          | 52.66
N=10, P=7K                                | 100            | 100            | 98.45          | 50.06
N=50, P=70K*                              | 100            | 100            | 100            | 49.73

*The full kernel transformation could not be performed because the data size is too large, so training was done on a sample of the data rather than the full data set.
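
A sketch of this workaround (the array sizes, sample size, and kernel settings below are placeholders): kernel PCA builds an n × n kernel matrix, so the transformation is fitted on a random subsample and then applied to the rest of the data.

import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X_full = rng.normal(size=(70_000, 50))    # stand-in for the large data set

# A 70,000 x 70,000 kernel matrix is too large, so fit on a random subsample.
sample_size = 2_000                        # arbitrary example value
idx = rng.choice(len(X_full), size=sample_size, replace=False)

kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.1).fit(X_full[idx])
X_reduced = kpca.transform(X_full[:1000])  # transform the remaining data in batches
print(X_reduced.shape)  # (1000, 10)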

4 References
1. Kernel functions for machine learning applications

2. Kernel PCA documentation, scikit-learn

3. Naive Bayes: theory

4. Naive Bayes documentation, scikit-learn

5. Multivariate Gaussian distribution
