Anda di halaman 1dari 6

International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 6June 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 1852



Evolutionary Computing Strategies for Gene
Selection
Meena Moharana
1
, Kaberi Das
2
, Debahuti Mishra
3

1, 2, 3
Institute of

Technical Education and Research,SOA University
Bhubaneswar, Odisha, India

Abstract Recently evolutionary optimization techniques put a
greater impact on the ongoing research field over microarray
gene expression dataset. Finding the gene subset of microarray
data is a big issue which will help in reducing the time complexity
as well as gives relevant chromosome subset that helps in solving
many classical as well as NP-hard problem. In this research, an
efficient clustering technique has been used to finds the
chromosome subset. Here two clustering technique i.e k-means
clustering and hierarchical clustering techniques are used to have
the clusters upon which two evolutionary algorithm i.e an
efficient genetic algorithm (GA) and memetic algorithm (MA) has
been applied. This produces the result in terms of new offspring
for next generation and check whether the produced gene subset
is best suited for next generation or not. The accuracy can be
measured for the subset of gene expression data.

Keywordsclustering, k-means clustering, hierarchical clustering,
evolutionary algorithm.
I. INTRODUCTION
The clustering methods have greater impact on evolutionary
optimization techniques. The aim of clustering method is to
divide the dataset into groups known as cluster. Clustering
analysis has been widely used in the analysis of gene
expression data. The role of cluster analysis in gene research is
as follows: first, similar gene expression patterns may hold the
same function. Thus, the function of genes can be predicted
using gene with known functions in the same clustering
category [1-2]; second, genes with similar expression patterns
are likely in the same cellular process. Therefore, clustering
result can help the study of regulation and control mechanism
of gene regulatory network [3].
Clustering technique has been grouped in to three
categories named as hierarchical clustering; partitioned
clustering and density based clustering (DBCSAN).
Hierarchical clustering algorithm is the most familiar method
for biologist. The hierarchical [4] method takes each object as
distinct cluster and puts the closet object together into one
cluster. Partitions clustering technique constructs various
partitions and evaluate them based upon some criterion. k-
means clustering, k -medoied clustering are of portioned based

clustering technique. Among them k-means clustering
technique is a popular clustering technique. It needs to provide

clustering category in advance [1-5]. DBCSAN is the
clustering technique, which partition the data in order to
reduce the search space of the neighbourhoods. It detects
arbitrary shaped clusters as dense region of objects in the data
space which are separate by low density region [6].
Clustering evolution is the change in the inherited
characteristics of biological populations over successive
generations. Evolutionary process gives rise to diversity at
every level of biological organization, including species,
individual organisms and molecules such as DNA and protein
[7]. Charle Darwin and Aferred Wallace were the 1
st
to
formulate a scientific argument for the theory of evolution by
means of natural selection. In 1920s and 1930s a modern
evolutionary synthesis connected natural selection, mutation
theory ad Mendelian inheritance in to unified theory that
applied to any branch of biology [7].
At the turn of 20th century pioneers in the field of
population generation of evolution on to a robust statistical
philosophy. The false contradiction between Darwins theory
genetic mutation and Mendelian inheritance was thus
reconciled. There are the different types of evolutionary
algorithms presents for the application in engineering field as
well in the field of bio-medical studies. Genetic algorithm
(GA), first developed by Holland [8] is biologically inspired
and embodies many mechanisms minimizing natural
evolution.GA is an evolutionary optimizer that takes a sample
of possible solution (individual) and employs mutation ,
crossover and selection as the primary operators for
optimization[9]. Memetic Algorithm (MA), are inspired by
Dawkins [10] nation of a meme. A meme is a cultural gene
and in contrast to genes, memes are usually adapted by the
people who transmit the before they are passed to the next
generation [11]. MA takes all the possible samples and
employs global search, crossover, mutation, local search over
the dataset to get best gene subset. MAs are known to exploit
the correlation structure of the fitness landscape of
combinatorial optimization problem [11-12].

II. BACKGROUND STUDY

The problem of finding a relevant gene subset from any
microarray dataset is a critical issue. Cuizhen et al. [13]
applied an improved k-means clustering method to GA for
chromosomes encoding and randomly choose initial clustering
centres to form chromosome among samples. Though, the
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 6June 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 1853

cluster centre number i.e. k is hard to determine, the cluster
centre can be taken randomly. Pingping sun et al. [14], used
the k-means clustering to predict the protein-protein
interactions. The k-means clustering was improved to make it
more effective in the analysis of protein phylogentic profile,
which focuses on the improvement of k-means clustering and
enhance the accuracy of prediction of protein structure. Md.
Kamrul Islam and Madhu chetty [15] combined the clustering
technique with MA to improve the protein structure prediction
through a novel initial population and tried to avoid the
processing of unnecessary individuals in the population
dataset. The efficiency of meme generation technique and
meme replacement technique had been discussed through
proposed clustered MA. Williom A. Greene [16] proposed
hierarchical algorithm which is based on distance between data
elements and creates centroid, when it apply on evolutionary
concept i.e. GA give better result the clusters on microarray
data give more accurate cluster depending upon the centre of
cluster. Satish Gajawada and Durga Toshriwal [17] applied the
hierarchical clustering on GA to enhance the accuracy of
classification technique. They have combinely use the
clustering technique and classification technique for the
diagnosis of diabetic dieses. By the use of both the technique
misclassified instances has been removed which leads to better
accuracy in terms of classification of data element. Wai-HoA
et al. [18] suggested that clustering the large dataset having the
number of attribute and instances can be reduced in terms of
dimension. It gives the smaller subset of original one which is
having only the meaningful data and the subset helps in
selecting and grouping the relevant genes with other inter
dependent genes.
III. PRELIMINARY CONCEPTS
A) Hierarchical Clustering

Hierarchical clustering gives the output in a tree like or
hierarchical manner. The elements or object of the data set
represents as an individual or distinct cluster and put the
object on the basis of distance in to the clusters. It follows
some rules while clustering. If below data is deleted then copy
the cluster index of below gene to the above gene else vice
versa. In this paper, we have clustered the gene expression
microarray data set by hierarchical clustering. First, the
distance between one object to the other has been calculated
and according to the distance measure, the clusters have been
formed. Here, the Euclidian distance measure method used to
calculate the distance of each object to other to form the
clusters. If A
i
and A
j
are two genes and i, j {1, 2,, p} and
ij, then the Euclidian distance between A
i
and A
j
is shown in
equation (1).

Euclidian distance:
(Ai,A]) = (wik w]k)
n
k=1
(1)
Hierarchical clustering requires a measure of similarity
between groups of data points like distance measure. The main
motto behind the use of hierarchical clustering to build a
binary tree of the microarray data that successively merges the
similar groups of points based upon distance measure.

B) k-means Clustering

It is one of the partition based clustering technique. The k-
means clustering is a fast and efficient clustering method with
a strong local search capability. It depends upon the initial
cluster k and cluster centres c, the k can be determined through
prior knowledge. In k-means algorithm, first, the k can be
initialized randomly and the distance has been calculated then
the centroid has been taken randomly in initial state and the
smallest distance to the centroid can be measured by each
gene. Accordingly on the basis of distance, new cluster groups
have been created. And all the steps have been repeated till
there no further change in group found out. Due to the efficient
local search capability, this technique has been used combinely
in GA and MA. It helps in finding the local optimum subset of
relevant gene. At each iteration, the new centroid as bar center
is found out. After the new centroid, a new binding has to be
done between the same data point and nearest new centroid.
This loop has been carried out till no new group formed. The
output will be a set of k clusters that minimizes the sum of the
dissimilarities of all objects to their nearest centroid. It helps in
finding best possible subset from the gene expression
microarray data set.
C. Evolutionary Techniques

I. Genetic Algorithm (GA)
GA is an evolutionary technique based on Darwinian evolution
theory. GA was 1
st
proposed by Holland (1975). With the help
of GA, we got an optimal solution subset of microarray gene
expression data set within a tractable time period. Our model
deals with finding the gene subset selection by applying meta
heuristic GA. We have combined the clustering technique with
the genetic operators to deal with data.GA carry the concept of
evolution technique which help in giving the gene subset.
Different evolutionary genetic operators are selection,
crossover, mutation and termination criterion. As an
optimization technique GA simultaneously examines and
manipulates a set possible solution. During each iteration of
the algorithm, the process of selection, reproduction, and
mutation each take place in order to produce the new offspring
for the next generation. There are the GA operators as
discussed below:

Initial Population:
The initial population indicates to the gene expression
microarray data set i.e GA begins with the randomly selected
gene sequence for selecting the population. May the whole
dataset or a subset of the data set is taken as initial population.
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 6June 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 1854

GA uses the current population of strings to create a new
population for next generation iteratively.

Crossover:
It is a generic operator that combines the two randomly
selected gene i.e mate
1
and mate
2
to produce new offspring.
This new offspring carried out to the next generation as parent
gene. Likewise the each pair of chromosome produce 2
n
no of
offspring and n denotes the no of parents. The cross_rate is
randomly initialized. It is between 0.1 to 0.9 and the value has
been taken randomly at the time of initialization.

Mutation:
Initially mute_rate has been initialized. It is also randomly
taken in between 0.1 to 0.9. In mutation some bit of characters
are interchange at mating pole. The exchange of some
character or some information taking place between the
parental chromosome and produce some extra variant feature
in the upcoming offspring which are carried out to next
generation. In mutation the exchange of information takes
place at mating pole.

Termination Criterion:
Termination is the criterion by which the genetic algorithm
decides whether to continue the searching or stop the
searching. At each of the iteration termination criterion is
checked and designed according to the problem domain after
each generation to see whether the stopping condition is
satisfied or not.

Pseudocode of GA
1. Set pop_size,max_gen,gen=0
2. Set cross_rate,mute_rate
3. Initialize the population
4. While max_gen >=gen
5. Evaluate fit_fun
for (i=1:pop_size)
select(mate1,mate2)
child=crossover(mate1,mate2) //child is taken to
the next generation
if (rand(0,1)<=mute_rate)
child=mutation(child1,child2)
end for
add child to new generation
Gen=gen+1
end while
while termination condition satisfy
return the best chromosome subset
end

II. Memetic Algorithm:
MA was first introduced by Mascato (1989) in his technical
report. MAs are based on the Darwinian evolutionary concept
theory which is a meta-heuristics evolutionary technique. That
means the algorithm maintain a population of solution for the
problem statement i.e. a pool comprising several solutions
simultaneously which is known as individuals. As it can be
seen, each generation consists of the updating of a population
of individuals, hopefully leading to better and better solutions
for the problem being tackled. There are three main
components in this generational step: selection, reproduction,
and replacement. The component (selection) is responsible
(jointly with the replacement stage) for the com-petition
aspects of individuals in the population. Using the information
provided by an ad hoc guiding function (fitness function in the
EA terminology), the goodness of individuals in pop is
evaluated; subsequently, a sample of individuals is selected for
reproduction according to this goodness measure. This
selection can be done in a variety of ways. The most popular
techniques are fitness-proportionate methods (the probability
of selecting an individual for breeding is proportional to its
fitness), rank-based methods (the probability of selecting an
individual depends on its position after ranking the whole
population), and tournament-based methods (individuals are
selected on the basis of a direct competition within small sub-
groups of individuals).

Pseudocode of MA
1. Initialize population
2. Set pop_size,max_gen,gen=0
3. Set cross_rate,mute_rate
4. Evaluate gene subset
5. While (stopping criteria not met) do
6. Select individuals for variation
7. Crossover
8. Mutation
9. Local search
10. Evaluate accuracy of gene subset
11. Select new population
12. do
13. end
In MA an extra variant has been added i.e hill climbing
local search [19] which after mutation, search again to find
most relevant genes and improves the quality of gene subset
that would be considered as parent chromosome subset for
next generation.

Pseudocode of hill climbing local search:

While (termination condition is not satisfied) do
New subset solution neighbors(best solution)
If new subset solution is better than actual solution
then
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 6June 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 1855

Best solution actual solution
End if
End while

IV. SCHEMATIC REPRESENTATION OF MODEL

Proposed model consists of normalization of dataset. Through
min-max normalization data became normalized between the
ranges of [0-1] or [-1,+1]. Then it undergoes for feature
reduction. After reducing the data two clustering techniques
are to be applied. Then two evolutionary algorithms i.e. GA
and MA have been applied. It would give result in terms of
gene subsets. Finally, result is compared and check which
evolutionary algorithm gives best gene subset on which
clustering technique.













Fig. 1 Proposed model
V. EXPERIMENTAL EVALUATION AND RESULT ANALYSIS

For the evaluation of the proposed work we have taken two
bench marked dataset from UCI [20] repository whose
description are given in table 1. All the experiment has been
carried out on Intel core i3 processor, 2.3 GHZ, 32 bit 1GB
RAM, 1GB disk space for MATLAB, 3-4 GB for ideal
installation and evaluated using MATLAB 2010 upon two data
set . Our experimental work consists of following steps:

TABLE I
DESCRIPTION OF DATASET

Data set used Dimension
Breast-cancer 98X25
Leukaemia 72X256

Step1: Normalization of datasets: Dataset has multivariate
characteristics and the attributes have real characteristics. The
value of data set D has been mapped within a small range to
get D with one minimum value and one maximum value of
the data set by applying the min-max normalization formula
given in equation (2).

: =
: min
A
mox
A
min_A
(ncw_mox
A
ncw_min
A
)
(2)
Here, v contain the normalized data of the data set which after
stored in v in the matrix form.
min
A
=minimum of attribute value
max
A
=maximum of attribute value
new
maxA
=+1
new
minA
=-1
Normally the term (new
maxA
new
minA
) not taken in to
consideration if range is [0, 1]. Fig. 2(a) shows the output of
breast-cancer dataset after normalization whereas fig. 2(b)
shows the output of leukaemia dataset after normalization.

(a) (b)
Fig.2 After normalization of the data set
Step2: Feature reduction using PCA
Here we apply PCA [21] for reducing the dataset to get only
those genes which are relevant. Fig. 3(a) shows the reduced
form of breast-cancer dataset and fig. 3(b) shows the reduced
form of leukaemia dataset. Only some of the genes are
relevant. Fig. 3(a) shows the reduced form of breast-cancer
dataset between the range of [0.04, 0.14], fig. 3(b) shows the
reduced form of leukaemia dataset within a range of [0.14,
0.156]. After reduction data are taken in the ratio of 3:1 i.e.
75% are taken for testing purpose and 25% are taken for
testing the data.

(a) (b)
Fig.3 : After reduction of both the data set

Step3: Clustering
A. k-means clustering: Here k-means clustering has been
applied on above the two dataset after reduction to have the
clusters and to get the number of class level. Fig. 4(a) shows
the clusters of breast-cancer dataset and group the data points
into three clusters by giving three class level, fig. 4(b) shows
the clusters of leukaemia dataset and group the data points into
two clusters by giving two class level after applying k-means
Min-Max
Normalization
PCA
k-means
clustering
Hierarchical
Clustering
Memetic
Algorithm
(MA)
Genetic
Algorithm
(GA)
Clustering
Technique
Evolutionary
algorithm

Gene
Expression
Dataset
C
o
m
p
a
r
i
s
o
n

o
f

r
e
s
u
l
t


International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 6June 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 1856

clustering technique. The cluster has been found out based on
the euclidian distance along with centre c which here is taken
randomly so that we separate each data and arrange in different
clusters.

(a) (b)
Fig. 4 Figure showing the clusters of both the dataset after applying k-
means clustering

B. Hierarchical clustering: Here hierarchical clustering has
been applied on the above dataset. Fig. 5(a) shows the
dendrogram representation of clusters of breast-cancer dataset,
fig. 5(b) shows the dendrogram representation of clusters of
leukaemia dataset after applying hierarchical technique. We
have the clusters using distance measure technique so that the
points having minimum distance are grouped and a
hierarchical tree like structure are formed by forming the
clusters. Here distance is calculated by applying equation (1).

(a) (b)
Fig. 5 Figure showing the clusters of both the dataset after applying
hierarchical clustering

Step 4 Application of Genetic Algorithm for finding the subset
of genes: After applying the clustering technique, we get
subset of gene in terms of clusters. Then we apply the genetic
algorithm upon those gene subset. Gene subset are taken as
initial population upon which all the procedure is carried out
as mention in pseudocode and for our experiment the mate1
and mate2 are randomly taken as parent chromosome. Here
mute_rate and cross_rate are set to be 0.5 and 0.6 respectively
and after the experiment fitness of gene subset has been
estimated in terms of accuracy. Fig. 6 (a) shows the output of
k-means clustering with GA on breast cancer dataset and fig. 6
(b) shows the output of k-means with GA on leukaemia
dataset. Likewise fig. 7 (a) and fig. 7(b) shows the output of
hierarchical clustering with GA on both the data set
respectively.

(a) (b)
Fig. 6 After applying the GA with k-means clustering on the both data set

(a) (b)
Fig. 7 After applying the GA with hierarchical clustering on both dataset

Step5: Application of Memetic Algorithm for finding the
subset of genes: Accordingly we apply the memetic algorithm
on the clustered gene subset to check fitness in terms of
accuracy. Here after cluster the cluster subset are taken
initially for initial population and mute_rate and cross_rate are
set to be 0.5 and 0.6 respectively and after the experiment
fitness of gene subset has been estimated in terms of accuracy.
Fig. 8(a) and fig. 8(b) shows the output of k-means clustering
with memetic algorithm on breast-cancer dataset. Accordingly,
fig. 9(a) and fig. 9(b) shows the output of hierarchical
clustering with memetic algorithm on both the data set and
fitness of each gene subset has been estimated.

(a) (b)
Fig. 8 After applying the memetic algorithmwith k-means clustering on
both dataset


0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
After K Means


class 1
class 2
class 3
-500 -400 -300 -200 -100 0 100
-600
-500
-400
-300
-200
-100
0
After K Means


class 1
class 2
12 30 1 11 29 23 3 16 9 4 27 8 7 14 18 17 24 19 2 13 22 28 25 5 21 10 6 15 26 20
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
After Heirarchical Clustering


215 31826 810172716 424123020 129221314 919 6 51123 7212528
1.2
1.4
1.6
1.8
2
2.2
2.4
2.6
2.8
3
x 10
4
After Hierarchical clustering


0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
After K Means &GA


class 1
class 2
class 3
-500 -400 -300 -200 -100 0 100
-600
-500
-400
-300
-200
-100
0
After K Means &GA


class 1
class 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
After HC &GA


class 1
class 2
class 3
-500 -400 -300 -200 -100 0 100
-600
-500
-400
-300
-200
-100
0
After HC & GA


class 1
class 2
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
After K Means &MM


class 1
class 2
class 3
-500 -400 -300 -200 -100 0 100
-600
-500
-400
-300
-200
-100
0
After K Means &MM


class 1
class 2
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 6June 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 1857


(a) (b)
Fig. 9 After applying the memetic algorithm with hierarchical clustering on
both dataset

In our experiment we have applied the evolutionary
optimization technique on clustering technique to find out
accuracy of each clustered gene subset along with clustering
technique as given in table-II. The experimental result shows
that memetic algorithm give better accuracy in both the
clustering technique on the data set as compared to other
techniques.
TABLE III
ACCURACY TABLE

Dataset Clustering with
Evolutionary Algorithm
Accuracy
Breast-cancer Only k-means clustering 86.7347
k-means with GA 90.8163
k-means with MA 94.8980
Leukaemia
cancer
Only k-means clustering 84.7222
k-means with GA 90.2778
k-means with MA 94.4440
Breast-cancer Only hierarchical Clustering 90.8163
Hierarchical with GA 92.8571
Hierarchical with MA 95.9184
Leukaemia
cancer
Only hierarchical Clustering 90.2778
Hierarchical with GA 93.0556
Hierarchical with MA 94.4444
VI. CONCLUSION
In our experiment, the relevant gene subset of different micro
array data has been derived. We have got the relevant gene
subsets which are in terms of clusters after applying the
hierarchical clustering and k-means clustering of the dataset
and on the subset the genetic algorithm and memetic algorithm
has been applied. The accuracy of gene subset has been
estimated in each technique and the result has been compaired.
Finally we conclude that the memetic algorithm gives better
result in terms of accuracy of subset as compaired to genetic
algorithm and with all the clustering technique. In future,
different classifier can be applied to check the accuracy of
gene subset upon other clustering technique and evolutionary
algorithm to get relevant gene subset.

VII. REFERENCES

[1] Guiquan Liu, Xiufang J iang, Lingyun Wen, A Clustering System for
Gene Expression Data Based upon Genetic Programming and the HS-
Model, Third International Joint Conference on Computational
Science and Optimization, IEEE,pp.238-241,2010.
[2] J iang D, Tang C, Zhang, A. Cluster analysis for gene expression: a
survey, IEEE Transactions on Knowledge and Data Engineering, Vol.
16(11), pp. 1370-1386, 2004.
[3] Sharan R and Shamir R, A clustering algorithmwith applications to
gene expression analysis, Principles of Intelligent Systems for
Molecular Biology , pp. 307-316, 2000.
[4] Vilda Purutc, Uoglu Gazi, Elif Kays, Comparing Clustering
Techniques for Real Microarray Data, IEEE/ACM International
Conference on Advances in Social Networks Analysis and
Mining,pp.788-791, April 15, 2012.
[5] Hartigan J A and Wong M A, A K-means Clustering algorithm, Proc.
of Applied Statistics, Vol 28, pp. 100-108, 1979.
[6] Yasser El-Sonbaty, M. A. Ismail, Mohamed Farouk, An Efficient
Density Based Clustering Algorithmfor Large Databases, Proceedings
of the 16th IEEE International Conference on Tools with Artificial
Intelligence (ICTAI), IEEE, pp.1-5, 2004.
[7] http://www.wikipedia.com
[8] Weiguo Sheng, Xiaohui Liu, and Michael Fairhurst, A Niching
Memetic Algorithm for Simultaneous Clustering and Feature
Selection, IEEE Transactions On Knowledge and Data Engineering,
Vol. 20, no. 7,pp. 868 -879,J uly 2008.
[9] Yahya Rahmat-Samii, Genetic Algorithm(GA) and Particle Swarm
Optimization (PSO) in Engineering Eelectromagnetics, 17th
lnternational Conference on Applied Electromagnetics and
Communications,pp.1-5, 1 - 3 October 2003.
[10] Beatrice Duval,J in-Kao Hao,J ose Crispin Hernandez,"A Memetic
Algorithm for Gene Selection and Molecular Classification of
Cancer",South African Computer Journal ,Issue 36, 2(1-2), pp. 1 -
135,2008.
[11] Nora Speer, Peter Men,.Christian Spieth, Andreas Zell, Clustering
Gene Expression Data with Memetic Algorithms based on Minimum
Spanning Trees,IEEE,pp. 1848-1855,2003.
[12] P. Merz,Clustering gene expression profiles with memetic algorithms,
In Proceedings of the 7/11 International Conference on Parallel
Problem Solving from Nature, PPSN VI, Lecture Notes in Computer
Science, Springer, Berlin, Heidelberg, pp. 811-820,2002.
[13] Wenhua Dai, Cuizhen J iao, Tingting He,Research of K-means
Clustering Method Based on Parallel Genetic Algorithm, Intelligent
Information Hiding and Multimedia Signal Processing(IIHMSP),
vol.2,pp.158-161,26-28 Nov 2007.
[14] Pingping Sun et. al., Application of Improved K-mean Clustering in
Predicting Protein-Protein Interactions, International Conference on
BioMedical Engineering and Informatics,pp. 83-86,2008.
[15] Md. Kamrul Islam, Madhu Chetty, Clustered Memetic Algorithmfor
Protein Structure Prediction, Evolutionary Computation (CEC), IEEE
Congress,pp. 1-8, 18-23 J uly 2010.
[16] William A. Greene,Unsupervised Hierarchical Clustering via a
Genetic Algorithm, Evolutionary Computation (CEC),IEEE
Congress,Vol2,pp.998-1005, 8-12 Dec. 2003.
[17] Satish Gajwada,Durga Toshniwal, A framework for classification
using genetic algorithmbased clustering ,Intelligent Systems Design
and Applications (ISDA), 12th International Conference,pp. 752 757,
27-29 Nov. 2012.
[18] Wai-Ho Au, Keith C.C. Chan, Andrew K.C. Wong, and Yang Wang,
Attribute Clustering for Grouping, Selection, and Classification of
Gene Expression Data, IEEE/ACM Transactions on Computational
Biology and Bioinformatics, Vol. 2, No. 2, pp.83-101,April-J une 2005.
[19] Poonam garg, A Comparison between Memetic algorithmand
Genetic algorithmfor the cryptanalysis of Simplified Data Encryption
Standard algorithm, International Journal of Network Security & Its
Applications (IJNSA), Vol.1, No 1, pp. 34-42,April 2009
[20] http://archive.ics.uci.edu/ml.
[21] Rajashree Dash, Rasmita Dash,Debahuti Mishra.A Hybridized Rough-
PCA Approach of Attribute Reduction for High Dimensional Data Set,
European Journal of Scientific Research, Vol.44, No.1, pp.29-
38,2010.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
After HC &MM


class 1
class 2
class 3
-500 -400 -300 -200 -100 0 100
-600
-500
-400
-300
-200
-100
0
After HC & MM


class 1
class 2

Anda mungkin juga menyukai