Each element of this extension matrix is given by

    μ_1i(r_n1) ∧ μ_2j(r_n2) ∈ [0, 1]

where i denotes the number of fuzzy sets of the feature f1, j denotes the number of fuzzy sets of the feature f2, μ_11(r_1) denotes the membership grade of the value of the feature f1 of the sample r1 belonging to a fuzzy set v11, and ∧ denotes the minimum operator.
A threshold parameter given by the user is used to create this matrix. In this matrix, i is the number of fuzzy sets defined in the feature f1 whose maximum class degree is smaller than the given threshold value, and j is the number of fuzzy sets defined in the feature f2 whose maximum class degree is smaller than the given threshold value.
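As a minimal numeric illustration of the extension matrix, the sketch below combines one fuzzy set of f1 with one fuzzy set of f2 through the minimum operator; the membership grades are made-up values, not taken from the paper:

```python
import numpy as np

# Hypothetical membership grades of a single sample r in the
# i = 2 fuzzy sets of feature f1 and the j = 3 fuzzy sets of f2.
mu_f1 = np.array([0.7, 0.3])
mu_f2 = np.array([0.2, 0.8, 0.0])

# Extension matrix: entry (a, b) = min(mu_f1[a], mu_f2[b]).
# Rows index fuzzy sets of f1, columns index fuzzy sets of f2,
# and every entry lies in [0, 1].
M = np.minimum.outer(mu_f1, mu_f2)
print(M)
```

Because each entry is the minimum of two grades in [0, 1], the matrix itself stays within [0, 1], as required by the definition above.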
Step 7: The fuzzy entropy measure CFE(f1, f2) of a feature subset focusing on boundary samples is defined as follows:

    CFE(f1, f2) =
        (S_1B / S_1) [ Σ_{w ∈ FS} (S_w / S_FS) FE(w) + Σ_{v1 ∈ U_B} (S_v1 / S_1) FE(v1) ],   if S_1B / S_1 < S_2B / S_2
        (S_2B / S_2) [ Σ_{w ∈ FS} (S_w / S_FS) FE(w) + Σ_{v2 ∈ U_B} (S_v2 / S_2) FE(v2) ],   otherwise
Here S_1B denotes the summation of the membership grades of the values of the feature f1 over the boundary samples, S_FS denotes the summation of the membership grade values of the feature subset (f1, f2), S_w denotes the summation of the membership grade values of the feature subset (f1, f2) for the samples belonging to a combined fuzzy set w, FE(v1) denotes the fuzzy entropy of a fuzzy set v1 of the feature f1, and FE(v2) denotes the fuzzy entropy of the fuzzy set v2 of the feature f2. Find the feature subset that minimizes the function CFE and add it to the selected feature subset.
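Assuming the piecewise structure of Step 7, in which the feature with the smaller boundary-sample ratio weights a shared combined-set entropy term plus its own residual term, the combination rule can be sketched as below. The function name, parameter names, and the decomposition into precomputed terms are illustrative assumptions, not the paper's implementation:

```python
def cfe(r1, r2, combined_term, residual_f1, residual_f2):
    """Hedged sketch of the piecewise CFE rule of Step 7.

    r1 = S1B/S1 and r2 = S2B/S2 are the boundary-sample membership
    ratios of features f1 and f2; combined_term is the entropy sum over
    combined fuzzy sets w, and residual_f1 / residual_f2 are the
    per-feature entropy sums. The branch with the smaller ratio is used.
    """
    if r1 < r2:
        return r1 * (combined_term + residual_f1)
    return r2 * (combined_term + residual_f2)
```

In the selection loop, the candidate subset with the smallest CFE value would then be added to the selected feature subset.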
Step 8: Convert the selected feature subset into .arff file format to calculate the classification accuracy using the WEKA tool.
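A minimal sketch of this conversion step, writing two selected numeric features plus the class label in ARFF form; the relation name, attribute names, and data rows are hypothetical examples, not the paper's actual output:

```python
# Toy selected-feature rows: two numeric features and a class label.
rows = [(5.1, 3.5, "Iris-setosa"), (6.2, 2.9, "Iris-versicolor")]

# ARFF header: relation name, one @attribute line per column,
# then the @data marker preceding the comma-separated rows.
header = "\n".join([
    "@relation iris_selected",
    "",
    "@attribute sepal_length numeric",
    "@attribute sepal_width numeric",
    "@attribute class {Iris-setosa,Iris-versicolor}",
    "",
    "@data",
])

with open("selected.arff", "w") as f:
    f.write(header + "\n")
    for a, b, c in rows:
        f.write(f"{a},{b},{c}\n")
```

A file in this shape can be opened directly in the WEKA Explorer to run a classifier and read off the accuracy.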
Pseudo code of fuzzy entropy feature selection:

Fuzzy_entropy(Dataset D, Thresholds (Tc, Tr))
{
    do
    {
        Select a feature f;
        K = 2;
        while (true)
        {
            Find K cluster centres in feature f using the K-means algorithm;
            Construct membership functions from the K cluster centres;
            Calculate the fuzzy entropy of feature f;
            if (decreasing rate of fuzzy entropy > Tc)
                K = K + 1;
            else
            {
                K = K - 1;
                break;
            }
        }
    } while (unprocessed features remain in dataset D);
    Create the extension matrix for each feature f;
    Calculate the fuzzy entropy of each feature;
    while (true)
    {
        Select the feature subset f with the minimum fuzzy entropy value;
        Add f to the previously selected subset and update the combined extension matrix;
        Calculate the fuzzy entropy of the new selected subset according to Tr;
        if (new fuzzy entropy value > previous fuzzy entropy value)
                or (fuzzy entropy == 0)
                or (no additional feature remains for selection)
            break;
    }
}
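The per-feature part of the pseudocode above can be sketched in Python. The triangular membership functions built from cluster centres and the class-wise entropy below are one plausible reading in the style of Lee et al.'s fuzzy-entropy classifier, not the paper's exact implementation:

```python
import numpy as np

def membership_grades(values, centres):
    """Triangular membership grades: one fuzzy set per cluster centre,
    peaking at its centre and falling to zero at the neighbouring
    centres; the outermost sets get flat shoulders (an assumption)."""
    values = np.asarray(values, dtype=float)
    centres = np.sort(np.asarray(centres, dtype=float))
    grades = np.zeros((values.size, centres.size))
    for j, c in enumerate(centres):
        left = centres[j - 1] if j > 0 else None
        right = centres[j + 1] if j < centres.size - 1 else None
        for i, x in enumerate(values):
            if x <= c:
                grades[i, j] = 1.0 if left is None else max(0.0, (x - left) / (c - left))
            else:
                grades[i, j] = 1.0 if right is None else max(0.0, (right - x) / (right - c))
    return grades

def fuzzy_entropy(grades, labels):
    """Fuzzy entropy of one feature: each fuzzy set contributes the
    Shannon entropy of the class-wise membership mass it contains,
    weighted by the set's share of the total membership mass."""
    labels = np.asarray(labels)
    total = grades.sum()
    fe = 0.0
    for j in range(grades.shape[1]):
        s_v = grades[:, j].sum()
        if s_v == 0.0:
            continue
        h = 0.0
        for c in np.unique(labels):
            p = grades[labels == c, j].sum() / s_v
            if p > 0.0:
                h -= p * np.log2(p)
        fe += (s_v / total) * h
    return fe
```

A feature whose fuzzy sets align with the class boundaries yields a low fuzzy entropy, which is why the selection loop in the pseudocode keeps the subset with the minimum value.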
4. RESULTS
The proposed entropy method was implemented in MATLAB. The experimental data sets belong to the UCI machine learning repository; the Iris data set, the Breast cancer data set, and the Cleve data set are used in this experiment. First, the proposed method is applied to select feature subsets of each of these three data sets. The resulting accuracy rates are shown in Table 1.
Table 1: A comparison of the accuracy rates of different methods

S. No | Data Set        | FQI Method | MIFS Method | Proposed Entropy Method
------|-----------------|------------|-------------|------------------------
1     | Iris Data Set   | 94.67%     | 94.67%      | 94.69%
2     | Breast Data Set | 97.05%     | 96.05%      | 97.09%
3     | Cleve Data Set  | 84.47%     | 83.69%      | 84.52%
Table 1 reports the average classification accuracy rates of the different feature selection methods. The proposed feature subset selection method is compared with the FQI (Frequency Quality Index) method and the MIFS (Mutual Information based Feature Selector) method, using the Iris data set, the Breast cancer data set, and the Cleve data set in our experiments.
Figure 3: Comparison between the FQI, MIFS, and proposed entropy methods
5. CONCLUSION
This paper is concerned with fuzzy sets and decision trees, and presents a feature selection method based on fuzzy set theory and information theory. In the proposed fuzzy method, numeric attributes can be represented by fuzzy numbers, interval values, or crisp values; nominal attributes are represented by crisp nominal values; and each class carries a confidence factor. An example is used to demonstrate the validity of the method. First, fuzzy set theory is applied to transform real-world data into fuzzy linguistic forms. Second, information theory is used to select the feature subset. Through the integration of fuzzy set theory and information theory, classification tasks originally thought too difficult or complex become possible.
AUTHOR
B. AzhaguSundari received her B.Sc. in Mathematics and her Master of Computer Applications from NGM College, Pollachi, Coimbatore, India. She completed her Master of Philosophy at Bharathidasan University, Trichy. Presently she is working as an Assistant Professor in the P.G. Department of Computer Applications at NGM College (Autonomous), Pollachi. Her areas of interest include Data Mining. She is currently pursuing her Ph.D. in Computer Science at Mother Teresa University, Kodaikanal.
Dr. Antony Selvadoss Thanamani is presently working as Professor and Head, Department of Computer Science, NGM College, Coimbatore, India (affiliated to Bharathiar University, Coimbatore). He has published more than 100 papers in international and national journals and conferences, and has authored many books on recent trends in Information Technology. His areas of interest include E-Learning, Knowledge Management, Data Mining, Networking, and Parallel and Distributed Computing. He has 24 years of teaching and research experience. He is a senior member of the International Association of Computer Science and Information Technology, Singapore, and an active member of the Computer Science Society of India and the Computer Science Teachers Association, New York.