INTRODUCTION
DATA MINING
Data mining is the process of extracting knowledge from large amounts of data.
Also referred to as Knowledge Discovery in Databases (KDD).
BASIC DATA MINING TASKS
CLASSIFICATION
Predicts categorical class labels.
Classifies data by constructing a model from the training set.
An algorithm that implements classification is known as a classifier.
DIMENSIONALITY REDUCTION
FEATURE EXTRACTION
  Linear
  Non-Linear
FEATURE SELECTION
  Feature Ranking
  Subset Selection
  Filter Approaches
  Embedded Approaches
  Wrapper Approaches
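As an illustration of filter-style feature ranking, the sketch below scores each feature by its absolute Pearson correlation with the class label and sorts the features by that score. The choice of correlation as the ranking criterion, the function names, and the toy data are assumptions made for this example; they are not taken from the surveyed papers.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has zero variance; treat its correlation as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def rank_features(X, y):
    """Return feature indices sorted by |correlation with y|, best first."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data (made up): feature 0 follows the label, feature 1 is
# uncorrelated with it, and feature 2 is constant.
X = [[0.1, 5.0, 1.0], [0.2, 1.0, 1.0], [0.9, 4.0, 1.0], [1.0, 2.0, 1.0]]
y = [0, 0, 1, 1]
ranking = rank_features(X, y)
```

A filter approach like this never trains a classifier, which makes it cheap; wrapper approaches instead score candidate subsets with the classifier itself.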
LITERATURE SURVEY
Reducing bioinformatics data dimension with ABC-kNN
Feature selection for medical diagnosis: Evaluation for cardiovascular diseases
Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients
PAPER-I
PROBLEM
Analyzing a large amount of data often consumes extensive computational resources and execution time.
Not all data features contribute equally to the end results.
The major contributing features need to be identified, so that features with low contribution can be eliminated.
The need for dimension reduction arises because biological data can be massive, with tens of thousands of features to be explored.
The objective is to design an effective algorithm that can selectively remove irrelevant dimensions from data while preserving the semantics of the original data.
PROPOSED WORK
Proposed the Artificial Bee Colony (ABC) as a method for data dimension reduction in classification problems.
The K-Nearest Neighbor (KNN) method is then used for fitness evaluation within the ABC framework.
ABC feature selection method wrapped with KNN for classification (ABC-KNN).

Artificial Bee Colony (ABC)
  Begin:
    Initialize Solutions
    Repeat
      // Employed Bees Process
      Updating_Feasible_Solutions
      // Onlooker Bees Process
      Selecting_Feasible_Solutions
      Updating_Feasible_Solutions
      // Scout Bee Process
      Avoiding_Sub-Optimal_Solutions
    Until (maximum number of iterations or the stopping criterion is met)
  End

K-Nearest Neighbor (KNN)
  Begin:
    For i=1 to number of training data items
      Store_data
    End
    For j=1 to number of testing data items
      Measure_distance
      Sort_by_distance
      Evaluate_data_class
    End
  End
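The fitness evaluation step can be sketched as follows: a candidate solution is a binary mask over the features, and its fitness is the KNN accuracy obtained with only the selected features. This is a minimal sketch, not the paper's implementation; the function names, the Euclidean distance, k=3, and the toy data are all assumptions for illustration, and the ABC search loop that would generate the masks is omitted.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Measure distance to every stored item, then sort by distance.
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def apply_mask(X, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]

def fitness(mask, train_X, train_y, test_X, test_y, k=3):
    """Wrapper fitness: KNN accuracy on the test items using masked features."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    tr, te = apply_mask(train_X, mask), apply_mask(test_X, mask)
    hits = sum(knn_predict(tr, train_y, x, k) == t for x, t in zip(te, test_y))
    return hits / len(test_y)

# Toy data (made up): feature 0 separates the classes, feature 1 does not.
train_X = [[0.0, 9.0], [0.1, 1.0], [0.2, 5.0], [1.0, 1.2], [1.1, 8.8], [0.9, 5.1]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05, 8.9], [1.05, 1.1]]
test_y = [0, 1]

fit_informative = fitness([1, 0], train_X, train_y, test_X, test_y)
fit_noise = fitness([0, 1], train_X, train_y, test_X, test_y)
```

On this toy data the mask that keeps only feature 0 scores higher than the mask that keeps only feature 1, which is the signal a wrapper search uses to prefer one subset over another.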
DATASETS DESCRIPTION
RESULTS
[Figure: bar chart of Accuracy (0 to 100) by Data Name, comparing LLDE-KNN and ABC-KNN]
RESULTS (cont)
CONCLUSION
The experimental results of the gene expression analysis show that the proposed method can effectively reduce the data dimension while maintaining high classification accuracy.
ABC-KNN can thus be employed to exclude nonessential data and to identify the vital elements in a vast amount of biological data.
PAPER-II
PROBLEM
To find a suitable algorithm that generates a smaller feature subset from high-dimensional data while improving diagnostic ability for cardiovascular diseases.
PROPOSED METHOD
FEATURE SELECTION METHODS
Forward Feature Inclusion
Back-elimination Feature Selection
Forward Feature Selection
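Greedy forward feature selection can be sketched as below: starting from an empty subset, the feature that most improves a scoring function is added at each step, and the search stops when no addition helps. This is a generic sketch and not the hybrid algorithm of the paper; `forward_selection`, `toy_score`, and `min_gain` are hypothetical names introduced here. In practice the score would be something like the validation accuracy of a classifier trained on the candidate subset.

```python
def forward_selection(n_features, score, min_gain=1e-9):
    """Greedy forward selection driven by score(subset) -> float."""
    selected, best = [], float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Try each remaining feature and keep the best single addition.
        cand, cand_score = max(
            ((f, score(selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if cand_score - best < min_gain:
            break  # no candidate improves the score, so stop
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best

def toy_score(subset):
    """Made-up score: features 0 and 2 are useful, every feature has a small cost."""
    useful = {0: 0.5, 2: 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)

subset, subset_score = forward_selection(4, toy_score)
```

With the toy score, the search picks feature 0 first, then feature 2, and stops because adding either remaining feature lowers the score.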
[Table: dataset description. Surviving values: 279, 44, 14 used, 16, 2, 4, 267, 303]
RESULTS
Table columns: DATA SET; CLASSIFICATION PERFORMANCE WITH ALL FEATURES; CLASSIFICATION PERFORMANCE WITH PROPOSED FEATURE SELECTION ALGORITHM
No. of features in subset (one value per data set): 23, 19, 4
Accuracy with feature subset (one value per data set): 0.88, 0.78, 0.85
CONCLUSION
Accuracy gives a proper estimate of classifier performance when the dataset is balanced.
If the dataset is unbalanced, accuracy is not a correct estimate of classifier performance.
The feature ranking methods investigated in this research work well for the arrhythmia and heart disease datasets.
The hybrid forward feature selection algorithm successfully reduces feature dimensions and improves classifier accuracy.
The highest accuracy is achieved when the forward selection algorithm is used.
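The point about unbalanced datasets can be seen with a small made-up example: a classifier that always predicts the majority class reaches high accuracy while never detecting the minority class. Balanced accuracy (the mean of per-class recall) is used here only as one possible alternative measure; it is an assumption of this sketch and is not named in the paper summary above.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Made-up unbalanced data: 95 healthy (0) and 5 diseased (1) cases.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that always predicts the majority class

acc = accuracy(y_true, y_pred)           # 0.95
bal = balanced_accuracy(y_true, y_pred)  # 0.5
```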
PAPER-III
Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients
Authors: Susana M. Vieira, Luís F. Mendonça, Gonçalo J. Farinha, João M.C. Sousa
Year: 2013
PROBLEM
The medical condition considered is sepsis, a common clinical condition defined by a whole-body inflammatory state called systemic inflammatory response syndrome (SIRS).
This clinical condition has different degrees of severity that can lead to severe sepsis and later to septic shock.
PROPOSED METHOD
A modified binary particle swarm optimization (MBPSO) method for feature selection with the simultaneous optimization of the SVM kernel.
An enhanced version of BPSO, designed to cope with premature convergence of the BPSO algorithm.

DATASETS DESCRIPTION
NUMBER  DATABASES             SAMPLES  FEATURES  CLASSES
1       German (credit card)  1000     24        2
2       Sonar                 208      60        2
3       (not preserved)                          2
4       (not preserved)                          2
5       (not preserved)                          2
        Colon Cancer          62       2000      2
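For orientation, one update step of the standard binary PSO (BPSO) that MBPSO builds on can be sketched as below. Each particle is a bit list (1 = feature selected); the velocity is updated as in continuous PSO and a sigmoid turns it into the probability that the bit is 1. This is the baseline BPSO only: the modifications that MBPSO adds against premature convergence are not reproduced here, and the parameter values (`w`, `c1`, `c2`, `vmax`) and all names are illustrative assumptions.

```python
import math
import random

def sigmoid(v):
    """Map a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def bpso_step(position, velocity, pbest, gbest, rng,
              w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """One standard binary-PSO update for a single particle."""
    new_pos, new_vel = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        # Pull the velocity toward the personal and the global best bits.
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-vmax, min(vmax, v))  # clamp so probabilities do not saturate
        # The sigmoid of the velocity is the probability that the bit is 1.
        bit = 1 if rng.random() < sigmoid(v) else 0
        new_vel.append(v)
        new_pos.append(bit)
    return new_pos, new_vel

rng = random.Random(0)  # fixed seed so the sketch is repeatable
position = [0, 1, 0, 1, 0]
velocity = [0.0] * 5
pbest = [1, 1, 0, 0, 0]
gbest = [1, 0, 0, 0, 1]
new_position, new_velocity = bpso_step(position, velocity, pbest, gbest, rng)
```

In a full feature selection run, each new position would be scored by a classifier (an SVM in the paper), and `pbest` and `gbest` would be updated from those scores.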
MBPSO
RESULTS
RESULTS
CONCLUSION
MBPSO shows better performance than the other PSO-based methods and similar or better results than GA.
[Figure: bar chart of Accuracy (0 to 120) by Data base (German, Sonar, WBCO, WPBC, WDBC, Colon) for NO-FS, BPSO, IBPSO, GA, and MBPSO]
FUTURE WORK
Future work will apply the introduced algorithm (MBPSO) to other medical databases in order to compare its performance more consistently with other feature selection techniques.
REFERENCES
[1] Thananan Prasartvit, Anan Banharnsakun, Boonserm Kaewkamnerdpong, Tiranee Achalakul, Reducing bioinformatics data dimension with ABC-kNN, Neurocomputing 116 (2013), 367-381
[2] Swati Shilaskar, Ashok Ghatol, Feature selection for medical diagnosis: Evaluation for cardiovascular diseases, Expert Systems with Applications 40 (2013), 4146-4153
[3] Susana M. Vieira, Luís F. Mendonça, Gonçalo J. Farinha, João M.C. Sousa, Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients, Applied Soft Computing 13 (2013), 3494-3504