
ZEROTH REVIEW

SUBANYA.B    10CSR021
LAVANYA.M    10CSL149
RAJA.R       10CSR025


PROJECT GUIDE : Dr.R.R.RAJALAXMI

INTRODUCTION
DATA MINING
Data mining is the process of extracting knowledge from large amounts of data
Knowledge Discovery in Databases (KDD)

BASIC DATA MINING TASKS

Predictive
    Classification
    Regression
    Time Series Analysis
    Prediction

Descriptive
    Clustering
    Summarization
    Association Rules
    Sequence Discovery

CLASSIFICATION
Predicts categorical class labels
Classifies data (constructs a model) based on the training set
An algorithm that implements classification is known as a classifier

DIMENSIONALITY REDUCTION
FEATURE EXTRACTION
    Linear
    Non-Linear
FEATURE SELECTION
    Feature Ranking
    Subset Selection
    Filter Approaches
    Embedded Approaches
    Wrapper Approaches
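Feature ranking with a filter approach can be sketched as scoring each feature independently of any classifier and sorting by that score. The sketch below is an illustration only: the choice of absolute Pearson correlation as the score, the function names, and the toy data are all assumptions, not something taken from the surveyed papers.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column carries no information, so score it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def rank_features(rows, labels):
    """Return feature indices sorted from most to least correlated with the label."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data (hypothetical): feature 0 tracks the label, feature 1 is constant,
# feature 2 is only weakly related to the label.
rows = [[0.1, 5.0, 3.0], [0.2, 5.0, 1.0], [0.9, 5.0, 2.0], [1.0, 5.0, 2.5]]
labels = [0, 0, 1, 1]
ranking = rank_features(rows, labels)
print(ranking)
```

A filter like this is cheap because no classifier is trained, but it scores features one at a time and so cannot see interactions between them, which is what wrapper approaches try to address.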

LITERATURE SURVEY
Reducing bioinformatics data dimension with ABC-KNN
Feature Selection for medical diagnosis: Evaluation for cardiovascular diseases
Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients

PAPER-I

Reducing bioinformatics data dimension with ABC-KNN


Authors: Thananan Prasartvit, Anan Banharnsakun, Boonserm Kaewkamnerdpong, Tiranee Achalakul
Year: 2013

PROBLEM
Analyzing a large amount of data often consumes extensive computational resources and execution time
Not all data features contribute equally to the end results
The major contributing features need to be identified so that features with low contribution can be eliminated
The need for dimension reduction arises because biological data can be massive, with tens of thousands of features to be explored
The objective is to design an effective algorithm that can selectively remove irrelevant dimensions from data while preserving the semantics of the original data

PROPOSED WORK
Proposed the Artificial Bee Colony (ABC) as a method for data dimension reduction in classification problems
The K-Nearest Neighbor (KNN) method is then used for fitness evaluation within the ABC framework
ABC feature selection method wrapped with KNN for classification (ABC-KNN)

Artificial Bee Colony (ABC)
Begin:
    Initialize Solutions
    Repeat
        // Employed Bees Process
        Updating_Feasible_Solutions
        // Onlooker Bees Process
        Selecting_Feasible_Solutions
        Updating_Feasible_Solutions
        // Scout Bee Process
        Avoiding_Sub-Optimal_Solutions
    Until (maximum number of iterations or the stopping criterion is met)
End

K-Nearest Neighbor (KNN)
Begin:
    For i = 1 to number of training data items
        Store_data
    End
    For j = 1 to number of testing data items
        Measure_distance
        Sort_by_distance
        Evaluate_data_class
    End
End
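The wrapper idea behind ABC-KNN can be sketched as a bee-colony search over binary feature masks, where each mask is scored by KNN accuracy on held-out data. This is a simplified illustration, not the authors' implementation: the one-bit-flip neighbourhood move, the parameter names (`n_sources`, `limit`, `iters`), their default values, and the toy data are all assumptions.

```python
import random

def knn_predict(train_X, train_y, x, mask, k=3):
    """Majority vote among the k nearest training points, using masked features only."""
    dists = sorted(
        (sum((a[j] - x[j]) ** 2 for j in range(len(mask)) if mask[j]), label)
        for a, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

def fitness(mask, train_X, train_y, test_X, test_y):
    """Wrapper fitness: KNN accuracy on the test items with the selected features."""
    if not any(mask):
        return 0.0
    hits = sum(knn_predict(train_X, train_y, x, mask) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def abc_select(train_X, train_y, test_X, test_y, n_sources=6, limit=5, iters=30, seed=0):
    rng = random.Random(seed)
    n = len(train_X[0])
    score = lambda m: fitness(m, train_X, train_y, test_X, test_y)
    random_mask = lambda: [rng.randint(0, 1) for _ in range(n)]
    sources = [random_mask() for _ in range(n_sources)]   # Initialize Solutions
    fits = [score(m) for m in sources]
    trials = [0] * n_sources
    i_best = max(range(n_sources), key=lambda i: fits[i])
    best, best_fit = sources[i_best][:], fits[i_best]

    def try_improve(i):
        # Assumed neighbourhood move: flip one randomly chosen bit of the mask.
        cand = sources[i][:]
        j = rng.randrange(n)
        cand[j] = 1 - cand[j]
        f = score(cand)
        if f > fits[i]:
            sources[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_sources):                         # employed bees
            try_improve(i)
        weights = [f + 1e-9 for f in fits]                 # onlookers prefer fitter sources
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_improve(i)
        i_best = max(range(n_sources), key=lambda i: fits[i])
        if fits[i_best] > best_fit:
            best, best_fit = sources[i_best][:], fits[i_best]
        for i in range(n_sources):                         # scouts replace stale sources
            if trials[i] > limit:
                sources[i] = random_mask()
                fits[i], trials[i] = score(sources[i]), 0
    return best, best_fit

# Toy data (hypothetical): only feature 0 separates the two classes.
train_X = [[0.0, 0.9, 0.1], [0.1, 0.1, 0.8], [0.2, 0.5, 0.5],
           [1.0, 0.2, 0.9], [0.9, 0.8, 0.2], [1.1, 0.4, 0.4]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05, 0.7, 0.9], [0.95, 0.3, 0.1], [0.15, 0.2, 0.2], [1.05, 0.9, 0.8]]
test_y = [0, 1, 0, 1]
best, best_fit = abc_select(train_X, train_y, test_X, test_y)
print(best, best_fit)
```

Because every candidate mask requires a full KNN evaluation, the cost of the search is dominated by the fitness function, which is the usual trade-off of wrapper methods.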

DATASETS DESCRIPTION

THE FLOWCHART OF ABC-KNN METHOD

RESULTS
[Figure: bar chart of accuracy (0 to 100) by data name for LS-SVM, PCA-FDA, MSDR-LGC, LLDE-KNN and ABC-KNN]

RESULTS (cont)


CONCLUSION
The experimental results of the gene expression analysis show that the proposed method can effectively reduce the data dimension while maintaining high classification accuracy
ABC-KNN can thus be employed to exclude nonessential data as well as to identify the vital elements in a vast amount of biological data


PAPER-II

Feature Selection for medical diagnosis: Evaluation for cardiovascular diseases

Authors: Swati Shilaskar, Ashok Ghatol
Year: 2013


PROBLEM
To find a suitable algorithm that generates a smaller feature subset from high-dimensional data with improved diagnostic ability for cardiovascular diseases


PROPOSED METHOD
FEATURE SELECTION METHODS
Forward Feature Inclusion
Back-elimination Feature Selection
Forward Feature Selection
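The general shape of forward and backward subset search can be sketched as two greedy loops around a scoring function. This is a generic illustration and should not be read as the paper's exact algorithms: the function names, the stopping rules, and the toy `score` function (which stands in for a classifier's accuracy in percent) are assumptions.

```python
def forward_select(n_features, score):
    """Greedily add the feature that most improves the score; stop when none helps."""
    selected, best = [], score([])
    remaining = set(range(n_features))
    while remaining:
        cand, cand_score = None, best
        for j in sorted(remaining):
            s = score(selected + [j])
            if s > cand_score:
                cand, cand_score = j, s
        if cand is None:          # no single addition improves the score
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best

def backward_eliminate(n_features, score):
    """Start from all features; drop one whenever the score does not get worse."""
    selected = list(range(n_features))
    best = score(selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for j in list(selected):
            trial = [f for f in selected if f != j]
            s = score(trial)
            if s >= best:
                selected, best, improved = trial, s, True
                break
    return selected, best

# Hypothetical score: features 0 and 2 help, every other feature costs 2 points.
GAINS = {0: 30, 2: 15}

def score(subset):
    return 50 + sum(GAINS.get(j, -2) for j in subset)

fwd = forward_select(4, score)
bwd = backward_eliminate(4, score)
print(fwd, bwd)
```

On this toy score both searches end at the same subset, but in general they can disagree because each explores a different part of the subset space.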

DATA SETS DESCRIPTION


DATASET          NO. OF SAMPLES   NO. OF FEATURES   CATEGORIES
ARRHYTHMIA       452              279               16
SPECTF CARDIAC   267              44                2
HEART DISEASE    303              14 used           4

HYBRID MODEL OF FEATURE SELECTION PROCESS


FORWARD FEATURE INCLUSION ALGORITHM


Back-elimination Feature Selection


FORWARD FEATURE SELECTION


RESULTS
                 CLASSIFICATION PERFORMANCE        CLASSIFICATION PERFORMANCE WITH
                 WITH ALL FEATURES                 PROPOSED FEATURE SELECTION ALGORITHM
DATA SET         No. of features   Accuracy        No. of features in subset   Accuracy
Arrhythmia       258               0.79            23                          0.88
SPECTF cardiac   44                0.75            19                          0.78
Heart Disease    10                0.81            4                           0.85
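Using the figures reported in the results above, the size of the feature reduction and the change in accuracy can be worked out directly. The snippet below only restates those numbers; the variable names are illustrative.

```python
# (features_all, accuracy_all, features_subset, accuracy_subset) per data set,
# copied from the reported results.
results = {
    "Arrhythmia": (258, 0.79, 23, 0.88),
    "SPECTF cardiac": (44, 0.75, 19, 0.78),
    "Heart Disease": (10, 0.81, 4, 0.85),
}

summary = {}
for name, (n_all, acc_all, n_sub, acc_sub) in results.items():
    reduction_pct = round(100 * (1 - n_sub / n_all), 1)   # share of features removed
    accuracy_gain = round(acc_sub - acc_all, 2)           # absolute change in accuracy
    summary[name] = (reduction_pct, accuracy_gain)
    print(f"{name}: {reduction_pct}% fewer features, accuracy {accuracy_gain:+.2f}")
```

The largest reduction is on Arrhythmia, where roughly nine in ten features are dropped while accuracy rises by 0.09.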


CONCLUSION
Accuracy gives a proper estimate of classifier performance when the dataset is balanced
If the dataset is unbalanced, accuracy is found not to be a correct estimate of classifier performance
The feature ranking methods investigated in this research work well for the arrhythmia and heart disease datasets
The hybrid forward feature selection algorithm successfully reduces feature dimensions and improves classifier accuracy
The highest accuracy is achieved when the forward selection algorithm is used
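The point about unbalanced datasets can be illustrated with a small example: a classifier that always predicts the majority class scores high accuracy while detecting no minority cases. The data below is made up for illustration, and balanced accuracy (the mean of per-class recall) is offered as one common alternative, not as the measure used in the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical unbalanced data: 95 healthy (0) and 5 diseased (1) cases.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100            # a classifier that always predicts "healthy"

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(acc, bal)
```

Here accuracy looks strong even though every diseased case is missed, while balanced accuracy drops to chance level.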

PAPER-III

Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients

Authors: Susana M. Vieira, Luís F. Mendonça, Gonçalo J. Farinha, João M.C. Sousa
Year: 2013


PROBLEM
The medical condition studied is sepsis, a common clinical condition defined by a whole-body inflammatory state called systemic inflammatory response syndrome (SIRS)
This clinical condition has different degrees of severity and can progress to severe sepsis and later to septic shock


PROPOSED METHOD
A modified binary particle swarm optimization (MBPSO) method for feature selection with the simultaneous optimization of the SVM kernel
An enhanced version of BPSO, designed to cope with premature convergence of the BPSO algorithm
The MBPSO is used as a wrapper method

DATASET DESCRIPTION

NUMBER   DATABASES              SAMPLES     FEATURES   CLASSES
1        German (credit card)   1000        24         2
2        Sonar                  208         60         2
3        WBCO                   683 (699)   9          2
4        WPBC                   198         32         2
5        WDBC                   569         30         2
6        Colon Cancer           62          2000       2
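For orientation, one update step of the standard binary PSO (BPSO) that MBPSO builds on can be sketched as follows. This shows only the plain BPSO rule with a sigmoid transfer function; the modifications that MBPSO adds against premature convergence are not described in these slides and are not reproduced here. The function name and the parameter values (`w`, `c1`, `c2`, `v_max`) are assumptions chosen for illustration.

```python
import math
import random

def bpso_step(position, velocity, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One standard BPSO update for a single particle (a binary feature mask)."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        # Velocity is pulled toward the particle's own best and the swarm's best.
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))              # clamp to keep probabilities away from 0 and 1
        prob_one = 1.0 / (1.0 + math.exp(-v))       # sigmoid maps velocity to P(bit = 1)
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob_one else 0)
    return new_position, new_velocity

# Hypothetical particle over five features.
rng = random.Random(1)
position = [0, 1, 0, 1, 0]
velocity = [0.0] * 5
pbest = [1, 1, 0, 0, 0]     # best mask this particle has seen
gbest = [1, 0, 0, 0, 1]     # best mask the swarm has seen
x, v = bpso_step(position, velocity, pbest, gbest, rng)
print(x, v)
```

In a wrapper setting, each resulting mask would then be scored by training the classifier (an SVM in the paper) on the selected features.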

MBPSO


RESULTS


RESULTS


CONCLUSION
MBPSO shows better performance than the other PSO methods and similar or better results than GA

[Figure: bar chart of accuracy by database (German, Sonar, WBCO, WPBC, WDBC, Colon) for NO-FS, BPSO, IBPSO, GA and MBPSO]

FUTURE WORK
Future work considers testing the introduced algorithm (MBPSO) on other medical databases in order to compare its performance more consistently with other feature selection techniques


FINDINGS FROM THE LITERATURE SURVEY


MBPSO or ABC-KNN can be applied to heart disease databases to improve the accuracy of heart disease diagnosis
Hybrid models such as PSO-KNN, GA-KNN, or ABC combined with other classification algorithms can be developed and applied to the databases to improve the efficiency of finding feature subsets


REFERENCES
[1] Thananan Prasartvit, Anan Banharnsakun, Boonserm Kaewkamnerdpong, Tiranee Achalakul, Reducing bioinformatics data dimension with ABC-kNN, Neurocomputing 116 (2013), 367-381
[2] Swati Shilaskar, Ashok Ghatol, Feature selection for medical diagnosis: Evaluation for cardiovascular diseases, Expert Systems with Applications 40 (2013), 4146-4153
[3] Susana M. Vieira, Luís F. Mendonça, Gonçalo J. Farinha, João M.C. Sousa, Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients, Applied Soft Computing 13 (2013), 3494-3504
