Zeroth Review1

ZEROTH REVIEW
SUBANYA.B (10CSR021)
LAVANYA.M (10CSL149)
RAJA.R (10CSR025)

PROJECT GUIDE: Dr. R. R. RAJALAXMI
INTRODUCTION
DATA MINING
Data mining is the process of extracting knowledge from large amounts of data (Knowledge Discovery in Databases, KDD).
BASIC DATA MINING TASKS
Predictive: Classification, Regression, Time Series Analysis, Prediction
Descriptive: Clustering, Summarization, Association Rules, Sequence Discovery
CLASSIFICATION
Predicts categorical class labels.
Classifies data by constructing a model based on the training set.
An algorithm that implements classification is known as a classifier.
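As an illustration of the definition above, the following minimal Python sketch builds a model from a training set and then predicts a categorical class label. It assumes a nearest-centroid rule, chosen only for brevity and not taken from any of the surveyed papers; the function names and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Build the model: one mean vector (centroid) per class label."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, v in enumerate(row):
            sums[label][i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(model, row):
    """Predict the categorical label whose centroid is closest (Euclidean)."""
    return min(model, key=lambda label: dist(model[label], row))


# Hypothetical toy training set: two numeric features, two class labels
X_train = [[1.0, 1.2], [0.8, 1.0], [5.0, 5.2], [5.3, 4.9]]
y_train = ["healthy", "healthy", "disease", "disease"]
model = fit_centroids(X_train, y_train)
print(predict(model, [4.8, 5.0]))  # disease
```

Any other classifier (KNN, SVM, decision tree) follows the same two steps: fit a model on labelled training data, then map a new record to one of the known class labels.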
DIMENSIONALITY REDUCTION
FEATURE EXTRACTION: Linear, Non-linear
FEATURE SELECTION: Feature Ranking, Subset Selection
Feature selection approaches: Filter, Embedded, Wrapper
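To illustrate the filter approach to feature ranking, the sketch below scores each feature on its own, without consulting any classifier, and sorts features by that score. The scoring function (absolute Pearson correlation with a binary class label), the helper names and the toy data are assumptions for illustration; a wrapper approach would instead score candidate subsets by training a classifier on them.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no ranking information
    return cov / sqrt(va * vb)


def rank_features(X, y):
    """Filter approach: score every feature independently, best first."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical data: column 0 tracks the label, column 1 is constant
X = [[1, 7, 3], [2, 7, 1], [3, 7, 4], [4, 7, 2]]
y = [0, 0, 1, 1]
order, scores = rank_features(X, y)
print(order)  # [0, 2, 1]
```

Subset selection on top of such a ranking could simply keep the first k indices of `order`.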
LITERATURE SURVEY
Reducing bioinformatics data dimension with ABC-KNN
Feature Selection for medical diagnosis: Evaluation for cardiovascular diseases
Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients
PAPER-I
Reducing bioinformatics data dimension with ABC-KNN

Authors: Thananan Prasartvit, Anan Banharnsakun, Boonserm Kaewkamnerdpong, Tiranee Achalakul
Year: 2013
PROBLEM
Analyzing a large amount of data often consumes extensive computational resources and execution time.
Not all data features contribute equally to the end results, so the major contributing features need to be identified and features with low contribution can be eliminated.
The need for dimension reduction arises because biological data can be massive, with tens of thousands of features to be explored.
The objective is to design an effective algorithm that can selectively remove irrelevant dimensions from the data while preserving the semantics of the original data.
PROPOSED WORK
The paper proposes the Artificial Bee Colony (ABC) algorithm as a method for data dimension reduction in classification problems. The K-Nearest Neighbor (KNN) method is then used for fitness evaluation within the ABC framework, giving an ABC feature selection method wrapped with KNN for classification (ABC-KNN).

Artificial Bee Colony (ABC)
Begin:
    Initialize Solutions
    Repeat
        // Employed Bees Process
        Updating_Feasible_Solutions
        // Onlooker Bees Process
        Selecting_Feasible_Solutions
        Updating_Feasible_Solutions
        // Scout Bee Process
        Avoiding_Sub-Optimal_Solutions
    Until (maximum number of iterations or the stopping criterion is met)
End

K-Nearest Neighbor (KNN)
Begin:
    For i = 1 to number of training data items
        Store_data
    End For
    For j = 1 to number of testing data items
        Measure_distance
        Sort_by_distance
        Evaluate_data_class
    End For
End
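The two procedures above can be combined into a small runnable sketch of an ABC-KNN wrapper. Everything beyond the slide's step names is an assumption, not the paper's exact method: each food source is a binary feature mask, the ABC neighbourhood move is a single bit flip, the fitness is KNN validation accuracy minus a small penalty `alpha` per selected feature, and the names `knn_predict`, `fitness`, `abc_knn` and parameters `n_sources`, `cycles`, `limit` are hypothetical.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """KNN as on the slide: measure distances, sort by distance, majority vote."""
    ranked = sorted(zip((dist(x, query) for x in train_X), train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def project(rows, mask):
    """Keep only the columns whose bit in the feature mask is 1."""
    return [[v for v, bit in zip(row, mask) if bit] for row in rows]


def fitness(mask, data, k=3, alpha=0.01):
    """Assumed fitness: KNN validation accuracy minus a small subset-size penalty."""
    if not any(mask):
        return 0.0  # an empty subset carries no information
    train_X, train_y, val_X, val_y = data
    p_train, p_val = project(train_X, mask), project(val_X, mask)
    correct = sum(knn_predict(p_train, train_y, q, k) == t for q, t in zip(p_val, val_y))
    return correct / len(val_y) - alpha * sum(mask) / len(mask)


def neighbour(mask, rng):
    """Assumed binary ABC move: flip one randomly chosen feature bit."""
    new = mask[:]
    j = rng.randrange(len(new))
    new[j] = 1 - new[j]
    return new


def abc_knn(data, n_features, n_sources=6, cycles=40, limit=5, seed=0):
    """Hypothetical ABC loop over binary feature masks (food sources)."""
    rng = random.Random(seed)
    sources = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_sources)]
    fits = [fitness(s, data) for s in sources]
    trials = [0] * n_sources
    best_fit, best_mask = max(zip(fits, sources))

    def try_improve(i):
        nonlocal best_fit, best_mask
        cand = neighbour(sources[i], rng)
        f = fitness(cand, data)
        if f > best_fit:
            best_fit, best_mask = f, cand
        if f > fits[i]:  # greedy selection between old and new source
            sources[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(cycles):  # fixed cycle count as the stopping criterion
        for i in range(n_sources):  # employed bees: one update per source
            try_improve(i)
        weights = [f + 1.0 for f in fits]  # fitness >= -alpha, so weights stay positive
        for _ in range(n_sources):  # onlooker bees: fitness-proportional choice
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        for i in range(n_sources):  # scout bees: abandon exhausted sources
            if trials[i] > limit:
                sources[i] = [rng.randint(0, 1) for _ in range(n_features)]
                fits[i], trials[i] = fitness(sources[i], data), 0
                if fits[i] > best_fit:
                    best_fit, best_mask = fits[i], sources[i]
    return best_mask, best_fit


# Hypothetical toy data: feature 0 separates the classes; features 1-3 are
# noise that is deliberately misleading on the validation split.
train_X = [[0, 0, 0, 0], [1, 0, 1, 0], [2, 1, 0, 0],
           [100, 5, 5, 5], [101, 5, 4, 5], [102, 4, 5, 5]]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [[0.5, 5, 5, 5], [101.5, 0, 0, 0]]
val_y = [0, 1]
mask, score = abc_knn((train_X, train_y, val_X, val_y), n_features=4)
print(mask, score)
```

Because masks without feature 0 misclassify every validation point, while any mask containing it classifies both correctly, the size penalty should steer the search to the single informative feature.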
DATASETS DESCRIPTION
THE FLOWCHART OF ABC-KNN METHOD
RESULTS
[Figure: classification accuracy (%) by dataset for LS-SVM, PCA-FDA, MSDR-LGC, LLDE-KNN and ABC-KNN]
RESULTS (cont.)
CONCLUSION
The experimental results of the gene expression analysis show that the proposed method can effectively reduce the data dimension while maintaining high classification accuracy.
ABC-KNN can thus be employed to exclude nonessential data and to identify the vital elements in a vast amount of biological data.
PAPER-II
Feature Selection for medical diagnosis: Evaluation for cardiovascular diseases

Authors: Swati Shilaskar, Ashok Ghatol
Year: 2013
PROBLEM
To find a suitable algorithm that generates a smaller feature subset from high-dimensional data with improved diagnostic ability for cardiovascular diseases.
PROPOSED METHOD
FEATURE SELECTION METHODS
Forward Feature Inclusion
Back-elimination Feature Selection
Forward Feature Selection
DATA SETS DESCRIPTION

DATASET          NO. OF SAMPLES   NO. OF FEATURES   CATEGORIES
ARRHYTHMIA       452              279               16
SPECTF CARDIAC   267              44                2
HEART DISEASE    303              14 (used)         4
HYBRID MODEL OF FEATURE SELECTION PROCESS
FORWARD FEATURE INCLUSION ALGORITHM
Back-elimination Feature Selection
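A generic sequential backward elimination loop can be sketched as follows. This is an assumed textbook version, not necessarily the exact algorithm in the paper: `evaluate` is a hypothetical callable standing in for classifier accuracy on a feature subset, and `toy_evaluate` with its `USEFUL` set is made-up data used only to exercise the loop.

```python
def backward_elimination(features, evaluate):
    """Generic sequential backward elimination (assumed textbook form).

    Start from all features and repeatedly drop the feature whose removal
    gives the best score, stopping as soon as every removal lowers the score.
    """
    current = list(features)
    best_score = evaluate(current)
    while len(current) > 1:
        score, drop = max((evaluate([f for f in current if f != d]), d) for d in current)
        if score < best_score:
            break  # every possible removal hurts: keep the current subset
        current.remove(drop)
        best_score = score
    return current, best_score


# Hypothetical scorer standing in for classifier accuracy: rewards the
# made-up "useful" features and slightly penalises subset size.
USEFUL = {"f1", "f3", "f5"}


def toy_evaluate(subset):
    return len(USEFUL & set(subset)) / len(USEFUL) - 0.01 * len(subset)


subset, score = backward_elimination(["f1", "f2", "f3", "f4", "f5"], toy_evaluate)
print(subset, round(score, 2))  # ['f1', 'f3', 'f5'] 0.97
```

Each pass costs one evaluation per remaining feature, so with a real classifier the total cost grows roughly quadratically in the number of features.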
FORWARD FEATURE SELECTION
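Forward selection works in the opposite direction, starting from an empty subset. The sketch below is again an assumed generic version rather than the paper's hybrid algorithm; `evaluate`, `toy_evaluate` and `USEFUL` are hypothetical stand-ins for classifier accuracy on real data.

```python
def forward_selection(features, evaluate):
    """Generic sequential forward selection (assumed textbook form).

    Start from an empty subset and repeatedly add the feature that gives the
    best score, stopping when no addition improves on the current score.
    """
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        score, add = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best_score:
            break  # no remaining feature improves the subset
        selected.append(add)
        remaining.remove(add)
        best_score = score
    return selected, best_score


USEFUL = {"f1", "f3", "f5"}  # made-up "informative" features


def toy_evaluate(subset):
    """Hypothetical scorer standing in for classifier accuracy."""
    return len(USEFUL & set(subset)) / len(USEFUL) - 0.01 * len(subset)


selected, score = forward_selection(["f1", "f2", "f3", "f4", "f5"], toy_evaluate)
print(selected, round(score, 2))  # ['f5', 'f3', 'f1'] 0.97
```

Forward selection tends to be cheaper than backward elimination when the final subset is small, because early passes evaluate only tiny subsets.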
RESULTS
DATA SET         WITH ALL FEATURES            WITH PROPOSED FEATURE SELECTION ALGORITHM
                 No. of features  Accuracy    No. of features in subset  Accuracy
Arrhythmia       258              0.79        23                         0.88
SPECTF Cardiac   44               0.75        19                         0.78
Heart Disease    10               0.81        4                          0.85
CONCLUSION
Accuracy gives a proper estimate of classifier performance when the dataset is balanced; if the dataset is unbalanced, accuracy is found not to be a correct estimate of classifier performance.
The feature ranking methods investigated in this research work well for the arrhythmia and heart disease datasets.
The hybrid forward feature selection algorithm successfully reduces the feature dimension and improves classifier accuracy.
The highest accuracy is achieved when the forward selection algorithm is used.
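The point about accuracy on unbalanced data can be checked with a few lines of arithmetic. In this hypothetical example (made-up labels, not data from the paper), a classifier that always predicts the majority class still scores 95% accuracy while detecting none of the diseased cases; sensitivity, specificity and their mean (balanced accuracy) expose the problem. Balanced accuracy is shown as one common alternative, not as the metric the authors used.

```python
def rates(y_true, y_pred, positive=1):
    """Accuracy next to class-aware rates for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    balanced = (sensitivity + specificity) / 2
    return accuracy, sensitivity, specificity, balanced


# Hypothetical unbalanced test set: 95 healthy (0), 5 diseased (1),
# scored by a useless classifier that always answers "healthy".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(rates(y_true, y_pred))  # (0.95, 0.0, 1.0, 0.5)
```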
PAPER-III
Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients
Authors: Susana M. Vieira, Luís F. Mendonça, Gonçalo J. Farinha, João M.C. Sousa
Year: 2013
PROBLEM
The medical condition studied is sepsis, a common clinical condition defined by a whole-body inflammatory state called systemic inflammatory response syndrome (SIRS).
Sepsis has different degrees of severity and can progress to severe sepsis and later to septic shock.
PROPOSED METHOD
A modified binary particle swarm optimization (MBPSO) method for feature selection, with simultaneous optimization of the SVM kernel parameters
An enhanced version of BPSO, designed to cope with the premature convergence of the BPSO algorithm
MBPSO is used as a wrapper method

DATASET DESCRIPTION
NUMBER   DATABASE               SAMPLES     FEATURES   CLASSES
1        German (credit card)   1000        24         2
2        Sonar                  208         60         2
3        WBCO                   683 (699)   9          2
4        WPBC                   198         32         2
5        WDBC                   569         30         2
6        Colon Cancer           62          2000       2
MBPSO
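For orientation, the sketch below shows a common textbook form of binary PSO (sigmoid transfer function, inertia weight `w`, velocity clamping at `v_max`), the BPSO baseline that MBPSO enhances. It is not the MBPSO update from the paper and it omits the simultaneous SVM kernel parameter optimization; the parameter values, `binary_pso` and the `toy_fitness` target are assumptions for illustration. In a feature selection wrapper, each bit would mark a selected feature and `fitness` would be, for example, SVM accuracy on that subset.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(fitness, n_bits, n_particles=20, iterations=100,
               w=0.7, c1=2.0, c2=2.0, v_max=4.0, seed=0):
    """Textbook-style binary PSO that maximises `fitness` over 0/1 vectors."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    V = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * V[i][j]
                     + c1 * r1 * (pbest[i][j] - X[i][j])
                     + c2 * r2 * (gbest[j] - X[i][j]))
                V[i][j] = max(-v_max, min(v_max, v))  # velocity clamping
                # position update: the bit becomes 1 with probability sigmoid(v)
                X[i][j] = 1 if rng.random() < sigmoid(V[i][j]) else 0
            f = fitness(X[i])
            if f > pbest_fit[i]:  # personal best, then global best
                pbest[i], pbest_fit[i] = X[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = X[i][:], f
    return gbest, gbest_fit


# Hypothetical toy fitness: number of bits that match a made-up target mask
TARGET = [1, 0, 1, 0, 0, 1]


def toy_fitness(mask):
    return sum(a == b for a, b in zip(mask, TARGET))


mask, score = binary_pso(toy_fitness, n_bits=6)
print(mask, score)
```

The sigmoid mapping is what makes the swarm binary: velocities stay real-valued, but each position bit is resampled from them at every step.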
RESULTS
RESULTS (cont.)
CONCLUSION
MBPSO performs better than the other PSO-based methods (BPSO, IBPSO) and gives similar or better results than GA.
[Figure: accuracy (%) per database (German, Sonar, WBCO, WPBC, WDBC, Colon) for NO-FS, BPSO, IBPSO, GA and MBPSO]
FUTURE WORK
Future work will test the introduced algorithm (MBPSO) on other medical databases in order to compare its performance with other feature selection techniques more consistently.
FINDINGS FROM THE LITERATURE SURVEY

MBPSO or ABC-KNN can be applied to heart disease databases to improve diagnostic accuracy for heart disease.
Hybrid models such as PSO-KNN, GA-KNN, or ABC combined with other classification algorithms can be developed and applied to these databases to find feature subsets more efficiently.
REFERENCES
[1] Thananan Prasartvit, Anan Banharnsakun, Boonserm Kaewkamnerdpong, Tiranee Achalakul, "Reducing bioinformatics data dimension with ABC-kNN," Neurocomputing 116 (2013), 367-381.
[2] Swati Shilaskar, Ashok Ghatol, "Feature selection for medical diagnosis: Evaluation for cardiovascular diseases," Expert Systems with Applications 40 (2013), 4146-4153.
[3] Susana M. Vieira, Luís F. Mendonça, Gonçalo J. Farinha, João M.C. Sousa, "Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients," Applied Soft Computing 13 (2013), 3494-3504.