
4th International Conference on System Modeling & Advancement in Research Trends (SMART)

College of Computing Sciences and Information Technology (CCSIT), Teerthanker Mahaveer University, Moradabad [2015]

A Systematic Review of Classification Techniques and Implementation of ID3 Decision Tree Algorithm
Arohi Gupta1, Surbhi Gupta2, Deepika Singh3
1Research Scholar, College of Computing Sciences & Information Technology, TMU, Moradabad, India
2Research Scholar, College of Computing Sciences & Information Technology, TMU, Moradabad, India
3Assistant Professor, College of Computing Sciences & Information Technology, TMU, Moradabad, India
1arohig.gupta@gmail.com, 2surbhigupta1908@gmail.com, 3deep.16feb84@gmail.com
Abstract: Data mining is a knowledge discovery process that analyzes data and generates useful information and patterns from it, which assist in decision making in an organization. Classification is a supervised learning technique of data mining, which consists of a set of predefined classes; on the basis of these predefined classes, new objects are classified. Classification classifies data based on a training dataset, generates a classifier or model, and uses it in classifying new data. In this research paper, we discuss the classification techniques proposed in the literature, and a detailed study of the decision tree based data mining algorithms ID3 and C4.5 has been done. We also present a comparative study of various classification algorithms along with their advantages and disadvantages.

Keywords: Data Mining, Classification, Decision Tree, Neural Network, K-Nearest Neighbor, Naive Bayesian

I. INTRODUCTION

The development of information technology has generated a large number of databases and huge amounts of data in various areas. The research in databases and information technology has given rise to an approach to store and manipulate this precious data for further decision making. Data mining refers to extracting or mining knowledge from large amounts of data; in other words, data mining is a process of extraction of useful information and patterns from huge data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction or data pattern analysis [1][9]. Classification is one of the supervised learning techniques for mining knowledge from a vast amount of data. In classification we find a model that describes and distinguishes data classes, using a training dataset whose class labels are known. This model can be used to predict the class of objects whose class label is unknown. In the literature the classification method is subdivided into a number of techniques for classifying data with the correct class labels. Some of these techniques, as proposed by researchers, are: Decision tree based method, Bayesian classifiers, Neural network based classifiers, Lazy learners, Support vector machines, and Rule based method [1][25].

The decision tree [34] based classification method is a graphical representation of the data point attributes and is one of the simplest methods for building a classifier model. A decision tree is represented using nodes, branches and leaves, where each node denotes a test, each branch represents an outcome of the test and the leaves represent classes. This tree can be converted to classification rules [8]. Another method of classification, the Naive Bayes classifier [35], is a simple probabilistic classification method based on applying Bayes theorem with strong independence assumptions. The model based on this classifier would be more precisely called an independent feature model [10]. Classification can also be carried out using a Neural Network [36], or artificial neural network, which is inspired by biological neural systems and detects patterns and makes predictions [3]. Another approach for classification proposed in the literature is that of lazy learners [33]. K-Nearest Neighbor is a type of instance-based learning, or lazy learning algorithm, which classifies objects based on the closest training examples in the feature space. In the k-nearest neighbor algorithm, the function is only approximated locally and all computation is deferred until classification [11]. One of the strongest methods for building classifiers is the SVM (Support Vector Machine). Support Vector Machines [31] can classify both

linear and non-linear data. They can transform the original training data into higher dimensions by using non-linear mapping [8]. The rule based classification [37] technique uses a collection of if-then rules for classifying the dataset [25].

In this paper our aim is to review the state of the art of the existing classification algorithms and to present the advantages and disadvantages of the various classification algorithms so as to make a comparison among them. The rest of this research paper is organized as follows: the second section consists of an overview of different classification techniques. In the third section, a detailed study of the ID3 and C4.5 algorithms is given. The fourth section is a comparative study of classification algorithms. Section five is the conclusion and future aspects for proposing an efficient classification algorithm on the basis of the already proposed classification algorithms.

II. STATE OF ART OF THE CLASSIFICATION TECHNIQUES

In classification a model or classifier is constructed to predict categorical labels. Consider, for example, loan application data which can help a bank loan officer to analyze whether a loan applicant is safe or risky for the bank. The categorical class labels for the loan application data are "safe" or "risky". Data classification is a two-step process. In the first, learning step, a classifier or model is built using a training dataset whose class labels are known. In the second step, the model is used for classification, where the accuracy of the classifier is estimated using a test dataset [8][1]. If the accuracy is considered acceptable, the classifier can be used to classify future data tuples whose class label is unknown. Some typical applications of classification are target marketing, medical diagnosis, credit approval and fraud detection [8].

As we have stated, a number of classification techniques have been proposed in the literature. To start with the description of the taxonomy, we have explained our proposal in Fig. 1, where we have categorized the classification algorithms. Mainly the classification process is divided into six different categories, which are named as Decision tree based method, Bayesian classifiers, Neural network based classifiers, Lazy learners, Support vector machines, and Rule based method [1][25].

Fig. 1 Proposed taxonomy for the classification algorithms

A. Decision Tree

A decision tree is a classifier that can be viewed as a tree where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. It is a flowchart-like structure. The topmost node in a tree is the root node. Decision trees can easily be converted to classification rules [8]. In data mining, decision tree structures are a common way to organize classification schemes. Classification using a decision tree is performed by routing from the root node until arriving at a leaf node [12]. The algorithms used for decision trees are ID3 [13], C4.5 [32], C5.0 [5] and CART [38].

Fig. 2 Decision Tree Model


1) Decision Tree Algorithm

Generate_decision_tree: generate a decision tree from the training tuples of data partition D.

Input: Data partition D, which is a set of training tuples and their associated class labels; attribute_list; Attribute_selection_method.

Output: A decision tree.

Method:
1. Create a node N.
2. If the tuples in D are all of the same class C, then return N as a leaf node labelled with the class C.
3. If attribute_list is empty, then return N as a leaf node labelled with the majority class in D.
4. Apply Attribute_selection_method(D, attribute_list) to find the best splitting criterion.
5. Label node N with the splitting criterion.
6. If the splitting attribute is discrete-valued and multiway splits are allowed, then remove the splitting attribute from attribute_list.
7. For each outcome j of the splitting criterion:
   a. Let Dj be the set of data tuples in D satisfying outcome j.
   b. If Dj is empty, then attach a leaf labelled with the majority class in D to node N;
   c. Else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N.
8. Return N [8].

The recursive partitioning stops only when any one of the following terminating conditions is true:
- All of the tuples in partition D belong to the same class [8]; or
- There are no remaining attributes on which the tuples may be further partitioned. In this case, majority voting is employed. This involves converting node N into a leaf and labeling it with the most common class in D. Alternatively, the class distribution of the node tuples may be stored [8]; or
- There are no tuples for a given branch, that is, a partition Dj is empty. In this case, a leaf is created with the majority class in D [8].
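The paper states that ID3 was implemented in Java; the authors' code is not reproduced here, so the following is a minimal, hypothetical Java sketch of the recursive procedure above. The class and method names (TreeGrower, generateTree) are our own, and the attribute selection measure is passed in as a function so that the information gain of Section III, or any other criterion, can play the role of Attribute_selection_method.

import java.util.*;
import java.util.function.BiFunction;

// Minimal sketch of the generic tree-growing procedure described above.
// A tuple is a String[] of attribute values; its class label is kept in a parallel list.
public class TreeGrower {

    static class Node {                            // decision node or leaf
        String label;                              // class label (leaves) or splitting attribute name
        Map<String, Node> children = new HashMap<>();
        Node(String label) { this.label = label; }
    }

    // attrs: indices of the remaining candidate attributes; attrNames: their names
    // selector: the "Attribute_selection_method" of the pseudocode, returns a column index
    static Node generateTree(List<String[]> data, List<String> labels,
                             List<Integer> attrs, String[] attrNames,
                             BiFunction<List<String[]>, List<Integer>, Integer> selector) {
        // step 2: all tuples of the same class -> leaf with that class
        if (new HashSet<>(labels).size() == 1) return new Node(labels.get(0));
        // step 3: attribute list empty -> leaf with the majority class
        if (attrs.isEmpty()) return new Node(majority(labels));
        int best = selector.apply(data, attrs);        // steps 4-5: splitting attribute
        Node node = new Node(attrNames[best]);
        List<Integer> remaining = new ArrayList<>(attrs);
        remaining.remove(Integer.valueOf(best));       // step 6: multiway split, drop the attribute
        // step 7: group the tuples by their value of the splitting attribute
        Map<String, List<Integer>> partitions = new HashMap<>();
        for (int i = 0; i < data.size(); i++)
            partitions.computeIfAbsent(data.get(i)[best], k -> new ArrayList<>()).add(i);
        for (Map.Entry<String, List<Integer>> e : partitions.entrySet()) {
            List<String[]> dj = new ArrayList<>();
            List<String> lj = new ArrayList<>();
            for (int i : e.getValue()) { dj.add(data.get(i)); lj.add(labels.get(i)); }
            // step 7b (empty partition -> majority-class leaf) does not arise here, because the
            // partitions are built only from values actually present in D
            node.children.put(e.getKey(), generateTree(dj, lj, remaining, attrNames, selector));
        }
        return node;                                   // step 8
    }

    static String majority(List<String> labels) {
        Map<String, Long> counts = new HashMap<>();
        labels.forEach(l -> counts.merge(l, 1L, Long::sum));
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}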
B. Naive Bayesian

Bayesian classifiers are statistical classifiers that can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes theorem. The naive Bayes method is also called idiot's Bayes, simple Bayes, and independence Bayes. It is very easy to construct, not needing any complicated iterative parameter estimation schemes [4]. Naive Bayes classifiers use all the attributes and are based on the following two assumptions:
1. Attributes are equally important.
2. Attributes are statistically independent, i.e., class conditional independence: knowing the value of one attribute says nothing about the value of another.

The naive Bayesian classifier works as follows. Let D be a training set of tuples and their associated class labels. Each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn). Suppose there are m classes C1, C2, ..., Cm. Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X). This can be derived from Bayes theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)    Eq. 1

Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized:

P(Ci|X) ∝ P(X|Ci) P(Ci)    Eq. 2

Naive Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero. To avoid this problem, the Laplacian correction or Laplace estimator technique is used. The corrected probability estimates will be close to their uncorrected counterparts, yet the zero probability value will be avoided [8].
C. Neural Network

An artificial neural network (ANN), also called a neural network (NN), is one of the newest


signal processing technologies. It is a mathematical or computational model based on biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase [1]. After the training is complete, the parameters are fixed. If there is a lot of data and the problem is poorly understood, then an ANN model is accurate; the non-linear characteristics of an ANN give it a lot of flexibility to achieve an input-output mapping [3].

Fig. 3 Example of an unacceptable low-resolution image

The components of an ANN are the neuron (also called node or unit), input links, output links, and weights. Each unit performs a simple process:
1. Receives n inputs
2. Multiplies each input by its weight
3. Applies an activation function to the sum of the results
4. Outputs the result
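The per-unit computation in this list can be sketched in a few lines of Java; the sigmoid activation and the bias term used below are assumptions, since the text does not fix a particular activation function.

// Minimal sketch of the per-unit computation listed above: a weighted sum of the
// inputs followed by an activation function (a sigmoid is assumed here).
public class NeuronSketch {
    public static double activate(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) sum += inputs[i] * weights[i]; // steps 2 and 3
        return 1.0 / (1.0 + Math.exp(-sum));                                   // step 4: output
    }
    public static void main(String[] args) {
        System.out.println(activate(new double[]{1.0, 0.5}, new double[]{0.8, -0.4}, 0.1));
    }
}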
D. K-Nearest Neighbor

Nearest neighbor classifiers are based on learning by analogy, in which a given test tuple is compared with training tuples that are similar to it. All training tuples are stored in an n-dimensional pattern space, because each tuple represents a point in an n-dimensional space [1][8]. When given an unknown tuple, a k-nearest neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k nearest neighbors of the unknown tuple. Closeness is defined in terms of a distance metric, such as Euclidean distance [6][1]. The Euclidean distance between two points or tuples, say X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n), is

dist(X1, X2) = sqrt( Σi (x1i - x2i)^2 )    Eq. 3

We normalize the values of each attribute before using Eq. 3 (min-max normalization). In k-nearest neighbor classification, the unknown tuple is assigned the most common class among its k nearest neighbours [8]. For categorical attributes we compare the corresponding value of the attribute in tuple X1 with that in tuple X2. If the two are identical, the difference between the two is taken as 0; if the two are different, the difference is considered to be 1 [8].
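A minimal, hypothetical Java sketch of this procedure is given below: it computes the Euclidean distance of Eq. 3 and assigns the unknown tuple the most common class among its k nearest neighbours. It assumes numeric attributes that have already been min-max normalized; categorical attributes would instead contribute the 0/1 differences described above.

import java.util.*;

// Minimal sketch of k-nearest-neighbour classification with the Euclidean distance of Eq. 3.
public class KnnSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);                          // Eq. 3
    }

    public static String classify(double[][] train, String[] labels, double[] query, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // sort the training tuples by distance to the unknown tuple
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(train[i], query)));
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k && i < idx.length; i++)
            votes.merge(labels[idx[i]], 1, Integer::sum);
        // the unknown tuple gets the most common class among its k nearest neighbours
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}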
E. Support Vector Machines

Support Vector Machines (also known as maximum margin classifiers) simultaneously minimize the empirical classification error and maximize the geometric margin [23]; hence they do not depend on the dimensionality of the feature space and can therefore efficiently handle high dimensional data [22][23]. They are based on structural risk minimization [23]. The basic concept of structural risk minimization is to find the hypothesis for which the lowest true error is guaranteed [22]. SVM has strong regularization properties; regularization refers to the generalization of the model to new data. Support vector machines were designed as a tool to solve supervised learning classification problems [29][30]. SVMs map the input vectors to a higher dimensional space where a maximal separating hyperplane is constructed. Two parallel hyperplanes are constructed on each side of the hyperplane that separates the data. The separating hyperplane is the hyperplane that maximizes the distance between the two parallel hyperplanes. An assumption is made that the larger the margin, or distance between these parallel hyperplanes, the better the generalization error of the classifier will be [23][31].

F. Rule Based Method

The rule based classification technique uses a collection of if-then rules for classifying the dataset [25]. For example, consider the rule R1 given as


R1: IF Manual Checkup = Pass AND Year = Valid THEN Issue = Yes

The IF part of a rule is called the rule antecedent or precondition, and the THEN part is the rule consequent. In the above rule we are predicting whether a pollution under control certificate will be issued for a vehicle. If all the attribute tests in the rule (i.e., Manual Checkup and Year) hold true for a given tuple, we say that the rule is satisfied and that the rule covers the tuple. There are two parameters for assessing a rule R, defined as the rule coverage and the rule accuracy [8].

Consider a dataset D where |D| denotes the number of tuples in D. Let ncovers be the number of tuples covered by rule R and ncorrect be the number of tuples correctly classified by R. Then the coverage and accuracy are defined as

coverage(R) = ncovers / |D|    Eq. 4

accuracy(R) = ncorrect / ncovers    Eq. 5
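The two measures can be computed directly once the rule antecedent and consequent are expressed as predicates. The following hypothetical Java sketch evaluates rule R1 over a small invented three-tuple dataset (the attribute key MC stands for Manual Checkup); it is only meant to show how Eq. 4 and Eq. 5 are applied, not to reproduce any result from the paper.

import java.util.*;
import java.util.function.Predicate;

// Minimal sketch of the rule coverage and accuracy measures defined above.
// A tuple is a map from attribute name to value; rule R1 from the text is the example.
public class RuleQuality {
    public static void main(String[] args) {
        // antecedent and consequent of rule R1
        Predicate<Map<String, String>> antecedent =
                t -> "Pass".equals(t.get("MC")) && "Valid".equals(t.get("Year"));
        Predicate<Map<String, String>> consequent = t -> "Yes".equals(t.get("Issue"));

        // hypothetical three-tuple dataset D, for illustration only
        List<Map<String, String>> d = List.of(
                Map.of("MC", "Pass", "Year", "Valid",   "Issue", "Yes"),
                Map.of("MC", "Pass", "Year", "Invalid", "Issue", "No"),
                Map.of("MC", "Fail", "Year", "Valid",   "Issue", "No"));

        long ncovers  = d.stream().filter(antecedent).count();                  // tuples covered by R1
        long ncorrect = d.stream().filter(antecedent).filter(consequent).count();
        System.out.println("coverage = " + (double) ncovers / d.size());        // Eq. 4
        System.out.println("accuracy = " + (double) ncorrect / ncovers);        // Eq. 5
    }
}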
III. DECISION TREE BASED ALGORITHM

As stated in Section II, here we show the implementation of the decision tree based algorithm ID3 in Java and calculate the information gain and entropy. Further in this section, we give a brief description of C4.5 and C5.0.

A. ID3

ID3 uses information gain as its attribute selection measure. ID3 (Iterative Dichotomiser 3) is a decision tree learning algorithm which is used for the classification of objects with an iterative inductive approach. It uses a greedy top-down search to build the tree which will decide the decision rules [16][14]. Let D be a training database of tuples associated with a node N. The attribute with the highest information gain is chosen as the splitting attribute for node N. This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or impurity in these partitions. Such an approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple, but not necessarily the simplest, tree is found [8]. The expected information needed to classify a tuple in D is given by [8]

Info(D) = - Σi pi log2(pi)    Eq. 6

Info(D) is also known as the entropy of D. InfoA(D) is the expected information required to classify a tuple from D based on the partitioning by A and is given by [8]

InfoA(D) = Σj (|Dj| / |D|) × Info(Dj)    Eq. 7

The smaller the expected information required, the greater the purity of the partitions. Information gain is defined as the difference between the original information requirement and the new requirement [8]:

Gain(A) = Info(D) - InfoA(D)    Eq. 8

Experiments to evaluate the performance of the algorithm with continuous valued attributes and missing attribute values reveal that ID3 does not give acceptable results for continuous valued attributes and works well in certain data sets with missing values [14][15].

The entropy and information gain have been calculated for the data shown in the table below [16]. Fig. 4 is a screenshot of the calculated values.

TABLE I
DATABASE D

Name | Fuel   | Category | Kilometers  | Service | Year    | Manual Checkup (MC) | Issue
Riva | Petrol | Two      | Not Covered | No      | Valid   | Pass                | Yes
Sita | Petrol | Two      | Covered     | Yes     | Invalid | Pass                | No
Puru | Petrol | Four     | Covered     | No      | Valid   | Fail                | No
Riya | Petrol | Four     | Covered     | Yes     | Valid   | Pass                | Yes
Neha | Diesel | Four     | Covered     | No      | Invalid | Fail                | No
Ram  | Diesel | Three    | Covered     | No      | Valid   | Fail                | No
Ekta | Diesel | Three    | Not Covered | No      | Valid   | Pass                | Yes
Saya | CNG    | Three    | Not Covered | No      | Valid   | Pass                | Yes
Ajay | Petrol | Four     | Covered     | Yes     | Invalid | Fail                | No


Fig. 4 Calculated Entropy and Information Gain for Database D

The decision tree for this database, based on the calculated entropy and information gain, is as shown in Fig. 5.

Fig. 5 Decision Tree for Database D

B. C4.5

The information gain measure is biased toward tests with many outcomes. C4.5 [24], a successor of ID3, uses an extension of information gain known as the gain ratio (attribute selection measure), which attempts to overcome this bias [8]. When all attributes are binary, the gain ratio criterion has been found to give considerably smaller decision trees [13]. It applies a kind of normalization to information gain using a split information value defined analogously to Info(D) as

SplitInfoA(D) = - Σj (|Dj| / |D|) × log2(|Dj| / |D|)    Eq. 9

It differs from information gain, which measures the information with respect to classification that is acquired based on the same partitioning. The gain ratio is defined as

GainRatio(A) = Gain(A) / SplitInfoA(D)    Eq. 10

The attribute with the maximum gain ratio is selected as the splitting attribute [8].
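For completeness, a small hypothetical Java sketch of Eq. 9 and Eq. 10 is given below; it assumes that Gain(A) has already been computed, for example with the ID3 sketch shown earlier.

import java.util.*;

// Minimal sketch of the gain-ratio computation of Eq. 9-10.
public class GainRatioSketch {

    static double splitInfo(List<String> attr) {                  // SplitInfo_A(D), Eq. 9
        Map<String, Integer> counts = new HashMap<>();
        attr.forEach(v -> counts.merge(v, 1, Integer::sum));
        double split = 0;
        for (int c : counts.values()) {
            double p = (double) c / attr.size();
            split -= p * (Math.log(p) / Math.log(2));
        }
        return split;
    }

    static double gainRatio(double gain, List<String> attr) {     // Eq. 10
        // undefined when the attribute has a single value (SplitInfo = 0)
        return gain / splitInfo(attr);
    }
}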
2. It gives the acknowledge on noise and missing
data [5].
3. Problem of over fitting and error pruning is
solved by the C5.0 algorithm [5].
4. C5.0 classifier can anticipate which attributes
are relevant and which are not relevant in
classification [5].
IV. COMPARATIVE ANALYSIS OF CLASSIFICATION
ALGORITHMS
In this section we studied advantages and
disadvantages of various classification algorithms.

Fig. 5 Decision Tree for Database D

B. C4.5
The information gain measure is biased toward
tests with many outcomes. C4.5 [24], a successor of
ID3, uses an extension to information gain known
as gain ratio (attribute selection measure), which
attempts to overcome this bias [8]. When all
attributes are binary, the gain ratio criterion has
been found to give considerably smaller decision
trees [13]. It applies a kind of normalization to
information gain using a split information value
defined analogously with Info(D) as
Eq. 9


TABLE III
ADVANTAGES AND DISADVANTAGES OF CLASSIFICATION ALGORITHMS

ID3
  Advantages: Very simple [21]. Easy to implement. Quite a simple process. Running time increases only linearly with the complexity of the problem.
  Disadvantages: Does not guarantee an optimal solution. Does not give acceptable results for continuous data and missing data [15]. Takes more memory. Has a long searching time. Data may be over-fitted or over-classified [21].

C4.5
  Advantages: Avoids overfitting the data [2]. Faster than ID3 [2]. More memory efficient than ID3 [2]. Handles missing and continuous attributes [2]. Improved computational efficiency.
  Disadvantages: Empty branches [21]. Insignificant branches [21]. Susceptible to noise [21]. Determining how deeply to grow the decision tree remains an open issue.

C5.0
  Advantages: Faster than C4.5 [5][18]. Uses less memory than C4.5 during ruleset construction [18]. Gets similar results with a smaller decision tree [5]. Supports boosting, which improves the trees and gives more accuracy. C5.0 rulesets are easier to understand [19]. Lower error rates on unseen cases [5][18]. Solves the problem of overfitting [5].
  Disadvantages: For applications with very many cases, C5.0 may crash with a message like "segmentation fault" [19]. The use of case weighting does not guarantee that the classifier will be more accurate for unseen cases with higher weights [19].

Neural Network
  Advantages: High tolerance to noisy data. Ability to classify untrained patterns. Well suited for continuous-valued inputs and outputs. Successful on a wide array of real-world data. Algorithms are inherently parallel. Techniques have recently been developed for the extraction of rules from trained neural networks.
  Disadvantages: Long training time. Requires a number of parameters typically best determined empirically. Poor interpretability.

K-Nearest Neighbor
  Advantages: Easy to understand [7][17]. Easy to implement [7][17]. Training is very fast [7]. Robust to noisy training data [7]. Particularly well suited for multimodal classes as well as applications in which an object can have many class labels [17].
  Disadvantages: Memory limitation [7]. Being a lazy supervised learning algorithm, it runs slowly [7]. Expensive, particularly for large training sets [17].

Naive Bayesian
  Advantages: Simple to construct the classifier. Requires a small amount of training data to estimate the parameters necessary for classification. Even if the naive Bayes assumptions do not hold, a naive Bayes classifier still often performs surprisingly well in practice. Good performance [7].
  Disadvantages: Requires a very large number of records to obtain good results [7]. It is instance-based or lazy in that it stores all of the training samples [7].

Support Vector Machines
  Advantages: Can efficiently handle non-linear data. Can handle multi-class problems. By introducing the kernel, SVMs gain flexibility in the choice of the form of the threshold separating solvent from insolvent companies [26]. No assumptions about the functional form of the transformation [26]. Provide a good out-of-sample generalization [26]. Deliver a unique solution, since the optimality problem is convex [26].
  Disadvantages: The marginal contribution of each financial ratio to the score is variable [26]. Lack of transparency of results [26]. The choice of the kernel [27]. Extension to multiclass problems [28]. Long training time [28]. Selection of parameters [28].

Rule Based Method
  Advantages: Easy for people to understand [39][25]. Rule learning systems outperform decision tree learners on many problems [25][40][20].
  Disadvantages: When data contains uncertainty, the algorithm cannot process the uncertainty properly [25].


V. CONCLUSIONS

Data mining is a wide area that integrates techniques from various fields. These techniques can be based on a supervised or an unsupervised learning method. One of the supervised learning based methods, called classification, for mining data patterns has been reviewed in this paper. The important task of the classification process is to classify new and unseen samples correctly. These classification algorithms can be implemented on different types of data sets, such as patient data, financial data, and student data. Each technique has its own pros and cons, as given in the paper; based on the conditions at hand, the appropriate technique can be selected. This paper deals with the various classification techniques used in data mining, and a detailed study on the ID3 and C4.5 decision tree based algorithms has been conducted.

REFERENCES

[1] S. Neelamegam, E. Ramaraj, "Classification algorithm in Data mining: An Overview," International Journal of P2P Network Trends and Technology (IJPTT), Volume 4, Issue 8, Sep 2013.
[2] A. S. Galathiya, A. P. Ganatra, C. K. Bhensdadia, "Classification with an improved Decision Tree Algorithm," International Journal of Computer Applications (0975-8887), Volume 46, No. 23, May 2012.
[3] Nikita Jain, Vishal Srivastava, "Data mining techniques: a survey paper," IJRET: International Journal of Research in Engineering and Technology, eISSN: 2319-1163, pISSN: 2321-7308.
[4] Raj Kumar, Rajesh Verma, "Classification Algorithms for Data Mining: A Survey," International Journal of Innovations in Engineering and Technology (IJIET), Vol. 1.
[5] Rutvija Pandya, Jayati Pandya, "C5.0 Algorithm to Improved Decision Tree with Feature Selection and Reduced Error Pruning," International Journal of Computer Applications (0975-8887), Volume 117, No. 16, May 2015.
[6] Thair Nu Phyu, "Survey of Classification Techniques in Data Mining," Proceedings of the International MultiConference of Engineers and Computer Scientists 2009, Vol. I, IMECS 2009, March 18-20, 2009, Hong Kong.
[7] S. Archana, K. Elangovan, "Survey of Classification Techniques in Data Mining," International Journal of Computer Science and Mobile Applications, Vol. 2, Issue 2, February 2014, pp. 65-71, ISSN: 2321-8363.
[8] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann, 2006.
[9] Ed Colet, "Clustering and Classification: Data Mining Approaches."
[10] Kavitha Murugeshan, Neeraj RK, "Discovering Patterns to Produce Effective Output through Text Mining Using Naive Bayesian Algorithm," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume 2, Issue 6, May 2013.
[11] Dorina Kabakchieva, "Predicting Student Performance by Using Data Mining Methods for Classification," Bulgarian Academy of Sciences, Cybernetics and Information Technologies, Volume 13, No. 1, Sofia, 2013, Print ISSN: 1311-9702, Online ISSN: 1314-4081, DOI: 10.2478/cait-2013-0006.
[12] "Decision Trees: What Are They?" in Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner.
[13] J. R. Quinlan, "Induction of Decision Trees," Machine Learning 1: 81-106, 1986, Kluwer Academic Publishers, Boston.
[14] Anand Bahety, "Extension and Evaluation of ID3 Decision Tree Algorithm," University of Maryland, College Park.
[15] Rupali Bhardwaj, Sonia Vatta, "Implementation of ID3 Algorithm," International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 6, June 2013, ISSN: 2277-128X.
[16] Rupali Bhardwaj, Sonia Vatta, "Issuing of Pollution Under Control Certificate using ID3 algorithm," International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 5, May 2013, ISSN: 2277-128X.
[17] Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, Dan Steinberg, "Top 10 algorithms in data mining," Knowledge and Information Systems (2008) 14:1-37, DOI 10.1007/s10115-007-0114-2.
[18] The RuleQuest Research website. [Online]. Available: http://rulequest.com/see5-comparison.html
[19] The RuleQuest Research website. [Online]. Available: http://www.rulequest.com/see5-unix.html
[20] S. M. Weiss and N. Indurkhya, "Reduced complexity rule induction," in IJCAI, 1991, pp. 678-684.
[21] Sonia Singh, Priyanka Gupta, "Comparative Study of ID3, CART and C4.5 Decision Tree Algorithm: A Survey," International Journal of Advanced Information Science and Technology (IJAIST), Vol. 27, No. 27, July 2014.
[22] Thorsten Joachims, "Text categorization with Support Vector Machines: Learning with many relevant features," 10th European Conference on Machine Learning, Chemnitz, Germany, Vol. 1398, April 21-23, 1998, Proceedings, pp. 137-142.
[23] Durgesh K. Srivastava, Lekha Bhambhu, "Data Classification Using Support Vector Machine," Journal of Theoretical and Applied Information Technology, 2005-2009, JATIT.
[24] J. R. Quinlan, "Improved Use of Continuous Attributes in C4.5," Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 77-90.
[25] Biao Qin, Yuni Xia, Sunil Prabhakar, Yicheng Tu, "A Rule-Based Classification Algorithm for Uncertain Data," IEEE International Conference on Data Engineering.
[26] Laura Auria, Rouslan A. Moro, "Support Vector Machines (SVM) as a Technique for Solvency Analysis," Berlin, August 2008.
[27] Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Kluwer Academic Publishers, Boston.
[28] Shigeo Abe, Support Vector Machines for Pattern Classification.
[29] Himani Bhavsar, Mahesh H. Panchal, "A Review on Support Vector Machine for Data Classification," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 1, Issue 10, December 2012.
[30] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, 2000.
[31] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[32] J. R. Quinlan, "Improved Use of Continuous Attributes in C4.5," Journal of Artificial Intelligence Research 4 (1996), pp. 77-90, submitted 10/95, published 3/96.
[33] T. G. Dietterich, "Ensemble methods in machine learning," Lecture Notes in Computer Science, vol. 1857, pp. 1-15, 2000.
[34] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
[35] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in National Conf. on Artificial Intelligence, 1992, pp. 223-228.
[36] R. Andrews, J. Diederich, and A. Tickle, "A survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge Based Systems, vol. 8, no. 6, pp. 373-389, 1995.


[37] W. W. Cohen, "Fast effective rule induction," in Proc. of the 12th Intl. Conf. on Machine Learning, 1995, pp. 115-123.
[38] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Belmont, CA: Wadsworth, 1984.
[39] J. Catlett, "Megainduction: A test flight," in ML, 1991, pp. 596-599.
[40] G. Pagallo and D. Haussler, "Boolean feature discovery in empirical learning," Machine Learning, vol. 5, pp. 71-99, 1990.

