
Machine Learning

Machine Learning is the systematic study of algorithms and systems that improve their knowledge or
performance (i.e., learn a model for accomplishing a task) with experience (from available data/examples).
Examples:
 Given a URL, decide whether it is a sports website or not
 Given that a buyer is buying a book at an online store, suggest some related products for that
buyer
 Given an ultrasound image of an abdominal scan of a pregnant woman, predict the weight of the
baby

Unlike humans, who learn from past experiences, a computer does not have “experiences”.

Instead, a computer system learns from data, which represent some “past experiences” of an
application domain.

Objective of machine learning: learn a target function that can be used to predict the
values of a discrete class attribute, e.g., approved or not-approved, and high-risk or
low-risk.

This task is commonly called supervised learning, classification, or inductive learning.

Supervised Learning

 The computer is presented with example inputs and their desired outputs, given by a "teacher",
and the goal is to learn a general rule that maps inputs to outputs.
 Supervised learning is a machine learning technique for learning a function from training
data.
 The training data consist of pairs of input objects (typically vectors), and desired outputs.
The output of the function can be a continuous value (called regression), or can predict a
class label of the input object (called classification).
 The task of the supervised learner is to predict the value of the function for any valid
input object after having seen a number of training examples (i.e. pairs of input and target
output).
 To achieve this, the learner has to generalize from the presented data to unseen situations
in a "reasonable" way.
 Another term for supervised learning is classification.
 Classifier performance depends greatly on the characteristics of the data to be classified.
There is no single classifier that works best on all given problems.
 Determining a suitable classifier for a given problem is, however, still more an art than a
science.
 The most widely used classifiers are the Neural Network (Multi-Layer Perceptron),
Support Vector Machines, k-Nearest Neighbors, Gaussian Mixture Models, Gaussian
Naive Bayes, Decision Trees, and RBF classifiers.

Supervised learning process: two steps



Learning (training): Learn a model using the training data

Testing: Test the model using unseen test data to assess the model accuracy

Accuracy = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}
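
To make the two-step process concrete, here is a minimal Python sketch using scikit-learn; the dataset is synthetic and purely illustrative (the notes do not prescribe any particular library or data).

```python
# A minimal sketch of the two-step supervised learning process
# using scikit-learn; the dataset is synthetic and illustrative,
# not from the notes.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the "past experiences" of a domain
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold out unseen test data so accuracy is assessed fairly
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 1 (learning/training): learn a model from the training data
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2 (testing): Accuracy = correct classifications / total test cases
correct = (model.predict(X_test) == y_test).sum()
print("Accuracy:", correct / len(y_test))
```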
Decision Tree Representation / Learning by Decision Tree
 Decision tree induction is the learning of decision trees from class-labeled training tuples.
 A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node)
denotes a test on an attribute, each branch represents an outcome of the test, and each leaf
node (or terminal node) holds a class label.
 The topmost node in a tree is the root node.

 A typical decision tree is shown in the figure above.


 It represents the concept buys_computer; that is, it predicts whether a customer at
AllElectronics is likely to purchase a computer. Internal nodes are denoted by rectangles, and
leaf nodes are denoted by ovals. Some decision tree algorithms produce only binary trees
(where each internal node branches to exactly two other nodes), whereas others can produce
nonbinary trees.
 “How are decision trees used for classification?” Given a tuple X whose class label is
unknown, the attribute values of the tuple are tested against the decision tree; a path is
traced from the root to a leaf node, which holds the class prediction for that tuple.
 Decision Tree Induction
 The algorithm is called with three parameters: D, attribute list, and attribute selection
method.
 We refer to D as a data partition. Initially, it is the complete set of training tuples and
their associated class labels.
 The parameter attribute list is a list of attributes describing the tuples.
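
The induction procedure can be sketched as a recursive function. The following is a hedged Python sketch, not the textbook's exact pseudocode: parameter names follow the notes (D, attribute list, attribute selection method), but the data layout is an assumption, with D taken to be a list of (attribute-dict, class-label) pairs and attribute_selection_method a caller-supplied scoring function such as information gain.

```python
# A hedged sketch of the recursive decision tree induction procedure
# described above; D is assumed to be a list of (attribute-dict,
# class-label) pairs, and attribute_selection_method is a
# caller-supplied function (e.g., information gain).
from collections import Counter

def generate_decision_tree(D, attribute_list, attribute_selection_method):
    labels = [label for _, label in D]
    # If all tuples in D belong to one class, return a leaf with that class
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    # If no attributes remain, return a leaf with the majority class
    if not attribute_list:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    # Choose the splitting attribute via the supplied selection method
    best = attribute_selection_method(D, attribute_list)
    node = {"test-attribute": best, "branches": {}}
    remaining = [a for a in attribute_list if a != best]
    # Grow one branch per outcome (observed value) of the test on `best`
    for value in {tup[best] for tup, _ in D}:
        Dj = [(tup, label) for tup, label in D if tup[best] == value]
        node["branches"][value] = generate_decision_tree(
            Dj, remaining, attribute_selection_method)
    return node
```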
Information gain
ID3 uses information gain as its attribute selection measure.

 Information gain is defined as the difference between the original information requirement
(i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after
partitioning on A). That is,

Gain(A) = Info(D) - Info_A(D)

where Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) is the expected information (entropy) needed to
classify a tuple in D, and Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j) is the
information still required after partitioning D on A into v subsets.
 In other words, Gain(A) tells us how much would be gained by branching on A. It is the
expected reduction in the information requirement caused by knowing the value of A. The
attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at
node N.
 Hence, the gain in information from such a partitioning (on the attribute age in the running
AllElectronics example, where Info(D) = 0.940 bits and Info_age(D) = 0.694 bits) would be

Gain(age) = Info(D) - Info_age(D) = 0.940 - 0.694 = 0.246 bits
 Similarly, we can compute Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and
Gain(credit rating) = 0.048 bits. Because age has the highest information gain among the
attributes, it is selected as the splitting attribute. Node N is labeled with age, and branches are
grown for each of the attribute’s values. The tuples are then partitioned accordingly, as shown
in Figure 6.5. Notice that the tuples falling into the partition for age = middle aged all belong
to the same class. Because they all belong to class “yes,” a leaf should therefore be created at
the end of this branch and labeled with “yes.” The final decision tree returned by the
algorithm is shown in Figure 6.5.

Gain Ratio:
 C4.5, a successor of ID3, uses an extension to information gain known as gain ratio, which
attempts to overcome the bias toward tests with many outcomes.
 It applies a kind of normalization to information gain using a “split information” value:

SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right)

 The gain ratio is defined as

GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)}
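
As a worked illustration, assuming the standard AllElectronics running example in which the attribute income splits the 14 training tuples into subsets of sizes 4, 6, and 4:

SplitInfo_income(D) = -\frac{4}{14}\log_2\frac{4}{14} - \frac{6}{14}\log_2\frac{6}{14} - \frac{4}{14}\log_2\frac{4}{14} = 1.557

GainRatio(income) = \frac{Gain(income)}{SplitInfo_income(D)} = \frac{0.029}{1.557} \approx 0.019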

Gini Index:
The Gini index is used in CART. Using the notation described earlier, the Gini index measures
the impurity of D, a data partition or set of training tuples, as

Gini(D) = 1 - \sum_{i=1}^{m} p_i^2

where p_i is the probability that a tuple in D belongs to class C_i. For a binary split of D on
attribute A into partitions D_1 and D_2, the weighted impurity is

Gini_A(D) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)

Example: computing information gain
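
The worked example from the original notes is not reproduced above, so the following is a small self-contained Python sketch of the Info(D) and Gain(A) computations defined earlier; the toy dataset is illustrative only.

```python
# A small self-contained sketch of the Info(D) and Gain(A)
# computations defined above; the toy dataset is illustrative only.
import math
from collections import Counter

def info(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the classes in D."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(rows, labels, attribute):
    """Gain(A) = Info(D) - Info_A(D) for a categorical attribute A."""
    total = len(labels)
    info_a = 0.0
    # Info_A(D): weighted entropy of each partition D_j induced by A
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels)
                  if row[attribute] == value]
        info_a += (len(subset) / total) * info(subset)
    return info(labels) - info_a

# Toy data: age perfectly separates the two classes, so Gain(age) = 1.0
rows = [{"age": "youth"}, {"age": "youth"},
        {"age": "middle_aged"}, {"age": "senior"}]
labels = ["no", "no", "yes", "yes"]
print(gain(rows, labels, "age"))  # 1.0
```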
PREDICTION
 Numeric prediction is the task of predicting continuous (or ordered) values for a given input.
 For example, we may wish to predict the salary of college graduates with 10 years of work
experience, or the potential sales of a new product given its price. By far, the most widely
used approach for numeric prediction is regression.
 Regression analysis can be used to model the relationship between one or more independent
or predictor variables and a dependent or response variable.
 The response variable is what we want to predict.
 Regression analysis is a good choice when all of the predictor variables are continuous
valued as well.
 Linear Regression

In straight-line (simple) linear regression, the response variable y is modeled as a linear
function of a single predictor variable x:

y = w_0 + w_1 x

where w_0 and w_1 are regression coefficients, commonly estimated by the method of least
squares.

 Multiple linear regression is an extension of straight-line regression so as to involve more
than one predictor variable.
 Multiple regression problems are commonly solved with the use of statistical software
packages such as SAS, SPSS, and S-PLUS.
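
As a sketch of how a multiple regression problem is solved by least squares, here NumPy stands in for a statistical package; the data are synthetic and illustrative.

```python
# A minimal sketch of multiple linear regression solved by least
# squares with NumPy (standing in for a package such as SAS or SPSS);
# the data are synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))              # two predictor variables
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

# Prepend a column of ones so the intercept w0 is estimated as well
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)  # approximately [3.0, 1.5, -2.0]
```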

2) Nonlinear Regression
Polynomial regression is often of interest when there is just one predictor variable. It can
be modeled by adding polynomial terms to the basic linear model. By applying transformations
to the variables, we can convert the nonlinear model into a linear one that can then be solved by
the method of least squares.
The nonlinear (polynomial) regression model is as follows:

y = w_0 + w_1 x + w_2 x^2 + w_3 x^3

To convert this equation to linear form, we define new variables:

x_1 = x,  x_2 = x^2,  x_3 = x^3

so that the model becomes y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3, which is linear in the new
variables and can be solved by the method of least squares.
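
A short Python sketch of this transformation (synthetic, noiseless data for clarity):

```python
# Sketch of the variable transformation above: the cubic model
# y = w0 + w1*x + w2*x^2 + w3*x^3 becomes linear in x1, x2, x3
# and is solved by least squares; data are synthetic and noiseless.
import numpy as np

x = np.linspace(-2, 2, 40)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.8 * x**3

# Design matrix with columns [1, x1, x2, x3] = [1, x, x^2, x^3]
A = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)  # approximately [1.0, 0.5, -2.0, 0.8]
```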

Unsupervised learning

In unsupervised learning, no labels are given to the learning algorithm, leaving it on its own
to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end.

Example
 Suppose you have a basket filled with several different types of fruit, and your task is to
arrange them into groups.
 This time you don’t know anything about the fruits; honestly speaking, this is the first time
you have seen them. You have no clue about them.
 So, how will you arrange them? What will you do first?
 You will take a fruit and arrange the fruits by considering some physical characteristic of
each particular fruit.
 Suppose you have considered color.
 Then you will arrange them using color as the base condition.
 Then the groups will be something like this:
 RED COLOR GROUP: apples & cherries.
 GREEN COLOR GROUP: bananas & grapes.
 Now you will take another physical characteristic, such as size.
 RED COLOR AND BIG SIZE: apple.
 RED COLOR AND SMALL SIZE: cherries.
 GREEN COLOR AND BIG SIZE: bananas.
 GREEN COLOR AND SMALL SIZE: grapes.
 Job done, happy ending.
 Here you did not learn anything beforehand: there were no training data and no response variable.
 This type of learning is known as unsupervised learning.
 Clustering comes under unsupervised learning.
 Clustering
In clustering, a set of inputs is to be divided into groups. Unlike in
classification, the groups are not known beforehand, making this typically
an unsupervised task.
 Probability distribution estimation
 Finding associations (in features)
 Dimension reduction
 Dimensionality reduction simplifies inputs by mapping them into a lower-
dimensional space (see the sketch below). Topic modeling is a related problem, where a program
is given a list of human-language documents and is tasked to find out
which documents cover similar topics.
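
As a concrete illustrative instance of dimensionality reduction, here is a minimal principal component analysis (PCA) sketch in NumPy; PCA is one common technique, not one the notes prescribe.

```python
# Minimal PCA sketch: project 3-D inputs onto their top two principal
# components, a lower-dimensional space retaining the most variance.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                  # center the data
cov = np.cov(Xc, rowvar=False)           # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
top2 = eigvecs[:, -2:]                   # two largest-variance directions
X_reduced = Xc @ top2                    # shape (100, 2)
print(X_reduced.shape)
```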
Clustering:
 The process of grouping a set of physical or abstract objects into classes of similar
objects is called clustering.
 A cluster is a collection of data objects that are similar to one another within the
same cluster and are dissimilar to the objects in other clusters.
 A cluster of data objects can be treated collectively as one group and so may be
considered as a form of data compression.
 Although classification is an effective means for distinguishing groups or classes of
objects, it requires the often costly collection and labeling of a large set of training
tuples or patterns, which the classifier uses to model each group.
 It is often more desirable to proceed in the reverse direction: First partition the set of
data into groups based on data similarity (e.g., using clustering), and then assign
labels to the relatively small number of groups. Additional advantages of such a
clustering-based process are that it is adaptable to changes and helps single out useful
features that distinguish different groups.
 Clustering is a challenging field of research, and its potential applications pose
their own special requirements. The following are typical requirements of clustering
in data mining:
 Scalability: Many clustering algorithms work well on small data sets containing fewer
than several hundred data objects; however, a large database may contain millions of
objects.
 Ability to deal with different types of attributes: Many algorithms are designed to cluster
interval-based (numerical) data.
 Minimal requirements for domain knowledge to determine input parameters: Many
clustering algorithms require users to input certain parameters in cluster analysis (such as
the desired number of clusters).
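
As an illustration of a clustering algorithm that needs such a user-supplied input parameter (the number of clusters k), here is a minimal k-means sketch in Python; k-means is one example algorithm, not one the notes prescribe.

```python
# Minimal k-means sketch: groups points by similarity; note that the
# user must supply k, one of the input parameters discussed above.
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return assign, centers

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(3.0, 0.3, size=(20, 2))])
labels, centers = kmeans(X, k=2)
print(centers)  # one center near (0, 0), the other near (3, 3)
```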
