
Machine Learning

(BCO 086A)
Submitted By:
Anuja Sharma
Assistant Professor
CSE

MODULE-I
Concept and Decision Tree
 Concept & Concept Learning
 A Concept Learning Task – Enjoy Sport: Training Examples
 Enjoy Sport – Hypothesis Representation
 Hypothesis Representation
 Enjoy Sport Concept Learning Task
 Terminology
 Concept Learning as a Search
 Enjoy Sport – Hypothesis Space
 General-to-specific ordering
 The More-General-Than Relation
 FIND-S Algorithm
 Candidate Elimination Algorithm
 Consistent Hypothesis
 Version Space & Candidate Elimination Algorithm
 Compact Representation of Version Spaces
 Example of a Version Space
 Candidate Elimination Example
 Candidate Elimination Algorithm
 Final Version Space
 Candidate Elimination Algorithm – Example: Final Version Space
 Inductive Bias – A Biased Hypothesis
 Inductive Bias – An Unbiased Learner
 Inductive Bias – Formal Definition
 Decision Trees
 From Decision Trees to Logic

MODULE-II
Neural Networks
Perceptron Node – Threshold Logic Unit

[Figure: inputs x1 … xn with weights w1 … wn feed a single node with threshold θ and output Z]

z = 1 if Σ (i=1..n) xi wi ≥ θ
z = 0 if Σ (i=1..n) xi wi < θ
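
A minimal sketch of this threshold logic unit in Python (the example inputs, weights, and threshold are the ones used on the following slides):

def tlu_output(inputs, weights, theta):
    """Threshold logic unit: output 1 iff the weighted input sum reaches theta."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= theta else 0

# Values taken from the worked example on the next slides.
print(tlu_output([.8, .3], [.4, -.2], theta=.1))   # net = .26 >= .1, so prints 1
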
Learning Algorithm

[Figure: two-input perceptron with weights w1 = .4, w2 = -.2 and threshold θ = .1]

Training set (inputs x1, x2 and target T):
x1   x2   T
.8   .3   1
.4   .1   0

z = 1 if Σ xi wi ≥ θ, else z = 0

First Training Instance

[Figure: the first training instance applied to the network above]

Inputs x = (.8, .3), weights w = (.4, -.2), threshold θ = .1
Net = .8*.4 + .3*(-.2) = .26 ≥ .1, so Z = 1
Target T = 1, so Z = T and no weight change is needed.

Second Training Instance

[Figure: the second training instance applied to the network above]

Inputs x = (.4, .1), weights w = (.4, -.2), threshold θ = .1
Net = .4*.4 + .1*(-.2) = .14 ≥ .1, so Z = 1
Target T = 0, so the weights must be adjusted using the delta rule:
Δwi = (T – Z) * C * Xi
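
Both training instances can be checked with a short script. The slides do not state the learning rate C, so the value below is an assumption and the updated weights are only illustrative:

def perceptron_step(x, w, theta, target, c):
    """One delta-rule step: return net, the output z, and the updated weights."""
    net = sum(xi * wi for xi, wi in zip(x, w))
    z = 1 if net >= theta else 0
    new_w = [wi + c * (target - z) * xi for xi, wi in zip(x, w)]
    return net, z, new_w

w, theta, c = [.4, -.2], .1, .1          # c (learning rate) is an assumed value

# First instance: net = .26, z = 1, T = 1, so the weights do not change.
print(perceptron_step([.8, .3], w, theta, target=1, c=c))

# Second instance: net = .14, z = 1, but T = 0, so the weights are reduced.
print(perceptron_step([.4, .1], w, theta, target=0, c=c))
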
Delta Rule Learning
Δwij = C (Tj – Zj) xi
 Create a network with n input and m output nodes
 Each iteration through the training set is an epoch
 Continue training until error is less than some epsilon
 Perceptron Convergence Theorem: guaranteed to find a
solution in finite time if a solution exists
 As can be seen from the node activation function, the decision
surface is an n-dimensional hyperplane

z = 1 if Σ (i=1..n) xi wi ≥ θ, else z = 0

Linear Separability

Linear Separability and Generalization

When is data noise vs. a legitimate exception?

Limited Functionality of Hyperplane

Gradient Descent Learning

Error Landscape

[Figure: error landscape - Total Sum Squared (TSS) error plotted against weight values]

Deriving a Gradient Descent Learning
Algorithm
 Goal is to decrease the overall error (or other objective function)
each time a weight is changed
 Total Sum Squared error: E = Σ (Ti – Zi)²
 Seek a weight-changing algorithm such that ∂E/∂wij is negative
 If such a formula can be found then we have a gradient descent
learning algorithm
 Perceptron/Delta rule is a gradient descent learning
algorithm (a small sketch follows below)
 Linearly-separable problems have no local minima

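
As an illustration of gradient descent on the total sum squared error, here is a minimal sketch for a single linear (unthresholded) unit; the data and learning rate are made up for the example and are not from the slides:

# Gradient descent on E = sum over examples of (T - Z)^2, with Z = sum_i w_i x_i.
data = [([.8, .3], 1.0), ([.4, .1], 0.0)]   # illustrative (x, target) pairs
w, c = [0.0, 0.0], 0.1                      # initial weights and learning rate

for epoch in range(100):
    for x, t in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        # dE/dw_i is proportional to -(t - z) x_i, so stepping downhill gives:
        w = [wi + c * (t - z) * xi for wi, xi in zip(w, x)]

tss = sum((t - sum(wi * xi for wi, xi in zip(w, x))) ** 2 for x, t in data)
print(w, tss)   # TSS shrinks toward a global minimum; no local minima here
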
Multi-layer Perceptron
 Can compute arbitrary mappings
 Assumes a non-linear activation function
 Training algorithms are less obvious
 Backpropagation learning algorithm was not widely exploited until
the 1980s
 First of many powerful multi-layer learning algorithms

Responsibility Problem

[Figure: a multi-layer network produces output 1 where 0 was wanted - which weights are responsible for the error?]

Multi-Layer Generalization

Backpropagation
 Multi-layer supervised learner
 Gradient Descent weight updates
 Sigmoid activation function (smoothed threshold logic)

 Backpropagation requires a differentiable activation function

Multi-layer Perceptron Topology
[Figure: feed-forward topology - an input layer, one or more hidden layer(s), and an output layer, with node i in one layer feeding node j in the next]

Backpropagation Learning Algorithm
 Until Convergence (low error or other criteria) do
 Present a training pattern
 Calculate the error of the output nodes (based on T - Z)
 Calculate the error of the hidden nodes (based on the error of
the output nodes which is propagated back to the hidden nodes)
 Continue propagating error back until the input layer is reached
 Update all weights based on the standard delta rule with the
appropriate error value δ

Δwij = C δj Zi

Activation Function and its Derivative
 Node activation function f(net) is typically the sigmoid

Zj = f(netj) = 1 / (1 + e^(-netj))
[Figure: sigmoid curve rising from 0 to 1, with value .5 at net = 0]

 The derivative of the activation function is a critical part of the algorithm

f'(netj) = Zj (1 - Zj)
[Figure: bell-shaped derivative curve with maximum .25 at net = 0]
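
Written directly from the two formulas above, in Python:

import math

def sigmoid(net):
    """f(net) = 1 / (1 + e^(-net)), the smoothed threshold function."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(net):
    """f'(net) = Z (1 - Z); largest (.25) at net = 0, small for large |net|."""
    z = sigmoid(net)
    return z * (1.0 - z)

print(sigmoid(0), sigmoid_derivative(0))   # 0.5 0.25
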
Backpropagation Learning Equations
Δwij = C δj Zi
δj = (Tj - Zj) f'(netj)          [Output Node]
δj = (Σk δk wjk) f'(netj)        [Hidden Node]
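
A minimal sketch of one backpropagation update for a 2-2-1 network, written straight from the three equations above. The input, target, weights, and learning rate C are assumed values chosen only to make the example runnable:

import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Assumed toy values, not taken from the slides.
x = [0.8, 0.3]                      # input activations Z_i
w_ih = [[0.1, -0.2], [0.4, 0.2]]    # w_ih[i][j]: input i -> hidden j
w_ho = [0.3, -0.1]                  # hidden j -> single output node
t, c = 1.0, 0.5                     # target T and learning rate C

# Forward pass
net_h = [sum(x[i] * w_ih[i][j] for i in range(2)) for j in range(2)]
z_h = [sigmoid(n) for n in net_h]
z_o = sigmoid(sum(z_h[j] * w_ho[j] for j in range(2)))

# Output node: delta = (T - Z) f'(net)
delta_o = (t - z_o) * z_o * (1 - z_o)

# Hidden nodes: delta_j = (sum_k delta_k w_jk) f'(net_j)
delta_h = [delta_o * w_ho[j] * z_h[j] * (1 - z_h[j]) for j in range(2)]

# Weight updates: dw_ij = C delta_j Z_i
w_ho = [w_ho[j] + c * delta_o * z_h[j] for j in range(2)]
w_ih = [[w_ih[i][j] + c * delta_h[j] * x[i] for j in range(2)] for i in range(2)]

print(z_o, w_ho, w_ih)
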
Backpropagation Summary
 Excellent Empirical results
 Scaling – The pleasant surprise
 Local minima are very rare as problem and network complexity increase
 Most common neural network approach
 User-defined parameters make it more difficult to use
 Number of hidden nodes, layers, learning rate, etc.
 Many variants
 Adaptive Parameters, Ontogenic (growing and pruning) learning
algorithms
 Higher order gradient descent (Newton, Conjugate Gradient, etc.)
 Recurrent networks

Inductive Bias
 The approach used to decide how to generalize novel cases
 Occam’s Razor – The simplest hypothesis which fits the data
is usually the best – Still many remaining options

A B C -> Z
A B’ C -> Z
A B C’ -> Z
A B’ C’ -> Z
A’ B’ C’ -> Z’

 Now you receive the new input A’ B C. What is your output?

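One way to see how a particular inductive bias answers this question: a learner biased toward small decision trees will typically settle on the hypothesis Z = A and therefore answer Z' for A' B C. The sketch below (using scikit-learn, with primed literals encoded as 0) only illustrates that bias; other biases give other answers.

from sklearn.tree import DecisionTreeClassifier

# Encode A/B/C as 1 and A'/B'/C' as 0; output Z as 1 and Z' as 0.
X = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
y = [1, 1, 1, 1, 0]

tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict([[0, 1, 1]]))   # A' B C -> [0], i.e. Z', under the "Z = A" hypothesis
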
Overfitting
Noise vs. Exceptions revisited

The Overfit Problem
 Newer powerful models can have very complex decision
surfaces which can converge well on most training sets by
learning noisy and irrelevant aspects of the training set in
order to minimize error (memorization in the limit)

[Figure: TSS error vs. epochs - training set error keeps falling
while validation/test set error eventually turns back up]

 This makes them susceptible to overfit if not carefully
considered

Avoiding Overfit
 Inductive Bias – Simplest accurate model
 More Training Data (vs. overtraining - One epoch limit)
 Validation Set (requires separate test set)
 Backpropagation – Tends to build from simple model (0
weights) to just large enough weights (Validation Set)
 Stopping criteria with any constructive model (Accuracy
increase vs Statistical significance) – Noise vs. Exceptions
 Specific Techniques
 Weight Decay, Pruning, Jitter, Regularization
 Ensembles

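A small sketch of the validation-set idea from the list above: fit models of increasing complexity (here, polynomial degree) and keep the one with the lowest validation error rather than the lowest training error. The synthetic data and the use of polynomials are assumptions made only for illustration:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = 2 * x + rng.normal(0, 0.2, size=x.shape)    # noisy linear data (illustrative)

x_tr, y_tr = x[::2], y[::2]                     # training set
x_va, y_va = x[1::2], y[1::2]                   # validation set

best_degree, best_error = None, float("inf")
for degree in range(1, 10):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    error = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    if error < best_error:
        best_degree, best_error = degree, error

print(best_degree, best_error)   # high-degree fits memorize noise and validate worse
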
Ensembles
 Many different Ensemble approaches
 Stacking, Gating/Mixture of Experts, Bagging, Boosting, Wagging, Mimicking,
Combinations
 Multiple diverse models trained on same problem and then their outputs are
combined
 The specific overfit of each learning model is averaged out
 If models are diverse (uncorrelated errors) then even if the individual models
are weak generalizers, the ensemble can be very accurate

[Figure: models M1, M2, M3, …, Mn feed their outputs into a combining technique]
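
A minimal sketch of the combining step: if each model Mi returns one class prediction per instance, a simple combining technique is majority voting. The prediction lists here are made-up placeholders for real model outputs:

from collections import Counter

def majority_vote(predictions_per_model):
    """Combine the per-instance predictions of several models by majority vote."""
    combined = []
    for instance_preds in zip(*predictions_per_model):
        combined.append(Counter(instance_preds).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three diverse models M1..M3 on four instances.
m1 = [1, 0, 1, 1]
m2 = [1, 1, 1, 0]
m3 = [0, 0, 1, 1]
print(majority_vote([m1, m2, m3]))   # [1, 0, 1, 1]
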
Application Issues
 Choose relevant features
 Normalize features (a short sketch follows after this list)
 Can learn to ignore irrelevant features, but will have to fight
the curse of dimensionality
 The more data (training examples) the better
 Slower training is acceptable for complex and production
applications if it improves accuracy (“the week” phenomenon)
 Execution normally fast regardless of training time

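For the "normalize features" point in the list above, one common choice is z-score normalization per feature; a short sketch with illustrative data:

import numpy as np

def zscore_normalize(X):
    """Scale each column (feature) to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = [[180.0, 0.1], [160.0, 0.3], [170.0, 0.2]]   # two features on very different scales
print(zscore_normalize(X))
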
Decision Trees - ID3/C4.5
 Top down induction of decision trees
 Highly used and successful
 Attribute Features - discrete nominal (mutually exclusive) –
Real valued features are discretized
 Searching for the smallest tree is too complex (NP-hard)
 C4.5 uses the common symbolic ML philosophy of a greedy
iterative approach

Decision Tree Learning
 Mapping by Hyper-Rectangles

[Figure: the tree's splits carve the A1-A2 feature space into hyper-rectangles]

ID3 Learning Approach
 C is the current set of examples
 A test on attribute A partitions C into {C1, C2, ..., Cw} where w
is the number of values of A

[Figure: a test on attribute Color partitions C into C1 (Red), C2 (Green), and C3 (Purple)]

Decision Tree Learning Algorithm
 Start with the Training Set as C and test how each attribute
partitions C
 Choose the best A for root
 The goodness measure is based on how well attribute A divides
C into different output classes – A perfect attribute would
divide C into partitions that contain only one output class
each – A poor attribute (irrelevant) would leave each
partition with the same ratio of classes as in C
 20 questions analogy – good questions quickly minimize the
possibilities
 Continue recursively until sets unambiguously classified or a
stopping criteria is reached

ID3 Example and Discussion
 14 examples. Uses Information Gain. Attributes which best
discriminate between classes are chosen
 If the same class ratios are found in a partitioned set,
then the gain is 0 (the sketch below reproduces these gains)

Temperature   P   N
Hot           2   2
Mild          4   2
Cool          3   1
Gain: .029

Humidity   P   N
High       3   4
Normal     6   1
Gain: .151

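The gain values above can be reproduced directly. The sketch below computes information gain from the partition counts (the full set has 9 positive and 5 negative examples, as the tables imply):

import math

def entropy(p, n):
    """Entropy of a set with p positive and n negative examples."""
    total, result = p + n, 0.0
    for count in (p, n):
        if count:
            result -= (count / total) * math.log2(count / total)
    return result

def information_gain(partitions):
    """partitions: one (P, N) pair per attribute value."""
    p = sum(pp for pp, nn in partitions)
    n = sum(nn for pp, nn in partitions)
    remainder = sum((pp + nn) / (p + n) * entropy(pp, nn) for pp, nn in partitions)
    return entropy(p, n) - remainder

print(information_gain([(2, 2), (4, 2), (3, 1)]))   # Temperature: ~0.029
print(information_gain([(3, 4), (6, 1)]))           # Humidity: ~0.152 (quoted as .151 above)
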
ID3 - Conclusions
 Good Empirical Results
 Comparable application robustness and accuracy to neural
networks, with faster learning (though NNs are more natural
with continuous features - both input and output)
 Among the most used and best known of current symbolic systems -
widely used to aid in creating rules for expert systems

Nearest Neighbor Learners
 Broad Spectrum
 Basic K-NN, Instance Based Learning, Case Based Reasoning,
Analogical Reasoning
 Simply store all or some representative subset of the
examples in the training set
 Generalize on the fly rather than use a pre-acquired hypothesis
- faster learning, slower execution, information retained,
memory intensive (a minimal k-NN sketch follows below)

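A minimal k-nearest-neighbor sketch matching the description above: store the training examples and classify a query by a vote of its k closest neighbors (Euclidean distance; the tiny dataset is illustrative):

import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """examples: list of (feature_vector, label). Majority vote of the k nearest."""
    def distance(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    nearest = sorted(examples, key=lambda ex: distance(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [([1.0, 1.0], "A"), ([1.2, 0.9], "A"), ([5.0, 5.1], "B"), ([4.8, 5.3], "B")]
print(knn_classify([1.1, 1.0], train, k=3))   # "A"
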
Nearest Neighbor Algorithms

Nearest Neighbor Variations
 How many examples to store
 How do stored examples vote (distance weighted, etc.)
 Can we choose a smaller set of near-optimal examples
(prototypes/exemplars)
 Storage reduction
 Faster execution
 Noise robustness
 Distance Metrics – non-Euclidean
 Irrelevant Features – Feature weighting

Evolutionary Computation/Algorithms
Genetic Algorithms
 Simulate “natural” evolution of structures via selection and
reproduction, based on performance (fitness)
 Type of Heuristic Search - Discovery, not inductive in
isolation
 Genetic Operators - Recombination (Crossover) and
Mutation are most common
1 1 0 2 3 1 0 2 2 1 (Fitness = 10)
2 2 0 1 1 3 1 1 0 0 (Fitness = 12)
2 2 0 1 3 1 0 2 2 1 (Fitness = calculated or f(parents))

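The third chromosome above is exactly what single-point crossover at position 4 produces (the first four genes from the second parent, the rest from the first). A short sketch of the two operators; the mutation rate and gene alphabet are assumptions:

import random

def crossover(parent_a, parent_b, point):
    """Single-point crossover: genes before `point` from A, the rest from B."""
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, alphabet=(0, 1, 2, 3), rate=0.05):
    """Replace each gene with a random symbol with probability `rate` (assumed)."""
    return [random.choice(alphabet) if random.random() < rate else g for g in chromosome]

p1 = [1, 1, 0, 2, 3, 1, 0, 2, 2, 1]    # fitness = 10
p2 = [2, 2, 0, 1, 1, 3, 1, 1, 0, 0]    # fitness = 12
child = crossover(p2, p1, point=4)     # -> [2, 2, 0, 1, 3, 1, 0, 2, 2, 1]
print(mutate(child))
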
Evolutionary Algorithms
 Start with initialized population P(t) - random, domain-
knowledge, etc.
 Population usually made up of possible parameter settings for
a complex problem
 Typically have fixed population size (like beam search)
 Selection
 Parent_Selection P(t) - Promising Parents used to create new
children
 Survive P(t) - Pruning of unpromising candidates
 Evaluate P(t) - Calculate fitness of population members.
Ranges from simple metrics to complex simulations.

Evolutionary Algorithm
Procedure EA
t = 0;
Initialize Population P(t);
Evaluate P(t);
Until Done{ /*Sufficiently “good” individuals discovered*/
t = t+1;
Parent_Selection P(t);
Recombine P(t);
Mutate P(t);
Evaluate P(t);
Survive P(t);}

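A runnable translation of Procedure EA under toy assumptions: bit-string individuals, fitness = number of 1-bits, tournament parent selection, single-point recombination, and truncation survival. Everything beyond the loop structure on the slide is an assumed choice:

import random

GENES, POP_SIZE = 20, 30

def fitness(ind):                       # Evaluate: toy fitness = count of 1-bits
    return sum(ind)

def parent_selection(pop):              # tournament of two (assumed scheme)
    return max(random.sample(pop, 2), key=fitness)

def recombine(a, b):                    # single-point crossover
    point = random.randrange(1, GENES)
    return a[:point] + b[point:]

def mutate(ind, rate=0.02):             # flip each bit with small probability
    return [1 - g if random.random() < rate else g for g in ind]

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
t = 0
while max(fitness(i) for i in population) < GENES and t < 200:
    t += 1
    children = [mutate(recombine(parent_selection(population),
                                 parent_selection(population)))
                for _ in range(POP_SIZE)]
    # Survive: keep the best POP_SIZE of parents + children (truncation, assumed)
    population = sorted(population + children, key=fitness, reverse=True)[:POP_SIZE]

print(t, max(fitness(i) for i in population))
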
EA Example
 Goal: Discover a new automotive engine to maximize
performance, reliability, and mileage while minimizing
emissions
 Features: CID (Cubic inch displacement), fuel system, # of
valves, # of cylinders, presence of turbo-charging
 Assume - Test unit which tests possible engines and returns
integer measure of goodness
 Start with population of random engines

Genetic Operators
 Crossover variations - multi-point, uniform probability,
averaging, etc.
 Mutation - Random changes in features, adaptive, different
for each feature, etc.
 Others - many schemes mimicking natural genetics:
dominance, selective mating, inversion, reordering,
speciation, knowledge-based, etc.
 Reproduction - terminology for selection based on fitness - keep the
best around - supported in the algorithms
 Critical to maintain balance of diversity and quality in the
population

Evolutionary Algorithms
 There exist mathematical proofs that evolutionary techniques are efficient
search strategies
 There are a number of different Evolutionary strategies
 Genetic Algorithms
 Evolutionary Programming
 Evolution Strategies
 Genetic Programming
 Strategies differ in representations, selection, operators, evaluation, etc.
 Most were independently discovered, initially for function optimization (EP, ES)
 Strategies continue to “evolve”

Genetic Algorithm Comments
 Much current work and extensions
 Numerous application attempts. Can plug into many
algorithms requiring search. Has built-in heuristic. Could
augment with domain heuristics
 “Lazy Man’s Solution” to any tough parameter search

Rule Induction
 Creates a set of symbolic rules to solve a classification
problem
 Sequential Covering Algorithms
 Until no good and significant rules can be created
 Create all first order rules Ax -> Classy
 Score each rule based on goodness (accuracy) and significance
using the current training set
 Iteratively (greedily) expand the best rules to n+1 attributes,
score the new rules, and prune weak rules to keep the total
candidate list at a fixed size (beam search)
 Pick the one best rule and remove all instances from the
training set that the rule covers

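A simplified sketch of the sequential covering loop described above, restricted to first-order (single attribute = value) rules and scored by accuracy on the still-uncovered examples; the beam-search expansion to n+1 attributes is omitted to keep it short, and the small dataset is made up:

def rule_stats(rule, data):
    """rule = (attribute, value, predicted_class); return (covered, correctly covered)."""
    attr, value, cls = rule
    covered = [ex for ex in data if ex[0].get(attr) == value]
    correct = [ex for ex in covered if ex[1] == cls]
    return covered, correct

def sequential_covering(data, classes, min_accuracy=0.75):
    rules, remaining = [], list(data)
    while remaining:
        candidates = {(attr, value, cls)
                      for attrs, _ in remaining
                      for attr, value in attrs.items()
                      for cls in classes}
        scored = [(len(correct) / len(covered), len(covered), rule)
                  for rule in candidates
                  for covered, correct in [rule_stats(rule, remaining)]
                  if covered]
        accuracy, _, best_rule = max(scored)
        if accuracy < min_accuracy:
            break                        # no good and significant rule left
        rules.append(best_rule)
        covered, _ = rule_stats(best_rule, remaining)
        remaining = [ex for ex in remaining if ex not in covered]
    return rules

# Illustrative data: ({attribute: value, ...}, class)
data = [({"A": "Green", "B": "Tall"}, 1), ({"A": "Green", "B": "Short"}, 1),
        ({"A": "Red", "C": "Fast"}, 2), ({"A": "Red", "C": "Slow"}, 2)]
print(sequential_covering(data, classes=[1, 2]))
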
Rule Induction Variants
 Ordered Rule lists (decision lists) - naturally supports
multiple output classes
 A=Green and B=Tall -> Class 1
 A=Red and C=Fast -> Class 2
 Else Class 1
 Placing new rules at beginning or end of list
 Unordered rule lists for each output class (must handle
multiple matches)
 Rule induction can handle noise by no longer creating new
rules when gain is negligible or not statistically significant

Conclusion
 Many new algorithms and approaches being proposed
 Application areas rapidly increasing
 Amount of available data and information growing
 User desire for more adaptive and user-specific computer
interaction
 This need for specific and adaptable user interaction will make
machine learning a more important tool in user interface
research and applications

Thank You!
