(BCO 086A)
Submitted By:
Anuja Sharma
Assistant Professor
CSE
Machine Learning
MODULE-I
Concept Learning and Decision Trees
Concept & Concept Learning
A Concept Learning Task – EnjoySport
Training Examples
EnjoySport – Hypothesis Representation
Hypothesis Representation
EnjoySport Concept Learning Task
Terminology
Concept Learning as a Search
EnjoySport – Hypothesis Space
General-to-Specific Ordering
More-General-Than Relation
FIND-S Algorithm
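A minimal sketch of FIND-S over the conjunctive hypothesis representation, using EnjoySport-style training data in Mitchell's formulation (the data values here are illustrative):

    # FIND-S: start with the most specific hypothesis and minimally
    # generalize it on each positive example; negatives are ignored.

    def find_s(examples):
        """examples: list of (attribute_tuple, label) pairs; label is 'Yes'/'No'."""
        h = None  # most specific hypothesis (no positive example seen yet)
        for x, label in examples:
            if label != 'Yes':
                continue  # FIND-S ignores negative examples
            if h is None:
                h = list(x)  # first positive example: copy it exactly
            else:
                # replace any attribute that disagrees with '?' (don't-care)
                h = [hi if hi == xi else '?' for hi, xi in zip(h, x)]
        return h

    # EnjoySport-style training data (illustrative)
    data = [
        (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
        (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
        (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
        (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
    ]
    print(find_s(data))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']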
Candidate Elimination Algorithm
Consistent Hypothesis
Version Space & Candidate Elimination Algorithm
Compact Representation of Version Spaces
Example of Version Space
Candidate Elimination Example
Candidate Elimination Algorithm
Final Version Space
Candidate Elimination Algorithm – Example Final Version Space
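A compact sketch of the boundary-set updates for the same conjunctive representation (a simplification: in this hypothesis space the S boundary stays a single hypothesis; names are illustrative):

    # Candidate Elimination: maintain the specific boundary S and
    # the general boundary G; the version space lies between them.

    def covers(h, x):
        """Does hypothesis h classify instance x as positive?"""
        return all(a == '?' or a == b for a, b in zip(h, x))

    def more_general_eq(g, s):
        """True if g is equal to or more general than s."""
        return all(a == '?' or a == b for a, b in zip(g, s))

    def candidate_elimination(examples, domains):
        n = len(domains)
        S = []                        # empty list stands for the most specific boundary (phi)
        G = [tuple(['?'] * n)]        # most general hypothesis
        for x, label in examples:
            if label == 'Yes':
                G = [g for g in G if covers(g, x)]
                S = [tuple(x)] if not S else \
                    [tuple(a if a == b else '?' for a, b in zip(S[0], x))]
                S = [s for s in S if any(more_general_eq(g, s) for g in G)]
            else:
                S = [s for s in S if not covers(s, x)]
                new_G = set()
                for g in G:
                    if not covers(g, x):
                        new_G.add(g)
                        continue
                    for i in range(n):          # minimal specializations of g
                        if g[i] != '?':
                            continue
                        for v in domains[i]:
                            if v != x[i]:
                                spec = g[:i] + (v,) + g[i + 1:]
                                if all(more_general_eq(spec, s) for s in S):
                                    new_G.add(spec)
                # drop G members less general than another G member
                G = [g for g in new_G
                     if not any(h != g and more_general_eq(h, g) for h in new_G)]
        return S, G

    domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
               ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
    S, G = candidate_elimination(data, domains)   # data as in the FIND-S sketch above
    # S = [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
    # G contains ('Sunny','?','?','?','?','?') and ('?','Warm','?','?','?','?')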
Inductive Bias – A Biased Hypothesis
Inductive Bias – An Unbiased Learner
Inductive Bias – Formal Definition
Decision Trees
From Decision Trees to Logic
Machine Learning
MODULE-II
Neural Networks
Perceptron Node – Threshold Logic Unit
Inputs x_1, ..., x_n with weights w_1, ..., w_n feed a single node with output Z:
z = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_i w_i \geq \theta \\ 0 & \text{if } \sum_{i=1}^{n} x_i w_i < \theta \end{cases}
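A minimal sketch of the unit (names are illustrative; the threshold \theta is written explicitly):

    def tlu(x, w, theta):
        """Threshold logic unit: z = 1 if sum(x_i * w_i) >= theta, else 0."""
        net = sum(xi * wi for xi, wi in zip(x, w))
        return 1 if net >= theta else 0

    # example matching the slides that follow: weights .4 and -.2, threshold .1
    print(tlu([.8, .3], [.4, -.2], .1))  # net = .26 >= .1 -> 1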
Learning Algorithm
Network: inputs x_1 and x_2 with initial weights w_1 = .4 and w_2 = -.2, threshold \theta = .1, output Z
Training set:
x_1   x_2   T
.8    .3    1
.4    .1    0
First Training Instance
Present (.8, .3): net = .8(.4) + .3(-.2) = .26 \geq \theta = .1, so Z = 1
T = 1, so no weight change is needed
Second Training Instance
Present (.4, .1): net = .4(.4) + .1(-.2) = .14 \geq \theta = .1, so Z = 1 but T = 0
The weights are therefore adjusted by the delta rule:
\Delta w_i = (T - Z) \cdot C \cdot x_i
Delta Rule Learning
\Delta w_{ij} = C (T_j - Z_j) x_i
Create a network with n input and m output nodes
Each iteration through the training set is an epoch
Continue training until error is less than some epsilon
Perceptron Convergence Theorem: guaranteed to find a solution in finite time if a solution exists
As can be seen from the node activation function, the decision surface is an n-dimensional hyperplane:
z = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_i w_i \geq \theta \\ 0 & \text{if } \sum_{i=1}^{n} x_i w_i < \theta \end{cases}
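A sketch of the update loop on the two-instance training set above, keeping the threshold fixed at \theta = .1 as in the slides (the learning rate C and epoch count are illustrative):

    def train_perceptron(data, w, theta, C=0.1, epochs=10):
        """data: list of (inputs, target); applies Delta w_i = C*(T - Z)*x_i."""
        for _ in range(epochs):                 # each pass through data is one epoch
            for x, t in data:
                net = sum(xi * wi for xi, wi in zip(x, w))
                z = 1 if net >= theta else 0
                w = [wi + C * (t - z) * xi for wi, xi in zip(w, x)]
        return w

    data = [([.8, .3], 1), ([.4, .1], 0)]       # training set from the slides
    # converges to weights that output 1 for (.8, .3) and 0 for (.4, .1)
    print(train_perceptron(data, [.4, -.2], theta=.1))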
Linear Separability
Linear Separability and Generalization
Limited Functionality of Hyperplane
Gradient Descent Learning
[Figure: error landscape – Total Sum Squared (TSS) error plotted against weight values]
Deriving a Gradient Descent Learning Algorithm
Goal: decrease overall error (or other objective function) each time a weight is changed
Total Sum Squared error: E = \sum_i (T_i - Z_i)^2
Seek a weight-changing algorithm such that \partial E / \partial w_{ij} is negative
If such a formula can be found, then we have a gradient descent learning algorithm
The Perceptron/Delta rule is a gradient descent learning algorithm
Linearly separable problems have no local minima
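A worked version of this step, as a sketch assuming a linear output Z_j = \sum_i x_i w_{ij} and folding constant factors into the learning rate C:

    E = \sum_j (T_j - Z_j)^2, \qquad Z_j = \sum_i x_i w_{ij}
    \frac{\partial E}{\partial w_{ij}} = 2 (T_j - Z_j) \cdot \left( -\frac{\partial Z_j}{\partial w_{ij}} \right) = -2 (T_j - Z_j) \, x_i
    \Delta w_{ij} = -C' \, \frac{\partial E}{\partial w_{ij}} = C (T_j - Z_j) \, x_i

which is exactly the delta rule above, so each update moves downhill on E.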
Multi-layer Perceptron
Can compute arbitrary mappings
Assumes a non-linear activation function
Training Algorithms less obvious
Backpropagation learning algorithm not exploited until the 1980s
First of many powerful multi-layer learning algorithms
Responsibility Problem
[Figure: multi-layer network produces output 1 where 0 was wanted – which internal weights are responsible?]
Multi-Layer Generalization
Backpropagation
Multi-layer supervised learner
Gradient Descent weight updates
Sigmoid activation function (smoothed threshold logic)
Multi-layer Perceptron Topology
[Figure: multi-layer topology – node i in one layer feeds node j in the next]
Backpropagation Learning Algorithm
Until convergence (low error or other criteria) do:
Present a training pattern
Calculate the error of the output nodes (based on T - Z)
Calculate the error of the hidden nodes (based on the error of the output nodes, which is propagated back to the hidden nodes)
Continue propagating error back until the input layer is reached
Update all weights based on the standard delta rule with the appropriate error term \delta:
\Delta w_{ij} = C \delta_j Z_i
Activation Function and its Derivative
Node activation function f(net) is typically the sigmoid:
Z_j = f(net_j) = \frac{1}{1 + e^{-net_j}}
The derivative of the activation function is a critical part of the algorithm:
f'(net_j) = Z_j (1 - Z_j)
[Figure: sigmoid rising from 0 to 1 (value .5 at net = 0) and its derivative peaking at .25 at net = 0, plotted over net in [-5, 5]]
Backpropagation Learning Equations
\Delta w_{ij} = C \delta_j Z_i
\delta_j = (T_j - Z_j) f'(net_j)   [Output Node]
\delta_j = \left( \sum_k \delta_k w_{jk} \right) f'(net_j)   [Hidden Node]
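A minimal sketch of these equations for a single pattern (one hidden layer, list-of-lists weight matrices, and the learning rate are illustrative choices; bias weights are omitted for brevity):

    import math

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    def backprop_step(x, t, W_ih, W_ho, C=0.5):
        """One pattern presentation. W_ih[i][j]: input->hidden, W_ho[j][k]: hidden->output."""
        n_hid, n_out = len(W_ho), len(W_ho[0])
        # forward pass
        hidden = [sigmoid(sum(x[i] * W_ih[i][j] for i in range(len(x))))
                  for j in range(n_hid)]
        out = [sigmoid(sum(hidden[j] * W_ho[j][k] for j in range(n_hid)))
               for k in range(n_out)]
        # output deltas: delta_k = (T_k - Z_k) * f'(net_k), with f'(net) = Z(1 - Z)
        d_out = [(t[k] - out[k]) * out[k] * (1 - out[k]) for k in range(n_out)]
        # hidden deltas: delta_j = (sum_k delta_k * w_jk) * f'(net_j)
        d_hid = [sum(d_out[k] * W_ho[j][k] for k in range(n_out))
                 * hidden[j] * (1 - hidden[j]) for j in range(n_hid)]
        # delta rule updates: Delta w_ij = C * delta_j * Z_i
        for j in range(n_hid):
            for k in range(n_out):
                W_ho[j][k] += C * d_out[k] * hidden[j]
        for i in range(len(x)):
            for j in range(n_hid):
                W_ih[i][j] += C * d_hid[j] * x[i]
        return out

Called repeatedly over the training set until error is low, this implements the loop on the previous slide.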
Backpropagation Summary
Excellent Empirical results
Scaling – The pleasant surprise
Local minima very rare as problem and network complexity increase
Most common neural network approach
User-defined parameters make it more difficult to use
Number of hidden nodes, layers, learning rate, etc.
Many variants
Adaptive Parameters, Ontogenic (growing and pruning) learning
algorithms
Higher order gradient descent (Newton, Conjugate Gradient, etc.)
Recurrent networks
Inductive Bias
The approach used to decide how to generalize novel cases
Occam’s Razor – The simplest hypothesis which fits the data
is usually the best – Still many remaining options
A B C -> Z
A B’ C -> Z
A B C’ -> Z
A B’ C’ -> Z
A’ B’ C’ -> Z’
Overfitting
Noise vs. Exceptions revisited
The Overfit Problem
Newer powerful models can have very complex decision surfaces which can converge well on most training sets by learning noisy and irrelevant aspects of the training set in order to minimize error (memorization in the limit)
This makes them susceptible to overfit if not carefully considered
[Figure: TSS vs. epochs – training-set error keeps decreasing while validation/test-set error eventually rises]
Avoiding Overfit
Inductive Bias – Simplest accurate model
More Training Data (vs. overtraining - One epoch limit)
Validation Set (requires separate test set)
Backpropagation – Tends to build from simple model (0
weights) to just large enough weights (Validation Set)
Stopping criteria with any constructive model (Accuracy
increase vs Statistical significance) – Noise vs. Exceptions
Specific Techniques
Weight Decay, Pruning, Jitter, Regularization
Ensembles
Ensembles
Many different Ensemble approaches
Stacking, Gating/Mixture of Experts, Bagging, Boosting, Wagging, Mimicking,
Combinations
Multiple diverse models trained on same problem and then their outputs are
combined
The specific overfit of each learning model is averaged out
If models are diverse (uncorrelated errors) then even if the individual models
are weak generalizers, the ensemble can be very accurate
[Figure: models M1, M2, M3, ..., Mn feed a combining technique]
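A minimal sketch of the combining step as a majority vote, with a bootstrap resampler for bagging (the model objects and their predict interface are illustrative assumptions):

    import random
    from collections import Counter

    def bootstrap_sample(data):
        # bagging: each model trains on a resample of the data (with replacement)
        return [random.choice(data) for _ in data]

    def ensemble_predict(models, x):
        # combine diverse models M1..Mn by simple majority vote
        votes = [m.predict(x) for m in models]
        return Counter(votes).most_common(1)[0][0]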
Application Issues
Choose relevant features
Normalize features
Can learn to ignore irrelevant features, but will have to fight
the curse of dimensionality
The more data (training examples) the better
Slower training is acceptable for complex and production applications if accuracy improves (“the week phenomenon”)
Execution normally fast regardless of training time
Decision Trees - ID3/C4.5
Top down induction of decision trees
Highly used and successful
Attribute Features - discrete nominal (mutually exclusive) –
Real valued features are discretized
Search for the smallest tree is too complex (NP-hard)
C4.5 uses the common symbolic ML philosophy of a greedy iterative approach
Decision Tree Learning
Mapping by Hyper-Rectangles
[Figure: input space partitioned into hyper-rectangles along attributes A1 and A2]
ID3 Learning Approach
C is the current set of examples
A test on attribute A partitions C into {C1, C2, ..., Cw}, where w is the number of values of A
[Figure: C branching into partitions C1, C2, C3]
Decision Tree Learning Algorithm
Start with the Training Set as C and test how each attribute
partitions C
Choose the best A for root
The goodness measure is based on how well attribute A divides
C into different output classes – A perfect attribute would
divide C into partitions that contain only one output class
each – A poor attribute (irrelevant) would leave each
partition with the same ratio of classes as in C
20 questions analogy – good questions quickly minimize the
possibilities
Continue recursively until sets are unambiguously classified or a stopping criterion is reached
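A sketch of the goodness measure as entropy-based information gain, which ID3 uses to choose the best A (function names are illustrative):

    import math
    from collections import Counter

    def entropy(examples):
        """examples: list of (attribute_dict, cls) pairs."""
        counts = Counter(cls for _, cls in examples)
        total = len(examples)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    def information_gain(examples, attr):
        """Gain(C, A) = Entropy(C) - sum_v |C_v|/|C| * Entropy(C_v)."""
        total = len(examples)
        remainder = 0.0
        for v in set(x[attr] for x, _ in examples):
            subset = [(x, c) for x, c in examples if x[attr] == v]
            remainder += len(subset) / total * entropy(subset)
        return entropy(examples) - remainder

    def best_attribute(examples, attrs):
        """Choose the best A for the root: maximum information gain."""
        return max(attrs, key=lambda a: information_gain(examples, a))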
ID3 Example and Discussion
14 examples. Uses information gain. Attributes which best discriminate between classes are chosen
[Figure: candidate splits on Humidity and Temperature]
ID3 - Conclusions
Good Empirical Results
Comparable application robustness and accuracy to neural networks – faster learning (though NNs are more natural with continuous features, both input and output)
Most used and well known of current symbolic systems -
used widely to aid in creating rules for expert systems
Nearest Neighbor Learners
Broad Spectrum
Basic K-NN, Instance Based Learning, Case Based Reasoning,
Analogical Reasoning
Simply store all or some representative subset of the
examples in the training set
Generalize on the fly rather than use pre-acquired hypothesis
- faster learning, slower execution, information retained,
memory intensive
Nearest Neighbor Algorithms
Nearest Neighbor Variations
How many examples to store
How do stored examples vote (distance weighted, etc.)
Can we choose a smaller set of near-optimal examples
(prototypes/exemplars)
Storage reduction
Faster execution
Noise robustness
Distance Metrics – non-Euclidean
Irrelevant Features – Feature weighting
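A minimal sketch of a distance-weighted vote, one of the variations listed above (Euclidean distance; names are illustrative):

    import math
    from collections import defaultdict

    def knn_predict(train, x, k=3):
        """train: list of (vector, cls). Distance-weighted vote of the k nearest."""
        dists = sorted((math.dist(xi, x), cls) for xi, cls in train)  # Euclidean
        votes = defaultdict(float)
        for d, cls in dists[:k]:
            votes[cls] += 1.0 / (d + 1e-9)   # closer neighbors get larger votes
        return max(votes, key=votes.get)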
Evolutionary Computation/Algorithms
Genetic Algorithms
Simulate “natural” evolution of structures via selection and
reproduction, based on performance (fitness)
Type of Heuristic Search - Discovery, not inductive in
isolation
Genetic Operators - Recombination (Crossover) and
Mutation are most common
Parent 1: 1 1 0 2 3 1 0 2 2 1 (fitness = 10)
Parent 2: 2 2 0 1 1 3 1 1 0 0 (fitness = 12)
Child:    2 2 0 1 3 1 0 2 2 1 (fitness = calculated or f(parents))
Evolutionary Algorithms
Start with initialized population P(t) – random, domain knowledge, etc.
Population usually made up of possible parameter settings for
a complex problem
Typically have fixed population size (like beam search)
Selection
Parent_Selection P(t) - Promising Parents used to create new
children
Survive P(t) - Pruning of unpromising candidates
Evaluate P(t) - Calculate fitness of population members.
Ranges from simple metrics to complex simulations.
Evolutionary Algorithm
Procedure EA
  t = 0;
  Initialize Population P(t);
  Evaluate P(t);
  Until Done {   /* Sufficiently “good” individuals discovered */
    t = t + 1;
    Parent_Selection P(t);
    Recombine P(t);
    Mutate P(t);
    Evaluate P(t);
    Survive P(t);
  }
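A runnable sketch of this procedure as a simple genetic algorithm; the fitness function, gene alphabet, and parameters are illustrative stand-ins, with tournament parent selection and truncation survival as one concrete choice:

    import random

    GENES, LENGTH, POP, GENS = [0, 1, 2, 3], 10, 20, 50

    def fitness(ind):                      # illustrative stand-in fitness
        return sum(ind)

    def crossover(p1, p2):                 # single-point recombination
        cut = random.randint(1, LENGTH - 1)
        return p1[:cut] + p2[cut:]

    def mutate(ind, rate=0.05):            # random changes in features
        return [random.choice(GENES) if random.random() < rate else g for g in ind]

    def select(population):                # Parent_Selection: tournament of 2
        a, b = random.sample(population, 2)
        return a if fitness(a) > fitness(b) else b

    population = [[random.choice(GENES) for _ in range(LENGTH)] for _ in range(POP)]
    for t in range(GENS):
        children = [mutate(crossover(select(population), select(population)))
                    for _ in range(POP)]
        # Survive: keep the best POP individuals of parents + children
        population = sorted(population + children, key=fitness, reverse=True)[:POP]
    print(max(population, key=fitness))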
EA Example
Goal: Discover a new automotive engine to maximize
performance, reliability, and mileage while minimizing
emissions
Features: CID (Cubic inch displacement), fuel system, # of
valves, # of cylinders, presence of turbo-charging
Assume - Test unit which tests possible engines and returns
integer measure of goodness
Start with population of random engines
Genetic Operators
Crossover variations - multi-point, uniform probability,
averaging, etc.
Mutation - Random changes in features, adaptive, different
for each feature, etc.
Others - many schemes mimicking natural genetics:
dominance, selective mating, inversion, reordering,
speciation, knowledge-based, etc.
Reproduction (terminology) – selection based on fitness, keeping the best around – supported in the algorithms
Critical to maintain balance of diversity and quality in the
population
Evolutionary Algorithms
There exist mathematical proofs that evolutionary techniques are efficient
search strategies
There are a number of different Evolutionary strategies
Genetic Algorithms
Evolutionary Programming
Evolution Strategies
Genetic Programming
Strategies differ in representations, selection, operators, evaluation, etc.
Most independently discovered, initially function optimization (EP, ES)
Strategies continue to “evolve”
Genetic Algorithm Comments
Much current work and extensions
Numerous application attempts. Can plug into many
algorithms requiring search. Has built-in heuristic. Could
augment with domain heuristics
“Lazy Man’s Solution” to any tough parameter search
Rule Induction
Creates a set of symbolic rules to solve a classification
problem
Sequential Covering Algorithms
Until no good and significant rules can be created
Create all first-order rules A_x -> Class_y
Score each rule based on goodness (accuracy) and significance
using the current training set
Iteratively (greedily) expand the best rules to n+1 attributes,
score the new rules, and prune weak rules to keep the total
candidate list at a fixed size (beam search)
Pick the one best rule and remove all instances from the
training set that the rule covers
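A simplified sketch of the covering loop; for brevity it grows one condition at a time with a beam of width 1, where the slides keep a fixed-size candidate list, and scores rules by accuracy (names are illustrative):

    def rule_covers(rule, x):
        """rule: dict of attribute -> required value."""
        return all(x.get(a) == v for a, v in rule.items())

    def accuracy(rule, examples, cls):
        covered = [(x, c) for x, c in examples if rule_covers(rule, x)]
        if not covered:
            return 0.0, 0
        correct = sum(1 for _, c in covered if c == cls)
        return correct / len(covered), len(covered)

    def learn_one_rule(examples, attrs, cls, max_len=3):
        """Greedily add the single best attribute=value condition."""
        rule = {}
        while len(rule) < max_len:
            best, best_acc = None, accuracy(rule, examples, cls)[0]
            for a in attrs:
                if a in rule:
                    continue
                for v in set(x[a] for x, _ in examples):
                    cand = dict(rule, **{a: v})
                    acc, n = accuracy(cand, examples, cls)
                    if n > 0 and acc > best_acc:
                        best, best_acc = cand, acc
            if best is None:
                break
            rule = best
        return rule

    def sequential_covering(examples, attrs, cls, min_acc=0.9):
        rules = []
        while True:
            rule = learn_one_rule(examples, attrs, cls)
            acc, n = accuracy(rule, examples, cls)
            if n == 0 or acc < min_acc:
                break   # no good and significant rule can be created
            rules.append(rule)
            # remove all instances the rule covers from the training set
            examples = [(x, c) for x, c in examples if not rule_covers(rule, x)]
        return rules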
Rule Induction Variants
Ordered Rule lists (decision lists) - naturally supports
multiple output classes
A=Green and B=Tall -> Class 1
A=Red and C=Fast -> Class 2
Else Class 1
Placing new rules at beginning or end of list
Unordered rule lists for each output class (must handle
multiple matches)
Rule induction can handle noise by no longer creating new
rules when gain is negligible or not statistically significant
Conclusion
Many new algorithms and approaches being proposed
Application areas rapidly increasing
Amount of available data and information growing
User desire for more adaptive and user-specific computer
interaction
This need for specific and adaptable user interaction will make
machine learning a more important tool in user interface
research and applications
Thank You!