Concept Learning

Learning from examples


- General-to-specific ordering of hypotheses
- Version spaces and the candidate elimination algorithm
- Inductive bias

What's Concept Learning?

- Infer the general definition of some concept, given examples labeled as members or nonmembers of that concept.
- Example: learn the category "car" or "bird".
- The concept is often formulated as a boolean-valued function.
- Concept learning can be formulated as a problem of searching a hypothesis space.

Training Examples for Concept EnjoySport

Concept: days on which my friend Tom enjoys his favourite water sports.
Task: predict the value of EnjoySport for an arbitrary day, based on the values of the other attributes.

Example  Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
1        Sunny  Warm  Normal  Strong  Warm   Same      Yes
2        Sunny  Warm  High    Strong  Warm   Same      Yes
3        Rainy  Cold  High    Strong  Warm   Change    No
4        Sunny  Warm  High    Strong  Cool   Change    Yes

Representing Hypotheses

Hypothesis h is described as a conjunction of constraints on attributes.
Each constraint can be:
- a specific value, e.g. Water = Warm
- a "don't care" value, e.g. Water = ?
- no value allowed (the null constraint), e.g. Water = ∅

Example hypothesis h over Sky, Temp, Humid, Wind, Water, Forecast:
  < Sunny, ?, ?, Strong, ?, Same >
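As a concrete illustration (not part of the original slides), a hypothesis can be stored as a tuple of per-attribute constraints. The "0" marker for the null constraint and the matches helper below are assumptions of this sketch.

```python
# Hedged sketch of the hypothesis representation: one constraint per attribute,
# "?" = any value, "0" = no value allowed (both markers are assumptions).

def matches(h, x):
    """h(x) = 1 iff every attribute constraint in h is satisfied by x."""
    return all(c != "0" and (c == "?" or c == v) for c, v in zip(h, x))

h = ("Sunny", "?", "?", "Strong", "?", "Same")
x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(matches(h, x))   # True: x satisfies every constraint in h
```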

Prototypical Concept Learning Task

Given:
- Instance space X: possible days, described by the attributes Sky, Temp, Humidity, Wind, Water, Forecast
- Target function c: EnjoySport : X → {0, 1}
- Hypothesis space H: conjunctions of literals, e.g. < Sunny, ?, ?, Strong, ?, Same >
- Training examples D: positive and negative examples of the target function: <x1, c(x1)>, ..., <xn, c(xn)>
Determine:
- A hypothesis h in H such that h(x) = c(x) for all x in D.

Inductive Learning Hypothesis

Any hypothesis found to approximate the target function well over the training examples will also approximate the target function well over unobserved examples.

So: find the hypothesis that best fits the training data.

Number of Instances, Concepts, Hypotheses

- Sky: Sunny, Cloudy, Rainy
- AirTemp: Warm, Cold
- Humidity: Normal, High
- Wind: Strong, Weak
- Water: Warm, Cold
- Forecast: Same, Change

- Number of distinct instances: 3*2*2*2*2*2 = 96
- Number of distinct concepts: 2^96
- Number of syntactically distinct hypotheses: 5*4*4*4*4*4 = 5120
- Number of semantically distinct hypotheses: 1 + 4*3*3*3*3*3 = 973

Organize the search to take advantage of the structure of the hypothesis space and improve running time (the counts above are reproduced in the sketch below).
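The arithmetic above can be checked with a few lines of Python; this is only a sanity check, with the attribute sizes taken from the list on this slide.

```python
# Sanity check of the counts: 6 attributes with 3, 2, 2, 2, 2, 2 values.
sizes = [3, 2, 2, 2, 2, 2]

instances = 1
for k in sizes:
    instances *= k            # 3*2*2*2*2*2 = 96 distinct instances

concepts = 2 ** instances     # 2^96 distinct concepts (subsets of X), roughly 7.9e28

syntactic = 1
for k in sizes:
    syntactic *= k + 2        # each attribute: k values, "?" and the null constraint

semantic = 1
for k in sizes:
    semantic *= k + 1         # each attribute: k values and "?"
semantic += 1                 # plus the single hypothesis that classifies nothing positive

print(instances, syntactic, semantic)   # 96 5120 973
```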

General-to-Specific Ordering

Consider two hypotheses:
- h1 = < Sunny, ?, ?, Strong, ?, ? >
- h2 = < Sunny, ?, ?, ?, ?, ? >

Compare the sets of instances covered by h1 and h2: h2 imposes fewer constraints than h1 and therefore classifies more instances x as positive (h(x) = 1). h2 is a more general concept.

Definition: Let hj and hk be boolean-valued functions defined over X. Then hj is more general than or equal to hk (written hj ≥g hk) if and only if

  ∀x ∈ X : [ (hk(x) = 1) → (hj(x) = 1) ]

The ≥g relation imposes a partial order over the hypothesis space H that is exploited by many concept learning methods.
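For this conjunctive representation the ≥g test reduces to a per-attribute comparison. The sketch below is one way to write it, again assuming the "0" marker for the null constraint.

```python
# Hedged sketch of the more-general-than-or-equal-to relation (h1 >=_g h2).
def more_general_or_equal(h1, h2):
    if "0" in h2:                       # h2 covers no instance at all
        return True
    return all(a == "?" or a == b for a, b in zip(h1, h2))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))    # True: h2 imposes fewer constraints
print(more_general_or_equal(h1, h2))    # False
```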

Instances, Hypotheses, and the More-General Relation

[Figure: instance space (x1, x2) and hypothesis space (h1, h2, h3), ordered from specific to general, with h2 ≥g h1 and h2 ≥g h3.]

x1 = < Sunny, Warm, High, Strong, Cool, Same >
x2 = < Sunny, Warm, High, Light, Warm, Same >
h1 = < Sunny, ?, ?, Strong, ?, ? >   (h1 is a minimal specialization of h2)
h2 = < Sunny, ?, ?, ?, ?, ? >        (h2 is a minimal generalization of h1)
h3 = < Sunny, ?, ?, ?, Cool, ? >

Find-S Algorithm

1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   - For each attribute constraint ai in h:
     if the constraint ai is satisfied by x, do nothing;
     else replace ai in h by the next more general constraint that is satisfied by x
     (the minimal generalization of h that covers x).
3. Output hypothesis h.
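A minimal Python sketch of Find-S, under the assumption that hypotheses are tuples with "?" for "any value" and "0" for the null constraint; the data is the EnjoySport table from earlier. It is an illustration, not the slides' own code.

```python
# Minimal Find-S sketch for conjunctive hypotheses.
def find_s(examples, n_attributes):
    h = ["0"] * n_attributes                  # most specific hypothesis in H
    for x, positive in examples:
        if not positive:                      # Find-S ignores negative examples
            continue
        for i, (ai, xi) in enumerate(zip(h, x)):
            if ai == "0":
                h[i] = xi                     # adopt the observed value
            elif ai != "?" and ai != xi:
                h[i] = "?"                    # minimal generalization covering x
    return tuple(h)

enjoy_sport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(enjoy_sport, 6))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```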

Constraint Generalization (attribute Sky)

  ∅ (no value)  →  Sunny, Cloudy, Rainy  →  ? (any value)

The null constraint is the most specific, a specific value is more general, and "?" is the most general constraint.

Illustration of Find-S

[Figure: instances x1..x4 and hypotheses h0..h4, ordered from specific to general.]

h0 = < ∅, ∅, ∅, ∅, ∅, ∅ >
x1 = < Sunny, Warm, Normal, Strong, Warm, Same > +   →  h1 = < Sunny, Warm, Normal, Strong, Warm, Same >
x2 = < Sunny, Warm, High, Strong, Warm, Same > +     →  h2 = < Sunny, Warm, ?, Strong, Warm, Same >
x3 = < Rainy, Cold, High, Strong, Warm, Change > -   →  h3 = h2 (negative examples are ignored)
x4 = < Sunny, Warm, High, Strong, Cool, Change > +   →  h4 = < Sunny, Warm, ?, Strong, ?, ? >

Properties of Find-S

- The hypothesis space is described by conjunctions of attribute constraints.
- Find-S outputs the most specific hypothesis in H that is consistent with the positive training examples.
- The output hypothesis will also be consistent with the negative examples, provided the target concept is contained in H. (Why?)

Why is Find-S Consistent?

[Figure: the output hypothesis s covers only positive examples; any hypothesis h that is consistent with D satisfies h ≥g s.]

Since the target concept c is assumed to be in H and is consistent with the positive examples, c is more general than or equal to s. Because c excludes every negative example and s covers only a subset of the instances covered by c, s excludes the negative examples as well.

Complaints about Find-S

- Cannot tell whether the learner has converged to the target concept, in the sense that it is unable to determine whether it has found the only hypothesis consistent with the training examples. (More examples would give a better approximation.)
- Cannot tell when the training data is inconsistent, as it ignores negative training examples. (We would prefer to detect and tolerate errors or noise.)
- Why prefer the most specific hypothesis? Why not the most general, or some other hypothesis? (A more specific hypothesis is less likely to be a coincidence.)
- What if there are multiple maximally specific hypotheses? (All of them would be equally plausible.)

Version Spaces

- A hypothesis h is consistent with a set of training examples D of the target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D:

  Consistent(h, D) := ∀<x, c(x)> ∈ D . h(x) = c(x)

- The version space VS_{H,D}, with respect to hypothesis space H and training set D, is the subset of hypotheses from H consistent with all training examples:

  VS_{H,D} = { h ∈ H | Consistent(h, D) }

List-Then-Eliminate Algorithm

1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>:
   remove from VersionSpace any hypothesis h that is inconsistent with the training example, i.e. h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

This is inefficient, as it does not utilize the structure of the hypothesis space (a brute-force sketch follows).
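A brute-force sketch of List-Then-Eliminate, assuming the attribute domains listed earlier and the tuple representation used above. It enumerates all 5120 syntactically distinct conjunctive hypotheses, which is feasible only because this toy space is so small.

```python
# Brute-force List-Then-Eliminate sketch for the EnjoySport hypothesis space.
from itertools import product

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(c != "0" and (c == "?" or c == v) for c, v in zip(h, x))

def list_then_eliminate(examples):
    # every syntactically distinct hypothesis: each value, "?" or the null constraint
    all_hypotheses = product(*[dom + ("?", "0") for dom in DOMAINS])
    return [h for h in all_hypotheses
            if all(matches(h, x) == label for x, label in examples)]

enjoy_sport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
for h in list_then_eliminate(enjoy_sport):
    print(h)   # the six hypotheses of the version space shown on the next slide
```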

Example Version Space

S: { < Sunny, Warm, ?, Strong, ?, ? > }

   < Sunny, ?, ?, Strong, ?, ? >   < Sunny, Warm, ?, ?, ?, ? >   < ?, Warm, ?, Strong, ?, ? >

G: { < Sunny, ?, ?, ?, ?, ? >,  < ?, Warm, ?, ?, ?, ? > }

Training examples:
x1 = < Sunny, Warm, Normal, Strong, Warm, Same >   +
x2 = < Sunny, Warm, High, Strong, Warm, Same >     +
x3 = < Rainy, Cold, High, Strong, Warm, Change >   -
x4 = < Sunny, Warm, High, Strong, Cool, Change >   +

Representing Version Spaces

- The general boundary G of version space VS_{H,D} is the set of its maximally general hypotheses.
- The specific boundary S of version space VS_{H,D} is the set of its maximally specific hypotheses.
- Every hypothesis of the version space lies between these boundaries:

  VS_{H,D} = { h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s) }

  where x ≥g y means x is more general than or equal to y.
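The boundary characterization can be checked directly: a hypothesis belongs to the version space iff it lies between some member of S and some member of G. A small sketch, using the same tuple representation as before:

```python
# Membership test for the version space via its S and G boundaries.
def more_general_or_equal(h1, h2):
    if "0" in h2:
        return True
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def in_version_space(h, S, G):
    return any(more_general_or_equal(g, h) and more_general_or_equal(h, s)
               for s in S for g in G)

S = {("Sunny", "Warm", "?", "Strong", "?", "?")}
G = {("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")}
print(in_version_space(("Sunny", "?", "?", "Strong", "?", "?"), S, G))  # True
print(in_version_space(("?", "?", "?", "Strong", "?", "?"), S, G))      # False
```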

Boundaries of the Version Space

[Figure: a hypothesis h lying between the boundaries is consistent with D; a hypothesis strictly more specific than S fails to cover some positive example (Consistent(s', D) = FALSE), and a hypothesis strictly more general than G covers some negative example (Consistent(g', D) = FALSE).]

Candidate Elimination Algorithm

G ← maximally general hypotheses in H
S ← maximally specific hypotheses in H
For each training example d = <x, c(x)>:
  modify G and S so that G and S remain consistent with d

Case analysis for updating the boundaries on a training example d (summarizing the figures):

Positive example d:
- g(d) = s(d) = 0:  remove g from G and s from S
- g(d) = 1, s(d) = 0:  generalize s minimally so that it covers d
- g(d) = s(d) = 1:  no change needed

Negative example d:
- g(d) = s(d) = 1:  remove s from S and g from G
- g(d) = 1, s(d) = 0:  specialize g minimally so that it excludes d
- g(d) = s(d) = 0:  no change needed

Candidate Elimination Algorithm

G ← maximally general hypotheses in H
S ← maximally specific hypotheses in H
For each training example d = <x, c(x)>:

If d is a positive example:
- Remove from G any hypothesis that is inconsistent with d
- For each hypothesis s in S that is not consistent with d:
  - remove s from S
  - add to S all minimal generalizations h of s such that
    - h is consistent with d, and
    - some member of G is more general than h
- Remove from S any hypothesis that is more general than another hypothesis in S

Candidate Elimination Algorithm

If d is a negative example:
- Remove from S any hypothesis that is inconsistent with d
- For each hypothesis g in G that is not consistent with d:
  - remove g from G
  - add to G all minimal specializations h of g such that
    - h is consistent with d, and
    - some member of S is more specific than h
- Remove from G any hypothesis that is less general than another hypothesis in G
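The two cases above can be combined into a compact sketch of candidate elimination for this conjunctive representation. The DOMAINS table and the "0" null-constraint marker are assumptions carried over from the earlier sketches; this is an illustrative implementation, not the slides' own code.

```python
# Hedged candidate-elimination sketch for conjunctive hypotheses over
# discrete attributes ("?" = any value, "0" = null constraint).

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(c != "0" and (c == "?" or c == v) for c, v in zip(h, x))

def more_general(h1, h2):
    """h1 >=_g h2."""
    if "0" in h2:
        return True
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def min_generalization(s, x):
    """The unique minimal generalization of s that covers the positive x."""
    return tuple(xi if si == "0" else (si if si == xi else "?")
                 for si, xi in zip(s, x))

def min_specializations(g, x):
    """All minimal specializations of g that exclude the negative x."""
    out = []
    for i, gi in enumerate(g):
        if gi == "?":
            out.extend(g[:i] + (v,) + g[i + 1:] for v in DOMAINS[i] if v != x[i])
    return out

def candidate_elimination(examples):
    n = len(DOMAINS)
    S, G = {("0",) * n}, {("?",) * n}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            S2 = set()
            for s in S:
                if matches(s, x):
                    S2.add(s)
                else:
                    h = min_generalization(s, x)
                    if any(more_general(g, h) for g in G):
                        S2.add(h)
            # keep only the maximally specific hypotheses
            S = {s for s in S2 if not any(s != t and more_general(s, t) for t in S2)}
        else:
            S = {s for s in S if not matches(s, x)}
            G2 = set()
            for g in G:
                if not matches(g, x):
                    G2.add(g)
                else:
                    for h in min_specializations(g, x):
                        if any(more_general(h, s) for s in S):
                            G2.add(h)
            # keep only the maximally general hypotheses
            G = {g for g in G2 if not any(g != t and more_general(t, g) for t in G2)}
    return S, G
```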

Example: Candidate Elimination

S0: { < ∅, ∅, ∅, ∅, ∅, ∅ > }
G0: { < ?, ?, ?, ?, ?, ? > }

x1 = < Sunny, Warm, Normal, Strong, Warm, Same >  +
S1: { < Sunny, Warm, Normal, Strong, Warm, Same > }
G1: { < ?, ?, ?, ?, ?, ? > }

x2 = < Sunny, Warm, High, Strong, Warm, Same >  +
S2: { < Sunny, Warm, ?, Strong, Warm, Same > }
G2: { < ?, ?, ?, ?, ?, ? > }

Example: Candidate Elimination (continued)

S2: { < Sunny, Warm, ?, Strong, Warm, Same > }
G2: { < ?, ?, ?, ?, ?, ? > }

x3 = < Rainy, Cold, High, Strong, Warm, Change >  -
S3: { < Sunny, Warm, ?, Strong, Warm, Same > }
G3: { < Sunny, ?, ?, ?, ?, ? >, < ?, Warm, ?, ?, ?, ? >, < ?, ?, ?, ?, ?, Same > }

x4 = < Sunny, Warm, High, Strong, Cool, Change >  +
S4: { < Sunny, Warm, ?, Strong, ?, ? > }
G4: { < Sunny, ?, ?, ?, ?, ? >, < ?, Warm, ?, ?, ?, ? > }
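Running the candidate_elimination sketch from the previous slide on these four examples reproduces the final boundaries S4 and G4 (this usage snippet assumes that sketch is in scope):

```python
enjoy_sport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(enjoy_sport)
print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```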

Remarks on Version Spaces and Candidate Elimination

- The algorithm converges to the target concept when
  - there is no error in the training examples, and
  - the target concept is in H.
- It converges to an empty version space when
  - the training data is inconsistent, or
  - the target concept cannot be described by the hypothesis representation.
- What should be the next training example?
- How should new instances be classified?

Classification of New Data

Version space (as computed above):
S: { < Sunny, Warm, ?, Strong, ?, ? > }
   < Sunny, ?, ?, Strong, ?, ? >   < Sunny, Warm, ?, ?, ?, ? >   < ?, Warm, ?, Strong, ?, ? >
G: { < Sunny, ?, ?, ?, ?, ? >,  < ?, Warm, ?, ?, ?, ? > }

New instances (positive/negative votes over the six hypotheses):
x5 = < Sunny, Warm, Normal, Strong, Cool, Change >  +  (6/0)
x6 = < Rainy, Cold, Normal, Light, Warm, Same >      -  (0/6)
x7 = < Sunny, Warm, Normal, Light, Warm, Same >      ?  (3/3)
x8 = < Sunny, Cold, Normal, Strong, Warm, Same >     ?  (2/4)
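The 6/0, 0/6, 3/3 and 2/4 tallies can be reproduced by letting each version-space hypothesis vote; a small sketch, with the six hypotheses hard-coded from the diagram above:

```python
# Classify a new instance by the votes of the six version-space hypotheses.
VERSION_SPACE = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("?", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),
    ("?", "Warm", "?", "?", "?", "?"),
]

def matches(h, x):
    return all(c != "0" and (c == "?" or c == v) for c, v in zip(h, x))

def classify(x):
    pos = sum(matches(h, x) for h in VERSION_SPACE)
    neg = len(VERSION_SPACE) - pos
    label = "+" if neg == 0 else "-" if pos == 0 else "?"
    return label, pos, neg

print(classify(("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))  # ('+', 6, 0)
print(classify(("Rainy", "Cold", "Normal", "Light", "Warm", "Same")))     # ('-', 0, 6)
print(classify(("Sunny", "Warm", "Normal", "Light", "Warm", "Same")))     # ('?', 3, 3)
print(classify(("Sunny", "Cold", "Normal", "Strong", "Warm", "Same")))    # ('?', 2, 4)
```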

Inductive Leap

+ < Sunny, Warm, Normal, Strong, Cool, Change >
+ < Sunny, Warm, Normal, Light, Warm, Same >

S: < Sunny, Warm, Normal, ?, ?, ? >

How can we justify classifying the new example
  < Sunny, Warm, Normal, Strong, Warm, Same >
as positive?

Bias: we assume that the hypothesis space H contains the target concept c; in other words, that c can be described by a conjunction of attribute constraints.

Biased Hypothesis Space

Our hypothesis space is unable to represent even a simple disjunctive target concept such as (Sky = Sunny) ∨ (Sky = Cloudy): a problem of expressibility.

x1 = < Sunny, Warm, Normal, Strong, Cool, Change >   +
x2 = < Cloudy, Warm, Normal, Strong, Cool, Change >  +
S: { < ?, Warm, Normal, Strong, Cool, Change > }
x3 = < Rainy, Warm, Normal, Strong, Cool, Change >   -
S: { }   (the only candidate in S wrongly covers the negative x3)
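To see the problem concretely: the most specific conjunctive hypothesis covering x1 and x2 already covers the negative x3, so no conjunctive hypothesis is consistent with all three examples. A small check, reusing the matches helper from the earlier sketches (the value of x3 follows Mitchell's version of this example):

```python
# The conjunctive S covering x1 and x2 is inconsistent with the negative x3.
def matches(h, x):
    return all(c != "0" and (c == "?" or c == v) for c, v in zip(h, x))

s = ("?", "Warm", "Normal", "Strong", "Cool", "Change")       # covers x1 and x2
x3 = ("Rainy", "Warm", "Normal", "Strong", "Cool", "Change")  # labeled negative
print(matches(s, x3))   # True: s wrongly covers x3, so no consistent conjunction exists
```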

Unbiased Learner

- Idea: choose H so that it can express every teachable concept; that is, H is the set of all possible subsets of X, the power set P(X).
- |X| = 96, so |P(X)| = 2^96 ≈ 10^28 distinct concepts.
- H then allows disjunctions, conjunctions and negations, e.g.
  < Sunny, Warm, Normal, ?, ?, ? >  ∨  < ?, ?, ?, ?, ?, Change >
- H surely contains the target concept.

Unbiased Learner

What are S and G in this case? Assume positive examples (x1, x2, x3) and negative examples (x4, x5):

  S: { (x1 ∨ x2 ∨ x3) }
  G: { ¬(x4 ∨ x5) }

The only instances that can be classified unambiguously are the training examples themselves. In other words, in order to learn the target concept one would have to present every single instance in X as a training example. Each unobserved instance will be classified positive by precisely half the hypotheses in the version space and negative by the other half: a problem of generalizability.

Futility of Bias-Free Learning

A learner that makes no prior assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.

No free lunch!

Inductive Bias

Consider:
- a concept learning algorithm L
- instances X, target concept c
- training examples Dc = { <x, c(x)> }
- let L(xi, Dc) denote the classification assigned to instance xi by L after training on Dc.

Definition: the inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training data Dc

  (∀xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]

where A ⊢ B means that B follows deductively from A (A logically entails B).

Inductive Systems and Equivalent Deductive Systems

[Figure: an inductive system and its equivalent deductive system.]

Inductive system: training examples and a new instance are fed to the candidate elimination algorithm, which uses hypothesis space H and outputs a classification of the new instance, or "don't know".

Equivalent deductive system: the same training examples and new instance, plus the explicit assertion "H contains the target concept", are fed to a theorem prover, which outputs a classification of the new instance, or "don't know".

Three Learners with Different Biases

- Rote learner: store examples, and classify x if and only if it matches a previously observed example.
  - No inductive bias.
- Version space candidate elimination algorithm.
  - Bias: the hypothesis space contains the target concept.
- Find-S.
  - Bias: the hypothesis space contains the target concept, and all instances are negative unless the opposite is entailed by its other knowledge.

Summary

- Concept learning as search through H
- General-to-specific ordering over H
- Version space candidate elimination algorithm
- S and G boundaries characterize the learner's uncertainty
- The learner can generate useful queries
- Inductive leaps are possible only if the learner is biased
- Inductive learners can be modelled by equivalent deductive systems
