To do this we allow H to contain any combination of
disjunctions, conjunctions and negations. E.g. the
target concept Sky=Sunny OR Sky=Cloudy would be:
<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>
So we can use CE knowing that our target concept will
definitely exist in the hypothesis space. But we
create a new problem: our learner will learn to
classify exactly the instances presented as training
examples and not generalise beyond them!
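To make this representation concrete, here is a minimal Python sketch (the tuple encoding and function names are illustrative, not from the slides) of a hypothesis formed as a disjunction of conjunctions of attribute constraints, where "?" accepts any value:

```python
# A conjunction is a tuple of attribute constraints ('?' = any value);
# a disjunctive hypothesis is a list of such conjunctions.
def matches_conj(conj, instance):
    # A conjunction covers an instance if every non-'?' constraint agrees.
    return all(c == "?" or c == v for c, v in zip(conj, instance))

def matches(hypothesis, instance):
    # A disjunctive hypothesis covers an instance if any disjunct does.
    return any(matches_conj(conj, instance) for conj in hypothesis)

# The target concept Sky=Sunny OR Sky=Cloudy from the slide:
h = [("Sunny", "?", "?", "?", "?", "?"),
     ("Cloudy", "?", "?", "?", "?", "?")]

print(matches(h, ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")))   # True
print(matches(h, ("Rainy", "Cold", "High", "Strong", "Warm", "Changes")))  # False
```

With negation added in the same style, any concept over X becomes expressible, which is exactly what makes H the power set of X.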
Kristian Guillaumier, 2011 61
Unbiased Learning
To see why, suppose I have 5 training examples d1, d2,
d3, d4, d5, and that d1, d2, d3 are +ve examples and
d4, d5 are −ve examples.
The S boundary will become a disjunction of the +ve
examples (since it is the most specific possible
hypothesis that covers the examples):
S = {(d1 ∨ d2 ∨ d3)}
The G boundary will become a negation (ruling out) of
the negative training examples:
G = {¬(d4 ∨ d5)}
So the only unambiguously classifiable instances are
those that were provided as training examples.
Unbiased Learning
What would happen if we use the partially learned concept
and take a vote?
Instances that were originally in the training data will be
classified unambiguously (obviously).
Any other instance not in the training data will be classified
as +ve by half of the hypotheses in the version space and as
−ve by the other half.
Note that H is the power set of X.
Let x be some unobserved instance (not in the training data),
and let h be some hypothesis in the version space that covers x.
Because H is the power set, there is also a corresponding h′ in
the version space that agrees with h on every instance except
for the classification of x.
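The half-and-half split can be checked directly on a toy problem. The sketch below (illustrative, not from the slides) takes H to be the power set of a tiny instance space X, builds the version space from a few training examples, and counts the votes for an unseen instance:

```python
from itertools import combinations

# Toy check of the half/half claim: X is tiny, H is the power set of X,
# and a hypothesis h (a subset of X) classifies x as +ve iff x is in h.
X = ["a", "b", "c", "d"]
positives, negatives = {"a"}, {"b"}              # training data
powerset = [frozenset(s) for r in range(len(X) + 1)
            for s in combinations(X, r)]

# Version space: every hypothesis consistent with the training data,
# i.e. it contains all +ve examples and no -ve examples.
VS = [h for h in powerset if positives <= h and not (negatives & h)]

x = "c"                                          # an unseen instance
pos_votes = sum(1 for h in VS if x in h)
neg_votes = len(VS) - pos_votes
print(pos_votes, neg_votes)                      # prints 2 2 — an even split
```

Pairing each hypothesis containing x with the one that differs only by removing x shows the split is exact for every unseen instance.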
More on Bias
Straight from:
A learner that makes no a priori
assumptions regarding the identity of the
target concept has no rational basis for
classifying any unseen instances.
(in fact, CE worked because we biased it with
the assumption that the target concept can be
represented by a conjunction of attribute
values).
More on Bias
Consider:
L = a learning algorithm.
Dc = {<x, c(x)>} = a set of training data.
c = some target concept.
xi = some instance we wish to classify.
L(xi, Dc) = the classification (+ve/−ve) that L assigns to xi
after learning from the training data Dc.
The inductive inference step is:
(Dc ∧ xi) ≻ L(xi, Dc)
where a ≻ b denotes that b is inductively inferred from a.
So the inductive inference step reads: given the training
data Dc and the instance xi as inputs to L, we can
inductively infer the classification of the instance.
More on Bias
Sky AirTemp Humidity Wind Water Forecast IsGoodDay
Sunny Warm Normal Strong Warm Same Yes
Sunny Warm High Strong Warm Same Yes
Rainy Cold High Strong Warm Changes No
Sunny Warm High Strong Cool Changes Yes
(Dc ∧ xi) ≻ L(xi, Dc)
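Assuming the conjunctive bias, the inductive step on this table can be sketched with a Find-S-style generalisation of the +ve rows (the function names and the chosen test instance xi are illustrative, not from the slides):

```python
# Sketch of the inductive step under the conjunctive bias (Find-S style):
# generalise the +ve rows of the table just enough to cover them all,
# then classify a new instance xi with the resulting conjunction.
def find_s(examples):
    pos = [x for x, label in examples if label]
    h = list(pos[0])
    for x in pos[1:]:
        # Replace any constraint that disagrees with a +ve example by '?'.
        h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return h

def classify(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

# The training data Dc from the table above.
Dc = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Changes"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Changes"), True),
]
h = find_s(Dc)
print(h)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']

xi = ("Sunny", "Warm", "Normal", "Strong", "Cool", "Changes")
print(classify(h, xi))  # True: L inductively infers that xi is +ve
```

The inferred classification of xi does not follow from Dc alone; it follows only once the conjunctive representation is assumed, which is the point of the next slide.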
More on Bias
Because L is an inductive learning algorithm, in
general, we cannot prove that the result L(xi, Dc)
is correct. I.e. the classification of the example
does not necessarily follow deductively from the
training data (it cannot be proven from the data alone).
However, we can add a number of assumptions
to our system so that the classification would
follow deductively.
The inductive bias of L is defined as these
assumptions.
More on Bias
Let B = these assumptions (e.g. the hypothesis
space is made up only of conjunctions of
attribute values).
Then the inductive bias of L is B, giving:
(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)
where the notation a ⊢ b denotes that b follows
deductively from a (b is provable from a).
Defn. of Inductive Bias
Consider a concept learning algorithm L for the set of instances X.
Let c be an arbitrary concept over X and let Dc = {<x, c(x)>} be an
arbitrary set of training examples of c.
Let L(xi, Dc) denote the classification assigned to the instance xi by L
after training on the data Dc.
The Inductive Bias of L is any minimal set of assertions B such that for
any target concept c and corresponding training examples Dc:
(∀xi ∈ X)[(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
The Inductive Bias of CE
Let us specify what L(xi, Dc) means for CE (how
classification works).
Given training data Dc, CE will compute the version space
VS_{H,Dc}.
Then it will classify a new instance xi by taking a vote
amongst the hypotheses in this version space.
A classification will be output (+ve or −ve) if all the
hypotheses in the version space unanimously agree.
Otherwise no classification is output ("I can't tell from the
training data").
The inductive bias of CE is that the target concept c is
contained in the hypothesis space, i.e. c ∈ H.
Why?
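The unanimous-vote definition of L(xi, Dc) can be sketched as follows, with hypotheses represented as simple predicates (the representation is illustrative; CE's version-space computation is assumed to have happened already):

```python
# Sketch of L(xi, Dc) for CE: a unanimous vote over the version space.
# Hypotheses are predicates (instance -> bool); VS is the version space
# assumed to have already been computed by CE from the training data.
def ce_classify(VS, xi):
    votes = {h(xi) for h in VS}
    if votes == {True}:
        return "+ve"
    if votes == {False}:
        return "-ve"
    return None  # disagreement: "I can't tell from the training data"

# Illustrative version space of two hypotheses over numbers:
VS = [lambda x: x > 0, lambda x: x >= 1]
print(ce_classify(VS, 5))    # '+ve'  (both hypotheses agree)
print(ce_classify(VS, 0.5))  # None   (they disagree, so no output)
```

Note that the output is trusted only when it is unanimous, which is what the argument on the next slide exploits.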
The Inductive Bias of CE
1:
Notice that if we assume that c ∈ H, then, since c is
consistent with every training example in Dc, it follows
deductively (we can prove) that c ∈ VS_{H,Dc}.
2:
Recall that we defined the classification L(xi, Dc) to be
a unanimous vote amongst all hypotheses in VS_{H,Dc}.
Thus, if L outputs the classification L(xi, Dc), then so
does every hypothesis h ∈ VS_{H,Dc}, including the
hypothesis c ∈ VS_{H,Dc}.
Therefore c(xi) = L(xi, Dc).