Probabilistic Learning
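• For the example above, if evidence for air temperature and wind is added (assuming the evidence variables are conditionally independent given Rain):

$$P(\text{Rain} \mid \text{Cloudy}, \text{Temp}, \text{Wind}) = \frac{P(\text{Cloudy} \mid \text{Rain}) \times P(\text{Temp} \mid \text{Rain}) \times P(\text{Wind} \mid \text{Rain}) \times P(\text{Rain})}{P(\text{Cloudy}, \text{Temp}, \text{Wind})}$$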
CASE EXAMPLE
• Given:
  - A doctor knows that meningitis causes a stiff neck 50% of the time: P(S|M) = 0.5
  - The prior probability of any patient having meningitis is 1/50,000: P(M)
  - The prior probability of any patient having a stiff neck is 1/20: P(S)
• If a patient has a stiff neck, what is the probability that he/she has meningitis, P(M|S)?
$$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
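The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `posterior` is an assumption made for illustration, and exact fractions are used so that no rounding enters the result.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence


p_s_given_m = Fraction(1, 2)   # P(S|M): meningitis causes stiff neck 50% of the time
p_m = Fraction(1, 50000)       # P(M): prior probability of meningitis
p_s = Fraction(1, 20)          # P(S): prior probability of stiff neck

p_m_given_s = posterior(p_s_given_m, p_m, p_s)
print(p_m_given_s, float(p_m_given_s))  # 1/5000 0.0002
```

Even though a stiff neck is a common symptom of meningitis, the posterior stays small because the prior P(M) is tiny compared with P(S).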
NAÏVE BAYES CLASSIFICATION
BAYESIAN CLASSIFIER
• Given a record with attributes (A1, A2, A3, …, An), the goal of the classifier is to predict the class C.
Approach
• Compute the posterior probability P(C | A1, A2, …, An) for every value of C using Bayes' theorem:

$$P(C \mid A_1 A_2 \ldots A_n) = \frac{P(A_1 A_2 \ldots A_n \mid C)\,P(C)}{P(A_1 A_2 \ldots A_n)}$$
Data:

Tid  Refund         Marital Status  Taxable Income  Evade
     (categorical)  (categorical)   (continuous)    (class)
1    Yes            Single          125K            No
2    No             Married         100K            No
3    No             Single          70K             No
4    Yes            Married         120K            No
5    No             Divorced        95K             Yes
6    No             Married         60K             No
7    Yes            Divorced        220K            No
8    No             Single          85K             Yes
9    No             Married         75K             No
10   No             Single          90K             Yes

• Class: Evade
• Discrete attributes: Refund, Marital Status
• Continuous attribute: Taxable Income
CLASS PROBABILITY
Class:
P(C) = Nc / N
     = number of records with class C / total number of records

Example (from the ten records above):
P(No) = 7/10 = 0.7 = 70%
P(Yes) = 3/10 = 0.3 = 30%
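As a rough illustration of P(C) = Nc / N, the class priors can be counted directly from the Evade column of the ten records. The variable names `evade`, `counts`, and `priors` are assumptions chosen for this sketch.

```python
from collections import Counter

# Evade column (class label) of the ten training records, in Tid order
evade = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

counts = Counter(evade)                                   # Nc for each class
priors = {c: n / len(evade) for c, n in counts.items()}   # P(C) = Nc / N
print(priors)  # {'No': 0.7, 'Yes': 0.3}
```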
PROBABILITY FOR DISCRETE ATTRIBUTES
Discrete attributes:
P(Ai | Ck) = |Aik| / Nc

where |Aik| is the number of instances that have attribute value Ai and belong to class Ck, and Nc is the number of instances in class Ck.

Example:
P(Status=Married|No) = 4/7
P(Refund=Yes|Yes) = 0
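The counting rule P(Ai | Ck) = |Aik| / Nc can be sketched as follows. The function name `cond_prob` and the tuple layout of `records` are assumptions made for this example; the values come from the ten-record table.

```python
from fractions import Fraction

# (Refund, Marital Status, Evade) for Tid 1..10
records = [
    ("Yes", "Single", "No"), ("No", "Married", "No"),
    ("No", "Single", "No"), ("Yes", "Married", "No"),
    ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"), ("No", "Single", "Yes"),
    ("No", "Married", "No"), ("No", "Single", "Yes"),
]


def cond_prob(attr_index, value, cls):
    """P(attribute = value | class = cls), estimated by counting."""
    in_class = [r for r in records if r[-1] == cls]            # Nc records
    matches = [r for r in in_class if r[attr_index] == value]  # |Aik| records
    return Fraction(len(matches), len(in_class))


print(cond_prob(1, "Married", "No"))  # 4/7
print(cond_prob(0, "Yes", "Yes"))     # 0
```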
PROBABILITY FOR CONTINUOUS ATTRIBUTES
For continuous attributes:
• Discretize the range into bins
  - one ordinal attribute per bin
  - violates the independence assumption
• Two-way split: (A < v) or (A > v)
  - choose only one of the two splits as the new attribute
• Probability density estimation:
  - Assume the attribute follows a normal distribution
  - Use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
  - Once the probability distribution is known, use it to estimate the conditional probability P(Ai|c)
Normal distribution:

$$P(A_i \mid c_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\left(-\frac{(A_i - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$$

One for each (Ai, cj) pair.

• For (Income, Class=No):
  - If Class=No: sample mean = 110, sample variance = 2975

$$P(\text{Income}=120 \mid \text{No}) = \frac{1}{\sqrt{2\pi \times 2975}} \exp\left(-\frac{(120-110)^2}{2 \times 2975}\right) = 0.0072$$
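The density estimate for Income given Class=No can be reproduced with the standard library. This is a sketch under the normal-distribution assumption stated earlier; `gaussian_pdf` is a hypothetical helper name, and `statistics.variance` is used because the figure 2975 is the sample variance (divisor n - 1).

```python
import math
from statistics import mean, variance

# Taxable income (in thousands) of the seven records with Evade = No
income_no = [125, 100, 70, 120, 60, 220, 75]


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


mu = mean(income_no)       # sample mean: 110
var = variance(income_no)  # sample variance: 2975
p_income = gaussian_pdf(120, mu, var)
print(mu, var, round(p_income, 4))  # 110 2975 0.0072
```

Note that the result is a density value, not a probability mass; it is used only to compare classes.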
Given a test record:

Refund  Marital Status  Taxable Income  Evade
No      Married         120K            ?
Conditional probabilities estimated from the training data:

P(Refund=Yes|No) = 3/7
P(Refund=No|No) = 4/7
P(Refund=Yes|Yes) = 0
P(Refund=No|Yes) = 1
P(Marital Status=Single|No) = 2/7
P(Marital Status=Divorced|No) = 1/7
P(Marital Status=Married|No) = 4/7
P(Marital Status=Single|Yes) = 2/3
P(Marital Status=Divorced|Yes) = 1/3
P(Marital Status=Married|Yes) = 0

For taxable income:
If class=No: sample mean = 110, sample variance = 2975
If class=Yes: sample mean = 90, sample variance = 25
EXAMPLE
Naive Bayes classifier: X = (Refund = No, Married, Income = 120K)

• P(X|Class=No) = P(Refund=No|Class=No) × P(Married|Class=No) × P(Income=120K|Class=No)
                = 4/7 × 4/7 × 0.0072 = 0.0024
• P(X|Class=Yes) = P(Refund=No|Class=Yes) × P(Married|Class=Yes) × P(Income=120K|Class=Yes)
                 = 1 × 0 × 1.2 × 10^-9 = 0

Since P(X|No)P(No) = 0.0024 × 0.7 ≈ 0.0017 > P(X|Yes)P(Yes) = 0,
therefore P(No|X) > P(Yes|X)
=> Class = No
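The whole classification of X can be sketched end to end in Python. This is an illustrative sketch, not a reference implementation: the names `data`, `gaussian_pdf`, and `score` are assumptions, categorical attributes are estimated by counting, and income is modelled with a normal density using the sample mean and sample variance of each class.

```python
import math
from statistics import mean, variance

# (Refund, Marital Status, Taxable Income in thousands, Evade) for Tid 1..10
data = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def score(x, cls):
    """Return P(X|cls) * P(cls) under the naive independence assumption."""
    rows = [r for r in data if r[3] == cls]
    prior = len(rows) / len(data)
    p_refund = sum(r[0] == x[0] for r in rows) / len(rows)
    p_marital = sum(r[1] == x[1] for r in rows) / len(rows)
    incomes = [r[2] for r in rows]
    p_income = gaussian_pdf(x[2], mean(incomes), variance(incomes))
    return prior * p_refund * p_marital * p_income


x = ("No", "Married", 120)  # the test record X
scores = {c: score(x, c) for c in ("No", "Yes")}
prediction = max(scores, key=scores.get)
print(prediction)  # No
```

The shared denominator P(X) is omitted because it is the same for both classes and does not change which score is larger.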
EXAMPLE: ANIMAL CLASSIFICATION

Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals

Test record:

Give Birth  Can Fly  Live in Water  Have Legs  Class
yes         no       yes            no         ?

A: attributes of the test record
M: mammals
N: non-mammals

$$P(A \mid M) = \frac{6}{7} \times \frac{6}{7} \times \frac{2}{7} \times \frac{2}{7} = 0.06$$

$$P(A \mid N) = \frac{1}{13} \times \frac{10}{13} \times \frac{3}{13} \times \frac{4}{13} = 0.0042$$

$$P(A \mid M)\,P(M) = 0.06 \times \frac{7}{20} = 0.021$$

$$P(A \mid N)\,P(N) = 0.0042 \times \frac{13}{20} = 0.0027$$

P(A|M)P(M) > P(A|N)P(N)
=> Mammals
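The same comparison can be reproduced by counting over the twenty animals. This is a minimal sketch: the tuple layout of `animals` and the helper name `score` are assumptions for illustration, and exact fractions are used so the products are not affected by intermediate rounding.

```python
from fractions import Fraction

# (Give Birth, Can Fly, Live in Water, Have Legs, Class), in table order
animals = [
    ("yes", "no", "no", "yes", "mammals"),            # human
    ("no", "no", "no", "no", "non-mammals"),          # python
    ("no", "no", "yes", "no", "non-mammals"),         # salmon
    ("yes", "no", "yes", "no", "mammals"),            # whale
    ("no", "no", "sometimes", "yes", "non-mammals"),  # frog
    ("no", "no", "no", "yes", "non-mammals"),         # komodo
    ("yes", "yes", "no", "yes", "mammals"),           # bat
    ("no", "yes", "no", "yes", "non-mammals"),        # pigeon
    ("yes", "no", "no", "yes", "mammals"),            # cat
    ("yes", "no", "yes", "no", "non-mammals"),        # leopard shark
    ("no", "no", "sometimes", "yes", "non-mammals"),  # turtle
    ("no", "no", "sometimes", "yes", "non-mammals"),  # penguin
    ("yes", "no", "no", "yes", "mammals"),            # porcupine
    ("no", "no", "yes", "no", "non-mammals"),         # eel
    ("no", "no", "sometimes", "yes", "non-mammals"),  # salamander
    ("no", "no", "no", "yes", "non-mammals"),         # gila monster
    ("no", "no", "no", "yes", "mammals"),             # platypus
    ("no", "yes", "no", "yes", "non-mammals"),        # owl
    ("yes", "no", "yes", "no", "mammals"),            # dolphin
    ("no", "yes", "no", "yes", "non-mammals"),        # eagle
]


def score(x, cls):
    """Return P(A|cls) * P(cls), multiplying per-attribute counts."""
    rows = [r for r in animals if r[4] == cls]
    p = Fraction(len(rows), len(animals))  # prior P(cls)
    for i, value in enumerate(x):
        p *= Fraction(sum(r[i] == value for r in rows), len(rows))
    return p


x = ("yes", "no", "yes", "no")  # the test record A
scores = {c: score(x, c) for c in ("mammals", "non-mammals")}
prediction = max(scores, key=scores.get)
print(prediction)  # mammals
```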
Coming Up: