
IS41620 BUSINESS ANALYTICS

PROBABILISTIC LEARNING (Module 10)

Department of Information Systems, ITS
OUTLINE
• Bayes' Theorem
• Naïve Bayes Classification
BAYES' THEOREM
BASIC IDEA
• The basic idea of Bayes' rule: the outcome of a hypothesis or event (H) can be estimated based on some observed evidence (E).
• Two quantities are central in Bayes' rule:
  - The prior probability of H, P(H): the probability of a hypothesis before any evidence is observed.
  - The posterior probability of H, P(H|E): the probability of a hypothesis after the evidence has been observed.
BAYES' THEOREM

P(H|E) = P(E|H) × P(H) / P(E)

• P(H|E): posterior (conditional) probability that hypothesis H holds given that evidence E has occurred
• P(E|H): probability that evidence E occurs given that hypothesis H holds
• P(H): prior probability that hypothesis H holds, regardless of any evidence
• P(E): prior probability that evidence E occurs, regardless of any hypothesis or other evidence
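The formula above can be sketched as a small helper; the three input probabilities here are hypothetical numbers for illustration only, not taken from the slides:

```python
def bayes_posterior(p_e_given_h, p_h, p_e):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

# Hypothetical inputs: P(E|H) = 0.8, P(H) = 0.1, P(E) = 0.25
print(round(bayes_posterior(0.8, 0.1, 0.25), 2))  # 0.32
```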
BAYES' THEOREM: EXAMPLE
• In weather forecasting, to estimate the chance of rain, suppose one factor that influences rain is cloud cover.
• Applying Bayes' theorem, the probability of rain given that clouds have been observed is:

P(Rain | Cloudy) = P(Cloudy | Rain) × P(Rain) / P(Cloudy)

  - P(Rain|Cloudy) is the probability that the hypothesis "rain" holds given that clouds have been observed
  - P(Cloudy|Rain) is the probability that clouds are observed given that it rains
  - P(Rain) is the prior probability of rain, regardless of any evidence
  - P(Cloudy) is the prior probability of clouds
BAYES' THEOREM
• Bayes' theorem can also handle multiple pieces of evidence, say E1, E2, and E3; the posterior probability of the hypothesis is then:

P(H | E1, E2, E3) = P(E1, E2, E3 | H) × P(H) / P(E1, E2, E3)

• Assuming the pieces of evidence are conditionally independent given H, the form above can be rewritten as:

P(H | E1, E2, E3) = P(E1|H) × P(E2|H) × P(E3|H) × P(H) / P(E1, E2, E3)

• For the example above, if air temperature and wind are added as evidence:

P(Rain | Cloudy, Temperature, Wind) =
  P(Cloudy | Rain) × P(Temperature | Rain) × P(Wind | Rain) × P(Rain) / P(Cloudy, Temperature, Wind)
CASE EXAMPLE
• Given:
  - A doctor knows that meningitis causes a stiff neck 50% of the time – P(S|M)
  - The prior probability of any patient having meningitis is 1/50,000 – P(M)
  - The prior probability of any patient having a stiff neck is 1/20 – P(S)
• If a patient has a stiff neck, what is the probability that he/she has meningitis, P(M|S)?
CASE EXAMPLE: SOLUTION

P(M|S) = P(S|M) × P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002
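The arithmetic on this slide can be checked directly:

```python
p_s_given_m = 0.5    # P(S|M): meningitis causes stiff neck 50% of the time
p_m = 1 / 50000      # P(M): prior probability of meningitis
p_s = 1 / 20         # P(S): prior probability of stiff neck

p_m_given_s = p_s_given_m * p_m / p_s  # Bayes' theorem
print(round(p_m_given_s, 6))  # 0.0002
```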
NAÏVE BAYES CLASSIFICATION
BAYESIAN CLASSIFIER
• Given a record with attributes (A1, A2, A3, …, An):
  the goal of the classifier is to predict the class C

Approach
• Compute the posterior probability P(C | A1, A2, …, An) for every value of C using Bayes' theorem:

P(C | A1, A2, …, An) = P(A1, A2, …, An | C) × P(C) / P(A1, A2, …, An)

• Choose the value of C that maximizes P(C | A1, A2, …, An)
• Equivalently, choose the value of C that maximizes P(A1, A2, …, An | C) × P(C), since the denominator is the same for every class
NAÏVE BAYES CLASSIFIER
• Assume independence among the attributes Ai when the class is given:
  - P(A1, A2, …, An | Cj) = P(A1|Cj) × P(A2|Cj) × … × P(An|Cj)
  - P(Ai|Cj) can then be estimated for all Ai and Cj.
  - A new point is classified as Cj if P(Cj) ∏ P(Ai|Cj) is maximal.
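The decision rule above, pick the class maximizing P(Cj) ∏ P(Ai|Cj), can be sketched generically; the probability tables passed in are assumed inputs (toy numbers, not from the slides):

```python
def naive_bayes_predict(priors, cond_probs, record):
    """Return the class c maximizing P(c) * prod_i P(A_i = value_i | c).

    priors:     {class: P(class)}
    cond_probs: {class: {attribute: {value: P(value | class)}}}
    record:     {attribute: value}
    """
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for attr, value in record.items():
            # Unseen values get probability 0 in this naive sketch
            score *= cond_probs[c][attr].get(value, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy tables for illustration:
priors = {"No": 0.7, "Yes": 0.3}
cond = {
    "No":  {"Refund": {"Yes": 3 / 7, "No": 4 / 7}},
    "Yes": {"Refund": {"Yes": 0.0,   "No": 1.0}},
}
print(naive_bayes_predict(priors, cond, {"Refund": "Yes"}))  # No
```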

HOW TO COMPUTE THE DATA PROBABILITIES

Tid  Refund  Marital Status  Taxable Income  Evade
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single           70K            No
 4   Yes     Married         120K            No
 5   No      Divorced         95K            Yes
 6   No      Married          60K            No
 7   Yes     Divorced        220K            No
 8   No      Single           85K            Yes
 9   No      Married          75K            No
10   No      Single           90K            Yes

(Refund and Marital Status are categorical, Taxable Income is continuous, and Evade is the class label.)

The data consist of:
• a class label
• discrete (categorical) attributes
• continuous attributes
CLASS PROBABILITY

Class:
P(C) = Nc / N
     = (number of records with class C) / (total number of records)

Example (from the table above):
P(No)  = 7/10 = 0.7 = 70%
P(Yes) = 3/10 = 0.3 = 30%
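Using the Evade column of the ten training records, P(C) = Nc/N works out as:

```python
# Evade column of the ten training records above
evade = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

p_no = evade.count("No") / len(evade)
p_yes = evade.count("Yes") / len(evade)
print(p_no, p_yes)  # 0.7 0.3
```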
DISCRETE ATTRIBUTE PROBABILITY

Discrete attributes:
P(Ai | Ck) = |Aik| / Nc

where |Aik| is the number of instances that have attribute value Ai and belong to class Ck.

Example (from the table above):
P(Status=Married | No) = 4/7
P(Refund=Yes | Yes) = 0
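The estimate P(Ai|Ck) = |Aik|/Nc for the same table can be computed as:

```python
# (Refund, Marital Status, Evade) for the ten training records
records = [
    ("Yes", "Single",   "No"),  ("No", "Married",  "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married", "No"),
    ("No",  "Divorced", "Yes"), ("No", "Married",  "No"),
    ("Yes", "Divorced", "No"),  ("No", "Single",   "Yes"),
    ("No",  "Married",  "No"),  ("No", "Single",   "Yes"),
]

def p_attr_given_class(attr_idx, value, cls):
    """P(A_i = value | C = cls) = |Aik| / Nc."""
    in_class = [r for r in records if r[2] == cls]
    return sum(r[attr_idx] == value for r in in_class) / len(in_class)

print(p_attr_given_class(1, "Married", "No"))  # 4/7
print(p_attr_given_class(0, "Yes", "Yes"))     # 0.0
```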
CONTINUOUS ATTRIBUTE PROBABILITY
For continuous attributes:
• Discretize the range into bins
  - one ordinal attribute per bin
  - violates the independence assumption
• Two-way split: (A < v) or (A > v)
  - choose only one of the two splits as the new attribute
• Probability density estimation:
  - assume the attribute follows a normal distribution
  - use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
  - once the probability distribution is known, it can be used to estimate the conditional probability P(Ai|c)
CONTINUOUS ATTRIBUTE PROBABILITY

• Normal distribution:

P(Ai | cj) = 1 / √(2π σij²) × exp( −(Ai − μij)² / (2σij²) )

  one distribution for each (Ai, cj) pair.

• For (Income, Class=No), the class-No records give:
  sample mean = 110
  sample variance = 2975

P(Income=120 | No) = 1 / (√(2π) × 54.54) × exp( −(120 − 110)² / (2 × 2975) ) = 0.0072

(54.54 is the sample standard deviation, √2975.)
EXAMPLE

Given a test record:
X = (Refund = No, Status = Married, Income = 120K)

Should record X be classified as Evade = No or Evade = Yes?
EXAMPLE

Naïve Bayes classifier, with probabilities estimated from the table above:
P(Refund=Yes | No) = 3/7
P(Refund=No | No) = 4/7
P(Refund=Yes | Yes) = 0
P(Refund=No | Yes) = 1
P(Marital Status=Single | No) = 2/7
P(Marital Status=Divorced | No) = 1/7
P(Marital Status=Married | No) = 4/7
P(Marital Status=Single | Yes) = 2/3
P(Marital Status=Divorced | Yes) = 1/3
P(Marital Status=Married | Yes) = 0

For Taxable Income:
If class=No:  sample mean = 110, sample variance = 2975
If class=Yes: sample mean = 90,  sample variance = 25
EXAMPLE

X = (Refund = No, Status = Married, Income = 120K)

• P(X | Class=No)  = P(Refund=No | No) × P(Married | No) × P(Income=120K | No)
                   = 4/7 × 4/7 × 0.0072 = 0.0024

• P(X | Class=Yes) = P(Refund=No | Yes) × P(Married | Yes) × P(Income=120K | Yes)
                   = 1 × 0 × 1.2×10⁻⁹ = 0

Since P(X|No) P(No) > P(X|Yes) P(Yes), it follows that P(No|X) > P(Yes|X)
=> Class = No
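The whole worked example can be reproduced end to end with the probabilities estimated above:

```python
import math

def normal_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Estimates from the training table
p_refund_no  = {"No": 4 / 7, "Yes": 1.0}             # P(Refund=No | class)
p_married    = {"No": 4 / 7, "Yes": 0.0}             # P(Married | class)
income_stats = {"No": (110, 2975), "Yes": (90, 25)}  # (mean, variance) of Income
prior        = {"No": 0.7, "Yes": 0.3}               # P(class)

# Score each class for X = (Refund=No, Married, Income=120K)
scores = {}
for c in ("No", "Yes"):
    mean, var = income_stats[c]
    likelihood = p_refund_no[c] * p_married[c] * normal_pdf(120, mean, var)
    scores[c] = likelihood * prior[c]

print(max(scores, key=scores.get))  # No
```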
EXAMPLE: ANIMAL CLASSIFICATION

Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals

A: attributes, M: mammals, N: non-mammals

Test record:
Give Birth  Can Fly  Live in Water  Have Legs  Class
yes         no       yes            no         ?

P(A|M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A|N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042

P(A|M) P(M) = 0.06 × 7/20 = 0.021
P(A|N) P(N) = 0.004 × 13/20 = 0.0027

P(A|M)P(M) > P(A|N)P(N) => Mammals
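The animal example can be checked the same way, using the attribute counts from the table:

```python
# 7 mammals and 13 non-mammals in the table.
# Test record: Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no
p_a_given_m = (6 / 7) * (6 / 7) * (2 / 7) * (2 / 7)
p_a_given_n = (1 / 13) * (10 / 13) * (3 / 13) * (4 / 13)

score_m = p_a_given_m * 7 / 20   # P(A|M) P(M)
score_n = p_a_given_n * 13 / 20  # P(A|N) P(N)

print(round(score_m, 3), round(score_n, 4))  # 0.021 0.0027
print("mammals" if score_m > score_n else "non-mammals")  # mammals
```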
Coming Up:

SUPPORT VECTOR MACHINES


REFERENCES
1. Pang-Ning Tan, M. Steinbach, V. Kumar. 2006. Introduction to Data Mining. Pearson International Edition.
2. Stuart Russell, Peter Norvig. 2009. Artificial Intelligence: A Modern Approach (3rd Edition).
3. Dan Klein, Pieter Abbeel. 2013. Courseware edX Artificial Intelligence, University of California at Berkeley.
4. D. Poole & A. Mackworth. 2010. Artificial Intelligence: Foundations of Computational Agents. Cambridge University Press.
