Hsin-Min Wang
whm@iis.sinica.edu.tw
[Figure: a three-state ergodic HMM (S1, S2, S3) with transition probabilities of roughly 0.33–0.34 between every pair of states, and observation distributions {A:.34, B:.33, C:.33}, {A:.33, B:.34, C:.33}, and {A:.33, B:.33, C:.34}]
Probability Theory
Consider the simple scenario of rolling two dice, labeled die 1 and die 2.
Define the following three events:
A: Die 1 lands on 3.
B: Die 2 lands on 1.
C: The dice sum to 8, i.e., C = {(2,6), (3,5), (4,4), (5,3), (6,2)}.
Prior probability: P(A) = P(B) = 1/6, P(C) = 5/36.
Joint probability: P(A,B) (or P(AB)) = 1/36, since A∩B = {(3,1)}. Two events A and B are statistically independent if and only if P(A,B) = P(A)×P(B).
P(B,C) = 0, since B∩C = ∅. Two events B and C are mutually exclusive if and only if B∩C = ∅, i.e., P(B,C) = 0.
Conditional probability: P(B|A) = P(A,B)/P(A) = (1/36)/(1/6) = 1/6 = P(B); P(C|B) = 0.
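These probabilities are small enough to check by brute-force enumeration; a minimal sketch (event definitions follow the slide):

```python
# A sketch that checks the dice probabilities above by brute-force enumeration.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 equally likely (die1, die2) pairs

A = {o for o in outcomes if o[0] == 3}            # die 1 lands on 3
B = {o for o in outcomes if o[1] == 1}            # die 2 lands on 1
C = {o for o in outcomes if sum(o) == 8}          # dice sum to 8

n = len(outcomes)
print(len(A) / n, len(B) / n, len(C) / n)         # 1/6, 1/6, 5/36
print(len(A & B) * n == len(A) * len(B))          # True: P(A,B) = P(A) x P(B)
print(len(B & C))                                 # 0: B and C are mutually exclusive
```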
Bayes' rule: since P(A,B) = P(B|A)P(A) = P(A|B)P(B), the posterior probability is
P(A|B) = P(B|A)P(A) / P(B)
Applied to models: P(λ|O) = P(O|λ)P(λ) / P(O); choosing the model λ that maximizes P(O|λ) is the maximum likelihood principle.
Chain rule: P(X1, X2, ..., Xn) = P(X1) ∏_{i=2}^{n} P(Xi | X1, X2, ..., X_{i-1})
First-order Markov assumption: P(X1, X2, ..., Xn) = P(X1) ∏_{i=2}^{n} P(Xi | X_{i-1})
Initial state distribution: π_i = P(q1 = i), 1 ≤ i ≤ N (with Σ_{i=1}^{N} π_i = 1)
[Figure: a three-state Markov chain (S1, S2, S3) over daily market movement, with initial probabilities π = (0.5, 0.2, 0.3) and its state transition probabilities]
P(5 consecutive up days) = P(S1, S1, S1, S1, S1)
= π1 · a11 · a11 · a11 · a11 = 0.5 × (0.6)⁴ = 0.0648
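As a quick check of the arithmetic, a sketch assuming the example's values π1 = 0.5 and a11 = 0.6:

```python
# A quick check of the arithmetic, assuming the example's pi_1 = 0.5 and a_11 = 0.6.
pi_1, a_11 = 0.5, 0.6
print(pi_1 * a_11 ** 4)   # P(S1,S1,S1,S1,S1) = pi_1 * a_11^4 ≈ 0.0648
```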
[Figure: a three-state hidden Markov model in which each state generates symbols from its own distribution: S1 {A:.3, B:.2, C:.5}, S2 {A:.7, B:.1, C:.2}, S3 {A:.3, B:.6, C:.1}]
Given an observation sequence O = {ABC}, there are 27 possible corresponding state sequences, and therefore the probability P(O|λ) is
P(O|λ) = Σ_{i=1}^{27} P(O, Q_i | λ) = Σ_{i=1}^{27} P(O | Q_i, λ) P(Q_i | λ), where Q_i is a state sequence.
(3⁵ = 243 state sequences can generate (up, up, up, up, up).)
Elements of an HMM
An HMM is characterized by the following:
1. N, the number of states in the model
2. M, the number of distinct observation symbols per state
3. The state transition probability distribution A = {a_ij}, where a_ij = P[q_{t+1} = j | q_t = i], 1 ≤ i, j ≤ N
4. The observation symbol probability distribution in state j, B = {b_j(v_k)}, where b_j(v_k) = P[o_t = v_k | q_t = j], 1 ≤ j ≤ N, 1 ≤ k ≤ M
5. The initial state distribution π = {π_i}, where π_i = P[q_1 = i], 1 ≤ i ≤ N
First-order Markov assumption:
P(Q|λ) = P(q1, ..., qt, ..., qT | λ) = P(q1) ∏_{t=2}^{T} P(qt | q_{t-1})
a_ij = P(q_{t+1} = j | q_t = i), 1 ≤ i, j ≤ N
Output-independence assumption:
The observation depends only on the state that generates it, not on its neighboring observations:
P(O|Q,λ) = ∏_{t=1}^{T} P(o_t | q_t, λ)
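A minimal sketch of how λ = (A, B, π) can be stored as arrays for the worked examples that follow. π, A, and the first column of B (the b_j(up) values) come from the slides; the remaining columns of B are made-up placeholders:

```python
# A minimal sketch storing lambda = (A, B, pi) as numpy arrays for the examples below.
# pi, A, and the first column of B (b_j(up)) come from the slides; the "down" and
# "unchanged" columns of B are made-up placeholders so that rows sum to 1.
import numpy as np

pi = np.array([0.5, 0.2, 0.3])          # pi_i = P(q1 = i)

A = np.array([[0.6, 0.2, 0.2],          # a_ij = P(q_{t+1} = j | q_t = i)
              [0.5, 0.3, 0.2],
              [0.4, 0.1, 0.5]])

B = np.array([[0.7, 0.1, 0.2],          # b_j(v_k) = P(o_t = v_k | q_t = j)
              [0.1, 0.6, 0.3],          # rows = states, columns = symbols
              [0.3, 0.3, 0.4]])         # (symbol 0 = "up"; other columns assumed)

assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```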
λ* = arg max_i P(O | λ_i)   (Evaluation Problem)

Problem 2:
How to choose an optimal state sequence Q = (q1, q2, ..., qT) which best explains the observations?
Q* = arg max_Q P(Q, O | λ)   (Decoding Problem)

Problem 3:
How to adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?   (Learning/Training Problem)
Solution to Problem 1

P(O|λ) = Σ_{all Q} P(O, Q | λ) = Σ_{all Q} P(O | Q, λ) P(Q | λ)

P(Q|λ) = π_{q1} ∏_{t=2}^{T} a_{q_{t-1} q_t}

P(O|Q,λ) = ∏_{t=1}^{T} P(o_t | q_t, λ) = ∏_{t=1}^{T} b_{q_t}(o_t)
[Figure: trellis over states S1, S2, S3 and observations o1 ... oT; each state sequence is a path through the trellis, contributing products of terms such as a_21 b_1(o_T)]
P(O|λ) = Σ_{all Q = q1,q2,...,qT} P(O|Q,λ) P(Q|λ)

Complexity: about (2T−1)·N^T multiplications, which is infeasible. Instead, compute P(O|λ) with recursion on t.
Forward variable:
α_t(i) = P(o1, o2, ..., ot, qt = i | λ)
The probability of the joint event that o1, o2, ..., ot are observed and the state at time t is i, given the model λ.

α_{t+1}(j) = P(o1, o2, ..., ot, o_{t+1}, q_{t+1} = j | λ)
= Σ_{i=1}^{N} P(o1, o2, ..., ot, qt = i, o_{t+1}, q_{t+1} = j | λ)
= Σ_{i=1}^{N} α_t(i) P(o_{t+1}, q_{t+1} = j | o1, o2, ..., ot, qt = i, λ)
= Σ_{i=1}^{N} α_t(i) P(q_{t+1} = j | o1, o2, ..., ot, qt = i, λ) b_j(o_{t+1})
= Σ_{i=1}^{N} α_t(i) P(q_{t+1} = j | qt = i, λ) b_j(o_{t+1})
= [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1})

(using P(A) = Σ_{all B} P(A, B) and P(A, B | λ) = P(A | λ) P(B | A, λ))
[Figure: forward trellis; e.g., α3(2) = P(o1, o2, o3, q3 = 2 | λ) is accumulated from α2(1) a12, α2(2) a22, and α2(3) a32, then multiplied by b2(o3)]
Algorithm:
1. Initialization: α1(i) = π_i b_i(o1), 1 ≤ i ≤ N
2. Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N
3. Termination: P(O|λ) = Σ_{i=1}^{N} α_T(i)
Complexity: O(N²T)
Example (observing "up" at t = 1 and t = 2), with π = (0.5, 0.2, 0.3), b1(up) = 0.7, b2(up) = 0.1, b3(up) = 0.3, and transition probabilities a11 = 0.6, a12 = 0.2, a13 = 0.2, a21 = 0.5, a22 = 0.3, a23 = 0.2, a31 = 0.4, a32 = 0.1, a33 = 0.5:

α1(1) = π1 b1(up) = 0.5 × 0.7 = 0.35
α1(2) = π2 b2(up) = 0.2 × 0.1 = 0.02
α1(3) = π3 b3(up) = 0.3 × 0.3 = 0.09
α2(2) = (0.35×0.2 + 0.02×0.3 + 0.09×0.1) × 0.1 = 0.0085
α2(3) = (0.35×0.2 + 0.02×0.2 + 0.09×0.5) × 0.3 = 0.0357
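A sketch of the forward procedure over the arrays defined earlier; it reproduces the α values computed by hand above:

```python
# A sketch of the forward procedure, using the pi, A, B arrays defined earlier.
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t - 1, i] holds the slides' alpha_t(i) = P(o_1..o_t, q_t = i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                          # 1. initialization
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]  # 2. induction
    return alpha

alpha = forward(pi, A, B, obs=[0, 0])   # two "up" observations
print(alpha[0])          # ≈ [0.35, 0.02, 0.09]
print(alpha[1])          # ≈ [0.1792, 0.0085, 0.0357], matching the hand computation
print(alpha[-1].sum())   # 3. termination: P(O|lambda)
```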
Solution to Problem 2
The Viterbi algorithm finds the single best state sequence. Define
δ_t(i) = max_{q1,...,q_{t-1}} P(q1, ..., q_{t-1}, qt = i, o1, ..., ot | λ),
with ψ_t(i) recording the best predecessor of state i at time t.

[Figure: Viterbi trellis over states S1, S2, S3 and observations o1 ... oT]
Algorithm:
1. Initialization: δ1(i) = π_i b_i(o1), ψ1(i) = 0, 1 ≤ i ≤ N
2. Induction: δ_{t+1}(j) = [max_{1≤i≤N} δ_t(i) a_ij] b_j(o_{t+1}), ψ_{t+1}(j) = arg max_{1≤i≤N} δ_t(i) a_ij
   (cf. α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}))
3. Termination: P* = max_{1≤i≤N} δ_T(i), q_T* = arg max_{1≤i≤N} δ_T(i)
   (cf. P(O|λ) = Σ_{i=1}^{N} α_T(i))
4. Backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T−1, T−2, ..., 1
Complexity: O(N²T)
Example (observing "up" at t = 1 and t = 2), with the same model as before:

δ1(1) = π1 b1(up) = 0.5 × 0.7 = 0.35
δ1(2) = π2 b2(up) = 0.2 × 0.1 = 0.02
δ1(3) = π3 b3(up) = 0.3 × 0.3 = 0.09

δ2(1) = max(0.35×0.6, 0.02×0.5, 0.09×0.4) × 0.7 = 0.35×0.6×0.7 = 0.147, ψ2(1) = 1
δ2(2) = max(0.35×0.2, 0.02×0.3, 0.09×0.1) × 0.1 = 0.35×0.2×0.1 = 0.007, ψ2(2) = 1
δ2(3) = max(0.35×0.2, 0.02×0.2, 0.09×0.5) × 0.3 = 0.35×0.2×0.3 = 0.021, ψ2(3) = 1
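A matching Viterbi sketch over the same assumed arrays; it reproduces δ2(1) = 0.147 and the backtracked path:

```python
# A Viterbi sketch over the same assumed arrays; reproduces the delta_2 values above.
import numpy as np

def viterbi(pi, A, B, obs):
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                        # 1. initialization
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A               # delta_{t-1}(i) * a_ij
        psi[t] = trans.argmax(axis=0)                   # 2. best predecessor of each j
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]                    # 3. termination
    for t in range(T - 1, 0, -1):                       # 4. backtracking
        path.append(int(psi[t, path[-1]]))
    return delta[-1].max(), path[::-1]

p_star, path = viterbi(pi, A, B, obs=[0, 0])
print(p_star, path)   # ≈ 0.147, [0, 0]  (0-indexed states: state 0 is S1)
```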
Some Examples
[Figures: forward trellises for two example HMM topologies; P(O|λ) is read off the terminal forward variables, e.g., α_T(3) for a three-state model and α_T(6) for a six-state model]
[Figure: Viterbi decoding in a six-state trellis; backtracking from the final time step yields the best state sequence S1, S1, S2, S3, S3, S4, S5, S5, S6]
CpG Islands

Two Questions:
Q1: Given a short sequence, does it come from a CpG island?
Q2: Given a long sequence, how would we find the CpG islands in it?
CpG Islands

Answer to Q1:
Given sequence x, a probabilistic model M1 of CpG islands, and a probabilistic model M2 for non-CpG island regions:
Compute p1 = P(x|M1) and p2 = P(x|M2).
If p1 > p2, then x comes from a CpG island (CpG+).
If p2 > p1, then x does not come from a CpG island (CpG−).

Both models are Markov chains over the four bases (states S1:A, S2:C, S3:T, S4:G); rows give the current base, columns the next base.

CpG+ transition probabilities (note the large C→G probability, 0.274):

       A      C      G      T
  A  0.180  0.274  0.426  0.120
  C  0.171  0.368  0.274  0.188
  G  0.161  0.339  0.375  0.125
  T  0.079  0.355  0.384  0.182

vs. CpG− transition probabilities (note the small C→G probability, 0.078):

       A      C      G      T
  A  0.300  0.205  0.285  0.210
  C  0.322  0.298  0.078  0.302
  G  0.248  0.246  0.298  0.208
  T  0.177  0.239  0.292  0.292
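A sketch of the Q1 decision rule as a log-odds score over the two transition tables above; for simplicity it ignores the probability of the first base, which is an assumption here, not part of the slide:

```python
# A sketch of the Q1 decision rule as a log-odds score over the two tables above.
# For simplicity it ignores the probability of the first base (an assumption here).
import math

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}
CPG_PLUS = [[0.180, 0.274, 0.426, 0.120],
            [0.171, 0.368, 0.274, 0.188],
            [0.161, 0.339, 0.375, 0.125],
            [0.079, 0.355, 0.384, 0.182]]
CPG_MINUS = [[0.300, 0.205, 0.285, 0.210],
             [0.322, 0.298, 0.078, 0.302],
             [0.248, 0.246, 0.298, 0.208],
             [0.177, 0.239, 0.292, 0.292]]

def log_odds(x):
    """log[P(x|M1) / P(x|M2)] over transitions; positive favors CpG+."""
    return sum(math.log(CPG_PLUS[IDX[a]][IDX[b]] / CPG_MINUS[IDX[a]][IDX[b]])
               for a, b in zip(x, x[1:]))

print(log_odds("CGCGCG"))   # strongly positive: looks like a CpG island
```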
CpG Islands

Answer to Q2: Build a two-state HMM in which the CpG label is hidden and only the nucleotide sequence is observable.

S1 (CpG−): emission probabilities A: 0.3, C: 0.2, G: 0.2, T: 0.3; transitions p11 = 0.99999, p12 = 0.00001
S2 (CpG+): emission probabilities A: 0.2, C: 0.3, G: 0.3, T: 0.2; transitions p22 = 0.9999, p21 = 0.0001

[Figure: a hidden state sequence such as S1 S1 S1 S2 S2 S2 S2 S1 S1 generates the observable nucleotide sequence; decoding the most likely state sequence locates the CpG islands]
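The two-state model can be decoded with the viterbi() sketch from earlier (IDX is reused from the previous snippet). The initial distribution is assumed uniform, since the slide does not give one; in practice the products underflow on long sequences, so a real implementation would work in log space:

```python
# Sketch: the two-state CpG HMM above, decoded with the viterbi() function from
# the earlier snippet (IDX is reused too). The initial distribution is assumed
# uniform, since the slide does not give one.
import numpy as np

pi2 = np.array([0.5, 0.5])                  # assumed, not from the slide
A2 = np.array([[0.99999, 0.00001],          # state 0 = S1 (CpG-), state 1 = S2 (CpG+)
               [0.0001,  0.9999]])
B2 = np.array([[0.3, 0.2, 0.2, 0.3],        # S1 emissions over A, C, G, T
               [0.2, 0.3, 0.3, 0.2]])       # S2 emissions over A, C, G, T

obs = [IDX[c] for c in "ACGCGCGTAT"]        # a hypothetical input sequence
print(viterbi(pi2, A2, B2, obs)[1])         # decoded labels: 0 = CpG-, 1 = CpG+
```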
Solution to Problem 3
Solution to Problem 3
Maximum Likelihood Estimation of Model Parameters

How to adjust (re-estimate) the model parameters λ = (A, B, π) to maximize P(O|λ)?
This is the most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data in closed form.
The data is incomplete because of the hidden state sequence.
Solution to Problem 3
The Segmental K-means Algorithm

Assume that we have a training set of observations and an initial estimate of the model parameters.
Step 1: Segment the training data.
The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm.
Step 2: Re-estimate the model parameters:
π_i = (number of times q1 = i) / (number of training sequences)
a_ij = (number of transitions from state i to state j) / (number of transitions from state i)
b_j(k) = (number of times in state j observing symbol v_k) / (number of times in state j)
Solution to Problem 3
The Segmental K-means Algorithm (cont'd)

Example: 3 states and 2 codewords (A, B).
[Figure: ten training observations O1–O10 aligned to states s1, s2, s3 by the Viterbi algorithm]

Re-estimated parameters:
π1 = 1, π2 = π3 = 0
a11 = 3/4, a12 = 1/4
a22 = 2/3, a23 = 1/3
a33 = 1
b1(A) = 3/4, b1(B) = 1/4
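A sketch of Step 2 as plain counting, assuming the Viterbi alignments from Step 1 are given as hypothetical (state, symbol) sequences:

```python
# A sketch of Step 2 as plain counting. `alignments` is hypothetical: one
# [(state, symbol), ...] sequence per training sample, produced by Step 1 (Viterbi).
from collections import Counter

def reestimate(alignments, n_states, n_symbols):
    starts, trans, emit = Counter(), Counter(), Counter()
    for seq in alignments:
        starts[seq[0][0]] += 1                       # which state the sequence starts in
        for (s, _), (s2, _) in zip(seq, seq[1:]):
            trans[s, s2] += 1                        # state-to-state transitions
        for s, o in seq:
            emit[s, o] += 1                          # symbol emissions per state
    pi = [starts[i] / len(alignments) for i in range(n_states)]
    out = [sum(trans[i, j] for j in range(n_states)) for i in range(n_states)]
    A = [[trans[i, j] / out[i] if out[i] else 0.0 for j in range(n_states)]
         for i in range(n_states)]
    occ = [sum(emit[j, k] for k in range(n_symbols)) for j in range(n_states)]
    B = [[emit[j, k] / occ[j] if occ[j] else 0.0 for k in range(n_symbols)]
         for j in range(n_states)]
    return pi, A, B
```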
Solution to Problem 3
The Backward Procedure

Backward variable:
β_t(i) = P(o_{t+1}, o_{t+2}, ..., oT | qt = i, λ)

[Figure: backward trellis; e.g., β2(3) collects terms such as a_31 b_1(o3) β3(1) from the states reachable at the next time step]
Solution to Problem 3
The Backward Procedure (cont'd)

β_t(i) = P(o_{t+1}, o_{t+2}, ..., oT | qt = i, λ)

Algorithm:
1. Initialization: β_T(i) = 1, 1 ≤ i ≤ N
2. Induction: β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), t = T−1, ..., 1, 1 ≤ i ≤ N
3. Termination:
P(O|λ) = Σ_{i=1}^{N} P(o2, o3, ..., oT | q1 = i, λ) P(o1 | q1 = i, λ) P(q1 = i)
= Σ_{i=1}^{N} β1(i) b_i(o1) π_i
(cf. P(O|λ) = Σ_{i=1}^{N} α_T(i))
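A backward-procedure sketch over the same assumed arrays; both termination expressions agree with the forward result:

```python
# A backward-procedure sketch over the same assumed arrays as the forward snippet.
import numpy as np

def backward(pi, A, B, obs):
    """beta[t - 1, i] holds the slides' beta_t(i) = P(o_{t+1}..o_T | q_t = i, lambda)."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                              # 1. initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # 2. induction
    return beta

obs = [0, 0]
beta = backward(pi, A, B, obs)
print((pi * B[:, obs[0]] * beta[0]).sum())   # 3. termination: P(O|lambda)
print(forward(pi, A, B, obs)[-1].sum())      # same value from the forward procedure
```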
Solution to Problem 3
The Forward-Backward Algorithm

Relation between the forward and backward variables:
α_t(i) = P(o1 o2 ... ot, qt = i | λ),  α_t(i) = [Σ_{j=1}^{N} α_{t−1}(j) a_ji] b_i(ot)
β_t(i) = P(o_{t+1} o_{t+2} ... oT | qt = i, λ),  β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j)
α_t(i) β_t(i) = P(O, qt = i | λ)
P(O|λ) = Σ_{i=1}^{N} α_t(i) β_t(i), for any t
Solution to Problem 3
The Forward-Backward Algorithm (cont'd)

α_t(i) β_t(i) = P(o1, o2, ..., ot, qt = i | λ) P(o_{t+1}, o_{t+2}, ..., oT | qt = i, λ)
= P(o1, o2, ..., ot | qt = i, λ) P(qt = i | λ) P(o_{t+1}, o_{t+2}, ..., oT | qt = i, λ)
= P(o1, o2, ..., oT | qt = i, λ) P(qt = i | λ)
= P(o1, o2, ..., oT, qt = i | λ)
= P(O, qt = i | λ)

P(O|λ) = Σ_{i=1}^{N} P(O, qt = i | λ) = Σ_{i=1}^{N} α_t(i) β_t(i)

γ_t(i) = P(qt = i | O, λ) = P(O, qt = i | λ) / P(O|λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i)
ξ_t(i, j) = P(qt = i, q_{t+1} = j | O, λ)
Probability of being in state i at time t and state j at time t+1, given O and λ:
ξ_t(i, j) = P(qt = i, q_{t+1} = j, O | λ) / P(O|λ)
= α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O|λ)
= α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / [Σ_{m=1}^{N} Σ_{n=1}^{N} α_t(m) a_mn b_n(o_{t+1}) β_{t+1}(n)]

γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)
[Figures: trellis illustrations of γ and ξ; γ3(1) combines α3(1) and β3(1), while ξ3(1,3) combines α3(1), a13, b3(o4), and β4(3)]
ξ_t(i, j) = P(qt = i, q_{t+1} = j | O, λ)
Σ_{t=1}^{T−1} ξ_t(i, j) = expected number of transitions from state i to state j

γ_t(i) = P(qt = i | O, λ)
Σ_{t=1}^{T−1} γ_t(i) = expected number of transitions from state i

Re-estimation formulas:

a̅_ij = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i)
= expected number of transitions from state i to state j / expected number of transitions from state i

b̅_j(v_k) = Σ_{t=1, s.t. ot = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
= expected number of times in state j and observing symbol v_k / expected number of times in state j
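A sketch of one re-estimation step assembled from the forward/backward sketches above (single observation sequence, no scaling, so suitable only for short sequences; the π update γ_1(i) is the standard Baum-Welch choice, not shown in the fragments above):

```python
# A sketch of one re-estimation step, assembled from the forward/backward sketches
# above (single observation sequence, no scaling, so suitable only for short obs).
import numpy as np

def baum_welch_step(pi, A, B, obs):
    T, N = len(obs), len(pi)
    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    p_obs = alpha[-1].sum()                       # P(O|lambda)
    gamma = alpha * beta / p_obs                  # gamma[t, i] = P(q_t = i | O, lambda)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs
    new_pi = gamma[0]                             # standard choice: pi_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[np.array(obs) == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step(pi, A, B, obs=[0, 0])
```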