
Hidden Markov Models

Hsin-Min Wang
whm@iis.sinica.edu.tw

References:
1. L. R. Rabiner and B. H. Juang (1993), Fundamentals of Speech Recognition, Chapter 6
2. X. Huang et al. (2001), Spoken Language Processing, Chapter 8
3. L. R. Rabiner (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, February 1989

Hidden Markov Model (HMM)


History
- Published in Baum's papers in the late 1960s and early 1970s
- Introduced to speech processing by Baker (CMU) and Jelinek (IBM) in the 1970s
- Introduced to computational biology in the late 1980s
  - Lander and Green (1987) used HMMs in the construction of genetic linkage maps
  - Churchill (1989) employed HMMs to distinguish coding from noncoding regions in DNA

Hidden Markov Model (HMM)


Assumption
- A speech signal (or DNA sequence) can be characterized as a parametric random process
- The parameters can be estimated in a precise, well-defined manner

Three fundamental problems
- Evaluation of the probability (likelihood) of a sequence of observations given a specific HMM
- Determination of a best sequence of model states
- Adjustment of model parameters so as to best account for the observed signal/sequence

Hidden Markov Model (HMM)


Given an initial model λ = (A, B) as follows:

      | 0.34 0.33 0.33 |
  A = | 0.33 0.34 0.33 |
      | 0.33 0.33 0.34 |

  b1(A) = 0.34, b1(B) = 0.33, b1(C) = 0.33
  b2(A) = 0.33, b2(B) = 0.34, b2(C) = 0.33
  b3(A) = 0.33, b3(B) = 0.33, b3(C) = 0.34

(Figure: a fully connected 3-state HMM with states S1 {A:.34, B:.33, C:.33}, S2 {A:.33, B:.34, C:.33}, and S3 {A:.33, B:.33, C:.34}.)

We can train HMMs for the following two classes using their training data respectively.
Training set for class 1:
1. ABBCABCAABC
2. ABCABC
3. ABCAABC
4. BBABCAB
5. BCAABCCAB
6. CACCABCA
7. CABCABCA
8. CABCA
9. CABCA

Training set for class 2:


1. BBBCCBC
2. CCBABB
3. AACCBBB
4. BBABBAC
5. CCAABBAB
6. BBBCCBAA
7. ABBBBABA
8. CCCCC
9. BBAAA

We can then decide which class the following testing sequences belong to:
  ABCABCCAB
  AABABCCCCBBB

Probability Theorem
Consider the simple scenario of rolling two dice, labeled die 1 and die 2.
Define the following three events:
A: Die 1 lands on 3.
B: Die 2 lands on 1.
C: The dice sum to 8.
   (C = {(2,6), (3,5), (4,4), (5,3), (6,2)})
- Prior probability: P(A) = P(B) = 1/6, P(C) = 5/36
- Joint probability: P(A,B) (or P(AB)) = 1/36, since A∩B = {(3,1)}; two events A and B are statistically independent if and only if P(A,B) = P(A) x P(B)
- Two events B and C are mutually exclusive if and only if B∩C = ∅, i.e., P(BC) = 0; here P(B,C) = 0
- Conditional probability: P(C|A) = P(C,A) / P(A) = (1/36) / (1/6) = 1/6
  Also, P(B|A) = P(B) and P(C|B) = 0
- Bayes' rule:

  λ* = argmax_λ P(λ|O) = argmax_λ P(O|λ) P(λ) / P(O) = argmax_λ P(O|λ) P(λ)

  P(λ|O) is the posterior probability; choosing λ to maximize P(O|λ) follows the maximum likelihood principle
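The dice probabilities above can be checked by brute-force enumeration of the 36 equally likely outcomes; a minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over (die1, die2)."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

A = lambda o: o[0] == 3          # die 1 lands on 3
B = lambda o: o[1] == 1          # die 2 lands on 1
C = lambda o: sum(o) == 8        # the dice sum to 8

p_A, p_C = prob(A), prob(C)
p_AC = prob(lambda o: A(o) and C(o))
print(p_A)            # 1/6
print(p_C)            # 5/36
print(p_AC / p_A)     # P(C|A) = (1/36)/(1/6) = 1/6
```

Using exact fractions avoids floating-point noise when comparing against the slide's values.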

The Markov Chain

Using P(A,B) = P(B|A) P(A) repeatedly (the chain rule):

  P(X1, X2, ..., Xn)
  = P(Xn | X1, ..., Xn-1) P(X1, ..., Xn-1)
  = P(Xn | X1, ..., Xn-1) P(Xn-1 | X1, ..., Xn-2) P(X1, ..., Xn-2)
  = P(Xn | X1, ..., Xn-1) P(Xn-1 | X1, ..., Xn-2) P(Xn-2 | X1, ..., Xn-3) ... P(X3 | X1, X2) P(X2 | X1) P(X1)
  = P(X1) Π_{i=2}^{n} P(Xi | X1, X2, ..., Xi-1)

First-order Markov chain:

  P(X1, X2, ..., Xn) = P(X1) Π_{i=2}^{n} P(Xi | Xi-1)

Observable Markov Model


The parameters of a Markov chain, with N states labeled by {1, ..., N} and the state at time t denoted as qt, can be described as

  aij = P(qt = j | qt-1 = i), 1 ≤ i, j ≤ N   (with Σ_{j=1}^{N} aij = 1 for every i)
  πi = P(q1 = i), 1 ≤ i ≤ N                  (with Σ_{i=1}^{N} πi = 1)

The output of the process is the set of states at each time instant t, where each state corresponds to an observable event Xi (Rabiner 1989). There is a one-to-one correspondence between the observable sequence and the Markov chain state sequence.

The Markov Chain - Ex 1

A 3-state Markov chain:
- State 1 generates symbol A only, state 2 generates symbol B only, and state 3 generates symbol C only

      | 0.6 0.3 0.1 |
  A = | 0.1 0.7 0.2 |
      | 0.3 0.2 0.5 |

  π = (0.4, 0.5, 0.1)

(Figure: the 3-state chain S1 (A), S2 (B), S3 (C) with the transition probabilities above.)

Given a sequence of observed symbols O = {CABBCABC}, the only corresponding state sequence is Q = {S3 S1 S2 S2 S3 S1 S2 S3}, and the corresponding probability is

  P(O|λ) = P(CABBCABC|λ) = P(Q|λ) = P(S3 S1 S2 S2 S3 S1 S2 S3 | λ)
         = π3 · a31 · a12 · a22 · a23 · a31 · a12 · a23
         = 0.1 · 0.3 · 0.3 · 0.7 · 0.2 · 0.3 · 0.3 · 0.2
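Because the mapping from symbols to states is one-to-one here, the sequence probability is just a product along the single path; a minimal sketch (0-indexed states):

```python
# Transition matrix and initial distribution of the 3-state observable Markov chain
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
pi = [0.4, 0.5, 0.1]

# O = CABBCABC maps one-to-one to states S3 S1 S2 S2 S3 S1 S2 S3
Q = [2, 0, 1, 1, 2, 0, 1, 2]

p = pi[Q[0]]
for prev, cur in zip(Q, Q[1:]):
    p *= A[prev][cur]           # one transition probability per step

print(p)  # 0.1 * 0.3 * 0.3 * 0.7 * 0.2 * 0.3 * 0.3 * 0.2
```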

The Markov Chain - Ex 2

A three-state Markov chain for the Dow Jones Industrial average (Huang et al., 2001)

(Figure: the three states - up, down, unchanged - with initial probabilities π = (0.5, 0.2, 0.3); the full transition matrix appears in the forward-procedure example below.)

The probability of 5 consecutive up days:

  P(5 consecutive up days) = P(S1, S1, S1, S1, S1)
                           = π1 · a11 · a11 · a11 · a11 = 0.5 · (0.6)^4 = 0.0648

Extension to Hidden Markov Models


HMM: an extended version of the Observable Markov Model
- The observation is a probabilistic function (discrete or continuous) of a state instead of a one-to-one correspondence with a state
- The model is a doubly embedded stochastic process with an underlying stochastic process that is not directly observable (hidden)
  - What is hidden? The state sequence!
  - Given the observation sequence, we are not sure which state sequence generated it!

Hidden Markov Models - Ex 1

A 3-state discrete HMM λ (initial model):

      | 0.6 0.3 0.1 |
  A = | 0.1 0.7 0.2 |
      | 0.3 0.2 0.5 |

  π = (0.4, 0.5, 0.1)

  b1(A) = 0.3, b1(B) = 0.2, b1(C) = 0.5
  b2(A) = 0.7, b2(B) = 0.1, b2(C) = 0.2
  b3(A) = 0.3, b3(B) = 0.6, b3(C) = 0.1

(Figure: the fully connected 3-state HMM with emission distributions S1 {A:.3, B:.2, C:.5}, S2 {A:.7, B:.1, C:.2}, S3 {A:.3, B:.6, C:.1}.)

Given an observation sequence O = {ABC}, there are 27 possible corresponding state sequences, and therefore the probability P(O|λ) is

  P(O|λ) = Σ_{i=1}^{27} P(O, Qi | λ) = Σ_{i=1}^{27} P(O | Qi, λ) P(Qi | λ),   Qi: state sequence

e.g., when Qi = S2 S2 S3:
  P(O | Qi, λ) = P(A|S2) P(B|S2) P(C|S3) = 0.7 · 0.1 · 0.1 = 0.007
  P(Qi | λ) = π(S2) P(S2|S2) P(S3|S2) = 0.5 · 0.7 · 0.2 = 0.07
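The sum over all 27 state sequences can be enumerated directly for a sequence this short; a minimal sketch:

```python
from itertools import product

# 3-state discrete HMM from the example (states 0, 1, 2 = S1, S2, S3)
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
pi = [0.4, 0.5, 0.1]
b = [{'A': 0.3, 'B': 0.2, 'C': 0.5},
     {'A': 0.7, 'B': 0.1, 'C': 0.2},
     {'A': 0.3, 'B': 0.6, 'C': 0.1}]

O = ['A', 'B', 'C']

total = 0.0
for Q in product(range(3), repeat=len(O)):      # all 3^3 = 27 state sequences
    p_Q = pi[Q[0]]
    for prev, cur in zip(Q, Q[1:]):
        p_Q *= A[prev][cur]                     # P(Q | lambda)
    p_O_given_Q = 1.0
    for state, obs in zip(Q, O):
        p_O_given_Q *= b[state][obs]            # P(O | Q, lambda)
    total += p_Q * p_O_given_Q                  # accumulate P(O, Q | lambda)

print(total)  # P(O | lambda)
```

The single term for Q = S2 S2 S3 is 0.007 · 0.07, matching the slide's example.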

Hidden Markov Models - Ex 2

Given a three-state hidden Markov model for the Dow Jones Industrial average as follows (Huang et al., 2001; cf. the Markov chain above):

(Figure: the three-state up/down/unchanged HMM.)

(3^5 state sequences can generate "up, up, up, up, up".)

- How to find the probability P(up, up, up, up, up | λ)?
- How to find the optimal state sequence of the model which generates the observation sequence "up, up, up, up, up"?

Elements of an HMM
An HMM is characterized by the following:
1. N, the number of states in the model
2. M, the number of distinct observation symbols per state
3. The state transition probability distribution A = {aij}, where aij = P[qt+1 = j | qt = i], 1 ≤ i, j ≤ N
4. The observation symbol probability distribution in state j, B = {bj(vk)}, where bj(vk) = P[ot = vk | qt = j], 1 ≤ j ≤ N, 1 ≤ k ≤ M
5. The initial state distribution π = {πi}, where πi = P[q1 = i], 1 ≤ i ≤ N

For convenience, we usually use the compact notation λ = (A, B, π) to indicate the complete parameter set of an HMM
- Requires specification of two model parameters (N and M)

Two Major Assumptions for HMMs

First-order Markov assumption
- The state transition depends only on the origin and destination states

  P(Q|λ) = P(q1, ..., qt, ..., qT | λ) = P(q1) Π_{t=2}^{T} P(qt | qt-1, λ)

- The state transition probability is time invariant

  aij = P(qt+1 = j | qt = i), 1 ≤ i, j ≤ N

Output-independence assumption
- The observation depends only on the state that generates it, not on its neighboring observations

  P(O | Q, λ) = P(o1, ..., ot, ..., oT | q1, ..., qt, ..., qT, λ) = Π_{t=1}^{T} P(ot | qt, λ) = Π_{t=1}^{T} b_qt(ot)

Three Basic Problems for HMMs


Given an observation sequence O = (o1, o2, ..., oT) and an HMM λ = (A, B, π):

Problem 1 (Evaluation): How to compute P(O|λ) efficiently?
  e.g., P(up, up, up, up, up | λ)?   For classification: λ* = argmax_i P(O | λi)

Problem 2 (Decoding): How to choose an optimal state sequence Q = (q1, q2, ..., qT) which best explains the observations?
  Q* = argmax_Q P(Q, O | λ)

Problem 3 (Learning/Training): How to adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?

Solution to Problem 1


Solution to Problem 1 - Direct Evaluation

Given O and λ, find P(O|λ) = Pr{observing O given λ}
- Evaluate all possible state sequences of length T that generate the observation sequence O:

  P(O|λ) = Σ_{all Q} P(O, Q | λ) = Σ_{all Q} P(O | Q, λ) P(Q | λ)

- P(Q|λ): the probability of the path Q
  - By the first-order Markov assumption,

    P(Q|λ) = P(q1) Π_{t=2}^{T} P(qt | qt-1, λ) = π_q1 a_q1q2 a_q2q3 ... a_qT-1qT

- P(O|Q, λ): the joint output probability along the path Q
  - By the output-independence assumption,

    P(O | Q, λ) = Π_{t=1}^{T} P(ot | qt, λ) = Π_{t=1}^{T} b_qt(ot)

Solution to Problem 1 - Direct Evaluation (cont'd)

(Trellis diagram over states S1, S2, S3 and times o1, ..., oT: e.g., one path contributes π3 b3(o1) · a32 b2(o2) · a23 b3(o3) · ... · a21 b1(oT). A shaded node Sj means bj(ot) has been computed; an arrow means the corresponding aij has been computed.)

Solution to Problem 1 - Direct Evaluation (cont'd)

  P(O|λ) = Σ_{all Q} P(Q|λ) P(O|Q, λ)
         = Σ_{q1,q2,...,qT} π_q1 a_q1q2 a_q2q3 ... a_qT-1qT · b_q1(o1) b_q2(o2) ... b_qT(oT)
         = Σ_{q1,q2,...,qT} π_q1 b_q1(o1) · a_q1q2 b_q2(o2) · ... · a_qT-1qT b_qT(oT)

A huge computation requirement: O(N^T) (there are N^T state sequences)

  Complexity: (2T-1) N^T MUL, N^T - 1 ADD

- Exponential computational complexity
- A more efficient algorithm can be used to evaluate P(O|λ)

Solution to Problem 1 - The Forward Procedure

Based on the HMM assumptions, the calculation of P(qt | qt-1, λ) and P(ot | qt, λ) involves only qt-1, qt, and ot, so it is possible to compute the likelihood P(O|λ) with recursion on t

Forward variable:

  αt(i) = P(o1, o2, ..., ot, qt = i | λ)

- The probability of the joint event that o1, o2, ..., ot are observed and the state at time t is i, given the model λ

  αt+1(j) = P(o1, o2, ..., ot, ot+1, qt+1 = j | λ) = [Σ_{i=1}^{N} αt(i) aij] bj(ot+1)

Solution to Problem 1 - The Forward Procedure (cont'd)

  αt+1(j) = P(o1, o2, ..., ot, ot+1, qt+1 = j | λ)
          = P(o1, o2, ..., ot, ot+1 | qt+1 = j, λ) P(qt+1 = j | λ)
          = P(o1, o2, ..., ot | qt+1 = j, λ) P(ot+1 | qt+1 = j, λ) P(qt+1 = j | λ)    [output-independence assumption]
          = P(o1, o2, ..., ot, qt+1 = j | λ) P(ot+1 | qt+1 = j, λ)
          = P(o1, o2, ..., ot, qt+1 = j | λ) bj(ot+1)
          = [Σ_{i=1}^{N} P(o1, o2, ..., ot, qt = i, qt+1 = j | λ)] bj(ot+1)
          = [Σ_{i=1}^{N} P(o1, o2, ..., ot, qt = i | λ) P(qt+1 = j | o1, o2, ..., ot, qt = i, λ)] bj(ot+1)
          = [Σ_{i=1}^{N} P(o1, o2, ..., ot, qt = i | λ) P(qt+1 = j | qt = i, λ)] bj(ot+1)    [first-order Markov assumption]
          = [Σ_{i=1}^{N} αt(i) aij] bj(ot+1)

Identities used: P(A, B | λ) = P(A | B, λ) P(B | λ), and P(A | λ) = Σ_{all B} P(A, B | λ).

Solution to Problem 1 - The Forward Procedure (cont'd)

  α3(2) = P(o1, o2, o3, q3 = 2 | λ) = [α2(1) a12 + α2(2) a22 + α2(3) a32] b2(o3)

(Trellis diagram: the three arrows entering state S2 at time 3 carry α2(1) a12, α2(2) a22, and α2(3) a32; a shaded node Sj means bj(ot) has been computed, and an arrow means the corresponding aij has been computed.)

Solution to Problem 1 - The Forward Procedure (cont'd)

  αt(i) = P(o1 o2 ... ot, qt = i | λ)

Algorithm
1. Initialization: α1(i) = P(o1, q1 = i | λ) = πi bi(o1), 1 ≤ i ≤ N
2. Induction: αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] bj(ot+1), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N
3. Termination: P(O|λ) = Σ_{i=1}^{N} αT(i)

Complexity: O(N^2 T), cf. O(N^T) for direct evaluation
  MUL: N(N+1)(T-1) + N ≈ N^2 T;  ADD: (N-1)N(T-1) + N - 1 ≈ N^2 T

- Based on the lattice (trellis) structure
- Computed in a time-synchronous fashion from left to right, where each cell for time t is completely computed before proceeding to time t+1
- All state sequences, regardless of how long previously, merge to N nodes (states) at each time instant t
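The three steps above can be sketched as a short vectorized routine. The transition matrix, initial probabilities, and b(up) values come from the Dow Jones slides; the emission columns for "down" and "unchanged" follow the Huang et al. example and should be treated as assumptions here:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward procedure: returns the alpha trellis (T x N) and P(O | lambda).

    pi:  (N,) initial state distribution
    A:   (N, N) transition matrix, A[i, j] = P(q_{t+1}=j | q_t=i)
    B:   (N, M) emission matrix, B[j, k] = b_j(v_k)
    obs: sequence of observation symbol indices o_1 ... o_T
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                   # initialization
    for t in range(T - 1):                         # induction
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()                  # termination

# Dow Jones model; symbols: 0 = up, 1 = down, 2 = unchanged
pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.6, 0.2, 0.2],
              [0.5, 0.3, 0.2],
              [0.4, 0.1, 0.5]])
B = np.array([[0.7, 0.1, 0.2],   # rows for "down"/"unchanged" are assumed values
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])

alpha, p = forward(pi, A, B, [0, 0])               # O = (up, up)
print(alpha[1], p)
```

Running this reproduces the α2 values worked out on the next slide.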

Solution to Problem 1 - The Forward Procedure (cont'd)

A three-state hidden Markov model for the Dow Jones Industrial average (Huang et al., 2001), with π = (0.5, 0.2, 0.3), b1(up) = 0.7, b2(up) = 0.1, b3(up) = 0.3, and transitions a11 = 0.6, a12 = 0.2, a13 = 0.2, a21 = 0.5, a22 = 0.3, a23 = 0.2, a31 = 0.4, a32 = 0.1, a33 = 0.5:

  α1(1) = 0.5 · 0.7,  α1(2) = 0.2 · 0.1,  α1(3) = 0.3 · 0.3

  α2(1) = (0.35 · 0.6 + 0.02 · 0.5 + 0.09 · 0.4) · 0.7
  α2(2) = (0.35 · 0.2 + 0.02 · 0.3 + 0.09 · 0.1) · 0.1
  α2(3) = (0.35 · 0.2 + 0.02 · 0.2 + 0.09 · 0.5) · 0.3

  P(up, up | λ) = α2(1) + α2(2) + α2(3)

Solution to Problem 2


Solution to Problem 2 - The Viterbi Algorithm

The Viterbi algorithm can be regarded as the dynamic programming algorithm applied to the HMM, or as a modified forward algorithm
- Instead of summing probabilities from different paths coming to the same destination state, the Viterbi algorithm picks and remembers the best path
- Find the single optimal state sequence Q*:

  Q* = argmax_Q P(Q, O | λ)

- The Viterbi algorithm can also be illustrated in a trellis framework similar to the one for the forward algorithm

Solution to Problem 2 - The Viterbi Algorithm (cont'd)

(Trellis diagram over states S1, S2, S3 and times o1, ..., oT: at each node, only the best incoming path is kept.)

Solution to Problem 2 - The Viterbi Algorithm (cont'd)

1. Initialization:
   δ1(i) = πi bi(o1), 1 ≤ i ≤ N
   ψ1(i) = 0, 1 ≤ i ≤ N
2. Induction:
   δt+1(j) = max_{1≤i≤N} [δt(i) aij] · bj(ot+1), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N
   ψt+1(j) = argmax_{1≤i≤N} [δt(i) aij], 1 ≤ t ≤ T-1, 1 ≤ j ≤ N
3. Termination:
   P* = max_{1≤i≤N} δT(i)
   qT* = argmax_{1≤i≤N} δT(i)
4. Backtracking:
   qt* = ψt+1(qt+1*), t = T-1, T-2, ..., 1
   Q* = (q1*, q2*, ..., qT*) is the best state sequence

cf. the forward recursion αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] bj(ot+1) and termination P(O|λ) = Σ_{i=1}^{N} αT(i)

Complexity: O(N^2 T)

Solution to Problem 2 - The Viterbi Algorithm (cont'd)

A three-state hidden Markov model for the Dow Jones Industrial average (Huang et al., 2001):

  δ1(1) = 0.5 · 0.7 = 0.35,  δ1(2) = 0.2 · 0.1 = 0.02,  δ1(3) = 0.3 · 0.3 = 0.09

  δ2(1) = max(0.35 · 0.6, 0.02 · 0.5, 0.09 · 0.4) · 0.7 = 0.35 · 0.6 · 0.7 = 0.147,  ψ2(1) = 1
  δ2(2) = max(0.35 · 0.2, 0.02 · 0.3, 0.09 · 0.1) · 0.1 = 0.35 · 0.2 · 0.1 = 0.007,  ψ2(2) = 1
  δ2(3) = max(0.35 · 0.2, 0.02 · 0.2, 0.09 · 0.5) · 0.3 = 0.35 · 0.2 · 0.3 = 0.021,  ψ2(3) = 1

  q2* = argmax_i δ2(i) = 1,  q1* = ψ2(q2*) = ψ2(1) = 1

The most likely state sequence that generates "up up" is: 1 1
Some Examples


Isolated Digit Recognition

(Trellis diagrams: the observation sequence o1 ... oT is scored against each 3-state digit model, e.g., P(O|λ1) = αT(3) for the model of digit "1" and P(O|λ0) = αT(3) for the model of digit "0"; the digit whose model gives the highest likelihood is chosen.)

Continuous Digit Recognition

(Trellis diagram: two 3-state digit models are concatenated into a 6-state network S1 ... S6; the final scores αT(3) and αT(6) correspond to paths ending in the first or the second digit model.)

Continuous Digit Recognition (cont'd)

(Trellis diagram for the 6-state network over 9 time frames; backtracking from ψ8(6) recovers the best state sequence.)

  Best state sequence: S1 S1 S2 S3 S3 S4 S5 S5 S6

CpG Islands
Two Questions
Q1: Given a short sequence, does it come from a CpG island?
Q2: Given a long sequence, how would we find the CpG islands in it?

CpG Islands

Answer to Q1:
- Given sequence x, probabilistic model M1 of CpG islands, and probabilistic model M2 for non-CpG island regions
- Compute p1 = P(x|M1) and p2 = P(x|M2)
- If p1 > p2, then x comes from a CpG island (CpG+)
- If p2 > p1, then x does not come from a CpG island (CpG-)

Each model is a Markov chain over the four states S1:A, S2:C, S3:T, S4:G, with transition probabilities P(next | current):

CpG+ (large C-to-G transition probability):

        A      C      G      T
  A   0.180  0.274  0.426  0.120
  C   0.171  0.368  0.274  0.188
  G   0.161  0.339  0.375  0.125
  T   0.079  0.355  0.384  0.182

CpG- (small C-to-G transition probability):

        A      C      G      T
  A   0.300  0.205  0.285  0.210
  C   0.322  0.298  0.078  0.302
  G   0.248  0.246  0.298  0.208
  T   0.177  0.239  0.292  0.292

CpG Islands

Answer to Q2: use a hidden Markov model whose hidden states mark CpG- vs. CpG+ regions:

  Hidden states: S1 (CpG-) and S2 (CpG+)
  Transitions: p11 = 0.99999, p12 = 0.00001, p21 = 0.0001, p22 = 0.9999
  Emissions: S1: A: 0.3, C: 0.2, G: 0.2, T: 0.3;  S2: A: 0.2, C: 0.3, G: 0.3, T: 0.2

(Figure: the hidden state sequence, e.g. S1 S1 S1 S2 S2 S2 S2 S1 S1, is decoded from the observable base sequence.)

A Toy Example: 5' Splice Site Recognition

The 5' splice site indicates the switch from an exon to an intron

Assumptions:
- Uniform base composition on average in exons (25% each base)
- Introns are A/T rich (40% A/T, and 10% C/G)
- The 5'SS consensus nucleotide is almost always a G (say, 95% G and 5% A)

From "What is a hidden Markov model?", by Sean R. Eddy

A Toy Example: 5' Splice Site Recognition (cont'd)

(Figure: the exon / 5'SS / intron HMM and an example decoding, from Eddy's article.)
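The assumptions above can be turned into a tiny three-state HMM and decoded with Viterbi. The emission values follow the listed assumptions; the transition probabilities (0.9 self-loops, 0.1 exits) are illustrative guesses in the spirit of Eddy's toy model, not values from the slide, and the end state is omitted:

```python
import math

# States: 0 = exon (E), 1 = 5'SS (5), 2 = intron (I)
BASES = "ACGT"
emit = [[0.25, 0.25, 0.25, 0.25],   # exon: uniform
        [0.05, 0.00, 0.95, 0.00],   # 5'SS: almost always G
        [0.40, 0.10, 0.10, 0.40]]   # intron: A/T rich
trans = [[0.9, 0.1, 0.0],           # exon -> exon / 5'SS (assumed)
         [0.0, 0.0, 1.0],           # 5'SS -> intron
         [0.0, 0.0, 1.0]]           # intron -> intron (end state omitted)
pi = [1.0, 0.0, 0.0]                # sequences start in the exon state

def viterbi(seq):
    obs = [BASES.index(c) for c in seq]
    lp = lambda p: math.log(p) if p > 0 else float("-inf")
    delta = [[lp(pi[s]) + lp(emit[s][obs[0]]) for s in range(3)]]
    psi = [[0, 0, 0]]
    for o in obs[1:]:
        row, back = [], []
        for j in range(3):
            best = max(range(3), key=lambda i: delta[-1][i] + lp(trans[i][j]))
            back.append(best)
            row.append(delta[-1][best] + lp(trans[best][j]) + lp(emit[j][o]))
        delta.append(row)
        psi.append(back)
    q = [max(range(3), key=lambda s: delta[-1][s])]
    for back in reversed(psi[1:]):      # backtracking
        q.append(back[q[-1]])
    return "".join("E5I"[s] for s in reversed(q))

print(viterbi("CTTCATGTGAAAGCAGACGTAAGTCA"))  # an E...5I... labeling
```

The model structure forces at most one 5'SS label per sequence, placed at a G (or, rarely, an A), with everything before it labeled exon and everything after it intron.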

Solution to Problem 3


Solution to Problem 3
Maximum Likelihood Estimation of Model Parameters

How to adjust (re-estimate) the model parameters λ = (A, B, π) to maximize P(O|λ)?
- The most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data in closed form
  - The data is incomplete because of the hidden state sequence
- The problem can be solved by the iterative Baum-Welch algorithm, also known as the forward-backward algorithm
  - The EM (Expectation-Maximization) algorithm is perfectly suitable for this problem
- Alternatively, it can be solved by the iterative segmental K-means algorithm
  - The model parameters are adjusted to maximize P(O, Q* | λ), where Q* is the state sequence given by the Viterbi algorithm
  - Provides a good initialization for Baum-Welch training

Solution to Problem 3
The Segmental K-means Algorithm

Assume that we have a training set of observations and an initial estimate of the model parameters
- Step 1: Segment the training data
  - The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm
- Step 2: Re-estimate the model parameters

  πi = (number of times q1 = i) / (number of training sequences)
  aij = (number of transitions from state i to state j) / (number of transitions from state i)
  bj(k) = (number of "k" in state j) / (number of observations in state j)

Solution to Problem 3
The Segmental K-means Algorithm (cont'd)

Example: 3 states and 2 codewords (A, B); a training sequence of 10 observations O1 ... O10

(Trellis: the Viterbi segmentation aligns the 10 frames to the state path s1 s1 s1 s1 s2 s2 s2 s3 s3 s3.)

Re-estimated parameters:
  π1 = 1, π2 = π3 = 0
  a11 = 3/4, a12 = 1/4
  a22 = 2/3, a23 = 1/3
  a33 = 1
  b1(A) = 3/4, b1(B) = 1/4

What if the training data is labeled?
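Once the Viterbi segmentation is fixed, Step 2 is pure counting; a minimal sketch. The state path matches the slide, while the A/B observation labels are hypothetical, chosen only so that b1(A) = 3/4, b1(B) = 1/4 as on the slide:

```python
from collections import Counter
from fractions import Fraction

path = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]                     # Viterbi state path
obs  = ['A', 'A', 'A', 'B', 'A', 'B', 'B', 'A', 'A', 'B'] # hypothetical labels

def reestimate(path, obs, n_states=3):
    """Count-based re-estimation of pi, a_ij, and b_j(k) from one segmentation."""
    pi = [Fraction(int(path[0] == s)) for s in range(1, n_states + 1)]
    trans = Counter(zip(path, path[1:]))      # transition counts (i, j)
    out_of = Counter(path[:-1])               # transitions leaving each state
    a = {(i, j): Fraction(trans[(i, j)], out_of[i]) for (i, j) in trans}
    in_state = Counter(path)                  # frames per state
    emits = Counter(zip(path, obs))           # (state, symbol) counts
    b = {(j, k): Fraction(emits[(j, k)], in_state[j]) for (j, k) in emits}
    return pi, a, b

pi, a, b = reestimate(path, obs)
print(a[(1, 1)], a[(2, 2)], b[(1, 'A')])  # 3/4 2/3 3/4
```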

Solution to Problem 3
The Backward Procedure

Backward variable:

  βt(i) = P(ot+1, ot+2, ..., oT | qt = i, λ)

- The probability of the partial observation sequence ot+1, ot+2, ..., oT, given state i at time t and the model λ

  e.g., β2(3) = P(o3, o4, ..., oT | q2 = 3, λ)
             = a31 · b1(o3) · β3(1) + a32 · b2(o3) · β3(2) + a33 · b3(o3) · β3(3)

(Trellis diagram: the backward recursion combines the arrows leaving state S3 at time 2.)

Solution to Problem 3
The Backward Procedure (cont'd)

  βt(i) = P(ot+1, ot+2, ..., oT | qt = i, λ)

Algorithm
1. Initialization: βT(i) = 1, 1 ≤ i ≤ N
2. Induction: βt(i) = Σ_{j=1}^{N} aij bj(ot+1) βt+1(j), 1 ≤ t ≤ T-1, 1 ≤ i ≤ N

Complexity: MUL: 2N^2(T-1) ≈ N^2 T; ADD: (N-1)N(T-1) ≈ N^2 T

The likelihood can also be computed from the backward variables:

  P(O|λ) = Σ_{i=1}^{N} P(o1, o2, o3, ..., oT, q1 = i | λ)
         = Σ_{i=1}^{N} P(o1, o2, o3, ..., oT | q1 = i, λ) P(q1 = i)
         = Σ_{i=1}^{N} P(o2, o3, ..., oT | q1 = i, λ) P(o1 | q1 = i, λ) P(q1 = i)
         = Σ_{i=1}^{N} β1(i) bi(o1) πi

  cf. P(O|λ) = Σ_{i=1}^{N} αT(i)
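A quick consistency check: the forward and backward computations of P(O|λ) must agree on any observation sequence. A minimal sketch on the Dow Jones-style model (emission columns beyond b(up) are, as before, assumed values):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()                           # P(O) = sum_i alpha_T(i)

def backward_likelihood(pi, A, B, obs):
    beta = np.ones(len(pi))                      # initialization: beta_T(i) = 1
    for o in reversed(obs[1:]):                  # induction, right to left
        beta = A @ (B[:, o] * beta)
    return (pi * B[:, obs[0]] * beta).sum()      # P(O) = sum_i pi_i b_i(o1) beta_1(i)

pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.6, 0.2, 0.2],
              [0.5, 0.3, 0.2],
              [0.4, 0.1, 0.5]])
B = np.array([[0.7, 0.1, 0.2],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])

obs = [0, 2, 1, 0]   # up, unchanged, down, up
print(forward_likelihood(pi, A, B, obs), backward_likelihood(pi, A, B, obs))
```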

Solution to Problem 3
The Forward-Backward Algorithm

Relation between the forward and backward variables (Huang et al., 2001):

  αt(i) = P(o1 o2 ... ot, qt = i | λ) = [Σ_{j=1}^{N} αt-1(j) aji] bi(ot)
  βt(i) = P(ot+1 ot+2 ... oT | qt = i, λ) = Σ_{j=1}^{N} aij bj(ot+1) βt+1(j)

  αt(i) βt(i) = P(O, qt = i | λ)
  P(O|λ) = Σ_{i=1}^{N} αt(i) βt(i)   (for any t)

Solution to Problem 3
The Forward-Backward Algorithm (cont'd)

  αt(i) βt(i) = P(o1, o2, ..., ot, qt = i | λ) · P(ot+1, ot+2, ..., oT | qt = i, λ)
              = P(o1, o2, ..., ot | qt = i, λ) P(qt = i | λ) · P(ot+1, ot+2, ..., oT | qt = i, λ)
              = P(o1, o2, ..., oT | qt = i, λ) P(qt = i | λ)
              = P(o1, o2, ..., oT, qt = i | λ)
              = P(O, qt = i | λ)

  P(O|λ) = Σ_{i=1}^{N} P(O, qt = i | λ) = Σ_{i=1}^{N} αt(i) βt(i)

Solution to Problem 3 - The Intuitive View

Define two new variables:

  γt(i) = P(qt = i | O, λ)
  - Probability of being in state i at time t, given O and λ

    γt(i) = P(O, qt = i | λ) / P(O|λ) = αt(i) βt(i) / P(O|λ) = αt(i) βt(i) / Σ_{i=1}^{N} αt(i) βt(i)

  ξt(i, j) = P(qt = i, qt+1 = j | O, λ)
  - Probability of being in state i at time t and state j at time t+1, given O and λ

    ξt(i, j) = P(qt = i, qt+1 = j, O | λ) / P(O|λ)
             = αt(i) aij bj(ot+1) βt+1(j) / Σ_{m=1}^{N} Σ_{n=1}^{N} αt(m) amn bn(ot+1) βt+1(n)

  γt(i) = Σ_{j=1}^{N} ξt(i, j)

Solution to Problem 3 - The Intuitive View (cont'd)

  P(q3 = 1, O | λ) = α3(1) · β3(1)

(Trellis diagram: α3(1) accumulates all paths reaching state 1 at time 3; β3(1) accumulates all path continuations leaving it.)

Solution to Problem 3 - The Intuitive View (cont'd)

  P(q3 = 1, q4 = 3, O | λ) = α3(1) · a13 · b3(o4) · β4(3)

(Trellis diagram: the single transition from state 1 at time 3 to state 3 at time 4 carries a13 b3(o4).)

Solution to Problem 3 - The Intuitive View (cont'd)

  ξt(i, j) = P(qt = i, qt+1 = j | O, λ)
  Σ_{t=1}^{T-1} ξt(i, j) = expected number of transitions from state i to state j in O

  γt(i) = P(qt = i | O, λ)
  Σ_{t=1}^{T-1} γt(i) = expected number of transitions from state i in O

Solution to Problem 3 - The Intuitive View (cont'd)

Re-estimation formulae for π, A, and B:

  π'i = expected frequency (number of times) in state i at time t = 1 = γ1(i)

  a'ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
       = Σ_{t=1}^{T-1} ξt(i, j) / Σ_{t=1}^{T-1} γt(i)

  b'j(vk) = (expected number of times in state j and observing symbol vk) / (expected number of times in state j)
          = Σ_{t=1, s.t. ot=vk}^{T} γt(j) / Σ_{t=1}^{T} γt(j)
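The formulae above give one full Baum-Welch re-estimation step: run the forward and backward passes, form γ and ξ, then normalize the expected counts. A minimal sketch for a single observation sequence (the 2-state example model at the bottom is entirely hypothetical):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch re-estimation step; returns updated (pi, A, B)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0                                # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_O = alpha[-1].sum()

    gamma = alpha * beta / p_O                    # gamma[t, i] = P(q_t=i | O)
    xi = np.zeros((T - 1, N, N))                  # xi[t, i, j] = P(q_t=i, q_{t+1}=j | O)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / p_O

    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                   # expected symbol counts per state
        mask = np.array(obs) == k
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B

# illustrative 2-state, 2-symbol model (all numbers are hypothetical)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
pi, A, B = baum_welch_step(pi, A, B, [0, 1, 1, 0, 0])
print(A.sum(axis=1), B.sum(axis=1))   # each re-estimated row still sums to 1
```

Because Σ_j ξt(i, j) = γt(i), the re-estimated rows of A remain proper distributions, and likewise for B and π.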
