Hsin-Min Wang
whm@iis.sinica.edu.tw
[Figure: a three-state ergodic HMM (S1, S2, S3) with transition probabilities of roughly 0.33–0.34 between every pair of states, and observation distributions {A:.34, B:.33, C:.33}, {A:.33, B:.34, C:.33}, and {A:.33, B:.33, C:.34}]
Probability Theory
Consider the simple scenario of rolling two dice, labeled die 1 and die 2.
Define the following three events:
A: Die 1 lands on 3.
B: Die 2 lands on 1.
C: The dice sum to 8, i.e., C = {(2,6), (3,5), (4,4), (5,3), (6,2)}.
Prior probability: P(A) = P(B) = 1/6, P(C) = 5/36.
Joint probability: P(A,B) (or P(AB)) = 1/36, since A∩B = {(3,1)}. Two events A and B are statistically independent if and only if P(A,B) = P(A)×P(B).
P(B,C) = 0, since B∩C = ∅. Two events B and C are mutually exclusive if and only if B∩C = ∅, i.e., P(B,C) = 0.
Conditional probability: P(B|A) = P(A,B)/P(A) = (1/36)/(1/6) = 1/6 = P(B); P(C|B) = 0.
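These probabilities are small enough to check by brute-force enumeration; a minimal sketch (event definitions follow the slide):

```python
# A sketch that checks the dice probabilities above by brute-force enumeration.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 equally likely (die1, die2) pairs

A = {o for o in outcomes if o[0] == 3}            # die 1 lands on 3
B = {o for o in outcomes if o[1] == 1}            # die 2 lands on 1
C = {o for o in outcomes if sum(o) == 8}          # dice sum to 8

n = len(outcomes)
print(len(A) / n, len(B) / n, len(C) / n)         # 1/6, 1/6, 5/36
print(len(A & B) * n == len(A) * len(B))          # True: P(A,B) = P(A) x P(B)
print(len(B & C))                                 # 0: B and C are mutually exclusive
```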
Bayes' rule: since P(A,B) = P(B|A)P(A) = P(A|B)P(B), the posterior probability is
P(A|B) = P(B|A)P(A) / P(B)
Applied to models: P(λ|O) = P(O|λ)P(λ) / P(O); choosing the model λ that maximizes P(O|λ) is the maximum likelihood principle.
Chain rule: P(X1, X2, ..., Xn) = P(X1) ∏_{i=2}^{n} P(Xi | X1, X2, ..., X_{i-1})
First-order Markov assumption: P(X1, X2, ..., Xn) = P(X1) ∏_{i=2}^{n} P(Xi | X_{i-1})
Initial state distribution: π_i = P(q1 = i), 1 ≤ i ≤ N (with Σ_{i=1}^{N} π_i = 1)
[Figure: a three-state Markov chain (S1, S2, S3) over daily market movement, with initial probabilities π = (0.5, 0.2, 0.3) and its state transition probabilities]
P(5 consecutive up days) = P(S1, S1, S1, S1, S1)
= π1 · a11 · a11 · a11 · a11 = 0.5 × (0.6)⁴ = 0.0648
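As a quick check of the arithmetic, a sketch assuming the example's values π1 = 0.5 and a11 = 0.6:

```python
# A quick check of the arithmetic, assuming the example's pi_1 = 0.5 and a_11 = 0.6.
pi_1, a_11 = 0.5, 0.6
print(pi_1 * a_11 ** 4)   # P(S1,S1,S1,S1,S1) = pi_1 * a_11^4 ≈ 0.0648
```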
[Figure: a three-state hidden Markov model in which each state generates symbols from its own distribution: S1 {A:.3, B:.2, C:.5}, S2 {A:.7, B:.1, C:.2}, S3 {A:.3, B:.6, C:.1}]
Given an observation sequence O = {ABC}, there are 27 possible corresponding state sequences, and therefore the probability P(O|λ) is
P(O|λ) = Σ_{i=1}^{27} P(O, Q_i | λ) = Σ_{i=1}^{27} P(O | Q_i, λ) P(Q_i | λ), where Q_i is a state sequence.
(3⁵ = 243 state sequences can generate (up, up, up, up, up).)
Elements of an HMM
An HMM is characterized by the following:
1. N, the number of states in the model
2. M, the number of distinct observation symbols per state
3. The state transition probability distribution A = {a_ij}, where a_ij = P[q_{t+1} = j | q_t = i], 1 ≤ i, j ≤ N
4. The observation symbol probability distribution in state j, B = {b_j(v_k)}, where b_j(v_k) = P[o_t = v_k | q_t = j], 1 ≤ j ≤ N, 1 ≤ k ≤ M
5. The initial state distribution π = {π_i}, where π_i = P[q_1 = i], 1 ≤ i ≤ N
First-order Markov assumption:
P(Q|λ) = P(q1, ..., qt, ..., qT | λ) = P(q1) ∏_{t=2}^{T} P(qt | q_{t-1})
a_ij = P(q_{t+1} = j | q_t = i), 1 ≤ i, j ≤ N
Output-independence assumption:
The observation depends only on the state that generates it, not on its neighboring observations:
P(O|Q,λ) = ∏_{t=1}^{T} P(o_t | q_t, λ)
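A minimal sketch of how λ = (A, B, π) can be stored as arrays for the worked examples that follow. π, A, and the first column of B (the b_j(up) values) come from the slides; the remaining columns of B are made-up placeholders:

```python
# A minimal sketch storing lambda = (A, B, pi) as numpy arrays for the examples below.
# pi, A, and the first column of B (b_j(up)) come from the slides; the "down" and
# "unchanged" columns of B are made-up placeholders so that rows sum to 1.
import numpy as np

pi = np.array([0.5, 0.2, 0.3])          # pi_i = P(q1 = i)

A = np.array([[0.6, 0.2, 0.2],          # a_ij = P(q_{t+1} = j | q_t = i)
              [0.5, 0.3, 0.2],
              [0.4, 0.1, 0.5]])

B = np.array([[0.7, 0.1, 0.2],          # b_j(v_k) = P(o_t = v_k | q_t = j)
              [0.1, 0.6, 0.3],          # rows = states, columns = symbols
              [0.3, 0.3, 0.4]])         # (symbol 0 = "up"; other columns assumed)

assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```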
λ* = arg max_i P(O | λ_i)   (Evaluation Problem)

Problem 2:
How to choose an optimal state sequence Q = (q1, q2, ..., qT) which best explains the observations?
Q* = arg max_Q P(Q, O | λ)   (Decoding Problem)

Problem 3:
How to adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?   (Learning/Training Problem)
Solution to Problem 1

P(O|λ) = Σ_{all Q} P(O, Q | λ) = Σ_{all Q} P(O | Q, λ) P(Q | λ)

P(Q|λ) = π_{q1} ∏_{t=2}^{T} a_{q_{t-1} q_t}

P(O|Q,λ) = ∏_{t=1}^{T} P(o_t | q_t, λ) = ∏_{t=1}^{T} b_{q_t}(o_t)
[Figure: trellis over states S1, S2, S3 and observations o1 ... oT; each state sequence is a path through the trellis, contributing products of terms such as a_21 b_1(o_T)]
P(O|λ) = Σ_{all Q = q1,q2,...,qT} P(O|Q,λ) P(Q|λ)

Complexity: about (2T−1)·N^T multiplications, which is infeasible. Instead, compute P(O|λ) with recursion on t.
Forward variable:
α_t(i) = P(o1, o2, ..., ot, qt = i | λ)
The probability of the joint event that o1, o2, ..., ot are observed and the state at time t is i, given the model λ.

α_{t+1}(j) = P(o1, o2, ..., ot, o_{t+1}, q_{t+1} = j | λ)
= Σ_{i=1}^{N} P(o1, o2, ..., ot, qt = i, o_{t+1}, q_{t+1} = j | λ)
= Σ_{i=1}^{N} α_t(i) P(o_{t+1}, q_{t+1} = j | o1, o2, ..., ot, qt = i, λ)
= Σ_{i=1}^{N} α_t(i) P(q_{t+1} = j | o1, o2, ..., ot, qt = i, λ) b_j(o_{t+1})
= Σ_{i=1}^{N} α_t(i) P(q_{t+1} = j | qt = i, λ) b_j(o_{t+1})
= [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1})

(using P(A) = Σ_{all B} P(A, B) and P(A, B | λ) = P(A | λ) P(B | A, λ))
[Figure: forward trellis; e.g., α3(2) = P(o1, o2, o3, q3 = 2 | λ) is accumulated from α2(1) a12, α2(2) a22, and α2(3) a32, then multiplied by b2(o3)]
Algorithm:
1. Initialization: α1(i) = π_i b_i(o1), 1 ≤ i ≤ N
2. Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N
3. Termination: P(O|λ) = Σ_{i=1}^{N} α_T(i)
Complexity: O(N²T)
Example (observing "up" at t = 1 and t = 2), with π = (0.5, 0.2, 0.3), b1(up) = 0.7, b2(up) = 0.1, b3(up) = 0.3, and transition probabilities a11 = 0.6, a12 = 0.2, a13 = 0.2, a21 = 0.5, a22 = 0.3, a23 = 0.2, a31 = 0.4, a32 = 0.1, a33 = 0.5:

α1(1) = π1 b1(up) = 0.5 × 0.7 = 0.35
α1(2) = π2 b2(up) = 0.2 × 0.1 = 0.02
α1(3) = π3 b3(up) = 0.3 × 0.3 = 0.09
α2(2) = (0.35×0.2 + 0.02×0.3 + 0.09×0.1) × 0.1 = 0.0085
α2(3) = (0.35×0.2 + 0.02×0.2 + 0.09×0.5) × 0.3 = 0.0357
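A sketch of the forward procedure over the arrays defined earlier; it reproduces the α values computed by hand above:

```python
# A sketch of the forward procedure, using the pi, A, B arrays defined earlier.
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t - 1, i] holds the slides' alpha_t(i) = P(o_1..o_t, q_t = i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                          # 1. initialization
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]  # 2. induction
    return alpha

alpha = forward(pi, A, B, obs=[0, 0])   # two "up" observations
print(alpha[0])          # ≈ [0.35, 0.02, 0.09]
print(alpha[1])          # ≈ [0.1792, 0.0085, 0.0357], matching the hand computation
print(alpha[-1].sum())   # 3. termination: P(O|lambda)
```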
Solution to Problem 2
The Viterbi algorithm finds the single best state sequence. Define
δ_t(i) = max_{q1,...,q_{t-1}} P(q1, ..., q_{t-1}, qt = i, o1, ..., ot | λ),
with ψ_t(i) recording the best predecessor of state i at time t.

[Figure: Viterbi trellis over states S1, S2, S3 and observations o1 ... oT]
Algorithm:
1. Initialization: δ1(i) = π_i b_i(o1), ψ1(i) = 0, 1 ≤ i ≤ N
2. Induction: δ_{t+1}(j) = [max_{1≤i≤N} δ_t(i) a_ij] b_j(o_{t+1}), ψ_{t+1}(j) = arg max_{1≤i≤N} δ_t(i) a_ij
   (cf. α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}))
3. Termination: P* = max_{1≤i≤N} δ_T(i), q_T* = arg max_{1≤i≤N} δ_T(i)
   (cf. P(O|λ) = Σ_{i=1}^{N} α_T(i))
4. Backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T−1, T−2, ..., 1
Complexity: O(N²T)
Example (observing "up" at t = 1 and t = 2), with the same model as before:

δ1(1) = π1 b1(up) = 0.5 × 0.7 = 0.35
δ1(2) = π2 b2(up) = 0.2 × 0.1 = 0.02
δ1(3) = π3 b3(up) = 0.3 × 0.3 = 0.09

δ2(1) = max(0.35×0.6, 0.02×0.5, 0.09×0.4) × 0.7 = 0.35×0.6×0.7 = 0.147, ψ2(1) = 1
δ2(2) = max(0.35×0.2, 0.02×0.3, 0.09×0.1) × 0.1 = 0.35×0.2×0.1 = 0.007, ψ2(2) = 1
δ2(3) = max(0.35×0.2, 0.02×0.2, 0.09×0.5) × 0.3 = 0.35×0.2×0.3 = 0.021, ψ2(3) = 1
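A matching Viterbi sketch over the same assumed arrays; it reproduces δ2(1) = 0.147 and the backtracked path:

```python
# A Viterbi sketch over the same assumed arrays; reproduces the delta_2 values above.
import numpy as np

def viterbi(pi, A, B, obs):
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                        # 1. initialization
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A               # delta_{t-1}(i) * a_ij
        psi[t] = trans.argmax(axis=0)                   # 2. best predecessor of each j
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]                    # 3. termination
    for t in range(T - 1, 0, -1):                       # 4. backtracking
        path.append(int(psi[t, path[-1]]))
    return delta[-1].max(), path[::-1]

p_star, path = viterbi(pi, A, B, obs=[0, 0])
print(p_star, path)   # ≈ 0.147, [0, 0]  (0-indexed states: state 0 is S1)
```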
Some Examples
[Figures: forward trellises for two example HMM topologies; P(O|λ) is read off the terminal forward variables, e.g., α_T(3) for a three-state model and α_T(6) for a six-state model]
[Figure: Viterbi decoding in a six-state trellis; backtracking from the final time step yields the best state sequence S1, S1, S2, S3, S3, S4, S5, S5, S6]
CpG Islands

Two Questions:
Q1: Given a short sequence, does it come from a CpG island?
Q2: Given a long sequence, how would we find the CpG islands in it?
CpG Islands

Answer to Q1:
Given sequence x, a probabilistic model M1 of CpG islands, and a probabilistic model M2 for non-CpG island regions:
Compute p1 = P(x|M1) and p2 = P(x|M2).
If p1 > p2, then x comes from a CpG island (CpG+).
If p2 > p1, then x does not come from a CpG island (CpG−).

Both models are Markov chains over the four bases (states S1:A, S2:C, S3:T, S4:G); rows give the current base, columns the next base.

CpG+ transition probabilities (note the large C→G probability, 0.274):

       A      C      G      T
  A  0.180  0.274  0.426  0.120
  C  0.171  0.368  0.274  0.188
  G  0.161  0.339  0.375  0.125
  T  0.079  0.355  0.384  0.182

vs. CpG− transition probabilities (note the small C→G probability, 0.078):

       A      C      G      T
  A  0.300  0.205  0.285  0.210
  C  0.322  0.298  0.078  0.302
  G  0.248  0.246  0.298  0.208
  T  0.177  0.239  0.292  0.292
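A sketch of the Q1 decision rule as a log-odds score over the two transition tables above; for simplicity it ignores the probability of the first base, which is an assumption here, not part of the slide:

```python
# A sketch of the Q1 decision rule as a log-odds score over the two tables above.
# For simplicity it ignores the probability of the first base (an assumption here).
import math

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}
CPG_PLUS = [[0.180, 0.274, 0.426, 0.120],
            [0.171, 0.368, 0.274, 0.188],
            [0.161, 0.339, 0.375, 0.125],
            [0.079, 0.355, 0.384, 0.182]]
CPG_MINUS = [[0.300, 0.205, 0.285, 0.210],
             [0.322, 0.298, 0.078, 0.302],
             [0.248, 0.246, 0.298, 0.208],
             [0.177, 0.239, 0.292, 0.292]]

def log_odds(x):
    """log[P(x|M1) / P(x|M2)] over transitions; positive favors CpG+."""
    return sum(math.log(CPG_PLUS[IDX[a]][IDX[b]] / CPG_MINUS[IDX[a]][IDX[b]])
               for a, b in zip(x, x[1:]))

print(log_odds("CGCGCG"))   # strongly positive: looks like a CpG island
```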
CpG Islands

Answer to Q2: Build a two-state HMM in which the CpG label is hidden and only the nucleotide sequence is observable.

S1 (CpG−): emission probabilities A: 0.3, C: 0.2, G: 0.2, T: 0.3; transitions p11 = 0.99999, p12 = 0.00001
S2 (CpG+): emission probabilities A: 0.2, C: 0.3, G: 0.3, T: 0.2; transitions p22 = 0.9999, p21 = 0.0001

[Figure: a hidden state sequence such as S1 S1 S1 S2 S2 S2 S2 S1 S1 generates the observable nucleotide sequence; decoding the most likely state sequence locates the CpG islands]
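The two-state model can be decoded with the viterbi() sketch from earlier (IDX is reused from the previous snippet). The initial distribution is assumed uniform, since the slide does not give one; in practice the products underflow on long sequences, so a real implementation would work in log space:

```python
# Sketch: the two-state CpG HMM above, decoded with the viterbi() function from
# the earlier snippet (IDX is reused too). The initial distribution is assumed
# uniform, since the slide does not give one.
import numpy as np

pi2 = np.array([0.5, 0.5])                  # assumed, not from the slide
A2 = np.array([[0.99999, 0.00001],          # state 0 = S1 (CpG-), state 1 = S2 (CpG+)
               [0.0001,  0.9999]])
B2 = np.array([[0.3, 0.2, 0.2, 0.3],        # S1 emissions over A, C, G, T
               [0.2, 0.3, 0.3, 0.2]])       # S2 emissions over A, C, G, T

obs = [IDX[c] for c in "ACGCGCGTAT"]        # a hypothetical input sequence
print(viterbi(pi2, A2, B2, obs)[1])         # decoded labels: 0 = CpG-, 1 = CpG+
```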
Solution to Problem 3
Solution to Problem 3
Maximum Likelihood Estimation of Model Parameters

How to adjust (re-estimate) the model parameters λ = (A, B, π) to maximize P(O|λ)?
This is the most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data in closed form.
The data is incomplete because of the hidden state sequence.
Solution to Problem 3
The Segmental K-means Algorithm

Assume that we have a training set of observations and an initial estimate of the model parameters.
Step 1: Segment the training data.
The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm.
Step 2: Re-estimate the model parameters:
π_i = (number of times q1 = i) / (number of training sequences)
a_ij = (number of transitions from state i to state j) / (number of transitions from state i)
b_j(k) = (number of times in state j observing symbol v_k) / (number of times in state j)
Solution to Problem 3
The Segmental K-means Algorithm (cont'd)

Example: 3 states and 2 codewords (A, B).
[Figure: ten training observations O1–O10 aligned to states s1, s2, s3 by the Viterbi algorithm]

Re-estimated parameters:
π1 = 1, π2 = π3 = 0
a11 = 3/4, a12 = 1/4
a22 = 2/3, a23 = 1/3
a33 = 1
b1(A) = 3/4, b1(B) = 1/4
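A sketch of Step 2 as plain counting, assuming the Viterbi alignments from Step 1 are given as hypothetical (state, symbol) sequences:

```python
# A sketch of Step 2 as plain counting. `alignments` is hypothetical: one
# [(state, symbol), ...] sequence per training sample, produced by Step 1 (Viterbi).
from collections import Counter

def reestimate(alignments, n_states, n_symbols):
    starts, trans, emit = Counter(), Counter(), Counter()
    for seq in alignments:
        starts[seq[0][0]] += 1                       # which state the sequence starts in
        for (s, _), (s2, _) in zip(seq, seq[1:]):
            trans[s, s2] += 1                        # state-to-state transitions
        for s, o in seq:
            emit[s, o] += 1                          # symbol emissions per state
    pi = [starts[i] / len(alignments) for i in range(n_states)]
    out = [sum(trans[i, j] for j in range(n_states)) for i in range(n_states)]
    A = [[trans[i, j] / out[i] if out[i] else 0.0 for j in range(n_states)]
         for i in range(n_states)]
    occ = [sum(emit[j, k] for k in range(n_symbols)) for j in range(n_states)]
    B = [[emit[j, k] / occ[j] if occ[j] else 0.0 for k in range(n_symbols)]
         for j in range(n_states)]
    return pi, A, B
```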
Solution to Problem 3
The Backward Procedure

Backward variable:
β_t(i) = P(o_{t+1}, o_{t+2}, ..., oT | qt = i, λ)

[Figure: backward trellis; e.g., β2(3) collects terms such as a_31 b_1(o3) β3(1) from the states reachable at the next time step]
Solution to Problem 3
The Backward Procedure (cont'd)

β_t(i) = P(o_{t+1}, o_{t+2}, ..., oT | qt = i, λ)

Algorithm:
1. Initialization: β_T(i) = 1, 1 ≤ i ≤ N
2. Induction: β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), t = T−1, ..., 1, 1 ≤ i ≤ N
3. Termination:
P(O|λ) = Σ_{i=1}^{N} P(o2, o3, ..., oT | q1 = i, λ) P(o1 | q1 = i, λ) P(q1 = i)
= Σ_{i=1}^{N} β1(i) b_i(o1) π_i
(cf. P(O|λ) = Σ_{i=1}^{N} α_T(i))
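A backward-procedure sketch over the same assumed arrays; both termination expressions agree with the forward result:

```python
# A backward-procedure sketch over the same assumed arrays as the forward snippet.
import numpy as np

def backward(pi, A, B, obs):
    """beta[t - 1, i] holds the slides' beta_t(i) = P(o_{t+1}..o_T | q_t = i, lambda)."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                              # 1. initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # 2. induction
    return beta

obs = [0, 0]
beta = backward(pi, A, B, obs)
print((pi * B[:, obs[0]] * beta[0]).sum())   # 3. termination: P(O|lambda)
print(forward(pi, A, B, obs)[-1].sum())      # same value from the forward procedure
```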
Solution to Problem 3
The Forward-Backward Algorithm

Relation between the forward and backward variables:
α_t(i) = P(o1 o2 ... ot, qt = i | λ),  α_t(i) = [Σ_{j=1}^{N} α_{t−1}(j) a_ji] b_i(ot)
β_t(i) = P(o_{t+1} o_{t+2} ... oT | qt = i, λ),  β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j)
α_t(i) β_t(i) = P(O, qt = i | λ)
P(O|λ) = Σ_{i=1}^{N} α_t(i) β_t(i), for any t
Solution to Problem 3
The Forward-Backward Algorithm (cont'd)

α_t(i) β_t(i) = P(o1, o2, ..., ot, qt = i | λ) P(o_{t+1}, o_{t+2}, ..., oT | qt = i, λ)
= P(o1, o2, ..., ot | qt = i, λ) P(qt = i | λ) P(o_{t+1}, o_{t+2}, ..., oT | qt = i, λ)
= P(o1, o2, ..., oT | qt = i, λ) P(qt = i | λ)
= P(o1, o2, ..., oT, qt = i | λ)
= P(O, qt = i | λ)

P(O|λ) = Σ_{i=1}^{N} P(O, qt = i | λ) = Σ_{i=1}^{N} α_t(i) β_t(i)

γ_t(i) = P(qt = i | O, λ) = P(O, qt = i | λ) / P(O|λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i)
ξ_t(i, j) = P(qt = i, q_{t+1} = j | O, λ)
Probability of being in state i at time t and state j at time t+1, given O and λ:
ξ_t(i, j) = P(qt = i, q_{t+1} = j, O | λ) / P(O|λ)
= α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O|λ)
= α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / [Σ_{m=1}^{N} Σ_{n=1}^{N} α_t(m) a_mn b_n(o_{t+1}) β_{t+1}(n)]

γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)
[Figures: trellis illustrations of γ and ξ; γ3(1) combines α3(1) and β3(1), while ξ3(1,3) combines α3(1), a13, b3(o4), and β4(3)]
ξ_t(i, j) = P(qt = i, q_{t+1} = j | O, λ)
Σ_{t=1}^{T−1} ξ_t(i, j) = expected number of transitions from state i to state j

γ_t(i) = P(qt = i | O, λ)
Σ_{t=1}^{T−1} γ_t(i) = expected number of transitions from state i

Re-estimation formulas:

a̅_ij = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i)
= expected number of transitions from state i to state j / expected number of transitions from state i

b̅_j(v_k) = Σ_{t=1, s.t. ot = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
= expected number of times in state j and observing symbol v_k / expected number of times in state j
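A sketch of one re-estimation step assembled from the forward/backward sketches above (single observation sequence, no scaling, so suitable only for short sequences; the π update γ_1(i) is the standard Baum-Welch choice, not shown in the fragments above):

```python
# A sketch of one re-estimation step, assembled from the forward/backward sketches
# above (single observation sequence, no scaling, so suitable only for short obs).
import numpy as np

def baum_welch_step(pi, A, B, obs):
    T, N = len(obs), len(pi)
    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    p_obs = alpha[-1].sum()                       # P(O|lambda)
    gamma = alpha * beta / p_obs                  # gamma[t, i] = P(q_t = i | O, lambda)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs
    new_pi = gamma[0]                             # standard choice: pi_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[np.array(obs) == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step(pi, A, B, obs=[0, 0])
```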