浅谈隐式马尔可夫模型 - A Brief Note on the Hidden Markov Model (HMM)

Introduction

Problem Formulation

Now we talk about the Hidden Markov Model. What is an HMM used for? Consider the following problem:

Given an unknown observation $O$, recognize it as one of $N$ classes with minimum probability of error.

So how do we define the error and the error probability?
Conditional Error: Given $O$, the risk associated with deciding that it is a class $i$ event:

$$R(S_i \mid O) = \sum_{j=1}^{N} e_{ij}\, P(S_j \mid O)$$

where $P(S_j \mid O)$ is the probability that $O$ is a class $S_j$ event and $e_{ij}$ is the cost of classifying a class $j$ event as a class $i$ event, with $e_{ij} > 0$ for $i \ne j$ and $e_{ii} = 0$.
Expected Error:
$$E = \int R(S(O) \mid O)\, p(O)\, dO$$

where $S(O)$ is the decision made on $O$ based on a policy. The question can then be stated as:

How should $S(O)$ be chosen to achieve the minimum error probability, i.e., so that $P(S(O) \mid O)$ is maximized?

Bayes Decision Theory

If we institute the policy $S(O) = S_i = \arg\max_{S_j} P(S_j \mid O)$, then $R(S(O) \mid O) = \min_{S_j} R(S_j \mid O)$. This is the so-called Maximum A Posteriori (MAP) decision. But how do we know $P(S_j \mid O)$, $j = 1, 2, \ldots, N$, for any $O$?
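As a small illustration, here is a minimal sketch of the MAP rule in Python; the `posteriors` array of values $P(S_j \mid O)$ is a hypothetical input, not something defined in these notes:

```python
import numpy as np

def map_decision(posteriors):
    """MAP rule: choose the class S_i with the largest posterior P(S_i | O)."""
    # posteriors[j] holds P(S_j | O) for each candidate class
    return int(np.argmax(posteriors))

# e.g. posteriors = np.array([0.1, 0.7, 0.2])  ->  class index 1 is chosen
```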

Markov Model

States: $S = \{S_0, S_1, S_2, \ldots, S_N\}$
Transition probabilities: $P(q_t = S_i \mid q_{t-1} = S_j)$
Markov Assumption:

$$P(q_t = S_i \mid q_{t-1} = S_j, q_{t-2} = S_k, \ldots) = P(q_t = S_i \mid q_{t-1} = S_j) = a_{ji}, \qquad a_{ji} \ge 0, \quad \sum_{i=1}^{N} a_{ji} = 1 \ \ \forall j$$


Hidden Markov Model

States: $S = \{S_0, S_1, S_2, \ldots, S_N\}$
Transition probabilities: $P(q_t = S_i \mid q_{t-1} = S_j) = a_{ji}$
Output probability distributions (at state $j$ for symbol $k$): $P(y_t = O_k \mid q_t = S_j) = b_j(k, \lambda_j)$, parameterized by $\lambda_j$.


HMM Problems and Solutions

  1. Evaluation: Given a model, compute the probability of an observation sequence.
  2. Decoding: Find the state sequence that maximizes the probability of the observation sequence.
  3. Training: Adjust the model parameters to maximize the probability of observed sequences.

Evaluation

Compute the probability of the observation sequence $O = o_1 o_2 \cdots o_T$, given the HMM parameters $\lambda$:

$$P(O \mid \lambda) = \sum_{Q} P(O, Q \mid \lambda), \qquad Q = q_0 q_1 q_2 \cdots q_T$$
$$= \sum_{Q} a_{q_0 q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$$

This is not practical since the number of paths is $O(N^T)$. How do we deal with it?
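For intuition only, the sum over every path can be written directly as in the sketch below; it makes the $O(N^T)$ blow-up obvious. The matrices `A` (transitions), `B` (emissions), the observation index list `obs`, and the convention that the first/last state indices are the start/end states are all illustrative assumptions, not part of the original notes:

```python
import numpy as np
from itertools import product

def evaluate_brute_force(A, B, obs, start_state=0, end_state=None):
    """Sum P(O, Q | lambda) over every state path Q -- exponential in T."""
    N, T = A.shape[0], len(obs)
    end_state = N - 1 if end_state is None else end_state
    total = 0.0
    # enumerate every choice of the intermediate states q_1 .. q_{T-1}
    for middle in product(range(N), repeat=T - 1):
        q = (start_state,) + middle + (end_state,)
        p = 1.0
        for t in range(1, T + 1):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t - 1]]
        total += p
    return total
```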

Forward Algorithm

$$\alpha_t(j) = P(o_1 o_2 \cdots o_t, q_t = S_j \mid \lambda)$$

Compute $\alpha$ recursively:

$$\alpha_0(j) = \begin{cases} 1, & \text{if } S_j \text{ is the start state} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

$$\alpha_t(j) = \left[ \sum_{i=0}^{N} \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t), \quad t > 0 \tag{2}$$

Then

$$P(O \mid \lambda) = \alpha_T(N)$$

Computation is $O(N^2 T)$.
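Concretely, the forward recursion might be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: `A[i, j]` and `B[j, k]` are assumed transition and emission matrices, `obs` is the list of observed symbol indices, and the start/end state indices are illustrative conventions:

```python
import numpy as np

def forward(A, B, obs, start_state=0):
    """Forward algorithm: alpha[t, j] = P(o_1 .. o_t, q_t = S_j | lambda).

    A[i, j] : transition probability from state i to state j
    B[j, k] : probability of emitting symbol k while in state j
    obs     : observed symbol indices o_1 .. o_T
    """
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0, start_state] = 1.0                  # eq. (1): all mass on the start state
    for t in range(1, T + 1):
        for j in range(N):
            # eq. (2): sum over predecessors, then emit o_t in state j
            alpha[t, j] = (alpha[t - 1] @ A[:, j]) * B[j, obs[t - 1]]
    return alpha

# P(O | lambda) = alpha[T, end_state], where end_state is the designated
# final state (here assumed to be the last index, N - 1).
```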

Backward Algorithm

$$\beta_t(i) = P(o_{t+1} o_{t+2} \cdots o_T \mid q_t = S_i, \lambda)$$

Compute $\beta$ recursively:

$$\beta_T(i) = \begin{cases} 1, & \text{if } S_i \text{ is the end state} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

$$\beta_t(i) = \sum_{j=0}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t < T \tag{4}$$

Then

$$P(O \mid \lambda) = \beta_0(0)$$

Computation is $O(N^2 T)$.
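A matching sketch of the backward recursion, under the same illustrative conventions (last state index taken as the end state):

```python
import numpy as np

def backward(A, B, obs, end_state=None):
    """Backward algorithm: beta[t, i] = P(o_{t+1} .. o_T | q_t = S_i, lambda)."""
    N, T = A.shape[0], len(obs)
    end_state = N - 1 if end_state is None else end_state
    beta = np.zeros((T + 1, N))
    beta[T, end_state] = 1.0                     # eq. (3): only the end state gets mass 1
    for t in range(T - 1, -1, -1):
        for i in range(N):
            # eq. (4): go to any successor j, emit o_{t+1} there, then continue
            beta[t, i] = np.sum(A[i, :] * B[:, obs[t]] * beta[t + 1, :])
    return beta

# P(O | lambda) = beta[0, start_state], matching the forward result.
```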

Decoding

Find the state sequence $Q$ that maximizes $P(O, Q \mid \lambda)$.

Viterbi Algorithm

$$VP_t(i) = \max_{q_0 q_1 \cdots q_{t-1}} P(o_1 o_2 \cdots o_t, q_t = S_i \mid \lambda)$$

Compute $VP$ recursively:

$$VP_t(j) = \max_{i = 0, 1, \ldots, N} VP_{t-1}(i)\, a_{ij}\, b_j(o_t), \quad t > 0$$

Then

$$P(O, Q \mid \lambda) = VP_T(N)$$

Save each maximizing predecessor so the best path can be recovered by backtracing at the end.
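A minimal Viterbi sketch under the same illustrative conventions as the forward/backward sketches; the `back` array stores the maximizing predecessor at each step so the best path can be recovered by backtracing:

```python
import numpy as np

def viterbi(A, B, obs, start_state=0, end_state=None):
    """Viterbi algorithm: most likely state path and its probability P(O, Q | lambda)."""
    N, T = A.shape[0], len(obs)
    end_state = N - 1 if end_state is None else end_state
    vp = np.zeros((T + 1, N))                    # vp[t, j] = VP_t(j)
    back = np.zeros((T + 1, N), dtype=int)       # maximizing predecessor of each state
    vp[0, start_state] = 1.0
    for t in range(1, T + 1):
        for j in range(N):
            scores = vp[t - 1] * A[:, j]
            back[t, j] = np.argmax(scores)       # remember the best predecessor
            vp[t, j] = scores[back[t, j]] * B[j, obs[t - 1]]
    path = [end_state]                           # backtrace from the end state
    for t in range(T, 0, -1):
        path.append(back[t, path[-1]])
    return vp[T, end_state], path[::-1]
```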

Training

To tune $\lambda$ so as to maximize $P(O \mid \lambda)$, there is no efficient algorithm that finds the global optimum; nonetheless, an efficient iterative algorithm that finds a local optimum exists.

Baum-Welch Reestimation

Define the probability of transiting from $S_i$ to $S_j$ at time $t$, given $O$, as¹

$$\xi_t(i,j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$$

Let

$$\bar{a}_{ij} = \frac{\text{Expected num. of trans. from } S_i \text{ to } S_j}{\text{Expected num. of trans. from } S_i} = \frac{\sum_{t=0}^{T-1} \xi_t(i,j)}{\sum_{t=0}^{T-1} \sum_{j=0}^{N} \xi_t(i,j)} \tag{5}$$

$$\bar{b}_j(k) = \frac{\text{Expected num. of times in } S_j \text{ with symbol } k}{\text{Expected num. of times in } S_j} = \frac{\sum_{t:\,O_{t+1}=k} \sum_{i=0}^{N} \xi_t(i,j)}{\sum_{t=0}^{T-1} \sum_{i=0}^{N} \xi_t(i,j)} \tag{6}$$
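Putting equations (5) and (6) together, one reestimation pass might look like the sketch below. It reuses the `forward` and `backward` sketches from above, assumes the last state index is the end state, and assumes every state is actually visited so the denominators are nonzero:

```python
import numpy as np

def baum_welch_step(A, B, obs):
    """One reestimation pass: returns updated (A_bar, B_bar)."""
    N, M = B.shape
    T = len(obs)
    alpha = forward(A, B, obs)                   # forward sketch from above
    beta = backward(A, B, obs)                   # backward sketch from above
    prob = alpha[T, N - 1]                       # P(O | lambda), end state = last index

    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = np.zeros((T, N, N))
    for t in range(T):
        xi[t] = (alpha[t][:, None] * A * B[:, obs[t]][None, :]
                 * beta[t + 1][None, :]) / prob

    # eq. (5): expected transitions i -> j over expected transitions out of i
    A_bar = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]

    # eq. (6): expected times in j emitting symbol k over expected times in j
    gamma = xi.sum(axis=1)                       # gamma[t, j] = sum_i xi[t, i, j]
    obs_arr = np.asarray(obs)
    B_bar = np.zeros_like(B, dtype=float)
    for k in range(M):
        B_bar[:, k] = gamma[obs_arr == k].sum(axis=0)
    B_bar /= gamma.sum(axis=0)[:, None]
    return A_bar, B_bar
```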

Training Algorithm:

  1. Initialize $\lambda = (A, B)$.
  2. Compute $\alpha$, $\beta$, and $\xi$.
  3. Estimate $\bar{\lambda} = (\bar{A}, \bar{B})$ from $\xi$.
  4. Replace $\lambda$ with $\bar{\lambda}$.
  5. If not converged, go to step 2 (a sketch of this loop follows the list).
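A minimal sketch of the full loop (steps 1-5), with a simple convergence test on $P(O \mid \lambda)$; it reuses the `forward` and `baum_welch_step` sketches above, and `max_iters` and `tol` are illustrative parameters:

```python
import numpy as np

def train(A, B, obs, max_iters=50, tol=1e-6):
    """Iterate Baum-Welch reestimation until P(O | lambda) stops improving."""
    prev = -np.inf
    for _ in range(max_iters):
        A, B = baum_welch_step(A, B, obs)        # steps 2-4: compute xi, reestimate, replace
        prob = forward(A, B, obs)[len(obs), A.shape[0] - 1]
        if abs(prob - prev) < tol:               # step 5: stop once converged
            break
        prev = prob
    return A, B
```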

Reference

More details needed? Refer to:

  1. “An Introduction to Hidden Markov Models”, by Rabiner and Juang.
  2. “Hidden Markov Models: Continuous Speech Recognition”, by Kai-Fu Lee.

  • Thanks to B. H. Juang at Georgia Institute of Technology.
  • Thanks to Wayne Ward at Carnegie Mellon University.

  1. Forward-Backward Algorithm