Introduction
This note introduces the Hidden Markov Model (HMM). What is an HMM used for? Consider the following problem:

Given an unknown observation $O$, recognize it as one of $N$ classes with minimum probability of error.

How, then, should the error and the error probability be defined?
Conditional Error: Given $O$, the risk associated with deciding that it is a class $i$ event is

$$R(S_i|O) = \sum_{j=1}^{N} e_{ij}\, P(S_j|O)$$

where $P(S_j|O)$ is the probability that $O$ is a class $S_j$ event, and $e_{ij}$ is the cost of classifying a class $j$ event as a class $i$ event, with $e_{ij} > 0$ for $i \ne j$ and $e_{ii} = 0$.
Expected Error:

$$E = \int R(S(O)|O)\, p(O)\, dO$$

where $S(O)$ is the decision made on $O$ under some policy. The question can then be posed as: how should $S(O)$ be chosen so that the error probability is minimized, i.e., so that $P(S(O)|O)$ is maximized?
Bayes Decision Theory
If we institute the policy $S(O) = S_i = \arg\max_{S_j} P(S_j|O)$, then $R(S(O)|O) = \min_{S_j} R(S_j|O)$. This is the so-called Maximum A Posteriori (MAP) decision. But how do we know $P(S_j|O)$, $j = 1, 2, \dots, N$, for any $O$?
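As a quick sanity check on the MAP rule, take the uniform 0-1 cost $e_{ij} = 1$ for $i \ne j$ (and $e_{ii} = 0$), a common special case. Then

$$R(S_i|O) = \sum_{j \ne i} P(S_j|O) = 1 - P(S_i|O),$$

so minimizing the conditional risk is exactly maximizing the posterior $P(S_i|O)$. By Bayes' rule, $P(S_j|O) \propto P(O|S_j)\, P(S_j)$, so it suffices to model the class-conditional likelihood $P(O|S_j)$; for sequential observations, that is where Markov models come in.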
Markov Model
States: $S = \{S_0, S_1, S_2, \dots, S_N\}$

Transition probabilities: $P(q_t = S_i \mid q_{t-1} = S_j)$

Markov Assumption:

$$P(q_t = S_i \mid q_{t-1} = S_j, q_{t-2} = S_k, \dots) = P(q_t = S_i \mid q_{t-1} = S_j) = a_{ji}, \qquad a_{ji} \ge 0,\; \sum_{i=1}^{N} a_{ji} = 1 \ \ \forall j$$
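As a small illustration, here is how a state sequence can be sampled from a Markov chain; the transition matrix below is made up for the example, and Python with numpy is used throughout these sketches:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up transition matrix: A[j][i] = P(q_t = S_i | q_{t-1} = S_j); rows sum to 1.
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

state, path = 0, [0]
for _ in range(10):
    # Markov assumption: the next state depends only on the current state.
    state = int(rng.choice(len(A), p=A[state]))
    path.append(state)
print(path)
```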
Hidden Markov Model
States: $S = \{S_0, S_1, S_2, \dots, S_N\}$

Transition probabilities: $P(q_t = S_i \mid q_{t-1} = S_j) = a_{ji}$

Output probability distributions (at state $j$ for symbol $k$): $P(y_t = O_k \mid q_t = S_j) = b_j(k, \lambda_j)$, parameterized by $\lambda_j$.
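To make the "hidden" part concrete, here is a minimal sketch (all parameter values invented) that represents $\lambda = (A, B)$ with a start state $S_0$ and an end state $S_N$, and samples from the model; an observer sees only the output symbols, never the state path:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy lambda = (A, B): state 0 = start (non-emitting), state 3 = end, symbols {0, 1}.
A = np.array([[0, .6, .4, 0], [0, .5, .3, .2], [0, .2, .5, .3], [0, 0, 0, 1.]])
B = np.array([[.5, .5], [.7, .3], [.4, .6], [.5, .5]])

q, outputs = 0, []
while q != len(A) - 1:                      # walk until the end state is reached
    q = int(rng.choice(len(A), p=A[q]))     # hidden transition a_{ji}
    outputs.append(int(rng.choice(B.shape[1], p=B[q])))   # visible symbol from b_j
print(outputs)   # the state path stays hidden; only this sequence is observed
```

The same toy $\lambda$ is reused in the sketches below.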

HMM Problems and Solutions
- Evaluation: Given a model, compute the probability of an observation sequence.
- Decoding: Find the state sequence that maximizes the probability of the observation sequence.
- Training: Adjust the model parameters to maximize the probability of the observed sequences.
Evaluation
Compute the probability of the observation sequence $O = o_1 o_2 \dots o_T$, given the HMM parameters $\lambda$:

$$P(O|\lambda) = \sum_{\forall Q} P(O, Q|\lambda) = \sum_{\forall Q} a_{q_0 q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T), \qquad Q = q_0 q_1 q_2 \dots q_T$$
This is not practical, since the number of paths is $O(N^T)$. How can we deal with it?
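For intuition about the blow-up, here is a brute-force sketch that literally enumerates every path through the toy $\lambda$ above (invented numbers) and sums $P(O, Q|\lambda)$; the outer loop runs $O(N^T)$ times:

```python
import numpy as np
from itertools import product

# Toy lambda = (A, B): state 0 = start, state 3 = end, symbols {0, 1}.
A = np.array([[0, .6, .4, 0], [0, .5, .3, .2], [0, .2, .5, .3], [0, 0, 0, 1.]])
B = np.array([[.5, .5], [.7, .3], [.4, .6], [.5, .5]])
obs = [0, 1, 1]   # o_1 o_2 o_3, stored 0-indexed

total = 0.0
# Enumerate all assignments of q_1 .. q_T (q_0 = 0 fixed, q_T must be the end state).
for q in product(range(1, len(A)), repeat=len(obs)):
    if q[-1] != len(A) - 1:
        continue
    p, prev = 1.0, 0
    for t, j in enumerate(q):
        p *= A[prev, j] * B[j, obs[t]]   # a_{q_{t-1} q_t} * b_{q_t}(o_t)
        prev = j
    total += p
print(total)   # matches alpha_T(N) from the forward algorithm below
```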
Forward Algorithm
$$\alpha_t(j) = P(o_1 o_2 \dots o_t,\, q_t = S_j \mid \lambda)$$

Compute $\alpha$ recursively:

$$\alpha_0(j) = \begin{cases} 1, & \text{if } S_j \text{ is the start state} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

$$\alpha_t(j) = \Big[ \sum_{i=0}^{N} \alpha_{t-1}(i)\, a_{ij} \Big] b_j(o_t), \quad t > 0 \tag{2}$$

Then $P(O|\lambda) = \alpha_T(N)$. The computation is $O(N^2 T)$.
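A direct transcription of equations (1) and (2) into the running Python/numpy sketch, using the toy $\lambda$ from above (invented numbers):

```python
import numpy as np

# Toy lambda = (A, B): state 0 = start, state 3 = end, symbols {0, 1}.
A = np.array([[0, .6, .4, 0], [0, .5, .3, .2], [0, .2, .5, .3], [0, 0, 0, 1.]])
B = np.array([[.5, .5], [.7, .3], [.4, .6], [.5, .5]])
obs = [0, 1, 1]   # o_1 o_2 o_3, stored 0-indexed

def forward(A, B, obs):
    """alpha[t][j] = P(o_1 .. o_t, q_t = S_j | lambda)."""
    n, T = A.shape[0], len(obs)
    alpha = np.zeros((T + 1, n))
    alpha[0, 0] = 1.0                                 # eq. (1): start in S_0
    for t in range(1, T + 1):
        for j in range(n):                            # eq. (2)
            alpha[t, j] = (alpha[t - 1] @ A[:, j]) * B[j, obs[t - 1]]
    return alpha

alpha = forward(A, B, obs)
print(alpha[-1, -1])   # P(O | lambda) = alpha_T(N)
```

Each of the $T$ steps touches $N^2$ transition terms, which is where the $O(N^2 T)$ cost comes from.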
Backward Algorithm
$$\beta_t(i) = P(o_{t+1} o_{t+2} \dots o_T \mid q_t = S_i, \lambda)$$

Compute $\beta$ recursively:

$$\beta_T(i) = \begin{cases} 1, & \text{if } S_i \text{ is the end state} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

$$\beta_t(i) = \sum_{j=0}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t < T \tag{4}$$

Then $P(O|\lambda) = \beta_0(0)$. The computation is $O(N^2 T)$.
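The mirror-image sketch for equations (3) and (4), on the same toy $\lambda$:

```python
import numpy as np

# Toy lambda = (A, B): state 0 = start, state 3 = end, symbols {0, 1}.
A = np.array([[0, .6, .4, 0], [0, .5, .3, .2], [0, .2, .5, .3], [0, 0, 0, 1.]])
B = np.array([[.5, .5], [.7, .3], [.4, .6], [.5, .5]])
obs = [0, 1, 1]   # o_1 o_2 o_3, stored 0-indexed

def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1} .. o_T | q_t = S_i, lambda)."""
    n, T = A.shape[0], len(obs)
    beta = np.zeros((T + 1, n))
    beta[T, n - 1] = 1.0                              # eq. (3): end state S_N
    for t in range(T - 1, -1, -1):
        for i in range(n):                            # eq. (4); obs[t] is o_{t+1}
            beta[t, i] = np.sum(A[i] * B[:, obs[t]] * beta[t + 1])
    return beta

beta = backward(A, B, obs)
print(beta[0, 0])   # P(O | lambda) = beta_0(0); same value as alpha_T(N)
```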
Decoding
Find the state sequence $Q$ that maximizes $P(O, Q|\lambda)$.
Viterbi Algorithm
$$VP_t(i) = \max_{q_0 q_1 \dots q_{t-1}} P(o_1 o_2 \dots o_t,\, q_t = S_i \mid \lambda)$$

Compute $VP$ recursively:

$$VP_t(j) = \max_{i=0,1,\dots,N} VP_{t-1}(i)\, a_{ij}\, b_j(o_t), \quad t > 0$$

Then $\max_Q P(O, Q|\lambda) = VP_T(N)$. Save each maximizing predecessor $i$ so the best state sequence can be recovered by a backtrace at the end.
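This is the forward recursion with the sum replaced by a max, plus the saved argmax for the backtrace; again a sketch over the toy $\lambda$:

```python
import numpy as np

# Toy lambda = (A, B): state 0 = start, state 3 = end, symbols {0, 1}.
A = np.array([[0, .6, .4, 0], [0, .5, .3, .2], [0, .2, .5, .3], [0, 0, 0, 1.]])
B = np.array([[.5, .5], [.7, .3], [.4, .6], [.5, .5]])
obs = [0, 1, 1]   # o_1 o_2 o_3, stored 0-indexed

def viterbi(A, B, obs):
    """Return max_Q P(O, Q | lambda) and the maximizing sequence q_0 .. q_T."""
    n, T = A.shape[0], len(obs)
    vp = np.zeros((T + 1, n))
    back = np.zeros((T + 1, n), dtype=int)   # saved maxima for the backtrace
    vp[0, 0] = 1.0                           # start in S_0
    for t in range(1, T + 1):
        for j in range(n):
            scores = vp[t - 1] * A[:, j] * B[j, obs[t - 1]]
            back[t, j] = int(np.argmax(scores))      # best predecessor i
            vp[t, j] = scores[back[t, j]]
    path = [n - 1]                           # backtrace from the end state S_N
    for t in range(T, 0, -1):
        path.append(int(back[t, path[-1]]))
    return vp[T, n - 1], path[::-1]

prob, path = viterbi(A, B, obs)
print(prob, path)   # best joint probability and its state sequence
```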
Training
To tune $\lambda$ so as to maximize $P(O|\lambda)$, no efficient algorithm for finding the global optimum is known; nonetheless, an efficient iterative algorithm that finds a local optimum exists.
Baum-Welch Reestimation
Define the probability of transiting from $S_i$ to $S_j$ at time $t$, given $O$, as

$$\xi_t(i,j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O|\lambda)}$$

Let

$$\bar{a}_{ij} = \frac{\text{expected num. of trans. from } S_i \text{ to } S_j}{\text{expected num. of trans. from } S_i} = \frac{\sum_{t=0}^{T-1} \xi_t(i,j)}{\sum_{t=0}^{T-1} \sum_{j'=0}^{N} \xi_t(i,j')} \tag{5}$$

$$\bar{b}_j(k) = \frac{\text{expected num. of times in } S_j \text{ with symbol } k}{\text{expected num. of times in } S_j} = \frac{\sum_{t:\, o_{t+1} = k} \sum_{i=0}^{N} \xi_t(i,j)}{\sum_{t=0}^{T-1} \sum_{i=0}^{N} \xi_t(i,j)} \tag{6}$$
Training Algorithm:
1. Initialize $\lambda = (A, B)$.
2. Compute $\alpha$, $\beta$, and $\xi$.
3. Estimate $\bar{\lambda} = (\bar{A}, \bar{B})$ from $\xi$.
4. Replace $\lambda$ with $\bar{\lambda}$.
5. If not converged, go to step 2. (One pass of steps 2 through 4 is sketched below.)
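A sketch of one reestimation pass on the toy $\lambda$ (invented numbers), in the same numpy style as the earlier sketches; the zero-denominator guards are an implementation detail, not part of equations (5) and (6):

```python
import numpy as np

# Toy lambda = (A, B): state 0 = start, state 3 = end, symbols {0, 1}.
A = np.array([[0, .6, .4, 0], [0, .5, .3, .2], [0, .2, .5, .3], [0, 0, 0, 1.]])
B = np.array([[.5, .5], [.7, .3], [.4, .6], [.5, .5]])
obs = [0, 1, 1]   # o_1 o_2 o_3, stored 0-indexed
n, T = A.shape[0], len(obs)

# Step 2: alpha and beta, as in the forward/backward sketches.
alpha = np.zeros((T + 1, n)); alpha[0, 0] = 1.0
for t in range(1, T + 1):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t - 1]]
beta = np.zeros((T + 1, n)); beta[T, n - 1] = 1.0
for t in range(T - 1, -1, -1):
    beta[t] = A @ (B[:, obs[t]] * beta[t + 1])
pO = alpha[T, n - 1]                         # P(O | lambda)

# xi[t][i][j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
xi = np.zeros((T, n, n))
for t in range(T):
    xi[t] = alpha[t, :, None] * A * (B[:, obs[t]] * beta[t + 1])[None, :] / pO

# Step 3, eq. (5): expected transition counts, normalized per source state.
from_i = xi.sum(axis=(0, 2))                 # expected transitions out of S_i
A_bar = xi.sum(axis=0) / np.where(from_i > 0, from_i, 1.0)[:, None]

# Step 3, eq. (6): expected emission counts, normalized per state.
occ = xi.sum(axis=1)                         # occ[t][j]: time in S_j emitting o_{t+1}
B_bar = np.zeros_like(B)
for t in range(T):
    B_bar[:, obs[t]] += occ[t]               # numerator: times in S_j with symbol k
B_bar /= np.where(occ.sum(axis=0) > 0, occ.sum(axis=0), 1.0)[:, None]

# Step 4: replace lambda; step 5 would re-check P(O | lambda) and loop to step 2.
A, B = A_bar, B_bar
```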
References

For more details, refer to:

- “An Introduction to Hidden Markov Models”, by L. R. Rabiner and B. H. Juang.
- “Hidden Markov Models: Continuous Speech Recognition”, by Kai-Fu Lee.
- Thanks to B. H. Juang at the Georgia Institute of Technology.
- Thanks to Wayne Ward at Carnegie Mellon University.