# PatReco Introduction


**PatReco: Hidden Markov Models**
Alexandros Potamianos, Dept. of ECE, Tech. Univ. of Crete, Fall 2004-2005
## Markov Models: Definition

- Markov chains are Bayesian networks that model sequences of events (states).
- Sequential events are dependent.
- Two non-sequential events are conditionally independent given the intermediate events (MM-1).
## Markov Chains

Chain diagrams (figure): each model is a chain of states q0, q1, q2, q3, q4, ...; in an MM-k chain, each state depends on the previous k states (MM-0: states independent; MM-1: first order; MM-2: second order; MM-3: third order).
The joint probability factorizes according to the model order:

- MM-0: $P(q_1, q_2, \dots, q_N) = \prod_{n=1}^{N} P(q_n)$
- MM-1: $P(q_1, q_2, \dots, q_N) = \prod_{n=1}^{N} P(q_n \mid q_{n-1})$
- MM-2: $P(q_1, q_2, \dots, q_N) = \prod_{n=1}^{N} P(q_n \mid q_{n-1}, q_{n-2})$
- MM-3: $P(q_1, q_2, \dots, q_N) = \prod_{n=1}^{N} P(q_n \mid q_{n-1}, q_{n-2}, q_{n-3})$
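To make the MM-1 factorization concrete, here is a minimal Python sketch; the prior, transition table, and example sequence are hypothetical, used only for illustration (the first factor $P(q_1 \mid q_0)$ is taken as the prior $P(q_1)$):

```python
# Minimal sketch: probability of a state sequence under an MM-1 chain.
# The prior and transition tables below are hypothetical, for illustration only.

prior = {"A": 0.6, "B": 0.4}            # P(q_1)
trans = {"A": {"A": 0.7, "B": 0.3},     # P(q_n | q_{n-1})
         "B": {"A": 0.4, "B": 0.6}}

def mm1_prob(states):
    """P(q_1..q_N) = P(q_1) * prod_n P(q_n | q_{n-1})."""
    p = prior[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

print(mm1_prob(["A", "A", "B", "B"]))   # 0.6 * 0.7 * 0.3 * 0.6 = 0.0756
```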
## Hidden Markov Models

- Hidden Markov models describe sequences of events and corresponding sequences of observations.
- The events form a Markov chain (MM-1).
- The observations are conditionally independent given the sequence of events.
- Each observation is directly connected to a single event (and is conditionally independent of the rest of the events in the network).
The HMM-1 structure (figure): a hidden state chain with one observation attached to each state.

    states:       q0   q1   q2   q3   q4   ...
    observations: o0   o1   o2   o3   o4   ...

$$P(o_0, o_1, \dots, o_N, q_0, q_1, \dots, q_N) = \prod_{n=0}^{N} P(q_n \mid q_{n-1}) \, P(o_n \mid q_n)$$

where the $n = 0$ factor $P(q_0 \mid q_{-1})$ is read as the a-priori probability $P(q_0)$.
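A minimal sketch of this joint probability for a discrete-observation HMM; all tables (the "R"/"S" states and "u"/"n" observations) are hypothetical:

```python
# Minimal sketch: joint probability P(o_0..o_N, q_0..q_N) for a discrete HMM-1.
# All tables are hypothetical; P(q_0 | q_{-1}) is taken to be the prior P(q_0).

prior = {"R": 0.5, "S": 0.5}                 # P(q_0)
trans = {"R": {"R": 0.8, "S": 0.2},          # P(q_n | q_{n-1})
         "S": {"R": 0.3, "S": 0.7}}
emit  = {"R": {"u": 0.9, "n": 0.1},          # P(o_n | q_n)
         "S": {"u": 0.2, "n": 0.8}}

def hmm_joint(states, obs):
    """prod_{n=0..N} P(q_n | q_{n-1}) P(o_n | q_n), with P(q_0|q_{-1}) = P(q_0)."""
    p = prior[states[0]] * emit[states[0]][obs[0]]
    for n in range(1, len(states)):
        p *= trans[states[n-1]][states[n]] * emit[states[n]][obs[n]]
    return p

print(hmm_joint(["R", "R", "S"], ["u", "u", "n"]))  # 0.5*0.9 * 0.8*0.9 * 0.2*0.8 = 0.05184
```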
## Parameter Estimation

The parameters that have to be estimated are:

- the a-priori probabilities $P(q_0)$
- the transition probabilities $P(q_n \mid q_{n-1})$
- the observation probabilities $P(o_n \mid q_n)$

For example, if there are 3 types of events and continuous 1-D observations that follow a Gaussian distribution, there are 18 parameters to estimate:

- 3 a-priori probabilities
- a 3x3 transition probability matrix (9 parameters)
- 3 means and 3 variances (observation probabilities)
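As a quick arithmetic check, a sketch of the count for $K$ states with 1-D Gaussian emissions (assuming one mean and one variance per state, as in the example above):

```python
# Sanity check of the parameter count for K states with 1-D Gaussian emissions
# (assumption: one mean and one variance per state, as in the example above).

def hmm_param_count(K):
    priors = K           # P(q_0)
    transitions = K * K  # P(q_n | q_{n-1})
    emissions = 2 * K    # one Gaussian mean and one variance per state
    return priors + transitions + emissions

print(hmm_param_count(3))  # 3 + 9 + 6 = 18
```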
- If both the sequence of events and the sequence of observations are fully observable, then maximum-likelihood (ML) estimation is used.
- Usually the sequence of events q0, q1, ..., qN is non-observable, in which case EM is used.
- The EM algorithm for HMMs is the Baum-Welch (forward-backward) algorithm.
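In the fully observed case, ML estimation reduces to counting and per-state sample statistics. A minimal sketch under that assumption (the data at the bottom are hypothetical); the hidden-state case (Baum-Welch) replaces these hard counts with expected counts computed by the forward-backward recursions:

```python
# Minimal sketch of ML estimation when state sequences are fully observed:
# priors and transitions by counting, Gaussian emission parameters by the
# per-state sample mean and variance. The data below are hypothetical.
from collections import Counter, defaultdict

def ml_estimate(state_seqs, obs_seqs):
    start = Counter()              # counts of initial states -> P(q_0)
    pair = Counter()               # counts of (q_{n-1}, q_n) -> P(q_n | q_{n-1})
    out = Counter()                # counts of q_{n-1}, the normalizer
    samples = defaultdict(list)    # 1-D observations grouped by state
    for states, obs in zip(state_seqs, obs_seqs):
        start[states[0]] += 1
        for a, b in zip(states, states[1:]):
            pair[(a, b)] += 1
            out[a] += 1
        for q, o in zip(states, obs):
            samples[q].append(o)
    prior = {q: c / len(state_seqs) for q, c in start.items()}
    trans = {qq: c / out[qq[0]] for qq, c in pair.items()}
    gauss = {}
    for q, xs in samples.items():  # ML Gaussian: sample mean and variance
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        gauss[q] = (m, v)
    return prior, trans, gauss

# Hypothetical fully labeled data: two sequences of states with 1-D observations.
states = [["a", "a", "b"], ["b", "a", "a"]]
obs    = [[0.1, 0.2, 1.1], [0.9, 0.0, 0.3]]
print(ml_estimate(states, obs))
```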
## Inference/Decoding

- The main inference problem for HMMs is known as the decoding problem: given a sequence of observations, find the best sequence of states:

  $$\hat{q} = \arg\max_q P(q \mid O) = \arg\max_q P(q, O)$$

- An efficient decoding algorithm is the Viterbi algorithm.
## Viterbi Algorithm

$$\max_q P(q, O) = \max_q P(o_0, o_1, \dots, o_N, q_0, q_1, \dots, q_N) = \max_q \prod_{n=0}^{N} P(q_n \mid q_{n-1}) \, P(o_n \mid q_n)$$

$$= \max_{q_N} \Big\{ P(o_N \mid q_N) \max_{q_{N-1}} \big\{ P(q_N \mid q_{N-1}) \, P(o_{N-1} \mid q_{N-1}) \cdots \max_{q_2} \{ P(q_3 \mid q_2) \, P(o_2 \mid q_2) \max_{q_1} \{ P(q_2 \mid q_1) \, P(o_1 \mid q_1) \max_{q_0} \{ P(q_1 \mid q_0) \, P(o_0 \mid q_0) \, P(q_0) \} \} \} \cdots \big\} \Big\}$$
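A minimal Python sketch of this nested-max recursion for a discrete HMM, reusing the hypothetical tables from the joint-probability example above; it keeps the best predecessor at each node and backtracks at the end:

```python
# Minimal Viterbi sketch for a discrete HMM-1, following the nested-max
# derivation above. Model tables are hypothetical, for illustration only.

def viterbi(obs, states, prior, trans, emit):
    # delta[q]: probability of the best path ending in state q at this time;
    # back[n][q]: best predecessor of state q at step n (for backtracking).
    delta = {q: prior[q] * emit[q][obs[0]] for q in states}
    back = []
    for o in obs[1:]:
        step, prev = {}, {}
        for q in states:
            best = max(states, key=lambda p: delta[p] * trans[p][q])
            step[q] = delta[best] * trans[best][q] * emit[q][o]
            prev[q] = best
        back.append(prev)
        delta = step
    # Backtrack from the best final state.
    last = max(states, key=delta.get)
    path = [last]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return list(reversed(path)), delta[last]

prior = {"R": 0.5, "S": 0.5}
trans = {"R": {"R": 0.8, "S": 0.2}, "S": {"R": 0.3, "S": 0.7}}
emit  = {"R": {"u": 0.9, "n": 0.1}, "S": {"u": 0.2, "n": 0.8}}
print(viterbi(["u", "u", "n"], ["R", "S"], prior, trans, emit))
# best path ['R', 'R', 'S'] with probability ~0.0518
```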
Trellis view (figure): states 1, 2, ..., K on the vertical axis, time on the horizontal axis. At each node keep only the best (most probable) path among all the paths passing through that node.
## Deep Thoughts

- HMM-0 (an HMM with an MM-0 event chain) is the Bayes classifier!
- MMs and HMMs are poor models, but they are simple and computationally efficient.
  - How do you fix this? (Dependent observations?)
## Some Applications

- Speech Recognition
- Optical Character Recognition
- Part-of-Speech Tagging
- ...
## Conclusions

- HMMs and MMs are useful modeling tools for dependent sequences of events (states or classes).
- Efficient algorithms exist for training HMM parameters (Baum-Welch) and for decoding the most probable sequence of states given an observation sequence (Viterbi).
- HMMs have many applications.