									PatReco: Hidden Markov Models

     Alexandros Potamianos
Dept of ECE, Tech. Univ. of Crete
         Fall 2004-2005
      Markov Models: Definition

 Markov chains are Bayesian networks that
  model sequences of events (states)
 Sequential events are dependent
 Two non-sequential events are conditionally
  independent given the intermediate events
            Markov chains

MM-0   q0   q1   q2   q3   q4   …

MM-1   q0   q1   q2   q3   q4   …

MM-2   q0   q1   q2   q3   q4   …

MM-3   q0   q1   q2   q3   q4   …
                Markov Chains

MM-0: P(q1,q2,...,qN) = ∏n=1..N P(qn)

MM-1: P(q1,q2,...,qN) = ∏n=1..N P(qn|qn-1)

MM-2: P(q1,q2,...,qN) = ∏n=1..N P(qn|qn-1,qn-2)

MM-3: P(q1,q2,...,qN) = ∏n=1..N P(qn|qn-1,qn-2,qn-3)
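The MM-1 factorization above can be sketched directly in code: the probability of a state sequence is the prior of the first state times the chained transition probabilities. The state names and probability tables below are illustrative, not from the slides.

```python
# Sketch: P(q1,...,qN) under a first-order Markov chain (MM-1),
# i.e. prior of the first state times the product of transitions.

def mm1_sequence_prob(states, prior, trans):
    """P(q1,...,qN) = P(q1) * prod_{n=2..N} P(qn | qn-1)."""
    p = prior[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

# Hypothetical two-state chain (made-up numbers for illustration)
prior = {"A": 0.6, "B": 0.4}                # P(q1)
trans = {"A": {"A": 0.7, "B": 0.3},         # P(qn | qn-1)
         "B": {"A": 0.4, "B": 0.6}}

print(mm1_sequence_prob(["A", "A", "B"], prior, trans))  # 0.6 * 0.7 * 0.3
```

An MM-2 or MM-3 version would simply condition each factor on two or three previous states.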
          Hidden Markov Models

 Hidden Markov chains model sequences of events
  and corresponding sequences of observations
 Events form a first-order Markov chain (MM-1)
 Observations are conditionally independent given
  the sequence of events
 Each observation is directly connected with a single
  event (and conditionally independent with the rest
  of the events in the network)
         Hidden Markov Models

HMM-1     q0    q1    q2     q3    q4    …

          o0    o1    o2     o3    o4    …

P(o0,o1,...,oN, q0,q1,...,qN) = ∏n=0..N P(qn|qn-1) P(on|qn)   (with P(q0|q-1) ≡ P(q0))
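The joint probability above is just one multiplication per time step. A minimal sketch, again with made-up state names and probability tables (the prior takes the place of P(q0|q-1)):

```python
# Sketch: joint probability P(O, Q) of an observation sequence and a
# state sequence under an HMM-1, following the factorization
# P(q0) P(o0|q0) * prod_{n>=1} P(qn|qn-1) P(on|qn).

def hmm_joint_prob(obs, states, prior, trans, emit):
    p = prior[states[0]] * emit[states[0]][obs[0]]
    for n in range(1, len(states)):
        p *= trans[states[n - 1]][states[n]] * emit[states[n]][obs[n]]
    return p

# Hypothetical two-state, two-symbol model (illustrative numbers)
prior = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3},
         "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.9, "y": 0.1},          # P(on | qn)
        "B": {"x": 0.2, "y": 0.8}}

print(hmm_joint_prob(["x", "y"], ["A", "B"], prior, trans, emit))
```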
                 Parameter Estimation

 The parameters that have to be estimated are the
      a-priori probabilities P(q0)
      transition probabilities P(qn|qn-1)
      observation probabilities P(on|qn)
 For example, if there are 3 types of events and
  continuous 1-D observations that follow a Gaussian
  distribution, there are 18 parameters to estimate:
      3 a-priori probabilities
      3x3 transition probabilities matrix
      3 means and 3 variances (observation probabilities)
          Parameter Estimation

 If both the sequence of events and the sequence
  of observations are fully observable, then ML
  estimation is used
 Usually the sequence of events q0,q1..qN is
  non-observable, in which case EM is used
 The EM algorithm for HMMs is the Baum-
  Welch or forward-backward algorithm
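In the fully observable case, ML estimation reduces to counting relative frequencies. A small sketch of that counting step for the prior and transition probabilities (the Gaussian observation parameters would likewise be sample means and variances per state; all names here are illustrative):

```python
# Sketch: ML estimation of prior and transition probabilities from
# fully observed state sequences, by relative-frequency counting.
from collections import Counter

def ml_estimate(state_seqs):
    prior_c, trans_c, from_c = Counter(), Counter(), Counter()
    for seq in state_seqs:
        prior_c[seq[0]] += 1                    # count initial states
        for prev, cur in zip(seq, seq[1:]):
            trans_c[(prev, cur)] += 1           # count transitions
            from_c[prev] += 1                   # normalizer per source state
    prior = {s: c / len(state_seqs) for s, c in prior_c.items()}
    trans = {(p, c): v / from_c[p] for (p, c), v in trans_c.items()}
    return prior, trans

# Two hypothetical observed state sequences
prior, trans = ml_estimate([["A", "A", "B"], ["A", "B", "B"]])
print(prior)   # both sequences start in A
print(trans)   # e.g. P(B|A) = 2/3
```

When the states are hidden, Baum-Welch replaces these hard counts with expected counts computed by the forward-backward recursions.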

                    Decoding

 The main inference problem for HMMs is
  known as the decoding problem: given a
  sequence of observations, find the best
  sequence of states:
     q* = argmaxq P(q|O) = argmaxq P(q,O)
 An efficient decoding algorithm is the
  Viterbi algorithm
                Viterbi algorithm

maxq P(q,O) =
maxq P(o0,o1,...,oN, q0,q1,...,qN) =
maxq ∏n=0..N P(qn|qn-1) P(on|qn) =
  maxqN {P(oN|qN) maxqN-1 {P(qN|qN-1) P(oN-1|qN-1) …
  maxq2 {P(q3|q2) P(o2|q2) maxq1 {P(q2|q1) P(o1|q1)
  maxq0 {P(q1|q0) P(o0|q0) P(q0)}}} … }}
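The nested maximizations above become a left-to-right dynamic program: keep, for every state, the best log-probability of any path ending there, plus a backpointer. A sketch in the log domain (model tables are the same hypothetical ones, assuming strictly positive probabilities so the logs are defined):

```python
# Sketch: Viterbi decoding in the log domain.
# delta[s] = best log-prob of any state path ending in s at the current step;
# psi stores one backpointer table per transition for the final backtrack.
import math

def viterbi(obs, states, prior, trans, emit):
    delta = {s: math.log(prior[s]) + math.log(emit[s][obs[0]]) for s in states}
    psi = []
    for o in obs[1:]:
        new_delta, back = {}, {}
        for s in states:
            best = max(states, key=lambda r: delta[r] + math.log(trans[r][s]))
            new_delta[s] = (delta[best] + math.log(trans[best][s])
                            + math.log(emit[s][o]))
            back[s] = best
        delta, psi = new_delta, psi + [back]
    last = max(states, key=lambda s: delta[s])   # best final state
    path = [last]
    for back in reversed(psi):                   # follow backpointers
        path.append(back[path[-1]])
    return path[::-1]

# Hypothetical two-state, two-symbol model (illustrative numbers)
prior = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

print(viterbi(["x", "y", "y"], ["A", "B"], prior, trans, emit))  # ['A', 'B', 'B']
```

The cost is O(N·S²) for N observations and S states, versus O(Sᴺ) for enumerating all paths.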
               Viterbi algorithm

 At each node of the trellis, keep only the best
  (most probable) path from all the paths
  through that node
                 Deep Thoughts

 HMM-0 (HMM with MM-0 event chain) is
  the Bayes classifier!!!
 MMs and HMMs are poor models, but they are
  computationally simple and efficient
     How do you fix this? (dependent observations?)
           Some Applications

 Speech Recognition
 Optical Character Recognition
 Part-of-Speech Tagging
                 Conclusions

 HMMs and MMs are useful modeling tools
  for dependent sequences of events (states)
 Efficient algorithms exist for training HMM
  parameters (Baum-Welch) and decoding the
  most probable sequence of states given an
  observation sequence (Viterbi)
 HMMs have many applications