An Introduction to Hidden Markov Models

An Introduction to Hidden Markov Models and Gesture Recognition

                       Troy L. McDaniel
                      Research Assistant
          Center for Cognitive Ubiquitous Computing
                   Arizona State University

Notation and Algorithms From (Dugad and Desai 1996)
 Please send your questions, comments, and errata to
The Big Picture

                                We learned
                                about this part

                  Now let’s take a closer look
                  at how this part works…
   A hidden Markov model can be used to model and
    recognize temporal sequences
   How? We can train a finite state machine using
    training data consisting of sequences of symbols
   States will represent, e.g., poses for gestures, and
    transitions between states will have probabilities

                          HMM                goodbye
   Speech Recognition
   Computational Biology
   Computer Vision
   Biometrics
   Gesture Recognition
   And many others…

   Let’s take a look at gesture recognition in detail…
   Gesture Recognition-Training
      Interact with a computer through gestures
      Training
         Create a database of gestures
            We store the feature vectors of poses that
             make up each gesture
         Create a database of poses to increase accuracy
         Train HMMs for each class of gestures

[Figure: a “goodbye” gesture (left) and a single pose (right)]
Gesture Recognition-Testing
   Testing
      Segmentation – Obtain the user’s hand by identifying skin
       color pixels. This performs background subtraction.
      Feature Extraction – Extract features. For example, we
       can fit ellipsoids around fingers and palm, and use their
       major axes and angles between them.

       Pose Recognition – Match feature vectors with those in
        the pose database to improve recognition.
       Gesture Recognition – Score the observed gesture against
        every HMM. The HMM that gives the highest probability
        identifies the recognized gesture.
Gesture Recognition System
   Overview of system

                         Next, we will learn
                         how HMMs work…
Urns and Marbles Example
   There are 3 urns, each filled with any number of marbles; each
    marble has a certain color, say red, green, or blue
   A friend of ours is in a room choosing urns, each time taking out
    a marble, shouting the color, and putting it back
   We’re outside the room and cannot see in!
   We know the # of urns and observations (R, R, G, R, B,..)
   But what is it that we don’t know?

                       He just               RED!
                       saw a red!
Urns and Marbles Example-II
   The urns are states, each with an initial probability
   Transition probabilities exist between states
   The Markovian property: the next state depends only
    on the current state, not on the states before it
   Each state represents a distribution of symbols (e.g.,
    red = 25%, green = 25% and blue = 50% for urn 1)
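
To make the urn picture concrete, here is a minimal sketch of our friend in the room. Only urn 1’s color mix (25/25/50) is stated on this slide; the remaining numbers are assumptions back-solved from the worked forward, backward, and Viterbi tables later in these slides, and the name `sample_sequence` is made up for illustration.

```python
import random

COLORS = ["R", "G", "B"]

# Assumed urn model. Row 1 of B is from the slide; everything else is
# back-solved from the later example tables, not quoted from the source.
PI = [0.5, 0.25, 0.25]            # initial probability of each urn
A = [[0.20, 0.60, 0.20],          # A[i][j]: move from urn i to urn j
     [0.25, 0.50, 0.25],
     [0.35, 0.15, 0.50]]
B = [[0.25, 0.25, 0.50],          # B[i][k]: color k drawn from urn i
     [0.20, 0.30, 0.50],
     [0.45, 0.05, 0.50]]


def sample_sequence(length, rng):
    """Return (hidden urn indices, shouted colors) for `length` draws."""
    urns, shouts = [], []
    urn = rng.choices(range(3), weights=PI)[0]
    for _ in range(length):
        urns.append(urn)
        color = rng.choices(range(3), weights=B[urn])[0]
        shouts.append(COLORS[color])
        urn = rng.choices(range(3), weights=A[urn])[0]   # Markov step
    return urns, shouts


if __name__ == "__main__":
    urns, shouts = sample_sequence(5, random.Random(0))
    print("what we hear:      ", shouts)
    print("what we don't see: ", [u + 1 for u in urns])
```

From outside the room we only ever get the second list; the first one is the hidden part.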
So What’s An HMM?
   As we’ve already seen, it is a finite set of states
    connected by transitions, which can generate an
    observation sequence depending on its transition,
    bias (observation symbol), and initial probabilities
   It is represented by three sets of probabilities:
    transition (A), observation symbol (B), and initial (π),
    written λ = (A, B, π)
   The Markov model is hidden because we don’t know
    which state led to each observation
   Going from the urn example to more familiar
So What’s An HMM?-II
   For gesture recognition, a state will represent a pose
   The distribution for each state will be over symbols
    represented by feature vectors, e.g., the major axes
    of fingers and palm, and the angles between them.
   Remember that during training, each gesture, even
    though it may belong to the same class (goodbye,
    etc.), will have variations.
   An HMM can either represent a single object such as
    a word or gesture, or a collection of objects.
The Algorithms
   Next, we’re going to cover algorithms for training and
    testing hidden Markov models
   Algorithms include Forward-Backward [1], Viterbi [1],
    K-means, Baum-Welch [1], and the Kullback-Leibler
    based distance measure [1]
   Each algorithm, once explained, will be mapped to
Notation [1]
HMM Structure Pseudocode
   For the pseudocode, assume that HMMs are objects,
    containing the constants and data structures below.
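
One possible shape for such an HMM object is sketched below. The symbols N, M, A, B, and π follow the notation of [1]; the Python field names, the `is_valid` helper, and the urn parameter values (back-solved from the worked examples in these slides) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMM:
    n_states: int            # N: number of states
    n_symbols: int           # M: number of distinct observation symbols
    pi: List[float]          # pi[i]: probability of starting in state i
    a: List[List[float]]     # a[i][j]: probability of moving from i to j
    b: List[List[float]]     # b[j][k]: probability of symbol k in state j

    def is_valid(self, tol: float = 1e-9) -> bool:
        """Check the shapes and that every distribution sums to 1."""
        shapes_ok = (len(self.pi) == self.n_states
                     and len(self.a) == self.n_states
                     and all(len(r) == self.n_states for r in self.a)
                     and len(self.b) == self.n_states
                     and all(len(r) == self.n_symbols for r in self.b))
        rows = [self.pi] + self.a + self.b
        return shapes_ok and all(abs(sum(r) - 1.0) <= tol for r in rows)


# Assumed urn model (symbols: 0 = red, 1 = green, 2 = blue).
URN_MODEL = HMM(
    n_states=3, n_symbols=3,
    pi=[0.5, 0.25, 0.25],
    a=[[0.20, 0.60, 0.20], [0.25, 0.50, 0.25], [0.35, 0.15, 0.50]],
    b=[[0.25, 0.25, 0.50], [0.20, 0.30, 0.50], [0.45, 0.05, 0.50]],
)

if __name__ == "__main__":
    print(URN_MODEL.is_valid())
```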
Problem #1
   HMM applications are reduced to solving 3 problems.
    Let’s look at the first one…
   Problem 1: Given a model λ, how do we compute P(O|λ),
    the probability of the observation sequence O?
   Solution: Forward-Backward Algorithm
   Why do we care? And when do we use it?

                   What’s the probability
                   of getting
                   B, G, R, B?
Why Do We Care?
         Red, Green, Blue

 HMM 1       HMM 2          HMM 3

  98%          5%            50%
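
Recognition then comes down to picking the model with the highest P(O|λ). A tiny sketch using the illustrative percentages from the figure above (the function name `recognize` is hypothetical):

```python
# P(O | HMM) for O = Red, Green, Blue, as shown in the figure.
likelihoods = {"HMM 1": 0.98, "HMM 2": 0.05, "HMM 3": 0.50}


def recognize(scores):
    """Return the name of the model with the highest likelihood."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(recognize(likelihoods))  # HMM 1
```

In the gesture system, each entry would be one trained gesture class, and the scores would come from the algorithm for Problem 1.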
But First, the Brute Force Approach
   Let’s look at the brute force approach [1] first
   We can find this probability by finding the probability
    of O for a fixed state sequence times the probability
    of getting that state sequence
   But we do this for every possible state sequence…

   With N^T possible state sequences, it’s not practical.
                  Blue, Green, Red, Blue
                  Urn 1   Urn 1       Urn 1   Urn 1
              N   Urn 2   Urn 2       Urn 2   Urn 2
                  Urn 3   Urn 3       Urn 3   Urn 3
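
A sketch of the brute force approach, assuming the urn model whose parameters are back-solved from the worked examples in these slides (they are not quoted from the source) and a hypothetical function name. It walks every one of the N^T paths through the trellis above.

```python
from itertools import product

# Assumed urn model (symbols: 0 = red, 1 = green, 2 = blue).
PI = [0.5, 0.25, 0.25]
A = [[0.20, 0.60, 0.20], [0.25, 0.50, 0.25], [0.35, 0.15, 0.50]]
B = [[0.25, 0.25, 0.50], [0.20, 0.30, 0.50], [0.45, 0.05, 0.50]]
RED, GREEN, BLUE = 0, 1, 2


def brute_force_likelihood(pi, a, b, obs):
    """Sum P(O, I | model) over every state sequence I; also count them."""
    n = len(pi)
    total, n_paths = 0.0, 0
    for path in product(range(n), repeat=len(obs)):
        # P(I) * P(O | I) for this one fixed state sequence
        p = pi[path[0]] * b[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        total += p
        n_paths += 1
    return total, n_paths


if __name__ == "__main__":
    prob, n_paths = brute_force_likelihood(PI, A, B, [BLUE, GREEN, RED, BLUE])
    print(n_paths)  # 81, i.e. 3**4 state sequences for just 4 observations
```

The path count grows exponentially with the sequence length, which is why this is only useful as a sanity check on very short sequences.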
Forward Algorithm
   A more practical approach: Forward Algorithm [1]
   The forward variable α_t(i) = P(O_1, …, O_t, i_t = i | λ)
       The probability of the partial observation sequence
        up to time t and state i at time t
   It is an inductive algorithm, shown next…
                 What’s the
                 probability of
                 getting B, G, R,
                 B, and ending
                 at urn 2?
Forward Algorithm-II

   Order N^2 T multiplications!
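
For reference, the standard forward recursion can be written as below. The symbols are assumed to follow the notation of [1]: α for the forward variable, a_ij for transition probabilities, b_j(O_t) for observation probabilities, and π_i for initial probabilities.

```latex
\begin{align*}
\text{Initialization:}\quad & \alpha_1(i) = \pi_i\, b_i(O_1), && 1 \le i \le N \\
\text{Induction:}\quad & \alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(O_{t+1}), && 1 \le t \le T-1,\ 1 \le j \le N \\
\text{Termination:}\quad & P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)
\end{align*}
```

Each of the T time steps fills in N values, and each value needs about N multiplications, which is where the N^2 T count comes from.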
  Forward Algorithm Pseudocode
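
As a rough Python rendering of the forward algorithm (not the slides’ own pseudocode), the sketch below assumes the urn model whose parameters are back-solved from the worked example tables in these slides, with symbols encoded as 0 = red, 1 = green, 2 = blue.

```python
# Assumed urn model; values are inferred from the example tables.
PI = [0.5, 0.25, 0.25]
A = [[0.20, 0.60, 0.20], [0.25, 0.50, 0.25], [0.35, 0.15, 0.50]]
B = [[0.25, 0.25, 0.50], [0.20, 0.30, 0.50], [0.45, 0.05, 0.50]]
RED, GREEN, BLUE = 0, 1, 2


def forward(pi, a, b, obs):
    """Return (alpha, P(O | model)); alpha[t][i] is for time t+1, state i+1."""
    n = len(pi)
    # Initialization: start in state i and emit the first symbol.
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    # Induction: reach state j from any state i, then emit the next symbol.
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
                      for j in range(n)])
    # Termination: sum over the states we could end in.
    return alpha, sum(alpha[-1])


if __name__ == "__main__":
    alpha, prob = forward(PI, A, B, [RED, GREEN, BLUE])
    print(round(prob, 4))  # 0.0295
```

Under these assumed parameters the alpha values agree with the example table for R, G, B to four decimal places.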


Forward Algorithm Pseudocode-II


Forward Algorithm Example
What’s the probability of R, G, B?

                  Time
                1        2        3
States   1    0.125    0.0192   0.0071
         2    0.05     0.0351   0.0149
         3    0.1125   0.0047   0.0075

Just add up the values in the last column (circled on the slide):
0.0071 + 0.0149 + 0.0075 = 0.0295. It’s 2.95%!
Backward Algorithm
   Next is the Backward Algorithm [1]
   The backward variable β_t(i)
      The probability of the observation sequence O_{t+1},
       O_{t+2}, …, O_T given an HMM and state i at time t
   Similar to the forward algorithm, but with important distinctions
      These differences allow us to break a sequence in
       half and attack it from both ends
      Reduced run time
      Allows for novel algorithms
Backward Algorithm-II
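
For reference, the standard backward recursion, with symbols assumed to follow the notation of [1] (β for the backward variable, a_ij, b_j(O_t), and π_i as before):

```latex
\begin{align*}
\text{Initialization:}\quad & \beta_T(i) = 1, && 1 \le i \le N \\
\text{Induction:}\quad & \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), && t = T-1, \dots, 1,\ 1 \le i \le N \\
\text{Termination:}\quad & P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(O_1)\, \beta_1(i)
\end{align*}
```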
      Backward Algorithm Pseudocode
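
A rough Python rendering of the backward algorithm (not the slides’ own pseudocode), again assuming the urn model whose parameters are back-solved from the worked example tables in these slides.

```python
# Assumed urn model (symbols: 0 = red, 1 = green, 2 = blue).
PI = [0.5, 0.25, 0.25]
A = [[0.20, 0.60, 0.20], [0.25, 0.50, 0.25], [0.35, 0.15, 0.50]]
B = [[0.25, 0.25, 0.50], [0.20, 0.30, 0.50], [0.45, 0.05, 0.50]]
RED, GREEN, BLUE = 0, 1, 2


def backward(pi, a, b, obs):
    """Return (beta, P(O | model)); beta[t][i] is for time t+1, state i+1."""
    n, T = len(pi), len(obs)
    # Initialization: nothing is left to observe after time T.
    beta = [[1.0] * n for _ in range(T)]
    # Induction: work from the end of the sequence toward the start.
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    # Termination: account for the initial state and the first symbol.
    prob = sum(pi[i] * b[i][obs[0]] * beta[0][i] for i in range(n))
    return beta, prob


if __name__ == "__main__":
    beta, prob = backward(PI, A, B, [RED, GREEN, BLUE])
    print(round(prob, 4))  # 0.0295
```

Both directions give the same P(O|λ), which makes one a handy check on the other.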


Backward Algorithm Pseudocode-II


Backward Algorithm Example
What’s the probability of R, G, B?

                  Time
                1        2      3
States   1    0.12     0.5     1
         2    0.1125   0.5     1
         3    0.0788   0.5     1

Each term is π_i * b_i(R) * β_1(i), using the first column:
0.5*0.25*0.12 +
0.25*0.2*0.1125 +
0.25*0.45*0.0788 = 0.0295, or about 2.95%
Problem #2
   Problem 2: Given a model λ, find a state sequence I such
    that the observation sequence O is more likely to occur
    than with any other state sequence. I.e., find
    a state sequence such that P(O, I|λ) is maximized.
   Solution: Viterbi Algorithm [1]
   Why do we care? And when do we use it?
               What sequence of
               urns will give us
               the best chance of
               getting B, G, B?
       Why Do We Care?

      A particular state sequence within a hidden
      Markov model can correspond to a certain object,
      such as the word ‘hello’, which is made up of
      phonemes represented as states.

This highest-probability                             …
sequence may correspond to a
particular word or gesture, for
Viterbi Algorithm

Cost is -ln(a_ij b_j(O_t)). For example, going from state 1 to
state 2 with a_12 = 0.6 and b_2(O_t) = 0.3:
-ln(0.6*0.3) = 1.71

As these probabilities increase…
                    …the cost decreases!
Viterbi Algorithm-II
   So, it all comes down to finding the path with the
    minimum cost!
   A low probability = a large cost
   A high probability = a small cost

Low Probability -> High Cost    High Probability -> Low Cost
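
A quick check of the cost idea, using the 0.6 and 0.3 from the earlier cost example (the function name `cost` is just for illustration). It also shows why costs are convenient: multiplying probabilities along a path becomes adding costs.

```python
import math


def cost(a_ij, b_j):
    """Cost of one transition: -ln of its probability."""
    return -math.log(a_ij * b_j)


if __name__ == "__main__":
    print(round(cost(0.6, 0.3), 2))         # 1.71
    print(cost(0.9, 0.9) < cost(0.6, 0.3))  # True: higher probability, lower cost
```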
Viterbi Algorithm-III

     (and order of N^2 T multiplications!)
Viterbi Algorithm Pseudocode
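
A rough Python rendering of Viterbi with costs (not the slides’ own pseudocode). The names `a_table` and `s_table` mirror the aTable and sTable of the example slide; the urn parameters are assumptions back-solved from the worked examples, and the sketch assumes every probability is greater than zero so the logarithm is defined.

```python
import math

# Assumed urn model (symbols: 0 = red, 1 = green, 2 = blue).
PI = [0.5, 0.25, 0.25]
A = [[0.20, 0.60, 0.20], [0.25, 0.50, 0.25], [0.35, 0.15, 0.50]]
B = [[0.25, 0.25, 0.50], [0.20, 0.30, 0.50], [0.45, 0.05, 0.50]]
RED, GREEN, BLUE = 0, 1, 2


def viterbi(pi, a, b, obs):
    """Return (best path with 1-based states, a_table, s_table)."""
    n, T = len(pi), len(obs)
    a_table = [[0.0] * T for _ in range(n)]  # min cost ending in state at time
    s_table = [[0] * T for _ in range(n)]    # best previous state, 0 = none
    for j in range(n):
        a_table[j][0] = -math.log(pi[j] * b[j][obs[0]])
    for t in range(1, T):
        for j in range(n):
            costs = [a_table[i][t - 1] - math.log(a[i][j] * b[j][obs[t]])
                     for i in range(n)]
            best = min(range(n), key=costs.__getitem__)
            a_table[j][t] = costs[best]
            s_table[j][t] = best + 1
    # Pick the cheapest final state, then trace the stored predecessors.
    state = min(range(n), key=lambda j: a_table[j][T - 1]) + 1
    path = [state]
    for t in range(T - 1, 0, -1):
        state = s_table[state - 1][t]
        path.append(state)
    path.reverse()
    return path, a_table, s_table


if __name__ == "__main__":
    path, a_table, s_table = viterbi(PI, A, B, [RED, BLUE])
    print(path)  # [1, 2]
```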

Viterbi Algorithm Pseudocode-II


Viterbi Algorithm Pseudocode-III


Viterbi Algorithm Example
What’s the best path for R, B?

aTable            Time
                1        2
States   1    2.0794   3.9278
         2    2.9957   3.2833
         3    2.1848   3.5711

sTable            Time
                1        2
States   1      0        3
         2      0        1
         3      0        3

Take the minimum value in the last column of aTable (3.2833, state 2),
match it with the corresponding sTable entry (1), trace
backward, and we get a path of 1, 2.
Problem #3

      How can we maximize
      the probability of
      getting B, G, B?

      Or maximize R, B, G,
      and its best state
      sequence 1, 3, 2?
K-means Algorithm

Training Data -> K-means -> Trained HMM
K-means Algorithm-II
K-means Algorithm Pseudocode
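
The classify, calculate-new-means, re-classify loop, followed by HMM generation, might look roughly like the sketch below. It is a simplified illustration, not the slides’ pseudocode: observations are assumed to be (R, G, B) vectors, the initial means are supplied by the caller, π and A are estimated by simply counting state labels, and the Viterbi re-segmentation step that segmental K-means usually includes is left out. All names are hypothetical.

```python
def nearest(point, means):
    """Index of the mean closest to `point` (squared Euclidean distance)."""
    return min(range(len(means)),
               key=lambda k: sum((p - m) ** 2 for p, m in zip(point, means[k])))


def kmeans_hmm(sequences, means, max_iters=20):
    """Cluster vectors into states, then count pi and A from the labels."""
    n = len(means)
    means = [list(m) for m in means]
    labels = None
    for _ in range(max_iters):
        new_labels = [[nearest(x, means) for x in seq] for seq in sequences]
        if new_labels == labels:
            break                          # classification stopped changing
        labels = new_labels
        for k in range(n):                 # calculate new means
            members = [x for seq, lab in zip(sequences, labels)
                       for x, l in zip(seq, lab) if l == k]
            if members:                    # an empty cluster keeps its mean
                means[k] = [sum(c) / len(members) for c in zip(*members)]
    # HMM generation: relative frequencies of first states and transitions.
    pi = [0.0] * n
    counts = [[0.0] * n for _ in range(n)]
    for lab in labels:
        pi[lab[0]] += 1.0 / len(labels)
        for i, j in zip(lab, lab[1:]):
            counts[i][j] += 1.0
    a = [[c / sum(row) if sum(row) else 1.0 / n for c in row] for row in counts]
    return means, labels, pi, a


if __name__ == "__main__":
    data = [[(0.9, 0.1, 0.1), (0.8, 0.2, 0.1), (0.1, 0.9, 0.1), (0.1, 0.1, 0.9)],
            [(1.0, 0.0, 0.0), (0.2, 0.8, 0.0), (0.0, 0.2, 0.8), (0.1, 0.0, 1.0)]]
    start = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    means, labels, pi, a = kmeans_hmm(data, start)
    print(labels)  # [[0, 0, 1, 2], [0, 1, 2, 2]]
```

The cluster means returned here would stand in for each state’s observation distribution; how that distribution is modeled is a separate design choice.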

K-means Algorithm Pseudocode-II




K-means Algorithm Pseudocode-III


K-means Algorithm Pseudocode-IV

K-means Algorithm Example
Let the colors of marbles in our urns take on decimal
values, and be a function of R, G, B.

[Figure: points in R, G, B space, labeled with states 1 to 3. Steps shown: initial means, classify, calculate new means, re-classify, HMM generation.]
Baum-Welch Re-estimation Formulas

Initial HMM -> Baum-Welch -> Trained HMM
Baum-Welch Re-estimation Formulas-II
Baum-Welch Re-estimation Formulas-III
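
For reference, the standard Baum-Welch quantities and re-estimation formulas for a single observation sequence with discrete symbols v_k. The symbols are assumed to follow the notation of [1]: γ_t(i) is the probability of being in state i at time t given O, and ξ_t(i, j) is the probability of going from state i at time t to state j at time t+1 given O.

```latex
\begin{align*}
\gamma_t(i) &= \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)} \\
\xi_t(i,j) &= \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} \\
\bar{\pi}_i &= \gamma_1(i) \\
\bar{a}_{ij} &= \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \\
\bar{b}_j(k) &= \frac{\sum_{t=1,\; O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
\end{align*}
```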
Gamma Pseudocode
Xi Pseudocode
Baum-Welch Pseudocode
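
One Baum-Welch re-estimation step might be sketched as below (not the slides’ own pseudocode). It assumes a single observation sequence of at least two symbols, unscaled forward and backward variables (so it is only suitable for short sequences), strictly positive probabilities, and the urn parameters back-solved from the worked examples as a starting model. Function names are hypothetical.

```python
# Assumed starting model (symbols: 0 = red, 1 = green, 2 = blue).
PI = [0.5, 0.25, 0.25]
A = [[0.20, 0.60, 0.20], [0.25, 0.50, 0.25], [0.35, 0.15, 0.50]]
B = [[0.25, 0.25, 0.50], [0.20, 0.30, 0.50], [0.45, 0.05, 0.50]]


def forward(pi, a, b, obs):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                      for j in range(n)])
    return alpha


def backward(a, b, obs):
    n, T = len(a), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(n)) for i in range(n)]
    return beta


def baum_welch_step(pi, a, b, obs):
    """Return (new_pi, new_a, new_b, P(O | old model))."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alpha, beta = forward(pi, a, b, obs), backward(a, b, obs)
    prob = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t, given O
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t, given O
    xi = [[[alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = list(gamma[0])
    new_a = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_b = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_pi, new_a, new_b, prob


if __name__ == "__main__":
    obs = [0, 1, 2, 0, 2, 2, 1, 0]
    pi1, a1, b1, before = baum_welch_step(PI, A, B, obs)
    _, _, _, after = baum_welch_step(pi1, a1, b1, obs)
    print(after >= before)  # re-estimation should not lower P(O | model)
```

In practice the step is repeated until P(O|λ) stops improving.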



Baum-Welch Pseudocode-II


Distance Between HMMs
Distance Pseudocode
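
A sketch of a Kullback-Leibler based distance between two HMMs, under a commonly used convention that is assumed here rather than taken from the slides: generate a sequence O of length T from model 2, then compare how well each model explains it, D(λ1, λ2) = (1/T) [log P(O|λ1) - log P(O|λ2)]. The symmetric version averages both directions. The forward pass is scaled at every step so long sequences do not underflow. All names and both example models are hypothetical.

```python
import math
import random


def sample(pi, a, b, length, rng):
    """Generate `length` observation symbols from the model (pi, a, b)."""
    obs = []
    state = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(length):
        obs.append(rng.choices(range(len(b[state])), weights=b[state])[0])
        state = rng.choices(range(len(pi)), weights=a[state])[0]
    return obs


def log_likelihood(pi, a, b, obs):
    """log P(O | model) via the forward algorithm with per-step scaling."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_p += math.log(scale)        # accumulate the log of each scale
        alpha = [x / scale for x in alpha]
    return log_p


def distance(model1, model2, length, rng):
    """D(model1, model2), measured on a sequence generated by model2."""
    obs = sample(*model2, length, rng)
    return (log_likelihood(*model1, obs) - log_likelihood(*model2, obs)) / length


def symmetric_distance(model1, model2, length, rng):
    return 0.5 * (distance(model1, model2, length, rng)
                  + distance(model2, model1, length, rng))


# Two example models, each a (pi, A, B) tuple. URN is back-solved from the
# worked examples in these slides; OTHER is made up for comparison.
URN = ([0.5, 0.25, 0.25],
       [[0.20, 0.60, 0.20], [0.25, 0.50, 0.25], [0.35, 0.15, 0.50]],
       [[0.25, 0.25, 0.50], [0.20, 0.30, 0.50], [0.45, 0.05, 0.50]])
OTHER = ([1 / 3] * 3,
         [[1 / 3] * 3 for _ in range(3)],
         [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])

if __name__ == "__main__":
    print(distance(URN, URN, 500, random.Random(0)))  # 0.0
    print(symmetric_distance(URN, OTHER, 500, random.Random(0)))
```

With this sign convention a model compared against itself scores 0, and a mismatched model typically scores below 0 on long sequences; the magnitude is what indicates how different the two models are.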
[1] R. Dugad and U. B. Desai, “A Tutorial on Hidden Markov
    Models,” Published Online, May 1996. See

[2] L. R. Rabiner and B. H. Juang, “An introduction to hidden
    Markov models,” IEEE ASSP Mag., pp. 4-16, Jun. 1986.
