# An Introduction to Hidden Markov Models

An Introduction to Hidden Markov Models and Gesture Recognition

Troy L. McDaniel
Research Assistant
Center for Cognitive Ubiquitous Computing
Arizona State University

Notation and Algorithms From (Dugad and Desai 1996)
The Big Picture

We learned how the overall system fits together.

Now let's take a closer look at how this part works…
Introduction
   A hidden Markov model can be used to recognize any temporal pattern or sequence
   How? We can train a finite state machine using training data consisting of sequences of symbols
   States will represent, e.g., poses for gestures, and transitions between states will have probabilities

[Figure: an HMM recognizing the "goodbye" gesture]
Applications
   Speech Recognition
   Computational Biology
   Computer Vision
   Biometrics
   Gesture Recognition
   And many others…

   Let's take a look at gesture recognition in detail…
Gesture Recognition-Training
   Interact with a computer through gestures
   Training
 Create a database of gestures
 We store the feature vectors of poses that make up each gesture
 Create a database of poses to increase accuracy
 Train HMMs for each class of gestures

[Figure: a "goodbye" gesture sequence vs. a single pose]
Gesture Recognition-Testing
   Testing
 Segmentation – Obtain the user's hand by identifying skin-color pixels; this step also serves as background subtraction
 Feature Extraction – Extract features. For example, we can fit ellipsoids around the fingers and palm, and use their major axes and the angles between them
 Pose Recognition – Match feature vectors with those in the pose database to improve recognition
 Gesture Recognition – Run gestures through all of the HMMs; the HMM with the highest probability is the recognized gesture
Gesture Recognition System
   Overview of the system

Next, we will learn how HMMs work…
Urns and Marbles Example
   There are 3 urns, each filled with some number of marbles; each marble is a certain color, say red, green, or blue
   A friend of ours is in a room choosing urns, each time taking out a marble, shouting its color, and putting it back
   We're outside the room and cannot see in!
   We know the # of urns and the observations (R, R, G, R, B, …)
   But what is it that we don't know?
Urns and Marbles Example-II
   The urns are states, each with an initial probability
   Transition probabilities exist between states
   The Markovian property: the next urn chosen depends only on the current urn
   Each state represents a distribution of symbols (E.g.,
red = 25%, green = 25% and blue = 50% for urn 1)
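The generative story of the urn model can be sketched in code. This is a minimal simulation; the only probabilities taken from the slides are urn 1's 25/25/50 color split, and everything else (transition matrix, initial probabilities, the other urns' distributions) is made up for illustration.

```python
import random

SYMBOLS = ["R", "G", "B"]
pi = [0.5, 0.25, 0.25]            # initial choice of urn (illustrative)
A = [[0.6, 0.2, 0.2],             # A[i][j]: probability of moving urn i -> urn j
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]
B = [[0.25, 0.25, 0.50],          # urn 1: red 25%, green 25%, blue 50% (from the slide)
     [0.40, 0.40, 0.20],          # urns 2 and 3: illustrative values
     [0.45, 0.10, 0.45]]

def sample(T, rng=random.Random(0)):
    """Generate T shouted colors; the urn sequence itself stays hidden."""
    state = rng.choices(range(3), weights=pi)[0]
    out = []
    for _ in range(T):
        out.append(rng.choices(SYMBOLS, weights=B[state])[0])
        state = rng.choices(range(3), weights=A[state])[0]
    return out
```

We only ever see the output of `sample`, never the urn sequence: exactly the situation of standing outside the room.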
So What’s An HMM?
   As we've already seen, it is a finite number of states connected by transitions, which can generate an observation sequence according to its transition, emission, and initial probabilities
   It is represented by three sets of probabilities
   The Markov model is hidden because we don't know which state led to each observation
   Going from the urn example to more familiar models…
So What's An HMM?-II
   For gesture recognition, a state will represent a pose
   Each state's symbol distribution will be over feature vectors, e.g., the major axes of the fingers and palm, and the angles between them.
   Remember that during training, each gesture, even
though it may belong to the same class (goodbye,
etc.), will have variations.
   An HMM can either represent a single object such as
a word or gesture, or a collection of objects.
The Algorithms
   Next, we’re going to cover algorithms for training and
testing hidden Markov models
   Algorithms include Forward-Backward [1], Viterbi [1],
K-means, Baum-Welch [1], and the Kullback-Leibler
based distance measure [1]
   Each algorithm, once explained, will be mapped to
pseudocode
Notation [1]
   An HMM λ = (A, B, π) has N states and M observation symbols
   A = {a_ij}: transition probabilities; B = {b_j(k)}: symbol (emission) probabilities; π = {π_i}: initial state probabilities
   O = O_1 O_2 … O_T denotes an observation sequence of length T
HMM Structure Pseudocode
   For the pseudocode, assume that HMMs are objects,
containing the constants and data structures below.
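The slide's data-structure figure did not survive extraction, so here is one plausible rendering of the HMM "object" the pseudocode assumes. The field names (N, M, A, B, pi) are my own choice, matched to the notation of [1].

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HMM:
    N: int                  # number of states
    M: int                  # number of distinct observation symbols
    A: List[List[float]]    # A[i][j] = P(state j at t+1 | state i at t)
    B: List[List[float]]    # B[i][k] = P(symbol k | state i)
    pi: List[float]         # pi[i]  = P(starting in state i)
```

Every row of A and B, and pi itself, should sum to 1, since each is a probability distribution.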
Problem #1
   HMM applications are reduced to solving 3 problems. Let's look at the first one…
   Problem 1: Given λ, how do we compute P(O|λ)?
   Solution: Forward-Backward Algorithm
   Why do we care? And when do we use it?

[Figure: what's the probability of getting B, G, R, B?]
Why Do We Care?
Feed the observation sequence Red, Green, Blue into each trained HMM and compare P(O|λ):

HMM 1: 98%     HMM 2: 5%     HMM 3: 50%
But First, the Brute Force Approach
   Let's look at the brute force approach [1] first
   We can find this probability by finding the probability of O for a fixed state sequence times the probability of getting that state sequence
   But we do this for every possible state sequence…
   With N^T possible state sequences, it's not practical.

[Figure: the N × T trellis of urn choices for Blue, Green, Red, Blue]
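The brute-force sum can be written directly. This is a sketch of a correct but exponential-time reference implementation (states and symbols encoded as integer indices, my own convention, not the slides' pseudocode):

```python
from itertools import product

def brute_force_prob(obs, A, B, pi):
    """P(O|lambda) by summing over all N**T state sequences -- exponential!"""
    N, T = len(pi), len(obs)
    total = 0.0
    for seq in product(range(N), repeat=T):          # all N**T state sequences
        p = pi[seq[0]] * B[seq[0]][obs[0]]           # start in seq[0], emit O_1
        for t in range(1, T):
            p *= A[seq[t-1]][seq[t]] * B[seq[t]][obs[t]]
        total += p
    return total
```

A useful sanity check: summing this probability over every possible observation sequence of a given length gives exactly 1.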
Forward Algorithm
   A more practical approach: Forward Algorithm [1]
   The forward variable:

   α_t(i) = P(O_1 O_2 … O_t, q_t = i | λ)

   The probability of the partial observation sequence up to time t and state i at time t
   It is an inductive algorithm, shown next…

[Figure: what's the probability of getting B, G, R, B and ending at urn 2?]
Forward Algorithm-II

On the order of N^2·T multiplications!
Forward Algorithm Pseudocode

[Pseudocode figure]
Forward Algorithm Pseudocode-II

[Pseudocode figure]
Forward Algorithm Example
What's the probability of R, G, B?

         Time:    1        2        3
State 1:        0.125    0.0192   0.0071
State 2:        0.05     0.0351   0.0149
State 3:        0.1125   0.0047   0.0075

Summing the α values in the final column gives 0.0071 + 0.0149 + 0.0075 = 0.0295, so it's 2.95%!
Backward Algorithm
   Next is the Backward Algorithm [1]
   The backward variable:

   β_t(i) = P(O_{t+1} O_{t+2} … O_T | q_t = i, λ)

   The probability of the observation sequence O_{t+1}, O_{t+2}, …, O_T given the HMM and state i at time t
   Similar to, but with important distinctions from, the forward variable
 These differences allow us to break a sequence in half and attack it from both ends
 Reduced run time
 Allows for novel algorithms
Backward Algorithm-II
Backward Algorithm Pseudocode

[Pseudocode figure]
Backward Algorithm Pseudocode-II

[Pseudocode figure]
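As with the forward pass, here is my own runnable sketch of the standard backward recursion (not the slides' exact pseudocode). Note the recursion runs from T−1 down to 1, and P(O|λ) can be recovered from the first column.

```python
def backward(obs, A, B, pi):
    """Backward algorithm: beta[t][i] = P(O_{t+1}..O_T | state i at t, lambda).
    Returns (beta, P(O|lambda))."""
    N, T = len(pi), len(obs)
    beta = [[1.0] * N for _ in range(T)]             # beta[T-1][i] = 1 for all i
    for t in range(T - 2, -1, -1):                   # work backwards through time
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j]
                             for j in range(N))
    # P(O|lambda) from the backward side: sum_i pi_i * b_i(O_1) * beta_1(i)
    prob = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(N))
    return beta, prob
```

The closing sum is the same Σ_i π_i b_i(O_1) β_1(i) combination used in the worked example that follows.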
Backward Algorithm Example
What's the probability of R, G, B?

         Time:    1        2      3
State 1:        0.12     0.5    1
State 2:        0.1125   0.5    1
State 3:        0.0788   0.5    1

P(O|λ) = Σ_i π_i b_i(O_1) β_1(i)
       = 0.5·0.25·0.12 + 0.25·0.2·0.1125 + 0.25·0.45·0.0788 ≈ 2.9%
Problem #2
   Problem 2: Given λ, find a state sequence I such that the probability of the observation sequence O is higher than for any other state sequence, i.e., find a state sequence such that P(O, I|λ) is maximized.
   Solution: Viterbi Algorithm [1]
   Why do we care? And when do we use it?

[Figure: what sequence of urns gives us the best chance of getting B, G, B?]
Why Do We Care?

A particular state sequence within a hidden Markov model can correspond to a certain object, such as the word 'hello', which is made up of phonemes represented as states.

This highest-probability sequence may correspond to a particular word or gesture, for example.
Viterbi Algorithm

As the transition and emission probabilities increase, the step cost decreases: with a_ij = 0.6 and b_j(O_t) = 0.3, the cost is -ln(a_ij · b_j(O_t)) = -ln(0.6 · 0.3) ≈ 1.71
Viterbi Algorithm-ll
   So, it all comes down to finding the path with the minimum cost!
   A low probability = a large cost
   A high probability = a small cost
Viterbi Algorithm-III

(also on the order of N^2·T multiplications!)
Viterbi Algorithm Pseudocode

[Pseudocode figure]
Viterbi Algorithm Pseudocode-II

[Pseudocode figure]
Viterbi Algorithm Pseudocode-III

[Pseudocode figure]
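The minimum-cost formulation above can be sketched as follows. The table names `cost` and `back` are my stand-ins for the slides' aTable and sTable; this is a generic negative-log Viterbi, not a transcription of the original pseudocode.

```python
import math

def viterbi(obs, A, B, pi):
    """Most likely state sequence, found as the minimum negative-log-cost path."""
    N, T = len(pi), len(obs)
    INF = float("inf")

    def nlog(x):                              # cost of a probability
        return -math.log(x) if x > 0 else INF

    cost = [[0.0] * N for _ in range(T)]      # like the slides' aTable
    back = [[0] * N for _ in range(T)]        # like the slides' sTable
    for i in range(N):
        cost[0][i] = nlog(pi[i]) + nlog(B[i][obs[0]])
    for t in range(1, T):
        for j in range(N):
            best = min(range(N), key=lambda i: cost[t-1][i] + nlog(A[i][j]))
            cost[t][j] = cost[t-1][best] + nlog(A[best][j]) + nlog(B[j][obs[t]])
            back[t][j] = best
    # trace back from the cheapest final state
    path = [min(range(N), key=lambda i: cost[T-1][i])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

The traceback mirrors the example slide: take the minimum entry in the last cost column, then follow backpointers to the start.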
Viterbi Algorithm Example
What's the best path for R, B?

aTable (costs):
         Time:    1        2
State 1:        2.0794   3.9278
State 2:        2.9957   3.2833
State 3:        2.1848   3.5711

sTable (backpointers):
         Time:    1   2
State 1:         0   3
State 2:         0   1
State 3:         0   3

Take the minimum value in the last column of aTable (3.2833, at state 2), match it with its sTable entry (1), trace backward, and we get a path of 1, 2.
Problem #3

   Problem 3: Given an observation sequence O, how do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?
   Solution: K-means initialization followed by Baum-Welch re-estimation [1]
   E.g., how can we maximize the probability of getting B, G, B? Or maximize R, B, G and its best state sequence 1, 3, 2?
K-means Algorithm

Training Data → K-means Trainer → Trained HMM
K-means Algorithm-II
K-means Algorithm Pseudocode

[Pseudocode figure]
K-means Algorithm Pseudocode-II

[Pseudocode figure]
K-means Algorithm Pseudocode-III

[Pseudocode figure]
K-means Algorithm Pseudocode-IV

[Pseudocode figure]
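The classify/re-estimate loop at the heart of k-means can be sketched in a few lines. The slides cluster points in R, G, B space; purely for illustration this version clusters scalar features and initializes the means to the first k points, both of which are my simplifications.

```python
def kmeans(points, k, iters=20):
    """Plain k-means on scalar features: alternately classify each point to its
    nearest mean, then recompute each mean from its cluster."""
    means = list(points[:k])                  # illustrative init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # classify: nearest mean wins
            nearest = min(range(k), key=lambda m: abs(p - means[m]))
            clusters[nearest].append(p)
        for j in range(k):                    # re-estimate the means
            if clusters[j]:
                means[j] = sum(clusters[j]) / len(clusters[j])
    return means, clusters
```

In the training pipeline described earlier, the resulting clusters seed the states' emission distributions before Baum-Welch refines them.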
K-means Algorithm Example
Let the colors of marbles in our urns take on decimal values, as a function of R, G, B, so that each marble is a point in R, G, B space.

[Figure: starting from initial means, each point is classified by its nearest mean, new means are calculated, the points are re-classified, and the resulting clusters feed HMM generation]
Baum-Welch Re-estimation Formulas

Initial HMM + Observation Sequence → Baum-Welch Algorithm → Trained HMM
Baum-Welch Re-estimation Formulas-II
Baum-Welch Re-estimation Formulas-III
Gamma Pseudocode
Xi Pseudocode
Baum-Welch Pseudocode

[Pseudocode figure]
Baum-Welch Pseudocode-II

[Pseudocode figure]
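One full re-estimation pass, with the γ and ξ quantities from the preceding slides, can be sketched as below. This is the standard single-sequence Baum-Welch step in my own rendering, not the slides' pseudocode; it reuses the forward/backward recursions described earlier.

```python
def baum_welch_step(obs, A, B, pi):
    """One Baum-Welch re-estimation pass over a single observation sequence.
    gamma[t][i] = P(state i at t | O, lambda)
    xi[t][i][j] = P(state i at t, state j at t+1 | O, lambda)"""
    N, T, M = len(pi), len(obs), len(B[0])
    # forward pass
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    # backward pass
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j] for j in range(N))
    prob = sum(alpha[T-1])
    # posteriors
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j] / prob
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # re-estimation: expected counts, normalized
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_A, new_B, new_pi
```

Each pass is guaranteed not to decrease P(O|λ); in practice the step is iterated until the improvement becomes negligible.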
Distance Between HMMs
Distance Pseudocode
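The distance slide's content did not survive extraction; the exact definition, sign and normalization of the KL-based measure are in [1]. As a sketch of the general idea only: generate an observation sequence from one model and compare the average per-symbol log-likelihoods under both models. The function names and all parameter values here are my own assumptions.

```python
import math
import random

def log_prob(obs, A, B, pi):
    """log P(O|lambda), computed with the forward algorithm."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return math.log(sum(alpha))

def distance(lam1, lam2, T=200, rng=random.Random(1)):
    """Sketch of a KL-style distance: sample a length-T sequence from lam1,
    then compare per-symbol log-likelihoods under lam1 and lam2."""
    A, B, pi = lam1
    N, M = len(pi), len(B[0])
    state = rng.choices(range(N), weights=pi)[0]
    obs = []
    for _ in range(T):                        # generate O from lam1
        obs.append(rng.choices(range(M), weights=B[state])[0])
        state = rng.choices(range(N), weights=A[state])[0]
    return (log_prob(obs, *lam1) - log_prob(obs, *lam2)) / T
```

A model's distance to itself is zero, and a model that fits lam1's output poorly sits at a positive distance, which is the behavior a distance between HMMs should have.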
References
[1] R. Dugad and U. B. Desai, “A Tutorial on Hidden Markov
Models,” Published Online, May 1996. See