
Hidden Markov Models
Theory

By Johan Walters (SR 2003)
Topics overview
   HMMs as part of Speech Recognition
   Input / Output
   Basics
       Definition
       Simple HMM
       Markov assumption
       HMM program
   Evaluation problem
   Decoding problem
   Learning problem
HMM in SR
Input / Output
    An HMM is a statistical model that describes a probability
    distribution over the possible observation sequences.

   Input:     A sequence of feature vectors
   Output:    The words most probably being spoken

    Given a sequence of feature vectors, which words were most
    probably meant?
Basics
 – States
 – State transition probabilities
 – Symbol emission probabilities

State transition probability matrix:

    A = \{a_{ij}\} = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.5 & 0.3 & 0.2 \\ 0.4 & 0.1 & 0.5 \end{pmatrix}, \qquad a_{ij} = P(s_t = j \mid s_{t-1} = i)

Initial state distribution:

    \pi = (\pi_i) = (0.5,\; 0.2,\; 0.3)^{T}, \qquad \pi_i = P(s_1 = i)

Output observation alphabet:

    O = \{o_1, o_2, \ldots, o_M\} = \{\text{up}, \text{down}, \text{unchanged}\}
A simple HMM

[Figure: a three-state HMM over the state set {1, 2, 3}, with transition
arcs a_{ij} between the states and an output distribution B attached to
each state.]

Formal definition of an HMM:

An output observation alphabet
    O = \{o_1, o_2, \ldots, o_M\}
The set of states
    \Omega = \{1, 2, \ldots, N\}
A transition probability matrix
    A = \{a_{ij}\}, \qquad a_{ij} = P(s_t = j \mid s_{t-1} = i)
An output probability matrix
    B = \{b_i(k)\}, \qquad b_i(k) = P(X_t = o_k \mid s_t = i)
An initial state distribution
    \pi_i = P(s_1 = i)
Formal notation for the whole parameter set:
    \Phi = (A, B, \pi)

Assumptions:
 – Markov assumption
 – Output independence assumption
Both are made for ease of use and have no significant adverse effect.
Markov assumption
   “probability of the random variable at a given time
   depends only on the value at the preceding time.”


    P(X_i \mid X_1, \ldots, X_{i-1}) = P(X_i \mid X_{i-1})

\begin{aligned}
P(X_1, \ldots, X_N) &= P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2, X_1) \cdots P(X_N \mid X_1, \ldots, X_{N-1}) \\
 &= P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2) \cdots P(X_N \mid X_{N-1}) \\
 &= P(X_1) \prod_{i=2}^{N} P(X_i \mid X_{i-1})
\end{aligned}
HMM Program
   t := 1
   Start in state s_i with probability π_i (i.e., X_1 = i)
   Forever do
       Emit observation symbol o_t = k with probability b_i(k)
       Move from state s_i to state s_j with probability a_ij
        (i.e., X_{t+1} = j)
       t := t + 1
   end
 A symbol sequence (the observations) is generated by starting at an
  initial state and moving from state to state until a terminal state is
  reached.
 The state sequence is “hidden”.
 Only the symbol sequence that the hidden states emit is observable.
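The generative procedure above is easy to sketch. A minimal Python/NumPy version under the state-emission definition b_i(k) used earlier (the function name and the fixed sequence length T are my own illustrative choices; the slide's loop runs forever):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hmm(pi, A, B, T):
    """Generate T observation symbols from an HMM (pi, A, B).

    The state sequence is returned only for illustration; to an
    observer it stays hidden and only the symbols are visible.
    """
    states, symbols = [], []
    s = rng.choice(len(pi), p=pi)            # start in state i with prob. pi_i
    for _ in range(T):
        k = rng.choice(B.shape[1], p=B[s])   # emit symbol k with prob. b_i(k)
        states.append(int(s))
        symbols.append(int(k))
        s = rng.choice(len(A), p=A[s])       # move i -> j with prob. a_ij
    return states, symbols

# With the example A and pi above and a hypothetical uniform emission
# matrix over {up, down, unchanged}:
#   sample_hmm(pi, A, np.full((3, 3), 1/3), T=10)
```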



[Figure: an HMM with states s_1, s_2, s_3 (here labelling Korean phones)
aligned to speech. Each state emits feature vectors x_1, x_2, x_3, ..., x_t
with probability b_j(x_t); the feature vectors are computed frame by frame,
with a fixed frame shift, from the speech signal over time.]
Problems
The Evaluation Problem
Given the observation sequence O and the model Φ, how do we efficiently
compute P(O|Φ), the probability of the observation sequence given the
model?

The Decoding Problem
Given the observation sequence O and the model Φ, how do we find the
sequence of hidden states that most probably generated O?

The Learning Problem
How can we adjust the model parameters Φ = (A, B, π) to maximize the
joint probability (likelihood) P(O|Φ)?
How to evaluate an HMM
   Given multiple HMMs (one for each word) and an observation
    sequence, which HMM most probably generated the sequence?

    Simple (expensive) solution:
   Enumerate all possible state sequences S of length T
   For each path S, compute its probability:
    state sequence probability * joint output probability
   Sum the probabilities over all these sequences

   The Forward Algorithm computes the same sum far more
    efficiently, with complexity O(N²T)
   It reuses partially computed probabilities recursively for efficiency
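A minimal sketch of the forward recursion in Python/NumPy (function and variable names are mine): α_t(j) = P(o_1 ... o_t, s_t = j | Φ) is updated one time step at a time, which is what brings the cost down to O(N²T):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: P(O | model) in O(N^2 T) time.

    pi  : (N,)   initial state distribution
    A   : (N, N) transition probabilities a_ij
    B   : (N, M) emission probabilities b_i(k)
    obs : list of observation symbol indices o_1 .. o_T
    """
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        # alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()                 # P(O) = sum_i alpha_T(i)
```

Running one forward pass per word HMM and picking the largest P(O|λ) is exactly the recognizer sketched in the next slide's diagram.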
How to evaluate an HMM (2)
[Figure: block diagram of isolated-word recognition, here for the spoken
word "Seoul". Speech passes through feature extraction; the feature
sequence X is scored by likelihood computation against one HMM per word,
λ_1 for word 1 through λ_V for word V, giving P(X|λ_1), ..., P(X|λ_V);
selecting the maximum yields the recognized word.]
How to decode an HMM
   The forward algorithm does not find the best state sequence
    (‘best path’)
   Exhaustive search for the best path is expensive
   The Viterbi algorithm is used instead:
       It also reuses partially computed results recursively
       The partially computed results are the best paths so far
       Each calculated state remembers the best previous state
        leading into it
       Complexity O(N²T)
   Finding the best path is very important for continuous
    speech recognition
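A matching Viterbi sketch in Python/NumPy (names are mine; working in log probabilities to avoid underflow on long sequences is a standard practical choice, not something the slide specifies):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi algorithm: most probable hidden state path, O(N^2 T).

    Same recursion shape as the forward algorithm, but max replaces
    sum, and back-pointers record the best predecessor of each state.
    """
    N, T = len(pi), len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])
    psi = np.zeros((T, N), dtype=int)           # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)     # scores[i, j]: enter j from i
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]                # best final state
    for t in range(T - 1, 0, -1):               # trace pointers backwards
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta.max()              # best path, its log probability
```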
How to estimate HMM parameters (learning)
    Baum-Welch (or Forward-Backward) algorithm

   Estimation of the model parameters Φ = (A, B, π):
      First make an initial guess of the parameters (which may well be
       entirely wrong)
      Refine it by assessing its worth and attempting to reduce the
       errors it provokes when fitted to the given data
   This is an iterative hill-climbing re-estimation (an instance of the
    EM algorithm) that converges to a local maximum of the likelihood
   Uses a forward probability term α and a backward probability term β
   Similar to the Forward and Viterbi algorithms (recursive use of
    partial results) but more complex
   Unsupervised learning: sample speech data is fed in along with the
    phonemes of the spoken words; the state alignment itself stays hidden
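One re-estimation iteration can be sketched as follows, assuming a single discrete observation sequence and no probability scaling (real trainers rescale α and β, or work in logs; the function name is mine):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch (EM) re-estimation step for one symbol sequence.

    Computes forward (alpha) and backward (beta) probabilities, turns
    them into state and transition posteriors, and returns re-estimated
    (pi, A, B). Each step cannot decrease the likelihood P(O | model).
    """
    N, M, T = len(pi), B.shape[1], len(obs)
    obs = np.asarray(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                 # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0                               # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()                   # P(O | model)
    gamma = alpha * beta / evidence              # gamma[t, i] = P(s_t = i | O)
    # xi[t, i, j] = P(s_t = i, s_{t+1} = j | O)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / evidence
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```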
How to estimate HMM parameters (learning) (2)

[Figure: block diagram of HMM training. Waveforms from a speech database
(the example words are the Korean digits "il", "i", ..., "chil") pass
through feature extraction; Baum-Welch re-estimation updates the word
HMMs λ_1, λ_2, ..., λ_7 and loops until convergence, then ends.]