# Hidden Markov Models


Theory

By Johan Walters (SR 2003)
Topics overview
• HMMs as part of speech recognition
• Input / output
• Basics
• Definition
• A simple HMM
• Markov assumption
• HMM program
• Evaluation problem
• Decoding problem
• Learning problem
HMM in SR
Input / Output
An HMM is a statistical model that describes a
probability distribution over a number of possible
sequences.

• Input: a sequence of feature vectors
• Output: the words with the highest probability of being spoken

Given a sequence of feature vectors, what words are
most probably meant?
Basics
• States
• State transition probabilities
• Symbol emission probabilities

State transition probability matrix

$$A = \{a_{ij}\} = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.5 & 0.3 & 0.2 \\ 0.4 & 0.1 & 0.5 \end{pmatrix}, \qquad a_{ij} = P(s_t = j \mid s_{t-1} = i)$$

Initial state distribution

$$\pi = (\pi_i) = \begin{pmatrix} 0.5 \\ 0.2 \\ 0.3 \end{pmatrix}, \qquad \pi_i = P(s_1 = i)$$

Output observation alphabet

$$O = \{o_1, o_2, \ldots, o_M\} = \{\text{up, down, unchanged}\}$$
A simple HMM
  {1,2,3}   Formal definition HMM

• An output observation alphabet: $O = \{o_1, o_2, \ldots, o_M\}$
• The set of states: $\Omega = \{1, 2, \ldots, N\}$
• A transition probability matrix: $A = \{a_{ij}\}, \quad a_{ij} = P(s_t = j \mid s_{t-1} = i)$
• An output probability matrix: $B = \{b_i(k)\}, \quad b_i(k) = P(X_t = o_k \mid s_t = i)$
• An initial state distribution: $\pi_i = P(s_0 = i)$
Formal notation for the whole parameter set:

$$\Phi = (A, B, \pi)$$

Assumptions
• Markov assumption
• Output independence assumption
Both ease computation without a significant effect on accuracy.
Markov assumption
“probability of the random variable at a given time
depends only on the value at the preceding time.”

$$P(X_i \mid X_1, \ldots, X_{i-1}) = P(X_i \mid X_{i-1})$$

$$\begin{aligned}
P(X_1, \ldots, X_N) &= P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2, X_1) \cdots P(X_N \mid X_1, \ldots, X_{N-1}) \\
&= P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2) \cdots P(X_N \mid X_{N-1}) \\
&= P(X_1) \prod_{i=2}^{N} P(X_i \mid X_{i-1})
\end{aligned}$$
HMM Program
    t := 1
    Start in state s_i with probability π_i        (i.e., X_1 = i)
    Forever do
        Move from state s_i to state s_j
            with probability a_ij                  (i.e., X_{t+1} = j)
        Emit observation symbol o_t = k
            with probability b_ij(k)
        t := t + 1
    end
• A symbol sequence (the observations) is generated by starting at an initial state and moving from state to state until a terminal state is reached.
• The state sequence is “hidden”: only the symbol sequence that the hidden states emit is observable.
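The generation loop above can be sketched in Python. The emission matrix `B` and the symbol names are illustrative assumptions (the formal definition gives $b_i(k)$ over an alphabet such as {up, down, unchanged}), and for simplicity this sketch emits from states rather than from transition arcs:

```python
import random

# Illustrative parameters: A and pi from the slides, B invented for the sketch.
A  = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.4, 0.1, 0.5]]
B  = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]   # b_i(k)
pi = [0.5, 0.2, 0.3]
SYMBOLS = ["up", "down", "unchanged"]

def sample(T, seed=0):
    """Generate T observation symbols; the state path itself stays hidden."""
    rng = random.Random(seed)
    state = rng.choices(range(3), weights=pi)[0]            # X_1 = i w.p. pi_i
    observations = []
    for _ in range(T):
        k = rng.choices(range(3), weights=B[state])[0]      # emit o_t = k w.p. b_i(k)
        observations.append(SYMBOLS[k])
        state = rng.choices(range(3), weights=A[state])[0]  # move i -> j w.p. a_ij
    return observations

print(sample(5))
```

Only the returned symbol list would be visible to a recognizer; the sampled state path is discarded, which is exactly what makes the model "hidden".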

[Figure: an HMM with states s1, s2, s3 modelling the phonemes of a spoken word (e.g. ㅕ, ㄹ, ㅓ); the speech signal is cut into frames (with a fixed frame shift along the time axis), each frame yields a feature vector x_t, and state j scores frame t with output probability b_j(x_t).]
Problems
The Evaluation Problem
Given the observation sequence O and the model Φ, how do we efficiently compute P(O|Φ), the probability of the observation sequence given the model?

The Decoding Problem
Given an observed sequence, how do we find the sequence of hidden states that most probably generated it?

The Learning Problem
How can we adjust the model parameters Φ to maximize the joint probability (likelihood) P(O|Φ)?
How to evaluate an HMM
Given multiple HMMs (one for each word) and an observation sequence, which HMM most probably generated the sequence?

Simple (expensive) solution:
• Enumerate all possible state sequences S of length T
• For each path S, compute its probability: state sequence probability × joint output probability
• Sum the probabilities of all these sequences

• The Forward algorithm computes the same quantity much more efficiently, with complexity O(N²T)
• It recursively reuses partially computed probabilities
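The Forward recursion can be sketched in a few lines. This is a minimal illustration, assuming the same toy parameters as before (the emission matrix `B` is invented for the example) and observations given as symbol indices:

```python
# Illustrative 3-state model; B is an assumption, not from the slides.
A  = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.4, 0.1, 0.5]]
B  = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
pi = [0.5, 0.2, 0.3]

def forward(obs):
    """Return P(O | model) via the alpha recursion, O(N^2 T)."""
    N = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)

print(forward([0, 1, 2]))
```

Each `alpha[j]` sums over all paths ending in state j at time t, which is what lets the algorithm avoid enumerating the exponentially many state sequences.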
How to evaluate an HMM (2)
[Figure: isolated-word recognition. Speech → feature extraction → likelihood computation P(X|λ_v) against one HMM per word, from λ_1 (word 1, e.g. the HMM for “Seoul”) to λ_V (word V) → select maximum → recognized word.]
How to decode an HMM
• The Forward algorithm does not find the best state sequence (the “best path”)
• Exhaustive search for the best path is expensive
• The Viterbi algorithm is used instead:
   • It also reuses partially computed results recursively
   • The partial results are the best paths so far
   • Each computed state remembers the best previous state that reached it
   • Complexity is O(N²T)
• Finding the best path is very important for continuous speech recognition
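The Viterbi recursion has the same shape as Forward, with max in place of sum plus a backpointer table for recovering the path. A minimal sketch, again with illustrative toy parameters (`B` is invented for the example):

```python
# Illustrative 3-state model; B is an assumption, not from the slides.
A  = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.4, 0.1, 0.5]]
B  = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
pi = [0.5, 0.2, 0.3]

def viterbi(obs):
    """Return (best hidden state path, its probability), O(N^2 T)."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]   # best-path score so far
    back = []                                          # back[t][j]: best predecessor of j
    for o in obs[1:]:
        prev = [max(range(N), key=lambda i: delta[i] * A[i][j]) for j in range(N)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(N)]
        back.append(prev)
    # Backtrack from the best final state through the remembered predecessors.
    path = [max(range(N), key=lambda i: delta[i])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    path.reverse()
    return path, max(delta)

path, p = viterbi([0, 0, 2])
print(path, p)
```

Note that `delta` keeps only the single best path into each state, which is precisely why Viterbi yields a state alignment while Forward yields a total probability.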
How to estimate HMM
Parameters (learning)
Baum-Welch (or Forward-Backward) algorithm

Estimation of the model parameters Φ = (A, B, π):
• First make an initial guess of the parameters (which may well be entirely wrong)
• Refine it by assessing its worth and reducing the errors it causes when fitted to the given data
• This performs a form of gradient descent, looking for a minimum of an error measure
• It uses a forward probability term α and a backward probability term β
• Similar to Forward and Viterbi (recursive use of incomplete data), but more complex
• Unsupervised learning: feed in sample speech data along with the phonemes of the spoken words
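The α and β terms mentioned above can be sketched together. This is not the full Baum-Welch re-estimation, only the forward-backward pass it is built on; the parameters are the same illustrative toy values as earlier, and `gamma` is the state-occupancy posterior that re-estimation averages over:

```python
# Illustrative 3-state model; B is an assumption, not from the slides.
A  = [[0.6, 0.2, 0.2], [0.5, 0.3, 0.2], [0.4, 0.1, 0.5]]
B  = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
pi = [0.5, 0.2, 0.3]
N = 3

def forward_backward(obs):
    """Return the alpha (forward) and beta (backward) tables for obs."""
    T = len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    beta = [[1.0] * N]                                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return alpha, beta

obs = [0, 1, 2]
alpha, beta = forward_backward(obs)
p = sum(alpha[-1])                                      # P(O | model)
# gamma_t(i) = alpha_t(i) * beta_t(i) / P(O): how much state i "explains" frame t.
gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)] for t in range(len(obs))]
```

A useful sanity check is that, for every t, the sum over states of α_t(i)·β_t(i) equals P(O | model), so each row of `gamma` sums to one.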
How to estimate HMM
Parameters (learning) (2)
[Figure: training loop. A speech database of waveforms with transcriptions (e.g. “il”, “i”, “chil”) feeds feature extraction, then Baum-Welch re-estimation; if the parameters have not converged, re-estimation repeats, otherwise training ends, yielding the word HMMs λ_1, λ_2, …, λ_7.]