Robust Speech Recognition
 V. Barreaud
 LORIA
Mismatch Between Training and Testing

    mismatch degrades recognition scores
    causes of mismatch
        speech variation
        inter-speaker variation
Robust Approaches
    three categories
        noise-resistant features (speech var.)
        speech enhancement (speech var. + inter-speaker var.)
        model adaptation for noise (speech var. + inter-speaker var.)


             Recognition system
                                           Models     training

                   Features
         testing
                   encoding                                      Spk. B
                                                    Word sequence
Spk. A
Contents

    Overview
        Noise-resistant features
        Speech enhancement
        Model adaptation

    Stochastic Matching
    Our current work
Noise-resistant features
    Acoustic representation
        emphasis on the least affected evidence
    Auditory-system-inspired models
        filter banks, loudness curve, lateral inhibition
    Slow variation removal
        Cepstrum Mean Normalization, time derivatives
    Linear Discriminant Analysis
        searches for the best parameterization
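The slow-variation-removal techniques above (Cepstrum Mean Normalization and time derivatives) can be sketched in a few lines; a minimal NumPy illustration, assuming the cepstral features arrive as a (frames × coefficients) array:

```python
import numpy as np

def cmn(cepstra):
    """Cepstrum Mean Normalization: subtract the per-utterance mean
    so that slowly varying channel effects are removed."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def deltas(cepstra):
    """First-order time derivatives, approximated by a symmetric
    difference between neighbouring frames (edge frames replicated)."""
    padded = np.pad(cepstra, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# toy example: 5 frames of 3-dimensional cepstra
c = np.arange(15, dtype=float).reshape(5, 3)
c_norm = cmn(c)
```

After CMN the per-coefficient mean over the utterance is exactly zero, which is what removes a constant (convolutive) channel offset in the cepstral domain.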
Speech enhancement
    Parameter mapping
        stereo data
        observation subspace
    Bayesian estimation
        stochastic modeling of speech and noise
    Template-based estimation
        restriction to a subspace
        output is noise-free
        various templates and combination methods
    Spectral subtraction
        noise and speech assumed uncorrelated
        slowly varying noise
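As an illustration of the spectral subtraction idea (noise and speech uncorrelated, slowly varying noise estimated on speech-free frames), a minimal sketch on power spectra; the spectral floor is an added assumption here, used to avoid negative power estimates:

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Subtract an estimate of the noise power spectrum from the
    noisy speech power spectrum.  Because noise and speech are
    assumed uncorrelated, their powers add, so subtraction gives
    an estimate of the clean power; negative results are clipped
    to a small fraction of the noisy power (the spectral floor)."""
    clean = noisy_power - noise_power
    return np.maximum(clean, floor * noisy_power)

# toy example: one frame of a 4-bin power spectrum
noisy = np.array([4.0, 3.0, 0.5, 2.0])
noise = np.array([1.0, 1.0, 1.0, 1.0])
enhanced = spectral_subtraction(noisy, noise)
```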
Model Adaptation for Noise
    HMM decomposition or PMC
        Viterbi algorithm searches an N x M-state HMM
        noise and speech are recognized simultaneously
        complex noises can be recognized
    State-dependent Wiener filtering
        Wiener filtering in the spectral domain fails on non-stationary signals
        HMMs divide speech into quasi-stationary segments
        Wiener filters are specific to each state
    Discriminative training
        classical techniques train models independently
        error-corrective training
        minimum classification error training
    Training data contamination
        training set corrupted with noisy speech
        depends on the test environment
        lower discriminative scores
Stochastic Matching : Introduction

              General framework
              in feature space
              in model space
Stochastic Matching : General
framework
                                              HMM Models X, X training space
                                               Y ={y1, …, yt} observation in testing space
                                                   X  F Y  and Y  G X 
                                         


                                         



                                                                ( ' ,W ' )  arg max pY ,W  , X 
                                                                                       
                                                                                       
                                                                                                    
                                                                                                    
                                                                                ( ,W )
Y                             W                                  '  arg max  pY , S , C , X 
                                                                                   
                                                                                   
                                                                                                    
                                                                                                    
    max pY ,W  , X        max pY ,W  , X                  
     W                                                                            S       C


                                                                      px x i    wi , j N x, µi , j , Ci , j 
                                                                                    M
                          
                                                                                    j 1
Stochastic Matching : In Feature
Space
    Estimation step : auxiliary function

        Q(ν' | ν) = E[ log p(Y, S, C | ν', Λ_X) | Y, ν, Λ_X ]

        Q(ν' | ν) = Σ_{S,C} p(Y, S, C | ν, Λ_X) log p(Y, S, C | ν', Λ_X)

    Maximization step

        ν' = argmax_{ν'} Q(ν' | ν),   then set ν ← ν'
Stochastic Matching : In Feature
Space (2)
    Simple distortion function: an additive cepstral bias b

        x_{t,i} = F_b(y_{t,i}) = y_{t,i} + b_i

    Computation of the simple bias, with γ_t(n,m) the posterior occupancy
    of state n, mixture component m at time t:

        b'_i = [ Σ_{t=1..T} Σ_{n,m} γ_t(n,m) (y_{t,i} − μ_{n,m,i}) / σ²_{n,m,i} ]
             / [ Σ_{t=1..T} Σ_{n,m} γ_t(n,m) / σ²_{n,m,i} ]
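The closed-form bias estimate can be implemented directly once the posterior occupancies γ_t(n,m) are available (here they are simply passed in, e.g. from a forward-backward pass); a sketch with the N × M state/mixture pairs flattened into one axis:

```python
import numpy as np

def estimate_bias(Y, gamma, means, variances):
    """ML estimate of an additive cepstral bias b under stochastic
    matching: a variance-weighted average of (y_t - mu_{n,m}),
    weighted by the posterior occupancy gamma_t(n, m).

    Y:         (T, D) observations
    gamma:     (T, K) posteriors over the K = N*M state/mixture pairs
    means:     (K, D) Gaussian means
    variances: (K, D) diagonal variances
    """
    # numerator: sum over t,k of gamma * (y_t - mu_k) / var_k  -> (D,)
    num = np.einsum("tk,tkd->d", gamma, (Y[:, None, :] - means) / variances)
    # denominator: sum over t,k of gamma / var_k               -> (D,)
    den = np.einsum("tk,kd->d", gamma, 1.0 / variances)
    return num / den

# sanity check: one Gaussian at 0, observations offset by a constant,
# so the estimated bias should equal that constant offset
mu = np.zeros((1, 2))
var = np.ones((1, 2))
Y = np.full((4, 2), 3.0)
gamma = np.ones((4, 1))
b = estimate_bias(Y, gamma, mu, var)
```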
Stochastic Matching : In Model
Space
    Random additive bias sequence B = {b_1, …, b_T}, independent of the
    speech stochastic process, with mean μ_b and diagonal covariance Σ_b

        μ_{Y,n,m}  = μ_{X,n,m} + μ_b
        σ²_{Y,n,m} = σ²_{X,n,m} + σ²_b
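A minimal sketch of the model-space variant: instead of mapping the observations back to the training space, every Gaussian of the trained models is shifted by the bias statistics:

```python
import numpy as np

def compensate_model(means_x, vars_x, mu_b, var_b):
    """Model-space stochastic matching with an additive bias that is
    independent of speech: means shift by the bias mean and diagonal
    variances add, i.e. mu_Y = mu_X + mu_b, var_Y = var_X + var_b."""
    return means_x + mu_b, vars_x + var_b

# two Gaussians in two dimensions, scalar bias statistics
means_x = np.array([[0.0, 1.0], [2.0, 3.0]])
vars_x = np.ones((2, 2))
means_y, vars_y = compensate_model(means_x, vars_x, mu_b=0.5, var_b=0.25)
```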
On-Line Frame-Synchronous Noise
Compensation
    Based on the stochastic matching method
    Transformation parameter estimated along the optimal path
    Uses forward probabilities

    [Diagram: biases b_1 … b_4 are computed frame by frame; each observation
     y_t is transformed into z_t with the current bias before recognition]
Theoretical framework and issue

    Classical stochastic matching: the bias is estimated over the whole
    utterance with posterior occupancies γ_t(n,m):

        b'_i = [ Σ_{t=1..T} Σ_{n,m} γ_t(n,m) (y_{t,i} − μ_{n,m,i}) / σ²_{n,m,i} ]
             / [ Σ_{t=1..T} Σ_{n,m} γ_t(n,m) / σ²_{n,m,i} ]

    On-line frame-synchronous: the bias at frame t uses only the frames
    seen so far, with occupancies computed from the forward probabilities
    α_τ(n,m):

        b_{t,i} = [ Σ_{τ=1..t} Σ_{n=1..N} Σ_{m=1..M} α_τ(n,m) (y_{τ,i} − μ_{n,m,i}) / σ²_{n,m,i} ]
                / [ Σ_{τ=1..t} Σ_{n=1..N} Σ_{m=1..M} α_τ(n,m) / σ²_{n,m,i} ]

        1. Initialize the bias of the first frame: b_0 = 0
        2. Compute α, then b
        3. Transform the next frame with b
        4. Go to the next frame

    Issue: cascade of errors
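The four steps above can be sketched as a loop; the occupancy weights are injected through a callback (`weights_fn`, a hypothetical stand-in for the forward probabilities produced by the recognizer):

```python
import numpy as np

def frame_synchronous_compensation(Y, means, variances, weights_fn):
    """On-line frame-synchronous bias compensation (sketch): the bias
    estimated from frames 1..t-1 is applied to frame t, then the
    running bias statistics are updated with frame t."""
    D = Y.shape[1]
    num = np.zeros(D)
    den = np.zeros(D)
    b = np.zeros(D)              # step 1: bias of the first frame is 0
    Z = []
    for y in Y:
        z = y - b                # step 3: transform frame with current bias
        Z.append(z)
        w = weights_fn(z)        # step 2: occupancy weights per Gaussian
        num += w @ ((y - means) / variances)
        den += w @ (1.0 / variances)
        b = num / den            # bias for the next frame (step 4: advance)
    return np.array(Z), b

# toy run: one unit-variance Gaussian at 0, observations offset by 2
means = np.zeros((1, 1))
variances = np.ones((1, 1))
Y = np.full((5, 1), 2.0)
Z, b = frame_synchronous_compensation(Y, means, variances,
                                      lambda z: np.ones(1))
```

Because the bias applied to frame t depends on all earlier estimates, an early estimation error propagates to every later frame: this is the cascade-of-errors issue.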
Viterbi Hypothesis vs Linear
Combination
    Viterbi hypothesis: takes into account only the "most probable"
    state and Gaussian component.
    Linear combination: weights the contributions of all states and
    components.

    [Diagram: trellis of states between frames t and t+1]
                               Experiments
    Phone numbers recorded in a running car
    Forced Align
        transcription + optimum path
    Free Align
        optimum path
    Wild Align
        no data

    Word accuracy (%):

                 baseline   MCR     PMC    Forced Align     Free Align      Wild Align
                                           Viterbi   LC    Viterbi   LC    Viterbi   LC
        WAcc       84.47   87.53   87.61    86.04  88.41    85.03  87.16    87.81  84.95
Perspectives

    Error recovery problem
        a forgetting process
        a model of the distortion function
        environmental cues

    More elaborate transforms
