An introduction to the Speaker Verification Task




                                 Benoit Fauve
                    Speech and Image Research Group
                            Electronics Research Centre
               School of Engineering, University of Wales Swansea, UK
                              http://eegalilee.swan.ac.uk


Prifysgol Cymru Abertawe / University of Wales Swansea

Outline




            1.        Introduction: a biometric problem
            2.        Measure/Features
            3.        Learning/Model
            4.        Result/Decision
            5.        Extra




Features in biometrics

A simple biometric problem:
how can male/female discrimination be performed automatically?




Features in biometrics

Acquisition -> Feature extraction -> T1, T2

Features: building a statistical model

[Figure: sets of T1 and T2 measurements collected from several samples]

[Figure: the T1 and T2 measurements shown next to a plot with axes T1 and T2]

Key steps:
- Measure
- Discriminative model
- Likelihood estimation, decision

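The three key steps can be sketched on synthetic data. This is only an illustration: the two classes "A" and "B", the numeric values chosen for T1 and T2, and the choice of one diagonal Gaussian per class are assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1, measure: synthetic (T1, T2) pairs for two classes (made-up values).
class_a = rng.normal(loc=[178.0, 120.0], scale=[7.0, 20.0], size=(200, 2))
class_b = rng.normal(loc=[165.0, 210.0], scale=[6.0, 25.0], size=(200, 2))

def fit_gaussian(samples):
    """Step 2, model: mean and per-dimension variance of one class."""
    return samples.mean(axis=0), samples.var(axis=0)

def log_likelihood(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

model_a = fit_gaussian(class_a)
model_b = fit_gaussian(class_b)

def decide(x):
    """Step 3, decision: pick the class with the higher likelihood."""
    score = log_likelihood(x, *model_a) - log_likelihood(x, *model_b)
    return "A" if score > 0.0 else "B"

print(decide(np.array([180.0, 115.0])))
print(decide(np.array([163.0, 220.0])))
```

The same pattern (features, one model per hypothesis, comparison of likelihoods) is what the speaker verification task reuses.
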
Speaker verification task


sound sample ???? -> Joe Bloggs / Someone else

      Is it Joe Bloggs talking in the sound sample?
      A problem similar to gender discrimination:
      “Male” or “Female”?
      “Joe Bloggs” or “Someone else”?


• Feature extraction




What is a good feature?


  We are looking for parameters with the following
    properties:
  • Low variability between sessions of the same
    speaker.
  • High variability between different speakers.
  • Limited perturbation from the recording channel
    (codec, channel and microphone bandwidth,
    noise).
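One way to put numbers on the first two properties is a per-dimension F-ratio: variance of the speaker means divided by the average variance within each speaker. This measure is not named in the slides; it is brought in here purely as an illustration, and the function name and synthetic data are assumptions.

```python
import numpy as np

def f_ratio(features_by_speaker):
    """Between-speaker variance over mean within-speaker variance, per dimension.

    features_by_speaker: list of arrays, each of shape (n_frames, dim).
    """
    means = np.array([f.mean(axis=0) for f in features_by_speaker])
    within = np.mean([f.var(axis=0) for f in features_by_speaker], axis=0)
    between = means.var(axis=0)
    return between / within

rng = np.random.default_rng(1)
# Synthetic data: dimension 0 separates the speakers, dimension 1 does not.
speakers = [
    np.column_stack([rng.normal(mu, 1.0, 500), rng.normal(0.0, 1.0, 500)])
    for mu in (-5.0, 0.0, 5.0)
]
ratios = f_ratio(speakers)
print(ratios)
```

A larger ratio suggests a more useful feature dimension; channel robustness is not captured by this measure.
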



Speech production




Air from the lungs -> Vocal folds -> Vocal tract -> Speech

Vocal tract measurement



Limitations:
- Most databases (e.g. NIST) only contain sound
recordings.
- Full access to the speaker's throat is required
(which the speaker might decline to offer).
- Not reproducible (a limitation for experiments).




Friendly way to get vocal tract
characteristics




Physical system: Air from the lungs -> Vocal folds -> Vocal tract -> Speech
Model:           noise -> H1(z) -> H2(z) -> synthesised speech

Ways to get to the spectral envelope:
Prediction family
• LPC Linear Prediction
• PLP Perceptual Linear Prediction
Filter bank family
• MF Mel-frequency-spaced Filterbank
• LF Linear-frequency-spaced Filterbank

The spectral envelope reflects morphological
characteristics of the vocal tract.

Example: Mel-Frequency
Cepstral Coefficients (MFCC)




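A minimal sketch of a common MFCC recipe for a single frame: window, power spectrum, mel-spaced triangular filterbank, log, then a DCT. The 8 kHz sample rate, 24 filters, 13 coefficients and all function names are assumptions made for illustration; the slides do not give these settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge
            bank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge
            bank[i - 1, k] = (right - k) / (right - centre)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=24, n_ceps=13):
    """MFCC vector for one frame of samples."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)    # small offset avoids log(0)
    # DCT-II of the log filterbank energies gives the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return basis @ log_energies

sample_rate = 8000                              # assumed telephone-band rate
t = np.arange(160) / sample_rate                # one 20 ms frame
frame = np.sin(2.0 * np.pi * 440.0 * t)
coeffs = mfcc_frame(frame, sample_rate)
print(coeffs.shape)
```
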
Features in speech




Acquisition -> Feature extraction -> feature vectors X1, ..., Xi, ...

Frame length: 20 ms
Frame shift: 10 ms
Feature vector size: ~30 to 60

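The framing step can be sketched as follows, using the 20 ms frame length and 10 ms shift given above. The 8 kHz sample rate and the function name are assumptions; each resulting frame would then go through feature extraction to give one vector Xi.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=20.0, shift_ms=10.0):
    """Cut a 1-D signal into overlapping frames, one frame per row."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // shift
    # Row r holds samples r*shift ... r*shift + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return signal[idx]

signal = np.arange(8000, dtype=float)   # 1 s of dummy samples at 8 kHz
frames = frame_signal(signal, 8000)
print(frames.shape)                     # (99, 160)
```
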
• Probabilistic approach
• Speaker modelling




Introduction to the probabilistic approach



Client: speaker S
World: ‘other’ speakers
Test: Y

- H1: Y has been pronounced by the speaker S.
- H2: Y has been pronounced by someone other than the speaker S.

Probabilistic approach: training

Speaker S -> Features (X1, ..., Xi, ...) -> Client model
‘other’ speakers -> Features -> World model, or UBM (Universal Background Model)

Each model is a mixture of Gaussians representing a probability density.
The client model describes the statistical distribution of the acoustic
observations from the class S.

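A minimal sketch of how a mixture of Gaussians assigns a log-likelihood to each feature vector, assuming diagonal covariances (a common choice, but not stated in the slides). The function and parameter names are illustrative.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Per-frame log p(x) under a diagonal-covariance Gaussian mixture.

    x: (n_frames, dim); weights: (n_comp,); means, variances: (n_comp, dim).
    """
    diff = x[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :],
        axis=2,
    )
    log_weighted = np.log(weights)[None, :] + log_gauss
    # Log-sum-exp over components, shifted by the maximum for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.sum(np.exp(log_weighted - peak), axis=1))

# Toy model: two identical unit-variance components centred at 0.
weights = np.array([0.5, 0.5])
means = np.zeros((2, 1))
variances = np.ones((2, 1))
ll = gmm_log_likelihood(np.array([[0.0], [1.0]]), weights, means, variances)
print(ll)
```

The same function can score frames against either the client model or the world model; only the parameters differ.
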
In practice: Multi Gaussian and MAP adaptation


                         - Data do not follow a Gaussian distribution
                         - There is a limited amount of data for the targeted speaker

                         - Mixture of Gaussians: 512 to 2048 components
                         - MAP adaptation

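A sketch of mean-only MAP adaptation, the form most often used to derive a client model from the world model when speaker data is scarce. The slides do not say which parameters are adapted; the mean-only variant, the relevance factor of 16 and all names are assumptions.

```python
import numpy as np

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Shift world-model means towards the speaker data x (n_frames, dim)."""
    diff = x[:, None, :] - means[None, :, :]
    log_post = np.log(weights)[None, :] - 0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :],
        axis=2,
    )
    # Posterior probability of each component for each frame.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    counts = post.sum(axis=0)                               # soft frame counts
    data_means = (post.T @ x) / np.maximum(counts, 1e-10)[:, None]
    # Components that saw more data move further from the world-model mean.
    alpha = (counts / (counts + relevance))[:, None]
    return alpha * data_means + (1.0 - alpha) * means

# Toy world model with two components, and 16 speaker frames all equal to 3.0.
weights = np.array([0.5, 0.5])
means = np.array([[-2.0], [2.0]])
variances = np.ones((2, 1))
adapted = map_adapt_means(np.full((16, 1), 3.0), weights, means, variances)
print(adapted)
```

In this toy case the second mean moves about halfway from 2.0 towards 3.0, while the first, which explains almost none of the data, stays near -2.0.
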
• Result/Decision




Probabilistic approach: test


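Test Y = (Y1, ..., Yi, ..., YN): each frame Yi is scored against the
client model, giving P(Yi|H1), and against the UBM, giving P(Yi|H2).

In theory we look for the value
\[ S(Y) = \log P(Y \mid H_1) - \log P(Y \mid H_2) \]

In practice the output is
\[ S(Y) = \frac{1}{N} \sum_{i=1}^{N} \left[ \log P(Y_i \mid H_1) - \log P(Y_i \mid H_2) \right] \]
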
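The practical score is the per-frame log-likelihood ratio averaged over the N frames. A minimal sketch, assuming the per-frame log-likelihoods under the client model and the UBM have already been computed; the function name and the numbers are illustrative.

```python
import numpy as np

def verification_score(loglik_client, loglik_ubm):
    """Average over frames of log P(Yi|H1) - log P(Yi|H2)."""
    loglik_client = np.asarray(loglik_client, dtype=float)
    loglik_ubm = np.asarray(loglik_ubm, dtype=float)
    return float(np.mean(loglik_client - loglik_ubm))

# Made-up per-frame log-likelihoods for a three-frame test segment.
score = verification_score([-1.0, -2.0, -3.0], [-2.0, -2.0, -5.0])
print(score)   # 1.0
```

Dividing by N keeps scores comparable between test segments of different lengths.
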
ASR: decision soft/hard


JB, Test ?? -> ASR System -> Score (soft decision) -> Threshold -> Rejected / Accepted (hard decision)

                               Rejected                  Accepted
Same (client access)           Miss (False rejection)    Correct verification
Different (impostor access)    Correct verification      False Alarm (False acceptance)

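A small sketch of the hard decision and of the two error types in the table. The scores and threshold are made up, and accepting when the score is greater than or equal to the threshold is an assumed convention.

```python
def accept(score, threshold):
    """Hard decision: accept the claimed identity if the score reaches the threshold."""
    return score >= threshold

def count_errors(target_scores, impostor_scores, threshold):
    """Return (misses, false_alarms) for a given threshold."""
    misses = sum(1 for s in target_scores if not accept(s, threshold))
    false_alarms = sum(1 for s in impostor_scores if accept(s, threshold))
    return misses, false_alarms

target_scores = [2.1, 0.4, 1.5, -0.2]       # client accesses (made-up values)
impostor_scores = [-1.3, 0.6, -0.8, -2.0]   # impostor accesses (made-up values)
errors = count_errors(target_scores, impostor_scores, threshold=0.5)
print(errors)   # (2, 1)
```

Raising the threshold trades false alarms for misses, and lowering it does the opposite.
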
Error types




System evaluation - DET curve




S1 … Sn: target scores (outputs obtained when the two sound samples come from the same person)
Sk … Sl: non-target scores (outputs obtained when the two sound samples come from different persons)




                                              DET curve

    Martin, A. and Przybocki, M. A. The DET curve in assessment of detection task performance. Eurospeech 1997, pages 1895–1898
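
A DET curve plots miss probability against false alarm probability, both on a normal deviate (probit) scale, as the threshold sweeps over the scores. The sketch below computes those points and an approximate equal error rate from target and non-target scores; the function names and the toy scores are assumptions.

```python
import numpy as np
from statistics import NormalDist

def det_points(target_scores, nontarget_scores):
    """Miss and false alarm rates at every observed score used as threshold."""
    targets = np.sort(np.asarray(target_scores, dtype=float))
    nontargets = np.sort(np.asarray(nontarget_scores, dtype=float))
    thresholds = np.unique(np.concatenate([targets, nontargets]))
    # A trial is accepted when its score is >= the threshold.
    p_miss = np.searchsorted(targets, thresholds, side="left") / len(targets)
    p_fa = 1.0 - np.searchsorted(nontargets, thresholds, side="left") / len(nontargets)
    return thresholds, p_miss, p_fa

def probit(p):
    """Normal deviate scale for the DET axes; p is clipped away from 0 and 1."""
    nd = NormalDist()
    return np.array([nd.inv_cdf(float(v)) for v in np.clip(p, 1e-6, 1.0 - 1e-6)])

def equal_error_rate(p_miss, p_fa):
    """Approximate EER: the point where the two error rates are closest."""
    i = int(np.argmin(np.abs(p_miss - p_fa)))
    return float((p_miss[i] + p_fa[i]) / 2.0)

thresholds, p_miss, p_fa = det_points([2.0, 1.5, 1.0, 0.2], [0.5, -0.5, -1.0, -2.0])
eer = equal_error_rate(p_miss, p_fa)
print(eer)   # 0.25
```

Plotting `probit(p_fa)` against `probit(p_miss)` gives the DET curve itself.
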


• Extra: score normalisation T-Norm




T-Norm: Principle

[Figure: scores for a model and a test file, annotated "Threshold??"]

T-Norm: Principle

[Figure: scores for a model and a test file, annotated "Threshold!!"]

In practice, every test file is also scored against a series of impostor models (~70 to 100).
The final score is normalised using the mean and variance of these impostor scores.

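T-Norm shifts and scales the raw score by the mean and standard deviation of the scores the same test file obtains against the impostor models. A minimal sketch; the function name and the numbers are illustrative.

```python
import numpy as np

def t_norm(raw_score, impostor_scores):
    """Normalise a raw score with the impostor-model scores of the same test file."""
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    mu = impostor_scores.mean()
    sigma = impostor_scores.std()
    if sigma == 0.0:
        raise ValueError("impostor scores have zero variance")
    return float((raw_score - mu) / sigma)

# Made-up scores of one test file against four impostor models.
normalised = t_norm(2.5, [-1.0, 1.0, -1.0, 1.0])
print(normalised)   # 2.5
```

After normalisation the scores of different test files share a common scale, which is what allows a single threshold.
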
Summary



[Diagram: Target -> Features (X1, ..., Xi, ...) -> Adaptation of the world model -> Speaker model.
Testing with the speaker model and the world model -> S.
Testing with the impostor models -> S1, S2, … Sn -> T-Norm -> Score]

				