					      Perception of major acoustic cues
              Astrid van Wieringen
5th European Master school on Language and Speech
              Bonn, 12-16 July 2004



             Laboratory for
             Experimental ORL
             KULeuven
                                        Content of Tutorial

              • To understand why certain speech sounds are not perceived or
                recognized by hearing-impaired listeners or by an automatic speech
                recogniser, one should understand:
                 – major categories + acoustic properties of speech sounds
                 – different types of tests & speech materials
                 – how to assess transmission of robust spectral and temporal cues
                    by means of analytical (phoneme) tests
                 – data collection and analyses
              • hearing loss (with focus on cochlear implants)

              • Practical part:
                 – Test perception of filtered speech sounds


                              Speech sounds - major categories

              • Vowel and consonant phonemes are classified in terms of
              • Manner of articulation
                 – concerns how the vocal tract restricts airflow
                    • completely stopping the airflow with an occlusion creates a
                      plosive (stop consonant)
                    • vocal tract constrictions of varying degree occur in liquids,
                      fricatives, glides and vowels
                    • lowering the velum causes nasal sounds

              • Place of articulation
                     • refers to the location in the vocal tract
              • Voicing
                     • presence/absence of vocal fold vibration
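
The three-way classification above can be written down as a small lookup table. The sketch below is only an illustration: the tuple layout and the helper names `group_by` and `shared_features` are assumptions made here, and the table covers just a few of the consonants named elsewhere in this tutorial.

```python
from collections import defaultdict

# (manner, place, voiced) for a few consonants used in this tutorial.
# The layout of this table is an illustrative choice, not a standard.
FEATURES = {
    "p": ("stop", "bilabial", False),
    "b": ("stop", "bilabial", True),
    "t": ("stop", "alveolar", False),
    "d": ("stop", "alveolar", True),
    "k": ("stop", "velar", False),
    "m": ("nasal", "bilabial", True),
    "n": ("nasal", "alveolar", True),
    "f": ("fricative", "labiodental", False),
    "v": ("fricative", "labiodental", True),
    "s": ("fricative", "alveolar", False),
    "z": ("fricative", "alveolar", True),
}

FEATURE_INDEX = {"manner": 0, "place": 1, "voiced": 2}


def group_by(feature):
    """Group phonemes by the value they take for one feature."""
    groups = defaultdict(list)
    for phoneme, values in FEATURES.items():
        groups[values[FEATURE_INDEX[feature]]].append(phoneme)
    return dict(groups)


def shared_features(a, b):
    """Names of the features on which two phonemes agree."""
    return [name for name, i in FEATURE_INDEX.items()
            if FEATURES[a][i] == FEATURES[b][i]]


print(group_by("manner"))
print(shared_features("p", "b"))  # /p/ and /b/ differ in voicing only
```

Grouping phonemes this way is what later makes it possible to ask which feature (manner, place or voicing) a listener failed to receive.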

                           Manner of articulation of most consonants

              • Stop consonants (plosives): complete closure and subsequent release of a vocal
                  tract obstruction. Pressure build-up followed by burst.
              • Liquids: like vowels, but tongue is used for some degree of obstruction. For /l/ air
                  escapes around the tip of tongue or dorsum. The /r/ has more variable articulation
              • Nasals: a lowering of the velum. Airflow out of the nostrils. In English only
                  nasalized consonants (oral tract completely closed), in French also nasalized vowels
                  (air escapes through oral tract and nasal cavities). Vowels may be nasalized in English,
                  but the distinction is not phonemic (= vowel identity does not change). In French there
                  are pairs of vowels that differ only in the presence or absence of vowel nasalization.
              • Fricatives: narrow constriction in the oral tract (in some languages also in the
                  pharynx or at the glottis). If the pressure behind the constriction is high enough and
                  the passage sufficiently narrow, airflow becomes fast enough to generate turbulence at
                  the end of the constriction.
              • Strident fricatives: noise amplitude is enhanced by airflow striking a surface: /ʃ/
                  (shy)
              • Affricate = stop + fricative: /dʒ/ (gin)

              Place of articulation (varies per language)

                                        Place of articulation

              • Labials
                 – bilabial: if both lips constrict
                 – labiodental: if the lower lip contacts the upper teeth
              • Dental: the tongue tip or blade touches the edge or back of upper teeth
                 – interdental: if the tip protrudes between the upper and lower teeth
                    (as in 'the')
              • Alveolar: the tongue tip or blade touches the alveolar ridge
              • Palatals: the tongue blade or dorsum constricts with the hard palate
                 – retroflex: if the tongue tip curls up
              • Velar: the dorsum approaches the soft palate
              • Uvular: the dorsum approaches the uvula
              • Pharyngeal: constriction in the pharynx
              • Glottal: vocal folds close or constrict

                                        Dutch vowel triangle

              [Figure: Dutch vowel triangle. Second formant frequency (Hz, axis 0-3000) plotted
              against first formant frequency (Hz, axis 200-1200) for vowels including i, I, e, y,
              a, o and u, spoken by four speakers: WD (female), MD (male), JW (male) and
              AG (female).]

                            Major acoustic cues of stop consonants
              • /p, t, k, b, d, g/
              • Phonetic features
                   – Manner: stop (plosive)
                   – Place (bilabial, alveolar, velar)
                   – Acoustic cues
                        • Silence (corresponds to the period of oral constriction = stop gap)
                                   » Voiced stops: low energy, also called voice bar
                        • Burst: corresponds to the articulatory release of the oral constriction and to
                           aerodynamic release (due to build-up of pressure). Bursts occur in initial and
                           medial position and are rarely found in final position. Place of articulation may
                           be signaled by the spectrum of the burst, but the transition is also very important.
                        • Transition: corresponds to the articulatory movement from the oral constriction
                           for the stop to the more open tract for a following sound (usually a vowel).
                           Easier to identify for voiced than for voiceless sounds.
              •   Most important features:
                        • stop gap
                        • release burst
                        • presence/absence of voice onset time
                        • transition
                        • voicing features
                                  Duration of stop consonants

              • Stop gap: 50-100 ms
              • Burst: 5-40 ms (a 'transient' = disappears immediately, shortest event
                in speech!)

              • CV (consonant - vowel) and VC (vowel consonant) transitions: 10 -
                40 ms. Reflects changes in the vocal tract. Very difficult to
                measure/analyze such a short event. However, perceptually very
                important!
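
To make the stop gap concrete, the sketch below measures the longest low-energy stretch in a synthetic token. It is a minimal illustration under stated assumptions: the 5 ms frame length, the relative threshold of 0.1 and the function names are choices made here, and the "token" is two pure tones around a silence, not real speech.

```python
import math


def frame_rms(signal, frame_len):
    """RMS of consecutive non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in signal[i:i + frame_len]) / frame_len)
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def longest_gap_ms(signal, fs, frame_ms=5.0, rel_threshold=0.1):
    """Duration (ms) of the longest run of low-energy frames.

    A frame counts as 'low' when its RMS is below rel_threshold times
    the RMS of the loudest frame (both parameters are assumptions).
    """
    frame_len = int(fs * frame_ms / 1000)
    rms = frame_rms(signal, frame_len)
    limit = rel_threshold * max(rms)
    longest = current = 0
    for value in rms:
        current = current + 1 if value < limit else 0
        longest = max(longest, current)
    return longest * frame_ms


# Synthetic /aCa/-like token: 200 ms tone, 80 ms silence, 200 ms tone.
fs = 16000
tone = [0.3 * math.sin(2 * math.pi * 150 * n / fs) for n in range(int(0.2 * fs))]
silence = [0.0] * int(0.08 * fs)
token = tone + silence + tone

print(longest_gap_ms(token, fs), "ms")
```

On real recordings the gap of a voiced stop is not silent (voice bar), so a fixed energy threshold like this one would need tuning.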




              [Figure: Time signals of Dutch plosives. Waveforms of /aba/, /apa/, /ada/, /ata/ and
              /aka/; amplitude against time (s), each token lasting about 0.68-0.70 s.]

              [Figure: Initial part of Dutch plosives. Waveforms of the onset of /b/ in /aba/, /d/ in
              /ada/, /p/ in /apa/ and /t/ in /ata/; amplitude against time (s), windows of about
              0.07-0.11 s.]

              [Figure: Spectrograms of Dutch plosives. /aba/, /apa/, /aka/, /ada/ and /ata/; frequency
              axis from 0 to 10^4 against time (s).]

                                                     Fricatives

              •   Phonemes:
                   – voiced /, , , /
                   – voiceless: /,  , , , , /
              •   Phonetic features:
                   – manner: frication
                   – place: labiodental, linguadental, alveolar, palatal, glottal

              •   Acoustic cues:
                   – voicing
                   – frication noise: noise generated as air is forced through a narrow constriction.
                      Then filtered by the vocal tract.
                   – transitions to and from the vowels due to changes in the vocal tract

                   – sibilants/ stridents have intense noise energy
                   – non sibilants: weak noise energy



              [Figure: Spectrograms of a few Dutch fricatives. /afa/, /ava/, /asa/ and /aza/; frequency
              axis from 0 to 10^4 against time (s).]

                                                   Nasals

              • Phonemes: /m, n, ŋ/
              • Phonetic features:
                 – manner: nasal
                 – place: bilabial, alveolar, velar
              • Acoustic features:
                 – murmur: a result of nasal radiation of acoustical energy. The
                    spectrum is dominated by low-frequency energy (< 500 Hz). The murmurs
                    of the three nasals are not exactly alike, but the differences are
                    too small to serve as a reliable distinctive cue
                 – transitions: preceding and following vowels will be nasalized.
                    Cues to place of articulation
                 – voicing is always present (except during whispering)
              • Spectrum of nasals reflects a combination of formants and
                antiformants
              [Figure: Spectrograms of a few Dutch nasals. /ama/, /ana/ and /anga/; frequency axis
              from 0 to 6000 against time (s).]

                                                       Glides

              •   also called 'approximants' and semivowels:
                    – gradual articulatory movement
                    – vocal tract narrowed, not closed
              •   Phonemes: /j/ & /w/
              •   Phonetic features
                    – Manner: glide or semivowel
                    – Place: palatal or labiovelar
              •   Acoustic cues
                    – A relatively slow transition (75-150 ms)
                    – F1 of both sounds starts at very low value (a little higher than for stops)
                    – F2 of /w/: 800 Hz (compare with /b/!!), F3 of /w/: 2200 Hz
                    – F2 of /j/: 2200 Hz (compare with /d/!!), F3 of /j/: 3000 Hz
              •   longer glides: vowel-vowel sequences!:
                    – [bi] - [wi]- [ui] and
                    – [du] - [ju] - [iu]
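
The [bi] - [wi] - [ui] continuum can be read as one formant movement produced at different speeds. The sketch below labels a transition by duration only, using the ranges quoted in this tutorial (10-40 ms for stop transitions, 75-150 ms for glides). The category names, the `f2_track` helper, the linear trajectory and the example durations are illustrative assumptions, not measured values.

```python
# Duration ranges quoted in this tutorial (ms). Durations that fall
# between the ranges are left unlabeled rather than guessed.
STOP_RANGE_MS = (10, 40)     # CV/VC transitions of stops
GLIDE_RANGE_MS = (75, 150)   # glide transitions


def transition_category(duration_ms):
    """Label a formant transition by its duration alone."""
    if STOP_RANGE_MS[0] <= duration_ms <= STOP_RANGE_MS[1]:
        return "stop-like"
    if GLIDE_RANGE_MS[0] <= duration_ms <= GLIDE_RANGE_MS[1]:
        return "glide-like"
    if duration_ms > GLIDE_RANGE_MS[1]:
        return "vowel-vowel-like"
    return "unlabeled"


def f2_track(start_hz, end_hz, duration_ms, step_ms=5):
    """Linear F2 trajectory sampled every step_ms (a simplification)."""
    steps = max(1, round(duration_ms / step_ms))
    return [start_hz + (end_hz - start_hz) * i / steps for i in range(steps + 1)]


# Hypothetical durations for the three members of the continuum.
for label, duration in (("[bi]", 30), ("[wi]", 100), ("[ui]", 250)):
    print(label, transition_category(duration))

# F2 rising from the 800 Hz onset of /w/ to a hypothetical vowel target.
print(f2_track(800, 2200, 100)[:3])
```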



              [Figure: Spectrograms of 2 Dutch glides. /w/ from /awa/ and /j/ from /aja/; frequency
              axis from 0 to 5000 against time (s).]

                                                        Liquids

              •   Phonemes: /l/ & /r/
              •   Phonetic features:
                   – Manner: lateral or rhotic
                   – Place: alveolar for /l/, palatal for /r/
              •   Acoustic cues: rather complex:
                   – both relatively fast formant transitions
                   – similarity with glides: well-defined formant structure (less constriction than stops,
                     fricatives, and affricates)
                   – /l/: energy mainly in the low frequencies. Resonances and antiresonances due to
                     divided vocal tract. Resembles /n/. F1: 360 Hz, F2: 1300 Hz, F3: 2700 Hz
                   – /r/: similar for F1
                        • F2 somewhat lower than for /l/
                        • F3 especially lower (1650 Hz). Durations of formant transitions somewhat
                            longer for /r/ than for /l/
                   – temporal cues:
                        • /r/: F1 has a short steady-state + relatively long transition
                        • /l/: F1 has a long steady-state + relatively short transition
              [Figure: Spectrograms of 2 Dutch liquids. /l/ from /ala/ and /r/ from /ara/; frequency
              axis from 0 to 5000 against time (s).]

              • no clear distinction between vowel and consonant
              • F3 of /r/ is lower


              Speech perception assessment for the hearing-impaired

                                 Speech perception assessment

              • required for diagnostic purposes
              • monitoring progress in a rehabilitation programme
              • comparison of different speech processing strategies (hearing aids and
                cochlear implants)
              • understanding the “limited” technology/number of channels available to
                hearing-impaired listeners or implantees

                  – hearing aid: speech divided into frequency bands. Acoustically
                    enhanced
                  – cochlear implant: acoustical sound is picked up by microphone,
                    analyzed into frequency bands, coded and sent to limited number
                    of electrode pairs in the inner ear (electrical stimulation)
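
The idea of analysing sound into frequency bands and passing on one slowly varying level per band can be sketched as follows. This is a heavily simplified illustration and not the processing of any actual device: the first-order filters, the two band edges, the 200 Hz smoothing cutoff and all function names are assumptions made for this example.

```python
import math


def lowpass(signal, fs, cutoff_hz):
    """First-order low-pass filter."""
    dt = 1.0 / fs
    alpha = dt / (1.0 / (2 * math.pi * cutoff_hz) + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def highpass(signal, fs, cutoff_hz):
    """First-order high-pass filter."""
    dt = 1.0 / fs
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    a = rc / (rc + dt)
    out, y, prev_x = [], 0.0, 0.0
    for x in signal:
        y = a * (y + x - prev_x)
        prev_x = x
        out.append(y)
    return out


def channel_envelopes(signal, fs, band_edges_hz, smooth_hz=200.0):
    """One envelope per band: band-pass, rectify, then smooth."""
    envelopes = []
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        band = lowpass(highpass(signal, fs, low), fs, high)
        rectified = [abs(x) for x in band]
        envelopes.append(lowpass(rectified, fs, smooth_hz))
    return envelopes


def mean_level(envelope):
    """Average of the second half, after the filters have settled."""
    half = envelope[len(envelope) // 2:]
    return sum(half) / len(half)


fs = 16000
band_edges = [200, 1000, 4000]  # two hypothetical analysis bands
for freq in (400, 3000):
    sound = [math.sin(2 * math.pi * freq * n / fs) for n in range(3200)]
    levels = [mean_level(env) for env in channel_envelopes(sound, fs, band_edges)]
    print(freq, "Hz ->", ["%.2f" % level for level in levels])
```

A low tone mainly drives the first channel and a high tone the second, which is the sense in which a small number of channels carries a coarse picture of the spectrum.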



              How a cochlear implant works... (MedEL)
              (1) Sounds are picked up by a microphone and turned into an electrical signal.
              (2) This signal goes to the speech processor, where it is "coded" (turned into a
                  special pattern of electrical pulses).
              (3) These pulses are sent to the coil and are then transmitted across the intact skin
                  (by radio waves) to the implant.
              (4) The implant sends a pattern of electrical pulses to the electrodes in the cochlea.
              (5) The auditory nerve picks up these tiny electrical pulses and sends them to the brain.
              (6) The brain recognizes these signals as sound.

              Further reading: Philipos C. Loizou, "Introduction to cochlear implants", tutorial
              article in the IEEE Signal Processing Magazine, pages 101-130, September 1998.

              [Figure: electrode array in the cochlea]

              •   It is necessary to 'map' (fit) acoustical information to electrical
                  information.

              [Figure, top panel: output of the CIS algorithm for the word 'som'. Amplitude per
              channel (channels 1-8) against time (100-900 ms). The pulse channels reflect the
              envelopes of the bandpass filter outputs.]

              [Figure: transmission of /ama/ and /asa/ by a CI device]

                            Analytical tests: purpose and performance

              • Many types of speech tests to evaluate CI performance
                 – detection of environmental sounds
                 – identification of male/female voice
                 – identification of vowels and consonants (V & C) in nonsense contexts
                 – words
                 – sentences
              • Each type of test triggers a different level of performance.

              • Why is a carefully balanced V & C test important?
                – /paat/, /pit/, /poot/, etc., or /apa/, /ara/, /ana/,

                  – it gives important information on the transmission of speech
                    features via the implant or hearing aid (e.g. the voiced/voiceless
                    distinction, nasality of /m/ or high-frequency frication/turbulence
                    of /s/)
                      • analytical: no contextual information
                      • therefore, the information can guide the fitting of an implant
                         Choice of test depends on objectives, BUT
              •   speech stimuli should be
                   – carefully pronounced and, if possible, adjusted to the same RMS level (so
                      that other cues are kept under control)
                   – presented via hard disc of PC, CD or tape (recorded at highest level of
                      quality)
                   – administered to the subject in a quiet room (if presented acoustically)
                   – presented a sufficient number of times to obtain a reliable score
                   – Note: an analytical test does not replace other tests, but it measures
                      speech perception based on auditory information alone. Can be used
                      for several languages.


              •   At the Lab. Exp. ORL, recordings were made of Dutch vowels and consonants
                  in different contexts. The stimuli were carefully selected from different tokens,
                  segmented (with an additional Hamming window to avoid onset and offset
                  clicks), equalized in RMS (root mean square) and partly analyzed (with
                  regard to their main spectral and temporal properties).
                   – /aCa/: /p, t, k, b, d, r, l, m, n, s, f, z, v, w, j/.
                   – /pVt/:/oe, ie, i, oo, o, ee, e, u, aa, a/
              •   All speech sounds were analyzed (frequency, duration, energy, …)
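
The two preparation steps mentioned above, fading the edges of a segment and equalizing RMS, can be sketched as follows. This is a minimal illustration, not the laboratory's actual procedure: the ramp shape (half of a Hamming window at each edge), the ramp length, the target level and the function names are assumptions.

```python
import math


def rms(signal):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def apply_hamming_ramps(signal, ramp_len):
    """Fade the first and last ramp_len samples with half a Hamming window."""
    if 2 * ramp_len > len(signal):
        raise ValueError("ramps are longer than the signal")
    out = list(signal)
    for n in range(ramp_len):
        w = 0.54 - 0.46 * math.cos(math.pi * n / ramp_len)  # rises from 0.08
        out[n] *= w
        out[-1 - n] *= w
    return out


def equalize_rms(signal, target_rms):
    """Scale the signal so that its RMS equals target_rms."""
    current = rms(signal)
    if current == 0:
        raise ValueError("cannot scale a silent signal")
    gain = target_rms / current
    return [x * gain for x in signal]


# Two synthetic 'tokens' that differ only in level.
fs = 16000
loud = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(3200)]
soft = [0.1 * math.sin(2 * math.pi * 200 * n / fs) for n in range(3200)]

ramp = int(0.005 * fs)  # 5 ms ramps (hypothetical value)
prepared = [equalize_rms(apply_hamming_ramps(tok, ramp), 0.05) for tok in (loud, soft)]
print([round(rms(tok), 4) for tok in prepared])
```

Ramping before equalizing means the stated RMS refers to the stimulus as it is actually presented.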
                                              Confusion matrix

                   Rows: stimulus presented; columns: response given

                            aPa     aBa     aMa     aKa     aZa     aSa     total
                   apa       4       4       4       0       0       0       12
                   aba       1       5       5       0       0       1       12
                   ama       0       0      12       0       0       0       12
                   aka       0       0       0      12       0       0       12
                   aza       0       0       0       0       9       3       12
                   asa       0       0       0       0       3       9       12



                  In this example consonant identification is 71% (51/72). Note that this score should
                  always be considered together with the chance performance of the closed-set test (here 17
                  %). In this example it is clear that a score of 71% is considered significantly above
                  chance (p< 0.05).
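
The arithmetic behind these figures can be checked with a few lines of code. The sketch below recomputes the score and the chance level from the matrix. The slide does not say which statistical test was used, so the one-sided binomial tail computed here is an assumption, as are the function names.

```python
from math import comb

# Rows: stimulus presented; columns: response given (same order).
LABELS = ["apa", "aba", "ama", "aka", "aza", "asa"]
MATRIX = [
    [4, 4, 4, 0, 0, 0],
    [1, 5, 5, 0, 0, 1],
    [0, 0, 12, 0, 0, 0],
    [0, 0, 0, 12, 0, 0],
    [0, 0, 0, 0, 9, 3],
    [0, 0, 0, 0, 3, 9],
]


def percent_correct(matrix):
    """Return (number correct, number of trials) from a confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


correct, total = percent_correct(MATRIX)
chance = 1 / len(LABELS)  # closed set with six alternatives
print(f"{correct}/{total} = {100 * correct / total:.0f}% correct")
print(f"chance level = {100 * chance:.0f}%")
print(f"P(at least {correct} correct by guessing) = "
      f"{binomial_tail(correct, total, chance):.3g}")
```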


              •   Distribution of errors even more interesting
                   – not random
                   – can be quantified by means of an information transmission algorithm
                      (Miller and Nicely, 1955)
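
An information transmission analysis can be sketched as the mutual information between stimulus and response, divided by the stimulus entropy. The code below is a minimal sketch of that idea, not a reproduction of any particular software: the function names are chosen here, and the manner and voicing labels assigned to the six stimuli are this example's own assumption.

```python
from math import log2

MATRIX = [  # rows: stimulus, columns: response; order apa aba ama aka aza asa
    [4, 4, 4, 0, 0, 0],
    [1, 5, 5, 0, 0, 1],
    [0, 0, 12, 0, 0, 0],
    [0, 0, 0, 12, 0, 0],
    [0, 0, 0, 0, 9, 3],
    [0, 0, 0, 0, 3, 9],
]
MANNER = ["stop", "stop", "nasal", "stop", "fricative", "fricative"]
VOICING = ["voiceless", "voiced", "voiced", "voiceless", "voiced", "voiceless"]


def transmitted_bits(matrix):
    """Mutual information (bits) between stimulus and response."""
    total = sum(sum(row) for row in matrix)
    rows = [sum(row) for row in matrix]
    cols = [sum(col) for col in zip(*matrix)]
    bits = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count:
                bits += (count / total) * log2(count * total / (rows[i] * cols[j]))
    return bits


def stimulus_entropy(matrix):
    """Entropy (bits) of the stimulus distribution."""
    total = sum(sum(row) for row in matrix)
    return -sum((sum(r) / total) * log2(sum(r) / total) for r in matrix if sum(r))


def relative_transmission(matrix):
    """Transmitted information as a fraction of the stimulus entropy."""
    return transmitted_bits(matrix) / stimulus_entropy(matrix)


def collapse(matrix, groups):
    """Pool rows and columns that share a feature value."""
    values = sorted(set(groups))
    index = {value: k for k, value in enumerate(values)}
    pooled = [[0] * len(values) for _ in values]
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            pooled[index[groups[i]]][index[groups[j]]] += count
    return pooled


print("overall: %.2f" % relative_transmission(MATRIX))
print("manner:  %.2f" % relative_transmission(collapse(MATRIX, MANNER)))
print("voicing: %.2f" % relative_transmission(collapse(MATRIX, VOICING)))
```

Collapsing the matrix per feature shows which cue is lost: confusions within a feature class do not count against that feature.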
                                           Effect of filtering

              • Loss of auditory information can be examined in normal-hearing
                persons by filtering away acoustical information: to allow certain
                frequencies to be transmitted while attenuating others.
                  – a high-pass filter allows all components above a cutoff frequency to be
                    transmitted
                  – a band-pass filter allows frequencies within a certain band to pass
                  – a low-pass filter allows all components below a cutoff frequency to be
                    transmitted
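
The three filter types can be tried out on pure tones with a few lines of code. The sketch below uses first-order filters, which attenuate far more gently than the steep filters normally used in perception experiments. The cutoff frequencies, test tones and function names are assumptions chosen for illustration.

```python
import math


def lowpass(signal, fs, cutoff_hz):
    """First-order low-pass: passes components below cutoff_hz."""
    dt = 1.0 / fs
    alpha = dt / (1.0 / (2 * math.pi * cutoff_hz) + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def highpass(signal, fs, cutoff_hz):
    """First-order high-pass: passes components above cutoff_hz."""
    dt = 1.0 / fs
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    a = rc / (rc + dt)
    out, y, prev_x = [], 0.0, 0.0
    for x in signal:
        y = a * (y + x - prev_x)
        prev_x = x
        out.append(y)
    return out


def bandpass(signal, fs, low_hz, high_hz):
    """Band-pass built as a high-pass followed by a low-pass."""
    return lowpass(highpass(signal, fs, low_hz), fs, high_hz)


def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def tone(freq_hz, fs, duration_s=0.2):
    n_samples = int(duration_s * fs)
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


def gain(filtered, original):
    """Output level relative to input level."""
    return rms(filtered) / rms(original)


fs = 16000
for freq in (100, 1000, 4000):
    x = tone(freq, fs)
    print("%4d Hz: low-pass 500 Hz %.2f | high-pass 2000 Hz %.2f | band-pass 500-2000 Hz %.2f"
          % (freq, gain(lowpass(x, fs, 500), x), gain(highpass(x, fs, 2000), x),
             gain(bandpass(x, fs, 500, 2000), x)))
```

Each printed value is the fraction of the tone's level that survives the filter, so the pattern across rows shows which frequency region each filter transmits.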


              • Demonstration of loss of acoustical cues!



