Speech recognition by ashrafp


									Adapted from Acoustical Society of America Tutorial by Carol Espy-Wilson, espy@umd.edu, Spring 2005

                           Current applications of speech recognition

Machinery control: speaker-independent, small vocabulary, high-quality mike, very low error
tolerance

Voice telephone dialing: speaker-independent (digit recognition) and speaker-dependent (name
recognition), small vocabulary, low-quality mike, moderate error tolerance

Human-machine interface via telephone: speaker-independent, moderate vocabulary, low-quality
mike, high error tolerance

Dictation: speaker-dependent, large vocabulary, high-quality mike, high error tolerance


                         Human speech recognition (HSR) versus ASR

Word error rate in percent (Lippmann, Speech Communication 22, pp. 1-15, 1997)

Condition        HSR WER (%)       ASR WER (%)
Grammar          0.1               3.6
No grammar       2                 17
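The table reports word error rate (WER). The source does not describe the scoring procedure, so the sketch below assumes the standard definition: WER = (substitutions + deletions + insertions) / number of reference words × 100, computed from a word-level edit-distance alignment. The function name `word_error_rate` is hypothetical. The final loop works out the gap the table implies. With a grammar, the machine error rate is 36 times the human rate. Without one, it is 8.5 times the human rate.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER in percent, (S + D + I) / N * 100,
    using a word-level Levenshtein (edit-distance) alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


# Relative gap implied by the table: ASR WER divided by HSR WER.
for condition, hsr, asr in [("Grammar", 0.1, 3.6), ("No grammar", 2.0, 17.0)]:
    print(f"{condition}: ASR error rate is {asr / hsr:.1f}x the human rate")
```

For example, scoring the hypothesis "call seven plus to" against the reference "call seven plus two" finds one substitution in four words, for a WER of 25%.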

Reasons for the performance gap
    Poor modeling of low-level acoustic-phonetic information
    Poor modeling of speech variability
    Lack of robustness to noise and channel variability
    Inability to deal effectively with disfluencies in spontaneous speech

Major Stumbling Block: Speech variability
Changes due to recording conditions (background noise, room reverb, mike characteristics)
Differences in vocal tract length and shape (age, sex)
Undershoot in articulation (effect of speaking rate and/or style)
Voice quality changes (breathy to creaky)
Variable degree of coarticulation (overlapping sounds/words, e.g., “cart”, “seven plus”)
Idiolect (detailed articulatory habits of a single person)
Differences in dialect (vowel substitutions)
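Vocal tract length differences can be made concrete with the textbook uniform-tube idealization, which roughly models a neutral vowel. A tube closed at the glottis and open at the lips resonates at F_n = (2n - 1)c / 4L. The sketch below assumes c = 350 m/s. Its two tube lengths are illustrative values, not taken from the source. Every formant scales as 1/L. As a result, the same vowel spoken with a shorter vocal tract has uniformly higher formants, and a recognizer must normalize for this.

```python
# Assumption: approximate speed of sound in warm, humid air (m/s).
SPEED_OF_SOUND_M_PER_S = 350.0


def tube_formants(length_m: float, count: int = 3) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end (glottis)
    and open at the other (lips): F_n = (2n - 1) * c / (4 * L)."""
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_M_PER_S / (4 * length_m)
        for n in range(1, count + 1)
    ]


# Illustrative lengths only: a longer and a shorter vocal tract.
for label, length in [("17.5 cm tract", 0.175), ("14.5 cm tract", 0.145)]:
    print(label, [round(f) for f in tube_formants(length)])
```

Because every formant is divided by the same length, the ratio between the two speakers' formants is constant (17.5 / 14.5 here). This is why simple speaker normalization schemes rescale the whole frequency axis.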

Chomsky and Halle, The Sound Pattern of English, 1968.
    20 or so phonetic features characterizing all of the world’s languages
    Based on the position of the articulators
    Phonetic features come in 3 categories
         1. Manner of articulation features, related to how open the vocal tract is
         2. Place of articulation features, location of the main constriction
         3. Source feature, opening of the glottis and vibration of the vocal folds
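A minimal sketch of a feature bundle built from the three categories above. The small consonant inventory and its string labels are hypothetical simplifications chosen for illustration. Chomsky and Halle themselves use binary features, not these labels. Comparing pairs shows how a change in just one category (voicing, manner, or place) separates two sounds.

```python
from typing import NamedTuple


class Features(NamedTuple):
    manner: str  # how open the vocal tract is
    place: str   # location of the main constriction
    source: str  # glottal state: voiced vs. voiceless


# Hypothetical, simplified labels for a few English consonants.
CONSONANTS = {
    "p": Features("stop", "labial", "voiceless"),
    "b": Features("stop", "labial", "voiced"),
    "m": Features("nasal", "labial", "voiced"),
    "t": Features("stop", "alveolar", "voiceless"),
    "d": Features("stop", "alveolar", "voiced"),
    "s": Features("fricative", "alveolar", "voiceless"),
    "z": Features("fricative", "alveolar", "voiced"),
    "k": Features("stop", "velar", "voiceless"),
    "g": Features("stop", "velar", "voiced"),
}


def differing_categories(a: str, b: str) -> list[str]:
    """Names of the feature categories in which consonants a and b differ."""
    fa, fb = CONSONANTS[a], CONSONANTS[b]
    return [name for name, x, y in zip(Features._fields, fa, fb) if x != y]


print(differing_categories("p", "b"))  # source only
print(differing_categories("b", "m"))  # manner only
print(differing_categories("t", "k"))  # place only
```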

Formant Map or vowel loops

Peterson and Barney, Journal of the Acoustical Society of America 24, 1952, pp. 175-184.

Test words (each vowel spoken in an h-vowel-d frame): heed, hid, head, had, hod, hawed, hood, who'd, hud, and heard.

[Figure: formant map (vowel loops) showing the relative positions of the vowels in heed, hid, head, had, hod, hawed, hood, who'd, hud, and heard]
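A formant map also supports a very simple vowel classifier: assign a token to the vowel whose average (F1, F2) point lies nearest. The sketch below uses approximate adult-male averages as they are commonly reproduced from Peterson and Barney's table. Treat these numbers as illustrative and check them against the paper before reuse. The name `classify_vowel` is hypothetical. Plain Euclidean distance in Hz is a simplifying choice. Formants shift with vocal tract length, so a practical system would normalize for the speaker first.

```python
import math

# Approximate F1/F2 averages (Hz) for adult male speakers, as commonly
# reproduced from Peterson and Barney (1952); illustrative values only.
VOWEL_CENTROIDS = {
    "heed": (270, 2290),
    "hid": (390, 1990),
    "head": (530, 1840),
    "had": (660, 1720),
    "hod": (730, 1090),
    "hawed": (570, 840),
    "hood": (440, 1020),
    "who'd": (300, 870),
    "hud": (640, 1190),
    "heard": (490, 1350),
}


def classify_vowel(f1: float, f2: float) -> str:
    """Hypothetical nearest-centroid classifier: return the test word whose
    (F1, F2) average is closest to the token in Euclidean distance (Hz)."""
    return min(
        VOWEL_CENTROIDS,
        key=lambda word: math.dist((f1, f2), VOWEL_CENTROIDS[word]),
    )


print(classify_vowel(280, 2250))
print(classify_vowel(720, 1080))
```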