• Speech recognition is the process of converting a
  speech signal to a sequence of words.
• Applications:
   –   Voice Dialing (e.g. Call Home)
   –   Call Routing (e.g. I would like to make a Collect Call)
   –   Data Entry (e.g. entering credit card number)
   –   Document Preparation (e.g. radiology report)
• Started w/ Alexander Graham Bell
• By discovering how to convert air pressure waves
  (sound) into electrical impulses, he began the
  process of uncovering the scientific/mathematical
  basis of understanding speech.
• Discovered that a wire vibrated by a voice could
  be made to vary its resistance and produce a
  current when immersed in a conducting liquid.
• This led to the invention of the telephone.
              History (Cont)
• 1950s, Bell Laboratories: 1st speech recognizer
  for numbers.
• 1970s, ARPA Speech Understanding Research:
  Objective of automatic speech recognition is the
  understanding of speech, not just words.
• 1980s: Speaker-independent recognition of small
  vocabularies, large-vocabulary voice recognition.
• Present: Real-time, continuous speech systems
  that augment command, security, and content
  creation tasks w/ exceptionally high accuracy.
           What is Sound?
• Sounds produced in speech are traveling
  waves, i.e. oscillations of air pressure
• Sound is a form of energy
• Made up of air molecules that vibrate
• Vibrates the air like a slinky!
• Hertz (Hz): SI unit of frequency
• Freq = 1/T, T = Period
• 1 Hz means “one cycle per second”, which can
  also mean “one vibration per second”
• The average human can hear frequencies in the
  range of 20 Hz to 16,000 Hz
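The relation Freq = 1/T can be sketched in a few lines of Python. The function names `frequency_hz` and `is_audible` are hypothetical, and the audible limits are simply the 20 Hz and 16,000 Hz figures quoted on this slide.

```python
def frequency_hz(period_s: float) -> float:
    """Return the frequency in hertz for a period in seconds (Freq = 1/T)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def is_audible(freq_hz: float, low_hz: float = 20.0, high_hz: float = 16_000.0) -> bool:
    """Check a frequency against the hearing range quoted on the slide."""
    return low_hz <= freq_hz <= high_hz


if __name__ == "__main__":
    # Three sample periods: a slow vibration, the 440 Hz tuning note, a very fast one.
    for period in (0.1, 1 / 440, 0.00005):
        f = frequency_hz(period)
        print(f"T = {period:.6f} s -> {f:.1f} Hz, audible: {is_audible(f)}")
```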
 Difference Between a
Consonant and a Vowel
             According to…
• Ferdinand de Saussure – higher degree of
  aperture in oral cavity in vowels than in consonants
• Leonard Bloomfield
  – Vowels have “modifications of voice-sound
    that involve no closure, friction, or contact of
    the tongue or lips”
  – Consonants are “the other sounds”
            According to…
• Chomsky and Halle:
  – Vowel: air stream does not meet any major
    obstacle or constriction in its way from the
    lungs out of the mouth, and the articulation of
    the sound allows spontaneous voicing
  – Consonant: articulation of a consonant always
    involves some kind of blocking of the air
  – Many contemporary linguists follow this view
            What is voicing?
• Voiced means that a tone is present (vibration of
  the vocal cords)
• All vowels are voiced
• Some consonants are voiced
• Difficult to determine difference between
  consonants and vowels through voicing
• Ex: L’s and R’s can be detected as vowels through
  resonant frequencies from vocal tract
• A formant is a peak in an acoustic frequency
  spectrum which results from the resonant frequencies
  of any acoustical system.
• Formants are the distinguishing or meaningful
  frequency components of human speech.
• Formants are the characteristic partials that identify
  vowels to the listener.
• One of the most widely used forms of speech
  synthesis is formant synthesis. At least three
  formants are generally required to produce intelligible
  speech and up to five formants to produce high-quality
  speech.
1st formant 150-850 Hz
2nd formant 500-2500 Hz
3rd formant 1500-3500 Hz
4th formant 2500-4800 Hz
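The ranges above overlap, so a single spectral peak can be a candidate for more than one formant. A minimal sketch, assuming only the four ranges listed on this slide; the names `FORMANT_RANGES_HZ` and `candidate_formants` are hypothetical and not taken from any speech library:

```python
# Formant ranges in Hz, copied from the slide (formant number -> (low, high)).
FORMANT_RANGES_HZ = {
    1: (150, 850),
    2: (500, 2500),
    3: (1500, 3500),
    4: (2500, 4800),
}


def candidate_formants(peak_hz: float) -> list[int]:
    """Return the formant numbers whose listed range contains the given peak."""
    return [
        number
        for number, (low, high) in sorted(FORMANT_RANGES_HZ.items())
        if low <= peak_hz <= high
    ]


if __name__ == "__main__":
    for peak in (100, 700, 2000, 3000):
        print(f"{peak} Hz -> candidate formants: {candidate_formants(peak)}")
```

Because of the overlap, a real recognizer would need more than a range lookup (for example, the ordering of the peaks) to label formants; this sketch only shows the lookup.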
  Pythagoras and Sound

Pythagoras is credited with developing our
first basic understanding of harmonics.
It is said that one day when walking
through town, he noticed that the tone
created when a blacksmith struck his anvil
varied according to the weight of the
hammer. He expanded this idea to
experiments involving different lengths of
string held at equal tensions.
         Pythagoras and Pitch
Pythagoras found that when you plucked a string at a certain
tension you got a tone at a particular pitch, but also that
plucking a string half the length of your original string would
cause it to vibrate twice as fast and produce a tone one octave
higher.

     Fundamental Pitch

     One Octave Higher
   Pythagoras and Intervals
After further experimentation, he found that dividing the
string into other proportions provided different musical
intervals:
       Fundamental Tone (1 : 1)

       Octave (2 : 1)

       Fifth (3 : 2)

        Fourth (4 : 3)

        Major Third (5 : 4)
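The ratios above can be turned into frequencies and string lengths. This is a minimal sketch, assuming an ideal string at fixed tension, so that a frequency ratio r corresponds to a string 1/r as long; the names `INTERVALS`, `interval_frequency` and `string_fraction` are hypothetical.

```python
from fractions import Fraction

# Frequency ratios relative to the fundamental, as listed on the slide.
INTERVALS = {
    "Fundamental": Fraction(1, 1),
    "Octave": Fraction(2, 1),
    "Fifth": Fraction(3, 2),
    "Fourth": Fraction(4, 3),
    "Major Third": Fraction(5, 4),
}


def interval_frequency(fundamental_hz: float, name: str) -> float:
    """Frequency of the named interval above a given fundamental."""
    return float(fundamental_hz * INTERVALS[name])


def string_fraction(name: str) -> Fraction:
    """Fraction of the original string length that produces the interval."""
    return 1 / INTERVALS[name]


if __name__ == "__main__":
    for name in INTERVALS:
        hz = interval_frequency(220.0, name)
        print(f"{name}: {hz:.2f} Hz, string length x {string_fraction(name)}")
```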
      Pythagoras and Waves
Through these experiments, Pythagoras determined that the
pitch (or frequency) of a sound varies inversely with the
length of the string it is vibrated on, provided that the string
remains at the same tension and has a uniform density.

This relationship would later be the basis of the standing
wave equation we know from physics:
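The equation itself is missing from the text at this point. A standard textbook form that matches the statement above is given here as a sketch. It assumes an ideal string fixed at both ends, and the symbols (L for length, T for tension, μ for mass per unit length, n for the mode number) are chosen for illustration rather than taken from the slides:

```latex
f_n = \frac{n}{2L}\sqrt{\frac{T}{\mu}}, \qquad n = 1, 2, 3, \dots
```

With n = 1 this gives the fundamental. Halving L at fixed T and μ doubles the frequency, which is the octave relationship described earlier.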

The wave equation is essential to speech recognition because
every sound can be modeled with it, so a better understanding
of the wave equation gives a better understanding of the nature
of each individual sound. The problem can be simplified by
recognizing only vowels and consonants. The wave equation
gives us the answer.

• As with all partial differential equations,
  suitable initial and/or boundary conditions
  must be given to obtain solutions to the
  equation for particular geometries and
  starting conditions.
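As an illustration of such conditions, here is a minimal sketch for an ideal string of length L fixed at both ends. The symbols u (displacement), c (wave speed), g and h (initial shape and velocity) are notation assumed here, not taken from the slides:

```latex
\begin{aligned}
\frac{\partial^2 u}{\partial t^2} &= c^2\,\frac{\partial^2 u}{\partial x^2},
  && 0 < x < L,\ t > 0 \\
u(0,t) &= u(L,t) = 0
  && \text{(boundary conditions: fixed ends)} \\
u(x,0) &= g(x), \quad \frac{\partial u}{\partial t}(x,0) = h(x)
  && \text{(initial conditions)}
\end{aligned}
```

Under these conditions the solutions are standing waves of the form sin(nπx/L), vibrating at integer multiples of c/(2L).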
             Tuning Forks
• metal, two-pronged device that produces a
  tone when it vibrates
• Commonly used to tune instruments using
  the central A note at 440 Hz.
• Tuning forks can also be used for hearing
  tests
• Can tell if there is a problem with the nerves
  themselves or with sound getting to the nerves
                   Piano Keys
• Only twelve notes of fixed frequency in each octave
• All notes centered around the A4 note (440 Hz)
• Notes contain the letter, any sharp or flat
  associated with it, as well as an octave number.
• Any note can be mathematically calculated
  because it is an integer number of half steps
  away from the central A.
• Example:
  – A note n half steps from the central A has a
    frequency of 440 x 2^(n/12) Hz.
  – To calculate C5 from A4, count the half steps:
    A –(1)→ A♯ –(2)→ B –(3)→ C
  – So n = 3 and C5 = 440 x 2^(3/12) ≈ 523.25 Hz.
                Piano Keys
• Octaves automatically yield factors of two
  times the original frequency
• Used in the Musical Instrument Digital
  Interface (MIDI).
• General formula used to calculate each
  frequency is: f = 440 x 2^(n/12), where n is the
  number of half steps from the central A
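The half-step rule 440 x 2^(n/12) from the Piano Keys example can be sketched in Python. The helper names are hypothetical. The sketch assumes the usual convention that octave numbers change at C, and uses the MIDI convention that note number 69 is A4.

```python
A4_HZ = 440.0

# Half steps from A within the same octave number (octave numbers change at C).
NOTE_OFFSETS = {
    "C": -9, "C#": -8, "D": -7, "D#": -6, "E": -5, "F": -4,
    "F#": -3, "G": -2, "G#": -1, "A": 0, "A#": 1, "B": 2,
}


def half_steps_from_a4(note: str, octave: int) -> int:
    """Signed number of half steps from A4 to the given note."""
    return NOTE_OFFSETS[note] + 12 * (octave - 4)


def frequency_from_half_steps(n: int) -> float:
    """Frequency in Hz of the note n half steps away from A4."""
    return A4_HZ * 2 ** (n / 12)


def midi_to_hz(midi_note: int) -> float:
    """Frequency of a MIDI note number, with note 69 taken as A4."""
    return frequency_from_half_steps(midi_note - 69)


if __name__ == "__main__":
    n = half_steps_from_a4("C", 5)  # A -> A# -> B -> C is three half steps
    print(f"C5 is {n} half steps above A4: {frequency_from_half_steps(n):.2f} Hz")
    print(f"MIDI note 60 (middle C): {midi_to_hz(60):.2f} Hz")
```

Going up twelve half steps doubles the frequency, which is the factor of two per octave noted above.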
