Homework

•   Exercise 1

    •   Four words with alternate spellings of /f/

        rough, stuff, fish, phone

    •   Six words with the letter a pronounced differently

        art [ɑ], cat [æ], table [ej], above [ə], awful [ɔ], Israel [i]

    •   Four words with different letters/same sounds

        Peter, priest, meet, meat

    •   Two words with same letters/different sounds

        tough, hiccough

Homework

•   Exercise 2 (number of sounds, then transcription)

    •   at 2 /æt/

    •   math 3 /mæθ/

    •   cure 4 /kjuɹ/

    •   hopping 5 /hɑpɪŋ/

    •   psychology 8 /sajkɑlədʒi/

    •   knowledge 5 /nɑlɪdʒ/

    •   mailbox 7 /mejlbɑks/

    •   awesome 4 /ɔsəm/




Homework

•   Finish reading chapter 2

•   For Tuesday 2/13

    •   O’Grady pg. 52, #5, #6, #9, #10

    •   Farmer pg. 33, #2.1

    •   Farmer pg. 39, #2.4

•   Lab on Thursday 2/15

Speech

•   Articulatory phonetics views speech sounds via the anatomy of the human vocal tract in action

•   Some choices:

    •   vowels vs. consonants

    •   larynx: voiced vs. voiceless sounds

    •   position of tongue, lips

    •   oral vs. nasal sounds

Consonants

•   Consonants involve a radical constriction of the airstream; vowels don’t

•   Vowels are more sonorous than consonants

•   Consonants can be classified by:

    •   Voicing

    •   Place of articulation

        •   labial, dental, alveolar, palatal, velar, glottal

    •   Manner of articulation

        •   stop, nasal, fricative (sibilant vs. affricate), approximant, tap

Vowels

•   Vowels are highly sonorous and involve an unconstricted airstream

•   All vowels (pretty much) are voiced

•   Vowels are characterized as:

    •   high vs. low

    •   front vs. back

    •   rounded vs. unrounded

•   Diphthongs combine more than one tongue position (combinations of two or more vowels)
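The three-way consonant classification above (voicing, place, manner) maps naturally onto a small lookup table. The sketch below is only an illustration: the `Consonant` type, the `describe` and `differing_features` helpers, and the choice of eleven English consonants are assumptions made for this example, not part of the course materials. Place labels follow the coarse list on the slide.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    """One consonant described by the three features from the slide."""
    voiced: bool
    place: str
    manner: str


# A few English consonants (illustrative subset, not a full inventory).
CONSONANTS = {
    "p": Consonant(False, "labial", "stop"),
    "b": Consonant(True, "labial", "stop"),
    "t": Consonant(False, "alveolar", "stop"),
    "d": Consonant(True, "alveolar", "stop"),
    "k": Consonant(False, "velar", "stop"),
    "g": Consonant(True, "velar", "stop"),
    "m": Consonant(True, "labial", "nasal"),
    "n": Consonant(True, "alveolar", "nasal"),
    "s": Consonant(False, "alveolar", "fricative"),
    "z": Consonant(True, "alveolar", "fricative"),
    "h": Consonant(False, "glottal", "fricative"),
}


def describe(symbol):
    """Return the conventional three-part label, e.g. 'voiced alveolar stop'."""
    c = CONSONANTS[symbol]
    voicing = "voiced" if c.voiced else "voiceless"
    return f"{voicing} {c.place} {c.manner}"


def differing_features(a, b):
    """List the feature names on which two consonants differ."""
    ca, cb = CONSONANTS[a], CONSONANTS[b]
    return [name for name in Consonant._fields
            if getattr(ca, name) != getattr(cb, name)]


print(describe("z"))                 # voiced alveolar fricative
print(differing_features("p", "b"))  # ['voiced']
print(differing_features("t", "k"))  # ['place']
```

Pairs such as p/b differ in voicing only, while t/k differ in place only, which is the kind of comparison the classification is meant to support.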




Suprasegmentals

•   Vowels and consonants are phones or segments, the basic units of speech

•   Suprasegmental properties of speech transcend individual phones

    •   Pitch (tone and intonation)

    •   Loudness

    •   Length

•   Suprasegmentals may be contrastive

Acoustic phonetics

•   Physical properties of speech sounds

•   “Sound” is a longitudinal pressure wave

•   We can draw a sound wave by plotting pressure vs. time

•   The number of pressure waves per unit time is the frequency of the sound wave, measured in cycles per second or hertz

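To make "cycles per second" concrete, here is a minimal sketch that samples a pure tone and recovers its frequency by measuring the spacing between upward zero crossings (one per cycle). The function names, the 8000 Hz sample rate, and the small phase offset (used so that no sample lands exactly on zero) are assumptions chosen for the illustration.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, n_samples, phase=0.1):
    """Pressure values of a pure tone, sampled at regular time steps."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz + phase)
            for n in range(n_samples)]


def estimate_frequency(samples, sample_rate_hz):
    """Estimate frequency from the average spacing of upward zero crossings."""
    crossings = [n for n in range(1, len(samples))
                 if samples[n - 1] < 0 <= samples[n]]
    if len(crossings) < 2:
        raise ValueError("need at least two full cycles")
    # Average number of samples per cycle (the period, in samples).
    period_samples = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate_hz / period_samples


SAMPLE_RATE = 8000  # samples per second (assumed)
wave_500 = sample_sine(500, SAMPLE_RATE, 800)    # 0.1 s of a 500 Hz tone
wave_1000 = sample_sine(1000, SAMPLE_RATE, 800)  # 0.1 s of a 1000 Hz tone

print(estimate_frequency(wave_500, SAMPLE_RATE))   # 500.0
print(estimate_frequency(wave_1000, SAMPLE_RATE))  # 1000.0
```

At 8000 samples per second a 500 Hz tone repeats every 16 samples and a 1000 Hz tone every 8, so doubling the frequency halves the period.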
Sound

•   500 Hz sound wave

[Figure: waveform plot of a 500 Hz sound wave; vertical axis from -0.6 to 0.6, horizontal axis from 0 to 250]

Sound

•   1000 Hz sound wave

[Figure: waveform plot of a 1000 Hz sound wave; vertical axis from -0.5 to 0.5, horizontal axis from 0 to 250]




Sound

•   500+1000 Hz sound wave

[Figure: waveform plot of the 500 Hz and 1000 Hz waves added together; vertical axis from -1 to 1, horizontal axis from 0 to 250]

Sound

•   Another way to draw sound is to plot pressure (amplitude) vs. frequency

•   500+1000 Hz sound wave

[Figure: spectrum plot of the 500+1000 Hz sound wave; vertical axis from 0 to 6000, horizontal axis from 0 to 2000]

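The two views of the 500+1000 Hz wave can be reproduced with a short sketch: add two sine waves sample by sample (the waveform view), then take a discrete Fourier transform to see which frequencies are present (the spectrum view). The sample rate, window length, and function name are assumptions for this example, and the plain DFT used here is a slow textbook version rather than an optimized FFT.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 up to half the sample rate."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


SAMPLE_RATE = 8000  # samples per second (assumed)
N = 160             # 20 ms of signal, so bins are 8000 / 160 = 50 Hz apart

# Waveform view: the two pure tones simply add, sample by sample.
wave = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
        + math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
        for t in range(N)]

# Spectrum view: only the bins at the component frequencies carry energy.
magnitudes = dft_magnitudes(wave)
threshold = 0.5 * max(magnitudes)
peaks = [k * SAMPLE_RATE / N for k, m in enumerate(magnitudes) if m > threshold]

print(peaks)  # [500.0, 1000.0]
```

The waveform looks complicated, but the spectrum shows just two components, which is the sense in which a complex sound can be broken down into its component frequencies.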
Sound

•   All sounds are made up of combinations of sine waves at various frequencies

•   Complex sounds can be broken down into their component frequencies (power spectrum)

•   Male voices have fundamentals mostly between 80 and 200 Hz; female voices may go up to 400 Hz

•   Human hearing ranges from 20 Hz to 20,000 Hz (nominally)

•   Telephones pass roughly 300 Hz to 3,400 Hz

Acoustic phonetics

•   Vibration of the vocal folds generates sound at a fundamental frequency plus many harmonics

•   Vocal tract is a tube closed at one end that forms a resonant cavity

•   The length and shape of the tube determine the overtones, or resonant frequencies

•   The position of the tongue and other articulators emphasizes certain frequencies and de-emphasizes others

•   Source + filter = output

Fitch – Evolution of speech (Review), page 259

1. How vocal sounds are produced

Source+filter model

•   Larynx (Source) + Vocal tract (Filter) = Output

Acoustic phonetics

[Figure: page excerpt from Trends in Cognitive Sciences with panels (a) and (b), labelled Source, Output, Formants, F1, F2, F3]

Excerpt from the box text shown with the figure. The left margin is cropped in the source; [...] marks missing words:

[...] speech uses rapid variations [...] acoustic parameters to pack [...] amount of information [...] utterance. The basic ma[...] that underlies this process is [...] in humans and in other [...]: air exhaled from the lungs [...] power to drive oscillations [...] vocal folds (commonly known [...] ‘cords’), which are located in [...] larynx or ‘voice box’. The rate of [...] fold oscillation (which varies [...] 100 Hz in adult men to [...] in small children) determines the pitch of the sound thus [...]. The acoustic energy generated then passes through the vocal [...] pharyngeal, oral and nasal [...] where it is filtered, and [...] out to the environment [...] the nostrils and lips. It is [...] filtering process that plays a crucial [...] in speech. The filtering is accomplished by a series of bandpass [...] which are termed formants. Formants modify the sound that [...], allowing specific frequencies [...] pass unhindered, but blocking [...] transmission of others. Formants [...] determined by the length and [...] of the vocal tract, and are [...] modified during speech by the articulators (tongue, lips, [...], etc.). [...] imperative to note that formants are independent of pitch. [...] determined by the vibration [...] the vocal folds (the source), [...] formants are determined by [...] vocal tract (the filter). The inde[...]

Fig. I. Source/filter theory of vocal production. The source/filter theory of vocal production, originally proposed for speech, appears to apply to vocal production in all mammals studied so far. The theory holds that vocalizations result from a sound source (typically produced at the larynx) combined with a vocal tract filter (which consists of a number of for[...]

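The source/filter idea above can be sketched numerically. This is a toy model under stated assumptions, not the method used in the cited article: the vocal tract is treated as a uniform tube closed at one end (quarter-wave resonances at (2n - 1)·c / 4L), the tube length of 17.5 cm and sound speed of 350 m/s are round illustrative values, the source harmonics fall off as 1/k, and each formant is a simple bell-shaped gain bump with an arbitrary 50 Hz half-width. All function names are hypothetical. In spectral terms, "source + filter" means multiplying each harmonic's amplitude by the filter gain at that frequency.

```python
def tube_resonances(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances of a uniform tube closed at one end (quarter-wave model)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


def source_spectrum(f0_hz, max_hz):
    """Harmonics of f0 as (frequency, amplitude); 1/k roll-off is assumed."""
    return [(k * f0_hz, 1.0 / k) for k in range(1, int(max_hz // f0_hz) + 1)]


def filter_gain(freq_hz, formants, half_width_hz=50.0):
    """Toy vocal tract filter: one bell-shaped bump per formant."""
    return sum(1.0 / (1.0 + ((freq_hz - f) / half_width_hz) ** 2)
               for f in formants)


def output_spectrum(f0_hz, formants, max_hz=3000):
    """Output = source amplitude times filter gain, harmonic by harmonic."""
    return [(f, a * filter_gain(f, formants))
            for f, a in source_spectrum(f0_hz, max_hz)]


def spectral_peaks(spectrum):
    """Frequencies of harmonics that are stronger than both neighbours."""
    return [spectrum[i][0] for i in range(1, len(spectrum) - 1)
            if spectrum[i - 1][1] < spectrum[i][1] > spectrum[i + 1][1]]


formants = tube_resonances(0.175)  # assumed 17.5 cm tube
print([round(f) for f in formants])                       # [500, 1500, 2500]
print(spectral_peaks(output_spectrum(100, formants)))     # [500, 1500, 2500]
print(spectral_peaks(output_spectrum(125, formants)))     # [500, 1500, 2500]
```

Changing the fundamental from 100 Hz to 125 Hz changes which harmonics exist, but the peaks in the output stay at the tube's resonances, which matches the point that formants are independent of pitch.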
Acoustic phonetics

[Figure]

Spectrograms

•   A spectrogram (‘voice print’) is a plot of the relative intensities of each frequency over time

•   With practice, a trained phonetician can sometimes guess what is being said just from looking at the formants

•   Wide vs. narrow band

Spectrograms

[Figure: wide-band and narrow-band spectrograms of "ii aa ii aa ii aa . . ."]

Spectrograms

[Figure: wide-band and narrow-band spectrograms of "aaa..."]

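The wide vs. narrow band distinction comes down to the length of the analysis window: a short window gives fine time detail but coarse frequency bins (wide band), and a long window gives the reverse (narrow band). The sketch below computes a bare-bones spectrogram as a sequence of windowed DFT magnitude spectra. The test signal (two tones 100 Hz apart, standing in for neighbouring harmonics), the window lengths, the Hann window, and the helper names are all assumptions chosen for the illustration.

```python
import cmath
import math


def spectrogram(samples, window_len, hop):
    """One magnitude spectrum per (Hann-windowed) slice of the signal."""
    frames = []
    for start in range(0, len(samples) - window_len + 1, hop):
        windowed = [
            samples[start + i]
            * (0.5 - 0.5 * math.cos(2 * math.pi * i / window_len))
            for i in range(window_len)
        ]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * i / window_len)
                    for i, x in enumerate(windowed)))
            for k in range(window_len // 2 + 1)
        ])
    return frames


def count_peaks(spectrum, floor=0.1):
    """Count local maxima that reach at least `floor` times the largest value."""
    top = max(spectrum)
    return sum(
        1 for k in range(1, len(spectrum) - 1)
        if spectrum[k] > floor * top
        and spectrum[k - 1] < spectrum[k] >= spectrum[k + 1]
    )


SAMPLE_RATE = 8000  # samples per second (assumed)
signal = [math.sin(2 * math.pi * 500 * n / SAMPLE_RATE)
          + math.sin(2 * math.pi * 600 * n / SAMPLE_RATE)
          for n in range(800)]  # 0.1 s

# Wide band: 5 ms windows, frequency bins 8000 / 40 = 200 Hz apart.
wide = spectrogram(signal, window_len=40, hop=20)
# Narrow band: 40 ms windows, frequency bins 8000 / 320 = 25 Hz apart.
narrow = spectrogram(signal, window_len=320, hop=160)

print(len(wide), len(narrow))                # 39 4
print([count_peaks(f) for f in narrow])      # [2, 2, 2, 2]
```

The narrow-band analysis yields only 4 time slices for the 0.1 s signal but separates the 500 Hz and 600 Hz components in each one. The wide-band analysis yields 39 time slices, but its 200 Hz bins are too coarse to keep the two components apart.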
Spectrogram

[Figure: spectrogram of "computer"]

Spectrogram

[Figure: spectrogram of "lava rocks glow in the dark"]




Acoustic phonetics

•   The resonant properties of the vocal tract emphasize some frequencies and suppress others

•   Vowel sounds are the sum of all the resonant frequencies produced in the vocal tract: the fundamental frequency (f0) plus higher formants (f1, f2, f3)

•   Consonants add noise and silence to the signal

•   The relative frequencies of f1, f2, and f3, plus noise and silence, are perceived as speech sounds

Segments

•   Why are there phones?

    •   Perceptual Magnet Effect (Iverson & Kuhl 1996)

    •   Self-organization (Oudeyer 2006)

Segments

P. Iverson et al. / Cognition 87 (2003) B47–B57, page B50

Fig. 1. The formant frequencies for the English /ra/ and /la/ stimuli used in this study (from Iverson & Kuhl, 1996). The stimuli varied in terms of the second (F2) and third (F3) formants during the initial consonant. The formant frequencies were spaced equally on the mel scale (Stevens et al., 1937).

All participants had been raised in monolingual homes, and had not learned other languages prior to attending school. All Japanese and German speakers had received English language instruction in school, for an average duration of 7.2 years for German speakers and 7.5 years for Japanese speakers. No Japanese speakers had lived abroad. German speakers had spent an average of 2.6 months visiting English-speaking countries.

2.2. Stimuli

The stimuli were 18 /ra/ and /la/ tokens from a previous study (Iverson & Kuhl, 1996). They were synthesized (Klatt & Klatt, 1990) to model natural citation speech recordings of an adult female native American English speaker. As shown in Fig. 1, the stimuli varied in the frequencies of the second (F2) and third (F3) formants during the consonant closure, to create a two-dimensional stimulus grid with the frequencies equally spaced on the mel scale (Stevens, Volkmann, & Newman, 1937). The stimuli were identical in all other respects. During the closure, F1 was 351 Hz and F4 was 4512 Hz; the bandwidths were 200, 100, 150, and 100 Hz, to match the formant amplitudes of the natural recordings. During the vowel, the formant frequencies for F1–F4 were 796, 1221, 2973, and 4512; the bandwidths were 200, 100, 150, and 400.

2.3. Procedure

2.3.1. Identification and goodness

Participants identified each stimulus in terms of their own native-language phonemes, [...]

Segments

black = /r/, white = /l/

Fig. 2. Goodness, identification, and MDS solutions for Japanese, German, and American listeners. In the goodness and identification graphs, the size of the circle indicates the average goodness rating (larger circle for higher goodness), and the shading indicates the most frequently chosen phonetic category for that stimulus in terms of the listeners’ native language (black for the respective /r/ sounds in Japanese, German, and English; white for /l/ sounds in German and English, and /w/ in Japanese). The numbers within the circles list the average goodness ratings and the identification percentages for the predominant phonetic category. The MDS solutions are geometric representations of the average similarity ratings for these stimuli. The lines between stimuli reflect their spacing in the stimulus grid (see Fig. 1), and the length of the lines reflects perceptual sensitivities for these acoustic differences (perceptually similar stimuli are placed close together; perceptually dissimilar stimuli are placed far apart).

Segments

black = /r/, white = /w/

P. Iverson et al. / Cognition 87 (2003) B47–B57, page B52

The identification and goodness results (Fig. 2) provide clues about the causes of these differences in perceptual sensitivity. Japanese adults assimilated these stimuli into their /r/ category, but the strength of the assimilation varied with F2 frequency (stimuli with lower F2 frequencies began to sound like /w/). Their greater sensitivity to F2 may have been caused by this category assimilation (see Best, 1994), although there was no clear evidence for a perceptual magnet effect (Iverson & Kuhl, 1996; Kuhl, 1991) because no stimuli were good exemplars of Japanese phonemes. German adults heard these stimuli as good exemplars of their /l/ and as poor exemplars of their uvular fricative. [...]

Self-organization

Fig. 1. The cells in the honey-bees nests (figure on the left) have a perfect hexagonal shape. Packed water bubbles take spontaneously this shape under the laws of physics (figure on the right). This lead D’Arcy Thompson to think that these same laws of physics might be of great help in the building of their hexagonal wax cells.

[...] spontaneously and form the structure we want to explain.

Self-organization

Fig. 2. Architecture of the artificial system: agents are given an artificial ear, an artificial vocal tract, and an artificial “brain” which couples these two organs. Agents are themselves coupled through their common environment: they perceive the vocalizations of their neighbours.

[...] an overview of the architecture. We will now describe the technical details of the architecture.

Self-organization

Fig. 4. Perceptual neural maps of two agents at the beginning (the two agents are chosen randomly among a set of 20 agents). Units are arbitrary. Each of both square represents the perceptual map of one agent.

Self-organization

5.2 Motor neurons, vocal tract and production of vocalizations

Structure. A motor neuron j is characterized by a preferred vector vj which determines the vocal tract configuration which is to be reached when it is activated and when the agent sends a GO signal to the motor neural map. This GO signal is sent at random times by the agent to the motor neural map. As a consequence, the agent produces vocalizations at random times, independently of any events.

When an agent produces a vocalization, the neurons which are activated are chosen randomly. Typically, 2, 3 or 4 neurons are chosen and activated in sequence. Each activation of a neuron specifies, through its preferred vector, a vocal tract configuration objective that a sub-system takes care of reaching by moving continuously the articulators. In this paper, this sub-system is simply a linear interpolator, which produces 10 intermediate configurations between each articulatory objective, which is an approximation of a dynamic continuous vocalization and that we denote ar1, ar2, ..., arN. We did not use realistic mechanisms like the propagation techniques of population codes proposed in [...]

Fig. 5. Representation of the same two agent’s attractor field initially.

Fig. 6. Neural maps after 2000 interactions, corresponding to the initial state of figure 4. The number of points that one can see is fewer than the number of neurons, since clusters of neurons have the same preferred vectors and this is represented by only one point.

[...] so we represent here only the perceptual maps, as in the rest of the paper). Yet, it is not so easy to visualize the clusters with the representation in Figure 6, since there are a few neurons which have preferred vectors not belonging to these clusters. They are not statistically significant, but introduce noise into the representation. Furthermore, in the clusters, basically all points have the same value so that they appear as one point. Figure 7 shows better the clusters using the attractor landscape that is associated with them. We see that there are now three well-defined attractors or categories, and that there are [...]

Phonetics

•   Basic units of speech are phones or segments

•   Phonetics looks at how phones are produced or what they sound like, independent from how they are used in particular languages

•   Both approaches play an important role in computer speech production and comprehension

•   As always, boundaries are fuzzy
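The vocalization step described in the Self-organization excerpt (section 5.2) can be sketched as follows: pick 2, 3 or 4 motor neurons at random, treat their preferred vectors as articulatory targets, and linearly interpolate 10 intermediate configurations between consecutive targets. This is a reading of the quoted description, not the paper's code. The function names, the two-dimensional vectors, the example values, and the decision to include the targets themselves in the trajectory are assumptions.

```python
import random


def interpolate_targets(targets, n_intermediate=10):
    """Linear path through the targets, with n_intermediate points per leg."""
    trajectory = [list(targets[0])]
    for start, end in zip(targets, targets[1:]):
        for step in range(1, n_intermediate + 2):
            alpha = step / (n_intermediate + 1)  # reaches 1.0 at the target
            trajectory.append([(1 - alpha) * s + alpha * e
                               for s, e in zip(start, end)])
    return trajectory


def produce_vocalization(preferred_vectors, rng):
    """Choose 2 to 4 neurons at random and interpolate between their targets."""
    n_targets = rng.choice([2, 3, 4])
    targets = [rng.choice(preferred_vectors) for _ in range(n_targets)]
    return targets, interpolate_targets(targets)


# Hypothetical preferred vectors for four motor neurons (arbitrary units).
preferred = [[0.0, 0.0], [1.0, 0.5], [0.2, 0.9], [0.7, 0.1]]
targets, trajectory = produce_vocalization(preferred, random.Random(0))

# Each leg adds 10 intermediate points plus the target it ends on.
print(len(trajectory) == 1 + 11 * (len(targets) - 1))  # True
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; in the described system the choice of neurons and the timing of vocalizations are random.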

				